De Novo Peptide Design: From Computer Screen to Clinical Trial

Ten years ago, designing a peptide meant screening thousands of molecules in a lab. Today, researchers generate therapeutic candidates on a laptop before synthesizing a single compound. This shift from wet lab to algorithm represents one of the biggest changes in drug discovery since combinatorial chemistry.

"De novo" means "from scratch" — designing peptides without starting from natural templates. Instead of tweaking existing sequences, computational tools build entirely new molecules atom by atom, optimizing them for specific therapeutic targets. The promise is compelling: faster development, lower costs, and access to "undruggable" disease targets. The reality includes both remarkable successes and instructive failures.

The Computational Design Pipeline

Traditional drug discovery is commonly estimated to cost $1 billion or more per approved drug and can take over 15 years, with roughly 90% of candidates failing in clinical trials. Computational peptide design aims to shift those failures earlier, to the computer screen where they're cheap to fix, rather than to Phase III trials where they're catastrophic.

The pipeline follows a clear sequence:

1. Target Selection and Analysis. Researchers identify a protein target such as a cancer marker, viral receptor, or signaling molecule. They need structural information: X-ray crystallography data, NMR structures, or, increasingly, AlphaFold predictions. The quality of this structural data determines everything downstream.

2. Computational Design. This is where algorithms take over. Tools like Rosetta, RFdiffusion, and ProteinMPNN generate candidate peptide sequences designed to bind the target. Rosetta uses physics-based energy calculations to predict how peptides fold and interact. RFdiffusion, developed at the University of Washington's Institute for Protein Design, applies diffusion models (the same AI architecture behind image generators) to protein design. Starting from random noise, it generates diverse, target-compatible binder backbones.

ProteinMPNN typically works downstream of RFdiffusion, designing sequences that encode the backbone structures RFdiffusion generates. Together, these tools have produced in silico designs that, once synthesized and measured, bind with nanomolar to picomolar affinities.

3. Virtual Screening and Optimization. The initial designs go through computational filters. Molecular dynamics simulations test stability: does the peptide hold its shape, or will it unfold in biological conditions? Physics-based methods can, in favorable cases, predict peptide structures with near-experimental accuracy, though performance depends heavily on the force fields used in simulations. A root mean square deviation (RMSD) that plateaus over the trajectory suggests a stable conformation, while root mean square fluctuation (RMSF) captures per-residue flexibility.

Researchers also screen for drug-like properties: Can the peptide cross cell membranes? Will it resist enzymatic degradation? Is it likely to trigger immune responses? Virtual screening reduces the initial pool from thousands of candidates to dozens worth synthesizing.

4. Chemical Synthesis. Solid-phase peptide synthesis makes it possible to manufacture designed sequences rapidly. Where traditional small molecule synthesis might take months, peptides can be ready in days. Modifications like stapling (crosslinking residues to lock in helical structure) or cyclization improve stability and membrane permeability.

5. In Vitro Testing. Synthesized peptides face their first experimental test. Do they actually bind the target? Surface plasmon resonance, isothermal titration calorimetry, and other biophysical assays measure binding affinity. Many designs fail here, because computational predictions don't always match wet lab reality.

6. Cell-Based Assays. Binding isn't enough. The peptide must function in living cells. Does it reach its intracellular target? Does it affect the intended pathway? Cell-penetrating sequences often get added at this stage to improve delivery.

7. Animal Studies. Peptides that work in cells move to animal models. Here, pharmacokinetics becomes critical. Many peptides have short half-lives in blood, often only minutes to hours before enzymatic degradation. Chemical modifications, PEGylation, or peptide-drug conjugate strategies extend circulation time.

8. Clinical Trials. If animal data shows efficacy and acceptable safety, the peptide enters human trials. This is where the validation gap, the distance between computational promise and clinical reality, becomes most visible.
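The filtering idea in step 3 can be sketched in a few lines. The example below is a minimal illustration, not a production filter: it scores candidates on three crude sequence-level proxies (net charge, hydrophobic fraction, and the number of trypsin-like cleavage sites). The residue groupings, the `FilterLimits` thresholds, and all function names are assumptions made for this sketch; real pipelines rely on structure-based and learned predictors.

```python
from dataclasses import dataclass

# Illustrative residue groupings; other classifications exist.
HYDROPHOBIC = set("AVILMFWY")
POSITIVE = set("KR")
NEGATIVE = set("DE")


@dataclass(frozen=True)
class FilterLimits:
    # Hypothetical thresholds, chosen only to make the example concrete.
    max_abs_charge: int = 3
    max_hydrophobic_fraction: float = 0.6
    max_trypsin_sites: int = 1


def net_charge(seq: str) -> int:
    """Crude net charge near neutral pH: +1 per K/R, -1 per D/E.

    Histidine and the termini are ignored in this simplification.
    """
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)


def hydrophobic_fraction(seq: str) -> float:
    """Fraction of residues that fall in the HYDROPHOBIC set."""
    return sum(aa in HYDROPHOBIC for aa in seq) / len(seq)


def trypsin_sites(seq: str) -> int:
    """Count K or R residues followed by a non-proline residue."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a in POSITIVE and b != "P")


def passes(seq: str, limits: FilterLimits = FilterLimits()) -> bool:
    """Return True when the sequence is within every limit."""
    return (
        abs(net_charge(seq)) <= limits.max_abs_charge
        and hydrophobic_fraction(seq) <= limits.max_hydrophobic_fraction
        and trypsin_sites(seq) <= limits.max_trypsin_sites
    )


def screen(candidates, limits: FilterLimits = FilterLimits()):
    """Keep only the candidates that pass all filters, preserving order."""
    return [seq for seq in candidates if passes(seq, limits)]


if __name__ == "__main__":
    pool = ["ACDEFGHIKL", "KKKKRRRR", "AVILMFWY", "GSGKPGSG"]
    print(screen(pool))
```

Each filter is cheap enough to run on thousands of sequences, which is the point: the expensive steps (simulation, synthesis, assays) are reserved for the survivors.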

Key Computational Tools

Rosetta

Developed over decades by David Baker's lab and the broader Rosetta Commons community, Rosetta remains foundational to protein and peptide design. It combines physics-based energy calculations with sampling algorithms to predict structures and design sequences.

Rosetta excels at designing constrained cyclic peptides and other macrocycles. Using Rosetta's sampling and scoring tools, researchers have designed, synthesized, and experimentally validated dozens of peptide macrocycles. One notable example is a set of macrocycle inhibitors of New Delhi metallo-β-lactamase 1 (NDM-1), an antibiotic resistance factor.

The software also powers Rosetta@home, a distributed computing platform increasingly used for large-scale virtual screening and peptide simulations, including cases involving novel small molecules and non-canonical peptides where current AI models still struggle.

RFdiffusion

RFdiffusion represents the current frontier. This generative model, created by fine-tuning the RoseTTAFold structure prediction network on protein structure denoising tasks, can design protein binders from scratch. The Institute for Protein Design extended it to RFpeptides specifically for cyclic peptide design.

RFdiffusion generates diverse backbone structures, then ProteinMPNN designs amino acid sequences to encode those backbones. Designed binders to peptides like Bim showed single-digit nanomolar affinity. Binders to parathyroid hormone (PTH) achieved ~340 picomolar dissociation constants — among the tightest affinities achieved through computational design.
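To put these affinities on a common scale, a dissociation constant can be converted to a standard binding free energy with the textbook relation ΔG° = RT ln(Kd / 1 M). The sketch below assumes 25 °C and a 1 M standard state; the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol from a Kd in mol/L.

    Uses dG = RT * ln(Kd / 1 M); more negative means tighter binding.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temp_k * math.log(kd_molar)


if __name__ == "__main__":
    for label, kd in [("340 pM", 340e-12), ("5 nM", 5e-9)]:
        print(f"{label}: {binding_free_energy(kd):.1f} kcal/mol")
```

At 25 °C a 340 picomolar Kd corresponds to roughly -12.9 kcal/mol, and every tenfold improvement in Kd is worth about 1.36 kcal/mol. The 5 nM entry is just a stand-in for "single-digit nanomolar".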

The technology enables design of binders to flexible targets, a longstanding challenge. Researchers have created picomolar-affinity binders to helical peptide targets either by refining existing designs or completely de novo from random noise.

Language Model Approaches

Protein language models, trained on millions of natural sequences, have entered peptide design. PepMLM, published in Nature Biotechnology in 2025, fine-tunes ESM-2 (a transformer-based protein language model) to design linear peptide binders conditioned on target sequences. PepMLM-derived peptides bind disease-relevant targets such as NCAM1 and AMHR2, enabling targeted protein degradation across disease contexts from Huntington's disease to viral infections.

PepPrCLIP uses contrastive learning to design binders to conformationally diverse targets. CYC_BUILDER, a reinforcement learning framework published in 2025, assembles peptide fragments and performs efficient cyclization via head-to-tail amide or disulfide bonds. Four of nine experimentally tested CYC_BUILDER peptides showed potent binding and cellular activity.

These AI-driven approaches increasingly complement and compete with physics-based methods. Language models capture patterns from natural evolution that pure physics calculations miss. Physics-based tools provide mechanistic understanding that language models lack. Most successful workflows combine both.

Success Stories: Molecules That Made It

ALRN-6924 (Sulanemadlin)

The clearest success story is ALRN-6924, the first cell-permeating, stabilized α-helical peptide to reach clinical trials. Developed by Aileron Therapeutics, it's a stapled peptide designed to disrupt p53 inhibition by MDM2 and MDMX, inducing cell-cycle arrest or apoptosis in tumors with wild-type TP53.

Phase I trials enrolled 71 patients with solid tumors and lymphomas. The peptide was well tolerated. Among 41 efficacy-evaluable patients with wild-type TP53 disease, the disease control rate was 59%, including two complete responses (peripheral T-cell lymphoma and Merkel cell carcinoma), two partial responses, and 20 patients with stable disease. Phase IIa trials followed in relapsed/refractory peripheral T-cell lymphoma.

ALRN-6924 supports the concept that computational design combined with chemical stabilization (stapling) can produce cell-permeable peptides that show clinical activity in humans.

Institute for Protein Design Therapeutics

Several computationally designed proteins and peptides from David Baker's lab have reached clinical testing:

KumaMax, an enzyme designed for celiac disease treatment, started as an undergraduate project at the Institute for Protein Design. Takeda Pharmaceuticals is now testing it in human trials.

GBP510, a COVID-19 vaccine candidate designed using computational methods, received Emergency Use Listing from the World Health Organization — the 12th COVID-19 vaccine to do so. It was approved in the United Kingdom and South Korea. Human trials began in February 2026 for a computationally designed vaccine targeting the entire SARS virus family.

Flu-Mos-v1, a candidate mosaic influenza vaccine potentially requiring only one lifetime dose, entered Phase I testing.

Neoleukin Therapeutics' fully-engineered compound for solid tumors reached early clinical trials.

These molecules span autoimmune disease, infectious disease, and oncology — diverse therapeutic areas united by computational origin.

GLP-1 Analogs and Beyond

While not purely "de novo" in the sense of entirely novel sequences, computational optimization has transformed naturally occurring peptides into blockbuster therapeutics. GLP-1 receptor agonists like semaglutide (Ozempic, Wegovy) underwent extensive computational modeling to optimize half-life and receptor selectivity. These drugs demonstrate how computational tools accelerate peptide therapeutics development even when starting from natural templates.

The global therapeutic peptide pipeline now includes over 80 FDA/EMA-approved agents with 650+ candidates in development. Not all are computationally designed, but the proportion using computational methods grows yearly.

The Validation Gap: When Designs Fail

For every success, computational peptide design produces many failures. Understanding why designs fail reveals the technology's current limits.

Peptide Flexibility

Peptides occupy an uncomfortable middle ground. They're too large for small molecule docking tools but too flexible for protein design methods. Small molecules have defined conformations. Proteins, especially folded domains, have stable structures. Peptides, particularly linear ones, sample many conformations.

Computational tools struggle with this flexibility. They often design for a single peptide conformation, but in solution the peptide may adopt different shapes. The designed binding pose becomes one possibility among many, diluting activity. Evidence suggests that applying small-molecule or protein design strategies directly to peptides fails primarily because peptide conformational flexibility limits the effectiveness of both docking tools and scoring functions.
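The "dilution" effect can be made quantitative under a simple conformational-selection assumption: if only a fraction p of the free peptide is in the binding-competent shape at any moment, the apparent dissociation constant is the intrinsic one divided by p. The sketch below uses Boltzmann weights to turn hypothetical relative free energies into populations; the model, the energies, and the function names are illustrative assumptions rather than a description of any specific tool.

```python
import math

RT = 1.987204e-3 * 298.15  # kcal/mol at 25 degrees C


def populations(relative_free_energies_kcal):
    """Boltzmann populations for a set of conformations.

    Input: free energy of each conformation relative to any common reference.
    """
    weights = [math.exp(-g / RT) for g in relative_free_energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]


def apparent_kd(kd_competent: float, p_competent: float) -> float:
    """Apparent Kd when only a fraction of the peptide can bind.

    Assumes fast exchange between conformations and binding only from
    the competent one (a deliberately simplified model).
    """
    if not 0 < p_competent <= 1:
        raise ValueError("population must be in (0, 1]")
    return kd_competent / p_competent


if __name__ == "__main__":
    # Hypothetical peptide with four equally stable conformations,
    # only the first of which matches the designed binding pose.
    pops = populations([0.0, 0.0, 0.0, 0.0])
    print(pops[0], apparent_kd(1e-9, pops[0]))
```

With four equally populated conformations, a design that would bind at 1 nM if rigid binds at about 4 nM instead, a penalty of RT ln 4, or roughly 0.8 kcal/mol. Stapling and cyclization work in part by raising p.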

Force Field Accuracy

Molecular dynamics simulations depend on empirical force fields — mathematical models of atomic interactions. These force fields, while sophisticated, are approximations. They're optimized and validated primarily on natural amino acids in common structural contexts. Unusual peptide conformations, non-canonical amino acids, or chemical modifications can push force fields beyond their reliable range.

A peptide that looks stable in simulation may unfold rapidly in experimental conditions, or vice versa.
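"Looks stable in simulation" usually means that RMSD to a reference structure stays flat and RMSF is low for the residues that matter. A minimal sketch of both quantities follows, written in plain Python for clarity. It assumes the trajectory frames have already been superimposed on the reference, which real analysis packages handle as a separate alignment step; the data layout and function names are assumptions for illustration.

```python
import math


def rmsd_per_frame(trajectory, reference):
    """RMSD of every frame to a reference structure.

    trajectory: list of frames, each a list of (x, y, z) atom coordinates.
    reference:  one frame in the same atom order.
    Assumes frames are already aligned to the reference.
    """
    n_atoms = len(reference)
    values = []
    for frame in trajectory:
        squared = sum(math.dist(a, b) ** 2 for a, b in zip(frame, reference))
        values.append(math.sqrt(squared / n_atoms))
    return values


def rmsf_per_atom(trajectory):
    """RMSF of each atom around its own time-averaged position."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        mean = tuple(
            sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)
        )
        msf = sum(math.dist(frame[i], mean) ** 2 for frame in trajectory) / n_frames
        result.append(math.sqrt(msf))
    return result


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    traj = [ref, [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)]]
    print(rmsd_per_frame(traj, ref))
    print(rmsf_per_atom(traj))
```

Both numbers inherit whatever errors the force field makes: a flat RMSD trace shows that the model is stable under that force field, not necessarily that the peptide is stable in a test tube.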

Context Collapse

Computational design typically optimizes binding in isolation — peptide and target protein in vacuum or simple solvent. Real biology is messier. Cell membranes, crowded cytoplasm, post-translational modifications, pH gradients, and competing binding partners all affect whether a designed peptide works.

The gap between in vitro success and clinical translation remains substantial. Many peptide modifications that show promise in laboratory studies fail in complex biological environments. Peptide-mediated interactions are often weaker and more transient than those involving larger ligands, complicating experimental validation.

Confidence Overestimation

A 2026 benchmark evaluating BindCraft, BoltzGen, and RFdiffusion3 for GPCR peptide design revealed a systematic problem: design pipelines significantly overestimate confidence for misplaced peptides during validation. All three methods sampled backbone space sufficiently, but their simultaneous sequence generation remained subpar. Tools confidently predict binding for peptides that fail experimentally.

This overconfidence wastes resources. Researchers synthesize and test peptides the algorithm flagged as high-probability successes, only to find they don't work. Calibrating computational confidence to match experimental success rates remains an open problem.
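One common way to measure this kind of overconfidence is a reliability table: group designs by predicted confidence, then compare the average confidence in each group with the fraction that actually bound. The sketch below computes such a table and the expected calibration error (ECE). The confidence values and outcomes in the usage example are made up for illustration and are not taken from the benchmark.

```python
def calibration_table(confidences, outcomes, n_bins=5):
    """Group predictions into equal-width confidence bins.

    confidences: predicted success probabilities in [0, 1].
    outcomes:    1 if the design worked experimentally, else 0.
    Returns a list of (count, mean_confidence, observed_hit_rate)
    for every non-empty bin, from low to high confidence.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, outcomes):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the top bin
        bins[index].append((conf, hit))
    rows = []
    for members in bins:
        if members:
            count = len(members)
            mean_conf = sum(c for c, _ in members) / count
            hit_rate = sum(h for _, h in members) / count
            rows.append((count, mean_conf, hit_rate))
    return rows


def expected_calibration_error(confidences, outcomes, n_bins=5):
    """Weighted average gap between confidence and observed hit rate."""
    total = len(confidences)
    return sum(
        count / total * abs(mean_conf - hit_rate)
        for count, mean_conf, hit_rate in calibration_table(confidences, outcomes, n_bins)
    )


if __name__ == "__main__":
    # Hypothetical pipeline: four designs scored 0.9, only one of which binds.
    conf = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1]
    hits = [1, 0, 0, 0, 0, 0]
    print(calibration_table(conf, hits))
    print(round(expected_calibration_error(conf, hits), 3))
```

A well-calibrated pipeline would show hit rates close to the mean confidence in every bin. A large gap in the high-confidence bins is exactly the failure mode described above.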

Data Heterogeneity

Machine learning approaches require training data. Peptide-protein interaction data is sparse, inconsistent, and biased toward certain target classes. Models trained on this data inherit its limitations. They may generalize poorly to novel targets or unusual binding modes. Challenges persist in data heterogeneity, model generalizability, and the gap between in silico predictions and experimental validation.

Cost and Time: What Computational Design Actually Saves

The marketing pitch is seductive: computational design replaces years of screening and optimization with algorithms that run in hours. The reality is more nuanced.

Reduced Screening Costs

Traditional peptide discovery screens tens or hundreds of thousands of compounds in high-throughput biochemical assays. Each compound costs money to synthesize or obtain. Each assay consumes reagents and technician time.

Computational screening is orders of magnitude cheaper. Virtual screening significantly reduces time and resources compared to experimental initial screening. One study with 160,000 tetrapeptides targeting viral envelope proteins demonstrated the efficiency gain: training a machine learning model on just 1% of the dataset accurately predicted the remaining 99%, reducing computational costs while increasing screening speed tenfold.

In silico predictions are faster and cheaper than in vitro assays, so a learned model lets researchers reserve costly experimental screening for the most promising candidates.
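The "train on 1%, predict 99%" idea is easy to reproduce in miniature. There are exactly 20^4 = 160,000 tetrapeptides built from the standard amino acids, so the sketch below enumerates them all, scores them with a synthetic additive function (a stand-in for an expensive docking or assay score), fits a very simple per-position surrogate on a random 1% sample, and checks how well it ranks the held-out 99%. Everything here is an assumption for illustration: the scoring function is invented, and the averaging-based surrogate is far simpler than the model used in the study mentioned above.

```python
import itertools
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
LENGTH = 4  # tetrapeptides: 20**4 = 160,000 sequences


def make_synthetic_scorer(rng):
    """Hypothetical 'expensive' score: independent additive effect per position."""
    weights = [{aa: rng.gauss(0.0, 1.0) for aa in AMINO_ACIDS} for _ in range(LENGTH)]

    def score(peptide):
        return sum(weights[i][aa] for i, aa in enumerate(peptide))

    return score


def fit_additive_surrogate(peptides, scores):
    """Estimate each (position, residue) effect as a deviation from the mean score."""
    global_mean = sum(scores) / len(scores)
    sums = [dict.fromkeys(AMINO_ACIDS, 0.0) for _ in range(LENGTH)]
    counts = [dict.fromkeys(AMINO_ACIDS, 0) for _ in range(LENGTH)]
    for peptide, y in zip(peptides, scores):
        for i, aa in enumerate(peptide):
            sums[i][aa] += y
            counts[i][aa] += 1
    effects = [
        {aa: sums[i][aa] / counts[i][aa] - global_mean if counts[i][aa] else 0.0
         for aa in AMINO_ACIDS}
        for i in range(LENGTH)
    ]

    def predict(peptide):
        return global_mean + sum(effects[i][aa] for i, aa in enumerate(peptide))

    return predict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def run_demo(seed=0, train_fraction=0.01):
    """Return (library size, training size, held-out correlation)."""
    rng = random.Random(seed)
    library = ["".join(p) for p in itertools.product(AMINO_ACIDS, repeat=LENGTH)]
    score = make_synthetic_scorer(rng)
    train = rng.sample(library, round(len(library) * train_fraction))
    model = fit_additive_surrogate(train, [score(p) for p in train])
    seen = set(train)
    held_out = [p for p in library if p not in seen]
    r = pearson([score(p) for p in held_out], [model(p) for p in held_out])
    return len(library), len(train), r


if __name__ == "__main__":
    print(run_demo())
```

The surrogate does well here because the synthetic score really is additive. Real binding landscapes include interactions between positions, which is why practical workflows use richer models and retest top predictions experimentally.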

Earlier Failure Detection

Computational technology shifts failures to earlier stages where they're less expensive. Later steps, particularly clinical trials, involve enormous resource investment. Computational predictions improve the probability of success in later steps by informing earlier decisions.

If a peptide will fail due to poor stability or off-target binding, catching that failure on a computer is vastly cheaper than discovering it in Phase II trials.

Development Time

Traditional drug discovery exceeds 15 years from target identification to approval. Computational methods compress the early discovery phase. What might take years of library screening and lead optimization can happen in months of computational design and focused experimental validation.

However, computational design doesn't accelerate everything. Chemical synthesis, cell assays, animal studies, and clinical trials still take time. A peptide designed in a week still faces years of validation.

The Caveat

No peptide drug designed from scratch by purely computational methods has yet received FDA approval. Most FDA-approved peptide drugs still rely on traditional discovery methods: screening, evolution, or modification of natural sequences. Computational tools increasingly assist these methods, optimizing leads faster and predicting modifications, but fully autonomous computational design hasn't yet produced an approved therapeutic.

The technology accelerates discovery. It hasn't replaced the full pipeline.

Current State of the Field (2025-2026)

The pace of innovation accelerated sharply in 2024-2025. Multiple platforms launched with experimental validation:

DiffPepBuilder, a target-specific peptide binder generation method using SE(3)-equivariant diffusion models, effectively recalled native peptide structures and sequences while generating novel binders with improved binding free energy.

PepMLM's publication in Nature Biotechnology demonstrated sequence-specific binding to cancer and reproductive targets with successful targeted protein degradation.

Reinforcement learning entered the field. CYC_BUILDER uses Monte Carlo Tree Search to guide fragment selection, peptide growth, and structure refinement for cyclic peptides. Four of nine tested designs showed potent binding and cellular activity, a 44% hit rate that compares favorably with traditional rational design, although nine designs is too small a sample to pin the rate down precisely.
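How much weight can a four-out-of-nine result bear? A standard way to answer is a Wilson score interval for a binomial proportion, sketched below. The function name is illustrative and the 95% level is a conventional choice, not something taken from the CYC_BUILDER paper.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.959964):
    """Wilson score interval for a binomial proportion.

    z = 1.959964 corresponds to a two-sided 95% confidence level.
    Returns (lower, upper) as fractions between 0 and 1.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


if __name__ == "__main__":
    low, high = wilson_interval(4, 9)
    print(f"{low:.0%} to {high:.0%}")
```

For four hits in nine trials the 95% interval runs from roughly 19% to 73%. The result is encouraging, but the true hit rate could sit anywhere in that wide range.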

Generative AI approaches increasingly combine multiple AI architectures. Fragment-based methods, language models, diffusion models, and reinforcement learning work together in integrated pipelines.

The bottleneck shifted from generating candidates to validating them. Tools can now produce thousands of designed peptides. The experimental capacity to synthesize, test, and characterize them lags behind. High-throughput validation technologies — multiplexed binding assays, pooled synthesis, DNA-barcoded libraries — are evolving to match computational output.

What Comes Next

Several trends are converging:

Active learning loops close the gap between computation and experiment. Instead of designing in isolation then testing, algorithms learn from experimental results in real time. Each round of synthesis and testing refines the model, creating a feedback loop that improves prediction accuracy.

Multi-objective optimization moves beyond binding affinity. Next-generation tools optimize simultaneously for binding, stability, cell permeability, immunogenicity, half-life, and manufacturability. This requires integrating diverse prediction models and balancing competing constraints.

Inverse folding — designing sequences to encode desired structures — becomes routine. Tools like ProteinMPNN already do this reliably for common folds. Extending inverse folding to unusual constrained peptides and modified amino acids is underway.

Experimental AI integrates computational design with automated synthesis and testing. Robotic systems synthesize computationally designed peptides, run binding assays, sequence binders, and feed results back to design algorithms without human intervention. This closed-loop automation could accelerate discovery by orders of magnitude.

Expanded chemical space includes non-canonical amino acids, backbone modifications, and hybrid structures mixing peptide and small molecule elements. Current tools focus on the 20 standard amino acids. Future platforms will design with hundreds of building blocks.
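The multi-objective trend above has a simple core idea: instead of ranking designs by one number, keep every design that is not beaten on all objectives at once (the Pareto front). The sketch below shows that selection step. The design names and scores are hypothetical, and each objective is assumed to be scaled so that higher is better.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(designs):
    """Return the designs that no other design dominates.

    designs: dict mapping a design name to a tuple of objective scores,
             all oriented so that higher is better.
    """
    return {
        name: scores
        for name, scores in designs.items()
        if not any(
            dominates(other_scores, scores)
            for other_name, other_scores in designs.items()
            if other_name != name
        )
    }


if __name__ == "__main__":
    # Hypothetical scores: (predicted affinity, stability, permeability)
    candidates = {
        "pepA": (0.9, 0.2, 0.5),
        "pepB": (0.6, 0.8, 0.5),
        "pepC": (0.5, 0.1, 0.4),
    }
    print(sorted(pareto_front(candidates)))
```

Here pepC is dropped because pepA matches or beats it on every objective, while pepA and pepB both survive: one binds better, the other is more stable, and choosing between them is a judgment call the algorithm leaves to the researcher.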

The market pressure is real. The peptide therapeutics market is projected to reach $49.68 billion in 2026. The FDA has cracked down on compounding, pushing development toward properly validated therapeutics. Millions of patients are searching for peptide information. The economic incentive to make computational design work is enormous.

Making Sense of the Hype

Computational peptide design is not magic. It's a set of increasingly powerful tools that accelerate parts of drug discovery while leaving other parts unchanged.

These tools can generate novel binding molecules faster than traditional screening. They can optimize properties that once required iterative rounds of synthesis and testing. They enable access to difficult targets — disordered proteins, intracellular protein-protein interactions — that resist small molecule approaches.

They cannot yet predict with certainty whether a designed peptide will work in humans. They cannot eliminate the need for rigorous validation. They cannot make biology simpler than it is.

The question is not whether computational design works — ALRN-6924, KumaMax, and GBP510 demonstrate it does. The question is how often it works, under what conditions, and how we improve success rates.

Current data suggests success rates for experimentally validated binding in the 30-50% range for state-of-the-art tools on favorable targets. That's remarkable compared to random screening but far from deterministic. For each target class, for each type of modification, for each therapeutic modality, the field is learning what works through accumulating experimental data.

This is early innings. The tools launched in 2023-2025 represent the first generation of deep learning-powered peptide design. We're still figuring out how to use them, where they fail, and how to fix those failures.

The trajectory is clear: from random screening to rational design, from years to months, from empirical guesswork to data-driven optimization. Not overnight. Not without setbacks. But directionally, unmistakably, toward a future where therapeutic peptides are designed on screens before they're synthesized in labs.

References

Watson, J.L., et al. (2023). De novo design of protein structure and function with RFdiffusion. Nature, 620(7976), 1089-1100. https://www.nature.com/articles/s41586-023-06415-8
Dauparas, J., et al. (2022). Robust deep learning–based protein sequence design using ProteinMPNN. Science, 378(6615), 49-56. https://www.science.org/doi/10.1126/science.add2187
Chen, L.T., et al. (2025). Target sequence-conditioned design of peptide binders using masked language modeling. Nature Biotechnology. https://www.nature.com/articles/s41587-025-02761-2
Wu, K., et al. (2025). Reinforcement Learning-Based Target-Specific De Novo Design of Cyclic Peptide Binders. Journal of Medicinal Chemistry. https://pubs.acs.org/doi/10.1021/acs.jmedchem.5c00789
Guerlavais, V., et al. (2023). Discovery of Sulanemadlin (ALRN-6924), the First Cell-Permeating, Stabilized α-Helical Peptide in Clinical Development. Journal of Medicinal Chemistry, 66(14), 9401-9424. https://pubs.acs.org/doi/10.1021/acs.jmedchem.3c00623
Saleh, M.N., et al. (2021). Phase 1 Trial of ALRN-6924, a Dual Inhibitor of MDMX and MDM2, in Patients with Solid Tumors and Lymphomas Bearing Wild-type TP53. Clinical Cancer Research, 27(19), 5236-5247. https://pmc.ncbi.nlm.nih.gov/articles/PMC9401461/
Institute for Protein Design. (2024). Introducing RFpeptides – AI for cyclic peptide design. https://www.ipd.uw.edu/2024/11/introducing-rfpeptides-ai-for-cyclic-peptide-design/
Vázquez Torres, S., et al. (2024). De novo design of high-affinity binders of bioactive helical peptides. Nature, 626(7997), 435-442. https://www.nature.com/articles/s41586-023-06953-1
Bhat, S., et al. (2025). De novo design of peptide binders to conformationally diverse targets with contrastive language modeling. Science Advances, 11(2). https://www.science.org/doi/10.1126/sciadv.adr8638
Hossain, S., et al. (2025). Therapeutic Peptides: Recent Advances in Discovery, Synthesis, and Clinical Translation. International Journal of Molecular Sciences, 26(11), 5131. https://pmc.ncbi.nlm.nih.gov/articles/PMC12154100/
Goles, M., et al. (2024). Peptide-based drug discovery through artificial intelligence: towards an autonomous design of therapeutic peptides. Briefings in Bioinformatics, 25(4), bbae275. https://academic.oup.com/bib/article/25/4/bbae275/7690345
Mulligan, V.K. (2020). The emerging role of computational design in peptide macrocycle drug discovery. Expert Opinion on Drug Discovery, 15(7), 833-852. https://www.tandfonline.com/doi/full/10.1080/17460441.2020.1751117
Chang, L., Mondal, A., & Perez, A. (2022). Towards rational computational peptide design. Frontiers in Bioinformatics, 2, 1046493. https://www.frontiersin.org/journals/bioinformatics/articles/10.3389/fbinf.2022.1046493/full
Geng, H., et al. (2019). Applications of Molecular Dynamics Simulation in Structure Prediction of Peptides and Proteins. Computational and Structural Biotechnology Journal, 17, 1162-1170. https://pmc.ncbi.nlm.nih.gov/articles/PMC6709365/
Wang, L., et al. (2022). Therapeutic peptides: current applications and future directions. Signal Transduction and Targeted Therapy, 7(1), 48. https://www.nature.com/articles/s41392-022-00904-4