Peptide Sequence Notation: How to Read Peptide Names
Open any peptide research paper and you will encounter strings like "Ac-D-Trp-Ala-Trp-D-Phe-Lys-NH2" or "GHRP-6" or "H-Gly-Arg-Gly-Asp-Ser-OH." For the uninitiated, this looks like a different language. It basically is one — a standardized shorthand developed over decades by international chemistry organizations to describe peptide structures unambiguously.
The good news: once you learn the rules, reading peptide notation becomes straightforward. The system is logical and consistent. A researcher in Tokyo, a clinician in Boston, and a manufacturer in Frankfurt all read "Ala-Gly-Pro" and understand exactly the same molecule.
This guide covers everything you need to decode peptide names — from the one-letter and three-letter amino acid codes to modification notation, IUPAC naming conventions, and the common abbreviations you will encounter in the literature.
Table of Contents
- Why Peptide Naming Conventions Exist
- The Amino Acid Alphabets: One-Letter and Three-Letter Codes
- The N-to-C Convention
- Reading Three-Letter Sequence Notation
- Reading One-Letter Sequence Notation
- Modification Notation
- IUPAC and Systematic Naming
- Common Abbreviations and Special Cases
- Practical Examples: Decoding Real Peptide Names
- FAQ
- The Bottom Line
- References
Why Peptide Naming Conventions Exist {#why-peptide-naming-conventions-exist}
Before standardization, peptide naming was a mess. Different laboratories used different abbreviations. The same peptide could be described three different ways in three different journals. Errors in communication led to errors in synthesis, wasted experiments, and confusion in the literature.
The International Union of Pure and Applied Chemistry (IUPAC) and the International Union of Biochemistry (IUB, since renamed the International Union of Biochemistry and Molecular Biology, IUBMB) solved this through their Joint Commission on Biochemical Nomenclature (JCBN). Their recommendations, first published comprehensively in 1984 in journals including Biochemical Journal, European Journal of Biochemistry, and Pure and Applied Chemistry, established the rules that the scientific community still uses today (IUPAC-IUB, 1984).
These conventions are not suggestions — they are the agreed-upon language of peptide science. Journals, databases, regulatory agencies, and manufacturers all follow them (with occasional shortcuts that we will cover below). Knowing these rules means you can pick up any peptide paper or product sheet and understand exactly what molecule is being described.
The Amino Acid Alphabets: One-Letter and Three-Letter Codes {#the-amino-acid-alphabets}
The 20 standard ("proteinogenic") amino acids — the ones encoded by DNA and incorporated during ribosomal translation — each have two shorthand representations: a three-letter code and a one-letter code.
The 20 Standard Amino Acids
| Amino Acid | Three-Letter | One-Letter | Side Chain Character |
|---|---|---|---|
| Alanine | Ala | A | Nonpolar |
| Arginine | Arg | R | Positive charge |
| Asparagine | Asn | N | Polar |
| Aspartic acid | Asp | D | Negative charge |
| Cysteine | Cys | C | Polar (thiol) |
| Glutamic acid | Glu | E | Negative charge |
| Glutamine | Gln | Q | Polar |
| Glycine | Gly | G | Nonpolar (smallest) |
| Histidine | His | H | Positive charge (at low pH) |
| Isoleucine | Ile | I | Nonpolar |
| Leucine | Leu | L | Nonpolar |
| Lysine | Lys | K | Positive charge |
| Methionine | Met | M | Nonpolar (sulfur) |
| Phenylalanine | Phe | F | Nonpolar (aromatic) |
| Proline | Pro | P | Nonpolar (cyclic) |
| Serine | Ser | S | Polar |
| Threonine | Thr | T | Polar |
| Tryptophan | Trp | W | Nonpolar (aromatic) |
| Tyrosine | Tyr | Y | Polar (aromatic) |
| Valine | Val | V | Nonpolar |
When to Use Which
Three-letter codes are used for short sequences, structural descriptions, and when clarity matters most. They are easier to read at a glance and less prone to errors. Example: Ala-Gly-Pro-Ser.
One-letter codes are preferred for long sequences and database entries, where space matters. Per IUPAC guidelines, one-letter codes "should be restricted to the comparison of long sequences." Example: AGPS.
You will see both in the literature. Short peptides (under 10-15 residues) are typically written in three-letter code. Longer sequences use one-letter code. Databases like UniProt and NCBI use one-letter code almost exclusively.
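The two alphabets map one-to-one, so converting between them is a table lookup. The sketch below is a minimal illustration covering only the 20 standard residues from the table above; the function names are hypothetical, and modified residues or terminal groups are not handled.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Glu": "E", "Gln": "Q", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(three_letter_seq: str) -> str:
    """Convert 'Ala-Gly-Pro-Ser' to 'AGPS' (unmodified sequences only)."""
    return "".join(THREE_TO_ONE[code] for code in three_letter_seq.split("-"))


def to_three_letter(one_letter_seq: str) -> str:
    """Convert 'AGPS' to 'Ala-Gly-Pro-Ser'."""
    return "-".join(ONE_TO_THREE[letter] for letter in one_letter_seq)


print(to_one_letter("Ala-Gly-Pro-Ser"))  # AGPS
print(to_three_letter("GRGDS"))          # Gly-Arg-Gly-Asp-Ser
```

An unknown code raises a `KeyError`, which is usually the desired behavior: a silent guess would defeat the purpose of an unambiguous notation.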
Ambiguity Codes
Sometimes the exact identity of a residue is uncertain, for instance when an analytical method cannot distinguish between two closely related amino acids such as Asp and Asn, which differ in mass by only about 1 Da. The system handles this:
| Ambiguity | Three-Letter | One-Letter | Meaning |
|---|---|---|---|
| Asp or Asn | Asx | B | Could be either (differ by 1 Da) |
| Glu or Gln | Glx | Z | Could be either (differ by 1 Da) |
| Unknown | Xaa | X | Any amino acid |
The B and Z codes are most commonly seen in protein sequencing data where deamidation (Asn to Asp, or Gln to Glu) may have occurred but cannot be confirmed.
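One way to work with ambiguity codes in software is to expand each code into the set of residues it may stand for. The sketch below is an illustration under that assumption; the helper names are hypothetical, and X is taken to mean any of the 20 standard residues.

```python
import re

# Residues each ambiguity code may stand for (one-letter codes).
AMBIGUITY = {"B": "DN", "Z": "EQ", "X": "ACDEFGHIKLMNPQRSTVWY"}


def ambiguous_to_regex(pattern: str) -> str:
    """Turn a one-letter sequence containing B/Z/X into a regular expression."""
    return "".join(
        f"[{AMBIGUITY[ch]}]" if ch in AMBIGUITY else re.escape(ch)
        for ch in pattern
    )


def is_consistent(pattern: str, sequence: str) -> bool:
    """True if the fully specified sequence is one possible reading of the pattern."""
    return re.fullmatch(ambiguous_to_regex(pattern), sequence) is not None


print(ambiguous_to_regex("RGB"))     # RG[DN]
print(is_consistent("RGB", "RGD"))   # True
print(is_consistent("RGB", "RGE"))   # False
```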
The N-to-C Convention {#the-n-to-c-convention}
Every peptide sequence is written from left to right, starting at the N-terminus (the end with a free amino group) and ending at the C-terminus (the end with a free carboxyl group). This is universal. There are no exceptions in standard notation.
The convention mirrors how peptides are actually synthesized in biology: ribosomes build proteins by adding amino acids from the N-terminal end to the C-terminal end. (Solid-phase peptide synthesis actually works in the opposite direction — C-to-N — but the sequence is still written and reported N-to-C.)
So when you see:
Ala-Gly-Pro
This means:
- Alanine is at the N-terminus (left)
- Proline is at the C-terminus (right)
- The peptide bond connects Ala to Gly, and Gly to Pro
Write it as Pro-Gly-Ala and you are describing a completely different molecule — one with reversed polarity, different folding properties, and potentially different biological activity.
Reading Three-Letter Sequence Notation {#reading-three-letter-sequence-notation}
Three-letter notation is the most common format for describing research peptides. The rules are straightforward:
Residues are separated by hyphens. Each three-letter code represents one amino acid in the chain, separated by dashes: Gly-Arg-Gly-Asp-Ser.
Terminal groups may be specified. The full notation can include the N-terminal and C-terminal groups:
- H- at the start indicates a free amino group (the default)
- -OH at the end indicates a free carboxyl group (the default)
So H-Gly-Arg-Gly-Asp-Ser-OH is the fully explicit form of what is usually just written as Gly-Arg-Gly-Asp-Ser. Both describe the same molecule. The explicit form becomes necessary when modifications are present (more on that below).
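Because H- and -OH only restate the defaults, the explicit and short forms can be compared after stripping them. A minimal sketch, assuming plain hyphen-separated three-letter notation (the function names are hypothetical):

```python
def strip_default_termini(notation: str) -> str:
    """Drop an explicit 'H-' prefix and '-OH' suffix, which only restate the defaults."""
    if notation.startswith("H-"):
        notation = notation[len("H-"):]
    if notation.endswith("-OH"):
        notation = notation[: -len("-OH")]
    return notation


def same_peptide(a: str, b: str) -> bool:
    """Compare two notations after removing default terminal groups."""
    return strip_default_termini(a) == strip_default_termini(b)


print(same_peptide("H-Gly-Arg-Gly-Asp-Ser-OH", "Gly-Arg-Gly-Asp-Ser"))   # True
print(same_peptide("Gly-Arg-Gly-Asp-Ser-NH2", "Gly-Arg-Gly-Asp-Ser"))    # False
```

Non-default groups such as -NH2 are deliberately left in place, since they describe a different molecule.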
Naming short peptides. Peptides can also be named by reading the residues as a modified word. A dipeptide of Ala and Gly would be "alanylglycine" — where the "-yl" suffix on alanine indicates it forms the peptide bond, and glycine (as the C-terminal residue) keeps its full name. In practice, this naming convention is used mainly for very short peptides (di-, tri-, and tetrapeptides). Beyond that, sequence notation is far more practical.
Number of residues. Peptides are classified by length:
- Dipeptide: 2 residues
- Tripeptide: 3 residues
- Tetrapeptide: 4 residues
- Oligopeptide: roughly 2-20 residues
- Polypeptide: roughly 20-50+ residues
For an overview of these classifications, see our guide on peptide classifications.
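The length classes can be expressed as a small lookup. The ranges in the list are approximate and overlap at 20 residues, so the exact cutoffs in this sketch (oligopeptide up to 20 residues, polypeptide above that) are an assumption made for illustration, and `length_class` is a hypothetical helper.

```python
def length_class(n_residues: int) -> str:
    """Map a residue count to a length class.

    The cutoffs are an assumption: the ranges in the text are only approximate.
    """
    if n_residues < 2:
        raise ValueError("a peptide needs at least two residues")
    named = {2: "dipeptide", 3: "tripeptide", 4: "tetrapeptide"}
    if n_residues in named:
        return named[n_residues]
    return "oligopeptide" if n_residues <= 20 else "polypeptide"


# Count residues by splitting unmodified three-letter notation on hyphens.
print(length_class(len("Ala-Gly-Pro".split("-"))))          # tripeptide
print(length_class(len("Gly-Arg-Gly-Asp-Ser".split("-"))))  # oligopeptide
```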
Reading One-Letter Sequence Notation {#reading-one-letter-sequence-notation}
One-letter notation strips the sequence to its bare essentials. No hyphens, no spaces — just a string of capital letters, each representing one amino acid.
The sequence GRGDS is the one-letter equivalent of Gly-Arg-Gly-Asp-Ser. The pentadecapeptide BPC-157, with 15 residues, would be GEPPPGKPADDAGLV in one-letter code.
One-letter notation is compact and efficient for long sequences. But for short peptides, it can be less immediately readable than three-letter code — and single-letter typos are harder to catch. "GEPPPGKPADDAGLV" versus "GEPPGKPADDAGLV" (one P missing) is easy to miss on visual inspection.
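Since single-letter slips are hard to spot by eye, a quick programmatic comparison helps. The sketch below is one simple approach (a hypothetical helper, not a full alignment): it reports the lengths and the first position where two strings disagree.

```python
from typing import Optional


def first_difference(expected: str, observed: str) -> Optional[int]:
    """Return the 1-based position of the first mismatch, or None if identical."""
    for position, (a, b) in enumerate(zip(expected, observed), start=1):
        if a != b:
            return position
    if len(expected) != len(observed):
        # One string is a prefix of the other; they diverge right after the shorter one.
        return min(len(expected), len(observed)) + 1
    return None


reference = "GEPPPGKPADDAGLV"
candidate = "GEPPGKPADDAGLV"  # one P missing

print(len(reference), len(candidate))          # 15 14
print(first_difference(reference, candidate))  # 5
```

A length check alone already catches the missing residue here; the position makes it easier to find where the two strings part ways.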
Position Numbering
When discussing specific residues, superscript or subscript numbers indicate position. In papers about semaglutide, you might see references to "Aib8" or "Lys26". These follow the numbering of the parent hormone GLP-1, whose active form starts at residue 7, so Aib8 means that the second amino acid in the chain is 2-aminoisobutyric acid (a non-standard residue), and Lys26 is the lysine at position 26, which carries a fatty acid modification.
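Position numbers are 1-based, and analogs are often numbered after their parent hormone rather than from their own first residue. The sketch below illustrates both cases; `residue_at` and its `first_position` parameter are hypothetical, and the example sequence is native GLP-1(7-37), whose numbering conventionally starts at 7.

```python
GLP1_7_37 = "HAEGTFTSDVSSYLEGQAAKEFIAWLVKGRG"  # native GLP-1(7-37), one-letter code


def residue_at(sequence: str, position: int, first_position: int = 1) -> str:
    """Return the residue at a 1-based position.

    first_position shifts the numbering for peptides numbered after a parent sequence.
    """
    index = position - first_position
    if not 0 <= index < len(sequence):
        raise IndexError(f"position {position} is outside the sequence")
    return sequence[index]


print(residue_at("GRGDS", 4))                       # D
print(residue_at(GLP1_7_37, 8, first_position=7))   # A (replaced by Aib in semaglutide)
print(residue_at(GLP1_7_37, 26, first_position=7))  # K
```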
Modification Notation {#modification-notation}
Unmodified, natural-sequence peptides are the simplest to notate. But many research and therapeutic peptides carry modifications — changes designed to improve stability, alter activity, or facilitate detection. The notation system handles all of these.
N-Terminal Modifications
Modifications to the N-terminus are written as prefixes:
- Ac-: Acetylation (addition of an acetyl group, -COCH3). Protects against aminopeptidases.
- Boc-: tert-Butyloxycarbonyl group. Common protecting group in synthesis.
- Fmoc-: Fluorenylmethyloxycarbonyl group. The standard protecting group in modern SPPS.
- pGlu- or <Glu-: Pyroglutamate (cyclized glutamic acid). Found in many natural peptides like TRH.
Example: Ac-Ala-Gly-Pro-NH2 is a tripeptide with an acetylated N-terminus and an amidated C-terminus.
C-Terminal Modifications
Modifications to the C-terminus are written as suffixes:
- -NH2: Amidation (the carboxyl group is replaced by an amide). Common in bioactive peptides; improves receptor binding and stability.
- -OH: Free carboxyl group (the default; often omitted).
- -OMe: Methyl ester.
- -pNA: para-nitroanilide (used in enzyme substrates).
D-Amino Acids
Most natural amino acids exist in the L-configuration, which is assumed when nothing else is written. D-amino acids (the mirror image) are indicated by a "D-" prefix, sometimes written in lowercase as "d-":
- D-Ala or d-Ala: D-alanine
- D-Trp: D-tryptophan
- D-Phe: D-phenylalanine
D-amino acid substitutions are a common strategy for increasing protease resistance, since most human proteases are optimized to cleave L-amino acid substrates. See our guide on natural vs. synthetic peptides for more context.
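The D- prefix uses the same hyphen that separates residues, so naively splitting on hyphens would break "D-Trp" into two pieces. The sketch below shows one way to keep the prefix attached; the helper names are hypothetical, and it assumes terminal groups such as H- or -NH2 have already been removed.

```python
def split_residues(core: str) -> list:
    """Split 'His-D-Trp-Ala' into ['His', 'D-Trp', 'Ala'], keeping D- prefixes attached."""
    residues = []
    pending_d = False
    for part in core.split("-"):
        if part in ("D", "d"):
            pending_d = True  # the marker applies to the next residue
            continue
        residues.append("D-" + part if pending_d else part)
        pending_d = False
    return residues


def d_positions(residues: list) -> list:
    """1-based positions of the D-amino acids."""
    return [i for i, res in enumerate(residues, start=1) if res.startswith("D-")]


ghrp6_core = split_residues("His-D-Trp-Ala-Trp-D-Phe-Lys")
print(ghrp6_core)               # ['His', 'D-Trp', 'Ala', 'Trp', 'D-Phe', 'Lys']
print(d_positions(ghrp6_core))  # [2, 5]
```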
Non-Standard Amino Acids
Non-proteinogenic amino acids that lack standard codes are written out or given specific abbreviations:
- Aib: 2-aminoisobutyric acid (alpha-methylalanine)
- Orn: Ornithine
- Nle: Norleucine
- Sar: Sarcosine (N-methylglycine)
- Dab: Diaminobutyric acid
- Hyp: Hydroxyproline
Side-Chain Modifications
Modifications to amino acid side chains are indicated in parentheses or brackets after the affected residue:
- Lys(Ac): Lysine with an acetylated side chain
- Cys(Acm): Cysteine with an acetamidomethyl protecting group
- Ser(PO3H2): Phosphoserine
- Tyr(SO3H): Sulfotyrosine
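A residue token with an optional parenthesized side-chain group can be picked apart with a regular expression. This is a minimal sketch for the parenthesis form only (bracketed or nested groups are not handled), and `parse_residue` is a hypothetical helper.

```python
import re

# Three-letter-style residue code, optionally followed by "(modification)".
TOKEN = re.compile(r"^(?P<residue>[A-Z][a-z]{2})(?:\((?P<mod>[^()]+)\))?$")


def parse_residue(token: str):
    """Return (residue, modification); modification is None for a plain residue."""
    match = TOKEN.match(token)
    if match is None:
        raise ValueError(f"unrecognized residue token: {token!r}")
    return match.group("residue"), match.group("mod")


print(parse_residue("Lys(Ac)"))     # ('Lys', 'Ac')
print(parse_residue("Ser(PO3H2)"))  # ('Ser', 'PO3H2')
print(parse_residue("Gly"))         # ('Gly', None)
```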
Disulfide Bonds
Disulfide bonds between cysteine residues are indicated by noting which cysteines are connected. For example, in oxytocin (Cys1-Tyr-Ile-Gln-Asn-Cys6-Pro-Leu-Gly-NH2), the disulfide bond between Cys1 and Cys6 is sometimes shown as a line connecting the two residues in structural diagrams, or noted as "Cys1-Cys6 disulfide" in the text.
IUPAC and Systematic Naming {#iupac-and-systematic-naming}
Beyond sequence-based notation, IUPAC provides rules for systematic naming of peptides — names built from chemical nomenclature principles rather than shorthand codes. In practice, systematic names are rarely used for peptides longer than about three residues because they become unwieldy very quickly.
Systematic Name Construction
The systematic name of a peptide is built by:
- Taking each amino acid name in sequence (N-to-C)
- Converting all but the last to the "-yl" form
- Leaving the C-terminal residue as the full amino acid name
Example: The dipeptide Ala-Gly has the systematic name alanylglycine.
A tripeptide Gly-Ala-Phe becomes glycylalanylphenylalanine.
You can see why this gets impractical fast. A 15-residue peptide like BPC-157 would have a systematic name roughly a paragraph long.
Named Peptides and Modifications
For well-known peptides with established trivial names (oxytocin, vasopressin, insulin, etc.), the IUPAC system provides a way to describe modifications:
- Substitution: [Xaa^q]peptidename — replacing residue at position q with amino acid Xaa. Example: [D-Arg8]vasopressin means vasopressin with the arginine at position 8 replaced by D-arginine.
- Deletion: des-q-aminoacid-peptidename — removing a specific residue. Example: des-Gly10-GnRH is GnRH with the glycine at position 10 removed.
- Extension: Adding residues is described with specific notation for the N- or C-terminal additions.
This bracket notation is commonly used in the endocrinology and pharmacology literature for describing analogs of well-characterized hormones.
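Substitution names of the form [Xaa^q]peptidename are regular enough to parse. The sketch below handles a single substitution written inline as in the example above (residue code, then position, then the parent name); the pattern and `parse_substitution` are hypothetical and do not cover multiple substitutions or deletions.

```python
import re

# "[", optional D- prefix, three-letter-style code, position number, "]", parent name.
SUBSTITUTION = re.compile(
    r"^\[(?P<residue>(?:D-)?[A-Z][a-z]{2})(?P<position>\d+)\](?P<parent>.+)$"
)


def parse_substitution(name: str) -> dict:
    """Split '[D-Arg8]vasopressin' into residue, position, and parent peptide."""
    match = SUBSTITUTION.match(name)
    if match is None:
        raise ValueError(f"not a single-substitution analog name: {name!r}")
    return {
        "residue": match.group("residue"),
        "position": int(match.group("position")),
        "parent": match.group("parent"),
    }


print(parse_substitution("[D-Arg8]vasopressin"))
# {'residue': 'D-Arg', 'position': 8, 'parent': 'vasopressin'}
```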
Common Abbreviations and Special Cases {#common-abbreviations-and-special-cases}
The peptide literature is full of abbreviations that go beyond simple sequence codes. Here are the most frequently encountered:
Peptide Family Abbreviations
| Abbreviation | Full Name | Notes |
|---|---|---|
| GLP-1 | Glucagon-like peptide-1 | Incretin hormone; basis for semaglutide |
| GnRH | Gonadotropin-releasing hormone | Also called LHRH |
| GHRH | Growth hormone-releasing hormone | Basis for CJC-1295 |
| GHRP | Growth hormone-releasing peptide | Family including GHRP-6, GHRP-2 |
| ACTH | Adrenocorticotropic hormone | 39-residue pituitary peptide |
| MSH | Melanocyte-stimulating hormone | Alpha, beta, and gamma forms |
| VIP | Vasoactive intestinal peptide | 28-residue neuropeptide |
| CRH/CRF | Corticotropin-releasing hormone/factor | 41-residue hypothalamic peptide |
| ANP/BNP | Atrial/Brain natriuretic peptide | Cardiac biomarkers |
| AMP | Antimicrobial peptide | Broad class; see defensins overview |
Code-Named Research Peptides
Many research peptides are known by alphanumeric designations rather than sequences:
- BPC-157: Body Protection Compound-157 (a pentadecapeptide with a specific sequence)
- TB-500: A synthetic fragment of thymosin beta-4
- AOD-9604: Anti-Obesity Drug 9604 (a fragment of human growth hormone)
- PT-141: Bremelanotide (a melanocortin receptor agonist)
- MK-677: Ibutamoren (technically a peptidomimetic, not a peptide)
- GHK-Cu: Glycyl-histidyl-lysine with a copper ion (the copper peptide)
These code names are informal but widely used. The underlying sequences are defined in the literature and should be cited when precision matters.
Synthesis-Related Terms
| Term | Meaning |
|---|---|
| SPPS | Solid-phase peptide synthesis |
| Fmoc | Fluorenylmethyloxycarbonyl (protecting group strategy) |
| Boc | tert-Butyloxycarbonyl (older protecting group strategy) |
| TFA | Trifluoroacetic acid (cleavage reagent and counterion) |
| Resin | Solid support used in SPPS |
Practical Examples: Decoding Real Peptide Names {#practical-examples}
Let us work through several real examples, applying the rules above.
Example 1: Oxytocin
Notation: H-Cys-Tyr-Ile-Gln-Asn-Cys-Pro-Leu-Gly-NH2 (disulfide: Cys1-Cys6)
Reading this: Nine amino acids, N-to-C. The H- means a free N-terminus. The -NH2 means an amidated C-terminus. Cys1 and Cys6 form a disulfide bond, creating a six-residue ring with a three-residue tail. This is oxytocin — a cyclic nonapeptide.
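Each element of the notation corresponds to a change in molecular mass, which is how a notation is usually checked against a mass spectrum. The sketch below is a worked illustration, not a validated calculator: it uses standard monoisotopic residue masses rounded to five decimals, includes only the residues oxytocin needs, and the function and constant names are hypothetical.

```python
# Monoisotopic residue masses in Da (amino acid minus water), rounded to 5 decimals.
RESIDUE_MASS = {
    "Cys": 103.00919, "Tyr": 163.06333, "Ile": 113.08406, "Gln": 128.05858,
    "Asn": 114.04293, "Pro": 97.05276, "Leu": 113.08406, "Gly": 57.02146,
}
WATER = 18.01056      # H- and -OH on the free termini
AMIDATION = -0.98402  # -OH replaced by -NH2 at the C-terminus
DISULFIDE = -2.01565  # two hydrogens lost per disulfide bond


def monoisotopic_mass(residues, amidated=False, disulfides=0):
    """Sum residue masses, add terminal water, then apply the modifications."""
    mass = sum(RESIDUE_MASS[res] for res in residues) + WATER
    if amidated:
        mass += AMIDATION
    return mass + disulfides * DISULFIDE


oxytocin = "Cys-Tyr-Ile-Gln-Asn-Cys-Pro-Leu-Gly".split("-")
print(len(oxytocin))  # 9
print(f"{monoisotopic_mass(oxytocin, amidated=True, disulfides=1):.2f}")  # 1006.44
```

Dropping either the amidation or the disulfide shifts the result by about 1 or 2 Da respectively, so an omitted -NH2 or disulfide annotation shows up as a mass mismatch.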
Example 2: Desmopressin
Notation: Mpa-Tyr-Phe-Gln-Asn-Cys-Pro-D-Arg-Gly-NH2
Reading this: "Mpa" is 3-mercaptopropionic acid (deaminated cysteine) at position 1. D-Arg at position 8 means D-arginine replaces the natural L-arginine. This is a vasopressin analog with two modifications — deamination at position 1 and D-amino acid substitution at position 8 — designed to increase potency and duration of action.
Example 3: GHRP-6
Notation: H-His-D-Trp-Ala-Trp-D-Phe-Lys-NH2
Reading this: A hexapeptide (six residues). Two D-amino acids (D-Trp at position 2 and D-Phe at position 5) provide protease resistance. Amidated C-terminus. This is growth hormone-releasing peptide-6.
Example 4: Semaglutide (partial)
Key modifications: Aib8 (the second residue of the chain; positions follow GLP-1 numbering, in which the peptide starts at residue 7) and Lys26 acylated on its side-chain (N-epsilon) amino group with a C18 fatty diacid, attached through a gamma-glutamyl spacer and two short PEG-like 8-amino-3,6-dioxaoctanoic acid units. The fully systematic name of this side chain runs to several lines of nested brackets.
Reading this (simplified): The second residue has Aib instead of alanine (DPP-4 resistance). The lysine at position 26 carries a fatty diacid chain through a PEG-gamma-glutamic acid linker (for albumin binding). In practice, most papers simplify this to "semaglutide" and reference the full structural description in the methods section, which is exactly what the naming system allows.
Example 5: RGD Peptide
Notation: Arg-Gly-Asp or RGD (one-letter code)
Reading this: A simple tripeptide of arginine, glycine, and aspartate. The RGD motif is one of the most studied sequences in biology — it is the cell-binding domain recognized by integrin receptors, found in fibronectin and other extracellular matrix proteins. When researchers say "RGD peptide," they mean any peptide containing this recognition sequence.
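In one-letter code, checking whether a peptide contains a recognition sequence is a substring search. A minimal sketch with a hypothetical helper that reports 1-based start positions:

```python
def motif_positions(sequence: str, motif: str = "RGD") -> list:
    """Return the 1-based start position of every occurrence of the motif."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = sequence.find(motif, start + 1)
    return positions


print(motif_positions("GRGDS"))            # [2]
print(motif_positions("GEPPPGKPADDAGLV"))  # []
```

This only checks for the sequence itself; whether a given RGD-containing peptide actually binds an integrin depends on its structural context.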
FAQ {#faq}
What is the difference between one-letter and three-letter amino acid codes?
They represent the same amino acids in different formats. Three-letter codes (Ala, Gly, Pro) are more readable for short sequences and are standard in most peptide product descriptions. One-letter codes (A, G, P) are more compact and preferred for long sequences, database entries, and bioinformatics. IUPAC recommends restricting one-letter codes to comparisons of long sequences.
Why is the sequence always written N-to-C?
Convention and biology. Ribosomes synthesize proteins from the N-terminus to the C-terminus, and the scientific community adopted this direction for notation. Writing in the opposite direction would describe a completely different molecule — reversing a peptide sequence changes its structure and function.
What does "D-" mean before an amino acid?
It indicates the D-enantiomer — the mirror image of the natural L-amino acid. Most biological amino acids are L-form. D-amino acids are used in synthetic peptides to increase resistance to enzymatic degradation, since proteases have evolved to recognize L-amino acid substrates. D-amino acid substitution is one of the simplest strategies for improving peptide stability.
How do I interpret modification notation like Ac- or -NH2?
Ac- at the beginning means the N-terminus is acetylated (capped with an acetyl group). -NH2 at the end means the C-terminus is amidated (the carboxyl group is replaced by an amide). Both modifications protect against exopeptidases and are commonly found in bioactive peptides. Other common modifications are listed in the modification notation section above.
What is Xaa or X in a peptide sequence?
Xaa (three-letter) or X (one-letter) represents any amino acid or an unknown amino acid. It is used when the identity of a residue is uncertain or when describing a peptide family motif where that position can be variable.
How are cyclic peptides notated?
Cyclic peptides require additional notation to indicate the type of cyclization. Head-to-tail cyclization (N-terminus bonded to C-terminus) is indicated by "cyclo-" prefix: cyclo(Arg-Gly-Asp-D-Phe-Val). Side-chain cyclization through disulfide bonds is noted separately, as in the oxytocin example above.
The Bottom Line {#the-bottom-line}
Peptide notation is a language, and like any language, it becomes intuitive with practice. The core rules are simple: amino acids have standard three-letter and one-letter codes, sequences read left to right from N-terminus to C-terminus, and modifications are indicated by specific prefixes, suffixes, and bracket notations.
For everyday use — reading product labels, understanding research papers, or discussing peptides with a clinician — knowing the 20 standard amino acid codes and the basic modification symbols covers most situations. The more specialized notation (IUPAC systematic names, complex modification descriptors, ambiguity codes) matters mainly for synthesis chemists and database curators.
The system exists to eliminate ambiguity. When a researcher writes Ac-D-Trp-Ala-Trp-D-Phe-Lys-NH2, there is exactly one molecule that description can mean. That precision is what makes peptide science reproducible across laboratories, languages, and decades. For a broader foundation in peptide science, see our complete beginner's guide and mechanisms of action overview.
References {#references}
- IUPAC-IUB Joint Commission on Biochemical Nomenclature. (1984). Nomenclature and symbolism for amino acids and peptides. Pure and Applied Chemistry, 56(5), 595-624.
- IUPAC-IUB Commission on Biochemical Nomenclature. (1984). Nomenclature and symbolism for amino acids and peptides. Biochemical Journal, 219, 345-373.
- Markus, G. (2024). Peptide nomenclature: Reference for naming and abbreviations. BioLongevity Labs.
- Tocris Bioscience. Peptide nomenclature guide.
- IUPAC codes for amino acids. Bioinformatics.org.
- IUPAC-IUB Recommendations 1998: Nomenclature of peptide modifications. BMRB.