Editing the description - Molecular sequence symbols

The editor for sequence data recognizes predefined symbols for nucleotide and protein sequences according to the IUPAC definitions.

 

Nucleic acid symbols

Symbol Name
A Adenine
C Cytosine
G Guanine
T Thymine
U Uracil
W Weak (A or T) 
S Strong (G or C)
M aMino (A or C)
K Keto (G or T)
R puRine (G or A)
Y pYrimidine (C or T) 
B not A (B comes after A) 
D not C (D comes after C)
H not G (H comes after G)
V not T (V comes after T and U)
N No idea (not a gap)

The symbols with grey background (W to N in the table above) are ambiguity symbols. The difference between "N" and a gap symbol (usually "-", but any other symbol may be defined in the descriptor) is that a gap symbol represents an unspecified number of unknown symbols, whereas "N" stands for exactly one nucleotide.
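
The table and the distinction between "N" and a gap can be illustrated with a small sketch. This is not the editor's own implementation; the names IUPAC_NUCLEOTIDES, expand, is_valid_sequence and known_length are hypothetical, "-" is assumed as the gap symbol, and U is treated as equivalent to T for simplicity.

```python
# IUPAC nucleotide symbols mapped to the bases they may represent.
# Assumption for this sketch: U is treated like T.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "W": "AT", "S": "CG", "M": "AC", "K": "GT",
    "R": "AG", "Y": "CT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

GAP = "-"  # assumed gap symbol; the descriptor may define a different one


def expand(symbol):
    """Return the set of bases that a single symbol may stand for."""
    return set(IUPAC_NUCLEOTIDES[symbol.upper()])


def is_valid_sequence(sequence, gap=GAP):
    """Check that every character is an IUPAC symbol or the gap symbol."""
    return all(ch == gap or ch.upper() in IUPAC_NUCLEOTIDES for ch in sequence)


def known_length(sequence, gap=GAP):
    """Count the positions that stand for exactly one nucleotide.

    "N" counts as one position. A gap is skipped because it represents
    an unspecified number of unknown symbols.
    """
    return sum(1 for ch in sequence if ch != gap)
```

With these definitions, expand("R") yields the bases A and G, and known_length("ACN-GT") counts five positions, because "N" contributes exactly one nucleotide while the gap contributes no definite number.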

 

Amino acid symbols

Name 1-letter symbol 3-letter symbol
Alanine A Ala
Arginine R Arg
Asparagine N Asn
Aspartic acid D Asp
Cysteine C Cys
Glutamic acid E Glu
Glutamine Q Gln
Glycine G Gly
Histidine H His
Isoleucine I Ile
Leucine L Leu
Lysine K Lys
Methionine M Met
Phenylalanine F Phe
Proline P Pro
Serine S Ser
Threonine T Thr
Tryptophan W Trp
Tyrosine Y Tyr
Valine V Val
Selenocysteine U Sec
Pyrrolysine O Pyl
Asparagine or aspartic acid B Asx
Glutamine or glutamic acid Z Glx
Leucine or isoleucine J Xle
Unspecified or unknown amino acid X Xaa

The symbols with grey background (B/Asx, Z/Glx, J/Xle and X/Xaa) are ambiguity symbols. The difference between "X" or "Xaa" and a gap symbol (e.g. "---", but any other symbol may be defined in the descriptor) is that a gap symbol represents an unspecified number of unknown symbols, whereas "X" or "Xaa" stands for exactly one amino acid. Selenocysteine and pyrrolysine are non-standard amino acids that occur only in certain species.
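
As an illustration of how the 1-letter and 3-letter notations relate, the following sketch converts between them. It is only an example under stated assumptions, not the editor's own code: the names ONE_TO_THREE, THREE_TO_ONE, to_three_letter and to_one_letter are hypothetical, and the gap symbols "-" and "---" are assumed defaults that a descriptor may override.

```python
# 1-letter to 3-letter amino acid symbols, following the table above.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "U": "Sec", "O": "Pyl",                          # non-standard amino acids
    "B": "Asx", "Z": "Glx", "J": "Xle", "X": "Xaa",  # ambiguity symbols
}
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}


def to_three_letter(sequence, gap="-", gap3="---"):
    """Convert a 1-letter sequence to 3-letter notation."""
    return "".join(
        gap3 if ch == gap else ONE_TO_THREE[ch.upper()] for ch in sequence
    )


def to_one_letter(sequence, gap="-", gap3="---"):
    """Convert a 3-letter sequence (without separators) to 1-letter notation."""
    if len(sequence) % 3 != 0:
        raise ValueError("length of a 3-letter sequence must be a multiple of 3")
    result = []
    for i in range(0, len(sequence), 3):
        chunk = sequence[i:i + 3]
        # Normalize case so that "MET" and "met" are read as "Met".
        result.append(gap if chunk == gap3 else THREE_TO_ONE[chunk.capitalize()])
    return "".join(result)
```

For example, to_three_letter("MX-K") gives "MetXaa---Lys", and to_one_letter reverses the conversion. "X" and "Xaa" each occupy exactly one position, while the gap is passed through unchanged.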

 
