Sequence Alignment

Two-sequence alignment and multiple alignment
Fundamental steps of the procedure leading to the optimal alignment of two sequences

Sequence 1 (n = 29): RVCPKILMECKKDSDCLAECICLEHGYCG
Sequence 2 (m = 32): MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS

The two sequences are slid past each other one position at a time (steps 1 to n + m - 1). At each step the identical positions are counted and expressed as a percentage of the overlapping positions.

Step         Identical positions   Identity
1            0                      0.0%
2            0                      0.0%
3            0                      0.0%
4            1                     25.0%
5            0                      0.0%
n - 1        1                      3.6%
n            18                    62.1%
n + 1        5                     17.2%
n + 2        2                      6.9%
n + m - 3    1                     33.3%
n + m - 2    0                      0.0%
n + m - 1    0                      0.0%

Step n with one gap introduced: 22 identical positions, 73%

RVCPKILMECKKDSDCLAECICLEH-GYCG
MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS
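The sliding comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `slide_identities` and the sign convention of `shift` are assumptions made here, and gaps are not handled.

```python
def slide_identities(seq_a, seq_b):
    """Count identical positions for every ungapped shift of seq_b against seq_a.

    Returns {shift: (identical, overlap)}; at a given shift, seq_a[i] is
    compared with seq_b[i - shift].
    """
    results = {}
    for shift in range(-(len(seq_b) - 1), len(seq_a)):
        pairs = [
            (seq_a[i], seq_b[i - shift])
            for i in range(len(seq_a))
            if 0 <= i - shift < len(seq_b)
        ]
        identical = sum(x == y for x, y in pairs)
        results[shift] = (identical, len(pairs))
    return results


SEQ1 = "RVCPKILMECKKDSDCLAECICLEHGYCG"
SEQ2 = "MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS"

table = slide_identities(SEQ1, SEQ2)
# Pick the shift with the largest number of identical positions.
best_shift = max(table, key=lambda s: table[s][0])
identical, overlap = table[best_shift]
print(best_shift, identical, overlap, f"{100 * identical / overlap:.1f}%")
```

With the two sequences from the slide, shift 0 gives 18 identical positions out of 29, the 62.1% reported for step n. Introducing the gap that raises this to 22 identities requires a gapped alignment method, which this sketch does not attempt.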
Comparison of fragments of the 1st and 2nd domains of chicken ovomucoid using the unitary matrix, the genetic code matrix (GCM), PAM250 and the algorithm of genetic semihomology

1) GTTAATTGCAGCCTGTATGCCAGCGGCATCGGCAAGGATGGGACGAGTTGGGTAGCC
   V  N  C  S  L  Y  A  S  G  I  G  K  D  G  T  S  W  V  A
2) ATTGATTGCTCTCCGTACCTCCAA---GTTGTAAGAGATGGTAACACCATGGTAGCC
   I  D  C  S  P  Y  L  Q  -  V  V  R  D  G  N  T  M  V  A

Per-position scores, total score and % score:

UNITARY MATRIX         0 0 1 1 0 1 0 0 0 0 0 0 1 1 0 0 0 1 1    7/19   36.8
GENETIC CODE MATRIX    2 2 3 0 2 2 1 0 0 1 1 1 3 2 1 1 1 3 3    29/57  50.9
PAM250 SCORING         1 1 2 2 0 2 0 0 0 1 0 1 2 2 1 1 0 2 2    42/97  43.3    42/89  47.2    20/38  52.6
GENETIC SEMIHOMOLOGY   2 2 3 3 2 3 0 0 0 2 1 2 3 3 1 1 0 3 3    34/57  59.6
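The genetic code matrix row above can be reproduced by counting identical nucleotides in each pair of aligned codons. The sketch below assumes that this is how the GCM score is defined and that a gap scores 0; both assumptions are inferred from the numbers on the slide, and the function names are made up for illustration.

```python
# DNA fragments from the slide; the second one carries a one-codon gap.
DNA1 = "GTTAATTGCAGCCTGTATGCCAGCGGCATCGGCAAGGATGGG" + "ACGAGTTGGGTAGCC"
DNA2 = "ATTGATTGCTCTCCGTACCTCCAA" + "---" + "GTTGTAAGAGATGGTAACACCATGGTAGCC"


def codons(dna):
    """Split a nucleotide string into consecutive triplets."""
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def gcm_scores(dna_a, dna_b):
    """Score each aligned codon pair by its number of identical nucleotides.

    Assumption: a codon aligned with a gap scores 0.
    """
    scores = []
    for c1, c2 in zip(codons(dna_a), codons(dna_b)):
        if "-" in c1 or "-" in c2:
            scores.append(0)
        else:
            scores.append(sum(a == b for a, b in zip(c1, c2)))
    return scores


scores = gcm_scores(DNA1, DNA2)
maximum = 3 * len(scores)  # three nucleotides per aligned position
print(scores)
print(f"{sum(scores)}/{maximum}", round(100 * sum(scores) / maximum, 1))
```

Under these assumptions the total is 29 out of 57 (50.9%), matching the slide.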
What is important in the protein similarity search?

1) Contribution (%) of identical positions

PKILMECKKD
PKILMKCKHD    8 identical positions, 80%: similar

PKILMECKKD
SDCLLDCVCL    2 identical positions, 20%: not similar

2) Length of the compared strings (sequences)

LCE
WCG    1 identical position, 33.3%: coincidental

MVEICIEPKIRCIKVCTKDERITCLILDET
MVYWCPRRFMHCVHLKAGGCTCWCLRLDYY    8 identical positions, 26%: probably similar

3) Distribution of the identical positions along the analyzed sequence

MVEMICIEPKIRCIKVCTKDERITL
HVYYWRPERFMHTVKLKAGGCRCWL    5 identical positions, 20%: coincidental

MVEMIMAGDARCIKVCTKDERITCL
HHYYWMAGDAHTVQLKAGGCWCWAG    5 identical positions, 20%: similar

4) Residues at conserved positions

MVCPKILMKCKHDSDCLLDCVCLED
EDEGKRRTKREHFKESNLAAAFKEQ    not similar

MVCPKILMKCKHDSDTLLDCVCLED
QNCPGPREWCFTTRMNDSSCACPQT    similar

5) Structural/genetic similarity of the amino acids at non-conserved positions

MVCPKILMKCKHDSDCLLDCVCLED
RLCRRLVKRCRKETECIVECICIDE    shown three times on the slide: compared by identity only, by structural similarity, and by genetic similarity
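Criterion 3 can be made concrete with a small sketch that reports, for two equal-length sequences, both the number of identical positions and the longest uninterrupted run of them. The function name `identity_profile` and the choice of "longest run" as a measure of distribution are assumptions introduced here for illustration, not something stated on the slides.

```python
def identity_profile(seq_a, seq_b):
    """Return (identical positions, longest run of consecutive identities)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    longest = current = total = 0
    for x, y in zip(seq_a, seq_b):
        if x == y:
            total += 1
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a mismatch ends the current run
    return total, longest


scattered = identity_profile("MVEMICIEPKIRCIKVCTKDERITL",
                             "HVYYWRPERFMHTVKLKAGGCRCWL")
clustered = identity_profile("MVEMIMAGDARCIKVCTKDERITCL",
                             "HHYYWMAGDAHTVQLKAGGCWCWAG")
print(scattered, clustered)
```

Both pairs from the slide share 5 identical positions out of 25 (20%), but in the second pair they form one contiguous block (MAGDA), which is why the slide treats the first match as coincidental and the second as similar.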
The sequence identity estimation procedure

The probability that an identity match of at least the declared size a occurs at random is:

$$P_a^n = \frac{\sum_{k=a}^{n} \binom{n}{k}\, x^{k}\, \bigl(x(x-1)\bigr)^{n-k}}{x^{2n}}$$

Where:
x: the number of unit types in the sequence (20 for proteins; 4 for nucleic acids)
n: the sequence length (the number of compared position pairs)
a: the number of identical positions
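The probability defined above is a binomial tail with a per-position match probability of 1/x, so it can be evaluated exactly. The following is a minimal sketch; the function name `random_identity_probability` is hypothetical, and exact fractions are used only to avoid rounding in the large powers.

```python
from fractions import Fraction
from math import comb


def random_identity_probability(a, n, x=20):
    """Probability of at least `a` identical positions among `n` compared
    pairs arising at random, for an alphabet of `x` unit types."""
    numerator = sum(
        comb(n, k) * x**k * (x * (x - 1)) ** (n - k)
        for k in range(a, n + 1)
    )
    return Fraction(numerator, x ** (2 * n))


# 1 identity in 3 positions (the LCE / WCG example) is easy to get by chance.
print(float(random_identity_probability(1, 3)))
# 8 identities in 10 positions (PKILMECKKD / PKILMKCKHD) is not.
print(float(random_identity_probability(8, 10)))
```

For the three-residue example the result is about 0.14, which illustrates why a 33.3% identity over such a short stretch carries little weight.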
Genetic conditioning of the amino acid replacement probabilities and spectrum in molecular evolution
Do amino acids have a pedigree?

or...
Do they carry information about their history (genealogy)?

or...
Can amino acid mutational replacements be described as Markovian processes?
The Markov model assumes that the substitution probability of amino acid AA1 by AA2 is the same, regardless of what the initial residue AA1 was transformed from (AAx, AAy)
AAx -> AA1 -> AA2 (probability Pa)
AAy -> AA1 -> AA2 (probability Pb)
Pa = Pb
The statistical algorithms currently in use are based on a Markovian model of amino acid replacement (they directly use stochastic matrices of replacement frequency indices)
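In conditional-probability notation, the Markov assumption stated above can be written as follows. The symbol X_t, denoting the residue occupying a position after t replacement steps, is introduced here for illustration and does not appear on the slides.

```latex
P\bigl(X_{t+1} = \mathrm{AA}_2 \mid X_t = \mathrm{AA}_1,\ X_{t-1} = \mathrm{AA}_x\bigr)
  = P\bigl(X_{t+1} = \mathrm{AA}_2 \mid X_t = \mathrm{AA}_1\bigr)
  \quad \text{for every predecessor } \mathrm{AA}_x
```

Setting the predecessor to AAx and to AAy gives the two probabilities Pa and Pb, so the assumption is exactly Pa = Pb.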
PAM250 matrix of amino acid replacements

     C   S   T   P   A   G   N   D   E   Q   H   R   K   M   I   L   V   F   Y   W
C   12   0  -2  -3  -2  -3  -4  -5  -5  -5  -3  -4  -5  -5  -2  -6  -2  -4   0  -8
S        2   1   1   1   1   1   0   0  -1  -1   0   0  -2  -1  -3  -1  -3  -3  -2
T            3   0   1   0   0   0   0  -1  -1  -1   0  -1   0  -2   0  -3  -3  -5
P                6   1  -1  -1  -1  -1   0   0   0  -1  -2  -2  -3  -1  -5  -5  -6
A                    2   1   0   0   0   0  -1  -2  -1  -1  -1  -2   0  -5  -3  -6
G                        5   0   1   0  -1  -2  -3  -2  -3  -3  -4  -1  -5  -5  -7
N                            2   2   1   1   2   0   1  -2  -2  -3  -2  -4  -2  -4
D                                4   3   2   1  -1   0  -3  -2  -4  -2  -6  -4  -7
E                                    4   2   1  -1   0  -2  -2  -3  -2  -5  -4  -7
Q                                        4   3   1   1  -1  -2  -2  -2  -5  -4  -5
H                                            6   2   0  -2  -2  -2  -2  -2   0  -3
R                                                6   3   0  -2  -3  -2  -4  -4   2
K                                                    5   0  -2  -3  -2  -5  -4  -3
M                                                        6   2   4   2   0  -2  -4
I                                                            5   2   4   1  -1  -5
L                                                                6   2   2  -1  -2
V                                                                    4  -1  -2  -6
F                                                                        9   7   0
Y                                                                           10   0
W                                                                               17

Why is tryptophan the most conserved residue here?
BLOSUM62 matrix of amino acid replacements

     A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
A    4  -1  -2  -2   0  -1  -1   0  -2  -1  -1  -1  -1  -2  -1   1   0  -3  -2   0
R        5   0  -2  -3   1   0  -2   0  -3  -2   2  -1  -3  -2  -1  -1  -3  -2  -3
N            6   1  -3   0   0   0   1  -3  -3   0  -2  -3  -2   1   0  -4  -2  -3
D                6  -3   0   2  -1  -1  -3  -4  -1  -3  -3  -1   0  -1  -4  -3  -3
C                    9  -3  -4  -3  -3  -1  -1  -3  -1  -2  -3  -1  -1  -2  -2  -1
Q                        5   2  -2   0  -3  -2   1   0  -3  -1   0  -1  -2  -1  -2
E                            5  -2   0  -3  -3   1  -2  -3  -1   0  -1  -3  -2  -2
G                                6  -2  -4  -4  -2  -3  -3  -2   0  -2  -2  -3  -3
H                                    8  -3  -3  -1  -2  -1  -2  -1  -2  -2   2  -3
I                                        4   2  -3   1   0  -3  -2  -1  -3  -1   3
L                                            4  -2   2   0  -3  -2  -1  -2  -1   1
K                                                5  -1  -3  -1   0  -1  -3  -2  -2
M                                                    5   0  -2  -1  -1  -1  -1   1
F                                                        6  -4  -2  -2   1   3  -1
P                                                            7  -1  -1  -4  -3  -2
S                                                                4   1  -3  -2  -2
T                                                                    5  -2  -2   0
W                                                                       11   2  -3
Y                                                                            7  -1
V                                                                                4
Replacement Arg ↔ Lys according to the statistical interpretation using stochastic matrix indices

[Figure: Arg/Lys replacement scores in the matrices PAM250, BLOSUM62, BLOSUM35, BLOSUM45 and BLOSUM100; values shown: 3, 2, 2, 3, 3]
Diagram of genetic relationships between amino acids

[Figure: diagram of amino acid genetic relationships; legend: nucleotides A, G, C, U at codon positions 1, 2, 3]
Diagram of codon genetic relationships
Diagram of amino acid genetic relationships

[Figure: diagram of codon genetic relationships; legend: nucleotides A, G, C, U at codon positions 1, 2, 3. Codons shown:]

K AAA   K AAG   N AAC   N AAU
R AGA   R AGG   S AGC   S AGU
T ACA   T ACG   T ACC   T ACU
I AUA   M AUG   I AUC   I AUU
V GUA   V GUG   V GUC   V GUU
A GCA   A GCG   A GCC   A GCU
L CUA   L CUG   L CUC   L CUU
G GGA   G GGG   G GGC   G GGU
P CCA   P CCG   P CCC   P CCU
L UUA   L UUG   F UUC   F UUU
E GAA   E GAG   D GAC   D GAU
R CGA   R CGG   R CGC   R CGU
S UCA   S UCG   S UCC   S UCU
Q CAA   Q CAG   H CAC   H CAU
  UGA   W UGG   C UGC   C UGU
  UAA     UAG   Y UAC   Y UAU
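The relationships drawn in the codon diagram can be explored programmatically by listing, for any codon, the nine codons that differ from it at exactly one position. The sketch below uses the standard genetic code in the RNA alphabet; the names `CODON_TABLE` and `single_point_neighbours` are chosen here for illustration.

```python
from itertools import product

# Standard genetic code, bases ordered U, C, A, G at each position; "*" = stop.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO)
}


def single_point_neighbours(codon):
    """Map each codon reachable by one nucleotide substitution to its residue."""
    neighbours = {}
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                neighbours[mutant] = CODON_TABLE[mutant]
    return neighbours


for arg_codon in ("AGG", "CGC"):
    reachable = sorted(set(single_point_neighbours(arg_codon).values()))
    print(arg_codon, reachable)
```

Both AGG and CGC encode arginine, yet only AGG has lysine (AAG) among its single-step neighbours, which is the kind of codon-dependent difference the diagrams are meant to show.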
Arginine-to-lysine mutational conversion pathways for arginines of different origin

[Figure: pathway diagram. Nodes shown: Met (AUG), Arg (AGG), Lys (AAG), His (CAC), Asn (AAC), Pro (CCC), Arg (CGC), Ser (AGC), Arg (AGG), Lys (AAG), Arg (CGG), Gln (CAG)]
Possible single-point-mutational processing of serine with respect to its origin

Trp (UGG) -> Ser (UCG) -> Thr, Ser, Ala, Trp, stop (UAG), Pro, Leu
Asn (AAU) -> Ser (AGU) -> Thr, Ser, Ile, Arg, Gly, Asn, Cys
Amino acid mutational substitution based on a single transition/transversion is NOT a Markovian process

Theoretical proof
The conversion pathways of arginine into lysine, glutamine and serine, for arginine resulting from the processing of codons that encode different amino acids
Possible codons for arginine: AGA AGG CGA CGG CGC CGT
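Conversion of arginine into lysine

[Figure: pathway diagram (R = A or G, Y = C or T). Nodes shown: Met (ATG), Arg (AGG), Lys (AAG), Leu (CTR), Arg (CGR), Gln (CAR), Lys (AAR), Arg (AGR), Ser (AGY), His (CAY), Arg (CGY), Arg (AGR), Lys (AAR), Arg (CGR)]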
Conversion of arginine into serine

[Figure: pathway diagram (R = A or G, Y = C or T). Nodes shown: Met (ATG), Arg (AGG), Ser (AGY), Leu (CTR), Arg (CGR), Arg (AGR), Ser (AGY), Arg (CGY), His (CAY), Arg (CGY), Ser (AGY)]
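Conversion of arginine into glutamine

[Figure: pathway diagram (R = A or G, Y = C or T). Nodes shown: Met (ATG), Arg (AGG), Lys (AAG), Gln (CAG), Arg (CGG), Leu (CTR), Arg (CGR), Gln (CAR), His (CAY), Arg (CGY), His (CAY), Gln (CAR), Arg (CGR)]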
then...
The probability of replacing one amino acid with another depends significantly on which amino acids occupied that position in the past
There is a high risk that the commonly used algorithms applying stochastic data matrices (MDM, PAM, BLOSUM) lead to wrong interpretations of the mutational processes occurring in proteins
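The dependence on codon origin can be checked numerically. The sketch below computes, for each arginine codon, the smallest number of single-nucleotide substitutions needed to reach any codon of a target amino acid, taken here as the Hamming distance to the nearest target codon. It uses the standard genetic code in the DNA alphabet; the function names are hypothetical, and the sketch ignores whether intermediate codons are biologically viable.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each position; "*" = stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO)
}


def codons_for(amino_acid):
    """All codons encoding the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def min_point_mutations(codon, target_aa):
    """Fewest single-nucleotide changes turning `codon` into any target codon."""
    return min(
        sum(x != y for x, y in zip(codon, target))
        for target in codons_for(target_aa)
    )


for target in "KQS":  # lysine, glutamine, serine
    steps = {codon: min_point_mutations(codon, target) for codon in codons_for("R")}
    print(target, steps)
```

Under these assumptions AGA and AGG are one substitution away from lysine, CGA and CGG two, and CGC and CGT three, so the six arginine codons are clearly not equivalent as starting points.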
Genetic relationships between Arg and Met/Gln

[Figure: diagram of amino acid genetic relationships; legend: nucleotides A, G, C, U at codon positions 1, 2, 3]
Arg-Met and Arg-Gln substitutions. Two kinds of arginine

Squash-family (Cucurbitaceae) inhibitors

1. RVMIG  2. RVMIGS  3. C  4. P  5. RKL  6. I  7. [LW][Y]  8. MNK  9. REKQP  10. C
11. KSQT  12. [KSRHQTY][V]  13. DN  14. RSDA  15. D  16. C  17. LFMP  18. ALTPGR  19. DEGQK  20. C
21. VITKR  22. C  23. LKGQMV  24. PKEQRSA  25. NHEQDS  26. [I][D]  27. GE  28. YFIH  29. C  30. G
* *

Bowman-Birk type inhibitors

47. C  48. C  49. DRBSN  50. QHELZRSIFTK #  51. C  52. ASTKEMILRDVPF *  53. C  54. T  55. [KR][A]  56. S
57. NMIEKRDQ *#  58. P  59. P  60. QKZETI  61. C  62. [RHQS][V] #  63. C  64. STNVAEHR  65. DZBN  66. MILVTR *
67. R  68. L  69. NDE  70. SKTR  71. C  72. H  73. S  74. A  75. C  76. KSDEN  77. SLGRTFH  78. C
79. IAVLM  80. C  81. ATNR  82. LYFRK  83. S  84. YIEFMQDN  85. P  86. AGP  87. QKZM  88. C
89. FVRIHSQ  90. C  91. VTBGLAYF  92. DB  93. [IMTV][Q]  94. TNBKAHD  95. DBNKT  96. FSY  97. C  98. [YH][T]
99. EAKPD  100. PSAK  101. C

Ovomucoid domains (Kazal type)

1. VILE  2. NDH  3. C  4. [STR][D]  5. LPKQE  6. YF  7. ALPKQ  8. SQTK  9. GTRS  10. IVKNT #
11. GVSTL  12. KRTQ  13. DGN  14. G  15. TNRKE  16. STLAQP  17. WMLIV  18. VTI  19. [A][R]  20. C
21. PT  22. [RM][F]  23. [NI][E]  24. [L][Y]  25. KSLQDV  26. [P][E]  27. [V][H]  28. C  29. GA  30. TS
31. DN  32. GS  33. SFV  34. T  35. Y  36. SDA  37. NS  38. [ED][R]  39. C  40. GSTF
41. ILF  42. C  43. [L][A][N]  44. [YH][A] #  45. NY  46. RAILV  47. EQ  48. HQLS  49. GHRN  50. ATR
51. [NHST][E]  52. VIL  53. ESKAGN  54. [K][L] *  55. ELKSRV  56. [YHS][K]  57. [DN][M]  58. GA  59. EKRA  60. C
61. RKE  62. PLQE  63. KERD  64. [ISV][H]  65. [VG][PT]  66. [MEK][PS]
PAM250 and BLOSUM62 scores for the replacements Arg-Lys, Lys-Gln, Arg-Gln, Lys-Glu and Arg-Glu

Replacement   Arg/Lys   Lys/Gln   Arg/Gln   Lys/Glu   Arg/Glu
PAM250           3         1         1         0        -1
BLOSUM62         2         1         1         1         0
Genetic relationships among Arg, Lys, Glu and Gln

[Figure: diagram of amino acid genetic relationships; legend: nucleotides A, G, C, U at codon positions 1, 2, 3]
Arg-Glu and Lys-Glu substitutions (Arg/Lys/Gln/Glu replacements)

Squash-family (Cucurbitaceae) inhibitors

1. RVMIG  2. RVMIGS  3. C  4. P  5. RKL  6. I  7. [LW][Y]  8. MNK  9. REKQP  10. C
11. KSQT  12. [KSRHQTY][V]  13. DN  14. RSDA  15. D  16. C  17. LFMP  18. ALTPGR  19. DEGQK  20. C
21. VITKR  22. C  23. LKGQMV  24. PKEQRSA  25. NHEQDS  26. [I][D]  27. GE  28. YFIH  29. C  30. G

Bowman-Birk type inhibitors

47. C  48. C  49. DRBSN  50. QHELZRSIFTK  51. C  52. ASTKEMILRDVPF  53. C  54. T  55. [KR][A]  56. S
57. NMIEKRDQ  58. P  59. P  60. QKZETI  61. C  62. [RHQS][V]  63. C  64. STNVAEHR !  65. DZBN  66. MILVTR
67. R  68. L  69. NDE  70. SKTR  71. C  72. H  73. S  74. A  75. C  76. KSDEN  77. SLGRTFH  78. C
79. IAVLM  80. C  81. ATNR  82. LYFRK  83. S  84. YIEFMQDN  85. P  86. AGP  87. QKZM  88. C
89. FVRIHSQ  90. C  91. VTBGLAYF  92. DB  93. [IMTV][Q]  94. TNBKAHD  95. DBNKT  96. FSY  97. C  98. [YH][T]
99. EAKPD  100. PSAK  101. C

Ovomucoid domains (Kazal type)

1. VILE  2. NDH  3. C  4. [STR][D]  5. LPKQE  6. YF  7. ALPKQ  8. SQTK  9. GTRS  10. IVKNT
11. GVSTL  12. KRTQ  13. DGN  14. G  15. TNRKE  16. STLAQP  17. WMLIV  18. VTI  19. [A][R]  20. C
21. PT  22. [RM][F]  23. [NI][E]  24. [L][Y]  25. KSLQDV  26. [P][E]  27. [V][H]  28. C  29. GA  30. TS
31. DN  32. GS  33. SFV  34. T  35. Y  36. SDA  37. NS  38. [ED][R]  39. C  40. GSTF
41. ILF  42. C  43. [L][A][N]  44. [YH][A]  45. NY  46. RAILV  47. EQ  48. HQLS  49. GHRN  50. ATR
51. [NHST][E]  52. VIL  53. ESKAGN  54. [K][L]  55. ELKSRV  56. [YHS][K]  57. [DN][M]  58. GA  59. EKRA  60. C
61. RKE  62. PLQE  63. KERD  64. [ISV][H]  65. [VG][PT]  66. [MEK][PS]
What part of the codon contains the information about the previous amino acid that occurred at a given position of the protein sequence?
At most 2/3 of the entire codon.
Ala (GCG) → Val (GUG)
How long is the information about the codons of preceding amino acids stored?
The shortest storage period is 3 transitions/transversions
Ala (GCG) → Val (GUG) → Met (AUG) → Ile (AUA)
Ser (UCC) → Ser (UCU) → Thr (ACU) → Ser (AGU)
Theoretically the longest period is infinite
Lys (AAA) → Asn (AAC) → Asp (GAC) → His (CAC) → Gln (CAG) → Glu (GAG) → Asp (GAU) → Tyr (UAU) → His (CAU) → Asn (AAU) → Lys (AAG) → Gln (CAG) → His (CAC) → ...
CONCLUSIONS
The analysis of genetic semihomology rules out the applicability of the Markov model to studies of protein variability at the amino acid level.
Amino acid codons do contain information about the ancestral amino acids whose codons were the starting point for the codon of the current residue.
This applies mainly to positions undergoing single-point mutations, the most basic mechanism of evolutionary variability.
