Sequence 1 (n residues): RVCPKILMECKKDSDCLAECICLEHGYCG
Sequence 2 (m residues): MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS

Shift        Identical positions   %
1            0                     0.0%
2            0                     0.0%
3            0                     0.0%
4            1                     25.0%
5            0                     0.0%
n - 1        1                     3.6%
n            18                    62.1%
n + 1        5                     17.2%
n + 2        2                     6.9%
n + m - 3    1                     33.3%
n + m - 2    0                     0.0%
n + m - 1    0                     0.0%

Shift n with a gap inserted in sequence 1 (R V C P K I L M E C K K D S D C L A E C I C L E H - G Y C G): 22 identical positions, 73%
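The shift-by-shift identity counts for these two sequences can be reproduced with a short sketch. It assumes that shift s places the last s residues of sequence 1 against the first s residues of sequence 2, so that shift n aligns both sequences from their first residue; the function name `identities_at_shift` is illustrative and does not come from the slides.

```python
SEQ1 = "RVCPKILMECKKDSDCLAECICLEHGYCG"      # n = 29 residues
SEQ2 = "MVCPKILMKCKHDSDCLLDCVCLEDIGYCGVS"   # m = 32 residues


def identities_at_shift(seq1, seq2, shift):
    """Return (identical, overlap) for one ungapped shift of seq1 against seq2.

    Assumed convention: shift runs from 1 to n + m - 1, and shift == n
    aligns the two sequences from their first residues.
    """
    offset = shift - len(seq1)  # seq1[i] is compared with seq2[i + offset]
    identical = 0
    overlap = 0
    for i, residue in enumerate(seq1):
        j = i + offset
        if 0 <= j < len(seq2):
            overlap += 1
            identical += residue == seq2[j]
    return identical, overlap


if __name__ == "__main__":
    n, m = len(SEQ1), len(SEQ2)
    for shift in range(1, n + m):
        same, overlap = identities_at_shift(SEQ1, SEQ2, shift)
        print(f"shift {shift:2d}: {same:2d}/{overlap:2d} = {100 * same / overlap:5.1f}%")
```

The sketch handles ungapped shifts only; the final gapped alignment (22 identities) requires a gap in sequence 1 and is not covered here.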
Comparison of the fragments of the 1st and 2nd domains of chicken ovomucoid using the unitary matrix, the genetic code matrix (GCM), PAM250 and the algorithm of genetic semihomology

1) GTTAATTGCAGCCTGTATGCCAGCGGCATCGGCAAGGATGGG ACGAGTTGGGTAGCC
   V N C S L Y A S G I G K D G T S W V A
2) ATTGATTGCTCTCCGTACCTCCAA GTTGTAAGAGATGGTAACACCATGGTAGCC
   I D C S P Y L Q - V V R D G N T M V A

UNITARY MATRIX
   V N C S L Y A S G I G K D G T S W V A
   I D C S P Y <L Q V V R> D G N T M V A
   0 0 1 1 0 1 0 0 0 0 0 0 1 1 0 0 0 1 1
   Score: 7/19 (36.8%)

GENETIC CODE MATRIX
   GTTAATTGCAGCCTGTATGCCAGCGGCATCGGCAAGGATGGG ACGAGTTGGGTAGCC
   ATTGATTGCTCTCCGTACCTC < CAA > GTTGTAAGAGATGGTAACACCATGGTAGCC
   2 2 3 0 2 2 1 0 0 1 1 1 3 2 1 1 1 3 3
   Score: 29/57 (50.9%)

PAM250 SCORING
   V N C S L Y A S G I G K D G T S W V A
   I D C S P Y L < Q > V V R D G N T M V A
   1 1 2 2 0 2 0 0 0 1 0 1 2 2 1 1 0 2 2

GENETIC SEMIHOMOLOGY
   V N C S L Y A S G I G K D G T S W V A
   I D C S P Y L < Q > V V R D G N T M V A
   2 2 3 3 2 3 0 0 0 2 1 2 3 3 1 1 0 3 3
   Score: 34/57 (59.6%)
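The unitary and genetic code matrix scores for the two ovomucoid fragments can be recomputed with a minimal sketch. It assumes the standard genetic code, a single codon-sized gap in fragment 2 after CAA (the position marked "-" in the amino acid alignment), and a score of 0 for any position facing the gap; the helper names are illustrative only. The PAM250 and genetic semihomology rows are not reproduced, since their per-position rules are not spelled out on the slide.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

GAP = "---"  # assumed gap placement, taken from the amino acid alignment
DOMAIN1 = ("GTT AAT TGC AGC CTG TAT GCC AGC GGC ATC "
           "GGC AAG GAT GGG ACG AGT TGG GTA GCC").split()
DOMAIN2 = ("ATT GAT TGC TCT CCG TAC CTC CAA --- GTT "
           "GTA AGA GAT GGT AAC ACC ATG GTA GCC").split()


def unitary_scores(codons1, codons2):
    """1 where the encoded amino acids are identical, otherwise 0."""
    return [int(GAP not in (a, b) and CODE[a] == CODE[b])
            for a, b in zip(codons1, codons2)]


def gcm_scores(codons1, codons2):
    """Number of identical nucleotide positions per codon pair (0 to 3)."""
    return [0 if GAP in (a, b) else sum(x == y for x, y in zip(a, b))
            for a, b in zip(codons1, codons2)]


if __name__ == "__main__":
    u = unitary_scores(DOMAIN1, DOMAIN2)
    g = gcm_scores(DOMAIN1, DOMAIN2)
    print(f"unitary: {sum(u)}/{len(u)} = {100 * sum(u) / len(u):.1f}%")
    print(f"GCM:     {sum(g)}/{3 * len(g)} = {100 * sum(g) / (3 * len(g)):.1f}%")
```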
The probability that an identity match of at least a positions occurs at random (a is the declared number of identical positions or higher) is:

$$P_a^n = \frac{\sum_{k=a}^{n} \binom{n}{k}\, x^{k}\, \bigl(x(x-1)\bigr)^{n-k}}{x^{2n}}$$

Where:
x  the number of unit types in the sequence (20 for proteins; 4 for nucleic acids)
n  the sequence length (the number of compared position pairs)
a  the number of identical positions
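A small sketch of this probability, written as the sum over k from a to n of C(n, k) x^k (x(x - 1))^(n - k), divided by x^(2n). Each term is the binomial probability of exactly k matches when every position matches with probability 1/x, which holds only if all x unit types are equally likely and positions are independent; that reading is an assumption. The function name `prob_min_identity` is illustrative.

```python
from math import comb


def prob_min_identity(a, n, x=20):
    """Probability of at least `a` identical positions among `n` compared pairs.

    Assumes x equally likely unit types and independent positions
    (x = 20 for proteins, x = 4 for nucleic acids).
    """
    if not 0 <= a <= n:
        raise ValueError("a must satisfy 0 <= a <= n")
    numerator = sum(comb(n, k) * x**k * (x * (x - 1)) ** (n - k)
                    for k in range(a, n + 1))
    return numerator / x ** (2 * n)


if __name__ == "__main__":
    # At least 7 identical residues in 19 compared amino acid positions.
    print(prob_min_identity(7, 19, x=20))
    # At least 29 identical nucleotides in 57 compared positions.
    print(prob_min_identity(29, 57, x=4))
```

Integer arithmetic is used for the numerator, so the only rounding happens in the final division.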
Genetic conditioning of the amino acid replacement probabilities and spectrum in molecular evolution
The Markov model assumes that the probability of substituting amino acid AA1 with AA2 is the same regardless of which residue (AAx or AAy) AA1 itself arose from:

AAx → AA1 → AA2 (probability Pa)
AAy → AA1 → AA2 (probability Pb)
Pa = Pb

The currently used statistical algorithms are based on the Markovian model of amino acid replacement (they directly use stochastic matrices of replacement frequency indices).
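PAM250 (upper triangle; residue order C S T P A G N D E Q H R K M I L V F Y W; each row starts at its own diagonal entry)
C  12 0 -2 -3 -2 -3 -4 -5 -5 -5 -3 -4 -5 -5 -2 -6 -2 -4 0 -8
S  2 1 1 1 1 1 0 0 -1 -1 0 0 -2 -1 -3 -1 -3 -3 -2
T  3 0 1 0 0 0 0 -1 -1 -1 0 -1 0 -2 0 -3 -3 -5
P  6 1 -1 -1 -1 -1 0 0 0 -1 -2 -2 -3 -1 -5 -5 -6
A  2 1 0 0 0 0 -1 -2 -1 -1 -1 -2 0 -5 -3 -6
G  5 0 1 0 -1 -2 -3 -2 -3 -3 -4 -1 -5 -5 -7
N  2 2 1 1 2 0 1 -2 -2 -3 -2 -4 -2 -4
D  4 3 2 1 -1 0 -3 -2 -4 -2 -6 -4 -7
E  4 2 1 -1 0 -2 -2 -3 -2 -5 -4 -7
Q  4 3 1 1 -1 -2 -2 -2 -5 -4 -5
H  6 2 0 -2 -2 -2 -2 -2 0 -3
R  6 3 0 -2 -3 -2 -4 -4 2
K  5 0 -2 -3 -2 -5 -4 -3
M  6 2 4 2 0 -2 -4
I  5 2 4 1 -1 -5
L  6 2 2 -1 -2
V  4 -1 -2 -6
F  9 7 0
Y  10 0
W  17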
BLOSUM62 (upper triangle, rows N to V; residue order N D C Q E G H I L K M F P S T W Y V; each row starts at its own diagonal entry)
N  6 1 -3 0 0 0 1 -3 -3 0 -2 -3 -2 1 0 -4 -2 -3
D  6 -3 0 2 -1 -1 -3 -4 -1 -3 -3 -1 0 -1 -4 -3 -3
C  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
Q  5 2 -2 0 -3 -2 1 0 -3 -1 0 -1 -2 -1 -2
E  5 -2 0 -3 -3 1 -2 -3 -1 0 -1 -3 -2 -2
G  6 -2 -4 -4 -2 -3 -3 -2 0 -2 -2 -3 -3
H  8 -3 -3 -1 -2 -1 -2 -1 -2 -2 2 -3
I  4 2 -3 1 0 -3 -2 -1 -3 -1 3
L  4 -2 2 0 -3 -2 -1 -2 -1 1
K  5 -1 -3 -1 0 -1 -3 -2 -2
M  5 0 -2 -1 -1 -1 -1 1
F  6 -4 -2 -2 1 3 -1
P  7 -1 -1 -4 -3 -2
S  4 1 -3 -2 -2
T  5 -2 -2 0
W  11 2 -3
Y  7 -1
V  4
Replacement Arg → Lys according to the statistical interpretation using stochastic matrix indices
PAM250
BLOSUM62
3 2 2 3 3
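[Diagram: codon-level routes for the Arg / Lys replacement. Residue and codon labels on the slide, in order: Met AUG, Arg AGG, Lys AAG, His CAC, Asn AAC, Pro CCC, Arg CGC, Ser AGC, Arg AGG, Lys AAG, Arg CGG, Gln CAG, Asn AAU, Ser UCG, Ser AGU. Further residue labels without codons: Thr, Ser, Ala, Trp, (UAG), Pro, Leu, Thr, Ser, Asn, Cys.]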
Amino acid mutational substitution based on a single transition/transversion is NOT a Markovian process
Theoretical proof: the conversion pathways of arginine into lysine, glutamine and serine, for arginine codons that arose from codons encoding different amino acids
Possible codons for arginine: AGA AGG CGA CGG CGC CGT
[Diagram: conversion pathways of arginine codons, depending on the codon from which each arginine codon arose. Residue and codon labels on the slide (R = A or G, Y = C or T): Met ATG, Arg AGG, Lys AAG, Leu CTR, Arg CGR, Gln CAR, Lys AAR, Arg AGR, Ser AGY, His CAY, Arg CGY, Gln CAG, Arg CGG.]
then...
The probability of replacing one amino acid with another depends significantly on which amino acids occupied that position in the past.
There is a high risk that commonly used algorithms applying stochastic data matrices (MDM, PAM, BLOSUM) lead to a wrong interpretation of the mutational processes occurring in proteins.
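The dependence on the codon actually present can be illustrated with a minimal sketch that lists, for each of the six arginine codons, the amino acids reachable by one nucleotide substitution. It assumes the standard genetic code and treats every single-nucleotide change as possible; the function name `single_point_neighbours` is illustrative.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_point_neighbours(codon):
    """Amino acids (one-letter codes) encoded by codons one substitution away."""
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                reachable.add(CODE[codon[:pos] + base + codon[pos + 1:]])
    return reachable


ARG_CODONS = sorted(codon for codon, aa in CODE.items() if aa == "R")

if __name__ == "__main__":
    for codon in ARG_CODONS:
        others = sorted(single_point_neighbours(codon) - {"R"})
        print(codon, "->", " ".join(others))
```

Under these assumptions lysine is one step away only from AGA and AGG, and glutamine only from CGA and CGG, so the amino acids available in a single step depend on which arginine codon occupies the position.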
PAM250 (upper triangle; residue order C S T P A G N D E Q H R K M I L V F Y W; each row starts at its own diagonal entry)
C  12 0 -2 -3 -2 -3 -4 -5 -5 -5 -3 -4 -5 -5 -2 -6 -2 -4 0 -8
S  2 1 1 1 1 1 0 0 -1 -1 0 0 -2 -1 -3 -1 -3 -3 -2
T  3 0 1 0 0 0 0 -1 -1 -1 0 -1 0 -2 0 -3 -3 -5
P  6 1 -1 -1 -1 -1 0 0 0 -1 -2 -2 -3 -1 -5 -5 -6
A  2 1 0 0 0 0 -1 -2 -1 -1 -1 -2 0 -5 -3 -6
G  5 0 1 0 -1 -2 -3 -2 -3 -3 -4 -1 -5 -5 -7
N  2 2 1 1 2 0 1 -2 -2 -3 -2 -4 -2 -4
D  4 3 2 1 -1 0 -3 -2 -4 -2 -6 -4 -7
E  4 2 1 -1 0 -2 -2 -3 -2 -5 -4 -7
Q  4 3 1 1 -1 -2 -2 -2 -5 -4 -5
H  6 2 0 -2 -2 -2 -2 -2 0 -3
R  6 3 0 -2 -3 -2 -4 -4 2
K  5 0 -2 -3 -2 -5 -4 -3
M  6 2 4 2 0 -2 -4
I  5 2 4 1 -1 -5
L  6 2 2 -1 -2
V  4 -1 -2 -6
F  9 7 0
Y  10 0
W  17
PAM250 3 1 1 0 -1
BLOSUM62 2 1 1 1 0
What part of the codon contains the information about the previous amino acid that occurred at certain position of the protein sequence?
Ala (GCG) → Val (GUG)
How long is the information about the codons of preceding amino acids stored?
The shortest storage period is 3 transitions/transversions
Ala (GCG) → Val (GUG) → Met (AUG) → Ile (AUA)
Ser (UCC) → Ser (UCU) → Thr (ACU) → Ser (AGU)
Lys (AAA) → Asn (AAC) → Asp (GAC) → His (CAC) → Gln (CAG) → Glu (GAG) → Asp (GAU) → Tyr (UAU) → His (CAU) → Asn (AAU) → Lys (AAG) → Gln (CAG) → His (CAC) → ...
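Reading the codon labels on this slide as chains of consecutive single-nucleotide changes is an assumption, but it can be checked mechanically. The sketch below verifies that two codons differ at exactly one position and classifies the change as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion; the function name `classify_step` is illustrative.

```python
PURINES = {"A", "G"}  # the remaining bases (C, U or T) are pyrimidines


def classify_step(codon_from, codon_to):
    """Classify a single-nucleotide change between two codons.

    Raises ValueError if the codons do not differ at exactly one position.
    """
    diffs = [(a, b) for a, b in zip(codon_from, codon_to) if a != b]
    if len(diffs) != 1:
        raise ValueError(f"{codon_from} -> {codon_to} is not a single substitution")
    old, new = diffs[0]
    same_class = (old in PURINES) == (new in PURINES)
    return "transition" if same_class else "transversion"


# First chain from the slide, read in order (assumed pathway).
PATH = ["GCG", "GUG", "AUG", "AUA"]  # Ala, Val, Met, Ile
STEPS = [classify_step(a, b) for a, b in zip(PATH, PATH[1:])]

if __name__ == "__main__":
    for (a, b), kind in zip(zip(PATH, PATH[1:]), STEPS):
        print(f"{a} -> {b}: {kind}")
```

The chain from Ala to Ile consists of exactly three such steps, which matches the stated shortest storage period of 3 transitions/transversions.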
CONCLUSIONS
The analysis of genetic semihomology excludes the applicability of the Markov model to studies of protein variability at the amino acid level.
Amino acid codons do contain information about the ancestral amino acids whose codons were the starting point for the codon of the current residue.
This applies mainly to positions undergoing single-point mutations, the most basic mechanism of evolutionary variability.