The program calculates the correlation between each pair of columns [i, j]: column i from the protein alignment and column j from the site alignment.
As a measure of correlation, the mutual information is used:

$$I(i, j) = \sum_{x \in X} \sum_{y \in Y} f_{ij}(x, y) \log \frac{f_{ij}(x, y)}{f_i(x)\, f_j(y)}$$

where:
X, Y | Sets of the 20 amino acids and the 4 nucleotides, respectively. |
$f_{ij}(x, y)$ | Observed frequency of amino acid x at position i together with nucleotide y at position j. |
$f_i(x)\, f_j(y)$ | Expected frequency under the hypothesis of no correlation between the columns, calculated as the frequency of amino acid x in column i multiplied by the frequency of nucleotide y in column j. |
A large difference between the observed distribution and the expected distribution is reflected in a high mutual information value.
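
The calculation described above can be illustrated with a minimal Python sketch for a single pair of columns. It is not the program's actual code: the function name `mutual_information` is hypothetical, the logarithm base (2, giving bits) is an assumption, and the sketch assumes that the k-th character of the protein column and the k-th character of the site column belong to the same protein-site pair. Gap handling and any small-sample corrections are left out.

```python
from collections import Counter
from math import log2


def mutual_information(protein_column, site_column):
    """Mutual information (in bits, by assumption) between two paired columns."""
    if len(protein_column) != len(site_column):
        raise ValueError("columns must contain the same number of sequences")
    n = len(protein_column)
    if n == 0:
        raise ValueError("columns must not be empty")

    # Counts of (amino acid, nucleotide) pairs and of each symbol on its own.
    pair_counts = Counter(zip(protein_column, site_column))
    aa_counts = Counter(protein_column)
    nt_counts = Counter(site_column)

    mi = 0.0
    for (aa, nt), count in pair_counts.items():
        # Observed frequency of the pair in columns i and j.
        observed = count / n
        # Expected frequency if the columns were uncorrelated:
        # product of the two single-column frequencies.
        expected = (aa_counts[aa] / n) * (nt_counts[nt] / n)
        mi += observed * log2(observed / expected)
    return mi


# Fully coupled columns: K always pairs with A, R always pairs with G.
print(mutual_information("KKRR", "AAGG"))  # 1.0
# Uncorrelated columns: observed and expected frequencies coincide.
print(mutual_information("KKRR", "AGAG"))  # 0.0
```

Pairs that never occur together contribute nothing to the sum, which is why the loop only visits observed pairs.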