The program calculates the correlation between each pair of columns [i, j], where column i is taken from the protein alignment and column j from the site alignment.
As a measure of correlation, the mutual information is used:

$$I(i, j) = \sum_{x \in X} \sum_{y \in Y} f_{obs}(x_i, y_j) \, \log \frac{f_{obs}(x_i, y_j)}{f_{exp}(x_i, y_j)}$$

where:
$X, Y$ | Arrays of the 20 amino acids and the 4 nucleotides, respectively. |
$f_{obs}(x_i, y_j)$ | Observed frequency of amino acid x at position i together with nucleotide y at position j. |
$f_{exp}(x_i, y_j)$ | Expected frequency under the hypothesis of no correlation between the columns, calculated as the frequency of amino acid x in column i multiplied by the frequency of nucleotide y in column j. |
A large difference between the observed and the expected distributions is reflected in a high mutual information value.
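
The calculation described above can be sketched in Python as follows. This is a minimal illustration, not the program's actual code: the function names `mutual_information` and `mutual_information_matrix` are hypothetical, the logarithm base (2, giving bits) is an assumption, and the sketch assumes that row k of the protein alignment corresponds to row k of the site alignment. Gaps and pseudocounts are not handled.

```python
import math
from collections import Counter


def mutual_information(protein_column, site_column):
    """Mutual information (in bits, an assumed base) between two columns.

    Both columns must list one character per sequence pair, in the same order.
    """
    if len(protein_column) != len(site_column):
        raise ValueError("columns must have one entry per sequence pair")
    n = len(protein_column)

    # Counts of (amino acid, nucleotide) pairs and of each symbol on its own.
    pair_counts = Counter(zip(protein_column, site_column))
    aa_counts = Counter(protein_column)
    nt_counts = Counter(site_column)

    mi = 0.0
    for (aa, nt), count in pair_counts.items():
        observed = count / n
        # Expected frequency if the two columns were independent.
        expected = (aa_counts[aa] / n) * (nt_counts[nt] / n)
        mi += observed * math.log2(observed / expected)
    return mi


def mutual_information_matrix(protein_alignment, site_alignment):
    """Matrix of mutual information values for every column pair [i, j].

    Alignments are lists of equal-length strings (an assumed input format).
    """
    protein_columns = list(zip(*protein_alignment))
    site_columns = list(zip(*site_alignment))
    return [
        [mutual_information(p, s) for s in site_columns]
        for p in protein_columns
    ]
```

For example, `mutual_information("AAKK", "GGCC")` gives 1.0 because each amino acid always co-occurs with the same nucleotide, while `mutual_information("AAKK", "GCGC")` gives 0.0 because the observed pair frequencies equal the expected ones.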