### Mutual Information

The program calculates the correlation for each pair of columns [*i, j*], where column *i* comes from the protein alignment and column *j* from the site alignment.
Mutual information is used as the measure of correlation:

$$
MI_{ij} = \sum_{x \in X} \sum_{y \in Y} f_{ij}(x, y) \log \frac{f_{ij}(x, y)}{f_i(x)\, f_j(y)}
$$

where:

| Symbol | Meaning |
|---|---|
| *X*, *Y* | Alphabets of the 20 amino acids and the 4 nucleotides, respectively. |
| $f_{ij}(x, y)$ | Observed frequency of amino acid *x* in column *i* together with nucleotide *y* in column *j*. |
| $f_i(x)\, f_j(y)$ | Expected frequency under the hypothesis that the columns are not correlated: the frequency of amino acid *x* in column *i* multiplied by the frequency of nucleotide *y* in column *j*. |

A large difference between the observed and the expected distributions is reflected in a high mutual information value.
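The mutual information of one column pair can be sketched in Python as follows. This is a minimal illustration, not the program's code: it assumes each column is a string with one symbol per aligned sequence (in the same order in both alignments), uses the natural logarithm, and ignores gaps and pseudocounts. The name `mutual_information` is hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(protein_col: str, site_col: str) -> float:
    """MI between an amino-acid column i and a nucleotide column j.

    Assumption: both columns list one symbol per aligned sequence, same order.
    """
    if not protein_col or len(protein_col) != len(site_col):
        raise ValueError("columns must be non-empty and of equal length")
    n = len(protein_col)
    f_i = Counter(protein_col)                  # amino-acid counts in column i
    f_j = Counter(site_col)                     # nucleotide counts in column j
    f_ij = Counter(zip(protein_col, site_col))  # joint counts of (x, y)
    mi = 0.0
    # Pairs never observed together contribute 0 (0 * log 0 = 0), so only
    # the observed pairs need to be summed.
    for (x, y), count in f_ij.items():
        observed = count / n
        expected = (f_i[x] / n) * (f_j[y] / n)
        mi += observed * log(observed / expected)
    return mi
```

For example, `mutual_information("AAKK", "AACC")` returns ln 2 ≈ 0.693 because each amino acid is always paired with the same nucleotide, while `mutual_information("AAKK", "ACAC")` returns 0.0 because the observed joint frequencies equal the expected ones.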

### Statistical significance

To decide which mutual information values are high enough to be considered non-random, the statistical significance is calculated as a Z-score:

$$
Z_{ij} = \frac{MI_{ij} - \mu\left(MI^{\text{rand}}_{ij}\right)}{\sigma\left(MI^{\text{rand}}_{ij}\right)}
$$

where:

- $MI^{\text{rand}}_{ij}$ is the distribution of mutual information values for random (non-correlated) pairs of columns with the same amino acid and nucleotide compositions as the [*i, j*] pair;
- $\mu$ and $\sigma$ are the mean and the standard deviation of that distribution, respectively.

To estimate $MI^{\text{rand}}_{ij}$, 10,000 random pairs of columns are generated and their mutual information values are calculated.
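The significance estimate can be sketched in Python as below. It assumes, as a hypothesis rather than a documented detail of the program, that each random pair is produced by shuffling the nucleotide column against the protein column: this keeps both column compositions fixed while destroying any pairing between them. The use of the population standard deviation and the names `mi_z_score`, `n_random` and `seed` are also assumptions; the MI helper is repeated so the block runs on its own.

```python
import random
from collections import Counter
from math import log
from statistics import mean, pstdev


def mutual_information(protein_col, site_col):
    """MI of two aligned columns (natural log); p_xy / (p_x p_y) = c*n / (f_x f_y)."""
    n = len(protein_col)
    f_i, f_j = Counter(protein_col), Counter(site_col)
    f_ij = Counter(zip(protein_col, site_col))
    return sum(c / n * log(c * n / (f_i[x] * f_j[y])) for (x, y), c in f_ij.items())


def mi_z_score(protein_col, site_col, n_random=10_000, seed=None):
    """Z-score of the observed MI against MI values of shuffled column pairs."""
    rng = random.Random(seed)
    observed = mutual_information(protein_col, site_col)
    shuffled = list(site_col)
    null = []
    for _ in range(n_random):
        # Assumption: a random pair = same compositions, random pairing.
        rng.shuffle(shuffled)
        null.append(mutual_information(protein_col, shuffled))
    sigma = pstdev(null)
    if sigma == 0:
        # e.g. a fully conserved column: every shuffle gives the same MI
        raise ValueError("random MI distribution has zero variance")
    return (observed - mean(null)) / sigma
```

A perfectly coupled pair such as `mi_z_score("AAAAAKKKKKLLLLLWWWWW", "AAAAACCCCCGGGGGTTTTT", seed=0)` yields a large positive Z-score, whereas a fully conserved column raises `ValueError`, since its MI is identical for every random pair and the Z-score is undefined.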