Launching the program

To run the program, a Java 5.0 runtime environment is needed. It may be downloaded here.

The program is launched by clicking here or on the launch button in the left menu. On first use, one needs to accept the certificate to grant the program access to the hard drive and the network. This access is needed to read the input query file from the hard drive and to submit it to the server.

Input data format

The protein and binding site sequences are presented as a two-column table. The first column should have the header (first row) "Sequences.AA" and should contain the aligned protein sequences.
The second column should have the header (first row) "Sequences.DNA" and should contain the aligned nucleotide sequences of the binding sites.
Each protein and its binding site should be in the same row. If a protein has multiple binding sites, it should be repeated in a separate row for each site.

Example:
sequences.AA	sequences.DNA
NAPH---TSEA-A	GaAAtCGTTTtC
NAPH---TSEA-A	GgAAACGTTTtC
GTAP---LTTQ-V	GTAAGCGCTTgC
GRFDK--MSQE-T	TAAACCGGTaaA
DNRS---ISMK-T	GTTtACGTTgtC

Note that the second and third rows correspond to one protein with two different sites. Sequence fields are case-insensitive. The input example can be pasted using the 'Help' -> 'paste input example' menu item.
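
The format rules above can be turned into a small validator that checks a table before it is submitted. This is a minimal Python sketch, not the program's own parser. The function name `parse_input_table` is hypothetical. The sketch assumes whitespace-separated columns and case-insensitive headers, since the example above uses "sequences.AA" while the text uses "Sequences.AA".

```python
def parse_input_table(text):
    """Split a Sequences.AA / Sequences.DNA table into protein and site lists.

    Hypothetical helper. Assumes whitespace-separated columns and
    case-insensitive headers. Sequences are upper-cased because sequence
    fields are case-insensitive.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = [name.lower() for name in lines[0].split()]
    aa_col = header.index("sequences.aa")    # ValueError if the column is missing
    dna_col = header.index("sequences.dna")
    proteins, sites = [], []
    for line in lines[1:]:
        fields = line.split()
        proteins.append(fields[aa_col].upper())
        sites.append(fields[dna_col].upper())
    # Aligned sequences must all have the same length within each column.
    if len({len(p) for p in proteins}) > 1:
        raise ValueError("protein sequences are not aligned (unequal lengths)")
    if len({len(s) for s in sites}) > 1:
        raise ValueError("binding-site sequences are not aligned (unequal lengths)")
    return proteins, sites
```

Applied to the example above, the function returns five protein/site rows. The first two rows carry the same protein.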

Optional "weights.AA" and "weights.DNA" columns with custom weights can be provided. In this case, the program does not compute the weights itself but takes them from these columns. Note that the protein weights are expected to sum to the number of proteins in the input dataset; likewise, the site weights are expected to sum to the number of sites.
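
Custom weights on a different scale can be rescaled to satisfy the sum rule above before they are pasted into a weights.AA or weights.DNA column. The helper below is a hypothetical Python sketch and is not part of the program. It multiplies each weight by N / (sum of weights), where N is the number of entries passed in. This preserves the relative weights and makes them sum to N.

```python
def normalize_weights(raw):
    """Rescale non-negative weights so that they sum to len(raw) (hypothetical helper)."""
    total = sum(raw)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    n = len(raw)
    return [w * n / total for w in raw]
```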

Submitting a task to the server

Paste the input table into the program window or provide a file containing it, then press the Submit button.

A window showing the task being processed will appear. Note: the task can be accessed by its ID number within 24 hours.


Selecting the number of correlated pairs

At the end of the computation, one is asked to review the selected number of correlated pairs. The plot shows the probability (y-axis) that a given number (x-axis) of the most correlated pairs comes from the normal distribution. Normally the least probable number of pairs, which corresponds to the global minimum, should be selected. A smaller number of pairs may be preferred if it corresponds to a local minimum that is about as low as the global minimum. The desired number of pairs can be selected by clicking on the plot. To zoom in on a region of the plot, draw a box around it from the top-left to the bottom-right corner. To zoom out, draw a box from the bottom-right to the top-left corner.
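
The selection rule above can be sketched as follows. This hypothetical Python function assumes that the plotted curve is available as a list of positive probabilities, where element k-1 belongs to the k most correlated pairs. It compares values on a log10 scale. The `log10_tolerance` parameter decides how close to the global minimum a local minimum must be. It is an assumption, because the program's criterion for "about as low" is not specified.

```python
import math


def choose_pair_count(probs, log10_tolerance=0.0):
    """Pick a number of correlated pairs from the probability curve (hypothetical).

    probs[k - 1] is the (positive) probability that the k most correlated
    pairs come from the normal distribution. Scanning from small k, return
    the first local minimum whose log10 value lies within log10_tolerance
    of the global minimum. With tolerance 0 this is the global minimum.
    """
    logs = [math.log10(p) for p in probs]
    global_min = min(logs)
    n = len(logs)
    for i, v in enumerate(logs):
        left_ok = i == 0 or v <= logs[i - 1]
        right_ok = i == n - 1 or v <= logs[i + 1]
        if left_ok and right_ok and v - global_min <= log10_tolerance:
            return i + 1
    # Not reached: the global minimum always satisfies the test above.
    return logs.index(global_min) + 1
```

With `log10_tolerance=0` the global minimum is returned. A positive tolerance lets an earlier local minimum win when it is nearly as deep.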

Note: One may redefine the number of correlated pairs by selecting the 'Edit' -> 'redefine correlated pairs number' menu item.

Working with the heatmap

Correlated pairs of positions are presented as a colored matrix, the heatmap, in which each pair is colored according to its statistical significance. Pairs with non-random correlation scores are colored with the red-yellow palette (red is assigned to the most correlated pair). The violet-black palette is used for all other pairs. The x-coordinate corresponds to protein positions and the y-coordinate to nucleotide positions.

Note: the default coloring can be changed by clicking on the palette at the bottom of the window.
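
The coloring scheme above can be illustrated with a small sketch. In this hypothetical Python function, the `n_significant` highest-scoring pairs are interpolated by rank from red (the top pair) to yellow. The remaining pairs are interpolated by rank from violet to black. The rank-based interpolation and the exact RGB values are assumptions for illustration, not the program's documented color scale.

```python
def heatmap_colors(scores, n_significant):
    """Map each (protein_pos, dna_pos) pair to an RGB tuple by rank of its score.

    Hypothetical sketch: the n_significant best pairs go from red (best) to
    yellow; all other pairs go from violet to black.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    significant, others = ranked[:n_significant], ranked[n_significant:]
    colors = {}
    for rank, pair in enumerate(significant):
        t = rank / max(len(significant) - 1, 1)  # 0 for the best pair
        colors[pair] = (255, round(255 * t), 0)  # red (255,0,0) -> yellow (255,255,0)
    for rank, pair in enumerate(others):
        t = rank / max(len(others) - 1, 1)
        colors[pair] = (round(148 * (1 - t)), 0, round(211 * (1 - t)))  # violet -> black
    return colors
```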

Sequence logos on the matrix headers visualize the residue composition and conservation level of each position.
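
This help text does not say how the header logos are computed. Classic sequence logos commonly measure conservation as a column's information content: log2 of the alphabet size minus the Shannon entropy of the residue frequencies. Each letter is then scaled by its frequency. The Python sketch below implements that standard definition, ignoring gaps and without small-sample correction. It is an assumption, not the program's confirmed method, and `column_information` is a hypothetical name.

```python
import math
from collections import Counter


def column_information(column, alphabet_size, gap="-"):
    """Return (information in bits, letter heights) for one alignment column.

    Hypothetical sketch of the standard information-content logo: gaps are
    ignored and no small-sample correction is applied. Use alphabet_size=4
    for nucleotides and 20 for amino acids.
    """
    residues = [c.upper() for c in column if c != gap]
    if not residues:
        return 0.0, {}
    n = len(residues)
    freqs = {r: cnt / n for r, cnt in Counter(residues).items()}
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    info = math.log2(alphabet_size) - entropy
    # Each letter's height is its frequency times the column's information.
    heights = {r: f * info for r, f in freqs.items()}
    return info, heights
```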


Contingency tables

Double-clicking on a pair opens a window with its contingency table, which contains the counts of each 'amino acid'-'nucleotide' combination at the two positions.
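
Conceptually, the contingency table for a pair of positions counts the residue combinations found in the same rows of the input table. The hypothetical Python sketch below builds such a table from the aligned protein and site sequences, using 0-based column indices and unweighted counts. This help text does not state whether the program applies sequence weights to these counts, so the unweighted form is an assumption.

```python
from collections import Counter


def contingency_table(proteins, sites, aa_pos, dna_pos):
    """Count (amino acid, nucleotide) combinations at two alignment columns.

    Hypothetical sketch: aa_pos and dna_pos are 0-based column indices into
    the aligned protein and binding-site sequences; counts are unweighted.
    """
    table = Counter()
    for prot, site in zip(proteins, sites):
        table[(prot[aa_pos].upper(), site[dna_pos].upper())] += 1
    return table
```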

Pairs that occur more often than expected are colored red (strongly preferred) or yellow. Pairs that occur less often than expected are colored blue (strongly avoided) or light blue.

The threshold for coloring is based on the χ² score summand

$$\chi^2_{ij} = \frac{(\mathrm{obs}_{ij} - \mathrm{exp}_{ij})^2}{\mathrm{exp}_{ij}},$$

where obs_ij and exp_ij are the observed and expected numbers of nucleotide-amino acid pairs (i, j). The default thresholds are 100 and 50. The colors and thresholds may be customized in the "Customize" -> "Contingency table" menu.

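The coloring rule can be sketched as shown below, with some assumptions. This hypothetical Python function computes the expected count in the usual way for a χ² test of independence: exp = (row total × column total) / N. A summand at or above the higher threshold (default 100) is assumed to give the strong color, and one at or above the lower threshold (default 50) the weaker color. The sign of obs - exp is assumed to decide between the "preferred" and "avoided" colors. These mappings are inferred from the description above, not taken from the program.

```python
def chi2_colors(table, strong=100.0, weak=50.0):
    """Color each (amino acid, nucleotide) cell of a contingency table.

    Hypothetical sketch. table maps (aa, nt) -> observed count. Expected
    counts assume independence: row total * column total / grand total.
    Returns (aa, nt) -> color name, or None when the cell stays uncolored.
    """
    n = sum(table.values())
    row_tot, col_tot = {}, {}
    for (aa, nt), obs in table.items():
        row_tot[aa] = row_tot.get(aa, 0) + obs
        col_tot[nt] = col_tot.get(nt, 0) + obs
    colors = {}
    for aa, r in row_tot.items():
        for nt, c in col_tot.items():
            exp = r * c / n if n else 0.0
            if exp == 0:
                colors[(aa, nt)] = None  # empty row or column: nothing to compare
                continue
            obs = table.get((aa, nt), 0)
            summand = (obs - exp) ** 2 / exp  # this cell's chi-square contribution
            if summand >= strong:
                colors[(aa, nt)] = "red" if obs > exp else "blue"
            elif summand >= weak:
                colors[(aa, nt)] = "yellow" if obs > exp else "light-blue"
            else:
                colors[(aa, nt)] = None
    return colors
```
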
Saving data

The correlations may be saved as a text file using the File menu. In addition, the program data may be saved in a binary file, which can be loaded later without resubmitting the task to the server.