Data set

Protein-DNA complex data

Two sequence data sets are used to train two prediction models: (1) a data set for the model that uses both DNA and protein sequences, and (2) a data set for the model that uses DNA sequences alone. [Click here] to download the data sets.

Prediction model                             DNA sequences   Protein sequences   Binding DNA fragments   Non-binding DNA fragments
Model using both DNA and protein sequences   1,416           837                 20,588                  27,630
Model using DNA sequences alone              1,416           0                   20,378                  23,950
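The fragment counts determine how balanced each training set is between the two classes. The short Python sketch below computes the total number of fragments and the share of binding fragments for each data set. The counts are copied from the table; the dictionary keys and the helper name binding_fraction are illustrative only.

```python
# Fragment counts copied from the data set table; the keys are illustrative labels.
DATASETS = {
    "DNA + protein model": {"binding": 20588, "non_binding": 27630},
    "DNA-only model": {"binding": 20378, "non_binding": 23950},
}


def binding_fraction(binding: int, non_binding: int) -> float:
    """Return the fraction of fragments that are labeled as binding."""
    return binding / (binding + non_binding)


if __name__ == "__main__":
    for name, counts in DATASETS.items():
        total = counts["binding"] + counts["non_binding"]
        frac = binding_fraction(counts["binding"], counts["non_binding"])
        # Print the total fragment count and the binding share as a percentage.
        print(f"{name}: {total} fragments, {frac:.1%} binding")
```

Running it gives 48,218 fragments (about 42.7% binding) for the DNA and protein data set and 44,328 fragments (about 46.0% binding) for the DNA-only data set. Both sets are therefore moderately skewed toward non-binding fragments.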

Binding criteria

A binding site must be involved in at least one of the following interactions between DNA and protein: hydrogen bonds, water bridges, or hydrophobic interactions. [Click here] for detailed information.
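The criterion is a simple "at least one of" rule. The Python sketch below shows how it could be applied to assign labels. It assumes that the interactions for each site have already been detected and annotated by some external structure analysis step, which this page does not specify. The interaction type strings and the site IDs are hypothetical placeholders, not the data set's actual format.

```python
from typing import Dict, Iterable, List

# Interaction types that qualify a site as binding under the criteria above.
# The string labels themselves are hypothetical placeholders.
BINDING_INTERACTIONS = frozenset({"hydrogen_bond", "water_bridge", "hydrophobic"})


def is_binding_site(interactions: Iterable[str]) -> bool:
    """Return True if at least one qualifying DNA-protein interaction is present."""
    return any(kind in BINDING_INTERACTIONS for kind in interactions)


def label_sites(site_interactions: Dict[str, List[str]]) -> Dict[str, int]:
    """Map each site ID to 1 (binding) or 0 (non-binding).

    `site_interactions` is a hypothetical input that maps a site ID to the list
    of interaction types already annotated for that site.
    """
    return {site: int(is_binding_site(kinds)) for site, kinds in site_interactions.items()}


if __name__ == "__main__":
    example = {
        "site_1": ["hydrogen_bond"],
        "site_2": [],
        "site_3": ["water_bridge", "hydrophobic"],
    }
    print(label_sites(example))  # {'site_1': 1, 'site_2': 0, 'site_3': 1}
```

A site with no annotated interaction, or with only interaction types outside the three listed, is labeled non-binding.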