Overview
Identifying the protein-recognition sites in DNA, or the DNA-recognition sites in proteins, helps in understanding a variety of cellular processes.
Protein-DNA interactions have been studied both theoretically and experimentally.
Many studies have focused on predicting DNA-binding residues in proteins, but the inverse problem
(i.e., predicting protein-binding nucleotides in a DNA sequence) has received less attention.
Here we developed a web application called PNImodeler, which predicts protein-binding nucleotides in a DNA sequence using sequence information.
As of July 2013, we had collected 1,584 protein-DNA complexes whose structures were determined by X-ray crystallography at a resolution of 3.0 Å or better.
These 1,584 protein-DNA complexes contain 1,416 DNA sequences and 837 protein sequences.
To determine binding sites in DNA, we considered three types of interactions: hydrogen bonds, water bridges, and hydrophobic interactions.
PNImodeler provides two prediction models: one uses DNA sequence data alone and the other uses both DNA and protein sequence data.
The first model was built from 20,378 binding DNA sequence fragments and 23,950 non-binding DNA sequence fragments.
The second model was built from 20,558 binding DNA sequence fragments and 27,630 non-binding DNA sequence fragments.
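The text does not say how the binding and non-binding fragments are cut from each DNA sequence. As a minimal sketch only, assuming each fragment is a fixed-length window centered on one nucleotide and labeled by whether that nucleotide binds protein, the fragments could be generated as follows. The function name `windows`, the `half_width` parameter, and the `N` padding character are hypothetical and are not taken from PNImodeler.

```python
from typing import Iterator, Sequence, Tuple


def windows(
    dna: str,
    labels: Sequence[int],
    half_width: int = 3,  # hypothetical window size; the real value is not stated
    pad: str = "N",       # hypothetical padding symbol for the sequence ends
) -> Iterator[Tuple[str, bool]]:
    """Yield (fragment, is_binding) for every nucleotide in `dna`.

    `labels[i]` is 1 if nucleotide i is a binding site and 0 otherwise.
    Each fragment has length 2 * half_width + 1 and is centered on nucleotide i.
    """
    if len(dna) != len(labels):
        raise ValueError("dna and labels must have the same length")
    # Pad both ends so that terminal nucleotides also get full-length fragments.
    padded = pad * half_width + dna + pad * half_width
    size = 2 * half_width + 1
    for i, label in enumerate(labels):
        yield padded[i:i + size], bool(label)


# Split the fragments of one toy sequence into binding and non-binding sets.
fragments = list(windows("ACGT", [0, 1, 0, 0], half_width=1))
binding = [frag for frag, is_binding in fragments if is_binding]
non_binding = [frag for frag, is_binding in fragments if not is_binding]
```

Under this assumption every nucleotide contributes exactly one fragment, so the binding and non-binding counts reflect the number of labeled nucleotides rather than the number of DNA sequences.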
The two models were tested on an independent data set whose DNA sequences differ from those used to build the models, at a sequence similarity threshold of 80%.
The first model achieved a sensitivity of 73.4%, a specificity of 64.8%, an accuracy of 68.9%, and a correlation coefficient of 0.382.
The second model achieved a sensitivity of 67.6%, a specificity of 74.3%, an accuracy of 71.4%, and a correlation coefficient of 0.418.