As we enter the age of "Personal Genomics" or "Personalized Medicine", knowledge of human genetic polymorphisms and variations is expected to provide a foundation for understanding differences in disease susceptibility and for designing individualized therapeutic treatments (Cargill, et al., 1999; Collins, et al., 1998). Recent progress in the International HapMap Project and similar projects (International HapMap Consortium, 2005; Frazer, et al., 2007) has provided a wealth of information detailing tens of millions of human genetic variations between individuals, including copy number variations (CNVs) (Redon, et al., 2006) and single nucleotide polymorphisms (SNPs) (Hinds, et al., 2005). It has been estimated that ~90% of human genetic variations are due to SNPs (Collins, et al., 1998). In particular, by changing amino acids in proteins, non-synonymous SNPs (nsSNPs) in gene coding regions could account for nearly half of the known genetic variations linked to human inherited diseases (Stenson, et al., 2003). Considerable effort has therefore been devoted to elucidating how nsSNPs exert deleterious effects on the stability and function of proteins. An nsSNP might change the physicochemical properties of a wild-type amino acid and thereby affect protein stability and dynamics, or it might disrupt an interaction interface and prevent the protein from forming a complex with its partners (Kono, et al., 2008; Stitziel, et al., 2004; Uzun, et al., 2007; Yue and Moult, 2006). Alternatively, nsSNPs could influence post-translational modifications (PTMs) of proteins (e.g., phosphorylation) by changing the residue types of the target sites or of key flanking amino acids (Erxleben, et al., 2006; Gentile, et al., 2008; Ryu, et al., 2009; Savas and Ozcelik, 2005; Yang, et al., 2008).
The Armstrong group previously coined the term "phosphorylopathy" to describe human genetic variation that results in aberrant regulation of protein phosphorylation (Erxleben, et al., 2006; Gentile, et al., 2008).
In this work, we performed a genome-wide analysis of genetic polymorphisms that influence protein phosphorylation in H. sapiens. We collected 502,922 missense SNPs from NCBI dbSNP build 134 (Sherry, et al., 2001). The human mRNA/protein sequences were taken from RefSeq build 31 (Pruitt, et al., 2007). We used our GPS 2.1 software (Xue, et al., 2008) to predict kinase-specific phosphorylation sites in the human proteins and in their nsSNP variants. To reduce the false-positive rate, we further filtered the predictions with known phosphorylation sites (P-sites) and protein-protein interaction (PPI) data. For simplicity, we defined a phosSNP (Phosphorylation-related SNP) as an nsSNP that might influence protein phosphorylation status. We classified all phosSNPs into four groups. The first three types (I, II, and III) were defined as previously described (Ryu, et al., 2009): substitution of an amino acid by an S/T/Y residue, or vice versa, that creates a new phosphorylation site [Type I (+)] or removes an original one [Type I (-)]; variations that add [Type II (+)] or remove [Type II (-)] adjacent phosphorylation sites; and mutations that change the protein kinase (PK) types of adjacent phosphorylation sites (Type III). In addition, we observed that an amino acid substitution among S, T, and Y could change the PK types at the phosphorylated position itself (Type IV); that is, the target site could still be phosphorylated, but by a different type of kinase.
Under these definitions, we detected 55.7% of nsSNPs (280,261) as potential phosSNPs in 30,273 proteins. After filtering with known P-sites, 19,167 nsSNPs remained as phosSNPs in 9,771 proteins. After additional filtering with PPI data, we obtained a higher-confidence set of 4,490 phosSNPs. We therefore propose that a majority of nsSNPs might affect protein phosphorylation and play ubiquitous roles in rewiring biological pathways. Taken together, we propose that our results could be a useful resource for future disease diagnostics and provide a basis for better, individualized therapeutic treatments.
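The four phosSNP types defined above can be illustrated with a minimal Python sketch. It is not the PhosSNP or GPS 2.1 implementation: the function name `classify_phossnp`, the dictionary format for predictions (position mapped to a set of kinase names), and the kinase names in the usage lines are all assumptions made for illustration. "Adjacent" is approximated here as any other position whose prediction differs between wild type and variant, since no flanking-window size is stated in the text.

```python
# Illustrative sketch only; input format and names are hypothetical,
# not the actual GPS 2.1 output or the PhosSNP pipeline.
PHOSPHO_RESIDUES = {"S", "T", "Y"}


def classify_phossnp(pos, wt_res, var_res, wt_sites, var_sites):
    """Return the sorted phosSNP type labels for one nsSNP.

    pos       -- position of the substituted residue
    wt_res    -- wild-type amino acid (one-letter code)
    var_res   -- variant amino acid (one-letter code)
    wt_sites  -- {position: set of kinases} predicted for the wild type
    var_sites -- {position: set of kinases} predicted for the variant
    """
    labels = set()
    wt_here = wt_sites.get(pos, set())
    var_here = var_sites.get(pos, set())
    wt_is_sty = wt_res in PHOSPHO_RESIDUES
    var_is_sty = var_res in PHOSPHO_RESIDUES

    # Type I: the SNP itself creates or removes an S/T/Y phosphorylation site.
    if not wt_is_sty and var_is_sty and var_here:
        labels.add("Type I (+)")
    if wt_is_sty and not var_is_sty and wt_here:
        labels.add("Type I (-)")

    # Type IV: S/T/Y swapped for another S/T/Y; the site is still
    # phosphorylated, but the predicted kinases differ.
    if wt_is_sty and var_is_sty and wt_here and var_here and wt_here != var_here:
        labels.add("Type IV")

    # Types II and III: effects on other (adjacent) sites.
    for p in (set(wt_sites) | set(var_sites)) - {pos}:
        wt_k = wt_sites.get(p, set())
        var_k = var_sites.get(p, set())
        if not wt_k and var_k:
            labels.add("Type II (+)")   # adjacent site gained
        elif wt_k and not var_k:
            labels.add("Type II (-)")   # adjacent site lost
        elif wt_k != var_k:
            labels.add("Type III")      # adjacent site kept, kinases changed

    return sorted(labels)


# Usage with made-up predictions: an A->S substitution at position 12.
wt = {10: {"PKA"}, 14: {"CDK"}}
var = {10: {"PKC"}, 12: {"CK2"}, 14: {"CDK"}}
print(classify_phossnp(12, "A", "S", wt, var))
```

In this toy input the substitution creates a site at position 12 and changes the predicted kinase at position 10, so a single nsSNP can carry more than one label. Cases outside the four definitions, such as an S-to-T substitution that abolishes the site entirely, receive no label in this sketch.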
Finally, all phosSNP data were integrated into the PhosSNP 2.0 database, which was implemented in PHP + Apache + MySQL. PhosSNP 2.0 is freely available for academic research at: http://phossnp.biocuckoo.org/. The archived version, PhosSNP 1.0, remains available at http://phossnp.biocuckoo.org/v1/.
When publishing results obtained with PhosSNP, please cite the following article:
PhosSNP for Systematic Analysis of Genetic Polymorphisms That Influence Protein Phosphorylation
Jian Ren, Chunhui Jiang, Xinjiao Gao, Zexian Liu, Zineng Yuan, Changjiang Jin, Longping Wen, Zhaolei Zhang, Yu Xue and Xuebiao Yao. Mol Cell Proteomics. 2010;9(4):623-634