dbSTR: A Database for Short Tandem Repeats
Haifeng Liu¹, Loo Nin Teo², Eric Yap, Linda Gan, Hui Min Wu, Sock Hoon Ng, Adrian Eng, Loo See Teo, Keng Wah Chao
¹lhaifeng@dso.org.sg, DSO National Laboratories, Singapore; ²tloonin@dso.org.sg, DSO National Laboratories, Singapore
Characterized by high levels of polymorphism and large numbers of alleles, short tandem repeats (STRs), or microsatellites, have been ideal genetic markers for localizing human disease loci by positional cloning. They have also been exploited in population and evolutionary genetics studies. We have established dbSTR as a central repository of STRs with wet-lab-verified or predicted polymorphisms. So far, a total of 187,952 dinucleotide STRs from the whole human genome (GenBank build 31) have been identified and deposited in dbSTR. The polymorphism of each STR has been predicted as low or high using a machine-learning approach based on a hybrid combination of a k-nearest-neighbor classifier and two artificial neural networks. Wet-lab experiments validated the prediction accuracy of this approach at 93% for high-polymorphism STRs and 63% for low-polymorphism STRs. Researchers can freely retrieve information on these STRs from http://www.dbSTR.org. Additional web interfaces for form-based submission, querying, and browsing of dbSTR are under development and will be available by June 2003.
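The abstract does not say how the dinucleotide STRs were located in the genome sequence. Purely as an illustration of what "identifying dinucleotide STRs" involves, the Python sketch below scans a DNA string for uninterrupted tandem runs of a two-base motif. The function name `find_dinucleotide_strs`, the `DinucleotideSTR` record, the `min_copies=6` threshold, and the restriction to perfect (uninterrupted) repeats are all hypothetical choices made for this sketch, not dbSTR's actual pipeline.

```python
import re
from typing import List, NamedTuple


class DinucleotideSTR(NamedTuple):
    """Hypothetical record for one repeat tract found in a sequence."""
    start: int   # 0-based start of the tract
    end: int     # 0-based exclusive end of the tract
    motif: str   # 2-bp repeat unit, e.g. "CA"
    copies: int  # number of complete motif copies


def find_dinucleotide_strs(seq: str, min_copies: int = 6) -> List[DinucleotideSTR]:
    """Return perfect dinucleotide repeat tracts with at least `min_copies` units.

    Assumption: only uninterrupted repeats are reported; imperfect or
    compound repeats, which real STR catalogs often include, are ignored.
    """
    if min_copies < 2:
        raise ValueError("min_copies must be at least 2")
    seq = seq.upper()
    # Group 1 is a 2-bp unit whose two bases differ; the negative lookahead
    # on group 2 excludes mononucleotide runs such as "AAAA". The unit must
    # then repeat at least (min_copies - 1) more times. Non-ACGT symbols
    # (e.g. N) never match, so they break a tract.
    pattern = re.compile(r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % (min_copies - 1))
    hits = []
    for m in pattern.finditer(seq):
        copies = (m.end() - m.start()) // 2
        hits.append(DinucleotideSTR(m.start(), m.end(), m.group(1), copies))
    return hits
```

For example, `find_dinucleotide_strs("GGG" + "CA" * 8 + "TTT" + "AG" * 3 + "CCCC")` reports a single `CA` tract of 8 copies starting at position 3; the three-copy `AG` run falls below the assumed threshold. A regular expression keeps the sketch short; a genome-scale scan would more likely use a dedicated repeat finder and also record flanking sequence for primer design.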