MICROSATELLITE REPEATS IN PLANTS
Chandri N Yandava1, Roger Pennell2, Kenneth Feldmann, Peter Mascia, Richard Flavell, William Kimmerly
1cyandava@ceres-inc.com, Ceres Inc; 2rpennell@ceres-inc.com, Ceres Inc
Microsatellites, or simple-sequence repeats (SSRs), are tandem repeats of short motifs 2-5 nucleotides long. They are abundant in the genomes of metazoans and plants. Because these repeats are highly polymorphic in length, they have been used to generate molecular markers for the development of high-resolution genetic maps. SSR markers have been developed for mapping quantitative traits in many crop plants, such as corn, wheat, and soybean. With the complete genomes of Arabidopsis and rice sequenced, it became possible to identify the microsatellite repeats in each genome. We analyzed these two genomes for the presence of di-, tri- and tetra-nucleotide repeats with total repeat lengths of more than 23 base pairs. Based on these criteria we found a total of 1454 SSR markers in Arabidopsis and 5002 SSR markers in rice. In both Arabidopsis and rice, few repeats belonged to the tetra-nucleotide classes. The distribution of di- and tri-nucleotide repeat classes differs significantly between rice and Arabidopsis. The CA class repeats are more abundant in rice (4.5%) than in Arabidopsis (2.2%). Among the tri-nucleotide classes, AAC (4.5% vs 0.5%), AAG (16.7% vs 4.8%) and ATC (6.6% vs 3.3%) are present in higher numbers in Arabidopsis. On the other hand, the classes AAT (6.2% vs 3.2%), AGG (3.5% vs 1.2%) and CCG (6.3% vs 0%) are more abundant in rice. Except for AAT repeats, repeats composed of A and T bases are more frequent in Arabidopsis, whereas repeats composed of G and C bases are more numerous in rice, as might be expected from the base composition of the respective genomes. The availability of such a vast inventory of potentially polymorphic markers in rice and Arabidopsis will speed the mapping of quantitative trait loci in each species. We tested AAG repeats in Arabidopsis thaliana for their polymorphism information content using the Columbia glabrous and Landsberg erecta strains.
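The search criteria described above (di-, tri- and tetra-nucleotide motifs, total repeat length of more than 23 base pairs) can be illustrated with a minimal Python sketch. This is not the pipeline used for the analysis, which the abstract does not describe. The function names, the use of regular expressions, the counting of whole motif copies only, and the convention of naming each class by the alphabetically smallest rotation of the motif or its reverse complement (so that CA repeats are reported as AC) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Translation table used to build reverse complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_class(motif):
    """Name a repeat class by the smallest rotation of the motif or of its
    reverse complement (an assumed convention, not taken from the abstract)."""
    revcomp = motif.translate(COMPLEMENT)[::-1]
    rotations = [m[i:] + m[:i] for m in (motif, revcomp) for i in range(len(m))]
    return min(rotations)


def is_primitive(motif):
    """Return False when the motif is itself a repeat of a shorter unit
    (for example AAA or ACAC), so that it is not counted in the wrong class."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def find_ssrs(sequence, motif_sizes=(2, 3, 4), min_total_length=24):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats.

    min_total_length=24 encodes "more than 23 base pairs"; only whole
    copies of the motif are counted (assumption)."""
    sequence = sequence.upper()
    hits = []
    for k in motif_sizes:
        min_copies = -(-min_total_length // k)  # ceiling division
        # One captured motif followed by at least (min_copies - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


def class_percentages(hits):
    """Percentage of hits falling in each canonical repeat class."""
    counts = Counter(canonical_class(motif) for _, motif, _ in hits)
    total = sum(counts.values())
    return {cls: 100.0 * n / total for cls, n in counts.items()}
```

Applied to each chromosome sequence in turn, `find_ssrs` would yield candidate loci and `class_percentages` a per-class breakdown comparable in form to the percentages quoted above. Imperfect repeats are not detected by this sketch.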