GENOME-WIDE HAPLOTYPE STRUCTURE VISUALIZATION AND ANALYSIS IN MOUSE

Tim Wiltshire1, Serge Batalov2, Mathew Pletcher, R.J. Mural, M.D. Adams, C.F. Fletcher
1timw@gnf.org, GNF; 2batalov@gnf.org, GNF

Serge Batalov1, Mathew T. Pletcher2, Richard J. Mural3, Mark D. Adams3, Colin F. Fletcher1,2, Tim Wiltshire1

1Genomics Institute of the Novartis Research Foundation (GNF), San Diego, CA, 2The Scripps Research Institute, La Jolla, CA, 3Celera Genomics, Rockville, MD, USA

The laboratory mouse is an excellent experimental model, in part because of well-defined strain genealogies, standardized mapping tools, and the availability of sophisticated genetic technologies. Simple sequence length polymorphism (SSLP) markers have, until now, provided the foundation for genetic mapping in the mouse, but single-nucleotide polymorphisms (SNPs) are an attractive alternative: they are by far the most abundant form of variation in the genome and potentially provide a much larger marker set.

To identify a genome-wide panel of SNPs, we selected 2,600 evenly distributed loci (STSs) for sequencing in six common inbred ("laboratory") mouse strains (C57BL/6J, 129S1/SvImJ, C3H/HeJ, DBA/2J, A/J, BALB/cByJ) and two wild-derived strains, M. m. castaneus and M. spretus. Publicly available data were added to form a database containing 18,000+ individual SNPs across 3,950 loci. Pairwise comparison of C57BL/6J with each of the other laboratory strains in this dataset revealed 23 non-polymorphic regions greater than 20 Mb; the largest 'gap' is 61 Mb, on chromosome 10 between C57BL/6J and DBA/2J. We have extended this analysis to all strain comparisons and found that regions of haplotype sharing were in some cases surprisingly large, up to 120 Mb. In most regions no more than three haplotypes were identified for a given interval, suggesting that common laboratory mouse strains represent a genetic mosaic of a small number of founder strains. These results indicate that large but discrete regions of the genome show little polymorphism between particular strain pairs, and thus cannot easily be interrogated by SNPs or SSLPs for natural genetic variation influencing QTLs. SNPview, the interactive navigator for the individual SNPs, SSLPs, alleles and haplotypes projected onto the genomic axis, is available on-line at http://www.gnf.org/SNP/.
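The pairwise gap search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the data layout (a list of positions in Mb, each with a strain-to-allele mapping, for a single chromosome) and the example alleles are all assumptions made for illustration. A "gap" is taken here to be the distance between two consecutive loci at which the two strains differ.

```python
from typing import Dict, List, Tuple

Locus = Tuple[float, Dict[str, str]]  # (position in Mb, {strain: allele}); hypothetical layout


def nonpolymorphic_gaps(loci: List[Locus], strain_a: str, strain_b: str,
                        min_gap_mb: float = 20.0) -> List[Tuple[float, float]]:
    """Return (start, end) intervals, in Mb, between consecutive polymorphic loci
    for the given strain pair whose length exceeds min_gap_mb.

    Assumes all loci lie on one chromosome; chromosome ends are not considered.
    """
    gaps = []
    last_polymorphic = None
    for position, alleles in sorted(loci, key=lambda locus: locus[0]):
        a, b = alleles.get(strain_a), alleles.get(strain_b)
        if a is None or b is None:
            continue  # locus not genotyped in both strains: uninformative
        if a != b:
            if last_polymorphic is not None and position - last_polymorphic > min_gap_mb:
                gaps.append((last_polymorphic, position))
            last_polymorphic = position
    return gaps


# Invented example data, not taken from the SNP database described in the text.
example_loci = [
    (5.0,  {"C57BL/6J": "A", "DBA/2J": "G"}),
    (12.0, {"C57BL/6J": "C", "DBA/2J": "C"}),
    (30.0, {"C57BL/6J": "T", "DBA/2J": "T"}),
    (40.0, {"C57BL/6J": "G", "DBA/2J": "A"}),
    (45.0, {"C57BL/6J": "C", "DBA/2J": "T"}),
]

print(nonpolymorphic_gaps(example_loci, "C57BL/6J", "DBA/2J"))
```

With the invented data, the only interval longer than 20 Mb lies between the polymorphic loci at 5 Mb and 40 Mb. Running the same function over every strain pair would give the all-against-all comparison mentioned in the text.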

The validity of this sampling approach was confirmed by analysis of 70,957 Chr. 16 SNPs identified from Celera's whole-genome shotgun sequencing of three strains. Pairwise comparison of strains across Chr. 16 reveals large intervals of very low SNP density (1 per 50 kb) that switch abruptly to regions of "normal" density (>1 per 5 kb). Analysis of the features of this comprehensive set of Chr. 16 SNPs will be presented.
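The density contrast described above can be made concrete with a short sketch that bins SNP positions into fixed windows and labels each window. The window size, function names and example positions are assumptions chosen for illustration; only the two thresholds (1 per 50 kb and 1 per 5 kb) come from the text, and the "intermediate" label for values between them is likewise an assumption.

```python
from typing import List


def snp_density_per_kb(positions_bp: List[int], chrom_length_bp: int,
                       window_bp: int = 500_000) -> List[float]:
    """Count SNPs in consecutive fixed-size windows and return SNPs per kb.

    The final window is treated as full-length even if the chromosome ends
    inside it, so its density may be slightly underestimated.
    """
    n_windows = -(-chrom_length_bp // window_bp)  # ceiling division
    counts = [0] * n_windows
    for position in positions_bp:
        if 0 <= position < chrom_length_bp:
            counts[position // window_bp] += 1
    return [count / (window_bp / 1000) for count in counts]


def classify_density(density_per_kb: float) -> str:
    """Label a window using the thresholds quoted in the text."""
    if density_per_kb <= 1 / 50:   # at most 1 SNP per 50 kb
        return "low"
    if density_per_kb > 1 / 5:     # more than 1 SNP per 5 kb
        return "normal"
    return "intermediate"          # assumed label; not defined in the text


# Invented positions: 5 SNPs in the first 500 kb, 200 SNPs in the second.
example_positions = [100_000 * i for i in range(5)] + \
                    [500_000 + 2_500 * i for i in range(200)]

densities = snp_density_per_kb(example_positions, chrom_length_bp=1_000_000)
print([classify_density(d) for d in densities])
```

An abrupt change of label between neighbouring windows corresponds to the kind of switch between low-density and "normal"-density regions reported for Chr. 16.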

This study provides a framework in the mouse for investigating the mechanisms that underlie haplotype conservation and the extent of linkage disequilibrium, as well as a powerful toolset for exploiting this knowledge to clone mouse genes from mutant or quantitatively variant strains. Genotyping and haplotype analysis of 24 additional strains are currently under way, using a larger, more uniform set of SNPs.