Issues and principles in the analysis of large genomic datasets
Francis Clark1, Susan Lilley2
1fc@maths.uq.edu.au, Advanced Computational Modelling Centre, University of Queensland, Australia; 2s364202@student.uq.edu.au, School of Information Technology & Electrical Engineering, University of Queensland, Australia.
The construction of "research pipelines" for the study and
analysis of genomic datasets (or similar) is a markedly different
problem from that of constructing "production pipelines". The
latter task is ideally performed by a software engineer, as the
input data and required output are well defined. A research
pipeline is a different sort of beast: it often involves working
with poorly understood data to answer questions that are,
initially, simplistic. This poster outlines some of the strategies
and best practices that may be employed in such work, including
handling and appraisal of the data, choice of appropriate
thresholds, extrapolation, and checking for reasonableness.
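As a minimal illustration of threshold choice combined with a reasonableness check, the Python sketch below sweeps a set of candidate cutoffs over a list of scores and flags any cutoff at which the number of retained items drops sharply. A sharp drop suggests the results are sensitive to that choice and deserve a closer look. This is a sketch under stated assumptions, not the authors' method: the function names (`threshold_sweep`, `sensitive_cutoffs`), the "higher score is better" convention, the example scores and the 0.35 drop fraction are all hypothetical.

```python
from typing import Iterable, Sequence


def threshold_sweep(scores: Sequence[float],
                    thresholds: Iterable[float]) -> list[tuple[float, int]]:
    """Count how many scores meet or exceed each candidate cutoff.

    Assumes (hypothetically) that higher scores are better, so a
    stricter cutoff is a larger number. Cutoffs are returned in
    ascending order.
    """
    return [(t, sum(1 for s in scores if s >= t)) for t in sorted(thresholds)]


def sensitive_cutoffs(sweep: list[tuple[float, int]],
                      max_drop: float = 0.35) -> list[float]:
    """Flag cutoffs where the retained count falls by more than max_drop.

    The drop is measured relative to the count at the previous (less
    strict) cutoff. The default of 0.35 is an arbitrary, illustrative
    choice rather than a recommended value.
    """
    flagged = []
    for (_, prev), (cutoff, kept) in zip(sweep, sweep[1:]):
        # Skip steps where nothing was retained, to avoid dividing by zero.
        if prev > 0 and (prev - kept) / prev > max_drop:
            flagged.append(cutoff)
    return flagged


if __name__ == "__main__":
    # Hypothetical similarity scores for a handful of candidate matches.
    scores = [12.0, 35.5, 40.1, 41.0, 58.3, 60.2, 88.9]
    sweep = threshold_sweep(scores, [30, 40, 50, 60, 70])
    for cutoff, kept in sweep:
        print(f"cutoff {cutoff:>3}: {kept} kept")
    print("sensitive cutoffs:", sensitive_cutoffs(sweep))
```

Reporting the whole sweep instead of a single filtered result keeps the threshold decision visible. If a conclusion holds only at one cutoff, it is worth re-examining before anything is extrapolated from it.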