Missing data are common in DNA sequences obtained through high-throughput sequencing. Furthermore, samples of low quality or problems in the experimental protocol often cause a loss of data even with traditional sequencing technologies. Here we propose modified estimators of variability and neutrality tests that can be naturally applied to sequences with missing data, without the need to remove bases or individuals from the analysis. Modified statistics include the Watterson estimator θW, Tajima's D, Fay and Wu's H, and HKA. We develop a general framework to take missing data into account in frequency spectrum-based neutrality tests and we derive the exact expression for the variance of these statistics under the neutral model. The neutrality tests proposed here can also be used as summary statistics to describe the information contained in other classes of data like DNA microarrays.
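As a rough illustration of how missing data can be handled without discarding bases or individuals, the sketch below computes a per-site Watterson-type estimate in which each site contributes its own harmonic factor based on the number of samples actually observed at that site. This is a minimal sketch under stated assumptions, not necessarily the exact estimator derived in the paper: the function names, the choice of missing-data symbols (`N`, `-`, `?`), and the rule of skipping sites with fewer than two observed bases are all hypothetical choices made for the example.

```python
def harmonic(n):
    """Return a_n = sum_{i=1}^{n-1} 1/i, the usual Watterson factor."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta_missing(seqs, missing="N-?"):
    """Per-site Watterson-type estimate allowing missing bases.

    Assumption (for illustration only): theta = S / sum_x a_{n_x},
    where n_x is the number of non-missing bases at site x and S is
    the number of segregating sites among the usable sites.
    """
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")

    segregating = 0
    denominator = 0.0
    for column in zip(*seqs):
        # Keep only the bases that were actually observed at this site.
        observed = [b.upper() for b in column if b.upper() not in missing]
        n_x = len(observed)
        if n_x < 2:
            continue  # a site with fewer than two bases carries no information
        denominator += harmonic(n_x)
        if len(set(observed)) > 1:
            segregating += 1

    if denominator == 0.0:
        raise ValueError("no site has at least two observed bases")
    return segregating / denominator


# Toy alignment: the third sequence has a missing base at the first site.
toy = ["ACGT", "ACGA", "NCGA"]
theta = watterson_theta_missing(toy)
```

With complete data every site has the same sample size n, so the expression reduces to the familiar S / (L a_n) for an alignment of L sites.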

Original publication

DOI

10.1534/genetics.112.139949

Type

Journal

Genetics

Publication Date

08/2012

Volume

191

Pages

1397-1401

Addresses

Centre for Research in Agricultural Genomics, 08193 Bellaterra, Spain. luca.ferretti@uab.cat

Keywords

Data Interpretation, Statistical; Genetics, Population; Gene Frequency; Algorithms; Models, Genetic; Computer Simulation; Genetic Variation