Genotyping-by-sequencing (GBS) is a high-throughput sequencing technology developed in the last decade for obtaining genetic information. The advantage of GBS over previous technologies is that it generates large amounts of data in a cost- and time-efficient manner and is readily transferable to a broad range of species. However, data generated from GBS contains more errors that, if not accounted for, result in substantial bias in the estimation of genetic quantities. Many approaches for mitigating these errors involve generating more data per individual and discarding data with uncertain information, which is an inefficient use of resources and data. The goal of this thesis is to develop new statistical methods for analysing GBS data in a range of genetic analyses, focusing on the use of a binomial-type sampling model to account for errors in the data.
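To illustrate the kind of error a binomial-type sampling model can account for, the following is a minimal sketch, assuming a diploid individual, reads drawn independently from the two alleles with equal probability, and a symmetric per-read sequencing error rate. The function names and the default `error_rate` value are hypothetical and chosen for illustration only; this is not necessarily the exact model developed in the thesis.

```python
from math import comb


def read_count_likelihood(ref_reads, depth, genotype, error_rate=0.01):
    """Binomial probability of seeing `ref_reads` reference reads out of
    `depth` total reads, given the true diploid genotype.

    `genotype` is the number of reference alleles carried (0, 1 or 2).
    `error_rate` is an assumed symmetric per-read miscall probability.
    """
    # Probability that a read is drawn from a reference allele.
    p_true = genotype / 2
    # Probability that a read is *observed* as reference after errors.
    p_ref = p_true * (1 - error_rate) + (1 - p_true) * error_rate
    return (comb(depth, ref_reads)
            * p_ref ** ref_reads
            * (1 - p_ref) ** (depth - ref_reads))


def prob_het_looks_homozygous(depth):
    """Probability that a true heterozygote shows reads for only one allele."""
    return (read_count_likelihood(0, depth, genotype=1)
            + read_count_likelihood(depth, depth, genotype=1))


if __name__ == "__main__":
    for depth in (1, 2, 4, 8):
        print(depth, prob_het_looks_homozygous(depth))
```

Under these assumptions a heterozygote sequenced at depth 2 shows only one of its alleles half of the time, which is why naively calling genotypes from low-depth reads biases downstream estimates, and why modelling the read counts directly avoids discarding such data.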
The first type of analysis we consider is the construction of genetic linkage maps, which are one-dimensional representations of genetic inheritance along a chromosome. Linkage maps are important, as they form the starting point for many downstream genetic analyses. These maps are often constructed using a hidden Markov model (HMM) framework. Existing HMMs were developed for data generated from previous technologies and are unsuitable for GBS data. In this thesis, we extend current HMMs to GBS data. In benchmarks against current methods, simulation results show that our model provides accurate estimates of the model parameters, while an analysis of a GBS dataset shows that our method reduces the inflation in map length caused by errors.
We explore fitting the HMMs for linkage maps using a Bayesian framework to aid characterization of the uncertainty of various genetic parameters. An analysis of a GBS dataset highlights the utility of the Bayesian approach for producing credible intervals on a range of genetic quantities but shows that there are issues with selecting priors when making inference on genetic map distances. We also investigate extending the HMM for GBS data into a hierarchical modelling framework, as such models often improve parameter estimation, and results show that this extension circumvents the issues with selecting priors under the Bayesian approach. Results from a simulation study show that the hierarchical HMM improves the mean squared error of parameter estimates, but also highlight some under-appreciated properties of hierarchical models.
The second type of analysis we consider is estimating genetic relatedness, which is the extent to which genetic information within or between individuals is derived from common ancestry. We derive two new estimators of genetic relatedness in polyploid species with GBS data that extend existing method-of-moments estimators developed for diploid species. Our derivations and results reveal that methods for diploid species and their properties can be extended to autopolyploids in the context of relatedness estimation. Simulation results show that our new estimators provide accurate estimates of relatedness with polyploid GBS data under some scenarios, but highlight that these estimators have different properties. An analysis of a GBS dataset shows that one important application of these estimators is the detection of potential errors in recorded pedigrees.
Rights statement
This is an open-access output. It may be used, distributed or reproduced in any medium, provided the original author and source are credited.
Language
English
Does this contain Māori information or data?
No
Publisher
University of Otago
Citation
Bilton, T. (2020). Developing statistical methods for genetic analysis of genotypes from genotyping-by-sequencing data. (Thesis, Doctor of Philosophy). University of Otago.