Correcting duplication effects in sequencing-based genotypes

conference contribution
posted on 2024-01-15, 20:06 authored by Ken Dodds, Alan McCulloch, Rudiger Brauning, Tracey Van Stijn, Shannon Clarke

Most pipelines for calling genotypes or providing allelic counts from sequencing data assume that each of the possible alleles has been sampled at random. For a biallelic SNP, this corresponds to binomial sampling with the observed read depth as the sample size. However, the processes that generate sequence reads can sometimes produce duplication events, in which a particular sequence is copied and read multiple times. Duplicated reads inflate the variation in allele counts beyond what the binomial model predicts, so heterozygous genotypes can be falsely inferred as homozygous. A common practice with randomly sheared DNA fragments is to remove exact duplicate reads, but for restriction enzyme-based reduced representation sequencing (RE-RRS), where reads from a locus start at the same cut site, some exact duplicate reads are expected. We investigate duplication effects in RE-RRS data generated on an Illumina NovaSeq 6000. Duplication events specific to patterned flowcells produce duplicates that tend to be spatially close to the original read, which allows bioinformatic deduplication by unsupervised spatial clustering of candidate duplicates. We have found that this may need to be complemented by statistical models that allow for extra-binomial variation. Model parameters can be estimated from results on parents and their offspring, or from multiple results on the same individual. Combining these two approaches supports suitable downstream analyses despite the presence of duplicates in the sequencing results.
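To illustrate why duplicates bias calls towards homozygotes: under binomial sampling, a heterozygote sequenced at depth n shows only one allele with probability 2 × (1/2)^n, which is 2/64 ≈ 3.1% at depth 6. The sketch below contrasts that binomial model with a beta-binomial model, used here as one common way to allow for extra-binomial variation. The abstract does not state which model the authors use, and the function names, the overdispersion parameter `rho` and the per-read error rate `err` are hypothetical illustrations. In the work described, such parameters would be estimated from parent-offspring or replicate results rather than fixed by hand as they are here.

```python
from math import comb, exp, lgamma
from typing import Optional


def log_beta(a: float, b: float) -> float:
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k reads of allele A out of n) under plain binomial sampling."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def beta_binomial_pmf(k: int, n: int, p: float, rho: float) -> float:
    """Beta-binomial P(k | n) with mean proportion p and overdispersion rho.

    rho is the intra-class correlation between reads, so the variance is
    n p (1 - p) [1 + (n - 1) rho]; as rho -> 0 this approaches the binomial.
    """
    if not 0 < rho < 1:
        raise ValueError("rho must lie strictly between 0 and 1")
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    scale = 1 / rho - 1
    a, b = p * scale, (1 - p) * scale
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))


def genotype_likelihoods(
    ref: int, alt: int, rho: Optional[float] = None, err: float = 0.01
) -> dict[str, float]:
    """Likelihood of observed (ref, alt) read counts under AA, AB and BB.

    Hypothetical helper: err is an assumed per-read error rate, and
    rho=None means the plain binomial model is used.
    """
    n = ref + alt
    # Expected proportion of reference (A) reads for each genotype.
    props = {"AA": 1 - err, "AB": 0.5, "BB": err}
    likelihoods = {}
    for genotype, p in props.items():
        if rho is None:
            likelihoods[genotype] = binomial_pmf(ref, n, p)
        else:
            likelihoods[genotype] = beta_binomial_pmf(ref, n, p, rho)
    return likelihoods


if __name__ == "__main__":
    # Six reference reads and no alternative reads at a SNP.
    for label, rho in [("binomial", None), ("beta-binomial, rho=0.1", 0.1)]:
        lik = genotype_likelihoods(ref=6, alt=0, rho=rho)
        ratio = lik["AA"] / lik["AB"]
        print(f"{label}: AA/AB likelihood ratio = {ratio:.1f}")
```

With six reference reads and no alternative reads, the AA:AB likelihood ratio is about 60 under the binomial model but only about 21 under the beta-binomial with `rho = 0.1`. Allowing for extra-binomial variation therefore makes the caller less overconfident in a homozygous call when duplicates may be present.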


Rights statement

This is an open-access output. It may be used, distributed or reproduced in any medium, provided the original author and source are credited.

Project number

  • 49050


Language

  • English

Does this contain Māori information or data?

  • No


AgResearch Ltd

Conference name

New Zealand Molecular Mapping Workshop (MapNet 2023)

Conference location

Dunedin, New Zealand
