TY - JOUR
T1 - 2x genomes - depth does matter
AU - Milinkovitch, Michel
AU - Helaers, Raphaël
AU - Depiereux, Eric
AU - Tzika, Athanasia
AU - Gabaldon, Toni
PY - 2009
Y1 - 2009
N2 - Background
Given the availability of full genome sequences, mapping gene gains, duplications, and losses during evolution should theoretically be straightforward. However, this endeavor suffers from overemphasis on detecting conserved genome features, which in turn has led to sequencing multiple eutherian genomes with low coverage rather than fewer genomes with high coverage and a more even distribution in the phylogeny. Although limitations associated with analysis of low-coverage genomes are recognized, they have not been quantified.
Results
Here, using recently developed comparative genomic application systems, we evaluate the impact of low-coverage genomes on inferences pertaining to gene gains and losses when analyzing eukaryote genome evolution through gene duplication. We demonstrate that, when inferring genome content evolution, low-coverage genomes generate not only a massive number of false gene losses, but also striking artifacts in gene duplication inference, especially at the most recent common ancestor of low-coverage genomes. We show that the artifactual gains are caused by the low coverage of genome sequence per se rather than by the increased taxon sampling in a biased portion of the species tree.
Conclusions
We argue that it will remain difficult to differentiate artifacts from true changes in modes and tempo of genome evolution until there is better homogeneity in both taxon sampling and high-coverage sequencing. This is important for broadening the utility of full genome data to the community of evolutionary biologists, whose interests go well beyond widely conserved physiologies and developmental patterns as they seek to understand the generative mechanisms underlying biological diversity.
AB - Background
Given the availability of full genome sequences, mapping gene gains, duplications, and losses during evolution should theoretically be straightforward. However, this endeavor suffers from overemphasis on detecting conserved genome features, which in turn has led to sequencing multiple eutherian genomes with low coverage rather than fewer genomes with high coverage and a more even distribution in the phylogeny. Although limitations associated with analysis of low-coverage genomes are recognized, they have not been quantified.
Results
Here, using recently developed comparative genomic application systems, we evaluate the impact of low-coverage genomes on inferences pertaining to gene gains and losses when analyzing eukaryote genome evolution through gene duplication. We demonstrate that, when inferring genome content evolution, low-coverage genomes generate not only a massive number of false gene losses, but also striking artifacts in gene duplication inference, especially at the most recent common ancestor of low-coverage genomes. We show that the artifactual gains are caused by the low coverage of genome sequence per se rather than by the increased taxon sampling in a biased portion of the species tree.
Conclusions
We argue that it will remain difficult to differentiate artifacts from true changes in modes and tempo of genome evolution until there is better homogeneity in both taxon sampling and high-coverage sequencing. This is important for broadening the utility of full genome data to the community of evolutionary biologists, whose interests go well beyond widely conserved physiologies and developmental patterns as they seek to understand the generative mechanisms underlying biological diversity.
M3 - Article
VL - 11
JO - Genome Biology
JF - Genome Biology
ER -