Data from: Optimisation of next generation sequencing transcriptome annotation for species lacking sequenced genomes

Main Authors: Ockendon, Nina F., O'Connell, Lauren A., Bush, Stephen J., Monzon-Sandoval, Jimena, Barnes, Holly, Székely, Tamás, Hofmann, Hans A., Dorus, Steve, Urrutia, Araxi O.
Format: Dataset
Published: 2015
Online Access: https://zenodo.org/record/4964848
Contents:
  • Next generation sequencing methods, such as RNA-seq, have permitted the exploration of gene expression in a range of organisms that have been studied in ecological contexts but lack a sequenced genome. However, the efficacy and accuracy of RNA-seq annotation methods that use reference genomes from related species have yet to be robustly characterised. Here we conduct a comprehensive power analysis employing RNA-seq data from Drosophila melanogaster in conjunction with 11 additional genomes from related Drosophila species to compare annotation methods and quantify the impact of evolutionary divergence between the transcriptome and the reference genome. Our analyses demonstrate that, regardless of the level of sequence divergence, direct genome mapping, in which transcript short reads are aligned directly to the reference genome, significantly outperforms the widely used de novo and guided assembly-based methods in both the quantity and accuracy of gene detection. Our analysis also reveals that direct genome mapping recovers a more representative profile of Gene Ontology functional categories, which are often used to interpret emergent patterns in genome-wide expression analyses. Lastly, analysis of available primate RNA-seq data demonstrates that our observations apply across diverse taxa. Our quantification of annotation accuracy, and of the reduced gene detection associated with sequence divergence, thus provides empirically derived guidelines for the design of future gene expression studies in species without sequenced genomes.
  • Dmel_denovo_assembly_VelvetOases.fa: D. melanogaster de novo assembly constructed using Velvet Oases
  • Dros_denovo_assembly_Trinity.fa: D. melanogaster de novo assembly constructed using Trinity
  • Drosophila guided assemblies (ggv.tgz): guided assemblies constructed using Velvet Columbus from D. melanogaster RNA-seq reads, guided by the genome of each of the other 11 Drosophila species
  • human-RNAseq_mapto_Gorilla-gorilla: human RNA-seq reads aligned to the G. gorilla genome
  • human-RNAseq_mapto_Pan-troglodytes: human RNA-seq reads aligned to the P. troglodytes genome
  • human-RNAseq_mapto_Pongo-abelii: human RNA-seq reads aligned to the P. abelii genome
  • human-RNAseq_mapto_Macaca-mulatta: human RNA-seq reads aligned to the M. mulatta genome
  • human-RNAseq_mapto_total-human: human RNA-seq reads aligned to the human genome
  • dmel_RNA-seq_mapto_dana_sam: D. melanogaster RNA-seq reads aligned to the D. ananassae genome
  • dmel_RNA-seq_mapto_dere_sam: D. melanogaster RNA-seq reads aligned to the D. erecta genome
  • dmel_RNA-seq_mapto_dgri_sam: D. melanogaster RNA-seq reads aligned to the D. grimshawi genome
  • dmel_RNA-seq_mapto_dmel_sam: D. melanogaster RNA-seq reads aligned to the D. melanogaster genome
  • dmel_RNA-seq_mapto_dmoj_sam: D. melanogaster RNA-seq reads aligned to the D. mojavensis genome
  • dmel_RNA-seq_mapto_dper_sam: D. melanogaster RNA-seq reads aligned to the D. persimilis genome
  • dmel_RNA-seq_mapto_dpse_sam: D. melanogaster RNA-seq reads aligned to the D. pseudoobscura genome
  • dmel_RNA-seq_mapto_dsec_sam: D. melanogaster RNA-seq reads aligned to the D. sechellia genome
  • dmel_RNA-seq_mapto_dsim_sam: D. melanogaster RNA-seq reads aligned to the D. simulans genome
  • dmel_RNA-seq_mapto_dvir_sam: D. melanogaster RNA-seq reads aligned to the D. virilis genome
  • dmel_RNA-seq_mapto_dwil_sam: D. melanogaster RNA-seq reads aligned to the D. willistoni genome
  • dmel_RNA-seq_mapto_dyak_sam (dmel_RNA-seq_mapto_dyak.sam.out.gz): D. melanogaster RNA-seq reads aligned to the D. yakuba genome
  • Figure_data_files: data text files for all figures
README.txt contains details of the contents.
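The dmel_RNA-seq_mapto_*_sam files hold D. melanogaster reads aligned to each Drosophila genome. Comparing how many reads map to each reference is a quick way to explore these files. The sketch below tallies primary mapped and unmapped records. It assumes the files are standard tab-delimited SAM text, which may be gzip-compressed as the .sam.out.gz suffix suggests. It is not the authors' pipeline, and the function names are hypothetical, so check README.txt for the actual file layout. It relies only on the SAM FLAG bits for "unmapped" (0x4), "secondary" (0x100) and "supplementary" (0x800).

```python
import gzip
from collections import Counter
from typing import Iterable, TextIO, Tuple

# SAM FLAG bits (SAM format specification)
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_mapped_reads(lines: Iterable[str]) -> Tuple[Counter, int, int]:
    """Tally primary SAM records: mapped reads per reference name (RNAME),
    total mapped and total unmapped. Hypothetical helper for illustration."""
    per_ref: Counter = Counter()
    mapped = 0
    unmapped = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        if flag & UNMAPPED:
            unmapped += 1
        else:
            mapped += 1
            per_ref[fields[2]] += 1  # RNAME is the third column
    return per_ref, mapped, unmapped


def open_sam(path: str) -> TextIO:
    """Open a plain or gzip-compressed SAM file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


if __name__ == "__main__":
    # Made-up records for illustration; real contig names will differ.
    sample = [
        "@HD\tVN:1.6\n",
        "r1\t0\tchr2L\t100\t60\t50M\t*\t0\t0\tACGT\t*\n",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*\n",
        "r3\t16\tchr3R\t500\t60\t50M\t*\t0\t0\tACGT\t*\n",
        "r3\t256\tchrX\t900\t0\t50M\t*\t0\t0\tACGT\t*\n",
    ]
    per_ref, mapped, unmapped = count_mapped_reads(sample)
    print(per_ref, mapped, unmapped, mapped / (mapped + unmapped))
```

To apply it to a real file, pass the handle from `open_sam(path)` to `count_mapped_reads` inside a `with` block. Repeating this for each `_sam` file gives one mapping rate per reference genome. Paired-end mates are counted as separate records here. Secondary and supplementary records are skipped, so multi-mapping reads are not counted twice.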