Table of Contents: TEST DATA for Enhanced protein isoform characterization through long-read proteogenomics

Main Authors:	Miller, Rachel; Jordan, Ben; Jeffery, Erin; Mehlferber, Madison; Chatzipantsiou, Christina; Deslattes Mays, Anne; Shortreed, Michael; Millikin, Robert; Smith, Lloyd; Tiberi, Simone; Conesa, Ana; Sheynkman, Gloria
Format:	Article Journal
Published:	2021
Subjects:	Long-read RNA-seq PacBio SQANTI Protein Iso-Seq RNA-Seq protein parsimony ORF gene annotation multi-omics transcriptomics proteomics FAIR
Online Access:	https://zenodo.org/record/5081284

Table of Contents:

Test data for "Enhanced protein isoform characterization through long-read proteogenomics". The detection of physiologically relevant protein isoforms encoded by the human genome is critical to biomedicine. Mass spectrometry (MS)-based proteomics is the preeminent method for protein detection, but isoform-resolved proteomic analysis relies on accurate reference databases that match the sample; neither a subset nor a superset database is ideal. Long-read RNA sequencing (e.g., PacBio, Oxford Nanopore) provides full-length transcript sequencing, which can be used to predict full-length proteins. Here, we describe a long-read proteogenomics approach for integrating matched long-read RNA-seq and MS-based proteomics data to enhance isoform characterization. We introduce a classification scheme for protein isoforms, discover novel protein isoforms, and present the first protein inference algorithm that directly incorporates long-read transcriptome data, enabling detection of protein isoforms that are otherwise intractable to MS detection. We have released an open-source Nextflow pipeline that integrates long-read sequencing in a proteomic workflow for isoform-resolved analysis.