Drosophila Pathways

GEP Annotation Protocol

We’re excited to announce this preprint as it moves us one step closer to rolling out microPublications for the Pathways Project!

Abstract: Annotating the genomes of multiple organisms allows us to study their genes as well as the evolution of those genes. While many eukaryotic genome assemblies already include computational gene predictions, these predictions can benefit from review and refinement through manual gene annotation. The Genomics Education Partnership (GEP; thegep.org) has developed an annotation protocol for protein-coding genes that enables undergraduate students and other researchers to create high-quality gene annotations that can be utilized in subsequent scientific investigations. For example, this protocol has been utilized by the GEP faculty to engage undergraduate students in the comparative annotation of genes involved in the insulin signaling pathway in 28 Drosophila species, using D. melanogaster as the informant genome. Students construct gene models using multiple lines of computational and experimental evidence including expression data (e.g., RNA-Seq), sequence similarity (e.g., BLAST, multiple sequence alignments), and computational gene predictions. For quality control, each gene is annotated by at least two students working independently, followed by reconciliation of the submitted gene models by a more experienced student. This article provides an overview of the annotation protocol and describes how discrepancies in student submitted gene models are resolved to produce a final, high-quality gene set suitable for subsequent analyses. This annotation protocol can be adapted to other scientific questions (e.g., expansion of the Drosophila Muller F element) and other species (e.g., parasitoid wasps) to provide additional opportunities for undergraduate students to participate in genomics research. These student annotation efforts can substantially improve the quality of gene annotations in publicly available genomic databases.
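The quality-control step described in the abstract (two independent student models, followed by reconciliation) can be illustrated with a minimal sketch. This is not the GEP's actual tooling: the `GeneModel` class, the `compare_models` helper, the isoform label, and all coordinates below are hypothetical, and the sketch assumes each submitted model is reduced to a list of coding exon (start, end) coordinates. It only flags where two models disagree; deciding which model is correct still requires reviewing the evidence, as the protocol describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    """A simplified, hypothetical stand-in for one student's submitted model."""
    annotator: str
    isoform: str
    cds: tuple  # coding exons as (start, end) pairs; coordinates are illustrative


def compare_models(first, second):
    """Split the coding exons of two models into shared and model-specific sets."""
    if first.isoform != second.isoform:
        raise ValueError("models must describe the same isoform")
    exons_first, exons_second = set(first.cds), set(second.cds)
    return {
        "shared": sorted(exons_first & exons_second),
        "only_first": sorted(exons_first - exons_second),
        "only_second": sorted(exons_second - exons_first),
    }


def needs_reconciliation(first, second):
    """Two models agree only if they contain exactly the same coding exons."""
    report = compare_models(first, second)
    return bool(report["only_first"] or report["only_second"])


# Made-up example: the two students disagree on the start of the second exon.
student_1 = GeneModel("student_1", "isoform-A", ((100, 250), (400, 620)))
student_2 = GeneModel("student_2", "isoform-A", ((100, 250), (410, 620)))

report = compare_models(student_1, student_2)
print(report)
print("Needs reconciliation:", needs_reconciliation(student_1, student_2))
```

In this sketch, an exon counts as shared only when both coordinates match exactly, so a disagreement at a single splice site is reported as one exon unique to each model. That is the kind of discrepancy a reconciler would then check against the RNA-Seq and sequence-similarity evidence.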

Final gene model for Akt1 in D. arizonae, shown with the submitted student models and the RNA-Seq data aligned to the region. Although RefSeq predicts only a single isoform for this protein-coding gene, the final model, which was annotated using multiple lines of evidence, indicates that the gene likely has two protein-coding isoforms. In the reconciled gene model, the second isoform has a larger coding region, which the RefSeq gene predictor misses.

Rele CP, Sandlin KM, Leung W, Reed LK. Manual Annotation of Genes within Drosophila Species: the Genomics Education Partnership protocol. bioRxiv 2020.12.10.420521; doi: https://doi.org/10.1101/2020.12.10.420521.

Pathways Project: Annotation Workflow

The Annotation Workflow is a one-page summary of the annotation protocol for the Pathways Project. This workflow provides an overview of the key analysis steps and bioinformatics tools used to annotate a putative ortholog.

Pathways Project: Annotation Notebook

The Pathways Annotation Notebook helps GEP students keep track of their work as they annotate; they can then use the notebook to fill out the report form. The notebook does not have to be submitted to GEP; it is simply an extra resource that students might find helpful.

Pathways Project: Annotation Report

GEP students will use the Report Form document to submit their annotation results for the Pathways Project. The Project Details Table handout shows students how to fill out the first page of the Annotation Report. The Report Form Exemplar is provided as an example of a completed report. The Annotation Workflow is a one-page summary of the annotation protocol.

Pathways Project: Annotation Walkthrough

The Pathways Project focuses on annotating genes found in well-characterized signaling and metabolic pathways across the Drosophila genus. This walkthrough illustrates how to apply the GEP annotation strategy for the Pathways Project to construct a gene model for the Ras homolog enriched in brain (Rheb) gene in Drosophila yakuba.

Beta Version of Pathways Project Walkthrough is Live

We are very excited to release the beta version of the newest piece of GEP curriculum, the Drosophila Pathways Project: Annotation Walkthrough! Katie has been working really hard to bring this into being. We appreciate the indispensable help from the GEP members who shared their own curriculum and personal observations with us and piloted our earlier drafts. In particular, we would like to acknowledge the contributions of Andy Arsham, Indi Bose, John Braverman, Amy Hark, Shan Hays, Jennifer Kennell, Lindsey Long, Juan Carlos Martinez-Cruzado, Mollie Manier, Chinmay Rele, Joyce Stamm, Jeff Thompson, Jacqueline Wittke-Thompson, and Jim Youngblom (with apologies to anyone we may have missed). Extra special thanks go to Alexa Sawa and her students, who gave extensive feedback on the penultimate version of this curriculum by using it in Alexa’s January term research course. As always, Wilson was also of critical help in shaping the walkthrough and its contents and in developing the supporting tools.

This walkthrough illustrates how to apply the GEP annotation strategy for the Pathways Project to construct a gene model for the Ras homolog enriched in brain (Rheb) gene in Drosophila yakuba.

Visit the Drosophila Pathways Project page for more information.