
Pathways Project

The Pathways Project uses network analysis approaches to better understand the evolution and function of biological pathways. The current focus is on annotating genes within the insulin signaling pathway across the Drosophila genus.

Biological systems are networks, and in these networks, we can define nodes (e.g., genes, proteins, metabolites) connected through edges (e.g., enzymatic/chemical reactions, transcription regulation). Networks have properties that can be measured using a mathematical approach, and we can make predictions about the evolution of a system based on some of those properties.
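As a minimal sketch of what measuring a network property can look like, the Python snippet below stores a small network as a list of edges and counts each node's degree (its number of connections). The node names and edges are a hypothetical toy example chosen for illustration; they are not curated pathway data.

```python
from collections import defaultdict

# Hypothetical toy network: each pair is one edge between two nodes.
# These names and connections are illustrative only, not real pathway data.
edges = [
    ("gene1", "gene2"),
    ("gene2", "gene3"),
    ("gene2", "gene4"),
    ("gene4", "gene5"),
]


def node_degrees(edge_list):
    """Return a dict mapping each node to its number of edges (degree)."""
    degrees = defaultdict(int)
    for source, target in edge_list:
        degrees[source] += 1
        degrees[target] += 1
    return dict(degrees)


degrees = node_degrees(edges)
# The most connected node in this toy network.
hub = max(degrees, key=degrees.get)
print(degrees)
print("Most connected node:", hub)
```

Degree is only one of many measurable properties; it is used here because it is the simplest to compute by hand and check.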

A “pathway” in a biological system can be defined as a relatively discrete (though never completely isolated) portion of a network. Generally, we view a pathway as a sequence of gene regulatory and enzymatic reactions that produces an important biological outcome (e.g., synthesizing an energy storage molecule, or sensing and regulating blood sugar levels).

In this project, we will use network analysis approaches to better understand the evolution and function of biological pathways. The Pathways Project is focused on annotating genes found in well-characterized signaling and metabolic pathways across the Drosophila genus. The current focus is on the insulin signaling pathway, which is well conserved across animals and critical to growth and metabolic homeostasis. The long-term goal of the Pathways Project is to analyze how the regulatory regions of genes evolve in the context of their positions within a network, and we anticipate that other pathways will eventually be part of the analyses.

About

Pathways Project Overview provided by the Project Leader, Laura K. Reed (6 minutes) Slideset


Project Curriculum

Pathways Project: Annotation Walkthrough

This walkthrough illustrates how to apply the GEP annotation strategy for the Pathways Project to construct a gene model for the Ras homolog enriched in brain (Rheb) gene in Drosophila yakuba.

Pathways Project: Annotation Form

The Annotation Form merges the “Annotation Report” and the “Annotation Notebook” into a single document; the latter two items are now archived.

Pathways Project: Annotation Form Exemplar

The Annotation Form Exemplar is provided as an example of a completed Annotation Form ready for submission to the GEP’s Pathways Project. The optional questions were omitted from the exemplar.

Pathways Project: Annotation Form D. pseudoobscura Key

Students can apply what they learned in the Annotation Walkthrough to construct a gene model for Rheb in D. pseudoobscura by completing the Pathways Project: Annotation Form. This answer key is provided to assist instructors in checking the accuracy of the annotation and includes potential areas of confusion throughout.

Genomic Neighborhood Check For Understanding

This check for understanding was created after a member reported that their students struggled with the genomic neighborhood, and that the member did not realize this until the students were already too far into the annotation to correct their misconceptions. It is meant to be a quick in-class and/or homework assignment.

Synteny Introduction Slides

This resource is a slide deck offering an expanded introduction to synteny. Instructors are encouraged to use and modify it to fit the needs of specific courses and students.

Pathways Project Primer

This PowerPoint presentation provides a primer on the recommended annotation strategy for the Pathways Project. The presentation provides an overview of the goals of the Pathways Project annotations, an introduction to RNA-Seq and web databases, and a discussion of the phases of the splice donor and acceptor sites.

Sequence Similarity Introduction

This lesson and exercise defines similarity in a non-biological and biological sense, quantifies the similarity between two sequences, explains how a substitution matrix is used to quantify similarity, calculates amino acid similarity scores using the BLOSUM 62 substitution matrix, explains how BLAST detects similarity between two sequences and how to use

Prerequisite Curriculum

Module 4. Removal of introns from pre-mRNA by splicing

This module uses mRNA data to identify splice sites. After completing this module students will be able to identify intron-exon boundaries using canonical splice donor and acceptor sequences and determine which are best supported by RNA-Seq and TopHat splice junction predictions.

Module 5. Translation: The need for an Open Reading Frame

In this module students will learn how mRNA is translated into a string of amino acids. After completing this module students will be able to determine the codons for specific amino acids as well as start and stop codons. They will be able to identify open reading frames for a given gene, define the phases of splice donor and acceptor sites and describe how they impact the maintenance of the open reading frame.
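The link between splice site phases and the open reading frame can be sketched in a few lines of Python. This sketch assumes the convention that the donor phase is the number of bases left over after the last complete codon of an exon, and the acceptor phase is the number of bases before the first complete codon of the next exon; under that convention the two phases must sum to 0 or 3 to keep the frame. The function names and exon lengths are hypothetical and are not taken from the module.

```python
def donor_phase(cds_length, acceptor_phase=0):
    """Bases left over after the last complete codon in a coding exon.

    cds_length: number of coding bases in the exon.
    acceptor_phase: bases at the start of the exon that finish a codon
    begun in the previous exon (0 for the first coding exon).
    """
    return (cds_length - acceptor_phase) % 3


def phases_compatible(donor, acceptor):
    """The reading frame is kept when the split codon totals 0 or 3 bases."""
    return donor + acceptor in (0, 3)


# Hypothetical first coding exon with 100 coding bases:
# 33 complete codons plus 1 leftover base, so the donor phase is 1.
first_exon_donor = donor_phase(100)

# The next exon must supply the 2 missing bases of the split codon.
print(first_exon_donor)
print(phases_compatible(first_exon_donor, 2))  # frame maintained
print(phases_compatible(first_exon_donor, 1))  # frame shifted
```

Passing the acceptor phase of an exon into `donor_phase` lets the same check be chained along every intron of a multi-exon gene model.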

Module 6. Alternative splicing

This module explores how multiple different mRNAs and polypeptides can be encoded by the same gene. After completing this module students will be able to explain how alternative splicing of a gene can lead to different mRNAs and illustrate how alternative splicing can lead to the production of different polypeptides and result in drastic changes in phenotype.

An Introduction to NCBI BLAST

This walkthrough serves as an introduction to key functionalities of NCBI BLAST. Materials: Exercise, Worksheet, Answer Key, and Package without Answers.

RNA-Seq Primer

This PowerPoint presentation provides a brief introduction to the different types of RNA-Seq evidence tracks (e.g. Bowtie, TopHat, Cufflinks) that are on the GEP UCSC Genome Browser.

Useful References

FAQs

In this example, Ilp6 is within the intron of Raf-PE; however, Raf-PA is upstream of Ilp6.

We define gene order based on the first/closest protein-coding exon only. If a gene is nested in an intron that lies between two non-coding exons, we ignore those UTRs and define gene order based on the coding exons alone. If a gene is nested in an intron between two coding exons of another gene, we describe that as nesting. So in this example, Raf is upstream of Ilp6.

The Genome Browser Gateway should default to the correct assembly once you click on the Drosophila species in the left-hand table. To double-check that you are using the correct one, see which assembly is listed in the “Genome Browsers” column of the Pathways Project Genome Assemblies web page.

For example, D. yakuba has three assembly options to choose from, and according to the Genome Assemblies page, we should use the “Aug. 2021 (Princeton Prin_Dyak_Tai18E2_2.1/ DyakRefSeq3)” assembly when annotating D. yakuba.

  • A sequencing error must first be validated by performing a tblastn search of the region in question against another assembly; it is rare for the same sequencing/assembly error to be present in two distinct assemblies.
  • Once the sequencing error has been validated, you can use the Sequence Updater to make a VCF file.
    • More information on how to use the Sequence Updater can be found in its User Guide.
    • If you are unsure of how to generate the VCF file/how to identify the bases that you think are errors, please get in touch with the GEP Virtual TAs.
  • When using the Gene Model Checker, under “Model Details > Errors in Consensus Sequence”, select “Yes”, and upload the VCF file.
  • Validate the model as usual, and then submit the VCF file during submission along with the Annotation Form, the PEP, FASTA, and GFF.

To name a novel isoform, follow these steps:
  • Identify the last named isoform for the gene.
  • Add “-PN” (for putative/protein novel) to the end of the gene name.
  • Add the letter following the letter of the last named isoform.

For example, if the last named isoform for a gene is “first-PJ,” the novel isoform for that gene would be named “first-PNK.”
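
The naming steps above can be sketched as a small Python helper. The function name `novel_isoform_name` is hypothetical, and the sketch assumes the last named isoform ends in “-P” followed by a single uppercase letter, as in the example; names that do not fit that pattern are rejected rather than guessed at.

```python
import string


def novel_isoform_name(last_named_isoform):
    """Build a novel isoform name from the last named isoform.

    Assumes a name such as 'first-PJ': gene name, '-P', one uppercase letter.
    """
    gene, separator, letter = last_named_isoform.rpartition("-P")
    if not gene or not separator:
        raise ValueError(f"Expected a name like 'gene-PA', got {last_named_isoform!r}")
    if len(letter) != 1 or letter not in string.ascii_uppercase:
        raise ValueError(f"Expected one uppercase isoform letter, got {letter!r}")
    if letter == "Z":
        # The steps above do not say what follows Z, so this sketch stops here.
        raise ValueError("No naming rule is given for an isoform after 'Z'")
    next_letter = chr(ord(letter) + 1)  # the letter after the last named isoform
    return f"{gene}-PN{next_letter}"


print(novel_isoform_name("first-PJ"))
```

Raising an error for unexpected input is a deliberate choice here: a wrongly formatted name is easier to catch at this step than after submission.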