Pathways Project

The Pathways Project uses network analysis approaches to better understand the evolution and function of biological pathways. The current focus is on annotating genes within the insulin signaling pathway across the Drosophila genus.

Resources & Tools

Genomic Neighborhood Template (PowerPoint | Google Slides)

Faculty Resources

Micropublications

Help

Contacts

Project Leader: Laura K. Reed

Technical Support: Chinmay P. Rele

Curriculum Support: Katie M. Sandlin

Biological systems are networks, and in these networks, we can define nodes (e.g., genes, proteins, metabolites) connected through edges (e.g., enzymatic/chemical reactions, transcription regulation). Networks have properties that can be measured using a mathematical approach, and we can make predictions about the evolution of a system based on some of those properties.
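As a minimal sketch of the idea above, the Python snippet below stores a small, hypothetical network as a list of edges and measures one simple property: the degree (number of connections) of each node. The node names A through D are placeholders rather than genes from the insulin signaling pathway, and degree is only one of many network properties that can be measured.

```python
from collections import Counter


def node_degrees(edges):
    """Count how many edges touch each node in an undirected network."""
    degrees = Counter()
    for source, target in edges:
        degrees[source] += 1
        degrees[target] += 1
    return dict(degrees)


# Hypothetical toy network with three edges: A-B, B-C, B-D
toy_edges = [("A", "B"), ("B", "C"), ("B", "D")]
print(node_degrees(toy_edges))
```

In this toy network, node B touches every edge, so it has the highest degree.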

A “pathway” in a biological system can be defined as a relatively discrete (though never completely isolated) portion of a network. Generally, we view a pathway as a sequence of gene regulatory and enzymatic reactions that produces an important biological outcome (e.g., synthesizing an energy storage molecule, or sensing and regulating blood sugar levels).

In this project, we use network analysis approaches to better understand the evolution and function of biological pathways. The Pathways Project is focused on annotating genes found in well-characterized signaling and metabolic pathways across the Drosophila genus. The current focus is on the insulin signaling pathway, which is well conserved across animals and critical to growth and metabolic homeostasis. The long-term goal of the Pathways Project is to analyze how the regulatory regions of genes evolve in the context of their positions within a network, and we anticipate that other pathways will eventually be part of the analyses.

About

Pathways Project Overview provided by the Project Leader, Laura K. Reed (6 minutes) Slideset

The Pathways Project uses network analysis approaches to better understand the evolution and function of biological pathways. This GEP project is focused on annotating genes found in well-characterized signaling and metabolic pathways across the Drosophila genus. The current focus is on the insulin signaling pathway, which is well conserved across animals and critical to growth and metabolic homeostasis. The long-term goal of the Pathways Project is to analyze how the regulatory regions of genes evolve in the context of their positions within a network, and we anticipate that other pathways will eventually be part of the analyses.

Project Curriculum

Introduction to Pathways Project

This lecture is designed to introduce students to the big picture of the Pathways Project.

Pathways Project: Annotation Walkthrough

This walkthrough illustrates how to apply the GEP annotation strategy for the Pathways Project to construct a gene model for the Ras homolog enriched in brain (Rheb) gene in Drosophila yakuba.

Pathways Project: Annotation Workflow

The Annotation Workflow is a one-page summary of the annotation protocol for the Pathways Project.

Pathways Project: Reference Glossary

The Reference Glossary includes definitions for terms that are frequently used in the Pathways Project.

Pathways Project: Annotation Form

The Annotation Form merges the “Annotation Report” and “Annotation Notebook” into a single document; the latter two items are now archived.

Pathways Project: Annotation Form Exemplar

The Annotation Form Exemplar is provided as an example of a completed Annotation Form ready for submission to the GEP’s Pathways Project. The optional questions were omitted from the exemplar.

Pathways Project: Annotation Form D. pseudoobscura Key

Students can apply what they learned in the Annotation Walkthrough to construct a gene model for Rheb in D. pseudoobscura by completing the Pathways Project: Annotation Form. This answer key is provided to assist instructors in checking the accuracy of the annotation and includes potential areas of confusion throughout.

Pathways Project: Annotation Videos

This series of videos is intended to help GEP students annotate a Pathways Project gene from start to finish.

Genomic Neighborhood Check For Understanding

This exercise was created after a member reported that their students struggled with the genomic neighborhood, and that the member did not notice until the students were too far into the annotation to correct their misconceptions. It is meant to be a quick in-class and/or homework assignment.

Pilot Project Curriculum

Pathways Project Primer

This PowerPoint presentation provides a primer on the recommended annotation strategy for the Pathways Project. The presentation includes an overview of the goals of the Pathways Project annotations, an introduction to RNA-Seq and web databases, and a discussion of the phases of the splice donor and acceptor sites.

Last Updated	01/10/2024
Authors	Katie Sandlin
Curriculum Type	Lecture
Research Project	Pathways
Feedback Survey	Qualtrics Form
Resource	PPT

Sequence Similarity Introduction

This lesson and exercise define similarity in both a non-biological and a biological sense, quantify the similarity between two sequences, explain how a substitution matrix is used to quantify similarity, calculate amino acid similarity scores using the BLOSUM62 substitution matrix, and explain how BLAST detects similarity between two sequences and how to use BLAST and interpret its alignments.

Last Updated	10/26/2023
Authors	Katie Sandlin
Curriculum Type	Lesson with exercises
Research Project	Any
Feedback Survey	Qualtrics Form
Resource	Box Folder
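
To illustrate the kind of calculation covered in the Sequence Similarity Introduction, the sketch below sums substitution matrix scores over two aligned, ungapped peptides. It is not taken from the lesson itself: the dictionary holds only a small excerpt of BLOSUM62 entries chosen for this example, and the function names and peptide sequences are hypothetical. A full matrix would be needed for real sequences (for instance, Biopython can load one with `Bio.Align.substitution_matrices.load("BLOSUM62")`).

```python
# Excerpt of BLOSUM62 scores; keys are amino acid pairs in alphabetical order.
BLOSUM62_EXCERPT = {
    ("K", "K"): 5, ("R", "R"): 5, ("K", "R"): 2,
    ("L", "L"): 4, ("I", "I"): 4, ("I", "L"): 2,
    ("D", "D"): 6, ("E", "E"): 5, ("D", "E"): 2,
}


def pair_score(residue_a, residue_b):
    """Look up the score for one aligned pair (the matrix is symmetric)."""
    key = tuple(sorted((residue_a, residue_b)))
    return BLOSUM62_EXCERPT[key]  # KeyError if the pair is not in the excerpt


def alignment_score(seq_a, seq_b):
    """Sum pair scores across two aligned sequences of equal length (no gaps)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Aligned sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(seq_a, seq_b))


print(alignment_score("KLD", "RIE"))  # conservative substitutions: 2 + 2 + 2
print(alignment_score("KLD", "KLD"))  # identical residues: 5 + 4 + 6
```

The identical alignment scores higher than the alignment with conservative substitutions, which is the behavior a substitution matrix is designed to capture.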

Prerequisite Curriculum

Module 1. Introduction to the Genome Browser: What is a gene?

This module introduces students to the GEP UCSC Genome Browser. After completing this module, students will be able to navigate to a genomic region and control the display settings for different evidence tracks.

Module 4. Removal of introns from pre-mRNA by splicing

This module uses mRNA data to identify splice sites. After completing this module students will be able to identify intron-exon boundaries using canonical splice donor and acceptor sequences and determine which are best supported by RNA-Seq and TopHat splice junction predictions.
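
As a rough illustration of the canonical splice site check that Module 4 describes, the sketch below tests whether a candidate intron begins with GT (the canonical donor) and ends with AG (the canonical acceptor). The function name and example sequences are hypothetical, and the check deliberately ignores the RNA-Seq and TopHat evidence that the module also relies on.

```python
def has_canonical_splice_sites(intron_seq):
    """Return True if the intron starts with GT and ends with AG."""
    seq = intron_seq.upper()
    # An intron needs at least four bases to hold both dinucleotides.
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


print(has_canonical_splice_sites("GTAAGTTTTCAG"))  # GT...AG intron
print(has_canonical_splice_sites("GCAAGTTTTCAG"))  # GC donor, not canonical
```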

Module 5. Translation: The need for an Open Reading Frame

In this module, students will learn how mRNA is translated into a string of amino acids. After completing this module, students will be able to determine the codons for specific amino acids as well as start and stop codons. They will be able to identify open reading frames for a given gene, define the phases of splice donor and acceptor sites, and describe how they impact the maintenance of the open reading frame.
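
The relationship between splice site phases and the open reading frame can be sketched in a few lines. This example assumes a common convention that is not spelled out on this page: the donor phase is the number of coding bases left over after the last complete codon before the donor site, the acceptor phase is the number of bases after the acceptor site that are needed to complete that codon, and the reading frame is maintained when the two phases sum to 0 or 3. The function names are hypothetical.

```python
def donor_phase(coding_bases_before_donor):
    """Bases left over after the last complete codon upstream of the donor."""
    return coding_bases_before_donor % 3


def compatible_acceptor_phase(phase_of_donor):
    """Acceptor phase needed to complete the codon split by the intron."""
    return (3 - phase_of_donor) % 3


def phases_compatible(phase_of_donor, phase_of_acceptor):
    """The frame is kept when the split codon totals 0 or 3 bases."""
    return (phase_of_donor + phase_of_acceptor) % 3 == 0


# Hypothetical case: 100 coding bases lie upstream of the donor site.
phase = donor_phase(100)
print(phase, compatible_acceptor_phase(phase))
```

With 100 coding bases upstream, one base is left over at the donor, so the acceptor must contribute two bases to complete the codon.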

Module 6. Alternative splicing

This module explores how multiple different mRNAs and polypeptides can be encoded by the same gene. After completing this module students will be able to explain how alternative splicing of a gene can lead to different mRNAs and illustrate how alternative splicing can lead to the production of different polypeptides and result in drastic changes in phenotype.

An Introduction to NCBI BLAST

This walkthrough serves as an introduction to key functionalities of NCBI BLAST. Materials: Exercise; Worksheet; Answer Key; Package without Answers; Package.

RNA-Seq Primer

This PowerPoint presentation provides a brief introduction to the different types of RNA-Seq evidence tracks (e.g., Bowtie, TopHat, Cufflinks) that are on the GEP UCSC Genome Browser.

Useful References

Rele CP, Sandlin KM, Leung W and Reed LK. Manual annotation of Drosophila genes: a Genomics Education Partnership protocol [version 1; peer review: 2 approved with reservations]. F1000Research 2022, 11:1579
Mudge, J. M., & Harrow, J. (2016). The state of play in higher eukaryote gene annotation. Nature Reviews Genetics, 17(12), 758-772.
Weitz, J. S., Benfey, P. N., & Wingreen, N. S. (2007). Evolution, interactions, and biological networks. PLoS biology, 5(1), e11.
Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N., & Barabási, A. L. (2000). The large-scale organization of metabolic networks. Nature, 407(6804), 651-654.
Alvarez-Ponce, D., Aguadé, M., & Rozas, J. (2009). Network-level molecular evolutionary analysis of the insulin/TOR signal transduction pathway across 12 Drosophila genomes. Genome research, 19(2), 234–242.
Alvarez-Ponce, D., Guirao-Rico, S., Orengo, D. J., Segarra, C., Rozas, J., & Aguadé, M. (2012). Molecular population genetics of the insulin/TOR signal transduction pathway: a network-level analysis in Drosophila melanogaster. Molecular biology and evolution, 29(1), 123–132.
Alvarez-Ponce, D., Aguadé, M., & Rozas, J. (2011). Comparative genomics of the vertebrate insulin/TOR signal transduction pathway: a network-level analysis of selective pressures. Genome biology and evolution, 3, 87–101.
Alvarez-Ponce D. (2012). The relationship between the hierarchical position of proteins in the human signal transduction network and their rate of evolution. BMC evolutionary biology, 12, 192.
Alvarez-Ponce, D., Feyertag, F., & Chakraborty, S. (2017). Position Matters: Network Centrality Considerably Impacts Rates of Protein Evolution in the Human Protein-Protein Interaction Network. Genome biology and evolution, 9(6), 1742–1756.
Lynch, M., & Conery, J. S. (2000). The evolutionary fate and consequences of duplicate genes. Science (New York, N.Y.), 290(5494), 1151–1155.
Force, A., Lynch, M., Pickett, F. B., Amores, A., Yan, Y. L., & Postlethwait, J. (1999). Preservation of duplicate genes by complementary, degenerative mutations. Genetics, 151(4), 1531–1545.
Bhutkar, A., Schaeffer, S. W., Russo, S. M., Xu, M., Smith, T. F., & Gelbart, W. M. (2008). Chromosomal rearrangement inferred from comparisons of 12 Drosophila genomes. Genetics, 179(3), 1657–1680.
Wang, M., Wang, Q., Wang, Z., Wang, Q., Zhang, X., & Pan, Y. (2013). The Molecular Evolutionary Patterns of the Insulin/FOXO Signaling Pathway. Evolutionary bioinformatics online, 9, 1–16.
Grönke, S., Clarke, D. F., Broughton, S., Andrews, T. D., & Partridge, L. (2010). Molecular evolution and functional characterization of Drosophila insulin-like peptides. PLoS genetics, 6(2), e1000857.
Brogiolo, W., Stocker, H., Ikeya, T., Rintelen, F., Fernandez, R., & Hafen, E. (2001). An evolutionarily conserved function of the Drosophila insulin receptor and insulin-like peptides in growth control. Current biology : CB, 11(4), 213–221.

FAQs

How should students put together synteny diagrams when a gene is nested within the intron of another gene?

In this example, Ilp6 is within the intron of Raf-PE; however, Raf-PA is upstream of Ilp6.

We define gene order based on the first/closest protein-coding exon only. If a gene is nested in an intron that lies between two non-coding exons, we ignore those UTRs and define gene order based on the coding exons. If a gene is nested in an intron between two coding exons of another gene, we describe that as nesting. So in this example, Raf is upstream of Ilp6.

Which genome assembly should I use for my target species?

The Genome Browser Gateway should default to the correct assembly once you click on the Drosophila species in the left-hand table. To double-check that you are using the correct one, see the “Genome Browsers” column of the Pathways Project Genome Assemblies web page.

For example, D. yakuba has three assembly options to choose from, and according to the Genome Assemblies page, we should use the “Aug. 2021 (Princeton Prin_Dyak_Tai18E2_2.1/ DyakRefSeq3)” assembly when annotating D. yakuba.

What should I do if I encounter a likely sequencing error in an exon?

A sequencing error must first be validated by performing a tblastn search of the region in question against another assembly – it is rare for the same sequencing/assembly error to be present in two distinct assemblies.
Once the sequencing error has been validated, you can use the Sequence Updater to make a VCF file.
- More information on how to use the Sequence Updater can be found in its User Guide.
- If you are unsure how to generate the VCF file or how to identify the bases that you think are errors, please get in touch with the GEP Virtual TAs.
When using the Gene Model Checker, under “Model Details > Errors in Consensus Sequence”, select “Yes” and upload the VCF file.
Validate the model as usual, and then submit the VCF file along with the Annotation Form and the PEP, FASTA, and GFF files.

How do I name a novel isoform?

To name a novel isoform, follow these steps:

Identify the last named isoform for the gene.
Add “-PN” (for putative/protein novel) to the end of the gene name.
Add the letter following the letter of the last named isoform.

For example, if the last named isoform for a gene is “first-PJ,” the novel isoform for that gene would be named “first-PNK.”
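
The naming steps above can be sketched as a small Python helper. The function name is hypothetical, and the sketch assumes the last named isoform ends in “-P” followed by a single uppercase letter, as in the example; names that end in “-PZ” or use a longer suffix are not covered by the steps above, so the helper rejects them.

```python
import string


def novel_isoform_name(last_named_isoform):
    """Build a novel isoform name such as 'first-PNK' from 'first-PJ'."""
    gene, separator, letter = last_named_isoform.rpartition("-P")
    if not separator or len(letter) != 1 or letter not in string.ascii_uppercase:
        raise ValueError(f"Unexpected isoform name: {last_named_isoform!r}")
    if letter == "Z":
        raise ValueError("No single letter follows Z; this case is not covered")
    # Step through the alphabet to the letter after the last named isoform.
    next_letter = chr(ord(letter) + 1)
    return f"{gene}-PN{next_letter}"


print(novel_isoform_name("first-PJ"))  # first-PNK
```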

Page Last Updated: January 10, 2024
