
F Element Project
GEP students produce coding region and transcription start site annotations for F element genes in D. ananassae, D. bipectinata, D. kikkawai, and D. takahashii, as well as for genes in a euchromatic reference region derived from the Muller D element.
Contact
- Project Leaders
- Senior Scientist
- Lab Website
- Curriculum & Technical Support
About the Project
One of the unusual features of eukaryotic genomes is the discordance between genome size and the complexity of the organism (i.e., the C-value paradox; Eddy, 2012). The smallest chromosome in Drosophila melanogaster is chromosome 4 (also known as the Muller F element), with an estimated size of ~5.2 Mb (Locke and McDermid, 1993). The D. melanogaster F element is generally packaged as heterochromatin: it has a high repeat content, is packaged throughout with HP1a and H3K9me2/3, and exhibits late replication and little or no recombination. However, the banded portion (~1.4 Mb) of this chromosome also contains ~80 protein-coding genes. These F element genes exhibit a range of expression levels similar to genes that reside in euchromatic domains, indicating that F element genes have acquired distinct features that enable them to function in a heterochromatic environment (reviewed in Riddle and Elgin, 2018).
While the F element has maintained a similar size in many other Drosophila species, it is substantially larger in at least four Drosophila species (i.e., D. ananassae, D. bipectinata, D. kikkawai, and D. takahashii). For example, the D. ananassae Muller F element is more than 18.7 Mb in size. This study will examine the factors (e.g., transposon density) that have contributed to the expansion of the F element in these four Drosophila species, and assess the impact of this expansion on gene characteristics (e.g., codon bias, intron size).
GEP students will produce coding region and transcription start site annotations for F element genes in D. ananassae, D. bipectinata, D. kikkawai, and D. takahashii, as well as for genes in a euchromatic reference region derived from the Muller D element. (Euchromatic regions have not expanded in these species.) Comparative analyses using these datasets will provide insights into the evolutionary impacts of changes in chromosome and gene size, and will facilitate the identification of factors that enable genes to function in a heterochromatic environment. We anticipate that this work will move us toward a better understanding of how and why eukaryotic genomes became so large; mammalian genomes, for example, are ~1000X larger than that of E. coli.
Introduction to the F Element Project provided by Dr. Sarah C.R. Elgin.
Quick Start Guide
- Exercises: require students to go through the process answering questions (can be graded)
- Walkthrough: highly scripted "click the button" tours that guide students through a protocol
- Lecture: provides an annotated slide set
- Workflows: one-page handouts to remind students of the steps for a protocol, including the logic for decision making
- Resource: project-specific materials (e.g., annotation protocols, strategy guides, and reports)
Concepts/Tools
Minimum Recommended Materials
Optional/Additional Materials
Concepts/Tools 1, 2, & 3
Beginning curriculum common to all GEP Research Projects.
- 1. Gene structure, Introduction to a genome browser, Introduction to gene model construction
- The recommended materials provide an introduction to gene structure by covering transcription, mRNA processing, and translation. The introduction to gene model construction covers topics including the fact that a codon consists of three nucleotides, that a coding exon can begin or end in the middle of a codon (i.e., phases of the splice acceptor and donor sites), and the need to maintain an open reading frame after splicing.
- Starting Points: Freshmen/Sophomores just learning about eukaryotic gene structure (exons/introns, etc.) may be best served by taking the time to work through all six “Understanding Eukaryotic Genes” (UEG) modules. Upper-level students with a good grounding in molecular genetics need only review this material, which also introduces the nomenclature and tools of a genome browser (UCSC Genome Browser Mirror). UEG is highly scripted and can be given as an “at home” assignment before the start of the semester.
- 2. Homology/BLAST
- BLAST reports regions of sequence similarity; what is it looking for and how do we interpret the report?
- Using BLAST for Genomic Sequence Annotation
- Detecting and Interpreting Genetic Homology: Lecture Notes on Alignment (accompanying notes for the above PPT)
- 3. RNA-Seq (Optional)
- We make extensive use of RNA-Seq data; because these data are primarily derived from processed transcripts, we can identify splice sites based on spliced RNA-Seq reads.
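The phase rule described under item 1 above can be made concrete with a short sketch. It assumes the convention that the donor phase is the number of bases left over after the last complete codon of an exon, and the acceptor phase is the number of bases at the start of the next exon needed to complete that codon. The function names and the toy exon sequences are hypothetical and are not taken from the GEP materials.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def exon_phases(exon_lengths):
    """Return (acceptor_phase, donor_phase) for each coding exon, in order."""
    phases = []
    consumed = 0  # coding bases contributed by all upstream exons
    for length in exon_lengths:
        acceptor = (3 - consumed % 3) % 3  # bases that finish a split codon
        consumed += length
        donor = consumed % 3               # bases that begin the next codon
        phases.append((acceptor, donor))
    return phases


def has_open_reading_frame(exons):
    """Check that the spliced coding exons form one uninterrupted ORF."""
    cds = "".join(exons).upper()
    if len(cds) % 3 != 0 or not cds.startswith("ATG"):
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    ends_with_stop = codons[-1] in STOP_CODONS
    internal_stop = any(codon in STOP_CODONS for codon in codons[:-1])
    return ends_with_stop and not internal_stop


# Toy three-exon gene (hypothetical): ATG GCT AAA GGC TGA after splicing.
toy_exons = ["ATGGC", "TAAAGG", "CTGA"]
print(exon_phases([len(e) for e in toy_exons]))
print(has_open_reading_frame(toy_exons))
```

In this toy model the donor phase of each exon and the acceptor phase of the next exon sum to 0 or 3, which is the condition for the reading frame to be preserved across the intron.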
Prepares students to work on their own annotation projects; submitting completed projects to GEP allows them to become co-authors (if they read/critique/approve the resulting group manuscript or participate in microPublication).
- 4. Annotation
- How to use multiple lines of evidence to establish the presence of a gene, determine its D. melanogaster ortholog, and construct the best gene model, as defined by the available evidence. In most cases our evidence includes:
- 1. sequence homology = evolutionary conservation
- 2. ab initio gene models = rules of ORF, start and stop codons
- 3. RNA-Seq data = local transcription, position of exons, splice site borders
- Annotation of Drosophila Primer: describes annotation goals and strategies; accompanies the walkthrough below
- Annotation of a Drosophila Gene
- Simple Annotation Problem (optional)
- F Element Project: Annotation Report (detailed Word document; students must complete this form, providing evidence for their annotation, to contribute to the research and be eligible to be a co-author)
- GEP Annotation Workflow
- Annotating Splice Sites Workflow
- Identify the D. melanogaster Ortholog
- The following PPTs provide alternatives to the “Annotation of Drosophila Primer” (mix and match slides to fulfill pedagogic needs) [1]
- Annotation of Drosophila: more detailed version of the “Annotation for D. virilis” PPT; provides additional information regarding the RNA-Seq data, interpretation of the BLAST search results, and phases of splice donor and splice acceptor sites; some faculty adapt the curriculum by combining a subset of slides from the “Annotation of Drosophila” presentation with the “Annotation for D. virilis” presentation
- Annotation for D. virilis: uses the mav gene, which has two coding exons; provides the quickest way to illustrate an overview of the annotation protocol
- Other Optional Materials:
- Annotation Instruction Sheet: more in-depth description of the GEP approach; complements the "Annotation for D. virilis" PPT
- Annotation Strategy Guide: how GEP strategies can be applied to more challenging cases
- Browser-Based Annotation and RNA-Seq Data: provides more practice on comparative annotation, using BLAST and RNA-Seq
- GEP Digital Laboratory Notebook: example of a digital notebook to guide students
- [1] typically used in workshops where the participants work on the “Annotation of a Drosophila Gene” walkthrough during the same training session; uses the same gene as the walkthrough (i.e., CG31997) to illustrate the key steps of the annotation protocol (e.g., the group attending the Summer 2022 ABLE meeting used a modified version of the “Annotation of Drosophila Primer” and the walkthrough in their presentation)
If time permits, the project will benefit from students also checking the transcription start site annotation for each gene.
- 5. Annotating the Transcription Start Sites (TSS) (Optional)
- Introduces concepts of promoter architecture and the experimental techniques for characterizing promoters (e.g., CAGE and RAMPAGE; RNA-Seq; ChIP-Seq data for RNA Polymerase II and transcription factors; DNase I Hypersensitive Sites in chromatin; 9-state chromatin models based on histone modifications).
- Note: We now have RAMPAGE data for the four species used here. The draft TSS annotation protocols and curriculum materials which use the new RAMPAGE data are available on Box.
- Searching for Transcription Start Sites in Drosophila
- Annotation of Transcription Start Sites in Drosophila
- TSS Annotation Workflow
- TSS Module Primer: Review of Transcription, Promoter Structure, and Chromatin Packaging (provides an overview of promoter structure in eukaryotic genomes that might be helpful to students before they work on the TSS Modules)
- TSS Modules (1-4)
Annotated Lecture Slides
1. Eukaryotic Genomes and Chromatin Structure
This lecture introduces the C-value paradox and explains how we first recognized that eukaryotic genomes are full of repetitious sequences by using Cot curves; followed by repeat characteristics of eukaryotic genomes; the need to package all that DNA to get it into a nucleus; the development of the nucleosome model; and the relationship between nucleosome arrays and gene expression.
2. Heterochromatin Formation — It’s all about silencing!
This lecture develops the relationship between chromatin packaging and control of gene expression, a significant epigenetic system that allows the genome to respond to changes in environment, both the external environment and physiological cues (e.g., hormone responses).
3. The Dilemma of Transposable Elements: Can’t Live with Them, Can’t Evolve without Them!
This lecture introduces students to the analysis of repetitious elements in the genome. It can be used as a stand-alone lecture, or included in the “F Element Project: Annotated Lecture Slides” sequence of lectures.
4. Characteristics of the F Element
This lecture combines wet-bench work in the Elgin lab, results of chromatin mapping by the modENCODE consortium, and the bioinformatics efforts of GEP faculty and students to describe what we have learned about the F element.
Project Curriculum
Gene Annotation: Constructing a Defendable Exon/Intron Gene Model
An Introduction to NCBI BLAST
This walkthrough serves as an introduction to key functionalities of NCBI BLAST.
Detecting and Interpreting Genetic Homology
An introductory exercise using BLAST to annotate a region in the Drosophila melanogaster genome. Students can use this exercise to gain familiarity with performing BLAST searches and interpreting BLAST output. An answer key is provided for instructors.
Introduction to BLAST using Human Leptin
Dr. Justin R. DiAngelo (Penn State Berks) and Dr. Alexis Nagengast (Widener University) have developed an exercise that introduces students to the basic functionality of the NCBI web site and NCBI BLAST. Students will use NCBI BLAST to identify the putative orthologs of the human Leptin gene in other species.
RNA-Seq: a Closer Look at Read Mapping
Developed by Jeremy Buhler, this PowerPoint presentation provides an introduction to the core algorithms that form the basis for efficient mapping of RNA-Seq reads against a genome or transcriptome. The video that accompanies this presentation was developed by Leocadia Paliulis (Bucknell University).
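As a rough illustration of why indexing makes read mapping efficient, here is a minimal seed-and-verify sketch based on a hash table of k-mers. This is an assumption about the general idea only: the presentation may cover different algorithms, and production read mappers typically use more compact indexes and handle gaps and spliced reads. All names and the toy sequence are hypothetical.

```python
from collections import Counter, defaultdict


def build_kmer_index(genome, k):
    """Map every k-mer in the genome to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def map_read(read, genome, index, k, max_mismatches=1):
    """Return sorted (start, mismatches) placements of an ungapped read."""
    candidates = Counter()
    # Seed step: each k-mer of the read votes for an implied read start.
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset
            if 0 <= start <= len(genome) - len(read):
                candidates[start] += 1
    # Verify step: count mismatches only at the candidate starts.
    hits = []
    for start in candidates:
        window = genome[start:start + len(read)]
        mismatches = sum(1 for x, y in zip(read, window) if x != y)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return sorted(hits)


toy_genome = "ACGTACGTTAGCCGATAGGCTA"  # hypothetical sequence
toy_index = build_kmer_index(toy_genome, k=4)
print(map_read("TAGCCGAT", toy_genome, toy_index, k=4))
```

The index is built once, so each read is compared only against the few genome positions that share a k-mer with it rather than against every position.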
Browser-Based Annotation and RNA-Seq Data
This exercise continues your introduction to practical issues in comparative annotation. You will be annotating genomic sequence from the dot chromosome of Drosophila mojavensis using your knowledge of BLAST and some improved visualization tools. You will also consider how best to integrate information from high-throughput sequencing of expressed RNA.
Annotation of Drosophila Primer
This PowerPoint presentation provides a brief primer on the recommended annotation strategy for Drosophila projects. The presentation provides an overview of the goals of the GEP annotation project, an introduction to RNA-Seq, web databases, and a discussion on the phases of the splice donor and acceptor sites.
Annotation of a Drosophila Gene
This walkthrough uses the annotation of a gene on the D. biarmipes Muller F element to illustrate the GEP comparative annotation strategy. This document shows how you can investigate a feature in an annotation project using FlyBase, the Gene Record Finder, and the gene prediction and RNA-Seq evidence tracks on the GEP UCSC Genome Browser. The walkthrough then shows how you can identify the coordinates of each coding exon using NCBI BLAST, and also includes a discussion on the phases of the donor and acceptor splice sites. The walkthrough concludes by verifying the proposed gene model using the Gene Model Checker; it also includes a sample GEP Annotation Report.
Simple Annotation Problem
This worksheet will guide you through a series of basic steps that have been found to work well for annotation of species closely related to Drosophila melanogaster. It provides a technique that can also be the foundation of annotation in other, more divergent species.
F Element Project: Annotation Report
This document is the revised annotation report that GEP students will use to report their annotation results to the GEP.
GEP Annotation Workflow
This workflow provides an overview of the key analysis steps and bioinformatics tools for the annotation of a predicted gene in the Drosophila F element GEP project.
Annotating Splice Sites: Workflow
A one-page summary/flowchart of the logic process for identifying appropriate splice sites when annotating.
Identify D. melanogaster Ortholog
This decision tree illustrates the list of criteria that can be used to determine the putative D. melanogaster ortholog of a predicted gene.
Using BLAST for Genomic Sequence Annotation
Similar to the Lecture Notes on Alignment, this is a PowerPoint presentation given by Dr. Jeremy Buhler for the GEP faculty and TA workshops. This presentation covers the basics of alignment, essential for students to correctly interpret BLAST results.
Detecting and Interpreting Genetic Homology: Lecture Notes on Alignment
Notes from a lecture on sequence alignment given by Dr. Jeremy Buhler in the Bio 4342 course at WU. The lecture covers the theory behind BLAST as well as some of the potential problems and limitations of BLAST.
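For readers who want to experiment with the alignment idea behind BLAST, the sketch below computes a local alignment score by dynamic programming (the Smith-Waterman recurrence with a linear gap penalty). BLAST itself is a faster heuristic that approximates this kind of search, so this is not BLAST's algorithm. Whether the lecture notes present the recurrence in this form is an assumption, and the scoring values are illustrative rather than BLAST defaults.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current.append(max(
                0,                                # start a new local alignment
                previous[j - 1] + substitution,   # align a[i-1] with b[j-1]
                previous[j] + gap,                # gap in b
                current[j - 1] + gap,             # gap in a
            ))
        best = max(best, max(current))
        previous = current
    return best


print(local_alignment_score("TTTACGTTT", "GGGACGGGG"))
```

Because every cell is floored at zero, the score reflects only the best-matching local region, which is why a short conserved segment can be reported even when the flanking sequence is unrelated.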
Annotation of Drosophila
This PowerPoint presentation describes the recommended annotation strategy for Drosophila projects. The presentation provides an overview of the goals of the GEP annotation project, an introduction to NCBI BLAST, web databases, and the issue of reading frames and phase.
Annotation for D. virilis
This is a PowerPoint presentation describing the recommended strategies for annotating a D. virilis fosmid. The homology-based annotation strategy should also be applicable to annotation of D. erecta and D. mojavensis projects.
Annotation Instruction Sheet
This document is a more in-depth description of the evidence-based annotation technique used by the GEP. This document is designed to complement and extend the basic technique described in the Annotation for D. virilis PowerPoint.
Annotation Strategy Guide
This document illustrates how the strategies outlined in the Annotation Instruction Sheet can be applied to more challenging annotation cases.
GEP Digital Laboratory Notebook
Developed by Dr. Nick Reeves at Mt. San Jacinto College, Menifee Valley Campus, this PowerPoint presentation provides a brief overview of the Digital Lab Notebook, which provides detailed guidance to students on the GEP annotation strategy.
Annotation of Other Genomic Features within the F Element Project
This presentation illustrates the unusual genomic features that GEP students have encountered as part of their annotation of Muller F Elements from Drosophila ananassae and D. bipectinata. The Muller F Elements in these two species have undergone substantial expansion compared to D. melanogaster. The presentation describes the basic strategy for identifying pseudogenes, retrogenes, partial gene duplications, pseudogene clusters, and nuclear mitochondrial DNA segments (NUMT) within these F Element annotation projects.
Common Annotation Errors
This PowerPoint presentation describes the common errors observed in student annotations.
RNA Quantitation from RNA-Seq Data
Developed by Dr. Jeremy Buhler, this PowerPoint presentation provides an overview of the approaches for quantifying transcript abundance based on RNA-Seq data. The presentation includes a discussion on the benefits and limitations of the two approaches commonly used for RNA quantitation – RPKM and TPM.
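The two measures named above can be compared with a small worked example. The sketch uses the standard definitions (RPKM: reads per kilobase of transcript per million mapped reads; TPM: length-normalized rates rescaled to sum to one million). The gene names, read counts, and lengths are invented for illustration and do not come from the presentation.

```python
def rpkm(counts, lengths):
    """Reads per kilobase of transcript per million mapped reads."""
    total_reads = sum(counts.values())
    return {
        gene: counts[gene] * 1e9 / (lengths[gene] * total_reads)
        for gene in counts
    }


def tpm(counts, lengths):
    """Transcripts per million: per-base read rates rescaled to sum to 1e6."""
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    rate_sum = sum(rates.values())
    return {gene: rates[gene] / rate_sum * 1e6 for gene in counts}


# Hypothetical read counts and transcript lengths (in bases).
toy_counts = {"geneA": 100, "geneB": 300, "geneC": 600}
toy_lengths = {"geneA": 1000, "geneB": 1500, "geneC": 6000}

print(rpkm(toy_counts, toy_lengths))
print(tpm(toy_counts, toy_lengths))
```

In this example geneC has the most reads but the same length-normalized abundance as geneA. TPM values sum to one million in every sample by construction, whereas the sum of RPKM values depends on the sample.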
Drosophila Annotation Goals: Final Presentation and Written Reports
This document describes the primary annotation goals to be included in the final oral presentation and written report for students enrolled in the Bio4342 course at WU.
TSS Annotation (Under Development)
Searching for Transcription Start Sites in Drosophila
This PowerPoint presentation describes the recommended annotation strategy for identifying transcription start sites in Drosophila. The presentation provides an overview of the promoter architecture in D. melanogaster and describes the types of evidence that can be used to support the transcription start sites annotations.
Annotation of Transcription Start Sites in Drosophila
This walkthrough illustrates the GEP protocol for the comparative annotation of transcription start sites (TSS) in D. biarmipes. The walkthrough also includes a sample GEP TSS Report for the TSS annotation of onecut.
TSS Annotation Workflow
This workflow provides an overview of the key steps and recommended search parameters for the annotation of transcription start sites.
Investigation of Motifs
Introduction to Motifs and Motif Finding
This document contains the notes from a lecture on motif finding given by Dr. Jeremy Buhler in the Bio 4342 course at WU. The lecture covers the different approaches used to represent sequence motifs and to search for sequence motifs in a genome.
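One widely used way to represent a motif and search for it is a position weight matrix (PWM) of log-odds scores. The sketch below builds a PWM from a few aligned sites and slides it along a sequence. That the lecture notes use exactly this formulation is an assumption, and the example sites, pseudocount, and uniform background frequency are illustrative choices only.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from equal-length sites."""
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for position in range(width):
        column = [site[position] for site in sites]
        pwm.append({
            base: math.log2(
                ((column.count(base) + pseudocount) / total) / background
            )
            for base in "ACGT"
        })
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds scores for one candidate site."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max(
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical aligned binding sites and a sequence to scan (A/C/G/T only).
toy_sites = ["TATAAA", "TATAAA", "TATATA", "TATAAA"]
toy_pwm = build_pwm(toy_sites)
print(best_hit(toy_pwm, "GGCGCTATAAAGGCC"))
```

A positive score means the window resembles the motif more than the background model. Short motifs also score well by chance in long sequences, which is one of the limitations of motif finding.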
Behavior and Limitations of Motif Finding
Developed by Dr. Jeremy Buhler, this exercise uses MEME to discover putative regulatory motifs in a collection of D. melanogaster promoter sequences. It also illustrates some of the challenges associated with motif finding and the limitations of motif finding programs.
Annotation of Conserved Motifs in Drosophila
This walkthrough uses FlyBase, FlyFactorSurvey, and Patser to identify transcription factor binding sites in the region surrounding the transcription start site of onecut in D. biarmipes.
Motif Discovery in Drosophila
This walkthrough uses FlyBase RNA-Seq Search and the MEME suite to discover motifs that are enriched in a collection of D. melanogaster Muller F element genes that show similar expression patterns.
Investigation of Repetitious Elements (Under Development)
3. The Dilemma of Transposable Elements: Can’t Live with Them, Can’t Evolve without Them!
This lecture introduces students to the analysis of repetitious elements in the genome. It can be used as a stand-alone lecture, or included in the “F Element Project: Annotated Lecture Slides” sequence of lectures.
Design and Use of RepeatMasker
Similar to the lecture notes on Repetitious DNA, this is a PowerPoint presentation given by Dr. Jeremy Buhler for the GEP faculty and TA workshops. This presentation covers the basics of RepeatMasker, as well as limitations of the program that students should be aware of.