One of the unusual features of eukaryotic genomes is the discordance between genome size and the complexity of the organism (i.e., the C-value paradox; Eddy, 2012). The smallest chromosome in Drosophila melanogaster is chromosome 4 (also known as the Muller F element), with an estimated size of ~5.2 Mb (Locke and McDermid, 1993). The D. melanogaster F element is generally packaged as heterochromatin: it has a high repeat content, is associated throughout with HP1a and H3K9me2/3, replicates late, and shows little or no recombination. However, the banded portion (~1.4 Mb) of this chromosome also contains ~80 protein-coding genes. These F element genes exhibit a range of expression levels similar to that of genes residing in euchromatic domains, indicating that F element genes have acquired distinct features that enable them to function in a heterochromatic environment (reviewed in Riddle and Elgin, 2018).

While the F element has maintained a similar size in many other Drosophila species, it is substantially larger in at least four Drosophila species (i.e., D. ananassae, D. bipectinata, D. kikkawai, and D. takahashii). For example, the D. ananassae Muller F element is more than 18.7 Mb in size. This study will examine the factors (e.g., transposon density) that have contributed to the expansion of the F element in these four Drosophila species, and assess the impact of this expansion on gene characteristics (e.g., codon bias, intron size).

GEP students will produce coding region and transcription start site annotations for F element genes in D. ananassae, D. bipectinata, D. kikkawai, and D. takahashii, as well as for genes in a euchromatic reference region derived from the Muller D element. (Euchromatic regions have not expanded in these species.) Comparative analyses using these datasets will provide insights into the evolutionary impacts of changes in chromosome and gene size, and will facilitate the identification of factors that enable genes to function in a heterochromatic environment. We anticipate that this work will move us toward a better understanding of how and why eukaryotic genomes became so large; mammalian genomes, for example, are ~1000X larger than that of E. coli.
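As a rough illustration of one metric named above, transposon density can be treated as the fraction of bases in a region that are covered by annotated repeats. The Python sketch below is not the method used in this study; it assumes repeat annotations are available as half-open (start, end) intervals that lie within the region, and the function names and toy coordinates are hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumption: half-open [start, end) coordinates, as in BED-style files.
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def repeat_density(repeats: Iterable[Interval], region_length: int) -> float:
    """Return the fraction of the region covered by at least one repeat.

    Assumes every interval lies inside the region [0, region_length).
    """
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    covered = sum(end - start for start, end in merge_intervals(repeats))
    return covered / region_length


# Toy data (made up for illustration; not real annotations).
toy_repeats = [(100, 400), (350, 600), (900, 1000)]
density = repeat_density(toy_repeats, region_length=2000)
print(f"Repeat density: {density:.2%}")
```

Merging the intervals first matters because nested or overlapping transposon fragments would otherwise inflate the covered length. The same calculation could be applied separately to an F element region and to a D element reference region to compare the two.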

""

Using comparative genomics to assess the evolutionary impact of Drosophila F element expansion on chromosome and gene characteristics. (Top) Past studies using a transgene reporter with the white gene driven by an hsp70 promoter show that the Drosophila melanogaster Muller F element is mostly heterochromatic, even though the region contains ~80 protein-coding genes. (Bottom left) The D. ananassae F element has a substantially higher transposon density than the D. melanogaster F element. The high density of LTR and LINE retrotransposons is one of the major contributors to the expansion of the D. ananassae F element (>18.6 Mb) compared to the D. melanogaster F element (>1.4 Mb). (Bottom right) In addition to D. ananassae, the D. bipectinata, D. kikkawai, and D. takahashii F elements are also larger than the D. melanogaster F element. GEP students will annotate genes on the F element and on a euchromatic reference region from the D element for these four Drosophila species.

Image credits: (Top) Karmella Haynes; (Bottom Left) Leung et al., 2017; (Bottom Right) Phylogenetic tree produced by Thom Kaufman as part of the modENCODE project.

Curriculum

Annotation for D. virilis

This is a PowerPoint presentation describing the recommended strategies for annotating a D. virilis fosmid. The homology-based annotation strategy should also be applicable to annotation of D. erecta and D. mojavensis projects.

Annotation Instruction Sheet

This document is a more in-depth description of the evidence-based annotation technique used by the GEP. It is designed to complement and extend the basic technique described in the Annotation for D. virilis PowerPoint.

Annotation of Drosophila

This PowerPoint presentation describes the recommended annotation strategy for Drosophila projects. The presentation provides an overview of the goals of the GEP annotation project, an introduction to NCBI BLAST, web databases, and the issue of reading frames and phase.

Annotation of Drosophila Primer

This PowerPoint presentation provides a brief primer on the recommended annotation strategy for Drosophila projects. The presentation provides an overview of the goals of the GEP annotation project, an introduction to RNA-Seq, web databases, and a discussion on the phases of the splice donor and

Annotation of a Drosophila Gene

This walkthrough uses the annotation of a gene on the D. biarmipes Muller F element to illustrate the GEP comparative annotation strategy. This document shows how you can investigate a feature in an annotation project using FlyBase, the Gene Record Finder, and the gene prediction

Annotation of Conserved Motifs in Drosophila

This walkthrough uses FlyBase, FlyFactorSurvey, and Patser to identify transcription factor binding sites in the region surrounding the transcription start site of onecut in D. biarmipes.

Annotation Strategy Guide

This document illustrates how the strategies outlined in the Annotation Instruction Sheet can be applied to more challenging annotation cases.

Behavior and Limitations of Motif Finding

Developed by Dr. Jeremy Buhler, this exercise uses MEME to discover putative regulatory motifs in a collection of D. melanogaster promoter sequences. It also illustrates some of the challenges associated with motif finding and the limitations of motif finding programs.

Browser-Based Annotation and RNA-Seq Data

This exercise continues your introduction to practical issues in comparative annotation. You will be annotating genomic sequence from the dot chromosome of Drosophila mojavensis using your knowledge of BLAST and some improved visualization tools. You will also consider how best to integrate information from high-throughput

Detecting and Interpreting Genetic Homology

An introductory exercise using BLAST to annotate a region in the Drosophila melanogaster genome. Students can use this exercise to gain familiarity with performing BLAST searches and interpreting BLAST output. An answer key is provided for instructors.

F Element Project: Annotation Report

This document is the revised annotation report that GEP students will use to report their annotation results to the GEP.

GEP Annotation Workflow

This workflow provides an overview of the key analysis steps and bioinformatics tools for the annotation of a predicted gene in the Drosophila F element GEP project.

GEP Digital Laboratory Notebook

Developed by Dr. Nick Reeves at Mt. San Jacinto College, Menifee Valley Campus, this PowerPoint presentation provides a brief overview of the Digital Lab Notebook, which provides detailed guidance to students on the GEP annotation strategy.

Identify D. melanogaster Ortholog

This decision tree illustrates the list of criteria that can be used to determine the putative D. melanogaster ortholog of a predicted gene.

Introduction to Motifs and Motif Finding

This document contains the notes from a lecture on motif finding given by Dr. Jeremy Buhler in the Bio 4342 course at WU. The lecture covers the different approaches used to represent sequence motifs and to search for sequence motifs in a genome.

Motif Discovery in Drosophila

This walkthrough uses FlyBase RNA-Seq Search and the MEME suite to discover motifs that are enriched in a collection of D. melanogaster Muller F element genes that show similar expression patterns.

Searching for Transcription Start Sites in Drosophila

This PowerPoint presentation describes the recommended annotation strategy for identifying transcription start sites in Drosophila. The presentation provides an overview of promoter architecture in D. melanogaster and describes the types of evidence that can be used to support transcription start site annotations.

Simple Annotation Problem

This worksheet will guide you through a series of basic steps that have been found to work well for annotation of species closely related to Drosophila melanogaster. It provides a technique that can also be the foundation of annotation in other, more divergent species.

TSS Annotation Workflow

This workflow provides an overview of the key steps and recommended search parameters for the annotation of transcription start sites.