The National Science Foundation awarded a Standard Grant of $434,154 (Award Number 2114661) to support the GEP’s “Drosophila F Element Expansion: A Window on the C-value Paradox” Project led by Principal Investigator Cindy Arrigo (New Jersey City University).
Abstract
This research award funds an investigation of the evolutionary causes and consequences of genome size variation. The DNA of all organisms contains the genes that code for proteins, the building blocks of cells. Humans have approximately five times as many protein-coding genes as do bacteria, but about 1,000 times the amount of DNA. This phenomenon, the C-value Paradox, will be studied using a chromosome (the F element) that has undergone a rapid change in size during the evolution of the fruit fly, Drosophila. Initial analysis of the F element genes in four species with an expanded F element is being done by undergraduates in the Genomics Education Partnership (GEP). The GEP involves more than 150 faculty members from across the United States who are using this project to introduce students to research in genomics, focusing on gene annotation. One of the most diverse universities in the nation, New Jersey City University, is the hub for this national research project.
An F element region containing ~80 genes is 1.3 megabases in Drosophila melanogaster, but 19.1 megabases in Drosophila ananassae, a 15-fold increase in size. Expansion of the F element is largely due to a higher repeat load, dominated by transposable elements (TEs). Using a comparative species approach to analyze the expansions within and between genes, the project will document the rate and timing of TE acquisition, and characterize the impacts of TEs on gene structure and on chromosome organization. Examining SNPs from 15 strains of D. ananassae will illuminate whether change in genome size is associated with change in effective population size. The mechanisms that limit recombination will be examined using both codon bias and substitution rates. These studies and others enabled by the GEP student annotations will contribute to a better understanding of the nature of the genome that can be broadly applied to eukaryotic biology.
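As a quick check on the figures above, the fold increase can be recomputed from the two region sizes. This is only an arithmetic sketch; the function and variable names are illustrative and do not come from the project.

```python
# Region sizes quoted in the abstract, in megabases (Mb).
MELANOGASTER_MB = 1.3
ANANASSAE_MB = 19.1


def fold_change(expanded_mb, reference_mb):
    """Return how many times larger the expanded region is than the reference."""
    return expanded_mb / reference_mb


ratio = fold_change(ANANASSAE_MB, MELANOGASTER_MB)
extra = ANANASSAE_MB - MELANOGASTER_MB  # additional sequence in D. ananassae

print(f"{ratio:.1f}-fold larger, {extra:.1f} Mb of additional sequence")
```

The ratio comes out slightly under 15, which is consistent with the rounded "15-fold" figure in the abstract.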
Overview
This proposal is a collaborative effort of the Genomics Education Partnership (GEP), written by CJ Arrigo, C Ellison, W Leung, and SCR Elgin with GEP input. The GEP has published two major papers on the Drosophila F element, an unusual chromosome that appears to be entirely heterochromatic by many criteria (condensed appearance, high HP1a/H3K9me2/3, lack of recombination) but carries ~80 genes. Surprisingly, four Drosophila species have been identified with an F element that is significantly larger than average (2-fold to ~15-fold). Investigation of this expansion should provide insights into genome expansion in general, including documentation of the process, impacts on the genes, and impacts on the chromosome.
Aim 1: Characterizing F element expansion: Careful annotation of high-quality genome assemblies will allow us to document the structure of both the genes and the repetitious sequences, primarily transposable elements (TEs). We will address the following questions using bioinformatics tools:
- What is the distribution of repeats in relation to the protein-coding genes? How are those genes altered (e.g., in size) when the load of repetitious sequences increases?
- Is there an impact on Transcription Start Sites?
- How are these results impacted by the magnitude of expansion?
- What are these repetitious sequences, and what is their evolutionary history?
- Does high repeat density promote or allow other genome changes?
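One way to approach the first question in the list above is to measure what fraction of each gene span is covered by annotated repeats. The following is a minimal sketch under stated assumptions: the coordinates are made-up toy values, the gene names are hypothetical, intervals are half-open (start, end), and none of this represents the GEP's actual annotation pipeline.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of double-counting bases.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bases(region, merged_repeats):
    """Count bases of `region` that fall inside already-merged repeat intervals."""
    region_start, region_end = region
    total = 0
    for start, end in merged_repeats:
        overlap = min(region_end, end) - max(region_start, start)
        if overlap > 0:
            total += overlap
    return total


def repeat_fraction(region, repeats):
    """Fraction of a gene span (or any region) occupied by repeats."""
    length = region[1] - region[0]
    return covered_bases(region, merge_intervals(repeats)) / length


# Hypothetical toy annotations, not real F element coordinates.
genes = {"geneA": (100, 600), "geneB": (1000, 1200)}
repeats = [(150, 250), (200, 300), (550, 700), (900, 950)]

fractions = {name: repeat_fraction(span, repeats) for name, span in genes.items()}
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.0%} of span covered by repeats")
```

Merging the repeat intervals first matters because TE annotations often overlap or nest, and unmerged overlaps would inflate the coverage. Running the same calculation on orthologous genes across species would show how gene spans change as repeat load rises.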
Aim 2: Determining the impact of expansion on gene / genome evolution: What is the impact of higher repeat loads on the evolution of the chromosome as a whole, and on the evolution of genes that are embedded in a sea of repetitive sequences that must be silenced?
The multiple independent F element expansions in Drosophila provide a unique opportunity to determine whether change in genome size is associated with a change in effective population size. We will test for this association using genome-wide single nucleotide polymorphisms (SNPs) from 15 strains of D. ananassae. Focusing on the four species with an expanded F element for which high-quality sequence is available, we will document the rate and timing of TE acquisition. We will look at the impact on the genes, using both codon bias and substitution rates to examine the extent of Hill-Robertson interference, determining whether interference is a centromere-proximal effect or simply reflects the lack of recombination due to heterochromatin formation driven by the high repeat density.
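To illustrate how SNP data can bear on effective population size, the sketch below uses Watterson's estimator, a standard population-genetic measure of diversity. The text does not say which estimator the project will use, so this choice is an assumption. The sketch also assumes that each inbred strain contributes one haploid sequence (n = 15) and that the mutation rate passed to `theta_to_ne` is a user-supplied placeholder; all function names and input numbers are illustrative only.

```python
def watterson_a(n_sequences):
    """Return a_n = sum of 1/i for i = 1 .. n-1, the Watterson normalizer."""
    return sum(1.0 / i for i in range(1, n_sequences))


def watterson_theta(segregating_sites, n_sequences, n_sites):
    """Per-site Watterson's theta from a count of segregating sites (SNPs)."""
    return segregating_sites / watterson_a(n_sequences) / n_sites


def theta_to_ne(theta_per_site, mutation_rate):
    """Effective population size for a diploid, from theta = 4 * Ne * mu."""
    return theta_per_site / (4.0 * mutation_rate)


# Hypothetical inputs: 15 strains, 325 SNPs found in 10,000 aligned sites.
N_STRAINS = 15
theta = watterson_theta(segregating_sites=325, n_sequences=N_STRAINS, n_sites=10_000)

# Placeholder per-site, per-generation mutation rate (an assumption).
ne = theta_to_ne(theta, mutation_rate=3e-9)
print(f"theta_W per site = {theta:.4f}, implied Ne = {ne:.3g}")
```

Because theta scales with the product of effective population size and mutation rate, comparing theta between species with and without an expanded F element only reflects differences in effective population size if their mutation rates are similar.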