Katie Sandlin

Facilitating Growth through Frustration: Using Genomics Research in a Course-Based Undergraduate Research Experience

Abstract: A hallmark of the research experience is encountering difficulty and working through those challenges to achieve success. This ability is essential to being a successful scientist, but replicating such challenges in a teaching setting can be difficult. The Genomics Education Partnership (GEP) is a consortium of faculty who engage their students in a genomics Course-Based Undergraduate Research Experience (CURE). Students participate in genome annotation, generating gene models using multiple lines of experimental evidence. Our observations suggested that the students’ learning experience is continuous and recursive, frequently beginning with frustration but eventually leading to success as they come up with defendable gene models. In order to explore our “formative frustration” hypothesis, we gathered data from faculty via a survey, and from students via both a general survey and a set of student focus groups. Upon analyzing these data, we found that all three datasets mentioned frustration and struggle, as well as learning and better understanding of the scientific process. Bioinformatics projects are particularly well suited to the process of iteration and refinement because iterations can be performed quickly and are inexpensive in both time and money. Based on these findings, we suggest that a dynamic of “formative frustration” is an important aspect for a successful CURE.

Representative comments from student focus groups highlighting the transition from challenges to benefits/successes.

Lopatto D, Rosenwald AG, DiAngelo JR, et al. Facilitating Growth through Frustration: Using Genomics Research in a Course-Based Undergraduate Research Experience. J Microbiol Biol Educ. 2020;21(1):21.1.6. Published 2020 Feb 28. doi:10.1128/jmbe.v21i1.2005

Retrotransposons Are the Major Contributors to the Expansion of the Drosophila ananassae Muller F Element

Abstract: The discordance between genome size and the complexity of eukaryotes can partly be attributed to differences in repeat density. The Muller F element (∼5.2 Mb) is the smallest chromosome in Drosophila melanogaster, but it is substantially larger (>18.7 Mb) in D. ananassae. To identify the major contributors to the expansion of the F element and to assess their impact, we improved the genome sequence and annotated the genes in a 1.4-Mb region of the D. ananassae F element, and a 1.7-Mb region from the D element for comparison. We find that transposons (particularly LTR and LINE retrotransposons) are major contributors to this expansion (78.6%), while Wolbachia sequences integrated into the D. ananassae genome are minor contributors (0.02%). Both D. melanogaster and D. ananassae F-element genes exhibit distinct characteristics compared to D-element genes (e.g., larger coding spans, larger introns, more coding exons, and lower codon bias), but these differences are exaggerated in D. ananassae. Compared to D. melanogaster, the codon bias observed in D. ananassae F-element genes can primarily be attributed to mutational biases instead of selection. The 5′ ends of F-element genes in both species are enriched in dimethylation of lysine 4 on histone 3 (H3K4me2), while the coding spans are enriched in H3K9me2. Despite differences in repeat density and gene characteristics, D. ananassae F-element genes show a similar range of expression levels compared to genes in euchromatic domains. This study improves our understanding of how transposons can affect genome size and how genes can function within highly repetitive domains.

Histone modification profiles for D. ananassae and D. melanogaster F-element genes at the third instar larval stage of development.

Leung W, Shaffer CD, Chen EJ, et al. Retrotransposons Are the Major Contributors to the Expansion of the Drosophila ananassae Muller F Element. G3 (Bethesda). 2017;7(8):2439‐2460. Published 2017 Aug 7. doi:10.1534/g3.117.040907

Leung W, Elgin SCR; (On behalf of the participating students and faculty of the Genomics Education Partnership). Response to the Letter to the Editor by Dunning Hotopp and Klasson. G3 (Bethesda). 2018;8(1):375. Published 2018 Jan 4. doi:10.1534/g3.117.300379

An undergraduate bioinformatics curriculum that teaches eukaryotic gene structure

Abstract: Gene structure, transcription, translation, and alternative splicing are challenging concepts for many undergraduates studying biology. These topics are typically covered in a traditional lecture environment, but students often fail to master and retain these concepts. To address this problem we have designed a series of six Modules that employ an active learning approach using a bioinformatics tool, the genome browser, to help students understand eukaryotic gene structure and functionality. Students learn how to use a mirror site of the UCSC Genome Browser created by the Genomics Education Partnership while completing the Modules, which focus on gene structure, transcription, splicing, translation, and alternative splicing. The Modules are supplemented with short videos that illustrate key functionalities of the genome browser and fundamental concepts in processing transcripts. These materials have been used successfully to teach gene structure in many different settings, from community colleges to 4-year colleges and universities, encompassing advanced high school students to college seniors. Instructors can easily customize the Modules and/or select a subset for their curriculum. The Modules have helped our students learn about eukaryotic gene structure and expression, simultaneously acquiring skills in the use of a genome browser, and have prepared them to pursue genome annotation projects as independent research.

Module 1: Lesson Timeline

Laakso, M.M., Paliulis, L.V., Croonquist, P., Derr, B., Gracheva, E., Hauser, C., Howell, C., Jones, C.J., Kagey, J.D., Kennell, J., Silver Key, S.C., Mistry, H., Robic, S., Sanford, J., Santisteban, M., Small, C., Spokony, R., Stamm, J., Van Stry, M., Leung, W., Elgin, S.C.R. 2017. An undergraduate bioinformatics curriculum that teaches eukaryotic gene structure. CourseSource. https://doi.org/10.24918/cs.2017.13

The GEP: Crowd-Sourcing Big Data Analysis With Undergraduates

Abstract: The era of ‘big data’ is also the era of abundant data, creating new opportunities for student-scientist research partnerships. By coordinating undergraduate efforts, the Genomics Education Partnership produces high-quality annotated data sets and analyses that could not be generated otherwise, leading to scientific publications while providing many students with research experience.

A GEP UCSC Genome Browser Mirror View of the Mitf Gene on the Drosophila erecta F Element.

Elgin SCR, Hauser C, Holzen TM, et al. The GEP: Crowd-Sourcing Big Data Analysis with Undergraduates. Trends Genet. 2017;33(2):81‐85. doi:10.1016/j.tig.2016.11.004

A Hands-on Introduction to Hidden Markov Models

Abstract: In this Lesson, we describe a classroom activity that demonstrates how a Hidden Markov Model (HMM) is applied to predict a eukaryotic gene, focusing on predicting one exon-intron boundary. This HMM lesson is part of the BIOL/CS 370 ‘Introduction to Bioinformatics’ course (Truman State University, MO) and of Bio4342 ‘Research Explorations in Genomics’ (Washington University in St. Louis, MO). The original target student audiences include both Biology and Computer Sciences majors in their junior and senior years, although we believe the model activity would be successful with younger students. The class session starts with a brief introductory lecture describing HMMs and the terminology used in defining the parameters for a given model. This lecture is followed by students’ exploration of the HMM using Excel spreadsheets to manage calculations while they alter the key variables; collaborative problem solving and discussion of their strategies and results; and homework to check their understanding.

Students have reacted very positively to the HMM curriculum. Students with more computer science experience tended to ask more questions concerning the model itself. Overall, students performed well on the homework assignment, leading us to believe that we are a step closer to our main goal of filling the intellectual gap between computer scientists and biologists.
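The idea behind the lesson, using an HMM to place a single exon-intron boundary, can be sketched in a few lines of Python. This is only an illustration, not the lesson's Excel material: the three states (exon "E", 5' splice site "5", intron "I"), every probability below, and the function name viterbi are assumptions chosen to keep the example small.

```python
import math

# Hypothetical three-state model: exon (E), 5' splice site (5), intron (I).
# All probabilities are illustrative assumptions, not values from the lesson.
STATES = ("E", "5", "I")
START = {"E": 1.0, "5": 0.0, "I": 0.0}
TRANS = {
    "E": {"E": 0.9, "5": 0.1, "I": 0.0},
    "5": {"E": 0.0, "5": 0.0, "I": 1.0},
    "I": {"E": 0.0, "5": 0.0, "I": 0.9},
}
END = {"E": 0.0, "5": 0.0, "I": 0.1}  # only an intron state may end the path
EMIT = {
    "E": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "5": {"A": 0.05, "C": 0.0, "G": 0.95, "T": 0.0},
    "I": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def log(p):
    """Natural log that maps probability 0 to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq):
    """Return (state_path, log_probability) of the most likely state path."""
    # best[s]: log-probability of the best path that ends in state s so far
    best = {s: log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # one dict of back-pointers per position after the first
    for base in seq[1:]:
        prev, best, pointers = best, {}, {}
        for s in STATES:
            # Pick the predecessor state that maximizes the path score into s
            p = max(STATES, key=lambda r: prev[r] + log(TRANS[r][s]))
            best[s] = prev[p] + log(TRANS[p][s]) + log(EMIT[s][base])
            pointers[s] = p
        back.append(pointers)
    # Choose the final state, including the transition to the end of the sequence
    last = max(STATES, key=lambda s: best[s] + log(END[s]))
    score = best[last] + log(END[last])
    # Follow the back-pointers from the last position to the first
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path)), score


if __name__ == "__main__":
    path, score = viterbi("CTTCATGTGAAAGCAGACGTAAGTCA")
    print(path)
    print(round(score, 2))
```

With these assumed parameters, the most likely path for the 26-base example sequence labels the first 18 bases as exon, the G at position 19 as the splice site, and the remaining 7 bases as intron, with a log-probability of about -41.22. Changing an emission or transition probability and rerunning mirrors the lesson's exercise of altering key variables to see how the predicted boundary moves.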

Tally of the HMM homework results. Numbers along the X-axis correspond to the homework questions (Supporting material). Bar graphs indicate the percentage of satisfactory or above satisfactory answers.

Weisstein, A.E., Gracheva, E., Goodwin, Z., Qi, Z., Leung, W., Shaffer, C.D. and Elgin, S.C.R. 2016. A Hands-on Introduction to Hidden Markov Models. CourseSource. https://doi.org/10.24918/cs.2016.8

Drosophila Muller F Elements Maintain a Distinct Set of Genomic Properties Over 40 Million Years of Evolution

Abstract: The Muller F element (4.2 Mb, ~80 protein-coding genes) is an unusual autosome of Drosophila melanogaster; it is mostly heterochromatic with a low recombination rate. To investigate how these properties impact the evolution of repeats and genes, we manually improved the sequence and annotated the genes on the D. erecta, D. mojavensis, and D. grimshawi F elements and euchromatic domains from the Muller D element. We find that F elements have greater transposon density (25–50%) than euchromatic reference regions (3–11%). Among the F elements, D. grimshawi has the lowest transposon density (particularly DINE-1: 2% vs. 11–27%). F element genes have larger coding spans, more coding exons, larger introns, and lower codon bias. Comparison of the Effective Number of Codons with the Codon Adaptation Index shows that, in contrast to the other species, codon bias in D. grimshawi F element genes can be attributed primarily to selection instead of mutational biases, suggesting that density and types of transposons affect the degree of local heterochromatin formation. F element genes have lower estimated DNA melting temperatures than D element genes, potentially facilitating transcription through heterochromatin. Most F element genes (~90%) have remained on that element, but the F element has smaller syntenic blocks than genome averages (3.4–3.6 vs. 8.4–8.8 genes per block), indicating greater rates of inversion despite lower rates of recombination. Overall, the F element has maintained characteristics that are distinct from other autosomes in the Drosophila lineage, illuminating the constraints imposed by a heterochromatic milieu.

Sequence improvement of the D. mojavensis F element scaffold.

Leung W, Shaffer CD, Reed LK, et al. Drosophila Muller F Elements Maintain a Distinct Set of Genomic Properties Over 40 Million Years of Evolution. G3 (Bethesda). 2015;5(5):719‐740. Published 2015 Mar 4. doi:10.1534/g3.114.015966

• The improved sequences and gene annotations are available as part of the supplemental materials for the manuscript.

• The genome browsers for D. erecta, D. mojavensis, and D. grimshawi are available on the GEP UCSC Genome Browser.

• The underlying database and additional data files are available for download through the WUSTL Digital Research Materials Repository.

A Central Support System Can Facilitate Implementation and Sustainability of a Classroom-Based Undergraduate Research Experience (CURE) in Genomics

Abstract: In their 2012 report, the President’s Council of Advisors on Science and Technology advocated “replacing standard science laboratory courses with discovery-based research courses”, a challenging proposition that presents practical and pedagogical difficulties. In this paper, we describe our collective experiences working with the Genomics Education Partnership, a nationwide faculty consortium that aims to provide undergraduates with a research experience in genomics through a scheduled course (a classroom-based undergraduate research experience, or CURE). We examine the common barriers encountered in implementing a CURE, program elements of most value to faculty, ways in which a shared core support system can help, and the incentives for and rewards of establishing a CURE on our diverse campuses. While some of the barriers and rewards are specific to a research project utilizing a genomics approach, other lessons learned should be broadly applicable. We find that a central system that supports a shared investigation can mitigate some shortfalls in campus infrastructure (such as time for new curriculum development, availability of IT services) and provides collegial support for change. Our findings should be useful for designing similar supportive programs to facilitate change in the way we teach science for undergraduates.

Faculty identification of barriers to implementing and sustaining a research-based lab course in genomics. Mean faculty ratings (on the anonymous survey), scoring both the importance (red bar) and the presence on campus (blue bar) of 25 items, at the time when the respondent attempted to implement genomics research lab activities. Respondents rated importance on a scale of 1 (marginally important) to 5 (very important), and rated presence on a scale of 1 (absent) to 5 (present in abundance). Items are sorted top to bottom by importance (red bar). The mean response for presence (blue bar) was superimposed over the red to highlight the difference; if presence exceeds importance, only the blue bar is visible. The difference between importance (red, what is needed) and presence (blue) suggests barriers to implementation. Numerical data are provided in Supplemental Material S8.

Lopatto D, Hauser C, Jones CJ, et al. A Central Support System Can Facilitate Implementation and Sustainability of a Classroom-based Undergraduate Research Experience (CURE) in Genomics. CBE Life Sci Educ. 2014;13(4):711‐723. doi:10.1187/cbe.13-10-0200

A Course-Based Research Experience: How Benefits Change with Increased Investment in Instructional Time

Abstract: There is widespread agreement that science, technology, engineering, and mathematics programs should provide undergraduates with research experience. Practical issues and limited resources, however, make this a challenge. We have developed a bioinformatics project that provides a course-based research experience for students at a diverse group of schools and offers the opportunity to tailor this experience to local curriculum and institution-specific student needs. We assessed both attitude and knowledge gains, looking for insights into how students respond given this wide range of curricular and institutional variables. While different approaches all appear to result in learning gains, we find that a significant investment of course time is required to enable students to show gains commensurate to a summer research experience. An alumni survey revealed that time spent on a research project is also a significant factor in the value former students assign to the experience one or more years later. We conclude: 1) implementation of a bioinformatics project within the biology curriculum provides a mechanism for successfully engaging large numbers of students in undergraduate research; 2) benefits to students are achievable at a wide variety of academic institutions; and 3) successful implementation of course-based research experiences requires significant investment of instructional time for students to gain full benefit.

Self-reported student learning gains using the SURE survey. Blue squares indicate the mean for GEP students, while red squares indicate the mean for SURE summer research students, 2009. Error bars represent two SEs below and above the means. The SE for the averages of the GEP and SURE responses was <0.04. Data shown combine results from surveys given in academic years 2010–11 and 2011–12; the data include between 652 and 751 responses on each of the 20 items from GEP students. The comparison group is the 2009 SURE survey of 1653 students who had just completed a summer in the lab. The large number of students allows for smaller error estimates than in our previous study (Lopatto et al., 2008).

Shaffer CD, Alvarez CJ, Bednarski AE, et al. A Course-Based Research Experience: How Benefits Change with Increased Investment in Instructional Time. CBE Life Sci Educ. 2014;13(1):111‐130. doi:10.1187/cbe-13-08-0152

Evolution of a Distinct Genomic Domain in Drosophila: Comparative Analysis of the Dot Chromosome in D. melanogaster and D. virilis

Abstract: The distal arm of the fourth (“dot”) chromosome of Drosophila melanogaster is unusual in that it exhibits an amalgamation of heterochromatic properties (e.g., dense packaging, late replication) and euchromatic properties (e.g., gene density similar to euchromatic domains, replication during polytenization). To examine the evolution of this unusual domain, we undertook a comparative study by generating high-quality sequence data and manually curating gene models for the dot chromosome of D. virilis (Tucson strain 15010-1051.88). Our analysis shows that the dot chromosomes of D. melanogaster and D. virilis have higher repeat density, larger gene size, lower codon bias, and a higher rate of gene rearrangement compared to a reference euchromatic domain. Analysis of eight “wanderer” genes (present in a euchromatic chromosome arm in one species and on the dot chromosome in the other) shows that their characteristics are similar to other genes in the same domain, which suggests that these characteristics are features of the domain and are not required for these genes to function. Comparison of this strain of D. virilis with the strain sequenced by the Drosophila 12 Genomes Consortium (Tucson strain 15010-1051.87) indicates that most genes on the dot are under weak purifying selection. Collectively, despite the heterochromatin-like properties of this domain, genes on the dot evolve to maintain function while being responsive to changes in their local environment.

Distribution of gene sizes, coding exon sizes, intron sizes, and intron sizes without repeats. The graphs show empirical cumulative distribution plots for these features on the D. melanogaster and D. virilis dot chromosomes as well as the euchromatic and heterochromatic reference regions. (A) Genes on the dot chromosomes (from the start codon to the stop codon) are larger than genes from the euchromatic reference regions. (B) Coding exons on the dot chromosome tend to be slightly larger than coding exons in the heterochromatic reference region and slightly smaller than the coding exons in the euchromatic reference regions. (C) Introns on the dot chromosome are significantly smaller than the introns in the heterochromatic reference region and larger than introns in the euchromatic reference region. (D) Removing the repeats from introns reduces but does not eliminate this difference.

Leung W, Shaffer CD, Cordonnier T, et al. Evolution of a distinct genomic domain in Drosophila: comparative analysis of the dot chromosome in Drosophila melanogaster and Drosophila virilis. Genetics. 2010;185(4):1519‐1534. doi:10.1534/genetics.110.116129

The Genomics Education Partnership: Successful Integration of Research Into Laboratory Classes at a Diverse Group of Undergraduate Institutions

Abstract: Genomics is not only essential for students to understand biology but also provides unprecedented opportunities for undergraduate research. The goal of the Genomics Education Partnership (GEP), a collaboration between a growing number of colleges and universities around the country and the Department of Biology and Genome Center of Washington University in St. Louis, is to provide such research opportunities. Using a versatile curriculum that has been adapted to many different class settings, GEP undergraduates undertake projects to bring draft-quality genomic sequence up to high quality and/or participate in the annotation of these sequences. GEP undergraduates have improved more than 2 million bases of draft genomic sequence from several species of Drosophila and have produced hundreds of gene models using evidence-based manual annotation. Students appreciate their ability to make a contribution to ongoing research, and report increased independence and a more active learning approach after participation in GEP projects. They show knowledge gains on pre- and postcourse quizzes about genes and genomes and in bioinformatic analysis. Participating faculty also report professional gains, increased access to genomics-related technology, and an overall positive experience. We have found that using a genomics research project as the core of a laboratory course is rewarding for both faculty and students.

Example of a student annotation of a gene. (A) Student-generated gene model (orange) compared with models from various ab initio gene prediction algorithms. Note the first two exons (top left) of the manually generated model, not found in any of the ab initio predictions. (B) Alignment of the amino acids of the first two exons of the gene model from D. melanogaster and the student model from D. erecta.

Shaffer CD, Alvarez C, Bailey C, et al. The Genomics Education Partnership: Successful Integration of Research Into Laboratory Classes at a Diverse Group of Undergraduate Institutions. CBE Life Sci Educ. 2010;9(1):55‐69. doi:10.1187/09-11-0087