Bioinformatics

Sequence Similarity Introduction

This lesson and exercise defines similarity in both a non-biological and a biological sense, quantifies the similarity between two sequences, and explains how a substitution matrix is used to score similarity. Students calculate amino acid similarity scores using the BLOSUM62 substitution matrix, then learn how BLAST detects similarity between two sequences, how to run BLAST, and how to interpret the resulting alignments.
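As a rough illustration of scoring with a substitution matrix, the sketch below sums per-position scores over two already-aligned, ungapped peptides. The dictionary holds only a handful of entries transcribed from the standard BLOSUM62 table, and the names `BLOSUM62_SUBSET`, `pair_score`, and `alignment_score` are hypothetical, chosen for this example; a real analysis would load the full 20 x 20 matrix and handle gaps.

```python
# A few BLOSUM62 entries, keyed by the alphabetically sorted residue pair.
# This is an illustrative subset only, not the full matrix.
BLOSUM62_SUBSET = {
    ("L", "L"): 4, ("I", "L"): 2,
    ("K", "K"): 5, ("K", "R"): 2,
    ("W", "W"): 11,
    ("D", "D"): 6, ("D", "E"): 2,
    ("D", "L"): -4,
}


def pair_score(a, b):
    """Look up the score for one aligned residue pair (the matrix is symmetric)."""
    return BLOSUM62_SUBSET[tuple(sorted((a, b)))]


def alignment_score(seq1, seq2):
    """Sum the substitution scores over an ungapped alignment of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


identical = alignment_score("LKWD", "LKWD")      # 4 + 5 + 11 + 6
conservative = alignment_score("LKWD", "IRWE")   # 2 + 2 + 11 + 2
print(identical, conservative)
```

Identical residues score highest, chemically similar substitutions (L/I, K/R, D/E) still score positively, and dissimilar pairs such as L/D score negatively, which is the behavior the exercise explores by hand.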

Introduction to R and RStudio

This series of modules introduces students to the statistical platform R through the RStudio integrated development environment. Both programs can be downloaded for free. After downloading and installing them according to Module 0, students should watch the accompanying video for an introduction to the new environment. Module 1 presents an exercise in which students work with genomic sequence alignment data to learn how to construct R commands while calculating basic summary statistics and making basic plots.

Motif Discovery in Drosophila

This walkthrough uses FlyBase RNA-Seq Search and the MEME suite to discover motifs that are enriched in a collection of D. melanogaster Muller F element genes that show similar expression patterns.

Annotation of Conserved Motifs in Drosophila

This walkthrough uses FlyBase, FlyFactorSurvey, and Patser to identify transcription factor binding sites in the region surrounding the transcription start site of onecut in D. biarmipes.

Behavior and Limitations of Motif Finding

Developed by Dr. Jeremy Buhler, this exercise uses MEME to discover putative regulatory motifs in a collection of D. melanogaster promoter sequences. It also illustrates some of the challenges associated with motif finding and the limitations of motif finding programs.

Introduction to Motifs and Motif Finding

This document contains the notes from a lecture on motif finding given by Dr. Jeremy Buhler in the Bio 4342 course at WU. The lecture covers the different approaches used to represent sequence motifs and to search for sequence motifs in a genome.
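One common way to represent a motif is a position weight matrix, which can then be slid along a sequence to find the best-scoring site. The sketch below is a minimal illustration of that idea and is not taken from the lecture notes: the example sites, the pseudocount of 1, the uniform background of 0.25 per base, and the function names `build_pwm` and `best_match` are all assumptions made for this example.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length aligned sites."""
    width = len(sites[0])
    pwm = []
    for i in range(width):
        column = [site[i] for site in sites]
        total = len(sites) + pseudocount * len(BASES)
        # log2 of (smoothed base frequency / background frequency)
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def best_match(pwm, sequence):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(pwm)
    scored = [
        (sum(pwm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    score, start = max(scored)
    return start, score


# Hypothetical aligned binding sites, used only to illustrate the method.
sites = ["TATAAA", "TATAAT", "TATATA", "TATAAA"]
pwm = build_pwm(sites)
start, score = best_match(pwm, "GGCTATAAAGG")
print(start, round(score, 2))
```

The pseudocount keeps bases that never appear in a column from receiving a score of negative infinity, which matters when the matrix is built from only a few sites.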

RNA Quantitation from RNA-Seq Data

Developed by Dr. Jeremy Buhler, this PowerPoint presentation provides an overview of the approaches for quantifying transcript abundance from RNA-Seq data. The presentation includes a discussion of the benefits and limitations of two commonly used measures of RNA abundance: RPKM and TPM.
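The difference between the two measures comes down to the order of normalization, which the following sketch shows using their standard definitions. The read counts and gene lengths are made-up numbers, and the function names `rpkm` and `tpm` are hypothetical, chosen for this example rather than taken from the presentation.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total_millions = sum(counts) / 1e6
    return [c / (length / 1e3) / total_millions
            for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: normalize by length first, then rescale to 1e6."""
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    scale = sum(rates) / 1e6
    return [r / scale for r in rates]


# Made-up example: three genes with their mapped read counts and lengths.
counts = [500, 500, 1000]
lengths_bp = [1000, 2000, 1000]

rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
print([round(v) for v in rpkm_values])
print([round(v) for v in tpm_values], round(sum(tpm_values)))
```

Because TPM values always sum to one million within a sample, they can be read as relative proportions, whereas the sum of RPKM values varies from sample to sample.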

Generating Multiple Sequence Alignments with ClustalW

Dr. Susan Parrish (McDaniel College) developed a basic lecture and weblem exercise (found at the end of the lecture) on using ClustalW to generate multiple sequence alignments, phylograms, and cladograms. The lecture and exercise are given before students begin the GEP annotation projects. Students who submit their GEP annotation projects early are then asked to generate multiple sequence alignments and phylograms that compare the putative proteins encoded within their assigned contig or fosmid with the related proteins encoded by other Drosophila species of interest to the GEP.

Generating Multiple Sequence Alignments with Clustal Omega

Developed by Dr. Susan Parrish (McDaniel College), this PowerPoint presentation describes how Clustal Omega can be used to produce multiple sequence alignments. Alignments of the nucleotide sequences surrounding transcription start sites can be used to identify core promoter motifs, while alignments of protein sequences can be used to identify conserved domains. The presentation also discusses two strategies (UPGMA and Neighbor Joining) that are often used to construct phylogenetic trees.