Student Frequently Asked Questions

The following information was compiled from students’ Frequently Asked Questions (FAQs) about the general processes and key ideas of eukaryotic gene annotation. Special thanks to GEP TA D’Andrew Harrington (College of Southern Nevada) for creating the original version of this living document.

GEP UCSC Genome Browser

If your gene annotation contains many in-frame stop codons, or your first CDS does not appear to begin with a start methionine, you are most likely examining your gene in the wrong orientation.
      1. Your BLAST results should indicate whether your query is expected to be on the positive or negative strand. Be sure to read your BLAST report thoroughly.
      2. If your gene is on the negative strand, click the “reverse” button to display the reverse complement, so the gene reads 5′ to 3′ from left to right. The figure below shows what this might look like when examining your first and second CDSs.
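The “reverse” view shows the reverse complement of the region. As a minimal sketch in plain Python (not part of the GEP tools), the same transformation can be written as follows; the sequence in the usage line is a made-up fragment.

```python
# Minimal sketch: reverse-complement a DNA sequence, mirroring what the
# browser's "reverse" view shows for a gene on the negative strand.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Hypothetical fragment as it reads on the positive strand.
    print(reverse_complement("TTACATCAT"))  # ATGATGTAA
```

Read on the positive strand, this fragment does not begin with ATG; after reverse-complementing, it starts with ATG and ends with the stop codon TAA.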
 

We see CDS1 and CDS2 in the incorrect orientation. Can you spot the problems with this reading?

In the figure above, this reading of CDS1 and CDS2 shows two crucial errors:
  • CDS1 does not begin with a start methionine (ATG).
  • Every reading frame in CDS2 contains multiple in-frame stop codons.
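Both errors can be checked mechanically. The sketch below is a simplified illustration, assuming you paste in the sequence exactly as displayed: it counts stop codons (TAA, TAG, TGA) in each of the three forward frames and tests whether frame 0 starts with ATG without a premature stop. The start check only makes sense for the first CDS; later CDSs do not start with ATG and may begin in any phase. The function names are hypothetical.

```python
# Minimal sketch of the two checks described above: does the first CDS start
# with ATG, and how many in-frame stop codons does each reading frame contain?
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str, frame: int):
    """Yield the complete codons of seq, starting at offset frame (0, 1, or 2)."""
    seq = seq.upper()
    for i in range(frame, len(seq) - 2, 3):
        yield seq[i:i + 3]


def stop_counts(seq: str) -> list[int]:
    """Count in-frame stop codons in each of the three forward frames."""
    return [sum(c in STOP_CODONS for c in codons(seq, f)) for f in range(3)]


def looks_like_cds_start(seq: str) -> bool:
    """True if frame 0 begins with ATG and has no stop before its last codon."""
    cods = list(codons(seq, 0))
    if not cods or cods[0] != "ATG":
        return False
    return not any(c in STOP_CODONS for c in cods[:-1])


if __name__ == "__main__":
    fragment = "ATGAAATAA"  # hypothetical sequence
    print(stop_counts(fragment), looks_like_cds_start(fragment))  # [1, 1, 0] True
```

Running the same checks on the sequence before and after reversing it shows which orientation gives an open reading frame.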

We see CDS1 and CDS2 in the reverse orientation. Can you spot why this reading is correct, compared with the figure above?

It’s okay if the gene is on opposite strands in D. melanogaster and in the species you are annotating. What matters is that the rest of the genomic neighborhood makes sense: adjacent genes that are on opposite strands in D. melanogaster should also be on opposite strands in your species. Ideally, the relative orientation and order of neighboring genes is conserved.
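The neighborhood check above compares relationships between neighbors rather than absolute strands. A minimal sketch, assuming you list the strands (“+” or “-”) of the same orthologous genes in the same gene order for both species; the function names are hypothetical:

```python
def same_strand_pattern(strands: list[str]) -> list[bool]:
    """For each pair of adjacent genes, True if both are on the same strand."""
    return [a == b for a, b in zip(strands, strands[1:])]


def neighborhood_consistent(mel: list[str], target: list[str]) -> bool:
    """Compare same/opposite-strand relationships between adjacent genes.

    Both lists must give the strands ("+" or "-") of the same orthologous
    genes, in the same gene order.
    """
    if len(mel) != len(target):
        raise ValueError("both neighborhoods must list the same genes")
    return same_strand_pattern(mel) == same_strand_pattern(target)


if __name__ == "__main__":
    # Hypothetical three-gene neighborhood that is inverted as a block.
    print(neighborhood_consistent(["+", "-", "+"], ["-", "+", "-"]))  # True
```

In the usage example every gene flips strand, yet each pair of neighbors stays on opposite strands, so the neighborhood is still consistent.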

Which tracks to turn on depends on the kind of research being done; tracks in the GEP UCSC Genome Browser have been designed with specific analyses in mind. It is essential to ask, “What information am I looking for that the Genome Browser can provide?” Examples include RNA-Seq data, in-frame stop codons, comparative genomics across a genus, and many others. In general, we recommend starting with the “Default Tracks.” In addition, the following tracks are especially useful during annotation work:

  • Mapping and Sequencing Tracks → Base Position → FULL
  • Genes and Gene Prediction Tracks → FlyBase Genes → PACK
  • RNA Seq Tracks → FlyBase Exon Junctions → PACK
  • Updated Transcriptome Tracks → Splice Junctions (discretionary) → PACK
  • Comparative Genomics tracks (if available)

At first glance, it can be challenging to tell which nucleotide position the GEP UCSC Genome Browser is showing you. A useful way to read the Base Position track is to look for the pipe symbol (“|”). The number between each pair of pipes gives the scaffold coordinate, so you can quickly work out which nucleotide corresponds to which position on the scaffold.

The start of CDS1, showing how each position number is enclosed in pipes. The pipes are color-coded for easier viewing.
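Once you have a reference coordinate from the ruler or from the position box, you can count bases from there. The arithmetic is sketched below, assuming the window is written as a 1-based, inclusive start-end range (as in the browser's position box) and that you count bases from the left edge of the display. In the reversed view, coordinates decrease from left to right. The function name is hypothetical.

```python
def scaffold_position(window_start: int, window_end: int, offset: int,
                      reversed_view: bool = False) -> int:
    """Map a base's 0-based offset from the left edge of the display to its
    1-based scaffold coordinate, for a window shown as start-end (inclusive).

    In the reversed view, coordinates decrease from left to right.
    """
    if not 0 <= offset <= window_end - window_start:
        raise ValueError("offset falls outside the displayed window")
    return window_end - offset if reversed_view else window_start + offset


if __name__ == "__main__":
    # Hypothetical window scaffold:1,001-1,100; the 10th base from the left.
    print(scaffold_position(1001, 1100, 9))        # 1010
    print(scaffold_position(1001, 1100, 9, True))  # 1091
```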

You can increase the font size by clicking “configure” and then selecting a different “text size.” Several sizes are available, so pick whichever you find most comfortable.

We see the configure button located on the universal ribbon panel.

The configure page offers many display options, including text size.

BLAST: Basic Local Alignment Search Tool

BLAST (Basic Local Alignment Search Tool) finds regions of local similarity between a query sequence and the sequences in a database or subject. For each match, it reports an E-value (Expect value): the number of hits with an equal or better score that you would expect to find by chance in a search of that size. The lower the E-value, the more significant the match. The five main types of BLAST and their uses are summarized below:

BLAST Type | Query (sequence to match) | Database/Subject (searching for match) | Function | Use Cases
blastn (nucleotide) | nucleotide | nucleotide | searching with shorter queries, cross-species comparison | map mRNAs against genomic assemblies
blastp (protein) | protein | protein | general sequence identification and similarity searches | search for proteins similar to predicted genes
blastx | nucleotide → protein | protein | identifying potential protein products encoded by a nucleotide query | map proteins/CDS against genomic sequence
tblastn | protein | nucleotide → protein | identifying database sequences encoding proteins similar to the query | map proteins against genomic assemblies
tblastx | nucleotide → protein | nucleotide → protein | identifying nucleotide sequences similar to the query based on their coding potential | identify genes in unannotated sequences

Arrows (→) indicate that the BLAST program translates the nucleotide sequence (in all six reading frames) before performing the search.
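The E-value described at the start of this section is tied to the bit score shown on the results page. A commonly used approximation is E ≈ m × n × 2^(−S′), where m is the query length, n is the total length of the database, and S′ is the bit score. The sketch below ignores the effective-length corrections BLAST applies, so treat its output as a rough estimate rather than the value BLAST reports.

```python
def approximate_evalue(bit_score: float, query_len: int, db_len: int) -> float:
    """E ~= m * n * 2**(-S'), without BLAST's effective-length corrections."""
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Hypothetical numbers: a 300-residue query, a 10^8-residue database.
    for bits in (30, 50, 100):
        print(bits, f"{approximate_evalue(bits, 300, 10**8):.2e}")
```

Each additional bit halves the expected number of chance hits, which is why a modest increase in bit score produces a dramatic drop in E-value.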

Each BLAST program answers a different question, so different programs will give you different results for the same sequence. If you use the wrong program, expect results that don’t make sense and valuable time lost. The table below frames each program as a scientific question in the context of your work with the GEP. If you ever see results that do not make sense, feel free to reach out to the GEP TAs, and we will be happy to provide more in-depth explanations and assistance with your queries.

BLAST Type | Description of Usage as a Scientific Question
blastn | “Are there nucleotide similarities from D. melanogaster to D. yakuba?”
blastp | “Are there peptide similarities from D. melanogaster to D. yakuba?”
blastx | “Are there peptide similarities inside of D. melanogaster that I can find with only my nucleotide sequence from D. yakuba?”
tblastn | “Are there nucleotide similarities inside of D. melanogaster that I can find with only my peptide sequence from D. yakuba?”
tblastx | “Are there translated nucleotide similarities from D. melanogaster that are found in the translated nucleotides of D. yakuba?”
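The choice among these programs comes down to two questions: what type of sequence is the query, and what type is the database or subject? That mapping can be sketched as a small lookup. The function and argument names are illustrative only and are not part of any BLAST interface.

```python
# Illustrative lookup from (query type, database type) to BLAST program.
PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}


def choose_program(query_type: str, subject_type: str,
                   compare_translations: bool = False) -> str:
    """Pick a BLAST program; compare_translations selects tblastx for DNA vs DNA."""
    key = (query_type, subject_type)
    if key not in PROGRAMS:
        raise ValueError(f"unsupported sequence types: {key}")
    if compare_translations:
        if key != ("nucleotide", "nucleotide"):
            raise ValueError("translated comparison needs nucleotide query and subject")
        return "tblastx"
    return PROGRAMS[key]


if __name__ == "__main__":
    # D. yakuba genomic sequence (nucleotide) vs a D. melanogaster protein.
    print(choose_program("nucleotide", "protein"))  # blastx
```

tblastx is kept behind a flag because it uses the same input types as blastn; only the intent (comparing translations) distinguishes them.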

In the GEP, BLAST searches comparing a target species with our reference species D. melanogaster can be run at three levels of scope. Instead of searching broad areas (all NCBI genomes) that may contain no information relevant to your query, you can run narrower searches, such as Entrez and Assembly searches. It helps to think of these search scopes as a funnel, where each narrower scope is a subset of the broader one. The figures below show each search type and how it relates to the search results.

An inverted pyramid showing that each search type is a subset of the one before it. Notice that the size of each level reflects the size of the search area.

The differences between Broad (All NCBI), Entrez, and Assembly searches, visualized as funnels.
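The funnel can also be thought of as successive filters over one collection of subject sequences. The sketch below uses made-up records (the accessions, organism labels, and assembly names are hypothetical) to show that each narrower scope is a subset of the broader one and that the total number of bases searched shrinks. Because the E-value grows with database size, a narrower search also reports a smaller E-value for the same alignment score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    """Hypothetical stand-in for one database sequence."""
    accession: str
    organism: str
    assembly: str
    length: int


def entrez_limited(subjects, organism):
    """Keep only sequences from one organism (like an Entrez organism limit)."""
    return [s for s in subjects if s.organism == organism]


def assembly_limited(subjects, assembly):
    """Keep only sequences that belong to one genome assembly."""
    return [s for s in subjects if s.assembly == assembly]


def search_space(subjects):
    """Total number of bases that would be searched."""
    return sum(s.length for s in subjects)


if __name__ == "__main__":
    broad = [
        Subject("seq1", "D. yakuba", "asm_A", 5_000_000),
        Subject("seq2", "D. yakuba", "asm_B", 4_000_000),
        Subject("seq3", "D. melanogaster", "asm_C", 6_000_000),
        Subject("seq4", "H. sapiens", "asm_D", 9_000_000),
    ]
    entrez = entrez_limited(broad, "D. yakuba")
    assembly = assembly_limited(entrez, "asm_A")
    for name, level in (("Broad", broad), ("Entrez", entrez), ("Assembly", assembly)):
        print(f"{name:8} {len(level)} sequences, {search_space(level):,} bases")
```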

Reading the BLAST results page can be daunting at first. Be sure to anchor yourself to your initial scientific question so you don’t get lost. For a detailed breakdown of reading BLAST results, refer to the GEP Tools | NCBI BLAST video tutorial. Below is an example of a BLAST results page and a legend with specific explanations for how to interpret the page.

A BLAST search results page for the target species D. yakuba.

Legend key to interpret the BLAST search results page shown above.
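The strand information mentioned at the start of this FAQ can also be read from a tabular version of the results. The sketch below assumes the 12-column, tab-separated layout that BLAST+ writes with `-outfmt 6` (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score), with comment lines starting with “#”. When a start coordinate is larger than its end coordinate, that sequence aligns on the minus strand: check the subject columns for blastn and tblastn, and the query columns for blastx. The function name is hypothetical.

```python
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_hits(lines):
    """Parse tabular BLAST hits, add strand labels, and sort by E-value."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        # Reversed coordinates mean the alignment is on the minus strand.
        hit["query_strand"] = "-" if hit["qstart"] > hit["qend"] else "+"
        hit["subject_strand"] = "-" if hit["sstart"] > hit["send"] else "+"
        hits.append(hit)
    return sorted(hits, key=lambda h: h["evalue"])
```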

BLAST results within the GEP should not take several hours. Given the size of typical GEP queries, you should expect searches to finish within a few seconds to several minutes. If your searches are taking several hours, we have a few suggestions:
  • Check which query sequence you submitted and which database you searched against.
    • If the wrong database or query is submitted, BLAST can take much longer to fulfill your request. Take a moment to review your query and database, then resubmit the search.
For more information on troubleshooting your BLAST results and why these results may be taking so long, feel free to reach out to any of our Virtual TAs or see NCBI’s Frequently Asked Questions and Troubleshooting page.
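One mistake worth ruling out is submitting the wrong kind of sequence for the chosen program, for example a protein sequence to blastn. The sketch below is a rough pre-flight check, assuming a FASTA-formatted query; the 90% threshold used to guess whether a sequence is nucleotide or protein is an arbitrary heuristic, not an NCBI rule, and the function names are hypothetical.

```python
# Hypothetical pre-flight check; the 0.9 threshold is an arbitrary heuristic.
EXPECTED_QUERY_TYPE = {
    "blastn": "nucleotide",
    "blastp": "protein",
    "blastx": "nucleotide",
    "tblastn": "protein",
    "tblastx": "nucleotide",
}
NUCLEOTIDE_LETTERS = set("ACGTUN")


def fasta_sequence(text: str) -> str:
    """Join the sequence lines of a FASTA record, skipping '>' header lines."""
    return "".join(line.strip() for line in text.splitlines()
                   if not line.startswith(">"))


def guess_alphabet(seq: str, threshold: float = 0.9) -> str:
    """Call a sequence 'nucleotide' if most letters are A/C/G/T/U/N."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("query contains no sequence letters")
    fraction = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    return "nucleotide" if fraction >= threshold else "protein"


def check_query(program: str, fasta_text: str) -> list[str]:
    """Return a warning if the query type does not match the chosen program."""
    expected = EXPECTED_QUERY_TYPE[program]
    found = guess_alphabet(fasta_sequence(fasta_text))
    if found != expected:
        return [f"{program} expects a {expected} query, but this looks like {found}"]
    return []


if __name__ == "__main__":
    print(check_query("blastn", ">hypothetical_protein\nMKVLAAGIVG"))
```

A check like this only catches a mismatched sequence type; it cannot tell whether you chose the right database, so reviewing the search settings is still worthwhile.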


Parasitoid Wasps Project
