A few thoughts about possible tools for the user --
(1) BLAST against RefSeq genomes:
The BLAST home page (http://blast.ncbi.nlm.nih.gov/Blast.cgi) lists a subset of model organism genomes against which the user can compare his or her data, under the header “BLAST Assembled RefSeq Genomes.” I don’t know whether those genomes are annotated with every feature the user is interested in (e.g., the various types of RNAs), but they are certainly annotated with coding regions (including the corresponding nucleotide spans and reading frames as annotated by RefSeq on the genome), and I’m guessing they are annotated with features such as promoters where data are available for a given organism. If the user finds BLAST hits, the feature annotations on the matched regions can presumably be mapped to the user’s query genome, provided the score is high enough.
If the user’s organism is not listed in that subset of model organisms, s/he can follow the link for “list all genomic BLAST databases” and click on the round “B” icon beside the organism of interest (e.g., Aspergillus niger in the Fungi group). That will open the organism’s genomic BLAST page.
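To illustrate the idea of mapping annotations from a hit back onto the query, here is a minimal Python sketch. It assumes the hits were saved in the default BLAST+ tabular layout (-outfmt 6: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The feature list, the sequence names, the bit-score cutoff of 200, and the helper names are all hypothetical, and the coordinate transfer is only exact for an ungapped plus/plus alignment, so treat this as a rough outline rather than the way NCBI does it.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    qseqid: str
    sseqid: str
    pident: float
    qstart: int
    qend: int
    sstart: int
    send: int
    bitscore: float


def parse_tabular_line(line: str) -> Hit:
    """Parse one line of default BLAST+ tabular output (-outfmt 6)."""
    f = line.rstrip("\n").split("\t")
    return Hit(f[0], f[1], float(f[2]), int(f[6]), int(f[7]),
               int(f[8]), int(f[9]), float(f[11]))


def project_features(hit: Hit,
                     features: List[Tuple[str, int, int]],
                     min_bitscore: float = 200.0) -> List[Tuple[str, int, int]]:
    """Carry subject features that overlap the hit over to query coordinates.

    Assumption: the alignment is plus/plus and ungapped, so a single
    constant offset converts subject positions to query positions.
    The bit-score cutoff is an arbitrary illustrative value.
    """
    if hit.bitscore < min_bitscore or hit.sstart > hit.send:
        return []  # weak hit, or minus-strand hit (not handled here)
    offset = hit.qstart - hit.sstart
    projected = []
    for name, f_start, f_end in features:
        lo = max(f_start, hit.sstart)  # clip the feature to the aligned span
        hi = min(f_end, hit.send)
        if lo <= hi:
            projected.append((name, lo + offset, hi + offset))
    return projected


# Hypothetical hit line and hypothetical annotations on the matched genome.
line = "query1\tsubject_genome\t98.50\t400\t6\t0\t101\t500\t1001\t1400\t0.0\t700"
features = [("CDS_geneA", 1100, 1250),
            ("CDS_geneB", 1350, 1600),
            ("promoter_geneC", 2000, 2100)]

hit = parse_tabular_line(line)
for name, start, end in project_features(hit, features):
    print(name, start, end)
```

A real alignment usually contains gaps, so a feature transferred this way can be off by a few bases. The result is best read as a pointer to where to look in the query, not as a finished annotation.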
(2) Genome Workbench (http://www.ncbi.nlm.nih.gov/projects/gbench/).
“An integrated application for viewing and analyzing sequence data. With Genome Workbench, you can view data in publically available sequence databases at NCBI, and mix these data with your own data.” Judging by its URL and brief help pages, it looks more like a research project than an official public resource, but it is available for public use.
P.S. I also thought about Sequin as a third possible tool for the user, but called [REDACTED] to find out more about what it does and does not do. [REDACTED] confirmed my assumption that Sequin is meant mostly as a submission tool and not an analysis tool, although it has a few prototype analytical capabilities built into it. In case it helps, here are a few notes about Sequin, including comments from [REDACTED]:
The Sequin submission tool (http://www.ncbi.nlm.nih.gov/Sequin/) can handle prokaryotic genome sequences and eukaryotic genome sequences, and it includes some analytical capabilities, such as identification of conserved domains. The following is an excerpt from the “File Menu/Open” section of the Sequin help doc (http://www.ncbi.nlm.nih.gov/Sequin/sequin.hlp.html#Open):
“…[Sequin] can also open a FASTA-formatted sequence file. The sequence will be displayed in Sequin and can be analyzed with tools such as CDD Search [to find conserved domains/functional information], but [your query sequence] should not be submitted [to GenBank], because it does not have the appropriate annotations.”
[REDACTED] also explained --
- Sequin doesn’t find coding sequences itself. Instead, it requires the submitter to supply this information either (1) as a table of intervals for the coding regions, which it can then translate into proteins, or (2) as a set of translated protein products, which Sequin compares against the nucleotide sequence using a “suggest intervals” algorithm that suggests likely coding sequences for those proteins, to the best of its ability.
- Finding genes de novo (building gene models based on rules) in a query sequence is still an active area of research. The algorithms look for things such as start signals, stop signals, and splice sites, but it’s still a big guess as to which gene models are real. It’s also more difficult to find coding sequences in eukaryotes because of splice sites. NCBI does the latter types of analyses when they annotate genomes, but they also compare genomic sequence data against known genes (which is essentially what the user would be doing via BLAST against genomes).
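To make the two points above a bit more concrete, here is a small Python sketch of the simplest rule-based approach: scan for a start codon (ATG) followed by an in-frame stop codon, then translate the resulting interval. This is not Sequin’s “suggest intervals” algorithm and not NCBI’s annotation pipeline. The function names and the example sequence are made up for illustration, and the code assumes the standard genetic code, the forward strand only, and no introns, which is exactly why this kind of scan is a poor fit for eukaryotic genes.

```python
from itertools import product

# Standard genetic code (translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end) intervals, 1-based inclusive, for ATG...stop
    stretches in the three forward frames. The stop codon is included."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOPS:
                    j += 3
                if j + 3 > len(seq):
                    break  # ran off the end without finding a stop
                if (j - i) // 3 >= min_codons:
                    orfs.append((i + 1, j + 3))
                i = j + 3  # resume scanning after the stop codon
            else:
                i += 3
    return orfs


def translate(seq, start, end):
    """Translate a 1-based inclusive interval, stopping at the first stop.
    Codons that are not in the table (e.g. containing N) become 'X'."""
    sub = seq.upper()[start - 1:end]
    protein = []
    for k in range(0, len(sub) - 2, 3):
        aa = CODON_TABLE.get(sub[k:k + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Made-up example sequence with one short open reading frame.
example = "CCATGAAATTTGGGTAACC"
for start, end in find_orfs(example):
    print(start, end, translate(example, start, end))
```

The `translate` function on its own corresponds to the first option [REDACTED] described, where the submitter already knows the coding intervals and only needs the protein translation. The `find_orfs` scan is the guesswork part: every interval it reports is only a candidate, which is why comparing against known genes via BLAST remains the more reliable check.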
For any questions about this response, please contact: firstname.lastname@example.org.