Connect.BarcodeofLife.net

International online community for DNA barcoding professionals

Many people truncate their ITS sequences to begin with the last bases of the 18S, which facilitates alignment with some algorithms:

 

CATTA

 

In Saccharomyces, the 28S begins with the following set of bases, which is fairly conserved:

 

GTTTGACCTCAAATCAGGTAGGAGTACCCGCTGAACTTAAGCATATCAATAAGCGGAGGA

 

It will be helpful to establish a standard for ITS barcodes, and the proposal that I put on the table is that it be between these two motifs. All of the ITS barcodes that are included in the data release for our paper should then be trimmed to these limits before the data is released.
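A minimal sketch of trimming a read to these limits, assuming the barcode starts at the first CATTA and stops just before the 28S anchor. Two choices here are assumptions, not part of the proposal: only the first 18 bp of the Saccharomyces 28S motif are used as the 3' anchor (an exact 60 bp match would fail in many fungi), and the first CATTA is taken as the 18S end, which only holds if the read starts a short way into the 18S.

```python
from typing import Optional

# 5' anchor: last bases of the 18S, as proposed above.
SSU_END_MOTIF = "CATTA"
# Assumption: first 18 bp of the Saccharomyces 28S motif, used as the 3' anchor.
LSU_START_MOTIF = "GTTTGACCTCAAATCAGG"


def trim_its(seq: str,
             start_motif: str = SSU_END_MOTIF,
             end_motif: str = LSU_START_MOTIF) -> Optional[str]:
    """Return seq from the start of start_motif up to (not including) end_motif,
    or None if either anchor is missing."""
    s = seq.upper()
    start = s.find(start_motif)  # first CATTA; assumes little 18S upstream
    if start == -1:
        return None
    # Look for the 28S anchor only downstream of the 18S anchor.
    end = s.find(end_motif, start + len(start_motif))
    if end == -1:
        return None
    return s[start:end]
```

Whether the 28S anchor itself should be kept in the released barcode is exactly the open question here; changing `s[start:end]` to `s[start:end + len(end_motif)]` would keep it.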

 

It will probably also be very helpful to have a standard DEFINITION string in GenBank, which will facilitate Boolean searches.

 

I suggest "Fungal ITS barcode" as the standard. This would get around the problem of the hundreds of variants of 'internal transcribed spacer including...' that presently occupy part of the DEFINITION field.
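To illustrate the Boolean-search benefit, here is a sketch that builds (but does not send) an NCBI E-utilities esearch request for records carrying the proposed string. It assumes the Entrez [Title] field, which indexes words in the DEFINITION line, and the [Organism] field; "Fungal ITS barcode" is only the string proposed in this thread, not an existing GenBank convention.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def its_barcode_term(definition: str = "Fungal ITS barcode",
                     organism: str = "Fungi") -> str:
    # Quoting makes this a phrase search; [Title] targets the DEFINITION line.
    return f'"{definition}"[Title] AND {organism}[Organism]'


def its_barcode_search_url(**kwargs) -> str:
    # Build the esearch URL for the nucleotide database; nothing is fetched here.
    params = {"db": "nucleotide", "term": its_barcode_term(**kwargs)}
    return f"{ESEARCH_URL}?{urlencode(params)}"
```

With a single standard string, one phrase query like this would replace OR-ing together the many 'internal transcribed spacer including...' variants.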

 

Perhaps GenBank could also decide, and indicate, whether FEATURES will need to be defined for ITS barcodes. To me, this seems unnecessary.

 

We should make some decision on this, so discussion is hereby solicited!

 

keith


Replies to This Discussion

Good thoughts Keith.

 

My lab edits all ITS sequences to include the 5' and 3' primers. In most cases these are ITS1F and ITS4, so the edited sequences would include both of your proposed motifs, and I'm good with this.

 

If I'm not mistaken, a lot of useful information, such as the isolate number, is included under the FEATURES section.

 

Andy

We have been routinely trimming our ITS sequences to begin and end with the CATTA- and -GACCT motifs. The -GACCT motif can vary a bit depending on which groups you work in (it is highly variable in Hygrocybe, for instance), but in my experience (Agaricomycotina) it is invariably followed by CAAAT, so it is easy to recognize (we recommended the CATTA and GACCT anchors in our paper, DOI: 10.1111/j.1755-0998.2009.02825.x). Has anyone else seen something other than -CAAAT following the -GACCT (et al.) motif?
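One way to check this question against a set of LSU starts is to tally which five bases follow GACCT in each sequence. A minimal sketch, assuming plain sequence strings and looking only at the first GACCT in each; sequences with a variant anchor are simply skipped, so they would need to be checked separately.

```python
from collections import Counter
from typing import Iterable


def tally_after_motif(seqs: Iterable[str], motif: str = "GACCT", width: int = 5) -> Counter:
    """Count the `width` bases following the first occurrence of `motif` in each sequence."""
    counts: Counter = Counter()
    for seq in seqs:
        s = seq.upper()
        i = s.find(motif)
        if i == -1:
            continue  # motif absent (e.g. a variant anchor): skipped, not counted
        following = s[i + len(motif): i + len(motif) + width]
        if len(following) == width:  # ignore sequences that end too soon
            counts[following] += 1
    return counts
```

Any key other than CAAAT in the result would be a counterexample worth looking at.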

>Keith said:

>It will be helpful to establish a standard for ITS barcodes, and the proposal that I put on the table is that it be between these two motifs.

 

This sounds good to me. What about adding a further requirement: that the FEATURES field be used to specify the exact start and end of ITS1, 5.8S, and ITS2 in the sequence? Otherwise people might take that sequence and use it under the impression that it contains only (full-length) ITS1, 5.8S, and ITS2 (and no SSU or LSU). That would mess up structure predictions, etc.
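As a sketch of what such annotation could look like, the function below writes the ITS1, 5.8S and ITS2 coordinates in NCBI's tab-delimited five-column feature table format, using the misc_RNA/rRNA keys and product names that NCBI commonly uses for ITS records. The coordinates are hypothetical inputs; finding the boundaries is a separate step not shown here. Bases before ITS1 or after ITS2 are left as unannotated SSU/LSU flank, which is exactly what a reader would need to see.

```python
from typing import Tuple

Span = Tuple[int, int]  # 1-based, inclusive coordinates


def its_feature_table(seq_id: str, its1: Span, rrna_58s: Span, its2: Span, seq_len: int) -> str:
    """Emit a five-column feature table for ITS1, 5.8S and ITS2."""
    regions = [
        (its1, "misc_RNA", "internal transcribed spacer 1"),
        (rrna_58s, "rRNA", "5.8S ribosomal RNA"),
        (its2, "misc_RNA", "internal transcribed spacer 2"),
    ]
    prev_end = 0
    for (start, end), _, product in regions:
        # Regions must be in order, non-overlapping and inside the sequence.
        if not (prev_end < start <= end <= seq_len):
            raise ValueError(f"bad coordinates for {product}: {start}-{end}")
        prev_end = end
    lines = [f">Feature {seq_id}"]
    for (start, end), key, product in regions:
        lines.append(f"{start}\t{end}\t{key}")       # start, stop, feature key
        lines.append(f"\t\t\tproduct\t{product}")    # qualifier line
    return "\n".join(lines)
```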

 

>Keith said:

>It will probably also be very helpful to have a standard DEFINITION string in GenBank, which will facilitate Boolean searches.

 

I agree. Much needed.

 

>Bryn said:

>has anyone else seen something other than -CAAAT following the -GACCT (et al) motif?

 

There is some variation here. I put together a FASTA file for you (the first 100 bp of the LSU for a reasonably large selection of fungi):

 

http://www.emerencia.org/100LSUforBryn.zip

 

Please note that in the RNA -> DNA conversion I replaced all “u/U” with “t/T”, so some species names look a bit funny. Also, there may be a few non-fungal species at the bottom (mouse and others) used for reference.
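The funny-looking names come from applying the U-to-T replacement to the header lines as well. A minimal sketch that converts only sequence lines, assuming plain FASTA in which every header line starts with ">":

```python
def rna_fasta_to_dna(fasta_text: str) -> str:
    """Convert U/u to T/t in sequence lines only; FASTA headers are copied unchanged."""
    to_dna = str.maketrans("Uu", "Tt")
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)  # header: keep species names intact
        else:
            out.append(line.translate(to_dna))
    return "\n".join(out)
```

For example, a header such as ">Mus musculus" passes through untouched while its sequence lines are converted.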

 

Henrik

Well, in fact only one is non-fungal: >00062Mts_mtsctlts__Motse_J01871__X00525 (Mus musculus)

Right, thanks for pointing out my taxonomic bias! It looks like there is no universally constant region within the 28S that could serve as a handle without needing exceptions.

But rather than trimming sequences to within these motifs, it would be nice to have all ITS sequences trimmed to include the CATTA- motif and whatever 3' motif we can agree on, so that it is easy to recognize whether the full ITS region is included in the sequence. -GACCT and its variants followed by -CRRRT (with a few exceptions) still looks like a reasonable 3' motif, but this is just a suggestion.
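A sketch of how the degenerate context could be used, assuming standard IUPAC codes (R = A or G) translated into regular-expression classes. It trims to begin with CATTA and end with GACCT, accepting GACCT only where it is followed by CRRRT; the lookahead checks the context without keeping it. A None result flags a sequence where the full ITS may not be present. The function names are hypothetical, only the first matching anchor is used, and the exceptions mentioned above (variant GACCT motifs) would need their own patterns.

```python
import re
from typing import Optional

# Standard IUPAC nucleotide codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif: str) -> str:
    return "".join(IUPAC[base] for base in motif.upper())


def trim_to_anchors(seq: str, five_prime: str = "CATTA",
                    three_prime: str = "GACCT", context: str = "CRRRT") -> Optional[str]:
    """Keep five_prime ... three_prime (both included) when three_prime is followed
    by context; return None if either anchor is missing."""
    s = seq.upper()
    start = s.find(five_prime)
    if start == -1:
        return None
    # (?=...) requires the context downstream without including it in the match.
    pattern = re.compile(iupac_to_regex(three_prime) + "(?=" + iupac_to_regex(context) + ")")
    m = pattern.search(s, start + len(five_prime))
    return s[start:m.end()] if m else None
```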


© 2012 Created by Matthew Fisher.