international online community for dna barcoding professionals
Many people truncate their ITS sequences to begin with the last bases of the 18S, which facilitates alignment with some algorithms:
CATTA
In Saccharomyces, the 28S begins with this fairly conserved set of bases:
GTTTGACCTCAAATCAGGTAGGAGTACCCGCTGAACTTAAGCATATCAATAAGCGGAGGA
It will be helpful to establish a standard for ITS barcodes, and the proposal that I put on the table is that the barcode span the region between these two motifs. All of the ITS barcodes that are included in the data release for our paper should then be trimmed to these limits before the data is released.
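A minimal sketch of what trimming to these limits could look like, assuming exact string matching on the two motifs quoted above. The function name `trim_to_motifs` and the `keep_motifs` switch are hypothetical; the proposal does not say whether the motifs themselves stay in the trimmed sequence, so both options are shown.

```python
# The two motifs are quoted from the post; the helper itself is hypothetical.
SSU_END = "CATTA"
LSU_START = "GTTTGACCTCAAATCAGGTAGGAGTACCCGCTGAACTTAAGCATATCAATAAGCGGAGGA"


def trim_to_motifs(seq, five_prime=SSU_END, three_prime=LSU_START, keep_motifs=True):
    """Return the part of seq bounded by the two motifs, or None if either is missing.

    Uses the first exact occurrence of the 5' motif and the first exact
    occurrence of the 3' motif that follows it.
    """
    seq = seq.upper()
    start = seq.find(five_prime)
    if start == -1:
        return None
    end = seq.find(three_prime, start + len(five_prime))
    if end == -1:
        return None
    if keep_motifs:
        return seq[start:end + len(three_prime)]
    return seq[start + len(five_prime):end]
```

A five-base motif such as CATTA can occur by chance elsewhere in a sequence, and the 28S motif varies between taxa, so exact matching like this would only be a starting point.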
It will probably also be very helpful to have a standard DEFINITION string in GenBank, which will facilitate Boolean searches.
I suggest "Fungal ITS barcode" as the standard. This would get around the problem of the hundreds of variants of 'internal transcribed spacer including...' that presently occupy part of the DEFINITION field.
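To illustrate how a fixed string would simplify Boolean searches, here is a rough sketch that builds an Entrez query and an E-utilities `esearch` URL. It assumes the DEFINITION line is searchable through the `[Title]` field of the nucleotide database; the helper names `barcode_query` and `esearch_url` are made up for this example, and no request is actually sent.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def barcode_query(definition="Fungal ITS barcode", organism=None):
    """Build a Boolean Entrez query for the proposed DEFINITION text.

    Assumption: the DEFINITION line is indexed under the [Title] field.
    """
    term = f'"{definition}"[Title]'
    if organism:
        term += f' AND "{organism}"[Organism]'
    return term


def esearch_url(term, retmax=20):
    """Return an esearch URL for the nucleotide database (nothing is fetched)."""
    params = {"db": "nuccore", "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)
```

With one agreed string, a single quoted phrase replaces the long OR-chain that would otherwise be needed to catch every 'internal transcribed spacer including...' variant.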
Perhaps also GenBank could decide and indicate whether the FEATURES will need to be defined for ITS barcodes. To me, this seems unnecessary.
We should make some decision on this, so discussion is hereby solicited!
keith
Permalink Reply by Andrew Miller on May 17, 2011 at 5:34pm
Good thoughts, Keith.
My lab edits all ITS sequences to include the 5' and 3' primers. In most cases, this is ITS1F and ITS4 and would include your two proposed motifs above so I'm good with this.
If I'm not mistaken, there is a lot of useful information included under the FEATURES section including isolate number, etc.
Andy
Permalink Reply by Bryn T. M. Dentinger on May 17, 2011 at 6:03pm
Permalink Reply by Henrik Nilsson on May 18, 2011 at 5:19am
>Keith said:
>It will be helpful to establish a standard for ITS barcodes, and the proposal
>that I put on the table is that it be between these two motifs.
This sounds good to me. What about adding an additional requirement: that the FEATURES field is used to specify the exact start and end of ITS1, 5.8S, and ITS2 in the sequence. Otherwise people might take that sequence and use it under the impression that it features only (full-length) ITS1, 5.8S, and ITS2 (and not any SSU or LSU). That would mess with structure predictions etc.
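As a sketch of why explicit boundaries help: with start and end positions recorded for each region, a downstream user can pull out exactly ITS1, 5.8S, or ITS2 rather than assuming the whole record is spacer. The `Feature` tuple, the helper names, and all coordinates used with them are hypothetical and only mimic the 1-based, inclusive ranges of a GenBank FEATURES table.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def extract(seq, feature):
    """Return the subsequence covered by a 1-based, inclusive feature."""
    return seq[feature.start - 1:feature.end]


def check_features(features, seq_len):
    """List problems: coordinates out of range, or regions not contiguous."""
    problems = []
    prev = None
    for f in features:
        if not (1 <= f.start <= f.end <= seq_len):
            problems.append(f"{f.name}: {f.start}..{f.end} is outside 1..{seq_len}")
        if prev is not None and f.start != prev.end + 1:
            problems.append(f"{f.name} does not start right after {prev.name}")
        prev = f
    return problems
```

For instance, `extract(seq, its2)` would hand only the ITS2 bases to a structure-prediction tool, with no SSU or LSU flanks attached.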
>Keith said:
>It will probably also be very helpful to have a standard DEFINITION string in GenBank,
>which will facilitate Boolean searches.
I agree. Much needed.
>Bryn said:
>has anyone else seen something other than -CAAAT following the -GACCT (et al) motif?
There is some variation here. I put together a FASTA file for you (the first 100 bp of LSU for a reasonably large selection of fungi):
http://www.emerencia.org/100LSUforBryn.zip
Please note that in the RNA -> DNA conversion, I replaced all “u/U” with “t/T”. Thus some species names look a bit funny. Also, there may be a few non-fungal species at the bottom (mouse and so on), used for reference.
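The funny species names presumably come from the replacement also hitting the FASTA header lines. A small sketch of a conversion that leaves headers alone, assuming a plain FASTA file where header lines start with ">"; the function name `rna_to_dna_fasta` is hypothetical and this is not the procedure used for the file above.

```python
def rna_to_dna_fasta(lines):
    """Yield FASTA lines with U/u replaced by T/t in sequence lines only."""
    table = str.maketrans("Uu", "Tt")
    for line in lines:
        if line.startswith(">"):
            # Header line: leave the species name untouched.
            yield line
        else:
            yield line.translate(table)
```

It takes any iterable of lines, so it can be run directly over an open file handle and the result written back out line by line.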
Henrik
Permalink Reply by Henrik Nilsson on May 18, 2011 at 9:00am
Permalink Reply by Bryn T. M. Dentinger on May 18, 2011 at 9:14am
right, thanks for pointing out my taxonomic bias! looks like there is no universally constant region within the 28S that could serve as a handle without needing exceptions.
but rather than trimming sequences to within these motifs, what would be nice is to have all the ITS sequences trimmed to include the CATTA- and whatever 3' motif we can agree on so that it is easy to recognize if the full ITS regions are included in the sequence. -GACCT and its variants followed by -CRRRT (with a few exceptions) still looks like a reasonable 3' motif, but this is just a suggestion.
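A rough sketch of how a degenerate 3' motif like this could be checked, by expanding IUPAC ambiguity codes (R = A or G) into a regular expression. The helper names are hypothetical, and only the exact GACCT is used here since its variants are not spelled out above.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif such as CRRRT into a regex pattern."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(seq, motif="GACCTCRRRT"):
    """Return the 0-based index of the first match, or -1 if there is none."""
    match = re.search(iupac_to_regex(motif), seq.upper())
    return match.start() if match else -1
```

The Saccharomyces start quoted at the top of the thread (GTTTGACCTCAAAT...) matches this pattern, with all three R positions being A.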
© 2012 Created by Matthew Fisher.