Connect.BarcodeofLife.net

International online community for DNA barcoding professionals

Please comment so I can recommend a final set of ITS standards to GenBank for implementation of the BARCODE flag before the end of July.

These are also available as an attached Word file on the site. Contentious areas for focus are indicated in blue:

Barcode Standards

1.  Standardized sequence title should be "Fungal ITS barcode".

2.  The barcode includes the complete ITS1 spacer, 5.8S gene, and ITS2 spacer, presented as a single contiguous sequence in the correct 5'-to-3' order.  The barcode includes the entire 5.8S gene, which should not be parsed out of the barcode and need not be indicated as a 'feature' in the GenBank record.  The parsing will be done automatically.

3.  The barcode length will be defined by the bordering criteria below, but will normally be between 400 and 800 bp. In some fungal lineages, shorter or longer ITS barcodes may occur and will be accepted as barcodes, provided they are full-length and meet the prevailing length standards for DNA barcodes in other domains of life.

4.  Barcode starts with the last five bases of the nuclear small ribosomal subunit, normally CATTA, but the formal barcode starts after these five bases.

5. Barcode normally ends with the first 5 bases of the nuclear large ribosomal subunit, which in Saccharomyces cerevisiae are GTTTG, but the formal barcode excludes these bases; barcodes longer than 800 bp need not have the 3' motif provided the 5' motif is included.

{We need to take longer ITS sequences into account here; records of reads up to 2500 bp exist for some species.

Also, the immediate start and end of a sequence are sometimes of lower read quality than the core part. Criteria 4 and 5 imply that particular attention has to be paid to the 5' and 3' ends; sometimes another round of sequencing may be needed to verify the authenticity of the barcode in these regions.}

6. Metadata follows the CBOL standards for other organisms.

7. Barcodes must not contain IUPAC DNA ambiguity symbols.

8. Multiple, divergent ITS barcode sequences are possible for a single species.
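As a rough illustration of how standards 3, 4, 5 and 7 could be checked together, here is a minimal Python sketch. It is not part of the proposal: the function name and constants are hypothetical, the read is assumed to begin exactly at the 5' motif, and exact matching to CATTA/GTTTG is a simplification, since the standards only say these motifs are "normally" present.

```python
import re

# Assumed defaults taken from the draft standards above; all names are hypothetical.
SSU_END = "CATTA"      # last five bases of the small subunit (standard 4)
LSU_START = "GTTTG"    # first five bases of the large subunit in S. cerevisiae (standard 5)
TYPICAL_MIN, TYPICAL_MAX = 400, 800  # usual length range in bp (standard 3)


def check_its_barcode(read, ssu_end=SSU_END, lsu_start=LSU_START):
    """Trim the flanking motifs from a read and list which draft standards it fails."""
    seq = read.upper()
    problems = []

    has_5prime = seq.startswith(ssu_end)
    has_3prime = seq.endswith(lsu_start)
    if not has_5prime:
        problems.append("5' motif missing")

    # The formal barcode excludes the flanking motifs (standards 4 and 5).
    start = len(ssu_end) if has_5prime else 0
    end = len(seq) - len(lsu_start) if has_3prime else len(seq)
    barcode = seq[start:end]

    # Standard 5: the 3' motif may be absent only for barcodes longer than 800 bp.
    if not has_3prime and len(barcode) <= TYPICAL_MAX:
        problems.append("3' motif missing")

    # Standard 7 as drafted: only unambiguous bases are allowed.
    if re.search(r"[^ACGT]", barcode):
        problems.append("ambiguity symbol present")

    # Standard 3: lengths outside 400-800 bp are allowed, so this is a flag, not a failure.
    unusual_length = not (TYPICAL_MIN <= len(barcode) <= TYPICAL_MAX)
    return barcode, problems, unusual_length
```

The unusual-length flag is returned separately from the list of problems because standard 3 accepts shorter or longer barcodes in some lineages; a curator would presumably want to look at those records rather than reject them.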

Additional considerations

a. By submitting a barcode, the user confirms that at least basic steps, such as a BLAST search in INSD, have been taken to verify the authenticity and reliability of the sequence with respect to, e.g., taxonomic affiliation and non-chimeric nature.

b. Users submitting barcodes with significant homopolymer regions, such as ...AAAAAA..., must verify that particular attention was paid to the chromatograms in these regions.

c. By submitting a barcode, the user acknowledges that s/he has looked at the barcode in a multiple alignment featuring several related sequences, and found the results to be satisfactory.

d. Barcodes must not represent consensus sequences computed from multiple alignments.
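For consideration (b), a submitter could locate the regions whose chromatograms need a second look with something like the following Python sketch. The function name is hypothetical, and the threshold of six identical bases is only an assumption based on the ...AAAAAA... example; the draft does not define "significant".

```python
import re


def homopolymer_runs(seq, min_len=6):
    """Return (start, base, length) for every run of a single base of at least min_len.

    Start positions are 0-based. min_len=6 is an assumed threshold, not part of the draft.
    """
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]


# Example: one run of seven A's is reported; the run of five T's is below the threshold.
example = "ACGT" + "A" * 7 + "GC" + "T" * 5
print(homopolymer_runs(example))
```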


Replies to This Discussion

#8: Multiple, divergent ITS barcode sequences are possible for a single species.

Interesting. So barcodes could be submitted from a genome sequence (or clone libraries) with all the different haplotypes? These could well be different from the “consensus” sequence derived from PCR. Or does this refer to gel separation of different PCR’d ITS sequences? As far as I can tell, the experimental derivation of the sequences is not covered in the CBOL barcode metadata.
http://www.barcodeoflife.org/content/resources/standards-and-guidel...

 

#7: Going backwards, because the barcode standard document I linked to above mentions using N’s for low-quality data (see 4B). Are N’s OK but not the other symbols, e.g. W for A or T? Or are we saying that any ambiguity is forbidden? This is a great goal but might be a big challenge for some groups. It may also encourage submitters to just pick one of the bases if they have an ambiguity, because otherwise they won’t get their barcode. Of course, with the trace files available, this can always be checked at a later point.

# Additional (d): I see what is meant here, but maybe change it to read “multiple sequence alignments” so it could not be confused with multiple trace file alignments.

Great points, Conrad. I like the comprehensive outline of the requirements for the ITS fungal barcode very much. Thanks also to all contributors to this discussion round, from which I see that divergent ITS barcodes are a major issue. I would suggest using the major ITS sequence type out of the overall ITS moieties. The major or predominant ITS type is routinely obtained by direct sequencing, ignoring the 'background noise'. If cloning is required, we could think of selecting 5 (or fewer or more) clones to be sequenced and compared with each other.
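If clones are sequenced, one simple way to pick the predominant type would be to count identical sequences, as in this Python sketch. It is only an illustration: the function name is hypothetical, and it assumes the clone sequences have already been trimmed to the same barcode boundaries, so that identical ITS types are identical strings.

```python
from collections import Counter


def predominant_type(clone_seqs):
    """Return (sequence, count) for the most frequent sequence among the clones.

    Exact string matching is assumed. Ties go to the type seen first, and an
    empty list raises IndexError.
    """
    counts = Counter(s.upper() for s in clone_seqs)
    return counts.most_common(1)[0]
```

With five clones of which three are identical, the shared sequence is returned with a count of 3. Whether 3 out of 5 is enough to call a type "predominant" is a judgment the sketch does not make.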

Hi Conrad, it's a great discussion. I suggest allowing for the possibility of including different ITS copies found in the same thallus. There are some cases where different ITS copies can be found in the same thallus within the range of a single species ITS barcode. Perhaps we can include this in point 8.

Do we need to have any minimum number of samples per species studied for the standard barcode?

How can we judge whether the authors have studied enough samples to cover the genetic diversity of a species?

Thanks for starting the discussion!

I would agree with Pradeep Kumar Divakar that one should allow several divergent ITS sequences per individual, and thus rephrase #8 as 'Multiple, divergent ITS barcode sequences are possible for a single genet.' As it stands, #8 does no harm, but it is not necessary either. Intraspecific genetic variation is a generally acknowledged reality. In the early barcoding literature this was recognised (for COI), and a species barcode, as opposed to a specimen barcode, was considered complete only if there were at least 10 sequences from 10 different specimens, ideally from geographically distant areas. However, these sequences may stem from different authors and may accumulate over time, so it does not make sense to include any required number of specimen barcodes in these rules, which are for 'specimen barcodes' and not for 'species barcodes'. I would not necessarily exclude cloned ITS sequences from the barcode flag, but I would suggest that these ought to be identified as such.

Intragenomic variation goes hand in hand with #7, 'Barcodes must not contain IUPAC DNA ambiguity symbols.' Obviously, one would want to exclude low-quality data. However, I agree with Bevan Weir that this poses the risk of obvious ambiguities being ignored in the editing process.

To demonstrate the issue of intragenomic ITS variation: I have almost 120 Hebeloma hiemale sequences from all over Europe and the US, and not a single one is without an ambiguity in good-quality traces, recognisable in the reads from both primers. This might be an extreme example, but it is not totally exceptional. Therefore, I would suggest rephrasing #7 as 'Barcodes must not contain 'N' and not more than [5] ambiguity symbols in a row.' Here 5 is a first suggestion, knowing that there are taxa where a good proportion of the representatives have 4 ambiguous bp in a row. One might also consider excluding triple ambiguities, because as soon as these occur, reading direct sequence data becomes more speculative.
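The rephrased #7 suggested here could be checked along the following lines. This is a minimal Python sketch under stated assumptions: the function name and messages are hypothetical, "triple ambiguities" is read as the IUPAC three-base codes B, D, H and V, and the limit of 5 is the bracketed first suggestion rather than an agreed value.

```python
import re

TWO_BASE = "RYSWKM"   # IUPAC codes standing for two possible bases
THREE_BASE = "BDHV"   # IUPAC codes standing for three possible bases


def check_ambiguities(seq, max_run=5, allow_three_base=False):
    """List violations of the proposed rule: no N, limited runs of ambiguity codes."""
    seq = seq.upper()
    problems = []

    if "N" in seq:
        problems.append("contains N")

    # Optional stricter reading: reject three-base codes outright.
    if not allow_three_base and re.search(f"[{THREE_BASE}]", seq):
        problems.append("contains three-base ambiguity symbol")

    # Longest stretch of consecutive ambiguity codes (N is handled separately above).
    runs = re.finditer(f"[{TWO_BASE}{THREE_BASE}]+", seq)
    longest = max((len(m.group(0)) for m in runs), default=0)
    if longest > max_run:
        problems.append(f"more than {max_run} ambiguity symbols in a row")

    return problems
```

A sequence with isolated two-base codes, such as the Hebeloma hiemale reads described above, would pass with an empty list, while anything containing N would be reported.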

Related to point 5, and the inclusion of "NNN": I was quite worried about the long ITS2 region with multiple repetitions in some species, such as the Bremia lactucae complex (around 2500 bp). At least from herbarium material, it was not possible to get good amplification of the whole ITS (including the proposed start and end); we always need to do it in parts (at least 3 primer pairs). Thus, after assembling all the sequences, a short part of 5.8S remains without sequence. Would it be possible to include NNNN for this part, around 50 bp?

Related to point 8: when two or more ribotypes are obtained from the same specimen, as in some corticioid fungi (around 30 specimens/species), in my opinion we should include the different sequences as barcodes for the species. From the same specimen we have found up to 3 clearly distinct ribotypes.

Speaking about species: although we have not published the paper yet, we have found that some corticioid species can have up to 9 ITS ribotypes, but when they are included in the alignment of the genus, all the sequences fall in the same clade. The problem is whether we have to choose one of the sequences as the barcode of the species.

Great discussion!  I agree with Bevan regarding the wording of (d) in that you are referring to "multiple sequence alignments" and not "multiple trace file alignments", the latter of which is a mandatory requirement for any barcode.

#2 and #5 seem to be in conflict in that we require the complete ITS1 spacer, 5.8S gene, and ITS2 spacer except when the sequence is longer than 800 bp.  Where then is the 3' end?  Are there examples where the ITS1 spacer and 5.8S gene are already 800 bp long before you even include the ITS2 spacer?

Andy

In the different fungi that I have studied, only some Annulohypoxylon specimens have a long ITS region; excluding the last five bases of 18S and including 5.8S, it is around 730 bp.

© 2013   Created by Matthew Fisher.