Connect.BarcodeofLife.net

international online community for dna barcoding professionals

Species classification using DNA sequences vs. DNA barcoding

Hello,

I would like to know the differences between species classification via DNA sequences and DNA barcoding. For example:

We are using a machine learning algorithm to classify four species (Aquifex aeolicus, Bacillus subtilis, Aeropyrum pernix and Buchnera sp), whose sequences can be downloaded from:

ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/

The dataset is described as follows:

Species             Accession   Number of sequences
Aquifex aeolicus    AE000657    51
Bacillus subtilis   AL009126    178
Aeropyrum pernix    BA000002    52
Buchnera sp         BA000003    36

Our algorithm achieved a classification accuracy of 95.85%.

Also, is it possible to classify the species from only a part of the DNA sequence (let's say 10 or 20% of the DNA sequence)? If yes, is it applicable to DNA barcoding?

Thanks

-Taha


Dear Taha,

The circumscription and delineation of species based on DNA sequence data (classification) is the field of DNA Taxonomy. DNA Barcoding, on the other hand, is a) the collection of sequence data of a "universal" locus (animals: COI, plants: rbcL+matK, fungi: ITS, ...) from taxonomically known entities and b) the use of this barcode collection to identify unknown specimens.
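To make part b) concrete, here is a minimal sketch of identifying an unknown specimen against a barcode reference library. It assumes the barcodes are already aligned and of equal length, and it uses the plain proportion of mismatching sites (p-distance) as the similarity measure. The function names, the species labels and the toy sequences are all hypothetical; real barcode identification usually relies on an alignment or search tool rather than this naive comparison.

```python
def p_distance(a, b):
    """Proportion of sites that differ between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def identify(query, library):
    """Return (best_matching_name, distance) for a query barcode.

    `library` maps a taxon name to its reference barcode sequence.
    """
    best = min(library, key=lambda name: p_distance(query, library[name]))
    return best, p_distance(query, library[best])


# Hypothetical toy reference library (made-up sequences, not real barcodes).
reference_library = {
    "species_A": "ACGTACGTAC",
    "species_B": "TTGTACCTAA",
}

match, distance = identify("ACGTACGTAA", reference_library)
print(match, distance)
```

In practice one would also set a distance threshold above which the query is reported as "no match" instead of being forced onto the nearest reference.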

I am not clear about what you are actually doing. Are you feeding the sequence information of the bacterial genomes to the learning algorithm and then using sequence fragments to check if the algorithm can determine the origin of each fragment?

I am a little bit confused because you are using the term classification. Is it about species identity or about systematic relations of the species?

Sincerely,

Thomas

Attachment: 2006 - Recent advances in DNA taxonomy

Dear Thomas,

Thanks for your reply and the attachment. Sorry for the confusion caused by my terms. From a computer science (machine learning) and statistics perspective, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.

In our case, the categories are the four species (Aquifex aeolicus, Bacillus subtilis, Aeropyrum pernix and Buchnera sp). Before training our machine learning algorithm, we have a step called "feature extraction". In this step we collect statistical information about each DNA sequence to generate a numerical vector for it. All vectors have the same length. The next step is to feed 60% of the feature vectors to the learning algorithm in a process called "training". After that, we use the remaining 40% of the feature vectors to test the model by classifying (identifying) each feature vector into its corresponding category (species).
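The pipeline described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the thread does not say which statistics or which learning algorithm were used, so k-mer frequencies stand in for the "statistical information" and a nearest-centroid rule stands in for the classifier. All function names are hypothetical.

```python
import itertools
import random
from collections import Counter


def kmer_features(seq, k=2):
    """Fixed-length vector of k-mer frequencies (assumed feature choice)."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts[km] for km in kmers), 1)  # avoid division by zero
    return [counts[km] / total for km in kmers]


def split_60_40(items, seed=0):
    """Shuffle and split into 60% training and 40% test (not stratified)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(round(0.6 * len(items)))
    return items[:cut], items[cut:]


def train_centroids(train):
    """train: list of (vector, label). Returns label -> mean feature vector."""
    sums, counts = {}, Counter()
    for vec, label in train:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] += 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, vec):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(centroids[lab], vec)),
    )


def accuracy(centroids, test):
    """Fraction of (vector, label) test pairs that are predicted correctly."""
    if not test:
        return 0.0
    return sum(predict(centroids, v) == lab for v, lab in test) / len(test)
```

Given a list of `(sequence, species)` pairs, one would build `(kmer_features(seq), species)` pairs, pass them through `split_60_40`, call `train_centroids` on the first part and `accuracy` on the second. With only 36 to 178 sequences per species, a stratified split would probably be preferable to the plain shuffle used here.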

In another experiment, we use only a part (fragment), for example n%, of each DNA sequence and apply the previous steps to these fragments. Our algorithm achieved more than 80% accuracy in assigning (classifying) each fragment to its corresponding species.

So, would our work be classified as DNA taxonomy?
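For illustration, taking an n% fragment of a sequence might look like the sketch below. The thread does not say how the fragments were chosen, so a single contiguous window (from the start, or at a random position) is an assumption, and `take_fragment` is a hypothetical helper name.

```python
import random


def take_fragment(seq, percent, rng=None):
    """Return a contiguous fragment covering `percent` % of `seq`.

    If `rng` (a random.Random instance) is given, the window start is drawn
    at random; otherwise the fragment is taken from the start of the sequence.
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    length = max(1, int(len(seq) * percent / 100))
    start = rng.randrange(len(seq) - length + 1) if rng else 0
    return seq[start:start + length]


print(take_fragment("ACGTACGTAC", 20))
```

The fragments would then go through the same feature extraction, training and testing steps as the full sequences.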

Best wishes,

Taha

Dear Taha,

From the biological/taxonomic point of view, classification is the process of finding characteristic features to form/define a class/group (e.g. species). What you describe sounds very much like a DNA Barcoding approach, focusing on finding a way to connect DNA sequence data retrieved from an unknown specimen to an existing classification. You are not trying to show whether the species are well or badly defined according to their DNA sequences, which would be DNA Taxonomy.

So, concerning the feature extraction, are you using the complete genome of the species or only a selected part?

Kind regards,

Thomas

Dear Thomas,

Exactly. 

For the feature extraction in our first experiment, we used all the sequences of the designated species listed in the previous table, which I guess amounts to the complete genome. In our second experiment, we are trying to identify the parts of the DNA sequence that identify the species best.

Best wishes,

Taha

Good luck! And let me know when there is a publication to read :-)

Cheers

Thomas

Thanks Thomas for the good wishes :) ... Sure I will. Please let me know if you have any project that I can participate in.

Warm regards,

Taha 
