Building a text classifier
A classifier assigns the items in a dataset to classes. The Naive Bayes classifier is widely used in the literature to categorize texts based on a trained model. This section starts with a text dataset: feature extraction pulls the key terms from the text, a term frequency-inverse document frequency (tf-idf) transformation weights each term by its importance, and the classifier is trained on the result. Finally, the trained classifier predicts the category of new input, and the output is printed.
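The flow described above can be sketched in miniature before working with the real dataset. The example below is a minimal sketch rather than the recipe's own code: the toy sentences are hypothetical, and the choice of scikit-learn's CountVectorizer, TfidfTransformer, and MultinomialNB is an assumption, since the text names only Naive Bayes and tf-idf. The labels 'Baseball' and 'OuterSpace' are borrowed from the category mapping used in this recipe.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus standing in for the real text dataset
train_texts = [
    "The pitcher threw a fastball and the batter hit a home run",
    "The team won the baseball game in the ninth inning",
    "The rocket launched a satellite into orbit",
    "Astronauts aboard the space station observed the moon",
]
train_labels = ["Baseball", "Baseball", "OuterSpace", "OuterSpace"]

# Feature extraction: turn each document into a vector of term counts
vectorizer = CountVectorizer()
train_counts = vectorizer.fit_transform(train_texts)

# tf-idf transformation: reweight the counts so that terms appearing
# in every document (such as "the") carry less weight
tfidf = TfidfTransformer()
train_tfidf = tfidf.fit_transform(train_counts)

# Train a multinomial Naive Bayes classifier on the weighted features
classifier = MultinomialNB().fit(train_tfidf, train_labels)

# Predict categories for unseen sentences, reusing the fitted transformers
test_texts = [
    "The batter hit the ball in the last inning",
    "The satellite reached orbit around the moon",
]
test_tfidf = tfidf.transform(vectorizer.transform(test_texts))
predictions = classifier.predict(test_tfidf)

for text, label in zip(test_texts, predictions):
    print(text, "->", label)
```

Note that the test sentences go through transform, not fit_transform, so they are encoded with the vocabulary and idf weights learned from the training data.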
How to do it...
- Include the following lines in a new Python file to add datasets:
from sklearn.datasets import fetch_20newsgroups
# Map the newsgroup names to readable category labels
category_mapping = {'misc.forsale': 'Sellings', 'rec.motorcycles': 'Motorbikes',
        'rec.sport.baseball': 'Baseball', 'sci.crypt': 'Cryptography',
        'sci.space': 'OuterSpace'}
training_content = fetch_20newsgroups(subset='train',
categories=category_mapping...