Chapter 6. Retrieving Information from Text Data
In this chapter, we will cover the following recipes:
Detecting tokens (words) using Java
Detecting sentences using Java
Detecting tokens (words) and sentences using OpenNLP
Retrieving lemmas and parts of speech, and recognizing named entities, from tokens using Stanford CoreNLP
Measuring text similarity with the cosine similarity measure using Java 8
Extracting topics from text documents using Mallet
Classifying text documents using Mallet
Classifying text documents using Weka