Learning to classify classy answers
In classification, we want to find the corresponding classes, sometimes also called labels, for given data instances. To be able to achieve this, we need to answer two questions:
- How should we represent the data instances?
- Which model or structure should our classifier possess?
Tuning the instance
In its simplest form, in our case the data instance is the answer text itself and the label would be a binary value indicating whether the asker accepted this text as an answer or not. Raw text, however, is a very inconvenient representation to process for most machine learning algorithms. They want numbers. And it will be our task to extract useful features from the raw text, which the machine learning algorithm can then use to learn the right label for it.
Tuning the classifier
Once we have found or collected enough (text, label) pairs, we can train a classifier. For the underlying structure of the classifier, we have a wide range of possibilities, each of them having...