Using logistic regression
Despite its name, logistic regression is a classification method, and a very powerful one when it comes to text-based classification. It works by first fitting a logistic function to the data, which is a regression, hence the name.
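To make the idea concrete, the following is a minimal sketch of the logistic function and of how its output could be turned into a class label. The helper names logistic and classify and the 0.5 threshold are assumptions chosen for illustration; they do not come from the text.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def classify(z, threshold=0.5):
    """Turn the logistic output into a class label, 0 or 1.

    The 0.5 threshold is an illustrative assumption.
    """
    return (logistic(z) >= threshold).astype(int)

# Strongly negative inputs map close to 0, strongly positive ones close to 1.
z = np.array([-4.0, 0.0, 4.0])
probs = logistic(z)
labels = classify(z)
```

Because the output always lies between 0 and 1, it can be read as the probability of belonging to class 1, which is what makes a regression usable for classification.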
A bit of math with a small example
To get an initial understanding of the way logistic regression works, let's first take a look at the following example, where we have artificial feature values, X, plotted with the corresponding classes, 0 or 1:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

np.random.seed(3)  # for reproducibility
NUM_PER_CLASS = 40

# Class 0 is drawn around 2, class 1 around 8, so the two classes overlap
X_log = np.hstack((norm.rvs(2, size=NUM_PER_CLASS, scale=2),
                   norm.rvs(8, size=NUM_PER_CLASS, scale=3)))
y_log = np.hstack((np.zeros(NUM_PER_CLASS),
                   np.ones(NUM_PER_CLASS))).astype(int)

# Plot each feature value against its class, colored by class
plt.xlim((-5, 20))
plt.scatter(X_log, y_log, c=np.array(['blue', 'red'])[y_log], s=10)
plt.xlabel("feature value")
plt.ylabel("class")
Refer to the following graph:

As we can see, the data...