





















































In this article by Rushdi Shams, the author of the book Java Data Science Cookbook, we will cover recipes that use machine learning techniques to learn patterns from data. These patterns are at the centre of attention for at least three key machine-learning tasks: classification, regression, and clustering. Classification is the task of predicting the value of a nominal class attribute. Regression models, in contrast, attempt to predict the value of a numeric class attribute.
Most linear regression modelling follows a general pattern: many independent variables collectively produce a result, which is the dependent variable. For instance, we can generate a regression model to predict the price of a house from its attributes (mostly numeric, real values), such as its size in square feet, number of bedrooms, number of washrooms, the importance of its location, and so on.
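To make the idea of fitting a dependent variable to independent variables concrete, the following is a minimal sketch of ordinary least squares with a single independent variable (house size). It is not Weka's implementation: the class name HousePriceFit, its methods, and the house data are all hypothetical and chosen only for illustration.

```java
// Minimal sketch: least-squares fit of y = slope * x + intercept for ONE variable.
// HousePriceFit and the data in main() are hypothetical, not part of Weka.
class HousePriceFit {

    /** Returns {slope, intercept} of the least-squares line through (x[i], y[i]). */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // slope = covariance(x, y) / variance(x)
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {slope, intercept};
    }

    /** Applies a fitted {slope, intercept} model to a new value of x. */
    static double predict(double[] model, double x) {
        return model[0] * x + model[1];
    }

    public static void main(String[] args) {
        // Hypothetical training data: size in square feet and sale price.
        double[] sizeSqFt = {1000, 1500, 2000, 2500};
        double[] price = {200000, 250000, 300000, 350000};

        double[] model = fit(sizeSqFt, price);
        System.out.printf("price = %.2f * size + %.2f%n", model[0], model[1]);
        System.out.printf("predicted price for 3000 sq ft: %.2f%n", predict(model, 3000));
    }
}
```

A real house-price model would use several attributes at once, which is what Weka's LinearRegression classifier does in the recipe that follows.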
In this recipe, we will use Weka’s Linear Regression classifier to generate a regression model.
In order to perform the recipes in this section, we will require the following:
At the time of writing, 3.9.0 was the latest developer version of Weka. Because the author already had a version 1.8 JVM installed on his 64-bit Windows machine, he chose to download the self-extracting executable for 64-bit Windows without a Java Virtual Machine (JVM).
Once the installation is done, do not run the software. Instead, go to the directory where you installed it and find the Java Archive file for Weka (weka.jar). Add this file to your Eclipse project as an external library.
In this recipe, the linear regression model we will be creating is based on the cpu.arff dataset that can be found in the data directory of the Weka installation directory.
Instances cpu = null;
LinearRegression lReg ;
Next, we will be creating a method to load the ARFF file and assign the last attribute of the ARFF file as its class attribute.
public void loadArff(String arffInput){
DataSource source = null;
try {
source = new DataSource(arffInput);
cpu = source.getDataSet();
cpu.setClassIndex(cpu.numAttributes() - 1);
} catch (Exception e1) {
e1.printStackTrace(); // report loading failures instead of silently ignoring them
}
}
Next, we will create a method to build the linear regression model. To do so, we simply need to call the buildClassifier() method of our linear regression variable. The model can be passed directly as a parameter to System.out.println().
public void buildRegression(){
lReg = new LinearRegression();
try {
lReg.buildClassifier(cpu);
} catch (Exception e) {
e.printStackTrace(); // report training failures instead of silently ignoring them
}
System.out.println(lReg);
}
The complete code for the recipe is as follows:
import weka.classifiers.functions.LinearRegression;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
public class WekaLinearRegressionTest {
Instances cpu = null;
LinearRegression lReg ;
public void loadArff(String arffInput){
DataSource source = null;
try {
source = new DataSource(arffInput);
cpu = source.getDataSet();
cpu.setClassIndex(cpu.numAttributes() - 1);
} catch (Exception e1) {
e1.printStackTrace(); // report loading failures instead of silently ignoring them
}
}
public void buildRegression(){
lReg = new LinearRegression();
try {
lReg.buildClassifier(cpu);
} catch (Exception e) {
e.printStackTrace(); // report training failures instead of silently ignoring them
}
System.out.println(lReg);
}
public static void main(String[] args) throws Exception{
WekaLinearRegressionTest test = new WekaLinearRegressionTest();
test.loadArff("path to the cpu.arff file");
test.buildRegression();
}
}
The output of the code is as follows:
Linear Regression Model
class =
0.0491 * MYCT +
0.0152 * MMIN +
0.0056 * MMAX +
0.6298 * CACH +
1.4599 * CHMAX +
-56.075
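The printed model is simply a weighted sum of attribute values plus an intercept. The following sketch evaluates that equation by hand for one machine. The coefficients and intercept are copied from the output above; the class name CpuModelEquation and the attribute values in main() are hypothetical and used only for illustration.

```java
// Minimal sketch: evaluating the printed linear regression equation by hand.
// CpuModelEquation and the sample machine in main() are hypothetical.
class CpuModelEquation {

    // Coefficients copied from the model output, in the order MYCT, MMIN, MMAX, CACH, CHMAX.
    static final double[] COEFFICIENTS = {0.0491, 0.0152, 0.0056, 0.6298, 1.4599};
    static final double INTERCEPT = -56.075;

    /** Returns the predicted class value for attribute values given in the order above. */
    static double predict(double[] values) {
        if (values.length != COEFFICIENTS.length) {
            throw new IllegalArgumentException("expected " + COEFFICIENTS.length + " attribute values");
        }
        double result = INTERCEPT;
        for (int i = 0; i < COEFFICIENTS.length; i++) {
            result += COEFFICIENTS[i] * values[i];
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical machine: MYCT=125, MMIN=256, MMAX=6000, CACH=256, CHMAX=128.
        double[] machine = {125, 256, 6000, 256, 128};
        System.out.printf("predicted class value: %.4f%n", predict(machine));
    }
}
```

In a Weka program, the trained classifier's classifyInstance() method performs this calculation on an Instance, so the hand-written version is useful mainly for checking how to read the printed model.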
Weka has a class named Logistic that can be used to build and apply a multinomial logistic regression model with a ridge estimator. Although the original logistic regression algorithm does not deal with instance weights, the implementation in Weka has been modified to handle them.
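As a rough illustration of what a multinomial logistic regression model computes, the sketch below turns per-class linear scores into class probabilities. It assumes the common formulation in which every class except the last has its own coefficient vector and the last class acts as the reference. The class name MultinomialLogisticSketch and all numbers in main() are hypothetical; the sketch does not perform training or apply the ridge penalty.

```java
// Minimal sketch: class probabilities of a multinomial logistic model.
// MultinomialLogisticSketch and the numbers in main() are hypothetical.
class MultinomialLogisticSketch {

    /**
     * coefficients[j] and intercepts[j] describe non-reference class j.
     * Returns K probabilities: the K-1 non-reference classes, then the reference class.
     */
    static double[] classProbabilities(double[][] coefficients, double[] intercepts, double[] x) {
        int k = coefficients.length + 1;
        double[] probs = new double[k];
        double denominator = 1.0; // the reference class contributes exp(0) = 1

        for (int j = 0; j < k - 1; j++) {
            double score = intercepts[j];
            for (int a = 0; a < x.length; a++) {
                score += coefficients[j][a] * x[a];
            }
            probs[j] = Math.exp(score); // naive: very large scores could overflow
            denominator += probs[j];
        }
        probs[k - 1] = 1.0;

        for (int j = 0; j < k; j++) {
            probs[j] /= denominator;
        }
        return probs;
    }

    public static void main(String[] args) {
        // Hypothetical model: three classes, two attributes.
        double[][] coefficients = {{1.0, -2.0}, {0.5, 0.5}};
        double[] intercepts = {0.0, -1.0};
        double[] instance = {1.0, 0.5};

        double[] probs = classProbabilities(coefficients, intercepts, instance);
        for (int j = 0; j < probs.length; j++) {
            System.out.printf("P(class %d) = %.4f%n", j, probs[j]);
        }
    }
}
```

The predicted class is the one with the highest probability, and the probabilities always sum to 1.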
In this recipe, we will use Weka to generate a logistic regression model on the iris dataset.
The dataset can be found in the data directory of the Weka installation directory.
Instances iris = null;
Logistic logReg ;
public void loadArff(String arffInput){
DataSource source = null;
try {
source = new DataSource(arffInput);
iris = source.getDataSet();
iris.setClassIndex(iris.numAttributes() - 1);
} catch (Exception e1) {
e1.printStackTrace(); // report loading failures instead of silently ignoring them
}
}
Next, we will be creating the most important method of our recipe that builds a logistic regression classifier from the iris dataset:
public void buildRegression(){
logReg = new Logistic();
try {
logReg.buildClassifier(iris);
} catch (Exception e) {
e.printStackTrace(); // report training failures instead of silently ignoring them
}
System.out.println(logReg);
}
The complete code for the recipe is as follows:
import weka.classifiers.functions.Logistic;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
public class WekaLogisticRegressionTest {
Instances iris = null;
Logistic logReg ;
public void loadArff(String arffInput){
DataSource source = null;
try {
source = new DataSource(arffInput);
iris = source.getDataSet();
iris.setClassIndex(iris.numAttributes() - 1);
} catch (Exception e1) {
e1.printStackTrace(); // report loading failures instead of silently ignoring them
}
}
public void buildRegression(){
logReg = new Logistic();
try {
logReg.buildClassifier(iris);
} catch (Exception e) {
e.printStackTrace(); // report training failures instead of silently ignoring them
}
System.out.println(logReg);
}
public static void main(String[] args) throws Exception{
WekaLogisticRegressionTest test = new WekaLogisticRegressionTest();
test.loadArff("path to the iris.arff file");
test.buildRegression();
}
}
The output of the code is as follows:
Logistic Regression with ridge parameter of 1.0E-8
Coefficients...
                          Class
Variable           Iris-setosa  Iris-versicolor
===============================================
sepallength            21.8065           2.4652
sepalwidth              4.5648           6.6809
petallength           -26.3083          -9.4293
petalwidth             -43.887         -18.2859
Intercept               8.1743           42.637
Odds Ratios...
                          Class
Variable           Iris-setosa  Iris-versicolor
===============================================
sepallength    2954196659.8892          11.7653
sepalwidth             96.0426         797.0304
petallength                  0           0.0001
petalwidth                   0                0
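One relationship in this output can be checked directly: each odds ratio is the exponential of the corresponding coefficient. The sketch below recomputes a few of them. The coefficient values are copied from the output above; the class name OddsRatioCheck and its method are hypothetical.

```java
// Minimal sketch: an odds ratio is exp(coefficient).
// OddsRatioCheck is a hypothetical helper; coefficients are copied from the output.
class OddsRatioCheck {

    /** Converts a logistic regression coefficient into an odds ratio. */
    static double oddsRatio(double coefficient) {
        return Math.exp(coefficient);
    }

    public static void main(String[] args) {
        // sepallength, Iris-versicolor: coefficient 2.4652, reported odds ratio 11.7653
        System.out.printf("sepallength / Iris-versicolor: %.4f%n", oddsRatio(2.4652));
        // sepalwidth, Iris-setosa: coefficient 4.5648, reported odds ratio 96.0426
        System.out.printf("sepalwidth / Iris-setosa: %.4f%n", oddsRatio(4.5648));
        // petallength, Iris-setosa: coefficient -26.3083, reported odds ratio 0
        System.out.printf("petallength / Iris-setosa: %.4f%n", oddsRatio(-26.3083));
    }
}
```

Because the printed coefficients are rounded to four decimal places, the recomputed values match the reported odds ratios only approximately. Large negative coefficients give odds ratios so small that they are displayed as 0.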
The interpretation of the results from the recipe is beyond the scope of this article. Interested readers are encouraged to see a Stack Overflow discussion here: http://stackoverflow.com/questions/19136213/how-to-interpret-weka-logistic-regression-output.
In this article, we have covered recipes that use machine learning techniques to learn patterns from data. These patterns are at the centre of attention for at least three key machine-learning tasks: classification, regression, and clustering. Classification is the task of predicting the value of a nominal class attribute, whereas regression predicts the value of a numeric class attribute.