
Learn from Data

  • 360 min read
  • 2017-03-09 00:00:00


In this article by Rushdi Shams, the author of the book Java Data Science Cookbook, we will cover recipes that use machine learning techniques to learn patterns from data. These patterns are at the centre of attention for at least three key machine-learning tasks: classification, regression, and clustering. Classification is the task of predicting the value of a nominal class attribute. Regression models, in contrast, attempt to predict the value of a numeric class attribute.


Generating linear regression models

Most linear regression modelling follows a general pattern: many independent variables collectively produce a result, which is the dependent variable. For instance, we can generate a regression model to predict the price of a house based on its attributes or features (mostly numeric, real values), such as its size in square feet, number of bedrooms, number of washrooms, importance of its location, and so on.

In this recipe, we will use Weka’s Linear Regression classifier to generate a regression model.
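To make the idea of fitting a line concrete before turning to Weka, here is a minimal sketch of ordinary least squares with a single independent variable. The class name, method name, and the house data are hypothetical and chosen only for illustration. Weka's classifier handles many attributes at once, so this is not how Weka itself is implemented.

```java
// A minimal sketch of simple (one-variable) least squares.
// All names and data here are hypothetical, for illustration only.
class SimpleLeastSquaresSketch {

    /** Fits y = intercept + slope * x and returns {intercept, slope}. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // slope = covariance(x, y) / variance(x)
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {intercept, slope};
    }

    public static void main(String[] args) {
        // Hypothetical houses: size in square feet, price in thousands.
        double[] size = {1000, 1500, 2000, 2500};
        double[] price = {200, 275, 350, 425};
        double[] model = fit(size, price);
        System.out.println("price = " + model[0] + " + " + model[1] + " * size");
    }
}
```

The made-up prices lie exactly on a straight line, so the fitted intercept and slope recover that line. Real data would leave residual error.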

Getting ready

In order to perform the recipes in this section, we will require the following:

  1. To download Weka, go to http://www.cs.waikato.ac.nz/ml/weka/downloading.html, where you will find download options for Windows, Mac, and other operating systems such as Linux. Read through the options carefully and download the appropriate version.

    At the time of writing, 3.9.0 was the latest developer version. As the author already had version 1.8 of the JVM installed on his 64-bit Windows machine, he chose to download the self-extracting executable for 64-bit Windows without a Java Virtual Machine (JVM).


  2. After the download is complete, double-click on the executable file and follow the on-screen instructions. You need to install the full version of Weka.

    Once the installation is done, do not run the software. Instead, go to the directory where you installed it and find the Java Archive file for Weka (weka.jar). Add this file to your Eclipse project as an external library.


  3. If you need to download an older version of Weka for some reason, all of them can be found at https://sourceforge.net/projects/weka/files/. Please note that many of the methods from older versions may be deprecated and therefore no longer supported.

How to do it…

In this recipe, the linear regression model we will be creating is based on the cpu.arff dataset that can be found in the data directory of the Weka installation directory.
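For readers who have not opened an ARFF file before, the following sketch shows the general shape of the format and why the recipe treats the last attribute as the class. The sample text is a tiny hypothetical file, not the real cpu.arff, and the class and method names are assumptions made for illustration. In the recipe itself, Weka's DataSource does the actual parsing.

```java
import java.util.ArrayList;
import java.util.List;

// A minimal sketch of the ARFF header layout, using a hypothetical sample.
class ArffHeaderSketch {

    // Hypothetical ARFF content: a relation name, attribute declarations,
    // and then comma-separated data rows after the @data marker.
    static final String SAMPLE =
            "@relation cpu-sketch\n"
            + "@attribute MYCT numeric\n"
            + "@attribute MMIN numeric\n"
            + "@attribute class numeric\n"
            + "@data\n"
            + "100,200,30\n"
            + "150,400,60\n";

    /** Collects the attribute names declared before the @data marker. */
    static List<String> attributeNames(String arff) {
        List<String> names = new ArrayList<>();
        for (String line : arff.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.equalsIgnoreCase("@data")) {
                break; // the header ends here
            }
            if (trimmed.toLowerCase().startsWith("@attribute")) {
                names.add(trimmed.split("\\s+")[1]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = attributeNames(SAMPLE);
        // By convention the class is the last attribute, hence size() - 1.
        int classIndex = names.size() - 1;
        System.out.println("Class attribute: " + names.get(classIndex));
    }
}
```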

  1. Our code will have two instance variables: the first variable will contain the data instances of cpu.arff file and the second variable will be our linear regression classifier.
    Instances cpu = null;
    LinearRegression lReg ;
    

    Next, we will be creating a method to load the ARFF file and assign the last attribute of the ARFF file as its class attribute.

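    // Loads the ARFF file and marks the last attribute as the class attribute.
    public void loadArff(String arffInput){
        DataSource source = null;
        try {
            source = new DataSource(arffInput);
            cpu = source.getDataSet();
            cpu.setClassIndex(cpu.numAttributes() - 1);
        } catch (Exception e1) {
            // Report the failure instead of silently ignoring it.
            e1.printStackTrace();
        }
    }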
    

    Next, we will create a method to build the linear regression model. To do so, we simply call the buildClassifier() method of our linear regression variable. The model can then be passed directly to System.out.println(), which prints its textual description.

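    public void buildRegression(){
        lReg = new LinearRegression();
        try {
            lReg.buildClassifier(cpu);
        } catch (Exception e) {
            // Report the failure instead of silently ignoring it.
            e.printStackTrace();
        }
        // Printing the classifier shows the model description.
        System.out.println(lReg);
    }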
    
  2. The complete code for the recipe is as follows:
    import weka.classifiers.functions.LinearRegression;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    
    public class WekaLinearRegressionTest {
        Instances cpu = null;
        LinearRegression lReg;
    
        // Loads the ARFF file and marks the last attribute as the class attribute.
        public void loadArff(String arffInput){
            DataSource source = null;
            try {
                source = new DataSource(arffInput);
                cpu = source.getDataSet();
                cpu.setClassIndex(cpu.numAttributes() - 1);
            } catch (Exception e1) {
                // Report the failure instead of silently ignoring it.
                e1.printStackTrace();
            }
        }
    
        // Builds the linear regression model and prints its description.
        public void buildRegression(){
            lReg = new LinearRegression();
            try {
                lReg.buildClassifier(cpu);
            } catch (Exception e) {
                e.printStackTrace();
            }
            System.out.println(lReg);
        }
    
        public static void main(String[] args) throws Exception{
            WekaLinearRegressionTest test = new WekaLinearRegressionTest();
            // Replace the placeholder with the actual path to cpu.arff.
            test.loadArff("path to the cpu.arff file");
            test.buildRegression();
        }
    }
    
  3. The output of the code is as follows:

    Linear Regression Model

    class =
    
          0.0491 * MYCT +
          0.0152 * MMIN +
          0.0056 * MMAX +
          0.6298 * CACH +
          1.4599 * CHMAX +
        -56.075
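The printed model is simply a weighted sum of attribute values plus a constant, so it can be applied by hand. The sketch below copies the coefficients from the output above into plain Java and evaluates them for one made-up machine. The class name, method name, and the input values are hypothetical. In a real project one would more likely ask the trained Weka classifier itself for predictions.

```java
// A minimal sketch that applies the printed regression equation by hand.
// The coefficients come from the recipe output; everything else is hypothetical.
class CpuModelSketch {

    /** Evaluates the printed linear model for one set of attribute values. */
    static double predict(double myct, double mmin, double mmax,
                          double cach, double chmax) {
        return 0.0491 * myct
                + 0.0152 * mmin
                + 0.0056 * mmax
                + 0.6298 * cach
                + 1.4599 * chmax
                - 56.075;
    }

    public static void main(String[] args) {
        // Hypothetical attribute values for a single machine.
        double predicted = predict(125, 256, 6000, 256, 128);
        System.out.println("Predicted class value: " + predicted);
    }
}
```

Because the coefficients in the output are rounded to four decimal places, a prediction computed this way can differ slightly from what the trained model would return.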
    

Generating logistic regression models

Weka has a class named Logistic that can be used for building and using a multinomial logistic regression model with a ridge estimator. Although the original logistic regression algorithm does not deal with instance weights, the algorithm in Weka has been modified to handle them.

In this recipe, we will use Weka to generate a logistic regression model on the iris dataset.
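As background, a multinomial logistic model turns one linear score per class into class probabilities. The sketch below assumes the common formulation in which the last class acts as the reference class with a score of zero, so a problem with K classes needs only K - 1 sets of coefficients. The class name, method name, and the scores are hypothetical, and the sketch leaves out the ridge estimator and instance weights entirely.

```java
// A minimal sketch of multinomial logistic probabilities.
// Assumption: the last class is the reference class with score 0.
class MultinomialLogisticSketch {

    /**
     * Takes linear scores for the first K - 1 classes and returns
     * K probabilities, the last one belonging to the reference class.
     */
    static double[] probabilities(double[] scores) {
        int k = scores.length + 1;
        double denominator = 1.0; // exp(0) for the reference class
        for (double score : scores) {
            denominator += Math.exp(score);
        }
        double[] result = new double[k];
        for (int i = 0; i < scores.length; i++) {
            result[i] = Math.exp(scores[i]) / denominator;
        }
        result[k - 1] = 1.0 / denominator;
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical scores for two non-reference classes.
        double[] p = probabilities(new double[] {2.0, 0.5});
        for (int i = 0; i < p.length; i++) {
            System.out.println("P(class " + i + ") = " + p[i]);
        }
    }
}
```

This also suggests why a model for a three-class dataset would list coefficients for only two of the classes.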

How to do it…

We will be generating a logistic regression model from the iris dataset that can be found in the data directory in the installed folder of Weka.

  1. Our code will have two instance variables: one will contain the data instances of the iris dataset and the other will be the logistic regression classifier.
    Instances iris = null;
    Logistic logReg ;
    
  2. We will be using a method to load and read the dataset as well as assign its class attribute (the last attribute of iris.arff file):
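    // Loads the ARFF file and marks the last attribute as the class attribute.
    public void loadArff(String arffInput){
        DataSource source = null;
        try {
            source = new DataSource(arffInput);
            iris = source.getDataSet();
            iris.setClassIndex(iris.numAttributes() - 1);
        } catch (Exception e1) {
            // Report the failure instead of silently ignoring it.
            e1.printStackTrace();
        }
    }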
    
  3. Next, we will be creating the most important method of our recipe that builds a logistic regression classifier from the iris dataset:

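    public void buildRegression(){
        logReg = new Logistic();

        try {
            logReg.buildClassifier(iris);
        } catch (Exception e) {
            // Report the failure instead of silently ignoring it.
            e.printStackTrace();
        }
        // Printing the classifier shows the coefficients and odds ratios.
        System.out.println(logReg);
    }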
    
  4. The complete executable code for the recipe is as follows:
    import weka.classifiers.functions.Logistic;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    
    public class WekaLogisticRegressionTest {
        Instances iris = null;
        Logistic logReg;
    
        // Loads the ARFF file and marks the last attribute as the class attribute.
        public void loadArff(String arffInput){
            DataSource source = null;
            try {
                source = new DataSource(arffInput);
                iris = source.getDataSet();
                iris.setClassIndex(iris.numAttributes() - 1);
            } catch (Exception e1) {
                // Report the failure instead of silently ignoring it.
                e1.printStackTrace();
            }
        }
    
        // Builds the logistic regression model and prints its description.
        public void buildRegression(){
            logReg = new Logistic();
    
            try {
                logReg.buildClassifier(iris);
            } catch (Exception e) {
                e.printStackTrace();
            }
            System.out.println(logReg);
        }
    
        public static void main(String[] args) throws Exception{
            WekaLogisticRegressionTest test = new WekaLogisticRegressionTest();
            // Replace the placeholder with the actual path to iris.arff.
            test.loadArff("path to the iris.arff file");
            test.buildRegression();
        }
    }
    
  5. The output of the code is as follows:
    Logistic Regression with ridge parameter of 1.0E-8
    Coefficients...
                             Class
    Variable           Iris-setosa  Iris-versicolor
    ===============================================
    sepallength            21.8065           2.4652
    sepalwidth              4.5648           6.6809
    petallength           -26.3083          -9.4293
    petalwidth             -43.887         -18.2859
    Intercept               8.1743           42.637
    
    
    Odds Ratios...
                             Class
    Variable           Iris-setosa  Iris-versicolor
    ===============================================
    sepallength    2954196659.8892          11.7653
    sepalwidth             96.0426         797.0304
    petallength                  0           0.0001
    petalwidth                   0                0 
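One relationship in this output is easy to check by hand: each value in the Odds Ratios table is the exponential of the corresponding coefficient. The sketch below recomputes a few of them from the printed coefficients. The class and method names are hypothetical, and the results match the table only approximately because the printed coefficients are rounded.

```java
// A minimal sketch relating the printed coefficients to the odds ratios.
// The coefficients are copied from the recipe output; names are hypothetical.
class OddsRatioSketch {

    /** The odds ratio for an attribute is e raised to its coefficient. */
    static double oddsRatio(double coefficient) {
        return Math.exp(coefficient);
    }

    public static void main(String[] args) {
        // sepallength and sepalwidth coefficients for Iris-versicolor.
        System.out.println("sepallength: " + oddsRatio(2.4652));
        System.out.println("sepalwidth:  " + oddsRatio(6.6809));
        // A large negative coefficient gives an odds ratio close to zero,
        // which is why some entries in the table are displayed as 0.
        System.out.println("petallength (Iris-setosa): " + oddsRatio(-26.3083));
    }
}
```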

The interpretation of the results from the recipe is beyond the scope of this article. Interested readers are encouraged to see a Stack Overflow discussion here: http://stackoverflow.com/questions/19136213/how-to-interpret-weka-logistic-regression-output.

Summary

In this article, we covered recipes that use machine learning techniques to learn patterns from data. These patterns are at the centre of attention for at least three key machine-learning tasks: classification, regression, and clustering. Classification is the task of predicting the value of a nominal class attribute. Specifically, we used Weka to generate linear and logistic regression models.
