Packt+ | Advance your knowledge in tech

You're reading from Machine Learning With Go Implement Regression, Classification, Clustering, Time-series Models, Neural Networks, and More using the Go Programming Language

Product type Paperback

Published in Sep 2017

Publisher Packt

ISBN-13 9781785882104

Length 304 pages

Edition 1st Edition

Languages

Concepts

Machine Learning

Author (1):

Joseph Langstaff Whitenack

View More author details

Table of Contents (17) Chapters

Title Page

Credits

About the Author

About the Reviewers

www.PacktPub.com

Customer Feedback

Preface

1. Gathering and Organizing Data

2. Matrices, Probability, and Statistics FREE CHAPTER

3. Evaluation and Validation

4. Regression

5. Classification

6. Clustering

7. Time Series and Anomaly Detection

8. Neural Networks and Deep Learning

9. Deploying and Distributing Analyses and Models

10. Algorithms/Techniques Related to Machine Learning

SQL-like databases

Although there is a good bit of hype around interesting NoSQL databases and key-value stores, SQL-like databases are still ubiquitous. Every data scientist will, at some point, be processing data from an SQL-like database, such as Postgres, MySQL, or SQLite.

For example, we may be required to query one or more tables in a Postgres database to generate a set of features for model training. After using that model to make predictions or identify anomalies, we may send results to another database table that drives a dashboard or other reporting tool.

Go, of course, interacts nicely with all the popular data stores, such as SQL, NoSQL, key-value, and so on, but here, we will focus on SQL-like interactions. We will utilize database/sql for these interactions throughout the book.

Connecting to an SQL database

The first thing we need do before connecting to an SQL-like database is identify the particular database that we will be interacting with and import a corresponding driver. In the following examples, we will be connecting to a Postgres database and will utilize the github.com/lib/pq database driver for database/sql. This driver can be loaded via an empty import (with a corresponding comment):

import (
    "database/sql"
    "fmt"
    "log"
    "os"

    // pq is the library that allows us to connect
    // to postgres with databases/sql.
    _ "github.com/lib/pq"
)

Now let's assume that you have exported the Postgres connection string to an environmental variable PGURL. We can easily create an sql.DB value for our connection via the follow code:

// Get the postgres connection URL. I have it stored in
// an environmental variable.
pgURL := os.Getenv("PGURL")
if pgURL == "" {
    log.Fatal("PGURL empty")
}

// Open a database value. Specify the postgres driver
// for databases/sql.
db, err := sql.Open("postgres", pgURL)
if err != nil {
    log.Fatal(err)
}
defer db.Close()

Note that we need to defer the close method on this value. Also, note that creating this value does not mean that you have made a successful connection to the database. This is merely a value used by database/sql to connect to the database when triggered to do so by certain operations (such as a query).

To ensure that we can make a successful connection to the database, we can use the Ping method:

if err := db.Ping(); err != nil {
    log.Fatal(err)
}

Querying the database

Now that we know how to connect to the database, let's see how we can get data out of the database. We won't cover the specifics of SQL queries and statements in this book. If you are not familiar with SQL, I would highly recommend that you learn how to query, insert, and so on, but for our purposes here, you should know that there are basically two types of operations we want to perform as related to SQL databases:

A Query operation selects, groups, or aggregates data in the database and returns rows of data to us
An Exec operation updates, inserts, or otherwise modifies the state of the database without an expectation that portions of the data stored in the database should be returned

As you might expect, to get data out of our database, we will use a Query operation. To do this, we need to query the database with an SQL statement string. For example, imagine we have a database storing a bunch of iris flower measurements (petal length, petal width, and so on), we could query some of that data related to a particular iris species as follows:

// Query the database.
rows, err := db.Query(`
    SELECT 
        sepal_length as sLength, 
        sepal_width as sWidth, 
        petal_length as pLength, 
        petal_width as pWidth 
    FROM iris
    WHERE species = $1`, "Iris-setosa")
if err != nil {
    log.Fatal(err)
}
defer rows.Close()

Note that this returns a pointer to an sql.Rows value, and we need to defer the closing of this rows value. Then we can loop over our rows and parse the data into values of expected type. We utilize the Scan method on rows to parse out the columns returned by the SQL query and print them to standard out:

// Iterate over the rows, sending the results to
// standard out.
for rows.Next() {

    var (
        sLength float64
        sWidth float64
        pLength float64
        pWidth float64
    )

    if err := rows.Scan(&sLength, &sWidth, &pLength, &pWidth); err != nil {
        log.Fatal(err)
    }

    fmt.Printf("%.2f, %.2f, %.2f, %.2f\n", sLength, sWidth, pLength, pWidth)
}

Finally, we need to check for any errors that might have occurred while processing our rows. We want to maintain the integrity of our data handling, and we cannot assume that we looped over all the rows without encountering an error:

// Check for errors after we are done iterating over rows.
if err := rows.Err(); err != nil {
    log.Fatal(err)
}

Modifying the database

As mentioned earlier, there is another flavor of interaction with the database called Exec. With these types of statements, we are concerned with updating, adding to, or otherwise modifying the state of one or more tables in the database. We use the same type of database connection, but instead of calling db.Query, we will call db.Exec.

For example, let's say we want to update some of the values in our iris database table:

// Update some values.
res, err := db.Exec("UPDATE iris SET species = 'setosa' WHERE species = 'Iris-setosa'")
if err != nil {
    log.Fatal(err)
}

But how do we know whether we were successful and changed something? Well, the res function returned here allows us to see how many rows of our table were affected by our update:

// See how many rows where updated.
rowCount, err := res.RowsAffected()
if err != nil {
    log.Fatal(err)
}

// Output the number of rows to standard out.
log.Printf("affected = %d\n", rowCount)