SQL-like databases
Although there is a good bit of hype around interesting NoSQL databases and key-value stores, SQL-like databases are still ubiquitous. Every data scientist will, at some point, be processing data from an SQL-like database, such as Postgres, MySQL, or SQLite.
For example, we may be required to query one or more tables in a Postgres database to generate a set of features for model training. After using that model to make predictions or identify anomalies, we may send results to another database table that drives a dashboard or other reporting tool.
Go, of course, interacts nicely with all the popular data stores, such as SQL, NoSQL, key-value, and so on, but here, we will focus on SQL-like interactions. We will utilize database/sql
for these interactions throughout the book.
Connecting to an SQL database
The first thing we need do before connecting to an SQL-like database is identify the particular database that we will be interacting with and import a corresponding driver. In the following examples, we will be connecting to a Postgres database and will utilize the github.com/lib/pq
database driver for database/sql
. This driver can be loaded via an empty import (with a corresponding comment):
import ( "database/sql" "fmt" "log" "os" // pq is the library that allows us to connect // to postgres with databases/sql. _ "github.com/lib/pq" )
Now let's assume that you have exported the Postgres connection string to an environmental variable PGURL
. We can easily create an sql.DB
value for our connection via the follow code:
// Get the postgres connection URL. I have it stored in // an environmental variable. pgURL := os.Getenv("PGURL") if pgURL == "" { log.Fatal("PGURL empty") } // Open a database value. Specify the postgres driver // for databases/sql. db, err := sql.Open("postgres", pgURL) if err != nil { log.Fatal(err) } defer db.Close()
Note that we need to defer the close
method on this value. Also, note that creating this value does not mean that you have made a successful connection to the database. This is merely a value used by database/sql
to connect to the database when triggered to do so by certain operations (such as a query).
To ensure that we can make a successful connection to the database, we can use the Ping
method:
if err := db.Ping(); err != nil { log.Fatal(err) }
Querying the database
Now that we know how to connect to the database, let's see how we can get data out of the database. We won't cover the specifics of SQL queries and statements in this book. If you are not familiar with SQL, I would highly recommend that you learn how to query, insert, and so on, but for our purposes here, you should know that there are basically two types of operations we want to perform as related to SQL databases:
- A
Query
operation selects, groups, or aggregates data in the database and returns rows of data to us - An
Exec
operation updates, inserts, or otherwise modifies the state of the database without an expectation that portions of the data stored in the database should be returned
As you might expect, to get data out of our database, we will use a Query
operation. To do this, we need to query the database with an SQL statement string. For example, imagine we have a database storing a bunch of iris flower measurements (petal length, petal width, and so on), we could query some of that data related to a particular iris species as follows:
// Query the database. rows, err := db.Query(` SELECT sepal_length as sLength, sepal_width as sWidth, petal_length as pLength, petal_width as pWidth FROM iris WHERE species = $1`, "Iris-setosa") if err != nil { log.Fatal(err) } defer rows.Close()
Note that this returns a pointer to an sql.Rows
value, and we need to defer the closing of this rows value. Then we can loop over our rows and parse the data into values of expected type. We utilize the Scan
method on rows to parse out the columns returned by the SQL query and print them to standard out:
// Iterate over the rows, sending the results to // standard out. for rows.Next() { var ( sLength float64 sWidth float64 pLength float64 pWidth float64 ) if err := rows.Scan(&sLength, &sWidth, &pLength, &pWidth); err != nil { log.Fatal(err) } fmt.Printf("%.2f, %.2f, %.2f, %.2f\n", sLength, sWidth, pLength, pWidth) }
Finally, we need to check for any errors that might have occurred while processing our rows. We want to maintain the integrity of our data handling, and we cannot assume that we looped over all the rows without encountering an error:
// Check for errors after we are done iterating over rows. if err := rows.Err(); err != nil { log.Fatal(err) }
Modifying the database
As mentioned earlier, there is another flavor of interaction with the database called Exec
. With these types of statements, we are concerned with updating, adding to, or otherwise modifying the state of one or more tables in the database. We use the same type of database connection, but instead of calling db.Query
, we will call db.Exec
.
For example, let's say we want to update some of the values in our iris database table:
// Update some values. res, err := db.Exec("UPDATE iris SET species = 'setosa' WHERE species = 'Iris-setosa'") if err != nil { log.Fatal(err) }
But how do we know whether we were successful and changed something? Well, the res
function returned here allows us to see how many rows of our table were affected by our update:
// See how many rows where updated. rowCount, err := res.RowsAffected() if err != nil { log.Fatal(err) } // Output the number of rows to standard out. log.Printf("affected = %d\n", rowCount)