Caching
Sometimes our machine learning algorithms are trained on, or asked to make predictions from, data that comes from external sources such as APIs; that is, data that isn't local to the application running our modeling or analysis. We might also have datasets that are accessed frequently, are likely to be accessed again soon, or need to remain available while the application is running.
In at least some of these cases, it might make sense to cache data in memory or to embed the data locally where the application is running. For example, if you frequently query a government API (which typically has high latency) for census data, you might maintain a local or in-memory cache of the census data you use so that you avoid calling the API repeatedly.
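The census scenario above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not a definitive implementation: CensusCache, Population, and fakeCensusAPI are hypothetical names invented for this example, and fakeCensusAPI stands in for a real high-latency API with made-up data.

```go
package main

import (
	"fmt"
	"sync"
)

// CensusCache is a hypothetical in-memory cache placed in front of a slow
// external lookup. It is an illustration, not part of any library.
type CensusCache struct {
	mu    sync.Mutex
	data  map[string]int
	fetch func(region string) (int, error) // stands in for the remote API call
	Calls int                              // number of times fetch was invoked
}

// NewCensusCache wraps the given lookup function with an empty cache.
func NewCensusCache(fetch func(string) (int, error)) *CensusCache {
	return &CensusCache{data: make(map[string]int), fetch: fetch}
}

// Population returns the cached value when present and only falls back to
// the remote lookup on a cache miss.
func (c *CensusCache) Population(region string) (int, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.data[region]; ok {
		return v, nil
	}
	c.Calls++
	v, err := c.fetch(region)
	if err != nil {
		return 0, err
	}
	c.data[region] = v
	return v, nil
}

// fakeCensusAPI is a made-up stand-in for a high-latency government API.
func fakeCensusAPI(region string) (int, error) {
	populations := map[string]int{"region-a": 1000, "region-b": 2500}
	v, ok := populations[region]
	if !ok {
		return 0, fmt.Errorf("unknown region %q", region)
	}
	return v, nil
}

func main() {
	c := NewCensusCache(fakeCensusAPI)
	for i := 0; i < 3; i++ {
		p, err := c.Population("region-a")
		if err != nil {
			fmt.Println("lookup failed:", err)
			return
		}
		fmt.Println("population:", p)
	}
	fmt.Println("remote calls:", c.Calls)
}
```

Three lookups for the same region result in a single call to the stand-in API. For simplicity, the sketch holds its lock while fetching, which serializes concurrent misses; a production cache would likely want something finer grained.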
Caching data in memory
To cache a series of values in memory, we will use github.com/patrickmn/go-cache. With this package, we can create an in-memory cache of keys and corresponding values. We can even specify settings, such as the time to live, for individual key-value pairs.
To create a new in-memory cache and set a key-value pair in the cache, we do the following:
    // Create a cache with a default expiration time of 5 minutes, and which
    // purges expired items every 30 seconds.
    c := cache.New(5*time.Minute, 30*time.Second)

    // Put a key and value into the cache.
    c.Set("mykey", "myvalue", cache.DefaultExpiration)
To then retrieve the value for mykey out of the cache, we just need to use the Get method:
    v, found := c.Get("mykey")
    if found {
        fmt.Printf("key: mykey, value: %s\n", v)
    }
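To make the time-to-live idea concrete, here is a minimal standard-library sketch of expiring cache entries. It is only an illustration of the concept, not how go-cache is implemented: TTLCache, fakeClock, and defaultTTL are hypothetical names, the cache is not safe for concurrent use, and the clock is advanced by hand so that the example is deterministic.

```go
package main

import (
	"fmt"
	"time"
)

// defaultTTL is an assumed time to live used only by this example.
const defaultTTL = 5 * time.Minute

// entry pairs a cached value with the moment it stops being valid.
type entry struct {
	value   string
	expires time.Time
}

// TTLCache is a hypothetical, simplified cache with per-key expiry.
type TTLCache struct {
	items map[string]entry
	now   func() time.Time // injectable clock
}

// NewTTLCache builds an empty cache that reads the time from now.
func NewTTLCache(now func() time.Time) *TTLCache {
	return &TTLCache{items: make(map[string]entry), now: now}
}

// Set stores value under key and marks it valid for ttl from now.
func (c *TTLCache) Set(key, value string, ttl time.Duration) {
	c.items[key] = entry{value: value, expires: c.now().Add(ttl)}
}

// Get returns the value only if the key exists and has not expired.
// Expired entries are removed when they are looked up.
func (c *TTLCache) Get(key string) (string, bool) {
	e, ok := c.items[key]
	if !ok || !c.now().Before(e.expires) {
		delete(c.items, key)
		return "", false
	}
	return e.value, true
}

// fakeClock is a manually advanced clock used in place of time.Now.
type fakeClock struct{ t time.Time }

func (f *fakeClock) Now() time.Time          { return f.t }
func (f *fakeClock) Advance(d time.Duration) { f.t = f.t.Add(d) }

func main() {
	clk := &fakeClock{t: time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)}
	c := NewTTLCache(clk.Now)
	c.Set("mykey", "myvalue", defaultTTL)

	_, found := c.Get("mykey")
	fmt.Println("found before expiry:", found)

	clk.Advance(6 * time.Minute)
	_, found = c.Get("mykey")
	fmt.Println("found after expiry:", found)
}
```

The first lookup finds the key and the second does not, because the fake clock has moved past the entry's expiry time. This sketch only evicts entries lazily on lookup; a periodic purge, like the 30-second interval configured above, would additionally clean up expired keys that are never requested again.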
Caching data locally on disk
The caching we just saw is in memory. That is, the cached data exists and is accessible while your application is running, but as soon as your application exits, your data disappears. In some cases, you may want your cached data to stick around when your application restarts or exits. You may also want to back up your cache so that your applications don't have to start from scratch, without a cache of relevant data.
In these scenarios, you may consider using a local, embedded cache, such as github.com/boltdb/bolt. BoltDB, as it is referred to, is a very popular project for these sorts of applications, and basically consists of a local key-value store. To initialize one of these local key-value stores, do the following:
    // Open an embedded.db data file in your current directory.
    // It will be created if it doesn't exist.
    db, err := bolt.Open("embedded.db", 0600, nil)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Create a "bucket" in the boltdb file for our data. Using
    // CreateBucketIfNotExists keeps this step from failing when the
    // application is restarted against an existing file.
    if err := db.Update(func(tx *bolt.Tx) error {
        _, err := tx.CreateBucketIfNotExists([]byte("MyBucket"))
        if err != nil {
            return fmt.Errorf("create bucket: %w", err)
        }
        return nil
    }); err != nil {
        log.Fatal(err)
    }
You can, of course, have multiple different buckets of data in your BoltDB and use a filename other than embedded.db.
Next, let's say you had a map of string values in memory that you need to cache in BoltDB. To do this, you would range over the keys and values in the map, updating your BoltDB:
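    // myMap is a hypothetical map of string values that we want to cache.
    myMap := map[string]string{"mykey": "myvalue"}

    // Put the map keys and values into the BoltDB file.
    if err := db.Update(func(tx *bolt.Tx) error {
        b := tx.Bucket([]byte("MyBucket"))
        for k, v := range myMap {
            if err := b.Put([]byte(k), []byte(v)); err != nil {
                return err
            }
        }
        return nil
    }); err != nil {
        log.Fatal(err)
    }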
Then, to get values out of BoltDB, you can view your data:
    // Output the keys and values in the embedded
    // BoltDB file to standard out.
    if err := db.View(func(tx *bolt.Tx) error {
        b := tx.Bucket([]byte("MyBucket"))
        c := b.Cursor()
        for k, v := c.First(); k != nil; k, v = c.Next() {
            fmt.Printf("key: %s, value: %s\n", k, v)
        }
        return nil
    }); err != nil {
        log.Fatal(err)
    }
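The property that distinguishes an on-disk cache from an in-memory one is that the data can be reloaded from a file alone. As a dependency-free illustration of that round trip, the following sketch swaps BoltDB for a plain JSON file written with the standard library. It is not a substitute for BoltDB (there are no buckets, transactions, or incremental updates), and saveCache, loadCache, and withTempCache are hypothetical helpers invented for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// saveCache writes the whole map to path as JSON, replacing any previous file.
func saveCache(path string, data map[string]string) error {
	b, err := json.Marshal(data)
	if err != nil {
		return err
	}
	return os.WriteFile(path, b, 0600)
}

// loadCache reads a map previously written by saveCache.
func loadCache(path string) (map[string]string, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	data := make(map[string]string)
	if err := json.Unmarshal(b, &data); err != nil {
		return nil, err
	}
	return data, nil
}

// withTempCache runs fn with a cache file path inside a temporary directory
// that is removed afterwards, so the example leaves nothing behind.
func withTempCache(fn func(path string) error) error {
	dir, err := os.MkdirTemp("", "cache-example")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	return fn(filepath.Join(dir, "cache.json"))
}

func main() {
	err := withTempCache(func(path string) error {
		if err := saveCache(path, map[string]string{"mykey": "myvalue"}); err != nil {
			return err
		}
		// A later run of the program would start here, from the file alone.
		restored, err := loadCache(path)
		if err != nil {
			return err
		}
		fmt.Printf("key: mykey, value: %s\n", restored["mykey"])
		return nil
	})
	if err != nil {
		fmt.Println("cache example failed:", err)
	}
}
```

Rewriting the entire file on every save is acceptable for a small map but scales poorly, which is one reason to prefer an embedded key-value store such as BoltDB for larger or frequently updated caches.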