Working with the CSV format
CSV is a common format for exchanging tabular data. It's routine, for example, to import a CSV file into Excel or to export one from it. Go's encoding/csv package operates on the io.Reader and io.Writer interfaces, and as a result, it's easy to write data to a buffer, stdout, a file, or a socket. The examples in this section show some common ways to get data into and out of the CSV format.
Getting ready
Refer to the Getting ready section's steps in the Using the common I/O interfaces recipe.
How to do it...
These steps cover writing and running your application:
- From your terminal/console application, create a new directory called chapter1/csvformat.
- Navigate to this directory.
- Copy tests from https://github.com/agtorre/go-cookbook/tree/master/chapter1/csvformat, or use this as an exercise to write some of your own code!
- Create a file called read_csv.go with the following contents:
package csvformat

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
)

// Movie will hold our parsed CSV
type Movie struct {
	Title    string
	Director string
	Year     int
}

// ReadCSV shows some examples of processing CSV
// that is passed in as an io.Reader
func ReadCSV(b io.Reader) ([]Movie, error) {
	r := csv.NewReader(b)

	// These are some optional configuration options
	r.Comma = ';'
	r.Comment = '-'

	var movies []Movie

	// grab and ignore the header for now
	// we may also wanna use this for a dictionary key or
	// some other form of lookup
	_, err := r.Read()
	if err != nil && err != io.EOF {
		return nil, err
	}

	// loop until it's all processed
	for {
		record, err := r.Read()
		if err == io.EOF {
			break
		} else if err != nil {
			return nil, err
		}

		year, err := strconv.ParseInt(record[2], 10, 64)
		if err != nil {
			return nil, err
		}

		m := Movie{record[0], record[1], int(year)}
		movies = append(movies, m)
	}
	return movies, nil
}

// AddMoviesFromText uses the CSV parser with a string
func AddMoviesFromText() error {
	// this is an example of us taking a string, converting
	// it into a buffer, and reading it
	// with the csv package
	in := `
- first our headers
movie title;director;year released

- then some data
Guardians of the Galaxy Vol. 2;James Gunn;2017
Star Wars: Episode VIII;Rian Johnson;2017
`

	b := bytes.NewBufferString(in)
	m, err := ReadCSV(b)
	if err != nil {
		return err
	}
	fmt.Printf("%#v\n", m)
	return nil
}
- Create a file called write_csv.go with the following contents:
package csvformat

import (
	"bytes"
	"encoding/csv"
	"io"
	"os"
)

// A Book has an Author and Title
type Book struct {
	Author string
	Title  string
}

// Books is a named type for a slice of books
type Books []Book

// ToCSV takes a set of Books and writes to an io.Writer
// it returns any errors
func (books *Books) ToCSV(w io.Writer) error {
	n := csv.NewWriter(w)
	err := n.Write([]string{"Author", "Title"})
	if err != nil {
		return err
	}
	for _, book := range *books {
		err := n.Write([]string{book.Author, book.Title})
		if err != nil {
			return err
		}
	}

	n.Flush()
	return n.Error()
}

// WriteCSVOutput initializes a set of books
// and writes them to os.Stdout
func WriteCSVOutput() error {
	b := Books{
		Book{
			Author: "F Scott Fitzgerald",
			Title:  "The Great Gatsby",
		},
		Book{
			Author: "J D Salinger",
			Title:  "The Catcher in the Rye",
		},
	}

	return b.ToCSV(os.Stdout)
}

// WriteCSVBuffer returns a buffer csv for
// a set of books
func WriteCSVBuffer() (*bytes.Buffer, error) {
	b := Books{
		Book{
			Author: "F Scott Fitzgerald",
			Title:  "The Great Gatsby",
		},
		Book{
			Author: "J D Salinger",
			Title:  "The Catcher in the Rye",
		},
	}

	w := &bytes.Buffer{}
	err := b.ToCSV(w)
	return w, err
}
- Create a new directory named example.
- Navigate to example.
- Create a main.go file with the following contents and ensure that you modify the csvformat import to use the path you set up in step 2:
package main

import (
	"fmt"

	"github.com/agtorre/go-cookbook/chapter1/csvformat"
)

func main() {
	if err := csvformat.AddMoviesFromText(); err != nil {
		panic(err)
	}

	if err := csvformat.WriteCSVOutput(); err != nil {
		panic(err)
	}

	buffer, err := csvformat.WriteCSVBuffer()
	if err != nil {
		panic(err)
	}

	fmt.Println("Buffer = ", buffer.String())
}
- Run go run main.go.
- You may also run these:
go build
./example
You should see the following output:
$ go run main.go
[]csvformat.Movie{csvformat.Movie{Title:"Guardians of the Galaxy Vol. 2", Director:"James Gunn", Year:2017}, csvformat.Movie{Title:"Star Wars: Episode VIII", Director:"Rian Johnson", Year:2017}}
Author,Title
F Scott Fitzgerald,The Great Gatsby
J D Salinger,The Catcher in the Rye
Buffer =  Author,Title
F Scott Fitzgerald,The Great Gatsby
J D Salinger,The Catcher in the Rye
- If you copied or wrote your own tests, go up one directory and run go test, and ensure all tests pass.
How it works...
To explore reading the CSV format, we first represent our data as a struct. Representing data as a struct is very useful in Go, as it makes things such as marshaling and encoding relatively simple. Our read example uses movies as its data type. The ReadCSV function takes an io.Reader that holds our CSV data as input. This could be a file or a buffer. We then use that data to create and populate Movie structs, including converting the year into an integer. We also set options on the CSV parser to use ; as the separator and to treat lines that begin with - as comments.
Next, we explore the same idea, but in reverse. Books are represented with an author and a title. We initialize a slice of books and then write them in the CSV format to an io.Writer. Once again, this can be a file, stdout, or a buffer.
The encoding/csv package is an excellent example of why you'd want to think of data flows in Go as implementing common interfaces. It's easy to change the source and destination of our data with small one-line tweaks, and we can manipulate CSV data without using an excessive amount of memory or time. For example, it would be possible to read from a stream of data one record at a time and write to a separate stream in a modified format one record at a time. Because only the current record needs to be held in memory, doing this would not incur significant memory or processor usage.
Later, when we explore data pipelines and worker pools, you'll see how these ideas can be combined and how to handle these streams in parallel.