Using the goquery package for web scraping
The goquery package is not part of the standard library, but is available on GitHub. It is intended to work similarly to jQuery, a popular JavaScript library for interacting with the HTML DOM. As demonstrated in the previous sections, searching with string matching and regular expressions is both tedious and complicated. The goquery package makes it much easier to work with HTML content and search for specific elements. I suggest this package because it is modelled after jQuery, which many people are already familiar with.
You can get the goquery package with the go get command:
go get github.com/PuerkitoBio/goquery
The documentation is available at https://godoc.org/github.com/PuerkitoBio/goquery.
Listing all hyperlinks in a page
To introduce the goquery package, we'll look at a common and simple task: finding all hyperlinks in a page and printing them out. A typical link looks something...