Storing data in Elasticsearch as the result of a scraping request
In this recipe, we extend our API to save the data received from the scraper into Elasticsearch. In the next recipe, we will use the content in Elasticsearch as a cache so that we do not repeat the scraping process for job listings that have already been scraped. That way, we play nice with Stack Overflow's servers.
Getting ready
Make sure you have Elasticsearch running locally, as the code will access Elasticsearch at localhost:9200. There is a good quick-start available at https://www.elastic.co/guide/en/elasticsearch/reference/current/_installation.html, or, if you'd like to run it in Docker, you can check out the Docker Elasticsearch recipe in Chapter 10, Creating Scraper Microservices with Docker.
Once installed, you can verify the installation with the following curl command:
curl "127.0.0.1:9200?pretty"
If installed properly, you will get output similar to the following:
{ "name": "KHhxNlz...
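The same check can be made from Python, which is handy if you want the API to confirm that Elasticsearch is reachable before it tries to store anything. The following is a minimal sketch, not part of the recipe's code: the function name check_elasticsearch is hypothetical, it uses only the standard library, and it assumes Elasticsearch answers plain HTTP without authentication at localhost:9200.

```python
import json
import urllib.error
import urllib.request


def check_elasticsearch(base_url="http://localhost:9200", timeout=5):
    """Return the info document from the root endpoint, or None if unreachable.

    Hypothetical helper: base_url and timeout are assumptions for a local,
    unauthenticated Elasticsearch instance.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON reply:
        # treat all of these as "Elasticsearch is not ready".
        return None


if __name__ == "__main__":
    info = check_elasticsearch()
    if info is None:
        print("Elasticsearch is not reachable at localhost:9200")
    else:
        # "name" is the same field shown in the curl output.
        print("Elasticsearch is up; node name:", info.get("name"))
```

Returning None instead of raising keeps the caller simple: the API can decide whether to skip storage, retry, or report an error.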