Checking Elasticsearch for a listing before scraping
Now let's use Elasticsearch as a cache: before scraping, we check whether we have already stored a job listing, so that we do not need to hit StackOverflow again. We extend the API for scraping a job listing to first search Elasticsearch and, if the listing is found there, return that data. This makes Elasticsearch a cache of job listings.
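The check-the-cache-first flow can be sketched without a running Elasticsearch instance. The following is a minimal illustration, not the recipe's code: `InMemoryStore`, `get_listing`, and `fake_scrape` are hypothetical names, the store is a plain dictionary standing in for the index, and the behavior on a cache miss (scrape, then store the result) is an assumption about how such a cache would typically be filled.

```python
class InMemoryStore:
    """Hypothetical stand-in for the Elasticsearch index (not the real client API)."""

    def __init__(self):
        self._docs = {}

    def exists(self, doc_id):
        return doc_id in self._docs

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body):
        self._docs[doc_id] = body


def get_listing(store, job_listing_id, scrape):
    """Return the cached listing if present; otherwise scrape and cache it."""
    if store.exists(job_listing_id):
        # Cache hit: no request to the remote site is needed.
        return store.get(job_listing_id)
    # Cache miss (assumed behavior): scrape once, then remember the result.
    listing = scrape(job_listing_id)
    store.put(job_listing_id, listing)
    return listing


# Record every scrape so we can see how often the "remote site" is hit.
scrape_calls = []


def fake_scrape(job_listing_id):
    """Hypothetical scraper that returns a placeholder document."""
    scrape_calls.append(job_listing_id)
    return {"ID": job_listing_id, "Title": "placeholder"}


store = InMemoryStore()
first = get_listing(store, "122517", fake_scrape)
second = get_listing(store, "122517", fake_scrape)
```

In this sketch the second call is served from the store, so `fake_scrape` runs only once for the same ID. That is the saving the recipe is after when it checks Elasticsearch before scraping.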
How to do it
We proceed with the recipe as follows:
The code for this recipe is within 09/05/api.py. The JobListing class now has the following implementation:
class JobListing(Resource):
    def get(self, job_listing_id):
        print("Request for job listing with id: " + job_listing_id)
        es = Elasticsearch()
        if (es.exists(index='joblistings', doc_type='job-listing', id=job_listing_id)):
            print('Found the document in ElasticSearch')
            doc = es.get(index='joblistings', doc_type='job-listing', id=job_listing_id)
            return doc...