Scraping after logging into websites using Scrapy
There are situations where we must log in to a website before we can access the data we plan to extract. Scrapy handles login forms and cookies easily: we can make use of Scrapy's FormRequest object, which fills in the login form and attempts to log in with the credentials provided.
Getting ready
When we visit a website that requires authentication, we need a username and password. Scrapy needs the same credentials to log in, so we first need an account on the website we plan to scrape.
How to do it...
Here is how we can use Scrapy to crawl websites that require logging in:
- To use the FormRequest object, we can update the parse method as follows:
```python
def parse(self, response):
    return scrapy.FormRequest.from_response(
        response,
        formdata={'username': 'username', 'password': 'password'},
        callback=self.parse_after_login
    )
```
Here, the response object is the HTTP response of the page where we...