Handling forms and form-based authorization
We are often required to log into a site before we can crawl its content. This is usually done through a form where we enter a user name and password, press Enter, and are then granted access to previously hidden content. This type of form authentication is often called cookie authorization because, when we log in, the server creates a cookie that it uses on later requests to verify that we have signed in. Scrapy respects these cookies, so all we need to do is automate submitting the form during our crawl.
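The cookie round trip described above can be sketched with the Python standard library alone. This is not Scrapy code and not the site used in this recipe: the toy server, the /login and /secured paths, the username/password field names, the credentials, and the session token are all hypothetical, chosen only to show the server issuing a cookie when the form is posted and the client sending that cookie back on its next request.

```python
import http.server
import threading
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# Hypothetical credentials and session token, used only by this demo.
USERNAME, PASSWORD = "alice", "s3cret"
SESSION_TOKEN = "abc123"


class LoginHandler(http.server.BaseHTTPRequestHandler):
    """Toy server: POST /login sets a cookie, GET /secured requires it."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode())
        if (self.path == "/login"
                and form.get("username") == [USERNAME]
                and form.get("password") == [PASSWORD]):
            self.send_response(200)
            # The server creates the cookie that later proves we signed in.
            self.send_header("Set-Cookie", f"session={SESSION_TOKEN}; Path=/")
            self.end_headers()
            self.wfile.write(b"logged in")
        else:
            self.send_response(401)
            self.end_headers()

    def do_GET(self):
        # Protected content is only returned when the session cookie is sent.
        if self.path == "/secured" and f"session={SESSION_TOKEN}" in self.headers.get("Cookie", ""):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"secret content")
        else:
            self.send_response(403)
            self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_secured(base_url):
    """Submit the login form, then reuse the stored cookie to fetch /secured."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    form = urllib.parse.urlencode({"username": USERNAME, "password": PASSWORD}).encode()
    # Passing data makes this a form POST; the cookie processor stores Set-Cookie.
    with opener.open(base_url + "/login", data=form) as resp:
        resp.read()
    # The same opener automatically attaches the stored cookie here.
    with opener.open(base_url + "/secured") as resp:
        return resp.read().decode()


def main():
    # Port 0 lets the OS pick a free port for the toy server.
    server = http.server.HTTPServer(("127.0.0.1", 0), LoginHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_secured(f"http://127.0.0.1:{server.server_port}")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(main())
```

In Scrapy the cookie-jar half of this is handled for us by the cookies middleware, which is enabled by default, so only the form submission needs to be automated.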
Getting ready
We will crawl a page on the container's website at the following URL: http://localhost:5001/home/secured. This page, and the pages it links to, contain content we would like to scrape. However, the page is protected by a login. When we open the page in a browser, we are presented with the following login form, where we can enter darkhelmet as the user name and vespa as the password:

Username and password credentials are entered
Upon pressing Enter we...