Visual scraping with Portia
Portia is an open-source tool built on top of Scrapy that supports building a spider by clicking on the parts of a website that need to be scraped. This method can be more convenient than creating the CSS or XPath selectors manually.
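For comparison, here is a minimal sketch of the manual approach that Portia aims to replace. It is not Portia or Scrapy code: it uses only the limited XPath subset in Python's standard library, and the page fragment, the element IDs, the class names, and the extract_fields helper are all hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """
<html>
  <body>
    <table>
      <tr id="row_name"><td class="label">Name</td><td class="value">Example Country</td></tr>
      <tr id="row_area"><td class="label">Area</td><td class="value">1,234 square kilometres</td></tr>
    </table>
  </body>
</html>
"""


def extract_fields(page):
    """Return a {label: value} dict built from hand-written XPath selectors."""
    root = ET.fromstring(page.strip())
    fields = {}
    # Each selector below had to be worked out by inspecting the markup by hand.
    for row in root.iterfind(".//tr"):
        label = row.find("td[@class='label']")
        value = row.find("td[@class='value']")
        if label is not None and value is not None:
            fields[label.text] = value.text
    return fields


fields = extract_fields(PAGE)
print(fields)
```

Every selector here depends on the exact structure of the page, so a change in the markup means revisiting the code. A visual tool lets you point at the same elements in a browser instead.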
Installation
Portia is a powerful tool, and it depends on multiple external libraries for its functionality. It is also relatively new, so the installation steps are currently somewhat involved. In case the installation is simplified in the future, the latest documentation can be found at https://github.com/scrapinghub/portia#running-portia. The current recommended way to run Portia is to use Docker (the open-source container framework). If you don't have Docker installed, you'll need to install it first by following the latest instructions (https://docs.docker.com/engine/installation/).
Once Docker is installed and running, you can pull the scrapinghub image and get started. First, you should be in the directory you'd like to create your...