Controlling the length of a crawl
The length of a crawl, in terms of the number of pages parsed, can be controlled with the CLOSESPIDER_PAGECOUNT setting. This setting is handled by Scrapy's CloseSpider extension, which closes the spider once the given number of responses has been processed.
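As an aside, the same limit can also be applied per spider through the custom_settings class attribute instead of the CrawlerProcess settings dictionary. A minimal sketch (the spider name here is hypothetical, used only to illustrate the attribute):

import scrapy

class LimitedSpider(scrapy.Spider):
    # Hypothetical spider; custom_settings overrides the project-wide
    # configuration for this spider only.
    name = 'limited'
    custom_settings = {
        'CLOSESPIDER_PAGECOUNT': 5  # stop after roughly 5 pages are parsed
    }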
How to do it
We will be using the script in 06/07_limit_length.py. The script and scraper are the same as the NASA sitemap crawler with the addition of the following configuration to limit the number of pages parsed to 5:
if __name__ == "__main__":
    process = CrawlerProcess({
        'LOG_LEVEL': 'INFO',
        'CLOSESPIDER_PAGECOUNT': 5
    })
    process.crawl(Spider)
    process.start()

When this is run, the following output will be generated (interspersed in the logging output):
<200 https://www.nasa.gov/exploration/systems/sls/multimedia/sls-hardware-being-moved-on-kamag-transporter.html>
<200 https://www.nasa.gov/exploration/systems/sls/M17-057.html>
<200 https://www.nasa.gov/press-release/nasa-awards-contract-for-center-protective-services-for-glenn-research-center/>
<200 https://www.nasa.gov/centers...
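The full script lives in 06/07_limit_length.py and is not reproduced here, but a rough, self-contained sketch of the same pattern might look like the following; the spider body and the sitemap URL are illustrative assumptions, not the book's actual scraper:

from scrapy.spiders import SitemapSpider
from scrapy.crawler import CrawlerProcess

class Spider(SitemapSpider):
    # Illustrative stand-in for the NASA sitemap crawler; the sitemap URL
    # below is an assumption for the sake of a runnable example.
    name = 'nasa_sitemap'
    sitemap_urls = ['https://www.nasa.gov/sitemap.xml']

    def parse(self, response):
        # Each response handled here counts toward CLOSESPIDER_PAGECOUNT;
        # once 5 responses have been crawled, CloseSpider shuts the spider down.
        print(response)

if __name__ == "__main__":
    process = CrawlerProcess({
        'LOG_LEVEL': 'INFO',
        'CLOSESPIDER_PAGECOUNT': 5
    })
    process.crawl(Spider)
    process.start()

Note that CLOSESPIDER_PAGECOUNT is a soft limit: the shutdown is graceful, so requests already in flight when the threshold is reached may still be downloaded and parsed, and the output can show slightly more than 5 responses.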