You're reading from Python Web Scraping Cookbook Over 90 proven recipes to get you scraping with Python, microservices, Docker, and AWS

Product type Paperback

Published in Feb 2018

Publisher Packt

ISBN-13 9781787285217

Length 364 pages

Edition 1st Edition

Languages

Python

Tools

AWS

Concepts

Data Mining

Author (1):

Michael Heydt

Table of Contents (18) Chapters

Title Page

Contributors

Packt Upsell

Preface

1. Getting Started with Scraping

2. Data Acquisition and Extraction

3. Processing Data

4. Working with Images, Audio, and other Assets

5. Scraping - Code of Conduct

6. Scraping Challenges and Solutions

7. Text Wrangling and Analysis

8. Searching, Mining and Visualizing Data

9. Creating a Simple Data API

10. Creating Scraper Microservices with Docker

11. Making the Scraper as a Service Real

1. Other Books You May Enjoy

Leave a review - let other readers know what you think

Index

How to build robust ETL pipelines with AWS SQS

Scraping a large number of sites and a large volume of data can be a complicated and slow process. It is, however, one that can take great advantage of parallel processing, either locally with multiple processor threads or by distributing scraping requests to remote scrapers using a message queue system. A process may also need multiple steps, similar to an Extract, Transform, and Load (ETL) pipeline. Such pipelines can also be built easily using a message queuing architecture in conjunction with the scraping.
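As a rough illustration of the local, multi-threaded option mentioned above, the following minimal sketch fans a list of URLs out over a thread pool from the standard library. The names `scrape_all` and `fetch` are hypothetical and do not come from the book; `fetch` stands for whatever function downloads and parses a single page.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_all(urls, fetch, max_workers=4):
    """Run fetch(url) for every URL on a pool of worker threads.

    `fetch` is a hypothetical callable supplied by the caller. It is
    assumed to download and parse one page and return the result.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `urls`,
        # even though the calls themselves run concurrently.
        results = list(pool.map(fetch, urls))
    # Pair each URL with the result of scraping it.
    return dict(zip(urls, results))
```

Threads suit this kind of work because scraping is mostly waiting on the network. Once the workload outgrows a single machine, the message queue approach described here becomes the more natural fit.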

Using a message queuing architecture gives our pipeline two advantages:

Robustness
Scalability

Processing becomes robust because, if the processing of an individual message fails, the message can be re-queued and processed again. So if a scraper fails, either we can restart it without losing the request to scrape the page, or the message queue system will deliver the request to another scraper.
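The re-queuing behavior can be sketched with SQS as follows. This is a minimal sketch under stated assumptions, not the book's own code: `process_one` and `handler` are hypothetical names, and `sqs` is assumed to be a client created with `boto3.client("sqs")`. The key idea is that a message is deleted only after it has been handled successfully. A message that is received but never deleted becomes visible on the queue again once its visibility timeout expires.

```python
def process_one(sqs, queue_url, handler, wait_seconds=10):
    """Receive at most one message; delete it only if `handler` succeeds.

    `sqs` is assumed to be a boto3 SQS client, and `handler` a hypothetical
    callable that scrapes the page named in the message body.
    Returns True if a message was handled, otherwise False.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=wait_seconds,  # long polling; SQS allows up to 20
    )
    messages = response.get("Messages", [])
    if not messages:
        return False  # nothing to do right now

    message = messages[0]
    try:
        handler(message["Body"])
    except Exception:
        # The message is deliberately NOT deleted. SQS makes it visible
        # again after the visibility timeout, so this scraper or another
        # one can retry it.
        return False

    # Success: remove the message so it is not delivered again.
    sqs.delete_message(
        QueueUrl=queue_url,
        ReceiptHandle=message["ReceiptHandle"],
    )
    return True
```

A scraper would typically call such a function in a loop. Because the client is passed in as a parameter, the function can also be exercised against a stub object in tests, without touching AWS.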

It provides scalability, as multiple scrapers on the...
