Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Arrow up icon
GO TO TOP
Hands-On Web Scraping with Python

You're reading from   Hands-On Web Scraping with Python Perform advanced scraping operations using various Python libraries and tools such as Selenium, Regex, and others

Arrow left icon
Product type Paperback
Published in Jul 2019
Publisher Packt
ISBN-13 9781789533392
Length 350 pages
Edition 1st Edition
Languages
Arrow right icon
Author (1):
Arrow left icon
 Chapagain Chapagain
Author Profile Icon Chapagain
Chapagain
Arrow right icon
View More author details
Toc

Table of Contents (17) Chapters Close

Title Page
Copyright and Credits Dedication
About Packt Contributors Preface 1. Web Scraping Fundamentals FREE CHAPTER 2. Python and the Web – Using urllib and Requests 3. Using LXML, XPath, and CSS Selectors 4. Scraping Using pyquery – a Python Library 5. Web Scraping Using Scrapy and Beautiful Soup 6. Working with Secure Web 7. Data Extraction Using Web-Based APIs 8. Using Selenium to Scrape the Web 9. Using Regex to Extract Data 10. Next Steps 1. Other Books You May Enjoy

Introduction to web scraping

Scraping is the process of extracting, copying, screening, or collecting data. Scraping or extracting data from the web (commonly known as websites or web pages, or internet-related resources) is normally termed web scraping.

Web scraping is a process of data extraction from the web that is suitable for certain requirements. Data collection and analysis, and its involvement in information and decision making, plus research-related activities, make the scraping process sensitive for all types of industry.

The popularity of the internet and its resources is causing information domains to evolve every day, which is also causing a growing demand for raw data. Data is the basic requirement in the fields of science, technology, and management. Collected or organized data is processed with varying degrees of logic to obtain information and gain further insights.

Web scraping provides the tools and techniques used to collect data from websites as appropriate for either personal or business-related needs, but with a number of legal considerations. 

There are a number of legal factors to consider before performing scraping tasks. Most websites contain pages such as Privacy Policy, About Us, and Terms and Conditions, where legal terms, prohibited content policies, and general information are available. It's a developer's ethical duty to follow those policies before planning any crawling and scraping activities from websites.

Scraping and crawling are both used quite interchangeably throughout the chapters in this book. Crawling, also known as spidering, is a process used to browse through the links on websites and is often used by search engines for indexing purposes, whereas scraping is mostly related to content extraction from websites. 
You have been reading a chapter from
Hands-On Web Scraping with Python
Published in: Jul 2019
Publisher: Packt
ISBN-13: 9781789533392
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime
Visually different images