Beautiful Soup
Recipe Difficulty: Medium
Python Version: 3.5
Operating System: Any
In this recipe, we create a website preservation tool that leverages the Beautiful Soup library. Beautiful Soup parses markup languages such as HTML and XML and makes it easy to navigate and search the resulting document tree. We will use it to identify and extract all links from a web page in a few lines of code. The script is a deliberately simple example of website preservation; it is by no means intended to replace existing software on the market.
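To give a feel for the link extraction described above, here is a minimal sketch rather than the recipe's actual script. It assumes bs4 is already installed and uses Python's built-in html.parser backend; the extract_links helper, the sample HTML, and the example.com URLs are all hypothetical names chosen for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag that carries an href.

    Hypothetical helper for illustration; not part of the recipe's script.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips anchors that have no href attribute at all
    for anchor in soup.find_all("a", href=True):
        # Resolve relative links against the page they were found on
        links.append(urljoin(base_url, anchor["href"]))
    return links


# Made-up page content standing in for a downloaded index page
sample_html = """
<html><body>
<a href="/about.html">About</a>
<a href="https://example.org/contact">Contact</a>
<a name="no-href">Ignored</a>
</body></html>
"""

found = extract_links(sample_html, "https://example.com/index.html")
print(found)
```

Resolving each href against the page URL means relative and absolute links come out in the same form, which is convenient if the links are later fetched one by one.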
Getting started
This recipe requires the installation of the third-party library bs4. This module can be installed via the following command. All other libraries used in this script are present in Python's standard library.
pip install bs4==0.0.1
Note
To learn more about the bs4 library, visit https://www.crummy.com/software/BeautifulSoup/bs4/doc/.
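Before moving on, it may be worth confirming that the library can be imported. The following is a small sanity check, not part of the recipe's script; it assumes the installation step has already been run in the active Python environment.

```python
import bs4
from bs4 import BeautifulSoup

# Report which Beautiful Soup release was actually installed
installed_version = bs4.__version__
print("Beautiful Soup version:", installed_version)

# Parse a trivial document with the standard library's parser backend
soup = BeautifulSoup("<p>ready</p>", "html.parser")
paragraph_text = soup.p.get_text()
print(paragraph_text)
```

If the import fails with ModuleNotFoundError, the package was most likely installed into a different interpreter than the one running the script.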
How to do it...
We will perform the following steps in this recipe:
- Access index web page...