
You're reading from Modern Python Standard Library Cookbook: Over 100 recipes to fully leverage the features of the standard library in Python

Product type: Paperback
Published in: Aug 2018
Publisher: Packt
ISBN-13: 9781788830829
Length: 366 pages
Edition: 1st Edition
Languages: Python
Concepts: Programming Language
Author: Molina


Reading XML/HTML content

Reading HTML or XML files allows us to parse web pages' content and to read documents or configurations described in XML.

Python has a built-in XML parser, the ElementTree module, which is perfect for parsing XML files; when HTML is involved, however, it quickly chokes on the various quirks of HTML.
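As a quick illustration of ElementTree on well-formed XML, the sketch below parses a small configuration document and reads an attribute and a text value. The `CONFIG` content is a made-up example, not taken from the recipe.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration document, used only for illustration.
CONFIG = """<config>
    <database host="localhost" port="5432"/>
    <debug>true</debug>
</config>"""

root = ET.fromstring(CONFIG)

# Attributes of an element are read with .get().
database = root.find("database")
host, port = database.get("host"), database.get("port")

# findtext() returns the text of the first matching child element.
debug = root.findtext("debug")

print(host, port, debug)  # localhost 5432 true
```

Note that attribute values always come back as strings, so `port` would need an explicit `int()` conversion before being used as a number.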

Consider trying to parse the following HTML:

<html>
    <body class="main-body">
        <p>hi</p>
        <img><br>
        <input type="text" />
    </body>
</html>

You will quickly face errors:

xml.etree.ElementTree.ParseError: mismatched tag: line 7, column 6
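The failure can be reproduced in a few lines. The `<img>` and `<br>` tags are never closed, so they are still open when expat reaches `</body>` and it reports a mismatched tag. The exact line and column in the message depend on how the markup is laid out in the string (for example, whether it starts with a newline), so the sketch below only prints the error rather than assuming a specific position.

```python
import xml.etree.ElementTree as ET

HTML = """<html>
    <body class="main-body">
        <p>hi</p>
        <img><br>
        <input type="text" />
    </body>
</html>"""

try:
    ET.fromstring(HTML)
except ET.ParseError as exc:
    # exc.position holds the (line, column) where expat gave up.
    # The name `exc` is unbound after the except block, so keep a reference.
    error = exc
    print("ParseError:", exc)
else:
    error = None
```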

Luckily, it's not too hard to adapt the parser to handle at least the most common HTML quirks, such as self-closing/void tags.
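A minimal sketch of such an adaptation follows. It assumes we swap expat for the standard library's `html.parser.HTMLParser` as the tokenizer; this is an illustration and not necessarily the approach the recipe itself takes. HTMLParser events are forwarded to ElementTree's `TreeBuilder`, which builds elements from start/data/end calls, and void elements such as `<img>` and `<br>` are closed as soon as they are opened. `HTMLTreeParser` and `VOID_ELEMENTS` are hypothetical names; the void-element set is based on the HTML Living Standard's list but should be checked against the specification before relying on it.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML void elements: they never have content or a closing tag.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
})


class HTMLTreeParser(HTMLParser):
    """Hypothetical helper: tokenize HTML with html.parser and build an
    ElementTree with TreeBuilder, closing void elements automatically."""

    def __init__(self):
        super().__init__()
        self._builder = ET.TreeBuilder()

    @staticmethod
    def _attrs(attrs):
        # html.parser yields (name, value) pairs; value is None for
        # attributes without a value, so replace it with "" to keep
        # the tree serializable with ET.tostring().
        return {name: value or "" for name, value in attrs}

    def handle_starttag(self, tag, attrs):
        self._builder.start(tag, self._attrs(attrs))
        if tag in VOID_ELEMENTS:
            # <img>, <br>, ... never get a closing tag, so close them now.
            self._builder.end(tag)

    def handle_startendtag(self, tag, attrs):
        # Explicitly self-closed tags such as <input ... />.
        self._builder.start(tag, self._attrs(attrs))
        self._builder.end(tag)

    def handle_endtag(self, tag):
        # Skip stray end tags for void elements (for example </br>).
        if tag not in VOID_ELEMENTS:
            self._builder.end(tag)

    def handle_data(self, data):
        self._builder.data(data)

    def close(self):
        # Flush any buffered input, then return the root Element.
        super().close()
        return self._builder.close()


HTML = """<html>
    <body class="main-body">
        <p>hi</p>
        <img><br>
        <input type="text" />
    </body>
</html>"""

parser = HTMLTreeParser()
parser.feed(HTML)
root = parser.close()

body = root.find("body")
print(body.get("class"))              # main-body
print([child.tag for child in body])  # ['p', 'img', 'br', 'input']
```

Because the result is a regular ElementTree `Element`, the usual `find()` and `iter()` calls keep working. The sketch only handles void elements; other HTML recovery rules, such as implicitly closing `<p>` or `<li>`, are not implemented, and unbalanced non-void tags are not repaired.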

How to do it...

You need to perform the following steps for this recipe:

By default, ElementTree uses expat (through xml.etree.ElementTree.XMLParser) to parse documents, and then relies on xml.etree.ElementTree.TreeBuilder to build the document's element tree.
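This split is visible in the public API: `XMLParser` accepts a `target` object and forwards every start tag, text chunk, and end tag to it. The sketch below uses a hypothetical `RecordingTarget` that logs those calls and delegates them to a real `TreeBuilder`, so the parser still returns an ordinary element tree.

```python
import xml.etree.ElementTree as ET


class RecordingTarget:
    """Hypothetical parser target: records events, delegates to TreeBuilder."""

    def __init__(self):
        self.builder = ET.TreeBuilder()
        self.events = []

    def start(self, tag, attrs):
        self.events.append(("start", tag))
        return self.builder.start(tag, attrs)

    def end(self, tag):
        self.events.append(("end", tag))
        return self.builder.end(tag)

    def data(self, data):
        self.builder.data(data)

    def close(self):
        # XMLParser.close() returns whatever the target's close() returns.
        return self.builder.close()


target = RecordingTarget()
parser = ET.XMLParser(target=target)  # expat does the tokenizing
parser.feed("<root><item>one</item><item>two</item></root>")
root = parser.close()

print(target.events)
print([item.text for item in root.iter("item")])  # ['one', 'two']
```

Any object exposing `start`, `end`, `data` and `close` can serve as the target, which is what makes it possible to keep `TreeBuilder` while changing the component that feeds it.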

We can replace XMLParser based...
