Reading complex formats using regular expressions
There are many file formats that lack the elegant regularity of a CSV file. One common file format that's rather difficult to parse is a web server log file. These files tend to have complex data without a single separator character or consistent quoting rules.
When we looked at a simplified log file in the Writing generator functions with the yield statement recipe in Chapter 8, Functional And Reactive Programming Features, we saw that the rows look as follows:
[2016-05-08 11:08:18,651] INFO in ch09_r09: Sample Message One
[2016-05-08 11:08:18,651] DEBUG in ch09_r09: Debugging
[2016-05-08 11:08:18,652] WARNING in ch09_r09: Something might have gone wrong
There are a variety of punctuation marks used in this file. The csv module can't handle this complexity.
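As a quick illustration of the problem, the following sketch feeds one of the sample rows to csv.reader, first with the default comma delimiter and then with a space delimiter. The variable names are chosen here for illustration only. Neither attempt yields the timestamp, level, module, and message as separate fields.

```python
import csv
import io

sample = "[2016-05-08 11:08:18,651] INFO in ch09_r09: Sample Message One\n"

# Default delimiter (comma): the only comma sits inside the timestamp,
# so the row is cut in the middle of the time value.
by_comma = list(csv.reader(io.StringIO(sample)))

# Space delimiter: the timestamp is split in two and the message
# is broken into individual words.
by_space = list(csv.reader(io.StringIO(sample), delimiter=" "))

print(by_comma[0])
print(by_space[0])
```

The comma split produces two fields and the space split produces eight, and in both cases the brackets and the colon remain attached to the data.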
How can we process this kind of data with the elegant simplicity of a CSV file? Can we transform these irregular rows to a more regular data structure?
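One possible answer, sketched here under assumptions, is to describe a row with a regular expression that uses named groups, so that each match becomes a dictionary much like a row from csv.DictReader. The pattern, the group names (date, level, module, message), and the parse_row function are illustrative guesses based only on the three sample rows above, not a definitive description of the log format.

```python
import re

# Hypothetical pattern inferred from the three sample rows only.
LOG_PATTERN = re.compile(
    r"\[(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]"  # bracketed timestamp
    r"\s+(?P<level>\w+)"        # severity, such as INFO or DEBUG
    r"\s+in\s+(?P<module>\w+):" # module name followed by a colon
    r"\s+(?P<message>.*)"       # the rest of the line
)

def parse_row(line):
    """Return a dict of named fields, or None if the line doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

row = parse_row(
    "[2016-05-08 11:08:18,652] WARNING in ch09_r09: Something might have gone wrong"
)
print(row)
```

Returning None for a non-matching line is a design choice made for this sketch; a real application might prefer to raise an exception or log the rejected line.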
Getting ready
Parsing...