You're reading from Building Machine Learning Systems with Python: Explore machine learning and deep learning techniques for building intelligent systems using scikit-learn and TensorFlow

Product type Paperback

Published in Jul 2018

ISBN-13 9781788623223

Length 406 pages

Edition 3rd Edition

Languages

Python

Tools

Scikit-learn

Concepts

Deep Learning

Authors (3):

Pedro Coelho

Willi Richert

Brucher

Creating our first classifier

Let's start with the simple and beautiful nearest-neighbor method from Chapter 2, Classifying with Real-World Examples. Although it is not as advanced as other methods, it is very powerful: because it is not model-based, it can LEARN nearly any data. This beauty comes with a clear disadvantage, however, which we will find out very soon (and which is why we capitalized LEARN in the previous sentence).

Engineering the features

As mentioned earlier, we will use the Text and Score features to train our classifier. The problem with Text is that classifiers do not work well with raw strings, so we will have to convert it into one or more numbers. What statistics could be useful to extract from a post? Let's start with the number of HTML links, on the assumption that good posts are more likely to contain links.

We can do this with regular expressions. The following captures all HTML link tags that start with http:// (ignoring the other protocols for now):
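A minimal sketch of such a pattern follows; it may differ from the book's own listing. The names LINK_RE and count_links and the sample string are hypothetical and chosen only for illustration. The pattern assumes the href value is enclosed in double quotes.

```python
import re

# Hypothetical pattern (an assumption, not the book's exact listing):
# match an <a ...> tag whose double-quoted href starts with http://
# and capture the link text up to the closing </a>.
LINK_RE = re.compile(
    r'<a\s[^>]*href="http://[^"]*"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def count_links(html):
    """Return the number of http:// link tags found in an HTML string."""
    return len(LINK_RE.findall(html))


# One http:// link and one https:// link; only the first should be counted.
sample = ('See <a href="http://example.com">this page</a> and '
          '<a href="https://example.org">that one</a>.')
print(count_links(sample))
```

The resulting count could then serve as one numeric feature per post. A regular expression is a pragmatic shortcut here; a real HTML parser would be more robust against unusual markup.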