Slimming the classifier
It is always worth looking at the actual contributions of the individual features. For logistic regression, we can directly take the learned coefficients (clf.coef_)
to get an impression of the features' impact:
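What follows is a minimal sketch of how this inspection could look. It assumes that clf is the LogisticRegression instance fitted earlier and that the feature columns are ordered as in feature_names; both the variable names and the column order are assumptions for illustration, not taken from this section.

import numpy as np
import matplotlib.pyplot as plt

# the features discussed in this section; the order is an assumption
feature_names = np.array(['NumTextTokens', 'NumCodeLines', 'LinkCount',
                          'AvgSentLen', 'AvgWordLen', 'NumAllCaps',
                          'NumExclams'])

# clf.coef_ has shape (1, n_features) for a binary problem
coefs = clf.coef_[0]

# sort by coefficient value so the ranking is easy to read off
order = np.argsort(coefs)
plt.barh(feature_names[order], coefs[order])
plt.xlabel('learned coefficient')
plt.tight_layout()
plt.show()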

We see that NumCodeLines, LinkCount, AvgWordLen, and NumTextTokens have the highest positive impact on determining whether a post is a good one. This means that being more verbose will more likely result in a classification as a good answer.
On the other side, NumAllCaps and NumExclams have negative weights. That means that the more an answer is shouting, the less likely it will be received well.
Then we have the AvgSentLen feature, which does not seem to help much in detecting a good answer. We could easily drop that feature and retain the same classification performance. However, just from the magnitude of the coefficients, we cannot immediately derive the feature...
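As a rough sanity check of the claim that AvgSentLen could be dropped, one could compare cross-validated scores with and without that column. X, y, and the column order below are assumptions standing in for the data prepared earlier:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

feature_names = ['NumTextTokens', 'NumCodeLines', 'LinkCount',
                 'AvgSentLen', 'AvgWordLen', 'NumAllCaps', 'NumExclams']

# keep every column except AvgSentLen (column order is an assumption)
keep = [i for i, name in enumerate(feature_names) if name != 'AvgSentLen']

# X and y are assumed to be the feature matrix and labels built earlier
full = cross_val_score(LogisticRegression(), X, y, cv=10)
slim = cross_val_score(LogisticRegression(), X[:, keep], y, cv=10)

print('all features:       %.3f +/- %.3f' % (full.mean(), full.std()))
print('without AvgSentLen: %.3f +/- %.3f' % (slim.mean(), slim.std()))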