You're reading from Hands-On Natural Language Processing with Python: A practical guide to applying deep learning architectures to your NLP applications

Product type Paperback

Published in Jul 2018

Publisher Packt

ISBN-13 9781789139495

Length 312 pages

Edition 1st Edition

Authors (2):

Rajalingappaa Shanmugamani

Rajesh Arumugam

Identifying spam in YouTube video comments using RNNs

As a first example, we will look at the problem of identifying spam in YouTube video comments. The complete Jupyter Notebook for this example is available as Chapter05/02_example.ipynb in this book's code repository. The data consists of comments, each with a binary label specifying whether the comment is genuine or spam. The code that follows loads the comments from several CSV files into a single pandas DataFrame and shuffles the rows:

import pandas as pd

comments_df_list = []
# One CSV file of labeled comments per YouTube video
comments_file = ['data/Youtube01-Psy.csv', 'data/Youtube02-KatyPerry.csv', 'data/Youtube03-LMFAO.csv',
                 'data/Youtube04-Eminem.csv', 'data/Youtube05-Shakira.csv']
for f in comments_file:
    df = pd.read_csv(f, header=0)  # the first row holds the column names
    comments_df_list.append(df)
# Combine the per-video frames, then shuffle the rows
comments_df = pd.concat(comments_df_list)
comments_df = comments_df.sample(frac=1.0)
print(comments_df.shape)
comments_df.head(5)
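
The load, concatenate, and shuffle pattern used above can be tried without the data files. The following is a minimal sketch under stated assumptions, not the notebook's code: the two in-memory CSV strings are made-up stand-ins for the real files, and the column names CONTENT and CLASS are assumptions about the dataset's schema. It also shows two optional variations, a fixed random_state for a reproducible shuffle and a reset index, followed by a quick check of the label balance:

```python
import io

import pandas as pd

# Hypothetical stand-ins for two of the CSV files. The column names
# CONTENT and CLASS (1 = spam, 0 = genuine) are assumptions.
csv_a = "CONTENT,CLASS\nGreat song!,0\nCheck out my channel,1\n"
csv_b = "CONTENT,CLASS\nLove this video,0\nFree gift cards here,1\nSubscribe to me,1\n"

# Read each source into its own DataFrame, as the loop above does.
frames = [pd.read_csv(io.StringIO(text), header=0) for text in (csv_a, csv_b)]

# ignore_index=True gives the combined frame unique row labels 0..n-1
# instead of repeating each file's own 0-based index.
combined = pd.concat(frames, ignore_index=True)

# sample(frac=1.0) returns all rows in random order; random_state makes
# the order repeatable, and reset_index(drop=True) renumbers the rows.
shuffled = combined.sample(frac=1.0, random_state=42).reset_index(drop=True)

print(shuffled.shape)
print(shuffled["CLASS"].value_counts())
```

Checking the label counts right after loading is a cheap way to spot a heavily imbalanced dataset before training a classifier on it.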

The output of comments_df.head(5) shows a sample of the YouTube comments with their various fields: