Introduction to the TensorFlow seq2seq library


We used the raw TensorFlow API for all the implementations in this book for better transparency into the actual functionality of the models and a better learning experience. However, TensorFlow also provides libraries that hide the fine-grained details of these implementations. They allow users to implement sequence-to-sequence models, such as the Neural Machine Translation (NMT) model we saw in Chapter 10, Sequence-to-Sequence Learning – Neural Machine Translation, with fewer lines of code and without worrying about the finer technical details of how they work. Knowledge of these libraries is important, as they provide a much cleaner way of using these models in production code or of researching beyond the existing methods. Therefore, we will go through a quick introduction to using the TensorFlow seq2seq library. This code is available as an exercise in the seq2seq_nmt.ipynb file.
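The code in the rest of this section relies on a handful of hyperparameters, the Dense layer class, and the encoder and decoder embedding matrices, all of which are defined elsewhere in the notebook. The following is a minimal sketch of that setup; the names match the code below, but the specific values (batch size, vocabulary size, and so on) are illustrative assumptions rather than the notebook's actual settings:

import tensorflow as tf
from tensorflow.python.layers.core import Dense

# Illustrative hyperparameters (assumed values, not the notebook's settings)
batch_size = 32                 # sentences per batch
num_units = 128                 # LSTM hidden size
embedding_size = 128            # word embedding dimensionality
vocab_size = 50000              # vocabulary size (assumed shared by source and target)
source_sequence_length = 30     # padded source sentence length
target_sequence_length = 30     # padded target sentence length
learning_rate = 0.001
decoder_type = 'attention'      # or 'basic'

# Embedding matrices looked up for the encoder and decoder inputs
encoder_emb_layer = tf.Variable(
    tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0), name='enc_embeddings')
decoder_emb_layer = tf.Variable(
    tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0), name='dec_embeddings')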

Defining embeddings for the encoder and decoder

We will first define placeholders for the encoder inputs, the decoder inputs, and the decoder labels:

enc_train_inputs = []
dec_train_inputs, dec_train_labels = [], []

# One placeholder of word IDs per time step for the encoder inputs
for ui in range(source_sequence_length):
    enc_train_inputs.append(tf.placeholder(tf.int32, shape=[batch_size], name='enc_train_inputs_%d'%ui))

# One placeholder per time step for the decoder inputs and the decoder labels
for ui in range(target_sequence_length):
    dec_train_inputs.append(tf.placeholder(tf.int32, shape=[batch_size], name='dec_train_inputs_%d'%ui))
    dec_train_labels.append(tf.placeholder(tf.int32, shape=[batch_size], name='dec_train_outputs_%d'%ui))
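Because the inputs are unrolled into one placeholder per time step, a training batch has to be fed column by column. The following is a minimal sketch of how such a feed_dict could be assembled; the src_batch and tgt_batch arrays and the shifted-label convention are illustrative assumptions, not code from the notebook:

import numpy as np

# Hypothetical padded batches of word IDs (dummy zeros for illustration)
src_batch = np.zeros((batch_size, source_sequence_length), dtype=np.int32)
tgt_batch = np.zeros((batch_size, target_sequence_length + 1), dtype=np.int32)

feed_dict = {}
# Each encoder placeholder receives one column (time step) of the source batch
for ui in range(source_sequence_length):
    feed_dict[enc_train_inputs[ui]] = src_batch[:, ui]
# Decoder inputs are the target tokens; labels are the same tokens shifted by one step
for ui in range(target_sequence_length):
    feed_dict[dec_train_inputs[ui]] = tgt_batch[:, ui]
    feed_dict[dec_train_labels[ui]] = tgt_batch[:, ui + 1]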

Next, we will perform embedding lookups for all the encoder and decoder inputs to obtain the word embeddings:

# Look up the embedding of each encoder input and stack the per-step lookups into
# a time-major [source_sequence_length, batch_size, embedding_size] tensor
encoder_emb_inp = [tf.nn.embedding_lookup(encoder_emb_layer, src) for src in enc_train_inputs]
encoder_emb_inp = tf.stack(encoder_emb_inp)

# Same for the decoder inputs: [target_sequence_length, batch_size, embedding_size]
decoder_emb_inp = [tf.nn.embedding_lookup(decoder_emb_layer, src) for src in dec_train_inputs]
decoder_emb_inp = tf.stack(decoder_emb_inp)

Defining the encoder

The encoder is built with an LSTM cell as its basic building block. We then define dynamic_rnn, which takes the defined LSTM cell as input and whose state is initialized with zeros. We set the time_major parameter to True because our data has the time axis as its first axis (that is, axis 0). In other words, our data has the shape [sequence_length, batch_size, embedding_size], where the time-dependent sequence_length dimension comes first. The benefit of dynamic_rnn is its ability to handle dynamically sized inputs: you can use the optional sequence_length argument to specify the length of each sentence in the batch. For example, say you have a batch of shape [3, 30] containing three sentences with lengths [10, 20, 30] (note that short sentences are padded up to 30 with a special token). Passing a tensor with the values [10, 20, 30] as sequence_length zeroes out the LSTM outputs computed beyond each sentence's length. The cell state is not zeroed out; instead, the last cell state computed within the sentence's length is copied forward beyond the end of the sentence, until position 30 is reached:

encoder_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)

# The encoder state starts from zeros
initial_state = encoder_cell.zero_state(batch_size, dtype=tf.float32)

# Every sentence in this example is padded to source_sequence_length,
# so the same length is passed for each batch element
encoder_outputs, encoder_state = tf.nn.dynamic_rnn(
    encoder_cell, encoder_emb_inp, initial_state=initial_state,
    sequence_length=[source_sequence_length for _ in range(batch_size)],
    time_major=True, swap_memory=True)

The swap_memory option allows TensorFlow to swap the tensors produced during the computation between the GPU and the CPU, in case the model is too large to fit entirely in GPU memory.
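In this example, every sentence is padded to the full source_sequence_length, so a constant length is passed for each batch element. If you wanted the zeroing behavior described above to take effect for shorter sentences, you could instead compute per-sentence lengths from the padded ID matrix. The following is a minimal sketch of that idea, assuming a hypothetical pad_token_id and a batch-major ID matrix, neither of which comes from the notebook:

pad_token_id = 0  # hypothetical ID of the padding token

def compute_sequence_lengths(batch_ids, pad_id=pad_token_id):
    # Count the non-padding tokens in each row of a [batch_size, time] ID tensor
    mask = tf.cast(tf.not_equal(batch_ids, pad_id), tf.int32)
    return tf.reduce_sum(mask, axis=1)

# src_ids = tf.placeholder(tf.int32, shape=[batch_size, source_sequence_length])
# src_lengths = compute_sequence_lengths(src_ids)
# ...and pass src_lengths as the sequence_length argument of dynamic_rnn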

Defining the decoder

The decoder is defined similarly to the encoder, but has an extra layer called projection_layer, which is the fully connected layer producing vocabulary-sized logits (that is, the softmax output layer) used to sample the predictions made by the decoder. We will also define a TrainingHelper, which properly feeds the decoder inputs to the decoder. In this example, we define two decoder variants: a plain BasicDecoder and a BasicDecoder whose cell is wrapped with the BahdanauAttention mechanism. (The attention mechanism is discussed in Chapter 10, Sequence-to-Sequence Learning – Neural Machine Translation.) Many other decoders and attention mechanisms exist in the library, such as BeamSearchDecoder and BahdanauMonotonicAttention:

decoder_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)

# Dense is imported from tensorflow.python.layers.core
projection_layer = Dense(units=vocab_size, use_bias=True)

# Feeds the ground-truth decoder inputs to the decoder at each time step
helper = tf.contrib.seq2seq.TrainingHelper(
    decoder_emb_inp, [target_sequence_length for _ in range(batch_size)], time_major=True)

if decoder_type == 'basic':
    decoder = tf.contrib.seq2seq.BasicDecoder(
        decoder_cell, helper, encoder_state,
        output_layer=projection_layer)

elif decoder_type == 'attention':
    # BahdanauAttention is an attention mechanism rather than a decoder: it is
    # attached to the cell with an AttentionWrapper, and the wrapped cell is then
    # used inside a BasicDecoder. The attention memory must be batch-major, so the
    # time-major encoder outputs are transposed.
    attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(
        num_units, tf.transpose(encoder_outputs, [1, 0, 2]),
        memory_sequence_length=[source_sequence_length for _ in range(batch_size)])
    attention_cell = tf.contrib.seq2seq.AttentionWrapper(
        decoder_cell, attention_mechanism, attention_layer_size=num_units)
    attention_initial_state = attention_cell.zero_state(
        batch_size, tf.float32).clone(cell_state=encoder_state)
    decoder = tf.contrib.seq2seq.BasicDecoder(
        attention_cell, helper, attention_initial_state,
        output_layer=projection_layer)

We will use dynamic decoding to get the outputs of the decoder:

outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(
    decoder, output_time_major=True,
    swap_memory=True
)
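The same dynamic_decode call is also what you would use at inference time, except that TrainingHelper is replaced by a helper that feeds the decoder's own predictions back in, such as GreedyEmbeddingHelper. The following is an illustrative sketch of such an inference decoder; the start and end token IDs are hypothetical, it reuses the plain decoder_cell for brevity, and it is not part of the notebook:

sos_id, eos_id = 1, 2  # hypothetical <s> and </s> token IDs

# At each step, embed the previous prediction and take the argmax of the logits
inference_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
    decoder_emb_layer,
    start_tokens=tf.fill([batch_size], sos_id),
    end_token=eos_id)

inference_decoder = tf.contrib.seq2seq.BasicDecoder(
    decoder_cell, inference_helper, encoder_state,
    output_layer=projection_layer)

inference_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(
    inference_decoder, output_time_major=True,
    maximum_iterations=target_sequence_length)

inference_prediction = inference_outputs.sample_id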

Next, we will define the logits, cross-entropy loss, and train prediction operations:

# rnn_output holds the projection-layer outputs (logits) for each time step
logits = outputs.rnn_output

# Stack the per-step label placeholders into a [target_sequence_length, batch_size]
# tensor so that the labels match the time-major logits
crossent = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=tf.stack(dec_train_labels), logits=logits)
loss = tf.reduce_mean(crossent)

# sample_id holds the word ID predicted at each time step
train_prediction = outputs.sample_id
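Because shorter target sentences are padded (as discussed for the encoder), the loss above also averages over padded positions. If you want to ignore those positions, one option is to weight the cross-entropy with a mask built from per-sentence target lengths. The following is a minimal sketch, assuming a hypothetical dec_lengths placeholder that is not defined in the notebook:

# Hypothetical per-sentence target lengths, shape [batch_size]
dec_lengths = tf.placeholder(tf.int32, shape=[batch_size])

# sequence_mask gives a [batch_size, target_sequence_length] mask; transpose it
# to time-major to match the [time, batch] shape of crossent
mask = tf.transpose(
    tf.sequence_mask(dec_lengths, target_sequence_length, dtype=tf.float32))

masked_loss = tf.reduce_sum(crossent * mask) / tf.reduce_sum(mask)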

Then, we will define two optimizers: we use AdamOptimizer for the first 10,000 steps and the vanilla stochastic GradientDescentOptimizer for the rest of the optimization process. This is because using the Adam optimizer for a long time can give rise to unexpected behaviors. Therefore, we use Adam to reach a good initial position for the SGD optimizer and then use SGD from that point on:

with tf.variable_scope('Adam'):
    optimizer = tf.train.AdamOptimizer(learning_rate)
with tf.variable_scope('SGD'):
    sgd_optimizer = tf.train.GradientDescentOptimizer(learning_rate)

# Adam update with gradient clipping
gradients, v = zip(*optimizer.compute_gradients(loss))
gradients, _ = tf.clip_by_global_norm(gradients, 25.0)
optimize = optimizer.apply_gradients(zip(gradients, v))

# SGD update with gradient clipping (applied with sgd_optimizer, not optimizer)
sgd_gradients, v = zip(*sgd_optimizer.compute_gradients(loss))
sgd_gradients, _ = tf.clip_by_global_norm(sgd_gradients, 25.0)
sgd_optimize = sgd_optimizer.apply_gradients(zip(sgd_gradients, v))
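To make the switch concrete, a training loop could run the Adam update for the first 10,000 steps and the SGD update from then on. The following is a minimal sketch of that idea, assuming a hypothetical get_batch_feed_dict() helper that builds the feed_dict shown earlier; it is not code from the notebook:

num_steps = 50000      # hypothetical total number of training steps
adam_steps = 10000     # switch from Adam to SGD after this many steps

with tf.Session() as session:
    tf.global_variables_initializer().run()
    for step in range(num_steps):
        feed_dict = get_batch_feed_dict()  # hypothetical batch-building helper
        # Use Adam for the first adam_steps steps, then continue with plain SGD
        train_op = optimize if step < adam_steps else sgd_optimize
        _, l = session.run([train_op, loss], feed_dict=feed_dict)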

Note

A rigorous evaluation of how optimizers perform in NMT training can be found in the paper by Bahar and others, Empirical Investigation of Optimization Algorithms in Neural Machine Translation, The Prague Bulletin of Mathematical Linguistics, 2017.
