TensorFlow uses graph-based computation. Consider the following math expression, for example:
In TensorFlow, this is represented as a computational graph, as shown here. This is powerful because independent computations can be executed in parallel:
There are two easy ways to install TensorFlow:
- Using a virtual environment (recommended and described here)
- With a Docker image
For macOS/Linux variants
The following code snippet creates a Python virtual environment and installs TensorFlow in that environment. You should have Anaconda installed before you run this code:
# Creates a virtual environment named "tensorflow_env", assuming that Python 3.7 is already installed
conda create -n tensorflow_env python=3.7
# Activates the environment named "tensorflow_env"
source activate tensorflow_env
conda install pandas matplotlib jupyter notebook scipy scikit-learn
# Installs the latest TensorFlow version into the environment tensorflow_env
pip3 install tensorflow
Please check out the latest updates on the official TensorFlow page, https://www.tensorflow.org/install/.
Try running the following code in your Python console to validate your installation. The console should print Hello World! if TensorFlow is installed and working:
import tensorflow as tf
#Creating TensorFlow object
hello_constant = tf.constant('Hello World!', name = 'hello_constant')
#Creating a session object for execution of the computational graph
with tf.Session() as sess:
    #Implementing the tf.constant operation in the session
    output = sess.run(hello_constant)
    print(output)
In TensorFlow, data isn't stored as integers, floats, strings, or other primitives. These values are encapsulated in an object called a tensor, which consists of a set of primitive values shaped into an array of any number of dimensions. The number of dimensions in a tensor is called its rank. In the preceding example, hello_constant is a constant string tensor with rank zero. A few more examples of constant tensors are as follows:
# A is an int32 tensor with rank = 0
A = tf.constant(123)
# B is an int32 tensor with dimension of 1 (rank = 1)
B = tf.constant([123, 456, 789])
# C is an int32 2-dimensional tensor (rank = 2)
C = tf.constant([[123, 456, 789], [222, 333, 444]])
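If you want to confirm these ranks yourself, you can evaluate tf.rank() in a session. Here is a minimal, self-contained sketch:
import tensorflow as tf

A = tf.constant(123)                                 # rank 0
B = tf.constant([123, 456, 789])                     # rank 1
C = tf.constant([[123, 456, 789], [222, 333, 444]])  # rank 2

with tf.Session() as sess:
    # tf.rank() returns the number of dimensions as a 0-D int32 tensor
    print(sess.run([tf.rank(A), tf.rank(B), tf.rank(C)]))  # [0, 1, 2]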
TensorFlow's core program is based on the idea of a computational graph: a directed graph whose nodes are operations and whose edges are tensors. A typical TensorFlow program consists of the following two parts:
- Building a computational graph
- Running a computational graph
A computational graph executes within a session. A TensorFlow session is a runtime environment for the computational graph; it allocates the CPU or GPU and maintains the state of the TensorFlow runtime. The following code creates a session instance named sess using tf.Session(). The sess.run() function then evaluates the tensor and returns the result stored in the output variable, which is finally printed as Hello World!:
with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)
    print(output)
Using TensorBoard, we can visualize the graph. To run TensorBoard, use the following command:
tensorboard --logdir=path/to/log-directory
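Before TensorBoard has anything to display, the graph must be written to that log directory. A minimal sketch, assuming TensorFlow 1.x and an illustrative ./logs directory (any path passed to --logdir works):
import tensorflow as tf

x = tf.constant(5, name='x')
y = tf.add(x, 5, name='y')

with tf.Session() as sess:
    # Write the current graph to ./logs so that
    # `tensorboard --logdir=./logs` can render it
    writer = tf.summary.FileWriter('./logs', sess.graph)
    print(sess.run(y))
    writer.close()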
Let's create a piece of simple addition code as follows. Create a constant integer constant_x with a value of 5, define a new variable variable_y that adds 5 to it, and print it:
constant_x = tf.constant(5, name='constant_x')
variable_y = tf.Variable(constant_x + 5, name='variable_y')
print(variable_y)
The difference is that variable_y isn't given the current value of constant_x + 5, as it would be in normal Python code. Instead, it is an equation; that means when variable_y is computed, the value of constant_x at that point in time is taken and 5 is added to it. The computation of the value of variable_y is never actually performed in the preceding code. This piece of code belongs to the computational graph building section of a typical TensorFlow program. After running it, you'll get something like <tensorflow.python.ops.variables.Variable object at 0x7f074bfd9ef0> and not the actual value of variable_y as 10. To fix this, we have to execute the graph-running section of the program, which looks like this:
# Create an op that initializes all variables
init = tf.global_variables_initializer()
with tf.Session() as sess:
    # All variables are now initialized
    sess.run(init)
    print(sess.run(variable_y))  # prints 10
Here is how to execute some basic math functions, such as addition, subtraction, multiplication, and division, with tensors. For more math functions, please refer to the documentation:
For TensorFlow math functions, go to https://www.tensorflow.org/versions/r0.12/api_docs/python/math_ops/basic_math_functions.
Basic math with TensorFlow
The tf.add() function takes two numbers, two tensors, or one of each, and it returns their sum as a tensor:
Addition
x = tf.add(1, 2, name=None) # 3
Here's an example with subtraction and multiplication:
x = tf.subtract(1, 2, name=None)  # -1
y = tf.multiply(2, 5, name=None)  # 10
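These calls only add nodes to the graph; to see the numeric results, you still have to evaluate them in a session. A minimal sketch that also covers division with tf.divide():
import tensorflow as tf

a = tf.add(1, 2)        # 3
b = tf.subtract(1, 2)   # -1
c = tf.multiply(2, 5)   # 10
d = tf.divide(10, 4)    # 2.5 (Python-style division)

with tf.Session() as sess:
    # Evaluating the operations returns the actual numeric values
    print(sess.run([a, b, c, d]))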
What if we want to use a non-constant value? How do we feed an input dataset to TensorFlow? For this, TensorFlow provides the tf.placeholder() API together with the feed_dict parameter.
A placeholder is a variable to which data is assigned later, in the sess.run() call. With its help, our operations can be created and the computational graph can be built without needing the data. Afterwards, the data is fed into the graph through these placeholders via the feed_dict parameter of sess.run(), which sets the placeholder tensor. In the following example, the tensor x is set to the string Hello World before the session runs:
x = tf.placeholder(tf.string)
with tf.Session() as sess:
    output = sess.run(x, feed_dict={x: 'Hello World'})
It's also possible to set more than one tensor using feed_dict, as follows:
x = tf.placeholder(tf.string)
y = tf.placeholder(tf.int32, None)
z = tf.placeholder(tf.float32, None)
with tf.Session() as sess:
    output = sess.run(x, feed_dict={x: 'Welcome to CNN', y: 123, z: 123.45})
Placeholders can also hold arrays with multiple dimensions. Please see the following example:
import tensorflow as tf
x = tf.placeholder("float", [None, 3])
y = x * 2
with tf.Session() as session:
    input_data = [[1, 2, 3],
                  [4, 5, 6]]
    result = session.run(y, feed_dict={x: input_data})
    print(result)
Note
This will throw an error like ValueError: invalid literal for... in cases where the data passed to the feed_dict parameter doesn't match the tensor type and can't be cast into it.
The tf.truncated_normal() function returns a tensor with random values drawn from a truncated normal distribution (values more than two standard deviations from the mean are dropped and re-picked). This is mostly used for weight initialization in a network:
n_features = 5
n_labels = 2
weights = tf.truncated_normal((n_features, n_labels))
with tf.Session() as sess:
    print(sess.run(weights))
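In practice, the randomly initialized weights are usually wrapped in a tf.Variable so they can be updated during training, with the biases initialized to zeros. A minimal sketch of that common pattern (the variable names here are just illustrative):
import tensorflow as tf

n_features = 5
n_labels = 2

# Trainable weights drawn from a truncated normal distribution,
# and biases initialized to zero
weights = tf.Variable(tf.truncated_normal((n_features, n_labels)))
bias = tf.Variable(tf.zeros(n_labels))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(weights))
    print(sess.run(bias))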
The softmax function converts its inputs, known as logit or logit scores, to be between 0 and 1, and also normalizes the outputs so that they all sum up to 1. In other words, the softmax function turns your logits into probabilities. Mathematically, the softmax function is defined as follows:
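For a vector of logits x = (x_1, ..., x_n), the i-th softmax output is:

\mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}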
In TensorFlow, the softmax function is implemented as tf.nn.softmax(). It takes logits and returns softmax activations that have the same type and shape as the input logits, as shown in the following image:
The following code is used to implement this:
logit_data = [2.0, 1.0, 0.1]
logits = tf.placeholder(tf.float32)
softmax = tf.nn.softmax(logits)
with tf.Session() as sess:
    output = sess.run(softmax, feed_dict={logits: logit_data})
    print(output)
The way we represent labels mathematically is often called one-hot encoding. Each label is represented by a vector that has 1.0 for the correct label and 0.0 for everything else. This works well for most problem cases. However, when the problem has millions of labels, one-hot encoding is not efficient, since most of the vector elements are zeros. We measure the similarity distance between two probability vectors using cross-entropy, denoted by D.
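Concretely, for a softmax (predicted) distribution S and a one-hot label vector L, the cross-entropy is:

D(S, L) = -\sum_{i} L_i \log(S_i)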
Note
Cross-entropy is not symmetric. That means: D(S,L) != D(L,S)
In machine learning, we usually define how bad a model is with a mathematical function, called the loss, cost, or objective function. One very common function used to measure the loss of a model is the cross-entropy loss. This concept comes from information theory (for more on this, please refer to Visual Information Theory at https://colah.github.io/posts/2015-09-Visual-Information/). Intuitively, the loss will be high if the model does a poor job of classifying the training data, and it will be low otherwise, as shown here:
Cross-entropy loss function
In TensorFlow, we can write a cross-entropy function using tf.reduce_sum(); it takes an array of numbers and returns its sum as a tensor (see the following code block):
x = tf.constant([[1, 1, 1], [1, 1, 1]])
with tf.Session() as sess:
    print(sess.run(tf.reduce_sum([1, 2, 3])))  # returns 6
    print(sess.run(tf.reduce_sum(x, 0)))       # sum along axis 0, prints [2 2 2]
But in practice, while computing the softmax function, intermediate terms may be very large due to the exponentials, and dividing such large numbers can be numerically unstable. We should therefore use TensorFlow's provided softmax and cross-entropy loss APIs. The following code snippet manually calculates cross-entropy loss and also prints the same value using the TensorFlow API:
import tensorflow as tf

softmax_data = [0.1, 0.5, 0.4]
onehot_data = [0.0, 1.0, 0.0]
softmax = tf.placeholder(tf.float32)
onehot_encoding = tf.placeholder(tf.float32)
cross_entropy = -tf.reduce_sum(tf.multiply(onehot_encoding, tf.log(softmax)))
cross_entropy_loss = tf.nn.softmax_cross_entropy_with_logits(logits=tf.log(softmax), labels=onehot_encoding)
with tf.Session() as session:
    print(session.run(cross_entropy, feed_dict={softmax: softmax_data, onehot_encoding: onehot_data}))
    print(session.run(cross_entropy_loss, feed_dict={softmax: softmax_data, onehot_encoding: onehot_data}))
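Note that tf.nn.softmax_cross_entropy_with_logits() expects raw (unscaled) logits, not softmax outputs; the tf.log(softmax) call above recovers the logits only up to an additive constant, which the softmax inside the loss cancels out. When the logits are available directly, it is simpler and more numerically stable to pass them straight in. A minimal sketch, reusing the logit values from the earlier softmax example:
import tensorflow as tf

logit_data = [2.0, 1.0, 0.1]
onehot_data = [0.0, 1.0, 0.0]

logits = tf.placeholder(tf.float32)
labels = tf.placeholder(tf.float32)

# Combines softmax and cross-entropy in one numerically stable op
loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels)

with tf.Session() as sess:
    print(sess.run(loss, feed_dict={logits: logit_data, labels: onehot_data}))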