
Implementing gradient descent algorithm to solve optimization problems

Sunith Shetty
22 Feb 2018
7 min read
[box type="note" align="" class="" width=""]This article is an excerpt from a book written by Rajdeep Dua and Manpreet Singh Ghotra titled Neural Network Programming with Tensorflow. In this book, you will learn to leverage the power of TensorFlow to train neural networks of varying complexities, without any hassle.[/box] Today we will focus on the gradient descent algorithm and its different variants. We will take a simple example of linear regression to solve the optimization problem. Gradient descent is the most successful optimization algorithm. As mentioned earlier, it is used to do weights updates in a neural network so that we minimize the loss function. Let's now talk about an important neural network method called backpropagation, in which we firstly propagate forward and calculate the dot product of inputs with their corresponding weights, and then apply an activation function to the sum of products which transforms the input to an output and adds non linearities to the model, which enables the model to learn almost any arbitrary functional mappings. Later, we back propagate in the neural network, carrying error terms and updating weights values using gradient descent, as shown in the following graph: Different variants of gradient descent Standard gradient descent, also known as batch gradient descent, will calculate the gradient of the whole dataset but will perform only one update. Therefore, it can be quite slow and tough to control for datasets which are extremely large and don't fit in the memory. Let's now look at algorithms that can solve this problem. Stochastic gradient descent (SGD) performs parameter updates on each training example, whereas mini batch performs an update with n number of training examples in each batch. The issue with SGD is that, due to the frequent updates and fluctuations, it eventually complicates the convergence to the accurate minimum and will keep exceeding due to regular fluctuations. Mini-batch gradient descent comes to the rescue here, which reduces the variance in the parameter update, leading to a much better and stable convergence. SGD and mini-batch are used interchangeably. Overall problems with gradient descent include choosing a proper learning rate so that we avoid slow convergence at small values, or divergence at larger values and applying the same learning rate to all parameter updates wherein if the data is sparse we might not want to update all of them to the same extent. Lastly, is dealing with saddle points. Algorithms to optimize gradient descent We will now be looking at various methods for optimizing gradient descent in order to calculate different learning rates for each parameter, calculate momentum, and prevent decaying learning rates. To solve the problem of high variance oscillation of the SGD, a method called momentum was discovered; this accelerates the SGD by navigating along the appropriate direction and softening the oscillations in irrelevant directions. Basically, it adds a fraction of the update vector of the past step to the current update vector. Momentum value is usually set to .9. Momentum leads to a faster and stable convergence with reduced oscillations. Nesterov accelerated gradient explains that as we reach the minima, that is, the lowest point on the curve, momentum is quite high and it doesn't know to slow down at that point due to the large momentum which could cause it to miss the minima entirely and continue moving up. 
Nesterov accelerated gradient (NAG) addresses a weakness of momentum: as we approach the minimum, that is, the lowest point on the curve, the accumulated momentum is quite high and does not slow down at that point, which can cause the update to miss the minimum entirely and continue moving uphill. Nesterov proposed that we first make a big jump based on the previous momentum, then calculate the gradient, and then make a correction that results in the parameter update. This update prevents us from moving too fast, so that we don't miss the minimum, and it makes the method more responsive to changes.

Adagrad allows the learning rate to adapt based on the parameters: it performs large updates for infrequent parameters and small updates for frequent parameters, which makes it very well suited to dealing with sparse data. Its main flaw is that the learning rate is always decreasing and decaying. In AdaGrad, each parameter's learning rate is obtained by dividing a base rate by the square root of the accumulated sum of squared past gradients; at every step another squared gradient is added to that sum, so the denominator keeps growing and the learning rate keeps shrinking. AdaDelta solves this problem of the ever-decreasing learning rate: instead of accumulating all past squared gradients, it restricts the accumulation to a sliding window, which stops the learning rate from decaying towards zero.

Adaptive Moment Estimation (Adam) computes adaptive learning rates for each parameter. Like AdaDelta, Adam stores a decaying average of past squared gradients; in addition, it also keeps a decaying average of past gradients, much like momentum. Adam works well in practice and is one of the most used optimization methods today.

The following two images (image credit: Alec Radford) show the behavior of the optimization algorithms described earlier on the contours of a loss surface over time. Adagrad, RMSprop, and Adadelta head off in the right direction almost immediately and converge quickly, whereas momentum and NAG initially head off-track. NAG is soon able to correct its course, thanks to its improved responsiveness from looking ahead, and reaches the minimum. The second image shows the behavior of the algorithms at a saddle point: SGD, momentum, and NAG find it challenging to break the symmetry, although they slowly manage to escape the saddle point, whereas Adagrad, Adadelta, and RMSprop head down the negative slope, as can be seen in the following image:

Which optimizer to choose

If the input data is sparse, or if we want fast convergence while training complex neural networks, we get the best results using adaptive learning rate methods; we also don't need to tune the learning rate. For most cases, Adam is usually a good choice.

Optimization with an example

Let's take the example of linear regression, where we try to find the best fit for a straight line through a number of data points by minimizing the squares of the distances from the line to each data point. This is why we call it least squares regression. Essentially, we are formulating the problem as an optimization problem, where we are trying to minimize a loss function.
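Written out explicitly, the loss that the code below minimizes is the mean squared error over the N data points; this is just the standard least-squares objective made explicit:

L(W, b) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - (W x_i + b) \right)^2

Gradient descent (or any of the optimizers discussed above) then adjusts the weight W and the bias b to reduce this loss.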
Let's set up the input data and look at the scatter plot:

# input data
xData = np.arange(100, step=.1)
yData = xData + 20 * np.sin(xData/10)

Define the data size and batch size:

# define the data size and batch size
nSamples = 1000
batchSize = 100

We will need to resize the data to meet the TensorFlow input format, as follows:

# resize input for tensorflow
xData = np.reshape(xData, (nSamples, 1))
yData = np.reshape(yData, (nSamples, 1))

The following scope initializes the weights and bias, and describes the linear model and loss function:

with tf.variable_scope("linear-regression-pipeline"):
    W = tf.get_variable("weights", (1, 1), initializer=tf.random_normal_initializer())
    b = tf.get_variable("bias", (1, ), initializer=tf.constant_initializer(0.0))
    # model
    yPred = tf.matmul(X, W) + b
    # loss function
    loss = tf.reduce_sum((y - yPred)**2/nSamples)

We then set optimizers for minimizing the loss:

# set the optimizer
#optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)
#optimizer = tf.train.AdamOptimizer(learning_rate=.001).minimize(loss)
#optimizer = tf.train.AdadeltaOptimizer(learning_rate=.001).minimize(loss)
#optimizer = tf.train.AdagradOptimizer(learning_rate=.001).minimize(loss)
#optimizer = tf.train.MomentumOptimizer(learning_rate=.001, momentum=0.9).minimize(loss)
#optimizer = tf.train.FtrlOptimizer(learning_rate=.001).minimize(loss)
optimizer = tf.train.RMSPropOptimizer(learning_rate=.001).minimize(loss)

We then select the mini-batches and run the optimizer:

errors = []
with tf.Session() as sess:
    # init variables
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        # select mini batch
        indices = np.random.choice(nSamples, batchSize)
        xBatch, yBatch = xData[indices], yData[indices]
        # run optimizer
        _, lossVal = sess.run([optimizer, loss], feed_dict={X: xBatch, y: yBatch})
        errors.append(lossVal)

plt.plot([np.mean(errors[i-50:i]) for i in range(len(errors))])
plt.show()
plt.savefig("errors.png")

The output of the preceding code is as follows:

We also get a sliding curve, as follows:

We learned that optimization is a complicated subject and a lot depends on the nature and size of our data. Also, optimization depends on weight matrices. A lot of these optimizers are trained and tuned for tasks like image classification or prediction. However, for custom or new use cases, we need to perform trial and error to determine the best solution.

To know more about how to build and optimize neural networks using TensorFlow, do check out the book Neural Network Programming with TensorFlow.


How to build data streams in IBM SPSS Modeler

Amey Varangaonkar
22 Feb 2018
7 min read
[box type="note" align="" class="" width=""]The following excerpt is taken from the book IBM SPSS Modeler Essentials, co-authored by Keith McCormick and Jesus Salcedo. This book gives you a quick overview of the fundamental concepts of data mining and how to put them to practical use with the help of SPSS Modeler.[/box] SPSS Modeler allows users to mine data visually on the stream canvas. This means that you will not be writing code for your data mining projects; instead you will be placing nodes on the stream canvas. Remember that nodes represent operations to be carried out on the data. So once nodes have been placed on the stream canvas, they need to be linked together to form a stream. A stream represents the flow of data going through a number of operations (nodes). The following diagram is an example of nodes on the canvas, as well as a stream: Given that you will spend a lot of time building streams, in this section you will learn the most efficient ways of manipulating nodes to create a stream. Mouse buttons When building streams, mouse buttons are used extensively so that nodes can be brought onto the canvas, connected, edited, and so on. When building streams within Modeler, mouse buttons are used in the following ways: The left button is used for selecting, placing, and positioning nodes on the stream Canvas The right button is used for invoking context (pop-up) menus that allow for editing, connecting, renaming, deleting, and running nodes The middle button (optional) is used for connecting and disconnecting nodes Adding nodes To begin a new stream, a node from the Sources palette needs to be placed on the stream canvas. There are three ways to add nodes to a stream from a palette: Method one: Click on palette and then on stream: Click on the Sources palette. Click on the Var. File node. This will cause the icon to be highlighted. Move the cursor over the stream canvas. Click anywhere in the stream canvas. A copy of the selected node now appears on the stream canvas. This node represents the action of reading data into Modeler from a delimited text data file. If you wish to move the node within the stream canvas, select it by clicking on the node, and while holding the left mouse button down, drag the node to a new position. Method two: Drag and drop: Now go back to the Sources palette. Click on the Statistics File node and drag and drop this node onto the canvas. The Statistics File node represents the action of reading data into Modeler from an IBM SPSS Statistics data file. Method three: Double-click: Go back to the Sources palette one more time. Double click on the Database node. The Database node represents the action of reading data into Modeler from an ODBC compliant database. Editing nodes Once a node has been brought onto the stream canvas, typically at this point you will want to edit the node so that you can specify which fields, cases, or files you want the node to apply to. There are two ways to edit a node. Method one: Right-click on a node: Right-click on the Var. File node: Notice that there are many things you can do within this context menu. You can edit, add comments, copy, delete a node, connect nodes, and so on. Most often you will probably either edit the node or connect nodes. Method two: Double-click on a node: Double-click on the Var. File node. This bypasses the context menu we saw previously, and goes directly into the node itself so we can edit it. 
Deleting nodes

There will be times when you will want to delete a node that you have on the stream canvas. There are two ways to delete a node.

Method one: Right-click on a node:
Right-click on the Database node.
Select Delete. The node is now deleted.

Method two: Use the Delete key on the keyboard:
Click on the Statistics File node.
Press the Delete key on the keyboard.

Building a stream

When two or more nodes have been placed on the stream canvas, they need to be connected to produce a stream. This can be thought of as representing the flow of data through the nodes. To demonstrate this, we will place a Table node on the stream canvas next to the Var. File node. The Table node presents data in a table format.

Click the Output palette to activate it.
Click on the Table node.
Place this node to the right of the Var. File node by clicking in the stream canvas.

At this point, we now have two nodes on the stream canvas; however, we technically do not have a stream because the nodes are not speaking to each other (that is, they are not connected).

Connecting nodes

In order for nodes to work together, they must be connected. Connecting nodes allows you to bring data into Modeler, explore the data, manipulate the data (to either clean it up or create additional fields), build a model, evaluate the model, and ultimately score the data. There are three main ways to connect nodes to form a stream: double-clicking, using the middle mouse button, or manually.

Method one: Double-click. The simplest way to form a stream is to double-click on nodes on a palette. This method automatically connects the new node to the currently selected node on the stream canvas:
Select the Var. File node that is on the stream canvas.
Double-click the Table node from the Output palette.
This action automatically connects the Table node to the existing Var. File node, and a connecting arrow appears between the nodes. The head of the arrow indicates the direction of the data flow.

Method two: Manually. To manually connect two nodes:
Bring a Table node onto the canvas.
Right-click on the Var. File node.
Select Connect from the context menu.
Click the Table node.

Method three: Middle mouse button. To use the middle mouse button:
Bring a Table node onto the canvas.
Use the middle mouse button to click on the Var. File node.
While holding the middle mouse button down, drag the cursor over to the Table node.
Release the middle mouse button.

Deleting connections

When you know that you are no longer going to use a node, you can delete it. Often, though, you may not want to delete a node; instead you might want to delete a connection. Deleting a node completely gets rid of the node. Deleting a connection allows you to keep a node with all the edits you have done, but for now the unconnected node will not be part of the stream. Nodes can be disconnected in several ways.

Method one: Delete the connecting arrow:
Right-click on the connecting arrow.
Click Delete Connection.

Method two: Right-click on a node:
Right-click on one of the nodes that has a connection.
Select Disconnect from the context menu.

Method three: Double-clicking:
Double-click with the middle mouse button on a node that has a connection. All connections to this node will be severed, but the connections to neighboring nodes will be intact.

Thus, we saw it's fairly easy to build and manage data streams in SPSS Modeler.
If you found the above excerpt useful, make sure to check out our book IBM SPSS Modeler Essentials for more tips and tricks on effective data mining.    


How to combine data files within IBM SPSS Modeler

Amey Varangaonkar
22 Feb 2018
6 min read
[box type="note" align="" class="" width=""]The following extract is taken from the book IBM SPSS Modeler Essentials, written by Keith McCormick and Jesus Salcedo. SPSS Modeler is one of the popularly used enterprise tools for data mining and predictive analytics. [/box] In this article, we will explore how SPSS Modeler can be effectively used to combine different file types for efficient data modeling. In many organizations, different pieces of information for the same individuals are held in separate locations. To be able to analyze such information within Modeler, the data files must be combined into one single file. The Merge node joins two or more data sources so that information held for an individual in different locations can be analyzed collectively. The following diagram shows how the Merge node can be used to combine two separate data files that contain different types of information: Like the Append node, the Merge node is found in the Record Ops palette. This node takes multiple data sources and creates a single source containing all or some of the input fields. Let's go through an example of how to use the Merge node to combine data files: Open the Merge stream. The Merge stream contains the files we previously appended, as well as the main data file we were working with in earlier chapters. 2. Place a Merge node from the Record Ops palette on the canvas. 3. Connect the last Reclassify node to the Merge node. 4. Connect the Filter node to the Merge node. [box type="info" align="" class="" width=""]Like the Append node, the order in which data sources are connected to the Merge node impacts the order in which the sources are displayed. The fields of the first source connected to the Merge node will appear first, followed by the fields of the second source connected to the Merge node, and so on.[/box] 5. Connect the Merge node to a Table node: 6. Edit the Merge node. Since the Merge node can cope with a variety of different situations, the Merge tab allows you to specify the merging method. There are four methods for merging: Order: It joins the first record in the first dataset with the first record in the second dataset, and so on. If any of the datasets run out of records, no further output records are produced. This method can be dangerous if there happens to be any cases that are missing from a file, or if files have been sorted differently. Keys: It is the most commonly used method, used when records that have the same value in the field(s) defined as the key are merged. If multiple records contain the same value on the key field, all possible merges are returned. Condition: It joins records from files that meet a specified condition. Ranked condition: It specifies whether each row pairing in the primary dataset and all secondary datasets are to be merged; use the ranking expression to sort any multiple matches from low to high order. Let's combine these files. To do this: Set Merge Method to Keys. Fields contained in all input sources appear in the Possible keys list. To identify one of more fields as the key field(s), move the selected field into the Keys for merge list. In our case, there are two fields that appear in both files, ID and Year. 2. Select ID in the Possible keys list and place it into the Keys for merge list: There are five major methods of merging using a key field: Include only matching records (inner join) merges only complete records, that is, records; that are available in all datasets. 
Include matching and non-matching records (full outer join) merges records that appear in any of the datasets; that is, incomplete records are still retained. The undefined value ($null$) is added to the missing fields and included in the output.

Include matching and selected non-matching records (partial outer join) performs left and right outer joins. All records from the specified file are retained, along with only those records from the other file(s) that match records in the specified file on the key field(s). The Select... button allows you to designate which file is to contribute incomplete records.

Include records in first dataset not matching any others (anti-join) provides an easy way of identifying records in a dataset that do not have records with the same key values in any of the other datasets involved in the merge. This option only retains records from the dataset that match no other records.

Combine duplicate key fields is the final option in this dialog, and it deals with the problem of duplicate field names (one from each dataset) when key fields are used. This option ensures that there is only one output field with a given name, and it is enabled by default.

The Filter tab

The Filter tab lists the data sources involved in the merge, and the ordering of the sources determines the field ordering of the merged data. Here, you can rename and remove fields. Earlier, we saw that the field Year appeared in both datasets; here we can remove one version of this field (we could also rename one version of the field to keep both):

Click on the arrow next to the second Year field. The second Year field will no longer appear in the combined data file.

The Optimization tab

The Optimization tab provides two options that allow you to merge data more efficiently when one input dataset is significantly larger than the other datasets, or when the data is already presorted by all or some of the key fields that you are using to merge.

Click OK. Run the Table. All of these files have now been combined; the resulting table should have 44 fields and 143,531 records.

We saw how the Merge node is used to join data files that contain different information for the same records. If you found this post useful, make sure to check out IBM SPSS Modeler Essentials for more information on leveraging SPSS Modeler to get faster and more efficient insights from your data.
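As a closing aside for readers who think in joins and code: the Keys merge method described above behaves like a standard key-based join. The following short pandas sketch is only an analogy, not SPSS Modeler code, and the column names and values are made up for illustration:

import pandas as pd

customers = pd.DataFrame({"ID": [1, 2, 3], "Year": [2016, 2016, 2017], "Region": ["N", "S", "E"]})
usage = pd.DataFrame({"ID": [2, 3, 4], "Year": [2016, 2017, 2017], "Calls": [10, 4, 7]})

# Inner join: keep only records present in both sources
# (the "Include only matching records" option)
inner = pd.merge(customers, usage, on="ID", how="inner", suffixes=("", "_usage"))

# Full outer join: keep every record, filling missing fields with NaN
# (Modeler uses $null$ instead)
outer = pd.merge(customers, usage, on="ID", how="outer", suffixes=("", "_usage"))

print(inner)
print(outer)

The suffixes argument plays a role similar to Modeler's handling of the duplicate Year field: it keeps one output column per name instead of letting the names clash.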


FAT* 2018 Conference Session 2 Summary: Interpretability and Explainability

Savia Lobo
22 Feb 2018
5 min read
This session of FAT* 2018 is about interpretability and explainability in machine learning models. With the advances in deep learning, machine learning models have become more accurate; however, with that accuracy and those advancements, it is a tough task to keep the models highly explainable. These models may appear as black boxes to business users, who utilize them without knowing what lies within. Thus, it is equally important to make ML models interpretable and explainable, which is beneficial and essential for understanding ML models and having a 'behind the scenes' knowledge of what's happening within them. This understanding can be essential for heavily regulated industries like finance, medicine, defence, and so on.

The Conference on Fairness, Accountability, and Transparency (FAT*), to be held on the 23rd and 24th of February 2018, is a multi-disciplinary conference that brings together researchers and practitioners interested in fairness, accountability, and transparency in socio-technical systems. The FAT* 2018 conference will feature 17 research papers, 6 tutorials, and 2 keynote presentations from leading experts in the field. This article covers the research papers from the second session, which is dedicated to interpretability and explainability of machine-learned decisions. If you've missed our summary of the first session on Online Discrimination and Privacy, visit the article link for a catch-up.

Paper 1: Meaningful Information and the Right to Explanation

This paper addresses an active debate in policy, industry, academia, and the media about whether and to what extent Europe's new General Data Protection Regulation (GDPR) grants individuals a "right to explanation" of automated decisions. The paper engages with two earlier papers:

European Union Regulations on Algorithmic Decision Making and a "Right to Explanation" by Goodman and Flaxman (2017)
Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation by Wachter et al. (2017)

This paper demonstrates that the framework in those papers is built on incorrect legal and technical assumptions. In addition to responding to the existing scholarly contributions, the article articulates a positive conception of the right to explanation, located in the text and purpose of the GDPR. The authors take the position that the right should be interpreted functionally and flexibly, and that it should, at a minimum, enable a data subject to exercise his or her rights under the GDPR and human rights law.

Key takeaways:

The first paper, by Goodman and Flaxman, states that the GDPR creates a "right to explanation", but without much argument.
The second paper, by Wachter et al., is an extensive critique of the first, arguing against the existence of such a right.
The current paper, in turn, is partially concerned with responding to the arguments of Wachter et al.

Paper 2: Interpretable Active Learning

The paper highlights how, because of complex and opaque ML models, the process of active learning has also become opaque: not much is known about which specific trends and patterns an active learning strategy may be exploring. The paper builds on LIME (the Local Interpretable Model-agnostic Explanations framework) to provide explanations for active learning recommendations. The authors, Richard Phillips, Kyu Hyun Chang, and Sorelle A. Friedler, demonstrate uses of LIME in generating locally faithful explanations for an active learning strategy. Further, the paper shows how these explanations can be used to understand how different models and datasets explore a problem space over time.

Key takeaways:

The paper demonstrates how active learning choices can be made more interpretable to non-experts.
It discusses techniques that make active learning interpretable to expert labelers, so that queries and query batches can be explained and the uncertainty bias can be tracked via interpretable clusters.
It showcases per-query explanations of uncertainty to develop a system that allows experts to choose whether to label a query. This allows them to incorporate domain knowledge and their own interests into the labeling process.
It introduces a quantified notion of uncertainty bias, the idea that an algorithm may be less certain about its decisions on some data clusters than on others.

Paper 3: Interventions over Predictions: Reframing the Ethical Debate for Actuarial Risk Assessment

Actuarial risk assessments might be unduly perceived as a neutral way to counteract implicit bias and increase the fairness of decisions made within the criminal justice system, from pretrial release to sentencing, parole, and probation. Recently, however, these assessments have come under increased scrutiny, as critics claim that the statistical techniques underlying them might reproduce existing patterns of discrimination and historical biases that are reflected in the data. The paper proposes that machine learning should not be used for prediction, but rather to surface covariates that are fed into a causal model for understanding the social, structural, and psychological drivers of crime. The authors, Chelsea Barabas, Madars Virza, Karthik Dinakar, and Joichi Ito (MIT), and Jonathan Zittrain (Harvard), propose an alternative application of machine learning and causal inference, away from predicting risk scores and towards risk mitigation.

Key takeaways:

The paper gives a brief overview of how risk assessments have evolved from a tool used solely for prediction to one that is diagnostic at its core.
It places the debate around risk assessment in a broader context, giving a fuller understanding of how these actuarial tools have evolved to serve a varied set of social and institutional agendas.
It argues for a shift away from predictive technologies towards diagnostic methods that help in understanding the criminogenic effects of the criminal justice system itself, as well as in evaluating the effectiveness of interventions designed to interrupt cycles of crime.
It proposes that risk assessments, when viewed as a diagnostic tool, can be used to understand the underlying social, economic, and psychological drivers of crime.
The authors also posit that causal inference offers the best framework for pursuing these goals and achieving a fair and ethical risk assessment tool.


How to perform Numeric Metric Aggregations with Elasticsearch

Pravin Dhandre
22 Feb 2018
7 min read
[box type="note" align="" class="" width=""]This article is an excerpt from the book Learning Elastic Stack 6.0 written by Pranav Shukla and Sharath Kumar M N . This book provides detailed coverage on fundamentals of each components of Elastic Stack, making it easy to search, analyze and visualize data across different sources in real-time.[/box] Today, we are going to demonstrate how to run numeric and statistical queries such as summation, average, count and various similar metric aggregations on Elastic Stack to serve a better analytics engine on your dataset. Metric aggregations   Metric aggregations work with numeric data, computing one or more aggregate metrics within the given context. The context could be a query, filter, or no query to include the whole index/type. Metric aggregations can also be nested inside other bucket aggregations. In this case, these metrics will be computed for each bucket in the bucket aggregations. We will start with simple metric aggregations without nesting them inside bucket aggregations. When we learn about bucket aggregations later in the chapter, we will also learn how to use metric aggregations inside bucket aggregations. We will learn about the following metric aggregations: Sum, average, min, and max aggregations Stats and extended stats aggregations Cardinality aggregation Let us learn about them one by one. Sum, average, min, and max aggregations Finding the sum of a field, the minimum value for a field, the maximum value for a field, or an average, are very common operations. For the people who are familiar with SQL, the query to find the sum would look like the following: SELECT sum(downloadTotal) FROM usageReport; The preceding query will calculate the sum of the downloadTotal field across all records in the table. This requires going through all records of the table or all records in the given context and adding the values of the given fields. In Elasticsearch, a similar query can be written using the sum aggregation. Let us understand the sum aggregation first. Sum aggregation Here is how to write a simple sum aggregation: GET bigginsight/_search { "aggregations": { 1 "download_sum": { 2 "sum": { 3 "field": "downloadTotal" 4 } } }, "size": 0 5 } The aggs or aggregations element at the top level should wrap any aggregation. Give a name to the aggregation; here we are doing the sum aggregation on the downloadTotal field and hence the name we chose is download_sum. You can name it anything. This field will be useful while looking up this particular aggregation's result in the response. We are doing a sum aggregation, hence the sum element. We want to do term aggregation on the downloadTotal field. Specify size = 0 to prevent raw search results from being returned. We just want aggregation results and not the search results in this case. Since we haven't specified any top level query elements, it matches all documents. We do not want any raw documents (or search hits) in the result. The response should look like the following: { "took": 92, ... "hits": { "total": 242836, 1 "max_score": 0, "hits": [] }, "aggregations": { 2 "download_sum": { 3 "value": 2197438700 4 } } } Let us understand the key aspects of the response. The key parts are numbered 1, 2, 3, and so on, and are explained in the following points: The hits.total element shows the number of documents that were considered or were in the context of the query. If there was no additional query or filter specified, it will include all documents in the type or index. 
Just like the request, this response is wrapped inside an aggregations element to indicate it as such.
The response of the aggregation requested by us was named download_sum, hence we get our response from the sum aggregation inside an element with the same name.
This is the actual value after applying the sum aggregation.

The average, min, and max aggregations are very similar. Let's look at them briefly.

Average aggregation

The average aggregation finds an average across all documents in the querying context:

GET bigginsight/_search
{
  "aggregations": {
    "download_average": {            1
      "avg": {                       2
        "field": "downloadTotal"
      }
    }
  },
  "size": 0
}

The only notable differences from the sum aggregation are as follows:

We chose a different name, download_average, to make it apparent that the aggregation is trying to compute the average.
The type of aggregation that we are doing is avg, instead of the sum aggregation that we were doing earlier.

The response structure is identical, but the value field will now represent the average of the requested field. The min and max aggregations are exactly the same.

Min aggregation

Here is how we will find the minimum value of the downloadTotal field in the entire index/type:

GET bigginsight/_search
{
  "aggregations": {
    "download_min": {
      "min": {
        "field": "downloadTotal"
      }
    }
  },
  "size": 0
}

Let's finally look at the max aggregation as well.

Max aggregation

Here is how we will find the maximum value of the downloadTotal field in the entire index/type:

GET bigginsight/_search
{
  "aggregations": {
    "download_max": {
      "max": {
        "field": "downloadTotal"
      }
    }
  },
  "size": 0
}

These aggregations were really simple. Now let's look at some more advanced, yet still simple, stats and extended stats aggregations.

Stats and extended stats aggregations

These aggregations compute some common statistics in a single request, without having to issue multiple requests. This saves resources on the Elasticsearch side as well, because the statistics are computed in a single pass rather than being requested multiple times. The client code also becomes simpler if you are interested in more than one of these statistics. Let's look at the stats aggregation first.

Stats aggregation

The stats aggregation computes the sum, average, min, max, and count of documents in a single pass:

GET bigginsight/_search
{
  "aggregations": {
    "download_stats": {
      "stats": {
        "field": "downloadTotal"
      }
    }
  },
  "size": 0
}

The structure of the stats request is the same as the other metric aggregations we have seen so far, so nothing special is going on here. The response should look like the following:

{
  "took": 4,
  ...,
  "hits": {
    "total": 242836,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "download_stats": {
      "count": 242835,
      "min": 0,
      "max": 241213,
      "avg": 9049.102065188297,
      "sum": 2197438700
    }
  }
}

As you can see, the response with the download_stats element contains count, min, max, average, and sum; everything is included in the same response. This is very handy as it reduces the overhead of multiple requests and also simplifies the client code. Let us look at the extended stats aggregation.
Extended stats aggregation

The extended stats aggregation returns a few more statistics in addition to the ones returned by the stats aggregation:

GET bigginsight/_search
{
  "aggregations": {
    "download_estats": {
      "extended_stats": {
        "field": "downloadTotal"
      }
    }
  },
  "size": 0
}

The response looks like the following:

{
  "took": 15,
  "timed_out": false,
  ...,
  "hits": {
    "total": 242836,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "download_estats": {
      "count": 242835,
      "min": 0,
      "max": 241213,
      "avg": 9049.102065188297,
      "sum": 2197438700,
      "sum_of_squares": 133545882701698,
      "variance": 468058704.9782911,
      "std_deviation": 21634.664429528162,
      "std_deviation_bounds": {
        "upper": 52318.43092424462,
        "lower": -34220.22679386803
      }
    }
  }
}

It also returns the sum of squares, variance, standard deviation, and standard deviation bounds.

Cardinality aggregation

Finding the count of unique elements can be done with the cardinality aggregation. It is similar to finding the result of a query such as the following:

select count(*) from (select distinct username from usageReport) u;

Finding the cardinality, or the number of unique values, for a specific field is a very common requirement. If you have a click-stream from the different visitors on your website, you may want to find out how many unique visitors you got in a given day, week, or month. Let us understand how we find out the count of unique users for which we have network traffic data:

GET bigginsight/_search
{
  "aggregations": {
    "unique_visitors": {
      "cardinality": {
        "field": "username"
      }
    }
  },
  "size": 0
}

The cardinality aggregation response is just like the other metric aggregations:

{
  "took": 110,
  ...,
  "hits": {
    "total": 242836,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "unique_visitors": {
      "value": 79
    }
  }
}

To summarize, we learned how to perform numerous metric aggregations on numeric datasets and easily deploy Elasticsearch in building powerful analytics applications. If you found this tutorial useful, do check out the book Learning Elastic Stack 6.0 to examine the fundamentals of the Elastic Stack in detail and start developing solutions for problems like logging, site search, app search, metrics and more.
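As an aside for readers calling these aggregations from application code rather than the Kibana console: the same sum aggregation can be issued with the official Python client. This is a rough sketch, assuming the elasticsearch Python package (elasticsearch-py) is installed, a node is running on localhost:9200, and the book's bigginsight sample index exists:

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

body = {
    "aggs": {
        "download_sum": {
            "sum": {"field": "downloadTotal"}
        }
    },
    "size": 0,
}

response = es.search(index="bigginsight", body=body)
# The aggregation result sits under the name we chose, just like in the console output
print(response["aggregations"]["download_sum"]["value"])

Swapping the sum element for avg, min, max, stats, extended_stats, or cardinality reproduces each of the requests shown above.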


Working with Kafka Streams

Amarabha Banerjee
22 Feb 2018
6 min read
This article is a book excerpt from Apache Kafka 1.0 Cookbook written by Raúl Estrada. This book will simplify real-time data processing by leveraging Apache Kafka 1.0. In today's tutorial we are going to discuss how to work with Apache Kafka Streams efficiently.

In the data world, a stream is one of the most important abstractions. A stream depicts a continuously updating and unbounded process; here, unbounded means unlimited in size. By definition, a stream is a fault-tolerant, replayable, and ordered sequence of immutable data records, and a data record is defined as a key-value pair. Before we proceed, some concepts need to be defined:

Stream processing application: Any program that utilizes the Kafka Streams library is known as a stream processing application.

Processor topology: This is a topology that defines the computational logic of the data processing that a stream processing application needs to perform. A topology is a graph of stream processors (nodes) connected by streams (edges). There are two ways to define a topology:
Via the low-level processor API
Via the Kafka Streams DSL

Stream processor: This is a node present in the processor topology. It represents a processing step in a topology and is used to transform data in streams. The standard operations (filter, join, map, and aggregations) are examples of stream processors available in Kafka Streams.

Windowing: Sometimes, data records are divided into time buckets by a stream processor to window the stream by time. This is usually required for aggregation and join operations.

Join: When two or more streams are merged based on the keys of their data records, a new stream is generated. The operation that generates this new stream is called a join. A join over record streams is usually required to be performed on a windowing basis.

Aggregation: A new stream is generated by combining multiple input records into a single output record, by taking one input stream. The operation that creates this new stream is known as aggregation. Examples of aggregations are sums and counts.

Setting up the project

This recipe sets up the project to use Kafka Streams in the Treu application project.

Getting ready

The project generated in the first four chapters is needed.

How to do it

Open the build.gradle file on the Treu project generated in Chapter 4, Message Enrichment, and add these lines:

apply plugin: 'java'
apply plugin: 'application'

sourceCompatibility = '1.8'
mainClassName = 'treu.StreamingApp'

repositories {
  mavenCentral()
}

version = '0.1.0'

dependencies {
  compile 'org.apache.kafka:kafka-clients:1.0.0'
  compile 'org.apache.kafka:kafka-streams:1.0.0'
  compile 'org.apache.avro:avro:1.7.7'
}

jar {
  manifest {
    attributes 'Main-Class': mainClassName
  }
  from {
    configurations.compile.collect { it.isDirectory() ? it : zipTree(it) }
  } {
    exclude "META-INF/*.SF"
    exclude "META-INF/*.DSA"
    exclude "META-INF/*.RSA"
  }
}

To rebuild the app, from the project root directory, run this command:

$ gradle jar

The output is something like:

...
BUILD SUCCESSFUL
Total time: 24.234 secs

As the next step, create a file called StreamingApp.java in the src/main/java/treu directory with the following contents:

package treu;

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class StreamingApp {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streaming_app_id"); // 1
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // 2

    StreamsConfig config = new StreamsConfig(props); // 3
    StreamsBuilder builder = new StreamsBuilder(); // 4

    Topology topology = builder.build();
    KafkaStreams streams = new KafkaStreams(topology, config);

    KStream<String, String> simpleFirstStream = builder.stream("src-topic"); // 5
    KStream<String, String> upperCasedStream = simpleFirstStream.mapValues(String::toUpperCase); // 6
    upperCasedStream.to("out-topic"); // 7

    System.out.println("Streaming App Started");
    streams.start();
    Thread.sleep(30000); // 8
    System.out.println("Shutting down the Streaming App");
    streams.close();
  }
}

How it works

Follow the comments in the code:

In line //1, the APPLICATION_ID_CONFIG is an identifier for the app inside the broker.
In line //2, the BOOTSTRAP_SERVERS_CONFIG specifies the broker to use.
In line //3, the StreamsConfig object is created; it is built with the properties specified.
In line //4, the StreamsBuilder object is created; it is used to build a topology.
In line //5, when the KStream is created, the input topic is specified.
In line //6, another KStream is created with the contents of the src-topic, but in uppercase.
In line //7, the uppercase stream should write the output to out-topic.
In line //8, the application will run for 30 seconds.

Running the streaming application

In the previous recipe, the first version of the streaming app was coded. Now, in this recipe, everything is compiled and executed.

Getting ready

The execution of the previous recipe of this chapter is needed.

How to do it

The streaming app doesn't receive arguments from the command line:

To build the project, from the treu directory, run the following command:

$ gradle jar

If everything is OK, the output should be:

...
BUILD SUCCESSFUL
Total time: …

To run the project, we have four different command-line windows. The following diagram shows what the arrangement of command-line windows should look like:

In the first command-line Terminal, run the control center:

$ <confluent-path>/bin/confluent start

In the second command-line Terminal, create the two topics needed:

$ bin/kafka-topics --create --topic src-topic --zookeeper localhost:2181 --partitions 1 --replication-factor 1
$ bin/kafka-topics --create --topic out-topic --zookeeper localhost:2181 --partitions 1 --replication-factor 1

In that command-line Terminal, start the producer:

$ bin/kafka-console-producer --broker-list localhost:9092 --topic src-topic

This window is where the input messages are typed.

In the third command-line Terminal, start a consumer script listening to out-topic:

$ bin/kafka-console-consumer --bootstrap-server localhost:9092 --from-beginning --topic out-topic

In the fourth command-line Terminal, start up the processing application.
Go to the project root directory (where the gradle jar command was executed) and run:

$ java -jar ./build/libs/treu-0.1.0.jar localhost:9092

Go to the second command-line Terminal (console-producer) and send the following three messages (remember to press Enter between messages and execute each one on just one line):

$> Hello [Enter]
$> Kafka [Enter]
$> Streams [Enter]

The messages typed in the console-producer should appear in uppercase in the out-topic console consumer window:

> HELLO
> KAFKA
> STREAMS

We discussed Apache Kafka Streams and how to get up and running with it. If you liked this post, be sure to check out Apache Kafka 1.0 Cookbook, which contains more useful recipes for working with your Apache Kafka installation.
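Kafka Streams itself is a JVM library, but the uppercase transform in this recipe is easy to mirror from other languages with a plain consume-transform-produce loop. The sketch below is only an analogy, not part of the book, and assumes the third-party kafka-python package is installed and the same src-topic and out-topic exist on a local broker:

from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer("src-topic", bootstrap_servers="localhost:9092")
producer = KafkaProducer(bootstrap_servers="localhost:9092")

for message in consumer:
    # message.value is raw bytes; decode, uppercase, and forward to out-topic
    text = message.value.decode("utf-8")
    producer.send("out-topic", text.upper().encode("utf-8"))

Unlike Kafka Streams, this loop gives you none of the fault tolerance or state management of the Streams DSL; it is only meant to make the data flow of the recipe concrete.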

Introduction to WordPress Plugin

Packt
22 Feb 2018
13 min read
In this article, Yannick Lefebvre, author of WordPress Plugin Development Cookbook, Second Edition, will cover the following recipes:

Creating a new shortcode with parameters
Managing multiple sets of user settings from a single admin page

WordPress shortcodes are a simple, yet powerful tool that can be used to automate the insertion of code into web pages. For example, a shortcode could be used to automate the insertion of videos from a third-party platform that is not supported natively by WordPress, or to embed content from a popular website. By following the two code samples found in this article, you will learn how to create a WordPress plugin that defines your own shortcode to be able to quickly embed Twitter feeds on a website. You will also learn how to create an administration configuration panel to be able to create a set of configurations that can be referenced when using your newly created shortcode.

Creating a new shortcode with parameters

While simple shortcodes already provide a lot of potential to output complex content to a page by entering a few characters in the post editor, shortcodes become even more useful when they are coupled with parameters that will be passed to their associated processing function. Using this technique, it becomes very easy to create a shortcode that accelerates the insertion of external content in WordPress posts or pages by only needing to specify the shortcode and the unique identifier of the source element to be displayed. We will illustrate this concept in this recipe by creating a shortcode that will be used to quickly add Twitter feeds to posts or pages.

How to do it...

Navigate to the WordPress plugin directory of your development installation.
Create a new directory called ch3-twitter-embed.
Navigate to this directory and create a new text file called ch3-twitter-embed.php.
Open the new file in a code editor and add an appropriate header at the top of the plugin file, naming the plugin Chapter 2 - Twitter Embed.
Add the following line of code to declare a new shortcode and specify the name of the function that should be called when the shortcode is found in posts or pages:

add_shortcode( 'twitterfeed', 'ch3te_twitter_embed_shortcode' );

Add the following code section to provide an implementation for the ch3te_twitter_embed_shortcode function:

function ch3te_twitter_embed_shortcode( $atts ) {
    extract( shortcode_atts( array(
        'user_name' => 'ylefebvre'
    ), $atts ) );

    if ( !empty( $user_name ) ) {
        $output = '<a class="twitter-timeline" href="';
        $output .= esc_url( 'https://twitter.com/' . $user_name );
        $output .= '">Tweets by ' . esc_html( $user_name );
        $output .= '</a><script async ';
        $output .= 'src="//platform.twitter.com/widgets.js"';
        $output .= ' charset="utf-8"></script>';
    } else {
        $output = '';
    }
    return $output;
}

Save and close the plugin file.
Log in to the administration page of your development WordPress installation.
Click on Plugins in the left-hand navigation menu.
Activate your new plugin.
Create a new page and use the shortcode [twitterfeed user_name='WordPress'] in the page editor, where WordPress is the Twitter username of the feed to display.
Save and view the page to see that the shortcode was replaced by an embedded Twitter feed on your site.
Edit the page and remove the user_name parameter and its associated value, only leaving the core [twitterfeed] shortcode in the post, and Save.
Refresh the page and see that the feed is still being displayed, but now shows tweets from another account.

How it works...
When shortcodes are used with parameters, these extra pieces of data are sent to the associated processing function in the $atts parameter variable. By using a combination of the standard PHP extract and WordPress-specific shortcode_atts functions, our plugin is able to parse the data sent to the shortcode and create an array of identifiers and values that are subsequently transformed into PHP variables that we can use in the rest of our shortcode implementation function. In this specific example, we expect a single variable to be used, called user_name, which will be stored in a PHP variable called $user_name. If the user enters the shortcode without any parameter, a default value of ylefebvre will be assigned to the username variable to ensure that the plugin still works.

Since we are going to accept user input in this code, we also verify that the user did not provide an empty string, and we use the esc_html and esc_url functions to remove any potentially harmful HTML characters from the input string and to make sure that the link destination URL is valid.

Once we have access to the Twitter username, we can put together the required HTML code that will embed a Twitter feed in our page and display the selected user's tweets. While this example only has one argument, it is possible to define multiple parameters for a shortcode.

Managing multiple sets of user settings from a single admin page

Throughout this article, you have learned how to create configuration pages to manage single sets of configuration options for our plugins. In some cases, only being able to specify a single set of options will not be enough. For example, looking back at the Twitter embed shortcode plugin that was created, a single configuration panel would only allow users to specify one set of options, such as the desired Twitter feed dimensions or the number of tweets to display. A more flexible solution would be to allow users to specify multiple sets of configuration options, which could then be called up by using an extra shortcode parameter (for example, [twitterfeed user_name="WordPress" option_id="2"]).

While the first thought that might cross your mind to configure such a plugin is to create a multi-level menu item with submenus to store a number of different settings, this method would produce a very awkward interface for users to navigate. A better way is to use a single panel but give the user a way to select between multiple sets of options to be modified.

In this recipe, you will learn how to enhance the previously created Twitter feed shortcode plugin to be able to control the embedded feed size and number of tweets to display from the plugin configuration panel, and to give users the ability to specify multiple display sizes.

Getting ready

You should have already followed the Creating a new shortcode with parameters recipe in this article to have a starting point for this recipe. Alternatively, you can get the resulting code (Chapter 2/ch3-twitter-embed/ch3-twitter-embed.php) from the downloaded code bundle.

How to do it...

Navigate to the ch3-twitter-embed folder of the WordPress plugin directory of your development installation.
Open the ch3-twitter-embed.php file in a text editor.
Add the following lines of code to implement an activation callback to initialize plugin options when the plugin is installed or upgraded:

register_activation_hook( __FILE__, 'ch3te_set_default_options_array' );

function ch3te_set_default_options_array() {
    ch3te_get_options();
}

function ch3te_get_options( $id = 1 ) {
    $options = get_option( 'ch3te_options_' . $id, array() );
    $new_options['setting_name'] = 'Default';
    $new_options['width'] = 560;
    $new_options['number_of_tweets'] = 3;
    $merged_options = wp_parse_args( $options, $new_options );
    $compare_options = array_diff_key( $new_options, $options );
    if ( empty( $options ) || !empty( $compare_options ) ) {
        update_option( 'ch3te_options_' . $id, $merged_options );
    }
    return $merged_options;
}

Insert the following code segment to register a function to be called when the administration menu is put together. When this happens, the callback function adds an item to the Settings menu and specifies the function to be called to render the configuration page:

// Assign function to be called when admin menu is constructed
add_action( 'admin_menu', 'ch3te_settings_menu' );

// Function to add item to Settings menu and
// specify function to display options page content
function ch3te_settings_menu() {
    add_options_page( 'Twitter Embed Configuration', 'Twitter Embed',
        'manage_options', 'ch3te-twitter-embed', 'ch3te_config_page' );
}

Add the following code to implement the configuration page rendering function:

// Function to display options page content
function ch3te_config_page() {
    // Retrieve plugin configuration options from database
    if ( isset( $_GET['option_id'] ) ) {
        $option_id = intval( $_GET['option_id'] );
    } elseif ( isset( $_POST['option_id'] ) ) {
        $option_id = intval( $_POST['option_id'] );
    } else {
        $option_id = 1;
    }
    $options = ch3te_get_options( $option_id );
?>
<div id="ch3te-general" class="wrap">
<h3>Twitter Embed</h3>

<!-- Display message when settings are saved -->
<?php if ( isset( $_GET['message'] ) && $_GET['message'] == '1' ) { ?>
    <div id='message' class='updated fade'>
    <p><strong>Settings Saved</strong></p></div>
<?php } ?>

<!-- Option selector -->
<div id="icon-themes" class="icon32"><br></div>
<h3 class="nav-tab-wrapper">
<?php for ( $counter = 1; $counter <= 5; $counter++ ) {
    $temp_options = ch3te_get_options( $counter );
    $class = ( $counter == $option_id ) ? ' nav-tab-active' : '';
?>
    <a class="nav-tab<?php echo $class; ?>"
       href="<?php echo add_query_arg( array( 'page' => 'ch3te-twitter-embed',
           'option_id' => $counter ), admin_url( 'options-general.php' ) ); ?>"><?php
       echo $counter; ?><?php if ( $temp_options !== false ) echo ' (' . $temp_options['setting_name'] .
       ')'; else echo ' (Empty)'; ?></a>
<?php } ?>
</h3><br />

<!-- Main options form -->
<form name="ch3te_options_form" method="post" action="admin-post.php">
    <input type="hidden" name="action" value="save_ch3te_options" />
    <input type="hidden" name="option_id" value="<?php echo $option_id; ?>" />
    <?php wp_nonce_field( 'ch3te' ); ?>
    <table>
        <tr><td>Setting name</td>
            <td><input type="text" name="setting_name"
                value="<?php echo esc_html( $options['setting_name'] ); ?>"/></td>
        </tr>
        <tr><td>Feed width</td>
            <td><input type="text" name="width"
                value="<?php echo esc_html( $options['width'] ); ?>"/></td>
        </tr>
        <tr><td>Number of Tweets to display</td>
            <td><input type="text" name="number_of_tweets"
                value="<?php echo esc_html( $options['number_of_tweets'] ); ?>"/></td>
        </tr>
    </table><br />
    <input type="submit" value="Submit" class="button-primary" />
</form>
</div>
<?php }

Add the following block of code to register a function that will process user options when they are submitted to the site:

add_action( 'admin_init', 'ch3te_admin_init' );

function ch3te_admin_init() {
    add_action( 'admin_post_save_ch3te_options', 'process_ch3te_options' );
}

Add the following code to implement the process_ch3te_options function, declared in the previous block of code, and to clean the redirection path:

// Function to process user data submission
function process_ch3te_options() {
    // Check that user has proper security level
    if ( !current_user_can( 'manage_options' ) ) {
        wp_die( 'Not allowed' );
    }
    // Check that nonce field is present
    check_admin_referer( 'ch3te' );
    // Check if option_id field was present
    if ( isset( $_POST['option_id'] ) ) {
        $option_id = intval( $_POST['option_id'] );
    } else {
        $option_id = 1;
    }
    // Build option name and retrieve options
    $options = ch3te_get_options( $option_id );
    // Cycle through all text fields and store their values
    foreach ( array( 'setting_name' ) as $param_name ) {
        if ( isset( $_POST[$param_name] ) ) {
            $options[$param_name] = sanitize_text_field( $_POST[$param_name] );
        }
    }
    // Cycle through all numeric fields, convert to int and store
    foreach ( array( 'width', 'number_of_tweets' ) as $param_name ) {
        if ( isset( $_POST[$param_name] ) ) {
            $options[$param_name] = intval( $_POST[$param_name] );
        }
    }
    // Store updated options array to database
    $options_name = 'ch3te_options_' . $option_id;
    update_option( $options_name, $options );
    $cleanaddress = add_query_arg( array( 'message' => 1,
        'option_id' => $option_id, 'page' => 'ch3te-twitter-embed' ),
        admin_url( 'options-general.php' ) );
    wp_redirect( $cleanaddress );
    exit;
}

Find the ch3te_twitter_embed_shortcode function and modify it as follows to accept the new option_id parameter and load the plugin options to produce the desired output.
The changes are identified in bold within the recipe: function ch3te_twitter_embed_shortcode( $atts ) { extract( shortcode_atts( array( 'user_name' => 'ylefebvre', 'option_id' => '1' ), $atts ) ); if ( intval( $option_id ) < 1 || intval( $option_id ) > 5 ) { $option_id = 1; } $options = ch3te_get_options( $option_id ); if ( !empty( $user_name ) ) { $output = '<a class="twitter-timeline" href="'; $output .= esc_url( 'https://twitter.com/' . $user_name ); $output .= '" data-width="' . $options['width'] . Save and close the plugin file. Deactivate and then Activate the Chapter 2 - Twitter Embed plugin from the administration interface to execute its activation function and create default settings. Navigate to the Settings menu and select the Twitter Embed submenu item to see the newly created configuration panel with the first set of options being displayed and more sets of options accessible through the drop-down list shown at the top of the page. To select the set of options to be used, add the parameter option_id to the shortcode used to display a Twitter feed, as follows: [twitterfeed user_name="WordPress" option_id="1"] How it works... This recipe shows how we can leverage options arrays to create multiple sets of options simply by creating the name of the options array on the fly. Instead of having a specific option name in the first parameter of the get_option function call, we create a string with an option ID. This ID is sent through as a URL parameter on the configuration page and as a hidden text field when processing the form data. On initialization, the plugin only creates a single set of options, which is probably enough for most casual users of the plugin. Doing so will avoid cluttering the site database with useless options. When the user requests to view one of the empty option sets, the plugin creates a new set of options right before rendering the options page. The rest of the code is very similar to the other examples that we saw in this article, since the way to access the array elements remains the same. Summary In this article, the author has explained about the entire process of how to create a new shortcode with parameters and how to manage multiple sets of user settings from a single admin page.

Exchange Management Shell Common Tasks

Packt
21 Feb 2018
11 min read
In this article by Jonas Andersson, Nuno Mota, Michael Pfeiffer, the author of the book Microsoft Exchange Server 2016 PowerShell Cookbook, they will cover: Manually configuring remote PowerShell connections Using explicit credentials with PowerShell cmdlets (For more resources related to this topic, see here.) Microsoft introduced some radical architectural changes in Exchange 2007, including a brand-new set of management tools. PowerShell, along with an additional set of Exchange Server specific cmdlets, finally gave administrators an interface that could be used to manage the entire product from a command line shell. This was an interesting move, and at that time the entire graphical management console was built on top of this technology. The same architecture still existed with Exchange 2010, and PowerShell was even more tightly integrated with this product. Exchange 2010 used PowerShell v2, which relied heavily on its new remoting infrastructure. This provides seamless administrative capabilities from a single seat with the Exchange Management Tools, whether your servers are on-premises or in the cloud. Initially when Exchange 2013 was released, it was using version 4 of PowerShell, and during the life cycle it could be updated to version 5 of PowerShell with a lot of new cmdlets, core functionality changes, and even more integrations with the cloud services. Now with Exchange 2016, we have even more cmdlets and even more integrations with cloud-related integration and services. During the initial work on this book, we had 839 cmdlets with Cumulative Update 4 which was released in December 2016. This can be compared with the previous book, where at that stage we had 806 cmdlets based on Service Pack 1 and Cumulative Update 7. It gives us an impression that Microsoft is working heavily on the integrations and that the development of the on-premises product is still ongoing. This demonstrates that more features and functionality have been added over time. It will most likely continue like this in the future as well. In this article, we'll cover some of the most common topics, as well as common tasks, that will allow you to effectively write scripts with this latest release. We'll also take a look at some general tasks such as scheduling scripts, sending emails, generating reports, and more. Performing some basic steps To work with the code samples in this article, follow these steps to launch the Exchange Management Shell: Log onto a workstation or server with the Exchange Management Tools installed. You can connect using remote PowerShell if, for some reason, you don't have Exchange Management Tools installed. Use the following command: $Session = New-PSSession -ConfigurationName Microsoft.Exchange ` -ConnectionUri http://tlex01/PowerShell/ ` -Authentication Kerberos Import-PSSession $Session Open the Exchange Management Shell by clicking the Windows button and go to Microsoft Exchange Server 2016 | Exchange Management Shell. Remember to start the Exchange Management Shell using Run as Administrator to avoid permission problems. In the article, notice that in the examples of cmdlets, I have used the back tick (`) character for breaking up long commands into multiple lines. The purpose of this is to make it easier to read. The back ticks are not required and should only be used if needed. Notice that the Exchange variables, such as $exscripts, are not available when using the preceding method. 
Manually configuring remote PowerShell connections Just like Exchange 2013, Exchange 2016 relies heavily on remote PowerShell for both on-premises and cloud services. When you double-click the Exchange Management Shell shortcut on a server or workstation with the Exchange Management Tools installed, you are connected to an Exchange server using a remote PowerShell session. PowerShell remoting also allows you to remotely manage your Exchange servers from a workstation or a server even when the Exchange Management Tools are not installed. In this recipe, we'll create a manual remote shell connection to an Exchange server using a standard PowerShell console. Getting ready To complete the steps in this recipe, you'll need to log on to a workstation or a server and launch Windows PowerShell. How to do it... First, create a credential object using the Get-Credential cmdlet. When running this command, you'll be prompted with a Windows authentication dialog box. Enter a username and password for an account that has administrative access to your Exchange organization. Make sure you enter your user name in DOMAIN\USERNAME or UPN format: $credential = Get-Credential Next, create a new session object and store it in a variable. In this example, the Exchange server we are connecting to is specified using the -ConnectionUri parameter. Replace the server FQDN in the following example with one of your own Exchange servers: $session = New-PSSession -ConfigurationName Microsoft.Exchange ` -ConnectionUri http://tlex01.testlabs.se/PowerShell/ ` -Credential $credential Finally, import the session object: Import-PSSession $session -AllowClobber After you execute the preceding command, the Exchange Management Shell cmdlets will be imported into your current Windows PowerShell session, as shown in the following screenshot: How it works... Each server runs IIS and supports remote PowerShell sessions through HTTP. Exchange servers host a PowerShell virtual directory in IIS. This contains several modules that perform authentication checks and determine which cmdlets and parameters are assigned to the user making the connection. This happens both when running the Exchange Management Shell with the tools installed, and when creating a manual remote connection. The IIS virtual directory that is being used for connecting is shown in the following screenshot: The IIS virtual directories can also be retrieved by using PowerShell with the cmdlet Get-WebVirtualDirectory. For getting the information about the web applications, use the cmdlet Get-WebApplication. Remote PowerShell connections to Exchange 2016 servers work almost the same way as they did in Exchange 2013. This is called implicit remoting, which allows us to import remote commands into the local shell session. With this feature, we can use the Exchange PowerShell cmdlets installed on the Exchange server and load the cmdlets into our local PowerShell session without installing any management tools. However, the detailed behavior for establishing a remote PowerShell session was changed in Exchange 2013 CU11. When a user or admin now tries to establish a PowerShell session, Exchange first tries to connect to that user's or admin's mailbox (the anchor mailbox), if one exists. If the user doesn't have an existing mailbox, the PowerShell request will be redirected to the organization arbitration mailbox named SystemMailbox{bb558c35-97f1-4cb9-8ff7-d53741dc928c}. 
You may be curious as to why Exchange uses remote PowerShell even when the tools are installed and when running the shell from the server. There are a couple of reasons for this, but some of the main factors are permissions. The Exchange 2010, 2013, and 2016 permissions model has been completely transformed in these latest versions and uses a feature called Role Based Access Control (RBAC), which defines what administrators can and cannot do. When you make a remote PowerShell connection to an Exchange 2016 server, the RBAC authorization module in IIS determines which cmdlets and parameters you have access to. Once this information is obtained, only the cmdlets and parameters that have been assigned to your account through an RBAC role are loaded into your PowerShell session using implicit remoting. There's more... In the previous example, we explicitly set the credentials used to create the remote shell connection. This is optional and not required if the account you are currently logged on with has the appropriate Exchange permissions assigned. To create a remote shell session using your currently logged on credentials, use the following syntax to create the session object: $session = New-PSSession -ConfigurationName Microsoft.Exchange ` -ConnectionUri http://tlex01.testlabs.se/PowerShell/ Once again, import the session: Import-PSSession $session When the tasks have been completed, remove the session: Remove-PSSession $session You can see here that the commands are almost identical to the previous example, except this time we've removed the -Credential parameter and used the assigned credential object. After this is done, you can simply import the session and the commands will be imported into your current session using implicit remoting. In addition to implicit remoting, Exchange 2016 servers running PowerShell v5 or above can also be managed using fan-out remoting. This is accomplished using the Invoke-Command cmdlet and it allows you to execute a script block on multiple computers in parallel. For more details, run Get-Help Invoke-Command and Get-Help about_remoting. Since Exchange Online is commonly used by Microsoft customers nowadays, let’s take a look at an example on how to connect as well. It’s very similar to connecting to remote PowerShell on-premises. The following prerequisites are required: .NET Framework 4.5 or 4.5.1 and then either Windows Management Framework 3.0 or 4.0. Create a variable of the credentials: $UserCredential = Get-Credential Create a session variable: $session = New-PSSession -ConfigurationName Microsoft.Exchange ` -ConnectionUri https://outlook.office365.com/powershell-liveid/ ` -Credential $UserCredential -Authentication Basic ` -AllowRedirection Finally, import the session: Import-PSSession $session -AllowClobber Perform the tasks you want to do: Get-Mailbox Exchange Online mailboxes are shown in the following screenshot: When the tasks have been completed, remove the session: Remove-PSSession $session Using explicit credentials with PowerShell cmdlets There are several PowerShell and Exchange Management Shell cmdlets that provide a credential parameter that allows you to use an alternate set of credentials when running a command. You may need to use alternate credentials when making manual remote shell connections, sending email messages, working in cross-forest scenarios, and more. In this recipe, we'll take a look at how you can create a credential object that can be used with commands that support the -Credential parameter. How to do it... 
To create a credential object, we can use the Get-Credential cmdlet. In this example, we store the credential object in a variable that can be used by the Get-Mailbox cmdlet: $credential = Get-Credential Get-Mailbox -Credential $credential How it works... When you run the Get-Credential cmdlet,, you are presented with a Windows authentication dialog box requesting your username and password. In the previous example, we assigned the Get-Credential cmdlet to the $credential variable. After typing your username and password into the authentication dialog box, the credentials are saved as an object that can then be assigned to the -Credential parameter of a cmdlet. The cmdlet that utilizes the credential object will then run using the credentials of the specified user. Supplying credentials to a command doesn't have to be an interactive process. You can programmatically create a credential object within your script without using the Get-Credential cmdlet: $user = "testlabsadministrator" $pass = ConvertTo-SecureString -AsPlainText P@ssw0rd01 -Force $credential = New-Object ` System.Management.Automation.PSCredential ` -ArgumentList $user,$pass You can see here that we've created a credential object from scratch without using the Get-Credential cmdlet. In order to create a credential object, we need to supply the password as a secure string type. The ConvertTo-SecureString cmdlet can be used to create a secure string object. We then use the New-Object cmdlet to create a credential object specifying the desired username and password as arguments. If you need to prompt a user for their credentials but you do not want to invoke the Windows authentication dialog box, you can use this alternative syntax to prompt the user in the shell for their credentials: $user = Read-Host "Please enter your username" $pass = Read-Host "Please enter your password" -AsSecureString $credential = New-Object ` System.Management.Automation.PSCredential-ArgumentList ` $user,$pass This syntax uses the Read-Host cmdlet to prompt the user for both their username and password. Notice that when creating the $pass object, we use Read-Host with the -AsSecureString parameter to ensure that the object is stored as a secure string. There's more... After you've created a credential object, you may need to access the properties of that object to retrieve the username and password. We can access the username and password properties of the $credential object created previously using the following commands: You can see here that we can simply grab the username stored in the object by accessing the UserName property of the credential object. Since the Password property is stored as a secure string, we need to use the GetNetworkCredential method to convert the credential to a NetworkCredential object that exposes the Password property as a simple string. Another powerful method for managing passwords for scripts is to encrypt them and store them into a text file. This can be easily done using the following example. 
The password is stored into a variable: $secureString = Read-Host -AsSecureString "Enter a secret password" The variable gets converted from SecureString and saved to a text file: $secureString | ConvertFrom-SecureString | Out-File .storedPassword.txt The content in the text file is retrieved and converted into a SecureString value: $secureString = Get-Content .storedPassword.txt | ConvertTo-SecureString Summary In this article, we have covered how to manually setup remote PowerShell connections and how to work with the PowerShell cmdlets. Resources for Article: Further resources on this subject: Exploring Windows PowerShell 5.0 [article] Working with PowerShell [article] How to use PowerShell Web Access to manage Windows Server [article]

How to classify emails using deep neural networks after generating TF-IDF

Savia Lobo
21 Feb 2018
9 min read
[box type="note" align="" class="" width=""]This article is an excerpt taken from the book Natural Language Processing with Python Cookbook written by Krishna Bhavsar, Naresh Kumar, and Pratap Dangeti. This book will teach you how to efficiently use NLTK and implement text classification, identify parts of speech, tag words, and more. You will also learn how to analyze sentence structures and master lexical analysis, syntactic and semantic analysis, pragmatic analysis, and application of deep learning techniques.[/box] In this article, you will learn how to use deep neural networks to classify emails into one of the 20 pre-trained categories based on the words present in each email. This is a simple model to start with understanding the subject of deep learning and its applications on NLP. Getting ready The 20 newsgroups dataset from scikit-learn have been utilized to illustrate the concept. Number of observations/emails considered for analysis are 18,846 (train observations - 11,314 and test observations - 7,532) and its corresponding classes/categories are 20, which are shown in the following: >>> from sklearn.datasets import fetch_20newsgroups >>> newsgroups_train = fetch_20newsgroups(subset='train') >>> newsgroups_test = fetch_20newsgroups(subset='test') >>> x_train = newsgroups_train.data >>> x_test = newsgroups_test.data >>> y_train = newsgroups_train.target >>> y_test = newsgroups_test.target >>> print ("List of all 20 categories:") >>> print (newsgroups_train.target_names) >>> print ("n") >>> print ("Sample Email:") >>> print (x_train[0]) >>> print ("Sample Target Category:") >>> print (y_train[0]) >>> print (newsgroups_train.target_names[y_train[0]]) In the following screenshot, a sample first data observation and target class category has been shown. From the first observation or email we can infer that the email is talking about a two-door sports car, which we can classify manually into autos category which is 8. Note: Target value is 7 due to the indexing starts from 0), which is validating our understanding with actual target class 7. How to do it… Using NLP techniques, we have pre-processed the data for obtaining finalized word vectors to map with final outcomes spam or ham. Major steps involved are:    Pre-processing.    Removal of punctuations.    Word tokenization.    Converting words into lowercase.    Stop word removal.    Keeping words of length of at least 3.    Stemming words.    POS tagging.    Lemmatization of words: TF-IDF vector conversion. Deep learning model training and testing. Model evaluation and results discussion. How it works... The NLTK package has been utilized for all the pre-processing steps, as it consists of all the necessary NLP functionality under one single roof: # Used for pre-processing data >>> import nltk >>> from nltk.corpus import stopwords >>> from nltk.stem import WordNetLemmatizer >>> import string >>> import pandas as pd >>> from nltk import pos_tag >>> from nltk.stem import PorterStemmer The function written (pre-processing) consists of all the steps for convenience. However, we will be explaining all the steps in each section: >>> def preprocessing(text): The following line of the code splits the word and checks each character to see if it contains any standard punctuations, if so it will be replaced with a blank or else it just don't replace with blank: ... 
text2 = " ".join("".join([" " if ch in string.punctuation else ch for ch in text]).split()) The following code tokenizes the sentences into words based on whitespaces and puts them together as a list for applying further steps: ... tokens = [word for sent in nltk.sent_tokenize(text2) for word in nltk.word_tokenize(sent)] Converting all the cases (upper, lower and proper) into lower case reduces duplicates in corpus: ... tokens = [word.lower() for word in tokens] As mentioned earlier, Stop words are the words that do not carry much of weight in understanding the sentence; they are used for connecting words and so on. We have removed them with the following line of code: ... stopwds = stopwords.words('english') ... tokens = [token for token in tokens if token not in stopwds] Keeping only the words with length greater than 3 in the following code for removing small words which hardly consists of much of a meaning to carry; ... tokens = [word for word in tokens if len(word)>=3] Stemming applied on the words using Porter stemmer which stems the extra suffixes from the words: ... stemmer = PorterStemmer() ... tokens = [stemmer.stem(word) for word in tokens] POS tagging is a prerequisite for lemmatization, based on whether word is noun or verb or and so on. it will reduce it to the root word ... tagged_corpus = pos_tag(tokens) pos_tag function returns the part of speed in four formats for Noun and six formats for verb. NN - (noun, common, singular), NNP - (noun, proper, singular), NNPS - (noun, proper, plural), NNS - (noun, common, plural), VB - (verb, base form), VBD - (verb, past tense), VBG - (verb, present participle), VBN - (verb, past participle), VBP - (verb, present tense, not 3rd person singular), VBZ - (verb, present tense, third person singular) ... Noun_tags = ['NN','NNP','NNPS','NNS'] ... Verb_tags = ['VB','VBD','VBG','VBN','VBP','VBZ'] ... lemmatizer = WordNetLemmatizer() The following function, prat_lemmatize, has been created only for the reasons of mismatch between the pos_tag function and intake values of lemmatize function. If the tag for any word falls under the respective noun or verb tags category, n or v will be applied accordingly in lemmatize function: ... def prat_lemmatize(token,tag): ...      if tag in Noun_tags: ...          return lemmatizer.lemmatize(token,'n') ...      elif tag in Verb_tags: ...          return lemmatizer.lemmatize(token,'v') ...      else: ...          return lemmatizer.lemmatize(token,'n') After performing tokenization and applied all the various operations, we need to join it back to form stings and the following function performs the same: ... pre_proc_text =   " ".join([prat_lemmatize(token,tag) for token,tag in tagged_corpus]) ... return pre_proc_text Applying pre-processing on train and test data: >>> x_train_preprocessed = [] >>> for i in x_train: ... x_train_preprocessed.append(preprocessing(i)) >>> x_test_preprocessed = [] >>> for i in x_test: ... 
x_test_preprocessed.append(preprocessing(i)) # building TFIDF vectorizer >>> from sklearn.feature_extraction.text import TfidfVectorizer >>> vectorizer = TfidfVectorizer(min_df=2, ngram_range=(1, 2), stop_words='english', max_features= 10000,strip_accents='unicode', norm='l2') >>> x_train_2 = vectorizer.fit_transform(x_train_preprocessed).todense() >>> x_test_2 = vectorizer.transform(x_test_preprocessed).todense() After the pre-processing step has been completed, processed TF-IDF vectors have to be sent to the following deep learning code: # Deep Learning modules >>> import numpy as np >>> from keras.models import Sequential >>> from keras.layers.core import Dense, Dropout, Activation >>> from keras.optimizers import Adadelta,Adam,RMSprop >>> from keras.utils import np_utils The following image produces the output after firing up the preceding Keras code. Keras has been installed on Theano, which eventually works on Python. A GPU with 6 GB memory has been installed with additional libraries (CuDNN and CNMeM) for four to five times faster execution, with a choking of around 20% memory; hence only 80% memory out of 6 GB is available; The following code explains the central part of the deep learning model. The code is self- explanatory, with the number of classes considered 20, batch size 64, and number of epochs to train, 20: # Definition hyper parameters >>> np.random.seed(1337) >>> nb_classes = 20 >>> batch_size = 64 >>> nb_epochs = 20 The following code converts the 20 categories into one-hot encoding vectors in which 20 columns are created and the values against the respective classes are given as 1. All other classes are given as 0: >>> Y_train = np_utils.to_categorical(y_train, nb_classes) In the following building blocks of Keras code, three hidden layers (1000, 500, and 50 neurons in each layer respectively) are used, with dropout as 50% for each layer with Adam as an optimizer: #Deep Layer Model building in Keras #del model >>> model = Sequential() >>> model.add(Dense(1000,input_shape= (10000,))) >>> model.add(Activation('relu')) >>> model.add(Dropout(0.5)) >>> model.add(Dense(500)) >>> model.add(Activation('relu')) >>> model.add(Dropout(0.5)) >>> model.add(Dense(50)) >>> model.add(Activation('relu')) >>> model.add(Dropout(0.5)) >>> model.add(Dense(nb_classes)) >>> model.add(Activation('softmax')) >>> model.compile(loss='categorical_crossentropy', optimizer='adam') >>> print (model.summary()) The architecture is shown as follows and describes the flow of the data from a start of 10,000 as input. Then there are 1000, 500, 50, and 20 neurons to classify the given email into one of the 20 categories: The model is trained as per the given metrics: # Model Training >>> model.fit(x_train_2, Y_train, batch_size=batch_size, epochs=nb_epochs,verbose=1) The model has been fitted with 20 epochs, in which each epoch took about 2 seconds. The loss has been minimized from 1.9281 to 0.0241. 
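If you want to inspect how the loss fell across the 20 epochs rather than reading it off the console output, the History object returned by the model.fit call shown above can be captured. The following is a minimal sketch, not part of the original recipe, of the same call with the return value kept; it re-uses the model, x_train_2, Y_train, batch_size, and nb_epochs objects defined above, and relies on Keras storing the per-epoch training loss in history.history['loss']:
# Capture the History object returned by fit to examine the loss per epoch
>>> history = model.fit(x_train_2, Y_train, batch_size=batch_size, epochs=nb_epochs, verbose=1)
>>> for epoch, loss in enumerate(history.history['loss'], start=1):
...     print ("Epoch %d - training loss: %.4f" % (epoch, loss))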
By using CPU hardware, the time required for training each epoch may increase as a GPU massively parallelizes the computation with thousands of threads/cores: Finally, predictions are made on the train and test datasets to determine the accuracy, precision, and recall values: #Model Prediction >>> y_train_predclass = model.predict_classes(x_train_2,batch_size=batch_size) >>> y_test_predclass = model.predict_classes(x_test_2,batch_size=batch_size) >>> from sklearn.metrics import accuracy_score,classification_report >>> print ("nnDeep Neural Network - Train accuracy:"),(round(accuracy_score( y_train, y_train_predclass),3)) >>> print ("nDeep Neural Network - Test accuracy:"),(round(accuracy_score( y_test,y_test_predclass),3)) >>> print ("nDeep Neural Network - Train Classification Report") >>> print (classification_report(y_train,y_train_predclass)) >>> print ("nDeep Neural Network - Test Classification Report") >>> print (classification_report(y_test,y_test_predclass)) It appears that the classifier is giving a good 99.9% accuracy on the train dataset and 80.7% on the test dataset. We learned the classification of emails using DNNs(Deep Neural Networks) after generating TF-IDF. If you found this post useful, do check out this book Natural Language Processing with Python Cookbook  to further analyze sentence structures and application of various deep learning techniques.    
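To put the trained pipeline to work on a single unseen message, the same pre-processing and the already fitted TF-IDF vectorizer must be applied before predicting. The following is a small sketch under the assumption that the preprocessing function, vectorizer, model, and newsgroups_train objects from the recipe are still in scope; new_email is a hypothetical example string:
# Hypothetical example: classify one new message with the trained model
>>> new_email = "I am selling my old two-door sports car, recently serviced."
>>> processed = preprocessing(new_email)
>>> vectorized = vectorizer.transform([processed]).todense()
>>> predicted_class = model.predict_classes(vectorized)[0]
>>> print (newsgroups_train.target_names[predicted_class])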

Installing TensorFlow in Windows, Ubuntu and Mac OS

Amarabha Banerjee
21 Feb 2018
7 min read
[box type="note" align="" class="" width=""]This article is taken from the book Machine Learning with Tensorflow 1.x, written by Quan Hua, Shams Ul Azeem and Saif Ahmed. This book will help tackle common commercial machine learning problems with Google’s TensorFlow 1.x library.[/box] Today, we shall explore the basics of getting started with TensorFlow, its installation and configuration process. The proliferation of large public datasets, inexpensive GPUs, and open-minded developer culture has revolutionized machine learning efforts in recent years. Training data, the lifeblood of machine learning, has become widely available and easily consumable in recent years. Computing power has made the required horsepower available to small businesses and even individuals. The current decade is incredibly exciting for data scientists. Some of the top platforms used in the industry include Caffe, Theano, and Torch. While the underlying platforms are actively developed and openly shared, usage is limited largely to machine learning practitioners due to difficult installations, non-obvious configurations, and difficulty with productionizing solutions. TensorFlow has one of the easiest installations of any platform, bringing machine learning capabilities squarely into the realm of casual tinkerers and novice programmers. Meanwhile, high-performance features, such as—multiGPU support, make the platform exciting for experienced data scientists and industrial use as well. TensorFlow also provides a reimagined process and multiple user-friendly utilities, such as TensorBoard, to manage machine learning efforts. Finally, the platform has significant backing and community support from the world's largest machine learning powerhouse--Google. All this is before even considering the compelling underlying technical advantages, which we'll dive into later. Installing TensorFlow TensorFlow conveniently offers several types of installation and operates on multiple operating systems. The basic installation is CPU-only, while more advanced installations unleash serious horsepower by pushing calculations onto the graphics card, or even to multiple graphics cards. We recommend starting with a basic CPU installation at first. More complex GPU and CUDA installations will be discussed in Appendix, Advanced Installation. Even with just a basic CPU installation, TensorFlow offers multiple options, which are as follows: A basic Python pip installation A segregated Python installation via Virtualenv A fully segregated container-based installation via Docker Ubuntu installation Ubuntu is one of the best Linux distributions for working with Tensorflow. We highly recommend that you use an Ubuntu machine, especially if you want to work with GPU. We will do most of our work on the Ubuntu terminal. We will begin with installing pythonpip and python-dev via the following command: sudo apt-get install python-pip python-dev A successful installation will appear as follows: If you find missing packages, you can correct them via the following command: sudo apt-get update --fix-missing Then, you can continue the python and pip installation. We are now ready to install TensorFlow. The CPU installation is initiated via the following command: sudo pip install tensorflow A successful installation will appear as follows: macOS installation If you use Python, you will probably already have the Python package installer, pip. However, if not, you can easily install it using the easy_install pip command. 
You'll note that we actually executed sudo easy_install pip; the sudo prefix was required because the installation requires administrative rights. We will make the fair assumption that you already have the basic package installer, easy_install, available; if not, you can install it from https://pypi.python.org/pypi/setuptools. A successful installation will appear as shown in the following screenshot: Next, we will install the six package: sudo easy_install --upgrade six A successful installation will appear as shown in the following screenshot: Surprisingly, those are the only two prerequisites for TensorFlow, and we can now install the core platform. We will use the pip package installer mentioned earlier and install TensorFlow directly from Google's site. The most recent version at the time of writing this book is v1.3, but you should change this to the latest version you wish to use: sudo pip install tensorflow The pip installer will automatically gather all the other required dependencies. You will see each individual download and installation until the software is fully installed. A successful installation will appear as shown in the following screenshot: That's it! If you were able to get to this point, you can start to train and run your first model. Skip to Chapter 2, Your First Classifier, to train your first model. macOS users wishing to completely segregate their installation can use a VM instead, as described in the Windows installation. Windows installation As we mentioned earlier, TensorFlow with Python 2.7 does not function natively on Windows. In this section, we will guide you through installing TensorFlow with Python 3.5 and set up a VM with Linux if you want to use TensorFlow with Python 2.7. First, we need to install Python 3.5.x or 3.6.x 64-bit from the following links: https://www.python.org/downloads/release/python-352/ https://www.python.org/downloads/release/python-362/ Make sure that you download the 64-bit version of Python where the name of the installation has amd64, such as python-3.6.2-amd64.exe. The Python 3.6.2 installation looks like this: We will select Add Python 3.6 to PATH and click Install Now. The installation process will complete with the following screen: We will click the Disable path length limit and then click Close to finish the Python installation. Now, let's open the Windows PowerShell application under the Windows menu. We will install the CPU-only version of TensorFlow with the following command: pip3 install tensorflow The result of the installation will look like this: Congratulations, you can now use TensorFlow on Windows with Python 3.5.x or 3.6.x support. In the next section, we will show you how to set up a VM to use TensorFlow with Python 2.7. However, you can skip to the Test installation section of Chapter 2, Your First Classifier, if you don't need Python 2.7. Now, we will show you how to set up a VM with Linux to use TensorFlow with Python 2.7. We recommend the free VirtualBox system available at https://www.virtualbox.org/wiki/Downloads. The latest stable version at the time of writing is v5.1.28, available at the following URL: http://download.virtualbox.org/virtualbox/5.1.28/VirtualBox-5.1.28-117968-Win.exe A successful installation will allow you to run the Oracle VM VirtualBox Manager dashboard, which looks like this: Testing the installation In this section, we will use TensorFlow to compute a simple math operation. First, open your terminal on Linux/macOS or Windows PowerShell in Windows. 
Now, we need to run python to use TensorFlow with the following command: python Enter the following program in the Python shell: import tensorflow as tf a = tf.constant(1.0) b = tf.constant(2.0) c = a + b sess = tf.Session() print(sess.run(c)) The result will look like the following screen where 3.0 is printed at the end: We covered TensorFlow installation on the three major operating systems, so that you are up and running with the platform. Windows users faced an extra challenge, as TensorFlow on Windows only supports Python 3.5.x or Python 3.6.x 64-bit version. However, even Windows users should now be up and running. Further get a detailed understanding of implementing Tensorflow with contextual examples in this post. If you liked this article, be sure to check out Machine Learning with Tensorflow 1.x which will help you take up any challenge you may face while implementing TensorFlow 1.x in your machine learning environment.  
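As a small, optional extension of the verification step shown earlier (assuming the TensorFlow 1.x API used throughout this article), you can also print the installed version and ask the session to log device placement, which confirms whether your computation ran on the CPU or a GPU:
import tensorflow as tf
print(tf.__version__)
# log_device_placement prints which device each operation was assigned to
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(tf.constant(3.0)))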

IPv6, Unix Domain Sockets, and Network Interfaces

Packt
21 Feb 2018
38 min read
 In this article, given by Pradeeban Kathiravelu, author of the book Python Network Programming Cookbook - Second Edition, we will cover the following topics: Forwarding a local port to a remote host Pinging hosts on the network with ICMP Waiting for a remote network service Enumerating interfaces on your machine Finding the IP address for a specific interface on your machine Finding whether an interface is up on your machine Detecting inactive machines on your network Performing a basic IPC using connected sockets (socketpair) Performing IPC using Unix domain sockets Finding out if your Python supports IPv6 sockets Extracting an IPv6 prefix from an IPv6 address Writing an IPv6 echo client/server (For more resources related to this topic, see here.) This article extends the use of Python's socket library with a few third-party libraries. It also discusses some advanced techniques, for example, the asynchronous asyncore module from the Python standard library. This article also touches upon various protocols, ranging from an ICMP ping to an IPv6 client/server. In this article, a few useful Python third-party modules have been introduced by some example recipes. For example, the network packet capture library, Scapy, is well known among Python network programmers. A few recipes have been dedicated to explore the IPv6 utilities in Python including an IPv6 client/server. Some other recipes cover Unix domain sockets. Forwarding a local port to a remote host Sometimes, you may need to create a local port forwarder that will redirect all traffic from a local port to a particular remote host. This might be useful to enable proxy users to browse a certain site while preventing them from browsing some others. How to do it... Let us create a local port forwarding script that will redirect all traffic received at port 8800 to the Google home page (http://www.google.com). We can pass the local and remote host as well as port number to this script. For the sake of simplicity, let's only specify the local port number as we are aware that the web server runs on port 80. Listing 3.1 shows a port forwarding example, as follows: #!/usr/bin/env python # This program is optimized for Python 2.7.12 and Python 3.5.2. # It may run on any other version with/without modifications. 
import argparse LOCAL_SERVER_HOST = 'localhost' REMOTE_SERVER_HOST = 'www.google.com' BUFSIZE = 4096 import asyncore import socket class PortForwarder(asyncore.dispatcher): def __init__(self, ip, port, remoteip,remoteport,backlog=5): asyncore.dispatcher.__init__(self) self.remoteip=remoteip self.remoteport=remoteport self.create_socket(socket.AF_INET,socket.SOCK_STREAM) self.set_reuse_addr() self.bind((ip,port)) self.listen(backlog) def handle_accept(self): conn, addr = self.accept() print ("Connected to:",addr) Sender(Receiver(conn),self.remoteip,self.remoteport) class Receiver(asyncore.dispatcher): def __init__(self,conn): asyncore.dispatcher.__init__(self,conn) self.from_remote_buffer='' self.to_remote_buffer='' self.sender=None def handle_connect(self): pass def handle_read(self): read = self.recv(BUFSIZE) self.from_remote_buffer += read def writable(self): return (len(self.to_remote_buffer) > 0) def handle_write(self): sent = self.send(self.to_remote_buffer) self.to_remote_buffer = self.to_remote_buffer[sent:] def handle_close(self): self.close() if self.sender: self.sender.close() class Sender(asyncore.dispatcher): def __init__(self, receiver, remoteaddr,remoteport): asyncore.dispatcher.__init__(self) self.receiver=receiver receiver.sender=self self.create_socket(socket.AF_INET, socket.SOCK_STREAM) self.connect((remoteaddr, remoteport)) def handle_connect(self): pass def handle_read(self): read = self.recv(BUFSIZE) self.receiver.to_remote_buffer += read def writable(self): return (len(self.receiver.from_remote_buffer) > 0) def handle_write(self): sent = self.send(self.receiver.from_remote_buffer) self.receiver.from_remote_buffer = self.receiver.from_remote_buffer[sent:] def handle_close(self): self.close() self.receiver.close() if __name__ == "__main__": parser = argparse.ArgumentParser(description='Stackless Socket Server Example') parser.add_argument('--local-host', action="store", dest="local_host", default=LOCAL_SERVER_HOST) parser.add_argument('--local-port', action="store", dest="local_port", type=int, required=True) parser.add_argument('--remote-host', action="store", dest="remote_host", default=REMOTE_SERVER_HOST) parser.add_argument('--remote-port', action="store", dest="remote_port", type=int, default=80) given_args = parser.parse_args() local_host, remote_host = given_args.local_host, given_args.remote_host local_port, remote_port = given_args.local_port, given_args.remote_port print ("Starting port forwarding local %s:%s => remote %s:%s" % (local_host, local_port, remote_host, remote_port)) PortForwarder(local_host, local_port, remote_host, remote_port) asyncore.loop() If you run this script, it will show the following output: $ python 3_1_port_forwarding.py --local-port=8800 Starting port forwarding local localhost:8800 => remote www.google.com:80 Now, open your browser and visit http://localhost:8800. This will take you to the Google home page and the script will print something similar to the following command: ('Connected to:', ('127.0.0.1', 37236)) The following screenshot shows the forwarding a local port to a remote host: How it works... We created a port forwarding class, PortForwarder subclassed, from asyncore.dispatcher, which wraps around the socket object. It provides a few additional helpful functions when certain events occur, for example, when the connection is successful or a client is connected to a server socket. You have the choice of overriding the set of methods defined in this class. In our case, we only override the handle_accept() method. 
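The event-driven callback pattern that asyncore.dispatcher provides is easier to see in isolation. The following is a minimal, self-contained echo server sketch, separate from the recipe, that overrides only handle_accept() and handle_read(); the port number 9090 is an arbitrary choice, and note that asyncore is deprecated in recent Python 3 releases and removed in Python 3.12:
import asyncore
import socket

class EchoHandler(asyncore.dispatcher_with_send):
    # Called by the asyncore loop whenever data arrives on the accepted connection
    def handle_read(self):
        data = self.recv(1024)
        if data:
            self.send(data)

class EchoServer(asyncore.dispatcher):
    def __init__(self, host, port):
        asyncore.dispatcher.__init__(self)
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.set_reuse_addr()
        self.bind((host, port))
        self.listen(5)

    # Called by the asyncore loop when a client connects to the listening socket
    def handle_accept(self):
        pair = self.accept()
        if pair is not None:
            conn, addr = pair
            print("Connected to:", addr)
            EchoHandler(conn)

if __name__ == '__main__':
    EchoServer('localhost', 9090)
    asyncore.loop()
Every method here simply reacts to a socket event detected by asyncore.loop(), which is the same mechanism the PortForwarder, Receiver, and Sender classes in the recipe rely on.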
Two other classes have been derived from asyncore.dispatcher. The receiver class handles the incoming client requests and the sender class takes this receiver instance and processes the sent data to the clients. As you can see, these two classes override the handle_read(), handle_write(), and writeable() methods to facilitate the bi-directional communication between the remote host and local client. In summary, the PortForwarder class takes the incoming client request in a local socket and passes this to the sender class instance, which in turn uses the receiver class instance to initiate a bi-directional communication with a remote server in the specified port. Pinging hosts on the network with ICMP An ICMP ping is the most common type of network scanning you have ever encountered. It is very easy to open a command-line prompt or terminal and type ping www.google.com. How difficult is that from inside a Python program? This recipe shows you an example of a Python ping. Getting ready You need the superuser or administrator privilege to run this recipe on your machine. How to do it... You can lazily write a Python script that calls the system ping command-line tool, as follows: import subprocess import shlex command_line = "ping -c 1 www.google.com" args = shlex.split(command_line) try: subprocess.check_call(args,stdout=subprocess.PIPE, stderr=subprocess.PIPE) print ("Google web server is up!") except subprocess.CalledProcessError: print ("Failed to get ping.") However, in many circumstances, the system's ping executable may not be available or may be inaccessible. In this case, we need a pure Python script to do that ping. Note that this script needs to be run as a superuser or administrator. Listing 3.2 shows the ICMP ping, as follows: #!/usr/bin/env python # This program is optimized for Python 3.5.2. # Instructions to make it run with Python 2.7.x is as follows. # It may run on any other version with/without modifications. import os import argparse import socket import struct import select import time ICMP_ECHO_REQUEST = 8 # Platform specific DEFAULT_TIMEOUT = 2 DEFAULT_COUNT = 4 class Pinger(object): """ Pings to a host -- the Pythonic way""" def __init__(self, target_host, count=DEFAULT_COUNT, timeout=DEFAULT_TIMEOUT): self.target_host = target_host self.count = count self.timeout = timeout def do_checksum(self, source_string): """ Verify the packet integritity """ sum = 0 max_count = (len(source_string)/2)*2 count = 0 while count < max_count: # To make this program run with Python 2.7.x: # val = ord(source_string[count + 1])*256 + ord(source_string[count]) # ### uncomment the preceding line, and comment out the following line. val = source_string[count + 1]*256 + source_string[count] # In Python 3, indexing a bytes object returns an integer. # Hence, ord() is redundant. sum = sum + val sum = sum & 0xffffffff count = count + 2 if max_count<len(source_string): sum = sum + ord(source_string[len(source_string) - 1]) sum = sum & 0xffffffff sum = (sum >> 16) + (sum & 0xffff) sum = sum + (sum >> 16) answer = ~sum answer = answer & 0xffff answer = answer >> 8 | (answer << 8 & 0xff00) return answer def receive_pong(self, sock, ID, timeout): """ Receive ping from the socket. 
""" time_remaining = timeout while True: start_time = time.time() readable = select.select([sock], [], [], time_remaining) time_spent = (time.time() - start_time) if readable[0] == []: # Timeout return time_received = time.time() recv_packet, addr = sock.recvfrom(1024) icmp_header = recv_packet[20:28] type, code, checksum, packet_ID, sequence = struct.unpack( "bbHHh", icmp_header ) if packet_ID == ID: bytes_In_double = struct.calcsize("d") time_sent = struct.unpack("d", recv_packet[28:28 + bytes_In_double])[0] return time_received - time_sent time_remaining = time_remaining - time_spent if time_remaining <= 0: return We need a send_ping() method that will send the data of a ping request to the target host. Also, this will call the do_checksum() method for checking the integrity of the ping data, as follows: def send_ping(self, sock, ID): """ Send ping to the target host """ target_addr = socket.gethostbyname(self.target_host) my_checksum = 0 # Create a dummy heder with a 0 checksum. header = struct.pack("bbHHh", ICMP_ECHO_REQUEST, 0, my_checksum, ID, 1) bytes_In_double = struct.calcsize("d") data = (192 - bytes_In_double) * "Q" data = struct.pack("d", time.time()) + bytes(data.encode('utf-8')) # Get the checksum on the data and the dummy header. my_checksum = self.do_checksum(header + data) header = struct.pack( "bbHHh", ICMP_ECHO_REQUEST, 0, socket.htons(my_checksum), ID, 1 ) packet = header + data sock.sendto(packet, (target_addr, 1)) Let us define another method called ping_once() that makes a single ping call to the target host. It creates a raw ICMP socket by passing the ICMP protocol to socket(). The exception handling code takes care if the script is not run by a superuser or if any other socket error occurs. Let's take a look at the following code: def ping_once(self): """ Returns the delay (in seconds) or none on timeout. """ icmp = socket.getprotobyname("icmp") try: sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, icmp) except socket.error as e: if e.errno == 1: # Not superuser, so operation not permitted e.msg += "ICMP messages can only be sent from root user processes" raise socket.error(e.msg) except Exception as e: print ("Exception: %s" %(e)) my_ID = os.getpid() & 0xFFFF self.send_ping(sock, my_ID) delay = self.receive_pong(sock, my_ID, self.timeout) sock.close() return delay The main executive method of this class is ping(). It runs a for loop inside which the ping_once() method is called count times and receives a delay in the ping response in seconds. If no delay is returned, that means the ping has failed. Let's take a look at the following code: def ping(self): """ Run the ping process """ for i in range(self.count): print ("Ping to %s..." % self.target_host,) try: delay = self.ping_once() except socket.gaierror as e: print ("Ping failed. (socket error: '%s')" % e[1]) break if delay == None: print ("Ping failed. (timeout within %ssec.)" % self.timeout) else: delay = delay * 1000 print ("Get pong in %0.4fms" % delay) if __name__ == '__main__': parser = argparse.ArgumentParser(description='Python ping') parser.add_argument('--target-host', action="store", dest="target_host", required=True) given_args = parser.parse_args() target_host = given_args.target_host pinger = Pinger(target_host=target_host) pinger.ping() This script shows the following output. This has been run with the superuser privilege: $ sudo python 3_2_ping_remote_host.py --target-host=www.google.com Ping to www.google.com... Get pong in 27.0808ms Ping to www.google.com... 
Get pong in 17.3445ms Ping to www.google.com... Get pong in 33.3586ms Ping to www.google.com... Get pong in 32.3212ms How it works... A Pinger class has been constructed to define a few useful methods. The class initializes with a few user-defined or default inputs, which are as follows: target_host: This is the target host to ping count: This is how many times to do the ping timeout: This is the value that determines when to end an unfinished ping operation The send_ping() method gets the DNS hostname of the target host and creates an ICMP_ECHO_REQUEST packet using the struct module. It is necessary to check the data integrity of the method using the do_checksum() method. It takes the source string and manipulates it to produce a proper checksum. On the receiving end, the receive_pong() method waits for a response until the timeout occurs or receives the response. It captures the ICMP response header and then compares the packet ID and calculates the delay in the request and response cycle. Waiting for a remote network service Sometimes, during the recovery of a network service, it might be useful to run a script to check when the server is online again. How to do it... We can write a client that will wait for a particular network service forever or for a timeout. In this example, by default, we would like to check when a web server is up in localhost. If you specified some other remote host or port, that information will be used instead. Listing 3.3 shows waiting for a remote network service, as follows: #!/usr/bin/env python # This program is optimized for Python 2.7.12 and Python 3.5.2. # It may run on any other version with/without modifications. import argparse import socket import errno from time import time as now DEFAULT_TIMEOUT = 120 DEFAULT_SERVER_HOST = 'localhost' DEFAULT_SERVER_PORT = 80 class NetServiceChecker(object): """ Wait for a network service to come online""" def __init__(self, host, port, timeout=DEFAULT_TIMEOUT): self.host = host self.port = port self.timeout = timeout self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) def end_wait(self): self.sock.close() def check(self): """ Check the service """ if self.timeout: end_time = now() + self.timeout while True: try: if self.timeout: next_timeout = end_time - now() if next_timeout < 0: return False else: print ("setting socket next timeout %ss" %round(next_timeout)) self.sock.settimeout(next_timeout) self.sock.connect((self.host, self.port)) # handle exceptions except socket.timeout as err: if self.timeout: return False except socket.error as err: print ("Exception: %s" %err) else: # if all goes well self.end_wait() return True if __name__ == '__main__': parser = argparse.ArgumentParser(description='Wait for Network Service') parser.add_argument('--host', action="store", dest="host", default=DEFAULT_SERVER_HOST) parser.add_argument('--port', action="store", dest="port", type=int, default=DEFAULT_SERVER_PORT) parser.add_argument('--timeout', action="store", dest="timeout", type=int, default=DEFAULT_TIMEOUT) given_args = parser.parse_args() host, port, timeout = given_args.host, given_args.port, given_args.timeout service_checker = NetServiceChecker(host, port, timeout=timeout) print ("Checking for network service %s:%s ..." %(host, port)) if service_checker.check(): print ("Service is available again!") If a web server is running on your machine, this script will show the following output: $ python 3_3_wait_for_remote_service.py Waiting for network service localhost:80 ... 
setting socket next timeout 120.0s Service is available again! If you do not have a web server already running in your computer, make sure to install one such as Apache 2 Web Server: $ sudo apt install apache2 Now, stop the Apache process: $ sudo /etc/init.d/apache2 stop It will print the following message while stopping the service. [ ok ] Stopping apache2 (via systemctl): apache2.service. Run this script, and start Apache again. $ sudo /etc/init.d/apache2 start[ ok ] Starting apache2 (via systemctl): apache2.service. The output pattern will be different. On my machine, the following output pattern was found: Exception: [Errno 103] Software caused connection abort setting socket next timeout 119.0s Exception: [Errno 111] Connection refused setting socket next timeout 119.0s Exception: [Errno 103] Software caused connection abort setting socket next timeout 119.0s Exception: [Errno 111] Connection refused setting socket next timeout 119.0s And finally when Apache2 is up again, the following log is printed: Service is available again! The following screenshot shows the waiting for an active Apache web server process: How it works... The preceding script uses the argparse module to take the user input and process the hostname, port, and timeout, that is how long our script will wait for the desired network service. It launches an instance of the NetServiceChecker class and calls the check() method. This method calculates the final end time of waiting and uses the socket's settimeout() method to control each round's end time, that is next_timeout. It then uses the socket's connect() method to test if the desired network service is available until the socket timeout occurs. This method also catches the socket timeout error and checks the socket timeout against the timeout values given by the user. Enumerating interfaces on your machine If you need to list the network interfaces present on your machine, it is not very complicated in Python. There are a couple of third-party libraries out there that can do this job in a few lines. However, let's see how this is done using a pure socket call. Getting ready You need to run this recipe on a Linux box. To get the list of available interfaces, you can execute the following command: $ /sbin/ifconfig How to do it... Listing 3.4 shows how to list the networking interfaces, as follows: #!/usr/bin/env python # Python Network Programming Cookbook, Second Edition -- Article - 3 # This program is optimized for Python 2.7.12 and Python 3.5.2. # It may run on any other version with/without modifications. 
import sys import socket import fcntl import struct import array SIOCGIFCONF = 0x8912 #from C library sockios.h STUCT_SIZE_32 = 32 STUCT_SIZE_64 = 40 PLATFORM_32_MAX_NUMBER = 2**32 DEFAULT_INTERFACES = 8 def list_interfaces(): interfaces = [] max_interfaces = DEFAULT_INTERFACES is_64bits = sys.maxsize > PLATFORM_32_MAX_NUMBER struct_size = STUCT_SIZE_64 if is_64bits else STUCT_SIZE_32 sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) while True: bytes = max_interfaces * struct_size interface_names = array.array('B', b' ' * bytes) sock_info = fcntl.ioctl( sock.fileno(), SIOCGIFCONF, struct.pack('iL', bytes, interface_names.buffer_info()[0]) ) outbytes = struct.unpack('iL', sock_info)[0] if outbytes == bytes: max_interfaces *= 2 else: break namestr = interface_names.tostring() for i in range(0, outbytes, struct_size): interfaces.append((namestr[i:i+16].split(b' ', 1)[0]).decode('ascii', 'ignore')) return interfaces if __name__ == '__main__': interfaces = list_interfaces() print ("This machine has %s network interfaces: %s." %(len(interfaces), interfaces)) The preceding script will list the network interfaces, as shown in the following output: $ python 3_4_list_network_interfaces.py This machine has 2 network interfaces: ['lo', 'wlo1']. How it works... This recipe code uses a low-level socket feature to find out the interfaces present on the system. The single list_interfaces()method creates a socket object and finds the network interface information from manipulating this object. It does so by making a call to the fnctl module's ioctl() method. The fnctl module interfaces with some Unix routines, for example, fnctl(). This interface performs an I/O control operation on the underlying file descriptor socket, which is obtained by calling the fileno() method of the socket object. The additional parameter of the ioctl() method includes the SIOCGIFADDR constant defined in the C socket library and a data structure produced by the struct module's pack() function. The memory address specified by a data structure is modified as a result of the ioctl() call. In this case, the interface_names variable holds this information. After unpacking the sock_info return value of the ioctl() call, the number of network interfaces is increased twice if the size of the data suggests it. This is done in a while loop to discover all interfaces if our initial interface count assumption is not correct. The names of interfaces are extracted from the string format of the interface_names variable. It reads specific fields of that variable and appends the values in the interfaces' list. At the end of the list_interfaces() function, this is returned. Finding the IP address for a specific interface on your machine Finding the IP address of a particular network interface may be needed from your Python network application. Getting ready This recipe is prepared exclusively for a Linux box. There are some Python modules specially designed to bring similar functionalities on Windows and Mac platforms. For example, see http://sourceforge.net/projects/pywin32/ for Windows-specific implementation. How to do it... You can use the fnctl module to query the IP address on your machine. Listing 3.5 shows us how to find the IP address for a specific interface on your machine, as follows: #!/usr/bin/env python # This program is optimized for Python 2.7.12 and Python 3.5.2. # It may run on any other version with/without modifications. 
import argparse import sys import socket import fcntl import struct import array def get_ip_address(ifname): s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) return socket.inet_ntoa(fcntl.ioctl( s.fileno(), 0x8915, # SIOCGIFADDR struct.pack(b'256s', bytes(ifname[:15], 'utf-8')) )[20:24]) if __name__ == '__main__': parser = argparse.ArgumentParser(description='Python networking utils') parser.add_argument('--ifname', action="store", dest="ifname", required=True) given_args = parser.parse_args() ifname = given_args.ifname print ("Interface [%s] --> IP: %s" %(ifname, get_ip_address(ifname))) The output of this script is shown in one line, as follows: $ python 3_5_get_interface_ip_address.py --ifname=lo Interface [lo] --> IP: 127.0.0.1 In the preceding execution, make sure to use an existing interface, as printed in the previous recipe. In my computer, I got the output previously for3_4_list_network_interfaces.py: This machine has 2 network interfaces: ['lo', 'wlo1']. If you use a non-existing interface, an error will be printed. For example, I do not have eth0 interface right now.So the output is, $ python3 3_5_get_interface_ip_address.py --ifname=eth0 Traceback (most recent call last): File "3_5_get_interface_ip_address.py", line 27, in <module> print ("Interface [%s] --> IP: %s" %(ifname, get_ip_address(ifname))) File "3_5_get_interface_ip_address.py", line 19, in get_ip_address struct.pack(b'256s', bytes(ifname[:15], 'utf-8')) OSError: [Errno 19] No such device How it works... This recipe is similar to the previous one. The preceding script takes a command-line argument: the name of the network interface whose IP address is to be known. The get_ip_address() function creates a socket object and calls the fnctl.ioctl() function to query on that object about IP information. Note that the socket.inet_ntoa() function converts the binary data to a human-readable string in a dotted format as we are familiar with it. Finding whether an interface is up on your machine If you have multiple network interfaces on your machine, before doing any work on a particular interface, you would like to know the status of that network interface, for example, if the interface is actually up. This makes sure that you route your command to active interfaces. Getting ready This recipe is written for a Linux machine. So, this script will not run on a Windows or Mac host. In this recipe, we use nmap, a famous network scanning tool. You can find more about nmap from its website http://nmap.org/. Install nmap in your computer. For Debian-based system, the command is: $ sudo apt-get install nmap You also need the python-nmap module to run this recipe. This can be installed by pip,  as follows: $ pip install python-nmap How to do it... We can create a socket object and get the IP address of that interface. Then, we can use any of the scanning techniques to probe the interface status. Listing 3.6 shows the detect network interface status, as follows: #!/usr/bin/env python # This program is optimized for Python 2.7.12 and Python 3.5.2. # It may run on any other version with/without modifications. 
import argparse import socket import struct import fcntl import nmap SAMPLE_PORTS = '21-23' def get_interface_status(ifname): sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) ip_address = socket.inet_ntoa(fcntl.ioctl( sock.fileno(), 0x8915, #SIOCGIFADDR, C socket library sockios.h struct.pack(b'256s', bytes(ifname[:15], 'utf-8')) )[20:24]) nm = nmap.PortScanner() nm.scan(ip_address, SAMPLE_PORTS) return nm[ip_address].state() if __name__ == '__main__': parser = argparse.ArgumentParser(description='Python networking utils') parser.add_argument('--ifname', action="store", dest="ifname", required=True) given_args = parser.parse_args() ifname = given_args.ifname print ("Interface [%s] is: %s" %(ifname, get_interface_status(ifname))) If you run this script to inquire the status of the eth0 status, it will show something similar to the following output: $ python 3_6_find_network_interface_status.py --ifname=lo Interface [lo] is: up How it works... The recipe takes the interface's name from the command line and passes it to the get_interface_status() function. This function finds the IP address of that interface by manipulating a UDP socket object. This recipe needs the nmap third-party module. We can install that PyPI using the pip install command. The nmap scanning instance, nm, has been created by calling PortScanner(). An initial scan to a local IP address gives us the status of the associated network interface. Detecting inactive machines on your network If you have been given a list of IP addresses of a few machines on your network and you are asked to write a script to find out which hosts are inactive periodically, you would want to create a network scanner type program without installing anything on the target host computers. Getting ready This recipe requires installing the Scapy library (> 2.2), which can be obtained at http://www.secdev.org/projects/scapy/files/scapy-latest.zip. At the time of writing, the default Scapy release works with Python 2, and does not support Python 3. You may download the Scapy for Python 3 from https://pypi.python.org/pypi/scapy-python3/0.20 How to do it... We can use Scapy, a mature network-analyzing, third-party library, to launch an ICMP scan. Since we would like to do it periodically, we need Python's sched module to schedule the scanning tasks. Listing 3.7 shows us how to detect inactive machines, as follows: #!/usr/bin/env python # This program is optimized for Python 2.7.12 and Python 3.5.2. # It may run on any other version with/without modifications. # Requires scapy-2.2.0 or higher for Python 2.7. # Visit: http://www.secdev.org/projects/scapy/files/scapy-latest.zip # As of now, requires a separate bundle for Python 3.x. # Download it from: https://pypi.python.org/pypi/scapy-python3/0.20 import argparse import time import sched from scapy.all import sr, srp, IP, UDP, ICMP, TCP, ARP, Ether RUN_FREQUENCY = 10 scheduler = sched.scheduler(time.time, time.sleep) def detect_inactive_hosts(scan_hosts): """ Scans the network to find scan_hosts are live or dead scan_hosts can be like 10.0.2.2-4 to cover range. See Scapy docs for specifying targets. 
""" global scheduler scheduler.enter(RUN_FREQUENCY, 1, detect_inactive_hosts, (scan_hosts, )) inactive_hosts = [] try: ans, unans = sr(IP(dst=scan_hosts)/ICMP(), retry=0, timeout=1) ans.summary(lambda r : r.sprintf("%IP.src% is alive")) for inactive in unans: print ("%s is inactive" %inactive.dst) inactive_hosts.append(inactive.dst) print ("Total %d hosts are inactive" %(len(inactive_hosts))) except KeyboardInterrupt: exit(0) if __name__ == "__main__": parser = argparse.ArgumentParser(description='Python networking utils') parser.add_argument('--scan-hosts', action="store", dest="scan_hosts", required=True) given_args = parser.parse_args() scan_hosts = given_args.scan_hosts scheduler.enter(1, 1, detect_inactive_hosts, (scan_hosts, )) scheduler.run() The output of this script will be something like the following command: $ sudo python 3_7_detect_inactive_machines.py --scan-hosts=10.0.2.2-4 Begin emission: *.Finished to send 3 packets. . Received 6 packets, got 1 answers, remaining 2 packets 10.0.2.2 is alive 10.0.2.4 is inactive 10.0.2.3 is inactive Total 2 hosts are inactive Begin emission: *.Finished to send 3 packets. Received 3 packets, got 1 answers, remaining 2 packets 10.0.2.2 is alive 10.0.2.4 is inactive 10.0.2.3 is inactive Total 2 hosts are inactive How it works... The preceding script first takes a list of network hosts, scan_hosts, from the command line. It then creates a schedule to launch the detect_inactive_hosts() function after a one-second delay. The target function takes the scan_hosts argument and calls Scapy's sr() function. This function schedules itself to rerun after every 10 seconds by calling the schedule.enter() function once again. This way, we run this scanning task periodically. Scapy's sr() scanning function takes an IP, protocol and some scan-control information. In this case, the IP() method passes scan_hosts as the destination hosts to scan, and the protocol is specified as ICMP. This can also be TCP or UDP. We do not specify a retry and one-second timeout to run this script faster. However, you can experiment with the options that suit you. The scanning sr()function returns the hosts that answer and those that don't as a tuple. We check the hosts that don't answer, build a list, and print that information. Performing a basic IPC using connected sockets (socketpair) Sometimes, two scripts need to communicate some information between themselves via two processes. In Unix/Linux, there's a concept of connected socket, of socketpair. We can experiment with this here. Getting ready This recipe is designed for a Unix/Linux host. Windows/Mac is not suitable for running this one. How to do it... We use a test_socketpair() function to wrap a few lines that test the socket's socketpair() function. List 3.8 shows an example of socketpair, as follows: #!/usr/bin/env python # This program is optimized for Python 3.5.2. # It may run on any other version with/without modifications. # To make it run on Python 2.7.x, needs some changes due to API differences. # Follow the comments inline to make the program work with Python 2. import socket import os BUFSIZE = 1024 def test_socketpair(): """ Test Unix socketpair""" parent, child = socket.socketpair() pid = os.fork() try: if pid: print ("@Parent, sending message...") child.close() parent.sendall(bytes("Hello from parent!", 'utf-8')) # Comment out the preceding line and uncomment the following line for Python 2.7. 
# parent.sendall("Hello from parent!") response = parent.recv(BUFSIZE) print ("Response from child:", response) parent.close() else: print ("@Child, waiting for message from parent") parent.close() message = child.recv(BUFSIZE) print ("Message from parent:", message) child.sendall(bytes("Hello from child!!", 'utf-8')) # Comment out the preceding line and uncomment the following line for Python 2.7. # child.sendall("Hello from child!!") child.close() except Exception as err: print ("Error: %s" %err) if __name__ == '__main__': test_socketpair() The output from the preceding script is as follows: $ python 3_8_ipc_using_socketpairs.py @Parent, sending message... @Child, waiting for message from parent Message from parent: b'Hello from parent!' Response from child: b'Hello from child!!' How it works... The socket.socketpair() function simply returns two connected socket objects. In our case, we can say that one is a parent and another is a child. We fork another process via a os.fork() call. This returns the process ID of the parent. In each process, the other process' socket is closed first and then a message is exchanged via a sendall() method call on the process's socket. The try-except block prints any error in case of any kind of exception. Performing IPC using Unix domain sockets Unix domain sockets (UDS) are sometimes used as a convenient way to communicate between two processes. As in Unix, everything is conceptually a file. If you need an example of such an IPC action, this can be useful. How to do it... We launch a UDS server that binds to a filesystem path, and a UDS client uses the same path to communicate with the server. Listing 3.9a shows a Unix domain socket server, as follows: #!/usr/bin/env python # This program is optimized for Python 2.7.12 and Python 3.5.2. # It may run on any other version with/without modifications. import socket import os import time SERVER_PATH = "/tmp/python_unix_socket_server" def run_unix_domain_socket_server(): if os.path.exists(SERVER_PATH): os.remove( SERVER_PATH ) print ("starting unix domain socket server.") server = socket.socket( socket.AF_UNIX, socket.SOCK_DGRAM ) server.bind(SERVER_PATH) print ("Listening on path: %s" %SERVER_PATH) while True: datagram = server.recv( 1024 ) if not datagram: break else: print ("-" * 20) print (datagram) if "DONE" == datagram: break print ("-" * 20) print ("Server is shutting down now...") server.close() os.remove(SERVER_PATH) print ("Server shutdown and path removed.") if __name__ == '__main__': run_unix_domain_socket_server() Listing 3.9b shows a UDS client, as follows: #!/usr/bin/env python # Python Network Programming Cookbook, Second Edition -- Article - 3 # This program is optimized for Python 3.5.2. # It may run on any other version with/without modifications. # To make it run on Python 2.7.x, needs some changes due to API differences. # Follow the comments inline to make the program work with Python 2. import socket import sys SERVER_PATH = "/tmp/python_unix_socket_server" def run_unix_domain_socket_client(): """ Run "a Unix domain socket client """ sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) # Connect the socket to the path where the server is listening server_address = SERVER_PATH print ("connecting to %s" % server_address) try: sock.connect(server_address) except socket.error as msg: print (msg) sys.exit(1) try: message = "This is the message. This will be echoed back!" 
print ("Sending [%s]" %message) sock.sendall(bytes(message, 'utf-8')) # Comment out the preceding line and uncomment the bfollowing line for Python 2.7. # sock.sendall(message) amount_received = 0 amount_expected = len(message) while amount_received < amount_expected: data = sock.recv(16) amount_received += len(data) print ("Received [%s]" % data) finally: print ("Closing client") sock.close() if __name__ == '__main__': run_unix_domain_socket_client() The server output is as follows: $ python 3_9a_unix_domain_socket_server.py starting unix domain socket server. Listening on path: /tmp/python_unix_socket_server -------------------- This is the message. This will be echoed back! The client output is as follows: $ python 3_9b_unix_domain_socket_client.py connecting to /tmp/python_unix_socket_server Sending [This is the message. This will be echoed back!] How it works... A common path is defined for a UDS client/server to interact. Both the client and server use the same path to connect and listen to. In a server code, we remove the path if it exists from the previous run of this script. It then creates a Unix datagram socket and binds it to the specified path. It then listens for incoming connections. In the data processing loop, it uses the recv() method to get data from the client and prints that information on screen. The client-side code simply opens a Unix datagram socket and connects to the shared server address. It sends a message to the server using sendall(). It then waits for the message to be echoed back to itself and prints that message. Finding out if your Python supports IPv6 sockets IP version 6 or IPv6 is increasingly adopted by the industry to build newer applications. In case you would like to write an IPv6 application, the first thing you'd like to know is if your machine supports IPv6. This can be done from the Linux/Unix command line, as follows: $ cat /proc/net/if_inet6 00000000000000000000000000000001 01 80 10 80 lo fe80000000000000642a57c2e51932a2 03 40 20 80 wlo1 From your Python script, you can also check if the IPv6 support is present on your machine, and Python is installed with that support. Getting ready For this recipe, use pip to install a Python third-party library, netifaces, as follows: $ pip install netifaces How to do it... We can use a third-party library, netifaces, to find out if there is IPv6 support on your machine. We can call the interfaces() function from this library to list all interfaces present in the system. Listing 3.10 shows the Python IPv6 support checker, as follows: #!/usr/bin/env python # This program is optimized for Python 2.7.12 and Python 3.5.2. # It may run on any other version with/without modifications. 
# This program depends on Python module netifaces => 0.8 import socket import argparse import netifaces as ni def inspect_ipv6_support(): """ Find the ipv6 address""" print ("IPV6 support built into Python: %s" %socket.has_ipv6) ipv6_addr = {} for interface in ni.interfaces(): all_addresses = ni.ifaddresses(interface) print ("Interface %s:" %interface) for family,addrs in all_addresses.items(): fam_name = ni.address_families[family] print (' Address family: %s' % fam_name) for addr in addrs: if fam_name == 'AF_INET6': ipv6_addr[interface] = addr['addr'] print (' Address : %s' % addr['addr']) nmask = addr.get('netmask', None) if nmask: print (' Netmask : %s' % nmask) bcast = addr.get('broadcast', None) if bcast: print (' Broadcast: %s' % bcast) if ipv6_addr: print ("Found IPv6 address: %s" %ipv6_addr) else: print ("No IPv6 interface found!") if __name__ == '__main__': inspect_ipv6_support() The output from this script will be as follows: $ python 3_10_check_ipv6_support.py IPV6 support built into Python: True Interface lo: Address family: AF_PACKET Address : 00:00:00:00:00:00 Address family: AF_INET Address : 127.0.0.1 Netmask : 255.0.0.0 Address family: AF_INET6 Address : ::1 Netmask : ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff/128 Interface enp2s0: Address family: AF_PACKET Address : 9c:5c:8e:26:a2:48 Broadcast: ff:ff:ff:ff:ff:ff Address family: AF_INET Address : 130.104.228.90 Netmask : 255.255.255.128 Broadcast: 130.104.228.127 Address family: AF_INET6 Address : 2001:6a8:308f:2:88bc:e3ec:ace4:3afb Netmask : ffff:ffff:ffff:ffff::/64 Address : 2001:6a8:308f:2:5bef:e3e6:82f8:8cca Netmask : ffff:ffff:ffff:ffff::/64 Address : fe80::66a0:7a3f:f8e9:8c03%enp2s0 Netmask : ffff:ffff:ffff:ffff::/64 Interface wlp1s0: Address family: AF_PACKET Address : c8:ff:28:90:17:d1 Broadcast: ff:ff:ff:ff:ff:ff Found IPv6 address: {'lo': '::1', 'enp2s0': 'fe80::66a0:7a3f:f8e9:8c03%enp2s0'} How it works... The IPv6 support checker function, inspect_ipv6_support(), first checks if Python is built with IPv6 using socket.has_ipv6. Next, we call the interfaces() function from the netifaces module. This gives us the list of all interfaces. If we call the ifaddresses() method by passing a network interface to it, we can get all the IP addresses of this interface. We then extract various IP-related information, such as protocol family, address, netmask, and broadcast address. Then, the address of a network interface has been added to the IPv6_address dictionary if its protocol family matches AF_INET6. Extracting an IPv6 prefix from an IPv6 address In your IPv6 application, you need to dig out the IPv6 address for getting the prefix information. Note that the upper 64-bits of an IPv6 address are represented from a global routing prefix plus a subnet ID, as defined in RFC 3513. A general prefix (for example, /48) holds a short prefix based on which a number of longer, more specific prefixes (for example, /64) can be defined. A Python script can be very helpful in generating the prefix information. How to do it... We can use the netifaces and netaddr third-party libraries to find out the IPv6 prefix information for a given IPv6 address. Make sure to have netifaces and netaddr installed in your system. $ pip install netaddr The program is as follows: #!/usr/bin/env python # This program is optimized for Python 2.7.12 and Python 3.5.2. # It may run on any other version with/without modifications. # This program depends on Python modules netifaces and netaddr. 
import socket import netifaces as ni import netaddr as na def extract_ipv6_info(): """ Extracts IPv6 information""" print ("IPv6 support built into Python: %s" %socket.has_ipv6) for interface in ni.interfaces(): all_addresses = ni.ifaddresses(interface) print ("Interface %s:" %interface) for family,addrs in all_addresses.items(): fam_name = ni.address_families[family] for addr in addrs: if fam_name == 'AF_INET6': addr = addr['addr'] has_eth_string = addr.split("%eth") if has_eth_string: addr = addr.split("%eth")[0] try: print (" IP Address: %s" %na.IPNetwork(addr)) print (" IP Version: %s" %na.IPNetwork(addr).version) print (" IP Prefix length: %s" %na.IPNetwork(addr).prefixlen) print (" Network: %s" %na.IPNetwork(addr).network) print (" Broadcast: %s" %na.IPNetwork(addr).broadcast) except Exception as e: print ("Skip Non-IPv6 Interface") if __name__ == '__main__': extract_ipv6_info() The output from this script is as follows: $ python 3_11_extract_ipv6_prefix.py IPv6 support built into Python: True Interface lo: IP Address: ::1/128 IP Version: 6 IP Prefix length: 128 Network: ::1 Broadcast: ::1 Interface enp2s0: IP Address: 2001:6a8:308f:2:88bc:e3ec:ace4:3afb/128 IP Version: 6 IP Prefix length: 128 Network: 2001:6a8:308f:2:88bc:e3ec:ace4:3afb Broadcast: 2001:6a8:308f:2:88bc:e3ec:ace4:3afb IP Address: 2001:6a8:308f:2:5bef:e3e6:82f8:8cca/128 IP Version: 6 IP Prefix length: 128 Network: 2001:6a8:308f:2:5bef:e3e6:82f8:8cca Broadcast: 2001:6a8:308f:2:5bef:e3e6:82f8:8cca Skip Non-IPv6 Interface Interface wlp1s0: How it works... Python's netifaces module gives us the network interface IPv6 address. It uses the interfaces() and ifaddresses() functions for doing this. The netaddr module is particularly helpful to manipulate a network address. It has a IPNetwork() class that provides us with an address, IPv4 or IPv6, and computes the prefix, network, and broadcast addresses. Here, we find this information class instance's version, prefixlen, and network and broadcast attributes. Writing an IPv6 echo client/server You need to write an IPv6 compliant server or client and wonder what could be the differences between an IPv6 compliant server or client and its IPv4 counterpart. How to do it... We use the same approach as writing an echo client/server using IPv6. The only major difference is how the socket is created using IPv6 information. Listing 12a shows an IPv6 echo server, as follows: #!/usr/bin/env python # This program is optimized for Python 2.7.12 and Python 3.5.2. # It may run on any other version with/without modifications. 
import argparse import socket import sys HOST = 'localhost' def echo_server(port, host=HOST): """Echo server using IPv6 """ for result in socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM, 0, socket.AI_PASSIVE): af, socktype, proto, canonname, sa = result try: sock = socket.socket(af, socktype, proto) except socket.error as err: print ("Error: %s" %err) try: sock.bind(sa) sock.listen(1) print ("Server lisenting on %s:%s" %(host, port)) except socket.error as msg: sock.close() continue break sys.exit(1) conn, addr = sock.accept() print ('Connected to', addr) while True: data = conn.recv(1024) print ("Received data from the client: [%s]" %data) if not data: break conn.send(data) print ("Sent data echoed back to the client: [%s]" %data) conn.close() if __name__ == '__main__': parser = argparse.ArgumentParser(description='IPv6 Socket Server Example') parser.add_argument('--port', action="store", dest="port", type=int, required=True) given_args = parser.parse_args() port = given_args.port echo_server(port) Listing 12b shows an IPv6 echo client, as follows: #!/usr/bin/env python # Python Network Programming Cookbook, Second Edition -- Article - 3 # This program is optimized for Python 2.7.12 and Python 3.5.2. # It may run on any other version with/without modifications. import argparse import socket import sys HOST = 'localhost' BUFSIZE = 1024 def ipv6_echo_client(port, host=HOST): for res in socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM): af, socktype, proto, canonname, sa = res try: sock = socket.socket(af, socktype, proto) except socket.error as err: print ("Error:%s" %err) try: sock.connect(sa) except socket.error as msg: sock.close() continue if sock is None: print ('Failed to open socket!') sys.exit(1) msg = "Hello from ipv6 client" print ("Send data to server: %s" %msg) sock.send(bytes(msg.encode('utf-8'))) while True: data = sock.recv(BUFSIZE) print ('Received from server', repr(data)) if not data: break sock.close() if __name__ == '__main__': parser = argparse.ArgumentParser(description='IPv6 socket client example') parser.add_argument('--port', action="store", dest="port", type=int, required=True) given_args = parser.parse_args() port = given_args.port ipv6_echo_client(port) The server output is as follows: $ python 3_12a_ipv6_echo_server.py --port=8800 Server lisenting on localhost:8800 ('Connected to', ('127.0.0.1', 56958)) Received data from the client: [Hello from ipv6 client] Sent data echoed back to the client: [Hello from ipv6 client] The client output is as follows: $ python 3_12b_ipv6_echo_client.py --port=8800 Send data to server: Hello from ipv6 client ('Received from server', "'Hello from ipv6 client'") The following screenshot indicates the server and client output: How it works... The IPv6 echo server first determines its IPv6 information by calling socket.getaddrinfo(). Notice that we passed the AF_UNSPEC protocol for creating a TCP socket. The resulting information is a tuple of five values. We use three of them, address family, socket type, and protocol, to create a server socket. Then, this socket is bound with the socket address from the previous tuple. It then listens to the incoming connections and accepts them. After a connection is made, it receives data from the client and echoes it back. On the client-side code, we create an IPv6-compliant client socket instance and send the data using the send() method of that instance. When the data is echoed back, the recv() method is used to get it back. 
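The tuple returned by getaddrinfo() is worth inspecting on its own. The following short sketch is not one of the book's numbered listings; it simply prints each candidate address that the echo server above would iterate over. On a typical dual-stack Linux host you can expect to see one IPv6 entry (::1) and one IPv4 entry (127.0.0.1), although the exact list depends on your /etc/hosts configuration:
import socket

for res in socket.getaddrinfo('localhost', 8800, socket.AF_UNSPEC,
                              socket.SOCK_STREAM, 0, socket.AI_PASSIVE):
    family, socktype, proto, canonname, sockaddr = res
    # family is an address family such as AF_INET6 or AF_INET, and
    # sockaddr is the tuple that bind() or connect() expects.
    print(family, socktype, proto, sockaddr)
The first entry for which the socket can be created and bound is the one the server ends up listening on, which is why the same code runs unchanged on IPv4-only and IPv6-enabled machines.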
Summary In this article, the author has presented recipes covering the various IPv6 utilities in Python, including an IPv6 echo client/server. Other techniques, such as ICMP scans for inactive hosts, interface enumeration, and IPC over connected and Unix domain sockets, are also covered thoroughly. Scapy is introduced as well, to give a better understanding of why it is so popular among Python network programmers. Resources for Article: Further resources on this subject: Introduction to Network Security [article] Getting Started with Cisco UCS and Virtual Networking [article] Revisiting Linux Network Basics [article]

API Gateway and its Need

Packt
21 Feb 2018
9 min read
 In this article by Umesh R Sharma, author of the book Practical Microservices, we will cover API Gateway and its need with simple and short examples. (For more resources related to this topic, see here.) Dynamic websites show a lot on a single page, and there is a lot of information that needs to be shown on the page. The common success order summary page shows the cart detail and customer address. For this, frontend has to fire a different query to the customer detail service and order detail service. This is a very simple example of having multiple services on a single page. As a single microservice has to deal with only one concern, in result of that to show much information on page, there are many API calls on the same page. So, a website or mobile page can be very chatty in terms of displaying data on the same page. Another problem is that, sometimes, microservice talks on another protocol, then HTTP only, such as thrift call and so on. Outer consumers can't directly deal with microservice in that protocol. As a mobile screen is smaller than a web page, the result of the data required by the mobile or desktop API call is different. A developer would want to give less data to the mobile API or have different versions of the API calls for mobile and desktop. So, you could face a problem such as this: each client is calling different web services and keeping track of their web service and developers have to give backward compatibility because API URLs are embedded in clients like in mobile app. Why do we need the API Gateway? All these preceding problems can be addressed with the API Gateway in place. The API Gateway acts as a proxy between the API consumer and the API servers. To address the first problem in that scenario, there will only be one call, such as /successOrderSummary, to the API Gateway. The API Gateway, on behalf of the consumer, calls the order and user detail, then combines the result and serves to the client. So basically, it acts as a facade or API call, which may internally call many APIs. The API Gateway solves many purposes, some of which are as follows. Authentication API Gateways can take the overhead of authenticating an API call from outside. After that, all the internal calls remove security check. If the request comes from inside the VPC, it can remove the check of security, decrease the network latency a bit, and make the developer focus more on business logic than concerning about security. Different protocol Sometimes, microservice can internally use different protocols to talk to each other; it can be thrift call, TCP, UDP, RMI, SOAP, and so on. For clients, there can be only one rest-based HTTP call. Clients hit the API Gateway with the HTTP protocol and the API Gateway can make the internal call in required protocol and combine the results in the end from all web service. It can respond to the client in required protocol; in most of the cases, that protocol will be HTTP. Load-balancing The API Gateway can work as a load balancer to handle requests in the most efficient manner. It can keep a track of the request load it has sent to different nodes of a particular service. Gateway should be intelligent enough to load balances between different nodes of a particular service. With NGINX Plus coming into the picture, NGINX can be a good candidate for the API Gateway. It has many of the features to address the problem that is usually handled by the API Gateway. 
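Before moving on to the remaining gateway responsibilities, here is a deliberately simplified sketch of the aggregation facade described at the beginning of this article. It is not production code, and the internal service URLs and the combined payload are purely hypothetical; a real gateway such as Zuul or NGINX Plus layers routing, security, and resilience on top of this idea. The point is only that one external call fans out into several internal calls whose results are merged:
import json
import concurrent.futures
import urllib.request

# Hypothetical internal endpoints sitting behind the gateway.
ORDER_SERVICE = 'http://orders.internal/order/42'
CUSTOMER_SERVICE = 'http://customers.internal/customer/7'

def fetch_json(url):
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.loads(resp.read().decode('utf-8'))

def success_order_summary():
    # One client request triggers two internal calls made in parallel.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        order_future = pool.submit(fetch_json, ORDER_SERVICE)
        customer_future = pool.submit(fetch_json, CUSTOMER_SERVICE)
        # Combine both partial results into the single payload the client sees.
        return {'order': order_future.result(), 'customer': customer_future.result()}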
Request dispatching (including service discovery)
One main feature of the gateway is to reduce the amount of communication between the client and the microservices. When the client needs data from several services, the gateway initiates the calls to those microservices in parallel. From the client side, there is only one hit: the gateway calls all the required services, waits for their results, combines the responses, and sends the result back to the client. Reactive microservice designs can help you achieve this. Working with service discovery adds many extra capabilities. It can tell which node of a service is the master and which are the slaves. The same goes for the database, where a write request can be routed to the master and a read request to a slave. This is the basic rule, but users can apply many more rules on the basis of the meta information provided by the API Gateway. The gateway can also record the response time of each node of a service instance, so that higher-priority API calls are routed to the fastest responding node. Again, such rules depend on the API Gateway you are using and how it is implemented.
Response transformation
Being the first and single point of entry for all API calls, the API Gateway knows which type of client is calling: a mobile client, a web client, or another external consumer. It can therefore make the internal calls and shape the data it returns to each kind of client, as per their needs and configuration.
Circuit breaker
To handle partial failure, the API Gateway uses a technique called the circuit breaker pattern. A failure in one service can cause a cascading failure through all the service calls in the stack. The API Gateway can watch a failure threshold for each microservice; if a service passes that threshold, the gateway marks that API as an open circuit and decides not to call it for a configured time. Hystrix (by Netflix) serves this purpose efficiently; its default is to open the circuit after 20 failed requests in 5 seconds. Developers can also define a fallback for an open circuit, which can be a dummy service. Once the API starts giving the expected results again, the gateway marks the circuit as closed.
Pros and cons of API Gateway
Using an API Gateway has its own pros and cons. The previous sections already described the advantages; here they are summarized as a list of pros.
Pros
Microservices can focus on business logic
Clients can get all the data in a single hit
Authentication, logging, and monitoring can be handled by the API Gateway
It gives the flexibility to use completely independent protocols between clients and microservices
It can give tailor-made results, as per the client's needs
It can handle partial failure
In addition to the preceding pros, there are also some trade-offs in using this pattern.
Cons
It can cause performance degradation because a lot happens on the API Gateway
A discovery service has to be implemented along with it
Sometimes, it becomes a single point of failure
Managing routing is an overhead of the pattern
It adds an additional network hop to each call
Overall, it increases the complexity of the system
Too much logic implemented in the gateway will lead to another dependency problem
So, before using the API Gateway, both aspects should be considered. The decision to include an API Gateway in the system increases the cost as well.
Before putting effort, cost, and management into this pattern, it is recommended to analyze how much you can gain from it.
Example of API Gateway
In this example, we will show only a sample product page that fetches its data from a product detail service. The example can be extended in many ways, but our focus here is only to show how the API Gateway pattern works, so we will keep it simple and small. The example uses Zuul from Netflix as the API Gateway. Spring also has an integration for Zuul, so we create this example with Spring Boot. For a sample API Gateway implementation, we will use http://start.spring.io/ to generate an initial template of our code. Spring Initializr is the project from Spring that helps beginners generate basic Spring Boot code. A user has to set a minimum configuration and can hit the Generate Project button. If any user wants to set more specific details regarding the project, they can see all the configuration settings by clicking on the Switch to the full version button.
Let's create a controller in the same package as the main application class and put the following code in the file:
@SpringBootApplication
@RestController
public class ProductDetailController {

    @Resource
    ProductDetailService pdService;

    @RequestMapping(value = "/product/{id}")
    public ProductDetail getAllProduct(@PathVariable("id") String id) {
        return pdService.getProductDetailById(id);
    }
}
In the preceding code, the pdService bean is assumed to interact with a Spring Data repository for product details and return the result for the requested product ID. Another assumption is that this service is running on port 10000. Just to make sure everything is running, a hit on a URL such as http://localhost:10000/product/1 should give some JSON as a response.
For the API Gateway, we will create another Spring Boot application with Zuul support. Zuul can be activated by just adding the @EnableZuulProxy annotation. The following is a simple code to start the Zuul proxy:
@SpringBootApplication
@EnableZuulProxy
public class ApiGatewayExampleInSpring {
    public static void main(String[] args) {
        SpringApplication.run(ApiGatewayExampleInSpring.class, args);
    }
}
Everything else is managed through configuration. In the application.properties file of the API Gateway, the content will be something like the following:
zuul.routes.product.path=/product/**
zuul.routes.product.url=http://localhost:10000
ribbon.eureka.enabled=false
server.port=8080
With this configuration, we are defining a rule such as this: any request for a URL such as /product/xxx is passed on to http://localhost:10000. To the outside world, the call looks like http://localhost:8080/product/1, which is internally forwarded to port 10000. If we had defined a spring.application.name property with the value product in the product detail microservice, we would not need to define these route properties here at all, as Zuul would, by default, expose that service under the /product path. The example taken here for an API Gateway is not very intelligent, but Zuul is a very capable API Gateway. Depending on the routes, filters, and caching defined in Zuul's properties, one can build a very powerful API Gateway.
Summary
In this article, you learned about the API Gateway, its need, and its pros and cons, with a code example.
Resources for Article:   Further resources on this subject: What are Microservices? [article] Microservices and Service Oriented Architecture [article] Breaking into Microservices Architecture [article]

Some Basic Concepts of Theano

Packt
21 Feb 2018
13 min read
In this article by Christopher Bourez, the author of the book Deep Learning with Theano, Theano is presented as a compute engine, together with the basics of symbolic computing with Theano. Symbolic computing consists in building graphs of operations that will be optimized later on for a specific architecture, using the computation libraries available for that architecture. (For more resources related to this topic, see here.)
Although this might sound far from practical application at first, Theano may be defined as a library for scientific computing; it has been available since 2007 and is particularly suited for deep learning. Two important features are at the core of any deep learning library: tensor operations, and the capability to run the code on CPU or GPU indifferently. These two features enable us to work with massive amounts of multi-dimensional data. Moreover, Theano proposes automatic differentiation, a very useful feature for solving a wider range of numeric optimization problems than deep learning alone. The content of the article covers the following points:
Theano install and loading
Tensors and algebra
Symbolic programming
Need for tensor
Usually, input data is represented with multi-dimensional arrays:
Images have three dimensions: the number of channels, and the width and height of the image
Sounds and time series have one dimension: the time length
Natural language sequences can be represented by two-dimensional arrays: the time length and the alphabet length or the vocabulary length
In Theano, multi-dimensional arrays are implemented with an abstraction class named tensor, with many more transformations available than for traditional arrays in a computer language like Python. At each stage of a neural net, computations such as matrix multiplications involve multiple operations on these multi-dimensional arrays. Classical arrays in programming languages do not have enough built-in functionality to handle multi-dimensional computations and manipulations well and fast. Computations on multi-dimensional arrays have a long history of optimizations, with plenty of libraries and hardware. One of the most important gains in speed has been made possible by the massively parallel architecture of the Graphics Processing Unit (GPU), with its computation ability spread over a large number of cores, from a few hundred to a few thousand. Compared to a traditional CPU, for example a quad-core, 12-core, or 32-core engine, the gain with a GPU can range from a 5x to a 100x speedup, even if part of the code is still executed on the CPU (data loading, GPU piloting, result outputting). The main bottleneck with the use of a GPU is usually the transfer of data between the memory of the CPU and the memory of the GPU; still, when well programmed, the use of a GPU brings an increase in speed of an order of magnitude. Getting results in days rather than months, or hours rather than days, is an undeniable benefit for experimentation. The Theano engine has been designed to address these two challenges, multi-dimensional arrays and architecture abstraction, from the beginning. There is another undeniable benefit of Theano for scientific computation: the automatic differentiation of functions of multi-dimensional arrays, a well-suited feature for model parameter inference via objective function minimization. Such a feature facilitates experimentation by removing the pain of computing derivatives by hand, which might not be complicated, but is prone to many errors.
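To give a first taste of what this automatic differentiation looks like in practice, here is a minimal sketch; it assumes a working Theano setup, and the installation steps are covered in the next section. The gradient of a symbolic expression is obtained with a single call to theano.grad, without writing any derivative by hand:
>>> import theano
>>> import theano.tensor as T
>>> x = T.scalar('x')
>>> y = x ** 2
>>> dy_dx = theano.grad(y, x)       # symbolic derivative, here 2 * x
>>> f = theano.function([x], dy_dx)
>>> print(f(3.0))
6.0
Theano differentiates the computation graph symbolically, which is exactly what makes objective function minimization so convenient when training models.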
Installing and loading Theano Conda package and environment manager The easiest way to install Theano is to use conda, a cross-platform package and environment manager. If conda is not already installed on your operating system, the fastest way to install conda is to download the miniconda installer from https://conda.io/miniconda.html. For example, for conda under Linux 64 bit and Python 2.7: wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh chmod +x Miniconda2-latest-Linux-x86_64.sh bash ./Miniconda2-latest-Linux-x86_64.sh   Conda enables to create new environments in which versions of Python (2 or 3) and the installed packages may differ. The conda root environment uses the same version of Python as the version installed on your system on which you installed conda. Install and run Theano on CPU Last, let’s install Theano: conda install theano Run a python session and try the following commands to check your configuration: >>> from theano import theano   >>> theano.config.device   'cpu'   >>> theano.config.floatX   'float64'   >>> print(theano.config) The last command prints all the configuration of Theano. The theano.config object contains keys to many configuration options. To infer the configuration options, Theano looks first at ~/.theanorc file, then at any environment variables available, which override the former options, last at the variable set in the code, that are first in the order of precedence: >>> theano.config.floatX='float32' Some of the properties might be read-only and cannot be changed in the code, but floatX property, that sets the default floating point precision for floats, is among properties that can be changed directly in the code. It is advised to use float32 since GPU have a long history without float64, float64 execution speed on GPU is slower, sometimes much slower (2x to 32x on latest generation Pascal hardware), and that float32 precision is enough in practice. GPU drivers and libraries Theano enables the use of GPU (graphic computation units), the units usually used to compute the graphics to display on the computer screen. To have Theano work on the GPU as well, a GPU backend library is required on your system. CUDA library (for NVIDIA GPU cards only) is the main choice for GPU computations. There exists also the OpenCL standard, which is opensource, but far less developed, and much more experimental and rudimentary on Theano. Most of the scientific computations still occur on NVIDIA cards today. If you have a NVIDIA GPU card, download CUDA from the NVIDIA website at https://developer.nvidia.com/cuda-downloads and install it. The installer will install the lastest version of the gpu drivers first if they are not already installed. It will install the CUDA library in /usr/local/cuda directory. Install the cuDNN library, a library by NVIDIA also, that offers faster implementations of some operations for the GPU To install it, I usually copy /usr/local/cuda directory to a new directory /usr/local/cuda-{CUDA_VERSION}-cudnn-{CUDNN_VERSION} so that I can choose the version of CUDA and cuDNN, depending on the deep learning technology I use, and its compatibility. 
In your .bashrc profile, add the following line to set $PATH and $LD_LIBRARY_PATH variables: export PATH=/usr/local/cuda-8.0-cudnn-5.1/bin:$PATH export LD_LIBRARY_PATH=/usr/local/cuda-8.0-cudnn-5.1/lib64::/usr/local/cuda-8.0-cudnn-5.1/lib:$LD_LIBRARY_PATH Install and run Theano on GPU N-dimensional GPU arrays have been implemented in Python under 6 different GPU library (Theano/CudaNdarray,PyCUDA/ GPUArray,CUDAMAT/ CUDAMatrix, PYOPENCL/GPUArray, Clyther, Copperhead), are a subset of NumPy.ndarray. Libgpuarray is a backend library to have them in a common interface with the same property. To install libgpuarray with conda: conda install pygpu To run Theano in GPU mode, you need to configure the config.device variable before execution since it is a read-only variable once the code is run. With the environment variable THEANO_FLAGS: THEANO_FLAGS="device=cuda,floatX=float32" python >>> import theano Using cuDNN version 5110 on context None Mapped name None to device cuda: Tesla K80 (0000:83:00.0) >>> theano.config.device 'gpu' >>> theano.config.floatX 'float32' The first return shows that GPU device has been correctly detected, and specifies which GPU it uses. By default, Theano activates CNMeM, a faster CUDA memory allocator, an initial preallocation can be specified with gpuarra.preallocate option. At the end, my launch command will be: THEANO_FLAGS="device=cuda,floatX=float32,gpuarray.preallocate=0.8" python >>> from theano import theano Using cuDNN version 5110 on context None Preallocating 9151/11439 Mb (0.800000) on cuda Mapped name None to device cuda: Tesla K80 (0000:83:00.0)   The first line confirms that cuDNN is active, the second confirms memory preallocation. The third line gives the default context name (that is None when the flag device=cuda is set) and the model of the GPU used, while the default context name for the CPU will always be cpu. It is possible to specify a different GPU than the first one, setting the device to cuda0, cuda1,... for multi-GPU computers. It is also possible to run a program on multiple GPU in parallel or in sequence (when the memory of one GPU is not sufficient), in particular when training very deep neural nets. In this case, the context flag contexts=dev0->cuda0;dev1->cuda1;dev2->cuda2;dev3->cuda3 activates multiple GPU instead of one, and designate the context name to each GPU device to be used in the code. For example, on a 4-GPU instance: THEANO_FLAGS="contexts=dev0->cuda0;dev1->cuda1;dev2->cuda2;dev3->cuda3,floatX=float32,gpuarray.preallocate=0.8" python >>> import theano Using cuDNN version 5110 on context None Preallocating 9177/11471 Mb (0.800000) on cuda0 Mapped name dev0 to device cuda0: Tesla K80 (0000:83:00.0) Using cuDNN version 5110 on context dev1 Preallocating 9177/11471 Mb (0.800000) on cuda1 Mapped name dev1 to device cuda1: Tesla K80 (0000:84:00.0) Using cuDNN version 5110 on context dev2 Preallocating 9177/11471 Mb (0.800000) on cuda2 Mapped name dev2 to device cuda2: Tesla K80 (0000:87:00.0) Using cuDNN version 5110 on context dev3 Preallocating 9177/11471 Mb (0.800000) on cuda3 Mapped name dev3 to device cuda3: Tesla K80 (0000:88:00.0)   To assign computations to a specific GPU in this multi-GPU setting, the names we choose dev0, dev1, dev2, and dev3 have been mapped to each device (cuda0, cuda1, cuda2, cuda3). This name mapping enables to write codes that are independent of the underlying GPU assignments and libraries (CUDA or other). 
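As a hint of how these context names are used later in code, the following minimal sketch, continuing the interactive session above and assuming the four contexts dev0 to dev3 are defined, pins shared variables to specific devices through the target argument:
>>> import numpy
>>> a = theano.shared(numpy.random.random((1024, 1024)).astype('float32'), target='dev0')
>>> b = theano.shared(numpy.random.random((1024, 1024)).astype('float32'), target='dev1')
Computations involving a then run on the GPU mapped to dev0 and those involving b on the GPU mapped to dev1, while the code itself never refers to the physical cuda devices.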
To keep the current configuration flags active at every Python session or execution without using environment variables, save your configuration in the ~/.theanorc file as: [global] floatX = float32 device = cuda0 [gpuarray] preallocate = 1 Now, you can simply run python command. You are now all set. Tensors In Python, some scientific libraries such as NumPy provide multi-dimensional arrays. Theano doesn't replace Numpy but works in concert with it. In particular, NumPy is used for the initialization of tensors. To perform the computation on CPU and GPU indifferently, variables are symbolic and represented by the tensor class, an abstraction, and writing numerical expressions consists in building a computation graph of Variable nodes and Apply nodes. Depending on the platform on which the computation graph will be compiled, tensors are replaced either: By a TensorType variable, which data has to be on CPU By a GpuArrayType variable, which data has to be on GPU That way, the code can be written indifferently of the platform where it will be executed. Here are a few tensor objects: Object class Number of dimensions Example theano.tensor.scalar 0-dimensional array 1, 2.5 theano.tensor.vector 1-dimensional array [0,3,20] theano.tensor.matrix 2-dimensional array [[2,3][1,5]] theano.tensor.tensor3 3-dimensional array [[[2,3][1,5]],[[1,2],[3,4]]] Playing with these Theano objects in the Python shell gives a better idea: >>> import theano.tensor as T   >>> T.scalar() <TensorType(float32, scalar)>   >>> T.iscalar() <TensorType(int32, scalar)>   >>> T.fscalar() <TensorType(float32, scalar)>   >>> T.dscalar() <TensorType(float64, scalar)> With a i, l, f, d letter in front of the object name, you initiate a tensor of a given type, integer32, integer64, floats32 or float64. For real-valued (floating point) data, it is advised to use the direct form T.scalar() instead of the f or d variants since the direct form will use your current configuration for floats: >>> theano.config.floatX = 'float64'   >>> T.scalar() <TensorType(float64, scalar)>   >>> T.fscalar() <TensorType(float32, scalar)>   >>> theano.config.floatX = 'float32'   >>> T.scalar() <TensorType(float32, scalar)> Symbolic variables either: Play the role of placeholders, as a starting point to build your graph of numerical operations (such as addition, multiplication): they receive the flow of the incoming data during the evaluation, once the graph has been compiled Represent intermediate or output results Symbolic variables and operations are both part of a computation graph that will be compiled either towards CPU or GPU for fast execution. Let's write a first computation graph consisting in a simple addition: >>> x = T.matrix('x')   >>> y = T.matrix('y')   >>> z = x + y   >>> theano.pp(z) '(x + y)'   >>> z.eval({x: [[1, 2], [1, 3]], y: [[1, 0], [3, 4]]}) array([[ 2., 2.],        [ 4., 7.]], dtype=float32) At first place, two symbolic variables, or Variable nodes are created, with names x and y, and an addition operation, an Apply node, is applied between both of them, to create a new symbolic variable, z, in the computation graph. The pretty print function pp prints the expression represented by Theano symbolic variables. Eval evaluates the value of the output variable z, when the first two variables x and y are initialized with two numerical 2-dimensional arrays. 
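The eval() method is convenient for quick checks, but the usual workflow is to compile the graph once with theano.function and then call the resulting function as many times as needed. A minimal sketch on the same addition graph:
>>> f = theano.function([x, y], z)
>>> f([[1, 2], [1, 3]], [[1, 0], [3, 4]])
array([[ 2., 2.],
       [ 4., 7.]], dtype=float32)
theano.function is where Theano applies its graph optimizations and, depending on the configuration seen earlier, generates code for the CPU or the GPU.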
The following example makes the difference between the variables and their names explicit:
>>> a = T.matrix()
>>> b = T.matrix()
>>> theano.pp(a + b)
'(<TensorType(float32, matrix)> + <TensorType(float32, matrix)>)'
Without names, it is more complicated to trace the nodes in a large graph. When printing the computation graph, names significantly help diagnose problems, while the Python variables are only used to handle the objects in the graph:
>>> x = T.matrix('x')
>>> x = x + x
>>> theano.pp(x)
'(x + x)'
Here, the original symbolic variable, named x, does not change and stays part of the computation graph; x + x creates a new symbolic variable that we assign to the Python variable x. Note also that the plural form of the constructors initializes multiple tensors, with their names, at the same time:
>>> x, y, z = T.matrices('x', 'y', 'z')
Now, let's have a look at the different functions to display the graph.
Summary
Thus, this article gives a brief idea of how to download and install Theano on various platforms, along with packages such as NumPy and SciPy, and introduces tensors and the basics of symbolic computing.
Resources for Article:   Further resources on this subject: Introduction to Deep Learning [article] Getting Started with Deep Learning [article] Practical Applications of Deep Learning [article]

Tree Test and Surveys

Packt
21 Feb 2018
13 min read
In this article by Pablo Perea and  Pau Giner, authors of the book UX Design for Mobile, we will cover how to use different techniques that can be applied according to the needs of the project and the how to obtain the information that we want. (For more resources related to this topic, see here.) Tree Test This is also called reverse card sorting. This is a method where the participants try to find elements in a given structure. The objective of this method is to discover findability problems and improve the organization and labeling system. The organization structure used should represent a realistic navigation for the application or web page you are evaluating. If you don’t have one by the time the experiment is taking place, try to create a real scenario, as using a fake organization will not lead to really valuable results. There are some platforms available to perform this type of experiment with a computer. One example is https://www.optimalworkshop.com. This can have several advantages: the experiment can be carried out without requiring a physical displacement by the participant, and you can also study the participant steps and not just analyze whether the participant succeeded or not. It can be that the participants found the objectives but had to make many attempts to achieve them. The method steps Create the structure: Type or create the navigation structure with all the different levels you want to evaluate. Create a set of findability tasks: Think about different items that the participant should find or give a location in the given structure. Test with participants: The participant will receive a set of tasks to do. The following are some examples of possible tasks: Find some different products to buy Contact the customer support Get the shipping rates The results: At the end, we should have a success rate for each of the tasks. Tasks such as finding products in a store must be done several times with products located in different sections. This will help us classify our assortment and show us how to organize the first levels of the structure better. How to improve the organization Once we find the main weak points and workarounds we have in our structure, we can create alternative structures to retest and try to find better results. We can repeat this process several times until we get the desired results. The Information Architecture is the science field of organizing and labeling content in a web page to support usability and findability. There's a growing community of Information architecture specialists that supports the Information architecture Institute--https://en.wikipedia.org/wiki/Information_architecture. There are some general lines of work in which we have to invest time in order to improve our Information architecture. Content organization The content can be ordered by following different schemes, in the same way that a supermarket orders products according to different criteria. We should try to find the one that better fits our user needs. We can order the content, dividing it into groups according to nature, goals, audience, chronological entry, and so on. Each of these approaches will lead to different results and each will work better with different kinds of users. In the case of mobile applications, it is common to have certain sections where they mix contents of a different nature, for instance, integrating messages for the user in the contents of the activity view. 
However, an abuse of these types of techniques can lead to turning the section into a confusing area for the user. Areas naming There are words that have a completely different meaning for one person to another, especially if those people are thinking in different fields when they use our solution. Understanding our user needs, and how the think and speak, will help us provide clear names for sections and subsections. For example, the word pool will represent a different set of products for a person looking for summer products than for a person looking for games. In the case of applications, we will have to find a balance between simplicity and clarity. If space permits, adding a label along with the icon will clarify and reduce the possible ambiguities that may be encountered in recognizing the meaning of these graphic representations. In the case of mobiles, where space is really small, we can find some universal icons, but we must test with users to ensure that they interpret them properly. In the following examples, you can find two different approaches. In the Gmail app, attachment and send are known icons and can work without a label. We find a very different scenario in the Samsung Clock app, where it would be really difficult to differentiate between the Alarm, the Stopwatch, and the Timer without labels:      Samsung system and Google Gmail App screenshots (source: Screenshot from Google Gmail App, source: Screenshot from Gmail App) The working memory limit The way the information is displayed to the user can drastically change the ease with which it is understood. When we talk about mobiles, where space is very limited, limiting the number of options and providing a navigation adapted to small spaces can help our user have a more satisfactory experience. As you probably know, the human working memory is not limitless, and it is commonly supposed to be limited to remembering a maximum of seven elements (https://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus_or_Minus_Two). Some authors such as Nelson Cowan suggested that the number of elements an adult can remember while performing a task is even lower, and gives the number of reference as four (https://en.wikipedia.org/wiki/Working_memory). This means that your users will understand the information you give them better if you block it into groups according to their limitations. Once we create a new structure, we can evaluate the efficiency of this new structure versus the last version. With small improvements, we will be able to increase the user engagement. Another way to learn about how the user understands the organization of our app or web is by testing a competitor product. This is one of the cheapest ways to have a quick prototype. Evaluate as many versions as you can; in each review, you will find new ideas to organize and show the content of your application or web app better. Surveys Surveys allow us to gather information from lots of participants without too much effort. Sometimes, we need information from a big group of people, and interviewing them one by one will not be affordable. Instead of that, surveys can quickly provide answers from lots of participants and analyze the results with bulk methods. It is not the purpose of this book to deal in depth with the field of questionnaires since there are books devoted entirely to this subject. Nevertheless, we will give some brushstrokes on the subject since they are commonly used to gather information in both web pages and mobile applications. 
Creating proper questions is a key part of the process; it reduces noise and helps participants provide useful answers. Some questions will require more effort to analyze, but they will give us answers with a deeper level of detail. Questions with pre-established answers are usually easier to automate, and we can get results in less time.

What do we want to discover?

The first thing to do is define the objective for which we are running the survey. Working with a clear objective will keep the process focused and produce better results. Plan carefully and determine the information that you really need at that moment. We should avoid surveys with lots of questions that have no clear purpose; they produce poor outcomes and become meaningless exercises for the participants. On the contrary, if we have a general leitmotiv for the questionnaire, it will also help the participants understand how the effort of completing the survey will help the company, and therefore it will give clear value to the time spent. You can plan your survey according to different planning approaches; your questions can be focused on short-term or long-term goals:

Long-term planning: Understanding your users' expectations and their long-term view of your product will help you plan the evolution of your application and create new features that match their needs. For example, imagine that you are designing a music application and you are unsure whether to focus on mainstream music or to give more visibility to amateur groups. A long-term survey can help you understand what your users want to find on your platform and plan changes that match the conclusions extracted from the survey analysis.
Short-term planning: This is usually related to operational actions. The objective of these kinds of surveys is to gather information for taking later action with a defined mission. They are useful when we need to choose between two options, for instance, when deciding whether or not to make a change in our platform. For example, a survey can help decide which type of information is most important for users when choosing between one group and another, so we can make that information more visible. We will take better decisions by understanding the main aspects our users expect to find on our platform.

Find the participants

Depending on the goal of the survey, we can use a wider range of participants or reduce their number, filtering by demographics, experience, or the relationship with our brand or products. If the goal is to expand our number of users, it may be interesting to extend the search to participants outside our current user base. Looking for new niches and features that interest potential users can bring new users to try out our application. If, on the contrary, our objective is to keep our current users loyal, consulting them about their preferences and their opinions on what works well in our application can be a great source of improvement. This data, along with usage and navigation data, will reveal areas for improvement, and we will be able to solve navigation and usability problems.

Determining the questions

We can ask different types of questions; depending on the type, we will get more or less detailed answers. If we choose the right type, we can save analysis effort, or we can reduce the number of participants when we require a deep analysis of each response.
It is common to include questions at the beginning of a questionnaire in order to classify the results. These are usually called filtering or screening questions, and they allow us to analyze the answers based on data such as age, gender, or technical skills. Their objective is to know the person answering the survey: if we know who is completing the questionnaire, we can determine whether their answers are useful for our goals or not. We can add questions about the participant's experience with technology in general, with our app, and about their relationship with the brand.

We can create two kinds of questions based on the type of answer the participant can provide; each of them, therefore, will lead to different results.

Open-answer questions

The objective of this type of question is to learn more about the participant without guiding the answers. We try to ask objectively about a subject without providing possible answers. The participant answers these questions with open-ended responses, so it is easier to learn how that participant thinks and which aspects are proving more or less satisfactory. While the advantage of this kind of question is that you gain a lot of insights and new ideas, the drawback is the cost of managing large amounts of data. These questions are therefore more useful when the number of participants is small. Here are some examples of open-answer questions:

How often have you been using our customer service?
How was your last purchase experience on our platform?

Questions with pre-established answers

These questions facilitate the analysis when the number of participants is high. We create questions with a clear objective and offer different options as responses; participants choose one of the options. The analysis of these questions can be automated and is therefore faster, but it will not give us information as detailed as an open question, in which the participant can set out all their ideas on the matter. The following is an example of a question with pre-established answers:

Question: How many times have you used our application in the last week?
Answers: 1) More than five times 2) Two to five 3) Once 4) None

Another great advantage is how easy these questions are to answer when the participant does not have much time or interest in responding. In environments such as mobile phones, typing long answers can be costly and frustrating. With these questions, we can offer answers that the user can select with a single tap, which can help increase the number of participants completing the form.

It is common to mix both types of questions. Open-answer questions, where the user can respond in more detail, can be included as optional questions. Participants willing to share more information can use these fields to introduce more detailed answers. This way, we can run a quicker analysis on the questions with pre-established answers and later analyze the questions that require more careful review.
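As a minimal illustration of automating the analysis of pre-established answers, the following Python sketch tallies responses for the example question above. The option labels and sample responses are hypothetical placeholders, not data from any real survey.

```python
from collections import Counter

# Hypothetical options for the example question:
# "How many times have you used our application in the last week?"
OPTIONS = ["More than five times", "Two to five", "Once", "None"]

# Hypothetical responses collected from participants.
responses = ["Once", "Two to five", "None", "Two to five",
             "More than five times", "Once", "Two to five"]

counts = Counter(responses)
total = len(responses)

# Print each option with its count and share of the total responses.
for option in OPTIONS:
    count = counts.get(option, 0)
    print(f"{option}: {count} ({count / total:.0%})")
```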
Humanize the forms

When we create a form, we must think about the person who will answer it. Almost no one likes to fill in questionnaires, especially if they are long and complex. To make our participants feel comfortable filling out all the answers on our form, we have to treat the process as a human relationship:

The first thing we should do is explain the purpose of our form. If our participants understand how their answers will be used in the project, and how they can help us achieve the goal, they will feel more encouraged to answer the questions and take their role seriously.
Ask only what is strictly necessary for the purpose you have set. We must prevent different departments from introducing questions without a common goal; if the form addresses the concerns of several departments, all of them should share the same goal. This way, the form will have more cohesion.
The tone used in the form should be friendly and clear. We should not cross the line into indiscretion with our questions, or the participant may feel overwhelmed; especially if the participants of our study are not users of our application or services, we must treat them as strangers. Being respectful and kind is key to getting high participation.

Summary

In this article, we saw how to apply a tree test and how to conduct surveys to obtain the information you want.


Open and Proprietary Next Generation Networks

Packt
21 Feb 2018
29 min read
In this article by Steven Noble, the author of the book Building Modern Networks, we will discuss networking concepts such as hyper-scale networking, software-defined networking, and network hardware and software design, along with a litany of network design ideas utilized in NGN.

The term Next Generation Network (NGN) has been around for over 20 years and refers to the current state of the art in network equipment, protocols, and features. A big driver in NGN is the constant stream of newer, better, faster forwarding ASICs coming out of companies such as Barefoot, Broadcom, Cavium, Nephos (MediaTek), and others. The advent of commodity networking chips has shortened the development time for generic switches, allowing hyper-scale networking end users to build equipment upgrades into their network designs.

At the time of writing, multiple companies have announced 6.4 Tbps switching chips. In layman's terms, a 6.4 Tbps switching chip can handle 64x100GbE of evenly distributed network traffic without losing any packets. To put the number in perspective, the entire internet in 2004 carried about 4 Tbps, so all of the internet traffic in 2004 could have crossed this one switching chip without issue. (Internet Traffic 1.3 EB/month http://blogs.cisco.com/sp/the-history-and-future-of-internet-traffic)

A hyper-scale network is one that is operated by companies such as Facebook, Google, Twitter, and others that add hundreds if not thousands of new systems a month to keep up with demand.

Examples of next generation networking

At the start of the commercial internet age (1994), software routers running on minicomputers, such as BBN's PDP-11 based IP routers designed in the 1970s, were still in use, and hubs were simply dumb hardware devices that broadcast traffic everywhere. At that time, the state of the art in networking was the Cisco 7000 series router, introduced in 1993. The next generation router was the Cisco 7500 (1995), while the Cisco 12000 series (gigabit) routers and the Juniper M40 were only concepts. When we say next generation, we are speaking of the current state of the art and the near future of networking equipment and software. For example, 100 Gb Ethernet is the current state of the art, while 400 Gb Ethernet is in the pipeline.

A modern network is one that contains one or more of the following concepts:

Software-defined Networking (SDN)
Network design concepts
Next generation hardware
Hyper scale networking
Open networking hardware and software
Network Function Virtualization (NFV)
Highly configurable traffic management

Both open and closed network hardware vendors have been innovating at a high rate of speed, with the help of and due to hyper-scale companies such as Google and Facebook, which need next generation high speed network devices. This provides the network architect with a reasonable pipeline of equipment to be used in designs. Google and Facebook are both companies with hyper-scale networks; a hyper-scale network is one where the data stored, transferred, and updated on the network grows exponentially. Hyper-scale companies deploy new equipment, software, and configurations weekly or even daily to support the needs of their customers.
These companies have needs that go beyond the networking equipment normally available, so they must innovate by building their own next generation network devices, designing multi-tiered networks (such as a three stage Clos network), and automating the installation and configuration of those devices. The need of hyper scalers is well summed up by Google's Amin Vahdat in a 2014 Wired article: "We couldn't buy the hardware we needed to build a network of the size and speed we needed to build".

Terms and concepts in networking

Here you will find definitions of some terms that are important in networking. They have been broken into groups of similar concepts.

Routing and switching concepts

In network devices and network designs there are many important concepts to understand. Here we begin with the way data is handled. The easiest way to discuss networking is to look at the OSI layers and point out where each device sits. The OSI layers with respect to routers and switches are:

Layer 1 (Physical): Layer 1 includes cables, hub, and switch ports. This is how all of the devices connect to each other, including copper cables (CatX), fiber optics, and Direct Attach Cables (DAC), which connect SFP ports without fiber.
Layer 2 (Data link layer): Layer 2 includes the raw data sent over the links and manages the Media Access Control (MAC) addresses for Ethernet.
Layer 3 (Network layer): Layer 3 includes packets that carry more than just Layer 2 data, such as IP, IPX (Novell's protocol), and AFP (Apple's protocol).

Routers and switches

In a network you will have equipment that switches and/or routes traffic. A switch is a networking device that connects multiple devices such as servers, provides local connectivity, and provides an uplink to the core network. A router is a network device that computes paths to remote and local devices, providing connectivity to devices across a network. Both switches and routers can use copper and fiber connections to interconnect. There are a few parts to a networking device: the forwarding chip, the TCAM, and the network processor. Some newer switches have Baseboard Management Controllers (BMCs), which manage the power, fans, and other hardware, lessening the burden on the NOS to manage these devices.

Currently routers and switches are very similar, as there are many Layer 3 forwarding capable switches and some Layer 2 forwarding capable routers. Making a switch Layer 3 capable is less of an issue than making a router Layer 2 capable, as the switch is already doing Layer 2 and adding Layer 3 is not a problem. A router does not generally do Layer 2 forwarding, so it has to be modified to allow ports to switch rather than route.

Control plane

The control plane is where all of the information about how packets should be handled is kept. Routing protocols live in the control plane and constantly scan the information received to determine the best path for traffic to flow. This data is then packed into a simple table and pushed down to the data plane.

Data plane

The data plane is where forwarding happens. In a software router, this is done in the device's CPU; in a hardware router, it is done using the forwarding chip and associated memories.

VLAN/VXLAN

A Virtual Local Area Network (VLAN) is a way of creating separate logical networks within a physical network. VLANs are generally used to separate or combine different users, or network elements such as phones, servers, and workstations. You can have up to 4,096 VLANs on a network segment.
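The 4,096 figure comes from the 12-bit VLAN ID field in the 802.1Q tag, while the roughly 16 million figure quoted for VXLAN in the next section comes from its 24-bit VNI. A quick Python check of that arithmetic:

```python
# VLAN IDs are carried in a 12-bit field (802.1Q); VXLAN VNIs use a 24-bit field.
VLAN_ID_BITS = 12
VXLAN_VNI_BITS = 24

print(f"VLANs per segment: {2 ** VLAN_ID_BITS:,}")    # 4,096
print(f"VXLAN segments:    {2 ** VXLAN_VNI_BITS:,}")  # 16,777,216
```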
A Virtual Extensible LAN (VXLAN) was created to allow for large, dynamically isolated logical networks for virtualized and multi-tenant environments. You can have up to 16 million VXLANs on a network segment. A VXLAN Tunnel Endpoint (VTEP) is a pair of logical interfaces: an inbound interface that encapsulates incoming traffic into VXLAN, and an outbound interface that removes the encapsulation from outgoing traffic, returning it to its original state.

Network design concepts

Network design requires knowledge of the physical structure of the network so that the proper design choices are made. For example, in a data center you would have a local area network; if you have multiple data centers near each other, they would be considered a metro area network.

LAN

A Local Area Network (LAN) is generally considered to be within the same building. These networks can be bridged (switched) or routed. In general, LANs are segmented into areas to avoid large broadcast domains.

MAN

A Metro Area Network (MAN) is generally defined as multiple sites in the same geographic area or city, that is, a metropolitan area. A MAN generally runs at the same speed as a LAN but is able to cover larger distances.

WAN

A Wide Area Network (WAN) is essentially everything that is not a LAN or MAN. WANs generally use fiber optic cables to transmit data from one location to another. WAN circuits can be provided via multiple connections and data encapsulations, including MPLS, ATM, and Ethernet. Most large network providers utilize Dense Wavelength Division Multiplexing (DWDM) to put more bits on their fiber networks. DWDM puts multiple colors of light onto the fiber, allowing up to 128 different wavelengths to be sent down a single fiber. DWDM has just entered open networking with the introduction of Facebook's Voyager system.

Leaf-Spine design

In a Leaf-Spine network design, there are Leaf switches (the switches that connect to the servers), sometimes called Top of Rack (ToR) switches, connected to a set of Spine switches (the switches that connect the leaves together), sometimes called End of Row (EoR) switches.

Clos network

A Clos network is one of the ways to design a multi-stage network. Based on the switching network design by Charles Clos in 1952, a three stage Clos is the smallest version of a Clos network. It has an ingress, a middle, and an egress stage. Some hyper-scale networks use a five stage Clos, where the middle is replaced with another three stage Clos; a five stage Clos has an ingress, a middle ingress, a middle, a middle egress, and an egress stage. All stages are connected to their neighbors, so, for example, with four middle stages, Ingress 1 is connected to all four middle stages, just as Egress 1 is connected to all four middle stages. A Clos network can be built with odd numbers of stages starting at 3, so a 5, 7, and so on stage Clos is possible. For even numbered designs, Benes designs are usable.

Benes network

A Benes design is a non-blocking Clos design where the middle stage is 2x2 instead of NxN. A Benes network can have an even number of stages, for example, a four stage Benes network.
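Going back to the leaf-spine design described above, here is a small Python sketch that computes the uplink bandwidth and oversubscription ratio of a leaf switch from its port counts. The port counts used (a 48x10G + 6x40G leaf) are illustrative assumptions, not a recommendation from the author.

```python
def leaf_oversubscription(down_ports, down_gbps, up_ports, up_gbps):
    """Return (downlink Gbps, uplink Gbps, oversubscription ratio) for a leaf switch."""
    down = down_ports * down_gbps
    up = up_ports * up_gbps
    return down, up, down / up

# Illustrative leaf: 48 x 10G server-facing ports, 6 x 40G spine-facing uplinks.
down, up, ratio = leaf_oversubscription(48, 10, 6, 40)
print(f"downlink {down} Gbps, uplink {up} Gbps, oversubscription {ratio:.1f}:1")
# A non-blocking (1:1) fabric needs uplink bandwidth equal to downlink bandwidth.
```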
Network controller concepts

Here we will discuss the concepts of network controllers. Every networking device has a controller, whether built in or external, to manage the forwarding of the system.

Controller

A controller is a computer that sits on the network and manages one or more network devices. A controller can be built into a device, like the Cisco Supervisor module, or standalone, like an OpenFlow controller. The controller is responsible for managing all of the control plane data and deciding what should be sent down to the data plane. Generally, a controller will have a Command-line Interface (CLI) and, more recently, a web configuration interface. Some controllers will even have an Application Programming Interface (API).

OpenFlow controller

An OpenFlow controller, as it sounds, is a controller that uses the OpenFlow protocol to communicate with network devices. The most common OpenFlow controllers that people hear about are OpenDaylight and ONOS. People who are working with OpenFlow will also know of Floodlight and Ryu.

Supervisor module

A route processor is a computer that sits inside the chassis of the network device you are managing. Sometimes the route processor is built into the system, while other times it is a module that can be replaced or upgraded. Many vendor multi-slot systems have multiple route processors for redundancy. An example of a removable route processor is the Cisco 9500 series Supervisor module. There are multiple versions available, including revision A, with a 4-core processor and 16 GB of RAM, and revision B, with a 6-core processor and 24 GB of RAM. Previous systems such as the Cisco Catalyst 7600 had options such as the SUP720 (Supervisor Module 720), of which multiple versions were offered. The standard SUP720 supported a limited number of routes (256k) versus the SUP720 XL, which could support up to 1M routes.

Juniper Route Engine

In Juniper terminology, the controller is called a Routing Engine. These are similar to the Cisco Route Processor/Supervisor modules. Unlike Cisco Supervisor modules, which utilize special CPUs, Juniper's REs generally use common x86 CPUs. Like Cisco, Juniper multi-slot systems can have redundant processors. Juniper has recently released information about the NG-REs, or Next Generation Route Engines. One example is the new RE-S-X6-64G, a 6-core x86 CPU based routing engine with 64 GB DRAM and 2x 64 GB SSD storage, available for the MX240/MX480/MX960. These NG-REs allow containers and other virtual machines to be run directly.

Built in processor

When looking at single rack unit (1 RU) or pizza box design switches, there are some important design considerations. Most 1 RU switches do not have redundant processors or field replaceable route processors. In general, the field replaceable units (FRUs) that the customer can replace are power supplies and fans. If the failure is outside of the available FRUs, the entire switch must be replaced. With white box switches this can be a simple process, as white box switches can be used in multiple locations of your network, including the customer edge, provider edge, and core. Sparing (keeping a spare switch) is easy when you have the same hardware in multiple parts of the network. Recently, commodity switch fabric chips have come with built-in low power ARM CPUs that can be used to manage the entire system, leading to cheaper and less power hungry designs.

Facebook Wedge microserver

The Facebook Wedge is different from most white box switches, as it has its controller as an add-in module, the same board that is used in some of the OCP servers. By separating the controller board from the switch, different boards can be put in place, such as boards with more memory, faster CPUs, different CPU types, and so on.
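To close out the controller concepts, here is a minimal sketch of an OpenFlow application using Ryu, one of the controllers mentioned above, modeled on Ryu's standard OpenFlow 1.3 example pattern. It installs a table-miss flow so that unmatched packets are punted to the controller, then simply logs each packet-in. Treat it as an illustrative sketch rather than the author's design; it assumes Ryu is installed and the app is launched with ryu-manager.

```python
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, MAIN_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3


class MinimalSwitch(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def features_handler(self, ev):
        datapath = ev.msg.datapath
        ofproto = datapath.ofproto
        parser = datapath.ofproto_parser
        # Install a table-miss flow entry that punts unknown traffic to the controller.
        match = parser.OFPMatch()
        actions = [parser.OFPActionOutput(ofproto.OFPP_CONTROLLER,
                                          ofproto.OFPCML_NO_BUFFER)]
        inst = [parser.OFPInstructionActions(ofproto.OFPIT_APPLY_ACTIONS, actions)]
        datapath.send_msg(parser.OFPFlowMod(datapath=datapath, priority=0,
                                            match=match, instructions=inst))

    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def packet_in_handler(self, ev):
        # The control plane sees each unmatched packet here and can decide
        # what forwarding state to push down to the data plane.
        msg = ev.msg
        self.logger.info("packet-in on switch %s, in_port %s",
                         msg.datapath.id, msg.match['in_port'])
```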
Routing protocols

A routing protocol is a daemon that runs on a controller and communicates with other network devices to exchange route information. For this section we will use plain words to demonstrate the way a routing protocol works; these should not be construed as the actual way the protocols talk.

BGP

Border Gateway Protocol (BGP) is a path vector based Exterior Gateway Protocol (EGP) that makes routing decisions based on paths, network policies, or rules (route-maps on Cisco). Though designed as an EGP, BGP can be used as both an interior (iBGP) and exterior (eBGP) routing protocol. BGP uses keepalive packets ("are you there?") to confirm that neighbors are still accessible. BGP is the protocol used to route traffic across the internet, exchanging routing information between different Autonomous Systems (AS). An AS is all of the connected networks under the control of a single entity, such as Level 3 (AS1) or Sprint (AS1239). When two different ASes interconnect, BGP peering sessions are set up between two or more network devices that have direct connections to each other. In an eBGP scenario, AS1 and AS1239 would set up BGP peering sessions that allow traffic to route between their ASes. In an iBGP scenario, routers within the same AS peer with each other and transfer the routes that are defined on the system. While an IGP handles internal routing in most networks, iBGP is used in large corporate networks because other Interior Gateway Protocols (IGPs) may not scale.

Examples: iBGP next hop self

In this scenario, AS1 and AS2 are peered with each other and exchange one prefix each: AS1 advertises 192.168.1.0/24 and AS2 advertises 192.168.2.0/24. Each network has two routers: one border router, which connects to other ASes, and one internal router, which gets its routes from the border router. The routes are advertised internally with the next-hop set to the border router. This is a standard scenario when you are not running an IGP inside to distribute the routes for the border router's external interfaces. The conversation goes like this:

AS1 -> AS2: Hi AS2, I am AS1
AS2 -> AS1: Hi AS1, I am AS2
AS1 -> AS2: I have the following route, 192.168.1.0/24
AS2 -> AS1: I have received the route; I have 192.168.2.0/24
AS1 -> AS2: I have received the route
AS1 -> Internal Router AS1: I have this route, 192.168.2.0/24, you can reach it through me at 10.1.1.1
AS2 -> Internal Router AS2: I have this route, 192.168.1.0/24, you can reach it through me at 10.1.1.1

iBGP next-hop unmodified

In the next scenario the border routers are the same, but the internal routers are given a next-hop of the external (other AS) border router. The last scenario is where you peer with a route server, a system that handles peering and filters the routes based on what you have specified you will send. The routes are then forwarded on to your peers with your IP as the next hop.

OSPF

Open Shortest Path First (OSPF) is a relatively simple protocol. Different links on the same router are put into the same or different areas. For example, you would use area 1 for the interconnects between campuses but another area, such as area 10, for the campus itself. By separating areas, you can reduce the amount of cross talk that happens between devices. There are two versions of OSPF, v2 and v3; the main difference is that v2 is for IPv4 networks and v3 is for IPv6 networks. When there are multiple paths that can be taken, the cost of the links must be taken into account. Consider two paths between the same pair of routers: one has a total cost of 20 (5+5+10), the other 16 (8+8), so the traffic will take the lowest cost path, as the sketch below illustrates.
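The following Python sketch reproduces that calculation with a small Dijkstra shortest-path function, which is the algorithm OSPF uses to compute routes. The topology and router names (R1 through R5) are hypothetical, chosen only so that one path costs 5+5+10 = 20 and the other 8+8 = 16, as in the example above.

```python
import heapq

# Hypothetical topology: R1 -> R2 -> R3 -> R5 costs 5+5+10 = 20,
# while R1 -> R4 -> R5 costs 8+8 = 16, so OSPF prefers the second path.
links = {
    "R1": {"R2": 5, "R4": 8},
    "R2": {"R3": 5},
    "R3": {"R5": 10},
    "R4": {"R5": 8},
    "R5": {},
}

def lowest_cost_path(graph, src, dst):
    """Dijkstra's algorithm: return (total cost, path) from src to dst."""
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph[node].items():
            heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None

print(lowest_cost_path(links, "R1", "R5"))  # (16, ['R1', 'R4', 'R5'])
```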
IS-IS

IS-IS is a link-state routing protocol that operates by flooding link state information throughout a network of routers using Network Entity Titles (NETs). Each IS-IS router has its own database of the network topology, built by aggregating the flooded network information. IS-IS is used by companies looking for fast convergence, scalability, and rapid flooding of new information. IS-IS uses the concept of levels instead of areas as in OSPF. There are two levels in IS-IS: Level 1 (area) and Level 2 (backbone). A Level 1 Intermediate System (IS) keeps track of the destinations within its area, while a Level 2 IS keeps track of paths to the Level 1 areas.

EIGRP

Enhanced Interior Gateway Routing Protocol (EIGRP) is Cisco's proprietary routing protocol. It is hardly ever seen in current networks, but if you see it in yours, then you need to plan accordingly. Replacing EIGRP with OSPF is suggested so that you can interoperate with non-Cisco devices.

RIP

If Routing Information Protocol (RIP) is being used in your network, it must be replaced during the design, as most newer routing stacks do not support RIP. RIP is one of the original routing protocols; it uses the number of hops (routed ports) between the device and the remote location to determine the optimal path, and it sends its entire routing database out every 30 seconds. When routing tables were small, many years ago, RIP worked fine. With larger tables, the traffic bursts and the resulting re-computation by other routers in the network cause routers to run at almost 100 percent CPU all the time.

Cables

Here we will review the major types of cables.

Copper

Copper cables have been around for a very long time; originally, network devices were connected together using coax cable (the same cable used for television antennas and cable TV). These days there are a few standard cables in use.

RJ45 Cables

Cat5 - A 100Mb capable cable, used for both 10Mb and 100Mb connections
Cat5E - A 1GbE capable cable, but not suggested for 1GbE networks (Cat6 is better and the price difference is nominal)
Cat6 - A 1GbE capable cable; can be used for any speed at or below 1GbE, including 100Mb and 10Mb

SFPs

SFP - Small Form-factor Pluggable port, capable of up to 1GbE connections
SFP+ - Same size as the SFP, capable of up to 10Gb connections
SFP28 - Same size as the SFP, capable of up to 25Gb connections
QSFP - Quad Small Form-factor Pluggable; a bit wider than the SFP but capable of multiple GbE connections
QSFP+ - Same size as the QSFP, capable of 40GbE as 4x10GbE on the same cable
QSFP28 - Same size as the QSFP, capable of 100GbE
DAC - A direct attach cable that fits into an SFP or QSFP port

Fiber/hot pluggable breakout cables

As routers and switches continue to become more dense, to the point where the number of ports needed can no longer fit on the front of the device, manufacturers have moved to what we call breakout cables. For example, if you have a switch that can handle 3.2 Tb/s of traffic, you need to provide 3,200 Gbps of port capacity. The easiest way to do that is to use 32 100Gb ports, which will fit on the front of a 1U device. You cannot fit 128 10Gb ports without using either a breakout patch panel (which will then use another few rack units (RUs)) or a breakout cable. For a period of time in the 1990s, Cisco used RJ21 connectors to provide up to 96 Ethernet ports per slot; network engineers would then create breakout cables to go from the RJ21 to RJ45. These days, we have both DAC (Direct Attach Cable) and fiber breakout cables. A 1x4 breakout cable, for example, provides four 10G or 25G ports from a single 40G or 100G port.
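A quick Python check of that port arithmetic, restating the 3.2 Tbps example from the text:

```python
# A 3.2 Tbps switch exposed as 32 x 100G ports, each splittable 1x4 into 25G ports.
ports_100g = 32
capacity_gbps = ports_100g * 100
breakout_ports = ports_100g * 4          # 1x4 breakout per 100G port
print(f"{capacity_gbps} Gbps total ({capacity_gbps / 1000} Tbps)")  # 3200 Gbps (3.2 Tbps)
print(f"{breakout_ports} x 25G ports via breakout cables")          # 128 x 25G
```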
If you build a LAN that only includes switches providing Layer 2 connectivity, any devices you want to connect together need to be in the same IP block. If you have a router in your network, it can route traffic between IP blocks.

Part 1: What defines a modern network

There is a litany of concepts that define a modern network, from simple principles to full feature sets. In general, a next-generation data center design enables you to move to a widely distributed non-blocking fabric with uniform chipset, bandwidth, and buffering characteristics in a simple architecture. In one example, to support these requirements, you would begin with a true three-tier Clos switching architecture with Top of Rack (ToR), spine, and fabric layers to build a data center network. Each ToR would have access to multiple fabrics and the ability to select a desired path based on application requirements or network availability. Following the definition of a modern network from the introduction, here we lay out the general definition of the parts.

Modern network pieces

Here we will discuss the concepts that build a Next Generation Network (NGN).

Software Defined Networks

Software defined networks can be defined in multiple ways. The general definition of a software defined network is one that can be controlled as a single unit instead of on a system by system basis. The control plane, which would normally live in the device and use routing protocols, is replaced with a controller. Software defined networks can be built using many different technologies, including OpenFlow, overlay networks, and automation tools.

Next generation networking and hyper scale networks

As mentioned in the introduction, twenty years ago NGN hardware would have been the Cisco GSR (officially introduced in 1997) or the Juniper M40 (officially released in 1998). Large Cisco and Juniper customers would have been working with those companies to help come up with the specifications and to determine how to deploy the devices (possibly Alpha or Beta versions) in their networks. Today we can look at the hyper scale networking companies to see what a modern network looks like. A hyper scale network is one where the data stored, transferred, and updated on the network grows exponentially. Technologies such as 100Gb Ethernet, software defined networking, and open networking equipment and software are being deployed by hyper scale companies.

Open networking hardware overview

Open hardware has been around for about 10 years, first in the consumer space and more recently in the enterprise space. Enterprise open networking hardware companies such as Quanta and Accton provide a significant amount of the hardware utilized in networks today. Companies such as Google and Facebook have been building their own hardware for many years; Facebook's designs, such as the Wedge 100 and Backpack, are publicly available for end users to utilize. Some examples of open networking hardware are:

The Dell S6000-ON - a 32x40G switch with 32 QSFP ports on the front
The Quanta LY8 - a 48x10G + 6x40G switch with 48 SFP+ ports and 6 QSFP ports
The Facebook Wedge 100 - a 32x100G switch with 32 QSFP28 ports on the front

Open networking software overview

To use open networking hardware, you need an operating system. The operating system manages system devices such as fans, power, LEDs, and temperature.
On top of the operating system you will run a forwarding agent; examples of forwarding agents are Indigo, the open source OpenFlow daemon, and Quagga, an open source routing agent.

Closed networking hardware overview

Cisco and Juniper are the leaders in the closed hardware and software space. Cisco produces switches like the Nexus series (3000, 7000, 9000), with the 9000 programmable by ACI. Juniper provides the MX series (480, 960, 2020), with the 2020 being the highest end forwarding system they sell.

Closed networking software overview

Cisco has multiple network operating systems, including IOS, NX-OS, and IOS-XR. All Cisco NOSs are closed source and proprietary to the system that they run on. Cisco has what the industry calls an "industry standard CLI", which is emulated by many other companies. Juniper ships a single NOS, JunOS, which can install on multiple different systems. JunOS is a closed source, BSD based NOS. The JunOS CLI is significantly different from IOS and is more focused on engineers who program.

Network Virtualization

Not to be confused with Network Function Virtualization (NFV), network virtualization is the concept of re-creating in software the hardware interfaces that exist in a traditional network. By creating a software counterpart to the hardware interfaces, you decouple the network forwarding from the hardware. There are a few companies and software projects that allow the end user to enable network virtualization. The first is NSX, which comes from Nicira, the same team that developed OvS (Open vSwitch); Nicira was acquired by VMware in 2012. Another project is Big Switch Networks' Big Cloud Fabric, which utilizes a heavily modified version of Indigo, an OpenFlow controller.

Network Function Virtualization

Network Function Virtualization can be summed up by the statement: "Due to recent network focused advancements in PC hardware, any service able to be delivered on proprietary, application specific hardware should be able to be done on a virtual machine". Essentially: routers, firewalls, load balancers, and other network devices all running virtualized on commodity hardware.

Traffic Engineering

Traffic engineering is a method of optimizing the performance of a telecommunications network by dynamically analyzing, predicting, and regulating the behavior of data transmitted over that network. One simple building block is spreading flows across equal-cost paths; the sketch below illustrates the idea.
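Equal-cost multi-path (ECMP) forwarding, which also appears in the design goals later in this article, typically hashes a flow's 5-tuple to pick one of several equal-cost next hops, keeping each flow on a single path while spreading load across the fabric. This Python sketch shows the idea; the hash function and next-hop names are illustrative assumptions, not any vendor's actual algorithm.

```python
import hashlib

def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, proto, next_hops):
    """Pick a next hop by hashing the flow's 5-tuple (deterministic per flow)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    index = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(next_hops)
    return next_hops[index]

paths = ["spine-1", "spine-2", "spine-3", "spine-4"]   # hypothetical equal-cost uplinks
print(ecmp_next_hop("10.0.0.1", "10.0.1.9", 49152, 443, "tcp", paths))
print(ecmp_next_hop("10.0.0.2", "10.0.1.9", 49153, 443, "tcp", paths))
```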
Part 2: Next generation networking examples

In my 25 or so years of networking, I have dealt with a lot of different networking technologies, each iteration (supposedly) better than the last: starting with Thin Net (10BASE2), moving through ArcNet, 10BASE-T, Token Ring, ATM to the desktop, FDDI, and onwards. Generally, the technology improved with each iteration until it was swapped out. A good example is the change from a literal ring for Token Ring to a switching design where devices hung off of a hub (as in 10BASE-T). ATM to the desktop was a novel idea, providing up to 25Mbps to connected devices, but the complexity of configuring and managing it was not worth the gain. Today almost everything is Ethernet, as shown by the Facebook Voyager DWDM system, which uses Ethernet over both traditional SFP ports and the DWDM interfaces. Ethernet is simple, well supported, and easy to manage.

Example 1 - Migration from FDDI to 100Base-T

In late 1996 and early 1997, the Exodus network used FDDI (Fiber Distributed Data Interface) rings to connect the main routers together at 100Mbps. As the network grew, we had to decide between two competing technologies, FDDI switches and Fast Ethernet (100Base-T), both providing 100 Mbps. FDDI switches from companies like DEC (the FDDI Gigaswitch) were used in most of the Internet Exchange Points (IXPs) and worked reasonably well with one notable issue, head of line blocking (HOLB), which also impacted other technologies. Head of line blocking occurs when a packet is destined for an interface that is already full, so a queue builds; if the interface continues to be full, packets in the queue are eventually dropped. While we were testing the DEC FDDI Gigaswitches, we were also in deep discussions with Cisco about the availability of Fast Ethernet (FE) and working on designs. Because FE was new, there were concerns about how it would perform and how we would be able to build a redundant network design. In the end, we decided to use FE, connect the main routers in a full mesh, and use routing protocols to manage failover.

Example 2 - NGN failure - LANE (LAN Emulation)

During the high growth period at Exodus Communications, there was a request to connect a new data center to the original one and allow customers to put servers in both locations using the same address space. To do this, we chose LAN Emulation, or LANE, which allows an ATM network to be used like a LAN. On paper, LANE looked like a great idea: the ability to extend the LAN so that customers could use the same IP space in two different locations. In reality, it was very different. For hardware, we were using Cisco 5513 switches, which provided a combination of Ethernet and ATM ports. There were multiple issues with this design. First, the customer is provided with an Ethernet interface, which runs over an ATM optical interface; any error on the physical connection between switches or on the ATM layer would cause errors on the Ethernet layer. Second, monitoring was very hard; when there were network issues, you had to look in multiple locations to determine where the errors were happening. After a few weeks, we did a midnight swap, putting Cisco 7500 routers in to replace the 5500 switches and moving customers onto new blocks for the new data center.

Part 3: Designing a modern network

When designing a new network, some of the following might be important to you:

Simple, focused, yet non-blocking IP fabric
Multistage parallel fabrics based on the Clos network concept
Simple merchant silicon
Distributed control plane with some centralized controls
Wide multi-path (ECMP)
Uniform chipset, bandwidth, and buffering
1:1 oversubscribed (non-blocking fabric)
Minimal hardware necessary to carry east-west traffic
Ability to support a large number of bare metal servers without adding an additional layer
Fabric limited to a 5 stage Clos within the data center to minimize lookups and switching latency
Support for host attachment at 10G, 25G, 50G, and 100G Ethernet

Traffic management

In a modern network, one of the first decisions is whether or not you will use a centralized controller. If you use a centralized controller, you will be able to see and control the entire network from one location. If you do not, you will need to manage each system either directly or via automation. There is a middle ground where you can use some software defined networking pieces to manage parts of the network, such as an OpenFlow controller for the WAN or VMware NSX for your virtualized workloads.
Once you know what the general management goal is, the next decision is whether to use open networking equipment, proprietary networking equipment, or a combination of both. Open networking equipment is a concept that has been around for less than a decade; it started when very large network operators decided that they wanted better control of the cost and features of the equipment in their networks. Google is a good example: Google wanted to build a high-speed backbone, but was not looking to pay the prices that incumbent proprietary vendors such as Cisco and Juniper wanted. Google set a price per port (1G/10G/40G) that they wanted to hit and designed equipment around that. Later, companies like Facebook decided to go in the same direction and contracted with commodity manufacturers to build network switches that met their needs; Facebook, for example, has used both its own hardware (6-Pack/Backpack) and legacy vendor hardware for interoperability and performance testing. Proprietary vendors can offer the same level of performance or better, using their massive teams of engineers to design and optimize hardware. This distinction even applies on the software side, where companies like VMware and Cisco have created software defined networking tools such as NSX and ACI.

With the large amount of networking gear available, designing and building a modern network can appear to be a complex undertaking. It requires research and a good understanding of networking equipment; while complex, the task is not hard if you follow the guidelines. These are a few of the stages of planning that need to be followed before the modern network design is started:

The first step is to understand the scope of the project (single site, multi-site, multi-continent, multi-planet).
The second step is to determine whether the project is a green field (new) or brown field deployment (how many of the sites already exist and will or will not be upgraded).
The third step is to determine whether there will be any software defined networking (SDN), next generation networking (NGN), or open networking pieces.
Finally, it is key that the equipment to be used is assembled and tested to determine whether it meets the needs of the network.

Summary

In this article, we have discussed many different concepts that tie NGN together. The term NGN refers to the latest and near-term networking equipment and designs. We looked at networking concepts such as local, metro, and wide area networks, network controllers, routers and switches, and routing protocols such as BGP, IS-IS, OSPF, and RIP. Then we discussed many pieces that are used, either singly or together, to create a modern network. In the end, we also learned some guidelines that should be followed while designing a network.