
How-To Tutorials


Exploring Deep Learning Architectures [Tutorial]

Melisha Dsouza
25 Nov 2018
11 min read
This tutorial will focus on some of the important architectures present in deep learning today. Much of the success of neural networks lies in the careful design of the network architecture. We will look at the architecture of autoencoder neural networks, variational autoencoders, CNNs, and RNNs. This tutorial is an excerpt from the book Hands-On Transfer Learning with Python, written by Dipanjan Sarkar, Raghav Bali, et al. The book focuses extensively on deep learning (DL) and transfer learning, comparing and contrasting the two with easy-to-follow concepts and examples.

Autoencoder neural networks

Autoencoders are typically used for reducing the dimensionality of data in neural networks. They are also successfully used for anomaly detection and novelty detection problems. Autoencoders fall under the unsupervised learning category: the network is trained by minimizing the difference between its input and its output. A typical autoencoder architecture is a slight variant of the DNN architecture, in which the number of units per hidden layer is progressively reduced up to a certain point and then progressively increased, with the final layer's dimension equal to the input dimension. The key idea is to introduce a bottleneck in the network and force it to learn a meaningful, compact representation. The middle layer of hidden units (the bottleneck) is the dimension-reduced encoding of the input. The first half of the hidden layers is called the encoder, and the second half the decoder. The following depicts a simple autoencoder architecture, where the layer named z is the representation layer:

Source: cloud4scieng.org

Variational autoencoders

Variational autoencoders (VAEs) are generative models. Compared to other deep generative models, VAEs are computationally tractable and stable, and can be estimated with the efficient backpropagation algorithm.
VAEs are inspired by the idea of variational inference in Bayesian analysis. The idea of variational inference is as follows: given an input distribution x, the posterior probability distribution over the output y is often too complicated to work with. So, we approximate that complicated posterior, p(y | x), with a simpler distribution, q(y). Here, q is chosen from a family of distributions, Q, that best approximates the posterior. This technique is used, for example, in training latent Dirichlet allocation (LDA) models (Bayesian generative models used for topic modeling of text).

Given a dataset, X, a VAE can generate new samples that are similar, but not necessarily equal, to those in X. The dataset X has N Independent and Identically Distributed (IID) samples of some continuous or discrete random variable, x. Let's assume that the data is generated by some random process involving an unobserved continuous random variable, z. Whereas in the simple autoencoder the variable z was deterministic, here z is a stochastic variable. Data generation is a two-step process:

A value of z is generated from a prior distribution, pθ(z)
A value of x is generated from the conditional distribution, pθ(x|z)

So, p(x) is basically the marginal probability, calculated as:

pθ(x) = ∫ pθ(z) pθ(x|z) dz

The parameters of the distribution, θ, and the latent variable, z, are both unknown. Here, x can be generated by taking samples from the marginal, p(x). Backpropagation cannot handle a stochastic variable, z, or a stochastic layer, z, within the network. Assuming the prior distribution, p(z), is Gaussian, we can leverage the location-scale property of the Gaussian distribution and rewrite the stochastic layer as z = μ + σε, where μ is the location parameter, σ is the scale, and ε is white noise. We can then draw multiple samples of the noise, ε, and feed them as a deterministic input to the neural network.
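The reparameterization step just described can be illustrated numerically. The following sketch is not from the book; the μ and σ values are made up for illustration, and NumPy is assumed to be available:

```python
import numpy as np

rng = np.random.default_rng(0)

# Suppose the encoder produced these parameters for a 4-dimensional
# latent variable z (the values are made up for illustration).
mu = np.array([0.5, -1.0, 0.0, 2.0])     # location (mean)
sigma = np.array([0.1, 0.2, 1.0, 0.5])   # scale (standard deviation)

# Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I).
# The randomness enters only through eps, which is fed to the network
# as a deterministic input, so gradients can flow through mu and sigma.
eps = rng.standard_normal(size=mu.shape)
z = mu + sigma * eps

# Sanity check: averaging many samples of z recovers mu, since E[eps] = 0.
many_eps = rng.standard_normal(size=(100_000, 4))
z_mean = (mu + sigma * many_eps).mean(axis=0)
print(np.abs(z_mean - mu).max() < 0.02)  # True
```

During training, each sampled ε is just another input value, which is what makes the whole network deterministic from the optimizer's point of view.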
Then, the model becomes an end-to-end deterministic deep neural network, as shown here:

Here, the decoder part is the same as in the simple autoencoder that we looked at earlier.

Types of CNN architectures

CNNs are multilayered neural networks designed specifically for identifying shape patterns, with a high degree of invariance to translation, scaling, and rotation, in two-dimensional image data. These networks need to be trained in a supervised way; typically, a labeled set of object classes, such as MNIST or ImageNet, is provided as a training set. The crux of any CNN model is the convolution layer and the subsampling/pooling layer.

LeNet architecture

This is a pioneering seven-level convolutional network, designed by Yann LeCun and his co-authors in 1998, that was used for digit classification. Later, it was applied by several banks to recognize handwritten numbers on cheques. The lower layers of the network are composed of alternating convolution and max pooling layers. The upper layers are fully connected, dense MLPs (formed of hidden layers and logistic regression). The input to the first fully connected layer is the set of all the feature maps of the previous layer:

AlexNet

In 2012, AlexNet significantly outperformed all the prior competitors and won the ILSVRC, reducing the top-5 error to 15.3%, compared to 26% for the runner-up. This work popularized the application of CNNs in computer vision. AlexNet has a very similar architecture to that of LeNet, but it has more filters per layer and is deeper. AlexNet also introduced the use of stacked convolutions, instead of always alternating convolution and pooling. A stack of small convolutions is better than one convolution layer with a large receptive field, as the stack introduces more non-linearities with fewer parameters.

ZFNet

The ILSVRC 2013 winner was a CNN from Matthew Zeiler and Rob Fergus, which became known as ZFNet.
ZFNet improved on AlexNet by tweaking the architecture hyperparameters: in particular, it expanded the size of the middle convolutional layers and made the stride and filter size of the first layer smaller, going from 11 x 11 with stride 4 in AlexNet to 7 x 7 with stride 2 in ZFNet. The intuition was that a smaller filter size in the first convolution layer helps retain more of the original pixel information. Also, AlexNet was trained on 15 million images, while ZFNet was trained on only 1.3 million images:

GoogLeNet (Inception network)

The ILSVRC 2014 winner was a convolutional network called GoogLeNet, from Google. It achieved a top-5 error rate of 6.67%, very close to human-level performance. The runner-up was the network from Karen Simonyan and Andrew Zisserman, known as VGGNet. GoogLeNet introduced a new architectural component called the Inception layer. The intuition behind the Inception layer is to use larger convolutions, but also keep a fine resolution for smaller details in the images. The following diagram describes the full GoogLeNet architecture:

Visual Geometry Group

Researchers from the Oxford Visual Geometry Group, or VGG for short, developed the VGG network, which is characterized by its simplicity, using only 3 x 3 convolutional layers stacked on top of each other in increasing depth. Reducing volume size is handled by max pooling. At the end, two fully connected layers, each with 4,096 nodes, are followed by a softmax layer. The only preprocessing done to the input is the subtraction of the mean RGB value, computed on the training set, from each pixel. Pooling is carried out by max pooling layers, which follow some of the convolution layers; not all the convolution layers are followed by max pooling. Max pooling is performed over a 2 x 2 pixel window, with a stride of 2. ReLU activation is used in each of the hidden layers. The number of filters increases with depth in most VGG variants.
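To see why stacks of 3 x 3 convolutions are attractive, it helps to compare receptive fields and parameter counts. The short calculation below is illustrative only: it ignores biases and assumes the same channel count, C, at the input and output of every layer:

```python
def conv_params(kernel, channels_in, channels_out):
    """Number of weights in one conv layer (biases ignored)."""
    return kernel * kernel * channels_in * channels_out

def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions.
    Each extra layer grows the receptive field by (kernel - 1)."""
    return 1 + layers * (kernel - 1)

C = 64  # channel count, arbitrary for illustration

# Two stacked 3x3 convs see a 5x5 region; three see 7x7.
print(stacked_receptive_field(3, 2))  # 5
print(stacked_receptive_field(3, 3))  # 7

# Parameters: three stacked 3x3 layers vs a single 7x7 layer.
stacked = 3 * conv_params(3, C, C)   # 27 * C^2
single = conv_params(7, C, C)        # 49 * C^2
print(stacked < single)  # True: the stack is cheaper and adds two extra non-linearities
```

So three 3 x 3 layers cover the same 7 x 7 receptive field as one large filter, with roughly half the parameters, which is exactly the design choice VGG exploits.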
The 16-layer architecture, VGG-16, is shown in the following diagram. The 19-layer architecture with uniform 3 x 3 convolutions (VGG-19) is shown alongside ResNet in the following section. The success of VGG models confirms the importance of depth in image representations:

VGG-16: Input RGB image of size 224 x 224 x 3; the number of filters in each layer is circled

Residual Neural Networks

The main idea in this architecture is as follows. Instead of hoping that a set of stacked layers will directly fit a desired underlying mapping, H(x), they are made to fit a residual mapping. More formally, the stacked set of layers learns the residual, R(x) = H(x) - x, and the true mapping is then obtained through a skip connection: the input is added to the learned residual, giving R(x) + x. Also, batch normalization is applied right after each convolution and before activation:

Here is the full ResNet architecture compared to VGG-19. The dotted skip connections indicate an increase in dimensions; for the addition to remain valid, the shortcut input is zero-padded (or projected) to match the new dimensions. Increases in dimensions are also indicated by changes in color:

Types of RNN architectures

A recurrent neural network (RNN) is specialized for processing a sequence of values, such as x(1), . . . , x(t). We need sequence modeling if, say, we want to predict the next term in a sequence given its recent history, or translate a sequence of words in one language into another language. RNNs are distinguished from feedforward networks by the presence of a feedback loop in their architecture. It is often said that RNNs have memory: the sequential information is preserved in the RNN's hidden state, so the hidden layer is the memory of the network. In theory, RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps.
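The feedback loop can be made concrete with a minimal vanilla RNN cell. This is a hypothetical sketch with tiny, randomly initialized weights, not code from the book; it only shows how the hidden state carries information from one time step to the next:

```python
import numpy as np

rng = np.random.default_rng(42)

# Tiny dimensions, purely for illustration.
input_size, hidden_size = 3, 4

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the feedback loop)
b_h = np.zeros(hidden_size)

def rnn_step(x, h_prev):
    """One time step: the new hidden state mixes the current input
    with the previous hidden state, which is the network's memory."""
    return np.tanh(W_xh @ x + W_hh @ h_prev + b_h)

# Run a short sequence x(1), ..., x(5); h summarizes everything seen so far.
h = np.zeros(hidden_size)
sequence = rng.normal(size=(5, input_size))
for x in sequence:
    h = rnn_step(x, h)

print(h.shape)  # (4,)
```

Because W_hh multiplies the hidden state at every step, gradients are repeatedly squashed through tanh and the weight matrix, which is the mechanical reason the network struggles to look back more than a few steps.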
LSTMs

RNNs start losing historical context over time in the sequence, and hence are hard to train for practical purposes. This is where long short-term memory (LSTM) networks come into the picture! Introduced by Hochreiter and Schmidhuber in 1997, LSTMs can remember information from really long sequence-based data and prevent issues such as the vanishing gradient problem. LSTMs usually consist of three or four gates, including input, output, and forget gates. The following diagram shows a high-level representation of a single LSTM cell:

The input gate can usually allow or deny incoming signals or inputs to alter the memory cell state. The output gate usually propagates the value to other neurons as necessary. The forget gate controls the memory cell's self-recurrent connection to remember or forget previous states as necessary. Multiple LSTM cells are usually stacked in a deep learning network to solve real-world problems, such as sequence prediction.

Stacked LSTMs

If we want to learn the hierarchical representation of sequential data, a stack of LSTM layers can be used. Each LSTM layer outputs a sequence of vectors, rather than a single vector, for each item of the input sequence, and this output sequence is used as the input to the subsequent LSTM layer. This hierarchy of hidden layers enables a more complex representation of our sequential data. Stacked LSTM models can be used for modeling complex multivariate time series data.

Encoder-decoder – Neural Machine Translation

Machine translation is a subfield of computational linguistics concerned with translating text or speech from one language to another. Traditional machine translation systems typically rely on sophisticated feature engineering based on the statistical properties of text. Recently, deep learning has been used to solve this problem, with an approach known as Neural Machine Translation (NMT). An NMT system typically consists of two modules: an encoder and a decoder.
It first reads the source sentence using the encoder to build a thought vector: a sequence of numbers that represents the sentence's meaning. The decoder then processes the sentence vector to emit a translation into the target language. This is called an encoder-decoder architecture. The encoder and decoder are typically forms of RNN. The following diagram shows an encoder-decoder architecture using stacked LSTMs:

Source: tensorflow.org

The source code for NMT in TensorFlow is available on GitHub.

Gated Recurrent Units

Gated Recurrent Units (GRUs) are related to LSTMs: both use different ways of gating information to prevent the vanishing gradient problem and store long-term memory. A GRU has two gates: a reset gate, r, and an update gate, z, as shown in the following diagram. The reset gate determines how to combine the new input with the previous hidden state, h(t-1), and the update gate defines how much of the previous state information to keep. If we set the reset gate to all ones and the update gate to all zeros, we arrive at a simple RNN model. GRUs are computationally more efficient than LSTMs because of their simpler structure and fewer parameters:

Summary

This article covered various advances in neural network architectures, including autoencoder neural networks, variational autoencoders, CNN architectures, and RNN architectures. To understand how to simplify deep learning by taking supervised, unsupervised, and reinforcement learning to the next level using the Python ecosystem, check out the book Hands-On Transfer Learning with Python.

Neural Style Transfer: Creating artificial art with deep learning and transfer learning
Dr. Brandon explains ‘Transfer Learning’ to Jon
5 cool ways Transfer Learning is being used today

How to perform regression analysis using SAS

Gebin George
27 Feb 2018
7 min read
This article is an excerpt from the book Big Data Analysis with SAS, written by David Pope. The book will help you leverage the power of SAS for data management, analysis, and reporting. It contains practical use cases and real-world examples of predictive modeling, forecasting, optimizing, and reporting on your big data analysis using SAS.

Today, we will perform regression analysis using SAS in a step-by-step manner with a practical use case. Regression analysis is one of the earliest predictive techniques most people learn, because it can be applied across a wide variety of problems dealing with data that is related in linear and non-linear ways. Linear data is one of the easier use cases, and as such, PROC REG is a well-known and often-used procedure for predicting likely outcomes before they happen. The REG procedure provides extensive capabilities for fitting linear regression models that involve individual numeric independent variables. Many other procedures can also fit regression models, but they focus on more specialized forms of regression, such as robust regression, generalized linear regression, nonlinear regression, nonparametric regression, quantile regression, regression modeling of survey data, regression modeling of survival data, and regression modeling of transformed variables. The SAS/STAT procedures that can fit regression models include the ADAPTIVEREG, CATMOD, GAM, GENMOD, GLIMMIX, GLM, GLMSELECT, LIFEREG, LOESS, LOGISTIC, MIXED, NLIN, NLMIXED, ORTHOREG, PHREG, PLS, PROBIT, QUANTREG, QUANTSELECT, REG, ROBUSTREG, RSREG, SURVEYLOGISTIC, SURVEYPHREG, SURVEYREG, TPSPLINE, and TRANSREG procedures. Several procedures in SAS/ETS software also fit regression models.
SAS/STAT 14.2 / SAS/STAT User's Guide - Introduction to Regression Procedures - Overview: Regression Procedures (http://documentation.sas.com/?cdcId=statcdc&cdcVersion=14.2&docsetId=statug&docsetTarget=statug_introreg_sect001.htm&locale=en&showBanner=yes)

Regression analysis attempts to model the relationship between a response, or output, variable and a set of input variables. The response is the target variable, the variable one is trying to predict, while the input variables make up the parameters fed into the algorithm; they are used to derive the predicted value for the response variable.

PROC REG

One of the easiest ways to determine whether regression analysis is applicable to a question is to check whether the question has only two answers. For example, should a bank lend an applicant money, yes or no? This is known as a binary response, and as such, regression analysis can be applied to help determine the answer. In the following example, the reader will use the SASHELP.BASEBALL dataset to create a regression model that predicts the value of a baseball player's salary. The SASHELP.BASEBALL dataset contains salary and performance information for Major League Baseball players who played at least one game in both the 1986 and 1987 seasons, excluding pitchers. The salaries (Sports Illustrated, April 20, 1987) are for the 1987 season, and the performance measures are from 1986 (Collier Books, The 1987 Baseball Encyclopedia Update).

SAS/STAT 14.2 / SAS/STAT User's Guide - Example 99: Modeling Salaries of Major League Baseball Players (http://documentation.sas.com/?cdcId=statcdc&cdcVersion=14.2&docsetId=statug&docsetTarget=statug_reg_examples01.htm&locale=en&showBanner=yes)
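The response-versus-inputs idea that PROC REG implements can also be sketched outside SAS. The following Python snippet is not from the book; it uses NumPy (assumed to be available) and made-up data, and fits a linear model by ordinary least squares, which is conceptually what the REG procedure does:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: the response y depends linearly on two inputs plus noise.
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted coefficients recover the true ones closely.
print(np.round(coef, 1))  # approximately [ 3.   2.  -1.5]
```

The inputs play the role of the model statement's regressors, and the fitted coefficients are what SAS reports in its parameter estimates table.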
Let's first use PROC UNIVARIATE to learn something about this baseball data by submitting the following code:

proc univariate data=sashelp.baseball;
quit;

While reviewing the output, the reader will notice that the variance associated with logSalary, 0.79066, is much less than the variance associated with the actual target variable, Salary, 203508. In this case, it makes better sense to attempt to predict the logSalary value of a player instead of Salary. Write the following code in a SAS Studio program section and submit it:

proc reg data=sashelp.baseball;
   id name team league;
   model logSalary = nAtBat nHits nHome nRuns nRBI YrMajor CrAtBat CrHits CrHome CrRuns CrRbi;
quit;

Notice, as specified in the first output table, that 59 observations have missing values in at least one of the input variables; those observations are not used in the development of the regression model. The Root Mean Squared Error (RMSE) and R-square are statistics that typically inform the analyst how good the model is at predicting the target. R-square ranges from 0 to 1.0, with higher values typically indicating a better model (for RMSE, lower is better). A higher R-square typically indicates a better-performing model, but sometimes the model over-fits the data used to train it and so doesn't represent its true predictive power. Over-fitting can happen when an analyst doesn't have enough real-life data and chooses data, or a sample of data, that over-represents the target event; the resulting model then performs poorly on real-world input. Since several of the input variables appear to have little predictive power on the target, an analyst may decide to drop them, thereby reducing the amount of information needed to make a decent prediction. In this case, it appears we only need four input variables: YrMajor, nHits, nRuns, and nAtBat.
Modify the code as follows and submit it again:

proc reg data=sashelp.baseball;
   id name team league;
   model logSalary = YrMajor nHits nRuns nAtBat;
quit;

The p-value associated with each of the input variables gives the analyst insight into which variables have the biggest impact on predicting the target variable; the smaller the p-value, the higher the predictive value of the input variable. Both the RMSE and R-square values for this second model are slightly lower than for the original. However, the adjusted R-square value is slightly higher. In this case, an analyst may choose to use the second model, since it requires much less data and provides basically the same predictive power. Prior to accepting any model, an analyst should determine whether a few observations may be over-influencing the results, by investigating the influence and fit diagnostics. The default output from PROC REG provides this type of visual insight. The top-right plot, showing the externally studentized residuals (RStudent) by leverage values, reveals a few observations with high leverage that may be overly influencing the fit. To investigate this further, we will add a plots statement to our PROC REG to produce a labeled version of this plot. Type the following code in a SAS Studio program section and submit it:

proc reg data=sashelp.baseball plots(only label)=(RStudentByLeverage);
   id name team league;
   model logSalary = YrMajor nHits nRuns nAtBat;
quit;

Sure enough, there are three to five individuals whose input variables may have excessive influence on the fit of this model. Let's remove those points and see if the model improves.
Type this code in a SAS Studio program section and submit it:

proc reg data=sashelp.baseball plots=(residuals(smooth));
   where name NOT IN ("Mattingly, Don", "Henderson, Rickey", "Boggs, Wade", "Davis, Eric", "Rose, Pete");
   id name team league;
   model logSalary = YrMajor nHits nRuns nAtBat;
quit;

This change, by itself, has not improved the model; it has actually made the model worse, as can be seen from the R-square of 0.5592. However, the plots=(residuals(smooth)) option gives some insight as it pertains to YrMajor: players at the beginning and at the end of their careers tend to be paid less than others, as can be seen in Figure 4.12. To address this lack of fit, an analyst can use a polynomial of degree two for the YrMajor variable. Type the following code in a SAS Studio program section and submit it:

data work.baseball;
   set sashelp.baseball;
   where name NOT IN ("Mattingly, Don", "Henderson, Rickey", "Boggs, Wade", "Davis, Eric", "Rose, Pete");
   YrMajor2 = YrMajor*YrMajor;
run;

proc reg data=work.baseball;
   id name team league;
   model logSalary = YrMajor YrMajor2 nHits nRuns nAtBat;
quit;

After removing some outliers and adjusting for the YrMajor variable, the model's predictive power has improved significantly, as can be seen from the much improved R-square value of 0.7149. We have seen an effective way of performing regression analysis using the SAS platform. If you found our post useful, do check out the book Big Data Analysis with SAS to understand other data analysis models and perform them practically using SAS.

Testing a UI Using WebDriverJS

Packt
17 Feb 2015
30 min read
In this article by Enrique Amodeo, author of the book Learning Behavior-driven Development with JavaScript, we will look into an advanced concept: how to test a user interface. For this purpose, you will learn about the following topics:

Using WebDriverJS to manipulate a browser and inspect the resulting HTML generated by our UI
Organizing our UI codebase to make it easily testable
The right abstraction level for our UI tests

Our strategy for UI testing

There are two traditional strategies for approaching the problem of UI testing: record-and-replay tools and end-to-end testing. The first approach, record-and-replay, leverages tools capable of recording user activity in the UI and saving it into a script file. This script file can later be executed to perform exactly the same UI manipulation the user performed, and to check whether the results are exactly the same. This approach is not very compatible with BDD, for the following reasons:

We cannot test-first our UI. To be able to use the UI and record the user activity, we first need to have most of the code of our application in place. This is not a problem in the waterfall approach, where QA and testing are performed after the coding phase is finished. However, in BDD we aim to document the product features as automated tests, so we should write the tests before or during the coding.

The resulting test scripts are low-level and totally disconnected from the problem domain. There is no way to use them as live documentation for the requirements of the system.

The resulting test suite is brittle: it will stop working whenever we make slight changes, even cosmetic ones, to the UI. The problem is that these tools record the low-level interaction with the system, which depends on technical details of the HTML.
The other classic approach is end-to-end testing, where we test not only the UI layer, but also most of the system, or even the whole of it. To set up the tests, the most common approach is to substitute test doubles for the third-party systems. Normally, the database is under the control of the development team, so some practitioners use a regular database for the setup; however, we could use an in-memory database, or even mock the DAOs. In any case, this approach prompts us to create an integrated test suite, in which we are testing not only the correctness of the UI, but the business logic as well.

In the context of this discussion, an integrated test is a test that checks several layers of abstraction, or subsystems, in combination. Do not confuse it with the act of testing several classes or functions together.

This approach is not inherently against BDD; for example, we could use Cucumber.js to capture the features of the system and implement the Gherkin steps using WebDriver to drive the UI and make assertions. In fact, for most people, when you say BDD, they interpret the term to refer to this kind of test. The problem is that we will end up writing a lot of test cases, because we need to combine the scenarios from the business logic domain with the ones from the UI domain. Furthermore, in which language should we formulate the tests? If we use the UI language, it may be too low-level to easily describe business concepts. If we use the business domain language, we may not be able to test the important details of the UI, because they are too low-level. Alternatively, we can end up with tests that mix UI language with business terminology, and are therefore neither focused nor very clear to anyone.

Choosing the right tests for the UI

If we want to test whether the UI works, why should we test the business rules? After all, these are already tested in the BDD test suite of the business logic layer.
To decide which tests to write, we should first determine the responsibilities of the UI layer, which are as follows:

Presenting the information provided by the business layer to the user in a nice way.
Transforming user interaction into requests for the business layer.
Controlling the changes in the appearance of the UI components, which includes things such as enabling/disabling controls, highlighting entry fields, and showing/hiding UI elements.
Orchestrating the UI components. Transferring and adapting information between the UI components, and navigation between pages, fall under this category.

We do not need to write tests about business rules, and we should not assume much about the business layer itself, apart from a loose contract. How should we word our tests? We should use a UI-related language when we talk about what the user sees and does. Words such as fields, buttons, forms, links, click, hover, highlight, enable/disable, or show and hide are relevant in this context. However, we should not go too far; otherwise, our tests will be too brittle. Saying, for example, that the name field should have a pink border is too low-level: the moment the designer decides to use red instead of pink, or changes their mind and changes the background color instead of the border, our test will break. We should aim for tests that express the real intention of the user interface; for example: the name field should be highlighted as incorrect.

The testing architecture

At this point, we could write tests relevant to our UI using the following testing architecture:

A simple testing architecture for our UI

We can use WebDriver to issue user gestures to interact with the browser. These user gestures are transformed by the browser into DOM events, which are the inputs of our UI logic and will trigger operations on it. We can then use WebDriver again to read the resulting HTML in the assertions.
We can simply use a test double to impersonate our server, so we can set up our tests easily. This architecture is very simple and sounds like a good plan, but it is not! There are three main problems here:

UI testing is very slow. Take into account that the boot and shutdown phases can take 3 seconds on a normal laptop. Each UI interaction using WebDriver can take between 50 and 100 milliseconds, and the latency with the fake server can add an extra 10 milliseconds. This gives us only around 10 tests per second, plus an extra 3 seconds of overhead.

UI tests are complex and difficult to diagnose when they fail. What is failing? The selectors we used to tell WebDriver how to find the relevant elements? Some race condition we were not aware of? A cross-browser issue? Also note that our test is now distributed between two different processes, a fact that always makes debugging more difficult.

UI tests are inherently brittle. We can try to make them less brittle with best practices, but even then, a change in the structure of the HTML code will sometimes break our tests. This is a bad thing, because the UI often changes more frequently than the business layer.

As UI testing is risky and expensive, we should try to write as few tests that interact with the UI as possible. We can achieve this, without losing testing power, with the following testing architecture:

A smarter testing architecture

We have now split our UI layer into two components: the view and the UI logic. This design aligns with the family of MV* design patterns. In the context of this article, the view corresponds to a passive view, and the UI logic corresponds to the controller or the presenter, in combination with the model. A passive view is usually very hard to test, so in this article we will focus mostly on how to do that. You will often be able to easily separate the passive view from the UI logic, especially if you are using an MV* pattern, such as MVC, MVP, or MVVM.
Most of our tests will be for the UI logic. This is the component that implements the client-side validation, the orchestration of UI components, navigation, and so on. It is the UI logic component that holds all the rules about how the user can interact with the UI, and hence it needs to maintain some kind of internal state. The UI logic component can be tested completely in memory, using standard techniques: we can simply mock the XMLHttpRequest object, or the corresponding object in the framework we are using, and test everything in memory using a single Node.js process. No interaction with the browser and the HTML is needed, so these tests will be blazingly fast and robust.

Then we need to test the view. This is a very thin component, with only two responsibilities:

Manipulating and updating the HTML to present information to the user, whenever it is instructed to do so by the UI logic component
Listening for HTML events and transforming them into suitable requests for the UI logic component

The view should not have more responsibilities; it is a stateless component. It does not need to store internal state, because it only transforms and transmits information between the HTML and the UI logic. Since it is the only component that interacts with the HTML, it is the only one that needs to be tested using WebDriver. The point of all this is that the view can be tested with only a handful of tests that are conceptually simple. Hence, we minimize the number and complexity of the tests that need to interact with the UI.

WebDriverJS

Testing the passive view layer is a technical challenge. We not only need to find a way for our test to inject native events into the browser to simulate user interaction, but we also need to be able to inspect the DOM elements and inject and execute scripts. This was very challenging to do approximately 5 years ago.
In fact, it was considered complex and expensive, and some practitioners recommended not to test the passive view. After all, this layer is very thin and mostly contains the bindings of the UI to the HTML DOM, so the risk of error is not supposed to be high, especially if we use modern cross-browser frameworks to implement this layer. Nonetheless, nowadays the technology has evolved, and we can do this kind of testing without much fuss if we use the right tools. One of these tools is Selenium 2.0 (also known as WebDriver) and its library for JavaScript, which is WebDriverJS (https://code.google.com/p/selenium/wiki/WebDriverJs). In this book, we will use WebDriverJS, but there are other bindings in JavaScript for Selenium 2.0, such as WebDriverIO (http://webdriver.io/). You can use the one you like most or even try both. The point is that the techniques I will show you here can be applied with any client of WebDriver or even with other tools that are not WebDriver. Selenium 2.0 is a tool that allows us to make direct calls to a browser automation API. This way, we can simulate native events, we can access the DOM, and we can control the browser. Each browser provides a different API and has its own quirks, but Selenium 2.0 will offer us a unified API called the WebDriver API. This allows us to interact with different browsers without changing the code of our tests. As we are accessing the browser directly, we do not need a special server, unless we want to control browsers that are on a different machine. Actually, this is only true, due to some technical limitations, if we want to test against a Google Chrome or a Firefox browser using WebDriverJS.
So, basically, the testing architecture for our passive view looks like this:

Testing with WebDriverJS

We can see that we use WebDriverJS for the following:

Sending native events to manipulate the UI, as if we were the user, during the action phase of our tests

Inspecting the HTML during the assert phase of our tests

Sending small scripts to set up the test doubles, check them, and invoke the update method of our passive view

Apart from this, we need some extra infrastructure, such as a web server that serves our test HTML page and the components we want to test. As is evident from the diagram, the commands of WebDriverJS require some network traffic to be able to send the appropriate request to the browser automation API, wait for the browser to execute, and get the result back through the network. This forces the API of WebDriverJS to be asynchronous in order not to block unnecessarily. That is why WebDriverJS has an API designed around promises. Most of the methods will return a promise or an object whose methods return promises. This plays perfectly well with Mocha and Chai. There is a W3C specification for the WebDriver API. If you want to have a look, just visit https://dvcs.w3.org/hg/webdriver/raw-file/default/webdriver-spec.html. The API of WebDriverJS is a bit complex, and you can find its official documentation at http://selenium.googlecode.com/git/docs/api/javascript/module_selenium-webdriver.html. However, to follow this article, you do not need to read it, since I will now show you the most important API that WebDriverJS offers us.

Finding and interacting with elements

It is very easy to find an HTML element using WebDriverJS; we just need to use either the findElement or the findElements methods. Both methods receive a locator object specifying which element or elements to find. The first method will return the first element it finds, or simply fail with an exception if there are no elements matching the locator.
The findElements method will return a promise for an array with all the matching elements. If there are no matching elements, the promised array will be empty and no error will be thrown. How do we specify which elements we want to find? To do so, we need to use a locator object as a parameter. For example, if we would like to find the element whose identifier is order_item1, then we could use the following code:

var By = require('selenium-webdriver').By;
driver.findElement(By.id('order_item1'));

We need to import the selenium-webdriver module and capture its locator factory object. By convention, we store this locator factory in a variable called By. Later, we will see how we can get a WebDriverJS instance. This code is very expressive, but a bit verbose. There is another version of this:

driver.findElement({ id: 'order_item1' });

Here, the locator criteria are passed in the form of a plain JSON object. There is no need to use the By object or any factory. Which version is better? Neither. You just use the one you like most. In this article, the plain JSON locator will be used. The following are the criteria for finding elements:

Using the tag name, for example, to locate all the <li> elements in the document:

driver.findElements(By.tagName('li'));
driver.findElements({ tagName: 'li' });

We can also locate using the name attribute. It can be handy to locate the input fields. The following code will locate the first element named password:

driver.findElement(By.name('password'));
driver.findElement({ name: 'password' });

Using the class name; for example, the following code will locate the first element that contains a class called item:

driver.findElement(By.className('item'));
driver.findElement({ className: 'item' });

We can use any CSS selector that our target browser understands.
If the target browser does not understand the selector, it will throw an exception; for example, to find the second item of an order (assuming there is only one order on the page):

driver.findElement(By.css('.order .item:nth-of-type(2)'));
driver.findElement({ css: '.order .item:nth-of-type(2)' });

You can locate any element using only CSS selectors, and that is the approach I recommend. The other criteria can be very handy in specific situations. There are more ways of locating elements, such as linkText, partialLinkText, or xpath, but I seldom use them. Locating elements by their text, such as in linkText or partialLinkText, is brittle because small changes in the wording of the text can break the tests. Also, locating by xpath is not as useful in HTML as using a CSS selector. Obviously, it can be used if the UI is defined as an XML document, but this is very rare nowadays. In both methods, findElement and findElements, the resulting HTML elements are wrapped as a WebElement object. This object allows us to send an event to that element or inspect its contents. Some of its methods that allow us to manipulate the DOM are as follows:

clear(): This will do nothing unless WebElement represents an input control. In this case, it will clear its value and then trigger a change event. It returns a promise that will be fulfilled whenever the operation is done.

sendKeys(text or key, …): This will do nothing unless WebElement is an input control. In this case, it will send the equivalents of keyboard events to the parameters we have passed. It can receive one or more parameters with a text or key object. If it receives a text, it will transform the text into a sequence of keyboard events. This way, it will simulate a user typing on a keyboard. This is more realistic than simply changing the value property of an input control, since the proper keyDown, keyPress, and keyUp events will be fired. A promise is returned that will be fulfilled when all the key events are issued.
For example, to simulate that a user enters some search text in an input field and then presses Enter, we can use the following code:

var Key = require('selenium-webdriver').Key;
var searchField = driver.findElement({name: 'searchTxt'});
searchField.sendKeys('BDD with JS', Key.ENTER);

The webdriver.Key object allows us to specify any key that does not represent a character, such as Enter, the up arrow, Command, Ctrl, Shift, and so on. We can also use its chord method to represent a combination of several keys pressed at the same time. For example, to simulate Alt + Command + J, use driver.sendKeys(Key.chord(Key.ALT, Key.COMMAND, 'J'));.

click(): This will issue a click event just in the center of the element. The returned promise will be fulfilled when the event is fired. Sometimes, the center of an element is nonclickable, and an exception is thrown! This can happen, for example, with table rows, since the center of a table row may just be the padding between cells!

submit(): This will look for the form that contains this element and will issue a submit event.

Apart from sending events to an element, we can inspect its contents with the following methods:

getId(): This will return a promise with the internal identifier of this element used by WebDriver. Note that this is not the value of the DOM ID property!

getText(): This will return a promise that will be fulfilled with the visible text inside this element. It will include the text in any child element and will trim the leading and trailing whitespaces. Note that, if this element is not displayed or is hidden, the resulting text will be an empty string!

getInnerHtml() and getOuterHtml(): These will return a promise that will be fulfilled with a string that contains innerHTML or outerHTML of this element.

isSelected(): This will return a promise with a Boolean that determines whether the element has either been selected or checked. This method is designed to be used with the <option> elements.
isEnabled(): This will return a promise with a Boolean that determines whether the element is enabled or not.

isDisplayed(): This will return a promise with a Boolean that determines whether the element is displayed or not. Here, "displayed" is taken in a broad sense; in general, it means that the user can see the element without resizing the browser. For example, if the element is hidden, has display: none, has no size, or is in an inaccessible part of the document, the returned promise will be fulfilled as false.

getTagName(): This will return a promise with the tag name of the element.

getSize(): This will return a promise with the size of the element. The size comes as a JSON object with width and height properties that indicate the height and width in pixels of the bounding box of the element. The bounding box includes padding, margin, and border.

getLocation(): This will return a promise with the position of the element. The position comes as a JSON object with x and y properties that indicate the coordinates in pixels of the element relative to the page.

getAttribute(name): This will return a promise with the value of the specified attribute. Note that WebDriver does not distinguish between attributes and properties! If there is neither an attribute nor a property with that name, the promise will be fulfilled as null. If the attribute is a "boolean" HTML attribute (such as checked or disabled), the promise will be evaluated as true only if the attribute is present. If there is both an attribute and a property with the same name, the attribute value will be used. If you really need to be precise about getting an attribute or a property, it is much better to use an injected script to get it.

getCssValue(cssPropertyName): This will return a promise with a string that represents the computed value of the specified CSS property.
The computed value is the resulting value after the browser has applied all the CSS rules and the style and class attributes. Note that the specific representation of the value depends on the browser; for example, the color property can be returned as red, #ff0000, or rgb(255, 0, 0) depending on the browser. This is not cross-browser, so we should avoid this method in our tests.

findElement(locator) and findElements(locator): These will return an element, or all the elements, that are descendants of this element and match the locator.

isElementPresent(locator): This will return a promise with a Boolean that indicates whether there is at least one descendant element that matches this locator.

As you can see, the WebElement API is pretty simple and allows us to do most of our tests easily. However, what if we need to perform some complex interaction with the UI, such as drag-and-drop?

Complex UI interaction

WebDriverJS allows us to define a complex action gesture in an easy way using the DSL defined in the webdriver.ActionSequence object. This DSL allows us to define any sequence of browser events using the builder pattern. For example, to simulate a drag-and-drop gesture, proceed with the following code:

var beverageElement = driver.findElement({ id: 'expresso' });
var orderElement = driver.findElement({ id: 'order' });
driver.actions()
    .mouseMove(beverageElement)
    .mouseDown()
    .mouseMove(orderElement)
    .mouseUp()
    .perform();

We want to drag an espresso to our order, so we move the mouse to the center of the espresso and press the mouse. Then, we move the mouse, by dragging the element, over the order. Finally, we release the mouse button to drop the espresso. We can add as many actions as we want, but the sequence of events will not be executed until we call the perform method. The perform method will return a promise that will be fulfilled when the full sequence is finished.
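The builder-plus-deferred-execution style of this DSL is easy to picture with a toy implementation: each builder method only records a step and returns this, and nothing happens until perform() is called. This is an illustrative sketch, not WebDriverJS internals:

```javascript
// A toy ActionSequence: builder methods queue steps; perform() runs them.
function ToyActionSequence() {
  this.steps = [];
}
ToyActionSequence.prototype.mouseMove = function (target) {
  this.steps.push('mouseMove:' + target);
  return this; // returning `this` is what makes the chaining possible
};
ToyActionSequence.prototype.mouseDown = function () {
  this.steps.push('mouseDown');
  return this;
};
ToyActionSequence.prototype.mouseUp = function () {
  this.steps.push('mouseUp');
  return this;
};
ToyActionSequence.prototype.perform = function () {
  // Only now is the recorded sequence "executed", in order.
  return Promise.resolve(this.steps.join(' -> '));
};

new ToyActionSequence()
  .mouseMove('expresso')
  .mouseDown()
  .mouseMove('order')
  .mouseUp()
  .perform()
  .then(function (trace) {
    console.log(trace);
    // mouseMove:expresso -> mouseDown -> mouseMove:order -> mouseUp
  });
```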
The webdriver.ActionSequence object has the following methods:

sendKeys(keys...): This sends a sequence of key events, exactly as we saw earlier for the method with the same name in WebElement. The difference is that the keys will be sent to the document instead of a specific element.

keyUp(key) and keyDown(key): These send the keyUp and keyDown events. Note that these methods only admit the modifier keys: Alt, Ctrl, Shift, command, and meta.

mouseMove(targetLocation, optionalOffset): This will move the mouse from the current location to the target location. The location can be defined either as a WebElement or as page-relative coordinates in pixels, using a JSON object with x and y properties. If we provide the target location as a WebElement, the mouse will be moved to the center of the element. In this case, we can override this behavior by supplying an extra optional parameter indicating an offset relative to the top-left corner of the element. This could be needed in the case that the center of the element cannot receive events.

mouseDown(), click(), doubleClick(), and mouseUp(): These will issue the corresponding mouse events. All of these methods can receive zero, one, or two parameters. Let's see what they mean with the following examples:

var Button = require('selenium-webdriver').Button;

// to emit the event in the center of the expresso element
driver.actions().mouseDown(expresso).perform();
// to make a right click in the current position
driver.actions().click(Button.RIGHT).perform();
// Middle click in the expresso element
driver.actions().click(expresso, Button.MIDDLE).perform();

The webdriver.Button object defines the three possible buttons of a mouse: LEFT, RIGHT, and MIDDLE. However, note that mouseDown() and mouseUp() only support the LEFT button!

dragAndDrop(element, location): This is a shortcut to performing a drag-and-drop of the specified element to the specified location.
Again, the location can be a WebElement or a page-relative coordinate.

Injecting scripts

We can use WebDriver to execute scripts in the browser and then wait for their results. There are two methods for this: executeScript and executeAsyncScript. Both methods receive a script and an optional list of parameters and send the script and the parameters to the browser to be executed. They return a promise that will be fulfilled with the result of the script; it will be rejected if the script failed. An important detail is how the script and its parameters are sent to the browser. For this, they need to be serialized and sent through the network. Once there, they will be deserialized, and the script will be executed inside an autoexecuted function that will receive the parameters as arguments. As a result of this, our scripts cannot access any variable in our tests, unless they are explicitly sent as parameters. The script is executed in the browser with the window object as its execution context (the value of this). When passing parameters, we need to take into consideration the kind of data that WebDriver can serialize. This data includes the following:

Booleans, strings, and numbers.

The null and undefined values. However, note that undefined will be translated as null.

Any function will be transformed to a string that contains only its body.

A WebElement object will be received as a DOM element. So, it will not have the methods of WebElement but the standard DOM methods instead. Conversely, if the script results in a DOM element, it will be received as a WebElement in the test.

Arrays and objects will be converted to arrays and objects whose elements and properties have been converted using the preceding rules.
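A rough way to see why our scripts cannot access test variables is to simulate the round trip ourselves: only the body of the function travels, as a string, and it is rebuilt on the other side with nothing but its arguments. This is a toy simulation, not the real wire protocol:

```javascript
// Toy simulation of executeScript: serialize the function body to a
// string, rebuild it elsewhere, and run it with only the given arguments.
function fakeExecuteScript(fn) {
  var args = Array.prototype.slice.call(arguments, 1);
  var src = fn.toString();
  var body = src.slice(src.indexOf('{') + 1, src.lastIndexOf('}'));
  var rebuilt = new Function(body); // "deserialized" on the other side
  return Promise.resolve(rebuilt.apply(null, args));
}

fakeExecuteScript(function () {
  // Variables from the test file did not travel with this body;
  // only the arguments pseudoarray is available here.
  return arguments[0] + arguments[1];
}, 2, 3).then(function (result) {
  console.log(result); // 5
});
```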
With this in mind, we could, for example, retrieve the identifier of an element, such as the following one:

var elementSelector = ".order ul > li";
driver.executeScript(
    "return document.querySelector(arguments[0]).id;",
    elementSelector
).then(function(id) {
  expect(id).to.be.equal('order_item0');
});

Notice that the script is specified as a string with the code. This can be a bit awkward, so there is an alternative available:

var elementSelector = ".order ul > li";
driver.executeScript(function() {
    var selector = arguments[0];
    return document.querySelector(selector).id;
}, elementSelector).then(function(id) {
  expect(id).to.be.equal('order_item0');
});

WebDriver will just convert the body of the function to a string and send it to the browser. Since the script is executed in the browser, we cannot access the elementSelector variable, and we need to access it through parameters. Unfortunately, we are forced to retrieve the parameters using the arguments pseudoarray, because WebDriver has no way of knowing the name of each argument. As its name suggests, executeAsyncScript allows us to execute an asynchronous script. In this case, the last argument provided to the script is always a callback that we need to call to signal that the script has finalized. The result of the script will be the first argument provided to that callback. If no argument or undefined is explicitly provided, then the result will be null. Note that this is not directly compatible with the Node.js callback convention and that any extra parameters passed to the callback will be ignored. There is no way to explicitly signal an error in an asynchronous way.
For example, if we want to return the value of an asynchronous DAO, then proceed with the following code:

driver.executeAsyncScript(function() {
  var cb = arguments[1],
      userId = arguments[0];
  window.userDAO.findById(userId).then(cb, cb);
}, 'user1').then(function(userOrError) {
  expect(userOrError).to.be.equal(expectedUser);
});

Command control flows

All the commands in WebDriverJS are asynchronous and return a promise or a WebElement. How do we execute an ordered sequence of commands? Well, using promises, it could look something like this:

return driver.findElement({name:'quantity'}).sendKeys('23')
    .then(function() {
      return driver.findElement({name:'add'}).click();
    })
    .then(function() {
      return driver.findElement({css:firstItemSel}).getText();
    })
    .then(function(quantity) {
      expect(quantity).to.be.equal('23');
    });

This works because we wait for each command to finish before issuing the next command. However, it is a bit verbose. Fortunately, with WebDriverJS we can do the following:

driver.findElement({name:'quantity'}).sendKeys('23');
driver.findElement({name:'add'}).click();
return expect(driver.findElement({css:firstItemSel}).getText())
    .to.eventually.be.equal('23');

How can the preceding code work? Because whenever we tell WebDriverJS to do something, it simply schedules the requested command in a queue-like structure called the control flow. The point is that each command will not be executed until it reaches the top of the queue. This way, we do not need to explicitly wait for the sendKeys command to be completed before executing the click command. The sendKeys command is scheduled in the control flow before click, so the latter one will not be executed until sendKeys is done. All the commands are scheduled against the same control flow queue that is associated with the WebDriver object.
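The queue-like behavior is simple to sketch: if every scheduled command is chained onto the tail of one internal promise, the calls return immediately but the commands still run strictly in order. This is, again, a simplification of the real control flow:

```javascript
// A toy control flow: commands are queued and executed one after another.
function ToyControlFlow() {
  this.queue = Promise.resolve();
}
ToyControlFlow.prototype.schedule = function (command) {
  // Chain onto the tail of the queue and return a promise for this command.
  this.queue = this.queue.then(command);
  return this.queue;
};

var flow = new ToyControlFlow();
var log = [];
flow.schedule(function () { log.push('sendKeys'); });
flow.schedule(function () { log.push('click'); });
flow.schedule(function () { return log.join(' -> '); }).then(function (trace) {
  console.log(trace); // sendKeys -> click
});
```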
However, we can optionally create several control flows if we want to execute commands in parallel:

var flow1 = webdriver.promise.createFlow(function() {
  var driver = new webdriver.Builder().build();
  // do something with driver here
});
var flow2 = webdriver.promise.createFlow(function() {
  var driver = new webdriver.Builder().build();
  // do something with driver here
});
webdriver.promise.fullyResolved([flow1, flow2]).then(function() {
  // Wait for flow1 and flow2 to finish and do something
});

We need to create each control flow instance manually and, inside each flow, create a separate WebDriver instance. The commands in both flows will be executed in parallel, and we can wait for both of them to be finalized to do something else using fullyResolved. In fact, we can even nest flows if needed to create a custom parallel command-execution graph.

Taking screenshots

Sometimes, it is useful to take some screenshots of the current screen for debugging purposes. This can be done with the takeScreenshot() method. This method will return a promise that will be fulfilled with a string that contains a base-64 encoded PNG. It is our responsibility to save this string as a PNG file. The following snippet of code will do the trick:

driver.takeScreenshot()
    .then(function(shot) {
      fs.writeFileSync(fileFullPath, shot, 'base64');
    });

Note that not all browsers support this capability. Read the documentation for the specific browser adapter to see if it is available.

Working with several tabs and frames

WebDriver allows us to control several tabs, or windows, for the same browser. This can be useful if we want to test several pages in parallel or if our test needs to assert or manipulate things in several frames at the same time. This can be done with the switchTo() method that will return a webdriver.WebDriver.TargetLocator object. This object allows us to change the target of our commands to a specific frame or window.
It has the following three main methods:

frame(nameOrIndex): This will switch to a frame with the specified name or index. It will return a promise that is fulfilled when the focus has been changed to the specified frame. If we specify the frame with a number, this will be interpreted as a zero-based index in the window.frames array.

window(windowName): This will switch focus to the window named as specified. The returned promise will be fulfilled when it is done.

alert(): This will switch the focus to the active alert window. We can dismiss an alert with driver.switchTo().alert().dismiss();.

The promise returned by these methods will be rejected if the specified window, frame, or alert window is not found. To run tests on several tabs at the same time, we must ensure that they do not share any kind of state, or interfere with each other through cookies, local storage, or any other kind of mechanism.

Summary

This article showed us that a good way to test the UI of an application is actually to split it into two parts and test them separately. One part is the core logic of the UI that takes responsibility for control logic, models, calls to the server, validations, and so on. This part can be tested in a classic way, using BDD, and mocking the server access. No new techniques are needed for this, and the tests will be fast. Here, we can involve nonengineer stakeholders, such as UX designers, users, and so on, to write some nice BDD features using Gherkin and Cucumber.js. The other part is a thin view layer that follows a passive view design. It only updates the HTML when it is asked to, and listens to DOM events to transform them into requests for the core logic UI layer. This layer has no internal state or control rules; it simply transforms data and manipulates the DOM. We can use WebDriverJS to test the view.
This is a good approach because the most complex part of the UI can easily be fully test-driven, while the view, which is hard and slow to test, needs only a few tests because it is so simple. In this sense, the passive view should not have a state; it should only act as a proxy of the DOM. Resources for Article: Further resources on this subject: Dart With Javascript [article] Behavior-Driven Development With Selenium WebDriver [article] Event-Driven Programming [article]

Packt
19 Feb 2016
8 min read

How is Python code organized

Python is an easy-to-learn yet powerful programming language. It has efficient high-level data structures and an effective approach to object-oriented programming. Let's talk a little bit about how Python code is organized. In this paragraph, we'll start going down the rabbit hole a little bit more and introduce some more technical names and concepts. Starting with the basics, how is Python code organized? Of course, you write your code into files. When you save a file with the extension .py, that file is said to be a Python module. If you're on Windows or Mac, which typically hide file extensions from the user, please make sure you change the configuration so that you can see the complete name of the files. This is not strictly a requirement, but a hearty suggestion. It would be impractical to save all the code that is required for software to work within one single file. That solution works for scripts, which are usually not longer than a few hundred lines (and often they are quite a bit shorter than that). A complete Python application can be made of hundreds of thousands of lines of code, so you will have to scatter it through different modules. Better, but not nearly good enough. It turns out that even like this it would still be impractical to work with the code. So Python gives you another structure, called a package, which allows you to group modules together. A package is nothing more than a folder, which must contain a special file, __init__.py, that doesn't need to hold any code but whose presence is required to tell Python that the folder is not just some folder, but it's actually a package (note that as of Python 3.3, __init__.py is not strictly required any more). As always, an example will make all of this much clearer.
I have created an example structure in my project, and when I type in my Linux console:

$ tree -v example

Here's what the structure of a really simple application could look like:

example/
├── core.py
├── run.py
└── util
    ├── __init__.py
    ├── db.py
    ├── math.py
    └── network.py

You can see that within the root of this example, we have two modules, core.py and run.py, and one package: util. Within core.py, there may be the core logic of our application. On the other hand, within the run.py module, we can probably find the logic to start the application. Within the util package, I expect to find various utility tools, and in fact, we can guess that the modules there are called by the type of tools they hold: db.py would hold tools to work with databases, math.py would of course hold mathematical tools (maybe our application deals with financial data), and network.py would probably hold tools to send/receive data on networks. As explained before, the __init__.py file is there just to tell Python that util is a package and not just a mere folder. Had this software been organized within modules only, it would have been much harder to infer its structure. I put a module-only example under the ch1/files_only folder, see it for yourself:

$ tree -v files_only

This shows us a completely different picture:

files_only/
├── core.py
├── db.py
├── math.py
├── network.py
└── run.py

It is a little harder to guess what each module does, right? Now, consider that this is just a simple example, so you can guess how much harder it would be to understand a real application if we couldn't organize the code in packages and modules.

How do we use modules and packages

When a developer is writing an application, it is very likely that they will need to apply the same piece of logic in different parts of it. For example, when writing a parser for the data that comes from a form that a user can fill in a web page, the application will have to validate whether a certain field is holding a number or not.
Regardless of how the logic for this kind of validation is written, it's very likely that it will be needed in more than one place. For example, in a poll application, where the user is asked many questions, it's likely that several of them will require a numeric answer. For example:

What is your age
How many pets do you own
How many children do you have
How many times have you been married

It would be very bad practice to copy-paste (or, more properly said: duplicate) the validation logic in every place where we expect a numeric answer. This would violate the DRY (Don't Repeat Yourself) principle, which states that you should never repeat the same piece of code more than once in your application. I feel the need to stress the importance of this principle: you should never repeat the same piece of code more than once in your application (got the irony?). There are several reasons why repeating the same piece of logic can be very bad, the most important ones being:

There could be a bug in the logic, and therefore, you would have to correct it in every place that logic is applied.

You may want to amend the way you carry out the validation, and again you would have to change it in every place it is applied.

You may forget to fix/amend a piece of logic because you missed it when searching for all its occurrences. This would leave wrong/inconsistent behavior in your application.

Your code would be longer than needed, for no good reason.

Python is a wonderful language and provides you with all the tools you need to apply all the coding best practices. For this particular example, we need to be able to reuse a piece of code. To be able to reuse a piece of code, we need to have a construct that will hold the code for us so that we can call that construct every time we need to repeat the logic inside it. That construct exists, and it's called a function.
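For the poll above, such a reusable construct could look like the following sketch (the function name and the rule that answers must be non-negative integers are our own choices for the example):

```python
def parse_non_negative_int(raw):
    """Validate one poll answer; written once, reused for every question."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return None
    return value if value >= 0 else None

# The same logic applied to all the numeric questions, with no duplication.
answers = {'age': '34', 'pets': 'two', 'children': '-1'}
validated = {field: parse_non_negative_int(raw) for field, raw in answers.items()}
print(validated)  # {'age': 34, 'pets': None, 'children': None}
```

If the validation rule ever changes, there is exactly one place to amend.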
I'm not going too deep into the specifics here, so please just remember that a function is a block of organized, reusable code which is used to perform a task. Functions can assume many forms and names, according to what kind of environment they belong to, but for now this is not important. Functions are the building blocks of modularity in your application, and they are almost indispensable (unless you're writing a super simple script, you'll use functions all the time). Python comes with a very extensive library, as I already said a few pages ago. Now, maybe it's a good time to define what a library is: a library is a collection of functions and objects that provide functionalities that enrich the abilities of a language. For example, within Python's math library we can find a plethora of functions, one of which is the factorial function, which of course calculates the factorial of a number. In mathematics, the factorial of a non-negative integer number N, denoted as N!, is defined as the product of all positive integers less than or equal to N. For example, the factorial of 5 is calculated as:

5! = 5 * 4 * 3 * 2 * 1 = 120

The factorial of 0 is 0! = 1, to respect the convention for an empty product. So, if you wanted to use this function in your code, all you would have to do is to import it and call it with the right input values. Don't worry too much if input values and the concept of calling is not very clear for now, please just concentrate on the import part. We use a library by importing what we need from it, and then we use it. In Python, to calculate the factorial of number 5, we just need the following code:

>>> from math import factorial
>>> factorial(5)
120

Whatever we type in the shell, if it has a printable representation, will be printed on the console for us (in this case, the result of the function call: 120). So, let's go back to our example, the one with core.py, run.py, util, and so on. In our example, the package util is our utility library.
Our custom utility belt that holds all those reusable tools (that is, functions) which we need in our application. Some of them will deal with databases (db.py), some with the network (network.py), and some will perform mathematical calculations (math.py) that are outside the scope of Python's standard math library and therefore had to be coded by ourselves.

Summary

In this article, we started to explore the world of programming and that of Python. We saw how Python code can be organized using modules and packages.

For more information on Python, refer to the following books recommended by Packt Publishing:

Learning Python (https://www.packtpub.com/application-development/learning-python)
Python 3 Object-oriented Programming - Second Edition (https://www.packtpub.com/application-development/python-3-object-oriented-programming-second-edition)
Python Essentials (https://www.packtpub.com/application-development/python-essentials)

Further resources on this subject:

Test all the things with Python [article]
Putting the Fun in Functional Python [article]
Scraping the Web with Python - Quick Start [article]
How to combine data files within IBM SPSS Modeler

Amey Varangaonkar
22 Feb 2018
6 min read
The following extract is taken from the book IBM SPSS Modeler Essentials, written by Keith McCormick and Jesus Salcedo. SPSS Modeler is one of the popularly used enterprise tools for data mining and predictive analytics.

In this article, we will explore how SPSS Modeler can be effectively used to combine different file types for efficient data modeling. In many organizations, different pieces of information for the same individuals are held in separate locations. To be able to analyze such information within Modeler, the data files must be combined into one single file. The Merge node joins two or more data sources so that information held for an individual in different locations can be analyzed collectively. The following diagram shows how the Merge node can be used to combine two separate data files that contain different types of information:

Like the Append node, the Merge node is found in the Record Ops palette. This node takes multiple data sources and creates a single source containing all or some of the input fields. Let's go through an example of how to use the Merge node to combine data files:

1. Open the Merge stream. The Merge stream contains the files we previously appended, as well as the main data file we were working with in earlier chapters.
2. Place a Merge node from the Record Ops palette on the canvas.
3. Connect the last Reclassify node to the Merge node.
4. Connect the Filter node to the Merge node.

Like the Append node, the order in which data sources are connected to the Merge node impacts the order in which the sources are displayed. The fields of the first source connected to the Merge node will appear first, followed by the fields of the second source connected to the Merge node, and so on.

5. Connect the Merge node to a Table node.
6. Edit the Merge node.
Since the Merge node can cope with a variety of different situations, the Merge tab allows you to specify the merging method. There are four methods for merging:

Order: Joins the first record in the first dataset with the first record in the second dataset, and so on. If any of the datasets run out of records, no further output records are produced. This method can be dangerous if any cases happen to be missing from a file, or if files have been sorted differently.
Keys: The most commonly used method, in which records that have the same value in the field(s) defined as the key are merged. If multiple records contain the same value on the key field, all possible merges are returned.
Condition: Joins records from files that meet a specified condition.
Ranked condition: Specifies whether each row pairing in the primary dataset and all secondary datasets is to be merged; use the ranking expression to sort any multiple matches from low to high order.

Let's combine these files. To do this:

1. Set Merge Method to Keys. Fields contained in all input sources appear in the Possible keys list. To identify one or more fields as the key field(s), move the selected field into the Keys for merge list. In our case, there are two fields that appear in both files, ID and Year.
2. Select ID in the Possible keys list and place it into the Keys for merge list.

There are five major methods of merging using a key field:

Include only matching records (inner join) merges only complete records, that is, records that are available in all datasets.
Include matching and non-matching records (full outer join) merges records that appear in any of the datasets; that is, the incomplete records are still retained. The undefined value ($null$) is added to the missing fields and included in the output.
Include matching and selected non-matching records (partial outer join) performs left and right outer joins.
All records from the specified file are retained, along with only those records from the other file(s) that match records in the specified file on the key field(s). The Select... button allows you to designate which file is to contribute incomplete records. Include records in first dataset not matching any others (anti-join) provides an easy way of identifying records in a dataset that do not have records with the same key values in any of the other datasets involved in the merge. This option only retains records from the dataset that match with no other records. Combine duplicate key fields is the final option in this dialog, and it deals with the problem of duplicate field names (one from each dataset) when key fields are used. This option ensures that there is only one output field with a given name, and this is enabled by default. The Filter tab The Filter tab lists the data sources involved in the merge, and the ordering of the sources determines the field ordering of the merged data. Here, you can rename and remove fields. Earlier, we saw that the field Year appeared in both datasets; here we can remove one version of this field (we could also rename one version of the field to keep both): Click on the arrow next to the second Year field: The second Year field will no longer appear in the combined data file. The Optimization tab The Optimization tab provides two options that allow you to merge data more efficiently when one input dataset is significantly larger than the other datasets, or when the data is already presorted by all or some of the key fields that you are using to merge: Click OK. Run the Table: All of these files have now been combined. The resulting table should have 44 fields and 143,531 records. We saw how the Merge node is used to join data files that contain different information for the same records. 
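The key-based merge methods above map directly onto SQL joins. As a hedged illustration (the table and column names below are invented, and this is plain SQL rather than anything SPSS Modeler generates), here is how an inner join and a partial outer (left) join on an ID key behave:

```python
import sqlite3

# Two toy "data files" sharing an ID key, as in the Merge node example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (id INTEGER, age INTEGER)")
conn.execute("CREATE TABLE purchases (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO survey VALUES (?, ?)", [(1, 34), (2, 51), (3, 28)])
conn.executemany("INSERT INTO purchases VALUES (?, ?)", [(1, 9.99), (3, 24.5)])

# Inner join: only records present in both sources survive.
inner = conn.execute(
    "SELECT s.id, s.age, p.amount FROM survey s "
    "JOIN purchases p ON s.id = p.id ORDER BY s.id").fetchall()
print(inner)  # [(1, 34, 9.99), (3, 28, 24.5)]

# Left (partial outer) join: all survey records are kept; missing
# purchase fields come back as NULL, like Modeler's $null$.
left = conn.execute(
    "SELECT s.id, s.age, p.amount FROM survey s "
    "LEFT JOIN purchases p ON s.id = p.id ORDER BY s.id").fetchall()
print(left)  # [(1, 34, 9.99), (2, 51, None), (3, 28, 24.5)]
```

An anti-join (records in the first dataset that match no others) is the same left join with `WHERE p.id IS NULL` added.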
If you found this post useful, make sure to check out IBM SPSS Modeler Essentials for more information on leveraging SPSS Modeler to get faster and efficient insights from your data.  
Quantum computing, edge analytics, and meta learning: key trends in data science and big data in 2019

Richard Gall
18 Dec 2018
11 min read
When historians study contemporary notions of data in the early 21st century, 2018 might well be a landmark year. In many ways this was the year when Big and Important Issues - from the personal to the political - began to surface. The techlash, a term which has defined the year, arguably emerged from conversations and debates about the uses and abuses of data. But while cynicism casts a shadow on the brightly lit data science landscape, there's still a lot of optimism out there. And more importantly, data isn't going to drop off the agenda any time soon. However, the changing conversation in 2018 does mean that the way data scientists, analysts, and engineers use data and build solutions for it will change. A renewed emphasis on ethics and security is now appearing, which will likely shape 2019 trends. But what will these trends be? Let's take a look at some of the most important areas to keep an eye on in the new year.

Meta learning and automated machine learning

One of the key themes of data science and artificial intelligence in 2019 will be doing more with less. There are a number of ways in which this will manifest itself. The first is meta learning. This is a concept that aims to improve the way that machine learning systems actually work by running machine learning on machine learning systems. Essentially this allows a machine learning algorithm to learn how to learn. By doing this, you can better decide which algorithm is most appropriate for a given problem. Find out how to put meta learning into practice. Learn with Hands On Meta Learning with Python. Automated machine learning is closely aligned with meta learning. One way of understanding it is to see it as automating the application of meta learning. So, if meta learning can help better determine which machine learning algorithms should be applied and how they should be designed, automated machine learning makes that process a little smoother.
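The core of automated model selection can be sketched in a few lines of plain Python. This is a toy illustration under invented names, not how AutoML or auto-sklearn work internally: we simply fit several candidate models and keep whichever scores best on held-out data.

```python
# Toy "automated machine learning": pick the candidate predictor with
# the lowest validation error. Real systems also tune hyperparameters
# and design whole pipelines; this only shows the selection loop.
def mean_model(train):
    avg = sum(y for _, y in train) / len(train)
    return lambda x: avg

def linear_model(train):
    # Least-squares fit of y = a*x + b on the training pairs.
    n = len(train)
    sx = sum(x for x, _ in train); sy = sum(y for _, y in train)
    sxx = sum(x * x for x, _ in train); sxy = sum(x * y for x, y in train)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return lambda x: a * x + b

def select_model(candidates, train, valid):
    fitted = {name: fit(train) for name, fit in candidates.items()}
    def error(name):
        return sum((fitted[name](x) - y) ** 2 for x, y in valid)
    return min(fitted, key=error)

train = [(1, 2), (2, 4), (3, 6)]
valid = [(4, 8), (5, 10)]
best = select_model({"mean": mean_model, "linear": linear_model}, train, valid)
print(best)  # linear
```

Tools in the automated machine learning space wrap a loop like this, plus hyperparameter search, behind a single fit call.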
It builds the decision making into the machine learning solution. Fundamentally, it's all about "algorithm selection, hyper-parameter tuning, iterative modelling, and model assessment," as Matthew Mayo explains on KDNuggets.

Automated machine learning tools

What's particularly exciting about automated machine learning is that there are already a number of tools that make it relatively easy to do. AutoML is a set of tools developed by Google that can be used on the Google Cloud Platform, while auto-sklearn, built around the scikit-learn library, provides a similar out-of-the-box solution for automated machine learning. Although both AutoML and auto-sklearn are very new, there are newer tools available that could dominate the landscape: AutoKeras and AdaNet. AutoKeras is built on Keras (the Python neural network library), while AdaNet is built on TensorFlow. Both could be more affordable open source alternatives to AutoML. Which automated machine learning library gains the most popularity remains to be seen, but one thing is certain: it makes deep learning accessible to many organizations who previously wouldn't have had the resources or inclination to hire a team of PhD computer scientists. But it's important to remember that automated machine learning certainly doesn't mean automated data science. While tools like AutoML will help many organizations build deep learning models for basic tasks, for organizations that need a more developed data strategy, the role of the data scientist will remain vital. You can't, after all, automate away strategy and decision making. Learn automated machine learning with these titles: Hands-On Automated Machine Learning, TensorFlow 1.x Deep Learning Cookbook.

Quantum computing

Quantum computing, even as a concept, feels almost fantastical. It's not just cutting-edge, it's mind-bending. But in real-world terms it also continues the theme of doing more with less.
Explaining quantum computing can be tricky, but the fundamentals are this: instead of a binary system (the foundation of computing as we currently know it), which can be either 0 or 1, in a quantum system you have qubits, which can be 0, 1 or both simultaneously. (If you want to learn more, read this article.)

What Quantum computing means for developers

So, what does this mean in practice? Essentially, because the qubits in a quantum system can be multiple things at the same time, you are then able to run much more complex computations. Think about the difference in scale: running a deep learning system on a binary system has clear limits. Yes, you can scale up in processing power, but you're nevertheless constrained by the foundational fact of zeros and ones. In a quantum system where that restriction no longer exists, the scale of the computing power at your disposal increases astronomically. Once you understand the fundamental proposition, it becomes much easier to see why the likes of IBM and Google are clamouring to develop and deploy quantum technology. One of the most talked about use cases is using quantum computers to find even larger prime numbers (a move which carries risks, given prime numbers are the basis for much modern encryption). But there are other applications, such as in chemistry, where complex subatomic interactions are too detailed to be modelled by a traditional computer. It's important to note that quantum computing is still very much in its infancy. While Google and IBM are leading the way, they are really only researching the area. It certainly hasn't been deployed or applied in any significant or sustained way. But this isn't to say that it should be ignored. It's going to have a huge impact on the future, and more importantly it's plain interesting.
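The "0, 1 or both" description can be made concrete with a little complex-number arithmetic. This is a toy single-qubit sketch, not a real quantum SDK - just the standard amplitude representation:

```python
import math

# A single qubit is a pair of complex amplitudes (a, b) for |0> and |1>;
# |a|^2 and |b|^2 are the probabilities of measuring 0 and 1.
zero = (1 + 0j, 0 + 0j)  # definitely 0

def hadamard(state):
    """Apply the Hadamard gate, which creates an equal superposition."""
    a, b = state
    s = 1 / math.sqrt(2)
    return (s * (a + b), s * (a - b))

plus = hadamard(zero)  # now "0 and 1 at the same time"
probs = [round(abs(amp) ** 2, 3) for amp in plus]
print(probs)  # [0.5, 0.5] - equal chance of measuring 0 or 1

# Applying Hadamard twice returns to |0>: the operation is reversible.
back = hadamard(plus)
print(round(abs(back[0]) ** 2, 3))  # 1.0
```

Two qubits need four amplitudes, three need eight, and so on - this exponential growth in state is where the astronomical scale described above comes from, and why classical simulation runs out of road so quickly.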
Even if you don't think you'll be getting to grips with quantum systems at work for some time (a decade at best), understanding the principles and how it works in practice will not only give you a solid foundation for major changes in the future, it will also help you better understand some of the existing challenges in scientific computing. And, of course, it will also make you a decent conversationalist at dinner parties.

Who's driving Quantum computing forward?

If you want to get started, Microsoft has put together the Quantum Development Kit, which includes the first quantum-specific programming language, Q#. IBM, meanwhile, has developed its own Quantum Experience, which allows engineers and researchers to run quantum computations in the IBM cloud. As you investigate these tools you'll probably get the sense that no one's quite sure what to do with these technologies. And that's fine - if anything it makes it the perfect time to get involved and help further research and thinking on the topic. Get a head start in the Quantum Computing revolution. Pre-order Mastering Quantum Computing with IBM QX.

Edge analytics and digital twins

While Quantum lingers on the horizon, the concept of the edge has quietly planted itself at the very center of the IoT revolution. IoT might still be the term that business leaders and, indeed, wider society are talking about, but for technologists and engineers, none of its advantages would be possible without the edge. Edge computing or edge analytics is essentially about processing data at the edge of a network rather than within a centralized data warehouse. Again, as you can begin to see, the concept of the edge allows you to do more with less. More speed, less bandwidth (as devices no longer need to communicate with data centers), and, in theory, more data. In the context of IoT, where just about every object in existence could be a source of data, moving processing and analytics to the edge can only be a good thing.
Will the edge replace the cloud?

There's a lot of conversation about whether edge will replace cloud. It won't. But it probably will replace the cloud as the place where we run artificial intelligence. For example, instead of running powerful analytics models in a centralized space, you can run them at different points across the network. This will dramatically improve speed and performance, particularly for those applications that run on artificial intelligence.

A more distributed world

Think of it this way: just as software has become more distributed in the last few years, thanks to the emergence of the edge, data itself is going to be more distributed. We'll have billions of pockets of activity, whether from consumers or industrial machines, each a locus of data generation. Find out how to put the principles of edge analytics into practice: Azure IoT Development Cookbook.

Digital twins

An emerging part of the edge computing and analytics trend is the concept of digital twins. This is, admittedly, still something in its infancy, but in 2019 it's likely that you'll be hearing a lot more about digital twins. A digital twin is a digital replica of a device that engineers and software architects can monitor, model and test. For example, if you have a digital twin of a machine, you could run tests on it to better understand its points of failure. You could also investigate ways you could make the machine more efficient. More importantly, a digital twin can be used to help engineers manage the relationship between centralized cloud and systems at the edge - the digital twin is essentially a layer of abstraction that allows you to better understand what's happening at the edge without needing to go into the detail of the system. For those of us working in data science, digital twins provide better clarity and visibility on how disconnected aspects of a network interact.
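The bandwidth argument is easy to make concrete. In this toy sketch (the readings and thresholds are invented for illustration), an edge node reduces a batch of raw sensor readings to a tiny summary and ships only that upstream:

```python
# Toy edge analytics: summarize raw readings locally and send only the
# aggregate to the cloud, instead of every data point.
def edge_summarize(readings):
    """Reduce a batch of raw readings to a small summary record."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": sum(readings) / len(readings),
    }

raw = [21.0, 21.4, 22.1, 35.9, 21.2]  # one device's local batch
summary = edge_summarize(raw)
print(summary["count"], summary["max"])  # 5 35.9

# The cloud receives four numbers instead of the whole stream, and
# simple anomaly checks (the 35.9 spike) can run at the edge too.
print(summary["max"] - summary["mean"] > 10)  # True
```

A digital twin would sit on the cloud side of a pipeline like this, consuming such summaries to mirror the device's state without ever seeing the raw stream.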
If we’re going to make 2019 the year we use data more intelligently - maybe even more humanely - then this is precisely the sort of thing we need. Interpretability, explainability, and ethics Doing more with less might be one of the ongoing themes in data science and big data in 2019, but we can’t ignore the fact that ethics and security will remain firmly on the agenda. Although it’s easy to dismiss these issues issues as separate from the technical aspects of data mining, processing, and analytics, but it is, in fact, deeply integrated into it. One of the key facets of ethics are two related concepts: explainability and interpretability. The two terms are often used interchangeably, but there are some subtle differences. Explainability is the extent to which the inner-working of an algorithm can be explained in human terms, while interpretability is the extent to which one can understand the way in which it is working (eg. predict the outcome in a given situation). So, an algorithm can be interpretable, but you might not quite be able to explain why something is happening. (Think about this in the context of scientific research: sometimes, scientists know that a thing is definitely happening, but they can’t provide a clear explanation for why it is.) Improving transparency and accountability Either way, interpretability and explainability are important because they can help to improve transparency in machine learning and deep learning algorithms. In a world where deep learning algorithms are being applied to problems in areas from medicine to justice - where the problem of accountability is particularly fraught - this transparency isn’t an option, it’s essential. In practice, this means engineers must tweak the algorithm development process to make it easier for those outside the process to understand why certain things are happening and why they aren't. 
To a certain extent, this ultimately requires the data science world to take the scientific method more seriously than it has done. Rather than just aiming for accuracy (which is itself often open to contestation), the aim is to constantly manage the gap between what we're trying to achieve with an algorithm and how it goes about actually doing that. You can learn the basics of building explainable machine learning models in the Getting Started with Machine Learning in Python video.

Transparency and innovation must go hand in hand in 2019

So, there are two fundamental things for data science in 2019: improving efficiency, and improving transparency. Although the two concepts might look like they conflict with each other, it's actually a bit of a false dichotomy. If we had realised that 12 months ago, we might have avoided many of the issues that have come to light this year. Transparency has to be a core consideration for anyone developing systems for analyzing and processing data. Without it, the work you're doing might be flawed or unnecessary. You're only going to need to add further iterations to rectify your mistakes or modify the impact of your biases. With this in mind, now is the time to learn the lessons of 2018's techlash. We need to commit to stopping the miserable conveyor belt of scandal and failure. Now is the time to find new ways to build better artificial intelligence systems.
Testing Your Application with cljs.test

Packt
11 May 2016
13 min read
In this article written by David Jarvis, Rafik Naccache, and Allen Rohner, authors of the book Learning ClojureScript, we'll take a look at how to configure our ClojureScript application or library for testing. As usual, we'll start by creating a new project for us to play around with:

$ lein new figwheel testing

(For more resources related to this topic, see here.)

We'll be playing around in a test directory. Most JVM Clojure projects will have one already, but since the default Figwheel template doesn't include a test directory, let's make one first (following the same convention used with source directories, that is, instead of src/$PROJECT_NAME we'll create test/$PROJECT_NAME):

$ mkdir -p test/testing

We'll now want to make sure that Figwheel knows that it has to watch the test directory for file modifications. To do that, we will edit the dev build in our project.clj's :cljsbuild map so that its :source-paths vector includes both src and test. Your new dev build configuration should look like the following:

{:id "dev"
 :source-paths ["src" "test"]
 ;; If no code is to be run, set :figwheel true for continued automagical reloading
 :figwheel {:on-jsload "testing.core/on-js-reload"}
 :compiler {:main testing.core
            :asset-path "js/compiled/out"
            :output-to "resources/public/js/compiled/testing.js"
            :output-dir "resources/public/js/compiled/out"
            :source-map-timestamp true}}

Next, we'll get the old Figwheel REPL going so that we can have our ever familiar hot reloading:

$ cd testing
$ rlwrap lein figwheel

Don't forget to navigate a browser window to http://localhost:3449/ to get the browser REPL to connect. Now, let's create a new core_test.cljs file in the test/testing directory. By convention, most libraries and applications in Clojure and ClojureScript have test files that correspond to source files with the suffix _test. In this project, this means that test/testing/core_test.cljs is intended to contain the tests for src/testing/core.cljs.
Let's get started by just running tests on a single file. Inside core_test.cljs, let's add the following code:

(ns testing.core-test
  (:require [cljs.test :refer-macros [deftest is]]))

(deftest i-should-fail
  (is (= 1 0)))

(deftest i-should-succeed
  (is (= 1 1)))

This code first requires two of the most important cljs.test macros, and then gives us two simple examples of what a failed test and a successful test should look like. At this point, we can run our tests from the Figwheel REPL:

cljs.user=> (require 'testing.core-test)
;; => nil
cljs.user=> (cljs.test/run-tests 'testing.core-test)

Testing testing.core-test

FAIL in (i-should-fail) (cljs/test.js?zx=icyx7aqatbda:430:14)
expected: (= 1 0)
  actual: (not (= 1 0))

Ran 2 tests containing 2 assertions.
1 failures, 0 errors.
;; => nil

At this point, what we've got is tolerable, but it's not really practical in terms of being able to test a larger application. We don't want to have to test our application in the REPL and pass in our test namespaces one by one. The current idiomatic solution for this in ClojureScript is to write a separate test runner that is responsible for importing and then running all of your tests. Let's take a look at what this looks like. Let's start by creating another test namespace. Let's call this one app_test.cljs, and we'll put the following in it:

(ns testing.app-test
  (:require [cljs.test :refer-macros [deftest is]]))

(deftest another-successful-test
  (is (= 4 (count "test"))))

We will not do anything remarkable here; it's just another test namespace with a single test that should pass by itself. Let's quickly make sure that's the case at the REPL:

cljs.user=> (require 'testing.app-test)
nil
cljs.user=> (cljs.test/run-tests 'testing.app-test)

Testing testing.app-test

Ran 1 tests containing 1 assertions.
0 failures, 0 errors.
;; => nil

Perfect. Now, let's write a test runner.
Let's open a new file that we'll simply call test_runner.cljs, and let's include the following:

(ns testing.test-runner
  (:require [cljs.test :refer-macros [run-tests]]
            [testing.app-test]
            [testing.core-test]))

;; This isn't strictly necessary, but is a good idea depending
;; upon your application's ultimate runtime engine.
(enable-console-print!)

(defn run-all-tests []
  (run-tests 'testing.app-test
             'testing.core-test))

Again, nothing surprising. We're just making a single function for us that runs all of our tests. This is handy for us at the REPL:

cljs.user=> (testing.test-runner/run-all-tests)

Testing testing.app-test

Testing testing.core-test

FAIL in (i-should-fail) (cljs/test.js?zx=icyx7aqatbda:430:14)
expected: (= 1 0)
  actual: (not (= 1 0))

Ran 3 tests containing 3 assertions.
1 failures, 0 errors.
;; => nil

Ultimately, however, we want something we can run at the command line so that we can use it in a continuous integration environment. There are a number of ways we can go about configuring this directly, but if we're clever, we can let someone else do the heavy lifting for us. Enter doo, the handy ClojureScript testing plugin for Leiningen.

Using doo for easier testing configuration

doo is a library and Leiningen plugin for running cljs.test in many different JavaScript environments. It makes it easy to test your ClojureScript regardless of whether you're writing for the browser or for the server, and it also includes file watching capabilities such as Figwheel's so that you can automatically rerun tests on file changes. The doo project page can be found at https://github.com/bensu/doo. To configure our project to use doo, first we need to add it to the list of plugins in our project.clj file. Modify the :plugins key so that it looks like the following:

:plugins [[lein-figwheel "0.5.2"]
          [lein-doo "0.1.6"]
          [lein-cljsbuild "1.1.3" :exclusions [[org.clojure/clojure]]]]

Next, we will add a new cljsbuild build configuration for our test runner.
Add the following build map after the dev build map we've been working with until now:

{:id "test"
 :source-paths ["src" "test"]
 :compiler {:main testing.test-runner
            :output-to "resources/public/js/compiled/testing_test.js"
            :optimizations :none}}

This configuration tells Cljsbuild to use both our src and test directories, just like our dev profile. It adds some different configuration elements to the compiler options, however. First, we're not using testing.core as our main namespace anymore; instead, we'll use our test runner's namespace, testing.test-runner. We will also change the output JavaScript file to a different location from our compiled application code. Lastly, we will make sure that we pass in :optimizations :none so that the compiler runs quickly and doesn't have to do any magic to look things up.

Note that our currently running Figwheel process won't know about the fact that we've added lein-doo to our list of plugins or that we've added a new build configuration. If you want to make Figwheel aware of doo in a way that'll allow them to play nicely together, you should also add doo as a dependency to your project. Once you've done that, exit the Figwheel process and restart it after you've saved the changes to project.clj.

Lastly, we need to modify our test runner namespace so that it's compatible with doo. To do this, open test_runner.cljs and change it to the following:

(ns testing.test-runner
  (:require [doo.runner :refer-macros [doo-tests]]
            [testing.app-test]
            [testing.core-test]))

;; This isn't strictly necessary, but is a good idea depending
;; upon your application's ultimate runtime engine.
(enable-console-print!)

(doo-tests 'testing.app-test
           'testing.core-test)

This shouldn't look too different from our original test runner; we're just importing from doo.runner rather than cljs.test and using doo-tests instead of a custom runner function.
The doo-tests runner works very similarly to cljs.test/run-tests, but it places hooks around the tests to know when to start them and finish them. We're also putting this at the top level of our namespace rather than wrapping it in a particular function. The last thing we're going to need to do is to install a JavaScript runtime that we can use to execute our tests. Up until now, we've been using the browser via Figwheel, but ideally, we want to be able to run our tests in a headless environment as well. For this purpose, we recommend installing PhantomJS (though other execution environments are also fine). If you're on OS X and have Homebrew installed (http://www.brew.sh), installing PhantomJS is as simple as typing brew install phantomjs. If you're not on OS X or don't have Homebrew, you can find instructions on how to install PhantomJS on the project's website at http://phantomjs.org/. The key thing is that the following should work:

$ phantomjs -v
2.0.0

Once you've got PhantomJS installed, you can now invoke your test runner from the command line with the following:

$ lein doo phantom test once

;; ======================================================================
;; Testing with Phantom:

Testing testing.app-test

Testing testing.core-test

FAIL in (i-should-fail) (:)
expected: (= 1 0)
  actual: (not (= 1 0))

Ran 3 tests containing 3 assertions.
1 failures, 0 errors.
Subprocess failed

Let's break down this command. The first part, lein doo, just tells Leiningen to invoke the doo plugin. Next, we have phantom, which tells doo to use PhantomJS as its running environment. The doo plugin supports a number of other environments, including Chrome, Firefox, Internet Explorer, Safari, Opera, SlimerJS, NodeJS, Rhino, and Nashorn. Be aware that if you're interested in running doo on one of these other environments, you may have to configure and install additional software.
For instance, if you want to run tests on Chrome, you'll need to install Karma as well as the appropriate Karma npm modules to enable Chrome interaction. Next, we have test, which refers to the cljsbuild build ID we set up earlier. Lastly, we have once, which tells doo to just run the tests and not to set up a filesystem watcher. If, instead, we wanted doo to watch the filesystem and rerun tests on any changes, we would just use lein doo phantom test.

Testing fixtures

The cljs.test project has support for adding fixtures to your tests that can run before and after your tests. Test fixtures are useful for establishing isolated states between tests - for instance, you can use them to set up a specific database state before each test and to tear it down afterward. You can add them to your ClojureScript tests by declaring them with the use-fixtures macro within the testing namespace you want fixtures applied to. Let's see what this looks like in practice by changing one of our existing tests and adding some fixtures to it. Modify app_test.cljs to the following:

(ns testing.app-test
  (:require [cljs.test :refer-macros [deftest is use-fixtures]]))

;; Run these fixtures for each test.
;; We could also use :once instead of :each in order to run
;; fixtures once for the entire namespace instead of once for
;; each individual test.
(use-fixtures :each
  {:before (fn [] (println "Setting up tests..."))
   :after  (fn [] (println "Tearing down tests..."))})

(deftest another-successful-test
  ;; Give us an idea of when this test actually executes.
  (println "Running a test...")
  (is (= 4 (count "test"))))

Here, we've added a call to use-fixtures that prints to the console before and after running the test, and we've added a println call to the test itself so that we know when it executes.
Now when we run this test, we get the following: $ lein doo phantom test once ;; ====================================================================== ;; Testing with Phantom: Testing testing.app-test Setting up tests... Running a test... Tearing down tests... Testing testing.core-test FAIL in (i-should-fail) (:) expected: (= 1 0) actual: (not (= 1 0)) Ran 3 tests containing 3 assertions. 1 failures, 0 errors. Subprocess failed Note that our fixtures get called in the order we expect them to. Asynchronous testing Due to the fact that client-side code is frequently asynchronous and JavaScript is single threaded, we need to have a way to support asynchronous tests. To do this, we can use the async macro from cljs.test. Let's take a look at an example using an asynchronous HTTP GET request. First, let's modify our project.clj file to add cljs-ajax to our dependencies. Our dependencies project key should now look something like this: :dependencies [[org.clojure/clojure "1.8.0"] [org.clojure/clojurescript "1.7.228"] [cljs-ajax "0.5.4"] [org.clojure/core.async "0.2.374" :exclusions [org.clojure/tools.reader]]] Next, let's create a new async_test.cljs file in our test.testing directory. Inside it, we will add the following code: (ns testing.async-test (:require [ajax.core :refer [GET]] [cljs.test :refer-macros [deftest is async]])) (deftest test-async (GET "http://www.google.com" ;; will always fail from PhantomJS because ;; `Access-Control-Allow-Origin` won't allow ;; our headless browser to make requests to Google. {:error-handler (fn [res] (is (= (:status-text res) "Request failed.")) (println "Test finished!"))})) Note that we're not using async in our test at the moment. Let's try running this test with doo (don't forget that you have to add testing.async-test to test_runner.cljs!): $ lein doo phantom test once ... Testing testing.async-test ... Ran 4 tests containing 3 assertions. 1 failures, 0 errors. 
Subprocess failed Now, our test here passes, but note that the println async code never fires, and our additional assertion doesn't get called (looking back at our previous examples, since we've added a new is assertion we should expect to see four assertions in the final summary)! If we actually want our test to appropriately validate the error-handler callback within the context of the test, we need to wrap it in an async block. Doing so gives us a test that looks like the following: (deftest test-async (async done (GET "http://www.google.com" ;; will always fail from PhantomJS because ;; `Access-Control-Allow-Origin` won't allow ;; our headless browser to make requests to Google. {:error-handler (fn [res] (is (= (:status-text res) "Request failed.")) (println "Test finished!") (done))}))) Now, let's try to run our tests again: $ lein doo phantom test once ... Testing testing.async-test Test finished! ... Ran 4 tests containing 4 assertions. 1 failures, 0 errors. Subprocess failed Awesome! Note that this time we see the printed statement from our callback, and we can see that cljs.test properly ran all four of our assertions. Asynchronous fixtures One final "gotcha" on testing—the fixtures we talked about earlier in this article do not handle asynchronous code automatically. This means that if you have a :before fixture that executes asynchronous logic, your test can begin running before your fixture has completed! In order to get around this, all you need to do is to wrap your :before fixture in an async block, just like with asynchronous tests. Consider the following for instance: (use-fixtures :once {:before #(async done ... (done)) :after #(do ...)}) Summary This concludes our section on cljs.test. 
Testing, whether in ClojureScript or any other language, is a critical software engineering best practice to ensure that your application behaves the way you expect it to and to protect you and your fellow developers from accidentally introducing bugs to your application. With cljs.test and doo, you have the power and flexibility to test your ClojureScript application with multiple browsers and JavaScript environments and to integrate your tests into a larger continuous testing framework.
How to handle exceptions and synchronization methods with Selenium WebDriver API

Amey Varangaonkar
02 Apr 2018
11 min read
One of the areas that is often misunderstood, but important in framework design, is exception handling. Users must build into their tests and methods a strategy for handling the exceptions that might occur during a test run, including those thrown by the application itself and those raised by the Selenium WebDriver API. In this article, we will see how to do that effectively. Let us look at the different kinds of exceptions that users must account for:

Implicit exceptions: Implicit exceptions are internal exceptions raised by an API method when a certain condition is not met, such as an illegal index of an array, a null pointer, a file not found, or something unexpected occurring at runtime.

Explicit exceptions: Explicit exceptions are thrown by the user to transfer control out of the current method, and to another event handler, when certain conditions are not met, such as an object not being found on the page, a test verification failing, or an expected known state not being reached. In other words, the user predicts that something may occur, and explicitly throws an exception if it does not.

WebDriver exceptions: The Selenium WebDriver API has its own set of exceptions that can implicitly occur when elements are not found, are not visible, are not enabled or clickable, and so on. They are thrown by the WebDriver API method, but users can catch those exceptions and explicitly handle them in a predictable way.

Try...catch blocks: In Java, exception handling can be completely controlled using a try...catch block of statements to transfer control to another method, so that the exit out of the current routine doesn't transfer control to the call handler up the chain, but rather is handled in a predictable way before the exception is thrown.

Let us examine the different ways of handling exceptions during automated testing.
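As a concrete illustration of the try...catch pattern described above, the following minimal sketch shows how a failed action can be trapped and retried instead of transferring control up the chain. Note that FlakyElement, ElementNotVisibleException, and getTextWithRetry are hypothetical stand-ins for a Selenium WebElement and its driver exceptions, used only to keep the example self-contained:

```java
// Minimal sketch of the try...catch retry pattern.
// The classes below are stand-ins for Selenium types, not the real API.
public class RetryExample {

    static class ElementNotVisibleException extends RuntimeException {}

    // Simulates an element that only becomes visible on the third attempt.
    static class FlakyElement {
        private int calls = 0;

        String getText() {
            if (++calls < 3) {
                throw new ElementNotVisibleException();
            }
            return "Sign In";
        }
    }

    // Trap the exception and retry, rather than letting it propagate
    // to the caller immediately.
    static String getTextWithRetry(FlakyElement element, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return element.getText();
            } catch (ElementNotVisibleException e) {
                // do something before retrying: reload page, re-cache element, ...
            }
        }
        // all attempts exhausted: now transfer control up the chain
        throw new ElementNotVisibleException();
    }

    public static void main(String[] args) {
        System.out.println(getTextWithRetry(new FlakyElement(), 5)); // prints "Sign In"
    }
}
```

The same shape applies when catching ElementNotFoundException, TimeoutException, and friends in real WebDriver code: each catch clause decides whether to recover, retry, or rethrow.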
Implicit exception handling A simple example of Selenium WebDriver implicit exception handling can be described as follows: Define an element on a page Create a method to retrieve the text from the element on the page In the signature of the method, add throws Exception Do not handle a specific exception like ElementNotFoundException: // create a method to retrieve the text from an element on a page @FindBy(id="submit") protected M submit; public String getText(WebElement element) throws Exception { return element.getText(); } // use the method LoginPO.getText(submit); Now, when using an assertion method, TestNG will implicitly throw an exception if the condition is not met: Define an element on a page Create a method to verify the text of the element on a page Cast the expected and actual text to the TestNG's assertEquals method TestNG will throw an AssertionError TestNG engages the difference viewer to compare the result if it fails: // create a method to verify the text from an element on a page @FindBy(id="submit") protected M submit; public void verifyText(WebElement element, String expText) throws AssertionError { assertEquals(element.getText(), expText, "Verify Submit Button Text"); } // use the method LoginPO.verifyText(submit, "Sign Inx"); // throws AssertionError java.lang.AssertionError: Verify Text Label expected [ Sign Inx] but found [ Sign In] Expected : Sign Inx Actual : Sign In <Click to see difference> TestNG difference viewer When using the TestNG's assertEquals methods, a difference viewer will be engaged if the comparison fails. There will be a link in the stacktrace in the console to open it. Since it is an overloaded method, it can take a number of data types, such as String, Integer, Boolean, Arrays, Objects, and so on. 
The following screenshot displays the TestNG difference viewer: Explicit exception handling In cases where the user can predict when an error might occur in the application, they can check for that error and explicitly raise an exception if it is found. Take the login function of a browser or mobile application as an example. If the user credentials are incorrect, the app will throw an exception saying something like "username invalid, try again" or "password incorrect, please re-enter". The exception can be explicitly handled in a way that the actual error message can be thrown in the exception. Here is an example of the login method we wrote earlier with exception handling added to it: @FindBy(id="myApp_exception") protected M error; /** * login - method to login to app with error handling * * @param username * @param password * @throws Exception */ public void login(String username, String password) throws Exception { if ( !this.username.getAttribute("value").equals("") ) { this.username.clear(); } this.username.sendKeys(username); if ( !this.password.getAttribute( "value" ).equals( "" ) ) { this.password.clear(); } this.password.sendKeys(password); submit.click(); // exception handling if ( BrowserUtils.elementExists(error, Global_VARS.TIMEOUT_SECOND) ) { String getError = error.getText(); throw new Exception("Login Failed with error = " + getError); } } Try...catch exception handling Now, sometimes the user will want to trap an exception instead of throwing it, and perform some other action such as retry, reload page, cleanup dialogs, and so on. In cases like that, the user can use try...catch in Java to trap the exception. The action would be included in the try clause, and the user can decide what to do in the catch condition. Here is a simple example that uses the ExpectedConditions method to look for an element on a page, and only return true or false if it is found. 
No exception will be raised:  /** * elementExists - wrapper around the WebDriverWait method to * return true or false * * @param element * @param timer * @throws Exception */ public static boolean elementExists(WebElement element, int timer) { try { WebDriver driver = CreateDriver.getInstance().getCurrentDriver(); WebDriverWait exists = new WebDriverWait(driver, timer); exists.until(ExpectedConditions.refreshed( ExpectedConditions.visibilityOf(element))); return true; } catch (StaleElementReferenceException | TimeoutException | NoSuchElementException e) { return false; } } In cases where the element is not found on the page, the Selenium WebDriver will return a specific exception such as ElementNotFoundException. If the element is not visible on the page, it will return ElementNotVisibleException, and so on. Users can catch those specific exceptions in a try...catch...finally block, and do something specific for each type (reload page, re-cache element, and so on): try { .... } catch(ElementNotFoundException e) { // do something } catch(ElementNotVisibleException f) { // do something else } finally { // cleanup } Synchronizing methods Earlier, the login method was introduced, and in that method, we will now call one of the synchronization methods waitFor(title, timer) that we created in the utility classes. This method will wait for the login page to appear with the title element as defined. So, in essence, after the URL is loaded, the login method is called, and it synchronizes against a predefined page title. If the waitFor method doesn't find it, it will throw an exception, and the login will not be attempted. It's important to predict and synchronize the page object methods so that they do not get out of "sync" with the application and continue executing when a state has not been reached during the test. This becomes a tedious process during the development of the page object methods, but pays big dividends in the long run when making those methods "robust". 
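The waitFor-style synchronization discussed above boils down to polling a condition until a timeout expires. The following is a minimal, framework-free sketch of that idea; WaitUtils and its Supplier-based condition are illustrative assumptions, not the book's actual BrowserUtils implementation (which wraps WebDriverWait):

```java
import java.util.function.Supplier;

// Minimal sketch of a polling synchronization helper. The Supplier stands in
// for any page-state check, such as "does the page title match?".
public class WaitUtils {

    public static void waitFor(Supplier<Boolean> condition, int timeoutSeconds) {
        long deadline = System.currentTimeMillis() + timeoutSeconds * 1000L;
        while (System.currentTimeMillis() < deadline) {
            if (Boolean.TRUE.equals(condition.get())) {
                return; // state reached: safe to continue the page object method
            }
            try {
                Thread.sleep(100); // poll interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        // transfer control out rather than letting the test run out of sync
        throw new IllegalStateException("Timed out waiting for page state");
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // condition becomes true after roughly 300 ms
        waitFor(() -> System.currentTimeMillis() - start > 300, 5);
        System.out.println("page state reached");
    }
}
```

The key design point is the same as in the login method: if the expected state never arrives, the helper throws instead of returning, so the calling page object method stops rather than continuing against a page that isn't ready.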
Also, users do not have to synchronize before accessing each element. Usually, you would synchronize against the last control rendered on a page when navigating between them. In the same login method, it's not enough to just check and wait for the login page title to appear before logging in; users must also wait for the next page to render, that being the home page of the application. So, finally, in the login method we just built, another waitFor will be added: public void login(String username, String password) throws Exception { BrowserUtils.waitFor(getPageTitle(), getElementWait()); if ( !this.username.getAttribute("value").equals("") ) { this.username.clear(); } this.username.sendKeys(username); if ( !this.password.getAttribute( "value" ).equals( "" ) ) { this.password.clear(); } this.password.sendKeys(password); submit.click(); // exception handling if ( BrowserUtils.elementExists(error, Global_VARS.TIMEOUT_SECOND) ) { String getError = error.getText(); throw new Exception("Login Failed with error = " + getError); } // wait for the home page to appear BrowserUtils.waitFor(new MyAppHomePO<WebElement>().getPageTitle(), getElementWait()); } Table classes When building the page object classes, there will frequently be components on a page that are common to multiple pages, but not all pages, and rather than including the similar locators and methods in each class, users can build a common class for just that portion of the page. HTML tables are a typical example of a common component that can be classed. So, what users can do is create a generic class for the common table rows and columns, extend the subclasses that have a table with this new class, and pass in the dynamic ID or locator to the constructor when extending the subclass with that table class. 
Let's take a look at how this is done: Create a new page object class for the table component in the application, but do not derive it from the base class in the framework In the constructor of the new class, add a parameter of the type WebElement, requiring users to pass in the static element defined in each subclass for that specific table Create generic methods to get the row count, column count, row data, and cell data for the table In each subclass that inherits these methods, implement them for each page, varying the starting row number and/or column header rows if <th> is used rather than <tr> When the methods are called on each table, it will identify them using the WebElement passed into the constructor: /** * WebTable Page Object Class * * @author Name */ public class WebTablePO { private WebElement table; /** constructor * * @param table * @throws Exception */ public WebTablePO(WebElement table) throws Exception { setTable(table); } /** * setTable - method to set the table on the page * * @param table * @throws Exception */ public void setTable(WebElement table) throws Exception { this.table = table; } /** * getTable - method to get the table on the page * * @return WebElement * @throws Exception */ public WebElement getTable() throws Exception { return this.table; } .... 
Now, the structure of the class is simple so far, so let's add in some common "generic" methods that can be inherited and extended by each subclass that extends the class: // Note: JavaDoc will be eliminated in these examples for simplicity sake public int getRowCount() { List<WebElement> tableRows = table.findElements(By.tagName("tr")); return tableRows.size(); } public int getColumnCount() { List<WebElement> tableRows = table.findElements(By.tagName("tr")); WebElement headerRow = tableRows.get(1); List<WebElement> tableCols = headerRow.findElements(By.tagName("td")); return tableCols.size(); } public int getColumnCount(int index) { List<WebElement> tableRows = table.findElements(By.tagName("tr")); WebElement headerRow = tableRows.get(index); List<WebElement> tableCols = headerRow.findElements(By.tagName("td")); return tableCols.size(); } public String getRowData(int rowIndex) { List<WebElement> tableRows = table.findElements(By.tagName("tr")); WebElement currentRow = tableRows.get(rowIndex); return currentRow.getText(); } public String getCellData(int rowIndex, int colIndex) { List<WebElement> tableRows = table.findElements(By.tagName("tr")); WebElement currentRow = tableRows.get(rowIndex); List<WebElement> tableCols = currentRow.findElements(By.tagName("td")); WebElement cell = tableCols.get(colIndex - 1); return cell.getText(); } Finally, let's extend a subclass with the new WebTablePO class, and implement some of the methods: /** * Homepage Page Object Class * * @author Name */ public class MyHomepagePO<M extends WebElement> extends WebTablePO<M> { public MyHomepagePO(M table) throws Exception { super(table); } @FindBy(id = "my_table") protected M myTable; // table methods public int getTableRowCount() throws Exception { WebTablePO table = new WebTablePO(getTable()); return table.getRowCount(); } public int getTableColumnCount() throws Exception { WebTablePO table = new WebTablePO(getTable()); return table.getColumnCount(); } public int getTableColumnCount(int 
index) throws Exception { WebTablePO table = new WebTablePO(getTable()); return table.getColumnCount(index); } public String getTableCellData(int row, int column) throws Exception { WebTablePO table = new WebTablePO(getTable()); return table.getCellData(row, column); } public String getTableRowData(int row) throws Exception { WebTablePO table = new WebTablePO(getTable()); return table.getRowData(row).replace("\n", " "); } public void verifyTableRowData(String expRowText) { String actRowText = ""; int totalNumRows = getTableRowCount(); // parse each row until row data found for ( int i = 0; i < totalNumRows; i++ ) { if ( this.getTableRowData(i).contains(expRowText) ) { actRowText = this.getTableRowData(i); break; } } // verify the row data try { assertEquals(actRowText, expRowText, "Verify Row Data"); } catch (AssertionError e) { String error = "Row data '" + expRowText + "' Not found!"; throw new Exception(error); } } } We saw, how fairly effective it is to handle object class methods, especially when it comes to handling synchronization and exceptions. You read an excerpt from the book Selenium Framework Design in Data-Driven Testing by Carl Cocchiaro. The book will show you how to design your own automation testing framework without any hassle.
Untangle VPN Services

Packt
30 Oct 2014
18 min read
This article by Abd El-Monem A. El-Bawab, the author of Untangle Network Security, covers the Untangle solution, OpenVPN. OpenVPN is an SSL/TLS-based VPN, which is mainly used for remote access as it is easy to configure and uses clients that can work on multiple operating systems and devices. OpenVPN can also provide site-to-site connections (only between two Untangle servers) with limited features.

OpenVPN

Untangle's OpenVPN is an SSL-based VPN solution that is based on the well-known open source application, OpenVPN. Untangle's OpenVPN is mainly used for client-to-site connections, with a client that is easy to deploy and configure and widely available for Windows, Mac, Linux, and smartphones. Untangle's OpenVPN can also be used for site-to-site connections, but the two sites need to have Untangle servers; site-to-site connections between Untangle and third-party devices are not supported.

How OpenVPN works

In reference to the OSI model, an SSL/TLS-based VPN will only encrypt the application layer's data, while the lower layers' information will be transferred unencrypted. In other words, the application packets will be encrypted. The IP addresses of the server and client are visible; the port number that the server uses for communication between the client and server is also visible, but the actual application port number is not. Furthermore, the final destination IP address will not be visible; only the VPN server's IP address is seen.

Secure Sockets Layer (SSL) and Transport Layer Security (TLS) refer to the same thing; SSL is the predecessor of TLS. SSL was originally developed by Netscape, and many releases were produced (V.1 to V.3) until it was standardized under the TLS name.

The steps to create an SSL-based VPN are as follows:

The client will send a message to the VPN server that it wants to initiate an SSL session.
Also, it will send a list of all ciphers (hash and encryption protocols) that it supports. The server will respond with a set of selected ciphers and will send its digital certificate to the client. The server's digital certificate includes the server's public key. The client will try to verify the server's digital certificate by checking it against trusted certificate authorities and by checking the certificate's validity (valid from and valid through dates). The server may need to authenticate the client before allowing it to connect to the internal network. This could be achieved either by asking for a valid username and password or by using the user's digital identity certificates. Untangle NGFW uses the digital certificates method. The client will create a session key (which will be used to encrypt the transferred data between the two devices) and will send this key to the server encrypted using the server's public key. Thus, no third party can get the session key as the server is the only device that can decrypt the session key as it's the only party that has the private key. The server will acknowledge the client that it received the session key and is ready for the encrypted data transformation. Configuring Untangle's OpenVPN server settings After installing the OpenVPN application, the application will be turned off. You'll need to turn it on before you can use it. You can configure Untangle's OpenVPN server settings under OpenVPN settings | Server. The settings configure how OpenVPN will be a server for remote clients (which can be clients on Windows, Linux, or any other operating systems, or another Untangle server). The different available settings are as follows: Site Name: This is the name of the OpenVPN site that is used to define the server among other OpenVPN servers inside your origination. This name should be unique across all Untangle servers in the organization. A random name is automatically chosen for the site name. 
Site URL: This is the URL that the remote client will use to reach this OpenVPN server. This can be configured under Config | Administration | Public Address. If you have more than one WAN interface, the remote client will first try to initiate the connection using the settings defined in the public address. If this fails, it will randomly try the IP of the remaining WAN interfaces. Server Enabled: If checked, the OpenVPN server will run and accept connections from the remote clients. Address Space: This defines the IP subnet that will be used to assign IPs for the remote VPN clients. The value in Address Space must be unique and separate across all existing networks and other OpenVPN address spaces. A default address space will be chosen that does not conflict with the existing configuration: Configuring Untangle's OpenVPN remote client settings Untangle's OpenVPN allows you to create OpenVPN clients to give your office employees, who are out of the company, the ability to remotely access your internal network resources via their PCs and/or smartphones. Also, an OpenVPN client can be imported to another Untangle server to provide site-to-site connection. Each OpenVPN client will have its unique IP (from the address space range defined previously). Thus, each OpenVPN client can only be used for one user. For multiple users, you'll have to create multiple clients as using the same client for multiple users will result in client disconnection issues. Creating a remote client You can create remote access clients by clicking on the Add button located under OpenVPN Settings | Server | Remote Clients. A new window will open, which has the following settings: Enabled: If this checkbox is checked, it will allow the client connection to the OpenVPN server. If unchecked, it will not allow the client connection. Client Name: Give a unique name for the client; this will help you identify the client. Only alphanumeric characters are allowed. 
Group: Specify the group the client will be a member of. Groups are used to apply similar settings to their members. Type: Select Individual Client for remote access and Network for site-to-site VPN. The following screenshot shows a remote access client created for JDoe: After configuring the client settings, you'll need to press the Done button and then the OK or Apply button to save this client configuration. The new client will be available under the Remote Clients tab, as shown in the following screenshot: Understanding remote client groups Groups are used to group clients together and apply similar settings to the group members. By default, there will be a Default Group. Each group has the following settings: Group Name: Give a suitable name for the group that describes the group settings (for example, full tunneling clients) or the target clients (for example, remote access clients). Full Tunnel: If checked, all the traffic from the remote clients will be sent to the OpenVPN server, which allows Untangle to filter traffic directed to the Internet. If unchecked, the remote client will run in the split tunnel mode, which means that the traffic directed to local resources behind Untangle is sent through VPN, and the traffic directed to the Internet is sent by the machine's default gateway. You can't use Full Tunnel for site-to-site connections. Push DNS: If checked, the remote OpenVPN client will use the DNS settings defined by the OpenVPN server. This is useful to resolve local names and services. Push DNS server: If the OpenVPN server is selected, remote clients will use the OpenVPN server for DNS queries. If set to Custom, DNS servers configured here will be used for DNS queries. Push DNS Custom 1: If the Push DNS server is set to Custom, the value configured here will be used as a primary DNS server for the remote client. If blank, no settings will be pushed for the remote client. 
Push DNS Custom 2: If the Push DNS server is set to Custom, the value configured here will be used as a secondary DNS server for the remote client. If blank, no settings will be pushed for the remote client. Push DNS Domain: The configured value will be pushed to the remote clients to extend their domain's search path during DNS resolution. The following screenshot illustrates all these settings: Defining the exported networks Exported networks are used to define the internal networks behind the OpenVPN server that the remote client can reach after successful connection. Additional routes will be added to the remote client's routing table that state that the exported networks (the main site's internal subnet) are reachable through the OpenVPN server. By default, each static non-WAN interface network will be listed in the Exported Networks list: You can modify the default settings or create new entries. The Exported Networks settings are as follows: Enabled: If checked, the defined network will be exported to the remote clients. Export Name: Enter a suitable name for the exported network. Network: This defines the exported network. The exported network should be written in CIDR form. These settings are illustrated in the following screenshot: Using OpenVPN remote access clients So far, we have been configuring the client settings but didn't create the real package to be used on remote systems. We can get the remote client package by pressing the Download Client button located under OpenVPN Settings | Server | Remote Clients, which will start the process of building the OpenVPN client that will be distributed: There are three available options to download the OpenVPN client. The first option is to download the client as a .exe file to be used with the Windows operating system. The second option is to download the client configuration files, which can be used with the Apple and Linux operating systems. 
The third option is similar to the second one, except that the configuration file will be imported to another Untangle NGFW server, which is used for site-to-site scenarios. The following screenshot illustrates this:

The configuration files include the following files:

<Site_name>.ovpn
<Site_name>.conf
keys/<Site_name>-<User_name>.crt
keys/<Site_name>-<User_name>.key
keys/<Site_name>-<User_name>-ca.crt

The certificate files are for client authentication, and the .ovpn and .conf files hold the defined connection settings (that is, the OpenVPN server IP, the port used, and the ciphers used). The following screenshot shows the .ovpn file for the site Untangle-1849:

As shown in the following screenshot, the created file (openvpn-JDoe-setup.exe) includes the client name, which helps you identify the different clients and simplifies the process of distributing each file to the right user:

Using an OpenVPN client with Windows OS

Using an OpenVPN client with the Windows operating system is very simple. To do this, perform the following steps:

Set up the OpenVPN client on the remote machine. The setup is very easy: just a next, next, install, and finish setup. It is important to set up and run the application as an administrator in order to allow the client to write the VPN routes to the Windows routing table. You should run the client as an administrator every time you use it so that the client can create the required routes.
Double-click on the OpenVPN icon on the Windows desktop:
The application will run in the system tray:
Right-click on the system tray icon of the application and select Connect.
The client will start to initiate the connection to the OpenVPN server, and a window with the connection status will appear, as shown in the following screenshot:

Once the VPN tunnel is initiated, a notification will appear from the client with the IP assigned to it, as shown in the following screenshot:

If the OpenVPN client was running in the task bar and there was an established connection, the client will automatically reconnect to the OpenVPN server if the tunnel was dropped due to Windows being asleep.

By default, the OpenVPN client will not start at the Windows login. We can change this, and allow it to start without requiring administrative privileges, by going to Control Panel | Administrative Tools | Services and changing the OpenVPN service's Startup Type to automatic. Then, in the start parameters field, put --connect <Site_name>.ovpn; you can find the <Site_name>.ovpn file under C:\Program Files\OpenVPN\config.

Using OpenVPN with non-Windows clients

The method to configure OpenVPN clients to work with Untangle is the same for all non-Windows clients. Simply download the .zip file provided by Untangle, which includes the configuration and certificate files, and place them into the application's configuration folder. The steps are as follows:

Download and install any of the following OpenVPN-compatible clients for your operating system:
For Mac OS X, Untangle, Inc. suggests using Tunnelblick, which is available at http://code.google.com/p/tunnelblick
For Linux, OpenVPN clients for different Linux distros can be found at https://openvpn.net/index.php/access-server/download-openvpn-as-sw.html
OpenVPN Connect for iOS is available at https://itunes.apple.com/us/app/openvpn-connect/id590379981?mt=8
OpenVPN for Android 4.0+ is available at https://play.google.com/store/apps/details?id=net.openvpn.openvpn
Log in to the Untangle NGFW server, download the .zip client configuration file, and extract the files from the .zip file.
Place the configuration files into any of the following OpenVPN-compatible applications:

Tunnelblick: Manually copy the files into the Configurations folder located at ~/Library/Application Support/Tunnelblick.
Linux: Copy the extracted files into /etc/openvpn, and then connect using sudo openvpn /etc/openvpn/<Site_name>.conf.
iOS: Open iTunes and select the files from the config ZIP file to add to the app on your iPhone or iPad.
Android: From the OpenVPN for Android application, tap the folder icon in the top-right corner, then browse to the folder where you have the OpenVPN .conf file. Tap the file and hit Select. Then, in the top-right corner, hit the little floppy disk icon to save the import. You should now see the imported profile; tap on it to connect to the tunnel. For more information on this, visit http://forums.untangle.com/openvpn/30472-openvpn-android-4-0-a.html.

Run the OpenVPN-compatible client.

Using OpenVPN for site-to-site connection

To use OpenVPN for a site-to-site connection, one Untangle NGFW server will run in OpenVPN server mode and the other will run in client mode. We will need to create a client that will be imported on the remote server. The client settings are shown in the following screenshot:

We will need to download the client configuration intended to be imported on another Untangle server (the third option on the client download menu), and then import this zipped client configuration on the remote server. To import the client, on the remote server under the Client tab, browse to the .zip file and press the Submit button. The client will be shown as follows:

You'll need to restart the two servers before being able to use the OpenVPN site-to-site connection. The site-to-site connection is bidirectional.
Reviewing the connection details

The currently connected clients (whether OS clients or other Untangle NGFW servers) will appear under Connected Remote Clients, located under the Status tab. The screen will show the client name, its external address, and the address assigned to it by OpenVPN, in addition to the connection start time and the amount of data transmitted and received (in MB) during this connection:

For the site-to-site connection, the client server will show the name of the remote server and whether the connection is established, in addition to the amount of data transmitted and received in MB:

Event logs show a detailed connection history, as shown in the following screenshot:

In addition, there are two reports available for Untangle's OpenVPN:

Bandwidth usage: This report shows the maximum and average data transfer rate (KB/s) and the total amount of data transferred that day
Top users: This report shows the top users connected to the Untangle OpenVPN server

Troubleshooting Untangle's OpenVPN

In this section, we will discuss some points to consider when dealing with Untangle NGFW OpenVPN. OpenVPN acts as a router, as it routes between different networks. Using OpenVPN with Untangle NGFW in bridge mode (that is, the Untangle NGFW server is behind another router) requires additional configuration:

Create a static route on the router that routes any traffic from the VPN range (the VPN address pool) to the Untangle NGFW server.
Create a port forward rule on the router for the OpenVPN port 1194 (UDP) to Untangle NGFW.
Verify that your setting under Config | Administration | Public Address is correct, as it is used by Untangle to configure OpenVPN clients, and ensure that the configured address is resolvable from outside the company.
If the OpenVPN client is connected but you can't access anything, perform the following steps:

Verify that the hosts you are trying to reach are exported in Exported Networks.
Try to ping the Untangle NGFW LAN IP address (if exported).
Try to bring up the Untangle NGFW GUI by entering the IP address in a browser.

If the preceding tasks work, your tunnel is up and operational. If you can't reach any clients inside the network, check for the following conditions:

The client machine's firewall is not preventing the connection from the OpenVPN client.
The client machine uses Untangle as a gateway, or has a static route that sends the VPN address pool to Untangle NGFW.

In addition, some port forwarding rules on Untangle NGFW are needed for OpenVPN to function properly. The required ports are 53, 445, 389, 88, 135, and 1025.

If the site-to-site tunnel is set up correctly but the two sites can't talk to each other, the reasons may be as follows:

If your sites have IPs from the same subnet (this typically happens when you use a service from the same ISP for both branches), OpenVPN may fail, as it considers that no routing is needed between IPs in the same subnet; you should ask your ISP to change the IPs.
To get DNS resolution to work over the site-to-site tunnel, you'll need to go to Config | Network | Advanced | DNS Server | Local DNS Servers and add the IP of the DNS server on the far side of the tunnel. Enter the domain in the Domain List column and use the FQDN when accessing resources. You'll need to do this on both sides of the tunnel for it to work from either side.

If you are using a site-to-site VPN in addition to the client-to-site VPN, and the OpenVPN client is able to connect to the main site only, you'll need to add the VPN Address Pool to Exported Hosts and Networks.

Lab-based training

This section provides training for the OpenVPN site-to-site and client-to-site scenarios. In this lab, we will mainly use Untangle-01, Untangle-03, and a laptop (192.168.1.7).
The ABC bank started a project with Acme schools. As part of this project, the ABC bank team needs to periodically access files located on Acme-FS01, so the two parties decided to opt for OpenVPN. However, Acme's network team doesn't want to leave access wide open for ABC bank members, so they set firewall rules that limit ABC bank's access to the file server only. In addition, the IT team director wants to have VPN access from home to the Acme network, which they decided to accomplish using OpenVPN. The following diagram shows the environment used in the site-to-site scenario:

To create the site-to-site connection, we will need to perform the following steps:

Enable the OpenVPN server on Untangle-01.
Create a network type client with a remote network of 172.16.1.0/24.
Download the client and import it under the Client tab in Untangle-03.
Restart the two servers.

After the restart, you have a site-to-site VPN connection. However, the Acme network is wide open to the ABC bank, so we need to create a limiting firewall rule. On Untangle-03, create a rule that allows any traffic that comes from the OpenVPN interface, has a source of 172.16.136.10 (the Untangle-01 client IP), and is directed to 172.16.1.7 (Acme-FS01). The rule is shown in the following screenshot:

We will also need a general block rule that comes after the preceding rule in the rule evaluation order. The environment used for the client-to-site connection is shown in the following diagram:

To create a client-to-site VPN connection, we need to perform the following steps:

Enable the OpenVPN server on Untangle-03.
Create an individual client type client on Untangle-03.
Distribute the client to the intended user (that is, 192.168.1.7).
Install OpenVPN on your laptop.
Connect using the installed OpenVPN client and try to ping Acme-DC01 using its name. The ping will fail because the client is not able to query the Acme DNS. So, in the Default Group settings, change Push DNS Domain to Acme.local.
Changing the group settings will not affect the OpenVPN client till the client is restarted. Now, the ping process will be a success.

Summary

In this article, we covered the VPN services provided by Untangle NGFW. We went deeply into understanding how each solution works. This article also provided a guide on how to configure and deploy the services. Untangle provides a free solution that is based on the well-known open source OpenVPN, which provides an SSL-based VPN.

Packt
29 Mar 2010
2 min read

Setting Up the iReport Pages

Configuring the page format

We can follow the listed steps for setting up report pages:

Open the report List of Products.
Go to menu Window | Report Inspector. The following window will appear on the left side of the report designer:
Select the report List of Products, right-click on it, and choose Page Format…. The Page format… dialog box will appear; select A4 from the Format drop-down list, and select Portrait from the Page orientation section.
You can modify the page margins if you need to, or leave them as they are to keep the default margins. For our report, you need not change the margins.
Press OK.

Page size

You have seen that there are many preset sizes/formats for the report, such as Custom, Letter, Note, Legal, A0 to A10, B0 to B5, and so on. You will choose the appropriate one based on your requirements. We have chosen A4. If the number of columns is too high to fit in Portrait, then choose the Landscape orientation. If you change the preset sizes, the report elements (title, column heading, fields, or other elements) will not be positioned automatically according to the new page size. You have to position each element manually, so be careful if you decide to change the page size.

Configuring properties

We can modify the default settings of report properties in the following way:

Right-click on List of Products and choose Properties.

We can configure many important report properties from the Properties window. You can see that there are many options here. You can change the Report name, Page size, Margins, Columns, and more. We have already learnt about setting up pages, so now our concern is to learn about some of the other (More…) options.
Bhagyashree R
07 Sep 2018
11 min read

Building a Twitter news bot using Twitter API [Tutorial]

This article is an excerpt from a book written by Srini Janarthanam titled Hands-On Chatbots and Conversational UI Development. In this article, we will explore the Twitter API and build core modules for tweeting, searching, and retweeting. We will further explore a data source for news around the globe and build a simple bot that tweets top news on its timeline.

Getting started with the Twitter app

To get started, let us explore the Twitter developer platform. Let us begin by building a Twitter app and later explore how we can tweet news articles to followers based on their interests:

Log on to Twitter. If you don't have an account on Twitter, create one.
Go to Twitter Apps, which is Twitter's application management dashboard.
Click the Create New App button:
Create an application by filling in the form, providing a name, description, and a website (fully-qualified URL). Read and agree to the Developer Agreement and hit Create your Twitter application:
You will now see your application dashboard. Explore the tabs:
Click Keys and Access Tokens:
Copy the consumer key and consumer secret and hang on to them.
Scroll down to Your Access Token:
Click Create my access token to create a new token for your app:
Copy the Access Token and Access Token Secret and hang on to them.

Now, we have all the keys and tokens we need to create a Twitter app.

Building your first Twitter bot

Let's build a simple Twitter bot. This bot will listen to tweets and pick out those that have a particular hashtag. All the tweets with a given hashtag will be printed on the console. This is a very simple bot to help us get started. In the following sections, we will explore more complex bots. To follow along, you can download the code from the book's GitHub repository.

Go to the root directory and create a new Node.js program using npm init:
Execute the npm install twitter --save command to install the Twitter Node.js library:
Run npm install request --save to install the Request library as well.
We will use this in the future to make HTTP GET requests to a news data source.

Explore your package.json file in the root directory:

{
  "name": "twitterbot",
  "version": "1.0.0",
  "description": "my news bot",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "request": "^2.81.0",
    "twitter": "^1.7.1"
  }
}

Create an index.js file with the following code:

//index.js
var TwitterPackage = require('twitter');
var request = require('request');

console.log("Hello World! I am a twitter bot!");

var secret = {
  consumer_key: 'YOUR_CONSUMER_KEY',
  consumer_secret: 'YOUR_CONSUMER_SECRET',
  access_token_key: 'YOUR_ACCESS_TOKEN_KEY',
  access_token_secret: 'YOUR_ACCESS_TOKEN_SECRET'
}
var Twitter = new TwitterPackage(secret);

In the preceding code, put the keys and tokens you saved in their appropriate variables. We don't need the request package just yet, but we will later. Now let's create a hashtag listener to listen to the tweets on a specific hashtag:

//Twitter stream
var hashtag = '#brexit'; //put any hashtag to listen e.g. #brexit
console.log('Listening to:' + hashtag);

Twitter.stream('statuses/filter', {track: hashtag}, function(stream) {
  stream.on('data', function(tweet) {
    console.log('Tweet:@' + tweet.user.screen_name + '\t' + tweet.text);
    console.log('------');
  });
  stream.on('error', function(error) {
    console.log(error);
  });
});

Replace #brexit with the hashtag you want to listen to. Use a popular one so that you can see the code in action. Run the index.js file with the node index.js command. You will see a stream of tweets from Twitter users all over the globe who used the hashtag:

Congratulations! You have built your first Twitter bot.

Exploring the Twitter SDK

In the previous section, we explored how to listen to tweets based on hashtags. Let's now explore the Twitter SDK to understand the capabilities that we can bestow upon our Twitter bot.
Updating your status

You can also update your status on your Twitter timeline by using the following status update module code:

tweet('I am a Twitter Bot!', null, null);

function tweet(statusMsg, screen_name, status_id){
  console.log('Sending tweet to: ' + screen_name);
  console.log('In response to:' + status_id);
  var msg = statusMsg;
  if (screen_name != null){
    msg = '@' + screen_name + ' ' + statusMsg;
  }
  console.log('Tweet:' + msg);
  Twitter.post('statuses/update', { status: msg }, function(err, response) {
    // if there was an error while tweeting
    if (err) {
      console.log('Something went wrong while TWEETING...');
      console.log(err);
    } else if (response) {
      console.log('Tweeted!!!');
      console.log(response);
    }
  });
}

Comment out the hashtag listener code, add the preceding status update code instead, and run it. When run, your bot will post a tweet on your timeline:

In addition to tweeting on your timeline, you can also tweet in response to another tweet (or status update). The screen_name argument is used to create a response tweet; screen_name is the name of the user who posted the tweet. We will explore this a bit later.
Retweet to your followers

You can retweet a tweet to your followers using the following retweet status code:

var retweetId = '899681279343570944';
retweet(retweetId);

function retweet(retweetId){
  Twitter.post('statuses/retweet/', { id: retweetId }, function(err, response) {
    if (err) {
      console.log('Something went wrong while RETWEETING...');
      console.log(err);
    } else if (response) {
      console.log('Retweeted!!!');
      console.log(response);
    }
  });
}

Searching for tweets

You can also search for recent or popular tweets with hashtags using the following search hashtags code:

search('#brexit', 'popular');

function search(hashtag, resultType){
  var params = {
    q: hashtag, // REQUIRED
    result_type: resultType,
    lang: 'en'
  }
  Twitter.get('search/tweets', params, function(err, data) {
    if (!err) {
      console.log('Found tweets: ' + data.statuses.length);
      console.log('First one: ' + data.statuses[0].text);
    } else {
      console.log('Something went wrong while SEARCHING...');
    }
  });
}

Exploring a news data service

Let's now build a bot that will tweet news articles to its followers at regular intervals. We will then extend it to be personalized by users through a conversation that happens over direct messaging with the bot. In order to build a news bot, we need a source where we can get news articles. We are going to explore a news service called NewsAPI.org in this section. News API is a service that aggregates news articles from roughly 70 newspapers around the globe.

Setting up News API

Let us set up an account with the News API data service and get the API key:

Go to NewsAPI.org:
Click Get API key.
Register using your email.
Get your API key.
Explore the sources: https://newsapi.org/v1/sources?apiKey=YOUR_API_KEY

There are about 70 sources from across the globe, including popular ones such as BBC News, Associated Press, Bloomberg, and CNN. You might notice that each source has a category tag attached.
The possible options are: business, entertainment, gaming, general, music, politics, science-and-nature, sport, and technology. You might also notice that each source has language (en, de, fr) and country (au, de, gb, in, it, us) tags. The following is the information on the BBC News source:

{
  "id": "bbc-news",
  "name": "BBC News",
  "description": "Use BBC News for up-to-the-minute news, breaking news, video, audio and feature stories. BBC News provides trusted World and UK news as well as local and regional perspectives. Also entertainment, business, science, technology and health news.",
  "url": "http://www.bbc.co.uk/news",
  "category": "general",
  "language": "en",
  "country": "gb",
  "urlsToLogos": {
    "small": "",
    "medium": "",
    "large": ""
  },
  "sortBysAvailable": [
    "top"
  ]
}

Get sources for a specific category, language, or country using: https://newsapi.org/v1/sources?category=business&apiKey=YOUR_API_KEY

The following is part of the response to the preceding query asking for all sources under the business category:

"sources": [
  {
    "id": "bloomberg",
    "name": "Bloomberg",
    "description": "Bloomberg delivers business and markets news, data, analysis, and video to the world, featuring stories from Businessweek and Bloomberg News.",
    "url": "http://www.bloomberg.com",
    "category": "business",
    "language": "en",
    "country": "us",
    "urlsToLogos": {
      "small": "",
      "medium": "",
      "large": ""
    },
    "sortBysAvailable": [
      "top"
    ]
  },
  {
    "id": "business-insider",
    "name": "Business Insider",
    "description": "Business Insider is a fast-growing business site with deep financial, media, tech, and other industry verticals. Launched in 2007, the site is now the largest business news site on the web.",
    "url": "http://www.businessinsider.com",
    "category": "business",
    "language": "en",
    "country": "us",
    "urlsToLogos": {
      "small": "",
      "medium": "",
      "large": ""
    },
    "sortBysAvailable": [
      "top",
      "latest"
    ]
  },
  ...
]

Explore the articles: https://newsapi.org/v1/articles?source=bbc-news&apiKey=YOUR_API_KEY

The following is a sample response:

"articles": [
  {
    "author": "BBC News",
    "title": "US Navy collision: Remains found in hunt for missing sailors",
    "description": "Ten US sailors have been missing since Monday's collision with a tanker near Singapore.",
    "url": "http://www.bbc.co.uk/news/world-us-canada-41013686",
    "urlToImage": "https://ichef1.bbci.co.uk/news/1024/cpsprodpb/80D9/production/_97458923_mediaitem97458918.jpg",
    "publishedAt": "2017-08-22T12:23:56Z"
  },
  {
    "author": "BBC News",
    "title": "Afghanistan hails Trump support in 'joint struggle'",
    "description": "President Ghani thanks Donald Trump for supporting Afghanistan's battle against the Taliban.",
    "url": "http://www.bbc.co.uk/news/world-asia-41012617",
    "urlToImage": "https://ichef.bbci.co.uk/images/ic/1024x576/p05d08pf.jpg",
    "publishedAt": "2017-08-22T11:45:49Z"
  },
  ...
]

For each article, the author, title, description, url, urlToImage, and publishedAt fields are provided. Now that we have explored a source of news data that provides up-to-date news stories under various categories, let us go on to build a news bot.

Building a Twitter news bot

Now that we have explored News API, a data source for the latest news updates, and a little bit of what the Twitter API can do, let us combine them both to build a bot tweeting interesting news stories, first on its own timeline and then specifically to each of its followers:

Let's build a news tweeter module that tweets the top news article given the source.
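Since the source listing is plain JSON, narrowing it down by category or language is a small exercise in array filtering. The following sketch is our own addition (the filterSources helper and the sample data are not part of the book's code); it works on any parsed /v1/sources response with the fields shown above:

```javascript
// Pure helper: filter a parsed News API /v1/sources response by
// category and/or language, returning the matching source IDs.
// No network calls, so it is easy to test in isolation.
function filterSources(sources, category, language) {
  return sources
    .filter(function (s) { return !category || s.category === category; })
    .filter(function (s) { return !language || s.language === language; })
    .map(function (s) { return s.id; });
}

// Example with a tiny hand-made source list mirroring the fields above:
var sample = [
  { id: 'bbc-news', category: 'general', language: 'en' },
  { id: 'bloomberg', category: 'business', language: 'en' },
  { id: 'business-insider', category: 'business', language: 'en' }
];
console.log(filterSources(sample, 'business', 'en'));
// [ 'bloomberg', 'business-insider' ]
```

The same helper could be applied to the full list fetched from the API before choosing which sources to tweet from.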
The following code uses the tweet() function we built earlier:

topNewsTweeter('cnn', null);

function topNewsTweeter(newsSource, screen_name, status_id){
  request({
    url: 'https://newsapi.org/v1/articles?source=' + newsSource + '&apiKey=YOUR_API_KEY',
    method: 'GET'
  }, function (error, response, body) {
    //response is from the bot
    if (!error && response.statusCode == 200) {
      var botResponse = JSON.parse(body);
      console.log(botResponse);
      tweetTopArticle(botResponse.articles, screen_name);
    } else {
      console.log('Sorry. No news');
    }
  });
}

function tweetTopArticle(articles, screen_name, status_id){
  var article = articles[0];
  tweet(article.title + " " + article.url, screen_name);
}

Run the preceding program to fetch news from CNN and post the topmost article on Twitter:

Here is the post on Twitter:

Now, let us build a module that tweets news stories from a randomly-chosen source in a list of sources:

function tweetFromRandomSource(sources, screen_name, status_id){
  var max = sources.length;
  // Math.random() * max (not max + 1) keeps the index within the array bounds
  var randomSource = sources[Math.floor(Math.random() * max)];
  topNewsTweeter(randomSource, screen_name, status_id);
}

Let's call the tweeting module after we acquire the list of sources:

function getAllSourcesAndTweet(){
  var sources = [];
  console.log('getting sources...');
  request({
    url: 'https://newsapi.org/v1/sources?apiKey=YOUR_API_KEY',
    method: 'GET'
  }, function (error, response, body) {
    //response is from the bot
    if (!error && response.statusCode == 200) {
      // Print out the response body
      var botResponse = JSON.parse(body);
      for (var i = 0; i < botResponse.sources.length; i++){
        console.log('adding.. ' + botResponse.sources[i].id);
        sources.push(botResponse.sources[i].id);
      }
      tweetFromRandomSource(sources, null, null);
    } else {
      console.log('Sorry. No news sources!');
    }
  });
}

Let's create a new JS file called tweeter.js.
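One practical wrinkle with tweetTopArticle() is that a long headline plus a URL can overflow Twitter's character limit. A small guard helper is sketched below; note that composeTweet is our own addition, not from the book, and the 280-character limit and the 23-character t.co link length are assumptions about how Twitter counts tweet text and wrapped links:

```javascript
// Hypothetical helper: trim the title so that title + URL fits in a
// tweet. We assume Twitter's 280-character limit and that every link
// is wrapped by t.co at a fixed 23 characters, regardless of length.
var TWEET_LIMIT = 280;
var TCO_LINK_LENGTH = 23;

function composeTweet(title, url) {
  var budget = TWEET_LIMIT - TCO_LINK_LENGTH - 1; // 1 for the separating space
  var text = title.length > budget
    ? title.slice(0, budget - 1) + '\u2026' // trim and append an ellipsis
    : title;
  return text + ' ' + url;
}
```

With it, the call inside tweetTopArticle() could become tweet(composeTweet(article.title, article.url), screen_name).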
In the tweeter.js file, call getAllSourcesAndTweet() to get the process started:

//tweeter.js
var TwitterPackage = require('twitter');
var request = require('request');

console.log("Hello World! I am a twitter bot!");

var secret = {
  consumer_key: 'YOUR_CONSUMER_KEY',
  consumer_secret: 'YOUR_CONSUMER_SECRET',
  access_token_key: 'YOUR_ACCESS_TOKEN_KEY',
  access_token_secret: 'YOUR_ACCESS_TOKEN_SECRET'
}
var Twitter = new TwitterPackage(secret);

getAllSourcesAndTweet();

Run the tweeter.js file on the console. This bot will tweet a news story every time it is called. It will choose top news stories randomly from around 70 news sources. Hurray! You have built your very own Twitter news bot.

In this tutorial, we have covered a lot. We started off with the Twitter API and got a taste of how we can automatically tweet, retweet, and search for tweets using hashtags. We then explored a News source API that provides news articles from about 70 different newspapers. We integrated it with our Twitter bot to create a news tweeting bot. If you found this post useful, do check out the book, Hands-On Chatbots and Conversational UI Development, which will help you explore the world of conversational user interfaces.

Build and train an RNN chatbot using TensorFlow [Tutorial]
Building a two-way interactive chatbot with Twilio: A step-by-step guide
How to create a conversational assistant or chatbot using Python

Packt
18 Feb 2016
12 min read

Reactive Programming and the Flux Architecture

Reactive programming, including functional reactive programming as will be discussed later, is a programming paradigm that can be used in multiparadigm languages such as JavaScript, Python, Scala, and many more. It is primarily distinguished from imperative programming, in which a statement does something by way of what the literature on functional and reactive programming calls side effects. Please note, though, that side effects here are not what they are in common English, where all medications have some effects, which are the point of taking the medication, and some other effects that are unwanted but tolerated for the main benefit. For example, Benadryl is taken for the express purpose of reducing symptoms of airborne allergies, and the fact that Benadryl, like some other allergy medicines, can also cause drowsiness is (or at least was; now it is also sold as a sleeping aid) a side effect. This is unwelcome but tolerated as the lesser of two evils by people who would rather be somewhat tired and not bothered by allergies than be alert but bothered by frequent sneezing. For a programmer, by contrast, side effects are not incidental as medication side effects are: they are the primary intended purpose and effect of a statement, often implemented through changes in the stored state of a program. Reactive programming has its roots in the observer pattern, as discussed in Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides's classic book Design Patterns: Elements of Reusable Object-Oriented Software (the authors of this book are commonly called GoF or the Gang of Four). In the observer pattern, there is an observable subject. It has a list of listeners, and notifies all of them when it has something to publish.
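The observer pattern just described can be sketched in a few lines of JavaScript. This is a minimal illustration of the idea, not the full GoF interface:

```javascript
// Minimal observer pattern: an observable subject keeps a list of
// listeners and notifies all of them when it publishes something.
function Subject() {
  this.listeners = [];
}
Subject.prototype.subscribe = function (listener) {
  this.listeners.push(listener);
};
Subject.prototype.publish = function (data) {
  this.listeners.forEach(function (listener) { listener(data); });
};

var subject = new Subject();
var received = [];
subject.subscribe(function (msg) { received.push('A got ' + msg); });
subject.subscribe(function (msg) { received.push('B got ' + msg); });
subject.publish('news');
console.log(received); // [ 'A got news', 'B got news' ]
```

Every subscribed listener is notified of every publication, with no filtering of which messages reach which listener.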
This is somewhat simpler than the publisher/subscriber (PubSub) pattern, as it lacks the potentially intricate filtering of which messages reach which subscriber that PubSub normally includes. Reactive programming has developed a life of its own, a bit like the MVC pattern-turned-buzzword, but it is best taken in connection with the broader context explored in GoF. Reactive programming, including the ReactJS framework (which is explored in this title), is intended to avoid the shared mutable state and to be idempotent. This means that, as with RESTful web services, you will get the same result from a function whether you call it once or a hundred times. Pete Hunt, formerly of Facebook and perhaps the face of ReactJS as it now exists, has said that he would rather be predictable than right. If there is a bug in his code, Hunt would rather have the interface fail the same way every single time than go on elaborate hunts for heisenbugs. These are bugs that manifest only in some special and slippery edge cases, and are explored later in this book. ReactJS is called the V of MVC. That is, it is intended for user interface work and has little intention of offering other standard features. But just as the painter Paul Cézanne said about the impressionist painter Claude Monet, "Monet is only an eye, but what an eye!", about MVC and ReactJS we can say, "ReactJS is only a view, but what a view!"

In this chapter, we will be covering the following topics:

Declarative programming
The war on heisenbugs
The Flux Architecture
From the pit of despair to the pit of success
A complete UI teardown and rebuild
JavaScript as a Domain-specific Language (DSL)
Big-Coffee Notation

ReactJS, the library explored in this book, was developed by Facebook and made open source in the not-too-distant past.
It is shaped by some of Facebook's concerns about making a large-scale site that is safe to debug and work on, and also allowing a large number of programmers to work on different components without having to store brain-bending levels of complexity in their heads. The quotation "Simplicity is the lack of interleaving," which can be found in the videos at http://facebook.github.io/react, is not about how much or how little stuff there is on an absolute scale, but about how many moving parts you need to juggle simultaneously to work on a system (see the section on Big-Coffee Notation for further reflections).

Declarative programming

Probably the biggest theoretical advantage of the ReactJS framework is that its programming is declarative rather than imperative. In imperative programming, you specify what steps need to be done; declarative programming is programming in which you specify what needs to be accomplished without telling how it needs to be done. It may be difficult at first to shift from an imperative paradigm to a declarative paradigm, but once the shift has been made, it is well worth the effort involved to get there. Familiar examples of declarative paradigms, as opposed to imperative paradigms, include both SQL and HTML. An SQL query would be much more verbose if you had to specify how exactly to find records and filter them appropriately, let alone say how indices are to be used, and HTML would be much more verbose if, instead of having an IMG tag, you had to specify how to render an image. Many libraries, for instance, are more declarative than rolling your own solution from scratch. With a library, you are more likely to specify only what needs to be done and not, in addition to this, how to do it.
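The same contrast shows up in miniature in everyday JavaScript. Both snippets below double a list of numbers; the imperative one spells out the steps and the bookkeeping, while the declarative one only states the transformation:

```javascript
var numbers = [1, 2, 3];

// Imperative: spell out the loop, the counter, and the accumulator.
var doubledImperative = [];
for (var i = 0; i < numbers.length; i++) {
  doubledImperative.push(numbers[i] * 2);
}

// Declarative: state what the result is and let map() do the walking.
var doubledDeclarative = numbers.map(function (n) { return n * 2; });

console.log(doubledImperative);  // [ 2, 4, 6 ]
console.log(doubledDeclarative); // [ 2, 4, 6 ]
```

ReactJS applies the same shift to user interfaces: you declare what the UI should look like for a given state, rather than scripting each DOM mutation.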
ReactJS is not in any sense the only library or framework that is intended to provide a more declarative JavaScript, but this is one of its selling points, along with other specifics that it offers to help teams work together and be productive. And again, ReactJS has emerged from some of Facebook's efforts in managing bugs and cognitive load while enabling developers to contribute a lot to a large-scale project.

The war on heisenbugs

In modern physics, Heisenberg's uncertainty principle loosely says that there is an absolute theoretical limit to how well a particle's position and velocity can be known. Regardless of how good a laboratory's measuring equipment gets, funny things will always happen when you try to pin things down too far. Heisenbugs, loosely speaking, are subtle, slippery bugs that can be very hard to pin down. They only manifest under very specific conditions and may even fail to manifest when one attempts to investigate them (note that this definition is slightly different from the jargon file's narrower and more specific definition at http://www.catb.org/jargon/html/H/heisenbug.html, which specifies that attempting to measure a heisenbug may suppress its manifestation). This motive, of declaring war on heisenbugs, stems from Facebook's own woes and experiences in working at scale and seeing heisenbugs keep popping up. One thing that Pete Hunt mentioned, in not a flattering light at all, was a point where Facebook's advertisement system was understood well enough to be modified comfortably by only two engineers. This is an example of something to avoid. By contrast, Pete Hunt's remark that he would "rather be predictable than right" is a statement that, if a defectively designed lamp can catch fire and burn, he would much, much rather have it catch fire and burn immediately, the same way, every single time, than have it burn only at just the wrong phase of the moon.
In the first case, the lamp will fail while the manufacturer is testing it; the problem will be noticed and addressed, and lamps will not be shipped out to the public until the defect has been properly addressed. In the opposite, heisenbug case, the lamp will spark and catch fire only under just the wrong conditions, which means that the defect will not be caught until the lamps have shipped and started burning customers' homes down. "Predictable" means "fails the same way, every time, if it's going to fail at all." "Right" means "passes testing successfully, but we don't know whether they're safe to use (probably they aren't)." Now, he ultimately does, in fact, care about being right, but the choices that Facebook has made surrounding React stem from a realization that being predictable is a means to being right. It's not acceptable for a manufacturer to ship something that will always spark and catch fire when a consumer plugs it in. However, being predictable moves the problems front and center, rather than leaving them as the occasional result of subtle, hard-to-pin-down interactions that will have unacceptable consequences in some rare circumstances. The choices in Flux and ReactJS are designed to make failures obvious and bring them to the surface, rather than having them manifest only in the nooks and crannies of a software labyrinth. Facebook's war on the shared mutable state is illustrated in the experience that they had regarding a chat bug. The chat bug became an overarching concern for its users. One crucial moment of enlightenment for Facebook came when they announced a completely unrelated feature, and the first comment on this feature was a request to fix the chat; it got 898 likes. Also, they commented that this was one of the more polite requests. The problem was that the indicator for unread messages could have a phantom positive message count when there were no messages available.
Things came to a point where people seemed not to care about what improvements or new features Facebook was adding, but just wanted them to fix the phantom message count. And Facebook kept investigating and kept addressing edge cases, but the phantom message count kept on recurring. The solution, besides ReactJS, was found in the Flux pattern, or architecture, which is discussed in the next section. Where previously not many people had felt comfortable making changes, all of a sudden many more people did. These things simplified matters enough that new developers tended not to really need the ramp-up time and treatment that had previously been given. Furthermore, when there was a bug, the more experienced developers could guess with reasonable accuracy which part of the system was the culprit, and the newer developers, after working on a bug, tended to feel confident and have a general sense of how the system worked. The Flux Architecture One of the ways in which Facebook, in relation to ReactJS, has declared war on heisenbugs is by declaring war on the mutable state. Flux is an architecture and a pattern, rather than a specific technology, and it can be used (or not used) with ReactJS. It is somewhat like MVC, a loose competitor to that approach, but it is very different from a simple MVC variant and is designed to provide a pit of success through unidirectional data flow: from the action to the dispatcher, then to the store, and finally to the view (although some people have said that the two are so different that a direct comparison between Flux and MVC, in terms of trying to identify which part of Flux corresponds to which conceptual hook in MVC, is not really that helpful). Actions are like events; they are fed into a top funnel. Dispatchers go through the funnels and not only pass actions along but also make sure that no additional actions are dispatched until the previous one has completely settled out. 
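The action → dispatcher → store → view flow can be sketched in a few lines of plain JavaScript. This is an illustrative toy, not Facebook's actual Flux library; the Dispatcher constructor and createCounterStore function below are our own names for the sketch:

```javascript
// Toy Flux-style flow: action -> dispatcher -> store -> (view reads store).
function Dispatcher() {
  this.callbacks = [];
  this.dispatching = false;
}
Dispatcher.prototype.register = function (callback) {
  this.callbacks.push(callback);
};
Dispatcher.prototype.dispatch = function (action) {
  // Like Flux's dispatcher, refuse to start a new action before the
  // previous one has completely settled out.
  if (this.dispatching) {
    throw new Error('Cannot dispatch in the middle of a dispatch');
  }
  this.dispatching = true;
  try {
    this.callbacks.forEach(function (callback) { callback(action); });
  } finally {
    this.dispatching = false;
  }
};

// A store keeps private state and exposes a getter only; there is no
// setter for arbitrary code to call.
function createCounterStore(dispatcher) {
  var count = 0; // private; never exposed for direct mutation
  dispatcher.register(function (action) {
    if (action.type === 'INCREMENT') { count += 1; }
  });
  return { getCount: function () { return count; } };
}

var dispatcher = new Dispatcher();
var store = createCounterStore(dispatcher);
dispatcher.dispatch({ type: 'INCREMENT' });
// A view would now re-render by reading store.getCount()
```

The only way to change the counter is to dispatch an action through the single funnel, which is exactly the property that makes data flow predictable.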
Stores have similarities and differences to models. They are like models in that they keep track of state. They are unlike models in that they have only getters, not setters, which prevents any part of the program holding a reference to a store from changing its state directly. Stores can accept input, but in a very controlled way, and in general a store is not at the mercy of anything possessing a reference to it. A view is what displays the current output based on what is obtained from the stores. It is possible for events to be percolated as actions, but the dispatcher acts as a traffic cop and ensures that new actions are processed only after the stores have completely settled. This de-escalates the complexity considerably. Flux simplified interactions so that Facebook developers no longer had subtle edge cases and bugs that kept coming back; the chat bug was finally dead and has not come back. Summary We just took a whirlwind tour of some of the theory surrounding reactive programming with ReactJS. This includes declarative programming, one of the selling points of ReactJS, which offers something easier to work with in the end than imperative programming. The war on heisenbugs is an overriding concern surrounding the decisions made by Facebook, including ReactJS. This takes place through Facebook's declared war on the shared mutable state. The Flux architecture is used by Facebook with ReactJS to avoid some nasty classes of bugs. 
To learn more about Reactive Programming and the Flux Architecture, the following books published by Packt Publishing (https://www.packtpub.com/) are recommended: Reactive Programming with JavaScript (https://www.packtpub.com/application-development/reactive-programming-javascript) Clojure Reactive Programming (https://www.packtpub.com/web-development/clojure-reactive-programming) Resources for Article:   Further resources on this subject: The Observer Pattern [article] Concurrency in Practice [article] Introduction to Akka [article]
Vincy Davis
17 May 2019
6 min read

Bryan Cantrill on the changing ethical dilemmas in Software Engineering

Earlier this month at the Craft Conference in Budapest, Bryan Cantrill (Chief Technology Officer at Joyent) gave a talk on “Andreessen's Corollary: Ethical Dilemmas in Software Engineering”. In 2011, Marc Andreessen had penned an essay, ‘Why Software Is Eating The World’, in The Wall Street Journal. In the article, he argued that software is present in all fields and is poised to take over large swathes of the economy. He believed, way back in 2011, that “many of the prominent new Internet companies are building real, high-growth, high-margin, highly defensible businesses.” Eight years later, Bryan Cantrill believes this prophecy is clearly coming to fulfillment. According to the article ‘Software engineering code of ethics’, published in 1997 by the ACM (Association for Computing Machinery), a code is not a simple ethical program that generates ethical judgements. In some situations, these codes can conflict with each other, which requires a software engineer to use judgement in a way that remains ethically consistent. The article provides certain principles for software engineers to follow. According to Bryan, these principles are difficult to follow. Some of the principles expect software engineers to ensure that the product they are working on is useful and of acceptable quality to the public, the employer, the client, and the user; is completed on time and at reasonable cost; and is free of errors. The code's specifications should be well documented according to the user's requirements and have the client's approval. The code should follow appropriate methodology and good management. Software engineers should ensure realistic estimates of cost, scheduling, and outcome of any project on which they work or propose to work. The guiding context surrounding the code of ethics remains timeless, but as times have changed, these principles have become old-fashioned. 
With the immense use of software, and industries implementing these codes, it's difficult for software engineers to follow these old principles and be ethically sound. Bryan calls this era an ‘ethical grey area’ for software engineers. The software's contact with our broader world has brought with it novel ethical dilemmas for those who endeavor to build it. More than ever, software engineers are likely to find themselves in new frontiers with respect to society, the law, or their own moral compass. Often without any formal training or even acknowledgement of the ethical dimensions of their work, software engineers have to make ethical judgments. Ethical dilemmas in software development since Andreessen's prophecy 2012: Facebook started using emotional manipulation, performing experiments in the name of research or to generate revenue, in which posts were classified as positive or negative. 2013: ‘Zenefits’ is a Silicon Valley startup. In order to build their software, they had to be certified by the state of California, for which they had to sit through 52 hours of training, studying the materials through the web browser. A manager created a hack called ‘Macro’ that made it possible to complete the pre-licensing education requirement in less than 52 hours. This was passed on to almost 100 Zenefits employees to automate the process for them too. 2014: Uber illegally entered the Portland market with software called ‘Greyball’. This software was used by Uber to intentionally evade Portland Bureau of Transportation (PBOT) officers and deny their ride requests. 2015: Google started to mislabel photo captions. In one instance, Google mistakenly identified a dark-skinned individual as a ‘Gorilla’. It reacted promptly and removed the photo. This highlighted a real negative point of Artificial Intelligence (AI): AI relies on biased human classification, at times using repeated patterns. 
Google faced the problem of defending this mistake, as it had not intentionally misled its network with such wrong data. 2016: The first Tesla ‘Autopilot’ car was launched. It had traffic-avoiding cruise control and steering-assist features, but it was sold and marketed as an autopilot car. In an accident, the driver was killed, perhaps because he believed that the car would drive itself. This was a serious problem. Tesla was using two cameras to judge movements while driving. It should be understood that this Tesla car was just an enhancement to the driver and not a replacement. 2017: Facebook faced ire over the anti-Rohingya violence in Myanmar. Facebook messages were used to coordinate an effective genocide against the Rohingya, a mostly Muslim minority community, in which 75,000 people died. Facebook did not enable it or advocate it; it was merely a communication platform, used for a wrong purpose. But Facebook could have helped to reduce the gravity of the situation by acting promptly and not allowing such messages to be circulated. This shows that not everything should be automated, and human judgement cannot be replaced anytime soon. 2018: In the wake of the Pittsburgh shooting, the alleged shooter had used the Gab platform to post against Jews. Gab, which bills itself as "the free speech social network," is small compared to mainstream social media platforms, but it has an avid user base. Joyent provided infrastructure to Gab, but quickly removed Gab from its platform after the horrific incident. 2019: After the 737 MAX crashes (JT610 and ET302), reports emerged that the aircraft's MCAS system played a role in the crashes. The crashes happened because a faulty sensor erroneously reported that the airplane was stalling. The false report triggered an automated system known as the Maneuvering Characteristics Augmentation System (MCAS), which activates without the pilot's input. The crew confirmed that the manual trim operation was not working. 
These are some examples of ethical dilemmas post-Andreessen's prophecy. As seen, all the incidents were the result of ethical decisions gone wrong. It is clear that ‘what is right for software is not necessarily right for society.’ How to deal with these ethical dilemmas? In the summer of 2018, the ACM came up with a new code of ethics: Contribute to society and human well-being Avoid harm Be honest and trustworthy It has also included an Integrity Project, which will have case studies and an “Ask an Ethicist” feature. These efforts by the ACM will help software engineers facing ethical dilemmas. They will also pave the way for great discussions resulting in behavior consistent with the code of ethics. Organisations should encourage such discussions, which will help like-minded people perpetuate a culture of consideration of ethical consequences. As software's footprint continues to grow, the ethical dilemmas of software engineers will only expand. These ethical dilemmas are Andreessen's corollary, and software engineers must address them collectively and directly. Software engineers agree with this evolving nature of ethical dilemmas: https://twitter.com/MA_Hanin/status/1129082836512911360 Watch the talk by Bryan Cantrill at Craft Conference. All coding and no sleep makes Jack/Jill a dull developer, research confirms Red Badger Tech Director Viktor Charypar talks monorepos, lifelong learning, and the challenges facing open source software Google AI engineers introduce Translatotron, an end-to-end speech-to-speech translation model
Packt
04 Mar 2015
20 min read

AngularJS Performance

In this article by Chandermani, the author of AngularJS by Example, we focus our discussion on the performance aspect of AngularJS. For most scenarios, we can all agree that AngularJS is insanely fast. For standard size views, we rarely see any performance bottlenecks. But many views start small and then grow over time. And sometimes the requirement dictates we build large pages/views with a sizable amount of HTML and data. In such a case, there are things that we need to keep in mind to provide an optimal user experience. Take any framework and the performance discussion on the framework always requires one to understand the internal working of the framework. When it comes to Angular, we need to understand how Angular detects model changes. What are watches? What is a digest cycle? What roles do scope objects play? Without a conceptual understanding of these subjects, any performance guidance is merely a checklist that we follow without understanding the why part. Let's look at some pointers before we begin our discussion on performance of AngularJS: The live binding between the view elements and model data is set up using watches. When a model changes, one or many watches linked to the model are triggered. Angular's view binding infrastructure uses these watches to synchronize the view with the updated model value. Model change detection only happens when a digest cycle is triggered. Angular does not track model changes in real time; instead, on every digest cycle, it runs through every watch to compare the previous and new values of the model to detect changes. A digest cycle is triggered when $scope.$apply is invoked. 
A number of directives and services internally invoke $scope.$apply: Directives such as ng-click, ng-mouse* do it on user action Services such as $http and $resource do it when a response is received from server $timeout or $interval call $scope.$apply when they lapse A digest cycle tracks the old value of the watched expression and compares it with the new value to detect if the model has changed. Simply put, the digest cycle is a workflow used to detect model changes. A digest cycle runs multiple times till the model data is stable and no watch is triggered. Once you have a clear understanding of the digest cycle, watches, and scopes, we can look at some performance guidelines that can help us manage views as they start to grow. (For more resources related to this topic, see here.) Performance guidelines When building any Angular app, any performance optimization boils down to: Minimizing the number of binding expressions and hence watches Making sure that binding expression evaluation is quick Optimizing the number of digest cycles that take place The next few sections provide some useful pointers in this direction. Remember, a lot of these optimization may only be necessary if the view is large. Keeping the page/view small The sanest advice is to keep the amount of content available on a page small. The user cannot interact/process too much data on the page, so remember that screen real estate is at a premium and only keep necessary details on a page. The lesser the content, the lesser the number of binding expressions; hence, fewer watches and less processing are required during the digest cycle. Remember, each watch adds to the overall execution time of the digest cycle. The time required for a single watch can be insignificant but, after combining hundreds and maybe thousands of them, they start to matter. Angular's data binding infrastructure is insanely fast and relies on a rudimentary dirty check that compares the old and the new values. 
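To make the dirty-check idea concrete, here is a toy sketch in plain JavaScript. The Scope constructor, $$watchers array, and digest loop below are our illustrative simplification, not Angular's actual source:

```javascript
// Toy sketch of Angular-style dirty checking (illustrative only).
function Scope() {
  this.$$watchers = [];
}
Scope.prototype.$watch = function (watchFn, listener) {
  // Each watch remembers the last value it saw, to compare against.
  this.$$watchers.push({ watchFn: watchFn, listener: listener, last: undefined });
};
Scope.prototype.$digest = function () {
  var self = this;
  var dirty;
  do { // keep looping until a full pass triggers no watch (model is stable)
    dirty = false;
    self.$$watchers.forEach(function (w) {
      var newValue = w.watchFn(self);
      if (newValue !== w.last) { // rudimentary old-vs-new comparison
        w.listener(newValue, w.last);
        w.last = newValue;
        dirty = true;
      }
    });
  } while (dirty);
};

var scope = new Scope();
scope.name = 'Angular';
var calls = 0;
scope.$watch(
  function (s) { return s.name; }, // watched expression
  function () { calls++; }         // listener runs only when the value changes
);
scope.$digest(); // change detected ('Angular' vs undefined): listener fires
scope.$digest(); // value stable: listener does not fire again
```

Note that every digest re-evaluates every watch function, which is why the cost of each watched expression matters so much in the guidelines that follow.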
Check out the stack overflow (SO) post (http://stackoverflow.com/questions/9682092/databinding-in-angularjs), where Misko Hevery (creator of Angular) talks about how data binding works in Angular. Data binding also adds to the memory footprint of the application. Each watch has to track the current and previous value of a data-binding expression to compare and verify if data has changed. Keeping a page/view small may not always be possible, and the view may grow. In such a case, we need to make sure that the number of bindings does not grow exponentially (linear growth is OK) with the page size. The next two tips can help minimize the number of bindings in the page and should be seriously considered for large views. Optimizing watches for read-once data In any Angular view, there is always content that, once bound, does not change. Any read-only data on the view can fall into this category. This implies that once the data is bound to the view, we no longer need watches to track model changes, as we don't expect the model to update. Is it possible to remove the watch after one-time binding? Angular itself does not have something inbuilt, but a community project bindonce (https://github.com/Pasvaz/bindonce) is there to fill this gap. Angular 1.3 has added support for bind and forget in the native framework. Using the syntax {{::title}}, we can achieve one-time binding. If you are on Angular 1.3, use it! Hiding (ng-show) versus conditional rendering (ng-if/ng-switch) content You have learned two ways to conditionally render content in Angular. The ng-show/ng-hide directive shows/hides the DOM element based on the expression provided and ng-if/ng-switch creates and destroys the DOM based on an expression. For some scenarios, ng-if can be really beneficial as it can reduce the number of binding expressions/watches for the DOM content not rendered. 
Consider the following example:

<div ng-if='user.isAdmin'>
  <div ng-include="'admin-panel.html'"></div>
</div>

The snippet renders an admin panel if the user is an admin. With ng-if, if the user is not an admin, the ng-include directive template is neither requested nor rendered, saving us all the bindings and watches that are part of the admin-panel.html view. From the preceding discussion, it may seem that we should get rid of all ng-show/ng-hide directives and use ng-if. Well, not really! It again depends; for small pages, ng-show/ng-hide works just fine. Also, remember that there is a cost to creating and destroying the DOM. If the expression to show/hide flips too often, this will mean too many DOM create-and-destroy cycles, which are detrimental to the overall performance of the app. Expressions being watched should not be slow Since watches are evaluated so often, the expression being watched should return results fast. The first way we can make sure of this is by using properties instead of functions in binding expressions. These expressions are as follows:

{{user.name}}
ng-show='user.Authorized'

The preceding code is always better than this:

{{getUserName()}}
ng-show='isUserAuthorized(user)'

Try to minimize function expressions in bindings. If a function expression is required, make sure that the function returns a result quickly. Make sure a function being watched does not: Make any remote calls Use $timeout/$interval Perform sorting/filtering Perform DOM manipulation (this can happen inside a directive implementation) Or perform any other time-consuming operation Be sure to avoid such operations inside a bound function. To reiterate, Angular will evaluate a watched expression multiple times during every digest cycle just to know if the return value (a model) has changed and the view needs to be synchronized. 
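The advice above (bind properties, precompute values) can be illustrated with a plain-JavaScript stand-in; the scope object and the property names here are hypothetical, not real Angular APIs:

```javascript
// Plain-JS sketch of moving work out of a watched expression.
// `scope` is a stand-in object, not a real Angular $scope.
var scope = {};
var user = { first: 'Ada', last: 'Lovelace' };

// Binding {{getFullName()}} would run this concatenation on every
// digest cycle, for every watch pass:
scope.getFullName = function () {
  return user.first + ' ' + user.last;
};

// Binding {{fullName}} instead watches a plain property; the work
// runs only when we choose to recompute it:
scope.fullName = user.first + ' ' + user.last;
// ...recompute scope.fullName only in the code paths that modify `user`
```

The trade-off is explicitness: we become responsible for refreshing the precomputed property, but the digest cycle is left with a cheap property read instead of a function call.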
Minimizing the deep model watch When using $scope.$watch to watch for model changes in controllers, be careful while setting the third $watch function parameter to true. The general syntax of watch looks like this: $watch(watchExpression, listener, [objectEquality]); In the standard scenario, Angular does an object comparison based on the reference only. But if objectEquality is true, Angular does a deep comparison between the last value and new value of the watched expression. This can have an adverse memory and performance impact if the object is large. Handling large datasets with ng-repeat The ng-repeat directive undoubtedly is the most useful directive Angular has. But it can cause the most performance-related headaches. The reason is not because of the directive design, but because it is the only directive that allows us to generate HTML on the fly. There is always the possibility of generating enormous HTML just by binding ng-repeat to a big model list. Some tips that can help us when working with ng-repeat are: Page data and use limitTo: Implement a server-side paging mechanism when a number of items returned are large. Also use the limitTo filter to limit the number of items rendered. Its syntax is as follows: <tr ng-repeat="user in users |limitTo:pageSize">…</tr> Look at modules such as ngInfiniteScroll (http://binarymuse.github.io/ngInfiniteScroll/) that provide an alternate mechanism to render large lists. Use the track by expression: The ng-repeat directive for performance tries to make sure it does not unnecessarily create or delete HTML nodes when items are added, updated, deleted, or moved in the list. To achieve this, it adds a $$hashKey property to every model item allowing it to associate the DOM node with the model item. We can override this behavior and provide our own item key using the track by expression such as: <tr ng-repeat="user in users track by user.id">…</tr> This allows us to use our own mechanism to identify an item. 
Using your own track by expression has a distinct advantage over the default hash key approach. Consider an example where you make an initial AJAX call to get users:

$scope.getUsers().then(function (users) {
  $scope.users = users;
});

Later, refresh the data from the server and call something similar again:

$scope.users = users;

With user.id as a key, Angular is able to determine which elements were added/deleted and moved; it can also determine created/deleted DOM nodes for such elements. The remaining elements are not touched by ng-repeat (internal bindings are still evaluated). This saves a lot of CPU cycles for the browser as fewer DOM elements are created and destroyed. Do not bind ng-repeat to a function expression: Using a function's return value for ng-repeat can also be problematic, depending upon how the function is implemented. Consider a repeat with this:

<tr ng-repeat="user in getUsers()">…</tr>

And consider the controller getUsers function with this:

$scope.getUsers = function () {
  var orderBy = $filter('orderBy');
  return orderBy($scope.users, predicate);
};

Angular is going to evaluate this expression, and hence call this function, every time the digest cycle takes place. A lot of CPU cycles are wasted sorting user data again and again. It is better to use scope properties and presort the data before binding. Minimize filters in views, use filter elements in the controller: Filters defined on ng-repeat are also evaluated every time the digest cycle takes place. For large lists, if the same filtering can be implemented in the controller, we can avoid constant filter evaluation. This holds true for any filter function that is used with arrays, including filter and orderBy. Avoiding mouse-movement tracking events The ng-mousemove, ng-mouseenter, ng-mouseleave, and ng-mouseover directives can just kill performance. 
If an expression is attached to any of these event directives, Angular triggers a digest cycle every time the corresponding event occurs, and for events like mouse move, this can be a lot. We have already seen this behavior when working with 7 Minute Workout, when we tried to show a pause overlay on the exercise image when the mouse hovers over it. Avoid them at all costs. If we just want to trigger some style changes on mouse events, CSS is a better tool. Avoiding calling $scope.$apply Angular is smart enough to call $scope.$apply at appropriate times without us explicitly calling it. This can be confirmed from the fact that the only place we have seen and used $scope.$apply is within directives. The ng-click and updateOnBlur directives use $scope.$apply to transition from a DOM event handler execution to an Angular execution context. Even when wrapping a jQuery plugin, we may need to do a similar transition for an event raised by the plugin. Other than this, there is no reason to use $scope.$apply. Remember, every invocation of $apply results in the execution of a complete digest cycle. The $timeout and $interval services take a Boolean argument, invokeApply. If set to false, the lapsed $timeout/$interval does not call $scope.$apply or trigger a digest cycle. Therefore, if you are going to perform background operations that do not require $scope and the view to be updated, set the last argument to false. Always use Angular wrappers over standard JavaScript objects/functions such as $timeout and $interval to avoid manually calling $scope.$apply. These wrapper functions internally call $scope.$apply. Also, understand the difference between $scope.$apply and $scope.$digest. $scope.$apply triggers $rootScope.$digest, which evaluates all application watches, whereas $scope.$digest only performs dirty checks on the current scope and its children. 
If we are sure that the model changes are not going to affect anything other than the child scopes, we can use $scope.$digest instead of $scope.$apply. Lazy-loading, minification, and creating multiple SPAs I hope you are not assuming that the apps that we have built will continue to use the numerous small script files that we have created to separate modules and module artefacts (controllers, directives, filters, and services). Any modern build system has the capability to concatenate and minify these files and replace the original file reference with a unified and minified version. Therefore, like any JavaScript library, use minified script files for production. The problem with the Angular bootstrapping process is that it expects all Angular application scripts to be loaded before the application can bootstrap. We cannot load modules, controllers, or in fact, any of the other Angular constructs on demand. This means we need to provide every artefact required by our app, upfront. For small applications, this is not a problem as the content is concatenated and minified; also, the Angular application code itself is far more compact as compared to the traditional JavaScript of jQuery-based apps. But, as the size of the application starts to grow, it may start to hurt when we need to load everything upfront. There are at least two possible solutions to this problem; the first one is about breaking our application into multiple SPAs. Breaking applications into multiple SPAs This advice may seem counterintuitive as the whole point of SPAs is to get rid of full page loads. By creating multiple SPAs, we break the app into multiple small SPAs, each supporting parts of the overall app functionality. When we say app, it implies a combination of the main (such as index.html) page with ng-app and all the scripts/libraries and partial views that the app loads over time. 
For example, we can break the Personal Trainer application into a Workout Builder app and a Workout Runner app. Both have their own start up page and scripts. Common scripts such as the Angular framework scripts and any third-party libraries can be referenced in both the applications. On similar lines, common controllers, directives, services, and filters too can be referenced in both the apps. The way we have designed Personal Trainer makes it easy to achieve our objective. The segregation into what belongs where has already been done. The advantage of breaking an app into multiple SPAs is that only relevant scripts related to the app are loaded. For a small app, this may be an overkill but for large apps, it can improve the app performance. The challenge with this approach is to identify what parts of an application can be created as independent SPAs; it totally depends upon the usage pattern of the application. For example, assume an application has an admin module and an end consumer/user module. Creating two SPAs, one for admin and the other for the end customer, is a great way to keep user-specific features and admin-specific features separate. A standard user may never transition to the admin section/area, whereas an admin user can still work on both areas; but transitioning from the admin area to a user-specific area will require a full page refresh. If breaking the application into multiple SPAs is not possible, the other option is to perform the lazy loading of a module. Lazy-loading modules Lazy-loading modules or loading module on demand is a viable option for large Angular apps. But unfortunately, Angular itself does not have any in-built support for lazy-loading modules. Furthermore, the additional complexity of lazy loading may be unwarranted as Angular produces far less code as compared to other JavaScript framework implementations. Also once we gzip and minify the code, the amount of code that is transferred over the wire is minimal. 
If we still want to try our hands on lazy loading, there are two libraries that can help: ocLazyLoad (https://github.com/ocombe/ocLazyLoad): This is a library that uses script.js to load modules on the fly angularAMD (http://marcoslin.github.io/angularAMD): This is a library that uses require.js to lazy load modules With lazy loading in place, we can delay the loading of a controller, directive, filter, or service script, until the page that requires them is loaded. The overall concept of lazy loading seems to be great but I'm still not sold on this idea. Before we adopt a lazy-load solution, there are things that we need to evaluate: Loading multiple script files lazily: When scripts are concatenated and minified, we load the complete app at once. Contrast it to lazy loading where we do not concatenate but load them on demand. What we gain in terms of lazy-load module flexibility we lose in terms of performance. We now have to make a number of network requests to load individual files. Given these facts, the ideal approach is to combine lazy loading with concatenation and minification. In this approach, we identify those feature modules that can be concatenated and minified together and served on demand using lazy loading. For example, Personal Trainer scripts can be divided into three categories: The common app modules: This consists of any script that has common code used across the app and can be combined together and loaded upfront The Workout Runner module(s): Scripts that support workout execution can be concatenated and minified together but are loaded only when the Workout Runner pages are loaded. The Workout Builder module(s): On similar lines to the preceding categories, scripts that support workout building can be combined together and served only when the Workout Builder pages are loaded. As we can see, there is a decent amount of effort required to refactor the app in a manner that makes module segregation, concatenation, and lazy loading possible. 
The effect on unit and integration testing: We also need to evaluate the effect of lazy-loading modules in unit and integration testing. The way we test is also affected with lazy loading in place. This implies that, if lazy loading is added as an afterthought, the test setup may require tweaking to make sure existing tests still run. Given these facts, we should evaluate our options and check whether we really need lazy loading or we can manage by breaking a monolithic SPA into multiple smaller SPAs. Caching remote data wherever appropriate Caching data is the one of the oldest tricks to improve any webpage/application performance. Analyze your GET requests and determine what data can be cached. Once such data is identified, it can be cached from a number of locations. Data cached outside the app can be cached in: Servers: The server can cache repeated GET requests to resources that do not change very often. This whole process is transparent to the client and the implementation depends on the server stack used. Browsers: In this case, the browser caches the response. Browser caching depends upon the server sending HTTP cache headers such as ETag and cache-control to guide the browser about how long a particular resource can be cached. Browsers can honor these cache headers and cache data appropriately for future use. If server and browser caching is not available or if we also want to incorporate any amount of caching in the client app, we do have some choices: Cache data in memory: A simple Angular service can cache the HTTP response in the memory. Since Angular is SPA, the data is not lost unless the page refreshes. 
This is how a service function looks when it caches data:

var workouts;
service.getWorkouts = function () {
   if (workouts) return $q.resolve(workouts);
   return $http.get("/workouts").then(function (response) {
       workouts = response.data;
       return workouts;
   });
};

The implementation caches the list of workouts in the workouts variable for future use. The first request makes an HTTP call to retrieve the data, but subsequent requests just return the cached data as a resolved promise. The use of $q.resolve makes sure that the function always returns a promise. Angular $http cache: Angular's $http service comes with a configuration option, cache. When set to true, $http caches the response of the particular GET request in a local cache (again, an in-memory cache). Here is how we cache a GET request:

$http.get(url, { cache: true});

Angular maintains this cache for the lifetime of the app, and clearing it is not easy. We need to get hold of the cache dedicated to caching HTTP responses and clear the cache key manually. The caching strategy of an application is never complete without a cache invalidation strategy. With caching, there is always a possibility that caches go out of sync with respect to the actual data store. We cannot affect the server-side caching behavior from the client; consequently, let's focus on how to perform cache invalidation (clearing) for the two client-side caching mechanisms described earlier. If we use the first approach to cache data, we are responsible for clearing the cache ourselves. In the case of the second approach, the default $http service does not support clearing the cache. 
We either need to get hold of the underlying $http cache store and clear the cache key manually (as shown here) or implement our own cache that manages cache data and invalidates cache entries based on some criteria:

var cache = $cacheFactory.get('$http');
cache.remove("http://myserver/workouts"); //full url

Using Batarang to measure performance Batarang (a Chrome extension), as we have already seen, is an extremely handy tool for Angular applications. Using Batarang to visualize app usage is like looking at an X-ray of the app. It allows us to: View the scope data, scope hierarchy, and how the scopes are linked to HTML elements Evaluate the performance of the application Check the application dependency graph, helping us understand how components are linked to each other and to other framework components If we enable Batarang and then play around with our application, Batarang captures performance metrics for all watched expressions in the app. This data is nicely presented as a graph available on the Performance tab inside Batarang. That is pretty sweet! When building an app, use Batarang to gauge the most expensive watches and take corrective measures, if required. Play around with Batarang and see what other features it has. This is a very handy tool for Angular applications. This brings us to the end of the performance guidelines that we wanted to share in this article. Some of these guidelines are preventive measures that we should take to make sure we get optimal app performance, whereas others are there to help when the performance is not up to the mark. Summary In this article, we looked at the ever-so-important topic of performance, where you learned ways to optimize an Angular app's performance. Resources for Article: Further resources on this subject: Role of AngularJS [article] The First Step [article] Recursive directives [article]
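The memoized-promise caching pattern used by the Angular service earlier is framework-agnostic. Here is a minimal sketch of the same idea using plain JavaScript promises instead of Angular's $q and $http; createWorkoutCache, fetchWorkouts, and the invalidate() helper are illustrative names (not from the chapter), and invalidate() shows how the "clear the cache ourselves" responsibility of the first approach can be implemented:

```javascript
// Sketch of the in-memory caching pattern, framework-free (names are illustrative).
// fetchWorkouts is a stand-in for the real HTTP call that returns a promise.
function createWorkoutCache(fetchWorkouts) {
  var cached = null; // holds the last successful result

  return {
    getWorkouts: function () {
      if (cached) return Promise.resolve(cached); // cache hit: no network call
      return fetchWorkouts().then(function (data) {
        cached = data; // remember the response for subsequent calls
        return data;
      });
    },
    invalidate: function () {
      cached = null; // next getWorkouts() call hits the network again
    }
  };
}
```

Just like the $q.resolve version, getWorkouts always returns a promise, so callers do not need to know whether the data came from the cache or from the network; calling invalidate() is the cache-clearing step that the default $http cache makes awkward.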

Packt
09 Mar 2017
14 min read

Replication Solutions in PostgreSQL

In this article by Chitij Chauhan and Dinesh Kumar, the authors of the book PostgreSQL High Performance Cookbook, we will talk about various high availability and replication solutions, including a popular third-party replication tool, Slony. (For more resources related to this topic, see here.) Setting up hot streaming replication In this recipe we are going to set up master/slave streaming replication. Getting ready For this exercise you will need two Linux machines, each with the latest version of PostgreSQL 9.6 installed. We will be using the following IP addresses for the master and slave servers: Master IP address: 192.168.0.4 Slave IP address: 192.168.0.5 How to do it… The following is the sequence of steps for setting up master/slave streaming replication: Set up passwordless authentication between the master and slave for the postgres user. First we are going to create a user ID on the master which will be used by the slave server to connect to the PostgreSQL database on the master server:

psql -c "CREATE USER repuser REPLICATION LOGIN ENCRYPTED PASSWORD 'charlie';"

Next, we allow the replication user created in the previous step to access the master PostgreSQL server. This is done by making the necessary changes in the pg_hba.conf file:

vi pg_hba.conf
host replication repuser 192.168.0.5/32 md5

In the next step we are going to configure parameters in the postgresql.conf file. 
These parameters are required to be set in order to get streaming replication working:

vi /var/lib/pgsql/9.6/data/postgresql.conf
listen_addresses = '*'
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 8
archive_mode = on
archive_command = 'cp %p /var/lib/pgsql/archive/%f && scp %p [email protected]:/var/lib/pgsql/archive/%f'

Once the parameter changes have been made in the postgresql.conf file, the next step is to restart the PostgreSQL server on the master so that the changes made in the previous step take effect:

pg_ctl -D /var/lib/pgsql/9.6/data restart

Before the slave can replicate from the master, we need to give it the initial database to build from. For this purpose we make a base backup by copying the primary server's data directory to the standby:

psql -U postgres -h 192.168.0.4 -c "SELECT pg_start_backup('label', true)"
rsync -a /var/lib/pgsql/9.6/data/ 192.168.0.5:/var/lib/pgsql/9.6/data/ --exclude postmaster.pid
psql -U postgres -h 192.168.0.4 -c "SELECT pg_stop_backup()"

Once the data directory in the previous step is populated, the next step is to configure the following parameter in the postgresql.conf file on the slave server:

hot_standby = on

Next, copy recovery.conf.sample into the $PGDATA location on the slave server and configure the following parameters:

cp /usr/pgsql-9.6/share/recovery.conf.sample /var/lib/pgsql/9.6/data/recovery.conf
standby_mode = on
primary_conninfo = 'host=192.168.0.4 port=5432 user=repuser password=charlie'
trigger_file = '/tmp/trigger.replication'
restore_command = 'cp /var/lib/pgsql/archive/%f "%p"'

Next, start the slave server:

service postgresql-9.6 start

Now that the preceding replication steps are set up, we will test for replication. 
On the master server, log in and issue the following SQL commands:

psql -h 192.168.0.4 -d postgres -U postgres -W
postgres=# create database test;
postgres=# \c test
test=# create table testtable ( testint int, testchar varchar(40) );
CREATE TABLE
test=# insert into testtable values ( 1, 'What A Sight.' );
INSERT 0 1

On the slave server we will now check whether the database and table created in the previous step are replicated:

psql -h 192.168.0.5 -d test -U postgres -W
test=# select * from testtable;
 testint | testchar
---------+----------------
       1 | What A Sight.
(1 row)

The wal_keep_segments parameter determines how many WAL files should be retained in the master's pg_xlog in case of network delays. However, if you do not want to assume a value for this, you can create a replication slot, which makes sure the master does not remove WAL files from pg_xlog until they have been received by standbys. For more information refer to: https://www.postgresql.org/docs/9.6/static/warm-standby.html#STREAMING-REPLICATION-SLOTS. How it works… The following is an explanation of the steps performed in the preceding section: In the initial step of the preceding section we create a user called repuser, which will be used by the slave server to make a connection to the primary server. In step 2 of the preceding section we make the necessary changes in the pg_hba.conf file to allow the master server to be accessed by the slave server using the user ID repuser that was created in the previous step. We then make the necessary parameter changes on the master in step 4 of the preceding section for configuring streaming replication. A description of these parameters is given here: listen_addresses: This parameter is used to provide the IP addresses that you want PostgreSQL to listen on. A value of * indicates all available IP addresses. wal_level: This parameter determines the level of WAL logging done. Specify hot_standby for streaming replication. 
wal_keep_segments: This parameter specifies the number of 16 MB WAL files to retain in the pg_xlog directory. The rule of thumb is that more such files may be required to handle a large checkpoint. archive_mode: Setting this parameter enables completed WAL segments to be sent to archive storage. archive_command: This parameter is basically a shell command that is executed whenever a WAL segment is completed. In our case, we copy the file to the local machine and then use the secure copy command to send it across to the slave. max_wal_senders: This parameter specifies the total number of concurrent connections allowed from the slave servers. Once the necessary configuration changes have been made on the master server, we restart the PostgreSQL server on the master so that the new configuration changes take effect. This is done in step 5 of the preceding section. In step 6 of the preceding section, we build the slave by copying the primary's data directory to the slave. With the data directory now available on the slave, the next step is to configure it. We make the necessary replication-related parameter changes in the postgresql.conf file on the slave server. We set the following parameter on the slave: hot_standby: This parameter determines whether we can connect and run queries while the server is in archive recovery or standby mode. In the next step we configure the recovery.conf file. This needs to be set up so that the slave can start receiving logs from the master. The following parameters are configured in the recovery.conf file on the slave: standby_mode: This parameter, when enabled, causes PostgreSQL to work as a standby in a replication configuration. primary_conninfo: This parameter specifies the connection information used by the slave to connect to the master. 
For our scenario, the master server is set as 192.168.0.4 on port 5432, and we use the user ID repuser with password charlie to make a connection to the master. Remember that repuser was the user ID created in the initial step of the preceding section for exactly this purpose, that is, connecting to the master from the slave. trigger_file: When the slave is configured as a standby, it will continue to restore the XLOG records from the master. The trigger_file parameter specifies what is used to trigger the slave to switch over from its standby duties and take over as the master, or primary, server. At this stage the slave is fully configured; we then start the slave server and the replication process begins. In steps 10 and 11 of the preceding section we simply test our replication. We first create a database test, then log into the test database, create a table named testtable, and insert some records into it. Our purpose is to see whether these changes are replicated to the slave. To test this, we log into the slave on the test database and query the records from the testtable table, as seen in step 10 of the preceding section. The final result is that all the records changed/inserted on the primary are visible on the slave. This completes our streaming replication setup and configuration. Replication using Slony Here in this recipe we are going to set up replication using Slony, which is a widely used replication engine. It replicates a desired set of tables' data from one database to another. This replication approach is based on a few event triggers created on the source set of tables, which log the DML and DDL statements into Slony catalog tables. Using Slony, we can also set up cascading replication among multiple nodes. Getting ready The steps followed in this recipe are carried out on a CentOS version 6 machine. 
We would first need to install Slony. The following are the steps needed to install Slony: First go to the given web link and download the software: http://slony.info/downloads/2.2/source/. Once you have downloaded the software, the next step is to unzip the tarball and then go to the newly created directory:

tar xvfj slony1-2.2.3.tar.bz2
cd slony1-2.2.3

In the next step we are going to configure, compile, and build the software:

./configure --with-pgconfigdir=/usr/pgsql-9.6/bin/
make
make install

How to do it… The following is the sequence of steps required to replicate data between two tables using Slony replication: First start the PostgreSQL server if it is not already started:

pg_ctl -D $PGDATA start

In the next step we will create two databases, test1 and test2, which will be used as the source and target databases:

createdb test1
createdb test2

In the next step we will create the table t_test on the source database test1 and insert some records into it:

psql -d test1
test1=# create table t_test (id numeric primary key, name varchar);
test1=# insert into t_test values(1,'A'),(2,'B'),(3,'C');

We will now set up the target database by copying the table definitions from the source database test1:

pg_dump -s -p 5432 -h localhost test1 | psql -h localhost -p 5432 test2

We will now connect to the target database test2 and verify that there is no data in the tables of the test2 database:

psql -d test2
test2=# select * from t_test;

We will now set up a slonik script for the master/slave, that is, source/target setup:

vi init_master.slonik
#!/bin/slonik
cluster name = mycluster;
node 1 admin conninfo = 'dbname=test1 host=localhost port=5432 user=postgres password=postgres';
node 2 admin conninfo = 'dbname=test2 host=localhost port=5432 user=postgres password=postgres';
init cluster ( id=1);
create set (id=1, origin=1);
set add table(set id=1, origin=1, id=1, fully qualified name = 'public.t_test');
store node (id=2, event node = 1);
store path (server=1, client=2, conninfo='dbname=test1 host=localhost port=5432 user=postgres password=postgres');
store path (server=2, client=1, conninfo='dbname=test2 host=localhost port=5432 user=postgres password=postgres');
store listen (origin=1, provider = 1, receiver = 2);
store listen (origin=2, provider = 2, receiver = 1);

We will now create a slonik script for the subscription on the slave, that is, the target:

vi init_slave.slonik
#!/bin/slonik
cluster name = mycluster;
node 1 admin conninfo = 'dbname=test1 host=localhost port=5432 user=postgres password=postgres';
node 2 admin conninfo = 'dbname=test2 host=localhost port=5432 user=postgres password=postgres';
subscribe set ( id = 1, provider = 1, receiver = 2, forward = no);

We will now run the init_master.slonik script created in step 6 on the master:

cd /usr/pgsql-9.6/bin
slonik init_master.slonik

We will now run the init_slave.slonik script created in step 7 on the slave, that is, the target:

cd /usr/pgsql-9.6/bin
slonik init_slave.slonik

In the next step we will start the master slon daemon:

nohup slon mycluster "dbname=test1 host=localhost port=5432 user=postgres password=postgres" &

In the next step we will start the slave slon daemon:

nohup slon mycluster "dbname=test2 host=localhost port=5432 user=postgres password=postgres" &

In the next step we will connect to the master, that is, the source database test1, and insert some records into the t_test table:

psql -d test1
test1=# insert into t_test values (5,'E');

We will now test for replication by logging in to the slave, that is, the target
database test2, and see whether the records inserted into the t_test table in the previous step are visible:

psql -d test2
test2=# select * from t_test;
 id | name
----+------
  1 | A
  2 | B
  3 | C
  5 | E
(4 rows)

How it works… We will now discuss the steps followed in the preceding section: In step 1, we first start the PostgreSQL server if it is not already started. In step 2 we create two databases, namely test1 and test2, that will serve as our source (master) and target (slave) databases. In step 3 of the preceding section we log into the source database test1, create a table t_test, and insert some records into the table. In step 4 of the preceding section we set up the target database test2 by copying the table definitions present in the source database and loading them into the target database test2 using the pg_dump utility. In step 5 of the preceding section we log into the target database test2 and verify that there are no records present in the table t_test, because in step 4 we only extracted the table definitions from the test1 database into the test2 database. In step 6 we set up a slonik script for the master/slave replication setup. In the file init_master.slonik we first define the cluster name as mycluster. We then define the nodes in the cluster. Each node has a number associated with a connection string which contains database connection information. The node entry is defined for both the source and target databases. The store path commands are necessary so that each node knows how to communicate with the other. In step 7 we set up a slonik script for the subscription of the slave, that is, the target database test2. Once again the script contains information such as the cluster name and node entries, which are assigned a unique number along with connection string information. It also contains a subscriber set. In step 8 of the preceding section we run init_master.slonik on the master. Similarly, in step 9 we run init_slave.slonik on the slave. 
In step 10 of the preceding section we start the master slon daemon. In step 11 of the preceding section we start the slave slon daemon. Steps 12 and 13 of the preceding section are used to test the replication. For this purpose, in step 12 we first log into the source database test1 and insert some records into the t_test table. To check whether the newly inserted records have been replicated to the target database test2, we log into the test2 database in step 13; the result set obtained from the query confirms that the changed/inserted records on the t_test table in the test1 database are successfully replicated to the target database test2. You may refer to the given link for more information regarding Slony replication: http://slony.info/documentation/tutorial.html. Summary We have seen how to set up streaming replication, and then we looked at how to install and replicate using a popular third-party replication tool, Slony. Resources for Article: Further resources on this subject: Introducing PostgreSQL 9 [article] PostgreSQL Cookbook - High Availability and Replication [article] PostgreSQL in Action [article]
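The streaming replication recipe above mentions replication slots as an alternative to guessing a value for wal_keep_segments. The following is a hedged sketch of how that might look on PostgreSQL 9.6; the slot name standby_slot is illustrative, not from the recipe, and the queries assume the master/slave setup described earlier is in place:

```sql
-- On the master: create a physical replication slot (slot name is illustrative).
-- The master will then retain WAL in pg_xlog until this slot's consumer has received it.
SELECT * FROM pg_create_physical_replication_slot('standby_slot');

-- On the slave, reference the slot in recovery.conf alongside the
-- parameters configured in the recipe:
--   primary_slot_name = 'standby_slot'

-- On the master: monitor connected standbys and their WAL positions
-- (in 9.6 these columns are *_location; from version 10 they are *_lsn).
SELECT client_addr, state, sent_location, replay_location
FROM pg_stat_replication;
```

Comparing sent_location with replay_location for the standby's row gives a quick view of replication lag; if the slave row disappears from pg_stat_replication, the standby connection has dropped and, with a slot in place, WAL will accumulate on the master until it reconnects.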