
How-To Tutorials


Exploring Deep Learning Architectures [Tutorial]

Melisha Dsouza
25 Nov 2018
11 min read
This tutorial will focus on some of the important architectures present in deep learning today. Much of the success of neural networks lies in the careful design of the network architecture. We will look at the architecture of autoencoder neural networks, variational autoencoders, CNNs, and RNNs. This tutorial is an excerpt from the book Hands-On Transfer Learning with Python, written by Dipanjan Sarkar, Raghav Bali, et al. The book focuses extensively on deep learning (DL) and transfer learning, comparing and contrasting the two with easy-to-follow concepts and examples.

Autoencoder neural networks

Autoencoders are typically used for reducing the dimensionality of data in neural networks. They are also successfully used for anomaly detection and novelty detection problems. Autoencoders fall under the unsupervised learning category: the network is trained by minimizing the difference between its input and its output. A typical autoencoder architecture is a slight variant of the DNN architecture, in which the number of units per hidden layer is progressively reduced up to a certain point and then progressively increased, with the final layer's dimension equal to the input dimension. The key idea is to introduce a bottleneck in the network and force it to learn a meaningful, compact representation. The middle layer of hidden units (the bottleneck) is the dimension-reduced encoding of the input. The first half of the hidden layers is called the encoder, and the second half the decoder. The following depicts a simple autoencoder architecture, where the layer named z is the representation layer:

Source: cloud4scieng.org

Variational autoencoders

Variational autoencoders (VAEs) are generative models. Compared to other deep generative models, VAEs are computationally tractable and stable, and can be estimated with the efficient backpropagation algorithm.
VAEs are inspired by the idea of variational inference in Bayesian analysis. The idea of variational inference is as follows: given an input distribution x, the posterior probability distribution over the output y is often too complicated to work with. So, we approximate that complicated posterior, p(y | x), with a simpler distribution, q(y). Here, q is chosen from a family of distributions, Q, that best approximates the posterior. This technique is used, for example, in training latent Dirichlet allocation (LDA) models (Bayesian generative models used for topic modeling of text).

Given a dataset, X, a VAE can generate new samples that are similar, but not necessarily equal, to those in X. The dataset X has N Independent and Identically Distributed (IID) samples of some continuous or discrete random variable, x. Let's assume that the data is generated by some random process involving an unobserved continuous random variable, z. Whereas in the simple autoencoder the variable z was deterministic, here z is a stochastic variable. Data generation is a two-step process:

A value of z is generated from a prior distribution, pθ(z)
A value of x is generated from the conditional distribution, pθ(x|z)

So, p(x) is basically the marginal probability, calculated as:

pθ(x) = ∫ pθ(z) pθ(x|z) dz

The parameters of the distribution, θ, and the latent variable, z, are both unknown. Here, x can be generated by taking samples from the marginal, p(x). Backpropagation cannot handle a stochastic variable, z, or a stochastic layer, z, within the network. Assuming the prior distribution, p(z), is Gaussian, we can leverage the location-scale property of the Gaussian distribution and rewrite the stochastic layer as z = μ + σε, where μ is the location parameter, σ is the scale, and ε is white noise. We can then draw multiple samples of the noise, ε, and feed them as a deterministic input to the neural network.
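The reparameterization step just described can be illustrated numerically. The following sketch is not from the book; the μ and σ values are made up for illustration, and NumPy is assumed to be available:

```python
import numpy as np

rng = np.random.default_rng(0)

# Suppose the encoder produced these parameters for a 4-dimensional
# latent variable z (the values are made up for illustration).
mu = np.array([0.5, -1.0, 0.0, 2.0])     # location (mean)
sigma = np.array([0.1, 0.2, 1.0, 0.5])   # scale (standard deviation)

# Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I).
# The randomness enters only through eps, which is fed to the network
# as a deterministic input, so gradients can flow through mu and sigma.
eps = rng.standard_normal(size=mu.shape)
z = mu + sigma * eps

# Sanity check: averaging many samples of z recovers mu, since E[eps] = 0.
many_eps = rng.standard_normal(size=(100_000, 4))
z_mean = (mu + sigma * many_eps).mean(axis=0)
print(np.abs(z_mean - mu).max() < 0.02)  # True
```

During training, each sampled ε is just another input value, which is what makes the whole network deterministic from the optimizer's point of view.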
Then, the model becomes an end-to-end deterministic deep neural network, as shown here:

Here, the decoder part is the same as in the simple autoencoder that we looked at earlier.

Types of CNN architectures

CNNs are multilayered neural networks designed specifically for identifying shape patterns, with a high degree of invariance to translation, scaling, and rotation, in two-dimensional image data. These networks need to be trained in a supervised way; typically, a labeled set of object classes, such as MNIST or ImageNet, is provided as a training set. The crux of any CNN model is the convolution layer and the subsampling/pooling layer.

LeNet architecture

This is a pioneering seven-level convolutional network, designed by Yann LeCun and his co-authors in 1998, that was used for digit classification. Later, it was applied by several banks to recognize handwritten numbers on cheques. The lower layers of the network are composed of alternating convolution and max pooling layers. The upper layers are fully connected, dense MLPs (formed of hidden layers and logistic regression). The input to the first fully connected layer is the set of all the feature maps of the previous layer:

AlexNet

In 2012, AlexNet significantly outperformed all the prior competitors and won the ILSVRC, reducing the top-5 error to 15.3%, compared to 26% for the runner-up. This work popularized the application of CNNs in computer vision. AlexNet has a very similar architecture to that of LeNet, but it has more filters per layer and is deeper. AlexNet also introduced the use of stacked convolutions, instead of always alternating convolution and pooling. A stack of small convolutions is better than one convolution layer with a large receptive field, as the stack introduces more non-linearities with fewer parameters.

ZFNet

The ILSVRC 2013 winner was a CNN from Matthew Zeiler and Rob Fergus, which became known as ZFNet.
ZFNet improved on AlexNet by tweaking the architecture hyperparameters: in particular, it expanded the size of the middle convolutional layers and made the stride and filter size of the first layer smaller, going from 11 x 11 with stride 4 in AlexNet to 7 x 7 with stride 2 in ZFNet. The intuition was that a smaller filter size in the first convolution layer helps retain more of the original pixel information. Also, AlexNet was trained on 15 million images, while ZFNet was trained on only 1.3 million images:

GoogLeNet (Inception network)

The ILSVRC 2014 winner was a convolutional network called GoogLeNet, from Google. It achieved a top-5 error rate of 6.67%, very close to human-level performance. The runner-up was the network from Karen Simonyan and Andrew Zisserman, known as VGGNet. GoogLeNet introduced a new architectural component called the Inception layer. The intuition behind the Inception layer is to use larger convolutions, but also keep a fine resolution for smaller details in the images. The following diagram describes the full GoogLeNet architecture:

Visual Geometry Group

Researchers from the Oxford Visual Geometry Group, or VGG for short, developed the VGG network, which is characterized by its simplicity, using only 3 x 3 convolutional layers stacked on top of each other in increasing depth. Reducing volume size is handled by max pooling. At the end, two fully connected layers, each with 4,096 nodes, are followed by a softmax layer. The only preprocessing done to the input is the subtraction of the mean RGB value, computed on the training set, from each pixel. Pooling is carried out by max pooling layers, which follow some of the convolution layers; not all the convolution layers are followed by max pooling. Max pooling is performed over a 2 x 2 pixel window, with a stride of 2. ReLU activation is used in each of the hidden layers. The number of filters increases with depth in most VGG variants.
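To see why stacks of 3 x 3 convolutions are attractive, it helps to compare receptive fields and parameter counts. The short calculation below is illustrative only: it ignores biases and assumes the same channel count, C, at the input and output of every layer:

```python
def conv_params(kernel, channels_in, channels_out):
    """Number of weights in one conv layer (biases ignored)."""
    return kernel * kernel * channels_in * channels_out

def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions.
    Each extra layer grows the receptive field by (kernel - 1)."""
    return 1 + layers * (kernel - 1)

C = 64  # channel count, arbitrary for illustration

# Two stacked 3x3 convs see a 5x5 region; three see 7x7.
print(stacked_receptive_field(3, 2))  # 5
print(stacked_receptive_field(3, 3))  # 7

# Parameters: three stacked 3x3 layers vs a single 7x7 layer.
stacked = 3 * conv_params(3, C, C)   # 27 * C^2
single = conv_params(7, C, C)        # 49 * C^2
print(stacked < single)  # True: the stack is cheaper and adds two extra non-linearities
```

So three 3 x 3 layers cover the same 7 x 7 receptive field as one large filter, with roughly half the parameters, which is exactly the design choice VGG exploits.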
The 16-layer architecture, VGG-16, is shown in the following diagram. The 19-layer architecture with uniform 3 x 3 convolutions (VGG-19) is shown alongside ResNet in the following section. The success of VGG models confirms the importance of depth in image representations:

VGG-16: Input RGB image of size 224 x 224 x 3; the number of filters in each layer is circled

Residual Neural Networks

The main idea in this architecture is as follows. Instead of hoping that a set of stacked layers will directly fit a desired underlying mapping, H(x), they are made to fit a residual mapping. More formally, the stacked set of layers learns the residual, R(x) = H(x) - x, and the true mapping is then obtained through a skip connection: the input is added to the learned residual, giving R(x) + x. Also, batch normalization is applied right after each convolution and before activation:

Here is the full ResNet architecture compared to VGG-19. The dotted skip connections indicate an increase in dimensions; for the addition to remain valid, the shortcut input is zero-padded (or projected) to match the new dimensions. Increases in dimensions are also indicated by changes in color:

Types of RNN architectures

A recurrent neural network (RNN) is specialized for processing a sequence of values, such as x(1), . . . , x(t). We need sequence modeling if, say, we want to predict the next term in a sequence given its recent history, or translate a sequence of words in one language into another language. RNNs are distinguished from feedforward networks by the presence of a feedback loop in their architecture. It is often said that RNNs have memory: the sequential information is preserved in the RNN's hidden state, so the hidden layer is the memory of the network. In theory, RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps.
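The feedback loop can be made concrete with a minimal vanilla RNN cell. This is a hypothetical sketch with tiny, randomly initialized weights, not code from the book; it only shows how the hidden state carries information from one time step to the next:

```python
import numpy as np

rng = np.random.default_rng(42)

# Tiny dimensions, purely for illustration.
input_size, hidden_size = 3, 4

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the feedback loop)
b_h = np.zeros(hidden_size)

def rnn_step(x, h_prev):
    """One time step: the new hidden state mixes the current input
    with the previous hidden state, which is the network's memory."""
    return np.tanh(W_xh @ x + W_hh @ h_prev + b_h)

# Run a short sequence x(1), ..., x(5); h summarizes everything seen so far.
h = np.zeros(hidden_size)
sequence = rng.normal(size=(5, input_size))
for x in sequence:
    h = rnn_step(x, h)

print(h.shape)  # (4,)
```

Because W_hh multiplies the hidden state at every step, gradients are repeatedly squashed through tanh and the weight matrix, which is the mechanical reason the network struggles to look back more than a few steps.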
LSTMs

RNNs start losing historical context over time in the sequence, and hence are hard to train for practical purposes. This is where long short-term memory (LSTM) networks come into the picture! Introduced by Hochreiter and Schmidhuber in 1997, LSTMs can remember information from really long sequence-based data and prevent issues such as the vanishing gradient problem. LSTMs usually consist of three or four gates, including input, output, and forget gates. The following diagram shows a high-level representation of a single LSTM cell:

The input gate can usually allow or deny incoming signals or inputs to alter the memory cell state. The output gate usually propagates the value to other neurons as necessary. The forget gate controls the memory cell's self-recurrent connection to remember or forget previous states as necessary. Multiple LSTM cells are usually stacked in a deep learning network to solve real-world problems, such as sequence prediction.

Stacked LSTMs

If we want to learn the hierarchical representation of sequential data, a stack of LSTM layers can be used. Each LSTM layer outputs a sequence of vectors, rather than a single vector, for each item of the input sequence, and this output sequence is used as the input to the subsequent LSTM layer. This hierarchy of hidden layers enables a more complex representation of our sequential data. Stacked LSTM models can be used for modeling complex multivariate time series data.

Encoder-decoder – Neural Machine Translation

Machine translation is a subfield of computational linguistics concerned with translating text or speech from one language to another. Traditional machine translation systems typically rely on sophisticated feature engineering based on the statistical properties of text. Recently, deep learning has been used to solve this problem, with an approach known as Neural Machine Translation (NMT). An NMT system typically consists of two modules: an encoder and a decoder.
It first reads the source sentence using the encoder to build a thought vector: a sequence of numbers that represents the sentence's meaning. The decoder then processes the sentence vector to emit a translation into the target language. This is called an encoder-decoder architecture. The encoder and decoder are typically forms of RNN. The following diagram shows an encoder-decoder architecture using stacked LSTMs:

Source: tensorflow.org

The source code for NMT in TensorFlow is available on GitHub.

Gated Recurrent Units

Gated Recurrent Units (GRUs) are related to LSTMs: both use different ways of gating information to prevent the vanishing gradient problem and store long-term memory. A GRU has two gates: a reset gate, r, and an update gate, z, as shown in the following diagram. The reset gate determines how to combine the new input with the previous hidden state, h(t-1), and the update gate defines how much of the previous state information to keep. If we set the reset gate to all ones and the update gate to all zeros, we arrive at a simple RNN model. GRUs are computationally more efficient than LSTMs because of their simpler structure and fewer parameters:

Summary

This article covered various advances in neural network architectures, including autoencoder neural networks, variational autoencoders, CNN architectures, and RNN architectures. To understand how to simplify deep learning by taking supervised, unsupervised, and reinforcement learning to the next level using the Python ecosystem, check out the book Hands-On Transfer Learning with Python.

Neural Style Transfer: Creating artificial art with deep learning and transfer learning
Dr. Brandon explains ‘Transfer Learning’ to Jon
5 cool ways Transfer Learning is being used today

How to perform regression analysis using SAS

Gebin George
27 Feb 2018
7 min read
This article is an excerpt from the book Big Data Analysis with SAS, written by David Pope. The book will help you leverage the power of SAS for data management, analysis, and reporting. It contains practical use cases and real-world examples of predictive modeling, forecasting, optimizing, and reporting on your big data analysis using SAS.

Today, we will perform regression analysis using SAS in a step-by-step manner with a practical use case. Regression analysis is one of the earliest predictive techniques most people learn, because it can be applied across a wide variety of problems dealing with data that is related in linear and non-linear ways. Linear data is one of the easier use cases, and as such, PROC REG is a well-known and often-used procedure for predicting likely outcomes before they happen. The REG procedure provides extensive capabilities for fitting linear regression models that involve individual numeric independent variables. Many other procedures can also fit regression models, but they focus on more specialized forms of regression, such as robust regression, generalized linear regression, nonlinear regression, nonparametric regression, quantile regression, regression modeling of survey data, regression modeling of survival data, and regression modeling of transformed variables. The SAS/STAT procedures that can fit regression models include the ADAPTIVEREG, CATMOD, GAM, GENMOD, GLIMMIX, GLM, GLMSELECT, LIFEREG, LOESS, LOGISTIC, MIXED, NLIN, NLMIXED, ORTHOREG, PHREG, PLS, PROBIT, QUANTREG, QUANTSELECT, REG, ROBUSTREG, RSREG, SURVEYLOGISTIC, SURVEYPHREG, SURVEYREG, TPSPLINE, and TRANSREG procedures. Several procedures in SAS/ETS software also fit regression models.
SAS/STAT 14.2 / SAS/STAT User's Guide - Introduction to Regression Procedures - Overview: Regression Procedures (http://documentation.sas.com/?cdcId=statcdc&cdcVersion=14.2&docsetId=statug&docsetTarget=statug_introreg_sect001.htm&locale=en&showBanner=yes)

Regression analysis attempts to model the relationship between a response, or output, variable and a set of input variables. The response is the target variable, the variable one is trying to predict, while the input variables make up the parameters fed into the algorithm; they are used to derive the predicted value for the response variable.

PROC REG

One of the easiest ways to determine whether regression analysis is applicable to a question is to check whether the question has only two answers. For example, should a bank lend an applicant money, yes or no? This is known as a binary response, and as such, regression analysis can be applied to help determine the answer. In the following example, the reader will use the SASHELP.BASEBALL dataset to create a regression model that predicts the value of a baseball player's salary. The SASHELP.BASEBALL dataset contains salary and performance information for Major League Baseball players who played at least one game in both the 1986 and 1987 seasons, excluding pitchers. The salaries (Sports Illustrated, April 20, 1987) are for the 1987 season, and the performance measures are from 1986 (Collier Books, The 1987 Baseball Encyclopedia Update).

SAS/STAT 14.2 / SAS/STAT User's Guide - Example 99: Modeling Salaries of Major League Baseball Players (http://documentation.sas.com/?cdcId=statcdc&cdcVersion=14.2&docsetId=statug&docsetTarget=statug_reg_examples01.htm&locale=en&showBanner=yes)
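The response-versus-inputs idea that PROC REG implements can also be sketched outside SAS. The following Python snippet is not from the book; it uses NumPy (assumed to be available) and made-up data, and fits a linear model by ordinary least squares, which is conceptually what the REG procedure does:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: the response y depends linearly on two inputs plus noise.
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted coefficients recover the true ones closely.
print(np.round(coef, 1))  # approximately [ 3.   2.  -1.5]
```

The inputs play the role of the model statement's regressors, and the fitted coefficients are what SAS reports in its parameter estimates table.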
Let's first use PROC UNIVARIATE to learn something about this baseball data by submitting the following code:

proc univariate data=sashelp.baseball;
quit;

While reviewing the output, the reader will notice that the variance associated with logSalary, 0.79066, is much less than the variance associated with the actual target variable, Salary, 203508. In this case, it makes better sense to attempt to predict the logSalary value of a player instead of Salary. Write the following code in a SAS Studio program section and submit it:

proc reg data=sashelp.baseball;
   id name team league;
   model logSalary = nAtBat nHits nHome nRuns nRBI YrMajor CrAtBat CrHits CrHome CrRuns CrRbi;
quit;

Notice, as specified in the first output table, that 59 observations have missing values in at least one of the input variables; those observations are not used in the development of the regression model. The Root Mean Squared Error (RMSE) and R-square are statistics that typically inform the analyst how good the model is at predicting the target. R-square ranges from 0 to 1.0, with higher values typically indicating a better model (for RMSE, lower is better). A higher R-square typically indicates a better-performing model, but sometimes the model over-fits the data used to train it and so doesn't represent its true predictive power. Over-fitting can happen when an analyst doesn't have enough real-life data and chooses data, or a sample of data, that over-represents the target event; the resulting model then performs poorly on real-world input. Since several of the input variables appear to have little predictive power on the target, an analyst may decide to drop them, thereby reducing the amount of information needed to make a decent prediction. In this case, it appears we only need four input variables: YrMajor, nHits, nRuns, and nAtBat.
Modify the code as follows and submit it again:

proc reg data=sashelp.baseball;
   id name team league;
   model logSalary = YrMajor nHits nRuns nAtBat;
quit;

The p-value associated with each of the input variables gives the analyst insight into which variables have the biggest impact on predicting the target variable; the smaller the p-value, the higher the predictive value of the input variable. Both the RMSE and R-square values for this second model are slightly lower than for the original. However, the adjusted R-square value is slightly higher. In this case, an analyst may choose to use the second model, since it requires much less data and provides basically the same predictive power. Prior to accepting any model, an analyst should determine whether a few observations may be over-influencing the results, by investigating the influence and fit diagnostics. The default output from PROC REG provides this type of visual insight. The top-right plot, showing the externally studentized residuals (RStudent) by leverage values, reveals a few observations with high leverage that may be overly influencing the fit. To investigate this further, we will add a plots statement to our PROC REG to produce a labeled version of this plot. Type the following code in a SAS Studio program section and submit it:

proc reg data=sashelp.baseball plots(only label)=(RStudentByLeverage);
   id name team league;
   model logSalary = YrMajor nHits nRuns nAtBat;
quit;

Sure enough, there are three to five individuals whose input variables may have excessive influence on the fit of this model. Let's remove those points and see if the model improves.
Type this code in a SAS Studio program section and submit it:

proc reg data=sashelp.baseball plots=(residuals(smooth));
   where name NOT IN ("Mattingly, Don", "Henderson, Rickey", "Boggs, Wade", "Davis, Eric", "Rose, Pete");
   id name team league;
   model logSalary = YrMajor nHits nRuns nAtBat;
quit;

This change, by itself, has not improved the model; it has actually made the model worse, as can be seen from the R-square of 0.5592. However, the plots=(residuals(smooth)) option gives some insight as it pertains to YrMajor: players at the beginning and at the end of their careers tend to be paid less than others, as can be seen in Figure 4.12. To address this lack of fit, an analyst can use a polynomial of degree two for the YrMajor variable. Type the following code in a SAS Studio program section and submit it:

data work.baseball;
   set sashelp.baseball;
   where name NOT IN ("Mattingly, Don", "Henderson, Rickey", "Boggs, Wade", "Davis, Eric", "Rose, Pete");
   YrMajor2 = YrMajor*YrMajor;
run;

proc reg data=work.baseball;
   id name team league;
   model logSalary = YrMajor YrMajor2 nHits nRuns nAtBat;
quit;

After removing some outliers and adjusting for the YrMajor variable, the model's predictive power has improved significantly, as can be seen from the much improved R-square value of 0.7149. We have seen an effective way of performing regression analysis using the SAS platform. If you found our post useful, do check out the book Big Data Analysis with SAS to understand other data analysis models and perform them practically using SAS.

Testing a UI Using WebDriverJS

Packt
17 Feb 2015
30 min read
In this article by Enrique Amodeo, author of the book Learning Behavior-driven Development with JavaScript, we will look into an advanced concept: how to test a user interface. For this purpose, you will learn about the following topics:

Using WebDriverJS to manipulate a browser and inspect the resulting HTML generated by our UI
Organizing our UI codebase to make it easily testable
The right abstraction level for our UI tests

Our strategy for UI testing

There are two traditional strategies for approaching the problem of UI testing: record-and-replay tools and end-to-end testing. The first approach, record-and-replay, leverages tools capable of recording user activity in the UI and saving it into a script file. This script file can later be executed to perform exactly the same UI manipulation the user performed, and to check whether the results are exactly the same. This approach is not very compatible with BDD, for the following reasons:

We cannot test-first our UI. To be able to use the UI and record the user activity, we first need to have most of the code of our application in place. This is not a problem in the waterfall approach, where QA and testing are performed after the coding phase is finished. However, in BDD we aim to document the product features as automated tests, so we should write the tests before or during the coding.

The resulting test scripts are low-level and totally disconnected from the problem domain. There is no way to use them as live documentation for the requirements of the system.

The resulting test suite is brittle: it will stop working whenever we make slight changes, even cosmetic ones, to the UI. The problem is that these tools record the low-level interaction with the system, which depends on technical details of the HTML.
The other classic approach is end-to-end testing, where we test not only the UI layer, but also most of the system, or even the whole of it. To set up the tests, the most common approach is to substitute test doubles for the third-party systems. Normally, the database is under the control of the development team, so some practitioners use a regular database for the setup; however, we could use an in-memory database, or even mock the DAOs. In any case, this approach prompts us to create an integrated test suite, in which we are testing not only the correctness of the UI, but the business logic as well.

In the context of this discussion, an integrated test is a test that checks several layers of abstraction, or subsystems, in combination. Do not confuse it with the act of testing several classes or functions together.

This approach is not inherently against BDD; for example, we could use Cucumber.js to capture the features of the system and implement the Gherkin steps using WebDriver to drive the UI and make assertions. In fact, for most people, when you say BDD, they interpret the term to refer to this kind of test. The problem is that we will end up writing a lot of test cases, because we need to combine the scenarios from the business logic domain with the ones from the UI domain. Furthermore, in which language should we formulate the tests? If we use the UI language, it may be too low-level to easily describe business concepts. If we use the business domain language, we may not be able to test the important details of the UI, because they are too low-level. Alternatively, we can end up with tests that mix UI language with business terminology, and are therefore neither focused nor very clear to anyone.

Choosing the right tests for the UI

If we want to test whether the UI works, why should we test the business rules? After all, these are already tested in the BDD test suite of the business logic layer.
To decide which tests to write, we should first determine the responsibilities of the UI layer, which are as follows:

Presenting the information provided by the business layer to the user in a nice way.
Transforming user interaction into requests for the business layer.
Controlling the changes in the appearance of the UI components, which includes things such as enabling/disabling controls, highlighting entry fields, and showing/hiding UI elements.
Orchestrating the UI components. Transferring and adapting information between the UI components, and navigation between pages, fall under this category.

We do not need to write tests about business rules, and we should not assume much about the business layer itself, apart from a loose contract. How should we word our tests? We should use a UI-related language when we talk about what the user sees and does. Words such as fields, buttons, forms, links, click, hover, highlight, enable/disable, or show and hide are relevant in this context. However, we should not go too far; otherwise, our tests will be too brittle. Saying, for example, that the name field should have a pink border is too low-level: the moment the designer decides to use red instead of pink, or changes their mind and changes the background color instead of the border, our test will break. We should aim for tests that express the real intention of the user interface; for example: the name field should be highlighted as incorrect.

The testing architecture

At this point, we could write tests relevant to our UI using the following testing architecture:

A simple testing architecture for our UI

We can use WebDriver to issue user gestures to interact with the browser. These user gestures are transformed by the browser into DOM events, which are the inputs of our UI logic and will trigger operations on it. We can then use WebDriver again to read the resulting HTML in the assertions.
We can simply use a test double to impersonate our server, so we can set up our tests easily. This architecture is very simple and sounds like a good plan, but it is not! There are three main problems here:

UI testing is very slow. Take into account that the boot and shutdown phases can take 3 seconds on a normal laptop. Each UI interaction using WebDriver can take between 50 and 100 milliseconds, and the latency with the fake server can add an extra 10 milliseconds. This gives us only around 10 tests per second, plus an extra 3 seconds of overhead.

UI tests are complex and difficult to diagnose when they fail. What is failing? The selectors we used to tell WebDriver how to find the relevant elements? Some race condition we were not aware of? A cross-browser issue? Also note that our test is now distributed between two different processes, a fact that always makes debugging more difficult.

UI tests are inherently brittle. We can try to make them less brittle with best practices, but even then, a change in the structure of the HTML code will sometimes break our tests. This is a bad thing, because the UI often changes more frequently than the business layer.

As UI testing is risky and expensive, we should try to write as few tests that interact with the UI as possible. We can achieve this, without losing testing power, with the following testing architecture:

A smarter testing architecture

We have now split our UI layer into two components: the view and the UI logic. This design aligns with the family of MV* design patterns. In the context of this article, the view corresponds to a passive view, and the UI logic corresponds to the controller or the presenter, in combination with the model. A passive view is usually very hard to test, so in this article we will focus mostly on how to do that. You will often be able to easily separate the passive view from the UI logic, especially if you are using an MV* pattern, such as MVC, MVP, or MVVM.
Most of our tests will be for the UI logic. This is the component that implements the client-side validation, the orchestration of UI components, navigation, and so on. It is the UI logic component that holds all the rules about how the user can interact with the UI, and hence it needs to maintain some kind of internal state. The UI logic component can be tested completely in memory, using standard techniques: we can simply mock the XMLHttpRequest object, or the corresponding object in the framework we are using, and test everything in memory using a single Node.js process. No interaction with the browser and the HTML is needed, so these tests will be blazingly fast and robust.

Then we need to test the view. This is a very thin component, with only two responsibilities:

Manipulating and updating the HTML to present information to the user, whenever it is instructed to do so by the UI logic component
Listening for HTML events and transforming them into suitable requests for the UI logic component

The view should not have more responsibilities; it is a stateless component. It does not need to store internal state, because it only transforms and transmits information between the HTML and the UI logic. Since it is the only component that interacts with the HTML, it is the only one that needs to be tested using WebDriver. The point of all this is that the view can be tested with only a handful of tests that are conceptually simple. Hence, we minimize the number and complexity of the tests that need to interact with the UI.

WebDriverJS

Testing the passive view layer is a technical challenge. We not only need to find a way for our test to inject native events into the browser to simulate user interaction, but we also need to be able to inspect the DOM elements and inject and execute scripts. This was very challenging to do approximately 5 years ago.
In fact, it was considered complex and expensive, and some practitioners recommended not to test the passive view. After all, this layer is very thin and mostly contains the bindings of the UI to the HTML DOM, so the risk of error is not supposed to be high, especially if we use modern cross-browser frameworks to implement this layer. Nonetheless, nowadays the technology has evolved, and we can do this kind of testing without much fuss if we use the right tools. One of these tools is Selenium 2.0 (also known as WebDriver) and its library for JavaScript, which is WebDriverJS (https://code.google.com/p/selenium/wiki/WebDriverJs). In this book, we will use WebDriverJS, but there are other bindings in JavaScript for Selenium 2.0, such as WebDriverIO (http://webdriver.io/). You can use the one you like most or even try both. The point is that the techniques I will show you here can be applied with any client of WebDriver or even with other tools that are not WebDriver. Selenium 2.0 is a tool that allows us to make direct calls to a browser automation API. This way, we can simulate native events, we can access the DOM, and we can control the browser. Each browser provides a different API and has its own quirks, but Selenium 2.0 will offer us a unified API called the WebDriver API. This allows us to interact with different browsers without changing the code of our tests. As we are accessing the browser directly, we do not need a special server, unless we want to control browsers that are on a different machine. Actually, this is only true, due to some technical limitations, if we want to test against a Google Chrome or a Firefox browser using WebDriverJS.
So, basically, the testing architecture for our passive view looks like this:

Testing with WebDriverJS

We can see that we use WebDriverJS for the following:

Sending native events to manipulate the UI, as if we were the user, during the action phase of our tests

Inspecting the HTML during the assert phase of our tests

Sending small scripts to set up the test doubles, check them, and invoke the update method of our passive view

Apart from this, we need some extra infrastructure, such as a web server that serves our test HTML page and the components we want to test. As is evident from the diagram, the commands of WebDriverJS require some network traffic to be able to send the appropriate request to the browser automation API, wait for the browser to execute, and get the result back through the network. This forces the API of WebDriverJS to be asynchronous in order not to block unnecessarily. That is why WebDriverJS has an API designed around promises. Most of the methods will return a promise or an object whose methods return promises. This plays perfectly well with Mocha and Chai. There is a W3C specification for the WebDriver API. If you want to have a look, just visit https://dvcs.w3.org/hg/webdriver/raw-file/default/webdriver-spec.html. The API of WebDriverJS is a bit complex, and you can find its official documentation at http://selenium.googlecode.com/git/docs/api/javascript/module_selenium-webdriver.html. However, to follow this article, you do not need to read it, since I will now show you the most important API that WebDriverJS offers us.

Finding and interacting with elements

It is very easy to find an HTML element using WebDriverJS; we just need to use either the findElement or the findElements methods. Both methods receive a locator object specifying which element or elements to find. The first method will return the first element it finds, or simply fail with an exception if there are no elements matching the locator.
The findElements method will return a promise for an array with all the matching elements. If there are no matching elements, the promised array will be empty and no error will be thrown. How do we specify which elements we want to find? To do so, we need to use a locator object as a parameter. For example, if we would like to find the element whose identifier is order_item1, then we could use the following code:

var By = require('selenium-webdriver').By;
driver.findElement(By.id('order_item1'));

We need to import the selenium-webdriver module and capture its locator factory object. By convention, we store this locator factory in a variable called By. Later, we will see how we can get a WebDriverJS instance. This code is very expressive, but a bit verbose. There is another version of this:

driver.findElement({ id: 'order_item1' });

Here, the locator criteria are passed in the form of a plain JSON object. There is no need to use the By object or any factory. Which version is better? Neither. You just use the one you like most. In this article, the plain JSON locator will be used. The following are the criteria for finding elements:

Using the tag name, for example, to locate all the <li> elements in the document:

driver.findElements(By.tagName('li'));
driver.findElements({ tagName: 'li' });

We can also locate using the name attribute. It can be handy to locate the input fields. The following code will locate the first element named password:

driver.findElement(By.name('password'));
driver.findElement({ name: 'password' });

Using the class name; for example, the following code will locate the first element that contains a class called item:

driver.findElement(By.className('item'));
driver.findElement({ className: 'item' });

We can use any CSS selector that our target browser understands.
If the target browser does not understand the selector, it will throw an exception; for example, to find the second item of an order (assuming there is only one order on the page):

driver.findElement(By.css('.order .item:nth-of-type(2)'));
driver.findElement({ css: '.order .item:nth-of-type(2)' });

You can locate any element using only CSS selectors, and that is the approach I recommend. The other criteria can be very handy in specific situations. There are more ways of locating elements, such as linkText, partialLinkText, or xpath, but I seldom use them. Locating elements by their text, such as in linkText or partialLinkText, is brittle because small changes in the wording of the text can break the tests. Also, locating by xpath is not as useful in HTML as using a CSS selector. Obviously, it can be used if the UI is defined as an XML document, but this is very rare nowadays. In both methods, findElement and findElements, the resulting HTML elements are wrapped as a WebElement object. This object allows us to send an event to that element or inspect its contents. Some of its methods that allow us to manipulate the DOM are as follows:

clear(): This will do nothing unless WebElement represents an input control. In this case, it will clear its value and then trigger a change event. It returns a promise that will be fulfilled whenever the operation is done.

sendKeys(text or key, …): This will do nothing unless WebElement is an input control. In this case, it will send the equivalents of keyboard events to the parameters we have passed. It can receive one or more parameters with a text or key object. If it receives a text, it will transform the text into a sequence of keyboard events. This way, it will simulate a user typing on a keyboard. This is more realistic than simply changing the value property of an input control, since the proper keyDown, keyPress, and keyUp events will be fired. A promise is returned that will be fulfilled when all the key events are issued.
For example, to simulate that a user enters some search text in an input field and then presses Enter, we can use the following code:

var Key = require('selenium-webdriver').Key;
var searchField = driver.findElement({name: 'searchTxt'});
searchField.sendKeys('BDD with JS', Key.ENTER);

The webdriver.Key object allows us to specify any key that does not represent a character, such as Enter, the up arrow, Command, Ctrl, Shift, and so on. We can also use its chord method to represent a combination of several keys pressed at the same time. For example, to simulate Alt + Command + J, use driver.sendKeys(Key.chord(Key.ALT, Key.COMMAND, 'J'));.

click(): This will issue a click event just in the center of the element. The returned promise will be fulfilled when the event is fired. Sometimes, the center of an element is nonclickable, and an exception is thrown! This can happen, for example, with table rows, since the center of a table row may just be the padding between cells!

submit(): This will look for the form that contains this element and will issue a submit event.

Apart from sending events to an element, we can inspect its contents with the following methods:

getId(): This will return a promise with the internal identifier of this element used by WebDriver. Note that this is not the value of the DOM ID property!

getText(): This will return a promise that will be fulfilled with the visible text inside this element. It will include the text in any child element and will trim the leading and trailing whitespaces. Note that, if this element is not displayed or is hidden, the resulting text will be an empty string!

getInnerHtml() and getOuterHtml(): These will return a promise that will be fulfilled with a string that contains innerHTML or outerHTML of this element.

isSelected(): This will return a promise with a Boolean that determines whether the element has either been selected or checked. This method is designed to be used with the <option> elements.
isEnabled(): This will return a promise with a Boolean that determines whether the element is enabled or not.

isDisplayed(): This will return a promise with a Boolean that determines whether the element is displayed or not. Here, "displayed" is taken in a broad sense; in general, it means that the user can see the element without resizing the browser. For example, if the element is hidden, has display: none, has no size, or is in an inaccessible part of the document, the returned promise will be fulfilled as false.

getTagName(): This will return a promise with the tag name of the element.

getSize(): This will return a promise with the size of the element. The size comes as a JSON object with width and height properties that indicate the height and width in pixels of the bounding box of the element. The bounding box includes padding, margin, and border.

getLocation(): This will return a promise with the position of the element. The position comes as a JSON object with x and y properties that indicate the coordinates in pixels of the element relative to the page.

getAttribute(name): This will return a promise with the value of the specified attribute. Note that WebDriver does not distinguish between attributes and properties! If there is neither an attribute nor a property with that name, the promise will be fulfilled as null. If the attribute is a "boolean" HTML attribute (such as checked or disabled), the promise will be evaluated as true only if the attribute is present. If there is both an attribute and a property with the same name, the attribute value will be used. If you really need to be precise about getting an attribute or a property, it is much better to use an injected script to get it.

getCssValue(cssPropertyName): This will return a promise with a string that represents the computed value of the specified CSS property.
The computed value is the resulting value after the browser has applied all the CSS rules and the style and class attributes. Note that the specific representation of the value depends on the browser; for example, the color property can be returned as red, #ff0000, or rgb(255, 0, 0) depending on the browser. This is not cross-browser, so we should avoid this method in our tests.

findElement(locator) and findElements(locator): These will return an element, or all the elements, that are descendants of this element and match the locator.

isElementPresent(locator): This will return a promise with a Boolean that indicates whether there is at least one descendant element that matches this locator.

As you can see, the WebElement API is pretty simple and allows us to do most of our tests easily. However, what if we need to perform some complex interaction with the UI, such as drag-and-drop?

Complex UI interaction

WebDriverJS allows us to define a complex action gesture in an easy way using the DSL defined in the webdriver.ActionSequence object. This DSL allows us to define any sequence of browser events using the builder pattern. For example, to simulate a drag-and-drop gesture, proceed with the following code:

var beverageElement = driver.findElement({ id: 'expresso' });
var orderElement = driver.findElement({ id: 'order' });
driver.actions()
    .mouseMove(beverageElement)
    .mouseDown()
    .mouseMove(orderElement)
    .mouseUp()
    .perform();

We want to drag an espresso to our order, so we move the mouse to the center of the espresso and press the mouse. Then, we move the mouse, by dragging the element, over the order. Finally, we release the mouse button to drop the espresso. We can add as many actions as we want, but the sequence of events will not be executed until we call the perform method. The perform method will return a promise that will be fulfilled when the full sequence is finished.
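The builder-plus-deferred-execution style of this DSL is easy to picture with a toy implementation: each builder method only records a step and returns this, and nothing happens until perform() is called. This is an illustrative sketch, not WebDriverJS internals:

```javascript
// A toy ActionSequence: builder methods queue steps; perform() runs them.
function ToyActionSequence() {
  this.steps = [];
}
ToyActionSequence.prototype.mouseMove = function (target) {
  this.steps.push('mouseMove:' + target);
  return this; // returning `this` is what makes the chaining possible
};
ToyActionSequence.prototype.mouseDown = function () {
  this.steps.push('mouseDown');
  return this;
};
ToyActionSequence.prototype.mouseUp = function () {
  this.steps.push('mouseUp');
  return this;
};
ToyActionSequence.prototype.perform = function () {
  // Only now is the recorded sequence "executed", in order.
  return Promise.resolve(this.steps.join(' -> '));
};

new ToyActionSequence()
  .mouseMove('expresso')
  .mouseDown()
  .mouseMove('order')
  .mouseUp()
  .perform()
  .then(function (trace) {
    console.log(trace);
    // mouseMove:expresso -> mouseDown -> mouseMove:order -> mouseUp
  });
```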
The webdriver.ActionSequence object has the following methods:

sendKeys(keys...): This sends a sequence of key events, exactly as we saw earlier for the method with the same name in WebElement. The difference is that the keys will be sent to the document instead of a specific element.

keyUp(key) and keyDown(key): These send the keyUp and keyDown events. Note that these methods only admit the modifier keys: Alt, Ctrl, Shift, command, and meta.

mouseMove(targetLocation, optionalOffset): This will move the mouse from the current location to the target location. The location can be defined either as a WebElement or as page-relative coordinates in pixels, using a JSON object with x and y properties. If we provide the target location as a WebElement, the mouse will be moved to the center of the element. In this case, we can override this behavior by supplying an extra optional parameter indicating an offset relative to the top-left corner of the element. This could be needed in the case that the center of the element cannot receive events.

mouseDown(), click(), doubleClick(), and mouseUp(): These will issue the corresponding mouse events. All of these methods can receive zero, one, or two parameters. Let's see what they mean with the following examples:

var Button = require('selenium-webdriver').Button;

// to emit the event in the center of the expresso element
driver.actions().mouseDown(expresso).perform();
// to make a right click in the current position
driver.actions().click(Button.RIGHT).perform();
// Middle click in the expresso element
driver.actions().click(expresso, Button.MIDDLE).perform();

The webdriver.Button object defines the three possible buttons of a mouse: LEFT, RIGHT, and MIDDLE. However, note that mouseDown() and mouseUp() only support the LEFT button!

dragAndDrop(element, location): This is a shortcut to performing a drag-and-drop of the specified element to the specified location.
Again, the location can be a WebElement or a page-relative coordinate.

Injecting scripts

We can use WebDriver to execute scripts in the browser and then wait for their results. There are two methods for this: executeScript and executeAsyncScript. Both methods receive a script and an optional list of parameters and send the script and the parameters to the browser to be executed. They return a promise that will be fulfilled with the result of the script; it will be rejected if the script failed. An important detail is how the script and its parameters are sent to the browser. For this, they need to be serialized and sent through the network. Once there, they will be deserialized, and the script will be executed inside an autoexecuted function that will receive the parameters as arguments. As a result of this, our scripts cannot access any variable in our tests, unless they are explicitly sent as parameters. The script is executed in the browser with the window object as its execution context (the value of this). When passing parameters, we need to take into consideration the kind of data that WebDriver can serialize. This data includes the following:

Booleans, strings, and numbers.

The null and undefined values. However, note that undefined will be translated as null.

Any function will be transformed to a string that contains only its body.

A WebElement object will be received as a DOM element. So, it will not have the methods of WebElement but the standard DOM methods instead. Conversely, if the script results in a DOM element, it will be received as a WebElement in the test.

Arrays and objects will be converted to arrays and objects whose elements and properties have been converted using the preceding rules.
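A rough way to see why our scripts cannot access test variables is to simulate the round trip ourselves: only the body of the function travels, as a string, and it is rebuilt on the other side with nothing but its arguments. This is a toy simulation, not the real wire protocol:

```javascript
// Toy simulation of executeScript: serialize the function body to a
// string, rebuild it elsewhere, and run it with only the given arguments.
function fakeExecuteScript(fn) {
  var args = Array.prototype.slice.call(arguments, 1);
  var src = fn.toString();
  var body = src.slice(src.indexOf('{') + 1, src.lastIndexOf('}'));
  var rebuilt = new Function(body); // "deserialized" on the other side
  return Promise.resolve(rebuilt.apply(null, args));
}

fakeExecuteScript(function () {
  // Variables from the test file did not travel with this body;
  // only the arguments pseudoarray is available here.
  return arguments[0] + arguments[1];
}, 2, 3).then(function (result) {
  console.log(result); // 5
});
```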
With this in mind, we could, for example, retrieve the identifier of an element, such as the following one:

var elementSelector = ".order ul > li";
driver.executeScript(
    "return document.querySelector(arguments[0]).id;",
    elementSelector
).then(function(id) {
  expect(id).to.be.equal('order_item0');
});

Notice that the script is specified as a string with the code. This can be a bit awkward, so there is an alternative available:

var elementSelector = ".order ul > li";
driver.executeScript(function() {
    var selector = arguments[0];
    return document.querySelector(selector).id;
}, elementSelector).then(function(id) {
  expect(id).to.be.equal('order_item0');
});

WebDriver will just convert the body of the function to a string and send it to the browser. Since the script is executed in the browser, we cannot access the elementSelector variable, and we need to access it through parameters. Unfortunately, we are forced to retrieve the parameters using the arguments pseudoarray, because WebDriver has no way of knowing the name of each argument. As its name suggests, executeAsyncScript allows us to execute an asynchronous script. In this case, the last argument provided to the script is always a callback that we need to call to signal that the script has finalized. The result of the script will be the first argument provided to that callback. If no argument or undefined is explicitly provided, then the result will be null. Note that this is not directly compatible with the Node.js callback convention and that any extra parameters passed to the callback will be ignored. There is no way to explicitly signal an error in an asynchronous way.
For example, if we want to return the value of an asynchronous DAO, then proceed with the following code:

driver.executeAsyncScript(function() {
  var cb = arguments[1],
      userId = arguments[0];
  window.userDAO.findById(userId).then(cb, cb);
}, 'user1').then(function(userOrError) {
  expect(userOrError).to.be.equal(expectedUser);
});

Command control flows

All the commands in WebDriverJS are asynchronous and return a promise or a WebElement. How do we execute an ordered sequence of commands? Well, using promises, it could look something like this:

return driver.findElement({name:'quantity'}).sendKeys('23')
    .then(function() {
      return driver.findElement({name:'add'}).click();
    })
    .then(function() {
      return driver.findElement({css:firstItemSel}).getText();
    })
    .then(function(quantity) {
      expect(quantity).to.be.equal('23');
    });

This works because we wait for each command to finish before issuing the next command. However, it is a bit verbose. Fortunately, with WebDriverJS we can do the following:

driver.findElement({name:'quantity'}).sendKeys('23');
driver.findElement({name:'add'}).click();
return expect(driver.findElement({css:firstItemSel}).getText())
    .to.eventually.be.equal('23');

How can the preceding code work? Because whenever we tell WebDriverJS to do something, it simply schedules the requested command in a queue-like structure called the control flow. The point is that each command will not be executed until it reaches the top of the queue. This way, we do not need to explicitly wait for the sendKeys command to be completed before executing the click command. The sendKeys command is scheduled in the control flow before click, so the latter one will not be executed until sendKeys is done. All the commands are scheduled against the same control flow queue that is associated with the WebDriver object.
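The queue-like behavior is simple to sketch: if every scheduled command is chained onto the tail of one internal promise, the calls return immediately but the commands still run strictly in order. This is, again, a simplification of the real control flow:

```javascript
// A toy control flow: commands are queued and executed one after another.
function ToyControlFlow() {
  this.queue = Promise.resolve();
}
ToyControlFlow.prototype.schedule = function (command) {
  // Chain onto the tail of the queue and return a promise for this command.
  this.queue = this.queue.then(command);
  return this.queue;
};

var flow = new ToyControlFlow();
var log = [];
flow.schedule(function () { log.push('sendKeys'); });
flow.schedule(function () { log.push('click'); });
flow.schedule(function () { return log.join(' -> '); }).then(function (trace) {
  console.log(trace); // sendKeys -> click
});
```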
However, we can optionally create several control flows if we want to execute commands in parallel:

var flow1 = webdriver.promise.createFlow(function() {
  var driver = new webdriver.Builder().build();
  // do something with driver here
});
var flow2 = webdriver.promise.createFlow(function() {
  var driver = new webdriver.Builder().build();
  // do something with driver here
});
webdriver.promise.fullyResolved([flow1, flow2]).then(function() {
  // Wait for flow1 and flow2 to finish and do something
});

We need to create each control flow instance manually and, inside each flow, create a separate WebDriver instance. The commands in both flows will be executed in parallel, and we can wait for both of them to be finalized to do something else using fullyResolved. In fact, we can even nest flows if needed to create a custom parallel command-execution graph.

Taking screenshots

Sometimes, it is useful to take some screenshots of the current screen for debugging purposes. This can be done with the takeScreenshot() method. This method will return a promise that will be fulfilled with a string that contains a base-64 encoded PNG. It is our responsibility to save this string as a PNG file. The following snippet of code will do the trick:

driver.takeScreenshot()
    .then(function(shot) {
      fs.writeFileSync(fileFullPath, shot, 'base64');
    });

Note that not all browsers support this capability. Read the documentation for the specific browser adapter to see if it is available.

Working with several tabs and frames

WebDriver allows us to control several tabs, or windows, for the same browser. This can be useful if we want to test several pages in parallel or if our test needs to assert or manipulate things in several frames at the same time. This can be done with the switchTo() method that will return a webdriver.WebDriver.TargetLocator object. This object allows us to change the target of our commands to a specific frame or window.
It has the following three main methods:

frame(nameOrIndex): This will switch to a frame with the specified name or index. It will return a promise that is fulfilled when the focus has been changed to the specified frame. If we specify the frame with a number, this will be interpreted as a zero-based index in the window.frames array.

window(windowName): This will switch focus to the window named as specified. The returned promise will be fulfilled when it is done.

alert(): This will switch the focus to the active alert window. We can dismiss an alert with driver.switchTo().alert().dismiss();.

The promise returned by these methods will be rejected if the specified window, frame, or alert window is not found. To run tests on several tabs at the same time, we must ensure that they do not share any kind of state, or interfere with each other through cookies, local storage, or any other kind of mechanism.

Summary

This article showed us that a good way to test the UI of an application is actually to split it into two parts and test them separately. One part is the core logic of the UI that takes responsibility for control logic, models, calls to the server, validations, and so on. This part can be tested in a classic way, using BDD, and mocking the server access. No new techniques are needed for this, and the tests will be fast. Here, we can involve nonengineer stakeholders, such as UX designers, users, and so on, to write some nice BDD features using Gherkin and Cucumber.js. The other part is a thin view layer that follows a passive view design. It only updates the HTML when it is asked to, and listens to DOM events to transform them into requests for the core logic UI layer. This layer has no internal state or control rules; it simply transforms data and manipulates the DOM. We can use WebDriverJS to test the view.
This is a good approach because the most complex part of the UI can easily be fully test-driven, while the view, which is hard and slow to test, needs only a few tests because it is so simple. In this sense, the passive view should not have a state; it should only act as a proxy of the DOM. Resources for Article: Further resources on this subject: Dart With Javascript [article] Behavior-Driven Development With Selenium WebDriver [article] Event-Driven Programming [article]

Packt
19 Feb 2016
8 min read

How is Python code organized

Python is an easy-to-learn yet powerful programming language. It has efficient high-level data structures and an effective approach to object-oriented programming. Let's talk a little bit about how Python code is organized. In this paragraph, we'll start going down the rabbit hole a little bit more and introduce some more technical names and concepts. Starting with the basics, how is Python code organized? Of course, you write your code into files. When you save a file with the extension .py, that file is said to be a Python module. If you're on Windows or Mac, which typically hide file extensions from the user, please make sure you change the configuration so that you can see the complete name of the files. This is not strictly a requirement, but a hearty suggestion. It would be impractical to save all the code that is required for software to work within one single file. That solution works for scripts, which are usually not longer than a few hundred lines (and often they are quite a bit shorter than that). A complete Python application can be made of hundreds of thousands of lines of code, so you will have to scatter it through different modules. Better, but not nearly good enough. It turns out that even like this it would still be impractical to work with the code. So Python gives you another structure, called a package, which allows you to group modules together. A package is nothing more than a folder, which must contain a special file, __init__.py, that doesn't need to hold any code but whose presence is required to tell Python that the folder is not just some folder, but it's actually a package (note that as of Python 3.3, __init__.py is not strictly required any more). As always, an example will make all of this much clearer.
I have created an example structure in my project, and when I type in my Linux console:

$ tree -v example

Here's what the structure of a really simple application could look like:

example/
├── core.py
├── run.py
└── util
    ├── __init__.py
    ├── db.py
    ├── math.py
    └── network.py

You can see that within the root of this example, we have two modules, core.py and run.py, and one package: util. Within core.py, there may be the core logic of our application. On the other hand, within the run.py module, we can probably find the logic to start the application. Within the util package, I expect to find various utility tools, and in fact, we can guess that the modules there are called by the type of tools they hold: db.py would hold tools to work with databases, math.py would of course hold mathematical tools (maybe our application deals with financial data), and network.py would probably hold tools to send/receive data on networks. As explained before, the __init__.py file is there just to tell Python that util is a package and not just a mere folder. Had this software been organized within modules only, it would have been much harder to infer its structure. I put a module-only example under the ch1/files_only folder, see it for yourself:

$ tree -v files_only

This shows us a completely different picture:

files_only/
├── core.py
├── db.py
├── math.py
├── network.py
└── run.py

It is a little harder to guess what each module does, right? Now, consider that this is just a simple example, so you can guess how much harder it would be to understand a real application if we couldn't organize the code in packages and modules.

How do we use modules and packages

When a developer is writing an application, it is very likely that they will need to apply the same piece of logic in different parts of it. For example, when writing a parser for the data that comes from a form that a user can fill in a web page, the application will have to validate whether a certain field is holding a number or not.
Regardless of how the logic for this kind of validation is written, it's very likely that it will be needed in more than one place. For example, in a poll application, where the user is asked many questions, it's likely that several of them will require a numeric answer. For example:

What is your age
How many pets do you own
How many children do you have
How many times have you been married

It would be very bad practice to copy-paste (or, more properly said: duplicate) the validation logic in every place where we expect a numeric answer. This would violate the DRY (Don't Repeat Yourself) principle, which states that you should never repeat the same piece of code more than once in your application. I feel the need to stress the importance of this principle: you should never repeat the same piece of code more than once in your application (got the irony?). There are several reasons why repeating the same piece of logic can be very bad, the most important ones being:

There could be a bug in the logic, and therefore, you would have to correct it in every place that logic is applied.

You may want to amend the way you carry out the validation, and again you would have to change it in every place it is applied.

You may forget to fix/amend a piece of logic because you missed it when searching for all its occurrences. This would leave wrong/inconsistent behavior in your application.

Your code would be longer than needed, for no good reason.

Python is a wonderful language and provides you with all the tools you need to apply all the coding best practices. For this particular example, we need to be able to reuse a piece of code. To be able to reuse a piece of code, we need to have a construct that will hold the code for us so that we can call that construct every time we need to repeat the logic inside it. That construct exists, and it's called a function.
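For the poll above, such a reusable construct could look like the following sketch (the function name and the rule that answers must be non-negative integers are our own choices for the example):

```python
def parse_non_negative_int(raw):
    """Validate one poll answer; written once, reused for every question."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return None
    return value if value >= 0 else None

# The same logic applied to all the numeric questions, with no duplication.
answers = {'age': '34', 'pets': 'two', 'children': '-1'}
validated = {field: parse_non_negative_int(raw) for field, raw in answers.items()}
print(validated)  # {'age': 34, 'pets': None, 'children': None}
```

If the validation rule ever changes, there is exactly one place to amend.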
I'm not going too deep into the specifics here, so please just remember that a function is a block of organized, reusable code which is used to perform a task. Functions can assume many forms and names, according to what kind of environment they belong to, but for now this is not important. Functions are the building blocks of modularity in your application, and they are almost indispensable (unless you're writing a super simple script, you'll use functions all the time). Python comes with a very extensive library, as I already said a few pages ago. Now, maybe it's a good time to define what a library is: a library is a collection of functions and objects that provide functionalities that enrich the abilities of a language. For example, within Python's math library we can find a plethora of functions, one of which is the factorial function, which of course calculates the factorial of a number. In mathematics, the factorial of a non-negative integer number N, denoted as N!, is defined as the product of all positive integers less than or equal to N. For example, the factorial of 5 is calculated as:

5! = 5 * 4 * 3 * 2 * 1 = 120

The factorial of 0 is 0! = 1, to respect the convention for an empty product. So, if you wanted to use this function in your code, all you would have to do is to import it and call it with the right input values. Don't worry too much if input values and the concept of calling is not very clear for now, please just concentrate on the import part. We use a library by importing what we need from it, and then we use it. In Python, to calculate the factorial of number 5, we just need the following code:

>>> from math import factorial
>>> factorial(5)
120

Whatever we type in the shell, if it has a printable representation, will be printed on the console for us (in this case, the result of the function call: 120). So, let's go back to our example, the one with core.py, run.py, util, and so on. In our example, the package util is our utility library.
Our custom utility belt that holds all those reusable tools (that is, functions) which we need in our application. Some of them will deal with databases (db.py), some with the network (network.py), and some will perform mathematical calculations (math.py) that are outside the scope of Python's standard math library and therefore had to be coded by ourselves.

Summary

In this article, we started to explore the world of programming and that of Python. We saw how Python code can be organized using modules and packages.

For more information on Python, refer to the following books recommended by Packt Publishing:

Learning Python (https://www.packtpub.com/application-development/learning-python)
Python 3 Object-oriented Programming - Second Edition (https://www.packtpub.com/application-development/python-3-object-oriented-programming-second-edition)
Python Essentials (https://www.packtpub.com/application-development/python-essentials)

Further resources on this subject:

Test all the things with Python [article]
Putting the Fun in Functional Python [article]
Scraping the Web with Python - Quick Start [article]
How to combine data files within IBM SPSS Modeler

Amey Varangaonkar
22 Feb 2018
6 min read
The following extract is taken from the book IBM SPSS Modeler Essentials, written by Keith McCormick and Jesus Salcedo. SPSS Modeler is one of the popularly used enterprise tools for data mining and predictive analytics.

In this article, we will explore how SPSS Modeler can be effectively used to combine different file types for efficient data modeling. In many organizations, different pieces of information for the same individuals are held in separate locations. To be able to analyze such information within Modeler, the data files must be combined into one single file. The Merge node joins two or more data sources so that information held for an individual in different locations can be analyzed collectively. The following diagram shows how the Merge node can be used to combine two separate data files that contain different types of information:

Like the Append node, the Merge node is found in the Record Ops palette. This node takes multiple data sources and creates a single source containing all or some of the input fields. Let's go through an example of how to use the Merge node to combine data files:

1. Open the Merge stream. The Merge stream contains the files we previously appended, as well as the main data file we were working with in earlier chapters.
2. Place a Merge node from the Record Ops palette on the canvas.
3. Connect the last Reclassify node to the Merge node.
4. Connect the Filter node to the Merge node.

Like the Append node, the order in which data sources are connected to the Merge node impacts the order in which the sources are displayed. The fields of the first source connected to the Merge node will appear first, followed by the fields of the second source connected to the Merge node, and so on.

5. Connect the Merge node to a Table node.
6. Edit the Merge node.
Since the Merge node can cope with a variety of different situations, the Merge tab allows you to specify the merging method. There are four methods for merging:

Order: Joins the first record in the first dataset with the first record in the second dataset, and so on. If any of the datasets run out of records, no further output records are produced. This method can be dangerous if any cases happen to be missing from a file, or if files have been sorted differently.
Keys: The most commonly used method, in which records that have the same value in the field(s) defined as the key are merged. If multiple records contain the same value on the key field, all possible merges are returned.
Condition: Joins records from files that meet a specified condition.
Ranked condition: Specifies whether each row pairing in the primary dataset and all secondary datasets is to be merged; use the ranking expression to sort any multiple matches from low to high order.

Let's combine these files. To do this:

1. Set Merge Method to Keys. Fields contained in all input sources appear in the Possible keys list. To identify one or more fields as the key field(s), move the selected field into the Keys for merge list. In our case, there are two fields that appear in both files, ID and Year.
2. Select ID in the Possible keys list and place it into the Keys for merge list.

There are five major methods of merging using a key field:

Include only matching records (inner join) merges only complete records, that is, records that are available in all datasets.
Include matching and non-matching records (full outer join) merges records that appear in any of the datasets; that is, the incomplete records are still retained. The undefined value ($null$) is added to the missing fields and included in the output.
Include matching and selected non-matching records (partial outer join) performs left and right outer joins.
All records from the specified file are retained, along with only those records from the other file(s) that match records in the specified file on the key field(s). The Select... button allows you to designate which file is to contribute incomplete records. Include records in first dataset not matching any others (anti-join) provides an easy way of identifying records in a dataset that do not have records with the same key values in any of the other datasets involved in the merge. This option only retains records from the dataset that match with no other records. Combine duplicate key fields is the final option in this dialog, and it deals with the problem of duplicate field names (one from each dataset) when key fields are used. This option ensures that there is only one output field with a given name, and this is enabled by default. The Filter tab The Filter tab lists the data sources involved in the merge, and the ordering of the sources determines the field ordering of the merged data. Here, you can rename and remove fields. Earlier, we saw that the field Year appeared in both datasets; here we can remove one version of this field (we could also rename one version of the field to keep both): Click on the arrow next to the second Year field: The second Year field will no longer appear in the combined data file. The Optimization tab The Optimization tab provides two options that allow you to merge data more efficiently when one input dataset is significantly larger than the other datasets, or when the data is already presorted by all or some of the key fields that you are using to merge: Click OK. Run the Table: All of these files have now been combined. The resulting table should have 44 fields and 143,531 records. We saw how the Merge node is used to join data files that contain different information for the same records. 
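The key-based merge methods above map directly onto SQL joins. As a hedged illustration (the table and column names below are invented, and this is plain SQL rather than anything SPSS Modeler generates), here is how an inner join and a partial outer (left) join on an ID key behave:

```python
import sqlite3

# Two toy "data files" sharing an ID key, as in the Merge node example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (id INTEGER, age INTEGER)")
conn.execute("CREATE TABLE purchases (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO survey VALUES (?, ?)", [(1, 34), (2, 51), (3, 28)])
conn.executemany("INSERT INTO purchases VALUES (?, ?)", [(1, 9.99), (3, 24.5)])

# Inner join: only records present in both sources survive.
inner = conn.execute(
    "SELECT s.id, s.age, p.amount FROM survey s "
    "JOIN purchases p ON s.id = p.id ORDER BY s.id").fetchall()
print(inner)  # [(1, 34, 9.99), (3, 28, 24.5)]

# Left (partial outer) join: all survey records are kept; missing
# purchase fields come back as NULL, like Modeler's $null$.
left = conn.execute(
    "SELECT s.id, s.age, p.amount FROM survey s "
    "LEFT JOIN purchases p ON s.id = p.id ORDER BY s.id").fetchall()
print(left)  # [(1, 34, 9.99), (2, 51, None), (3, 28, 24.5)]
```

An anti-join (records in the first dataset that match no others) is the same left join with `WHERE p.id IS NULL` added.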
If you found this post useful, make sure to check out IBM SPSS Modeler Essentials for more information on leveraging SPSS Modeler to get faster and efficient insights from your data.  
Quantum computing, edge analytics, and meta learning: key trends in data science and big data in 2019

Richard Gall
18 Dec 2018
11 min read
When historians study contemporary notions of data in the early 21st century, 2018 might well be a landmark year. In many ways this was the year when Big and Important Issues - from the personal to the political - began to surface. The techlash, a term which has defined the year, arguably emerged from conversations and debates about the uses and abuses of data. But while cynicism casts a shadow on the brightly lit data science landscape, there's still a lot of optimism out there. And more importantly, data isn't going to drop off the agenda any time soon. However, the changing conversation in 2018 does mean that the way data scientists, analysts, and engineers use data and build solutions for it will change. A renewed emphasis on ethics and security is now appearing, which will likely shape 2019 trends. But what will these trends be? Let's take a look at some of the most important areas to keep an eye on in the new year.

Meta learning and automated machine learning

One of the key themes of data science and artificial intelligence in 2019 will be doing more with less. There are a number of ways in which this will manifest itself. The first is meta learning. This is a concept that aims to improve the way that machine learning systems actually work by running machine learning on machine learning systems. Essentially this allows a machine learning algorithm to learn how to learn. By doing this, you can better decide which algorithm is most appropriate for a given problem. Find out how to put meta learning into practice. Learn with Hands On Meta Learning with Python. Automated machine learning is closely aligned with meta learning. One way of understanding it is to see it as automating the application of meta learning. So, if meta learning can help better determine which machine learning algorithms should be applied and how they should be designed, automated machine learning makes that process a little smoother.
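The core of automated model selection can be sketched in a few lines of plain Python. This is a toy illustration under invented names, not how AutoML or auto-sklearn work internally: we simply fit several candidate models and keep whichever scores best on held-out data.

```python
# Toy "automated machine learning": pick the candidate predictor with
# the lowest validation error. Real systems also tune hyperparameters
# and design whole pipelines; this only shows the selection loop.
def mean_model(train):
    avg = sum(y for _, y in train) / len(train)
    return lambda x: avg

def linear_model(train):
    # Least-squares fit of y = a*x + b on the training pairs.
    n = len(train)
    sx = sum(x for x, _ in train); sy = sum(y for _, y in train)
    sxx = sum(x * x for x, _ in train); sxy = sum(x * y for x, y in train)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return lambda x: a * x + b

def select_model(candidates, train, valid):
    fitted = {name: fit(train) for name, fit in candidates.items()}
    def error(name):
        return sum((fitted[name](x) - y) ** 2 for x, y in valid)
    return min(fitted, key=error)

train = [(1, 2), (2, 4), (3, 6)]
valid = [(4, 8), (5, 10)]
best = select_model({"mean": mean_model, "linear": linear_model}, train, valid)
print(best)  # linear
```

Tools in the automated machine learning space wrap a loop like this, plus hyperparameter search, behind a single fit call.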
It builds the decision making into the machine learning solution. Fundamentally, it's all about "algorithm selection, hyper-parameter tuning, iterative modelling, and model assessment," as Matthew Mayo explains on KDNuggets.

Automated machine learning tools

What's particularly exciting about automated machine learning is that there are already a number of tools that make it relatively easy to do. AutoML is a set of tools developed by Google that can be used on the Google Cloud Platform, while auto-sklearn, built around the scikit-learn library, provides a similar out-of-the-box solution for automated machine learning. Although both AutoML and auto-sklearn are very new, there are newer tools available that could dominate the landscape: AutoKeras and AdaNet. AutoKeras is built on Keras (the Python neural network library), while AdaNet is built on TensorFlow. Both could be more affordable open source alternatives to AutoML. Which automated machine learning library gains the most popularity remains to be seen, but one thing is certain: it makes deep learning accessible to many organizations who previously wouldn't have had the resources or inclination to hire a team of PhD computer scientists. But it's important to remember that automated machine learning certainly doesn't mean automated data science. While tools like AutoML will help many organizations build deep learning models for basic tasks, for organizations that need a more developed data strategy, the role of the data scientist will remain vital. You can't, after all, automate away strategy and decision making. Learn automated machine learning with these titles: Hands-On Automated Machine Learning, TensorFlow 1.x Deep Learning Cookbook.

Quantum computing

Quantum computing, even as a concept, feels almost fantastical. It's not just cutting-edge, it's mind-bending. But in real-world terms it also continues the theme of doing more with less.
Explaining quantum computing can be tricky, but the fundamentals are this: instead of a binary system (the foundation of computing as we currently know it), which can be either 0 or 1, in a quantum system you have qubits, which can be 0, 1 or both simultaneously. (If you want to learn more, read this article.)

What Quantum computing means for developers

So, what does this mean in practice? Essentially, because the qubits in a quantum system can be multiple things at the same time, you are then able to run much more complex computations. Think about the difference in scale: running a deep learning system on a binary system has clear limits. Yes, you can scale up in processing power, but you're nevertheless constrained by the foundational fact of zeros and ones. In a quantum system where that restriction no longer exists, the scale of the computing power at your disposal increases astronomically. Once you understand the fundamental proposition, it becomes much easier to see why the likes of IBM and Google are clamouring to develop and deploy quantum technology. One of the most talked about use cases is using quantum computers to find even larger prime numbers (a move which carries risks, given prime numbers are the basis for much modern encryption). But there are other applications, such as in chemistry, where complex subatomic interactions are too detailed to be modelled by a traditional computer. It's important to note that quantum computing is still very much in its infancy. While Google and IBM are leading the way, they are really only researching the area. It certainly hasn't been deployed or applied in any significant or sustained way. But this isn't to say that it should be ignored. It's going to have a huge impact on the future, and more importantly it's plain interesting.
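The "0, 1 or both" description can be made concrete with a little complex-number arithmetic. This is a toy single-qubit sketch, not a real quantum SDK - just the standard amplitude representation:

```python
import math

# A single qubit is a pair of complex amplitudes (a, b) for |0> and |1>;
# |a|^2 and |b|^2 are the probabilities of measuring 0 and 1.
zero = (1 + 0j, 0 + 0j)  # definitely 0

def hadamard(state):
    """Apply the Hadamard gate, which creates an equal superposition."""
    a, b = state
    s = 1 / math.sqrt(2)
    return (s * (a + b), s * (a - b))

plus = hadamard(zero)  # now "0 and 1 at the same time"
probs = [round(abs(amp) ** 2, 3) for amp in plus]
print(probs)  # [0.5, 0.5] - equal chance of measuring 0 or 1

# Applying Hadamard twice returns to |0>: the operation is reversible.
back = hadamard(plus)
print(round(abs(back[0]) ** 2, 3))  # 1.0
```

Two qubits need four amplitudes, three need eight, and so on - this exponential growth in state is where the astronomical scale described above comes from, and why classical simulation runs out of road so quickly.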
Even if you don't think you'll be getting to grips with quantum systems at work for some time (a decade at best), understanding the principles and how it works in practice will not only give you a solid foundation for major changes in the future, it will also help you better understand some of the existing challenges in scientific computing. And, of course, it will also make you a decent conversationalist at dinner parties.

Who's driving Quantum computing forward?

If you want to get started, Microsoft has put together the Quantum Development Kit, which includes the first quantum-specific programming language, Q#. IBM, meanwhile, has developed its own Quantum Experience, which allows engineers and researchers to run quantum computations in the IBM cloud. As you investigate these tools you'll probably get the sense that no one's quite sure what to do with these technologies. And that's fine - if anything it makes it the perfect time to get involved and help further research and thinking on the topic. Get a head start in the Quantum Computing revolution. Pre-order Mastering Quantum Computing with IBM QX.

Edge analytics and digital twins

While Quantum lingers on the horizon, the concept of the edge has quietly planted itself at the very center of the IoT revolution. IoT might still be the term that business leaders and, indeed, wider society are talking about, but for technologists and engineers, none of its advantages would be possible without the edge. Edge computing or edge analytics is essentially about processing data at the edge of a network rather than within a centralized data warehouse. Again, as you can begin to see, the concept of the edge allows you to do more with less. More speed, less bandwidth (as devices no longer need to communicate with data centers), and, in theory, more data. In the context of IoT, where just about every object in existence could be a source of data, moving processing and analytics to the edge can only be a good thing.
Will the edge replace the cloud?

There's a lot of conversation about whether edge will replace cloud. It won't. But it probably will replace the cloud as the place where we run artificial intelligence. For example, instead of running powerful analytics models in a centralized space, you can run them at different points across the network. This will dramatically improve speed and performance, particularly for those applications that run on artificial intelligence.

A more distributed world

Think of it this way: just as software has become more distributed in the last few years, thanks to the emergence of the edge, data itself is going to be more distributed. We'll have billions of pockets of activity, whether from consumers or industrial machines, each a locus of data generation. Find out how to put the principles of edge analytics into practice: Azure IoT Development Cookbook.

Digital twins

An emerging part of the edge computing and analytics trend is the concept of digital twins. This is, admittedly, still something in its infancy, but in 2019 it's likely that you'll be hearing a lot more about digital twins. A digital twin is a digital replica of a device that engineers and software architects can monitor, model and test. For example, if you have a digital twin of a machine, you could run tests on it to better understand its points of failure. You could also investigate ways you could make the machine more efficient. More importantly, a digital twin can be used to help engineers manage the relationship between centralized cloud and systems at the edge - the digital twin is essentially a layer of abstraction that allows you to better understand what's happening at the edge without needing to go into the detail of the system. For those of us working in data science, digital twins provide better clarity and visibility on how disconnected aspects of a network interact.
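The bandwidth argument is easy to make concrete. In this toy sketch (the readings and thresholds are invented for illustration), an edge node reduces a batch of raw sensor readings to a tiny summary and ships only that upstream:

```python
# Toy edge analytics: summarize raw readings locally and send only the
# aggregate to the cloud, instead of every data point.
def edge_summarize(readings):
    """Reduce a batch of raw readings to a small summary record."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": sum(readings) / len(readings),
    }

raw = [21.0, 21.4, 22.1, 35.9, 21.2]  # one device's local batch
summary = edge_summarize(raw)
print(summary["count"], summary["max"])  # 5 35.9

# The cloud receives four numbers instead of the whole stream, and
# simple anomaly checks (the 35.9 spike) can run at the edge too.
print(summary["max"] - summary["mean"] > 10)  # True
```

A digital twin would sit on the cloud side of a pipeline like this, consuming such summaries to mirror the device's state without ever seeing the raw stream.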
If we’re going to make 2019 the year we use data more intelligently - maybe even more humanely - then this is precisely the sort of thing we need. Interpretability, explainability, and ethics Doing more with less might be one of the ongoing themes in data science and big data in 2019, but we can’t ignore the fact that ethics and security will remain firmly on the agenda. Although it’s easy to dismiss these issues issues as separate from the technical aspects of data mining, processing, and analytics, but it is, in fact, deeply integrated into it. One of the key facets of ethics are two related concepts: explainability and interpretability. The two terms are often used interchangeably, but there are some subtle differences. Explainability is the extent to which the inner-working of an algorithm can be explained in human terms, while interpretability is the extent to which one can understand the way in which it is working (eg. predict the outcome in a given situation). So, an algorithm can be interpretable, but you might not quite be able to explain why something is happening. (Think about this in the context of scientific research: sometimes, scientists know that a thing is definitely happening, but they can’t provide a clear explanation for why it is.) Improving transparency and accountability Either way, interpretability and explainability are important because they can help to improve transparency in machine learning and deep learning algorithms. In a world where deep learning algorithms are being applied to problems in areas from medicine to justice - where the problem of accountability is particularly fraught - this transparency isn’t an option, it’s essential. In practice, this means engineers must tweak the algorithm development process to make it easier for those outside the process to understand why certain things are happening and why they aren't. 
To a certain extent, this ultimately requires the data science world to take the scientific method more seriously than it has done. Rather than just aiming for accuracy (which is itself often open to contestation), the aim is to constantly manage the gap between what we're trying to achieve with an algorithm and how it goes about actually doing that. You can learn the basics of building explainable machine learning models in the Getting Started with Machine Learning in Python video.

Transparency and innovation must go hand in hand in 2019

So, there are two fundamental things for data science in 2019: improving efficiency, and improving transparency. Although the two concepts might look like they conflict with each other, it's actually a bit of a false dichotomy. If we had realised that 12 months ago, we might have avoided many of the issues that have come to light this year. Transparency has to be a core consideration for anyone developing systems for analyzing and processing data. Without it, the work you're doing might be flawed or unnecessary. You're only going to need to add further iterations to rectify your mistakes or modify the impact of your biases. With this in mind, now is the time to learn the lessons of 2018's techlash. We need to commit to stopping the miserable conveyor belt of scandal and failure. Now is the time to find new ways to build better artificial intelligence systems.
Testing Your Application with cljs.test

Packt
11 May 2016
13 min read
In this article written by David Jarvis, Rafik Naccache, and Allen Rohner, authors of the book Learning ClojureScript, we'll take a look at how to configure our ClojureScript application or library for testing. As usual, we'll start by creating a new project for us to play around with:

$ lein new figwheel testing

(For more resources related to this topic, see here.)

We'll be playing around in a test directory. Most JVM Clojure projects will have one already, but since the default Figwheel template doesn't include a test directory, let's make one first (following the same convention used with source directories, that is, instead of src/$PROJECT_NAME we'll create test/$PROJECT_NAME):

$ mkdir -p test/testing

We'll now want to make sure that Figwheel knows that it has to watch the test directory for file modifications. To do that, we will edit the dev build in our project.clj's :cljsbuild map so that its :source-paths vector includes both src and test. Your new dev build configuration should look like the following:

{:id "dev"
 :source-paths ["src" "test"]
 ;; If no code is to be run, set :figwheel true for continued automagical reloading
 :figwheel {:on-jsload "testing.core/on-js-reload"}
 :compiler {:main testing.core
            :asset-path "js/compiled/out"
            :output-to "resources/public/js/compiled/testing.js"
            :output-dir "resources/public/js/compiled/out"
            :source-map-timestamp true}}

Next, we'll get the old Figwheel REPL going so that we can have our ever familiar hot reloading:

$ cd testing
$ rlwrap lein figwheel

Don't forget to navigate a browser window to http://localhost:3449/ to get the browser REPL to connect. Now, let's create a new core_test.cljs file in the test/testing directory. By convention, most libraries and applications in Clojure and ClojureScript have test files that correspond to source files with the suffix _test. In this project, this means that test/testing/core_test.cljs is intended to contain the tests for src/testing/core.cljs.
Let's get started by just running tests on a single file. Inside core_test.cljs, let's add the following code:

(ns testing.core-test
  (:require [cljs.test :refer-macros [deftest is]]))

(deftest i-should-fail
  (is (= 1 0)))

(deftest i-should-succeed
  (is (= 1 1)))

This code first requires two of the most important cljs.test macros, and then gives us two simple examples of what a failed test and a successful test should look like. At this point, we can run our tests from the Figwheel REPL:

cljs.user=> (require 'testing.core-test)
;; => nil
cljs.user=> (cljs.test/run-tests 'testing.core-test)

Testing testing.core-test

FAIL in (i-should-fail) (cljs/test.js?zx=icyx7aqatbda:430:14)
expected: (= 1 0)
  actual: (not (= 1 0))

Ran 2 tests containing 2 assertions.
1 failures, 0 errors.
;; => nil

At this point, what we've got is tolerable, but it's not really practical in terms of being able to test a larger application. We don't want to have to test our application in the REPL and pass in our test namespaces one by one. The current idiomatic solution for this in ClojureScript is to write a separate test runner that is responsible for importing and then running all of your tests. Let's take a look at what this looks like. Let's start by creating another test namespace. Let's call this one app_test.cljs, and we'll put the following in it:

(ns testing.app-test
  (:require [cljs.test :refer-macros [deftest is]]))

(deftest another-successful-test
  (is (= 4 (count "test"))))

We will not do anything remarkable here; it's just another test namespace with a single test that should pass by itself. Let's quickly make sure that's the case at the REPL:

cljs.user=> (require 'testing.app-test)
nil
cljs.user=> (cljs.test/run-tests 'testing.app-test)

Testing testing.app-test

Ran 1 tests containing 1 assertions.
0 failures, 0 errors.
;; => nil

Perfect. Now, let's write a test runner.
Let's open a new file that we'll simply call test_runner.cljs, and let's include the following:

(ns testing.test-runner
  (:require [cljs.test :refer-macros [run-tests]]
            [testing.app-test]
            [testing.core-test]))

;; This isn't strictly necessary, but is a good idea depending
;; upon your application's ultimate runtime engine.
(enable-console-print!)

(defn run-all-tests []
  (run-tests 'testing.app-test
             'testing.core-test))

Again, nothing surprising. We're just making a single function for us that runs all of our tests. This is handy for us at the REPL:

cljs.user=> (testing.test-runner/run-all-tests)

Testing testing.app-test

Testing testing.core-test

FAIL in (i-should-fail) (cljs/test.js?zx=icyx7aqatbda:430:14)
expected: (= 1 0)
  actual: (not (= 1 0))

Ran 3 tests containing 3 assertions.
1 failures, 0 errors.
;; => nil

Ultimately, however, we want something we can run at the command line so that we can use it in a continuous integration environment. There are a number of ways we can go about configuring this directly, but if we're clever, we can let someone else do the heavy lifting for us. Enter doo, the handy ClojureScript testing plugin for Leiningen.

Using doo for easier testing configuration

doo is a library and Leiningen plugin for running cljs.test in many different JavaScript environments. It makes it easy to test your ClojureScript regardless of whether you're writing for the browser or for the server, and it also includes file watching capabilities such as Figwheel's so that you can automatically rerun tests on file changes. The doo project page can be found at https://github.com/bensu/doo. To configure our project to use doo, first we need to add it to the list of plugins in our project.clj file. Modify the :plugins key so that it looks like the following:

:plugins [[lein-figwheel "0.5.2"]
          [lein-doo "0.1.6"]
          [lein-cljsbuild "1.1.3" :exclusions [[org.clojure/clojure]]]]

Next, we will add a new cljsbuild build configuration for our test runner.
Add the following build map after the dev build map we've been working with until now:

{:id "test"
 :source-paths ["src" "test"]
 :compiler {:main testing.test-runner
            :output-to "resources/public/js/compiled/testing_test.js"
            :optimizations :none}}

This configuration tells Cljsbuild to use both our src and test directories, just like our dev profile. It adds some different configuration elements to the compiler options, however. First, we're not using testing.core as our main namespace anymore; instead, we'll use our test runner's namespace, testing.test-runner. We will also change the output JavaScript file to a different location from our compiled application code. Lastly, we will make sure that we pass in :optimizations :none so that the compiler runs quickly and doesn't have to do any magic to look things up.

Note that our currently running Figwheel process won't know about the fact that we've added lein-doo to our list of plugins or that we've added a new build configuration. If you want to make Figwheel aware of doo in a way that'll allow them to play nicely together, you should also add doo as a dependency to your project. Once you've done that, exit the Figwheel process and restart it after you've saved the changes to project.clj.

Lastly, we need to modify our test runner namespace so that it's compatible with doo. To do this, open test_runner.cljs and change it to the following:

(ns testing.test-runner
  (:require [doo.runner :refer-macros [doo-tests]]
            [testing.app-test]
            [testing.core-test]))

;; This isn't strictly necessary, but is a good idea depending
;; upon your application's ultimate runtime engine.
(enable-console-print!)

(doo-tests 'testing.app-test
           'testing.core-test)

This shouldn't look too different from our original test runner; we're just importing from doo.runner rather than cljs.test and using doo-tests instead of a custom runner function.
The doo-tests runner works very similarly to cljs.test/run-tests, but it places hooks around the tests to know when to start them and finish them. We're also putting this at the top level of our namespace rather than wrapping it in a particular function. The last thing we're going to need to do is to install a JavaScript runtime that we can use to execute our tests. Up until now, we've been using the browser via Figwheel, but ideally, we want to be able to run our tests in a headless environment as well. For this purpose, we recommend installing PhantomJS (though other execution environments are also fine). If you're on OS X and have Homebrew installed (http://www.brew.sh), installing PhantomJS is as simple as typing brew install phantomjs. If you're not on OS X or don't have Homebrew, you can find instructions on how to install PhantomJS on the project's website at http://phantomjs.org/. The key thing is that the following should work:

$ phantomjs -v
2.0.0

Once you've got PhantomJS installed, you can now invoke your test runner from the command line with the following:

$ lein doo phantom test once

;; ======================================================================
;; Testing with Phantom:

Testing testing.app-test

Testing testing.core-test

FAIL in (i-should-fail) (:)
expected: (= 1 0)
  actual: (not (= 1 0))

Ran 3 tests containing 3 assertions.
1 failures, 0 errors.
Subprocess failed

Let's break down this command. The first part, lein doo, just tells Leiningen to invoke the doo plugin. Next, we have phantom, which tells doo to use PhantomJS as its running environment. The doo plugin supports a number of other environments, including Chrome, Firefox, Internet Explorer, Safari, Opera, SlimerJS, NodeJS, Rhino, and Nashorn. Be aware that if you're interested in running doo on one of these other environments, you may have to configure and install additional software.
For instance, if you want to run tests on Chrome, you'll need to install Karma as well as the appropriate Karma npm modules to enable Chrome interaction. Next, we have test, which refers to the cljsbuild build ID we set up earlier. Lastly, we have once, which tells doo to just run the tests and not to set up a filesystem watcher. If, instead, we wanted doo to watch the filesystem and rerun tests on any changes, we would just use lein doo phantom test.

Testing fixtures

The cljs.test project has support for adding fixtures to your tests that can run before and after your tests. Test fixtures are useful for establishing isolated states between tests - for instance, you can use them to set up a specific database state before each test and to tear it down afterward. You can add them to your ClojureScript tests by declaring them with the use-fixtures macro within the testing namespace you want fixtures applied to. Let's see what this looks like in practice by changing one of our existing tests and adding some fixtures to it. Modify app_test.cljs to the following:

(ns testing.app-test
  (:require [cljs.test :refer-macros [deftest is use-fixtures]]))

;; Run these fixtures for each test.
;; We could also use :once instead of :each in order to run
;; fixtures once for the entire namespace instead of once for
;; each individual test.
(use-fixtures :each
  {:before (fn [] (println "Setting up tests..."))
   :after  (fn [] (println "Tearing down tests..."))})

(deftest another-successful-test
  ;; Give us an idea of when this test actually executes.
  (println "Running a test...")
  (is (= 4 (count "test"))))

Here, we've added a call to use-fixtures that prints to the console before and after running the test, and we've added a println call to the test itself so that we know when it executes.
Now when we run this test, we get the following: $ lein doo phantom test once ;; ====================================================================== ;; Testing with Phantom: Testing testing.app-test Setting up tests... Running a test... Tearing down tests... Testing testing.core-test FAIL in (i-should-fail) (:) expected: (= 1 0) actual: (not (= 1 0)) Ran 3 tests containing 3 assertions. 1 failures, 0 errors. Subprocess failed Note that our fixtures get called in the order we expect them to. Asynchronous testing Due to the fact that client-side code is frequently asynchronous and JavaScript is single threaded, we need to have a way to support asynchronous tests. To do this, we can use the async macro from cljs.test. Let's take a look at an example using an asynchronous HTTP GET request. First, let's modify our project.clj file to add cljs-ajax to our dependencies. Our dependencies project key should now look something like this: :dependencies [[org.clojure/clojure "1.8.0"] [org.clojure/clojurescript "1.7.228"] [cljs-ajax "0.5.4"] [org.clojure/core.async "0.2.374" :exclusions [org.clojure/tools.reader]]] Next, let's create a new async_test.cljs file in our test.testing directory. Inside it, we will add the following code: (ns testing.async-test (:require [ajax.core :refer [GET]] [cljs.test :refer-macros [deftest is async]])) (deftest test-async (GET "http://www.google.com" ;; will always fail from PhantomJS because ;; `Access-Control-Allow-Origin` won't allow ;; our headless browser to make requests to Google. {:error-handler (fn [res] (is (= (:status-text res) "Request failed.")) (println "Test finished!"))})) Note that we're not using async in our test at the moment. Let's try running this test with doo (don't forget that you have to add testing.async-test to test_runner.cljs!): $ lein doo phantom test once ... Testing testing.async-test ... Ran 4 tests containing 3 assertions. 1 failures, 0 errors. 
Subprocess failed Now, our test here passes, but note that the println async code never fires, and our additional assertion doesn't get called (looking back at our previous examples, since we've added a new is assertion we should expect to see four assertions in the final summary)! If we actually want our test to appropriately validate the error-handler callback within the context of the test, we need to wrap it in an async block. Doing so gives us a test that looks like the following: (deftest test-async (async done (GET "http://www.google.com" ;; will always fail from PhantomJS because ;; `Access-Control-Allow-Origin` won't allow ;; our headless browser to make requests to Google. {:error-handler (fn [res] (is (= (:status-text res) "Request failed.")) (println "Test finished!") (done))}))) Now, let's try to run our tests again: $ lein doo phantom test once ... Testing testing.async-test Test finished! ... Ran 4 tests containing 4 assertions. 1 failures, 0 errors. Subprocess failed Awesome! Note that this time we see the printed statement from our callback, and we can see that cljs.test properly ran all four of our assertions. Asynchronous fixtures One final "gotcha" on testing—the fixtures we talked about earlier in this article do not handle asynchronous code automatically. This means that if you have a :before fixture that executes asynchronous logic, your test can begin running before your fixture has completed! In order to get around this, all you need to do is to wrap your :before fixture in an async block, just like with asynchronous tests. Consider the following for instance: (use-fixtures :once {:before #(async done ... (done)) :after #(do ...)}) Summary This concludes our section on cljs.test. 
Testing, whether in ClojureScript or any other language, is a critical software engineering best practice to ensure that your application behaves the way you expect it to and to protect you and your fellow developers from accidentally introducing bugs to your application. With cljs.test and doo, you have the power and flexibility to test your ClojureScript application with multiple browsers and JavaScript environments and to integrate your tests into a larger continuous testing framework.
How to handle exceptions and synchronization methods with Selenium WebDriver API

Amey Varangaonkar
02 Apr 2018
11 min read
One of the areas that is often misunderstood, but important in framework design, is exception handling. Users must build into their tests and methods a strategy for handling the exceptions that might occur during a test run, including those thrown by the application itself and those raised by the Selenium WebDriver API. In this article, we will see how to do that effectively. Let us look at the different kinds of exceptions that users must account for:

Implicit exceptions: Implicit exceptions are internal exceptions raised by an API method when a certain condition is not met, such as an illegal index of an array, a null pointer, a file not found, or something unexpected occurring at runtime.

Explicit exceptions: Explicit exceptions are thrown by the user to transfer control out of the current method, and to another event handler, when certain conditions are not met, such as an object not being found on the page, a test verification failing, or an expected known state not being reached. In other words, the user predicts that something may occur, and explicitly throws an exception if it does not.

WebDriver exceptions: The Selenium WebDriver API has its own set of exceptions that can implicitly occur when elements are not found, are not visible, are not enabled or clickable, and so on. They are thrown by the WebDriver API method, but users can catch those exceptions and explicitly handle them in a predictable way.

Try...catch blocks: In Java, exception handling can be completely controlled using a try...catch block of statements to transfer control to another method, so that the exit out of the current routine doesn't transfer control to the call handler up the chain, but rather is handled in a predictable way before the exception is thrown.

Let us examine the different ways of handling exceptions during automated testing.
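As a concrete illustration of the try...catch pattern described above, the following minimal sketch shows how a failed action can be trapped and retried instead of transferring control up the chain. Note that FlakyElement, ElementNotVisibleException, and getTextWithRetry are hypothetical stand-ins for a Selenium WebElement and its driver exceptions, used only to keep the example self-contained:

```java
// Minimal sketch of the try...catch retry pattern.
// The classes below are stand-ins for Selenium types, not the real API.
public class RetryExample {

    static class ElementNotVisibleException extends RuntimeException {}

    // Simulates an element that only becomes visible on the third attempt.
    static class FlakyElement {
        private int calls = 0;

        String getText() {
            if (++calls < 3) {
                throw new ElementNotVisibleException();
            }
            return "Sign In";
        }
    }

    // Trap the exception and retry, rather than letting it propagate
    // to the caller immediately.
    static String getTextWithRetry(FlakyElement element, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return element.getText();
            } catch (ElementNotVisibleException e) {
                // do something before retrying: reload page, re-cache element, ...
            }
        }
        // all attempts exhausted: now transfer control up the chain
        throw new ElementNotVisibleException();
    }

    public static void main(String[] args) {
        System.out.println(getTextWithRetry(new FlakyElement(), 5)); // prints "Sign In"
    }
}
```

The same shape applies when catching ElementNotFoundException, TimeoutException, and friends in real WebDriver code: each catch clause decides whether to recover, retry, or rethrow.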
Implicit exception handling A simple example of Selenium WebDriver implicit exception handling can be described as follows: Define an element on a page Create a method to retrieve the text from the element on the page In the signature of the method, add throws Exception Do not handle a specific exception like ElementNotFoundException: // create a method to retrieve the text from an element on a page @FindBy(id="submit") protected M submit; public String getText(WebElement element) throws Exception { return element.getText(); } // use the method LoginPO.getText(submit); Now, when using an assertion method, TestNG will implicitly throw an exception if the condition is not met: Define an element on a page Create a method to verify the text of the element on a page Cast the expected and actual text to the TestNG's assertEquals method TestNG will throw an AssertionError TestNG engages the difference viewer to compare the result if it fails: // create a method to verify the text from an element on a page @FindBy(id="submit") protected M submit; public void verifyText(WebElement element, String expText) throws AssertionError { assertEquals(element.getText(), expText, "Verify Submit Button Text"); } // use the method LoginPO.verifyText(submit, "Sign Inx"); // throws AssertionError java.lang.AssertionError: Verify Text Label expected [ Sign Inx] but found [ Sign In] Expected : Sign Inx Actual : Sign In <Click to see difference> TestNG difference viewer When using the TestNG's assertEquals methods, a difference viewer will be engaged if the comparison fails. There will be a link in the stacktrace in the console to open it. Since it is an overloaded method, it can take a number of data types, such as String, Integer, Boolean, Arrays, Objects, and so on. 
The following screenshot displays the TestNG difference viewer: Explicit exception handling In cases where the user can predict when an error might occur in the application, they can check for that error and explicitly raise an exception if it is found. Take the login function of a browser or mobile application as an example. If the user credentials are incorrect, the app will throw an exception saying something like "username invalid, try again" or "password incorrect, please re-enter". The exception can be explicitly handled in a way that the actual error message can be thrown in the exception. Here is an example of the login method we wrote earlier with exception handling added to it: @FindBy(id="myApp_exception") protected M error; /** * login - method to login to app with error handling * * @param username * @param password * @throws Exception */ public void login(String username, String password) throws Exception { if ( !this.username.getAttribute("value").equals("") ) { this.username.clear(); } this.username.sendKeys(username); if ( !this.password.getAttribute( "value" ).equals( "" ) ) { this.password.clear(); } this.password.sendKeys(password); submit.click(); // exception handling if ( BrowserUtils.elementExists(error, Global_VARS.TIMEOUT_SECOND) ) { String getError = error.getText(); throw new Exception("Login Failed with error = " + getError); } } Try...catch exception handling Now, sometimes the user will want to trap an exception instead of throwing it, and perform some other action such as retry, reload page, cleanup dialogs, and so on. In cases like that, the user can use try...catch in Java to trap the exception. The action would be included in the try clause, and the user can decide what to do in the catch condition. Here is a simple example that uses the ExpectedConditions method to look for an element on a page, and only return true or false if it is found. 
No exception will be raised:  /** * elementExists - wrapper around the WebDriverWait method to * return true or false * * @param element * @param timer * @throws Exception */ public static boolean elementExists(WebElement element, int timer) { try { WebDriver driver = CreateDriver.getInstance().getCurrentDriver(); WebDriverWait exists = new WebDriverWait(driver, timer); exists.until(ExpectedConditions.refreshed( ExpectedConditions.visibilityOf(element))); return true; } catch (StaleElementReferenceException | TimeoutException | NoSuchElementException e) { return false; } } In cases where the element is not found on the page, the Selenium WebDriver will return a specific exception such as ElementNotFoundException. If the element is not visible on the page, it will return ElementNotVisibleException, and so on. Users can catch those specific exceptions in a try...catch...finally block, and do something specific for each type (reload page, re-cache element, and so on): try { .... } catch(ElementNotFoundException e) { // do something } catch(ElementNotVisibleException f) { // do something else } finally { // cleanup } Synchronizing methods Earlier, the login method was introduced, and in that method, we will now call one of the synchronization methods waitFor(title, timer) that we created in the utility classes. This method will wait for the login page to appear with the title element as defined. So, in essence, after the URL is loaded, the login method is called, and it synchronizes against a predefined page title. If the waitFor method doesn't find it, it will throw an exception, and the login will not be attempted. It's important to predict and synchronize the page object methods so that they do not get out of "sync" with the application and continue executing when a state has not been reached during the test. This becomes a tedious process during the development of the page object methods, but pays big dividends in the long run when making those methods "robust". 
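The waitFor-style synchronization discussed above boils down to polling a condition until a timeout expires. The following is a minimal, framework-free sketch of that idea; WaitUtils and its Supplier-based condition are illustrative assumptions, not the book's actual BrowserUtils implementation (which wraps WebDriverWait):

```java
import java.util.function.Supplier;

// Minimal sketch of a polling synchronization helper. The Supplier stands in
// for any page-state check, such as "does the page title match?".
public class WaitUtils {

    public static void waitFor(Supplier<Boolean> condition, int timeoutSeconds) {
        long deadline = System.currentTimeMillis() + timeoutSeconds * 1000L;
        while (System.currentTimeMillis() < deadline) {
            if (Boolean.TRUE.equals(condition.get())) {
                return; // state reached: safe to continue the page object method
            }
            try {
                Thread.sleep(100); // poll interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        // transfer control out rather than letting the test run out of sync
        throw new IllegalStateException("Timed out waiting for page state");
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // condition becomes true after roughly 300 ms
        waitFor(() -> System.currentTimeMillis() - start > 300, 5);
        System.out.println("page state reached");
    }
}
```

The key design point is the same as in the login method: if the expected state never arrives, the helper throws instead of returning, so the calling page object method stops rather than continuing against a page that isn't ready.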
Also, users do not have to synchronize before accessing each element. Usually, you would synchronize against the last control rendered on a page when navigating between them. In the same login method, it's not enough to just check and wait for the login page title to appear before logging in; users must also wait for the next page to render, that being the home page of the application. So, finally, in the login method we just built, another waitFor will be added: public void login(String username, String password) throws Exception { BrowserUtils.waitFor(getPageTitle(), getElementWait()); if ( !this.username.getAttribute("value").equals("") ) { this.username.clear(); } this.username.sendKeys(username); if ( !this.password.getAttribute( "value" ).equals( "" ) ) { this.password.clear(); } this.password.sendKeys(password); submit.click(); // exception handling if ( BrowserUtils.elementExists(error, Global_VARS.TIMEOUT_SECOND) ) { String getError = error.getText(); throw new Exception("Login Failed with error = " + getError); } // wait for the home page to appear BrowserUtils.waitFor(new MyAppHomePO<WebElement>().getPageTitle(), getElementWait()); } Table classes When building the page object classes, there will frequently be components on a page that are common to multiple pages, but not all pages, and rather than including the similar locators and methods in each class, users can build a common class for just that portion of the page. HTML tables are a typical example of a common component that can be classed. So, what users can do is create a generic class for the common table rows and columns, extend the subclasses that have a table with this new class, and pass in the dynamic ID or locator to the constructor when extending the subclass with that table class. 
Let's take a look at how this is done: Create a new page object class for the table component in the application, but do not derive it from the base class in the framework In the constructor of the new class, add a parameter of the type WebElement, requiring users to pass in the static element defined in each subclass for that specific table Create generic methods to get the row count, column count, row data, and cell data for the table In each subclass that inherits these methods, implement them for each page, varying the starting row number and/or column header rows if <th> is used rather than <tr> When the methods are called on each table, it will identify them using the WebElement passed into the constructor: /** * WebTable Page Object Class * * @author Name */ public class WebTablePO { private WebElement table; /** constructor * * @param table * @throws Exception */ public WebTablePO(WebElement table) throws Exception { setTable(table); } /** * setTable - method to set the table on the page * * @param table * @throws Exception */ public void setTable(WebElement table) throws Exception { this.table = table; } /** * getTable - method to get the table on the page * * @return WebElement * @throws Exception */ public WebElement getTable() throws Exception { return this.table; } .... 
Now, the structure of the class is simple so far, so let's add in some common "generic" methods that can be inherited and extended by each subclass that extends the class: // Note: JavaDoc will be eliminated in these examples for simplicity sake public int getRowCount() { List<WebElement> tableRows = table.findElements(By.tagName("tr")); return tableRows.size(); } public int getColumnCount() { List<WebElement> tableRows = table.findElements(By.tagName("tr")); WebElement headerRow = tableRows.get(1); List<WebElement> tableCols = headerRow.findElements(By.tagName("td")); return tableCols.size(); } public int getColumnCount(int index) { List<WebElement> tableRows = table.findElements(By.tagName("tr")); WebElement headerRow = tableRows.get(index); List<WebElement> tableCols = headerRow.findElements(By.tagName("td")); return tableCols.size(); } public String getRowData(int rowIndex) { List<WebElement> tableRows = table.findElements(By.tagName("tr")); WebElement currentRow = tableRows.get(rowIndex); return currentRow.getText(); } public String getCellData(int rowIndex, int colIndex) { List<WebElement> tableRows = table.findElements(By.tagName("tr")); WebElement currentRow = tableRows.get(rowIndex); List<WebElement> tableCols = currentRow.findElements(By.tagName("td")); WebElement cell = tableCols.get(colIndex - 1); return cell.getText(); } Finally, let's extend a subclass with the new WebTablePO class, and implement some of the methods: /** * Homepage Page Object Class * * @author Name */ public class MyHomepagePO<M extends WebElement> extends WebTablePO<M> { public MyHomepagePO(M table) throws Exception { super(table); } @FindBy(id = "my_table") protected M myTable; // table methods public int getTableRowCount() throws Exception { WebTablePO table = new WebTablePO(getTable()); return table.getRowCount(); } public int getTableColumnCount() throws Exception { WebTablePO table = new WebTablePO(getTable()); return table.getColumnCount(); } public int getTableColumnCount(int 
index) throws Exception { WebTablePO table = new WebTablePO(getTable()); return table.getColumnCount(index); } public String getTableCellData(int row, int column) throws Exception { WebTablePO table = new WebTablePO(getTable()); return table.getCellData(row, column); } public String getTableRowData(int row) throws Exception { WebTablePO table = new WebTablePO(getTable()); return table.getRowData(row).replace("\n", " "); } public void verifyTableRowData(String expRowText) { String actRowText = ""; int totalNumRows = getTableRowCount(); // parse each row until row data found for ( int i = 0; i < totalNumRows; i++ ) { if ( this.getTableRowData(i).contains(expRowText) ) { actRowText = this.getTableRowData(i); break; } } // verify the row data try { assertEquals(actRowText, expRowText, "Verify Row Data"); } catch (AssertionError e) { String error = "Row data '" + expRowText + "' Not found!"; throw new Exception(error); } } } We saw, how fairly effective it is to handle object class methods, especially when it comes to handling synchronization and exceptions. You read an excerpt from the book Selenium Framework Design in Data-Driven Testing by Carl Cocchiaro. The book will show you how to design your own automation testing framework without any hassle.
Untangle VPN Services

Packt
30 Oct 2014
18 min read
This article by Abd El-Monem A. El-Bawab, the author of Untangle Network Security, covers the Untangle solution, OpenVPN. OpenVPN is an SSL/TLS-based VPN, which is mainly used for remote access as it is easy to configure and uses clients that can work on multiple operating systems and devices. OpenVPN can also provide site-to-site connections (only between two Untangle servers) with limited features.

OpenVPN

Untangle's OpenVPN is an SSL-based VPN solution that is based on the well-known open source application, OpenVPN. Untangle's OpenVPN is mainly used for client-to-site connections, with a client that is easy to deploy and configure and widely available for Windows, Mac, Linux, and smartphones. Untangle's OpenVPN can also be used for site-to-site connections, but the two sites need to have Untangle servers; site-to-site connections between Untangle and third-party devices are not supported.

How OpenVPN works

In reference to the OSI model, an SSL/TLS-based VPN will only encrypt the application layer's data, while the lower layers' information will be transferred unencrypted. In other words, the application packets will be encrypted. The IP addresses of the server and client are visible; the port number that the server uses for communication between the client and server is also visible, but the actual application port number is not. Furthermore, the final destination IP address will not be visible; only the VPN server's IP address is seen.

Secure Sockets Layer (SSL) and Transport Layer Security (TLS) refer to the same thing; SSL is the predecessor of TLS. SSL was originally developed by Netscape, and many releases were produced (V.1 to V.3) until it was standardized under the TLS name.

The steps to create an SSL-based VPN are as follows:

The client will send a message to the VPN server that it wants to initiate an SSL session.
Also, it will send a list of all ciphers (hash and encryption protocols) that it supports. The server will respond with a set of selected ciphers and will send its digital certificate to the client. The server's digital certificate includes the server's public key. The client will try to verify the server's digital certificate by checking it against trusted certificate authorities and by checking the certificate's validity (valid from and valid through dates). The server may need to authenticate the client before allowing it to connect to the internal network. This could be achieved either by asking for a valid username and password or by using the user's digital identity certificates. Untangle NGFW uses the digital certificates method. The client will create a session key (which will be used to encrypt the transferred data between the two devices) and will send this key to the server encrypted using the server's public key. Thus, no third party can get the session key as the server is the only device that can decrypt the session key as it's the only party that has the private key. The server will acknowledge the client that it received the session key and is ready for the encrypted data transformation. Configuring Untangle's OpenVPN server settings After installing the OpenVPN application, the application will be turned off. You'll need to turn it on before you can use it. You can configure Untangle's OpenVPN server settings under OpenVPN settings | Server. The settings configure how OpenVPN will be a server for remote clients (which can be clients on Windows, Linux, or any other operating systems, or another Untangle server). The different available settings are as follows: Site Name: This is the name of the OpenVPN site that is used to define the server among other OpenVPN servers inside your origination. This name should be unique across all Untangle servers in the organization. A random name is automatically chosen for the site name. 
Site URL: This is the URL that the remote client will use to reach this OpenVPN server. This can be configured under Config | Administration | Public Address. If you have more than one WAN interface, the remote client will first try to initiate the connection using the settings defined in the public address. If this fails, it will randomly try the IP of the remaining WAN interfaces. Server Enabled: If checked, the OpenVPN server will run and accept connections from the remote clients. Address Space: This defines the IP subnet that will be used to assign IPs for the remote VPN clients. The value in Address Space must be unique and separate across all existing networks and other OpenVPN address spaces. A default address space will be chosen that does not conflict with the existing configuration: Configuring Untangle's OpenVPN remote client settings Untangle's OpenVPN allows you to create OpenVPN clients to give your office employees, who are out of the company, the ability to remotely access your internal network resources via their PCs and/or smartphones. Also, an OpenVPN client can be imported to another Untangle server to provide site-to-site connection. Each OpenVPN client will have its unique IP (from the address space range defined previously). Thus, each OpenVPN client can only be used for one user. For multiple users, you'll have to create multiple clients as using the same client for multiple users will result in client disconnection issues. Creating a remote client You can create remote access clients by clicking on the Add button located under OpenVPN Settings | Server | Remote Clients. A new window will open, which has the following settings: Enabled: If this checkbox is checked, it will allow the client connection to the OpenVPN server. If unchecked, it will not allow the client connection. Client Name: Give a unique name for the client; this will help you identify the client. Only alphanumeric characters are allowed. 
Group: Specify the group the client will be a member of. Groups are used to apply similar settings to their members. Type: Select Individual Client for remote access and Network for site-to-site VPN. The following screenshot shows a remote access client created for JDoe: After configuring the client settings, you'll need to press the Done button and then the OK or Apply button to save this client configuration. The new client will be available under the Remote Clients tab, as shown in the following screenshot: Understanding remote client groups Groups are used to group clients together and apply similar settings to the group members. By default, there will be a Default Group. Each group has the following settings: Group Name: Give a suitable name for the group that describes the group settings (for example, full tunneling clients) or the target clients (for example, remote access clients). Full Tunnel: If checked, all the traffic from the remote clients will be sent to the OpenVPN server, which allows Untangle to filter traffic directed to the Internet. If unchecked, the remote client will run in the split tunnel mode, which means that the traffic directed to local resources behind Untangle is sent through VPN, and the traffic directed to the Internet is sent by the machine's default gateway. You can't use Full Tunnel for site-to-site connections. Push DNS: If checked, the remote OpenVPN client will use the DNS settings defined by the OpenVPN server. This is useful to resolve local names and services. Push DNS server: If the OpenVPN server is selected, remote clients will use the OpenVPN server for DNS queries. If set to Custom, DNS servers configured here will be used for DNS queries. Push DNS Custom 1: If the Push DNS server is set to Custom, the value configured here will be used as a primary DNS server for the remote client. If blank, no settings will be pushed for the remote client. 
Push DNS Custom 2: If the Push DNS server is set to Custom, the value configured here will be used as a secondary DNS server for the remote client. If blank, no settings will be pushed for the remote client. Push DNS Domain: The configured value will be pushed to the remote clients to extend their domain's search path during DNS resolution. The following screenshot illustrates all these settings: Defining the exported networks Exported networks are used to define the internal networks behind the OpenVPN server that the remote client can reach after successful connection. Additional routes will be added to the remote client's routing table that state that the exported networks (the main site's internal subnet) are reachable through the OpenVPN server. By default, each static non-WAN interface network will be listed in the Exported Networks list: You can modify the default settings or create new entries. The Exported Networks settings are as follows: Enabled: If checked, the defined network will be exported to the remote clients. Export Name: Enter a suitable name for the exported network. Network: This defines the exported network. The exported network should be written in CIDR form. These settings are illustrated in the following screenshot: Using OpenVPN remote access clients So far, we have been configuring the client settings but didn't create the real package to be used on remote systems. We can get the remote client package by pressing the Download Client button located under OpenVPN Settings | Server | Remote Clients, which will start the process of building the OpenVPN client that will be distributed: There are three available options to download the OpenVPN client. The first option is to download the client as a .exe file to be used with the Windows operating system. The second option is to download the client configuration files, which can be used with the Apple and Linux operating systems. 
The third option is similar to the second one, except that the configuration file will be imported to another Untangle NGFW server, which is used for site-to-site scenarios. The following screenshot illustrates this:

The configuration files include the following files:

<Site_name>.ovpn
<Site_name>.conf
keys/<Site_name>-<User_name>.crt
keys/<Site_name>-<User_name>.key
keys/<Site_name>-<User_name>-ca.crt

The certificate files are for client authentication, and the .ovpn and .conf files hold the defined connection settings (that is, the OpenVPN server IP, the port used, and the ciphers used). The following screenshot shows the .ovpn file for the site Untangle-1849:

As shown in the following screenshot, the created file (openvpn-JDoe-setup.exe) includes the client name, which helps you identify the different clients and simplifies the process of distributing each file to the right user:

Using an OpenVPN client with Windows OS

Using an OpenVPN client with the Windows operating system is very simple. To do this, perform the following steps:

Set up the OpenVPN client on the remote machine. The setup is very easy: just a next, next, install, and finish setup. It is important to set up and run the application as an administrator in order to allow the client to write the VPN routes to the Windows routing table. You should run the client as an administrator every time you use it so that the client can create the required routes.
Double-click on the OpenVPN icon on the Windows desktop:
The application will run in the system tray:
Right-click on the system tray icon of the application and select Connect.
The client will start to initiate the connection to the OpenVPN server, and a window with the connection status will appear, as shown in the following screenshot:

Once the VPN tunnel is initiated, a notification will appear from the client with the IP assigned to it, as shown in the following screenshot:

If the OpenVPN client was running in the task bar and there was an established connection, the client will automatically reconnect to the OpenVPN server if the tunnel was dropped due to Windows being asleep.

By default, the OpenVPN client will not start at the Windows login. We can change this, and allow it to start without requiring administrative privileges, by going to Control Panel | Administrative Tools | Services and changing the OpenVPN service's Startup Type to automatic. Then, in the start parameters field, put --connect <Site_name>.ovpn; you can find the <Site_name>.ovpn file under C:\Program Files\OpenVPN\config.

Using OpenVPN with non-Windows clients

The method to configure OpenVPN clients to work with Untangle is the same for all non-Windows clients. Simply download the .zip file provided by Untangle, which includes the configuration and certificate files, and place them into the application's configuration folder. The steps are as follows:

Download and install any of the following OpenVPN-compatible clients for your operating system:
For Mac OS X, Untangle, Inc. suggests using Tunnelblick, which is available at http://code.google.com/p/tunnelblick
For Linux, OpenVPN clients for different Linux distros can be found at https://openvpn.net/index.php/access-server/download-openvpn-as-sw.html
OpenVPN Connect for iOS is available at https://itunes.apple.com/us/app/openvpn-connect/id590379981?mt=8
OpenVPN for Android 4.0+ is available at https://play.google.com/store/apps/details?id=net.openvpn.openvpn
Log in to the Untangle NGFW server, download the .zip client configuration file, and extract the files from the .zip file.
Place the configuration files into any of the following OpenVPN-compatible applications:

Tunnelblick: Manually copy the files into the Configurations folder located at ~/Library/Application Support/Tunnelblick.
Linux: Copy the extracted files into /etc/openvpn, and then connect using sudo openvpn /etc/openvpn/<Site_name>.conf.
iOS: Open iTunes and select the files from the config ZIP file to add to the app on your iPhone or iPad.
Android: From the OpenVPN for Android application, tap the folder icon in the top-right corner, then browse to the folder where you have the OpenVPN .conf file. Tap the file and hit Select. Then, in the top-right corner, hit the little floppy disk icon to save the import. You should now see the imported profile; tap on it to connect to the tunnel. For more information on this, visit http://forums.untangle.com/openvpn/30472-openvpn-android-4-0-a.html.

Run the OpenVPN-compatible client.

Using OpenVPN for site-to-site connection

To use OpenVPN for a site-to-site connection, one Untangle NGFW server will run in OpenVPN server mode and the other will run in client mode. We will need to create a client that will be imported on the remote server. The client settings are shown in the following screenshot:

We will need to download the client configuration intended to be imported on another Untangle server (the third option on the client download menu), and then import this zipped client configuration on the remote server. To import the client, on the remote server under the Client tab, browse to the .zip file and press the Submit button. The client will be shown as follows:

You'll need to restart the two servers before being able to use the OpenVPN site-to-site connection. The site-to-site connection is bidirectional.
Reviewing the connection details

The currently connected clients (whether OS clients or other Untangle NGFW servers) will appear under Connected Remote Clients, located under the Status tab. The screen will show the client name, its external address, and the address assigned to it by OpenVPN, in addition to the connection start time and the amount of data transmitted and received (in MB) during this connection:

For the site-to-site connection, the client server will show the name of the remote server and whether the connection is established, in addition to the amount of data transmitted and received in MB:

Event logs show a detailed connection history, as shown in the following screenshot:

In addition, there are two reports available for Untangle's OpenVPN:

Bandwidth usage: This report shows the maximum and average data transfer rate (KB/s) and the total amount of data transferred that day
Top users: This report shows the top users connected to the Untangle OpenVPN server

Troubleshooting Untangle's OpenVPN

In this section, we will discuss some points to consider when dealing with Untangle NGFW OpenVPN. OpenVPN acts as a router, as it routes between different networks. Using OpenVPN with Untangle NGFW in bridge mode (that is, the Untangle NGFW server is behind another router) requires additional configuration:

Create a static route on the router that routes any traffic from the VPN range (the VPN address pool) to the Untangle NGFW server.
Create a port forward rule on the router for the OpenVPN port 1194 (UDP) to Untangle NGFW.
Verify that your setting under Config | Administration | Public Address is correct, as it is used by Untangle to configure OpenVPN clients, and ensure that the configured address is resolvable from outside the company.
If the OpenVPN client is connected but you can't access anything, perform the following steps:

Verify that the hosts you are trying to reach are exported in Exported Networks.
Try to ping the Untangle NGFW LAN IP address (if exported).
Try to bring up the Untangle NGFW GUI by entering the IP address in a browser.

If the preceding tasks work, your tunnel is up and operational. If you can't reach any clients inside the network, check for the following conditions:

The client machine's firewall is not preventing the connection from the OpenVPN client.
The client machine uses Untangle as a gateway, or has a static route that sends the VPN address pool to Untangle NGFW.

In addition, some port forwarding rules on Untangle NGFW are needed for OpenVPN to function properly. The required ports are 53, 445, 389, 88, 135, and 1025.

If the site-to-site tunnel is set up correctly but the two sites can't talk to each other, the reasons may be as follows:

If your sites have IPs from the same subnet (this typically happens when you use a service from the same ISP for both branches), OpenVPN may fail, as it considers that no routing is needed between IPs in the same subnet; you should ask your ISP to change the IPs.
To get DNS resolution to work over the site-to-site tunnel, you'll need to go to Config | Network | Advanced | DNS Server | Local DNS Servers and add the IP of the DNS server on the far side of the tunnel. Enter the domain in the Domain List column and use the FQDN when accessing resources. You'll need to do this on both sides of the tunnel for it to work from either side.

If you are using a site-to-site VPN in addition to the client-to-site VPN, and the OpenVPN client is able to connect to the main site only, you'll need to add the VPN Address Pool to Exported Hosts and Networks.

Lab-based training

This section provides training for the OpenVPN site-to-site and client-to-site scenarios. In this lab, we will mainly use Untangle-01, Untangle-03, and a laptop (192.168.1.7).
The ABC bank started a project with Acme schools. As part of this project, the ABC bank team needs to periodically access files located on Acme-FS01, so the two parties decided to opt for OpenVPN. However, Acme's network team doesn't want to leave access wide open for ABC bank members, so they set firewall rules that limit ABC bank's access to the file server only. In addition, the IT team director wants to have VPN access from home to the Acme network, which they decided to accomplish using OpenVPN. The following diagram shows the environment used in the site-to-site scenario:

To create the site-to-site connection, we will need to perform the following steps:

Enable the OpenVPN server on Untangle-01.
Create a network type client with a remote network of 172.16.1.0/24.
Download the client and import it under the Client tab in Untangle-03.
Restart the two servers.

After the restart, you have a site-to-site VPN connection. However, the Acme network is wide open to the ABC bank, so we need to create a limiting firewall rule. On Untangle-03, create a rule that allows any traffic that comes from the OpenVPN interface, has a source of 172.16.136.10 (the Untangle-01 client IP), and is directed to 172.16.1.7 (Acme-FS01). The rule is shown in the following screenshot:

We will also need a general block rule that comes after the preceding rule in the rule evaluation order. The environment used for the client-to-site connection is shown in the following diagram:

To create a client-to-site VPN connection, we need to perform the following steps:

Enable the OpenVPN server on Untangle-03.
Create an individual client type client on Untangle-03.
Distribute the client to the intended user (that is, 192.168.1.7).
Install OpenVPN on your laptop.
Connect using the installed OpenVPN client and try to ping Acme-DC01 using its name. The ping will fail because the client is not able to query the Acme DNS. So, in the Default Group settings, change Push DNS Domain to Acme.local.
Changing the group settings will not affect the OpenVPN client till the client is restarted. Now, the ping process will be a success.

Summary

In this article, we covered the VPN services provided by Untangle NGFW. We went deeply into understanding how each solution works. This article also provided a guide on how to configure and deploy the services. Untangle provides a free solution that is based on the well-known open source OpenVPN, which provides an SSL-based VPN.

Packt
29 Mar 2010
2 min read

Setting Up the iReport Pages

Configuring the page format

We can follow the listed steps for setting up report pages:

Open the report List of Products.
Go to menu Window | Report Inspector. The following window will appear on the left side of the report designer:
Select the report List of Products, right-click on it, and choose Page Format…. The Page format… dialog box will appear; select A4 from the Format drop-down list, and select Portrait from the Page orientation section.
You can modify the page margins if you need to, or leave them as they are to keep the default margins. For our report, you need not change the margins.
Press OK.

Page size

You have seen that there are many preset sizes/formats for the report, such as Custom, Letter, Note, Legal, A0 to A10, B0 to B5, and so on. You will choose the appropriate one based on your requirements. We have chosen A4. If the number of columns is too high to fit in Portrait, then choose the Landscape orientation. If you change the preset sizes, the report elements (title, column heading, fields, or other elements) will not be positioned automatically according to the new page size. You have to position each element manually, so be careful if you decide to change the page size.

Configuring properties

We can modify the default settings of report properties in the following way:

Right-click on List of Products and choose Properties.

We can configure many important report properties from the Properties window. You can see that there are many options here. You can change the Report name, Page size, Margins, Columns, and more. We have already learnt about setting up pages, so now our concern is to learn about some of the other (More…) options.
Bhagyashree R
07 Sep 2018
11 min read

Building a Twitter news bot using Twitter API [Tutorial]

This article is an excerpt from a book written by Srini Janarthanam titled Hands-On Chatbots and Conversational UI Development. In this article, we will explore the Twitter API and build core modules for tweeting, searching, and retweeting. We will further explore a data source for news around the globe and build a simple bot that tweets top news on its timeline.

Getting started with the Twitter app

To get started, let us explore the Twitter developer platform. Let us begin by building a Twitter app and later explore how we can tweet news articles to followers based on their interests:

Log on to Twitter. If you don't have an account on Twitter, create one.
Go to Twitter Apps, which is Twitter's application management dashboard.
Click the Create New App button:
Create an application by filling in the form, providing a name, description, and a website (fully-qualified URL). Read and agree to the Developer Agreement and hit Create your Twitter application:
You will now see your application dashboard. Explore the tabs:
Click Keys and Access Tokens:
Copy the consumer key and consumer secret and hang on to them.
Scroll down to Your Access Token:
Click Create my access token to create a new token for your app:
Copy the Access Token and Access Token Secret and hang on to them.

Now, we have all the keys and tokens we need to create a Twitter app.

Building your first Twitter bot

Let's build a simple Twitter bot. This bot will listen to tweets and pick out those that have a particular hashtag. All the tweets with a given hashtag will be printed on the console. This is a very simple bot to help us get started. In the following sections, we will explore more complex bots. To follow along, you can download the code from the book's GitHub repository.

Go to the root directory and create a new Node.js program using npm init:
Execute the npm install twitter --save command to install the Twitter Node.js library:
Run npm install request --save to install the Request library as well.
We will use this in the future to make HTTP GET requests to a news data source.

Explore your package.json file in the root directory:

{
  "name": "twitterbot",
  "version": "1.0.0",
  "description": "my news bot",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "request": "^2.81.0",
    "twitter": "^1.7.1"
  }
}

Create an index.js file with the following code:

//index.js
var TwitterPackage = require('twitter');
var request = require('request');

console.log("Hello World! I am a twitter bot!");

var secret = {
  consumer_key: 'YOUR_CONSUMER_KEY',
  consumer_secret: 'YOUR_CONSUMER_SECRET',
  access_token_key: 'YOUR_ACCESS_TOKEN_KEY',
  access_token_secret: 'YOUR_ACCESS_TOKEN_SECRET'
}
var Twitter = new TwitterPackage(secret);

In the preceding code, put the keys and tokens you saved in their appropriate variables. We don't need the request package just yet, but we will later. Now let's create a hashtag listener to listen to the tweets on a specific hashtag:

//Twitter stream
var hashtag = '#brexit'; //put any hashtag to listen e.g. #brexit
console.log('Listening to:' + hashtag);

Twitter.stream('statuses/filter', {track: hashtag}, function(stream) {
  stream.on('data', function(tweet) {
    console.log('Tweet:@' + tweet.user.screen_name + '\t' + tweet.text);
    console.log('------');
  });
  stream.on('error', function(error) {
    console.log(error);
  });
});

Replace #brexit with the hashtag you want to listen to. Use a popular one so that you can see the code in action. Run the index.js file with the node index.js command. You will see a stream of tweets from Twitter users all over the globe who used the hashtag:

Congratulations! You have built your first Twitter bot.

Exploring the Twitter SDK

In the previous section, we explored how to listen to tweets based on hashtags. Let's now explore the Twitter SDK to understand the capabilities that we can bestow upon our Twitter bot.
Updating your status

You can also update your status on your Twitter timeline by using the following status update module code:

tweet('I am a Twitter Bot!', null, null);

function tweet(statusMsg, screen_name, status_id){
  console.log('Sending tweet to: ' + screen_name);
  console.log('In response to:' + status_id);
  var msg = statusMsg;
  if (screen_name != null){
    msg = '@' + screen_name + ' ' + statusMsg;
  }
  console.log('Tweet:' + msg);
  Twitter.post('statuses/update', { status: msg }, function(err, response) {
    // if there was an error while tweeting
    if (err) {
      console.log('Something went wrong while TWEETING...');
      console.log(err);
    } else if (response) {
      console.log('Tweeted!!!');
      console.log(response);
    }
  });
}

Comment out the hashtag listener code, add the preceding status update code instead, and run it. When run, your bot will post a tweet on your timeline:

In addition to tweeting on your timeline, you can also tweet in response to another tweet (or status update). The screen_name argument is used to create a response tweet; screen_name is the name of the user who posted the tweet. We will explore this a bit later.
Retweet to your followers

You can retweet a tweet to your followers using the following retweet status code:

var retweetId = '899681279343570944';
retweet(retweetId);

function retweet(retweetId){
  Twitter.post('statuses/retweet/', { id: retweetId }, function(err, response) {
    if (err) {
      console.log('Something went wrong while RETWEETING...');
      console.log(err);
    } else if (response) {
      console.log('Retweeted!!!');
      console.log(response);
    }
  });
}

Searching for tweets

You can also search for recent or popular tweets with hashtags using the following search hashtags code:

search('#brexit', 'popular');

function search(hashtag, resultType){
  var params = {
    q: hashtag, // REQUIRED
    result_type: resultType,
    lang: 'en'
  }
  Twitter.get('search/tweets', params, function(err, data) {
    if (!err) {
      console.log('Found tweets: ' + data.statuses.length);
      console.log('First one: ' + data.statuses[0].text);
    } else {
      console.log('Something went wrong while SEARCHING...');
    }
  });
}

Exploring a news data service

Let's now build a bot that will tweet news articles to its followers at regular intervals. We will then extend it to be personalized by users through a conversation that happens over direct messaging with the bot. In order to build a news bot, we need a source where we can get news articles. We are going to explore a news service called NewsAPI.org in this section. News API is a service that aggregates news articles from roughly 70 newspapers around the globe.

Setting up News API

Let us set up an account with the News API data service and get the API key:

Go to NewsAPI.org:
Click Get API key.
Register using your email.
Get your API key.
Explore the sources: https://newsapi.org/v1/sources?apiKey=YOUR_API_KEY

There are about 70 sources from across the globe, including popular ones such as BBC News, Associated Press, Bloomberg, and CNN. You might notice that each source has a category tag attached.
The possible options are: business, entertainment, gaming, general, music, politics, science-and-nature, sport, and technology. You might also notice that each source has language (en, de, fr) and country (au, de, gb, in, it, us) tags. The following is the information on the BBC News source:

{
  "id": "bbc-news",
  "name": "BBC News",
  "description": "Use BBC News for up-to-the-minute news, breaking news, video, audio and feature stories. BBC News provides trusted World and UK news as well as local and regional perspectives. Also entertainment, business, science, technology and health news.",
  "url": "http://www.bbc.co.uk/news",
  "category": "general",
  "language": "en",
  "country": "gb",
  "urlsToLogos": {
    "small": "",
    "medium": "",
    "large": ""
  },
  "sortBysAvailable": [
    "top"
  ]
}

Get sources for a specific category, language, or country using: https://newsapi.org/v1/sources?category=business&apiKey=YOUR_API_KEY

The following is part of the response to the preceding query asking for all sources under the business category:

"sources": [
  {
    "id": "bloomberg",
    "name": "Bloomberg",
    "description": "Bloomberg delivers business and markets news, data, analysis, and video to the world, featuring stories from Businessweek and Bloomberg News.",
    "url": "http://www.bloomberg.com",
    "category": "business",
    "language": "en",
    "country": "us",
    "urlsToLogos": {
      "small": "",
      "medium": "",
      "large": ""
    },
    "sortBysAvailable": [
      "top"
    ]
  },
  {
    "id": "business-insider",
    "name": "Business Insider",
    "description": "Business Insider is a fast-growing business site with deep financial, media, tech, and other industry verticals. Launched in 2007, the site is now the largest business news site on the web.",
    "url": "http://www.businessinsider.com",
    "category": "business",
    "language": "en",
    "country": "us",
    "urlsToLogos": {
      "small": "",
      "medium": "",
      "large": ""
    },
    "sortBysAvailable": [
      "top",
      "latest"
    ]
  },
  ...
]

Explore the articles: https://newsapi.org/v1/articles?source=bbc-news&apiKey=YOUR_API_KEY

The following is a sample response:

"articles": [
  {
    "author": "BBC News",
    "title": "US Navy collision: Remains found in hunt for missing sailors",
    "description": "Ten US sailors have been missing since Monday's collision with a tanker near Singapore.",
    "url": "http://www.bbc.co.uk/news/world-us-canada-41013686",
    "urlToImage": "https://ichef1.bbci.co.uk/news/1024/cpsprodpb/80D9/production/_97458923_mediaitem97458918.jpg",
    "publishedAt": "2017-08-22T12:23:56Z"
  },
  {
    "author": "BBC News",
    "title": "Afghanistan hails Trump support in 'joint struggle'",
    "description": "President Ghani thanks Donald Trump for supporting Afghanistan's battle against the Taliban.",
    "url": "http://www.bbc.co.uk/news/world-asia-41012617",
    "urlToImage": "https://ichef.bbci.co.uk/images/ic/1024x576/p05d08pf.jpg",
    "publishedAt": "2017-08-22T11:45:49Z"
  },
  ...
]

For each article, the author, title, description, url, urlToImage, and publishedAt fields are provided. Now that we have explored a source of news data that provides up-to-date news stories under various categories, let us go on to build a news bot.

Building a Twitter news bot

Now that we have explored News API, a data source for the latest news updates, and a little bit of what the Twitter API can do, let us combine them both to build a bot tweeting interesting news stories, first on its own timeline and then specifically to each of its followers:

Let's build a news tweeter module that tweets the top news article given the source.
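Since the source listing is plain JSON, narrowing it down by category or language is a small exercise in array filtering. The following sketch is our own addition (the filterSources helper and the sample data are not part of the book's code); it works on any parsed /v1/sources response with the fields shown above:

```javascript
// Pure helper: filter a parsed News API /v1/sources response by
// category and/or language, returning the matching source IDs.
// No network calls, so it is easy to test in isolation.
function filterSources(sources, category, language) {
  return sources
    .filter(function (s) { return !category || s.category === category; })
    .filter(function (s) { return !language || s.language === language; })
    .map(function (s) { return s.id; });
}

// Example with a tiny hand-made source list mirroring the fields above:
var sample = [
  { id: 'bbc-news', category: 'general', language: 'en' },
  { id: 'bloomberg', category: 'business', language: 'en' },
  { id: 'business-insider', category: 'business', language: 'en' }
];
console.log(filterSources(sample, 'business', 'en'));
// [ 'bloomberg', 'business-insider' ]
```

The same helper could be applied to the full list fetched from the API before choosing which sources to tweet from.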
The following code uses the tweet() function we built earlier:

topNewsTweeter('cnn', null);

function topNewsTweeter(newsSource, screen_name, status_id){
  request({
    url: 'https://newsapi.org/v1/articles?source=' + newsSource + '&apiKey=YOUR_API_KEY',
    method: 'GET'
  }, function (error, response, body) {
    //response is from the bot
    if (!error && response.statusCode == 200) {
      var botResponse = JSON.parse(body);
      console.log(botResponse);
      tweetTopArticle(botResponse.articles, screen_name);
    } else {
      console.log('Sorry. No news');
    }
  });
}

function tweetTopArticle(articles, screen_name, status_id){
  var article = articles[0];
  tweet(article.title + " " + article.url, screen_name);
}

Run the preceding program to fetch news from CNN and post the topmost article on Twitter:

Here is the post on Twitter:

Now, let us build a module that tweets news stories from a randomly-chosen source in a list of sources:

function tweetFromRandomSource(sources, screen_name, status_id){
  var max = sources.length;
  // Math.random() * max (not max + 1) keeps the index within the array bounds
  var randomSource = sources[Math.floor(Math.random() * max)];
  topNewsTweeter(randomSource, screen_name, status_id);
}

Let's call the tweeting module after we acquire the list of sources:

function getAllSourcesAndTweet(){
  var sources = [];
  console.log('getting sources...');
  request({
    url: 'https://newsapi.org/v1/sources?apiKey=YOUR_API_KEY',
    method: 'GET'
  }, function (error, response, body) {
    //response is from the bot
    if (!error && response.statusCode == 200) {
      // Print out the response body
      var botResponse = JSON.parse(body);
      for (var i = 0; i < botResponse.sources.length; i++){
        console.log('adding.. ' + botResponse.sources[i].id);
        sources.push(botResponse.sources[i].id);
      }
      tweetFromRandomSource(sources, null, null);
    } else {
      console.log('Sorry. No news sources!');
    }
  });
}

Let's create a new JS file called tweeter.js.
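One practical wrinkle with tweetTopArticle() is that a long headline plus a URL can overflow Twitter's character limit. A small guard helper is sketched below; note that composeTweet is our own addition, not from the book, and the 280-character limit and the 23-character t.co link length are assumptions about how Twitter counts tweet text and wrapped links:

```javascript
// Hypothetical helper: trim the title so that title + URL fits in a
// tweet. We assume Twitter's 280-character limit and that every link
// is wrapped by t.co at a fixed 23 characters, regardless of length.
var TWEET_LIMIT = 280;
var TCO_LINK_LENGTH = 23;

function composeTweet(title, url) {
  var budget = TWEET_LIMIT - TCO_LINK_LENGTH - 1; // 1 for the separating space
  var text = title.length > budget
    ? title.slice(0, budget - 1) + '\u2026' // trim and append an ellipsis
    : title;
  return text + ' ' + url;
}
```

With it, the call inside tweetTopArticle() could become tweet(composeTweet(article.title, article.url), screen_name).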
In the tweeter.js file, call getAllSourcesAndTweet() to get the process started:

//tweeter.js
var TwitterPackage = require('twitter');
var request = require('request');

console.log("Hello World! I am a twitter bot!");

var secret = {
  consumer_key: 'YOUR_CONSUMER_KEY',
  consumer_secret: 'YOUR_CONSUMER_SECRET',
  access_token_key: 'YOUR_ACCESS_TOKEN_KEY',
  access_token_secret: 'YOUR_ACCESS_TOKEN_SECRET'
}
var Twitter = new TwitterPackage(secret);

getAllSourcesAndTweet();

Run the tweeter.js file on the console. This bot will tweet a news story every time it is called. It will choose top news stories randomly from around 70 news sources. Hurray! You have built your very own Twitter news bot.

In this tutorial, we have covered a lot. We started off with the Twitter API and got a taste of how we can automatically tweet, retweet, and search for tweets using hashtags. We then explored a News source API that provides news articles from about 70 different newspapers. We integrated it with our Twitter bot to create a news tweeting bot. If you found this post useful, do check out the book, Hands-On Chatbots and Conversational UI Development, which will help you explore the world of conversational user interfaces.

Build and train an RNN chatbot using TensorFlow [Tutorial]
Building a two-way interactive chatbot with Twilio: A step-by-step guide
How to create a conversational assistant or chatbot using Python

Packt
18 Feb 2016
12 min read

Reactive Programming and the Flux Architecture

Reactive programming, including functional reactive programming as will be discussed later, is a programming paradigm that can be used in multiparadigm languages such as JavaScript, Python, Scala, and many more. It is primarily distinguished from imperative programming, in which a statement does something by way of what the literature on functional and reactive programming calls side effects. Please note, though, that side effects here are not what they are in common English, where all medications have some effects, which are the point of taking the medication, and some other effects that are unwanted but tolerated for the main benefit. For example, Benadryl is taken for the express purpose of reducing symptoms of airborne allergies, and the fact that Benadryl, like some other allergy medicines, can also cause drowsiness is (or at least was; now it is also sold as a sleeping aid) a side effect. This is unwelcome but tolerated as the lesser of two evils by people who would rather be somewhat tired and not bothered by allergies than be alert but bothered by frequent sneezing. For a programmer, by contrast, side effects are not incidental as medication side effects are: they are the primary intended purpose and effect of a statement, often implemented through changes in the stored state of a program. Reactive programming has its roots in the observer pattern, as discussed in Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides's classic book Design Patterns: Elements of Reusable Object-Oriented Software (the authors of this book are commonly called GoF or the Gang of Four). In the observer pattern, there is an observable subject. It has a list of listeners, and notifies all of them when it has something to publish.
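The observer pattern just described can be sketched in a few lines of JavaScript. This is a minimal illustration of the idea, not the full GoF interface:

```javascript
// Minimal observer pattern: an observable subject keeps a list of
// listeners and notifies all of them when it publishes something.
function Subject() {
  this.listeners = [];
}
Subject.prototype.subscribe = function (listener) {
  this.listeners.push(listener);
};
Subject.prototype.publish = function (data) {
  this.listeners.forEach(function (listener) { listener(data); });
};

var subject = new Subject();
var received = [];
subject.subscribe(function (msg) { received.push('A got ' + msg); });
subject.subscribe(function (msg) { received.push('B got ' + msg); });
subject.publish('news');
console.log(received); // [ 'A got news', 'B got news' ]
```

Every subscribed listener is notified of every publication, with no filtering of which messages reach which listener.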
This is somewhat simpler than the publisher/subscriber (PubSub) pattern, as it lacks the potentially intricate filtering of which messages reach which subscriber that PubSub normally includes. Reactive programming has developed a life of its own, a bit like the MVC pattern-turned-buzzword, but it is best taken in connection with the broader context explored in GoF. Reactive programming, including the ReactJS framework (which is explored in this title), is intended to avoid the shared mutable state and to be idempotent. This means that, as with RESTful web services, you will get the same result from a function whether you call it once or a hundred times. Pete Hunt, formerly of Facebook and perhaps the face of ReactJS as it now exists, has said that he would rather be predictable than right. If there is a bug in his code, Hunt would rather have the interface fail the same way every single time than go on elaborate hunts for heisenbugs. These are bugs that manifest only in some special and slippery edge cases, and are explored later in this book. ReactJS is called the V of MVC. That is, it is intended for user interface work and has little intention of offering other standard features. But just as the painter Paul Cézanne said about the impressionist painter Claude Monet, "Monet is only an eye, but what an eye!", about MVC and ReactJS we can say, "ReactJS is only a view, but what a view!"

In this chapter, we will be covering the following topics:

Declarative programming
The war on heisenbugs
The Flux Architecture
From the pit of despair to the pit of success
A complete UI teardown and rebuild
JavaScript as a Domain-specific Language (DSL)
Big-Coffee Notation

ReactJS, the library explored in this book, was developed by Facebook and made open source in the not-too-distant past.
It is shaped by some of Facebook's concerns about making a large-scale site that is safe to debug and work on, and also allowing a large number of programmers to work on different components without having to store brain-bending levels of complexity in their heads. The quotation "Simplicity is the lack of interleaving," which can be found in the videos at http://facebook.github.io/react, is not about how much or how little stuff there is on an absolute scale, but about how many moving parts you need to juggle simultaneously to work on a system (see the section on Big-Coffee Notation for further reflections).

Declarative programming

Probably the biggest theoretical advantage of the ReactJS framework is that its programming is declarative rather than imperative. In imperative programming, you specify what steps need to be done; declarative programming is programming in which you specify what needs to be accomplished without telling how it needs to be done. It may be difficult at first to shift from an imperative paradigm to a declarative paradigm, but once the shift has been made, it is well worth the effort involved to get there. Familiar examples of declarative paradigms, as opposed to imperative paradigms, include both SQL and HTML. An SQL query would be much more verbose if you had to specify how exactly to find records and filter them appropriately, let alone say how indices are to be used, and HTML would be much more verbose if, instead of having an IMG tag, you had to specify how to render an image. Many libraries, for instance, are more declarative than rolling your own solution from scratch. With a library, you are more likely to specify only what needs to be done and not, in addition to this, how to do it.
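The same contrast shows up in miniature in everyday JavaScript. Both snippets below double a list of numbers; the imperative one spells out the steps and the bookkeeping, while the declarative one only states the transformation:

```javascript
var numbers = [1, 2, 3];

// Imperative: spell out the loop, the counter, and the accumulator.
var doubledImperative = [];
for (var i = 0; i < numbers.length; i++) {
  doubledImperative.push(numbers[i] * 2);
}

// Declarative: state what the result is and let map() do the walking.
var doubledDeclarative = numbers.map(function (n) { return n * 2; });

console.log(doubledImperative);  // [ 2, 4, 6 ]
console.log(doubledDeclarative); // [ 2, 4, 6 ]
```

ReactJS applies the same shift to user interfaces: you declare what the UI should look like for a given state, rather than scripting each DOM mutation.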
ReactJS is not in any sense the only library or framework that is intended to provide a more declarative JavaScript, but this is one of its selling points, along with other specifics that it offers to help teams work together and be productive. And again, ReactJS has emerged from some of Facebook's efforts in managing bugs and cognitive load while enabling developers to contribute a lot to a large-scale project.

The war on heisenbugs

In modern physics, Heisenberg's uncertainty principle loosely says that there is an absolute theoretical limit to how well a particle's position and velocity can be known. Regardless of how good a laboratory's measuring equipment gets, funny things will always happen when you try to pin things down too far. Heisenbugs, loosely speaking, are subtle, slippery bugs that can be very hard to pin down. They only manifest under very specific conditions and may even fail to manifest when one attempts to investigate them (note that this definition is slightly different from the jargon file's narrower and more specific definition at http://www.catb.org/jargon/html/H/heisenbug.html, which specifies that attempting to measure a heisenbug may suppress its manifestation). This motive, of declaring war on heisenbugs, stems from Facebook's own woes and experiences in working at scale and seeing heisenbugs keep popping up. One thing that Pete Hunt mentioned, in not a flattering light at all, was a point where Facebook's advertisement system was understood well enough to be modified comfortably by only two engineers. This is an example of something to avoid. By contrast, Pete Hunt's remark that he would "rather be predictable than right" is a statement that, if a defectively designed lamp can catch fire and burn, he would much, much rather have it catch fire and burn immediately, the same way, every single time, than have it burn only at just the wrong phase of the moon.
In the first case, the lamp will fail while the manufacturer is testing it; the problem will be noticed and addressed, and lamps will not be shipped out to the public until the defect has been properly addressed. In the opposite, heisenbug case, the lamp will spark and catch fire only under just the wrong conditions, which means that the defect will not be caught until the lamps have shipped and started burning customers' homes down. "Predictable" means "fails the same way, every time, if it's going to fail at all." "Right" means "passes testing successfully, but we don't know whether they're safe to use (probably they aren't)." Now, he ultimately does, in fact, care about being right, but the choices that Facebook has made surrounding React stem from a realization that being predictable is a means to being right. It's not acceptable for a manufacturer to ship something that will always spark and catch fire when a consumer plugs it in. However, being predictable moves the problems front and center, rather than leaving them as the occasional result of subtle, hard-to-pin-down interactions that will have unacceptable consequences in some rare circumstances. The choices in Flux and ReactJS are designed to make failures obvious and bring them to the surface, rather than having them manifest only in the nooks and crannies of a software labyrinth. Facebook's war on the shared mutable state is illustrated in the experience that they had regarding a chat bug. The chat bug became an overarching concern for its users. One crucial moment of enlightenment for Facebook came when they announced a completely unrelated feature, and the first comment on this feature was a request to fix the chat; it got 898 likes. Also, they commented that this was one of the more polite requests. The problem was that the indicator for unread messages could have a phantom positive message count when there were no messages available.
Things came to a point where people seemed not to care about what improvements or new features Facebook was adding, but just wanted them to fix the phantom message count. And Facebook kept investigating and kept addressing edge cases, but the phantom message count kept on recurring. The solution, besides ReactJS, was found in the Flux pattern, or architecture, which is discussed in the next section. Where previously not many people had felt comfortable making changes, all of a sudden many more people did. These things simplified matters enough that new developers tended not to really need the ramp-up time and treatment that had previously been given. Furthermore, when there was a bug, the more experienced developers could guess with reasonable accuracy which part of the system was the culprit, and the newer developers, after working on a bug, tended to feel confident and have a general sense of how the system worked. The Flux Architecture One of the ways in which Facebook, in relation to ReactJS, has declared war on heisenbugs is by declaring war on the mutable state. Flux is an architecture and a pattern, rather than a specific technology, and it can be used (or not used) with ReactJS. It is somewhat like MVC, a loose competitor to that approach, but it is very different from a simple MVC variant and is designed to provide a pit of success through unidirectional data flow: from the action to the dispatcher, then to the store, and finally to the view (although some people have said that the two are so different that a direct comparison between Flux and MVC, in terms of trying to identify which part of Flux corresponds to which conceptual hook in MVC, is not really that helpful). Actions are like events; they are fed into a top funnel. Dispatchers go through the funnels and not only pass actions along but also make sure that no additional actions are dispatched until the previous one has completely settled out. 
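The action → dispatcher → store → view flow can be sketched in a few lines of plain JavaScript. This is an illustrative toy, not Facebook's actual Flux library; the Dispatcher constructor and createCounterStore function below are our own names for the sketch:

```javascript
// Toy Flux-style flow: action -> dispatcher -> store -> (view reads store).
function Dispatcher() {
  this.callbacks = [];
  this.dispatching = false;
}
Dispatcher.prototype.register = function (callback) {
  this.callbacks.push(callback);
};
Dispatcher.prototype.dispatch = function (action) {
  // Like Flux's dispatcher, refuse to start a new action before the
  // previous one has completely settled out.
  if (this.dispatching) {
    throw new Error('Cannot dispatch in the middle of a dispatch');
  }
  this.dispatching = true;
  try {
    this.callbacks.forEach(function (callback) { callback(action); });
  } finally {
    this.dispatching = false;
  }
};

// A store keeps private state and exposes a getter only; there is no
// setter for arbitrary code to call.
function createCounterStore(dispatcher) {
  var count = 0; // private; never exposed for direct mutation
  dispatcher.register(function (action) {
    if (action.type === 'INCREMENT') { count += 1; }
  });
  return { getCount: function () { return count; } };
}

var dispatcher = new Dispatcher();
var store = createCounterStore(dispatcher);
dispatcher.dispatch({ type: 'INCREMENT' });
// A view would now re-render by reading store.getCount()
```

The only way to change the counter is to dispatch an action through the single funnel, which is exactly the property that makes data flow predictable.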
Stores have similarities and differences to models. They are like models in that they keep track of state. They are unlike models in that they have only getters, not setters, which prevents any part of the program holding a reference to a store from changing its state directly. Stores can accept input, but in a very controlled way, and in general a store is not at the mercy of anything possessing a reference to it. A view is what displays the current output based on what is obtained from the stores. It is possible for events to be percolated as actions, but the dispatcher acts as a traffic cop and ensures that new actions are processed only after the stores have completely settled. This de-escalates the complexity considerably. Flux simplified interactions so that Facebook developers no longer had subtle edge cases and bugs that kept coming back; the chat bug was finally dead and has not come back. Summary We just took a whirlwind tour of some of the theory surrounding reactive programming with ReactJS. This includes declarative programming, one of the selling points of ReactJS, which offers something easier to work with in the end than imperative programming. The war on heisenbugs is an overriding concern surrounding the decisions made by Facebook, including ReactJS. This takes place through Facebook's declared war on the shared mutable state. The Flux architecture is used by Facebook with ReactJS to avoid some nasty classes of bugs. 
To learn more about Reactive Programming and the Flux Architecture, the following books published by Packt Publishing (https://www.packtpub.com/) are recommended: Reactive Programming with JavaScript (https://www.packtpub.com/application-development/reactive-programming-javascript) Clojure Reactive Programming (https://www.packtpub.com/web-development/clojure-reactive-programming) Resources for Article:   Further resources on this subject: The Observer Pattern [article] Concurrency in Practice [article] Introduction to Akka [article]
Vincy Davis
17 May 2019
6 min read

Bryan Cantrill on the changing ethical dilemmas in Software Engineering

Earlier this month at the Craft Conference in Budapest, Bryan Cantrill (Chief Technology Officer at Joyent) gave a talk on “Andreessen's Corollary: Ethical Dilemmas in Software Engineering”. In 2011, Marc Andreessen had penned an essay, ‘Why Software Is Eating The World’, in The Wall Street Journal. In the article, he argued that software is present in all fields and is poised to take over large swathes of the economy. He believed, way back in 2011, that “many of the prominent new Internet companies are building real, high-growth, high-margin, highly defensible businesses.” Eight years later, Bryan Cantrill believes this prophecy is clearly coming to fulfillment. According to the article ‘Software engineering code of ethics’, published in 1997 by the ACM (Association for Computing Machinery), a code is not a simple ethical program that generates ethical judgements. In some situations, these codes can conflict with each other, which requires a software engineer to use judgement in a way that remains ethically consistent. The article provides certain principles for software engineers to follow. According to Bryan, these principles are difficult to follow. Some of the principles expect software engineers to ensure that the product they are working on is useful and of acceptable quality to the public, the employer, the client, and the user; is completed on time and at reasonable cost; and is free of errors. The code's specifications should be well documented according to the user's requirements and have the client's approval. The code should follow appropriate methodology and good management. Software engineers should ensure realistic estimates of cost, scheduling, and outcome of any project on which they work or propose to work. The guiding context surrounding the code of ethics remains timeless, but as times have changed, these principles have become old-fashioned. 
With the immense use of software, and industries implementing these codes, it's difficult for software engineers to follow these old principles and be ethically sound. Bryan calls this era an ‘ethical grey area’ for software engineers. The software's contact with our broader world has brought with it novel ethical dilemmas for those who endeavor to build it. More than ever, software engineers are likely to find themselves in new frontiers with respect to society, the law, or their own moral compass. Often without any formal training or even acknowledgement of the ethical dimensions of their work, software engineers have to make ethical judgments. Ethical dilemmas in software development since Andreessen's prophecy 2012: Facebook started using emotional manipulation, performing experiments in the name of research or to generate revenue, in which posts were classified as positive or negative. 2013: ‘Zenefits’ is a Silicon Valley startup. In order to build their software, they had to be certified by the state of California, for which they had to sit through 52 hours of training, studying the materials through the web browser. A manager created a hack called ‘Macro’ that made it possible to complete the pre-licensing education requirement in less than 52 hours. This was passed on to almost 100 Zenefits employees to automate the process for them too. 2014: Uber illegally entered the Portland market with software called ‘Greyball’. This software was used by Uber to intentionally evade Portland Bureau of Transportation (PBOT) officers and deny their ride requests. 2015: Google started to mislabel photo captions. In one instance, Google mistakenly identified a dark-skinned individual as a ‘Gorilla’. It reacted promptly and removed the photo. This highlighted a real negative point of Artificial Intelligence (AI): AI relies on biased human classification, at times using repeated patterns. 
Google faced the problem of defending this mistake, as it had not intentionally misled its network with such wrong data. 2016: The first Tesla ‘Autopilot’ car was launched. It had traffic-avoiding cruise control and steering-assist features, but it was sold and marketed as an autopilot car. In an accident, the driver was killed, perhaps because he believed that the car would drive itself. This was a serious problem. Tesla was using two cameras to judge movements while driving. It should be understood that this Tesla car was just an enhancement to the driver and not a replacement. 2017: Facebook faced ire over the anti-Rohingya violence in Myanmar. Facebook messages were used to coordinate an effective genocide against the Rohingya, a mostly Muslim minority community, in which 75,000 people died. Facebook did not enable it or advocate it; it was merely a communication platform, used for a wrong purpose. But Facebook could have helped to reduce the gravity of the situation by acting promptly and not allowing such messages to be circulated. This shows that not everything should be automated, and human judgement cannot be replaced anytime soon. 2018: In the wake of the Pittsburgh shooting, the alleged shooter had used the Gab platform to post against Jews. Gab, which bills itself as "the free speech social network," is small compared to mainstream social media platforms, but it has an avid user base. Joyent provided infrastructure to Gab, but quickly removed Gab from its platform after the horrific incident. 2019: After the 737 MAX crashes (JT610 and ET302), reports emerged that the aircraft's MCAS system played a role in the crashes. The crashes happened because a faulty sensor erroneously reported that the airplane was stalling. The false report triggered an automated system known as the Maneuvering Characteristics Augmentation System (MCAS), which activates without the pilot's input. The crew confirmed that the manual trim operation was not working. 
These are some examples of ethical dilemmas post-Andreessen's prophecy. As seen, all the incidents were the result of ethical decisions gone wrong. It is clear that ‘what is right for software is not necessarily right for society.’ How to deal with these ethical dilemmas? In the summer of 2018, the ACM came up with a new code of ethics: Contribute to society and human well-being Avoid harm Be honest and trustworthy It has also included an Integrity Project, which will have case studies and an “Ask an Ethicist” feature. These efforts by the ACM will help software engineers facing ethical dilemmas. They will also pave the way for great discussions resulting in behavior consistent with the code of ethics. Organisations should encourage such discussions, which will help like-minded people perpetuate a culture of consideration of ethical consequences. As software's footprint continues to grow, the ethical dilemmas of software engineers will only expand. These ethical dilemmas are Andreessen's corollary, and software engineers must address them collectively and directly. Software engineers agree with this evolving nature of ethical dilemmas: https://twitter.com/MA_Hanin/status/1129082836512911360 Watch the talk by Bryan Cantrill at Craft Conference. All coding and no sleep makes Jack/Jill a dull developer, research confirms Red Badger Tech Director Viktor Charypar talks monorepos, lifelong learning, and the challenges facing open source software Google AI engineers introduce Translatotron, an end-to-end speech-to-speech translation model
Packt
04 Mar 2015
20 min read

AngularJS Performance

In this article by Chandermani, the author of AngularJS by Example, we focus our discussion on the performance aspect of AngularJS. For most scenarios, we can all agree that AngularJS is insanely fast. For standard size views, we rarely see any performance bottlenecks. But many views start small and then grow over time. And sometimes the requirement dictates we build large pages/views with a sizable amount of HTML and data. In such a case, there are things that we need to keep in mind to provide an optimal user experience. Take any framework and the performance discussion on the framework always requires one to understand the internal working of the framework. When it comes to Angular, we need to understand how Angular detects model changes. What are watches? What is a digest cycle? What roles do scope objects play? Without a conceptual understanding of these subjects, any performance guidance is merely a checklist that we follow without understanding the why part. Let's look at some pointers before we begin our discussion on performance of AngularJS: The live binding between the view elements and model data is set up using watches. When a model changes, one or many watches linked to the model are triggered. Angular's view binding infrastructure uses these watches to synchronize the view with the updated model value. Model change detection only happens when a digest cycle is triggered. Angular does not track model changes in real time; instead, on every digest cycle, it runs through every watch to compare the previous and new values of the model to detect changes. A digest cycle is triggered when $scope.$apply is invoked. 
A number of directives and services internally invoke $scope.$apply: Directives such as ng-click, ng-mouse* do it on user action Services such as $http and $resource do it when a response is received from server $timeout or $interval call $scope.$apply when they lapse A digest cycle tracks the old value of the watched expression and compares it with the new value to detect if the model has changed. Simply put, the digest cycle is a workflow used to detect model changes. A digest cycle runs multiple times till the model data is stable and no watch is triggered. Once you have a clear understanding of the digest cycle, watches, and scopes, we can look at some performance guidelines that can help us manage views as they start to grow. (For more resources related to this topic, see here.) Performance guidelines When building any Angular app, any performance optimization boils down to: Minimizing the number of binding expressions and hence watches Making sure that binding expression evaluation is quick Optimizing the number of digest cycles that take place The next few sections provide some useful pointers in this direction. Remember, a lot of these optimization may only be necessary if the view is large. Keeping the page/view small The sanest advice is to keep the amount of content available on a page small. The user cannot interact/process too much data on the page, so remember that screen real estate is at a premium and only keep necessary details on a page. The lesser the content, the lesser the number of binding expressions; hence, fewer watches and less processing are required during the digest cycle. Remember, each watch adds to the overall execution time of the digest cycle. The time required for a single watch can be insignificant but, after combining hundreds and maybe thousands of them, they start to matter. Angular's data binding infrastructure is insanely fast and relies on a rudimentary dirty check that compares the old and the new values. 
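To make the dirty-check idea concrete, here is a toy sketch in plain JavaScript. The Scope constructor, $$watchers array, and digest loop below are our illustrative simplification, not Angular's actual source:

```javascript
// Toy sketch of Angular-style dirty checking (illustrative only).
function Scope() {
  this.$$watchers = [];
}
Scope.prototype.$watch = function (watchFn, listener) {
  // Each watch remembers the last value it saw, to compare against.
  this.$$watchers.push({ watchFn: watchFn, listener: listener, last: undefined });
};
Scope.prototype.$digest = function () {
  var self = this;
  var dirty;
  do { // keep looping until a full pass triggers no watch (model is stable)
    dirty = false;
    self.$$watchers.forEach(function (w) {
      var newValue = w.watchFn(self);
      if (newValue !== w.last) { // rudimentary old-vs-new comparison
        w.listener(newValue, w.last);
        w.last = newValue;
        dirty = true;
      }
    });
  } while (dirty);
};

var scope = new Scope();
scope.name = 'Angular';
var calls = 0;
scope.$watch(
  function (s) { return s.name; }, // watched expression
  function () { calls++; }         // listener runs only when the value changes
);
scope.$digest(); // change detected ('Angular' vs undefined): listener fires
scope.$digest(); // value stable: listener does not fire again
```

Note that every digest re-evaluates every watch function, which is why the cost of each watched expression matters so much in the guidelines that follow.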
Check out the stack overflow (SO) post (http://stackoverflow.com/questions/9682092/databinding-in-angularjs), where Misko Hevery (creator of Angular) talks about how data binding works in Angular. Data binding also adds to the memory footprint of the application. Each watch has to track the current and previous value of a data-binding expression to compare and verify if data has changed. Keeping a page/view small may not always be possible, and the view may grow. In such a case, we need to make sure that the number of bindings does not grow exponentially (linear growth is OK) with the page size. The next two tips can help minimize the number of bindings in the page and should be seriously considered for large views. Optimizing watches for read-once data In any Angular view, there is always content that, once bound, does not change. Any read-only data on the view can fall into this category. This implies that once the data is bound to the view, we no longer need watches to track model changes, as we don't expect the model to update. Is it possible to remove the watch after one-time binding? Angular itself does not have something inbuilt, but a community project bindonce (https://github.com/Pasvaz/bindonce) is there to fill this gap. Angular 1.3 has added support for bind and forget in the native framework. Using the syntax {{::title}}, we can achieve one-time binding. If you are on Angular 1.3, use it! Hiding (ng-show) versus conditional rendering (ng-if/ng-switch) content You have learned two ways to conditionally render content in Angular. The ng-show/ng-hide directive shows/hides the DOM element based on the expression provided and ng-if/ng-switch creates and destroys the DOM based on an expression. For some scenarios, ng-if can be really beneficial as it can reduce the number of binding expressions/watches for the DOM content not rendered. 
Consider the following example:

<div ng-if='user.isAdmin'>
  <div ng-include="'admin-panel.html'"></div>
</div>

The snippet renders an admin panel if the user is an admin. With ng-if, if the user is not an admin, the ng-include directive template is neither requested nor rendered, saving us all the bindings and watches that are part of the admin-panel.html view. From the preceding discussion, it may seem that we should get rid of all ng-show/ng-hide directives and use ng-if. Well, not really! It again depends; for small pages, ng-show/ng-hide works just fine. Also, remember that there is a cost to creating and destroying the DOM. If the expression to show/hide flips too often, this will mean too many DOM create-and-destroy cycles, which are detrimental to the overall performance of the app. Expressions being watched should not be slow Since watches are evaluated so often, the expression being watched should return results fast. The first way we can make sure of this is by using properties instead of functions in binding expressions. These expressions are as follows:

{{user.name}}
ng-show='user.Authorized'

The preceding code is always better than this:

{{getUserName()}}
ng-show='isUserAuthorized(user)'

Try to minimize function expressions in bindings. If a function expression is required, make sure that the function returns a result quickly. Make sure a function being watched does not: Make any remote calls Use $timeout/$interval Perform sorting/filtering Perform DOM manipulation (this can happen inside a directive implementation) Or perform any other time-consuming operation Be sure to avoid such operations inside a bound function. To reiterate, Angular will evaluate a watched expression multiple times during every digest cycle just to know if the return value (a model) has changed and the view needs to be synchronized. 
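The advice above (bind properties, precompute values) can be illustrated with a plain-JavaScript stand-in; the scope object and the property names here are hypothetical, not real Angular APIs:

```javascript
// Plain-JS sketch of moving work out of a watched expression.
// `scope` is a stand-in object, not a real Angular $scope.
var scope = {};
var user = { first: 'Ada', last: 'Lovelace' };

// Binding {{getFullName()}} would run this concatenation on every
// digest cycle, for every watch pass:
scope.getFullName = function () {
  return user.first + ' ' + user.last;
};

// Binding {{fullName}} instead watches a plain property; the work
// runs only when we choose to recompute it:
scope.fullName = user.first + ' ' + user.last;
// ...recompute scope.fullName only in the code paths that modify `user`
```

The trade-off is explicitness: we become responsible for refreshing the precomputed property, but the digest cycle is left with a cheap property read instead of a function call.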
Minimizing the deep model watch When using $scope.$watch to watch for model changes in controllers, be careful while setting the third $watch function parameter to true. The general syntax of watch looks like this: $watch(watchExpression, listener, [objectEquality]); In the standard scenario, Angular does an object comparison based on the reference only. But if objectEquality is true, Angular does a deep comparison between the last value and new value of the watched expression. This can have an adverse memory and performance impact if the object is large. Handling large datasets with ng-repeat The ng-repeat directive undoubtedly is the most useful directive Angular has. But it can cause the most performance-related headaches. The reason is not because of the directive design, but because it is the only directive that allows us to generate HTML on the fly. There is always the possibility of generating enormous HTML just by binding ng-repeat to a big model list. Some tips that can help us when working with ng-repeat are: Page data and use limitTo: Implement a server-side paging mechanism when a number of items returned are large. Also use the limitTo filter to limit the number of items rendered. Its syntax is as follows: <tr ng-repeat="user in users |limitTo:pageSize">…</tr> Look at modules such as ngInfiniteScroll (http://binarymuse.github.io/ngInfiniteScroll/) that provide an alternate mechanism to render large lists. Use the track by expression: The ng-repeat directive for performance tries to make sure it does not unnecessarily create or delete HTML nodes when items are added, updated, deleted, or moved in the list. To achieve this, it adds a $$hashKey property to every model item allowing it to associate the DOM node with the model item. We can override this behavior and provide our own item key using the track by expression such as: <tr ng-repeat="user in users track by user.id">…</tr> This allows us to use our own mechanism to identify an item. 
Using your own track by expression has a distinct advantage over the default hash key approach. Consider an example where you make an initial AJAX call to get users:

$scope.getUsers().then(function (users) {
  $scope.users = users;
});

Later, refresh the data from the server and call something similar again:

$scope.users = users;

With user.id as a key, Angular is able to determine which elements were added/deleted and moved; it can also determine created/deleted DOM nodes for such elements. The remaining elements are not touched by ng-repeat (internal bindings are still evaluated). This saves a lot of CPU cycles for the browser as fewer DOM elements are created and destroyed. Do not bind ng-repeat to a function expression: Using a function's return value for ng-repeat can also be problematic, depending upon how the function is implemented. Consider a repeat with this:

<tr ng-repeat="user in getUsers()">…</tr>

And consider the controller getUsers function with this:

$scope.getUsers = function () {
  var orderBy = $filter('orderBy');
  return orderBy($scope.users, predicate);
};

Angular is going to evaluate this expression, and hence call this function, every time the digest cycle takes place. A lot of CPU cycles are wasted sorting user data again and again. It is better to use scope properties and presort the data before binding. Minimize filters in views, use filter elements in the controller: Filters defined on ng-repeat are also evaluated every time the digest cycle takes place. For large lists, if the same filtering can be implemented in the controller, we can avoid constant filter evaluation. This holds true for any filter function that is used with arrays, including filter and orderBy. Avoiding mouse-movement tracking events The ng-mousemove, ng-mouseenter, ng-mouseleave, and ng-mouseover directives can just kill performance. 
If an expression is attached to any of these event directives, Angular triggers a digest cycle every time the corresponding event occurs, and for events like mouse move, this can be a lot. We have already seen this behavior when working with 7 Minute Workout, when we tried to show a pause overlay on the exercise image when the mouse hovers over it. Avoid them at all costs. If we just want to trigger some style changes on mouse events, CSS is a better tool. Avoiding calling $scope.$apply Angular is smart enough to call $scope.$apply at appropriate times without us explicitly calling it. This can be confirmed from the fact that the only place we have seen and used $scope.$apply is within directives. The ng-click and updateOnBlur directives use $scope.$apply to transition from a DOM event handler execution to an Angular execution context. Even when wrapping a jQuery plugin, we may need to do a similar transition for an event raised by the plugin. Other than this, there is no reason to use $scope.$apply. Remember, every invocation of $apply results in the execution of a complete digest cycle. The $timeout and $interval services take a Boolean argument, invokeApply. If set to false, the lapsed $timeout/$interval does not call $scope.$apply or trigger a digest cycle. Therefore, if you are going to perform background operations that do not require $scope and the view to be updated, set the last argument to false. Always use Angular wrappers over standard JavaScript objects/functions such as $timeout and $interval to avoid manually calling $scope.$apply. These wrapper functions internally call $scope.$apply. Also, understand the difference between $scope.$apply and $scope.$digest. $scope.$apply triggers $rootScope.$digest, which evaluates all application watches, whereas $scope.$digest only performs dirty checks on the current scope and its children. 
If we are sure that the model changes are not going to affect anything other than the child scopes, we can use $scope.$digest instead of $scope.$apply. Lazy-loading, minification, and creating multiple SPAs I hope you are not assuming that the apps that we have built will continue to use the numerous small script files that we have created to separate modules and module artefacts (controllers, directives, filters, and services). Any modern build system has the capability to concatenate and minify these files and replace the original file reference with a unified and minified version. Therefore, like any JavaScript library, use minified script files for production. The problem with the Angular bootstrapping process is that it expects all Angular application scripts to be loaded before the application can bootstrap. We cannot load modules, controllers, or in fact, any of the other Angular constructs on demand. This means we need to provide every artefact required by our app, upfront. For small applications, this is not a problem as the content is concatenated and minified; also, the Angular application code itself is far more compact as compared to the traditional JavaScript of jQuery-based apps. But, as the size of the application starts to grow, it may start to hurt when we need to load everything upfront. There are at least two possible solutions to this problem; the first one is about breaking our application into multiple SPAs. Breaking applications into multiple SPAs This advice may seem counterintuitive as the whole point of SPAs is to get rid of full page loads. By creating multiple SPAs, we break the app into multiple small SPAs, each supporting parts of the overall app functionality. When we say app, it implies a combination of the main (such as index.html) page with ng-app and all the scripts/libraries and partial views that the app loads over time. 
For example, we can break the Personal Trainer application into a Workout Builder app and a Workout Runner app. Both have their own start up page and scripts. Common scripts such as the Angular framework scripts and any third-party libraries can be referenced in both the applications. On similar lines, common controllers, directives, services, and filters too can be referenced in both the apps. The way we have designed Personal Trainer makes it easy to achieve our objective. The segregation into what belongs where has already been done. The advantage of breaking an app into multiple SPAs is that only relevant scripts related to the app are loaded. For a small app, this may be an overkill but for large apps, it can improve the app performance. The challenge with this approach is to identify what parts of an application can be created as independent SPAs; it totally depends upon the usage pattern of the application. For example, assume an application has an admin module and an end consumer/user module. Creating two SPAs, one for admin and the other for the end customer, is a great way to keep user-specific features and admin-specific features separate. A standard user may never transition to the admin section/area, whereas an admin user can still work on both areas; but transitioning from the admin area to a user-specific area will require a full page refresh. If breaking the application into multiple SPAs is not possible, the other option is to perform the lazy loading of a module. Lazy-loading modules Lazy-loading modules or loading module on demand is a viable option for large Angular apps. But unfortunately, Angular itself does not have any in-built support for lazy-loading modules. Furthermore, the additional complexity of lazy loading may be unwarranted as Angular produces far less code as compared to other JavaScript framework implementations. Also once we gzip and minify the code, the amount of code that is transferred over the wire is minimal. 
If we still want to try our hands on lazy loading, there are two libraries that can help: ocLazyLoad (https://github.com/ocombe/ocLazyLoad): This is a library that uses script.js to load modules on the fly angularAMD (http://marcoslin.github.io/angularAMD): This is a library that uses require.js to lazy load modules With lazy loading in place, we can delay the loading of a controller, directive, filter, or service script, until the page that requires them is loaded. The overall concept of lazy loading seems to be great but I'm still not sold on this idea. Before we adopt a lazy-load solution, there are things that we need to evaluate: Loading multiple script files lazily: When scripts are concatenated and minified, we load the complete app at once. Contrast it to lazy loading where we do not concatenate but load them on demand. What we gain in terms of lazy-load module flexibility we lose in terms of performance. We now have to make a number of network requests to load individual files. Given these facts, the ideal approach is to combine lazy loading with concatenation and minification. In this approach, we identify those feature modules that can be concatenated and minified together and served on demand using lazy loading. For example, Personal Trainer scripts can be divided into three categories: The common app modules: This consists of any script that has common code used across the app and can be combined together and loaded upfront The Workout Runner module(s): Scripts that support workout execution can be concatenated and minified together but are loaded only when the Workout Runner pages are loaded. The Workout Builder module(s): On similar lines to the preceding categories, scripts that support workout building can be combined together and served only when the Workout Builder pages are loaded. As we can see, there is a decent amount of effort required to refactor the app in a manner that makes module segregation, concatenation, and lazy loading possible. 
The effect on unit and integration testing: We also need to evaluate the effect of lazy-loading modules in unit and integration testing. The way we test is also affected with lazy loading in place. This implies that, if lazy loading is added as an afterthought, the test setup may require tweaking to make sure existing tests still run. Given these facts, we should evaluate our options and check whether we really need lazy loading or we can manage by breaking a monolithic SPA into multiple smaller SPAs. Caching remote data wherever appropriate Caching data is the one of the oldest tricks to improve any webpage/application performance. Analyze your GET requests and determine what data can be cached. Once such data is identified, it can be cached from a number of locations. Data cached outside the app can be cached in: Servers: The server can cache repeated GET requests to resources that do not change very often. This whole process is transparent to the client and the implementation depends on the server stack used. Browsers: In this case, the browser caches the response. Browser caching depends upon the server sending HTTP cache headers such as ETag and cache-control to guide the browser about how long a particular resource can be cached. Browsers can honor these cache headers and cache data appropriately for future use. If server and browser caching is not available or if we also want to incorporate any amount of caching in the client app, we do have some choices: Cache data in memory: A simple Angular service can cache the HTTP response in the memory. Since Angular is SPA, the data is not lost unless the page refreshes. 
This is how a service function looks when it caches data:

var workouts;
service.getWorkouts = function () {
   if (workouts) return $q.resolve(workouts);
   return $http.get("/workouts").then(function (response) {
       workouts = response.data;
       return workouts;
   });
};

The implementation caches the list of workouts in the workouts variable for future use. The first request makes an HTTP call to retrieve the data, but subsequent requests just return the cached data as a resolved promise. The use of $q.resolve makes sure that the function always returns a promise. Angular $http cache: Angular's $http service comes with a configuration option, cache. When set to true, $http caches the response of the particular GET request in a local cache (again, an in-memory cache). Here is how we cache a GET request:

$http.get(url, { cache: true});

Angular maintains this cache for the lifetime of the app, and clearing it is not easy. We need to get hold of the cache dedicated to caching HTTP responses and clear the cache key manually. The caching strategy of an application is never complete without a cache invalidation strategy. With caching, there is always a possibility that caches go out of sync with respect to the actual data store. We cannot affect the server-side caching behavior from the client; consequently, let's focus on how to perform cache invalidation (clearing) for the two client-side caching mechanisms described earlier. If we use the first approach to cache data, we are responsible for clearing the cache ourselves. In the case of the second approach, the default $http service does not support clearing the cache. 
We either need to get hold of the underlying $http cache store and clear the cache key manually (as shown here) or implement our own cache that manages cache data and invalidates cache entries based on some criteria:

var cache = $cacheFactory.get('$http');
cache.remove("http://myserver/workouts"); //full url

Using Batarang to measure performance Batarang (a Chrome extension), as we have already seen, is an extremely handy tool for Angular applications. Using Batarang to visualize app usage is like looking at an X-ray of the app. It allows us to: View the scope data, scope hierarchy, and how the scopes are linked to HTML elements Evaluate the performance of the application Check the application dependency graph, helping us understand how components are linked to each other and to other framework components If we enable Batarang and then play around with our application, Batarang captures performance metrics for all watched expressions in the app. This data is nicely presented as a graph available on the Performance tab inside Batarang. That is pretty sweet! When building an app, use Batarang to gauge the most expensive watches and take corrective measures, if required. Play around with Batarang and see what other features it has. This is a very handy tool for Angular applications. This brings us to the end of the performance guidelines that we wanted to share in this article. Some of these guidelines are preventive measures that we should take to make sure we get optimal app performance, whereas others are there to help when the performance is not up to the mark. Summary In this article, we looked at the ever-so-important topic of performance, where you learned ways to optimize an Angular app's performance. Resources for Article: Further resources on this subject: Role of AngularJS [article] The First Step [article] Recursive directives [article]
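The memoized-promise caching pattern used by the Angular service earlier is framework-agnostic. Here is a minimal sketch of the same idea using plain JavaScript promises instead of Angular's $q and $http; createWorkoutCache, fetchWorkouts, and the invalidate() helper are illustrative names (not from the chapter), and invalidate() shows how the "clear the cache ourselves" responsibility of the first approach can be implemented:

```javascript
// Sketch of the in-memory caching pattern, framework-free (names are illustrative).
// fetchWorkouts is a stand-in for the real HTTP call that returns a promise.
function createWorkoutCache(fetchWorkouts) {
  var cached = null; // holds the last successful result

  return {
    getWorkouts: function () {
      if (cached) return Promise.resolve(cached); // cache hit: no network call
      return fetchWorkouts().then(function (data) {
        cached = data; // remember the response for subsequent calls
        return data;
      });
    },
    invalidate: function () {
      cached = null; // next getWorkouts() call hits the network again
    }
  };
}
```

Just like the $q.resolve version, getWorkouts always returns a promise, so callers do not need to know whether the data came from the cache or from the network; calling invalidate() is the cache-clearing step that the default $http cache makes awkward.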

Packt
09 Mar 2017
14 min read

Replication Solutions in PostgreSQL

In this article by Chitij Chauhan and Dinesh Kumar, the authors of the book PostgreSQL High Performance Cookbook, we will talk about various high availability and replication solutions, including a popular third-party replication tool, Slony. (For more resources related to this topic, see here.) Setting up hot streaming replication In this recipe we are going to set up master/slave streaming replication. Getting ready For this exercise you will need two Linux machines, each with the latest version of PostgreSQL 9.6 installed. We will be using the following IP addresses for the master and slave servers: Master IP address: 192.168.0.4 Slave IP address: 192.168.0.5 How to do it… The following is the sequence of steps for setting up master/slave streaming replication: Set up passwordless authentication between the master and slave for the postgres user. First we are going to create a user ID on the master which will be used by the slave server to connect to the PostgreSQL database on the master server:

psql -c "CREATE USER repuser REPLICATION LOGIN ENCRYPTED PASSWORD 'charlie';"

Next, we allow the replication user created in the previous step to access the master PostgreSQL server. This is done by making the necessary changes in the pg_hba.conf file:

vi pg_hba.conf
host replication repuser 192.168.0.5/32 md5

In the next step we are going to configure parameters in the postgresql.conf file. 
These parameters are required to be set in order to get streaming replication working:

vi /var/lib/pgsql/9.6/data/postgresql.conf
listen_addresses = '*'
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 8
archive_mode = on
archive_command = 'cp %p /var/lib/pgsql/archive/%f && scp %p [email protected]:/var/lib/pgsql/archive/%f'

Once the parameter changes have been made in the postgresql.conf file, the next step is to restart the PostgreSQL server on the master so that the changes made in the previous step take effect:

pg_ctl -D /var/lib/pgsql/9.6/data restart

Before the slave can replicate from the master, we need to give it the initial database to build from. For this purpose we make a base backup by copying the primary server's data directory to the standby:

psql -U postgres -h 192.168.0.4 -c "SELECT pg_start_backup('label', true)"
rsync -a /var/lib/pgsql/9.6/data/ 192.168.0.5:/var/lib/pgsql/9.6/data/ --exclude postmaster.pid
psql -U postgres -h 192.168.0.4 -c "SELECT pg_stop_backup()"

Once the data directory in the previous step is populated, the next step is to configure the following parameter in the postgresql.conf file on the slave server:

hot_standby = on

Next, copy recovery.conf.sample into the $PGDATA location on the slave server and configure the following parameters:

cp /usr/pgsql-9.6/share/recovery.conf.sample /var/lib/pgsql/9.6/data/recovery.conf
standby_mode = on
primary_conninfo = 'host=192.168.0.4 port=5432 user=repuser password=charlie'
trigger_file = '/tmp/trigger.replication'
restore_command = 'cp /var/lib/pgsql/archive/%f "%p"'

Next, start the slave server:

service postgresql-9.6 start

Now that the preceding replication steps are set up, we will test for replication. 
On the master server, log in and issue the following SQL commands:

psql -h 192.168.0.4 -d postgres -U postgres -W
postgres=# create database test;
postgres=# \c test
test=# create table testtable ( testint int, testchar varchar(40) );
CREATE TABLE
test=# insert into testtable values ( 1, 'What A Sight.' );
INSERT 0 1

On the slave server we will now check whether the database and table created in the previous step are replicated:

psql -h 192.168.0.5 -d test -U postgres -W
test=# select * from testtable;
 testint | testchar
---------+----------------
       1 | What A Sight.
(1 row)

The wal_keep_segments parameter determines how many WAL files should be retained in the master's pg_xlog in case of network delays. However, if you do not want to assume a value for this, you can create a replication slot, which makes sure the master does not remove WAL files from pg_xlog until they have been received by standbys. For more information refer to: https://www.postgresql.org/docs/9.6/static/warm-standby.html#STREAMING-REPLICATION-SLOTS. How it works… The following is an explanation of the steps performed in the preceding section: In the initial step of the preceding section we create a user called repuser, which will be used by the slave server to make a connection to the primary server. In step 2 of the preceding section we make the necessary changes in the pg_hba.conf file to allow the master server to be accessed by the slave server using the user ID repuser that was created in the previous step. We then make the necessary parameter changes on the master in step 4 of the preceding section for configuring streaming replication. A description of these parameters is given here: listen_addresses: This parameter is used to provide the IP addresses that you want PostgreSQL to listen on. A value of * indicates all available IP addresses. wal_level: This parameter determines the level of WAL logging done. Specify hot_standby for streaming replication. 
wal_keep_segments: This parameter specifies the number of 16 MB WAL files to retain in the pg_xlog directory. The rule of thumb is that more such files may be required to handle a large checkpoint. archive_mode: Setting this parameter enables completed WAL segments to be sent to archive storage. archive_command: This parameter is basically a shell command that is executed whenever a WAL segment is completed. In our case, we copy the file to the local machine and then use the secure copy command to send it across to the slave. max_wal_senders: This parameter specifies the total number of concurrent connections allowed from the slave servers. Once the necessary configuration changes have been made on the master server, we restart the PostgreSQL server on the master so that the new configuration changes take effect. This is done in step 5 of the preceding section. In step 6 of the preceding section, we build the slave by copying the primary's data directory to the slave. With the data directory now available on the slave, the next step is to configure it. We make the necessary replication-related parameter changes in the postgresql.conf file on the slave server. We set the following parameter on the slave: hot_standby: This parameter determines whether we can connect and run queries while the server is in archive recovery or standby mode. In the next step we configure the recovery.conf file. This needs to be set up so that the slave can start receiving logs from the master. The following parameters are configured in the recovery.conf file on the slave: standby_mode: This parameter, when enabled, causes PostgreSQL to work as a standby in a replication configuration. primary_conninfo: This parameter specifies the connection information used by the slave to connect to the master. 
For our scenario, the master server is set as 192.168.0.4 on port 5432, and we use the user ID repuser with password charlie to make a connection to the master. Remember that repuser was the user ID created in the initial step of the preceding section for exactly this purpose, that is, connecting to the master from the slave. trigger_file: When the slave is configured as a standby, it will continue to restore the XLOG records from the master. The trigger_file parameter specifies what is used to trigger the slave to switch over from its standby duties and take over as the master, or primary, server. At this stage the slave is fully configured; we then start the slave server and the replication process begins. In steps 10 and 11 of the preceding section we simply test our replication. We first create a database test, then log into the test database, create a table named testtable, and insert some records into it. Our purpose is to see whether these changes are replicated to the slave. To test this, we log into the slave on the test database and query the records from the testtable table, as seen in step 10 of the preceding section. The final result is that all the records changed/inserted on the primary are visible on the slave. This completes our streaming replication setup and configuration. Replication using Slony Here in this recipe we are going to set up replication using Slony, which is a widely used replication engine. It replicates a desired set of tables' data from one database to another. This replication approach is based on a few event triggers created on the source set of tables, which log the DML and DDL statements into Slony catalog tables. Using Slony, we can also set up cascading replication among multiple nodes. Getting ready The steps followed in this recipe are carried out on a CentOS version 6 machine. 
We would first need to install Slony. The following are the steps needed to install Slony: First go to the given web link and download the software: http://slony.info/downloads/2.2/source/. Once you have downloaded the software, the next step is to unzip the tarball and then go to the newly created directory:

tar xvfj slony1-2.2.3.tar.bz2
cd slony1-2.2.3

In the next step we are going to configure, compile, and build the software:

./configure --with-pgconfigdir=/usr/pgsql-9.6/bin/
make
make install

How to do it… The following is the sequence of steps required to replicate data between two tables using Slony replication: First start the PostgreSQL server if it is not already started:

pg_ctl -D $PGDATA start

In the next step we will create two databases, test1 and test2, which will be used as the source and target databases:

createdb test1
createdb test2

In the next step we will create the table t_test on the source database test1 and insert some records into it:

psql -d test1
test1=# create table t_test (id numeric primary key, name varchar);
test1=# insert into t_test values(1,'A'),(2,'B'),(3,'C');

We will now set up the target database by copying the table definitions from the source database test1:

pg_dump -s -p 5432 -h localhost test1 | psql -h localhost -p 5432 test2

We will now connect to the target database test2 and verify that there is no data in the tables of the test2 database:

psql -d test2
test2=# select * from t_test;

We will now set up a slonik script for the master/slave, that is, source/target setup:

vi init_master.slonik
#!/bin/slonik
cluster name = mycluster;
node 1 admin conninfo = 'dbname=test1 host=localhost port=5432 user=postgres password=postgres';
node 2 admin conninfo = 'dbname=test2 host=localhost port=5432 user=postgres password=postgres';
init cluster ( id=1);
create set (id=1, origin=1);
set add table(set id=1, origin=1, id=1, fully qualified name = 'public.t_test');
store node (id=2, event node = 1);
store path (server=1, client=2, conninfo='dbname=test1 host=localhost port=5432 user=postgres password=postgres');
store path (server=2, client=1, conninfo='dbname=test2 host=localhost port=5432 user=postgres password=postgres');
store listen (origin=1, provider = 1, receiver = 2);
store listen (origin=2, provider = 2, receiver = 1);

We will now create a slonik script for the subscription on the slave, that is, the target:

vi init_slave.slonik
#!/bin/slonik
cluster name = mycluster;
node 1 admin conninfo = 'dbname=test1 host=localhost port=5432 user=postgres password=postgres';
node 2 admin conninfo = 'dbname=test2 host=localhost port=5432 user=postgres password=postgres';
subscribe set ( id = 1, provider = 1, receiver = 2, forward = no);

We will now run the init_master.slonik script created in step 6 on the master:

cd /usr/pgsql-9.6/bin
slonik init_master.slonik

We will now run the init_slave.slonik script created in step 7 on the slave, that is, the target:

cd /usr/pgsql-9.6/bin
slonik init_slave.slonik

In the next step we will start the master slon daemon:

nohup slon mycluster "dbname=test1 host=localhost port=5432 user=postgres password=postgres" &

In the next step we will start the slave slon daemon:

nohup slon mycluster "dbname=test2 host=localhost port=5432 user=postgres password=postgres" &

In the next step we will connect to the master, that is, the source database test1, and insert some records into the t_test table:

psql -d test1
test1=# insert into t_test values (5,'E');

We will now test for replication by logging in to the slave, that is, the target
database test2, and see whether the records inserted into the t_test table in the previous step are visible:

psql -d test2
test2=# select * from t_test;
 id | name
----+------
  1 | A
  2 | B
  3 | C
  5 | E
(4 rows)

How it works… We will now discuss the steps followed in the preceding section: In step 1, we first start the PostgreSQL server if it is not already started. In step 2 we create two databases, namely test1 and test2, that will serve as our source (master) and target (slave) databases. In step 3 of the preceding section we log into the source database test1, create a table t_test, and insert some records into the table. In step 4 of the preceding section we set up the target database test2 by copying the table definitions present in the source database and loading them into the target database test2 using the pg_dump utility. In step 5 of the preceding section we log into the target database test2 and verify that there are no records present in the table t_test, because in step 4 we only extracted the table definitions from the test1 database into the test2 database. In step 6 we set up a slonik script for the master/slave replication setup. In the file init_master.slonik we first define the cluster name as mycluster. We then define the nodes in the cluster. Each node has a number associated with a connection string which contains database connection information. The node entry is defined for both the source and target databases. The store path commands are necessary so that each node knows how to communicate with the other. In step 7 we set up a slonik script for the subscription of the slave, that is, the target database test2. Once again the script contains information such as the cluster name and node entries, which are assigned a unique number along with connection string information. It also contains a subscriber set. In step 8 of the preceding section we run init_master.slonik on the master. Similarly, in step 9 we run init_slave.slonik on the slave. 
In step 10 of the preceding section we start the master slon daemon. In step 11 of the preceding section we start the slave slon daemon. Steps 12 and 13 of the preceding section are used to test the replication. For this purpose, in step 12 we first log into the source database test1 and insert some records into the t_test table. To check whether the newly inserted records have been replicated to the target database test2, we log into the test2 database in step 13; the result set obtained from the query confirms that the changed/inserted records on the t_test table in the test1 database are successfully replicated to the target database test2. You may refer to the given link for more information regarding Slony replication: http://slony.info/documentation/tutorial.html. Summary We have seen how to set up streaming replication, and then we looked at how to install and replicate using a popular third-party replication tool, Slony. Resources for Article: Further resources on this subject: Introducing PostgreSQL 9 [article] PostgreSQL Cookbook - High Availability and Replication [article] PostgreSQL in Action [article]
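The streaming replication recipe above mentions replication slots as an alternative to guessing a value for wal_keep_segments. The following is a hedged sketch of how that might look on PostgreSQL 9.6; the slot name standby_slot is illustrative, not from the recipe, and the queries assume the master/slave setup described earlier is in place:

```sql
-- On the master: create a physical replication slot (slot name is illustrative).
-- The master will then retain WAL in pg_xlog until this slot's consumer has received it.
SELECT * FROM pg_create_physical_replication_slot('standby_slot');

-- On the slave, reference the slot in recovery.conf alongside the
-- parameters configured in the recipe:
--   primary_slot_name = 'standby_slot'

-- On the master: monitor connected standbys and their WAL positions
-- (in 9.6 these columns are *_location; from version 10 they are *_lsn).
SELECT client_addr, state, sent_location, replay_location
FROM pg_stat_replication;
```

Comparing sent_location with replay_location for the standby's row gives a quick view of replication lag; if the slave row disappears from pg_stat_replication, the standby connection has dropped and, with a slot in place, WAL will accumulate on the master until it reconnects.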