
How-To Tutorials - Programming

Containerizing a Web Application with Docker Part 1

Darwin Corn
10 Jun 2016
4 min read
Congratulations, you've written a web application! Now what? Part one of this post deals with steps to take after development, more specifically the creation of a Docker image that contains the application. In part two, I'll lay out deploying that image to the Google Cloud Platform, as well as some further reading that'll help you descend into the rabbit hole that is DevOps.

For demonstration purposes, let's say that you're me and you want to share your adventures in TrapRap and Death Metal (not simultaneously, thankfully!) with the world. I've written a simple Ember frontend for this purpose, and through the course of this post, I will explain how I go about containerizing it. Of course, the beauty of this procedure is that it will work with any frontend application, and you are certainly welcome to Bring Your Own Code. Everything I use is publicly available on GitHub, however, and you're certainly welcome to work through this post with the material presented as well.

So, I've got this web app. You can get it here, or you can run:

```
$ git clone https://github.com/ndarwincorn/docker-demo.git
```

Do this wherever you keep your source code. You'll need ember-cli and some familiarity with Ember to customize the app yourself, or you can just cut to the chase and build the Docker image, which is what I'm going to do in this post. I'm using Docker 1.10, but there's no reason this wouldn't work on a Mac running Docker Toolbox (or even Boot2Docker, but don't quote me on that) or a less bleeding-edge Linux distro. Since installing Docker is well documented, I won't get into that here and will continue with the assumption that you have a working, up-to-date Docker installed on your machine, and that the Docker daemon is running.

If you're working with your own app, feel free to skip below to my explanation of the process and then come back here when you've got a Dockerfile in the root of your application. In the root of the application, run the following (make sure you don't have any locally installed web servers already listening on port 80):

```
# docker build -t docker-demo .
# docker run -d -p 80:80 --name demo docker-demo
```

Once the command finishes by printing a container ID, launch a web browser and navigate to http://localhost. Hey! Now you can listen to my music served from an LXC container running on your very own computer.

How did we accomplish this? Let's take it piece by piece (here's where to start reading again if you've approached this article with your own app). I created a simple Dockerfile using the official Nginx image, because I have a deep-seated mistrust of Canonical and didn't want to build on an Ubuntu base image. Here's what it looks like in my project:

docker-demo/Dockerfile

```
FROM nginx
COPY dist /usr/share/nginx/html
```

Running the docker build command reads the Dockerfile and uses it to configure a Docker image based on the nginx image. During image configuration, it copies the contents of the dist folder in my project to /usr/share/nginx/html in the container, the document root that the image's default nginx configuration points to. The -t flag tells Docker to 'tag' (name) the image we've just created as docker-demo. The docker run command takes that image and builds a container from it. The -d flag is short for 'detach': run the /usr/bin/nginx command built into the image from our Dockerfile and leave the container running. The -p flag maps a port on the host to a port in the container, and --name names the container for later reference.
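A few housekeeping commands that the walkthrough doesn't cover but that are handy while experimenting (standard Docker CLI, nothing specific to this project):

```
# docker ps          # confirm the demo container is running
# docker logs demo   # view nginx output from inside the container
# docker stop demo   # stop the container
# docker rm demo     # remove it once you're done
```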
The command should return a container ID that can be used to manipulate the container later. In part two, I'll show you how to push the image we created to the Google Cloud Platform and then launch it as a container in a specially-purposed VM on their Compute Engine.

About the Author

Darwin Corn is a Systems Analyst for the Consumer Direct Care Network. He is a mid-level professional with diverse experience in the Information Technology world.

Learning JavaScript Data Structures: Arrays

Packt
07 Jun 2016
18 min read
In this article by Loiane Groner, author of the book Learning JavaScript Data Structures and Algorithms, Second Edition, we will learn about arrays. An array is the simplest memory data structure; for this reason, all programming languages have a built-in array datatype. JavaScript supports arrays natively, even though its first version was released without array support. In this article, we will dive into the array data structure and its capabilities.

An array stores values sequentially that are all of the same datatype. Although JavaScript allows us to create arrays with values from different datatypes, we will follow best practices and assume that we cannot do this (most languages do not have this capability).

Why should we use arrays?

Let's consider that we need to store the average temperature of each month of the year for the city we live in. We could use something similar to the following to store this information:

```javascript
var averageTempJan = 31.9;
var averageTempFeb = 35.3;
var averageTempMar = 42.4;
var averageTempApr = 52;
var averageTempMay = 60.8;
```

However, this is not the best approach. If we store the temperature for only one year, we could manage 12 variables. However, what if we need to store the average temperature for more than one year? Fortunately, this is why arrays were created, and we can easily represent the same information as follows:

```javascript
averageTemp[0] = 31.9;
averageTemp[1] = 35.3;
averageTemp[2] = 42.4;
averageTemp[3] = 52;
averageTemp[4] = 60.8;
```

Creating and initializing arrays

Declaring, creating, and initializing an array in JavaScript is as simple as the following:

```javascript
var daysOfWeek = new Array();  //{1}
var daysOfWeek = new Array(7); //{2}
var daysOfWeek = new Array('Sunday', 'Monday', 'Tuesday', 'Wednesday',
    'Thursday', 'Friday', 'Saturday'); //{3}
```

We can simply declare and instantiate a new array using the keyword new (line {1}). Also, using the keyword new, we can create a new array specifying its length (line {2}). A third option is passing the array elements directly to the constructor (line {3}). However, using the new keyword is not best practice. If you want to create an array in JavaScript, you can simply assign empty brackets ([]), as in the following example:

```javascript
var daysOfWeek = [];
```

We can also initialize the array with some elements, as follows:

```javascript
var daysOfWeek = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
    'Thursday', 'Friday', 'Saturday'];
```

If we want to know how many elements are in the array (its size), we can use the length property. The following code gives an output of 7:

```javascript
console.log(daysOfWeek.length);
```

Accessing elements and iterating an array

To access a particular position in the array, we also use brackets, passing the index of the position we would like to access. For example, let's say we want to output all the elements from the daysOfWeek array. To do so, we need to loop through the array and print the elements, as follows:

```javascript
for (var i = 0; i < daysOfWeek.length; i++) {
    console.log(daysOfWeek[i]);
}
```
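Before moving on, one pitfall from the array-creation options above is worth spelling out (a note of mine, easy to verify in a browser console): passing a single number to the Array constructor sets the length rather than creating an element, which is part of why the literal syntax is preferred:

```javascript
console.log(new Array(7).length); // 7: seven empty slots, no elements
console.log([7].length);          // 1: a one-element array
console.log([7][0]);              // 7
```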
Let's take a look at another example. Let's say that we want to find out the first 20 numbers of the Fibonacci sequence. The first two numbers of the Fibonacci sequence are 1 and 1, and each subsequent number is the sum of the previous two numbers:

```javascript
var fibonacci = []; //{1}
fibonacci[1] = 1;   //{2}
fibonacci[2] = 1;   //{3}

for (var i = 3; i <= 20; i++) {
    fibonacci[i] = fibonacci[i - 1] + fibonacci[i - 2]; //{4}
}

for (var i = 1; i < fibonacci.length; i++) { //{5}
    console.log(fibonacci[i]);               //{6}
}
```

So, in line {1}, we declared and created an array. In lines {2} and {3}, we assigned the first two numbers of the Fibonacci sequence to the second and third positions of the array (in JavaScript, the first position of the array is always referenced by 0, and as there is no 0 in the Fibonacci sequence, we will skip it). Then, all we have to do is create the third through twentieth numbers of the sequence (as we know the first two numbers already). To do so, we can use a loop and assign the sum of the previous two positions of the array to the current position (line {4}, starting from index 3 of the array up to the 20th index). Then, to take a look at the output (line {6}), we just need to loop the array from its first position to its length (line {5}). We can use console.log to output each index of the array (lines {5} and {6}), or we can also use console.log(fibonacci) to output the array itself; most browsers have a nice array representation in console.log. If you would like to generate more than 20 numbers of the Fibonacci sequence, just change the number 20 to whatever number you like.

Adding elements

Adding and removing elements from an array is not that difficult; however, it can be tricky. For the examples we will use in this section, let's consider that we have the following numbers array initialized with the numbers from 0 to 9:

```javascript
var numbers = [0,1,2,3,4,5,6,7,8,9];
```

If we want to add a new element to this array (for example, the number 10), all we have to do is reference the latest free position of the array and assign a value to it:

```javascript
numbers[numbers.length] = 10;
```

In JavaScript, an array is a mutable object. We can easily add new elements to it, and the object grows dynamically as we do. In many other languages, such as C and Java, we need to determine the size of the array up front, and if we need to add more elements, we need to create a completely new array; we cannot simply add new elements as we need them.

Using the push method

There is also a method called push that allows us to add new elements to the end of the array. We can add as many elements as we want as arguments to the push method:

```javascript
numbers.push(11);
numbers.push(12, 13);
```

The numbers array will now contain the numbers from 0 to 13.

Inserting an element in the first position

Now, let's say we need to add a new element to the array and would like to insert it in the first position, not the last one. To do so, first, we need to free the first position by shifting all the elements to the right. We can loop through all the elements of the array, starting from the last position + 1 (length) and shifting the previous element to the new position, to finally assign the new value we want to the first position (-1).
Run the following code for this:

```javascript
for (var i = numbers.length; i >= 0; i--) {
    numbers[i] = numbers[i - 1];
}
numbers[0] = -1;
```

Using the unshift method

The JavaScript Array class also has a method called unshift, which inserts the values passed as arguments at the start of the array:

```javascript
numbers.unshift(-2);
numbers.unshift(-4, -3);
```

So, using the unshift method, we can add the value -2 and then -3 and -4 to the beginning of the numbers array. The array will now contain the numbers from -4 to 13.

Removing elements

So far, you have learned how to add values at the end and at the beginning of an array. Let's take a look at how we can remove a value from an array. To remove a value from the end of an array, we can use the pop method:

```javascript
numbers.pop();
```

The push and pop methods allow an array to emulate a basic stack data structure. Our array now contains the numbers from -4 to 12, and its length is 17.

Removing an element from the first position

To remove a value from the beginning of the array, we can use the following code:

```javascript
for (var i = 0; i < numbers.length; i++) {
    numbers[i] = numbers[i + 1];
}
```

We shifted all the elements one position to the left. However, the length of the array is still the same (17), meaning we still have an extra element in our array (with an undefined value). The last time the code inside the loop was executed, i + 1 was a reference to a position that does not exist. In some languages, such as Java, C/C++, or C#, the code would throw an exception, and we would have to end our loop at numbers.length - 1. As you can see, we have only overwritten the array's original values; we did not really remove the value (the length of the array is still the same, and we have this extra undefined element).

Using the shift method

To actually remove an element from the beginning of the array, we can use the shift method, as follows:

```javascript
numbers.shift();
```

So, if we consider that our array had the values -4 to 12 and a length of 17, after we execute the previous code, the array will contain the values -3 to 12 and have a length of 16. The shift and unshift methods allow an array to emulate a basic queue data structure.

Adding and removing elements from a specific position

So far, you have learned how to add elements at the end and at the beginning of an array, and how to remove elements from the beginning and end of an array. What if we also want to add or remove elements at any particular position of our array? We can use the splice method to remove an element from an array by simply specifying the position/index that we would like to delete from and how many elements we would like to remove, as follows:

```javascript
numbers.splice(5, 3);
```

This code removes three elements, starting from index 5 of our array. This means that numbers[5], numbers[6], and numbers[7] are removed from the numbers array. The content of our array is now -3, -2, -1, 0, 1, 5, 6, 7, 8, 9, 10, 11, and 12 (the numbers 2, 3, and 4 have been removed).

As with any JavaScript object, we can also use the delete operator to remove an element from the array, for example, delete numbers[0]. However, position 0 of the array will then hold the value undefined, meaning that it would be the same as doing numbers[0] = undefined. For this reason, we should always use the splice, pop, or shift methods to remove elements.
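To make that difference concrete, here is a small sketch of mine (using a throwaway array, so the numbers array above is left alone):

```javascript
var letters = ['a', 'b', 'c'];

delete letters[0];
console.log(letters.length); // 3: the slot remains, holding undefined
console.log(letters[0]);     // undefined

letters = ['a', 'b', 'c'];
letters.splice(0, 1);
console.log(letters.length); // 2: the element is actually removed
console.log(letters[0]);     // 'b'
```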
Now, let's say we want to insert the numbers 2 to 4 back into the array, starting from position 5. We can again use the splice method to do this:

```javascript
numbers.splice(5, 0, 2, 3, 4);
```

The first argument of the method is the index at which we want to remove or insert elements. The second argument is the number of elements we want to remove (in this case, we do not want to remove any, so we pass the value 0). And the third argument onwards are the values we would like to insert into the array (the elements 2, 3, and 4). The array holds the values from -3 to 12 again.

Finally, let's execute the following code:

```javascript
numbers.splice(5, 3, 2, 3, 4);
```

The output will be the values from -3 to 12 again. This is because we are removing three elements starting from index 5, and we are also adding the elements 2, 3, and 4 starting at index 5.

Two-dimensional and multidimensional arrays

At the beginning of this article, we used the temperature measurement example. We will now use it one more time. Let's consider that we need to measure the temperature hourly for a few days. Now that we know we can use an array to store the temperatures, we can easily write the following code to store the temperatures over two days:

```javascript
var averageTempDay1 = [72, 75, 79, 79, 81, 81];
var averageTempDay2 = [81, 79, 75, 75, 73, 72];
```

However, this is not the best approach; we can write better code! We can use a matrix (a two-dimensional array) to store this information, in which each row represents a day and each column represents an hourly measurement of temperature, as follows:

```javascript
var averageTemp = [];
averageTemp[0] = [72, 75, 79, 79, 81, 81];
averageTemp[1] = [81, 79, 75, 75, 73, 72];
```

JavaScript only supports one-dimensional arrays; it does not support matrices. However, we can implement a matrix or any multidimensional array using an array of arrays, as in the previous code. The same code can also be written as follows:

```javascript
// day 1
averageTemp[0] = [];
averageTemp[0][0] = 72;
averageTemp[0][1] = 75;
averageTemp[0][2] = 79;
averageTemp[0][3] = 79;
averageTemp[0][4] = 81;
averageTemp[0][5] = 81;

// day 2
averageTemp[1] = [];
averageTemp[1][0] = 81;
averageTemp[1][1] = 79;
averageTemp[1][2] = 75;
averageTemp[1][3] = 75;
averageTemp[1][4] = 73;
averageTemp[1][5] = 72;
```

In the previous code, we specified the value of each day and hour separately. Each row represents a day, and each column represents an hour of the day (a temperature reading).

Iterating the elements of two-dimensional arrays

If we want to take a look at the output of the matrix, we can create a generic function to log it:

```javascript
function printMatrix(myMatrix) {
    for (var i = 0; i < myMatrix.length; i++) {
        for (var j = 0; j < myMatrix[i].length; j++) {
            console.log(myMatrix[i][j]);
        }
    }
}
```

We need to loop through all the rows and columns. To do this, we use a nested for loop in which the variable i represents the rows and j represents the columns. We can call the following code to take a look at the output of the averageTemp matrix:

```javascript
printMatrix(averageTemp);
```

Multidimensional arrays

We can also work with multidimensional arrays in JavaScript. For example, let's create a 3 x 3 x 3 matrix.
Each cell contains the sum i (row) + j (column) + z (depth) of the matrix, as follows:

```javascript
var matrix3x3x3 = [];
for (var i = 0; i < 3; i++) {
    matrix3x3x3[i] = [];
    for (var j = 0; j < 3; j++) {
        matrix3x3x3[i][j] = [];
        for (var z = 0; z < 3; z++) {
            matrix3x3x3[i][j][z] = i + j + z;
        }
    }
}
```

It does not matter how many dimensions the data structure has; we need to loop through each dimension to access a cell. To output the content of this matrix, we can use the following code:

```javascript
for (var i = 0; i < matrix3x3x3.length; i++) {
    for (var j = 0; j < matrix3x3x3[i].length; j++) {
        for (var z = 0; z < matrix3x3x3[i][j].length; z++) {
            console.log(matrix3x3x3[i][j][z]);
        }
    }
}
```

If we had a 3 x 3 x 3 x 3 matrix, we would have four nested for statements in our code, and so on.

References for JavaScript array methods

Arrays in JavaScript are full-fledged objects, meaning that every array we create has a number of methods available on it. JavaScript arrays are very interesting because they are very powerful and have more capabilities built in than primitive arrays in other languages. This means that we do not need to write basic behavior ourselves, such as adding and removing elements in/from the middle of the data structure. The following is a list of the core methods available on an array object; we have covered some of them already:

| Method | Description |
| --- | --- |
| concat | Joins multiple arrays and returns a copy of the joined arrays |
| every | Iterates over every element of the array, verifying a desired condition (function), until false is returned |
| filter | Creates a new array with every element for which the provided function returns true |
| forEach | Executes a specific function on each element of the array |
| join | Joins all the array elements into a string |
| indexOf | Searches the array for a specific element and returns its position |
| lastIndexOf | Returns the position of the last item in the array that matches the search criterion |
| map | Creates a new array containing the result of applying the provided function to each element |
| reverse | Reverses the array so that the last items become the first, and vice versa |
| slice | Returns a new array containing the elements from the specified start index up to (but not including) the end index |
| some | Iterates over every element of the array, verifying a desired condition (function), until true is returned |
| sort | Sorts the array alphabetically or by the supplied comparison function |
| toString | Returns the array as a string |
| valueOf | Similar to the toString method; returns the array as a string |

We have already covered the push, pop, shift, unshift, and splice methods. Let's take a look at the new ones.

Joining multiple arrays

Consider a scenario where you have different arrays and you need to join all of them into a single array. We could iterate over each array and add each element to the final array. Fortunately, JavaScript already has a method that can do this for us, named the concat method, which looks as follows:

```javascript
var zero = 0;
var positiveNumbers = [1, 2, 3];
var negativeNumbers = [-3, -2, -1];
var numbers = negativeNumbers.concat(zero, positiveNumbers);
```

We can pass as many arrays and objects/elements to this method as we desire. The arrays are concatenated to the specified array in the order in which the arguments are passed to the method. In this example, zero is concatenated to negativeNumbers, and then positiveNumbers is concatenated to the resulting array. The numbers array will contain the values -3, -2, -1, 0, 1, 2, and 3.
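One behavior worth calling out explicitly (my note; easy to confirm in a console) is that concat never modifies the arrays it is given; it always builds a new one:

```javascript
console.log(negativeNumbers);             // still [-3, -2, -1]
console.log(positiveNumbers);             // still [1, 2, 3]
console.log(numbers === negativeNumbers); // false: a brand new array
```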
Iterator functions

Sometimes, we need to iterate over the elements of an array. You have learned that we can use a loop construct to do this, such as the for statement, as we saw in some previous examples. JavaScript also has some built-in iterator methods that we can use with arrays. For the examples in this section, we will need an array and also a function. We will use an array with the values from 1 to 15 and a function that returns true if a number is a multiple of 2 (even) and false otherwise:

```javascript
var isEven = function (x) {
    // returns true if x is a multiple of 2
    console.log(x);
    return (x % 2 == 0) ? true : false;
};
var numbers = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15];
```

Note that return (x % 2 == 0) ? true : false can also be written simply as return (x % 2 == 0).

Iterating using the every method

The first method we will take a look at is the every method. The every method iterates over each element of the array until the function returns false:

```javascript
numbers.every(isEven);
```

In this case, the first element of the numbers array is the number 1. As 1 is not a multiple of 2 (it is an odd number), the isEven function returns false, and this is the only time the function is executed.

Iterating using the some method

Next, we have the some method. It has the same behavior as the every method; however, the some method iterates over each element of the array until the function returns true:

```javascript
numbers.some(isEven);
```

In our case, the first even number in the numbers array is 2 (the second element). The first element to be iterated is the number 1, for which the function returns false. Then, the second element to be iterated is the number 2, which returns true, and the iteration stops.

Iterating using forEach

If we need the array to be completely iterated no matter what, we can use the forEach function. It has the same result as using a for loop with the function's code inside it:

```javascript
numbers.forEach(function (x) {
    console.log((x % 2 == 0));
});
```

Using map and filter

JavaScript also has two other iterator methods that return a new array as the result. The first one is the map method:

```javascript
var myMap = numbers.map(isEven);
```

The myMap array will hold the values [false, true, false, true, false, true, false, true, false, true, false, true, false, true, false]. It stores the result of the isEven function for each element passed to the map method. This way, we can easily know whether a number is even or not. For example, myMap[0] is false because 1 is not even, and myMap[1] is true because 2 is even.

We also have the filter method. It returns a new array containing the elements for which the function returned true:

```javascript
var evenNumbers = numbers.filter(isEven);
```

In our case, the evenNumbers array contains the elements that are multiples of 2: [2, 4, 6, 8, 10, 12, 14].

Using the reduce method

Finally, we have the reduce method. The reduce method receives a function with the following parameters: previousValue, currentValue, index, and array. We can use this function to return a value that is added to an accumulator, which is returned after the reduce method finishes executing. It can be very useful if we want to sum up all the values in an array. Here's an example:

```javascript
numbers.reduce(function (previous, current, index) {
    return previous + current;
});
```

The output will be 120.
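A detail not covered in the excerpt above, but part of the standard reduce API: a second argument sets the initial accumulator value, which also makes the call safe on empty arrays:

```javascript
var total = numbers.reduce(function (previous, current) {
    return previous + current;
}, 0);
console.log(total); // 120

console.log([].reduce(function (p, c) { return p + c; }, 0)); // 0
// without an initial value, [].reduce(...) throws a TypeError
```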
The names of these methods are largely self-explanatory: map transforms each value using the given function, filter keeps the values that pass the given test, and reduce boils the array down to a single value. These three methods (map, filter, and reduce) are the basis of functional programming in JavaScript.

Summary

In this article, we covered the most-used data structure: arrays. You learned how to declare, initialize, and assign values as well as add and remove elements. You also learned about two-dimensional and multidimensional arrays as well as the main methods of an array, which will be very useful when we start creating our own algorithms.

Asynchronous Control Flow Patterns with ES2015 and beyond

Packt
07 Jun 2016
6 min read
In this article by Luciano Mammino, the author of the book Node.js Design Patterns, Second Edition, we will explore async await, an innovative syntax that will be available in JavaScript as part of the release of ECMAScript 2017.

Async await using Babel

Callbacks, promises, and generators turn out to be the weapons at our disposal for dealing with asynchronous code in JavaScript and in Node.js. As we have seen, generators are very interesting because they offer a way to actually suspend the execution of a function and resume it at a later stage. We can adopt this feature to write asynchronous code that allows developers to write functions that "appear" to block at each asynchronous operation, waiting for the results before continuing with the following statement. The problem is that generator functions are designed to deal mostly with iterators, and their usage with asynchronous code feels a bit cumbersome. It might be hard to understand, leading to code that is hard to read and maintain.

But there is hope that there will be a cleaner syntax sometime in the near future. In fact, there is an interesting proposal that will be introduced with the ECMAScript 2017 specification that defines the async function syntax. You can read more about the current status of the async await proposal at https://tc39.github.io/ecmascript-asyncawait/.

The async function specification aims to dramatically improve the language-level model for writing asynchronous code by introducing two new keywords into the language: async and await. To clarify how these keywords are meant to be used and why they are useful, let's see a very quick example:

```javascript
const request = require('request');

function getPageHtml(url) {
    return new Promise(function (resolve, reject) {
        request(url, function (error, response, body) {
            resolve(body);
        });
    });
}

async function main() {
    const html = await getPageHtml('http://google.com');
    console.log(html);
}

main();
console.log('Loading...');
```

In this code, there are two functions: getPageHtml and main. The first one is a very simple function that fetches the HTML code of a remote web page given its URL. It's worth noticing that this function returns a promise.

The main function is the most interesting one because it's where the new async and await keywords are used. The first thing to notice is that the function is prefixed with the async keyword. This means that the function executes asynchronous code and is allowed to use the await keyword within its body. The await keyword before the call to getPageHtml tells the JavaScript interpreter to "await" the resolution of the promise returned by getPageHtml before continuing to the next instruction. This way, the main function is internally suspended until the asynchronous code completes, without blocking the normal execution of the rest of the program. In fact, we will see the string Loading... in the console and, after a moment, the HTML code of the Google landing page. Isn't this approach much more readable and easy to understand?

Unfortunately, this proposal is not yet final, and even if it is approved, we will need to wait for the next version of the ECMAScript specification to come out and be integrated into Node.js before we can use this new syntax natively. So what do we do today? Just wait? No, of course not! We can already leverage async await in our code thanks to transpilers such as Babel.
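Before moving on to the tooling, one detail the example above glosses over (my addition, consistent with how async functions are specified): inside an async function, a rejected promise surfaces as a regular exception, so ordinary try...catch works for error handling:

```javascript
async function main() {
    try {
        const html = await getPageHtml('http://google.com');
        console.log(html);
    } catch (error) {
        // a rejection of the awaited promise lands here
        console.error('Failed to fetch the page:', error);
    }
}
```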
Installing and running Babel

Babel is a JavaScript compiler (or transpiler) that is able to convert JavaScript code into other JavaScript code using syntax transformers. Syntax transformers allow the use of new syntax such as ES2015, ES2016, JSX, and others to produce backward-compatible equivalent code that can be executed in modern JavaScript runtimes, such as browsers or Node.js.

You can install Babel in your project using npm with the following command:

```
npm install --save-dev babel-cli
```

We also need to install the extensions to support async await parsing and transformation:

```
npm install --save-dev babel-plugin-syntax-async-functions babel-plugin-transform-async-to-generator
```

Now let's assume we want to run our previous example (called index.js). We need to launch the following command:

```
node_modules/.bin/babel-node --plugins "syntax-async-functions,transform-async-to-generator" index.js
```

This way, we are transforming the source code in index.js on the fly, applying the transformers to support async await. This new backward-compatible code is stored in memory and then executed on the fly on the Node.js runtime. Babel can also be configured to act as a build processor that stores the generated code into files so that you can easily deploy and run the generated code. You can read more about how to install and configure Babel on the official website at https://babeljs.io.

Comparison

At this point, we should have a better understanding of the options we have for taming the asynchronous nature of JavaScript. Each of the solutions presented has its own pros and cons. Let's summarize them in the following table:

| Solution | Pros | Cons |
| --- | --- | --- |
| Plain JavaScript | Does not require any additional libraries or technology; offers the best performance; provides the best level of compatibility with third-party libraries; allows the creation of ad hoc and more advanced algorithms | Might require extra code and relatively complex algorithms |
| Async (library) | Simplifies the most common control flow patterns; good performance | Is still a callback-based solution; introduces an external dependency; might still not be enough for advanced flows |
| Promises | Greatly simplify the most common control flow patterns; robust error handling; part of the ES2015 specification; guarantee deferred invocation of onFulfilled and onRejected | Require callback-based APIs to be promisified; introduce a small performance hit |
| Generators | Make non-blocking APIs look like blocking ones; simplify error handling; part of the ES2015 specification | Require a complementary control flow library; still require callbacks or promises to implement non-sequential flows; require non-generator-based APIs to be thunkified or promisified |
| Async await | Makes non-blocking APIs look like blocking ones; clean and intuitive syntax | Not yet available in JavaScript and Node.js natively; requires Babel or other transpilers and some configuration to be used today |

It is worth mentioning that we chose to present only the most popular solutions for handling asynchronous control flow, or the ones receiving a lot of momentum, but it's good to know that there are a few more options you might want to look at, for example, Fibers (https://npmjs.org/package/fibers) and Streamline (https://npmjs.org/package/streamline).

Summary

In this article, we analyzed how async await can be used today by means of Babel, and how to install and run Babel itself.

Practical Big Data Exploration with Spark and Python

Anant Asthana
06 Jun 2016
6 min read
The reader of this post should be familiar with basic concepts of Spark, such as the shell and RDDs.

Data sizes have increased, but our exploration tools and techniques have not evolved as fast. Traditional Hadoop MapReduce jobs are cumbersome and time consuming to develop, and Pig isn't quite as fully featured and easy to work with. Exploration can mean parsing/analyzing raw text documents, analyzing log files, processing tabular data in various formats, and exploring data that may or may not be correctly formatted.

This is where a tool like Spark excels. It provides an interactive shell for quick processing, prototyping, exploring, and slicing and dicing data. Spark works with R, Scala, and Python. In conjunction with Jupyter notebooks, we get a clean web interface for writing Python, R, or Scala code backed by a Spark cluster. A Jupyter notebook is also a great tool for presenting our findings, since we can do inline visualizations and easily share them as a PDF on GitHub or through a web viewer. The power of this setup is that we make Spark do the heavy lifting while still having the flexibility to test code on a small subset of data via the interactive notebooks.

Another powerful capability of Spark is its DataFrames API. After we have cleaned our data (dealt with badly formatted rows that can't be loaded correctly), we can load it as a DataFrame. Once the data is loaded as a DataFrame, we can use Spark SQL to explore it. Since notebooks can be shared, this is also a great way to let the developers do the work of cleaning the data and loading it as a DataFrame; analysts, data scientists, and the like can then use this data for their tasks. DataFrames can also be exported as Hive tables, which are commonly used in Hadoop-based warehouses.

Examples

For this section, we will be using examples that I have uploaded on GitHub. In addition to the examples, there is also a Docker container for running them. The container runs Spark in a pseudo-distributed mode and has Jupyter notebook configured to run Python/PySpark.

The basics

To set this up in your environment, you need a running Spark cluster with Jupyter notebook installed. Jupyter notebook, by default, only has the Python kernel configured; you can download additional kernels for Jupyter notebook to run R and Scala. To run Jupyter notebook with PySpark, use the following command on your cluster:

```
IPYTHON_OPTS="notebook --pylab inline --notebook-dir=<directory to store notebooks>" MASTER=local[6] ./bin/pyspark
```

When you start Jupyter notebook in the way we mentioned earlier, it initializes a few critical variables. One of them is the Spark Context (sc), which is used to interact with all Spark-related tasks. The other is sqlContext, which is the Spark SQL context. This is used to interact with Spark SQL (create DataFrames, run queries, and so on).

Log analysis

In this example, we use a log file from an Apache server; the code is in the GitHub repository mentioned earlier. We load our log file in question using:

```python
log_file = sc.textFile("../data/log_file.txt")
```

Spark can load files from HDFS, the local filesystem, and S3 natively. Libraries for other storage formats can be found freely on the Internet, or you could write your own format handlers (a blog post for another time). The previous command loads the log file.
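As a rough sketch of the parse-and-load step described in the next paragraph (the shlex split plus a Row constructor; the field indices below are illustrative assumptions, since they depend entirely on the exact log format):

```python
import shlex
from pyspark.sql import Row

# Split each line into fields; shlex respects quoted fields such as
# the request string in an Apache access log.
splits = log_file.map(lambda line: shlex.split(line))

def create_schema(fields):
    # Which index holds which field is an assumption here; inspect a
    # few sampled rows to confirm before relying on it.
    return Row(ip=fields[0], request=fields[5], status=fields[6])
```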
We then use Python's native shlex library to split each line into fields and use Spark's map command to load each record as a Row. An RDD consisting of Rows can easily be registered as a DataFrame.

How we arrived at this solution is where data exploration comes in. We use Spark's takeSample method to sample the file and get five rows:

```python
log_file.takeSample(True, 5)
```

These sample rows are helpful in determining how to parse and load the file. Once we have written our code to load the file, we can apply it to the dataset using map to create a new RDD consisting of Rows, and test that code on a subset of the data in a similar manner using the take or takeSample methods. The take method reads rows sequentially from the file, so although it is faster, it may not be a good representation of the dataset. The takeSample method, on the other hand, randomly picks sample rows from the file, which gives a better representation.

To create the new RDD and register it as a DataFrame, we use the following code:

```python
schema_DF = splits.map(create_schema).toDF()
```

Once we have created the DataFrame and tested it using take/takeSample to make sure that our loading code is working, we can register it as a table using the following:

```python
sqlCtx.registerDataFrameAsTable(schema_DF, 'logs')
```

Once it is registered as a table, we can run SQL queries on the log file:

```python
sample = sqlCtx.sql('SELECT * FROM logs LIMIT 10').collect()
```

Note that the collect() method collects the result into the driver's memory, so this may not be feasible for large datasets; use take/takeSample instead to sample data if your dataset is large. The beauty of using Spark with Jupyter is that all this exploration work takes only a few lines of code. It can be written interactively with all the trial and error we needed, the processed data can be easily shared, and running interactive queries on this data is easy. Last but not least, this can easily scale to massive (GB, TB) datasets.

k-means on the Iris dataset

In this example, we use data from the Iris dataset, which contains measurements of sepal and petal length and width. This is a popular open source dataset used to showcase classification algorithms. In this case, we use Spark's k-means algorithm from the MLlib library, Spark's machine learning library. The code and the output are in the GitHub repository mentioned earlier. We are not going to get into too much detail here, since some of the concepts are outside the scope of this blog post. The example showcases how we load the Iris dataset and create a DataFrame with it. We then train a k-means classifier on this dataset, and then we visualize our classification results. The power of this is that we did a somewhat complex task of parsing a dataset, creating a DataFrame, training a machine learning classifier, and visualizing the data in an interactive and scalable manner.

The repository contains several more examples. Feel free to reach out to me if you have any questions. If you would like to see more posts with practical examples, please let us know.

About the Author

Anant Asthana is a data scientist and principal architect at Pythian, and he can be found on GitHub at anantasty.

Understanding Patterns and Architectures in TypeScript

Packt
01 Jun 2016
19 min read
In this article by Vilic Vane, author of the book TypeScript Design Patterns, we'll study architecture and patterns that are closely related to the language or its common applications. Many topics in this article are related to asynchronous programming. We'll start from a web architecture for Node.js that's based on Promise. This is a larger topic that has interesting ideas involved, including abstractions of response and permission, as well as error-handling tips. Then, we'll talk about how to organize modules with ES module syntax. Due to the limited length of this article, some of the related code is aggressively simplified, and nothing more than the idea itself can be applied practically.

Promise-based web architecture

The most exciting thing about Promise may be the benefits brought to error handling. In a Promise-based architecture, throwing an error can be safe and pleasant. You don't have to explicitly handle errors when chaining asynchronous operations, and this makes it harder for mistakes to occur.

With the growing usage of ES2015-compatible runtimes, Promise is already there out of the box. We actually have plenty of polyfills for Promise (including my ThenFail, written in TypeScript), as the people who write JavaScript are roughly the same group of people who like to reinvent wheels. Promises work great with other Promises: a Promises/A+ compatible implementation should work with other Promises/A+ compatible implementations, and Promises do their best in a Promise-based architecture.

If you are new to Promise, you may complain about trying Promise with a callback-based project. You may intend to use helpers provided by Promise libraries, such as Promise.all, but it turns out that you have better alternatives, such as the async library. So the reason that makes you decide to switch should not be these helpers (as there are a lot of them for callbacks). It should be because there's an easier way to handle errors, or because you want to take advantage of the ES async and await features, which are based on Promise.

Promisifying existing modules or libraries

Though Promises do their best in a Promise-based architecture, it is still possible to begin using Promise in a smaller scope by promisifying existing modules or libraries. Taking Node.js style callbacks as an example, this is how we use them:

```typescript
import * as FS from 'fs';

FS.readFile('some-file.txt', 'utf-8', (error, text) => {
    if (error) {
        console.error(error);
        return;
    }

    console.log('Content:', text);
});
```

You may expect a promisified version of readFile to look like the following:

```typescript
FS
    .readFile('some-file.txt', 'utf-8')
    .then(text => {
        console.log('Content:', text);
    })
    .catch(reason => {
        console.error(reason);
    });
```

Implementing the promisified version of readFile can be as easy as the following:

```typescript
function readFile(path: string, options: any): Promise<string> {
    return new Promise((resolve, reject) => {
        FS.readFile(path, options, (error, result) => {
            if (error) {
                reject(error);
            } else {
                resolve(result);
            }
        });
    });
}
```

I am using any here for the options parameter to reduce the size of the demo code, but I would suggest that you do not use any whenever possible in practice. There are libraries that are able to promisify methods automatically. Unfortunately, you may need to write declaration files yourself for the promisified methods if no declaration file of the promisified version is available.
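As a sketch of what such automatic promisification boils down to (a minimal generic helper I'm adding for illustration; real libraries such as Bluebird's promisifyAll handle many more edge cases):

```typescript
function promisify<T>(fn: Function): (...args: any[]) => Promise<T> {
    return (...args: any[]) => {
        return new Promise<T>((resolve, reject) => {
            // Append a Node.js style (error-first) callback to the
            // caller's arguments and bridge it to the Promise.
            fn(...args, (error: any, result: T) => {
                if (error) {
                    reject(error);
                } else {
                    resolve(result);
                }
            });
        });
    };
}

// Usage sketch:
// const readFileAsync = promisify<string>(FS.readFile);
```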
Views and controllers in Express

Many of us may have already been working with frameworks such as Express. This is how we render a view or send back JSON data in Express:

```typescript
import * as Path from 'path';
import * as express from 'express';

let app = express();

app.set('engine', 'hbs');
app.set('views', Path.join(__dirname, '../views'));

app.get('/page', (req, res) => {
    res.render('page', {
        title: 'Hello, Express!',
        content: '...'
    });
});

app.get('/data', (req, res) => {
    res.json({
        version: '0.0.0',
        items: []
    });
});

app.listen(1337);
```

We will usually separate the controller from the routing, as follows:

```typescript
import { Request, Response } from 'express';

export function page(req: Request, res: Response): void {
    res.render('page', {
        title: 'Hello, Express!',
        content: '...'
    });
}
```

Thus, we may have a better idea of the existing routes, and we may manage controllers more easily. Furthermore, automated routing can be introduced so that we don't always need to update the routing manually:

```typescript
import * as glob from 'glob';

let controllersDir = Path.join(__dirname, 'controllers');

let controllerPaths = glob.sync('**/*.js', {
    cwd: controllersDir
});

for (let path of controllerPaths) {
    let controller = require(Path.join(controllersDir, path));
    let urlPath = path.replace(/\\/g, '/').replace(/\.js$/, '');

    for (let actionName of Object.keys(controller)) {
        app.get(
            `/${urlPath}/${actionName}`,
            controller[actionName]
        );
    }
}
```

The preceding implementation is certainly too simple to cover daily usage. However, it displays the rough idea of how automated routing could work: via conventions based on file structures. Now, if we are working with asynchronous code written with Promises, an action in the controller could look like the following:

```typescript
export function foo(req: Request, res: Response): void {
    Promise
        .all([
            Post.getContent(),
            Post.getComments()
        ])
        .then(([post, comments]) => {
            res.render('foo', {
                post,
                comments
            });
        });
}
```

We use destructuring of an array within a parameter. Promise.all returns a Promise of an array with elements corresponding to the values of the resolvables passed in. (A resolvable means a normal value or a Promise-like object that may resolve to a normal value.)

However, this is not enough; we need to handle errors properly, or in some cases the preceding code may fail in silence (which is terrible). In Express, when an error occurs, you should call next (the third argument passed into the callback) with the error object, as follows:

```typescript
import { Request, Response, NextFunction } from 'express';

export function foo(
    req: Request,
    res: Response,
    next: NextFunction
): void {
    Promise
        // ...
        .catch(reason => next(reason));
}
```

Now, we are fine with the correctness of this approach, but this is simply not how Promises work. Explicit error handling with callbacks can be eliminated in the scope of controllers, and the easiest way to do this is to return the Promise chain and hand it over to the code that was previously performing the routing logic.
So, the controller could be written like the following:

```typescript
export function foo(req: Request, res: Response) {
    return Promise
        .all([
            Post.getContent(),
            Post.getComments()
        ])
        .then(([post, comments]) => {
            res.render('foo', {
                post,
                comments
            });
        });
}
```

Or, can we make this even better?

Abstraction of response

We've already been returning a Promise to tell whether an error occurs. So, for a server error, the Promise actually indicates the result, or in other words, the response of the request. However, why are we still calling res.render() to render the view? The returned Promise object could be an abstraction of the response itself. Think about the following controller again:

```typescript
export class Response {}

export class PageResponse extends Response {
    constructor(view: string, data: any) {
        super();
    }
}

export function foo(req: Request) {
    return Promise
        .all([
            Post.getContent(),
            Post.getComments()
        ])
        .then(([post, comments]) => {
            return new PageResponse('foo', {
                post,
                comments
            });
        });
}
```

The response object that is returned could vary for different response outputs. For example, it could be a PageResponse as in the preceding example, a JSONResponse, a StreamResponse, or even a simple Redirection. As PageResponse or JSONResponse is applied in most cases, and the view of a PageResponse can usually be implied by the controller path and action name, it is useful to have these two responses automatically generated from a plain data object with a proper view to render:

```typescript
export function foo(req: Request) {
    return Promise
        .all([
            Post.getContent(),
            Post.getComments()
        ])
        .then(([post, comments]) => {
            return {
                post,
                comments
            };
        });
}
```

This is how a Promise-based controller should respond. With this idea in mind, let's update the routing code with an abstraction of responses. Previously, we were passing controller actions directly as Express request handlers. Now, we need to do some wrapping up of the actions by resolving the return value and applying operations based on the resolved result:

- If it fulfills and is an instance of Response, apply it to the res object passed in by Express.
- If it fulfills and is a plain object, construct a PageResponse, or a JSONResponse if no view is found, and apply it to the res object.
- If it rejects, call the next function with the rejection reason.

As seen previously, our code was like the following:

```typescript
app.get(`/${urlPath}/${actionName}`, controller[actionName]);
```

Now, it gets a few more lines:

```typescript
let action = controller[actionName];

app.get(`/${urlPath}/${actionName}`, (req, res, next) => {
    Promise
        .resolve(action(req))
        .then(result => {
            if (result instanceof Response) {
                result.applyTo(res);
            } else if (existsView(actionName)) {
                new PageResponse(actionName, result).applyTo(res);
            } else {
                new JSONResponse(result).applyTo(res);
            }
        })
        .catch(reason => next(reason));
});
```

However, so far we can only handle GET requests, as we hardcoded app.get() in our router implementation, and the poor view-matching logic can hardly be used in practice either.
We need to make these actions configurable, and ES decorators could perform a good job here:

```typescript
export default class Controller {
    @get({
        View: 'custom-view-path'
    })
    foo(req: Request) {
        return {
            title: 'Action foo',
            content: 'Content of action foo'
        };
    }
}
```

I'll leave the implementation to you, and feel free to make it awesome.

Abstraction of permission

Permission plays an important role in a project, especially in systems that have different user groups, for example, a forum. The abstraction of permission should be extendable to satisfy changing requirements, and it should be easy to use as well. Here, we are going to talk about the abstraction of permission at the level of controller actions.

Consider the eligibility to perform one or more actions a privilege. The permission of a user may consist of several privileges, and usually most users at the same level have the same set of privileges. So, we may have a larger concept, namely groups. The abstraction could either work based on both groups and privileges, or work based on privileges only (groups then become mere aliases for sets of privileges):

- An abstraction that validates based on privileges and groups at the same time is easier to build. You do not need to create a large list of which actions can be performed for a certain group of users, as granular privileges are only required when necessary.
- An abstraction that validates based on privileges has better control and more flexibility for describing the permission. For example, you can easily remove a small set of privileges from the permission of a user.

However, both approaches have similar upper-level abstractions; they differ mostly in implementation. The general structure of the permission abstraction we've talked about has the following participants:

- Privilege: describes a detailed privilege corresponding to specific actions
- Group: defines a set of privileges
- Permission: describes what a user is capable of doing; consists of the groups the user belongs to and the privileges the user has
- Permission descriptor: describes how the permission of a user works and consists of the possible groups and privileges

Expected errors

A great concern that is wiped away by using Promises is that we do not need to worry about whether throwing an error in a callback would crash the application most of the time. The error will flow through the Promise chain and, if not caught, will be handled by our router. Errors can be roughly divided into expected errors and unexpected errors. Expected errors are usually caused by incorrect input or foreseeable exceptions, while unexpected errors are usually caused by bugs or by other libraries that the project relies on.

For expected errors, we usually want to give users a friendly response with a readable error message and code, so that users can help themselves by searching for the error or reporting it to us with useful context. For unexpected errors, we would also want a reasonable response (usually a message describing an unknown error), a detailed server-side log (including the real error name, message, stack information, and so on), and even alerts to let the team know as soon as possible.
Defining and throwing expected errors

The router will need to handle different types of errors, and an easy way to achieve this is to subclass a universal ExpectedError class and throw its instances, as follows:

```typescript
import ExtendableError from 'extendable-error';

class ExpectedError extends ExtendableError {
    constructor(
        message: string,
        public code: number
    ) {
        super(message);
    }
}
```

The extendable-error package is one of mine that handles the stack trace and the message property; you can directly extend the Error class as well. Thus, when receiving an expected error, we can safely output the error name and message as part of the response. If the error is not an instance of ExpectedError, we can display a predefined unknown-error message instead.

Transforming errors

Some errors, such as errors caused by unstable networks or remote services, are expected. We may want to catch these errors and throw them out again as expected errors. However, it can be rather trivial to actually do this, so a centralized error-transforming process can be applied to reduce the effort required to manage these errors. The transforming process includes two parts: filtering (or matching) and transforming. These are the approaches to filtering errors:

- Filter by error class: Many third-party libraries throw errors of a certain class. Taking Sequelize (a popular Node.js ORM) as an example, it has DatabaseError, ConnectionError, ValidationError, and so on. By filtering errors by checking whether they are instances of a certain error class, we may easily pick up target errors from the pile.
- Filter by string or regular expression: Sometimes a library throws errors that are instances of the Error class itself instead of its subclasses, which makes these errors hard to distinguish from others. In this situation, we can filter these errors by matching their message against keywords or regular expressions.
- Filter by scope: It's possible that instances of the same error class with the same error message should result in different responses. One of the reasons may be that the operation throwing a certain error is at a lower level but is being used by upper structures within different scopes. Thus, a scope mark can be added for these errors to make them easier to filter.

There could be more ways to filter errors, and they are usually able to cooperate as well. By properly applying these filters and transforming errors, we can reduce noise, analyze what's going on within a system, and locate problems faster if they occur.

Modularizing the project

Before ES2015, there were actually a lot of module solutions for JavaScript that worked. The most famous two of them might be AMD and CommonJS. AMD is designed for asynchronous module loading, which is mostly applied in browsers, while CommonJS performs module loading synchronously, which is the way the Node.js module system works. To make it work asynchronously, writing an AMD module takes more characters. Due to the popularity of tools such as browserify and webpack, CommonJS became popular even for browser projects.

Proper granularity of internal modules can help a project keep a healthy structure. Consider a project structure like the following:

```
project
├─ controllers
├─ core
│  │  index.ts
│  │
│  ├─ product
│  │     index.ts
│  │     order.ts
│  │     shipping.ts
│  │
│  └─ user
│        index.ts
│        account.ts
│        statistics.ts
│
├─ helpers
├─ models
├─ utils
└─ views
```

Let's assume that we are writing a controller file that's going to import a module defined by the core/product/order.ts file.
Previously, using CommonJS style require, we would write the following:

```typescript
const Order = require('../core/product/order');
```

Now, with the new ES import syntax, this would be:

```typescript
import * as Order from '../core/product/order';
```

Wait, isn't this essentially the same? Sort of. However, you may have noticed several index.ts files that I've put into the folders. Now, in the core/product/index.ts file, we could have the following:

```typescript
import * as Order from './order';
import * as Shipping from './shipping';

export { Order, Shipping }
```

Or, we could also have the following:

```typescript
export * from './order';
export * from './shipping';
```

What's the difference? The idea behind these two approaches to re-exporting modules can vary. The first style works better when we treat Order and Shipping as namespaces, under which the identifier names may not be easy to distinguish from one another. With this style, the files are the natural boundaries for building these namespaces. The second style weakens the namespace property of the two files and uses them as tools to organize objects and classes under the same larger category. A good thing about using these files as namespaces is that multi-level re-exporting is fine, while weakening namespaces makes it harder to understand different identifier names as the number of re-exporting levels grows.

Summary

In this article, we discussed some interesting ideas and an architecture formed by these ideas. Most of these topics focused on limited examples and did their own jobs, but we also discussed ideas about putting a whole system together.
Wrappers

Packt
27 May 2016
13 min read
In this article by Erik Westra, author of the book Modular Programming with Python, we learn the concepts of wrappers. A wrapper is essentially a group of functions that call other functions to do the work. Wrappers are used to simplify an interface, to make a confusing or badly designed API easier to use, to convert data formats into something more convenient, and to implement cross-language compatibility. Wrappers are also sometimes used to add testing and error-checking code to an existing API.

Let's take a look at a real-world application of a wrapper module. Imagine that you work for a large bank and have been asked to write a program to analyze fund transfers to help identify possible fraud. Your program receives information, in real time, about every inter-bank funds transfer that takes place. For each transfer, you are given:

The amount of the transfer
The ID of the branch in which the transfer took place
The identification code for the bank the funds are being sent to

Your task is to analyze the transfers over time to identify unusual patterns of activity. To do this, you need to calculate, for each of the last eight days, the total value of all transfers for each branch and destination bank. You can then compare the current day's totals against the average for the previous seven days, and flag any daily totals that are more than 50% above the average.

You start by deciding how to represent the total transfers for a day. Because you need to keep track of this for each branch and destination bank, it makes sense to store these totals in a two-dimensional array. In Python, this type of two-dimensional array is represented as a list of lists:

totals = [[0, 307512, 1612, 0, 43902, 5602918],
          [79400, 3416710, 75, 23508, 60912, 5806],
          ...
         ]

You can then keep a separate list of the branch ID for each row and another list holding the destination bank code for each column:

branch_ids = [125000249, 125000252, 125000371, ...]
bank_codes = ["AMERUS33", "CERYUS33", "EQTYUS44", ...]

Using these lists, you can calculate the totals for a given day by processing the transfers that took place on that particular day:

totals = []
for branch in branch_ids:
    branch_totals = []
    for bank in bank_codes:
        branch_totals.append(0)
    totals.append(branch_totals)

for transfer in transfers_for_day:
    branch_index = branch_ids.index(transfer['branch'])
    bank_index = bank_codes.index(transfer['dest_bank'])
    totals[branch_index][bank_index] += transfer['amount']

So far so good. Once you have these totals for each day, you can then calculate the average and compare it against the current day's totals to identify the entries that are higher than 150% of the average.

Let's imagine that you've written this program and managed to get it working. When you start using it, though, you immediately discover a problem: your bank has over 5,000 branches, and there are more than 15,000 banks worldwide that your bank can transfer funds to—that's a total of 75 million combinations that you need to keep totals for, and as a result, your program is taking far too long to calculate the totals.

To make your program faster, you need to find a better way of handling large arrays of numbers. Fortunately, there's a library designed to do just this: NumPy. NumPy is an excellent array-handling library. You can create huge arrays and perform sophisticated operations on an array with a single function call. Unfortunately, NumPy is also a dense and impenetrable library. It was designed and written for people with a deep understanding of mathematics.
While there are many tutorials available and you can generally figure out how to use it, the code that uses NumPy is often hard to comprehend. For example, calculating the average across multiple matrices would involve the following:

daily_totals = []
for totals in totals_to_average:
    daily_totals.append(totals)
average = numpy.mean(numpy.array(daily_totals), axis=0)

Figuring out what that last line does would require a trip to the NumPy documentation. Because of the complexity of the code that uses NumPy, this is a perfect example of a situation where a wrapper module can be used: the wrapper module can provide an easier-to-use interface to NumPy, so your code can use it without being cluttered with complex and confusing function calls.

To work through this example, we'll start by installing the NumPy library. NumPy (http://www.numpy.org) runs on Mac OS X, Windows, and Linux machines. How you install it depends on which operating system you are using:

For Mac OS X, you can download an installer from http://www.kyngchaos.com/software/python.
For MS Windows, you can download a Python "wheel" file for NumPy from http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy. Choose the pre-built version of NumPy that matches your operating system and the desired version of Python. To use the wheel file, use the pip install command, for example, pip install numpy-1.10.4+mkl-cp34-none-win32.whl. For more information about installing Python wheels, refer to https://pip.pypa.io/en/latest/user_guide/#installing-from-wheels.
If your computer runs Linux, you can use your Linux package manager to install NumPy. Alternatively, you can download and build NumPy in source code form.

To ensure that NumPy is working, fire up your Python interpreter and enter the following:

import numpy
a = numpy.array([[1, 2], [3, 4]])
print(a)

All going well, you should see a 2 x 2 matrix displayed:

[[1 2]
 [3 4]]

Now that we have NumPy installed, let's start working on our wrapper module. Create a new Python source file, named numpy_wrapper.py, and enter the following into this file:

import numpy

That's all for now; we'll add functions to this wrapper module as we need them.

Next, create another Python source file, named detect_unusual_transfers.py, and enter the following into this file:

import random
import numpy_wrapper as npw

BANK_CODES = ["AMERUS33", "CERYUS33", "EQTYUS44",
              "LOYDUS33", "SYNEUS44", "WFBIUS6S"]

BRANCH_IDS = ["125000249", "125000252", "125000371",
              "125000402", "125000596", "125001067"]

As you can see, we are hardwiring the bank and branch codes for our example; in a real program, these values would be loaded from somewhere, such as a file or a database. Since we don't have any available data, we will use the random module to create some. We are also aliasing the numpy_wrapper module as npw to make it easier to access from our code.

Let's now create some funds transfer data to process, using the random module:

days = [1, 2, 3, 4, 5, 6, 7, 8]

transfers = []
for i in range(10000):
    day = random.choice(days)
    bank_code = random.choice(BANK_CODES)
    branch_id = random.choice(BRANCH_IDS)
    amount = random.randint(1000, 1000000)
    transfers.append((day, bank_code, branch_id, amount))

Here, we randomly select a day, a bank code, a branch ID, and an amount, storing these values in the transfers list. Our next task is to collate this information into a series of arrays. This allows us to calculate the total value of the transfers for each day, grouped by the branch ID and destination bank.
To do this, we'll create a NumPy array for each day, where the rows in each array represent branches and the columns represent destination banks. We'll then go through the list of transfers, processing them one by one: first, we select the array for the day on which the transfer occurred, and then we select the appropriate row and column based on the branch ID and the destination bank. Finally, we add the amount of the transfer to that item within the day's array.

Let's implement this logic. Our first task is to create a series of NumPy arrays, one for each day. Here, we immediately hit a snag: NumPy has many different options for creating arrays; in this case, we want to create an array that holds integer values and has its contents initialized to zero. If we used NumPy directly, our code would look like the following:

array = numpy.zeros((num_rows, num_cols), dtype=numpy.int32)

This is not exactly easy to understand, so we're going to move this logic into our NumPy wrapper module. Edit the numpy_wrapper.py file, and add the following to the end of this module:

def new(num_rows, num_cols):
    return numpy.zeros((num_rows, num_cols), dtype=numpy.int32)

Now, we can create a new array by calling our wrapper function (npw.new()) and not have to worry about the details of how NumPy works at all. We have simplified the interface to this particular aspect of NumPy.

Let's now use our wrapper function to create the eight arrays that we will need, one for each day. Add the following to the end of the detect_unusual_transfers.py file:

transfers_by_day = {}
for day in days:
    transfers_by_day[day] = npw.new(num_rows=len(BRANCH_IDS),
                                    num_cols=len(BANK_CODES))

Note that the rows correspond to branches and the columns to destination banks, matching the layout described above. Now that we have our NumPy arrays, we can use them as if they were nested Python lists. For example:

array[row][col] = array[row][col] + amount

We just need to choose the appropriate array, and calculate the row and column numbers to use. Here is the necessary code, which you should add to the end of your detect_unusual_transfers.py script:

for day,bank_code,branch_id,amount in transfers:
    array = transfers_by_day[day]
    row = BRANCH_IDS.index(branch_id)
    col = BANK_CODES.index(bank_code)
    array[row][col] = array[row][col] + amount

Now that we've collated the transfers into eight NumPy arrays, we want to use all this data to detect any unusual activity. For each combination of branch ID and destination bank code, we will need to do the following:

Calculate the average of the first seven days' activity.
Multiply the calculated average by 1.5.
If the activity on the eighth day is greater than the average multiplied by 1.5, then we consider this activity to be unusual.

Of course, we need to do this for every row and column in our arrays, which would be very slow; this is why we're using NumPy. So, we need to calculate the average for multiple arrays of numbers, then multiply the array of averages by 1.5, and finally, compare the values within the multiplied array against the array for the eighth day of data. Fortunately, these are all things that NumPy can do for us. We'll start by collecting together the seven arrays we need to average, as well as the array for the eighth day.
To do this, add the following to the end of your program:

latest_day = max(days)

transfers_to_average = []
for day in days:
    if day != latest_day:
        transfers_to_average.append(transfers_by_day[day])

current = transfers_by_day[latest_day]

To calculate the average of a list of arrays, NumPy requires us to use the following function call:

average = numpy.mean(numpy.array(arrays_to_average), axis=0)

Since this is confusing, we will move this function into our wrapper. Add the following code to the end of the numpy_wrapper.py module:

def average(arrays_to_average):
    return numpy.mean(numpy.array(arrays_to_average), axis=0)

This lets us calculate the average of the seven days' activity using a single call to our wrapper function. To do this, add the following to the end of your detect_unusual_transfers.py script:

average = npw.average(transfers_to_average)

As you can see, using the wrapper makes our code much easier to understand.

Our next task is to multiply the array of calculated averages by 1.5, and compare the result against the current day's totals. Fortunately, NumPy makes this easy:

unusual_transfers = current > average * 1.5

Because this code is so clear, there's no advantage in creating a wrapper function for it. The resulting array, unusual_transfers, will be the same size as our current and average arrays, where each entry in the array is either True or False.

We're almost done; our final task is to identify the array entries with a value of True, and tell the user about the unusual activity. While we could scan through every row and column to find the True entries, using NumPy is much faster. The following NumPy code will give us a list containing the row and column numbers for the True entries in the array:

indices = numpy.transpose(array.nonzero())

True to form, though, this code is hard to understand, so it's a perfect candidate for another wrapper function. Go back to your numpy_wrapper.py module, and add the following to the end of the file:

def get_indices(array):
    return numpy.transpose(array.nonzero())

This function returns a list (actually an array) of (row,col) values for all the True entries in the array. Back in our detect_unusual_transfers.py file, we can use this function to quickly identify the unusual activity:

for row,col in npw.get_indices(unusual_transfers):
    branch_id = BRANCH_IDS[row]
    bank_code = BANK_CODES[col]
    average_amt = int(average[row][col])
    current_amt = current[row][col]
    print("Branch {} transferred ${:,d}".format(branch_id, current_amt)
          + " to bank {}, average = ${:,d}".format(bank_code, average_amt))

As you can see, we use the BRANCH_IDS and BANK_CODES lists to convert from the row and column numbers back to the relevant branch ID and bank code. We also retrieve the average and current amounts for the suspicious activity. Finally, we print out this information to warn the user about the unusual activity.

If you run your program, you should see an output that looks something like this:

Branch 125000371 transferred $24,729,847 to bank WFBIUS6S, average = $14,954,617
Branch 125000402 transferred $26,818,710 to bank CERYUS33, average = $16,338,043
Branch 125001067 transferred $27,081,511 to bank EQTYUS44, average = $17,763,644

Because we are using random numbers for our financial data, the output will be random too. Try running the program a few times; you may not get any output at all if none of the randomly-generated values are suspicious.
Of course, we are not really interested in detecting suspicious financial activity—this example is just an excuse for working with NumPy. What is far more interesting is the wrapper module that we created, hiding the complexity of the NumPy interface so that the rest of our program can concentrate on the job to be done. If we were to continue developing our unusual activity detector, we would no doubt add more functionality to our numpy_wrapper.py module as we found more NumPy functions that we wanted to wrap.

Summary

This is just one example of a wrapper module. As we mentioned earlier, simplifying a complex and confusing API is just one use for a wrapper module; they can also be used to convert data from one format to another, add testing and error-checking code to an existing API, and call functions that are written in a different language. Note that, by definition, a wrapper is always thin—while there might be code in a wrapper (for example, to convert a parameter from an object into a dictionary), the wrapper function always ends up calling another function to do the actual work.
Exploring Scala Performance

Packt
19 May 2016
19 min read
In this article by Michael Diamant and Vincent Theron, authors of the book Scala High Performance Programming, we look at how Scala features are compiled to bytecode.

Value classes

The domain model of the order book application included two classes, Price and OrderId. We pointed out that we created domain classes for Price and OrderId to provide contextual meaning to the wrapped BigDecimal and Long. While providing us with readable code and compile-time safety, this practice also increases the number of instances that are created by our application. Allocating memory and generating class instances create more work for the garbage collector by increasing the frequency of collections and by potentially introducing additional long-lived objects. The garbage collector will have to work harder to collect them, and this process may severely impact our latency.

Luckily, as of Scala 2.10, the AnyVal abstract class is available for developers to define their own value classes to solve this problem. The AnyVal class is defined in the Scala doc (http://www.scala-lang.org/api/current/#scala.AnyVal) as "the root class of all value types, which describe values not implemented as objects in the underlying host system." The AnyVal class can be used to define a value class, which receives special treatment from the compiler. Value classes are optimized at compile time to avoid the allocation of an instance, and instead, they use the wrapped type.

Bytecode representation

As an example, to improve the performance of our order book, we can define Price and OrderId as value classes:

case class Price(value: BigDecimal) extends AnyVal
case class OrderId(value: Long) extends AnyVal

To illustrate the special treatment of value classes, we define a dummy function taking a Price value class and an OrderId value class as arguments:

def printInfo(p: Price, oId: OrderId): Unit =
  println(s"Price: ${p.value}, ID: ${oId.value}")

From this definition, the compiler produces the following method signature:

public void printInfo(scala.math.BigDecimal, long);

We see that the generated signature takes a BigDecimal object and a long, even though the Scala code allows us to take advantage of the types defined in our model. This means that we cannot use an instance of BigDecimal or Long when calling printInfo because the compiler will throw an error.

An interesting thing to notice is that the second parameter of printInfo is not compiled as Long (an object), but long (a primitive type, note the lowercase 'l'). Long and other types matching primitive types, such as Int, Float, or Short, are specially handled by the compiler to be represented by their primitive type at runtime.

Value classes can also define methods. Let's enrich our Price class, as follows:

case class Price(value: BigDecimal) extends AnyVal {
  def lowerThan(p: Price): Boolean = this.value < p.value
}

// Example usage
val p1 = Price(BigDecimal(1.23))
val p2 = Price(BigDecimal(2.03))
p1.lowerThan(p2) // returns true

Our new method allows us to compare two instances of Price. At compile time, a companion object is created for Price. This companion object defines a lowerThan method that takes two BigDecimal objects as parameters.
In reality, when we call lowerThan on an instance of Price, the code is transformed by the compiler from an instance method call to a static method call that is defined in the companion object:

public final boolean lowerThan$extension(scala.math.BigDecimal, scala.math.BigDecimal);
  Code:
    0: aload_1
    1: aload_2
    2: invokevirtual #56 // Method scala/math/BigDecimal.$less:(Lscala/math/BigDecimal;)Z
    5: ireturn

If we were to write the pseudo-code equivalent to the preceding Scala code, it would look something like the following:

val p1 = BigDecimal(1.23)
val p2 = BigDecimal(2.03)
Price.lowerThan(p1, p2) // returns true

Performance considerations

Value classes are a great addition to our developer toolbox. They help us reduce the count of instances and spare some work for the garbage collector, while allowing us to rely on meaningful types that reflect our business abstractions. However, extending AnyVal comes with a certain set of conditions that the class must fulfill. For example, a value class may only have one primary constructor that takes one public val as a single parameter. Furthermore, this parameter cannot be a value class. We saw that value classes can define methods via def, but neither val nor var is allowed inside a value class. Nested class or object definitions are also impossible. Another limitation prevents value classes from extending anything other than a universal trait, that is, a trait that extends Any, only has defs as members, and performs no initialization. If any of these conditions is not fulfilled, the compiler generates an error.

In addition to the preceding constraints, there are special cases in which a value class has to be instantiated by the JVM. Such cases include performing a pattern match or a runtime type test, or assigning a value class to an array. An example of the latter looks like the following snippet:

def newPriceArray(count: Int): Array[Price] = {
  val a = new Array[Price](count)
  for(i <- 0 until count){
    a(i) = Price(BigDecimal(Random.nextInt()))
  }
  a
}

The generated bytecode is as follows:

public highperfscala.anyval.ValueClasses$$anonfun$newPriceArray$1(highperfscala.anyval.ValueClasses$Price[]);
  Code:
    0: aload_0
    1: aload_1
    2: putfield #29 // Field a$1:[Lhighperfscala/anyval/ValueClasses$Price;
    5: aload_0
    6: invokespecial #80 // Method scala/runtime/AbstractFunction1$mcVI$sp."<init>":()V
    9: return

public void apply$mcVI$sp(int);
  Code:
    0: aload_0
    1: getfield #29 // Field a$1:[Lhighperfscala/anyval/ValueClasses$Price;
    4: iload_1
    5: new #31 // class highperfscala/anyval/ValueClasses$Price
    // omitted for brevity
    21: invokevirtual #55 // Method scala/math/BigDecimal$.apply:(I)Lscala/math/BigDecimal;
    24: invokespecial #59 // Method highperfscala/anyval/ValueClasses$Price."<init>":(Lscala/math/BigDecimal;)V
    27: aastore
    28: return

Notice how apply$mcVI$sp is invoked from newPriceArray, and this creates a new instance of ValueClasses$Price at instruction 5. As turning a single-field case class into a value class is as trivial as extending the AnyVal trait, we recommend that you always use AnyVal wherever possible. The overhead is quite low, and it generates high benefits in terms of garbage collection performance. To learn more about value classes, their limitations, and their use cases, you can find detailed descriptions at http://docs.scala-lang.org/overviews/core/value-classes.html.

Tagged types – an alternative to value classes

Value classes are an easy-to-use tool, and they can yield great improvements in terms of performance.
However, they come with a constraining set of conditions, which can make them impossible to use in certain cases. We will conclude this section with a glance at an interesting alternative: leveraging the tagged type feature that is implemented by the Scalaz library.

The Scalaz implementation of tagged types is inspired by another Scala library, named shapeless. The shapeless library provides tools to write type-safe, generic code with minimal boilerplate. While we will not explore shapeless, we encourage you to learn more about the project at https://github.com/milessabin/shapeless.

Tagged types are another way to enforce compile-time checking without incurring the cost of instance instantiation. They rely on the Tagged structural type and the @@ type alias that is defined in the Scalaz library, as follows:

type Tagged[U] = { type Tag = U }
type @@[T, U] = T with Tagged[U]

Let's rewrite part of our code to leverage tagged types with our Price object:

object TaggedTypes {

  sealed trait PriceTag
  type Price = BigDecimal @@ PriceTag

  object Price {
    def newPrice(p: BigDecimal): Price =
      Tag[BigDecimal, PriceTag](p)

    def lowerThan(a: Price, b: Price): Boolean =
      Tag.unwrap(a) < Tag.unwrap(b)
  }
}

Let's perform a short walkthrough of the code snippet. We define a PriceTag sealed trait that we will use to tag our instances. A Price type alias is then created and defined as a BigDecimal object tagged with PriceTag. The Price object defines useful functions, including the newPrice factory function that is used to tag a given BigDecimal object and return a Price object (that is, a tagged BigDecimal object). We also implement an equivalent to the lowerThan method. This function takes two Price objects (that is, two tagged BigDecimal objects), extracts the two wrapped BigDecimal objects, and compares them.

Using our new Price type, we rewrite the same newPriceArray function that we previously looked at (the code is omitted for brevity, but you can refer to it in the attached source code), and print the following generated bytecode:

public void apply$mcVI$sp(int);
  Code:
    0: aload_0
    1: getfield #29 // Field a$1:[Ljava/lang/Object;
    4: iload_1
    5: getstatic #35 // Field highperfscala/anyval/TaggedTypes$Price$.MODULE$:Lhighperfscala/anyval/TaggedTypes$Price$;
    8: getstatic #40 // Field scala/package$.MODULE$:Lscala/package$;
    11: invokevirtual #44 // Method scala/package$.BigDecimal:()Lscala/math/BigDecimal$;
    14: getstatic #49 // Field scala/util/Random$.MODULE$:Lscala/util/Random$;
    17: invokevirtual #53 // Method scala/util/Random$.nextInt:()I
    20: invokevirtual #58 // Method scala/math/BigDecimal$.apply:(I)Lscala/math/BigDecimal;
    23: invokevirtual #62 // Method highperfscala/anyval/TaggedTypes$Price$.newPrice:(Lscala/math/BigDecimal;)Ljava/lang/Object;
    26: aastore
    27: return

In this version, we no longer see an instantiation of Price, even though we are assigning Price values to an array. The tagged Price implementation involves a runtime cast, but we anticipate that the cost of this cast will be less than the instance allocations (and garbage collection) that were observed in the previous value class Price strategy.

Specialization

To understand the significance of specialization, it is important to first grasp the concept of object boxing. The JVM defines primitive types (boolean, byte, char, float, int, long, short, and double) that are stack allocated rather than heap allocated.
When a generic type is introduced, for example, scala.collection.immutable.List, the JVM references an object equivalent instead of a primitive type. In this example, an instantiated list of integers would contain heap-allocated objects rather than integer primitives. The process of converting a primitive to its object equivalent is called boxing, and the reverse process is called unboxing. Boxing is a relevant concern for performance-sensitive programming because boxing involves heap allocation. In performance-sensitive code that performs numerical computations, the cost of boxing and unboxing can create slowdowns of an order of magnitude or larger. Consider the following example to illustrate boxing overhead:

List.fill(10000)(2).map(_ * 2)

Creating the list via fill yields 10,000 heap allocations of the integer object. Performing the multiplication in map requires 10,000 unboxings to perform the multiplication and then 10,000 boxings to add the multiplication results into the new list. From this simple example, you can imagine how arithmetic in a critical section will be slowed down by boxing and unboxing operations.

As shown in Oracle's tutorial on boxing at https://docs.oracle.com/javase/tutorial/java/data/autoboxing.html, boxing in Java, and also in Scala, happens transparently. This means that without careful profiling or bytecode analysis, it is difficult to discern where you are paying the cost of object boxing. To ameliorate this problem, Scala provides a feature named specialization. Specialization refers to the compile-time process of generating duplicate versions of a generic trait or class that refer directly to a primitive type instead of the associated object wrapper. At runtime, the compiler-generated version of the generic class—or, as it is commonly referred to, the specialized version of the class—is instantiated. This process eliminates the runtime cost of boxing primitives, which means that you can define generic abstractions while retaining the performance of a handwritten, specialized implementation.

Bytecode representation

Let's look at a concrete example to better understand how the specialization process works. Consider a naive, generic representation of the number of shares purchased, as follows:

case class ShareCount[T](value: T)

For this example, let's assume that the intended usage is to swap between an integer or long representation of ShareCount. With this definition, instantiating a long-based ShareCount instance incurs the cost of boxing, as follows:

def newShareCount(l: Long): ShareCount[Long] = ShareCount(l)

This definition translates to the following bytecode:

public highperfscala.specialization.Specialization$ShareCount<java.lang.Object> newShareCount(long);
  Code:
    0: new #21 // class orderbook/Specialization$ShareCount
    3: dup
    4: lload_1
    5: invokestatic #27 // Method scala/runtime/BoxesRunTime.boxToLong:(J)Ljava/lang/Long;
    8: invokespecial #30 // Method orderbook/Specialization$ShareCount."<init>":(Ljava/lang/Object;)V
    11: areturn

In the preceding bytecode, it is clear at instruction 5 that the primitive long value is boxed before instantiating the ShareCount instance. By introducing the @specialized annotation, we are able to eliminate the boxing by having the compiler provide an implementation of ShareCount that works with primitive long values. It is possible to specify which types you wish to specialize by supplying a set of types.
As defined in the Specializable trait (http://www.scala-lang.org/api/current/index.html#scala.Specializable), you are able to specialize for all JVM primitives, as well as Unit and AnyRef. For our example, let's specialize ShareCount for integers and longs, as follows:

case class ShareCount[@specialized(Long, Int) T](value: T)

With this definition, the bytecode now becomes the following:

public highperfscala.specialization.Specialization$ShareCount<java.lang.Object> newShareCount(long);
  Code:
    0: new #21 // class highperfscala.specialization/Specialization$ShareCount$mcJ$sp
    3: dup
    4: lload_1
    5: invokespecial #24 // Method highperfscala.specialization/Specialization$ShareCount$mcJ$sp."<init>":(J)V
    8: areturn

The boxing disappears and is curiously replaced with a different class name, ShareCount$mcJ$sp. This is because we are invoking the compiler-generated version of ShareCount that is specialized for long values. By inspecting the output of javap, we see that the specialized class generated by the compiler is a subclass of ShareCount:

public class highperfscala.specialization.Specialization$ShareCount$mcI$sp
  extends highperfscala.specialization.Specialization$ShareCount<java.lang.Object>

Bear this specialization implementation detail in mind as we turn to the Performance considerations section. The use of inheritance forces tradeoffs to be made in more complex use cases.

Performance considerations

At first glance, specialization appears to be a simple panacea for JVM boxing. However, there are several caveats to consider when using specialization. A liberal use of specialization leads to significant increases in compile time and resulting code size. Consider specializing Function3, which accepts three arguments as input and produces one result. Specializing all four type parameters across all types (that is, Byte, Short, Int, Long, Char, Float, Double, Boolean, Unit, and AnyRef) yields 10^4, or 10,000, possible permutations. For this reason, the standard library applies specialization conservatively. In your own use cases, consider carefully which types you wish to specialize. If we specialize Function3 only for Int and Long, the number of generated classes shrinks to 2^4, or 16.

Specialization involving inheritance requires extra attention because it is trivial to lose specialization when extending a generic class. Consider the following example:

class ParentFoo[@specialized T](t: T)
class ChildFoo[T](t: T) extends ParentFoo[T](t)

def newChildFoo(i: Int): ChildFoo[Int] = new ChildFoo[Int](i)

In this scenario, you likely expect that ChildFoo is defined with a primitive integer. However, as ChildFoo does not mark its type with the @specialized annotation, zero specialized classes are created. Here is the bytecode to prove it:

public highperfscala.specialization.Inheritance$ChildFoo<java.lang.Object> newChildFoo(int);
  Code:
    0: new #16 // class highperfscala/specialization/Inheritance$ChildFoo
    3: dup
    4: iload_1
    5: invokestatic #22 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
    8: invokespecial #25 // Method highperfscala/specialization/Inheritance$ChildFoo."<init>":(Ljava/lang/Object;)V
    11: areturn

The next logical step is to add the @specialized annotation to the definition of ChildFoo. In doing so, we stumble across a scenario where the compiler warns about the use of specialization, as follows:

class ParentFoo must be a trait.
Specialized version of class ChildFoo will inherit generic
highperfscala.specialization.Inheritance.ParentFoo[Boolean]
class ChildFoo[@specialized T](t: T) extends ParentFoo[T](t)

The compiler indicates that you have created a diamond inheritance problem, where the specialized versions of ChildFoo extend both ChildFoo and the associated specialized version of ParentFoo. This issue can be resolved by modeling the problem with a trait, as follows:

trait ParentBar[@specialized T] {
  def t(): T
}

class ChildBar[@specialized T](val t: T) extends ParentBar[T]

def newChildBar(i: Int): ChildBar[Int] = new ChildBar(i)

This definition compiles using a specialized version of ChildBar, as we originally were hoping for, as seen in the following code:

public highperfscala.specialization.Inheritance$ChildBar<java.lang.Object> newChildBar(int);
  Code:
    0: new #32 // class highperfscala/specialization/Inheritance$ChildBar$mcI$sp
    3: dup
    4: iload_1
    5: invokespecial #35 // Method highperfscala/specialization/Inheritance$ChildBar$mcI$sp."<init>":(I)V
    8: areturn

An analogous and equally error-prone scenario is when a generic function is defined around a specialized type. Consider the following definition:

class Foo[T](t: T)

object Foo {
  def create[T](t: T): Foo[T] = new Foo(t)
}

def boxed: Foo[Int] = Foo.create(1)

Here, the definition of create is analogous to the child class from the inheritance example. Instances of Foo wrapping a primitive that are instantiated from the create method will be boxed. The following bytecode demonstrates how boxed leads to heap allocations:

public highperfscala.specialization.MethodReturnTypes$Foo<java.lang.Object> boxed();
  Code:
    0: getstatic #19 // Field highperfscala/specialization/MethodReturnTypes$Foo$.MODULE$:Lhighperfscala/specialization/MethodReturnTypes$Foo$;
    3: iconst_1
    4: invokestatic #25 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
    7: invokevirtual #29 // Method highperfscala/specialization/MethodReturnTypes$Foo$.create:(Ljava/lang/Object;)Lhighperfscala/specialization/MethodReturnTypes$Foo;
    10: areturn

The solution is to apply the @specialized annotation at the call site, as follows:

def createSpecialized[@specialized T](t: T): Foo[T] = new Foo(t)

One final interesting scenario is when specialization is used with multiple types and one of the types extends AnyRef or is a value class. To illustrate this scenario, consider the following example:

case class ShareCount(value: Int) extends AnyVal
case class ExecutionCount(value: Int)

class Container2[@specialized X, @specialized Y](x: X, y: Y)

def shareCount = new Container2(ShareCount(1), 1)
def executionCount = new Container2(ExecutionCount(1), 1)
def ints = new Container2(1, 1)

In this example, which methods do you expect to box the second argument to Container2? For brevity, we omit the bytecode, but you can easily inspect it yourself. As it turns out, shareCount and executionCount box the integer. The compiler does not generate a specialized version of Container2 that accepts a primitive integer and a value extending AnyVal (for example, ExecutionCount). The shareCount variable also causes boxing due to the order in which the compiler removes the value class type information from the source code. In both scenarios, the workaround is to define a case class that is specific to a set of types (for example, ShareCount and Int).
Removing the generics allows the compiler to select the primitive types. The conclusion to draw from these examples is that specialization requires extra focus to be used throughout an application without boxing. As the compiler is unable to infer scenarios where you accidentally forgot to apply the @specialized annotation, it fails to raise a warning. This places the onus on you to be vigilant about profiling and inspecting bytecode to detect scenarios where specialization is incidentally dropped.

To combat some of the shortcomings that specialization brings, there is a compiler plugin under active development, named miniboxing, at http://scala-miniboxing.org/. This compiler plugin applies a different strategy that involves encoding all primitive types into a long value and carrying metadata to recall the original type. For example, a boolean can be represented in a long using a single bit to signal true or false. With this approach, performance is qualitatively similar to specialization while producing orders of magnitude fewer classes for large permutations. Additionally, miniboxing is able to more robustly handle inheritance scenarios and can warn when boxing will occur. While the implementations of specialization and miniboxing differ, the end user usage is quite similar. Like specialization, you must add appropriate annotations to activate the miniboxing plugin. To learn more about the plugin, you can view the tutorials on the miniboxing project site.

The extra focus needed to ensure that specialization produces heap-allocation-free code is worthwhile because of the performance wins in performance-sensitive code. To drive home the value of specialization, consider the following microbenchmark that computes the cost of a trade by multiplying share count with execution price. For simplicity, primitive types are used directly instead of value classes. Of course, in production code this would never happen:

@BenchmarkMode(Array(Throughput))
@OutputTimeUnit(TimeUnit.SECONDS)
@Warmup(iterations = 3, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 30, time = 10, timeUnit = TimeUnit.SECONDS)
@Fork(value = 1, warmups = 1, jvmArgs = Array("-Xms1G", "-Xmx1G"))
class SpecializationBenchmark {

  @Benchmark
  def specialized(): Double =
    specializedExecution.shareCount.toDouble * specializedExecution.price

  @Benchmark
  def boxed(): Double =
    boxedExecution.shareCount.toDouble * boxedExecution.price
}

object SpecializationBenchmark {

  class SpecializedExecution[@specialized(Int) T1, @specialized(Double) T2](
    val shareCount: Long, val price: Double)

  class BoxingExecution[T1, T2](val shareCount: T1, val price: T2)

  val specializedExecution: SpecializedExecution[Int, Double] =
    new SpecializedExecution(10l, 2d)
  val boxedExecution: BoxingExecution[Long, Double] = new BoxingExecution(10l, 2d)
}

In this benchmark, two versions of a generic execution class are defined. SpecializedExecution incurs zero boxing when computing the total cost because of specialization, while BoxingExecution requires object boxing and unboxing to perform the arithmetic. The microbenchmark is invoked with the following parameterization:

sbt 'project chapter3' 'jmh:run SpecializationBenchmark -foe true'

We configure this JMH benchmark via annotations that are placed at the class level in the code. Annotations have the advantage of setting proper defaults for your benchmark and simplifying the command-line invocation. It is still possible to override the values in the annotation with command-line arguments.
We use the -foe command-line argument to enable failure on error because there is no annotation to control this behavior. In the rest of this book, we will parameterize JMH with annotations and omit the annotations in the code samples because we always use the same values. The results are summarized in the following table:

Benchmark     Throughput (ops per second)   Error (% of throughput)
boxed         251,534,293.11                ±2.23
specialized   302,371,879.84                ±0.87

This microbenchmark indicates that the specialized implementation yields approximately 20% higher throughput. By eliminating boxing in a critical section of the code, a significant performance improvement becomes available through judicious usage of specialization. For performance-sensitive arithmetic, this benchmark provides justification for the extra effort that is required to ensure that specialization is applied properly.

Summary

This article talked about different Scala constructs and features, and explained how they are compiled to bytecode.
Diving into OOP Principles

Packt
17 May 2016
21 min read
In this article by Andrea Chiarelli, the author of the book Mastering JavaScript Object-Oriented Programming, we will discuss the OOP nature of JavaScript by showing that it complies with the OOP principles. We will also explain the main differences with classical OOP. The following topics will be addressed in the article:

What are the principles of the OOP paradigm?
Support of abstraction and modeling
How JavaScript implements Aggregation, Association, and Composition
The Encapsulation principle in JavaScript
How JavaScript supports the inheritance principle
Support of the polymorphism principle
What are the differences between classical OOP and JavaScript's OOP

Object-Oriented Programming principles

Object-Oriented Programming (OOP) is one of the most popular programming paradigms. Many developers use languages based on this programming model, such as C++, Java, C#, Smalltalk, Objective-C, and many others. One of the keys to the success of this programming approach is that it promotes a modular design and code reuse—two important features when developing complex software.

However, the Object-Oriented Programming paradigm is not based on a formal standard specification. There is no technical document that defines what OOP is and what it is not. The OOP definition is mainly based on a common understanding taken from the papers published by early researchers such as Kristen Nygaard, Alan Kay, William Cook, and others. An interesting discussion about various attempts to define Object-Oriented Programming can be found online at the following URL: http://c2.com/cgi/wiki?DefinitionsForOo

Anyway, a widely accepted definition to classify a programming language as Object Oriented is based on two requirements—its capability to model a problem through objects and its support of a few principles that grant modularity and code reuse.

In order to satisfy the first requirement, a language must enable a developer to describe reality using objects and to define relationships among objects, such as the following:

Association: This is the object's capability to refer to another independent object
Aggregation: This is the object's capability to embed one or more independent objects
Composition: This is the object's capability to embed one or more dependent objects

Commonly, the second requirement is satisfied if a language supports the following principles:

Encapsulation: This is the capability to concentrate data and the code that manipulates it into a single entity, hiding its internal details
Inheritance: This is the mechanism by which an object acquires some or all features from one or more other objects
Polymorphism: This is the capability to process objects differently based on their data type or structure

Meeting these requirements is what usually allows us to classify a language as Object Oriented.

Is JavaScript Object Oriented?

Once we have established the principles commonly accepted for defining a language as Object Oriented, can we affirm that JavaScript is an OOP language? Many developers do not consider JavaScript a true Object-Oriented language due to its lack of the class concept and because it does not enforce compliance with OOP principles. However, we can see that our informal definition makes no explicit reference to classes. Features and principles are required of objects. Classes are not a real requirement, but they are sometimes a convenient way to abstract sets of objects with common properties.
So, a language can be Object Oriented if it supports objects even without classes, as in JavaScript. Moreover, the OOP principles required for a language are intended to be supported, not enforced: they need not be mandatory in order to program in a language. The developer can choose whether or not to use the constructs that allow the creation of Object-Oriented code. Many criticize JavaScript because developers can write code that breaches the OOP principles, but this is just a choice of the programmer, not a language constraint. It also happens with other programming languages, such as C++.

We can conclude that the lack of classes, and the freedom it leaves developers to use or not use the features that support OOP principles, is not a real obstacle to considering JavaScript an OOP language. So, let's analyze in the following sections how JavaScript supports abstraction and the OOP principles.

Abstraction and modeling support

The first requirement for us to consider a language as Object Oriented is its support for modeling a problem through objects. We already know that JavaScript supports objects, but here we should determine whether they are supported in a way that lets us model reality. In fact, in Object-Oriented Programming we try to model real-world entities and processes and represent them in our software. We need a model because it is a simplification of reality: it allows us to reduce complexity by offering a vision from a particular perspective, and it helps us to reason about the relationships among entities.

This simplification feature is usually known as abstraction, and it is sometimes considered one of the principles of OOP. Abstraction is the concept of moving the focus from the details and concrete implementation of things to the features that are relevant for a specific purpose, with a more general and abstract approach. In other words, abstraction is the capability to define which properties and actions of a real-world entity have to be represented by means of objects in a program in order to solve a specific problem. For example, thanks to abstraction, we can decide that to solve a specific problem we can represent a person just as an object with name, surname, and age, since other information such as address, height, hair color, and so on is not relevant for our purpose.

More than a language feature, abstraction seems a human capability. For this reason, we prefer not to consider it an OOP principle but a (human) capability that supports modeling. Modeling reality not only involves defining objects with relevant features for a specific purpose. It also includes the definition of relationships between objects, such as Association, Aggregation, and Composition.

Association

Association is a relationship between two or more objects where each object is independent of the others. This means that an object can exist without the other and no object owns the other. Let us clarify with an example. In order to define a parent–child relationship between persons, we can do so as follows:

function Person(name, surname) {
    this.name = name;
    this.surname = surname;
    this.parent = null;
}

var johnSmith = new Person("John", "Smith");
var fredSmith = new Person("Fred", "Smith");

fredSmith.parent = johnSmith;

The assignment of the object johnSmith to the parent property of the object fredSmith establishes an association between the two objects. Of course, the object johnSmith lives independently from the object fredSmith and vice versa. Both can be created and deleted independently of each other.
As we can see from the example, JavaScript allows us to define an association between objects using a simple object reference through a property.

Aggregation

Aggregation is a special form of association relationship where one object has a more prominent role than the other. Usually, this role determines a sort of ownership of one object in relation to the other. The owner object is often called the aggregate, and the owned object is called the component. However, each object has an independent life. An example of an aggregation relationship is the one between a company and its employees, as in the following example:

var company = {
    name: "ACME Inc.",
    employees: []
};

var johnSmith = new Person("John", "Smith");
var marioRossi = new Person("Mario", "Rossi");

company.employees.push(johnSmith);
company.employees.push(marioRossi);

The person objects added to the employees collection help to define the company object, but they are independent from it. If the company object is deleted, each single person still lives. However, the real meaning of a company is bound to the presence of its employees.

Again, the code shows us that the aggregation relationship is supported by JavaScript by means of object references. It is important not to confuse Association with Aggregation. Even if the support of the two relationships is syntactically identical, that is, the assignment or attachment of an object to a property, from a conceptual point of view they represent different situations. Aggregation is the mechanism that allows you to create an object consisting of several objects, while association relates autonomous objects. In any case, JavaScript exerts no control over the way in which we associate or aggregate objects. Association and Aggregation raise a constraint that is more conceptual than technical.

Composition

Composition is a strong type of Aggregation, where each component object has no independent life without its owner, the aggregate. Consider the following example:

var person = {
    name: "John",
    surname: "Smith",
    address: {
        street: "123 Duncannon Street",
        city: "London",
        country: "United Kingdom"
    }
};

This code defines a person with his address represented as an object. The address property is strictly bound to the person object. Its life depends on the life of the person, and it cannot have an independent life without the person. If the person object is deleted, the address object is deleted as well. In this case, the strict relation between the person and his address is expressed in JavaScript by directly assigning the literal representing the address to the address property.

OOP principles support

The second requirement that allows us to consider JavaScript an Object-Oriented language involves the support of at least three principles—encapsulation, inheritance, and polymorphism. Let's analyze how JavaScript supports each of these principles.

Encapsulation

Objects are central to the Object-Oriented Programming model, and they represent the typical expression of encapsulation, that is, the ability to concentrate in one entity both data (properties) and functions (methods), hiding the internal details. In other words, the encapsulation principle allows an object to expose just what is needed to use it, hiding the complexity of its implementation. This is a very powerful principle, often found in the real world, that allows us to use an object without knowing how it internally works. Consider for instance how we drive cars. We just need to know how to speed up, brake, and change direction.
We do not need to know how the car works in detail, how its motor burns fuel, or how it transmits movement to the wheels. To understand the importance of this principle in software development as well, consider the following code:

var company = {
    name: "ACME Inc.",
    employees: [],
    sortEmployeesByName: function() {...}
};

It creates a company object with a name, a list of employees, and a method to sort the list of employees using their name property. If we need to get a sorted list of employees of the company, we simply need to know that the sortEmployeesByName() method accomplishes this task. We do not need to know how this method works or which algorithm it implements. That is an implementation detail that encapsulation hides from us.

Hiding internal details and complexity has two main reasons:

The first reason is to provide a simplified and understandable way to use an object without the need to understand the complexity inside. In our example, we just need to know that to sort employees, we have to call a specific method.
The second reason is to simplify change management. Changes to the internal sort algorithm do not affect our way of ordering employees by name. We always continue to call the same method. Maybe we will get a more efficient execution, but the expected result will not change.

We said that encapsulation hides internal details in order to simplify both the use of an object and the change of its internal implementation. However, when the internal implementation depends on publicly accessible properties, we risk frustrating the effort of hiding the internal behavior. For example, what happens if you assign a string to the employees property of the company object?

company.employees = "this is a joke!";
company.sortEmployeesByName();

The assignment of a string to a property whose value is an array is perfectly legal in JavaScript, since it is a language with dynamic typing. But most probably, we will get an exception when calling the sort method after this assignment, since the sort algorithm expects an array. In this case, the encapsulation principle has not been completely implemented.

A general approach to prevent direct access to relevant properties is to replace them with methods. For example, we can redefine our company object as in the following:

function Company(name) {
    var employees = [];

    this.name = name;

    this.getEmployees = function() {
        return employees;
    };

    this.addEmployee = function(employee) {
        employees.push(employee);
    };

    this.sortEmployeesByName = function() {
        ...
    };
}

var company = new Company("ACME Inc.");

With this approach, we cannot access the employees property directly; we need to use the getEmployees() method to obtain the list of employees of the company and addEmployee() to add an employee to the list. This guarantees that the internal state remains really hidden and consistent.

The way we created methods for the Company() constructor is not the best one. This is just one possible approach to enforce encapsulation by protecting the internal state of an object. This kind of data protection is usually called information hiding and, although often linked to encapsulation, it should be considered an autonomous principle. Information hiding deals with the accessibility of an object's members, in particular its properties. While encapsulation concerns hiding details, the information hiding principle usually allows different access levels to the members of an object.
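As a small illustrative sketch of such access levels (an addition to the original text, assuming an ES6-capable engine), private state can also be held in a WeakMap keyed by the instance, making it truly unreachable from outside; this alternative Company implementation and its privateState map are hypothetical names introduced here:

// A minimal sketch of information hiding with a WeakMap (assumes ES6 support).
var Company = (function() {
    var privateState = new WeakMap();

    function Company(name) {
        this.name = name;
        privateState.set(this, { employees: [] });
    }

    Company.prototype.addEmployee = function(employee) {
        privateState.get(this).employees.push(employee);
    };

    Company.prototype.getEmployees = function() {
        // Return a copy so callers cannot mutate the internal array.
        return privateState.get(this).employees.slice();
    };

    return Company;
})();

var company = new Company("ACME Inc.");
company.addEmployee(new Person("John", "Smith"));
console.log(company.getEmployees().length); // 1
console.log(company.employees);             // undefined: the list is hidden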
Inheritance

In Object-Oriented Programming, inheritance enables new objects to acquire the properties of existing objects. This relationship between two objects is very common and can be found in many situations in real life. It usually refers to creating a specialized object starting from a more general one.

Let's consider, for example, a person: he has some features such as name, surname, height, weight, and so on. The set of features describes a generic entity that represents a person. Using abstraction, we can select the features needed for our purpose and represent a person as an object. If we need a special person who is able to program computers, that is, a programmer, we need to create an object that has all the properties of a generic person plus some new properties that characterize the programmer object. For instance, the new programmer object can have a property describing which programming language he knows.

Suppose we choose to create the new programmer object by duplicating the properties of the person object and adding the programming language knowledge to it. This approach is in contrast with the Object-Oriented Programming goals. In particular, it does not reuse existing code, since we are duplicating the properties of the person object. A more appropriate approach should reuse the code created to define the person object. This is where the inheritance principle can help us. It allows sharing common features between objects, avoiding code duplication.

Inheritance is also called subclassing in languages that support classes. A class that inherits from another class is called a subclass, while the class from which it is derived is called a superclass. Apart from the naming, the inheritance concept is the same, although this terminology does not really fit JavaScript.

We can implement inheritance in JavaScript in various ways. Consider, for example, the following constructor of person objects:

function Person() {
    this.name = "";
    this.surname = "";
}

In order to define a programmer as a person specialized in computer programming, we will add a new property describing his knowledge of a programming language: knownLanguage. A simple approach to create the programmer object that inherits properties from person is based on prototypes. Here is a possible implementation:

function Programmer() {
    this.knownLanguage = "";
}

Programmer.prototype = new Person();

We will create a programmer with the following code:

var programmer = new Programmer();

We will obtain an object that has the properties of the person object (name and surname) and the specific property of the programmer (knownLanguage), that is, the programmer object inherits the person properties.

This is a simple example to demonstrate that JavaScript supports the inheritance principle of Object-Oriented Programming at its basic level. Inheritance is a complex concept that has many facets and several variants in programming, many of them dependent on the language used.
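As a side note (not part of the original example), the same relationship can be expressed with the ES6 class syntax, which is just syntactic sugar over the prototype mechanism shown above:

// A sketch of the same inheritance using ES6 classes (assumes ES6 support).
class Person {
    constructor() {
        this.name = "";
        this.surname = "";
    }
}

class Programmer extends Person {
    constructor() {
        super();               // initializes name and surname
        this.knownLanguage = "";
    }
}

var programmer = new Programmer();
console.log(programmer instanceof Person); // true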
Polymorphism

In Object-Oriented Programming, polymorphism is understood in different ways, even if the basis is a common notion: the ability to handle multiple data types uniformly. Support for polymorphism brings benefits that further the overall goals of OOP. Mainly, it reduces coupling in our applications and, in some cases, allows us to write more compact code. The most common ways in which a programming language can support polymorphism include:

Methods that take parameters with different data types (overloading)
Management of generic types, not known in advance (parametric polymorphism)
Expressions whose type can be represented by a class and the classes derived from it (subtype polymorphism or inclusion polymorphism)

In most languages, overloading is what happens when you have two methods with the same name but different signatures. At compile time, the compiler works out which method to call by matching the types of the invocation arguments against the types of the methods' parameters. The following is an example of method overloading in C#:

public int CountItems(int x) {
  return x.ToString().Length;
}
public int CountItems(string x) {
  return x.Length;
}

The CountItems() method has two signatures: one for integers and one for strings. This allows us to count the number of digits in a number or the number of characters in a string in a uniform manner, simply by calling the same method. Overloading can also be expressed through methods with different numbers of arguments, as shown in the following C# example:

public int Sum(int x, int y) {
  return Sum(x, y, 0);
}
public int Sum(int x, int y, int z) {
  return x + y + z;
}

Here, the Sum() method is able to sum two or three integers, and the correct method definition is selected on the basis of the number of arguments passed. As JavaScript developers, we can replicate this behavior in our scripts. For example, the C# CountItems() method becomes the following in JavaScript:

function countItems(x) {
  return x.toString().length;
}

While the Sum() example becomes:

function sum(x, y, z) {
  x = x ? x : 0;
  y = y ? y : 0;
  z = z ? z : 0;
  return x + y + z;
}

Or, using the more convenient ES6 default parameter syntax:

function sum(x = 0, y = 0, z = 0) {
  return x + y + z;
}

These examples show that JavaScript supports overloading in a more immediate way than strongly typed languages. In strongly typed languages, overloading is sometimes called static polymorphism, since the correct method to invoke is determined statically by the compiler at compile time.

Parametric polymorphism allows a method to work on parameters of any type. It is often called generics, and many languages support it in built-in methods. For example, in C#, we can define a list of items whose type is not fixed in advance using the List<T> generic type. This allows us to create lists of integers, strings, or any other type. We can also create our own generic classes, as shown in the following C# code:

public class Stack<T> {
  private T[] items;
  private int count;
  public void Push(T item) { ... }
  public T Pop() { ... }
}

This code defines a typical stack implementation whose item type is not fixed. We can create, for example, a stack of strings with the following code:

var stack = new Stack<String>();

Thanks to its dynamic typing, JavaScript supports parametric polymorphism implicitly. In fact, the type of a function's parameters is inherently generic, since a parameter's type is established only when a value is assigned to it. The following is a possible implementation of a stack constructor in JavaScript:

function Stack() {
  this.stack = [];
  this.pop = function() {
    return this.stack.pop();
  };
  this.push = function(item) {
    this.stack.push(item);
  };
}
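Gradually typed languages sit between these two extremes. As an illustrative sketch of our own (not from the original text), Python's typing module lets us declare the element type for a static type checker while the runtime stays as permissive as JavaScript's:

from typing import Generic, List, TypeVar

T = TypeVar("T")

class Stack(Generic[T]):
    def __init__(self) -> None:
        self._items = []  # type: List[T]

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()

stack = Stack()  # type: Stack[str]
stack.push("hello")
print(stack.pop())  # hello

A type checker would flag stack.push(42) as an error, but the interpreter itself, like JavaScript, would happily accept it.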
Subtype polymorphism allows objects of different types that share an inheritance relationship to be handled consistently. This means that wherever we can use an object of a specific type, we can also use an object of a type derived from it. Let's look at a C# example to clarify this concept:

public class Person {
  public string Name {get; set;}
  public string SurName {get; set;}
}
public class Programmer : Person {
  public String KnownLanguage {get; set;}
}
public void WriteFullName(Person p) {
  Console.WriteLine(p.Name + " " + p.SurName);
}
var a = new Person();
a.Name = "John";
a.SurName = "Smith";
var b = new Programmer();
b.Name = "Mario";
b.SurName = "Rossi";
b.KnownLanguage = "C#";
WriteFullName(a); //result: John Smith
WriteFullName(b); //result: Mario Rossi

In this code, we again define the Person class and its derived class Programmer, together with a WriteFullName() method that accepts an argument of type Person. Thanks to subtype polymorphism, we can also pass objects of type Programmer to WriteFullName(), since Programmer is derived from Person. In fact, from a conceptual point of view, a programmer is also a person, so subtype polymorphism matches a concrete representation of reality. Of course, the C# example can easily be reproduced in JavaScript, since we have no type constraints. Here is the corresponding code:

function Person() {
  this.name = "";
  this.surname = "";
}
function Programmer() {
  this.knownLanguage = "";
}
Programmer.prototype = new Person();
function writeFullName(p) {
  console.log(p.name + " " + p.surname);
}
var a = new Person();
a.name = "John";
a.surname = "Smith";
var b = new Programmer();
b.name = "Mario";
b.surname = "Rossi";
b.knownLanguage = "JavaScript";
writeFullName(a); //result: John Smith
writeFullName(b); //result: Mario Rossi

As we can see, the JavaScript code is quite similar to the C# code, and the result is the same.

JavaScript OOP versus classical OOP

The discussion so far shows how JavaScript supports the fundamental Object-Oriented Programming principles and can be considered a true OOP language like many others. However, JavaScript differs from most other languages in certain specific features, which can concern developers used to languages that implement classical OOP. The first of these features is the dynamic nature of the language, both in data type management and in object creation. Since data types are evaluated dynamically, some OOP features, such as polymorphism, are implicitly supported. Moreover, the ability to change an object's structure at runtime breaks the common intuition that binds an object to a more abstract entity such as a class. The lack of the concept of a class is another big difference from classical OOP. Of course, we are talking about class generalization, which has nothing to do with the class construct introduced by ES6: that is just syntactic convenience over standard JavaScript constructors. Classes in most Object-Oriented languages represent a generalization of objects, that is, an extra level of abstraction above them. So, classical Object-Oriented programming has two types of abstraction: classes and objects. An object is an abstraction of a real-world entity, while a class is an abstraction of an object or of another class (in other words, it is a generalization). Objects in classical OOP languages can only be created by instantiating classes.

JavaScript takes a different approach to object management. It has just one type of abstraction: objects. Unlike the classical OOP approach, an object can be created directly as an abstraction of a real-world entity or as an abstraction of another object. In the latter case, the abstracted object is called the prototype.
As opposed to the classical OOP approach, the JavaScript approach is sometimes called prototypal Object-Oriented Programming. Of course, the lack of a notion of class in JavaScript affects the inheritance mechanism: while in classical OOP inheritance is an operation on classes, in prototypal OOP it is an operation on objects. That does not mean that classical OOP is better than prototypal OOP or vice versa; they are simply different approaches. However, we cannot ignore that these differences have an impact on the way we manage objects. At the very least, we should note that while in classical OOP classes are immutable (we cannot add, change, or remove their properties or methods at runtime), in prototypal OOP objects and prototypes are extremely flexible. Moreover, classical OOP adds an extra level of abstraction with classes, leading to more verbose code, while prototypal OOP is more immediate and requires more compact code.

Summary

In this article, we explored the basic principles of the Object-Oriented Programming paradigm. We focused on abstraction to define objects; on association, aggregation, and composition to define relationships between objects; and on the encapsulation, inheritance, and polymorphism principles that outline the fundamentals required by OOP. We saw how JavaScript supports all the features that allow us to define it as a true Object-Oriented language, on a par with languages such as Java, C#, and C++, and we compared classical OOP with prototypal OOP.

Resources for Article:

Further resources on this subject:
Just Object Oriented Programming (Object Oriented Programming, explained) [article]
Introducing Object Oriented Programmng with TypeScript [article]
Python 3 Object Oriented Programming: Managing objects [article]

Expert Python Programming: Interfaces

This article by Michał Jaworski and Tarek Ziadé, the authors of the book Expert Python Programming - Second Edition, will mainly focus on interfaces. (For more resources related to this topic, see here.)

An interface is a definition of an API. It describes a list of methods and attributes that a class should implement with the desired behavior. This description does not implement any code but just defines an explicit contract for any class that wishes to implement the interface. Any class can then implement one or several interfaces in whichever way it wants. While Python prefers duck typing over explicit interface definitions, it may sometimes be better to use them. For instance, an explicit interface definition makes it easier for a framework to define functionality over interfaces. The benefit is that classes are loosely coupled, which is considered good practice. For example, to perform a given process, a class A does not depend on a class B, but rather on an interface I. Class B implements I, but it could be any other class. Support for such a technique is built into many statically typed languages, such as Java or Go. Interfaces allow functions or methods to limit the range of acceptable arguments to objects that implement a given interface, no matter what class they come from. This allows for more flexibility than restricting arguments to given types or their subclasses. It is like an explicit version of duck typing: Java uses interfaces to verify type safety at compile time rather than using duck typing to tie things together at runtime.

Python has a completely different typing philosophy from Java, so it does not have native support for interfaces. Anyway, if you would like to have more explicit control over application interfaces, there are generally two solutions to choose from:

Use some third-party framework that adds the notion of interfaces
Use some of the advanced language features to build your own methodology for handling interfaces

Using zope.interface

There are a few frameworks that allow you to build explicit interfaces in Python. The most notable one is a part of the Zope project: the zope.interface package. Although Zope is not as popular nowadays as it used to be, the zope.interface package is still one of the main components of the Twisted framework. The core class of the zope.interface package is the Interface class. It allows you to explicitly define a new interface by subclassing. Let's assume that we want to define the obligatory interface for every implementation of a rectangle:

from zope.interface import Interface, Attribute

class IRectangle(Interface):
    width = Attribute("The width of rectangle")
    height = Attribute("The height of rectangle")

    def area():
        """ Return area of rectangle """

    def perimeter():
        """ Return perimeter of rectangle """

Some important things to remember when defining interfaces with zope.interface are as follows:

The common naming convention for interfaces is to use I as the name prefix.
The methods of the interface must not take the self parameter.
As the interface does not provide a concrete implementation, it should consist only of empty methods. You can use the pass statement, raise NotImplementedError, or provide a docstring (preferred).
An interface can also specify the required attributes using the Attribute class.

When you have such a contract defined, you can then define new concrete classes that provide an implementation for our IRectangle interface.
In order to do that, you need to use the implementer() class decorator and implement all of the defined methods and attributes:

from zope.interface import implementer

@implementer(IRectangle)
class Square:
    """ Concrete implementation of square with rectangle interface """

    def __init__(self, size):
        self.size = size

    @property
    def width(self):
        return self.size

    @property
    def height(self):
        return self.size

    def area(self):
        return self.size ** 2

    def perimeter(self):
        return 4 * self.size

@implementer(IRectangle)
class Rectangle:
    """ Concrete implementation of rectangle """

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

    def perimeter(self):
        return self.width * 2 + self.height * 2

It is common to say that the interface defines a contract that a concrete implementation needs to fulfill. The main benefit of this design pattern is the ability to verify consistency between the contract and the implementation before the object is used. With the ordinary duck-typing approach, you only find inconsistencies when there is a missing attribute or method at runtime. With zope.interface, you can introspect the actual implementation using two functions from the zope.interface.verify module to find inconsistencies early on:

verifyClass(interface, class_object): This verifies the class object for the existence of methods and the correctness of their signatures, without looking for attributes
verifyObject(interface, instance): This verifies the methods, their signatures, and also the attributes of the actual object instance

Since we have defined our interface and two concrete implementations, let's verify their contracts in an interactive session:

>>> from zope.interface.verify import verifyClass, verifyObject
>>> verifyObject(IRectangle, Square(2))
True
>>> verifyClass(IRectangle, Square)
True
>>> verifyObject(IRectangle, Rectangle(2, 2))
True
>>> verifyClass(IRectangle, Rectangle)
True

Nothing impressive so far. The Rectangle and Square classes carefully follow the defined contract, so there is nothing more to see than a successful verification. But what happens when we make a mistake? Let's see an example of two classes that fail to provide a full IRectangle interface implementation:

import math

@implementer(IRectangle)
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

@implementer(IRectangle)
class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

    def perimeter(self):
        return 2 * math.pi * self.radius

The Point class does not provide any method or attribute of the IRectangle interface, so its verification shows inconsistencies already at the class level:

>>> verifyClass(IRectangle, Point)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "zope/interface/verify.py", line 102, in verifyClass
    return _verify(iface, candidate, tentative, vtype='c')
  File "zope/interface/verify.py", line 62, in _verify
    raise BrokenImplementation(iface, name)
zope.interface.exceptions.BrokenImplementation: An object has failed to implement interface <InterfaceClass __main__.IRectangle>
The perimeter attribute was not provided.

The Circle class is a bit more problematic. It has all the interface methods defined, but it breaks the contract at the instance attribute level.
This is the reason why, in most cases, you need to use the verifyObject() function to completely verify the interface implementation:

>>> verifyObject(IRectangle, Circle(2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "zope/interface/verify.py", line 105, in verifyObject
    return _verify(iface, candidate, tentative, vtype='o')
  File "zope/interface/verify.py", line 62, in _verify
    raise BrokenImplementation(iface, name)
zope.interface.exceptions.BrokenImplementation: An object has failed to implement interface <InterfaceClass __main__.IRectangle>
The width attribute was not provided.

Using zope.interface is an interesting way to decouple your application. It allows you to enforce proper object interfaces without the overblown complexity of multiple inheritance, and it also lets you catch inconsistencies early. However, the biggest downside of this approach is the requirement to explicitly declare that a given class follows some interface in order for it to be verified. This is especially troublesome if you need to verify instances coming from external classes of built-in libraries. zope.interface provides some solutions for that problem, and you can of course handle such issues on your own by using the adapter pattern, or even monkey patching. Anyway, the simplicity of such solutions is at least arguable.

Using function annotations and abstract base classes

Design patterns are meant to make problem solving easier, not to provide you with more layers of complexity. zope.interface is a great concept and may fit some projects well, but it is not a silver bullet. By using it, you may soon find yourself spending more time fixing issues with incompatible interfaces for third-party classes and providing never-ending layers of adapters instead of writing the actual implementation. If you feel that way, then this is a sign that something went wrong. Fortunately, Python supports building a lightweight alternative to interfaces. It's not a full-fledged solution like zope.interface or its alternatives, but it generally allows for more flexible applications. You may need to write a bit more code, but in the end you will have something that is more extensible, handles external types better, and may be more future proof. Note that Python at its core does not have an explicit notion of interfaces, and probably never will, but it has some features that allow you to build something resembling the functionality of interfaces. These features are:

Abstract base classes (ABCs)
Function annotations
Type annotations

The core of our solution is abstract base classes, so we will feature them first. As you probably know, direct type comparison is considered harmful and not Pythonic. You should always avoid comparisons such as the following:

assert type(instance) == list

Comparing types in functions or methods this way completely breaks the ability to pass a subtype of the class as an argument to the function. A slightly better approach is to use the isinstance() function, which takes inheritance into account:

assert isinstance(instance, list)

An additional advantage of isinstance() is that you can use a larger range of types to check type compatibility. For instance, if your function expects to receive some sort of sequence as the argument, you can compare against a tuple of basic types:

assert isinstance(instance, (list, tuple, range))

Such type compatibility checking is OK in some situations, but it is still not perfect.
It will work with any subclass of list, tuple, or range, but it will fail if the user passes something that behaves exactly like one of these sequence types without inheriting from any of them. For instance, let's relax our requirements and say that you want to accept any kind of iterable as an argument. What would you do? The list of basic types that are iterable is actually pretty long. You would need to cover list, tuple, range, str, bytes, dict, set, generators, and a lot more. And even if you covered all of them, this would still not allow you to check against a custom class that defines the __iter__() method but inherits directly from object. This is the kind of situation where abstract base classes (ABCs) are the proper solution. An ABC is a class that does not need to provide a concrete implementation; instead, it defines a blueprint of a class that may be used to check type compatibility against. This concept is very similar to the abstract classes and virtual methods known from the C++ language. Abstract base classes are used for two purposes:

Checking for implementation completeness
Checking for implicit interface compatibility

So, let's assume we want to define an interface that ensures a class has a push() method. We need to create a new abstract base class using the special ABCMeta metaclass and the abstractmethod() decorator from the standard abc module:

from abc import ABCMeta, abstractmethod

class Pushable(metaclass=ABCMeta):

    @abstractmethod
    def push(self, x):
        """ Push argument no matter what it means """

The abc module also provides an ABC base class that can be used instead of the metaclass syntax:

from abc import ABC, abstractmethod

class Pushable(ABC):

    @abstractmethod
    def push(self, x):
        """ Push argument no matter what it means """

Once this is done, we can use the Pushable class as a base class for a concrete implementation, and it will guard us against the instantiation of objects with an incomplete implementation. Let's define DummyPushable, which implements all interface methods, and IncompletePushable, which breaks the expected contract:

class DummyPushable(Pushable):
    def push(self, x):
        return

class IncompletePushable(Pushable):
    pass

If you want to obtain a DummyPushable instance, there is no problem, because it implements the only required push() method:

>>> DummyPushable()
<__main__.DummyPushable object at 0x10142bef0>

But if you try to instantiate IncompletePushable, you will get a TypeError because of the missing implementation of the push() method:

>>> IncompletePushable()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class IncompletePushable with abstract methods push

The preceding approach is a great way to ensure the implementation completeness of subclasses, but it is just as explicit as the zope.interface alternative. DummyPushable instances are of course also instances of Pushable, because DummyPushable is a subclass of Pushable. But how about other classes with the same methods that are not descendants of Pushable? Let's create one and see:

>>> class SomethingWithPush:
...     def push(self, x):
...         pass
...
>>> isinstance(SomethingWithPush(), Pushable)
False

Something is still missing. The SomethingWithPush class definitely has a compatible interface, but it is not considered an instance of Pushable yet. So, what is missing?
The answer is the __subclasshook__(subclass) method, which allows you to inject your own logic into the procedure that determines whether an object is an instance of a given class. Unfortunately, you need to provide it yourself, as the abc creators did not want to constrain developers in overriding the whole isinstance() mechanism. We get full power over it, but we are forced to write some boilerplate code. Although you can do whatever you want, usually the only reasonable thing to do in the __subclasshook__() method is to follow the common pattern. The standard procedure is to check whether the set of defined methods is available somewhere in the MRO of the given class:

from abc import ABCMeta, abstractmethod

class Pushable(metaclass=ABCMeta):

    @abstractmethod
    def push(self, x):
        """ Push argument no matter what it means """

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Pushable:
            if any("push" in B.__dict__ for B in C.__mro__):
                return True
        return NotImplemented

With the __subclasshook__() method defined this way, you can now confirm that instances that implement the interface implicitly are also considered instances of the interface:

>>> class SomethingWithPush:
...     def push(self, x):
...         pass
...
>>> isinstance(SomethingWithPush(), Pushable)
True

Unfortunately, this approach to verifying type compatibility and implementation completeness does not take into account the signatures of class methods. So, if the number of expected arguments differs in the implementation, it will still be considered compatible. In most cases, this is not an issue, but if you need such fine-grained control over interfaces, the zope.interface package allows for it. As already said, the __subclasshook__() method does not constrain you from adding more complexity to the isinstance() function's logic to achieve a similar level of control.

The two other features that complement abstract base classes are function annotations and type hints. Function annotation is a syntax element that allows you to annotate functions and their arguments with arbitrary expressions. This is only a feature stub that carries no semantics of its own; there is no utility in the standard library that uses it to enforce any behavior. Anyway, you can use it as a convenient and lightweight way to inform the developer of the expected argument interface. For instance, consider this IRectangle interface rewritten from zope.interface to an abstract base class:

from abc import (
    ABCMeta,
    abstractmethod,
    abstractproperty
)

class IRectangle(metaclass=ABCMeta):

    @abstractproperty
    def width(self):
        return

    @abstractproperty
    def height(self):
        return

    @abstractmethod
    def area(self):
        """ Return rectangle area """

    @abstractmethod
    def perimeter(self):
        """ Return rectangle perimeter """

    @classmethod
    def __subclasshook__(cls, C):
        if cls is IRectangle:
            if all([
                any("area" in B.__dict__ for B in C.__mro__),
                any("perimeter" in B.__dict__ for B in C.__mro__),
                any("width" in B.__dict__ for B in C.__mro__),
                any("height" in B.__dict__ for B in C.__mro__),
            ]):
                return True
        return NotImplemented

If you have a function that works only on rectangles, say draw_rectangle(), you can annotate the interface of the expected argument as follows:

def draw_rectangle(rectangle: IRectangle):
    ...

This adds nothing more than information for the developer about the expected argument, and even that is only an informal contract because, as we know, bare annotations carry no syntactic meaning of their own.
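We can verify that bare annotations are nothing more than stored metadata by inspecting the function object directly; here is a quick sketch (with a hypothetical stand-in class, since it only illustrates where annotations live):

class IRectangle:  # stand-in for the abstract base class defined above
    pass

def draw_rectangle(rectangle: IRectangle):
    pass

# Annotations are kept in a plain dictionary on the function object:
print(draw_rectangle.__annotations__)
# {'rectangle': <class '__main__.IRectangle'>}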
However, annotations are accessible at runtime, so we can do something more with them. Here is an example implementation of a generic decorator that verifies the interfaces declared in function annotations, provided they are expressed using abstract base classes:

import inspect
from abc import ABCMeta
from functools import wraps

def ensure_interface(function):
    signature = inspect.signature(function)
    parameters = signature.parameters

    @wraps(function)
    def wrapped(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            annotation = parameters[name].annotation

            if not isinstance(annotation, ABCMeta):
                continue

            if not isinstance(value, annotation):
                raise TypeError(
                    "{} does not implement {} interface"
                    "".format(value, annotation)
                )

        return function(*args, **kwargs)

    return wrapped

Once this is done, we can create a concrete class that implicitly implements the IRectangle interface (without inheriting from IRectangle) and update the implementation of the draw_rectangle() function to see how the whole solution works:

class ImplicitRectangle:
    def __init__(self, width, height):
        self._width = width
        self._height = height

    @property
    def width(self):
        return self._width

    @property
    def height(self):
        return self._height

    def area(self):
        return self.width * self.height

    def perimeter(self):
        return self.width * 2 + self.height * 2

@ensure_interface
def draw_rectangle(rectangle: IRectangle):
    print(
        "{} x {} rectangle drawing"
        "".format(rectangle.width, rectangle.height)
    )

If we feed the draw_rectangle() function an incompatible object, it now raises TypeError with a meaningful explanation:

>>> draw_rectangle('foo')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "<input>", line 101, in wrapped
TypeError: foo does not implement <class 'IRectangle'> interface

But if we use ImplicitRectangle or anything else that resembles the IRectangle interface, the function executes as it should:

>>> draw_rectangle(ImplicitRectangle(2, 10))
2 x 10 rectangle drawing

Our example implementation of ensure_interface() is based on the typechecked() decorator from the typeannotations project, which tries to provide run-time checking capabilities (refer to https://github.com/ceronman/typeannotations). Its source code might give you some interesting ideas about how to process type annotations to ensure run-time interface checking.

The last feature that can be used to complement this interface pattern landscape is type hints. Type hints are described in detail by PEP 484 and were added to the language quite recently. They are exposed in the new typing module and are available from Python 3.5 on. Type hints are built on top of function annotations and reuse this slightly forgotten syntax feature of Python 3. They are intended to guide type hinting and checking by various yet-to-come Python type checkers. The typing module and the PEP 484 document aim to provide a standard hierarchy of types and classes for describing type annotations. Still, type hints are not revolutionary on their own, because this feature does not come with any type checker built into the standard library. If you want to use type checking or enforce strict interface compatibility in your code, you need to create your own tool, because there is none worth recommending yet. This is why we won't dig into the details of PEP 484. Anyway, type hints and the documents describing them are worth mentioning because, if some extraordinary solution emerges in the field of type checking in Python, it is highly probable that it will be based on PEP 484.
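To make the type hints idea concrete, here is a minimal illustrative example of our own using the typing module; note that the hint alone enforces nothing at runtime, which is exactly why external type checkers are needed:

from typing import List

def mean(values: List[float]) -> float:
    return sum(values) / len(values)

print(mean([1.0, 2.0, 3.0]))  # 2.0

# Nothing stops a badly typed call at the annotation level; the failure,
# if any, comes from the function body itself (here, sum() raising TypeError):
# mean(None)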
Using collections.abc

Abstract base classes are like small building blocks for creating a higher level of abstraction. They allow you to implement really usable interfaces, but they are very generic and designed to handle a lot more than this single design pattern. You can unleash your creativity and do magical things, but building something generic and really usable may require a lot of work, work that may never pay off. This is why custom abstract base classes are not used so often. Despite that, the collections.abc module provides a lot of predefined ABCs that allow you to verify the interface compatibility of many basic Python types. With the base classes provided in this module, you can check, for example, whether a given object is callable, is a mapping, or supports iteration. Using them with the isinstance() function is far better than comparing against the base Python types. You should definitely know how to use these base classes even if you don't want to define your own custom interfaces with ABCMeta. The abstract base classes from collections.abc that you will use most often are:

Container: This interface means that the object supports the in operator and implements the __contains__() method
Iterable: This interface means that the object supports iteration and implements the __iter__() method
Callable: This interface means that the object can be called like a function and implements the __call__() method
Hashable: This interface means that the object is hashable (it can be included in sets and used as a key in dictionaries) and implements the __hash__() method
Sized: This interface means that the object has a size (it can be the subject of the len() function) and implements the __len__() method

A full list of the available abstract base classes from the collections.abc module can be found in the official Python documentation (refer to https://docs.python.org/3/library/collections.abc.html).
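A short interactive sketch (our own example) shows how these checks behave in practice, including the implicit interface compatibility discussed in the previous sections:

from collections.abc import Callable, Iterable, Sized

print(isinstance([1, 2, 3], Iterable))  # True
print(isinstance(len, Callable))        # True
print(isinstance(42, Sized))            # False

class Bag:
    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

# Bag never inherits from Sized, yet implementing __len__() is enough:
print(isinstance(Bag("abc"), Sized))    # True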
Summary

Design patterns are reusable, somewhat language-specific solutions to common problems in software design. They are a part of the culture of all developers, no matter what language they use. In this article, we covered a small part of this topic: what interfaces are and how they can be used in Python.

Resources for Article:

Further resources on this subject:
Creating User Interfaces [article]
Graphical User Interfaces for OpenSIPS 1.6 [article]

The Parallel Universe of R

This article by Simon Chapple, author of the book Mastering Parallel Programming with R, helps us understand the intricacies of parallel computing. Here, we'll take a look into Delores' Crystal Ball at what the future holds for massively parallel computation, which is likely to have a significant impact on the world of R programming, particularly when applied to big data. (For more resources related to this topic, see here.)

Three steps to successful parallelization

The following three-step distilled guidance is intended to help you decide what form of parallelism might be best suited to your particular algorithm/problem, and it summarizes what you learned throughout this article. Necessarily, it applies a level of generalization, so approach these guidelines with due consideration:

1. Determine the type of parallelism that may best apply to your algorithm. Is the problem you are solving more computationally bound or data bound? If the former, your problem may be amenable to GPUs; if the latter, your problem may be more amenable to cluster-based computing; and if your problem requires a complex processing chain, consider using the Spark framework. Can you divide the problem data/space to achieve a balanced workload across all processes, or do you need to employ an adaptive load-balancing scheme, such as a task farm-based approach? Does your problem/algorithm naturally divide spatially? If so, consider whether a grid-based parallel approach can be used. Perhaps your problem is on an epic scale? If so, maybe you can develop message passing-based code and run it on a supercomputer. Is there an implied sequential dependency between tasks; that is, do processes need to cooperate and share data during their computation, or can each divided task be executed entirely independently of the others? A large proportion of parallel algorithms typically have a work distribution phase, a parallel computation phase, and a result aggregation phase. To reduce the overhead of the startup and close-down phases, consider whether a tree-based approach to work distribution and result aggregation may be appropriate in your case.

2. Ensure that the basis of the compute in your algorithm has an optimal implementation. Profile your code in serial to determine whether there are any bottlenecks, and target these for improvement. Is there an existing parallel implementation similar to your algorithm that you can use directly or adapt? Review CRAN Task View: High-Performance and Parallel Computing with R at https://cran.r-project.org/web/views/HighPerformanceComputing.html; in particular, take a look at the subsection entitled Parallel Computing: Applications. (The original article shows a snapshot of this page as Figure 1: CRAN provides various parallelized packages you can use in your own program.)

3. Test and evaluate the parallel efficiency of your implementation. Use the P-estimated form of Amdahl's Law to predict the level of scalability you can achieve (see the worked example at the end of this section). Test your algorithm at varying degrees of parallelism, particularly odd numbers that trigger edge-case behaviors, and don't forget to run with just a single process. Running with more processes than processors can trigger lurking deadlock/race conditions (this is most applicable to message passing-based implementations). Where possible, to reduce overhead, ensure that your method of deployment/initialization places the data being consumed local to each parallel process.
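For reference, Amdahl's Law (restated here in its standard form, not quoted from the article) predicts the speedup S achievable with N processors when a fraction p of the program's work can be parallelized:

S(N) = \frac{1}{(1 - p) + \frac{p}{N}}

As a worked example, with p = 0.95 and N = 16, S(16) = 1 / (0.05 + 0.95/16) ≈ 9.1; and since S(N) approaches 1 / (1 - p) as N grows, no processor count can push this program's speedup beyond 20.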
What does the future hold?

Obviously, this final section runs a considerable risk of "crystal ball gazing" and getting it wrong. However, there are a number of clear directions in which both hardware and software will develop, all of which make it clear that parallel programming will play an ever more important role in our computational future. Besides, it has become critical for us to be able to process vast amounts of information within a short window of time in order to ensure our own individual and collective safety. For example, we are experiencing increased momentum towards significant climate change and extreme weather events, and we will therefore require increasingly accurate weather prediction to help us deal with this; that will only be possible with highly efficient parallel algorithms. In order to gaze into the future, we need to look back at the past. The hardware technology available to parallel computing has evolved at a phenomenal pace through the years. The levels of performance that can be achieved today by single-chip designs are truly staggering in terms of recent history.

The history of HPC

For an excellent infographic review of the development of computing performance, I would urge you to visit the following web page: http://pages.experts-exchange.com/processing-power-compared/ It beautifully illustrates, for example, how an iPhone 4 released in 2010 has near-equivalent performance to the Cray-2 supercomputer from 1985, at around 1.5 gigaflops, and how the Apple Watch released in 2015 has around twice the performance of the iPhone 4 and the Cray-2!

While chip manufacturers have managed to maintain the famous Moore's law, which predicts that transistor counts double every two years, we are now at 14 nanometers (nm) in chip production, giving us around 100 complex processing cores in a single chip. In July 2015, IBM announced a prototype chip built at 7 nm (1/10,000th the width of a human hair). Some scientists suggest that quantum tunneling effects will start to have an impact at 5 nm (which Intel expects to bring to the market by 2020), although a number of research groups have demonstrated individual transistors as small as 1 nm in the lab, built from materials such as graphene. What all of this suggests is that placing 1,000 independent high-performance computational cores, together with sufficient high-speed cache memory, inside a single-chip package comparable in size to today's chips could be possible within the next 10 years.

NVIDIA and Intel are arguably at the forefront of dedicated HPC chip development, with their respective offerings used in the world's fastest supercomputers; both can also be embedded in your desktop computer. NVIDIA produces Tesla, whose K80 GPU-based accelerator now peaks at 1.87 teraflops double precision and 5.6 teraflops single precision, utilizing 4,992 cores (dual processor) and 24 GB of on-board RAM. Intel produces Xeon Phi, the collective family brand name for its Many Integrated Core (MIC) architecture; the new Knights Landing is expected to peak at 3 teraflops double precision and 6 teraflops single precision, utilizing 72 cores (single processor) and 16 GB of highly integrated on-chip fast memory when it is released, likely in fall 2015.
The successors to these chips, namely NVIDIA's Volta and Intel's Knights Hill, will be the foundation for the next generation of American $200 million supercomputers in 2018, delivering around 150 to 300 petaflops of peak performance (around 150 million iPhone 4s), compared to China's TIANHE-2, ranked the fastest supercomputer in the world in 2015 with a peak performance of around 50 petaflops from 3.1 million cores.

At the other extreme, in the somewhat smaller and less expensive world of mobile devices, most devices currently use between two and four cores, though mixed multicore designs such as ARM's big.LITTLE octa-core make eight cores available. This number is already on the increase: for example, MediaTek's new SoC MT6797 has 10 main processing cores, split into a pair and two groups of four cores with different clock speeds and power requirements, to serve as the basis for the next generation of mobile phones. Top-end mobile devices therefore exhibit a rich heterogeneous architecture, with mixed-power cores, separate sensor chips, GPUs, and Digital Signal Processors (DSPs) to direct different aspects of the workload to the most power-efficient component. Mobile phones increasingly act as the communications hub and signal-processing gateway for a plethora of additional devices, such as biometric wearables and the rapidly expanding number of ultra-low-power Internet of Things (IoT) sensing devices smartening all aspects of our local environment. While we are a little way from running R itself natively on mobile devices, the time will come when we seek to harness the distributed computing power of all our mobile devices. In 2014 alone, around 1.25 billion smartphones were sold. That's a lot of crowd-sourced compute power, and it potentially far outstrips any dedicated supercomputer on the planet, either existing or planned.

The software that enables us to utilize parallel systems, which, as we noted, are increasingly heterogeneous, continues to evolve. In this book, we have examined how you can utilize OpenCL from R to gain access to both the GPU and the CPU, making it possible to perform mixed computation across both components and exploit the particular strengths of each for certain types of processing. Indeed, a related initiative, Heterogeneous System Architecture (HSA), which enables even lower-level access to the spectrum of processor capabilities, may well gain traction over the coming years and help promote the uptake of OpenCL and its counterparts.

HSA Foundation

The HSA Foundation was founded by a cross-industry group led by AMD, ARM, Imagination, MediaTek, Qualcomm, Samsung, and Texas Instruments. Its stated goal is to help support the creation of applications that seamlessly blend scalar processing on the CPU, parallel processing on the GPU, and optimized processing on the DSP via high-bandwidth shared memory access, enabling greater application performance at low power consumption. To enable this, the HSA Foundation is defining key interfaces for parallel computation using CPUs, GPUs, DSPs, and other programmable and fixed-function devices, thus supporting a diverse set of high-level programming languages and creating the next generation of general-purpose computing.
You can find the recently released version 1.0 of the HSA specification at the following link: http://www.hsafoundation.com/html/HSA_Library.htm

Hybrid parallelism

As a final wrap-up, I thought I would show how you can overcome some of the inherently single-threaded nature of R even further and demonstrate a hybrid approach to parallelism that combines two of the techniques we covered previously within a single R program. We've also discussed how heterogeneous computing is potentially the way of the future. This example refers to the code we would develop to utilize MPI through pbdMPI together with ROpenCL, enabling us to exploit both the CPU and the GPU simultaneously. While this is a slightly contrived example, in which both devices compute the same dist() function, the intention is to show you just how far you can take things with R to get the most out of all your available compute resources. Basically, all we need to do is top and tail our OpenCL implementation of the dist() function with the appropriate pbdMPI initialization and termination, and run the script with mpiexec on two processes, as follows:

# Initialise both ROpenCL and pbdMPI
require(ROpenCL)
library(pbdMPI, quietly = TRUE)
init()

# Select device based on my MPI rank
r <- comm.rank()
if (r == 0) { # use gpu
  device <- 1
} else { # use cpu
  device <- 2
}
...
# Execute the OpenCL dist() function on my assigned device
comm.print(sprintf("%d executing on device %s", r, getDeviceType(deviceID)), all.rank = TRUE)
res <- teval(openclDist(kernel))
comm.print(sprintf("%d done in %f secs", r, res$Duration), all.rank = TRUE)
finalize()

This is simple and very effective!

Summary

In this article, we looked into the crystal ball and saw the prospects for the combination of heterogeneous compute hardware that exists today and will expand in capability even further in the future, not only in our supercomputers and laptops but also in our personal devices. Parallelism is the only way these systems can be utilized effectively. As the volume of new quantified-self and environmentally derived data increases, and as the number of cores in our compute architectures continues to rise, so does the importance of being able to write parallel programs to make use of it all. Job security for parallel programmers looks good for many years to come!

Resources for Article:

Further resources on this subject:
Multiplying Performance with Parallel Computing [article]
Training and Visualizing a neural network with R [article]
Big Data Analysis (R and Hadoop) [article]

Testing Your Application with cljs.test

In this article written by David Jarvis, Rafik Naccache, and Allen Rohner, authors of the book Learning ClojureScript, we'll take a look at how to configure our ClojureScript application or library for testing. As usual, we'll start by creating a new project to play around with:

$ lein new figwheel testing

(For more resources related to this topic, see here.)

We'll be working in a test directory. Most JVM Clojure projects will have one already, but since the default Figwheel template doesn't include a test directory, let's make one first (following the same convention used with source directories; that is, instead of src/$PROJECT_NAME, we'll create test/$PROJECT_NAME):

$ mkdir -p test/testing

We'll now want to make sure that Figwheel knows it has to watch the test directory for file modifications. To do that, we will edit the dev build in our project's :cljsbuild map in project.clj so that its :source-paths vector includes both src and test. Your new dev build configuration should look like the following:

{:id "dev"
 :source-paths ["src" "test"]
 ;; If no code is to be run, set :figwheel true for continued automagical reloading
 :figwheel {:on-jsload "testing.core/on-js-reload"}
 :compiler {:main testing.core
            :asset-path "js/compiled/out"
            :output-to "resources/public/js/compiled/testing.js"
            :output-dir "resources/public/js/compiled/out"
            :source-map-timestamp true}}

Next, we'll get the familiar Figwheel REPL going so that we have our usual hot reloading:

$ cd testing
$ rlwrap lein figwheel

Don't forget to navigate a browser window to http://localhost:3449/ to get the browser REPL to connect. Now, let's create a new core_test.cljs file in the test/testing directory. By convention, most libraries and applications in Clojure and ClojureScript have test files that correspond to source files with the suffix _test. In this project, this means that test/testing/core_test.cljs is intended to contain the tests for src/testing/core.cljs. Let's get started by running tests on a single file. Inside core_test.cljs, add the following code:

(ns testing.core-test
  (:require [cljs.test :refer-macros [deftest is]]))

(deftest i-should-fail
  (is (= 1 0)))

(deftest i-should-succeed
  (is (= 1 1)))

This code first requires two of the most important cljs.test macros and then gives us two simple examples of what a failing test and a passing test look like. At this point, we can run our tests from the Figwheel REPL:

cljs.user=> (require 'testing.core-test)
;; => nil
cljs.user=> (cljs.test/run-tests 'testing.core-test)

Testing testing.core-test

FAIL in (i-should-fail) (cljs/test.js?zx=icyx7aqatbda:430:14)
expected: (= 1 0)
actual: (not (= 1 0))

Ran 2 tests containing 2 assertions.
1 failures, 0 errors.
;; => nil

What we've got at this point is tolerable, but it's not really practical for testing a larger application. We don't want to have to test our application in the REPL and pass in our test namespaces one by one. The current idiomatic solution for this in ClojureScript is to write a separate test runner that is responsible for importing and then running all of your tests. Let's take a look at what this looks like. We'll start by creating another test namespace.
Let's call this one app_test.cljs, and we'll put the following in it:

(ns testing.app-test
  (:require [cljs.test :refer-macros [deftest is]]))

(deftest another-successful-test
  (is (= 4 (count "test"))))

We're not doing anything remarkable here; it's just another test namespace with a single test that should pass by itself. Let's quickly make sure that's the case at the REPL:

cljs.user=> (require 'testing.app-test)
nil
cljs.user=> (cljs.test/run-tests 'testing.app-test)

Testing testing.app-test

Ran 1 tests containing 1 assertions.
0 failures, 0 errors.
;; => nil

Perfect. Now, let's write a test runner. Let's open a new file that we'll simply call test_runner.cljs and include the following:

(ns testing.test-runner
  (:require [cljs.test :refer-macros [run-tests]]
            [testing.app-test]
            [testing.core-test]))

;; This isn't strictly necessary, but is a good idea depending
;; upon your application's ultimate runtime engine.
(enable-console-print!)

(defn run-all-tests []
  (run-tests 'testing.app-test
             'testing.core-test))

Again, nothing surprising: we're just defining a single function that runs all of our tests. This is handy for us at the REPL:

cljs.user=> (testing.test-runner/run-all-tests)

Testing testing.app-test

Testing testing.core-test

FAIL in (i-should-fail) (cljs/test.js?zx=icyx7aqatbda:430:14)
expected: (= 1 0)
actual: (not (= 1 0))

Ran 3 tests containing 3 assertions.
1 failures, 0 errors.
;; => nil

Ultimately, however, we want something we can run at the command line so that we can use it in a continuous integration environment. There are a number of ways we could configure this directly, but if we're clever, we can let someone else do the heavy lifting for us. Enter doo, the handy ClojureScript testing plugin for Leiningen.

Using doo for easier testing configuration

doo is a library and Leiningen plugin for running cljs.test in many different JavaScript environments. It makes it easy to test your ClojureScript regardless of whether you're writing for the browser or for the server, and it also includes file-watching capabilities, like Figwheel's, so that you can automatically rerun tests on file changes. The doo project page can be found at https://github.com/bensu/doo. To configure our project to use doo, we first need to add it to the list of plugins in our project.clj file. Modify the :plugins key so that it looks like the following:

:plugins [[lein-figwheel "0.5.2"]
          [lein-doo "0.1.6"]
          [lein-cljsbuild "1.1.3" :exclusions [[org.clojure/clojure]]]]

Next, we will add a new cljsbuild build configuration for our test runner. Add the following build map after the dev build map we've been working with until now:

{:id "test"
 :source-paths ["src" "test"]
 :compiler {:main testing.test-runner
            :output-to "resources/public/js/compiled/testing_test.js"
            :optimizations :none}}

This configuration tells cljsbuild to use both our src and test directories, just like our dev profile. It adds some different configuration elements to the compiler options, however. First, we're no longer using testing.core as our main namespace; instead, we use our test runner's namespace, testing.test-runner. We also direct the output JavaScript file to a different location from our compiled application code. Lastly, we make sure to pass in :optimizations :none so that the compiler runs quickly and doesn't have to do any magic to look things up.
Note that our currently running Figwheel process won't know that we've added lein-doo to our list of plugins or that we've added a new build configuration. If you want to make Figwheel aware of doo in a way that'll allow the two to play nicely together, you should also add doo as a dependency to your project. Once you've done that, exit the Figwheel process and restart it after you've saved the changes to project.clj. Lastly, we need to modify our test runner namespace so that it's compatible with doo. To do this, open test_runner.cljs and change it to the following:

(ns testing.test-runner
  (:require [doo.runner :refer-macros [doo-tests]]
            [testing.app-test]
            [testing.core-test]))

;; This isn't strictly necessary, but is a good idea depending
;; upon your application's ultimate runtime engine.
(enable-console-print!)

(doo-tests 'testing.app-test
           'testing.core-test)

This shouldn't look too different from our original test runner; we're just importing from doo.runner rather than cljs.test and using doo-tests instead of a custom runner function. The doo-tests runner works very similarly to cljs.test/run-tests, but it places hooks around the tests so that it knows when they start and finish. We're also putting this call at the top level of our namespace rather than wrapping it in a function. The last thing we need to do is install a JavaScript runtime with which to execute our tests. Up until now, we've been using the browser via Figwheel, but ideally we want to be able to run our tests in a headless environment as well. For this purpose, we recommend installing PhantomJS (though other execution environments are also fine). If you're on OS X and have Homebrew installed (http://www.brew.sh), installing PhantomJS is as simple as typing brew install phantomjs. If you're not on OS X or don't have Homebrew, you can find instructions on how to install PhantomJS on the project's website at http://phantomjs.org/. The key thing is that the following should work:

$ phantomjs -v
2.0.0

Once you've got PhantomJS installed, you can invoke your test runner from the command line with the following:

$ lein doo phantom test once

;; ======================================================================
;; Testing with Phantom:

Testing testing.app-test

Testing testing.core-test

FAIL in (i-should-fail) (:)
expected: (= 1 0)
actual: (not (= 1 0))

Ran 3 tests containing 3 assertions.
1 failures, 0 errors.
Subprocess failed

Let's break down this command. The first part, lein doo, just tells Leiningen to invoke the doo plugin. Next, we have phantom, which tells doo to use PhantomJS as its runtime environment. The doo plugin supports a number of other environments, including Chrome, Firefox, Internet Explorer, Safari, Opera, SlimerJS, NodeJS, Rhino, and Nashorn. Be aware that if you're interested in running doo on one of these other environments, you may have to configure and install additional software. For instance, if you want to run tests on Chrome, you'll need to install Karma as well as the appropriate Karma npm modules to enable Chrome interaction. Next we have test, which refers to the cljsbuild build ID we set up earlier. Lastly, we have once, which tells doo to just run the tests and not set up a filesystem watcher. If, instead, we wanted doo to watch the filesystem and rerun tests on any changes, we would just use lein doo phantom test.

Testing fixtures

The cljs.test project has support for adding fixtures to your tests that can run before and after your tests.
Test fixtures are useful for establishing an isolated state between tests. For instance, you can use fixtures to set up a specific database state before each test and to tear it down afterward. You can add them to your ClojureScript tests by declaring them with the use-fixtures macro within the testing namespace you want the fixtures applied to. Let's see what this looks like in practice by changing one of our existing tests and adding some fixtures to it. Modify app_test.cljs to the following:

(ns testing.app-test
  (:require [cljs.test :refer-macros [deftest is use-fixtures]]))

;; Run these fixtures for each test.
;; We could also use :once instead of :each in order to run
;; fixtures once for the entire namespace instead of once for
;; each individual test.
(use-fixtures :each
  {:before (fn [] (println "Setting up tests..."))
   :after  (fn [] (println "Tearing down tests..."))})

(deftest another-successful-test
  ;; Give us an idea of when this test actually executes.
  (println "Running a test...")
  (is (= 4 (count "test"))))

Here, we've added a call to use-fixtures that prints to the console before and after running the test, and we've added a println call to the test itself so that we know when it executes. Now when we run this test, we get the following:

$ lein doo phantom test once

;; ======================================================================
;; Testing with Phantom:

Testing testing.app-test
Setting up tests...
Running a test...
Tearing down tests...

Testing testing.core-test

FAIL in (i-should-fail) (:)
expected: (= 1 0)
actual: (not (= 1 0))

Ran 3 tests containing 3 assertions.
1 failures, 0 errors.
Subprocess failed

Note that our fixtures get called in the order we expect them to.

Asynchronous testing

Because client-side code is frequently asynchronous and JavaScript is single threaded, we need a way to support asynchronous tests. For this, we can use the async macro from cljs.test. Let's take a look at an example using an asynchronous HTTP GET request. First, let's modify our project.clj file to add cljs-ajax to our dependencies. Our :dependencies key should now look something like this:

:dependencies [[org.clojure/clojure "1.8.0"]
               [org.clojure/clojurescript "1.7.228"]
               [cljs-ajax "0.5.4"]
               [org.clojure/core.async "0.2.374"
                :exclusions [org.clojure/tools.reader]]]

Next, let's create a new async_test.cljs file in our test/testing directory. Inside it, we will add the following code:

(ns testing.async-test
  (:require [ajax.core :refer [GET]]
            [cljs.test :refer-macros [deftest is async]]))

(deftest test-async
  (GET "http://www.google.com"
       ;; will always fail from PhantomJS because
       ;; `Access-Control-Allow-Origin` won't allow
       ;; our headless browser to make requests to Google.
       {:error-handler
        (fn [res]
          (is (= (:status-text res) "Request failed."))
          (println "Test finished!"))}))

Note that we're not using async in our test at the moment. Let's try running this test with doo (don't forget that you have to add testing.async-test to test_runner.cljs!):

$ lein doo phantom test once
...
Testing testing.async-test
...
Ran 4 tests containing 3 assertions.
1 failures, 0 errors.
Subprocess failed

Our test here passes, but note that the println in the asynchronous callback never fires, and our additional assertion doesn't get called (looking back at our previous examples, since we've added a new is assertion, we should expect to see four assertions in the final summary)!
If we actually want our test to validate the error-handler callback within the context of the test, we need to wrap it in an async block. Doing so gives us a test that looks like the following:

(deftest test-async
  (async done
    (GET "http://www.google.com"
         ;; will always fail from PhantomJS because
         ;; `Access-Control-Allow-Origin` won't allow
         ;; our headless browser to make requests to Google.
         {:error-handler
          (fn [res]
            (is (= (:status-text res) "Request failed."))
            (println "Test finished!")
            (done))})))

Now, let's try to run our tests again:

$ lein doo phantom test once

...

Testing testing.async-test
Test finished!

...

Ran 4 tests containing 4 assertions.
1 failures, 0 errors.
Subprocess failed

Awesome! Note that this time we see the printed statement from our callback, and we can see that cljs.test properly ran all four of our assertions.

Asynchronous fixtures

One final "gotcha" on testing—the fixtures we talked about earlier in this article do not handle asynchronous code automatically. This means that if you have a :before fixture that executes asynchronous logic, your tests can begin running before your fixture has completed! To get around this, all you need to do is wrap your :before fixture in an async block, just like with asynchronous tests. Consider the following, for instance:

(use-fixtures :once
  {:before #(async done ... (done))
   :after  #(do ...)})

Summary

This concludes our section on cljs.test. Testing, whether in ClojureScript or any other language, is a critical software engineering best practice to ensure that your application behaves the way you expect it to and to protect you and your fellow developers from accidentally introducing bugs to your application. With cljs.test and doo, you have the power and flexibility to test your ClojureScript application with multiple browsers and JavaScript environments and to integrate your tests into a larger continuous testing framework.

Resources for Article:

Further resources on this subject:
Clojure for Domain-specific Languages - Design Concepts with Clojure [article]
Visualizing my Social Graph with d3.js [article]
Improving Performance with Parallel Programming [article]
Introducing Dynamics CRM
In this article by Nicolae Tarla, the author of Microsoft Dynamics CRM 2016 Customization, you will learn about the Customer Relationship Management (CRM) market and the huge uptake it has seen in the last few years. Some of the drivers for this market are the need to enhance customer experience, provide faster and better services, and adapt to the customer's growing digital presence. CRM systems, in general, are taking a central place in new organizational initiatives.

Dynamics CRM is Microsoft's response to a growing trend. The newest version is Dynamics CRM 2016, and it is offered in a variety of deployment scenarios, from the standard on-premise deployment to a private cloud or an online cloud offering from Microsoft. The choice depends on each customer, the type of project, and a large number of requirements, policies, and legal restrictions.

We'll first look at what environment we need to complete the examples presented. We will create a new environment based on a Microsoft Dynamics CRM Online trial. This approach gives us a free 30-day trial to experiment with. The following topics will be covered:

- Introducing Dynamics CRM
- Dynamics CRM features
- Deployment models
- Global datacenter locations
- Customization requirements
- Getting set up

Dynamics CRM 2016 is the current version of the popular Customer Relationship Management platform offered by Microsoft. This platform offers users the ability to integrate and connect data across their sales, marketing, and customer service activities, and to give staff an overall 360-degree view of all interactions and activities as they relate to a specific customer. Along with the standard platform functionality provided, we have a wide range of customization options, allowing us to extend and further customize solutions to solve a majority of other business requirements. In addition, we can integrate this platform with other applications and create a seamless solution.

While by no means the only available CRM platform on the market today, Microsoft Dynamics CRM 2016 is one of the fastest growing, gaining large acceptance at all levels, from small to mid-size and enterprise-level organizations. This is due to a multitude of reasons, some of which include the variety of deployment options, the scalability, the extensibility, the ease of integration with other systems, and the ease of use.

Microsoft Dynamics CRM can be deployed in a variety of ways. Starting with the offering from Microsoft, you can get CRM Online. Once a 30-day trial is active, it can easily be turned into a full production environment by providing payment information and keeping the environment active. The data will live in the cloud, in one of the data centers provided by Microsoft.

Alternatively, you can obtain hosting with a third-party provider. The whole environment can be hosted by a third party, and the service can be offered either as a SaaS solution or a fully hosted environment. Usually, there is a difference in the way payment is processed, with a SaaS solution in most cases being offered on a monthly subscription model.

Another option is to have the environment hosted in-house. This option is called on-premise deployment and carries the highest up-front cost but gives you the ability to customize the system extensively.
In addition to the higher up-front cost, the costs to maintain the environment, the hardware, and the skilled people required to constantly administer the environment can easily add up. More recently, we have also gained the ability to host a virtual CRM environment in Azure. This offloads the cost of maintaining the local infrastructure in a fashion similar to a third-party-hosted solution, but takes advantage of the scalability and performance of a large cloud solution maintained and supported fully by Microsoft. The following white paper released by Microsoft describes the deployment model using Azure Virtual Machines: http://www.microsoft.com/en-us/download/details.aspx?id=49193

Features of Dynamics CRM

Some of the most notable features of the Dynamics CRM platform include:

- Scalability
- Extensibility
- Ability to integrate with other systems
- Ease of use

Let's look at each of these features in more detail.

Scalability

Dynamics CRM can scale over a wide range of deployment options. From a single-box deployment, used mostly for development, all the way to a cloud offering that can span a large number of servers and host a large number of environments, the same base solution handles all the scenarios in between with ease.

Extensibility

Dynamics CRM is a platform whose base offering comes with prepackaged functionality for Sales, Service, and Marketing, and a large variety of solutions can be built on top of it. The extensibility model is called xRM and allows power users, non-developers, and developers alike to build custom solutions to handle various other business scenarios or integrate with other third-party platforms.

The Dynamics CRM Marketplace is a great example of such solutions that are built to extend the core platform and are offered for sale by various companies. These companies are called Independent Software Vendors (ISVs) and play a very important role in the ecosystem created by Microsoft. In time and with enough experience, some of them become the go-to partners for various implementations.

If nothing else, the Dynamics Marketplace is a cool place to look at some of the solutions created and to search for specific applications. The idea of the marketplace became public sometime around 2010 and was integrated into Dynamics CRM 2011. At launch, it was designed as a searchable repository of solutions. It is a win-win for solution providers and customers alike. Solutions can also be rated, giving customers better community feedback before committing to purchasing and implementing a foreign solution into their organization.

The Dynamics Marketplace is hosted on Pinpoint, Microsoft's online directory of software applications and professional services. On this platform, independent companies and certified partners offer their products and services. At the time of this writing, Pinpoint hosts a few marketplaces, including Office, Azure, Dynamics, and Cloud, and is available at the following location: https://pinpoint.microsoft.com/en-CA/Home/Dynamics

Navigating to the Dynamics page, you are presented with a search option. You then have the option to filter your results by Solution providers, Services, or Apps (Applications).
In addition, you can further filter your results by distance to a geo-location derived from an address or postal code, as well as by other categories. When searching for a solution provider, the results provide a high-level view of the organization, with a logo and a short description. The Ratings and Competencies counts are displayed for easy visibility.

Drilling down into the partner profile page, you can find additional details on the organization, its industry focus, and its competencies, as well as a way to connect with the organization. Navigation to additional details, including Reviews and Locations, is available on the profile page.

The Dynamics Marketplace is also available, starting with Dynamics CRM 2011, as a part of the organization. A user with the necessary permission can navigate to Settings | Dynamics Marketplace. This presents the user with a view of the available solutions. Options for sorting and filtering include Popular, Newest, and Featured. Community ratings are clearly visible and provide the necessary feedback to consider when evaluating new solutions.

Ability to integrate with other systems

There is a large variety of integration options available when working with Dynamics CRM, and the various deployment options offer more or fewer integration features. With CRM Online, you tend to get more integration options into cloud services, whereas the on-premise solution has a limited number of configurable integration options but can provide more integration using various third-party tools. The base solution comes with the ability to configure integration with the following common services:

- SharePoint for document management
- Yammer for social features

In addition, you can use specific connectors provided by either Microsoft or other third-party providers for integration with specific solutions. When the preceding options are not available, you can still integrate with other solutions using a third-party integration tool. This allows real-time integration into legacy systems. Some of the most popular tools used for integration include, but are not limited to:

- Kingsway Software (https://www.kingswaysoft.com/)
- Scribe (http://www.scribesoft.com/)
- BizTalk (http://www.microsoft.com/en-us/server-cloud/products/biztalk/)

Ease of use

Dynamics CRM offers users a variety of options to interact with the system. You can access Dynamics CRM through a browser, with support for all recent versions of the major browsers. The following browsers and versions are supported:

- Internet Explorer: versions 10 and above
- Edge: latest version
- Chrome: latest version on Windows 7 and above
- Firefox: latest version on Windows 7 and above
- Safari on Mac: the latest publicly released version on OS X 10.8 and above

In addition, a user can interact with the system directly from the very familiar interface of Outlook. The Dynamics CRM connector for Outlook gives users access to all the system data and features from within Outlook, and a set of functions built specifically for Outlook allows users to track and interact with e-mails, tasks, and events. Further to the features provided through the Outlook integration, users of CRM for Outlook have the ability to work offline: data can be taken offline, work can be done while disconnected, and changes can be synchronized back into the system when connectivity resumes. For mobile users, Dynamics CRM can be accessed from mobile devices and tablets.
Dynamics CRM provides a standard web-based interface for most mobile devices, as well as specific applications for various platforms, including Windows-based tablets, iPads, and Android tablets. With these apps, you can also take a limited subset of cached data offline, as well as create new records and synchronize them back to CRM the next time you go online. The quality of these mobile offerings has increased exponentially over the last few versions, and new features are being added with each new release. In addition, third-party providers have also built mobile solutions for Dynamics CRM. A quick search in each platform's application market will reveal several options.

Global Data Centre Locations for Dynamics CRM Online

Dynamics CRM Online is hosted at various locations in the world. Preview organizations can be created in all available locations, but features are sometimes rolled out on a schedule, in some locations faster than others. The format of the Dynamics CRM Online organization URL describes the data center location. The standard format is as follows:

https://OrganizationName.crm[x].dynamics.com

The OrganizationName is the name you have selected for your online organization. This is customizable and is validated for uniqueness within the respective data center. The [x] represents a number; as of this writing, this number can be 2, 4, 5, 6, 7, or 9, or there may be no number at all. It identifies the global data center used to host your organization. The following list maps each URL format to its global data center location:

- crm.dynamics.com: NAM
- crm2.dynamics.com: SAM
- crm4.dynamics.com: EMEA
- crm5.dynamics.com: APAC
- crm6.dynamics.com: OCE
- crm7.dynamics.com: JPN
- crm9.dynamics.com: GCC

For example, an organization named contoso hosted in the EMEA data center would be reachable at https://contoso.crm4.dynamics.com.

Out of these global locations, the following usually get a preview of new features first:

- crm.dynamics.com: North America
- crm4.dynamics.com: Europe, the Middle East and Africa
- crm5.dynamics.com: Asia-Pacific

New data centers are being added on a regular basis. As of this writing, new data centers are being added in Europe and Canada, with others to follow as needed. Some of the drivers behind adding these new data centers revolve not only around performance improvements, as a data center located closer to a customer will theoretically provide better performance, but also around the need for privacy and localization of data. Strict legislation around data residency has a great impact on the selection of the deployment model by customers who are bound to store all data locally, in the country of operation. Overall, by the end of 2016, the plan is to have Dynamics CRM Online available in 105 markets. These markets (countries) will be served by data centers spread across five generic global regions. These data centers share services between Dynamics CRM Online and other services such as Azure and Office 365.

Advantages of choosing Dynamics CRM Online

Choosing one of the available hosting models for Dynamics CRM is now not only a matter of preference; the decision can be driven by multiple factors. During the last few years, there has been a huge push for the cloud. Microsoft has been very focused on enhancing their online offering and has continued to push more functionality and more resources into supporting the cloud model. As such, Dynamics CRM Online has become a force to reckon with. It is hosted on a very modern and high-performing infrastructure.
Microsoft has invested billions of dollars in new data centers and infrastructure. This allows new customers to forgo the infrastructure expenses associated with an on-premise deployment. Along with these investments in infrastructure, the SLA (service level agreement) offered by Dynamics CRM Online is financially backed by Microsoft: depending on the service selected, uptime is guaranteed and backed financially. The application and infrastructure are handled for you by Microsoft, so you don't have to handle them yourself. This translates into much lower upfront costs, as well as reduced costs around ongoing maintenance and upgrades.

The Dynamics CRM Online offering is also compliant with various regulatory requirements, backed and verified through third-party tests. Rules, regulations, and policies in various locales are validated and certified by independent organizations. Some of the compliance policies evaluated include, but are not limited to:

- Data Privacy and Confidentiality Policies
- Data Classification
- Information Security
- Privacy
- Data Stewardship
- Secure Infrastructure
- Identity and Access Control

All these compliance requirements are in conformance with regulations stipulated by the International Organization for Standardization and other international and local standards. Independent auditors validate standards compliance, and Microsoft is ISO 27001 certified. The Microsoft Trust Center website located at http://www.microsoft.com/en-us/trustcenter/CloudServices/Dynamics provides additional information on compliance, responsibilities, and warranties.

Further to the aforementioned benefits, choosing the cloud over a standard on-premise deployment offers other advantages around scalability, faster time to market, and a higher value proposition. In addition to the standard benefits of an online deployment, one other great advantage is the ability to spin up a 30-day trial instance of Dynamics CRM Online and convert it to a paid instance only when ready to go to production. This allows customizers and companies to get started and customize their solution in a free environment, with no additional costs attached. The 30-day trial gives us a 25-license instance, which allows us to not only customize the organization but also test various roles and restrictions.

Summary

We learned to create a new environment based on a Microsoft Dynamics CRM Online trial.

Resources for Article:

Further resources on this subject:
Customization in Microsoft Dynamics CRM [article]
Introduction to Reporting in Microsoft Dynamics CRM [article]
Using Processes in Microsoft Dynamics CRM 2011 [article]
Web Server Development
In this article, Holger Brunn, Alexandre Fayolle, and Daniel Eufémio Gago Reis, the authors of the book Odoo Development Cookbook, discuss web server development in Odoo. We'll cover the following topics:

- Make a path accessible from the network
- Restrict access to web accessible paths
- Consume parameters passed to your handlers
- Modify an existing handler
- Using the RPC API

Introduction

We'll introduce the basics of the web server part of Odoo in this article. Note that this article covers the fundamental pieces. All of Odoo's web request handling is driven by the Python library werkzeug (http://werkzeug.pocoo.org). While the complexity of werkzeug is mostly hidden by Odoo's convenient wrappers, it is an interesting read to see how things work under the hood.

Make a path accessible from the network

In this recipe, we'll see how to make a URL of the form http://yourserver/path1/path2 accessible to users. This can either be a web page or a path returning arbitrary data to be consumed by other programs. In the latter case, you would usually use the JSON format to consume parameters and to return data.

Getting ready

We'll make use of a ready-made library.book model. We want to allow any user to query the full list of books. Furthermore, we want to provide the same information to programs via a JSON request.

How to do it…

We'll need to add controllers, which by convention go into a folder called controllers.

Add a controllers/main.py file with the HTML version of our page:

from openerp import http
from openerp.http import request

class Main(http.Controller):
    @http.route('/my_module/books', type='http', auth='none')
    def books(self):
        records = request.env['library.book'].sudo().search([])
        result = '<html><body><table><tr><td>'
        result += '</td></tr><tr><td>'.join(
            records.mapped('name'))
        result += '</td></tr></table></body></html>'
        return result

Add a function to serve the same information in the JSON format:

    @http.route('/my_module/books/json', type='json', auth='none')
    def books_json(self):
        records = request.env['library.book'].sudo().search([])
        return records.read(['name'])

Add the file controllers/__init__.py:

from . import main

Add controllers to your addon's __init__.py:

from . import controllers

After restarting your server, you can visit /my_module/books in your browser and get presented with a flat list of book names. To test the JSON-RPC part, you'll have to craft a JSON request. A simple way to do that would be using the following command line to receive the output on the command line:

curl -i -X POST -H "Content-Type: application/json" -d "{}" localhost:8069/my_module/books/json

If you get 404 errors at this point, you probably have more than one database available on your instance. In this case, it's impossible for Odoo to determine which database is meant to serve the request. Use the --db-filter='^yourdatabasename$' parameter to force the exact database you installed the module in. Now the path should be accessible.

How it works…

The two crucial parts here are that our controller is derived from openerp.http.Controller and that the methods we use to serve content are decorated with openerp.http.route. Inheriting from openerp.http.Controller registers the controller with Odoo's routing system in a similar way to how models are registered by inheriting from openerp.models.Model; Controller, too, has a meta class that takes care of this.
In general, paths handled by your addon should start with your addon's name to avoid name clashes. Of course, if you extend some addon's functionality, you'll use that addon's name.

openerp.http.route

The route decorator allows us to tell Odoo that a method is to be web accessible in the first place, and its first parameter determines on which path it is accessible. Instead of a string, you can also pass a list of strings in case you use the same function to serve multiple paths. The type argument defaults to http and determines what type of request is to be served. While strictly speaking JSON is HTTP, declaring the second function as type='json' makes life a lot easier, because Odoo then handles type conversions itself. Don't worry about the auth parameter for now; it will be addressed in the recipe Restrict access to web accessible paths.

Return values

Odoo's treatment of the functions' return values is determined by the type argument of the route decorator. For type='http', we usually want to deliver some HTML, so the first function simply returns a string containing it. An alternative is to use request.make_response(), which gives you control over the headers to send in the response. So, to indicate when our page was last updated, we might change the last line in books() to the following:

return request.make_response(
    result, [
        ('Last-modified', email.utils.formatdate(
            (
                fields.Datetime.from_string(
                    request.env['library.book'].sudo()
                    .search([], order='write_date desc', limit=1)
                    .write_date) -
                datetime.datetime(1970, 1, 1)
            ).total_seconds(),
            usegmt=True)),
    ])

This code sends a Last-modified header along with the HTML we generated, telling the browser when the list was last modified. We extract this information from the write_date field of the library.book model. In order for the preceding snippet to work, you'll have to add some imports at the top of the file:

import email
import datetime
from openerp import fields

You can also create a werkzeug Response object manually and return that, but there's little gain for the effort.

Generating HTML manually is nice for demonstration purposes, but you should never do this in production code. Always use templates as appropriate and return them by calling request.render(). This will give you localization for free and makes your code better by separating business logic from the presentation layer. Also, templates provide you with functions to escape data before outputting HTML. The preceding code is vulnerable to cross-site scripting attacks if a user manages to slip a script tag into a book name, for example.

For a JSON request, simply return the data structure you want to hand over to the client; Odoo takes care of serialization. For this to work, you should restrict yourself to data types that are JSON serializable, which are roughly dictionaries, lists, strings, floats, and integers.

openerp.http.request

The request object is a static object referring to the currently handled request, which contains everything you need to take useful action. Most important is the property request.env, which contains an Environment object that is just the same as self.env in models. This environment is bound to the current user, which is none in the preceding example because we used auth='none'. The lack of a user is also why we have to sudo() all our calls to model methods in the example code. If you're used to web development, you'll expect session handling, which is perfectly correct.
Use request.session for an OpenERPSession object (which is quite a thin wrapper around werkzeug's Session object), and request.session.sid to access the session ID. To store session values, just treat request.session as a dictionary:

request.session['hello'] = 'world'
request.session.get('hello')

Note that storing data in the session is no different from using global variables. Use it only if you must; that is usually the case for multi-request actions, such as a checkout in the website_sale module. And even in this case, handle all functionality concerning sessions in your controllers, never in your modules.

There's more…

The route decorator can take some extra parameters to customize its behavior further. By default, all HTTP methods are allowed, and Odoo intermingles the parameters passed. Using the parameter methods, you can pass a list of methods to accept, which usually would be either ['GET'] or ['POST'].

To allow cross-origin requests (browsers block AJAX and some other types of requests to domains other than the one the script was loaded from, for security and privacy reasons), set the cors parameter to * to allow requests from all origins, or to some URI to restrict requests to ones originating from that URI. If this parameter is unset, which is the default, the Access-Control-Allow-Origin header is not set, leaving you with the browser's standard behavior. In our example, we might want to set it on /my_module/books/json in order to allow scripts pulled from other websites to access the list of books.

By default, Odoo protects certain types of requests from an attack known as cross-site request forgery by passing a token along on every request. If you want to turn that off, set the parameter csrf to False, but note that this is a bad idea in general.

See also

If you host multiple Odoo databases on the same instance and each database has different web accessible paths on possibly multiple domain names per database, the standard regular expressions in the --db-filter parameter might not be enough to force the right database for every domain. In that case, use the community module dbfilter_from_header from https://github.com/OCA/server-tools in order to configure the database filters at the proxy level.

To see how using templates makes modularity possible, see the recipe Modify an existing handler later in this article.

Restrict access to web accessible paths

We'll explore the three authentication mechanisms Odoo provides for routes in this recipe. We'll define routes with different authentication mechanisms in order to show their differences.

Getting ready

As we extend code from the previous recipe, we'll also depend on the library.book model, so you should get its code correct in order to proceed.
How to do it…

Define handlers in controllers/main.py.

Add a path that shows all books:

@http.route('/my_module/all-books', type='http', auth='none')
def all_books(self):
    records = request.env['library.book'].sudo().search([])
    result = '<html><body><table><tr><td>'
    result += '</td></tr><tr><td>'.join(
        records.mapped('name'))
    result += '</td></tr></table></body></html>'
    return result

Add a path that shows all books and indicates which were written by the current user, if any:

@http.route('/my_module/all-books/mark-mine', type='http', auth='public')
def all_books_mark_mine(self):
    records = request.env['library.book'].sudo().search([])
    result = '<html><body><table>'
    for record in records:
        result += '<tr>'
        if record.author_ids & request.env.user.partner_id:
            result += '<th>'
        else:
            result += '<td>'
        result += record.name
        if record.author_ids & request.env.user.partner_id:
            result += '</th>'
        else:
            result += '</td>'
        result += '</tr>'
    result += '</table></body></html>'
    return result

Add a path that shows the current user's books:

@http.route('/my_module/all-books/mine', type='http', auth='user')
def all_books_mine(self):
    records = request.env['library.book'].search([
        ('author_ids', 'in', request.env.user.partner_id.ids),
    ])
    result = '<html><body><table><tr><td>'
    result += '</td></tr><tr><td>'.join(
        records.mapped('name'))
    result += '</td></tr></table></body></html>'
    return result

With this code, the paths /my_module/all-books and /my_module/all-books/mark-mine look the same for unauthenticated users, while a logged-in user sees her books in a bold font on the latter path. The path /my_module/all-books/mine is not accessible at all for unauthenticated users; if you try to access it without being authenticated, you'll be redirected to the login screen.

How it works…

The difference between the authentication methods is basically what you can expect from the content of request.env.user.

For auth='none', the user record is always empty, even if an authenticated user is accessing the path. Use this if you want to serve content that has no dependencies on users, or if you want to provide database-agnostic functionality in a server-wide module.

The value auth='public' sets the user record to a special user with the XML ID base.public_user for unauthenticated users, and to the user's own record for authenticated ones. This is the right choice if you want to offer functionality to both unauthenticated and authenticated users, while the authenticated ones get some extras, as demonstrated in the preceding code.

Use auth='user' to be sure that only authenticated users have access to what you've got to offer. With this method, you can be sure request.env.user points to an existing user.

There's more…

The magic for authentication methods happens in the ir.http model from the base addon. For whatever value you pass to the auth parameter in your route, Odoo searches for a function called _auth_method_<yourvalue> on this model, so you can easily customize this by inheriting this model and declaring a method that takes care of your authentication method of choice.
As an example, we provide an authentication method base_group_user which enforces a currently logged-in user who is a member of the group with the XML ID base.group_user:

from openerp import exceptions, http, models
from openerp.http import request

class IrHttp(models.Model):
    _inherit = 'ir.http'

    def _auth_method_base_group_user(self):
        self._auth_method_user()
        if not request.env.user.has_group('base.group_user'):
            raise exceptions.AccessDenied()

Now you can say auth='base_group_user' in your decorator and be sure that users running this route's handler are members of this group. With a little trickery, you can extend this to auth='groups(xmlid1,…)'; the implementation of this is left as an exercise to the reader, but is included in the example code.

Consume parameters passed to your handlers

It's nice to be able to show content, but it's better to show content as a result of some user input. This recipe will demonstrate the different ways to receive this input and react to it. As in the recipes before, we'll make use of the library.book model.

How to do it…

First, we'll add a route that expects a traditional parameter with a book's ID to show some details about it. Then, we'll do the same, but we'll incorporate our parameter into the path itself.

Add a path that expects a book's ID as a parameter:

@http.route('/my_module/book_details', type='http', auth='none')
def book_details(self, book_id):
    record = request.env['library.book'].sudo().browse(
        int(book_id))
    return u'<html><body><h1>%s</h1>Authors: %s' % (
        record.name,
        u', '.join(record.author_ids.mapped('name')) or 'none',
    )

Add a path where we can pass the book's ID in the path:

@http.route("/my_module/book_details/<model('library.book'):book>",
            type='http', auth='none')
def book_details_in_path(self, book):
    return self.book_details(book.id)

If you point your browser to /my_module/book_details?book_id=1, you should see a detail page for the book with ID 1. If this book doesn't exist, you'll receive an error page. The second handler allows you to go to /my_module/book_details/1 and view the same page.

How it works…

By default, Odoo (actually werkzeug) intermingles GET and POST parameters and passes them as keyword arguments to your handler. So, by simply declaring your function as expecting a parameter called book_id, you introduce this parameter as either a GET (the parameter in the URL) or POST (usually passed by forms with your handler as action) parameter. Given that we didn't add a default value for this parameter, the runtime will raise an error if you try to access this path without setting the parameter.

The second example makes use of the fact that, in a werkzeug environment, most paths are virtual anyway. So we can simply define our path as containing some input. In this case, we say we expect the ID of a library.book as the last component of the path. The name after the colon is the name of a keyword argument; our function will be called with this parameter passed as a keyword argument. Here, Odoo takes care of looking up this ID and delivering a browse record, which of course only works if the user accessing this path has the appropriate permissions. Given that book is a browse record, we can simply recycle the first example's function by passing book.id as the parameter book_id to give out the same content.

There's more…

Defining parameters within the path is functionality delivered by werkzeug, known as converters.
The model converter is added by Odoo, which also defines the converter models, which accepts a comma-separated list of IDs and passes a record set containing those IDs to your handler. The beauty of converters is that the runtime coerces the parameters to the expected type, while you're on your own with normal keyword parameters: these are delivered as strings, and you have to take care of the necessary type conversions yourself, as seen in the first example. Built-in werkzeug converters include int, float, and string, but also more intricate ones such as path, any, or uuid. You can look up their semantics at http://werkzeug.pocoo.org/docs/0.11/routing/#builtin-converters.

See also

Odoo's custom converters are defined in ir_http.py in the base module and registered in the _get_converters method of ir.http. As an exercise, you can create your own converter that allows you to visit the /my_module/book_details/Odoo+cookbook page to receive the details of this book (if you added it to your library before).

Modify an existing handler

When you install the website module, the path /website/info displays some information about your Odoo instance. In this recipe, we override this in order to change the information page's layout, but also to change what is displayed.

Getting ready

Install the website module and inspect the path /website/info. Now craft a new module that depends on website and uses the following code.

How to do it…

We'll have to adapt the existing template and override the existing handler.

Override the QWeb template in a file called views/templates.xml:

<?xml version="1.0" encoding="UTF-8"?>
<odoo>
  <template id="show_website_info"
            inherit_id="website.show_website_info">
    <xpath expr="//dl[@t-foreach='apps']" position="replace">
      <table class="table">
        <tr t-foreach="apps" t-as="app">
          <th>
            <a t-att-href="app.website">
              <t t-esc="app.name" /></a>
          </th>
          <td><t t-esc="app.summary" /></td>
        </tr>
      </table>
    </xpath>
  </template>
</odoo>

Override the handler in a file called controllers/main.py:

from openerp import http
from openerp.addons.website.controllers.main import Website

class Website(Website):
    @http.route()
    def website_info(self):
        result = super(Website, self).website_info()
        result.qcontext['apps'] = result.qcontext[
            'apps'].filtered(
            lambda x: x.name != 'website')
        return result

Now, when visiting the info page, we'll only see a filtered list of installed applications, and in a table as opposed to the original definition list.

How it works…

In the first step, we override an existing QWeb template. In order to find out which one that is, you'll have to consult the code of the original handler. Usually, it will end with a line like the following, which tells you that you need to override template.name:

return request.render('template.name', values)

In our case, the handler uses a template called website.info, but this one is immediately extended by another template called website.show_website_info, so it's more convenient to override that one. Here, we replace the definition list showing installed apps with a table.

In order to override the handler method, we must identify the class that defines the handler, which is openerp.addons.website.controllers.main.Website in this case. We import the class to be able to inherit from it. Now we override the method and change the data passed to the response. Note that what the overridden handler returns is a Response object, and not a string of HTML as the previous recipes did for the sake of brevity.
This object contains a reference to the template to be used and the values accessible to the template, but is only evaluated at the very end of the request. In general, there are three ways to change an existing handler:

- If it uses a QWeb template, the simplest way of changing it is to override the template. This is the right choice for layout changes and small logic changes.
- QWeb templates get a context passed, which is available in the response as the field qcontext. This is usually a dictionary where you can add or remove values to suit your needs. In the preceding example, we filter the list of apps to exclude the website app itself.
- If the handler receives parameters, you could also preprocess those in order to have the overridden handler behave the way you want.

There's more…

As seen in the preceding section, inheritance with controllers works slightly differently from model inheritance: you actually need a reference to the base class and use Python inheritance on it.

Don't forget to decorate your new handler with the @http.route decorator; Odoo uses it as a marker for which methods are exposed to the network layer. If you omit the decorator, you actually make the handler's path inaccessible. The @http.route decorator itself behaves similarly to field declarations: every value you don't set will be derived from the decorator of the function you're overriding, so we don't have to repeat values we don't want to change.

After receiving a response object from the function you override, you can do a lot more than just changing the QWeb context:

- You can add or remove HTTP headers by manipulating response.headers.
- If you want to render an entirely different template, you can set response.template.
- To detect whether a response is based on QWeb in the first place, query response.is_qweb.
- The resulting HTML code is available by calling response.render().

Using the RPC API

One of Odoo's strengths is its interoperability, which is helped by the fact that basically any functionality is available via JSON-RPC 2.0 and XMLRPC. In this recipe, we'll explore how to use both of them from client code. This interface also enables you to integrate Odoo with any other application. Making functionality available via either of the two protocols on the server side is explained in the There's more section of this recipe. We'll query a list of installed modules from the Odoo instance, so that we could show a list like the one displayed in the previous recipe in our own application or website.
How to do it…

The following code is not meant to run within Odoo, but as simple standalone scripts.

First, we query the list of installed modules via XMLRPC:

#!/usr/bin/env python2
import xmlrpclib

db = 'odoo9'
user = 'admin'
password = 'admin'
uid = xmlrpclib.ServerProxy(
    'http://localhost:8069/xmlrpc/2/common').authenticate(
    db, user, password, {})
odoo = xmlrpclib.ServerProxy(
    'http://localhost:8069/xmlrpc/2/object')
installed_modules = odoo.execute_kw(
    db, uid, password,
    'ir.module.module', 'search_read',
    [[('state', '=', 'installed')], ['name']],
    {'context': {'lang': 'fr_FR'}})
for module in installed_modules:
    print module['name']

Then we do the same with JSONRPC:

import json
import urllib2

db = 'odoo9'
user = 'admin'
password = 'admin'
request = urllib2.Request(
    'http://localhost:8069/web/session/authenticate',
    json.dumps({
        'jsonrpc': '2.0',
        'params': {
            'db': db,
            'login': user,
            'password': password,
        },
    }),
    {'Content-type': 'application/json'})
result = urllib2.urlopen(request).read()
result = json.loads(result)
session_id = result['result']['session_id']

request = urllib2.Request(
    'http://localhost:8069/web/dataset/call_kw',
    json.dumps({
        'jsonrpc': '2.0',
        'params': {
            'model': 'ir.module.module',
            'method': 'search_read',
            'args': [
                [('state', '=', 'installed')],
                ['name'],
            ],
            'kwargs': {'context': {'lang': 'fr_FR'}},
        },
    }),
    {
        'X-Openerp-Session-Id': session_id,
        'Content-type': 'application/json',
    })
result = urllib2.urlopen(request).read()
result = json.loads(result)
for module in result['result']:
    print module['name']

Both code snippets will print a list of installed modules, and because they pass a context that sets the language to French, the list will be in French where translations are available.

How it works…

Both snippets call the function search_read, which is very convenient because you can specify a search domain on the model you call, pass a list of fields you want returned, and receive the result in one request. In older versions of Odoo, you had to call search first to receive a list of IDs and then call read to actually read the data. search_read returns a list of dictionaries, with the keys being the names of the fields requested and the values the record's data. The ID field will always be transmitted, whether you requested it or not.

Now, we need to look at the specifics of the two protocols.

XMLRPC

The XMLRPC API expects a user ID and a password for every call, which is why we need to fetch this ID via the method authenticate on the path /xmlrpc/2/common. If you already know the user's ID, you can skip this step. As soon as you know the user's ID, you can call any model's method by calling execute_kw on the path /xmlrpc/2/object. This method expects the database you want to execute the function on, the user's ID and password for authentication, then the model you want to call your function on, and then the function's name. The next two mandatory parameters are a list of positional arguments to your function and a dictionary of keyword arguments.

JSONRPC

Don't be distracted by the size of the code example; that's because Python doesn't have built-in support for JSONRPC. As soon as you've wrapped the urllib2 calls in a small helper function, the example becomes as concise as the XMLRPC one (a sketch of such a helper follows below). As JSONRPC is stateful, the first thing we have to do is request a session at /web/session/authenticate. This function takes the database, the user's name, and their password.
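Such a helper might look like the following sketch. The function name json_rpc and its exact signature are our own invention for illustration; they are not part of Odoo or the standard library, and the body simply factors out the boilerplate from the preceding script:

import json
import urllib2

def json_rpc(url, params, headers=None):
    # hypothetical helper wrapping the JSON-RPC 2.0 boilerplate:
    # serialize the payload, send the POST request, and unwrap
    # the 'result' key of the response
    all_headers = {'Content-type': 'application/json'}
    all_headers.update(headers or {})
    request = urllib2.Request(
        url,
        json.dumps({'jsonrpc': '2.0', 'params': params}),
        all_headers)
    return json.loads(urllib2.urlopen(request).read())['result']

With this in place, authenticating collapses to a single call such as json_rpc('http://localhost:8069/web/session/authenticate', {'db': db, 'login': user, 'password': password}), and the call_kw request shrinks in the same way, with the session ID passed via the headers parameter.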
The crucial part of the session flow is that we record the session ID Odoo created, which we pass in the header X-Openerp-Session-Id to /web/dataset/call_kw. Then the function behaves the same as execute_kw from the XMLRPC API: we need to pass a model name and a function to call on it, then positional and keyword arguments.

There's more…

Both protocols allow you to call basically any function of your models. In case you don't want a function to be available via either interface, prepend its name with an underscore; Odoo won't expose such functions as RPC calls. Furthermore, you need to take care that your parameters, as well as the return values, are serializable for the protocol. To be sure, restrict yourself to scalar values, dictionaries, and lists.

As you can do roughly the same with both protocols, it's up to you which one to use. This decision should be driven mainly by what your platform supports best. In a web context, you're generally better off with JSON, because Odoo allows JSON handlers to pass a CORS header conveniently (see the Make a path accessible from the network recipe for details). This is rather difficult with XMLRPC.

Summary

In this article, we got started with Odoo's web server architecture. We covered routes and controllers and their authentication, how handlers consume parameters, and how to use the RPC APIs, namely JSON-RPC and XML-RPC.

Resources for Article:

Further resources on this subject:
Advanced React [article]
Remote Authentication [article]
ASP.Net Site Performance: Improving JavaScript Loading [article]
Setting Up and Cleaning Up
This article, by Mani Tadayon, author of the book RSpec Essentials, discusses support code that sets tests up and cleans up after them. Initialization, configuration, cleanup, and other support code related to RSpec specs are important in real-world RSpec usage. We will learn how to cleanly organize support code in real-world applications by learning about the following topics:

- Configuring RSpec with spec_helper.rb
- Initialization and configuration of resources
- Preventing tests from accessing the Internet with WebMock
- Maintaining clean test state
- Custom helper code
- Loading support code on demand with tags

Configuring RSpec with spec_helper.rb

The RSpec specs that we've seen so far have functioned as standalone units. Specs in the real world, however, almost never work without supporting code to prepare the test environment before tests are run and to ensure it is cleaned up afterwards. In fact, the first line of nearly every real-world RSpec spec file loads a file that takes care of initialization, configuration, and cleanup:

require 'spec_helper'

By convention, the entry point for all support code for specs is a file called spec_helper.rb. Another convention is that specs are located in a folder called spec in the root folder of the project. The spec_helper.rb file is located in the root of this spec folder.

Now that we know where it goes, what do we actually put in spec_helper.rb? Let's start with an example:

# spec/spec_helper.rb
require 'rspec'

RSpec.configure do |config|
  config.order            = 'random'
  config.profile_examples = 3
end

To see what these two options do, let's create a couple of dummy spec files that include our spec_helper.rb. Here's the first spec file:

# spec/first_spec.rb
require 'spec_helper'

describe 'first spec' do
  it 'sleeps for 1 second' do
    sleep 1
  end

  it 'sleeps for 2 seconds' do
    sleep 2
  end

  it 'sleeps for 3 seconds' do
    sleep 3
  end
end

And here's our second spec file:

# spec/second_spec.rb
require 'spec_helper'

describe 'second spec' do
  it 'sleeps for 4 seconds' do
    sleep 4
  end

  it 'sleeps for 5 seconds' do
    sleep 5
  end

  it 'sleeps for 6 seconds' do
    sleep 6
  end
end

Now let's run our two spec files and see what happens. Note that we used --format documentation when running RSpec so that we see the order in which the tests were run (the default format just outputs a green dot for each passing test). From the output, we can see that the tests were run in a random order. We can also see the three slowest specs.

Although this was a toy example, I would recommend using both of these configuration options for RSpec. Running examples in a random order is very important, as it is the only reliable way of detecting bad tests which sometimes pass and sometimes fail based on the order in which the overall test suite is run. Also, keeping tests running fast is very important for maintaining a productive development flow, and seeing which tests are slow on every test run is the most effective way of encouraging developers to make the slow tests fast, or remove them from the test run. We'll return to both test order and test speed later. For now, let us just note that RSpec configuration is very important to keeping our specs reliable and fast.

Initialization and configuration of resources

Real-world applications rely on resources, such as databases, and external services, such as HTTP APIs.
These must be initialized and configured for the application to work properly. When writing tests, dealing with these resources and services can be a challenge because of two opposing fundamental interests.

First, we would like the test environment to match the production environment as closely as possible, so that tests that interact with resources and services are realistic. For example, we may use a powerful database system in production that runs on many servers to provide the best performance. Should we spend money and effort to create and maintain a second production-grade database environment just for testing purposes?

Second, we would like the test environment to be simple and relatively easy to understand, so that we understand what we are actually testing. We would also like to keep our code modular so that components can be tested in isolation, or in simpler environments that are easier to create, maintain, and understand. If we think of the example of the system that relies on a database cluster in production, we may ask ourselves whether we are better off using a single-server setup for our test database. We could even go so far as to use an entirely different database for our tests, such as the file-based SQLite.

As always, there are no easy answers to such trade-offs. The important thing is to understand the costs and benefits, and to adjust where we are on the continuum between production faithfulness and test simplicity as our system evolves, along with the goals it serves. For example, for a small hobbyist application or a project with a limited budget, we may choose to completely favor test simplicity. As the same code grows to become a successful fan site or a big-budget project, we may have a much lower tolerance for failure, and have both the motivation and resources to shift towards production faithfulness for our test environment.

Some rules of thumb to keep in mind:

- Unit tests are better places for test simplicity
- Integration tests are better places for production faithfulness
- Try to cleverly increase production faithfulness in unit tests
- Try to cleverly increase test simplicity in integration tests
- In between unit and integration tests, be clear about what is and isn't faithful to the production environment

A case study of test simplicity with an external service

Let's put these ideas into practice. I haven't changed the application code, except to rename the module OldWeatherQuery. The test code is also slightly changed to require a spec_helper file and to use a subject block to define an alias for the module name, which makes it easier to rename the code without having to change many lines of test code. So let's look at our three files now. First, here's the application code:

# old_weather_query.rb

require 'net/http'
require 'json'
require 'timeout'

module OldWeatherQuery
  extend self

  class NetworkError < StandardError
  end

  def forecast(place, use_cache=true)
    add_to_history(place)

    if use_cache
      cache[place] ||= begin
        @api_request_count += 1
        JSON.parse( http(place) )
      end
    else
      JSON.parse( http(place) )
    end
  rescue JSON::ParserError
    raise NetworkError.new("Bad response")
  end

  def api_request_count
    @api_request_count ||= 0
  end

  def history
    (@history || []).dup
  end

  def clear!
    @history           = []
    @cache             = {}
    @api_request_count = 0
  end

  private

  def add_to_history(s)
    @history ||= []
    @history << s
  end

  def cache
    @cache ||= {}
  end

  BASE_URI = 'http://api.openweathermap.org/data/2.5/weather?q='

  def http(place)
    uri = URI(BASE_URI + place)

    Net::HTTP.get(uri)
  rescue Timeout::Error
    raise NetworkError.new("Request timed out")
  rescue URI::InvalidURIError
    raise NetworkError.new("Bad place name: #{place}")
  rescue SocketError
    raise NetworkError.new("Could not reach #{uri.to_s}")
  end
end

Next is the spec file:

# spec/old_weather_query_spec.rb

require_relative 'spec_helper'
require_relative '../old_weather_query'

describe OldWeatherQuery do
  subject(:weather_query) { described_class }

  describe 'caching' do
    let(:json_response) do
      '{"weather" : { "description" : "Sky is Clear"}}'
    end

    around(:example) do |example|
      actual = weather_query.send(:cache)
      expect(actual).to eq({})

      example.run

      weather_query.clear!
    end

    it "stores results in local cache" do
      weather_query.forecast('Malibu,US')

      actual = weather_query.send(:cache)
      expect(actual.keys).to eq(['Malibu,US'])
      expect(actual['Malibu,US']).to be_a(Hash)
    end

    it "uses cached result in subsequent queries" do
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
    end
  end

  describe 'query history' do
    before do
      expect(weather_query.history).to eq([])
      allow(weather_query).to receive(:http).and_return("{}")
    end

    after do
      weather_query.clear!
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.history).to eq(places)
    end

    it "does not allow history to be modified" do
      expect {
        weather_query.history = ['Malibu,CN']
      }.to raise_error

      weather_query.history << 'Malibu,CN'
      expect(weather_query.history).to eq([])
    end
  end

  describe 'number of API requests' do
    before do
      expect(weather_query.api_request_count).to eq(0)
      allow(weather_query).to receive(:http).and_return("{}")
    end

    after do
      weather_query.clear!
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.api_request_count).to eq(3)
    end

    it "does not allow count to be modified" do
      expect {
        weather_query.api_request_count = 100
      }.to raise_error

      expect {
        weather_query.api_request_count += 10
      }.to raise_error

      expect(weather_query.api_request_count).to eq(0)
    end
  end
end

And last but not least, our spec_helper file, which has also changed only slightly: we only configure RSpec to show one slow spec (to keep test results uncluttered) and use color in the output to distinguish passes and failures more easily:

# spec/spec_helper.rb

require 'rspec'

RSpec.configure do |config|
  config.order            = 'random'
  config.profile_examples = 1
  config.color            = true
end

When we run these specs, something unexpected happens.
Most of the time the specs pass, but sometimes they fail. If we keep running the specs with the same command, we'll see the tests pass and fail apparently at random. These are flaky tests, and we have exposed them because of the random-order configuration we chose. If our tests run in a certain order, they fail. The problem could simply be in our tests; for example, we could have forgotten to clear state before or after a test. However, there could also be a problem with our code. In any case, we need to get to the bottom of the situation.

We first notice that at the end of the failing test run, RSpec tells us "Randomized with seed 318". We can use this information to run the tests in the order that caused the failure and start to debug and diagnose the problem. We do this by passing the --seed parameter with the value 318, as follows:

$ rspec spec/old_weather_query_spec.rb --seed 318

The problem has to do with the way we increment @api_request_count without ensuring it has been initialized. Looking at our code, we notice that the only places we initialize @api_request_count are in OldWeatherQuery.api_request_count and OldWeatherQuery.clear!. If we don't call either of these methods first, then OldWeatherQuery.forecast, the main method in this module, will always fail. Our tests sometimes pass because our setup code calls one of these methods first when tests are run in a certain order, but that is not at all how our code would likely be used in production. So basically, our code is completely broken, but our specs pass (sometimes). Based on this, we can create a simple spec that will always fail:

describe 'api_request is not initialized' do
  it "does not raise an error" do
    weather_query.forecast('Malibu,US')
  end
end

At least now our tests fail deterministically. But this is not the end of our troubles with these specs. If we run our tests many times with the seed value of 318, we will start seeing a second failing test case that is even more random than the first. This is an OldWeatherQuery::NetworkError, and it indicates that our tests are actually making HTTP requests to the Internet! Let's do an experiment to confirm this. We'll turn off our Wi-Fi access, unplug our Ethernet cables, and run our specs. When we run our tests without any Internet access, we see three errors in total. One of them is the error with the uninitialized @api_request_count instance variable, and two of them are instances of OldWeatherQuery::NetworkError, which confirms that we are indeed making real HTTP requests in our code.

What's so bad about making requests to the Internet? After all, the test failures are indeed very random, and we had to purposely shut off our Internet access to replicate the errors. Flaky tests are actually the least of our problems. First, we could be performing destructive actions that affect real systems, accounts, and people! Imagine if we were testing an e-commerce application that charged customer credit cards by using a third-party payment API via HTTP. If our tests actually hit our payment provider's API endpoint over HTTP, we would get a lot of declined transactions (assuming we are not storing and using real credit card numbers), which could lead to our account being suspended due to suspicions of fraud, putting our e-commerce application out of service.
Also, if we were running a continuous integration (CI) server such as Jenkins that did not have access to the public Internet, we would get failures in our CI builds from tests that attempted to access the Internet.

There are a few approaches to solving this problem. In our tests, we attempted to mock our HTTP requests, but obviously failed to do so effectively. A second approach is to allow actual HTTP requests but to point them at a special server set up for testing purposes. Let's focus on figuring out why our HTTP mocks were not successful. In a small set of tests like in this example, it is not hard to hunt down the places where we are sending actual HTTP requests. In larger code bases with a lot of test support code, it may be harder. Also, it would be nice to prevent access to the Internet altogether so that we notice these issues as soon as we run the offending tests. Fortunately, Ruby has many excellent tools for testing, and there is one that addresses our needs exactly: WebMock (https://github.com/bblimke/webmock). We simply install the gem and add a couple of lines to our spec helper file to disable all network connections in our tests:

require 'rspec'

# require the webmock gem
require 'webmock/rspec'

RSpec.configure do |config|
  # this is done by default, but let's make it clear
  WebMock.disable_net_connect!

  config.order            = 'random'
  config.profile_examples = 1
  config.color            = true
end

When we run our tests again, we'll see one or more instances of WebMock::NetConnectNotAllowedError, along with a backtrace leading us to the point in our tests where the HTTP request was made.

If we examine our test code, we'll notice that we mock the OldWeatherQuery.http method in a few places. However, we forgot to set up the mock in the first describe block for caching, where we defined a json_response object but never mocked OldWeatherQuery.http to return it. We can solve the problem by mocking OldWeatherQuery.http throughout the entire test file. We'll also take this opportunity to clean up the initialization of @api_request_count in our code. Here's what we have now:

# new_weather_query.rb

require 'net/http'
require 'json'
require 'timeout'

module NewWeatherQuery
  extend self

  class NetworkError < StandardError
  end

  def forecast(place, use_cache=true)
    add_to_history(place)
    if use_cache
      cache[place] ||= begin
        increment_api_request_count
        JSON.parse( http(place) )
      end
    else
      JSON.parse( http(place) )
    end
  rescue JSON::ParserError => e
    raise NetworkError.new("Bad response: #{e.inspect}")
  end

  def increment_api_request_count
    @api_request_count ||= 0
    @api_request_count += 1
  end

  def api_request_count
    @api_request_count ||= 0
  end

  def history
    (@history || []).dup
  end

  def clear!
    @history           = []
    @cache             = {}
    @api_request_count = 0
  end

  private

  def add_to_history(s)
    @history ||= []
    @history << s
  end

  def cache
    @cache ||= {}
  end

  BASE_URI = 'http://api.openweathermap.org/data/2.5/weather?q='

  def http(place)
    uri = URI(BASE_URI + place)

    Net::HTTP.get(uri)
  rescue Timeout::Error
    raise NetworkError.new("Request timed out")
  rescue URI::InvalidURIError
    raise NetworkError.new("Bad place name: #{place}")
  rescue SocketError
    raise NetworkError.new("Could not reach #{uri.to_s}")
  end
end

And here is the spec file to go with it:

# spec/new_weather_query_spec.rb

require_relative 'spec_helper'
require_relative '../new_weather_query'

describe NewWeatherQuery do
  subject(:weather_query) { described_class }

  after { weather_query.clear! }

  let(:json_response) { '{}' }

  before do
    allow(weather_query).to receive(:http).and_return(json_response)
  end

  describe 'api_request is initialized' do
    it "does not raise an error" do
      weather_query.forecast('Malibu,US')
    end
  end

  describe 'caching' do
    let(:json_response) do
      '{"weather" : { "description" : "Sky is Clear"}}'
    end

    around(:example) do |example|
      actual = weather_query.send(:cache)
      expect(actual).to eq({})

      example.run
    end

    it "stores results in local cache" do
      weather_query.forecast('Malibu,US')

      actual = weather_query.send(:cache)
      expect(actual.keys).to eq(['Malibu,US'])
      expect(actual['Malibu,US']).to be_a(Hash)
    end

    it "uses cached result in subsequent queries" do
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
    end
  end

  describe 'query history' do
    before do
      expect(weather_query.history).to eq([])
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.history).to eq(places)
    end

    it "does not allow history to be modified" do
      expect {
        weather_query.history = ['Malibu,CN']
      }.to raise_error

      weather_query.history << 'Malibu,CN'
      expect(weather_query.history).to eq([])
    end
  end

  describe 'number of API requests' do
    before do
      expect(weather_query.api_request_count).to eq(0)
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.api_request_count).to eq(3)
    end

    it "does not allow count to be modified" do
      expect {
        weather_query.api_request_count = 100
      }.to raise_error

      expect {
        weather_query.api_request_count += 10
      }.to raise_error

      expect(weather_query.api_request_count).to eq(0)
    end
  end
end

Now we've fixed a major bug in our code that slipped through our specs and used to pass randomly. We've made it so that our tests always pass, regardless of the order in which they are run, and without needing to access the Internet. Our test code and application code have also become clearer, as we've reduced duplication in a few places.
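As a side note, mocking our own http method is only one way to keep the specs offline. WebMock can also stub requests at the HTTP level, which catches even the code paths we forget to mock. Here is a minimal sketch, assuming we want any GET request to the OpenWeatherMap host to return an empty JSON body (the URL pattern and canned response are illustrative, not part of our actual suite):

# inside a spec or a before hook, with webmock/rspec already required
stub_request(:get, /api\.openweathermap\.org/).
  to_return(status: 200, body: '{}',
            headers: { 'Content-Type' => 'application/json' })

Because this stub sits at the network layer, it acts as a safety net underneath method-level mocks like ours.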
A case study of production faithfulness with a test resource instance

We're not done with our WeatherQuery example just yet. Let's take a look at how we would add a simple database to store our cached values. There are some serious limitations to the way we are caching with instance variables, which persist only within the scope of a single Ruby process. As soon as we stop or restart our app, the entire cache will be lost. In a production app, we would likely have many processes running the same code in order to serve traffic effectively. With our current approach, each process would have a separate cache, which would be very inefficient. We could save many HTTP requests if we were able to share the cache between processes and across restarts.

Economizing on these requests is not simply a matter of improved response time. We also need to consider that we cannot make unlimited requests to external services. For commercial services, we would pay for the number of requests we make. For free services, we are likely to get throttled if we exceed some threshold. Therefore, an effective caching scheme that reduces the number of HTTP requests we make to external services is of vital importance to the function of a real-world app.

Finally, our cache is very simplistic and has no expiration mechanism short of clearing all entries. For a cache to be effective, we need to be able to store entries for individual locations for some period of time within which we don't expect the weather forecast to change much. This will keep the cache small and up to date.

We'll use Redis (http://redis.io) as our database since it is very fast, simple, and easy to set up. You can find instructions on the Redis website on how to install it, which is an easy process on any platform. Once you have Redis installed, you simply need to start the server locally, which you can do with the redis-server command. We'll also need to install the Redis Ruby client as a gem (https://github.com/redis/redis-rb).

Let's start with a separate configuration file to set up our Redis client for our tests:

# spec/config/redis.rb

require 'rspec'
require 'redis'

ENV['WQ_REDIS_URL'] ||= 'redis://localhost:6379/15'

RSpec.configure do |config|
  if ! ENV['WQ_REDIS_URL'].is_a?(String)
    raise "WQ_REDIS_URL environment variable not set"
  end

  ::REDIS_CLIENT = Redis.new( :url => ENV['WQ_REDIS_URL'] )

  # the :redis tag restricts this hook to examples tagged with redis: true
  config.after(:example, :redis => true) do
    ::REDIS_CLIENT.flushdb
  end
end

Note that we place this file in a new config folder under our main spec folder. The idea is to configure each resource separately in its own file to keep everything isolated and easy to understand. This will make maintenance easy and prevent problems with configuration management down the road. We don't do much in this file, but we do establish some important conventions. A single environment variable takes care of the Redis connection URL. By using an environment variable, we make it easy to change configuration and also allow flexibility in how these configurations are stored. Our code doesn't care if the Redis connection URL is stored in a simple .env file with key-value pairs or loaded from a configuration database.
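For example, if we loaded environment variables with a dotenv-style tool, a minimal .env file (illustrative only; any key-value mechanism works equally well) would need just one line:

# .env -- illustrative; mirrors the default used in spec/config/redis.rb
WQ_REDIS_URL=redis://localhost:6379/15

Nothing in our code or specs would have to change if we later moved this value into a configuration service.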
We can also easily override this value manually simply by setting it when we run RSpec, like so:

$ WQ_REDIS_URL=redis://1.2.3.4:4321/0 rspec spec

Note that we also set a sensible default value, which is to run on the default Redis port of 6379 on our local machine, using database number 15, which is less likely to be used for local development. This prevents our tests from relying on our development database, or from polluting or destroying it.

It is also worth mentioning that we prefix our environment variable with WQ (short for weather query). Small details like this are very important for keeping our code easy to understand and for preventing dangerous clashes. Imagine the kinds of confusion and clashes that could be caused if we relied on REDIS_URL and had multiple apps running on the same server, all relying on Redis. It would be very easy to break many applications if we changed the value of REDIS_URL for a single app to point to a different instance of Redis.

We set a global constant, ::REDIS_CLIENT, to point to a Redis client. We will use this in our code to connect to Redis. Note that in real-world code, we would likely have a global namespace for the entire app and we would define globals such as REDIS_CLIENT under that namespace rather than in the global Ruby namespace.

Finally, we configure RSpec to call the flushdb command after every example tagged with :redis to empty the database and keep state clean across tests. In our example, all tests interact with Redis, so this tag seems pointless. However, it is very likely that we would add code that had nothing to do with Redis, and using tags helps us constrain the scope of our configuration hooks to only where they are needed. This also prevents confusion about multiple hooks running for the same example. In general, we want to avoid global hooks and instead make configuration hooks explicitly triggered wherever possible.

So what does our spec look like now? Actually, it is almost exactly the same. Only a few lines have changed to work with the new Redis cache. See if you can spot them!

# spec/redis_weather_query_spec.rb

require_relative 'spec_helper'
require_relative '../redis_weather_query'

describe RedisWeatherQuery, redis: true do
  subject(:weather_query) { described_class }

  after { weather_query.clear! }
  let(:json_response) { '{}' }

  before do
    allow(weather_query).to receive(:http).and_return(json_response)
  end

  describe 'api_request is initialized' do
    it "does not raise an error" do
      weather_query.forecast('Malibu,US')
    end
  end

  describe 'caching' do
    let(:json_response) do
      '{"weather" : { "description" : "Sky is Clear"}}'
    end

    around(:example) do |example|
      actual = weather_query.send(:cache).all
      expect(actual).to eq({})

      example.run
    end

    it "stores results in local cache" do
      weather_query.forecast('Malibu,US')

      actual = weather_query.send(:cache).all
      expect(actual.keys).to eq(['Malibu,US'])
      expect(actual['Malibu,US']).to be_a(Hash)
    end

    it "uses cached result in subsequent queries" do
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
    end
  end

  describe 'query history' do
    before do
      expect(weather_query.history).to eq([])
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.history).to eq(places)
    end

    it "does not allow history to be modified" do
      expect {
        weather_query.history = ['Malibu,CN']
      }.to raise_error

      weather_query.history << 'Malibu,CN'
      expect(weather_query.history).to eq([])
    end
  end

  describe 'number of API requests' do
    before do
      expect(weather_query.api_request_count).to eq(0)
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.api_request_count).to eq(3)
    end

    it "does not allow count to be modified" do
      expect {
        weather_query.api_request_count = 100
      }.to raise_error

      expect {
        weather_query.api_request_count += 10
      }.to raise_error

      expect(weather_query.api_request_count).to eq(0)
    end
  end
end

So what about the actual WeatherQuery code? It changes very little as well:

# redis_weather_query.rb

require 'net/http'
require 'json'
require 'timeout'

# require the new cache module
require_relative 'redis_weather_cache'

module RedisWeatherQuery
  extend self

  class NetworkError < StandardError
  end

  # ... same as before ...

  def clear!
    @history           = []
    @api_request_count = 0

    # no more clearing of cache here
  end

  private

  # ... same as before ...

  # the new cache module has a Hash-like interface
  def cache
    RedisWeatherCache
  end

  # ... same as before ...

end

We can see that we've preserved pretty much the same code and specs as before. Almost all of the new functionality is accomplished in a new module that caches with Redis. Here is what it looks like:

# redis_weather_cache.rb

require 'redis'
require 'json' # needed for JSON.generate and JSON.parse below

module RedisWeatherCache
  extend self

  CACHE_KEY             = 'weather_query:cache'
  EXPIRY_ZSET_KEY       = 'weather_query:expiry_tracker'
  EXPIRE_FORECAST_AFTER = 300 # 5 minutes

  def redis_client
    if ! defined?(::REDIS_CLIENT)
      raise("No REDIS_CLIENT defined!")
    end

    ::REDIS_CLIENT
  end

  def []=(location, forecast)
    redis_client.hset(CACHE_KEY, location, JSON.generate(forecast))
    redis_client.zadd(EXPIRY_ZSET_KEY, Time.now.to_i, location)
  end

  def [](location)
    remove_expired_entries

    raw_value = redis_client.hget(CACHE_KEY, location)

    if raw_value
      JSON.parse(raw_value)
    else
      nil
    end
  end

  def all
    redis_client.hgetall(CACHE_KEY).inject({}) do |memo, (location, forecast_json)|
      memo[location] = JSON.parse(forecast_json)
      memo
    end
  end

  def clear!
    redis_client.del(CACHE_KEY)
  end

  def remove_expired_entries
    # expired locations have a score (creation timestamp) below the threshold
    expired_locations = redis_client.zrangebyscore(EXPIRY_ZSET_KEY, 0, Time.now.to_i - EXPIRE_FORECAST_AFTER)

    if ! expired_locations.empty?
      # remove the cache entries
      redis_client.hdel(CACHE_KEY, expired_locations)

      # also clear the expiry entries
      redis_client.zrem(EXPIRY_ZSET_KEY, expired_locations)
    end
  end
end

We'll avoid a detailed explanation of this code. We simply note that we accomplish all of the design goals we discussed at the beginning of the section: a persistent cache with expiration of individual values. We've accomplished this using some simple Redis functionality along with ZSET (sorted set) functionality, which is a bit more complex, and which we needed because Redis does not allow expiration times to be set on individual entries within a Hash. We can see that by using method names such as RedisWeatherCache.[] and RedisWeatherCache.[]=, we've maintained a Hash-like interface, which made it easy to use this cache instead of the simple in-memory Ruby Hash we had in our previous iteration. Our tests all pass and are still pretty simple, thanks to the modularity of this new cache code, the modular configuration file, and the previous fixes we made to our specs to remove Internet and run-order dependencies.

Summary

In this article, we delved into setting up and cleaning up state for real-world specs that interact with external services and local resources by extending our WeatherQuery example to address a big bug, isolate our specs from the Internet, and cleanly configure a Redis database to serve as a better cache.

Resources for Article:

Further resources on this subject:

Creating your first heat map in R [article]
Probability of R? [article]
Programming on Raspbian [article]
Threading Basics

Packt
08 Apr 2016
6 min read
In this article by Eugene Agafonov, author of the book Multithreading with C# Cookbook - Second Edition, we will cover the basic tasks to work with threads in C#. You will learn the following recipes:

Creating a thread in C#
Pausing a thread
Making a thread wait

(For more resources related to this topic, see here.)

Creating a thread in C#

Throughout the following recipes, we will use Visual Studio 2015 as the main tool to write multithreaded programs in C#. This recipe will show you how to create a new C# program and use threads in it. There is a free Visual Studio Community 2015 IDE, which can be downloaded from the Microsoft website and used to run the code samples.

Getting ready

To work through this recipe, you will need Visual Studio 2015. There are no other prerequisites.

How to do it...

To understand how to create a new C# program and use threads in it, perform the following steps:

1. Start Visual Studio 2015.
2. Create a new C# console application project. Make sure that the project uses .NET Framework 4.6 or higher; however, the code in this article will work with previous versions.
3. In the Program.cs file, add the following using directives:

using System;
using System.Threading;
using static System.Console;

4. Add the following code snippet below the Main method:

static void PrintNumbers()
{
    WriteLine("Starting...");
    for (int i = 1; i < 10; i++)
    {
        WriteLine(i);
    }
}

5. Add the following code snippet inside the Main method:

Thread t = new Thread(PrintNumbers);
t.Start();
PrintNumbers();

6. Run the program. The output will show the two number sequences interleaved.

How it works...

In steps 1 and 2, we created a simple console application in C# using .NET Framework 4.6. Then, in step 3, we included the System.Threading namespace, which contains all the types needed for the program. Then, we used the using static feature from C# 6.0, which allows us to use the System.Console type's static methods without specifying the type name.

An instance of a program that is being executed can be referred to as a process. A process consists of one or more threads. This means that when we run a program, we always have one main thread that executes the program code.

In step 4, we defined the PrintNumbers method, which will be used in both the main and newly created threads. Then, in step 5, we created a thread that runs PrintNumbers. When we construct a thread, an instance of the ThreadStart or ParameterizedThreadStart delegate is passed to the constructor. The C# compiler creates this object behind the scenes when we just type the name of the method we want to run in a different thread. Then, we start the thread and run PrintNumbers in the usual manner on the main thread.

As a result, there will be two ranges of numbers from 1 to 9 randomly crossing each other. This illustrates that the PrintNumbers method runs simultaneously on the main thread and on the other thread.

Pausing a thread

This recipe will show you how to make a thread wait for some time without wasting operating system resources.

Getting ready

To work through this recipe, you will need Visual Studio 2015. There are no other prerequisites.

How to do it...

To understand how to make a thread wait without wasting operating system resources, perform the following steps:

1. Start Visual Studio 2015.
2. Create a new C# console application project.
3. In the Program.cs file, add the following using directives:

using System;
using System.Threading;
using static System.Console;
using static System.Threading.Thread;

4. Add the following code snippet below the Main method:

static void PrintNumbers()
{
    WriteLine("Starting...");
    for (int i = 1; i < 10; i++)
    {
        WriteLine(i);
    }
}

static void PrintNumbersWithDelay()
{
    WriteLine("Starting...");
    for (int i = 1; i < 10; i++)
    {
        Sleep(TimeSpan.FromSeconds(2));
        WriteLine(i);
    }
}

5. Add the following code snippet inside the Main method:

Thread t = new Thread(PrintNumbersWithDelay);
t.Start();
PrintNumbers();

6. Run the program.

How it works...

When the program is run, it creates a thread that will execute the code in the PrintNumbersWithDelay method. Immediately after that, it runs the PrintNumbers method. The key feature here is the Thread.Sleep method call in PrintNumbersWithDelay. It causes the thread executing this code to wait a specified amount of time (2 seconds in our case) before printing each number. While a thread sleeps, it uses as little CPU time as possible. As a result, we will see that the code in the PrintNumbers method, which usually runs later, will be executed before the code in the PrintNumbersWithDelay method in a separate thread.

Making a thread wait

This recipe will show you how a program can wait for some computation in another thread to complete so that it can use the result later in the code. The Thread.Sleep method is not enough here because we don't know the exact time the computation will take.

Getting ready

To work through this recipe, you will need Visual Studio 2015. There are no other prerequisites.

How to do it...

To understand how a program waits for some computation in another thread to complete in order to use its result later, perform the following steps:

1. Start Visual Studio 2015.
2. Create a new C# console application project.
3. In the Program.cs file, add the following using directives:

using System;
using System.Threading;
using static System.Console;
using static System.Threading.Thread;

4. Add the following code snippet below the Main method:

static void PrintNumbersWithDelay()
{
    WriteLine("Starting...");
    for (int i = 1; i < 10; i++)
    {
        Sleep(TimeSpan.FromSeconds(2));
        WriteLine(i);
    }
}

5. Add the following code snippet inside the Main method:

WriteLine("Starting...");
Thread t = new Thread(PrintNumbersWithDelay);
t.Start();
t.Join();
WriteLine("Thread completed");

6. Run the program.

How it works...

When the program is run, it starts a long-running thread that prints out numbers, waiting two seconds before printing each one. In the main program, however, we call the t.Join method, which allows us to wait until thread t completes. When it is complete, the main program continues to run. With the help of this technique, it is possible to synchronize execution steps between two threads. The first one waits until the other is complete and then continues to work. While the first thread waits, it is in a blocked state (as it is in the previous recipe when you call Thread.Sleep).

Summary

In this article, we focused on performing some very basic operations with threads in the C# language. We covered a thread's life cycle, which includes creating a thread, pausing a thread, and making a thread wait.

Resources for Article:

Further resources on this subject:

Simplifying Parallelism Complexity in C# [article]
Watching Multiple Threads in C# [article]
Debugging Multithreaded Applications as Singlethreaded in C# [article]