
How-To Tutorials - Programming

Containerizing a Web Application with Docker Part 1

Darwin Corn
10 Jun 2016
4 min read
Congratulations, you've written a web application! Now what? Part one of this post deals with steps to take after development, more specifically the creation of a Docker image that contains the application. In part two, I'll lay out deploying that image to the Google Cloud Platform, as well as some further reading that'll help you descend into the rabbit hole that is DevOps.

For demonstration purposes, let's say that you're me and you want to share your adventures in TrapRap and Death Metal (not simultaneously, thankfully!) with the world. I've written a simple Ember frontend for this purpose, and through the course of this post, I will explain how I go about containerizing it. Of course, the beauty of this procedure is that it will work with any frontend application, and you are certainly welcome to Bring Your Own Code. Everything I use is publicly available on GitHub, however, and you're certainly welcome to work through this post with the material presented as well.

So, I've got this web app. You can get it here, or you can run:

```
$ git clone https://github.com/ndarwincorn/docker-demo.git
```

Do this wherever you keep your source code. You'll need ember-cli and some familiarity with Ember to customize the app yourself, or you can just cut to the chase and build the Docker image, which is what I'm going to do in this post. I'm using Docker 1.10, but there's no reason this wouldn't work on a Mac running Docker Toolbox (or even Boot2Docker, but don't quote me on that) or a less bleeding-edge Linux distro. Since installing Docker is well documented, I won't get into that here and will continue with the assumption that you have a working, up-to-date Docker installed on your machine, and that the Docker daemon is running.

If you're working with your own app, feel free to skip below to my explanation of the process and then come back here when you've got a Dockerfile in the root of your application. In the root of the application, run the following (make sure you don't have any locally installed web servers already listening on port 80):

```
# docker build -t docker-demo .
# docker run -d -p 80:80 --name demo docker-demo
```

Once the command finishes by printing a container ID, launch a web browser and navigate to http://localhost. Hey! Now you can listen to my music served from an LXC container running on your very own computer.

How did we accomplish this? Let's take it piece by piece (here's where to start reading again if you've approached this article with your own app). I created a simple Dockerfile using the official Nginx image, because I have a deep-seated mistrust of Canonical and didn't want to build on an Ubuntu base image. Here's what it looks like in my project:

docker-demo/Dockerfile

```
FROM nginx
COPY dist /usr/share/nginx/html
```

Running the docker build command reads the Dockerfile and uses it to configure a Docker image based on the nginx image. During image configuration, it copies the contents of the dist folder in my project to /usr/share/nginx/html in the container, the document root that the image's default nginx configuration points to. The -t flag tells Docker to 'tag' (name) the image we've just created as docker-demo. The docker run command takes that image and builds a container from it. The -d flag is short for 'detach': run the /usr/bin/nginx command built into the image from our Dockerfile and leave the container running. The -p flag maps a port on the host to a port in the container, and --name names the container for later reference.
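A few housekeeping commands that the walkthrough doesn't cover but that are handy while experimenting (standard Docker CLI, nothing specific to this project):

```
# docker ps          # confirm the demo container is running
# docker logs demo   # view nginx output from inside the container
# docker stop demo   # stop the container
# docker rm demo     # remove it once you're done
```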
The command should return a container ID that can be used to manipulate the container later. In part two, I'll show you how to push the image we created to the Google Cloud Platform and then launch it as a container in a specially-purposed VM on their Compute Engine.

About the Author

Darwin Corn is a Systems Analyst for the Consumer Direct Care Network. He is a mid-level professional with diverse experience in the Information Technology world.

Learning JavaScript Data Structures: Arrays

Packt
07 Jun 2016
18 min read
In this article by Loiane Groner, author of the book Learning JavaScript Data Structures and Algorithms, Second Edition, we will learn about arrays. An array is the simplest memory data structure; for this reason, all programming languages have a built-in array datatype. JavaScript supports arrays natively, even though its first version was released without array support. In this article, we will dive into the array data structure and its capabilities.

An array stores values sequentially that are all of the same datatype. Although JavaScript allows us to create arrays with values from different datatypes, we will follow best practices and assume that we cannot do this (most languages do not have this capability).

Why should we use arrays?

Let's consider that we need to store the average temperature of each month of the year for the city we live in. We could use something similar to the following to store this information:

```javascript
var averageTempJan = 31.9;
var averageTempFeb = 35.3;
var averageTempMar = 42.4;
var averageTempApr = 52;
var averageTempMay = 60.8;
```

However, this is not the best approach. If we store the temperature for only one year, we could manage 12 variables. However, what if we need to store the average temperature for more than one year? Fortunately, this is why arrays were created, and we can easily represent the same information as follows:

```javascript
averageTemp[0] = 31.9;
averageTemp[1] = 35.3;
averageTemp[2] = 42.4;
averageTemp[3] = 52;
averageTemp[4] = 60.8;
```

Creating and initializing arrays

Declaring, creating, and initializing an array in JavaScript is as simple as the following:

```javascript
var daysOfWeek = new Array();  //{1}
var daysOfWeek = new Array(7); //{2}
var daysOfWeek = new Array('Sunday', 'Monday', 'Tuesday', 'Wednesday',
    'Thursday', 'Friday', 'Saturday'); //{3}
```

We can simply declare and instantiate a new array using the keyword new (line {1}). Also, using the keyword new, we can create a new array specifying its length (line {2}). A third option is passing the array elements directly to the constructor (line {3}). However, using the new keyword is not best practice. If you want to create an array in JavaScript, you can simply assign empty brackets ([]), as in the following example:

```javascript
var daysOfWeek = [];
```

We can also initialize the array with some elements, as follows:

```javascript
var daysOfWeek = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
    'Thursday', 'Friday', 'Saturday'];
```

If we want to know how many elements are in the array (its size), we can use the length property. The following code gives an output of 7:

```javascript
console.log(daysOfWeek.length);
```

Accessing elements and iterating an array

To access a particular position in the array, we also use brackets, passing the index of the position we would like to access. For example, let's say we want to output all the elements from the daysOfWeek array. To do so, we need to loop through the array and print the elements, as follows:

```javascript
for (var i = 0; i < daysOfWeek.length; i++) {
    console.log(daysOfWeek[i]);
}
```
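Before moving on, one pitfall from the array-creation options above is worth spelling out (a note of mine, easy to verify in a browser console): passing a single number to the Array constructor sets the length rather than creating an element, which is part of why the literal syntax is preferred:

```javascript
console.log(new Array(7).length); // 7: seven empty slots, no elements
console.log([7].length);          // 1: a one-element array
console.log([7][0]);              // 7
```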
Let's take a look at another example. Let's say that we want to find out the first 20 numbers of the Fibonacci sequence. The first two numbers of the Fibonacci sequence are 1 and 1, and each subsequent number is the sum of the previous two numbers:

```javascript
var fibonacci = []; //{1}
fibonacci[1] = 1;   //{2}
fibonacci[2] = 1;   //{3}

for (var i = 3; i <= 20; i++) {
    fibonacci[i] = fibonacci[i - 1] + fibonacci[i - 2]; //{4}
}

for (var i = 1; i < fibonacci.length; i++) { //{5}
    console.log(fibonacci[i]);               //{6}
}
```

So, in line {1}, we declared and created an array. In lines {2} and {3}, we assigned the first two numbers of the Fibonacci sequence to the second and third positions of the array (in JavaScript, the first position of the array is always referenced by 0, and as there is no 0 in the Fibonacci sequence, we will skip it). Then, all we have to do is create the third through twentieth numbers of the sequence (as we know the first two numbers already). To do so, we can use a loop and assign the sum of the previous two positions of the array to the current position (line {4}, starting from index 3 of the array up to the 20th index). Then, to take a look at the output (line {6}), we just need to loop the array from its first position to its length (line {5}). We can use console.log to output each index of the array (lines {5} and {6}), or we can also use console.log(fibonacci) to output the array itself; most browsers have a nice array representation in console.log. If you would like to generate more than 20 numbers of the Fibonacci sequence, just change the number 20 to whatever number you like.

Adding elements

Adding and removing elements from an array is not that difficult; however, it can be tricky. For the examples we will use in this section, let's consider that we have the following numbers array initialized with the numbers from 0 to 9:

```javascript
var numbers = [0,1,2,3,4,5,6,7,8,9];
```

If we want to add a new element to this array (for example, the number 10), all we have to do is reference the latest free position of the array and assign a value to it:

```javascript
numbers[numbers.length] = 10;
```

In JavaScript, an array is a mutable object. We can easily add new elements to it, and the object grows dynamically as we do. In many other languages, such as C and Java, we need to determine the size of the array up front, and if we need to add more elements, we need to create a completely new array; we cannot simply add new elements as we need them.

Using the push method

There is also a method called push that allows us to add new elements to the end of the array. We can add as many elements as we want as arguments to the push method:

```javascript
numbers.push(11);
numbers.push(12, 13);
```

The numbers array will now contain the numbers from 0 to 13.

Inserting an element in the first position

Now, let's say we need to add a new element to the array and would like to insert it in the first position, not the last one. To do so, first, we need to free the first position by shifting all the elements to the right. We can loop through all the elements of the array, starting from the last position + 1 (length) and shifting the previous element to the new position, to finally assign the new value we want to the first position (-1).
Run the following code for this:

```javascript
for (var i = numbers.length; i >= 0; i--) {
    numbers[i] = numbers[i - 1];
}
numbers[0] = -1;
```

Using the unshift method

The JavaScript Array class also has a method called unshift, which inserts the values passed as arguments at the start of the array:

```javascript
numbers.unshift(-2);
numbers.unshift(-4, -3);
```

So, using the unshift method, we can add the value -2 and then -3 and -4 to the beginning of the numbers array. The array will now contain the numbers from -4 to 13.

Removing elements

So far, you have learned how to add values at the end and at the beginning of an array. Let's take a look at how we can remove a value from an array. To remove a value from the end of an array, we can use the pop method:

```javascript
numbers.pop();
```

The push and pop methods allow an array to emulate a basic stack data structure. Our array now contains the numbers from -4 to 12, and its length is 17.

Removing an element from the first position

To remove a value from the beginning of the array, we can use the following code:

```javascript
for (var i = 0; i < numbers.length; i++) {
    numbers[i] = numbers[i + 1];
}
```

We shifted all the elements one position to the left. However, the length of the array is still the same (17), meaning we still have an extra element in our array (with an undefined value). The last time the code inside the loop was executed, i + 1 was a reference to a position that does not exist. In some languages, such as Java, C/C++, or C#, the code would throw an exception, and we would have to end our loop at numbers.length - 1. As you can see, we have only overwritten the array's original values; we did not really remove the value (the length of the array is still the same, and we have this extra undefined element).

Using the shift method

To actually remove an element from the beginning of the array, we can use the shift method, as follows:

```javascript
numbers.shift();
```

So, if we consider that our array had the values -4 to 12 and a length of 17, after we execute the previous code, the array will contain the values -3 to 12 and have a length of 16. The shift and unshift methods allow an array to emulate a basic queue data structure.

Adding and removing elements from a specific position

So far, you have learned how to add elements at the end and at the beginning of an array, and how to remove elements from the beginning and end of an array. What if we also want to add or remove elements at any particular position of our array? We can use the splice method to remove an element from an array by simply specifying the position/index that we would like to delete from and how many elements we would like to remove, as follows:

```javascript
numbers.splice(5, 3);
```

This code removes three elements, starting from index 5 of our array. This means that numbers[5], numbers[6], and numbers[7] are removed from the numbers array. The content of our array is now -3, -2, -1, 0, 1, 5, 6, 7, 8, 9, 10, 11, and 12 (the numbers 2, 3, and 4 have been removed).

As with any JavaScript object, we can also use the delete operator to remove an element from the array, for example, delete numbers[0]. However, position 0 of the array will then hold the value undefined, meaning that it would be the same as doing numbers[0] = undefined. For this reason, we should always use the splice, pop, or shift methods to remove elements.
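To make that difference concrete, here is a small sketch of mine (using a throwaway array, so the numbers array above is left alone):

```javascript
var letters = ['a', 'b', 'c'];

delete letters[0];
console.log(letters.length); // 3: the slot remains, holding undefined
console.log(letters[0]);     // undefined

letters = ['a', 'b', 'c'];
letters.splice(0, 1);
console.log(letters.length); // 2: the element is actually removed
console.log(letters[0]);     // 'b'
```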
Now, let's say we want to insert the numbers 2 to 4 back into the array, starting from position 5. We can again use the splice method to do this:

```javascript
numbers.splice(5, 0, 2, 3, 4);
```

The first argument of the method is the index at which we want to remove or insert elements. The second argument is the number of elements we want to remove (in this case, we do not want to remove any, so we pass the value 0). And the third argument onwards are the values we would like to insert into the array (the elements 2, 3, and 4). The array holds the values from -3 to 12 again.

Finally, let's execute the following code:

```javascript
numbers.splice(5, 3, 2, 3, 4);
```

The output will be the values from -3 to 12 again. This is because we are removing three elements starting from index 5, and we are also adding the elements 2, 3, and 4 starting at index 5.

Two-dimensional and multidimensional arrays

At the beginning of this article, we used the temperature measurement example. We will now use it one more time. Let's consider that we need to measure the temperature hourly for a few days. Now that we know we can use an array to store the temperatures, we can easily write the following code to store the temperatures over two days:

```javascript
var averageTempDay1 = [72, 75, 79, 79, 81, 81];
var averageTempDay2 = [81, 79, 75, 75, 73, 72];
```

However, this is not the best approach; we can write better code! We can use a matrix (a two-dimensional array) to store this information, in which each row represents a day and each column represents an hourly measurement of temperature, as follows:

```javascript
var averageTemp = [];
averageTemp[0] = [72, 75, 79, 79, 81, 81];
averageTemp[1] = [81, 79, 75, 75, 73, 72];
```

JavaScript only supports one-dimensional arrays; it does not support matrices. However, we can implement a matrix or any multidimensional array using an array of arrays, as in the previous code. The same code can also be written as follows:

```javascript
// day 1
averageTemp[0] = [];
averageTemp[0][0] = 72;
averageTemp[0][1] = 75;
averageTemp[0][2] = 79;
averageTemp[0][3] = 79;
averageTemp[0][4] = 81;
averageTemp[0][5] = 81;

// day 2
averageTemp[1] = [];
averageTemp[1][0] = 81;
averageTemp[1][1] = 79;
averageTemp[1][2] = 75;
averageTemp[1][3] = 75;
averageTemp[1][4] = 73;
averageTemp[1][5] = 72;
```

In the previous code, we specified the value of each day and hour separately. Each row represents a day, and each column represents an hour of the day (a temperature reading).

Iterating the elements of two-dimensional arrays

If we want to take a look at the output of the matrix, we can create a generic function to log it:

```javascript
function printMatrix(myMatrix) {
    for (var i = 0; i < myMatrix.length; i++) {
        for (var j = 0; j < myMatrix[i].length; j++) {
            console.log(myMatrix[i][j]);
        }
    }
}
```

We need to loop through all the rows and columns. To do this, we use a nested for loop in which the variable i represents the rows and j represents the columns. We can call the following code to take a look at the output of the averageTemp matrix:

```javascript
printMatrix(averageTemp);
```

Multidimensional arrays

We can also work with multidimensional arrays in JavaScript. For example, let's create a 3 x 3 x 3 matrix.
Each cell contains the sum i (row) + j (column) + z (depth) of the matrix, as follows:

```javascript
var matrix3x3x3 = [];
for (var i = 0; i < 3; i++) {
    matrix3x3x3[i] = [];
    for (var j = 0; j < 3; j++) {
        matrix3x3x3[i][j] = [];
        for (var z = 0; z < 3; z++) {
            matrix3x3x3[i][j][z] = i + j + z;
        }
    }
}
```

It does not matter how many dimensions the data structure has; we need to loop through each dimension to access a cell. To output the content of this matrix, we can use the following code:

```javascript
for (var i = 0; i < matrix3x3x3.length; i++) {
    for (var j = 0; j < matrix3x3x3[i].length; j++) {
        for (var z = 0; z < matrix3x3x3[i][j].length; z++) {
            console.log(matrix3x3x3[i][j][z]);
        }
    }
}
```

If we had a 3 x 3 x 3 x 3 matrix, we would have four nested for statements in our code, and so on.

References for JavaScript array methods

Arrays in JavaScript are full-fledged objects, meaning that every array we create has a number of methods available on it. JavaScript arrays are very interesting because they are very powerful and have more capabilities built in than primitive arrays in other languages. This means that we do not need to write basic behavior ourselves, such as adding and removing elements in/from the middle of the data structure. The following is a list of the core methods available on an array object; we have covered some of them already:

| Method | Description |
| --- | --- |
| concat | Joins multiple arrays and returns a copy of the joined arrays |
| every | Iterates over every element of the array, verifying a desired condition (function), until false is returned |
| filter | Creates a new array with every element for which the provided function returns true |
| forEach | Executes a specific function on each element of the array |
| join | Joins all the array elements into a string |
| indexOf | Searches the array for a specific element and returns its position |
| lastIndexOf | Returns the position of the last item in the array that matches the search criterion |
| map | Creates a new array containing the result of applying the provided function to each element |
| reverse | Reverses the array so that the last items become the first, and vice versa |
| slice | Returns a new array containing the elements from the specified start index up to (but not including) the end index |
| some | Iterates over every element of the array, verifying a desired condition (function), until true is returned |
| sort | Sorts the array alphabetically or by the supplied comparison function |
| toString | Returns the array as a string |
| valueOf | Similar to the toString method; returns the array as a string |

We have already covered the push, pop, shift, unshift, and splice methods. Let's take a look at the new ones.

Joining multiple arrays

Consider a scenario where you have different arrays and you need to join all of them into a single array. We could iterate over each array and add each element to the final array. Fortunately, JavaScript already has a method that can do this for us, named the concat method, which looks as follows:

```javascript
var zero = 0;
var positiveNumbers = [1, 2, 3];
var negativeNumbers = [-3, -2, -1];
var numbers = negativeNumbers.concat(zero, positiveNumbers);
```

We can pass as many arrays and objects/elements to this method as we desire. The arrays are concatenated to the specified array in the order in which the arguments are passed to the method. In this example, zero is concatenated to negativeNumbers, and then positiveNumbers is concatenated to the resulting array. The numbers array will contain the values -3, -2, -1, 0, 1, 2, and 3.
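One behavior worth calling out explicitly (my note; easy to confirm in a console) is that concat never modifies the arrays it is given; it always builds a new one:

```javascript
console.log(negativeNumbers);             // still [-3, -2, -1]
console.log(positiveNumbers);             // still [1, 2, 3]
console.log(numbers === negativeNumbers); // false: a brand new array
```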
Iterator functions

Sometimes, we need to iterate over the elements of an array. You have learned that we can use a loop construct to do this, such as the for statement, as we saw in some previous examples. JavaScript also has some built-in iterator methods that we can use with arrays. For the examples in this section, we will need an array and also a function. We will use an array with the values from 1 to 15 and a function that returns true if a number is a multiple of 2 (even) and false otherwise:

```javascript
var isEven = function (x) {
    // returns true if x is a multiple of 2
    console.log(x);
    return (x % 2 == 0) ? true : false;
};
var numbers = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15];
```

Note that return (x % 2 == 0) ? true : false can also be written simply as return (x % 2 == 0).

Iterating using the every method

The first method we will take a look at is the every method. The every method iterates over each element of the array until the function returns false:

```javascript
numbers.every(isEven);
```

In this case, the first element of the numbers array is the number 1. As 1 is not a multiple of 2 (it is an odd number), the isEven function returns false, and this is the only time the function is executed.

Iterating using the some method

Next, we have the some method. It has the same behavior as the every method; however, the some method iterates over each element of the array until the function returns true:

```javascript
numbers.some(isEven);
```

In our case, the first even number in the numbers array is 2 (the second element). The first element to be iterated is the number 1, for which the function returns false. Then, the second element to be iterated is the number 2, which returns true, and the iteration stops.

Iterating using forEach

If we need the array to be completely iterated no matter what, we can use the forEach function. It has the same result as using a for loop with the function's code inside it:

```javascript
numbers.forEach(function (x) {
    console.log((x % 2 == 0));
});
```

Using map and filter

JavaScript also has two other iterator methods that return a new array as the result. The first one is the map method:

```javascript
var myMap = numbers.map(isEven);
```

The myMap array will hold the values [false, true, false, true, false, true, false, true, false, true, false, true, false, true, false]. It stores the result of the isEven function for each element passed to the map method. This way, we can easily know whether a number is even or not. For example, myMap[0] is false because 1 is not even, and myMap[1] is true because 2 is even.

We also have the filter method. It returns a new array containing the elements for which the function returned true:

```javascript
var evenNumbers = numbers.filter(isEven);
```

In our case, the evenNumbers array contains the elements that are multiples of 2: [2, 4, 6, 8, 10, 12, 14].

Using the reduce method

Finally, we have the reduce method. The reduce method receives a function with the following parameters: previousValue, currentValue, index, and array. We can use this function to return a value that is added to an accumulator, which is returned after the reduce method finishes executing. It can be very useful if we want to sum up all the values in an array. Here's an example:

```javascript
numbers.reduce(function (previous, current, index) {
    return previous + current;
});
```

The output will be 120.
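A detail not covered in the excerpt above, but part of the standard reduce API: a second argument sets the initial accumulator value, which also makes the call safe on empty arrays:

```javascript
var total = numbers.reduce(function (previous, current) {
    return previous + current;
}, 0);
console.log(total); // 120

console.log([].reduce(function (p, c) { return p + c; }, 0)); // 0
// without an initial value, [].reduce(...) throws a TypeError
```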
The names of these methods are largely self-explanatory: map transforms each value using the given function, filter keeps the values that pass the given test, and reduce boils the array down to a single value. These three methods (map, filter, and reduce) are the basis of functional programming in JavaScript.

Summary

In this article, we covered the most-used data structure: arrays. You learned how to declare, initialize, and assign values as well as add and remove elements. You also learned about two-dimensional and multidimensional arrays as well as the main methods of an array, which will be very useful when we start creating our own algorithms.

Asynchronous Control Flow Patterns with ES2015 and beyond

Packt
07 Jun 2016
6 min read
In this article by Luciano Mammino, the author of the book Node.js Design Patterns, Second Edition, we will explore async await, an innovative syntax that will be available in JavaScript as part of the release of ECMAScript 2017.

Async await using Babel

Callbacks, promises, and generators turn out to be the weapons at our disposal for dealing with asynchronous code in JavaScript and in Node.js. As we have seen, generators are very interesting because they offer a way to actually suspend the execution of a function and resume it at a later stage. We can adopt this feature to write asynchronous code that allows developers to write functions that "appear" to block at each asynchronous operation, waiting for the results before continuing with the following statement. The problem is that generator functions are designed to deal mostly with iterators, and their usage with asynchronous code feels a bit cumbersome. It might be hard to understand, leading to code that is hard to read and maintain.

But there is hope that there will be a cleaner syntax sometime in the near future. In fact, there is an interesting proposal that will be introduced with the ECMAScript 2017 specification that defines the async function syntax. You can read more about the current status of the async await proposal at https://tc39.github.io/ecmascript-asyncawait/.

The async function specification aims to dramatically improve the language-level model for writing asynchronous code by introducing two new keywords into the language: async and await. To clarify how these keywords are meant to be used and why they are useful, let's see a very quick example:

```javascript
const request = require('request');

function getPageHtml(url) {
    return new Promise(function (resolve, reject) {
        request(url, function (error, response, body) {
            resolve(body);
        });
    });
}

async function main() {
    const html = await getPageHtml('http://google.com');
    console.log(html);
}

main();
console.log('Loading...');
```

In this code, there are two functions: getPageHtml and main. The first one is a very simple function that fetches the HTML code of a remote web page given its URL. It's worth noticing that this function returns a promise.

The main function is the most interesting one because it's where the new async and await keywords are used. The first thing to notice is that the function is prefixed with the async keyword. This means that the function executes asynchronous code and is allowed to use the await keyword within its body. The await keyword before the call to getPageHtml tells the JavaScript interpreter to "await" the resolution of the promise returned by getPageHtml before continuing to the next instruction. This way, the main function is internally suspended until the asynchronous code completes, without blocking the normal execution of the rest of the program. In fact, we will see the string Loading... in the console and, after a moment, the HTML code of the Google landing page. Isn't this approach much more readable and easy to understand?

Unfortunately, this proposal is not yet final, and even if it is approved, we will need to wait for the next version of the ECMAScript specification to come out and be integrated into Node.js before we can use this new syntax natively. So what do we do today? Just wait? No, of course not! We can already leverage async await in our code thanks to transpilers such as Babel.
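Before moving on to the tooling, one detail the example above glosses over (my addition, consistent with how async functions are specified): inside an async function, a rejected promise surfaces as a regular exception, so ordinary try...catch works for error handling:

```javascript
async function main() {
    try {
        const html = await getPageHtml('http://google.com');
        console.log(html);
    } catch (error) {
        // a rejection of the awaited promise lands here
        console.error('Failed to fetch the page:', error);
    }
}
```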
Installing and running Babel

Babel is a JavaScript compiler (or transpiler) that is able to convert JavaScript code into other JavaScript code using syntax transformers. Syntax transformers allow the use of new syntax such as ES2015, ES2016, JSX, and others to produce backward-compatible equivalent code that can be executed in modern JavaScript runtimes, such as browsers or Node.js.

You can install Babel in your project using npm with the following command:

```
npm install --save-dev babel-cli
```

We also need to install the extensions to support async await parsing and transformation:

```
npm install --save-dev babel-plugin-syntax-async-functions babel-plugin-transform-async-to-generator
```

Now let's assume we want to run our previous example (called index.js). We need to launch the following command:

```
node_modules/.bin/babel-node --plugins "syntax-async-functions,transform-async-to-generator" index.js
```

This way, we are transforming the source code in index.js on the fly, applying the transformers to support async await. This new backward-compatible code is stored in memory and then executed on the fly on the Node.js runtime. Babel can also be configured to act as a build processor that stores the generated code into files so that you can easily deploy and run the generated code. You can read more about how to install and configure Babel on the official website at https://babeljs.io.

Comparison

At this point, we should have a better understanding of the options we have for taming the asynchronous nature of JavaScript. Each of the solutions presented has its own pros and cons. Let's summarize them in the following table:

| Solution | Pros | Cons |
| --- | --- | --- |
| Plain JavaScript | Does not require any additional libraries or technology; offers the best performance; provides the best level of compatibility with third-party libraries; allows the creation of ad hoc and more advanced algorithms | Might require extra code and relatively complex algorithms |
| Async (library) | Simplifies the most common control flow patterns; good performance | Is still a callback-based solution; introduces an external dependency; might still not be enough for advanced flows |
| Promises | Greatly simplify the most common control flow patterns; robust error handling; part of the ES2015 specification; guarantee deferred invocation of onFulfilled and onRejected | Require callback-based APIs to be promisified; introduce a small performance hit |
| Generators | Make non-blocking APIs look like blocking ones; simplify error handling; part of the ES2015 specification | Require a complementary control flow library; still require callbacks or promises to implement non-sequential flows; require non-generator-based APIs to be thunkified or promisified |
| Async await | Makes non-blocking APIs look like blocking ones; clean and intuitive syntax | Not yet available in JavaScript and Node.js natively; requires Babel or other transpilers and some configuration to be used today |

It is worth mentioning that we chose to present only the most popular solutions for handling asynchronous control flow, or the ones receiving a lot of momentum, but it's good to know that there are a few more options you might want to look at, for example, Fibers (https://npmjs.org/package/fibers) and Streamline (https://npmjs.org/package/streamline).

Summary

In this article, we analyzed how async await can be used today by means of Babel, and how to install and run Babel itself.

Practical Big Data Exploration with Spark and Python

Anant Asthana
06 Jun 2016
6 min read
The reader of this post should be familiar with basic concepts of Spark, such as the shell and RDDs.

Data sizes have increased, but our exploration tools and techniques have not evolved as fast. Traditional Hadoop MapReduce jobs are cumbersome and time consuming to develop, and Pig isn't quite as fully featured and easy to work with. Exploration can mean parsing/analyzing raw text documents, analyzing log files, processing tabular data in various formats, and exploring data that may or may not be correctly formatted.

This is where a tool like Spark excels. It provides an interactive shell for quick processing, prototyping, exploring, and slicing and dicing data. Spark works with R, Scala, and Python. In conjunction with Jupyter notebooks, we get a clean web interface for writing Python, R, or Scala code backed by a Spark cluster. A Jupyter notebook is also a great tool for presenting our findings, since we can do inline visualizations and easily share them as a PDF on GitHub or through a web viewer. The power of this setup is that we make Spark do the heavy lifting while still having the flexibility to test code on a small subset of data via the interactive notebooks.

Another powerful capability of Spark is its DataFrames API. After we have cleaned our data (dealt with badly formatted rows that can't be loaded correctly), we can load it as a DataFrame. Once the data is loaded as a DataFrame, we can use Spark SQL to explore it. Since notebooks can be shared, this is also a great way to let the developers do the work of cleaning the data and loading it as a DataFrame; analysts, data scientists, and the like can then use this data for their tasks. DataFrames can also be exported as Hive tables, which are commonly used in Hadoop-based warehouses.

Examples

For this section, we will be using examples that I have uploaded on GitHub. In addition to the examples, there is also a Docker container for running them. The container runs Spark in a pseudo-distributed mode and has Jupyter notebook configured to run Python/PySpark.

The basics

To set this up in your environment, you need a running Spark cluster with Jupyter notebook installed. Jupyter notebook, by default, only has the Python kernel configured; you can download additional kernels for Jupyter notebook to run R and Scala. To run Jupyter notebook with PySpark, use the following command on your cluster:

```
IPYTHON_OPTS="notebook --pylab inline --notebook-dir=<directory to store notebooks>" MASTER=local[6] ./bin/pyspark
```

When you start Jupyter notebook in the way we mentioned earlier, it initializes a few critical variables. One of them is the Spark Context (sc), which is used to interact with all Spark-related tasks. The other is sqlContext, which is the Spark SQL context. This is used to interact with Spark SQL (create DataFrames, run queries, and so on).

Log analysis

In this example, we use a log file from an Apache server; the code is in the GitHub repository mentioned earlier. We load our log file in question using:

```python
log_file = sc.textFile("../data/log_file.txt")
```

Spark can load files from HDFS, the local filesystem, and S3 natively. Libraries for other storage formats can be found freely on the Internet, or you could write your own format handlers (a blog post for another time). The previous command loads the log file.
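As a rough sketch of the parse-and-load step described in the next paragraph (the shlex split plus a Row constructor; the field indices below are illustrative assumptions, since they depend entirely on the exact log format):

```python
import shlex
from pyspark.sql import Row

# Split each line into fields; shlex respects quoted fields such as
# the request string in an Apache access log.
splits = log_file.map(lambda line: shlex.split(line))

def create_schema(fields):
    # Which index holds which field is an assumption here; inspect a
    # few sampled rows to confirm before relying on it.
    return Row(ip=fields[0], request=fields[5], status=fields[6])
```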
We then use Python's native shlex library to split each line into fields and use Spark's map command to load each record as a Row. An RDD consisting of Rows can easily be registered as a DataFrame.

How we arrived at this solution is where data exploration comes in. We use Spark's takeSample method to sample the file and get five rows:

```python
log_file.takeSample(True, 5)
```

These sample rows are helpful in determining how to parse and load the file. Once we have written our code to load the file, we can apply it to the dataset using map to create a new RDD consisting of Rows, and test that code on a subset of the data in a similar manner using the take or takeSample methods. The take method reads rows sequentially from the file, so although it is faster, it may not be a good representation of the dataset. The takeSample method, on the other hand, randomly picks sample rows from the file, which gives a better representation.

To create the new RDD and register it as a DataFrame, we use the following code:

```python
schema_DF = splits.map(create_schema).toDF()
```

Once we have created the DataFrame and tested it using take/takeSample to make sure that our loading code is working, we can register it as a table using the following:

```python
sqlCtx.registerDataFrameAsTable(schema_DF, 'logs')
```

Once it is registered as a table, we can run SQL queries on the log file:

```python
sample = sqlCtx.sql('SELECT * FROM logs LIMIT 10').collect()
```

Note that the collect() method collects the result into the driver's memory, so this may not be feasible for large datasets; use take/takeSample instead to sample data if your dataset is large. The beauty of using Spark with Jupyter is that all this exploration work takes only a few lines of code. It can be written interactively with all the trial and error we needed, the processed data can be easily shared, and running interactive queries on this data is easy. Last but not least, this can easily scale to massive (GB, TB) datasets.

k-means on the Iris dataset

In this example, we use data from the Iris dataset, which contains measurements of sepal and petal length and width. This is a popular open source dataset used to showcase classification algorithms. In this case, we use Spark's k-means algorithm from the MLlib library, Spark's machine learning library. The code and the output are in the GitHub repository mentioned earlier. We are not going to get into too much detail here, since some of the concepts are outside the scope of this blog post. The example showcases how we load the Iris dataset and create a DataFrame with it. We then train a k-means classifier on this dataset, and then we visualize our classification results. The power of this is that we did a somewhat complex task of parsing a dataset, creating a DataFrame, training a machine learning classifier, and visualizing the data in an interactive and scalable manner.

The repository contains several more examples. Feel free to reach out to me if you have any questions. If you would like to see more posts with practical examples, please let us know.

About the Author

Anant Asthana is a data scientist and principal architect at Pythian, and he can be found on GitHub at anantasty.

Understanding Patterns and Architectures in TypeScript

Packt
01 Jun 2016
19 min read
In this article by Vilic Vane, author of the book TypeScript Design Patterns, we'll study architecture and patterns that are closely related to the language or its common applications. Many topics in this article are related to asynchronous programming. We'll start from a web architecture for Node.js that's based on Promise. This is a larger topic that has interesting ideas involved, including abstractions of response and permission, as well as error-handling tips. Then, we'll talk about how to organize modules with ES module syntax. Due to the limited length of this article, some of the related code is aggressively simplified, and nothing more than the idea itself can be applied practically.

Promise-based web architecture

The most exciting thing about Promise may be the benefits brought to error handling. In a Promise-based architecture, throwing an error can be safe and pleasant. You don't have to explicitly handle errors when chaining asynchronous operations, and this makes it harder for mistakes to occur.

With the growing usage of ES2015-compatible runtimes, Promise is already there out of the box. We actually have plenty of polyfills for Promise (including my ThenFail, written in TypeScript), as the people who write JavaScript are roughly the same group of people who like to reinvent wheels. Promises work great with other Promises: a Promises/A+ compatible implementation should work with other Promises/A+ compatible implementations, and Promises do their best in a Promise-based architecture.

If you are new to Promise, you may complain about trying Promise with a callback-based project. You may intend to use helpers provided by Promise libraries, such as Promise.all, but it turns out that you have better alternatives, such as the async library. So the reason that makes you decide to switch should not be these helpers (as there are a lot of them for callbacks). It should be because there's an easier way to handle errors, or because you want to take advantage of the ES async and await features, which are based on Promise.

Promisifying existing modules or libraries

Though Promises do their best in a Promise-based architecture, it is still possible to begin using Promise in a smaller scope by promisifying existing modules or libraries. Taking Node.js style callbacks as an example, this is how we use them:

```typescript
import * as FS from 'fs';

FS.readFile('some-file.txt', 'utf-8', (error, text) => {
    if (error) {
        console.error(error);
        return;
    }

    console.log('Content:', text);
});
```

You may expect a promisified version of readFile to look like the following:

```typescript
FS
    .readFile('some-file.txt', 'utf-8')
    .then(text => {
        console.log('Content:', text);
    })
    .catch(reason => {
        console.error(reason);
    });
```

Implementing the promisified version of readFile can be as easy as the following:

```typescript
function readFile(path: string, options: any): Promise<string> {
    return new Promise((resolve, reject) => {
        FS.readFile(path, options, (error, result) => {
            if (error) {
                reject(error);
            } else {
                resolve(result);
            }
        });
    });
}
```

I am using any here for the options parameter to reduce the size of the demo code, but I would suggest that you do not use any whenever possible in practice. There are libraries that are able to promisify methods automatically. Unfortunately, you may need to write declaration files yourself for the promisified methods if no declaration file of the promisified version is available.
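As a sketch of what such automatic promisification boils down to (a minimal generic helper I'm adding for illustration; real libraries such as Bluebird's promisifyAll handle many more edge cases):

```typescript
function promisify<T>(fn: Function): (...args: any[]) => Promise<T> {
    return (...args: any[]) => {
        return new Promise<T>((resolve, reject) => {
            // Append a Node.js style (error-first) callback to the
            // caller's arguments and bridge it to the Promise.
            fn(...args, (error: any, result: T) => {
                if (error) {
                    reject(error);
                } else {
                    resolve(result);
                }
            });
        });
    };
}

// Usage sketch:
// const readFileAsync = promisify<string>(FS.readFile);
```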
Views and controllers in Express

Many of us may have already been working with frameworks such as Express. This is how we render a view or send back JSON data in Express:

```typescript
import * as Path from 'path';
import * as express from 'express';

let app = express();

app.set('engine', 'hbs');
app.set('views', Path.join(__dirname, '../views'));

app.get('/page', (req, res) => {
    res.render('page', {
        title: 'Hello, Express!',
        content: '...'
    });
});

app.get('/data', (req, res) => {
    res.json({
        version: '0.0.0',
        items: []
    });
});

app.listen(1337);
```

We will usually separate the controller from the routing, as follows:

```typescript
import { Request, Response } from 'express';

export function page(req: Request, res: Response): void {
    res.render('page', {
        title: 'Hello, Express!',
        content: '...'
    });
}
```

Thus, we may have a better idea of the existing routes, and we may manage controllers more easily. Furthermore, automated routing can be introduced so that we don't always need to update the routing manually:

```typescript
import * as glob from 'glob';

let controllersDir = Path.join(__dirname, 'controllers');

let controllerPaths = glob.sync('**/*.js', {
    cwd: controllersDir
});

for (let path of controllerPaths) {
    let controller = require(Path.join(controllersDir, path));
    let urlPath = path.replace(/\\/g, '/').replace(/\.js$/, '');

    for (let actionName of Object.keys(controller)) {
        app.get(
            `/${urlPath}/${actionName}`,
            controller[actionName]
        );
    }
}
```

The preceding implementation is certainly too simple to cover daily usage. However, it displays the rough idea of how automated routing could work: via conventions based on file structures. Now, if we are working with asynchronous code written with Promises, an action in the controller could look like the following:

```typescript
export function foo(req: Request, res: Response): void {
    Promise
        .all([
            Post.getContent(),
            Post.getComments()
        ])
        .then(([post, comments]) => {
            res.render('foo', {
                post,
                comments
            });
        });
}
```

We use destructuring of an array within a parameter. Promise.all returns a Promise of an array with elements corresponding to the values of the resolvables passed in. (A resolvable means a normal value or a Promise-like object that may resolve to a normal value.)

However, this is not enough; we need to handle errors properly, or in some cases the preceding code may fail in silence (which is terrible). In Express, when an error occurs, you should call next (the third argument passed into the callback) with the error object, as follows:

```typescript
import { Request, Response, NextFunction } from 'express';

export function foo(
    req: Request,
    res: Response,
    next: NextFunction
): void {
    Promise
        // ...
        .catch(reason => next(reason));
}
```

Now, we are fine with the correctness of this approach, but this is simply not how Promises work. Explicit error handling with callbacks can be eliminated in the scope of controllers, and the easiest way to do this is to return the Promise chain and hand it over to the code that was previously performing the routing logic.
So, the controller could be written like the following:

```typescript
export function foo(req: Request, res: Response) {
    return Promise
        .all([
            Post.getContent(),
            Post.getComments()
        ])
        .then(([post, comments]) => {
            res.render('foo', {
                post,
                comments
            });
        });
}
```

Or, can we make this even better?

Abstraction of response

We've already been returning a Promise to tell whether an error occurs. So, for a server error, the Promise actually indicates the result, or in other words, the response of the request. However, why are we still calling res.render() to render the view? The returned Promise object could be an abstraction of the response itself. Think about the following controller again:

```typescript
export class Response {}

export class PageResponse extends Response {
    constructor(view: string, data: any) {
        super();
    }
}

export function foo(req: Request) {
    return Promise
        .all([
            Post.getContent(),
            Post.getComments()
        ])
        .then(([post, comments]) => {
            return new PageResponse('foo', {
                post,
                comments
            });
        });
}
```

The response object that is returned could vary for different response outputs. For example, it could be a PageResponse as in the preceding example, a JSONResponse, a StreamResponse, or even a simple Redirection. As PageResponse or JSONResponse is applied in most cases, and the view of a PageResponse can usually be implied by the controller path and action name, it is useful to have these two responses automatically generated from a plain data object with a proper view to render:

```typescript
export function foo(req: Request) {
    return Promise
        .all([
            Post.getContent(),
            Post.getComments()
        ])
        .then(([post, comments]) => {
            return {
                post,
                comments
            };
        });
}
```

This is how a Promise-based controller should respond. With this idea in mind, let's update the routing code with an abstraction of responses. Previously, we were passing controller actions directly as Express request handlers. Now, we need to do some wrapping up of the actions by resolving the return value and applying operations based on the resolved result:

- If it fulfills and is an instance of Response, apply it to the res object passed in by Express.
- If it fulfills and is a plain object, construct a PageResponse, or a JSONResponse if no view is found, and apply it to the res object.
- If it rejects, call the next function with the rejection reason.

As seen previously, our code was like the following:

```typescript
app.get(`/${urlPath}/${actionName}`, controller[actionName]);
```

Now, it gets a few more lines:

```typescript
let action = controller[actionName];

app.get(`/${urlPath}/${actionName}`, (req, res, next) => {
    Promise
        .resolve(action(req))
        .then(result => {
            if (result instanceof Response) {
                result.applyTo(res);
            } else if (existsView(actionName)) {
                new PageResponse(actionName, result).applyTo(res);
            } else {
                new JSONResponse(result).applyTo(res);
            }
        })
        .catch(reason => next(reason));
});
```

However, so far we can only handle GET requests, as we hardcoded app.get() in our router implementation, and the poor view-matching logic can hardly be used in practice either.
We need to make these actions configurable, and ES decorators could perform a good job here:

```typescript
export default class Controller {
    @get({
        View: 'custom-view-path'
    })
    foo(req: Request) {
        return {
            title: 'Action foo',
            content: 'Content of action foo'
        };
    }
}
```

I'll leave the implementation to you, and feel free to make it awesome.

Abstraction of permission

Permission plays an important role in a project, especially in systems that have different user groups, for example, a forum. The abstraction of permission should be extendable to satisfy changing requirements, and it should be easy to use as well. Here, we are going to talk about the abstraction of permission at the level of controller actions.

Consider the eligibility to perform one or more actions a privilege. The permission of a user may consist of several privileges, and usually most users at the same level have the same set of privileges. So, we may have a larger concept, namely groups. The abstraction could either work based on both groups and privileges, or work based on privileges only (groups then become mere aliases for sets of privileges):

- An abstraction that validates based on privileges and groups at the same time is easier to build. You do not need to create a large list of which actions can be performed for a certain group of users, as granular privileges are only required when necessary.
- An abstraction that validates based on privileges has better control and more flexibility for describing the permission. For example, you can easily remove a small set of privileges from the permission of a user.

However, both approaches have similar upper-level abstractions; they differ mostly in implementation. The general structure of the permission abstraction we've talked about has the following participants:

- Privilege: describes a detailed privilege corresponding to specific actions
- Group: defines a set of privileges
- Permission: describes what a user is capable of doing; consists of the groups the user belongs to and the privileges the user has
- Permission descriptor: describes how the permission of a user works and consists of the possible groups and privileges

Expected errors

A great concern that is wiped away by using Promises is that we do not need to worry about whether throwing an error in a callback would crash the application most of the time. The error will flow through the Promise chain and, if not caught, will be handled by our router. Errors can be roughly divided into expected errors and unexpected errors. Expected errors are usually caused by incorrect input or foreseeable exceptions, while unexpected errors are usually caused by bugs or by other libraries that the project relies on.

For expected errors, we usually want to give users a friendly response with a readable error message and code, so that users can help themselves by searching for the error or reporting it to us with useful context. For unexpected errors, we would also want a reasonable response (usually a message describing an unknown error), a detailed server-side log (including the real error name, message, stack information, and so on), and even alerts to let the team know as soon as possible.
Defining and throwing expected errors

The router will need to handle different types of errors, and an easy way to achieve this is to subclass a universal ExpectedError class and throw its instances, as follows:

```typescript
import ExtendableError from 'extendable-error';

class ExpectedError extends ExtendableError {
    constructor(
        message: string,
        public code: number
    ) {
        super(message);
    }
}
```

The extendable-error package is one of mine that handles the stack trace and the message property; you can directly extend the Error class as well. Thus, when receiving an expected error, we can safely output the error name and message as part of the response. If the error is not an instance of ExpectedError, we can display a predefined unknown-error message instead.

Transforming errors

Some errors, such as errors caused by unstable networks or remote services, are expected. We may want to catch these errors and throw them out again as expected errors. However, it can be rather trivial to actually do this, so a centralized error-transforming process can be applied to reduce the effort required to manage these errors. The transforming process includes two parts: filtering (or matching) and transforming. These are the approaches to filtering errors:

- Filter by error class: Many third-party libraries throw errors of a certain class. Taking Sequelize (a popular Node.js ORM) as an example, it has DatabaseError, ConnectionError, ValidationError, and so on. By filtering errors by checking whether they are instances of a certain error class, we may easily pick up target errors from the pile.
- Filter by string or regular expression: Sometimes a library throws errors that are instances of the Error class itself instead of its subclasses, which makes these errors hard to distinguish from others. In this situation, we can filter these errors by matching their message against keywords or regular expressions.
- Filter by scope: It's possible that instances of the same error class with the same error message should result in different responses. One of the reasons may be that the operation throwing a certain error is at a lower level but is being used by upper structures within different scopes. Thus, a scope mark can be added for these errors to make them easier to filter.

There could be more ways to filter errors, and they are usually able to cooperate as well. By properly applying these filters and transforming errors, we can reduce noise, analyze what's going on within a system, and locate problems faster if they occur.

Modularizing the project

Before ES2015, there were actually a lot of module solutions for JavaScript that worked. The most famous two of them might be AMD and CommonJS. AMD is designed for asynchronous module loading, which is mostly applied in browsers, while CommonJS performs module loading synchronously, which is the way the Node.js module system works. To make it work asynchronously, writing an AMD module takes more characters. Due to the popularity of tools such as browserify and webpack, CommonJS became popular even for browser projects.

Proper granularity of internal modules can help a project keep a healthy structure. Consider a project structure like the following:

```
project
├─ controllers
├─ core
│  │  index.ts
│  │
│  ├─ product
│  │     index.ts
│  │     order.ts
│  │     shipping.ts
│  │
│  └─ user
│        index.ts
│        account.ts
│        statistics.ts
│
├─ helpers
├─ models
├─ utils
└─ views
```

Let's assume that we are writing a controller file that's going to import a module defined by the core/product/order.ts file.
Previously, using CommonJS style require, we would write the following:

```typescript
const Order = require('../core/product/order');
```

Now, with the new ES import syntax, this would be:

```typescript
import * as Order from '../core/product/order';
```

Wait, isn't this essentially the same? Sort of. However, you may have noticed several index.ts files that I've put into the folders. Now, in the core/product/index.ts file, we could have the following:

```typescript
import * as Order from './order';
import * as Shipping from './shipping';

export { Order, Shipping }
```

Or, we could also have the following:

```typescript
export * from './order';
export * from './shipping';
```

What's the difference? The idea behind these two approaches to re-exporting modules can vary. The first style works better when we treat Order and Shipping as namespaces, under which the identifier names may not be easy to distinguish from one another. With this style, the files are the natural boundaries for building these namespaces. The second style weakens the namespace property of the two files and uses them as tools to organize objects and classes under the same larger category. A good thing about using these files as namespaces is that multi-level re-exporting is fine, while weakening namespaces makes it harder to understand different identifier names as the number of re-exporting levels grows.

Summary

In this article, we discussed some interesting ideas and an architecture formed by these ideas. Most of these topics focused on limited examples and did their own jobs, but we also discussed ideas about putting a whole system together.
Wrappers

Packt
27 May 2016
13 min read
In this article by Erik Westra, author of the book Modular Programming with Python, we learn the concepts of wrappers. A wrapper is essentially a group of functions that call other functions to do the work. Wrappers are used to simplify an interface, to make a confusing or badly designed API easier to use, to convert data formats into something more convenient, and to implement cross-language compatibility. Wrappers are also sometimes used to add testing and error-checking code to an existing API.

Let's take a look at a real-world application of a wrapper module. Imagine that you work for a large bank and have been asked to write a program to analyze fund transfers to help identify possible fraud. Your program receives information, in real time, about every inter-bank funds transfer that takes place. For each transfer, you are given:

The amount of the transfer
The ID of the branch in which the transfer took place
The identification code for the bank the funds are being sent to

Your task is to analyze the transfers over time to identify unusual patterns of activity. To do this, you need to calculate, for each of the last eight days, the total value of all transfers for each branch and destination bank. You can then compare the current day's totals against the average for the previous seven days, and flag any daily totals that are more than 50% above the average.

You start by deciding how to represent the total transfers for a day. Because you need to keep track of this for each branch and destination bank, it makes sense to store these totals in a two-dimensional array. In Python, this type of two-dimensional array is represented as a list of lists:

totals = [[0, 307512, 1612, 0, 43902, 5602918],
          [79400, 3416710, 75, 23508, 60912, 5806],
          ...
         ]

You can then keep a separate list of the branch ID for each row and another list holding the destination bank code for each column:

branch_ids = [125000249, 125000252, 125000371, ...]
bank_codes = ["AMERUS33", "CERYUS33", "EQTYUS44", ...]

Using these lists, you can calculate the totals for a given day by processing the transfers that took place on that particular day:

totals = []
for branch in branch_ids:
    branch_totals = []
    for bank in bank_codes:
        branch_totals.append(0)
    totals.append(branch_totals)

for transfer in transfers_for_day:
    branch_index = branch_ids.index(transfer['branch'])
    bank_index = bank_codes.index(transfer['dest_bank'])
    totals[branch_index][bank_index] += transfer['amount']

So far so good. Once you have these totals for each day, you can then calculate the average and compare it against the current day's totals to identify the entries that are higher than 150% of the average.

Let's imagine that you've written this program and managed to get it working. When you start using it, though, you immediately discover a problem: your bank has over 5,000 branches, and there are more than 15,000 banks worldwide that your bank can transfer funds to—that's a total of 75 million combinations that you need to keep totals for, and as a result, your program is taking far too long to calculate the totals.

To make your program faster, you need to find a better way of handling large arrays of numbers. Fortunately, there's a library designed to do just this: NumPy. NumPy is an excellent array-handling library. You can create huge arrays and perform sophisticated operations on an array with a single function call. Unfortunately, NumPy is also a dense and impenetrable library. It was designed and written for people with a deep understanding of mathematics.
While there are many tutorials available and you can generally figure out how to use it, the code that uses NumPy is often hard to comprehend. For example, calculating the average across multiple matrices would involve the following:

daily_totals = []
for totals in totals_to_average:
    daily_totals.append(totals)
average = numpy.mean(numpy.array(daily_totals), axis=0)

Figuring out what that last line does would require a trip to the NumPy documentation. Because of the complexity of the code that uses NumPy, this is a perfect example of a situation where a wrapper module can be used: the wrapper module can provide an easier-to-use interface to NumPy, so your code can use it without being cluttered with complex and confusing function calls.

To work through this example, we'll start by installing the NumPy library. NumPy (http://www.numpy.org) runs on Mac OS X, Windows, and Linux machines. How you install it depends on which operating system you are using:

For Mac OS X, you can download an installer from http://www.kyngchaos.com/software/python.
For MS Windows, you can download a Python "wheel" file for NumPy from http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy. Choose the pre-built version of NumPy that matches your operating system and the desired version of Python. To use the wheel file, use the pip install command, for example, pip install numpy-1.10.4+mkl-cp34-none-win32.whl. For more information about installing Python wheels, refer to https://pip.pypa.io/en/latest/user_guide/#installing-from-wheels.
If your computer runs Linux, you can use your Linux package manager to install NumPy. Alternatively, you can download and build NumPy in source code form.

To ensure that NumPy is working, fire up your Python interpreter and enter the following:

import numpy
a = numpy.array([[1, 2], [3, 4]])
print(a)

All going well, you should see a 2 x 2 matrix displayed:

[[1 2]
 [3 4]]

Now that we have NumPy installed, let's start working on our wrapper module. Create a new Python source file, named numpy_wrapper.py, and enter the following into this file:

import numpy

That's all for now; we'll add functions to this wrapper module as we need them.

Next, create another Python source file, named detect_unusual_transfers.py, and enter the following into this file:

import random
import numpy_wrapper as npw

BANK_CODES = ["AMERUS33", "CERYUS33", "EQTYUS44",
              "LOYDUS33", "SYNEUS44", "WFBIUS6S"]

BRANCH_IDS = ["125000249", "125000252", "125000371",
              "125000402", "125000596", "125001067"]

As you can see, we are hardwiring the bank and branch codes for our example; in a real program, these values would be loaded from somewhere, such as a file or a database. Since we don't have any available data, we will use the random module to create some. We are also aliasing the numpy_wrapper module as npw to make it easier to access from our code.

Let's now create some funds transfer data to process, using the random module:

days = [1, 2, 3, 4, 5, 6, 7, 8]

transfers = []
for i in range(10000):
    day = random.choice(days)
    bank_code = random.choice(BANK_CODES)
    branch_id = random.choice(BRANCH_IDS)
    amount = random.randint(1000, 1000000)
    transfers.append((day, bank_code, branch_id, amount))

Here, we randomly select a day, a bank code, a branch ID, and an amount, storing these values in the transfers list. Our next task is to collate this information into a series of arrays. This allows us to calculate the total value of the transfers for each day, grouped by the branch ID and destination bank.
To do this, we'll create a NumPy array for each day, where the rows in each array represent branches and the columns represent destination banks. We'll then go through the list of transfers, processing them one by one: first, we select the array for the day on which the transfer occurred, and then we select the appropriate row and column based on the branch ID and the destination bank. Finally, we add the amount of the transfer to that item within the day's array.

Let's implement this logic. Our first task is to create a series of NumPy arrays, one for each day. Here, we immediately hit a snag: NumPy has many different options for creating arrays; in this case, we want to create an array that holds integer values and has its contents initialized to zero. If we used NumPy directly, our code would look like the following:

array = numpy.zeros((num_rows, num_cols), dtype=numpy.int32)

This is not exactly easy to understand, so we're going to move this logic into our NumPy wrapper module. Edit the numpy_wrapper.py file, and add the following to the end of this module:

def new(num_rows, num_cols):
    return numpy.zeros((num_rows, num_cols), dtype=numpy.int32)

Now, we can create a new array by calling our wrapper function (npw.new()) and not have to worry about the details of how NumPy works at all. We have simplified the interface to this particular aspect of NumPy.

Let's now use our wrapper function to create the eight arrays that we will need, one for each day. Add the following to the end of the detect_unusual_transfers.py file:

transfers_by_day = {}
for day in days:
    transfers_by_day[day] = npw.new(num_rows=len(BRANCH_IDS),
                                    num_cols=len(BANK_CODES))

Note that the rows correspond to branches and the columns to destination banks, matching the layout described above. Now that we have our NumPy arrays, we can use them as if they were nested Python lists. For example:

array[row][col] = array[row][col] + amount

We just need to choose the appropriate array, and calculate the row and column numbers to use. Here is the necessary code, which you should add to the end of your detect_unusual_transfers.py script:

for day,bank_code,branch_id,amount in transfers:
    array = transfers_by_day[day]
    row = BRANCH_IDS.index(branch_id)
    col = BANK_CODES.index(bank_code)
    array[row][col] = array[row][col] + amount

Now that we've collated the transfers into eight NumPy arrays, we want to use all this data to detect any unusual activity. For each combination of branch ID and destination bank code, we will need to do the following:

Calculate the average of the first seven days' activity.
Multiply the calculated average by 1.5.
If the activity on the eighth day is greater than the average multiplied by 1.5, then we consider this activity to be unusual.

Of course, we need to do this for every row and column in our arrays, which would be very slow; this is why we're using NumPy. So, we need to calculate the average for multiple arrays of numbers, then multiply the array of averages by 1.5, and finally, compare the values within the multiplied array against the array for the eighth day of data. Fortunately, these are all things that NumPy can do for us. We'll start by collecting together the seven arrays we need to average, as well as the array for the eighth day.
To do this, add the following to the end of your program:

latest_day = max(days)

transfers_to_average = []
for day in days:
    if day != latest_day:
        transfers_to_average.append(transfers_by_day[day])

current = transfers_by_day[latest_day]

To calculate the average of a list of arrays, NumPy requires us to use the following function call:

average = numpy.mean(numpy.array(arrays_to_average), axis=0)

Since this is confusing, we will move this function into our wrapper. Add the following code to the end of the numpy_wrapper.py module:

def average(arrays_to_average):
    return numpy.mean(numpy.array(arrays_to_average), axis=0)

This lets us calculate the average of the seven days' activity using a single call to our wrapper function. To do this, add the following to the end of your detect_unusual_transfers.py script:

average = npw.average(transfers_to_average)

As you can see, using the wrapper makes our code much easier to understand.

Our next task is to multiply the array of calculated averages by 1.5, and compare the result against the current day's totals. Fortunately, NumPy makes this easy:

unusual_transfers = current > average * 1.5

Because this code is so clear, there's no advantage in creating a wrapper function for it. The resulting array, unusual_transfers, will be the same size as our current and average arrays, where each entry in the array is either True or False.

We're almost done; our final task is to identify the array entries with a value of True, and tell the user about the unusual activity. While we could scan through every row and column to find the True entries, using NumPy is much faster. The following NumPy code will give us a list containing the row and column numbers for the True entries in the array:

indices = numpy.transpose(array.nonzero())

True to form, though, this code is hard to understand, so it's a perfect candidate for another wrapper function. Go back to your numpy_wrapper.py module, and add the following to the end of the file:

def get_indices(array):
    return numpy.transpose(array.nonzero())

This function returns a list (actually an array) of (row,col) values for all the True entries in the array. Back in our detect_unusual_transfers.py file, we can use this function to quickly identify the unusual activity:

for row,col in npw.get_indices(unusual_transfers):
    branch_id = BRANCH_IDS[row]
    bank_code = BANK_CODES[col]
    average_amt = int(average[row][col])
    current_amt = current[row][col]
    print("Branch {} transferred ${:,d}".format(branch_id, current_amt)
          + " to bank {}, average = ${:,d}".format(bank_code, average_amt))

As you can see, we use the BRANCH_IDS and BANK_CODES lists to convert from the row and column numbers back to the relevant branch ID and bank code. We also retrieve the average and current amounts for the suspicious activity. Finally, we print out this information to warn the user about the unusual activity.

If you run your program, you should see an output that looks something like this:

Branch 125000371 transferred $24,729,847 to bank WFBIUS6S, average = $14,954,617
Branch 125000402 transferred $26,818,710 to bank CERYUS33, average = $16,338,043
Branch 125001067 transferred $27,081,511 to bank EQTYUS44, average = $17,763,644

Because we are using random numbers for our financial data, the output will be random too. Try running the program a few times; you may not get any output at all if none of the randomly-generated values are suspicious.
Of course, we are not really interested in detecting suspicious financial activity—this example is just an excuse for working with NumPy. What is far more interesting is the wrapper module that we created, hiding the complexity of the NumPy interface so that the rest of our program can concentrate on the job to be done. If we were to continue developing our unusual activity detector, we would no doubt add more functionality to our numpy_wrapper.py module as we found more NumPy functions that we wanted to wrap.

Summary

This is just one example of a wrapper module. As we mentioned earlier, simplifying a complex and confusing API is just one use for a wrapper module; they can also be used to convert data from one format to another, add testing and error-checking code to an existing API, and call functions that are written in a different language. Note that, by definition, a wrapper is always thin—while there might be code in a wrapper (for example, to convert a parameter from an object into a dictionary), the wrapper function always ends up calling another function to do the actual work.
Exploring Scala Performance

Packt
19 May 2016
19 min read
In this article by Michael Diamant and Vincent Theron, authors of the book Scala High Performance Programming, we look at how Scala features are compiled to bytecode.

Value classes

The domain model of the order book application included two classes, Price and OrderId. We pointed out that we created domain classes for Price and OrderId to provide contextual meaning to the wrapped BigDecimal and Long. While providing us with readable code and compile-time safety, this practice also increases the number of instances that are created by our application. Allocating memory and generating class instances create more work for the garbage collector by increasing the frequency of collections and by potentially introducing additional long-lived objects. The garbage collector will have to work harder to collect them, and this process may severely impact our latency.

Luckily, as of Scala 2.10, the AnyVal abstract class is available for developers to define their own value classes to solve this problem. The AnyVal class is defined in the Scala doc (http://www.scala-lang.org/api/current/#scala.AnyVal) as "the root class of all value types, which describe values not implemented as objects in the underlying host system." The AnyVal class can be used to define a value class, which receives special treatment from the compiler. Value classes are optimized at compile time to avoid the allocation of an instance, and instead, they use the wrapped type.

Bytecode representation

As an example, to improve the performance of our order book, we can define Price and OrderId as value classes:

case class Price(value: BigDecimal) extends AnyVal
case class OrderId(value: Long) extends AnyVal

To illustrate the special treatment of value classes, we define a dummy function taking a Price value class and an OrderId value class as arguments:

def printInfo(p: Price, oId: OrderId): Unit =
  println(s"Price: ${p.value}, ID: ${oId.value}")

From this definition, the compiler produces the following method signature:

public void printInfo(scala.math.BigDecimal, long);

We see that the generated signature takes a BigDecimal object and a long, even though the Scala code allows us to take advantage of the types defined in our model. This means that we cannot use an instance of BigDecimal or Long when calling printInfo because the compiler will throw an error.

An interesting thing to notice is that the second parameter of printInfo is not compiled as Long (an object), but long (a primitive type, note the lowercase 'l'). Long and other types matching primitive types, such as Int, Float, or Short, are specially handled by the compiler to be represented by their primitive type at runtime.

Value classes can also define methods. Let's enrich our Price class, as follows:

case class Price(value: BigDecimal) extends AnyVal {
  def lowerThan(p: Price): Boolean = this.value < p.value
}

// Example usage
val p1 = Price(BigDecimal(1.23))
val p2 = Price(BigDecimal(2.03))
p1.lowerThan(p2) // returns true

Our new method allows us to compare two instances of Price. At compile time, a companion object is created for Price. This companion object defines a lowerThan method that takes two BigDecimal objects as parameters.
In reality, when we call lowerThan on an instance of Price, the code is transformed by the compiler from an instance method call to a static method call that is defined in the companion object:

public final boolean lowerThan$extension(scala.math.BigDecimal, scala.math.BigDecimal);
  Code:
    0: aload_1
    1: aload_2
    2: invokevirtual #56 // Method scala/math/BigDecimal.$less:(Lscala/math/BigDecimal;)Z
    5: ireturn

If we were to write the pseudo-code equivalent to the preceding Scala code, it would look something like the following:

val p1 = BigDecimal(1.23)
val p2 = BigDecimal(2.03)
Price.lowerThan(p1, p2) // returns true

Performance considerations

Value classes are a great addition to our developer toolbox. They help us reduce the count of instances and spare some work for the garbage collector, while allowing us to rely on meaningful types that reflect our business abstractions. However, extending AnyVal comes with a certain set of conditions that the class must fulfill. For example, a value class may only have one primary constructor that takes one public val as a single parameter. Furthermore, this parameter cannot be a value class. We saw that value classes can define methods via def, but neither val nor var is allowed inside a value class. Nested class or object definitions are also impossible. Another limitation prevents value classes from extending anything other than a universal trait, that is, a trait that extends Any, only has defs as members, and performs no initialization. If any of these conditions is not fulfilled, the compiler generates an error.

In addition to the preceding constraints, there are special cases in which a value class has to be instantiated by the JVM. Such cases include performing a pattern match or a runtime type test, or assigning a value class to an array. An example of the latter looks like the following snippet:

def newPriceArray(count: Int): Array[Price] = {
  val a = new Array[Price](count)
  for(i <- 0 until count){
    a(i) = Price(BigDecimal(Random.nextInt()))
  }
  a
}

The generated bytecode is as follows:

public highperfscala.anyval.ValueClasses$$anonfun$newPriceArray$1(highperfscala.anyval.ValueClasses$Price[]);
  Code:
    0: aload_0
    1: aload_1
    2: putfield #29 // Field a$1:[Lhighperfscala/anyval/ValueClasses$Price;
    5: aload_0
    6: invokespecial #80 // Method scala/runtime/AbstractFunction1$mcVI$sp."<init>":()V
    9: return

public void apply$mcVI$sp(int);
  Code:
    0: aload_0
    1: getfield #29 // Field a$1:[Lhighperfscala/anyval/ValueClasses$Price;
    4: iload_1
    5: new #31 // class highperfscala/anyval/ValueClasses$Price
    // omitted for brevity
    21: invokevirtual #55 // Method scala/math/BigDecimal$.apply:(I)Lscala/math/BigDecimal;
    24: invokespecial #59 // Method highperfscala/anyval/ValueClasses$Price."<init>":(Lscala/math/BigDecimal;)V
    27: aastore
    28: return

Notice how apply$mcVI$sp is invoked from newPriceArray, and this creates a new instance of ValueClasses$Price at instruction 5. As turning a single-field case class into a value class is as trivial as extending the AnyVal trait, we recommend that you always use AnyVal wherever possible. The overhead is quite low, and it generates high benefits in terms of garbage collection performance. To learn more about value classes, their limitations, and their use cases, you can find detailed descriptions at http://docs.scala-lang.org/overviews/core/value-classes.html.

Tagged types – an alternative to value classes

Value classes are an easy-to-use tool, and they can yield great improvements in terms of performance.
However, they come with a constraining set of conditions, which can make them impossible to use in certain cases. We will conclude this section with a glance at an interesting alternative: leveraging the tagged type feature that is implemented by the Scalaz library.

The Scalaz implementation of tagged types is inspired by another Scala library, named shapeless. The shapeless library provides tools to write type-safe, generic code with minimal boilerplate. While we will not explore shapeless, we encourage you to learn more about the project at https://github.com/milessabin/shapeless.

Tagged types are another way to enforce compile-time checking without incurring the cost of instance instantiation. They rely on the Tagged structural type and the @@ type alias that is defined in the Scalaz library, as follows:

type Tagged[U] = { type Tag = U }
type @@[T, U] = T with Tagged[U]

Let's rewrite part of our code to leverage tagged types with our Price object:

object TaggedTypes {

  sealed trait PriceTag
  type Price = BigDecimal @@ PriceTag

  object Price {
    def newPrice(p: BigDecimal): Price =
      Tag[BigDecimal, PriceTag](p)

    def lowerThan(a: Price, b: Price): Boolean =
      Tag.unwrap(a) < Tag.unwrap(b)
  }
}

Let's perform a short walkthrough of the code snippet. We define a PriceTag sealed trait that we will use to tag our instances. A Price type alias is then created and defined as a BigDecimal object tagged with PriceTag. The Price object defines useful functions, including the newPrice factory function that is used to tag a given BigDecimal object and return a Price object (that is, a tagged BigDecimal object). We also implement an equivalent to the lowerThan method. This function takes two Price objects (that is, two tagged BigDecimal objects), extracts the two wrapped BigDecimal objects, and compares them.

Using our new Price type, we rewrite the same newPriceArray function that we previously looked at (the code is omitted for brevity, but you can refer to it in the attached source code), and print the following generated bytecode:

public void apply$mcVI$sp(int);
  Code:
    0: aload_0
    1: getfield #29 // Field a$1:[Ljava/lang/Object;
    4: iload_1
    5: getstatic #35 // Field highperfscala/anyval/TaggedTypes$Price$.MODULE$:Lhighperfscala/anyval/TaggedTypes$Price$;
    8: getstatic #40 // Field scala/package$.MODULE$:Lscala/package$;
    11: invokevirtual #44 // Method scala/package$.BigDecimal:()Lscala/math/BigDecimal$;
    14: getstatic #49 // Field scala/util/Random$.MODULE$:Lscala/util/Random$;
    17: invokevirtual #53 // Method scala/util/Random$.nextInt:()I
    20: invokevirtual #58 // Method scala/math/BigDecimal$.apply:(I)Lscala/math/BigDecimal;
    23: invokevirtual #62 // Method highperfscala/anyval/TaggedTypes$Price$.newPrice:(Lscala/math/BigDecimal;)Ljava/lang/Object;
    26: aastore
    27: return

In this version, we no longer see an instantiation of Price, even though we are assigning Price values to an array. The tagged Price implementation involves a runtime cast, but we anticipate that the cost of this cast will be less than the instance allocations (and garbage collection) that were observed in the previous value class Price strategy.

Specialization

To understand the significance of specialization, it is important to first grasp the concept of object boxing. The JVM defines primitive types (boolean, byte, char, float, int, long, short, and double) that are stack allocated rather than heap allocated.
When a generic type is introduced, for example, scala.collection.immutable.List, the JVM references an object equivalent instead of a primitive type. In this example, an instantiated list of integers would contain heap-allocated objects rather than integer primitives. The process of converting a primitive to its object equivalent is called boxing, and the reverse process is called unboxing. Boxing is a relevant concern for performance-sensitive programming because boxing involves heap allocation. In performance-sensitive code that performs numerical computations, the cost of boxing and unboxing can create slowdowns of an order of magnitude or larger. Consider the following example to illustrate boxing overhead:

List.fill(10000)(2).map(_ * 2)

Creating the list via fill yields 10,000 heap allocations of the integer object. Performing the multiplication in map requires 10,000 unboxings to perform the multiplication and then 10,000 boxings to add the multiplication results into the new list. From this simple example, you can imagine how arithmetic in a critical section will be slowed down by boxing and unboxing operations.

As shown in Oracle's tutorial on boxing at https://docs.oracle.com/javase/tutorial/java/data/autoboxing.html, boxing in Java, and also in Scala, happens transparently. This means that without careful profiling or bytecode analysis, it is difficult to discern where you are paying the cost of object boxing. To ameliorate this problem, Scala provides a feature named specialization. Specialization refers to the compile-time process of generating duplicate versions of a generic trait or class that refer directly to a primitive type instead of the associated object wrapper. At runtime, the compiler-generated version of the generic class—or, as it is commonly referred to, the specialized version of the class—is instantiated. This process eliminates the runtime cost of boxing primitives, which means that you can define generic abstractions while retaining the performance of a handwritten, specialized implementation.

Bytecode representation

Let's look at a concrete example to better understand how the specialization process works. Consider a naive, generic representation of the number of shares purchased, as follows:

case class ShareCount[T](value: T)

For this example, let's assume that the intended usage is to swap between an integer or long representation of ShareCount. With this definition, instantiating a long-based ShareCount instance incurs the cost of boxing, as follows:

def newShareCount(l: Long): ShareCount[Long] = ShareCount(l)

This definition translates to the following bytecode:

public highperfscala.specialization.Specialization$ShareCount<java.lang.Object> newShareCount(long);
  Code:
    0: new #21 // class orderbook/Specialization$ShareCount
    3: dup
    4: lload_1
    5: invokestatic #27 // Method scala/runtime/BoxesRunTime.boxToLong:(J)Ljava/lang/Long;
    8: invokespecial #30 // Method orderbook/Specialization$ShareCount."<init>":(Ljava/lang/Object;)V
    11: areturn

In the preceding bytecode, it is clear at instruction 5 that the primitive long value is boxed before instantiating the ShareCount instance. By introducing the @specialized annotation, we are able to eliminate the boxing by having the compiler provide an implementation of ShareCount that works with primitive long values. It is possible to specify which types you wish to specialize by supplying a set of types.
As defined in the Specializable trait (http://www.scala-lang.org/api/current/index.html#scala.Specializable), you are able to specialize for all JVM primitives, as well as Unit and AnyRef. For our example, let's specialize ShareCount for integers and longs, as follows:

case class ShareCount[@specialized(Long, Int) T](value: T)

With this definition, the bytecode now becomes the following:

public highperfscala.specialization.Specialization$ShareCount<java.lang.Object> newShareCount(long);
  Code:
    0: new #21 // class highperfscala.specialization/Specialization$ShareCount$mcJ$sp
    3: dup
    4: lload_1
    5: invokespecial #24 // Method highperfscala.specialization/Specialization$ShareCount$mcJ$sp."<init>":(J)V
    8: areturn

The boxing disappears and is curiously replaced with a different class name, ShareCount$mcJ$sp. This is because we are invoking the compiler-generated version of ShareCount that is specialized for long values. By inspecting the output of javap, we see that the specialized class generated by the compiler is a subclass of ShareCount:

public class highperfscala.specialization.Specialization$ShareCount$mcI$sp
  extends highperfscala.specialization.Specialization$ShareCount<java.lang.Object>

Bear this specialization implementation detail in mind as we turn to the Performance considerations section. The use of inheritance forces tradeoffs to be made in more complex use cases.

Performance considerations

At first glance, specialization appears to be a simple panacea for JVM boxing. However, there are several caveats to consider when using specialization. A liberal use of specialization leads to significant increases in compile time and resulting code size. Consider specializing Function3, which accepts three arguments as input and produces one result. Specializing all four type parameters across all types (that is, Byte, Short, Int, Long, Char, Float, Double, Boolean, Unit, and AnyRef) yields 10^4, or 10,000, possible permutations. For this reason, the standard library applies specialization conservatively. In your own use cases, consider carefully which types you wish to specialize. If we specialize Function3 only for Int and Long, the number of generated classes shrinks to 2^4, or 16.

Specialization involving inheritance requires extra attention because it is trivial to lose specialization when extending a generic class. Consider the following example:

class ParentFoo[@specialized T](t: T)
class ChildFoo[T](t: T) extends ParentFoo[T](t)

def newChildFoo(i: Int): ChildFoo[Int] = new ChildFoo[Int](i)

In this scenario, you likely expect that ChildFoo is defined with a primitive integer. However, as ChildFoo does not mark its type with the @specialized annotation, zero specialized classes are created. Here is the bytecode to prove it:

public highperfscala.specialization.Inheritance$ChildFoo<java.lang.Object> newChildFoo(int);
  Code:
    0: new #16 // class highperfscala/specialization/Inheritance$ChildFoo
    3: dup
    4: iload_1
    5: invokestatic #22 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
    8: invokespecial #25 // Method highperfscala/specialization/Inheritance$ChildFoo."<init>":(Ljava/lang/Object;)V
    11: areturn

The next logical step is to add the @specialized annotation to the definition of ChildFoo. In doing so, we stumble across a scenario where the compiler warns about the use of specialization, as follows:

class ParentFoo must be a trait.
Specialized version of class ChildFoo will inherit generic
highperfscala.specialization.Inheritance.ParentFoo[Boolean]
class ChildFoo[@specialized T](t: T) extends ParentFoo[T](t)

The compiler indicates that you have created a diamond inheritance problem, where the specialized versions of ChildFoo extend both ChildFoo and the associated specialized version of ParentFoo. This issue can be resolved by modeling the problem with a trait, as follows:

trait ParentBar[@specialized T] {
  def t(): T
}

class ChildBar[@specialized T](val t: T) extends ParentBar[T]

def newChildBar(i: Int): ChildBar[Int] = new ChildBar(i)

This definition compiles using a specialized version of ChildBar, as we originally were hoping for, as seen in the following code:

public highperfscala.specialization.Inheritance$ChildBar<java.lang.Object> newChildBar(int);
  Code:
    0: new #32 // class highperfscala/specialization/Inheritance$ChildBar$mcI$sp
    3: dup
    4: iload_1
    5: invokespecial #35 // Method highperfscala/specialization/Inheritance$ChildBar$mcI$sp."<init>":(I)V
    8: areturn

An analogous and equally error-prone scenario is when a generic function is defined around a specialized type. Consider the following definition:

class Foo[T](t: T)

object Foo {
  def create[T](t: T): Foo[T] = new Foo(t)
}

def boxed: Foo[Int] = Foo.create(1)

Here, the definition of create is analogous to the child class from the inheritance example. Instances of Foo wrapping a primitive that are instantiated from the create method will be boxed. The following bytecode demonstrates how boxed leads to heap allocations:

public highperfscala.specialization.MethodReturnTypes$Foo<java.lang.Object> boxed();
  Code:
    0: getstatic #19 // Field highperfscala/specialization/MethodReturnTypes$Foo$.MODULE$:Lhighperfscala/specialization/MethodReturnTypes$Foo$;
    3: iconst_1
    4: invokestatic #25 // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
    7: invokevirtual #29 // Method highperfscala/specialization/MethodReturnTypes$Foo$.create:(Ljava/lang/Object;)Lhighperfscala/specialization/MethodReturnTypes$Foo;
    10: areturn

The solution is to apply the @specialized annotation at the call site, as follows:

def createSpecialized[@specialized T](t: T): Foo[T] = new Foo(t)

One final interesting scenario is when specialization is used with multiple types and one of the types extends AnyRef or is a value class. To illustrate this scenario, consider the following example:

case class ShareCount(value: Int) extends AnyVal
case class ExecutionCount(value: Int)

class Container2[@specialized X, @specialized Y](x: X, y: Y)

def shareCount = new Container2(ShareCount(1), 1)
def executionCount = new Container2(ExecutionCount(1), 1)
def ints = new Container2(1, 1)

In this example, which methods do you expect to box the second argument to Container2? For brevity, we omit the bytecode, but you can easily inspect it yourself. As it turns out, shareCount and executionCount box the integer. The compiler does not generate a specialized version of Container2 that accepts a primitive integer and a value extending AnyVal (for example, ExecutionCount). The shareCount variable also causes boxing due to the order in which the compiler removes the value class type information from the source code. In both scenarios, the workaround is to define a case class that is specific to a set of types (for example, ShareCount and Int).
Removing the generics allows the compiler to select the primitive types. The conclusion to draw from these examples is that specialization requires extra focus to be used throughout an application without boxing. As the compiler is unable to infer scenarios where you accidentally forgot to apply the @specialized annotation, it fails to raise a warning. This places the onus on you to be vigilant about profiling and inspecting bytecode to detect scenarios where specialization is incidentally dropped.

To combat some of the shortcomings that specialization brings, there is a compiler plugin under active development, named miniboxing, at http://scala-miniboxing.org/. This compiler plugin applies a different strategy that involves encoding all primitive types into a long value and carrying metadata to recall the original type. For example, a boolean can be represented in a long using a single bit to signal true or false. With this approach, performance is qualitatively similar to specialization while producing orders of magnitude fewer classes for large permutations. Additionally, miniboxing is able to more robustly handle inheritance scenarios and can warn when boxing will occur. While the implementations of specialization and miniboxing differ, the end user usage is quite similar. Like specialization, you must add appropriate annotations to activate the miniboxing plugin. To learn more about the plugin, you can view the tutorials on the miniboxing project site.

The extra focus needed to ensure that specialization produces heap-allocation-free code is worthwhile because of the performance wins in performance-sensitive code. To drive home the value of specialization, consider the following microbenchmark that computes the cost of a trade by multiplying share count with execution price. For simplicity, primitive types are used directly instead of value classes. Of course, in production code this would never happen:

@BenchmarkMode(Array(Throughput))
@OutputTimeUnit(TimeUnit.SECONDS)
@Warmup(iterations = 3, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 30, time = 10, timeUnit = TimeUnit.SECONDS)
@Fork(value = 1, warmups = 1, jvmArgs = Array("-Xms1G", "-Xmx1G"))
class SpecializationBenchmark {

  @Benchmark
  def specialized(): Double =
    specializedExecution.shareCount.toDouble * specializedExecution.price

  @Benchmark
  def boxed(): Double =
    boxedExecution.shareCount.toDouble * boxedExecution.price
}

object SpecializationBenchmark {

  class SpecializedExecution[@specialized(Int) T1, @specialized(Double) T2](
    val shareCount: Long, val price: Double)

  class BoxingExecution[T1, T2](val shareCount: T1, val price: T2)

  val specializedExecution: SpecializedExecution[Int, Double] =
    new SpecializedExecution(10l, 2d)
  val boxedExecution: BoxingExecution[Long, Double] = new BoxingExecution(10l, 2d)
}

In this benchmark, two versions of a generic execution class are defined. SpecializedExecution incurs zero boxing when computing the total cost because of specialization, while BoxingExecution requires object boxing and unboxing to perform the arithmetic. The microbenchmark is invoked with the following parameterization:

sbt 'project chapter3' 'jmh:run SpecializationBenchmark -foe true'

We configure this JMH benchmark via annotations that are placed at the class level in the code. Annotations have the advantage of setting proper defaults for your benchmark and simplifying the command-line invocation. It is still possible to override the values in the annotation with command-line arguments.
We use the -foe command-line argument to enable failure on error because there is no annotation to control this behavior. In the rest of this book, we will parameterize JMH with annotations and omit the annotations in the code samples because we always use the same values. The results are summarized in the following table:

Benchmark     Throughput (ops per second)   Error (% of throughput)
boxed         251,534,293.11                ±2.23
specialized   302,371,879.84                ±0.87

This microbenchmark indicates that the specialized implementation yields approximately 20% higher throughput. By eliminating boxing in a critical section of the code, a significant performance improvement becomes available through judicious usage of specialization. For performance-sensitive arithmetic, this benchmark provides justification for the extra effort that is required to ensure that specialization is applied properly.

Summary

This article talked about different Scala constructs and features, and explained how they are compiled to bytecode.
Diving into OOP Principles

Packt
17 May 2016
21 min read
In this article by Andrea Chiarelli, the author of the book Mastering JavaScript Object-Oriented Programming, we will discuss the OOP nature of JavaScript by showing that it complies with the OOP principles. We will also explain the main differences with classical OOP. The following topics will be addressed in the article:

What are the principles of the OOP paradigm?
Support of abstraction and modeling
How JavaScript implements Aggregation, Association, and Composition
The Encapsulation principle in JavaScript
How JavaScript supports the inheritance principle
Support of the polymorphism principle
What are the differences between classical OOP and JavaScript's OOP

Object-Oriented Programming principles

Object-Oriented Programming (OOP) is one of the most popular programming paradigms. Many developers use languages based on this programming model, such as C++, Java, C#, Smalltalk, Objective-C, and many others. One of the keys to the success of this programming approach is that it promotes a modular design and code reuse—two important features when developing complex software.

However, the Object-Oriented Programming paradigm is not based on a formal standard specification. There is no technical document that defines what OOP is and what it is not. The OOP definition is mainly based on a common understanding taken from the papers published by early researchers such as Kristen Nygaard, Alan Kay, William Cook, and others. An interesting discussion about various attempts to define Object-Oriented Programming can be found online at the following URL: http://c2.com/cgi/wiki?DefinitionsForOo

Anyway, a widely accepted definition to classify a programming language as Object Oriented is based on two requirements—its capability to model a problem through objects and its support of a few principles that grant modularity and code reuse.

In order to satisfy the first requirement, a language must enable a developer to describe reality using objects and to define relationships among objects, such as the following:

Association: This is the object's capability to refer to another independent object
Aggregation: This is the object's capability to embed one or more independent objects
Composition: This is the object's capability to embed one or more dependent objects

Commonly, the second requirement is satisfied if a language supports the following principles:

Encapsulation: This is the capability to concentrate data and the code that manipulates it into a single entity, hiding its internal details
Inheritance: This is the mechanism by which an object acquires some or all features from one or more other objects
Polymorphism: This is the capability to process objects differently based on their data type or structure

Meeting these requirements is what usually allows us to classify a language as Object Oriented.

Is JavaScript Object Oriented?

Once we have established the principles commonly accepted for defining a language as Object Oriented, can we affirm that JavaScript is an OOP language? Many developers do not consider JavaScript a true Object-Oriented language due to its lack of the class concept and because it does not enforce compliance with OOP principles. However, we can see that our informal definition makes no explicit reference to classes. Features and principles are required of objects. Classes are not a real requirement, but they are sometimes a convenient way to abstract sets of objects with common properties.
So, a language can be Object Oriented if it supports objects even without classes, as in JavaScript. Moreover, the OOP principles required for a language are intended to be supported, not enforced: they need not be mandatory in order to program in a language. The developer can choose whether or not to use the constructs that allow the creation of Object-Oriented code. Many criticize JavaScript because developers can write code that breaches the OOP principles, but this is just a choice of the programmer, not a language constraint. It also happens with other programming languages, such as C++.

We can conclude that the lack of classes, and the freedom it leaves developers to use or not use the features that support OOP principles, is not a real obstacle to considering JavaScript an OOP language. So, let's analyze in the following sections how JavaScript supports abstraction and the OOP principles.

Abstraction and modeling support

The first requirement for us to consider a language as Object Oriented is its support for modeling a problem through objects. We already know that JavaScript supports objects, but here we should determine whether they are supported in a way that lets us model reality. In fact, in Object-Oriented Programming we try to model real-world entities and processes and represent them in our software. We need a model because it is a simplification of reality: it allows us to reduce complexity by offering a vision from a particular perspective, and it helps us to reason about the relationships among entities.

This simplification feature is usually known as abstraction, and it is sometimes considered one of the principles of OOP. Abstraction is the concept of moving the focus from the details and concrete implementation of things to the features that are relevant for a specific purpose, with a more general and abstract approach. In other words, abstraction is the capability to define which properties and actions of a real-world entity have to be represented by means of objects in a program in order to solve a specific problem. For example, thanks to abstraction, we can decide that to solve a specific problem we can represent a person just as an object with name, surname, and age, since other information such as address, height, hair color, and so on is not relevant for our purpose.

More than a language feature, abstraction seems a human capability. For this reason, we prefer not to consider it an OOP principle but a (human) capability that supports modeling. Modeling reality not only involves defining objects with relevant features for a specific purpose. It also includes the definition of relationships between objects, such as Association, Aggregation, and Composition.

Association

Association is a relationship between two or more objects where each object is independent of the others. This means that an object can exist without the other and no object owns the other. Let us clarify with an example. In order to define a parent–child relationship between persons, we can do so as follows:

function Person(name, surname) {
    this.name = name;
    this.surname = surname;
    this.parent = null;
}

var johnSmith = new Person("John", "Smith");
var fredSmith = new Person("Fred", "Smith");

fredSmith.parent = johnSmith;

The assignment of the object johnSmith to the parent property of the object fredSmith establishes an association between the two objects. Of course, the object johnSmith lives independently from the object fredSmith and vice versa. Both can be created and deleted independently of each other.
As we can see from the example, JavaScript allows us to define an association between objects using a simple object reference through a property.

Aggregation

Aggregation is a special form of association relationship where one object has a more prominent role than the other. Usually, this role determines a sort of ownership of one object in relation to the other. The owner object is often called the aggregate, and the owned object is called the component. However, each object has an independent life. An example of an aggregation relationship is the one between a company and its employees, as in the following example:

var company = {
    name: "ACME Inc.",
    employees: []
};

var johnSmith = new Person("John", "Smith");
var marioRossi = new Person("Mario", "Rossi");

company.employees.push(johnSmith);
company.employees.push(marioRossi);

The person objects added to the employees collection help to define the company object, but they are independent from it. If the company object is deleted, each single person still lives. However, the real meaning of a company is bound to the presence of its employees.

Again, the code shows us that the aggregation relationship is supported by JavaScript by means of object references. It is important not to confuse Association with Aggregation. Even if the support of the two relationships is syntactically identical, that is, the assignment or attachment of an object to a property, from a conceptual point of view they represent different situations. Aggregation is the mechanism that allows you to create an object consisting of several objects, while association relates autonomous objects. In any case, JavaScript exerts no control over the way in which we associate or aggregate objects. Association and Aggregation raise a constraint that is more conceptual than technical.

Composition

Composition is a strong type of Aggregation, where each component object has no independent life without its owner, the aggregate. Consider the following example:

var person = {
    name: "John",
    surname: "Smith",
    address: {
        street: "123 Duncannon Street",
        city: "London",
        country: "United Kingdom"
    }
};

This code defines a person with his address represented as an object. The address property is strictly bound to the person object. Its life depends on the life of the person, and it cannot have an independent life without the person. If the person object is deleted, the address object is deleted as well. In this case, the strict relation between the person and his address is expressed in JavaScript by directly assigning the literal representing the address to the address property.

OOP principles support

The second requirement that allows us to consider JavaScript an Object-Oriented language involves the support of at least three principles—encapsulation, inheritance, and polymorphism. Let's analyze how JavaScript supports each of these principles.

Encapsulation

Objects are central to the Object-Oriented Programming model, and they represent the typical expression of encapsulation, that is, the ability to concentrate in one entity both data (properties) and functions (methods), hiding the internal details. In other words, the encapsulation principle allows an object to expose just what is needed to use it, hiding the complexity of its implementation. This is a very powerful principle, often found in the real world, that allows us to use an object without knowing how it internally works. Consider for instance how we drive cars. We just need to know how to speed up, brake, and change direction.
We do not need to know how the car works in detail, how its motor burns fuel, or how it transmits movement to the wheels. To understand the importance of this principle in software development as well, consider the following code:

var company = {
    name: "ACME Inc.",
    employees: [],
    sortEmployeesByName: function() {...}
};

It creates a company object with a name, a list of employees, and a method to sort the list of employees using their name property. If we need to get a sorted list of employees of the company, we simply need to know that the sortEmployeesByName() method accomplishes this task. We do not need to know how this method works or which algorithm it implements. That is an implementation detail that encapsulation hides from us.

Hiding internal details and complexity has two main reasons:

The first reason is to provide a simplified and understandable way to use an object without the need to understand the complexity inside. In our example, we just need to know that to sort employees, we have to call a specific method.
The second reason is to simplify change management. Changes to the internal sort algorithm do not affect our way of ordering employees by name. We always continue to call the same method. Maybe we will get a more efficient execution, but the expected result will not change.

We said that encapsulation hides internal details in order to simplify both the use of an object and the change of its internal implementation. However, when the internal implementation depends on publicly accessible properties, we risk frustrating the effort of hiding the internal behavior. For example, what happens if you assign a string to the employees property of the company object?

company.employees = "this is a joke!";
company.sortEmployeesByName();

The assignment of a string to a property whose value is an array is perfectly legal in JavaScript, since it is a language with dynamic typing. But most probably, we will get an exception when calling the sort method after this assignment, since the sort algorithm expects an array. In this case, the encapsulation principle has not been completely implemented.

A general approach to prevent direct access to relevant properties is to replace them with methods. For example, we can redefine our company object as in the following:

function Company(name) {
    var employees = [];

    this.name = name;

    this.getEmployees = function() {
        return employees;
    };

    this.addEmployee = function(employee) {
        employees.push(employee);
    };

    this.sortEmployeesByName = function() {
        ...
    };
}

var company = new Company("ACME Inc.");

With this approach, we cannot access the employees property directly; we need to use the getEmployees() method to obtain the list of employees of the company and addEmployee() to add an employee to the list. This guarantees that the internal state remains really hidden and consistent.

The way we created methods for the Company() constructor is not the best one. This is just one possible approach to enforce encapsulation by protecting the internal state of an object. This kind of data protection is usually called information hiding and, although often linked to encapsulation, it should be considered an autonomous principle. Information hiding deals with the accessibility of an object's members, in particular its properties. While encapsulation concerns hiding details, the information hiding principle usually allows different access levels to the members of an object.
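As a small illustrative sketch of such access levels (an addition to the original text, assuming an ES6-capable engine), private state can also be held in a WeakMap keyed by the instance, making it truly unreachable from outside; this alternative Company implementation and its privateState map are hypothetical names introduced here:

// A minimal sketch of information hiding with a WeakMap (assumes ES6 support).
var Company = (function() {
    var privateState = new WeakMap();

    function Company(name) {
        this.name = name;
        privateState.set(this, { employees: [] });
    }

    Company.prototype.addEmployee = function(employee) {
        privateState.get(this).employees.push(employee);
    };

    Company.prototype.getEmployees = function() {
        // Return a copy so callers cannot mutate the internal array.
        return privateState.get(this).employees.slice();
    };

    return Company;
})();

var company = new Company("ACME Inc.");
company.addEmployee(new Person("John", "Smith"));
console.log(company.getEmployees().length); // 1
console.log(company.employees);             // undefined: the list is hidden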
Inheritance

In Object-Oriented Programming, inheritance enables new objects to acquire the properties of existing objects. This relationship between two objects is very common and can be found in many situations in real life. It usually refers to creating a specialized object starting from a more general one.

Let's consider, for example, a person: he has some features such as name, surname, height, weight, and so on. The set of features describes a generic entity that represents a person. Using abstraction, we can select the features needed for our purpose and represent a person as an object. If we need a special person who is able to program computers, that is, a programmer, we need to create an object that has all the properties of a generic person plus some new properties that characterize the programmer object. For instance, the new programmer object can have a property describing which programming language he knows.

Suppose we choose to create the new programmer object by duplicating the properties of the person object and adding the programming language knowledge to it. This approach is in contrast with the Object-Oriented Programming goals. In particular, it does not reuse existing code, since we are duplicating the properties of the person object. A more appropriate approach should reuse the code created to define the person object. This is where the inheritance principle can help us. It allows sharing common features between objects, avoiding code duplication.

Inheritance is also called subclassing in languages that support classes. A class that inherits from another class is called a subclass, while the class from which it is derived is called a superclass. Apart from the naming, the inheritance concept is the same, although this terminology does not really fit JavaScript.

We can implement inheritance in JavaScript in various ways. Consider, for example, the following constructor of person objects:

function Person() {
    this.name = "";
    this.surname = "";
}

In order to define a programmer as a person specialized in computer programming, we will add a new property describing his knowledge of a programming language: knownLanguage. A simple approach to create the programmer object that inherits properties from person is based on prototypes. Here is a possible implementation:

function Programmer() {
    this.knownLanguage = "";
}

Programmer.prototype = new Person();

We will create a programmer with the following code:

var programmer = new Programmer();

We will obtain an object that has the properties of the person object (name and surname) and the specific property of the programmer (knownLanguage), that is, the programmer object inherits the person properties.

This is a simple example to demonstrate that JavaScript supports the inheritance principle of Object-Oriented Programming at its basic level. Inheritance is a complex concept that has many facets and several variants in programming, many of them dependent on the language used.
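As a side note (not part of the original example), the same relationship can be expressed with the ES6 class syntax, which is just syntactic sugar over the prototype mechanism shown above:

// A sketch of the same inheritance using ES6 classes (assumes ES6 support).
class Person {
    constructor() {
        this.name = "";
        this.surname = "";
    }
}

class Programmer extends Person {
    constructor() {
        super();               // initializes name and surname
        this.knownLanguage = "";
    }
}

var programmer = new Programmer();
console.log(programmer instanceof Person); // true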
Polymorphism

In Object-Oriented Programming, polymorphism is understood in different ways, even if the basis is a common notion: the ability to handle multiple data types uniformly. Support for polymorphism brings benefits that further the overall goals of OOP. Mainly, it reduces coupling in our applications and, in some cases, allows us to write more compact code. The most common ways in which a programming language can support polymorphism include:

Methods that take parameters with different data types (overloading)
Management of generic types, not known in advance (parametric polymorphism)
Expressions whose type can be represented by a class and the classes derived from it (subtype polymorphism or inclusion polymorphism)

In most languages, overloading is what happens when you have two methods with the same name but different signatures. At compile time, the compiler works out which method to call by matching the types of the invocation arguments against the types of the methods' parameters. The following is an example of method overloading in C#:

public int CountItems(int x) {
  return x.ToString().Length;
}
public int CountItems(string x) {
  return x.Length;
}

The CountItems() method has two signatures: one for integers and one for strings. This allows us to count the number of digits in a number or the number of characters in a string in a uniform manner, simply by calling the same method. Overloading can also be expressed through methods with different numbers of arguments, as shown in the following C# example:

public int Sum(int x, int y) {
  return Sum(x, y, 0);
}
public int Sum(int x, int y, int z) {
  return x + y + z;
}

Here, the Sum() method is able to sum two or three integers, and the correct method definition is selected on the basis of the number of arguments passed. As JavaScript developers, we can replicate this behavior in our scripts. For example, the C# CountItems() method becomes the following in JavaScript:

function countItems(x) {
  return x.toString().length;
}

While the Sum() example becomes:

function sum(x, y, z) {
  x = x ? x : 0;
  y = y ? y : 0;
  z = z ? z : 0;
  return x + y + z;
}

Or, using the more convenient ES6 default parameter syntax:

function sum(x = 0, y = 0, z = 0) {
  return x + y + z;
}

These examples show that JavaScript supports overloading in a more immediate way than strongly typed languages. In strongly typed languages, overloading is sometimes called static polymorphism, since the correct method to invoke is determined statically by the compiler at compile time.

Parametric polymorphism allows a method to work on parameters of any type. It is often called generics, and many languages support it in built-in methods. For example, in C#, we can define a list of items whose type is not fixed in advance using the List<T> generic type. This allows us to create lists of integers, strings, or any other type. We can also create our own generic classes, as shown in the following C# code:

public class Stack<T> {
  private T[] items;
  private int count;
  public void Push(T item) { ... }
  public T Pop() { ... }
}

This code defines a typical stack implementation whose item type is not fixed. We can create, for example, a stack of strings with the following code:

var stack = new Stack<String>();

Thanks to its dynamic typing, JavaScript supports parametric polymorphism implicitly. In fact, the type of a function's parameters is inherently generic, since a parameter's type is established only when a value is assigned to it. The following is a possible implementation of a stack constructor in JavaScript:

function Stack() {
  this.stack = [];
  this.pop = function() {
    return this.stack.pop();
  };
  this.push = function(item) {
    this.stack.push(item);
  };
}
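Gradually typed languages sit between these two extremes. As an illustrative sketch of our own (not from the original text), Python's typing module lets us declare the element type for a static type checker while the runtime stays as permissive as JavaScript's:

from typing import Generic, List, TypeVar

T = TypeVar("T")

class Stack(Generic[T]):
    def __init__(self) -> None:
        self._items = []  # type: List[T]

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()

stack = Stack()  # type: Stack[str]
stack.push("hello")
print(stack.pop())  # hello

A type checker would flag stack.push(42) as an error, but the interpreter itself, like JavaScript, would happily accept it.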
Subtype polymorphism allows objects of different types that share an inheritance relationship to be handled consistently. This means that wherever we can use an object of a specific type, we can also use an object of a type derived from it. Let's look at a C# example to clarify this concept:

public class Person {
  public string Name {get; set;}
  public string SurName {get; set;}
}
public class Programmer : Person {
  public String KnownLanguage {get; set;}
}
public void WriteFullName(Person p) {
  Console.WriteLine(p.Name + " " + p.SurName);
}
var a = new Person();
a.Name = "John";
a.SurName = "Smith";
var b = new Programmer();
b.Name = "Mario";
b.SurName = "Rossi";
b.KnownLanguage = "C#";
WriteFullName(a); //result: John Smith
WriteFullName(b); //result: Mario Rossi

In this code, we again define the Person class and its derived class Programmer, together with a WriteFullName() method that accepts an argument of type Person. Thanks to subtype polymorphism, we can also pass objects of type Programmer to WriteFullName(), since Programmer is derived from Person. In fact, from a conceptual point of view, a programmer is also a person, so subtype polymorphism matches a concrete representation of reality. Of course, the C# example can easily be reproduced in JavaScript, since we have no type constraints. Here is the corresponding code:

function Person() {
  this.name = "";
  this.surname = "";
}
function Programmer() {
  this.knownLanguage = "";
}
Programmer.prototype = new Person();
function writeFullName(p) {
  console.log(p.name + " " + p.surname);
}
var a = new Person();
a.name = "John";
a.surname = "Smith";
var b = new Programmer();
b.name = "Mario";
b.surname = "Rossi";
b.knownLanguage = "JavaScript";
writeFullName(a); //result: John Smith
writeFullName(b); //result: Mario Rossi

As we can see, the JavaScript code is quite similar to the C# code, and the result is the same.

JavaScript OOP versus classical OOP

The discussion so far shows how JavaScript supports the fundamental Object-Oriented Programming principles and can be considered a true OOP language like many others. However, JavaScript differs from most other languages in certain specific features, which can concern developers used to languages that implement classical OOP. The first of these features is the dynamic nature of the language, both in data type management and in object creation. Since data types are evaluated dynamically, some OOP features, such as polymorphism, are implicitly supported. Moreover, the ability to change an object's structure at runtime breaks the common intuition that binds an object to a more abstract entity such as a class. The lack of the concept of a class is another big difference from classical OOP. Of course, we are talking about class generalization, which has nothing to do with the class construct introduced by ES6: that is just syntactic convenience over standard JavaScript constructors. Classes in most Object-Oriented languages represent a generalization of objects, that is, an extra level of abstraction above them. So, classical Object-Oriented programming has two types of abstraction: classes and objects. An object is an abstraction of a real-world entity, while a class is an abstraction of an object or of another class (in other words, it is a generalization). Objects in classical OOP languages can only be created by instantiating classes.

JavaScript takes a different approach to object management. It has just one type of abstraction: objects. Unlike the classical OOP approach, an object can be created directly as an abstraction of a real-world entity or as an abstraction of another object. In the latter case, the abstracted object is called the prototype.
As opposed to the classical OOP approach, the JavaScript approach is sometimes called prototypal Object-Oriented Programming. Of course, the lack of a notion of class in JavaScript affects the inheritance mechanism: while in classical OOP inheritance is an operation on classes, in prototypal OOP it is an operation on objects. That does not mean that classical OOP is better than prototypal OOP or vice versa; they are simply different approaches. However, we cannot ignore that these differences have an impact on the way we manage objects. At the very least, we should note that while in classical OOP classes are immutable (we cannot add, change, or remove their properties or methods at runtime), in prototypal OOP objects and prototypes are extremely flexible. Moreover, classical OOP adds an extra level of abstraction with classes, leading to more verbose code, while prototypal OOP is more immediate and requires more compact code.

Summary

In this article, we explored the basic principles of the Object-Oriented Programming paradigm. We focused on abstraction to define objects; on association, aggregation, and composition to define relationships between objects; and on the encapsulation, inheritance, and polymorphism principles that outline the fundamentals required by OOP. We saw how JavaScript supports all the features that allow us to define it as a true Object-Oriented language, on a par with languages such as Java, C#, and C++, and we compared classical OOP with prototypal OOP.

Resources for Article:

Further resources on this subject:
Just Object Oriented Programming (Object Oriented Programming, explained) [article]
Introducing Object Oriented Programmng with TypeScript [article]
Python 3 Object Oriented Programming: Managing objects [article]

Expert Python Programming: Interfaces

This article by Michał Jaworski and Tarek Ziadé, the authors of the book Expert Python Programming - Second Edition, will mainly focus on interfaces. (For more resources related to this topic, see here.)

An interface is a definition of an API. It describes a list of methods and attributes that a class should implement with the desired behavior. This description does not implement any code but just defines an explicit contract for any class that wishes to implement the interface. Any class can then implement one or several interfaces in whichever way it wants. While Python prefers duck typing over explicit interface definitions, it may sometimes be better to use them. For instance, an explicit interface definition makes it easier for a framework to define functionality over interfaces. The benefit is that classes are loosely coupled, which is considered good practice. For example, to perform a given process, a class A does not depend on a class B, but rather on an interface I. Class B implements I, but it could be any other class. Support for such a technique is built into many statically typed languages, such as Java or Go. Interfaces allow functions or methods to limit the range of acceptable arguments to objects that implement a given interface, no matter what class they come from. This allows for more flexibility than restricting arguments to given types or their subclasses. It is like an explicit version of duck typing: Java uses interfaces to verify type safety at compile time rather than using duck typing to tie things together at runtime.

Python has a completely different typing philosophy from Java, so it does not have native support for interfaces. Anyway, if you would like to have more explicit control over application interfaces, there are generally two solutions to choose from:

Use some third-party framework that adds the notion of interfaces
Use some of the advanced language features to build your own methodology for handling interfaces

Using zope.interface

There are a few frameworks that allow you to build explicit interfaces in Python. The most notable one is a part of the Zope project: the zope.interface package. Although Zope is not as popular nowadays as it used to be, the zope.interface package is still one of the main components of the Twisted framework. The core class of the zope.interface package is the Interface class. It allows you to explicitly define a new interface by subclassing. Let's assume that we want to define the obligatory interface for every implementation of a rectangle:

from zope.interface import Interface, Attribute

class IRectangle(Interface):
    width = Attribute("The width of rectangle")
    height = Attribute("The height of rectangle")

    def area():
        """ Return area of rectangle """

    def perimeter():
        """ Return perimeter of rectangle """

Some important things to remember when defining interfaces with zope.interface are as follows:

The common naming convention for interfaces is to use I as the name prefix.
The methods of the interface must not take the self parameter.
As the interface does not provide a concrete implementation, it should consist only of empty methods. You can use the pass statement, raise NotImplementedError, or provide a docstring (preferred).
An interface can also specify the required attributes using the Attribute class.

When you have such a contract defined, you can then define new concrete classes that provide an implementation for our IRectangle interface.
In order to do that, you need to use the implementer() class decorator and implement all of the defined methods and attributes:

from zope.interface import implementer

@implementer(IRectangle)
class Square:
    """ Concrete implementation of square with rectangle interface """

    def __init__(self, size):
        self.size = size

    @property
    def width(self):
        return self.size

    @property
    def height(self):
        return self.size

    def area(self):
        return self.size ** 2

    def perimeter(self):
        return 4 * self.size

@implementer(IRectangle)
class Rectangle:
    """ Concrete implementation of rectangle """

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

    def perimeter(self):
        return self.width * 2 + self.height * 2

It is common to say that the interface defines a contract that a concrete implementation needs to fulfill. The main benefit of this design pattern is the ability to verify consistency between the contract and the implementation before the object is used. With the ordinary duck-typing approach, you only find inconsistencies when there is a missing attribute or method at runtime. With zope.interface, you can introspect the actual implementation using two functions from the zope.interface.verify module to find inconsistencies early on:

verifyClass(interface, class_object): This verifies the class object for the existence of methods and the correctness of their signatures, without looking for attributes
verifyObject(interface, instance): This verifies the methods, their signatures, and also the attributes of the actual object instance

Since we have defined our interface and two concrete implementations, let's verify their contracts in an interactive session:

>>> from zope.interface.verify import verifyClass, verifyObject
>>> verifyObject(IRectangle, Square(2))
True
>>> verifyClass(IRectangle, Square)
True
>>> verifyObject(IRectangle, Rectangle(2, 2))
True
>>> verifyClass(IRectangle, Rectangle)
True

Nothing impressive so far. The Rectangle and Square classes carefully follow the defined contract, so there is nothing more to see than a successful verification. But what happens when we make a mistake? Let's see an example of two classes that fail to provide a full IRectangle interface implementation:

import math

@implementer(IRectangle)
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

@implementer(IRectangle)
class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

    def perimeter(self):
        return 2 * math.pi * self.radius

The Point class does not provide any method or attribute of the IRectangle interface, so its verification shows inconsistencies already at the class level:

>>> verifyClass(IRectangle, Point)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "zope/interface/verify.py", line 102, in verifyClass
    return _verify(iface, candidate, tentative, vtype='c')
  File "zope/interface/verify.py", line 62, in _verify
    raise BrokenImplementation(iface, name)
zope.interface.exceptions.BrokenImplementation: An object has failed to implement interface <InterfaceClass __main__.IRectangle>
The perimeter attribute was not provided.

The Circle class is a bit more problematic. It has all the interface methods defined, but it breaks the contract at the instance attribute level.
This is the reason why, in most cases, you need to use the verifyObject() function to completely verify the interface implementation:

>>> verifyObject(IRectangle, Circle(2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "zope/interface/verify.py", line 105, in verifyObject
    return _verify(iface, candidate, tentative, vtype='o')
  File "zope/interface/verify.py", line 62, in _verify
    raise BrokenImplementation(iface, name)
zope.interface.exceptions.BrokenImplementation: An object has failed to implement interface <InterfaceClass __main__.IRectangle>
The width attribute was not provided.

Using zope.interface is an interesting way to decouple your application. It allows you to enforce proper object interfaces without the overblown complexity of multiple inheritance, and it also lets you catch inconsistencies early. However, the biggest downside of this approach is the requirement to explicitly declare that a given class follows some interface in order for it to be verified. This is especially troublesome if you need to verify instances coming from external classes of built-in libraries. zope.interface provides some solutions for that problem, and you can of course handle such issues on your own by using the adapter pattern, or even monkey patching. Anyway, the simplicity of such solutions is at least arguable.

Using function annotations and abstract base classes

Design patterns are meant to make problem solving easier, not to provide you with more layers of complexity. zope.interface is a great concept and may fit some projects well, but it is not a silver bullet. By using it, you may soon find yourself spending more time fixing issues with incompatible interfaces for third-party classes and providing never-ending layers of adapters instead of writing the actual implementation. If you feel that way, then this is a sign that something went wrong. Fortunately, Python supports building a lightweight alternative to interfaces. It's not a full-fledged solution like zope.interface or its alternatives, but it generally allows for more flexible applications. You may need to write a bit more code, but in the end you will have something that is more extensible, handles external types better, and may be more future proof. Note that Python at its core does not have an explicit notion of interfaces, and probably never will, but it has some features that allow you to build something resembling the functionality of interfaces. These features are:

Abstract base classes (ABCs)
Function annotations
Type annotations

The core of our solution is abstract base classes, so we will feature them first. As you probably know, direct type comparison is considered harmful and not Pythonic. You should always avoid comparisons such as the following:

assert type(instance) == list

Comparing types in functions or methods this way completely breaks the ability to pass a subtype of the class as an argument to the function. A slightly better approach is to use the isinstance() function, which takes inheritance into account:

assert isinstance(instance, list)

An additional advantage of isinstance() is that you can use a larger range of types to check type compatibility. For instance, if your function expects to receive some sort of sequence as the argument, you can compare against a tuple of basic types:

assert isinstance(instance, (list, tuple, range))

Such type compatibility checking is OK in some situations, but it is still not perfect.
It will work with any subclass of list, tuple, or range, but it will fail if the user passes something that behaves exactly like one of these sequence types without inheriting from any of them. For instance, let's relax our requirements and say that you want to accept any kind of iterable as an argument. What would you do? The list of basic types that are iterable is actually pretty long. You would need to cover list, tuple, range, str, bytes, dict, set, generators, and a lot more. And even if you covered all of them, this would still not allow you to check against a custom class that defines the __iter__() method but inherits directly from object. This is the kind of situation where abstract base classes (ABCs) are the proper solution. An ABC is a class that does not need to provide a concrete implementation; instead, it defines a blueprint of a class that may be used to check type compatibility against. This concept is very similar to the abstract classes and virtual methods known from the C++ language. Abstract base classes are used for two purposes:

Checking for implementation completeness
Checking for implicit interface compatibility

So, let's assume we want to define an interface that ensures a class has a push() method. We need to create a new abstract base class using the special ABCMeta metaclass and the abstractmethod() decorator from the standard abc module:

from abc import ABCMeta, abstractmethod

class Pushable(metaclass=ABCMeta):

    @abstractmethod
    def push(self, x):
        """ Push argument no matter what it means """

The abc module also provides an ABC base class that can be used instead of the metaclass syntax:

from abc import ABC, abstractmethod

class Pushable(ABC):

    @abstractmethod
    def push(self, x):
        """ Push argument no matter what it means """

Once this is done, we can use the Pushable class as a base class for a concrete implementation, and it will guard us against the instantiation of objects with an incomplete implementation. Let's define DummyPushable, which implements all interface methods, and IncompletePushable, which breaks the expected contract:

class DummyPushable(Pushable):
    def push(self, x):
        return

class IncompletePushable(Pushable):
    pass

If you want to obtain a DummyPushable instance, there is no problem, because it implements the only required push() method:

>>> DummyPushable()
<__main__.DummyPushable object at 0x10142bef0>

But if you try to instantiate IncompletePushable, you will get a TypeError because of the missing implementation of the push() method:

>>> IncompletePushable()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class IncompletePushable with abstract methods push

The preceding approach is a great way to ensure the implementation completeness of subclasses, but it is just as explicit as the zope.interface alternative. DummyPushable instances are of course also instances of Pushable, because DummyPushable is a subclass of Pushable. But how about other classes with the same methods that are not descendants of Pushable? Let's create one and see:

>>> class SomethingWithPush:
...     def push(self, x):
...         pass
...
>>> isinstance(SomethingWithPush(), Pushable)
False

Something is still missing. The SomethingWithPush class definitely has a compatible interface, but it is not considered an instance of Pushable yet. So, what is missing?
The answer is the __subclasshook__(subclass) method, which allows you to inject your own logic into the procedure that determines whether an object is an instance of a given class. Unfortunately, you need to provide it yourself, as the abc creators did not want to constrain developers in overriding the whole isinstance() mechanism. We get full power over it, but we are forced to write some boilerplate code. Although you can do whatever you want, usually the only reasonable thing to do in the __subclasshook__() method is to follow the common pattern. The standard procedure is to check whether the set of defined methods is available somewhere in the MRO of the given class:

from abc import ABCMeta, abstractmethod

class Pushable(metaclass=ABCMeta):

    @abstractmethod
    def push(self, x):
        """ Push argument no matter what it means """

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Pushable:
            if any("push" in B.__dict__ for B in C.__mro__):
                return True
        return NotImplemented

With the __subclasshook__() method defined this way, you can now confirm that instances that implement the interface implicitly are also considered instances of the interface:

>>> class SomethingWithPush:
...     def push(self, x):
...         pass
...
>>> isinstance(SomethingWithPush(), Pushable)
True

Unfortunately, this approach to verifying type compatibility and implementation completeness does not take into account the signatures of class methods. So, if the number of expected arguments differs in the implementation, it will still be considered compatible. In most cases, this is not an issue, but if you need such fine-grained control over interfaces, the zope.interface package allows for it. As already said, the __subclasshook__() method does not constrain you from adding more complexity to the isinstance() function's logic to achieve a similar level of control.

The two other features that complement abstract base classes are function annotations and type hints. Function annotation is a syntax element that allows you to annotate functions and their arguments with arbitrary expressions. This is only a feature stub that carries no semantics of its own; there is no utility in the standard library that uses it to enforce any behavior. Anyway, you can use it as a convenient and lightweight way to inform the developer of the expected argument interface. For instance, consider this IRectangle interface rewritten from zope.interface to an abstract base class:

from abc import (
    ABCMeta,
    abstractmethod,
    abstractproperty
)

class IRectangle(metaclass=ABCMeta):

    @abstractproperty
    def width(self):
        return

    @abstractproperty
    def height(self):
        return

    @abstractmethod
    def area(self):
        """ Return rectangle area """

    @abstractmethod
    def perimeter(self):
        """ Return rectangle perimeter """

    @classmethod
    def __subclasshook__(cls, C):
        if cls is IRectangle:
            if all([
                any("area" in B.__dict__ for B in C.__mro__),
                any("perimeter" in B.__dict__ for B in C.__mro__),
                any("width" in B.__dict__ for B in C.__mro__),
                any("height" in B.__dict__ for B in C.__mro__),
            ]):
                return True
        return NotImplemented

If you have a function that works only on rectangles, say draw_rectangle(), you can annotate the interface of the expected argument as follows:

def draw_rectangle(rectangle: IRectangle):
    ...

This adds nothing more than information for the developer about the expected argument, and even that is only an informal contract because, as we know, bare annotations carry no syntactic meaning of their own.
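We can verify that bare annotations are nothing more than stored metadata by inspecting the function object directly; here is a quick sketch (with a hypothetical stand-in class, since it only illustrates where annotations live):

class IRectangle:  # stand-in for the abstract base class defined above
    pass

def draw_rectangle(rectangle: IRectangle):
    pass

# Annotations are kept in a plain dictionary on the function object:
print(draw_rectangle.__annotations__)
# {'rectangle': <class '__main__.IRectangle'>}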
However, annotations are accessible at runtime, so we can do something more with them. Here is an example implementation of a generic decorator that verifies the interfaces declared in function annotations, provided they are expressed using abstract base classes:

import inspect
from abc import ABCMeta
from functools import wraps

def ensure_interface(function):
    signature = inspect.signature(function)
    parameters = signature.parameters

    @wraps(function)
    def wrapped(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            annotation = parameters[name].annotation

            if not isinstance(annotation, ABCMeta):
                continue

            if not isinstance(value, annotation):
                raise TypeError(
                    "{} does not implement {} interface"
                    "".format(value, annotation)
                )

        return function(*args, **kwargs)

    return wrapped

Once this is done, we can create a concrete class that implicitly implements the IRectangle interface (without inheriting from IRectangle) and update the implementation of the draw_rectangle() function to see how the whole solution works:

class ImplicitRectangle:
    def __init__(self, width, height):
        self._width = width
        self._height = height

    @property
    def width(self):
        return self._width

    @property
    def height(self):
        return self._height

    def area(self):
        return self.width * self.height

    def perimeter(self):
        return self.width * 2 + self.height * 2

@ensure_interface
def draw_rectangle(rectangle: IRectangle):
    print(
        "{} x {} rectangle drawing"
        "".format(rectangle.width, rectangle.height)
    )

If we feed the draw_rectangle() function an incompatible object, it now raises TypeError with a meaningful explanation:

>>> draw_rectangle('foo')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "<input>", line 101, in wrapped
TypeError: foo does not implement <class 'IRectangle'> interface

But if we use ImplicitRectangle or anything else that resembles the IRectangle interface, the function executes as it should:

>>> draw_rectangle(ImplicitRectangle(2, 10))
2 x 10 rectangle drawing

Our example implementation of ensure_interface() is based on the typechecked() decorator from the typeannotations project, which tries to provide run-time checking capabilities (refer to https://github.com/ceronman/typeannotations). Its source code might give you some interesting ideas about how to process type annotations to ensure run-time interface checking.

The last feature that can be used to complement this interface pattern landscape is type hints. Type hints are described in detail by PEP 484 and were added to the language quite recently. They are exposed in the new typing module and are available from Python 3.5 on. Type hints are built on top of function annotations and reuse this slightly forgotten syntax feature of Python 3. They are intended to guide type hinting and checking by various yet-to-come Python type checkers. The typing module and the PEP 484 document aim to provide a standard hierarchy of types and classes for describing type annotations. Still, type hints are not revolutionary on their own, because this feature does not come with any type checker built into the standard library. If you want to use type checking or enforce strict interface compatibility in your code, you need to create your own tool, because there is none worth recommending yet. This is why we won't dig into the details of PEP 484. Anyway, type hints and the documents describing them are worth mentioning because, if some extraordinary solution emerges in the field of type checking in Python, it is highly probable that it will be based on PEP 484.
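To make the type hints idea concrete, here is a minimal illustrative example of our own using the typing module; note that the hint alone enforces nothing at runtime, which is exactly why external type checkers are needed:

from typing import List

def mean(values: List[float]) -> float:
    return sum(values) / len(values)

print(mean([1.0, 2.0, 3.0]))  # 2.0

# Nothing stops a badly typed call at the annotation level; the failure,
# if any, comes from the function body itself (here, sum() raising TypeError):
# mean(None)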
Using collections.abc

Abstract base classes are like small building blocks for creating a higher level of abstraction. They allow you to implement really usable interfaces, but they are very generic and designed to handle a lot more than this single design pattern. You can unleash your creativity and do magical things, but building something generic and really usable may require a lot of work, work that may never pay off. This is why custom abstract base classes are not used so often. Despite that, the collections.abc module provides a lot of predefined ABCs that allow you to verify the interface compatibility of many basic Python types. With the base classes provided in this module, you can check, for example, whether a given object is callable, is a mapping, or supports iteration. Using them with the isinstance() function is far better than comparing against the base Python types. You should definitely know how to use these base classes even if you don't want to define your own custom interfaces with ABCMeta. The abstract base classes from collections.abc that you will use most often are:

Container: This interface means that the object supports the in operator and implements the __contains__() method
Iterable: This interface means that the object supports iteration and implements the __iter__() method
Callable: This interface means that the object can be called like a function and implements the __call__() method
Hashable: This interface means that the object is hashable (it can be included in sets and used as a key in dictionaries) and implements the __hash__() method
Sized: This interface means that the object has a size (it can be the subject of the len() function) and implements the __len__() method

A full list of the available abstract base classes from the collections.abc module can be found in the official Python documentation (refer to https://docs.python.org/3/library/collections.abc.html).
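A short interactive sketch (our own example) shows how these checks behave in practice, including the implicit interface compatibility discussed in the previous sections:

from collections.abc import Callable, Iterable, Sized

print(isinstance([1, 2, 3], Iterable))  # True
print(isinstance(len, Callable))        # True
print(isinstance(42, Sized))            # False

class Bag:
    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

# Bag never inherits from Sized, yet implementing __len__() is enough:
print(isinstance(Bag("abc"), Sized))    # True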
Summary

Design patterns are reusable, somewhat language-specific solutions to common problems in software design. They are a part of the culture of all developers, no matter what language they use. In this article, we covered a small part of this topic: what interfaces are and how they can be used in Python.

Resources for Article:

Further resources on this subject:
Creating User Interfaces [article]
Graphical User Interfaces for OpenSIPS 1.6 [article]

The Parallel Universe of R

This article by Simon Chapple, author of the book Mastering Parallel Programming with R, helps us understand the intricacies of parallel computing. Here, we'll take a look into Delores' Crystal Ball at what the future holds for massively parallel computation, which is likely to have a significant impact on the world of R programming, particularly when applied to big data. (For more resources related to this topic, see here.)

Three steps to successful parallelization

The following three-step distilled guidance is intended to help you decide what form of parallelism might be best suited to your particular algorithm/problem, and it summarizes what you learned throughout this article. Necessarily, it applies a level of generalization, so approach these guidelines with due consideration:

1. Determine the type of parallelism that may best apply to your algorithm. Is the problem you are solving more computationally bound or data bound? If the former, your problem may be amenable to GPUs; if the latter, your problem may be more amenable to cluster-based computing; and if your problem requires a complex processing chain, consider using the Spark framework. Can you divide the problem data/space to achieve a balanced workload across all processes, or do you need to employ an adaptive load-balancing scheme, such as a task farm-based approach? Does your problem/algorithm naturally divide spatially? If so, consider whether a grid-based parallel approach can be used. Perhaps your problem is on an epic scale? If so, maybe you can develop message passing-based code and run it on a supercomputer. Is there an implied sequential dependency between tasks; that is, do processes need to cooperate and share data during their computation, or can each divided task be executed entirely independently of the others? A large proportion of parallel algorithms typically have a work distribution phase, a parallel computation phase, and a result aggregation phase. To reduce the overhead of the startup and close-down phases, consider whether a tree-based approach to work distribution and result aggregation may be appropriate in your case.

2. Ensure that the basis of the compute in your algorithm has an optimal implementation. Profile your code in serial to determine whether there are any bottlenecks, and target these for improvement. Is there an existing parallel implementation similar to your algorithm that you can use directly or adapt? Review CRAN Task View: High-Performance and Parallel Computing with R at https://cran.r-project.org/web/views/HighPerformanceComputing.html; in particular, take a look at the subsection entitled Parallel Computing: Applications. (The original article shows a snapshot of this page as Figure 1: CRAN provides various parallelized packages you can use in your own program.)

3. Test and evaluate the parallel efficiency of your implementation. Use the P-estimated form of Amdahl's Law to predict the level of scalability you can achieve (see the worked example at the end of this section). Test your algorithm at varying degrees of parallelism, particularly odd numbers that trigger edge-case behaviors, and don't forget to run with just a single process. Running with more processes than processors can trigger lurking deadlock/race conditions (this is most applicable to message passing-based implementations). Where possible, to reduce overhead, ensure that your method of deployment/initialization places the data being consumed local to each parallel process.
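For reference, Amdahl's Law (restated here in its standard form, not quoted from the article) predicts the speedup S achievable with N processors when a fraction p of the program's work can be parallelized:

S(N) = \frac{1}{(1 - p) + \frac{p}{N}}

As a worked example, with p = 0.95 and N = 16, S(16) = 1 / (0.05 + 0.95/16) ≈ 9.1; and since S(N) approaches 1 / (1 - p) as N grows, no processor count can push this program's speedup beyond 20.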
What does the future hold?

Obviously, this final section runs a considerable risk of "crystal ball gazing" and getting it wrong. However, there are a number of clear directions in which both hardware and software will develop, all of which make it clear that parallel programming will play an ever more important role in our computational future. Besides, it has become critical for us to be able to process vast amounts of information within a short window of time in order to ensure our own individual and collective safety. For example, we are experiencing increased momentum towards significant climate change and extreme weather events, and we will therefore require increasingly accurate weather prediction to help us deal with this; that will only be possible with highly efficient parallel algorithms. In order to gaze into the future, we need to look back at the past. The hardware technology available to parallel computing has evolved at a phenomenal pace through the years. The levels of performance that can be achieved today by single-chip designs are truly staggering in terms of recent history.

The history of HPC

For an excellent infographic review of the development of computing performance, I would urge you to visit the following web page: http://pages.experts-exchange.com/processing-power-compared/ It beautifully illustrates, for example, how an iPhone 4 released in 2010 has near-equivalent performance to the Cray-2 supercomputer from 1985, at around 1.5 gigaflops, and how the Apple Watch released in 2015 has around twice the performance of the iPhone 4 and the Cray-2!

While chip manufacturers have managed to maintain the famous Moore's law, which predicts that transistor counts double every two years, we are now at 14 nanometers (nm) in chip production, giving us around 100 complex processing cores in a single chip. In July 2015, IBM announced a prototype chip built at 7 nm (1/10,000th the width of a human hair). Some scientists suggest that quantum tunneling effects will start to have an impact at 5 nm (which Intel expects to bring to the market by 2020), although a number of research groups have demonstrated individual transistors as small as 1 nm in the lab, built from materials such as graphene. What all of this suggests is that placing 1,000 independent high-performance computational cores, together with sufficient high-speed cache memory, inside a single-chip package comparable in size to today's chips could be possible within the next 10 years.

NVIDIA and Intel are arguably at the forefront of dedicated HPC chip development, with their respective offerings used in the world's fastest supercomputers; both can also be embedded in your desktop computer. NVIDIA produces Tesla, whose K80 GPU-based accelerator now peaks at 1.87 teraflops double precision and 5.6 teraflops single precision, utilizing 4,992 cores (dual processor) and 24 GB of on-board RAM. Intel produces Xeon Phi, the collective family brand name for its Many Integrated Core (MIC) architecture; the new Knights Landing is expected to peak at 3 teraflops double precision and 6 teraflops single precision, utilizing 72 cores (single processor) and 16 GB of highly integrated on-chip fast memory when it is released, likely in fall 2015.
The successors to these chips, namely NVIDIA's Volta and Intel's Knights Hill, will be the foundation for the next generation of American $200 million supercomputers in 2018, delivering around 150 to 300 petaflops of peak performance (around 150 million iPhone 4s), compared to China's TIANHE-2, ranked the fastest supercomputer in the world in 2015 with a peak performance of around 50 petaflops from 3.1 million cores.

At the other extreme, in the somewhat smaller and less expensive world of mobile devices, most devices currently use between two and four cores, though mixed multicore designs such as ARM's big.LITTLE octa-core make eight cores available. This number is already on the increase: for example, MediaTek's new SoC MT6797 has 10 main processing cores, split into a pair and two groups of four cores with different clock speeds and power requirements, to serve as the basis for the next generation of mobile phones. Top-end mobile devices therefore exhibit a rich heterogeneous architecture, with mixed-power cores, separate sensor chips, GPUs, and Digital Signal Processors (DSPs) to direct different aspects of the workload to the most power-efficient component. Mobile phones increasingly act as the communications hub and signal-processing gateway for a plethora of additional devices, such as biometric wearables and the rapidly expanding number of ultra-low-power Internet of Things (IoT) sensing devices smartening all aspects of our local environment. While we are a little way from running R itself natively on mobile devices, the time will come when we seek to harness the distributed computing power of all our mobile devices. In 2014 alone, around 1.25 billion smartphones were sold. That's a lot of crowd-sourced compute power, and it potentially far outstrips any dedicated supercomputer on the planet, either existing or planned.

The software that enables us to utilize parallel systems, which, as we noted, are increasingly heterogeneous, continues to evolve. In this book, we have examined how you can utilize OpenCL from R to gain access to both the GPU and the CPU, making it possible to perform mixed computation across both components and exploit the particular strengths of each for certain types of processing. Indeed, a related initiative, Heterogeneous System Architecture (HSA), which enables even lower-level access to the spectrum of processor capabilities, may well gain traction over the coming years and help promote the uptake of OpenCL and its counterparts.

HSA Foundation

The HSA Foundation was founded by a cross-industry group led by AMD, ARM, Imagination, MediaTek, Qualcomm, Samsung, and Texas Instruments. Its stated goal is to help support the creation of applications that seamlessly blend scalar processing on the CPU, parallel processing on the GPU, and optimized processing on the DSP via high-bandwidth shared memory access, enabling greater application performance at low power consumption. To enable this, the HSA Foundation is defining key interfaces for parallel computation using CPUs, GPUs, DSPs, and other programmable and fixed-function devices, thus supporting a diverse set of high-level programming languages and creating the next generation of general-purpose computing.
You can find the recently released version 1.0 of the HSA specification at the following link: http://www.hsafoundation.com/html/HSA_Library.htm

Hybrid parallelism

As a final wrap-up, I thought I would show how you can overcome some of the inherently single-threaded nature of R even further and demonstrate a hybrid approach to parallelism that combines two of the techniques we covered previously within a single R program. We've also discussed how heterogeneous computing is potentially the way of the future. This example refers to the code we would develop to utilize MPI through pbdMPI together with ROpenCL, enabling us to exploit both the CPU and the GPU simultaneously. While this is a slightly contrived example, in which both devices compute the same dist() function, the intention is to show you just how far you can take things with R to get the most out of all your available compute resources. Basically, all we need to do is top and tail our OpenCL implementation of the dist() function with the appropriate pbdMPI initialization and termination, and run the script with mpiexec on two processes, as follows:

# Initialise both ROpenCL and pbdMPI
require(ROpenCL)
library(pbdMPI, quietly = TRUE)
init()

# Select device based on my MPI rank
r <- comm.rank()
if (r == 0) { # use gpu
  device <- 1
} else { # use cpu
  device <- 2
}
...
# Execute the OpenCL dist() function on my assigned device
comm.print(sprintf("%d executing on device %s", r, getDeviceType(deviceID)), all.rank = TRUE)
res <- teval(openclDist(kernel))
comm.print(sprintf("%d done in %f secs", r, res$Duration), all.rank = TRUE)
finalize()

This is simple and very effective!

Summary

In this article, we looked into the crystal ball and saw the prospects for the combination of heterogeneous compute hardware that exists today and will expand in capability even further in the future, not only in our supercomputers and laptops but also in our personal devices. Parallelism is the only way these systems can be utilized effectively. As the volume of new quantified-self and environmentally derived data increases, and as the number of cores in our compute architectures continues to rise, so does the importance of being able to write parallel programs to make use of it all. Job security for parallel programmers looks good for many years to come!

Resources for Article:

Further resources on this subject:
Multiplying Performance with Parallel Computing [article]
Training and Visualizing a neural network with R [article]
Big Data Analysis (R and Hadoop) [article]

Testing Your Application with cljs.test

In this article written by David Jarvis, Rafik Naccache, and Allen Rohner, authors of the book Learning ClojureScript, we'll take a look at how to configure our ClojureScript application or library for testing. As usual, we'll start by creating a new project to play around with:

$ lein new figwheel testing

(For more resources related to this topic, see here.)

We'll be working in a test directory. Most JVM Clojure projects will have one already, but since the default Figwheel template doesn't include a test directory, let's make one first (following the same convention used with source directories; that is, instead of src/$PROJECT_NAME, we'll create test/$PROJECT_NAME):

$ mkdir -p test/testing

We'll now want to make sure that Figwheel knows it has to watch the test directory for file modifications. To do that, we will edit the dev build in our project's :cljsbuild map in project.clj so that its :source-paths vector includes both src and test. Your new dev build configuration should look like the following:

{:id "dev"
 :source-paths ["src" "test"]
 ;; If no code is to be run, set :figwheel true for continued automagical reloading
 :figwheel {:on-jsload "testing.core/on-js-reload"}
 :compiler {:main testing.core
            :asset-path "js/compiled/out"
            :output-to "resources/public/js/compiled/testing.js"
            :output-dir "resources/public/js/compiled/out"
            :source-map-timestamp true}}

Next, we'll get the familiar Figwheel REPL going so that we have our usual hot reloading:

$ cd testing
$ rlwrap lein figwheel

Don't forget to navigate a browser window to http://localhost:3449/ to get the browser REPL to connect. Now, let's create a new core_test.cljs file in the test/testing directory. By convention, most libraries and applications in Clojure and ClojureScript have test files that correspond to source files with the suffix _test. In this project, this means that test/testing/core_test.cljs is intended to contain the tests for src/testing/core.cljs. Let's get started by running tests on a single file. Inside core_test.cljs, add the following code:

(ns testing.core-test
  (:require [cljs.test :refer-macros [deftest is]]))

(deftest i-should-fail
  (is (= 1 0)))

(deftest i-should-succeed
  (is (= 1 1)))

This code first requires two of the most important cljs.test macros and then gives us two simple examples of what a failing test and a passing test look like. At this point, we can run our tests from the Figwheel REPL:

cljs.user=> (require 'testing.core-test)
;; => nil
cljs.user=> (cljs.test/run-tests 'testing.core-test)

Testing testing.core-test

FAIL in (i-should-fail) (cljs/test.js?zx=icyx7aqatbda:430:14)
expected: (= 1 0)
actual: (not (= 1 0))

Ran 2 tests containing 2 assertions.
1 failures, 0 errors.
;; => nil

What we've got at this point is tolerable, but it's not really practical for testing a larger application. We don't want to have to test our application in the REPL and pass in our test namespaces one by one. The current idiomatic solution for this in ClojureScript is to write a separate test runner that is responsible for importing and then running all of your tests. Let's take a look at what this looks like. We'll start by creating another test namespace.
Let's call this one app_test.cljs, and we'll put the following in it:

(ns testing.app-test
  (:require [cljs.test :refer-macros [deftest is]]))

(deftest another-successful-test
  (is (= 4 (count "test"))))

We're not doing anything remarkable here; it's just another test namespace with a single test that should pass by itself. Let's quickly make sure that's the case at the REPL:

cljs.user=> (require 'testing.app-test)
nil
cljs.user=> (cljs.test/run-tests 'testing.app-test)

Testing testing.app-test

Ran 1 tests containing 1 assertions.
0 failures, 0 errors.
;; => nil

Perfect. Now, let's write a test runner. Let's open a new file that we'll simply call test_runner.cljs and include the following:

(ns testing.test-runner
  (:require [cljs.test :refer-macros [run-tests]]
            [testing.app-test]
            [testing.core-test]))

;; This isn't strictly necessary, but is a good idea depending
;; upon your application's ultimate runtime engine.
(enable-console-print!)

(defn run-all-tests []
  (run-tests 'testing.app-test
             'testing.core-test))

Again, nothing surprising: we're just defining a single function that runs all of our tests. This is handy for us at the REPL:

cljs.user=> (testing.test-runner/run-all-tests)

Testing testing.app-test

Testing testing.core-test

FAIL in (i-should-fail) (cljs/test.js?zx=icyx7aqatbda:430:14)
expected: (= 1 0)
actual: (not (= 1 0))

Ran 3 tests containing 3 assertions.
1 failures, 0 errors.
;; => nil

Ultimately, however, we want something we can run at the command line so that we can use it in a continuous integration environment. There are a number of ways we could configure this directly, but if we're clever, we can let someone else do the heavy lifting for us. Enter doo, the handy ClojureScript testing plugin for Leiningen.

Using doo for easier testing configuration

doo is a library and Leiningen plugin for running cljs.test in many different JavaScript environments. It makes it easy to test your ClojureScript regardless of whether you're writing for the browser or for the server, and it also includes file-watching capabilities, like Figwheel's, so that you can automatically rerun tests on file changes. The doo project page can be found at https://github.com/bensu/doo. To configure our project to use doo, we first need to add it to the list of plugins in our project.clj file. Modify the :plugins key so that it looks like the following:

:plugins [[lein-figwheel "0.5.2"]
          [lein-doo "0.1.6"]
          [lein-cljsbuild "1.1.3" :exclusions [[org.clojure/clojure]]]]

Next, we will add a new cljsbuild build configuration for our test runner. Add the following build map after the dev build map we've been working with until now:

{:id "test"
 :source-paths ["src" "test"]
 :compiler {:main testing.test-runner
            :output-to "resources/public/js/compiled/testing_test.js"
            :optimizations :none}}

This configuration tells cljsbuild to use both our src and test directories, just like our dev profile. It adds some different configuration elements to the compiler options, however. First, we're no longer using testing.core as our main namespace; instead, we use our test runner's namespace, testing.test-runner. We also direct the output JavaScript file to a different location from our compiled application code. Lastly, we make sure to pass in :optimizations :none so that the compiler runs quickly and doesn't have to do any magic to look things up.
Note that our currently running Figwheel process won't know that we've added lein-doo to our list of plugins or that we've added a new build configuration. If you want to make Figwheel aware of doo in a way that'll allow the two to play nicely together, you should also add doo as a dependency to your project. Once you've done that, exit the Figwheel process and restart it after you've saved the changes to project.clj. Lastly, we need to modify our test runner namespace so that it's compatible with doo. To do this, open test_runner.cljs and change it to the following:

(ns testing.test-runner
  (:require [doo.runner :refer-macros [doo-tests]]
            [testing.app-test]
            [testing.core-test]))

;; This isn't strictly necessary, but is a good idea depending
;; upon your application's ultimate runtime engine.
(enable-console-print!)

(doo-tests 'testing.app-test
           'testing.core-test)

This shouldn't look too different from our original test runner; we're just importing from doo.runner rather than cljs.test and using doo-tests instead of a custom runner function. The doo-tests runner works very similarly to cljs.test/run-tests, but it places hooks around the tests so that it knows when they start and finish. We're also putting this call at the top level of our namespace rather than wrapping it in a function. The last thing we need to do is install a JavaScript runtime with which to execute our tests. Up until now, we've been using the browser via Figwheel, but ideally we want to be able to run our tests in a headless environment as well. For this purpose, we recommend installing PhantomJS (though other execution environments are also fine). If you're on OS X and have Homebrew installed (http://www.brew.sh), installing PhantomJS is as simple as typing brew install phantomjs. If you're not on OS X or don't have Homebrew, you can find instructions on how to install PhantomJS on the project's website at http://phantomjs.org/. The key thing is that the following should work:

$ phantomjs -v
2.0.0

Once you've got PhantomJS installed, you can invoke your test runner from the command line with the following:

$ lein doo phantom test once

;; ======================================================================
;; Testing with Phantom:

Testing testing.app-test

Testing testing.core-test

FAIL in (i-should-fail) (:)
expected: (= 1 0)
actual: (not (= 1 0))

Ran 3 tests containing 3 assertions.
1 failures, 0 errors.
Subprocess failed

Let's break down this command. The first part, lein doo, just tells Leiningen to invoke the doo plugin. Next, we have phantom, which tells doo to use PhantomJS as its runtime environment. The doo plugin supports a number of other environments, including Chrome, Firefox, Internet Explorer, Safari, Opera, SlimerJS, NodeJS, Rhino, and Nashorn. Be aware that if you're interested in running doo on one of these other environments, you may have to configure and install additional software. For instance, if you want to run tests on Chrome, you'll need to install Karma as well as the appropriate Karma npm modules to enable Chrome interaction. Next we have test, which refers to the cljsbuild build ID we set up earlier. Lastly, we have once, which tells doo to just run the tests and not set up a filesystem watcher. If, instead, we wanted doo to watch the filesystem and rerun tests on any changes, we would just use lein doo phantom test.

Testing fixtures

The cljs.test project has support for adding fixtures to your tests that can run before and after your tests.
Test fixtures are useful for establishing an isolated state between tests. For instance, you can use fixtures to set up a specific database state before each test and to tear it down afterward. You can add them to your ClojureScript tests by declaring them with the use-fixtures macro within the testing namespace you want the fixtures applied to. Let's see what this looks like in practice by changing one of our existing tests and adding some fixtures to it. Modify app_test.cljs to the following:

(ns testing.app-test
  (:require [cljs.test :refer-macros [deftest is use-fixtures]]))

;; Run these fixtures for each test.
;; We could also use :once instead of :each in order to run
;; fixtures once for the entire namespace instead of once for
;; each individual test.
(use-fixtures :each
  {:before (fn [] (println "Setting up tests..."))
   :after  (fn [] (println "Tearing down tests..."))})

(deftest another-successful-test
  ;; Give us an idea of when this test actually executes.
  (println "Running a test...")
  (is (= 4 (count "test"))))

Here, we've added a call to use-fixtures that prints to the console before and after running the test, and we've added a println call to the test itself so that we know when it executes. Now when we run this test, we get the following:

$ lein doo phantom test once

;; ======================================================================
;; Testing with Phantom:

Testing testing.app-test
Setting up tests...
Running a test...
Tearing down tests...

Testing testing.core-test

FAIL in (i-should-fail) (:)
expected: (= 1 0)
actual: (not (= 1 0))

Ran 3 tests containing 3 assertions.
1 failures, 0 errors.
Subprocess failed

Note that our fixtures get called in the order we expect them to.

Asynchronous testing

Because client-side code is frequently asynchronous and JavaScript is single threaded, we need a way to support asynchronous tests. For this, we can use the async macro from cljs.test. Let's take a look at an example using an asynchronous HTTP GET request. First, let's modify our project.clj file to add cljs-ajax to our dependencies. Our :dependencies key should now look something like this:

:dependencies [[org.clojure/clojure "1.8.0"]
               [org.clojure/clojurescript "1.7.228"]
               [cljs-ajax "0.5.4"]
               [org.clojure/core.async "0.2.374"
                :exclusions [org.clojure/tools.reader]]]

Next, let's create a new async_test.cljs file in our test/testing directory. Inside it, we will add the following code:

(ns testing.async-test
  (:require [ajax.core :refer [GET]]
            [cljs.test :refer-macros [deftest is async]]))

(deftest test-async
  (GET "http://www.google.com"
       ;; will always fail from PhantomJS because
       ;; `Access-Control-Allow-Origin` won't allow
       ;; our headless browser to make requests to Google.
       {:error-handler
        (fn [res]
          (is (= (:status-text res) "Request failed."))
          (println "Test finished!"))}))

Note that we're not using async in our test at the moment. Let's try running this test with doo (don't forget that you have to add testing.async-test to test_runner.cljs!):

$ lein doo phantom test once
...
Testing testing.async-test
...
Ran 4 tests containing 3 assertions.
1 failures, 0 errors.
Subprocess failed

Our test here passes, but note that the println in the asynchronous callback never fires, and our additional assertion doesn't get called (looking back at our previous examples, since we've added a new is assertion, we should expect to see four assertions in the final summary)!
If we actually want our test to validate the error-handler callback within the context of the test, we need to wrap it in an async block. Doing so gives us a test that looks like the following:

(deftest test-async
  (async done
    (GET "http://www.google.com"
         ;; will always fail from PhantomJS because
         ;; `Access-Control-Allow-Origin` won't allow
         ;; our headless browser to make requests to Google.
         {:error-handler
          (fn [res]
            (is (= (:status-text res) "Request failed."))
            (println "Test finished!")
            (done))})))

Now, let's try to run our tests again:

$ lein doo phantom test once

...

Testing testing.async-test
Test finished!

...

Ran 4 tests containing 4 assertions.
1 failures, 0 errors.
Subprocess failed

Awesome! Note that this time we see the printed statement from our callback, and we can see that cljs.test properly ran all four of our assertions.

Asynchronous fixtures

One final "gotcha" on testing—the fixtures we talked about earlier in this article do not handle asynchronous code automatically. This means that if you have a :before fixture that executes asynchronous logic, your tests can begin running before your fixture has completed! To get around this, all you need to do is wrap your :before fixture in an async block, just like with asynchronous tests. Consider the following, for instance:

(use-fixtures :once
  {:before #(async done ... (done))
   :after  #(do ...)})

Summary

This concludes our section on cljs.test. Testing, whether in ClojureScript or any other language, is a critical software engineering best practice to ensure that your application behaves the way you expect it to and to protect you and your fellow developers from accidentally introducing bugs to your application. With cljs.test and doo, you have the power and flexibility to test your ClojureScript application with multiple browsers and JavaScript environments and to integrate your tests into a larger continuous testing framework.

Resources for Article:

Further resources on this subject:
Clojure for Domain-specific Languages - Design Concepts with Clojure [article]
Visualizing my Social Graph with d3.js [article]
Improving Performance with Parallel Programming [article]
Introducing Dynamics CRM
In this article by Nicolae Tarla, the author of Microsoft Dynamics CRM 2016 Customization, you will learn about the Customer Relationship Management (CRM) market and the huge uptake it has seen in the last few years. Some of the drivers for this market are the need to enhance customer experience, provide faster and better services, and adapt to the customer's growing digital presence. CRM systems, in general, are taking a central place in new organizational initiatives.

Dynamics CRM is Microsoft's response to a growing trend. The newest version is Dynamics CRM 2016, and it is offered in a variety of deployment scenarios, from the standard on-premise deployment to a private cloud or an online cloud offering from Microsoft. The choice depends on each customer, the type of project, and a large number of requirements, policies, and legal restrictions.

We'll first look at what environment we need to complete the examples presented. We will create a new environment based on a Microsoft Dynamics CRM Online trial. This approach gives us a free 30-day trial to experiment with. The following topics will be covered:

- Introducing Dynamics CRM
- Dynamics CRM features
- Deployment models
- Global datacenter locations
- Customization requirements
- Getting set up

Dynamics CRM 2016 is the current version of the popular Customer Relationship Management platform offered by Microsoft. This platform offers users the ability to integrate and connect data across their sales, marketing, and customer service activities, and to give staff an overall 360-degree view of all interactions and activities as they relate to a specific customer. Along with the standard platform functionality provided, we have a wide range of customization options, allowing us to extend and further customize solutions to solve a majority of other business requirements. In addition, we can integrate this platform with other applications and create a seamless solution.

While by no means the only available CRM platform on the market today, Microsoft Dynamics CRM 2016 is one of the fastest growing, gaining large acceptance at all levels, from small to mid-size and enterprise-level organizations. This is due to a multitude of reasons, some of which include the variety of deployment options, the scalability, the extensibility, the ease of integration with other systems, and the ease of use.

Microsoft Dynamics CRM can be deployed in a variety of ways. Starting with the offering from Microsoft, you can get CRM Online. Once a 30-day trial is active, it can easily be turned into a full production environment by providing payment information and keeping the environment active. The data will live in the cloud, in one of the data centers provided by Microsoft.

Alternatively, you can obtain hosting with a third-party provider. The whole environment can be hosted by a third party, and the service can be offered either as a SaaS solution or a fully hosted environment. Usually, there is a difference in the way payment is processed, with a SaaS solution in most cases being offered on a monthly subscription model.

Another option is to have the environment hosted in-house. This option is called on-premise deployment and carries the highest up-front cost but gives you the ability to customize the system extensively.
In addition to the higher up-front cost, the costs to maintain the environment, the hardware, and the skilled people required to constantly administer the environment can easily add up. More recently, we have also gained the ability to host a virtual CRM environment in Azure. This offloads the cost of maintaining the local infrastructure in a fashion similar to a third-party-hosted solution, but takes advantage of the scalability and performance of a large cloud solution maintained and supported fully by Microsoft. The following white paper released by Microsoft describes the deployment model using Azure Virtual Machines: http://www.microsoft.com/en-us/download/details.aspx?id=49193

Features of Dynamics CRM

Some of the most notable features of the Dynamics CRM platform include:

- Scalability
- Extensibility
- Ability to integrate with other systems
- Ease of use

Let's look at each of these features in more detail.

Scalability

Dynamics CRM can scale over a wide range of deployment options. From a single-box deployment, used mostly for development, all the way to a cloud offering that can span a large number of servers and host a large number of environments, the same base solution handles all the scenarios in between with ease.

Extensibility

Dynamics CRM is a platform whose base offering comes with prepackaged functionality for Sales, Service, and Marketing, and a large variety of solutions can be built on top of it. The extensibility model is called xRM and allows power users, non-developers, and developers alike to build custom solutions to handle various other business scenarios or integrate with other third-party platforms.

The Dynamics CRM Marketplace is a great example of such solutions that are built to extend the core platform and are offered for sale by various companies. These companies are called Independent Software Vendors (ISVs) and play a very important role in the ecosystem created by Microsoft. In time and with enough experience, some of them become the go-to partners for various implementations.

If nothing else, the Dynamics Marketplace is a cool place to look at some of the solutions created and to search for specific applications. The idea of the marketplace became public sometime around 2010 and was integrated into Dynamics CRM 2011. At launch, it was designed as a searchable repository of solutions. It is a win-win for solution providers and customers alike. Solutions can also be rated, giving customers better community feedback before committing to purchasing and implementing a foreign solution into their organization.

The Dynamics Marketplace is hosted on Pinpoint, Microsoft's online directory of software applications and professional services. On this platform, independent companies and certified partners offer their products and services. At the time of this writing, Pinpoint hosts a few marketplaces, including Office, Azure, Dynamics, and Cloud, and is available at the following location: https://pinpoint.microsoft.com/en-CA/Home/Dynamics

Navigating to the Dynamics page, you are presented with a search option. You then have the option to filter your results by Solution providers, Services, or Apps (Applications).
In addition, you can further filter your results by distance to a geo-location derived from an address or postal code, as well as by other categories. When searching for a solution provider, the results provide a high-level view of the organization, with a logo and a short description. The Ratings and Competencies counts are displayed for easy visibility.

Drilling down into the partner profile page, you can find additional details on the organization, its industry focus, and its competencies, as well as a way to connect with the organization. Navigation to additional details, including Reviews and Locations, is available on the profile page.

The Dynamics Marketplace is also available, starting with Dynamics CRM 2011, as a part of the organization. A user with the necessary permission can navigate to Settings | Dynamics Marketplace. This presents the user with a view of the available solutions. Options for sorting and filtering include Popular, Newest, and Featured. Community ratings are clearly visible and provide the necessary feedback to consider when evaluating new solutions.

Ability to integrate with other systems

There is a large variety of integration options available when working with Dynamics CRM, and the various deployment options offer more or fewer integration features. With CRM Online, you tend to get more integration options into cloud services, whereas the on-premise solution has a limited number of configurable integration options but can provide more integration using various third-party tools. The base solution comes with the ability to configure integration with the following common services:

- SharePoint for document management
- Yammer for social features

In addition, you can use specific connectors provided by either Microsoft or other third-party providers for integration with specific solutions. When the preceding options are not available, you can still integrate with other solutions using a third-party integration tool. This allows real-time integration into legacy systems. Some of the most popular tools used for integration include, but are not limited to:

- Kingsway Software (https://www.kingswaysoft.com/)
- Scribe (http://www.scribesoft.com/)
- BizTalk (http://www.microsoft.com/en-us/server-cloud/products/biztalk/)

Ease of use

Dynamics CRM offers users a variety of options to interact with the system. You can access Dynamics CRM through a browser, with support for all recent versions of the major browsers. The following browsers and versions are supported:

- Internet Explorer: versions 10 and above
- Edge: latest version
- Chrome: latest version on Windows 7 and above
- Firefox: latest version on Windows 7 and above
- Safari on Mac: the latest publicly released version on OS X 10.8 and above

In addition, a user can interact with the system directly from the very familiar interface of Outlook. The Dynamics CRM connector for Outlook gives users access to all the system data and features from within Outlook, and a set of functions built specifically for Outlook allows users to track and interact with e-mails, tasks, and events. Further to the features provided through the Outlook integration, users of CRM for Outlook have the ability to work offline: data can be taken offline, work can be done while disconnected, and changes can be synchronized back into the system when connectivity resumes. For mobile users, Dynamics CRM can be accessed from mobile devices and tablets.
Dynamics CRM provides a standard web-based interface for most mobile devices, as well as specific applications for various platforms, including Windows-based tablets, iPads, and Android tablets. With these apps, you can also take a limited subset of cached data offline, as well as create new records and synchronize them back to CRM the next time you go online. The quality of these mobile offerings has increased exponentially over the last few versions, and new features are being added with each new release. In addition, third-party providers have also built mobile solutions for Dynamics CRM. A quick search in each platform's application market will reveal several options.

Global Data Centre Locations for Dynamics CRM Online

Dynamics CRM Online is hosted at various locations in the world. Preview organizations can be created in all available locations, but features are sometimes rolled out on a schedule, in some locations faster than others. The format of the Dynamics CRM Online organization URL describes the data center location. The standard format is as follows:

https://OrganizationName.crm[x].dynamics.com

The OrganizationName is the name you have selected for your online organization. This is customizable and is validated for uniqueness within the respective data center. The [x] represents a number; as of this writing, this number can be 2, 4, 5, 6, 7, or 9, or there may be no number at all. It identifies the global data center used to host your organization. The following list maps each URL format to its global data center location:

- crm.dynamics.com: NAM
- crm2.dynamics.com: SAM
- crm4.dynamics.com: EMEA
- crm5.dynamics.com: APAC
- crm6.dynamics.com: OCE
- crm7.dynamics.com: JPN
- crm9.dynamics.com: GCC

For example, an organization named contoso hosted in the EMEA data center would be reachable at https://contoso.crm4.dynamics.com.

Out of these global locations, the following usually get a preview of new features first:

- crm.dynamics.com: North America
- crm4.dynamics.com: Europe, the Middle East and Africa
- crm5.dynamics.com: Asia-Pacific

New data centers are being added on a regular basis. As of this writing, new data centers are being added in Europe and Canada, with others to follow as needed. Some of the drivers behind adding these new data centers revolve not only around performance improvements, as a data center located closer to a customer will theoretically provide better performance, but also around the need for privacy and localization of data. Strict legislation around data residency has a great impact on the selection of the deployment model by customers who are bound to store all data locally, in the country of operation. Overall, by the end of 2016, the plan is to have Dynamics CRM Online available in 105 markets. These markets (countries) will be served by data centers spread across five generic global regions. These data centers share services between Dynamics CRM Online and other services such as Azure and Office 365.

Advantages of choosing Dynamics CRM Online

Choosing one of the available hosting models for Dynamics CRM is now not only a matter of preference; the decision can be driven by multiple factors. During the last few years, there has been a huge push for the cloud. Microsoft has been very focused on enhancing their online offering and has continued to push more functionality and more resources into supporting the cloud model. As such, Dynamics CRM Online has become a force to reckon with. It is hosted on a very modern and high-performing infrastructure.
Microsoft has invested billions of dollars in new data centers and infrastructure. This allows new customers to forgo the infrastructure expenses associated with an on-premise deployment. Along with these investments in infrastructure, the SLA (service level agreement) offered by Dynamics CRM Online is financially backed by Microsoft: depending on the service selected, uptime is guaranteed and backed financially. The application and infrastructure are handled for you by Microsoft, so you don't have to handle them yourself. This translates into much lower upfront costs, as well as reduced costs around ongoing maintenance and upgrades.

The Dynamics CRM Online offering is also compliant with various regulatory requirements, backed and verified through third-party tests. Rules, regulations, and policies in various locales are validated and certified by independent organizations. Some of the compliance policies evaluated include, but are not limited to:

- Data Privacy and Confidentiality Policies
- Data Classification
- Information Security
- Privacy
- Data Stewardship
- Secure Infrastructure
- Identity and Access Control

All these compliance requirements are in conformance with regulations stipulated by the International Organization for Standardization and other international and local standards. Independent auditors validate standards compliance, and Microsoft is ISO 27001 certified. The Microsoft Trust Center website located at http://www.microsoft.com/en-us/trustcenter/CloudServices/Dynamics provides additional information on compliance, responsibilities, and warranties.

Further to the aforementioned benefits, choosing the cloud over a standard on-premise deployment offers other advantages around scalability, faster time to market, and a higher value proposition. In addition to the standard benefits of an online deployment, one other great advantage is the ability to spin up a 30-day trial instance of Dynamics CRM Online and convert it to a paid instance only when ready to go to production. This allows customizers and companies to get started and customize their solution in a free environment, with no additional costs attached. The 30-day trial gives us a 25-license instance, which allows us to not only customize the organization but also test various roles and restrictions.

Summary

We learned to create a new environment based on a Microsoft Dynamics CRM Online trial.

Resources for Article:

Further resources on this subject:
Customization in Microsoft Dynamics CRM [article]
Introduction to Reporting in Microsoft Dynamics CRM [article]
Using Processes in Microsoft Dynamics CRM 2011 [article]
Web Server Development
In this article, Holger Brunn, Alexandre Fayolle, and Daniel Eufémio Gago Reis, the authors of the book Odoo Development Cookbook, discuss web server development in Odoo. We'll cover the following topics:

- Make a path accessible from the network
- Restrict access to web accessible paths
- Consume parameters passed to your handlers
- Modify an existing handler
- Using the RPC API

Introduction

We'll introduce the basics of the web server part of Odoo in this article. Note that this article covers the fundamental pieces. All of Odoo's web request handling is driven by the Python library werkzeug (http://werkzeug.pocoo.org). While the complexity of werkzeug is mostly hidden by Odoo's convenient wrappers, it is an interesting read to see how things work under the hood.

Make a path accessible from the network

In this recipe, we'll see how to make a URL of the form http://yourserver/path1/path2 accessible to users. This can either be a web page or a path returning arbitrary data to be consumed by other programs. In the latter case, you would usually use the JSON format to consume parameters and to return data.

Getting ready

We'll make use of a ready-made library.book model. We want to allow any user to query the full list of books. Furthermore, we want to provide the same information to programs via a JSON request.

How to do it…

We'll need to add controllers, which by convention go into a folder called controllers.

Add a controllers/main.py file with the HTML version of our page:

from openerp import http
from openerp.http import request

class Main(http.Controller):
    @http.route('/my_module/books', type='http', auth='none')
    def books(self):
        records = request.env['library.book'].sudo().search([])
        result = '<html><body><table><tr><td>'
        result += '</td></tr><tr><td>'.join(
            records.mapped('name'))
        result += '</td></tr></table></body></html>'
        return result

Add a function to serve the same information in the JSON format:

    @http.route('/my_module/books/json', type='json', auth='none')
    def books_json(self):
        records = request.env['library.book'].sudo().search([])
        return records.read(['name'])

Add the file controllers/__init__.py:

from . import main

Add controllers to your addon's __init__.py:

from . import controllers

After restarting your server, you can visit /my_module/books in your browser and get presented with a flat list of book names. To test the JSON-RPC part, you'll have to craft a JSON request. A simple way to do that would be using the following command line to receive the output on the command line:

curl -i -X POST -H "Content-Type: application/json" -d "{}" localhost:8069/my_module/books/json

If you get 404 errors at this point, you probably have more than one database available on your instance. In this case, it's impossible for Odoo to determine which database is meant to serve the request. Use the --db-filter='^yourdatabasename$' parameter to force the exact database you installed the module in. Now the path should be accessible.

How it works…

The two crucial parts here are that our controller is derived from openerp.http.Controller and that the methods we use to serve content are decorated with openerp.http.route. Inheriting from openerp.http.Controller registers the controller with Odoo's routing system in a similar way to how models are registered by inheriting from openerp.models.Model; Controller, too, has a meta class that takes care of this.
In general, paths handled by your addon should start with your addon's name to avoid name clashes. Of course, if you extend some addon's functionality, you'll use that addon's name.

openerp.http.route

The route decorator allows us to tell Odoo that a method is to be web accessible in the first place, and its first parameter determines on which path it is accessible. Instead of a string, you can also pass a list of strings in case you use the same function to serve multiple paths. The type argument defaults to http and determines what type of request is to be served. While strictly speaking JSON is HTTP, declaring the second function as type='json' makes life a lot easier, because Odoo then handles type conversions itself. Don't worry about the auth parameter for now; it will be addressed in the recipe Restrict access to web accessible paths.

Return values

Odoo's treatment of the functions' return values is determined by the type argument of the route decorator. For type='http', we usually want to deliver some HTML, so the first function simply returns a string containing it. An alternative is to use request.make_response(), which gives you control over the headers to send in the response. So, to indicate when our page was last updated, we might change the last line in books() to the following:

return request.make_response(
    result, [
        ('Last-modified', email.utils.formatdate(
            (
                fields.Datetime.from_string(
                    request.env['library.book'].sudo()
                    .search([], order='write_date desc', limit=1)
                    .write_date) -
                datetime.datetime(1970, 1, 1)
            ).total_seconds(),
            usegmt=True)),
    ])

This code sends a Last-modified header along with the HTML we generated, telling the browser when the list was last modified. We extract this information from the write_date field of the library.book model. In order for the preceding snippet to work, you'll have to add some imports at the top of the file:

import email
import datetime
from openerp import fields

You can also create a werkzeug Response object manually and return that, but there's little gain for the effort.

Generating HTML manually is nice for demonstration purposes, but you should never do this in production code. Always use templates as appropriate and return them by calling request.render(). This will give you localization for free and makes your code better by separating business logic from the presentation layer. Also, templates provide you with functions to escape data before outputting HTML. The preceding code is vulnerable to cross-site scripting attacks if a user manages to slip a script tag into a book name, for example.

For a JSON request, simply return the data structure you want to hand over to the client; Odoo takes care of serialization. For this to work, you should restrict yourself to data types that are JSON serializable, which are roughly dictionaries, lists, strings, floats, and integers.

openerp.http.request

The request object is a static object referring to the currently handled request, which contains everything you need to take useful action. Most important is the property request.env, which contains an Environment object that is just the same as self.env in models. This environment is bound to the current user, which is none in the preceding example because we used auth='none'. The lack of a user is also why we have to sudo() all our calls to model methods in the example code. If you're used to web development, you'll expect session handling, which is perfectly correct.
Use request.session for an OpenERPSession object (which is quite a thin wrapper around werkzeug's Session object), and request.session.sid to access the session ID. To store session values, just treat request.session as a dictionary:

request.session['hello'] = 'world'
request.session.get('hello')

Note that storing data in the session is no different from using global variables. Use it only if you must; that is usually the case for multi-request actions, such as a checkout in the website_sale module. And even in this case, handle all functionality concerning sessions in your controllers, never in your modules.

There's more…

The route decorator can take some extra parameters to customize its behavior further. By default, all HTTP methods are allowed, and Odoo intermingles the parameters passed. Using the parameter methods, you can pass a list of methods to accept, which usually would be either ['GET'] or ['POST'].

To allow cross-origin requests (browsers block AJAX and some other types of requests to domains other than the one the script was loaded from, for security and privacy reasons), set the cors parameter to * to allow requests from all origins, or to some URI to restrict requests to ones originating from that URI. If this parameter is unset, which is the default, the Access-Control-Allow-Origin header is not set, leaving you with the browser's standard behavior. In our example, we might want to set it on /my_module/books/json in order to allow scripts pulled from other websites to access the list of books.

By default, Odoo protects certain types of requests from an attack known as cross-site request forgery by passing a token along on every request. If you want to turn that off, set the parameter csrf to False, but note that this is a bad idea in general.

See also

If you host multiple Odoo databases on the same instance and each database has different web accessible paths on possibly multiple domain names per database, the standard regular expressions in the --db-filter parameter might not be enough to force the right database for every domain. In that case, use the community module dbfilter_from_header from https://github.com/OCA/server-tools in order to configure the database filters at the proxy level.

To see how using templates makes modularity possible, see the recipe Modify an existing handler later in this article.

Restrict access to web accessible paths

We'll explore the three authentication mechanisms Odoo provides for routes in this recipe. We'll define routes with different authentication mechanisms in order to show their differences.

Getting ready

As we extend code from the previous recipe, we'll also depend on the library.book model, so you should get its code correct in order to proceed.
How to do it…

Define handlers in controllers/main.py.

Add a path that shows all books:

@http.route('/my_module/all-books', type='http', auth='none')
def all_books(self):
    records = request.env['library.book'].sudo().search([])
    result = '<html><body><table><tr><td>'
    result += '</td></tr><tr><td>'.join(
        records.mapped('name'))
    result += '</td></tr></table></body></html>'
    return result

Add a path that shows all books and indicates which were written by the current user, if any:

@http.route('/my_module/all-books/mark-mine', type='http', auth='public')
def all_books_mark_mine(self):
    records = request.env['library.book'].sudo().search([])
    result = '<html><body><table>'
    for record in records:
        result += '<tr>'
        if record.author_ids & request.env.user.partner_id:
            result += '<th>'
        else:
            result += '<td>'
        result += record.name
        if record.author_ids & request.env.user.partner_id:
            result += '</th>'
        else:
            result += '</td>'
        result += '</tr>'
    result += '</table></body></html>'
    return result

Add a path that shows the current user's books:

@http.route('/my_module/all-books/mine', type='http', auth='user')
def all_books_mine(self):
    records = request.env['library.book'].search([
        ('author_ids', 'in', request.env.user.partner_id.ids),
    ])
    result = '<html><body><table><tr><td>'
    result += '</td></tr><tr><td>'.join(
        records.mapped('name'))
    result += '</td></tr></table></body></html>'
    return result

With this code, the paths /my_module/all-books and /my_module/all-books/mark-mine look the same for unauthenticated users, while a logged-in user sees her books in a bold font on the latter path. The path /my_module/all-books/mine is not accessible at all for unauthenticated users; if you try to access it without being authenticated, you'll be redirected to the login screen.

How it works…

The difference between the authentication methods is basically what you can expect from the content of request.env.user.

For auth='none', the user record is always empty, even if an authenticated user is accessing the path. Use this if you want to serve content that has no dependencies on users, or if you want to provide database-agnostic functionality in a server-wide module.

The value auth='public' sets the user record to a special user with the XML ID base.public_user for unauthenticated users, and to the user's own record for authenticated ones. This is the right choice if you want to offer functionality to both unauthenticated and authenticated users, while the authenticated ones get some extras, as demonstrated in the preceding code.

Use auth='user' to be sure that only authenticated users have access to what you've got to offer. With this method, you can be sure request.env.user points to an existing user.

There's more…

The magic for authentication methods happens in the ir.http model from the base addon. For whatever value you pass to the auth parameter in your route, Odoo searches for a function called _auth_method_<yourvalue> on this model, so you can easily customize this by inheriting this model and declaring a method that takes care of your authentication method of choice.
As an example, we provide an authentication method base_group_user which enforces a currently logged-in user who is a member of the group with the XML ID base.group_user:

from openerp import exceptions, http, models
from openerp.http import request

class IrHttp(models.Model):
    _inherit = 'ir.http'

    def _auth_method_base_group_user(self):
        self._auth_method_user()
        if not request.env.user.has_group('base.group_user'):
            raise exceptions.AccessDenied()

Now you can say auth='base_group_user' in your decorator and be sure that users running this route's handler are members of this group. With a little trickery, you can extend this to auth='groups(xmlid1,…)'; the implementation of this is left as an exercise to the reader, but is included in the example code.

Consume parameters passed to your handlers

It's nice to be able to show content, but it's better to show content as a result of some user input. This recipe will demonstrate the different ways to receive this input and react to it. As in the recipes before, we'll make use of the library.book model.

How to do it…

First, we'll add a route that expects a traditional parameter with a book's ID to show some details about it. Then, we'll do the same, but we'll incorporate our parameter into the path itself.

Add a path that expects a book's ID as a parameter:

@http.route('/my_module/book_details', type='http', auth='none')
def book_details(self, book_id):
    record = request.env['library.book'].sudo().browse(
        int(book_id))
    return u'<html><body><h1>%s</h1>Authors: %s' % (
        record.name,
        u', '.join(record.author_ids.mapped('name')) or 'none',
    )

Add a path where we can pass the book's ID in the path:

@http.route("/my_module/book_details/<model('library.book'):book>",
            type='http', auth='none')
def book_details_in_path(self, book):
    return self.book_details(book.id)

If you point your browser to /my_module/book_details?book_id=1, you should see a detail page for the book with ID 1. If this book doesn't exist, you'll receive an error page. The second handler allows you to go to /my_module/book_details/1 and view the same page.

How it works…

By default, Odoo (actually werkzeug) intermingles GET and POST parameters and passes them as keyword arguments to your handler. So, by simply declaring your function as expecting a parameter called book_id, you introduce this parameter as either a GET (the parameter in the URL) or POST (usually passed by forms with your handler as action) parameter. Given that we didn't add a default value for this parameter, the runtime will raise an error if you try to access this path without setting the parameter.

The second example makes use of the fact that, in a werkzeug environment, most paths are virtual anyway. So we can simply define our path as containing some input. In this case, we say we expect the ID of a library.book as the last component of the path. The name after the colon is the name of a keyword argument; our function will be called with this parameter passed as a keyword argument. Here, Odoo takes care of looking up this ID and delivering a browse record, which of course only works if the user accessing this path has the appropriate permissions. Given that book is a browse record, we can simply recycle the first example's function by passing book.id as the parameter book_id to give out the same content.

There's more…

Defining parameters within the path is functionality delivered by werkzeug, known as converters.
The model converter is added by Odoo, which also defines the converter models, which accepts a comma-separated list of IDs and passes a record set containing those IDs to your handler. The beauty of converters is that the runtime coerces the parameters to the expected type, while you're on your own with normal keyword parameters: these are delivered as strings, and you have to take care of the necessary type conversions yourself, as seen in the first example. Built-in werkzeug converters include int, float, and string, but also more intricate ones such as path, any, or uuid. You can look up their semantics at http://werkzeug.pocoo.org/docs/0.11/routing/#builtin-converters.

See also

Odoo's custom converters are defined in ir_http.py in the base module and registered in the _get_converters method of ir.http. As an exercise, you can create your own converter that allows you to visit the /my_module/book_details/Odoo+cookbook page to receive the details of this book (if you added it to your library before).

Modify an existing handler

When you install the website module, the path /website/info displays some information about your Odoo instance. In this recipe, we override this in order to change the information page's layout, but also to change what is displayed.

Getting ready

Install the website module and inspect the path /website/info. Now craft a new module that depends on website and uses the following code.

How to do it…

We'll have to adapt the existing template and override the existing handler.

Override the QWeb template in a file called views/templates.xml:

<?xml version="1.0" encoding="UTF-8"?>
<odoo>
  <template id="show_website_info"
            inherit_id="website.show_website_info">
    <xpath expr="//dl[@t-foreach='apps']" position="replace">
      <table class="table">
        <tr t-foreach="apps" t-as="app">
          <th>
            <a t-att-href="app.website">
              <t t-esc="app.name" /></a>
          </th>
          <td><t t-esc="app.summary" /></td>
        </tr>
      </table>
    </xpath>
  </template>
</odoo>

Override the handler in a file called controllers/main.py:

from openerp import http
from openerp.addons.website.controllers.main import Website

class Website(Website):
    @http.route()
    def website_info(self):
        result = super(Website, self).website_info()
        result.qcontext['apps'] = result.qcontext[
            'apps'].filtered(
            lambda x: x.name != 'website')
        return result

Now, when visiting the info page, we'll only see a filtered list of installed applications, and in a table as opposed to the original definition list.

How it works…

In the first step, we override an existing QWeb template. In order to find out which one that is, you'll have to consult the code of the original handler. Usually, it will end with a line like the following, which tells you that you need to override template.name:

return request.render('template.name', values)

In our case, the handler uses a template called website.info, but this one is immediately extended by another template called website.show_website_info, so it's more convenient to override that one. Here, we replace the definition list showing installed apps with a table.

In order to override the handler method, we must identify the class that defines the handler, which is openerp.addons.website.controllers.main.Website in this case. We import the class to be able to inherit from it. Now we override the method and change the data passed to the response. Note that what the overridden handler returns is a Response object, and not a string of HTML as the previous recipes did for the sake of brevity.
This object contains a reference to the template to be used and the values accessible to the template, but is only evaluated at the very end of the request. In general, there are three ways to change an existing handler:

- If it uses a QWeb template, the simplest way of changing it is to override the template. This is the right choice for layout changes and small logic changes.
- QWeb templates get a context passed, which is available in the response as the field qcontext. This is usually a dictionary where you can add or remove values to suit your needs. In the preceding example, we filter the list of apps to exclude the website app itself.
- If the handler receives parameters, you could also preprocess those in order to have the overridden handler behave the way you want.

There's more…

As seen in the preceding section, inheritance with controllers works slightly differently from model inheritance: you actually need a reference to the base class and use Python inheritance on it.

Don't forget to decorate your new handler with the @http.route decorator; Odoo uses it as a marker for which methods are exposed to the network layer. If you omit the decorator, you actually make the handler's path inaccessible. The @http.route decorator itself behaves similarly to field declarations: every value you don't set will be derived from the decorator of the function you're overriding, so we don't have to repeat values we don't want to change.

After receiving a response object from the function you override, you can do a lot more than just changing the QWeb context:

- You can add or remove HTTP headers by manipulating response.headers.
- If you want to render an entirely different template, you can set response.template.
- To detect whether a response is based on QWeb in the first place, query response.is_qweb.
- The resulting HTML code is available by calling response.render().

Using the RPC API

One of Odoo's strengths is its interoperability, which is helped by the fact that basically any functionality is available via JSON-RPC 2.0 and XMLRPC. In this recipe, we'll explore how to use both of them from client code. This interface also enables you to integrate Odoo with any other application. Making functionality available via either of the two protocols on the server side is explained in the There's more section of this recipe. We'll query a list of installed modules from the Odoo instance, so that we could show a list like the one displayed in the previous recipe in our own application or website.
How to do it…

The following code is not meant to run within Odoo, but as simple standalone scripts.

First, we query the list of installed modules via XMLRPC:

#!/usr/bin/env python2
import xmlrpclib

db = 'odoo9'
user = 'admin'
password = 'admin'
uid = xmlrpclib.ServerProxy(
    'http://localhost:8069/xmlrpc/2/common').authenticate(
    db, user, password, {})
odoo = xmlrpclib.ServerProxy(
    'http://localhost:8069/xmlrpc/2/object')
installed_modules = odoo.execute_kw(
    db, uid, password,
    'ir.module.module', 'search_read',
    [[('state', '=', 'installed')], ['name']],
    {'context': {'lang': 'fr_FR'}})
for module in installed_modules:
    print module['name']

Then we do the same with JSONRPC:

import json
import urllib2

db = 'odoo9'
user = 'admin'
password = 'admin'
request = urllib2.Request(
    'http://localhost:8069/web/session/authenticate',
    json.dumps({
        'jsonrpc': '2.0',
        'params': {
            'db': db,
            'login': user,
            'password': password,
        },
    }),
    {'Content-type': 'application/json'})
result = urllib2.urlopen(request).read()
result = json.loads(result)
session_id = result['result']['session_id']

request = urllib2.Request(
    'http://localhost:8069/web/dataset/call_kw',
    json.dumps({
        'jsonrpc': '2.0',
        'params': {
            'model': 'ir.module.module',
            'method': 'search_read',
            'args': [
                [('state', '=', 'installed')],
                ['name'],
            ],
            'kwargs': {'context': {'lang': 'fr_FR'}},
        },
    }),
    {
        'X-Openerp-Session-Id': session_id,
        'Content-type': 'application/json',
    })
result = urllib2.urlopen(request).read()
result = json.loads(result)
for module in result['result']:
    print module['name']

Both code snippets will print a list of installed modules, and because they pass a context that sets the language to French, the list will be in French where translations are available.

How it works…

Both snippets call the function search_read, which is very convenient because you can specify a search domain on the model you call, pass a list of fields you want returned, and receive the result in one request. In older versions of Odoo, you had to call search first to receive a list of IDs and then call read to actually read the data. search_read returns a list of dictionaries, with the keys being the names of the fields requested and the values the record's data. The ID field will always be transmitted, whether you requested it or not.

Now, we need to look at the specifics of the two protocols.

XMLRPC

The XMLRPC API expects a user ID and a password for every call, which is why we need to fetch this ID via the method authenticate on the path /xmlrpc/2/common. If you already know the user's ID, you can skip this step. As soon as you know the user's ID, you can call any model's method by calling execute_kw on the path /xmlrpc/2/object. This method expects the database you want to execute the function on, the user's ID and password for authentication, then the model you want to call your function on, and then the function's name. The next two mandatory parameters are a list of positional arguments to your function and a dictionary of keyword arguments.

JSONRPC

Don't be distracted by the size of the code example; that's because Python doesn't have built-in support for JSONRPC. As soon as you've wrapped the urllib2 calls in a small helper function, the example becomes as concise as the XMLRPC one (a sketch of such a helper follows below). As JSONRPC is stateful, the first thing we have to do is request a session at /web/session/authenticate. This function takes the database, the user's name, and their password.
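Such a helper might look like the following sketch. The function name json_rpc and its exact signature are our own invention for illustration; they are not part of Odoo or the standard library, and the body simply factors out the boilerplate from the preceding script:

import json
import urllib2

def json_rpc(url, params, headers=None):
    # hypothetical helper wrapping the JSON-RPC 2.0 boilerplate:
    # serialize the payload, send the POST request, and unwrap
    # the 'result' key of the response
    all_headers = {'Content-type': 'application/json'}
    all_headers.update(headers or {})
    request = urllib2.Request(
        url,
        json.dumps({'jsonrpc': '2.0', 'params': params}),
        all_headers)
    return json.loads(urllib2.urlopen(request).read())['result']

With this in place, authenticating collapses to a single call such as json_rpc('http://localhost:8069/web/session/authenticate', {'db': db, 'login': user, 'password': password}), and the call_kw request shrinks in the same way, with the session ID passed via the headers parameter.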
The crucial part of the session flow is that we record the session ID Odoo created, which we pass in the header X-Openerp-Session-Id to /web/dataset/call_kw. Then the function behaves the same as execute_kw from the XMLRPC API: we need to pass a model name and a function to call on it, then positional and keyword arguments.

There's more…

Both protocols allow you to call basically any function of your models. In case you don't want a function to be available via either interface, prepend its name with an underscore; Odoo won't expose such functions as RPC calls. Furthermore, you need to take care that your parameters, as well as the return values, are serializable for the protocol. To be sure, restrict yourself to scalar values, dictionaries, and lists.

As you can do roughly the same with both protocols, it's up to you which one to use. This decision should be driven mainly by what your platform supports best. In a web context, you're generally better off with JSON, because Odoo allows JSON handlers to pass a CORS header conveniently (see the Make a path accessible from the network recipe for details). This is rather difficult with XMLRPC.

Summary

In this article, we got started with Odoo's web server architecture. We covered routes and controllers and their authentication, how handlers consume parameters, and how to use the RPC APIs, namely JSON-RPC and XML-RPC.

Resources for Article:

Further resources on this subject:
Advanced React [article]
Remote Authentication [article]
ASP.Net Site Performance: Improving JavaScript Loading [article]
Setting Up and Cleaning Up
This article, by Mani Tadayon, author of the book RSpec Essentials, discusses support code that sets tests up and cleans up after them. Initialization, configuration, cleanup, and other support code related to RSpec specs are important in real-world RSpec usage. We will learn how to cleanly organize support code in real-world applications by learning about the following topics:

- Configuring RSpec with spec_helper.rb
- Initialization and configuration of resources
- Preventing tests from accessing the Internet with WebMock
- Maintaining clean test state
- Custom helper code
- Loading support code on demand with tags

Configuring RSpec with spec_helper.rb

The RSpec specs that we've seen so far have functioned as standalone units. Specs in the real world, however, almost never work without supporting code to prepare the test environment before tests are run and to ensure it is cleaned up afterwards. In fact, the first line of nearly every real-world RSpec spec file loads a file that takes care of initialization, configuration, and cleanup:

require 'spec_helper'

By convention, the entry point for all support code for specs is a file called spec_helper.rb. Another convention is that specs are located in a folder called spec in the root folder of the project. The spec_helper.rb file is located in the root of this spec folder.

Now that we know where it goes, what do we actually put in spec_helper.rb? Let's start with an example:

# spec/spec_helper.rb
require 'rspec'

RSpec.configure do |config|
  config.order            = 'random'
  config.profile_examples = 3
end

To see what these two options do, let's create a couple of dummy spec files that include our spec_helper.rb. Here's the first spec file:

# spec/first_spec.rb
require 'spec_helper'

describe 'first spec' do
  it 'sleeps for 1 second' do
    sleep 1
  end

  it 'sleeps for 2 seconds' do
    sleep 2
  end

  it 'sleeps for 3 seconds' do
    sleep 3
  end
end

And here's our second spec file:

# spec/second_spec.rb
require 'spec_helper'

describe 'second spec' do
  it 'sleeps for 4 seconds' do
    sleep 4
  end

  it 'sleeps for 5 seconds' do
    sleep 5
  end

  it 'sleeps for 6 seconds' do
    sleep 6
  end
end

Now let's run our two spec files and see what happens. Note that we used --format documentation when running RSpec so that we see the order in which the tests were run (the default format just outputs a green dot for each passing test). From the output, we can see that the tests were run in a random order. We can also see the three slowest specs.

Although this was a toy example, I would recommend using both of these configuration options for RSpec. Running examples in a random order is very important, as it is the only reliable way of detecting bad tests which sometimes pass and sometimes fail based on the order in which the overall test suite is run. Also, keeping tests running fast is very important for maintaining a productive development flow, and seeing which tests are slow on every test run is the most effective way of encouraging developers to make the slow tests fast, or remove them from the test run. We'll return to both test order and test speed later. For now, let us just note that RSpec configuration is very important to keeping our specs reliable and fast.

Initialization and configuration of resources

Real-world applications rely on resources, such as databases, and external services, such as HTTP APIs.
These must be initialized and configured for the application to work properly. When writing tests, dealing with these resources and services can be a challenge because of two opposing fundamental interests.

First, we would like the test environment to match the production environment as closely as possible, so that tests that interact with resources and services are realistic. For example, we may use a powerful database system in production that runs on many servers to provide the best performance. Should we spend money and effort to create and maintain a second production-grade database environment just for testing purposes?

Second, we would like the test environment to be simple and relatively easy to understand, so that we understand what we are actually testing. We would also like to keep our code modular so that components can be tested in isolation, or in simpler environments that are easier to create, maintain, and understand. If we think of the example of the system that relies on a database cluster in production, we may ask ourselves whether we are better off using a single-server setup for our test database. We could even go so far as to use an entirely different database for our tests, such as the file-based SQLite.

As always, there are no easy answers to such trade-offs. The important thing is to understand the costs and benefits, and to adjust where we are on the continuum between production faithfulness and test simplicity as our system evolves, along with the goals it serves. For example, for a small hobbyist application or a project with a limited budget, we may choose to completely favor test simplicity. As the same code grows to become a successful fan site or a big-budget project, we may have a much lower tolerance for failure, and have both the motivation and resources to shift towards production faithfulness for our test environment.

Some rules of thumb to keep in mind:

- Unit tests are better places for test simplicity
- Integration tests are better places for production faithfulness
- Try to cleverly increase production faithfulness in unit tests
- Try to cleverly increase test simplicity in integration tests
- In between unit and integration tests, be clear about what is and isn't faithful to the production environment

A case study of test simplicity with an external service

Let's put these ideas into practice. I haven't changed the application code, except to rename the module OldWeatherQuery. The test code is also slightly changed to require a spec_helper file and to use a subject block to define an alias for the module name, which makes it easier to rename the code without having to change many lines of test code. So let's look at our three files now. First, here's the application code:

# old_weather_query.rb

require 'net/http'
require 'json'
require 'timeout'

module OldWeatherQuery
  extend self

  class NetworkError < StandardError
  end

  def forecast(place, use_cache=true)
    add_to_history(place)

    if use_cache
      cache[place] ||= begin
        @api_request_count += 1
        JSON.parse( http(place) )
      end
    else
      JSON.parse( http(place) )
    end
  rescue JSON::ParserError
    raise NetworkError.new("Bad response")
  end

  def api_request_count
    @api_request_count ||= 0
  end

  def history
    (@history || []).dup
  end

  def clear!
    @history           = []
    @cache             = {}
    @api_request_count = 0
  end

  private

  def add_to_history(s)
    @history ||= []
    @history << s
  end

  def cache
    @cache ||= {}
  end

  BASE_URI = 'http://api.openweathermap.org/data/2.5/weather?q='

  def http(place)
    uri = URI(BASE_URI + place)

    Net::HTTP.get(uri)
  rescue Timeout::Error
    raise NetworkError.new("Request timed out")
  rescue URI::InvalidURIError
    raise NetworkError.new("Bad place name: #{place}")
  rescue SocketError
    raise NetworkError.new("Could not reach #{uri.to_s}")
  end
end

Next is the spec file:

# spec/old_weather_query_spec.rb

require_relative 'spec_helper'
require_relative '../old_weather_query'

describe OldWeatherQuery do
  subject(:weather_query) { described_class }

  describe 'caching' do
    let(:json_response) do
      '{"weather" : { "description" : "Sky is Clear"}}'
    end

    around(:example) do |example|
      actual = weather_query.send(:cache)
      expect(actual).to eq({})

      example.run

      weather_query.clear!
    end

    it "stores results in local cache" do
      weather_query.forecast('Malibu,US')

      actual = weather_query.send(:cache)
      expect(actual.keys).to eq(['Malibu,US'])
      expect(actual['Malibu,US']).to be_a(Hash)
    end

    it "uses cached result in subsequent queries" do
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
    end
  end

  describe 'query history' do
    before do
      expect(weather_query.history).to eq([])
      allow(weather_query).to receive(:http).and_return("{}")
    end

    after do
      weather_query.clear!
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.history).to eq(places)
    end

    it "does not allow history to be modified" do
      expect {
        weather_query.history = ['Malibu,CN']
      }.to raise_error

      weather_query.history << 'Malibu,CN'
      expect(weather_query.history).to eq([])
    end
  end

  describe 'number of API requests' do
    before do
      expect(weather_query.api_request_count).to eq(0)
      allow(weather_query).to receive(:http).and_return("{}")
    end

    after do
      weather_query.clear!
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.api_request_count).to eq(3)
    end

    it "does not allow count to be modified" do
      expect {
        weather_query.api_request_count = 100
      }.to raise_error

      expect {
        weather_query.api_request_count += 10
      }.to raise_error

      expect(weather_query.api_request_count).to eq(0)
    end
  end
end

And last but not least, our spec_helper file, which has also changed only slightly: we only configure RSpec to show one slow spec (to keep test results uncluttered) and use color in the output to distinguish passes and failures more easily:

# spec/spec_helper.rb

require 'rspec'

RSpec.configure do |config|
  config.order            = 'random'
  config.profile_examples = 1
  config.color            = true
end

When we run these specs, something unexpected happens.
Most of the time the specs pass, but sometimes they fail. If we keep running the specs with the same command, we'll see the tests pass and fail apparently at random. These are flaky tests, and we have exposed them because of the random-order configuration we chose. If our tests run in a certain order, they fail. The problem could simply be in our tests; for example, we could have forgotten to clear state before or after a test. However, there could also be a problem with our code. In any case, we need to get to the bottom of the situation.

We first notice that at the end of the failing test run, RSpec tells us "Randomized with seed 318". We can use this information to run the tests in the order that caused the failure and start to debug and diagnose the problem. We do this by passing the --seed parameter with the value 318, as follows:

$ rspec spec/old_weather_query_spec.rb --seed 318

The problem has to do with the way we increment @api_request_count without ensuring it has been initialized. Looking at our code, we notice that the only places we initialize @api_request_count are in OldWeatherQuery.api_request_count and OldWeatherQuery.clear!. If we don't call either of these methods first, then OldWeatherQuery.forecast, the main method in this module, will always fail. Our tests sometimes pass because our setup code calls one of these methods first when tests are run in a certain order, but that is not at all how our code would likely be used in production. So basically, our code is completely broken, but our specs pass (sometimes). Based on this, we can create a simple spec that will always fail:

describe 'api_request is not initialized' do
  it "does not raise an error" do
    weather_query.forecast('Malibu,US')
  end
end

At least now our tests fail deterministically. But this is not the end of our troubles with these specs. If we run our tests many times with the seed value of 318, we will start seeing a second failing test case that is even more random than the first. This is an OldWeatherQuery::NetworkError, and it indicates that our tests are actually making HTTP requests to the Internet! Let's do an experiment to confirm this. We'll turn off our Wi-Fi access, unplug our Ethernet cables, and run our specs. When we run our tests without any Internet access, we see three errors in total. One of them is the error with the uninitialized @api_request_count instance variable, and two of them are instances of OldWeatherQuery::NetworkError, which confirms that we are indeed making real HTTP requests in our code.

What's so bad about making requests to the Internet? After all, the test failures are indeed very random, and we had to purposely shut off our Internet access to replicate the errors. Flaky tests are actually the least of our problems. First, we could be performing destructive actions that affect real systems, accounts, and people! Imagine if we were testing an e-commerce application that charged customer credit cards by using a third-party payment API via HTTP. If our tests actually hit our payment provider's API endpoint over HTTP, we would get a lot of declined transactions (assuming we are not storing and using real credit card numbers), which could lead to our account being suspended due to suspicions of fraud, putting our e-commerce application out of service.
Also, if we were running a continuous integration (CI) server such as Jenkins that did not have access to the public Internet, we would get failures in our CI builds from tests that attempted to access the Internet.

There are a few approaches to solving this problem. In our tests, we attempted to mock our HTTP requests, but obviously failed to do so effectively. A second approach is to allow actual HTTP requests but to point them at a special server set up for testing purposes. Let's focus on figuring out why our HTTP mocks were not successful. In a small set of tests like in this example, it is not hard to hunt down the places where we are sending actual HTTP requests. In larger code bases with a lot of test support code, it may be harder. Also, it would be nice to prevent access to the Internet altogether so that we notice these issues as soon as we run the offending tests. Fortunately, Ruby has many excellent tools for testing, and there is one that addresses our needs exactly: WebMock (https://github.com/bblimke/webmock). We simply install the gem and add a couple of lines to our spec helper file to disable all network connections in our tests:

require 'rspec'

# require the webmock gem
require 'webmock/rspec'

RSpec.configure do |config|
  # this is done by default, but let's make it clear
  WebMock.disable_net_connect!

  config.order            = 'random'
  config.profile_examples = 1
  config.color            = true
end

When we run our tests again, we'll see one or more instances of WebMock::NetConnectNotAllowedError, along with a backtrace leading us to the point in our tests where the HTTP request was made.

If we examine our test code, we'll notice that we mock the OldWeatherQuery.http method in a few places. However, we forgot to set up the mock in the first describe block for caching, where we defined a json_response object but never mocked OldWeatherQuery.http to return it. We can solve the problem by mocking OldWeatherQuery.http throughout the entire test file. We'll also take this opportunity to clean up the initialization of @api_request_count in our code. Here's what we have now:

# new_weather_query.rb

require 'net/http'
require 'json'
require 'timeout'

module NewWeatherQuery
  extend self

  class NetworkError < StandardError
  end

  def forecast(place, use_cache=true)
    add_to_history(place)
    if use_cache
      cache[place] ||= begin
        increment_api_request_count
        JSON.parse( http(place) )
      end
    else
      JSON.parse( http(place) )
    end
  rescue JSON::ParserError => e
    raise NetworkError.new("Bad response: #{e.inspect}")
  end

  def increment_api_request_count
    @api_request_count ||= 0
    @api_request_count += 1
  end

  def api_request_count
    @api_request_count ||= 0
  end

  def history
    (@history || []).dup
  end

  def clear!
    @history           = []
    @cache             = {}
    @api_request_count = 0
  end

  private

  def add_to_history(s)
    @history ||= []
    @history << s
  end

  def cache
    @cache ||= {}
  end

  BASE_URI = 'http://api.openweathermap.org/data/2.5/weather?q='

  def http(place)
    uri = URI(BASE_URI + place)

    Net::HTTP.get(uri)
  rescue Timeout::Error
    raise NetworkError.new("Request timed out")
  rescue URI::InvalidURIError
    raise NetworkError.new("Bad place name: #{place}")
  rescue SocketError
    raise NetworkError.new("Could not reach #{uri.to_s}")
  end
end

And here is the spec file to go with it:

# spec/new_weather_query_spec.rb

require_relative 'spec_helper'
require_relative '../new_weather_query'

describe NewWeatherQuery do
  subject(:weather_query) { described_class }

  after { weather_query.clear! }

  let(:json_response) { '{}' }

  before do
    allow(weather_query).to receive(:http).and_return(json_response)
  end

  describe 'api_request is initialized' do
    it "does not raise an error" do
      weather_query.forecast('Malibu,US')
    end
  end

  describe 'caching' do
    let(:json_response) do
      '{"weather" : { "description" : "Sky is Clear"}}'
    end

    around(:example) do |example|
      actual = weather_query.send(:cache)
      expect(actual).to eq({})

      example.run
    end

    it "stores results in local cache" do
      weather_query.forecast('Malibu,US')

      actual = weather_query.send(:cache)
      expect(actual.keys).to eq(['Malibu,US'])
      expect(actual['Malibu,US']).to be_a(Hash)
    end

    it "uses cached result in subsequent queries" do
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
    end
  end

  describe 'query history' do
    before do
      expect(weather_query.history).to eq([])
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.history).to eq(places)
    end

    it "does not allow history to be modified" do
      expect {
        weather_query.history = ['Malibu,CN']
      }.to raise_error

      weather_query.history << 'Malibu,CN'
      expect(weather_query.history).to eq([])
    end
  end

  describe 'number of API requests' do
    before do
      expect(weather_query.api_request_count).to eq(0)
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.api_request_count).to eq(3)
    end

    it "does not allow count to be modified" do
      expect {
        weather_query.api_request_count = 100
      }.to raise_error

      expect {
        weather_query.api_request_count += 10
      }.to raise_error

      expect(weather_query.api_request_count).to eq(0)
    end
  end
end

Now we've fixed a major bug in our code that slipped through our specs and used to pass randomly. We've made it so that our tests always pass, regardless of the order in which they are run, and without needing to access the Internet. Our test code and application code have also become clearer, as we've reduced duplication in a few places.
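As a side note, mocking our own http method is only one way to keep the specs offline. WebMock can also stub requests at the HTTP level, which catches even the code paths we forget to mock. Here is a minimal sketch, assuming we want any GET request to the OpenWeatherMap host to return an empty JSON body (the URL pattern and canned response are illustrative, not part of our actual suite):

# inside a spec or a before hook, with webmock/rspec already required
stub_request(:get, /api\.openweathermap\.org/).
  to_return(status: 200, body: '{}',
            headers: { 'Content-Type' => 'application/json' })

Because this stub sits at the network layer, it acts as a safety net underneath method-level mocks like ours.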
A case study of production faithfulness with a test resource instance

We're not done with our WeatherQuery example just yet. Let's take a look at how we would add a simple database to store our cached values. There are some serious limitations to the way we are caching with instance variables, which persist only within the scope of a single Ruby process. As soon as we stop or restart our app, the entire cache will be lost. In a production app, we would likely have many processes running the same code in order to serve traffic effectively. With our current approach, each process would have a separate cache, which would be very inefficient. We could save many HTTP requests if we were able to share the cache between processes and across restarts.

Economizing on these requests is not simply a matter of improved response time. We also need to consider that we cannot make unlimited requests to external services. For commercial services, we would pay for the number of requests we make. For free services, we are likely to get throttled if we exceed some threshold. Therefore, an effective caching scheme that reduces the number of HTTP requests we make to external services is of vital importance to the function of a real-world app.

Finally, our cache is very simplistic and has no expiration mechanism short of clearing all entries. For a cache to be effective, we need to be able to store entries for individual locations for some period of time within which we don't expect the weather forecast to change much. This will keep the cache small and up to date.

We'll use Redis (http://redis.io) as our database since it is very fast, simple, and easy to set up. You can find instructions on the Redis website on how to install it, which is an easy process on any platform. Once you have Redis installed, you simply need to start the server locally, which you can do with the redis-server command. We'll also need to install the Redis Ruby client as a gem (https://github.com/redis/redis-rb).

Let's start with a separate configuration file to set up our Redis client for our tests:

# spec/config/redis.rb

require 'rspec'
require 'redis'

ENV['WQ_REDIS_URL'] ||= 'redis://localhost:6379/15'

RSpec.configure do |config|
  if ! ENV['WQ_REDIS_URL'].is_a?(String)
    raise "WQ_REDIS_URL environment variable not set"
  end

  ::REDIS_CLIENT = Redis.new( :url => ENV['WQ_REDIS_URL'] )

  # the :redis tag restricts this hook to examples tagged with redis: true
  config.after(:example, :redis => true) do
    ::REDIS_CLIENT.flushdb
  end
end

Note that we place this file in a new config folder under our main spec folder. The idea is to configure each resource separately in its own file to keep everything isolated and easy to understand. This will make maintenance easy and prevent problems with configuration management down the road. We don't do much in this file, but we do establish some important conventions. A single environment variable takes care of the Redis connection URL. By using an environment variable, we make it easy to change configuration and also allow flexibility in how these configurations are stored. Our code doesn't care if the Redis connection URL is stored in a simple .env file with key-value pairs or loaded from a configuration database.
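For example, if we loaded environment variables with a dotenv-style tool, a minimal .env file (illustrative only; any key-value mechanism works equally well) would need just one line:

# .env -- illustrative; mirrors the default used in spec/config/redis.rb
WQ_REDIS_URL=redis://localhost:6379/15

Nothing in our code or specs would have to change if we later moved this value into a configuration service.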
We can also easily override this value manually simply by setting it when we run RSpec, like so:

$ WQ_REDIS_URL=redis://1.2.3.4:4321/0 rspec spec

Note that we also set a sensible default value, which is to run on the default Redis port of 6379 on our local machine, using database number 15, which is less likely to be used for local development. This prevents our tests from relying on our development database, or from polluting or destroying it.

It is also worth mentioning that we prefix our environment variable with WQ (short for weather query). Small details like this are very important for keeping our code easy to understand and for preventing dangerous clashes. Imagine the kinds of confusion and clashes that could be caused if we relied on REDIS_URL and had multiple apps running on the same server, all relying on Redis. It would be very easy to break many applications if we changed the value of REDIS_URL for a single app to point to a different instance of Redis.

We set a global constant, ::REDIS_CLIENT, to point to a Redis client. We will use this in our code to connect to Redis. Note that in real-world code, we would likely have a global namespace for the entire app and we would define globals such as REDIS_CLIENT under that namespace rather than in the global Ruby namespace.

Finally, we configure RSpec to call the flushdb command after every example tagged with :redis to empty the database and keep state clean across tests. In our example, all tests interact with Redis, so this tag seems pointless. However, it is very likely that we would add code that had nothing to do with Redis, and using tags helps us constrain the scope of our configuration hooks to only where they are needed. This also prevents confusion about multiple hooks running for the same example. In general, we want to avoid global hooks and instead make configuration hooks explicitly triggered wherever possible.

So what does our spec look like now? Actually, it is almost exactly the same. Only a few lines have changed to work with the new Redis cache. See if you can spot them!

# spec/redis_weather_query_spec.rb

require_relative 'spec_helper'
require_relative '../redis_weather_query'

describe RedisWeatherQuery, redis: true do
  subject(:weather_query) { described_class }

  after { weather_query.clear! }
  let(:json_response) { '{}' }

  before do
    allow(weather_query).to receive(:http).and_return(json_response)
  end

  describe 'api_request is initialized' do
    it "does not raise an error" do
      weather_query.forecast('Malibu,US')
    end
  end

  describe 'caching' do
    let(:json_response) do
      '{"weather" : { "description" : "Sky is Clear"}}'
    end

    around(:example) do |example|
      actual = weather_query.send(:cache).all
      expect(actual).to eq({})

      example.run
    end

    it "stores results in local cache" do
      weather_query.forecast('Malibu,US')

      actual = weather_query.send(:cache).all
      expect(actual.keys).to eq(['Malibu,US'])
      expect(actual['Malibu,US']).to be_a(Hash)
    end

    it "uses cached result in subsequent queries" do
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
      weather_query.forecast('Malibu,US')
    end
  end

  describe 'query history' do
    before do
      expect(weather_query.history).to eq([])
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.history).to eq(places)
    end

    it "does not allow history to be modified" do
      expect {
        weather_query.history = ['Malibu,CN']
      }.to raise_error

      weather_query.history << 'Malibu,CN'
      expect(weather_query.history).to eq([])
    end
  end

  describe 'number of API requests' do
    before do
      expect(weather_query.api_request_count).to eq(0)
    end

    it "stores every place requested" do
      places = %w(
        Malibu,US
        Beijing,CN
        Delhi,IN
        Malibu,US
        Malibu,US
        Beijing,CN
      )

      places.each {|s| weather_query.forecast(s) }

      expect(weather_query.api_request_count).to eq(3)
    end

    it "does not allow count to be modified" do
      expect {
        weather_query.api_request_count = 100
      }.to raise_error

      expect {
        weather_query.api_request_count += 10
      }.to raise_error

      expect(weather_query.api_request_count).to eq(0)
    end
  end
end

So what about the actual WeatherQuery code? It changes very little as well:

# redis_weather_query.rb

require 'net/http'
require 'json'
require 'timeout'

# require the new cache module
require_relative 'redis_weather_cache'

module RedisWeatherQuery
  extend self

  class NetworkError < StandardError
  end

  # ... same as before ...

  def clear!
    @history           = []
    @api_request_count = 0

    # no more clearing of cache here
  end

  private

  # ... same as before ...

  # the new cache module has a Hash-like interface
  def cache
    RedisWeatherCache
  end

  # ... same as before ...

end

We can see that we've preserved pretty much the same code and specs as before. Almost all of the new functionality is accomplished in a new module that caches with Redis. Here is what it looks like:

# redis_weather_cache.rb

require 'redis'
require 'json' # needed for JSON.generate and JSON.parse below

module RedisWeatherCache
  extend self

  CACHE_KEY             = 'weather_query:cache'
  EXPIRY_ZSET_KEY       = 'weather_query:expiry_tracker'
  EXPIRE_FORECAST_AFTER = 300 # 5 minutes

  def redis_client
    if ! defined?(::REDIS_CLIENT)
      raise("No REDIS_CLIENT defined!")
    end

    ::REDIS_CLIENT
  end

  def []=(location, forecast)
    redis_client.hset(CACHE_KEY, location, JSON.generate(forecast))
    redis_client.zadd(EXPIRY_ZSET_KEY, Time.now.to_i, location)
  end

  def [](location)
    remove_expired_entries

    raw_value = redis_client.hget(CACHE_KEY, location)

    if raw_value
      JSON.parse(raw_value)
    else
      nil
    end
  end

  def all
    redis_client.hgetall(CACHE_KEY).inject({}) do |memo, (location, forecast_json)|
      memo[location] = JSON.parse(forecast_json)
      memo
    end
  end

  def clear!
    redis_client.del(CACHE_KEY)
  end

  def remove_expired_entries
    # expired locations have a score (creation timestamp) below the threshold
    expired_locations = redis_client.zrangebyscore(EXPIRY_ZSET_KEY, 0, Time.now.to_i - EXPIRE_FORECAST_AFTER)

    if ! expired_locations.empty?
      # remove the cache entries
      redis_client.hdel(CACHE_KEY, expired_locations)

      # also clear the expiry entries
      redis_client.zrem(EXPIRY_ZSET_KEY, expired_locations)
    end
  end
end

We'll avoid a detailed explanation of this code. We simply note that we accomplish all of the design goals we discussed at the beginning of the section: a persistent cache with expiration of individual values. We've accomplished this using some simple Redis functionality along with ZSET (sorted set) functionality, which is a bit more complex, and which we needed because Redis does not allow expiration times to be set on individual entries within a Hash. We can see that by using method names such as RedisWeatherCache.[] and RedisWeatherCache.[]=, we've maintained a Hash-like interface, which made it easy to use this cache instead of the simple in-memory Ruby Hash we had in our previous iteration. Our tests all pass and are still pretty simple, thanks to the modularity of this new cache code, the modular configuration file, and the previous fixes we made to our specs to remove Internet and run-order dependencies.

Summary

In this article, we delved into setting up and cleaning up state for real-world specs that interact with external services and local resources by extending our WeatherQuery example to address a big bug, isolate our specs from the Internet, and cleanly configure a Redis database to serve as a better cache.

Resources for Article:

Further resources on this subject:

Creating your first heat map in R [article]
Probability of R? [article]
Programming on Raspbian [article]
Threading Basics

Packt
08 Apr 2016
6 min read
In this article by Eugene Agafonov, author of the book Multithreading with C# Cookbook - Second Edition, we will cover the basic tasks to work with threads in C#. You will learn the following recipes:

Creating a thread in C#
Pausing a thread
Making a thread wait

(For more resources related to this topic, see here.)

Creating a thread in C#

Throughout the following recipes, we will use Visual Studio 2015 as the main tool to write multithreaded programs in C#. This recipe will show you how to create a new C# program and use threads in it. There is a free Visual Studio Community 2015 IDE, which can be downloaded from the Microsoft website and used to run the code samples.

Getting ready

To work through this recipe, you will need Visual Studio 2015. There are no other prerequisites.

How to do it...

To understand how to create a new C# program and use threads in it, perform the following steps:

1. Start Visual Studio 2015.
2. Create a new C# console application project. Make sure that the project uses .NET Framework 4.6 or higher; however, the code in this article will work with previous versions.
3. In the Program.cs file, add the following using directives:

using System;
using System.Threading;
using static System.Console;

4. Add the following code snippet below the Main method:

static void PrintNumbers()
{
    WriteLine("Starting...");
    for (int i = 1; i < 10; i++)
    {
        WriteLine(i);
    }
}

5. Add the following code snippet inside the Main method:

Thread t = new Thread(PrintNumbers);
t.Start();
PrintNumbers();

6. Run the program. The output will show the two number sequences interleaved.

How it works...

In steps 1 and 2, we created a simple console application in C# using .NET Framework 4.6. Then, in step 3, we included the System.Threading namespace, which contains all the types needed for the program. Then, we used the using static feature from C# 6.0, which allows us to use the System.Console type's static methods without specifying the type name.

An instance of a program that is being executed can be referred to as a process. A process consists of one or more threads. This means that when we run a program, we always have one main thread that executes the program code.

In step 4, we defined the PrintNumbers method, which will be used in both the main and newly created threads. Then, in step 5, we created a thread that runs PrintNumbers. When we construct a thread, an instance of the ThreadStart or ParameterizedThreadStart delegate is passed to the constructor. The C# compiler creates this object behind the scenes when we just type the name of the method we want to run in a different thread. Then, we start the thread and run PrintNumbers in the usual manner on the main thread.

As a result, there will be two ranges of numbers from 1 to 9 randomly crossing each other. This illustrates that the PrintNumbers method runs simultaneously on the main thread and on the other thread.

Pausing a thread

This recipe will show you how to make a thread wait for some time without wasting operating system resources.

Getting ready

To work through this recipe, you will need Visual Studio 2015. There are no other prerequisites.

How to do it...

To understand how to make a thread wait without wasting operating system resources, perform the following steps:

1. Start Visual Studio 2015.
2. Create a new C# console application project.
3. In the Program.cs file, add the following using directives:

using System;
using System.Threading;
using static System.Console;
using static System.Threading.Thread;

4. Add the following code snippet below the Main method:

static void PrintNumbers()
{
    WriteLine("Starting...");
    for (int i = 1; i < 10; i++)
    {
        WriteLine(i);
    }
}

static void PrintNumbersWithDelay()
{
    WriteLine("Starting...");
    for (int i = 1; i < 10; i++)
    {
        Sleep(TimeSpan.FromSeconds(2));
        WriteLine(i);
    }
}

5. Add the following code snippet inside the Main method:

Thread t = new Thread(PrintNumbersWithDelay);
t.Start();
PrintNumbers();

6. Run the program.

How it works...

When the program is run, it creates a thread that will execute the code in the PrintNumbersWithDelay method. Immediately after that, it runs the PrintNumbers method. The key feature here is the Thread.Sleep method call in PrintNumbersWithDelay. It causes the thread executing this code to wait a specified amount of time (2 seconds in our case) before printing each number. While a thread sleeps, it uses as little CPU time as possible. As a result, we will see that the code in the PrintNumbers method, which usually runs later, will be executed before the code in the PrintNumbersWithDelay method in a separate thread.

Making a thread wait

This recipe will show you how a program can wait for some computation in another thread to complete so that it can use the result later in the code. The Thread.Sleep method is not enough here because we don't know the exact time the computation will take.

Getting ready

To work through this recipe, you will need Visual Studio 2015. There are no other prerequisites.

How to do it...

To understand how a program waits for some computation in another thread to complete in order to use its result later, perform the following steps:

1. Start Visual Studio 2015.
2. Create a new C# console application project.
3. In the Program.cs file, add the following using directives:

using System;
using System.Threading;
using static System.Console;
using static System.Threading.Thread;

4. Add the following code snippet below the Main method:

static void PrintNumbersWithDelay()
{
    WriteLine("Starting...");
    for (int i = 1; i < 10; i++)
    {
        Sleep(TimeSpan.FromSeconds(2));
        WriteLine(i);
    }
}

5. Add the following code snippet inside the Main method:

WriteLine("Starting...");
Thread t = new Thread(PrintNumbersWithDelay);
t.Start();
t.Join();
WriteLine("Thread completed");

6. Run the program.

How it works...

When the program is run, it starts a long-running thread that prints out numbers, waiting two seconds before printing each one. In the main program, however, we call the t.Join method, which allows us to wait until thread t completes. When it is complete, the main program continues to run. With the help of this technique, it is possible to synchronize execution steps between two threads. The first one waits until the other is complete and then continues to work. While the first thread waits, it is in a blocked state (as it is in the previous recipe when you call Thread.Sleep).

Summary

In this article, we focused on performing some very basic operations with threads in the C# language. We covered a thread's life cycle, which includes creating a thread, pausing a thread, and making a thread wait.

Resources for Article:

Further resources on this subject:

Simplifying Parallelism Complexity in C# [article]
Watching Multiple Threads in C# [article]
Debugging Multithreaded Applications as Singlethreaded in C# [article]