Hypothesis testing with R

Richa Tripathi
13 Feb 2018
8 min read
[box type="note" align="" class="" width=""]This article is an excerpt taken from the book Learning Quantitative Finance with R written by Dr. Param Jeet and Prashant Vats. This book will help you understand the basics of R and how they can be applied in various Quantitative Finance scenarios.[/box] Hypothesis testing is used to reject or retain a hypothesis based upon the measurement of an observed sample. So in today’s tutorial we will discuss how to implement the various scenarios of hypothesis testing in R. Lower tail test of population mean with known variance The null hypothesis is given by where is the hypothesized lower bound of the population mean. Let us assume a scenario where an investor assumes that the mean of daily returns of a stock since inception is greater than $10. The average of 30 days' daily return sample is $9.9. Assume the population standard deviation is 0.011. Can we reject the null hypothesis at .05 significance level? Now let us calculate the test statistics z which can be computed by the following code in R: > xbar= 9.9 > mu0 = 10 > sig = 1.1 > n = 30 > z = (xbar-mu0)/(sig/sqrt(n)) > z Here: xbar: Sample mean mu: Hypothesized value sig: Standard deviation of population n: Sample size z: Test statistics This gives the value of z the test statistics: [1] -0.4979296 Now let us find out the critical value at 0.05 significance level. It can be computed by the following code: > alpha = .05 > z.alpha = qnorm(1-alpha) > -z.alpha This gives the following output: [1] -1.644854 Since the value of the test statistics is greater than the critical value, we fail to reject the null hypothesis claim that the return is greater than $10. In place of using the critical value test, we can use the pnorm function to compute the lower tail of Pvalue test statistics. This can be computed by the following code: > pnorm(z) This gives the following output: [1] 0.3092668 Since the Pvalue is greater than 0.05, we fail to reject the null hypothesis. Upper tail test of population mean with known variance The null hypothesis is given by  where  is the hypothesized upper bound of the population mean. Let us assume a scenario where an investor assumes that the mean of daily returns of a stock since inception is at most $5. The average of 30 days' daily return sample is $5.1. Assume the population standard deviation is 0.25. Can we reject the null hypothesis at .05 significance level? Now let us calculate the test statistics z, which can be computed by the following code in R: > xbar= 5.1 > mu0 = 5 > sig = .25 > n = 30 > z = (xbar-mu0)/(sig/sqrt(n)) > z Here: xbar: Sample mean mu0: Hypothesized value sig: Standard deviation of population n: Sample size z: Test statistics It gives 2.19089 as the value of test statistics. Now let us calculate the critical value at .05 significance level, which is given by the following code: > alpha = .05 > z.alpha = qnorm(1-alpha) > z.alpha This gives 1.644854, which is less than the value computed for the test statistics. Hence we reject the null hypothesis claim. Also, the Pvalue of the test statistics is given as follows: >pnorm(z, lower.tail=FALSE) This gives 0.01422987, which is less than 0.05 and hence we reject the null hypothesis. Two-tailed test of population mean with known variance The null hypothesis is given by  where  is the hypothesized value of the population mean. Let us assume a scenario where the mean of daily returns of a stock last year is $2. The average of 30 days' daily return sample is $1.5 this year. 
Assume the population standard deviation is 0.1. Can we reject the null hypothesis that there is no significant difference between this year's returns and last year's, at the .05 significance level? Let us calculate the test statistic z, which can be computed by the following code in R:

> xbar = 1.5
> mu0 = 2
> sig = .1
> n = 30
> z = (xbar-mu0)/(sig/sqrt(n))
> z

This gives the value of the test statistic as -27.38613. Now let us find the critical values for comparing the test statistic at the .05 significance level, which are given by the following code:

> alpha = .05
> z.half.alpha = qnorm(1-alpha/2)
> c(-z.half.alpha, z.half.alpha)

This gives the values -1.959964 and 1.959964. Since the value of the test statistic is not within the range (-1.959964, 1.959964), we reject the null hypothesis claim that there is no significant difference between this year's returns and last year's at the .05 significance level. The two-tailed p-value is given as follows:

> 2*pnorm(z)

This gives a value less than .05, so we reject the null hypothesis.

In all the preceding scenarios, the variance of the population is known and we use the normal distribution for hypothesis testing. In the next scenarios, we will not be given the variance of the population, so we will use the t distribution for testing the hypothesis.

Lower tail test of population mean with unknown variance

The null hypothesis is given by H0: μ ≥ μ0, where μ0 is the hypothesized lower bound of the population mean. Let us assume a scenario where an investor assumes that the mean of the daily returns of a stock since inception is greater than $1. The average of a 30-day sample of daily returns is $0.9. Assume the sample standard deviation is 0.1. Can we reject the null hypothesis at the .05 significance level? In this scenario, we can compute the test statistic by executing the following code:

> xbar = .9
> mu0 = 1
> sig = .1
> n = 30
> t = (xbar-mu0)/(sig/sqrt(n))
> t

Here:

xbar: Sample mean
mu0: Hypothesized value
sig: Standard deviation of the sample
n: Sample size
t: Test statistic

This gives the value of the test statistic as -5.477226. Now let us compute the critical value at the .05 significance level, which is given by the following code:

> alpha = .05
> t.alpha = qt(1-alpha, df=n-1)
> -t.alpha

We get the value -1.699127. Since the value of the test statistic is less than the critical value, we reject the null hypothesis claim. Instead of the critical value, we can use the p-value associated with the test statistic, which is given as follows:

> pt(t, df=n-1)

This results in a value less than .05, so we can reject the null hypothesis claim.

Upper tail test of population mean with unknown variance

The null hypothesis is given by H0: μ ≤ μ0, where μ0 is the hypothesized upper bound of the population mean. Let us assume a scenario where an investor assumes that the mean of the daily returns of a stock since inception is at most $3. The average of a 30-day sample of daily returns is $3.1. Assume the sample standard deviation is 0.2. Can we reject the null hypothesis at the .05 significance level? Let us calculate the test statistic t, which can be computed by the following code in R:

> xbar = 3.1
> mu0 = 3
> sig = .2
> n = 30
> t = (xbar-mu0)/(sig/sqrt(n))
> t

The variables are as in the previous example, with sig again being the standard deviation of the sample. This gives 2.738613 as the value of the test statistic. Now let us find the critical value associated with the .05 significance level for the test statistic.
It is given by the following code:

> alpha = .05
> t.alpha = qt(1-alpha, df=n-1)
> t.alpha

Since the critical value 1.699127 is less than the value of the test statistic, we reject the null hypothesis claim. Also, the p-value associated with the test statistic is given as follows:

> pt(t, df=n-1, lower.tail=FALSE)

This is less than .05, hence the null hypothesis claim gets rejected.

Two-tailed test of population mean with unknown variance

The null hypothesis is given by H0: μ = μ0, where μ0 is the hypothesized value of the population mean. Let us assume a scenario where the mean of the daily returns of a stock last year was $2, and the average of a 30-day sample of daily returns this year is $1.9. Assume the sample standard deviation is 0.1. Can we reject the null hypothesis that there is no significant difference between this year's returns and last year's, at the .05 significance level? Let us calculate the test statistic t, which can be computed by the following code in R:

> xbar = 1.9
> mu0 = 2
> sig = .1
> n = 30
> t = (xbar-mu0)/(sig/sqrt(n))
> t

This gives -5.477226 as the value of the test statistic. Now let us find the critical value range for comparison, which is given by the following code:

> alpha = .05
> t.half.alpha = qt(1-alpha/2, df=n-1)
> c(-t.half.alpha, t.half.alpha)

This gives the range (-2.04523, 2.04523). Since the value of the test statistic falls outside this range, we reject the claim of the null hypothesis. The short sketch below wraps the z-test recipe used throughout this excerpt into a reusable function.
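The same four inputs (xbar, mu0, sig, n) recur in every known-variance example above, so it can be convenient to wrap the recipe in a helper. This is an editorial sketch, not code from the book; the z_test name and its interface are our own:

z_test <- function(xbar, mu0, sig, n,
                   alternative = c("less", "greater", "two.sided")) {
  alternative <- match.arg(alternative)
  z <- (xbar - mu0) / (sig / sqrt(n))  # test statistic
  p <- switch(alternative,
              less      = pnorm(z),                     # lower tail test
              greater   = pnorm(z, lower.tail = FALSE), # upper tail test
              two.sided = 2 * pnorm(-abs(z)))           # two-tailed test
  list(statistic = z, p.value = p)
}

# Reproduces the first example: z = -0.4979296, p = 0.3092668
z_test(xbar = 9.9, mu0 = 10, sig = 1.1, n = 30, alternative = "less")

Swapping qnorm/pnorm for qt/pt (with df = n - 1) gives the corresponding t-based variant for the unknown-variance cases.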
We learned how to practically perform one-tailed and two-tailed hypothesis testing with known as well as unknown variance using R. If you enjoyed this excerpt, check out the book Learning Quantitative Finance with R to explore different methods to manage risks and trading using machine learning with R.

Compression Formats in Linux Shell Script

Packt
31 Jan 2011
6 min read
This article is based on the book Linux Shell Scripting Cookbook:

Solve real-world shell scripting problems with over 110 simple but incredibly effective recipes
Master the art of crafting one-liner command sequences to perform tasks such as text processing, digging data from files, and a lot more
Practical problem-solving techniques adherent to the latest Linux platform
Packed with easy-to-follow examples to exercise all the features of the Linux shell scripting language
Part of Packt's Cookbook series: each recipe is a carefully organized sequence of instructions to complete the task as efficiently as possible

Compressing with gzip

gzip is a commonly used compression format on GNU/Linux platforms. Utilities such as gzip, gunzip, and zcat are available to handle gzip compression file types. gzip can be applied to a single file only; it cannot archive directories and multiple files, hence we use a tar archive and compress it with gzip. When multiple files are given as input, it will produce several individually compressed (.gz) files. Let's see how to operate with gzip.

How to do it...

In order to compress a file with gzip, use the following command:

$ gzip filename
$ ls
filename.gz

It will remove the original file and produce a compressed file called filename.gz.

Extract a gzip compressed file as follows:

$ gunzip filename.gz

It will remove filename.gz and produce an uncompressed version of the file.

In order to list out the properties of a compressed file, use:

$ gzip -l test.txt.gz
compressed uncompressed ratio uncompressed_name
35 6 -33.3% test.txt

The gzip command can read a file from stdin and also write a compressed file to stdout.

Read from stdin and write to stdout as follows:

$ cat file | gzip -c > file.gz

The -c option is used to specify output to stdout. We can specify the compression level for gzip: use --fast or --best to provide low and high compression ratios, respectively.

There's more...

The gzip command is often used with other commands. It also has advanced options to specify the compression ratio. Let's see how to work with these features.

Gzip with tarball

We usually use gzip with tarballs. A tarball can be compressed by using the -z option passed to the tar command while archiving and extracting. You can create gzipped tarballs using the following methods:

Method 1:

$ tar -czvvf archive.tar.gz [FILES]

Or:

$ tar -cavvf archive.tar.gz [FILES]

The -a option specifies that the compression format should automatically be detected from the extension.

Method 2: First, create a tarball:

$ tar -cvvf archive.tar [FILES]

Then compress it after tarballing as follows:

$ gzip archive.tar

If many files (a few hundred) are to be archived in a tarball and compressed, we use Method 2 with a few changes. The issue with giving many files as command arguments to tar is that it can accept only a limited number of files from the command line. In order to solve this issue, we can create a tar file by adding files one by one using a loop with the append option (-r) as follows:

FILE_LIST="file1 file2 file3 file4 file5"
for f in $FILE_LIST;
do
tar -rvf archive.tar $f
done
gzip archive.tar

In order to extract a gzipped tarball, use -x for extraction and -z for gzip specification:

$ tar -xzvvf archive.tar.gz -C extract_directory

Or:

$ tar -xavvf archive.tar.gz -C extract_directory

In the above command, the -a option is used to detect the compression format automatically.

zcat – reading gzipped files without extracting

zcat is a command that can be used to dump an extracted file from a .gz file to stdout without manually extracting it.
The .gz file remains as before, but it will dump the extracted content to stdout as follows:

$ ls
test.gz
$ zcat test.gz
A test file
# file test contains a line "A test file"
$ ls
test.gz

Compression ratio

We can specify the compression ratio, which is available in the range 1 to 9, where 1 is the lowest but fastest, and 9 is the best but slowest. You can also specify the ratios in between, as follows:

$ gzip -9 test.img

This will compress the file to the maximum.

Compressing with bzip2

bzip2 is another compression technique which is very similar to gzip. bzip2 typically produces smaller (more compressed) files than gzip. It comes with all Linux distributions. Let's see how to use bzip2.

How to do it...

In order to compress with bzip2, use:

$ bzip2 filename
$ ls
filename.bz2

It will remove the original file and produce a compressed file called filename.bz2.

Extract a bzipped file as follows:

$ bunzip2 filename.bz2

It will remove filename.bz2 and produce an uncompressed version of filename.

bzip2 can read a file from stdin and also write a compressed file to stdout. In order to read from stdin and write to stdout, use:

$ cat file | bzip2 -c > file.tar.bz2

-c is used to specify output to stdout.

We usually use bzip2 with tarballs. A tarball can be compressed by using the -j option passed to the tar command while archiving and extracting. Creating a bzipped tarball can be done by using the following methods:

Method 1:

$ tar -cjvvf archive.tar.bz2 [FILES]

Or:

$ tar -cavvf archive.tar.bz2 [FILES]

The -a option specifies to automatically detect the compression format from the extension.

Method 2: First create the tarball:

$ tar -cvvf archive.tar [FILES]

Then compress it after tarballing:

$ bzip2 archive.tar

If we need to add hundreds of files to the archive, the above commands may fail. To fix that issue, use a loop to append files to the archive one by one using the -r option.

Extract a bzipped tarball as follows:

$ tar -xjvvf archive.tar.bz2 -C extract_directory

In this command:

-x is used for extraction
-j is for bzip2 specification
-C is for specifying the directory to which the files are to be extracted

Or, you can use the following command:

$ tar -xavvf archive.tar.bz2 -C extract_directory

-a will automatically detect the compression format.

There's more...

bzip2 has several additional options to carry out different functions. Let's go through a few of them.

Keeping input files without removing them

While using bzip2 or bunzip2, it will remove the input file and produce a compressed output file. We can prevent it from removing input files by using the -k option. For example:

$ bunzip2 test.bz2 -k
$ ls
test test.bz2

Compression ratio

We can specify the compression ratio, which is available in the range of 1 to 9 (where 1 is the least compression but fast, and 9 is the highest possible compression but much slower). For example:

$ bzip2 -9 test.img

This command provides maximum compression.
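To see the speed/size trade-off concretely, here is a small editorial sketch (not from the book; test.img is a placeholder for any reasonably large file). It compresses the same input at both ends of the 1 to 9 range with each tool, using -c so the input file is kept:

#!/bin/bash
# Compare gzip and bzip2 at the fastest (1) and best (9) levels.
for level in 1 9; do
    gzip  -"$level" -c test.img > "test-$level.gz"
    bzip2 -"$level" -c test.img > "test-$level.bz2"
done
ls -l test.img test-*.gz test-*.bz2  # compare the resulting sizes

On most inputs, the .bz2 files should come out smaller than the corresponding .gz files, at the cost of longer compression time.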

Type, Subtype, and Category Patterns in Logical Data Modeling

Packt
24 Oct 2009
4 min read
Before I cover the three logical data modeling patterns, let's review briefly how we typically model a type. Let's say you're in the car business. You can model a car as a Car entity, shown in the figure below; its sample data values are in the following table (I just use six digits for the VIN, instead of the 17-character VIN standard, in the sample). VIN (Vehicle Identification Number), the car serial number, is the unique key of the Car entity. The other attributes (Brand, Model, Year, and Manufacturer's Suggested Retail Price) can be thought of as the type of a specific car with a unique VIN. So, the type is in the entity itself. Note that you can have more than one car—each with a unique VIN—of the same type, such as the first three Honda Accords in the sample table. If you have many cars of the same type, or you have many car types and they're dynamic (they change: new, update, delete; for example, an update of the MSRP), you can easily recognize that this model is not suitable—a type model is a better solution.

VIN     Brand   Model    Year  MSRP
123987  Honda   Accord   2007  20,000
456321  Honda   Accord   2007  20,000
555666  Honda   Accord   2007  20,000
678345  Toyota  Corolla  2008  21,000
...

Type

The ER (Entity Relationship) diagram in the following figure shows the Car Type and Car entities and their relationship. Car Type defines each type of your cars—a type is a definition of something. The Car is the individual car, each with a serial number (Vehicle Identification Number) that has a specific type defined in Car Type. You can think of a Car Type entity as a template used (instantiated) by an individual car. Now you can have as many car types as you need, and type changes don't affect the cars. The two tables after the figure contain sample data values of the Car Type—Car data model. Note that a car can belong to one car type only. On the other hand, a car type can be the type of many cars.

Car Type Key  Brand   Model    Year  MSRP
1             Honda   Accord   2007  20,000
2             Toyota  Corolla  2008  21,000
...

VIN     Car Type Key  Owner
123987  1             Djoni Darmawikarta
456321  1             Kevin Peter
555666  1             Rao Ganipineni
678345  2             Sherman Chang
...

How do we deal with a product that doesn't have an individual identifier? Can we apply the same data modeling structure to, for example, commercial books? You certainly have inventory; each Inventory is an instance of the Book Type. The following figure shows the Book Type—Book data model and its sample data values, respectively. You can also apply the same data model to intangible things, such as Service; an individual service may be identified by, for example, a contract number. The following figure and the last table in the article show the Service Type—Service data model and its sample data values, respectively.

Subtype

What if you have cars that have different sets of attributes, meaning different types? You can model the different types as subtypes. The following figure shows two subtypes of the Car Type entity: Passenger Car Type and Truck Type. The Car supertype has the common attributes of its subtypes, while each of the subtypes has its own distinct attributes.

Category

While a Type is a definition of something, a Category is a way to categorize something. While a Service can be of only one type, it can be of more than one category—its relationship to the Category entity is many-to-many. An example of categories for Service is shown in the following figure, with its sample data values in the table after it.
Note that you need to resolve the many-to-many relationship at implementation time; an example resolution follows the table below.

Service Category Key  Service Category
1                     Bundled
2                     Outsourced
3                     Onsite
4                     Software
5                     Hardware
...
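As a hedged illustration (the key values here are invented for this example, not taken from the article), the resolution is typically an intersection (associative) entity holding one row per Service/Service Category pairing:

Service Key  Service Category Key
1001         1
1001         4
1002         2
...          ...

Each row links one service to one category, so service 1001 can be both Bundled and Software without repeating any service or category attributes.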
Summary

Type, Subtype, and Category are similar patterns for data modeling. This article introduced these three patterns and showed their differences. One or more of them exist in most data models. If your initial data model doesn't have any one of them, you should re-inspect the data model.

Polygon Construction - Make a 3D printed kite

Michael Ang
30 Oct 2014
9 min read
3D printers are incredible machines, but let's face it, it takes them a long time to work their magic! Printing small objects takes a relatively short amount of time, but larger objects can take hours and hours to print. Is there a way we can get the speed of printing small objects, while still making something big—even bigger than our printer can make in one piece? This tutorial shows a technique I'm calling "polygon construction", where you 3D print connectors and attach them with rods to make a larger structure. This technique is the basis for my Polygon Construction Kit (Polycon).

A Polycon object, with 3D printer for scale

I'm going to start simple, showing how even one connector can form the basis of a rather delightful object—a simple flying kite! The kite we're making today is a version of the Eddy diamond kite, originally invented in the 1890s by William Eddy. This classic diamond kite is easy to make, and flies well. We'll design and print the central connector in the kite, and use wooden rods to extend the shape. The total size of the kite is 50 centimeters (about 20 inches) tall and wide, which is bigger than most print beds. We'll design the connector so that it's parametric—we'll be able to change the important sizes of the connector just by changing a few numbers. Because the connector is a small object to print, and the design is parametric, we'll be able to iterate quickly if we want to make changes.

A finished kite

The connector we need is a cross that holds two of the "arms" up at an angle. This angle is called the dihedral angle, and is one of the secrets to why the Eddy kite flies well. We'll use the OpenSCAD modeling program to create the connector. OpenSCAD allows you to create solid objects for printing by combining simple shapes such as cylinders and boxes using a basic programming language. It's open source and multi-platform. You can download OpenSCAD from http://openscad.org.

Open OpenSCAD. You should have a blank document. Let's set up a few variables to represent the important dimensions in our connector, and make the first part of our connector, which is a cylinder that will be one of the four "arms" of the cross. Go to Design->Compile to see the result.

rod_diameter = 4; // in millimeters
wall_thickness = 2;
tube_length = 20;
angle = 15; // degrees

cylinder(r = rod_diameter + wall_thickness * 2, h = tube_length);

First part of the connector

Now let's add the same shape, but translated down. You can see the axis indicator in the lower-left corner of the output window. The blue axis pointing up is the Z-axis, so you want to move down (negative) in the Z-axis. Add this line to your file and recompile (Design->Compile).

translate([0,0,-tube_length]) cylinder(r = rod_diameter + wall_thickness * 2, h = tube_length);

Second part of the main tube

Now that we have the long straight part of our connector, let's add the angled arms. We want the angled part of the connector to be 90 degrees from the straight part, and then rotated by our dihedral angle (in this case 15 degrees). In OpenSCAD, rotations can be specified as a rotation around the X, Y, and then Z axes. If you look at the axis indicator, you can see that a rotation around the Y-axis of 90 degrees, followed by a rotation around the Z-axis of 15 degrees, will get us to the right place. Here's the code:

rotate([0,90,angle]) cylinder(r = rod_diameter + wall_thickness * 2, h = tube_length);

First angled part

Let's do the same thing, but for the other side.
Instead of rotating by 15 degrees, we'll rotate by 180 degrees and then subtract out the 15 degrees to put the new cylinder on the opposite side.

rotate([0,90,180-angle]) cylinder(r = rod_diameter + wall_thickness * 2, h = tube_length);

Opposite angled part

Awesome, we have the shape of our connector! There's only one problem: how do we make the holes for the rods to go in? To do this we'll make the same shape, but a little smaller, and then subtract it out of the shape we already made. OpenSCAD supports Boolean operations on shapes, and in this case the Boolean operation we want is difference. To make the Boolean operation easier, we'll group the different parts of the shape together by putting them into modules. Once we have the parts we want together in modules, we can take the difference of the two modules. Here's the complete new version:

rod_diameter = 4; // in millimeters
wall_thickness = 2;
tube_length = 20;
angle = 15; // degrees

// Connector as a solid object (no holes)
module solid() {
    cylinder(r = rod_diameter + wall_thickness * 2, h = tube_length);
    translate([0,0,-tube_length])
        cylinder(r = rod_diameter + wall_thickness * 2, h = tube_length);
    rotate([0,90,angle])
        cylinder(r = rod_diameter + wall_thickness * 2, h = tube_length);
    rotate([0,90,180-angle])
        cylinder(r = rod_diameter + wall_thickness * 2, h = tube_length);
}

// Object representing the space for the rods.
module hole_cutout() {
    cut_overlap = 0.2; // Extra length to make a clean cut out of the main shape
    cylinder(r = rod_diameter, h = tube_length + cut_overlap);
    translate([0,0,-tube_length-cut_overlap])
        cylinder(r = rod_diameter, h = tube_length + cut_overlap);
    rotate([0,90,angle])
        cylinder(r = rod_diameter, h = tube_length + cut_overlap);
    rotate([0,90,180-angle])
        cylinder(r = rod_diameter, h = tube_length + cut_overlap);
}

difference() {
    solid();
    hole_cutout();
}

Completed connector

We've finished modeling our kite connector! But what if our rod isn't 4mm in diameter? What if it's 1/8"? Since we've written a program to describe our kite connector, making the change is easy. We can change the parameters at the beginning of the file to change the shape of the connector. There are 25.4 millimeters in an inch, and OpenSCAD can do the math for us to convert from inches to millimeters. Let's change the rod diameter to 1/8" and also change the dihedral angle, so there's more of a visible change. Change the parameters at the top of the file and recompile (Design->Compile).

rod_diameter = 1/8 * 25.4; // inches to millimeters
wall_thickness = 2;
tube_length = 20;
angle = 20; // degrees

Different angle and rod diameter

Now you start to see the power of using a parametric model—making a new connector can be as simple as changing a few numbers and recompiling the design. (The repeated cylinder calls could also be factored into a single module; a sketch of that idea follows below.)
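As a hedged editorial aside (this refactor is ours, not part of the original tutorial, and the arm() module name is hypothetical), the four arms of both the solid and the cutout can be generated from one parameterized module, so the radius padding and rotations live in a single place:

// One arm of the cross; is_hole=true shrinks it to the rod cavity size.
module arm(rot, is_hole = false) {
    cut_overlap = 0.2;
    r = is_hole ? rod_diameter : rod_diameter + wall_thickness * 2;
    h = is_hole ? tube_length + cut_overlap : tube_length;
    rotate(rot) cylinder(r = r, h = h);
}

difference() {
    union() {
        arm([0, 0, 0]);
        arm([180, 0, 0]);          // points the cylinder downward
        arm([0, 90, angle]);
        arm([0, 90, 180 - angle]);
    }
    union() {
        arm([0, 0, 0], true);
        arm([180, 0, 0], true);
        arm([0, 90, angle], true);
        arm([0, 90, 180 - angle], true);
    }
}

Rotating 180 degrees about the X-axis produces the same downward cylinder as the translate([0,0,-tube_length]) used earlier, since a cylinder is symmetric about its own axis.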
To get the model ready for printing, change the rod diameter to the size of the rod you actually have, and change the angle back to 15 degrees. Now go to Design->Compile and Render so that the connector is fully rendered inside OpenSCAD. Go to File->Export->Export as STL and save the file. Open the .stl file in the software for your 3D printer. I have a Prusa i3 Berlin RepRap printer and use Cura as my printer software, but almost any printer/software combination should work. You may want to rotate the part so that it doesn't need as much support under the overhanging arms, but be aware that the orientation of the layers will affect the final strength (if the tube breaks, it's almost always from the layers splitting apart). It's worth experimenting with changing the orientation of the part to increase its strength. Orienting the layers slightly at an angle to the length of the tube seems to give the best strength.

Cura default part orientation

Part reoriented for printing, showing layers

Print your connector, and see if it fits on your rods. You may need to adjust the size a little to get a tight fit. Since the model is parametric, adjusting the size of the connector should just take a few minutes! To get the most strength in your print you can make multiple prints of the connector with different settings (temperature, wall thickness, and so on) and see how much force it takes to break them. This is a good technique in general for getting strong prints.

A printed connector

Kite dimensions

Now that we have the connector printed, we need to finish off the rest of the kite. You can see full instructions on making an Eddy kite, but here's the short version. I built this kite by taking a 1m long, 4mm diameter wooden rod from a kite store and cutting it into one 50cm piece and two 25cm pieces. The center connector goes 10cm from the top of the long rod. For the "sail", paper does fine (cut to fit the frame, making sure that the sail is symmetrical), and you can just tape the paper to the rods. Tie a piece of string about 80cm long between the center connector and a point 4cm from the tail to make a bridle. To find the right place to tie on the long flying line, take the kite out on a breezy day and hold it by the bridle, moving your hand up and down until you find a spot where the kite doesn't try to fly up too much, or fall back down. That's the spot to tie on your long flying line. If the kite is unstable while flying, you can add a long tail to the kite, but I haven't found it to be necessary (though it adds to the classic look).

Assembled kite

Back side of kite, showing printed connector

Being able to print your own kite parts makes it easy to experiment. If you want to try a different dihedral angle, just print a new center connector. It's quite a sight to see your kite flying high up in the sky, held together by a part you printed yourself. You can download a slightly different version of this code that includes additional bracing between the angled arms at http://www.thingiverse.com/thing:415345. For an idea of what's possible using the "polygon construction" technique, have a look at my Polygon Construction Kit for some examples of larger structures with multiple connectors. Happy flying!

Sky high

About the Author

Michael Ang is a Berlin-based artist and engineer working at the intersection of art, engineering, and the natural world. His latest project is the Polygon Construction Kit, a toolkit for bridging the virtual and physical worlds by translating simple 3D models into physical structures.

How greedy algorithms work

Richard Gall
10 Apr 2018
2 min read
What is a greedy algorithm?

Greedy algorithms are useful for optimization problems. They make the optimal choice at a localized and immediate level, with the aim of arriving at the overall optimal solution. It's important to note that they don't always find you the best solution for the data science problem you're trying to solve, so apply them wisely. In the video tutorial below, from Fundamental Algorithms in Scala, you'll learn when and how to apply a simple greedy algorithm, and see examples of both an iterative algorithm and a recursive algorithm in action.

The advantages and disadvantages of greedy algorithms

Greedy algorithms have a number of advantages and disadvantages. While on the one hand it's relatively easy to come up with them, it is actually pretty challenging to identify the issues around the 'correctness' of your algorithm. That means that ultimately the optimization problem you're trying to solve by using greedy algorithms isn't really a technical issue as such. Instead, it's more of an issue with the scope and definition of your data analysis project. It's a human problem, not a mechanical one.

Different ways to apply greedy algorithms

There are a number of areas where greedy algorithms can most successfully be applied. In fact, it's worth exploring some of these problems if you want to get to know them in more detail. They should give you a clearer indication of how they work, what makes them useful, and potential drawbacks. Some of the best examples are:

Huffman coding
Dijkstra's algorithm
Continuous knapsack problem

A short Scala sketch of the greedy approach follows.
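As a hedged illustration (this example is ours, not taken from the course), here is a minimal iterative greedy algorithm in Scala for making change with a canonical coin system. The correctness caveat from above applies: for non-canonical coin sets, this can return a suboptimal answer.

// Greedy change-making: always take the largest coin that still fits.
def makeChange(amount: Int, coins: List[Int]): List[Int] = {
  val sorted = coins.sorted(Ordering[Int].reverse)
  var remaining = amount
  val result = scala.collection.mutable.ListBuffer.empty[Int]
  for (coin <- sorted) {
    while (remaining >= coin) { // locally optimal choice at each step
      result += coin
      remaining -= coin
    }
  }
  result.toList
}

makeChange(63, List(1, 5, 10, 25)) // List(25, 25, 10, 1, 1, 1)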
To learn more about other algorithms, check out these articles:

4 popular algorithms for Distance-based outlier detection
10 machine learning algorithms every engineer needs to know
4 Clustering Algorithms every Data Scientist should know
Backpropagation Algorithm

To learn to implement specific algorithms, use these tutorials:

Creating a reference generator for a job portal using Breadth First Search (BFS) algorithm
Implementing gradient descent algorithm to solve optimization problems
Implementing face detection using the Haar Cascades and AdaBoost algorithm
Getting started with big data analysis using Google's PageRank algorithm
Implementing the k-nearest neighbors algorithm in Python
Machine Learning Algorithms: Implementing Naive Bayes with Spark MLlib

Understanding and Developing Node Modules

Packt
11 Aug 2011
5 min read
This article is based on the book Node Web Development, a practical introduction to Node, the exciting new server-side JavaScript web development stack.

What's a module?

Modules are the basic building block for constructing Node applications. We have already seen modules in action; every JavaScript file we use in Node is itself a module. It's time to see what they are and how they work. The following code, which pulls in the fs module, gives us access to its functions:

var fs = require('fs');

The require function searches for modules, and loads the module definition into the Node runtime, making its functions available. The fs object (in this case) contains the code (and data) exported by the fs module. Let's look at a brief example of this before we start diving into the details. Ponder over this module, simple.js:

var count = 0;
exports.next = function() { return count++; }

This defines an exported function and a local variable. Now let's use it:
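The REPL listing that demonstrated this was lost in the page conversion; a minimal reconstruction along these lines (the exact session is ours, not the book's) behaves as described in the next paragraph:

var s = require('./simple');
s.next(); // 0
s.next(); // 1
s.next(); // 2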
The object returned from require('./simple') is the same object, exports, we assigned a function to inside simple.js. Each call to s.next calls the function next in simple.js, which returns (and increments) the value of the count variable, explaining why s.next returns progressively bigger numbers. The rule is that anything (functions, objects) assigned as a field of exports is exported from the module, while objects inside the module that are not assigned to exports are not visible to any code outside the module. This is an example of encapsulation. Now that we've got a taste of modules, let's take a deeper look.

Node modules

Node's module implementation is strongly inspired by, but not identical to, the CommonJS module specification. The differences between them might only be important if you need to share code between Node and other CommonJS systems. A quick scan of the Modules/1.1.1 spec indicates that the differences are minor, and for our purposes it's enough to just get on with the task of learning to use Node without dwelling too long on the differences.

How does Node resolve require('module')?

In Node, modules are stored in files, one module per file. There are several ways to specify module names, and several ways to organize the deployment of modules in the file system. It's quite flexible, especially when used with npm, the de facto standard package manager for Node.

Module identifiers and path names

Generally speaking, the module name is a path name, but with the file extension removed. That is, when we write require('./simple'), Node knows to add .js to the file name and load in simple.js. Modules whose file names end in .js are of course expected to be written in JavaScript. Node also supports binary code native libraries as Node modules; in this case the file name extension to use is .node. It's outside our scope to discuss the implementation of a native code Node module, but this gives you enough knowledge to recognize them when you come across them. Some Node modules are not files in the file system, but are baked into the Node executable. These are the Core modules, the ones documented on nodejs.org. Their original existence is as files in the Node source tree, but the build process compiles them into the binary Node executable.

There are three types of module identifiers: relative, absolute, and top-level. Relative module identifiers begin with "./" or "../", and absolute identifiers begin with "/". These are identical to POSIX file system semantics, with path names being relative to the file being executed. Absolute module identifiers are obviously relative to the root of the file system. Top-level module identifiers do not begin with ".", "..", or "/"; instead, they are simply the module name. These modules are stored in one of several directories, such as a node_modules directory, or the directories listed in the require.paths array, designated by Node to hold these modules.

Local modules within your application

The universe of all possible modules splits neatly into two kinds: those modules that are part of a specific application, and those that aren't. Hopefully, the modules that aren't part of a specific application were written to serve a generalized purpose. Let's begin with the implementation of modules used within your application. Typically, your application will have a directory structure of module files sitting next to each other in the source control system, which is then deployed to servers. These modules will know the relative path to their sibling modules within the application, and should use that knowledge to refer to each other using relative module identifiers. For example, to help us understand this, let's look at the structure of an existing Node package, the Express web application framework. It includes several modules structured in a hierarchy that the Express developers found to be useful. You can imagine creating a similar hierarchy for applications reaching a certain level of complexity, subdividing the application into chunks larger than a module but smaller than an application. Unfortunately, there isn't a word to describe this in Node, so we're left with a clumsy phrase like "subdivide into chunks larger than a module". Each subdivided chunk would be implemented as a directory with a few modules in it. In this example, the most likely relative module reference is to utils.js. Depending on the source file which wants to use utils.js, it would use one of the following require statements:

var utils = require('./lib/utils');
var utils = require('./utils');
var utils = require('../utils');
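To make the three variants concrete, here is a hypothetical layout (invented for illustration; it is not the actual Express source tree) in which each of those require statements would be the right one:

app.js              // uses require('./lib/utils')
lib/
    utils.js
    router.js       // uses require('./utils')
    middleware/
        logger.js   // uses require('../utils')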

Contexts and Dependency Injection in NetBeans

Packt
06 Feb 2015
18 min read
In this article by David R. Heffelfinger, the author of Java EE 7 Development with NetBeans 8, we will introduce Contexts and Dependency Injection (CDI) and other aspects of it. CDI can be used to simplify integrating the different layers of a Java EE application. For example, CDI allows us to use a session bean as a managed bean, so that we can take advantage of EJB features, such as transactions, directly in our managed beans. In this article, we will cover the following topics:

Introduction to CDI
Qualifiers
Stereotypes
Interceptor binding types
Custom scopes

Introduction to CDI

JavaServer Faces (JSF) web applications employing CDI are very similar to JSF applications without CDI; the main difference is that instead of using JSF managed beans for our model and controllers, we use CDI named beans. What makes CDI applications easier to develop and maintain are the excellent dependency injection capabilities of the CDI API. Just as with other JSF applications, CDI applications use facelets as their view technology. The following example illustrates typical markup for a JSF page using CDI:

<?xml version='1.0' encoding='UTF-8' ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:h="http://xmlns.jcp.org/jsf/html">
    <h:head>
        <title>Create New Customer</title>
    </h:head>
    <h:body>
        <h:form>
            <h3>Create New Customer</h3>
            <h:panelGrid columns="3">
                <h:outputLabel for="firstName" value="First Name"/>
                <h:inputText id="firstName" value="#{customer.firstName}"/>
                <h:message for="firstName"/>
                <h:outputLabel for="middleName" value="Middle Name"/>
                <h:inputText id="middleName" value="#{customer.middleName}"/>
                <h:message for="middleName"/>
                <h:outputLabel for="lastName" value="Last Name"/>
                <h:inputText id="lastName" value="#{customer.lastName}"/>
                <h:message for="lastName"/>
                <h:outputLabel for="email" value="Email Address"/>
                <h:inputText id="email" value="#{customer.email}"/>
                <h:message for="email"/>
                <h:panelGroup/>
                <h:commandButton value="Submit"
                    action="#{customerController.navigateToConfirmation}"/>
            </h:panelGrid>
        </h:form>
    </h:body>
</html>

As we can see, the preceding markup doesn't look any different from the markup used for a JSF application that does not use CDI. The page renders as follows (shown after entering some data). In our page markup, we have JSF components that use Unified Expression Language expressions to bind themselves to CDI named bean properties and methods.
Let's take a look at the customer bean first:

package com.ensode.cdiintro.model;

import java.io.Serializable;
import javax.enterprise.context.RequestScoped;
import javax.inject.Named;

@Named
@RequestScoped
public class Customer implements Serializable {

    private String firstName;
    private String middleName;
    private String lastName;
    private String email;

    public Customer() {
    }

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getMiddleName() {
        return middleName;
    }

    public void setMiddleName(String middleName) {
        this.middleName = middleName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    public String getEmail() {
        return email;
    }

    public void setEmail(String email) {
        this.email = email;
    }
}

The @Named annotation marks this class as a CDI named bean. By default, the bean's name will be the class name with its first character switched to lowercase (in our example, the name of the bean is "customer", since the class name is Customer). We can override this behavior if we wish by passing the desired name to the value attribute of the @Named annotation, as follows:

@Named(value="customerBean")

A CDI named bean's methods and properties are accessible via facelets, just like regular JSF managed beans. Just like JSF managed beans, CDI named beans can have one of several scopes, as listed in the following table. The preceding named bean has a scope of request, as denoted by the @RequestScoped annotation.

Request (@RequestScoped): Request scoped beans are shared through the duration of a single request. A single request could refer to an HTTP request, an invocation of a method in an EJB, a web service invocation, or sending a JMS message to a message-driven bean.
Session (@SessionScoped): Session scoped beans are shared across all requests in an HTTP session. Each user of an application gets their own instance of a session scoped bean.
Application (@ApplicationScoped): Application scoped beans live through the whole application lifetime. Beans in this scope are shared across user sessions.
Conversation (@ConversationScoped): The conversation scope can span multiple requests, and is typically shorter than the session scope.
Dependent (@Dependent): Dependent scoped beans are not shared. Any time a dependent scoped bean is injected, a new instance is created.

As we can see, CDI has equivalent scopes to all JSF scopes. Additionally, CDI adds two scopes. The first CDI-specific scope is the conversation scope, which allows us to have a scope that spans multiple requests but is shorter than the session scope. The second CDI-specific scope is the dependent scope, which is a pseudo scope. A CDI bean in the dependent scope is a dependent object of another object; beans in this scope are instantiated when the object they belong to is instantiated, and they are destroyed when the object they belong to is destroyed. (A brief sketch of starting and ending a conversation follows below.)
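Since the conversation scope is the one JSF developers are least likely to have seen, here is a hedged sketch of how a conversation is typically started and ended programmatically. This example is ours, not from the book; the CustomerWizard name is hypothetical, but Conversation.begin(), end(), and isTransient() are part of the standard CDI API:

package com.ensode.cdiintro.model;

import java.io.Serializable;
import javax.enterprise.context.Conversation;
import javax.enterprise.context.ConversationScoped;
import javax.inject.Inject;
import javax.inject.Named;

@Named
@ConversationScoped
public class CustomerWizard implements Serializable {

    @Inject
    private Conversation conversation;

    // Promote the conversation from transient to long-running;
    // the bean instance now survives across multiple requests.
    public void begin() {
        if (conversation.isTransient()) {
            conversation.begin();
        }
    }

    // End the long-running conversation; the bean is destroyed afterwards.
    public void end() {
        if (!conversation.isTransient()) {
            conversation.end();
        }
    }
}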
Our application has two CDI named beans. We already discussed the customer bean; the other CDI named bean in our application is the controller bean:

package com.ensode.cdiintro.controller;

import com.ensode.cdiintro.model.Customer;
import javax.enterprise.context.RequestScoped;
import javax.inject.Inject;
import javax.inject.Named;

@Named
@RequestScoped
public class CustomerController {

    @Inject
    private Customer customer;

    public Customer getCustomer() {
        return customer;
    }

    public void setCustomer(Customer customer) {
        this.customer = customer;
    }

    public String navigateToConfirmation() {
        //In a real application we would
        //save customer data to the database here.

        return "confirmation";
    }
}

In the preceding class, an instance of the Customer class is injected at runtime; this is accomplished via the @Inject annotation. This annotation allows us to easily use dependency injection in CDI applications. Since the Customer class is annotated with the @RequestScoped annotation, a new instance of Customer will be injected for every request.

The navigateToConfirmation() method in the preceding class is invoked when the user clicks on the Submit button on the page. The navigateToConfirmation() method works just like an equivalent method in a JSF managed bean would; that is, it returns a string and the application navigates to an appropriate page based on the value of that string. As with JSF, by default the target page's name, with an .xhtml extension, is the return value of this method. For example, if no exceptions are thrown in the navigateToConfirmation() method, the user is directed to a page named confirmation.xhtml:

<?xml version='1.0' encoding='UTF-8' ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:h="http://xmlns.jcp.org/jsf/html">
    <h:head>
        <title>Success</title>
    </h:head>
    <h:body>
        New Customer created successfully.
        <h:panelGrid columns="2" border="1" cellspacing="0">
            <h:outputLabel for="firstName" value="First Name"/>
            <h:outputText id="firstName" value="#{customer.firstName}"/>
            <h:outputLabel for="middleName" value="Middle Name"/>
            <h:outputText id="middleName" value="#{customer.middleName}"/>
            <h:outputLabel for="lastName" value="Last Name"/>
            <h:outputText id="lastName" value="#{customer.lastName}"/>
            <h:outputLabel for="email" value="Email Address"/>
            <h:outputText id="email" value="#{customer.email}"/>
        </h:panelGrid>
    </h:body>
</html>

Again, there is nothing special we need to do to access the named bean's properties from the preceding markup. It works just as if the bean were a JSF managed bean. As we can see, CDI applications work just like JSF applications. However, CDI applications have several advantages over JSF; for example (as we mentioned previously), CDI beans have additional scopes not found in JSF. Additionally, using CDI allows us to decouple our Java code from the JSF API. Also, as we mentioned previously, CDI allows us to use session beans as named beans.

Qualifiers

In some instances, the type of bean we wish to inject into our code may be an interface or a Java superclass, but we may be interested in injecting a subclass or a class implementing the interface. For cases like this, CDI provides qualifiers we can use to indicate the specific type we wish to inject into our code.
A CDI qualifier is an annotation that must be decorated with the @Qualifier annotation. This annotation can then be used to decorate the specific subclass or interface. In this section, we will develop a Premium qualifier for our customer bean; premium customers could get perks that are not available to regular customers, for example, discounts.

Creating a CDI qualifier with NetBeans is very easy; all we need to do is go to File | New File, select the Contexts and Dependency Injection category, and select the Qualifier Type file type. In the next step in the wizard, we need to enter a name and a package for our qualifier. After these two simple steps, NetBeans generates the code for our qualifier:

package com.ensode.cdiintro.qualifier;

import static java.lang.annotation.ElementType.TYPE;
import static java.lang.annotation.ElementType.FIELD;
import static java.lang.annotation.ElementType.PARAMETER;
import static java.lang.annotation.ElementType.METHOD;
import static java.lang.annotation.RetentionPolicy.RUNTIME;
import java.lang.annotation.Retention;
import java.lang.annotation.Target;
import javax.inject.Qualifier;

@Qualifier
@Retention(RUNTIME)
@Target({METHOD, FIELD, PARAMETER, TYPE})
public @interface Premium {
}

Qualifiers are standard Java annotations. Typically, they have a retention of runtime and can target methods, fields, parameters, or types. The only difference between a qualifier and a standard annotation is that qualifiers are decorated with the @Qualifier annotation.

Once we have our qualifier in place, we need to use it to decorate the specific subclass or interface implementation, as shown in the following code:

package com.ensode.cdiintro.model;

import com.ensode.cdiintro.qualifier.Premium;
import javax.enterprise.context.RequestScoped;
import javax.inject.Named;

@Named
@RequestScoped
@Premium
public class PremiumCustomer extends Customer {

    private Integer discountCode;

    public Integer getDiscountCode() {
        return discountCode;
    }

    public void setDiscountCode(Integer discountCode) {
        this.discountCode = discountCode;
    }
}

Once we have decorated the specific instance we need to qualify, we can use our qualifier in the client code to specify the exact type of dependency we need:

package com.ensode.cdiintro.controller;

import com.ensode.cdiintro.model.Customer;
import com.ensode.cdiintro.model.PremiumCustomer;
import com.ensode.cdiintro.qualifier.Premium;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.enterprise.context.RequestScoped;
import javax.inject.Inject;
import javax.inject.Named;

@Named
@RequestScoped
public class PremiumCustomerController {

    private static final Logger logger = Logger.getLogger(
            PremiumCustomerController.class.getName());

    @Inject
    @Premium
    private Customer customer;

    public String saveCustomer() {

        PremiumCustomer premiumCustomer = (PremiumCustomer) customer;

        logger.log(Level.INFO, "Saving the following information \n"
                + "{0} {1}, discount code = {2}",
                new Object[]{premiumCustomer.getFirstName(),
                    premiumCustomer.getLastName(),
                    premiumCustomer.getDiscountCode()});

        //If this was a real application, we would have code to save
        //customer data to the database here.
return "premium_customer_confirmation";    } } Since we used our @Premium qualifier to decorate the customer field, an instance of the PremiumCustomer class is injected into that field. This is because this class is also decorated with the @Premium qualifier. As far as our JSF pages go, we simply access our named bean as usual using its name, as shown in the following code; <?xml version='1.0' encoding='UTF-8' ?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html      >    <h:head>        <title>Create New Premium Customer</title>    </h:head>    <h:body>        <h:form>            <h3>Create New Premium Customer</h3>            <h:panelGrid columns="3">                <h:outputLabel for="firstName" value="First Name"/>                 <h:inputText id="firstName"                    value="#{premiumCustomer.firstName}"/>                <h:message for="firstName"/>                  <h:outputLabel for="middleName" value="Middle Name"/>                <h:inputText id="middleName"                     value="#{premiumCustomer.middleName}"/>                <h:message for="middleName"/>                  <h:outputLabel for="lastName" value="Last Name"/>                <h:inputText id="lastName"                    value="#{premiumCustomer.lastName}"/>                <h:message for="lastName"/>                  <h:outputLabel for="email" value="Email Address"/>                <h:inputText id="email"                    value="#{premiumCustomer.email}"/>                <h:message for="email"/>                  <h:outputLabel for="discountCode" value="Discount Code"/>                <h:inputText id="discountCode"                    value="#{premiumCustomer.discountCode}"/>                <h:message for="discountCode"/>                   <h:panelGroup/>                <h:commandButton value="Submit"                      action="#{premiumCustomerController.saveCustomer}"/>            </h:panelGrid>        </h:form>    </h:body> </html> In this example, we are using the default name for our bean, which is the class name with the first letter switched to lowercase. Now, we are ready to test our application: After submitting the page, we can see the confirmation page. Stereotypes A CDI stereotype allows us to create new annotations that bundle up several CDI annotations. For example, if we need to create several CDI named beans with a scope of session, we would have to use two annotations in each of these beans, namely @Named and @SessionScoped. Instead of having to add two annotations to each of our beans, we could create a stereotype and annotate our beans with it. To create a CDI stereotype in NetBeans, we simply need to create a new file by selecting the Contexts and Dependency Injection category and the Stereotype file type. Then, we need to enter a name and package for our new stereotype. 
At this point, NetBeans generates the following code:

package com.ensode.cdiintro.stereotype;

import static java.lang.annotation.ElementType.TYPE;
import static java.lang.annotation.ElementType.FIELD;
import static java.lang.annotation.ElementType.METHOD;
import static java.lang.annotation.RetentionPolicy.RUNTIME;
import java.lang.annotation.Retention;
import java.lang.annotation.Target;
import javax.enterprise.inject.Stereotype;

@Stereotype
@Retention(RUNTIME)
@Target({METHOD, FIELD, TYPE})
public @interface NamedSessionScoped {
}

Now, we simply need to add the CDI annotations that we want the classes annotated with our stereotype to use. In our case, we want them to be named beans and have a scope of session; therefore, we add the @Named and @SessionScoped annotations as shown in the following code:

package com.ensode.cdiintro.stereotype;

import static java.lang.annotation.ElementType.TYPE;
import static java.lang.annotation.ElementType.FIELD;
import static java.lang.annotation.ElementType.METHOD;
import static java.lang.annotation.RetentionPolicy.RUNTIME;
import java.lang.annotation.Retention;
import java.lang.annotation.Target;
import javax.enterprise.context.SessionScoped;
import javax.enterprise.inject.Stereotype;
import javax.inject.Named;

@Named
@SessionScoped
@Stereotype
@Retention(RUNTIME)
@Target({METHOD, FIELD, TYPE})
public @interface NamedSessionScoped {
}

Now we can use our stereotype in our own code:

package com.ensode.cdiintro.beans;

import com.ensode.cdiintro.stereotype.NamedSessionScoped;
import java.io.Serializable;

@NamedSessionScoped
public class StereotypeClient implements Serializable {

    private String property1;
    private String property2;

    public String getProperty1() {
        return property1;
    }

    public void setProperty1(String property1) {
        this.property1 = property1;
    }

    public String getProperty2() {
        return property2;
    }

    public void setProperty2(String property2) {
        this.property2 = property2;
    }
}

We annotated the StereotypeClient class with our NamedSessionScoped stereotype, which is equivalent to using the @Named and @SessionScoped annotations.

Interceptor binding types

One of the advantages of EJBs is that they allow us to easily perform aspect-oriented programming (AOP) via interceptors. CDI allows us to write interceptor binding types; these let us bind interceptors to beans without the beans having to depend on the interceptor directly. Interceptor binding types are annotations that are themselves annotated with @InterceptorBinding. Creating an interceptor binding type in NetBeans involves creating a new file, selecting the Contexts and Dependency Injection category, and selecting the Interceptor Binding Type file type. Then, we need to enter a class name and select or enter a package for our new interceptor binding type. At this point, NetBeans generates the code for our interceptor binding type:

package com.ensode.cdiintro.interceptorbinding;

import static java.lang.annotation.ElementType.TYPE;
import static java.lang.annotation.ElementType.METHOD;
import static java.lang.annotation.RetentionPolicy.RUNTIME;
import java.lang.annotation.Inherited;
import java.lang.annotation.Retention;
import java.lang.annotation.Target;
import javax.interceptor.InterceptorBinding;

@Inherited
@InterceptorBinding
@Retention(RUNTIME)
@Target({METHOD, TYPE})
public @interface LoggingInterceptorBinding {
}

The generated code is fully functional; we don't need to add anything to it.
In order to use our interceptor binding type, we need to write an interceptor and annotate it with our interceptor binding type, as shown in the following code:

package com.ensode.cdiintro.interceptor;

import com.ensode.cdiintro.interceptorbinding.LoggingInterceptorBinding;
import java.io.Serializable;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.interceptor.AroundInvoke;
import javax.interceptor.Interceptor;
import javax.interceptor.InvocationContext;

@LoggingInterceptorBinding
@Interceptor
public class LoggingInterceptor implements Serializable {

    private static final Logger logger = Logger.getLogger(
            LoggingInterceptor.class.getName());

    @AroundInvoke
    public Object logMethodCall(InvocationContext invocationContext)
            throws Exception {

        logger.log(Level.INFO, new StringBuilder("entering ").append(
                invocationContext.getMethod().getName()).append(
                " method").toString());

        Object retVal = invocationContext.proceed();

        logger.log(Level.INFO, new StringBuilder("leaving ").append(
                invocationContext.getMethod().getName()).append(
                " method").toString());

        return retVal;
    }
}

As we can see, other than being annotated with our interceptor binding type, the preceding class is a standard interceptor similar to the ones we use with EJB session beans. In order for our interceptor binding type to work properly, we need to add a CDI configuration file (beans.xml) to our project. Then, we need to register our interceptor in beans.xml as follows:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://xmlns.jcp.org/xml/ns/javaee"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
                           http://xmlns.jcp.org/xml/ns/javaee/beans_1_1.xsd"
       bean-discovery-mode="all">
    <interceptors>
        <class>
            com.ensode.cdiintro.interceptor.LoggingInterceptor
        </class>
    </interceptors>
</beans>

To register our interceptor, we need to set bean-discovery-mode to all in the generated beans.xml and add the <interceptors> tag, with one or more nested <class> tags containing the fully qualified names of our interceptors. The final step before we can use our interceptor binding type is to annotate the class to be intercepted with our interceptor binding type:

package com.ensode.cdiintro.controller;

import com.ensode.cdiintro.interceptorbinding.LoggingInterceptorBinding;
import com.ensode.cdiintro.model.Customer;
import com.ensode.cdiintro.model.PremiumCustomer;
import com.ensode.cdiintro.qualifier.Premium;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.enterprise.context.RequestScoped;
import javax.inject.Inject;
import javax.inject.Named;

@LoggingInterceptorBinding
@Named
@RequestScoped
public class PremiumCustomerController {

    private static final Logger logger = Logger.getLogger(
            PremiumCustomerController.class.getName());

    @Inject
    @Premium
    private Customer customer;

    public String saveCustomer() {

        PremiumCustomer premiumCustomer = (PremiumCustomer) customer;

        logger.log(Level.INFO, "Saving the following information \n"
                + "{0} {1}, discount code = {2}",
                new Object[]{premiumCustomer.getFirstName(),
                    premiumCustomer.getLastName(),
                    premiumCustomer.getDiscountCode()});

        //If this was a real application, we would have code to save
        //customer data to the database here.

        return "premium_customer_confirmation";
    }
}

Now, we are ready to use our interceptor.
After executing the preceding code and examining the GlassFish log, we can see our interceptor binding type in action. The lines "entering saveCustomer method" and "leaving saveCustomer method" were added to the log by our interceptor, which was indirectly invoked via our interceptor binding type.

Custom scopes

In addition to providing several prebuilt scopes, CDI allows us to define our own custom scopes. This functionality is primarily meant for developers building frameworks on top of CDI, not for application developers. Nevertheless, NetBeans provides a wizard for us to create our own CDI custom scopes. To create a new CDI custom scope, we need to go to File | New File, select the Contexts and Dependency Injection category, and select the Scope Type file type. Then, we need to enter a package and a name for our custom scope. After clicking on Finish, our new custom scope is created, as shown in the following code:

package com.ensode.cdiintro.scopes;

import static java.lang.annotation.ElementType.TYPE;
import static java.lang.annotation.ElementType.FIELD;
import static java.lang.annotation.ElementType.METHOD;
import static java.lang.annotation.RetentionPolicy.RUNTIME;
import java.lang.annotation.Inherited;
import java.lang.annotation.Retention;
import java.lang.annotation.Target;
import javax.inject.Scope;

@Inherited
@Scope // or @javax.enterprise.context.NormalScope
@Retention(RUNTIME)
@Target({METHOD, FIELD, TYPE})
public @interface CustomScope {
}

To actually use our scope in our CDI applications, we would need to create a custom context which, as mentioned previously, is primarily a concern for framework developers and not for Java EE application developers. Therefore, it is beyond the scope of this article. Interested readers can refer to JBoss Weld CDI for Java Platform, Ken Finnigan, Packt Publishing. (JBoss Weld is a popular CDI implementation and it is included with GlassFish.)

Summary

In this article, we covered NetBeans support for CDI, a Java EE API introduced in Java EE 6. We provided an introduction to CDI and explained the additional functionality that the CDI API provides over standard JSF. We also covered how to disambiguate CDI injected beans via CDI qualifiers, and how to group together CDI annotations via CDI stereotypes. We also saw how CDI can help us with AOP via interceptor binding types. Finally, we covered how NetBeans can help us create custom CDI scopes.

Resources for Article:
Further resources on this subject:
Java EE 7 Performance Tuning and Optimization [article]
Java EE 7 Developer Handbook [article]
Java EE 7 with GlassFish 4 Application Server [article]

Technical and hidden debts in machine learning - Google engineers give their perspective

Prasad Ramesh
06 Nov 2018
6 min read
In a paper, Google engineers have pointed out the various costs of maintaining a machine learning system. The paper, Hidden Technical Debt in Machine Learning Systems, talks about technical debt and other ML-specific debts that are hidden or hard to detect. They found that it is common to incur massive maintenance costs in real-world machine learning systems. They looked at several ML-specific risk factors to account for in system design. These factors include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a number of system-level anti-patterns.

Boundary erosion in complex models

In traditional software engineering, setting strict abstraction boundaries helps preserve logical consistency between the inputs and outputs of a given component. It is difficult to set these boundaries in machine learning systems. Yet, machine learning is needed precisely in areas where the desired behavior cannot be effectively expressed with traditional software logic without depending on data. This results in boundary erosion in a couple of areas.

Entanglement

Machine learning systems mix signals together and entangle them, making isolated improvements impossible. A change to one input changes the influence of all the other inputs, so no improvement can be made in isolation. This is referred to as the CACE principle: Change Anything Changes Everything. There are two possible ways to mitigate this:

Isolate models and serve ensembles. This is useful in situations where the sub-problems decompose naturally. In many cases, ensembles work well because the errors in the component models are not correlated. However, relying on this combination creates a strong entanglement of its own, and improving an individual model may make the overall system less accurate.
Another strategy is to focus on detecting changes in prediction behavior as they occur.

Correction cascades

There are cases where a problem is only slightly different from another that already has a solution. It can be tempting to reuse the existing model, learning a small correction on top of it as a fast way to solve the newer problem. However, this correction model creates a new system dependency on the original model, which makes it significantly more expensive to analyze improvements to either model in the future. The cost increases when correction models are cascaded, and a correction cascade can create an improvement deadlock.

Visibility debt caused by undeclared consumers

A model is often made widely accessible and may later be consumed by other systems. Without access controls, these consumers may be undeclared, silently using the output of a given model as an input to another system. These issues are referred to as visibility debt. Undeclared consumers may also create hidden feedback loops.

Data dependencies cost more than code dependencies

Data dependencies carry a similar capacity for building debt as code dependencies, but they are more difficult to detect. Without proper tooling to identify them, data dependencies can form large chains that are difficult to untangle. They come in two types.

Unstable data dependencies

To move quickly, it is often convenient to consume signals from other systems as input to your own. But some input signals are unstable: they can change behavior qualitatively or quantitatively over time. This can happen implicitly, as the other system updates over time, or through explicit changes. A mitigation strategy is to create versioned copies.
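To make the versioned-copies idea concrete, here is a minimal Python sketch. The feature-store API and names are hypothetical illustrations (the paper prescribes no particular interface): the point is that the consuming model pins the signal version it was validated against instead of reading whatever is currently deployed upstream.

# Hypothetical feature store: each upstream signal is published under an
# explicit, frozen version rather than a single mutable "latest" mapping.
SIGNAL_STORE = {
    ("topic_clusters", "v1"): lambda text: {"sports": 0.8, "news": 0.2},
    ("topic_clusters", "v2"): lambda text: {"sports": 0.5, "news": 0.5},
}

def get_signal(name, version):
    # Consumers must name the version they were validated against, so a
    # retrain that publishes v2 upstream cannot silently change this
    # model's inputs.
    return SIGNAL_STORE[(name, version)]

# The downstream model keeps using v1 until it is re-validated against v2.
topic_features = get_signal("topic_clusters", "v1")
print(topic_features("match report from last night"))

The cost of this approach, which the paper acknowledges, is that versioned copies bring their own staleness and maintenance burden, so pinned versions still need an owner and a retirement plan.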
Underutilized data dependencies

Underutilized data dependencies are input signals that provide little incremental modeling benefit. They make an ML system vulnerable to change where change is not necessary. Underutilized data dependencies can creep into a model in several ways: via legacy, bundled, epsilon, or correlated features.

Feedback loops

Live ML systems often end up influencing their own behavior as they are updated over time. This leads to analysis debt: it becomes difficult to predict the behavior of a given model before it is released. These feedback loops are difficult to detect and address if they occur gradually over time, which may be the case if the model is not updated frequently. A direct feedback loop is one in which a model directly influences the selection of its own future training data. In a hidden feedback loop, two systems influence each other indirectly.

Machine learning system anti-patterns

It is common for systems that incorporate machine learning methods to end up with high-debt design patterns:

Glue code: Relying on generic packages results in a glue code system design pattern, in which a massive amount of supporting code is written to get data into and out of general-purpose packages.
Pipeline jungles: Pipeline jungles often appear in data preparation as a special case of glue code. They can evolve organically as new sources are added, and the result can become a jungle of scrapes, joins, and sampling steps.
Dead experimental codepaths: Experimental codepaths are attractive in the short term because none of the surrounding structures need to be reworked. Over time, these accumulated codepaths create a growing debt due to the increasing difficulty of maintaining backward compatibility.
Abstraction debt: There is a lack of support for strong abstractions in ML systems.
Common smells: A smell may indicate an underlying problem in a component or system. These can be data smells, multiple-language smells, or prototype smells.

Configuration debt

Debt can also accumulate when configuring a machine learning system. A large system has a wide range of configuration options with respect to features, data selection, verification methods, and so on. It is common for configuration to be treated as an afterthought. In a mature system, the number of configuration lines can exceed the number of code lines, and each configuration line has potential for mistakes.

Dealing with external world changes

ML systems interact directly with the external world, and the external world is rarely stable. Some measures that can be taken to deal with this instability are:

Fixing thresholds in dynamic systems

It is necessary to pick a decision threshold for a given model to perform some action: to predict true or false, to mark an email as spam or not spam, to show or not show a given advertisement.

Monitoring and testing

Unit testing and end-to-end testing cannot by themselves ensure the proper functioning of an ML system. For long-term system reliability, comprehensive live monitoring and automated response are critical. That raises the question of what to monitor; the authors of the paper point out three areas as starting points: prediction bias, limits for actions, and upstream producers.

Other related areas in ML debt

In addition to the areas mentioned, an ML system may also face debt from other areas. These include data testing debt, reproducibility debt, process management debt, and cultural debt.

Conclusion

Moving quickly often introduces technical debt.
The most important insight from this paper, according to the authors, is that technical debt is an issue that both engineers and researchers need to be aware of. Paying down machine learning related technical debt requires commitment, which can often only be achieved by a shift in team culture. Prioritizing, recognizing, and rewarding this effort is important for the long-term health of successful machine learning teams. For more details, you can read the paper at the NIPS website.

Uses of Machine Learning in Gaming
Julia for machine learning. Will the new language pick up pace?
Machine learning APIs for Google Cloud Platform

Map Reduce

Packt
08 Aug 2013
10 min read
Map-reduce is a technique that is used to take large quantities of data and farm it out for processing. A somewhat trivial example might be: given 1TB of HTTP log data, count the number of hits that come from a given country, and report those numbers. For example, if you have the log entries:

204.12.226.2 - - [09/Jun/2013:09:12:24 -0700] "GET /who-we-are HTTP/1.0" 404 471 "-" "Mozilla/5.0 (compatible; MJ12bot/v1.4.3; http://www.majestic12.co.uk/bot.php?+)"
174.129.187.73 - - [09/Jun/2013:10:58:22 -0700] "GET /robots.txt HTTP/1.1" 404 452 "-" "CybEye.com/2.0 (compatible; MSIE 9.0; Windows NT 5.1; Trident/4.0; GTB6.4)"
157.55.35.37 - - [02/Jun/2013:23:31:01 -0700] "GET / HTTP/1.1" 200 483 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
206.183.1.74 - - [02/Jun/2013:18:24:35 -0700] "GET / HTTP/1.1" 200 482 "-" "Mozilla/4.0 (compatible; http://search.thunderstone.com/texis/websearch/about.html)"
1.202.218.21 - - [02/Jun/2013:17:38:20 -0700] "GET /robots.txt HTTP/1.1" 404 471 "-" "Mozilla/5.0 (compatible; JikeSpider; +http://shoulu.jike.com/spider.html)"

Then the answer to the question would be as follows:

US: 4
China: 1

Clearly this example dataset does not warrant distributing the data processing among multiple machines, but imagine if instead of five rows of log data we had twenty-five billion rows. If your program took a single computer half a second to process five records, it would take a little short of eighty years to process twenty-five billion records. To solve this, we could break up the data into smaller chunks, process those smaller chunks, and rejoin the results when we were finished. To apply this to a slightly larger dataset, imagine you extrapolated these five records to one hundred records and then split those one hundred records into five groups, each containing twenty records. From those five groups we might compute the following results:

Group 1     Group 2      Group 3     Group 4       Group 5
US 5        Mexico 2     US 15       Italy 1       Finland 5
Greece 4    Scotland 6   China 2     Greece 4      China 5
Ireland 8   Canada 9     Finland 3   Scotland 10   US 10
Canada 3    Ireland 3                US 5

If we were to combine these data points by using the country name as a key and store them in a map, adding each value to any existing value, we would get the count per country across all one hundred records. Using Ruby, we can write a simple program to do this, first without using Gearman, and then with it. To demonstrate this, we will write the following:

A simple library that we can use in our non-distributed program and in our Gearman-enabled programs
An example program that demonstrates using the library
A client that uses the library to split up our data and submit jobs to our manager
A worker that uses the library to process the job requests and return the results

The shared library

First we will develop a library that we can reuse. This will demonstrate that you can reuse existing logic to quickly take advantage of Gearman because it ensures the following things:

The program, client, and worker are much simpler, so we can see what's going on in them
The behavior between our program, client, and worker is guaranteed to be consistent

The shared library will have two methods, map_data and reduce_data. The map_data method will be responsible for splitting up the data into chunks to be processed, and the reduce_data method will process those chunks of data and return something that can be merged together into an accurate answer.
Take the following example, and save it to a file named functions.rb for later use:

#!/bin/env ruby

# Generate sub-lists of the data
# each sub-list has size = blocksize
def map_data(lines, blocksize)
  blocks = []
  counter = 0
  block = []
  lines.each do |line|
    if (counter >= blocksize)
      blocks << block
      block = []
      counter = 0
    end
    block << line
    counter += 1
  end
  blocks << block if block.size > 0
  blocks
end

# Extract the number of times we see a unique line
# Result is a hash with key = line, value = count
def reduce_data(lines)
  results = {}
  lines.each do |line|
    results[line] ||= 0
    results[line] += 1
  end
  results
end

A simple program

To use this library, we can write a very simple program that demonstrates the functionality:

require './functions.rb'

countries = ["china", "us", "greece", "italy"]
lines = []
results = {}

(1..100).each { |i| lines << countries[i % 4] }

blocks = map_data(lines, 20)
blocks.each do |block|
  reduce_data(block).each do |k,v|
    results[k] ||= 0
    results[k] += v
  end
end

puts results.inspect

Put the contents of this example into a Ruby source file named mapreduce.rb, in the same directory as you placed your functions.rb file, and execute it with the following:

[user@host:$] ruby ./mapreduce.rb

This script will generate a list with one hundred elements in it. Since there are four distinct elements, each will appear 25 times, as the following output shows:

{"us"=>25, "greece"=>25, "italy"=>25, "china"=>25}

Following in this vein, we can add in Gearman to extend our example to operate using a client that submits jobs and a single worker that will process the jobs serially to generate the same results. The reason we wrote these methods in a separate module from the driver application was to make them reusable in this fashion.

The client

The following code for the client in this example will be responsible for the mapping phase: it will split apart the data and submit a job for each block of data it needs processed. In this example worker/client setup, we are using JSON as a simple way to serialize/deserialize the data being sent back and forth:

require 'rubygems'
require 'gearman'
require 'json'
require './functions.rb'

client = Gearman::Client.new('localhost:4730')
taskset = Gearman::TaskSet.new(client)

countries = ["china", "us", "greece", "italy"]
jobcount = 1
lines = []
results = {}

(1..100).each { |i| lines << countries[i % 4] }

blocks = map_data(lines, 20)
blocks.each do |block|
  # Generate a task with a unique id
  uniq = rand(36**8).to_s(36)
  task = Gearman::Task.new('count_countries',
    JSON.dump(block), :uniq => uniq)

  # When the task is complete, add its results into ours
  task.on_complete do |d|
    # We are passing data back and forth as JSON, so
    # decode it to a hash and then iterate over the
    # k=>v pairs
    JSON.parse(d).each do |k,v|
      results[k] ||= 0
      results[k] += v
    end
  end

  taskset.add_task(task)
  puts "Submitted job #{jobcount}"
  jobcount += 1
end

puts "Submitted all jobs, waiting for results."
start_time = Time.now
taskset.wait(100)
time_diff = (Time.now - start_time).to_i
puts "Took #{time_diff} seconds: #{results.inspect}"

This client uses a few new concepts that were not used in the introductory examples, namely task sets and unique identifiers. In the Ruby client, a task set is a group of tasks that are submitted together and can be waited upon collectively.
To generate a task set, you construct it by giving it the client that you want to submit the task set with:

taskset = Gearman::TaskSet.new(client)

Then you can create and add tasks to the task set:

task = Gearman::Task.new('count_countries',
  JSON.dump(block), :uniq => uniq)
taskset.add_task(task)

Finally, you tell the task set how long you want to wait for the results:

taskset.wait(100)

This will block the program until the timeout passes or all the tasks in the task set complete (again, complete does not necessarily mean that the worker succeeded at the task, but that it saw it through to completion). In this example, it will wait 100 seconds for all the tasks to complete before giving up on them. This doesn't mean that the jobs won't complete if the client disconnects, just that the client won't see the end results (which may or may not be acceptable).

The worker

To complete the distributed MapReduce example, we need to implement the worker that is responsible for performing the actual data processing. The worker will perform the following tasks:

Receive a list of countries serialized as JSON from the manager
Decode that JSON data into a Ruby structure
Perform the reduce operation on the data, converting the list of countries into a corresponding hash of counts
Serialize the hash of counts as a JSON string
Return the JSON string to the manager (to be passed on to the client)

require 'rubygems'
require 'gearman'
require 'json'
require './functions.rb'

Gearman::Util.logger.level = Logger::DEBUG

@servers = ['localhost:4730']

w = Gearman::Worker.new(@servers)
w.add_ability('count_countries') do |json_data,job|
  puts "Received: #{json_data}"
  data = JSON.parse(json_data)
  result = reduce_data(data)
  puts "Result: #{result.inspect}"
  returndata = JSON.dump(result)
  puts "Returning #{returndata}"
  sleep 4
  returndata
end
loop { w.work }

Notice that we have introduced a slight delay in returning the results by instructing our worker to sleep for four seconds before returning the data. This is here in order to simulate a job that takes a while to process. To run this example, we will repeat the exercise from the first section. Save the contents of the client to a file called mapreduce_client.rb, and the contents of the worker to a file named mapreduce_worker.rb, in the same directory as the functions.rb file. Then, start the worker first by running the following:

ruby mapreduce_worker.rb

And then start the client by running the following:

ruby mapreduce_client.rb

When you run these scripts, the worker will be waiting to pick up jobs, and then the client will generate five jobs, each with a block containing a list of countries to be counted, and submit them to the manager. These jobs will be picked up by the worker and then processed, one at a time, until they are all complete. As a result, there will be a twenty second difference between when the jobs are submitted and when they are completed.

Parallelizing the pipeline

Implementing the solution this way clearly doesn't gain us much performance over the original example. In fact, it is going to be slower (even ignoring the four second sleep inside each job execution) than the original because there is time involved in serializing and deserializing the data, as well as in transmitting the data and the results between the actors. The goal of this exercise is to demonstrate building a system that can increase the number of workers and parallelize the processing of data, which we will see in the following exercise.
To demonstrate the power of parallel processing, we can now run two copies of the worker. Simply open a new shell and execute the worker via ruby mapreduce_worker.rb; this will spin up a second copy of the worker, ready to process jobs. Now, run the client a second time and observe the behavior. You will see that the client completes in twelve seconds instead of twenty. Why not ten? Remember that we submitted five jobs, and each takes four seconds. Five jobs do not divide evenly between two workers, so one worker will acquire three jobs instead of two, which will take it an additional four seconds to complete:

[user@host]% ruby mapreduce_client.rb
Submitted job 1
Submitted job 2
Submitted job 3
Submitted job 4
Submitted job 5
Submitted all jobs, waiting for results.
Took 12 seconds: {"us"=>25, "greece"=>25, "italy"=>25, "china"=>25}

Feel free to experiment with the various parameters of the system, such as running more workers, increasing the number of records that are being processed, or adjusting the amount of time that the worker sleeps during a job. While this example does not involve processing enormous quantities of data, hopefully you can see how it can be expanded for future growth.

Summary

In this article, we discussed the MapReduce technique. We hope this article gives you a glimpse of how the book flows.

Resources for Article:
Further resources on this subject:
BPMN 2.0 Concepts and The Sales Quote Process [Article]
Simplifying Parallelism Complexity in C# [Article]
Oracle BPM Suite 11gR1: Creating a BPM Application [Article]

Ruby Strings

Packt
06 Jul 2017
9 min read
In this article by Jordan Hudgens, the author of the book Comprehensive Ruby Programming, you'll learn about the Ruby String data type and walk through how to integrate string data into a Ruby program. Working with words, sentences, and paragraphs is a common requirement in many applications. Additionally, you will learn how to:

Employ string manipulation techniques using core Ruby methods
Demonstrate how to work with the string data type in Ruby

Using strings in Ruby

A string is a data type in Ruby that contains a set of characters, typically normal English text (or whatever natural language you're building your program for). A key point for the syntax of strings is that they have to be enclosed in single or double quotes if you want to use them in a program. The program will throw an error if they are not wrapped inside quotation marks. Let's walk through three scenarios.

Missing quotation marks

In this code I tried to simply declare a string without wrapping it in quotation marks. As you can see, this results in an error. The error occurs because Ruby thinks that the values are classes and methods.

Printing strings

In this code snippet we're printing out a string that we have properly wrapped in quotation marks. Please note that both single and double quotation marks work properly. It's also important that you do not mix the quotation mark types. For example, if you attempted to run the code:

puts "Name an animal'

You would get an error, because you need to ensure that every quotation mark is matched with a closing (and matching) quotation mark. If you start a string with double quotation marks, the Ruby parser requires that you end the string with the matching double quotation marks.

Storing strings in variables

Lastly, in this code snippet we're storing a string inside of a variable and then printing the value out to the console. We'll talk more about strings and string interpolation in subsequent sections.

String interpolation guide for Ruby

In this section, we are going to talk about string interpolation in Ruby.

What is string interpolation?

So what exactly is string interpolation? Good question. String interpolation is the process of seamlessly integrating dynamic values into a string. Let's assume we want to slip dynamic words into a string. We can get input from the console, store that input in variables, and then call the variables inside of a pre-existing string. For example, let's give a sentence the ability to change based on a user's input:

puts "Name an animal"
animal = gets.chomp
puts "Name a noun"
noun = gets.chomp

p "The quick brown #{animal} jumped over the lazy #{noun}"

Note the way I insert variables inside the string? They are enclosed in curly brackets and are preceded by a # sign. If I run this code, the sentence is printed out with the user's words filled in. So, this is how you insert values dynamically in your sentences. If you look at sites like Twitter, they sometimes display personalized messages such as: Good morning Jordan or Good evening Tiffany. This type of behavior is made possible by inserting a dynamic value in a fixed part of a string, and it leverages string interpolation. Now, let's use single quotes instead of double quotes, to see what happens. As you'll see, the string is printed as it is, without inserting the values for animal and noun. This is exactly what happens when you use single quotes: Ruby prints the entire string as it is, without any interpolation.
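As a quick runnable illustration of the difference (you can paste this into irb), the same interpolation syntax behaves differently under the two quoting styles:

name = "Jordan"

puts "Good morning #{name}"  # double quotes interpolate: Good morning Jordan
puts 'Good morning #{name}'  # single quotes do not: Good morning #{name}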
Therefore, it's important to remember the difference. Another interesting aspect is that anything inside the curly brackets can be Ruby code. So, technically you could type an entire algorithm inside these curly brackets, and Ruby would run it perfectly for you. However, this is not recommended for practical programming purposes. For example, I can insert a math equation inside the brackets, and Ruby prints the computed value out.

String manipulation guide

In this section we are going to learn about string manipulation, along with a number of examples of how to integrate string manipulation methods in a Ruby program.

What is string manipulation?

So what exactly is string manipulation? It's the process of altering the format or value of a string, usually by leveraging string methods.

String manipulation code examples

Let's start with an example. Let's say I want my application to always display the word Astros in capital letters. To do that, I simply write:

"Astros".upcase

Now if I always want a string to be in lowercase letters, I can use the downcase method, like so:

"Astros".downcase

Those are both methods I use quite often. However, there are other string methods at our disposal. For the rare times when you want to literally swap the case of the letters, you can leverage the swapcase method:

"Astros".swapcase

And lastly, if you want to reverse the order of the letters in the string, you can call the reverse method:

"Astros".reverse

These methods are built into the String data class, and we can call them on any string value in Ruby.

Method chaining

Another neat thing we can do is join different methods together to get custom output. For example, I can run:

"Astros".reverse.upcase

The preceding code displays the value SORTSA. This practice of combining different methods with a dot is called method chaining.

Split, strip, and join guides for strings

In this section, we are going to walk through how to use the split and strip methods in Ruby. These methods will help us clean up strings and convert a string to an array, so we can access each word as its own value.

Using the strip method

Let's start off by analyzing the strip method. Imagine that the input you get from the user or from the database is poorly formatted and contains white space before and after the value. To clean the data up we can use the strip method. For example:

str = " The quick brown fox jumped over the quick dog "
p str.strip

When you run this code, the output is just the sentence, without the white space before and after the words.

Using the split method

Now let's walk through the split method. The split method is a powerful tool that allows you to split a sentence into an array of words or characters. For example, when you type the following code:

str = "The quick brown fox jumped over the quick dog"
p str.split

You'll see that it converts the sentence into an array of words. This method can be particularly useful for long paragraphs, especially when you want to know the number of words in the paragraph. Since the split method converts the string into an array, you can use all the array methods, such as size, to see how many words were in the string. We can leverage method chaining to find out how many words are in the string, like so:

str = "The quick brown fox jumped over the quick dog"
p str.split.size

This should return a value of 9, which is the number of words in the sentence.
To know the number of letters, we can pass an optional argument to the split method and use the format:

str = "The quick brown fox jumped over the quick dog"
p str.split(//).size

And if you want to see all of the individual letters, we can remove the size method call, like this:

p str.split(//)

Your output will be an array containing every individual character in the string. Notice that it also includes spaces as individual characters, which may or may not be what you want a program to return. This method can be quite handy while developing real-world applications. A good practical example of this method is Twitter. Since this social media site restricts users to 140 characters, a method like this is sure to be a part of the validation code that counts the number of characters in a Tweet.

Using the join method

We've walked through the split method, which allows you to convert a string into a collection of characters. Thankfully, Ruby also has a method that does the opposite, allowing you to convert an array of characters into a single string; that method is called join. Let's imagine a situation where we're asked to reverse the words in a string. This is a common Ruby coding interview question, so it's an important concept to understand, since it tests your knowledge of how strings work in Ruby. Let's imagine that we have a string, such as:

str = "backwards am I"

And we're asked to reverse the words in the string. The pseudocode for the algorithm would be:

Split the string into words
Reverse the order of the words
Merge all of the split words back into a single string

We can actually accomplish each of these requirements in a single line of Ruby code. The following code snippet will perform the task:

str.split.reverse.join(' ')

This code will convert the single string into an array of strings; for the example, it will equal ["backwards", "am", "I"]. From there it will reverse the order of the array elements, so the array will equal ["I", "am", "backwards"]. With the words reversed, we simply need to merge the words into a single string, which is where the join method comes in. Running the join method will convert all of the words in the array into one string.

Summary

In this article, we were introduced to the string data type and how it can be utilized in Ruby. We analyzed how to integrate dynamic values into strings by leveraging string interpolation. We also learned the methods of basic string manipulation and how to find and replace string data. We analyzed how to break strings into smaller components, along with how to clean up string-based data. We even introduced the Array class in this article.

Resources for Article:
Further resources on this subject:
Ruby and Metasploit Modules [article]
Find closest mashup plugin with Ruby on Rails [article]
Building tiny Web-applications in Ruby using Sinatra [article]

Chaos Engineering: managing complexity by breaking things

Richard Gall
20 Apr 2018
7 min read
Chaos Engineering is based on a fundamental assertion about software infrastructure today: that it is inherently chaotic. Or, to be more specific, it is chaotic because it is complex. Whereas software infrastructure used to be centralized, owned, and licensed by large enterprise vendors, today much of the software that comprises infrastructure is open source. This is where we get back to chaos: because software infrastructure is comprised of many different parts, the way these parts interact can be unpredictable. Chaos Engineering is an attempt to acknowledge that fact and develop software accordingly.

Who invented Chaos Engineering?

Chaos Engineering began at Netflix. That makes sense when you consider the complexity of the Netflix technology stack and the way the company has scaled over the last 5 years or so. Netflix built a number of tools to help adopt this chaos-first approach, the most prominent being Chaos Monkey. First launched in 2011 and open sourced in 2012, Chaos Monkey is a tool that randomly selects instances in production and pulls them down; a little bit like monkeys pulling off your windscreen wipers in a safari park. However, Chaos Monkey became part of a wider suite of tools - called the Simian Army - that were built by Netflix to cause chaos in different parts of its infrastructure. Here are two other components used to simulate chaos:

Chaos Gorilla causes big trouble by pulling down an entire AWS availability zone
Latency Monkey delays communication, essentially simulating poor network performance

From that point Chaos Engineering grew. A number of large Silicon Valley organizations have adopted similar approaches. For example, Facebook's Project Storm simulates data center failures on a huge scale, while Uber uses a tool called uDestroy. Slack has recently spoken in detail on the importance of stress testing its software too; the company is looking to build an engineering team simply to perform Chaos Engineering and improve Slack's reliability.

One of the most interesting figures in Chaos Engineering is a man called Kolton Andrus. Andrus used to work at Amazon and Netflix, but today he is the CEO and founder of Gremlin, a startup that "helps engineers build resilient systems". Essentially, Andrus helped to develop the concept of Chaos Engineering while he was working at Netflix. Gremlin is his vehicle for making it accessible to others.

Chaos Engineering in practice

Now the conceptual stuff is out of the way, here's how Chaos Engineering works. It's actually quite straightforward: Chaos Engineering simulates all sorts of unpredictable situations and scenarios in order to see how the system responds. It's effectively a form of stress testing. As we've seen, over the past few years companies have built their own tools to allow them to stress test their infrastructure. But Gremlin is taking the approach of offering this as a service. Its product is described as 'resiliency-as-a-service': a whole library of 'attacks' which can replicate different types of outages within a system. These are what it calls 'chaos experiments' that allow you to 'identify weak points in your system and fix them before they become a problem'. In this sense, Chaos Engineering is a bit like taking the principles of penetration testing and applying them to software testing more broadly. By simulating everything that could possibly go wrong, it allows you to make much better optimization decisions. The principles of Chaos Engineering are documented at principlesofchaos.org.
This is effectively its 'manifesto'. There's a lot in there worth reading, but here are the five principles that any sort of testing or experimentation should aspire to:

Base your testing hypothesis on steady-state behavior. Consider your infrastructure holistically; making individual parts work is important but not the priority.
Simulate a variety of real-world events. This could be hardware or software failures, or simply external changes like spikes in traffic. What's important is that they're all unpredictable.
Test in production. Your tests should be authentic.
Automate! Testing can be laborious and require a lot of manual work. Make use of automation tools to run lots of different tests without taking up too much of your time.
Don't cause unnecessary pain. While it's important that your stress tests are authentic, the impact must be contained and minimized by the engineer.

Why Chaos Engineering now?

Chaos Engineering isn't particularly new. As you've seen, Netflix has been doing it since 2011. But it does feel more urgent and relevant today. That's because the complexity of the software infrastructure behind many of the biggest Silicon Valley companies is now mainstream. It's normal. Cloud isn't an exotic buzzword any more - it's a reality (a reality that often has failures). Microservices are common - they're a commonsense way of building better applications and websites.

Alongside this increased complexity, there is also a growing awareness of how much software outages can cost businesses. In a white paper, Gremlin makes a big deal out of how much money is lost due to outages. Gremlin cites BA's system failure in summer 2017, which left passengers stranded all over the world. This outage was estimated to have cost BA $135 million. It also refers to the Amazon S3 outage in March 2017, which is believed to have cost Amazon's customers $150 million. So - outages cost money. Yes, it's marketing spiel from Gremlin, but it's also true. It doesn't take a genius to work out that if your eCommerce site is down for an hour, you're going to have lost a lot of money. Because software performance is so tied up with business performance, it feels incredibly fragile. That's why Chaos Engineering is perhaps more important and popular than ever. It's a way of countering that fragility.

The key challenges of Chaos Engineering

Chaos Engineering poses many challenges to software engineering teams. First and foremost, it requires a big cultural change. If you're intent on breaking everything, there are no rules about how things should work or what you're trying to build. Instead, you're looking for the best way to build software that performs for the user.

More practically, Chaos Engineering isn't that easy to do in a cost-effective manner. Everything Gremlin details in its white paper is very much true - of course outages cost a hell of a lot. But creative destruction and experimentation can feel like an expensive route through software projects. It's not hard to see how it might appear self-indulgent, especially to a company or organization where software isn't properly understood. And more to the point, how often do businesses actually do the smart thing when they're building software? Long-term projects are always difficult. So much software evolves pragmatically - often for the worse. Adding in an extra layer of experimentation and detailed testing is a weird mix of bacchanalian and hyper-organized, something that many organizations just couldn't process or properly understand.
Chaos engineering and the future of software development

Chaos Engineering certainly looks like the future of software development. The only question is whether services like those provided by Gremlin will take off. To understand the true value of stress testing your infrastructure, you do need at least a modicum of awareness of the complexity of that infrastructure. Indeed, you probably need to have a conversation about which services and dependencies are most business critical. Or rather, which ones most impact the user. That's something this TechCrunch piece addresses:

"Testing can... be very political. Finding the points of failure in a system might force deep conversations about a particular software architecture and its robustness in the face of tough situations. A particular company might be deeply invested in a specific technical roadmap (e.g. microservices) that chaos engineering tests show is not as resilient to failures as originally predicted."

This means there is going to be a question mark over the extent to which Chaos Engineering ever really enters the mainstream. How many businesses want to have these conversations? It's not just about the inclination - it's also about the time and money. Chaos Engineering is an innovative approach that really calls people's bluff when they talk about innovation. It asks difficult questions about how and why you innovate: do you do new things because you think you should? Is this new thing going to be good for the business? And how well will it work for users? Of course these questions are vital when you're building software. But they rarely make building software easier.

Using Node.js dependencies in NW.js

Max Gfeller
19 Nov 2015
6 min read
NW.js (formerly known as node-webkit) is a framework that makes it possible to write multi-platform desktop applications using the technologies you already know well: HTML, CSS, and JavaScript. It bundles a Chromium and a Node (or io.js) runtime and provides additional APIs to implement native-like features such as real menu bars or desktop notifications. A big advantage of having a Node/io.js runtime is being able to make use of all the modules that are available to Node developers. We can categorize the modules we can use into three different types.

Internal modules

Node comes with a solid set of internal modules like fs or http. It is built on the UNIX philosophy of doing only one thing and doing it very well, so you won't find too much functionality in Node core. The following modules are shipped with Node:

assert: used for writing unit tests
buffer: raw memory allocation used for dealing with binary data
child_process: spawn and use child processes
cluster: take advantage of multi-core systems
crypto: cryptographic functions
dgram: use datagram sockets
dns: perform DNS lookups
domain: handle multiple different IO operations as a single group
events: provides the EventEmitter
fs: operations on the file system
http: perform http queries and create http servers
https: perform https queries and create https servers
net: asynchronous network wrapper
os: basic operating-system related utility functions
path: handle and transform file paths
punycode: deal with punycode domain names
querystring: deal with query strings
stream: abstract interface implemented by various objects in Node
timers: setTimeout, setInterval etc.
tls: encrypted stream communication
url: URL resolution and parsing
util: various utility functions
vm: sandbox to run Node code in
zlib: bindings to Gzip/Gunzip, Deflate/Inflate, and DeflateRaw/InflateRaw

Those are documented in the official Node API documentation and can all be used within NW.js. Please take care that Chromium already defines a crypto global, so when using the crypto module in the webkit context you should assign it to a variable like crypt rather than crypto:

var crypt = require('crypto');

The following example shows how we would read a file and use its contents using Node's modules:

var fs = require('fs');

fs.readFile(__dirname + '/file.txt', function (error, contents) {
  if (error) return console.error(error);
  console.log(contents);
});

3rd party JavaScript modules

Soon after Node itself was started, Isaac Schlueter, who was a friend of creator Ryan Dahl, started working on a package manager for Node itself. While Node's popularity reached new highs, a lot of packages got added to the npm registry, and it soon became the fastest growing package registry. At the time of this writing there are over 169,000 packages on the registry and nearly two billion downloads each month. The npm registry is now also slowly evolving from being "only" a package manager for Node into a package manager for all things JavaScript. Most of these packages can also be used inside NW.js applications. Your application's dependencies are defined in your package.json file, in the dependencies (or devDependencies) section:

{
  "name": "my-cool-application",
  "version": "1.0.0",
  "dependencies": {
    "lodash": "^3.1.2"
  },
  "devDependencies": {
    "uglify-js": "^2.4.3"
  }
}

In the dependencies field you find all the modules that are required to run your application, while in the devDependencies field only the modules required while developing the application are found.
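As a quick note on those version strings (this is standard npm semver behavior, not something specific to NW.js): a caret range such as "^3.1.2" allows npm to install any compatible version from 3.1.2 up to, but not including, 4.0.0, while a tilde range such as "~3.1.2" would only allow patch releases below 3.2.0. Writing "3.1.2" with no prefix pins that exact version, which can be useful when reproducible installs matter more than automatically receiving fixes.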
Installing a module is fairly easy, and the best way to do this is with the npm install command:

npm install lodash --save

The install command directly downloads the latest version into your node_modules/ folder. The --save flag means that this dependency should also directly be written into your package.json file. You can also define a specific version to download by using the following notation:

npm install lodash@1.*

or you can even pin an exact version using the same module@version notation.

How does node's require() work?

You need to deal with two different contexts in NW.js, and it is really important to always know which context you are currently in, as it changes the way the require() function works. When you load a module using Node's require() function, that module runs in the Node context. That means you have the same globals as you would have in a pure Node script, but you can't access the globals from the browser, e.g. document or window. If you write JavaScript code inside of a <script> tag in your HTML, or if you include a script inside your HTML using <script src="">, then this code runs in the webkit context. There you have access to all browser globals.

In the webkit context

The require() function is a module loading system defined by the CommonJS Modules 1.0 standard and directly implemented in Node core. To offer the same smooth experience, you get a modified require() method that works in webkit, too. Whenever you want to include a certain module from the webkit context, e.g. directly from an inline script in your index.html file, you need to specify the path directly from the root of your project. Let's assume the following folder structure:

- app/
  - app.js
  - foo.js
  - bar.js
- index.html

If you want to include the app/app.js file directly in your index.html, you need to include it like this:

<script type="text/javascript">
  var app = require('./app/app.js');
</script>

If you need to use a module from npm, then you can simply require() it and NW.js will figure out where the corresponding node_modules/ folder is located.

In the node context

In Node, when you use relative paths, require() will always try to locate the module relative to the file you are requiring it from. If we take the example from above, we could require the foo.js module from app.js like this:

var foo = require('./foo');

About the Author

Max Gfeller is a passionate web developer and JavaScript enthusiast. He is making awesome things at Cylon and can be found on Twitter @mgefeller.

To Optimize Scans

Packt
23 Jun 2017
20 min read
In this article by Paulino Calderon Pale, author of the book Nmap Network Exploration and Security Auditing Cookbook, Second Edition, we will explore the following topics:

Skipping phases to speed up scans
Selecting the correct timing template
Adjusting timing parameters
Adjusting performance parameters

One of my favorite things about Nmap is how customizable it is. If configured properly, Nmap can be used to scan from single targets to millions of IP addresses in a single run. However, we need to understand the configuration options and scanning phases that can affect performance, and most importantly, really think about our scan objective beforehand. Do we need the information from the reverse DNS lookup? Do we know all targets are online? Is the network congested? Do targets respond fast enough? These and many more aspects can really add up to your scanning time. Therefore, optimizing scans is important and can save us hours if we are working with many targets. This article starts by introducing the different scanning phases, along with the timing and performance options. Unless we have a solid understanding of what goes on behind the curtains during a scan, we won't be able to completely optimize our scans. Timing templates are designed to work in common scenarios, but we want to go further and shave off those extra seconds per host during our scans. Remember that this can improve not only performance but accuracy as well. Maybe those targets marked as offline were simply too slow to respond to the probes sent after all.

Skipping phases to speed up scans

Nmap scans can be broken into phases. When we are working with many hosts, we can save time by skipping tests or phases that return information we don't need or that we already have. By carefully selecting our scan flags, we can significantly improve the performance of our scans. This recipe explains the process that takes place behind the curtains when scanning, and how to skip certain phases to speed up scans.

How to do it...

To perform a full port scan with the timing template set to aggressive, and without reverse DNS resolution (-n) or ping (-Pn), use the following command:

# nmap -T4 -n -Pn -p- 74.207.244.221

Note the scanning time at the end of the report:

Nmap scan report for 74.207.244.221
Host is up (0.11s latency).
Not shown: 65532 closed ports
PORT     STATE SERVICE
22/tcp   open ssh
80/tcp   open http
9929/tcp open nping-echo
Nmap done: 1 IP address (1 host up) scanned in 60.84 seconds

Now, compare the running time that we get if we don't skip any tests:

# nmap -p- scanme.nmap.org

Nmap scan report for scanme.nmap.org (74.207.244.221)
Host is up (0.11s latency).
Not shown: 65532 closed ports
PORT     STATE SERVICE
22/tcp   open ssh
80/tcp   open http
9929/tcp open nping-echo
Nmap done: 1 IP address (1 host up) scanned in 77.45 seconds

Although the time difference isn't very drastic, it really adds up when you work with many hosts. I recommend that you think about your objectives and the information you need, and consider the possibility of skipping some of the scanning phases described next.

How it works...

Nmap scans are divided into several phases. Some of them require arguments to be set in order to run, but others, such as the reverse DNS resolution, are executed by default. Let's review the phases that can be skipped and their corresponding Nmap flags:

Target enumeration: In this phase, Nmap parses the target list.
This phase can't exactly be skipped, but you can save DNS forward lookups by using only IP addresses as targets.

Host discovery: This is the phase where Nmap establishes if the targets are online and in the network. By default, Nmap sends an ICMP echo request and some additional probes, but it supports several host discovery techniques that can even be combined. To skip the host discovery phase (no ping), use the flag -Pn. We can easily see what probes we skipped by comparing the packet traces of the two scans:

$ nmap -Pn -p80 -n --packet-trace scanme.nmap.org
SENT (0.0864s) TCP 106.187.53.215:62670 > 74.207.244.221:80 S ttl=46 id=4184 iplen=44 seq=3846739633 win=1024 <mss 1460>
RCVD (0.1957s) TCP 74.207.244.221:80 > 106.187.53.215:62670 SA ttl=56 id=0 iplen=44 seq=2588014713 win=14600 <mss 1460>
Nmap scan report for scanme.nmap.org (74.207.244.221)
Host is up (0.11s latency).
PORT   STATE SERVICE
80/tcp open http
Nmap done: 1 IP address (1 host up) scanned in 0.22 seconds

For scanning without skipping host discovery, we use the command:

$ nmap -p80 -n --packet-trace scanme.nmap.org
SENT (0.1099s) ICMP 106.187.53.215 > 74.207.244.221 Echo request (type=8/code=0) ttl=59 id=12270 iplen=28
SENT (0.1101s) TCP 106.187.53.215:43199 > 74.207.244.221:443 S ttl=59 id=38710 iplen=44 seq=1913383349 win=1024 <mss 1460>
SENT (0.1101s) TCP 106.187.53.215:43199 > 74.207.244.221:80 A ttl=44 id=10665 iplen=40 seq=0 win=1024
SENT (0.1102s) ICMP 106.187.53.215 > 74.207.244.221 Timestamp request (type=13/code=0) ttl=51 id=42939 iplen=40
RCVD (0.2120s) ICMP 74.207.244.221 > 106.187.53.215 Echo reply (type=0/code=0) ttl=56 id=2147 iplen=28
SENT (0.2731s) TCP 106.187.53.215:43199 > 74.207.244.221:80 S ttl=51 id=34952 iplen=44 seq=2609466214 win=1024 <mss 1460>
RCVD (0.3822s) TCP 74.207.244.221:80 > 106.187.53.215:43199 SA ttl=56 id=0 iplen=44 seq=4191686720 win=14600 <mss 1460>
Nmap scan report for scanme.nmap.org (74.207.244.221)
Host is up (0.10s latency).
PORT   STATE SERVICE
80/tcp open http
Nmap done: 1 IP address (1 host up) scanned in 0.41 seconds

Reverse DNS resolution: Host names often reveal additional information by themselves, and Nmap uses reverse DNS lookups to obtain them. This step can be skipped by adding the argument -n to your scan arguments. Let's see the traffic generated by the two scans, with and without reverse DNS resolution. First, let's skip reverse DNS resolution by adding -n to your command:

$ nmap -n -Pn -p80 --packet-trace scanme.nmap.org
SENT (0.1832s) TCP 106.187.53.215:45748 > 74.207.244.221:80 S ttl=37 id=33309 iplen=44 seq=2623325197 win=1024 <mss 1460>
RCVD (0.2877s) TCP 74.207.244.221:80 > 106.187.53.215:45748 SA ttl=56 id=0 iplen=44 seq=3220507551 win=14600 <mss 1460>
Nmap scan report for scanme.nmap.org (74.207.244.221)
Host is up (0.10s latency).
PORT   STATE SERVICE
80/tcp open  http
Nmap done: 1 IP address (1 host up) scanned in 0.32 seconds

And here is the same command without skipping reverse DNS resolution:

$ nmap -Pn -p80 --packet-trace scanme.nmap.org
NSOCK (0.0600s) UDP connection requested to 106.187.36.20:53 (IOD #1) EID 8
NSOCK (0.0600s) Read request from IOD #1 [106.187.36.20:53] (timeout: -1ms) EID 18
NSOCK (0.0600s) UDP connection requested to 106.187.35.20:53 (IOD #2) EID 24
NSOCK (0.0600s) Read request from IOD #2 [106.187.35.20:53] (timeout: -1ms) EID 34
NSOCK (0.0600s) UDP connection requested to 106.187.34.20:53 (IOD #3) EID 40
NSOCK (0.0600s) Read request from IOD #3 [106.187.34.20:53] (timeout: -1ms) EID 50
NSOCK (0.0600s) Write request for 45 bytes to IOD #1 EID 59 [106.187.36.20:53]: =............221.244.207.74.in-addr.arpa.....
NSOCK (0.0600s) Callback: CONNECT SUCCESS for EID 8 [106.187.36.20:53]
NSOCK (0.0600s) Callback: WRITE SUCCESS for EID 59 [106.187.36.20:53]
NSOCK (0.0600s) Callback: CONNECT SUCCESS for EID 24 [106.187.35.20:53]
NSOCK (0.0600s) Callback: CONNECT SUCCESS for EID 40 [106.187.34.20:53]
NSOCK (0.0620s) Callback: READ SUCCESS for EID 18 [106.187.36.20:53] (174 bytes)
NSOCK (0.0620s) Read request from IOD #1 [106.187.36.20:53] (timeout: -1ms) EID 66
NSOCK (0.0620s) nsi_delete() (IOD #1)
NSOCK (0.0620s) msevent_cancel() on event #66 (type READ)
NSOCK (0.0620s) nsi_delete() (IOD #2)
NSOCK (0.0620s) msevent_cancel() on event #34 (type READ)
NSOCK (0.0620s) nsi_delete() (IOD #3)
NSOCK (0.0620s) msevent_cancel() on event #50 (type READ)
SENT (0.0910s) TCP 106.187.53.215:46089 > 74.207.244.221:80 S ttl=42 id=23960 iplen=44 seq=1992555555 win=1024 <mss 1460>
RCVD (0.1932s) TCP 74.207.244.221:80 > 106.187.53.215:46089 SA ttl=56 id=0 iplen=44 seq=4229796359 win=14600 <mss 1460>
Nmap scan report for scanme.nmap.org (74.207.244.221)
Host is up (0.10s latency).
PORT   STATE SERVICE
80/tcp open  http
Nmap done: 1 IP address (1 host up) scanned in 0.22 seconds

Port scanning: In this phase, Nmap determines the state of the ports. By default, it uses SYN or TCP Connect scanning, depending on the user's privileges, but several other port scanning techniques are supported. Although this may not be so obvious, Nmap can do a few different things with targets without port scanning them, such as resolving their DNS names or checking whether they are online.
For this reason, this phase can be skipped with the argument -sn:

$ nmap -sn -R --packet-trace 74.207.244.221
SENT (0.0363s) ICMP 106.187.53.215 > 74.207.244.221 Echo request (type=8/code=0) ttl=56 id=36390 iplen=28
SENT (0.0364s) TCP 106.187.53.215:53376 > 74.207.244.221:443 S ttl=39 id=22228 iplen=44 seq=155734416 win=1024 <mss 1460>
SENT (0.0365s) TCP 106.187.53.215:53376 > 74.207.244.221:80 A ttl=46 id=36835 iplen=40 seq=0 win=1024
SENT (0.0366s) ICMP 106.187.53.215 > 74.207.244.221 Timestamp request (type=13/code=0) ttl=50 id=2630 iplen=40
RCVD (0.1377s) TCP 74.207.244.221:443 > 106.187.53.215:53376 RA ttl=56 id=0 iplen=40 seq=0 win=0
NSOCK (0.1660s) UDP connection requested to 106.187.36.20:53 (IOD #1) EID 8
NSOCK (0.1660s) Read request from IOD #1 [106.187.36.20:53] (timeout: -1ms) EID 18
NSOCK (0.1660s) UDP connection requested to 106.187.35.20:53 (IOD #2) EID 24
NSOCK (0.1660s) Read request from IOD #2 [106.187.35.20:53] (timeout: -1ms) EID 34
NSOCK (0.1660s) UDP connection requested to 106.187.34.20:53 (IOD #3) EID 40
NSOCK (0.1660s) Read request from IOD #3 [106.187.34.20:53] (timeout: -1ms) EID 50
NSOCK (0.1660s) Write request for 45 bytes to IOD #1 EID 59 [106.187.36.20:53]: [............221.244.207.74.in-addr.arpa.....
NSOCK (0.1660s) Callback: CONNECT SUCCESS for EID 8 [106.187.36.20:53]
NSOCK (0.1660s) Callback: WRITE SUCCESS for EID 59 [106.187.36.20:53]
NSOCK (0.1660s) Callback: CONNECT SUCCESS for EID 24 [106.187.35.20:53]
NSOCK (0.1660s) Callback: CONNECT SUCCESS for EID 40 [106.187.34.20:53]
NSOCK (0.1660s) Callback: READ SUCCESS for EID 18 [106.187.36.20:53] (174 bytes)
NSOCK (0.1660s) Read request from IOD #1 [106.187.36.20:53] (timeout: -1ms) EID 66
NSOCK (0.1660s) nsi_delete() (IOD #1)
NSOCK (0.1660s) msevent_cancel() on event #66 (type READ)
NSOCK (0.1660s) nsi_delete() (IOD #2)
NSOCK (0.1660s) msevent_cancel() on event #34 (type READ)
NSOCK (0.1660s) nsi_delete() (IOD #3)
NSOCK (0.1660s) msevent_cancel() on event #50 (type READ)
Nmap scan report for scanme.nmap.org (74.207.244.221)
Host is up (0.10s latency).
Nmap done: 1 IP address (1 host up) scanned in 0.17 seconds

In the previous example, we can see that an ICMP echo request and a reverse DNS lookup were performed (we forced DNS lookups with the option -R), but no port scanning was done.

There's more...

I recommend that you also run a couple of test scans to measure the speeds of the different DNS servers. I've found that ISPs tend to have the slowest DNS servers, but you can make Nmap use different DNS servers by specifying the argument --dns-servers. For example, to use Google's DNS servers, use the following command:

# nmap -R --dns-servers 8.8.8.8,8.8.4.4 -O scanme.nmap.org

You can test your DNS server speed by comparing the scan times. The following command tells Nmap not to ping or port scan, and to only perform a reverse DNS lookup:

$ nmap -R -Pn -sn 74.207.244.221
Nmap scan report for scanme.nmap.org (74.207.244.221)
Host is up.
Nmap done: 1 IP address (1 host up) scanned in 1.01 seconds
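Putting the flags from this recipe together: when we already know all our targets are online and we don't need hostnames, both phases can be skipped in a single run. The network range below is only an illustrative placeholder:

# nmap -n -Pn -p- -T4 10.0.0.0/24

Nmap's list scan is also handy here; it prints the parsed target list without sending a single probe, so you can verify what will be scanned before committing to a long run:

$ nmap -sL -n 10.0.0.0/24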
To further customize your scans, it is important that you understand the scan phases of Nmap. See Appendix-Scanning Phases for more information.

Selecting the correct timing template

Nmap includes six templates that set different timing and performance arguments to optimize your scans based on network conditions. Even though Nmap automatically adjusts some of these values, it is recommended that you set the correct timing template to give Nmap a hint about the speed of your network connection and the target's response time. The following will teach you about Nmap's timing templates and how to choose the most appropriate one.

How to do it...

Open your terminal and type the following command to use the aggressive timing template (-T4). Let's also use debugging (-d) to see what the option -T4 sets:

# nmap -T4 -d 192.168.4.20
--------------- Timing report ---------------
  hostgroups: min 1, max 100000
  rtt-timeouts: init 500, min 100, max 1250
  max-scan-delay: TCP 10, UDP 1000, SCTP 10
  parallelism: min 0, max 0
  max-retries: 6, host-timeout: 0
  min-rate: 0, max-rate: 0
---------------------------------------------
<Scan output removed for clarity>

You may use the integers between 0 and 5, that is, -T[0-5].

How it works...

The option -T is used to set the timing template in Nmap. Nmap provides six timing templates to help users tune the timing and performance arguments. The available timing templates and their initial configuration values are as follows:

Paranoid (-T0): This template is useful for avoiding detection systems, but it is painfully slow because only one port is scanned at a time, and the timeout between probes is 5 minutes:

--------------- Timing report ---------------
  hostgroups: min 1, max 100000
  rtt-timeouts: init 300000, min 100, max 300000
  max-scan-delay: TCP 1000, UDP 1000, SCTP 1000
  parallelism: min 0, max 1
  max-retries: 10, host-timeout: 0
  min-rate: 0, max-rate: 0
---------------------------------------------

Sneaky (-T1): This template is useful for avoiding detection systems, but it is still very slow:

--------------- Timing report ---------------
  hostgroups: min 1, max 100000
  rtt-timeouts: init 15000, min 100, max 15000
  max-scan-delay: TCP 1000, UDP 1000, SCTP 1000
  parallelism: min 0, max 1
  max-retries: 10, host-timeout: 0
  min-rate: 0, max-rate: 0
---------------------------------------------

Polite (-T2): This template is used when scanning is not supposed to interfere with the target system; it is a very conservative and safe setting:

--------------- Timing report ---------------
  hostgroups: min 1, max 100000
  rtt-timeouts: init 1000, min 100, max 10000
  max-scan-delay: TCP 1000, UDP 1000, SCTP 1000
  parallelism: min 0, max 1
  max-retries: 10, host-timeout: 0
  min-rate: 0, max-rate: 0
---------------------------------------------

Normal (-T3): This is Nmap's default timing template, which is used when the argument -T is not set:

--------------- Timing report ---------------
  hostgroups: min 1, max 100000
  rtt-timeouts: init 1000, min 100, max 10000
  max-scan-delay: TCP 1000, UDP 1000, SCTP 1000
  parallelism: min 0, max 0
  max-retries: 10, host-timeout: 0
  min-rate: 0, max-rate: 0
---------------------------------------------

Aggressive (-T4): This is the recommended timing template for broadband and Ethernet connections:

--------------- Timing report ---------------
  hostgroups: min 1, max 100000
  rtt-timeouts: init 500, min 100, max 1250
  max-scan-delay: TCP 10, UDP 1000, SCTP 10
  parallelism: min 0, max 0
  max-retries: 6, host-timeout: 0
  min-rate: 0, max-rate: 0
---------------------------------------------

Insane (-T5): This timing template sacrifices accuracy for speed:

--------------- Timing report ---------------
  hostgroups: min 1, max 100000
  rtt-timeouts: init 250, min 50, max 300
  max-scan-delay: TCP 5, UDP 1000, SCTP 5
  parallelism: min 0, max 0
  max-retries: 2, host-timeout: 900000
  min-rate: 0, max-rate: 0
---------------------------------------------
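The templates can also be selected by name rather than by number. For instance, the following two commands are equivalent:

# nmap -T4 scanme.nmap.org
# nmap -T aggressive scanme.nmap.org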
There's more...

Interactive mode in Nmap allows users to press keys to dynamically change runtime variables, such as verbose, debugging, and packet tracing. Although the discussion of including timing and performance options in interactive mode has come up a few times on the development mailing list, this hasn't been implemented yet. However, there is an unofficial patch, submitted in June 2012, that allows you to change the minimum and maximum packet rate values (--max-rate and --min-rate) dynamically. If you would like to try it out, it's located at http://seclists.org/nmap-dev/2012/q2/883.

Adjusting timing parameters

Nmap not only adjusts itself to different network and target conditions while scanning, but it can also be fine-tuned using timing options to improve performance. Nmap automatically calculates packet round trip, timeout, and delay values, but these values can also be set manually through specific settings. The following describes the timing parameters supported by Nmap.

How to do it...

Enter the following command to adjust the initial round trip timeout, the delay between probes, and the timeout for each scanned host:

# nmap -T4 --scan-delay 1s --initial-rtt-timeout 150ms --host-timeout 15m -d scanme.nmap.org
--------------- Timing report ---------------
  hostgroups: min 1, max 100000
  rtt-timeouts: init 150, min 100, max 1250
  max-scan-delay: TCP 1000, UDP 1000, SCTP 1000
  parallelism: min 0, max 0
  max-retries: 6, host-timeout: 900000
  min-rate: 0, max-rate: 0
---------------------------------------------

How it works...

Nmap supports different timing arguments that can be customized. However, setting these values incorrectly will most likely hurt performance rather than improve it. Let's examine each timing parameter more closely and learn its Nmap option name.

The Round Trip Time (RTT) value is used by Nmap to know when to give up on or retransmit a probe. Nmap estimates this value by analyzing previous responses, but you can set the initial RTT timeout with the argument --initial-rtt-timeout, as shown in the following command:

# nmap -A -p- --initial-rtt-timeout 150ms <target>

In addition, you can set the minimum and maximum RTT timeout values with --min-rtt-timeout and --max-rtt-timeout, respectively, as shown in the following command:

# nmap -A -p- --min-rtt-timeout 200ms --max-rtt-timeout 600ms <target>

Another very important setting we can control in Nmap is the waiting time between probes. Use the arguments --scan-delay and --max-scan-delay to set the waiting time and the maximum amount of time allowed between probes, respectively, as shown in the following commands:

# nmap -A --max-scan-delay 10s scanme.nmap.org
# nmap -A --scan-delay 1s scanme.nmap.org

Note that the arguments shown previously are very useful for avoiding detection mechanisms. Be careful not to set --max-scan-delay too low, because doing so will most likely cause open ports to be missed.

There's more...

If you would like Nmap to give up on a host after a certain amount of time, you can set the argument --host-timeout:

# nmap -sV -A -p- --host-timeout 5m <target>

Estimating round trip times with Nping

To use Nping to estimate the round trip time between you and the target, the following command can be used:

# nping -c30 <target>

This will make Nping send 30 ICMP echo request packets; after it finishes, it will show the average, minimum, and maximum RTT values obtained:

# nping -c30 scanme.nmap.org
...
SENT (29.3569s) ICMP 50.116.1.121 > 74.207.244.221 Echo request (type=8/code=0) ttl=64 id=27550 iplen=28
RCVD (29.3576s) ICMP 74.207.244.221 > 50.116.1.121 Echo reply (type=0/code=0) ttl=63 id=7572 iplen=28
Max rtt: 10.170ms | Min rtt: 0.316ms | Avg rtt: 0.851ms
Raw packets sent: 30 (840B) | Rcvd: 30 (840B) | Lost: 0 (0.00%)
Tx time: 29.09096s | Tx bytes/s: 28.87 | Tx pkts/s: 1.03
Rx time: 30.09258s | Rx bytes/s: 27.91 | Rx pkts/s: 1.00
Nping done: 1 IP address pinged in 30.47 seconds

Examine the round trip times, and use the maximum to set the correct --initial-rtt-timeout and --max-rtt-timeout values. The official documentation recommends using double the maximum RTT value for --initial-rtt-timeout, and as much as four times the maximum RTT value for --max-rtt-timeout.
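Applying that rule of thumb to the Nping results above (a maximum RTT of roughly 10 ms) would give values along these lines; the rounding here is our own:

# nmap -p- --initial-rtt-timeout 20ms --max-rtt-timeout 40ms scanme.nmap.org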
Displaying the timing settings

Enable debugging to make Nmap report the timing settings before scanning:

$ nmap -d <target>
--------------- Timing report ---------------
  hostgroups: min 1, max 100000
  rtt-timeouts: init 1000, min 100, max 10000
  max-scan-delay: TCP 1000, UDP 1000, SCTP 1000
  parallelism: min 0, max 0
  max-retries: 10, host-timeout: 0
  min-rate: 0, max-rate: 0
---------------------------------------------

To further customize your scans, it is important that you understand the scan phases of Nmap. See Appendix-Scanning Phases for more information.

Adjusting performance parameters

Nmap not only adjusts itself to different network and target conditions while scanning, but it also supports several parameters that affect its behavior, such as the number of hosts scanned concurrently, the number of retries, and the number of allowed probes. Learning how to adjust these parameters properly can considerably reduce your scanning time. The following explains the Nmap parameters that can be adjusted to improve performance.

How to do it...

Enter the following command, adjusting the values for your target's condition:

$ nmap --min-hostgroup 100 --max-hostgroup 500 --max-retries 2 <target>

How it works...

The command shown previously tells Nmap to scan and report in groups of no fewer than 100 (--min-hostgroup 100) and no more than 500 hosts (--max-hostgroup 500). It also tells Nmap to retry only twice before giving up on any port (--max-retries 2):

# nmap --min-hostgroup 100 --max-hostgroup 500 --max-retries 2 <target>

It is important to note that setting these values incorrectly will most likely hurt performance or accuracy rather than improve them.

Nmap sends many probes during its port scanning phase because of the ambiguity of what a lack of response means: either the packet got lost, the service is filtered, or the service is not open. By default, Nmap adjusts the number of retries based on network conditions, but you can set this value yourself with the argument --max-retries. Increasing the number of retries can improve Nmap's accuracy, but keep in mind that this sacrifices speed:

$ nmap --max-retries 10 <target>

The arguments --min-hostgroup and --max-hostgroup control the number of hosts that we probe concurrently. Keep in mind that reports are also generated based on this value, so adjust it depending on how often you would like to see the scan results. Larger groups are optimal for performance, but you may prefer smaller host groups on slow networks:

# nmap -A -p- --min-hostgroup 100 --max-hostgroup 500 <target>

There is also a very important pair of arguments that can be used to limit the number of packets sent per second by Nmap: --min-rate and --max-rate. These rates are set automatically by Nmap if the arguments are not present, and they need to be used carefully to avoid undesirable effects:

# nmap -A -p- --min-rate 50 --max-rate 100 <target>

Finally, the arguments --min-parallelism and --max-parallelism can be used to control the number of probes for a host group. By setting these arguments, Nmap will no longer adjust the values dynamically:

# nmap -A --max-parallelism 1 <target>
# nmap -A --min-parallelism 10 --max-parallelism 250 <target>

There's more...

If you would like Nmap to give up on a host after a certain amount of time, you can set the argument --host-timeout, as shown in the following command:

# nmap -sV -A -p- --host-timeout 5m <target>

Interactive mode in Nmap allows users to press keys to dynamically change runtime variables, such as verbose, debugging, and packet tracing. Although the discussion of including timing and performance options in interactive mode has come up a few times on the development mailing list, this hasn't been implemented yet. However, there is an unofficial patch, submitted in June 2012, that allows you to change the minimum and maximum packet rate values (--max-rate and --min-rate) dynamically. If you would like to try it out, it's located at http://seclists.org/nmap-dev/2012/q2/883.

To further customize your scans, it is important that you understand the scan phases of Nmap. See Appendix-Scanning Phases for more information.

Summary

In this article, we learned how to optimize Nmap scans by skipping unneeded phases, selecting the correct timing template, and fine-tuning Nmap's timing and performance parameters. Applied to large target lists, these techniques can save hours of scanning time and make better use of the available bandwidth and CPU resources. The article is short but full of tips for optimizing your scans.
Data Access Methods in Flex 3

Packt
06 Oct 2009
4 min read
Flex provides a range of data access components to work with server-side remote data. These components are also commonly called Remote Procedure Call (RPC) services. A Remote Procedure Call allows you to execute a remote or server-side procedure or method, either locally or in another address space, without having to write the details of that call explicitly. (For more information on RPC, visit http://en.wikipedia.org/wiki/Remote_procedure_call.) The Flex data access components are based on Service Oriented Architecture (SOA). They allow you to call server-side business logic built in Java, PHP, ColdFusion, .NET, or any other server-side technology to send and receive remote data.

In this article by Satish Kore, we will learn how to interact with a server environment (specifically, one built with Java). We will look at the various data access components, including the HTTPService and WebService classes. This article focuses on providing in-depth information on the various data access methods available in Flex.

Flex data access components

Flex data access components provide a call-and-response model for accessing remote data. There are three methods for accessing remote data in Flex: via the HTTPService class, the WebService class (compliant with the Simple Object Access Protocol, or SOAP), or the RemoteObject class.

The HTTPService class

The HTTPService class is a basic method of sending and receiving remote data over the HTTP protocol. It allows you to perform traditional HTTP GET and POST requests to a URL. The HTTPService class can be used with any kind of server-side technology, such as JSP, PHP, ColdFusion, ASP, and so on. The HTTP service is also known as a REST-style web service. REST stands for Representational State Transfer. It is an approach for getting content from a web server via web pages. Web pages are written in many languages, such as JSP, PHP, ColdFusion, and ASP, that receive POST or GET requests. Output can be formatted in XML instead of HTML, which can be consumed by Flex applications for manipulating and displaying content. For example, RSS feeds output standard XML content when accessed via a URL. For more information about REST, see www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm.

Using the HTTPService tag in MXML

The HTTPService class can be used to load both static and dynamic content from remote URLs. For example, you can use HTTPService to load an XML file that is located on a server, or you can use HTTPService in conjunction with server-side technologies that return dynamic results to the Flex application. You can use both the HTTPS and the HTTP protocols to access secure and non-secure remote content.
The following is the basic syntax of an HTTPService tag:

<mx:HTTPService
  id="instanceName"
  method="GET|POST|HEAD|OPTIONS|PUT|TRACE|DELETE"
  resultFormat="object|array|xml|e4x|flashvars|text"
  url="http://www.mydomain.com/myFile.jsp"
  fault="faultHandler(event);"
  result="resultHandler(event)"/>

The following are the properties and events:

id: Specifies the instance identifier name
method: Specifies the HTTP method; the default value is GET
resultFormat: Specifies the format of the result data
url: Specifies the complete remote URL that will be called
fault: Specifies the event handler method called when a fault or error occurs while connecting to the URL
result: Specifies the event handler method called when the HTTPService call completes successfully and returns results

When you call the HTTPService.send() method, Flex makes an HTTP request to the specified URL. The HTTP response is returned in the result property of the ResultEvent object in the result handler of the HTTPService class. You can also pass an optional parameter to the send() method by passing in an object; any attributes of the object will be passed along as a URL-encoded query string with the URL request. For example:

var param:Object = {title:"Book Title", isbn:"1234567890"};
myHS.send(param);

In the preceding code example, we created an object called param and added two string attributes to it: title and isbn. When you pass the param object to the send() method, these two string attributes are converted to URL-encoded HTTP request parameters, so your HTTP request URL will look like this: http://www.domain.com/myjsp.jsp?title=Book%20Title&isbn=1234567890. This way, you can pass any number of HTTP parameters to the URL, and these parameters can be accessed in your server-side code. For example, in a JSP or servlet, you could use request.getParameter("title") to get the HTTP request parameter.
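To put these pieces together, the following is a minimal illustrative sketch (the service id, URL, and handler names here are our own, not from the original example) showing an HTTPService declared in MXML along with its result and fault handlers:

<mx:HTTPService id="bookService"
    url="http://www.mydomain.com/books.jsp"
    method="GET"
    resultFormat="e4x"
    result="resultHandler(event)"
    fault="faultHandler(event)"/>

<mx:Script>
  <![CDATA[
    import mx.rpc.events.ResultEvent;
    import mx.rpc.events.FaultEvent;

    // Called when the server responds successfully.
    private function resultHandler(event:ResultEvent):void {
      trace(event.result); // the XML payload returned by the server
    }

    // Called when the request fails.
    private function faultHandler(event:FaultEvent):void {
      trace(event.fault.faultString);
    }

    // Trigger the request, for example from creationComplete.
    private function loadBooks():void {
      bookService.send();
    }
  ]]>
</mx:Script>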

article-image-6-javascript-micro-optimizations-need-know
Savia Lobo
05 Apr 2018
18 min read
Save for later

6 JavaScript micro optimizations you need to know

Savia Lobo
05 Apr 2018
18 min read
JavaScript micro optimizations can improve the performance of your JavaScript code. This means you can get it to do more, which is essential when thinking about the scale of modern web applications, as greater efficiency in code can lead to much stronger overall performance. Let us have a look at these micro optimizations in detail.

Truthy/falsy comparisons

We have all, at some point, written if conditions or assigned default values by relying on the truthy or falsy nature of JavaScript variables. As helpful as this is most of the time, we need to consider the impact that such an operation has on our application. However, before we jump into the details, let's discuss how a condition is evaluated in JavaScript, specifically an if condition in this case. As developers, we tend to do the following:

if (objOrNumber) {
  // do something
}

This works for most cases, unless the number is 0, in which case it gets evaluated to false. That is a very common edge case, and most of us catch it anyway. However, what does the JavaScript engine have to do to evaluate this condition? How does it know whether objOrNumber evaluates to true or false? Let's return to our ECMA-262 specs and pull out the if condition spec (https://www.ecma-international.org/ecma-262/5.1/#sec-12.5). The following is an excerpt:

Semantics

The production IfStatement : if ( Expression ) Statement else Statement is evaluated as follows:

1. Let exprRef be the result of evaluating Expression.
2. If ToBoolean(GetValue(exprRef)) is true, return the result of evaluating the first Statement.
3. Else, return the result of evaluating the second Statement.

Now, we note that whatever expression we pass goes through the following three steps:

1. Getting the exprRef from Expression.
2. GetValue is called on exprRef.
3. ToBoolean is called on the result of step 2.

Step 1 does not concern us much at this stage; think of it this way: an expression can be something like a == b or something like the shouldIEvaluateTheIFCondition() method call, that is, something that evaluates your condition.

Step 2 extracts the value of the exprRef, that is, 10, true, undefined. In this step, how the value is extracted depends on the type of the exprRef. You can refer to the details of GetValue in the spec.

Step 3 then converts the value extracted in step 2 into a Boolean value based on the following table (taken from https://www.ecma-international.org/ecma-262/5.1/#sec-9.2):

Undefined: false
Null: false
Boolean: the result equals the input argument (no conversion)
Number: false if the argument is +0, -0, or NaN; otherwise true
String: false if the argument is the empty string; otherwise true
Object: true

At each step, you can see that it is always beneficial if we are able to provide a direct Boolean value instead of a truthy or falsy one.

Looping optimizations

We could do a deep dive into the for loop, similar to what we did with the if condition earlier (https://www.ecma-international.org/ecma-262/5.1/#sec-12.6.3), but there are easier and more obvious optimizations that can be applied when it comes to loops. Simple changes can drastically affect the quality and performance of code; consider this, for example:

for (var i = 0; i < arr.length; i++) {
  // logic
}

The preceding code can be changed so that the array length is looked up only once:

var len = arr.length;
for (var i = 0; i < len; i++) {
  // logic
}

What can be even better is to run the loop in reverse, which on some engines is faster still. Note that the counter has to start at len - 1, not len, to stay within the array bounds:

var len = arr.length;
for (var i = len - 1; i >= 0; i--) {
  // logic
}

You can time these variants yourself, as sketched below.
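The following is a minimal, illustrative micro-benchmark (the array size is arbitrary, and console.time is coarse, so treat the numbers as rough hints rather than proof):

var arr = new Array(10000000).fill(1);

console.time('length looked up every iteration');
for (var i = 0; i < arr.length; i++) { /* logic */ }
console.timeEnd('length looked up every iteration');

console.time('cached length');
var len = arr.length;
for (var j = 0; j < len; j++) { /* logic */ }
console.timeEnd('cached length');

console.time('reverse loop');
for (var k = len - 1; k >= 0; k--) { /* logic */ }
console.timeEnd('reverse loop');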
The conditional function call

Some of the features within our applications are conditional. For example, logging and analytics fall into this category. Some applications may have logging turned off for some time and then turned back on. The most obvious way of achieving this is to wrap the logging method within an if condition. However, since the method could be triggered many times, there is another way in which we can optimize this case:

function someUserAction() {
  // logic
  if (analyticsEnabled) {
    trackUserAnalytics();
  }
}

// in some other class
function trackUserAnalytics() {
  // save analytics
}

Instead of the preceding approach, we can try something only slightly different that allows V8-based engines to optimize the way the code is executed (note that here we pass the flag in as an argument, fixing the otherwise undefined enabled variable):

function someUserAction() {
  // logic
  trackUserAnalytics();
}

// in some other class
function toggleUserAnalytics(enabled) {
  if (enabled) {
    trackUserAnalytics = userAnalyticsMethod;
  } else {
    trackUserAnalytics = noOp;
  }
}

function userAnalyticsMethod() {
  // save analytics
}

// empty function
function noOp() {}

Now, the preceding implementation is a double-edged sword, and the reason for that is very simple. JavaScript engines employ a technique called inline caching (IC), which means that any previous lookup for a certain method performed by the JS engine will be cached and reused the next time it is triggered. For example, if we have an object with a nested method, a.b.c, the method a.b.c will be looked up only once and stored in the cache (IC); if a.b.c is called the next time, it will be picked up from the IC, and the JS engine will not parse the whole chain again. If there are any changes to the a.b.c chain, then the IC gets invalidated, and a new dynamic lookup is performed the next time instead of being retrieved from the IC.

So, in our previous example, when we have noOp assigned to trackUserAnalytics(), the method path gets tracked and saved within the IC, but the engine internally removes this function call because it is a call to an empty method. However, when it is applied to an actual function with some logic in it, the IC points directly to this new method. So, if we keep calling our toggleUserAnalytics() method multiple times, it keeps invalidating our IC, and our dynamic method lookup has to happen every time until the application state stabilizes (that is, until toggleUserAnalytics() is no longer called).
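To make the trade-off concrete, here is a small usage sketch (assuming, as in the version above, that the toggle receives the flag as an argument):

toggleUserAnalytics(true);  // trackUserAnalytics now points to userAnalyticsMethod
someUserAction();           // analytics are saved, with no if check on the hot path

toggleUserAnalytics(false); // trackUserAnalytics now points to noOp
someUserAction();           // the call is effectively free, but the IC was invalidated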
Image and font optimizations

When it comes to image and font optimizations, there are no limits to the types and the scale of optimization that we can perform. However, we need to keep in mind our target audience, and we need to tailor our approach based on the problem at hand.

With both images and fonts, the first and most important thing is that we do not overserve; that is, we request and send only the data that is necessary, by determining the dimensions of the device that our application is running on. The simplest way to do this is by adding a cookie for your device size and sending it to the server along with each request. Once the server receives a request for an image, it can then retrieve the image based on the dimensions that were sent in the cookie. Most of the time, these images are something like a user avatar or a list of people who commented on a certain post. We can agree that thumbnail images do not need to be the same size as those on the profile page, and we can save some bandwidth by transmitting a smaller image.

Since screens these days have very high Dots Per Inch (DPI), the media that we serve to them needs to be worthy of it. Otherwise, the application looks bad and the images look pixelated. This can be avoided by using vector images or SVGs, which can be gzipped over the wire, thus reducing the payload size.

Another not-so-obvious optimization is changing the image compression type. Have you ever loaded a page in which the image loads from top to bottom in small, incremental rectangles? By default, images are compressed using the baseline technique, which is a default method of compressing the image from top to bottom. We can change this to progressive compression using libraries such as imagemin. This loads the entire image first as blurred, then semi-blurred, and so on until the entire image is uncompressed and displayed on the screen. Uncompressing a progressive JPEG might take a little longer than a baseline one, so it is important to measure before making such optimizations.

Another extension based on this concept is a Chrome-only image format called WebP. This is a highly effective way of serving images, which serves a lot of companies in production and has saved almost 30% on bandwidth. Using WebP is almost as simple as the progressive compression discussed previously. We can use the imagemin-webp node module, which can convert a JPEG image into a webp image, thus reducing the image size to a great extent.

Web fonts are a little different from images. Images get downloaded and rendered onto the UI on demand, that is, when the browser encounters the image either in the HTML or in the CSS files. Fonts, on the other hand, are only requested when the Render Tree is completely constructed. That means that the CSSOM and DOM have to be ready by the time the request is dispatched for the fonts. Also, if the font files are being served from a server and not locally, then there are chances that we may see the text without the font applied first (or no text at all) and then see the font applied, which may cause a flashing effect of the text.

There are multiple simple techniques to avoid this problem:

Download, serve, and preload the font files locally:

<link rel="preload" href="fonts/my-font.woff2" as="font">

Specify the unicode-range in the font-face so that browsers can adapt and improvise on the character set and glyphs that are actually expected by the browser (note that inside CSS the comment syntax is /* */, not //):

@font-face {
  ...
  unicode-range: U+000-5FF; /* latin */
  ...
}

So far, we have seen how to get unstyled text loaded onto the UI and then styled as we expect; this can be improved on using the font loading API, which allows us to load and render the font using JavaScript:

var font = new FontFace("myFont", "url(/my-fonts/my-font.woff2)", {
  unicodeRange: 'U+000-5FF'
});

// initiate a fetch without the Render Tree
font.load().then(function() {
  // apply the font
  document.fonts.add(font);
  document.body.style.fontFamily = "myFont";
});
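Returning to the WebP conversion mentioned earlier, a minimal Node sketch using the imagemin and imagemin-webp modules might look like this (the paths and quality value are illustrative, and the promise-based API shown is the one from recent imagemin versions):

const imagemin = require('imagemin');
const imageminWebp = require('imagemin-webp');

(async () => {
  const files = await imagemin(['images/*.jpg'], {
    destination: 'build/images',               // converted files land here
    plugins: [imageminWebp({ quality: 75 })]   // lossy WebP at quality 75
  });
  console.log(`${files.length} images converted to WebP`);
})();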
Garbage collection in JavaScript

Let's take a quick look at what garbage collection (GC) is and how we can handle it in JavaScript. A lot of low-level languages provide explicit capabilities to developers to allocate and free memory in their code. However, unlike those languages, JavaScript automatically handles memory management, which is both a good and a bad thing. Good because we no longer have to worry about how much memory we need to allocate, when we need to do so, and how to free the assigned memory. The bad part about the whole process is that, to an uninformed developer, this can be a recipe for disaster: they can end up with an application that hangs and crashes.

Luckily for us, understanding the process of GC is quite easy and can be very easily incorporated into our coding style to make sure that we are writing optimal code when it comes to memory management. Memory management has three very obvious steps:

1. Assign memory to variables:

var a = 10; // we assign a number to a memory location referenced by variable a

2. Use the variables to read from or write to the memory:

a += 3; // we read the memory location referenced by a and write a new value to it

3. Free the memory when it's no longer needed.

Now, this is the part that is not explicit. How does the browser know when we are done with the variable a and it is ready to be garbage collected? Let's wrap this inside a function before we continue the discussion:

function test() {
  var a = 10;
  a += 3;
  return a;
}

We have a very simple function, which just adds to our variable a, returns the result, and finishes the execution. However, there is actually one more step, which happens after the execution of this method, called mark and sweep (not always immediately after; sometimes it happens after a batch of operations is completed on the main thread). When the browser performs mark and sweep depends on the total memory the application consumes and the speed at which that memory is being consumed.

Mark and sweep algorithm

Since there is no accurate way to determine whether the data at a particular memory location is going to be used in the future, we need to depend on alternatives that can help us make this decision. In JavaScript, we use the concept of a reference to determine whether a variable is still being used; if not, it can be garbage collected.

The concept of mark and sweep is very straightforward: which memory locations are reachable from the known active memory locations? If something is not reachable, collect it, that is, free the memory. That's it, but what are the known active memory locations? The process still needs a starting point, right? In most browsers, the GC algorithm keeps a list of roots from which the mark and sweep process can be started. All the roots and their children are marked as active, and any variable that can be reached from these roots is also marked as active. Anything that cannot be reached is marked as unreachable and thus collected. In most cases, the roots consist of the window object.

So, going back to our previous example:

function test() {
  var a = 10;
  a += 3;
  return a;
}

Our variable a is local to the test() method. As soon as the method has executed, there is no way to access that variable anymore, that is, no one holds any reference to it, and that is when it can be marked for garbage collection, so that the next time GC runs, the variable a will be swept and the memory allocated to it freed.
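For the curious, modern engines expose FinalizationRegistry (Node 14.6+, Chrome 84+), which makes this reachability rule observable; here is a small illustrative sketch (the names are our own) that registers an object and logs when the collector actually reclaims it:

const registry = new FinalizationRegistry(function (label) {
  console.log(label + ' was garbage collected');
});

function test() {
  var big = { data: new Array(1000000).join('*') };
  registry.register(big, 'big');   // watch this object, without keeping it alive
  return big.data.length;          // after this returns, nothing references `big`
}

test();
// Collection timing is entirely up to the engine; under `node --expose-gc`,
// calling global.gc() afterwards makes the callback fire reliably.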
Garbage collection and V8

When it comes to V8, the process of garbage collection is extremely complex (as it should be), so let's briefly discuss how V8 handles it. In V8, the memory (heap) is divided into two main generations: the new-space and the old-space. Both the new-space and the old-space are assigned some memory (between 1 MB and 20 MB). Most programs and their variables, when created, are assigned within the new-space. Whenever we create a new variable or perform an operation that consumes memory, it is by default assigned from the new-space, which is optimized for memory allocation. Once the total memory allocated to the new-space is almost completely consumed, the browser triggers a Minor GC, which removes the variables that are no longer being referenced and marks the variables that are still being referenced and cannot be removed yet. Once a variable survives two or more Minor GCs, it becomes a candidate for the old-space, where the GC cycle is not run as frequently as in the new-space. A Major GC is triggered when the old-space reaches a certain size; all of this is driven by the heuristics of the application, which is very important to the whole process. So, well-written programs move fewer objects into the old-space and thus trigger fewer Major GC events. Needless to say, this is a very high-level overview of what V8 does for garbage collection, and since this process keeps changing over time, we will switch gears and move on to the next topic.

Avoiding memory leaks

Now that we know, at a high level, what garbage collection in JavaScript is and how it works, let's take a look at some common pitfalls that prevent our variables from being marked for GC by the browser.

Assigning variables to the global scope

This should be pretty obvious by now; we discussed how the GC mechanism determines a root (which is the window object) and treats everything on the root and its children as active, never marking them for garbage collection. So, the next time you forget to add a var to your variable declaration, remember that the global variable you are creating will live forever and never get garbage collected:

function test() {
  a = 10; // created on the window object
  a += 3;
  return a;
}

Removing DOM elements and references

It's imperative that we keep our DOM references to a minimum, so a well-known step that we like to perform is caching DOM elements in our JavaScript so that we do not have to query them over and over. However, once the DOM elements are removed, we need to make sure that they are removed from our cache as well; otherwise, they will never get GC'd:

var cache = {
  row: document.getElementById('row')
};

function removeTable() {
  document.body.removeChild(document.getElementById('row'));
}

The code shown previously removes the row from the DOM, but the variable cache still refers to the DOM element, preventing it from being garbage collected. Another interesting thing to note here is that even if we removed the table containing the row, the entire table would remain in memory and not get GC'd, because the row in our cache internally refers to the table.
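A straightforward fix, sketched below, is to drop our own reference at the same time the node is removed, so that both the row and its parent table become unreachable:

function removeTable() {
  var row = cache.row;
  if (row && row.parentNode) {
    row.parentNode.removeChild(row);
  }
  cache.row = null; // release the last JavaScript reference
}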
Closures edge case

Closures are amazing; they help us deal with a lot of problematic scenarios and also provide us with ways to simulate the concept of private variables. All that is good, but sometimes we tend to overlook the potential downsides associated with closures. Here is what we know and use:

function myGoodFunc() {
  var a = new Array(10000000).join('*');
  // something big enough to cause a spike in memory usage
  function myGoodClosure() {
    return a + ' added from closure';
  }
  myGoodClosure();
}

setInterval(myGoodFunc, 1000);

When we run this script in the browser and then profile it, we see, as expected, that the method consumes a constant amount of memory and is then GC'd, restoring the baseline memory consumed by the script. Zooming into one of these spikes and looking at the call tree to determine which events are triggered around that time, we see that everything happens as per our expectation: first, our setInterval() is triggered, which calls myGoodFunc(), and once the execution is done, there is a GC, which collects the data, hence the spike.

Now, this was the expected flow, or the happy path, when dealing with closures. However, sometimes our code is not that simple, and we end up performing multiple things within one closure, sometimes even nesting closures:

function myComplexFunc() {
  var a = new Array(1000000).join('*');
  // something big enough to cause a spike in memory usage
  function closure1() {
    return a + ' added from closure';
  }
  closure1();
  function closure2() {
    console.log('closure2 called');
  }
  setInterval(closure2, 100);
}

setInterval(myComplexFunc, 1000);

We can note in the preceding code that we extended our method to contain two closures: closure1 and closure2. Although closure1 still performs the same operation as before, closure2 will run forever because we have it running at ten times the frequency of the parent function. Also, since both closure methods share the parent closure scope, in this case the variable a, that variable never gets GC'd and thus causes a huge memory leak. This shows up clearly in the profile: the GC is still being triggered, but because of the frequency at which the methods are being called, the memory slowly leaks (less memory is collected than is created).

Well, that was an extreme edge case, right? It's far more theoretical than practical; why would anyone nest two setInterval() methods with closures? Let's take a look at another example, in which we no longer nest multiple setInterval() calls, but which is driven by the same logic. Let's assume that we have a method that creates closures:

var something = null;

function replaceValue() {
  var previousValue = something;
  // the `unused` method loads `previousValue` into the closure scope
  function unused() {
    if (previousValue) console.log("hi");
  }
  // update something
  something = {
    str: new Array(1000000).join('*'),
    // All closures within replaceValue share the same closure scope, so
    // someMethod has access to previousValue, which is nothing but the
    // previous `something` object. Since someMethod keeps that scope alive,
    // the previous value never gets garbage collected, even when `something`
    // is replaced by a new (identical) object in the next setInterval
    // iteration, and so on for every iteration.
    someMethod: function () {}
  };
}

setInterval(replaceValue, 1000);

A simple fix follows from what we just said: the previous value of the object something doesn't get garbage collected because someMethod's scope still holds previousValue from the previous iteration. So, the solution is to clear out the value of previousValue at the end of each iteration, leaving nothing for the old something object to be retained by once it is replaced (see the sketch below); with this change, the memory profile returns to the healthy sawtooth pattern we saw earlier.
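The following is a minimal sketch of that fix; the structure mirrors the example above, and only the nulling line is new. Nulling the local variable works because all closures created in the same call share one scope object, so clearing the binding empties it for someMethod too:

var something = null;

function replaceValue() {
  var previousValue = something;
  function unused() {
    if (previousValue) console.log("hi");
  }
  something = {
    str: new Array(1000000).join('*'),
    someMethod: function () {}
  };
  previousValue = null; // nothing left in the shared scope to retain the old object
}

setInterval(replaceValue, 1000);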
So, the solution to this would be to clear out the value of the previousValue at the end of each iteration, thus leaving nothing for something to refer once it is unloaded, hence the memory profiling can be seen to change: The preceding image changes as follows: To summarize,  we introduced JavaScript micro-optimizations and memory optimizations that ultimately led to a high performance JavaScript. If you have found this post useful, do check out the book Hands-On Data Structures and Algorithms with JavaScript for solutions to implement complex data structures and algorithms in practical way.    