
How-To Tutorials


How to build a cold-start friendly content-based recommender using Apache Spark SQL

Sunith Shetty
14 Dec 2017
12 min read
Note: This article is an excerpt from the book Big Data Analytics with Java, written by Rajat Mehta. In the book, you will learn how to perform analytics on big data using Java, machine learning, and other big data tools.

In this article, you will learn how to create a content-based recommendation system using the MovieLens dataset.

Getting started with content-based recommendation systems

In content-based recommendations, the recommendation system checks for similarity between items based on their attributes or content and then proposes those items to the end users. For example, if there is a movie and the recommendation system has to show similar movies to the users, it might check for attributes of the movie such as the director's name, the actors in the movie, the genre of the movie, and so on. Similarly, if there is a news website and the recommendation system has to show similar news, it might check for the presence of certain words within the news articles to build the similarity criteria. As such, the recommendations are based on actual content, whether in the form of tags, metadata, or content from the item itself (as in the case of news articles).

Dataset

For this article, we are using the very popular MovieLens dataset, which was collected by the GroupLens Research Project at the University of Minnesota. This dataset contains a list of movies rated by users and can be downloaded from https://grouplens.org/datasets/movielens. There are a few files in this dataset:

u.item: This contains information about the movies in the dataset. Its attributes are:
- Movie id: the ID of the movie
- Movie title: the title of the movie
- Video release date: the release date of the movie
- Genre (isAction, isComedy, isHorror, and so on): the genre of the movie, represented by multiple flags such as isAction, isComedy, and isHorror. Each is a Boolean flag, with 1 indicating the movie belongs to that genre and 0 indicating it does not.

u.data: This contains the users' ratings for the movies that they have watched. There are three main attributes in this file:
- User id: the ID of the user rating the movie
- Item id: the ID of the movie
- Rating: the rating given to the movie by the user

u.user: This contains demographic information about the users. The main attributes from this file are:
- User id: the ID of the user rating the movie
- Age: the age of the user
- Gender: the gender of the user
- ZipCode: the zip code of the user's location

As the data is pretty clean and we are only dealing with a few parameters, we are not doing any extensive data exploration in this article for this dataset. However, we urge you to try some data exploration on this dataset: create bar charts using the ratings given and look for patterns such as top-rated movies, top-rated action movies, top comedy movies, and so on.

Content-based recommender on the MovieLens dataset

We will build a simple content-based recommender using Apache Spark SQL and the MovieLens dataset. To figure out the similarity between movies, we will use Euclidean distance. This Euclidean distance will be computed over the genre properties of the movie items (action, comedy, horror, and so on) and over the average rating for each movie.
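To make the similarity measure concrete before diving into the Spark code, here is a minimal, self-contained sketch of the distance computation itself. The class and method names are illustrative only (they are not from the book); it treats each movie as a vector of 0/1 genre flags plus its average rating and computes the Euclidean distance between two such vectors.

    // Illustrative only: Euclidean distance between two movies,
    // each described by 0/1 genre flags plus an average rating.
    public class MovieDistance {

        public static double euclidean(int[] genresA, double avgRatingA,
                                       int[] genresB, double avgRatingB) {
            double sum = 0.0;
            for (int i = 0; i < genresA.length; i++) {
                double d = genresA[i] - genresB[i];   // genre flags: 0 or 1
                sum += d * d;
            }
            double r = avgRatingA - avgRatingB;       // average rating term
            sum += r * r;
            return Math.sqrt(sum);
        }

        public static void main(String[] args) {
            // Two toy movies: (action, comedy, horror) flags and an average rating
            double dist = euclidean(new int[]{1, 0, 0}, 4.1,
                                    new int[]{1, 1, 0}, 3.8);
            System.out.println(dist); // smaller distance means more similar movies
        }
    }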
First comes the usual boilerplate code to create the SparkSession object. We first build the Spark configuration object and provide the master, which in our case is local since we are running this locally on our computer. Next, using this Spark configuration object, we build the SparkSession object and also provide the name of the application, which is ContentBasedRecommender:

    SparkConf sc = new SparkConf().setMaster("local");
    SparkSession spark = SparkSession
        .builder()
        .config(sc)
        .appName("ContentBasedRecommender")
        .getOrCreate();

Once the SparkSession is created, load the rating data from the u.data file. We are using the Spark RDD API for this; feel free to change this code and use the Dataset API if you prefer. On this Java RDD of the data, we invoke a map function, and within the map function we extract the data from each row and populate it into a Java POJO called RatingVO:

    JavaRDD<RatingVO> ratingsRDD = spark.read().textFile("data/movie/u.data").javaRDD()
        .map(row -> {
            RatingVO rvo = RatingVO.parseRating(row);
            return rvo;
        });

As you can see, each row of the u.data dataset is passed to the RatingVO.parseRating method in the lambda function. This parseRating method is declared in the POJO RatingVO. Let's look at the RatingVO POJO first. For brevity, we show only the first few lines of the POJO here:

    public class RatingVO implements Serializable {
        private int userId;
        private int movieId;
        private float rating;
        private long timestamp;
        private int like;

As you can see in the preceding code, we store the userId, the movieId, and the rating given by the user in this POJO. The RatingVO class contains one useful method, parseRating, shown next. In this method, we parse the actual data row (which is tab separated) and extract the values of rating, movieId, and so on. Note the if...else block: it is here that we check whether the rating given is greater than three. If it is, we treat it as a 1, or like, and populate it in the like attribute of RatingVO; otherwise we mark it as 0, or dislike:

    public static RatingVO parseRating(String str) {
        String[] fields = str.split("\t");
        ...
        int userId = Integer.parseInt(fields[0]);
        int movieId = Integer.parseInt(fields[1]);
        float rating = Float.parseFloat(fields[2]);
        if(rating > 3) return new RatingVO(userId, movieId, rating, timestamp, 1);
        return new RatingVO(userId, movieId, rating, timestamp, 0);
    }

Once we have the RDD built with a RatingVO per row, we are ready to build our dataset object. We use the createDataFrame method on the Spark session and provide the RDD containing the data and the corresponding POJO class, RatingVO. We also register this dataset as a temporary view called ratings so that we can fire SQL queries on it:

    Dataset<Row> ratings = spark.createDataFrame(ratingsRDD, RatingVO.class);
    ratings.createOrReplaceTempView("ratings");
    ratings.show();

The show method in the preceding code prints the first few rows of this dataset.
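As mentioned above, the RDD API can be swapped for the Dataset API if you prefer. The following is a hedged sketch of that alternative, not code from the book; the column names and the derivation of the like column are assumptions on my part.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.when;

    // Assumption: u.data is tab separated as userId, movieId, rating, timestamp.
    Dataset<Row> ratingsDs = spark.read()
        .option("sep", "\t")
        .csv("data/movie/u.data")
        .toDF("userId", "movieId", "rating", "timestamp")
        .withColumn("rating", col("rating").cast("float"))
        .withColumn("movieId", col("movieId").cast("int"))
        // same rule as parseRating: rating > 3 counts as a "like"
        .withColumn("like", when(col("rating").gt(3), 1).otherwise(0));

    ratingsDs.createOrReplaceTempView("ratings");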
Next, we fire a Spark SQL query on this view (ratings) to find the average rating for each movie. For this, we group by movieId and invoke an average function on the rating column, as shown here:

    Dataset<Row> moviesLikeCntDS = spark.sql("select movieId, avg(rating) likesCount from ratings group by movieId");

The moviesLikeCntDS dataset now contains the results of our group by query. Next, we load the data for the movies from the u.item data file. As we did for the ratings, we store this data in a MovieVO POJO. This MovieVO POJO holds data for each movie such as the MovieId and MovieTitle, and it also stores the movie's genre information (action, comedy, animation, and so on) as 1s and 0s. For brevity, we are not showing the full code of the lambda function here:

    JavaRDD<MovieVO> movieRdd = spark.read().textFile("data/movie/u.item").javaRDD()
        .map(row -> {
            String[] strs = row.split("\\|");
            MovieVO mvo = new MovieVO();
            mvo.setMovieId(strs[0]);
            mvo.setMovieTitle(strs[1]);
            ...
            mvo.setAction(strs[6]);
            mvo.setAdventure(strs[7]);
            ...
            return mvo;
        });

As you can see in the preceding code, we split the data row and, from the split result (an array of strings), we extract the individual values and store them in the MovieVO object. Note that split takes a regular expression, so the pipe delimiter has to be escaped as "\\|". The results of this operation are stored in the movieRdd object, which is a Spark RDD. Next, we convert this RDD into a Spark dataset by invoking the createDataFrame function and providing the movieRdd along with the MovieVO POJO. After creating the dataset for the movie RDD, we perform an important step: we join our movie dataset with the moviesLikeCntDS dataset we created earlier (recall that moviesLikeCntDS contains the movie ID and the average rating for that movie). We also register this new dataset as a temporary view so that we can fire Spark SQL queries on it:

    Dataset<Row> movieDS = spark.createDataFrame(movieRdd.rdd(), MovieVO.class)
        .join(moviesLikeCntDS, "movieId");
    movieDS.createOrReplaceTempView("movies");

Before we move further with the program, we print the results of this new dataset by invoking the show method (for brevity, not all columns are shown in the original screenshot):

    movieDS.show();

Now comes the meat of this content recommender program. Here we see the power of Spark SQL: we do a self join within a Spark SQL query on the temporary view movies. We make a combination of every movie with every other movie in the dataset except itself. So if you have, say, three movies (movie1, movie2, movie3), this results in the combinations (movie1, movie2), (movie1, movie3), (movie2, movie3), (movie2, movie1), (movie3, movie1), and (movie3, movie2). You may have noticed that this query produces duplicates, as (movie1, movie2) is the same pair as (movie2, movie1), so we will have to write separate code to remove those duplicates. For now, the code for fetching these combinations is shown as follows (for brevity, the full query is not shown):

    Dataset<Row> movieDataDS = spark.sql("select m.movieId movieId1, m.movieTitle movieTitle1, m.action action1, m.adventure adventure1, ... " +
        "m2.movieId movieId2, m2.movieTitle movieTitle2, m2.action action2, m2.adventure adventure2, ..."

If you invoke show on this dataset and print the result, it prints a lot of columns and their first few values.
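The duplicate pairs noted above are not removed in this excerpt. One simple way to drop them, an assumption on my part rather than the book's own code, is to keep only the combinations where the first movie ID is smaller than the second, for example by putting an ordering condition on the self join:

    // Assumption: dedupe the symmetric (movie1, movie2)/(movie2, movie1) pairs.
    // The cast guards against lexicographic comparison if movieId is stored as a string.
    Dataset<Row> moviePairsDS = spark.sql(
        "select m.movieId movieId1, m2.movieId movieId2 " +
        "from movies m join movies m2 " +
        "on cast(m.movieId as int) < cast(m2.movieId as int)");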
Some of these values are shown in the original article as a screenshot, with the columns for movie1 on the left-hand side and the corresponding movie2 columns on the right-hand side. By now you may have realized that on a big data dataset this operation is massive and requires extensive computing resources. Thankfully, on Spark you can distribute this operation to run on multiple compute nodes. For this, you can tweak the spark-submit job with the following parameters to extract the last bit of juice from all the clustered nodes:

    spark-submit --executor-cores <number of cores> --num-executors <number of executors>

The next step is the most important one in this content recommender program. We run over the results of the previous query and, from each row, pull out the data for movie1 and movie2, cross compare the attributes of both movies, and find the Euclidean distance between them. This shows how similar the two movies are; the greater the distance, the less similar the movies are. We convert our dataset to an RDD and invoke a map function, and within the lambda function for that map we go over all the dataset rows, extract the data from each row, and find the Euclidean distance. For conciseness, only a portion of the code is shown. Also note that we pull the information for both movies and store the calculated Euclidean distance in a EuclidVO Java POJO object:

    JavaRDD<EuclidVO> euclidRdd = movieDataDS.javaRDD().map(row -> {
        EuclidVO evo = new EuclidVO();
        evo.setMovieId1(row.getString(0));
        evo.setMovieTitle1(row.getString(1));
        evo.setMovieTitle2(row.getString(22));
        evo.setMovieId2(row.getString(21));
        int action = Math.abs(Integer.parseInt(row.getString(2)) - Integer.parseInt(row.getString(23)));
        ...
        double likesCnt = Math.abs(row.getDouble(20) - row.getDouble(41));
        double euclid = Math.sqrt(action * action + ... + likesCnt * likesCnt);
        evo.setEuclidDist(euclid);
        return evo;
    });

As you can see in the preceding code, we calculate the Euclidean distance and store it in a variable. Next, we convert our RDD to a dataset object, register it as a temporary view called movieEuclids, and we are ready to fire queries for our predictions:

    Dataset<Row> results = spark.createDataFrame(euclidRdd.rdd(), EuclidVO.class);
    results.createOrReplaceTempView("movieEuclids");

Finally, we are ready to make our predictions using this dataset. Let's see our first prediction: finding the movies that are closest to Toy Story, which has a movieId of 1. We fire a simple query on the view movieEuclids to find this:

    spark.sql("select * from movieEuclids where movieId1 = 1 order by euclidDist asc").show(20);

As you can see in the preceding query, we order by Euclidean distance in ascending order, as a smaller distance means more similarity. This prints the first 20 rows of the result. The results are not bad for a content recommender system with little to no historical data: the first few movies returned are, like Toy Story itself, famous animated movies, including Aladdin, Winnie the Pooh, Pinocchio, and so on. So the little content recommender we built works relatively well. You can do many more things to make it even better, such as trying out more properties to compare and using a different similarity measure such as the Pearson coefficient, Jaccard distance, and so on.
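As one concrete illustration of such an alternative measure, here is a sketch of Jaccard distance over the binary genre flags. This is my own sketch rather than code from the book, and the method name is illustrative only:

    // Illustrative only: Jaccard distance between two movies' 0/1 genre flag vectors.
    // Returns 0 for identical genre sets and 1 for no genres in common.
    public static double jaccardDistance(int[] genresA, int[] genresB) {
        int intersection = 0;
        int union = 0;
        for (int i = 0; i < genresA.length; i++) {
            if (genresA[i] == 1 && genresB[i] == 1) intersection++;
            if (genresA[i] == 1 || genresB[i] == 1) union++;
        }
        if (union == 0) return 0.0; // neither movie has any genre flag set
        return 1.0 - (double) intersection / union;
    }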
From this article, we learned how content recommenders can be built with zero to no historical data. These recommenders are based on the attributes present on the item itself, which we use to figure out the similarity with other items and recommend them. To learn more about data analysis, data visualization, and machine learning techniques, you can refer to the book Big Data Analytics with Java.


Getting to know Generative Models and their types

Sunith Shetty
20 Feb 2018
9 min read
Note: This article is an excerpt from a book written by Rajdeep Dua and Manpreet Singh Ghotra titled Neural Network Programming with TensorFlow. In this book, you will use TensorFlow to build and train neural networks of varying complexities, without any hassle.

In today's tutorial, we will learn about generative models and their types. We will also look into how discriminative models differ from generative models.

Introduction to generative models

Generative models are the family of machine learning models that are used to describe how data is generated. To train a generative model, we first accumulate a vast amount of data in some domain and later train a model to create or generate data like it. In other words, these are models that can learn to create data that is similar to the data that we give them. One such approach uses Generative Adversarial Networks (GANs).

There are two kinds of machine learning models: generative models and discriminative models. Let's examine the following list of classifiers: decision trees, neural networks, random forests, generalized boosted models, logistic regression, naive Bayes, and Support Vector Machines (SVMs). Most of these are classifiers and ensemble models. The odd one out is naive Bayes: it is the only generative model in the list. The others are examples of discriminative models. The fundamental difference between generative and discriminative models lies in the underlying probability inference structure. Let's go through some of the key differences between them.

Discriminative versus generative models

Discriminative models learn P(Y|X), the conditional relationship between the target variable Y and the features X. This is how least squares regression works, and it is the kind of inference pattern used there: an approach for sorting out the relationship among variables. Generative models aim for a complete probabilistic description of the dataset. With generative models, the goal is to develop the joint probability distribution P(X, Y), either directly or by computing P(Y|X) and P(X) and then inferring the conditional probabilities required to classify newer data. This method requires more solid probabilistic thought than regression demands, but it provides a complete model of the probabilistic structure of the data. Knowing the joint distribution enables you to generate the data; hence, naive Bayes is a generative model.

Suppose we have a supervised learning task, where x_i are the given features of the data points and y_i are the corresponding labels. One way to predict y on a future x is to learn a function f() from (x_i, y_i) that takes in x and outputs the most likely y. Such models fall into the category of discriminative models, as you are learning how to discriminate between x's from different classes. Methods like SVMs and neural networks fall into this category. Even if you're able to classify the data very accurately, you have no notion of how the data might have been generated. The second approach is to model how the data might have been generated and learn a function f(x, y) that gives a score to the configuration determined by x and y together. Then you can predict y for a new x by finding the y for which the score f(x, y) is maximum. A canonical example of this is Gaussian mixture models. Another example: you can imagine x to be an image and y to be a kind of object, say a dog, in the image.
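Before continuing with that example, it may help to write out the relationship the preceding paragraphs describe. This is a restatement of standard probability identities, not additional material from the book:

    \[
    p(x, y) = p(x)\,p(y \mid x) = p(y)\,p(x \mid y)
    \quad\Rightarrow\quad
    p(y \mid x) = \frac{p(x \mid y)\,p(y)}{p(x)}
    \]
    % A generative classifier such as naive Bayes uses exactly this factorization,
    % picking the label that maximizes the joint probability (under its
    % conditional-independence assumption over the features x_i):
    \[
    \hat{y} = \arg\max_{y} \; p(y) \prod_{i} p(x_i \mid y)
    \]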
The probability written as p(y|x) tells us how much the model believes that there is a dog, given an input image, compared to all the possibilities it knows about. Algorithms that try to model this probability map directly are called discriminative models. Generative models, on the other hand, try to learn a function called the joint probability p(y, x). We can read this as how much the model believes that x is an image and there is a dog y in it at the same time. These two probabilities are related and can be written as p(y, x) = p(x) p(y|x), with p(x) being how likely it is that the input x is an image. The p(x) probability is usually called a density function in the literature.

The main reason to call these models generative ultimately connects to the fact that the model has access to the probability of both input and output at the same time. Using this, we can generate images of animals by sampling animal kinds y and new images x from p(y, x). We can also learn the density function p(x), which depends only on the input space. Both kinds of models are useful; however, generative models have an interesting advantage over discriminative models, namely, they have the potential to understand and explain the underlying structure of the input data even when there are no labels available. This is very desirable when working in the real world.

Types of generative models

Discriminative models have been at the forefront of the recent success in the field of machine learning. These models make predictions that depend on a given input, but they are not able to generate new samples or data. The idea behind the recent progress of generative modeling is to convert the generation problem into a prediction one and use deep learning algorithms to learn such a problem.

Autoencoders

One way to convert a generative problem into a discriminative one is to learn a mapping from the input space to itself. For example, we want to learn an identity map that, for each image x, would ideally predict the same image, namely x = f(x), where f is the predictive model. This model may not be of use in its current form, but from it we can create a generative model. Here, we create a model formed of two main components: an encoder model q(h|x) that maps the input to another space, referred to as the hidden or latent space and represented by h, and a decoder model q(x|h) that learns the opposite mapping from the hidden space back to the input space. These components, the encoder and the decoder, are connected together to create an end-to-end trainable model. Both the encoder and decoder models are neural networks of potentially different architectures, for example, RNNs and attention nets, chosen to get the desired outcomes. As the model is learned, we can separate the decoder from the encoder and use them independently. To generate a new data sample, we first generate a sample from the latent space and then feed it to the decoder to create a new sample in the output space.

GAN

As seen with autoencoders, we can think of a general concept for creating networks that work together in a relationship, where training them helps us learn latent spaces that allow us to generate new data samples. Another type of generative network is the GAN, where we have a generator model q(x|h) that maps the small-dimensional latent space of h (usually represented as noise samples from a simple distribution) to the input space of x. This is quite similar to the role of the decoder in autoencoders.
The idea now is to introduce a discriminative model p(y|x), which tries to associate an input instance x with a yes/no binary answer y about whether the input was generated by the generator model or was a genuine sample from the dataset we are training on. Let's use the image example from before. Assume that the generator model creates a new image, and we also have a real image from our actual dataset. If the generator model is good, the discriminator model will not be able to distinguish between the two images easily. If the generator model is poor, it will be very simple to tell which one is fake and which one is real. When these two models are coupled, we can train them end to end by ensuring that the generator model gets better over time at fooling the discriminator model, while the discriminator model is trained to work on the harder problem of detecting fakes. Finally, we want a generator model whose outputs are indistinguishable from the real data used for training. In the initial parts of training, the discriminator model can easily detect the samples coming from the actual dataset versus the ones generated synthetically by the generator model, which is just beginning to learn. As the generator gets better at modeling the dataset, we begin to see more and more generated samples that look similar to the dataset. The original article illustrates this with the generated images of a GAN model learning over time.

Sequence models

If the data is temporal in nature, we can use specialized algorithms called sequence models. These models can learn a probability of the form p(y | x_n, ..., x_1), where i is an index signifying the location in the sequence and x_i is the i-th input sample. As an example, we can consider each word as a series of characters, each sentence as a series of words, and each paragraph as a series of sentences. The output y could be, for instance, the sentiment of the sentence. Using a similar trick to the one from autoencoders, we can replace y with the next item in the series or sequence, namely y = x_{n+1}, allowing the model to learn.

To summarize, we learned that generative models are a fast advancing area of study and research. As we continue to advance these models and grow the training and the datasets, we can expect to generate data examples that depict completely believable images. This can be used in several applications such as image denoising, in-painting, structured prediction, and exploration in reinforcement learning. To know more about how to build and optimize neural networks using TensorFlow, do check out the book Neural Network Programming with Tensorflow.


Shaping a model with Meshmixer and printing it

Packt
20 Jun 2014
4 min read
Shaping with Meshmixer

Meshmixer was designed to provide a modeling interface that frees the user from working directly with the geometry of the mesh. In most cases the intent of the program succeeds, but in some cases it's good to see how the underlying mesh works. We'll use some brush tools to improve our model and, along the way, take a look at how this affects the mesh structure.

Getting ready

We'll use a toy block scanned with 123D Catch.

How to do it...

We will proceed as follows:

1. Let's take a look at the model's mesh by positioning the model with a large surface visible. Go to the menu and select View. Scroll down and select Toggle Wireframe (W).
2. Choose Sculpt. From the pop-up toolbox, choose Brushes. Go to the menu and select ShrinkSmooth. Adjust your settings in the Properties section: keep the size at 60 and the strength at 25.
3. Use the smooth tool slowly across the model, watching the change it makes to the mesh. The keyboard shortcut W can be used to toggle between mesh views.
4. Repeat using the RobustSmooth and Flatten brushes. Use these combinations of brushes to flatten one side of the toy block.
5. Rotate your model to an area with heavy distortion and make sure your view is in wireframe mode. Go back to Brushes and select Pinch. Adjust the Strength to 85, Size to 39, Depth to -17, and Lazyness to 95. Keep everything else at the default values. If you are uncertain of the default values, left-click on the small cogwheel icon next to the Properties heading and choose Reset to Defaults.
6. We're going to draw a line across a distorted area of the toy block to see how it affects the mesh. Using the Pinch brush, draw a line across the model. Save your work and then select Undo/back from the Actions menu (Ctrl + Z).
7. Now, select your entire model. Go to the toolbox and select Edit. Scroll down and select Remesh (R). You'll see an even distribution of polygons in the mesh. Keep the defaults in the pop-up and click on Accept.
8. Go back and choose Clear Selection. Select the Pinch brush again and draw a line across the model as you did before. Compare it to the other model with the unrefined mesh.
9. Let's finish cleaning up the toy block. Click on Undo/back (Ctrl + Z) to remove the pinch line that you drew. Now, use the Pinch tool to refine the edges of the model. Work around it and sharpen all the edges. Finish smoothing the planes on the block and click on Save.

We can see the results clearly when we compare the original toy block model to our modified model.

How it works...

Meshmixer works by using a mesh with a high polygon count. When a sculpting brush such as Pinch is used to manipulate the surface, it rapidly increases the polygon count in the surrounding area. When the Pinch tool crosses an area that has fewer and larger polygons, the interpolation of the area becomes distorted. We can see this when we compare the original and the remeshed model in the wireframe view. When the wireframe is hidden, we can see how the distortion in the mesh gives the model some undesirable texture along the pinch line.

It may be a good idea to examine a model's mesh before sculpting it. Meshmixer works better with a dense polygon count that is consistent in size. By using the Remesh edit, a variety of mesh densities can be achieved by making changes in Properties.
Experiment with the various settings and the sculpting brushes while in the wireframe view. This will help you gain a better understanding of how mesh surface modeling works.

Let's print!

When we 3D print a model, we have the option of controlling how solid the interior will be and what kind of structure will fill it. How we choose between the options is easily determined by answering the following questions: Will it need to be structurally strong? If it's going to be used as a mechanical part or an item that will be heavily handled, then it does. Will it be a prototype? If it's a temporary object for examination purposes or strictly for display, then a fragile form may suffice. Depending on the use of a model, you'll have to decide where the object falls between these two extremes.


How to navigate files in a Vue app using the Dropbox API

Pravin Dhandre
05 Jun 2018
13 min read
The Dropbox API is a set of HTTP endpoints that help apps integrate with Dropbox easily. It allows developers to work with files stored in Dropbox and provides access to advanced functionality such as full-text search, thumbnails, and sharing. In this article we will use the Dropbox API functionalities listed below to navigate and query your files and folders:

- Loading and querying the Dropbox API
- Listing the directories and files from your Dropbox account
- Adding a loading state to your app
- Using Vue animations

To get started you will need a Dropbox account; if you don't have one, sign up and add a few dummy files and folders. The content of your Dropbox doesn't matter, but having folders to navigate through will help with understanding the code. This article is an excerpt from a book written by Mike Street titled Vue.js 2.x by Example. The book puts Vue.js into a real-world context, guiding you through example projects to help you build Vue.js applications from scratch.

Getting started - loading the libraries

Create a new HTML page for your app to run in. Create the HTML structure required for a web page and include your app view wrapper. It's called #app here, but call it whatever you want - just remember to update the JavaScript. As our app code is going to get quite chunky, make a separate JavaScript file and include it at the bottom of the document. You will also need to include Vue and the Dropbox API SDK. As before, you can either reference the remote files or download a local copy of the library files; download a local copy for both speed and compatibility reasons. Include your three JavaScript files at the bottom of your HTML file, then create your app.js and initialize a new Vue instance, using the el property to mount the instance onto the ID in your view.

Creating a Dropbox app and initializing the SDK

Before we interact with the Vue instance, we need to connect to the Dropbox API through the SDK. This is done via an API key that is generated by Dropbox to keep track of what is connecting to your account, and it requires you to create a custom Dropbox app:

1. Head to the Dropbox developers area and select Create your app.
2. Choose Dropbox API and select either a restricted folder or full access. This depends on your needs, but for testing, choose Full Dropbox.
3. Give your app a name and click the button Create app.
4. Generate an access token for your app. To do so, when viewing the app details page, click the Generate button under Generated access token. This will give you a long string of numbers and letters - copy and paste that into your editor and store it as a variable at the top of your JavaScript. In this article the API key will be referred to as XXXX.

Now that we have our API key, we can access the files and folders in our Dropbox. Initialize the API and pass your accessToken variable to the accessToken property of the Dropbox API. We now have access to Dropbox via the dbx variable, and we can verify that our connection to Dropbox is working by connecting and outputting the contents of the root path. This code uses JavaScript promises, which are a way of adding actions to code without requiring callback functions. Take note of the first line, particularly the path variable: this lets us pass in a folder path to list the files and folders within that directory.
For example, if you had a folder called images in your Dropbox, you could change the parameter value to /images and the file list returned would be the files and folders within that directory. Open your JavaScript console and check the output; you should get an array containing several objects - one for each file or folder in the root of your Dropbox.

Displaying your data and using Vue to get it

Now that we can retrieve our data using the Dropbox API, it's time to retrieve it within our Vue instance and display it in our view. This app is going to be built entirely using components so we can take advantage of the compartmentalized data and methods. It also means the code is modular and shareable, should you want to integrate it into other apps. We are also going to take advantage of the native Vue created() function - we'll cover when it gets triggered in a bit.

Create the component

First off, create your custom HTML element, <dropbox-viewer>, in your view, and create a <script> template block at the bottom of the page for the component's HTML layout. Initialize your component in your app.js file, pointing it to the template ID. Viewing the app in the browser should show the heading from the template. The next step is to integrate the Dropbox API into the component.

Retrieve the Dropbox data

Create a new method called dropbox. In it, move the code that calls the Dropbox class and returns the instance. This gives us access to the Dropbox API through the component by calling this.dropbox(). We are also going to integrate our API key into the component: create a data function that returns an object containing your access token, and update the dropbox method to use the local version of the key.

We now need to add the ability for the component to get the directory listing. For this, we create another method that takes a single parameter - the path. This gives us the ability to later request the structure of a different path or folder if required. Use the code provided earlier, changing the dbx variable to this.dropbox(). Update the Dropbox filesListFolder function to accept the path parameter passed in, rather than a fixed value. Running this app in the browser will show the Dropbox heading, but won't retrieve any folders because the methods have not been called yet.

The Vue life cycle hooks

This is where the created() function comes in. The created() function gets called once the Vue instance has initialized the data and methods, but has yet to mount the instance on the HTML component. There are several other functions available at various points in the life cycle; more about these can be read at Alligator.io. Using the created() function gives us access to the methods and data while being able to start our retrieval process as Vue is mounting the app. The time between these various stages is split-second, but every moment counts when it comes to performance and creating a quick app. There is no point waiting for the app to be fully mounted before processing data if we can start the task early.

Create the created() function on your component and call the getFolderStructure method, passing in an empty string for the path to get the root of your Dropbox. Running the app now in your browser will output the folder list to your console, which should give the same result as before. We now need to display our list of files in the view. To do this, we are going to create an empty array in our component and populate it with the result of our Dropbox query.
This has the advantage of giving Vue a variable to loop through in the view, even before it has any content.

Displaying the Dropbox data

Create a new property in your data object titled structure, and assign it an empty array. In the response function of the folder retrieval, assign response.entries to this.structure. Leave the console.log in place, as we will need to inspect the entries to work out what to output in our template. We can now update our view to display the folders and files from your Dropbox. As the structure array is available in our view, create a <ul> with a repeatable <li> looping through the structure. As we are now adding a second element, Vue requires templates to have one containing element, so wrap your heading and list in a <div>.

Viewing the app in the browser will show a number of empty bullet points once the array appears in the JavaScript console. To work out what fields and properties you can display, expand the array in the JavaScript console and then expand each object. You should notice that each object has a collection of similar properties and a few that vary between folders and files. The first property, .tag, helps us identify whether the item is a file or a folder. Both types then have the following properties in common:

- id: a unique identifier for the item in Dropbox
- name: the name of the file or folder, irrespective of where the item is
- path_display: the full path of the item, with the case matching that of the files and folders
- path_lower: same as path_display, but all lowercase

Items with a .tag of file also contain several more fields for us to display:

- client_modified: the date when the file was added to Dropbox
- content_hash: a hash of the file, used for identifying whether it differs from a local or remote copy; more can be read about this on the Dropbox website
- rev: a unique identifier of the version of the file
- server_modified: the last time the file was modified on Dropbox
- size: the size of the file in bytes

To begin with, we are going to display the name of the item and the size, if present. Update the list item to show these properties.

More file meta information

To make our file and folder view a bit more useful, we can add richer content and metadata to files such as images. These details are made available by enabling the include_media_info option in the Dropbox API. Head back to your getFolderStructure method and add the parameter after path (the call can be split across new lines for readability). Inspecting the results from this new API call will reveal the media_info key for videos and images. Expanding it reveals several more pieces of information about the file, for example, its dimensions. If you want to add these, you will need to check that the media_info object exists before displaying the information.

Try updating the path when retrieving the data from Dropbox. For example, if you have a folder called images, change the this.getFolderStructure parameter to /images. If you're not sure what the path is, analyze the data in the JavaScript console and copy the value of the path_lower attribute of a folder.

Formatting the file sizes

With the file size being output in plain bytes, it can be quite hard for a user to decipher. To combat this, we can add a formatting method to output a file size that is more user-friendly, for example displaying 1kb instead of 1024.
First, create a new key on the data object that contains an array of units, called byteSizes. This is what will get appended to the figure, so feel free to make these properties either lowercase or full words, for example, megabyte. Next, add a new method, bytesToSize, to your component. This will take one parameter, bytes, and output a formatted string with the unit at the end. We can now utilize this method in our view.

Add loading screen

The last step of this tutorial is to make a loading screen for our app. This will tell the user the app is loading, should the Dropbox API be running slowly (or you have a lot of data to show!). The theory behind this loading screen is fairly basic. We will set a loading variable to true by default, and it gets set to false once the data has loaded. Based on this variable, we will use view attributes to show, and then hide, an element with the loading text or animation, and also reveal the loaded data list.

Create a new key in the data object titled isLoading and set it to true by default. Within the getFolderStructure method on your component, set the isLoading variable to false; this should happen within the promise, after you have set the structure. We can now utilize this variable in our view to show and hide a loading container. Create a new <div> before the unordered list containing some loading text. Feel free to add a CSS animation or an animated gif - anything to let the user know the app is retrieving data.

We now need to show the loading div only while the app is loading, and the list only once the data has loaded. As this is just one change to the DOM, we can use the v-if directive. To give you the freedom of rearranging the HTML, add the attribute to both elements instead of using v-else. To show or hide, we just need to check the status of the isLoading variable; prepending an exclamation mark to the list's condition makes it show only when the app is not loading. Our app should now show the loading container once mounted, and then show the list once the app data has been gathered.

Animating between states

As a nice enhancement for the user, we can add some transitions between components and states. Helpfully, Vue includes some built-in transition effects. Working with CSS, these transitions allow you to add fades, swipes, and other CSS animations easily when DOM elements are being inserted. More information about transitions can be found in the Vue documentation.

The first step is to add the Vue custom HTML <transition> element. Wrap both your loading and list elements with separate transition elements and give each an attribute of name with a value of fade. Now add the corresponding CSS to either the head of your document or a separate style sheet if you already have one. With the transition element, Vue adds and removes various CSS classes based on the state and timing of the transition. All of these begin with the name passed in via the attribute and are appended with the current stage of the transition.

Try the app in your browser; you should notice the loading container fading out and the file list fading in. Although in this basic example the list jumps up once the fading has completed, it's an example to help you understand using transitions in Vue. We learned to make a Dropbox viewer that lists files and folders from a Dropbox account, and we used Vue animations for navigation. Do check out the book Vue.js 2.x by Example to start integrating remote data into your web applications swiftly.


Welcome to Machine Learning using the .NET Framework

Oli Huggins
16 Mar 2016
26 min read
This article by Jamie Dixon, the author of the book Mastering .NET Machine Learning, will focus on some of the larger questions you might have about machine learning using the .NET Framework, namely: What is machine learning? Why should we consider it in the .NET Framework? How can I get started with coding?

What is machine learning?

If you check Wikipedia, you will find a fairly abstract definition of machine learning: "Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from example inputs in order to make data-driven predictions or decisions, rather than following strictly static program instructions."

I like to think of machine learning as computer programs that produce different results as they are exposed to more information, without changing their source code (and consequently needing to be redeployed). For example, consider a game that I play with the computer. I show the computer a picture and tell it "Blue Circle". I then show it another picture and tell it "Red Circle". Next I show it a third picture and say "Green Triangle". Finally, I show it a fourth picture and ask it "What is this?". Ideally the computer would respond, "Green Circle."

This is one example of machine learning. Although I did not change my code or recompile and redeploy, the computer program can respond accurately to data it has never seen before. Also, the computer code does not have to explicitly handle each possible data permutation. Instead, we create models that the computer applies to new data. Sometimes the computer is right, sometimes it is wrong. We then feed the new data to the computer to retrain the model, so the computer gets more and more accurate over time - or, at least, that is the goal. Once you decide to implement some machine learning into your code base, another decision has to be made fairly early in the process: how often do you want the computer to learn? For example, if you create a model by hand, how often do you update it? With every new data row? Every month? Every year? Depending on what you are trying to accomplish, you might create a real-time ML model, a near-time model, or a periodic model.

Why .NET?

If you are a Windows developer, using .NET is something you do without thinking. Indeed, a vast majority of Windows business applications written in the last 15 years use managed code - most of it written in C#. Although it is difficult to categorize millions of software developers, it is fair to say that .NET developers often come from nontraditional backgrounds. Perhaps a developer came to .NET with a computer science degree, but it is equally likely s/he started writing VBA scripts in Excel, moved up to Access applications, and then into VB.NET/C# applications. Therefore, most .NET developers are likely to be familiar with C#/VB.NET and write in an imperative and perhaps OO style. The problem with this rather narrow exposure is that most machine learning classes, books, and code examples are in R or Python and very much use a functional style of writing code. Therefore, the .NET developer is at a disadvantage when acquiring machine learning skills, because of the need to learn a new development environment, a new language, and a new style of coding before learning how to write the first line of machine learning code.
If, however, that same developer could use their familiar IDE (Visual Studio) and the same base libraries (the .NET Framework), they could concentrate on learning machine learning much sooner. Also, when creating machine learning models in .NET, they have immediate impact, as you can slide the code right into an existing C#/VB.NET solution.

On the other hand, .NET is under-represented in the data science community. There are a couple of different reasons floating around for that fact. The first is that historically Microsoft was a proprietary closed system and the academic community embraced open source systems such as Linux and Java. The second reason is that much academic research uses domain-specific languages such as R, whereas Microsoft concentrated .NET on general purpose programming languages. Research that moved to industry took its language with it. However, as the researcher's role shifts from data science to building real-time programs that customers touch, the researcher is getting more and more exposure to Windows and Windows development. Whether you like it or not, all companies that create customer-facing software must have a Windows strategy, an iOS strategy, and an Android strategy.

One real advantage to writing and then deploying your machine learning code in .NET is that you get everything with one-stop shopping. I know several large companies who write their models in R and then have another team rewrite them in Python or C++ to deploy them. Also, they might write their model in Python and then rewrite it in C# to deploy on Windows devices. Clearly, if you could write and deploy in one language stack, there is a tremendous opportunity for efficiency and speed to market.

What version of the .NET Framework are we using?

The .NET Framework has been around for general release since 2002. The base of the framework is the Common Language Runtime, or CLR. The CLR is a virtual machine that abstracts much of the OS-specific functionality, such as memory management and exception handling. The CLR is loosely based on the Java Virtual Machine (JVM). Sitting on top of the CLR is the Framework Class Library (FCL), which allows different languages to interoperate with the CLR and with each other: the FCL is what allows VB.NET, C#, F#, and IronPython code to work side by side.

Since its first release, the .NET Framework has included more and more features. The first release saw support for the major platform libraries such as WinForms, ASP.NET, and ADO.NET. Subsequent releases brought in things like Windows Communication Foundation (WCF), Language Integrated Query (LINQ), and the Task Parallel Library (TPL). At the time of writing, the latest version of the .NET Framework is 4.6.2. In addition to the full-Monty .NET Framework, over the years Microsoft has released slimmed-down versions of the framework intended to run on machines with limited hardware and OS support. The most famous of these releases was the Portable Class Library (PCL) that targeted Windows RT applications running Windows 8. The most recent incarnation of this is Universal Windows Applications (UWA), targeting Windows 10.

At Connect(); in November 2015, Microsoft announced general availability of the latest edition of the .NET Framework. This release introduced .NET Core 5; in January, they decided to rename it to .NET Core 1.0. .NET Core 1.0 is intended to be a slimmed-down version of the full .NET Framework that runs on multiple operating systems (specifically targeting OS X and Linux).
The next release of ASP.NET (ASP.NET Core 1.0) sits on top of .NET Core 1.0. ASP.NET Core 1.0 applications that run on Windows can still run the full .NET Framework (https://blogs.msdn.microsoft.com/webdev/2016/01/19/asp-net-5-is-dead-int...). In this book, we will be using a mixture of ASP.NET 4.0, ASP.NET 5.0, and Universal Windows Applications. As you can guess, machine learning models (and the theory behind them) change far less frequently than framework releases, so most of the code you write on .NET 4.6 will work equally well with PCL and .NET Core 1.0. That said, the external libraries that we will use need some time to catch up, so they might work with PCL but not with .NET Core 1.0 yet. To keep things realistic, the demonstration projects will use .NET 4.6 on ASP.NET 4.x for existing (Brownfield) applications. New (Greenfield) applications will be a mixture of a UWA using PCL and ASP.NET 5.0 applications.

Why write your own?

It seems like all of the major software companies are pitching machine learning services such as Google Analytics, Amazon Machine Learning Services, IBM Watson, and Microsoft Cortana Analytics, to name a few. In addition, major software companies often try to sell products that have a machine learning component, such as Microsoft SQL Server Analysis Services, Oracle Database Add-In, IBM SPSS, or SAS JMP. I have not included some common analytical software packages such as PowerBI or Tableau because they are more data aggregation and report writing applications; although they do analytics, they do not have a machine learning component (not yet, at least). With all these options, why would you want to learn how to implement machine learning inside your applications - in effect, write some code that you could purchase elsewhere? It is the classic build versus buy decision that every department or company has to make. You might want to build because:

- You really understand what you are doing, and you can be a much more informed consumer and critic of any given machine learning package. In effect, you are building the internal skill set that your company will most likely prize. Another way to look at it: companies are not one tool away from purchasing competitive advantage, because if they were, their competitors could also buy the same tool and cancel any advantage. However, companies can be one hire away, or more likely one team away, from truly having the ability to differentiate themselves in their market.
- You can get better performance by executing locally, which is especially important for real-time machine learning and can be implemented in disconnected or slow-connection scenarios. This becomes particularly important when we start implementing machine learning with Internet of Things (IoT) devices in scenarios where the device has a lot more RAM than network bandwidth. Consider a Raspberry Pi running Windows 10 on a pipeline: network communication might be spotty, but the machine has plenty of power to run ML models.
- You are not beholden to any one vendor or company. For example, every time you implement an application with a specific vendor without thinking about how to move away from that vendor, you make yourself more dependent on the vendor and their inevitable recurring licensing costs. The next time you are talking to the CTO of a shop that has a lot of Oracle, ask him/her if they regret any decision to implement any of their business logic in Oracle databases. The answer will not surprise you.
Relatedly, a majority of this book's code is written in F#, an open source language that runs great on Windows, Linux, and OS X. You can also be much more agile and have much more flexibility in what you implement. For example, we will often re-train our models on the fly, and when you write your own code, this is fairly easy to do. If you use a third-party service, it may not even have API hooks for model training and evaluation, so near-time model changes are impossible.

Once you decide to go native, you have a choice of rolling your own code or using some of the open source assemblies out there. This book will introduce both techniques, highlight some of the pros and cons of each, and let you decide how you want to implement them. For example, you can easily write your own basic classifier that is very effective in production, but certain models, such as a neural network, will take a considerable amount of time and energy and probably will not give you the results that the open source libraries do. As a final note, since the libraries that we will look at are open source, you are free to customize pieces of them - the owners might even accept your changes. However, we will not be customizing these libraries in this book.

Why open data?

Many books on machine learning use datasets that come with the language install (such as R or Hadoop) or point to public repositories that have considerable visibility in the data science community. The most common ones are Kaggle (especially the Titanic competition) and UC Irvine's datasets. While these are great datasets and provide a common denominator, this book will expose you to datasets that come from government entities. The notion of getting data from government sources and hacking for social good is typically called open data. I believe that open data will transform how government interacts with its citizens and will make government entities more efficient and transparent. Therefore, we will use open datasets in this book, and hopefully you will consider helping out with the open data movement.

Why F#?

As we will be on the .NET Framework, we could use either C#, VB.NET, or F#. All three languages have strong support within Microsoft and all three will be around for many years. F# is the best choice for this book because it is unique in the .NET Framework for thinking in the scientific method and machine learning model creation. Data scientists will feel right at home with the syntax and IDE (languages such as R are also functional-first languages). It is the best choice for .NET business developers because it is built right into Visual Studio and plays well with your existing C#/VB.NET code. The obvious alternative is C#. Can I do this all in C#? Yes, kind of. In fact, many of the .NET libraries we will use are written in C#. However, using C# in our code base would make it larger and increase the chance of introducing bugs into the code. At certain points, I will show some examples in C#, but the majority of the book is in F#.

Another alternative is to forgo .NET altogether and develop the machine learning models in R and Python. You could spin up a web service (such as AzureML), which might be good in some scenarios, but in disconnected or slow network environments, you will get stuck. Also, assuming comparable machines, executing locally will perform better than going over the wire. When we implement our models to do real-time analytics, anything we can do to minimize the performance hit is worth considering.
A third alternative that .NET developers will consider is to write the models in T-SQL. Indeed, many of our initial models have been implemented in T-SQL and are part of SQL Server Analysis Services. The advantage of doing it on the data server is that the computation is as close as you can get to the data, so you will not suffer the latency of moving large amounts of data over the wire. The downsides of using T-SQL are that you can't implement unit tests easily, your domain logic moves away from the application and to the data server (which is considered bad form in most modern application architecture), and you become reliant on a specific implementation of the database. F# is open source and runs on a variety of operating systems, so you can port your code much more easily.

Getting ready for Machine Learning

In this section, we will install Visual Studio, take a quick lap around F#, and install the major open source libraries that we will be using.

Setting up Visual Studio

To get going, you will need to download Visual Studio on a Microsoft Windows machine. As of this writing, the latest (free) version is Visual Studio 2015 Community. If you have a higher version already installed on your machine, you can skip this step. If you need a copy, head over to the Visual Studio home page at https://www.visualstudio.com. Download the Visual Studio Community 2015 installer and execute it. Now, you will get the following screen: Select Custom installation and you will be taken to the following screen: Make sure Visual F# has a check mark next to it. Once it is installed, you should see Visual Studio in your Windows Start menu.

Learning F#

One of the great features of F# is that you can accomplish a whole lot with very little code. It is a very terse language compared to C# and VB.NET, so picking up the syntax is a bit easier. Although this is not a comprehensive introduction, it will introduce you to the major language features that we will use in this book. I encourage you to check out http://www.tryfsharp.org/ or the tutorials at http://fsharpforfunandprofit.com/ if you want to get a deeper understanding of the language. With that in mind, let's create our first F# project:

1. Start Visual Studio.
2. Navigate to File | New | Project as shown in the following screenshot:
3. When the New Project dialog box appears, navigate the tree view to Visual F# | Windows | Console Application. Have a look at the following screenshot:
4. Give your project a name, hit OK, and the Visual Studio template generator will create the following boilerplate. Although Visual Studio created a Program.fs file that creates a basic console .exe application for us, we will start learning F# in a different way, so we are going to ignore it for now.
5. Right-click in the Solution Explorer and navigate to Add | New Item.
6. When the Add New Item dialog box appears, select Script File. The Script1.fsx file is then added to the project.
7. Once Script1.fsx is created, open it up and enter the following into the file:

let x = "Hello World"

8. Highlight that entire row of code, right-click, and select Execute In Interactive (or press Alt + Enter).

The F# Interactive console will pop up and you will see this: The F# Interactive is a type of REPL, which stands for Read-Evaluate-Print-Loop. If you are a .NET developer who has spent any time in SQL Server Management Studio, the F# Interactive will look very familiar to the Query Analyzer, where you enter your code at the top and see how it executes at the bottom.
Also, if you are a data scientist using RStudio, you are very familiar with the concept of a REPL. I have used the words REPL and FSI interchangeably in this book. There are a couple of things to notice about this first line of F# code you wrote. First, it looks very similar to C#. In fact, consider changing the code to this:

var x = "Hello World";

It would be perfectly valid C#. Note the red squiggly line, showing you that the F# compiler certainly does not think this is valid. Going back to the correct code, notice that the type of x is not explicitly defined. F# uses the concept of inferred typing so that you don't have to write the type of the values that you create. I used the term value deliberately because, unlike variables, which can be assigned in C# and VB.NET, values are immutable; once bound, they can never change. Here, we are permanently binding the name x to its value, Hello World. This notion of immutability might seem constraining at first, but it has profound and positive implications, especially when writing machine learning models.

With our basic program idea proven out, let's move it over to a compilable assembly; in this case, an .exe that targets the console. Highlight the line that you just wrote, press Ctrl + C, and then open up Program.fs. Go into the code that was generated and paste it in:

[<EntryPoint>]
let main argv =
    printfn "%A" argv
    let x = "Hello World"
    0 // return an integer exit code

Then, add the following lines of code around what you just added:

// Learn more about F# at http://fsharp.org
// See the 'F# Tutorial' project for more help.
open System

[<EntryPoint>]
let main argv =
    printfn "%A" argv
    let x = "Hello World"
    Console.WriteLine(x)
    let y = Console.ReadKey()
    0 // return an integer exit code

Press the Start button (or hit F5) and you should see your program run. You will notice that I had to bind the return value from Console.ReadKey() to y. In C# or VB.NET, you can get away with not handling the return value explicitly. In F#, you are not allowed to ignore returned values. Although some might think this is a limitation, it is actually a strength of the language. It is much harder to make a mistake in F# because the language forces you to address execution paths explicitly versus accidentally sweeping them under the rug (or into a null, but we'll get to that later). In any event, let's go back to our script file and enter in another line of code:

let ints = [|1;2;3;4;5;6|]

If you send that line of code to the REPL, you should see this:

val ints : int [] = [|1; 2; 3; 4; 5; 6|]

This is an array, as if you did this in C#:

var ints = new[] {1,2,3,4,5,6};

Notice that the separator is a semicolon in F# and not a comma. This differs from many other languages, including C#. The comma in F# is reserved for tuples, not for separating items in an array. We'll discuss tuples later. Now, let's sum up the values in our array:

let summedValue = ints |> Array.sum

When you send that line to the REPL, you should see this:

val summedValue : int = 21

There are two things going on here. We have the |> operator, which is the pipe forward operator. If you have experience with Linux or PowerShell, this should be familiar. However, if you have a background in C#, it might look unfamiliar. The pipe forward operator takes the result of the value on the left-hand side of the operator (in this case, ints) and pushes it into the function on the right-hand side (in this case, sum). The other new language construct is Array.sum.
Array is a module in the core F# libraries that has a series of functions you can apply to your data. The function sum, well, sums the values in the array, as you can probably guess by inspecting the result. So, now, let's add a different function from the Array module:

let multiplied = ints |> Array.map (fun i -> i * 2)

If you send it to the REPL, you should see this:

val multiplied : int [] = [|2; 4; 6; 8; 10; 12|]

Array.map is an example of a higher-order function that is part of the Array module. Its parameter is another function. Effectively, we are passing a function into another function. In this case, we are creating an anonymous function that takes a parameter i and returns i * 2. You know it is an anonymous function because it starts with the keyword fun, and the IDE makes it easy for us to understand that by making it blue. This anonymous function is also called a lambda expression, which has been in C# and VB.NET since .NET 3.5, so you might have run across it before. If you have a data science background using R, you are already quite familiar with lambdas. Getting back to the higher-order function Array.map, you can see that it applies the lambda function against each item of the array and returns a new array with the new values. We will be using Array.map (and its more generic kin Seq.map) a lot when we start implementing machine learning models, as it is the best way to transform an array of data. Also, if you have been paying attention to the buzzword map/reduce used to describe big data applications such as Hadoop, the word map means exactly the same thing in this context. One final note: because of immutability in F#, the original array is not altered; instead, multiplied is bound to a new array. Let's stay in the script and add a couple more lines of code:

let multiplyByTwo x = x * 2

If you send it to the REPL, you should see this:

val multiplyByTwo : x:int -> int

This creates a named function called multiplyByTwo, which takes a single parameter x and returns the value of the parameter multiplied by 2. This is exactly the same as the anonymous function we created in-line earlier and passed into the map function. The syntax might seem a bit strange because of the -> operator. You can read it as, "the function multiplyByTwo takes in a parameter called x of type int and returns an int." Note three things here. First, parameter x is inferred to be an int because it is multiplied by another int in the body of the function. If the function read x * 2.0, x would have been inferred as a float. This is a significant departure from C# and VB.NET, but pretty familiar to people who use R. Second, there is no return statement for the function; instead, the final expression of any function is always returned as the result. The last thing to note is that whitespace is important and the indentation is required. If the code was written like this:

let multiplyByTwo(x) =
x * 2

The compiler would complain:

Script1.fsx(8,1): warning FS0058: Possible incorrect indentation: this token is offside of context started at position (7:1).

Since F# does not use curly braces and semicolons (or the end keyword), such as C# or VB.NET, it needs to use something to separate code. That separation is whitespace. Since it is good coding practice to use whitespace judiciously, this should not be very alarming to people with a C# or VB.NET background. If you have a background in R or Python, this should seem natural to you.
Since multiplyByTwo is the functional equivalent of the lambda created in Array.map (fun i -> i * 2), we can do this if we want:

let multiplied' = ints |> Array.map (fun i -> multiplyByTwo i)

If you send it to the REPL, you should see this:

val multiplied' : int [] = [|2; 4; 6; 8; 10; 12|]

Typically, we will use named functions when we need to use a function in several places in our code, and we will use a lambda expression when we only need that function for a specific line of code. There is another minor thing to note. I used the tick notation for the value multiplied' when I wanted to create another value representing the same idea. This kind of notation is used frequently in the scientific community, but it can get unwieldy if you attempt to use it for a third or even fourth (multiplied'''') representation. Next, let's add another named function to the REPL:

let isEven x =
    match x % 2 = 0 with
    | true -> "even"
    | false -> "odd"

isEven 2
isEven 3

If you send it to the REPL, you should see this:

val isEven : x:int -> string

This is a function named isEven that takes a single parameter x. The body of the function uses a pattern-matching statement to determine whether the parameter is odd or even. When it is odd, it returns the string odd. When it is even, it returns the string even. There is one really interesting thing going on here. The match statement is a basic example of pattern matching, and it is one of the coolest features of F#. For now, you can consider the match statement much like the switch statement that you may be familiar with in C# or VB.NET. I could have written the conditional logic like this:

let isEven' x = if x % 2 = 0 then "even" else "odd"

But I prefer to use pattern matching for this kind of conditional logic. In fact, I will attempt to go through this entire book without using an if…then statement. With isEven written, I can now chain my functions together like this:

let multipliedAndIsEven = ints |> Array.map (fun i -> multiplyByTwo i) |> Array.map (fun i -> isEven i)

If you send it to the REPL, you should see this:

val multipliedAndIsEven : string [] = [|"even"; "even"; "even"; "even"; "even"; "even"|]

In this case, the resulting array from the first pipe, Array.map (fun i -> multiplyByTwo i), gets sent to the next function, Array.map (fun i -> isEven i). This means we might have three arrays floating around in memory: ints, which is passed into the first pipe; the result from the first pipe, which is passed into the second pipe; and the result from the second pipe. From your mental model point of view, you can think about each array being passed from one function to the next. In this book, I will be chaining pipe forwards frequently, as it is such a powerful construct and it perfectly matches the thought process we follow when creating and using machine learning models. You now know enough F# to get up and running with the first machine learning models in this book. I will introduce other F# language features as the book goes along, but this is a good start. As you will see, F# is truly a powerful language where a simple syntax can lead to very complex work.

Third-party libraries

The following are a few third-party libraries that we will cover later in this book:

Math.NET

Math.NET is an open source project that was created to augment (and sometimes replace) the functions that are available in System.Math. Its home page is http://www.mathdotnet.com/.
We will be using Math.NET's Numerics and Symbolics namespaces in some of the machine learning algorithms that we will write by hand. A nice feature of Math.NET is that it has strong support for F#.

Accord.NET

Accord.NET is an open source project that was created to implement many common machine learning models. Its home page is http://accord-framework.net/. Although the original focus of Accord.NET was computer vision and signal processing, we will be using it extensively in this book, as it makes it very easy to implement algorithms in our problem domain.

Numl

Numl is an open source project that implements several common machine learning models as experiments. Its home page is http://numl.net/. Numl is newer than the other third-party libraries that we will use in the book, so it may not be as extensive as the others, but it can be very powerful and helpful in certain situations.

Summary

We covered a lot of ground in this article. We discussed what machine learning is, why you would want to learn about it on the .NET stack, how to get up and running using F#, and had a brief introduction to the major open source libraries that we will be using in this book. With all this preparation out of the way, we are ready to start exploring machine learning.

Core Data: Designing a Data Model and Building Data Objects

Packt
05 May 2011
7 min read
To design a data model with Core Data, we need to create a new project. So, let's start there...

Creating a new project

To create a new project, perform the following steps:

1. Launch Xcode and create a new project by selecting the File | New Project option. The New Project Assistant window will appear, prompting us to select a template for the new project, as shown in the next screenshot. We will select the Navigation-based Application template. Ensure that the Use Core Data for storage checkbox is checked and click on the Choose... button.
2. On selecting the Choose... button, we will get a dialog box to specify the name and the location of the project. Let us keep the location the same as the default (the Documents folder) and assign the project the name prob (or any name). Click on Save. Xcode will then generate the project files and the project opens in the Xcode project window.

The Use Core Data for storage checkbox asks Xcode to provide all the default code that is required for using Core Data. This option is visible with only two project templates: the Navigation-based Application and Window-based Application templates.

Designing the data model

Designing a data model means defining entities, attributes, and relationships for our application using a special tool. Xcode includes a data modeling tool (also known as the Data Model Editor or simply a modeler) that facilitates the creation of entities, the definition of attributes, and the relationships among them.

Data Model Editor

The Data Model Editor is a data modeling tool provided by Xcode that makes the job of designing a data model quite easy. It displays a browser as well as a diagram view of the data model. The Browser view displays two panes, the Entity pane and the Properties pane, for defining entities and their respective properties. The diagram view displays rounded rectangles that designate entities and lines to show relationships among the entities.

Adding an entity

To add an entity to our data model, perform the following steps:

1. Invoke the data modeling tool by double-clicking the prob.xcdatamodel file in the Resources group found in the Xcode Project window. Xcode's data modeling tool will open, and we will find that an entity has already been created for us by default, named Event (as shown in the next image), with a single attribute: timeStamp.
2. We can delete or rename the default Event entity as desired. Let us select the default Event entity and delete it by clicking on the minus (-) button in the Entity pane, followed by either clicking the plus (+) button in the Entity pane or choosing the Design | Data Model | Add Entity option from the menu bar. This will add a blank entity (named Entity) to our data model, which we can rename as per our requirements. Let us set the name of the new entity to Customer. Automatically, an instance of NSManagedObject will be created to represent our newly created Customer entity. The next step is to add attributes to this entity.

Adding an attribute property

We want to add three attributes, named name, emailid, and contactno, to the Customer entity. Let's follow the steps mentioned next to do so:

1. Select the entity and choose the Design | Data Model | Add Attribute option from the menu bar, or select the + (plus) button in the Property pane. A menu with several options, such as Add Attribute, Add Fetched Property, Add Relationship, and Add Fetch Request, will pop up. We select the Add Attribute option from the pop-up menu.
2. We see that a new attribute property is created for our Customer entity with the default name newAttribute in the inspector. Let us rename our new attribute to name (as we will be using this attribute to store the names of the customers). Then, we set the type of the name attribute to String, as shown in the next screenshot (as names consist of strings).

Below the Name field are three checkboxes: Optional, Transient, and Indexed. Though we will only be using the Optional checkbox for the name attribute, let us see the usage of all three:

Optional: If this checkbox is checked, it means the entity can be saved even if the attribute is nil (empty). If this checkbox is unchecked and we try to save the entity with this attribute set to nil, it will result in a validation error. When used with a relationship, if the checkbox is checked, it means that the relationship can be empty. Suppose that we create one more entity, say Credit Card (where information about the customer's credit card is kept). In that case, the relationship from the customer to the credit card will be optional (we have to leave this checkbox checked), as a customer may or may not have a credit card. And if we create an entity, say Product, then the relationship from the Customer to the Product cannot be empty, as a customer will definitely buy at least a single product (the checkbox has to be unchecked).

Transient: This checkbox, if checked, means that the attribute or the relationship is of a temporary nature and we don't want it to be stored (persisted) in the persistent store. This checkbox must be unchecked for the attributes or relationships that we want to persist (to be stored on the disk).

Indexed: This checkbox has to be checked to apply indexing on the attribute. It is used when we want to perform sorting or searching on some attribute. By checking this checkbox, an index will be created on that attribute and the database will be ordered on that attribute.

Types of attributes

Using the Type drop-down list control, we select the data type (that is, numerical, string, date, and so on) of the attribute to specify the kind of information that can be stored in the attribute. The following is the list of data types:

Integer 16, Integer 32, and Integer 64 data types are for storing signed integers. The ranges of values that these types are able to store are as follows:
Integer 16: -32,768 to 32,767
Integer 32: -2,147,483,648 to 2,147,483,647
Integer 64: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

Decimal, Double, and Float data types are for storing fractional numbers. The Double data type uses 64 bits to store a value, while the Float data type uses 32 bits. The only limitation of these two data types is that they round off the values. To avoid any rounding of values, the Decimal data type is preferred. The Decimal type uses fixed-point numbers for storing values, so the numerical value stored in it is not rounded off.

String data type is used for storing text content.

Boolean data type is used for storing YES or NO values.

Date data type is used for storing dates as well as timestamps.

Binary data type is used for storing binary data.

Transformable data type works along with value transformers that help us create attributes based on any Objective-C class; that is, we can create custom data types other than the standard data types. This data type can be used to store an instance of UIColor, UIImage, and so on. It archives objects to instances of NSData.
Below the Type drop-down menu, we will see a few more fields in the detail pane, as shown in the next screenshot:

Fields applying constraints

The Min Length: and Max Length: fields are for applying constraints on the minimum and maximum number of characters in an attribute. If we exceed the range supplied in these fields, we get a validation error; that is, if we enter a string with fewer characters than the value supplied in the Min Length: field, or a string with more characters than the value supplied in the Max Length: field, this will result in a validation error while saving managed objects.

The Reg. Ex: field stands for regular expression and is used for applying validation checks on the data entered in the attribute by making use of regular expressions.

The Default Value: field is for specifying the default value of the attribute. If we create a new managed object, the attribute will automatically be set to the default value specified in this field.

Let us add two more attributes to the Customer entity: emailid and contactno (for storing a customer's e-mail address and contact number, respectively). These two attributes will also be of type String, as shown in the next screenshot. Now, save the .xcdatamodel.
Game Development Using C++

Packt
07 Jun 2016
14 min read
C++ is one of the most popular languages for game development as it supports a variety of coding styles that provide low-level access to the system. In this article by Druhin Mukherjee, author of the book C++ Game Development Cookbook, we will go through the basics of game development using C++.

Creating your first simple game

Creating a simple text-based game is really easy. All we need to do is to create some rules and logic and we will have ourselves a game. Of course, as the game gets more complex, we need to add more functions. When the game reaches a point where there are multiple behaviors and states of objects and enemies, we should use classes and inheritance to achieve the desired result.

Getting ready

To work through this recipe, you will need a machine running Windows. You also need to have a working copy of Visual Studio installed on your Windows machine. No other prerequisites are required.

How to do it…

In this recipe, we will learn how to create a simple luck-based lottery game:

1. Open Visual Studio.
2. Create a new C++ project.
3. Select Win32 Console Application.
4. Add a Source.cpp file.
5. Add the following lines of code to it:

#include <iostream>
#include <cstdlib>
#include <ctime>

int main(void)
{
    srand(time(NULL)); // To not have the same numbers over and over again.

    while (true) { // Main loop.
        // Initialize and allocate.
        int inumber = rand() % 100 + 1; // System number is stored in here.
        int iguess;                     // User guess is stored in here.
        int itries = 0;                 // Number of tries is stored here.
        char canswer;                   // User answer to question is stored here.

        while (true) { // Get user number loop.
            // Get number.
            std::cout << "Enter a number between 1 and 100 (" << 20 - itries << " tries left): ";
            std::cin >> iguess;
            std::cin.ignore();

            // Check if tries are taken up.
            if (itries >= 20) {
                break;
            }

            // Check number.
            if (iguess > inumber) {
                std::cout << "Too high! Try again.\n";
            } else if (iguess < inumber) {
                std::cout << "Too low! Try again.\n";
            } else {
                break;
            }

            // If not the number, increment tries.
            itries++;
        }

        // Check for tries.
        if (itries >= 20) {
            std::cout << "You ran out of tries!\n\n";
        } else {
            // Or, user won.
            std::cout << "Congratulations!! " << std::endl;
            std::cout << "You got the right number in " << itries << " tries!\n";
        }

        while (true) { // Loop to ask user if he/she would like to play again.
            // Get user response.
            std::cout << "Would you like to play again (Y/N)? ";
            std::cin >> canswer;
            std::cin.ignore();

            // Check if proper response.
            if (canswer == 'n' || canswer == 'N' || canswer == 'y' || canswer == 'Y') {
                break;
            } else {
                std::cout << "Please enter 'Y' or 'N'...\n";
            }
        }

        // Check user's input and run again or exit.
        if (canswer == 'n' || canswer == 'N') {
            std::cout << "Thank you for playing!";
            break;
        } else {
            std::cout << "\n\n\n";
        }
    }

    // Safely exit.
    std::cout << "\n\nEnter anything to exit. . . ";
    std::cin.ignore();
    return 0;
}

How it works…

The game works by creating a random number from 1 to 100 and asking the user to guess that number. Hints are provided as to whether the number guessed is higher or lower than the actual number. The user is given just 20 tries to guess the number. We first need a pseudo seeder, based on which we are going to generate a random number. The pseudo seeder in this case is srand. We have chosen TIME as the value with which to seed our random range.
We need to execute the program in an infinite loop so that the program breaks only when all tries are used up or when the user correctly guesses the number. We can set a variable for tries and increment it for every guess the user takes. The random number is generated by the rand function. We use rand() % 100 + 1 so that the random number is in the range 1 to 100. We ask the user to input the guessed number and then we check whether that number is less than, greater than, or equal to the randomly generated number. We then display the correct message. If the user has guessed correctly, or all tries have been used, the program should break out of the main loop. At this point, we ask the user whether they want to play the game again. Then, depending on the answer, we go back into the main loop and start the process of selecting a random number.

Creating your first window

Creating a window is the first step in Windows programming. All our sprites and other objects will be drawn on top of this window. There is a standard way of drawing a window, so this part of the code will be repeated in all programs that use Windows programming to draw something.

Getting ready

You need to have a working copy of Visual Studio installed on your Windows machine.

How to do it…

In this recipe, we will find out how easy it is to create a window:

1. Open Visual Studio.
2. Create a new C++ project.
3. Select a Win32 Windows application.
4. Add a source file called Source.cpp.
5. Add the following lines of code to it:

#define WIN32_LEAN_AND_MEAN

#include <windows.h>   // Include all the windows headers.
#include <windowsx.h>  // Include useful macros.
#include "resource.h"

#define WINDOW_CLASS_NAME L"WINCLASS1"

void GameLoop()
{
    // One frame of game logic occurs here...
}

LRESULT CALLBACK WindowProc(HWND _hwnd, UINT _msg, WPARAM _wparam, LPARAM _lparam)
{
    // This is the main message handler of the system.
    PAINTSTRUCT ps; // Used in WM_PAINT.
    HDC hdc;        // Handle to a device context.

    // What is the message?
    switch (_msg)
    {
    case WM_CREATE:
    {
        // Do initialization stuff here.

        // Return Success.
        return (0);
    } break;

    case WM_PAINT:
    {
        // Simply validate the window.
        hdc = BeginPaint(_hwnd, &ps);

        // You would do all your painting here...

        EndPaint(_hwnd, &ps);

        // Return Success.
        return (0);
    } break;

    case WM_DESTROY:
    {
        // Kill the application, this sends a WM_QUIT message.
        PostQuitMessage(0);

        // Return success.
        return (0);
    } break;

    default: break;
    } // End switch.

    // Process any messages that we did not take care of...
    return (DefWindowProc(_hwnd, _msg, _wparam, _lparam));
}

int WINAPI WinMain(HINSTANCE _hInstance, HINSTANCE _hPrevInstance, LPSTR _lpCmdLine, int _nCmdShow)
{
    WNDCLASSEX winclass; // This will hold the class we create.
    HWND hwnd;           // Generic window handle.
    MSG msg;             // Generic message.

    HCURSOR hCrosshair = LoadCursor(_hInstance, MAKEINTRESOURCE(IDC_CURSOR2));

    // First fill in the window class structure.
    winclass.cbSize = sizeof(WNDCLASSEX);
    winclass.style = CS_DBLCLKS | CS_OWNDC | CS_HREDRAW | CS_VREDRAW;
    winclass.lpfnWndProc = WindowProc;
    winclass.cbClsExtra = 0;
    winclass.cbWndExtra = 0;
    winclass.hInstance = _hInstance;
    winclass.hIcon = LoadIcon(NULL, IDI_APPLICATION);
    winclass.hCursor = LoadCursor(_hInstance, MAKEINTRESOURCE(IDC_CURSOR2));
    winclass.hbrBackground = static_cast<HBRUSH>(GetStockObject(WHITE_BRUSH));
    winclass.lpszMenuName = NULL;
    winclass.lpszClassName = WINDOW_CLASS_NAME;
    winclass.hIconSm = LoadIcon(NULL, IDI_APPLICATION);

    // Register the window class.
    if (!RegisterClassEx(&winclass))
    {
        return (0);
    }

    // Create the window.
    hwnd = CreateWindowEx(NULL,           // Extended style.
        WINDOW_CLASS_NAME,                // Class.
        L"Packt Publishing",              // Title.
        WS_OVERLAPPEDWINDOW | WS_VISIBLE,
        0, 0,                             // Initial x,y.
        400, 400,                         // Initial width, height.
        NULL,                             // Handle to parent.
        NULL,                             // Handle to menu.
        _hInstance,                       // Instance of this application.
        NULL);                            // Extra creation parameters.

    if (!(hwnd))
    {
        return (0);
    }

    // Enter main event loop.
    while (true)
    {
        // Test if there is a message in queue, if so get it.
        if (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
        {
            // Test if this is a quit.
            if (msg.message == WM_QUIT)
            {
                break;
            }

            // Translate any accelerator keys.
            TranslateMessage(&msg);
            // Send the message to the window proc.
            DispatchMessage(&msg);
        }

        // Main game processing goes here.
        GameLoop(); // One frame of game logic occurs here...
    }

    // Return to Windows like this...
    return (static_cast<int>(msg.wParam));
}

How it works…

In this example, we have used the standard Windows API callback. We query the message parameter that is passed and, based on that, we intercept it and perform suitable actions. We have used the WM_PAINT message to paint the window for us and the WM_DESTROY message to destroy the current window. To paint the window, we need a handle to the device context, and then we can use BeginPaint and EndPaint appropriately. In the main structure, we need to fill in the Windows structures and specify the current cursor and icons that need to be loaded. Here, we can specify what color brush we are going to use to paint the window. Finally, the size of the window is specified and the class is registered. After that, we need to continuously peek at messages, translate them, and finally dispatch them to the Windows procedure.

Adding artificial intelligence to a game

Adding artificial intelligence to a game may be easy or extremely difficult, based on the level of realism or complexity we are trying to achieve. In this recipe, we will start with the basics of adding artificial intelligence.

Getting ready

To work through this recipe, you will need a machine running Windows and a version of Visual Studio. No other prerequisites are required.

How to do it…

In this recipe, we will see how easy it is to add basic artificial intelligence to a game. Add a source file called Source.cpp and add the following code to it:

// Basic AI : Keyword identification

#include <iostream>
#include <string>
#include <string.h>

std::string arr[] = { "Hello, what is your name ?", "My name is Siri" };

int main()
{
    std::string UserResponse;

    std::cout << "Enter your question? ";
"; std::cin>>UserResponse;   if (UserResponse == "Hi")   { std::cout<<arr[0] <<std::endl; std::cout<<arr[1];   }     int a; std::cin>> a; return 0;   } How it works… In the previous example, we are using a string array to store a response. The idea of the software is to create an intelligent chat bot that can reply to questions asked by users and interact with them as if it were human. Hence the first task was to create an array of responses. The next thing to do is to ask the user for the question. In this example, we are searching for a basic keyword called Hi and, based on that, we are displaying the appropriate answer.  Of course, this is a very basic implementation. Ideally we would have a list of keywords and responses when either of the keywords is triggered. We can even personalize this by asking the user for their name and then appending it to the answer every time. The user may also ask to search for something. That is actually quite an easy thing to do. If we have detected the word that the user is longing to search for correctly, we just need to enter that into the search engine. Whatever result the page displays, we can report it back to the user. We can also use voice commands to enter the questions and give the responses. In this case, we would also need to implement some kind of NLP (Natural Language Processing). After the voice command is correctly identified, all the other processes are exactly the same. Using the const keyword to optimize your code Aconst keyword is used to make data or a pointer constant so that we cannot change the value or address, respectively. There is one more advantage of using the const keyword. This is particularly useful in the object-oriented paradigm. Getting ready For this recipe, you will need a Windows machine and an installed version of Visual Studio. How to do it… In this recipe, we will find out how easy it is to use the const keyword effectively: #include <iostream>   class A { public:   voidCalc()const   { Add(a, b);     //a = 9;       // Not Allowed   } A()   {     a = 10;     b = 10;     } private:   int a, b; void Add(int a, int b)const   {   std::cout<< a + b <<std::endl;   } };   int main() {     A _a;   _a.Calc();   int a; std::cin>> a;   return 0; } How it works… In this example, we are writing a simple application to add two numbers. The first function is a public function. This mean that it is exposed to other classes. Whenever we write public functions, we must ensure that they are not harming any private data of that class. As an example, if the public function was to return the values of the member variables or change the values, then this public function is very risky. Therefore, we must ensure that the function cannot modify any member variables by adding the const keyword at the end of the function. This ensures that the function is not allowed to change any member variables. If we try to assign a different value to the member, we will get a compiler error: errorC3490: 'a' cannot be modified because it is being accessed through a const object. So this makes the code more secure. However, there is another problem. This public function internally calls another private function. What if this private function modifies the values of the member variables? Again, we will be at the same risk. As a result, C++ does not allow us to call that function unless it has the same signature of const at the end of the function. This is to ensure that the function cannot change the values of the member variables. 
Summary

C++ is still used as the game programming language of choice by many, as it gives game programmers control of the entire architecture, including memory patterns and usage. For detailed game development using C++, you can refer to:

https://www.packtpub.com/game-development/procedural-content-generation-c-game-development
https://www.packtpub.com/game-development/learning-c-creating-games-ue4


C10K – A Non-blocking Web Server in Go

Packt
18 Jul 2014
17 min read
This article by Nathan Kozyra, author of Mastering Concurrency in Go, tackles one of the Internet's most famous and esteemed challenges and attempts to solve it with core Go packages.

We've built a few usable applications; things we can start with and leapfrog into real systems for everyday use. By doing so, we've been able to demonstrate the basic and intermediate-level patterns involved in Go's concurrent syntax and methodology. However, it's about time we take on a real-world problem, one that vexed developers (and their managers and VPs) for a great deal of the early history of the Web. In addressing and, hopefully, solving this problem, we'll be able to develop a high-performance web server that can handle a very large volume of live, active traffic. For many years, the solution to this problem was solely to throw hardware or intrusive caching systems at it; so, solving it with programming methodology instead should excite any programmer. We'll be using every technique and language construct we've learned so far, but we'll do so in a more structured and deliberate way than we have up to now. Everything we've explored so far will come into play, including the following points:

- Creating a visual representation of our concurrent application
- Utilizing goroutines to handle requests in a way that will scale
- Building robust channels to manage communication between goroutines and the loop that will manage them
- Profiling and benchmarking tools (JMeter, ab) to examine the way our event loop actually works
- Timeouts and concurrency controls, when necessary, to ensure data and request consistency

Attacking the C10K problem

The genesis of the C10K problem is rooted in serial, blocking programming, which makes it ideal for demonstrating the strength of concurrent programming, especially in Go. When Dan Kegel first posed the question in 1999, for many server admins and engineers, serving 10,000 concurrent visitors was something that would be solved with hardware. The notion that a single server on common hardware could handle this type of CPU and network bandwidth without falling over seemed foreign to most. The crux of his proposed solutions relied on producing non-blocking code. Of course, in 1999, concurrency patterns and libraries were not widespread. C++ had some polling and queuing options available via third-party libraries and the earliest predecessors of multithreaded syntax, later available through Boost and then C++11. Over the coming years, solutions to the problem began pouring in across various flavors of languages, programming design, and general approaches. Any performance and scalability problem will ultimately be bound to the underlying hardware, so as always, your mileage may vary. Squeezing 10,000 concurrent connections out of a 486 processor with 500 MB of RAM will certainly be more challenging than doing so on a barebones Linux server stacked with memory and multiple cores. It's also worth noting that a simple echo server would obviously be able to handle more connections than a functional web server that returns larger amounts of data and accepts greater complexity in requests, sessions, and so on, as we'll be dealing with here.

Failing of servers at 10,000 concurrent connections

When the Web was born and the Internet commercialized, the level of interactivity was pretty minimal. If you're a graybeard, you may recall the transition from NNTP/IRC and the like, and how extraordinarily rudimentary the Web was.
To address the basic proposition of [page request] → [HTTP response], the requirements on a web server in the early 1990s were pretty lenient. Ignoring all of the error responses, header reading and settings, and other essential (but unrelated to the in → out mechanism) functions, the essence of the early servers was shockingly simple, at least compared to modern web servers. The first web server was developed by the father of the Web, Tim Berners-Lee. Developed at CERN (as was WWW/HTTP itself), CERN httpd handled many of the things you would expect in a web server today; hunting through the code, you'll find a lot of notation that will remind you that the very core of the HTTP protocol is largely unchanged. Unlike most technologies, HTTP has had an extraordinarily long shelf life. Written in C in 1990, it was unable to utilize many of the concurrency strategies available in languages such as Erlang. Frankly, doing so was probably unnecessary; the majority of web traffic was a matter of basic file retrieval and protocol. The meat and potatoes of a web server were not dealing with traffic, but rather dealing with the rules surrounding the protocol itself. You can still access the original CERN httpd site and download the source code for yourself from http://www.w3.org/Daemon/. I highly recommend that you do so, both as a history lesson and as a way to see how the earliest web server addressed some of the earliest problems. However, the Web in 1990 and the Web when the C10K question was first posed were two very different environments. By 1999, most sites had some level of secondary or tertiary latency introduced by third-party software, CGI, databases, and so on, all of which further complicated the matter. The notion of serving 10,000 flat files concurrently is a challenge in itself, but try doing so by running them on top of a Perl script that accesses a MySQL database without any caching layer; the challenge is immediately exacerbated. By the mid 1990s, the Apache web server had taken hold and largely controlled the market (by 2009, it had become the first server software to serve more than 100 million websites). Apache's approach was rooted heavily in the earliest days of the Internet. At its launch, connections were initially handled first in, first out. Soon, each connection was assigned a thread from the thread pool. There are two problems with the Apache server:

- Blocking connections can lead to a domino effect, wherein one or more slowly resolved connections can avalanche into inaccessibility
- Apache had hard limits on the number of threads/workers you could utilize, irrespective of hardware constraints

It's easy to see the opportunity here, at least in retrospect. A concurrent server that utilizes actors (Erlang), agents (Clojure), or goroutines (Go) seems to fit the bill perfectly. Concurrency does not solve the C10K problem in itself, but it absolutely provides a methodology to facilitate it. The most notable and visible example of an approach to the C10K problem today is Nginx, which was developed using concurrency patterns widely available in C by 2002 to address, and ultimately solve, the C10K problem. Nginx today is either the #2 or #3 web server in the world, depending on the source.

Using concurrency to attack C10K

There are two primary approaches to handling a large volume of concurrent requests. The first involves allocating a thread per connection. This is what Apache (and a few others) do.
On the one hand, allocating a thread to a connection makes a lot of sense: it's isolated, controllable via the application's and kernel's context switching, and can scale with increased hardware. One problem for Linux servers, on which the majority of the Web lives, is that each allocated thread reserves 8 MB of memory for its stack by default. This can (and should) be redefined, but it imposes a largely unattainable amount of memory on a single server. Even if you set the default stack size to 1 MB, we're dealing with a minimum of 10 GB of memory just to handle the overhead. This is an extreme example that's unlikely to be a real issue for a couple of reasons: first, because you can dictate the maximum amount of resources available to each thread, and second, because you can just as easily load balance across a few servers and instances rather than add 10 GB to 80 GB of RAM. Even in a threaded server environment, we're fundamentally bound to an issue that can lead to performance decreases (to the point of a crash). First, let's look at a server with connections bound to threads (as shown in the following diagram), and visualize how this can lead to logjams and, eventually, crashes: This is obviously what we want to avoid. Any I/O, network, or external process that imposes some slowdown can bring about the avalanche effect we talked about, such that our available threads are taken (or backlogged) and incoming requests begin to stack up. We can spawn more threads in this model, but as mentioned earlier, there are potential risks there too, and even this will fail to mitigate the underlying problem.

Taking another approach

In an attempt to create a web server that can handle 10,000 concurrent connections, we'll obviously leverage our goroutine/channel mechanism to put an event loop in front of our content delivery, keeping new channels recycled or created constantly. For this example, we'll assume we're building a corporate website and infrastructure for a rapidly expanding company. To do this, we'll need to be able to serve both static and dynamic content. The reason we want to introduce dynamic content is not just for the purposes of demonstration: we want to challenge ourselves to show 10,000 true concurrent connections even when a secondary process gets in the way. As always, we'll attempt to map our concurrency strategy directly to goroutines and channels. In a lot of other languages and applications, this is directly analogous to an event loop, and we'll approach it as such. Within our loop, we'll manage the available goroutines, expire or reuse completed ones, and spawn new ones where necessary. In this example visualization, we show how an event loop (and corresponding goroutines) can allow us to scale our connections without employing too many hard resources such as CPU threads or RAM: The most important step for us here is to manage that event loop. We'll want to create an open, infinite loop to manage the creation and expiration of our goroutines and their respective channels. As part of this, we will also want to do some internal logging of what's happening, both for benchmarking and debugging our application.

Building our C10K web server

Our web server will be responsible for taking requests, routing them, and serving either flat files or dynamic files with templates parsed against a few different data sources.
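Before wiring in routing and templates, it is worth sketching the shape of the goroutine-managing loop just described. The following is a minimal, illustrative sketch only, not the book's final code; the names connRequest, requests, and maxWorkers, the port, and the fixed worker count are all assumptions made for this example:

package main

import (
	"fmt"
	"net"
)

// connRequest pairs an accepted connection with simple bookkeeping data,
// which is also handy for the internal logging mentioned above.
type connRequest struct {
	conn net.Conn
	id   int
}

func main() {
	listener, err := net.Listen("tcp", ":9001")
	if err != nil {
		panic(err)
	}

	requests := make(chan connRequest, 100) // buffered channel acting as the queue
	const maxWorkers = 64                   // cap on concurrently serving goroutines

	// Spawn a bounded set of goroutines; a goroutine that finishes a
	// request loops back and picks up the next queued connection, so
	// workers are recycled rather than created per request.
	for w := 0; w < maxWorkers; w++ {
		go func(worker int) {
			for req := range requests {
				fmt.Fprintf(req.conn, "handled request %d by worker %d\n", req.id, worker)
				req.conn.Close() // expire the connection, recycle the goroutine
			}
		}(w)
	}

	// The open, infinite accept loop: queue every incoming connection.
	for id := 0; ; id++ {
		conn, err := listener.Accept()
		if err != nil {
			continue
		}
		requests <- connRequest{conn: conn, id: id}
	}
}

The point is the shape rather than the details: one accepting loop, a channel as the queue, and a bounded set of reusable goroutines, instead of a thread allocated per connection as in the Apache model.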
As mentioned earlier, if we exclusively served flat files and removed much of the processing and network latency, we'd have a much easier time handling 10,000 concurrent connections. Our goal is to approach as much of a real-world scenario as we can; very little of the Web operates on a single server in a static fashion. Most websites and applications utilize databases, CDNs (Content Delivery Networks), dynamic and uncached template parsing, and so on. We need to replicate them whenever possible. For the sake of simplicity, we'll separate our content by type and filter it through URL routing, as follows:

/static/[request]: This will serve request.html directly
/template/[request]: This will serve request.tpl after it's been parsed through Go
/dynamic/[request][number]: This will also serve request.tpl, parsed against a database source's record

By doing this, we should get a better mixture of possible HTTP request types that could impede the ability to serve large numbers of users simultaneously, especially in a blocking web server environment. You can find Go's exceptional library for generating safe, data-driven templating at http://golang.org/pkg/html/template/. By safe, we're largely referring to the ability to accept data and move it directly into templates without worrying about the sort of injection issues that are behind a large amount of malware and cross-site scripting. For the database source, we'll use MySQL here, but feel free to experiment with other databases if you're more comfortable with them. Like the html/template package, we're not going to put a lot of time into outlining MySQL and/or its variants.

Benchmarking against a blocking web server

It's only fair to add some starting benchmarks against a blocking web server first, so that we can measure the effect of concurrent versus nonconcurrent architecture. For our starting benchmarks, we'll eschew any framework and go with our old stalwart, Apache. For the sake of completeness, we'll be using an Intel i5 3GHz machine with 8 GB of RAM. While we'll benchmark our final product on Ubuntu, Windows, and OS X, we'll focus on Ubuntu for our examples. Our localhost domain will have three plain HTML files in /static, each trimmed to 80 KB. As we're not using a framework, we don't need to worry about raw dynamic requests, but only about static and dynamic requests in addition to data source requests. For all examples, we'll use a MySQL database (named master) with a table called articles that will contain 10,000 duplicate entries. Our structure is as follows:

CREATE TABLE articles (
  article_id INT NOT NULL AUTO_INCREMENT,
  article_title VARCHAR(128) NOT NULL,
  article_text VARCHAR(128) NOT NULL,
  PRIMARY KEY (article_id)
)

With ID indexes ranging sequentially from 0-10,000, we'll be able to generate random number requests, but for now, we just want to see what kind of basic response we can get out of Apache serving static pages with this machine. For this test, we'll use Apache's ab tool and then gnuplot to sequentially map the request time against the number of concurrent requests and pages; we'll do this for our final product as well, but we'll also go through a few other benchmarking tools to get some better details.

Apache's ab tool comes with the Apache web server itself. You can read more about it at http://httpd.apache.org/docs/2.2/programs/ab.html. You can download it for Linux, Windows, OS X, and more from http://httpd.apache.org/download.cgi.
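Separately, since the articles table just defined will back the /dynamic/ routes later on, here is a hedged sketch of how one of its rows might be read from Go using the standard database/sql package. The DSN, the credentials, and the choice of the go-sql-driver/mysql driver are assumptions made for illustration, not the book's configuration:

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql" // registers the "mysql" driver
)

// Article mirrors one row of the articles table defined above.
type Article struct {
	ID    int
	Title string
	Text  string
}

func main() {
	// "user:password" and the database name are placeholders.
	db, err := sql.Open("mysql", "user:password@/master")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var a Article
	// QueryRow suits a single-row lookup by primary key.
	err = db.QueryRow(
		"SELECT article_id, article_title, article_text FROM articles WHERE article_id = ?",
		42,
	).Scan(&a.ID, &a.Title, &a.Text)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(a.Title)
}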
The gnuplot utility is likewise available for Linux, Windows, OS X, and more at http://www.gnuplot.info/. So, let's see how we did. Have a look at the following graph: Ouch! Not even close. There are things we can do to tune the connections available (and the respective threads/workers) within Apache, but this is not really our goal. Mostly, we want to know what happens with an out-of-the-box Apache server. In these benchmarks, we start to drop or refuse connections at around 800 concurrent connections. More troubling is that as these requests start stacking up, we see some that exceed 20 seconds or more. When this happens in a blocking server, each request behind it is queued; requests behind that are similarly queued, and the entire thing starts to fall apart. Even if we cannot hit 10,000 concurrent connections, there's a lot of room for improvement. While a single server of any capacity is no longer the way we expect a web server environment to be designed, being able to squeeze as much performance as possible out of that server, ostensibly with our concurrent, event-driven approach, should be our goal.

Handling requests

The Gorilla toolkit certainly makes this easier, but we should also know how to intercept the functionality to impose our own custom handler. Here is a simple web router wherein we handle and direct requests using a custom http.Server struct, as shown in the following code:

var routes []string

type customRouter struct {
}

func (customRouter) ServeHTTP(rw http.ResponseWriter, r *http.Request) {
	fmt.Println(r.URL.Path)
}

func main() {
	var cr customRouter
	server := &http.Server{
		Addr:           ":9000",
		Handler:        cr,
		ReadTimeout:    10 * time.Second,
		WriteTimeout:   10 * time.Second,
		MaxHeaderBytes: 1 << 20,
	}
	server.ListenAndServe()
}

Here, instead of using a built-in URL routing muxer and dispatcher, we're creating a custom server and custom handler type to accept URLs and route requests. This allows us to be a little more robust with our URL handling. In this case, we created a basic, empty struct called customRouter and passed it to our custom server creation call. We can add more elements to our customRouter type, but we really don't need to for this simple example. All we need to do is be able to access the URLs and pass them along to a handler function. We'll have three: one for static content, one for dynamic content, and one for dynamic content from a database. Before we go that far, though, we should probably see what an absolute barebones, old-school HTTP server written in Go does when presented with the same traffic that we sent Apache's way. By old school, we mean that the server will simply accept a request and pass along a static, flat file. You could do this using a custom router as we did earlier, taking requests, opening files, and then serving them, but Go provides a much simpler way to handle this basic task in the http.FileServer method. So, to get some benchmarks for the most basic of Go servers against Apache, we'll utilize a simple FileServer and test it against a test.html page (which contains the same 80 KB file that we had with Apache). As our goal with this test is to improve our performance in serving flat and dynamic pages, the actual specs of the test suite are somewhat immaterial. We'd expect that while the metrics will not match from environment to environment, we should see a similar trajectory. That said, it's only fair that we describe the environment used for these tests; in this case, we used a MacBook Air with a 1.4 GHz i5 processor and 4 GB of memory.
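For reference, the minimal FileServer setup being benchmarked here can be written in just a few lines. The port and directory name are assumptions for illustration and may differ from the book's own test setup:

package main

import (
	"log"
	"net/http"
)

func main() {
	// Serve everything under ./static at the site root; a request
	// for /test.html maps to ./static/test.html.
	log.Fatal(http.ListenAndServe(":8080", http.FileServer(http.Dir("static"))))
}

With something like this serving the static directory on port 8080, the ab runs that follow can exercise it directly.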
First, we'll look at our absolute best out-of-the-box performance with Apache, which reached 850 concurrent connections and 900 total requests. The results are certainly encouraging as compared to Apache. Neither of our test systems was tweaked much (Apache as installed and a basic FileServer in Go), but Go's FileServer handles 1,000 concurrent connections without so much as a blip, with the slowest clocking in at 411 ms. Apache has made a great number of strides pertaining to concurrency and performance options in the last five years, but getting there does require a bit of tuning and testing. The intent of this experiment is not to denigrate Apache, which is well tested and established. Instead, it's to compare the out-of-the-box performance of the world's number 1 web server against what we can do with Go. To really get a baseline of what we can achieve in Go, let's see if Go's FileServer can hit 10,000 connections on a single, modest machine out of the box:

ab -n 10500 -c 10000 -g test.csv http://localhost:8080/a.html

We will get the following output: Success! Go's FileServer by itself will easily handle 10,000 concurrent connections, serving flat, static content. Of course, this is not the goal of this particular project; we'll be implementing real-world obstacles such as template parsing and database access, but this alone should show you the kind of starting point that Go provides for anyone who needs a responsive server that can handle a large quantity of basic web traffic.

Routing requests

So, let's take a step back and look again at routing our traffic through a traditional web server to include not only our static content, but also the dynamic content. We'll want to create three functions that will route traffic from our customRouter:

serveStatic(): Reads a file and serves a flat file
serveRendered(): Parses a template to display
serveDynamic(): Connects to MySQL, applies data to a struct, and parses a template

To take our requests and reroute them, we'll change the ServeHTTP method of our customRouter struct to handle three regular expressions; a sketch of what this dispatch might look like follows. For the sake of brevity and clarity, we'll only be returning data for our three possible request types. Anything else will be ignored. In a real-world scenario, we can take this approach to aggressively and proactively reject connections for requests we think are invalid. This would include spiders and nefarious bots and processes, which offer no real value as nonusers.
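As a hedged sketch only, the regular-expression dispatch described above might take the following shape. The exact patterns, the handler bodies, and the stand-in responses here are illustrative assumptions, not the author's final implementation:

package main

import (
	"fmt"
	"net/http"
	"regexp"
	"time"
)

// Pre-compiled patterns for the three route families described above;
// compiling once at package level avoids recompiling on every request.
var (
	staticRx   = regexp.MustCompile(`^/static/([\w-]+)$`)
	templateRx = regexp.MustCompile(`^/template/([\w-]+)$`)
	dynamicRx  = regexp.MustCompile(`^/dynamic/([\w-]+)(\d+)$`)
)

type customRouter struct{}

func (customRouter) ServeHTTP(rw http.ResponseWriter, r *http.Request) {
	path := r.URL.Path
	switch {
	case staticRx.MatchString(path):
		name := staticRx.FindStringSubmatch(path)[1]
		http.ServeFile(rw, r, "static/"+name+".html") // serveStatic stand-in
	case templateRx.MatchString(path):
		fmt.Fprintln(rw, "would render", templateRx.FindStringSubmatch(path)[1]+".tpl")
	case dynamicRx.MatchString(path):
		fmt.Fprintln(rw, "would render from MySQL:", path)
	default:
		http.NotFound(rw, r) // proactively reject anything we don't recognize
	}
}

func main() {
	server := &http.Server{
		Addr:         ":9000",
		Handler:      customRouter{},
		ReadTimeout:  10 * time.Second,
		WriteTimeout: 10 * time.Second,
	}
	server.ListenAndServe()
}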

Testing in Node and Hapi

Packt
09 Feb 2016
22 min read
In this article by John Brett, the author of the book Getting Started with Hapi.js, we are going to explore the topic of testing in node and hapi. We will look at what is involved in writing a simple test using hapi's test runner, lab, how to test hapi applications, techniques to make testing easier, and finally how to achieve the all-important 100% code coverage. (For more resources related to this topic, see here.) The benefits and importance of testing code Technical debt is developmental work that must be done before a particular job is complete, or else it will make future changes much harder to implement later on. A codebase without tests is a clear indication of technical debt. Let's explore this statement in more detail. Even very simple applications will generally comprise: Features, which the end user interacts with Shared services, such as authentication and authorization, that features interact with These will all generally depend on some direct persistent storage or API. Finally, to implement most of these features and services, we will use libraries, frameworks, and modules regardless of language. So, even for simpler applications, we have already arrived at a few dependencies to manage, where a change that causes a break in one place could possibly break everything up the chain. So let's take a common use case, in which a new version of one of your dependencies is released. This could be a new hapi version, a smaller library, your persistent storage engine, MySQL, MongoDB, or even an operating system or language version. SemVer, as mentioned previously, attempts to mitigate this somewhat, but you are taking someone at their word when they say that they have adhered to this correctly, and SemVer is not used everywhere. So, in the case of a break-causing change, will the current application work with this new dependency version? What will fail? What percentage of tests fail? What's the risk if we don't upgrade? Will support eventually be dropped, including security patches? Without a good automated test suite, these have to be answered by manual testing, which is a huge waste of developer time. Development progress stops here every time these tasks have to be done, meaning that these types of tasks are rarely done, building further technical debt. Apart from this, humans are proven to be poor at repetitive tasks, prone to error, and I know I personally don't enjoy testing manually, which makes me poor at it. I view repetitive manual testing like this as time wasted, as these questions could easily be answered by running a test suite against the new dependency so that developer time could be spent on something more productive. Now, let's look at a worse and even more common example: a security exploit has been identified in one of your dependencies. As mentioned previously, if it's not easy to update, you won't do it often, so you could be on an outdated version that won't receive this security update. Now you have to jump multiple versions at once and scramble to test them manually. This usually means many quick fixes, which often just cause more bugs. In my experience, code changes under pressure are what deteriorate the structure and readability in a codebase, lead to a much higher number of bugs, and are a clear sign of poor planning. A good development team will, instead of looking at what is currently available, look ahead to what is in beta and will know ahead of time if they expect to run into issues. 
The questions asked will be: Will our application break in the next version of Chrome? What about the next version of node? Hapi does this by running the full test suite against future versions of node in order to alert the node community of how planned changes will impact hapi and the node community as a whole. This is what we should all aim to do as developers. A good test suite has even bigger advantages when working in a team or when adding new developers to a team. Most development teams start out small and grow, meaning all the knowledge of the initial development needs to be passed on to new developers joining the team. So, how do tests lead to a benefit here? For one, tests are a great documentation on how parts of the application work for other members of a team. When trying to communicate a problem in an application, a failing test is a perfect illustration of what and where the problem is. When working as a team, for every code change from yourself or another member of the team, you're faced with the preceding problem of changing a dependency. Do we just test the code that was changed? What about the code that depends on the changed code? Is it going to be manual testing again? If this is the case, how much time in a week would be spent on manual testing versus development? Often, with changes, existing functionality can be broken along with new functionality, which is called regression. Having a good test suite highlights this and makes it much easier to prevent. These are the questions and topics that need to be answered when discussing the importance of tests. Writing tests can also improve code quality. For one, identifying dead code is much easier when you have a good testing suite. If you find that you can only get 90% code coverage, what does the extra 10% do? Is it used at all if it's unreachable? Does it break other parts of the application if removed? Writing tests will often improve your skills in writing easily testable code. Software applications usually grow to be complex pretty quickly—it happens, but we always need to be active in dealing with this, or software complexity will win. A good test suite is one of the best tools we have to tackle this. The preceding is not an exhaustive list on the importance or benefits of writing tests for your code, but hopefully it has convinced you of the importance of having a good testing suite. So, now that we know why we need to write good tests, let's look at hapi's test runner lab and assertion library code and how, along with some tools from hapi, they make the process of writing tests much easier and a more enjoyable experience. Introducing hapi's testing utilities The test runner in the hapi ecosystem is called lab. If you're not familiar with test runners, they are command-line interface tools for you to run your testing suite. Lab was inspired by a similar test tool called mocha, if you are familiar with it, and in fact was initially begun as a fork of the mocha codebase. But, as hapi's needs diverged from the original focus of mocha, lab was born. The assertion library commonly used in the hapi ecosystem is code. An assertion library forms the part of a test that performs the actual checks to judge whether a test case has passed or not, for example, checking that the value of a variable is true after an action has been taken. 
Let's look at our first test script; then, we can take a deeper look at lab and code, how they function under the hood, and some of the differences they have from other commonly used libraries, such as mocha and chai.

Installing lab and code

You can install lab and code the same as any other module on npm:

npm install lab code --save-dev

Note the --save-dev flag added to the install command here. Remember your package.json file, which describes an npm module? This adds the modules to the devDependencies section of your npm module. These are dependencies that are required for the development and testing of a module but are not required for using the module. The reason why these are separated is that when we run npm install in an application codebase, it only installs the dependencies and devDependencies of the package.json in that directory. For all the modules installed, only their dependencies are installed, not their development dependencies. This is because we only want to download the dependencies required to run that application; we don't need to download all the development dependencies for every module.

In other words, the npm install command installs all the dependencies and devDependencies of the package.json in the current working directory, but only the dependencies of the other installed modules, not their devDependencies. To install the development dependencies of a particular module, navigate to the root directory of the module and run npm install.

After you have installed lab, you can then run it with the following:

./node_modules/lab/bin/lab test.js

This is quite long to type every time, but fortunately, due to a handy feature of npm called npm scripts, we can shorten it. If you look at the package.json generated by npm init in the first chapter, depending on your version of npm, you may see the following (some code removed for brevity):

...
"scripts": {
  "test": "echo \"Error: no test specified\" && exit 1"
},
...

Scripts are a list of commands related to the project; they can be for testing purposes, as we will see in this example; to start an application; for build steps; and to start extra servers, among many other options. They offer huge flexibility in how these are combined to manage scripts related to a module or application, and I could spend a chapter, or even a book, on just these, but they are outside the scope of this book, so let's just focus on what is important to us here.

To get a list of available scripts for a module or application, in the module directory, simply run:

$ npm run

To then run one of the listed scripts, such as test, you can just use:

$ npm run test

As you can see, this gives a very clean API for scripts and documentation for each of them in the project's package.json. From this point on in this book, all code snippets will use npm scripts to test or run any examples. We should strive to use these in our projects to simplify and document commands related to applications and modules for ourselves and others.

Let's now add the ability to run a test file to our package.json file. This just requires modifying the scripts section to be the following:

...
"scripts": {
  "test": "./node_modules/lab/bin/lab ./test/index.js"
},
...

It is common practice in node to place all tests in a project within the test directory.
A handy addition to note here is that when calling a command with npm run, the bin directory of every module in your node_modules directory is added to PATH when running these scripts, so we can actually shorten this script to:

...
"scripts": {
  "test": "lab ./test/index.js"
},
...

This type of module install is considered to be local, as the dependency is local to the application directory it is being run in. While I believe this is how we should all install our modules, it is worth pointing out that it is also possible to install a module globally. This means that when installing something like lab, it is immediately added to PATH and can be run from anywhere. We do this by adding a -g flag to the install, as follows:

$ npm install lab code -g

This may appear handier than having to add npm scripts or run commands locally outside of an npm script, but it should be avoided where possible. Often, installing globally requires sudo to run, meaning you are taking a script from the Internet and allowing it to have complete access to your system. Hopefully, the security concerns here are obvious. Other than that, different projects may use different versions of test runners, assertion libraries, or build tools, which can have unknown side effects and cause debugging headaches. The only time I would use globally installed modules is for command-line tools that I may use outside a particular project, for example, a node-based terminal IDE such as slap (https://www.npmjs.com/package/slap) or a process manager such as PM2 (https://www.npmjs.com/package/pm2), but never with sudo!

Now that we are familiar with installing lab and code and the different ways of running them inside and outside of npm scripts, let's look at writing our first test script and take a more in-depth look at lab and code.

Our First Test Script

Let's take a look at what a simple test script in lab looks like using the code assertion library:

const Code = require('code');                        [1]
const Lab = require('lab');                          [1]

const lab = exports.lab = Lab.script();              [2]

lab.experiment('Testing example', () => {            [3]

    lab.test('fails here', (done) => {               [4]

        Code.expect(false).to.be.true();             [4]
        return done();                               [4]
    });                                              [4]

    lab.test('passes here', (done) => {              [4]

        Code.expect(true).to.be.true();              [4]
        return done();                               [4]
    });                                              [4]
});

This script, even though small, includes a number of new concepts, so let's go through it with reference to the numbers in the preceding code:

[1]: Here, we just include the code and lab modules, as we would any other node module.

[2]: As mentioned before, it is common convention to place all test files within the test directory of a project. However, there may be JavaScript files in there that aren't tests, and therefore should not be tested. To avoid this, we inform lab of which files are test scripts by calling Lab.script() and assigning the value to lab and exports.lab.

[3]: The lab.experiment() function (aliased as lab.describe()) is just a way to group tests neatly. In test output, tests will have the experiment string prefixed to the message, for example, "Testing example fails here". This is optional, however.

[4]: These are the actual test cases. Here, we define the name of the test and pass a callback function with the parameter function done(). We see code in action here for managing our assertions. And finally, we call the done() function when finished with our test case.

Things to note here: lab tests are always asynchronous.
In every test, we have to call done() to finish the test; there is no counting of function parameters or checking whether synchronous functions have completed in order to ensure that a test is finished. Although this requires the boilerplate of calling the done() function at the end of every test, it means that all tests, synchronous or asynchronous, have a consistent structure.

In Chai, which was originally used for hapi, some of the assertions such as .ok, .true, and .false use properties instead of functions for assertions, while assertions like .equal() and .above() use functions. This type of inconsistency leads to us easily forgetting that an assertion should be a method call and hence omitting the (). This means that the assertion is never called and the test may pass as a false positive. Code's API is more consistent in that every assertion is a function call. Here is a comparison of the two:

Chai:
expect('hello').to.equal('hello');
expect(foo).to.exist;

Code:
expect('hello').to.equal('hello');
expect(foo).to.exist();

Notice the difference in the second exist() assertion. In Chai, you see the property form of the assertion, while in Code, you see the required function call. Through this, lab can make sure all assertions within a test case are called before done is complete, or it will fail the test.

So let's try running our first test script. As we already updated our package.json script, we can run our test with the following command:

$ npm run test

This will generate the following output:

There are a couple of things to note from this. Tests run are symbolized with a . or an X, depending on whether they pass or not. You can get a list of the full test titles by adding the -v or --verbose flag to our npm test script command. There are lots of flags to customize the running and output of lab, so I recommend using the full labels for each of these, for example, --verbose and --lint instead of -v and -l, in order to save you the time spent referring back to the documentation each time.

You may have noticed the No global variable leaks detected message at the bottom. Lab assumes that the global object won't be polluted and checks that no extra properties have been added after running tests. Lab can be configured to not run this check or to whitelist certain globals. Details of this are in the lab documentation available at https://github.com/hapijs/lab.

Testing approaches

This is one of the many known approaches to building a test suite, as is BDD (Behavior Driven Development), and like most test runners in node, lab is unopinionated about how you structure your tests. Details of how to structure your tests in a BDD style can again be found easily in the lab documentation.

Testing with hapi

As I mentioned before, testing is considered paramount in the hapi ecosystem, with every module in the ecosystem having to maintain 100% code coverage at all times, as do all module dependencies. Fortunately, hapi provides us with some tools to make the testing of hapi apps much easier through a module called Shot, which simulates network requests to a hapi server.
Let's take the example of a Hello World server and write a simple test for it:

const Code = require('code');
const Lab = require('lab');
const Hapi = require('hapi');

const lab = exports.lab = Lab.script();

lab.test('It will return Hello World', (done) => {

    const server = new Hapi.Server();
    server.connection();

    server.route({
        method: 'GET',
        path: '/',
        handler: function (request, reply) {

            return reply('Hello World\n');
        }
    });

    server.inject('/', (res) => {

        Code.expect(res.statusCode).to.equal(200);
        Code.expect(res.result).to.equal('Hello World\n');
        done();
    });
});

Now that we are more familiar with what a test script looks like, most of this will look familiar. However, you may have noticed that we never started our hapi server. This means the server was never started and no port was assigned, but thanks to the shot module (https://github.com/hapijs/shot), we can still make requests against it using the server.inject API. Not having to start a server means less setup and teardown before and after tests, and it means that a test suite can run quicker as fewer resources are required. server.inject can still be used with the same API whether the server has been started or not.

Code coverage

As I mentioned earlier in the article, having 100% code coverage is paramount in the hapi ecosystem and, in my opinion, hugely important for any application to have. Without a code coverage target, writing tests can feel like an empty or unrewarding task where we don't know how many tests are enough or how much of our application or module has been covered. With any task, we should know what our goal is; testing is no different, and this is what code coverage gives us. Even with 100% coverage, things can still go wrong, but it means that at the very least, every line of code has been considered and has at least one test covering it. I've found from working on modules for hapi that trying to achieve 100% code coverage actually gamifies the process of writing tests, making it a more enjoyable experience overall.

Fortunately, lab has code coverage integrated, so we don't need to rely on an extra module to achieve this. It's as simple as adding the --coverage or -c flag to our test script command. Under the hood, lab will then build an abstract syntax tree so it can evaluate which lines are executed, thus producing our coverage, which will be added to the console output when we run tests. The code coverage tool will also highlight which lines are not covered by tests, which is extremely useful in identifying where to focus your testing effort.

It is also possible to enforce a minimum threshold as to the percentage of code coverage required to pass a suite of tests with lab, through the --threshold or -t flag followed by an integer. This is used for all the modules in the hapi ecosystem, and all thresholds are set to 100. Having a threshold of 100% for code coverage makes it much easier to manage changes to a codebase. When any update or pull request is submitted, the test suite is run against the changes, so we can know that all tests have passed and all code is covered before we even look at what has been changed in the proposed submission. There are services that even automate this process for us, such as TravisCI (https://travis-ci.org/). It's also worth knowing that the coverage report can be displayed in a number of formats. For a full list of these reporters with explanations, I suggest reading the lab documentation available at https://github.com/hapijs/lab.
Let's now look at what's involved in getting 100% coverage for our previous example. First of all, we'll move our server code to a separate file, which we will place in the lib folder and call index.js. It's worth noting here that it's good testing practice, and also the typical module structure in the hapi ecosystem, to place all module code in a folder called lib and the associated tests for each file within lib in a folder called test, preferably with a one-to-one mapping like we have done here, where all the tests for lib/index.js are in test/index.js. When trying to find out how a feature within a module works, the one-to-one mapping makes it much easier to find the associated tests and see examples of it in use.

So, having separated our server from our tests, let's look at what our two files now look like; first, ./lib/index.js:

const Hapi = require('hapi');

const server = new Hapi.Server();
server.connection();

server.route({
    method: 'GET',
    path: '/',
    handler: function (request, reply) {

        return reply('Hello World\n');
    }
});

module.exports = server;

The main change here is that we export our server at the end for another file to acquire and start it if necessary. Our test file at ./test/index.js will now look like this:

const Code = require('code');
const Lab = require('lab');
const server = require('../lib/index.js');

const lab = exports.lab = Lab.script();

lab.test('It will return Hello World', (done) => {

    server.inject('/', (res) => {

        Code.expect(res.statusCode).to.equal(200);
        Code.expect(res.result).to.equal('Hello World\n');
        done();
    });
});

Finally, for us to test our code coverage, we update our npm test script to include the coverage flag --coverage or -c. The final example of this is in the second example of the source code of Chapter 4, Adding Tests and the Importance of 100% Coverage, which is supplied with this book. If you run this, you'll find that we actually already have 100% of the code covered with this one test.

An interesting exercise here would be to find out what versions of hapi this code functions correctly with. At the time of writing, this code was written for hapi version 11.x.x on node.js version 4.0.0. Will it work if run with hapi version 9 or 10? You can test this now by installing an older version with the help of the following command:

$ npm install hapi@10

This will give you an idea of how easy it can be to check whether your codebase works with different versions of libraries. If you have some time, it would be interesting to see how this example runs on different versions of node (Hint: it breaks on any version earlier than 4.0.0).

In this example, we got 100% code coverage with one test. Unfortunately, we are rarely this fortunate; as the complexity of our codebase increases, so will the complexity of our tests, which is where knowledge of writing testable code comes in. This is something that comes with practice by writing tests while writing application or module code.

Linting

Also built into lab is linting support. Linting enforces a code style that is adhered to, which can be specified through an .eslintrc or .jshintrc file. By default, lab will enforce the hapi style guide rules. The idea of linting is that all code will have the same structure, making it much easier to spot bugs and keep code tidy. As JavaScript is a very flexible language, linters are used regularly to forbid bad practices such as global or unused variables. To enable the lab linter, simply add the linter flag to the test command, which is --lint or -L.
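Tying these flags together, the following is one possible scripts section for package.json that runs the tests with coverage, a 100% threshold, linting, and verbose output all enabled. The exact combination is an illustration; only the individual flags (--coverage, --threshold, --lint, and --verbose) are described in the text above.

...
"scripts": {
  "test": "lab ./test/index.js --coverage --threshold 100 --lint --verbose"
},
...

With this in place, $ npm run test will fail the build if any test fails, if coverage drops below 100%, or if the linter reports a problem.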
I generally stick with the default hapi style guide rules as they are chosen to promote easy-to-read code that is easily testable and forbids many bad practices. However, it's easy to customize the linting rules used; for this, I recommend referring to the lab documentation. Summary In this article, we covered testing in node and hapi and how testing and code coverage are paramount in the hapi ecosystem. We saw justification for their need in application development and where they can make us more productive developers. We also introduced the test runner and code assertion libraries lab and code in the ecosystem. We saw justification for their use and also how to use them to write simple tests and how to use the tools provided in lab and hapi to test hapi applications. We also learned about some of the extra features baked into lab, such as code coverage and linting. We looked at how to test the code coverage of an application and get it to 100% and how the hapi ecosystem applies the hapi styleguide to all modules using lab's linting integration. Resources for Article: Further resources on this subject: Welcome to JavaScript in the full stack[article] A Typical JavaScript Project[article] An Introduction to Mastering JavaScript Promises and Its Implementation in Angular.js[article]

How to integrate SharePoint with SQL Server Reporting Services

Kunal Chaudhari
27 Jan 2018
5 min read
[box type="note" align="" class="" width=""]This article is an excerpt from a book written by Dinesh Priyankara and Robert C. Cain, titled SQL Server 2016 Reporting Services Cookbook.This book will help you get up and running with the latest enhancements and advanced query and reporting feature in SQL Server 2016.[/box] Today we will learn the steps to integrate SharePoint in the SQL Server Reporting services. We will create a Reporting Services SharePoint application, and set it up in a way that we are able to view reports when they are uploaded to SharePoint. Getting ready For this, all you'll need is a SharePoint instance you can work with. Do make sure you have an administrative access to the SharePoint site. If you have an Azure account, free or paid, you could set up a test instance of SharePoint and use it to follow the instructions in this article. Note the setup of such an Azure instance is outside the scope of this article. In this article, we assume you are using an on premise SharePoint installation. How to do it… Open the SharePoint 2016 Central Administration web page. Click on Manage service applications under the Application Management area: 3. The Service Applications tab now appears at the top of the page. Click on the New menu: 4. In the menu, find and click on the option for SQL Server Reporting Services Service Application: 5. You'll now need to fill out the information for the service application. Start at the top by giving it a good name, here we are using SSRS_SharePoint. 6. Presumably this is a new install, so you'll have to take the Create new application pool option. Give it an appropriate name; in this example, we used SSRS_SharePoint_Pool. 7. Select a security account to run under. Here we selected an account set up by our Active Directory administrator, which has permissions to SQL Server where SSRS is installed. 8. Enter the name of the server which has SQL Server 2016 Reporting Services installed. In this example, our machine is ACSrv. 9. By default, SharePoint will create a name for the database that includes a GUID (a long string of letters and numbers). You should absolutely rename this to eliminate the GUID, but ensure the database name will be unique. In this example, we used ReportingService_SharePoint. 10. Review the information so that it resembles the following figure, but don't hit OK quite yet as there are few more pieces of information to fill out. Scroll down in the dialog to continue: 11. After the database name, you'll need to indicate the authentication method. Assuming the credentials you entered for the security account came from your Active Directory administrator, you can take the default of Windows authentication. 12. Place a check mark beside the instance of SharePoint to associate this SSRS application with. Here there is only one, SharePoint – 80. 13. Click OK to continue. Assuming all goes well, you should see the following confirmation dialog. If so, click OK to proceed: 14. Now that SharePoint is configured, you'll now need to provide additional information to SQL Server. That is the purpose of this final screen, Provision Subscriptions and Alerts. Select the Download Script button, and save the generated SQL file: 15. Pass the SQL file to a database administrator to execute, or open it in SSMS and execute it yourself, assuming you have administrative rights on the SQL Server. SharePoint uses the concept of Service Applications to manage items which run under the hood of SharePoint. 
SQL Server Reporting Services is one such service application. By integrating it as a service application, end users can upload, modify, and view SSRS reports right within SharePoint.

We began by generating a new Service Application, and picking Reporting Services from the list. We then needed to let SharePoint know which SQL Server would be used to host the database, as well as where a copy of Reporting Services for SharePoint is installed. In addition, we also needed to provide security credentials for SharePoint to use to communicate with SQL Server.

As the final step, we needed to configure SQL Server to work with SharePoint. This was the purpose of the Provision Subscriptions and Alerts screen. Note there is an option to fill out a user name and credential; clicking OK would then have immediately executed scripts against the target SQL Server. In most mid- to large-size corporations, however, there will be controls in place to prevent this type of thing. Most companies will require a DBA to review scripts, or at the very least you'll want to keep a copy of the script in your source control system to be able to track what changes were made to a SQL Server. Hence, we suggest taking the action laid out in this article, namely downloading the script and executing it manually in SQL Server Management Studio.

To test your setup, we suggest creating a new report with embedded data sources and datasets. Upload that report to the server, and attempt to execute it; it should display correctly if your install went well.

If you enjoyed this excerpt, check out the book SQL Server 2016 Reporting Services Cookbook to learn more about handling security and configuring email with SharePoint using Reporting Services.

How to create a Breakout game with Godot Engine – Part 2

George Marques
23 Nov 2016
8 min read
In part one of this article you learned how to set up a project and create a basic scene with input and scripting. By now you should grasp the basic concepts of the Godot Engine, such as nodes and scenes. Here we're going to complete the game up to a playable demo.

Game scene

Let's create a new scene to hold the game itself. Click on the menu Scene > New Scene and add a Node2D as its root. You may feel tempted to resize this node to occupy the scene, but you shouldn't. If you resize it you'll be changing the scale and position, which will be reflected on child nodes. We want the position to stay at (0, 0) and the scale to stay at its default. Rename the root node to Game and save the scene as game.tscn.

Go to the Project Settings, and in the Application section, set this as the main_scene option. This will make the Game scene run when the game starts.

Drag the paddle.tscn file from the FileSystem dock and drop it over the root Game node. This will create a new instance of the paddle scene. It's also possible to click on the chain icon on the Scene dock and select a scene to instance. You can then move the instanced paddle to the bottom of the screen, where it should stay in the game (use the guides in the editor as reference).

Play the project and you can then move the paddle with your keyboard. If you find the movement too slow or too fast, you can select the Paddle node and adjust the Speed value in the Inspector, because it's an exported script variable. This is a great way to tweak the gameplay without touching the code. It also allows you to put multiple paddles in the game, each with its own speed if you wish. To make this better, you can click the Debug Options button (the last one on the top center bar) and activate Sync Scene Changes. This will reflect the changes made in the editor in the running game, so you can set the speed without having to stop and play again.

The ball

Let's create a moving object to interact with. Make a new scene and add a RigidBody2D as the root. Rename it to Ball and save the scene as ball.tscn. The rigid body can be moved by the physics engine and interact with other physical objects, like the static body of the paddle. Add a Sprite as a child node and set the following image as its texture:

Ball

Now add a CollisionShape2D as a child of the Ball. Set its shape to a new CircleShape2D and adjust the radius to cover the image.

We need to adjust some of the Ball properties so it behaves in an appropriate way for this game. Set the Mode property to Character to avoid rotation. Set the Gravity Scale to 0 so it doesn't fall. Set the Friction to 0 and the Damp Override > Linear to 0 to avoid the loss of momentum. Finally, set the Bounce property to 1, as we want the ball to bounce completely when touching the paddle.

With that done, add the following script to the ball, so it starts moving when the scene plays:

extends RigidBody2D

# Export the ball speed to be changed in the editor
export var ball_speed = 150.0

func _ready():
    # Apply the initial impulse to the ball so it starts moving
    # It uses a bit of vector math to make the speed consistent
    apply_impulse(Vector2(), Vector2(1, 1).normalized() * ball_speed)

Walls

Going back to the Game scene, instance the ball as a child of the root node. We're going to add the walls so the ball doesn't get lost in the world. Add a Node2D as a child of the root and rename it to Walls. This will be the root for the wall nodes, to keep things organized.
As a child of that, add four StaticBody2D nodes, each with its own rectangular collision shape to cover the borders of the screen. You'll end up with something like the following:

Walls

By now you can play the game a little bit and use the paddle to deflect the ball or leave it to bounce on the bottom wall.

Bricks

The last piece of this puzzle is the bricks. Create a new scene, add a StaticBody2D as the root and rename it to Brick. Save the scene as brick.tscn. Add a Sprite as its child and set the texture to the following image:

Brick

Add a CollisionShape2D and set its shape to rectangle, making it cover the whole image. Now add the following script to the root to make a little bit of magic:

# Tool keyword makes the script run in editor
# In this case you can see the color change in the editor itself
tool
extends StaticBody2D

# Export the color variable and a setter function to pass it to the sprite
export (Color) var brick_color = Color(1, 1, 1) setget set_color

func _ready():
    # Set the color when first entering the tree
    set_color(brick_color)

# This is a setter function and will be called whenever the brick_color variable is set
func set_color(color):
    brick_color = color
    # We make sure the node is inside the tree otherwise it cannot access its children
    if is_inside_tree():
        # Change the modulate property of the sprite to change its color
        get_node("Sprite").set_modulate(color)

This will allow you to set the color of the brick using the Inspector, removing the need to make a scene for each brick color. To make it easier to see, you can click the eye icon beside the CollisionShape2D to hide it.

Hide CollisionShape2D

The last thing to be done is to make the brick disappear when touched by the ball. Using the Node dock, add the group brick to the root Brick node. Then go back to the Ball scene and, again using the Node dock, but this time in the Signals section, double-click the body_enter signal. Click the Connect button with the default values. This will open the script editor with a new function. Replace it with this:

func _on_Ball_body_enter( body ):
    # If the body just touched is a member of the "brick" group
    if body.is_in_group("brick"):
        # Mark it for deletion in the next idle frame
        body.queue_free()

Using the Inspector, change the Ball node to enable the Contact Monitor property and increase the Contacts Reported to 1. This will make sure the signal is sent when the ball touches something.

Level

Make a new scene for the level. Add a Node2D as the root, rename it to Level 1 and save the scene as level1.tscn. Now instance a brick in the scene. Position it anywhere, set a color using the Inspector and then duplicate it. You can repeat this process to make the level look the way you want. Using the Edit menu you can set a grid with snapping to make it easier to position the bricks. Then go back to the Game scene and instance the level there as a child of the root. Play the game and you will finally see the ball bouncing around and destroying the bricks it touches.

Breakout Game

Going further

This is just a basic tutorial showing some of the fundamental aspects of the Godot Engine. The Node and Scene system, physics bodies, scripting, signals, and groups are very useful concepts but not all that Godot has to offer. Once you get acquainted with those, it's easy to learn other functions of the engine. The finished game in this tutorial is just bare bones. There are many things you can do, such as adding a start menu, progressing through the levels as they are finished, and detecting when the player loses.
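For instance, detecting that a level has been cleared can build on the same "brick" group used above. The following is only a rough sketch of one possible approach, not code from the original tutorial; attaching it to the Game scene's root and the exact behavior when the level is cleared are assumptions.

# Hypothetical addition to a script on the Game scene's root node.
extends Node2D

func _ready():
    # _process() must be enabled explicitly in older Godot versions.
    set_process(true)

func _process(delta):
    # Bricks were added to the "brick" group via the Node dock, so counting
    # the group's members tells us how many bricks are still alive.
    var bricks_left = get_tree().get_nodes_in_group("brick").size()
    if bricks_left == 0:
        # All bricks destroyed: the level is complete.
        # Here you could load the next level scene or show a victory screen.
        print("Level cleared!")
        set_process(false)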
Thankfully, Godot makes all those things very easy, and it should not be much effort to make this a complete game.

Author: George Marques is a Brazilian software developer who has been playing with programming in a variety of environments since he was a kid. He works as a freelance programmer for web technologies based on open source solutions such as WordPress and Open Journal Systems. He's also one of the regular contributors to the Godot Engine, helping to solve bugs and add new features to the software, while also providing solutions to the community for the questions they have.

How to Choose an Open Source Content Management System

Packt
19 Mar 2010
10 min read
I want to define a couple of important concepts before going into the detail of this topic.

What is a Content Management System (CMS)? A CMS is a system that manages a website's content, without requiring any HTML knowledge, to link pages and to control how the pages look. A CMS makes it much easier for users to create, edit, and publish the content on a website.

What is the scope of the document that covers CMS? CMS are available under commercial and Open Source licenses. They can be written using different languages such as PHP, Perl, ASP, JSP, etc. To help limit the scope of this document, I will only cover open source CMS that are written in PHP.

What is the definition of content? Content can mean different things to different people. It can mean news, articles, Blogs, Wikis, forum posts, picture galleries, source code, file management for downloads, products for e-commerce, and so on.

And finally... How do I choose the best Open Source CMS for me? From my observations of the many people that visit opensourceCMS.com, the "How to choose an Open Source CMS" question is the most frequently asked; however, there is no single simple answer! Below are my recommendations on the process that I would normally use in selecting an Open Source CMS.

Your criteria: One of the most basic steps involved in a CMS selection effort is to determine your requirements. The following bullet points may assist you in the determination process.

Content or Purpose: Defining the purpose of your content is just as important as defining the content itself. Is your content tailored for interaction, such as forums, or for you to express yourself, like Blogs and articles?
Entry format: Content includes text, images, video, audio, XML, PDF, HTML, spreadsheets, etc.
How should the content be stored: In flat files or in databases?
Support: For open source projects, the community support is very important. How helpful is it? How active is it?
Add-ons: There is rarely a CMS available that will match exactly what you need, so it is very important to have as many add-on options available for your chosen CMS as possible.

Hands-on: With the above understanding of your requirements, this is where the rubber hits the road, so to speak. The fun begins with some hands-on experimenting with Content Management Systems, and oodles of research. opensourceCMS.com is the only site of its class that lets you play with a whole array of open source CMS with full administrator privileges before you do the installation on your own site. There are several categories of CMS at the site to choose from: Portal, Blogs, e-Commerce, Groupware, Forums, e-Learning, Image Galleries, Wiki, Lite, and Miscellaneous.

As you are experimenting with different CMS packages you might notice each has its own strengths and weaknesses. Some might prefer the PHP-Nuke style, some might like the Mambo way of managing articles using sections and categories, and others might love the way Drupal uses the concept of Taxonomy to organize information, among other approaches like Wikis, Blogs, etc. You will also find CMS like eZ Publish and Xaraya, which come with a choice of multiple personalities for you to pick from when you do the install. The most important thing is choosing a CMS package that fits the way you want to organize your content. It might take you a little bit of effort to learn how a particular CMS works. For example, when I started out learning about CMS I cut my teeth with PHP-NUKE and learned the way it organizes information using the concept of modules and blocks.
I later learned about how insecure it is; our site was hacked and defaced a few times. So we set out to look for another CMS and found PHPWebSite; it looks very nice and it works similarly to PHP-NUKE but it didn't perform to our expectation on the version that we looked at, so we were forced to start our search again for another suitable CMS. This time we tried Mambo. Mambo does things different from PHP-NUKE and PHPWebSite. It took me a few weeks of reading forum posts, news, and articles to learn the way Mambo organizes things. Once we got the basics understanding of Mambo, we started customizing just as easy as PHP-NUKE, or even better! Research:Where should I begin? Read as much as you can on the forum posting of your particular chosen CMS at opensourceCMS.com or at the CMS project home site.  Comparison. You can visit CMSMatrix.com do a comparison on features of different CMS packages that you tentatively want to take a look at. CMS Rating  As I was doing research on different CMS packages, I figured that I would need some way to give me a starting point to determine where to start. That is when I decided to develop a dynamic list of CMS ratings based on visitors' ratings, voted for by visitors of opensourceCMS. I created a composite of all the ratings into a list sorted by category and by rating score in a descending order. You can find the CMS rating list here. CMS top hits.  One other tool that I use to help me decide on which CMS to choose is by looking at the hits statistics on a particular CMS demo at the opensourceCMS.com site. So I decided to develop a CMS Top Hits list.  This list is a dynamic collection of the statistics of the daily hits counted by the visitors to the CMS demo. One other figure that I found very useful beside the hits count is the 'Ratio'.  The Ratio is calculated based on this formula: Ratio = (number of hits) / (number of days since this CMS was listed at the site).  By using this indicator all CMS has a fair chance comparing to other CMS that might be listed at the site for a longer time.  You can find the CMS Top Hits list here How about other kind of CMS? As I was studying the world of CMS I found that there are many CMS packages available out there on the net.  Beside the usual category that you might find at opensourceCMS.com, there are: CMS Framework, KMS, Directory, Calendar, etc.  There are too many to name here!  You can find a collection of my research on different kinds of CMS from my site at OngETC.com Is it secure? You might want to visit www.osvdb.org to see how much vulnerabilities your chosen CMS had and how quickly it was patched.  You might not want to pick a particular CMS that it is particularly insecure. Try again? You might have discovered that you sometimes have to abandon the CMS you have chosen for some reason(s) and go back to the hands-on point and try out a different CMS.  Plug-ins:Next, decide what features and functions you would like to use on your site.  The following questions might help you to decide: Do you want to store your content in flat files or databases?  Will there be a need for an event calendar?  Will there be a need for an upload/download area?  What about an image gallery?  How about a poll or survey tool?  Will you be needing multi language support or translation function?  How about WYSIWYG (What You See Is What You Get) editor for content? Is security and permission important to you?  How do you want to set permission and assign access to your content? 
Components and modules are "plug-ins" that typically provide add-ons to the core system.  Sometimes there are presets of add-ons installed as a part of the base system and sometimes you will have to install them separately.  This can vary greatly from one system to the next.  More established systems like Mambo will have a lot of community support which translates to a wide variety of components, modules and plug-ins.  You will soon find out that some of the features you want, might not be available as part of the core or add-ons.  You might have to buy a commercial add-on product, pay a 3rd party developer to do it for you or learn enough to develop your own. It helps to list all the things you need and use that as your shopping list to test against and to experiment with.  The key word here is experiment because that's the best way to understand how useful it might be. The LookLastly, once you determine the base system and the add-on features to use, it is time to decide on how your site will look.  Most systems utilize templates or skin and CSS (cascading style sheets) that allow you to tailor the look of your site.  It's up to your imagination!  Even if you are not a graphic or template designer, there are many free or commercial templates that you can acquire and tailor to fit your taste. You can download these and add them to your site which can really give it a complete makeover.  Some of the CMS have a "template chooser" that lets you pick different "skins" for your site.  At last...In Summary, this article started out by defining 'what is a CMS?'  'What is the scope of this article?'  'How content definition could be differed in people mind?' 'How do you decide which CMS is best for you?' We also looked at what criteria to use to define your requirements.  With your tentative set of requirements, this article suggests where to go to do some hands-on experiments to gain more understanding about different choices of CMS, and how to fine tune or expand your criteria.  We also shared a little of our experience when we went through the process of choosing a CMS. Once you identify a CMS that you like, then it would be worth while to do a lot of reading regarding your choices, whether at opensourceCMS.com, or at the CMS project home site and any additional affiliate sites that you can find.  You might consider doing a search on Google about your particular CMS to see what else you might find.  The point is doing as much reading as you can afford! With knowledge from all the reading you have done, it is time to decide on what additional feature-sets you want and find out whether it is available on the core system; pre-built plug-ins, and freely available open source or commercial plug-ins. Finally you will need to decide on a template for your site.  You might develop yourself, download some open source templates, or buy from a professional commercial template designer. That should give a good overview of the whole lifecycle in choosing an open source CMS and give a brief glimpse into some things for consideration.  Now is the time to load your content to your site and ready for the debut to the world! By Chanh Ong Chanh Ong is a Computer Specialist. Chanh has many years experience in various computer platforms and operating system, programming languages, system administration, DBA and etc.     
Content Management Books from Packt

Packt publishes a range of books on various Open Source Content Management Systems, including Joomla!, Mambo, Plone, TYPO3, DotNetNuke, OpenCMS, XOOPS and many more. For more information, please visit: www.PacktPub.com/content-management

Spatial Data Services

Packt
19 Nov 2013
6 min read
(For more resources related to this topic, see here.)

So far we have worked with relatively small sets of data; for larger collections, Bing Maps provide the Spatial Data Services. They offer the Data Source Management API to load large datasets into Bing Maps servers, and the Query API to query the data. In this article, we will use the Geocode Dataflow API, which provides geocoding and reverse geocoding of large datasets. Geocoding is the process of finding geographic coordinates from other geographic data, such as street addresses or postal codes. Reverse geocoding is the opposite process, where the coordinates are used to find their associated textual locations, such as addresses and postal codes. Bing Maps implement these processes by creating jobs on Bing Maps servers, and querying them later. The whole process can be automated, which is ideal for huge amounts of data.

Please note that strict rules of data usage apply to the Spatial Data Services (please refer to http://msdn.microsoft.com/en-us/library/gg585136.aspx for full details). At the time of writing, a user with a basic account can create up to 5 jobs in a 24-hour period.

Our task in this article is to geocode the addresses of ten of the biggest technology companies in the world, such as Microsoft, Google, Apple, Facebook, and so on, and then display them on the map. The first step is to prepare the file with the companies' addresses.

Geocoding dataflow input data

The input and output data can be supplied in the following formats:

XML (content type application/xml)
Comma-separated values (text/plain)
Tab-delimited values (text/plain)
Pipe-delimited values (text/plain)
Binary (application/octet-stream), used with the Blob Service REST API

We will use the XML format, for its clearer declarative structure. Now, let's open Visual Studio and create a new C# Console project named LBM.Geocoder. We then add a Data folder, which will contain all the data files and samples with which we'll work in this article, starting with the data.xml file we need to upload to the Spatial Data servers to be geocoded:

<?xml version="1.0" encoding="utf-8"?>
<GeocodeFeed Version="2.0">
  <GeocodeEntity Id="001">
    <GeocodeRequest Culture="en-US" IncludeNeighborhood="1">
      <Address AddressLine="1 Infinite Loop" AdminDistrict="CA" Locality="Cupertino" PostalCode="95014" />
    </GeocodeRequest>
  </GeocodeEntity>
  <GeocodeEntity Id="002">
    <GeocodeRequest Culture="en-US" IncludeNeighborhood="1">
      <Address AddressLine="185 Berry St" AdminDistrict="NY" Locality="New York" PostalCode="10038" />
    </GeocodeRequest>
  </GeocodeEntity>
</GeocodeFeed>

The listing above is a fragment of that file with the addresses of the companies' headquarters. Please note that the more addressing information we provide to the API, the better quality geocoding we receive. In production, this file would be created programmatically, probably based on an Addresses database. The Id of each GeocodeEntity could also be stored, so that the data is matched more easily once fetched from the servers. (You can find the Geocode Dataflow Data Schema, Version 2, at http://msdn.microsoft.com/en-us/library/jj735477.aspx.)

The job

Let's add a Job class to our project:

public class Job
{
    private readonly string dataFilePath;

    public Job(string dataFilePath)
    {
        this.dataFilePath = dataFilePath;
    }
}

The dataFilePath argument is the path to the data.xml file we created earlier.
Creating the job is as easy as calling a REST URL:

public void Create()
{
    var uri = String.Format("{0}?input=xml&output=xml&key={1}", Settings.DataflowUri, Settings.Key);
    var data = File.ReadAllBytes(dataFilePath);

    try
    {
        var wc = new WebClient();
        wc.Headers.Add("Content-Type", "application/xml");
        var receivedBytes = wc.UploadData(uri, "POST", data);
        ParseJobResponse(receivedBytes);
    }
    catch (WebException e)
    {
        var response = (HttpWebResponse)e.Response;
        var status = response.StatusCode;
    }
}

We place all the API URLs and other settings in the Settings class:

public class Settings
{
    public static string Key = "[YOUR BING MAPS KEY]";
    public static string DataflowUri = "https://spatial.virtualearth.net/REST/v1/dataflows/geocode";
    public static XNamespace XNamespace = "http://schemas.microsoft.com/search/local/ws/rest/v1";
    public static XNamespace GeocodeFeedXNamespace = "http://schemas.microsoft.com/search/local/2010/5/geocode";
}

To create the job, we need to build the Dataflow URL template with a Bing Maps Key, and parameters such as the input and output formats. We specify the latter to be XML. Next, we use a WebClient instance to upload the data with a POST request. Then, we parse the server response:

private void ParseJobResponse(byte[] response)
{
    using (var stream = new MemoryStream(response))
    {
        var xDoc = XDocument.Load(stream);
        var job = xDoc.Descendants(Settings.XNamespace + "DataflowJob").FirstOrDefault();
        var linkEl = job.Element(Settings.XNamespace + "Link");
        if (linkEl != null) Link = linkEl.Value;
    }
}

Here, we pass the stream created with the bytes received from the server to the XDocument.Load method. This produces an XDocument instance, which we will use to extract the data we need. We will apply a similar process throughout the article to parse XML content. Note that the appropriate XNamespace needs to be supplied in order to navigate through the document nodes. You can find a sample of the response inside the Data folder (jobSetupResponse.xml), which shows that the link to the job created is found under a Link element within the DataflowJob node.

Getting job status

Once we have set up a job, we can store the link in a data store, such as a database, and check its status later. The data will be available on the Microsoft servers for up to 14 days after creation. Let's see how we can query the job status:

public static JobStatus CheckStatus(string jobUrl)
{
    var result = new JobStatus();
    var uri = String.Format("{0}?output=xml&key={1}", jobUrl, Settings.Key);
    var xDoc = XDocument.Load(uri);
    var job = xDoc.Descendants(Settings.XNamespace + "DataflowJob").FirstOrDefault();

    if (job != null)
    {
        var linkEls = job.Elements(Settings.XNamespace + "Link").ToList();
        foreach (var linkEl in linkEls)
        {
            var nameAttr = linkEl.Attribute("name");
            if (nameAttr != null)
            {
                if (nameAttr.Value == "succeeded") result.SucceededLink = linkEl.Value;
                if (nameAttr.Value == "failed") result.FailedLink = linkEl.Value;
            }
        }

        var statusEl = job.Elements(Settings.XNamespace + "Status").FirstOrDefault();
        if (statusEl != null) result.Status = statusEl.Value;
    }

    return result;
}

Now, we know that to query a Data API we need to first build the URL template. We do this by attaching the Bing Maps Key and an output parameter to the job link. The response we get from the server stores the job status within a Link element of a DataflowJob node (the jobResponse.xml file inside the Data folder contains an example). The link we need has a name attribute with the value succeeded.
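Putting the pieces together, the following is a rough sketch of how the Job class and CheckStatus might be driven from a console application. It is not code from the article: it assumes the Job class exposes the parsed job URL as a public Link property, that CheckStatus is declared on the Job class, and that polling every 30 seconds is acceptable for your account limits.

using System;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        // Hypothetical relative path to the data.xml file described above.
        var job = new Job(@"..\..\Data\data.xml");
        job.Create();

        // The job link would normally be persisted (for example, in a database)
        // and checked again later; here we simply poll a few times.
        for (var i = 0; i < 10; i++)
        {
            var status = Job.CheckStatus(job.Link);
            if (!String.IsNullOrEmpty(status.SucceededLink))
            {
                Console.WriteLine("Geocoding finished: " + status.SucceededLink);
                break;
            }

            Console.WriteLine("Current status: " + status.Status);
            Thread.Sleep(30000); // wait 30 seconds before polling again
        }
    }
}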
Summary

When it comes to large amounts of data, the Spatial Data Services offer a number of interfaces to store and query user data, geocode addresses, or reverse geocode geographical coordinates. The services perform these tasks by means of background jobs, which can be set up and queried through REST URLs.

Resources for Article: Further resources on this subject:
NHibernate 2: Mapping relationships and Fluent Mapping [Article]
Using Maps in your Windows Phone App [Article]
What is OpenLayers? [Article]
Setting Up a Test Infrastructure
Packt
12 Sep 2013
9 min read
(For more resources related to this topic, see here.)

Setting up and writing our first tests

Now that we have the base test libraries, we can create a test driver web page that includes the application and test libraries, sets up and executes the tests, and displays a test report.

The test driver page

A single web page is typically used to include the test and application code and drive all frontend tests. Accordingly, we can create a web page named test.html in the chapters/01/test directory of our repository, starting with just a bit of HTML boilerplate: a title and meta attributes:

<html>
  <head>
    <title>Backbone.js Tests</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">

Then, we include the Mocha stylesheet for test reports and the Mocha, Chai, and Sinon.JS JavaScript libraries:

    <link rel="stylesheet" href="js/lib/mocha.css" />
    <script src="js/lib/mocha.js"></script>
    <script src="js/lib/chai.js"></script>
    <script src="js/lib/sinon.js"></script>

Next, we prepare Mocha and Chai. Chai is configured to globally export the expect assertion function. Mocha is set up to use the bdd test interface and start tests on the window.onload event:

    <script>
      // Setup.
      var expect = chai.expect;
      mocha.setup("bdd");

      // Run tests on window load event.
      window.onload = function () {
        mocha.run();
      };
    </script>

After the library configurations, we add in the test specs. Here we include a single test file (that we will create later) for the initial test run:

    <script src="js/spec/hello.spec.js"></script>
  </head>

Finally, we include a div element that Mocha uses to generate the full HTML test report. Note that a common alternative practice is to place all the script include statements before the close body tag instead of within the head tag:

  <body>
    <div id="mocha"></div>
  </body>
</html>

And with that, we are ready to create some tests. Now, you could even open chapters/01/test/test.html in a browser to see what the test report looks like with an empty test suite.

Adding some tests

It is sufficient to say that test development generally entails writing JavaScript test files, each containing some organized collection of test functions. Let's start with a single test file to preview the testing technology stack and give us some tests to run.

The test file chapters/01/test/js/spec/hello.spec.js creates a simple function (hello()) to test and implements a nested set of suites introducing a few Chai and Sinon.JS features. The function under test is about as simple as you can get:

window.hello = function () {
  return "Hello World";
};

The hello function should be contained in its own library file (perhaps hello.js) for inclusion in applications and tests. The code samples simply include it in the spec file for convenience.

The test code uses nested Mocha describe statements to create a test suite hierarchy. The test in the Chai suite uses expect to illustrate a simple assertion.
The Sinon.JS suite's single test shows a test spy in action:

describe("Trying out the test libraries", function () {
  describe("Chai", function () {
    it("should be equal using 'expect'", function () {
      expect(hello()).to.equal("Hello World");
    });
  });

  describe("Sinon.JS", function () {
    it("should report spy called", function () {
      var helloSpy = sinon.spy(window, 'hello');
      expect(helloSpy.called).to.be.false;
      hello();
      expect(helloSpy.called).to.be.true;
      hello.restore();
    });
  });
});

Not to worry if you do not fully understand the specifics of these tests and assertions at this point, as we will shortly cover everything in detail. The takeaway is that we now have a small collection of test suites with a set of specifications ready to be run.

Running and assessing test results

Now that all the necessary pieces are in place, it is time to run the tests and review the test report.

The first test report

Opening up the chapters/01/test/test.html file in any web browser will cause Mocha to run all of the included tests and produce a test report:

Test report

This report provides a useful summary of the test run. The top-right column shows that two tests passed, none failed, and the tests collectively took 0.01 seconds to run. The test suites declared in our describe statements are present as nested headings. Each test specification has a green checkmark next to the specification text, indicating that the test has passed.

Test report actions

The report page also provides tools for analyzing subsets of the entire test collection. Clicking on a suite heading such as Trying out the test libraries or Chai will re-run only the specifications under that heading. Clicking on a specification text (for example, should be equal using 'expect') will show the JavaScript code of the test. A filter button designated by a right triangle is located to the right of the specification text (it is somewhat difficult to see). Clicking the button re-runs the single test specification.

The test specification code and filter

The previous figure illustrates a report in which the filter button has been clicked. The test specification text in the figure has also been clicked, showing the JavaScript specification code.

Advanced test suite and specification filtering

The report suite and specification filters rely on Mocha's grep feature, which is exposed as a URL parameter in the test web page. Assuming that the report web page URL ends with something such as chapters/01/test/test.html, we can manually add a grep filter parameter accompanied with the text to match suite or specification names. For example, if we want to filter on the term spy, we would navigate a browser to a comparable URL containing chapters/01/test/test.html?grep=spy, causing Mocha to run only the should report spy called specification from the Sinon.JS suite. It is worth playing around with various grep values to get the hang of matching just the suites or specifications that you want.

Test timing and slow tests

All of our tests so far have succeeded and run quickly, but real-world development necessarily involves a certain amount of failures and inefficiencies on the road to creating robust web applications. To this end, the Mocha reporter helps identify slow tests and analyze failures.

Why is test speed important? Slow tests can indicate inefficient or even incorrect application code, which should be fixed to speed up the overall web application.
Further, if a large collection of tests runs too slowly, developers will have implicit incentives to skip tests in development, leading to costly defect discovery later down the deployment pipeline. Accordingly, it is a good testing practice to routinely diagnose and speed up the execution time of the entire test collection. Slow application code may be left up to the developer to fix, but most slow tests can be readily fixed with a combination of tools such as stubs and mocks as well as better test planning and isolation.

Let's explore some timing variations in action by creating chapters/01/test/js/spec/timing.spec.js with the following code:

describe("Test timing", function () {
  it("should be a fast test", function (done) {
    expect("hi").to.equal("hi");
    done();
  });
  it("should be a medium test", function (done) {
    setTimeout(function () {
      expect("hi").to.equal("hi");
      done();
    }, 40);
  });
  it("should be a slow test", function (done) {
    setTimeout(function () {
      expect("hi").to.equal("hi");
      done();
    }, 100);
  });
  it("should be a timeout failure", function (done) {
    setTimeout(function () {
      expect("hi").to.equal("hi");
      done();
    }, 2001);
  });
});

We use the native JavaScript setTimeout() function to simulate slow tests. To make the tests run asynchronously, we use the done test function parameter, which delays test completion until done() is called. The first test has no delay before the test assertion and done() callback, the second adds 40 milliseconds of latency, the third adds 100 milliseconds, and the final test adds 2001 milliseconds. These delays will expose different timing results under the Mocha default configuration, which reports a slow test at 75 milliseconds, a medium test at one half the slow threshold, and a failure for tests taking longer than 2 seconds.

Next, include the file in your test driver page (chapters/01/test/test-timing.html in the example code):

<script src="js/spec/timing.spec.js"></script>

Now, on running the driver page, we get the following report:

Test report timings and failures

This figure illustrates timing annotation boxes for our medium (orange) and slow (red) tests and a test failure/stack trace for the 2001-millisecond test. With these report features, we can easily identify the slow parts of our test infrastructure and use more advanced test techniques and application refactoring to execute the test collection efficiently and correctly.
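The 75-millisecond slow threshold and the 2-second timeout mentioned above are only Mocha's defaults, and they can be adjusted when they do not fit your application. The following is a minimal sketch (the threshold values shown are arbitrary): the options object passed to mocha.setup() in the test driver page accepts slow and timeout settings, and an individual specification can override the timeout by calling this.timeout() inside a regular function expression:

// In the test driver page: raise the global thresholds.
mocha.setup({
  ui: "bdd",
  slow: 200,     // only flag tests slower than 200 ms
  timeout: 5000  // only fail tests that take longer than 5 seconds
});

// Or, within a single specification:
it("should tolerate a slow asynchronous call", function (done) {
  this.timeout(3000);
  setTimeout(function () {
    expect("hi").to.equal("hi");
    done();
  }, 2500);
});

Either approach would let the should be a timeout failure specification pass, although speeding the test up is usually a better fix than widening the thresholds.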
Test failures

A test timeout is one type of test failure we can encounter in Mocha. Two other failures that merit a quick demonstration are assertion and exception failures. Let's try out both in a new file named chapters/01/test/js/spec/failure.spec.js:

// Configure Mocha to continue after the first error to show
// both failure examples.
mocha.bail(false);

describe("Test failures", function () {
  it("should fail on assertion", function () {
    expect("hi").to.equal("goodbye");
  });
  it("should fail on unexpected exception", function () {
    throw new Error();
  });
});

The first test, should fail on assertion, is a Chai assertion failure, which Mocha neatly wraps up with the message expected 'hi' to equal 'goodbye'. The second test, should fail on unexpected exception, throws an unchecked exception that Mocha displays with a full stack trace. Stack traces on Chai assertion failures vary based on the browser. For example, in Chrome, no stack trace is displayed for the first assertion, while one is shown in Safari. See the Chai documentation for configuration options that offer more control over stack traces.

Mocha's failure reporting neatly illustrates what went wrong and where. Most importantly, Chai and Mocha report the most common case, a test assertion failure, in a very readable natural language format.

Summary

In this article, we introduced an application and test structure suitable for development, gathered the Mocha, Chai, and Sinon.JS libraries, and created some basic tests to get things started. Then, we reviewed some facets of the Mocha test reporter and watched various tests in action: passing, slow, timeouts, and failures.

Resources for Article:
Further resources on this subject:
The Aliens Have Landed! [Article]
Testing your App [Article]
Quick Start into Selenium Tests [Article]

Learning RSLogix 5000 – Buffering I/O Module Input and Output Values
Packt
03 Sep 2015
10 min read
In the following article by Austin Scott, the author of Learning RSLogix 5000 Programming, you will be introduced to the high-performance, asynchronous nature of the Logix family of controllers and the requirement for buffering the I/O module data it drives. You will learn various techniques for buffering I/O module values in RSLogix 5000 and Studio 5000 Logix Designer. You will also learn about the IEC languages that do not require the input or output module buffering techniques to be applied to them. In order to understand the need for buffering, let's start by exploring the evolution of the modern line of Rockwell Automation controllers.

(For more resources related to this topic, see here.)

ControlLogix controllers

The ControlLogix controller was first launched in 1997 as a replacement for Allen Bradley's previous large-scale control platform, the PLC-5. ControlLogix represented a significant technological step forward, which included a 32-bit ARM-6 RISC-core microprocessor and the ABrisc Boolean processor combined with a bus interface on the same silicon chip. At launch, the Series 5 ControlLogix controllers (also referred to as L5 and ControlLogix 5550, which has now been superseded by the L6 and L7 series controllers) were able to execute code three times faster than the PLC-5. The following is an illustration of the original ControlLogix L5 controller:

ControlLogix L5 Controller

The L5 controller is considered to be a PAC (Programmable Automation Controller) rather than a traditional PLC (Programmable Logic Controller), due to its modern design, power, and capabilities beyond a traditional PLC (such as motion control, advanced networking, batching, and sequential control). ControlLogix represented a significant technological step forward for Rockwell Automation, but this new technology also presented new challenges for automation professionals. ControlLogix was built using a modern asynchronous operating model rather than the more traditional synchronous model used by all the previous generations of controllers. The asynchronous operating model requires a different approach to real-time software development in RSLogix 5000 (known in version 20 and higher as Studio 5000 Logix Designer).

Logix operating cycle

The entire Logix family of controllers (ControlLogix and CompactLogix) has diverged from the traditional synchronous PLC scan architecture in favor of a more efficient asynchronous operation. Like most modern computer systems, asynchronous operation allows the Logix controller to handle multiple tasks at the same time by slicing the processing time between each task. The continuous update of information in an asynchronous processor creates some programming challenges, which we will explore in this article. The following diagram illustrates the difference between synchronous and asynchronous operation.

Synchronous versus Asynchronous Processor Operation

Addressing module I/O data

Individual channels on a module can be referenced in your Logix Designer / RSLogix 5000 programs using their address. An address gives the controller directions to where it can find a particular piece of information about a channel on one of your modules. The following diagram illustrates the components of an address in RSLogix 5000 or Studio 5000 Logix Designer:

The components of an I/O Module Address in Logix

Module I/O tags can be viewed using the Controller Tags window, as the following screenshot illustrates.
I/O Module Tags in Studio 5000 Logix Designer Controller Tags Window

Using the module I/O tags, input and output module data can be directly accessed anywhere within a logic routine. However, it is recommended that we buffer module I/O data before we evaluate it in logic. Otherwise, due to the asynchronous tag value updates in our I/O modules, the state of our process values could change partway through logic execution, thus creating unpredictable results. In the next section, we will introduce the concept of module I/O data buffering.

Buffering module I/O data

In the olden days of PLC-5s and SLC 500s, before we had access to high-performance asynchronous controllers like the ControlLogix, SoftLogix, and CompactLogix families, program execution was sequential (synchronous) and very predictable. In asynchronous controllers, there are many activities happening at the same time. Input and output values can change in the middle of a program scan and put the program in an unpredictable state. Imagine a program starting a pump in one line of code and closing a valve directly in front of that pump in the next line of code because it detected a change in process conditions. In order to address this issue, we use a technique called buffering and, depending on the version of Logix you are developing on, there are a few different methods of achieving this.

Buffering is a technique where the program code does not directly access the real input or output tags on the modules during the execution of a program. Instead, the input and output module tags are copied at the beginning of a program's scan to a set of base tags that will not change state during the program's execution. Think of buffering as taking a snapshot of the process conditions and making decisions on those static values rather than live values that are fluctuating every millisecond.

Today, there is a rule in most automation companies that requires programmers to write code that "buffers I/O" data to base tags that will not change during a program's execution. The two widely accepted methods of buffering are:

Buffering to base tags
Program parameter buffering (only available in Logix version 24 and higher)

Do not underestimate the importance of buffering a program's I/O. I worked on an expansion project for a process control system where the original programmers had failed to implement buffering. Once a month, the process would land in a strange state, which the program could not recover from. The operators had attributed these problems to "gremlins" for years, until I identified and corrected the issue.

Buffering to base tags

Logic can be organized into manageable pieces and executed based on different intervals and conditions. The buffering to base tags practice takes advantage of Logix's ability to organize code into routines. The default ladder logic routine that is created in every new Logix project is called MainRoutine.
The recommended best practice for buffering tags in ladder logic is to create three routines:

One for reading input values and buffering them
One for executing logic
One for writing the output values from the buffered values

The following ladder logic excerpt is from the MainRoutine of a program that implements input and output buffering:

MainRoutine Ladder Logic Routine with Input and Output Buffering Subroutine Calls

The following ladder logic is taken from the BufferInputs routine and demonstrates the buffering of digital input module tag values to Local tags prior to executing our PumpControl routine:

Ladder Logic Routine with Input Module Buffering

After our input module values have been buffered to Local tags, we can execute our process logic in our PumpControl routine without having to worry about our values changing in the middle of the routine's execution. The following ladder logic code determines whether all the conditions are met to run a pump:

Pump Control Ladder Logic Routine

Finally, after all of our process logic has finished executing, we can write the resulting values to our digital output modules. The following ladder logic BufferOutputs routine copies the resulting RunPump value to the digital output module tag:

Ladder Logic Routine with Output Module Buffering

We have now buffered our module inputs and module outputs in order to ensure they do not change in the middle of a program execution and potentially put our process into an undesired state.

Buffering Structured Text I/O module values

Just like ladder logic, Structured Text I/O module values should be buffered at the beginning of a routine or prior to executing a routine in order to prevent the values from changing mid-execution and putting the process into a state you could not have predicted. Following is an example of the ladder logic buffering routines written in Structured Text (ST) and using the non-retentive assignment operator:

(* I/O Buffering in Structured Text - Input Buffering *)
StartPump [:=] Local:2:I.Data[0].0;
HighPressure [:=] Local:2:I.Data[0].1;
PumpStartManualOverride [:=] Local:2:I.Data[0].2;

(* I/O Buffering in Structured Text - Output Buffering *)
Local:3:O.Data[0].0 [:=] RunPump;
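The PumpControl logic itself only appears above as ladder diagram screenshots, so here is a minimal Structured Text sketch of the same idea: the run/stop decision is evaluated purely against the buffered tags, never against the live module tags. The interlock conditions below are an assumption for illustration; only the tag names come from the buffering examples above.

(* Process logic sketch: evaluate only the buffered input tags *)
RunPump := (StartPump OR PumpStartManualOverride) AND NOT HighPressure;

Because RunPump is only copied to Local:3:O.Data[0].0 in the output buffering step, the physical output cannot change state partway through the routine's execution.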
Function Block Diagram (FBD) and Sequential Function Chart (SFC) I/O module buffering

Within Rockwell Automation's Logix platform, all of the supported IEC languages (ladder logic, structured text, function block, and sequential function chart) compile down to the same controller bytecode language, although the available functions and development interface in the various Logix programming languages are vastly different. Function Block Diagrams (FBD) and Sequential Function Charts (SFC) will always automatically buffer input values prior to executing logic. Once a Function Block Diagram or a Sequential Function Chart has completed the execution of its logic, it will write all output module values at the same time. There is no need to perform buffering on FBD or SFC routines, as it is automatically handled.

Buffering using program parameters

A program parameter is a powerful new feature in Logix that allows the association of dynamic values to tags and programs as parameters. The importance of program parameters is clear from the way they permeate the user interface in newer versions of Logix Designer (version 24 and higher). Program parameters are extremely powerful, but the key benefit to us is that they are automatically buffered. This means that we could have effectively created the same result in one ladder logic rung rather than the eight we created in the previous exercise.

There are four types of program parameters:

Input: This program parameter is automatically buffered and passed into a program on each scan cycle.
Output: This program parameter is automatically updated at the end of a program (as a result of executing that program) on each scan cycle. It is similar to the way we buffered our output module value in the previous exercise.
InOut: This program parameter is updated at the start of the program scan and at the end of the program scan. It is also important to note that, unlike the Input and Output parameters, the InOut parameter is passed as a pointer in memory. A pointer shares a piece of memory with other processes rather than creating a copy of it, which means that it is possible for an InOut parameter to change its value in the middle of a program scan. This makes InOut program parameters unsuitable for buffering when used on their own.
Public: This program parameter behaves like a normal controller tag and can be connected to Input, Output, and InOut parameters. Like the InOut parameter, Public parameters are updated globally as their values are changed, which makes them unsuitable for buffering when used on their own. Primarily, Public program parameters are used for passing large data structures between programs on a controller.

In Logix Designer version 24 and higher, a program parameter can be associated with a local tag using Parameters and Local Tags in the Controller Organizer (formerly called Program Tags). The module input channel can be associated with a base tag within your program scope using Parameter Connections.

Add the module input value as a parameter connection.

The previous screenshot demonstrates how we would associate the input module channel with our StartPump base tag using the Parameter Connection value.

Summary

In this article, we explored the asynchronous nature of the Logix family of controllers. We learned the importance of buffering input module and output module values for ladder logic routines and structured text routines. We also learned that, due to the way Function Block Diagram (FBD) and Sequential Function Chart (SFC) routines execute, there is no need to buffer input module or output module tag values. Finally, we introduced the concept of buffering tags using program parameters in version 24 and higher of Studio 5000 Logix Designer.

Resources for Article:
Further resources on this subject:
vROps – Introduction and Architecture [Article]
Ensuring Five-star Rating in the MarketPlace [Article]
Building Ladder Diagram programs (Simple) [Article]