
How-To Tutorials

6719 Articles
Creators of Python, Java, C#, and Perl discuss the evolution and future of programming language design at PuPPy

Bhagyashree R
08 Apr 2019
11 min read
At the first annual charity event conducted by Puget Sound Programming Python (PuPPy) last Tuesday, four legendary language creators came together to discuss the past and future of language design. This event was organized to raise funds for Computer Science for All (CSforALL), an organization which aims to make CS an integral part of the educational experience. Among the panelists were the creators of some of the most popular programming languages: Guido van Rossum, the creator of Python James Gosling, the founder, and lead designer behind the Java programming language Anders Hejlsberg, the original author of Turbo Pascal who has also worked on the development of C# and TypeScript Larry Wall, the creator of Perl The discussion was moderated by Carol Willing, who is currently a Steering Council member and developer for Project Jupyter. She is also a member of the inaugural Python Steering Council, a Python Software Foundation Fellow and former Director. Key principles of language design The first question thrown at the panelists was, “What are the principles of language design?” Guido van Rossum believes: [box type="shadow" align="" class="" width=""]Designing a programming language is very similar to the way JK Rowling writes her books, the Harry Potter series.[/box] When asked how he says JK Rowling is a genius in the way that some details that she mentioned in her first Harry Potter book ended up playing an important plot point in part six and seven. Explaining how this relates to language design he adds, “In language design often that's exactly how things go”. When designing a language we start with committing to certain details like the keywords we want to use, the style of coding we want to follow, etc. But, whatever we decide on we are stuck with them and in the future, we need to find new ways to use those details, just like Rowling. “The craft of designing a language is, in one hand, picking your initial set of choices so that's there are a lot of possible continuations of the story. The other half of the art of language design is going back to your story and inventing creative ways of continuing it in a way that you had not thought of,” he adds. When James Gosling was asked how Java came into existence and what were the design principles he abided by, he simply said, “it didn’t come out of like a personal passion project or something. It was actually from trying to build a prototype.” James Gosling and his team were working on a project that involved understanding the domain of embedded systems. For this, they spoke to a lot of developers who built software for embedded systems to know how their process works. This project had about a dozen people on it and Gosling was responsible for making things much easier from a programming language point of view. “It started out as kind of doing better C and then it got out of control that the rest of the project really ended up just providing the context”, he adds. In the end, the only thing out of that project survived was “Java”. It was basically designed to solve the problems of people who are living outside of data centers, people who are getting shredded by problems with networking, security, and reliability. Larry Wall calls himself a “linguist” rather than a computer scientist. He wanted to create a language that was more like a natural language. 
Explaining through an example, he said, “Instead of putting people in a university campus and deciding where they go we're just gonna see where people want to walk and then put shortcuts in all those places.” A basic principle behind creating Perl was to provide APIs to everything. It was aimed to be both a good text processing language linguistically but also a glue language. Wall further shares that in the 90s the language was stabilizing, but it did have some issues. So, in the year 2000, the Perl team basically decided to break everything and came up with a whole new set of design principles. And, based on these principles Perl was redesigned into Perl 6. Some of these principles were picking the right default, conserve your brackets because even Unicode does not have enough brackets, don't reinvent object orientation poorly, etc. He adds, [box type="shadow" align="" class="" width=""]“A great deal of the redesign was to say okay what is the right peg to hang everything on? Is it object-oriented? Is it something in the lexical scope or in the larger scope? What does the right peg to hang each piece of information on and if we don't have that peg how do we create it?”[/box] Anders Hejlsberg shares that he follows a common principle in all the languages he has worked on and that is “there's only one way to do a particular thing.” He believes that if a developer is provided with four different ways he may end up choosing the wrong path and realize it later in the development. According to Hejlsberg, this is why often developers end up creating something called “simplexity” which means taking something complex and wrapping a single wrapper on top it so that the complexity goes away. Similar to the views of Guido van Rossum, he further adds that any decision that you make when designing a language you have to live with it. When designing a language you need to be very careful about reasoning over what “not” to introduce in the language. Often, people will come to you with their suggestions for updates, but you cannot really change the nature of the programming language. Though you cannot really change the basic nature of a language, you can definitely extend it through extensions. You essentially have two options, either stay true to the nature of the language or you develop a new one. The type system of programming languages Guido van Rossum, when asked about the typing approach in Python, shared how it was when Python was first introduced. Earlier, int was not a class it was actually a little conversion function. If you wanted to convert a string to an integer you can do that with a built-in function. Later on, Guido realized that this was a mistake. “We had a bunch of those functions and we realized that we had made a mistake, we have given users classes that were different from the built-in object types.” That's where the Python team decided to reinvent the whole approach to types in Python and did a bunch of cleanups. So, they changed the function int into a designator for the class int. Now, calling the class means constructing an instance of the class. James Gosling shared that his focus has always been performance and one factor for improving performance is the type system. It is really useful for things like building optimizing compilers and doing ahead of time correctness checking. Having the type system also helps in cases where you are targeting small footprint devices. 
“To do that kind of compaction you need every kind of hope that it gives you, every last drop of information and, the earlier you know it, the better job you do,” he adds. Anders Hejlsberg looks at type systems as a tooling feature. Developers love their IDEs, they are accustomed to things like statement completion, refactoring, and code navigation. These features are enabled by the semantic knowledge of your code and this semantic knowledge is provided by a compiler with a type system. Hejlsberg believes that adding types can dramatically increase the productivity of developers, which is a counterintuitive thought. “We think that dynamic languages were easier to approach because you've got rid of the types which was a bother all the time. It turns out that you can actually be more productive by adding types if you do it in a non-intrusive manner and if you work hard on doing good type inference and so forth,” he adds. Talking about the type system in Perl, Wall started off by saying that Perl 5 and Perl 6 had very different type systems. In Perl 5, everything was treated as a string even if it is a number or a floating point. The team wanted to keep this feature in Perl 6 as part of the redesign, but they realized that “it's fine if the new user is confused about the interchangeability but it's not so good if the computer is confused about things.” For Perl 6, Wall and his team envisioned to make it a better object-oriented as well as a better functional programming language. To achieve this goal, it is important to have a very sound type system of a sound meta object model underneath. And, you also need to take the slogans like “everything is an object, everything is a closure” very seriously. What makes a programming language maintainable Guido van Rossum believes that to make a programming language maintainable it is important to hit the right balance between the flexible and disciplined approach. While dynamic typing is great for small programs, large programs require a much-disciplined approach. And, it is better if the language itself enables that discipline rather than giving you the full freedom of doing whatever you want. This is why Guido is planning to add a very similar technology like TypeScript to Python. He adds: [box type="shadow" align="" class="" width=""]“TypeScript is actually incredibly useful and so we're adding a very similar idea to Python. We are adding it in a slightly different way because we have a different context."[/box] Along with type system, refactoring engines can also prove to be very helpful. It will make it easier to perform large scale refactorings like millions of lines of code at once. Often, people do not rename methods because it is really hard to go over a piece of code and rename exactly this right variable. If you are provided with a refactoring engine, you just need to press a couple of buttons, type in the new name, and it will be refactored in maybe just 30 seconds. The origin of the TypeScript project was these enormous JavaScript codebases. As these codebases became bigger and bigger, it became quite difficult to maintain them. These codebases basically became “write-only code” shared Anders Hejlsberg. He adds that this is why we need a semantic understanding of the code, which makes refactoring much easier. “This semantic understanding requires a type system to be in place and once you start adding that you add documentation to the code,” added Hejlsberg. 
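To ground Guido's point about bringing a TypeScript-like idea to Python: the optional type annotations Python later standardized (PEP 484 and the typing module) are exactly this kind of non-intrusive layer, ignored at run time and checked by external tools such as mypy. A small illustrative sketch (the function here is made up for demonstration):

from typing import List

def average(scores: List[float]) -> float:
    return sum(scores) / len(scores)

print(average([8.5, 9.0, 7.5]))   # annotations cost nothing at run time
# average("oops")                 # ...but a static checker such as mypy
                                  # reports the bad argument before the code runs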
Wall agreed, noting that "good lexical scoping helps with refactoring".

The future of programming language design

When asked about the future of language design, James Gosling said that a badly underexplored area in programming is writing code for GPUs. He pointed out that we still do not have a programming language that works smoothly with GPUs, and that much work remains to be done there. Anders Hejlsberg observed that programming languages do not evolve at the same speed as hardware or the other technologies around them; in terms of evolution, they are more like mathematics and the human brain. He said, "We're still programming in languages that were invented 50 years ago; all of the principles of functional programming were thought of more than 50 years ago." He does believe, however, that instead of segregating into separate categories such as object-oriented or functional, languages are becoming multi-paradigm: "Languages are becoming more multi-paradigm. I think it is wrong to talk about, oh, I only like object-oriented programming, or imperative programming, or functional programming." What matters now is staying aware of recent research, new thinking, and new paradigms, and incorporating them into our programming style tastefully. Watch the full panel discussion hosted by PuPPy for more detail.

Data Visualization with ggplot2

Janu Verma
14 Nov 2016
6 min read
In this post, we will learn about data visualization using ggplot2. ggplot2 is an R package for data exploration and visualization. It produces amazing graphics that are easy to interpret. The main use of ggplot2 is in exploratory analysis, and it is an important element of a data scientist’s toolkit. The ease with which complex graphs can be plotted using ggplot2 is probably its most attractive feature. It also allows you to slice and dice data in many different ways. ggplot2 is an implementation of A Layered Grammar of Graphics by Hadley Wickham, who is certainly the strongest R programmer out there. Installation Installing packages in R is very easy. Just type the following command on the R prompt. install.packages("ggplot2") Import the package in your R code. library(ggplot2) qplot We will start with the function qplot(). qplot is the simplest plotting function in ggplot2. It is very similar to the generic plot() function in basic R. We will learn how to plot basic statistical and exploratory plots using qplot. We will use the Iris dataset that comes with the base R package (and with every other data mining package that I know of). The Iris data consists of observations of phenotypic traits of three species of iris. In R, the iris data is provided as a data frame of 150 rows and 5 columns. The head command will print first 6 rows of the data. head(iris) The general syntax for the qplot function is: qplot(x,y, data=data.frame) We will plot sepal length against petal length: qplot(Sepal.Length, Petal.Length, data = iris) We can color each point by adding a color argument. We will color each point by what species it belongs to. qplot(Sepal.Length, Petal.Length, data = iris, color = Species) An observant reader would notice that this coloring scheme provides a way to visualize clustering. Also, we can change the size of each point by adding a size argument. Let the size correspond to the petal width. qplot(Sepal.Length, Petal.Length, data = iris, color = Species, size = Petal.Width) Thus we have a visualization for four-dimensional data. The alpha argument can be used to increase the transparency of the circles. qplot(Sepal.Length, Petal.Length, data = iris, color = Species, size = Petal.Width, alpha = I(0.7))   This reduces the over-plotting of the data. Label the graph using xlab, ylab, and main arguments. qplot(Sepal.Length, Petal.Length, data = iris, color = Species, xlab = "Sepal Length", ylab = "Petal Length", main = "Sepal vs. Petal Length in Fisher's Iris data")   All the above graphs were scatterplots. We can use the geom argument to draw other types of graphs. Histogram qplot(Sepal.Length, data = iris, geom="bar")   Line chart qplot(Sepal.Length, Petal.Length, data = iris, geom = "line", color = Species) ggplot Now we'll move to the ggplot() function, which has a much broader range of graphing techniques. We'll start with the basic plots similar to what we did with qplot(). First things first, load the library: library(ggplot2) As before, we will use the iris dataset. For ggplot(), we generate aesthetic mappings that describe how variables in the data are mapped to visual properties. This is specified by the aes function. Scatterplot ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) + geom_point() This is exactly what we got for qplot(). The syntax is a bit unintuitive, but is very consistent. The basic structure is: ggplot(data.frame, aes(x=, y=, ...)) + geom_*(.) + .... Add Colors in the scatterplot. 
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) + geom_point() The geom argument We can other geoms to create different types of graphs, for example, linechart: ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color=Species)) + geom_line() + ggtitle("Plot of sepal length vs. petal length") Histogram ggplot(iris, aes(x = Sepal.Length)) + geom_histogram(binwidth = .2) Histogram with color. Use the fill argument ggplot(iris, aes(x = Sepal.Length, fill=Species)) + geom_histogram(binwidth = .2)   The position argument can fine tune positioning to achieve useful effects; for example, we can adjust the position by dodging overlaps to the side. ggplot(iris, aes(x = Sepal.Length, fill = Species)) + geom_histogram(binwidth = .2, position = "dodge")   Labeling ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color=Species)) + geom_point() + ggtitle("Plot of sepal length vs. petal length") size and alpha arguments ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color=Species, size=Petal.Width)) + geom_point(alpha=0.7) + ggtitle("Plot of sepal length vs. petal length") We can also transform variables directly in the ggplot call. ggplot(iris, aes(x = log(Sepal.Length), y = Petal.Length/Petal.Width, color=Species)) + geom_point()   ggplot allows slicing of data. A way to split up the way we look at data is with the facets argument. These break the plot into multiple plots. ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color=Species)) + geom_point() + facet_wrap(~Species) + ggtitle("Plot of sepal length vs. petal length")   Themes We can use a whole range of themes for ggplot using the R package ggthemes. install.packages('ggthemes', dependencies = TRUE) library(ggthemes) Essentially you add the theme_*() argument to the ggplot call. The Economist theme: For someone like me who reads The Economist regularly and might work there one day (ask for my resume if you know someone!!!!), it would be fun/useful to try to reproduce some of the graphs they publish. We may not have the data available, but we have the theme. ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) + geom_point() + theme_economist() Five Thirty Eight theme: Nate Silver is probably the most famous statistician (data scientist). His company, Five Thirty Eight, is a must for any data scientist. Good folks at ggthemes have created a 538 theme as well. ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) + geom_point() + theme_fivethirtyeight()   About the author Janu Verma is a researcher in the IBM T.J. Watson Research Center, New York. His research interests are mathematics, machine learning, information visualization, computational biology and healthcare analytics. He has held research positions at Cornell University, Kansas State University, Tata Institute of Fundamental Research, Indian Institute of Science, and Indian Statistical Institute.  He has written papers for IEEE Vis, KDD, International Conference on HealthCare Informatics, Computer Graphics and Applications, Nature Genetics, IEEE Sensors Journals, and so on.  His current focus is on the development of visual analytics systems for prediction and understanding. He advises start-ups and companies on data science and machine learning in the Delhi-NCR area.
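A closing note for Python users: the layered grammar of graphics demonstrated in this article is also available in Python through the third-party plotnine package, which mirrors ggplot2's API closely. A rough sketch reproducing the colored iris scatterplot, assuming plotnine, pandas, and scikit-learn are installed:

import pandas as pd
from sklearn.datasets import load_iris
from plotnine import ggplot, aes, geom_point, ggtitle

iris = load_iris()
df = pd.DataFrame(iris.data, columns=["sepal_length", "sepal_width",
                                      "petal_length", "petal_width"])
df["species"] = [iris.target_names[i] for i in iris.target]

plot = (ggplot(df, aes(x="sepal_length", y="petal_length", color="species"))
        + geom_point()
        + ggtitle("Sepal vs. petal length in Fisher's iris data"))
plot.save("iris_scatter.png")   # or evaluate `plot` in a notebook to display it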

Getting Started with Blender’s Particle System

Packt
19 Jun 2010
10 min read
We’ll cover a general introduction about Blender’s particle system; how to use it, when to use it, how useful it is on both small scale and large scale projects, and an overview of its implications. In this article, I’ll also discuss how to use the particle system to instance objects These are just but very few examples of what can really be achieved with the particle system, there’s a vast array of possibilities out there that can be discovered. But I’m hoping with these examples and definitions, you’ll find your way through the wonderful world of visual effects with the particle system. If you remember watching Blender Foundation’s Big Buck Bunny, it might not seem as a surprise to you that almost the entire film is filled up with so much particle goodness. Thanks to that, great improvements have been made to Blender’s particle system. As compared to my previous articles, this one uses Blender 2.5, the latest version of Blender which is currently in Alpha stage right now but is fully workable. However, in this article I wouldn’t be telling you what each of the button would do since that would take another article in itself and might deviate from an introductory approach. If you wanted to know more in-depth information about the whole particle system, you can check out the official Blender documentation or the Blender wiki. There had been great improvements on the particle system and now is the right time to start trying it out. So what are you waiting for? Hop on and ride with me in this journey! You can watch the video here. Prerequisites Before we begin with this article, it’s vital to first determine our requirements before proceeding on the actual application. Before going on, we need to have the following: Latest Blender 2.5 (you can grab a copy at http://www.blender.org or http://www.graphicall.org) Adequate hard disk space Decent processor and graphics card What is a Particle System and where can we find it in Blender? A particle system is a technique in Computer Graphics that is used to simulate a visual effect that would otherwise be very difficult and cumbersome if not impossible to do in traditional 3D techniques. These effects include crowd simulation, hair, fur, grass, fire, smoke, fluids, dust, snowflakes, and more. Particle Systems are calculated inside of your simulation program in several ways and one of the most common ones is the Newtonian Physics calculation which regards forces like wind, magnetism, friction, gravity, etc. to generate the particle’s motion along a controllable environment. Properties or parameters of particle systems include: mass, speed, velocity, size, life, color, transparency, softness, damping, and many more. Particle’s behavior along a certain span of time are then saved and written on to your disk. This process is called caching, which enables you to view and edit the particle’s behavior within the timeframe set. And when you’re satisfied with the way your particles act on your simulation space, you can now permanently write these settings to the disk, called baking. Be aware that the longer your particle system’s life is and the greater the number, the more hard disk space it uses, so for testing purposes, it is best to keep the particles number low, then progressively increase them as needed. Particle mass refers the natural weight (in a gravitational field) of a single particle object/point, where higher mass means a heavier particle and a lower mass implies a lighter particle. 
A heavy particle is more resistant to external forces such as wind but more reactive to gravitational force as compared to a lighter particle system which is a direct opposite. Particle speed/velocity refers to how fast or slow the particle points are being thrown or emitted from their source within a given time. They can be displaced and controlled in the x, y, or z directions accordingly. A higher velocity will create a faster shooting particle (as seen in muzzle flares/flash) and a lower one will create slower particle shots. Size of the particles is one of the most important setting when using a particle system as object instances, since this will better control the size of the objects on the emitting plane as compared to manually resizing the original object. With this option, you can have random sizes which will give the scene a more believable and natural look and offcourse, an easier setup as compared to creating numerous objects for this matter. Particle life refers to the lifespan of the particles, for how long they will be existent until they disappear and die from the system. In some cases, having a large particle life is useful but it can have some speed drawbacks on your machine. Imagine emitting one million particles within 5 seconds where only half of it is relevant within the first few seconds. That could mean 500,000 particles cached is a sheer waste and rendering your machine slower. Having smaller particle lives sometimes is also useful especially when you’re creating small scale smoke and fire. But all these really depend on the way you setup your scenes. Play around and you’ll find the best settings that suit your needs. Note though that in Blender, only Mesh objects can be particle emitters (which is simply rational for this purpose). You can modify the size and shape of the mesh object and the particle system reacts accordingly. The direction from which particles are emitted is dictated by the mesh normals, which I will discuss along the way. In Blender 2.5, we can find the Particle System by selecting any mesh object and clicking the Particles Button in the Properties Editor, as seen in the screenshot. Later, we’ll add and name particle systems and go over their settings more closely. Blender 2.5’s Particle System   What are the types of Blender particle system? Currently, there are only two types of particle systems in Blender 2.5, namely Emmiter and Hair.  With few on the list, these two have already proved to be very handy tools in creating robust simulations. Particle System Types   Let’s begin by learning what an Emitter Type is. Aside from the fact that the term emitter is used to define the source of the particles, the emitter type is one different thing on its own. The Emitter, being the default particle type, is one of the most commonly used particle system type. With it you can create dust, smoke, fire, fluids, snowflakes, and the like. Inside Blender, make the default cube a particle emitter with the type set as Emitter. Leave the default settings as they are and press ALT+A in the 3D Viewport to playback the animation and observe the way the particles act in your observable space, that is, the 3D space that your object is in. During the playback process, the particle system is already caching the position of the particle points in your space and writing these on to the disk. Naturally, this is how the emitter type will act on our space, with a slight pulse from emitter then it drops down as an effect of the gravity. 
That pulse is coming from the Normal value which tells how strong the points are spurted out until they get affected by external forces. There’s more to the emitter type than there seems to be, so don’t hold back and keep changing the settings and see where they lead you to. And if you can, please send me what you got, I’d be very pleased to see what fun things you came up with. Emitter Type   Leaving from the default settings we had awhile back, change your current frame to 1 (if it is not yet set as such) then let’s change the type from Emitter to Hair. Instantly, you’ll notice that strands come bursting out of our default mesh. This is the particle hair’s nature, whichever frame you are in now, it will stay in the same frame, regardless of any explicit animation you add in. As compared to the Emitter type, Hair doesn’t need to be cached to see the results, it’s an on-the-fly process of editing particle settings. Another great advantage of the hair type over the emitter type is that it has a dedicated Particle Mode which enables you to edit the strands as though you were actually touching real hair. With the hair type, you can create systems like fur, grass, static particle instances/grouping, and more. Hair Type   Basic and Practical Uses of the Particle System Now let’s on the real business. From here on, I’ll approach the proceeding steps in a practical application manner. First, let’s check some of the default settings and see where they lead us. Fire up a fresh Blender session by opening up Blender or by pressing CTRL+N (when you already have it open from our steps awhile ago). Fresh Blender 2.5 Session   Delete the default cube and add in a Plane object. Adding a Plane Primitive   Position the camera’s view such that the plane is just on top of the viewing range, this will give us more space to see the particles falling. Adjusting the View   Select the Plane Mesh and in the Particle Buttons window, add a new Particle System and leave the default values as they are. In the 3D Viewport, press ALT+A to play and pause the animation or use the play buttons in the Timeline Window. Pick a frame you’re satisfied with, any will do right now. New Particle System   Next, add a new material to the Plane Mesh and the Particle System. Activate the Material Buttons and add a new material, then change the material type from Surface to Halo. Halos is a special material type for particles; they give each point a unique shading system compared to the Surface shaders. With halos, you can control the size of the particles, their transparency, color, brightness, hardness, textures, flares, and other interesting effects. With just the default values, let’s render out our scene and see what we get. Halo Material   Halo Rendered   Right now, this doesn’t look too pleasing, but it will be once we get the combinations right. Next, let’s try activating Rings under the halo menu, then Lines, Star, and finally a combination of the three. Below are rendered versions of each and their combination. Additional Options for Halo   Halo with Rings   Halo with Lines   Halo with Star   Halo with Rings, Lines, and Star   Just by altering the halo settings alone, we can achieve particle effects like pollen dust as seen in the image below where the particle effect was composited over a photo. Particle Pollen Dust  
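Everything above was done through the interface, but the same emitter settings can also be driven from Blender's Python console. The snippet below uses the current bpy API rather than the Blender 2.5 build this article was written against, so property names may differ slightly between versions; treat it as a rough sketch:

import bpy

obj = bpy.context.object                       # the selected mesh (for example, the Plane)
obj.modifiers.new(name="Dust", type='PARTICLE_SYSTEM')

settings = obj.particle_systems[-1].settings   # the particle settings datablock
settings.type = 'EMITTER'                      # or 'HAIR'
settings.count = 1000                          # keep the count low while testing
settings.lifetime = 50                         # frames each particle lives
settings.normal_factor = 1.0                   # the initial "pulse" along the mesh normals
settings.frame_start = 1
settings.frame_end = 100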

Optimization in Python

Packt
19 Aug 2015
14 min read
The path to mastering performance in Python has just started. Profiling only takes us half way there. Measuring how our program is using the resources at its disposal only tells us where the problem is, not how to fix it. In this article by Fernando Doglio, author of the book Mastering Python High Performance, we will cover the process of optimization, and to do that, we need to start with the basics. We'll keep it inside the language for now: no external tools, just Python and the right way to use it. We will cover the following topics in this article: Memoization / lookup tables Usage of default arguments (For more resources related to this topic, see here.) Memoization / lookup tables This is one of the most common techniques used to improve the performance of a piece of code (namely a function). We can save the results of expensive function calls associated to a specific set of input values and return the saved result (instead of redoing the whole computation) when the function is called with the remembered input. It might be confused with caching, since it is one case of it, although this term refers also, to other types of optimization (such as HTTP caching, buffering, and so on) This methodology is very powerful, because in practice, it'll turn what should have been a potentially very expensive call into a O(1) function call if the implementation is right. Normally, the parameters are used to create a unique key, which is then used on a dictionary to either save the result or obtain it if it's been already saved. There is, of course, a trade-off to this technique. If we're going to be remembering the returned values of a memoized function, then we'll be exchanging memory space for speed. This is a very acceptable trade-off, unless the saved data becomes more than what the system can handle. Classic use cases for this optimization are function calls that repeat the input parameters often. This will assure that most of the times, the memoized results are returned. If there are many function calls but with different parameters, we'll only store results and spend our memory without any real benefit, as seen in the following diagram: You can clearly see how the blue bar (Fixed params, memoized) is clearly the fastest use case, while the others are all similar due to their nature. Here is the code that generates values for the preceding chart. 
To generate some sort of time-consuming function, the code will call either the twoParams function or the twoParamsMemoized function several hundred times under different conditions, and it will log the execution time: import math import time import random class Memoized: def __init__(self, fn): self.fn = fn self.results = {} def __call__(self, *args): key = ''.join(map(str, args[0])) try: return self.results[key] except KeyError: self.results[key] = self.fn(*args) return self.results[key] @Memoized def twoParamsMemoized(values, period): totalSum = 0 for x in range(0, 100): for v in values: totalSum = math.pow((math.sqrt(v) * period), 4) + totalSum return totalSum def twoParams(values, period): totalSum = 0 for x in range(0, 100): for v in values: totalSum = math.pow((math.sqrt(v) * period), 4) + totalSum return totalSum def performTest(): valuesList = [] for i in range(0, 10): valuesList.append(random.sample(xrange(1, 101), 10)) start_time = time.clock() for x in range(0, 10): for values in valuesList: twoParamsMemoized(values, random.random()) end_time = time.clock() - start_time print "Fixed params, memoized: %s" % (end_time) start_time = time.clock() for x in range(0, 10): for values in valuesList: twoParams(values, random.random()) end_time = time.clock() - start_time print "Fixed params, without memoizing: %s" % (end_time) start_time = time.clock() for x in range(0, 10): for values in valuesList: twoParamsMemoized(random.sample(xrange(1,2000), 10), random.random()) end_time = time.clock() - start_time print "Random params, memoized: %s" % (end_time) start_time = time.clock() for x in range(0, 10): for values in valuesList: twoParams(random.sample(xrange(1,2000), 10), random.random()) end_time = time.clock() - start_time print "Random params, without memoizing: %s" % (end_time) performTest() The main insight to take from the preceding chart is that just like with every aspect of programming, there is no silver bullet algorithm that will work for all cases. Memoization is clearly a very basic way of optimizing code, but clearly, it won't optimize anything given the right circumstances. As for the code, there is not much to it. It is a very simple, non real-world example of the point I was trying to send across. The performTest function will take care of running a series of 10 tests for every use case and measure the total time each use case takes. Notice that we're not really using profilers at this point. We're just measuring time in a very basic and ad-hoc way, which works for us. The input for both functions is simply a set of numbers on which they will run some math functions, just for the sake of doing something. The other interesting bit about the arguments is that, since the first argument is a list of numbers, we can't just use the args parameter as a key inside the Memoized class' methods. This is why we have the following line: key = ''.join(map(str, args[0])) This line will concatenate all the numbers from the first parameter into a single value, which will act as the key. The second parameter is not used here, because it's always random, which would imply that the key would never be the same. Another variation of the preceding method is to pre-calculate all values from the function (assuming we have a limited number of inputs of course) during initialization and then refer to the lookup table during execution. 
This approach has several preconditions: The number of input values must be finite; otherwise it's impossible to precalculate everything The lookup table with all of its values must fit into memory Just like before, the input must be repeated, at least once, so the optimization both makes sense and is worth the extra effort There are different approaches when it comes to architecting the lookup table, all offering different types of optimizations. It all depends on the type of application and solution that you're trying to optimize. Here is a set of examples. Lookup on a list or linked list This solution works by iterating over an unsorted list and checking the key against each element, with the associated value as the result we're looking for. This is obviously a very slow method of implementation, with a big O notation of O(n) for both the average and worst case scenarios. Still, given the right circumstances, it could prove to be faster than calling the actual function every time. In this case, using a linked list would improve the performance of the algorithm over using a simple list. However, it would still depend heavily on the type of linked list it is (doubly linked list, simple linked list with direct access to the first and last elements, and so on). Simple lookup on a dictionary This method works using a one-dimensional dictionary lookup, indexed by a key consisting of the input parameters (enough of them create a unique key). In particular cases (like we covered earlier), this is probably one of the fastest lookups, even faster than binary search in some cases with a constant execution time (big O notation of O(1)). Note that this approach is efficient as long as the key-generation algorithm is capable of generating unique keys every time. Otherwise, the performance could degrade over time due to the many collisions on the dictionaries. Binary search This particular method is only possible if the list is sorted. This could potentially be an option depending on the values to sort. Yet, sorting them would require an extra effort that would hurt the performance of the entire effort. However, it presents very good results even in long lists (average big O notation of O(log n)). It works by determining in which half of the list the value is and repeating until either the value is found or the algorithm is able to determine that the value is not in the list. To put all of this into perspective, looking at the Memoized class mentioned earlier, it implements a simple lookup on a dictionary. However, this would be the place to implement either of the other algorithms. Use cases for lookup tables There are some classic example use cases for this type of optimization, but the most common one is probably the optimization of trigonometric functions. Based on the computing time, these functions are really slow. When used repeatedly, they can cause some serious damage to your program's performance. This is why it is normally recommended to precalculate the values of these functions. For functions that deal with an infinite domain universe of possible input values, this task becomes impossible. So, the developer is forced to sacrifice accuracy for performance by precalculating a discrete subdomain of the possible input values (that is, going from floating points down to integer numbers). This approach might not be ideal in some cases, since some systems require both performance and accuracy. 
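To make the dictionary and binary-search variants just described concrete, here is a minimal sketch; the names are illustrative, and the precomputed table is assumed to be small enough to fit in memory:

import bisect

# Dictionary lookup: O(1) on average, keyed directly by the input value
table = {x: x * x for x in range(1000)}          # stand-in for an expensive function

def dict_lookup(x):
    return table[x]

# Binary search over sorted keys: O(log n), only possible because the keys are sorted
sorted_keys = sorted(table)
sorted_values = [table[k] for k in sorted_keys]

def binary_lookup(x):
    i = bisect.bisect_left(sorted_keys, x)
    if i < len(sorted_keys) and sorted_keys[i] == x:
        return sorted_values[i]
    raise KeyError(x)

print(dict_lookup(42))      # 1764
print(binary_lookup(42))    # 1764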
So, the solution is to meet in the middle and use some form of interpolation to calculate the wanted value, based on the ones that have been precalculated. It will provide better accuracy. Even though it won't be as performant as using the lookup table directly, it should prove to be faster than doing the trigonometric calculation every time. Let's look at some examples of this, for instance, for the following trigonometric function: def complexTrigFunction(x): return math.sin(x) * math.cos(x)**2 We'll take a look at how simple precalculation won't be accurate enough and how some form of interpolation will result in a better level of accuracy. The following code will precalculate the values for the function on a range from -1000 to 1000 (only integer values). Then, it'll try to do the same calculation (only for a smaller range) for floating point numbers: import math import time from collections import defaultdict import itertools trig_lookup_table = defaultdict(lambda: 0) def drange(start, stop, step): assert(step != 0) sample_count = math.fabs((stop - start) / step) return itertools.islice(itertools.count(start, step), sample_count) def complexTrigFunction(x): return math.sin(x) * math.cos(x)**2 def lookUpTrig(x): return trig_lookup_table[int(x)] for x in range(-1000, 1000): trig_lookup_table[x] = complexTrigFunction(x) trig_results = [] lookup_results = [] init_time = time.clock() for x in drange(-100, 100, 0.1): trig_results.append(complexTrigFunction(x)) print "Trig results: %s" % (time.clock() - init_time) init_time = time.clock() for x in drange(-100, 100, 0.1): lookup_results.append(lookUpTrig(x)) print "Lookup results: %s" % (time.clock() - init_time) for idx in range(0, 200): print "%st%s" % (trig_results [idx], lookup_results[idx]) The results from the preceding code will help demonstrate how the simple lookup table approach is not accurate enough (see the following chart). However, it compensates for it on speed since the original function takes 0.001526 seconds to run while the lookup table only takes 0.000717 seconds. The preceding chart shows how the lack of interpolation hurts the accuracy. You can see how even though both plots are quite similar, the results from the lookup table execution aren't as accurate as the trig function used directly. So, now, let's take another look at the same problem. 
However, this time, we'll add some basic interpolation (we'll limit the rage of values from -PI to PI): import math import time from collections import defaultdict import itertools trig_lookup_table = defaultdict(lambda: 0) def drange(start, stop, step): assert(step != 0) sample_count = math.fabs((stop - start) / step) return itertools.islice(itertools.count(start, step), sample_count) def complexTrigFunction(x): return math.sin(x) * math.cos(x)**2 reverse_indexes = {} for x in range(-1000, 1000): trig_lookup_table[x] = complexTrigFunction(math.pi * x / 1000) complex_results = [] lookup_results = [] init_time = time.clock() for x in drange(-10, 10, 0.1): complex_results .append(complexTrigFunction(x)) print "Complex trig function: %s" % (time.clock() - init_time) init_time = time.clock() factor = 1000 / math.pi for x in drange(-10 * factor, 10 * factor, 0.1 * factor): lookup_results.append(trig_lookup_table[int(x)]) print "Lookup results: %s" % (time.clock() - init_time) for idx in range(0, len(lookup_results )): print "%st%s" % (complex_results [idx], lookup_results [idx]) As you might've noticed in the previous chart, the resulting plot is periodic (specially because we've limited the range from -PI to PI). So, we'll focus on a particular range of values that will generate one single segment of the plot. The output of the preceding script also shows that the interpolation solution is still faster than the original trigonometric function, although not as fast as it was earlier: Interpolation Solution Original function 0.000118 seconds 0.000343 seconds The following chart is a bit different from the previous one, especially because it shows (in green) the error percentage between the interpolated value and the original one: The biggest error we have is around 12 percent (which represents the peaks we see on the chart). However, it's for the smallest values, such as -0.000852248551417 versus -0.000798905501416. This is a case where the error percentage needs to be contextualized to see if it really matters. In our case, since the values related to that error are so small, we can ignore that error in practice. There are other use cases for lookup tables, such as in image processing. However, for the sake of this article, the preceding example should be enough to demonstrate their benefits and trade-off implied in their usage. Usage of default arguments Another optimization technique, one that is contrary to memoization, is not particularly generic. Instead, it is directly tied to how the Python interpreter works. Default arguments can be used to determine values once at function creation time instead of at run time. This can only be done for functions or objects that will not be changed during program execution. Let's look at an example of how this optimization can be applied. The following code shows two versions of the same function, which does some random trigonometric calculation: import math #original function def degree_sin(deg): return math.sin(deg * math.pi / 180.0) #optimized function, the factor variable is calculated during function creation time, #and so is the lookup of the math.sin method. def degree_sin(deg, factor=math.pi/180.0, sin=math.sin): return sin(deg * factor) This optimization can be problematic if not correctly documented. Since it uses attributes to precompute terms that should not change during the program's execution, it could lead to the creation of a confusing API. 
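The reason this technique works is that default argument values are evaluated exactly once, when the def statement is executed, not on every call; that is also why it can surprise callers. The classic mutable-default example below is a standalone illustration of that behaviour, not code from the book:

def append_to(item, bucket=[]):   # the default list is created once, at definition time
    bucket.append(item)
    return bucket

print(append_to(1))   # [1]
print(append_to(2))   # [1, 2] -- the same list object is reused across calls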
With a quick and simple test, we can double-check the performance gain from this optimization:

import time
import math

def degree_sin(deg):
    return math.sin(deg * math.pi / 180.0) * math.cos(deg * math.pi / 180.0)

def degree_sin_opt(deg, factor=math.pi/180.0, sin=math.sin, cos=math.cos):
    return sin(deg * factor) * cos(deg * factor)

normal_times = []
optimized_times = []

for y in range(100):
    init = time.clock()
    for x in range(1000):
        degree_sin(x)
    normal_times.append(time.clock() - init)

    init = time.clock()
    for x in range(1000):
        degree_sin_opt(x)
    optimized_times.append(time.clock() - init)

print "Normal function: %s" % (reduce(lambda x, y: x + y, normal_times, 0) / 100)
print "Optimized function: %s" % (reduce(lambda x, y: x + y, optimized_times, 0) / 100)

The preceding code measures how long each version of the function takes to run 1000 times, repeats that measurement 100 times, and finally averages the results for each case. The averaged results are displayed in the following chart. It clearly isn't an amazing optimization, but it does shave some microseconds off our execution time, so we'll keep it in mind. Just remember that this optimization could cause problems if you're working as part of an OS developer team.

Summary

In this article, we covered several optimization techniques. Some of them are meant to provide big boosts in speed or to save memory, while others yield only minor speed improvements. Most of this article covered Python-specific techniques, but some of them can be translated into other languages as well.
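One last note related to the memoization section at the start of this article: on Python 3.2 and later, the standard library ships functools.lru_cache, which implements the same remember-the-results pattern as the hand-rolled Memoized class. A minimal sketch; arguments must be hashable, so the list input used in the earlier example would need to be passed as a tuple:

from functools import lru_cache
import math

@lru_cache(maxsize=None)          # unbounded cache, like the Memoized class above
def two_params_memoized(values, period):
    total = 0
    for _ in range(100):
        for v in values:
            total += math.pow(math.sqrt(v) * period, 4)
    return total

print(two_params_memoized((1, 2, 3), 0.5))   # computed once...
print(two_params_memoized((1, 2, 3), 0.5))   # ...then served from the cache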

Customization in Microsoft Dynamics CRM

Packt
30 Dec 2014
24 min read
 In this article by Nicolae Tarla, author of the book Dynamics CRM Application Structure, we looked at the basic structure of Dynamics CRM, the modules comprising the application, and what each of these modules contain. Now, we'll delve deeper into the application and take a look at how we can customize it. In this chapter, we will take a look at the following topics: Solutions and publishers Entity elements Entity types Extending entities Entity forms, quick view, and quick create forms Entity views and charts Entity relationships Messages Business rules We'll be taking a look at how to work with each of the elements comprising the sales, service, and marketing modules. We will go through the customization options and see how we can extend the system to fit new business requirements. (For more resources related to this topic, see here.) Solutions When we are talking about customizations for Microsoft Dynamics CRM, one of the most important concepts is the solution. The solution is a container of all the configurations and customizations. This packaging method allows customizers to track customizations, export and reimport them into other environments, as well as group specific sets of customizations by functionality or project cycle. Managing solutions is an aspect that should not be taken lightly, as down the road, a properly designed solution packaging model can help a lot, or an incorrect one can create difficulties. Using solutions is a best practice. While you can implement customizations without using solutions, these customizations will be merged into the base solutions and "you will not be able to export the customizations separately from the core elements of the platform. For a comprehensive description of solutions, you can refer to the MSDN documentation available at http://msdn.microsoft.com/en-gb/library/gg334576.aspx#BKMK_UnmanagedandManagedSolutions. Types of solutions Within the context of Dynamics CRM, there are two types of solutions that you will commonly use while implementing customizations: Unmanaged solutions Managed solutions Each one of these solution types has its own strengths and properties and are recommended to be used in various circumstances. In order to create and manage solutions as well as perform system customizations, the user account must be configured as a system customizer or system administrator. Unmanaged solutions An unmanaged solution is the default state of a newly created solution. A solution is unmanaged for the period of time while customization work is being performed in the context of the solution. You cannot customize a managed solution. An unmanaged solution can be converted to a managed solution by exporting it as managed. When the work is completed and the unmanaged solution is ready to be distributed, it is recommended that you package it as a managed solution for distribution. A managed solution, if configured as such, prevents further customizations to the solution elements. For this reason, solution vendors package their solutions as managed. In an unmanaged solution, the system customizer can perform various tasks, "which include: Adding and removing components Deleting components that allow deletion Exporting and importing the solution as an unmanaged solution Exporting the solution as a managed solution Changes made to the components in an unmanaged solution are also applied to all the unmanaged solutions that include these components. 
This means that all changes from all unmanaged solutions are also applied to the default solution. Deleting an unmanaged solution results in the removal of the container alone, "while the unmanaged components of the solution remain in the system. Deleting a component in an unmanaged solution results in the deletion of this component from the system. In order to remove a component from an unmanaged solution, the component should be removed from the solution, not deleted. Managed solutions Once work is completed in an unmanaged solution and the solution is ready to be distributed, it can be exported as a managed solution. Packaging a solution as a managed solution presents the following advantages: Solution components cannot be added or removed from a managed solution A managed solution cannot be exported from the environment it was deployed in Deleting a managed solution results in the uninstallation of all the component customizations included with the solution. It also results "in the loss of data associated with the components being deleted. A managed solution cannot be installed in the same organization that contains the unmanaged solution which was used to create it. Within a managed solution, certain components can be configured to allow further customization. Through this mechanism, the managed solution provider can enable future customizations that modify aspects of the solution provided. The guidance provided by Microsoft when working with various solution types states that a solution should be used in an unmanaged state between development and test environments, and it should be exported as a managed solution when it is ready to be deployed to a production environment. Solution properties Besides the solution type, each solution contains a solution publisher. This is a set of properties that allow the solution creators to communicate different information to the solution's users, including ways to contact the publisher for additional support. The solution publisher record will be created in all the organizations where the solution is being deployed. The solution publisher record is also important when releasing an update to an existing solution. Based on this common record and the solution properties, an update solution can be released and deployed on top of an existing solution. Using a published solution also allows us to define a custom prefix for all new custom fields created in the context of the solution. The default format for new custom field names is a new field name. Using a custom publisher, we can change "the "new" prefix to a custom prefix specific to our solution. Solution layering When multiple solutions are deployed in an organization, there are two methods by which the system defines the order in which changes take precedence. These methods are merge and top wins. The user interface elements are merged by default. As such, elements such as the default forms, ribbons, command bars, and site map are merged, and all base elements and new custom elements are rendered. For all other solution components, the top wins approach is taken, where the last solution that makes a customization takes precedence. The top wins approach is also taken into consideration when a subset of customizations is being applied on top of a previously applied customization. The system checks the integrity of all solution exports, imports, and other operations. As such, when exporting a solution, if dependent entities are not included, a warning is presented. 
The customizer has the option to ignore this warning. When importing a solution, if the dependent entities are missing, the import is halted and it fails. Also, deleting a component from a solution is prevented if dependent entities require it to be present. The default solution Dynamics CRM allows you to customize the system without taking advantage of solutions. By default, the system comes with a solution. This is an unmanaged solution, and all system customizations are applied to it by default. The default solution includes all the default components and customizations defined within Microsoft Dynamics CRM. This solution defines the default application behavior. Most of the components in this solution can be further customized. "This solution includes all the out-of-the-box customizations. Also, customizations applied through unmanaged solutions are being merged into the default solution. Entity elements Within a solution, we work with various entities. In Dynamics CRM, there are three main entity types: System entities Business entities Custom entities Each entity is composed of various attributes, while each attribute is defined as a value with a specific data type. We can consider an entity to be a data table. Each "row represents and entity record, while each column represents an entity attribute. As with any table, each attribute has specific properties that define its data type. The system entities in Dynamics CRM are used internally by the application and "are not customizable. Also, they cannot be deleted. As a system customizer or developer, we will work mainly with business management entities and custom entities. Business management entities are the default entities that come with the application. Some are customizable and can "be extended as required. Custom entities are all net new entities that are created "as part of our system customizations. The aspects related to customizing an entity include renaming the entity; modifying, adding, or removing entity attributes; or changing various settings and properties. Let's take a look at all these in detail. Renaming an entity One of the ways to customize an entity is by renaming it. In the general properties "of the entity, the field's display name allows us to change the name of an entity. "The plural name can also be updated accordingly. When renaming an entity, make sure that all the references and messages are updated to reflect the new entity name. Views, charts, messages, business rules, hierarchy settings, and even certain fields can reference the original name, and they should be updated to reflect the new name assigned to the entity. The display name of an entity can be modified for the default value. This is a very common customization. In many instances, we need to modify the default entity name to match the business for which we are customizing the system. For instance, many customers use the term organization instead of account. This is a very easy customization achieved by updating the Display Name and Plural Name fields. While implementing this change, make sure that you also update the entity messages, as a lot of them use the original name of the entity by default.   You can change a message value by double-clicking on the message and entering the new message into the Custom Display String field. Changing entity settings and properties When creating and managing entities in Dynamics CRM, there are generic "entity settings that we have to pay attention to. 
We can easily get to these settings and properties by navigating to Components | Entities within a solution and selecting an entity from the list. We will get an account entity screen similar to "the following screenshot:   The settings are structured in two main tabs, with various categories on each tab. We will take a look at each set of settings and properties individually in the next sections. Entity definition This area of the General tab groups together general properties and settings related to entity naming properties, ownership, and descriptions. Once an entity is created, the Name value remains fixed and cannot be modified. If the internal Name field needs to be changed, a new entity with the new Name field must be created. Areas that display this entity This section sets the visibility of this entity. An entity can be made available in only one module or more standard modules of the application. The account is a good example as it is present in all the three areas of the application. Options for entity The Options for Entity section contains a subset of sections with various settings "and properties to configure the main properties of the entity, such as whether the entity can be customized by adding business process flows, notes and activities, "and auditing as well as other settings. Pay close attention to the settings marked with a plus, as once these settings are enabled, they cannot be disabled. If you are not sure whether you need these features, disable them. The Process section allows you to enable the entity for Business Process Flows. When enabling an entity for Business Process Flows, specific fields to support this functionality are created. For this reason, once an entity is enabled for Business Process Flows, it cannot be disabled at a later time. In the communication and collaboration area, we can enable the use of notes, related activities, and connections as well as enable sending of e-mails and queues on the entity. Enabling these configurations creates the required fields and relationships in the system, and you cannot disable them later. In addition, you can enable the entity for mail merge for use with access teams and also for document management. Enabling an entity for document management allows you to store documents related to the records of this type in SharePoint if the organization is configured to integrate with SharePoint. The data services section allows you to enable the quick create forms for this entity's records as well as to enable or disable duplicate detection and auditing. When you are enabling auditing, auditing must also be enabled at the organization level. Auditing is a two-step process. The next subsections deal with Outlook and mobile access. Here, we can define whether the entity can be accessed from various mobile devices as well as Outlook and whether the access is read-only or read/write on tablets. The last section allows us to define a custom help section for a specific entity. "Custom help must be enabled at the organization level first. Primary field settings The Primary Field settings tab contains the configuration properties for the entity's primary field. Each entity in the Dynamics CRM platform is defined by a primary field. This field can only be a text field, and the size can be customized as needed. The display name can be adjusted as needed. Also, the requirement level can be selected from one of the three values: optional, business-recommended, or business-required. 
When it is marked as business-required, the system will require users to enter a value if they are creating or making changes to an entity record form. The primary fields are also presented for customization in the entity field's listing. Business versus custom entities As mentioned previously, there are two types of customizable entities in Dynamics CRM. They are business entities and custom entities. Business entities are customizable entities that are created by Microsoft and come as part of the default solution package. They are part of the three modules: sales, service, and marketing. Custom entities are all the new entities that are being created as part of the customization and platform extending process. Business entities Business entities are part of the default customization provided with the application by Microsoft. They are either grouped into one of the three modules of functionality or are spread across all three. For example, the account and contact entities are present in all the modules, while the case entity belongs to the service module. "Some other business entities are opportunity, lead, marketing list, and so on. Most of the properties of business entities are customizable in Dynamics CRM. However, there are certain items that are not customizable across these entities. These are, in general, the same type of customizations that are not changeable "when creating a custom entity. For example, the entity internal name (the schema name) cannot be changed once an entity has been created. In addition, the primary field properties cannot be modified once an entity is created. Custom entities All new entities created as part of a customization and implemented in Dynamics CRM are custom entities. When creating a new custom entity, we have the freedom to configure all the settings and properties as needed from the beginning. We can use a naming convention that makes sense to the user and generate all the messages from the beginning, taking advantage of this name. A custom entity can be assigned by default to be displayed in one or more of the three main modules or in the settings and help section. If a new module is created and custom entities need to be part of this new module, we can achieve this by customizing the application navigation, commonly referred to as the application sitemap. While customizing the application navigation might not be such a straightforward process, the tools released to the community are available, which makes this job a lot easier and more visual. The default method to customize the navigation is described in detail in the SDK, and it involves exporting a solution with the navigation sitemap configuration, modifying the XML data, and reimporting the updated solution. Extending entities Irrespective of whether we want to extend a customizable business entity or a custom entity, the process is similar. We extend entities by creating new entity forms, views, charts, relationships, and business rules.   Starting with Dynamics CRM 2015, entities configured for hierarchical relationships now support the creation and visualization of hierarchies through hierarchy settings. We will be taking a look at each of these options in detail in the next sections. Entity forms Entities in Dynamics CRM can be accessed from various parts of the system, and their information can be presented in various formats. This feature contributes to "the 360-degree view of customer data. 
In order to enable this functionality, the entities in Dynamics CRM present a variety of standard views that are available for customization. These include standard entity forms, quick create forms, and quick view forms. In addition, for mobile devices, we can customize mobile forms. Form types With the current version of Dynamics CRM 2015, most of the updated entities now have four different form types, as follows: The main form The mobile form The quick create form The quick view form Various other forms can be created on an entity, either from scratch or by opening an existing form and saving it with a new name. When complex forms need to be created, in many circumstances, it is much easier to start from an existing entity form rather than recreating everything. We have role-based forms, which change based on the user's security role, and we can also have more than one form available for users to select from. We can customize which view is presented to the user based on specific form rules or other business requirements. It is a good practice to define a fallback form for each entity and to give all the users view permissions to this form. Once more than one main form is created for an entity, you can define the order in which the forms are presented based on permissions. If the user does not have access to any of the higher precedence forms, they will be able to access the fallback form. Working with contingency forms is quite similar; here, a form is defined to be available to users who cannot access any other forms on an entity. The approach for configuring this is a little different though. You create a form with minimal information being displayed on it. Only assign a system administrator role to this form, and select the option to enable it for fallback. With this, you specify a form that will not be visible to anybody other than the system administrator. In addition, configuring the form in this manner also makes it available to users whose security roles do not have a form specified. With such a configuration, if a user is added to a restrictive group that does not allow them to see most forms, they will have this one form available. The main form The main form is the default form associated with an entity. This form will be available by default when you open a record. There can be more than one main form, and these forms can be configured to be available to various security roles. A role must have at least one form available to it. If more than one form is available for a specific role, then the users will be given the option to select the form they want to use to visualize a record. Forms that are available for various roles are called role-based forms. As an example, the human resource role can have a specific view in an account, showing more information than a form available for a sales role. At the time of editing, the main form of an entity will look similar to the following screenshot: A mobile form A mobile form is a stripped-down form that is available for mobile devices with small screens. When customizing mobile forms, you should not only pay attention to the fact that a small screen can only render so much before extensive scrolling becomes exhausting but also the fact that most mobile devices transfer data wirelessly and, as such, the amount of data should be limited. At the time of editing, the Mobile Entity form looks similar to the Account Mobile form shown in the following screenshot. 
This is basically just a listing of the fields that are available and the order in which they are presented to the user.   The quick create form The quick create form, while serving a different purpose than quick view forms, is confined to the same minimalistic approach. Of course, a system customizer is not necessarily limited to a certain amount of data to be added to these forms, but they should be mindful of where these forms are being used and how much real estate is dedicated to them. In a quick create form, the minimal amount of data to be added is the required fields. In order to save a new record, all business-required fields must be filled in. As such, they should be added to the quick create form. Quick create forms are created in the same way as any other type of form. In the solution package, navigate to entities, select the entity for which you want to customize an existing quick create form or add a new one, and expand the forms section; you will see all the existing forms for the specific entity. Here, you can select the form you want to modify or click on New to create a new one. Once the form is open for editing, the process of customizing the form is exactly the same for all forms. You can add or remove fields, customize labels, rearrange fields in the form, and so on. In order to remind the customizer that this is a quick create form, a minimalistic three-column grid is provided by default for this type of form in edit mode, as shown in the following screenshot: Pay close attention to the fact that you can add only a limited set of controls to a quick create form. Items such as iframes and sub-grids are not available. That's not to say that the layout cannot be changed. You can be as creative as needed when customizing the quick create view. Once you have created the form, save and publish it. Since we have created a relationship between the account and the project earlier, we can add a grid view to the account displaying all the related child projects. Now, navigating to an account, we can quickly add a new child project by going to the project's grid view and clicking on the plus symbol to add a project. This will launch the quick create view of the project we just customized. This is how the project window will look:   As you can see in the previous screenshot, the quick create view is displayed as an overlay over the main form. For this reason, the amount of data should be kept to a minimum. This type of form is not meant to replace a full-fledged form but to allow a user to create a new record with minimal inputs and with no navigation to other records. Another way to access the quick create view for an entity is by clicking on the Create button situated at the top-right corner of most Dynamics CRM pages, right before the field that displays your username. This presents the user with the option to create common out-of-the-box record types available in the system, as seen in the following screenshot:   Selecting any one of the Records options presents the quick create view. If you opt to create activities in this way, you will not be presented with a quick create form; rather, you will be taken to the full activity form. Once a quick create form record is created in the system, the quick create form closes and a notification is displayed to the user with an option to navigate to the newly created record. 
This is how the final window should look:   The quick view form The quick view form is a feature added with Dynamics CRM 2013 that allows system customizers to create a minimalistic view to be presented in a related record form. This form presents a summary of a record in a condensed format that allows you to insert it into a related record's form. The process to use a quick view form comprises the following two steps: Create the quick view form for an entity Add the quick view form to the related record The process of creating a quick view form is similar to the process of creating "any other form. The only requirement here is to keep the amount of information minimal, in order to avoid taking up too much real estate on the related record "form. The following screenshot describes the standard Account quick create form:   A very good example is the quick view form for the account entity. This view is created by default in the system. It only includes the account name, e-mail and "phone information, as well as a grid of recent cases and recent activities. We can use this view in a custom project entity. In the project's main form, add a lookup field to define the account related to the project. In the project's form customization, add a Quick View Form tab from the ribbon, as shown in the following screenshot:   Once you add a Quick View Form tab, you are presented with a Quick View Control Properties window. Here, define the name and label for the control and whether you want the label to be displayed in the form. In addition, on this form, you get to define the rules on what is to be displayed "on the form. In the Data Source section, select Account in the Lookup Field and Related Entity dropdown list and in the Quick View Form dropdown list, select "the account card form. This is the name of the account's quick view form defined "in the system. The following screenshot shows the Data Source configuration and the Selected quick view forms field:   Once complete, save and publish the form. Now, if we navigate to a project record, we can select the related account and the quick view will automatically be displayed on the project form, as shown in the "next screenshot:   The default quick view form created for the account entity is displayed now on the project form with all the specified account-related details. This way any updates to the account are immediately reflected in the project form. Taking this approach, it is now much easier to display all the needed information on the same screen so that the user does not have to navigate away and click through a maze to get to all the data needed. Summary Throughout this chapter, we looked at the main component of the three system modules: an entity. We defined what an entity is and we looked at what an entity is composed of. Then, we looked at each of the components in detail and we discussed ways in which we can customize the entities and extend the system. We investigated ways to visually represent the data related to entities and how to relate entities for data integrity. We also looked at how to enhance entity behavior with business rules and the limitations that the business rules have versus more advanced customizations, using scripts or other developer-specific methods. The next chapter will take you into the business aspect of the Dynamics CRM platform, with an in-depth look at all the available business processes. 
We will revisit business rules, and we will take a look at other ways to enforce business-specific rules and processes using the wizard-driven customizations available with the platform.
Resources for Article:
Further resources on this subject:
Form customizations [article]
Introduction to Reporting in Microsoft Dynamics CRM [article]
Overview of Microsoft Dynamics CRM 2011 [article]
The elements of WebAssembly - Wat and Wasm, explained [Tutorial]

Prasad Ramesh
03 Jan 2019
9 min read
In this tutorial, we will dig into the elements that correspond to the official specifications created by the WebAssembly Working Group. We will examine the Wat and the binary format in greater detail to gain a better understanding of how they relate to modules. This article is an excerpt from a book written by Mike Rourke titled Learn WebAssembly. In this book, you will learn how to build web applications with native performance using Wasm and C/C++. Common structure and abstract syntax Before getting into the nuts and bolts of these formats, it's worth mentioning how these are related within the Core Specification. The following diagram is a visual representation of the table of contents (with some sections excluded for clarity): As you can see, the Text Format and Binary Format sections contain subsections for Values, Types, Instructions, and Modules that correlate with the Structure section. Consequently, much of what we cover in the next section for the text format has direct corollaries with the binary format. With that in mind, let's dive into the text format. Wat The Text Format section of the Core Specification provides technical descriptions for common language concepts such as values, types, and instructions. These are important concepts to know and understand if you're planning on building tooling for WebAssembly, but not necessary if you just plan on using it in your applications. That being said, the text format is an important part of WebAssembly, so there are concepts you should be aware of. In this section, we will dig into some of the details of the text format and highlight important points from the Core Specification. Definitions and S-expressions To understand Wat, let's start with the first sentence of the description taken directly from the WebAssembly Core Specification: "The textual format for WebAssembly modules is a rendering of their abstract syntax into S-expressions." So what are symbolic expressions (S-expressions)? S-expressions are notations for nested list (tree-structured) data. Essentially, they provide a simple and elegant way to represent list-based data in textual form. To understand how textual representations of nested lists map to a tree structure, let's extrapolate the tree structure from an HTML page. The following example contains a simple HTML page and the corresponding tree structure diagram. A simple HTML page: The corresponding tree structure is: Even if you've never seen a tree structure before, it's still clear to see how the HTML maps to the tree in terms of structure and hierarchy. Mapping HTML elements is relatively simple because it's a markup language with well-defined tags and no actual logic. Wat represents modules that can have multiple functions with varying parameters. To demonstrate the relationship between source code, Wat, and the corresponding tree structure, let's start with a simple C function that adds 2 to the number that is passed in as a parameter: Here is a C function that adds 2 to the num argument passed in and returns the result: int addTwo(int num) { return num + 2; } Converting the addTwo function to valid Wat produces this result: (module (table 0 anyfunc) (memory $0 1) (export "memory" (memory $0)) (export "addTwo" (func $addTwo)) (func $addTwo (; 0 ;) (param $0 i32) (result i32) (i32.add (get_local $0) (i32.const 2) ) ) ) The Structure section defines each of these concepts in the context of an abstract syntax. 
The Text Format section of the specification corresponds with these concepts as well, and you can see them defined by their keywords in the preceding snippet (func, memory, table). Tree Structure: The entire tree would be too large to fit on a page, so this diagram is limited to the first five lines of the Wat source text. Each filled-in dot represents a list node (or the contents of a set of parentheses). As you can see, code written in s-expressions can be clearly and concisely expressed in a tree structure, which is why s-expressions were chosen for WebAssembly's text format. Values, types, and instructions Although detailed coverage of the Text Format section of the Core Specification is out of the scope of this text, it's worth demonstrating how some of the language concepts map to the corresponding Wat. The following diagram demonstrates these mappings in a sample Wat snippet. The C code that this was compiled from represents a function that takes a word as a parameter and returns the square root of the character count: If you intend on writing or editing Wat, note that it supports block and line comments. The instructions are split up into blocks and consist of setting and getting memory associated with variables with valid types. You are able to control the flow of logic using if statements and loops are supported using the loop keyword. Role in the development process The text format allows for the representation of a binary Wasm module in textual form. This has some profound implications with regard to the ease of development and debugging. Having a textual representation of a WebAssembly module allows developers to view the source of a loaded module in a browser, which eliminates the black-box issues that inhibited the adoption of NaCl. It also allows for tooling to be built around troubleshooting modules. The official website describes the use cases that drove the design of the text format: • View Source on a WebAssembly module, thus fitting into the Web (where every source can be viewed) in a natural way. • Presentation in browser development tools when source maps aren't present (which is necessarily the case with the Minimum Viable Product (MVP)). • Writing WebAssembly code directly for reasons including pedagogical, experimental, debugging, optimization, and testing of the spec itself. The last item in the list reflects that the text format isn't intended to be written by hand in the course of normal development, but rather generated from a tool like Emscripten. You probably won't see or manipulate any .wat files when you're generating modules, but you may be viewing them in a debugging context. Not only is the text format valuable with regards to debugging, but having this intermediate format reduces the amount of reliance on a single tool for compilation. Several different tools currently exist to consume and emit this s-expression syntax, some of which are used by Emscripten to compile your code down to a .wasm file. Binary format and the module file (Wasm) The Binary Format section of the Core Specification provides the same level of detail with regard to language concepts as the Text format section. In this section, we will briefly cover some high-level details about the binary format and discuss the various sections that make up a Wasm module. Definition and module overview The binary format is defined as a dense linear encoding of the abstract syntax. 
Without getting too technical, that essentially means it's an efficient form of binary that allows for fast decoding, small file size, and reduced memory usage. The file representation of the binary format is a .wasm file. The Values, Types, and Instructions subsections of the Core Specification for the binary format correlate directly to the Text Format section. Each of these concepts is covered in the context of encoding. For example, according to the specification, the Integer types are encoded using the LEB128 variable-length integer encoding, in either unsigned or signed variant. These are important details to know if you wish to develop tooling for WebAssembly, but not necessary if you just plan on using it on your website. The Structure, Binary Format, and Text Format (wat) sections of the Core Specification have a Module subsection. We didn't cover aspects of the module in the previous section because it's more prudent to describe them in the context of a binary. The official WebAssembly site offers the following description for a module: "The distributable, loadable, and executable unit of code in WebAssembly is called a module. At runtime, a module can be instantiated with a set of import values to produce an instance, which is an immutable tuple referencing all the state accessible to the running module." Module sections A module is made up of several sections, some of which you'll be interacting with through the JavaScript API: Imports (import) are elements that can be accessed within the module and can be one of the following: Function, which can be called inside the module using the call operator Global, which can be accessed inside the module via the global operators Linear Memory, which can be accessed inside the module via the memory operators Table, which can be accessed inside the module using call_indirect Exports (export) are elements that can be accessed by the consuming API (that is, called by a JavaScript function) Module start function (start) is called after the module instance is initialized Global (global) contains the internal definition of global variables Linear memory (memory) contains the internal definition of linear memory with an initial memory size and optional maximum size Data (data) contains an array of data segments which specify the initial contents of fixed ranges of a given memory Table (table) is a linear memory whose elements are opaque values of a particular table element type: In the MVP, its primary purpose is to implement indirect function calls in C/C++ Elements (elements) is a section that allows a module to initialize the elements of any import or internally defined table with any other definition in the module Function and code: The function section declares the signatures of each internal function defined in the module The code section contains the function body of each function declared by the function section Some of the keywords (import, export, and so on) should look familiar; they're present in the contents of the Wat file in the previous section. WebAssembly's components follow a logical mapping that directly correspond to the APIs  (for example, you pass a memory and table instance into JavaScript's WebAssembly.instantiate() function). Your primary interaction with a module in binary format will be through these APIs. In this tutorial, we looked at the WebAssembly modules Wat and Wasm. We covered Wat definitions, values, types, instructions, and its role. We looked at Wasm overview and its module sections. 
To learn more about another element of WebAssembly, the JavaScript API, and to build applications with WebAssembly, check out the book Learn WebAssembly.
How has Rust and WebAssembly evolved in 2018
WebAssembly – Trick or Treat?
Mozilla shares plans to bring desktop applications, games to WebAssembly and make deeper inroads for the future web
Implementing Stacks using JavaScript

Packt
22 Oct 2014
10 min read
In this article by Loiane Groner, author of the book Learning JavaScript Data Structures and Algorithms, we will discuss stacks. (For more resources related to this topic, see here.) A stack is an ordered collection of items that follows the LIFO (short for Last In First Out) principle. The addition of new items or the removal of existing items takes place at the same end. The end of the stack is known as the top and the opposite is known as the base. The newest elements are near the top, and the oldest elements are near the base. We have several examples of stacks in real life, for example, a pile of books, as we can see in the following image, or a stack of trays from a cafeteria or food court: A stack is also used by compilers in programming languages and by computer memory to store variables and method calls. Creating a stack We are going to create our own class to represent a stack. Let's start from the basics and declare our class: function Stack() {   //properties and methods go here} First, we need a data structure that will store the elements of the stack. We can use an array to do this: var items = []; Next, we need to declare the methods available for our stack: push(element(s)): This adds a new item (or several items) to the top of the stack. pop(): This removes the top item from the stack. It also returns the removed element. peek(): This returns the top element from the stack. The stack is not modified (it does not remove the element; it only returns the element for information purposes). isEmpty(): This returns true if the stack does not contain any elements and false if the size of the stack is bigger than 0. clear(): This removes all the elements of the stack. size(): This returns how many elements the stack contains. It is similar to the length property of an array. The first method we will implement is the push method. This method will be responsible for adding new elements to the stack with one very important detail: we can only add new items to the top of the stack, meaning at the end of the stack. The push method is represented as follows: this.push = function(element){   items.push(element);}; As we are using an array to store the elements of the stack, we can use the push method from the JavaScript array class. Next, we are going to implement the pop method. This method will be responsible for removing the items from the stack. As the stack uses the LIFO principle, the last item that we added is the one that is removed. For this reason, we can use the pop method from the JavaScript array class. The pop method is represented as follows: this.pop = function(){   return items.pop();}; With the push and pop methods being the only methods available for adding and removing items from the stack, the LIFO principle will apply to our own Stack class. Now, let's implement some additional helper methods for our class. If we would like to know what the last item added to our stack was, we can use the peek method. This method will return the item from the top of the stack: this.peek = function(){   return items[items.length-1];}; As we are using an array to store the items internally, we can obtain the last item from an array using length - 1 as follows: For example, in the previous diagram, we have a stack with three items; therefore, the length of the internal array is 3. The last position used in the internal array is 2. As a result, the length - 1 (3 - 1) is 2! 
The next method is the isEmpty method, which returns true if the stack is empty (no item has been added) and false otherwise: this.isEmpty = function(){   return items.length == 0;}; Using the isEmpty method, we can simply verify whether the length of the internal array is 0. Similar to the length property from the array class, we can also implement length for our Stack class. For collections, we usually use the term "size" instead of "length". And again, as we are using an array to store the items internally, we can simply return its length: this.size = function(){   return items.length;}; Finally, we are going to implement the clear method. The clear method simply empties the stack, removing all its elements. The simplest way of implementing this method is as follows: this.clear = function(){   items = [];}; An alternative implementation would be calling the pop method until the stack is empty. And we are done! Our Stack class is implemented. Just to make our lives easier during the examples, to help us inspect the contents of our stack, let's implement a helper method called print that is going to output the content of the stack on the console: this.print = function(){   console.log(items.toString());}; And now we are really done! The complete Stack class Let's take a look at how our Stack class looks after its full implementation: function Stack() {    var items = [];    this.push = function(element){       items.push(element);   };    this.pop = function(){       return items.pop();   };    this.peek = function(){       return items[items.length-1];   };    this.isEmpty = function(){       return items.length == 0;   };    this.size = function(){       return items.length;   };    this.clear = function(){       items = [];   };    this.print = function(){       console.log(items.toString());   };} Using the Stack class Before we dive into some examples, we need to learn how to use the Stack class. The first thing we need to do is instantiate the Stack class we just created. Next, we can verify whether it is empty (the output is true because we have not added any elements to our stack yet): var stack = new Stack();console.log(stack.isEmpty()); //outputs true Next, let's add some elements to it (let's push the numbers 5 and 8; you can add any element type to the stack): stack.push(5);stack.push(8); If we call the peek method, the output will be the number 8 because it was the last element that was added to the stack: console.log(stack.peek()); // outputs 8 Let's also add another element: stack.push(11);console.log(stack.size()); // outputs 3console.log(stack.isEmpty()); //outputs false We added the element 11. If we call the size method, it will give the output as 3, because we have three elements in our stack (5, 8, and 11). Also, if we call the isEmpty method, the output will be false (we have three elements in our stack). Finally, let's add another element: stack.push(15); The following diagram shows all the push operations we have executed so far and the current status of our stack: Next, let's remove two elements from the stack by calling the pop method twice: stack.pop();stack.pop();console.log(stack.size()); // outputs 2stack.print(); // outputs [5, 8] Before we called the pop method twice, our stack had four elements in it. After the execution of the pop method two times, the stack now has only two elements: 5 and 8. 
The following diagram exemplifies the execution of the pop method: Decimal to binary Now that we know how to use the Stack class, let's use it to solve some Computer Science problems. You are probably already aware of the decimal base. However, binary representation is very important in Computer Science as everything in a computer is represented by binary digits (0 and 1). Without the ability to convert back and forth between decimal and binary numbers, it would be a little bit difficult to communicate with a computer. To convert a decimal number to a binary representation, we can divide the number by 2 (binary is base 2 number system) until the division result is 0. As an example, we will convert the number 10 into binary digits: This conversion is one of the first things you learn in college (Computer Science classes). The following is our algorithm: function divideBy2(decNumber){    var remStack = new Stack(),       rem,       binaryString = '';    while (decNumber > 0){ //{1}       rem = Math.floor(decNumber % 2); //{2}       remStack.push(rem); //{3}       decNumber = Math.floor(decNumber / 2); //{4} }    while (!remStack.isEmpty()){ //{5}       binaryString += remStack.pop().toString();   }    return binaryString;} In this code, while the division result is not zero (line {1}), we get the remainder of the division (mod) and push it to the stack (lines {2} and {3}), and finally, we update the number that will be divided by 2 (line {4}). An important observation: JavaScript has a numeric data type, but it does not distinguish integers from floating points. For this reason, we need to use the Math.floor function to obtain only the integer value from the division operations. And finally, we pop the elements from the stack until it is empty, concatenating the elements that were removed from the stack into a string (line {5}). We can try the previous algorithm and output its result on the console using the following code: console.log(divideBy2(233));console.log(divideBy2(10));console.log(divideBy2(1000)); We can easily modify the previous algorithm to make it work as a converter from decimal to any base. Instead of dividing the decimal number by 2, we can pass the desired base as an argument to the method and use it in the divisions, as shown in the following algorithm: function baseConverter(decNumber, base){    var remStack = new Stack(),        rem,       baseString = '',       digits = '0123456789ABCDEF'; //{6}    while (decNumber > 0){       rem = Math.floor(decNumber % base);       remStack.push(rem);       decNumber = Math.floor(decNumber / base);   }    while (!remStack.isEmpty()){       baseString += digits[remStack.pop()]; //{7}   }    return baseString;} There is one more thing we need to change. In the conversion from decimal to binary, the remainders will be 0 or 1; in the conversion from decimal to octagonal, the remainders will be from 0 to 8; but in the conversion from decimal to hexadecimal, the remainders can be 0 to 8 plus the letters A to F (values 10 to 15). For this reason, we need to convert these values as well (lines {6} and {7}). We can use the previous algorithm and output its result on the console as follows: console.log(baseConverter(100345, 2));console.log(baseConverter(100345, 8));console.log(baseConverter(100345, 16)); Summary In this article, we learned about the stack data structure. We implemented our own algorithm that represents a stack and we learned how to add and remove elements from it using the push and pop methods. 
We also covered a very famous example of how to use a stack.
Resources for Article:
Further resources on this subject:
Organizing Backbone Applications - Structure, Optimize, and Deploy [article]
Introduction to Modern OpenGL [article]
Customizing the Backend Editing in TYPO3 Templates [article]
Classifying with Real-world Examples

Packt
24 Mar 2015
32 min read
This article by the authors, Luis Pedro Coelho and Willi Richert, of the book, Building Machine Learning Systems with Python - Second Edition, focuses on the topic of classification. (For more resources related to this topic, see here.) You have probably already used this form of machine learning as a consumer, even if you were not aware of it. If you have any modern e-mail system, it will likely have the ability to automatically detect spam. That is, the system will analyze all incoming e-mails and mark them as either spam or not-spam. Often, you, the end user, will be able to manually tag e-mails as spam or not, in order to improve its spam detection ability. This is a form of machine learning where the system is taking examples of two types of messages: spam and ham (the typical term for "non spam e-mails") and using these examples to automatically classify incoming e-mails. The general method of classification is to use a set of examples of each class to learn rules that can be applied to new examples. This is one of the most important machine learning modes and is the topic of this article. Working with text such as e-mails requires a specific set of techniques and skills. For the moment, we will work with a smaller, easier-to-handle dataset. The example question for this article is, "Can a machine distinguish between flower species based on images?" We will use two datasets where measurements of flower morphology are recorded along with the species for several specimens. We will explore these small datasets using a few simple algorithms. At first, we will write classification code ourselves in order to understand the concepts, but we will quickly switch to using scikit-learn whenever possible. The goal is to first understand the basic principles of classification and then progress to using a state-of-the-art implementation. The Iris dataset The Iris dataset is a classic dataset from the 1930s; it is one of the first modern examples of statistical classification. The dataset is a collection of morphological measurements of several Iris flowers. These measurements will enable us to distinguish multiple species of the flowers. Today, species are identified by their DNA fingerprints, but in the 1930s, DNA's role in genetics had not yet been discovered. The following four attributes of each plant were measured: sepal length sepal width petal length petal width In general, we will call the individual numeric measurements we use to describe our data features. These features can be directly measured or computed from intermediate data. This dataset has four features. Additionally, for each plant, the species was recorded. The problem we want to solve is, "Given these examples, if we see a new flower out in the field, could we make a good prediction about its species from its measurements?" This is the supervised learning or classification problem: given labeled examples, can we design a rule to be later applied to other examples? A more familiar example to modern readers who are not botanists is spam filtering, where the user can mark e-mails as spam, and systems use these as well as the non-spam e-mails to determine whether a new, incoming message is spam or not. For the moment, the Iris dataset serves our purposes well. It is small (150 examples, four features each) and can be easily visualized and manipulated. Visualization is a good first step Datasets will grow to thousands of features. 
With only four in our starting example, we can easily plot all two-dimensional projections on a single page. We will build intuitions on this small example, which can then be extended to large datasets with many more features. Visualizations are excellent at the initial exploratory phase of the analysis as they allow you to learn the general features of your problem as well as catch problems that occurred with data collection early. Each subplot in the following plot shows all points projected into two of the dimensions. The outlying group (triangles) are the Iris Setosa plants, while Iris Versicolor plants are in the center (circle) and Iris Virginica are plotted with x marks. We can see that there are two large groups: one is of Iris Setosa and another is a mixture of Iris Versicolor and Iris Virginica.   In the following code snippet, we present the code to load the data and generate the plot: >>> from matplotlib import pyplot as plt >>> import numpy as np   >>> # We load the data with load_iris from sklearn >>> from sklearn.datasets import load_iris >>> data = load_iris()   >>> # load_iris returns an object with several fields >>> features = data.data >>> feature_names = data.feature_names >>> target = data.target >>> target_names = data.target_names   >>> for t in range(3): ...   if t == 0: ...       c = 'r' ...       marker = '>' ...   elif t == 1: ...       c = 'g' ...       marker = 'o' ...   elif t == 2: ...       c = 'b' ...       marker = 'x' ...   plt.scatter(features[target == t,0], ...               features[target == t,1], ...               marker=marker, ...               c=c) Building our first classification model If the goal is to separate the three types of flowers, we can immediately make a few suggestions just by looking at the data. For example, petal length seems to be able to separate Iris Setosa from the other two flower species on its own. We can write a little bit of code to discover where the cut-off is: >>> # We use NumPy fancy indexing to get an array of strings: >>> labels = target_names[target]   >>> # The petal length is the feature at position 2 >>> plength = features[:, 2]   >>> # Build an array of booleans: >>> is_setosa = (labels == 'setosa')   >>> # This is the important step: >>> max_setosa =plength[is_setosa].max() >>> min_non_setosa = plength[~is_setosa].min() >>> print('Maximum of setosa: {0}.'.format(max_setosa)) Maximum of setosa: 1.9.   >>> print('Minimum of others: {0}.'.format(min_non_setosa)) Minimum of others: 3.0. Therefore, we can build a simple model: if the petal length is smaller than 2, then this is an Iris Setosa flower; otherwise it is either Iris Virginica or Iris Versicolor. This is our first model and it works very well in that it separates Iris Setosa flowers from the other two species without making any mistakes. In this case, we did not actually do any machine learning. Instead, we looked at the data ourselves, looking for a separation between the classes. Machine learning happens when we write code to look for this separation automatically. The problem of recognizing Iris Setosa apart from the other two species was very easy. However, we cannot immediately see what the best threshold is for distinguishing Iris Virginica from Iris Versicolor. We can even see that we will never achieve perfect separation with these features. We could, however, look for the best possible separation, the separation that makes the fewest mistakes. For this, we will perform a little computation. 
We first select only the non-Setosa features and labels: >>> # ~ is the boolean negation operator >>> features = features[~is_setosa] >>> labels = labels[~is_setosa] >>> # Build a new target variable, is_virginica >>> is_virginica = (labels == 'virginica') Here we are heavily using NumPy operations on arrays. The is_setosa array is a Boolean array and we use it to select a subset of the other two arrays, features and labels. Finally, we build a new boolean array, virginica, by using an equality comparison on labels. Now, we run a loop over all possible features and thresholds to see which one results in better accuracy. Accuracy is simply the fraction of examples that the model classifies correctly. >>> # Initialize best_acc to impossibly low value >>> best_acc = -1.0 >>> for fi in range(features.shape[1]): ... # We are going to test all possible thresholds ... thresh = features[:,fi] ... for t in thresh: ...   # Get the vector for feature `fi` ...   feature_i = features[:, fi] ...   # apply threshold `t` ...   pred = (feature_i > t) ...   acc = (pred == is_virginica).mean() ...   rev_acc = (pred == ~is_virginica).mean() ...   if rev_acc > acc: ...       reverse = True ...       acc = rev_acc ...   else: ...       reverse = False ... ...   if acc > best_acc: ...     best_acc = acc ...     best_fi = fi ...     best_t = t ...     best_reverse = reverse We need to test two types of thresholds for each feature and value: we test a greater than threshold and the reverse comparison. This is why we need the rev_acc variable in the preceding code; it holds the accuracy of reversing the comparison. The last few lines select the best model. First, we compare the predictions, pred, with the actual labels, is_virginica. The little trick of computing the mean of the comparisons gives us the fraction of correct results, the accuracy. At the end of the for loop, all the possible thresholds for all the possible features have been tested, and the variables best_fi, best_t, and best_reverse hold our model. This is all the information we need to be able to classify a new, unknown object, that is, to assign a class to it. The following code implements exactly this method: def is_virginica_test(fi, t, reverse, example):    "Apply threshold model to a new example"    test = example[fi] > t    if reverse:        test = not test    return test What does this model look like? If we run the code on the whole data, the model that is identified as the best makes decisions by splitting on the petal width. One way to gain intuition about how this works is to visualize the decision boundary. That is, we can see which feature values will result in one decision versus the other and exactly where the boundary is. In the following screenshot, we see two regions: one is white and the other is shaded in grey. Any datapoint that falls on the white region will be classified as Iris Virginica, while any point that falls on the shaded side will be classified as Iris Versicolor. In a threshold model, the decision boundary will always be a line that is parallel to one of the axes. The plot in the preceding screenshot shows the decision boundary and the two regions where points are classified as either white or grey. It also shows (as a dashed line) an alternative threshold, which will achieve exactly the same accuracy. Our method chose the first threshold it saw, but that was an arbitrary choice. 
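To see the model in action, here is a small, hypothetical usage sketch (the measurements are made up and the snippet is not part of the book's code); it applies the is_virginica_test function defined above, together with the best_fi, best_t, and best_reverse values found by the search loop, to a new example:
new_example = [5.8, 2.7, 4.1, 1.0]  # made-up sepal/petal measurements in cm
if is_virginica_test(best_fi, best_t, best_reverse, new_example):
    print('Predicted species: Iris Virginica')
else:
    print('Predicted species: Iris Versicolor')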
Evaluation – holding out data and cross-validation The model discussed in the previous section is a simple model; it achieves 94 percent accuracy of the whole data. However, this evaluation may be overly optimistic. We used the data to define what the threshold will be, and then we used the same data to evaluate the model. Of course, the model will perform better than anything else we tried on this dataset. The reasoning is circular. What we really want to do is estimate the ability of the model to generalize to new instances. We should measure its performance in instances that the algorithm has not seen at training. Therefore, we are going to do a more rigorous evaluation and use held-out data. For this, we are going to break up the data into two groups: on one group, we'll train the model, and on the other, we'll test the one we held out of training. The full code, which is an adaptation of the code presented earlier, is available on the online support repository. Its output is as follows: Training accuracy was 96.0%. Testing accuracy was 90.0% (N = 50). The result on the training data (which is a subset of the whole data) is apparently even better than before. However, what is important to note is that the result in the testing data is lower than that of the training error. While this may surprise an inexperienced machine learner, it is expected that testing accuracy will be lower than the training accuracy. To see why, look back at the plot that showed the decision boundary. Consider what would have happened if some of the examples close to the boundary were not there or that one of them between the two lines was missing. It is easy to imagine that the boundary will then move a little bit to the right or to the left so as to put them on the wrong side of the border. The accuracy on the training data, the training accuracy, is almost always an overly optimistic estimate of how well your algorithm is doing. We should always measure and report the testing accuracy, which is the accuracy on a collection of examples that were not used for training. These concepts will become more and more important as the models become more complex. In this example, the difference between the accuracy measured on training data and on testing data is not very large. When using a complex model, it is possible to get 100 percent accuracy in training and do no better than random guessing on testing! One possible problem with what we did previously, which was to hold out data from training, is that we only used half the data for training. Perhaps it would have been better to use more training data. On the other hand, if we then leave too little data for testing, the error estimation is performed on a very small number of examples. Ideally, we would like to use all of the data for training and all of the data for testing as well, which is impossible. We can achieve a good approximation of this impossible ideal by a method called cross-validation. One simple form of cross-validation is leave-one-out cross-validation. We will take an example out of the training data, learn a model without this example, and then test whether the model classifies this example correctly. This process is then repeated for all the elements in the dataset. 
The following code implements exactly this type of cross-validation: >>> correct = 0.0 >>> for ei in range(len(features)):      # select all but the one at position `ei`:      training = np.ones(len(features), bool)      training[ei] = False      testing = ~training      model = fit_model(features[training], is_virginica[training])      predictions = predict(model, features[testing])      correct += np.sum(predictions == is_virginica[testing]) >>> acc = correct/float(len(features)) >>> print('Accuracy: {0:.1%}'.format(acc)) Accuracy: 87.0% At the end of this loop, we will have tested a series of models on all the examples and have obtained a final average result. When using cross-validation, there is no circularity problem because each example was tested on a model which was built without taking that datapoint into account. Therefore, the cross-validated estimate is a reliable estimate of how well the models would generalize to new data. The major problem with leave-one-out cross-validation is that we are now forced to perform many times more work. In fact, you must learn a whole new model for each and every example and this cost will increase as our dataset grows. We can get most of the benefits of leave-one-out at a fraction of the cost by using x-fold cross-validation, where x stands for a small number. For example, to perform five-fold cross-validation, we break up the data into five groups, so-called five folds. Then you learn five models: each time you will leave one fold out of the training data. The resulting code will be similar to the code given earlier in this section, but we leave 20 percent of the data out instead of just one element. We test each of these models on the left-out fold and average the results.   The preceding figure illustrates this process for five blocks: the dataset is split into five pieces. For each fold, you hold out one of the blocks for testing and train on the other four. You can use any number of folds you wish. There is a trade-off between computational efficiency (the more folds, the more computation is necessary) and accurate results (the more folds, the closer you are to using the whole of the data for training). Five folds is often a good compromise. This corresponds to training with 80 percent of your data, which should already be close to what you will get from using all the data. If you have little data, you can even consider using 10 or 20 folds. In the extreme case, if you have as many folds as datapoints, you are simply performing leave-one-out cross-validation. On the other hand, if computation time is an issue and you have more data, 2 or 3 folds may be the more appropriate choice. When generating the folds, you need to be careful to keep them balanced. For example, if all of the examples in one fold come from the same class, then the results will not be representative. We will not go into the details of how to do this, because the machine learning package scikit-learn will handle them for you. We have now generated several models instead of just one. So, "What final model do we return and use for new data?" The simplest solution is now to train a single overall model on all your training data. The cross-validation loop gives you an estimate of how well this model should generalize. A cross-validation schedule allows you to use all your data to estimate whether your methods are doing well. At the end of the cross-validation loop, you can then use all your data to train a final model. 
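For comparison, the following is a minimal sketch of the same idea using scikit-learn directly (it is an assumed example, not code from the book's repository, and the nearest-neighbour classifier is only a stand-in, not the threshold model built earlier). The cross_val_score helper builds the five folds for us and keeps them balanced across classes when the estimator is a classifier:
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score  # sklearn.cross_validation in older releases
from sklearn.neighbors import KNeighborsClassifier

data = load_iris()
classifier = KNeighborsClassifier(n_neighbors=1)  # example classifier only
scores = cross_val_score(classifier, data.data, data.target, cv=5)  # cv=5 means five folds
print('Mean cross-validated accuracy: {0:.1%}'.format(scores.mean()))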
Although it was not properly recognized when machine learning was starting out as a field, nowadays, it is seen as a very bad sign to even discuss the training accuracy of a classification system. This is because the results can be very misleading and even just presenting them marks you as a newbie in machine learning. We always want to measure and compare either the error on a held-out dataset or the error estimated using a cross-validation scheme. Building more complex classifiers In the previous section, we used a very simple model: a threshold on a single feature. Are there other types of systems? Yes, of course! Many others. To think of the problem at a higher abstraction level, "What makes up a classification model?" We can break it up into three parts: The structure of the model: How exactly will a model make decisions? In this case, the decision depended solely on whether a given feature was above or below a certain threshold value. This is too simplistic for all but the simplest problems. The search procedure: How do we find the model we need to use? In our case, we tried every possible combination of feature and threshold. You can easily imagine that as models get more complex and datasets get larger, it rapidly becomes impossible to attempt all combinations and we are forced to use approximate solutions. In other cases, we need to use advanced optimization methods to find a good solution (fortunately, scikit-learn already implements these for you, so using them is easy even if the code behind them is very advanced). The gain or loss function: How do we decide which of the possibilities tested should be returned? Rarely do we find the perfect solution, the model that never makes any mistakes, so we need to decide which one to use. We used accuracy, but sometimes it will be better to optimize so that the model makes fewer errors of a specific kind. For example, in spam filtering, it may be worse to delete a good e-mail than to erroneously let a bad e-mail through. In that case, we may want to choose a model that is conservative in throwing out e-mails rather than the one that just makes the fewest mistakes overall. We can discuss these issues in terms of gain (which we want to maximize) or loss (which we want to minimize). They are equivalent, but sometimes one is more convenient than the other. We can play around with these three aspects of classifiers and get different systems. A simple threshold is one of the simplest models available in machine learning libraries and only works well when the problem is very simple, such as with the Iris dataset. In the next section, we will tackle a more difficult classification task that requires a more complex structure. In our case, we optimized the threshold to minimize the number of errors. Alternatively, we might have different loss functions. It might be that one type of error is much costlier than the other. In a medical setting, false negatives and false positives are not equivalent. A false negative (when the result of a test comes back negative, but that is false) might lead to the patient not receiving treatment for a serious disease. A false positive (when the test comes back positive even though the patient does not actually have that disease) might lead to additional tests to confirm or unnecessary treatment (which can still have costs, including side effects from the treatment, but are often less serious than missing a diagnostic). Therefore, depending on the exact setting, different trade-offs can make sense. 
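As a toy illustration of trading accuracy against cost, the sketch below scores boolean predictions with unequal penalties; the particular cost values are arbitrary assumptions for illustration, not numbers used in the book:

```python
import numpy as np

def weighted_cost(predictions, truth, cost_fp=1.0, cost_fn=10.0):
    # `predictions` and `truth` are boolean arrays (like is_virginica).
    # Count the two kinds of mistake separately and weight them: here a
    # false negative is assumed to be ten times worse than a false positive.
    false_positives = np.sum(predictions & ~truth)
    false_negatives = np.sum(~predictions & truth)
    return cost_fp * false_positives + cost_fn * false_negatives
```

A model chosen to minimize this quantity may make more mistakes overall than the most accurate one, but the mistakes it does make are the cheaper kind.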
At one extreme, if the disease is fatal and the treatment is cheap with very few negative side-effects, then you want to minimize false negatives as much as you can. What the gain/cost function should be is always dependent on the exact problem you are working on. When we present a general-purpose algorithm, we often focus on minimizing the number of mistakes, achieving the highest accuracy. However, if some mistakes are costlier than others, it might be better to accept a lower overall accuracy to minimize the overall costs.

A more complex dataset and a more complex classifier

We will now look at a slightly more complex dataset. This will motivate the introduction of a new classification algorithm and a few other ideas.

Learning about the Seeds dataset

We now look at another agricultural dataset, which is still small, but already too large to plot exhaustively on a page as we did with Iris. This dataset consists of measurements of wheat seeds. There are seven features, which are as follows:

area A
perimeter P
compactness C = 4πA/P²
length of kernel
width of kernel
asymmetry coefficient
length of kernel groove

There are three classes, corresponding to three wheat varieties: Canadian, Kama, and Rosa. As earlier, the goal is to be able to classify the species based on these morphological measurements. Unlike the Iris dataset, which was collected in the 1930s, this is a very recent dataset and its features were automatically computed from digital images. This is how image pattern recognition can be implemented: you can take images, in digital form, compute a few relevant features from them, and use a generic classification system. For the moment, we will work with the features that are given to us.

UCI Machine Learning Dataset Repository

The University of California at Irvine (UCI) maintains an online repository of machine learning datasets (at the time of writing, they list 233 datasets). Both the Iris and the Seeds dataset used in this article were taken from there. The repository is available online at http://archive.ics.uci.edu/ml/.

Features and feature engineering

One interesting aspect of these features is that the compactness feature is not actually a new measurement, but a function of the previous two features, area and perimeter. It is often very useful to derive new combined features. Trying to create new features is generally called feature engineering. It is sometimes seen as less glamorous than algorithms, but it often matters more for performance (a simple algorithm on well-chosen features will perform better than a fancy algorithm on not-so-good features). In this case, the original researchers computed the compactness, which is a typical feature for shapes. It is also sometimes called roundness. This feature will have the same value for two kernels, one of which is twice as big as the other one, but with the same shape. However, it will have different values for kernels that are very round (when the feature is close to one) when compared to kernels that are elongated (when the feature is closer to zero). The goals of a good feature are to simultaneously vary with what matters (the desired output) and be invariant with what does not. For example, compactness does not vary with size, but varies with the shape. In practice, it might be hard to achieve both objectives perfectly, but we want to approximate this ideal. You will need to use background knowledge to design good features.
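As a small, hedged illustration of such a derived feature (the Seeds data already ships with this column, so you would not normally recompute it), compactness can be computed from area and perimeter in one line of NumPy:

```python
import numpy as np

def compactness(area, perimeter):
    # C = 4 * pi * A / P**2: close to 1 for round kernels,
    # smaller for elongated ones, and independent of overall size.
    return 4 * np.pi * area / perimeter ** 2
```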
Fortunately, for many problem domains, there is already a vast literature of possible features and feature-types that you can build upon. For images, all of the previously mentioned features are typical and computer vision libraries will compute them for you. In text-based problems too, there are standard solutions that you can mix and match. When possible, you should use your knowledge of the problem to design a specific feature or to select which ones from the literature are more applicable to the data at hand. Even before you have data, you must decide which data is worthwhile to collect. Then, you hand all your features to the machine to evaluate and compute the best classifier.

A natural question is whether we can select good features automatically. This problem is known as feature selection. There are many methods that have been proposed for this problem, but in practice very simple ideas work best. For the small problems we are currently exploring, it does not make sense to use feature selection, but if you had thousands of features, then throwing out most of them might make the rest of the process much faster.

Nearest neighbor classification

For this dataset, we will introduce a new classifier: the nearest neighbor classifier. The nearest neighbor classifier is very simple. When classifying a new element, it looks at the training data for the object that is closest to it, its nearest neighbor. Then, it returns its label as the answer. Notice that this model performs perfectly on its training data! For each point, its closest neighbor is itself, and so its label matches perfectly (unless two examples with different labels have exactly the same feature values, which would indicate that the features you are using are not very descriptive). Therefore, it is essential to test the classification using a cross-validation protocol. The nearest neighbor method can be generalized to look not at a single neighbor, but at multiple ones, and take a vote amongst the neighbors. This makes the method more robust to outliers or mislabeled data.

Classifying with scikit-learn

We have been using handwritten classification code, but Python is a very appropriate language for machine learning because of its excellent libraries. In particular, scikit-learn has become the standard library for many machine learning tasks, including classification. We are going to use its implementation of nearest neighbor classification in this section. The scikit-learn classification API is organized around classifier objects. These objects have the following two essential methods:

fit(features, labels): This is the learning step and fits the parameters of the model
predict(features): This method can only be called after fit and returns a prediction for one or more inputs

Here is how we could use its implementation of k-nearest neighbors for our data. We start by importing the KNeighborsClassifier object from the sklearn.neighbors submodule:

>>> from sklearn.neighbors import KNeighborsClassifier

The scikit-learn module is imported as sklearn (sometimes you will also find that scikit-learn is referred to using this short name instead of the full name). All of the sklearn functionality is in submodules, such as sklearn.neighbors. We can now instantiate a classifier object. In the constructor, we specify the number of neighbors to consider, as follows:

>>> classifier = KNeighborsClassifier(n_neighbors=1)

If we do not specify the number of neighbors, it defaults to 5, which is often a good choice for classification.
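Before evaluating it properly, a single fit/predict round looks like the following (a sketch assuming `features` and `labels` already hold the Seeds measurements and variety labels; holding out only the last example is purely for illustration):

```python
>>> # Train on all but the last example, then classify that held-out example:
>>> classifier.fit(features[:-1], labels[:-1])
>>> prediction = classifier.predict(features[-1:])
```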
We will want to use cross-validation (of course) to look at our data. The scikit-learn module also makes this easy:

>>> from sklearn.cross_validation import KFold
>>> kf = KFold(len(features), n_folds=5, shuffle=True)
>>> # `means` will be a list of mean accuracies (one entry per fold)
>>> means = []
>>> for training, testing in kf:
...     # We fit a model for this fold, then apply it to the
...     # testing data with `predict`:
...     classifier.fit(features[training], labels[training])
...     prediction = classifier.predict(features[testing])
...
...     # np.mean on an array of booleans returns the fraction
...     # of correct decisions for this fold:
...     curmean = np.mean(prediction == labels[testing])
...     means.append(curmean)
>>> print("Mean accuracy: {:.1%}".format(np.mean(means)))
Mean accuracy: 90.5%

Using five folds for cross-validation, for this dataset, with this algorithm, we obtain 90.5 percent accuracy. As we discussed in the earlier section, the cross-validation accuracy is lower than the training accuracy, but this is a more credible estimate of the performance of the model.

Looking at the decision boundaries

We will now examine the decision boundary. In order to plot these on paper, we will simplify and look at only two dimensions. Take a look at the following plot:

Canadian examples are shown as diamonds, Kama seeds as circles, and Rosa seeds as triangles. Their respective areas are shown as white, black, and grey. You might be wondering why the regions are so horizontal, almost weirdly so. The problem is that the x axis (area) ranges from 10 to 22, while the y axis (compactness) ranges from 0.75 to 1.0. This means that a small change in x is actually much larger than a small change in y. So, when we compute the distance between points, we are, for the most part, only taking the x axis into account. This is also a good example of why it is a good idea to visualize our data and look for red flags or surprises.

If you studied physics (and you remember your lessons), you might have already noticed that we had been summing up lengths, areas, and dimensionless quantities, mixing up our units (which is something you never want to do in a physical system). We need to normalize all of the features to a common scale. There are many solutions to this problem; a simple one is to normalize to z-scores. The z-score of a value is how far away from the mean it is, in units of standard deviation. It comes down to this operation:

f' = (f - μ) / σ

In this formula, f is the old feature value, f' is the normalized feature value, μ is the mean of the feature, and σ is the standard deviation. Both μ and σ are estimated from training data. Independent of what the original values were, after z-scoring, a value of zero corresponds to the training mean, positive values are above the mean, and negative values are below it.

The scikit-learn module makes it very easy to use this normalization as a preprocessing step. We are going to use a pipeline of transformations: the first element will do the transformation and the second element will do the classification. We start by importing both the pipeline and the feature scaling classes as follows:

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler

Now, we can combine them.

>>> classifier = KNeighborsClassifier(n_neighbors=1)
>>> classifier = Pipeline([('norm', StandardScaler()), ('knn', classifier)])

The Pipeline constructor takes a list of pairs (str, clf).
Each pair corresponds to a step in the pipeline: the first element is a string naming the step, while the second element is the object that performs the transformation. Advanced usage of the object uses these names to refer to different steps. After normalization, every feature is in the same units (technically, every feature is now dimensionless; it has no units) and we can more confidently mix dimensions. In fact, if we now run our nearest neighbor classifier, we obtain 93 percent accuracy, estimated with the same five-fold cross-validation code shown previously! Look at the decision space again in two dimensions:   The boundaries are now different and you can see that both dimensions make a difference for the outcome. In the full dataset, everything is happening on a seven-dimensional space, which is very hard to visualize, but the same principle applies; while a few dimensions are dominant in the original data, after normalization, they are all given the same importance. Binary and multiclass classification The first classifier we used, the threshold classifier, was a simple binary classifier. Its result is either one class or the other, as a point is either above the threshold value or it is not. The second classifier we used, the nearest neighbor classifier, was a natural multiclass classifier, its output can be one of the several classes. It is often simpler to define a simple binary method than the one that works on multiclass problems. However, we can reduce any multiclass problem to a series of binary decisions. This is what we did earlier in the Iris dataset, in a haphazard way: we observed that it was easy to separate one of the initial classes and focused on the other two, reducing the problem to two binary decisions: Is it an Iris Setosa (yes or no)? If not, check whether it is an Iris Virginica (yes or no). Of course, we want to leave this sort of reasoning to the computer. As usual, there are several solutions to this multiclass reduction. The simplest is to use a series of one versus the rest classifiers. For each possible label ℓ, we build a classifier of the type is this ℓ or something else? When applying the rule, exactly one of the classifiers will say yes and we will have our solution. Unfortunately, this does not always happen, so we have to decide how to deal with either multiple positive answers or no positive answers.   Alternatively, we can build a classification tree. Split the possible labels into two, and build a classifier that asks, "Should this example go in the left or the right bin?" We can perform this splitting recursively until we obtain a single label. The preceding diagram depicts the tree of reasoning for the Iris dataset. Each diamond is a single binary classifier. It is easy to imagine that we could make this tree larger and encompass more decisions. This means that any classifier that can be used for binary classification can also be adapted to handle any number of classes in a simple way. There are many other possible ways of turning a binary method into a multiclass one. There is no single method that is clearly better in all cases. The scikit-learn module implements several of these methods in the sklearn.multiclass submodule. Some classifiers are binary systems, while many real-life problems are naturally multiclass. Several simple protocols reduce a multiclass problem to a series of binary decisions and allow us to apply the binary models to our multiclass problem. 
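As a hedged sketch of what such a reduction looks like in code, the sklearn.multiclass helpers can wrap any binary estimator; the wrapped logistic regression below is an arbitrary choice for illustration (it is not used elsewhere in this article), and the `training`/`testing` indices are assumed to come from a cross-validation split like the one shown earlier:

```python
>>> from sklearn.multiclass import OneVsRestClassifier
>>> from sklearn.linear_model import LogisticRegression
>>> # One binary "is it this class or something else?" model per label:
>>> ovr = OneVsRestClassifier(LogisticRegression())
>>> ovr.fit(features[training], labels[training])
>>> prediction = ovr.predict(features[testing])
```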
This means methods that are apparently only for binary data can be applied to multiclass data with little extra effort.

Summary

Classification means generalizing from examples to build a model (that is, a rule that can automatically be applied to new, unclassified objects). It is one of the fundamental tools in machine learning. In a sense, this was a very theoretical article, as we introduced generic concepts with simple examples. We went over a few operations with the Iris dataset. This is a small dataset. However, it has the advantage that we were able to plot it out and see what we were doing in detail. This is something that will be lost when we move on to problems with many dimensions and many thousands of examples; the intuitions we gained here will still be valid.

You also learned that the training error is a misleading, over-optimistic estimate of how well the model does. We must, instead, evaluate it on testing data that has not been used for training. In order not to waste too many examples in testing, a cross-validation schedule can get us the best of both worlds (at the cost of more computation). We also had a look at the problem of feature engineering. Features are not predefined for you, but choosing and designing features is an integral part of designing a machine learning pipeline. In fact, it is often the area where you can get the most improvements in accuracy, as better data beats fancier methods.

Resources for Article: Further resources on this subject: Ridge Regression [article] The Spark programming model [article] Using cross-validation [article]

article-image-installation-and-setup
Packt
07 Jul 2015
15 min read
Save for later

Installation and Setup

Packt
07 Jul 2015
15 min read
The Banana Pi is a single-board computer, which enables you to build your own individual and versatile system. In fact, it is a complete computer, including all the required elements such as a processor, memory, network, and other interfaces, which we are going to explore. It provides enough power to run even relatively complex applications suitably. In this article by, Ryad El-Dajani, author of the book, Banana Pi Cookbook, we are going to get to know the Banana Pi device. The available distributions are mentioned, as well as how to download and install these distributions. We will also examine Android in contrast to our upcoming Linux adventure. (For more resources related to this topic, see here.) Thus, you are going to transform your little piece of hardware into a functional, running computer with a working operating system. You will master the whole process of doing the required task from connecting the cables, choosing an operating system, writing the image to an SD card, and successfully booting up and shutting down your device for the first time. Banana Pi Overview In the following picture, you see a Banana Pi on the left-hand side and a Banana Pro on the right-hand side: As you can see, there are some small differences that we need to notice. The Banana Pi provides a dedicated composite video output besides the HDMI output. However, with the Banana Pro, you can connect your display via composite video output using a four-pole composite audio/video cable on the jack. In contrast to the Banana Pi, which has 26 pin headers, the Banana Pro provides 40 pins. Also the pins for the UART port interface are located below the GPIO headers on the Pi, while they are located besides the network interface on the Pro. The other two important differences are not clearly visible on the previous picture. The operating system for your device comes in the form of image files that need to be written (burned) to an SD card. The Banana Pi uses normal SD cards while the Banana Pro will only accept Micro SD cards. Moreover, the Banana Pro provides a Wi-Fi interface already on board. Therefore, you are also able to connect the Banana Pro to your wireless network, while the Pi would require an external wireless USB device. Besides the mentioned differences, the devices are very similar. You will find the following hardware components and interfaces on your device. On the back side, you will find: A20 ARM Cortex-A7 dual core central processing unit (CPU) ARM Mali400 MP2 graphics processing unit (GPU) 1 gigabyte of DDR3 memory (that is shared with the GPU) On the front side, you will find: Ethernet network interface adapter Two USB 2.0 ports A 5V micro USB power with DC in and a micro USB OTG port A SATA 2.0 port and SATA power output Various display outputs [HDMI, LVDS, and composite (integrated into jack on the Pro)] A CSI camera input connector An infrared (IR) receiver A microphone Various hardware buttons on board (power key, reset key, and UBoot key) Various LEDs (red for power status, blue for Ethernet status, and green for user defined) As you can see, you have a lot of opportunities for letting your device interact with various external components. Operating systems for the Banana Pi The Banana Pi is capable of running any operating system that supports the ARM Cortex-A7 architecture. There are several operating systems precompiled, so you are able to write the operating system to an SD card and boot your system flawlessly. 
Currently, there are the following operating systems provided officially by LeMaker, the manufacturer of the Banana Pi. Android Android is a well-known operating system for mobile phones, but it is also runnable on various other devices such as smart watches, cars, and, of course, single-board computers such as the Banana Pi. The main advantage of running Android on a single-board computer is its convenience. Anybody who uses an Android-based smartphone will recognize the graphical user interface (GUI) and may have less initial hurdles. Also, setting up a media center might be easier to do on Android than on a Linux-based system. However, there are also a few disadvantages, as you are limited to software that is provided by an Android store such as Google Play. As most apps are optimized for mobile use at the moment, you will not find a lot of usable software for your Banana Pi running Android, except some Games and Multimedia applications. Moreover, you are required to use special Windows software called PhoenixCard to be able to prepare an Android SD card. In this article, we are going to ignore the installing of Android. For further information, please see Installing the Android OS image (LeMaker Wiki) at http://wiki.lemaker.org/BananaPro/Pi:SD_card_installation. Linux Most of the Linux users never realize that they are actually using Linux when operating their phones, appliances, routers, and many more products, as most of its magic happens in the background. We are going to dig into this adventure to discover its possibilities when running on our Banana Pi device. The following Linux-based operating systems—so-called distributions—are used by the majority of the Banana Pi user base and are supported officially by the manufacturer: Lubuntu: This is a lightweight distribution based on the well-known Ubuntu using the LXDE desktop, which is principally a good choice, if you are a Windows user. Raspbian: This is a distribution based on Debian, which was initially produced for the Raspberry Pi (hence the name). As a lot of Raspberry Pi owners are running Raspbian on their devices while also experimenting with the Banana Pi, LeMaker ported the original Raspbian distribution to the Banana Pi. Raspbian also comes with an LXDE desktop by default. Bananian: This too is a Debian-based Linux distribution optimized exclusively for the Banana Pi and its siblings. All of the aforementioned distributions are based on the well-known distribution, Debian. Besides the huge user base, all Debian-based distributions use the same package manager Apt (Advanced Packaging Tool) to search for and install new software, and all are similar to use. There are still more distributions that are officially supported by LeMaker, such as Berryboot, LeMedia, OpenSUSE, Fedora, Gentoo, Scratch, ArchLinux, Open MediaVault, and OpenWrt. All of them have their pros and cons or their specific use cases. If you are an experienced Linux user, you may choose your preferred distribution from the mentioned list, as most of the recipes are similar to, or even equally usable on, most of the Linux-based operating systems. Moreover, the Banana Pi community publishes various customized Linux distributions for the Banana Pi regularly. The possible advantages of a customized distribution may include enabled and optimized hardware acceleration capabilities, supportive helper scripts, fully equipped desktop environments, and much more. 
However, when deciding to use a customized distribution, there is no official support by LeMaker and you have to contact the publisher in case you encounter bugs, or need help. You can also check the customized Arch Linux image that author have built (http://blog.eldajani.net/banana-pi-arch-linux-customized-distribution/) for the Banana Pi and Banana Pro, including several useful applications. Downloading an operating system for the Banana Pi The following two recipes will explain how to set up the SD card with the desired operating system and how to get the Banana Pi up and running for the first time. This recipe is a predecessor. Besides the device itself, you will need at least a source for energy, which is usually a USB power supply and an SD card to boot your Banana Pi. Also, a network cable and connection is highly recommended to be able to interact with your Banana Pi from another computer via a remote shell using the application. You might also want to actually see something on a display. Then, you will need to connect your Banana Pi via HDMI, composite, or LVDS to an external screen. It is recommended that you use an HDMI Version 1.4 cable since lower versions can possibly cause issues. Besides inputting data using a remote shell, you can directly connect an USB keyboard and mouse to your Banana Pi via the USB ports. After completing the required tasks in the upcoming recipes, you will be able to boot your Banana Pi. Getting ready The following components are required for this recipe: Banana Pi SD card (minimum class 4; class 10 is recommended) USB power supply (5V 2A recommended) A computer with an SD card reader/writer (to write the image to the SD card) Furthermore, you are going to need an Internet connection to download a Linux distribution or Android. A few optional but highly recommended components are: Connection to a display (via HDMI or composite) Network connection via Ethernet USB keyboard and mouse You can acquire these items from various retailers. All items shown in the previous two pictures were bought from an online retailer that is known for originally selling books. However, the Banana Pi and the other products can be acquired from a large number of retailers. It is recommended to get a USB power supply with 2000mA (2A) output. How to do it… To download an operating system for Banana Pi, follow these steps: Download an image of your desired operating system. We are going to download Android and Raspbian from the official LeMaker image files website: http://www.lemaker.org/resources/9-38/image_files.html. The following screenshot shows the LeMaker website where you can download the official images: If you are clicking on one of the mirrors (such as Google Drive, Dropbox, and so on), you will be redirected to the equivalent file-hosting service. From there, you are actually able to download the archive file. Once your archive containing the image is downloaded, you are ready to unpack the downloaded archive, which we will do in the upcoming recipes. Setting up the SD card on Windows This recipe will explain how to set up the SD card using a Windows operating system. How to do it… In the upcoming steps, we will unpack the archive containing the operating system image for the Banana Pi and write the image to the SD card: Open the downloaded archive with 7-Zip. The following screenshot shows the 7-Zip application opening a compressed .tgz archive: Unpack the archive to a directory until you get a file with the file extension .img. 
If it is a .tgz or .tar.gz file, you will need to unpack the archive twice. Create a backup of the contents of the SD card, as everything on the SD card is going to be erased unrecoverably. Open SD Formatter (https://www.sdcard.org/downloads/formatter_4/) and check the disk letter (E: in the following screenshot). Choose Option to open the Option Setting window and choose FORMAT TYPE: FULL (Erase) and FORMAT SIZE ADJUSTMENT: ON. When everything is configured correctly, check again to see whether you are using the correct disk and click Format to start the formatting process.

Writing a Linux distribution image to the SD card on Windows

The following steps explain how to write a Linux-based distribution to the SD card on Windows: Format the SD card using SD Formatter, which we covered in the previous section. Open the Win32 Disk Imager (http://sourceforge.net/projects/win32diskimager/). Choose the image file by clicking on the directory button. Check whether you are going to write to the correct disk and then click on Write. Once the burning process is done, you are ready to insert the freshly prepared SD card containing your Linux operating system into the Banana Pi and boot it up for the first time.

Booting up and shutting down the Banana Pi

This recipe will explain how to boot up and shut down the Banana Pi. As the Banana Pi is a real computer, these tasks are just as important as they are on your desktop computer. The booting process starts the Linux kernel and several important services. Shutting down stops them accordingly and does not power off the Banana Pi until all data is synchronized with the SD card or external components correctly.

How to do it…

We are going to boot up and shut down the Banana Pi.

Booting up

Do the following steps to boot up your Banana Pi: Attach the Ethernet cable to your local network. Connect your Banana Pi to a display. Plug in a USB keyboard and mouse. Insert the SD card into your device. Power your Banana Pi by plugging in the USB power cable. The next screenshot shows the desktop of Raspbian after a successful boot:

Shutting down Linux

To shut down your Linux-based distribution, you either use the shutdown command or do it via the desktop environment (in the case of Raspbian, it is called LXDE). For the latter method, these are the steps: Click on the LXDE icon in the lower-left corner. Click on Logout. Click on Shutdown in the upcoming window. To shut down your operating system via the shell, type in the following command:

$ sudo shutdown -h now

Connecting via SSH on Windows using PuTTY

The following recipe shows you how to connect to your Banana Pi remotely using an open source application called PuTTY.

Getting ready

For this recipe, you will need the following ingredients: A booted up Linux operating system on your Banana Pi connected to your local network The PuTTY application on your Windows PC that is also connected to your local area network

How to do it…

To connect to your Banana Pi via SSH on Windows, perform the following: Run putty.exe. You will see the PuTTY Configuration dialog. Enter the IP address of the Banana Pi and leave the port number as 22. Click on the Open button. A new terminal will appear, attempting a connection to the Banana Pi. When connecting to the Banana Pi for the first time, you will see a PuTTY security alert. The following screenshot shows the PuTTY Security Alert window: Trust the connection by clicking on Yes. You will be requested to enter the login credentials. Use the default username bananapi and password bananapi.
When you are done, you should be welcomed by the shell of your Banana Pi. The following screenshot shows the shell of your Banana Pi accessed via SSH using PuTTY on Windows: To quit your SSH session, execute the command exit or press Ctrl + D. Searching, installing, and removing the software Once you have your decent operating system on the Banana Pi, sooner or later you are going to require a new software. As most software for Linux systems is published as open source, you can obtain the source code and compile it for yourself. One alternative is to use a package manager. A lot of software is precompiled and provided as installable packages by the so-called repositories. In case of Debian-based distributions (for example, Raspbian, Bananian, and Lubuntu), the package manager that uses these repositories is called Advanced Packaging Tool (Apt). The two most important tools for our requirements will be apt-get and apt-cache. In this recipe, we will cover the searching, the installing, and removing of software using the Apt utilities. Getting ready The following ingredients are required for this recipe. A booted Debian-based operating system on your Banana Pi An Internet connection How to do it… We will separate this recipe into searching for, installing and removing of packages. Searching for packages In the upcoming example, we will search for a solitaire game: Connect to your Banana Pi remotely or open a terminal on the desktop. Type the following command into the shell: $ apt-cache search solitaire You will get a list of packages that contain the string solitaire in their package name or description. Each line represents a package and shows the package name and description separated by a dash (-). Now we have obtained a list of solitaire games: The preceding screenshot shows the output after searching for packages containing the string solitaire using the apt-cache command. Installing a package We are going to install a package by using its package name. From the previous received list, we select the package ace-of-penguins. Type the following command into the shell: $ sudo apt-get install ace-of-penguins If asked to type the password for sudo, enter the user's password. If a package requires additional packages (dependencies), you will be asked to confirm the additional packages. In this case, enter Y. After downloading and installing, the desired package is installed: Removing a package When you want to uninstall (remove) a package, you also use the apt-get command: Type the following command into a shell: $ sudo apt-get remove ace-of-penguins If asked to type the password for sudo, enter the user's password. You will be asked to confirm the removal. Enter Y. After this process, the package is removed from your system. You will have uninstalled the package ace-of-penguins. Summary In this article, we discovered the installation of a Linux operating system on the Banana Pi. Furthermore, we connected to the Banana Pi via the SSH protocol using PuTTY. Moreover, we discussed how to install new software using the Advanced Packaging Tool. This article is a combination of parts from the first two chapters of the Banana Pi Cookbook. In the Banana Pi Cookbook, we are diving more into detail and explain the specifics of the Banana Pro, for example, how to connect to the local network via WLAN. If you are using a Linux-based desktop computer, you will also learn how to set up the SD card and connect via SSH to your Banana Pi on your Linux computer. 
Resources for Article: Further resources on this subject: Color and motion finding [article] Controlling the Movement of a Robot with Legs [article] Develop a Digital Clock [article]

article-image-work-with-classes-in-typescript
Amey Varangaonkar
15 May 2018
8 min read
Save for later

How to work with classes in Typescript

Amey Varangaonkar
15 May 2018
8 min read
If we are developing any application using TypeScript, be it a small-scale or a large-scale application, we will use classes to manage our properties and methods. Prior to ES 2015, JavaScript did not have the concept of classes, and we used functions to create class-like behavior. TypeScript introduced classes as part of its initial release, and now we have classes in ES6 as well. The behavior of classes in TypeScript and JavaScript ES6 closely relates to the behavior of any object-oriented language that you might have worked on, such as C#. This excerpt is taken from the book TypeScript 2.x By Example written by Sachin Ohri. Object-oriented programming in TypeScript Object-oriented programming allows us to represent our code in the form of objects, which themselves are instances of classes holding properties and methods. Classes form the container of related properties and their behavior. Modeling our code in the form of classes allows us to achieve various features of object-oriented programming, which helps us write more intuitive, reusable, and robust code. Features such as encapsulation, polymorphism, and inheritance are the result of implementing classes. TypeScript, with its implementation of classes and interfaces, allows us to write code in an object-oriented fashion. This allows developers coming from traditional languages, such as Java and C#, feel right at home when learning TypeScript. Understanding classes Prior to ES 2015, JavaScript developers did not have any concept of classes; the best way they could replicate the behavior of classes was with functions. The function provides a mechanism to group together related properties and methods. The methods can be either added internally to the function or using the prototype keyword. The following is an example of such a function: function Name (firstName, lastName) { this.firstName = firstName; this.lastName = lastName; this.fullName = function() { return this.firstName + ' ' + this.lastName ; }; } In this preceding example, we have the fullName method encapsulated inside the Name function. Another way of adding methods to functions is shown in the following code snippet with the prototype keyword: function Name (firstName, lastName) { this.firstName = firstName; this.lastName = lastName; } Name.prototype.fullName = function() { return this.firstName + ' ' + this.lastName ; }; These features of functions did solve most of the issues of not having classes, but most of the dev community has not been comfortable with these approaches. Classes make this process easier. Classes provide an abstraction on top of common behavior, thus making code reusable. The following is the syntax for defining a class in TypeScript: The syntax of the class should look very similar to readers who come from an object-oriented background. To define a class, we use a class keyword followed by the name of the class. The News class has three member properties and one method. Each member has a type assigned to it and has an access modifier to define the scope. On line 10, we create an object of a class with the new keyword. Classes in TypeScript also have the concept of a constructor, where we can initialize some properties at the time of object creation. Access modifiers Once the object is created, we can access the public members of the class with the dot operator. Note that we cannot access the author property with the espn object because this property is defined as private. TypeScript provides three types of access modifiers. 
Public Any property defined with the public keyword will be freely accessible outside the class. As we saw in the previous example, all the variables marked with the public keyword were available outside the class in an object. Note that TypeScript assigns public as a default access modifier if we do not assign any explicitly. This is because the default JavaScript behavior is to have everything public. Private When a property is marked as private, it cannot be accessed outside of the class. The scope of a private variable is only inside the class when using TypeScript. In JavaScript, as we do not have access modifiers, private members are treated similarly to public members. Protected The protected keyword behaves similarly to private, with the exception that protected variables can be accessed in the derived classes. The following is one such example: class base{ protected id: number; } class child extends base{ name: string; details():string{ return `${name} has id: ${this.id}` } } In the preceding code, we extend the child class with the base class and have access to the id property inside the child class. If we create an object of the child class, we will still not have access to the id property outside. Readonly As the name suggests, a property with a readonly access modifier cannot be modified after the value has been assigned to it. The value assigned to a readonly property can only happen at the time of variable declaration or in the constructor. In the above code, line 5 gives an error stating that property name is readonly, and cannot be an assigned value. Transpiled JavaScript from classes While learning TypeScript, it is important to remember that TypeScript is a superset of JavaScript and not a new language on its own. Browsers can only understand JavaScript, so it is important for us to understand the JavaScript that is transpiled by TypeScript. TypeScript provides an option to generate JavaScript based on the ECMA standards. You can configure TypeScript to transpile into ES5 or ES6 (ES 2015) and even ES3 JavaScript by using the flag target in the tsconfig.json file. The biggest difference between ES5 and ES6 is with regard to the classes, let, and const keywords which were introduced in ES6. Even though ES6 has been around for more than a year, most browsers still do not have full support for ES6. So, if you are creating an application that would target older browsers as well, consider having the target as ES5. So, the JavaScript that's generated will be different based on the target setting. Here, we will take an example of class in TypeScript and generate JavaScript for both ES5 and ES6. The following is the class definition in TypeScript: This is the same code that we saw when we introduced classes in the Understanding Classes section. Here, we have a class named News that has three members, two of which are public and one private. The News class also has a format method, which returns a string concatenated from the member variables. Then, we create an object of the News class in line 10 and assign values to public properties. In the last line, we call the format method to print the result. Now let's look at the JavaScript transpiled by TypeScript compiler for this class. ES6 JavaScript ES6, also known as ES 2015, is the latest version of JavaScript, which provides many new features on top of ES5. Classes are one such feature; JavaScript did not have classes prior to ES6. 
The following is the code generated from the TypeScript class, which we saw previously: If you compare the preceding code with TypeScript code, you will notice minor differences. This is because classes in TypeScript and JavaScript are similar, with just types and access modifiers additional in TypeScript. In JavaScript, we do not have the concept of declaring public members. The author variable, which was defined as private and was initialized at its declaration, is converted to a constructor initialization in JavaScript. If we had not have initialized author, then the produced JavaScript would not have added author in the constructor. ES5 JavaScript ES5 is the most popular JavaScript version supported in browsers, and if you are developing an application that has to support the majority of browser versions, then you need to transpile your code to the ES5 version. This version of JavaScript does not have classes, and hence the transpiled code converts classes to functions, and methods inside the classes are converted to prototypically defined methods on the functions. The following is the code transpiled when we have the target set as ES5 in the TypeScript compiler options: As discussed earlier, the basic difference is that the class is converted to a function. The interesting aspect of this conversion is that the News class is converted to an immediately invoked function expression (IIFE). An IIFE can be identified by the parenthesis at the end of the function declaration, as we see in line 9 in the preceding code snippet. IIFEs cause the function to be executed immediately and help to maintain the correct scope of a function rather than declaring the function in a global scope. Another difference was how we defined the method format in the ES5 JavaScript. The prototype keyword is used to add the additional behavior to the function, which we see here. A couple of other differences you may have noticed include the change of the let keyword to var, as let is not supported in ES5. All variables in ES5 are defined with the var keyword. Also, the format method now does not use a template string, but standard string concatenation to print the output. TypeScript does a good job of transpiling the code to JavaScript while following recommended practices. This helps in making sure we have a robust and reusable code with minimum error cases. If you found this tutorial useful, make sure you check out the book TypeScript 2.x By Example for more hands-on tutorials on how to effectively leverage the power of TypeScript to develop and deploy state-of-the-art web applications. How to install and configure TypeScript Understanding Patterns and Architectures in TypeScript Writing SOLID JavaScript code with TypeScript
article-image-microsoft-office-excel-programming-using-vsto-30
Packt
08 Oct 2009
10 min read
Save for later

Microsoft Office Excel Programming Using VSTO 3.0

Packt
08 Oct 2009
10 min read
Microsoft Office Excel is the most frequently-used Microsoft application and one of the leading spreadsheet programs available. A spreadsheet program is a program that uses a huge grid to display data in rows and columns. The spreadsheet program can then be used to do calculation, data manipulation, and various similar tasks, against this data. Programming in Excel Microsoft Office Excel is one of the powerful Office tools that provides uncomplicated data management. It is loaded with features such as graphing, charting, pivot tables, and calculation. Even though Excel is loaded with numerous features for users, there may be some functionality that cannot be achieved by using the standard Microsoft Excel environment. In Microsoft Office Excel 2007, automation is a great mechanism for populating business documents (for example, reports) with data from backend system. This can be achieved by using VSTO 3.0 and Visual Studio 2008. Microsoft Office Excel 2007 is more programmable than ever before with support for Visual Studio Tools for Office. VSTO is aimed at Microsoft Office Excel users who want to develop Excel-based applications for other Microsoft Office Excel users. VSTO 3.0 is shipped with all of the newly-enhanced Excel objects, including improved features for building Microsoft Office based solutions. VSTO 3.0 is loaded with new features, including support for Windows form controls from within Microsoft Office tools customization. It provides the support for using .NET frameworks, objects and classes for Microsoft Office tools customization. For example, System.Data is the .NET frameworks object that can be used inside the Excel solution to process database operations. This new feature is tightly integrated with the Visual Studio 2008 IDE and gives you the comfort of design time customization of Excel documents for data display and UI customization. Similar to other Microsoft Office Tools, with Microsoft Office Excel 2007 customization using VSTO 3.0, you have two levels of customization— document-level customization and application-level customization. Document-level customization is a solution created for document-level programming and is specific to the document that is part of that particular solution. Application-level customization is a solution created for application-level programming and is specific to the application, and therefore common to all documents based on that application. In a simple Hello World demonstration, let's learn about the document level customization approach. We'll step through a simple task, showing how to create an Excel document that will display a Hello World message on startup. Hello World example using Visual Studio 2008 Open Visual Studio 2008, and create a new Excel 2007 Workbook project. Select New Project. Under Office select 2007, and then select the Excel 2007 Workbook template and name the project ExcelHelloWorld, as shown in the following image: The document selection dialog box is displayed. At this point, you need to choose the template for your design. In this example, you select a new blank template and click on the OK button. Refer to the following screenshot: The solution will be created with all of the supporting files required for our development of an Excel solution. Each solution is created with three worksheets, with default names: Sheet1, Sheet2, and Sheet3 for the workbook you're going to customize , as shown in the following image. The number of sheets in a new workbook depends on the settings in Excel. 
The following image also shows you the Excel-supported controls, placed on the leftmost side of the Visual Studio 2008 toolbox panel. You can also see the visual representation of Excel 2007 inside Visual Studio 2008. Let's write our Hello World message in a cell when we load the Excel 2007 document. Write the following code inside the ThisWorkbook.cs file. // The Startup event of workbook in our Excel solution// Startup event common to all Office application// Startup event will be fired at the start of the applicationprivate void ThisWorkbook_Startup(object sender,System.EventArgs e){// Creating the instance of active WorkSheet of Excel DocumentExcel.Worksheet AuthorWorkSheet = ThisApplication.ActiveSheetas Excel.Worksheet;// Get the range using number index through Excel cellsby setting AuthorExchange to an Excel range object starting at(1,1) and ending at (1,1)Excel.Range AuthorExcelRange = ThisApplication.get_Range(AuthorWorkSheet.Cells[1, 1],AuthorWorkSheet.Cells[1, 1]);// Setting the value in the cellAuthorExcelRange.Value2 = "Hello! this is my VSTO program forExcel 2007";} The following screenshot results after adding and executing the preceding code: Manipulation Microsoft Office Excel is one of the most comprehensive data management tools for all kinds of users. It is a tool that can be easily understood and quickly learnt. The most important feature of Microsoft Office Excel is its capability to manipulate data from different sources. Excel is one of the most powerful and user-friendly data manipulation applications. You could use Excel to predict what's ahead for your business by creating detailed financial forecasts. This powerful application has pivot table functionality that allows you to drop in your data and rearrange it to answer all kinds of business data analysis type questions. Excel helps you to build various useful analytical tools such as your monthly sales report and product sales growth analysis more easily and flexibly. Excel offers you formulae and functions that will help you to perform complex financial calculations without any manual errors. Excel can provide you with a graphical presentation of your business data by using charts and graphs. Want to know your growth levels for a specific product sales range? Check which parts of your business are performing worse? The pivot table provides more information from your business in depth. Every application in this computer world works with data. The data can be in any form and can belong to different sources. The key question for data management is where to place the data. You manage the data in two ways: data managed outside the program and data managed inside the program. The data managed outside the program includes data managed in a database, a file system, and so on. Data managed inside the program includes data in different worksheets within the workbook, embedded images, and so on. Data manipulation For users who are not satisfied with the default features in Microsoft Office Excel, VSTO programming makes Excel more flexible, and provides a development tool for the creation of custom solutions for data manipulation and data analysis. Custom programming using VSTO 3.0 improves most part of the Microsoft Office Excel solution. 
Custom programming using VSTO 3.0 provides a wide range of advantages, including saving time by automating most of the frequently-performed tasks, reducing errors due to manual operations, as well as enforcing standards for data processing, and building the capability to seamlessly integrate with other applications seamlessly. Reading worksheet cells There are many ways to manipulate data and write to the cells in an Excel worksheet. Let's see some of these ways. We can read worksheet cells directly through the Cells property of the sheets, rows, and columns, and can set a value directly by using the cell's row and column index. Open Visual Studio 2008 and create a new Excel solution. Refer to the previous example for full instructions of how to do this. Write the following code inside the ThisWorkbook.cs file. In this sample explanation, you are writing data into the worksheet by using the Cells object. // The Startup event of workbook in our Excel solutionprivate void ThisWorkbook_Startup(object sender,System.EventArgs e){// Set value for Cells row and column index// Text data in Sheet1 cellsGlobals.Sheet1.Cells[3, 3] = "Set my data";} We can also read the worksheet and write data to the cells by using the Range object. In this case, you are creating the range and setting the text data for the range in the Excel worksheet. Open Visual Studio 2008 and create a new solution, as before. Write the following code inside the ThisWorkbook.cs file . In this demonstration, you read the worksheet through the range of cells and set the value by reading through cell ranges. private void ThisWorkbook_Startup(object sender,System.EventArgs e){// Setting value in ExcelSheet cells through reading range objectExcel.Range AuthorExcelSheetRange = Globals.Sheet1.Range["A2","B2"];// Text data for the range A2 to B2AuthorExcelSheetRange.Value2 = "Set my data";} Let's see a demonstration of how to read data from an external data file and display this inside our Excel cells. In this demonstration, you will see how the data from the text (.txt) file is displayed in the spreadsheet. Opening a text file as a workbook using VSTO We'll now see how to open the text file as a workbook by using VSTO and C# programming. This saves time and makes the user more comfortable in accessing the text file while processing the data. Open Visual Studio 2008 and create a new solution, as before. Write the following code inside the ThisWorkbook.cs file: // Opening Text file as workbookprivate void ThisWorkbook_Startup(object sender,System.EventArgs e){// In the workbook objects, file path as parameter in Opentextproperty this.Application.Workbooks.OpenText(@"C:TechBooks.txt",// Value 1 is, the row from which it will read data in the textfile missing, 1,// Checks for delimits for text parsingExcel.XlTextParsingType.xlDelimited,// Text Enumeration valueExcel.XlTextQualifier.xlTextQualifierNone,missing, missing, missing, true, missing, missing, missing,missing, missing, missing, missing, missing, missing);} Connecting with Microsoft SQL Server 2008 database Microsoft SQL Server 2008 is a relational database management system developed by Microsoft Corporation. Microsoft SQL Server is used to manage a huge volume of data along with relation and Metadata information for this data. VSTO provides support for manipulating the data from your database inside Excel using ADO.NET classes. The preceding figure demonstrates how an Excel 2007 object is used to interact with the Microsoft SQL Server database. 
Let's see how to connect with a relational database management system, retrieve data from the database, and finally display it in our Excel spreadsheet. This demonstration shows you how to retrieve data from a Microsoft SQL Server 2008 database and place the retrieved data into the worksheet cells. Open Visual Studio 2008 and create a new solution, as usual. Write the following code inside the ThisWorkbook.cs file.

// Namespace for SQL Server connection
using System.Data.SqlClient;

// Startup event of the workbook
private void ThisWorkbook_Startup(object sender, System.EventArgs e)
{
    // Opening a SQL connection to Microsoft SQL Server 2008;
    // WINNER is the database server that contains the database called Products
    SqlConnection MySQLConnection = new SqlConnection(
        @"Data Source=WINNER;Initial Catalog=Products;Integrated Security=True");

    // Passing the SQL command text
    SqlCommand MySQLCommand = new SqlCommand("SELECT * FROM Books", MySQLConnection);
    MySQLConnection.Open();

    // SQL reader to read through data from the database
    SqlDataReader MySQLReader = MySQLCommand.ExecuteReader();

    // Get the active sheet of the current application
    Excel.Worksheet MyWorkSheet = this.Application.ActiveSheet as Excel.Worksheet;

    // Headers for the columns, set through the Value2 property
    ((Excel.Range)MyWorkSheet.Cells[1, 1]).Value2 = "Book Name";
    ((Excel.Range)MyWorkSheet.Cells[1, 2]).Value2 = "Author Name";

    // Indexer
    int i = 2;

    // Loop through the data returned by the database
    while (MySQLReader.Read())
    {
        // Writing the data from the database table column BookName
        ((Excel.Range)MyWorkSheet.Cells[i, 1]).Value2 = MySQLReader["BookName"];
        // Writing the data from the database table column Author
        ((Excel.Range)MyWorkSheet.Cells[i, 2]).Value2 = MySQLReader["Author"];
        i++;
    }

    // Dispose of the SQL command
    MySQLCommand.Dispose();
    // Close the SQL connection after using it
    MySQLConnection.Close();
}

The following screenshot displays data retrieved from the Microsoft SQL Server 2008 database being displayed in the worksheet cells. In this demonstration, you learned how to connect with a Microsoft SQL Server 2008 database in order to get data and populate it in a workbook. This is just one of the ways of manipulating data outside of a workbook.

article-image-bizarre-python
Packt
19 Aug 2015
20 min read
Save for later

The strange relationship between objects, functions, generators and coroutines

Packt
19 Aug 2015
20 min read
The strange relationship between objects, functions, generators and coroutines In this article, I’d like to investigate some relationships between functions, objects, generators and coroutines in Python. At a theoretical level, these are very different concepts, but because of Python’s dynamic nature, many of them can appear to be used interchangeably. I discuss useful applications of all of these in my book, Python 3 Object-oriented Programming - Second Edition. In this essay, we’ll examine their relationship in a more whimsical light; most of the code examples below are ridiculous and should not be attempted in a production setting! Let’s start with functions, which are simplest. A function is an object that can be executed. When executed, the function is entered at one place accepting a group of (possibly zero) objects as parameters. The function exits at exactly one place and always returns a single object. Already we see some complications; that last sentence is true, but you might have several counter-questions: What if a function has multiple return statements? Only one of them will be executed in any one call to the function. What if the function doesn’t return anything? Then it it will implicitly return the None object. Can’t you return multiple objects separated by a comma? Yes, but the returned object is actually a single tuple Here’s a look at a function: def average(sequence): avg = sum(sequence) / len(sequence) return avg print(average([3, 5, 8, 7])) which outputs: That’s probably nothing you haven’t seen before. Similarly, you probably know what an object and a class are in Python. I define an object as a collection of data and associated behaviors. A class represents the “template” for an object. Usually the data is represented as a set of attributes and the behavior is represented as a collection of method functions, but this doesn’t have to be true. Here’s a basic object: class Statistics: def __init__(self, sequence): self.sequence = sequence def calculate_average(self): return sum(self.sequence) / len(self.sequence) def calculate_median(self): length = len(self.sequence) is_middle = int(not length % 2) return ( self.sequence[length // 2 - is_middle] + self.sequence[-length // 2]) / 2 statistics = Statistics([5, 2, 3]) print(statistics.calculate_average()) which outputs: This object has one piece of data attached to it: the sequence. It also has two methods besides the initializer. Only one of these methods is used in this particular example, but as Jack Diederich said in his famous Stop Writing Classes talk, a class with only one function besides the initializer should just be a function. So I included a second one to make it look like a useful class (It’s not. The new statistics module in Python 3.4 should be used instead. Never define for yourself that which has been defined, debugged, and tested by someone else). Classes like this are also things you’ve seen before, but with this background in place, we can now look at some bizarre things you might not expect (or indeed, want) to be able to do with a function. For example, did you know that functions are objects? In fact, anything that you can interact with in Python is defined in the source code for the CPython interpreter as a PyObject structure. This includes functions, objects, basic primitives, containers, classes, modules, you name it. This means we can attach attributes to a function just as with any standard object. Ah, but if functions are objects, can you attach functions to functions? 
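Before answering that, here is a minimal sketch of the milder claim above, attaching a plain data attribute to an ordinary function. The greet function and its calls counter are purely illustrative names, not anything from the book:

def greet(name):
    # read and update an attribute stored on the function object itself
    greet.calls += 1
    return "Hello, " + name

greet.calls = 0          # attach the attribute just as on any other object
greet("Ada")
greet("Grace")
print(greet.calls)       # 2

Nothing special happens here; the function is simply acting as a namespace for a bit of data, exactly because it is an object.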
Don’t try this at home (and especially don’t do it at work): def statistics(sequence): statistics.sequence = sequence return statistics def calculate_average(): return sum(statistics.sequence) / len(statistics.sequence) statistics.calculate_average = calculate_average print(statistics([1, 5, 8, 4]).calculate_average()) which outputs: This is a pretty crazy example (but we’re just getting started). The statistics function is being set up as an object that has two attributes: sequence is a list and calculate_average is another function object. For fun, the function returns itself so that the print function can call the calculate_average function all in one line. Note that the statistics function here is an object, not a class. Rather than emulating the Statistics class in the previous example, it is more similar to the statistics instance of that class. It is hard to imagine any reason that you would want to write code like this in real life. Perhaps it could be used to implement the Singleton (anti-)pattern popular with some other languages. Since there can only ever be one statistics function, it is not possible to create two distinct instances with two distinct sequence attributes the way we can with the Statistics class. There is generally little need for such code in Python, though, because of its ‘consenting adults’ nature. We can more closely simulate a class by using a function like a constructor: def Statistics(sequence): def self(): return self.average() self.sequence = sequence def average(): return sum(self.sequence) / len(self.sequence) self.average = average return self statistics = Statistics([2, 1, 1]) print(Statistics([1, 4, 6, 2]).average()) print(statistics()) which outputs: That looks an awful lot like Javascript, doesn’t it? The Statistics function acts like a constructor that returns an object (that happens to be a function, named self). That function object has had a couple attributes attached to it, so our function is now an object with both data and behavior. The last three lines show that we can instantiate two separate Statistics “objects” just as if it were a class. Finally, since the statistics object in the last line really is a function, we can even call it directly. It proxies the call through to the average function (or is it a method at this point? I can’t tell anymore) defined on itself. Before we go on, note that this simulated overlapping of functionality does not mean that we are getting exactly the same behavior out of the Python interpreter. While functions are objects, not all objects are functions. The underlying implementation is different, and if you try to do this in production code, you’ll quickly find confusing anomalies. In normal code, the fact that functions can have attributes attached to them is rarely useful. I’ve seen it used for interesting diagnostics or testing, but it’s generally just a hack. However, knowing that functions are objects allows us to pass them around to be called at a later time. 
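A more everyday case of passing a function object around, before the deliberately silly examples resume, is handing one to another function as an argument; the built-in sorted accepts any callable through its key parameter:

words = ['pear', 'fig', 'banana']
# len is passed as an object here; sorted calls it later, once per element
print(sorted(words, key=len))    # ['fig', 'pear', 'banana']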
Consider this basic partial implementation of an observer pattern: class Observers(list): register_observer = list.append def notify(self): for observer in self: observer() observers = Observers() def observer_one(): print('one was called') def observer_two(): print('two was called') observers.register_observer(observer_one) observers.register_observer(observer_two) observers.notify() which outputs: At line 2, I’ve intentionally reduced the comprehensibility of this code to conform to my initial ‘most of the examples in this article are ridiculous’ thesis. This line creates a new class attribute named register_observer which points to the list.append function. Since the Observers class inherits from the list class, this line essentially creates a shortcut to a method that would look like this: def register_observer(self, item): self.append(item) And this is how you should do it in your code. Nobody’s going to understand what’s going on if you follow my version. The part of this code that you might want to use in real life is the way the callback functions are passed into the registration function at lines 16 and 17. Passing functions around like this is quite common in Python. The alternative, if functions were not objects, would be to create a bunch of classes that have a single method with an uninformative name like execute and pass those around instead. Observers are a bit too useful, though, so in the spirit of keeping things ridiculous, let’s make a silly function that returns a function object: def silly(): print("silly") return silly silly()()()()()() which outputs: Since we’ve seen some ways that functions can (sort of) imitate objects, let’s now make an object that can behave like a function: class Function: def __init__(self, message): self.message = message def __call__(self, name): return "{} says '{}'".format(name, self.message) function = Function("I am a function") print(function('Cheap imitation function')) which outputs: I don’t use this feature often, but it can be useful in a few situations. For example, if you write a function and call it from many different places, but later discover that it needs to maintain some state, you can change the function to an object and implement the __call__ method without changing all the call sites. Or if you have a callback implementation that normally passes functions around, you can use a callable object when you need to store more complicated state. I’ve also seen Python decorators made out of objects when additional state or behavior is required. Now, let’s talk about generators. As you might expect by now, we’ll start with the silliest way to implement generation code. We can use the idea of a function that returns an object to create a rudimentary generatorish object that calculates the Fibonacci sequence: def FibFunction(): a = b = 1 def next(): nonlocal a, b a, b = b, a + b return b return next fib = FibFunction() for i in range(8): print(fib(), end=' ') which outputs: This is a pretty wacky thing to do, but the point is that it is possible to build functions that are able to maintain state between calls. The state is stored in the surrounding closure; we can access that state by referencing them with Python 3’s nonlocal keyword. It is kind of like global except it accesses the state from the surrounding function, not the global namespace. 
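To isolate what nonlocal is doing in that closure, here is a stripped-down counter built on the same idea; without the nonlocal line, the assignment would create a brand-new local variable and the += would raise UnboundLocalError. The names here are only illustrative:

def make_counter():
    count = 0
    def bump():
        nonlocal count   # rebind the enclosing function's variable, not a new local one
        count += 1
        return count
    return bump

counter = make_counter()
print(counter(), counter(), counter())   # 1 2 3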
We can, of course, build a similar construct using classic (or classy) object notation:

class FibClass():
    def __init__(self):
        self.a = self.b = 1

    def __call__(self):
        self.a, self.b = self.b, self.a + self.b
        return self.b

fib = FibClass()
for i in range(8):
    print(fib(), end=' ')

which outputs: Of course, neither of these obeys the iterator protocol. No matter how I wrangle it, I was not able to get FibFunction to work with Python's builtin next() function, even after looking through the CPython source code for a couple hours. As I mentioned earlier, using the function syntax to build pseudo-objects quickly leads to frustration. However, it's easy to tweak the object-based FibClass to fulfill the iterator protocol:

class FibIterator():
    def __init__(self):
        self.a = self.b = 1

    def __next__(self):
        self.a, self.b = self.b, self.a + self.b
        return self.b

    def __iter__(self):
        return self

fib = FibIterator()
for i in range(8):
    print(next(fib), end=' ')

which outputs: This class is a standard implementation of the iterator pattern. But it's kind of ugly and verbose. Luckily, we can get the same effect in Python with a function that includes a yield statement to construct a generator. Here's the Fibonacci sequence as a generator:

def FibGenerator():
    a = b = 1
    while True:
        a, b = b, a + b
        yield b

fib = FibGenerator()
for i in range(8):
    print(next(fib), end=' ')
print('\n', fib)

which outputs: The generator version is a bit more readable than the other two implementations. The thing to pay attention to here is that a generator is not a function. The FibGenerator function returns an object, as illustrated by the words “generator object” in the last line of output above. Unlike a normal function, a generator function does not execute any code inside it when we call the function. Instead it constructs a generator object and returns that. You could think of this as like an implicit Python decorator; the Python interpreter sees the yield keyword and wraps it in a decorator that returns an object instead. To start the function code executing, we have to use the next function (either explicitly as in the examples, or implicitly by using a for loop or yield from). While a generator is technically an object, it is often convenient to think of the function that creates it as a function that can have data passed in at one place and can return values multiple times. It's sort of like a generic version of a function (which can have data passed in at one place and return a value at only one place). It is easy to make a generator that behaves not completely unlike a function, by yielding only one value:

def average(sequence):
    yield sum(sequence) / len(sequence)

print(next(average([1, 2, 3])))

which outputs: Unfortunately, the call site at line 4 is less readable than a normal function call, since we have to throw that pesky next() in there. The obvious way around this would be to add a __call__ method to the generator, but this fails if we try to use attribute assignment or inheritance. There are optimizations that make generators run quickly in C code and also don't let us assign attributes to them. We can, however, wrap the generator in a function-like object using a ludicrous decorator:

def gen_func(func):
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        return next(gen)
    return wrapper

@gen_func
def average(sequence):
    yield sum(sequence) / len(sequence)

print(average([1, 6, 3, 4]))

which outputs: Of course this is an absurd thing to do. I mean, just write a normal function for pity's sake!
But taking this idea a little further, it could be tempting to create a slightly different wrapper: def callable_gen(func): class CallableGen: def __init__(self, *args, **kwargs): self.gen = func(*args, **kwargs) def __next__(self): return self.gen.__next__() def __iter__(self): return self def __call__(self): return next(self) return CallableGen @callable_gen def FibGenerator(): a = b = 1 while True: a, b = b, a + b yield b fib = FibGenerator() for i in range(8): print(fib(), end=' ') which outputs: To completely wrap the generator, we’d need to proxy a few other methods through to the underlying generator including send, close, and throw. This generator wrapper can be used to call a generator any number of times without calling the next function. I’ve been tempted to do this to make my code look cleaner if there are a lot of next calls in it, but I recommend not yielding into this temptation. Coders reading your code, including yourself, will go berserk trying to figure out what that “function call” is doing. Just get used to the next function and ignore this decorator business. So we’ve drawn some parallels between generators, objects and functions. Let’s talk now about one of the more confusing concepts in Python: coroutines. In Python, coroutines are usually defined as “generators that you can send values into”. At an implementation level, this is probably the most sensible definition. In the theoretical sense, however, it is more accurate to define coroutines as constructs that can accept values at one or more locations and return values at one or more locations. Therefore, while in Python it is easy to think of a generator as a special type of function that has yield statements and a coroutine as a special type of generator that we can send data into a different points, a better taxonomy is to think of a coroutine that can accept and return values at multiple locations as a general case, and generators and functions as special types of coroutines that are restricted in where they can accept or return values. So let’s see a coroutine: def LineInserter(lines): out = [] for line in lines: to_append = yield line out.append(line) if to_append is not None: out.append(to_append) return out emily = """I died for beauty, but was scarce Adjusted in the tomb, When one who died for truth was lain In an adjoining room. He questioned softly why I failed? “For beauty,” I replied. “And I for truth,—the two are one; We brethren are,” he said. And so, as kinsmen met a night, We talked between the rooms, Until the moss had reached our lips, And covered up our names. """ inserter = LineInserter(iter(emily.splitlines())) count = 1 try: line = next(inserter) while True: line = next(inserter) if count % 4 else inserter.send('-------') count += 1 except StopIteration as ex: print('n' + 'n'.join(ex.value)) which outputs: The LineInserter object is called a coroutine rather than a generator only because the yield statement is placed on the right side of an assignment operator. Now whenever we yield a line, it stores any value that might have been sent back into the coroutine in the to_append variable. As you can see in the driver code, we can send a value back in using inserter.send. if you instead just use next, the to_append variable gets a value of None. Don’t ask me why next is a function and send is a method when they both do nearly the same thing! In this example, we use the send call to insert a ruler every four lines to separate stanzas in Emily Dickinson’s famous poem. 
But I used the exact same coroutine in a program that parses the source file for this article. It checks if any line contains the string !#python, and if so, it executes the subsequent code block and inserts the output (see the ‘which outputs’ lines throughout this article) into the article. Coroutines can provide that little extra something when normal ‘one way’ iteration doesn’t quite cut it. The coroutine in the last example is really nice and elegant, but I find the driver code a bit annoying. I think it’s just me, but something about the indentation of a try…catch statement always frustrates me. Recently, I’ve been emulating Python 3.4’s contextlib.suppress context manager to replace except clauses with a callback. For example: def LineInserter(lines): out = [] for line in lines: to_append = yield line out.append(line) if to_append is not None: out.append(to_append) return out from contextlib import contextmanager @contextmanager def generator_stop(callback): try: yield except StopIteration as ex: callback(ex.value) def lines_complete(all_lines): print('n' + 'n'.join(all_lines)) emily = """I died for beauty, but was scarce Adjusted in the tomb, When one who died for truth was lain In an adjoining room. He questioned softly why I failed? “For beauty,” I replied. “And I for truth,—the two are one; We brethren are,” he said. And so, as kinsmen met a night, We talked between the rooms, Until the moss had reached our lips, And covered up our names. """ inserter = LineInserter(iter(emily.splitlines())) count = 1 with generator_stop(lines_complete): line = next(inserter) while True: line = next(inserter) if count % 4 else inserter.send('-------') count += 1 which outputs: The generator_stop now encapsulates all the ugliness, and the context manager can be used in a variety of situations where StopIteration needs to be handled. Since coroutines are undifferentiated from generators, they can emulate functions just as we saw with generators. We can even call into the same coroutine multiple times as if it were a function: def IncrementBy(increment): sequence = yield while True: sequence = yield [i + increment for i in sequence] sequence = [10, 20, 30] increment_by_5 = IncrementBy(5) increment_by_8 = IncrementBy(8) next(increment_by_5) next(increment_by_8) print(increment_by_5.send(sequence)) print(increment_by_8.send(sequence)) print(increment_by_5.send(sequence)) which outputs: Note the two calls to next at lines 9 and 10. These effectively “prime” the generator by advancing it to the first yield statement. Then each call to send essentially looks like a single call to a function. The driver code for this coroutine doesn’t look anything like calling a function, but with some evil decorator magic, we can make it look less disturbing: def evil_coroutine(func): def wrapper(*args, **kwargs): gen = func(*args, **kwargs) next(gen) def gen_caller(arg=None): return gen.send(arg) return gen_caller return wrapper @evil_coroutine def IncrementBy(increment): sequence = yield while True: sequence = yield [i + increment for i in sequence] sequence = [10, 20, 30] increment_by_5 = IncrementBy(5) increment_by_8 = IncrementBy(8) print(increment_by_5(sequence)) print(increment_by_8(sequence)) print(increment_by_5(sequence)) which outputs: The decorator accepts a function and returns a new wrapper function that gets assigned to the IncrementBy variable. 
Whenever this new IncrementBy is called, it constructs a generator using the original function, and advances it to the first yield statement using next (the priming action from before). It returns a new function that calls send on the generator each time it is called. This function makes the argument default to None so that it can also work if we call next instead of send. The new driver code is definitely more readable, but once again, I would not recommend using this coding style to make coroutines behave like hybrid object/functions. The argument that other coders aren’t going to understand what is going through your head still stands. Plus, since send can only accept one argument, the callable is quite restricted. Before we leave our discussion of the bizarre relationships between these concepts, let’s look at how the stanza processing code could look without an explicit coroutine, generator, function, or object: emily = """I died for beauty, but was scarce Adjusted in the tomb, When one who died for truth was lain In an adjoining room. He questioned softly why I failed? “For beauty,” I replied. “And I for truth,—the two are one; We brethren are,” he said. And so, as kinsmen met a night, We talked between the rooms, Until the moss had reached our lips, And covered up our names. """ for index, line in enumerate(emily.splitlines(), start=1): print(line) if not index % 4: print('------') which outputs: This code is so simple and elegant! This happens to me nearly every time I try to use coroutines. I keep refactoring and simplifying it until I discover that coroutines are making my code less, not more, readable. Unless I am explicitly modeling a state-transition system or trying to do asynchronous work using the terrific asyncio library (which wraps all the possible craziness with StopIteration, cascading exceptions, etc), I rarely find that coroutines are the right tool for the job. That doesn’t stop me from attempting them though, because they are fun. For the record, the LineInserter coroutine actually is useful in the markdown code executor I use to parse this source file. I need to keep track of more transitions between states (am I currently looking for a code block? am I in a code block? Do I need to execute the code block and record the output?) than in the stanza marking example used here. So, it has become clear that in Python, there is more than one way to do a lot of things. Luckily, most of these ways are not very obvious, and there is usually “one, and preferably only one, obvious way to do things”, to quote The Zen Of Python. I hope by this point that you are more confused about the relationship between functions, objects, generators and coroutines than ever before. I hope you now know how to write much code that should never be written. But mostly I hope you’ve enjoyed your exploration of these topics. If you’d like to see more useful applications of these and other Python concepts, grab a copy of Python 3 Object-oriented Programming, Second Edition.

article-image-working-with-pandas-dataframes
Sugandha Lahoti
23 Feb 2018
15 min read
Save for later

Working with pandas DataFrames

Sugandha Lahoti
23 Feb 2018
15 min read
[box type="note" align="" class="" width=""]This article is an excerpt from the book Python Data Analysis - Second Edition written by Armando Fandango. From this book, you will learn how to process and manipulate data with Python for complex data analysis and modeling. Code bundle for this article is hosted on GitHub.[/box] The popular open source Python library, pandas is named after panel data (an econometric term) and Python data analysis. We shall learn about basic panda functionalities, data structures, and operations in this article. The official pandas documentation insists on naming the project pandas in all lowercase letters. The other convention the pandas project insists on, is the import pandas as pd import statement. We will follow these conventions in this text. In this tutorial, we will install and explore pandas. We will also acquaint ourselves with the a central pandas data structure–DataFrame. Installing and exploring pandas The minimal dependency set requirements for pandas is given as follows: NumPy: This is the fundamental numerical array package that we installed and covered extensively in the preceding chapters python-dateutil: This is a date handling library pytz: This handles time zone definitions This list is the bare minimum; a longer list of optional dependencies can be located at http://pandas.pydata.org/pandas-docs/stable/install.html. We can install pandas via PyPI with pip or easy_install, using a binary installer, with the aid of our operating system package manager, or from the source by checking out the code. The binary installers can be downloaded from http://pandas.pydata.org/getpandas.html. The command to install pandas with pip is as follows: $ pip3 install pandas rpy2 rpy2 is an interface to R and is required because rpy is being deprecated. You may have to prepend the preceding command with sudo if your user account doesn't have sufficient rights. The pandas DataFrames A pandas DataFrame is a labeled two-dimensional data structure and is similar in spirit to a worksheet in Google Sheets or Microsoft Excel, or a relational database table. The columns in pandas DataFrame can be of different types. A similar concept, by the way, was invented originally in the R programming language. (For more information, refer to http://www.r-tutor.com/r-introduction/data-frame). A DataFrame can be created in the following ways: Using another DataFrame. Using a NumPy array or a composite of arrays that has a two-dimensional shape. Likewise, we can create a DataFrame out of another pandas data structure called Series. We will learn about Series in the following section. A DataFrame can also be produced from a file, such as a CSV file. From a dictionary of one-dimensional structures, such as one-dimensional NumPy arrays, lists, dicts, or pandas Series. As an example, we will use data that can be retrieved from http://www.exploredata.net/Downloads/WHO-Data-Set. The original data file is quite large and has many columns, so we will use an edited file instead, which only contains the first nine columns and is called WHO_first9cols.csv; the file is in the code bundle of this book. 
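Before loading that CSV file, here is a minimal sketch of two of the construction routes just listed, building a DataFrame from a dictionary of one-dimensional structures and from a two-dimensional NumPy array. The column names and values are invented purely for illustration:

import numpy as np
import pandas as pd

# From a dict of equal-length one-dimensional structures
df_from_dict = pd.DataFrame({'Country': ['A', 'B'],
                             'Population': [26088, 3205]})

# From a two-dimensional NumPy array, supplying the column labels ourselves
df_from_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['x', 'y'])

print(df_from_dict)
print(df_from_array)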
These are the first two lines, including the header: Country,CountryID,Continent,Adolescent fertility rate (%),Adult literacy rate (%),Gross national income per capita (PPP international $),Net primary school enrolment ratio female (%),Net primary school enrolment ratio male (%),Population (in thousands) totalAfghanistan,1,1,151,28,,,,26088 In the next steps, we will take a look at pandas DataFrames and its attributes: To kick off, load the data file into a DataFrame and print it on the screen: from pandas.io.parsers import read_csv df = read_csv("WHO_first9cols.csv") print("Dataframe", df) The printout is a summary of the DataFrame. It is too long to be displayed entirely, so we will just grab the last few lines: 199 21732.0 200 11696.0 201 13228.0 [202 rows x 9 columns] The DataFrame has an attribute that holds its shape as a tuple, similar to ndarray. Query the number of rows of a DataFrame as follows: print("Shape", df.shape) print("Length", len(df)) The values we obtain comply with the printout of the preceding step: Shape (202, 9) Length 202 Check the column header and data types with the other attributes: print("Column Headers", df.columns) print("Data types", df.dtypes) We receive the column headers in a special data structure: Column Headers Index([u'Country', u'CountryID', u'Continent', u'Adolescent fertility rate (%)', u'Adult literacy rate (%)', u'Gross national income per capita (PPP international $)', u'Net primary school enrolment ratio female (%)', u'Net primary school enrolment ratio male (%)', u'Population (in thousands) total'], dtype='object') The data types are printed as follows: 4. The pandas DataFrame has an index, which is like the primary key of relational database tables. We can either specify the index or have pandas create it automatically. The index can be accessed with a corresponding property, as follows: Print("Index", df.index) An index helps us search for items quickly, just like the index in this book. In our case, the index is a wrapper around an array starting at 0, with an increment of one for each row: Sometimes, we wish to iterate over the underlying data of a DataFrame. Iterating over column values can be inefficient if we utilize the pandas iterators. It's much better to extract the underlying NumPy arrays and work with those. The pandas DataFrame has an attribute that can aid with this as well: print("Values", df.values) Please note that some values are designated nan in the output, for 'not a number'. These values come from empty fields in the input datafile: The preceding code is available in Python Notebook ch-03.ipynb, available in the code bundle of this book. Querying data in pandas Since a pandas DataFrame is structured in a similar way to a relational database, we can view operations that read data from a DataFrame as a query. In this example, we will retrieve the annual sunspot data from Quandl. We can either use the Quandl API or download the data manually as a CSV file from http://www.quandl.com/SIDC/SUNSPOTS_A-Sunspot-Numbers-Annual. If you want to install the API, you can do so by downloading installers from https://pypi.python.org/pypi/Quandl or by running the following command: $ pip3 install Quandl Using the API is free, but is limited to 50 API calls per day. If you require more API calls, you will have to request an authentication key. The code in this tutorial is not using a key. It should be simple to change the code to either use a key or read a downloaded CSV file. 
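If you prefer to skip the Quandl API entirely, something along the following lines should work with a manually downloaded copy of the annual sunspot data. The file name and column names are assumptions about the export format, so adjust them to whatever your download actually contains:

import pandas as pd

# Placeholder file name for the downloaded annual sunspot numbers
sunspots = pd.read_csv('SIDC-SUNSPOTS_A.csv',
                       index_col='Year',        # use the date column as the index
                       parse_dates=['Year'])
print(sunspots.head(2))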
If you have difficulties, search through the Python docs at https://docs.python.org/2/. Without further preamble, let's take a look at how to query data in a pandas DataFrame: As a first step, we obviously have to download the data. After importing the Quandl API, get the data as follows: import quandl # Data from http://www.quandl.com/SIDC/SUNSPOTS_A-Sunspot-Numbers-Annual # PyPi url https://pypi.python.org/pypi/Quandl sunspots = quandl.get("SIDC/SUNSPOTS_A") The head() and tail() methods have a purpose similar to that of the Unix commands with the same name. Select the first n and last n records of a DataFrame, where n is an integer parameter: print("Head 2", sunspots.head(2) ) print("Tail 2", sunspots.tail(2)) This gives us the first two and last two rows of the sunspot data (for the sake of brevity we have not shown all the columns here; your output will have all the columns from the dataset): Head 2          Number Year 1700-12-31      5 1701-12-31     11 [2 rows x 1 columns] Tail 2          Number Year 2012-12-31 57.7 2013-12-31 64.9 [2 rows x 1 columns] Please note that we only have one column holding the number of sunspots per year. The dates are a part of the DataFrame index. The following is the query for the last value using the last date: last_date = sunspots.index[-1] print("Last value", sunspots.loc[last_date]) You can check the following output with the result from the previous step: Last value Number        64.9 Name: 2013-12-31 00:00:00, dtype: float64 Query the date with date strings in the YYYYMMDD format as follows: print("Values slice by date:n", sunspots["20020101": "20131231"]) This gives the records from 2002 through to 2013: Values slice by date                             Number Year 2002-12-31     104.0 [TRUNCATED] 2013-12-31       64.9 [12 rows x 1 columns] A list of indices can be used to query as well: print("Slice from a list of indices:n", sunspots.iloc[[2, 4, -4, -2]]) The preceding code selects the following rows: Slice from a list of indices                              Number Year 1702-12-31       16.0 1704-12-31       36.0 2010-12-31       16.0 2012-12-31       57.7 [4 rows x 1 columns] To select scalar values, we have two options. The second option given here should be faster. Two integers are required, the first for the row and the second for the column: print("Scalar with Iloc:", sunspots.iloc[0, 0]) print("Scalar with iat", sunspots.iat[1, 0]) This gives us the first and second values of the dataset as scalars: Scalar with Iloc 5.0 Scalar with iat 11.0 Querying with Booleans works much like the Where clause of SQL. The following code queries for values larger than the arithmetic mean. Note that there is a difference between when we perform the query on the whole DataFrame and when we perform it on a single column: print("Boolean selection", sunspots[sunspots > sunspots.mean()]) print("Boolean selection with column label:n", sunspots[sunspots['Number of Observations'] > sunspots['Number of Observations'].mean()]) The notable difference is that the first query yields all the rows, with some rows not conforming to the condition that has a value of NaN. The second query returns only the rows where the value is larger than the mean: Boolean selection                             Number Year 1700-12-31          NaN [TRUNCATED] 1759-12-31       54.0 ... [314 rows x 1 columns] Boolean selection with column label                              Number Year 1705-12-31       58.0 [TRUNCATED] 1870-12-31     139.1 ... 
[127 rows x 1 columns]

The preceding example code is in the ch_03.ipynb file of this book's code bundle.

Data aggregation with pandas DataFrames

Data aggregation is a term used in the field of relational databases. In a database query, we can group data by the value in a column or columns. We can then perform various operations on each of these groups. The pandas DataFrame has similar capabilities. We will generate data held in a Python dict and then use this data to create a pandas DataFrame. We will then practice the pandas aggregation features: Seed the NumPy random generator to make sure that the generated data will not differ between repeated program runs. The data will have four columns: Weather (a string), Food (also a string), Price (a random float), and Number (a random integer between one and nine). The use case is that we have the results of some sort of consumer-purchase research, combined with weather and market pricing, where we calculate the average of prices and keep track of the sample size and parameters:

import pandas as pd
from numpy.random import seed
from numpy.random import rand
from numpy.random import randint
import numpy as np

seed(42)
df = pd.DataFrame({'Weather': ['cold', 'hot', 'cold', 'hot', 'cold', 'hot', 'cold'],
                   'Food': ['soup', 'soup', 'icecream', 'chocolate', 'icecream', 'icecream', 'soup'],
                   'Price': 10 * rand(7),
                   'Number': randint(1, 9, size=(7,))})
print(df)

You should get an output similar to the following: Please note that the column labels come from the lexically ordered keys of the Python dict. Lexical or lexicographical order is based on the alphabetic order of characters in a string. Group the data by the Weather column and then iterate through the groups as follows:

weather_group = df.groupby('Weather')
i = 0
for name, group in weather_group:
    i = i + 1
    print("Group", i, name)
    print(group)

We have two types of weather, hot and cold, so we get two groups: The weather_group variable is a special pandas object that we get as a result of the groupby() method. This object has aggregation methods, which are demonstrated as follows:

print("Weather group first\n", weather_group.first())
print("Weather group last\n", weather_group.last())
print("Weather group mean\n", weather_group.mean())

The preceding code snippet prints the first row, last row, and mean of each group: Just as in a database query, we are allowed to group on multiple columns. The groups attribute will then tell us the groups that are formed, as well as the rows in each group:

wf_group = df.groupby(['Weather', 'Food'])
print("WF Groups", wf_group.groups)

For each possible combination of weather and food values, a new group is created. The membership of each row is indicated by their index values as follows:

WF Groups {('hot', 'chocolate'): [3], ('cold', 'icecream'): [2, 4], ('hot', 'icecream'): [5], ('hot', 'soup'): [1], ('cold', 'soup'): [0, 6]}

Apply a list of NumPy functions on groups with the agg() method:

print("WF Aggregated\n", wf_group.agg([np.mean, np.median]))

Obviously, we could apply even more functions, but it would look messier than the following output:

Concatenating and appending DataFrames

The pandas DataFrame allows operations that are similar to the inner and outer joins of database tables. We can append and concatenate rows as well. To practice appending and concatenating of rows, we will reuse the DataFrame from the previous section.
Let's select the first three rows:

print("df :3\n", df[:3])

Check that these are indeed the first three rows:

df :3
        Food  Number     Price Weather
0       soup       8  3.745401    cold
1       soup       5  9.507143     hot
2   icecream       4  7.319939    cold

The concat() function concatenates DataFrames. For example, we can concatenate a DataFrame that consists of three rows to the rest of the rows, in order to recreate the original DataFrame:

print("Concat Back together\n", pd.concat([df[:3], df[3:]]))

The concatenation output appears as follows:

Concat Back together
         Food  Number     Price Weather
0        soup       8  3.745401    cold
1        soup       5  9.507143     hot
2    icecream       4  7.319939    cold
3   chocolate       8  5.986585     hot
4    icecream       8  1.560186    cold
5    icecream       3  1.559945     hot
6        soup       6  0.580836    cold

[7 rows x 4 columns]

To append rows, use the append() function:

print("Appending rows\n", df[:3].append(df[5:]))

The result is a DataFrame with the first three rows of the original DataFrame and the last two rows appended to it:

Appending rows
        Food  Number     Price Weather
0       soup       8  3.745401    cold
1       soup       5  9.507143     hot
2   icecream       4  7.319939    cold
5   icecream       3  1.559945     hot
6       soup       6  0.580836    cold

[5 rows x 4 columns]

Joining DataFrames

To demonstrate joining, we will use two CSV files, dest.csv and tips.csv. The use case behind it is that we are running a taxi company. Every time a passenger is dropped off at his or her destination, we add a row to the dest.csv file with the employee number of the driver and the destination:

EmpNr,Dest
5,The Hague
3,Amsterdam
9,Rotterdam

Sometimes drivers get a tip, so we want that registered in the tips.csv file (if this doesn't seem realistic, please feel free to come up with your own story):

EmpNr,Amount
5,10
9,5
7,2.5

Database-like joins in pandas can be done with either the merge() function or the join() DataFrame method. The join() method joins onto indices by default, which might not be what you want. In SQL, a relational database query language, we have the inner join, left outer join, right outer join, and full outer join. An inner join selects rows from two tables if and only if values match, for columns specified in the join condition. Outer joins do not require a match, and can potentially return more rows. More information on joins can be found at http://en.wikipedia.org/wiki/Join_%28SQL%29.
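The excerpt assumes that dests and tips have already been loaded from those two files. Here is a small sketch of constructing equivalent DataFrames inline, so the merge() and join() calls below can be tried without creating the CSV files; the literal values simply mirror the listings above:

import pandas as pd

dests = pd.DataFrame({'EmpNr': [5, 3, 9],
                      'Dest': ['The Hague', 'Amsterdam', 'Rotterdam']})
tips = pd.DataFrame({'EmpNr': [5, 9, 7],
                     'Amount': [10, 5, 2.5]})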
All these join types are supported by pandas, but we will only take a look at inner joins and full outer joins: A join on the employee number with the merge() function is performed as follows: print("Merge() on keyn", pd.merge(dests, tips, on='EmpNr')) This gives an inner join as the outcome: Merge() on key EmpNr            Dest           Amount 0 5 The Hague 10 1 9 Rotterdam 5 [2 rows x 3 columns] Joining with the join() method requires providing suffixes for the left and right operands: print("Dests join() tipsn", dests.join(tips, lsuffix='Dest', rsuffix='Tips')) This method call joins index values so that the result is different from an SQL inner join: Dests join() tips EmpNrDest Dest EmpNrTips Amount 0 5 The Hague 5 10.0 1 3 Amsterdam 9 5.0 2 9 Rotterdam 7 2.5 [3 rows x 4 columns] An even more explicit way to execute an inner join with merge() is as follows: print("Inner join with merge()n", pd.merge(dests, tips, how='inner')) The output is as follows: Inner join with merge() EmpNr            Dest           Amount 0 5 The Hague 10 1 9 Rotterdam 5 [2 rows x 3 columns] To make this a full outer join requires only a small change: print("Outer joinn", pd.merge(dests, tips, how='outer')) The outer join adds rows with NaN values: Outer join EmpNr            Dest            Amount 0 5 The Hague 10.0 1 3 Amsterdam NaN 2 9 Rotterdam 5.0 3 7 NaN            2.5 [4 rows x 3 columns] In a relational database query, these values would have been set to NULL. The demo code is in the ch-03.ipynb file of this book's code bundle. We learnt how to perform various data manipulation techniques such as aggregating, concatenating, appending, cleaning, and handling missing values, with pandas. If you found this post useful, check out the book Python Data Analysis - Second Edition to learn advanced topics such as signal processing, textual data analysis, machine learning, and more.  
article-image-docker-container-management-with-saltstack
Nicole Thomas
25 Sep 2014
8 min read
Save for later

Docker Container Management at Scale with SaltStack

Nicole Thomas
25 Sep 2014
8 min read
Every once in a while a technology comes along that changes the way work is done in a data center. It happened with virtualization and we are seeing it with various cloud computing technologies and concepts. But most recently, the rise in popularity of container technology has given us reason to rethink interdependencies within software stacks and how we run applications in our infrastructures. Despite all of the enthusiasm around the potential of containers, they are still just another thing in the data center that needs to be centrally controlled and managed...often at massive scale. This article will provide an introduction to how SaltStack can be used to manage all of this, including container technology, at web scale. SaltStack Systems Management Software The SaltStack systems management software is built for remote execution, orchestration, and configuration management and is known for being fast, scalable, and flexible. Salt is easy to set up and can easily communicate asynchronously with tens of thousands of servers in a matter of seconds. Salt was originally built as a remote execution engine relying on a secure, bi-directional communication system utilizing a Salt Master daemon used to control Salt Minion daemons, where the minions receive commands from the remote master. Salt’s configuration management capabilities are called the state system, or Salt States, and are built on SLS formulas. SLS files are data structures based on dictionaries, lists, strings, and numbers that are used to enforce the state that a system should be in, also known as configuration management. SLS files are easy to write, simple to implement, and are typically written in YAML. State file execution occurs on the Salt Minion. Therefore, once any states files that the infrastructure requires have been written, it is possible to enforce state on tens of thousands of hosts simultaneously. Additionally, each minion returns its status to the Salt Master. Docker containers Docker is an agile runtime and packaging platform specializing in enabling applications to be quickly built, shipped, and run in any environment. Docker containers allow developers to easily compartmentalize their applications so that all of the program’s needs are installed and configured in a single container. This container can then be dropped into clouds, data centers, laptops, virtual machines (VMs), or infrastructures where the app can execute, unaltered, without any cross-environment issues. Building Docker containers is fast, simple, and inexpensive. Developers can create a container, change it, break it, fix it, throw it away, or save it much like a VM. However, Docker containers are much more nimble, as the containers only possess the application and its dependencies. VMs, along with the application and its dependencies, also install a guest operating system, which requires much more overhead than may be necessary. While being able to slice deployments into confined components is a good idea for application portability and resource allocation, it also means there are many more pieces that need to be managed. As the number of containers scales, without proper management, inefficiencies abound. This scenario, known as container sprawl, is where SaltStack can help and the combination of SaltStack and Docker quickly proves its value. SaltStack + Docker When we combine SaltStack’s powerful configuration management armory with Docker’s portable and compact containerization tools, we get the best of both worlds. 
SaltStack has added support for users to manage Docker containers on any infrastructure. The following demonstration illustrates how SaltStack can be used to manage Docker containers using Salt States. We are going to use a Salt Master to control a minimum of two Salt Minions. On one minion we will install HAProxy. On the other minion, we will install Docker and then spin up two containers on that minion. Each container hosts a PHP app with Apache that displays information about each container’s IP address and exposed port. The HAProxy minion will automatically update to pull the IP address and port number from each of the Docker containers housed on the Docker minion. You can certainly set up more minions for this experiment, where additional minions would be additional Docker container minions, to see an implementation of containers at scale, but it is not necessary to see what is possible. First, we have a few installation steps to take care of. SaltStack’s installation documentation provides a thorough overview of how to get Salt up and running, depending on your operating system. Follow the instructions on Salt’s Installation Guide to install and configure your Salt Master and two Salt Minions. Go through the process of accepting the minion keys on the Salt Master and, to make sure the minions are connected, run: salt ‘*’ test.ping Note: Much of the Docker functionality used in this demonstration is brand new. You must install the latest supported version of Salt, which is 2014.1.6. For reference, I have my master and minions all running on Ubuntu 14.04. My Docker minion is named “docker1” and my HAProxy minion is named “haproxy”. Now that you have a Salt Master and two Salt Minions configured and communicating with each other, it’s time to get to work. Dave Boucha, a Senior Engineer at SaltStack, has already set up the necessary configuration files for both our haproxy and docker1 minions and we will be using those to get started. First, clone the Docker files in Dave’s GitHub repository, dock-apache, and copy the dock_apache and docker> directories into the /srv/salt directory on your Salt Master (you may need to create the /srv/salt directory): cp -r path/to/download/dock_apache /srv/salt cp -r path/to/download/docker /srv/salt The init.sls file inside the docker directory is a Salt State that installs Docker and all of its dependencies on your minion. The docker.pgp file is a public key that is used during the Docker installation. To fire off all of these events, run the following command: salt ‘docker1’ state.sls docker Once Docker has successfully installed on your minion, we need to set up our two Docker containers. The init.sls file in the dock_apache directory is a Salt State that specifies the image to pull from Docker’s public container repository, installs the two containers, and assigns the ports that each container should expose: 8000 for the first container and 8080 for the second container. Now, let’s run the state: salt ‘docker1’ state.sls dock_apache Let’s check and see if it worked. Get the IP address of the minion by running: salt ‘docker1’ network.ip_addrs Copy and paste the IP address into a browser and add one of the ports to the end. Try them both (IP:8000 and IP:8080) to see each container up and running! At this point, you should have two Docker containers installed on your “docker1” minion (or any other docker minions you may have created). Now we need to set up the “haproxy” minion. 
Download the files from Dave’s GitHub repository, haproxy-docker, and place them all in your Salt Master’s /srv/salt/haproxy directory:

cp -r path/to/download/haproxy /srv/salt

First, we need to retrieve the data about our Docker containers from the Docker minion(s), such as the container IP address and port numbers, by running the haproxy.config state:

salt 'haproxy' state.sls haproxy.config

By running haproxy.config, we are setting up a new configuration with the Docker minion data. The configuration file also runs the haproxy state (init.sls). The init.sls file contains a haproxy state that ensures that HAProxy is installed on the minion, sets up the configuration for HAProxy, and then runs HAProxy. You should now be able to look at the HAProxy statistics by entering the haproxy minion’s IP address in your browser with /haproxy?stats on the end of the URL. When the authentication pop-up appears, the username and password are both saltstack. While this is a simple example of running HAProxy to balance the load of only two Docker containers, it demonstrates how Salt can be used to manage tens of thousands of containers as your infrastructure scales. To see this same demonstration with a haproxy minion in action with 99 Docker minions, each with two Docker containers installed, check out SaltStack’s Salt Air 20 tutorial by Dave Boucha on YouTube.

About the author

Nicole Thomas is a QA Engineer at SaltStack, Inc. Before coming to SaltStack, she wore many hats, from web and Android developer to contributing editor to working in Environmental Education. Nicole recently graduated Summa Cum Laude from Westminster College with a degree in Computer Science. Nicole also has a degree in Environmental Studies from the University of Utah.

article-image-searching-data-using-phpmyadmin-and-mysql
Packt
09 Oct 2009
4 min read
Save for later

Searching Data using phpMyAdmin and MySQL

Packt
09 Oct 2009
4 min read
In this article by Marc Delisle, we present mechanisms that can be used to find the data we are looking for instead of just browsing tables page-by-page and sorting them. This article covers single-table and whole database searches. Single-Table Searches This section describes the Search sub-page where single-table search is available. Daily Usage of phpMyAdmin The main use for the tool for some users is with the Search mode, for finding and updating data. For this, the phpMyAdmin team has made it possible to define which sub-page is the starting page in Table view, with the $cfg['DefaultTabTable'] parameter. Setting it to 'tbl_select.php' defines the default sub-page to search. With this mode, application developers can look for data in ways not expected by the interface they are building, adjusting and sometimes repairing data. Entering the Search Sub-Page The Search sub-page can be accessed by clicking the Search link in the Table view. This has been done here for the book table: Selection of Display Fields The first panel facilitates a selection of the fields to be displayed in the results: All fields are selected by default, but we can control-click other fields to make the necessary selections. Mac users would use command-click to select / unselect the fields. Here are the fields of interest to us in this example: We can also specify the number of rows per page in the textbox just next to the field selection. The Add search conditions box will be explained in the Applying a WHERE Clause section later in this article. Ordering the Results The Display order dialog permits to specify an initial sorting order for the results to come. In this dialog, a drop-down menu contains all the table's columns; it's up to us to select the one on which we want to sort. By default, the sorting will be in Ascending order, but a choice of Descending order is available. It should be noted that on the results page, we can also change the sort order. Search Criteria by Field: Query by Example The main usage of the Search panel is to enter criteria for some fields so as to retrieve only the data in which we are interested. This is called Query by example because we give an example of what we are looking for. Our first retrieval will concern finding the book with ISBN 1-234567-89-0. We simply enter this value in the isbn box and choose the = operator: Clicking on Go gives the results shown in the following screenshot. The four fields displayed are those selected in the Select fields dialog: This is a standard results page. If the results ran in pages, we could navigate through them, and edit and delete data for the subset we chose during the process. Another feature of phpMyAdmin is that the fields used as the criteria are highlighted by changing the border color of the columns to better reflect their importance on the results page. It isn't necessary to specify that the isbn column be displayed. We could have selected only the title column for display and selected the isbn column as a criterion. Print View We see the Print view and Print view (with full texts) links on the results page. These links produce a more formal report of the results (without the navigation interface) directly to the printer. In our case, using Print view would produce the following: This report contains information about the server, database, time of generation, version of phpMyAdmin, version of MySQL, and SQL query used. The other link, Print view (with full texts) would print the contents of TEXT fields in its entirety.