
How-To Tutorials

6719 Articles
Creators of Python, Java, C#, and Perl discuss the evolution and future of programming language design at PuPPy

Bhagyashree R
08 Apr 2019
11 min read
At the first annual charity event conducted by Puget Sound Programming Python (PuPPy) last Tuesday, four legendary language creators came together to discuss the past and future of language design. This event was organized to raise funds for Computer Science for All (CSforALL), an organization which aims to make CS an integral part of the educational experience. Among the panelists were the creators of some of the most popular programming languages: Guido van Rossum, the creator of Python James Gosling, the founder, and lead designer behind the Java programming language Anders Hejlsberg, the original author of Turbo Pascal who has also worked on the development of C# and TypeScript Larry Wall, the creator of Perl The discussion was moderated by Carol Willing, who is currently a Steering Council member and developer for Project Jupyter. She is also a member of the inaugural Python Steering Council, a Python Software Foundation Fellow and former Director. Key principles of language design The first question thrown at the panelists was, “What are the principles of language design?” Guido van Rossum believes: [box type="shadow" align="" class="" width=""]Designing a programming language is very similar to the way JK Rowling writes her books, the Harry Potter series.[/box] When asked how he says JK Rowling is a genius in the way that some details that she mentioned in her first Harry Potter book ended up playing an important plot point in part six and seven. Explaining how this relates to language design he adds, “In language design often that's exactly how things go”. When designing a language we start with committing to certain details like the keywords we want to use, the style of coding we want to follow, etc. But, whatever we decide on we are stuck with them and in the future, we need to find new ways to use those details, just like Rowling. “The craft of designing a language is, in one hand, picking your initial set of choices so that's there are a lot of possible continuations of the story. The other half of the art of language design is going back to your story and inventing creative ways of continuing it in a way that you had not thought of,” he adds. When James Gosling was asked how Java came into existence and what were the design principles he abided by, he simply said, “it didn’t come out of like a personal passion project or something. It was actually from trying to build a prototype.” James Gosling and his team were working on a project that involved understanding the domain of embedded systems. For this, they spoke to a lot of developers who built software for embedded systems to know how their process works. This project had about a dozen people on it and Gosling was responsible for making things much easier from a programming language point of view. “It started out as kind of doing better C and then it got out of control that the rest of the project really ended up just providing the context”, he adds. In the end, the only thing out of that project survived was “Java”. It was basically designed to solve the problems of people who are living outside of data centers, people who are getting shredded by problems with networking, security, and reliability. Larry Wall calls himself a “linguist” rather than a computer scientist. He wanted to create a language that was more like a natural language. 
Explaining through an example, he said, “Instead of putting people in a university campus and deciding where they go we're just gonna see where people want to walk and then put shortcuts in all those places.” A basic principle behind creating Perl was to provide APIs to everything. It was aimed to be both a good text processing language linguistically but also a glue language. Wall further shares that in the 90s the language was stabilizing, but it did have some issues. So, in the year 2000, the Perl team basically decided to break everything and came up with a whole new set of design principles. And, based on these principles Perl was redesigned into Perl 6. Some of these principles were picking the right default, conserve your brackets because even Unicode does not have enough brackets, don't reinvent object orientation poorly, etc. He adds, [box type="shadow" align="" class="" width=""]“A great deal of the redesign was to say okay what is the right peg to hang everything on? Is it object-oriented? Is it something in the lexical scope or in the larger scope? What does the right peg to hang each piece of information on and if we don't have that peg how do we create it?”[/box] Anders Hejlsberg shares that he follows a common principle in all the languages he has worked on and that is “there's only one way to do a particular thing.” He believes that if a developer is provided with four different ways he may end up choosing the wrong path and realize it later in the development. According to Hejlsberg, this is why often developers end up creating something called “simplexity” which means taking something complex and wrapping a single wrapper on top it so that the complexity goes away. Similar to the views of Guido van Rossum, he further adds that any decision that you make when designing a language you have to live with it. When designing a language you need to be very careful about reasoning over what “not” to introduce in the language. Often, people will come to you with their suggestions for updates, but you cannot really change the nature of the programming language. Though you cannot really change the basic nature of a language, you can definitely extend it through extensions. You essentially have two options, either stay true to the nature of the language or you develop a new one. The type system of programming languages Guido van Rossum, when asked about the typing approach in Python, shared how it was when Python was first introduced. Earlier, int was not a class it was actually a little conversion function. If you wanted to convert a string to an integer you can do that with a built-in function. Later on, Guido realized that this was a mistake. “We had a bunch of those functions and we realized that we had made a mistake, we have given users classes that were different from the built-in object types.” That's where the Python team decided to reinvent the whole approach to types in Python and did a bunch of cleanups. So, they changed the function int into a designator for the class int. Now, calling the class means constructing an instance of the class. James Gosling shared that his focus has always been performance and one factor for improving performance is the type system. It is really useful for things like building optimizing compilers and doing ahead of time correctness checking. Having the type system also helps in cases where you are targeting small footprint devices. 
“To do that kind of compaction you need every kind of hope that it gives you, every last drop of information and, the earlier you know it, the better job you do,” he adds. Anders Hejlsberg looks at type systems as a tooling feature. Developers love their IDEs, they are accustomed to things like statement completion, refactoring, and code navigation. These features are enabled by the semantic knowledge of your code and this semantic knowledge is provided by a compiler with a type system. Hejlsberg believes that adding types can dramatically increase the productivity of developers, which is a counterintuitive thought. “We think that dynamic languages were easier to approach because you've got rid of the types which was a bother all the time. It turns out that you can actually be more productive by adding types if you do it in a non-intrusive manner and if you work hard on doing good type inference and so forth,” he adds. Talking about the type system in Perl, Wall started off by saying that Perl 5 and Perl 6 had very different type systems. In Perl 5, everything was treated as a string even if it is a number or a floating point. The team wanted to keep this feature in Perl 6 as part of the redesign, but they realized that “it's fine if the new user is confused about the interchangeability but it's not so good if the computer is confused about things.” For Perl 6, Wall and his team envisioned to make it a better object-oriented as well as a better functional programming language. To achieve this goal, it is important to have a very sound type system of a sound meta object model underneath. And, you also need to take the slogans like “everything is an object, everything is a closure” very seriously. What makes a programming language maintainable Guido van Rossum believes that to make a programming language maintainable it is important to hit the right balance between the flexible and disciplined approach. While dynamic typing is great for small programs, large programs require a much-disciplined approach. And, it is better if the language itself enables that discipline rather than giving you the full freedom of doing whatever you want. This is why Guido is planning to add a very similar technology like TypeScript to Python. He adds: [box type="shadow" align="" class="" width=""]“TypeScript is actually incredibly useful and so we're adding a very similar idea to Python. We are adding it in a slightly different way because we have a different context."[/box] Along with type system, refactoring engines can also prove to be very helpful. It will make it easier to perform large scale refactorings like millions of lines of code at once. Often, people do not rename methods because it is really hard to go over a piece of code and rename exactly this right variable. If you are provided with a refactoring engine, you just need to press a couple of buttons, type in the new name, and it will be refactored in maybe just 30 seconds. The origin of the TypeScript project was these enormous JavaScript codebases. As these codebases became bigger and bigger, it became quite difficult to maintain them. These codebases basically became “write-only code” shared Anders Hejlsberg. He adds that this is why we need a semantic understanding of the code, which makes refactoring much easier. “This semantic understanding requires a type system to be in place and once you start adding that you add documentation to the code,” added Hejlsberg. 
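To ground Guido's point about bringing a TypeScript-like idea to Python: the optional type annotations Python later standardized (PEP 484 and the typing module) are exactly this kind of non-intrusive layer, ignored at run time and checked by external tools such as mypy. A small illustrative sketch (the function here is made up for demonstration):

from typing import List

def average(scores: List[float]) -> float:
    return sum(scores) / len(scores)

print(average([8.5, 9.0, 7.5]))   # annotations cost nothing at run time
# average("oops")                 # ...but a static checker such as mypy
                                  # reports the bad argument before the code runs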
Wall agreed, noting that "good lexical scoping helps with refactoring".

The future of programming language design

When asked about the future of language design, James Gosling said that a badly underexplored area in programming is writing code for GPUs. He pointed out that we still do not have a programming language that works smoothly with GPUs, and that much work remains to be done there. Anders Hejlsberg observed that programming languages do not evolve at the same speed as hardware or the other technologies around them; in terms of evolution, they are more like mathematics and the human brain. He said, "We're still programming in languages that were invented 50 years ago; all of the principles of functional programming were thought of more than 50 years ago." He does believe, however, that instead of segregating into separate categories such as object-oriented or functional, languages are becoming multi-paradigm: "Languages are becoming more multi-paradigm. I think it is wrong to talk about, oh, I only like object-oriented programming, or imperative programming, or functional programming." What matters now is staying aware of recent research, new thinking, and new paradigms, and incorporating them into our programming style tastefully. Watch the full panel discussion hosted by PuPPy for more detail.

Data Visualization with ggplot2

Janu Verma
14 Nov 2016
6 min read
In this post, we will learn about data visualization using ggplot2. ggplot2 is an R package for data exploration and visualization. It produces amazing graphics that are easy to interpret. The main use of ggplot2 is in exploratory analysis, and it is an important element of a data scientist’s toolkit. The ease with which complex graphs can be plotted using ggplot2 is probably its most attractive feature. It also allows you to slice and dice data in many different ways. ggplot2 is an implementation of A Layered Grammar of Graphics by Hadley Wickham, who is certainly the strongest R programmer out there. Installation Installing packages in R is very easy. Just type the following command on the R prompt. install.packages("ggplot2") Import the package in your R code. library(ggplot2) qplot We will start with the function qplot(). qplot is the simplest plotting function in ggplot2. It is very similar to the generic plot() function in basic R. We will learn how to plot basic statistical and exploratory plots using qplot. We will use the Iris dataset that comes with the base R package (and with every other data mining package that I know of). The Iris data consists of observations of phenotypic traits of three species of iris. In R, the iris data is provided as a data frame of 150 rows and 5 columns. The head command will print first 6 rows of the data. head(iris) The general syntax for the qplot function is: qplot(x,y, data=data.frame) We will plot sepal length against petal length: qplot(Sepal.Length, Petal.Length, data = iris) We can color each point by adding a color argument. We will color each point by what species it belongs to. qplot(Sepal.Length, Petal.Length, data = iris, color = Species) An observant reader would notice that this coloring scheme provides a way to visualize clustering. Also, we can change the size of each point by adding a size argument. Let the size correspond to the petal width. qplot(Sepal.Length, Petal.Length, data = iris, color = Species, size = Petal.Width) Thus we have a visualization for four-dimensional data. The alpha argument can be used to increase the transparency of the circles. qplot(Sepal.Length, Petal.Length, data = iris, color = Species, size = Petal.Width, alpha = I(0.7))   This reduces the over-plotting of the data. Label the graph using xlab, ylab, and main arguments. qplot(Sepal.Length, Petal.Length, data = iris, color = Species, xlab = "Sepal Length", ylab = "Petal Length", main = "Sepal vs. Petal Length in Fisher's Iris data")   All the above graphs were scatterplots. We can use the geom argument to draw other types of graphs. Histogram qplot(Sepal.Length, data = iris, geom="bar")   Line chart qplot(Sepal.Length, Petal.Length, data = iris, geom = "line", color = Species) ggplot Now we'll move to the ggplot() function, which has a much broader range of graphing techniques. We'll start with the basic plots similar to what we did with qplot(). First things first, load the library: library(ggplot2) As before, we will use the iris dataset. For ggplot(), we generate aesthetic mappings that describe how variables in the data are mapped to visual properties. This is specified by the aes function. Scatterplot ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) + geom_point() This is exactly what we got for qplot(). The syntax is a bit unintuitive, but is very consistent. The basic structure is: ggplot(data.frame, aes(x=, y=, ...)) + geom_*(.) + .... Add Colors in the scatterplot. 
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) + geom_point() The geom argument We can other geoms to create different types of graphs, for example, linechart: ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color=Species)) + geom_line() + ggtitle("Plot of sepal length vs. petal length") Histogram ggplot(iris, aes(x = Sepal.Length)) + geom_histogram(binwidth = .2) Histogram with color. Use the fill argument ggplot(iris, aes(x = Sepal.Length, fill=Species)) + geom_histogram(binwidth = .2)   The position argument can fine tune positioning to achieve useful effects; for example, we can adjust the position by dodging overlaps to the side. ggplot(iris, aes(x = Sepal.Length, fill = Species)) + geom_histogram(binwidth = .2, position = "dodge")   Labeling ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color=Species)) + geom_point() + ggtitle("Plot of sepal length vs. petal length") size and alpha arguments ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color=Species, size=Petal.Width)) + geom_point(alpha=0.7) + ggtitle("Plot of sepal length vs. petal length") We can also transform variables directly in the ggplot call. ggplot(iris, aes(x = log(Sepal.Length), y = Petal.Length/Petal.Width, color=Species)) + geom_point()   ggplot allows slicing of data. A way to split up the way we look at data is with the facets argument. These break the plot into multiple plots. ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color=Species)) + geom_point() + facet_wrap(~Species) + ggtitle("Plot of sepal length vs. petal length")   Themes We can use a whole range of themes for ggplot using the R package ggthemes. install.packages('ggthemes', dependencies = TRUE) library(ggthemes) Essentially you add the theme_*() argument to the ggplot call. The Economist theme: For someone like me who reads The Economist regularly and might work there one day (ask for my resume if you know someone!!!!), it would be fun/useful to try to reproduce some of the graphs they publish. We may not have the data available, but we have the theme. ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) + geom_point() + theme_economist() Five Thirty Eight theme: Nate Silver is probably the most famous statistician (data scientist). His company, Five Thirty Eight, is a must for any data scientist. Good folks at ggthemes have created a 538 theme as well. ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) + geom_point() + theme_fivethirtyeight()   About the author Janu Verma is a researcher in the IBM T.J. Watson Research Center, New York. His research interests are mathematics, machine learning, information visualization, computational biology and healthcare analytics. He has held research positions at Cornell University, Kansas State University, Tata Institute of Fundamental Research, Indian Institute of Science, and Indian Statistical Institute.  He has written papers for IEEE Vis, KDD, International Conference on HealthCare Informatics, Computer Graphics and Applications, Nature Genetics, IEEE Sensors Journals, and so on.  His current focus is on the development of visual analytics systems for prediction and understanding. He advises start-ups and companies on data science and machine learning in the Delhi-NCR area.
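A closing note for Python users: the layered grammar of graphics demonstrated in this article is also available in Python through the third-party plotnine package, which mirrors ggplot2's API closely. A rough sketch reproducing the colored iris scatterplot, assuming plotnine, pandas, and scikit-learn are installed:

import pandas as pd
from sklearn.datasets import load_iris
from plotnine import ggplot, aes, geom_point, ggtitle

iris = load_iris()
df = pd.DataFrame(iris.data, columns=["sepal_length", "sepal_width",
                                      "petal_length", "petal_width"])
df["species"] = [iris.target_names[i] for i in iris.target]

plot = (ggplot(df, aes(x="sepal_length", y="petal_length", color="species"))
        + geom_point()
        + ggtitle("Sepal vs. petal length in Fisher's iris data"))
plot.save("iris_scatter.png")   # or evaluate `plot` in a notebook to display it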

Getting Started with Blender’s Particle System

Packt
19 Jun 2010
10 min read
We’ll cover a general introduction about Blender’s particle system; how to use it, when to use it, how useful it is on both small scale and large scale projects, and an overview of its implications. In this article, I’ll also discuss how to use the particle system to instance objects These are just but very few examples of what can really be achieved with the particle system, there’s a vast array of possibilities out there that can be discovered. But I’m hoping with these examples and definitions, you’ll find your way through the wonderful world of visual effects with the particle system. If you remember watching Blender Foundation’s Big Buck Bunny, it might not seem as a surprise to you that almost the entire film is filled up with so much particle goodness. Thanks to that, great improvements have been made to Blender’s particle system. As compared to my previous articles, this one uses Blender 2.5, the latest version of Blender which is currently in Alpha stage right now but is fully workable. However, in this article I wouldn’t be telling you what each of the button would do since that would take another article in itself and might deviate from an introductory approach. If you wanted to know more in-depth information about the whole particle system, you can check out the official Blender documentation or the Blender wiki. There had been great improvements on the particle system and now is the right time to start trying it out. So what are you waiting for? Hop on and ride with me in this journey! You can watch the video here. Prerequisites Before we begin with this article, it’s vital to first determine our requirements before proceeding on the actual application. Before going on, we need to have the following: Latest Blender 2.5 (you can grab a copy at http://www.blender.org or http://www.graphicall.org) Adequate hard disk space Decent processor and graphics card What is a Particle System and where can we find it in Blender? A particle system is a technique in Computer Graphics that is used to simulate a visual effect that would otherwise be very difficult and cumbersome if not impossible to do in traditional 3D techniques. These effects include crowd simulation, hair, fur, grass, fire, smoke, fluids, dust, snowflakes, and more. Particle Systems are calculated inside of your simulation program in several ways and one of the most common ones is the Newtonian Physics calculation which regards forces like wind, magnetism, friction, gravity, etc. to generate the particle’s motion along a controllable environment. Properties or parameters of particle systems include: mass, speed, velocity, size, life, color, transparency, softness, damping, and many more. Particle’s behavior along a certain span of time are then saved and written on to your disk. This process is called caching, which enables you to view and edit the particle’s behavior within the timeframe set. And when you’re satisfied with the way your particles act on your simulation space, you can now permanently write these settings to the disk, called baking. Be aware that the longer your particle system’s life is and the greater the number, the more hard disk space it uses, so for testing purposes, it is best to keep the particles number low, then progressively increase them as needed. Particle mass refers the natural weight (in a gravitational field) of a single particle object/point, where higher mass means a heavier particle and a lower mass implies a lighter particle. 
A heavy particle is more resistant to external forces such as wind but more reactive to gravitational force as compared to a lighter particle system which is a direct opposite. Particle speed/velocity refers to how fast or slow the particle points are being thrown or emitted from their source within a given time. They can be displaced and controlled in the x, y, or z directions accordingly. A higher velocity will create a faster shooting particle (as seen in muzzle flares/flash) and a lower one will create slower particle shots. Size of the particles is one of the most important setting when using a particle system as object instances, since this will better control the size of the objects on the emitting plane as compared to manually resizing the original object. With this option, you can have random sizes which will give the scene a more believable and natural look and offcourse, an easier setup as compared to creating numerous objects for this matter. Particle life refers to the lifespan of the particles, for how long they will be existent until they disappear and die from the system. In some cases, having a large particle life is useful but it can have some speed drawbacks on your machine. Imagine emitting one million particles within 5 seconds where only half of it is relevant within the first few seconds. That could mean 500,000 particles cached is a sheer waste and rendering your machine slower. Having smaller particle lives sometimes is also useful especially when you’re creating small scale smoke and fire. But all these really depend on the way you setup your scenes. Play around and you’ll find the best settings that suit your needs. Note though that in Blender, only Mesh objects can be particle emitters (which is simply rational for this purpose). You can modify the size and shape of the mesh object and the particle system reacts accordingly. The direction from which particles are emitted is dictated by the mesh normals, which I will discuss along the way. In Blender 2.5, we can find the Particle System by selecting any mesh object and clicking the Particles Button in the Properties Editor, as seen in the screenshot. Later, we’ll add and name particle systems and go over their settings more closely. Blender 2.5’s Particle System   What are the types of Blender particle system? Currently, there are only two types of particle systems in Blender 2.5, namely Emmiter and Hair.  With few on the list, these two have already proved to be very handy tools in creating robust simulations. Particle System Types   Let’s begin by learning what an Emitter Type is. Aside from the fact that the term emitter is used to define the source of the particles, the emitter type is one different thing on its own. The Emitter, being the default particle type, is one of the most commonly used particle system type. With it you can create dust, smoke, fire, fluids, snowflakes, and the like. Inside Blender, make the default cube a particle emitter with the type set as Emitter. Leave the default settings as they are and press ALT+A in the 3D Viewport to playback the animation and observe the way the particles act in your observable space, that is, the 3D space that your object is in. During the playback process, the particle system is already caching the position of the particle points in your space and writing these on to the disk. Naturally, this is how the emitter type will act on our space, with a slight pulse from emitter then it drops down as an effect of the gravity. 
That pulse is coming from the Normal value which tells how strong the points are spurted out until they get affected by external forces. There’s more to the emitter type than there seems to be, so don’t hold back and keep changing the settings and see where they lead you to. And if you can, please send me what you got, I’d be very pleased to see what fun things you came up with. Emitter Type   Leaving from the default settings we had awhile back, change your current frame to 1 (if it is not yet set as such) then let’s change the type from Emitter to Hair. Instantly, you’ll notice that strands come bursting out of our default mesh. This is the particle hair’s nature, whichever frame you are in now, it will stay in the same frame, regardless of any explicit animation you add in. As compared to the Emitter type, Hair doesn’t need to be cached to see the results, it’s an on-the-fly process of editing particle settings. Another great advantage of the hair type over the emitter type is that it has a dedicated Particle Mode which enables you to edit the strands as though you were actually touching real hair. With the hair type, you can create systems like fur, grass, static particle instances/grouping, and more. Hair Type   Basic and Practical Uses of the Particle System Now let’s on the real business. From here on, I’ll approach the proceeding steps in a practical application manner. First, let’s check some of the default settings and see where they lead us. Fire up a fresh Blender session by opening up Blender or by pressing CTRL+N (when you already have it open from our steps awhile ago). Fresh Blender 2.5 Session   Delete the default cube and add in a Plane object. Adding a Plane Primitive   Position the camera’s view such that the plane is just on top of the viewing range, this will give us more space to see the particles falling. Adjusting the View   Select the Plane Mesh and in the Particle Buttons window, add a new Particle System and leave the default values as they are. In the 3D Viewport, press ALT+A to play and pause the animation or use the play buttons in the Timeline Window. Pick a frame you’re satisfied with, any will do right now. New Particle System   Next, add a new material to the Plane Mesh and the Particle System. Activate the Material Buttons and add a new material, then change the material type from Surface to Halo. Halos is a special material type for particles; they give each point a unique shading system compared to the Surface shaders. With halos, you can control the size of the particles, their transparency, color, brightness, hardness, textures, flares, and other interesting effects. With just the default values, let’s render out our scene and see what we get. Halo Material   Halo Rendered   Right now, this doesn’t look too pleasing, but it will be once we get the combinations right. Next, let’s try activating Rings under the halo menu, then Lines, Star, and finally a combination of the three. Below are rendered versions of each and their combination. Additional Options for Halo   Halo with Rings   Halo with Lines   Halo with Star   Halo with Rings, Lines, and Star   Just by altering the halo settings alone, we can achieve particle effects like pollen dust as seen in the image below where the particle effect was composited over a photo. Particle Pollen Dust  
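Everything above was done through the interface, but the same emitter settings can also be driven from Blender's Python console. The snippet below uses the current bpy API rather than the Blender 2.5 build this article was written against, so property names may differ slightly between versions; treat it as a rough sketch:

import bpy

obj = bpy.context.object                       # the selected mesh (for example, the Plane)
obj.modifiers.new(name="Dust", type='PARTICLE_SYSTEM')

settings = obj.particle_systems[-1].settings   # the particle settings datablock
settings.type = 'EMITTER'                      # or 'HAIR'
settings.count = 1000                          # keep the count low while testing
settings.lifetime = 50                         # frames each particle lives
settings.normal_factor = 1.0                   # the initial "pulse" along the mesh normals
settings.frame_start = 1
settings.frame_end = 100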

Optimization in Python

Packt
19 Aug 2015
14 min read
The path to mastering performance in Python has just started. Profiling only takes us half way there. Measuring how our program is using the resources at its disposal only tells us where the problem is, not how to fix it. In this article by Fernando Doglio, author of the book Mastering Python High Performance, we will cover the process of optimization, and to do that, we need to start with the basics. We'll keep it inside the language for now: no external tools, just Python and the right way to use it. We will cover the following topics in this article: Memoization / lookup tables Usage of default arguments (For more resources related to this topic, see here.) Memoization / lookup tables This is one of the most common techniques used to improve the performance of a piece of code (namely a function). We can save the results of expensive function calls associated to a specific set of input values and return the saved result (instead of redoing the whole computation) when the function is called with the remembered input. It might be confused with caching, since it is one case of it, although this term refers also, to other types of optimization (such as HTTP caching, buffering, and so on) This methodology is very powerful, because in practice, it'll turn what should have been a potentially very expensive call into a O(1) function call if the implementation is right. Normally, the parameters are used to create a unique key, which is then used on a dictionary to either save the result or obtain it if it's been already saved. There is, of course, a trade-off to this technique. If we're going to be remembering the returned values of a memoized function, then we'll be exchanging memory space for speed. This is a very acceptable trade-off, unless the saved data becomes more than what the system can handle. Classic use cases for this optimization are function calls that repeat the input parameters often. This will assure that most of the times, the memoized results are returned. If there are many function calls but with different parameters, we'll only store results and spend our memory without any real benefit, as seen in the following diagram: You can clearly see how the blue bar (Fixed params, memoized) is clearly the fastest use case, while the others are all similar due to their nature. Here is the code that generates values for the preceding chart. 
To generate some sort of time-consuming function, the code will call either the twoParams function or the twoParamsMemoized function several hundred times under different conditions, and it will log the execution time: import math import time import random class Memoized: def __init__(self, fn): self.fn = fn self.results = {} def __call__(self, *args): key = ''.join(map(str, args[0])) try: return self.results[key] except KeyError: self.results[key] = self.fn(*args) return self.results[key] @Memoized def twoParamsMemoized(values, period): totalSum = 0 for x in range(0, 100): for v in values: totalSum = math.pow((math.sqrt(v) * period), 4) + totalSum return totalSum def twoParams(values, period): totalSum = 0 for x in range(0, 100): for v in values: totalSum = math.pow((math.sqrt(v) * period), 4) + totalSum return totalSum def performTest(): valuesList = [] for i in range(0, 10): valuesList.append(random.sample(xrange(1, 101), 10)) start_time = time.clock() for x in range(0, 10): for values in valuesList: twoParamsMemoized(values, random.random()) end_time = time.clock() - start_time print "Fixed params, memoized: %s" % (end_time) start_time = time.clock() for x in range(0, 10): for values in valuesList: twoParams(values, random.random()) end_time = time.clock() - start_time print "Fixed params, without memoizing: %s" % (end_time) start_time = time.clock() for x in range(0, 10): for values in valuesList: twoParamsMemoized(random.sample(xrange(1,2000), 10), random.random()) end_time = time.clock() - start_time print "Random params, memoized: %s" % (end_time) start_time = time.clock() for x in range(0, 10): for values in valuesList: twoParams(random.sample(xrange(1,2000), 10), random.random()) end_time = time.clock() - start_time print "Random params, without memoizing: %s" % (end_time) performTest() The main insight to take from the preceding chart is that just like with every aspect of programming, there is no silver bullet algorithm that will work for all cases. Memoization is clearly a very basic way of optimizing code, but clearly, it won't optimize anything given the right circumstances. As for the code, there is not much to it. It is a very simple, non real-world example of the point I was trying to send across. The performTest function will take care of running a series of 10 tests for every use case and measure the total time each use case takes. Notice that we're not really using profilers at this point. We're just measuring time in a very basic and ad-hoc way, which works for us. The input for both functions is simply a set of numbers on which they will run some math functions, just for the sake of doing something. The other interesting bit about the arguments is that, since the first argument is a list of numbers, we can't just use the args parameter as a key inside the Memoized class' methods. This is why we have the following line: key = ''.join(map(str, args[0])) This line will concatenate all the numbers from the first parameter into a single value, which will act as the key. The second parameter is not used here, because it's always random, which would imply that the key would never be the same. Another variation of the preceding method is to pre-calculate all values from the function (assuming we have a limited number of inputs of course) during initialization and then refer to the lookup table during execution. 
This approach has several preconditions: The number of input values must be finite; otherwise it's impossible to precalculate everything The lookup table with all of its values must fit into memory Just like before, the input must be repeated, at least once, so the optimization both makes sense and is worth the extra effort There are different approaches when it comes to architecting the lookup table, all offering different types of optimizations. It all depends on the type of application and solution that you're trying to optimize. Here is a set of examples. Lookup on a list or linked list This solution works by iterating over an unsorted list and checking the key against each element, with the associated value as the result we're looking for. This is obviously a very slow method of implementation, with a big O notation of O(n) for both the average and worst case scenarios. Still, given the right circumstances, it could prove to be faster than calling the actual function every time. In this case, using a linked list would improve the performance of the algorithm over using a simple list. However, it would still depend heavily on the type of linked list it is (doubly linked list, simple linked list with direct access to the first and last elements, and so on). Simple lookup on a dictionary This method works using a one-dimensional dictionary lookup, indexed by a key consisting of the input parameters (enough of them create a unique key). In particular cases (like we covered earlier), this is probably one of the fastest lookups, even faster than binary search in some cases with a constant execution time (big O notation of O(1)). Note that this approach is efficient as long as the key-generation algorithm is capable of generating unique keys every time. Otherwise, the performance could degrade over time due to the many collisions on the dictionaries. Binary search This particular method is only possible if the list is sorted. This could potentially be an option depending on the values to sort. Yet, sorting them would require an extra effort that would hurt the performance of the entire effort. However, it presents very good results even in long lists (average big O notation of O(log n)). It works by determining in which half of the list the value is and repeating until either the value is found or the algorithm is able to determine that the value is not in the list. To put all of this into perspective, looking at the Memoized class mentioned earlier, it implements a simple lookup on a dictionary. However, this would be the place to implement either of the other algorithms. Use cases for lookup tables There are some classic example use cases for this type of optimization, but the most common one is probably the optimization of trigonometric functions. Based on the computing time, these functions are really slow. When used repeatedly, they can cause some serious damage to your program's performance. This is why it is normally recommended to precalculate the values of these functions. For functions that deal with an infinite domain universe of possible input values, this task becomes impossible. So, the developer is forced to sacrifice accuracy for performance by precalculating a discrete subdomain of the possible input values (that is, going from floating points down to integer numbers). This approach might not be ideal in some cases, since some systems require both performance and accuracy. 
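To make the dictionary and binary-search variants just described concrete, here is a minimal sketch; the names are illustrative, and the precomputed table is assumed to be small enough to fit in memory:

import bisect

# Dictionary lookup: O(1) on average, keyed directly by the input value
table = {x: x * x for x in range(1000)}          # stand-in for an expensive function

def dict_lookup(x):
    return table[x]

# Binary search over sorted keys: O(log n), only possible because the keys are sorted
sorted_keys = sorted(table)
sorted_values = [table[k] for k in sorted_keys]

def binary_lookup(x):
    i = bisect.bisect_left(sorted_keys, x)
    if i < len(sorted_keys) and sorted_keys[i] == x:
        return sorted_values[i]
    raise KeyError(x)

print(dict_lookup(42))      # 1764
print(binary_lookup(42))    # 1764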
So, the solution is to meet in the middle and use some form of interpolation to calculate the wanted value, based on the ones that have been precalculated. It will provide better accuracy. Even though it won't be as performant as using the lookup table directly, it should prove to be faster than doing the trigonometric calculation every time. Let's look at some examples of this, for instance, for the following trigonometric function: def complexTrigFunction(x): return math.sin(x) * math.cos(x)**2 We'll take a look at how simple precalculation won't be accurate enough and how some form of interpolation will result in a better level of accuracy. The following code will precalculate the values for the function on a range from -1000 to 1000 (only integer values). Then, it'll try to do the same calculation (only for a smaller range) for floating point numbers: import math import time from collections import defaultdict import itertools trig_lookup_table = defaultdict(lambda: 0) def drange(start, stop, step): assert(step != 0) sample_count = math.fabs((stop - start) / step) return itertools.islice(itertools.count(start, step), sample_count) def complexTrigFunction(x): return math.sin(x) * math.cos(x)**2 def lookUpTrig(x): return trig_lookup_table[int(x)] for x in range(-1000, 1000): trig_lookup_table[x] = complexTrigFunction(x) trig_results = [] lookup_results = [] init_time = time.clock() for x in drange(-100, 100, 0.1): trig_results.append(complexTrigFunction(x)) print "Trig results: %s" % (time.clock() - init_time) init_time = time.clock() for x in drange(-100, 100, 0.1): lookup_results.append(lookUpTrig(x)) print "Lookup results: %s" % (time.clock() - init_time) for idx in range(0, 200): print "%st%s" % (trig_results [idx], lookup_results[idx]) The results from the preceding code will help demonstrate how the simple lookup table approach is not accurate enough (see the following chart). However, it compensates for it on speed since the original function takes 0.001526 seconds to run while the lookup table only takes 0.000717 seconds. The preceding chart shows how the lack of interpolation hurts the accuracy. You can see how even though both plots are quite similar, the results from the lookup table execution aren't as accurate as the trig function used directly. So, now, let's take another look at the same problem. 
However, this time, we'll add some basic interpolation (we'll limit the rage of values from -PI to PI): import math import time from collections import defaultdict import itertools trig_lookup_table = defaultdict(lambda: 0) def drange(start, stop, step): assert(step != 0) sample_count = math.fabs((stop - start) / step) return itertools.islice(itertools.count(start, step), sample_count) def complexTrigFunction(x): return math.sin(x) * math.cos(x)**2 reverse_indexes = {} for x in range(-1000, 1000): trig_lookup_table[x] = complexTrigFunction(math.pi * x / 1000) complex_results = [] lookup_results = [] init_time = time.clock() for x in drange(-10, 10, 0.1): complex_results .append(complexTrigFunction(x)) print "Complex trig function: %s" % (time.clock() - init_time) init_time = time.clock() factor = 1000 / math.pi for x in drange(-10 * factor, 10 * factor, 0.1 * factor): lookup_results.append(trig_lookup_table[int(x)]) print "Lookup results: %s" % (time.clock() - init_time) for idx in range(0, len(lookup_results )): print "%st%s" % (complex_results [idx], lookup_results [idx]) As you might've noticed in the previous chart, the resulting plot is periodic (specially because we've limited the range from -PI to PI). So, we'll focus on a particular range of values that will generate one single segment of the plot. The output of the preceding script also shows that the interpolation solution is still faster than the original trigonometric function, although not as fast as it was earlier: Interpolation Solution Original function 0.000118 seconds 0.000343 seconds The following chart is a bit different from the previous one, especially because it shows (in green) the error percentage between the interpolated value and the original one: The biggest error we have is around 12 percent (which represents the peaks we see on the chart). However, it's for the smallest values, such as -0.000852248551417 versus -0.000798905501416. This is a case where the error percentage needs to be contextualized to see if it really matters. In our case, since the values related to that error are so small, we can ignore that error in practice. There are other use cases for lookup tables, such as in image processing. However, for the sake of this article, the preceding example should be enough to demonstrate their benefits and trade-off implied in their usage. Usage of default arguments Another optimization technique, one that is contrary to memoization, is not particularly generic. Instead, it is directly tied to how the Python interpreter works. Default arguments can be used to determine values once at function creation time instead of at run time. This can only be done for functions or objects that will not be changed during program execution. Let's look at an example of how this optimization can be applied. The following code shows two versions of the same function, which does some random trigonometric calculation: import math #original function def degree_sin(deg): return math.sin(deg * math.pi / 180.0) #optimized function, the factor variable is calculated during function creation time, #and so is the lookup of the math.sin method. def degree_sin(deg, factor=math.pi/180.0, sin=math.sin): return sin(deg * factor) This optimization can be problematic if not correctly documented. Since it uses attributes to precompute terms that should not change during the program's execution, it could lead to the creation of a confusing API. 
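The reason this technique works is that default argument values are evaluated exactly once, when the def statement is executed, not on every call; that is also why it can surprise callers. The classic mutable-default example below is a standalone illustration of that behaviour, not code from the book:

def append_to(item, bucket=[]):   # the default list is created once, at definition time
    bucket.append(item)
    return bucket

print(append_to(1))   # [1]
print(append_to(2))   # [1, 2] -- the same list object is reused across calls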
With a quick and simple test, we can double-check the performance gain from this optimization:

import time
import math

def degree_sin(deg):
    return math.sin(deg * math.pi / 180.0) * math.cos(deg * math.pi / 180.0)

def degree_sin_opt(deg, factor=math.pi/180.0, sin=math.sin, cos=math.cos):
    return sin(deg * factor) * cos(deg * factor)

normal_times = []
optimized_times = []

for y in range(100):
    init = time.clock()
    for x in range(1000):
        degree_sin(x)
    normal_times.append(time.clock() - init)

    init = time.clock()
    for x in range(1000):
        degree_sin_opt(x)
    optimized_times.append(time.clock() - init)

print "Normal function: %s" % (reduce(lambda x, y: x + y, normal_times, 0) / 100)
print "Optimized function: %s" % (reduce(lambda x, y: x + y, optimized_times, 0) / 100)

The preceding code measures how long each version of the function takes to run 1000 times, repeats that measurement 100 times, and finally averages the results for each case. The averaged results are displayed in the following chart. It clearly isn't an amazing optimization, but it does shave some microseconds off our execution time, so we'll keep it in mind. Just remember that this optimization could cause problems if you're working as part of an OS developer team.

Summary

In this article, we covered several optimization techniques. Some of them are meant to provide big boosts in speed or to save memory, while others yield only minor speed improvements. Most of this article covered Python-specific techniques, but some of them can be translated into other languages as well.
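One last note related to the memoization section at the start of this article: on Python 3.2 and later, the standard library ships functools.lru_cache, which implements the same remember-the-results pattern as the hand-rolled Memoized class. A minimal sketch; arguments must be hashable, so the list input used in the earlier example would need to be passed as a tuple:

from functools import lru_cache
import math

@lru_cache(maxsize=None)          # unbounded cache, like the Memoized class above
def two_params_memoized(values, period):
    total = 0
    for _ in range(100):
        for v in values:
            total += math.pow(math.sqrt(v) * period, 4)
    return total

print(two_params_memoized((1, 2, 3), 0.5))   # computed once...
print(two_params_memoized((1, 2, 3), 0.5))   # ...then served from the cache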

Customization in Microsoft Dynamics CRM

Packt
30 Dec 2014
24 min read
 In this article by Nicolae Tarla, author of the book Dynamics CRM Application Structure, we looked at the basic structure of Dynamics CRM, the modules comprising the application, and what each of these modules contain. Now, we'll delve deeper into the application and take a look at how we can customize it. In this chapter, we will take a look at the following topics: Solutions and publishers Entity elements Entity types Extending entities Entity forms, quick view, and quick create forms Entity views and charts Entity relationships Messages Business rules We'll be taking a look at how to work with each of the elements comprising the sales, service, and marketing modules. We will go through the customization options and see how we can extend the system to fit new business requirements. (For more resources related to this topic, see here.) Solutions When we are talking about customizations for Microsoft Dynamics CRM, one of the most important concepts is the solution. The solution is a container of all the configurations and customizations. This packaging method allows customizers to track customizations, export and reimport them into other environments, as well as group specific sets of customizations by functionality or project cycle. Managing solutions is an aspect that should not be taken lightly, as down the road, a properly designed solution packaging model can help a lot, or an incorrect one can create difficulties. Using solutions is a best practice. While you can implement customizations without using solutions, these customizations will be merged into the base solutions and "you will not be able to export the customizations separately from the core elements of the platform. For a comprehensive description of solutions, you can refer to the MSDN documentation available at http://msdn.microsoft.com/en-gb/library/gg334576.aspx#BKMK_UnmanagedandManagedSolutions. Types of solutions Within the context of Dynamics CRM, there are two types of solutions that you will commonly use while implementing customizations: Unmanaged solutions Managed solutions Each one of these solution types has its own strengths and properties and are recommended to be used in various circumstances. In order to create and manage solutions as well as perform system customizations, the user account must be configured as a system customizer or system administrator. Unmanaged solutions An unmanaged solution is the default state of a newly created solution. A solution is unmanaged for the period of time while customization work is being performed in the context of the solution. You cannot customize a managed solution. An unmanaged solution can be converted to a managed solution by exporting it as managed. When the work is completed and the unmanaged solution is ready to be distributed, it is recommended that you package it as a managed solution for distribution. A managed solution, if configured as such, prevents further customizations to the solution elements. For this reason, solution vendors package their solutions as managed. In an unmanaged solution, the system customizer can perform various tasks, "which include: Adding and removing components Deleting components that allow deletion Exporting and importing the solution as an unmanaged solution Exporting the solution as a managed solution Changes made to the components in an unmanaged solution are also applied to all the unmanaged solutions that include these components. 
This means that all changes from all unmanaged solutions are also applied to the default solution. Deleting an unmanaged solution results in the removal of the container alone, "while the unmanaged components of the solution remain in the system. Deleting a component in an unmanaged solution results in the deletion of this component from the system. In order to remove a component from an unmanaged solution, the component should be removed from the solution, not deleted. Managed solutions Once work is completed in an unmanaged solution and the solution is ready to be distributed, it can be exported as a managed solution. Packaging a solution as a managed solution presents the following advantages: Solution components cannot be added or removed from a managed solution A managed solution cannot be exported from the environment it was deployed in Deleting a managed solution results in the uninstallation of all the component customizations included with the solution. It also results "in the loss of data associated with the components being deleted. A managed solution cannot be installed in the same organization that contains the unmanaged solution which was used to create it. Within a managed solution, certain components can be configured to allow further customization. Through this mechanism, the managed solution provider can enable future customizations that modify aspects of the solution provided. The guidance provided by Microsoft when working with various solution types states that a solution should be used in an unmanaged state between development and test environments, and it should be exported as a managed solution when it is ready to be deployed to a production environment. Solution properties Besides the solution type, each solution contains a solution publisher. This is a set of properties that allow the solution creators to communicate different information to the solution's users, including ways to contact the publisher for additional support. The solution publisher record will be created in all the organizations where the solution is being deployed. The solution publisher record is also important when releasing an update to an existing solution. Based on this common record and the solution properties, an update solution can be released and deployed on top of an existing solution. Using a published solution also allows us to define a custom prefix for all new custom fields created in the context of the solution. The default format for new custom field names is a new field name. Using a custom publisher, we can change "the "new" prefix to a custom prefix specific to our solution. Solution layering When multiple solutions are deployed in an organization, there are two methods by which the system defines the order in which changes take precedence. These methods are merge and top wins. The user interface elements are merged by default. As such, elements such as the default forms, ribbons, command bars, and site map are merged, and all base elements and new custom elements are rendered. For all other solution components, the top wins approach is taken, where the last solution that makes a customization takes precedence. The top wins approach is also taken into consideration when a subset of customizations is being applied on top of a previously applied customization. The system checks the integrity of all solution exports, imports, and other operations. As such, when exporting a solution, if dependent entities are not included, a warning is presented. 
The customizer has the option to ignore this warning. When importing a solution, if the dependent entities are missing, the import is halted and it fails. Also, deleting a component from a solution is prevented if dependent entities require it to be present. The default solution Dynamics CRM allows you to customize the system without taking advantage of solutions. By default, the system comes with a solution. This is an unmanaged solution, and all system customizations are applied to it by default. The default solution includes all the default components and customizations defined within Microsoft Dynamics CRM. This solution defines the default application behavior. Most of the components in this solution can be further customized. "This solution includes all the out-of-the-box customizations. Also, customizations applied through unmanaged solutions are being merged into the default solution. Entity elements Within a solution, we work with various entities. In Dynamics CRM, there are three main entity types: System entities Business entities Custom entities Each entity is composed of various attributes, while each attribute is defined as a value with a specific data type. We can consider an entity to be a data table. Each "row represents and entity record, while each column represents an entity attribute. As with any table, each attribute has specific properties that define its data type. The system entities in Dynamics CRM are used internally by the application and "are not customizable. Also, they cannot be deleted. As a system customizer or developer, we will work mainly with business management entities and custom entities. Business management entities are the default entities that come with the application. Some are customizable and can "be extended as required. Custom entities are all net new entities that are created "as part of our system customizations. The aspects related to customizing an entity include renaming the entity; modifying, adding, or removing entity attributes; or changing various settings and properties. Let's take a look at all these in detail. Renaming an entity One of the ways to customize an entity is by renaming it. In the general properties "of the entity, the field's display name allows us to change the name of an entity. "The plural name can also be updated accordingly. When renaming an entity, make sure that all the references and messages are updated to reflect the new entity name. Views, charts, messages, business rules, hierarchy settings, and even certain fields can reference the original name, and they should be updated to reflect the new name assigned to the entity. The display name of an entity can be modified for the default value. This is a very common customization. In many instances, we need to modify the default entity name to match the business for which we are customizing the system. For instance, many customers use the term organization instead of account. This is a very easy customization achieved by updating the Display Name and Plural Name fields. While implementing this change, make sure that you also update the entity messages, as a lot of them use the original name of the entity by default.   You can change a message value by double-clicking on the message and entering the new message into the Custom Display String field. Changing entity settings and properties When creating and managing entities in Dynamics CRM, there are generic "entity settings that we have to pay attention to. 
We can easily get to these settings and properties by navigating to Components | Entities within a solution and selecting an entity from the list. We will get an account entity screen similar to "the following screenshot:   The settings are structured in two main tabs, with various categories on each tab. We will take a look at each set of settings and properties individually in the next sections. Entity definition This area of the General tab groups together general properties and settings related to entity naming properties, ownership, and descriptions. Once an entity is created, the Name value remains fixed and cannot be modified. If the internal Name field needs to be changed, a new entity with the new Name field must be created. Areas that display this entity This section sets the visibility of this entity. An entity can be made available in only one module or more standard modules of the application. The account is a good example as it is present in all the three areas of the application. Options for entity The Options for Entity section contains a subset of sections with various settings "and properties to configure the main properties of the entity, such as whether the entity can be customized by adding business process flows, notes and activities, "and auditing as well as other settings. Pay close attention to the settings marked with a plus, as once these settings are enabled, they cannot be disabled. If you are not sure whether you need these features, disable them. The Process section allows you to enable the entity for Business Process Flows. When enabling an entity for Business Process Flows, specific fields to support this functionality are created. For this reason, once an entity is enabled for Business Process Flows, it cannot be disabled at a later time. In the communication and collaboration area, we can enable the use of notes, related activities, and connections as well as enable sending of e-mails and queues on the entity. Enabling these configurations creates the required fields and relationships in the system, and you cannot disable them later. In addition, you can enable the entity for mail merge for use with access teams and also for document management. Enabling an entity for document management allows you to store documents related to the records of this type in SharePoint if the organization is configured to integrate with SharePoint. The data services section allows you to enable the quick create forms for this entity's records as well as to enable or disable duplicate detection and auditing. When you are enabling auditing, auditing must also be enabled at the organization level. Auditing is a two-step process. The next subsections deal with Outlook and mobile access. Here, we can define whether the entity can be accessed from various mobile devices as well as Outlook and whether the access is read-only or read/write on tablets. The last section allows us to define a custom help section for a specific entity. "Custom help must be enabled at the organization level first. Primary field settings The Primary Field settings tab contains the configuration properties for the entity's primary field. Each entity in the Dynamics CRM platform is defined by a primary field. This field can only be a text field, and the size can be customized as needed. The display name can be adjusted as needed. Also, the requirement level can be selected from one of the three values: optional, business-recommended, or business-required. 
When it is marked as business-required, the system will require users to enter a value if they are creating or making changes to an entity record form. The primary fields are also presented for customization in the entity field's listing. Business versus custom entities As mentioned previously, there are two types of customizable entities in Dynamics CRM. They are business entities and custom entities. Business entities are customizable entities that are created by Microsoft and come as part of the default solution package. They are part of the three modules: sales, service, and marketing. Custom entities are all the new entities that are being created as part of the customization and platform extending process. Business entities Business entities are part of the default customization provided with the application by Microsoft. They are either grouped into one of the three modules of functionality or are spread across all three. For example, the account and contact entities are present in all the modules, while the case entity belongs to the service module. "Some other business entities are opportunity, lead, marketing list, and so on. Most of the properties of business entities are customizable in Dynamics CRM. However, there are certain items that are not customizable across these entities. These are, in general, the same type of customizations that are not changeable "when creating a custom entity. For example, the entity internal name (the schema name) cannot be changed once an entity has been created. In addition, the primary field properties cannot be modified once an entity is created. Custom entities All new entities created as part of a customization and implemented in Dynamics CRM are custom entities. When creating a new custom entity, we have the freedom to configure all the settings and properties as needed from the beginning. We can use a naming convention that makes sense to the user and generate all the messages from the beginning, taking advantage of this name. A custom entity can be assigned by default to be displayed in one or more of the three main modules or in the settings and help section. If a new module is created and custom entities need to be part of this new module, we can achieve this by customizing the application navigation, commonly referred to as the application sitemap. While customizing the application navigation might not be such a straightforward process, the tools released to the community are available, which makes this job a lot easier and more visual. The default method to customize the navigation is described in detail in the SDK, and it involves exporting a solution with the navigation sitemap configuration, modifying the XML data, and reimporting the updated solution. Extending entities Irrespective of whether we want to extend a customizable business entity or a custom entity, the process is similar. We extend entities by creating new entity forms, views, charts, relationships, and business rules.   Starting with Dynamics CRM 2015, entities configured for hierarchical relationships now support the creation and visualization of hierarchies through hierarchy settings. We will be taking a look at each of these options in detail in the next sections. Entity forms Entities in Dynamics CRM can be accessed from various parts of the system, and their information can be presented in various formats. This feature contributes to "the 360-degree view of customer data. 
In order to enable this functionality, the entities in Dynamics CRM present a variety of standard views that are available for customization. These include standard entity forms, quick create forms, and quick view forms. In addition, for mobile devices, we can customize mobile forms. Form types With the current version of Dynamics CRM 2015, most of the updated entities now have four different form types, as follows: The main form The mobile form The quick create form The quick view form Various other forms can be created on an entity, either from scratch or by opening an existing form and saving it with a new name. When complex forms need to be created, in many circumstances, it is much easier to start from an existing entity form rather than recreating everything. We have role-based forms, which change based on the user's security role, and we can also have more than one form available for users to select from. We can customize which view is presented to the user based on specific form rules or other business requirements. It is a good practice to define a fallback form for each entity and to give all the users view permissions to this form. Once more than one main form is created for an entity, you can define the order in which the forms are presented based on permissions. If the user does not have access to any of the higher precedence forms, they will be able to access the fallback form. Working with contingency forms is quite similar; here, a form is defined to be available to users who cannot access any other forms on an entity. The approach for configuring this is a little different though. You create a form with minimal information being displayed on it. Only assign a system administrator role to this form, and select the option to enable it for fallback. With this, you specify a form that will not be visible to anybody other than the system administrator. In addition, configuring the form in this manner also makes it available to users whose security roles do not have a form specified. With such a configuration, if a user is added to a restrictive group that does not allow them to see most forms, they will have this one form available. The main form The main form is the default form associated with an entity. This form will be available by default when you open a record. There can be more than one main form, and these forms can be configured to be available to various security roles. A role must have at least one form available to it. If more than one form is available for a specific role, then the users will be given the option to select the form they want to use to visualize a record. Forms that are available for various roles are called role-based forms. As an example, the human resource role can have a specific view in an account, showing more information than a form available for a sales role. At the time of editing, the main form of an entity will look similar to the following screenshot: A mobile form A mobile form is a stripped-down form that is available for mobile devices with small screens. When customizing mobile forms, you should not only pay attention to the fact that a small screen can only render so much before extensive scrolling becomes exhausting but also the fact that most mobile devices transfer data wirelessly and, as such, the amount of data should be limited. At the time of editing, the Mobile Entity form looks similar to the Account Mobile form shown in the following screenshot. 
This is basically just a listing of the fields that are available and the order in which they are presented to the user.   The quick create form The quick create form, while serving a different purpose than quick view forms, is confined to the same minimalistic approach. Of course, a system customizer is not necessarily limited to a certain amount of data to be added to these forms, but they should be mindful of where these forms are being used and how much real estate is dedicated to them. In a quick create form, the minimal amount of data to be added is the required fields. In order to save a new record, all business-required fields must be filled in. As such, they should be added to the quick create form. Quick create forms are created in the same way as any other type of form. In the solution package, navigate to entities, select the entity for which you want to customize an existing quick create form or add a new one, and expand the forms section; you will see all the existing forms for the specific entity. Here, you can select the form you want to modify or click on New to create a new one. Once the form is open for editing, the process of customizing the form is exactly the same for all forms. You can add or remove fields, customize labels, rearrange fields in the form, and so on. In order to remind the customizer that this is a quick create form, a minimalistic three-column grid is provided by default for this type of form in edit mode, as shown in the following screenshot: Pay close attention to the fact that you can add only a limited set of controls to a quick create form. Items such as iframes and sub-grids are not available. That's not to say that the layout cannot be changed. You can be as creative as needed when customizing the quick create view. Once you have created the form, save and publish it. Since we have created a relationship between the account and the project earlier, we can add a grid view to the account displaying all the related child projects. Now, navigating to an account, we can quickly add a new child project by going to the project's grid view and clicking on the plus symbol to add a project. This will launch the quick create view of the project we just customized. This is how the project window will look:   As you can see in the previous screenshot, the quick create view is displayed as an overlay over the main form. For this reason, the amount of data should be kept to a minimum. This type of form is not meant to replace a full-fledged form but to allow a user to create a new record with minimal inputs and with no navigation to other records. Another way to access the quick create view for an entity is by clicking on the Create button situated at the top-right corner of most Dynamics CRM pages, right before the field that displays your username. This presents the user with the option to create common out-of-the-box record types available in the system, as seen in the following screenshot:   Selecting any one of the Records options presents the quick create view. If you opt to create activities in this way, you will not be presented with a quick create form; rather, you will be taken to the full activity form. Once a quick create form record is created in the system, the quick create form closes and a notification is displayed to the user with an option to navigate to the newly created record. 
This is how the final window should look:   The quick view form The quick view form is a feature added with Dynamics CRM 2013 that allows system customizers to create a minimalistic view to be presented in a related record form. This form presents a summary of a record in a condensed format that allows you to insert it into a related record's form. The process to use a quick view form comprises the following two steps: Create the quick view form for an entity Add the quick view form to the related record The process of creating a quick view form is similar to the process of creating "any other form. The only requirement here is to keep the amount of information minimal, in order to avoid taking up too much real estate on the related record "form. The following screenshot describes the standard Account quick create form:   A very good example is the quick view form for the account entity. This view is created by default in the system. It only includes the account name, e-mail and "phone information, as well as a grid of recent cases and recent activities. We can use this view in a custom project entity. In the project's main form, add a lookup field to define the account related to the project. In the project's form customization, add a Quick View Form tab from the ribbon, as shown in the following screenshot:   Once you add a Quick View Form tab, you are presented with a Quick View Control Properties window. Here, define the name and label for the control and whether you want the label to be displayed in the form. In addition, on this form, you get to define the rules on what is to be displayed "on the form. In the Data Source section, select Account in the Lookup Field and Related Entity dropdown list and in the Quick View Form dropdown list, select "the account card form. This is the name of the account's quick view form defined "in the system. The following screenshot shows the Data Source configuration and the Selected quick view forms field:   Once complete, save and publish the form. Now, if we navigate to a project record, we can select the related account and the quick view will automatically be displayed on the project form, as shown in the "next screenshot:   The default quick view form created for the account entity is displayed now on the project form with all the specified account-related details. This way any updates to the account are immediately reflected in the project form. Taking this approach, it is now much easier to display all the needed information on the same screen so that the user does not have to navigate away and click through a maze to get to all the data needed. Summary Throughout this chapter, we looked at the main component of the three system modules: an entity. We defined what an entity is and we looked at what an entity is composed of. Then, we looked at each of the components in detail and we discussed ways in which we can customize the entities and extend the system. We investigated ways to visually represent the data related to entities and how to relate entities for data integrity. We also looked at how to enhance entity behavior with business rules and the limitations that the business rules have versus more advanced customizations, using scripts or other developer-specific methods. The next chapter will take you into the business aspect of the Dynamics CRM platform, with an in-depth look at all the available business processes. 
We will revisit business rules, and we will take a look at other ways to enforce business-specific rules and processes using the wizard-driven customizations available with the platform.
Resources for Article:
Further resources on this subject:
Form customizations [article]
Introduction to Reporting in Microsoft Dynamics CRM [article]
Overview of Microsoft Dynamics CRM 2011 [article]
The elements of WebAssembly - Wat and Wasm, explained [Tutorial]

Prasad Ramesh
03 Jan 2019
9 min read
In this tutorial, we will dig into the elements that correspond to the official specifications created by the WebAssembly Working Group. We will examine the Wat and the binary format in greater detail to gain a better understanding of how they relate to modules. This article is an excerpt from a book written by Mike Rourke titled Learn WebAssembly. In this book, you will learn how to build web applications with native performance using Wasm and C/C++. Common structure and abstract syntax Before getting into the nuts and bolts of these formats, it's worth mentioning how these are related within the Core Specification. The following diagram is a visual representation of the table of contents (with some sections excluded for clarity): As you can see, the Text Format and Binary Format sections contain subsections for Values, Types, Instructions, and Modules that correlate with the Structure section. Consequently, much of what we cover in the next section for the text format has direct corollaries with the binary format. With that in mind, let's dive into the text format. Wat The Text Format section of the Core Specification provides technical descriptions for common language concepts such as values, types, and instructions. These are important concepts to know and understand if you're planning on building tooling for WebAssembly, but not necessary if you just plan on using it in your applications. That being said, the text format is an important part of WebAssembly, so there are concepts you should be aware of. In this section, we will dig into some of the details of the text format and highlight important points from the Core Specification. Definitions and S-expressions To understand Wat, let's start with the first sentence of the description taken directly from the WebAssembly Core Specification: "The textual format for WebAssembly modules is a rendering of their abstract syntax into S-expressions." So what are symbolic expressions (S-expressions)? S-expressions are notations for nested list (tree-structured) data. Essentially, they provide a simple and elegant way to represent list-based data in textual form. To understand how textual representations of nested lists map to a tree structure, let's extrapolate the tree structure from an HTML page. The following example contains a simple HTML page and the corresponding tree structure diagram. A simple HTML page: The corresponding tree structure is: Even if you've never seen a tree structure before, it's still clear to see how the HTML maps to the tree in terms of structure and hierarchy. Mapping HTML elements is relatively simple because it's a markup language with well-defined tags and no actual logic. Wat represents modules that can have multiple functions with varying parameters. To demonstrate the relationship between source code, Wat, and the corresponding tree structure, let's start with a simple C function that adds 2 to the number that is passed in as a parameter: Here is a C function that adds 2 to the num argument passed in and returns the result: int addTwo(int num) { return num + 2; } Converting the addTwo function to valid Wat produces this result: (module (table 0 anyfunc) (memory $0 1) (export "memory" (memory $0)) (export "addTwo" (func $addTwo)) (func $addTwo (; 0 ;) (param $0 i32) (result i32) (i32.add (get_local $0) (i32.const 2) ) ) ) The Structure section defines each of these concepts in the context of an abstract syntax. 
The Text Format section of the specification corresponds with these concepts as well, and you can see them defined by their keywords in the preceding snippet (func, memory, table). Tree Structure: The entire tree would be too large to fit on a page, so this diagram is limited to the first five lines of the Wat source text. Each filled-in dot represents a list node (or the contents of a set of parentheses). As you can see, code written in s-expressions can be clearly and concisely expressed in a tree structure, which is why s-expressions were chosen for WebAssembly's text format. Values, types, and instructions Although detailed coverage of the Text Format section of the Core Specification is out of the scope of this text, it's worth demonstrating how some of the language concepts map to the corresponding Wat. The following diagram demonstrates these mappings in a sample Wat snippet. The C code that this was compiled from represents a function that takes a word as a parameter and returns the square root of the character count: If you intend on writing or editing Wat, note that it supports block and line comments. The instructions are split up into blocks and consist of setting and getting memory associated with variables with valid types. You are able to control the flow of logic using if statements and loops are supported using the loop keyword. Role in the development process The text format allows for the representation of a binary Wasm module in textual form. This has some profound implications with regard to the ease of development and debugging. Having a textual representation of a WebAssembly module allows developers to view the source of a loaded module in a browser, which eliminates the black-box issues that inhibited the adoption of NaCl. It also allows for tooling to be built around troubleshooting modules. The official website describes the use cases that drove the design of the text format: • View Source on a WebAssembly module, thus fitting into the Web (where every source can be viewed) in a natural way. • Presentation in browser development tools when source maps aren't present (which is necessarily the case with the Minimum Viable Product (MVP)). • Writing WebAssembly code directly for reasons including pedagogical, experimental, debugging, optimization, and testing of the spec itself. The last item in the list reflects that the text format isn't intended to be written by hand in the course of normal development, but rather generated from a tool like Emscripten. You probably won't see or manipulate any .wat files when you're generating modules, but you may be viewing them in a debugging context. Not only is the text format valuable with regards to debugging, but having this intermediate format reduces the amount of reliance on a single tool for compilation. Several different tools currently exist to consume and emit this s-expression syntax, some of which are used by Emscripten to compile your code down to a .wasm file. Binary format and the module file (Wasm) The Binary Format section of the Core Specification provides the same level of detail with regard to language concepts as the Text format section. In this section, we will briefly cover some high-level details about the binary format and discuss the various sections that make up a Wasm module. Definition and module overview The binary format is defined as a dense linear encoding of the abstract syntax. 
Without getting too technical, that essentially means it's an efficient form of binary that allows for fast decoding, small file size, and reduced memory usage. The file representation of the binary format is a .wasm file. The Values, Types, and Instructions subsections of the Core Specification for the binary format correlate directly to the Text Format section. Each of these concepts is covered in the context of encoding. For example, according to the specification, the Integer types are encoded using the LEB128 variable-length integer encoding, in either unsigned or signed variant. These are important details to know if you wish to develop tooling for WebAssembly, but not necessary if you just plan on using it on your website. The Structure, Binary Format, and Text Format (wat) sections of the Core Specification have a Module subsection. We didn't cover aspects of the module in the previous section because it's more prudent to describe them in the context of a binary. The official WebAssembly site offers the following description for a module: "The distributable, loadable, and executable unit of code in WebAssembly is called a module. At runtime, a module can be instantiated with a set of import values to produce an instance, which is an immutable tuple referencing all the state accessible to the running module." Module sections A module is made up of several sections, some of which you'll be interacting with through the JavaScript API: Imports (import) are elements that can be accessed within the module and can be one of the following: Function, which can be called inside the module using the call operator Global, which can be accessed inside the module via the global operators Linear Memory, which can be accessed inside the module via the memory operators Table, which can be accessed inside the module using call_indirect Exports (export) are elements that can be accessed by the consuming API (that is, called by a JavaScript function) Module start function (start) is called after the module instance is initialized Global (global) contains the internal definition of global variables Linear memory (memory) contains the internal definition of linear memory with an initial memory size and optional maximum size Data (data) contains an array of data segments which specify the initial contents of fixed ranges of a given memory Table (table) is a linear memory whose elements are opaque values of a particular table element type: In the MVP, its primary purpose is to implement indirect function calls in C/C++ Elements (elements) is a section that allows a module to initialize the elements of any import or internally defined table with any other definition in the module Function and code: The function section declares the signatures of each internal function defined in the module The code section contains the function body of each function declared by the function section Some of the keywords (import, export, and so on) should look familiar; they're present in the contents of the Wat file in the previous section. WebAssembly's components follow a logical mapping that directly correspond to the APIs  (for example, you pass a memory and table instance into JavaScript's WebAssembly.instantiate() function). Your primary interaction with a module in binary format will be through these APIs. In this tutorial, we looked at the WebAssembly modules Wat and Wasm. We covered Wat definitions, values, types, instructions, and its role. We looked at Wasm overview and its module sections. 
To learn more about another element of WebAssembly, the JavaScript API, and to build applications with WebAssembly, check out the book Learn WebAssembly.
How has Rust and WebAssembly evolved in 2018
WebAssembly – Trick or Treat?
Mozilla shares plans to bring desktop applications, games to WebAssembly and make deeper inroads for the future web
Implementing Stacks using JavaScript

Packt
22 Oct 2014
10 min read
In this article by Loiane Groner, author of the book Learning JavaScript Data Structures and Algorithms, we will discuss stacks. (For more resources related to this topic, see here.) A stack is an ordered collection of items that follows the LIFO (short for Last In First Out) principle. The addition of new items or the removal of existing items takes place at the same end. The end of the stack is known as the top and the opposite is known as the base. The newest elements are near the top, and the oldest elements are near the base. We have several examples of stacks in real life, for example, a pile of books, as we can see in the following image, or a stack of trays from a cafeteria or food court: A stack is also used by compilers in programming languages and by computer memory to store variables and method calls. Creating a stack We are going to create our own class to represent a stack. Let's start from the basics and declare our class: function Stack() {   //properties and methods go here} First, we need a data structure that will store the elements of the stack. We can use an array to do this: var items = []; Next, we need to declare the methods available for our stack: push(element(s)): This adds a new item (or several items) to the top of the stack. pop(): This removes the top item from the stack. It also returns the removed element. peek(): This returns the top element from the stack. The stack is not modified (it does not remove the element; it only returns the element for information purposes). isEmpty(): This returns true if the stack does not contain any elements and false if the size of the stack is bigger than 0. clear(): This removes all the elements of the stack. size(): This returns how many elements the stack contains. It is similar to the length property of an array. The first method we will implement is the push method. This method will be responsible for adding new elements to the stack with one very important detail: we can only add new items to the top of the stack, meaning at the end of the stack. The push method is represented as follows: this.push = function(element){   items.push(element);}; As we are using an array to store the elements of the stack, we can use the push method from the JavaScript array class. Next, we are going to implement the pop method. This method will be responsible for removing the items from the stack. As the stack uses the LIFO principle, the last item that we added is the one that is removed. For this reason, we can use the pop method from the JavaScript array class. The pop method is represented as follows: this.pop = function(){   return items.pop();}; With the push and pop methods being the only methods available for adding and removing items from the stack, the LIFO principle will apply to our own Stack class. Now, let's implement some additional helper methods for our class. If we would like to know what the last item added to our stack was, we can use the peek method. This method will return the item from the top of the stack: this.peek = function(){   return items[items.length-1];}; As we are using an array to store the items internally, we can obtain the last item from an array using length - 1 as follows: For example, in the previous diagram, we have a stack with three items; therefore, the length of the internal array is 3. The last position used in the internal array is 2. As a result, the length - 1 (3 - 1) is 2! 
The next method is the isEmpty method, which returns true if the stack is empty (no item has been added) and false otherwise: this.isEmpty = function(){   return items.length == 0;}; Using the isEmpty method, we can simply verify whether the length of the internal array is 0. Similar to the length property from the array class, we can also implement length for our Stack class. For collections, we usually use the term "size" instead of "length". And again, as we are using an array to store the items internally, we can simply return its length: this.size = function(){   return items.length;}; Finally, we are going to implement the clear method. The clear method simply empties the stack, removing all its elements. The simplest way of implementing this method is as follows: this.clear = function(){   items = [];}; An alternative implementation would be calling the pop method until the stack is empty. And we are done! Our Stack class is implemented. Just to make our lives easier during the examples, to help us inspect the contents of our stack, let's implement a helper method called print that is going to output the content of the stack on the console: this.print = function(){   console.log(items.toString());}; And now we are really done! The complete Stack class Let's take a look at how our Stack class looks after its full implementation: function Stack() {    var items = [];    this.push = function(element){       items.push(element);   };    this.pop = function(){       return items.pop();   };    this.peek = function(){       return items[items.length-1];   };    this.isEmpty = function(){       return items.length == 0;   };    this.size = function(){       return items.length;   };    this.clear = function(){       items = [];   };    this.print = function(){       console.log(items.toString());   };} Using the Stack class Before we dive into some examples, we need to learn how to use the Stack class. The first thing we need to do is instantiate the Stack class we just created. Next, we can verify whether it is empty (the output is true because we have not added any elements to our stack yet): var stack = new Stack();console.log(stack.isEmpty()); //outputs true Next, let's add some elements to it (let's push the numbers 5 and 8; you can add any element type to the stack): stack.push(5);stack.push(8); If we call the peek method, the output will be the number 8 because it was the last element that was added to the stack: console.log(stack.peek()); // outputs 8 Let's also add another element: stack.push(11);console.log(stack.size()); // outputs 3console.log(stack.isEmpty()); //outputs false We added the element 11. If we call the size method, it will give the output as 3, because we have three elements in our stack (5, 8, and 11). Also, if we call the isEmpty method, the output will be false (we have three elements in our stack). Finally, let's add another element: stack.push(15); The following diagram shows all the push operations we have executed so far and the current status of our stack: Next, let's remove two elements from the stack by calling the pop method twice: stack.pop();stack.pop();console.log(stack.size()); // outputs 2stack.print(); // outputs [5, 8] Before we called the pop method twice, our stack had four elements in it. After the execution of the pop method two times, the stack now has only two elements: 5 and 8. 
The following diagram exemplifies the execution of the pop method: Decimal to binary Now that we know how to use the Stack class, let's use it to solve some Computer Science problems. You are probably already aware of the decimal base. However, binary representation is very important in Computer Science as everything in a computer is represented by binary digits (0 and 1). Without the ability to convert back and forth between decimal and binary numbers, it would be a little bit difficult to communicate with a computer. To convert a decimal number to a binary representation, we can divide the number by 2 (binary is base 2 number system) until the division result is 0. As an example, we will convert the number 10 into binary digits: This conversion is one of the first things you learn in college (Computer Science classes). The following is our algorithm: function divideBy2(decNumber){    var remStack = new Stack(),       rem,       binaryString = '';    while (decNumber > 0){ //{1}       rem = Math.floor(decNumber % 2); //{2}       remStack.push(rem); //{3}       decNumber = Math.floor(decNumber / 2); //{4} }    while (!remStack.isEmpty()){ //{5}       binaryString += remStack.pop().toString();   }    return binaryString;} In this code, while the division result is not zero (line {1}), we get the remainder of the division (mod) and push it to the stack (lines {2} and {3}), and finally, we update the number that will be divided by 2 (line {4}). An important observation: JavaScript has a numeric data type, but it does not distinguish integers from floating points. For this reason, we need to use the Math.floor function to obtain only the integer value from the division operations. And finally, we pop the elements from the stack until it is empty, concatenating the elements that were removed from the stack into a string (line {5}). We can try the previous algorithm and output its result on the console using the following code: console.log(divideBy2(233));console.log(divideBy2(10));console.log(divideBy2(1000)); We can easily modify the previous algorithm to make it work as a converter from decimal to any base. Instead of dividing the decimal number by 2, we can pass the desired base as an argument to the method and use it in the divisions, as shown in the following algorithm: function baseConverter(decNumber, base){    var remStack = new Stack(),        rem,       baseString = '',       digits = '0123456789ABCDEF'; //{6}    while (decNumber > 0){       rem = Math.floor(decNumber % base);       remStack.push(rem);       decNumber = Math.floor(decNumber / base);   }    while (!remStack.isEmpty()){       baseString += digits[remStack.pop()]; //{7}   }    return baseString;} There is one more thing we need to change. In the conversion from decimal to binary, the remainders will be 0 or 1; in the conversion from decimal to octagonal, the remainders will be from 0 to 8; but in the conversion from decimal to hexadecimal, the remainders can be 0 to 8 plus the letters A to F (values 10 to 15). For this reason, we need to convert these values as well (lines {6} and {7}). We can use the previous algorithm and output its result on the console as follows: console.log(baseConverter(100345, 2));console.log(baseConverter(100345, 8));console.log(baseConverter(100345, 16)); Summary In this article, we learned about the stack data structure. We implemented our own algorithm that represents a stack and we learned how to add and remove elements from it using the push and pop methods. 
We also covered a very famous example of how to use a stack.
Resources for Article:
Further resources on this subject:
Organizing Backbone Applications - Structure, Optimize, and Deploy [article]
Introduction to Modern OpenGL [article]
Customizing the Backend Editing in TYPO3 Templates [article]
Classifying with Real-world Examples

Packt
24 Mar 2015
32 min read
This article by the authors, Luis Pedro Coelho and Willi Richert, of the book, Building Machine Learning Systems with Python - Second Edition, focuses on the topic of classification. (For more resources related to this topic, see here.) You have probably already used this form of machine learning as a consumer, even if you were not aware of it. If you have any modern e-mail system, it will likely have the ability to automatically detect spam. That is, the system will analyze all incoming e-mails and mark them as either spam or not-spam. Often, you, the end user, will be able to manually tag e-mails as spam or not, in order to improve its spam detection ability. This is a form of machine learning where the system is taking examples of two types of messages: spam and ham (the typical term for "non spam e-mails") and using these examples to automatically classify incoming e-mails. The general method of classification is to use a set of examples of each class to learn rules that can be applied to new examples. This is one of the most important machine learning modes and is the topic of this article. Working with text such as e-mails requires a specific set of techniques and skills. For the moment, we will work with a smaller, easier-to-handle dataset. The example question for this article is, "Can a machine distinguish between flower species based on images?" We will use two datasets where measurements of flower morphology are recorded along with the species for several specimens. We will explore these small datasets using a few simple algorithms. At first, we will write classification code ourselves in order to understand the concepts, but we will quickly switch to using scikit-learn whenever possible. The goal is to first understand the basic principles of classification and then progress to using a state-of-the-art implementation. The Iris dataset The Iris dataset is a classic dataset from the 1930s; it is one of the first modern examples of statistical classification. The dataset is a collection of morphological measurements of several Iris flowers. These measurements will enable us to distinguish multiple species of the flowers. Today, species are identified by their DNA fingerprints, but in the 1930s, DNA's role in genetics had not yet been discovered. The following four attributes of each plant were measured: sepal length sepal width petal length petal width In general, we will call the individual numeric measurements we use to describe our data features. These features can be directly measured or computed from intermediate data. This dataset has four features. Additionally, for each plant, the species was recorded. The problem we want to solve is, "Given these examples, if we see a new flower out in the field, could we make a good prediction about its species from its measurements?" This is the supervised learning or classification problem: given labeled examples, can we design a rule to be later applied to other examples? A more familiar example to modern readers who are not botanists is spam filtering, where the user can mark e-mails as spam, and systems use these as well as the non-spam e-mails to determine whether a new, incoming message is spam or not. For the moment, the Iris dataset serves our purposes well. It is small (150 examples, four features each) and can be easily visualized and manipulated. Visualization is a good first step Datasets will grow to thousands of features. 
With only four in our starting example, we can easily plot all two-dimensional projections on a single page. We will build intuitions on this small example, which can then be extended to large datasets with many more features. Visualizations are excellent at the initial exploratory phase of the analysis as they allow you to learn the general features of your problem as well as catch problems that occurred with data collection early. Each subplot in the following plot shows all points projected into two of the dimensions. The outlying group (triangles) are the Iris Setosa plants, while Iris Versicolor plants are in the center (circle) and Iris Virginica are plotted with x marks. We can see that there are two large groups: one is of Iris Setosa and another is a mixture of Iris Versicolor and Iris Virginica.   In the following code snippet, we present the code to load the data and generate the plot: >>> from matplotlib import pyplot as plt >>> import numpy as np   >>> # We load the data with load_iris from sklearn >>> from sklearn.datasets import load_iris >>> data = load_iris()   >>> # load_iris returns an object with several fields >>> features = data.data >>> feature_names = data.feature_names >>> target = data.target >>> target_names = data.target_names   >>> for t in range(3): ...   if t == 0: ...       c = 'r' ...       marker = '>' ...   elif t == 1: ...       c = 'g' ...       marker = 'o' ...   elif t == 2: ...       c = 'b' ...       marker = 'x' ...   plt.scatter(features[target == t,0], ...               features[target == t,1], ...               marker=marker, ...               c=c) Building our first classification model If the goal is to separate the three types of flowers, we can immediately make a few suggestions just by looking at the data. For example, petal length seems to be able to separate Iris Setosa from the other two flower species on its own. We can write a little bit of code to discover where the cut-off is: >>> # We use NumPy fancy indexing to get an array of strings: >>> labels = target_names[target]   >>> # The petal length is the feature at position 2 >>> plength = features[:, 2]   >>> # Build an array of booleans: >>> is_setosa = (labels == 'setosa')   >>> # This is the important step: >>> max_setosa =plength[is_setosa].max() >>> min_non_setosa = plength[~is_setosa].min() >>> print('Maximum of setosa: {0}.'.format(max_setosa)) Maximum of setosa: 1.9.   >>> print('Minimum of others: {0}.'.format(min_non_setosa)) Minimum of others: 3.0. Therefore, we can build a simple model: if the petal length is smaller than 2, then this is an Iris Setosa flower; otherwise it is either Iris Virginica or Iris Versicolor. This is our first model and it works very well in that it separates Iris Setosa flowers from the other two species without making any mistakes. In this case, we did not actually do any machine learning. Instead, we looked at the data ourselves, looking for a separation between the classes. Machine learning happens when we write code to look for this separation automatically. The problem of recognizing Iris Setosa apart from the other two species was very easy. However, we cannot immediately see what the best threshold is for distinguishing Iris Virginica from Iris Versicolor. We can even see that we will never achieve perfect separation with these features. We could, however, look for the best possible separation, the separation that makes the fewest mistakes. For this, we will perform a little computation. 
We first select only the non-Setosa features and labels: >>> # ~ is the boolean negation operator >>> features = features[~is_setosa] >>> labels = labels[~is_setosa] >>> # Build a new target variable, is_virginica >>> is_virginica = (labels == 'virginica') Here we are heavily using NumPy operations on arrays. The is_setosa array is a Boolean array and we use it to select a subset of the other two arrays, features and labels. Finally, we build a new boolean array, virginica, by using an equality comparison on labels. Now, we run a loop over all possible features and thresholds to see which one results in better accuracy. Accuracy is simply the fraction of examples that the model classifies correctly. >>> # Initialize best_acc to impossibly low value >>> best_acc = -1.0 >>> for fi in range(features.shape[1]): ... # We are going to test all possible thresholds ... thresh = features[:,fi] ... for t in thresh: ...   # Get the vector for feature `fi` ...   feature_i = features[:, fi] ...   # apply threshold `t` ...   pred = (feature_i > t) ...   acc = (pred == is_virginica).mean() ...   rev_acc = (pred == ~is_virginica).mean() ...   if rev_acc > acc: ...       reverse = True ...       acc = rev_acc ...   else: ...       reverse = False ... ...   if acc > best_acc: ...     best_acc = acc ...     best_fi = fi ...     best_t = t ...     best_reverse = reverse We need to test two types of thresholds for each feature and value: we test a greater than threshold and the reverse comparison. This is why we need the rev_acc variable in the preceding code; it holds the accuracy of reversing the comparison. The last few lines select the best model. First, we compare the predictions, pred, with the actual labels, is_virginica. The little trick of computing the mean of the comparisons gives us the fraction of correct results, the accuracy. At the end of the for loop, all the possible thresholds for all the possible features have been tested, and the variables best_fi, best_t, and best_reverse hold our model. This is all the information we need to be able to classify a new, unknown object, that is, to assign a class to it. The following code implements exactly this method: def is_virginica_test(fi, t, reverse, example):    "Apply threshold model to a new example"    test = example[fi] > t    if reverse:        test = not test    return test What does this model look like? If we run the code on the whole data, the model that is identified as the best makes decisions by splitting on the petal width. One way to gain intuition about how this works is to visualize the decision boundary. That is, we can see which feature values will result in one decision versus the other and exactly where the boundary is. In the following screenshot, we see two regions: one is white and the other is shaded in grey. Any datapoint that falls on the white region will be classified as Iris Virginica, while any point that falls on the shaded side will be classified as Iris Versicolor. In a threshold model, the decision boundary will always be a line that is parallel to one of the axes. The plot in the preceding screenshot shows the decision boundary and the two regions where points are classified as either white or grey. It also shows (as a dashed line) an alternative threshold, which will achieve exactly the same accuracy. Our method chose the first threshold it saw, but that was an arbitrary choice. 
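To see the model in action, here is a small, hypothetical usage sketch (the measurements are made up and the snippet is not part of the book's code); it applies the is_virginica_test function defined above, together with the best_fi, best_t, and best_reverse values found by the search loop, to a new example:
new_example = [5.8, 2.7, 4.1, 1.0]  # made-up sepal/petal measurements in cm
if is_virginica_test(best_fi, best_t, best_reverse, new_example):
    print('Predicted species: Iris Virginica')
else:
    print('Predicted species: Iris Versicolor')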
Evaluation – holding out data and cross-validation The model discussed in the previous section is a simple model; it achieves 94 percent accuracy of the whole data. However, this evaluation may be overly optimistic. We used the data to define what the threshold will be, and then we used the same data to evaluate the model. Of course, the model will perform better than anything else we tried on this dataset. The reasoning is circular. What we really want to do is estimate the ability of the model to generalize to new instances. We should measure its performance in instances that the algorithm has not seen at training. Therefore, we are going to do a more rigorous evaluation and use held-out data. For this, we are going to break up the data into two groups: on one group, we'll train the model, and on the other, we'll test the one we held out of training. The full code, which is an adaptation of the code presented earlier, is available on the online support repository. Its output is as follows: Training accuracy was 96.0%. Testing accuracy was 90.0% (N = 50). The result on the training data (which is a subset of the whole data) is apparently even better than before. However, what is important to note is that the result in the testing data is lower than that of the training error. While this may surprise an inexperienced machine learner, it is expected that testing accuracy will be lower than the training accuracy. To see why, look back at the plot that showed the decision boundary. Consider what would have happened if some of the examples close to the boundary were not there or that one of them between the two lines was missing. It is easy to imagine that the boundary will then move a little bit to the right or to the left so as to put them on the wrong side of the border. The accuracy on the training data, the training accuracy, is almost always an overly optimistic estimate of how well your algorithm is doing. We should always measure and report the testing accuracy, which is the accuracy on a collection of examples that were not used for training. These concepts will become more and more important as the models become more complex. In this example, the difference between the accuracy measured on training data and on testing data is not very large. When using a complex model, it is possible to get 100 percent accuracy in training and do no better than random guessing on testing! One possible problem with what we did previously, which was to hold out data from training, is that we only used half the data for training. Perhaps it would have been better to use more training data. On the other hand, if we then leave too little data for testing, the error estimation is performed on a very small number of examples. Ideally, we would like to use all of the data for training and all of the data for testing as well, which is impossible. We can achieve a good approximation of this impossible ideal by a method called cross-validation. One simple form of cross-validation is leave-one-out cross-validation. We will take an example out of the training data, learn a model without this example, and then test whether the model classifies this example correctly. This process is then repeated for all the elements in the dataset. 
The following code implements exactly this type of cross-validation: >>> correct = 0.0 >>> for ei in range(len(features)):      # select all but the one at position `ei`:      training = np.ones(len(features), bool)      training[ei] = False      testing = ~training      model = fit_model(features[training], is_virginica[training])      predictions = predict(model, features[testing])      correct += np.sum(predictions == is_virginica[testing]) >>> acc = correct/float(len(features)) >>> print('Accuracy: {0:.1%}'.format(acc)) Accuracy: 87.0% At the end of this loop, we will have tested a series of models on all the examples and have obtained a final average result. When using cross-validation, there is no circularity problem because each example was tested on a model which was built without taking that datapoint into account. Therefore, the cross-validated estimate is a reliable estimate of how well the models would generalize to new data. The major problem with leave-one-out cross-validation is that we are now forced to perform many times more work. In fact, you must learn a whole new model for each and every example and this cost will increase as our dataset grows. We can get most of the benefits of leave-one-out at a fraction of the cost by using x-fold cross-validation, where x stands for a small number. For example, to perform five-fold cross-validation, we break up the data into five groups, so-called five folds. Then you learn five models: each time you will leave one fold out of the training data. The resulting code will be similar to the code given earlier in this section, but we leave 20 percent of the data out instead of just one element. We test each of these models on the left-out fold and average the results.   The preceding figure illustrates this process for five blocks: the dataset is split into five pieces. For each fold, you hold out one of the blocks for testing and train on the other four. You can use any number of folds you wish. There is a trade-off between computational efficiency (the more folds, the more computation is necessary) and accurate results (the more folds, the closer you are to using the whole of the data for training). Five folds is often a good compromise. This corresponds to training with 80 percent of your data, which should already be close to what you will get from using all the data. If you have little data, you can even consider using 10 or 20 folds. In the extreme case, if you have as many folds as datapoints, you are simply performing leave-one-out cross-validation. On the other hand, if computation time is an issue and you have more data, 2 or 3 folds may be the more appropriate choice. When generating the folds, you need to be careful to keep them balanced. For example, if all of the examples in one fold come from the same class, then the results will not be representative. We will not go into the details of how to do this, because the machine learning package scikit-learn will handle them for you. We have now generated several models instead of just one. So, "What final model do we return and use for new data?" The simplest solution is now to train a single overall model on all your training data. The cross-validation loop gives you an estimate of how well this model should generalize. A cross-validation schedule allows you to use all your data to estimate whether your methods are doing well. At the end of the cross-validation loop, you can then use all your data to train a final model. 
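For comparison, the following is a minimal sketch of the same idea using scikit-learn directly (it is an assumed example, not code from the book's repository, and the nearest-neighbour classifier is only a stand-in, not the threshold model built earlier). The cross_val_score helper builds the five folds for us and keeps them balanced across classes when the estimator is a classifier:
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score  # sklearn.cross_validation in older releases
from sklearn.neighbors import KNeighborsClassifier

data = load_iris()
classifier = KNeighborsClassifier(n_neighbors=1)  # example classifier only
scores = cross_val_score(classifier, data.data, data.target, cv=5)  # cv=5 means five folds
print('Mean cross-validated accuracy: {0:.1%}'.format(scores.mean()))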
Although it was not properly recognized when machine learning was starting out as a field, nowadays, it is seen as a very bad sign to even discuss the training accuracy of a classification system. This is because the results can be very misleading and even just presenting them marks you as a newbie in machine learning. We always want to measure and compare either the error on a held-out dataset or the error estimated using a cross-validation scheme. Building more complex classifiers In the previous section, we used a very simple model: a threshold on a single feature. Are there other types of systems? Yes, of course! Many others. To think of the problem at a higher abstraction level, "What makes up a classification model?" We can break it up into three parts: The structure of the model: How exactly will a model make decisions? In this case, the decision depended solely on whether a given feature was above or below a certain threshold value. This is too simplistic for all but the simplest problems. The search procedure: How do we find the model we need to use? In our case, we tried every possible combination of feature and threshold. You can easily imagine that as models get more complex and datasets get larger, it rapidly becomes impossible to attempt all combinations and we are forced to use approximate solutions. In other cases, we need to use advanced optimization methods to find a good solution (fortunately, scikit-learn already implements these for you, so using them is easy even if the code behind them is very advanced). The gain or loss function: How do we decide which of the possibilities tested should be returned? Rarely do we find the perfect solution, the model that never makes any mistakes, so we need to decide which one to use. We used accuracy, but sometimes it will be better to optimize so that the model makes fewer errors of a specific kind. For example, in spam filtering, it may be worse to delete a good e-mail than to erroneously let a bad e-mail through. In that case, we may want to choose a model that is conservative in throwing out e-mails rather than the one that just makes the fewest mistakes overall. We can discuss these issues in terms of gain (which we want to maximize) or loss (which we want to minimize). They are equivalent, but sometimes one is more convenient than the other. We can play around with these three aspects of classifiers and get different systems. A simple threshold is one of the simplest models available in machine learning libraries and only works well when the problem is very simple, such as with the Iris dataset. In the next section, we will tackle a more difficult classification task that requires a more complex structure. In our case, we optimized the threshold to minimize the number of errors. Alternatively, we might have different loss functions. It might be that one type of error is much costlier than the other. In a medical setting, false negatives and false positives are not equivalent. A false negative (when the result of a test comes back negative, but that is false) might lead to the patient not receiving treatment for a serious disease. A false positive (when the test comes back positive even though the patient does not actually have that disease) might lead to additional tests to confirm or unnecessary treatment (which can still have costs, including side effects from the treatment, but are often less serious than missing a diagnostic). Therefore, depending on the exact setting, different trade-offs can make sense. 
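As a toy illustration of trading accuracy against cost, the sketch below scores boolean predictions with unequal penalties; the particular cost values are arbitrary assumptions for illustration, not numbers used in the book:

```python
import numpy as np

def weighted_cost(predictions, truth, cost_fp=1.0, cost_fn=10.0):
    # `predictions` and `truth` are boolean arrays (like is_virginica).
    # Count the two kinds of mistake separately and weight them: here a
    # false negative is assumed to be ten times worse than a false positive.
    false_positives = np.sum(predictions & ~truth)
    false_negatives = np.sum(~predictions & truth)
    return cost_fp * false_positives + cost_fn * false_negatives
```

A model chosen to minimize this quantity may make more mistakes overall than the most accurate one, but the mistakes it does make are the cheaper kind.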
At one extreme, if the disease is fatal and the treatment is cheap with very few negative side-effects, then you want to minimize false negatives as much as you can. What the gain/cost function should be is always dependent on the exact problem you are working on. When we present a general-purpose algorithm, we often focus on minimizing the number of mistakes, achieving the highest accuracy. However, if some mistakes are costlier than others, it might be better to accept a lower overall accuracy to minimize the overall costs.

A more complex dataset and a more complex classifier

We will now look at a slightly more complex dataset. This will motivate the introduction of a new classification algorithm and a few other ideas.

Learning about the Seeds dataset

We now look at another agricultural dataset, which is still small, but already too large to plot exhaustively on a page as we did with Iris. This dataset consists of measurements of wheat seeds. There are seven features, which are as follows:

area A
perimeter P
compactness C = 4πA/P²
length of kernel
width of kernel
asymmetry coefficient
length of kernel groove

There are three classes, corresponding to three wheat varieties: Canadian, Kama, and Rosa. As earlier, the goal is to be able to classify the species based on these morphological measurements. Unlike the Iris dataset, which was collected in the 1930s, this is a very recent dataset and its features were automatically computed from digital images. This is how image pattern recognition can be implemented: you can take images, in digital form, compute a few relevant features from them, and use a generic classification system. For the moment, we will work with the features that are given to us.

UCI Machine Learning Dataset Repository

The University of California at Irvine (UCI) maintains an online repository of machine learning datasets (at the time of writing, they list 233 datasets). Both the Iris and the Seeds dataset used in this article were taken from there. The repository is available online at http://archive.ics.uci.edu/ml/.

Features and feature engineering

One interesting aspect of these features is that the compactness feature is not actually a new measurement, but a function of the previous two features, area and perimeter. It is often very useful to derive new combined features. Trying to create new features is generally called feature engineering. It is sometimes seen as less glamorous than algorithms, but it often matters more for performance (a simple algorithm on well-chosen features will perform better than a fancy algorithm on not-so-good features). In this case, the original researchers computed the compactness, which is a typical feature for shapes. It is also sometimes called roundness. This feature will have the same value for two kernels, one of which is twice as big as the other one, but with the same shape. However, it will have different values for kernels that are very round (when the feature is close to one) when compared to kernels that are elongated (when the feature is closer to zero). The goals of a good feature are to simultaneously vary with what matters (the desired output) and be invariant with what does not. For example, compactness does not vary with size, but varies with the shape. In practice, it might be hard to achieve both objectives perfectly, but we want to approximate this ideal. You will need to use background knowledge to design good features.
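As a small, hedged illustration of such a derived feature (the Seeds data already ships with this column, so you would not normally recompute it), compactness can be computed from area and perimeter in one line of NumPy:

```python
import numpy as np

def compactness(area, perimeter):
    # C = 4 * pi * A / P**2: close to 1 for round kernels,
    # smaller for elongated ones, and independent of overall size.
    return 4 * np.pi * area / perimeter ** 2
```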
Fortunately, for many problem domains, there is already a vast literature of possible features and feature-types that you can build upon. For images, all of the previously mentioned features are typical and computer vision libraries will compute them for you. In text-based problems too, there are standard solutions that you can mix and match. When possible, you should use your knowledge of the problem to design a specific feature or to select which ones from the literature are more applicable to the data at hand. Even before you have data, you must decide which data is worthwhile to collect. Then, you hand all your features to the machine to evaluate and compute the best classifier.

A natural question is whether we can select good features automatically. This problem is known as feature selection. There are many methods that have been proposed for this problem, but in practice very simple ideas work best. For the small problems we are currently exploring, it does not make sense to use feature selection, but if you had thousands of features, then throwing out most of them might make the rest of the process much faster.

Nearest neighbor classification

For this dataset, we will introduce a new classifier: the nearest neighbor classifier. The nearest neighbor classifier is very simple. When classifying a new element, it looks at the training data for the object that is closest to it, its nearest neighbor. Then, it returns its label as the answer. Notice that this model performs perfectly on its training data! For each point, its closest neighbor is itself, and so its label matches perfectly (unless two examples with different labels have exactly the same feature values, which would indicate that the features you are using are not very descriptive). Therefore, it is essential to test the classification using a cross-validation protocol. The nearest neighbor method can be generalized to look not at a single neighbor, but at multiple ones, and take a vote amongst the neighbors. This makes the method more robust to outliers or mislabeled data.

Classifying with scikit-learn

We have been using handwritten classification code, but Python is a very appropriate language for machine learning because of its excellent libraries. In particular, scikit-learn has become the standard library for many machine learning tasks, including classification. We are going to use its implementation of nearest neighbor classification in this section. The scikit-learn classification API is organized around classifier objects. These objects have the following two essential methods:

fit(features, labels): This is the learning step and fits the parameters of the model
predict(features): This method can only be called after fit and returns a prediction for one or more inputs

Here is how we could use its implementation of k-nearest neighbors for our data. We start by importing the KNeighborsClassifier object from the sklearn.neighbors submodule:

>>> from sklearn.neighbors import KNeighborsClassifier

The scikit-learn module is imported as sklearn (sometimes you will also find that scikit-learn is referred to using this short name instead of the full name). All of the sklearn functionality is in submodules, such as sklearn.neighbors. We can now instantiate a classifier object. In the constructor, we specify the number of neighbors to consider, as follows:

>>> classifier = KNeighborsClassifier(n_neighbors=1)

If we do not specify the number of neighbors, it defaults to 5, which is often a good choice for classification.
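Before evaluating it properly, a single fit/predict round looks like the following (a sketch assuming `features` and `labels` already hold the Seeds measurements and variety labels; holding out only the last example is purely for illustration):

```python
>>> # Train on all but the last example, then classify that held-out example:
>>> classifier.fit(features[:-1], labels[:-1])
>>> prediction = classifier.predict(features[-1:])
```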
We will want to use cross-validation (of course) to look at our data. The scikit-learn module also makes this easy:

>>> from sklearn.cross_validation import KFold
>>> kf = KFold(len(features), n_folds=5, shuffle=True)
>>> # `means` will be a list of mean accuracies (one entry per fold)
>>> means = []
>>> for training, testing in kf:
...     # We fit a model for this fold, then apply it to the
...     # testing data with `predict`:
...     classifier.fit(features[training], labels[training])
...     prediction = classifier.predict(features[testing])
...
...     # np.mean on an array of booleans returns the fraction
...     # of correct decisions for this fold:
...     curmean = np.mean(prediction == labels[testing])
...     means.append(curmean)
>>> print("Mean accuracy: {:.1%}".format(np.mean(means)))
Mean accuracy: 90.5%

Using five folds for cross-validation, for this dataset, with this algorithm, we obtain 90.5 percent accuracy. As we discussed in the earlier section, the cross-validation accuracy is lower than the training accuracy, but this is a more credible estimate of the performance of the model.

Looking at the decision boundaries

We will now examine the decision boundary. In order to plot these on paper, we will simplify and look at only two dimensions. Take a look at the following plot:

Canadian examples are shown as diamonds, Kama seeds as circles, and Rosa seeds as triangles. Their respective areas are shown as white, black, and grey. You might be wondering why the regions are so horizontal, almost weirdly so. The problem is that the x axis (area) ranges from 10 to 22, while the y axis (compactness) ranges from 0.75 to 1.0. This means that a small change in x is actually much larger than a small change in y. So, when we compute the distance between points, we are, for the most part, only taking the x axis into account. This is also a good example of why it is a good idea to visualize our data and look for red flags or surprises.

If you studied physics (and you remember your lessons), you might have already noticed that we had been summing up lengths, areas, and dimensionless quantities, mixing up our units (which is something you never want to do in a physical system). We need to normalize all of the features to a common scale. There are many solutions to this problem; a simple one is to normalize to z-scores. The z-score of a value is how far away from the mean it is, in units of standard deviation. It comes down to this operation:

f' = (f - μ) / σ

In this formula, f is the old feature value, f' is the normalized feature value, μ is the mean of the feature, and σ is the standard deviation. Both μ and σ are estimated from training data. Independent of what the original values were, after z-scoring, a value of zero corresponds to the training mean, positive values are above the mean, and negative values are below it.

The scikit-learn module makes it very easy to use this normalization as a preprocessing step. We are going to use a pipeline of transformations: the first element will do the transformation and the second element will do the classification. We start by importing both the pipeline and the feature scaling classes as follows:

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler

Now, we can combine them.

>>> classifier = KNeighborsClassifier(n_neighbors=1)
>>> classifier = Pipeline([('norm', StandardScaler()), ('knn', classifier)])

The Pipeline constructor takes a list of pairs (str, clf).
Each pair corresponds to a step in the pipeline: the first element is a string naming the step, while the second element is the object that performs the transformation. Advanced usage of the object uses these names to refer to different steps. After normalization, every feature is in the same units (technically, every feature is now dimensionless; it has no units) and we can more confidently mix dimensions. In fact, if we now run our nearest neighbor classifier, we obtain 93 percent accuracy, estimated with the same five-fold cross-validation code shown previously! Look at the decision space again in two dimensions:   The boundaries are now different and you can see that both dimensions make a difference for the outcome. In the full dataset, everything is happening on a seven-dimensional space, which is very hard to visualize, but the same principle applies; while a few dimensions are dominant in the original data, after normalization, they are all given the same importance. Binary and multiclass classification The first classifier we used, the threshold classifier, was a simple binary classifier. Its result is either one class or the other, as a point is either above the threshold value or it is not. The second classifier we used, the nearest neighbor classifier, was a natural multiclass classifier, its output can be one of the several classes. It is often simpler to define a simple binary method than the one that works on multiclass problems. However, we can reduce any multiclass problem to a series of binary decisions. This is what we did earlier in the Iris dataset, in a haphazard way: we observed that it was easy to separate one of the initial classes and focused on the other two, reducing the problem to two binary decisions: Is it an Iris Setosa (yes or no)? If not, check whether it is an Iris Virginica (yes or no). Of course, we want to leave this sort of reasoning to the computer. As usual, there are several solutions to this multiclass reduction. The simplest is to use a series of one versus the rest classifiers. For each possible label ℓ, we build a classifier of the type is this ℓ or something else? When applying the rule, exactly one of the classifiers will say yes and we will have our solution. Unfortunately, this does not always happen, so we have to decide how to deal with either multiple positive answers or no positive answers.   Alternatively, we can build a classification tree. Split the possible labels into two, and build a classifier that asks, "Should this example go in the left or the right bin?" We can perform this splitting recursively until we obtain a single label. The preceding diagram depicts the tree of reasoning for the Iris dataset. Each diamond is a single binary classifier. It is easy to imagine that we could make this tree larger and encompass more decisions. This means that any classifier that can be used for binary classification can also be adapted to handle any number of classes in a simple way. There are many other possible ways of turning a binary method into a multiclass one. There is no single method that is clearly better in all cases. The scikit-learn module implements several of these methods in the sklearn.multiclass submodule. Some classifiers are binary systems, while many real-life problems are naturally multiclass. Several simple protocols reduce a multiclass problem to a series of binary decisions and allow us to apply the binary models to our multiclass problem. 
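As a hedged sketch of what such a reduction looks like in code, the sklearn.multiclass helpers can wrap any binary estimator; the wrapped logistic regression below is an arbitrary choice for illustration (it is not used elsewhere in this article), and the `training`/`testing` indices are assumed to come from a cross-validation split like the one shown earlier:

```python
>>> from sklearn.multiclass import OneVsRestClassifier
>>> from sklearn.linear_model import LogisticRegression
>>> # One binary "is it this class or something else?" model per label:
>>> ovr = OneVsRestClassifier(LogisticRegression())
>>> ovr.fit(features[training], labels[training])
>>> prediction = ovr.predict(features[testing])
```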
This means methods that are apparently only for binary data can be applied to multiclass data with little extra effort.

Summary

Classification means generalizing from examples to build a model (that is, a rule that can automatically be applied to new, unclassified objects). It is one of the fundamental tools in machine learning. In a sense, this was a very theoretical article, as we introduced generic concepts with simple examples. We went over a few operations with the Iris dataset. This is a small dataset. However, it has the advantage that we were able to plot it out and see what we were doing in detail. This is something that will be lost when we move on to problems with many dimensions and many thousands of examples; the intuitions we gained here will still be valid.

You also learned that the training error is a misleading, over-optimistic estimate of how well the model does. We must, instead, evaluate it on testing data that has not been used for training. In order not to waste too many examples in testing, a cross-validation schedule can get us the best of both worlds (at the cost of more computation). We also had a look at the problem of feature engineering. Features are not predefined for you, but choosing and designing features is an integral part of designing a machine learning pipeline. In fact, it is often the area where you can get the most improvements in accuracy, as better data beats fancier methods.

Resources for Article: Further resources on this subject: Ridge Regression [article] The Spark programming model [article] Using cross-validation [article]

article-image-installation-and-setup
Packt
07 Jul 2015
15 min read
Save for later

Installation and Setup

Packt
07 Jul 2015
15 min read
The Banana Pi is a single-board computer, which enables you to build your own individual and versatile system. In fact, it is a complete computer, including all the required elements such as a processor, memory, network, and other interfaces, which we are going to explore. It provides enough power to run even relatively complex applications suitably. In this article by, Ryad El-Dajani, author of the book, Banana Pi Cookbook, we are going to get to know the Banana Pi device. The available distributions are mentioned, as well as how to download and install these distributions. We will also examine Android in contrast to our upcoming Linux adventure. (For more resources related to this topic, see here.) Thus, you are going to transform your little piece of hardware into a functional, running computer with a working operating system. You will master the whole process of doing the required task from connecting the cables, choosing an operating system, writing the image to an SD card, and successfully booting up and shutting down your device for the first time. Banana Pi Overview In the following picture, you see a Banana Pi on the left-hand side and a Banana Pro on the right-hand side: As you can see, there are some small differences that we need to notice. The Banana Pi provides a dedicated composite video output besides the HDMI output. However, with the Banana Pro, you can connect your display via composite video output using a four-pole composite audio/video cable on the jack. In contrast to the Banana Pi, which has 26 pin headers, the Banana Pro provides 40 pins. Also the pins for the UART port interface are located below the GPIO headers on the Pi, while they are located besides the network interface on the Pro. The other two important differences are not clearly visible on the previous picture. The operating system for your device comes in the form of image files that need to be written (burned) to an SD card. The Banana Pi uses normal SD cards while the Banana Pro will only accept Micro SD cards. Moreover, the Banana Pro provides a Wi-Fi interface already on board. Therefore, you are also able to connect the Banana Pro to your wireless network, while the Pi would require an external wireless USB device. Besides the mentioned differences, the devices are very similar. You will find the following hardware components and interfaces on your device. On the back side, you will find: A20 ARM Cortex-A7 dual core central processing unit (CPU) ARM Mali400 MP2 graphics processing unit (GPU) 1 gigabyte of DDR3 memory (that is shared with the GPU) On the front side, you will find: Ethernet network interface adapter Two USB 2.0 ports A 5V micro USB power with DC in and a micro USB OTG port A SATA 2.0 port and SATA power output Various display outputs [HDMI, LVDS, and composite (integrated into jack on the Pro)] A CSI camera input connector An infrared (IR) receiver A microphone Various hardware buttons on board (power key, reset key, and UBoot key) Various LEDs (red for power status, blue for Ethernet status, and green for user defined) As you can see, you have a lot of opportunities for letting your device interact with various external components. Operating systems for the Banana Pi The Banana Pi is capable of running any operating system that supports the ARM Cortex-A7 architecture. There are several operating systems precompiled, so you are able to write the operating system to an SD card and boot your system flawlessly. 
Currently, there are the following operating systems provided officially by LeMaker, the manufacturer of the Banana Pi. Android Android is a well-known operating system for mobile phones, but it is also runnable on various other devices such as smart watches, cars, and, of course, single-board computers such as the Banana Pi. The main advantage of running Android on a single-board computer is its convenience. Anybody who uses an Android-based smartphone will recognize the graphical user interface (GUI) and may have less initial hurdles. Also, setting up a media center might be easier to do on Android than on a Linux-based system. However, there are also a few disadvantages, as you are limited to software that is provided by an Android store such as Google Play. As most apps are optimized for mobile use at the moment, you will not find a lot of usable software for your Banana Pi running Android, except some Games and Multimedia applications. Moreover, you are required to use special Windows software called PhoenixCard to be able to prepare an Android SD card. In this article, we are going to ignore the installing of Android. For further information, please see Installing the Android OS image (LeMaker Wiki) at http://wiki.lemaker.org/BananaPro/Pi:SD_card_installation. Linux Most of the Linux users never realize that they are actually using Linux when operating their phones, appliances, routers, and many more products, as most of its magic happens in the background. We are going to dig into this adventure to discover its possibilities when running on our Banana Pi device. The following Linux-based operating systems—so-called distributions—are used by the majority of the Banana Pi user base and are supported officially by the manufacturer: Lubuntu: This is a lightweight distribution based on the well-known Ubuntu using the LXDE desktop, which is principally a good choice, if you are a Windows user. Raspbian: This is a distribution based on Debian, which was initially produced for the Raspberry Pi (hence the name). As a lot of Raspberry Pi owners are running Raspbian on their devices while also experimenting with the Banana Pi, LeMaker ported the original Raspbian distribution to the Banana Pi. Raspbian also comes with an LXDE desktop by default. Bananian: This too is a Debian-based Linux distribution optimized exclusively for the Banana Pi and its siblings. All of the aforementioned distributions are based on the well-known distribution, Debian. Besides the huge user base, all Debian-based distributions use the same package manager Apt (Advanced Packaging Tool) to search for and install new software, and all are similar to use. There are still more distributions that are officially supported by LeMaker, such as Berryboot, LeMedia, OpenSUSE, Fedora, Gentoo, Scratch, ArchLinux, Open MediaVault, and OpenWrt. All of them have their pros and cons or their specific use cases. If you are an experienced Linux user, you may choose your preferred distribution from the mentioned list, as most of the recipes are similar to, or even equally usable on, most of the Linux-based operating systems. Moreover, the Banana Pi community publishes various customized Linux distributions for the Banana Pi regularly. The possible advantages of a customized distribution may include enabled and optimized hardware acceleration capabilities, supportive helper scripts, fully equipped desktop environments, and much more. 
However, when deciding to use a customized distribution, there is no official support by LeMaker and you have to contact the publisher in case you encounter bugs, or need help. You can also check the customized Arch Linux image that author have built (http://blog.eldajani.net/banana-pi-arch-linux-customized-distribution/) for the Banana Pi and Banana Pro, including several useful applications. Downloading an operating system for the Banana Pi The following two recipes will explain how to set up the SD card with the desired operating system and how to get the Banana Pi up and running for the first time. This recipe is a predecessor. Besides the device itself, you will need at least a source for energy, which is usually a USB power supply and an SD card to boot your Banana Pi. Also, a network cable and connection is highly recommended to be able to interact with your Banana Pi from another computer via a remote shell using the application. You might also want to actually see something on a display. Then, you will need to connect your Banana Pi via HDMI, composite, or LVDS to an external screen. It is recommended that you use an HDMI Version 1.4 cable since lower versions can possibly cause issues. Besides inputting data using a remote shell, you can directly connect an USB keyboard and mouse to your Banana Pi via the USB ports. After completing the required tasks in the upcoming recipes, you will be able to boot your Banana Pi. Getting ready The following components are required for this recipe: Banana Pi SD card (minimum class 4; class 10 is recommended) USB power supply (5V 2A recommended) A computer with an SD card reader/writer (to write the image to the SD card) Furthermore, you are going to need an Internet connection to download a Linux distribution or Android. A few optional but highly recommended components are: Connection to a display (via HDMI or composite) Network connection via Ethernet USB keyboard and mouse You can acquire these items from various retailers. All items shown in the previous two pictures were bought from an online retailer that is known for originally selling books. However, the Banana Pi and the other products can be acquired from a large number of retailers. It is recommended to get a USB power supply with 2000mA (2A) output. How to do it… To download an operating system for Banana Pi, follow these steps: Download an image of your desired operating system. We are going to download Android and Raspbian from the official LeMaker image files website: http://www.lemaker.org/resources/9-38/image_files.html. The following screenshot shows the LeMaker website where you can download the official images: If you are clicking on one of the mirrors (such as Google Drive, Dropbox, and so on), you will be redirected to the equivalent file-hosting service. From there, you are actually able to download the archive file. Once your archive containing the image is downloaded, you are ready to unpack the downloaded archive, which we will do in the upcoming recipes. Setting up the SD card on Windows This recipe will explain how to set up the SD card using a Windows operating system. How to do it… In the upcoming steps, we will unpack the archive containing the operating system image for the Banana Pi and write the image to the SD card: Open the downloaded archive with 7-Zip. The following screenshot shows the 7-Zip application opening a compressed .tgz archive: Unpack the archive to a directory until you get a file with the file extension .img. 
If it is a .tgz or .tar.gz file, you will need to unpack the archive twice. Create a backup of the contents of the SD card, as everything on the SD card is going to be erased unrecoverably. Open SD Formatter (https://www.sdcard.org/downloads/formatter_4/) and check the disk letter (E: in the following screenshot). Choose Option to open the Option Setting window and choose FORMAT TYPE: FULL (Erase) and FORMAT SIZE ADJUSTMENT: ON. When everything is configured correctly, check again to see whether you are using the correct disk and click Format to start the formatting process.

Writing a Linux distribution image to the SD card on Windows

The following steps explain how to write a Linux-based distribution to the SD card on Windows: Format the SD card using SD Formatter, which we covered in the previous section. Open the Win32 Disk Imager (http://sourceforge.net/projects/win32diskimager/). Choose the image file by clicking on the directory button. Check whether you are going to write to the correct disk and then click on Write. Once the burning process is done, you are ready to insert the freshly prepared SD card containing your Linux operating system into the Banana Pi and boot it up for the first time.

Booting up and shutting down the Banana Pi

This recipe will explain how to boot up and shut down the Banana Pi. As the Banana Pi is a real computer, these tasks are just as important as they are on your desktop computer. The booting process starts the Linux kernel and several important services. Shutting down stops them accordingly and does not power off the Banana Pi until all data is synchronized with the SD card or external components correctly.

How to do it…

We are going to boot up and shut down the Banana Pi.

Booting up

Do the following steps to boot up your Banana Pi: Attach the Ethernet cable to your local network. Connect your Banana Pi to a display. Plug in a USB keyboard and mouse. Insert the SD card into your device. Power your Banana Pi by plugging in the USB power cable. The next screenshot shows the desktop of Raspbian after a successful boot:

Shutting down Linux

To shut down your Linux-based distribution, you either use the shutdown command or do it via the desktop environment (in the case of Raspbian, it is called LXDE). For the latter method, these are the steps: Click on the LXDE icon in the lower-left corner. Click on Logout. Click on Shutdown in the upcoming window. To shut down your operating system via the shell, type in the following command:

$ sudo shutdown -h now

Connecting via SSH on Windows using PuTTY

The following recipe shows you how to connect to your Banana Pi remotely using an open source application called PuTTY.

Getting ready

For this recipe, you will need the following ingredients: A booted up Linux operating system on your Banana Pi connected to your local network The PuTTY application on your Windows PC that is also connected to your local area network

How to do it…

To connect to your Banana Pi via SSH on Windows, perform the following: Run putty.exe. You will see the PuTTY Configuration dialog. Enter the IP address of the Banana Pi and leave the port number as 22. Click on the Open button. A new terminal will appear, attempting a connection to the Banana Pi. When connecting to the Banana Pi for the first time, you will see a PuTTY security alert. The following screenshot shows the PuTTY Security Alert window: Trust the connection by clicking on Yes. You will be requested to enter the login credentials. Use the default username bananapi and password bananapi.
When you are done, you should be welcomed by the shell of your Banana Pi. The following screenshot shows the shell of your Banana Pi accessed via SSH using PuTTY on Windows: To quit your SSH session, execute the command exit or press Ctrl + D. Searching, installing, and removing the software Once you have your decent operating system on the Banana Pi, sooner or later you are going to require a new software. As most software for Linux systems is published as open source, you can obtain the source code and compile it for yourself. One alternative is to use a package manager. A lot of software is precompiled and provided as installable packages by the so-called repositories. In case of Debian-based distributions (for example, Raspbian, Bananian, and Lubuntu), the package manager that uses these repositories is called Advanced Packaging Tool (Apt). The two most important tools for our requirements will be apt-get and apt-cache. In this recipe, we will cover the searching, the installing, and removing of software using the Apt utilities. Getting ready The following ingredients are required for this recipe. A booted Debian-based operating system on your Banana Pi An Internet connection How to do it… We will separate this recipe into searching for, installing and removing of packages. Searching for packages In the upcoming example, we will search for a solitaire game: Connect to your Banana Pi remotely or open a terminal on the desktop. Type the following command into the shell: $ apt-cache search solitaire You will get a list of packages that contain the string solitaire in their package name or description. Each line represents a package and shows the package name and description separated by a dash (-). Now we have obtained a list of solitaire games: The preceding screenshot shows the output after searching for packages containing the string solitaire using the apt-cache command. Installing a package We are going to install a package by using its package name. From the previous received list, we select the package ace-of-penguins. Type the following command into the shell: $ sudo apt-get install ace-of-penguins If asked to type the password for sudo, enter the user's password. If a package requires additional packages (dependencies), you will be asked to confirm the additional packages. In this case, enter Y. After downloading and installing, the desired package is installed: Removing a package When you want to uninstall (remove) a package, you also use the apt-get command: Type the following command into a shell: $ sudo apt-get remove ace-of-penguins If asked to type the password for sudo, enter the user's password. You will be asked to confirm the removal. Enter Y. After this process, the package is removed from your system. You will have uninstalled the package ace-of-penguins. Summary In this article, we discovered the installation of a Linux operating system on the Banana Pi. Furthermore, we connected to the Banana Pi via the SSH protocol using PuTTY. Moreover, we discussed how to install new software using the Advanced Packaging Tool. This article is a combination of parts from the first two chapters of the Banana Pi Cookbook. In the Banana Pi Cookbook, we are diving more into detail and explain the specifics of the Banana Pro, for example, how to connect to the local network via WLAN. If you are using a Linux-based desktop computer, you will also learn how to set up the SD card and connect via SSH to your Banana Pi on your Linux computer. 
Resources for Article: Further resources on this subject: Color and motion finding [article] Controlling the Movement of a Robot with Legs [article] Develop a Digital Clock [article]

article-image-work-with-classes-in-typescript
Amey Varangaonkar
15 May 2018
8 min read
Save for later

How to work with classes in Typescript

Amey Varangaonkar
15 May 2018
8 min read
If we are developing any application using TypeScript, be it a small-scale or a large-scale application, we will use classes to manage our properties and methods. Prior to ES 2015, JavaScript did not have the concept of classes, and we used functions to create class-like behavior. TypeScript introduced classes as part of its initial release, and now we have classes in ES6 as well. The behavior of classes in TypeScript and JavaScript ES6 closely relates to the behavior of any object-oriented language that you might have worked on, such as C#. This excerpt is taken from the book TypeScript 2.x By Example written by Sachin Ohri. Object-oriented programming in TypeScript Object-oriented programming allows us to represent our code in the form of objects, which themselves are instances of classes holding properties and methods. Classes form the container of related properties and their behavior. Modeling our code in the form of classes allows us to achieve various features of object-oriented programming, which helps us write more intuitive, reusable, and robust code. Features such as encapsulation, polymorphism, and inheritance are the result of implementing classes. TypeScript, with its implementation of classes and interfaces, allows us to write code in an object-oriented fashion. This allows developers coming from traditional languages, such as Java and C#, feel right at home when learning TypeScript. Understanding classes Prior to ES 2015, JavaScript developers did not have any concept of classes; the best way they could replicate the behavior of classes was with functions. The function provides a mechanism to group together related properties and methods. The methods can be either added internally to the function or using the prototype keyword. The following is an example of such a function: function Name (firstName, lastName) { this.firstName = firstName; this.lastName = lastName; this.fullName = function() { return this.firstName + ' ' + this.lastName ; }; } In this preceding example, we have the fullName method encapsulated inside the Name function. Another way of adding methods to functions is shown in the following code snippet with the prototype keyword: function Name (firstName, lastName) { this.firstName = firstName; this.lastName = lastName; } Name.prototype.fullName = function() { return this.firstName + ' ' + this.lastName ; }; These features of functions did solve most of the issues of not having classes, but most of the dev community has not been comfortable with these approaches. Classes make this process easier. Classes provide an abstraction on top of common behavior, thus making code reusable. The following is the syntax for defining a class in TypeScript: The syntax of the class should look very similar to readers who come from an object-oriented background. To define a class, we use a class keyword followed by the name of the class. The News class has three member properties and one method. Each member has a type assigned to it and has an access modifier to define the scope. On line 10, we create an object of a class with the new keyword. Classes in TypeScript also have the concept of a constructor, where we can initialize some properties at the time of object creation. Access modifiers Once the object is created, we can access the public members of the class with the dot operator. Note that we cannot access the author property with the espn object because this property is defined as private. TypeScript provides three types of access modifiers. 
Public Any property defined with the public keyword will be freely accessible outside the class. As we saw in the previous example, all the variables marked with the public keyword were available outside the class in an object. Note that TypeScript assigns public as a default access modifier if we do not assign any explicitly. This is because the default JavaScript behavior is to have everything public. Private When a property is marked as private, it cannot be accessed outside of the class. The scope of a private variable is only inside the class when using TypeScript. In JavaScript, as we do not have access modifiers, private members are treated similarly to public members. Protected The protected keyword behaves similarly to private, with the exception that protected variables can be accessed in the derived classes. The following is one such example: class base{ protected id: number; } class child extends base{ name: string; details():string{ return `${name} has id: ${this.id}` } } In the preceding code, we extend the child class with the base class and have access to the id property inside the child class. If we create an object of the child class, we will still not have access to the id property outside. Readonly As the name suggests, a property with a readonly access modifier cannot be modified after the value has been assigned to it. The value assigned to a readonly property can only happen at the time of variable declaration or in the constructor. In the above code, line 5 gives an error stating that property name is readonly, and cannot be an assigned value. Transpiled JavaScript from classes While learning TypeScript, it is important to remember that TypeScript is a superset of JavaScript and not a new language on its own. Browsers can only understand JavaScript, so it is important for us to understand the JavaScript that is transpiled by TypeScript. TypeScript provides an option to generate JavaScript based on the ECMA standards. You can configure TypeScript to transpile into ES5 or ES6 (ES 2015) and even ES3 JavaScript by using the flag target in the tsconfig.json file. The biggest difference between ES5 and ES6 is with regard to the classes, let, and const keywords which were introduced in ES6. Even though ES6 has been around for more than a year, most browsers still do not have full support for ES6. So, if you are creating an application that would target older browsers as well, consider having the target as ES5. So, the JavaScript that's generated will be different based on the target setting. Here, we will take an example of class in TypeScript and generate JavaScript for both ES5 and ES6. The following is the class definition in TypeScript: This is the same code that we saw when we introduced classes in the Understanding Classes section. Here, we have a class named News that has three members, two of which are public and one private. The News class also has a format method, which returns a string concatenated from the member variables. Then, we create an object of the News class in line 10 and assign values to public properties. In the last line, we call the format method to print the result. Now let's look at the JavaScript transpiled by TypeScript compiler for this class. ES6 JavaScript ES6, also known as ES 2015, is the latest version of JavaScript, which provides many new features on top of ES5. Classes are one such feature; JavaScript did not have classes prior to ES6. 
The following is the code generated from the TypeScript class, which we saw previously: If you compare the preceding code with TypeScript code, you will notice minor differences. This is because classes in TypeScript and JavaScript are similar, with just types and access modifiers additional in TypeScript. In JavaScript, we do not have the concept of declaring public members. The author variable, which was defined as private and was initialized at its declaration, is converted to a constructor initialization in JavaScript. If we had not have initialized author, then the produced JavaScript would not have added author in the constructor. ES5 JavaScript ES5 is the most popular JavaScript version supported in browsers, and if you are developing an application that has to support the majority of browser versions, then you need to transpile your code to the ES5 version. This version of JavaScript does not have classes, and hence the transpiled code converts classes to functions, and methods inside the classes are converted to prototypically defined methods on the functions. The following is the code transpiled when we have the target set as ES5 in the TypeScript compiler options: As discussed earlier, the basic difference is that the class is converted to a function. The interesting aspect of this conversion is that the News class is converted to an immediately invoked function expression (IIFE). An IIFE can be identified by the parenthesis at the end of the function declaration, as we see in line 9 in the preceding code snippet. IIFEs cause the function to be executed immediately and help to maintain the correct scope of a function rather than declaring the function in a global scope. Another difference was how we defined the method format in the ES5 JavaScript. The prototype keyword is used to add the additional behavior to the function, which we see here. A couple of other differences you may have noticed include the change of the let keyword to var, as let is not supported in ES5. All variables in ES5 are defined with the var keyword. Also, the format method now does not use a template string, but standard string concatenation to print the output. TypeScript does a good job of transpiling the code to JavaScript while following recommended practices. This helps in making sure we have a robust and reusable code with minimum error cases. If you found this tutorial useful, make sure you check out the book TypeScript 2.x By Example for more hands-on tutorials on how to effectively leverage the power of TypeScript to develop and deploy state-of-the-art web applications. How to install and configure TypeScript Understanding Patterns and Architectures in TypeScript Writing SOLID JavaScript code with TypeScript
article-image-microsoft-office-excel-programming-using-vsto-30
Packt
08 Oct 2009
10 min read
Save for later

Microsoft Office Excel Programming Using VSTO 3.0

Packt
08 Oct 2009
10 min read
Microsoft Office Excel is the most frequently-used Microsoft application and one of the leading spreadsheet programs available. A spreadsheet program is a program that uses a huge grid to display data in rows and columns. The spreadsheet program can then be used to do calculation, data manipulation, and various similar tasks, against this data. Programming in Excel Microsoft Office Excel is one of the powerful Office tools that provides uncomplicated data management. It is loaded with features such as graphing, charting, pivot tables, and calculation. Even though Excel is loaded with numerous features for users, there may be some functionality that cannot be achieved by using the standard Microsoft Excel environment. In Microsoft Office Excel 2007, automation is a great mechanism for populating business documents (for example, reports) with data from backend system. This can be achieved by using VSTO 3.0 and Visual Studio 2008. Microsoft Office Excel 2007 is more programmable than ever before with support for Visual Studio Tools for Office. VSTO is aimed at Microsoft Office Excel users who want to develop Excel-based applications for other Microsoft Office Excel users. VSTO 3.0 is shipped with all of the newly-enhanced Excel objects, including improved features for building Microsoft Office based solutions. VSTO 3.0 is loaded with new features, including support for Windows form controls from within Microsoft Office tools customization. It provides the support for using .NET frameworks, objects and classes for Microsoft Office tools customization. For example, System.Data is the .NET frameworks object that can be used inside the Excel solution to process database operations. This new feature is tightly integrated with the Visual Studio 2008 IDE and gives you the comfort of design time customization of Excel documents for data display and UI customization. Similar to other Microsoft Office Tools, with Microsoft Office Excel 2007 customization using VSTO 3.0, you have two levels of customization— document-level customization and application-level customization. Document-level customization is a solution created for document-level programming and is specific to the document that is part of that particular solution. Application-level customization is a solution created for application-level programming and is specific to the application, and therefore common to all documents based on that application. In a simple Hello World demonstration, let's learn about the document level customization approach. We'll step through a simple task, showing how to create an Excel document that will display a Hello World message on startup. Hello World example using Visual Studio 2008 Open Visual Studio 2008, and create a new Excel 2007 Workbook project. Select New Project. Under Office select 2007, and then select the Excel 2007 Workbook template and name the project ExcelHelloWorld, as shown in the following image: The document selection dialog box is displayed. At this point, you need to choose the template for your design. In this example, you select a new blank template and click on the OK button. Refer to the following screenshot: The solution will be created with all of the supporting files required for our development of an Excel solution. Each solution is created with three worksheets, with default names: Sheet1, Sheet2, and Sheet3 for the workbook you're going to customize , as shown in the following image. The number of sheets in a new workbook depends on the settings in Excel. 
The following image also shows you the Excel-supported controls, placed on the leftmost side of the Visual Studio 2008 toolbox panel. You can also see the visual representation of Excel 2007 inside Visual Studio 2008. Let's write our Hello World message in a cell when we load the Excel 2007 document. Write the following code inside the ThisWorkbook.cs file. // The Startup event of workbook in our Excel solution// Startup event common to all Office application// Startup event will be fired at the start of the applicationprivate void ThisWorkbook_Startup(object sender,System.EventArgs e){// Creating the instance of active WorkSheet of Excel DocumentExcel.Worksheet AuthorWorkSheet = ThisApplication.ActiveSheetas Excel.Worksheet;// Get the range using number index through Excel cellsby setting AuthorExchange to an Excel range object starting at(1,1) and ending at (1,1)Excel.Range AuthorExcelRange = ThisApplication.get_Range(AuthorWorkSheet.Cells[1, 1],AuthorWorkSheet.Cells[1, 1]);// Setting the value in the cellAuthorExcelRange.Value2 = "Hello! this is my VSTO program forExcel 2007";} The following screenshot results after adding and executing the preceding code: Manipulation Microsoft Office Excel is one of the most comprehensive data management tools for all kinds of users. It is a tool that can be easily understood and quickly learnt. The most important feature of Microsoft Office Excel is its capability to manipulate data from different sources. Excel is one of the most powerful and user-friendly data manipulation applications. You could use Excel to predict what's ahead for your business by creating detailed financial forecasts. This powerful application has pivot table functionality that allows you to drop in your data and rearrange it to answer all kinds of business data analysis type questions. Excel helps you to build various useful analytical tools such as your monthly sales report and product sales growth analysis more easily and flexibly. Excel offers you formulae and functions that will help you to perform complex financial calculations without any manual errors. Excel can provide you with a graphical presentation of your business data by using charts and graphs. Want to know your growth levels for a specific product sales range? Check which parts of your business are performing worse? The pivot table provides more information from your business in depth. Every application in this computer world works with data. The data can be in any form and can belong to different sources. The key question for data management is where to place the data. You manage the data in two ways: data managed outside the program and data managed inside the program. The data managed outside the program includes data managed in a database, a file system, and so on. Data managed inside the program includes data in different worksheets within the workbook, embedded images, and so on. Data manipulation For users who are not satisfied with the default features in Microsoft Office Excel, VSTO programming makes Excel more flexible, and provides a development tool for the creation of custom solutions for data manipulation and data analysis. Custom programming using VSTO 3.0 improves most part of the Microsoft Office Excel solution. 
Custom programming using VSTO 3.0 provides a wide range of advantages, including saving time by automating most of the frequently-performed tasks, reducing errors due to manual operations, as well as enforcing standards for data processing, and building the capability to seamlessly integrate with other applications seamlessly. Reading worksheet cells There are many ways to manipulate data and write to the cells in an Excel worksheet. Let's see some of these ways. We can read worksheet cells directly through the Cells property of the sheets, rows, and columns, and can set a value directly by using the cell's row and column index. Open Visual Studio 2008 and create a new Excel solution. Refer to the previous example for full instructions of how to do this. Write the following code inside the ThisWorkbook.cs file. In this sample explanation, you are writing data into the worksheet by using the Cells object. // The Startup event of workbook in our Excel solutionprivate void ThisWorkbook_Startup(object sender,System.EventArgs e){// Set value for Cells row and column index// Text data in Sheet1 cellsGlobals.Sheet1.Cells[3, 3] = "Set my data";} We can also read the worksheet and write data to the cells by using the Range object. In this case, you are creating the range and setting the text data for the range in the Excel worksheet. Open Visual Studio 2008 and create a new solution, as before. Write the following code inside the ThisWorkbook.cs file . In this demonstration, you read the worksheet through the range of cells and set the value by reading through cell ranges. private void ThisWorkbook_Startup(object sender,System.EventArgs e){// Setting value in ExcelSheet cells through reading range objectExcel.Range AuthorExcelSheetRange = Globals.Sheet1.Range["A2","B2"];// Text data for the range A2 to B2AuthorExcelSheetRange.Value2 = "Set my data";} Let's see a demonstration of how to read data from an external data file and display this inside our Excel cells. In this demonstration, you will see how the data from the text (.txt) file is displayed in the spreadsheet. Opening a text file as a workbook using VSTO We'll now see how to open the text file as a workbook by using VSTO and C# programming. This saves time and makes the user more comfortable in accessing the text file while processing the data. Open Visual Studio 2008 and create a new solution, as before. Write the following code inside the ThisWorkbook.cs file: // Opening Text file as workbookprivate void ThisWorkbook_Startup(object sender,System.EventArgs e){// In the workbook objects, file path as parameter in Opentextproperty this.Application.Workbooks.OpenText(@"C:TechBooks.txt",// Value 1 is, the row from which it will read data in the textfile missing, 1,// Checks for delimits for text parsingExcel.XlTextParsingType.xlDelimited,// Text Enumeration valueExcel.XlTextQualifier.xlTextQualifierNone,missing, missing, missing, true, missing, missing, missing,missing, missing, missing, missing, missing, missing);} Connecting with Microsoft SQL Server 2008 database Microsoft SQL Server 2008 is a relational database management system developed by Microsoft Corporation. Microsoft SQL Server is used to manage a huge volume of data along with relation and Metadata information for this data. VSTO provides support for manipulating the data from your database inside Excel using ADO.NET classes. The preceding figure demonstrates how an Excel 2007 object is used to interact with the Microsoft SQL Server database. 
Let's see how to connect with a relational database management system, retrieve data from the database, and finally display it in our Excel spreadsheet. This demonstration shows you how to retrieve data from a Microsoft SQL Server 2008 database and place the retrieved data into the worksheet cells. Open Visual Studio 2008 and create a new solution, as usual. Write the following code inside the ThisWorkbook.cs file.

// Namespace for SQL Server connection
using System.Data.SqlClient;

// Startup event of the workbook
private void ThisWorkbook_Startup(object sender, System.EventArgs e)
{
    // Opening a SQL connection to Microsoft SQL Server 2008;
    // WINNER is the database server that contains the database called Products
    SqlConnection MySQLConnection = new SqlConnection(
        @"Data Source=WINNER;Initial Catalog=Products;Integrated Security=True");

    // Passing the SQL command text
    SqlCommand MySQLCommand = new SqlCommand("SELECT * FROM Books", MySQLConnection);
    MySQLConnection.Open();

    // SQL reader to read through data from the database
    SqlDataReader MySQLReader = MySQLCommand.ExecuteReader();

    // Get the active sheet of the current application
    Excel.Worksheet MyWorkSheet = this.Application.ActiveSheet as Excel.Worksheet;

    // Headers for the columns, set through the Value2 property
    ((Excel.Range)MyWorkSheet.Cells[1, 1]).Value2 = "Book Name";
    ((Excel.Range)MyWorkSheet.Cells[1, 2]).Value2 = "Author Name";

    // Indexer
    int i = 2;

    // Loop through the data returned by the database
    while (MySQLReader.Read())
    {
        // Writing the data from the database table column BookName
        ((Excel.Range)MyWorkSheet.Cells[i, 1]).Value2 = MySQLReader["BookName"];
        // Writing the data from the database table column Author
        ((Excel.Range)MyWorkSheet.Cells[i, 2]).Value2 = MySQLReader["Author"];
        i++;
    }

    // Dispose of the SQL command
    MySQLCommand.Dispose();
    // Close the SQL connection after using it
    MySQLConnection.Close();
}

The following screenshot displays data retrieved from the Microsoft SQL Server 2008 database being displayed in the worksheet cells. In this demonstration, you learned how to connect with a Microsoft SQL Server 2008 database in order to get data and populate it in a workbook. This is just one of the ways of manipulating data outside of a workbook.

article-image-bizarre-python
Packt
19 Aug 2015
20 min read
Save for later

The strange relationship between objects, functions, generators and coroutines

Packt
19 Aug 2015
20 min read
The strange relationship between objects, functions, generators and coroutines In this article, I’d like to investigate some relationships between functions, objects, generators and coroutines in Python. At a theoretical level, these are very different concepts, but because of Python’s dynamic nature, many of them can appear to be used interchangeably. I discuss useful applications of all of these in my book, Python 3 Object-oriented Programming - Second Edition. In this essay, we’ll examine their relationship in a more whimsical light; most of the code examples below are ridiculous and should not be attempted in a production setting! Let’s start with functions, which are simplest. A function is an object that can be executed. When executed, the function is entered at one place accepting a group of (possibly zero) objects as parameters. The function exits at exactly one place and always returns a single object. Already we see some complications; that last sentence is true, but you might have several counter-questions: What if a function has multiple return statements? Only one of them will be executed in any one call to the function. What if the function doesn’t return anything? Then it it will implicitly return the None object. Can’t you return multiple objects separated by a comma? Yes, but the returned object is actually a single tuple Here’s a look at a function: def average(sequence): avg = sum(sequence) / len(sequence) return avg print(average([3, 5, 8, 7])) which outputs: That’s probably nothing you haven’t seen before. Similarly, you probably know what an object and a class are in Python. I define an object as a collection of data and associated behaviors. A class represents the “template” for an object. Usually the data is represented as a set of attributes and the behavior is represented as a collection of method functions, but this doesn’t have to be true. Here’s a basic object: class Statistics: def __init__(self, sequence): self.sequence = sequence def calculate_average(self): return sum(self.sequence) / len(self.sequence) def calculate_median(self): length = len(self.sequence) is_middle = int(not length % 2) return ( self.sequence[length // 2 - is_middle] + self.sequence[-length // 2]) / 2 statistics = Statistics([5, 2, 3]) print(statistics.calculate_average()) which outputs: This object has one piece of data attached to it: the sequence. It also has two methods besides the initializer. Only one of these methods is used in this particular example, but as Jack Diederich said in his famous Stop Writing Classes talk, a class with only one function besides the initializer should just be a function. So I included a second one to make it look like a useful class (It’s not. The new statistics module in Python 3.4 should be used instead. Never define for yourself that which has been defined, debugged, and tested by someone else). Classes like this are also things you’ve seen before, but with this background in place, we can now look at some bizarre things you might not expect (or indeed, want) to be able to do with a function. For example, did you know that functions are objects? In fact, anything that you can interact with in Python is defined in the source code for the CPython interpreter as a PyObject structure. This includes functions, objects, basic primitives, containers, classes, modules, you name it. This means we can attach attributes to a function just as with any standard object. Ah, but if functions are objects, can you attach functions to functions? 
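Before answering that, here is a minimal sketch of the milder claim above, attaching a plain data attribute to an ordinary function. The greet function and its calls counter are purely illustrative names, not anything from the book:

def greet(name):
    # read and update an attribute stored on the function object itself
    greet.calls += 1
    return "Hello, " + name

greet.calls = 0          # attach the attribute just as on any other object
greet("Ada")
greet("Grace")
print(greet.calls)       # 2

Nothing special happens here; the function is simply acting as a namespace for a bit of data, exactly because it is an object.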
Don’t try this at home (and especially don’t do it at work): def statistics(sequence): statistics.sequence = sequence return statistics def calculate_average(): return sum(statistics.sequence) / len(statistics.sequence) statistics.calculate_average = calculate_average print(statistics([1, 5, 8, 4]).calculate_average()) which outputs: This is a pretty crazy example (but we’re just getting started). The statistics function is being set up as an object that has two attributes: sequence is a list and calculate_average is another function object. For fun, the function returns itself so that the print function can call the calculate_average function all in one line. Note that the statistics function here is an object, not a class. Rather than emulating the Statistics class in the previous example, it is more similar to the statistics instance of that class. It is hard to imagine any reason that you would want to write code like this in real life. Perhaps it could be used to implement the Singleton (anti-)pattern popular with some other languages. Since there can only ever be one statistics function, it is not possible to create two distinct instances with two distinct sequence attributes the way we can with the Statistics class. There is generally little need for such code in Python, though, because of its ‘consenting adults’ nature. We can more closely simulate a class by using a function like a constructor: def Statistics(sequence): def self(): return self.average() self.sequence = sequence def average(): return sum(self.sequence) / len(self.sequence) self.average = average return self statistics = Statistics([2, 1, 1]) print(Statistics([1, 4, 6, 2]).average()) print(statistics()) which outputs: That looks an awful lot like Javascript, doesn’t it? The Statistics function acts like a constructor that returns an object (that happens to be a function, named self). That function object has had a couple attributes attached to it, so our function is now an object with both data and behavior. The last three lines show that we can instantiate two separate Statistics “objects” just as if it were a class. Finally, since the statistics object in the last line really is a function, we can even call it directly. It proxies the call through to the average function (or is it a method at this point? I can’t tell anymore) defined on itself. Before we go on, note that this simulated overlapping of functionality does not mean that we are getting exactly the same behavior out of the Python interpreter. While functions are objects, not all objects are functions. The underlying implementation is different, and if you try to do this in production code, you’ll quickly find confusing anomalies. In normal code, the fact that functions can have attributes attached to them is rarely useful. I’ve seen it used for interesting diagnostics or testing, but it’s generally just a hack. However, knowing that functions are objects allows us to pass them around to be called at a later time. 
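A more everyday case of passing a function object around, before the deliberately silly examples resume, is handing one to another function as an argument; the built-in sorted accepts any callable through its key parameter:

words = ['pear', 'fig', 'banana']
# len is passed as an object here; sorted calls it later, once per element
print(sorted(words, key=len))    # ['fig', 'pear', 'banana']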
Consider this basic partial implementation of an observer pattern: class Observers(list): register_observer = list.append def notify(self): for observer in self: observer() observers = Observers() def observer_one(): print('one was called') def observer_two(): print('two was called') observers.register_observer(observer_one) observers.register_observer(observer_two) observers.notify() which outputs: At line 2, I’ve intentionally reduced the comprehensibility of this code to conform to my initial ‘most of the examples in this article are ridiculous’ thesis. This line creates a new class attribute named register_observer which points to the list.append function. Since the Observers class inherits from the list class, this line essentially creates a shortcut to a method that would look like this: def register_observer(self, item): self.append(item) And this is how you should do it in your code. Nobody’s going to understand what’s going on if you follow my version. The part of this code that you might want to use in real life is the way the callback functions are passed into the registration function at lines 16 and 17. Passing functions around like this is quite common in Python. The alternative, if functions were not objects, would be to create a bunch of classes that have a single method with an uninformative name like execute and pass those around instead. Observers are a bit too useful, though, so in the spirit of keeping things ridiculous, let’s make a silly function that returns a function object: def silly(): print("silly") return silly silly()()()()()() which outputs: Since we’ve seen some ways that functions can (sort of) imitate objects, let’s now make an object that can behave like a function: class Function: def __init__(self, message): self.message = message def __call__(self, name): return "{} says '{}'".format(name, self.message) function = Function("I am a function") print(function('Cheap imitation function')) which outputs: I don’t use this feature often, but it can be useful in a few situations. For example, if you write a function and call it from many different places, but later discover that it needs to maintain some state, you can change the function to an object and implement the __call__ method without changing all the call sites. Or if you have a callback implementation that normally passes functions around, you can use a callable object when you need to store more complicated state. I’ve also seen Python decorators made out of objects when additional state or behavior is required. Now, let’s talk about generators. As you might expect by now, we’ll start with the silliest way to implement generation code. We can use the idea of a function that returns an object to create a rudimentary generatorish object that calculates the Fibonacci sequence: def FibFunction(): a = b = 1 def next(): nonlocal a, b a, b = b, a + b return b return next fib = FibFunction() for i in range(8): print(fib(), end=' ') which outputs: This is a pretty wacky thing to do, but the point is that it is possible to build functions that are able to maintain state between calls. The state is stored in the surrounding closure; we can access that state by referencing them with Python 3’s nonlocal keyword. It is kind of like global except it accesses the state from the surrounding function, not the global namespace. 
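To isolate what nonlocal is doing in that closure, here is a stripped-down counter built on the same idea; without the nonlocal line, the assignment would create a brand-new local variable and the += would raise UnboundLocalError. The names here are only illustrative:

def make_counter():
    count = 0
    def bump():
        nonlocal count   # rebind the enclosing function's variable, not a new local one
        count += 1
        return count
    return bump

counter = make_counter()
print(counter(), counter(), counter())   # 1 2 3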
We can, of course, build a similar construct using classic (or classy) object notation:

class FibClass():
    def __init__(self):
        self.a = self.b = 1

    def __call__(self):
        self.a, self.b = self.b, self.a + self.b
        return self.b

fib = FibClass()
for i in range(8):
    print(fib(), end=' ')

which outputs: Of course, neither of these obeys the iterator protocol. No matter how I wrangle it, I was not able to get FibFunction to work with Python's builtin next() function, even after looking through the CPython source code for a couple hours. As I mentioned earlier, using the function syntax to build pseudo-objects quickly leads to frustration. However, it's easy to tweak the object-based FibClass to fulfill the iterator protocol:

class FibIterator():
    def __init__(self):
        self.a = self.b = 1

    def __next__(self):
        self.a, self.b = self.b, self.a + self.b
        return self.b

    def __iter__(self):
        return self

fib = FibIterator()
for i in range(8):
    print(next(fib), end=' ')

which outputs: This class is a standard implementation of the iterator pattern. But it's kind of ugly and verbose. Luckily, we can get the same effect in Python with a function that includes a yield statement to construct a generator. Here's the Fibonacci sequence as a generator:

def FibGenerator():
    a = b = 1
    while True:
        a, b = b, a + b
        yield b

fib = FibGenerator()
for i in range(8):
    print(next(fib), end=' ')
print('\n', fib)

which outputs: The generator version is a bit more readable than the other two implementations. The thing to pay attention to here is that a generator is not a function. The FibGenerator function returns an object, as illustrated by the words “generator object” in the last line of output above. Unlike a normal function, a generator function does not execute any code inside it when we call the function. Instead it constructs a generator object and returns that. You could think of this as like an implicit Python decorator; the Python interpreter sees the yield keyword and wraps it in a decorator that returns an object instead. To start the function code executing, we have to use the next function (either explicitly as in the examples, or implicitly by using a for loop or yield from). While a generator is technically an object, it is often convenient to think of the function that creates it as a function that can have data passed in at one place and can return values multiple times. It's sort of like a generic version of a function (which can have data passed in at one place and return a value at only one place). It is easy to make a generator that behaves not completely unlike a function, by yielding only one value:

def average(sequence):
    yield sum(sequence) / len(sequence)

print(next(average([1, 2, 3])))

which outputs: Unfortunately, the call site at line 4 is less readable than a normal function call, since we have to throw that pesky next() in there. The obvious way around this would be to add a __call__ method to the generator, but this fails if we try to use attribute assignment or inheritance. There are optimizations that make generators run quickly in C code and also don't let us assign attributes to them. We can, however, wrap the generator in a function-like object using a ludicrous decorator:

def gen_func(func):
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        return next(gen)
    return wrapper

@gen_func
def average(sequence):
    yield sum(sequence) / len(sequence)

print(average([1, 6, 3, 4]))

which outputs: Of course this is an absurd thing to do. I mean, just write a normal function for pity's sake!
But taking this idea a little further, it could be tempting to create a slightly different wrapper: def callable_gen(func): class CallableGen: def __init__(self, *args, **kwargs): self.gen = func(*args, **kwargs) def __next__(self): return self.gen.__next__() def __iter__(self): return self def __call__(self): return next(self) return CallableGen @callable_gen def FibGenerator(): a = b = 1 while True: a, b = b, a + b yield b fib = FibGenerator() for i in range(8): print(fib(), end=' ') which outputs: To completely wrap the generator, we’d need to proxy a few other methods through to the underlying generator including send, close, and throw. This generator wrapper can be used to call a generator any number of times without calling the next function. I’ve been tempted to do this to make my code look cleaner if there are a lot of next calls in it, but I recommend not yielding into this temptation. Coders reading your code, including yourself, will go berserk trying to figure out what that “function call” is doing. Just get used to the next function and ignore this decorator business. So we’ve drawn some parallels between generators, objects and functions. Let’s talk now about one of the more confusing concepts in Python: coroutines. In Python, coroutines are usually defined as “generators that you can send values into”. At an implementation level, this is probably the most sensible definition. In the theoretical sense, however, it is more accurate to define coroutines as constructs that can accept values at one or more locations and return values at one or more locations. Therefore, while in Python it is easy to think of a generator as a special type of function that has yield statements and a coroutine as a special type of generator that we can send data into a different points, a better taxonomy is to think of a coroutine that can accept and return values at multiple locations as a general case, and generators and functions as special types of coroutines that are restricted in where they can accept or return values. So let’s see a coroutine: def LineInserter(lines): out = [] for line in lines: to_append = yield line out.append(line) if to_append is not None: out.append(to_append) return out emily = """I died for beauty, but was scarce Adjusted in the tomb, When one who died for truth was lain In an adjoining room. He questioned softly why I failed? “For beauty,” I replied. “And I for truth,—the two are one; We brethren are,” he said. And so, as kinsmen met a night, We talked between the rooms, Until the moss had reached our lips, And covered up our names. """ inserter = LineInserter(iter(emily.splitlines())) count = 1 try: line = next(inserter) while True: line = next(inserter) if count % 4 else inserter.send('-------') count += 1 except StopIteration as ex: print('n' + 'n'.join(ex.value)) which outputs: The LineInserter object is called a coroutine rather than a generator only because the yield statement is placed on the right side of an assignment operator. Now whenever we yield a line, it stores any value that might have been sent back into the coroutine in the to_append variable. As you can see in the driver code, we can send a value back in using inserter.send. if you instead just use next, the to_append variable gets a value of None. Don’t ask me why next is a function and send is a method when they both do nearly the same thing! In this example, we use the send call to insert a ruler every four lines to separate stanzas in Emily Dickinson’s famous poem. 
But I used the exact same coroutine in a program that parses the source file for this article. It checks if any line contains the string !#python, and if so, it executes the subsequent code block and inserts the output (see the ‘which outputs’ lines throughout this article) into the article. Coroutines can provide that little extra something when normal ‘one way’ iteration doesn’t quite cut it. The coroutine in the last example is really nice and elegant, but I find the driver code a bit annoying. I think it’s just me, but something about the indentation of a try…catch statement always frustrates me. Recently, I’ve been emulating Python 3.4’s contextlib.suppress context manager to replace except clauses with a callback. For example: def LineInserter(lines): out = [] for line in lines: to_append = yield line out.append(line) if to_append is not None: out.append(to_append) return out from contextlib import contextmanager @contextmanager def generator_stop(callback): try: yield except StopIteration as ex: callback(ex.value) def lines_complete(all_lines): print('n' + 'n'.join(all_lines)) emily = """I died for beauty, but was scarce Adjusted in the tomb, When one who died for truth was lain In an adjoining room. He questioned softly why I failed? “For beauty,” I replied. “And I for truth,—the two are one; We brethren are,” he said. And so, as kinsmen met a night, We talked between the rooms, Until the moss had reached our lips, And covered up our names. """ inserter = LineInserter(iter(emily.splitlines())) count = 1 with generator_stop(lines_complete): line = next(inserter) while True: line = next(inserter) if count % 4 else inserter.send('-------') count += 1 which outputs: The generator_stop now encapsulates all the ugliness, and the context manager can be used in a variety of situations where StopIteration needs to be handled. Since coroutines are undifferentiated from generators, they can emulate functions just as we saw with generators. We can even call into the same coroutine multiple times as if it were a function: def IncrementBy(increment): sequence = yield while True: sequence = yield [i + increment for i in sequence] sequence = [10, 20, 30] increment_by_5 = IncrementBy(5) increment_by_8 = IncrementBy(8) next(increment_by_5) next(increment_by_8) print(increment_by_5.send(sequence)) print(increment_by_8.send(sequence)) print(increment_by_5.send(sequence)) which outputs: Note the two calls to next at lines 9 and 10. These effectively “prime” the generator by advancing it to the first yield statement. Then each call to send essentially looks like a single call to a function. The driver code for this coroutine doesn’t look anything like calling a function, but with some evil decorator magic, we can make it look less disturbing: def evil_coroutine(func): def wrapper(*args, **kwargs): gen = func(*args, **kwargs) next(gen) def gen_caller(arg=None): return gen.send(arg) return gen_caller return wrapper @evil_coroutine def IncrementBy(increment): sequence = yield while True: sequence = yield [i + increment for i in sequence] sequence = [10, 20, 30] increment_by_5 = IncrementBy(5) increment_by_8 = IncrementBy(8) print(increment_by_5(sequence)) print(increment_by_8(sequence)) print(increment_by_5(sequence)) which outputs: The decorator accepts a function and returns a new wrapper function that gets assigned to the IncrementBy variable. 
Whenever this new IncrementBy is called, it constructs a generator using the original function, and advances it to the first yield statement using next (the priming action from before). It returns a new function that calls send on the generator each time it is called. This function makes the argument default to None so that it can also work if we call next instead of send. The new driver code is definitely more readable, but once again, I would not recommend using this coding style to make coroutines behave like hybrid object/functions. The argument that other coders aren’t going to understand what is going through your head still stands. Plus, since send can only accept one argument, the callable is quite restricted. Before we leave our discussion of the bizarre relationships between these concepts, let’s look at how the stanza processing code could look without an explicit coroutine, generator, function, or object: emily = """I died for beauty, but was scarce Adjusted in the tomb, When one who died for truth was lain In an adjoining room. He questioned softly why I failed? “For beauty,” I replied. “And I for truth,—the two are one; We brethren are,” he said. And so, as kinsmen met a night, We talked between the rooms, Until the moss had reached our lips, And covered up our names. """ for index, line in enumerate(emily.splitlines(), start=1): print(line) if not index % 4: print('------') which outputs: This code is so simple and elegant! This happens to me nearly every time I try to use coroutines. I keep refactoring and simplifying it until I discover that coroutines are making my code less, not more, readable. Unless I am explicitly modeling a state-transition system or trying to do asynchronous work using the terrific asyncio library (which wraps all the possible craziness with StopIteration, cascading exceptions, etc), I rarely find that coroutines are the right tool for the job. That doesn’t stop me from attempting them though, because they are fun. For the record, the LineInserter coroutine actually is useful in the markdown code executor I use to parse this source file. I need to keep track of more transitions between states (am I currently looking for a code block? am I in a code block? Do I need to execute the code block and record the output?) than in the stanza marking example used here. So, it has become clear that in Python, there is more than one way to do a lot of things. Luckily, most of these ways are not very obvious, and there is usually “one, and preferably only one, obvious way to do things”, to quote The Zen Of Python. I hope by this point that you are more confused about the relationship between functions, objects, generators and coroutines than ever before. I hope you now know how to write much code that should never be written. But mostly I hope you’ve enjoyed your exploration of these topics. If you’d like to see more useful applications of these and other Python concepts, grab a copy of Python 3 Object-oriented Programming, Second Edition.

article-image-working-with-pandas-dataframes
Sugandha Lahoti
23 Feb 2018
15 min read
Save for later

Working with pandas DataFrames

Sugandha Lahoti
23 Feb 2018
15 min read
[box type="note" align="" class="" width=""]This article is an excerpt from the book Python Data Analysis - Second Edition written by Armando Fandango. From this book, you will learn how to process and manipulate data with Python for complex data analysis and modeling. Code bundle for this article is hosted on GitHub.[/box] The popular open source Python library, pandas is named after panel data (an econometric term) and Python data analysis. We shall learn about basic panda functionalities, data structures, and operations in this article. The official pandas documentation insists on naming the project pandas in all lowercase letters. The other convention the pandas project insists on, is the import pandas as pd import statement. We will follow these conventions in this text. In this tutorial, we will install and explore pandas. We will also acquaint ourselves with the a central pandas data structure–DataFrame. Installing and exploring pandas The minimal dependency set requirements for pandas is given as follows: NumPy: This is the fundamental numerical array package that we installed and covered extensively in the preceding chapters python-dateutil: This is a date handling library pytz: This handles time zone definitions This list is the bare minimum; a longer list of optional dependencies can be located at http://pandas.pydata.org/pandas-docs/stable/install.html. We can install pandas via PyPI with pip or easy_install, using a binary installer, with the aid of our operating system package manager, or from the source by checking out the code. The binary installers can be downloaded from http://pandas.pydata.org/getpandas.html. The command to install pandas with pip is as follows: $ pip3 install pandas rpy2 rpy2 is an interface to R and is required because rpy is being deprecated. You may have to prepend the preceding command with sudo if your user account doesn't have sufficient rights. The pandas DataFrames A pandas DataFrame is a labeled two-dimensional data structure and is similar in spirit to a worksheet in Google Sheets or Microsoft Excel, or a relational database table. The columns in pandas DataFrame can be of different types. A similar concept, by the way, was invented originally in the R programming language. (For more information, refer to http://www.r-tutor.com/r-introduction/data-frame). A DataFrame can be created in the following ways: Using another DataFrame. Using a NumPy array or a composite of arrays that has a two-dimensional shape. Likewise, we can create a DataFrame out of another pandas data structure called Series. We will learn about Series in the following section. A DataFrame can also be produced from a file, such as a CSV file. From a dictionary of one-dimensional structures, such as one-dimensional NumPy arrays, lists, dicts, or pandas Series. As an example, we will use data that can be retrieved from http://www.exploredata.net/Downloads/WHO-Data-Set. The original data file is quite large and has many columns, so we will use an edited file instead, which only contains the first nine columns and is called WHO_first9cols.csv; the file is in the code bundle of this book. 
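Before loading that CSV file, here is a minimal sketch of two of the construction routes just listed, building a DataFrame from a dictionary of one-dimensional structures and from a two-dimensional NumPy array. The column names and values are invented purely for illustration:

import numpy as np
import pandas as pd

# From a dict of equal-length one-dimensional structures
df_from_dict = pd.DataFrame({'Country': ['A', 'B'],
                             'Population': [26088, 3205]})

# From a two-dimensional NumPy array, supplying the column labels ourselves
df_from_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['x', 'y'])

print(df_from_dict)
print(df_from_array)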
These are the first two lines, including the header: Country,CountryID,Continent,Adolescent fertility rate (%),Adult literacy rate (%),Gross national income per capita (PPP international $),Net primary school enrolment ratio female (%),Net primary school enrolment ratio male (%),Population (in thousands) totalAfghanistan,1,1,151,28,,,,26088 In the next steps, we will take a look at pandas DataFrames and its attributes: To kick off, load the data file into a DataFrame and print it on the screen: from pandas.io.parsers import read_csv df = read_csv("WHO_first9cols.csv") print("Dataframe", df) The printout is a summary of the DataFrame. It is too long to be displayed entirely, so we will just grab the last few lines: 199 21732.0 200 11696.0 201 13228.0 [202 rows x 9 columns] The DataFrame has an attribute that holds its shape as a tuple, similar to ndarray. Query the number of rows of a DataFrame as follows: print("Shape", df.shape) print("Length", len(df)) The values we obtain comply with the printout of the preceding step: Shape (202, 9) Length 202 Check the column header and data types with the other attributes: print("Column Headers", df.columns) print("Data types", df.dtypes) We receive the column headers in a special data structure: Column Headers Index([u'Country', u'CountryID', u'Continent', u'Adolescent fertility rate (%)', u'Adult literacy rate (%)', u'Gross national income per capita (PPP international $)', u'Net primary school enrolment ratio female (%)', u'Net primary school enrolment ratio male (%)', u'Population (in thousands) total'], dtype='object') The data types are printed as follows: 4. The pandas DataFrame has an index, which is like the primary key of relational database tables. We can either specify the index or have pandas create it automatically. The index can be accessed with a corresponding property, as follows: Print("Index", df.index) An index helps us search for items quickly, just like the index in this book. In our case, the index is a wrapper around an array starting at 0, with an increment of one for each row: Sometimes, we wish to iterate over the underlying data of a DataFrame. Iterating over column values can be inefficient if we utilize the pandas iterators. It's much better to extract the underlying NumPy arrays and work with those. The pandas DataFrame has an attribute that can aid with this as well: print("Values", df.values) Please note that some values are designated nan in the output, for 'not a number'. These values come from empty fields in the input datafile: The preceding code is available in Python Notebook ch-03.ipynb, available in the code bundle of this book. Querying data in pandas Since a pandas DataFrame is structured in a similar way to a relational database, we can view operations that read data from a DataFrame as a query. In this example, we will retrieve the annual sunspot data from Quandl. We can either use the Quandl API or download the data manually as a CSV file from http://www.quandl.com/SIDC/SUNSPOTS_A-Sunspot-Numbers-Annual. If you want to install the API, you can do so by downloading installers from https://pypi.python.org/pypi/Quandl or by running the following command: $ pip3 install Quandl Using the API is free, but is limited to 50 API calls per day. If you require more API calls, you will have to request an authentication key. The code in this tutorial is not using a key. It should be simple to change the code to either use a key or read a downloaded CSV file. 
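If you prefer to skip the Quandl API entirely, something along the following lines should work with a manually downloaded copy of the annual sunspot data. The file name and column names are assumptions about the export format, so adjust them to whatever your download actually contains:

import pandas as pd

# Placeholder file name for the downloaded annual sunspot numbers
sunspots = pd.read_csv('SIDC-SUNSPOTS_A.csv',
                       index_col='Year',        # use the date column as the index
                       parse_dates=['Year'])
print(sunspots.head(2))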
If you have difficulties, search through the Python docs at https://docs.python.org/2/. Without further preamble, let's take a look at how to query data in a pandas DataFrame: As a first step, we obviously have to download the data. After importing the Quandl API, get the data as follows: import quandl # Data from http://www.quandl.com/SIDC/SUNSPOTS_A-Sunspot-Numbers-Annual # PyPi url https://pypi.python.org/pypi/Quandl sunspots = quandl.get("SIDC/SUNSPOTS_A") The head() and tail() methods have a purpose similar to that of the Unix commands with the same name. Select the first n and last n records of a DataFrame, where n is an integer parameter: print("Head 2", sunspots.head(2) ) print("Tail 2", sunspots.tail(2)) This gives us the first two and last two rows of the sunspot data (for the sake of brevity we have not shown all the columns here; your output will have all the columns from the dataset): Head 2          Number Year 1700-12-31      5 1701-12-31     11 [2 rows x 1 columns] Tail 2          Number Year 2012-12-31 57.7 2013-12-31 64.9 [2 rows x 1 columns] Please note that we only have one column holding the number of sunspots per year. The dates are a part of the DataFrame index. The following is the query for the last value using the last date: last_date = sunspots.index[-1] print("Last value", sunspots.loc[last_date]) You can check the following output with the result from the previous step: Last value Number        64.9 Name: 2013-12-31 00:00:00, dtype: float64 Query the date with date strings in the YYYYMMDD format as follows: print("Values slice by date:n", sunspots["20020101": "20131231"]) This gives the records from 2002 through to 2013: Values slice by date                             Number Year 2002-12-31     104.0 [TRUNCATED] 2013-12-31       64.9 [12 rows x 1 columns] A list of indices can be used to query as well: print("Slice from a list of indices:n", sunspots.iloc[[2, 4, -4, -2]]) The preceding code selects the following rows: Slice from a list of indices                              Number Year 1702-12-31       16.0 1704-12-31       36.0 2010-12-31       16.0 2012-12-31       57.7 [4 rows x 1 columns] To select scalar values, we have two options. The second option given here should be faster. Two integers are required, the first for the row and the second for the column: print("Scalar with Iloc:", sunspots.iloc[0, 0]) print("Scalar with iat", sunspots.iat[1, 0]) This gives us the first and second values of the dataset as scalars: Scalar with Iloc 5.0 Scalar with iat 11.0 Querying with Booleans works much like the Where clause of SQL. The following code queries for values larger than the arithmetic mean. Note that there is a difference between when we perform the query on the whole DataFrame and when we perform it on a single column: print("Boolean selection", sunspots[sunspots > sunspots.mean()]) print("Boolean selection with column label:n", sunspots[sunspots['Number of Observations'] > sunspots['Number of Observations'].mean()]) The notable difference is that the first query yields all the rows, with some rows not conforming to the condition that has a value of NaN. The second query returns only the rows where the value is larger than the mean: Boolean selection                             Number Year 1700-12-31          NaN [TRUNCATED] 1759-12-31       54.0 ... [314 rows x 1 columns] Boolean selection with column label                              Number Year 1705-12-31       58.0 [TRUNCATED] 1870-12-31     139.1 ... 
[127 rows x 1 columns]

The preceding example code is in the ch_03.ipynb file of this book's code bundle.

Data aggregation with pandas DataFrames

Data aggregation is a term used in the field of relational databases. In a database query, we can group data by the value in a column or columns. We can then perform various operations on each of these groups. The pandas DataFrame has similar capabilities. We will generate data held in a Python dict and then use this data to create a pandas DataFrame. We will then practice the pandas aggregation features: Seed the NumPy random generator to make sure that the generated data will not differ between repeated program runs. The data will have four columns: Weather (a string), Food (also a string), Price (a random float), and Number (a random integer between one and nine). The use case is that we have the results of some sort of consumer-purchase research, combined with weather and market pricing, where we calculate the average of prices and keep track of the sample size and parameters:

import pandas as pd
from numpy.random import seed
from numpy.random import rand
from numpy.random import randint
import numpy as np

seed(42)
df = pd.DataFrame({'Weather': ['cold', 'hot', 'cold', 'hot', 'cold', 'hot', 'cold'],
                   'Food': ['soup', 'soup', 'icecream', 'chocolate', 'icecream', 'icecream', 'soup'],
                   'Price': 10 * rand(7),
                   'Number': randint(1, 9, size=(7,))})
print(df)

You should get an output similar to the following: Please note that the column labels come from the lexically ordered keys of the Python dict. Lexical or lexicographical order is based on the alphabetic order of characters in a string. Group the data by the Weather column and then iterate through the groups as follows:

weather_group = df.groupby('Weather')
i = 0
for name, group in weather_group:
    i = i + 1
    print("Group", i, name)
    print(group)

We have two types of weather, hot and cold, so we get two groups: The weather_group variable is a special pandas object that we get as a result of the groupby() method. This object has aggregation methods, which are demonstrated as follows:

print("Weather group first\n", weather_group.first())
print("Weather group last\n", weather_group.last())
print("Weather group mean\n", weather_group.mean())

The preceding code snippet prints the first row, last row, and mean of each group: Just as in a database query, we are allowed to group on multiple columns. The groups attribute will then tell us the groups that are formed, as well as the rows in each group:

wf_group = df.groupby(['Weather', 'Food'])
print("WF Groups", wf_group.groups)

For each possible combination of weather and food values, a new group is created. The membership of each row is indicated by their index values as follows:

WF Groups {('hot', 'chocolate'): [3], ('cold', 'icecream'): [2, 4], ('hot', 'icecream'): [5], ('hot', 'soup'): [1], ('cold', 'soup'): [0, 6]}

Apply a list of NumPy functions on groups with the agg() method:

print("WF Aggregated\n", wf_group.agg([np.mean, np.median]))

Obviously, we could apply even more functions, but it would look messier than the following output:

Concatenating and appending DataFrames

The pandas DataFrame allows operations that are similar to the inner and outer joins of database tables. We can append and concatenate rows as well. To practice appending and concatenating of rows, we will reuse the DataFrame from the previous section.
Let's select the first three rows:

print("df :3\n", df[:3])

Check that these are indeed the first three rows:

df :3
        Food  Number     Price Weather
0       soup       8  3.745401    cold
1       soup       5  9.507143     hot
2   icecream       4  7.319939    cold

The concat() function concatenates DataFrames. For example, we can concatenate a DataFrame that consists of three rows to the rest of the rows, in order to recreate the original DataFrame:

print("Concat Back together\n", pd.concat([df[:3], df[3:]]))

The concatenation output appears as follows:

Concat Back together
         Food  Number     Price Weather
0        soup       8  3.745401    cold
1        soup       5  9.507143     hot
2    icecream       4  7.319939    cold
3   chocolate       8  5.986585     hot
4    icecream       8  1.560186    cold
5    icecream       3  1.559945     hot
6        soup       6  0.580836    cold

[7 rows x 4 columns]

To append rows, use the append() function:

print("Appending rows\n", df[:3].append(df[5:]))

The result is a DataFrame with the first three rows of the original DataFrame and the last two rows appended to it:

Appending rows
        Food  Number     Price Weather
0       soup       8  3.745401    cold
1       soup       5  9.507143     hot
2   icecream       4  7.319939    cold
5   icecream       3  1.559945     hot
6       soup       6  0.580836    cold

[5 rows x 4 columns]

Joining DataFrames

To demonstrate joining, we will use two CSV files, dest.csv and tips.csv. The use case behind it is that we are running a taxi company. Every time a passenger is dropped off at his or her destination, we add a row to the dest.csv file with the employee number of the driver and the destination:

EmpNr,Dest
5,The Hague
3,Amsterdam
9,Rotterdam

Sometimes drivers get a tip, so we want that registered in the tips.csv file (if this doesn't seem realistic, please feel free to come up with your own story):

EmpNr,Amount
5,10
9,5
7,2.5

Database-like joins in pandas can be done with either the merge() function or the join() DataFrame method. The join() method joins onto indices by default, which might not be what you want. In SQL, a relational database query language, we have the inner join, left outer join, right outer join, and full outer join. An inner join selects rows from two tables if and only if values match, for columns specified in the join condition. Outer joins do not require a match, and can potentially return more rows. More information on joins can be found at http://en.wikipedia.org/wiki/Join_%28SQL%29.
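The excerpt assumes that dests and tips have already been loaded from those two files. Here is a small sketch of constructing equivalent DataFrames inline, so the merge() and join() calls below can be tried without creating the CSV files; the literal values simply mirror the listings above:

import pandas as pd

dests = pd.DataFrame({'EmpNr': [5, 3, 9],
                      'Dest': ['The Hague', 'Amsterdam', 'Rotterdam']})
tips = pd.DataFrame({'EmpNr': [5, 9, 7],
                     'Amount': [10, 5, 2.5]})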
All these join types are supported by pandas, but we will only take a look at inner joins and full outer joins: A join on the employee number with the merge() function is performed as follows: print("Merge() on keyn", pd.merge(dests, tips, on='EmpNr')) This gives an inner join as the outcome: Merge() on key EmpNr            Dest           Amount 0 5 The Hague 10 1 9 Rotterdam 5 [2 rows x 3 columns] Joining with the join() method requires providing suffixes for the left and right operands: print("Dests join() tipsn", dests.join(tips, lsuffix='Dest', rsuffix='Tips')) This method call joins index values so that the result is different from an SQL inner join: Dests join() tips EmpNrDest Dest EmpNrTips Amount 0 5 The Hague 5 10.0 1 3 Amsterdam 9 5.0 2 9 Rotterdam 7 2.5 [3 rows x 4 columns] An even more explicit way to execute an inner join with merge() is as follows: print("Inner join with merge()n", pd.merge(dests, tips, how='inner')) The output is as follows: Inner join with merge() EmpNr            Dest           Amount 0 5 The Hague 10 1 9 Rotterdam 5 [2 rows x 3 columns] To make this a full outer join requires only a small change: print("Outer joinn", pd.merge(dests, tips, how='outer')) The outer join adds rows with NaN values: Outer join EmpNr            Dest            Amount 0 5 The Hague 10.0 1 3 Amsterdam NaN 2 9 Rotterdam 5.0 3 7 NaN            2.5 [4 rows x 3 columns] In a relational database query, these values would have been set to NULL. The demo code is in the ch-03.ipynb file of this book's code bundle. We learnt how to perform various data manipulation techniques such as aggregating, concatenating, appending, cleaning, and handling missing values, with pandas. If you found this post useful, check out the book Python Data Analysis - Second Edition to learn advanced topics such as signal processing, textual data analysis, machine learning, and more.  
article-image-docker-container-management-with-saltstack
Nicole Thomas
25 Sep 2014
8 min read
Save for later

Docker Container Management at Scale with SaltStack

Nicole Thomas
25 Sep 2014
8 min read
Every once in a while a technology comes along that changes the way work is done in a data center. It happened with virtualization and we are seeing it with various cloud computing technologies and concepts. But most recently, the rise in popularity of container technology has given us reason to rethink interdependencies within software stacks and how we run applications in our infrastructures. Despite all of the enthusiasm around the potential of containers, they are still just another thing in the data center that needs to be centrally controlled and managed...often at massive scale. This article will provide an introduction to how SaltStack can be used to manage all of this, including container technology, at web scale. SaltStack Systems Management Software The SaltStack systems management software is built for remote execution, orchestration, and configuration management and is known for being fast, scalable, and flexible. Salt is easy to set up and can easily communicate asynchronously with tens of thousands of servers in a matter of seconds. Salt was originally built as a remote execution engine relying on a secure, bi-directional communication system utilizing a Salt Master daemon used to control Salt Minion daemons, where the minions receive commands from the remote master. Salt’s configuration management capabilities are called the state system, or Salt States, and are built on SLS formulas. SLS files are data structures based on dictionaries, lists, strings, and numbers that are used to enforce the state that a system should be in, also known as configuration management. SLS files are easy to write, simple to implement, and are typically written in YAML. State file execution occurs on the Salt Minion. Therefore, once any states files that the infrastructure requires have been written, it is possible to enforce state on tens of thousands of hosts simultaneously. Additionally, each minion returns its status to the Salt Master. Docker containers Docker is an agile runtime and packaging platform specializing in enabling applications to be quickly built, shipped, and run in any environment. Docker containers allow developers to easily compartmentalize their applications so that all of the program’s needs are installed and configured in a single container. This container can then be dropped into clouds, data centers, laptops, virtual machines (VMs), or infrastructures where the app can execute, unaltered, without any cross-environment issues. Building Docker containers is fast, simple, and inexpensive. Developers can create a container, change it, break it, fix it, throw it away, or save it much like a VM. However, Docker containers are much more nimble, as the containers only possess the application and its dependencies. VMs, along with the application and its dependencies, also install a guest operating system, which requires much more overhead than may be necessary. While being able to slice deployments into confined components is a good idea for application portability and resource allocation, it also means there are many more pieces that need to be managed. As the number of containers scales, without proper management, inefficiencies abound. This scenario, known as container sprawl, is where SaltStack can help and the combination of SaltStack and Docker quickly proves its value. SaltStack + Docker When we combine SaltStack’s powerful configuration management armory with Docker’s portable and compact containerization tools, we get the best of both worlds. 
SaltStack has added support for users to manage Docker containers on any infrastructure. The following demonstration illustrates how SaltStack can be used to manage Docker containers using Salt States. We are going to use a Salt Master to control a minimum of two Salt Minions. On one minion we will install HAProxy. On the other minion, we will install Docker and then spin up two containers on that minion. Each container hosts a PHP app with Apache that displays information about each container’s IP address and exposed port. The HAProxy minion will automatically update to pull the IP address and port number from each of the Docker containers housed on the Docker minion. You can certainly set up more minions for this experiment, where additional minions would be additional Docker container minions, to see an implementation of containers at scale, but it is not necessary to see what is possible. First, we have a few installation steps to take care of. SaltStack’s installation documentation provides a thorough overview of how to get Salt up and running, depending on your operating system. Follow the instructions on Salt’s Installation Guide to install and configure your Salt Master and two Salt Minions. Go through the process of accepting the minion keys on the Salt Master and, to make sure the minions are connected, run: salt ‘*’ test.ping Note: Much of the Docker functionality used in this demonstration is brand new. You must install the latest supported version of Salt, which is 2014.1.6. For reference, I have my master and minions all running on Ubuntu 14.04. My Docker minion is named “docker1” and my HAProxy minion is named “haproxy”. Now that you have a Salt Master and two Salt Minions configured and communicating with each other, it’s time to get to work. Dave Boucha, a Senior Engineer at SaltStack, has already set up the necessary configuration files for both our haproxy and docker1 minions and we will be using those to get started. First, clone the Docker files in Dave’s GitHub repository, dock-apache, and copy the dock_apache and docker> directories into the /srv/salt directory on your Salt Master (you may need to create the /srv/salt directory): cp -r path/to/download/dock_apache /srv/salt cp -r path/to/download/docker /srv/salt The init.sls file inside the docker directory is a Salt State that installs Docker and all of its dependencies on your minion. The docker.pgp file is a public key that is used during the Docker installation. To fire off all of these events, run the following command: salt ‘docker1’ state.sls docker Once Docker has successfully installed on your minion, we need to set up our two Docker containers. The init.sls file in the dock_apache directory is a Salt State that specifies the image to pull from Docker’s public container repository, installs the two containers, and assigns the ports that each container should expose: 8000 for the first container and 8080 for the second container. Now, let’s run the state: salt ‘docker1’ state.sls dock_apache Let’s check and see if it worked. Get the IP address of the minion by running: salt ‘docker1’ network.ip_addrs Copy and paste the IP address into a browser and add one of the ports to the end. Try them both (IP:8000 and IP:8080) to see each container up and running! At this point, you should have two Docker containers installed on your “docker1” minion (or any other docker minions you may have created). Now we need to set up the “haproxy” minion. 
Download the files from Dave’s GitHub repository, haproxy-docker, and place them all in your Salt Master’s /srv/salt/haproxy directory:

cp -r path/to/download/haproxy /srv/salt

First, we need to retrieve the data about our Docker containers from the Docker minion(s), such as the container IP address and port numbers, by running the haproxy.config state:

salt 'haproxy' state.sls haproxy.config

By running haproxy.config, we are setting up a new configuration with the Docker minion data. The configuration file also runs the haproxy state (init.sls). The init.sls file contains a haproxy state that ensures that HAProxy is installed on the minion, sets up the configuration for HAProxy, and then runs HAProxy. You should now be able to look at the HAProxy statistics by entering the haproxy minion’s IP address in your browser with /haproxy?stats on the end of the URL. When the authentication pop-up appears, the username and password are both saltstack. While this is a simple example of running HAProxy to balance the load of only two Docker containers, it demonstrates how Salt can be used to manage tens of thousands of containers as your infrastructure scales. To see this same demonstration with a haproxy minion in action with 99 Docker minions, each with two Docker containers installed, check out SaltStack’s Salt Air 20 tutorial by Dave Boucha on YouTube.

About the author

Nicole Thomas is a QA Engineer at SaltStack, Inc. Before coming to SaltStack, she wore many hats, from web and Android developer to contributing editor to working in Environmental Education. Nicole recently graduated Summa Cum Laude from Westminster College with a degree in Computer Science. Nicole also has a degree in Environmental Studies from the University of Utah.

article-image-searching-data-using-phpmyadmin-and-mysql
Packt
09 Oct 2009
4 min read
Save for later

Searching Data using phpMyAdmin and MySQL

Packt
09 Oct 2009
4 min read
In this article by Marc Delisle, we present mechanisms that can be used to find the data we are looking for instead of just browsing tables page-by-page and sorting them. This article covers single-table and whole database searches. Single-Table Searches This section describes the Search sub-page where single-table search is available. Daily Usage of phpMyAdmin The main use for the tool for some users is with the Search mode, for finding and updating data. For this, the phpMyAdmin team has made it possible to define which sub-page is the starting page in Table view, with the $cfg['DefaultTabTable'] parameter. Setting it to 'tbl_select.php' defines the default sub-page to search. With this mode, application developers can look for data in ways not expected by the interface they are building, adjusting and sometimes repairing data. Entering the Search Sub-Page The Search sub-page can be accessed by clicking the Search link in the Table view. This has been done here for the book table: Selection of Display Fields The first panel facilitates a selection of the fields to be displayed in the results: All fields are selected by default, but we can control-click other fields to make the necessary selections. Mac users would use command-click to select / unselect the fields. Here are the fields of interest to us in this example: We can also specify the number of rows per page in the textbox just next to the field selection. The Add search conditions box will be explained in the Applying a WHERE Clause section later in this article. Ordering the Results The Display order dialog permits to specify an initial sorting order for the results to come. In this dialog, a drop-down menu contains all the table's columns; it's up to us to select the one on which we want to sort. By default, the sorting will be in Ascending order, but a choice of Descending order is available. It should be noted that on the results page, we can also change the sort order. Search Criteria by Field: Query by Example The main usage of the Search panel is to enter criteria for some fields so as to retrieve only the data in which we are interested. This is called Query by example because we give an example of what we are looking for. Our first retrieval will concern finding the book with ISBN 1-234567-89-0. We simply enter this value in the isbn box and choose the = operator: Clicking on Go gives the results shown in the following screenshot. The four fields displayed are those selected in the Select fields dialog: This is a standard results page. If the results ran in pages, we could navigate through them, and edit and delete data for the subset we chose during the process. Another feature of phpMyAdmin is that the fields used as the criteria are highlighted by changing the border color of the columns to better reflect their importance on the results page. It isn't necessary to specify that the isbn column be displayed. We could have selected only the title column for display and selected the isbn column as a criterion. Print View We see the Print view and Print view (with full texts) links on the results page. These links produce a more formal report of the results (without the navigation interface) directly to the printer. In our case, using Print view would produce the following: This report contains information about the server, database, time of generation, version of phpMyAdmin, version of MySQL, and SQL query used. The other link, Print view (with full texts) would print the contents of TEXT fields in its entirety.