
How-To Tutorials - Programming


API Gateway and its Need

Packt
21 Feb 2018
9 min read
In this article by Umesh R Sharma, author of the book Practical Microservices, we will cover the API Gateway pattern and the need for it, with simple and short examples. (For more resources related to this topic, see here.)

Dynamic websites show a lot of information on a single page. A common example is the order success summary page, which shows both the cart details and the customer address. To build it, the frontend has to fire separate queries to the customer detail service and the order detail service. This is a very simple example of a single page depending on multiple services. Because each microservice deals with only one concern, showing a lot of information on one page results in many API calls from that page. A website or mobile page can therefore become very chatty in terms of the calls it makes to display its data.

Another problem is that a microservice sometimes talks over a protocol other than HTTP, such as Thrift, and outside consumers can't deal directly with the microservice in that protocol.

Also, as a mobile screen is smaller than a web page, the data required by a mobile API call differs from that of a desktop call. A developer may want to return less data to the mobile API, or maintain different versions of the API calls for mobile and desktop. So you can end up facing a problem such as this: each client calls different web services and has to keep track of them, and developers have to preserve backward compatibility because the API URLs are embedded in clients such as mobile apps.

Why do we need the API Gateway?

All of the preceding problems can be addressed by putting an API Gateway in place. The API Gateway acts as a proxy between the API consumer and the API servers. To address the first problem in our scenario, the client makes only one call, such as /successOrderSummary, to the API Gateway. The API Gateway, on behalf of the consumer, calls the order and user detail services, combines the results, and serves the response to the client. So it basically acts as a facade for API calls, which may internally call many APIs. The API Gateway serves many purposes, some of which are as follows.

Authentication

The API Gateway can take on the overhead of authenticating API calls from outside. After that, the internal calls can skip the security check. If a request comes from inside the VPC, removing the security check decreases the network latency a bit and lets developers focus more on business logic than on security.

Different protocols

Internally, microservices can use different protocols to talk to each other: Thrift, TCP, UDP, RMI, SOAP, and so on. For clients, there can be a single REST-based HTTP entry point. Clients hit the API Gateway over HTTP, and the API Gateway makes the internal calls in the required protocols, combines the results from all the web services, and responds to the client in the required protocol; in most cases, that protocol will be HTTP.

Load balancing

The API Gateway can work as a load balancer to handle requests in the most efficient manner. It can keep track of the request load it has sent to different nodes of a particular service, and it should be intelligent enough to balance the load between the nodes of that service. With NGINX Plus in the picture, NGINX can be a good candidate for the API Gateway; it has many of the features needed to address the problems usually handled by an API Gateway.
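Before moving on, here is a minimal sketch of the facade behaviour described above. The /successOrderSummary endpoint name comes from the text, but the downstream URLs, the orderId parameter, and the response shape are illustrative assumptions only, not part of the book's example:

import java.util.HashMap;
import java.util.Map;

import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@RestController
public class OrderSummaryController {

    // Plain Spring RestTemplate used to call the downstream services.
    private final RestTemplate rest = new RestTemplate();

    @RequestMapping("/successOrderSummary")
    public Map<String, Object> successOrderSummary(@RequestParam("orderId") String orderId) {
        // Hypothetical downstream endpoints; in a real deployment these would be
        // resolved through service discovery rather than hard-coded URLs.
        Object orderDetail = rest.getForObject("http://localhost:10001/order/" + orderId, Object.class);
        Object customerDetail = rest.getForObject("http://localhost:10002/customer/" + orderId, Object.class);

        // Combine both results into a single response for the client.
        Map<String, Object> summary = new HashMap<>();
        summary.put("order", orderDetail);
        summary.put("customer", customerDetail);
        return summary;
    }
}

The single call to /successOrderSummary replaces the two calls the frontend would otherwise have to make itself.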
Request dispatching (including service discovery)

One of the main features of the gateway is to reduce communication between the client and the microservices. The gateway initiates the required microservice calls in parallel, so from the client side there is only one hit. It hits all the required services, waits for their results, and after obtaining the responses from all the services, combines them and sends the result back to the client. Reactive microservice designs can help you achieve this.

Working with service discovery adds many extra features. It can indicate which node of a service is the master and which is the slave; the same goes for a database, where write requests can go to the master and read requests to a slave. This is just the basic rule, but users can apply many more rules on the basis of the metadata provided to the API Gateway. The gateway can also record the basic response time from each node of a service instance, so that higher-priority API calls can be routed to the fastest responding node. Again, the rules that can be defined depend on the API Gateway you are using and how it is implemented.

Response transformation

Being the first and single point of entry for all API calls, the API Gateway knows which type of client is calling: a mobile client, a web client, or another external consumer. It can make the internal calls on behalf of the client and return the data shaped for each client, as per its needs and configuration.

Circuit breaker

To handle partial failure, the API Gateway uses a technique called the circuit breaker pattern. A failure in one service can cause cascading failures across all the service calls in the stack. The API Gateway can keep an eye on a threshold for any microservice; if a service passes that threshold, the gateway marks that API as an open circuit and decides not to make the call for a configured time. Hystrix (by Netflix) serves this purpose efficiently; its default threshold is 20 request failures in 5 seconds. Developers can also provide a fallback for an open circuit, which can be a dummy service. Once the API starts giving results as expected again, the gateway marks it as a closed circuit.

Pros and cons of the API Gateway

Using an API Gateway has its own pros and cons. The previous sections have already described the advantages; here they are summarized as points.

Pros:
- Microservices can focus on business logic
- Clients can get all the data in a single hit
- Authentication, logging, and monitoring can be handled by the API Gateway
- It gives the flexibility for clients and microservices to talk over completely independent protocols
- It can give tailor-made results, as per the client's needs
- It can handle partial failure

In addition to the preceding pros, there are also some trade-offs to using this pattern.

Cons:
- It can cause performance degradation because of everything that happens on the API Gateway
- A discovery service has to be implemented alongside it
- It can become a single point of failure
- Managing routing is an overhead of the pattern
- It adds an additional network hop to each call
- Overall, it increases the complexity of the system
- Putting too much logic in the gateway leads to another dependency problem

So, before using the API Gateway, both aspects should be considered; the decision to include an API Gateway in the system increases its cost as well.
Before putting effort, cost, and management into this pattern, it is recommended to analyze how much you can gain from it.

Example of API Gateway

In this example, we will show only a sample product page that fetches data from a product detail service. The example could be extended in many directions, but our focus is only to show how the API Gateway pattern works, so we will keep it simple and small. This example uses Zuul from Netflix as the API Gateway. Spring also has an integration of Zuul, so we are building this example with Spring Boot.

For the sample API Gateway implementation, we will use http://start.spring.io/ to generate an initial template of our code. Spring Initializr is the project from Spring that helps beginners generate basic Spring Boot code. A user has to set a minimum configuration and can hit the Generate Project button. If you want to set more specific details regarding the project, you can see all the configuration settings by clicking on the Switch to the full version button, as shown in the following screenshot.

Let's create a controller in the same package as the main application class and put the following code in the file:

@SpringBootApplication
@RestController
public class ProductDetailController {

    @Resource
    ProductDetailService pdService;

    @RequestMapping(value = "/product/{id}")
    public ProductDetail getAllProduct(@PathVariable("id") String id) {
        return pdService.getProductDetailById(id);
    }
}

In the preceding code, the assumption is that the pdService bean interacts with a Spring Data repository for product details and fetches the result for the required product ID. Another assumption is that this service is running on port 10000. Just to make sure everything is running, hitting a URL such as http://localhost:10000/product/1 should return some JSON as a response.

For the API Gateway, we will create another Spring Boot application with Zuul support. Zuul can be activated by just adding the @EnableZuulProxy annotation. The following is the code to start the simple Zuul proxy:

@SpringBootApplication
@EnableZuulProxy
public class ApiGatewayExampleInSpring {
    public static void main(String[] args) {
        SpringApplication.run(ApiGatewayExampleInSpring.class, args);
    }
}

Everything else is managed in configuration. In the application.properties file of the API Gateway, the content will be something like the following:

zuul.routes.product.path=/product/**
zuul.routes.product.url=http://localhost:10000
ribbon.eureka.enabled=false
server.port=8080

With this configuration, we are defining a rule such as this: any request for a URL like /product/xxx is passed on to http://localhost:10000. To the outside world, the URL will be http://localhost:8080/product/1, which is internally forwarded to port 10000. If we defined a spring.application.name variable with the value product in the product detail microservice, then we wouldn't need to define the path property here (zuul.routes.product.path=/product/**), as Zuul, by default, would map the route to /product.

The example taken here for an API Gateway is not very intelligent, but Zuul is a very capable API Gateway. Depending on the routes, filters, and caching defined in Zuul's properties, one can build a very powerful API Gateway.

Summary

In this article, you learned about the API Gateway, the need for it, and its pros and cons, with a code example.
Resources for Article:   Further resources on this subject: What are Microservices? [article] Microservices and Service Oriented Architecture [article] Breaking into Microservices Architecture [article]


Consuming Diagnostic Analyzers in .NET projects

Packt
20 Feb 2018
6 min read
We know how to write diagnostic analyzers to analyze and report issues about .NET source code and contribute them to the .NET developer community. In this article by Manish Vasani, author of the book Roslyn Cookbook, we will show you how to search, install, view, and configure the analyzers that have already been published by various analyzer authors on NuGet and the VS Extension gallery. We will cover the following recipes (for more resources related to this topic, see here):

- Searching and installing analyzers through the NuGet package manager
- Searching and installing VSIX analyzers through the VS extension gallery
- Viewing and configuring analyzers in solution explorer in Visual Studio
- Using a ruleset file and the ruleset editor to configure analyzers

Diagnostic analyzers are extensions to the Roslyn C# compiler and the Visual Studio IDE that analyze user code and report diagnostics. Users will see these diagnostics in the error list after building the project from Visual Studio, and even when building the project on the command line. They will also see the diagnostics live while editing the source code in the Visual Studio IDE. Analyzers can report diagnostics to enforce specific code styles, improve code quality and maintenance, recommend design guidelines, or even report very domain-specific issues which cannot be covered by the core compiler.

Analyzers can be installed in a .NET project either as a NuGet package or as a VSIX; the two packaging schemes lead to differences in the analyzer experience, as the following recipes show. Analyzers are supported on the various flavors of .NET Standard, .NET Core, and .NET Framework projects, for example, class libraries, console apps, and so on.

Searching and installing analyzers through the NuGet package manager

In this recipe we will show you how to search for and install analyzer NuGet packages in the NuGet package manager in Visual Studio, and see how the analyzer diagnostics from an installed NuGet package light up in the project build and as live diagnostics during code editing in Visual Studio.

Getting ready

You will need to have Visual Studio 2017 installed on your machine for this recipe. You can install the free community version of Visual Studio 2017 from https://www.visualstudio.com/thank-you-downloading-visual-studio/?sku=Community&rel=15.

How to do it…

1. Create a C# class library project, say ClassLibrary, in Visual Studio 2017.
2. In solution explorer, right click on the solution or project node and execute the Manage NuGet Packages command. This brings up the NuGet Package Manager, which can be used to search for and install NuGet packages in the solution or project.
3. In the search bar, type the following text to find NuGet packages tagged as analyzers: Tags:"analyzers". Note that some well-known packages are tagged as analyzer instead, so you may also want to search: Tags:"analyzer".
4. Check or uncheck the Include prerelease checkbox to the right of the search bar to show or hide prerelease analyzer packages. The packages are listed by number of downloads, with the most downloaded package at the top.
5. Select a package to install, say System.Runtime.Analyzers, pick a specific version, say 1.1.0, and click Install.
6. Click the I Accept button on the License Acceptance dialog to install the NuGet package.
7. Verify that the installed analyzer(s) show up under the Analyzers node in the solution explorer.
8. Verify that the project file has a new ItemGroup with the following analyzer references from the installed analyzer package:

<ItemGroup>
  <Analyzer Include="..\packages\System.Runtime.Analyzers.1.1.0\analyzers\dotnet\cs\System.Runtime.Analyzers.dll" />
  <Analyzer Include="..\packages\System.Runtime.Analyzers.1.1.0\analyzers\dotnet\cs\System.Runtime.CSharp.Analyzers.dll" />
</ItemGroup>

9. Add the following code to your C# project:

namespace ClassLibrary
{
    public class MyAttribute : System.Attribute
    {
    }
}

10. Verify that the analyzer diagnostic from the installed analyzer is shown in the error list.
11. Open a Visual Studio 2017 Developer Command Prompt and build the project to verify that the analyzer executes during the command-line build and that the analyzer diagnostic is reported.
12. Create a new C# project in VS2017, add the same code to it as in step 9, and verify that no analyzer diagnostic shows up in the error list or on the command line, confirming that the analyzer package was only installed to the selected project in steps 1-6.

Note that CA1018 (Custom attribute should have AttributeUsage defined) has been moved to a separate analyzer assembly in later versions of the FxCop/System.Runtime.Analyzers package. It is recommended that you install the Microsoft.CodeAnalysis.FxCopAnalyzers NuGet package to get the latest group of Microsoft-recommended analyzers.

Searching and installing VSIX analyzers through the VS extension gallery

In this recipe we will show you how to search for and install analyzer VSIX packages in the Visual Studio Extension manager and see how the analyzer diagnostics from an installed VSIX light up as live diagnostics during code editing in Visual Studio.

Getting ready

You will need to have Visual Studio 2017 installed on your machine for this recipe. You can install the free community version of Visual Studio 2017 from https://www.visualstudio.com/thank-you-downloading-visual-studio/?sku=Community&rel=15.

How to do it…

1. Create a C# class library project, say ClassLibrary, in Visual Studio 2017.
2. From the top-level menu, execute Tools | Extensions and Updates.
3. Navigate to Online | Visual Studio Marketplace in the left tab of the dialog to view the VSIXes available in the Visual Studio extension gallery/marketplace.
4. Search analyzers in the search text box in the upper right corner of the dialog and download an analyzer VSIX, say Refactoring Essentials for Visual Studio.
5. Once the download completes, you will get a message at the bottom of the dialog that the install will be scheduled to execute once Visual Studio and related windows are closed. Close the dialog and then close the Visual Studio instance to start the install.
6. In the VSIX Installer dialog, click Modify to start the installation.
7. The subsequent message prompts you to kill all the active Visual Studio and satellite processes. Save all your relevant work in all the open Visual Studio instances, and click End Tasks to kill these processes and install the VSIX.
8. After installation, restart VS, click Tools | Extensions and Updates, and verify that the Refactoring Essentials VSIX is installed.
9. Create a new C# project with the following source code and verify analyzer diagnostic RECS0085 (Redundant array creation expression) in the error list:

namespace ClassLibrary
{
    public class Class1
    {
        void Method()
        {
            int[] values = new int[] { 1, 2, 3 };
        }
    }
}

10. Build the project from Visual Studio 2017 or the command line and confirm that no analyzer diagnostic shows up in the Output Window or on the command line respectively, confirming that the VSIX analyzer did not execute as part of the build.
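As a brief illustration of acting on these diagnostics (not part of the original recipes), the CA1018 warning raised for the MyAttribute class above can be resolved by declaring how the attribute may be used, and the RECS0085 warning by dropping the redundant array creation expression. The AttributeTargets.All choice is only illustrative, and the exact diagnostics you see depend on the analyzer versions installed:

namespace ClassLibrary
{
    // Declaring AttributeUsage addresses CA1018 (Custom attribute should have
    // AttributeUsage defined); AttributeTargets.All is just an illustrative choice.
    [System.AttributeUsage(System.AttributeTargets.All)]
    public class MyAttribute : System.Attribute
    {
    }

    public class Class1
    {
        void Method()
        {
            // Using an array initializer without "new int[]" addresses RECS0085
            // (Redundant array creation expression).
            int[] values = { 1, 2, 3 };
        }
    }
}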
Resources for Article: Further resources on this subject: C++, SFML, Visual Studio, and Starting the first game [article] Connecting to Microsoft SQL Server Compact 3.5 with Visual Studio [article] Creating efficient reports with Visual Studio [article]


Introduction to Performance Testing and JMeter

Packt
20 Feb 2018
11 min read
In this article by Bayo Erinle, the author of the book Performance Testing with JMeter 3, we will explore some of the options that make JMeter a great tool of choice for performance testing. (For more resources related to this topic, see here.)

Performance testing and tuning

There is a strong relationship between performance testing and tuning, in the sense that one often leads to the other. Often, end-to-end testing unveils system or application bottlenecks that are regarded as unacceptable against the project's target goals. Once those bottlenecks are discovered, the next step for most teams is a series of tuning efforts to make the application perform adequately. Such efforts normally include, but are not limited to, the following:

- Configuration changes in system resources
- Optimizing database queries
- Reducing round trips in application calls, sometimes leading to redesigning and re-architecting problematic modules
- Scaling out application and database server capacity
- Reducing the application's resource footprint
- Optimizing and refactoring code, including eliminating redundancy and reducing execution time

Tuning efforts may also commence if the application has reached acceptable performance but the team wants to reduce the amount of system resources being used, decrease the volume of hardware needed, or further increase system performance. After each change (or series of changes), the test is re-executed to see whether the performance has improved or declined as a result. The process continues until the performance results reach the acceptable goals. The outcome of these test-tuning cycles normally produces a baseline.

Baselines

Baselining is the process of capturing performance metric data for the sole purpose of evaluating the efficacy of successive changes to the system or application. It is important that all characteristics and configurations, except those specifically being varied for comparison, remain the same, in order to make effective comparisons as to which change (or series of changes) is driving results toward the targeted goal. Armed with such baseline results, subsequent changes can be made to the system configuration or application, and testing results can be compared to see whether such changes were relevant or not. Some considerations when generating baselines include the following:

- They are application-specific
- They can be created for a system, application, or module
- They are metrics/results
- They should not be overgeneralized
- They evolve and may need to be redefined from time to time
- They act as a shared frame of reference
- They are reusable
- They help identify changes in performance

Load and stress testing

Load testing is the process of putting demand on a system and measuring its response, that is, determining how much volume the system can handle. Stress testing is the process of subjecting the system to unusually high loads, far beyond its normal usage pattern, to determine its responsiveness. These are different from performance testing, whose sole purpose is to determine the response and effectiveness of a system, that is, how fast the system is. Since load ultimately affects how a system responds, performance testing is always done in conjunction with stress testing.

JMeter to the rescue

One of the areas performance testing covers is testing tools. Which testing tool do you use to put the system and application under load? There are numerous testing tools available to perform this operation, from free to commercial solutions.
However, our focus will be on Apache JMeter, a free, open source, cross-platform desktop application from the Apache Software Foundation. JMeter has been around since 1998, according to the historic change logs on its official site, making it a mature, robust, and reliable testing tool. Cost may also have played a role in its wide adoption. Small companies usually may not want to foot the bill for commercial testing tools, which often place restrictions, for example, on how many concurrent users one can spin off. My first encounter with JMeter was a result of exactly this. I worked in a small shop that had paid for a commercial testing tool, but during the course of testing, we outran the licensing limits on how many concurrent users we needed to simulate for realistic test plans. Since JMeter was free, we explored it and were quite delighted with the offerings and the sheer amount of features we got for free. Here are some of its features:

- Performance tests of different server types, including web (HTTP and HTTPS), SOAP, database, LDAP, JMS, mail, and native commands or shell scripts
- Complete portability across various operating systems
- Full multithreading framework allowing concurrent sampling by many threads and simultaneous sampling of different functions by separate thread groups
- Full-featured Test IDE that allows fast Test Plan recording, building, and debugging
- Dashboard Report for detailed analysis of application performance indexes and key transactions
- In-built integration with real-time reporting and analysis tools, such as Graphite, InfluxDB, and Grafana, to name a few
- Complete dynamic HTML reports
- Graphical User Interface (GUI)
- HTTP proxy recording server
- Caching and offline analysis/replaying of test results
- High extensibility
- Live view of results as testing is being conducted

JMeter allows multiple concurrent users to be simulated on the application, allowing you to work toward most of the target goals mentioned earlier, such as attaining a baseline and identifying bottlenecks. It will help answer questions such as the following:

- Will the application still be responsive if 50 users are accessing it concurrently?
- How reliable will it be under a load of 200 users?
- How much of the system resources will be consumed under a load of 250 users?
- What will the throughput look like with 1000 users active in the system?
- What will be the response time for the various components in the application under load?

JMeter, however, should not be confused with a browser. It doesn't perform all the operations supported by browsers; in particular, JMeter does not execute the JavaScript found in HTML pages, nor does it render HTML pages the way a browser does. It does give you the ability to view request responses as HTML through many of its listeners, but the timings are not included in any samples. Furthermore, there are limitations on how many users can be spun up on a single machine. These vary depending on the machine specifications (for example, memory, processor speed, and so on) and the test scenarios being executed. In our experience, we have mostly been able to successfully spin off 250-450 users on a single machine with a 2.2 GHz processor and 8 GB of RAM.

Up and running with JMeter

Now, let's get up and running with JMeter, beginning with its installation.

Installation

JMeter comes as a bundled archive, so it is super easy to get started with it. Those working in corporate environments behind a firewall, or on machines with non-admin privileges, appreciate this all the more.
To get started, grab the latest binary release by pointing your browser to http://jmeter.apache.org/download_jmeter.cgi. At the time of writing this, the current release version is 3.1. The download site offers the bundle as both a .zip file and a .tgz file. We go with the .zip file option, but feel free to download the .tgz file if that's your preferred way of grabbing archives. Once downloaded, extract the archive to a location of your choice. The location you extracted the archive to will be referred to as JMETER_HOME.

Provided you have a JDK/JRE correctly installed and a JAVA_HOME environment variable set, you are all set and ready to run! The following screenshot shows a trimmed-down directory structure of a vanilla JMeter install (the JMETER_HOME folder structure). The following are some of the folders in Apache-JMeter-3.2, as shown in the preceding screenshot:

- bin: This folder contains executable scripts to run and perform other operations in JMeter
- docs: This folder contains a well-documented user guide
- extras: This folder contains miscellaneous items, including samples illustrating the usage of the Apache Ant build tool (http://ant.apache.org/) with JMeter, and BeanShell scripting
- lib: This folder contains utility JAR files needed by JMeter (you may add additional JARs here to use from within JMeter; we will cover this in detail later)
- printable_docs: This is the printable documentation

Installing the Java JDK

Follow these steps to install the Java JDK:

1. Go to http://www.oracle.com/technetwork/java/javase/downloads/index.html.
2. Download the Java JDK (not the JRE) compatible with the system that you will use to test. At the time of writing, JDK 1.8 (update 131) was the latest.
3. Double-click on the executable and follow the onscreen instructions.

On Windows systems, the default location for the JDK is under Program Files. While there is nothing wrong with this, the issue is that the folder name contains a space, which can sometimes be problematic when attempting to set PATH and run programs, such as JMeter, that depend on the JDK from the command line. With this in mind, it is advisable to change the default location to something like C:\tools\jdk.

Setting up JAVA_HOME

Here are the steps to set up the JAVA_HOME environment variable on Windows and Unix operating systems.

On Windows

For illustrative purposes, assume that you have installed the Java JDK at C:\tools\jdk:

1. Go to Control Panel.
2. Click on System.
3. Click on Advanced system settings.
4. Add a new environment variable with the name JAVA_HOME and the value C:\tools\jdk.
5. Locate Path (under system variables, in the bottom half of the screen).
6. Click on Edit.
7. Append %JAVA_HOME%\bin to the end of the existing path value (if any).

On Unix

For illustrative purposes, assume that you have installed the Java JDK at /opt/tools/jdk:

1. Open up a Terminal window.
2. Run export JAVA_HOME=/opt/tools/jdk.
3. Run export PATH=$PATH:$JAVA_HOME/bin.

It is advisable to set this in your shell profile settings, such as .bash_profile (for bash users) or .zshrc (for zsh users), so that you won't have to set it for each new Terminal window you open.

Running JMeter

Once installed, the bin folder under the JMETER_HOME folder contains all the executable scripts that can be run. Based on the operating system that you installed JMeter on, you either execute the shell scripts (.sh files) on operating systems that are Unix/Linux flavored, or their batch (.bat file) counterparts on operating systems that are Windows flavored. JMeter test plans are saved as XML files with a .jmx extension.
We refer to them as test scripts or JMX files. The executable scripts in the bin folder include the following:

- jmeter.sh: This script launches the JMeter GUI (the default)
- jmeter-n.sh: This script launches JMeter in non-GUI mode (it takes a JMX file as input)
- jmeter-n-r.sh: This script launches JMeter in non-GUI mode remotely
- jmeter-t.sh: This opens a JMX file in the GUI
- jmeter-server.sh: This script starts JMeter in server mode (this will be kicked off on the master node when testing with multiple machines remotely)
- mirror-server.sh: This script runs the mirror server for JMeter
- shutdown.sh: This script gracefully shuts down a running non-GUI instance
- stoptest.sh: This script abruptly shuts down a running non-GUI instance

To start JMeter, open a Terminal shell, change to the JMETER_HOME/bin folder, and run the following command on Unix/Linux:

./jmeter.sh

Alternatively, run the following command on Windows:

jmeter.bat

Take a moment to explore the GUI. Hover over each icon to see a short description of what it does. The Apache JMeter team has done an excellent job with the GUI. Most icons are very similar to what you are used to, which helps ease the learning curve for new adopters. Some of the icons, for example, stop and shutdown, are disabled until a scenario/test is being conducted. The JVM_ARGS environment variable can be used to override the JVM settings in the jmeter.bat or jmeter.sh script. Consider the following example:

export JVM_ARGS="-Xms1024m -Xmx1024m -Dpropname=propvalue"

Command-line options

To see all the options available to start JMeter, run the JMeter executable with the -? option. The options provided are as follows:

./jmeter.sh -?
-? print command line options and exit
-h, --help print usage information and exit
-v, --version print the version information and exit
-p, --propfile <argument> the jmeter property file to use
-q, --addprop <argument> additional JMeter property file(s)
-t, --testfile <argument> the jmeter test(.jmx) file to run
-l, --logfile <argument> the file to log samples to
-j, --jmeterlogfile <argument> jmeter run log file (jmeter.log)
-n, --nongui run JMeter in nongui mode
...
-J, --jmeterproperty <argument>=<value> Define additional JMeter properties
-G, --globalproperty <argument>=<value> Define Global properties (sent to servers), e.g. -Gport=123 or -Gglobal.properties
-D, --systemproperty <argument>=<value> Define additional system properties
-S, --systemPropertyFile <argument> additional system property file(s)

This is a snippet (non-exhaustive list) of what you might see if you do the same.

Summary

In this article we learned about the relationship between performance testing and tuning, and how to install and run JMeter.

Resources for Article: Further resources on this subject: Functional Testing with JMeter [article] Creating an Apache JMeter™ test workbench [article] Getting Started with Apache Spark DataFrames [article]


Getting Inside a C++ Multithreaded Application

Maya Posch
13 Feb 2018
8 min read
This C++ programming tutorial is taken from Maya Posch's Mastering C++ Multithreading. In its most basic form, a multithreaded application consists of a singular process with two or more threads. These threads can be used in a variety of ways; for example, to allow the process to respond to events in an asynchronous manner by using one thread per incoming event or type of event, or to speed up the processing of data by splitting the work across multiple threads.

Examples of asynchronous response to events include the processing of user interface (GUI) and network events on separate threads, so that neither type of event has to wait on the other, or can block events from being responded to in time. Generally, a single thread performs a single task, such as the processing of GUI or network events, or the processing of data.

For this basic example, the application will start with a singular thread, which will then launch a number of threads and wait for them to finish. Each of these new threads will perform its own task before finishing.

Let's start with the includes and global variables for our application:

#include <iostream>
#include <thread>
#include <mutex>
#include <vector>
#include <random>

using namespace std;

// --- Globals
mutex values_mtx;
mutex cout_mtx;
vector<int> values;

// Forward declarations of the functions used by the threads (defined below).
void threadFnc(int tid);
int randGen(const int& min, const int& max);

Both the I/O stream and vector headers should be familiar to anyone who has ever used C++: the former is used here for the standard output (cout), and vector for storing a sequence of values. The random header is new in C++11, and as the name suggests, it offers classes and methods for generating random sequences. We use it here to make our threads do something interesting. Finally, the thread and mutex includes are the core of our multithreaded application; they provide the basic means for creating threads, and allow for thread-safe interactions between them.

Moving on, we create two mutexes: one for the global vector and one for cout, since the latter is not thread-safe.

Next we create the main function as follows:

int main() {
    values.push_back(42);

We then push a fixed value onto the vector instance; this one will be used by the threads we create in a moment:

    thread tr1(threadFnc, 1);
    thread tr2(threadFnc, 2);
    thread tr3(threadFnc, 3);
    thread tr4(threadFnc, 4);

We create new threads, and provide them with the name of the method to use, passing along any parameters -- in this case, just a single integer:

    tr1.join();
    tr2.join();
    tr3.join();
    tr4.join();

Next, we wait for each thread to finish before we continue, by calling join() on each thread instance:

    cout << "Input: " << values[0] << ", Result 1: " << values[1] << ", Result 2: " << values[2] << ", Result 3: " << values[3] << ", Result 4: " << values[4] << "\n";
    return 1;
}

At this point, we expect that each thread has done whatever it's supposed to do, and added the result to the vector, which we then read out and show the user. Of course, this shows almost nothing of what really happens in the application, mostly just the essential simplicity of using threads.
Next, let's see what happens inside the method that we pass to each thread instance:

void threadFnc(int tid) {
    cout_mtx.lock();
    cout << "Starting thread " << tid << ".\n";
    cout_mtx.unlock();

    values_mtx.lock();
    int val = values[0];
    values_mtx.unlock();

When we obtain the initial value set in the vector, we copy it to a local variable so that we can immediately release the mutex for the vector, to enable other threads to use it:

    int rval = randGen(0, 10);
    val += rval;

These last two lines contain the essence of what the created threads do: they take the initial value and add a randomly generated value to it. The randGen() method takes two parameters, defining the range of the returned value:

    cout_mtx.lock();
    cout << "Thread " << tid << " adding " << rval << ". New value: " << val << ".\n";
    cout_mtx.unlock();

    values_mtx.lock();
    values.push_back(val);
    values_mtx.unlock();
}

Finally, we (safely) log a message informing the user of the result of this action before adding the new value to the vector. In both cases, we use the respective mutex to ensure that there can be no overlap with any of the other threads. Once the method reaches this point, the thread containing it will terminate, and the main thread will have one fewer thread to wait for to rejoin.

Lastly, we'll take a look at the randGen() method. Here we can see some multithreading-specific additions as well:

int randGen(const int& min, const int& max) {
    static thread_local mt19937 generator(hash<thread::id>()(this_thread::get_id()));
    uniform_int_distribution<int> distribution(min, max);
    return distribution(generator);
}

This preceding method takes a minimum and maximum value as explained earlier, which limit the range of the random numbers it can return. At its core, it uses an mt19937-based generator, which employs a 32-bit Mersenne Twister algorithm with a state size of 19937 bits. This is a common and appropriate choice for most applications.

Of note here is the use of the thread_local keyword. What this means is that even though it is defined as a static variable, its scope will be limited to the thread using it. Every thread will thus create its own generator instance, which is important when using the random number API in the STL. A hash of the internal thread identifier (not our own) is used as the seed for the generator. This ensures that each thread gets a fairly unique seed for its generator instance, allowing for better random number sequences.

Finally, we create a new uniform_int_distribution instance using the provided minimum and maximum limits, and use it together with the generator instance to generate the random number, which we return.

Makefile

In order to compile the code described earlier, one could use an IDE, or type the command on the command line. As mentioned in the beginning of this chapter, we'll be using makefiles for the examples in this book. The big advantages of this are that one does not have to repeatedly type in the same extensive command, and it is portable to any system which supports make. The makefile for this example is rather basic:

GCC := g++

OUTPUT := ch01_mt_example
SOURCES := $(wildcard *.cpp)
CCFLAGS := -std=c++11

all: $(OUTPUT)

$(OUTPUT):
	$(GCC) -o $(OUTPUT) $(CCFLAGS) $(SOURCES)

clean:
	rm $(OUTPUT)

.PHONY: all

From the top down, we first define the compiler that we'll use (g++), set the name of the output binary (the .exe extension on Windows will be post-fixed automatically), followed by the gathering of the sources and any important compiler flags.
The wildcard feature allows one to collect the names of all files matching the string following it in one go, without having to define the name of each source file in the folder individually. For the compiler flags, we're only really interested in enabling the C++11 features, for which GCC still requires one to supply this compiler flag. For the all target, we just tell make to build the output binary, which runs g++ with the supplied information. Next we define a simple clean target which just removes the produced binary, and finally, we tell make to not interpret any folder or file named all in the folder, but to use the internal target, with the .PHONY section.

We run this makefile as follows:

$ make

Afterwards, we find an executable file called ch01_mt_example (with the .exe extension attached on Windows) in the same folder. Executing this binary will result in command-line output akin to the following:

$ ./ch01_mt_example.exe
Starting thread 1.
Thread 1 adding 8. New value: 50.
Starting thread 2.
Thread 2 adding 2. New value: 44.
Starting thread 3.
Starting thread 4.
Thread 3 adding 0. New value: 42.
Thread 4 adding 8. New value: 50.
Input: 42, Result 1: 50, Result 2: 44, Result 3: 42, Result 4: 50

What one can see here already is the somewhat asynchronous nature of threads and their output. While threads 1 and 2 appear to run synchronously, threads 3 and 4 clearly run asynchronously. For this reason, and especially with longer-running threads, it's virtually impossible to say in which order the log output and results will be returned. While we use a simple vector to collect the results of the threads, there is no saying whether Result 1 truly originates from the thread which we assigned ID 1 in the beginning. If we need this information, we need to extend the data we return by using an information structure with details on the processing thread, or similar. One could, for example, use a struct like this:

struct result {
    int tid;
    int result;
};

The vector would then be changed to contain result instances rather than integer instances (a short sketch of this change follows at the end of this section). One could pass the initial integer value directly to the thread as part of its parameters, or pass it via some other way.

Want to learn C++ multithreading in detail? You can find Mastering C++ Multithreading here, or explore all our latest C++ eBooks and videos here.
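As a brief sketch of that change (not from the book; it reuses the struct result, the randGen() helper, the headers, and the values_mtx mutex from the listings above, and everything else is illustrative), the shared vector and thread function could be reworked like this:

// The shared vector now stores result instances rather than plain integers.
vector<result> results;

// The initial value is passed directly as a parameter instead of being read
// from the shared vector.
void threadFnc2(int tid, int initial) {
    int val = initial + randGen(0, 10);

    // Record which thread produced which value.
    values_mtx.lock();
    results.push_back(result{tid, val});
    values_mtx.unlock();
}

int main() {
    thread tr1(threadFnc2, 1, 42);
    thread tr2(threadFnc2, 2, 42);
    tr1.join();
    tr2.join();

    // After joining, each entry tells us exactly which thread produced it.
    for (const result& r : results) {
        cout << "Thread " << r.tid << " result: " << r.result << "\n";
    }
    return 0;
}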


Getting Started with SOA and WSO2

Packt
11 Jan 2018
11 min read
In this article by Fidel Prieto Estrada and Ramón Garrido, authors of the book WSO2: Developer's Guide, we will discuss the facts and problems that large companies with huge IT systems had to face, and that finally gave rise to the SOA approach. (For more resources related to this topic, see here.) Once we know what we are talking about, we will introduce the WSO2 technology and describe the role it plays in SOA, followed by the installation and configuration of the WSO2 products we will use. So, in this article, we will learn the basics of SOA.

Service-oriented architecture (SOA) is a style, an approach to designing software in a different way from the standard. SOA is not a technology; it is a paradigm, a design style. There comes a time when a company grows and grows, which means that its IT system also becomes bigger and bigger, fetching a huge amount of data that it has to share with other companies. This typical data may be, for example, any of the following:

- Sales data
- Employee data
- Customer data
- Business information

In this environment, each information need of the company's applications is satisfied by a direct link to the system that owns the required information. So, when a company becomes a large corporation, with many departments and complex business logic, the IT system becomes a spaghetti dish.

(Figure: Spaghetti dish)

The spaghetti dish is a comparison widely used to describe how complex the integration links between applications may become in such a large corporation. In this comparison, each strand of spaghetti represents the link between two applications used to share some kind of information. Thus, when the number of applications needed for our business rises, the amount of information shared grows as well. So, if we draw the map that represents all the links between the whole set of applications, the image will be quite similar to a spaghetti dish, as the following figure shows.

(Figure: Spaghetti integrations, by Oracle - https://image.slidesharecdn.com/2012-09-20-aspire-oraclesoawebinar-finalversion-160109031240/95/maximizing-oracle-apps-capabilities-using-oracle-soa-7-638.jpg?cb=1452309418)

The preceding diagram represents an environment that is closed, monolithic, and inefficient, with the following features:

- The architecture is split into blocks divided by business areas. Each area is closed to the rest of the areas, so interaction between them is quite difficult.
- These isolated blocks are hard to maintain.
- Each block is managed by just one provider, which knows that business area deeply.
- It is difficult for the company to change the provider that manages each business area, due to the risk involved.
- The company cannot protect itself against the abuses of the provider. The provider may commit many abuses, such as raising the fare for the provided service, violating the service level agreement (SLA), breaching the schedule, and many others we can imagine. In these situations, the company lacks the instruments to fight back, because if the business area managed by the provider stops working, the impact on the company's profits is much larger than the cost of tolerating the provider's abuses.
- The provider has deeper knowledge of the customer's business than the customer itself.
- The maintenance cost is high, due to the complexity of the network, for many reasons; consider the following examples:
  - It is difficult to perform an impact analysis when a new functionality is needed, which means a high cost and a long time to evaluate any fix, and a higher cost for each fix in turn.
  - The complex interconnection network is difficult to know in depth, so finding the cause of a failure or malfunction may become quite a task.
  - When one system is down, most of the others may be down as well.
- A business process usually involves different databases and applications. Thus, when a user has to run a business process in the company, they need to use different applications, access different networks, and log in with different credentials in each one; this makes the business quite inefficient, making simple tasks take too much time.
- When a system in your puzzle uses an obsolete technology, which is quite common with legacy systems, you will always be tied to it and to its incompatibility issues with brand new technologies.
- Managing a fine-grained security policy that controls who has access to each piece of data is simply a utopia.

Something must be done to face all these problems, and SOA is the approach that puts this in order. SOA is the final approach, arrived at after previous attempts to tidy up this chaos. We can take a look at the origin of SOA in the white paper The 25-year history of SOA, by Erik Townsend (http://www.eriktownsend.com/white-papers/technology). It is quite an interesting read, in which Erik traces the origin of SOA back to the manufacturing industry. I agree with that idea, and it is easy to see how improvements in the manufacturing industry, or other industries, end up being applied to the IT world; take these examples:

- Hardware buses on motherboards have been used for decades, and now we can also find a software bus, the Enterprise Service Bus (ESB), in a company. The hardware bus connects hardware devices such as the microprocessor, memory, or hard drive; the software bus connects applications.
- A hardware router in a network routes small fragments of data between different networks to lead those packets to the destination network. Message router software, which implements the message router enterprise integration pattern, routes data objects between applications.
- We create software factories to develop software using the same paradigm as a manufacturing industry.
- Lean IT is a trending topic nowadays. Roughly speaking, it tries to optimize IT processes by removing muda (a Japanese word meaning wastefulness or uselessness). It is based on the benefits of the lean manufacturing applied by Toyota in the '70s, after the oil crisis, which led it to the top position in the car manufacturing industry.
- We find an analogy between what object-oriented languages mean to programming and what SOA represents to system integration.
- We can also find analogies between ITIL v3 and SOA. The way ITIL v3 manages the company's services can be applied to managing SOA services at many points. ITIL v3 deals with the services that a company offers and how to manage them, and SOA deals with the services a company uses to expose data from one system to the rest. Both conceptions are quite similar if we think of the ITIL v3 company as the IT department and of the company's service as the SOA service.

There is another quite interesting read, Note on Distributed Computing, from Sun Microsystems Laboratories, published in 1994.
In this paper, four members of Sun Microsystems discuss the problems that a company faces when it expands, the systems that make up the IT core of the company, and its need to share information. You can find it at http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.48.7969&rep=rep1&type=pdf.

In the early '90s, when companies were starting to computerize, they needed to share information from one system to another, which was not an easy task at all. There was discussion on how to handle local and remote information, as well as which technology to use to share that information. The Network File System (NFS), from Sun Microsystems, was a good attempt to share that information, but there was still a lot of work left to do. After NFS, other approaches came, such as CORBA and Microsoft DCOM, but they still kept the dependencies between the whole set of connected applications. Refer to the following diagram:

(Figure: The SOA approach versus CORBA and DCOM)

Finally, with the SOA approach, by the end of the '90s, independent applications were able to share their data while avoiding dependencies. This data interchange is done using services. An SOA service is a data interchange between different systems that follows a set of rules. These rules are the so-called SOA principles, which we will explain as we move on.

SOA principles

The SOA principles are the rules that we always have to keep in mind when taking any kind of decision in an SOA organization, such as:

- Analyzing proposals for services
- Deciding whether to add a new functionality to a service or to split it into two services
- Solving performance issues
- Designing new services

There is no industry-wide agreement about the SOA principles, and some organizations publish their own. Now, we will go through the principles that will help us understand their importance:

- Service standardization: Services must comply with the communication and design agreements defined for the catalog they belong to. These include both high-level specifications and low-level details, such as those mentioned here:
  - Service name
  - Functional details
  - Input data
  - Output data
  - Protocols
  - Security
- Service loose coupling: Services in the catalog must be independent of each other. The only thing a service should know about the rest of the services in the catalog is that they exist. The way to achieve this is by defining service contracts, so that when a service needs to use another one, it just uses that service's contract.
- Service abstraction: The service should be a black box defined only by its contracts. The contract specifies the input and output parameters, with no information at all about how the processing is performed. This reduces the coupling with other services to a minimum.
- Service reusability: This is the most important principle, and it means that services must be conceived to be reused by the maximum number of consumers. The service must be reusable in any context and by any consumer, not only by the application that originated the need for the service. Other applications in the company must be able to consume that service, and even systems outside the company in case the service is published, for example, for the citizenship. To achieve this, the service must obviously be independent of any technology and must not be coupled to a specific business process.
  If we have a service working in one context and it is needed in a wider context, the right choice is to modify the service so that it can be consumed in both contexts.
- Service autonomy: A service must have a high degree of control over its runtime environment and over the logic it represents. The more control a service has over its underlying resources, the fewer dependencies it has and the more predictable and reliable it is. Resources may be hardware or software resources; for example, the network is a hardware resource, and a database table is a software resource. It would be ideal to have a service with exclusive ownership of its resources, but otherwise it should have enough control to minimize its dependencies on shared resources.
- Service statelessness: Services must have no state; that is, a service does not retain information about the data it has processed. All the data needed comes from the input parameters every time the service is consumed, and the information needed during processing dies when the processing ends. Managing all that state information would put the service's availability in serious trouble.
- Service discovery: With the goal of maximizing reuse, services must be discoverable. Everyone should know the list of services and their detailed information. To achieve that aim, services have metadata that describes them, stored in a repository or catalog. This metadata must be accessible easily and automatically (programmatically), using, for example, Universal Description, Discovery, and Integration (UDDI). Thus, we avoid building or commissioning a new service when we already have one, or several, providing that information by composition.
- Service composability: A service with more complex requirements must use other existing services to achieve its aim, instead of implementing logic that is already available in other services.
- Service granularity: Services must offer a relevant piece of business. The functionality of the service must not be so simple that its output always needs to be complemented with another service's functionality. Likewise, the functionality of the service must not be so complex that no consumer in the company uses the whole set of information returned by the service.
- Service normalization: As in other areas, such as database design, services must be decomposed, avoiding redundant logic. This principle may be relaxed in some cases, for example due to performance issues, where the priority is a quick response for the business.
- Vendor independence: As we discussed earlier, services must not be attached to any technology. The service definition must be technology-independent, and any vendor-specific feature must not affect the design of the service.

Summary

In this article, we discussed the issues that gave rise to SOA, described its main principles, and explained how to make a standard organization into an SOA organization. To achieve this aim, we named the WSO2 product we need: WSO2 Enterprise Integrator. Finally, we learned how to install, configure, and start it up.


Starting Out

Packt
10 Aug 2017
21 min read
In this article by Chris Simmonds, author of the book Mastering Embedded Linux Programming – Second Edition, you are about to begin working on your next project, and this time it is going to be running Linux. What should you think about before you put finger to keyboard? Let's begin with a high-level look at embedded Linux and see why it is popular, what the implications of open source licenses are, and what kind of hardware you will need to run Linux. (For more resources related to this topic, see here.)

Linux first became a viable choice for embedded devices around 1999. That was when Axis (https://www.axis.com) released their first Linux-powered network camera and TiVo (https://business.tivo.com/) their first Digital Video Recorder (DVR). Since 1999, Linux has become ever more popular, to the point that today it is the operating system of choice for many classes of product. At the time of writing, in 2017, there are about two billion devices running Linux. That includes a large number of smartphones running Android, which uses a Linux kernel, and hundreds of millions of set-top boxes, smart TVs, and Wi-Fi routers, not to mention a very diverse range of devices such as vehicle diagnostics, weighing scales, industrial devices, and medical monitoring units that ship in smaller volumes.

So, why does your TV run Linux? At first glance, the function of a TV is simple: it has to display a stream of video on a screen. Why is a complex Unix-like operating system like Linux necessary? The simple answer is Moore's Law: Gordon Moore, co-founder of Intel, observed in 1965 that the density of components on a chip would double approximately every two years. That applies to the devices that we design and use in our everyday lives just as much as it does to desktops, laptops, and servers. At the heart of most embedded devices is a highly integrated chip that contains one or more processor cores and interfaces with main memory, mass storage, and peripherals of many types. This is referred to as a System on Chip, or SoC, and SoCs are increasing in complexity in accordance with Moore's Law. A typical SoC has a technical reference manual that stretches to thousands of pages.

Your TV is not simply displaying a video stream as the old analog sets used to do. The stream is digital, possibly encrypted, and it needs processing to create an image. Your TV is (or soon will be) connected to the Internet. It can receive content from smartphones, tablets, and home media servers. It can be (or soon will be) used to play games, and so on. You need a full operating system to manage this degree of complexity.

Here are some points that drive the adoption of Linux:

- Linux has the necessary functionality. It has a good scheduler, a good network stack, support for USB, Wi-Fi, Bluetooth, many kinds of storage media, good support for multimedia devices, and so on. It ticks all the boxes.
- Linux has been ported to a wide range of processor architectures, including some that are very commonly found in SoC designs: ARM, MIPS, x86, and PowerPC.
- Linux is open source, so you have the freedom to get the source code and modify it to meet your needs. You, or someone working on your behalf, can create a board support package for your particular SoC board or device. You can add protocols, features, and technologies that may be missing from the mainline source code. You can remove features that you don't need to reduce memory and storage requirements. Linux is flexible.
Linux has an active community; in the case of the Linux kernel, very active. There is a new release of the kernel every 8 to 10 weeks, and each release contains code from more than 1,000 developers. An active community means that Linux is up to date and supports current hardware, protocols, and standards.
Open source licenses guarantee that you have access to the source code. There is no vendor tie-in.

For these reasons, Linux is an ideal choice for complex devices. But there are a few caveats I should mention here. Complexity makes it harder to understand. Coupled with the fast-moving development process and the decentralized structures of open source, you have to put some effort into learning how to use it and to keep on re-learning as it changes.

Selecting the right operating system

Is Linux suitable for your project? Linux works well where the problem being solved justifies the complexity. It is especially good where connectivity, robustness, and complex user interfaces are required. However, it cannot solve every problem, so here are some things to consider before you jump in:

Is your hardware up to the job? Compared to a traditional real-time operating system (RTOS) such as VxWorks, Linux requires a lot more resources. It needs at least a 32-bit processor and lots more memory. I will go into more detail in the section on typical hardware requirements.
Do you have the right skill set? The early parts of a project, such as board bring-up, require detailed knowledge of Linux and how it relates to your hardware. Likewise, when debugging and tuning your application, you will need to be able to interpret the results. If you don't have the skills in-house, you may want to outsource some of the work.
Is your system real-time? Linux can handle many real-time activities so long as you pay attention to certain details.

Consider these points carefully. Probably the best indicator of success is to look around for similar products that run Linux and see how they have done it; follow best practice.

The players

Where does open source software come from? Who writes it? In particular, how does this relate to the key components of embedded development—the toolchain, bootloader, kernel, and basic utilities found in the root filesystem? The main players are:

The open source community: This, after all, is the engine that generates the software you are going to be using. The community is a loose alliance of developers, many of whom are funded in some way, perhaps by a not-for-profit organization, an academic institution, or a commercial company. They work together to further the aims of the various projects. There are many of them—some small, some large.
CPU architects: These are the organizations that design the CPUs we use. The important ones here are ARM/Linaro (ARM-based SoCs), Intel (x86 and x86_64), Imagination Technologies (MIPS), and IBM (PowerPC). They implement or, at the very least, influence support for the basic CPU architecture.
SoC vendors (Atmel, Broadcom, Intel, Qualcomm, TI, and many others): They take the kernel and toolchain from the CPU architects and modify them to support their chips. They also create reference boards: designs that are used by the next level down to create development boards and working products.
Board vendors and OEMs: These people take the reference designs from SoC vendors and build them into specific products, for instance, set-top-boxes or cameras, or create more general purpose development boards, such as those from Advantech and Kontron.
An important category is the cheap development boards such as BeagleBoard/BeagleBone and Raspberry Pi that have created their own ecosystems of software and hardware add-ons.

These form a chain, with your project usually at the end, which means that you do not have a free choice of components. You cannot simply take the latest kernel from https://www.kernel.org/, except in a few rare cases, because it does not have support for the chip or board that you are using. This is an ongoing problem with embedded development. Ideally, the developers at each link in the chain would push their changes upstream, but they don't. It is not uncommon to find a kernel which has many thousands of patches that are not merged. In addition, SoC vendors tend to actively develop open source components only for their latest chips, meaning that support for any chip more than a couple of years old will be frozen and not receive any updates.

The consequence is that most embedded designs are based on old versions of software. They do not receive security fixes, performance enhancements, or features that are in newer versions. Problems such as Heartbleed (a bug in the OpenSSL libraries) and ShellShock (a bug in the bash shell) go unfixed. What can you do about it? First, ask questions of your vendors: what is their update policy, how often do they revise kernel versions, what is the current kernel version, what was the one before that, and what is their policy for merging changes upstream? Some vendors are making great strides in this way. You should prefer their chips. Secondly, you can take steps to make yourself more self-sufficient. The article explains the dependencies in more detail and shows you where you can help yourself. Don't just take the package offered to you by the SoC or board vendor and use it blindly without considering the alternatives.

The four elements of embedded Linux

Every project begins by obtaining, customizing, and deploying these four elements: the toolchain, the bootloader, the kernel, and the root filesystem:

Toolchain: The compiler and other tools needed to create code for your target device. Everything else depends on the toolchain.
Bootloader: The program that initializes the board and loads the Linux kernel.
Kernel: This is the heart of the system, managing system resources and interfacing with hardware.
Root filesystem: Contains the libraries and programs that are run once the kernel has completed its initialization.

Of course, there is also a fifth element, not mentioned here. That is the collection of programs specific to your embedded application which make the device do whatever it is supposed to do, be it weigh groceries, display movies, control a robot, or fly a drone. Typically, you will be offered some or all of these elements as a package when you buy your SoC or board. But, for the reasons mentioned in the preceding paragraph, they may not be the best choices for you.

Open source

The components of embedded Linux are open source, so now is a good time to consider what that means, why open source works the way it does, and how this affects the often proprietary embedded device you will be creating from it.

Licenses

When talking about open source, the word free is often used. People new to the subject often take it to mean nothing to pay, and open source software licenses do indeed guarantee that you can use the software to develop and deploy systems for no charge.
However, the more important meaning here is freedom, since you are free to obtain the source code, modify it in any way you see fit, and redeploy it in other systems. These licenses give you this right. Compare that with shareware licenses which allow you to copy the binaries for no cost but do not give you the source code, or other licenses that allow you to use the software for free under certain circumstances, for example, for personal use but not commercial. These are not open source.

I will provide the following comments in the interest of helping you understand the implications of working with open source licenses, but I would like to point out that I am an engineer and not a lawyer. What follows is my understanding of the licenses and the way they are interpreted.

Open source licenses fall broadly into two categories: the copyleft licenses, such as the General Public License (GPL), and the permissive licenses, such as those from the Berkeley Software Distribution (BSD) and others. The permissive licenses say, in essence, that you may modify the source code and use it in systems of your own choosing so long as you do not modify the terms of the license in any way. In other words, with that one restriction, you can do with it what you want, including building it into possibly proprietary systems.

The GPL licenses are similar, but have clauses which compel you to pass the rights to obtain and modify the software on to your end users. In other words, you share your source code. One option is to make it completely public by putting it onto a public server. Another is to offer it only to your end users by means of a written offer to provide the code when requested. The GPL goes further to say that you cannot incorporate GPL code into proprietary programs. Any attempt to do so would make the GPL apply to the whole. In other words, you cannot combine GPL and proprietary code in one program.

So, what about libraries? If they are licensed with the GPL, any program linked with them becomes GPL also. However, most libraries are licensed under the Lesser General Public License (LGPL). If this is the case, you are allowed to link with them from a proprietary program.

All the preceding description relates specifically to GPL v2 and LGPL v2.1. I should mention the latest versions: GPL v3 and LGPL v3. These are controversial, and I will admit that I don't fully understand the implications. However, the intention is to ensure that the GPL v3 and LGPL v3 components in any system can be replaced by the end user, which is in the spirit of open source software for everyone. It does pose some problems though. Some Linux devices are used to gain access to information according to a subscription level or another restriction, and replacing critical parts of the software may compromise that. Set-top-boxes fit into this category. There are also issues with security. If the owner of a device has access to the system code, then so might an unwelcome intruder. Often the defense is to have kernel images that are signed by an authority, the vendor, so that unauthorized updates are not possible. Is that an infringement of my right to modify my device? Opinions differ.

The TiVo set-top-box is an important part of this debate. It uses a Linux kernel, which is licensed under GPL v2. TiVo have released the source code of their version of the kernel and so comply with the license. TiVo also has a bootloader that will only load a kernel binary that is signed by them.
Consequently, you can build a modified kernel for a TiVo box but you cannot load it on the hardware. The Free Software Foundation (FSF) takes the position that this is not in the spirit of open source software and refers to this procedure as Tivoization. The GPL v3 and LGPL v3 were written to explicitly prevent this from happening. Some projects, the Linux kernel in particular, have been reluctant to adopt the version three licenses because of the restrictions they would place on device manufacturers.

Hardware for embedded Linux

If you are designing or selecting hardware for an embedded Linux project, what do you look out for?

Firstly, a CPU architecture that is supported by the kernel—unless you plan to add a new architecture yourself, of course! Looking at the source code for Linux 4.9, there are 31 architectures, each represented by a sub-directory in the arch/ directory. They are all 32- or 64-bit architectures, most with a memory management unit (MMU), but some without. The ones most often found in embedded devices are ARM, MIPS, PowerPC, and x86, each in 32- and 64-bit variants, and all of which have memory management units. Hardware that doesn't have an MMU runs a subset of Linux known as microcontroller Linux, or uClinux. These processor architectures include ARC, Blackfin, MicroBlaze, and Nios. I will mention uClinux from time to time but I will not go into detail because it is a rather specialized topic.

Secondly, you will need a reasonable amount of RAM. 16 MiB is a good minimum, although it is quite possible to run Linux using half that. It is even possible to run Linux with 4 MiB if you are prepared to go to the trouble of optimizing every part of the system. It may even be possible to get lower, but there comes a point at which it is no longer Linux.

Thirdly, there is non-volatile storage, usually flash memory. 8 MiB is enough for a simple device such as a webcam or a simple router. As with RAM, you can create a workable Linux system with less storage if you really want to, but the lower you go, the harder it becomes. Linux has extensive support for flash storage devices, including raw NOR and NAND flash chips, and managed flash in the form of SD cards, eMMC chips, USB flash memory, and so on.

Fourthly, a debug port is very useful, most commonly an RS-232 serial port. It does not have to be fitted on production boards, but makes board bring-up, debugging, and development much easier.

Fifthly, you need some means of loading software when starting from scratch. A few years ago, boards would have been fitted with a Joint Test Action Group (JTAG) interface for this purpose, but modern SoCs have the ability to load boot code directly from removable media, especially SD and micro SD cards, or serial interfaces such as RS-232 or USB.

In addition to these basics, there are interfaces to the specific bits of hardware your device needs to get its job done. Mainline Linux comes with open source drivers for many thousands of different devices, and there are drivers (of variable quality) from the SoC manufacturer and from the OEMs of third-party chips that may be included in the design, but remember my comments on the commitment and ability of some manufacturers. As a developer of embedded devices, you will find that you spend quite a lot of time evaluating and adapting third-party code, if you have it, or liaising with the manufacturer if you don't. Finally, you will have to write the device support for interfaces that are unique to the device, or find someone to do it for you.
Hardware

The worked examples are intended to be generic, but to make them relevant and easy to follow, I have had to choose specific hardware. I have chosen two exemplar devices: the BeagleBone Black and QEMU. The first is a widely-available and cheap development board which can be used in serious embedded hardware. The second is a machine emulator that can be used to create a range of systems that are typical of embedded hardware. It was tempting to use QEMU exclusively, but, like all emulations, it is not quite the same as the real thing. Using a BeagleBone Black, you have the satisfaction of interacting with real hardware and seeing real LEDs flash. I could have selected a board that is more up-to-date than the BeagleBone Black, which is several years old now, but I believe that its popularity gives it a degree of longevity and it means that it will continue to be available for some years yet. In any case, I encourage you to try out as many of the examples as you can, using either of these two platforms, or indeed any embedded hardware you may have to hand.

The BeagleBone Black

The BeagleBone and the later BeagleBone Black are open hardware designs for a small, credit card sized development board produced by CircuitCo LLC. The main repository of information is at https://beagleboard.org/. The main points of the specifications are:

TI AM335x 1 GHz ARM® Cortex-A8 Sitara SoC
512 MiB DDR3 RAM
2 or 4 GiB 8-bit eMMC on-board flash storage
Serial port for debug and development
MicroSD connector, which can be used as the boot device
Mini USB OTG client/host port that can also be used to power the board
Full size USB 2.0 host port
10/100 Ethernet port
HDMI for video and audio output

In addition, there are two 46-pin expansion headers for which there are a great variety of daughter boards, known as capes, which allow you to adapt the board to do many different things. However, you do not need to fit any capes in the examples. In addition to the board itself, you will need:

A mini USB to full-size USB cable (supplied with the board) to provide power, unless you have the last item on this list.
An RS-232 cable that can interface with the 6-pin 3.3V TTL level signals provided by the board. The Beagleboard website has links to compatible cables.
A microSD card and a means of writing to it from your development PC or laptop, which will be needed to load software onto the board.
An Ethernet cable, as some of the examples require network connectivity.
Optional, but recommended, a 5V power supply capable of delivering 1 A or more.

QEMU

QEMU is a machine emulator. It comes in a number of different flavors, each of which can emulate a processor architecture and a number of boards built using that architecture. For example, we have the following:

qemu-system-arm: ARM
qemu-system-mips: MIPS
qemu-system-ppc: PowerPC
qemu-system-x86: x86 and x86_64

For each architecture, QEMU emulates a range of hardware, which you can see by using the option -machine help. Each machine emulates most of the hardware that would normally be found on that board. There are options to link hardware to local resources, such as using a local file for the emulated disk drive.
Here is a concrete example:

$ qemu-system-arm -machine vexpress-a9 -m 256M -drive file=rootfs.ext4,sd -kernel zImage -dtb vexpress-v2p-ca9.dtb -append "console=ttyAMA0,115200 root=/dev/mmcblk0" -serial stdio -net nic,model=lan9118 -net tap,ifname=tap0

The options used in the preceding command line are:

-machine vexpress-a9: Creates an emulation of an ARM Versatile Express development board with a Cortex A-9 processor
-m 256M: Populates it with 256 MiB of RAM
-drive file=rootfs.ext4,sd: Connects the SD interface to the local file rootfs.ext4 (which contains a filesystem image)
-kernel zImage: Loads the Linux kernel from the local file named zImage
-dtb vexpress-v2p-ca9.dtb: Loads the device tree from the local file vexpress-v2p-ca9.dtb
-append "...": Supplies this string as the kernel command-line
-serial stdio: Connects the serial port to the terminal that launched QEMU, usually so that you can log on to the emulated machine via the serial console
-net nic,model=lan9118: Creates a network interface
-net tap,ifname=tap0: Connects the network interface to the virtual network interface tap0

To configure the host side of the network, you need the tunctl command from the User Mode Linux (UML) project; on Debian and Ubuntu, the package is named uml-utilities:

$ sudo tunctl -u $(whoami) -t tap0

This creates a network interface named tap0 which is connected to the network controller in the emulated QEMU machine. You configure tap0 in exactly the same way as any other interface. I will be using Versatile Express for most of my examples, but it should be easy to use a different machine or architecture.

Software

I have used only open source software, both for the development tools and the target operating system and applications. I assume that you will be using Linux on your development system. I tested all the host commands using Ubuntu 14.04 and so there is a slight bias towards that particular version, but any modern Linux distribution is likely to work just fine.

Summary

Embedded hardware will continue to get more complex, following the trajectory set by Moore's Law. Linux has the power and the flexibility to make use of hardware in an efficient way. Linux is just one component of open source software out of the many that you need to create a working product. The fact that the code is freely available means that people and organizations at many different levels can contribute. However, the sheer variety of embedded platforms and the fast pace of development lead to isolated pools of software which are not shared as efficiently as they should be. In many cases, you will become dependent on this software, especially the Linux kernel that is provided by your SoC or board vendor, and to a lesser extent, the toolchain. Some SoC manufacturers are getting better at pushing their changes upstream and the maintenance of these changes is getting easier. Fortunately, there are some powerful tools that can help you create and maintain the software for your device. For example, Buildroot is ideal for small systems and the Yocto Project for larger ones. Before I describe these build tools, I will describe the four elements of embedded Linux, which you can apply to all embedded Linux projects, however they are created.

Resources for Article:

Further resources on this subject:

Programming with Linux [article]
Embedded Linux and Its Elements [article]
Revisiting Linux Network Basics [article]

Creating the First Python Script

Packt
09 Aug 2017
27 min read
In this article by Silas Toms, the author of the book ArcPy and ArcGIS - Second Edition, we will demonstrate how to use ModelBuilder, which ArcGIS professionals are already familiar with, to model their first analysis and then export it out as a script. With the Python environment configured to fit our needs, we can now create and execute ArcPy scripts. To ease into the creation of Python scripts, this article will use ArcGIS ModelBuilder to model a simple analysis, and export it as a Python script. ModelBuilder is very useful for creating Python scripts. It has an operational and a visual component, and all models can be outputted as Python scripts, where they can be further customized. In this article, we will cover the following topics:

Modeling a simple analysis using ModelBuilder
Exporting the model out to a Python script
Windows file paths versus Pythonic file paths
String formatting methods

(For more resources related to this topic, see here.)

Prerequisites

The following are the prerequisites for this article:

ArcGIS 10x and Python 2.7, with arcpy available as a module.
For this article, the accompanying data and scripts should be downloaded from Packt Publishing's website. The completed scripts are available for comparison purposes, and the data will be used for this article's analysis.
To run the code and test code examples, use your favorite IDE or open the IDLE (Python GUI) program from the Start Menu/ArcGIS/Python2.7 folder after installing ArcGIS for Desktop. Use the built-in "interpreter" or code entry interface, indicated by the triple chevron >>> and a blinking cursor.

ModelBuilder

ArcGIS has been in development since the 1970s. Since that time, it has included a variety of programming languages and tools to help GIS users automate analysis and map production. These include the Avenue scripting language in the ArcGIS 3x series, and the ARC Macro Language (AML) in the ARCInfo Workstation days, as well as VBScript up until ArcGIS 10x, when Python was introduced. Another useful tool introduced in ArcGIS 9x was ModelBuilder, a visual programming environment used for both modeling analysis and creating tools that can be used repeatedly with different input feature classes.

A useful feature of ModelBuilder is an export function, which allows modelers to create Python scripts directly from a model. This makes it easier to compare how parameters in a ModelBuilder tool are accepted as compared to how a Python script calls the same tool and supplies its parameters, and how generated feature classes are named and placed within the file structure. ModelBuilder is a helpful tool on its own, and its Python export functionality makes it easy for a GIS analyst to generate and customize ArcPy scripts.

Creating a model and exporting to Python

This article and the associated scripts depend on the downloadable SanFrancisco.gdb geodatabase available from Packt. SanFrancisco.gdb contains data downloaded from https://datasf.org/ and the US Census' American Factfinder website at https://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml. All census and geographic data included in the geodatabase is from the 2010 census. The data is contained within a feature dataset called SanFrancisco. The data in this feature dataset is in NAD 83 California State Plane Zone 3, and the linear unit of measure is the US foot. This corresponds to SRID 2227 in the European Petroleum Survey Group (EPSG) format.
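If you want to confirm the coordinate system from the Python window or IDLE before starting, a short check like the following can help. This is only an illustrative sketch and is not part of the book's exercise; it assumes ArcPy is importable on your machine.

import arcpy

# Build a spatial reference object from its well-known ID (the SRID/EPSG code)
sr = arcpy.SpatialReference(2227)
print(sr.name)            # should report the NAD 1983 StatePlane California III (US Feet) system
print(sr.linearUnitName)  # should report a US foot linear unit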
The analysis which we will create with the model, and eventually export to Python for further refinement, will use bus stops along a specific line in San Francisco. These bus stops will be buffered to create a representative region around each bus stop. The buffered areas will then be intersected with census blocks to find out how many people live within each representative region around the bus stops.

Modeling the Select and Buffer tools

Using ModelBuilder, we will model the basic bus stop analysis. Once it has been modeled, it will be exported as an automatically generated Python script. Follow these steps to begin the analysis:

Open up ArcCatalog, and create a folder connection to the folder containing SanFrancisco.gdb. I have put the geodatabase in a C drive folder called "Projects" for a resulting file path of C:\Projects\SanFrancisco.gdb.
Right-click on the geodatabase, and add a new toolbox called Chapter2Tools.
Right-click on the geodatabase; select New, and then Feature Dataset, from the menu. A dialogue will appear that asks for a name; call it Chapter2Results, and push Next. It will ask for a spatial reference system; enter 2227 into the search bar, and push the magnifying glass icon. This will locate the correct spatial reference system: NAD 1983 StatePlane California III FIPS 0403 Feet. Don't select a vertical reference system, as we are not doing any Z value analysis. Push Next, select the default tolerances, and push Finish.
Next, open ModelBuilder using the ModelBuilder icon or by right-clicking on the Toolbox, and create a new Model. Save the model in the Chapter2Tools toolbox as Chapter2Model1.
Drag in the Bus_Stops feature class and the Select tool from the Analysis/Extract toolset in ArcToolbox.
Open up the Select tool, and name the output feature class Inbound71. Make sure that the feature class is written to the Chapter2Results feature dataset.
Open up the Expression SQL Query Builder, and create the following SQL expression: NAME = '71 IB' AND BUS_SIGNAG = 'Ferry Plaza'.
The next step is to add a Buffer tool from the Analysis/Proximity toolset. The Buffer tool will be used to create buffers around each bus stop. The buffered bus stops allow us to intersect with census data in the form of census blocks, creating the representative regions around each bus stop.
Connect the output of the Select tool (Inbound71) to the Buffer tool. Open up the Buffer tool, add 400 to the Distance field, and change the units to Feet. Leave the rest of the options blank. Click on OK, and return to the model.

Adding in the Intersect tool

Now that we have selected the bus line of interest, and buffered the stops to create representative regions, we will need to intersect the regions with the census blocks to find the population of each representative region. This can be done as follows:

First, add the CensusBlocks2010 feature class from the SanFrancisco feature dataset to the model.
Next, add in the Intersect tool located in the Analysis/Overlay toolset in the ArcToolbox. While we could use a Spatial Join to achieve a similar result, I have used the Intersect tool to capture the area of intersect for use later in the model and script.

At this point, our model should look like this:

Tallying the analysis results

After we have created this simple analysis, the next step is to determine the results for each bus stop.
Finding the number of people that live in census blocks touched by the 400-foot buffer of each bus stop involves examining each row of data in the final feature class, and selecting rows that correspond to the bus stop. Once these are selected, a sum of the selected rows would be calculated either using the Field Calculator or the Summarize tool. All of these methods will work, and yet none are perfect. They take too long, and worse, are not repeatable automatically if an assumption in the model is adjusted (if the buffer is adjusted from 400 feet to 500 feet, for instance).

This is where the traditional uses of ModelBuilder begin to fail analysts. It should be easy to instruct the model to select all rows associated with each bus stop, and then generate a summed population figure for each bus stop's representative region. It would be even better to have the model create a spreadsheet to contain the final results of the analysis. It's time to use Python to take this analysis to the next level.

Exporting the model and adjusting the script

While modeling analysis in ModelBuilder has its drawbacks, there is one fantastic option built into ModelBuilder: the ability to create a model, and then export the model to Python. Along with the ArcGIS Help Documentation, it is the best way to discover the correct Python syntax to use when writing ArcPy scripts.

Create a folder that can hold the exported scripts next to the SanFrancisco geodatabase (for example, C:\Projects\Scripts). This will hold both the exported scripts that ArcGIS automatically generates, and the versions that we will build from those generated scripts. Now, perform the following steps:

Open up the model called Chapter2Model1.
Click on the Model menu in the upper-left side of the screen.
Select Export from the menu.
Select To Python Script.
Save the script as Chapter2Model1.py.

Note that there is also the option to export the model as a graphic. Creating a graphic of the model is a good way to share what the model is doing with other analysts without the need to share the model and the data, and can also be useful when sharing Python scripts as well.

The Automatically generated script

Open the automatically generated script in an IDE. It should look like this:

# -*- coding: utf-8 -*-
# ---------------------------------------------------------------------------
# Chapter2Model1.py
# Created on: 2017-01-26 04:26:31.00000
#   (generated by ArcGIS/ModelBuilder)
# Description:
# ---------------------------------------------------------------------------

# Import arcpy module
import arcpy

# Local variables:
Bus_Stops = "C:\\Projects\\SanFrancisco.gdb\\SanFrancisco\\Bus_Stops"
Inbound71 = "C:\\Projects\\SanFrancisco.gdb\\Chapter2Results\\Inbound71"
Inbound71_400ft_buffer = "C:\\Projects\\SanFrancisco.gdb\\Chapter2Results\\Inbound71_400ft_buffer"
CensusBlocks2010 = "C:\\Projects\\SanFrancisco.gdb\\SanFrancisco\\CensusBlocks2010"
Intersect71Census = "C:\\Projects\\SanFrancisco.gdb\\Chapter2Results\\Intersect71Census"

# Process: Select
arcpy.Select_analysis(Bus_Stops, Inbound71, "NAME = '71 IB' AND BUS_SIGNAG = 'Ferry Plaza'")

# Process: Buffer
arcpy.Buffer_analysis(Inbound71, Inbound71_400ft_buffer, "400 Feet", "FULL", "ROUND", "NONE", "")

# Process: Intersect
arcpy.Intersect_analysis("C:\\Projects\\SanFrancisco.gdb\\Chapter2Results\\Inbound71_400ft_buffer #;C:\\Projects\\SanFrancisco.gdb\\SanFrancisco\\CensusBlocks2010 #", Intersect71Census, "ALL", "", "INPUT")

Let's examine this script line by line.
The first line is preceded by a pound sign ("#"), which again means that this line is a comment; however, it is not ignored by the Python interpreter when the script is executed as usual, but is used to help Python interpret the encoding of the script as described here: http://legacy.python.org/dev/peps/pep-0263. The second commented line and the third line are included for decorative purposes. The next four lines, all commented, are used for providing readers information about the script: what it is called and when it was created, along with a description, which is pulled from the model's properties. Another decorative line is included to visually separate out the informative header from the body of the script. While the commented information section is nice to include in a script for other users of the script, it is not necessary.

The body of the script, or the executable portion of the script, starts with the import arcpy line. Import statements are, by convention, included at the top of the body of the script. In this instance, the only module that is being imported is ArcPy.

ModelBuilder's export function creates not only an executable script, but also comments each section to help mark the different sections of the script. The comments let users know where the variables are located, and where the ArcToolbox tools are being executed. After the import statements come the variables. In this case, the variables represent the file paths to the input and output feature classes. The variable names are derived from the names of the feature classes (the base names of the file paths). The file paths are assigned to the variables using the assignment operator ("="), and the parts of the file paths are separated by two backslashes.

File paths in Python

To store and retrieve data, it is important to understand how file paths are used in Python as compared to how they are represented in Windows. In Python, file paths are strings, and strings in Python have special characters used to represent tabs ("\t"), newlines ("\n"), or carriage returns ("\r"), among many others. These special characters all incorporate single backslashes, making it very hard to create a file path that uses single backslashes. File paths in Windows Explorer all use single backslashes.

Windows Explorer: C:\Projects\SanFrancisco.gdb\Chapter2Results\Intersect71Census

Python was developed within the Linux environment, where file paths have forward slashes. There are a number of methods used to avoid this issue. The first is using file paths with forward slashes. The Python interpreter will understand file paths with forward slashes, as seen in this code:

Python version: "C:/Projects/SanFrancisco.gdb/Chapter2Results/Intersect71Census"

Within a Python script, the Python file path with the forward slashes will definitely work, while the Windows Explorer version might cause the script to throw an exception, as Python strings can have special characters like the newline character "\n", or tab "\t", that will cause the string file path to be read incorrectly by the Python interpreter.

Another method used to avoid the issue with special characters is the one employed by ModelBuilder when it automatically creates the Python scripts from a model. In this case, the backslashes are "escaped" using a second backslash.
The preceding script uses this second method to produce the following results:

Python escaped version: "C:\\Projects\\SanFrancisco.gdb\\Chapter2Results\\Intersect71Census"

The third method, which I use when copying file paths from ArcCatalog or Windows Explorer into scripts, is to create what is known as a "raw" string. This is the same as a regular string, but it includes an "r" before the string begins. This "r" alerts the Python interpreter that the following string does not contain any special characters or escape characters. Here is an example of how it is used:

Python raw string: r"C:\Projects\SanFrancisco.gdb\SanFrancisco\Bus_Stops"

Using raw strings makes it easier to grab a file path from Windows Explorer, and add it to a string inside a script. It also makes it easier to avoid accidentally forgetting to include a set of double backslashes in a file path, which happens all the time and is the cause of many script bugs.

String manipulation

There are three major methods for inserting variables into strings. Each has different advantages and disadvantages of a technical nature. It's good to know about all three, as they have uses beyond our needs here, so let's review them.

String manipulation method 1: string addition

String addition seems like an odd concept at first, as it would not seem possible to "add" strings together, unlike integers or floats which are numbers. However, within Python and other programming languages, this is a normal step. Using the plus sign "+", strings are "added" together to make longer strings, or to allow variables to be added into the middle of existing strings. Here are some examples of this process:

>>> aString = "This is a string"
>>> bString = " and this is another string"
>>> cString = aString + bString
>>> cString

The output is as follows:

'This is a string and this is another string'

Two or more strings can be "added" together, and the result can be assigned to a third variable for using it later in the script. This process can be useful for data processing and formatting. Another similar offshoot of string addition is string multiplication, where strings are multiplied by an integer to produce repeating versions of the string, like this:

>>> "string" * 3
'stringstringstring'

String manipulation method 2: string formatting #1

The second method of string manipulation, known as string formatting, involves adding placeholders into the string, which accept specific kinds of data. This means that these special strings can accept other strings as well as integers and float values. These placeholders use the modulo "%" and a key letter to indicate the type of data to expect. Strings are represented using %s, floats using %f, and integers using %d. The floats can also be adjusted to limit the digits included by adding a modifying number after the modulo. If there is more than one placeholder in a string, the values are passed to the string in a tuple. This method has become less popular, since the third method discussed next was introduced in Python 2.6, but it is still valuable to know, as many older scripts use it.
Here is an example of this method:

>>> origString = "This string has as a placeholder %s"
>>> newString = origString % "and this text was added"
>>> print newString

The output is as follows:

This string has as a placeholder and this text was added

Here is an example when using a float placeholder:

>>> floatString1 = "This string has a float here: %f"
>>> newString = floatString1 % 1.0
>>> print newString

The output is as follows:

This string has a float here: 1.000000

Here is another example when using a float placeholder:

>>> floatString2 = "This string has a float here: %.1f"
>>> newString2 = floatString2 % 1.0
>>> print newString2

The output is as follows:

This string has a float here: 1.0

Here is an example using an integer placeholder:

>>> intString = "Here is an integer: %d"
>>> newString = intString % 1
>>> print newString

The output is as follows:

Here is an integer: 1

String manipulation method 3: string formatting #2

The final method is known as string formatting. It is similar to string formatting method 1, with the added benefit of not requiring a specific data type of placeholder. The placeholders, or tokens as they are also known, are only required to be in order to be accepted. The format function is built into strings; by adding .format to the string, and passing in parameters, the string accepts the values, as seen in the following example:

>>> formatString = "This string has 3 tokens: {0}, {1}, {2}"
>>> newString = formatString.format("String", 2.5, 4)
>>> print newString
This string has 3 tokens: String, 2.5, 4

The tokens don't have to be in order within the string, and can even be repeated by adding a token wherever it is needed within the template. The order of the values applied to the template is derived from the parameters supplied to the .format function, which passes the values to the string. The third method has become my go-to method for string manipulation because of the ability to add the values repeatedly, and because it makes it possible to avoid supplying the wrong type of data to a specific placeholder, unlike the second method.

The ArcPy tools

After the import statements and the variable definitions, the next section of the script is where the analysis is executed. The same tools that we created in the model--the Select, Buffer, and Intersect tools--are included in this section. The same parameters that we supplied in the model are also included here: the inputs and outputs, plus the SQL statement in the Select tool, and the buffer distance in the Buffer tool. The tool parameters are supplied to the tools in the script in the same order as they appear in the tool interfaces in the model. Here is the Select tool in the script:

arcpy.Select_analysis(Bus_Stops, Inbound71, "NAME = '71 IB' AND BUS_SIGNAG = 'Ferry Plaza'")

It works like this: the arcpy module has a "method", or tool, called Select_analysis. This method, when called, requires three parameters: the input feature class (or shapefile), the output feature class, and the SQL statement. In this example, the input is represented by the variable Bus_Stops, and the output feature class is represented by the variable Inbound71, both of which are defined in the variable section. The SQL statement is included as the third parameter.
Note that it could also be represented by a variable if the variable was defined prior to this line; the SQL statement, as a string, could be assigned to a variable, and the variable could replace the SQL statement as the third parameter. Here is an example of parameter replacement using a variable:

sqlStatement = "NAME = '71 IB' AND BUS_SIGNAG = 'Ferry Plaza'"
arcpy.Select_analysis(Bus_Stops, Inbound71, sqlStatement)

While ModelBuilder is good for assigning input and output feature classes to variables, it does not assign variables to every portion of the parameters. This will be an important thing to correct when we adjust and build our own scripts.

The Buffer tool accepts a similar set of parameters as the Select tool. There is an input feature class represented by a variable, an output feature class variable, and the distance that we provided (400 feet in this case), along with a series of parameters that were supplied by default. Note that the parameters rely on keywords, and these keywords can be adjusted within the text of the script to adjust the resulting buffer output. For instance, "Feet" could be adjusted to "Meters", and the buffer would be much larger. Check the help section of the tool to understand better how the other parameters will affect the buffer, and to find the keyword arguments that are accepted by the Buffer tool in ArcPy. Also, as noted earlier, all of the parameters could be assigned to variables, which can save time if the same parameters are used repeatedly throughout a script. Sometimes, the supplied parameter is merely an empty string, as in this case here with the last parameter:

arcpy.Buffer_analysis(Inbound71, Inbound71_400ft_buffer, "400 Feet", "FULL", "ROUND", "NONE", "")

The empty string for the last parameter, which, in this case, signifies that there is no dissolve field for this buffer, is found quite frequently within ArcPy. It could also be represented by two single quotes, but ModelBuilder has been built to use double quotes to encase strings.

The Intersect tool

The last tool, the Intersect tool, uses a different method to represent the files that need to be intersected together when the tool is executed. Because the tool accepts multiple files in the input section (meaning, there is no limit to the number of files that can be intersected together in one operation), it stores all of the file paths within one string. This string can be manipulated using one of the string manipulation methods discussed earlier, or it can be reorganized to accept a Python list that contains the file paths, or variables representing file paths, as the first parameter in any order. The Intersect tool will find the intersection of all of the supplied feature classes.

Adjusting the script

Now is the time to take the automatically generated script, and adjust it to fit our needs. We want the script to both produce the output data, and to have it analyze the data and tally the results into a spreadsheet. This spreadsheet will hold an averaged population value for each bus stop. The average will be derived from each census block that the buffered representative region surrounding the stops intersected. Save the original script as "Chapter2Model1Modified.py".

Adding the CSV module to the script

For this script, we will use the csv module, a useful module for creating Comma-Separated Value spreadsheets. Its simple syntax will make it a useful tool for creating script outputs; a small standalone sketch follows.
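To get a feel for that syntax before wiring it into the analysis, here is a tiny standalone sketch. It is not part of the exported model; the output path and the row values are made up for illustration, and it assumes the Python 2.7 interpreter installed with ArcGIS for Desktop (hence the 'wb' file mode).

import csv

# Write a two-row CSV file to a hypothetical location
with open(r'C:\Projects\Example.csv', 'wb') as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=',')
    csvwriter.writerow(['STOPID', 'AVERAGEPOP'])  # a header row
    csvwriter.writerow([1234, 56.7])              # one made-up data row

The same writer object and writerow call are all we will need later when the real averages are tallied.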
ArcGIS for Desktop also installs the xlrd and xlwt modules, used to read and generate Excel spreadsheets respectively. These modules are also great for data analysis output. After the import arcpy line, add import csv. This will allow us to use the csv module for creating the spreadsheet.

# Import arcpy module
import arcpy
import csv

The next adjustment is made to the Intersect tool. Notice that the two paths included in the input string are also defined as variables in the variable section. Remove the file paths from the input strings, and replace them with a list containing the variable names of the input datasets, as follows:

# Process: Intersect
arcpy.Intersect_analysis([Inbound71_400ft_buffer, CensusBlocks2010], Intersect71Census, "ALL", "", "INPUT")

Accessing the data: using a cursor

Now that the script is in place to generate the raw data we need, we need a way to access the data held in the output feature class from the Intersect tool. This access will allow us to aggregate the rows of data representing each bus stop. We also need a data container to hold the aggregated data in memory before it is written to the spreadsheet. To accomplish the second part, we will use a Python dictionary. To accomplish the first part, we will use a method built into the ArcPy module: the Data Access SearchCursor.

The Python dictionary will be added after the Intersect tool. A dictionary in Python is created using curly brackets {}. Add the following line to the script, below the analysis section:

dataDictionary = {}

This script will use the bus stop IDs as keys for the dictionary. The values will be lists, which will hold all of the population values associated with each busStopID. Add the following lines to generate a Data Cursor:

with arcpy.da.SearchCursor(Intersect71Census, ["STOPID", "POP10"]) as cursor:
    for row in cursor:
        busStopID = row[0]
        pop10 = row[1]
        if busStopID not in dataDictionary.keys():
            dataDictionary[busStopID] = [pop10]
        else:
            dataDictionary[busStopID].append(pop10)

This iteration combines a few ideas in Python and ArcPy. The with...as statement is used to create a variable (cursor), which represents the arcpy.da.SearchCursor object. It could also be written like this:

cursor = arcpy.da.SearchCursor(Intersect71Census, ["STOPID", "POP10"])

The advantage of the with...as structure is that the cursor object is erased from memory when the iteration is completed, which eliminates locks on the feature classes being evaluated. The arcpy.da.SearchCursor function requires an input feature class, and a list of fields to be returned. Optionally, an SQL statement can limit the number of rows returned.

The next line, for row in cursor:, is the iteration through the data. It is not a normal Pythonic iteration, a distinction that will have ramifications in certain instances. For instance, one cannot pass index parameters to the cursor object to only evaluate specific rows within the cursor object, as one can do with a list. When using a Search Cursor, each row of data is returned as a tuple, which cannot be modified. The data can be accessed using indexes.

The if...else condition allows the data to be sorted. As noted earlier, the bus stop ID, which is the first member of the data included in the tuple, will be used as a key. The conditional evaluates if the bus stop ID is included in the dictionary's existing keys (which are contained in a list, and accessed using the dictionary.keys() method).
If it is not, it is added to the keys, and assigned a value that is a list that contains (at first) one piece of data, the population value contained in that row. If it does exist in the keys, the list is appended with the next population value associated with that bus stop ID. With this code, we have now sorted each census block population according to the bus stop with which it is associated.

Next, we need to add code to create the spreadsheet. This code will use the same with...as structure, and will generate an average population value by using two built-in Python functions: sum, which creates a sum from a list of numbers, and len, which will get the length of a list, tuple, or string.

with open(r'C:\Projects\Averages.csv', 'wb') as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=',')
    for busStopID in dataDictionary.keys():
        popList = dataDictionary[busStopID]
        averagePop = sum(popList)/len(popList)
        data = [busStopID, averagePop]
        csvwriter.writerow(data)

The average population value is retrieved from the dictionary using the busStopID key, and then assigned to the variable averagePop. The two data pieces, the busStopID and the averagePop variable, are then added to a list. This list is supplied to a csvwriter object, which knows how to accept the data and write it out to a file located at the file path supplied to the built-in Python function open, used to create simple files.

The script is complete, although it is nice to add one more line to the end to give us visual confirmation that the script has run:

print "Data Analysis Complete"

This last line will create an output indicating that the script has run. Once it is done, go to the location of the output CSV file and open it using Excel or Notepad, and see the results of the analysis. Our first script is complete!

Exceptions and tracebacks

During the process of writing and testing scripts, there will be errors that cause the code to break and throw exceptions. In Python, these are reported as a "traceback", which shows the last few lines of code executed before an exception occurred. To best understand the message, read it from the last line up. It will tell you the type of exception that occurred, and preceding that will be the code that failed, with a line number, which should allow you to find and fix the code. It's not perfect, but it works.

Overwriting files

One common issue is that ArcGIS for Desktop does not allow you to overwrite files without turning on an environment variable. To avoid this issue, you can add a line after the import statements that will make overwriting files possible. Be aware that the original data will be unrecoverable once it is overwritten.
It uses the env module to access the ArcGIS environment:

import arcpy
arcpy.env.overwriteOutput = True

The final script

Here is how the script should look in the end:

# Chapter2Model1Modified.py

# Import arcpy module
import arcpy
import csv

# Local variables:
Bus_Stops = r"C:\Projects\SanFrancisco.gdb\SanFrancisco\Bus_Stops"
CensusBlocks2010 = r"C:\Projects\SanFrancisco.gdb\SanFrancisco\CensusBlocks2010"
Inbound71 = r"C:\Projects\SanFrancisco.gdb\Chapter2Results\Inbound71"
Inbound71_400ft_buffer = r"C:\Projects\SanFrancisco.gdb\Chapter2Results\Inbound71_400ft_buffer"
Intersect71Census = r"C:\Projects\SanFrancisco.gdb\Chapter2Results\Intersect71Census"

# Process: Select
arcpy.Select_analysis(Bus_Stops, Inbound71, "NAME = '71 IB' AND BUS_SIGNAG = 'Ferry Plaza'")

# Process: Buffer
arcpy.Buffer_analysis(Inbound71, Inbound71_400ft_buffer, "400 Feet", "FULL", "ROUND", "NONE", "")

# Process: Intersect
arcpy.Intersect_analysis([Inbound71_400ft_buffer, CensusBlocks2010], Intersect71Census, "ALL", "", "INPUT")

dataDictionary = {}

with arcpy.da.SearchCursor(Intersect71Census, ["STOPID", "POP10"]) as cursor:
    for row in cursor:
        busStopID = row[0]
        pop10 = row[1]
        if busStopID not in dataDictionary.keys():
            dataDictionary[busStopID] = [pop10]
        else:
            dataDictionary[busStopID].append(pop10)

with open(r'C:\Projects\Averages.csv', 'wb') as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=',')
    for busStopID in dataDictionary.keys():
        popList = dataDictionary[busStopID]
        averagePop = sum(popList)/len(popList)
        data = [busStopID, averagePop]
        csvwriter.writerow(data)

print "Data Analysis Complete"

Summary

In this article, you learned how to craft a model of an analysis and export it out to a script. In particular, you learned how to use ModelBuilder to create an analysis and export it out as a script, and how to adjust the script to be more "Pythonic". After explaining the auto-generated script, we adjusted the script to include a results analysis and summation, which was outputted to a CSV file. We also briefly touched on the use of Search Cursors. Also, we saw how built-in modules such as the csv module can be used along with ArcPy to capture analysis output in formatted spreadsheets.

Resources for Article:

Further resources on this subject:

Using the ArcPy DataAccess Module with Feature Classes and Tables [article]
Measuring Geographic Distributions with ArcGIS Tool [article]
Learning to Create and Edit Data in ArcGIS [article]


Games and Exercises

Packt
09 Aug 2017
3 min read
In this article by Shishira Bhat and Ravi Wray, authors of the book, Learn Java in 7 days, we will study the following concepts:

Making an object as the return type for a method
Making an object as the parameter for a method

(For more resources related to this topic, see here.)

Let's start this article by revisiting the reference variables and custom data types:

In the preceding program, p is a variable of datatype, Pen. Yes! Pen is a class, but it is also a datatype, a custom datatype. The p variable stores the address of the Pen object, which is in heap memory. The p variable is a reference that refers to a Pen object. Now, let's get more comfortable by understanding and working with examples.

How to return an Object from a method?

In this section, let's understand return types. In the following code, methods return the inbuilt data types (int and String), and the reason is explained after each method, as follows:

int add () {
    int res = (20+50);
    return res;
}

The add method returns the res (70) variable, which is of the int type. Hence, the return type must be int:

String sendsms () {
    String msg = "hello";
    return msg;
}

The sendsms method returns a variable by the name of msg, which is of the String type. Hence, the return type is String. The data type of the returning value and the return type must be the same.

In the following code snippet, the return type of the givePen method is not an inbuilt data type. However, the return type is a class (Pen). Let's understand the following code:

The givePen () method returns a variable (reference variable) by the name of p, which is of the Pen type. Hence, the return type is Pen:

In the preceding program, tk is a variable of the Ticket type. The method returns tk; hence, the return type of the method is Ticket.

A method accepting an object (parameter)

After seeing how a method can return an object/reference, let's understand how a method can take an object/reference as the input, that is, a parameter. We already understood that if a method takes parameter(s), then we need to pass argument(s).

Example

In the preceding program, the method takes two parameters, i and k. So, while calling/invoking the method, we need to pass two arguments, which are 20.5 and 15. The parameter type and the argument type must be the same. Remember that when class is the datatype, then object is the data. Consider the following example with respect to a non-primitive/class data type and the object as its data:

In the preceding code, the Kid class has the eat method, which takes ch as a parameter of the Chocolate type, that is, the data type of ch is Chocolate, which is a class. When class is the data type, then the object of that class is an actual data or argument. Hence, new Chocolate() is passed as an argument to the eat method. Let's see one more example:

The drink method takes wtr as the parameter of the type, Water, which is a class/non-primitive type; hence, the argument must be an object of the Water class.

Summary

In this article, we have learned what to return when a class is a return type for a method and what to pass as an argument for a method when a class is a parameter for the method.

Resources for Article:

Further resources on this subject:

Saying Hello to Java EE [article]
Getting Started with Sorting Algorithms in Java [article]
Debugging Java Programs using JDB [article]


Parallelize It

Packt
18 Jul 2017
15 min read
In this article by Elliot Forbes, the author of the book Learning Concurrency in Python, we will explain concurrency and parallelism thoroughly, and bring in the necessary CPU knowledge related to them. Concurrency and parallelism are two concepts that are commonly confused. The reality though is that they are quite different, and if you designed software to be concurrent when instead you needed parallel execution, then you could be seriously impacting your software's true performance potential. Due to this, it's vital to know exactly what the two concepts mean so that you can understand the differences. Through knowing these differences, you'll be putting yourself at a distinct advantage when it comes to designing your own high performance software in Python. In this article we'll be covering the following topics:

What is concurrency and what are the major bottlenecks that impact our applications?
What is parallelism and how does this differ from concurrency?

(For more resources related to this topic, see here.)

Understanding concurrency

Concurrency is essentially the practice of doing multiple things at the same time, but not specifically in parallel. It can help us to improve the perceived performance of our applications and it can also improve the speed at which our applications run.

The best way to think of how concurrency works is to imagine one person working on multiple tasks and quickly switching between these tasks. Imagine this one person was working concurrently on a program and at the same time dealing with support requests. This person would focus primarily on the writing of their program and quickly context switch to fixing a bug or dealing with a support issue should there be one. Once they complete the support task, they could context switch again back to writing their program really quickly.

However, in computing there are typically two performance bottlenecks that we have to watch out for and guard against when writing our programs. It's important to know the differences between the two bottlenecks, as if we tried to apply concurrency to a CPU-based bottleneck then you could find that the program actually starts to see performance decreases as opposed to increases. And if you tried to apply parallelism to a task that really requires a concurrent solution then again you could see the same performance hits.

Properties of concurrent systems

All concurrent systems share a similar set of properties; these can be defined as:

Multiple actors: These represent the different processes and threads all trying to actively make progress on their own tasks. We could have multiple processes that contain multiple threads all trying to run at the same time.
Shared resources: This represents the memory, the disk, and other resources that the actors in the above group must utilize in order to perform what they need to do.
Rules: All concurrent systems must follow a strict set of rules that define when actors can and can't acquire locks, access memory, modify state, and so on. These rules are vital in order for these concurrent systems to work, otherwise our programs would tear themselves apart.

Input/Output bottlenecks

Input/Output bottlenecks, or I/O bottlenecks for short, are bottlenecks where your computer spends more time waiting on various inputs and outputs than it does on processing the information. You'll typically find this type of bottleneck when you are working with an I/O heavy application. We could take your standard web browser as an example of a heavy I/O application.
In a browser we typically spend a significantly longer amount of time waiting for network requests to finish for things like style sheets, scripts or HTML pages to load, as opposed to rendering them on the screen. If the rate at which data is requested is slower than the rate at which it is consumed, then you have yourself an I/O bottleneck. One of the main ways to improve the speed of these applications is typically to either improve the speed of the underlying I/O by buying more expensive and faster hardware, or to improve the way in which we handle these I/O requests.

A great example of a program bound by I/O bottlenecks would be a web crawler. The main purpose of a web crawler is to traverse the web and essentially index web pages so that they can be taken into consideration when Google runs its search ranking algorithm to decide the top 10 results for a given keyword. We'll start by creating a very simple script that just requests a page and times how long it takes to request said web page:

import urllib.request
import time

t0 = time.time()
req = urllib.request.urlopen('http://www.example.com')
pageHtml = req.read()
t1 = time.time()
print("Total Time To Fetch Page: {} Seconds".format(t1-t0))

If we break down this code, first we import the two necessary modules, urllib.request and the time module. We then record the starting time, request the web page example.com, record the ending time and print out the time difference. Now say we wanted to add a bit of complexity and follow any links to other pages so that we could index them in the future. We could use a library such as BeautifulSoup in order to make our lives a little easier:

import urllib.request
import time
from bs4 import BeautifulSoup

t0 = time.time()
req = urllib.request.urlopen('http://www.example.com')
t1 = time.time()
print("Total Time To Fetch Page: {} Seconds".format(t1-t0))
soup = BeautifulSoup(req.read(), "html.parser")
for link in soup.find_all('a'):
    print(link.get('href'))
t2 = time.time()
print("Total Execution Time: {} Seconds".format(t2-t0))

When I execute the above program, the two timings are printed to my terminal. You'll notice from this output that the time to fetch the page is over a quarter of a second. Now imagine we wanted to run our web crawler for a million different web pages; our total execution time would be roughly a million times longer. The main real cause for this enormous execution time would be purely down to the I/O bottleneck we face in our program. We spend a massive amount of time waiting on our network requests and a fraction of that time parsing our retrieved page for further links to crawl.
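To make the benefit of concurrency concrete, here is a small sketch that is not part of the original example. It fetches the same hypothetical list of pages first sequentially and then with a thread pool from the standard library's concurrent.futures module, so that the time spent waiting on the network overlaps instead of adding up:

import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# A hypothetical list of pages we might want to crawl
urls = ["http://www.example.com"] * 5

def fetch(url):
    # Each call spends most of its time blocked waiting on the network
    return urllib.request.urlopen(url).read()

# Sequential: each wait happens one after another
t0 = time.time()
for url in urls:
    fetch(url)
print("Sequential: {} Seconds".format(time.time() - t0))

# Concurrent: the waits overlap, so the total is closer to the slowest single request
t0 = time.time()
with ThreadPoolExecutor(max_workers=5) as executor:
    list(executor.map(fetch, urls))
print("Concurrent: {} Seconds".format(time.time() - t0))

On an I/O-bound workload like this, the concurrent version typically finishes in a fraction of the sequential time, even though only one thread is ever executing Python code at any given moment.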
Understanding parallelism

Parallelism is the art of executing two or more actions simultaneously, as opposed to concurrency, in which you make progress on two or more things at the same time. This is an important distinction, and in order to achieve true parallelism, we'll need multiple processors on which to run our code at the same time. A good analogy for parallel processing is a queue for coffee. Imagine you had two queues of 20 people, all waiting to use a single coffee machine so that they can get through the rest of the day; this would be an example of concurrency. Now say you were to introduce a second coffee machine into the mix; this would then be an example of something happening in parallel.

This is exactly how parallel processing works: each of the coffee machines in that room would represent one processing core, and they are able to make progress on tasks simultaneously. A real life example which highlights the true power of parallel processing is your computer's graphics card. These graphics cards tend to have hundreds, if not thousands, of individual processing cores that live independently and can compute things at the same time. The reason we are able to run high-end PC games at such smooth frame rates is due to the fact we've been able to put so many parallel cores onto these cards.

CPU bound bottleneck

A CPU bound bottleneck is typically the inverse of an I/O bound bottleneck. This bottleneck is typically found in applications that do a lot of heavy number crunching or any other task that is computationally expensive. These are programs whose rate of execution is bound by the speed of the CPU; if you throw a faster CPU in your machine, you should see a direct increase in the speed of these programs. If the time your program spends processing data far outweighs the time it spends requesting that data, then you have a CPU bound bottleneck.

How do they work on a CPU?

Understanding the differences outlined in the previous section between both concurrency and parallelism is essential, but it's also very important to understand more about the systems that your software will be running on. Having an appreciation of the different architecture styles as well as the low level mechanics helps you make the most informed decisions in your software design.

Single core CPUs

Single core processors will only ever execute one thread at any given time, as that is all they are capable of. However, in order to ensure that we don't see our applications hanging and being unresponsive, these processors rapidly switch between multiple threads of execution many thousands of times per second. This switching between threads is what is called a "context switch" and involves storing all the necessary information for a thread at a specific point of time and then restoring it at a different point further down the line. Using this mechanism of constantly saving and restoring threads allows us to make progress on quite a number of threads within a given second, and it appears like the computer is doing multiple things at once. It is in fact doing only one thing at any given time, but doing it at such speed that it's imperceptible to users of that machine. When writing multi-threaded applications in Python, it is important to note that these context switches are computationally quite expensive. There is no way to get around this, unfortunately, and much of the design of operating systems these days is about optimizing for these context switches so that we don't feel the pain quite as much.

Advantages of single core CPUs:
They do not require any complex communication protocols between multiple cores
Single core CPUs require less power, which typically makes them better suited for IoT devices

Disadvantages:
They are limited in speed, and larger applications will cause them to struggle and potentially freeze
Heat dissipation issues place a hard limit on how fast a single core CPU can go

Clock rate

One of the key limitations to a single-core application running on a machine is the clock speed of the CPU. When we talk about clock rate, we are essentially talking about how many clock cycles a CPU can execute every second.
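To make the CPU-bound case concrete before we look at how clock speeds have evolved, here is a small sketch that is not part of the original article. It times a deliberately naive, computation-heavy task; almost all of its running time is spent executing instructions rather than waiting on input or output, so only a faster CPU (or more cores put to work) will speed it up:

import time

def count_primes(limit):
    # Naive, deliberately CPU-hungry prime counting
    count = 0
    for n in range(2, limit):
        if all(n % d != 0 for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

t0 = time.time()
print("Primes below 50000: {}".format(count_primes(50000)))
print("Total Time: {} Seconds".format(time.time() - t0))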
For the past 10 years we have watched as manufacturers have been able to keep pace with Moore's law, which was essentially an observation that the number of transistors one was able to place on a piece of silicon doubled roughly every 2 years. This doubling of transistors every 2 years paved the way for exponential gains in single-CPU clock rates, and CPUs went from the low MHz to the 4-5 GHz clock speeds we are seeing on Intel's i7 6700k processor. But with transistors getting as small as a few nanometers across, this is inevitably coming to an end. We've started to hit the boundaries of physics and, unfortunately, if we go any smaller we'll start to be hit by the effects of quantum tunneling. Due to these physical limitations, we need to start looking at other methods in order to improve the speeds at which we are able to compute things. This is where Martelli's model of scalability comes into play.

Martelli model of scalability

The author of Python Cookbook, Alex Martelli, came up with a model of scalability which Raymond Hettinger discussed in his brilliant hour-long talk "Thinking about Concurrency", which he gave at PyCon Russia 2016. This model represents three different types of problems and programs:

1 core: single threaded and single process programs
2-8 cores: multithreaded and multiprocess programs
9+ cores: distributed computing

The first category, the single core, single threaded category, is able to handle a growing number of problems due to the constant improvements in the speed of single core CPUs, and as a result the second category is being rendered more and more obsolete. We will eventually hit a limit with the speed at which a 2-8 core system can run, and as a result we'll have to start looking at other methods, such as multiple CPU systems or even distributed computing. If your problem is worth solving quickly and it requires a lot of power, then the sensible approach is to go with the distributed computing category and spin up multiple machines and multiple instances of your program in order to tackle your problems in a truly parallel manner. Large enterprise systems that handle hundreds of millions of requests are the main inhabitants of this category. You'll typically find that these enterprise systems are deployed on tens, if not hundreds, of high performance, incredibly powerful servers in various locations across the world.

Time-Sharing - the task scheduler

One of the most important parts of the operating system is the task scheduler. This acts as the maestro of the orchestra and directs everything with impeccable precision and incredible timing and discipline. This maestro has only one real goal, and that is to ensure that every task has a chance to run through till completion; the when and where of a task's execution, however, is non-deterministic. That is to say, if we gave a task scheduler two identical competing processes one after the other, there is no guarantee that the first process will complete first. This non-deterministic nature is what makes concurrent programming so challenging.
An excellent example that highlights this non-deterministic behavior is the following code:

import threading
import time
import random

counter = 1

def workerA():
    global counter
    while counter < 1000:
        counter += 1
        print("Worker A is incrementing counter to {}".format(counter))
        sleepTime = random.randint(0,1)
        time.sleep(sleepTime)

def workerB():
    global counter
    while counter > -1000:
        counter -= 1
        print("Worker B is decrementing counter to {}".format(counter))
        sleepTime = random.randint(0,1)
        time.sleep(sleepTime)

def main():
    t0 = time.time()
    thread1 = threading.Thread(target=workerA)
    thread2 = threading.Thread(target=workerB)
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()
    t1 = time.time()
    print("Execution Time {}".format(t1-t0))

if __name__ == '__main__':
    main()

Here we have two competing threads in Python that are each trying to accomplish their own goal of either incrementing the counter to 1,000 or conversely decrementing it to -1,000. In a single core processor, there is the possibility that worker A manages to complete its task before worker B has a chance to execute, and the same can be said for worker B. However, there is a third potential possibility, and that is that the task scheduler continues to switch between worker A and worker B an infinite number of times and neither ever completes. The above code incidentally also shows one of the dangers of multiple threads accessing shared resources without any form of synchronization. There is no accurate way to determine what will happen to our counter, and as such our program could be considered unreliable.

Multi-core processors

We've now got some idea as to how single-core processors work, but now it's time to take a look at multicore processors. Multicore processors contain multiple independent processing units or "cores". Each core contains everything it needs in order to execute a sequence of stored instructions. These cores each follow their own cycle:

Fetch - This step involves fetching instructions from program memory. This is dictated by a program counter (PC), which identifies the location of the next step to execute.
Decode - The core converts the instruction that it has just fetched into a series of signals that will trigger various other parts of the CPU.
Execute - Finally, we perform the execute step. This is where we run the instruction that we have just fetched and decoded, and typically the results of this execution are then stored in a CPU register.

Having multiple cores offers us the advantage of being able to work independently on multiple Fetch -> Decode -> Execute cycles. This style of architecture essentially enables us to create higher performance programs that leverage this parallel execution.

Advantages of multicore processors:
We are no longer bound by the same performance limitations that a single core processor is bound by
Applications that are able to take advantage of multiple cores will tend to run faster if well designed

Disadvantages of multicore processors:
They require more power than your typical single core processor
Cross-core communication is no simple feat; there are multiple different ways of handling it

Summary

In this article we covered a multitude of topics, including the differences between concurrency and parallelism. We also looked at how they both leverage the CPU in different ways.
Resources for Article: Further resources on this subject: Python Data Science Up and Running [article] Putting the Fun in Functional Python [article] Basics of Python for Absolute Beginners [article]

Queues and topics

Packt
10 Jul 2017
8 min read
In this article by Luca Stancapiano, the author of the book Mastering Java EE Development with WildFly, we will see how to implement Java Message Service (JMS) in a queue channel using the WildFly console.

(For more resources related to this topic, see here.)

JMS works with channels of messages that manage the messages asynchronously. These channels contain messages that they will collect or remove according to the configuration and the type of channel. These channels are of two types: queues and topics. These channels are highly configurable through the WildFly console. As with all components in WildFly, they can be installed through the console, the command line or directly with the Maven plugins of the project. In the next two sections we will show what they mean and all the possible configurations.

Queues

Queues collect the sent messages that are waiting to be read. The messages are delivered in the order they are sent and, once read, are removed from the queue.

Create the queue from the web console

Let's see the steps to create a new queue through the web console:

Connect to http://localhost:9990/.
Go in Configuration | Subsystems/Messaging - ActiveMQ/default and click on Queues/Topics.
Now select the Queues menu and click on the Add button. You will see a form with the following parameters to insert:

Name: The name of the queue.
JNDI Names: The JNDI names the queue will be bound to.
Durable?: Whether the queue is durable or not.
Selector: The queue selector.

As for all enterprise components, JMS components are callable through the Java Naming and Directory Interface (JNDI). Durable queues keep messages around persistently for any suitable consumer to consume them. Durable queues do not need to concern themselves with which consumer is going to consume the messages at some point in the future. There is just one copy of a message that any consumer in the future can consume.

Message selectors allow you to filter the messages that a message consumer will receive. The filter is a relatively complex language similar to the syntax of an SQL WHERE clause. The selector can use all the message headers and properties for filtering operations, but cannot use the message content. Selectors are mostly useful for channels that broadcast a very large number of messages to their subscribers. On queues, only messages that match the selector will be returned. Others stay in the queue (and thus can be read by a MessageConsumer with a different selector). The following SQL elements are allowed in our filters and we can put them in the Selector field of the form:

AND, OR, NOT: Logical operators. Example: (releaseYear < 1986) AND NOT (title = 'Bad')
String literals: String literals in single quotes, duplicate to escape. Example: title = 'Tom''s'
Number literals: Numbers in Java syntax; they can be double or integer. Example: releaseYear = 1982
Properties: Message properties that follow Java identifier naming. Example: releaseYear = 1983
Boolean literals: TRUE and FALSE. Example: isAvailable = FALSE
( ): Round brackets. Example: (releaseYear < 1981) OR (releaseYear > 1990)
BETWEEN: Checks whether a number is in a range (both numbers inclusive). Example: releaseYear BETWEEN 1980 AND 1989
Header fields: Any headers except JMSDestination, JMSExpiration and JMSReplyTo. Example: JMSPriority = 10
=, <>, <, <=, >, >=: Comparison operators. Example: (releaseYear < 1986) AND (title <> 'Bad')
LIKE: String comparison with wildcards '_' and '%'. Example: title LIKE 'Mirror%'
IN: Finds a value in a set of strings. Example: title IN ('Piece of mind', 'Somewhere in time', 'Powerslave')
IS NULL, IS NOT NULL: Checks whether a value is null or not null. Example: releaseYear IS NULL
*, +, -, /: Arithmetic operators. Example: releaseYear * 2 > 2000 - 18

Now fill the form. In this article we will implement a messaging service to send the coordinates of buses. The queue is created and shown in the queues list.

Create the queue using CLI and Maven WildFly plugin

The same thing can be done with the Command Line Interface (CLI). Start a WildFly instance, go in the bin directory of WildFly and execute the following script:

bash-3.2$ ./jboss-cli.sh
You are disconnected at the moment. Type 'connect' to connect to the server or 'help' for the list of supported commands.
[disconnected /] connect
[standalone@localhost:9990 /] /subsystem=messaging-activemq/server=default/jms-queue=gps_coordinates:add(entries=["java:/jms/queue/GPS"])
{"outcome" => "success"}

The same thing can be done through Maven. Simply add this snippet in your pom.xml:

<plugin>
  <groupId>org.wildfly.plugins</groupId>
  <artifactId>wildfly-maven-plugin</artifactId>
  <version>1.0.2.Final</version>
  <executions>
    <execution>
      <id>add-resources</id>
      <phase>install</phase>
      <goals>
        <goal>add-resource</goal>
      </goals>
      <configuration>
        <resources>
          <resource>
            <address>subsystem=messaging-activemq,server=default,jms-queue=gps_coordinates</address>
            <properties>
              <durable>true</durable>
              <entries>!!["gps_coordinates", "java:/jms/queue/GPS"]</entries>
            </properties>
          </resource>
        </resources>
      </configuration>
    </execution>
    <execution>
      <id>del-resources</id>
      <phase>clean</phase>
      <goals>
        <goal>undeploy</goal>
      </goals>
      <configuration>
        <afterDeployment>
          <commands>
            <command>/subsystem=messaging-activemq/server=default/jms-queue=gps_coordinates:remove</command>
          </commands>
        </afterDeployment>
      </configuration>
    </execution>
  </executions>
</plugin>

The Maven WildFly plugin lets you do admin operations in WildFly using the same custom protocol used by the command line. Two executions are configured:

add-resources: It hooks the install Maven phase and adds the queue, passing the name, JNDI and durable parameters seen in the previous section.
del-resources: It hooks the clean Maven phase and removes the chosen queue by name.
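However the queue is created, consuming from it with one of the selectors from the table above is straightforward. The following sketch is not part of the original article: it assumes a JMS 2.0 container that can inject a JMSContext, and a hypothetical String property named vehicleType that the producer would set on each message.

import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.inject.Inject;
import javax.jms.JMSContext;
import javax.jms.Queue;

@Stateless
public class MessageQueueReceiver {

    @Inject
    private JMSContext context;

    @Resource(mappedName = "java:/jms/queue/GPS")
    private Queue queue;

    public String receiveBusCoordinate() {
        // The selector is hypothetical: it assumes the producer sets a String
        // property named vehicleType on every message it sends to the queue.
        // Messages that do not match stay in the queue for other consumers.
        return context.createConsumer(queue, "vehicleType = 'bus'")
                      .receiveBody(String.class, 5000);
    }
}

The second argument of receiveBody is a timeout in milliseconds; in a real application you would typically keep the consumer open rather than creating one per call.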
Create the queue through an Arquillian test case

Or we can add and remove the queue through an Arquillian test case:

@RunWith(Arquillian.class)
@ServerSetup(MessagingResourcesSetupTask.class)
public class MessageTestCase {
    ...
    private static final String QUEUE_NAME = "gps_coordinates";
    private static final String QUEUE_LOOKUP = "java:/jms/queue/GPS";

    static class MessagingResourcesSetupTask implements ServerSetupTask {

        @Override
        public void setup(ManagementClient managementClient, String containerId) throws Exception {
            getInstance(managementClient.getControllerClient()).createJmsQueue(QUEUE_NAME, QUEUE_LOOKUP);
        }

        @Override
        public void tearDown(ManagementClient managementClient, String containerId) throws Exception {
            getInstance(managementClient.getControllerClient()).removeJmsQueue(QUEUE_NAME);
        }
    }
    ...
}

The Arquillian org.jboss.as.arquillian.api.ServerSetup annotation lets you use an external setup manager to install or remove new components inside WildFly. In this case we are installing the queue declared with the two variables QUEUE_NAME and QUEUE_LOOKUP. When the test ends, the tearDown method will be invoked automatically and it will remove the installed queue. To use Arquillian it's important to add the WildFly testsuite dependency in your pom.xml project:

...
<dependencies>
  <dependency>
    <groupId>org.wildfly</groupId>
    <artifactId>wildfly-testsuite-shared</artifactId>
    <version>10.1.0.Final</version>
    <scope>test</scope>
  </dependency>
</dependencies>
...

Going in the standalone-full.xml we will find the created queue as:

<subsystem>
  <server name="default">
    ...
    <jms-queue name="gps_coordinates" entries="java:/jms/queue/GPS"/>
    ...
  </server>
</subsystem>

JMS is available in the standalone-full configuration. By default WildFly supports 4 standalone configurations. They can be found in the standalone/configuration directory:

standalone.xml: It supports all components except the messaging and corba/iiop
standalone-full.xml: It supports all components
standalone-ha.xml: It supports all components except the messaging and corba/iiop, with the cluster enabled
standalone-full-ha.xml: It supports all components, with the cluster enabled

To start WildFly with the chosen configuration, simply pass -c with the configuration to the standalone.sh script. Here is a sample to start the standalone full configuration:

./standalone.sh -c standalone-full.xml

Create the java client for the queue

Let's see now how to create a client to send a message to the queue. JMS 2.0 simplifies the creation of clients considerably. Here is a sample of a client inside a stateless Enterprise Java Bean (EJB):

@Stateless
public class MessageQueueSender {

    @Inject
    private JMSContext context;

    @Resource(mappedName = "java:/jms/queue/GPS")
    private Queue queue;

    public void sendMessage(String message) {
        context.createProducer().send(queue, message);
    }
}

The javax.jms.JMSContext is injectable from any EE component. We will see the JMS context in detail in the next section, The JMS Context. The queue is represented in JMS by the javax.jms.Queue interface. It can be injected as a JNDI resource through the @Resource annotation. The JMS context, through the createProducer method, creates a producer represented by the javax.jms.JMSProducer interface, used to send the messages. We can now create a client injecting the stateless bean and sending the string message hello!:

...
@EJB
private MessageQueueSender messageQueueSender;
...
messageQueueSender.sendMessage("hello!");

Summary

In this article we have seen how to implement Java Message Service in a queue channel using the web console, the Command Line Interface, Maven WildFly plugins and Arquillian test cases, and how to create Java clients for the queue.
Resources for Article: Further resources on this subject: WildFly – the Basics [article] WebSockets in Wildfly [article] Creating Java EE Applications [article]

Ruby Strings

Packt
06 Jul 2017
9 min read
In this article by Jordan Hudgens, the author of the book Comprehensive Ruby Programming, you'll learn about the Ruby String data type and walk through how to integrate string data into a Ruby program. Working with words, sentences, and paragraphs is a common requirement in many applications. Additionally, you will learn how to:

Employ string manipulation techniques using core Ruby methods
Demonstrate how to work with the string data type in Ruby

(For more resources related to this topic, see here.)

Using strings in Ruby

A string is a data type in Ruby and contains a set of characters, typically normal English text (or whatever natural language you're building your program for), that you would write. A key point for the syntax of strings is that they have to be enclosed in single or double quotes if you want to use them in a program. The program will throw an error if they are not wrapped inside quotation marks. Let's walk through three scenarios.

Missing quotation marks

If you try to simply declare a string without wrapping it in quotation marks, this results in an error. The error occurs because Ruby thinks that the values are classes and methods.

Printing strings

When we print out a string that we have properly wrapped in quotation marks, everything works as expected. Please note that both single and double quotation marks work properly. It's also important that you do not mix the quotation mark types. For example, if you attempted to run the code:

puts "Name an animal'

you would get an error, because you need to ensure that every quotation mark is matched with a closing (and matching) quotation mark. If you start a string with double quotation marks, the Ruby parser requires that you end the string with the matching double quotation marks.

Storing strings in variables

Lastly, we can store a string inside of a variable and then print the value out to the console. We'll talk more about strings and string interpolation in subsequent sections.

String interpolation guide for Ruby

In this section, we are going to talk about string interpolation in Ruby.

What is string interpolation?

So what exactly is string interpolation? Good question. String interpolation is the process of being able to seamlessly integrate dynamic values into a string. Let's assume we want to slip dynamic words into a string. We can get input from the console and store that input into variables. From there we can call the variables inside of a pre-existing string. For example, let's give a sentence the ability to change based on a user's input:

puts "Name an animal"
animal = gets.chomp
puts "Name a noun"
noun = gets.chomp
p "The quick brown #{animal} jumped over the lazy #{noun}"

Notice the way I insert variables inside the string? They are enclosed in curly brackets and are preceded by a # sign. If I run this code, the output is the sentence with whatever values I typed in inserted in place of the two variables. So, this is how you insert values dynamically in your sentences. If you use sites like Twitter, you'll notice they sometimes display personalized messages such as: Good morning Jordan or Good evening Tiffany. This type of behavior is made possible by inserting a dynamic value in a fixed part of a string, and it leverages string interpolation.

Now, let's use single quotes instead of double quotes to see what happens. The string is printed as it is, without inserting the values for animal and noun. This is exactly what happens when you try using single quotes: Ruby prints the entire string as it is, without any interpolation.
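The screenshots from the original article are not reproduced here, so as a quick sketch (not part of the original text) this is roughly what the two behaviours look like side by side:

name = "Jordan"

puts "Good morning #{name}"   # double quotes interpolate  => Good morning Jordan
puts 'Good morning #{name}'   # single quotes do not       => Good morning #{name}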
Therefore it's important to remember the difference. Another interesting aspect is that anything inside the curly brackets can be a Ruby script. So, technically, you can type your entire algorithm inside these curly brackets, and Ruby will run it perfectly for you. However, it is not recommended for practical programming purposes. For example, I can insert a math equation inside the brackets, and the computed value is printed out.

String manipulation guide

In this section we are going to learn about string manipulation, along with a number of examples of how to integrate string manipulation methods in a Ruby program.

What is string manipulation?

So what exactly is string manipulation? It's the process of altering the format or value of a string, usually by leveraging string methods.

String manipulation code examples

Let's start with an example. Let's say I want my application to always display the word Astros in capital letters. To do that, I simply write:

"Astros".upcase

Now if I always want a string to be in lowercase letters, I can use the downcase method, like so:

"Astros".downcase

Those are both methods I use quite often. However, there are other string methods available that we also have at our disposal. For the rare times when you want to literally swap the case of the letters, you can leverage the swapcase method:

"Astros".swapcase

And lastly, if you want to reverse the order of the letters in the string, we can call the reverse method:

"Astros".reverse

These methods are built into the String data class and we can call them on any string values in Ruby.

Method chaining

Another neat thing we can do is join different methods together to get custom output. For example, I can run:

"Astros".reverse.upcase

The preceding code displays the value SORTSA. This practice of combining different methods with a dot is called method chaining.

Split, strip, and join guides for strings

In this section, we are going to walk through how to use the split and strip methods in Ruby. These methods will help us clean up strings and convert a string to an array so we can access each word as its own value.

Using the strip method

Let's start off by analyzing the strip method. Imagine that the input you get from the user or from the database is poorly formatted and contains white space before and after the value. To clean the data up we can use the strip method. For example:

str = " The quick brown fox jumped over the quick dog "
p str.strip

When you run this code, the output is just the sentence without the white space before and after the words.

Using the split method

Now let's walk through the split method. The split method is a powerful tool that allows you to split a sentence into an array of words or characters. For example, when you type the following code:

str = "The quick brown fox jumped over the quick dog"
p str.split

you'll see that it converts the sentence into an array of words. This method can be particularly useful for long paragraphs, especially when you want to know the number of words in the paragraph. Since the split method converts the string into an array, you can use all the array methods, like size, to see how many words were in the string. We can leverage method chaining to find out how many words are in the string, like so:

str = "The quick brown fox jumped over the quick dog"
p str.split.size

This should return a value of 9, which is the number of words in the sentence.
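Before moving on to counting individual characters, it is worth noting that split also accepts an explicit delimiter. This small sketch is not from the original article; it shows how the same method might break up comma-separated data:

csv_line = "Astros,Cubs,Dodgers"
teams = csv_line.split(",")
p teams        # => ["Astros", "Cubs", "Dodgers"]
p teams.size   # => 3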
To know the number of letters, we can pass an optional argument to the split method and use the format:

str = "The quick brown fox jumped over the quick dog"
p str.split(//).size

And if you want to see all of the individual letters, we can remove the size method call, like this:

p str.split(//)

Your output will be an array containing every individual character in the string. Notice that it also includes spaces as individual characters, which may or may not be what you want a program to return. This method can be quite handy while developing real-world applications. A good practical example of this method is Twitter. Since this social media site restricts users to 140 characters, this method is sure to be a part of the validation code that counts the number of characters in a Tweet.

Using the join method

We've walked through the split method, which allows you to convert a string into a collection of characters. Thankfully, Ruby also has a method that does the opposite, which is to allow you to convert an array of characters into a single string, and that method is called join. Let's imagine a situation where we're asked to reverse the words in a string. This is a common Ruby coding interview question, so it's an important concept to understand, since it tests your knowledge of how strings work in Ruby. Let's imagine that we have a string, such as:

str = "backwards am I"

And we're asked to reverse the words in the string. The pseudocode for the algorithm would be:

Split the string into words
Reverse the order of the words
Merge all of the split words back into a single string

We can actually accomplish each of these requirements in a single line of Ruby code. The following code snippet will perform the task:

str.split.reverse.join(' ')

This code will convert the single string into an array of strings; for the example it will equal ["backwards", "am", "I"]. From there it will reverse the order of the array elements, so the array will equal ["I", "am", "backwards"]. With the words reversed, now we simply need to merge the words into a single string, which is where the join method comes in. Running the join method will convert all of the words in the array into one string.

Summary

In this article, we were introduced to the string data type and how it can be utilized in Ruby. We analyzed how to pass strings into Ruby processes by leveraging string interpolation. We also learned the methods of basic string manipulation and how to find and replace string data. We analyzed how to break strings into smaller components, along with how to clean up string-based data. We even introduced the Array class in this article.

Resources for Article: Further resources on this subject: Ruby and Metasploit Modules [article] Find closest mashup plugin with Ruby on Rails [article] Building tiny Web-applications in Ruby using Sinatra [article]

Command-Line Tools

Packt
06 Jul 2017
9 min read
In this article by Aaron Torres, author of the book Go Cookbook, we will cover the following recipes:

Using command-line arguments
Working with Unix pipes
An ANSI coloring application

(For more resources related to this topic, see here.)

Using command-line arguments

This article will expand on other uses for these arguments by constructing a command that supports nested subcommands. This will demonstrate flag sets and also the use of positional arguments passed into your application. This recipe requires a main function to run. There are a number of third-party packages for dealing with complex nested arguments and flags, but we'll again investigate doing so using only the standard library.

Getting ready

You need to perform the following steps for the installation:

Download and install Go on your operating system at https://golang.org/doc/install and configure your GOPATH.
Open a terminal/console application.
Navigate to your GOPATH/src and create a project directory, for example, $GOPATH/src/github.com/yourusername/customrepo. All code will be run and modified from this directory.
Optionally, install the latest tested version of the code using the go get github.com/agtorre/go-cookbook/ command.

How to do it...

From your terminal/console application, create and navigate to the chapter2/cmdargs directory.
Copy tests from https://github.com/agtorre/go-cookbook/tree/master/chapter2/cmdargs or use this as an exercise to write some of your own.
Create a file called cmdargs.go with the following content:

package main

import (
    "flag"
    "fmt"
    "os"
)

const version = "1.0.0"

const usage = `Usage: %s [command]

Commands:
  Greet
  Version
`

const greetUsage = `Usage: %s greet name [flag]

Positional Arguments:
  name
        the name to greet

Flags:
`

// MenuConf holds all the levels
// for a nested cmd line argument
type MenuConf struct {
    Goodbye bool
}

// SetupMenu initializes the base flags
func (m *MenuConf) SetupMenu() *flag.FlagSet {
    menu := flag.NewFlagSet("menu", flag.ExitOnError)
    menu.Usage = func() {
        fmt.Printf(usage, os.Args[0])
        menu.PrintDefaults()
    }
    return menu
}

// GetSubMenu returns a flag set for a submenu
func (m *MenuConf) GetSubMenu() *flag.FlagSet {
    submenu := flag.NewFlagSet("submenu", flag.ExitOnError)
    submenu.BoolVar(&m.Goodbye, "goodbye", false, "Say goodbye instead of hello")
    submenu.Usage = func() {
        fmt.Printf(greetUsage, os.Args[0])
        submenu.PrintDefaults()
    }
    return submenu
}

// Greet will be invoked by the greet command
func (m *MenuConf) Greet(name string) {
    if m.Goodbye {
        fmt.Println("Goodbye " + name + "!")
    } else {
        fmt.Println("Hello " + name + "!")
    }
}

// Version prints the current version that is
// stored as a const
func (m *MenuConf) Version() {
    fmt.Println("Version: " + version)
}

Create a file called main.go with the following content:

package main

import (
    "fmt"
    "os"
    "strings"
)

func main() {
    c := MenuConf{}
    menu := c.SetupMenu()
    menu.Parse(os.Args[1:])

    // we use arguments to switch between commands
    // flags are also an argument
    if len(os.Args) > 1 {
        // we don't care about case
        switch strings.ToLower(os.Args[1]) {
        case "version":
            c.Version()
        case "greet":
            f := c.GetSubMenu()
            if len(os.Args) < 3 {
                f.Usage()
                return
            }
            if len(os.Args) > 3 {
                f.Parse(os.Args[3:])
            }
            c.Greet(os.Args[2])
        default:
            fmt.Println("Invalid command")
            menu.Usage()
            return
        }
    } else {
        menu.Usage()
        return
    }
}

Run the go build command.
Run the following commands and try a few other combinations of arguments:

$ ./cmdargs -h
Usage: ./cmdargs [command]

Commands:
  Greet
  Version

$ ./cmdargs version
Version: 1.0.0

$ ./cmdargs greet
Usage: ./cmdargs greet name [flag]

Positional Arguments:
  name
        the name to greet

Flags:
  -goodbye
        Say goodbye instead of hello

$ ./cmdargs greet reader
Hello reader!

$ ./cmdargs greet reader -goodbye
Goodbye reader!

If you copied or wrote your own tests, go up one directory and run go test, and ensure all tests pass.

How it works...

Flag sets can be used to set up independent lists of expected arguments, usage strings, and more. The developer is required to do validation on a number of arguments, parse the right subset of arguments into commands, and define usage strings. This can be error prone and requires a lot of iteration to get it completely correct. The flag package makes parsing arguments much easier and includes convenience methods to get the number of flags, arguments, and more. This recipe demonstrates basic ways to construct a complex command-line application using arguments, including a package-level config, required positional arguments, multi-leveled command usage, and how to split these things into multiple files or packages if needed.

Working with Unix pipes

Unix pipes are useful when passing the output of one program to the input of another. Consider the following example:

$ echo "test case" | wc -l
1

In a Go application, the left-hand side of the pipe can be read in using os.Stdin, which acts like a file descriptor. To demonstrate this, this recipe will take an input on the left-hand side of a pipe and return a list of words and their number of occurrences. These words will be tokenized on white space.

Getting ready

Refer to the Getting ready section of the Using command-line arguments recipe.

How to do it...

From your terminal/console application, create a new directory, chapter2/pipes.
Navigate to that directory and copy tests from https://github.com/agtorre/go-cookbook/tree/master/chapter2/pipes or use this as an exercise to write some of your own.
Create a file called pipes.go with the following content:

package main

import (
    "bufio"
    "fmt"
    "os"
)

// WordCount takes a file and returns a map
// with each word as a key and its number of
// appearances as a value
func WordCount(f *os.File) map[string]int {
    result := make(map[string]int)

    // make a scanner to work on the file
    // io.Reader interface
    scanner := bufio.NewScanner(f)
    scanner.Split(bufio.ScanWords)

    for scanner.Scan() {
        result[scanner.Text()]++
    }

    if err := scanner.Err(); err != nil {
        fmt.Fprintln(os.Stderr, "reading input:", err)
    }

    return result
}

func main() {
    fmt.Printf("string: number_of_occurrences\n\n")
    for key, value := range WordCount(os.Stdin) {
        fmt.Printf("%s: %d\n", key, value)
    }
}

Run echo "some string" | go run pipes.go. You may also run:

go build
echo "some string" | ./pipes

You should see the following output:

$ echo "test case" | go run pipes.go
string: number_of_occurrences

test: 1
case: 1

$ echo "test case test" | go run pipes.go
string: number_of_occurrences

test: 2
case: 1

If you copied or wrote your own tests, go up one directory and run go test, and ensure that all tests pass.

How it works...

Working with pipes in Go is pretty simple, especially if you're familiar with working with files. This recipe uses a scanner to tokenize the io.Reader interface of the os.Stdin file object. You can see how you must check for errors after completing all of the reads.
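The recipe suggests copying the published tests or writing your own. One approach, which is an assumption on my part rather than part of the recipe, is to note that WordCount only needs the io.Reader behaviour of *os.File, so a small variant that accepts an io.Reader is trivial to unit test with strings.NewReader. The sketch below would live in a _test.go file in the same directory:

package main

import (
    "bufio"
    "io"
    "strings"
    "testing"
)

// wordCountReader is a hypothetical variant of WordCount that accepts any
// io.Reader instead of *os.File, which makes it easy to test without a file.
func wordCountReader(r io.Reader) map[string]int {
    result := make(map[string]int)
    scanner := bufio.NewScanner(r)
    scanner.Split(bufio.ScanWords)
    for scanner.Scan() {
        result[scanner.Text()]++
    }
    return result
}

func TestWordCountReader(t *testing.T) {
    got := wordCountReader(strings.NewReader("test case test"))
    if got["test"] != 2 || got["case"] != 1 {
        t.Errorf("unexpected counts: %v", got)
    }
}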
An ANSI coloring application

Coloring an ANSI terminal application is handled by a variety of codes before and after a section of text that you want colored. This recipe will explore a basic coloring mechanism to color the text red or keep it plain. For a more complete application, take a look at https://github.com/agtorre/gocolorize, which supports many more colors and text types, and implements the fmt.Formatter interface for ease of printing.

Getting ready

Refer to the Getting ready section of the Using command-line arguments recipe.

How to do it...

From your terminal/console application, create and navigate to the chapter2/ansicolor directory.
Copy tests from https://github.com/agtorre/go-cookbook/tree/master/chapter2/ansicolor or use this as an exercise to write some of your own.
Create a file called color.go with the following content:

package ansicolor

import "fmt"

// Color of text
type Color int

const (
    // ColorNone is default
    ColorNone = iota
    // Red colored text
    Red
    // Green colored text
    Green
    // Yellow colored text
    Yellow
    // Blue colored text
    Blue
    // Magenta colored text
    Magenta
    // Cyan colored text
    Cyan
    // White colored text
    White
    // Black colored text
    Black Color = -1
)

// ColorText holds a string and its color
type ColorText struct {
    TextColor Color
    Text      string
}

func (r *ColorText) String() string {
    if r.TextColor == ColorNone {
        return r.Text
    }
    value := 30
    if r.TextColor != Black {
        value += int(r.TextColor)
    }
    return fmt.Sprintf("\033[0;%dm%s\033[0m", value, r.Text)
}

Create a new directory named example.
Navigate to example and then create a file named main.go with the following content. Ensure that you modify the ansicolor import to use the path you set up in step 1:

package main

import (
    "fmt"

    "github.com/agtorre/go-cookbook/chapter2/ansicolor"
)

func main() {
    r := ansicolor.ColorText{ansicolor.Red, "I'm red!"}
    fmt.Println(r.String())

    r.TextColor = ansicolor.Green
    r.Text = "Now I'm green!"
    fmt.Println(r.String())

    r.TextColor = ansicolor.ColorNone
    r.Text = "Back to normal..."
    fmt.Println(r.String())
}

Run go run main.go. Alternatively, you may also run the following:

go build
./example

You should see the following, with the text colored if your terminal supports the ANSI coloring format:

$ go run main.go
I'm red!
Now I'm green!
Back to normal...

If you copied or wrote your own tests, go up one directory and run go test, and ensure that all the tests pass.

How it works...

This application makes use of a struct to maintain the state of the colored text. In this case, it stores the color of the text and the value of the text. The final string is rendered when you call the String() method, which will either return colored text or plain text, depending on the values stored in the struct. By default, the text will be plain.

Summary

In this article, we demonstrated basic ways to construct a complex command-line application using arguments, including a package-level config, required positional arguments, multi-leveled command usage, and how to split these things into multiple files or packages if needed. We saw how to work with Unix pipes and explored a basic coloring mechanism to color text red or keep it plain.

Resources for Article: Further resources on this subject: Building a Command-line Tool [article] A Command-line Companion Called Artisan [article] Scaffolding with the command-line tool [article]

Exposure to RxJava

Packt
06 Jul 2017
10 min read
In this article by Thomas Nield, the author of the book Learning RxJava, we will cover a quick exposure to RxJava, which is a Java VM implementation of ReactiveX (Reactive Extensions): a library for composing asynchronous and event-based programs by using observable sequences. (For more resources related to this topic, see here.) It is assumed you are fairly comfortable with Java and know how to use classes, interfaces, methods, properties, variables, static/nonstatic scopes, and collections. If you have not done concurrency or multithreading, that is okay. RxJava makes these advanced topics much more accessible. Have your favorite Java development environment ready, whether it is Intellij IDEA, Eclipse, NetBeans, or any other environment of your choosing. Recommended that you have a build automation system as well such as Gradle or Maven, which we will walk through shortly. History of ReactiveX and RxJava As developers, we tend to train ourselves to think in counter-intuitive ways. Modeling our world with code has never been short of challenges. It was not long ago that object-oriented programming was seen as the silver bullet to solve this problem. Making blueprints of what we interact with in real life was a revolutionary idea, and this core concept of classes and objects still impacts how we code today. However, business and user demands continued to grow in complexity. As 2010 approached, it became clear that object-oriented programming only solved part of the problem. Classes and objects do a great job representing an entity with properties and methods, but they become messy when they need to interact with each other in increasingly complex (and often unplanned) ways. Decoupling patterns and paradigms emerged, but this yielded an unwanted side effect of growing amounts of boilerplate code. In response to these problems, functional programming began to make a comeback not to replace object-oriented programming but rather complement it and fill this void. Reactive programming, a functional event-driven programming approach, began to receive special attention.A couple of reactive frameworks emerged ultimately, including Akka and Sodium. But at Microsoft, a computer scientist named Erik Meijer created a reactive programming framework for .NET called Reactive Extensions. In a matter of years, Reactive Extensions (also called ReactiveX or Rx) was ported to several languages and platforms, including JavaScript, Python, C++, Swift, and Java, of course. ReactiveX quickly emerged as a cross-language standard to bring reactive programming into the industry. RxJava, the ReactiveX port for Java, was created in large part by Ben Christensen from Netflix and David Karnok. RxJava 1.0 was released in November 2014, followed by RxJava 2.0 in November 2016. RxJava is the backbone to other ReactiveX JVM ports, such as RxScala, RxKotlin, and RxGroovy. It has become a core technology for Android development and has also found its way into Java backend development. Many RxJava adapter libraries, such as RxAndroid , RxJava-JDBC , RxNetty , and RxJavaFX adapted several Java frameworks to become reactive and work with RxJava out-of-the-box.This all shows that RxJava is more than a library. It is part of a greater ReactiveX ecosystem that represents an entire approach to programming. The fundamental idea of ReactiveX is that events are data and data are events. This is a powerful concept that we will explore, but first, let's step back and look at the world through the reactive lens. 
Thinking reactively Suspend everything you know about Java (and programming in general) for a moment, and let's make some observations about our world. These may sound like obvious statements, but as developers, we can easily overlook them. Bring your attention to the fact that everything is in motion. Traffic, weather, people, conversations, financial transactions, and so on are all moving. Technically, even something stationary as a rock is in motion due to the earth's rotation and orbit. When you consider the possibility that everything can be modeled as in motion, you may find it a bit overwhelming as a developer. Another observation to note is that these different events are happening concurrently. Multiple activities are happening at the same time. Sometimes, they act independently, but other times, they can converge at some point to interact. For instance, a car can drive with no impact on a person jogging. They are two separate streams of events. However, they may converge at some point and the car will stop when it encounters the jogger. If this is how our world works, why do we not model our code this way?. Why do we not model code as multiple concurrent streams of events or data happening at the same time? It is not uncommon for developers to spend more time managing the states of objects and doing it in an imperative and sequential manner. You may structure your code to execute Process 1, Process 2, and then Process 3, which depends on Process 1 and Process 2. Why not kick-off Process 1 and Process 2 simultaneously, and then the completion of these two events immediately kicks-off Process 3? Of course, you can use callbacks and Java concurrency tools, but RxJava makes this much easier and safer to express. Let's make one last observation. A book or music CD is static. A book is an unchanging sequence of words and a CD is a collection of tracks. There is nothing dynamic about them. However, when we read a book, we are reading each word one at a time. Those words are effectively put in motion as a stream being consumed by our eyes. It is no different with a music CD track, where each track is put in motion as sound waves and your ears are consuming each track. Static items can, in fact, be put in motion too. This is an abstract but powerful idea because we made each of these static items a series of events. When we level the playing field between data and events by treating them both the same, we unleash the power of functional programming and unlock abilities you previously might have thought impractical. The fundamental idea behind reactive programming is that events are data and data are events. This may seem abstract, but it really does not take long to grasp when you consider our real-world examples. The runner and car both have properties and states, but they are also in motion. The book and CD are put in motion when they are consumed. Merging the event and data to become one allows the code to feel organic and representative of the world we are modeling. Why should I learn RxJava?  ReactiveX and RxJava paints a broad stroke against many problems programmers face daily, allowing you to express business logic and spend less time engineering code. Have you ever struggled with concurrency, event handling, obsolete data states, and exception recovery? What about making your code more maintainable, reusable, and evolvable so it can keep up with your business? 
It might be presumptuous to call reactive programming a silver bullet for these problems, but it certainly is a progressive leap in addressing them. There is also growing user demand to make applications real time and responsive. Reactive programming allows you to quickly analyze and work with live data sources such as Twitter feeds or stock prices. It can also cancel and redirect work, scale with concurrency, and cope with rapidly emitting data. Composing events and data as streams that can be mixed, merged, filtered, split, and transformed opens up radically effective ways to compose and evolve code. In summary, reactive programming makes many hard tasks easy, enabling you to add value in ways you might have thought impractical earlier. If you have a process written reactively and you discover that you need to run part of it on a different thread, you can implement this change in a matter of seconds. If you find network connectivity issues crashing your application intermittently, you can gracefully use reactive recovery strategies that wait and try again. If you need to inject an operation in the middle of your process, it is as simple as inserting a new operator. Reactive programming is broken up into modular chain links that can be added or removed, which can help overcome all the aforementioned problems quickly. In essence, RxJava allows applications to be tactical and evolvable while maintaining stability in production.

A quick exposure to RxJava

Before we dive deep into the reactive world of RxJava, here is a quick exposure to get your feet wet first. In ReactiveX, the core type you will work with is the Observable. We will be learning more about the Observable later. But essentially, an Observable pushes things. A given Observable<T> pushes things of type T through a series of operators until it arrives at an Observer that consumes the items. For instance, create a new Launcher.java file in your project and put in the following code:

import io.reactivex.Observable;

public class Launcher {
    public static void main(String[] args) {
        Observable<String> myStrings =
            Observable.just("Alpha", "Beta", "Gamma", "Delta", "Epsilon");
    }
}

In our main() method, we have an Observable<String> that will push five string objects. An Observable can push data or events from virtually any source, whether it is a database query or live Twitter feeds. In this case, we are quickly creating an Observable using Observable.just(), which will emit a fixed set of items. However, running this main() method is not going to do anything other than declare Observable<String>. To make this Observable actually push these five strings (which are called emissions), we need an Observer to subscribe to it and receive the items. We can quickly create and connect an Observer by passing a lambda expression that specifies what to do with each string it receives:

import io.reactivex.Observable;

public class Launcher {
    public static void main(String[] args) {
        Observable<String> myStrings =
            Observable.just("Alpha", "Beta", "Gamma", "Delta", "Epsilon");
        myStrings.subscribe(s -> System.out.println(s));
    }
}

When we run this code, we should get the following output:

Alpha
Beta
Gamma
Delta
Epsilon

What happened here is that our Observable<String> pushed each string object one at a time to our Observer, which we shorthanded using the lambda expression s -> System.out.println(s). We pass each string through the parameter s (which I arbitrarily named) and instruct it to print each one.
Lambdas are essentially mini functions that allow us to quickly pass instructions on what action to take with each incoming item. Everything to the left of the arrow -> is the argument (which in this case is a string we named s), and everything to the right is the action (which is System.out.println(s)).

Summary

So in this article, we learned how to look at the world in a reactive way. As a developer, you may have to retrain yourself from a traditional imperative mindset and develop a reactive one. Especially if you have done imperative, object-oriented programming for a long time, this can be challenging. But the return on investment will be significant as your applications will become more maintainable, scalable, and evolvable. You will also have faster turnaround and more legible code. We also got a brief introduction to reactive code and how Observables work through push-based iteration. You will hopefully find reactive programming intuitive and easy to reason with. I hope you find that RxJava not only makes you more productive, but also helps you take on tasks you hesitated to do earlier. So let's get started!

Resources for Article: Further resources on this subject: Understanding the Basics of RxJava [article] Filtering a sequence [article] An Introduction to Reactive Programming [article]

Writing Your First Cucumber Appium Test

Packt
27 Jun 2017
12 min read
In this article, by Nishant Verma, author of the book Mobile Test Automation with Appium, you will learn about creating a new cucumber appium Java project in IntelliJ. Next, you will learn to write a sample feature and automate it, thereby learning how to start an appium server session with an app using the appium app, find locators using the appium inspector and write Java classes for each step implementation in the feature file. We will also discuss how to write a test for mobile web and use the Chrome Developer Tools to find the locators. Let's get started! In this article, we will discuss the following topics:

Create a sample Java project (using gradle)
Introduction to Cucumber
Writing first appium test
Starting appium server session and finding locators
Write a test for mobile web

(For more resources related to this topic, see here.)

Create a sample Java project (using gradle)

Let's create a sample appium Java project in IntelliJ. The below steps will help you do the same:

Launch IntelliJ and click Create New Project on the Welcome screen.
On the New Project screen, select Gradle from the left pane. Project SDK should get populated with the Java version. Click on Next, enter the GroupId as com.test and ArtifactId as HelloAppium. Version would already be populated. Click on Next.
Check the option Use Auto-Import and make sure Gradle JVM is populated. Click on Next.
The Project name field would be auto populated with what you gave as ArtifactId. Choose a project location and click on Finish.
IntelliJ would be running the background task (Gradle build), which can be seen in the status bar. We should have a project created with the default structure.
Open the build.gradle file. You may see a suggestion message; click on Ok, apply suggestion! Enter the below two lines in build.gradle. This would add appium and cucumber-jvm under dependencies:

compile group: 'info.cukes', name: 'cucumber-java', version: '1.2.5'
compile group: 'io.appium', name: 'java-client', version: '5.0.0-BETA6'

Below is how the gradle file should look:

group 'com.test'
version '1.0-SNAPSHOT'

apply plugin: 'java'

sourceCompatibility = 1.5

repositories {
    mavenCentral()
}

dependencies {
    testCompile group: 'junit', name: 'junit', version: '4.11'
    compile group: 'info.cukes', name: 'cucumber-java', version: '1.2.5'
    compile group: 'io.appium', name: 'java-client', version: '5.0.0-BETA6'
}

Once done, navigate to View -> Tools Window -> Gradle and click on the Refresh all gradle projects icon. This would pull all the dependencies into External Libraries.
Navigate to Preferences -> Plugins, search for Cucumber for Java and click on Install (if it's not previously installed).
Repeat the above step for Gherkin and install the same.
Once done, restart IntelliJ if it prompts you to.

We are now ready to write our first sample feature file, but before that let's try to understand a bit about cucumber.

Introduction to Cucumber

Cucumber is a test framework which supports behaviour driven development (BDD). The core idea behind BDD is a domain specific language (known as a DSL), where the tests are written in normal English, expressing how the application or system has to behave. A DSL is an executable test, which starts with a known state, performs some action and verifies the expected state. For example:

Feature: Registration with Facebook/Google

  Scenario: Registration Flow Validation via App
    As a user I should be able to see the Facebook/Google button
    when I try to register myself in Quikr.
    Given I launch the app
    When I click on Register
    Then I should see register with Facebook and Google

Dan North (the creator of BDD) defined behaviour-driven development in 2009 as follows: "BDD is a second-generation, outside-in, pull-based, multiple-stakeholder, multiple-scale, high-automation, agile methodology. It describes a cycle of interactions with well-defined outputs, resulting in the delivery of working, tested software that matters."

Cucumber feature files serve as living documentation and can be implemented in many languages. It was first implemented in Ruby and later extended to Java. Some of the basic concepts of Cucumber are:

The core of Cucumber is text files called features, which contain scenarios. These scenarios express the system or application behaviour. Feature files comprise steps, which are written following the syntax of Gherkin. Each step has a step implementation, which is the code behind that interacts with the application. So, in the above example, Feature, Scenario, Given, When, and Then are keywords.
Feature: Cucumber tests are grouped into features. We use this name because we want engineers to describe the features that a user will be able to use.
Scenario: A scenario expresses the behaviour we want; each feature contains several scenarios. Each scenario is an example of how the system should behave in a particular situation. The expected behaviour of the feature is the sum of its scenarios; for a feature to pass, all of its scenarios must pass.
Test Runner: There are different ways to run a feature file; however, we will use the JUnit runner initially and then move on to the Gradle command for command-line execution.

I hope we now have a brief idea of what Cucumber is. Further details can be read on their site (https://cucumber.io/). In the coming section, we will create a feature file, write a scenario, implement the code behind, and execute it.

Writing first appium test

Till now we have created a sample Java project and added the Appium dependency. Next, we need to add a Cucumber feature file and implement the code behind. Let's start:

Under the project folder, create the directory structure src/test/java/features.
Right-click on the features folder, select New -> File, and enter the name Sample.feature.
In the Sample.feature file, let's write a scenario as shown below, which is about logging in using Google:

Feature: Hello World
  Scenario: Registration Flow Validation via App
    As a user I should be able to see my google account
    when I try to register myself in Quikr
    When I launch Quikr app
    And I choose to log in using Google
    Then I see account picker screen with my email address "[email protected]"

Right-click on the java folder in IntelliJ, select New -> Package, and enter the name steps.
The next step is to implement the Cucumber steps. Click on the line When I launch Quikr app in the Sample.feature file and press Alt+Enter, then select the option Create step definition. It will present you with a pop-up to enter File name, File location, and File type. We need to enter the below values:

File name: HomePageSteps
File location: browse to the steps folder created above
File type: Java

The idea is that the steps belong to a page, and each page would typically have its own step implementation class. Once you click on OK, it creates a sample template in the HomePageSteps class file; the next step is to implement these methods and write the code behind to launch the Quikr app on the emulator.
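For reference, the generated step definition template might look roughly like the following sketch. The exact annotations and placeholder bodies depend on the IntelliJ Cucumber plugin version; only the package and class names come from the steps above, everything else is a plausible guess at what the plugin produces:

package steps;

import cucumber.api.PendingException;
import cucumber.api.java.en.And;
import cucumber.api.java.en.Then;
import cucumber.api.java.en.When;

public class HomePageSteps {

    @When("^I launch Quikr app$")
    public void iLaunchQuikrApp() throws Throwable {
        // Write code here that turns the phrase above into concrete actions
        throw new PendingException();
    }

    @And("^I choose to log in using Google$")
    public void iChooseToLogInUsingGoogle() throws Throwable {
        throw new PendingException();
    }

    @Then("^I see account picker screen with my email address \"([^\"]*)\"$")
    public void iSeeAccountPickerScreenWithMyEmailAddress(String emailAddress) throws Throwable {
        throw new PendingException();
    }
}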
Starting appium server session and finding locators

The first thing we need to do is download a sample app (the Quikr apk in this case):

Download the Quikr app (version 9.16).
Create a folder named app under the HelloAppium project and copy the downloaded apk into that folder.
Launch the Appium GUI app.
Launch the emulator or connect your device (assuming you have Developer Options enabled).
On the Appium GUI app, click on the Android icon and select the below options:

App Path - browse to the .apk location under the app folder
Platform Name - Android
Automation Name - Appium
Platform Version - select the version that matches the emulator from the dropdown (it also allows you to edit the value)
Device Name - enter any string, e.g. Nexus6

Once the above settings are done, click on the General Settings icon and choose the below-mentioned settings. Once the setup is done, click on the icon again to close the pop-up:

Select Prelaunch Application
Select Strict Capabilities
Select Override Existing Sessions
Select Kill Processes Using Server Port Before Launch
Select New Command Timeout and enter the value 7200

Click on Launch. This starts the Appium server session. Once you click on Appium Inspector, it will install the app on the emulator and launch it. If you click on the Record button, it will generate the boilerplate code, which has the Desired Capabilities respective to the run environment and the app location.

We can copy those lines and put them into the code template generated for the step When I launch Quikr app. This is how the code should look after copying it into the method:

@When("^I launch Quikr app$")
public void iLaunchQuikrApp() throws Throwable {
    DesiredCapabilities capabilities = new DesiredCapabilities();
    capabilities.setCapability("appium-version", "1.0");
    capabilities.setCapability("platformName", "Android");
    capabilities.setCapability("platformVersion", "5.1");
    capabilities.setCapability("deviceName", "Nexus6");
    capabilities.setCapability("app", "/Users/nishant/Development/HelloAppium/app/quikr.apk");
    AppiumDriver wd = new AppiumDriver(new URL("http://0.0.0.0:4723/wd/hub"), capabilities);
    wd.manage().timeouts().implicitlyWait(60, TimeUnit.SECONDS);
}

The above code only sets the Desired Capabilities; the Appium server is yet to be started. For now, we can start it from outside, in a terminal (or command prompt), by running the command appium. We can close the Appium inspector and stop the Appium server by clicking on Stop in the Appium GUI app.

To run the above test, we need to do the following:

Start the Appium server via the command line (command: appium --session-override).
In IntelliJ, right-click on the feature file and choose the option Run....

The scope of AppiumDriver is currently local to the method, so we can refactor and extract appiumDriver as a field. To continue automating the other steps, we can use the Appium inspector to find the element handles. We can launch the Appium inspector using the above-mentioned steps, then click on the element whose locator we want to find out, as shown in the below-mentioned screen. Once we have the locator, we can use the Appium API (as shown below) to click it:

appiumDriver.findElement(By.id("sign_in_button")).click();

This way we can implement the remaining steps; a rough sketch of the step class refactored in this way is shown below.
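The following sketch is one possible way to extract appiumDriver as a field shared across steps; it is not the book's exact implementation. The sign_in_button locator is the one found above with the Appium inspector, and the remaining assertion step is left as a placeholder:

package steps;

import cucumber.api.java.en.And;
import cucumber.api.java.en.When;
import io.appium.java_client.AppiumDriver;
import org.openqa.selenium.By;
import org.openqa.selenium.remote.DesiredCapabilities;

import java.net.URL;
import java.util.concurrent.TimeUnit;

public class HomePageSteps {

    // Shared across steps instead of being local to a single method
    private AppiumDriver appiumDriver;

    @When("^I launch Quikr app$")
    public void iLaunchQuikrApp() throws Throwable {
        DesiredCapabilities capabilities = new DesiredCapabilities();
        capabilities.setCapability("platformName", "Android");
        capabilities.setCapability("platformVersion", "5.1");
        capabilities.setCapability("deviceName", "Nexus6");
        capabilities.setCapability("app", "/Users/nishant/Development/HelloAppium/app/quikr.apk");
        appiumDriver = new AppiumDriver(new URL("http://0.0.0.0:4723/wd/hub"), capabilities);
        appiumDriver.manage().timeouts().implicitlyWait(60, TimeUnit.SECONDS);
    }

    @And("^I choose to log in using Google$")
    public void iChooseToLogInUsingGoogle() throws Throwable {
        // Locator found earlier with the Appium inspector
        appiumDriver.findElement(By.id("sign_in_button")).click();
    }

    // The "Then I see account picker screen..." step can be implemented in the
    // same way once its locator has been found with the inspector.
}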
Write a small test for mobile web

To automate a mobile web app, we don't need to install the app on the device. A browser and the app URL are sufficient to start the test automation. We can tweak the code written above by adding the desired capability browserName. We can write a similar scenario and make it mobile web specific:

Scenario: Registration Flow Validation via web
  As a user I want to verify that I get the option of choosing Facebook
  when I choose to register
  When I launch Quikr mobile web
  And I choose to register
  Then I should see an option to register using Facebook

The method for mobile web would look like this:

@When("^I launch Quikr mobile web$")
public void iLaunchQuikrMobileWeb() throws Throwable {
    DesiredCapabilities desiredCapabilities = new DesiredCapabilities();
    desiredCapabilities.setCapability("platformName", "Android");
    desiredCapabilities.setCapability("deviceName", "Nexus");
    desiredCapabilities.setCapability("browserName", "Browser");
    URL url = new URL("http://127.0.0.1:4723/wd/hub");
    appiumDriver = new AppiumDriver(url, desiredCapabilities);
    appiumDriver.get("http://m.quikr.com");
}

In the above code, we don't need platformVersion, but we do need a valid value for the browserName parameter. Possible values for browserName are:

Chrome - for the Chrome browser on Android
Safari - for the Safari browser on iOS
Browser - for the stock browser on Android

We can follow the same steps as above to run the test.

Finding locators in mobile web app

To implement the remaining steps of the above-mentioned feature, we need to find locators for the elements we want to interact with. Once the locators are found, we need to perform the desired operation, which could be click, send keys, and so on. Below mentioned are the steps that will help us find the locators for a mobile web app:

Launch the Chrome browser on your machine and navigate to the mobile site (in our case: http://m.quikr.com).
Select More Tools -> Developer Tools from the Chrome menu.
In the Developer Tools menu items, click on the Toggle device toolbar icon. Once done, the page would be displayed in a mobile layout format.
In order to find the locator of any UI element, click on the first icon of the dev tool bar and then click on the desired element. The HTML in the dev tool layout would change to highlight the selected element; refer to the below screenshot, which shows the same.

In the highlighted panel on the right side, we can see the properties name=query and id=query. We can choose to use the id and implement the step as:

appiumDriver.findElement(By.id("query")).click();

Using the above approach, we can find the locators of the various elements we need to interact with and proceed with our test automation.

Summary

In this article, we briefly described how we would go about writing tests for a native app as well as mobile web. We discussed how to create a project in IntelliJ and write a sample feature file. We also learned how to start the Appium inspector and look for locators. We learned about the Chrome dev tools and how we can use them to find locators for mobile web.

Resources for Article:

Further resources on this subject:

Appium Essentials [article]
Ensuring Five-star Rating in the MarketPlace [article]
Testing in Agile Development and the State of Agile Adoption [article]


Exploring Compilers

Packt
23 Jun 2017
17 min read
In this article by Gabriele Lanaro, author of the book Python High Performance - Second Edition, we will see that Python is a mature and widely used language, and there is a large interest in improving its performance by compiling functions and methods directly to machine code rather than executing instructions in the interpreter. In this article, we will explore two projects--Numba and PyPy--that approach compilation in a slightly different way. Numba is a library designed to compile small functions on the fly. Instead of transforming Python code to C, Numba analyzes and compiles Python functions directly to machine code. PyPy is a replacement interpreter that works by analyzing the code at runtime and optimizing the slow loops automatically.

(For more resources related to this topic, see here.)

Numba

Numba was started in 2012 by Travis Oliphant, the original author of NumPy, as a library for compiling individual Python functions at runtime using the Low-Level Virtual Machine (LLVM) toolchain. LLVM is a set of tools designed for writing compilers. LLVM is language agnostic and is used to write compilers for a wide range of languages (an important example is the clang compiler). One of the core aspects of LLVM is the intermediate representation (the LLVM IR), a very low-level, platform-agnostic language similar to assembly, which can be compiled to machine code for the specific target platform.

Numba works by inspecting Python functions and by compiling them, using LLVM, to the IR. As we have already seen in the last article, speed gains can be obtained when we introduce types for variables and functions. Numba implements clever algorithms to guess the types (this is called type inference) and compiles type-aware versions of the functions for fast execution.

Note that Numba was developed to improve the performance of numerical code. The development efforts often prioritize the optimization of applications that intensively use NumPy arrays. Numba is evolving really fast and can have substantial improvements between releases and, sometimes, backward-incompatible changes. To keep up, ensure that you refer to the release notes for each version. In the rest of this article, we will use Numba version 0.30.1; ensure that you install the correct version to avoid any errors. The complete code examples in this article can be found in the Numba.ipynb notebook.

First steps with Numba

Getting started with Numba is fairly straightforward. As a first example, we will implement a function that calculates the sum of squares of an array. The function definition is as follows:

def sum_sq(a):
    result = 0
    N = len(a)
    for i in range(N):
        result += a[i]**2
    return result

To set up this function with Numba, it is sufficient to apply the nb.jit decorator:

import numba as nb

@nb.jit
def sum_sq(a):
    ...

The nb.jit decorator won't do much when applied. However, when the function is invoked for the first time, Numba will detect the type of the input argument, a, and compile a specialized, performant version of the original function.

To measure the performance gain obtained by the Numba compiler, we can compare the timings of the original and the specialized functions. The original, undecorated function can be easily accessed through the py_func attribute.
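To try this outside of a notebook, a minimal self-contained script might look like the following sketch. The measurements below were taken with IPython's %timeit; here we use time.perf_counter instead, so the absolute numbers will differ from machine to machine:

import time

import numba as nb
import numpy as np


@nb.jit
def sum_sq(a):
    result = 0
    N = len(a)
    for i in range(N):
        result += a[i]**2
    return result


x = np.random.rand(10000)

sum_sq(x)  # the first call triggers compilation for float64 arrays

start = time.perf_counter()
sum_sq.py_func(x)  # original, interpreted version
print("interpreted:", time.perf_counter() - start)

start = time.perf_counter()
sum_sq(x)  # compiled version
print("compiled:", time.perf_counter() - start)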
The timings for the two functions are as follows:

import numpy as np

x = np.random.rand(10000)

# Original
%timeit sum_sq.py_func(x)
100 loops, best of 3: 6.11 ms per loop

# Numba
%timeit sum_sq(x)
100000 loops, best of 3: 11.7 µs per loop

You can see how the Numba version is orders of magnitude faster than the Python version. We can also compare how this implementation stacks up against the standard NumPy operators:

%timeit (x**2).sum()
10000 loops, best of 3: 14.8 µs per loop

In this case, the Numba-compiled function is marginally faster than the NumPy vectorized operations. The reason for the extra speed of the Numba version is likely that the NumPy version allocates an extra array before performing the sum, in contrast to the in-place operations performed by our sum_sq function.

As we didn't use array-specific methods in sum_sq, we can also try to apply the same function to a regular Python list of floating point numbers. Interestingly, Numba is able to obtain a substantial speedup even in this case, compared to a list comprehension:

x_list = x.tolist()
%timeit sum_sq(x_list)
1000 loops, best of 3: 199 µs per loop

%timeit sum([x**2 for x in x_list])
1000 loops, best of 3: 1.28 ms per loop

Considering that all we needed to do was apply a simple decorator to obtain an incredible speedup over different data types, it's no wonder that what Numba does looks like magic. In the following sections, we will dig deeper to understand how Numba works and evaluate the benefits and limitations of the Numba compiler.

Type specializations

As shown earlier, the nb.jit decorator works by compiling a specialized version of the function once it encounters a new argument type. To better understand how this works, we can inspect the decorated function in the sum_sq example. Numba exposes the specialized types through the signatures attribute. Right after the sum_sq definition, we can inspect the available specializations by accessing sum_sq.signatures, as follows:

sum_sq.signatures
# Output:
# []

If we call this function with a specific argument, for instance, an array of float64 numbers, we can see how Numba compiles a specialized version on the fly. If we also apply the function to an array of float32, we can see how a new entry is added to the sum_sq.signatures list:

x = np.random.rand(1000).astype('float64')
sum_sq(x)
sum_sq.signatures
# Result:
# [(array(float64, 1d, C),)]

x = np.random.rand(1000).astype('float32')
sum_sq(x)
sum_sq.signatures
# Result:
# [(array(float64, 1d, C),), (array(float32, 1d, C),)]

It is possible to explicitly compile the function for certain types by passing a signature to the nb.jit function. An individual signature can be passed as a tuple that contains the types we would like to accept. Numba provides a great variety of types that can be found in the nb.types module, and they are also available in the top-level nb namespace. If we want to specify an array of a specific type, we can use the slicing operator, [:], on the type itself. In the following example, we demonstrate how to declare a function that takes an array of float64 as its only argument:

@nb.jit((nb.float64[:],))
def sum_sq(a):
    ...

Note that when we explicitly declare a signature, we are prevented from using other types, as demonstrated in the following example. If we try to pass an array, x, as float32, Numba will raise a TypeError:

sum_sq(x.astype('float32'))
# TypeError: No matching definition for argument type(s) array(float32, 1d, C)

Another way to declare signatures is through type strings.
For example, a function that takes a float64 as input and returns a float64 as output can be declared with the "float64(float64)" string. Array types can be declared using the [:] suffix. To put this together, we can declare a signature for our sum_sq function, as follows:

@nb.jit("float64(float64[:])")
def sum_sq(a):
    ...

You can also pass multiple signatures by passing a list:

@nb.jit(["float64(float64[:])", "float64(float32[:])"])
def sum_sq(a):
    ...

Object mode versus native mode

So far, we have shown how Numba behaves when handling a fairly simple function. In this case, Numba worked exceptionally well, and we obtained great performance on arrays and lists. The degree of optimization obtainable from Numba depends on how well Numba is able to infer the variable types and how well it can translate those standard Python operations to fast type-specific versions. If this happens, the interpreter is side-stepped and we can get performance gains similar to those of Cython.

When Numba cannot infer variable types, it will still try to compile the code, reverting to the interpreter when the types can't be determined or when certain operations are unsupported. In Numba, this is called object mode and is in contrast to the interpreter-free scenario, called native mode.

Numba provides a function, called inspect_types, that helps us understand how effective the type inference was and which operations were optimized. As an example, we can take a look at the types inferred for our sum_sq function:

sum_sq.inspect_types()

When this function is called, Numba will print the types inferred for each specialized version of the function. The output consists of blocks that contain information about variables and the types associated with them. For example, we can examine the N = len(a) line:

# --- LINE 4 ---
#   a = arg(0, name=a)  :: array(float64, 1d, A)
#   $0.1 = global(len: <built-in function len>)  :: Function(<built-in function len>)
#   $0.3 = call $0.1(a)  :: (array(float64, 1d, A),) -> int64
#   N = $0.3  :: int64

N = len(a)

For each line, Numba prints a thorough description of variables, functions, and intermediate results. In the preceding example, you can see (second line) that the argument a is correctly identified as an array of float64 numbers. At LINE 4, the input and return types of the len function are also correctly identified (and likely optimized) as taking an array of float64 numbers and returning an int64. If you scroll through the output, you can see that all the variables have a well-defined type. Therefore, we can be certain that Numba is able to compile the code quite efficiently. This form of compilation is called native mode.

As a counterexample, we can see what happens if we write a function with unsupported operations. For example, as of version 0.30.1, Numba has limited support for string operations. We can implement a function that concatenates a series of strings, and compile it as follows:

@nb.jit
def concatenate(strings):
    result = ''
    for s in strings:
        result += s
    return result

Now, we can invoke this function with a list of strings and inspect the types:

concatenate(['hello', 'world'])
concatenate.signatures
# Output: [(reflected list(str),)]

concatenate.inspect_types()

Numba will return the output of the function for the reflected list(str) type. We can, for instance, examine how line 3 gets inferred.
The output of concatenate.inspect_types() is reproduced here:

# --- LINE 3 ---
#   strings = arg(0, name=strings)  :: pyobject
#   $const0.1 = const(str, )  :: pyobject
#   result = $const0.1  :: pyobject
#   jump 6
# label 6

result = ''

You can see how, this time, each variable or function is of the generic pyobject type rather than a specific one. This means that, in this case, Numba is unable to compile this operation without the help of the Python interpreter. Most importantly, if we time the original and compiled functions, we note that the compiled function is about three times slower than its pure Python counterpart:

x = ['hello'] * 1000

%timeit concatenate.py_func(x)
10000 loops, best of 3: 111 µs per loop

%timeit concatenate(x)
1000 loops, best of 3: 317 µs per loop

This is because the Numba compiler is not able to optimize the code and adds some extra overhead to the function call. As you may have noted, Numba compiled the code without complaints even though it is inefficient. The main reason for this is that Numba can still compile some sections of the code in an efficient manner while falling back to the Python interpreter for the other parts. This compilation strategy is called object mode.

It is possible to force the use of native mode by passing the nopython=True option to the nb.jit decorator. If, for example, we apply this decorator to our concatenate function, we observe that Numba throws an error on the first invocation:

@nb.jit(nopython=True)
def concatenate(strings):
    result = ''
    for s in strings:
        result += s
    return result

concatenate(x)
# Exception:
# TypingError: Failed at nopython (nopython frontend)

This feature is quite useful for debugging and for ensuring that all the code is fast and correctly typed.

Numba and NumPy

Numba was originally developed to easily increase the performance of code that uses NumPy arrays. Currently, many NumPy features are implemented efficiently by the compiler.

Universal functions with Numba

Universal functions are special functions defined in NumPy that are able to operate on arrays of different sizes and shapes according to the broadcasting rules. One of the best features of Numba is the implementation of fast ufuncs. We have already seen some ufunc examples in article 3, Fast Array Operations with NumPy and Pandas. For instance, the np.log function is a ufunc because it can accept scalars and arrays of different sizes and shapes. Universal functions that take multiple arguments also work according to the broadcasting rules; examples of such functions are np.add or np.subtract.

Universal functions can be defined in standard NumPy by implementing the scalar version and using the np.vectorize function to enhance the function with the broadcasting feature. As an example, we will see how to write the Cantor pairing function. A pairing function is a function that encodes two natural numbers into a single natural number so that you can easily interconvert between the two representations. The Cantor pairing function can be written as follows:

import numpy as np

def cantor(a, b):
    return int(0.5 * (a + b)*(a + b + 1) + b)

As already mentioned, it is possible to create a ufunc in pure Python using the np.vectorize decorator:

@np.vectorize
def cantor(a, b):
    return int(0.5 * (a + b)*(a + b + 1) + b)

cantor(np.array([1, 2]), 2)
# Result:
# array([ 8, 12])

Except for the convenience, defining universal functions in pure Python is not very useful, as it requires a lot of function calls affected by interpreter overhead. For this reason, ufunc implementation is usually done in C or Cython, but Numba beats all these methods by its convenience: all that is needed to perform the conversion is to use the equivalent decorator, nb.vectorize, as sketched below.
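The benchmark that follows times a Numba version named cantor against a pure Python version named cantor_py, without showing their definitions or the test arrays x1 and x2. A minimal sketch of how they might be set up is shown here, assuming Numba's lazy (signature-less) vectorize; the array sizes are illustrative assumptions:

import numba as nb
import numpy as np


@np.vectorize
def cantor_py(a, b):
    # Pure Python ufunc created with np.vectorize
    return int(0.5 * (a + b)*(a + b + 1) + b)


@nb.vectorize
def cantor(a, b):
    # Numba-compiled ufunc; the types are inferred on the first call
    return int(0.5 * (a + b)*(a + b + 1) + b)


x1 = np.random.randint(0, 100, 10000)
x2 = np.random.randint(0, 100, 10000)

# Both versions broadcast over the input arrays and agree on the result
assert (cantor(x1, x2) == cantor_py(x1, x2)).all()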
We can compare the speed of the np.vectorize version (called cantor_py in the following code), the Numba version, and the same computation implemented using standard NumPy operations:

# Pure Python
%timeit cantor_py(x1, x2)
100 loops, best of 3: 6.06 ms per loop

# Numba
%timeit cantor(x1, x2)
100000 loops, best of 3: 15 µs per loop

# NumPy
%timeit (0.5 * (x1 + x2)*(x1 + x2 + 1) + x2).astype(int)
10000 loops, best of 3: 57.1 µs per loop

You can see how the Numba version beats all the other options by a large margin! Numba works extremely well here because the function is simple and type inference is possible. An additional advantage of universal functions is that, since they operate on individual values, their evaluation can also be executed in parallel. Numba provides an easy way to parallelize such functions by passing the target="parallel" or target="cuda" keyword argument to the nb.vectorize decorator.

Generalized universal functions

One of the main limitations of universal functions is that they must be defined on scalar values. A generalized universal function, abbreviated gufunc, is an extension of universal functions to procedures that take arrays. A classic example is matrix multiplication. In NumPy, matrix multiplication can be applied using the np.matmul function, which takes two 2D arrays and returns another 2D array. An example usage of np.matmul is as follows:

a = np.random.rand(3, 3)
b = np.random.rand(3, 3)

c = np.matmul(a, b)
c.shape
# Result:
# (3, 3)

As we saw in the previous subsection, a ufunc broadcasts the operation over arrays of scalars; its natural generalization is to broadcast over arrays of arrays. If, for instance, we take two arrays of 3 by 3 matrices, we expect np.matmul to match the matrices pairwise and take their products. In the following example, we take two arrays containing 10 matrices of shape (3, 3). If we apply np.matmul, the product will be applied matrix-wise to obtain a new array containing the 10 results (which are, again, (3, 3) matrices):

a = np.random.rand(10, 3, 3)
b = np.random.rand(10, 3, 3)

c = np.matmul(a, b)
c.shape
# Output
# (10, 3, 3)

The usual rules for broadcasting work in a similar way. For example, if we have an array of ten (3, 3) matrices, which will have a shape of (10, 3, 3), we can use np.matmul to calculate the matrix multiplication of each element with a single (3, 3) matrix. According to the broadcasting rules, the single matrix will be repeated to obtain a size of (10, 3, 3):

a = np.random.rand(10, 3, 3)
b = np.random.rand(3, 3) # Broadcasted to shape (10, 3, 3)

c = np.matmul(a, b)
c.shape
# Result:
# (10, 3, 3)

Numba supports the implementation of efficient generalized universal functions through the nb.guvectorize decorator. As an example, we will implement a function that computes the euclidean distance between two arrays as a gufunc. To create a gufunc, we have to define a function that takes the input arrays, plus an output array where we will store the result of our calculation.
The nb.guvectorize decorator requires two arguments:

The types of the input and output: two 1D arrays as input and a scalar as output
The so-called layout string, which is a representation of the input and output sizes; in our case, we take two arrays of the same size (denoted arbitrarily by n), and we output a scalar

In the following example, we show the implementation of the euclidean function using the nb.guvectorize decorator:

@nb.guvectorize(['float64[:], float64[:], float64[:]'], '(n),(n)->()')
def euclidean(a, b, out):
    N = a.shape[0]
    out[0] = 0.0
    for i in range(N):
        out[0] += (a[i] - b[i])**2

There are a few very important points to be made. Predictably, we declared the types of the inputs a and b as float64[:], because they are 1D arrays. However, what about the output argument? Wasn't it supposed to be a scalar? Yes; however, Numba treats scalar arguments as arrays of size 1. That's why it was declared as float64[:]. Similarly, the layout string indicates that we have two arrays of size (n) and the output is a scalar, denoted by empty brackets--(). However, the array out will be passed as an array of size 1. Also, note that we don't return anything from the function; all the output has to be written to the out array.

The letter n in the layout string is completely arbitrary; you may choose to use k or other letters of your liking. Also, if you want to combine arrays of uneven sizes, you can use layout strings such as (n, m).

Our brand new euclidean function can be conveniently used on arrays of different shapes, as shown in the following example:

a = np.random.rand(2)
b = np.random.rand(2)
c = euclidean(a, b) # Shape: (1,)

a = np.random.rand(10, 2)
b = np.random.rand(10, 2)
c = euclidean(a, b) # Shape: (10,)

a = np.random.rand(10, 2)
b = np.random.rand(2)
c = euclidean(a, b) # Shape: (10,)

How does the speed of euclidean compare to standard NumPy? In the following code, we benchmark a NumPy vectorized version against our previously defined euclidean function:

a = np.random.rand(10000, 2)
b = np.random.rand(10000, 2)

%timeit ((a - b)**2).sum(axis=1)
1000 loops, best of 3: 288 µs per loop

%timeit euclidean(a, b)
10000 loops, best of 3: 35.6 µs per loop

The Numba version, again, beats the NumPy version by a large margin!

Summary

Numba is a tool that compiles fast, specialized versions of Python functions at runtime. In this article, we learned how to compile, inspect, and analyze functions compiled by Numba. We also learned how to implement fast NumPy universal functions that are useful in a wide array of numerical applications. Tools such as PyPy allow us to run Python programs unchanged to obtain significant speed improvements. We demonstrated how to set up PyPy, and we assessed the performance improvements on our particle simulator application.

Resources for Article:

Further resources on this subject:

Getting Started with Python Packages [article]
Python for Driving Hardware [article]
Python Data Science Up and Running [article]