How-To Tutorials - Programming

1081 Articles

Apache Karaf – Provisioning and Clusters

Packt
18 Jul 2014
12 min read
In this article, we will cover the following topics:

- What is OSGi and what are its key features?
- The role of the OSGi framework
- The OSGi base artifact (the OSGi bundle) and the concept of dependencies between bundles
- The Apache Karaf OSGi container and the provisioning of applications in the container
- How to manage provisioning on multiple Karaf instances

What is OSGi?

Developers are always looking for dynamic, flexible, and agile software components, for the following reasons:

- Reuse: Instead of duplicating code, a component should be shared by other components, and multiple versions of the same component should be able to cohabit.
- Visibility: A component should not use the implementation of another component directly. The implementation should be hidden, and the client module should use the interface provided by the other component.
- Agility: Deploying a new version of a component should not require a platform restart. Moreover, a configuration change should not require a restart. For instance, it's not acceptable to restart a production platform just to change a log level; a minor change such as a log level should be dynamic, and the platform should be agile enough to reload only the components that need to be reloaded.
- Discovery: A component should be able to discover other components. It's a kind of Plug and Play system: as soon as a component needs another component, it just looks for it and uses it.

OSGi was created to address these points. The core concept is to force developers to use a very modular architecture in order to reduce complexity. As this paradigm is applicable to most modern systems, OSGi is now used for small embedded devices as well as for very large systems. Different applications and systems use OSGi: desktop applications, application servers, frameworks, embedded devices, and so on.

The OSGi framework

OSGi is designed to run in Java. In order to provide these features and deploy OSGi applications, a core layer has to be deployed in the Java Virtual Machine (JVM): the OSGi framework. This framework manages the life cycle of, and the relationships between, the different OSGi components and artifacts.

The OSGi bundle

In OSGi, components are packaged as OSGi bundles. An OSGi bundle is a simple Java JAR (Java ARchive) file that contains additional metadata used by the OSGi framework. This metadata is stored in the manifest file of the JAR, for example:

```
Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-Version: 2.1.6
Bundle-Name: My Logger
Bundle-SymbolicName: my_logger
Export-Package: org.my.osgi.logger;version=2.1
Import-Package: org.apache.log4j;version="[1.2,2)"
Private-Package: org.my.osgi.logger.internal
```

We can see that OSGi is very descriptive and verbose: we explicitly describe all the OSGi metadata (headers), including the packages that we export or import with a specified version or version range. As the OSGi headers are defined in the META-INF/MANIFEST file contained in the JAR, an OSGi bundle is a regular JAR file that you can also use outside of OSGi. The life cycle layer of the OSGi framework is an API to install, start, stop, update, and uninstall OSGi bundles.

Dependency between bundles

An OSGi bundle can use other bundles from the OSGi framework in two ways. The first way is static code sharing.
When we say that a bundle exports packages, it means the bundle exposes some of its code to other bundles. Conversely, when we say that a bundle imports packages, it means the bundle uses code from other bundles. For instance, we have bundle A (packaged as the bundleA.jar file) with the following META-INF/MANIFEST file:

```
Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-Version: 1.0.0
Bundle-Name: Bundle A
Bundle-SymbolicName: bundle_a
Export-Package: com.bundle.a;version=1.0
```

We can see that bundle A exposes (exports) the com.bundle.a package with version 1.0. On the other hand, we have bundle B (packaged as the bundleB.jar file) with the following META-INF/MANIFEST file:

```
Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-Version: 2.0.0
Bundle-Name: Bundle B
Bundle-SymbolicName: bundle_b
Import-Package: com.bundle.a;version="[1.0,2)"
```

We can see that bundle B imports (and so will use) the com.bundle.a package in any version between 1.0 (inclusive) and 2 (exclusive). This means the OSGi framework will wire the two bundles, as bundle A provides the package required by bundle B (so the constraint is resolved). The mechanism is similar to regular Java applications, but instead of embedding the required JAR files in your application, you just declare the code you expect. The OSGi framework is responsible for the link between the different bundles; this is done by the modules layer of the OSGi framework.

This approach is interesting when you want to use code that was not natively designed for OSGi, and it's a step forward for the reuse of components. However, it provides only a limited answer to the goals described earlier in the article, especially visibility and discovery.

The second way in which an OSGi bundle can use other bundles is more interesting: it applies Service-Oriented Architecture (SOA) to low-level components. Here, rather than exposing code, an OSGi bundle exposes an OSGi service, and another bundle can consume that OSGi service. The services layer of the OSGi framework provides a service registry and all the plumbing mechanisms to wire the services. OSGi services provide a very dynamic system, offering a Publish-Find-Bind model for the bundles.

The OSGi container

The OSGi container provides a set of additional features on top of the OSGi framework. Apache Karaf provides the following features:

It provides an abstraction of the OSGi framework. If you write an OSGi application, you have to package your application tightly coupled with the OSGi framework (such as Apache Felix or Eclipse Equinox). Most of the time, you have to prepare scripts, configuration files, and so on in order to provide a complete, ready-to-use application. Apache Karaf allows you to focus only on your application: Karaf, by default, provides the packaging (including scripts and so on), and it also abstracts the OSGi framework. Thanks to Karaf, it's very easy to switch from Apache Felix (the default framework in Karaf) to Eclipse Equinox.

It provides support for the OSGi Blueprint and Spring frameworks. Apache Karaf allows you to directly use Blueprint or Spring as the dependency framework in your bundles. Newer versions of Karaf (starting from Karaf 3.0.1) also support additional dependency frameworks (such as DS, CDI, and so on).

Apache Karaf provides a complete, Unix-like shell console where you have a lot of commands available to manage and monitor your running container.
This shell console works on any system supporting Java and provides a complete Unix-like environment, including completion, contextual help, key bindings, and more. You can access the shell console using SSH. Apache Karaf also provides a complete management layer (using JMX) that is remotely accessible, which means you can perform the same actions as you do using the shell commands with several MBeans. In addition to the default root Apache Karaf container, for convenience, Apache Karaf allows you to manage multiple container instances. Apache Karaf provides dedicated commands and MBeans to create the instances, control the instances, and so on. Logging is a key layer for any kind of software container. Apache Karaf provides a powerful and very dynamic logging system powered by Pax Logging. In your OSGi application, you are not coupled to a specific logging framework; you can use the framework of your choice (slf4j, log4j, logback, commons-logging, and so on). Apache Karaf uses a central configuration file irrespective of the logging frameworks in use. All changes in this configuration file are made on the fly; no need to restart anything. Again, Apache Karaf provides commands and MBeans dedicated to log management (changing the log level, direct display of the log in the shell console, and so on). Hot deployment is also an interesting feature provided by Apache Karaf. By default, the container monitors a deploy folder periodically. When a new file is dropped in the deploy folder, Apache Karaf checks the file type and delegates the deployment logic for this file to a deployer. Apache Karaf provides different deployers by default (spring, blueprint, features, war, and so on). If Java Authentication and Authorization Service (JAAS) is the Java implementation of Pluggable Authentication Modules (PAM), it's not very OSGi compliant by default. Apache Karaf leverages JAAS, exposing realm and login modules as OSGi services. Again, Apache Karaf provides dedicated JAAS shell commands and MBeans. The security framework is very flexible, allowing you to define the chain of login modules that you want for authentication. By default, Apache Karaf uses a PropertiesLoginModule using the etc/users.properties file for storage. The security framework also provides support for password encryption (you just have to enable encryption in the etc/org.apache.karaf.jaas.cfg configuration file). The new Apache Karaf version (3.0.0) also provides a complete Role Based Access Control (RBAC) system, allowing you to configure the users who can run commands, call MBeans, and so on. Apache Karaf is an enterprise-ready container and provides features dedicated to enterprise. The following enterprise features are not installed by default (to minimize the size and footprint of the container by default), but a simple command allows you to extend the container with enterprise functionalities: WebContainer allows you to deploy a Web Application Bundle (WAB) or WAR file. Apache Karaf is a complete HTTP server with JSP/servlet support, thanks to Pax Web. Java Naming and Directory Interface (JNDI) adds naming context support in Apache Karaf. You can bind an OSGi service to a JNDI name and look up these services using the name, thanks to Aries and Xbean naming. Java Transaction API (JTA) allows you to add a transaction engine (exposed as an OSGi service) in Apache Karaf, thanks to Aries JTA. Java Persistence API (JPA) allows you to add a persistence adapter (exposed as an OSGi service) in Apache Karaf, thanks to Aries JPA. 
Ready-to-use persistence engines can also be installed very easily (especially Apache OpenJPA and Hibernate). Java Database Connectivity (JDBC) or Java Message Service (JMS) are convenient features, allowing you to easily create JDBC DataSources or JMS ConnectionFactories and use them directly in the shell console. If you can completely administrate Apache Karaf using the shell commands and the JMX MBeans, you can also install Web Console. This Web Console uses the Felix Web Console and allows you to manage Karaf with a simple browser. Thanks to these features, Apache Karaf is a complete, rich, and enterprise-ready container. We can consider Apache Karaf as an OSGi application server. Provisioning in Apache Karaf In addition, Apache Karaf provides three core functionalities that can be used both internally in Apache Karaf or can be used by external applications deployed in the container: OSGi bundle management Configuration management Provisioning using Karaf Features As we learned earlier, the default artifact in OSGi is the bundle. Again, it's a regular JAR file with additional OSGi metadata in the MANIFEST file. The bundles are directly managed by the OSGi framework, but for convenience, Apache Karaf wraps the usage of bundles in specific commands and MBeans. A bundle has a specific life cycle. Especially when you install a bundle, the OSGi framework tries to resolve all the dependencies required by your bundle to promote it in a resolved state. The following is the life cycle of a bundle: The OSGi framework checks whether other bundles provide the packages imported by your bundle. The equivalent action for the OSGi services is performed when you start your bundle. It means that a bundle may require a lot of other bundles to start and so on for the transitive bundles. Moreover, a bundle may require configuration to work. Apache Karaf proposes a very convenient way to manage the configurations. The etc folder is periodically monitored to discover new configuration files and load the corresponding configurations. On the other hand, you have dedicated shell commands and MBeans to manage configurations (and configuration files). If a bundle requires a configuration to work, you first have to create a configuration file in the etc folder (with the expected filename) or use the config:* shell command or ConfigMBean to create the configuration. Considering that an OSGi application is a set of bundles, the installation of an OSGi application can be long and painful by hand. The deployment of an OSGi application is called provisioning as it gathers the following: The installation of a set of bundles, including transitive bundles The installation of a set of configurations required by these bundles OBR OSGi Bundle Repository (OBR) can be the first option to be considered in order to solve this problem. Apache Karaf can connect to the OBR server. The OBR server stores all the metadata for all the bundles, which includes the capabilities, packages, and services provided by a bundle and the requirements, packages, and services needed by a bundle. When you install a bundle via OBR, the OBR server checks the requirement of the installed bundle and finds the bundles that provide the capabilities matching the requirements. The OBR server can automatically install the bundles required for the first one.
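Alongside OBR, the Karaf Features mechanism mentioned above addresses the same provisioning problem with a simple XML descriptor that groups the bundles and configurations of an application. The following is a rough, hypothetical illustration, not taken from the article; the Maven coordinates, feature name, and configuration PID are placeholders:

```xml
<!-- Hypothetical features descriptor; coordinates and PIDs are examples only -->
<features name="my-app-repo" xmlns="http://karaf.apache.org/xmlns/features/v1.2.0">
  <feature name="my-logger-app" version="2.1.6" description="Logger application">
    <!-- Configuration materialized when the feature is installed -->
    <config name="org.my.osgi.logger">
      logger.level = INFO
    </config>
    <!-- Transitive bundles are listed explicitly -->
    <bundle>mvn:log4j/log4j/1.2.17</bundle>
    <bundle>mvn:org.my.osgi/my_logger/2.1.6</bundle>
  </feature>
</features>
```

Once such a descriptor is registered (for example with feature:repo-add in Karaf 3.x), a single feature:install my-logger-app command provisions the bundles and their configuration in one step, which is exactly the convenience this section contrasts with installing bundles one by one.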

C10K – A Non-blocking Web Server in Go

Packt
18 Jul 2014
17 min read
This article by Nathan Kozyra, author of Mastering Concurrency in Go, tackles one of the Internet's most famous and esteemed challenges and attempt to solve it with core Go packages. (For more resources related to this topic, see here.) We've built a few usable applications; things we can start with and leapfrog into real systems for everyday use. By doing so, we've been able to demonstrate the basic and intermediate-level patterns involved in Go's concurrent syntax and methodology. However, it's about time we take on a real-world problem—one that has vexed developers (and their managers and VPs) for a great deal of the early history of the Web. In addressing and, hopefully, solving this problem, we'll be able to develop a high-performance web server that can handle a very large volume of live, active traffic. For many years, the solution to this problem was solely to throw hardware or intrusive caching systems at the problem; so, alternately, solving it with programming methodology should excite any programmer. We'll be using every technique and language construct we've learned so far, but we'll do so in a more structured and deliberate way than we have up to now. Everything we've explored so far will come into play, including the following points: Creating a visual representation of our concurrent application Utilizing goroutines to handle requests in a way that will scale Building robust channels to manage communication between goroutines and the loop that will manage them Profiling and benchmarking tools (JMeter, ab) to examine the way our event loop actually works Timeouts and concurrency controls—when necessary—to ensure data and request consistency Attacking the C10K problem The genesis of the C10K problem is rooted in serial, blocking programming, which makes it ideal to demonstrate the strength of concurrent programming, especially in Go. When he asked this in 1999, for many server admins and engineers, serving 10,000 concurrent visitors was something that would be solved with hardware. The notion that a single server on common hardware could handle this type of CPU and network bandwidth without falling over seemed foreign to most. The crux of his proposed solutions relied on producing non-blocking code. Of course, in 1999, concurrency patterns and libraries were not widespread. C++ had some polling and queuing options available via some third-party libraries and the earliest predecessor to multithreaded syntaxes, later available through Boost and then C++11. Over the coming years, solutions to the problem began pouring in across various flavors of languages, programming design, and general approaches. Any performance and scalability problem will ultimately be bound to the underlying hardware, so as always, your mileage may vary. Squeezing 10,000 concurrent connections on a 486 processor with 500 MB of RAM will certainly be more challenging than doing so on a barebones Linux server stacked with memory and multiple cores. It's also worth noting that a simple echo server would obviously be able to assume more cores than a functional web server that returns larger amounts of data and accepts greater complexity in requests, sessions, and so on, as we'll be dealing with here. Failing of servers at 10,000 concurrent connections When the Web was born and the Internet commercialized, the level of interactivity was pretty minimal. If you're a graybeard, you may recall the transition from NNTP/IRC and the like and how extraordinarily rudimentary the Web was. 
To address the basic proposition of [page request] → [HTTP response], the requirements on a web server in the early 1990s were pretty lenient. Ignoring all of the error responses, header reading, and settings, and other essential (but unrelated to the in → out mechanism) functions, the essence of the early servers was shockingly simple, at least compared to the modern web servers. The first web server was developed by the father of the Web, Tim Berners-Lee. Developed at CERN (such as WWW/HTTP itself), CERN httpd handled many of the things you would expect in a web server today—hunting through the code, you'll find a lot of notation that will remind you that the very core of the HTTP protocol is largely unchanged. Unlike most technologies, HTTP has had an extraordinarily long shelf life. Written in C in 1990, it was unable to utilize a lot of concurrency strategies available in languages such as Erlang. Frankly, doing so was probably unnecessary—the majority of web traffic was a matter of basic file retrieval and protocol. The meat and potatoes of a web server were not dealing with traffic, but rather dealing with the rules surrounding the protocol itself. You can still access the original CERN httpd site and download the source code for yourself from http://www.w3.org/Daemon/. I highly recommend that you do so as both a history lesson and a way to look at the way the earliest web server addressed some of the earliest problems. However, the Web in 1990 and the Web when the C10K question was first posed were two very different environments. By 1999, most sites had some level of secondary or tertiary latency provided by third-party software, CGI, databases, and so on, all of which further complicated the matter. The notion of serving 10,000 flat files concurrently is a challenge in itself, but try doing so by running them on top of a Perl script that accesses a MySQL database without any caching layer; this challenge is immediately exacerbated. By the mid 1990s, the Apache web server had taken hold and largely controlled the market (by 2009, it had become the first server software to serve more than 100 million websites). Apache's approach was rooted heavily in the earliest days of the Internet. At its launch, connections were initially handled first in, first out. Soon, each connection was assigned a thread from the thread pool. There are two problems with the Apache server. They are as follows: Blocking connections can lead to a domino effect, wherein one or more slowly resolved connections could avalanche into inaccessibility Apache had hard limits on the number of threads/workers you could utilize, irrespective of hardware constraints It's easy to see the opportunity here, at least in retrospect. A concurrent server that utilizes actors (Erlang), agents (Clojure), or goroutines (Go) seems to fit the bill perfectly. Concurrency does not solve the C10k problem in itself, but it absolutely provides a methodology to facilitate it. The most notable and visible example of an approach to the C10K problem today is Nginx, which was developed using concurrency patterns, widely available in C by 2002 to address—and ultimately solve—the C10k problem. Nginx, today, represents either the #2 or #3 web server in the world, depending on the source. Using concurrency to attack C10K There are two primary approaches to handle a large volume of concurrent requests. The first involves allocating threads per connection. This is what Apache (and a few others) do. 
On the one hand, allocating a thread to a connection makes a lot of sense—it's isolated, controllable via the application's and kernel's context switching, and can scale with increased hardware. One problem for Linux servers—on which the majority of the Web lives—is that each allocated thread reserves 8 MB of memory for its stack by default. This can (and should) be redefined, but this imposes a largely unattainable amount of memory required for a single server. Even if you set the default stack size to 1 MB, we're dealing with a minimum of 10 GB of memory just to handle the overhead. This is an extreme example that's unlikely to be a real issue for a couple of reasons: first, because you can dictate the maximum amount of resources available to each thread, and second, because you can just as easily load balance across a few servers and instances rather than add 10 GB to 80 GB of RAM. Even in a threaded server environment, we're fundamentally bound to the issue that can lead to performance decreases (to the point of a crash). First, let's look at a server with connections bound to threads (as shown in the following diagram), and visualize how this can lead to logjams and, eventually, crashes: This is obviously what we want to avoid. Any I/O, network, or external process that can impose some slowdown can bring about that avalanche effect we talked about, such that our available threads are taken (or backlogged) and incoming requests begin to stack up. We can spawn more threads in this model, but as mentioned earlier, there are potential risks there too, and even this will fail to mitigate the underlying problem. Taking another approach In an attempt to create our web server that can handle 10,000 concurrent connections, we'll obviously leverage our goroutine/channel mechanism to put an event loop in front of our content delivery to keep new channels recycled or created constantly. For this example, we'll assume we're building a corporate website and infrastructure for a rapidly expanding company. To do this, we'll need to be able to serve both static and dynamic content. The reason we want to introduce dynamic content is not just for the purposes of demonstration—we want to challenge ourselves to show 10,000 true concurrent connections even when a secondary process gets in the way. As always, we'll attempt to map our concurrency strategy directly to goroutines and channels. In a lot of other languages and applications, this is directly analogous to an event loop, and we'll approach it as such. Within our loop, we'll manage the available goroutines, expire or reuse completed ones, and spawn new ones where necessary. In this example visualization, we show how an event loop (and corresponding goroutines) can allow us to scale our connections without employing too many hard resources such as CPU threads or RAM: The most important step for us here is to manage that event loop. We'll want to create an open, infinite loop to manage the creation and expiration of our goroutines and respective channels. As part of this, we will also want to do some internal logging of what's happening, both for benchmarking and debugging our application. Building our C10K web server Our web server will be responsible for taking requests, routing them, and serving either flat files or dynamic files with templates parsed against a few different data sources. 
As mentioned earlier, if we exclusively serve flat files and remove much of the processing and network latency, we'd have a much easier time with handling 10,000 concurrent connections. Our goal is to approach as much of a real-world scenario as we can—very little of the Web operates on a single server in a static fashion. Most websites and applications utilize databases, CDNs (Content Delivery Networks), dynamic and uncached template parsing, and so on. We need to replicate them whenever possible. For the sake of simplicity, we'll separate our content by type and filter them through URL routing, as follows: /static/[request]: This will serve request.html directly /template/[request]: This will serve request.tpl after its been parsed through Go /dynamic/[request][number]: This will also serve request.tpl and parse it against a database source's record By doing this, we should get a better mixture of possible HTTP request types that could impede the ability to serve large numbers of users simultaneously, especially in a blocking web server environment. You can find Go's exceptional library to generate safe data-driven templating at http://golang.org/pkg/html/template/. By safe, we're largely referring to the ability to accept data and move it directly into templates without worrying about the sort of injection issues that are behind a large amount of malware and cross-site scripting. For the database source, we'll use MySQL here, but feel free to experiment with other databases if you're more comfortable with them. Like the html/template package, we're not going to put a lot of time into outlining MySQL and/or its variants. Benchmarking against a blocking web server It's only fair to add some starting benchmarks against a blocking web server first so that we can measure the effect of concurrent versus nonconcurrent architecture. For our starting benchmarks, we'll eschew any framework, and we'll go with our old stalwart, Apache. For the sake of completeness here, we'll be using an Intel i5 3GHz machine with 8 GB of RAM. While we'll benchmark our final product on Ubuntu, Windows, and OS X here, we'll focus on Ubuntu for our example. Our localhost domain will have three plain HTML files in /static, each trimmed to 80 KB. As we're not using a framework, we don't need to worry about raw dynamic requests, but only about static and dynamic requests in addition to data source requests. For all examples, we'll use a MySQL database (named master) with a table called articles that will contain 10,000 duplicate entries. Our structure is as follows: CREATE TABLE articles ( article_id INT NOT NULL AUTO_INCREMENT, article_title VARCHAR(128) NOT NULL, article_text VARCHAR(128) NOT NULL, PRIMARY KEY (article_id) ) With ID indexes ranging sequentially from 0-10,000, we'll be able to generate random number requests, but for now, we just want to see what kind of basic response we can get out of Apache serving static pages with this machine. For this test, we'll use Apache's ab tool and then gnuplot to sequentially map the request time as the number of concurrent requests and pages; we'll do this for our final product as well, but we'll also go through a few other benchmarking tools for it to get some better details.   Apache's AB comes with the Apache web server itself. You can read more about it at http://httpd.apache.org/docs/2.2/programs/ab.html. You can download it for Linux, Windows, OS X, and more from http://httpd.apache.org/download.cgi. 
The gnuplot utility is available for the same operating systems at http://www.gnuplot.info/. So, let's see how we did it. Have a look at the following graph: Ouch! Not even close. There are things we can do to tune the connections available (and respective threads/workers) within Apache, but this is not really our goal. Mostly, we want to know what happens with an out-of-the-box Apache server. In these benchmarks, we start to drop or refuse connections at around 800 concurrent connections. More troubling is that as these requests start stacking up, we see some that exceed 20 seconds or more. When this happens in a blocking server, each request behind it is queued; requests behind that are similarly queued and the entire thing starts to fall apart. Even if we cannot hit 10,000 concurrent connections, there's a lot of room for improvement. While a single server of any capacity is no longer the way we expect a web server environment to be designed, being able to squeeze as much performance as possible out of that server, ostensibly with our concurrent, event-driven approach, should be our goal. Handling requests The Gorilla toolkit certainly makes this easier, but we should also know how to intercept the functionality to impose our own custom handler. Here is a simple web router wherein we handle and direct requests using a custom http.Server struct, as shown in the following code: var routes []string type customRouter struct { } func (customRouter) ServeHTTP(rw http.ResponseWriter, r *http.Request) { fmt.Println(r.URL.Path); } func main() { var cr customRouter; server := &http.Server { Addr: ":9000", Handler:cr, ReadTimeout: 10 * time.Second, WriteTimeout: 10 * time.Second, MaxHeaderBytes: 1 << 20, } server.ListenAndServe() } Here, instead of using a built-in URL routing muxer and dispatcher, we're creating a custom server and custom handler type to accept URLs and route requests. This allows us to be a little more robust with our URL handling. In this case, we created a basic, empty struct called customRouter and passed it to our custom server creation call. We can add more elements to our customRouter type, but we really don't need to for this simple example. All we need to do is to be able to access the URLs and pass them along to a handler function. We'll have three: one for static content, one for dynamic content, and one for dynamic content from a database. Before we go so far though, we should probably see what our absolute barebones, HTTP server written in Go, does when presented with the same traffic that we sent Apache's way. By old school, we mean that the server will simply accept a request and pass along a static, flat file. You could do this using a custom router as we did earlier, taking requests, opening files, and then serving them, but Go provides a much simpler mode to handle this basic task in the http.FileServer method. So, to get some benchmarks for the most basic of Go servers against Apache, we'll utilize a simple FileServer and test it against a test.html page (which contains the same 80 KB file that we had with Apache). As our goal with this test is to improve our performance in serving flat and dynamic pages, the actual specs for the test suite are somewhat immaterial. We'd expect that while the metrics will not match from environment to environment, we should see a similar trajectory. That said, it's only fair we supply the environment used for these tests; in this case, we used a MacBook Air with a 1.4 GHz i5 processor and 4 GB of memory. 
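The text above relies on Go's http.FileServer for the out-of-the-box baseline, but the excerpt doesn't include that server. A minimal sketch of what such a baseline might look like follows; the ./static directory is an assumption, and port 8080 matches the ab command shown below:

```go
// Minimal baseline file server, assuming an ./static directory containing
// the 80 KB test page; a sketch, not the book's exact test harness.
package main

import (
	"log"
	"net/http"
)

func main() {
	// http.FileServer serves everything under ./static at the root path.
	http.Handle("/", http.FileServer(http.Dir("./static")))

	// Port 8080 matches the ab command used for the 10,000-connection test.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Pointing ab at this process is enough to reproduce the kind of untuned, out-of-the-box comparison the author describes.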
First, we'll do this with our absolute best performance out of the box with Apache, which had 850 concurrent connections and 900 total requests. The results are certainly encouraging as compared to Apache. Neither of our test systems were tweaked much (Apache as installed and basic FileServer in Go), but Go's FileServer handles 1,000 concurrent connections without so much as a blip, with the slowest clocking in at 411 ms. Apache has made a great number of strides pertaining to concurrency and performance options in the last five years, but to get there does require a bit of tuning and testing. The intent of this experiment is not intended to denigrate Apache, which is well tested and established. Instead, it's to compare the out-of-the-box performance of the world's number 1 web server against what we can do with Go. To really get a baseline of what we can achieve in Go, let's see if Go's FileServer can hit 10,000 connections on a single, modest machine out of the box: ab -n 10500 -c 10000 -g test.csv http://localhost:8080/a.html We will get the following output: Success! Go's FileServer by itself will easily handle 10,000 concurrent connections, serving flat, static content. Of course, this is not the goal of this particular project—we'll be implementing real-world obstacles such as template parsing and database access, but this alone should show you the kind of starting point that Go provides for anyone who needs a responsive server that can handle a large quantity of basic web traffic. Routing requests So, let's take a step back and look again at routing our traffic through a traditional web server to include not only our static content, but also the dynamic content. We'll want to create three functions that will route traffic from our customRouter:serveStatic():: read function and serve a flat file serveRendered():, parse a template to display serveDynamic():, connect to MySQL, apply data to a struct, and parse a template. To take our requests and reroute, we'll change the ServeHTTP method for our customRouter struct to handle three regular expressions. For the sake of brevity and clarity, we'll only be returning data on our three possible requests. Anything else will be ignored. In a real-world scenario, we can take this approach to aggressively and proactively reject connections for requests we think are invalid. This would include spiders and nefarious bots and processes, which offer no real value as nonusers.
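The closing paragraphs describe rewriting customRouter's ServeHTTP method around three regular expressions and three handlers (serveStatic, serveRendered, serveDynamic), but the excerpt ends before that code. The following is a rough sketch of the idea under the URL scheme given earlier; the patterns, placeholder handler bodies, and port are assumptions, not the author's implementation:

```go
// Rough sketch of the regex routing described above; patterns and handler
// bodies are illustrative assumptions, not the book's implementation.
package main

import (
	"fmt"
	"net/http"
	"regexp"
	"time"
)

var (
	staticPath   = regexp.MustCompile(`^/static/(.+)$`)
	templatePath = regexp.MustCompile(`^/template/(.+)$`)
	dynamicPath  = regexp.MustCompile(`^/dynamic/(.+)$`)
)

// serveStatic reads request.html from disk and serves it directly.
func serveStatic(rw http.ResponseWriter, r *http.Request, name string) {
	http.ServeFile(rw, r, "static/"+name+".html")
}

// serveRendered stands in for parsing request.tpl with html/template.
func serveRendered(rw http.ResponseWriter, r *http.Request, name string) {
	fmt.Fprintln(rw, "rendered template:", name)
}

// serveDynamic stands in for parsing request.tpl against a MySQL record.
func serveDynamic(rw http.ResponseWriter, r *http.Request, name string) {
	fmt.Fprintln(rw, "dynamic page:", name)
}

type customRouter struct{}

func (customRouter) ServeHTTP(rw http.ResponseWriter, r *http.Request) {
	path := r.URL.Path
	switch {
	case staticPath.MatchString(path):
		serveStatic(rw, r, staticPath.FindStringSubmatch(path)[1])
	case templatePath.MatchString(path):
		serveRendered(rw, r, templatePath.FindStringSubmatch(path)[1])
	case dynamicPath.MatchString(path):
		serveDynamic(rw, r, dynamicPath.FindStringSubmatch(path)[1])
	default:
		// Anything else is rejected outright, as discussed above.
		http.NotFound(rw, r)
	}
}

func main() {
	server := &http.Server{
		Addr:         ":9000",
		Handler:      customRouter{},
		ReadTimeout:  10 * time.Second,
		WriteTimeout: 10 * time.Second,
	}
	server.ListenAndServe()
}
```

A real version would sanitize the captured path before touching the filesystem and would plug in the html/template parsing and MySQL lookup discussed earlier.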


Progressive Mockito

Packt
14 Jul 2014
16 min read
(For more resources related to this topic, see here.) Drinking Mockito Download the latest Mockito binary from the following link and add it to the project dependency: http://code.google.com/p/mockito/downloads/list As of February 2014, the latest Mockito version is 1.9.5. Configuring Mockito To add Mockito JAR files to the project dependency, perform the following steps: Extract the JAR files into a folder. Launch Eclipse. Create an Eclipse project named Chapter04. Go to the Libraries tab in the project build path. Click on the Add External JARs... button and browse to the Mockito JAR folder. Select all JAR files and click on OK. We worked with Gradle and Maven and built a project with the JUnit dependency. In this section, we will add Mockito dependencies to our existing projects. The following code snippet will add a Mockito dependency to a Maven project and download the JAR file from the central Maven repository (http://mvnrepository.com/artifact/org.mockito/mockito-core): <dependency> <groupId>org.mockito</groupId> <artifactId>mockito-core</artifactId> <version>1.9.5</version> <scope>test</scope> </dependency> The following Gradle script snippet will add a Mockito dependency to a Gradle project: testCompile 'org.mockito:mockito-core:1.9.5' Mocking in action This section demonstrates the mock objects with a stock quote example. In the real world, people invest money on the stock market—they buy and sell stocks. A stock symbol is an abbreviation used to uniquely identify shares of a particular stock on a particular market, such as stocks of Facebook are registered on NASDAQ as FB and stocks of Apple as AAPL. We will build a stock broker simulation program. The program will watch the market statistics, and depending on the current market data, you can perform any of the following actions: Buy stocks Sell stocks Hold stocks The domain classes that will be used in the program are Stock, MarketWatcher, Portfolio, and StockBroker. Stock represents a real-world stock. It has a symbol, company name, and price. MarketWatcher looks up the stock market and returns the quote for the stock. A real implementation of a market watcher can be implemented from http://www.wikijava.org/wiki/Downloading_stock_market_quotes_from_Yahoo!_finance. Note that the real implementation will connect to the Internet and download the stock quote from a provider. Portfolio represents a user's stock data such as the number of stocks and price details. Portfolio exposes APIs for getting the average stock price and buying and selling stocks. Suppose on day one someone buys a share at a price of $10.00, and on day two, the customer buys the same share at a price of $8.00. So, on day two the person has two shares and the average price of the share is $9.00. The following screenshot represents the Eclipse project structure. You can download the project from the Packt Publishing website and work with the files: The following code snippet represents the StockBroker class. StockBroker collaborates with the MarketWatcher and Portfolio classes. 
The perform() method of StockBroker accepts a portfolio and a Stock object: public class StockBroker { private final static BigDecimal LIMIT = new BigDecimal("0.10"); private final MarketWatcher market; public StockBroker(MarketWatcher market) { this.market = market; } public void perform(Portfolio portfolio,Stock stock) { Stock liveStock = market.getQuote(stock.getSymbol()); BigDecimal avgPrice = portfolio.getAvgPrice(stock); BigDecimal priceGained = liveStock.getPrice().subtract(avgPrice); BigDecimal percentGain = priceGained.divide(avgPrice); if(percentGain.compareTo(LIMIT) > 0) { portfolio.sell(stock, 10); }else if(percentGain.compareTo(LIMIT) < 0){ portfolio.buy(stock); } } } Look at the perform method. It takes a portfolio object and a stock object, calls the getQuote method of MarketWatcher, and passes a stock symbol. Then, it gets the average stock price from portfolio and compares the current market price with the average stock price. If the current stock price is 10 percent greater than the average price, then the StockBroker program sells 10 stocks from Portfolio; however, if the current stock price goes down by 10 percent, then the program buys shares from the market to average out the loss. Why do we sell 10 stocks? This is just an example and 10 is just a number; this could be anything you want. StockBroker depends on Portfolio and MarketWatcher; a real implementation of Portfolio should interact with a database, and MarketWatcher needs to connect to the Internet. So, if we write a unit test for the broker, we need to execute the test with a database and an Internet connection. A database connection will take time and Internet connectivity depends on the Internet provider. So, the test execution will depend on external entities and will take a while to finish. This will violate the quick test execution principle. Also, the database state might not be the same across all test runs. This is also applicable for the Internet connection service. Each time the database might return different values, and therefore asserting a specific value in your unit test is very difficult. We'll use Mockito to mock the external dependencies and execute the test in isolation. So, the test will no longer be dependent on real external service, and therefore it will be executed quickly. Mocking objects A mock can be created with the help of a static mock() method as follows: import org.mockito.Mockito; public class StockBrokerTest { MarketWatcher marketWatcher = Mockito.mock(MarketWatcher.class); Portfolio portfolio = Mockito.mock(Portfolio.class); } Otherwise, you can use Java's static import feature and static import the mock method of the org.mockito.Mockito class as follows: import static org.mockito.Mockito.mock; public class StockBrokerTest { MarketWatcher marketWatcher = mock(MarketWatcher.class); Portfolio portfolio = mock(Portfolio.class); } There's another alternative; you can use the @Mock annotation as follows: import org.mockito.Mock; public class StockBrokerTest { @Mock MarketWatcher marketWatcher; @Mock Portfolio portfolio; } However, to work with the @Mock annotation, you are required to call MockitoAnnotations.initMocks( this ) before using the mocks, or use MockitoJUnitRunner as a JUnit runner. 
The following code snippet uses MockitoAnnotations to create mocks: import static org.junit.Assert.assertEquals; import org.mockito.MockitoAnnotations; public class StockBrokerTest { @Mock MarketWatcher marketWatcher; @Mock Portfolio portfolio; @Before public void setUp() { MockitoAnnotations.initMocks(this); } @Test public void sanity() throws Exception { assertNotNull(marketWatcher); assertNotNull(portfolio); } } The following code snippet uses the MockitoJUnitRunner JUnit runner: import org.mockito.runners.MockitoJUnitRunner; @RunWith(MockitoJUnitRunner.class) public class StockBrokerTest { @Mock MarketWatcher marketWatcher; @Mock Portfolio portfolio; @Test public void sanity() throws Exception { assertNotNull(marketWatcher); assertNotNull(portfolio); } } Before we deep dive into the Mockito world, there are a few things to remember. Mockito cannot mock or spy the following functions: final classes, final methods, enums, static methods, private methods, the hashCode() and equals() methods, anonymous classes, and primitive types. PowerMock (an extension of EasyMock) and PowerMockito (an extension of the Mockito framework) allows you to mock static and private methods; even PowerMockito allows you to set expectations on new invocations for private member classes, inner classes, and local or anonymous classes. However, as per the design, you should not opt for mocking private/static properties—it violates the encapsulation. Instead, you should refactor the offending code to make it testable. Change the Portfolio class, create the final class, and rerun the test; the test will fail as the Portfolio class is final, and Mockito cannot mock a final class. The following screenshot shows the JUnit output: Stubbing methods We read about stubs in ,Test Doubles. The stubbing process defines the behavior of a mock method such as the value to be returned or the exception to be thrown when the method is invoked. The Mockito framework supports stubbing and allows us to return a given value when a specific method is called. This can be done using Mockito.when() along with thenReturn (). The following is the syntax of importing when: import static org.mockito.Mockito.when; The following code snippet stubs the getQuote(String symbol) method of MarcketWatcher and returns a specific Stock object: import static org.mockito.Matchers.anyString; import static org.mockito.Mockito.when; @RunWith(MockitoJUnitRunner.class) public class StockBrokerTest { @Mock MarketWatcher marketWatcher; @Mock Portfolio portfolio; @Test public void marketWatcher_Returns_current_stock_status() { Stock uvsityCorp = new Stock("UV", "Uvsity Corporation", new BigDecimal("100.00")); when (marketWatcher.getQuote( anyString ())). thenReturn (uvsityCorp); assertNotNull(marketWatcher.getQuote("UV")); } } A uvsityCorp stock object is created with a stock price of $100.00 and the getQuote method is stubbed to return uvsityCorp whenever the getQuote method is called. Note that anyString() is passed to the getQuote method, which means whenever the getQuote method will be called with any String value, the uvsityCorp object will be returned. The when() method represents the trigger, that is, when to stub. The following methods are used to represent what to do when the trigger is triggered: thenReturn(x): This returns the x value. thenThrow(x): This throws an x exception. thenAnswer(Answer answer): Unlike returning a hardcoded value, a dynamic user-defined logic is executed. It's more like for fake test doubles, Answer is an interface. 
thenCallRealMethod(): This method calls the real method on the mock object. The following code snippet stubs the external dependencies and creates a test for the StockBroker class: import com.packt.trading.dto.Stock; import static org.junit.Assert.assertNotNull; import static org.mockito.Matchers.anyString; import static org.mockito.Matchers.isA; import static org.mockito.Mockito.verify; import static org.mockito.Mockito.when; @RunWith(MockitoJUnitRunner.class) public class StockBrokerTest { @Mock MarketWatcher marketWatcher; @Mock Portfolio portfolio; StockBroker broker; @Before public void setUp() { broker = new StockBroker(marketWatcher); } @Test public void when_ten_percent_gain_then_the_stock_is_sold() { //Portfolio's getAvgPrice is stubbed to return $10.00 when(portfolio.getAvgPrice(isA(Stock.class))). thenReturn(new BigDecimal("10.00")); //A stock object is created with current price $11.20 Stock aCorp = new Stock("A", "A Corp", new BigDecimal("11.20")); //getQuote method is stubbed to return the stock when(marketWatcher.getQuote(anyString())).thenReturn(aCorp); //perform method is called, as the stock price increases // by 12% the broker should sell the stocks broker.perform(portfolio, aCorp); //verifying that the broker sold the stocks verify(portfolio).sell(aCorp,10); } } The test method name is when_ten_percent_gain_then_the_stock_is_sold; a test name should explain the intention of the test. We use underscores to make the test name readable. We will use the when_<<something happens>>_then_<<the action is taken>> convention for the tests. In the preceding test example, the getAvgPrice() method of portfolio is stubbed to return $10.00, then the getQuote method is stubbed to return a hardcoded stock object with a current stock price of $11.20. The broker logic should sell the stock as the stock price goes up by 12 percent. The portfolio object is a mock object. So, unless we stub a method, by default, all the methods of portfolio are autostubbed to return a default value, and for the void methods, no action is performed. The sell method is a void method; so, instead of connecting to a database to update the stock count, the autostub will do nothing. However, how will we test whether the sell method was invoked? We use Mockito.verify. The verify() method is a static method, which is used to verify the method invocation. If the method is not invoked, or the argument doesn't match, then the verify method will raise an error to indicate that the code logic has issues. Verifying the method invocation To verify a redundant method invocation, or to verify whether a stubbed method was not called but was important from the test perspective, we should manually verify the invocation; for this, we need to use the static verify method. Why do we use verify? Mock objects are used to stub external dependencies. We set an expectation, and a mock object returns an expected value. In some conditions, a behavior or method of a mock object should not be invoked, or sometimes, we may need to call the method N (a number) times. The verify method verifies the invocation of mock objects. Mockito does not automatically verify all stubbed calls. If a stubbed behavior should not be called but the method is called due to a bug in the code, verify flags the error though we have to verify that manually. The void methods don't return values, so you cannot assert the returned values. Hence, verify is very handy to test the void methods. 
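Before looking at verification in more depth, here is a minimal sketch (not from the book) of two stubbing actions listed earlier, thenThrow() and thenAnswer(), which the excerpt names but does not demonstrate. It reuses the MarketWatcher and Stock classes from the example above:

```java
// Illustrative stubbing sketch reusing the article's MarketWatcher and Stock types.
import static org.mockito.Matchers.anyString;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.math.BigDecimal;

import org.mockito.invocation.InvocationOnMock;
import org.mockito.stubbing.Answer;

public class StubbingActionsSketch {
    public static void main(String[] args) {
        MarketWatcher marketWatcher = mock(MarketWatcher.class);

        // thenAnswer: build the returned Stock dynamically from the actual argument.
        when(marketWatcher.getQuote(anyString())).thenAnswer(new Answer<Stock>() {
            public Stock answer(InvocationOnMock invocation) {
                String symbol = (String) invocation.getArguments()[0];
                return new Stock(symbol, symbol + " Corp", new BigDecimal("42.00"));
            }
        });

        // thenThrow: simulate an unreachable quote provider for one symbol.
        when(marketWatcher.getQuote("DOWN"))
            .thenThrow(new RuntimeException("quote service unavailable"));

        System.out.println(marketWatcher.getQuote("UV").getPrice()); // prints 42.00
    }
}
```

As with thenReturn(), the last stubbing whose matchers fit the actual arguments wins, which is why the more specific "DOWN" stubbing is declared after the anyString() one.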
Verifying in depth The verify() method has an overloaded version that takes Times as an argument. Times is a Mockito framework class of the org.mockito.internal.verification package, and it takes wantedNumberOfInvocations as an integer argument. If 0 is passed to Times, it infers that the method will not be invoked in the testing path. We can pass 0 to Times(0) to make sure that the sell or buy methods are not invoked. If a negative number is passed to the Times constructor, Mockito throws MockitoException - org.mockito.exceptions.base.MockitoException, and this shows the Negative value is not allowed here error. The following methods are used in conjunction with verify: times(int wantedNumberOfInvocations): This method is invoked exactly n times; if the method is not invoked wantedNumberOfInvocations times, then the test fails. never(): This method signifies that the stubbed method is never called or you can use times(0) to represent the same scenario. If the stubbed method is invoked at least once, then the test fails. atLeastOnce(): This method is invoked at least once, and it works fine if it is invoked multiple times. However, the operation fails if the method is not invoked. atLeast(int minNumberOfInvocations): This method is called at least n times, and it works fine if the method is invoked more than the minNumberOfInvocations times. However, the operation fails if the method is not called minNumberOfInvocations times. atMost(int maxNumberOfInvocations): This method is called at the most n times. However, the operation fails if the method is called more than minNumberOfInvocations times. only(): The only method called on a mock fails if any other method is called on the mock object. In our example, if we use verify(portfolio, only()).sell(aCorp,10);, the test will fail with the following output: The test fails in line 15 as portfolio.getAvgPrice(stock) is called. timeout(int millis): This method is interacted in a specified time range. Verifying zero and no more interactions The verifyZeroInteractions(Object... mocks) method verifies whether no interactions happened on the given mocks. The following test code directly calls verifyZeroInteractions and passes the two mock objects. Since no methods are invoked on the mock objects, the test passes: @Test public void verify_zero_interaction() { verifyZeroInteractions(marketWatcher,portfolio); } The verifyNoMoreInteractions(Object... mocks) method checks whether any of the given mocks has any unverified interaction. We can use this method after verifying a mock method to make sure that nothing else was invoked on the mock. The following test code demonstrates verifyNoMoreInteractions: @Test public void verify_no_more_interaction() { Stock noStock = null; portfolio.getAvgPrice(noStock); portfolio.sell(null, 0); verify(portfolio).getAvgPrice(eq(noStock)); //this will fail as the sell method was invoked verifyNoMoreInteractions(portfolio); } The following is the JUnit output: The following are the rationales and examples of argument matchers. Using argument matcher ArgumentMatcher is a Hamcrest matcher with a predefined describeTo() method. ArgumentMatcher extends the org.hamcrest.BaseMatcher package. It verifies the indirect inputs into a mocked dependency. The Matchers.argThat(Matcher) method is used in conjunction with the verify method to verify whether a method is invoked with a specific argument value. ArgumentMatcher plays a key role in mocking. The following section describes the context of ArgumentMatcher. 
Mock objects return expected values, but when they need to return different values for different arguments, argument matcher comes into play. Suppose we have a method that takes a player name as input and returns the total number of runs (a run is a point scored in a cricket match) scored as output. We want to stub it and return 100 for Sachin and 10 for xyz. We have to use argument matcher to stub this. Mockito returns expected values when a method is stubbed. If the method takes arguments, the argument must match during the execution; for example, the getValue(int someValue) method is stubbed in the following way: when(mockObject.getValue(1)).thenReturn(expected value); Here, the getValue method is called with mockObject.getValue(100). Then, the parameter doesn't match (it is expected that the method will be called with 1, but at runtime, it encounters 100), so the mock object fails to return the expected value. It will return the default value of the return type—if the return type is Boolean, it'll return false; if the return type is object, then null, and so on. Mockito verifies argument values in natural Java style by using an equals() method. Sometimes, we use argument matchers when extra flexibility is required. Mockito provides built-in matchers such as anyInt(), anyDouble(), anyString(), anyList(), and anyCollection(). More built-in matchers and examples of custom argument matchers or Hamcrest matchers can be found at the following link: http://docs.mockito.googlecode.com/hg/latest/org/mockito/Matchers.html Examples of other matchers are isA(java.lang.Class<T> clazz), any(java.lang.Class<T> clazz), and eq(T) or eq(primitive value). The isA argument checks whether the passed object is an instance of the class type passed in the isA argument. The any(T) argument also works in the same way. Why do we need wildcard matchers? Wildcard matchers are used to verify the indirect inputs to the mocked dependencies. The following example describes the context. In the following code snippet, an object is passed to a method and then a request object is created and passed to service. Now, from a test, if we call the someMethod method and service is a mocked object, then from test, we cannot stub callMethod with a specific request as the request object is local to the someMethod: public void someMethod(Object obj){ Request req = new Request(); req.setValue(obj); Response resp = service.callMethod(req); } If we are using argument matchers, all arguments have to be provided by matchers. We're passing three arguments and all of them are passed using matchers: verify(mock).someMethod(anyInt(), anyString(), eq("third argument")); The following example will fail because the first and the third arguments are not passed using matcher: verify(mock).someMethod(1, anyString(), "third argument");
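The excerpt points to argThat() and custom matchers but stops before showing one. As a hedged illustration (not from the book), a custom ArgumentMatcher for the Stock type used throughout this article could be combined with verify() like this; the matcher class and scenario are hypothetical:

```java
// Hypothetical custom ArgumentMatcher for the article's Stock/Portfolio types.
import static org.mockito.Matchers.argThat;
import static org.mockito.Matchers.eq;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

import java.math.BigDecimal;

import org.mockito.ArgumentMatcher;

public class ArgumentMatcherSketch {

    // Matches any Stock whose symbol equals the expected one.
    static class StockWithSymbol extends ArgumentMatcher<Stock> {
        private final String symbol;
        StockWithSymbol(String symbol) { this.symbol = symbol; }

        @Override
        public boolean matches(Object argument) {
            return argument instanceof Stock
                && symbol.equals(((Stock) argument).getSymbol());
        }
    }

    public static void main(String[] args) {
        Portfolio portfolio = mock(Portfolio.class);
        portfolio.sell(new Stock("A", "A Corp", new BigDecimal("11.20")), 10);

        // Passes only if sell() was invoked with a Stock whose symbol is "A" and a count of 10.
        verify(portfolio).sell(argThat(new StockWithSymbol("A")), eq(10));
    }
}
```

Because one argument uses a matcher, every argument must, which is why the literal 10 is wrapped in eq(10), mirroring the rule stated above.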


Part 2: Deploying Multiple Applications with Capistrano from a Single Project

Rodrigo Rosenfeld
01 Jul 2014
8 min read
In part 1, we covered Capistrano and why you would use it. We also covered mixins, which provide the base for what we will do in this post: deploying a sample project with Capistrano. For this project, suppose our user interface is a combination of two applications, app1 and app2. They should be deployed to two servers, do and ec2, and we'll provide two environments, production and cert. Make sure Ruby and Bundler are installed before you start.

First, we create a new directory for our project, add a Gemfile to it with capistrano as a dependency, and then create the Capistrano directory structure:

```
mkdir capsample
cd capsample
bundle init
echo "gem 'capistrano'" >> Gemfile
bundle
bundle exec cap install STAGES="do_prod_app1,do_prod_app2,do_cert_app1,do_cert_app2,ec2_prod_app1,ec2_prod_app2,ec2_cert_app1,ec2_cert_app2"
```

This will create a file under config/deploy for each server/environment/application combination listed in STAGES. This is just to demonstrate the idea; we'll completely override their content later on. It will also create a Capfile, which works in a similar way to a regular Rakefile. With Rake, you can get a list of the available tasks with rake -T. With Capistrano you can get the same using:

```
bundle exec cap -T
```

Behind the scenes, cap is a binary distributed with the capistrano gem that runs Rake with Capfile set as the Rakefile, supporting a few extra options like --roles.

Now create a new file, lib/mixin.rb, with the content mentioned in the "Using mixins" section in part 1. Then add this to the top of the Capfile:

```ruby
$:.unshift File.dirname(__FILE__)
require 'lib/mixin'
```

Each of the files under config/deploy will look very similar to the others. For instance, ec2_prod_app1 would look like this:

```ruby
mixin 'servers/ec2'
mixin 'environments/production'
mixin 'applications/app1'
```

Then config/mixins/servers/ec2.rb would look like this:

```ruby
server 'ec2.mydomain.com', roles: [:main]
set :database_host, 'ec2-db.mydomain.com'
```

This file contains definitions that are valid (or are defaults) for the whole server, no matter which environment or application we're deploying. In this example, the database host is shared by all applications and environments hosted on our ec2 server.

Something to note here is that we're adding a single role named main to our server. If we specified all roles, like [:web, :db, :assets, :puma], they would be shared with every recipe relying on this server mixin, so a better approach is to add them in the application's recipe if required. For instance, you might want to add something like set :server_name, 'ec2.mydomain.com' to your server definitions. Then you can dynamically set the role in the application's recipe by calling role :db, [fetch(:server_name)], and so on for all required roles. However, this is usually not necessary for third-party recipes, as they let you decide which role the recipe should act on. For example, if you want to deploy your application with Puma, you can write set :puma_role, :main.

Before we discuss a full example of an application recipe, let's look at what config/mixins/environments/production.rb might look like:

```ruby
set :branch, 'production'
set :encoding_key, '098f6bcd4621d373cade4e832627b4f6'
set :database_name, 'app_production'
set :app1_port, 3000
set :app2_port, 3001
set :redis_port, 6379
set :solr_port, 8080
```

In this example, we're assuming that the ports for app1, app2, Redis, and Solr, as well as the database name, will be the same for production on all servers.
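The post only shows the production environment mixin; the cert environment mentioned at the beginning would follow the same pattern. Here is a hypothetical sketch, with every value a placeholder rather than something from the original post:

```ruby
# config/mixins/environments/cert.rb -- hypothetical counterpart to production.rb
set :branch, 'cert'
set :encoding_key, 'replace-with-the-cert-encoding-key'
set :database_name, 'app_cert'
set :app1_port, 3100
set :app2_port, 3101
set :redis_port, 6380
set :solr_port, 8081
```

Because each stage file under config/deploy just composes three mixins, adding a new environment is a matter of writing one file like this and referencing it from the relevant stages.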
Finally, the recipes themselves, which tell Capistrano how to set up an application, will be defined by config/mixins/applications/app1.rb. Here's an example for a simple Rails application:

Rake::Task['load:defaults'].invoke
Rake::Task['load:defaults'].clear
require 'capistrano/rails'
require 'capistrano/puma'
Rake::Task['load:defaults'].reenable
Rake::Task['load:defaults'].invoke

set :application, 'app1'
set :repo_url, 'git@github.com:me/app1.git'
set :rails_env, 'production'
set :assets_roles, :main
set :migration_role, :main
set :puma_role, :main
set :puma_bind, "tcp://0.0.0.0:#{fetch :app1_port}"

namespace :rails do
  desc 'Generate settings file'
  task :generate_settings do
    on roles(:all) do
      template = "config/templates/database.yml.erb"
      dbconfig = StringIO.new(ERB.new(File.read template).result binding)
      upload! dbconfig, release_path.join('config', 'database.yml')
    end
  end
end

before 'deploy:migrate', 'rails:generate_settings'

# Create directories expected by Puma default settings:
before 'puma:restart', 'create_log_and_tmp' do
  on roles(:all) do
    within shared_path do
      execute :mkdir, '-p', 'log', 'tmp/pids'
    end
  end
end

Make sure you remove the lines that set application and repo_url in the config/deploy.rb file generated by cap install. Also, if you're deploying a Rails application using this recipe, you should add the capistrano-rails and capistrano3-puma gems to your Gemfile and run bundle again. In case you're running rbenv or rvm to install Ruby on the server, make sure you include either the capistrano-rbenv or capistrano-rvm gem and require it in the recipe. You may also need to provide more information in this case. For rbenv, you'd need to tell it which version to use, with set :rbenv_ruby, '2.1.2' for example.

Sometimes you'll find that some settings are valid for all applications under all environments on all servers. The most important one to notice is the location of our applications, as they must not conflict with each other. Another setting that could be shared across all combinations is the private key used to connect to all servers. For such cases, you should add those settings directly to config/deploy.rb:

set :deploy_to, -> { "/home/vagrant/apps/#{fetch :environment}/#{fetch :application}" }
set :ssh_options, { keys: %w(~/.vagrant.d/insecure_private_key) }

I strongly recommend connecting to your servers with a regular account rather than root. For our applications we use rbenv to manage our Ruby versions, so we're able to deploy them as regular users as long as our applications listen on high port numbers. We then set up our proxy server (nginx in our case) to forward requests on ports 80 and 443 to each application's port according to the requested domains and paths. This is set up by some Chef recipes, which run as root on our servers. To connect using another user, just pass it in the server declaration. To connect as vagrant@192.168.33.10, this is how you'd set it up:

server '192.168.33.10', user: 'vagrant', roles: [:main]
set :ssh_options, { keys: %w(~/.vagrant.d/insecure_private_key) }

Finally, we create a config/database.yml that's suited for our environment on demand, before running the migrations task.
Here's what the template config/templates/database.yml.erb could look like:

production:
  adapter: postgresql
  encoding: unicode
  pool: 30
  database: <%= fetch :database_name %>
  host: <%= fetch :database_host %>

I've omitted the settings for app2, but if it were another Rails application, we could extract the common logic between them into a common_rails mixin. Also notice that because we're not requiring capistrano/rails and capistrano/puma in the Capfile, their default values won't be set, as Capistrano has already invoked the load:defaults task before our mixins are loaded. That's why we clear that task, require the recipes, and then re-enable and re-run the task so that the defaults for those recipes have the opportunity to load. Another approach is to require those recipes directly in the Capfile. But unless the recipes are carefully crafted to only run their commands for very specific roles, you can get unexpected behavior if you deploy one application with Rails, another one with Grails, and yet another with NodeJS. If any of them has commands that run for all roles, or if the role names between them conflict somehow, you'd be in trouble. So, unless you have total control and understanding of all your third-party recipes, I'd recommend that you use the approach outlined in the examples above.

Conclusion

All the techniques presented here are used to manage our real, complex scenario at e-Core, where we support multiple applications in lots of environments that are replicated across three servers. We found that this allowed us to quickly add new environments or servers as needed and to recreate our application in no time. Also, I'd like to thank Juan Ibiapina, who worked with me on all these recipes to ensure our deployment procedures are fully automated; well, almost. We still manage our databases and documents manually because we prefer to.

About the author

Rodrigo Rosenfeld Rosas lives in Vitória-ES, Brazil, with his lovely wife and daughter. He graduated in Electrical Engineering with a Master's degree in Robotics and Real-time Systems. For the past five years Rodrigo has focused on building and maintaining single page web applications. He is the author of some gems including active_record_migrations, rails-web-console, the JS specs runner oojspec, sequel-devise, and the Linux X11 utility ktrayshortcut. Rodrigo was hired by e-Core (Porto Alegre-RS, Brazil) to work from home, building and maintaining software for Matterhorn Transactions Inc. with a team of great developers. Matterhorn's main product, the Market Tracker, is used by LexisNexis clients.

Part 1: Deploying Multiple Applications with Capistrano from a Single Project

Rodrigo Rosenfeld
01 Jul 2014
9 min read
Capistrano is a deployment tool written in Ruby that is able to deploy projects using any language or framework, through a set of recipes, which are also written in Ruby. Capistrano expects an application to have a single repository, and it is able to run arbitrary commands on the server through a non-interactive SSH session.

Capistrano was designed assuming that an application is completely described by a single repository with all code belonging to it. For example, your web application is written with Ruby on Rails, and simply serving that application would be enough. But what if you decide to use a separate application for managing your users, in a separate language and framework? Or maybe some issue tracker application? You could set up a proxy server to deliver each request to the right application based on the request path, for example. But the problem remains: how do you use Capistrano to manage more complex scenarios like this if it supports a single repository?

The typical approach is to integrate Capistrano into each of the component applications and then switch between those projects when deploying each component. Not only is this a lot of work, it may also lead to a duplication of settings. For example, if your main application and the user management application both use the same database for a given environment, you'd have to duplicate this setting in each of the components.

For the Market Tracker product, used by LexisNexis clients (which we develop at e-Core for Matterhorn Transactions Inc.), we were looking for a better way to manage many component applications, in lots of environments and servers. We wanted to manage all of them from a single repository, instead of adding Capistrano integration to each of our components' repositories and having to worry about keeping the recipes in sync between each of the maintained repository branches.

Motivation

The Market Tracker application we maintain consists of three different applications: the main one, another to export search results to Excel files, and an administrative interface to manage users and other entities. We host the application on three servers: two for the real thing and a backup server. The first two are identical and give us redundancy and zero-downtime deployments, except for a few cases where we change our database schema in ways that are incompatible with previous versions.

To add to the complexity of deploying our three component applications to each of those servers, we also need to deploy them multiple times for different environments like production, certification, staging, and experimental. All of them run on the same server, on separate ports, and they run separate database, Solr, and Redis instances. This is already complex enough to manage when you integrate Capistrano into each of your projects, but it gets worse.

Sometimes you find bugs in production and have to release quick fixes, but you can't deploy the version in the master branch that has several other changes. At other times you find bugs in your Capistrano recipes themselves and fix them on master. Or maybe you are changing your deploy settings rather than the application's code. When you have to deploy to production, depending on how your Capistrano recipes work, you may have to change to the production branch, backport any changes to the Capistrano recipes from master, and finally deploy the latest fixes.
This happens if your recipe uses any project files as a template and they were moved to another place in the master branch, for example.

We decided to try another approach, similar to what we do with our database migrations. Instead of integrating the database migrations into the main application (the default on Rails, Django, Grails, and similar web frameworks), we prefer to handle them as a separate project. In our case we use the active_record_migrations gem, which brings standalone support for ActiveRecord migrations (the same migrations that are bundled with Rails apps by default). Our database is shared between the administrative interface project and the main web application, and we feel it's better to be able to manage our database schema independently from the projects using the database. We add the migrations project to the other applications as submodules so that we know what database schema is expected to work for a particular commit of the application, but that's all.

We wanted to apply the same principles to our Capistrano recipes. We wanted to manage all of our applications on different servers and environments from a single project containing the Capistrano recipes. We also wanted to store the common settings in a single place to avoid code duplication, which makes it hard to add new environments or update existing ones.

Grouping all applications' Capistrano recipes in a single project

It seems we were not the first to want all Capistrano recipes for all of our applications in a single project. We first tried a project called caphub. It worked fine initially, and its inheritance model would allow us to avoid our code duplication. Well, not entirely. The problem is that we needed some kind of multiple inheritance or mixins. We have some settings, like the token private key, that are unique to each environment, like Certification and Production. But we also have other settings that are common within a server. For example, the database host name will be the same for all applications and environments inside our colocation facility, but it will be different in our backup server at Amazon EC2. CapHub didn't help us get rid of the duplication in such cases, but it certainly helped us find a simple solution to get what we wanted. Let's explore how Capistrano 3 allows us to easily manage such complex scenarios, which are more common than you might think.

Capistrano stages

Since Capistrano 3, multistage support is built in (there was a multistage extension for Capistrano 2). That means you can write cap stage_name task_name, for example cap production deploy. By default, cap install will generate two stages: production and staging. You can generate as many as you want, for example:

cap install STAGES=production,cert,staging,experimental,integrator

But how do we deploy each of those stages to our multiple servers, since the settings for each stage may be different across the servers? Also, how can we manage separate applications? Even though those settings are called "stages" by Capistrano, you can use them as you want. For example, suppose our servers are named m1, m2, and ec2 and the applications are named web, exporter, and admin. We can create settings like m1_staging_web, ec2_production_admin, and so on. This will result in lots of files (specifically 45 = 5 x 3 x 3 to support five environments, three applications, and three servers), but it's not a big deal if you consider that the settings files can be really small, as the examples later on in this article will demonstrate by using mixins.
Usually people will start with staging and production only, and then gradually add other environments. Also, they usually start with one or two servers and keep growing as they feel the need. So supporting 45 combinations is not such a pain, since you don't write all of them at once. On the other hand, if you have enough resources to have a separate server for each of your environments, Capistrano will allow you to add multiple "server" declarations and assign roles to them, which can be quite useful if you're running a cluster of servers. In our case, to avoid downtime we don't upgrade all servers in our cluster at once. We also don't have the budget to host 45 virtual machines, or even 15. So the little effort needed to generate 45 small settings files is compensated by the savings in hosting expenses.

Using mixins

My next post will create an example deployment project from scratch, providing detail for everything that has been discussed in this post. But first, let me introduce the concept of what we call a mixin in our project.

Capistrano 3 is simply a wrapper on top of Rake. Rake is a build tool written in Ruby, similar to "make". It has targets, and targets have prerequisites. This fits nicely with the way Capistrano works, where some deployment tasks will depend on other tasks. Instead of a Rakefile (Rake's Makefile), Capistrano uses a Capfile, but other than that it works almost the same way. The Domain Specific Language (DSL) in a Capfile is enhanced as you include Capistrano extensions to the Rake DSL. Here's a sample Capfile, generated by cap install, when you install Capistrano:

# Load DSL and Setup Up Stages
require 'capistrano/setup'

# Includes default deployment tasks
require 'capistrano/deploy'

# Includes tasks from other gems included in your Gemfile
#
# For documentation on these, see for example:
#
# https://github.com/capistrano/rvm
# https://github.com/capistrano/rbenv
# https://github.com/capistrano/chruby
# https://github.com/capistrano/bundler
# https://github.com/capistrano/rails
#
# require 'capistrano/rvm'
# require 'capistrano/rbenv'
# require 'capistrano/chruby'
# require 'capistrano/bundler'
# require 'capistrano/rails/assets'
# require 'capistrano/rails/migrations'

# Loads custom tasks from `lib/capistrano/tasks' if you have any defined.
Dir.glob('lib/capistrano/tasks/*.rake').each { |r| import r }

Just like a Rakefile, a Capfile is valid Ruby code, which you can easily extend using regular Ruby code. So, to support a mixin DSL, we simply need to extend the DSL, like this:

def mixin(path)
  load File.join('config', 'mixins', path + '.rb')
end

Pretty simple, right? We prefer to add this to a separate file, like lib/mixin.rb, and add this to the Capfile:

$:.unshift File.dirname(__FILE__)
require 'lib/mixin'

After that, calling mixin 'environments/staging' should load settings that are common for the staging environment from a file called config/mixins/environments/staging.rb in the root of the Capistrano-enabled project. This is the base for setting up the deployment project that we will create in the next post.

About the author

Rodrigo Rosenfeld Rosas lives in Vitória-ES, Brazil, with his lovely wife and daughter. He graduated in Electrical Engineering with a Master's degree in Robotics and Real-time Systems. For the past five years Rodrigo has focused on building and maintaining single page web applications.
He is the author of some gems including active_record_migrations, rails-web-console, the JS specs runner oojspec, sequel-devise, and the Linux X11 utility ktrayshortcut. Rodrigo was hired by e-Core (Porto Alegre-RS, Brazil) to work from home, building and maintaining software for Matterhorn Transactions Inc. with a team of great developers. Matterhorn's main product, the Market Tracker, is used by LexisNexis clients.

Part 1: Managing Multiple Apps and Environments with Capistrano 3 and Chef Solo

Rodrigo Rosenfeld
30 Jun 2014
8 min read
In my previous two posts, I explored how to use Capistrano to deploy multiple applications to different environments and servers. This, however, is only one part of our deployment procedures. It just takes care of the applications themselves, but we still rely on the server being properly set up so that our Capistrano recipes work. In these two posts I'll explain how to use Chef to manage servers, how to integrate it with Capistrano, and how to perform all of your deployment procedures from a single project.

Introducing the sample deployment project

After I wrote the previous two posts, I realized I was not fully happy with a few issues of our company's deployment strategy:

Duplicate settings: This was the main issue that was puzzling me. I didn't like the fact that we had to duplicate some settings, like the application's binding port, in both the Chef and Capistrano projects.
Too many required files (45 to support 3 servers, 5 environments, and 3 applications): While the files were really small, I felt that this situation could be further improved by the use of some conventions.

So, I decided to work on a proof-of-concept project that would integrate both Chef and Capistrano and fix these issues. After a weekend working (almost) full time on it, I came up with a sample project that you can fork and adapt to your deployment scenario. The main goal of this project hasn't changed from my previous article: we want to be able to support new environments and servers very quickly by simply adding some settings to the project. Go ahead and clone it. Follow the instructions in the README and it should deploy the Rails Devise sample application into a VirtualBox Virtual Machine (VM) using Vagrant. The following sections will explain how it works and the reasons behind its design.

The overall idea

While it's possible to accomplish all of your deployment tasks with either Chef or Capistrano alone, I feel that they are suited to different tasks. There are many existing recipes that you can take advantage of for both projects, but they usually don't overlap much. There are Chef community cookbooks available to help you install nginx, apache2, java, databases, and much more. You probably want to use Chef to perform administrative tasks like managing services, server backup, installing software, and so on. Capistrano, on the other hand, will help you deploy the applications themselves after the server is ready to go, and after running your Chef recipes. This includes creating releases of your application, which allows you to easily roll back to a previous working version, for example. You'll find existing Capistrano recipes to help you with several application-related tasks like running Bundler, switching between Ruby versions with either rbenv, rvm, or chruby, running Rails migrations and assets precompilation, and so on.

Capistrano recipes are well integrated with the Capistrano deploy flow. For instance, the capistrano-puma recipe will automatically generate a settings file if it is missing and start Puma after the remaining deployment tasks have finished, by including this in its recipes:

after 'deploy:check', 'puma:check'
after 'deploy:finished', 'puma:smart_restart'

Another difference between sysadmin and deployment tasks is that the former will usually require superuser privileges, while the latter is best accomplished by a regular user.
This way, you can feel safer when deploying Capistrano recipes, since you know they won't affect the server itself, except for the applications managed by that user account. And deploying an application is way more common than installing and configuring programs or changing the proxy's settings.

Some of the settings required by Chef and Capistrano recipes overlap. One example is a Chef recipe that generates an nginx settings file that will proxy requests to a Rails application listening on a local port. In this scenario, the binding address used by the Capistrano puma recipe needs to coincide with the port declared in the proxy settings of the nginx configuration file.

Managing deployment settings

Capistrano and Chef provide different built-in ways of managing their settings. Capistrano uses a Domain Specific Language (DSL) built around set/fetch, while Chef reads attributes following a well-described precedence. I strongly advise you to stick with those approaches for settings that are specific to each project. To let you remove the duplication caused by overlapping deployment settings, I introduced another configuration declaration mechanism for the shared settings using the configatron gem, taking advantage of the fact that both Chef and Capistrano are written in Ruby. Take a look at the settings directory in the sample project:

settings/
├── applications
│   └── rails-devise.rb
├── common.rb
├── environments
│   ├── development.rb
│   └── production.rb
└── servers
    └── vagrant.rb

The settings are split into common settings and those specific to each application, environment, and server. As you would expect, the Rails Devise application deployed to the production environment on the vagrant server will read the settings from common.rb, servers/vagrant.rb, environments/production.rb, and applications/rails-devise.rb. If some of your settings apply to the Rails Devise application running on a given server or environment (or both), it's possible to override the specific settings in other files like rails-devise_production.rb, vagrant_production.rb, or vagrant_production_rails-devise.rb. Here's the definition of load_app_settings in common_helpers/settings_loader.rb:

def load_app_settings(app_name, app_server, app_env)
  cfg.app_name = app_name
  cfg.app_server = app_server
  cfg.app_env = app_env
  [
    'common',
    "servers/#{app_server}",
    "environments/#{app_env}",
    "applications/#{app_name}",
    "#{app_server}_#{app_env}",
    "#{app_server}_#{app_name}",
    "#{app_name}_#{app_env}",
    "#{app_server}_#{app_env}_#{app_name}",
  ].each{|s| load_settings s }
  cfg.lock!
end

Feel free to change the load order. The settings loaded last take precedence over the earlier ones. So if the binding port is usually 3000 for production but 4000 on your ec2 server, you can add cfg.my_app.binding_port = 3000 to environments/production.rb and override it in ec2_production.rb. Once those settings are loaded, they are locked and can't be changed by the deployment recipes.

As a final note, the settings can also be set using a hash notation, which can be useful if you're using a dynamic setting attribute. Here's an example: cfg[:my_app]["binding_#{'port'}"] = 3000. This is not really useful in this case, but it illustrates the setting capabilities.

Calculated settings

Two types of calculated settings are supported in this project: delayed and dynamic. Delayed attributes are lazily evaluated the first time they are requested, while dynamic attributes are always evaluated.
They are useful for providing default values for settings that could be overridden by other settings files. I prefer to use delayed attributes for those that are meant to be overridden and dynamic ones for those that are meant to be calculated, even though delayed ones would be suitable for both cases. Here's the common.rb from the sample project to illustrate the idea:

require 'set'

cfg.chef_runlist = Set.new
cfg.deploy_user = 'deploy'
cfg.deployment_repo_url = 'git@github.com:rosenfeld/capistrano-chef-deployment.git'
cfg.deployment_repo_host = 'github.com'
cfg.deployment_repo_symlink = false
cfg.nginx.default = false

# Delayed attributes: they are set to the block values unless explicitly set to another value
cfg.database_name = delayed_attr{ "app1_#{cfg.app_env}" }
cfg.nginx.subdomain = delayed_attr{ cfg.app_env }

# Dynamic/calculated attributes: these are always evaluated by the block
# and are not meant to be overridable
cfg.nginx.host = dyn_attr{ "#{cfg.nginx.subdomain}.mydomain.com" }

cfg.nginx.host in this instance is not meant to be overridden by any other settings file and follows the company's policy. But it would be okay to override the production database name to app1 instead of using the default app1_production. This is just a guideline, but it should give you a good idea of some ways that Chef and Capistrano can be used together.

Conclusion

I hope you found this post as useful as I did. Being able to fully deploy the whole application stack from a single repository saves us a lot of time and simplifies our deployment a lot, and in the next post, Part 2, I will walk you through that deployment.

About the author

Rodrigo Rosenfeld Rosas lives in Vitória-ES, Brazil, with his lovely wife and daughter. He graduated in Electrical Engineering with a Master's degree in Robotics and Real-time Systems. For the past five years Rodrigo has focused on building and maintaining single page web applications. He is the author of some gems including active_record_migrations, rails-web-console, the JS specs runner oojspec, sequel-devise, and the Linux X11 utility ktrayshortcut. Rodrigo was hired by e-Core (Porto Alegre-RS, Brazil) to work from home, building and maintaining software for Matterhorn Transactions Inc. with a team of great developers. Matterhorn's main product, the Market Tracker, is used by LexisNexis clients.

Enterprise Geodatabase

Packt
25 Jun 2014
5 min read
(For more resources related to this topic, see here.) Creating a connection to the enterprise geodatabase A geodatabase connection is a channel that is established between ArcGIS and the enterprise geodatabase. To create a connection, we need to specify the database server and the user credentials. Without this information, we will not be able to create a connection. To create a geodatabase connection using the SDE user, perform the following steps: Open ArcCatalog and expand the Database Connections dialog from the Catalog Tree window. Double-click on Add Database Connection. From the Database Platform drop-down list, select the database; ours is SQL Server. In the Instance field, type the name of the server; here, it is GDBServer. Select the Database authentication option from the Authentication Type drop-down list and type in the SDE credentials. Click on the Database drop-down list. This should be populated automatically as you leave the password field. Select your geodatabase. Click on OK and rename the connection to sde@gdbserver. This is illustrated in the following screenshot: The type of geodatabase connection depends on the roles assigned to the user. Connecting with the sde user will grant you full access to the geodatabase, where you can copy, delete, and change almost anything. Create four more database connections with the users Robb, Joffrey, Tyrion, and Dany. Give them proper names so we can use them later. Migrating a file geodatabase to an enterprise geodatabase We have our enterprise geodatabase. You might have created a few feature classes and tables. But eventually, our clients at Belize need to start working on the new geodatabase. So, we need to migrate the Bestaurants_new.gdb file to this enterprise geodatabase. This can be done with a simple copy and paste operation. Note that these steps work in the exact same way on any other DBMS once it is set up. You can copy and paste from a file geodatabase to any enterprise geodatabase using the following steps: Open ArcCatalog and browse to your Bestaurants_new.gdb geodatabase. Right-click on the Food_and_Drinks feature class and select Copy, as seen in the following screenshot: Now, browse and connect to sde@gdbserver; right-click on an empty area and click on Paste, as seen in the following screenshot: You will be prompted with a list of datasets that will be copied as shown in the following screenshot. Luckily, all the configurations will be copied. This includes domains, subtypes, feature classes, and related tables as follows: After the datasets and configurations have been copied, you will see all your data in the new geodatabase. Note that in an SQL Server enterprise geodatabase, there are two prefixes added to each dataset. First, the database is added, which is sdedb, followed by the schema, which is SDE, and finally the dataset name, as shown in the following screenshot: Assigning privileges Have you tried to connect as Robb or Tyrion to your new geodatabase? If you haven't, try it now. You will see that none of the users you created have access to the Food_and_Drinks feature class or any other dataset. You might have guessed why. That is because SDE has created this data, and only this user can allow other users to see this data. So, how do we allow users to see other users' datasets? This is simple just perform the following steps: From ArcCatalog, connect as sde@gdbserver. 
Right-click on the sdedb.SDE.Food_and_Drinks feature class, point the cursor to Manage, and then click on Privileges as shown in the following screenshot: In the Privileges... dialog, click on Add. Select all four users, Robb, Joffrey, Tyrion, and Dany, and click on OK. Make sure that the Select checkbox is checked for all four users, which means they can see and read this feature class. For Dany, assign Insert, Update, and Delete so that she can also edit this feature class, as shown in the following screenshot. Apply the same privileges to all other datasets as follows and click on OK. Try connecting with Robb; you will now be able to view all datasets. You can use Dany's account to edit your geodatabase using ArcMap. You can create more viewer users who have read-only access to your geodatabase but cannot edit or modify it in any way. Summary Enterprise geodatabases are an excellent choice when you have a multiuser environment. In this article, you learned how to create a geodatabase connection using ArcCatalog to the new enterprise geodatabase. You also learned to migrate your file geodatabase into a fresh enterprise geodatabase. Finally, you learned to assign different privileges to each user and access control to your new enterprise geodatabase. While setting up and configuring an enterprise geodatabase is challenging, working with the enterprise geodatabases in ArcCatalog and ArcMap is similar to working with file geodatabases. Thus, in this article, we took a leap by using an upgraded version of a geodatabase, which is called an enterprise geodatabase. Resources for Article: Further resources on this subject: Server Logs [Article] Google Earth, Google Maps and Your Photos: a Tutorial [Article] Including Google Maps in your Posts Using Apache Roller 4.0 [Article]

Discovering Python's parallel programming tools

Packt
20 Jun 2014
3 min read
(For more resources related to this topic, see here.)

The Python threading module

The Python threading module offers a layer of abstraction over the lower-level _thread module. It provides functions that help the programmer during the hard task of developing parallel systems based on threads. The threading module's official documentation can be found at http://docs.python.org/3/library/threading.html?highlight=threading#module-threadin.

The Python multiprocessing module

The multiprocessing module aims at providing a simple API for the use of parallelism based on processes. This module's API is similar to the threading module's, which makes switching between the two approaches straightforward. The process-based approach is very popular within the Python users' community, as it is an alternative answer to questions about the use of CPU-bound threads and the GIL present in Python. The multiprocessing module's official documentation can be found at http://docs.python.org/3/library/multiprocessing.html?highlight=multiprocessing#multiprocessing.

The parallel Python module

The parallel Python module is external and offers a rich API for the creation of parallel and distributed systems making use of the processes approach. This module promises to be light and easy to install, and it integrates with other Python programs. The parallel Python module can be found at http://parallelpython.com. Among its features, we may highlight the following:

Automatic detection of the optimal configuration
The number of worker processes can be changed during runtime
Dynamic load balancing
Fault tolerance
Auto-discovery of computational resources

Celery – a distributed task queue

Celery is an excellent Python module that's used to create distributed systems and has excellent documentation. It makes use of at least three different types of approach to run tasks in concurrent form: multiprocessing, Eventlet, and Gevent. This work will, however, concentrate its efforts on the use of the multiprocessing approach. Switching from one approach to another is a configuration issue, and it is left as an exercise so that the reader is able to establish comparisons with his/her own experiments. The Celery module can be obtained on the official project page at http://celeryproject.org.

Summary

In this article, we had a short introduction to some Python modules, built-in and external, which make a developer's life easier when building parallel systems.

Resources for Article:

Further resources on this subject:
Getting Started with Spring Python [Article]
Python Testing: Installing the Robot Framework [Article]
Getting Up and Running with MySQL for Python [Article]

Getting Started with Mockito

Packt
19 Jun 2014
14 min read
(For more resources related to this topic, see here.)

Mockito is an open source framework for Java that allows you to easily create test doubles (mocks). What makes Mockito so special is that it eliminates the common expect-run-verify pattern (which was present, for example, in EasyMock; please refer to http://monkeyisland.pl/2008/02/24/can-i-test-what-i-want-please for more details), which in effect leads to a lower coupling of the test code to the production code. In other words, one does not have to define the expectations of how the mock should behave in order to verify its behavior. That way, the code is clearer and more readable for the user. On one hand, Mockito has a very active group of contributors and is actively maintained. On the other hand, at the time this article was written, the latest Mockito release (Version 1.9.5) dated from October 2012.

You may ask yourself the question, "Why should I even bother to use Mockito in the first place?" Out of many, Mockito offers the following key features:

There is no expectation phase for Mockito: you can either stub or verify the mock's behavior
You are able to mock both interfaces and classes
You can produce little boilerplate code while working with Mockito by means of annotations
You can easily verify or stub with intuitive argument matchers

Before diving into Mockito as such, one has to understand the concepts of System Under Test (SUT) and test doubles. We will base our terminology on what Gerard Meszaros has defined in the xUnit Patterns (http://xunitpatterns.com/Mocks,%20Fakes,%20Stubs%20and%20Dummies.html). SUT (http://xunitpatterns.com/SUT.html) describes the system that we are testing. It doesn't necessarily signify a class, but any part of the application that we are testing, or even the whole application as such. As for a test double (http://www.martinfowler.com/bliki/TestDouble.html), it's an object that is used only for testing purposes, instead of a real object. Let's take a look at the different types of test doubles:

Dummy: This is an object that is used only for the code to compile; it doesn't have any business logic (for example, an object passed as a parameter to a method)
Fake: This is an object that has an implementation but it's not production ready (for example, using an in-memory database instead of communicating with a standalone one)
Stub: This is an object that has predefined answers to method executions made during the test
Mock: This is an object that has predefined answers to method executions made during the test and has recorded expectations of these executions
Spy: These are objects that are similar to stubs, but they additionally record how they were executed (for example, a service that holds a record of the number of sent messages)

An additional remark relates to testing the output of our application: the more decoupled your test code is from your production code, the better, since you will have to spend less time (or even none) modifying your tests after you change the implementation of the code.

Coming back to the article's content: this article is all about getting started with Mockito. We will begin with how to add Mockito to your classpath. Then, we'll see a simple setup of tests for both the JUnit and TestNG test frameworks. Next, we will check why it is crucial to assert the behavior of the system under test instead of verifying its implementation details. Finally, we will check out some of Mockito's experimental features for adding hints and warnings to the exception messages.
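To make the "no expect-run-verify" point concrete before the recipes start, here is a minimal, self-contained sketch. It mocks java.util.List purely for illustration (the mocked type is not part of this article's examples) and uses the classic when/verify API rather than the BDD aliases used later on:

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import java.util.List;

import org.junit.Assert;
import org.junit.Test;

public class StubOrVerifyTest {

    @Test
    @SuppressWarnings("unchecked")
    public void stubbing_and_verification_are_independent() {
        // A mock created without any expectation phase
        List<String> mockedList = mock(List.class);

        // Stub only the call this test needs...
        when(mockedList.get(0)).thenReturn("first");

        // ...exercise the mock...
        mockedList.add("something");
        String element = mockedList.get(0);

        // ...and verify only the interaction we actually care about.
        verify(mockedList).add("something");
        Assert.assertEquals("first", element);
    }
}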
The very idea of the following recipes is to prepare your test classes to work with Mockito and to show you how to do this with as little boilerplate code as possible. Due to my fondness for behavior-driven development (http://dannorth.net/introducing-bdd/, first introduced by Dan North), I'm using Mockito's BDDMockito and AssertJ's BDDAssertions static methods to make the code even more readable and intuitive in all the test cases. Also, please read Szczepan Faber's blog (he is the author of Mockito) about the given, when, then separation in your test methods, at http://monkeyisland.pl/2009/12/07/given-when-then-forever/, since these are omnipresent throughout the article. I don't want the article to become a duplication of the Mockito documentation, which is of high quality; I would like you to take a look at good tests and get acquainted with the Mockito syntax from the beginning. What's more, I've used static imports in the code to make it even more readable, so if you get confused by any of the pieces of code, it would be best to consult the repository and the code as such.

Adding Mockito to a project's classpath

Adding Mockito to a project's classpath is as simple as adding one of the two jars to your project's classpath:

mockito-all: This is a single jar with all dependencies (with the hamcrest and objenesis libraries, as of June 2011).
mockito-core: This is only the Mockito core (without hamcrest or objenesis). Use this if you want to control which version of hamcrest or objenesis is used.

How to do it...

If you are using a dependency manager that connects to the Maven Central Repository, then you can get your dependencies as follows (examples of how to add mockito-all to your classpath for Maven and Gradle):

For Maven, use the following code:

<dependency>
  <groupId>org.mockito</groupId>
  <artifactId>mockito-all</artifactId>
  <version>1.9.5</version>
  <scope>test</scope>
</dependency>

For Gradle, use the following code:

testCompile "org.mockito:mockito-all:1.9.5"

If you are not using any of the dependency managers, you have to either download mockito-all.jar or mockito-core.jar and add it to your classpath manually (you can download the jars from https://code.google.com/p/mockito/downloads/list).

Getting started with Mockito for JUnit

Before going into details regarding the Mockito and JUnit integration, it is worth mentioning a few words about JUnit. JUnit is a testing framework (an implementation of the xUnit framework) that allows you to create repeatable tests in a very readable manner. In fact, JUnit is a port of Smalltalk's SUnit (both frameworks were originally implemented by Kent Beck). What is important in terms of JUnit and Mockito integration is that, under the hood, JUnit uses a test runner to run its tests (in xUnit terms, a test runner is a program that executes the test logic and reports the test results). Mockito has its own test runner implementation that allows you to reduce boilerplate in order to create test doubles (mocks and spies) and to inject them (either via constructors, setters, or reflection) into the defined object. What's more, you can easily create argument captors.
All of this is feasible by means of the proper annotations, as follows:

@Mock: This is used for mock creation
@Spy: This is used to create a spy instance
@InjectMocks: This is used to instantiate the @InjectMocks annotated field and inject all the @Mock or @Spy annotated fields into it (if applicable)
@Captor: This is used to create an argument captor

By default, you should profit from Mockito's annotations to make your code look neat and to reduce the boilerplate code in your application.

Getting ready

In order to add JUnit to your classpath, if you are using a dependency manager that connects to the Maven Central Repository, then you can get your dependencies as follows (examples for Maven and Gradle):

To add JUnit in Maven, use the following code:

<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.11</version>
  <scope>test</scope>
</dependency>

To add JUnit in Gradle, use the following code:

testCompile('junit:junit:4.11')

If you are not using any of the dependency managers, you have to download the following jars:

junit.jar
hamcrest-core.jar

Add the downloaded files to your classpath manually (you can download the jars from https://github.com/junit-team/junit/wiki/Download-and-Install).

For this recipe, our system under test will be a MeanTaxFactorCalculator class that will call an external service, TaxService, to get the current tax factor for the current user. It's a tax factor and not a tax as such since, for simplicity, we will not be using BigDecimals but doubles, and I'd never suggest using doubles for anything related to money:

public class MeanTaxFactorCalculator {

    private final TaxService taxService;

    public MeanTaxFactorCalculator(TaxService taxService) {
        this.taxService = taxService;
    }

    public double calculateMeanTaxFactorFor(Person person) {
        double currentTaxFactor = taxService.getCurrentTaxFactorFor(person);
        double anotherTaxFactor = taxService.getCurrentTaxFactorFor(person);
        return (currentTaxFactor + anotherTaxFactor) / 2;
    }
}

How to do it...

To use Mockito's annotations, you have to perform the following steps:

Annotate your test class with @RunWith(MockitoJUnitRunner.class).
Annotate the test fields with the @Mock or @Spy annotation to have either a mock or spy object instantiated.
Annotate the test fields with the @InjectMocks annotation to first instantiate the @InjectMocks annotated field and then inject all the @Mock or @Spy annotated fields into it (if applicable).

The following snippet shows the JUnit and Mockito integration in a test class that verifies the SUT's behavior (remember that I'm using the BDDMockito.given(...) and AssertJ's BDDAssertions.then(...) static methods):

@RunWith(MockitoJUnitRunner.class)
public class MeanTaxFactorCalculatorTest {

    static final double TAX_FACTOR = 10;

    @Mock TaxService taxService;

    @InjectMocks MeanTaxFactorCalculator systemUnderTest;

    @Test
    public void should_calculate_mean_tax_factor() {
        // given
        given(taxService.getCurrentTaxFactorFor(any(Person.class))).willReturn(TAX_FACTOR);

        // when
        double meanTaxFactor = systemUnderTest.calculateMeanTaxFactorFor(new Person());

        // then
        then(meanTaxFactor).isEqualTo(TAX_FACTOR);
    }
}

To profit from Mockito's annotations using JUnit, you just have to annotate your test class with @RunWith(MockitoJUnitRunner.class).

How it works...

The Mockito test runner will adapt its strategy depending on the version of JUnit.
If there exists an org.junit.runners.BlockJUnit4ClassRunner class, it means that the codebase is using at least JUnit Version 4.5. What eventually happens is that the MockitoAnnotations.initMocks(...) method is executed for the given test, which initializes all the Mockito annotations (for more information, check the subsequent There's more… section).

There's more...

You may have a situation where your test class has already been annotated with a @RunWith annotation, so it seems that you cannot profit from Mockito's annotations. In order to still use them, you have to call the MockitoAnnotations.initMocks method manually in the @Before annotated method of your test, as shown in the following code:

public class MeanTaxFactorCalculatorTest {

    static final double TAX_FACTOR = 10;

    @Mock TaxService taxService;

    @InjectMocks MeanTaxFactorCalculator systemUnderTest;

    @Before
    public void setup() {
        MockitoAnnotations.initMocks(this);
    }

    @Test
    public void should_calculate_mean_tax_factor() {
        // given
        given(taxService.getCurrentTaxFactorFor(Mockito.any(Person.class))).willReturn(TAX_FACTOR);

        // when
        double meanTaxFactor = systemUnderTest.calculateMeanTaxFactorFor(new Person());

        // then
        then(meanTaxFactor).isEqualTo(TAX_FACTOR);
    }
}

To use Mockito's annotations without a JUnit test runner, you have to call the MockitoAnnotations.initMocks method and pass the test class as its parameter. Mockito checks whether the user has overridden the global configuration of AnnotationEngine, and if this is not the case, the InjectingAnnotationEngine implementation is used to process the annotations in tests. What is done internally is that the test class fields are scanned for annotations and the proper test doubles are initialized and injected into the @InjectMocks annotated object (either by constructor, property setter, or field injection, in that precise order). You have to remember several factors related to the automatic injection of test doubles, as follows:

If Mockito is not able to inject test doubles into the @InjectMocks annotated fields through any of the strategies, it won't report a failure; the test will continue as if nothing happened (and most likely, you will get a NullPointerException).
For constructor injection, if arguments cannot be found, then null is passed.
For constructor injection, if nonmockable types are required in the constructor, then the constructor injection won't take place.
For other injection strategies, if you have properties with the same type (or same erasure) and Mockito matches a mock's name with a field/property name, it will inject that mock properly. Otherwise, the injection won't take place.
For other injection strategies, if the @InjectMocks annotated object wasn't previously initialized, then Mockito will instantiate the aforementioned object using a no-arg constructor, if applicable.

See also

JUnit documentation at https://github.com/junit-team/junit/wiki
Martin Fowler's article on xUnit at http://www.martinfowler.com/bliki/Xunit.html
Gerard Meszaros's xUnit Test Patterns at http://xunitpatterns.com/
The @InjectMocks Mockito documentation (with a description of the injection strategies) at http://docs.mockito.googlecode.com/hg/1.9.5/org/mockito/InjectMocks.html

Getting started with Mockito for TestNG

Before going into details regarding the Mockito and TestNG integration, it is worth mentioning a few words about TestNG.
TestNG is a unit testing framework for Java that was created, as the author states on the tool's website (refer to the See also section for the link), out of frustration with some JUnit deficiencies. TestNG was inspired by both JUnit and NUnit, and aims at covering the whole scope of testing: from unit, through functional and integration, to end-to-end tests, and so on. The JUnit library, however, was initially created for unit testing only. The main differences between JUnit and TestNG are as follows:

The TestNG author disliked JUnit's approach of having to define some methods as static for them to be executed before the test class logic gets executed (for example, the @BeforeClass annotated methods); that's why in TestNG you don't have to define these methods as static
TestNG has more annotations related to method execution before single tests, suites, and test groups
TestNG annotations are more descriptive in terms of what they do; for example, JUnit's @Before versus TestNG's @BeforeMethod

Mockito in Version 1.9.5 doesn't provide any out-of-the-box solution to integrate with TestNG in a simple way, but there is a special Mockito subproject for TestNG (refer to the See also section for the URL) that should be part of one of the subsequent Mockito releases. In the following recipe, we will take a look at how to profit from that code and that very elegant solution.

Getting ready

When you take a look at Mockito's TestNG subproject on the Mockito GitHub repository, you will find that there are three classes in the org.mockito.testng package, as follows:

MockitoAfterTestNGMethod
MockitoBeforeTestNGMethod
MockitoTestNGListener

Unfortunately, until this project eventually gets released, you have to just copy and paste those classes into your codebase.

How to do it...

To integrate TestNG and Mockito, perform the following steps:

Copy the MockitoAfterTestNGMethod, MockitoBeforeTestNGMethod, and MockitoTestNGListener classes into your codebase from Mockito's TestNG subproject.
Annotate your test class with @Listeners(MockitoTestNGListener.class).
Annotate the test fields with the @Mock or @Spy annotation to have either a mock or spy object instantiated.
Annotate the test fields with the @InjectMocks annotation to first instantiate the @InjectMocks annotated field and inject all the @Mock or @Spy annotated fields into it (if applicable).
Annotate the test fields with the @Captor annotation to make Mockito instantiate an argument captor.

Now let's take a look at this snippet that, using TestNG, checks whether the mean tax factor value has been calculated properly (remember that I'm using the BDDMockito.given(...) and AssertJ's BDDAssertions.then(...) static methods):

@Listeners(MockitoTestNGListener.class)
public class MeanTaxFactorCalculatorTestNgTest {

    static final double TAX_FACTOR = 10;

    @Mock TaxService taxService;

    @InjectMocks MeanTaxFactorCalculator systemUnderTest;

    @Test
    public void should_calculate_mean_tax_factor() {
        // given
        given(taxService.getCurrentTaxFactorFor(any(Person.class))).willReturn(TAX_FACTOR);

        // when
        double meanTaxFactor = systemUnderTest.calculateMeanTaxFactorFor(new Person());

        // then
        then(meanTaxFactor).isEqualTo(TAX_FACTOR);
    }
}

How it works...

TestNG allows you to register custom listeners (your listener class has to implement the IInvokedMethodListener interface). Once you do this, the logic inside the implemented methods will be executed before and after every configuration and test method gets called.
Mockito provides you with a listener whose responsibilities are as follows: Initialize mocks annotated with the @Mock annotation (it is done only once) Validate the usage of Mockito after each test method Remember that with TestNG, all mocks are reset (or initialized if it hasn't already been done so) before any TestNG method! See also The TestNG homepage at http://testng.org/doc/index.html The Mockito TestNG subproject at https://github.com/mockito/mockito/tree/master/subprojects/testng The Getting started with Mockito for JUnit recipe on the @InjectMocks analysis

Common performance issues

Packt
19 Jun 2014
16 min read
(For more resources related to this topic, see here.) Threading performance issues Threading performance issues are the issues related to concurrency, as follows: Lack of threading or excessive threading Threads blocking up to starvation (usually from competing on shared resources) Deadlock until the complete application hangs (threads waiting for each other) Memory performance issues Memory performance issues are the issues that are related to application memory management, as follows: Memory leakage: This issue is an explicit leakage or implicit leakage as seen in improper hashing Improper caching: This issue is due to over caching, inadequate size of the object, or missing essential caching Insufficient memory allocation: This issue is due to missing JVM memory tuning Algorithmic performance issues Implementing the application logic requires two important parameters that are related to each other; correctness and optimization. If the logic is not optimized, we have algorithmic issues, as follows: Costive algorithmic logic Unnecessary logic Work as designed performance issues The work as designed performance issue is a group of issues related to the application design. The application behaves exactly as designed but if the design has issues, it will lead to performance issues. Some examples of performance issues are as follows: Using synchronous when asynchronous should be used Neglecting remoteness, that is, using remote calls as if they are local calls Improper loading technique, that is, eager versus lazy loading techniques Selection of the size of the object Excessive serialization layers Web services granularity Too much synchronization Non-scalable architecture, especially in the integration layer or middleware Saturated hardware on a shared infrastructure Interfacing performance issues Whenever the application is dealing with resources, we may face the following interfacing issues that could impact our application performance: Using an old driver/library Missing frequent database housekeeping Database issues, such as, missing database indexes Low performing JMS or integration service bus Logging issues (excessive logging or not following the best practices while logging) Network component issues, that is, load balancer, proxy, firewall, and so on Miscellaneous performance issues Miscellaneous performance issues include different performance issues, as follows: Inconsistent performance of application components, for example, having slow components can cause the whole application to slow down Introduced performance issues to delay the processing speed Improper configuration tuning of different components, for example, JVM, application server, and so on Application-specific performance issues, such as excessive validations, apply many business rules, and so on Fake performance issues Fake performance issues could be a temporary issue or not even an issue. The famous examples are as follows: Networking temporary issues Scheduled running jobs (detected from the associated pattern) Software automatic updates (it must be disabled in production) Non-reproducible issues In the following sections, we will go through some of the listed issues. Threading performance issues Multithreading has the advantage of maximizing the hardware utilization. In particular, it maximizes the processing power by executing multiple tasks concurrently. But it has different side effects, especially if not used wisely inside the application. 
For example, in order to distribute tasks among different concurrent threads, there should be no or minimal data dependency, so each thread can complete its task without waiting for other threads to finish. Also, they shouldn't compete over different shared resources or they will be blocked, waiting for each other. We will discuss some of the common threading issues in the next section. Blocking threads A common issue where threads are blocked is waiting to obtain the monitor(s) of certain shared resources (objects), that is, holding by other threads. If most of the application server threads are consumed in a certain blocked status, the application becomes gradually unresponsive to user requests. In the Weblogic application server, if a thread keeps executing for more than a configurable period of time (not idle), it gets promoted to the Stuck thread. The more the threads are in the stuck status, the more the server status becomes critical. Configuring the stuck thread parameters is part of the Weblogic performance tuning. Performance symptoms The following symptoms are the performance symptoms that usually appear in cases of thread blocking: Slow application response (increased single request latency and pending user requests) Application server logs might show some stuck threads. The server's healthy status becomes critical on monitoring tools (application server console or different monitoring tools) Frequent application server restarts either manually or automatically Thread dump shows a lot of threads in the blocked status waiting for different resources Application profiling shows a lot of thread blocking An example of thread blocking To understand the effect of thread blocking on application execution, open the HighCPU project and measure the time it takes for execution by adding the following additional lines: long start= new Date().getTime(); .. .. long duration= new Date().getTime()-start; System.err.println("total time = "+duration); Now, try to execute the code with a different number of the thread pool size. We can try using the thread pool size as 50 and 5, and compare the results. In our results, the execution of the application with 5 threads is much faster than 50 threads! Let's now compare the NetBeans profiling results of both the executions to understand the reason behind this unexpected difference. The following screenshot shows the profiling of 50 threads; we can see a lot of blocking for the monitor in the column and the percentage of Monitor to the left waiting around at 75 percent: To get the preceding profiling screen, click on the Profile menu inside NetBeans, and then click on Profile Project (HighCPU). From the pop-up options, select Monitor and check all the available options, and then click on Run. The following screenshot shows the profiling of 5 threads, where there is almost no blocking, that is, less threads compete on these resources: Try to remove the System.out statement from inside the run() method, re-execute the tests, and compare the results. Another factor that also affects the selection of the pool size, especially when the thread execution takes long time, is the context switching overhead. This overhead requires the selection of the optimal pool size, usually related to the number of available processors for our application. Context switching is the CPU switching from one process (or thread) to another, which requires restoration of the execution data (different CPU registers and program counters). 
A context switch includes suspension of the current executing process, storing its current data, picking the next process for execution according to its priority, and restoring its data. Although context switching is supported at the hardware level and is faster there, most operating systems do this at the level of software context switching to improve performance. The main reason behind this is the ability of software context switching to selectively choose the required registers to save.

Thread deadlock
When many threads hold the monitors of objects that the other threads need, this will result in a deadlock unless the implementation uses the explicit Lock interface. In the example, we had a deadlock caused by two different threads waiting to obtain the monitor that the other thread held. The thread profiling will show these threads in a continuous blocking status, waiting for the monitors. All threads that go into the deadlock status become out of service for the user's requests, as shown in the following screenshot:
Usually, this happens if the order of obtaining the locks is not planned. For example, if we need a quick and easy fix for a multidirectional thread deadlock, we can always lock the smallest (or the largest) bank account first, regardless of the transfer direction. This will prevent any deadlock from happening in our simple two-threaded mode. But if we have more threads, we need a much more mature way to handle this, by using the Lock interface or some other technique.

Memory performance issues
In spite of all the effort the JVM puts into allocating and freeing memory in an optimized way, we still see memory issues in Java Enterprise applications, mainly due to the way people deal with memory in these applications. We will discuss mainly three types of memory issues: memory leakage, memory allocation, and application data caching.

Memory leakage
Memory leakage is a common performance issue where the garbage collector is not at fault; it is mainly a design/coding issue where an object is no longer required but remains referenced in the heap, so the garbage collector can't reclaim its space. If this is repeated with different objects over a long period (according to the object size and the involved scenarios), it may lead to an out of memory error. The most common example of memory leakage is adding objects to static collections (or an instance collection of a long-living object, such as a servlet) and forgetting to clean the collections totally or partially; a minimal sketch of this pattern is shown at the end of this section.

Performance symptoms
The following symptoms are some of the expected performance symptoms during a memory leakage in our application:
The application's heap memory usage increases over time
The response slows down gradually due to memory congestion
OutOfMemoryError occurs frequently in the logs, and sometimes an application server restart is required
Aggressive execution of garbage collection activities
A heap dump shows a lot of retained objects (of the leaking types)
A sudden increase of memory paging, as reported by the operating system monitoring tools

An example of memory leakage
We have a sample application, ExampleTwo; this is a product catalog where users can select products and add them to the basket. The application is written in spaghetti code, so it has a lot of issues, including bad design, improper object scopes, bad caching, and memory leakage.
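Here is the minimal sketch referred to above, showing the static-collection variant of the leak; the class and field names are made up for illustration and are not part of the ExampleTwo application:

import java.util.ArrayList;
import java.util.List;

public class LeakyRegistry {
    // static collection: it lives as long as the class loader does, so everything
    // added here stays reachable and can never be garbage collected
    private static final List<byte[]> CACHE = new ArrayList<byte[]>();

    public void handleRequest() {
        // every request "caches" 1 MB and nobody ever removes it
        CACHE.add(new byte[1024 * 1024]);
    }
}

Each call leaks one more megabyte; a heap dump would show the growing ArrayList as the dominant retained object, which matches the symptom list above. The fix is either to remove entries once they are no longer needed or to use a properly bounded cache.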
The following screenshot shows the product catalog browser page:
One of the bad issues is the usage of servlet instance variables (or static members), as this causes a lot of issues with multiple threads and is a common location for unnoticed memory leakages. We have added the following instance variable as a leakage location:

private final HashMap<String, HashMap> cachingAllUsersCollection = new HashMap();

We will add some collections to the preceding code to cause memory leakage. We also used caching in the session scope, which causes implicit leakage. Session scope leakage is difficult to diagnose, as it follows the session life cycle. Once the session is destroyed, the leakage stops, so we can say it is less severe but more difficult to catch.
Adding global elements, such as a catalog or stock levels, to the session scope has no meaning. The session scope should be restricted to user-specific data only. Also, forgetting to remove data that is no longer required from a session makes memory utilization worse. Refer to the following code:

@Stateful
public class CacheSessionBean

Instead of using a singleton class here, or a stateless bean with a static member, we used a stateful bean, so it is instantiated per user session. We used JPA beans in the application layers instead of using view objects. We also used loops over collections instead of querying or retrieving the required object directly, and so on. It would be good to troubleshoot this application with different profiling aspects to fix all these issues. All these factors are enough to describe such a project as spaghetti.
We can use our knowledge of Apache JMeter to develop simple testing scenarios. As shown in the following screenshot, the scenario consists of catalog navigations and adding some products to the basket:
Executing the test plan with many concurrent users over many iterations will show the bad behavior of our application, where the used memory increases over time. There is no justification for this, as the catalog is the same for all users and there is no user-specific data, except for the IDs of the selected products. These actually do need to be saved inside the user session, but they won't take any remarkable memory space. In our example, we intentionally save a lot of objects in the session, implement a wrong session-level cache, and implement meaningless servlet-level caching. All this contributes to memory leakage. This gradual increase in memory consumption is what we need to spot in our environment as early as possible (as we can see in the following screenshot, the memory consumption in our application is approaching 200 MB!):

Improper data caching
Caching is one of the critical components in the enterprise application architecture. It increases the application performance by decreasing the time required to query the object again from its data store, but it also complicates the application design and causes a lot of other secondary issues. The main concerns in a cache implementation are the caching refresh rate, the caching invalidation policy, data inconsistency in a distributed environment, locking issues while waiting to obtain a cached object's lock, and so on.

Improper caching issue types
The improper caching issue can take a lot of different variants. We will pick some of them and discuss them in the following sections.

No caching (disabled caching)
Disabled caching will definitely cause a big load on the interfacing resources (for example, the database) by hitting them with almost every interaction.
This should be avoided while designing an enterprise application; otherwise, the application won't be usable. Fortunately, this has less impact than using a wrong caching implementation! Most application components, such as the database, JPA, and application servers, already have out-of-the-box caching support.

Too small caching size
A too small caching size is a common performance issue, where the cache size is determined initially but doesn't get reviewed as the application data grows. Cache sizing is affected by many factors, such as the available memory (if it allows more caching) and the type of data: lookup data should be cached entirely when possible, while transactional data shouldn't be cached unless required, and then only under a very strict locking mechanism. Also, the cache replacement policy and invalidation play an important role and should be tailored according to the application's needs, for example, least frequently used, least recently used, most frequently used, and so on.
As a general rule, the bigger the cache size, the higher the cache hit rate and the lower the cache miss ratio. The proper replacement policy also contributes here; if we are working, as in our example, on an online product catalog, we may use the least recently used policy so that all the old products are removed, which makes sense as users usually look for new products.
Monitoring the cache utilization periodically is an essential proactive measure to catch any deviations early and adjust the cache size according to the monitoring results. For example, if the cache saturation is more than 90 percent and the cache miss ratio is high, a cache resizing is required. Cache misses are very costly, as they hit the cache first and then the resource itself (for example, the database) to get the required object, and then add this loaded object into the cache again by evicting another object (if the cache is 100 percent full), according to the used cache replacement policy.

Too big caching size
A too big caching size might cause memory issues. If there is no control over the cache size and it keeps growing, and if it is a Java cache, the garbage collector will consume a lot of time trying to garbage collect that huge memory, aiming to free some space. This will increase the garbage collection pause time and decrease the cache throughput. If the cache throughput decreases, the latency to get objects from the cache increases, causing the cache retrieval cost to become so high that it might be slower than hitting the actual resource (for example, the database).

Using the wrong caching policy
Each application's cache implementation should be tailored according to the application's needs and data types (transactional versus lookup data). If the selection of the caching policy is wrong, the cache will affect the application performance rather than improving it.
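To make the sizing and replacement-policy discussion concrete, here is a minimal, generic sketch (not code from the ExampleTwo project) of a size-bounded, least recently used cache built on java.util.LinkedHashMap; the capacity value is exactly the knob that the monitoring described above would tell us to adjust:

import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public BoundedLruCache(int capacity) {
        // accessOrder = true makes the iteration order "least recently used first"
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // evict the least recently used entry once we exceed the configured size
        return size() > capacity;
    }
}

Note that LinkedHashMap is not thread-safe, so in a real application server this would have to be wrapped in proper synchronization or replaced by a caching library; the point here is only to show how the capacity and the LRU policy fit together.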
Performance symptoms
According to the cache issue type and the different cache configurations, we will see the following symptoms:
Decreased cache hit rate (and increased cache miss ratio)
Increased cache loading because of an improper size
Increased cache latency with a huge caching size
A spiky pattern in the performance testing response time; if the cache size is not correct, it causes continuous invalidation and reloading of the cached objects

An example of improper caching techniques
In our example, ExampleTwo, we have demonstrated many caching issues, such as no policy defined, a wrong global cache, an improper local cache, and no cache invalidation implemented. So, we can have stale objects inside the cache.
Cache invalidation is the process of refreshing or updating an existing object inside the cache, or simply removing it from the cache, so that on the next load it reflects its recent values. This is to keep the cached objects always up to date.
Cache hit rate is the rate or ratio at which cache hits match (find) the required cached object. It is the main measure of cache effectiveness, together with the retrieval cost.
Cache miss rate is the rate or ratio at which the cache is hit for a required object that is not found in the cache.
Last access time is the timestamp of the last access (successful hit) to a cached object.
Caching replacement policies or algorithms are algorithms implemented by a cache to replace existing cached objects with new objects when there is no room available for any additional objects; this follows missed cache hits for these objects. Some examples of these policies are as follows:
First-in-first-out (FIFO): In this policy, the cached objects are aged and the oldest object is removed in favor of the newly added ones.
Least frequently used (LFU): In this policy, the cache picks the least frequently used object to free the memory, which means the cache will record statistics against each cached object.
Least recently used (LRU): In this policy, the cache replaces the least recently accessed or used items; this means the cache will keep information such as the last access time of all cached objects.
Most recently used (MRU): This policy is the opposite of the previous one; it removes the most recently used items. This policy fits applications where items are no longer needed after the access, such as used exam vouchers.
Aging policy: Every object in the cache has an age limit, and once it exceeds this limit, it is removed from the cache in the simple variant. In the advanced variant, the policy also considers invalidation of the cache according to predefined configuration rules, for example, every three hours, and so on.
It is important for us to understand that caching is not a magic bullet, and it has a lot of related issues and drawbacks. Sometimes, it causes overhead if not correctly tailored according to the real application needs.
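Since the hit and miss rates defined above drive most cache-tuning decisions, it can help to instrument a cache with simple counters. The following sketch is illustrative only; it wraps any Map-based cache, such as the hypothetical BoundedLruCache sketched earlier, and reports the hit ratio so that it can be checked periodically, as recommended above:

import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class InstrumentedCache<K, V> {
    private final Map<K, V> delegate;
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    public InstrumentedCache(Map<K, V> delegate) {
        this.delegate = delegate;
    }

    public V get(K key) {
        V value = delegate.get(key);
        if (value != null) {
            hits.incrementAndGet();    // successful hit; with an LRU delegate this also refreshes the access order
        } else {
            misses.incrementAndGet();  // miss: the caller must load the object from the real resource and put it back
        }
        return value;
    }

    public void put(K key, V value) {
        delegate.put(key, value);
    }

    public double hitRatio() {
        long total = hits.get() + misses.get();
        return total == 0 ? 0.0 : (double) hits.get() / total;
    }
}

For example, a scheduled monitoring job could log hitRatio() every few minutes; if the ratio stays low while the cache is saturated, that matches the resizing rule of thumb given earlier (saturation above 90 percent with a high miss ratio suggests a bigger cache or a different replacement policy).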

The anatomy of a report processor

Packt
11 Jun 2014
6 min read
(For more resources related to this topic, see here.)
At its most basic, a Puppet report processor is a piece of Ruby code that is triggered every time a Puppet agent passes a report to the Puppet master. This piece of code is passed a Ruby object that contains both the client report and metrics. Although the data is sent in a wire format, such as YAML or PSON, by the time a report processor is triggered, this data has been turned into an object by Puppet.
This code can simply produce reports, but we're not limited to that. With a little imagination, we can use Puppet report processors for everything from alerts through to the orchestration of events. For instance, using a report processor and a suitable SMS provider would make it easy for Puppet to send you an SMS alert every time a run fails. Alternatively, using a report processor, you could analyze the data to reveal trends in your changes and update a change management console. The best way to think of a report processor is as a means to trigger actions on the event of a change, rather than strictly a reporting tool.
Puppet reports are written in plain old Ruby, and so you have access to the multitude of libraries available via the RubyGems repositories. This can make developing your plugins relatively simple, as half the time you will find that the heavy lifting has been done for you by some enterprising fellow who has already solved your problem and published the code in a gem. Good examples of this can be found if you need to interoperate with another product such as MySQL, Oracle, Salesforce, and so on. A brief search on the Internet will bring up three or four examples of libraries that offer this functionality within a few lines of code. Not having to produce the plumbing of a solution will both save time and generally produce fewer bugs.

Creating a basic report processor
Let's take a look at an incredibly simple report processor example. In the event that a Puppet agent fails to run, the following code will take the incoming data and create a little text file with a short message detailing which host had the problem:

require 'puppet'

Puppet::Reports::register_report(:myfirstreport) do
  desc "My very first report!"

  def process
    if self.status == 'failed'
      msg = "failed puppet run for #{self.host} #{self.status}"
      File.open('/tmp/puppetpanic.txt', 'w') { |f| f.write(msg) }
    end
  end
end

Although this code is basic, it contains all of the components required for a report processor. The first line pulls in the only mandatory library required: the Puppet library. This gives us access to several important methods that allow us to register and describe our report processor, and finally, a method that allows us to process our data.

Registering your report processor
The first method that every report processor must call is the Puppet::Reports::register_report method. This method takes only one argument, which is the name of the report processor. This name should be passed as a symbol and should be an alphanumeric title that starts with a letter (:report3 would be fine, but :3reports would not be). Try to avoid using any other characters; although you can potentially use underscores, the documentation is rather discouragingly vague about how valid this is, and it could well cause issues.

Describing your report processor
After we've called the Puppet::Reports::register_report method, we then need to call the desc method.
The desc method is used to provide some brief documentation for what the report processor does, and it allows the use of Markdown formatting in the string.

Processing your report
The last method that every report processor must include is the process method. The process method is where we actually take our Puppet data and process it, and to make working with the report data easier, you have access to self within the process method. Here, self is a Puppet::Transaction::Report object and gives you access to the Puppet report data. For example, to extract the hostname of the reporting host, we can use self.host. You can find the full details of what is contained in the Puppet::Transaction::Report object by visiting http://docs.puppetlabs.com/puppet/latest/reference/format_report.html.
Let's go through our small example in detail and look at what it's doing. First of all, we require the Puppet library to ensure that we have access to the required methods. We then register our report by calling the Puppet::Reports::register_report(:myfirstreport) method and pass it the name myfirstreport. Next, we add our desc method to tell users what this report is for.
Finally, we have the process method, which is where we are going to place our code to process the report. For this example, we're going to keep it simple and just check whether the Puppet agent reported a successful run or not, and we do this by checking the Puppet status. This is described in the following code snippet:

if self.status == 'failed'
  msg = "failed puppet run for #{self.host} #{self.status}"

The transaction can produce one of three states: failed, changed, or unchanged. This is straightforward; a failed client run is any run that contains a resource with a status of failed, a changed state is triggered when the client run contains a resource that has been given a status of changed, and the unchanged state occurs when a resource contains a value of out_of_sync; this generally happens if you run the Puppet client in noop (simulation) mode.
Finally, we actually do something with the data. In the case of this very simple application, we're going to place the warning into a plain text file in the /tmp directory. This is described in the following code snippet:

msg = "failed puppet run for #{self.host}"
File.open('/tmp/puppetpanic.txt', 'w') { |f| f.write(msg) }

As you can see, we're using basic string interpolation to take some of our report data and place it into the message. This is then written into a simple plain text file in the /tmp directory.

Summary
In this article, we have seen the anatomy of a report processor. We have also seen basic Ruby code that sets up a simple report processor.
Resources for Article:
Further resources on this subject:
Puppet: Integrating External Tools [Article]
Quick start – Using the core Puppet resource types [Article]
External Tools and the Puppet Ecosystem [Article]


Ranges

Packt
22 May 2014
11 min read
(For more resources related to this topic, see here.)

Sorting ranges efficiently
Phobos' std.algorithm includes sorting algorithms. Let's look at how they are used, what requirements they have, and the dangers of trying to implement range primitives without minding their efficiency requirements.

Getting ready
Let's make a linked list container that exposes an appropriate view, a forward range, and an inappropriate view, a random access range that doesn't meet its efficiency requirements. A singly-linked list can only efficiently implement forward iteration due to its nature; the only tool it has is a pointer to the next element. Implementing any other range primitives will require loops, which is not recommended. Here, however, we'll implement a fully functional range, with assignable elements, length, bidirectional iteration, random access, and even slicing on top of a linked list to see the negative effects this has when we try to use it.

How to do it…
We're going to both sort and benchmark this program.

To sort
Let's sort ranges by executing the following steps:
Import std.algorithm.
Determine the predicate you need. The default is (a, b) => a < b, which results in an ascending order when the sorting is complete (for example, [1,2,3]). If you want ascending order, you don't have to specify a predicate at all. If you need descending order, you can pass a greater-than predicate instead, as shown in the following line of code:

auto sorted = sort!((a, b) => a > b)([1,2,3]); // results: [3,2,1]

When doing string comparisons, the functions std.string.cmp (case-sensitive) or std.string.icmp (case-insensitive) may be used, as is done in the following code:

auto sorted = sort!((a, b) => cmp(a, b) < 0)(["b", "c", "a"]); // results: a, b, c

Your predicate may also be used to sort based on a struct member, as shown in the following code:

auto sorted = sort!((a, b) => a.value < b.value)(structArray);

Pass the predicate as the first compile-time argument. The range you want to sort is passed as the runtime argument. If your range is not already sortable (if it doesn't provide the necessary capabilities), you can convert it to an array using the array function from std.range, as shown in the following code:

auto sorted = sort(fibanocci().take(10)); // won't compile, not enough capabilities
auto sorted = sort(fibanocci().take(10).array); // ok, good

Use the sorted range. It has a type unique from the input to signify that it has been successfully sorted. Other algorithms may use this knowledge to increase their efficiency.

To benchmark
Let's sort objects using benchmark by executing the following steps:
Put our range and skeleton main function from the Getting ready section of this recipe into a file.
Use std.datetime.benchmark to test the sorting of an array from the appropriate walker against the slow walker, and print the results at the end of main. The code is as follows:

auto result = benchmark!(
    { auto sorted = sort(list.walker.array); },
    { auto sorted = sort(list.slowWalker); }
)(100);
writefln("Emulation resulted in a sort that was %d times slower.",
    result[1].hnsecs / result[0].hnsecs);

Run it. Your results may vary slightly, but you'll see that the emulated, inappropriate range functions are consistently slower. The following is the output:

Emulation resulted in a sort that was 16 times slower.

Tweak the size of the list by changing the initialization loop. Instead of 1000 entries, try 2000 entries.
Also, try to compile the program with inlining and optimization turned on (dmd -inline -O yourfile.d) and see the difference. The emulated version will be consistently slower, and as the list becomes longer, the gap will widen. On my computer, a growing list size led to a growing slowdown factor, as shown in the following table:

List size    Slowdown factor
500          13
1000         16
2000         29
4000         73

How it works…
The interface to Phobos' main sort function hides much of the complexity of the implementation. As long as we follow the efficiency rules when writing our ranges, things either just work or fail to compile, telling us we must call array on the range before we can sort it. Building an array has a cost in both time and memory, which is why it isn't performed automatically (std.algorithm prefers lazy evaluation whenever possible for best speed and minimum memory use). However, as you can see in our benchmark, building an array is much cheaper than emulating unsupported functions.
The sort algorithms require a full-featured range and will modify the range you pass instead of allocating memory for a copy. Thus, the range you pass must support random access, slicing, and either assignable or swappable elements. The prime example of such a range is a mutable array. This is why it is often necessary to use the array function when passing data to sort.
Our linked list code used static if with a compile-time parameter as a configuration tool. The implemented functions include opSlice and properties that return ref. The ref value can only be used on function return values or parameters. Assignments to a ref value are forwarded to the original item. The opSlice function is called when the user tries to use the slice syntax: obj[start .. end].
Inside the beSlow condition, we broke the main rule of implementing range functions: avoid loops. Here, we see the consequences of breaking that rule; it ruined algorithm restrictions and optimizations, resulting in code that performs very poorly. If we follow the rules, we at least know where a performance problem will arise and can handle it gracefully.
For ranges that do not implement the fast length property, std.algorithm includes a function called walkLength that determines the length by looping through all items (like we did in the slow length property). The walkLength function has a longer name than length precisely to warn you that it is a slower function, running in O(n) (linear with length) time instead of O(1) (constant) time. Slower functions are OK; they just need to be explicit so that the user isn't surprised.

See also
The std.algorithm module also includes other sorting algorithms that may fit a specific use case better than the generic (automatically specialized) function. See the documentation at http://dlang.org/phobos/std_algorithm.html for more information.

Searching ranges
Phobos' std.algorithm module includes search functions that can work on any range. It automatically specializes based on type information. Searching a sorted range is faster than searching an unsorted range.

How to do it…
Searching has a number of different scenarios, each with different methods:
If you want to know whether something is present, use canFind.
Finding an item generically can be done with the find function. It returns the remainder of the range, with the located item at the front.
When searching for a substring in a string, you can use haystack.find(boyerMooreFinder(needle)). This uses the Boyer-Moore algorithm, which may give better performance.
If you want to know the index where the item is located, use countUntil. It returns a numeric index into the range, just like the indexOf function for strings.
Each find function can take a predicate to customize the search operation.
When you know your range is sorted but the type doesn't already prove it, you may call assumeSorted on it before passing it to the search functions. The assumeSorted function has no runtime cost; it only adds information to the type that is used for compile-time specialization.

How it works…
The search functions in Phobos make use of the ranges' available features to choose good-fit algorithms. Pass them efficiently implemented ranges with accurate capabilities to get the best performance.
The find function returns the remainder of the data because this is the most general behavior; it doesn't need random access, as returning an index would, and it doesn't require an additional function if you are implementing a function to split a range on a given condition. The find function can work with a basic input range, serving as a foundation to implement whatever you need on top of it, and it will transparently optimize to use more range features if they are available.

Using functional tools to query data
The std.algorithm module includes a variety of higher-order ranges that provide tools similar to functional tools. Here, we'll see how D code can be similar to a SQL query. A SQL query is as follows:

SELECT id, name, strcat("Title: ", title)
FROM users
WHERE name LIKE 'A%'
ORDER BY id DESC
LIMIT 5;

How would we express something similar in D?

Getting ready
Let's create a struct to mimic the data table and make an array with some demo information. The code is as follows:

struct User {
    int id;
    string name;
    string title;
}

User[] users;
users ~= User(1, "Alice", "President");
users ~= User(2, "Bob", "Manager");
users ~= User(3, "Claire", "Programmer");

How to do it…
Let's use functional tools to query data by executing the following steps:
Import std.algorithm.
Use sort to translate the ORDER BY clause. If your dataset is large, you may wish to sort it at the end. This will likely require a call to array, but it will only sort the result set instead of everything. With a small dataset, sorting early saves an array allocation.
Use filter to implement the WHERE clause.
Use map to implement the field selection and functions. The std.typecons.tuple module can also be used to return specific fields.
Use std.range.take to implement the LIMIT clause.
Put it all together and print the result. The code is as follows:

import std.algorithm;
import std.range;
import std.typecons : tuple; // we use this below

auto resultSet = users.
    sort!((a, b) => a.id > b.id).                 // the ORDER BY clause
    filter!((item) => item.name.startsWith("A")). // the WHERE clause
    take(5).
    map!((item) => tuple(item.id, item.name, "Title: " ~ item.title)); // the field list and transformations

import std.stdio;
foreach(line; resultSet)
    writeln(line[0], " ", line[1], " ", line[2]);

It will print the following output:

1 Alice Title: President

How it works…
Many SQL operations or list comprehensions can be expressed in D using some building blocks from std.algorithm. They all work in generally the same way; they take a predicate as a compile-time argument. The predicate is passed one or two items at a time, and you perform a check or transformation on it. Chaining functions together with the dot syntax, like we did here, is possible thanks to uniform function call syntax.
It could also be rewritten as take(5, filter!pred(map!pred(users))). It depends on the author's preference, as both styles work exactly the same way.
It is important to remember that all std.algorithm higher-order ranges are evaluated lazily. This means no computations, such as looping over or printing, are actually performed until they are required. Writing code using filter, take, map, and many other functions is akin to preparing a query. To execute it, you may print or loop over the result, or if you want to save it to an array for later use, simply call .array at the end.

There's more…
The std.algorithm module also includes other classic functions, such as reduce. It works the same way as the others.
D has a feature called pure functions. The functions in std.algorithm are conditionally pure, which means they can be used in pure functions if and only if the predicates you pass are also pure. With lambda functions, like the ones we've been using here, the compiler will often deduce this for you automatically. If you use other functions that you define as predicates and want to use them in a pure function, be sure to mark them pure as well.

See also
Visit http://dconf.org/2013/talks/wilson.html, where Adam Wilson's DConf 2013 talk on porting C# to D showed how to translate some real-world LINQ code to D.

Summary
In this article, we learned how to sort ranges in an efficient manner by using sorting algorithms. We learned how to search a range using different functions. We also learned how to use functional tools to query data (similar to a SQL query).
Resources for Article:
Further resources on this subject:
Watching Multiple Threads in C# [article]
Application Development in Visual C++ - The Tetris Application [article]
Building UI with XAML for Windows 8 Using C [article]


Continuous Integration

Packt
20 May 2014
14 min read
(For more resources related to this topic, see here.)
This article is named Continuous Integration; so, what exactly does this mean? You can find many long definitions, but to put it simply, it is a process where you integrate your code with code from other developers and run tests to verify the code functionality. You are aiming to detect problems as soon as possible and trying to fix problems immediately. It is always easier and cheaper to fix a couple of small problems than one big problem. This can be translated to the following workflow:
The change is committed to a version control system repository (such as Git or SVN).
The Continuous Integration (CI) server is either notified of, or detects, a change and then runs the defined tests.
CI notifies the developer if the tests fail.
With this method, you immediately know who created the problem and when. For the CI server to be able to run tests after every commit, these tests need to be fast. Usually, you can do this with unit tests; for integration and functional tests, it might be better to run them within a defined time interval, for example, once every hour. You can have multiple sets of tests for each project, and another golden rule should be that no code is released to the production environment until all of the tests have passed.
It may seem surprising, but these rules and processes shouldn't make your work any slower; in fact, they should allow you to work faster and be more confident about the developed code functionality and changes. The initial investment pays off when you can focus on adding new functionality and are not spending time on tracking bugs and fixing problems. Also, tested and reliable code can be released to the production environment more frequently than traditional big releases, which require a lot of manual testing and verification. There is a real impact on business, and it's no longer just a discussion about whether it is worthwhile to write some tests and find yourself restricted by some seemingly stupid rules.
What will really help, and is necessary, is a CI server for executing tests and processing the results; this is also called test automation. Of course, in theory you can write a script for it and test it manually, but why would you do that when there are some really nice and proven solutions available? Save your time and energy to do something more useful. In this article, we will see what we can do with the most popular CI servers used by the PHP community:
Travis CI
Jenkins CI
Xinc
For us, a CI server will always have the same main task, that is, to execute tests, but to be precise, it includes the following steps:
Check out the code from the repository.
Execute the tests.
Process the results.
Send a notification when tests fail.
This is the bare minimum that a server must handle. Of course, there is much more on offer, but these steps must be easy to configure.

Using a Travis CI hosted service
Travis is the easiest to use of the previously mentioned servers. Why is this the case? This is because you don't have to install it. It's a service that provides integration with GitHub for many programming languages, and not just for PHP. Primarily, it's a solution for open source projects, meaning your repository on GitHub is a public repository. It also has commercial support for private repositories and commercial projects.
What is really good is that you don't have to worry about server configuration; instead, you just have to specify the required configuration (in the same way you do with Composer), and Travis does everything for you. You are not limited to just unit tests; you can even specify which database you want to use and run integration tests there.
However, there is also a disadvantage to this solution. If you want to use it for a private repository, you have to pay for the service, and you are also limited with regard to the server configuration. You can specify your PHP version, but it's not recommended to specify a minor version such as 5.3.8; you should instead use a major version, such as 5.3. On the other hand, you can run tests against various PHP versions, such as PHP 5.3, 5.4, or 5.5, so when you want to upgrade your PHP version, you already have the test results and know how your code will behave with the new PHP version. Travis has become the CI server of choice for many open source projects, and it's no real surprise because it's really good!

Setting up Travis CI
To use Travis, you will need an account on GitHub. If you haven't got one, navigate to https://github.com/ and register there. When you have a GitHub account, navigate to https://travis-ci.org/ and click on Sign in with GitHub.
As you can see in the preceding screenshot, there will be a Travis application added to your GitHub account. This application will work as a trigger that starts a build after any change is pushed to the GitHub repository. To configure the Travis project, you have to follow these steps:
You will be asked to allow Travis to access your account. When you do this, you will go back to the Travis site, where you will see a list of your GitHub repositories.
By clicking on On/Off, you can decide which projects should be used by Travis.
When you click on a project configuration, you will be taken to GitHub to enable the service hook. This is because a build has to run after every commit, and Travis is going to be notified about this change. In the menu, search for Travis and fill in the details that you can find in your Travis account settings. Only the username and token are required; the domain is optional.
For a demonstration, you can refer to my sample project, where there is just one test suite, and its purpose is to test how Travis works (navigate to https://github.com/machek/travis):

Using Travis CI
When you link your GitHub account to Travis and set up a project to notify Travis, you need to configure the project. You need to follow the project setup in the same way that we did earlier. You are required to have the classes, the test suites that you want to run, a bootstrap file, and a phpunit.xml configuration file. You should try this configuration locally to ensure that you can run PHPUnit, execute the tests, and make sure that all tests pass.
If you cloned the sample project, you will see that there is one important file: .travis.yml. This Travis configuration file tells Travis what the server configuration should look like, and also what will happen after each commit.
Let's have a look at what this file looks like:

# see http://about.travis-ci.org/docs/user/languages/php/ for more hints
language: php

# list any PHP version you want to test against
php:
  - 5.3
  - 5.4

# optionally specify a list of environments
env:
  - DB=mysql

# execute any number of scripts before the test run, custom env's are available as variables
before_script:
  - if [[ "$DB" == "mysql" ]]; then mysql -e "create database IF NOT EXISTS my_db;" -uroot; fi

# omitting "script:" will default to phpunit
script: phpunit --configuration phpunit.xml --coverage-text

# configure notifications (email, IRC, campfire etc)
notifications:
  email: "your@email"

As you can see, the configuration is really simple: it says that we need PHP 5.3 and 5.4 and a MySQL database, that a database should be created, that PHPUnit should be executed with our configuration, and that a report should be sent to my e-mail address. After each commit, PHPUnit executes all the tests. The following screenshot shows us an interesting insight into how Travis executes our tests and which environment it uses:
You can view the build and the history of all builds. Even though there are no real builds in PHP, because PHP is an interpreted language and not a compiled one, the action performed when you clone a repository, execute PHPUnit tests, and process the results is usually called a build.
Travis configuration can be much more complex, and you can run Composer to update dependencies and much more. Just check the Travis documentation for PHP at http://about.travis-ci.org/docs/user/languages/php/.

Using the Jenkins CI server
Jenkins is a CI server. The difference between Travis and Jenkins is that when you use Travis as a service, you don't have to worry about the configuration, whereas Jenkins is a piece of software that you install on your own hardware. This is both an advantage and a disadvantage. The disadvantage is that you have to manually install it, configure it, and also keep it up to date. The advantage is that you can configure it in a way that suits you, and all of the data and code is completely under your control. This can be very important when you have customer code and data (for testing, never use live customer data) or sensitive information that can't be passed on to a third party.
The Jenkins project started as a fork of the Hudson project and is written in Java, but it has many plugins that suit a variety of programming languages, including PHP. In recent years, it has become very popular, and nowadays it is probably the most popular CI server. The reasons for its popularity are that it is really good, can be configured easily, and there are many plugins available that probably cover everything you might need.

Installation
Installation is a really straightforward process. The easiest method is to use a Jenkins installation package from http://jenkins-ci.org/. There are packages available for Windows, OS X, and Linux, and the installation process is well documented there. Jenkins is written in Java, which means that Java or OpenJDK is required. After this comes the installation: you just launch the installer and point it to where Jenkins should be installed, and Jenkins then listens on port 8080.
Before we move on to configuring the first project (or job, in Jenkins terminology), we need to install a few extra plugins. This is Jenkins' biggest advantage. There are many plugins and they are very easy to install. It doesn't matter that Jenkins is a Java app, as it also serves PHP very well.
For our task to execute tests, process results, and send notifications, we need the following plugins:
Email-ext: This plugin is used to send notifications
Git or Subversion: This plugin is used to check out the code
xUnit: This plugin is used for processing the PHPUnit test results
Clover PHP: This plugin is used for processing the code coverage
To install these plugins, navigate to Jenkins | Manage Jenkins | Manage Plugins and select the Available tab. You can find and check the required plugins, or alternatively use the search filter to find the one you need:
For e-mails, you might need to configure the SMTP server connection in the Manage Jenkins | Configure System | E-mail notification section.

Usage
By now, we should have installed everything that we need, and we can start to configure our first simple project. We can use the same simple project that we used for Travis. It has just one test case, but it is important to learn how to set up a project. It doesn't matter whether you have one test or thousands of tests; the setup is going to be the same.

Creating a job
The first step is to create a new job. Select New Job from the Jenkins main navigation window, give it a name, and select Build a free-style software project. After clicking on OK, you get to the project configuration page. The most interesting things there are listed as follows:
Source Code Management: This is where you check out the code
Build Triggers: This specifies when to run the build
Build: This executes the tests for us
Post-build Actions: This publishes results and sends notifications
The following screenshot shows the project configuration window in Jenkins CI:

Source Code Management
Source code management simply refers to your version control system, the path to the repository, and the branch or branches to be used. Every build is a clean operation, which means that Jenkins starts with a new directory into which the code is checked out.

Build Triggers
Build triggers are an interesting feature. You don't have to use them and you can start a build manually, but it is better to specify when a build should run. It can run periodically at a given interval (every two hours), or you can trigger a build remotely. One way to trigger a build is to use post-commit hooks in the Git/SVN repository. A post-commit hook is a script that is executed after every commit. Hooks are stored in the repository in the hooks directory (.git/hooks for Git and /hooks for SVN). What you need to do is create a post-commit (SVN) or post-receive (Git) script that calls the URL given by Jenkins when you enable the option to trigger builds remotely with a secret token:

#!/bin/sh
wget http://localhost:8080/job/Sample_Project/build?token=secret12345ABC -O /dev/null

After every commit/push to the repository, Jenkins will receive a request to run the build and execute the tests to check whether all of the tests pass and that no code change is causing unexpected problems.

Build
A build is something that might sound weird in the PHP world, as PHP is interpreted and not compiled; so, why do we call it a build? It's just a word. For us, it refers to the main part of the process: executing the unit tests. You have to navigate to Add a build step and click on either Execute Windows batch command or Execute shell. This depends on your operating system, but the command remains the same:

phpunit --log-junit=result.xml --coverage-clover=clover.xml

This is simple and outputs what we want.
It executes the tests, stores the results in the JUnit format in the result.xml file, and generates code coverage in the Clover format in the clover.xml file. I should probably mention that PHPUnit is not installed with Jenkins; the build machine on which Jenkins is running must have PHPUnit installed and configured, including the PHP CLI.

Post-build Actions
In our case, there are three post-build actions required. They are listed as follows:
Process the test result: This denotes whether the build succeeded or failed. You need to navigate to Add a post-build action | Publish JUnit test result report and type result.xml. This matches the switch --log-junit=result.xml. Jenkins will use this file to check the test results and publish them.
Generate code coverage: This is similar to the first step. You have to add the Publish Clover PHP Coverage report field and type clover.xml. It uses the second switch, --coverage-clover=clover.xml, to generate code coverage, and Jenkins uses this file to create a code coverage report.
E-mail notification: It is a good idea to send an e-mail when a build fails in order to inform everybody that there is a problem, and maybe even let them know who caused the problem and what the last commit was. This step can be added simply by choosing the E-mail notification action.

Results
The result could be just an e-mail notification, which is handy, but Jenkins also has a very nice dashboard that displays the current status of each job, and you can also view the build history to see when and why a build failed. A nice feature is that you can drill down through the test results or code coverage and find more details about test cases and code coverage per class.
To make testing even more interesting, you can use Jenkins' Continuous Integration Game plugin. Every developer receives positive points for written tests and successful builds, and negative points for every build that they break. The game leaderboard shows who is winning the build game and writing better code.

Working with a Neo4j Embedded Database

Packt
09 May 2014
6 min read
(For more resources related to this topic, see here.)
Neo4j is a graph database, which means that it does not use tables and rows to represent data logically; instead, it uses nodes and relationships. Both nodes and relationships can have a number of properties. While relationships must have exactly one direction and one type, nodes can have a number of labels. For example, the following diagram shows three nodes and their relationships, where every node has a label (language or graph database), while the relationships have a type (QUERY_LANGUAGE_OF and WRITTEN_IN).
The properties used in the graph shown in the following diagram are: name, type, and from. Note that every relationship must have exactly one type and one direction, whereas labels for nodes are optional and can be multiple.

Neo4j running modes
Neo4j can be used in two modes:
An embedded database in a Java application
A standalone server via REST
In any case, this choice does not affect the way you query and work with the database. It's only an architectural choice driven by the nature of the application (standalone or client-server), performance, monitoring, and safety of data.

An embedded database
An embedded Neo4j database is the best choice for performance. It runs in the same process as the client application that hosts it and stores data in the given path. Thus, an embedded database must be created programmatically. We choose an embedded database for the following reasons:
When we use Java as the programming language for our project
When our application is standalone

Preparing the development environment
The fastest way to prepare the IDE for Neo4j is by using Maven. Maven is a dependency management and automated build tool. In the following procedure, we will use NetBeans 7.4, but it works in a very similar way with other IDEs (for Eclipse, you would need the m2eclipse plugin). The procedure is described as follows:
Create a new Maven project as shown in the following screenshot:
In the next page of the wizard, name the project, set a valid project location, and then click on Finish.
After NetBeans has created the project, expand Project Files in the project tree, open the pom.xml file, and insert the following XML code:

<dependencies>
  <dependency>
    <groupId>org.neo4j</groupId>
    <artifactId>neo4j</artifactId>
    <version>2.0.1</version>
  </dependency>
</dependencies>
<repositories>
  <repository>
    <id>neo4j</id>
    <url>http://m2.neo4j.org/content/repositories/releases/</url>
    <releases>
      <enabled>true</enabled>
    </releases>
  </repository>
</repositories>

This code tells Maven which dependency we are using in our project, that is, Neo4j. The version we have used here is 2.0.1. Of course, you can specify the latest available version. Once saved, the Maven file resolves the dependency, downloads the needed JAR files, and updates the Java build path. Now, the project is ready to use Neo4j and Cypher.

Creating an embedded database
Creating an embedded database is straightforward. First of all, to create a database, we need a GraphDatabaseFactory class, which can be done with the following code:

GraphDatabaseFactory graphDbFactory = new GraphDatabaseFactory();

Then, we can invoke the newEmbeddedDatabase method with the following code:

GraphDatabaseService graphDb = graphDbFactory
    .newEmbeddedDatabase("data/dbName");

Now, with the GraphDatabaseService class, we can fully interact with the database, create nodes, create relationships, and set properties and indexes.
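As a quick, hedged illustration of that API (this snippet is not taken from the article's project, and the label, property, and relationship names are only examples inspired by the diagram described earlier), creating two connected nodes inside a transaction could look like this:

import org.neo4j.graphdb.DynamicLabel;
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;

// graphDb is the GraphDatabaseService obtained above
try (Transaction tx = graphDb.beginTx()) {
    Node cypher = graphDb.createNode(DynamicLabel.label("Language"));
    cypher.setProperty("name", "Cypher");

    Node neo4j = graphDb.createNode(DynamicLabel.label("GraphDatabase"));
    neo4j.setProperty("name", "Neo4j");

    // relationships always have a direction and exactly one type
    cypher.createRelationshipTo(neo4j,
            DynamicRelationshipType.withName("QUERY_LANGUAGE_OF"));

    tx.success(); // mark the transaction as successful so it commits on close
}

All write operations must happen inside a transaction, which is also why the Cypher iteration example later in this article wraps node access in a Transaction block.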
Invoking Cypher from Java
To execute Cypher queries on a Neo4j database, you need an instance of ExecutionEngine; this class is responsible for parsing and running Cypher queries, returning results in an ExecutionResult instance:

import org.neo4j.cypher.javacompat.ExecutionEngine;
import org.neo4j.cypher.javacompat.ExecutionResult;
// ...
ExecutionEngine engine = new ExecutionEngine(graphDb);
ExecutionResult result = engine.execute("MATCH (e:Employee) RETURN e");

Note that we use the org.neo4j.cypher.javacompat package and not the org.neo4j.cypher package, even though they are almost the same. The reason is that Cypher is written in Scala, and the Cypher authors provide us with the former package for better Java compatibility.
Now, with the results, we can do one of the following:
Dump them to a string value
Convert them to a single-column iterator
Iterate over the full rows
Dumping to a string is useful for testing purposes:

String dumped = result.dumpToString();

If we print the dumped string to the standard output stream, we will get the following result:
Here, we have a single column (e) that contains the nodes. Each node is dumped with all its properties. The numbers between the square brackets are the node IDs, which are the long and unique values assigned by Neo4j on the creation of the node.
When the result is a single column, or we need only one column of our result, we can get an iterator over one column with the following code:

import org.neo4j.graphdb.ResourceIterator;
// ...
ResourceIterator<Node> nodes = result.columnAs("e");

Then, we can iterate over that column in the usual way, as shown in the following code:

while(nodes.hasNext()) {
    Node node = nodes.next();
    // do something with node
}

However, Neo4j provides a syntax-sugar utility to shorten the iteration code:

import org.neo4j.helpers.collection.IteratorUtil;
// ...
for (Node node : IteratorUtil.asIterable(nodes)) {
    // do something with node
}

If we need to iterate over a multiple-column result, we would write this code in the following way:

ResourceIterator<Map<String, Object>> rows = result.iterator();
for(Map<String,Object> row : IteratorUtil.asIterable(rows)) {
    Node n = (Node) row.get("e");
    try(Transaction t = n.getGraphDatabase().beginTx()) {
        // do something with node
    }
}

The iterator function returns an iterator of maps, where the keys are the names of the columns. Note that when we have to work with nodes, even if they are returned by a Cypher query, we have to work in a transaction. In fact, Neo4j requires that every time we work with the database, either reading from or writing to it, we must be in a transaction. The only exception is when we launch a Cypher query. If we launch the query within an existing transaction, Cypher will work like any other operation: no change will be persisted in the database until we commit the transaction. However, if we run the query outside any transaction, Cypher will open a transaction for us and will commit the changes at the end of the query.

Summary
We have now completed setting up a Neo4j database. We also learned about Cypher pattern matching.
Resources for Article:
Further resources on this subject:
OpenSceneGraph: Advanced Scene Graph Components [Article]
Creating Network Graphs with Gephi [Article]
Building a bar graph cityscape [Article]


Differences in style between Java and Scala code

Packt
22 Apr 2014
6 min read
(For more resources related to this topic, see here.)
Writing an algorithm in Java follows an imperative style, that is, a sequence of statements that change a program state. Scala, focusing primarily on functional programming, adopts a more declarative approach, where everything is an expression rather than a statement. Let's illustrate this with an example. In Java, you would commonly find the following code snippet:

...
String customerLevel = null;
if(amountBought > 3000) {
    customerLevel = "Gold";
} else {
    customerLevel = "Silver";
}
...

The Scala equivalent consists of the following code snippet:

scala> val amountBought = 5000
amountBought: Int = 5000

scala> val customerLevel = if (amountBought > 3000) "Gold" else "Silver"
customerLevel: String = Gold

Note that unlike the Java statements, if is now embedded as part of the resulting evaluated expression. In general, working in a style where everything is evaluated as an expression (and here an immutable expression) makes reuse as well as composition much easier. Being able to chain the result of one expression to the next gives you a concise way of expressing fairly complicated transformations that would require much more code in Java.

Adjusting the code layout
As the intent of functional programming is to minimize stateful behavior, it often consists of short lambda expressions, so that you can visualize a fairly complicated transformation in an elegant and concise way, in many cases even as a one-liner. For this reason, general formatting in Scala recommends that you use only two-space indentation instead of the four-space indentation that is generally adopted in Java code, as shown in the following code snippet:

scala> class Customer(
  val firstName: String,
  val lastName: String,
  val age: Int,
  val address: String,
  val country: String,
  val hasAGoodRating: Boolean
) {
  override def toString() = s" $firstName $lastName"
}
defined class Customer

If you have many constructor/method parameters, having them aligned as previously illustrated makes it easier to change them without the need to reformat the whole indentation. The same applies if you want to refactor the class to a longer name, for example, VeryImportantCustomer instead of Customer; it will produce smaller and more precise diffs in your version control system (Git, Subversion, and so on).

Naming conventions
Conventions for naming packages, classes, fields, and methods in camel case generally follow the Java conventions. Note that you should avoid the underscore (_) in variable names (such as first_name or _first_name), as the underscore has a special meaning in Scala (it is used as a placeholder in anonymous functions). However, constants, most likely declared as private static final myConstant in Java, are normally declared in Scala in upper camel case, such as in the following enclosing object:

scala> object Constants {
     |   val MyNeverChangingAge = 20
     | }
defined module Constants

Choosing a meaningful name for variables and methods should always be a priority in Java, and it is often recommended to use rather long variable names to precisely describe what a variable or method represents. In Scala, things are a little bit different; meaningful names are, of course, a good way to make code more readable. However, as we are at the same time aiming at making behavior transformations concise through the use of functions and lambda expressions, short variable names can be an advantage if you can capture a whole piece of functionality in a short block of code.
For example, incrementing a list of integers in Scala can simply be expressed as follows:

scala> val amounts = List(3,6,7,10) map ( x => x + 1 )
amounts: List[Int] = List(4, 7, 8, 11)

Although using x as a variable name is often discouraged in Java, here it does not matter that much, as the variable is not reused and we can take in the transformation it performs at a glance. There are many short or long alternatives to the previous lambda syntax that will produce the same result. So, which one to choose? Some of the alternatives are as follows:

scala> val amounts = List(3,6,7,10) map ( myCurrentAmount => myCurrentAmount + 1 )
amounts: List[Int] = List(4, 7, 8, 11)

In this case, a long variable name breaks a clear and concise one-liner into two lines of code, thereby making it harder to understand. Meaningful names make more sense if we start expressing logic over several lines, as shown in the following code snippet:

scala> val amounts = List(3,6,7,10) map { myCurrentAmount =>
     |   val result = myCurrentAmount + 1
     |   println("Result: " + result)
     |   result
     | }
Result: 4
Result: 7
Result: 8
Result: 11
amounts: List[Int] = List(4, 7, 8, 11)

A shorter but still expressive name is sometimes a good compromise to indicate to the reader that this is an amount we are currently manipulating in our lambda expression, as follows:

scala> val amounts = List(3,6,7,10) map( amt => amt + 1 )
amounts: List[Int] = List(4, 7, 8, 11)

Finally, the shortest syntax of all, well accepted by fluent Scala programmers for such a simple increment function, is as follows:

scala> val amounts = List(3,6,7,10) map( _ + 1 )
amounts: List[Int] = List(4, 7, 8, 11)

Underscores are also encountered in Scala for expressing more complicated operations in an elegant but arguably more cryptic way, as in the following sum operation using the foldLeft method, which accumulates state from one element to the next:

scala> val sumOfAmounts = List(3,6,7,10).foldLeft(0)( _ + _ )
sumOfAmounts: Int = 26

Instead of explicitly having 0 as the initial value for the sum, we can write this summation a bit more elegantly by using the reduce method, which is similar to foldLeft except that the first element of the collection is taken as the initial value (here, 3 will be the initial value), as shown in the following command:

scala> val sumOfAmounts = List(3,6,7,10) reduce ( _ + _ )
sumOfAmounts: Int = 26

As far as style is concerned, fluent Scala programmers will not have any problem reading this code. However, if the state accumulation operation is more complicated than a simple + operation, it might be wise to write it more explicitly, as shown in the following command:

scala> val sumOfAmounts = List(3,6,7,10) reduce ( (total, element) => total + element )
sumOfAmounts: Int = 26

Summary
In this article, we discussed the style differences and naming conventions that we must be aware of in order to write easier-to-read and more maintainable code.
Resources for Article:
Further resources on this subject:
The Business Layer (Java EE 7 First Look) [article]
Getting Started with JavaFX [article]
Enterprise JavaBeans [article]