How-To Tutorials - Programming

1081 Articles

Apache Karaf – Provisioning and Clusters

Packt
18 Jul 2014
12 min read
In this article, we will cover the following topics:

- What is OSGi and what are its key features?
- The role of the OSGi framework
- The OSGi base artifact (the OSGi bundle) and the concept of dependencies between bundles
- The Apache Karaf OSGi container and the provisioning of applications in the container
- How to manage provisioning on multiple Karaf instances

What is OSGi?

Developers are always looking for dynamic, flexible, and agile software components, for the following reasons:

- Reuse: Instead of duplicating code, a component should be shared by other components, and multiple versions of the same component should be able to cohabit.
- Visibility: A component should not use the implementation of another component directly. The implementation should be hidden, and the client module should use the interface provided by the other component.
- Agility: Deploying a new version of a component should not require a platform restart. Moreover, a configuration change should not require a restart. For instance, it's not acceptable to restart a production platform just to change a log level; a minor change such as a log level should be dynamic, and the platform should be agile enough to reload only the components that need to be reloaded.
- Discovery: A component should be able to discover other components. It's a kind of Plug and Play system: as soon as a component needs another component, it just looks for it and uses it.

OSGi was created to address these points. The core concept is to force developers to use a very modular architecture in order to reduce complexity. As this paradigm is applicable to most modern systems, OSGi is now used for small embedded devices as well as for very large systems. Different applications and systems use OSGi: desktop applications, application servers, frameworks, embedded devices, and so on.

The OSGi framework

OSGi is designed to run in Java. In order to provide these features and deploy OSGi applications, a core layer has to be deployed in the Java Virtual Machine (JVM): the OSGi framework. This framework manages the life cycle of, and the relationships between, the different OSGi components and artifacts.

The OSGi bundle

In OSGi, components are packaged as OSGi bundles. An OSGi bundle is a simple Java JAR (Java ARchive) file that contains additional metadata used by the OSGi framework. This metadata is stored in the manifest file of the JAR, for example:

```
Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-Version: 2.1.6
Bundle-Name: My Logger
Bundle-SymbolicName: my_logger
Export-Package: org.my.osgi.logger;version=2.1
Import-Package: org.apache.log4j;version="[1.2,2)"
Private-Package: org.my.osgi.logger.internal
```

We can see that OSGi is very descriptive and verbose: we explicitly describe all the OSGi metadata (headers), including the packages that we export or import with a specified version or version range. As the OSGi headers are defined in the META-INF/MANIFEST file contained in the JAR, an OSGi bundle is a regular JAR file that you can also use outside of OSGi. The life cycle layer of the OSGi framework is an API to install, start, stop, update, and uninstall OSGi bundles.

Dependency between bundles

An OSGi bundle can use other bundles from the OSGi framework in two ways. The first way is static code sharing.
When we say that a bundle exports packages, it means the bundle exposes some of its code to other bundles. Conversely, when we say that a bundle imports packages, it means the bundle uses code from other bundles. For instance, we have bundle A (packaged as the bundleA.jar file) with the following META-INF/MANIFEST file:

```
Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-Version: 1.0.0
Bundle-Name: Bundle A
Bundle-SymbolicName: bundle_a
Export-Package: com.bundle.a;version=1.0
```

We can see that bundle A exposes (exports) the com.bundle.a package with version 1.0. On the other hand, we have bundle B (packaged as the bundleB.jar file) with the following META-INF/MANIFEST file:

```
Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-Version: 2.0.0
Bundle-Name: Bundle B
Bundle-SymbolicName: bundle_b
Import-Package: com.bundle.a;version="[1.0,2)"
```

We can see that bundle B imports (and so will use) the com.bundle.a package in any version between 1.0 (inclusive) and 2 (exclusive). This means the OSGi framework will wire the two bundles, as bundle A provides the package required by bundle B (so the constraint is resolved). The mechanism is similar to regular Java applications, but instead of embedding the required JAR files in your application, you just declare the code you expect. The OSGi framework is responsible for the link between the different bundles; this is done by the modules layer of the OSGi framework.

This approach is interesting when you want to use code that was not natively designed for OSGi, and it's a step forward for the reuse of components. However, it provides only a limited answer to the goals described earlier in the article, especially visibility and discovery.

The second way in which an OSGi bundle can use other bundles is more interesting: it applies Service-Oriented Architecture (SOA) to low-level components. Here, rather than exposing code, an OSGi bundle exposes an OSGi service, and another bundle can consume that OSGi service. The services layer of the OSGi framework provides a service registry and all the plumbing mechanisms to wire the services. OSGi services provide a very dynamic system, offering a Publish-Find-Bind model for the bundles.

The OSGi container

The OSGi container provides a set of additional features on top of the OSGi framework. Apache Karaf provides the following features:

It provides an abstraction of the OSGi framework. If you write an OSGi application, you have to package your application tightly coupled with the OSGi framework (such as Apache Felix or Eclipse Equinox). Most of the time, you have to prepare scripts, configuration files, and so on in order to provide a complete, ready-to-use application. Apache Karaf allows you to focus only on your application: Karaf, by default, provides the packaging (including scripts and so on), and it also abstracts the OSGi framework. Thanks to Karaf, it's very easy to switch from Apache Felix (the default framework in Karaf) to Eclipse Equinox.

It provides support for the OSGi Blueprint and Spring frameworks. Apache Karaf allows you to directly use Blueprint or Spring as the dependency framework in your bundles. Newer versions of Karaf (starting from Karaf 3.0.1) also support additional dependency frameworks (such as DS, CDI, and so on).

Apache Karaf provides a complete, Unix-like shell console where you have a lot of commands available to manage and monitor your running container.
This shell console works on any system supporting Java and provides a complete Unix-like environment, including completion, contextual help, key bindings, and more. You can access the shell console using SSH. Apache Karaf also provides a complete management layer (using JMX) that is remotely accessible, which means you can perform the same actions as you do using the shell commands with several MBeans. In addition to the default root Apache Karaf container, for convenience, Apache Karaf allows you to manage multiple container instances. Apache Karaf provides dedicated commands and MBeans to create the instances, control the instances, and so on. Logging is a key layer for any kind of software container. Apache Karaf provides a powerful and very dynamic logging system powered by Pax Logging. In your OSGi application, you are not coupled to a specific logging framework; you can use the framework of your choice (slf4j, log4j, logback, commons-logging, and so on). Apache Karaf uses a central configuration file irrespective of the logging frameworks in use. All changes in this configuration file are made on the fly; no need to restart anything. Again, Apache Karaf provides commands and MBeans dedicated to log management (changing the log level, direct display of the log in the shell console, and so on). Hot deployment is also an interesting feature provided by Apache Karaf. By default, the container monitors a deploy folder periodically. When a new file is dropped in the deploy folder, Apache Karaf checks the file type and delegates the deployment logic for this file to a deployer. Apache Karaf provides different deployers by default (spring, blueprint, features, war, and so on). If Java Authentication and Authorization Service (JAAS) is the Java implementation of Pluggable Authentication Modules (PAM), it's not very OSGi compliant by default. Apache Karaf leverages JAAS, exposing realm and login modules as OSGi services. Again, Apache Karaf provides dedicated JAAS shell commands and MBeans. The security framework is very flexible, allowing you to define the chain of login modules that you want for authentication. By default, Apache Karaf uses a PropertiesLoginModule using the etc/users.properties file for storage. The security framework also provides support for password encryption (you just have to enable encryption in the etc/org.apache.karaf.jaas.cfg configuration file). The new Apache Karaf version (3.0.0) also provides a complete Role Based Access Control (RBAC) system, allowing you to configure the users who can run commands, call MBeans, and so on. Apache Karaf is an enterprise-ready container and provides features dedicated to enterprise. The following enterprise features are not installed by default (to minimize the size and footprint of the container by default), but a simple command allows you to extend the container with enterprise functionalities: WebContainer allows you to deploy a Web Application Bundle (WAB) or WAR file. Apache Karaf is a complete HTTP server with JSP/servlet support, thanks to Pax Web. Java Naming and Directory Interface (JNDI) adds naming context support in Apache Karaf. You can bind an OSGi service to a JNDI name and look up these services using the name, thanks to Aries and Xbean naming. Java Transaction API (JTA) allows you to add a transaction engine (exposed as an OSGi service) in Apache Karaf, thanks to Aries JTA. Java Persistence API (JPA) allows you to add a persistence adapter (exposed as an OSGi service) in Apache Karaf, thanks to Aries JPA. 
Ready-to-use persistence engines can also be installed very easily (especially Apache OpenJPA and Hibernate). Java Database Connectivity (JDBC) or Java Message Service (JMS) are convenient features, allowing you to easily create JDBC DataSources or JMS ConnectionFactories and use them directly in the shell console. If you can completely administrate Apache Karaf using the shell commands and the JMX MBeans, you can also install Web Console. This Web Console uses the Felix Web Console and allows you to manage Karaf with a simple browser. Thanks to these features, Apache Karaf is a complete, rich, and enterprise-ready container. We can consider Apache Karaf as an OSGi application server. Provisioning in Apache Karaf In addition, Apache Karaf provides three core functionalities that can be used both internally in Apache Karaf or can be used by external applications deployed in the container: OSGi bundle management Configuration management Provisioning using Karaf Features As we learned earlier, the default artifact in OSGi is the bundle. Again, it's a regular JAR file with additional OSGi metadata in the MANIFEST file. The bundles are directly managed by the OSGi framework, but for convenience, Apache Karaf wraps the usage of bundles in specific commands and MBeans. A bundle has a specific life cycle. Especially when you install a bundle, the OSGi framework tries to resolve all the dependencies required by your bundle to promote it in a resolved state. The following is the life cycle of a bundle: The OSGi framework checks whether other bundles provide the packages imported by your bundle. The equivalent action for the OSGi services is performed when you start your bundle. It means that a bundle may require a lot of other bundles to start and so on for the transitive bundles. Moreover, a bundle may require configuration to work. Apache Karaf proposes a very convenient way to manage the configurations. The etc folder is periodically monitored to discover new configuration files and load the corresponding configurations. On the other hand, you have dedicated shell commands and MBeans to manage configurations (and configuration files). If a bundle requires a configuration to work, you first have to create a configuration file in the etc folder (with the expected filename) or use the config:* shell command or ConfigMBean to create the configuration. Considering that an OSGi application is a set of bundles, the installation of an OSGi application can be long and painful by hand. The deployment of an OSGi application is called provisioning as it gathers the following: The installation of a set of bundles, including transitive bundles The installation of a set of configurations required by these bundles OBR OSGi Bundle Repository (OBR) can be the first option to be considered in order to solve this problem. Apache Karaf can connect to the OBR server. The OBR server stores all the metadata for all the bundles, which includes the capabilities, packages, and services provided by a bundle and the requirements, packages, and services needed by a bundle. When you install a bundle via OBR, the OBR server checks the requirement of the installed bundle and finds the bundles that provide the capabilities matching the requirements. The OBR server can automatically install the bundles required for the first one.
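Alongside OBR, the Karaf Features mechanism mentioned above addresses the same provisioning problem with a simple XML descriptor that groups the bundles and configurations of an application. The following is a rough, hypothetical illustration, not taken from the article; the Maven coordinates, feature name, and configuration PID are placeholders:

```xml
<!-- Hypothetical features descriptor; coordinates and PIDs are examples only -->
<features name="my-app-repo" xmlns="http://karaf.apache.org/xmlns/features/v1.2.0">
  <feature name="my-logger-app" version="2.1.6" description="Logger application">
    <!-- Configuration materialized when the feature is installed -->
    <config name="org.my.osgi.logger">
      logger.level = INFO
    </config>
    <!-- Transitive bundles are listed explicitly -->
    <bundle>mvn:log4j/log4j/1.2.17</bundle>
    <bundle>mvn:org.my.osgi/my_logger/2.1.6</bundle>
  </feature>
</features>
```

Once such a descriptor is registered (for example with feature:repo-add in Karaf 3.x), a single feature:install my-logger-app command provisions the bundles and their configuration in one step, which is exactly the convenience this section contrasts with installing bundles one by one.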

C10K – A Non-blocking Web Server in Go

Packt
18 Jul 2014
17 min read
This article by Nathan Kozyra, author of Mastering Concurrency in Go, tackles one of the Internet's most famous and esteemed challenges and attempt to solve it with core Go packages. (For more resources related to this topic, see here.) We've built a few usable applications; things we can start with and leapfrog into real systems for everyday use. By doing so, we've been able to demonstrate the basic and intermediate-level patterns involved in Go's concurrent syntax and methodology. However, it's about time we take on a real-world problem—one that has vexed developers (and their managers and VPs) for a great deal of the early history of the Web. In addressing and, hopefully, solving this problem, we'll be able to develop a high-performance web server that can handle a very large volume of live, active traffic. For many years, the solution to this problem was solely to throw hardware or intrusive caching systems at the problem; so, alternately, solving it with programming methodology should excite any programmer. We'll be using every technique and language construct we've learned so far, but we'll do so in a more structured and deliberate way than we have up to now. Everything we've explored so far will come into play, including the following points: Creating a visual representation of our concurrent application Utilizing goroutines to handle requests in a way that will scale Building robust channels to manage communication between goroutines and the loop that will manage them Profiling and benchmarking tools (JMeter, ab) to examine the way our event loop actually works Timeouts and concurrency controls—when necessary—to ensure data and request consistency Attacking the C10K problem The genesis of the C10K problem is rooted in serial, blocking programming, which makes it ideal to demonstrate the strength of concurrent programming, especially in Go. When he asked this in 1999, for many server admins and engineers, serving 10,000 concurrent visitors was something that would be solved with hardware. The notion that a single server on common hardware could handle this type of CPU and network bandwidth without falling over seemed foreign to most. The crux of his proposed solutions relied on producing non-blocking code. Of course, in 1999, concurrency patterns and libraries were not widespread. C++ had some polling and queuing options available via some third-party libraries and the earliest predecessor to multithreaded syntaxes, later available through Boost and then C++11. Over the coming years, solutions to the problem began pouring in across various flavors of languages, programming design, and general approaches. Any performance and scalability problem will ultimately be bound to the underlying hardware, so as always, your mileage may vary. Squeezing 10,000 concurrent connections on a 486 processor with 500 MB of RAM will certainly be more challenging than doing so on a barebones Linux server stacked with memory and multiple cores. It's also worth noting that a simple echo server would obviously be able to assume more cores than a functional web server that returns larger amounts of data and accepts greater complexity in requests, sessions, and so on, as we'll be dealing with here. Failing of servers at 10,000 concurrent connections When the Web was born and the Internet commercialized, the level of interactivity was pretty minimal. If you're a graybeard, you may recall the transition from NNTP/IRC and the like and how extraordinarily rudimentary the Web was. 
To address the basic proposition of [page request] → [HTTP response], the requirements on a web server in the early 1990s were pretty lenient. Ignoring all of the error responses, header reading, and settings, and other essential (but unrelated to the in → out mechanism) functions, the essence of the early servers was shockingly simple, at least compared to the modern web servers. The first web server was developed by the father of the Web, Tim Berners-Lee. Developed at CERN (such as WWW/HTTP itself), CERN httpd handled many of the things you would expect in a web server today—hunting through the code, you'll find a lot of notation that will remind you that the very core of the HTTP protocol is largely unchanged. Unlike most technologies, HTTP has had an extraordinarily long shelf life. Written in C in 1990, it was unable to utilize a lot of concurrency strategies available in languages such as Erlang. Frankly, doing so was probably unnecessary—the majority of web traffic was a matter of basic file retrieval and protocol. The meat and potatoes of a web server were not dealing with traffic, but rather dealing with the rules surrounding the protocol itself. You can still access the original CERN httpd site and download the source code for yourself from http://www.w3.org/Daemon/. I highly recommend that you do so as both a history lesson and a way to look at the way the earliest web server addressed some of the earliest problems. However, the Web in 1990 and the Web when the C10K question was first posed were two very different environments. By 1999, most sites had some level of secondary or tertiary latency provided by third-party software, CGI, databases, and so on, all of which further complicated the matter. The notion of serving 10,000 flat files concurrently is a challenge in itself, but try doing so by running them on top of a Perl script that accesses a MySQL database without any caching layer; this challenge is immediately exacerbated. By the mid 1990s, the Apache web server had taken hold and largely controlled the market (by 2009, it had become the first server software to serve more than 100 million websites). Apache's approach was rooted heavily in the earliest days of the Internet. At its launch, connections were initially handled first in, first out. Soon, each connection was assigned a thread from the thread pool. There are two problems with the Apache server. They are as follows: Blocking connections can lead to a domino effect, wherein one or more slowly resolved connections could avalanche into inaccessibility Apache had hard limits on the number of threads/workers you could utilize, irrespective of hardware constraints It's easy to see the opportunity here, at least in retrospect. A concurrent server that utilizes actors (Erlang), agents (Clojure), or goroutines (Go) seems to fit the bill perfectly. Concurrency does not solve the C10k problem in itself, but it absolutely provides a methodology to facilitate it. The most notable and visible example of an approach to the C10K problem today is Nginx, which was developed using concurrency patterns, widely available in C by 2002 to address—and ultimately solve—the C10k problem. Nginx, today, represents either the #2 or #3 web server in the world, depending on the source. Using concurrency to attack C10K There are two primary approaches to handle a large volume of concurrent requests. The first involves allocating threads per connection. This is what Apache (and a few others) do. 
On the one hand, allocating a thread to a connection makes a lot of sense—it's isolated, controllable via the application's and kernel's context switching, and can scale with increased hardware. One problem for Linux servers—on which the majority of the Web lives—is that each allocated thread reserves 8 MB of memory for its stack by default. This can (and should) be redefined, but this imposes a largely unattainable amount of memory required for a single server. Even if you set the default stack size to 1 MB, we're dealing with a minimum of 10 GB of memory just to handle the overhead. This is an extreme example that's unlikely to be a real issue for a couple of reasons: first, because you can dictate the maximum amount of resources available to each thread, and second, because you can just as easily load balance across a few servers and instances rather than add 10 GB to 80 GB of RAM. Even in a threaded server environment, we're fundamentally bound to the issue that can lead to performance decreases (to the point of a crash). First, let's look at a server with connections bound to threads (as shown in the following diagram), and visualize how this can lead to logjams and, eventually, crashes: This is obviously what we want to avoid. Any I/O, network, or external process that can impose some slowdown can bring about that avalanche effect we talked about, such that our available threads are taken (or backlogged) and incoming requests begin to stack up. We can spawn more threads in this model, but as mentioned earlier, there are potential risks there too, and even this will fail to mitigate the underlying problem. Taking another approach In an attempt to create our web server that can handle 10,000 concurrent connections, we'll obviously leverage our goroutine/channel mechanism to put an event loop in front of our content delivery to keep new channels recycled or created constantly. For this example, we'll assume we're building a corporate website and infrastructure for a rapidly expanding company. To do this, we'll need to be able to serve both static and dynamic content. The reason we want to introduce dynamic content is not just for the purposes of demonstration—we want to challenge ourselves to show 10,000 true concurrent connections even when a secondary process gets in the way. As always, we'll attempt to map our concurrency strategy directly to goroutines and channels. In a lot of other languages and applications, this is directly analogous to an event loop, and we'll approach it as such. Within our loop, we'll manage the available goroutines, expire or reuse completed ones, and spawn new ones where necessary. In this example visualization, we show how an event loop (and corresponding goroutines) can allow us to scale our connections without employing too many hard resources such as CPU threads or RAM: The most important step for us here is to manage that event loop. We'll want to create an open, infinite loop to manage the creation and expiration of our goroutines and respective channels. As part of this, we will also want to do some internal logging of what's happening, both for benchmarking and debugging our application. Building our C10K web server Our web server will be responsible for taking requests, routing them, and serving either flat files or dynamic files with templates parsed against a few different data sources. 
As mentioned earlier, if we exclusively serve flat files and remove much of the processing and network latency, we'd have a much easier time with handling 10,000 concurrent connections. Our goal is to approach as much of a real-world scenario as we can—very little of the Web operates on a single server in a static fashion. Most websites and applications utilize databases, CDNs (Content Delivery Networks), dynamic and uncached template parsing, and so on. We need to replicate them whenever possible. For the sake of simplicity, we'll separate our content by type and filter them through URL routing, as follows: /static/[request]: This will serve request.html directly /template/[request]: This will serve request.tpl after its been parsed through Go /dynamic/[request][number]: This will also serve request.tpl and parse it against a database source's record By doing this, we should get a better mixture of possible HTTP request types that could impede the ability to serve large numbers of users simultaneously, especially in a blocking web server environment. You can find Go's exceptional library to generate safe data-driven templating at http://golang.org/pkg/html/template/. By safe, we're largely referring to the ability to accept data and move it directly into templates without worrying about the sort of injection issues that are behind a large amount of malware and cross-site scripting. For the database source, we'll use MySQL here, but feel free to experiment with other databases if you're more comfortable with them. Like the html/template package, we're not going to put a lot of time into outlining MySQL and/or its variants. Benchmarking against a blocking web server It's only fair to add some starting benchmarks against a blocking web server first so that we can measure the effect of concurrent versus nonconcurrent architecture. For our starting benchmarks, we'll eschew any framework, and we'll go with our old stalwart, Apache. For the sake of completeness here, we'll be using an Intel i5 3GHz machine with 8 GB of RAM. While we'll benchmark our final product on Ubuntu, Windows, and OS X here, we'll focus on Ubuntu for our example. Our localhost domain will have three plain HTML files in /static, each trimmed to 80 KB. As we're not using a framework, we don't need to worry about raw dynamic requests, but only about static and dynamic requests in addition to data source requests. For all examples, we'll use a MySQL database (named master) with a table called articles that will contain 10,000 duplicate entries. Our structure is as follows: CREATE TABLE articles ( article_id INT NOT NULL AUTO_INCREMENT, article_title VARCHAR(128) NOT NULL, article_text VARCHAR(128) NOT NULL, PRIMARY KEY (article_id) ) With ID indexes ranging sequentially from 0-10,000, we'll be able to generate random number requests, but for now, we just want to see what kind of basic response we can get out of Apache serving static pages with this machine. For this test, we'll use Apache's ab tool and then gnuplot to sequentially map the request time as the number of concurrent requests and pages; we'll do this for our final product as well, but we'll also go through a few other benchmarking tools for it to get some better details.   Apache's AB comes with the Apache web server itself. You can read more about it at http://httpd.apache.org/docs/2.2/programs/ab.html. You can download it for Linux, Windows, OS X, and more from http://httpd.apache.org/download.cgi. 
The gnuplot utility is available for the same operating systems at http://www.gnuplot.info/. So, let's see how we did it. Have a look at the following graph: Ouch! Not even close. There are things we can do to tune the connections available (and respective threads/workers) within Apache, but this is not really our goal. Mostly, we want to know what happens with an out-of-the-box Apache server. In these benchmarks, we start to drop or refuse connections at around 800 concurrent connections. More troubling is that as these requests start stacking up, we see some that exceed 20 seconds or more. When this happens in a blocking server, each request behind it is queued; requests behind that are similarly queued and the entire thing starts to fall apart. Even if we cannot hit 10,000 concurrent connections, there's a lot of room for improvement. While a single server of any capacity is no longer the way we expect a web server environment to be designed, being able to squeeze as much performance as possible out of that server, ostensibly with our concurrent, event-driven approach, should be our goal. Handling requests The Gorilla toolkit certainly makes this easier, but we should also know how to intercept the functionality to impose our own custom handler. Here is a simple web router wherein we handle and direct requests using a custom http.Server struct, as shown in the following code: var routes []string type customRouter struct { } func (customRouter) ServeHTTP(rw http.ResponseWriter, r *http.Request) { fmt.Println(r.URL.Path); } func main() { var cr customRouter; server := &http.Server { Addr: ":9000", Handler:cr, ReadTimeout: 10 * time.Second, WriteTimeout: 10 * time.Second, MaxHeaderBytes: 1 << 20, } server.ListenAndServe() } Here, instead of using a built-in URL routing muxer and dispatcher, we're creating a custom server and custom handler type to accept URLs and route requests. This allows us to be a little more robust with our URL handling. In this case, we created a basic, empty struct called customRouter and passed it to our custom server creation call. We can add more elements to our customRouter type, but we really don't need to for this simple example. All we need to do is to be able to access the URLs and pass them along to a handler function. We'll have three: one for static content, one for dynamic content, and one for dynamic content from a database. Before we go so far though, we should probably see what our absolute barebones, HTTP server written in Go, does when presented with the same traffic that we sent Apache's way. By old school, we mean that the server will simply accept a request and pass along a static, flat file. You could do this using a custom router as we did earlier, taking requests, opening files, and then serving them, but Go provides a much simpler mode to handle this basic task in the http.FileServer method. So, to get some benchmarks for the most basic of Go servers against Apache, we'll utilize a simple FileServer and test it against a test.html page (which contains the same 80 KB file that we had with Apache). As our goal with this test is to improve our performance in serving flat and dynamic pages, the actual specs for the test suite are somewhat immaterial. We'd expect that while the metrics will not match from environment to environment, we should see a similar trajectory. That said, it's only fair we supply the environment used for these tests; in this case, we used a MacBook Air with a 1.4 GHz i5 processor and 4 GB of memory. 
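The text above relies on Go's http.FileServer for the out-of-the-box baseline, but the excerpt doesn't include that server. A minimal sketch of what such a baseline might look like follows; the ./static directory is an assumption, and port 8080 matches the ab command shown below:

```go
// Minimal baseline file server, assuming an ./static directory containing
// the 80 KB test page; a sketch, not the book's exact test harness.
package main

import (
	"log"
	"net/http"
)

func main() {
	// http.FileServer serves everything under ./static at the root path.
	http.Handle("/", http.FileServer(http.Dir("./static")))

	// Port 8080 matches the ab command used for the 10,000-connection test.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Pointing ab at this process is enough to reproduce the kind of untuned, out-of-the-box comparison the author describes.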
First, we'll do this with our absolute best performance out of the box with Apache, which had 850 concurrent connections and 900 total requests. The results are certainly encouraging as compared to Apache. Neither of our test systems were tweaked much (Apache as installed and basic FileServer in Go), but Go's FileServer handles 1,000 concurrent connections without so much as a blip, with the slowest clocking in at 411 ms. Apache has made a great number of strides pertaining to concurrency and performance options in the last five years, but to get there does require a bit of tuning and testing. The intent of this experiment is not intended to denigrate Apache, which is well tested and established. Instead, it's to compare the out-of-the-box performance of the world's number 1 web server against what we can do with Go. To really get a baseline of what we can achieve in Go, let's see if Go's FileServer can hit 10,000 connections on a single, modest machine out of the box: ab -n 10500 -c 10000 -g test.csv http://localhost:8080/a.html We will get the following output: Success! Go's FileServer by itself will easily handle 10,000 concurrent connections, serving flat, static content. Of course, this is not the goal of this particular project—we'll be implementing real-world obstacles such as template parsing and database access, but this alone should show you the kind of starting point that Go provides for anyone who needs a responsive server that can handle a large quantity of basic web traffic. Routing requests So, let's take a step back and look again at routing our traffic through a traditional web server to include not only our static content, but also the dynamic content. We'll want to create three functions that will route traffic from our customRouter:serveStatic():: read function and serve a flat file serveRendered():, parse a template to display serveDynamic():, connect to MySQL, apply data to a struct, and parse a template. To take our requests and reroute, we'll change the ServeHTTP method for our customRouter struct to handle three regular expressions. For the sake of brevity and clarity, we'll only be returning data on our three possible requests. Anything else will be ignored. In a real-world scenario, we can take this approach to aggressively and proactively reject connections for requests we think are invalid. This would include spiders and nefarious bots and processes, which offer no real value as nonusers.
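The closing paragraphs describe rewriting customRouter's ServeHTTP method around three regular expressions and three handlers (serveStatic, serveRendered, serveDynamic), but the excerpt ends before that code. The following is a rough sketch of the idea under the URL scheme given earlier; the patterns, placeholder handler bodies, and port are assumptions, not the author's implementation:

```go
// Rough sketch of the regex routing described above; patterns and handler
// bodies are illustrative assumptions, not the book's implementation.
package main

import (
	"fmt"
	"net/http"
	"regexp"
	"time"
)

var (
	staticPath   = regexp.MustCompile(`^/static/(.+)$`)
	templatePath = regexp.MustCompile(`^/template/(.+)$`)
	dynamicPath  = regexp.MustCompile(`^/dynamic/(.+)$`)
)

// serveStatic reads request.html from disk and serves it directly.
func serveStatic(rw http.ResponseWriter, r *http.Request, name string) {
	http.ServeFile(rw, r, "static/"+name+".html")
}

// serveRendered stands in for parsing request.tpl with html/template.
func serveRendered(rw http.ResponseWriter, r *http.Request, name string) {
	fmt.Fprintln(rw, "rendered template:", name)
}

// serveDynamic stands in for parsing request.tpl against a MySQL record.
func serveDynamic(rw http.ResponseWriter, r *http.Request, name string) {
	fmt.Fprintln(rw, "dynamic page:", name)
}

type customRouter struct{}

func (customRouter) ServeHTTP(rw http.ResponseWriter, r *http.Request) {
	path := r.URL.Path
	switch {
	case staticPath.MatchString(path):
		serveStatic(rw, r, staticPath.FindStringSubmatch(path)[1])
	case templatePath.MatchString(path):
		serveRendered(rw, r, templatePath.FindStringSubmatch(path)[1])
	case dynamicPath.MatchString(path):
		serveDynamic(rw, r, dynamicPath.FindStringSubmatch(path)[1])
	default:
		// Anything else is rejected outright, as discussed above.
		http.NotFound(rw, r)
	}
}

func main() {
	server := &http.Server{
		Addr:         ":9000",
		Handler:      customRouter{},
		ReadTimeout:  10 * time.Second,
		WriteTimeout: 10 * time.Second,
	}
	server.ListenAndServe()
}
```

A real version would sanitize the captured path before touching the filesystem and would plug in the html/template parsing and MySQL lookup discussed earlier.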


Progressive Mockito

Packt
14 Jul 2014
16 min read
(For more resources related to this topic, see here.) Drinking Mockito Download the latest Mockito binary from the following link and add it to the project dependency: http://code.google.com/p/mockito/downloads/list As of February 2014, the latest Mockito version is 1.9.5. Configuring Mockito To add Mockito JAR files to the project dependency, perform the following steps: Extract the JAR files into a folder. Launch Eclipse. Create an Eclipse project named Chapter04. Go to the Libraries tab in the project build path. Click on the Add External JARs... button and browse to the Mockito JAR folder. Select all JAR files and click on OK. We worked with Gradle and Maven and built a project with the JUnit dependency. In this section, we will add Mockito dependencies to our existing projects. The following code snippet will add a Mockito dependency to a Maven project and download the JAR file from the central Maven repository (http://mvnrepository.com/artifact/org.mockito/mockito-core): <dependency> <groupId>org.mockito</groupId> <artifactId>mockito-core</artifactId> <version>1.9.5</version> <scope>test</scope> </dependency> The following Gradle script snippet will add a Mockito dependency to a Gradle project: testCompile 'org.mockito:mockito-core:1.9.5' Mocking in action This section demonstrates the mock objects with a stock quote example. In the real world, people invest money on the stock market—they buy and sell stocks. A stock symbol is an abbreviation used to uniquely identify shares of a particular stock on a particular market, such as stocks of Facebook are registered on NASDAQ as FB and stocks of Apple as AAPL. We will build a stock broker simulation program. The program will watch the market statistics, and depending on the current market data, you can perform any of the following actions: Buy stocks Sell stocks Hold stocks The domain classes that will be used in the program are Stock, MarketWatcher, Portfolio, and StockBroker. Stock represents a real-world stock. It has a symbol, company name, and price. MarketWatcher looks up the stock market and returns the quote for the stock. A real implementation of a market watcher can be implemented from http://www.wikijava.org/wiki/Downloading_stock_market_quotes_from_Yahoo!_finance. Note that the real implementation will connect to the Internet and download the stock quote from a provider. Portfolio represents a user's stock data such as the number of stocks and price details. Portfolio exposes APIs for getting the average stock price and buying and selling stocks. Suppose on day one someone buys a share at a price of $10.00, and on day two, the customer buys the same share at a price of $8.00. So, on day two the person has two shares and the average price of the share is $9.00. The following screenshot represents the Eclipse project structure. You can download the project from the Packt Publishing website and work with the files: The following code snippet represents the StockBroker class. StockBroker collaborates with the MarketWatcher and Portfolio classes. 
The perform() method of StockBroker accepts a portfolio and a Stock object: public class StockBroker { private final static BigDecimal LIMIT = new BigDecimal("0.10"); private final MarketWatcher market; public StockBroker(MarketWatcher market) { this.market = market; } public void perform(Portfolio portfolio,Stock stock) { Stock liveStock = market.getQuote(stock.getSymbol()); BigDecimal avgPrice = portfolio.getAvgPrice(stock); BigDecimal priceGained = liveStock.getPrice().subtract(avgPrice); BigDecimal percentGain = priceGained.divide(avgPrice); if(percentGain.compareTo(LIMIT) > 0) { portfolio.sell(stock, 10); }else if(percentGain.compareTo(LIMIT) < 0){ portfolio.buy(stock); } } } Look at the perform method. It takes a portfolio object and a stock object, calls the getQuote method of MarketWatcher, and passes a stock symbol. Then, it gets the average stock price from portfolio and compares the current market price with the average stock price. If the current stock price is 10 percent greater than the average price, then the StockBroker program sells 10 stocks from Portfolio; however, if the current stock price goes down by 10 percent, then the program buys shares from the market to average out the loss. Why do we sell 10 stocks? This is just an example and 10 is just a number; this could be anything you want. StockBroker depends on Portfolio and MarketWatcher; a real implementation of Portfolio should interact with a database, and MarketWatcher needs to connect to the Internet. So, if we write a unit test for the broker, we need to execute the test with a database and an Internet connection. A database connection will take time and Internet connectivity depends on the Internet provider. So, the test execution will depend on external entities and will take a while to finish. This will violate the quick test execution principle. Also, the database state might not be the same across all test runs. This is also applicable for the Internet connection service. Each time the database might return different values, and therefore asserting a specific value in your unit test is very difficult. We'll use Mockito to mock the external dependencies and execute the test in isolation. So, the test will no longer be dependent on real external service, and therefore it will be executed quickly. Mocking objects A mock can be created with the help of a static mock() method as follows: import org.mockito.Mockito; public class StockBrokerTest { MarketWatcher marketWatcher = Mockito.mock(MarketWatcher.class); Portfolio portfolio = Mockito.mock(Portfolio.class); } Otherwise, you can use Java's static import feature and static import the mock method of the org.mockito.Mockito class as follows: import static org.mockito.Mockito.mock; public class StockBrokerTest { MarketWatcher marketWatcher = mock(MarketWatcher.class); Portfolio portfolio = mock(Portfolio.class); } There's another alternative; you can use the @Mock annotation as follows: import org.mockito.Mock; public class StockBrokerTest { @Mock MarketWatcher marketWatcher; @Mock Portfolio portfolio; } However, to work with the @Mock annotation, you are required to call MockitoAnnotations.initMocks( this ) before using the mocks, or use MockitoJUnitRunner as a JUnit runner. 
The following code snippet uses MockitoAnnotations to create mocks: import static org.junit.Assert.assertEquals; import org.mockito.MockitoAnnotations; public class StockBrokerTest { @Mock MarketWatcher marketWatcher; @Mock Portfolio portfolio; @Before public void setUp() { MockitoAnnotations.initMocks(this); } @Test public void sanity() throws Exception { assertNotNull(marketWatcher); assertNotNull(portfolio); } } The following code snippet uses the MockitoJUnitRunner JUnit runner: import org.mockito.runners.MockitoJUnitRunner; @RunWith(MockitoJUnitRunner.class) public class StockBrokerTest { @Mock MarketWatcher marketWatcher; @Mock Portfolio portfolio; @Test public void sanity() throws Exception { assertNotNull(marketWatcher); assertNotNull(portfolio); } } Before we deep dive into the Mockito world, there are a few things to remember. Mockito cannot mock or spy the following functions: final classes, final methods, enums, static methods, private methods, the hashCode() and equals() methods, anonymous classes, and primitive types. PowerMock (an extension of EasyMock) and PowerMockito (an extension of the Mockito framework) allows you to mock static and private methods; even PowerMockito allows you to set expectations on new invocations for private member classes, inner classes, and local or anonymous classes. However, as per the design, you should not opt for mocking private/static properties—it violates the encapsulation. Instead, you should refactor the offending code to make it testable. Change the Portfolio class, create the final class, and rerun the test; the test will fail as the Portfolio class is final, and Mockito cannot mock a final class. The following screenshot shows the JUnit output: Stubbing methods We read about stubs in ,Test Doubles. The stubbing process defines the behavior of a mock method such as the value to be returned or the exception to be thrown when the method is invoked. The Mockito framework supports stubbing and allows us to return a given value when a specific method is called. This can be done using Mockito.when() along with thenReturn (). The following is the syntax of importing when: import static org.mockito.Mockito.when; The following code snippet stubs the getQuote(String symbol) method of MarcketWatcher and returns a specific Stock object: import static org.mockito.Matchers.anyString; import static org.mockito.Mockito.when; @RunWith(MockitoJUnitRunner.class) public class StockBrokerTest { @Mock MarketWatcher marketWatcher; @Mock Portfolio portfolio; @Test public void marketWatcher_Returns_current_stock_status() { Stock uvsityCorp = new Stock("UV", "Uvsity Corporation", new BigDecimal("100.00")); when (marketWatcher.getQuote( anyString ())). thenReturn (uvsityCorp); assertNotNull(marketWatcher.getQuote("UV")); } } A uvsityCorp stock object is created with a stock price of $100.00 and the getQuote method is stubbed to return uvsityCorp whenever the getQuote method is called. Note that anyString() is passed to the getQuote method, which means whenever the getQuote method will be called with any String value, the uvsityCorp object will be returned. The when() method represents the trigger, that is, when to stub. The following methods are used to represent what to do when the trigger is triggered: thenReturn(x): This returns the x value. thenThrow(x): This throws an x exception. thenAnswer(Answer answer): Unlike returning a hardcoded value, a dynamic user-defined logic is executed. It's more like for fake test doubles, Answer is an interface. 
thenCallRealMethod(): This method calls the real method on the mock object. The following code snippet stubs the external dependencies and creates a test for the StockBroker class: import com.packt.trading.dto.Stock; import static org.junit.Assert.assertNotNull; import static org.mockito.Matchers.anyString; import static org.mockito.Matchers.isA; import static org.mockito.Mockito.verify; import static org.mockito.Mockito.when; @RunWith(MockitoJUnitRunner.class) public class StockBrokerTest { @Mock MarketWatcher marketWatcher; @Mock Portfolio portfolio; StockBroker broker; @Before public void setUp() { broker = new StockBroker(marketWatcher); } @Test public void when_ten_percent_gain_then_the_stock_is_sold() { //Portfolio's getAvgPrice is stubbed to return $10.00 when(portfolio.getAvgPrice(isA(Stock.class))). thenReturn(new BigDecimal("10.00")); //A stock object is created with current price $11.20 Stock aCorp = new Stock("A", "A Corp", new BigDecimal("11.20")); //getQuote method is stubbed to return the stock when(marketWatcher.getQuote(anyString())).thenReturn(aCorp); //perform method is called, as the stock price increases // by 12% the broker should sell the stocks broker.perform(portfolio, aCorp); //verifying that the broker sold the stocks verify(portfolio).sell(aCorp,10); } } The test method name is when_ten_percent_gain_then_the_stock_is_sold; a test name should explain the intention of the test. We use underscores to make the test name readable. We will use the when_<<something happens>>_then_<<the action is taken>> convention for the tests. In the preceding test example, the getAvgPrice() method of portfolio is stubbed to return $10.00, then the getQuote method is stubbed to return a hardcoded stock object with a current stock price of $11.20. The broker logic should sell the stock as the stock price goes up by 12 percent. The portfolio object is a mock object. So, unless we stub a method, by default, all the methods of portfolio are autostubbed to return a default value, and for the void methods, no action is performed. The sell method is a void method; so, instead of connecting to a database to update the stock count, the autostub will do nothing. However, how will we test whether the sell method was invoked? We use Mockito.verify. The verify() method is a static method, which is used to verify the method invocation. If the method is not invoked, or the argument doesn't match, then the verify method will raise an error to indicate that the code logic has issues. Verifying the method invocation To verify a redundant method invocation, or to verify whether a stubbed method was not called but was important from the test perspective, we should manually verify the invocation; for this, we need to use the static verify method. Why do we use verify? Mock objects are used to stub external dependencies. We set an expectation, and a mock object returns an expected value. In some conditions, a behavior or method of a mock object should not be invoked, or sometimes, we may need to call the method N (a number) times. The verify method verifies the invocation of mock objects. Mockito does not automatically verify all stubbed calls. If a stubbed behavior should not be called but the method is called due to a bug in the code, verify flags the error though we have to verify that manually. The void methods don't return values, so you cannot assert the returned values. Hence, verify is very handy to test the void methods. 
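Before looking at verification in more depth, here is a minimal sketch (not from the book) of two stubbing actions listed earlier, thenThrow() and thenAnswer(), which the excerpt names but does not demonstrate. It reuses the MarketWatcher and Stock classes from the example above:

```java
// Illustrative stubbing sketch reusing the article's MarketWatcher and Stock types.
import static org.mockito.Matchers.anyString;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.math.BigDecimal;

import org.mockito.invocation.InvocationOnMock;
import org.mockito.stubbing.Answer;

public class StubbingActionsSketch {
    public static void main(String[] args) {
        MarketWatcher marketWatcher = mock(MarketWatcher.class);

        // thenAnswer: build the returned Stock dynamically from the actual argument.
        when(marketWatcher.getQuote(anyString())).thenAnswer(new Answer<Stock>() {
            public Stock answer(InvocationOnMock invocation) {
                String symbol = (String) invocation.getArguments()[0];
                return new Stock(symbol, symbol + " Corp", new BigDecimal("42.00"));
            }
        });

        // thenThrow: simulate an unreachable quote provider for one symbol.
        when(marketWatcher.getQuote("DOWN"))
            .thenThrow(new RuntimeException("quote service unavailable"));

        System.out.println(marketWatcher.getQuote("UV").getPrice()); // prints 42.00
    }
}
```

As with thenReturn(), the last stubbing whose matchers fit the actual arguments wins, which is why the more specific "DOWN" stubbing is declared after the anyString() one.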
Verifying in depth The verify() method has an overloaded version that takes Times as an argument. Times is a Mockito framework class of the org.mockito.internal.verification package, and it takes wantedNumberOfInvocations as an integer argument. If 0 is passed to Times, it infers that the method will not be invoked in the testing path. We can pass 0 to Times(0) to make sure that the sell or buy methods are not invoked. If a negative number is passed to the Times constructor, Mockito throws MockitoException - org.mockito.exceptions.base.MockitoException, and this shows the Negative value is not allowed here error. The following methods are used in conjunction with verify: times(int wantedNumberOfInvocations): This method is invoked exactly n times; if the method is not invoked wantedNumberOfInvocations times, then the test fails. never(): This method signifies that the stubbed method is never called or you can use times(0) to represent the same scenario. If the stubbed method is invoked at least once, then the test fails. atLeastOnce(): This method is invoked at least once, and it works fine if it is invoked multiple times. However, the operation fails if the method is not invoked. atLeast(int minNumberOfInvocations): This method is called at least n times, and it works fine if the method is invoked more than the minNumberOfInvocations times. However, the operation fails if the method is not called minNumberOfInvocations times. atMost(int maxNumberOfInvocations): This method is called at the most n times. However, the operation fails if the method is called more than minNumberOfInvocations times. only(): The only method called on a mock fails if any other method is called on the mock object. In our example, if we use verify(portfolio, only()).sell(aCorp,10);, the test will fail with the following output: The test fails in line 15 as portfolio.getAvgPrice(stock) is called. timeout(int millis): This method is interacted in a specified time range. Verifying zero and no more interactions The verifyZeroInteractions(Object... mocks) method verifies whether no interactions happened on the given mocks. The following test code directly calls verifyZeroInteractions and passes the two mock objects. Since no methods are invoked on the mock objects, the test passes: @Test public void verify_zero_interaction() { verifyZeroInteractions(marketWatcher,portfolio); } The verifyNoMoreInteractions(Object... mocks) method checks whether any of the given mocks has any unverified interaction. We can use this method after verifying a mock method to make sure that nothing else was invoked on the mock. The following test code demonstrates verifyNoMoreInteractions: @Test public void verify_no_more_interaction() { Stock noStock = null; portfolio.getAvgPrice(noStock); portfolio.sell(null, 0); verify(portfolio).getAvgPrice(eq(noStock)); //this will fail as the sell method was invoked verifyNoMoreInteractions(portfolio); } The following is the JUnit output: The following are the rationales and examples of argument matchers. Using argument matcher ArgumentMatcher is a Hamcrest matcher with a predefined describeTo() method. ArgumentMatcher extends the org.hamcrest.BaseMatcher package. It verifies the indirect inputs into a mocked dependency. The Matchers.argThat(Matcher) method is used in conjunction with the verify method to verify whether a method is invoked with a specific argument value. ArgumentMatcher plays a key role in mocking. The following section describes the context of ArgumentMatcher. 
Mock objects return expected values, but when they need to return different values for different arguments, argument matcher comes into play. Suppose we have a method that takes a player name as input and returns the total number of runs (a run is a point scored in a cricket match) scored as output. We want to stub it and return 100 for Sachin and 10 for xyz. We have to use argument matcher to stub this. Mockito returns expected values when a method is stubbed. If the method takes arguments, the argument must match during the execution; for example, the getValue(int someValue) method is stubbed in the following way: when(mockObject.getValue(1)).thenReturn(expected value); Here, the getValue method is called with mockObject.getValue(100). Then, the parameter doesn't match (it is expected that the method will be called with 1, but at runtime, it encounters 100), so the mock object fails to return the expected value. It will return the default value of the return type—if the return type is Boolean, it'll return false; if the return type is object, then null, and so on. Mockito verifies argument values in natural Java style by using an equals() method. Sometimes, we use argument matchers when extra flexibility is required. Mockito provides built-in matchers such as anyInt(), anyDouble(), anyString(), anyList(), and anyCollection(). More built-in matchers and examples of custom argument matchers or Hamcrest matchers can be found at the following link: http://docs.mockito.googlecode.com/hg/latest/org/mockito/Matchers.html Examples of other matchers are isA(java.lang.Class<T> clazz), any(java.lang.Class<T> clazz), and eq(T) or eq(primitive value). The isA argument checks whether the passed object is an instance of the class type passed in the isA argument. The any(T) argument also works in the same way. Why do we need wildcard matchers? Wildcard matchers are used to verify the indirect inputs to the mocked dependencies. The following example describes the context. In the following code snippet, an object is passed to a method and then a request object is created and passed to service. Now, from a test, if we call the someMethod method and service is a mocked object, then from test, we cannot stub callMethod with a specific request as the request object is local to the someMethod: public void someMethod(Object obj){ Request req = new Request(); req.setValue(obj); Response resp = service.callMethod(req); } If we are using argument matchers, all arguments have to be provided by matchers. We're passing three arguments and all of them are passed using matchers: verify(mock).someMethod(anyInt(), anyString(), eq("third argument")); The following example will fail because the first and the third arguments are not passed using matcher: verify(mock).someMethod(1, anyString(), "third argument");
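The excerpt points to argThat() and custom matchers but stops before showing one. As a hedged illustration (not from the book), a custom ArgumentMatcher for the Stock type used throughout this article could be combined with verify() like this; the matcher class and scenario are hypothetical:

```java
// Hypothetical custom ArgumentMatcher for the article's Stock/Portfolio types.
import static org.mockito.Matchers.argThat;
import static org.mockito.Matchers.eq;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

import java.math.BigDecimal;

import org.mockito.ArgumentMatcher;

public class ArgumentMatcherSketch {

    // Matches any Stock whose symbol equals the expected one.
    static class StockWithSymbol extends ArgumentMatcher<Stock> {
        private final String symbol;
        StockWithSymbol(String symbol) { this.symbol = symbol; }

        @Override
        public boolean matches(Object argument) {
            return argument instanceof Stock
                && symbol.equals(((Stock) argument).getSymbol());
        }
    }

    public static void main(String[] args) {
        Portfolio portfolio = mock(Portfolio.class);
        portfolio.sell(new Stock("A", "A Corp", new BigDecimal("11.20")), 10);

        // Passes only if sell() was invoked with a Stock whose symbol is "A" and a count of 10.
        verify(portfolio).sell(argThat(new StockWithSymbol("A")), eq(10));
    }
}
```

Because one argument uses a matcher, every argument must, which is why the literal 10 is wrapped in eq(10), mirroring the rule stated above.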


Part 2: Deploying Multiple Applications with Capistrano from a Single Project

Rodrigo Rosenfeld
01 Jul 2014
8 min read
In part 1, we covered Capistrano and why you would use it. We also covered mixins, which provide the base for what we will do in this post: deploying a sample project with Capistrano. For this project, suppose our user interface is a combination of two applications, app1 and app2. They should be deployed to two servers, do and ec2, and we'll provide two environments, production and cert. Make sure Ruby and Bundler are installed before you start.

First, we create a new directory for our project, add a Gemfile to it with capistrano as a dependency, and then create the Capistrano directory structure:

```
mkdir capsample
cd capsample
bundle init
echo "gem 'capistrano'" >> Gemfile
bundle
bundle exec cap install STAGES="do_prod_app1,do_prod_app2,do_cert_app1,do_cert_app2,ec2_prod_app1,ec2_prod_app2,ec2_cert_app1,ec2_cert_app2"
```

This will create a file under config/deploy for each server/environment/application combination listed in STAGES. This is just to demonstrate the idea; we'll completely override their content later on. It will also create a Capfile, which works in a similar way to a regular Rakefile. With Rake, you can get a list of the available tasks with rake -T. With Capistrano you can get the same using:

```
bundle exec cap -T
```

Behind the scenes, cap is a binary distributed with the capistrano gem that runs Rake with Capfile set as the Rakefile, supporting a few extra options like --roles.

Now create a new file, lib/mixin.rb, with the content mentioned in the "Using mixins" section in part 1. Then add this to the top of the Capfile:

```ruby
$:.unshift File.dirname(__FILE__)
require 'lib/mixin'
```

Each of the files under config/deploy will look very similar to the others. For instance, ec2_prod_app1 would look like this:

```ruby
mixin 'servers/ec2'
mixin 'environments/production'
mixin 'applications/app1'
```

Then config/mixins/servers/ec2.rb would look like this:

```ruby
server 'ec2.mydomain.com', roles: [:main]
set :database_host, 'ec2-db.mydomain.com'
```

This file contains definitions that are valid (or are defaults) for the whole server, no matter which environment or application we're deploying. In this example, the database host is shared by all applications and environments hosted on our ec2 server.

Something to note here is that we're adding a single role named main to our server. If we specified all roles, like [:web, :db, :assets, :puma], they would be shared with every recipe relying on this server mixin, so a better approach is to add them in the application's recipe if required. For instance, you might want to add something like set :server_name, 'ec2.mydomain.com' to your server definitions. Then you can dynamically set the role in the application's recipe by calling role :db, [fetch(:server_name)], and so on for all required roles. However, this is usually not necessary for third-party recipes, as they let you decide which role the recipe should act on. For example, if you want to deploy your application with Puma, you can write set :puma_role, :main.

Before we discuss a full example of an application recipe, let's look at what config/mixins/environments/production.rb might look like:

```ruby
set :branch, 'production'
set :encoding_key, '098f6bcd4621d373cade4e832627b4f6'
set :database_name, 'app_production'
set :app1_port, 3000
set :app2_port, 3001
set :redis_port, 6379
set :solr_port, 8080
```

In this example, we're assuming that the ports for app1, app2, Redis, and Solr, as well as the database name, will be the same for production on all servers.
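The post only shows the production environment mixin; the cert environment mentioned at the beginning would follow the same pattern. Here is a hypothetical sketch, with every value a placeholder rather than something from the original post:

```ruby
# config/mixins/environments/cert.rb -- hypothetical counterpart to production.rb
set :branch, 'cert'
set :encoding_key, 'replace-with-the-cert-encoding-key'
set :database_name, 'app_cert'
set :app1_port, 3100
set :app2_port, 3101
set :redis_port, 6380
set :solr_port, 8081
```

Because each stage file under config/deploy just composes three mixins, adding a new environment is a matter of writing one file like this and referencing it from the relevant stages.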
Finally, the recipes themselves, which tell Capistrano how to set up an application, will be defined by config/mixins/applications/app1.rb. Here's an example for a simple Rails application:

Rake::Task['load:defaults'].invoke
Rake::Task['load:defaults'].clear
require 'capistrano/rails'
require 'capistrano/puma'
Rake::Task['load:defaults'].reenable
Rake::Task['load:defaults'].invoke

set :application, 'app1'
set :repo_url, 'git@github.com:me/app1.git'
set :rails_env, 'production'
set :assets_roles, :main
set :migration_role, :main
set :puma_role, :main
set :puma_bind, "tcp://0.0.0.0:#{fetch :app1_port}"

namespace :rails do
  desc 'Generate settings file'
  task :generate_settings do
    on roles(:all) do
      template = "config/templates/database.yml.erb"
      dbconfig = StringIO.new(ERB.new(File.read template).result binding)
      upload! dbconfig, release_path.join('config', 'database.yml')
    end
  end
end

before 'deploy:migrate', 'rails:generate_settings'

# Create directories expected by Puma default settings:
before 'puma:restart', 'create_log_and_tmp' do
  on roles(:all) do
    within shared_path do
      execute :mkdir, '-p', 'log', 'tmp/pids'
    end
  end
end

Make sure you remove the lines that set application and repo_url in the config/deploy.rb file generated by cap install. Also, if you're deploying a Rails application using this recipe, you should add the capistrano-rails and capistrano3-puma gems to your Gemfile and run bundle again. In case you're running rbenv or rvm to install Ruby on the server, make sure you include either the capistrano-rbenv or capistrano-rvm gem and require it in the recipe. You may also need to provide more information in this case. For rbenv, you'd need to tell it which version to use, with set :rbenv_ruby, '2.1.2' for example.

Sometimes you'll find that some settings are valid for all applications under all environments on all servers. The most important one to notice is the location of our applications, as they must not conflict with each other. Another setting that could be shared across all combinations is the private key used to connect to all servers. For such cases, you should add those settings directly to config/deploy.rb:

set :deploy_to, -> { "/home/vagrant/apps/#{fetch :environment}/#{fetch :application}" }
set :ssh_options, { keys: %w(~/.vagrant.d/insecure_private_key) }

I strongly recommend connecting to your servers with a regular account rather than root. For our applications we use rbenv to manage our Ruby versions, so we're able to deploy them as regular users as long as our applications listen on high port numbers. We then set up our proxy server (nginx in our case) to forward requests on ports 80 and 443 to each application's port according to the requested domains and paths. This is set up by some Chef recipes, which run as root on our servers. To connect using another user, just pass it in the server declaration. To connect as vagrant@192.168.33.10, this is how you'd set it up:

server '192.168.33.10', user: 'vagrant', roles: [:main]
set :ssh_options, { keys: %w(~/.vagrant.d/insecure_private_key) }

Finally, we create a config/database.yml that's suited for our environment on demand, before running the migrations task.
Here's what the template config/templates/database.yml.erb could look like:

production:
  adapter: postgresql
  encoding: unicode
  pool: 30
  database: <%= fetch :database_name %>
  host: <%= fetch :database_host %>

I've omitted the settings for app2, but if it were another Rails application, we could extract the common logic between them into a common_rails mixin. Also notice that because we're not requiring capistrano/rails and capistrano/puma in the Capfile, their default values won't be set, as Capistrano has already invoked the load:defaults task before our mixins are loaded. That's why we clear that task, require the recipes, and then re-enable and re-run the task so that the defaults for those recipes have the opportunity to load. Another approach is to require those recipes directly in the Capfile. But unless the recipes are carefully crafted to only run their commands for very specific roles, you can get unexpected behavior if you deploy one application with Rails, another one with Grails, and yet another with NodeJS. If any of them has commands that run for all roles, or if the role names between them conflict somehow, you'd be in trouble. So, unless you have total control and understanding of all your third-party recipes, I'd recommend that you use the approach outlined in the examples above.

Conclusion

All the techniques presented here are used to manage our real, complex scenario at e-Core, where we support multiple applications in lots of environments that are replicated across three servers. We found that this allowed us to quickly add new environments or servers as needed and to recreate our application in no time. Also, I'd like to thank Juan Ibiapina, who worked with me on all these recipes to ensure our deployment procedures are fully automated; well, almost. We still manage our databases and documents manually because we prefer to.

About the author

Rodrigo Rosenfeld Rosas lives in Vitória-ES, Brazil, with his lovely wife and daughter. He graduated in Electrical Engineering with a Master's degree in Robotics and Real-time Systems. For the past five years Rodrigo has focused on building and maintaining single page web applications. He is the author of some gems including active_record_migrations, rails-web-console, the JS specs runner oojspec, sequel-devise, and the Linux X11 utility ktrayshortcut. Rodrigo was hired by e-Core (Porto Alegre-RS, Brazil) to work from home, building and maintaining software for Matterhorn Transactions Inc. with a team of great developers. Matterhorn's main product, the Market Tracker, is used by LexisNexis clients.

Part 1: Deploying Multiple Applications with Capistrano from a Single Project

Rodrigo Rosenfeld
01 Jul 2014
9 min read
Capistrano is a deployment tool written in Ruby that is able to deploy projects using any language or framework, through a set of recipes, which are also written in Ruby. Capistrano expects an application to have a single repository, and it is able to run arbitrary commands on the server through a non-interactive SSH session.

Capistrano was designed assuming that an application is completely described by a single repository with all code belonging to it. For example, your web application is written with Ruby on Rails, and simply serving that application would be enough. But what if you decide to use a separate application for managing your users, in a separate language and framework? Or maybe some issue tracker application? You could set up a proxy server to deliver each request to the right application based on the request path, for example. But the problem remains: how do you use Capistrano to manage more complex scenarios like this if it supports a single repository?

The typical approach is to integrate Capistrano into each of the component applications and then switch between those projects when deploying each component. Not only is this a lot of work, it may also lead to a duplication of settings. For example, if your main application and the user management application both use the same database for a given environment, you'd have to duplicate this setting in each of the components.

For the Market Tracker product, used by LexisNexis clients (which we develop at e-Core for Matterhorn Transactions Inc.), we were looking for a better way to manage many component applications, in lots of environments and servers. We wanted to manage all of them from a single repository, instead of adding Capistrano integration to each of our components' repositories and having to worry about keeping the recipes in sync between each of the maintained repository branches.

Motivation

The Market Tracker application we maintain consists of three different applications: the main one, another to export search results to Excel files, and an administrative interface to manage users and other entities. We host the application on three servers: two for the real thing and a backup server. The first two are identical and give us redundancy and zero-downtime deployments, except for a few cases where we change our database schema in ways that are incompatible with previous versions.

To add to the complexity of deploying our three component applications to each of those servers, we also need to deploy them multiple times for different environments like production, certification, staging, and experimental. All of them run on the same server, on separate ports, and they run separate database, Solr, and Redis instances. This is already complex enough to manage when you integrate Capistrano into each of your projects, but it gets worse.

Sometimes you find bugs in production and have to release quick fixes, but you can't deploy the version in the master branch that has several other changes. At other times you find bugs in your Capistrano recipes themselves and fix them on master. Or maybe you are changing your deploy settings rather than the application's code. When you have to deploy to production, depending on how your Capistrano recipes work, you may have to change to the production branch, backport any changes to the Capistrano recipes from master, and finally deploy the latest fixes.
This happens if your recipe uses any project files as a template and they were moved to another place in the master branch, for example.

We decided to try another approach, similar to what we do with our database migrations. Instead of integrating the database migrations into the main application (the default on Rails, Django, Grails, and similar web frameworks), we prefer to handle them as a separate project. In our case we use the active_record_migrations gem, which brings standalone support for ActiveRecord migrations (the same migrations that are bundled with Rails apps by default). Our database is shared between the administrative interface project and the main web application, and we feel it's better to be able to manage our database schema independently from the projects using the database. We add the migrations project to the other applications as submodules so that we know what database schema is expected to work for a particular commit of the application, but that's all.

We wanted to apply the same principles to our Capistrano recipes. We wanted to manage all of our applications on different servers and environments from a single project containing the Capistrano recipes. We also wanted to store the common settings in a single place to avoid code duplication, which makes it hard to add new environments or update existing ones.

Grouping all applications' Capistrano recipes in a single project

It seems we were not the first to want all Capistrano recipes for all of our applications in a single project. We first tried a project called caphub. It worked fine initially, and its inheritance model would allow us to avoid our code duplication. Well, not entirely. The problem is that we needed some kind of multiple inheritance or mixins. We have some settings, like the token private key, that are unique to each environment, like Certification and Production. But we also have other settings that are common within a server. For example, the database host name will be the same for all applications and environments inside our colocation facility, but it will be different in our backup server at Amazon EC2. CapHub didn't help us get rid of the duplication in such cases, but it certainly helped us find a simple solution to get what we wanted. Let's explore how Capistrano 3 allows us to easily manage such complex scenarios, which are more common than you might think.

Capistrano stages

Since Capistrano 3, multistage support is built in (there was a multistage extension for Capistrano 2). That means you can write cap stage_name task_name, for example cap production deploy. By default, cap install will generate two stages: production and staging. You can generate as many as you want, for example:

cap install STAGES=production,cert,staging,experimental,integrator

But how do we deploy each of those stages to our multiple servers, since the settings for each stage may be different across the servers? Also, how can we manage separate applications? Even though those settings are called "stages" by Capistrano, you can use them as you want. For example, suppose our servers are named m1, m2, and ec2 and the applications are named web, exporter, and admin. We can create settings like m1_staging_web, ec2_production_admin, and so on. This will result in lots of files (specifically 45 = 5 x 3 x 3 to support five environments, three applications, and three servers), but it's not a big deal if you consider that the settings files can be really small, as the examples later on in this article will demonstrate by using mixins.
Usually people will start with staging and production only, and then gradually add other environments. Also, they usually start with one or two servers and keep growing as they feel the need. So supporting 45 combinations is not such a pain, since you don't write all of them at once. On the other hand, if you have enough resources to have a separate server for each of your environments, Capistrano will allow you to add multiple "server" declarations and assign roles to them, which can be quite useful if you're running a cluster of servers. In our case, to avoid downtime we don't upgrade all servers in our cluster at once. We also don't have the budget to host 45 virtual machines, or even 15. So the little effort needed to generate 45 small settings files is compensated by the savings in hosting expenses.

Using mixins

My next post will create an example deployment project from scratch, providing detail for everything that has been discussed in this post. But first, let me introduce the concept of what we call a mixin in our project.

Capistrano 3 is simply a wrapper on top of Rake. Rake is a build tool written in Ruby, similar to "make". It has targets, and targets have prerequisites. This fits nicely with the way Capistrano works, where some deployment tasks will depend on other tasks. Instead of a Rakefile (Rake's Makefile), Capistrano uses a Capfile, but other than that it works almost the same way. The Domain Specific Language (DSL) in a Capfile is enhanced as you include Capistrano extensions to the Rake DSL. Here's a sample Capfile, generated by cap install, when you install Capistrano:

# Load DSL and Setup Up Stages
require 'capistrano/setup'

# Includes default deployment tasks
require 'capistrano/deploy'

# Includes tasks from other gems included in your Gemfile
#
# For documentation on these, see for example:
#
# https://github.com/capistrano/rvm
# https://github.com/capistrano/rbenv
# https://github.com/capistrano/chruby
# https://github.com/capistrano/bundler
# https://github.com/capistrano/rails
#
# require 'capistrano/rvm'
# require 'capistrano/rbenv'
# require 'capistrano/chruby'
# require 'capistrano/bundler'
# require 'capistrano/rails/assets'
# require 'capistrano/rails/migrations'

# Loads custom tasks from `lib/capistrano/tasks' if you have any defined.
Dir.glob('lib/capistrano/tasks/*.rake').each { |r| import r }

Just like a Rakefile, a Capfile is valid Ruby code, which you can easily extend using regular Ruby code. So, to support a mixin DSL, we simply need to extend the DSL, like this:

def mixin(path)
  load File.join('config', 'mixins', path + '.rb')
end

Pretty simple, right? We prefer to add this to a separate file, like lib/mixin.rb, and add this to the Capfile:

$:.unshift File.dirname(__FILE__)
require 'lib/mixin'

After that, calling mixin 'environments/staging' should load settings that are common for the staging environment from a file called config/mixins/environments/staging.rb in the root of the Capistrano-enabled project. This is the base for setting up the deployment project that we will create in the next post.

About the author

Rodrigo Rosenfeld Rosas lives in Vitória-ES, Brazil, with his lovely wife and daughter. He graduated in Electrical Engineering with a Master's degree in Robotics and Real-time Systems. For the past five years Rodrigo has focused on building and maintaining single page web applications.
He is the author of some gems including active_record_migrations, rails-web-console, the JS specs runner oojspec, sequel-devise, and the Linux X11 utility ktrayshortcut. Rodrigo was hired by e-Core (Porto Alegre-RS, Brazil) to work from home, building and maintaining software for Matterhorn Transactions Inc. with a team of great developers. Matterhorn's main product, the Market Tracker, is used by LexisNexis clients.

Part 1: Managing Multiple Apps and Environments with Capistrano 3 and Chef Solo

Rodrigo Rosenfeld
30 Jun 2014
8 min read
In my previous two posts, I explored how to use Capistrano to deploy multiple applications to different environments and servers. This, however, is only one part of our deployment procedures. It just takes care of the applications themselves, but we still rely on the server being properly set up so that our Capistrano recipes work. In these two posts I'll explain how to use Chef to manage servers, how to integrate it with Capistrano, and how to perform all of your deployment procedures from a single project.

Introducing the sample deployment project

After I wrote the previous two posts, I realized I was not fully happy with a few issues of our company's deployment strategy:

Duplicate settings: This was the main issue that was puzzling me. I didn't like the fact that we had to duplicate some settings, like the application's binding port, in both the Chef and Capistrano projects.
Too many required files (45 to support 3 servers, 5 environments, and 3 applications): While the files were really small, I felt that this situation could be further improved by the use of some conventions.

So, I decided to work on a proof-of-concept project that would integrate both Chef and Capistrano and fix these issues. After a weekend working (almost) full time on it, I came up with a sample project that you can fork and adapt to your deployment scenario. The main goal of this project hasn't changed from my previous article: we want to be able to support new environments and servers very quickly by simply adding some settings to the project. Go ahead and clone it. Follow the instructions in the README and it should deploy the Rails Devise sample application into a VirtualBox Virtual Machine (VM) using Vagrant. The following sections will explain how it works and the reasons behind its design.

The overall idea

While it's possible to accomplish all of your deployment tasks with either Chef or Capistrano alone, I feel that they are suited to different tasks. There are many existing recipes that you can take advantage of for both projects, but they usually don't overlap much. There are Chef community cookbooks available to help you install nginx, apache2, java, databases, and much more. You probably want to use Chef to perform administrative tasks like managing services, server backup, installing software, and so on. Capistrano, on the other hand, will help you deploy the applications themselves after the server is ready to go, and after running your Chef recipes. This includes creating releases of your application, which allows you to easily roll back to a previous working version, for example. You'll find existing Capistrano recipes to help you with several application-related tasks like running Bundler, switching between Ruby versions with either rbenv, rvm, or chruby, running Rails migrations and assets precompilation, and so on.

Capistrano recipes are well integrated with the Capistrano deploy flow. For instance, the capistrano-puma recipe will automatically generate a settings file if it is missing and start Puma after the remaining deployment tasks have finished, by including this in its recipes:

after 'deploy:check', 'puma:check'
after 'deploy:finished', 'puma:smart_restart'

Another difference between sysadmin and deployment tasks is that the former will usually require superuser privileges, while the latter is best accomplished by a regular user.
This way, you can feel safer when deploying Capistrano recipes, since you know they won't affect the server itself, except for the applications managed by that user account. And deploying an application is way more common than installing and configuring programs or changing the proxy's settings.

Some of the settings required by Chef and Capistrano recipes overlap. One example is a Chef recipe that generates an nginx settings file that will proxy requests to a Rails application listening on a local port. In this scenario, the binding address used by the Capistrano puma recipe needs to coincide with the port declared in the proxy settings of the nginx configuration file.

Managing deployment settings

Capistrano and Chef provide different built-in ways of managing their settings. Capistrano uses a Domain Specific Language (DSL) built around set/fetch, while Chef reads attributes following a well-described precedence. I strongly advise you to stick with those approaches for settings that are specific to each project. To let you remove the duplication caused by overlapping deployment settings, I introduced another configuration declaration mechanism for the shared settings using the configatron gem, taking advantage of the fact that both Chef and Capistrano are written in Ruby. Take a look at the settings directory in the sample project:

settings/
├── applications
│   └── rails-devise.rb
├── common.rb
├── environments
│   ├── development.rb
│   └── production.rb
└── servers
    └── vagrant.rb

The settings are split into common settings and those specific to each application, environment, and server. As you would expect, the Rails Devise application deployed to the production environment on the vagrant server will read the settings from common.rb, servers/vagrant.rb, environments/production.rb, and applications/rails-devise.rb. If some of your settings apply to the Rails Devise application running on a given server or environment (or both), it's possible to override the specific settings in other files like rails-devise_production.rb, vagrant_production.rb, or vagrant_production_rails-devise.rb. Here's the definition of load_app_settings in common_helpers/settings_loader.rb:

def load_app_settings(app_name, app_server, app_env)
  cfg.app_name = app_name
  cfg.app_server = app_server
  cfg.app_env = app_env
  [
    'common',
    "servers/#{app_server}",
    "environments/#{app_env}",
    "applications/#{app_name}",
    "#{app_server}_#{app_env}",
    "#{app_server}_#{app_name}",
    "#{app_name}_#{app_env}",
    "#{app_server}_#{app_env}_#{app_name}",
  ].each{|s| load_settings s }
  cfg.lock!
end

Feel free to change the load order. The settings loaded last take precedence over the earlier ones. So if the binding port is usually 3000 for production but 4000 on your ec2 server, you can add cfg.my_app.binding_port = 3000 to environments/production.rb and override it in ec2_production.rb. Once those settings are loaded, they are locked and can't be changed by the deployment recipes.

As a final note, the settings can also be set using a hash notation, which can be useful if you're using a dynamic setting attribute. Here's an example: cfg[:my_app]["binding_#{'port'}"] = 3000. This is not really useful in this case, but it illustrates the setting capabilities.

Calculated settings

Two types of calculated settings are supported in this project: delayed and dynamic. Delayed attributes are lazily evaluated the first time they are requested, while dynamic attributes are always evaluated.
They are useful for providing default values for settings that could be overridden by other settings files. I prefer to use delayed attributes for those that are meant to be overridden and dynamic ones for those that are meant to be calculated, even though delayed ones would be suitable for both cases. Here's the common.rb from the sample project to illustrate the idea:

require 'set'

cfg.chef_runlist = Set.new
cfg.deploy_user = 'deploy'
cfg.deployment_repo_url = 'git@github.com:rosenfeld/capistrano-chef-deployment.git'
cfg.deployment_repo_host = 'github.com'
cfg.deployment_repo_symlink = false
cfg.nginx.default = false

# Delayed attributes: they are set to the block values unless explicitly set to another value
cfg.database_name = delayed_attr{ "app1_#{cfg.app_env}" }
cfg.nginx.subdomain = delayed_attr{ cfg.app_env }

# Dynamic/calculated attributes: these are always evaluated by the block
# and are not meant to be overridable
cfg.nginx.host = dyn_attr{ "#{cfg.nginx.subdomain}.mydomain.com" }

cfg.nginx.host in this instance is not meant to be overridden by any other settings file and follows the company's policy. But it would be okay to override the production database name to app1 instead of using the default app1_production. This is just a guideline, but it should give you a good idea of some ways that Chef and Capistrano can be used together.

Conclusion

I hope you found this post as useful as I did. Being able to fully deploy the whole application stack from a single repository saves us a lot of time and simplifies our deployment a lot, and in the next post, Part 2, I will walk you through that deployment.

About the author

Rodrigo Rosenfeld Rosas lives in Vitória-ES, Brazil, with his lovely wife and daughter. He graduated in Electrical Engineering with a Master's degree in Robotics and Real-time Systems. For the past five years Rodrigo has focused on building and maintaining single page web applications. He is the author of some gems including active_record_migrations, rails-web-console, the JS specs runner oojspec, sequel-devise, and the Linux X11 utility ktrayshortcut. Rodrigo was hired by e-Core (Porto Alegre-RS, Brazil) to work from home, building and maintaining software for Matterhorn Transactions Inc. with a team of great developers. Matterhorn's main product, the Market Tracker, is used by LexisNexis clients.

Enterprise Geodatabase

Packt
25 Jun 2014
5 min read
(For more resources related to this topic, see here.) Creating a connection to the enterprise geodatabase A geodatabase connection is a channel that is established between ArcGIS and the enterprise geodatabase. To create a connection, we need to specify the database server and the user credentials. Without this information, we will not be able to create a connection. To create a geodatabase connection using the SDE user, perform the following steps: Open ArcCatalog and expand the Database Connections dialog from the Catalog Tree window. Double-click on Add Database Connection. From the Database Platform drop-down list, select the database; ours is SQL Server. In the Instance field, type the name of the server; here, it is GDBServer. Select the Database authentication option from the Authentication Type drop-down list and type in the SDE credentials. Click on the Database drop-down list. This should be populated automatically as you leave the password field. Select your geodatabase. Click on OK and rename the connection to sde@gdbserver. This is illustrated in the following screenshot: The type of geodatabase connection depends on the roles assigned to the user. Connecting with the sde user will grant you full access to the geodatabase, where you can copy, delete, and change almost anything. Create four more database connections with the users Robb, Joffrey, Tyrion, and Dany. Give them proper names so we can use them later. Migrating a file geodatabase to an enterprise geodatabase We have our enterprise geodatabase. You might have created a few feature classes and tables. But eventually, our clients at Belize need to start working on the new geodatabase. So, we need to migrate the Bestaurants_new.gdb file to this enterprise geodatabase. This can be done with a simple copy and paste operation. Note that these steps work in the exact same way on any other DBMS once it is set up. You can copy and paste from a file geodatabase to any enterprise geodatabase using the following steps: Open ArcCatalog and browse to your Bestaurants_new.gdb geodatabase. Right-click on the Food_and_Drinks feature class and select Copy, as seen in the following screenshot: Now, browse and connect to sde@gdbserver; right-click on an empty area and click on Paste, as seen in the following screenshot: You will be prompted with a list of datasets that will be copied as shown in the following screenshot. Luckily, all the configurations will be copied. This includes domains, subtypes, feature classes, and related tables as follows: After the datasets and configurations have been copied, you will see all your data in the new geodatabase. Note that in an SQL Server enterprise geodatabase, there are two prefixes added to each dataset. First, the database is added, which is sdedb, followed by the schema, which is SDE, and finally the dataset name, as shown in the following screenshot: Assigning privileges Have you tried to connect as Robb or Tyrion to your new geodatabase? If you haven't, try it now. You will see that none of the users you created have access to the Food_and_Drinks feature class or any other dataset. You might have guessed why. That is because SDE has created this data, and only this user can allow other users to see this data. So, how do we allow users to see other users' datasets? This is simple just perform the following steps: From ArcCatalog, connect as sde@gdbserver. 
Right-click on the sdedb.SDE.Food_and_Drinks feature class, point the cursor to Manage, and then click on Privileges as shown in the following screenshot: In the Privileges... dialog, click on Add. Select all four users, Robb, Joffrey, Tyrion, and Dany, and click on OK. Make sure that the Select checkbox is checked for all four users, which means they can see and read this feature class. For Dany, assign Insert, Update, and Delete so that she can also edit this feature class, as shown in the following screenshot. Apply the same privileges to all other datasets as follows and click on OK. Try connecting with Robb; you will now be able to view all datasets. You can use Dany's account to edit your geodatabase using ArcMap. You can create more viewer users who have read-only access to your geodatabase but cannot edit or modify it in any way. Summary Enterprise geodatabases are an excellent choice when you have a multiuser environment. In this article, you learned how to create a geodatabase connection using ArcCatalog to the new enterprise geodatabase. You also learned to migrate your file geodatabase into a fresh enterprise geodatabase. Finally, you learned to assign different privileges to each user and access control to your new enterprise geodatabase. While setting up and configuring an enterprise geodatabase is challenging, working with the enterprise geodatabases in ArcCatalog and ArcMap is similar to working with file geodatabases. Thus, in this article, we took a leap by using an upgraded version of a geodatabase, which is called an enterprise geodatabase. Resources for Article: Further resources on this subject: Server Logs [Article] Google Earth, Google Maps and Your Photos: a Tutorial [Article] Including Google Maps in your Posts Using Apache Roller 4.0 [Article]

Discovering Python's parallel programming tools

Packt
20 Jun 2014
3 min read
(For more resources related to this topic, see here.)

The Python threading module

The Python threading module offers a layer of abstraction over the lower-level _thread module. It provides functions that help the programmer during the hard task of developing parallel systems based on threads. The threading module's official documentation can be found at http://docs.python.org/3/library/threading.html?highlight=threading#module-threadin.

The Python multiprocessing module

The multiprocessing module aims at providing a simple API for the use of parallelism based on processes. This module's API is similar to the threading module's, which makes switching between the two approaches straightforward. The process-based approach is very popular within the Python users' community, as it is an alternative answer to questions about the use of CPU-bound threads and the GIL present in Python. The multiprocessing module's official documentation can be found at http://docs.python.org/3/library/multiprocessing.html?highlight=multiprocessing#multiprocessing.

The parallel Python module

The parallel Python module is external and offers a rich API for the creation of parallel and distributed systems making use of the processes approach. This module promises to be light and easy to install, and it integrates with other Python programs. The parallel Python module can be found at http://parallelpython.com. Among its features, we may highlight the following:

Automatic detection of the optimal configuration
The number of worker processes can be changed during runtime
Dynamic load balancing
Fault tolerance
Auto-discovery of computational resources

Celery – a distributed task queue

Celery is an excellent Python module that's used to create distributed systems and has excellent documentation. It makes use of at least three different types of approach to run tasks in concurrent form: multiprocessing, Eventlet, and Gevent. This work will, however, concentrate its efforts on the use of the multiprocessing approach. Switching from one approach to another is a configuration issue, and it is left as an exercise so that the reader is able to establish comparisons with his/her own experiments. The Celery module can be obtained on the official project page at http://celeryproject.org.

Summary

In this article, we had a short introduction to some Python modules, built-in and external, which make a developer's life easier when building parallel systems.

Resources for Article:

Further resources on this subject:
Getting Started with Spring Python [Article]
Python Testing: Installing the Robot Framework [Article]
Getting Up and Running with MySQL for Python [Article]

Getting Started with Mockito

Packt
19 Jun 2014
14 min read
(For more resources related to this topic, see here.)

Mockito is an open source framework for Java that allows you to easily create test doubles (mocks). What makes Mockito so special is that it eliminates the common expect-run-verify pattern (which was present, for example, in EasyMock; please refer to http://monkeyisland.pl/2008/02/24/can-i-test-what-i-want-please for more details), which in effect leads to a lower coupling of the test code to the production code. In other words, one does not have to define the expectations of how the mock should behave in order to verify its behavior. That way, the code is clearer and more readable for the user. On one hand, Mockito has a very active group of contributors and is actively maintained. On the other hand, at the time this article was written, the latest Mockito release (Version 1.9.5) dated from October 2012.

You may ask yourself the question, "Why should I even bother to use Mockito in the first place?" Out of many, Mockito offers the following key features:

There is no expectation phase for Mockito: you can either stub or verify the mock's behavior
You are able to mock both interfaces and classes
You can produce little boilerplate code while working with Mockito by means of annotations
You can easily verify or stub with intuitive argument matchers

Before diving into Mockito as such, one has to understand the concepts of System Under Test (SUT) and test doubles. We will base our terminology on what Gerard Meszaros has defined in the xUnit Patterns (http://xunitpatterns.com/Mocks,%20Fakes,%20Stubs%20and%20Dummies.html). SUT (http://xunitpatterns.com/SUT.html) describes the system that we are testing. It doesn't necessarily signify a class, but any part of the application that we are testing, or even the whole application as such. As for a test double (http://www.martinfowler.com/bliki/TestDouble.html), it's an object that is used only for testing purposes, instead of a real object. Let's take a look at the different types of test doubles:

Dummy: This is an object that is used only for the code to compile; it doesn't have any business logic (for example, an object passed as a parameter to a method)
Fake: This is an object that has an implementation but it's not production ready (for example, using an in-memory database instead of communicating with a standalone one)
Stub: This is an object that has predefined answers to method executions made during the test
Mock: This is an object that has predefined answers to method executions made during the test and has recorded expectations of these executions
Spy: These are objects that are similar to stubs, but they additionally record how they were executed (for example, a service that holds a record of the number of sent messages)

An additional remark relates to testing the output of our application: the more decoupled your test code is from your production code, the better, since you will have to spend less time (or even none) modifying your tests after you change the implementation of the code.

Coming back to the article's content: this article is all about getting started with Mockito. We will begin with how to add Mockito to your classpath. Then, we'll see a simple setup of tests for both the JUnit and TestNG test frameworks. Next, we will check why it is crucial to assert the behavior of the system under test instead of verifying its implementation details. Finally, we will check out some of Mockito's experimental features for adding hints and warnings to the exception messages.
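To make the "no expect-run-verify" point concrete before the recipes start, here is a minimal, self-contained sketch. It mocks java.util.List purely for illustration (the mocked type is not part of this article's examples) and uses the classic when/verify API rather than the BDD aliases used later on:

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;

import java.util.List;

import org.junit.Assert;
import org.junit.Test;

public class StubOrVerifyTest {

    @Test
    @SuppressWarnings("unchecked")
    public void stubbing_and_verification_are_independent() {
        // A mock created without any expectation phase
        List<String> mockedList = mock(List.class);

        // Stub only the call this test needs...
        when(mockedList.get(0)).thenReturn("first");

        // ...exercise the mock...
        mockedList.add("something");
        String element = mockedList.get(0);

        // ...and verify only the interaction we actually care about.
        verify(mockedList).add("something");
        Assert.assertEquals("first", element);
    }
}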
The very idea of the following recipes is to prepare your test classes to work with Mockito and to show you how to do this with as little boilerplate code as possible. Due to my fondness for behavior-driven development (http://dannorth.net/introducing-bdd/, first introduced by Dan North), I'm using Mockito's BDDMockito and AssertJ's BDDAssertions static methods to make the code even more readable and intuitive in all the test cases. Also, please read Szczepan Faber's blog (he is the author of Mockito) about the given, when, then separation in your test methods, at http://monkeyisland.pl/2009/12/07/given-when-then-forever/, since these are omnipresent throughout the article. I don't want the article to become a duplication of the Mockito documentation, which is of high quality; I would like you to take a look at good tests and get acquainted with the Mockito syntax from the beginning. What's more, I've used static imports in the code to make it even more readable, so if you get confused by any of the pieces of code, it would be best to consult the repository and the code as such.

Adding Mockito to a project's classpath

Adding Mockito to a project's classpath is as simple as adding one of the two jars to your project's classpath:

mockito-all: This is a single jar with all dependencies (with the hamcrest and objenesis libraries, as of June 2011).
mockito-core: This is only the Mockito core (without hamcrest or objenesis). Use this if you want to control which version of hamcrest or objenesis is used.

How to do it...

If you are using a dependency manager that connects to the Maven Central Repository, then you can get your dependencies as follows (examples of how to add mockito-all to your classpath for Maven and Gradle):

For Maven, use the following code:

<dependency>
  <groupId>org.mockito</groupId>
  <artifactId>mockito-all</artifactId>
  <version>1.9.5</version>
  <scope>test</scope>
</dependency>

For Gradle, use the following code:

testCompile "org.mockito:mockito-all:1.9.5"

If you are not using any of the dependency managers, you have to either download mockito-all.jar or mockito-core.jar and add it to your classpath manually (you can download the jars from https://code.google.com/p/mockito/downloads/list).

Getting started with Mockito for JUnit

Before going into details regarding the Mockito and JUnit integration, it is worth mentioning a few words about JUnit. JUnit is a testing framework (an implementation of the xUnit framework) that allows you to create repeatable tests in a very readable manner. In fact, JUnit is a port of Smalltalk's SUnit (both frameworks were originally implemented by Kent Beck). What is important in terms of JUnit and Mockito integration is that, under the hood, JUnit uses a test runner to run its tests (in xUnit terms, a test runner is a program that executes the test logic and reports the test results). Mockito has its own test runner implementation that allows you to reduce boilerplate in order to create test doubles (mocks and spies) and to inject them (either via constructors, setters, or reflection) into the defined object. What's more, you can easily create argument captors.
All of this is feasible by means of the proper annotations, as follows:

@Mock: This is used for mock creation
@Spy: This is used to create a spy instance
@InjectMocks: This is used to instantiate the @InjectMocks annotated field and inject all the @Mock or @Spy annotated fields into it (if applicable)
@Captor: This is used to create an argument captor

By default, you should profit from Mockito's annotations to make your code look neat and to reduce the boilerplate code in your application.

Getting ready

In order to add JUnit to your classpath, if you are using a dependency manager that connects to the Maven Central Repository, then you can get your dependencies as follows (examples for Maven and Gradle):

To add JUnit in Maven, use the following code:

<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.11</version>
  <scope>test</scope>
</dependency>

To add JUnit in Gradle, use the following code:

testCompile('junit:junit:4.11')

If you are not using any of the dependency managers, you have to download the following jars:

junit.jar
hamcrest-core.jar

Add the downloaded files to your classpath manually (you can download the jars from https://github.com/junit-team/junit/wiki/Download-and-Install).

For this recipe, our system under test will be a MeanTaxFactorCalculator class that will call an external service, TaxService, to get the current tax factor for the current user. It's a tax factor and not a tax as such since, for simplicity, we will not be using BigDecimals but doubles, and I'd never suggest using doubles for anything related to money:

public class MeanTaxFactorCalculator {

    private final TaxService taxService;

    public MeanTaxFactorCalculator(TaxService taxService) {
        this.taxService = taxService;
    }

    public double calculateMeanTaxFactorFor(Person person) {
        double currentTaxFactor = taxService.getCurrentTaxFactorFor(person);
        double anotherTaxFactor = taxService.getCurrentTaxFactorFor(person);
        return (currentTaxFactor + anotherTaxFactor) / 2;
    }
}

How to do it...

To use Mockito's annotations, you have to perform the following steps:

Annotate your test class with @RunWith(MockitoJUnitRunner.class).
Annotate the test fields with the @Mock or @Spy annotation to have either a mock or spy object instantiated.
Annotate the test fields with the @InjectMocks annotation to first instantiate the @InjectMocks annotated field and then inject all the @Mock or @Spy annotated fields into it (if applicable).

The following snippet shows the JUnit and Mockito integration in a test class that verifies the SUT's behavior (remember that I'm using the BDDMockito.given(...) and AssertJ's BDDAssertions.then(...) static methods):

@RunWith(MockitoJUnitRunner.class)
public class MeanTaxFactorCalculatorTest {

    static final double TAX_FACTOR = 10;

    @Mock TaxService taxService;

    @InjectMocks MeanTaxFactorCalculator systemUnderTest;

    @Test
    public void should_calculate_mean_tax_factor() {
        // given
        given(taxService.getCurrentTaxFactorFor(any(Person.class))).willReturn(TAX_FACTOR);

        // when
        double meanTaxFactor = systemUnderTest.calculateMeanTaxFactorFor(new Person());

        // then
        then(meanTaxFactor).isEqualTo(TAX_FACTOR);
    }
}

To profit from Mockito's annotations using JUnit, you just have to annotate your test class with @RunWith(MockitoJUnitRunner.class).

How it works...

The Mockito test runner will adapt its strategy depending on the version of JUnit.
If there exists an org.junit.runners.BlockJUnit4ClassRunner class, it means that the codebase is using at least JUnit Version 4.5. What eventually happens is that the MockitoAnnotations.initMocks(...) method is executed for the given test, which initializes all the Mockito annotations (for more information, check the subsequent There's more… section).

There's more...

You may have a situation where your test class has already been annotated with a @RunWith annotation, so it seems that you cannot profit from Mockito's annotations. In order to still use them, you have to call the MockitoAnnotations.initMocks method manually in the @Before annotated method of your test, as shown in the following code:

public class MeanTaxFactorCalculatorTest {

    static final double TAX_FACTOR = 10;

    @Mock TaxService taxService;

    @InjectMocks MeanTaxFactorCalculator systemUnderTest;

    @Before
    public void setup() {
        MockitoAnnotations.initMocks(this);
    }

    @Test
    public void should_calculate_mean_tax_factor() {
        // given
        given(taxService.getCurrentTaxFactorFor(Mockito.any(Person.class))).willReturn(TAX_FACTOR);

        // when
        double meanTaxFactor = systemUnderTest.calculateMeanTaxFactorFor(new Person());

        // then
        then(meanTaxFactor).isEqualTo(TAX_FACTOR);
    }
}

To use Mockito's annotations without a JUnit test runner, you have to call the MockitoAnnotations.initMocks method and pass the test class as its parameter. Mockito checks whether the user has overridden the global configuration of AnnotationEngine, and if this is not the case, the InjectingAnnotationEngine implementation is used to process the annotations in tests. What is done internally is that the test class fields are scanned for annotations and the proper test doubles are initialized and injected into the @InjectMocks annotated object (either by constructor, property setter, or field injection, in that precise order). You have to remember several factors related to the automatic injection of test doubles, as follows:

If Mockito is not able to inject test doubles into the @InjectMocks annotated fields through any of the strategies, it won't report a failure; the test will continue as if nothing happened (and most likely, you will get a NullPointerException).
For constructor injection, if arguments cannot be found, then null is passed.
For constructor injection, if nonmockable types are required in the constructor, then the constructor injection won't take place.
For other injection strategies, if you have properties with the same type (or same erasure) and Mockito matches a mock's name with a field/property name, it will inject that mock properly. Otherwise, the injection won't take place.
For other injection strategies, if the @InjectMocks annotated object wasn't previously initialized, then Mockito will instantiate the aforementioned object using a no-arg constructor, if applicable.

See also

JUnit documentation at https://github.com/junit-team/junit/wiki
Martin Fowler's article on xUnit at http://www.martinfowler.com/bliki/Xunit.html
Gerard Meszaros's xUnit Test Patterns at http://xunitpatterns.com/
The @InjectMocks Mockito documentation (with a description of the injection strategies) at http://docs.mockito.googlecode.com/hg/1.9.5/org/mockito/InjectMocks.html

Getting started with Mockito for TestNG

Before going into details regarding the Mockito and TestNG integration, it is worth mentioning a few words about TestNG.
TestNG is a unit testing framework for Java that was created, as the author states on the tool's website (refer to the See also section for the link), out of frustration with some JUnit deficiencies. TestNG was inspired by both JUnit and NUnit, and aims at covering the whole scope of testing: from unit, through functional and integration, to end-to-end tests, and so on. The JUnit library, however, was initially created for unit testing only. The main differences between JUnit and TestNG are as follows:

The TestNG author disliked JUnit's approach of having to define some methods as static for them to be executed before the test class logic gets executed (for example, the @BeforeClass annotated methods); that's why in TestNG you don't have to define these methods as static
TestNG has more annotations related to method execution before single tests, suites, and test groups
TestNG annotations are more descriptive in terms of what they do; for example, JUnit's @Before versus TestNG's @BeforeMethod

Mockito in Version 1.9.5 doesn't provide any out-of-the-box solution to integrate with TestNG in a simple way, but there is a special Mockito subproject for TestNG (refer to the See also section for the URL) that should be part of one of the subsequent Mockito releases. In the following recipe, we will take a look at how to profit from that code and that very elegant solution.

Getting ready

When you take a look at Mockito's TestNG subproject on the Mockito GitHub repository, you will find that there are three classes in the org.mockito.testng package, as follows:

MockitoAfterTestNGMethod
MockitoBeforeTestNGMethod
MockitoTestNGListener

Unfortunately, until this project eventually gets released, you have to just copy and paste those classes into your codebase.

How to do it...

To integrate TestNG and Mockito, perform the following steps:

Copy the MockitoAfterTestNGMethod, MockitoBeforeTestNGMethod, and MockitoTestNGListener classes into your codebase from Mockito's TestNG subproject.
Annotate your test class with @Listeners(MockitoTestNGListener.class).
Annotate the test fields with the @Mock or @Spy annotation to have either a mock or spy object instantiated.
Annotate the test fields with the @InjectMocks annotation to first instantiate the @InjectMocks annotated field and inject all the @Mock or @Spy annotated fields into it (if applicable).
Annotate the test fields with the @Captor annotation to make Mockito instantiate an argument captor.

Now let's take a look at this snippet that, using TestNG, checks whether the mean tax factor value has been calculated properly (remember that I'm using the BDDMockito.given(...) and AssertJ's BDDAssertions.then(...) static methods):

@Listeners(MockitoTestNGListener.class)
public class MeanTaxFactorCalculatorTestNgTest {

    static final double TAX_FACTOR = 10;

    @Mock TaxService taxService;

    @InjectMocks MeanTaxFactorCalculator systemUnderTest;

    @Test
    public void should_calculate_mean_tax_factor() {
        // given
        given(taxService.getCurrentTaxFactorFor(any(Person.class))).willReturn(TAX_FACTOR);

        // when
        double meanTaxFactor = systemUnderTest.calculateMeanTaxFactorFor(new Person());

        // then
        then(meanTaxFactor).isEqualTo(TAX_FACTOR);
    }
}

How it works...

TestNG allows you to register custom listeners (your listener class has to implement the IInvokedMethodListener interface). Once you do this, the logic inside the implemented methods will be executed before and after every configuration and test method gets called.
Mockito provides you with a listener whose responsibilities are as follows: Initialize mocks annotated with the @Mock annotation (it is done only once) Validate the usage of Mockito after each test method Remember that with TestNG, all mocks are reset (or initialized if it hasn't already been done so) before any TestNG method! See also The TestNG homepage at http://testng.org/doc/index.html The Mockito TestNG subproject at https://github.com/mockito/mockito/tree/master/subprojects/testng The Getting started with Mockito for JUnit recipe on the @InjectMocks analysis

Common performance issues

Packt
19 Jun 2014
16 min read
(For more resources related to this topic, see here.) Threading performance issues Threading performance issues are the issues related to concurrency, as follows: Lack of threading or excessive threading Threads blocking up to starvation (usually from competing on shared resources) Deadlock until the complete application hangs (threads waiting for each other) Memory performance issues Memory performance issues are the issues that are related to application memory management, as follows: Memory leakage: This issue is an explicit leakage or implicit leakage as seen in improper hashing Improper caching: This issue is due to over caching, inadequate size of the object, or missing essential caching Insufficient memory allocation: This issue is due to missing JVM memory tuning Algorithmic performance issues Implementing the application logic requires two important parameters that are related to each other; correctness and optimization. If the logic is not optimized, we have algorithmic issues, as follows: Costive algorithmic logic Unnecessary logic Work as designed performance issues The work as designed performance issue is a group of issues related to the application design. The application behaves exactly as designed but if the design has issues, it will lead to performance issues. Some examples of performance issues are as follows: Using synchronous when asynchronous should be used Neglecting remoteness, that is, using remote calls as if they are local calls Improper loading technique, that is, eager versus lazy loading techniques Selection of the size of the object Excessive serialization layers Web services granularity Too much synchronization Non-scalable architecture, especially in the integration layer or middleware Saturated hardware on a shared infrastructure Interfacing performance issues Whenever the application is dealing with resources, we may face the following interfacing issues that could impact our application performance: Using an old driver/library Missing frequent database housekeeping Database issues, such as, missing database indexes Low performing JMS or integration service bus Logging issues (excessive logging or not following the best practices while logging) Network component issues, that is, load balancer, proxy, firewall, and so on Miscellaneous performance issues Miscellaneous performance issues include different performance issues, as follows: Inconsistent performance of application components, for example, having slow components can cause the whole application to slow down Introduced performance issues to delay the processing speed Improper configuration tuning of different components, for example, JVM, application server, and so on Application-specific performance issues, such as excessive validations, apply many business rules, and so on Fake performance issues Fake performance issues could be a temporary issue or not even an issue. The famous examples are as follows: Networking temporary issues Scheduled running jobs (detected from the associated pattern) Software automatic updates (it must be disabled in production) Non-reproducible issues In the following sections, we will go through some of the listed issues. Threading performance issues Multithreading has the advantage of maximizing the hardware utilization. In particular, it maximizes the processing power by executing multiple tasks concurrently. But it has different side effects, especially if not used wisely inside the application. 
For example, in order to distribute tasks among different concurrent threads, there should be no or minimal data dependency, so each thread can complete its task without waiting for other threads to finish. Also, they shouldn't compete over different shared resources or they will be blocked, waiting for each other. We will discuss some of the common threading issues in the next section. Blocking threads A common issue where threads are blocked is waiting to obtain the monitor(s) of certain shared resources (objects), that is, holding by other threads. If most of the application server threads are consumed in a certain blocked status, the application becomes gradually unresponsive to user requests. In the Weblogic application server, if a thread keeps executing for more than a configurable period of time (not idle), it gets promoted to the Stuck thread. The more the threads are in the stuck status, the more the server status becomes critical. Configuring the stuck thread parameters is part of the Weblogic performance tuning. Performance symptoms The following symptoms are the performance symptoms that usually appear in cases of thread blocking: Slow application response (increased single request latency and pending user requests) Application server logs might show some stuck threads. The server's healthy status becomes critical on monitoring tools (application server console or different monitoring tools) Frequent application server restarts either manually or automatically Thread dump shows a lot of threads in the blocked status waiting for different resources Application profiling shows a lot of thread blocking An example of thread blocking To understand the effect of thread blocking on application execution, open the HighCPU project and measure the time it takes for execution by adding the following additional lines: long start= new Date().getTime(); .. .. long duration= new Date().getTime()-start; System.err.println("total time = "+duration); Now, try to execute the code with a different number of the thread pool size. We can try using the thread pool size as 50 and 5, and compare the results. In our results, the execution of the application with 5 threads is much faster than 50 threads! Let's now compare the NetBeans profiling results of both the executions to understand the reason behind this unexpected difference. The following screenshot shows the profiling of 50 threads; we can see a lot of blocking for the monitor in the column and the percentage of Monitor to the left waiting around at 75 percent: To get the preceding profiling screen, click on the Profile menu inside NetBeans, and then click on Profile Project (HighCPU). From the pop-up options, select Monitor and check all the available options, and then click on Run. The following screenshot shows the profiling of 5 threads, where there is almost no blocking, that is, less threads compete on these resources: Try to remove the System.out statement from inside the run() method, re-execute the tests, and compare the results. Another factor that also affects the selection of the pool size, especially when the thread execution takes long time, is the context switching overhead. This overhead requires the selection of the optimal pool size, usually related to the number of available processors for our application. Context switching is the CPU switching from one process (or thread) to another, which requires restoration of the execution data (different CPU registers and program counters). 
A context switch includes suspension of the current executing process, storing its current data, picking the next process for execution according to its priority, and restoring its data. Although context switching is supported at the hardware level and is faster there, most operating systems do this at the level of software context switching to improve performance. The main reason behind this is the ability of software context switching to selectively choose the required registers to save.

Thread deadlock
When many threads hold the monitors of objects that the other threads need, this will result in a deadlock unless the implementation uses the explicit Lock interface. In the example, we had a deadlock caused by two different threads waiting to obtain the monitor that the other thread held. The thread profiling will show these threads in a continuous blocking status, waiting for the monitors. All threads that go into the deadlock status become out of service for the user's requests, as shown in the following screenshot:
Usually, this happens if the order of obtaining the locks is not planned. For example, if we need a quick and easy fix for a multidirectional thread deadlock, we can always lock the smallest (or the largest) bank account first, regardless of the transfer direction. This will prevent any deadlock from happening in our simple two-threaded mode. But if we have more threads, we need a much more mature way to handle this, by using the Lock interface or some other technique.

Memory performance issues
In spite of all the effort the JVM puts into allocating and freeing memory in an optimized way, we still see memory issues in Java Enterprise applications, mainly due to the way people deal with memory in these applications. We will discuss mainly three types of memory issues: memory leakage, memory allocation, and application data caching.

Memory leakage
Memory leakage is a common performance issue where the garbage collector is not at fault; it is mainly a design/coding issue where an object is no longer required but remains referenced in the heap, so the garbage collector can't reclaim its space. If this is repeated with different objects over a long period (according to the object size and the involved scenarios), it may lead to an out of memory error. The most common example of memory leakage is adding objects to static collections (or an instance collection of a long-living object, such as a servlet) and forgetting to clean the collections totally or partially; a minimal sketch of this pattern is shown at the end of this section.

Performance symptoms
The following symptoms are some of the expected performance symptoms during a memory leakage in our application:
The application's heap memory usage increases over time
The response slows down gradually due to memory congestion
OutOfMemoryError occurs frequently in the logs, and sometimes an application server restart is required
Aggressive execution of garbage collection activities
A heap dump shows a lot of retained objects (of the leaking types)
A sudden increase of memory paging, as reported by the operating system monitoring tools

An example of memory leakage
We have a sample application, ExampleTwo; this is a product catalog where users can select products and add them to the basket. The application is written in spaghetti code, so it has a lot of issues, including bad design, improper object scopes, bad caching, and memory leakage.
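Here is the minimal sketch referred to above, showing the static-collection variant of the leak; the class and field names are made up for illustration and are not part of the ExampleTwo application:

import java.util.ArrayList;
import java.util.List;

public class LeakyRegistry {
    // static collection: it lives as long as the class loader does, so everything
    // added here stays reachable and can never be garbage collected
    private static final List<byte[]> CACHE = new ArrayList<byte[]>();

    public void handleRequest() {
        // every request "caches" 1 MB and nobody ever removes it
        CACHE.add(new byte[1024 * 1024]);
    }
}

Each call leaks one more megabyte; a heap dump would show the growing ArrayList as the dominant retained object, which matches the symptom list above. The fix is either to remove entries once they are no longer needed or to use a properly bounded cache.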
The following screenshot shows the product catalog browser page:
One of the bad issues is the usage of servlet instance variables (or static members), as this causes a lot of issues with multiple threads and is a common location for unnoticed memory leakages. We have added the following instance variable as a leakage location:

private final HashMap<String, HashMap> cachingAllUsersCollection = new HashMap();

We will add some collections to the preceding code to cause memory leakage. We also used caching in the session scope, which causes implicit leakage. Session scope leakage is difficult to diagnose, as it follows the session life cycle. Once the session is destroyed, the leakage stops, so we can say it is less severe but more difficult to catch.
Adding global elements, such as a catalog or stock levels, to the session scope has no meaning. The session scope should be restricted to user-specific data only. Also, forgetting to remove data that is no longer required from a session makes memory utilization worse. Refer to the following code:

@Stateful
public class CacheSessionBean

Instead of using a singleton class here, or a stateless bean with a static member, we used a stateful bean, so it is instantiated per user session. We used JPA beans in the application layers instead of using view objects. We also used loops over collections instead of querying or retrieving the required object directly, and so on. It would be good to troubleshoot this application with different profiling aspects to fix all these issues. All these factors are enough to describe such a project as spaghetti.
We can use our knowledge of Apache JMeter to develop simple testing scenarios. As shown in the following screenshot, the scenario consists of catalog navigations and adding some products to the basket:
Executing the test plan with many concurrent users over many iterations will show the bad behavior of our application, where the used memory increases over time. There is no justification for this, as the catalog is the same for all users and there is no user-specific data, except for the IDs of the selected products. These actually do need to be saved inside the user session, but they won't take any remarkable memory space. In our example, we intentionally save a lot of objects in the session, implement a wrong session-level cache, and implement meaningless servlet-level caching. All this contributes to memory leakage. This gradual increase in memory consumption is what we need to spot in our environment as early as possible (as we can see in the following screenshot, the memory consumption in our application is approaching 200 MB!):

Improper data caching
Caching is one of the critical components in the enterprise application architecture. It increases the application performance by decreasing the time required to query the object again from its data store, but it also complicates the application design and causes a lot of other secondary issues. The main concerns in a cache implementation are the caching refresh rate, the caching invalidation policy, data inconsistency in a distributed environment, locking issues while waiting to obtain a cached object's lock, and so on.

Improper caching issue types
The improper caching issue can take a lot of different variants. We will pick some of them and discuss them in the following sections.

No caching (disabled caching)
Disabled caching will definitely cause a big load on the interfacing resources (for example, the database) by hitting them with almost every interaction.
This should be avoided while designing an enterprise application; otherwise, the application won't be usable. Fortunately, this has less impact than using a wrong caching implementation! Most application components, such as the database, JPA, and application servers, already have out-of-the-box caching support.

Too small caching size
A too small caching size is a common performance issue, where the cache size is determined initially but doesn't get reviewed as the application data grows. Cache sizing is affected by many factors, such as the available memory (if it allows more caching) and the type of data: lookup data should be cached entirely when possible, while transactional data shouldn't be cached unless required, and then only under a very strict locking mechanism. Also, the cache replacement policy and invalidation play an important role and should be tailored according to the application's needs, for example, least frequently used, least recently used, most frequently used, and so on.
As a general rule, the bigger the cache size, the higher the cache hit rate and the lower the cache miss ratio. The proper replacement policy also contributes here; if we are working, as in our example, on an online product catalog, we may use the least recently used policy so that all the old products are removed, which makes sense as users usually look for new products.
Monitoring the cache utilization periodically is an essential proactive measure to catch any deviations early and adjust the cache size according to the monitoring results. For example, if the cache saturation is more than 90 percent and the cache miss ratio is high, a cache resizing is required. Cache misses are very costly, as they hit the cache first and then the resource itself (for example, the database) to get the required object, and then add this loaded object into the cache again by evicting another object (if the cache is 100 percent full), according to the used cache replacement policy.

Too big caching size
A too big caching size might cause memory issues. If there is no control over the cache size and it keeps growing, and if it is a Java cache, the garbage collector will consume a lot of time trying to garbage collect that huge memory, aiming to free some space. This will increase the garbage collection pause time and decrease the cache throughput. If the cache throughput decreases, the latency to get objects from the cache increases, causing the cache retrieval cost to become so high that it might be slower than hitting the actual resource (for example, the database).

Using the wrong caching policy
Each application's cache implementation should be tailored according to the application's needs and data types (transactional versus lookup data). If the selection of the caching policy is wrong, the cache will affect the application performance rather than improving it.
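To make the sizing and replacement-policy discussion concrete, here is a minimal, generic sketch (not code from the ExampleTwo project) of a size-bounded, least recently used cache built on java.util.LinkedHashMap; the capacity value is exactly the knob that the monitoring described above would tell us to adjust:

import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public BoundedLruCache(int capacity) {
        // accessOrder = true makes the iteration order "least recently used first"
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // evict the least recently used entry once we exceed the configured size
        return size() > capacity;
    }
}

Note that LinkedHashMap is not thread-safe, so in a real application server this would have to be wrapped in proper synchronization or replaced by a caching library; the point here is only to show how the capacity and the LRU policy fit together.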
Performance symptoms
According to the cache issue type and the different cache configurations, we will see the following symptoms:
Decreased cache hit rate (and increased cache miss ratio)
Increased cache loading because of an improper size
Increased cache latency with a huge caching size
A spiky pattern in the performance testing response time; if the cache size is not correct, it causes continuous invalidation and reloading of the cached objects

An example of improper caching techniques
In our example, ExampleTwo, we have demonstrated many caching issues, such as no policy defined, a wrong global cache, an improper local cache, and no cache invalidation implemented. So, we can have stale objects inside the cache.
Cache invalidation is the process of refreshing or updating an existing object inside the cache, or simply removing it from the cache, so that on the next load it reflects its recent values. This is to keep the cached objects always up to date.
Cache hit rate is the rate or ratio at which cache hits match (find) the required cached object. It is the main measure of cache effectiveness, together with the retrieval cost.
Cache miss rate is the rate or ratio at which the cache is hit for a required object that is not found in the cache.
Last access time is the timestamp of the last access (successful hit) to a cached object.
Caching replacement policies or algorithms are algorithms implemented by a cache to replace existing cached objects with new objects when there is no room available for any additional objects; this follows missed cache hits for these objects. Some examples of these policies are as follows:
First-in-first-out (FIFO): In this policy, the cached objects are aged and the oldest object is removed in favor of the newly added ones.
Least frequently used (LFU): In this policy, the cache picks the least frequently used object to free the memory, which means the cache will record statistics against each cached object.
Least recently used (LRU): In this policy, the cache replaces the least recently accessed or used items; this means the cache will keep information such as the last access time of all cached objects.
Most recently used (MRU): This policy is the opposite of the previous one; it removes the most recently used items. This policy fits applications where items are no longer needed after the access, such as used exam vouchers.
Aging policy: Every object in the cache has an age limit, and once it exceeds this limit, it is removed from the cache in the simple variant. In the advanced variant, the policy also considers invalidation of the cache according to predefined configuration rules, for example, every three hours, and so on.
It is important for us to understand that caching is not a magic bullet, and it has a lot of related issues and drawbacks. Sometimes, it causes overhead if not correctly tailored according to the real application needs.
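Since the hit and miss rates defined above drive most cache-tuning decisions, it can help to instrument a cache with simple counters. The following sketch is illustrative only; it wraps any Map-based cache, such as the hypothetical BoundedLruCache sketched earlier, and reports the hit ratio so that it can be checked periodically, as recommended above:

import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class InstrumentedCache<K, V> {
    private final Map<K, V> delegate;
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    public InstrumentedCache(Map<K, V> delegate) {
        this.delegate = delegate;
    }

    public V get(K key) {
        V value = delegate.get(key);
        if (value != null) {
            hits.incrementAndGet();    // successful hit; with an LRU delegate this also refreshes the access order
        } else {
            misses.incrementAndGet();  // miss: the caller must load the object from the real resource and put it back
        }
        return value;
    }

    public void put(K key, V value) {
        delegate.put(key, value);
    }

    public double hitRatio() {
        long total = hits.get() + misses.get();
        return total == 0 ? 0.0 : (double) hits.get() / total;
    }
}

For example, a scheduled monitoring job could log hitRatio() every few minutes; if the ratio stays low while the cache is saturated, that matches the resizing rule of thumb given earlier (saturation above 90 percent with a high miss ratio suggests a bigger cache or a different replacement policy).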

The anatomy of a report processor

Packt
11 Jun 2014
6 min read
(For more resources related to this topic, see here.)
At its most basic, a Puppet report processor is a piece of Ruby code that is triggered every time a Puppet agent passes a report to the Puppet master. This piece of code is passed a Ruby object that contains both the client report and metrics. Although the data is sent in a wire format, such as YAML or PSON, by the time a report processor is triggered, this data has been turned into an object by Puppet.
This code can simply produce reports, but we're not limited to that. With a little imagination, we can use Puppet report processors for everything from alerts through to the orchestration of events. For instance, using a report processor and a suitable SMS provider would make it easy for Puppet to send you an SMS alert every time a run fails. Alternatively, using a report processor, you could analyze the data to reveal trends in your changes and update a change management console. The best way to think of a report processor is as a means to trigger actions on the event of a change, rather than strictly a reporting tool.
Puppet reports are written in plain old Ruby, and so you have access to the multitude of libraries available via the RubyGems repositories. This can make developing your plugins relatively simple, as half the time you will find that the heavy lifting has been done for you by some enterprising fellow who has already solved your problem and published the code in a gem. Good examples of this can be found if you need to interoperate with another product such as MySQL, Oracle, Salesforce, and so on. A brief search on the Internet will bring up three or four examples of libraries that offer this functionality within a few lines of code. Not having to produce the plumbing of a solution will both save time and generally produce fewer bugs.

Creating a basic report processor
Let's take a look at an incredibly simple report processor example. In the event that a Puppet agent fails to run, the following code will take the incoming data and create a little text file with a short message detailing which host had the problem:

require 'puppet'

Puppet::Reports::register_report(:myfirstreport) do
  desc "My very first report!"

  def process
    if self.status == 'failed'
      msg = "failed puppet run for #{self.host} #{self.status}"
      File.open('/tmp/puppetpanic.txt', 'w') { |f| f.write(msg) }
    end
  end
end

Although this code is basic, it contains all of the components required for a report processor. The first line pulls in the only mandatory library required: the Puppet library. This gives us access to several important methods that allow us to register and describe our report processor, and finally, a method that allows us to process our data.

Registering your report processor
The first method that every report processor must call is the Puppet::Reports::register_report method. This method takes only one argument, which is the name of the report processor. This name should be passed as a symbol and should be an alphanumeric title that starts with a letter (:report3 would be fine, but :3reports would not be). Try to avoid using any other characters; although you can potentially use underscores, the documentation is rather discouragingly vague about how valid this is, and it could well cause issues.

Describing your report processor
After we've called the Puppet::Reports::register_report method, we then need to call the desc method.
The desc method is used to provide some brief documentation for what the report processor does, and it allows the use of Markdown formatting in the string.

Processing your report
The last method that every report processor must include is the process method. The process method is where we actually take our Puppet data and process it, and to make working with the report data easier, you have access to self within the process method. Here, self is a Puppet::Transaction::Report object and gives you access to the Puppet report data. For example, to extract the hostname of the reporting host, we can use self.host. You can find the full details of what is contained in the Puppet::Transaction::Report object by visiting http://docs.puppetlabs.com/puppet/latest/reference/format_report.html.
Let's go through our small example in detail and look at what it's doing. First of all, we require the Puppet library to ensure that we have access to the required methods. We then register our report by calling the Puppet::Reports::register_report(:myfirstreport) method and pass it the name myfirstreport. Next, we add our desc method to tell users what this report is for.
Finally, we have the process method, which is where we are going to place our code to process the report. For this example, we're going to keep it simple and just check whether the Puppet agent reported a successful run or not, and we do this by checking the Puppet status. This is described in the following code snippet:

if self.status == 'failed'
  msg = "failed puppet run for #{self.host} #{self.status}"

The transaction can produce one of three states: failed, changed, or unchanged. This is straightforward; a failed client run is any run that contains a resource with a status of failed, a changed state is triggered when the client run contains a resource that has been given a status of changed, and the unchanged state occurs when a resource contains a value of out_of_sync; this generally happens if you run the Puppet client in noop (simulation) mode.
Finally, we actually do something with the data. In the case of this very simple application, we're going to place the warning into a plain text file in the /tmp directory. This is described in the following code snippet:

msg = "failed puppet run for #{self.host}"
File.open('/tmp/puppetpanic.txt', 'w') { |f| f.write(msg) }

As you can see, we're using basic string interpolation to take some of our report data and place it into the message. This is then written into a simple plain text file in the /tmp directory.

Summary
In this article, we have seen the anatomy of a report processor. We have also seen basic Ruby code that sets up a simple report processor.
Resources for Article:
Further resources on this subject:
Puppet: Integrating External Tools [Article]
Quick start – Using the core Puppet resource types [Article]
External Tools and the Puppet Ecosystem [Article]


Ranges

Packt
22 May 2014
11 min read
(For more resources related to this topic, see here.)

Sorting ranges efficiently
Phobos' std.algorithm includes sorting algorithms. Let's look at how they are used, what requirements they have, and the dangers of trying to implement range primitives without minding their efficiency requirements.

Getting ready
Let's make a linked list container that exposes an appropriate view, a forward range, and an inappropriate view, a random access range that doesn't meet its efficiency requirements. A singly-linked list can only efficiently implement forward iteration due to its nature; the only tool it has is a pointer to the next element. Implementing any other range primitives will require loops, which is not recommended. Here, however, we'll implement a fully functional range, with assignable elements, length, bidirectional iteration, random access, and even slicing on top of a linked list to see the negative effects this has when we try to use it.

How to do it…
We're going to both sort and benchmark this program.

To sort
Let's sort ranges by executing the following steps:
Import std.algorithm.
Determine the predicate you need. The default is (a, b) => a < b, which results in an ascending order when the sorting is complete (for example, [1,2,3]). If you want ascending order, you don't have to specify a predicate at all. If you need descending order, you can pass a greater-than predicate instead, as shown in the following line of code:

auto sorted = sort!((a, b) => a > b)([1,2,3]); // results: [3,2,1]

When doing string comparisons, the functions std.string.cmp (case-sensitive) or std.string.icmp (case-insensitive) may be used, as is done in the following code:

auto sorted = sort!((a, b) => cmp(a, b) < 0)(["b", "c", "a"]); // results: a, b, c

Your predicate may also be used to sort based on a struct member, as shown in the following code:

auto sorted = sort!((a, b) => a.value < b.value)(structArray);

Pass the predicate as the first compile-time argument. The range you want to sort is passed as the runtime argument. If your range is not already sortable (if it doesn't provide the necessary capabilities), you can convert it to an array using the array function from std.range, as shown in the following code:

auto sorted = sort(fibanocci().take(10)); // won't compile, not enough capabilities
auto sorted = sort(fibanocci().take(10).array); // ok, good

Use the sorted range. It has a type unique from the input to signify that it has been successfully sorted. Other algorithms may use this knowledge to increase their efficiency.

To benchmark
Let's sort objects using benchmark by executing the following steps:
Put our range and skeleton main function from the Getting ready section of this recipe into a file.
Use std.datetime.benchmark to test the sorting of an array from the appropriate walker against the slow walker, and print the results at the end of main. The code is as follows:

auto result = benchmark!(
    { auto sorted = sort(list.walker.array); },
    { auto sorted = sort(list.slowWalker); }
)(100);
writefln("Emulation resulted in a sort that was %d times slower.",
    result[1].hnsecs / result[0].hnsecs);

Run it. Your results may vary slightly, but you'll see that the emulated, inappropriate range functions are consistently slower. The following is the output:

Emulation resulted in a sort that was 16 times slower.

Tweak the size of the list by changing the initialization loop. Instead of 1000 entries, try 2000 entries.
Also, try to compile the program with inlining and optimization turned on (dmd -inline -O yourfile.d) and see the difference. The emulated version will be consistently slower, and as the list becomes longer, the gap will widen. On my computer, a growing list size led to a growing slowdown factor, as shown in the following table:

List size    Slowdown factor
500          13
1000         16
2000         29
4000         73

How it works…
The interface to Phobos' main sort function hides much of the complexity of the implementation. As long as we follow the efficiency rules when writing our ranges, things either just work or fail to compile, telling us we must call array on the range before we can sort it. Building an array has a cost in both time and memory, which is why it isn't performed automatically (std.algorithm prefers lazy evaluation whenever possible for best speed and minimum memory use). However, as you can see in our benchmark, building an array is much cheaper than emulating unsupported functions.
The sort algorithms require a full-featured range and will modify the range you pass instead of allocating memory for a copy. Thus, the range you pass must support random access, slicing, and either assignable or swappable elements. The prime example of such a range is a mutable array. This is why it is often necessary to use the array function when passing data to sort.
Our linked list code used static if with a compile-time parameter as a configuration tool. The implemented functions include opSlice and properties that return ref. The ref value can only be used on function return values or parameters. Assignments to a ref value are forwarded to the original item. The opSlice function is called when the user tries to use the slice syntax: obj[start .. end].
Inside the beSlow condition, we broke the main rule of implementing range functions: avoid loops. Here, we see the consequences of breaking that rule; it ruined algorithm restrictions and optimizations, resulting in code that performs very poorly. If we follow the rules, we at least know where a performance problem will arise and can handle it gracefully.
For ranges that do not implement the fast length property, std.algorithm includes a function called walkLength that determines the length by looping through all items (like we did in the slow length property). The walkLength function has a longer name than length precisely to warn you that it is a slower function, running in O(n) (linear with length) time instead of O(1) (constant) time. Slower functions are OK; they just need to be explicit so that the user isn't surprised.

See also
The std.algorithm module also includes other sorting algorithms that may fit a specific use case better than the generic (automatically specialized) function. See the documentation at http://dlang.org/phobos/std_algorithm.html for more information.

Searching ranges
Phobos' std.algorithm module includes search functions that can work on any range. It automatically specializes based on type information. Searching a sorted range is faster than searching an unsorted range.

How to do it…
Searching has a number of different scenarios, each with different methods:
If you want to know whether something is present, use canFind.
Finding an item generically can be done with the find function. It returns the remainder of the range, with the located item at the front.
When searching for a substring in a string, you can use haystack.find(boyerMooreFinder(needle)). This uses the Boyer-Moore algorithm, which may give better performance.
If you want to know the index where the item is located, use countUntil. It returns a numeric index into the range, just like the indexOf function for strings.
Each find function can take a predicate to customize the search operation.
When you know your range is sorted but the type doesn't already prove it, you may call assumeSorted on it before passing it to the search functions. The assumeSorted function has no runtime cost; it only adds information to the type that is used for compile-time specialization.

How it works…
The search functions in Phobos make use of the ranges' available features to choose good-fit algorithms. Pass them efficiently implemented ranges with accurate capabilities to get the best performance.
The find function returns the remainder of the data because this is the most general behavior; it doesn't need random access, as returning an index would, and it doesn't require an additional function if you are implementing a function to split a range on a given condition. The find function can work with a basic input range, serving as a foundation to implement whatever you need on top of it, and it will transparently optimize to use more range features if they are available.

Using functional tools to query data
The std.algorithm module includes a variety of higher-order ranges that provide tools similar to functional tools. Here, we'll see how D code can be similar to a SQL query. A SQL query is as follows:

SELECT id, name, strcat("Title: ", title)
FROM users
WHERE name LIKE 'A%'
ORDER BY id DESC
LIMIT 5;

How would we express something similar in D?

Getting ready
Let's create a struct to mimic the data table and make an array with some demo information. The code is as follows:

struct User {
    int id;
    string name;
    string title;
}

User[] users;
users ~= User(1, "Alice", "President");
users ~= User(2, "Bob", "Manager");
users ~= User(3, "Claire", "Programmer");

How to do it…
Let's use functional tools to query data by executing the following steps:
Import std.algorithm.
Use sort to translate the ORDER BY clause. If your dataset is large, you may wish to sort it at the end. This will likely require a call to array, but it will only sort the result set instead of everything. With a small dataset, sorting early saves an array allocation.
Use filter to implement the WHERE clause.
Use map to implement the field selection and functions. The std.typecons.tuple module can also be used to return specific fields.
Use std.range.take to implement the LIMIT clause.
Put it all together and print the result. The code is as follows:

import std.algorithm;
import std.range;
import std.typecons : tuple; // we use this below

auto resultSet = users.
    sort!((a, b) => a.id > b.id).                 // the ORDER BY clause
    filter!((item) => item.name.startsWith("A")). // the WHERE clause
    take(5).
    map!((item) => tuple(item.id, item.name, "Title: " ~ item.title)); // the field list and transformations

import std.stdio;
foreach(line; resultSet)
    writeln(line[0], " ", line[1], " ", line[2]);

It will print the following output:

1 Alice Title: President

How it works…
Many SQL operations or list comprehensions can be expressed in D using some building blocks from std.algorithm. They all work in generally the same way; they take a predicate as a compile-time argument. The predicate is passed one or two items at a time, and you perform a check or transformation on it. Chaining functions together with the dot syntax, like we did here, is possible thanks to uniform function call syntax.
It could also be rewritten as take(5, filter!pred(map!pred(users))). It depends on the author's preference, as both styles work exactly the same way.
It is important to remember that all std.algorithm higher-order ranges are evaluated lazily. This means no computations, such as looping over or printing, are actually performed until they are required. Writing code using filter, take, map, and many other functions is akin to preparing a query. To execute it, you may print or loop over the result, or if you want to save it to an array for later use, simply call .array at the end.

There's more…
The std.algorithm module also includes other classic functions, such as reduce. It works the same way as the others.
D has a feature called pure functions. The functions in std.algorithm are conditionally pure, which means they can be used in pure functions if and only if the predicates you pass are also pure. With lambda functions, like the ones we've been using here, the compiler will often deduce this for you automatically. If you use other functions that you define as predicates and want to use them in a pure function, be sure to mark them pure as well.

See also
Visit http://dconf.org/2013/talks/wilson.html, where Adam Wilson's DConf 2013 talk on porting C# to D showed how to translate some real-world LINQ code to D.

Summary
In this article, we learned how to sort ranges in an efficient manner by using sorting algorithms. We learned how to search a range using different functions. We also learned how to use functional tools to query data (similar to a SQL query).
Resources for Article:
Further resources on this subject:
Watching Multiple Threads in C# [article]
Application Development in Visual C++ - The Tetris Application [article]
Building UI with XAML for Windows 8 Using C [article]


Continuous Integration

Packt
20 May 2014
14 min read
(For more resources related to this topic, see here.)
This article is named Continuous Integration; so, what exactly does this mean? You can find many long definitions, but to put it simply, it is a process where you integrate your code with code from other developers and run tests to verify the code functionality. You are aiming to detect problems as soon as possible and trying to fix problems immediately. It is always easier and cheaper to fix a couple of small problems than one big problem. This can be translated to the following workflow:
The change is committed to a version control system repository (such as Git or SVN).
The Continuous Integration (CI) server is either notified of, or detects, a change and then runs the defined tests.
CI notifies the developer if the tests fail.
With this method, you immediately know who created the problem and when. For the CI server to be able to run tests after every commit, these tests need to be fast. Usually, you can do this with unit tests; for integration and functional tests, it might be better to run them within a defined time interval, for example, once every hour. You can have multiple sets of tests for each project, and another golden rule should be that no code is released to the production environment until all of the tests have passed.
It may seem surprising, but these rules and processes shouldn't make your work any slower; in fact, they should allow you to work faster and be more confident about the developed code functionality and changes. The initial investment pays off when you can focus on adding new functionality and are not spending time on tracking bugs and fixing problems. Also, tested and reliable code can be released to the production environment more frequently than traditional big releases, which require a lot of manual testing and verification. There is a real impact on business, and it's no longer just a discussion about whether it is worthwhile to write some tests and find yourself restricted by some seemingly stupid rules.
What will really help, and is necessary, is a CI server for executing tests and processing the results; this is also called test automation. Of course, in theory you can write a script for it and test it manually, but why would you do that when there are some really nice and proven solutions available? Save your time and energy to do something more useful. In this article, we will see what we can do with the most popular CI servers used by the PHP community:
Travis CI
Jenkins CI
Xinc
For us, a CI server will always have the same main task, that is, to execute tests, but to be precise, it includes the following steps:
Check out the code from the repository.
Execute the tests.
Process the results.
Send a notification when tests fail.
This is the bare minimum that a server must handle. Of course, there is much more on offer, but these steps must be easy to configure.

Using a Travis CI hosted service
Travis is the easiest to use of the previously mentioned servers. Why is this the case? This is because you don't have to install it. It's a service that provides integration with GitHub for many programming languages, and not just for PHP. Primarily, it's a solution for open source projects, meaning your repository on GitHub is a public repository. It also has commercial support for private repositories and commercial projects.
What is really good is that you don't have to worry about server configuration; instead, you just have to specify the required configuration (in the same way you do with Composer), and Travis does everything for you. You are not limited to just unit tests; you can even specify which database you want to use and run integration tests there.
However, there is also a disadvantage to this solution. If you want to use it for a private repository, you have to pay for the service, and you are also limited with regard to the server configuration. You can specify your PHP version, but it's not recommended to specify a minor version such as 5.3.8; you should instead use a major version, such as 5.3. On the other hand, you can run tests against various PHP versions, such as PHP 5.3, 5.4, or 5.5, so when you want to upgrade your PHP version, you already have the test results and know how your code will behave with the new PHP version. Travis has become the CI server of choice for many open source projects, and it's no real surprise because it's really good!

Setting up Travis CI
To use Travis, you will need an account on GitHub. If you haven't got one, navigate to https://github.com/ and register there. When you have a GitHub account, navigate to https://travis-ci.org/ and click on Sign in with GitHub.
As you can see in the preceding screenshot, there will be a Travis application added to your GitHub account. This application will work as a trigger that starts a build after any change is pushed to the GitHub repository. To configure the Travis project, you have to follow these steps:
You will be asked to allow Travis to access your account. When you do this, you will go back to the Travis site, where you will see a list of your GitHub repositories.
By clicking on On/Off, you can decide which projects should be used by Travis.
When you click on a project configuration, you will be taken to GitHub to enable the service hook. This is because a build has to run after every commit, and Travis is going to be notified about this change. In the menu, search for Travis and fill in the details that you can find in your Travis account settings. Only the username and token are required; the domain is optional.
For a demonstration, you can refer to my sample project, where there is just one test suite, and its purpose is to test how Travis works (navigate to https://github.com/machek/travis):

Using Travis CI
When you link your GitHub account to Travis and set up a project to notify Travis, you need to configure the project. You need to follow the project setup in the same way that we did earlier. You are required to have the classes, the test suites that you want to run, a bootstrap file, and a phpunit.xml configuration file. You should try this configuration locally to ensure that you can run PHPUnit, execute the tests, and make sure that all tests pass.
If you cloned the sample project, you will see that there is one important file: .travis.yml. This Travis configuration file tells Travis what the server configuration should look like, and also what will happen after each commit.
Let's have a look at what this file looks like:

# see http://about.travis-ci.org/docs/user/languages/php/ for more hints
language: php

# list any PHP version you want to test against
php:
  - 5.3
  - 5.4

# optionally specify a list of environments
env:
  - DB=mysql

# execute any number of scripts before the test run, custom env's are available as variables
before_script:
  - if [[ "$DB" == "mysql" ]]; then mysql -e "create database IF NOT EXISTS my_db;" -uroot; fi

# omitting "script:" will default to phpunit
script: phpunit --configuration phpunit.xml --coverage-text

# configure notifications (email, IRC, campfire etc)
notifications:
  email: "your@email"

As you can see, the configuration is really simple: it says that we need PHP 5.3 and 5.4 and a MySQL database, that a database should be created, that PHPUnit should be executed with our configuration, and that a report should be sent to my e-mail address. After each commit, PHPUnit executes all the tests. The following screenshot shows us an interesting insight into how Travis executes our tests and which environment it uses:
You can view the build and the history of all builds. Even though there are no real builds in PHP, because PHP is an interpreted language and not a compiled one, the action performed when you clone a repository, execute PHPUnit tests, and process the results is usually called a build.
Travis configuration can be much more complex, and you can run Composer to update dependencies and much more. Just check the Travis documentation for PHP at http://about.travis-ci.org/docs/user/languages/php/.

Using the Jenkins CI server
Jenkins is a CI server. The difference between Travis and Jenkins is that when you use Travis as a service, you don't have to worry about the configuration, whereas Jenkins is a piece of software that you install on your own hardware. This is both an advantage and a disadvantage. The disadvantage is that you have to manually install it, configure it, and also keep it up to date. The advantage is that you can configure it in a way that suits you, and all of the data and code is completely under your control. This can be very important when you have customer code and data (for testing, never use live customer data) or sensitive information that can't be passed on to a third party.
The Jenkins project started as a fork of the Hudson project and is written in Java, but it has many plugins that suit a variety of programming languages, including PHP. In recent years, it has become very popular, and nowadays it is probably the most popular CI server. The reasons for its popularity are that it is really good, can be configured easily, and there are many plugins available that probably cover everything you might need.

Installation
Installation is a really straightforward process. The easiest method is to use a Jenkins installation package from http://jenkins-ci.org/. There are packages available for Windows, OS X, and Linux, and the installation process is well documented there. Jenkins is written in Java, which means that Java or OpenJDK is required. After this comes the installation: you just launch the installer and point it to where Jenkins should be installed, and Jenkins then listens on port 8080.
Before we move on to configuring the first project (or job, in Jenkins terminology), we need to install a few extra plugins. This is Jenkins' biggest advantage. There are many plugins and they are very easy to install. It doesn't matter that Jenkins is a Java app, as it also serves PHP very well.
For our task to execute tests, process results, and send notifications, we need the following plugins:
Email-ext: This plugin is used to send notifications
Git or Subversion: This plugin is used to check out the code
xUnit: This plugin is used for processing the PHPUnit test results
Clover PHP: This plugin is used for processing the code coverage
To install these plugins, navigate to Jenkins | Manage Jenkins | Manage Plugins and select the Available tab. You can find and check the required plugins, or alternatively use the search filter to find the one you need:
For e-mails, you might need to configure the SMTP server connection in the Manage Jenkins | Configure System | E-mail notification section.

Usage
By now, we should have installed everything that we need, and we can start to configure our first simple project. We can use the same simple project that we used for Travis. It has just one test case, but it is important to learn how to set up a project. It doesn't matter whether you have one test or thousands of tests; the setup is going to be the same.

Creating a job
The first step is to create a new job. Select New Job from the Jenkins main navigation window, give it a name, and select Build a free-style software project. After clicking on OK, you get to the project configuration page. The most interesting things there are listed as follows:
Source Code Management: This is where you check out the code
Build Triggers: This specifies when to run the build
Build: This executes the tests for us
Post-build Actions: This publishes results and sends notifications
The following screenshot shows the project configuration window in Jenkins CI:

Source Code Management
Source code management simply refers to your version control system, the path to the repository, and the branch or branches to be used. Every build is a clean operation, which means that Jenkins starts with a new directory into which the code is checked out.

Build Triggers
Build triggers are an interesting feature. You don't have to use them and you can start a build manually, but it is better to specify when a build should run. It can run periodically at a given interval (every two hours), or you can trigger a build remotely. One way to trigger a build is to use post-commit hooks in the Git/SVN repository. A post-commit hook is a script that is executed after every commit. Hooks are stored in the repository in the hooks directory (.git/hooks for Git and /hooks for SVN). What you need to do is create a post-commit (SVN) or post-receive (Git) script that calls the URL given by Jenkins when you enable the option to trigger builds remotely with a secret token:

#!/bin/sh
wget http://localhost:8080/job/Sample_Project/build?token=secret12345ABC -O /dev/null

After every commit/push to the repository, Jenkins will receive a request to run the build and execute the tests to check whether all of the tests pass and that no code change is causing unexpected problems.

Build
A build is something that might sound weird in the PHP world, as PHP is interpreted and not compiled; so, why do we call it a build? It's just a word. For us, it refers to the main part of the process: executing the unit tests. You have to navigate to Add a build step and click on either Execute Windows batch command or Execute shell. This depends on your operating system, but the command remains the same:

phpunit --log-junit=result.xml --coverage-clover=clover.xml

This is simple and outputs what we want.
It executes the tests, stores the results in the JUnit format in the result.xml file, and generates code coverage in the Clover format in the clover.xml file. I should probably mention that PHPUnit is not installed with Jenkins; the build machine on which Jenkins is running must have PHPUnit installed and configured, including the PHP CLI.

Post-build Actions
In our case, there are three post-build actions required. They are listed as follows:
Process the test result: This denotes whether the build succeeded or failed. You need to navigate to Add a post-build action | Publish JUnit test result report and type result.xml. This matches the switch --log-junit=result.xml. Jenkins will use this file to check the test results and publish them.
Generate code coverage: This is similar to the first step. You have to add the Publish Clover PHP Coverage report field and type clover.xml. It uses the second switch, --coverage-clover=clover.xml, to generate code coverage, and Jenkins uses this file to create a code coverage report.
E-mail notification: It is a good idea to send an e-mail when a build fails in order to inform everybody that there is a problem, and maybe even let them know who caused the problem and what the last commit was. This step can be added simply by choosing the E-mail notification action.

Results
The result could be just an e-mail notification, which is handy, but Jenkins also has a very nice dashboard that displays the current status of each job, and you can also view the build history to see when and why a build failed. A nice feature is that you can drill down through the test results or code coverage and find more details about test cases and code coverage per class.
To make testing even more interesting, you can use Jenkins' Continuous Integration Game plugin. Every developer receives positive points for written tests and successful builds, and negative points for every build that they break. The game leaderboard shows who is winning the build game and writing better code.

Working with a Neo4j Embedded Database

Packt
09 May 2014
6 min read
(For more resources related to this topic, see here.)
Neo4j is a graph database, which means that it does not use tables and rows to represent data logically; instead, it uses nodes and relationships. Both nodes and relationships can have a number of properties. While relationships must have exactly one direction and one type, nodes can have a number of labels. For example, the following diagram shows three nodes and their relationships, where every node has a label (language or graph database), while the relationships have a type (QUERY_LANGUAGE_OF and WRITTEN_IN).
The properties used in the graph shown in the following diagram are: name, type, and from. Note that every relationship must have exactly one type and one direction, whereas labels for nodes are optional and can be multiple.

Neo4j running modes
Neo4j can be used in two modes:
An embedded database in a Java application
A standalone server via REST
In any case, this choice does not affect the way you query and work with the database. It's only an architectural choice driven by the nature of the application (standalone or client-server), performance, monitoring, and safety of data.

An embedded database
An embedded Neo4j database is the best choice for performance. It runs in the same process as the client application that hosts it and stores data in the given path. Thus, an embedded database must be created programmatically. We choose an embedded database for the following reasons:
When we use Java as the programming language for our project
When our application is standalone

Preparing the development environment
The fastest way to prepare the IDE for Neo4j is by using Maven. Maven is a dependency management and automated build tool. In the following procedure, we will use NetBeans 7.4, but it works in a very similar way with other IDEs (for Eclipse, you would need the m2eclipse plugin). The procedure is described as follows:
Create a new Maven project as shown in the following screenshot:
In the next page of the wizard, name the project, set a valid project location, and then click on Finish.
After NetBeans has created the project, expand Project Files in the project tree, open the pom.xml file, and insert the following XML code:

<dependencies>
  <dependency>
    <groupId>org.neo4j</groupId>
    <artifactId>neo4j</artifactId>
    <version>2.0.1</version>
  </dependency>
</dependencies>
<repositories>
  <repository>
    <id>neo4j</id>
    <url>http://m2.neo4j.org/content/repositories/releases/</url>
    <releases>
      <enabled>true</enabled>
    </releases>
  </repository>
</repositories>

This code tells Maven which dependency we are using in our project, that is, Neo4j. The version we have used here is 2.0.1. Of course, you can specify the latest available version. Once saved, the Maven file resolves the dependency, downloads the needed JAR files, and updates the Java build path. Now, the project is ready to use Neo4j and Cypher.

Creating an embedded database
Creating an embedded database is straightforward. First of all, to create a database, we need a GraphDatabaseFactory class, which can be done with the following code:

GraphDatabaseFactory graphDbFactory = new GraphDatabaseFactory();

Then, we can invoke the newEmbeddedDatabase method with the following code:

GraphDatabaseService graphDb = graphDbFactory
    .newEmbeddedDatabase("data/dbName");

Now, with the GraphDatabaseService class, we can fully interact with the database, create nodes, create relationships, and set properties and indexes.
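As a quick, hedged illustration of that API (this snippet is not taken from the article's project, and the label, property, and relationship names are only examples inspired by the diagram described earlier), creating two connected nodes inside a transaction could look like this:

import org.neo4j.graphdb.DynamicLabel;
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;

// graphDb is the GraphDatabaseService obtained above
try (Transaction tx = graphDb.beginTx()) {
    Node cypher = graphDb.createNode(DynamicLabel.label("Language"));
    cypher.setProperty("name", "Cypher");

    Node neo4j = graphDb.createNode(DynamicLabel.label("GraphDatabase"));
    neo4j.setProperty("name", "Neo4j");

    // relationships always have a direction and exactly one type
    cypher.createRelationshipTo(neo4j,
            DynamicRelationshipType.withName("QUERY_LANGUAGE_OF"));

    tx.success(); // mark the transaction as successful so it commits on close
}

All write operations must happen inside a transaction, which is also why the Cypher iteration example later in this article wraps node access in a Transaction block.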
Invoking Cypher from Java
To execute Cypher queries on a Neo4j database, you need an instance of ExecutionEngine; this class is responsible for parsing and running Cypher queries, returning results in an ExecutionResult instance:

import org.neo4j.cypher.javacompat.ExecutionEngine;
import org.neo4j.cypher.javacompat.ExecutionResult;
// ...
ExecutionEngine engine = new ExecutionEngine(graphDb);
ExecutionResult result = engine.execute("MATCH (e:Employee) RETURN e");

Note that we use the org.neo4j.cypher.javacompat package and not the org.neo4j.cypher package, even though they are almost the same. The reason is that Cypher is written in Scala, and the Cypher authors provide us with the former package for better Java compatibility.
Now, with the results, we can do one of the following:
Dump them to a string value
Convert them to a single-column iterator
Iterate over the full rows
Dumping to a string is useful for testing purposes:

String dumped = result.dumpToString();

If we print the dumped string to the standard output stream, we will get the following result:
Here, we have a single column (e) that contains the nodes. Each node is dumped with all its properties. The numbers between the square brackets are the node IDs, which are the long and unique values assigned by Neo4j on the creation of the node.
When the result is a single column, or we need only one column of our result, we can get an iterator over one column with the following code:

import org.neo4j.graphdb.ResourceIterator;
// ...
ResourceIterator<Node> nodes = result.columnAs("e");

Then, we can iterate over that column in the usual way, as shown in the following code:

while(nodes.hasNext()) {
    Node node = nodes.next();
    // do something with node
}

However, Neo4j provides a syntax-sugar utility to shorten the iteration code:

import org.neo4j.helpers.collection.IteratorUtil;
// ...
for (Node node : IteratorUtil.asIterable(nodes)) {
    // do something with node
}

If we need to iterate over a multiple-column result, we would write this code in the following way:

ResourceIterator<Map<String, Object>> rows = result.iterator();
for(Map<String,Object> row : IteratorUtil.asIterable(rows)) {
    Node n = (Node) row.get("e");
    try(Transaction t = n.getGraphDatabase().beginTx()) {
        // do something with node
    }
}

The iterator function returns an iterator of maps, where the keys are the names of the columns. Note that when we have to work with nodes, even if they are returned by a Cypher query, we have to work in a transaction. In fact, Neo4j requires that every time we work with the database, either reading from or writing to it, we must be in a transaction. The only exception is when we launch a Cypher query. If we launch the query within an existing transaction, Cypher will work like any other operation: no change will be persisted in the database until we commit the transaction. However, if we run the query outside any transaction, Cypher will open a transaction for us and will commit the changes at the end of the query.

Summary
We have now completed setting up a Neo4j database. We also learned about Cypher pattern matching.
Resources for Article:
Further resources on this subject:
OpenSceneGraph: Advanced Scene Graph Components [Article]
Creating Network Graphs with Gephi [Article]
Building a bar graph cityscape [Article]


Differences in style between Java and Scala code

Packt
22 Apr 2014
6 min read
(For more resources related to this topic, see here.)
Writing an algorithm in Java follows an imperative style, that is, a sequence of statements that change a program state. Scala, focusing primarily on functional programming, adopts a more declarative approach, where everything is an expression rather than a statement. Let's illustrate this with an example. In Java, you would commonly find the following code snippet:

...
String customerLevel = null;
if(amountBought > 3000) {
    customerLevel = "Gold";
} else {
    customerLevel = "Silver";
}
...

The Scala equivalent consists of the following code snippet:

scala> val amountBought = 5000
amountBought: Int = 5000

scala> val customerLevel = if (amountBought > 3000) "Gold" else "Silver"
customerLevel: String = Gold

Note that unlike the Java statements, if is now embedded as part of the resulting evaluated expression. In general, working in a style where everything is evaluated as an expression (and here an immutable expression) makes reuse as well as composition much easier. Being able to chain the result of one expression to the next gives you a concise way of expressing fairly complicated transformations that would require much more code in Java.

Adjusting the code layout
As the intent of functional programming is to minimize stateful behavior, it often consists of short lambda expressions, so that you can visualize a fairly complicated transformation in an elegant and concise way, in many cases even as a one-liner. For this reason, general formatting in Scala recommends that you use only two-space indentation instead of the four-space indentation that is generally adopted in Java code, as shown in the following code snippet:

scala> class Customer(
  val firstName: String,
  val lastName: String,
  val age: Int,
  val address: String,
  val country: String,
  val hasAGoodRating: Boolean
) {
  override def toString() = s" $firstName $lastName"
}
defined class Customer

If you have many constructor/method parameters, having them aligned as previously illustrated makes it easier to change them without the need to reformat the whole indentation. The same applies if you want to refactor the class to a longer name, for example, VeryImportantCustomer instead of Customer; it will produce smaller and more precise diffs in your version control system (Git, Subversion, and so on).

Naming conventions
Conventions for naming packages, classes, fields, and methods in camel case generally follow the Java conventions. Note that you should avoid the underscore (_) in variable names (such as first_name or _first_name), as the underscore has a special meaning in Scala (it is used as a placeholder in anonymous functions). However, constants, most likely declared as private static final myConstant in Java, are normally declared in Scala in upper camel case, such as in the following enclosing object:

scala> object Constants {
     |   val MyNeverChangingAge = 20
     | }
defined module Constants

Choosing a meaningful name for variables and methods should always be a priority in Java, and it is often recommended to use rather long variable names to precisely describe what a variable or method represents. In Scala, things are a little bit different; meaningful names are, of course, a good way to make code more readable. However, as we are at the same time aiming at making behavior transformations concise through the use of functions and lambda expressions, short variable names can be an advantage if you can capture a whole piece of functionality in a short block of code.
For example, incrementing a list of integers in Scala can simply be expressed as follows:

scala> val amounts = List(3,6,7,10) map ( x => x + 1 )
amounts: List[Int] = List(4, 7, 8, 11)

Although using x as a variable name is often discouraged in Java, here it does not matter that much, as the variable is not reused and we can take in the transformation it performs at a glance. There are many short or long alternatives to the previous lambda syntax that will produce the same result. So, which one to choose? Some of the alternatives are as follows:

scala> val amounts = List(3,6,7,10) map ( myCurrentAmount => myCurrentAmount + 1 )
amounts: List[Int] = List(4, 7, 8, 11)

In this case, a long variable name breaks a clear and concise one-liner into two lines of code, thereby making it harder to understand. Meaningful names make more sense if we start expressing logic over several lines, as shown in the following code snippet:

scala> val amounts = List(3,6,7,10) map { myCurrentAmount =>
     |   val result = myCurrentAmount + 1
     |   println("Result: " + result)
     |   result
     | }
Result: 4
Result: 7
Result: 8
Result: 11
amounts: List[Int] = List(4, 7, 8, 11)

A shorter but still expressive name is sometimes a good compromise to indicate to the reader that this is an amount we are currently manipulating in our lambda expression, as follows:

scala> val amounts = List(3,6,7,10) map( amt => amt + 1 )
amounts: List[Int] = List(4, 7, 8, 11)

Finally, the shortest syntax of all, well accepted by fluent Scala programmers for such a simple increment function, is as follows:

scala> val amounts = List(3,6,7,10) map( _ + 1 )
amounts: List[Int] = List(4, 7, 8, 11)

Underscores are also encountered in Scala for expressing more complicated operations in an elegant but arguably more cryptic way, as in the following sum operation using the foldLeft method, which accumulates state from one element to the next:

scala> val sumOfAmounts = List(3,6,7,10).foldLeft(0)( _ + _ )
sumOfAmounts: Int = 26

Instead of explicitly having 0 as the initial value for the sum, we can write this summation a bit more elegantly by using the reduce method, which is similar to foldLeft except that the first element of the collection is taken as the initial value (here, 3 will be the initial value), as shown in the following command:

scala> val sumOfAmounts = List(3,6,7,10) reduce ( _ + _ )
sumOfAmounts: Int = 26

As far as style is concerned, fluent Scala programmers will not have any problem reading this code. However, if the state accumulation operation is more complicated than a simple + operation, it might be wise to write it more explicitly, as shown in the following command:

scala> val sumOfAmounts = List(3,6,7,10) reduce ( (total, element) => total + element )
sumOfAmounts: Int = 26

Summary
In this article, we discussed the style differences and naming conventions that we must be aware of in order to write easier-to-read and more maintainable code.
Resources for Article:
Further resources on this subject:
The Business Layer (Java EE 7 First Look) [article]
Getting Started with JavaFX [article]
Enterprise JavaBeans [article]