Listening Out

Packt
17 Aug 2015
14 min read
In this article by Mat Johns, author of the book Getting Started with Hazelcast - Second Edition, we will learn about the following topics:

- Creating and using collection listeners
- Instance, lifecycle, and cluster membership listeners
- The partition migration listener
- Quorum functions and listeners

Listening to the goings-on

One great feature of Hazelcast is its ability to notify us of the goings-on of our persisted data and the cluster as a whole, allowing us to register an interest in events. The listener concept is borrowed from Java, so you should feel pretty familiar with it. To provide this, there are a number of listener interfaces that we can implement to receive, process, and handle different types of events, one of which we have previously encountered. The following are the listener interfaces:

Collection listeners:

- EntryListener is used for map-based (IMap and MultiMap) events
- ItemListener is used for flat collection-based (IList, ISet, and IQueue) events
- MessageListener is used to receive topic events, but as we've seen before, it is used as part of the standard operation of topics
- QuorumListener is used for quorum state change events

Cluster listeners:

- DistributedObjectListener is used for collection creation and destruction events
- MembershipListener is used for cluster membership events
- LifecycleListener is used for local node state events
- MigrationListener is used for partition migration state events

The sound of our own data

Being notified about data changes can be rather useful, as we can make an application-level decision about whether the change is important or not and react accordingly. The first interface that we are going to look at is EntryListener. This class will notify us when changes are made to the entries stored in a map collection. If we take a look at the interface, we can see four entry event types and two map-wide events that we will be notified about. EntryListener has also been broken up into a number of individual super MapListener interfaces, so should we be interested in only a subset of event types, we can implement the appropriate super interfaces as required. Let's take a look at the following code:

void entryAdded(EntryEvent<K, V> event);
void entryRemoved(EntryEvent<K, V> event);
void entryUpdated(EntryEvent<K, V> event);
void entryEvicted(EntryEvent<K, V> event);
void mapCleared(MapEvent event);
void mapEvicted(MapEvent event);

Hopefully, the first three are pretty self-explanatory. However, the fourth is a little less clear and, in fact, one of the most useful. The entryEvicted method is invoked when an entry is removed from a map non-programmatically (that is, Hazelcast has done it all by itself). This will occur in one of the following two scenarios:

- An entry's TTL has been reached and the entry has expired
- The map size, according to the configured policy, has been reached, and the appropriate eviction policy has kicked in to clear out space in the map

The first scenario gives us a capability that is very rarely found in data sources: to have our application be told when a time-bound record has expired and to be able to trigger some behavior based on it. For example, we can use it to automatically trigger a teardown operation if an entry is not correctly maintained by a user's interactions. This allows us to generate an event based on the absence of activity, which is rather useful!
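The Berlin entry that expires after 10 seconds in the example that follows is presumably written with a per-entry TTL; a minimal sketch of that, assuming the capitals map used throughout the example, could look like the following (the TTL could equally be configured map-wide with time-to-live-seconds in hazelcast.xml):

import java.util.concurrent.TimeUnit;

IMap<String, String> capitals = hz.getMap("capitals");

// Expire this entry automatically after 10 seconds; once it is evicted,
// any registered EntryListener will receive the entryEvicted callback.
capitals.put("DE", "Berlin", 10, TimeUnit.SECONDS);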
Let's create an example MapEntryListener class to illustrate the various events firing off:

public class MapEntryListener implements EntryListener<String, String> {

  @Override
  public void entryAdded(EntryEvent<String, String> event) {
    System.err.println("Added: " + event);
  }

  @Override
  public void entryRemoved(EntryEvent<String, String> event) {
    System.err.println("Removed: " + event);
  }

  @Override
  public void entryUpdated(EntryEvent<String, String> event) {
    System.err.println("Updated: " + event);
  }

  @Override
  public void entryEvicted(EntryEvent<String, String> event) {
    System.err.println("Evicted: " + event);
  }

  @Override
  public void mapCleared(MapEvent event) {
    System.err.println("Map Cleared: " + event);
  }

  @Override
  public void mapEvicted(MapEvent event) {
    System.err.println("Map Evicted: " + event);
  }
}

We shall see the various events firing off as expected, with a short 10-second wait for the Berlin entry to expire, which will trigger the eviction event, as follows:

Added: EntryEvent {c:capitals} key=GB, oldValue=null, value=Winchester, event=ADDED, by Member [127.0.0.1]:5701 this
Updated: EntryEvent {c:capitals} key=GB, oldValue=Winchester, value=London, event=UPDATED, by Member [127.0.0.1]:5701 this
Added: EntryEvent {c:capitals} key=DE, oldValue=null, value=Berlin, event=ADDED, by Member [127.0.0.1]:5701 this
Removed: EntryEvent {c:capitals} key=GB, oldValue=null, value=London, event=REMOVED, by Member [127.0.0.1]:5701 this
Evicted: EntryEvent {c:capitals} key=DE, oldValue=null, value=Berlin, event=EVICTED, by Member [127.0.0.1]:5701 this

We can obviously implement the interface as extensively as needed to service our application, potentially creating no-op stubs should we wish not to handle a particular type of event.

Continuously querying

The previous example focuses on notifying us of all the entry events. However, what if we were only interested in some particular data? We could obviously filter events in our listener so that it only handles the entries we are interested in. However, it is potentially expensive to have all the events flying around the cluster, especially if our interest lies only in a minority of the potential data. To address this, we can combine capabilities from the map-searching features that we looked at a while back. When we register the entry listener with a collection, we can optionally provide a search Predicate that is used as a filter within Hazelcast itself. We can whittle events down to relevant data before they even reach our listener, as follows:

IMap<String, String> capitals = hz.getMap("capitals");
capitals.addEntryListener(new MapEntryListener(),
    new SqlPredicate("name = 'London'"), true);

Listeners racing into action

One issue with the previous example is that we retrospectively reconfigured the map to feature the listener after it was already in service. To avoid this race condition, we should wire up the listener before the node enters service. We can do this by registering the listener within the map configuration, as follows:

<hazelcast>
  <map name="default">
    <entry-listeners>
      <entry-listener include-value="true">
        com.packtpub.hazelcast.listeners.MapEntryListener
      </entry-listener>
    </entry-listeners>
  </map>
</hazelcast>

However, in both methods of configuration, we have provided a Boolean flag when registering the listener with the map. This include-value flag allows us to configure whether, when the listener is invoked, we are interested in just the key of the event entry or in the entry's value as well.
The default behavior (true) is to include the value, but if the use case does not require it, there is a performance benefit in not having to provide it to the listener. So, if the use case does not require this extra data, it is beneficial to set this flag to false.

Keyless collections

Though the keyless collections (set, list, and queue) are very similar to map collections, they feature their own interface to define the available events, in this case, ItemListener. It is not as extensive as its map counterpart, featuring just the itemAdded and itemRemoved events, and it can be used in the same way, though it only offers visibility of these two event types.

Programmatic configuration ahead of time

So far, most of the extra configuration that we have applied has been done either by customizing the hazelcast.xml file or by retrospectively modifying a collection in code. However, what if we want to programmatically configure Hazelcast without the race condition that we discovered earlier? Fortunately, there is a way. By creating an instance of the Config class, we can configure the appropriate behavior on it by using a hierarchy that is similar to the XML configuration, but in code. The previous example can be reconfigured to pass this configuration object over to the instance creation method, as follows:

public static void main(String[] args) {
  Config conf = new Config();
  conf.addListenerConfig(
      new EntryListenerConfig(new MapEntryListener(), false, true));
  HazelcastInstance hz = Hazelcast.newHazelcastInstance(conf);

Events unfolding in the wider world

Now that we can determine what is going on with our data within the cluster, we may wish to have a higher degree of visibility of the state of the cluster itself. We can use this either to trigger application-level responses to cluster instability, or to provide mechanisms that enable graceful scaling. We are provided with a number of interfaces for different types of cluster activity. All of these listeners can be configured retrospectively, as we have seen in the previous examples. However, in production, it is better to configure them in advance, for the same race condition reasons as with the collection listeners. We can do this either by using the hazelcast.xml configuration or by using the Config class, as follows:

<hazelcast>
  <listeners>
    <listener>com.packtpub.hazelcast.MyClusterListener</listener>
  </listeners>
</hazelcast>

The first of these, DistributedObjectListener, simply notifies all the nodes in the cluster about the collection objects that are being created or destroyed. Again, let's create a new example listener, ClusterObjectListener, to receive events, as follows:

public class ClusterObjectListener implements DistributedObjectListener {

  @Override
  public void distributedObjectCreated(DistributedObjectEvent event) {
    System.err.println("Created: " + event);
  }

  @Override
  public void distributedObjectDestroyed(DistributedObjectEvent event) {
    System.err.println("Destroyed: " + event);
  }
}

As these listeners are for cluster-wide events, the example usage of this listener is rather simple.
It mainly creates an instance with the appropriate listener registered, as follows:

public class ClusterListeningExample {
  public static void main(String[] args) {
    Config config = new Config();
    config.addListenerConfig(new ListenerConfig(new ClusterObjectListener()));
    HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
  }
}

When using the TestApp console, we can create and destroy some collections, as follows:

hazelcast[default] > ns test
namespace: test
hazelcast[test] > m.put foo bar
null
hazelcast[test] > m.destroy
Destroyed!

The preceding commands will produce the following logging on ALL the nodes that feature the listener:

Created: DistributedObjectEvent{eventType=CREATED, serviceName='hz:impl:mapService', distributedObject=IMap{name='test'}}
Destroyed: DistributedObjectEvent{eventType=DESTROYED, serviceName='hz:impl:mapService', distributedObject=IMap{name='test'}}

The next type of cluster listener is MembershipListener, which notifies all the nodes about a node joining or leaving the cluster. Let's create another example class, this time ClusterMembershipListener, as follows:

public class ClusterMembershipListener implements MembershipListener {

  @Override
  public void memberAdded(MembershipEvent membershipEvent) {
    System.err.println("Added: " + membershipEvent);
  }

  @Override
  public void memberRemoved(MembershipEvent membershipEvent) {
    System.err.println("Removed: " + membershipEvent);
  }

  @Override
  public void memberAttributeChanged(MemberAttributeEvent memberAttributeEvent) {
    System.err.println("Changed: " + memberAttributeEvent);
  }
}

Let's add the following code to the previous example application:

conf.addListenerConfig(new ListenerConfig(new ClusterMembershipListener()));

Lastly, we have LifecycleListener, which is local to an individual node and allows the application built on top of Hazelcast to understand its particular node's state by being notified as it changes when starting, pausing, resuming, or even shutting down, as follows:

public class NodeLifecycleListener implements LifecycleListener {

  @Override
  public void stateChanged(LifecycleEvent event) {
    System.err.println(event);
  }
}

Moving data around the place

The final listener is very useful as it lets an application know when Hazelcast is rebalancing the data within the cluster. This gives us an opportunity to prevent or even block the shutdown of a node, as we might be in a period of increased risk to data resilience because we may be actively moving data around at the time. The interface used in this case is MigrationListener. It will notify the application when partitions migrate from one node to another and when the migrations complete:

public class ClusterMigrationListener implements MigrationListener {

  @Override
  public void migrationStarted(MigrationEvent migrationEvent) {
    System.err.println("Started: " + migrationEvent);
  }

  @Override
  public void migrationCompleted(MigrationEvent migrationEvent) {
    System.err.println("Completed: " + migrationEvent);
  }

  @Override
  public void migrationFailed(MigrationEvent migrationEvent) {
    System.err.println("Failed: " + migrationEvent);
  }
}

When you register this cluster listener in your example application and create and destroy various nodes, you will see a deluge of events that show the ongoing migrations. The more astute among you may have previously spotted a repartitioning task being logged when spinning up multiple nodes:

INFO: [127.0.0.1]:5701 [dev] [3.5] Re-partitioning cluster data...
Migration queue size: 271

The preceding log indicates that 271 tasks (one migration task for each partition being migrated) have been scheduled to rebalance the cluster. The new listener will now give us significantly more visibility of these events as they occur and, hopefully, complete successfully:

Started: MigrationEvent{partitionId=98, oldOwner=Member [127.0.0.1]:5701, newOwner=Member [127.0.0.1]:5702 this}
Completed: MigrationEvent{partitionId=98, oldOwner=Member [127.0.0.1]:5701, newOwner=Member [127.0.0.1]:5702 this}
Started: MigrationEvent{partitionId=99, oldOwner=Member [127.0.0.1]:5701, newOwner=Member [127.0.0.1]:5702 this}
Completed: MigrationEvent{partitionId=99, oldOwner=Member [127.0.0.1]:5701, newOwner=Member [127.0.0.1]:5702 this}

However, this logging information is overwhelming and actually not all that useful to us. So, let's expand on the listener to try to provide the application with the ability to check whether the cluster is currently migrating data partitions or has recently done so. Let's create a new static class, MigrationStatus, to hold information about cluster migration and help us interrogate its current status:

public abstract class MigrationStatus {

  private static final Map<Integer, Boolean> MIGRATION_STATE =
      new ConcurrentHashMap<Integer, Boolean>();

  private static final AtomicLong LAST_MIGRATION_TIME =
      new AtomicLong(System.currentTimeMillis());

  public static void migrationEvent(int partitionId, boolean state) {
    MIGRATION_STATE.put(partitionId, state);
    if (!state) {
      LAST_MIGRATION_TIME.set(System.currentTimeMillis());
    }
  }

  public static boolean isMigrating() {
    Collection<Boolean> migrationStates = MIGRATION_STATE.values();
    Long lastMigrationTime = LAST_MIGRATION_TIME.get();

    // did we recently (< 10 seconds ago) complete a migration
    if (System.currentTimeMillis() < lastMigrationTime + 10000) {
      return true;
    }

    // are any partitions currently migrating
    for (Boolean partition : migrationStates) {
      if (partition) return true;
    }

    // otherwise we're not migrating
    return false;
  }
}

Then, we will update the listener to pass the appropriate calls through in response to the events coming into it, as follows:

@Override
public void migrationStarted(MigrationEvent migrationEvent) {
  MigrationStatus.migrationEvent(migrationEvent.getPartitionId(), true);
}

@Override
public void migrationCompleted(MigrationEvent migrationEvent) {
  MigrationStatus.migrationEvent(migrationEvent.getPartitionId(), false);
}

@Override
public void migrationFailed(MigrationEvent migrationEvent) {
  System.err.println("Failed: " + migrationEvent);
  MigrationStatus.migrationEvent(migrationEvent.getPartitionId(), false);
}

Finally, let's add a loop to the example application to print out the migration state over time, as follows:

public static void main(String[] args) throws Exception {
  Config conf = new Config();
  conf.addListenerConfig(new ListenerConfig(new ClusterMembershipListener()));
  conf.addListenerConfig(new ListenerConfig(new MigrationStatusListener()));
  HazelcastInstance hz = Hazelcast.newHazelcastInstance(conf);

  while (true) {
    System.err.println("Is Migrating?: " + MigrationStatus.isMigrating());
    Thread.sleep(5000);
  }
}

When starting and stopping various nodes, we should see each node detect the rebalance occurring, but it passes by quite quickly. It is in these small, critical periods of time, when data is being moved around, that resilience is most at risk, although depending on the configured number of backups, the risk could potentially be quite small.
Added: MembershipEvent {member=Member [127.0.0.1]:5703, type=added}
Is Migrating?: true
Is Migrating?: true
Is Migrating?: false

Extending quorum

We previously saw how we can configure a simple cluster health check to ensure that a given number of nodes are present to support the application. However, should we need more detailed control over the quorum definition, beyond a simple node count check, we can create our own quorum function that allows us to programmatically define what it means to be healthy. This can be as simple or as complex as the application requires. In the following example, we source an expected cluster size (in practice, probably from somewhere more suitable than a hard-coded value) and dynamically check whether a majority of the nodes are present:

public class QuorumExample {
  public static void main(String[] args) throws Exception {
    QuorumConfig quorumConf = new QuorumConfig();
    quorumConf.setName("atLeastTwoNodesWithMajority");
    quorumConf.setEnabled(true);
    quorumConf.setType(QuorumType.WRITE);

    final int expectedClusterSize = 5;

    quorumConf.setQuorumFunctionImplementation(new QuorumFunction() {
      @Override
      public boolean apply(Collection<Member> members) {
        return members.size() >= 2
            && members.size() > expectedClusterSize / 2;
      }
    });

    MapConfig mapConf = new MapConfig();
    mapConf.setName("default");
    mapConf.setQuorumName("atLeastTwoNodesWithMajority");

    Config conf = new Config();
    conf.addQuorumConfig(quorumConf);
    conf.addMapConfig(mapConf);

    HazelcastInstance hz = Hazelcast.newHazelcastInstance(conf);
    new ConsoleApp(hz).start(args);
  }
}

We can also create a listener for the quorum health check so that we are notified when the state of the quorum changes, as follows:

public class ClusterQuorumListener implements QuorumListener {

  @Override
  public void onChange(QuorumEvent quorumEvent) {
    System.err.println("Changed: " + quorumEvent);
  }
}

Let's attach the new listener to the appropriate configuration, as follows:

quorumConf.addListenerConfig(new QuorumListenerConfig(new ClusterQuorumListener()));

Summary

Hazelcast allows us to be a first-hand witness to a lot of internal state information. By registering listeners so that they are notified as events occur, we can further enhance an application not only in terms of its functionality, but also with respect to its resilience. By allowing the application to know when and what events are unfolding underneath it, we can add defensiveness to it, embracing the dynamic and destroyable nature of ephemeral approaches towards applications and infrastructure.

Further resources on this subject:

- What is Hazelcast?
- Apache Solr and Big Data – integration with MongoDB
- Introduction to Apache ZooKeeper
Elegant RESTful Client in Python for Exposing Remote Resources

Xavier Bruhiere
12 Aug 2015
6 min read
Product Hunt addicts like me might have noticed how often a "developer" tab is available on landing pages. More and more modern products offer a special entry point tailored for coders who want deeper interaction, beyond the standard end-user experience. Twitter, Myo, and Estimote are great examples of technologies an engineer could leverage for their own tool or product. And Application Programming Interfaces (APIs) make it possible. Companies design them as a communication contract between the developer and their product.

We can distinguish Representational State Transfer (RESTful) APIs from programmatic ones. The latter usually offer deeper technical integration, while the former try to abstract most of the product's complexity behind intuitive remote resources (more on that later). The resulting simplicity owes a lot to the HTTP protocol and turns out to be trickier than one thinks. Both RESTful servers and clients often underestimate the value of HTTP's historical rules or the challenges behind network failures.

In this article, I will share my recent experience of building an HTTP+JSON API client. We are going to build a small framework in Python to interact with well-designed third-party services. You should get out of it a consistent starting point for new projects, like remotely controlling your car!

Stack and Context

Before diving in, let's state an important assumption: the APIs our client will call are well designed. They enforce RFC standards, conventions, and consistent resources. Sometimes, however, the real world throws ugly interfaces at us. Always read the documentation (if any) and deal with it.

The choice of Python should be seen as a minor implementation consideration. Nevertheless, it will bring us the powerful requests package and a nice REPL to manually explore remote services. Its popularity also suggests we are likely to be able to integrate our package into a future project.

To keep things practical, requests will hit Consul HTTP endpoints, providing us with a handy interface to our infrastructure. Consul, as a whole, is a tool for discovering and configuring services in your infrastructure. Just download the appropriate binary, move it into your $PATH, and start a new server:

consul agent -server -bootstrap-expect 1 -data-dir /tmp/consul -node consul-server

We also need Python 3.4 or 2.7, pip installed, and then to download the single dependency we mentioned earlier with pip install requests==2.7.0. Now let's have a conversation with an API!

Sending requests

APIs expose resources for manipulation through HTTP verbs. Say we need to retrieve the nodes in our cluster; the Consul documentation requires us to perform a GET /v1/catalog/nodes.

import requests

def http_get(resource, payload=None):
    """ Perform an HTTP GET request against the given endpoint. """
    # Avoid dangerous default function argument `{}`
    payload = payload or {}

    # versioning an API guarantees compatibility
    endpoint = '{}/{}/{}'.format('http://localhost:8500', 'v1', resource)

    return requests.get(
        endpoint,
        # attach parameters to the url, like `&foo=bar`
        params=payload,
        # tell the API we expect to parse JSON responses
        headers={'Accept': 'application/vnd.consul+json; version=1'})

Providing Consul is running on the same host, we get the following result:

In [4]: res = http_get('catalog/nodes')

In [5]: res.json()
Out[5]: [{'Address': '172.17.0.1', 'Node': 'consul-server'}]

Awesome: a few lines of code gave us really convenient access to Consul information. Let's leverage OOP to abstract the nodes resource further.
Mapping resources

The idea is to consider a Catalog class whose attributes are Consul API resources. A little bit of Python magic offers an elegant way to achieve that.

class Catalog(object):

    # url specific path
    _path = 'catalog'

    def __getattr__(self, name):
        """ Extend built-in method to add support for attributes
        related to endpoints.

        Example: agent.members runs GET /v1/agent/members
        """
        # Default behavior
        if name in self.__dict__:
            return self.__dict__[name]
        # Dynamic attribute based on the property name
        else:
            return http_get('/'.join([self._path, name]))

It might seem a little cryptic if you are not familiar with Python's built-in object methods, but the usage is crystal clear:

In [47]: catalog_ = Catalog()

In [48]: catalog_.nodes.json()
Out[48]: [{'Address': '172.17.0.1', 'Node': 'consul-server'}]

The really nice benefit of this approach is that we become very productive in supporting new resources. Just rename the previous class to ClientFactory and profit.

class Status(ClientFactory):
    _path = 'status'

In [58]: status_ = Status()

In [59]: status_.peers.json()
Out[59]: ['172.17.0.1:8300']

But what if the resource we call does not exist? And, although we provide a header with Accept: application/json, what if we actually don't get back a JSON object, or we reach our rate limit?

Reading responses

Let's challenge our current implementation against those questions.

In [61]: status_.not_there
Out[61]: <Response [404]>

In [68]: # ok, that's a consistent response
In [69]: # 404 HTTP code means the resource wasn't found on server-side

In [69]: status_.not_there.json()
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
...
ValueError: Expecting value: line 1 column 1 (char 0)

Well, that's not safe at all. We're going to wrap our HTTP calls with a decorator in charge of inspecting the API response.

def safe_request(fct):
    """ Return Go-like data (i.e. actual response and possible error)
    instead of raising errors.
    """
    def inner(*args, **kwargs):
        data, error = {}, None

        try:
            res = fct(*args, **kwargs)
        except requests.exceptions.ConnectionError as error:
            return None, {'message': str(error), 'id': -1}

        if res.status_code == 200 and res.headers['content-type'] == 'application/json':
            # expected behavior
            data = res.json()
        elif res.status_code == 206 and res.headers['content-type'] == 'application/json':
            # partial response, return as-is
            data = res.json()
        else:
            # something went wrong
            error = {'id': res.status_code, 'message': res.reason}

        return res, error
    return inner

# update our old code
@safe_request
def http_get(resource):
    # ...

This implementation still requires us to check for errors instead of disposing of the data right away. But we are dealing with networks, and unexpected failures will happen. Being aware of them without crashing or wrapping every resource with try/catch is a working compromise.

In [71]: res, err = status_.not_there

In [72]: print(err)
{'id': 404, 'message': 'Not Found'}

Conclusion

We just covered an opinionated Python abstraction for programmatically exposing remote resources. Subclassing the objects above allows one to quickly interact with new services, through command-line tools or an interactive prompt. Yet, we have only worked with the GET method. Most APIs also allow resource deletion (DELETE), update (PUT), or creation (POST), to name a few HTTP verbs.
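Extending the client beyond GET follows the same shape. The following is a rough sketch of my own (the http_put and http_delete helpers and the Consul key/value paths used here are illustrative assumptions, not part of the original article) showing how write operations could reuse the safe_request decorator:

import requests

BASE_ENDPOINT = 'http://localhost:8500/v1'

@safe_request
def http_put(resource, payload=None):
    """ Perform an HTTP PUT request against the given endpoint. """
    return requests.put('{}/{}'.format(BASE_ENDPOINT, resource), data=payload)

@safe_request
def http_delete(resource):
    """ Perform an HTTP DELETE request against the given endpoint. """
    return requests.delete('{}/{}'.format(BASE_ENDPOINT, resource))

# usage sketch against Consul's key/value store
res, err = http_put('kv/foo', 'bar')    # store a value under the key "foo"
res, err = http_delete('kv/foo')        # remove it again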
Other future work could involve:

- authentication
- smarter HTTP code handling when dealing with forbidden, rate-limiting, or internal server error responses

Given the incredible services that have emerged lately (IBM Watson, Docker, ...), building API clients is an increasingly productive option for developing innovative projects.

About the Author

Xavier Bruhiere is a Lead Developer at AppTurbo in Paris, where he develops innovative prototypes to support company growth. He is addicted to learning, hacking on intriguing hot techs (both soft and hard), and practicing high-intensity sports.
Working with XAML

Packt
11 Aug 2015
14 min read
XAML is also known as Extensible Application Markup Language. XAML is a generic language like XML and can be used for multiple purposes and in the WPF and Silverlight applications. XAML is majorly used to declaratively design user interfaces. In this article by Abhishek Shukla, author of the book Blend for Visual Studio 2012 by Example Beginner's Guide, we had a look at the various layout controls and at how to use them in our application. In this article, we will have a look at the following topics: The fundamentals of XAML The use of XAML for applications in Blend (For more resources related to this topic, see here.) An important point to note here is that almost everything that we can do using XAML can also be done using C#. The following is the XAML and C# code to accomplish the same task of adding a rectangle within a canvas. In the XAML code, a canvas is created and an instance of a rectangle is created, which is placed inside the canvas: XAML code: In the following code, we declare the Rectangle element within Canvas, and that makes the Rectangle element the child of Canvas. The hierarchy of the elements defines the parent-child relationship of the elements. When we declare an element in XAML, it is the same as initializing it with a default constructor, and when we set an attribute in XAML, it is equivalent to setting the same property or event handler in code. In the following code, we set the various properties of Rectangle, such as Height, Width, and so on: <Canvas>    <Rectangle Height="100" Width="250" Fill="AliceBlue"               StrokeThickness="5" Stroke="Black" /> </Canvas> C# code: We created Canvas and Rectangle. Then, we set a few properties of the Rectangle element and then placed Rectangle inside Canvas: Canvas layoutRoot = new Canvas(); Rectangle rectangle = new Rectangle();   rectangle.Height = 100; rectangle.Width = 250; rectangle.Fill = Brushes.AliceBlue; rectangle.StrokeThickness = 5; rectangle.Stroke = Brushes.Black;   layoutRoot.Children.Add(rectangle);   AddChild(layoutRoot); We can clearly see why XAML is the preferred choice to define UI elements—XAML code is shorter, more readable, and has all the advantages that a declarative language provides. Another major advantage of working with XAML rather than C# is instant design feedback. As we type XAML code, we see the changes on the art board instantly. Whereas in the case of C#, we have to run the application to see the changes. Thanks to XAML, the process of creating a UI is now more like visual design than code development. When we drag and drop an element or draw an element on the art board, Blend generates XAML in the background. This is helpful as we do not need to hand-code XAML as we would have to when working with text-editing tools. Generally, the order of attaching properties and event handlers is performed based on the order in which they are defined in the object element. However, this should not matter, ideally, because, as per the design guidelines, the classes should allow properties and event handlers to be specified in any order. Wherever we create a UI, we should use XAML, and whatever relates to data should be processed in code. XAML is great for UIs that may have logic, but XAML is not intended to process data, which should be prepared in code and posted to the UI for displaying purposes. It is data processing that XAML is not designed for. The basics of XAML Each element in XAML maps to an instance of a .NET class, and the name of the element is exactly the same as the class. 
For example, the <Button> element in XAML is an instruction to create an instance of the Button class. XAML specifications define the rules to map the namespaces, types, events, and properties of object-oriented languages to XML namespaces. Time for action – taking a look at XAML code Perform the following steps and take a look at the XAML namespaces after creating a WPF application: Let's create a new WPF project in Expression Blend and name it Chapter03. In Blend, open MainWindow.xaml, and click on the split-view option so that we can see both the design view as well as the XAML view. The following screenshot shows this: You will also notice that a grid is present under the window and there is no element above the window, which means that Window is the root element for the current document. We see this in XAML, and we will also see multiple attributes set on Window: <Window x_Class="Chapter03.MainWindow" Title="MainWindow" Height="350" Width="525">    We can see in the preceding code that the XAML file has a root element. For a XAML document to be valid, there should be one and only one root element. Generally, we have a window, page, or user control as the root element. Other root elements that are used are ResourceDictionary for dictionaries and Application for application definition.    The Window element also contains a few attributes, including a class name and two XML namespaces. The three properties (Title, Height, and Width) define the caption of the window and default size of the window. The class name, as you might have guessed, defines the class name of the window. The http://schemas.microsoft.com/winfx/2006/xaml/presentation is is the default namespace. The default namespace allows us to add elements to the page without specifying any prefix. So, all other namespaces defined must have a unique prefix for reference. For example, the namespace http://schemas.microsoft.com/winfx/2006/xaml is defined with the prefix :x. The prefix that is mapped to the schema allows us to reference this namespace just using the prefix instead of the full-schema namespace.    XML namespaces don't have a one-to-one mapping with .NET namespaces. WPF types are spread across multiple namespaces, and one-to-one mapping would be rather cumbersome. So, when we refer to the presentation, various namespaces, such as System.Windows and System.Windows.Controls, are included.    All the elements and their attributes defined in MainPage.xaml should be defined in at least one of the schemas mentioned in the root element of XAML; otherwise, the XAML document will be invalid, and there is no guarantee that the compiler will understand and continue. When we add Rectangle in XAML, we expect Rectangle and its attributes to be part of the default schema: <Rectangle x_Name="someRectangle" Fill="AliceBlue"/> What just happened? We had a look at the XAML namespaces that we use when we create a WPF application. Time for action – adding other namespaces in XAML In this section, we will add another namespace apart from the default namespace: We can use any other namespace in XAML as well. To do that, we need to declare the XML namespaces the schemas of which we want to adhere to. The syntax to do that is as follows: Prefix is the XML prefix we want to use in our XAML to represent that namespace. For example, the XAML namespace uses the :x prefix. Namespace is the fully qualified .NET namespace. 
AssemblyName is the assembly where the type is declared, and this assembly could be the current project assembly or a referenced assembly. Open the XAML view of MainWindow.xaml if it is not already open, and add the following line of code after the reference to To create an instance of an object we would have to use this namespace prefix as <system:Double></system:Double> We can access the types defined in the current assembly by referencing the namespace of the current project:   To create an instance of an object we would have to use this namespace prefix as   <local:MyObj></local:MyObj> What just happened? We saw how we can add more namespaces in XAML apart from the ones present by default and how we can use them to create objects. Naming elements In XAML, it is not mandatory to add a name to every element, but we might want to name some of the XAML elements that we want to access in the code or XAML. We can change properties or attach the event handler to, or detach it from, elements on the fly. We can set the name property by hand-coding in XAML or setting it in the properties window. This is shown in the following screenshot: The code-behind class The x:Class attribute in XAML references the code-behind class for XAML. You would notice the x:, which means that the Class attribute comes from the XAML namespace, as discussed earlier. The value of the attribute references the MainWindow class in the Chapter03 namespace. If we go to the MainWindow.xaml.cs file, we can see the partial class defined there. The code-behind class is where we can put C# (or VB) code for the implementation of event handlers and other application logic. As we discussed earlier, it is technically possible to create all of the XAML elements in code, but that bypasses the advantages of having XAML. So, as these two are partial classes, they are compiled into one class. So, as C# and XAML are equivalent and both these are partial classes, they are compiled into the same IL. Time for action – using a named element in a code-behind class Go to MainWindow.xaml.cs and change the background of the LayoutRoot grid to Green: public MainWindow() {    InitializeComponent();    LayoutRoot.Background = Brushes.Green; } Run the application; you will see that the background color of the grid is green. What just happened? We accessed the element defined in XAML by its name. Default properties The content of a XAML element is the value that we can simply put between the tags without any specific references. For example, we can set the content of a button as follows: <Button Content="Some text" /> or <Button > Some text </Button> The default properties are specified in the help file. So, all we need to do is press F1 on any of our controls to see what the value is. Expressing properties as attributes We can express the properties of an element as an XML attribute. Time for action – adding elements in XAML by hand-coding In this section, instead of dragging and dropping controls, we will add XAML code: Move MainWindow.xaml to XAML and add the code shown here to add TextBlock and three buttons in Grid: <Grid x_Name="LayoutRoot"> <TextBlock Text="00:00" Height="170" Margin="49,32,38,116" Width="429" FontSize="48"/>    <Button Content="Start" Height="50" Margin="49,220,342,49"     Width="125"/>    <Button Content="Stop" Height="50" Margin="203,220,188,49"     Width="125"/>    <Button Content="Reset" Height="50" Margin="353,220,38,49"    Width="125"/> </Grid> We set a few properties for each of the elements. 
The property types of the various properties are also mentioned. These are the .NET types to which the type converter would convert them:    Content: This is the content displayed onscreen. This property is of the Object type.    Height: This is the height of the button. This property is of the Double type.    Width: This is the width of the button. This property is of the Double type.    Margin: This is the amount of space outside the control, that is, between the edge of the control and its container. This property is of the Thickness type.    Text: This is the text displayed onscreen. This property is of the String type.    FontSize: This is the size of the font of the text. This property is of the Double type. What just happened? We added elements in XAML with properties as XML attributes. Non-attribute syntax We will define a gradient background for the grid, which is a complex property. Notice that the code in the next section sets the background property of the grid using a different type converter this time. Instead of a string-to-brush converter, the LinearGradientBrush to brush type converter would be used. Time for action – defining the gradient for the grid Add the following code inside the grid. We have specified two gradient stops. The first one is black and the second one is a shade of green. We have specified the starting point of LinearGradientBrush as the top-left corner and the ending point as the bottom-right corner, so we will see a diagonal gradient: <Grid x_Name="LayoutRoot"> <Grid.Background>    <LinearGradientBrush EndPoint="1,1" StartPoint="0,0">      <GradientStop Color="Black"/>      <GradientStop Color="#FF27EC07" Offset="1"/>    </LinearGradientBrush> </Grid.Background> The following is the output of the preceding code: Comments in XAML Using <!-- --> tags, we can add comments in XAML just as we add comments in XML. Comments are really useful when we have complex XAML with lots of elements and complex layouts: <!-- TextBlock to show the timer --> Styles in XAML We would like all the buttons in our application to look the same, and we can achieve this using styles. Here, we will just see how styles can be defined and used in XAML. Defining a style We will define a style as a static resource in the window. We can move all the properties we define for each button to that style. Time for action – defining style in XAML Add a new <Window.Resources> tag in XAML, and then add the code for the style, as shown here: <Window.Resources> <Style TargetType="Button" x_Key="MyButtonStyle">    <Setter Property="Height" Value="50" />    <Setter Property="Width" Value="125" />    <Setter Property="Margin" Value="0,10" />    <Setter Property="FontSize" Value="18"/>    <Setter Property="FontWeight" Value="Bold" />    <Setter Property="Background" Value="Black" />    <Setter Property="Foreground" Value="Green" /> </Style> </Window.Resources> We defined a few properties on style that are worth noting: TargetType: This property specifies the type of element to which we will apply this style. In our case, it is Button. x:Key: This is the unique key to reference the style within the window. Setter: The setter elements contain the property name and value. What just happened We defined a style for a button in the window. Using a style Let's use the style that we defined in the previous section. All UI controls have a style property (from the FrameworkElement base class). 
Time for action – defining style in XAML To use a style, we have to set the style property of the element as shown in the following code. Add the same style code to all the buttons:    We set the style property using curly braces ({}) because we are using another element.    StaticResource denotes that we are using another resource.    MyButtonStyle is the key to refer to the style. The following code encapsulates the style properties: <Button x_Name="BtnStart" Content="Start" Grid.Row="1" Grid.Column="0" Style="{StaticResource MyButtonStyle}"/> The following is the output of the preceding code: What just happened? We used a style defined for a button. Where to go from here This article gave a brief overview of XAML, which helps you to start designing your applications in Expression Blend. However, if you wish know more about XAML, you could visit the following MSDN links and go through the various XAML specifications and details: XAML in Silverlight: http://msdn.microsoft.com/en-us/library/cc189054(v=vs.95).aspx XAML in WPF: http://msdn.microsoft.com/en-us/library/ms747122(v=vs.110).aspx Pop quiz Q1. How can we find out the default property of a control? F1. F2. F3. F4. Q2. Is it possible to create a custom type converter? Yes. No. Summary We had a look at the basics of XAML, including namespaces, elements, and properties. With this introduction, you can now hand-edit XAML, where necessary, and this will allow you to tweak the output of Expression Blend or even hand-code an entire UI if you are more comfortable with that. Resources for Article: Further resources on this subject: Windows Phone 8 Applications [article] Building UI with XAML for Windows 8 Using C [article] Introduction to Modern OpenGL [article]
Advanced Data Access Patterns

Packt
11 Aug 2015
25 min read
In this article by Suhas Chatekar, author of the book Learning NHibernate 4, we will dig deeper into that statement and try to understand what those downsides are and what can be done about them. In our attempt to address the downsides of repository, we will present two data access patterns, namely the specification pattern and the query object pattern. The specification pattern is a pattern adopted into the data access layer from a general-purpose pattern used for effectively filtering in-memory data.

Before we begin, let me reiterate: repository pattern is not a bad or wrong choice in every situation. If you are building a small and simple application involving a handful of entities, then repository pattern can serve you well. But if you are building complex domain logic with intricate database interaction, then repository may not do justice to your code. The patterns presented can be used in both simple and complex applications, and if you feel that repository is doing the job perfectly, then there is no need to move away from it.

Problems with repository pattern

A lot has been written all over the Internet about what is wrong with repository pattern. A simple Google search will give you a lot of interesting articles to read and ponder. We will spend some time trying to understand the problems introduced by repository pattern.

Generalization

FindAll takes the name of the employee as input, along with some other parameters required for performing the search. When we started putting together a repository, we said that Repository<T> is a common repository class that can be used for any entity. But now FindAll takes a parameter that is only available on Employee, thus locking the implementation of FindAll to the Employee entity only. In order to keep the repository reusable by other entities, we would need to part ways with the common Repository<T> class and implement a more specific EmployeeRepository class with Employee-specific querying methods. This fixes the immediate problem but introduces another one. The new EmployeeRepository breaks the contract offered by IRepository<T>, as the FindAll method cannot be pushed onto the IRepository<T> interface. We would need to add a new interface, IEmployeeRepository. Do you notice where this is going? You would end up implementing a lot of repository classes with complex inheritance relationships between them. While this may seem to work, I have experienced that there are better ways of solving this problem.

Unclear and confusing contract

What happens if there is a need to query employees by a different criterion for a different business requirement? Say we now need to fetch a single Employee instance by its employee number. Even if we ignore the above issue and are ready to add a repository class per entity, we would need to add a method that is specific to fetching the Employee instance matching the employee number. This adds another dimension to the code maintenance problem. Imagine how many such methods we would end up adding for a complex domain every time someone needs to query an entity using a new criterion. Several methods on the repository contract that query the same entity using different criteria make the contract less clear and confusing for new developers. Such a pattern also makes it difficult to reuse code, even if two methods are only slightly different from each other.
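To make the problem concrete, the following is a rough sketch of my own (the interface and method names here are illustrative assumptions, not code from the book) showing how an entity-specific repository contract tends to accumulate one method per query criterion:

using System;
using System.Collections.Generic;

// Hypothetical contract that has grown one method per query requirement.
// Each new way of searching Employee forces yet another method onto it,
// and very little query logic can be shared between them.
public interface IEmployeeRepository : IRepository<Employee>
{
    IEnumerable<Employee> FindAll(string name);
    Employee FindByEmployeeNumber(string employeeNumber);
    IEnumerable<Employee> FindByDepartment(string department);
    IEnumerable<Employee> FindHiredAfter(DateTime hiredAfter);
}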
Leaky abstraction

In order to make methods on repositories reusable in different situations, a lot of developers tend to add a single method on the repository that does not take any input and returns an IQueryable<T> by calling ISession.Query<T> inside it, as shown next:

public IQueryable<T> FindAll()
{
    return session.Query<T>();
}

The IQueryable<T> returned by this method can then be used to construct any query you want outside of the repository. This is a classic case of leaky abstraction. Repository is supposed to abstract away any concerns around querying the database, but what we are doing here is returning an IQueryable<T> to the consuming code and asking it to build the queries, thus leaking the abstraction that is supposed to be hidden inside the repository. The IQueryable<T> returned by the preceding method holds an instance of ISession that would be used to ultimately interact with the database. Since the repository has no control over how and when this IQueryable would invoke the database interaction, you might get into trouble. If you are using a "session per request" kind of pattern then you are safeguarded against it, but if you are not using that pattern for any reason, then you need to watch out for errors due to closed or disposed session objects.

God object anti-pattern

A god object is an object that does too many things. Sometimes, there is a single class in an application that does everything. Such an implementation is almost always bad, as it majorly breaks the famous single responsibility principle (SRP) and reduces the testability and maintainability of the code. A lot can be written about SRP and the god object anti-pattern, but since it is not the primary topic, I will leave it after underscoring the importance of staying away from the god object anti-pattern. Avid readers can Google the topic if they are interested. Repositories by nature tend to become a single point of database interaction. Any new database interaction goes through the repository. Over time, repositories grow organically, with a large number of methods doing too many things. You may spot the anti-pattern and decide to break the repository into multiple small repositories, but the original single repository would be tightly integrated with your code in so many places that splitting it would be a difficult job.

For a contained and trivial domain model, repository pattern can be a good choice, so do not abandon repositories entirely. It is around a complex and changing domain that repositories start exhibiting the problems just discussed. You might still argue that repository is an unneeded abstraction and we can very well use NHibernate directly for a trivial domain model. But I would caution against any design that uses NHibernate directly from the domain or domain services layer. No matter what design I use for data access, I would always adhere to the "explicitly declare capabilities required" principle. The abstraction that offers the required capability can be a repository interface or one of the other abstractions that we will learn about.

Specification pattern

Specification pattern is a reusable and object-oriented way of applying business rules on domain entities. The primary use of specification pattern is to select a subset of entities from a larger collection of entities based on some rules. An important characteristic of specification pattern is the ability to combine multiple rules by chaining them together. Specification pattern was in existence before ORMs and other data access patterns had set their feet in the development community.
The original form of specification pattern dealt with in-memory collections of entities. The pattern was then adapted to work with ORMs such as NHibernate, as people started seeing the benefits that specification pattern could bring. We will first discuss specification pattern in its original form, which will give us a good understanding of the pattern. We will then modify the implementation to make it fit with NHibernate.

Specification pattern in its original form

Let's look at an example of specification pattern in its original form. A specification defines a rule that must be satisfied by domain objects. This can be generalized using an interface definition, as follows:

public interface ISpecification<T>
{
    bool IsSatisfiedBy(T entity);
}

ISpecification<T> defines a single method, IsSatisfiedBy. This method takes an entity instance of type T as input and returns a Boolean value depending on whether the entity passed satisfies the rule or not. If we were to write a rule for employees living in London, then we could implement a specification as follows:

public class EmployeesLivingIn : ISpecification<Employee>
{
    public bool IsSatisfiedBy(Employee entity)
    {
        return entity.ResidentialAddress.City == "London";
    }
}

The EmployeesLivingIn class implements ISpecification<Employee>, telling us that this is a specification for the Employee entity. This specification compares the city from the employee's ResidentialAddress property with the literal string "London" and returns true if it matches. You may be wondering why I have named this class EmployeesLivingIn. Well, I had some refactoring in mind and I wanted to make my final code read nicely. Let's see what I mean.

We have hardcoded the literal string "London" in the preceding specification. This effectively stops this class from being reusable. What if we need a specification for all employees living in Paris? The ideal thing to do would be to accept "London" as a parameter during instantiation of this class and then use that parameter value in the implementation of the IsSatisfiedBy method. The following code listing shows the modified code:

public class EmployeesLivingIn : ISpecification<Employee>
{
    private readonly string city;

    public EmployeesLivingIn(string city)
    {
        this.city = city;
    }

    public bool IsSatisfiedBy(Employee entity)
    {
        return entity.ResidentialAddress.City == city;
    }
}

This looks good, without any hardcoded string literals. Now if I wanted my original specification for employees living in London, then the following is how I could build it:

var specification = new EmployeesLivingIn("London");

Did you notice how the preceding code reads in plain English because of the way the class is named? Now, let's see how to use this specification class. The usual scenario where specifications are used is when you have a list of entities that you are working with and you want to run a rule and find out which of the entities in the list satisfy that rule. The following code listing shows a very simple use of the specification we just implemented:

List<Employee> employees = //Loaded from somewhere
List<Employee> employeesLivingInLondon = new List<Employee>();
var specification = new EmployeesLivingIn("London");

foreach(var employee in employees)
{
    if(specification.IsSatisfiedBy(employee))
    {
        employeesLivingInLondon.Add(employee);
    }
}

We have a list of employees loaded from somewhere, and we want to filter this list and get another list comprising the employees living in London.
Till this point, the only benefit we have had from specification pattern is that we have managed to encapsulate the rule in a specification class which can now be reused anywhere. For complex rules, this can be very useful. But for simple rules, specification pattern may look like a lot of plumbing code, unless we consider the composability of specifications. Most of the power of specification pattern comes from the ability to chain multiple rules together to form a complex rule. Let's write another specification for employees who have opted for any benefit:

public class EmployeesHavingOptedForBenefits : ISpecification<Employee>
{
    public bool IsSatisfiedBy(Employee entity)
    {
        return entity.Benefits.Count > 0;
    }
}

In this rule, there is no need to supply any literal value from outside, so the implementation is quite simple. We just check whether the Benefits collection on the passed employee instance has a count greater than zero. You can use this specification in exactly the same way as the earlier specification was used. Now if there is a need to apply both of these specifications to an employee collection, then very little modification to our code is needed. Let's start by adding an And method to the ISpecification<T> interface, as shown next:

public interface ISpecification<T>
{
    bool IsSatisfiedBy(T entity);
    ISpecification<T> And(ISpecification<T> specification);
}

The And method accepts an instance of ISpecification<T> and returns another instance of the same type. As you would have guessed, the specification that is returned from the And method effectively performs a logical AND operation between the specification on which the And method is invoked and the specification that is passed into the And method. The actual implementation of the And method comes down to calling the IsSatisfiedBy method on both specification objects and logically ANDing their results. Since this logic does not change from specification to specification, we can introduce a base class that implements it. All specification implementations can then derive from this new base class. Following is the code for the base class:

public abstract class Specification<T> : ISpecification<T>
{
    public abstract bool IsSatisfiedBy(T entity);

    public ISpecification<T> And(ISpecification<T> specification)
    {
        return new AndSpecification<T>(this, specification);
    }
}

We have marked Specification<T> as abstract, as this class does not represent any meaningful business specification and hence we do not want anyone to inadvertently use it directly. Accordingly, the IsSatisfiedBy method is marked abstract as well. In the implementation of the And method, we instantiate a new class, AndSpecification. This class takes two specification objects as inputs: we pass the current instance and the one that is passed to the And method. The definition of AndSpecification is very simple:

public class AndSpecification<T> : Specification<T>
{
    private readonly Specification<T> specification1;
    private readonly ISpecification<T> specification2;

    public AndSpecification(Specification<T> specification1, ISpecification<T> specification2)
    {
        this.specification1 = specification1;
        this.specification2 = specification2;
    }

    public override bool IsSatisfiedBy(T entity)
    {
        return specification1.IsSatisfiedBy(entity) &&
               specification2.IsSatisfiedBy(entity);
    }
}

AndSpecification<T> inherits from the abstract class Specification<T>, which is obvious.
Its IsSatisfiedBy simply performs a logical AND on the outputs of the IsSatisfiedBy method of each of the specification objects passed into AndSpecification<T>. After we change our previous two business specification implementations to inherit from the abstract class Specification<T> instead of implementing the interface ISpecification<T> directly, the following is how we can chain two specifications using the And method that we just introduced:

List<Employee> employees = null; //= Load from somewhere
List<Employee> employeesLivingInLondon = new List<Employee>();
var specification = new EmployeesLivingIn("London")
                        .And(new EmployeesHavingOptedForBenefits());

foreach (var employee in employees)
{
    if (specification.IsSatisfiedBy(employee))
    {
        employeesLivingInLondon.Add(employee);
    }
}

Literally nothing has changed in how the specification is used in the business logic. The only thing that has changed is the construction and chaining together of the two specifications, as shown in the preceding listing. We could go on and implement other chaining methods, but the point to take home here is the composability that the specification pattern offers. Now let's look into how the specification pattern sits beside NHibernate and helps in fixing some of the pain points of the repository pattern.

Specification pattern for NHibernate

The fundamental difference between the original specification pattern and the pattern applied to NHibernate is that in the former case we had an in-memory list of objects to work with. With NHibernate, we do not have the list of objects in memory. We have the list in the database, and we want to be able to specify rules that can be used to generate the appropriate SQL to fetch the records from the database that satisfy the rule. Owing to this difference, we cannot use the original specification pattern as is when we are working with NHibernate. Let me show you what this means when it comes to writing code that makes use of the specification pattern. A query, in its most basic form, to retrieve all employees living in London would look something like the following:

var employees = session.Query<Employee>()
                       .Where(e => e.ResidentialAddress.City == "London");

The lambda expression passed to the Where method is our rule. We want all the Employee instances from the database that satisfy this rule, and we want to be able to push this rule behind some kind of abstraction, such as ISpecification<T>, so that it can be reused. We would need a method on ISpecification<T> that does not take any input (there are no in-memory entities to pass) and returns a lambda expression that can be passed into the Where method. The following is how that method could look:

public interface ISpecification<T> where T : EntityBase<T>
{
    Expression<Func<T, bool>> IsSatisfied();
}

Note the differences from the previous version. We have changed the method name from IsSatisfiedBy to IsSatisfied, as there is no entity being passed into this method that would warrant the use of the word By at the end. This method returns an Expression<Func<T, bool>>. If you have dealt with situations where you pass lambda expressions around, then you know what this type means. If you are new to expression trees, here is a brief explanation. Func<T, bool> is an ordinary delegate type: it points to a function that takes an instance of type T as input and returns a Boolean output. Expression<Func<T, bool>> represents such a function as an expression tree, that is, as data describing the lambda rather than compiled code, which is what allows NHibernate to translate it into SQL.
An implementation of this new interface will make things clearer. The next code listing shows the specification for employees living in London written against the new contract:

public class EmployeesLivingIn : ISpecification<Employee>
{
    private readonly string city;

    public EmployeesLivingIn(string city)
    {
        this.city = city;
    }

    public Expression<Func<Employee, bool>> IsSatisfied()
    {
        return e => e.ResidentialAddress.City == city;
    }
}

Not much has changed here compared to the previous implementation. The definition of IsSatisfied now returns a lambda expression instead of a bool. This lambda is exactly the same as the one we used in the ISession example. If I had to rewrite that example using the preceding specification, the following is how it would look:

var specification = new EmployeesLivingIn("London");
var employees = session.Query<Employee>()
                       .Where(specification.IsSatisfied());

We now have a specification wrapped in a reusable object that we can send straight to NHibernate's ISession interface. Now let's think about how we can use this from within domain services, where we used repositories before. We do not want to reference ISession or any other NHibernate type from domain services, as that would break the onion architecture. We have two options. We can declare a new capability that takes a specification and executes it against the ISession interface, and make the domain service classes take a dependency on this new capability. Or we can use the existing IRepository capability and add a method to it that takes the specification and executes it.

We started this article with the statement that repositories have a downside, specifically when it comes to querying entities using different criteria, and now we are considering an option to enrich the repositories with specifications. Is that contradictory? Remember that one of the problems with the repository was that every time there was a new criterion to query an entity by, we needed a new method on the repository. The specification pattern fixes that problem. It takes the criterion out of the repository and moves it into its own class, so we only ever need a single method on the repository that takes an ISpecification<T> and executes it. So using the repository is not as bad as it sounds. The following is how the new method on the repository interface would look:

public interface IRepository<T> where T : EntityBase<T>
{
    void Save(T entity);
    void Update(int id, T entity);
    T GetById(int id);
    IEnumerable<T> Apply(ISpecification<T> specification);
}

The Apply method is the new method that works with specifications. Note that we have removed all the other methods that ran various different queries and replaced them with this single method. The methods to save and update entities are still there. Even the GetById method is there, as the mechanism used to get an entity by ID is not the same as the one used by specifications, so we retain that method.

One thing I have experimented with in some projects is to split read operations from write operations. The IRepository interface represents something that is capable of both reading from the database and writing to the database. Sometimes, we only need the capability to read from the database, in which case IRepository looks like an unnecessarily heavy object with capabilities we do not need. In such a situation, declaring a new capability to execute specifications makes more sense. I would leave the actual code for this as a self-exercise for our readers.
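As a starting point for that exercise, the following is one possible shape such a read-only capability could take. This is only a sketch under my own assumptions: the ISpecificationExecutor name, its Apply method, and the idea of injecting ISession are illustrative choices, not code from this book. It uses session.Query<T>() from NHibernate.Linq, so the implementation belongs in the infrastructure layer; domain services would depend only on the interface.

// Illustrative sketch: type and member names are assumptions, not the book's code.
// Requires NHibernate, NHibernate.Linq, System.Collections.Generic, and System.Linq.
public interface ISpecificationExecutor<T> where T : EntityBase<T>
{
    IEnumerable<T> Apply(ISpecification<T> specification);
}

public class SpecificationExecutor<T> : ISpecificationExecutor<T>
    where T : EntityBase<T>
{
    private readonly ISession session;

    public SpecificationExecutor(ISession session)
    {
        this.session = session;
    }

    public IEnumerable<T> Apply(ISpecification<T> specification)
    {
        // Same mechanism a repository's Apply method would use: hand the
        // expression produced by the specification to IQueryable.Where.
        return session.Query<T>()
                      .Where(specification.IsSatisfied())
                      .ToList();
    }
}

Because this type only reads, it can be injected wherever queries are needed without dragging the write-side members of IRepository along.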
Specification chaining

In the original implementation of the specification pattern, chaining was simply a matter of carrying out a logical AND between the outputs of the IsSatisfiedBy method on the specification objects involved in the chaining. In the NHibernate-adapted version of the specification pattern, the end result boils down to the same thing, but the actual implementation is slightly more complex than just ANDing the results. As with the original specification pattern, we need an abstract base class Specification<T> and a specialized AndSpecification<T> class. I will skip those details and go straight into the implementation of the IsSatisfied method on AndSpecification, where the actual logical ANDing happens:

public override Expression<Func<T, bool>> IsSatisfied()
{
    var parameterName = Expression.Parameter(typeof(T), "arg1");
    return Expression.Lambda<Func<T, bool>>(Expression.AndAlso(
               Expression.Invoke(specification1.IsSatisfied(), parameterName),
               Expression.Invoke(specification2.IsSatisfied(), parameterName)),
           parameterName);
}

Logically ANDing two lambda expressions is not a straightforward operation. We need to make use of the static methods available on the helper class System.Linq.Expressions.Expression. Let's work from the inside out; that way, it is easier to understand what is happening here. The following is a reproduction of the innermost call to the Expression class:

Expression.Invoke(specification1.IsSatisfied(), parameterName)

In the preceding code, we call the Invoke method on the Expression class, passing the output of the IsSatisfied method on the first specification. The second parameter passed to this method is a temporary parameter of type T that we created to satisfy the method signature of Invoke. The Invoke method returns an InvocationExpression, which represents the invocation of the lambda expression that was used to construct it. Note that the actual lambda expression is not invoked yet. We do the same with the second specification in question. The outputs of both these operations are then passed into another method on the Expression class, as follows:

Expression.AndAlso(
    Expression.Invoke(specification1.IsSatisfied(), parameterName),
    Expression.Invoke(specification2.IsSatisfied(), parameterName)
)

Expression.AndAlso takes the output from both specification objects, in the form of the InvocationExpression type, and builds a special type called BinaryExpression, which represents a logical AND between the two expressions that were passed to it. Next, we convert this BinaryExpression into an Expression<Func<T, bool>> by passing it to the Expression.Lambda<Func<T, bool>> method. This explanation is not very easy to follow, and if you have never built or modified lambda expressions programmatically like this before, you may find it hard going; in that case, I would recommend not bothering yourself too much with it. The following code snippet shows how logical ORing of two specifications can be implemented. Note that the snippet only shows the implementation of the IsSatisfied method:

public override Expression<Func<T, bool>> IsSatisfied()
{
    var parameterName = Expression.Parameter(typeof(T), "arg1");
    return Expression.Lambda<Func<T, bool>>(Expression.OrElse(
               Expression.Invoke(specification1.IsSatisfied(), parameterName),
               Expression.Invoke(specification2.IsSatisfied(), parameterName)),
           parameterName);
}

The rest of the infrastructure around chaining is exactly the same as the one presented during the discussion of the original specification pattern.
I have avoided giving the full class definitions here to save space, but you can download the code to look at the complete implementation. That brings us to the end of the specification pattern. Though the specification pattern is a great leap forward from where the repository left us, it does have some limitations of its own. Next, we will look into what these limitations are.

Limitations

The specification pattern is great and, unlike the repository pattern, I am not going to tell you that it has downsides and that you should try to avoid it. You should not. You should absolutely use it wherever it fits. I would only like to highlight two limitations of the specification pattern. The specification pattern only works with lambda expressions; you cannot use LINQ syntax. There may be times when you would prefer LINQ syntax over lambda expressions. One such situation is when you want to go for theta joins, which are not possible with lambda expressions. Another is when lambda expressions do not generate optimal SQL. I will show you a quick example to make this clearer. Suppose we want to write a specification for employees who have opted for the season ticket loan benefit. The following code listing shows how that specification could be written:

public class EmployeeHavingTakenSeasonTicketLoanSpecification : Specification<Employee>
{
    public override Expression<Func<Employee, bool>> IsSatisfied()
    {
        return e => e.Benefits.Any(b => b is SeasonTicketLoan);
    }
}

It is a very simple specification. Note the use of Any to iterate over the Benefits collection and check whether any of the Benefit instances in that collection is of type SeasonTicketLoan. The following SQL is generated when the preceding specification is run:

SELECT employee0_.Id            AS Id0_,
       employee0_.Firstname     AS Firstname0_,
       employee0_.Lastname      AS Lastname0_,
       employee0_.EmailAddress  AS EmailAdd5_0_,
       employee0_.DateOfBirth   AS DateOfBi6_0_,
       employee0_.DateOfJoining AS DateOfJo7_0_,
       employee0_.IsAdmin       AS IsAdmin0_,
       employee0_.Password      AS Password0_
FROM   Employee employee0_
WHERE  EXISTS
       (SELECT benefits1_.Id
        FROM   Benefit benefits1_
               LEFT OUTER JOIN Leave benefits1_1_
                 ON benefits1_.Id = benefits1_1_.Id
               LEFT OUTER JOIN SkillsEnhancementAllowance benefits1_2_
                 ON benefits1_.Id = benefits1_2_.Id
               LEFT OUTER JOIN SeasonTicketLoan benefits1_3_
                 ON benefits1_.Id = benefits1_3_.Id
        WHERE  employee0_.Id = benefits1_.Employee_Id
          AND  CASE WHEN benefits1_1_.Id IS NOT NULL THEN 1
                    WHEN benefits1_2_.Id IS NOT NULL THEN 2
                    WHEN benefits1_3_.Id IS NOT NULL THEN 3
                    WHEN benefits1_.Id IS NOT NULL THEN 0
               END = 3)

Isn't that SQL too complex? It is not only hard on the eyes; it is also not how I would have written the required SQL in the absence of NHibernate. I would have simply inner-joined the Employee, Benefit, and SeasonTicketLoan tables to get the records I need. On large databases, the preceding query may be too slow. There are other such situations where queries written using lambda expressions tend to generate complex or less than optimal SQL. If we use LINQ syntax instead of lambda expressions, we can get NHibernate to generate exactly the SQL we want. Unfortunately, there is no way of fixing this with the specification pattern.

Summary

The repository pattern has been around for a long time, but it suffers from some issues. The general nature of its implementation gets in the way of extending the repository pattern to use it with complex domain models involving a large number of entities.
The repository contract can be limiting and confusing when there is a need to write complex and very specific queries. Trying to fix these issues within repositories may result in a leaky abstraction, which can bite us later. Moreover, repositories maintained with less care have a tendency to grow into god objects, and maintaining them beyond that point becomes a challenge. The specification pattern and the query object pattern solve these issues on the read side of things.

Different applications have different data access requirements. Some applications are write-heavy while others are read-heavy, but only a small number of applications fall into the former category; a large number of the applications developed these days are read-heavy. I have worked on applications in which more than 90 percent of the database operations queried data and less than 10 percent actually inserted or updated data in the database. Having this knowledge about the application you are developing can be very useful in determining how you are going to design your data access layer.

That brings us to the end of our NHibernate journey. Not quite, but yes, in a way.

Resources for Article:

Further resources on this subject:
NHibernate 3: Creating a Sample Application [article]
NHibernate 3.0: Using LINQ Specifications in the data access layer [article]
NHibernate 2: Mapping relationships and Fluent Mapping [article]

Task Execution with Asio

Packt
11 Aug 2015
20 min read
In this article by Arindam Mukherjee, the author of Learning Boost C++ Libraries, we learn how to execute a task using Boost Asio (pronounced ay-see-oh), a portable library for performing efficient network I/O using a consistent programming model. At its core, Boost Asio provides a task execution framework that you can use to perform operations of any kind. You create your tasks as function objects and post them to a task queue maintained by Boost Asio. You enlist one or more threads to pick up these tasks (function objects) and invoke them. The threads keep picking up tasks, one after the other, till the task queues are empty, at which point the threads do not block but exit.

(For more resources related to this topic, see here.)

IO Service, queues, and handlers

At the heart of Asio is the type boost::asio::io_service. A program uses the io_service interface to perform network I/O and manage tasks. Any program that wants to use the Asio library creates at least one instance of io_service and sometimes more than one. In this section, we will explore the task management capabilities of io_service. Here is the IO Service in action using the obligatory "hello world" example:

Listing 11.1: Asio Hello World

 1 #include <boost/asio.hpp>
 2 #include <iostream>
 3 namespace asio = boost::asio;
 4
 5 int main() {
 6   asio::io_service service;
 7
 8   service.post(
 9     [] {
10       std::cout << "Hello, world!" << '\n';
11     });
12
13   std::cout << "Greetings: \n";
14   service.run();
15 }

We include the convenience header boost/asio.hpp, which includes most of the Asio library that we need for the examples in this article (line 1). All parts of the Asio library are under the namespace boost::asio, so we use a shorter alias for this (line 3). The program itself just prints Hello, world! on the console, but it does so through a task. The program first creates an instance of io_service (line 6) and posts a function object to it, using the post member function of io_service. The function object, in this case defined using a lambda expression, is referred to as a handler. The call to post adds the handler to a queue inside io_service; some thread (including the one that posted the handler) must dispatch it, that is, remove it from the queue and call it. The call to the run member function of io_service (line 14) does precisely this. It loops through the handlers in the queue inside io_service, removing and calling each handler. In fact, we can post more handlers to the io_service before calling run, and it would call all the posted handlers. If we did not call run, none of the handlers would be dispatched. The run function blocks until all the handlers in the queue have been dispatched and returns only when the queue is empty. By itself, a handler may be thought of as an independent, packaged task, and Boost Asio provides a great mechanism for dispatching arbitrary tasks as handlers. Note that handlers must be nullary function objects, that is, they should take no arguments.

Asio is a header-only library by default, but programs using Asio need to link at least with boost_system. On Linux, we can use the following command line to build this example:

$ g++ -g listing11_1.cpp -o listing11_1 -lboost_system -std=c++11

Running this program prints the following:

Greetings:
Hello, world!

Note that Greetings: is printed from the main function (line 13) before the call to run (line 14). The call to run ends up dispatching the sole handler in the queue, which prints Hello, world!.
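Because handlers must be nullary, any arguments a task needs have to be bound in at the time the task is posted, typically with a lambda capture or std::bind. The following is a small sketch of both approaches; it is not one of the book's listings, but it should build with the same command line as listing 11.1:

#include <boost/asio.hpp>
#include <functional>
#include <iostream>
#include <string>
namespace asio = boost::asio;

void greet(const std::string& name) {
  std::cout << "Hello, " << name << "!\n";
}

int main() {
  asio::io_service service;
  std::string name = "Asio";

  // Capture the argument by value so the handler itself takes no arguments.
  service.post([name] { greet(name); });

  // std::bind wraps an existing function and its argument into a nullary callable.
  service.post(std::bind(greet, std::string("world")));

  service.run();   // dispatches both handlers
}

Both handlers end up in the same queue and are dispatched by run exactly as in listing 11.1.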
It is also possible for multiple threads to call run on the same I/O object and dispatch handlers concurrently. We will see how this can be useful in the next section. Handler states – run_one, poll, and poll_one While the run function blocks until there are no more handlers in the queue, there are other member functions of io_service that let you process handlers with greater flexibility. But before we look at this function, we need to distinguish between pending and ready handlers. The handlers we posted to the io_service were all ready to run immediately and were invoked as soon as their turn came on the queue. In general, handlers are associated with background tasks that run in the underlying OS, for example, network I/O tasks. Such handlers are meant to be invoked only once the associated task is completed, which is why in such contexts, they are called completion handlers. These handlers are said to be pending until the associated task is awaiting completion, and once the associated task completes, they are said to be ready. The poll member function, unlike run, dispatches all the ready handlers but does not wait for any pending handler to become ready. Thus, it returns immediately if there are no ready handlers, even if there are pending handlers. The poll_one member function dispatches exactly one ready handler if there be one, but does not block waiting for pending handlers to get ready. The run_one member function blocks on a nonempty queue waiting for a handler to become ready. It returns when called on an empty queue, and otherwise, as soon as it finds and dispatches a ready handler. post versus dispatch A call to the post member function adds a handler to the task queue and returns immediately. A later call to run is responsible for dispatching the handler. There is another member function called dispatch that can be used to request the io_service to invoke a handler immediately if possible. If dispatch is invoked in a thread that has already called one of run, poll, run_one, or poll_one, then the handler will be invoked immediately. If no such thread is available, dispatch adds the handler to the queue and returns just like post would. In the following example, we invoke dispatch from the main function and from within another handler: Listing 11.2: post versus dispatch 1 #include <boost/asio.hpp> 2 #include <iostream> 3 namespace asio = boost::asio; 4 5 int main() { 6   asio::io_service service; 7   // Hello Handler – dispatch behaves like post 8   service.dispatch([]() { std::cout << "Hellon"; }); 9 10   service.post( 11     [&service] { // English Handler 12       std::cout << "Hello, world!n"; 13       service.dispatch([] { // Spanish Handler, immediate 14                         std::cout << "Hola, mundo!n"; 15                       }); 16     }); 17   // German Handler 18   service.post([&service] {std::cout << "Hallo, Welt!n"; }); 19   service.run(); 20 } Running this code produces the following output: Hello Hello, world! Hola, mundo! Hallo, Welt! The first call to dispatch (line 8) adds a handler to the queue without invoking it because run was yet to be called on io_service. We call this the Hello Handler, as it prints Hello. This is followed by the two calls to post (lines 10, 18), which add two more handlers. The first of these two handlers prints Hello, world! (line 12), and in turn, calls dispatch (line 13) to add another handler that prints the Spanish greeting, Hola, mundo! (line 14). 
The second of these handlers prints the German greeting, Hallo, Welt (line 18). For our convenience, let's just call them the English, Spanish, and German handlers. This creates the following entries in the queue: Hello Handler English Handler German Handler Now, when we call run on the io_service (line 19), the Hello Handler is dispatched first and prints Hello. This is followed by the English Handler, which prints Hello, World! and calls dispatch on the io_service, passing the Spanish Handler. Since this executes in the context of a thread that has already called run, the call to dispatch invokes the Spanish Handler, which prints Hola, mundo!. Following this, the German Handler is dispatched printing Hallo, Welt! before run returns. What if the English Handler called post instead of dispatch (line 13)? In that case, the Spanish Handler would not be invoked immediately but would queue up after the German Handler. The German greeting Hallo, Welt! would precede the Spanish greeting Hola, mundo!. The output would look like this: Hello Hello, world! Hallo, Welt! Hola, mundo! Concurrent execution via thread pools The io_service object is thread-safe and multiple threads can call run on it concurrently. If there are multiple handlers in the queue, they can be processed concurrently by such threads. In effect, the set of threads that call run on a given io_service form a thread pool. Successive handlers can be processed by different threads in the pool. Which thread dispatches a given handler is indeterminate, so the handler code should not make any such assumptions. In the following example, we post a bunch of handlers to the io_service and then start four threads, which all call run on it: Listing 11.3: Simple thread pools 1 #include <boost/asio.hpp> 2 #include <boost/thread.hpp> 3 #include <boost/date_time.hpp> 4 #include <iostream> 5 namespace asio = boost::asio; 6 7 #define PRINT_ARGS(msg) do { 8   boost::lock_guard<boost::mutex> lg(mtx); 9   std::cout << '[' << boost::this_thread::get_id() 10             << "] " << msg << std::endl; 11 } while (0) 12 13 int main() { 14   asio::io_service service; 15   boost::mutex mtx; 16 17   for (int i = 0; i < 20; ++i) { 18     service.post([i, &mtx]() { 19                         PRINT_ARGS("Handler[" << i << "]"); 20                         boost::this_thread::sleep( 21                               boost::posix_time::seconds(1)); 22                       }); 23   } 24 25   boost::thread_group pool; 26   for (int i = 0; i < 4; ++i) { 27     pool.create_thread([&service]() { service.run(); }); 28   } 29 30   pool.join_all(); 31 } We post twenty handlers in a loop (line 18). Each handler prints its identifier (line 19), and then sleeps for a second (lines 19-20). To run the handlers, we create a group of four threads, each of which calls run on the io_service (line 21) and wait for all the threads to finish (line 24). We define the macro PRINT_ARGS which writes output to the console in a thread-safe way, tagged with the current thread ID (line 7-10). We will use this macro in other examples too. 
To build this example, you must also link against libboost_thread, libboost_date_time, and in Posix environments, with libpthread too: $ g++ -g listing9_3.cpp -o listing9_3 -lboost_system -lboost_thread -lboost_date_time -pthread -std=c++11 One particular run of this program on my laptop produced the following output (with some lines snipped): [b5c15b40] Handler[0] [b6416b40] Handler[1] [b6c17b40] Handler[2] [b7418b40] Handler[3] [b5c15b40] Handler[4] [b6416b40] Handler[5] … [b6c17b40] Handler[13] [b7418b40] Handler[14] [b6416b40] Handler[15] [b5c15b40] Handler[16] [b6c17b40] Handler[17] [b7418b40] Handler[18] [b6416b40] Handler[19] You can see that the different handlers are executed by different threads (each thread ID marked differently). If any of the handlers threw an exception, it would be propagated across the call to the run function on the thread that was executing the handler. io_service::work Sometimes, it is useful to keep the thread pool started, even when there are no handlers to dispatch. Neither run nor run_one blocks on an empty queue. So in order for them to block waiting for a task, we have to indicate, in some way, that there is outstanding work to be performed. We do this by creating an instance of io_service::work, as shown in the following example: Listing 11.4: Using io_service::work to keep threads engaged 1 #include <boost/asio.hpp> 2 #include <memory> 3 #include <boost/thread.hpp> 4 #include <iostream> 5 namespace asio = boost::asio; 6 7 typedef std::unique_ptr<asio::io_service::work> work_ptr; 8 9 #define PRINT_ARGS(msg) do { … ... 14 15 int main() { 16   asio::io_service service; 17   // keep the workers occupied 18   work_ptr work(new asio::io_service::work(service)); 19   boost::mutex mtx; 20 21   // set up the worker threads in a thread group 22   boost::thread_group workers; 23   for (int i = 0; i < 3; ++i) { 24     workers.create_thread([&service, &mtx]() { 25                         PRINT_ARGS("Starting worker thread "); 26                         service.run(); 27                         PRINT_ARGS("Worker thread done"); 28                       }); 29   } 30 31   // Post work 32   for (int i = 0; i < 20; ++i) { 33     service.post( 34       [&service, &mtx]() { 35         PRINT_ARGS("Hello, world!"); 36         service.post([&mtx]() { 37                           PRINT_ARGS("Hola, mundo!"); 38                         }); 39       }); 40   } 41 42 work.reset(); // destroy work object: signals end of work 43   workers.join_all(); // wait for all worker threads to finish 44 } In this example, we create an object of io_service::work wrapped in a unique_ptr (line 18). We associate it with an io_service object by passing to the work constructor a reference to the io_service object. Note that, unlike listing 11.3, we create the worker threads first (lines 24-27) and then post the handlers (lines 33-39). However, the worker threads stay put waiting for the handlers because of the calls to run block (line 26). This happens because of the io_service::work object we created, which indicates that there is outstanding work in the io_service queue. As a result, even after all handlers are dispatched, the threads do not exit. By calling reset on the unique_ptr, wrapping the work object, its destructor is called, which notifies the io_service that all outstanding work is complete (line 42). The calls to run in the threads return and the program exits once all the threads are joined (line 43). 
We wrapped the work object in a unique_ptr to destroy it in an exception-safe way at a suitable point in the program. We omitted the definition of PRINT_ARGS here, refer to listing 11.3. Serialized and ordered execution via strands Thread pools allow handlers to be run concurrently. This means that handlers that access shared resources need to synchronize access to these resources. We already saw examples of this in listings 11.3 and 11.4, when we synchronized access to std::cout, which is a global object. As an alternative to writing synchronization code in handlers, which can make the handler code more complex, we can use strands. Think of a strand as a subsequence of the task queue with the constraint that no two handlers from the same strand ever run concurrently. The scheduling of other handlers in the queue, which are not in the strand, is not affected by the strand in any way. Let us look at an example of using strands: Listing 11.5: Using strands 1 #include <boost/asio.hpp> 2 #include <boost/thread.hpp> 3 #include <boost/date_time.hpp> 4 #include <cstdlib> 5 #include <iostream> 6 #include <ctime> 7 namespace asio = boost::asio; 8 #define PRINT_ARGS(msg) do { ... 13 14 int main() { 15   std::srand(std::time(0)); 16 asio::io_service service; 17   asio::io_service::strand strand(service); 18   boost::mutex mtx; 19   size_t regular = 0, on_strand = 0; 20 21 auto workFuncStrand = [&mtx, &on_strand] { 22           ++on_strand; 23           PRINT_ARGS(on_strand << ". Hello, from strand!"); 24           boost::this_thread::sleep( 25                       boost::posix_time::seconds(2)); 26         }; 27 28   auto workFunc = [&mtx, &regular] { 29                   PRINT_ARGS(++regular << ". Hello, world!"); 30                  boost::this_thread::sleep( 31                         boost::posix_time::seconds(2)); 32                 }; 33   // Post work 34   for (int i = 0; i < 15; ++i) { 35     if (rand() % 2 == 0) { 36       service.post(strand.wrap(workFuncStrand)); 37     } else { 38       service.post(workFunc); 39     } 40   } 41 42   // set up the worker threads in a thread group 43   boost::thread_group workers; 44   for (int i = 0; i < 3; ++i) { 45     workers.create_thread([&service, &mtx]() { 46                      PRINT_ARGS("Starting worker thread "); 47                       service.run(); 48                       PRINT_ARGS("Worker thread done"); 49                     }); 50   } 51 52   workers.join_all(); // wait for all worker threads to finish 53 } In this example, we create two handler functions: workFuncStrand (line 21) and workFunc (line 28). The lambda workFuncStrand captures a counter on_strand, increments it, and prints a message Hello, from strand!, prefixed with the value of the counter. The function workFunc captures another counter regular, increments it, and prints Hello, World!, prefixed with the counter. Both pause for 2 seconds before returning. To define and use a strand, we first create an object of io_service::strand associated with the io_service instance (line 17). Thereafter, we post all handlers that we want to be part of that strand by wrapping them using the wrap member function of the strand (line 36). Alternatively, we can post the handlers to the strand directly by using either the post or the dispatch member function of the strand, as shown in the following snippet: 33   for (int i = 0; i < 15; ++i) { 34     if (rand() % 2 == 0) { 35       strand.post(workFuncStrand); 37     } else { ... 
The wrap member function of strand returns a function object, which in turn calls dispatch on the strand to invoke the original handler. Initially, it is this function object rather than our original handler that is added to the queue. When duly dispatched, this invokes the original handler. There are no constraints on the order in which these wrapper handlers are dispatched, and therefore, the actual order in which the original handlers are invoked can be different from the order in which they were wrapped and posted. On the other hand, calling post or dispatch directly on the strand avoids an intermediate handler. Directly posting to a strand also guarantees that the handlers will be dispatched in the same order that they were posted, achieving a deterministic ordering of the handlers in the strand. The dispatch member of strand blocks until the handler is dispatched. The post member simply adds it to the strand and returns. Note that workFuncStrand increments on_strand without synchronization (line 22), while workFunc increments the counter regular within the PRINT_ARGS macro (line 29), which ensures that the increment happens in a critical section. The workFuncStrand handlers are posted to a strand and therefore are guaranteed to be serialized; hence no need for explicit synchronization. On the flip side, entire functions are serialized via strands and synchronizing smaller blocks of code is not possible. There is no serialization between the handlers running on the strand and other handlers; therefore, the access to global objects, like std::cout, must still be synchronized. The following is a sample output of running the preceding code: [b73b6b40] Starting worker thread [b73b6b40] 0. Hello, world from strand! [b6bb5b40] Starting worker thread [b6bb5b40] 1. Hello, world! [b63b4b40] Starting worker thread [b63b4b40] 2. Hello, world! [b73b6b40] 3. Hello, world from strand! [b6bb5b40] 5. Hello, world! [b63b4b40] 6. Hello, world! … [b6bb5b40] 14. Hello, world! [b63b4b40] 4. Hello, world from strand! [b63b4b40] 8. Hello, world from strand! [b63b4b40] 10. Hello, world from strand! [b63b4b40] 13. Hello, world from strand! [b6bb5b40] Worker thread done [b73b6b40] Worker thread done [b63b4b40] Worker thread done There were three distinct threads in the pool and the handlers from the strand were picked up by two of these three threads: initially, by thread ID b73b6b40, and later on, by thread ID b63b4b40. This also dispels a frequent misunderstanding that all handlers in a strand are dispatched by the same thread, which is clearly not the case. Different handlers in the same strand may be dispatched by different threads but will never run concurrently. Summary Asio is a well-designed library that can be used to write fast, nimble network servers that utilize the most optimal mechanisms for asynchronous I/O available on a system. It is an evolving library and is the basis for a Technical Specification that proposes to add a networking library to a future revision of the C++ Standard. In this article, we learned how to use the Boost Asio library as a task queue manager and leverage Asio's TCP and UDP interfaces to write programs that communicate over the network. Resources for Article: Further resources on this subject: Animation features in Unity 5 [article] Exploring and Interacting with Materials using Blueprints [article] A Simple Pathfinding Algorithm for a Maze [article]

Using the ArcPy Data Access Module with Feature Classes and Tables

Packt
10 Aug 2015
28 min read
In this article by Eric Pimpler, author of the book Programming ArcGIS with Python Cookbook - Second Edition, we will cover the following topics: Retrieving features from a feature class with SearchCursor Filtering records with a where clause Improving cursor performance with geometry tokens Inserting rows with InsertCursor Updating rows with UpdateCursor (For more resources related to this topic, see here.) We'll start this article with a basic question. What are cursors? Cursors are in-memory objects containing one or more rows of data from a table or feature class. Each row contains attributes from each field in a data source along with the geometry for each feature. Cursors allow you to search, add, insert, update, and delete data from tables and feature classes. The arcpy data access module or arcpy.da was introduced in ArcGIS 10.1 and contains methods that allow you to iterate through each row in a cursor. Various types of cursors can be created depending on the needs of developers. For example, search cursors can be created to read values from rows. Update cursors can be created to update values in rows or delete rows, and insert cursors can be created to insert new rows. There are a number of cursor improvements that have been introduced with the arcpy data access module. Prior to the development of ArcGIS 10.1, cursor performance was notoriously slow. Now, cursors are significantly faster. Esri has estimated that SearchCursors are up to 30 times faster, while InsertCursors are up to 12 times faster. In addition to these general performance improvements, the data access module also provides a number of new options that allow programmers to speed up processing. Rather than returning all the fields in a cursor, you can now specify that a subset of fields be returned. This increases the performance as less data needs to be returned. The same applies to geometry. Traditionally, when accessing the geometry of a feature, the entire geometric definition would be returned. You can now use geometry tokens to return a portion of the geometry rather than the full geometry of the feature. You can also use lists and tuples rather than using rows. There are also other new features, such as edit sessions and the ability to work with versions, domains, and subtypes. There are three cursor functions in arcpy.da. Each returns a cursor object with the same name as the function. SearchCursor() creates a read-only SearchCursor object containing rows from a table or feature class. InsertCursor() creates an InsertCursor object that can be used to insert new records into a table or feature class. UpdateCursor() returns a cursor object that can be used to edit or delete records from a table or feature class. Each of these cursor objects has methods to access rows in the cursor. You can see the relationship between the cursor functions, the objects they create, and how they are used, as follows: Function Object created Usage SearchCursor() SearchCursor This is a read-only view of data from a table or feature class InsertCursor() InsertCursor This adds rows to a table or feature class UpdateCursor() UpdateCursor This edits or deletes rows in a table or feature class The SearchCursor() function is used to return a SearchCursor object. This object can only be used to iterate through a set of rows returned for read-only purposes. No insertions, deletions, or updates can occur through this object. An optional where clause can be set to limit the rows returned. 
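Before we look at each of these in detail, the following is a minimal sketch of what creating each cursor type looks like. The geodatabase path and field names here are illustrative only and are not the exercise data used later in this article:

import arcpy

fc = r"C:\data\city.gdb\Schools"   # illustrative path, not the book's data

# SearchCursor: read-only iteration over the requested fields
with arcpy.da.SearchCursor(fc, ["Facility", "Name"]) as cursor:
    for row in cursor:
        print(row[1])

# InsertCursor: add a row whose values match the field order given above
with arcpy.da.InsertCursor(fc, ["Facility", "Name"]) as cursor:
    cursor.insertRow(("HIGH SCHOOL", "NEW CAMPUS"))

# UpdateCursor: edit or delete existing rows
with arcpy.da.UpdateCursor(fc, ["Facility", "Name"]) as cursor:
    for row in cursor:
        if row[1] == "NEW CAMPUS":
            row[0] = "MIDDLE SCHOOL"
            cursor.updateRow(row)

Each with block releases the lock on the data source when it exits, a point we will return to shortly.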
Once you've obtained a cursor instance, it is common to iterate the records, particularly with SearchCursor or UpdateCursor. There are some peculiarities that you need to understand when navigating the records in a cursor. Cursor navigation is forward-moving only. When a cursor is created, the pointer of the cursor sits just above the first row in the cursor. The first call to next() will move the pointer to the first row. Rather than calling the next() method, you can also use a for loop to process each of the records without the need to call next(). After performing whatever processing you need to do with a row, a subsequent call to next() will move the pointer to row 2. This process continues as long as you need to access additional rows. However, after a row has been visited, you can't go back a single record at a time. For instance, if the current row is row 3, you can't programmatically back up to row 2; you can only go forward. To revisit rows 1 and 2, you would need to either call the reset() method or recreate the cursor and move back through the object. As mentioned earlier, cursors are often navigated with for loops as well. In fact, this is a more common way to iterate a cursor and a more efficient way to code your scripts. (A diagram illustrating cursor navigation appears at this point in the original book.)

The InsertCursor() function is used to create an InsertCursor object that allows you to programmatically add new records to feature classes and tables. To insert rows, call the insertRow() method on this object. You can also retrieve a read-only tuple containing the field names in use by the cursor through the fields property. A lock is placed on the table or feature class being accessed through the cursor. Therefore, it is important to always design your script in a way that releases the cursor when you are done.

The UpdateCursor() function can be used to create an UpdateCursor object that can update and delete rows in a table or feature class. As is the case with InsertCursor, this function places a lock on the data while it's being edited or deleted. If the cursor is used inside a Python with statement, the lock will automatically be freed after the data has been processed. This hasn't always been the case: prior to ArcGIS 10.1, cursors were required to be manually released using the Python del statement. Once an instance of UpdateCursor has been obtained, you can call the updateRow() method to update records in tables or feature classes and the deleteRow() method to delete a row.

The subject of data locks requires a little more explanation. The insert and update cursors must obtain a lock on the data source they reference. This means that no other application can concurrently access this data source. Locks are a way of preventing multiple users from changing data at the same time and thus corrupting the data. When the InsertCursor() and UpdateCursor() functions are called in your code, Python attempts to acquire a lock on the data. This lock must be specifically released after the cursor has finished processing so that the running applications of other users, such as ArcMap or ArcCatalog, can access the data sources. If this isn't done, no other application will be able to access the data. Prior to ArcGIS 10.1 and the with statement, cursors had to be specifically unlocked through Python's del statement. Similarly, ArcMap and ArcCatalog also acquire data locks when updating or deleting data.
If a data source has been locked by either of these applications, your Python code will not be able to access the data. Therefore, the best practice is to close ArcMap and ArcCatalog before running any standalone Python scripts that use insert or update cursors. In this article, we're going to cover the use of cursors to access and edit tables and feature classes. However, many of the cursor concepts that existed before ArcGIS 10.1 still apply. Retrieving features from a feature class with SearchCursor There are many occasions when you need to retrieve rows from a table or feature class for read-only purposes. For example, you might want to generate a list of all land parcels in a city with a value greater than $100,000. In this case, you don't have any need to edit the data. Your needs are met simply by generating a list of rows that meet some sort of criteria. A SearchCursor object contains a read-only copy of rows from a table or feature class. These objects can also be filtered through the use of a where clause so that only a subset of the dataset is returned. Getting ready The SearchCursor() function is used to return a SearchCursor object. This object can only be used to iterate a set of rows returned for read-only purposes. No insertions, deletions, or updates can occur through this object. An optional where clause can be set to limit the rows returned. In this article, you will learn how to create a basic SearchCursor object on a feature class through the use of the SearchCursor() function. The SearchCursor object contains a fields property along with the next() and reset() methods. The fields property is a read-only structure in the form of a Python tuple, containing the fields requested from the feature class or table. You are going to hear the term tuple a lot in conjunction with cursors. If you haven't covered this topic before, tuples are a Python structure to store a sequence of data similar to Python lists. However, there are some important differences between Python tuples and lists. Tuples are defined as a sequence of values inside parentheses, while lists are defined as a sequence of values inside brackets. Unlike lists, tuples can't grow and shrink, which can be a very good thing in some cases when you want data values to occupy a specific position each time. This is the case with cursor objects that use tuples to store data from fields in tables and feature classes. How to do it… Follow these steps to learn how to retrieve rows from a table or feature class inside a SearchCursor object: Open IDLE and create a new script window. Save the script as C:ArcpyBookCh8SearchCursor.py. Import the arcpy.da module: import arcpy.da Set the workspace: arcpy.env.workspace = "c:/ArcpyBook/Ch8" Use a Python with statement to create a cursor: with arcpy.da.SearchCursor("Schools.shp",("Facility","Name")) as cursor: Loop through each row in SearchCursor and print the name of the school. Make sure you indent the for loop inside the with block:for row in sorted(cursor): print("School name: " + row[1]) The entire script should appear as follows: import arcpy.da arcpy.env.workspace = "c:/ArcpyBook/Ch8" with arcpy.da.SearchCursor("Schools.shp",("Facility","Name")) as cursor:    for row in sorted(cursor):        print("School name: " + row[1]) Save the script. You can check your work by examining the C:ArcpyBookcodeCh8SearchCursor_Step1.py solution file. Run the script. 
You should see the following output: School name: ALLAN School name: ALLISON School name: ANDREWS School name: BARANOFF School name: BARRINGTON School name: BARTON CREEK School name: BARTON HILLS School name: BATY School name: BECKER School name: BEE CAVE How it works… The with statement used with the SearchCursor() function will create, open, and close the cursor. So, you no longer have to be concerned with explicitly releasing the lock on the cursor as you did prior to ArcGIS 10.1. The first parameter passed into the SearchCursor() function is a feature class, represented by the Schools.shp file. The second parameter is a Python tuple containing a list of fields that we want returned in the cursor. For performance reasons, it is a best practice to limit the fields returned in the cursor to only those that you need to complete the task. Here, we've specified that only the Facility and Name fields should be returned. The SearchCursor object is stored in a variable called cursor. Inside the with block, we use a Python for loop to loop through each school returned. We also use the Python sorted() function to sort the contents of the cursor. To access the values from a field on the row, simply use the index number of the field you want to return. In this case, we want to return the contents of the Name column, which will be the 1 index number, since it is the second item in the tuple of field names that are returned. Filtering records with a where clause By default, SearchCursor will contain all rows in a table or feature class. However, in many cases, you will want to restrict the number of rows returned by some sort of criteria. Applying a filter through the use of a where clause limits the records returned. Getting ready By default, all rows from a table or feature class will be returned when you create a SearchCursor object. However, in many cases, you will want to restrict the records returned. You can do this by creating a query and passing it as a where clause parameter when calling the SearchCursor() function. In this article, you'll build on the script you created in the previous article by adding a where clause that restricts the records returned. How to do it… Follow these steps to apply a filter to a SearchCursor object that restricts the rows returned from a table or feature class: Open IDLE and load the SearchCursor.py script that you created in the previous recipe. Update the SearchCursor() function by adding a where clause that queries the facility field for records that have the HIGH SCHOOL text: with arcpy.da.SearchCursor("Schools.shp",("Facility","Name"), '"FACILITY" = 'HIGH SCHOOL'') as cursor: You can check your work by examining the C:ArcpyBookcodeCh8SearchCursor_Step2.py solution file. Save and run the script. The output will now be much smaller and restricted to high schools only: High school name: AKINS High school name: ALTERNATIVE LEARNING CENTER High school name: ANDERSON High school name: AUSTIN High school name: BOWIE High school name: CROCKETT High school name: DEL VALLE High school name: ELGIN High school name: GARZA High school name: HENDRICKSON High school name: JOHN B CONNALLY High school name: JOHNSTON High school name: LAGO VISTA How it works… The where clause parameter accepts any valid SQL query, and is used in this case to restrict the number of records that are returned. Improving cursor performance with geometry tokens Geometry tokens were introduced in ArcGIS 10.1 as a performance improvement for cursors. 
Rather than returning the entire geometry of a feature inside the cursor, only a portion of the geometry is returned. Returning the entire geometry of a feature can result in decreased cursor performance due to the amount of data that has to be returned. It's significantly faster to return only the specific portion of the geometry that is needed. Getting ready A token is provided as one of the fields in the field list passed into the constructor for a cursor and is in the SHAPE@<Part of Feature to be Returned> format. The only exception to this format is the OID@ token, which returns the object ID of the feature. The following code example retrieves only the X and Y coordinates of a feature: with arcpy.da.SearchCursor(fc, ("SHAPE@XY","Facility","Name")) as cursor: The following table lists the available geometry tokens. Not all cursors support the full list of tokens. Check the ArcGIS help files for information about the tokens supported by each cursor type. The SHAPE@ token returns the entire geometry of the feature. Use this carefully though, because it is an expensive operation to return the entire geometry of a feature and can dramatically affect performance. If you don't need the entire geometry, then do not include this token! In this article, you will use a geometry token to increase the performance of a cursor. You'll retrieve the X and Y coordinates of each land parcel from the parcels feature class along with some attribute information about the parcel. How to do it… Follow these steps to add a geometry token to a cursor, which should improve the performance of this object: Open IDLE and create a new script window. Save the script as C:ArcpyBookCh8GeometryToken.py. Import the arcpy.da module and the time module: import arcpy.da import time Set the workspace: arcpy.env.workspace = "c:/ArcpyBook/Ch8" We're going to measure how long it takes to execute the code using a geometry token. Add the start time for the script: start = time.clock() Use a Python with statement to create a cursor that includes the centroid of each feature as well as the ownership information stored in the PY_FULL_OW field: with arcpy.da.SearchCursor("coa_parcels.shp",("PY_FULL_OW","SHAPE@XY")) as cursor: Loop through each row in SearchCursor and print the name of the parcel and location. Make sure you indent the for loop inside the with block: for row in cursor: print("Parcel owner: {0} has a location of: {1}".format(row[0], row[1])) Measure the elapsed time: elapsed = (time.clock() - start) Print the execution time: print("Execution time: " + str(elapsed)) The entire script should appear as follows: import arcpy.da import time arcpy.env.workspace = "c:/ArcpyBook/Ch9" start = time.clock() with arcpy.da.SearchCursor("coa_parcels.shp",("PY_FULL_OW", "SHAPE@XY")) as cursor:    for row in cursor:        print("Parcel owner: {0} has a location of: {1}".format(row[0], row[1])) elapsed = (time.clock() - start) print("Execution time: " + str(elapsed)) You can check your work by examining the C:ArcpyBookcodeCh8GeometryToken.py solution file. Save the script. Run the script. You should see something similar to the following output. 
Note the execution time; your time will vary: Parcel owner: CITY OF AUSTIN ATTN REAL ESTATE DIVISION has a location of: (3110480.5197341456, 10070911.174956793) Parcel owner: CITY OF AUSTIN ATTN REAL ESTATE DIVISION has a location of: (3110670.413783513, 10070800.960865) Parcel owner: CITY OF AUSTIN has a location of: (3143925.0013213265, 10029388.97419636) Parcel owner: CITY OF AUSTIN % DOROTHY NELL ANDERSON ATTN BARRY LEE ANDERSON has a location of: (3134432.983822767, 10072192.047894118) Execution time: 9.08046185109 Now, we're going to measure the execution time if the entire geometry is returned instead of just the portion of the geometry that we need: Save a new copy of the script as C:ArcpyBookCh8GeometryTokenEntireGeometry.py. Change the SearchCursor() function to return the entire geometry using SHAPE@ instead of SHAPE@XY: with arcpy.da.SearchCursor("coa_parcels.shp",("PY_FULL_OW", "SHAPE@")) as cursor: You can check your work by examining the C:ArcpyBookcodeCh8GeometryTokenEntireGeometry.py solution file. Save and run the script. You should see the following output. Your time will vary from mine, but notice that the execution time is slower. In this case, it's only a little over a second slower, but we're only returning 2600 features. If the feature class were significantly larger, as many are, this would be amplified: Parcel owner: CITY OF AUSTIN ATTN REAL ESTATE DIVISION has a location of: <geoprocessing describe geometry object object at 0x06B9BE00> Parcel owner: CITY OF AUSTIN ATTN REAL ESTATE DIVISION has a location of: <geoprocessing describe geometry object object at 0x2400A700> Parcel owner: CITY OF AUSTIN has a location of: <geoprocessing describe geometry object object at 0x06B9BE00> Parcel owner: CITY OF AUSTIN % DOROTHY NELL ANDERSON ATTN BARRY LEE ANDERSON has a location of: <geoprocessing describe geometry object object at 0x2400A700> Execution time: 10.1211390896 How it works… A geometry token can be supplied as one of the field names supplied in the constructor for the cursor. These tokens are used to increase the performance of a cursor by returning only a portion of the geometry instead of the entire geometry. This can dramatically increase the performance of a cursor, particularly when you are working with large polyline or polygon datasets. If you only need specific properties of the geometry in your cursor, you should use these tokens. Inserting rows with InsertCursor You can insert a row into a table or feature class using an InsertCursor object. If you want to insert attribute values along with the new row, you'll need to supply the values in the order found in the attribute table. Getting ready The InsertCursor() function is used to create an InsertCursor object that allows you to programmatically add new records to feature classes and tables. The insertRow() method on the InsertCursor object adds the row. A row in the form of a list or tuple is passed into the insertRow() method. The values in the list must correspond to the field values defined when the InsertCursor object was created. Similar to instances that include other types of cursors, you can also limit the field names returned using the second parameter of the method. This function supports geometry tokens as well. The following code example illustrates how you can use InsertCursor to insert new rows into a feature class. Here, we insert two new wildfire points into the California feature class. The row values to be inserted are defined in a list variable. 
Then, an InsertCursor object is created, passing in the feature class and fields. Finally, the new rows are inserted into the feature class by using the insertRow() method: rowValues = [('Bastrop','N',3000,(-105.345,32.234)), ('Ft Davis','N', 456, (-109.456,33.468))] fc = "c:/data/wildfires.gdb/California" fields = ["FIRE_NAME", "FIRE_CONTAINED", "ACRES", "SHAPE@XY"] with arcpy.da.InsertCursor(fc, fields) as cursor: for row in rowValues:    cursor.insertRow(row) In this article, you will use InsertCursor to add wildfires retrieved from a .txt file into a point feature class. When inserting rows into a feature class, you will need to know how to add the geometric representation of a feature into the feature class. This can be accomplished by using InsertCursor along with two miscellaneous objects: Array and Point. In this exercise, we will add point features in the form of wildfire incidents to an empty point feature class. In addition to this, you will use Python file manipulation techniques to read the coordinate data from a text file. How to do it… We will be importing the North American wildland fire incident data from a single day in October, 2007. This data is contained in a comma-delimited text file containing one line for each fire incident on this particular day. Each fire incident has a latitude, longitude coordinate pair separated by commas, along with a confidence value. This data was derived by automated methods that use remote sensing data to detect the presence or absence of a wildfire. Confidence values can range from 0 to 100; higher numbers represent a greater confidence that this is indeed a wildfire: Open the file at C:\ArcpyBook\Ch8\WildfireData\NorthAmericaWildfires_2007275.txt and examine the contents. You will notice that this is a simple comma-delimited text file containing the longitude and latitude values for each fire along with a confidence value. We will use Python to read the contents of this file line by line and insert new point features into the FireIncidents feature class located in the C:\ArcpyBook\Ch8\WildfireData\WildlandFires.mdb personal geodatabase. Close the file. Open ArcCatalog. Navigate to C:\ArcpyBook\Ch8\WildfireData. You should see a personal geodatabase called WildlandFires. Open this geodatabase and you will see a point feature class called FireIncidents. Right now, this is an empty feature class. We will add features by reading the text file you examined earlier and inserting points. Right-click on FireIncidents and select Properties. Click on the Fields tab. The latitude/longitude values found in the file we examined earlier will be imported into the SHAPE field and the confidence values will be written to the CONFIDENCEVALUE field. Open IDLE and create a new script. Save the script to C:\ArcpyBook\Ch8\InsertWildfires.py. Import the arcpy module: import arcpy Set the workspace: arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb" Open the text file and read all the lines into a list: f = open("C:/ArcpyBook/Ch8/WildfireData/NorthAmericaWildfires_2007275.txt","r") lstFires = f.readlines() Start a try block: try: Create an InsertCursor object using a with block. Make sure you indent inside the try statement. The cursor will be created in the FireIncidents feature class: with arcpy.da.InsertCursor("FireIncidents",("SHAPE@XY","CONFIDENCEVALUE")) as cur: Create a counter variable that will be used to print the progress of the script: cntr = 1 Loop through the text file line by line using a for loop.
Since the text file is comma-delimited, we'll use the Python split() function to separate each value into a list variable called vals. We'll then pull out the individual latitude, longitude, and confidence value items and assign them to variables. Finally, we'll place these values into a list variable called rowValue, which is then passed into the insertRow() function for the InsertCursor object, and we then print a message: for fire in lstFires:      if 'Latitude' in fire:        continue      vals = fire.split(",")      latitude = float(vals[0])      longitude = float(vals[1])      confid = int(vals[2])      rowValue = [(longitude,latitude),confid]      cur.insertRow(rowValue)      print("Record number " + str(cntr) + " written to feature class")      #arcpy.AddMessage("Record number" + str(cntr) + " written to feature class")      cntr = cntr + 1 Add the except block to print any errors that may occur: except Exception as e: print(e.message) Add a finally block to close the text file: finally: f.close() The entire script should appear as follows: import arcpy   arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb" f = open("C:/ArcpyBook/Ch8/WildfireData/NorthAmericaWildfires_2007275.txt","r") lstFires = f.readlines() try: with arcpy.da.InsertCursor("FireIncidents", ("SHAPE@XY","CONFIDENCEVALUE")) as cur:    cntr = 1    for fire in lstFires:      if 'Latitude' in fire:        continue      vals = fire.split(",")      latitude = float(vals[0])      longitude = float(vals[1])      confid = int(vals[2])      rowValue = [(longitude,latitude),confid]      cur.insertRow(rowValue)      print("Record number " + str(cntr) + " written to feature class")      #arcpy.AddMessage("Record number" + str(cntr) + "       written to feature class")      cntr = cntr + 1 except Exception as e: print(e.message) finally: f.close() You can check your work by examining the C:ArcpyBookcodeCh8InsertWildfires.py solution file. Save and run the script. You should see messages being written to the output window as the script runs: Record number: 406 written to feature class Record number: 407 written to feature class Record number: 408 written to feature class Record number: 409 written to feature class Record number: 410 written to feature class Record number: 411 written to feature class Open ArcMap and add the FireIncidents feature class to the table of contents. The points should be visible, as shown in the following screenshot: You may want to add a basemap to provide some reference for the data. In ArcMap, click on the Add Basemap button and select a basemap from the gallery. How it works… Some additional explanation may be needed here. The lstFires variable contains a list of all the wildfires that were contained in the comma-delimited text file. The for loop will loop through each of these records one by one, inserting each individual record into the fire variable. We also include an if statement that is used to skip the first record in the file, which serves as the header. As I explained earlier, we then pull out the individual latitude, longitude, and confidence value items from the vals variable, which is just a Python list object and assign them to variables called latitude, longitude, and confid. We then place these values into a new list variable called rowValue in the order that we defined when we created InsertCursor. Thus, the latitude and longitude pair should be placed first followed by the confidence value. 
Finally, we call the insertRow() function on the InsertCursor object assigned to the cur variable, passing in the new rowValue variable. We close by printing a message that indicates the progress of the script and also create the except and finally blocks to handle errors and close the text file. Placing the f.close() call in the finally block ensures that it will execute and close the file even if there is an error in the previous try statement. Updating rows with UpdateCursor If you need to edit or delete rows from a table or feature class, you can use UpdateCursor. As is the case with SearchCursor, the contents of UpdateCursor can be limited through the use of a where clause. Getting ready The UpdateCursor() function can be used to either update or delete rows in a table or feature class. An UpdateCursor object is returned from a call to this function. The cursor places a lock on the data while it's being edited or deleted; if the cursor is used inside a Python with statement, the lock will automatically be released after the data has been processed. This hasn't always been the case. Previous versions of cursors were required to be manually released using the Python del statement. Once an instance of UpdateCursor has been obtained, you can then call the updateRow() method to update records in tables or feature classes, and the deleteRow() method can be used to delete a row. In this article, you're going to write a script that updates each feature in the FireIncidents feature class by assigning a value of poor, fair, good, or excellent to a new field that is more descriptive of the confidence values, using an UpdateCursor. Prior to updating the records, your script will add the new field to the FireIncidents feature class. How to do it… Follow these steps to create an UpdateCursor object that will be used to edit rows in a feature class: Open IDLE and create a new script. Save the script to C:\ArcpyBook\Ch8\UpdateWildfires.py. Import the arcpy module: import arcpy Set the workspace: arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb" Start a try block: try: Add a new field called CONFID_RATING to the FireIncidents feature class. Make sure to indent inside the try statement: arcpy.AddField_management("FireIncidents","CONFID_RATING", "TEXT","10") print("CONFID_RATING field added to FireIncidents") Create a new instance of UpdateCursor inside a with block: with arcpy.da.UpdateCursor("FireIncidents", ("CONFIDENCEVALUE","CONFID_RATING")) as cursor: Create a counter variable that will be used to print the progress of the script. Make sure you indent this line of code and all the lines of code that follow inside the with block: cntr = 1 Loop through each of the rows in the FireIncidents feature class.
Update the CONFID_RATING field according to the following guidelines:     Confidence value 0 to 40 = POOR     Confidence value 41 to 60 = FAIR     Confidence value 61 to 85 = GOOD     Confidence value 86 to 100 = EXCELLENT This can be translated in the following block of code:    for row in cursor:      # update the confid_rating field      if row[0] <= 40:        row[1] = 'POOR'      elif row[0] > 40 and row[0] <= 60:        row[1] = 'FAIR'      elif row[0] > 60 and row[0] <= 85:        row[1] = 'GOOD'      else:        row[1] = 'EXCELLENT'      cursor.updateRow(row)                       print("Record number " + str(cntr) + " updated")      cntr = cntr + 1 Add the except block to print any errors that may occur: except Exception as e: print(e.message) The entire script should appear as follows: import arcpy   arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb" try: #create a new field to hold the values arcpy.AddField_management("FireIncidents", "CONFID_RATING","TEXT","10") print("CONFID_RATING field added to FireIncidents") with arcpy.da.UpdateCursor("FireIncidents",("CONFIDENCEVALUE", "CONFID_RATING")) as cursor:    cntr = 1    for row in cursor:      # update the confid_rating field      if row[0] <= 40:        row[1] = 'POOR'      elif row[0] > 40 and row[0] <= 60:        row[1] = 'FAIR'      elif row[0] > 60 and row[0] <= 85:        row[1] = 'GOOD'      else:        row[1] = 'EXCELLENT'      cursor.updateRow(row)                       print("Record number " + str(cntr) + " updated")      cntr = cntr + 1 except Exception as e: print(e.message) You can check your work by examining the C:ArcpyBookcodeCh8UpdateWildfires.py solution file. Save and run the script. You should see messages being written to the output window as the script runs: Record number 406 updated Record number 407 updated Record number 408 updated Record number 409 updated Record number 410 updated Open ArcMap and add the FireIncidents feature class. Open the attribute table and you should see that a new CONFID_RATING field has been added and populated by UpdateCursor: When you insert, update, or delete data in cursors, the changes are permanent and can't be undone if you're working outside an edit session. However, with the new edit session functionality provided by ArcGIS 10.1, you can now make these changes inside an edit session to avoid these problems. We'll cover edit sessions soon. How it works… In this case, we've used UpdateCursor to update each of the features in a feature class. We first used the Add Field tool to add a new field called CONFID_RATING, which will hold new values that we assign based on values found in another field. The groups are poor, fair, good, and excellent and are based on numeric values found in the CONFIDENCEVALUE field. We then created a new instance of UpdateCursor based on the FireIncidents feature class, and returned the two fields mentioned previously. The script then loops through each of the features and assigns a value of poor, fair, good, or excellent to the CONFID_RATING field (row[1]), based on the numeric value found in CONFIDENCEVALUE. A Python if/elif/else structure is used to control the flow of the script based on the numeric value. The value for CONFID_RATING is then committed to the feature class by passing the row variable into the updateRow() method. 
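The deleteRow() method mentioned in the Getting ready section works the same way, and the optional where clause can keep the cursor away from rows that should be left untouched. The following is a minimal sketch, not part of the original exercise; it assumes the same FireIncidents feature class and CONFIDENCEVALUE field used previously, and the threshold of 10 is an arbitrary choice made purely for illustration:

import arcpy

arcpy.env.workspace = "C:/ArcpyBook/Ch8/WildfireData/WildlandFires.mdb"

# Hypothetical clean-up pass: delete low-confidence incidents.
# The where clause limits the rows the cursor returns.
with arcpy.da.UpdateCursor("FireIncidents", ("CONFIDENCEVALUE",),
                           where_clause="CONFIDENCEVALUE < 10") as cursor:
    for row in cursor:
        cursor.deleteRow()   # removes the current row from the feature class

print("Low-confidence records deleted")

As noted earlier, changes made by a cursor are permanent outside an edit session, so wrap a deletion like this in an edit session if you need the ability to roll it back.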
Summary In this article, we studied how to retrieve features from a feature class with SearchCursor, filter records with a where clause, improve cursor performance with geometry tokens, insert rows with InsertCursor, and update rows with UpdateCursor. Resources for Article: Further resources on this subject: Adding Graphics to the Map [article] Introduction to Mobile Web ArcGIS Development [article] Python functions – Avoid repeating code [article]
More about Julia

Packt
21 Jul 2015
28 min read
In this article by Malcolm Sherrington, author of the book Mastering Julia, we ask: why write a book on Julia when the language has not yet reached the v1.0 stage? This was the first question which needed to be addressed when deciding on the contents and philosophy behind the book. (For more resources related to this topic, see here.) Julia at the time was at v0.2; it is now soon to achieve a stable v0.4, and already the blueprint for v0.5 is being touted. There were some common misconceptions which I wished to address: It is a language designed for geeks. Its main attribute, possibly its only one, is its speed. It is a scientific language, primarily a MATLAB clone. It is not as easy to use as alternatives such as Python and R. There is not enough library support to tackle enterprise solutions. In fact, none of these apply to Julia. True, it is a relatively young programming language. The initial design work on the Julia project began at MIT in August 2009, and by February 2012 it became open source. It is largely the work of three developers: Stefan Karpinski, Jeff Bezanson, and Viral Shah. Initially, Julia was envisaged by the designers as a scientific language sufficiently rapid to remove the necessity of modeling in an interactive language and subsequently having to redevelop in a compiled language, such as C or Fortran. To achieve this, Julia code would need to be transformed to the underlying machine code of the computer using the low level virtual machine (LLVM) compilation system, at the time itself a new project. This was a masterstroke. LLVM is now the basis of a variety of systems: the Apple C compiler (clang) uses it, Google's V8 JavaScript engine and Mozilla's Rust language use it, and Python is attempting to achieve significant increases in speed with its numba module. In Julia, LLVM always works; there are no exceptions because it has to. When launched, possibly the community itself saw Julia as a replacement for MATLAB, but that proved not to be the case. The syntax of Julia is similar to MATLAB, so much so that anyone competent in the latter can easily learn Julia, but it is a much richer language with many significant differences. The task of the book was to focus on these. In particular, my target audience is the data scientist and programmer analyst, but there is sufficient here for the "jobbing" C++ and Java programmer. Julia's features The Julia programming language is free and open source (MIT licensed) and the source is available on GitHub. To the veteran programmer, it has a look and feel similar to MATLAB. Blocks created by for, while, and if statements are all terminated by end rather than by endfor, endwhile, and endif or using the familiar {} style syntax. However, it is not a MATLAB clone, and sources written for MATLAB will not run on Julia.
The following are some of the Julia's features: Designed for parallelism and distributed computation (multicore and cluster) C functions called directly (no wrappers or special APIs needed) Powerful shell-like capabilities for managing other processes Lisp-like macros and other metaprogramming facilities User-defined types are as fast and compact as built-ins LLVM-based, just-in-time (JIT) compiler that allows Julia to approach and often match the performance of C/C++ An extensive mathematical function library (written in Julia) Integrated mature, best-of-breed C and Fortran libraries for linear algebra, random number generation, FFTs, and string processing Julia's core is implemented in C and C++, its parser in Scheme, and the LLVM compiler framework is used for JIT generation of machine code. The standard library is written in Julia itself using the Node.js's libuv library for efficient, cross-platform I/O. Julia has a rich language of types for constructing and describing objects that can also optionally be used to make type declarations. The ability to define function behavior across many combinations of argument types via multiple dispatch which is a key cornerstone of the language design. Julia can utilize code in other programming languages by a directly calling routines written in C or Fortran and stored in shared libraries or DLLs. This is a feature of the language syntax. In addition it is possible to interact with Python via the PyCall and this is used in the implementation of the IJulia programming environment. A quick look at some Julia To get feel for programming in Julia let's look at an example which uses random numbers to price an Asian derivative on the options market. A share option is the right to purchase a specific stock at a nominated price sometime in the future. The person granting the option is called the grantor and the person who has the benefit of the option is the beneficiary. At the time the option matures the beneficiary may choose to exercise the option if it is in his/her interest the grantor is then obliged to complete the contract. The following snippet is part of the calculation and computes a single trial and uses the Winston package to display the trajectory: using Winston; S0 = 100;     # Spot price K   = 102;     # Strike price r   = 0.05;     # Risk free rate q   = 0.0;      # Dividend yield v   = 0.2;     # Volatility tma = 0.25;     # Time to maturity T = 100;       # Number of time steps dt = tma/T;     # Time increment S = zeros(Float64,T) S[1] = S0; dW = randn(T)*sqrt(dt); [ S[t] = S[t-1] * (1 + (r - q - 0.5*v*v)*dt + v*dW[t] + 0.5*v*v*dW[t]*dW[t]) for t=2:T ]   x = linspace(1, T, length(T)); p = FramedPlot(title = "Random Walk, drift 5%, volatility 2%") add(p, Curve(x,S,color="red")) display(p) Plot one track so only compute a vector S of T elements. The stochastic variance dW is computed in a single vectorized statement. The track S is computed using a "list comprehension". The array x is created using linspace to define a linear absicca for the plot. Using the Winston package to produce the display, which only requires 3 statements: to define the plot space, add a curve to it and display the plot as shown in the following figure: Generating Julia sets Both the Mandelbrot set and Julia set (for a given constant z0) are the sets of all z (complex number) for which the iteration z → z*z + z0 does not diverge to infinity. The Mandelbrot set is those z0 for which the Julia set is connected. 
We create a file jset.jl and its contents defines the function to generate a Julia set. functionjuliaset(z, z0, nmax::Int64) for n = 1:nmax if abs(z) > 2 (return n-1) end z = z^2 + z0 end returnnmax end Here z and z0 are complex values and nmax is the number of trials to make before returning. If the modulus of the complex number z gets above 2 then it can be shown that it will increase without limit. The function returns the number of iterations until the modulus test succeeds or else nmax. Also we will write a second file pgmfile.jl to handling displaying the Julia set. functioncreate_pgmfile(img, outf::String) s = open(outf, "w") write(s, "P5n") n, m = size(img) write(s, "$m $n 255n") for i=1:n, j=1:m    p = img[i,j]    write(s, uint8(p)) end close(s) end It is quite easy to create a simple disk file using the portable bitmap (netpbm) format. This consists of a "magic" number P1 - P6, followed on the next line the image height, width and a maximum color value which must be greater than 0 and less than 65536; all of these are ASCII values not binary. Then follows the image values (height x width) which make be ASCII for P1, P2, P3 or binary for P4, P5, P6. There are three different types of portable bitmap; B/W (P1/P4), Grayscale (P2/P5), and Colour (P3/P6). The function create_pgmfile() creates a binary grayscale file (magic number = P5) from an image matrix where the values are written as Uint8. Notice that the for loop defines the indices i, j in a single statement with correspondingly only one end statement. The image matrix is output in column order which matches the way it is stored in Julia. So the main program looks like: include("jset.jl") include("pgmfile.jl") h = 400; w = 800; M = Array(Int64, h, w); c0 = -0.8+0.16im; pgm_name = "juliaset.pgm";   t0 = time(); for y=1:h, x=1:w c = complex((x-w/2)/(w/2), (y-h/2)/(w/2)) M[y,x] = juliaset(c, c0, 256) end t1 = time(); create_pgmfile(M, pgm_name); print("Written $pgm_namenFinished in $(t1-t0) seconds.n"); This is how the previous code works: We define an array N of type Int64 to hold the return values from the juliaset function. The constant c0 is arbitrary, different values of c0 will produce different Julia sets. The starting complex number is constructed from the (x,y) coordinates and scaled to the half width. We 'cheat' a little by defining the maximum number of iterations as 256. Because we are writing byte values (Uint8) and value which remains bounded will be 256 and since overflow values wrap around will be output as 0 (black). The script defines a main program in a function jmain(): julia>jmain Written juliaset.pgm Finished in 0.458 seconds # => (on my laptop) Julia type system and multiple dispatch Julia is not an object-oriented language so when we speak of object they are a different sort of data structure to those in traditional O-O languages. Julia does not allow types to have methods or so it is not possible to create subtypes which inherit methods. While this might seem restrictive it does permit methods to use a multiple dispatch call structure rather than the single dispatch system employed in object orientated ones. Coupled with Julia's system of types, multiple dispatch is extremely powerful. Moreover it is a more logical approach for data scientists and scientific programmers and if for no other reason exposing this to you the analyst/programmer is a reason to use Julia. A function is an object that maps a tuple of arguments to a return value. 
In the case where the arguments are not valid the function should handle the situation cleanly by catching the error and handling it or throw an exception. When a function is applied to its argument tuple it selects the appropriate method and this process is called dispatch. In traditional object-oriented languages the method chosen is based only on the object type and this paradigm is termed single-dispatch. With Julia the combination of all a functions argument determine which method is chosen, this is the basis of multiple dispatch. To the scientific programmer this all seems very natural. It makes little sense in most circumstances for one argument to be more important than the others. In Julia all functions and operators, which are also functions, use multiple dispatch. The methods chosen for any combination of operators. For example looking at the methods of the power operator (^): julia> methods(^) # 43 methods for generic function "^": ^(x::Bool,y::Bool) at bool.jl:41 ^(x::BigInt,y::Bool) at gmp.jl:314 ^(x::Integer,y::Bool) at bool.jl:42     ^(A::Array{T,2},p::Number) at linalg/dense.jl:180 ^(::MathConst{:e},x::AbstractArray{T,2}) at constants.jl:87 We can see that there are 43 methods for ^ and the file and line where the methods is defined are given too. Because any untyped argument is designed as type Any, it is possible to define a set of function methods such that there is no unique most specific method applicable to some combinations of arguments. julia> pow(a,b::Int64) = a^b; julia> pow(a::Float64,b) = a^b; Warning: New definition pow(Float64,Any) at /Applications/JuliaStudio.app/Contents/Resources/Console/Console.jl:1 is ambiguous with: pow(Any,Int64) at /Applications/JuliaStudio.app/Contents/Resources/Console/Console.jl:1. To fix, define pow(Float64,Int64) before the new definition. A call of pow(3.5, 2) can be handled by either function. In this case they will give the same result by only because of the function bodies and Julia can't know that. Working with Python The ability for Julia with call code written in other languages is one of its main strengths. From its inception Julia had to play "catchup" and a key feature was it makes calling code written in C, and by implication Fortran, very easy. The code to be called must be available as a shared library rather than just a standalone object file. There is a zero-overhead in the call, meaning that it is reduced to a single machine instruction in the LLVM compilation. In addition Python models can be accessed via the PyCall package which provides a @pyimport macro that mimics a Python import statement. This imports a Python module and provides Julia wrappers for all of the functions and constants including automatic conversion of types between Julia and Python. This work has led to the creation of an IJulia kernel to the IPython IDE which now is a principle component of the Jupyter project. In Pycall, type conversions are automatically performed for numeric, boolean, string, IO streams plus all tuples, arrays and dictionaries of these types. julia> using PyCall julia> @pyimport scipy.optimize as so julia> @pyimport scipy.integrate as si julia> so.ridder(x -> x*cos(x), 1, pi); # => 1.570796326795 julia> si.quad(x -> x*sin(x), 1, pi)[1]; # => 2.840423974650 In the preceding commands, the Python optimize and integrate modules are imported and functions in those modules called from the Julia REPL. 
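For comparison, here is roughly what those two calls look like in plain Python; this is a minimal sketch that assumes SciPy is installed, and PyCall is essentially wrapping calls like these and converting the results back into Julia values:

import math
from scipy import optimize, integrate

# Root of x*cos(x) between 1 and pi -- approximately pi/2
root = optimize.ridder(lambda x: x * math.cos(x), 1, math.pi)
# Integral of x*sin(x) from 1 to pi -- approximately 2.8404
area, err = integrate.quad(lambda x: x * math.sin(x), 1, math.pi)
print(root, area)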
One difference imposed on the package is that calls using the Python object notation are not possible from Julia, so these are referenced using an array-style notation po[:atr] rather po.atr, where po is a PyObject and atr is an attribute. It is also easy to use the Python matplotlib module to display simple (and complex) graphs. @pyimport matplotlib.pyplot as plt x = linspace(0,2*pi,1000); y = sin(3*x + 4*cos(2*x)); plt.plot(x, y, color="red", linewidth=2.0, linestyle="--") 1-element Array{Any,1}: PyObject<matplotlib.lines.Line2D object at 0x0000000027652358> plt.show() Notice that keywords can also be passed such as the color, line width and the preceding style. Simple statistics using dataframes Julia implements various approaches for handling data held on disk. These may be 'normal' files such as text files, CSV and other delimited files, or in SQL and NoSQL databases. Also there is an implementation of dataframe support similar to that provided in R and via the pandas module in Python. The following looks at the Apple share prices from 2000 to 200, using a CSV file with provides opening, closing, high and low prices together with trading volumes over the day. using DataFrames, StatsBase aapl = readtable("AAPL-Short.csv");   naapl = size(aapl)[1] m1 = int(mean((aapl[:Volume]))); # => 6306547 The data is read into a DataFrame and we can estimate the mean (m1). For the volume, it is possible to cast it as an integer as this makes more sense. We can do this by creating a weighting vector. using Match wts = zeros(naapl); for i in 1:naapl    dt = aapl[:Date][i]    wts[i] = @match dt begin          r"^2000" => 1.0          r"^2001" => 2.0          r"^2002" => 4.0          _       => 0.0    end end;   wv = WeightVec(wts); m2 = int(mean(aapl[:Volume], wv)); # => 6012863 Computing weighted statistical metrics it is possible to 'trim' off the outliers from each end of the data. Returning to the closing prices: mean(aapl[:Close]);         # => 37.1255 mean(aapl[:Close], wv);     # => 26.9944 trimmean(aapl[:Close], 0.1); # => 34.3951 trimean() is the trimmed mean where 5 percent is taken from each end. std(aapl[:Close]);           # => 34.1186 skewness(aapl[:Close])       # => 1.6911 kurtosis(aapl[:Close])       # => 1.3820 As well as second moments such as standard deviation, StatsBase provides a generic moments() function and specific instances based on these such as for skewness (third) and kurtosis (fourth). It is also possible to provide some summary statistics: summarystats(aapl[:Close])   Summary Stats: Mean:         37.125505 Minimum:     13.590000 1st Quartile: 17.735000 Median:       21.490000 3rd Quartile: 31.615000 Maximum:     144.190000 The first and third quartiles related to the 25 percent and 75 percent percentiles for a finer granularity we can use the percentile() function. percentile(aapl[:Close],5); # => 14.4855 percentile(aapl[:Close],95); # => 118.934 MySQL access using PyCall We have seen previously that Python can be used for plotting via the PyPlot package which interfaces with matplotlib. In fact the ability to easily call Python modules is a very powerful feature in Julia and we can use this as an alternative method for connecting to databases. Any database which can be manipulated by Python is also available to Julia. In particular since the DBD driver for MySQL is not fully DBT compliant, let's look this approach to running some queries. Our current MySQL setup already has the Chinook dataset loaded some we will execute a query to list the Genre table. 
In Python we will first need to download the MySQL Connector module. For Anaconda this needs to be using the source (independent) distribution, rather than a binary package and the installation performed using the setup.py file. The query (in Python) to list the Genre table would be: import mysql.connector as mc cnx = mc.connect(user="malcolm", password="mypasswd") csr = cnx.cursor() qry = "SELECT * FROM Chinook.Genre" csr.execute(qry) for vals in csr:    print(vals)   (1, u'Rock') (2, u'Jazz') (3, u'Metal') (4, u'Alternative & Punk') (5, u'Rock And Roll') ... ... csr.close() cnx.close() We can execute the same in Julia by using the PyCall to the mysql.connector module and the form of the coding is remarkably similar: using PyCall @pyimport mysql.connector as mc   cnx = mc.connect (user="malcolm", password="mypasswd"); csr = cnx[:cursor]() query = "SELECT * FROM Chinook.Genre" csr[:execute](query)   for vals in csr    id   = vals[1]    genre = vals[2]    @printf “ID: %2d, %sn” id genre end ID: 1, Rock ID: 2, Jazz ID: 3, Metal ID: 4, Alternative & Punk ID: 5, Rock And Roll ... ... csr[:close]() cnx[:close]() Note that the form of the call is a little different from the corresponding Python method, since Julia is not object-oriented the methods for a Python object are constructed as an array of symbols. For example the Python csr.execute(qry) routine is called in Julia as csr[:execute](qry). Also be aware that although Python arrays are zero-based this is translated to one-based by PyCall, so the first values is referenced as vals[1]. Scientific programming with Julia Julia was originally developed as a replacement for MATLAB with a focus on scientific programming. There are modules which are concerned with linear algebra, signal processing, mathematical calculus, optimization problems, and stochastic simulation. The following is a subject dear to my heart: the solution of differential equations. Differential equations are those which involve terms which involve the rates of change of variates as well as the variates themselves. They arise naturally in a number of fields, notably dynamics and when the changes are with respect to one dependent variable, often time, the systems are called ordinary differential equations. If more than a single dependent variable is involved, then they are termed partial differential equations. Julia supports the solution of ordinary differential equations thorough a couple of packages ODE and Sundials. The former (ODE) consists of routines written solely in Julia whereas Sundials is a wrapper package around a shared library. ODE exports a set of adaptive solvers; adaptive meaning that the 'step' size of the algorithm changes algorithmically to reduce the error estimate to be below a certain threshold. The calls take the form odeXY, where X is the order of the solver and Y the error control. ode23: Third order adaptive solver with third order error control ode45: Fourth order adaptive solver with fifth order error control ode78: Seventh order adaptive solver with eighth order error control To solve the explicit ODE defined as a vectorize set of equations dy/dt = F(t,y), all routines of which have the same basic form: (tout, yout) = odeXY(F, y0, tspan). As an example, I will look at it as a linear three-species food chain model where the lowest-level prey x is preyed upon by a mid-level species y, which, in turn, is preyed upon by a top level predator z. This is an extension of the Lotka-Volterra system from to three species. 
Examples might be three-species ecosystems such as mouse-snake-owl, vegetation-rabbits-foxes, and worm-sparrow-falcon. x' = a*x − b*x*y y' = −c*y + d*x*y − e*y*z z' = −f*z + g*y*z #for a,b,c,d,e,f g > 0 Where a, b, c, d are in the two-species Lotka-Volterra equations: e represents the effect of predation on species y by species z f represents the natural death rate of the predator z in the absence of prey g represents the efficiency and propagation rate of the predator z in the presence of prey This translates to the following set of linear equations: x[1] = p[1]*x[1] - p[2]*x[1]*x[2] x[2] = -p[3]*x[2] + p[4]*x[1]*x[2] - p[5]*x[2]*x[3] x[3] = -p[6]*x[3] + p[7]*x[2]*x[3] It is slightly over specified since one of the parameters can be removed by rescaling the timescale. We define the function F as follows: function F(t,x,p) d1 = p[1]*x[1] - p[2]*x[1]*x[2] d2 = -p[3]*x[2] + p[4]*x[1]*x[2] - p[5]*x[2]*x[3] d3 = -p[6]*x[3] + p[7]*x[2]*x[3] [d1, d2, d3] end This takes the time range, vectors of the independent variables and of the coefficients and returns a vector of the derivative estimates: p = ones(7); # Choose all parameters as 1.0 x0 = [0.5, 1.0, 2.0]; # Setup the initial conditions tspan = [0.0:0.1:10.0]; # and the time range Solve the equations by calling the ode23 routine. This returns a matrix of the solutions in a columnar order which we extract and display using Winston: (t,x) = ODE.ode23((t,x) -> F(t,x,pp), x0, tspan);   n = length(t); y1 = zeros(n); [y1[i] = x[i][1] for i = 1:n]; y2 = zeros(n); [y2[i] = x[i][2] for i = 1:n]; y3 = zeros(n); [y3[i] = x[i][3] for i = 1:n];   using Winston plot(t,y1,"b.",t,y2,"g-.",t,y3,"r--") This is shown in the following figure: Graphics with Gadlfy Julia now provides a vast array of graphics packages. The "popular" ones may be thought of as Winston, PyPlot and Gadfly and there is also an interface to the increasingly more popular online system Plot.ly. Gadfly is a large and complex package and provides great flexibility in the range and breadth of the visualizations possible in Julia. It is equivalent to the R module ggplot2 and similarly is based on the seminal work The Grammar of Graphics by Leland Wilkinson. The package was written by Daniel Jones and the source on GitHub contains numerous examples of visual displays together with the accompanying code. An entire text could be devoted just to Gadfly so I can only point out some of the main features here and encourage the reader interested in print standard graphics in Julia to refer to the online website at http://gadflyjl.org. The standard call is to the plot() function with creates a graph on the display device via a browser either directly or under the control of IJulia if that is being used as an IDE. It is possible to assign the result of plot() to a variable and invoke this using display(), In this way output can be written to files including: SVG, SVGJS/D3 PDF, and PNG. dd = plot(x = rand(10), y = rand(10)); draw(SVG(“random-pts.svg”, 15cm, 12cm) , dd); Notice that if writing to a backend, the display size is provided, this can be specified in units of cm and inch. Gadfly works well with C libraries of cairo, pango and, fontconfig installed. It will produce SVG and SVGJS graphics but for PNG, PostScript (PS) and PDF cairo is required. Also complex text layouts are more accurate when pango and fontconfig are available. 
The plot() call can operate on three different data sources: Dataframes Functions and expressions Arrays and collections Unless otherwise specified the type of graph produced is a scatter diagram. The ability to work directly with data frames is especially useful. To illustrate this let's look at the GCSE result set. Recall that this is available as part of the RDatasets suite of source data. using Gadfly, RDatasets, DataFrames; set_default_plot_size(20cm, 12cm); mlmf = dataset("mlmRev","Gcsemv") df = mlmf[complete_cases(mlmf), :] After extracting the data we need to operate with values with do not have any NA values, so we use the complete_cases() function to create a subset of the original data. names(df) 5-element Array{Symbol,1}: ; # => [ :School, :Student, :Gender, :Written, :Course ] If we wish to view the data values for the exam and course work results and at the same time differentiate between boys and girls, this can be displayed by: plot(df, x="Course", y="Written", color="Gender") The JuliaGPU community group Many Julia modules build on the work of other authors working within the same field of study and these have classified as community groups (http://julialang.org/community). Probably the most prolific is the statistics group: JuliaStats (http://juliastats.github.io). One of the main themes in my professional career has been working with hardware to speed up the computing process. In my work on satellite data I worked with the STAR-100 array processor, and once back in the UK, used Silicon Graphics for 3D rendering of medical data . Currently I am interested in using NVIDIA GPUs in financial scenarios and risk calculations. Much of this work has been coded in C, with domain specific languages to program the ancillary hardware. It is now possible to do much of this in Julia with packages contained in the JuliaGPU group. This has routines for both CUDA and OpenCL, at present covering: Basic runtime: CUDA.jl, CUDArt.jl, OpenCL.jl BLAS integration: CUBLAS.jl, CLBLAS FFT operations: CUFFT.jl, CLFFT.jl The CU*-style routines only applies to NVIDIA cards and requires the CUDA SDK to be installed, whereas CL*-functions can be used with variety of GPU s. The CLFFT and CLBLAS do require some additional libraries to be present but we can use OpenCL as is. The following is output from a Lenovo Z50 laptop with an i7 processor and both Intel and NVIDIA graphics chips. julia> using OpenCL julia> OpenCL.devices() OpenCL.Platform(Intel(R) HDGraphics 4400) OpenCL.Platform(Intel(R) Core(TM) i7-4510U CPU) OpenCL.Platform(GeForce 840M on NVIDIA CUDA) To do some calculations we need to define a kernel to be loaded on the GPU. 
The following multiplies two 1024x1024 matrices of Gaussian random numbers: import OpenCL const cl = OpenCL const kernel_source = """ __kernel void mmul( const int Mdim, const int Ndim, const int Pdim, __global float* A, __global float* B, __global float* C) {    int k;    int i = get_global_id(0);    int j = get_global_id(1);    float tmp;    if ((i < Ndim) && (j < Mdim)) {      tmp = 0.0f;      for (k = 0; k < Pdim; k++)        tmp += A[i*Ndim + k] * B[k*Pdim + j];        C[i*Ndim+j] = tmp;    } } """ The kernel is expressed as a string and the OpenCL DSL has a C-like syntax: const ORDER = 1024; # Order of the square matrices A, B and C const TOL   = 0.001; # Tolerance used in floating point comps const COUNT = 3;     # Number of runs   sizeN = ORDER * ORDER; h_A = float32(randn(ORDER)); # Fill array with random numbers h_B = float32(randn(ORDER)); # --- ditto -- h_C = Array(Float32, ORDER); # Array to hold the results   ctx   = cl.Context(cl.devices()[3]); queue = cl.CmdQueue(ctx, :profile);   d_a = cl.Buffer(Float32, ctx, (:r,:copy), hostbuf = h_A); d_b = cl.Buffer(Float32, ctx, (:r,:copy), hostbuf = h_B); d_c = cl.Buffer(Float32, ctx, :w, length(h_C)); We now create the Open CL context and some data space on the GPU for the three arrays d_A, d_B, and D_C. Then we copy the data in the host arrays h_A and h_B to the device and then load the kernel onto the GPU. prg = cl.Program(ctx, source=kernel_source) |> cl.build! mmul = cl.Kernel(prg, "mmul"); The following loop runs the kernel COUNT times to give an accurate estimate of the elapsed time for the operation. This includes the cl-copy!() operation which copies the results back from the device to the host (Julia) program. for i in 1:COUNT fill!(h_C, 0.0); global_range = (ORDER. ORDER); mmul_ocl = mmul[queue, global_range]; evt = mmul_ocl(int32(ORDER), int32(ORDER), int32(ORDER), d_a, d_b, d_c); run_time = evt[:profile_duration] / 1e9; cl.copy!(queue, h_C, d_c); mflops = 2.0 * Ndims^3 / (1000000.0 * run_time); @printf “%10.8f seconds at %9.5f MFLOPSn” run_time mflops end 0.59426405 seconds at 3613.686 MFLOPS 0.59078856 seconds at 3634.957 MFLOPS 0.57401651 seconds at 3741.153 MFLOPS This compares with the figures for running this natively, without the GPU processor: 7.060888678 seconds at 304.133 MFLOPS That is using the GPU gives a 12-fold increase in the performance of matrix calculation. Summary This article has introduced some of the main features which sets Julia apart from other similar programming languages. I began with a quick look some Julia code by developing a trajectory used in estimating the price of a financial option which was displayed graphically. Continuing with the graphics theme we presented some code to generating a Julia set and to write this to disk as a PGM formatted file. The type system and use of multiple dispatch is discussed next. This a major difference for the user between Julia and object-orientated languages such as R and Python and is central to what gives Julia the power to generate fast machine-level code via LLVM compilation. We then turned to a series of topics from the Julia armory: Working with Python: The ability to call C and Fortran, seamlessly, has been a central feature of Julia since its initial development by the addition of interoperability with Python has opened up a new series of possibilities, leading to the development of the IJulia interface and its integration in the Jupyter project. 
Simple statistics using DataFrames: As an example of working with data, we highlighted the Julia implementation of data frames by looking at Apple share prices and applying some simple statistics. MySQL access using PyCall: Returns to another use of Python interoperability to illustrate an unconventional method of interfacing with a MySQL database. Scientific programming with Julia: The solution of ordinary differential equations is presented here by looking at the Lotka-Volterra equations, but unusually we develop a solution for the three-species model. Graphics with Gadfly: Julia has a wide range of options when developing data visualizations. Gadfly is one of the 'heavyweights'; a dataset containing UK GCSE results is extracted from the RDatasets.jl package, and the comparison between written and coursework results is displayed using Gadfly, categorized by gender. Finally, we showcased the work of the Julia community groups by looking at an example from the JuliaGPU group, utilizing the OpenCL package to check on the set of supported devices. We then selected an NVIDIA GeForce chip in order to execute a simple kernel which multiplied a pair of matrices on the GPU. This was timed against native Julia coding in order to illustrate the speedup gained from parallelizing matrix operations. Resources for Article: Further resources on this subject: Pricing the Double-no-touch option [article] Basics of Programming in Julia [article] SQL Server Analysis Services Administering and Monitoring Analysis Services [article]


Fine-tune the NGINX Configuration

Packt
14 Jul 2015
20 min read
In this article by Rahul Sharma, author of the book NGINX High Performance, we will cover the following topics: NGINX configuration syntax Configuring NGINX workers Configuring NGINX I/O Configuring TCP Setting up the server (For more resources related to this topic, see here.) NGINX configuration syntax This section aims to cover it in good detail. The complete configuration file has a logical structure that is composed of directives grouped into a number of sections. A section defines the configuration for a particular NGINX module, for example, the http section defines the configuration for the ngx_http_core module. An NGINX configuration has the following syntax: Valid directives begin with a variable name and then state an argument or series of arguments separated by spaces. All valid directives end with a semicolon (;). Sections are defined with curly braces ({}). Sections can be nested in one another. The nested section defines a module valid under the particular section, for example, the gzip section under the http section. Configuration outside any section is part of the NGINX global configuration. The lines starting with the hash (#) sign are comments. Configurations can be split into multiple files, which can be grouped using the include directive. This helps in organizing code into logical components. Inclusions are processed recursively, that is, an include file can further have include statements. Spaces, tabs, and new line characters are not part of the NGINX configuration. They are not interpreted by the NGINX engine, but they help to make the configuration more readable. Thus, the complete file looks like the following code: #The configuration begins here global1 value1; #This defines a new section section { sectionvar1 value1; include file1;    subsection {    subsectionvar1 value1; } } #The section ends here global2 value2; # The configuration ends here NGINX provides the -t option, which can be used to test and verify the configuration written in the file. If the file or any of the included files contains any errors, it prints the line numbers causing the issue: $ sudo nginx -t This checks the validity of the default configuration file. If the configuration is written in a file other than the default one, use the -c option to test it. You cannot test half-baked configurations, for example, you defined a server section for your domain in a separate file. Any attempt to test such a file will throw errors. The file has to be complete in all respects. Now that we have a clear idea of the NGINX configuration syntax, we will try to play around with the default configuration. This article only aims to discuss the parts of the configuration that have an impact on performance. The NGINX catalog has large number of modules that can be configured for some purposes. This article does not try to cover all of them as the details are beyond the scope of the book. Please refer to the NGINX documentation at http://nginx.org/en/docs/ to know more about the modules. Configuring NGINX workers NGINX runs a fixed number of worker processes as per the specified configuration. In the following sections, we will work with NGINX worker parameters. These parameters are mostly part of the NGINX global context. worker_processes The worker_processes directive controls the number of workers: worker_processes 1; The default value for this is 1, that is, NGINX runs only one worker. 
The value should be changed to an optimal value depending on the number of cores available, disks, network subsystem, server load, and so on. As a starting point, set the value to the number of cores available. Determine the number of cores available using lscpu: $ lscpu Architecture:     x86_64 CPU op-mode(s):   32-bit, 64-bit Byte Order:     Little Endian CPU(s):       4 The same can be accomplished by greping out cpuinfo: $ cat /proc/cpuinfo | grep 'processor' | wc -l Now, set this value to the parameter: # One worker per CPU-core. worker_processes 4; Alternatively, the directive can have auto as its value. This determines the number of cores and spawns an equal number of workers. When NGINX is running with SSL, it is a good idea to have multiple workers. SSL handshake is blocking in nature and involves disk I/O. Thus, using multiple workers leads to improved performance. accept_mutex Since we have configured multiple workers in NGINX, we should also configure the flags that impact worker selection. The accept_mutex parameter available under the events section will enable each of the available workers to accept new connections one by one. By default, the flag is set to on. The following code shows this: events { accept_mutex on; } If the flag is turned to off, all of the available workers will wake up from the waiting state, but only one worker will process the connection. This results in the Thundering Herd phenomenon, which is repeated a number of times per second. The phenomenon causes reduced server performance as all the woken-up workers take up CPU time before going back to the wait state. This results in unproductive CPU cycles and nonutilized context switches. accept_mutex_delay When accept_mutex is enabled, only one worker, which has the mutex lock, accepts connections, while others wait for their turn. The accept_mutex_delay corresponds to the timeframe for which the worker would wait, and after which it tries to acquire the mutex lock and starts accepting new connections. The directive is available under the events section with a default value of 500 milliseconds. The following code shows this: events{ accept_mutex_delay 500ms; } worker_connections The next configuration to look at is worker_connections, with a default value of 512. The directive is present under the events section. The directive sets the maximum number of simultaneous connections that can be opened by a worker process. The following code shows this: events{    worker_connections 512; } Increase worker_connections to something like 1,024 to accept more simultaneous connections. The value of worker_connections does not directly translate into the number of clients that can be served simultaneously. Each browser opens a number of parallel connections to download various components that compose a web page, for example, images, scripts, and so on. Different browsers have different values for this, for example, IE works with two parallel connections while Chrome opens six connections. The number of connections also includes sockets opened with the upstream server, if any. worker_rlimit_nofile The number of simultaneous connections is limited by the number of file descriptors available on the system as each socket will open a file descriptor. If NGINX tries to open more sockets than the available file descriptors, it will lead to the Too many opened files message in the error.log. 
Check the number of file descriptors using ulimit: $ ulimit -n Now, increase this to a value more than worker_process * worker_connections. The value should be increased for the user that runs the worker process. Check the user directive to get the username. NGINX provides the worker_rlimit_nofile directive, which can be an alternative way of setting the available file descriptor rather modifying ulimit. Setting the directive will have a similar impact as updating ulimit for the worker user. The value of this directive overrides the ulimit value set for the user. The directive is not present by default. Set a large value to handle large simultaneous connections. The following code shows this: worker_rlimit_nofile 20960; To determine the OS limits imposed on a process, read the file /proc/$pid/limits. $pid corresponds to the PID of the process. multi_accept The multi_accept flag enables an NGINX worker to accept as many connections as possible when it gets the notification of a new connection. The purpose of this flag is to accept all connections in the listen queue at once. If the directive is disabled, a worker process will accept connections one by one. The following code shows this: events{    multi_accept on; } The directive is available under the events section with the default value off. If the server has a constant stream of incoming connections, enabling multi_accept may result in a worker accepting more connections than the number specified in worker_connections. The overflow will lead to performance loss as the previously accepted connections, part of the overflow, will not get processed. use NGINX provides several methods for connection processing. Each of the available methods allows NGINX workers to monitor multiple socket file descriptors, that is, when there is data available for reading/writing. These calls allow NGINX to process multiple socket streams without getting stuck in any one of them. The methods are platform-dependent, and the configure command, used to build NGINX, selects the most efficient method available on the platform. If we want to use other methods, they must be enabled first in NGINX. The use directive allows us to override the default method with the method specified. The directive is part of the events section: events { use select; } NGINX supports the following methods of processing connections: select: This is the standard method of processing connections. It is built automatically on platforms that lack more efficient methods. The module can be enabled or disabled using the --with-select_module or --without-select_module configuration parameter. poll: This is the standard method of processing connections. It is built automatically on platforms that lack more efficient methods. The module can be enabled or disabled using the --with-poll_module or --without-poll_module configuration parameter. kqueue: This is an efficient method of processing connections available on FreeBSD 4.1, OpenBSD 2.9+, NetBSD 2.0, and OS X. There are the additional directives kqueue_changes and kqueue_events. These directives specify the number of changes and events that NGINX will pass to the kernel. The default value for both of these is 512. The kqueue method will ignore the multi_accept directive if it has been enabled. epoll: This is an efficient method of processing connections available on Linux 2.6+. The method is similar to the FreeBSD kqueue. There is also the additional directive epoll_events. This specifies the number of events that NGINX will pass to the kernel. 
The default value for this is 512. /dev/poll: This is an efficient method of processing connections available on Solaris 7 11/99+, HP/UX 11.22+, IRIX 6.5.15+, and Tru64 UNIX 5.1A+. This has the additional directives, devpoll_events and devpoll_changes. The directives specify the number of changes and events that NGINX will pass to the kernel. The default value for both of these is 32. eventport: This is an efficient method of processing connections available on Solaris 10. The method requires necessary security patches to avoid kernel crash issues. rtsig: Real-time signals is a connection processing method available on Linux 2.2+. The method has some limitations. On older kernels, there is a system-wide limit of 1,024 signals. For high loads, the limit needs to be increased by setting the rtsig-max parameter. For kernel 2.6+, instead of the system-wide limit, there is a limit on the number of outstanding signals for each process. NGINX provides the worker_rlimit_sigpending parameter to modify the limit for each of the worker processes: worker_rlimit_sigpending 512; The parameter is part of the NGINX global configuration. If the queue overflows, NGINX drains the queue and uses the poll method to process the unhandled events. When the condition is back to normal, NGINX switches back to the rtsig method of connection processing. NGINX provides the rtsig_overflow_events, rtsig_overflow_test, and rtsig_overflow_threshold parameters to control how a signal queue is handled on overflows. The rtsig_overflow_events parameter defines the number of events passed to poll. The rtsig_overflow_test parameter defines the number of events handled by poll, after which NGINX will drain the queue. Before draining the signal queue, NGINX will look up how much it is filled. If the factor is larger than the specified rtsig_overflow_threshold, it will drain the queue. The rtsig method requires accept_mutex to be set. The method also enables the multi_accept parameter. Configuring NGINX I/O NGINX can also take advantage of the Sendfile and direct I/O options available in the kernel. In the following sections, we will try to configure parameters available for disk I/O. Sendfile When a file is transferred by an application, the kernel first buffers the data and then sends the data to the application buffers. The application, in turn, sends the data to the destination. The Sendfile method is an improved method of data transfer, in which data is copied between file descriptors within the OS kernel space, that is, without transferring data to the application buffers. This results in improved utilization of the operating system's resources. The method can be enabled using the sendfile directive. The directive is available for the http, server, and location sections. http{ sendfile on; } The flag is set to off by default. Direct I/O The OS kernel usually tries to optimize and cache any read/write requests. Since the data is cached within the kernel, any subsequent read request to the same place will be much faster because there's no need to read the information from slow disks. Direct I/O is a feature of the filesystem where reads and writes go directly from the applications to the disk, thus bypassing all OS caches. This results in better utilization of CPU cycles and improved cache effectiveness. The method is used in places where the data has a poor hit ratio. Such data does not need to be in any cache and can be loaded when required. It can be used to serve large files. The directio directive enables the feature. 
The directive is available for the http, server, and location sections: location /video/ { directio 4m; } Any file with size more than that specified in the directive will be loaded by direct I/O. The parameter is disabled by default. The use of direct I/O to serve a request will automatically disable Sendfile for the particular request. Direct I/O depends on the block size while doing a data transfer. NGINX has the directio_alignment directive to set the block size. The directive is present under the http, server, and location sections: location /video/ { directio 4m; directio_alignment 512; } The default value of 512 bytes works well for all boxes unless it is running a Linux implementation of XFS. In such a case, the size should be increased to 4 KB. Asynchronous I/O Asynchronous I/O allows a process to initiate I/O operations without having to block or wait for it to complete. The aio directive is available under the http, server, and location sections of an NGINX configuration. Depending on the section, the parameter will perform asynchronous I/O for the matching requests. The parameter works on Linux kernel 2.6.22+ and FreeBSD 4.3. The following code shows this: location /data { aio on; } By default, the parameter is set to off. On Linux, aio needs to be enabled with directio, while on FreeBSD, sendfile needs to be disabled for aio to take effect. If NGINX has not been configured with the --with-file-aio module, any use of the aio directive will cause the unknown directive aio error. The directive has a special value of threads, which enables multithreading for send and read operations. The multithreading support is only available on the Linux platform and can only be used with the epoll, kqueue, or eventport methods of processing requests. In order to use the threads value, configure multithreading in the NGINX binary using the --with-threads option. Post this, add a thread pool in the NGINX global context using the thread_pool directive. Use the same pool in the aio configuration: thread_pool io_pool threads=16; http{ ….....    location /data{      sendfile   on;      aio       threads=io_pool;    } } Mixing them up The three directives can be mixed together to achieve different objectives on different platforms. The following configuration will use sendfile for files with size smaller than what is specified in directio. Files served by directio will be read using asynchronous I/O: location /archived-data/{ sendfile on; aio on; directio 4m; } The aio directive has a sendfile value, which is available only on the FreeBSD platform. The value can be used to perform Sendfile in an asynchronous manner: location /archived-data/{ sendfile on; aio sendfile; } NGINX invokes the sendfile() system call, which returns with no data in the memory. Post this, NGINX initiates data transfer in an asynchronous manner. Configuring TCP HTTP is an application-based protocol, which uses TCP as the transport layer. In TCP, data is transferred in the form of blocks known as TCP packets. NGINX provides directives to alter the behavior of the underlying TCP stack. These parameters alter flags for an individual socket connection. TCP_NODELAY TCP/IP networks have the "small packet" problem, where single-character messages can cause network congestion on a highly loaded network. Such packets are 41 bytes in size, where 40 bytes are for the TCP header and 1 byte has useful information. These small packets have huge overhead, around 4000 percent and can saturate a network. 
John Nagle solved the problem (Nagle's algorithm) by not sending the small packets immediately. All such packets are collected for some amount of time and then sent in one go as a single packet. This results in improved efficiency of the underlying network. Thus, a typical TCP/IP stack waits for up to 200 milliseconds before sending the data packets to the client.

It is important to note that the problem exists with applications such as Telnet, where each keystroke is sent over the wire. The problem is not relevant to a web server, which serves static files. The files will mostly form full TCP packets, which can be sent immediately instead of waiting for 200 milliseconds.

The TCP_NODELAY option can be used while opening a socket to disable Nagle's buffering algorithm and send the data as soon as it is available. NGINX provides the tcp_nodelay directive to enable this option. The directive is available under the http, server, and location sections of an NGINX configuration:

http {
   tcp_nodelay on;
}

The directive is enabled by default. NGINX uses tcp_nodelay for connections in the keep-alive mode.

TCP_CORK

As an alternative to Nagle's algorithm, Linux provides the TCP_CORK option. The option tells the TCP stack to append packets and send them when they are full or when the application instructs it to send the packet by explicitly removing TCP_CORK. This results in an optimal amount of data packets being sent and, thus, improves the efficiency of the network. The TCP_CORK option is available as the TCP_NOPUSH flag on FreeBSD and Mac OS.

NGINX provides the tcp_nopush directive to enable TCP_CORK over the connection socket. The directive is available under the http, server, and location sections of an NGINX configuration:

http {
   tcp_nopush on;
}

The directive is disabled by default. NGINX uses tcp_nopush for requests served with sendfile.

Setting them up

The two directives address different aspects of TCP behavior: the former makes sure that the network latency is reduced, while the latter tries to optimize the data packets sent. An application should set both of these options to get efficient data transfer.

Enabling tcp_nopush along with sendfile makes sure that while transferring a file, the kernel creates the maximum amount of full TCP packets before sending them over the wire. The last packet(s) can be partial TCP packets, which could end up waiting with TCP_CORK being enabled. NGINX makes sure it removes TCP_CORK to send these packets. Since tcp_nodelay is also set, these packets are immediately sent over the network, that is, without any delay.

Setting up the server

The following configuration sums up all the changes proposed in the preceding sections:

worker_processes 3;
worker_rlimit_nofile 8000;

events {
   multi_accept on;
   use epoll;
   worker_connections 1024;
}

http {
   sendfile on;
   aio on;
   directio 4m;
   tcp_nopush on;
   tcp_nodelay on;
   # Rest of the NGINX configuration removed for brevity
}

It is assumed that NGINX runs on a quad core server. Thus, three worker processes have been spawned to take advantage of three out of four available cores, leaving one core for other processes. Each of the workers has been configured to work with 1,024 connections. Correspondingly, the nofile limit has been increased to 8,000. By default, all worker processes operate with the accept mutex; thus, the flag has not been set. Each worker processes multiple connections in one go using the epoll method.
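The file-descriptor arithmetic behind these numbers is easy to get wrong, so here is a small sanity check in Python (an editorial addition, not part of the original article; the worker and connection counts are taken from the sample configuration above and should be adjusted to match your own nginx.conf):

import resource

# Values taken from the sample configuration above (adjust as needed).
worker_processes = 3
worker_connections = 1024
worker_rlimit_nofile = 8000

# Descriptors the workers may need, as discussed in the ulimit section.
required = worker_processes * worker_connections

# Soft and hard limits on open file descriptors for the current user/session.
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)

print("approximate descriptors required:", required)
print("worker_rlimit_nofile:", worker_rlimit_nofile)
print("ulimit -n (soft/hard):", soft_limit, "/", hard_limit)

if worker_rlimit_nofile < required:
    print("warning: worker_rlimit_nofile is below worker_processes * worker_connections")
if soft_limit < required:
    print("note: the shell ulimit is below the required count; worker_rlimit_nofile overrides it for the workers")

The script only restates the rule given earlier: the descriptors available to a worker, whether set through ulimit or worker_rlimit_nofile, should exceed worker_processes * worker_connections.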
In the http section, NGINX has been configured to serve files larger than 4 MB using direct I/O, while efficiently buffering smaller files using Sendfile. TCP options have also been set up to efficiently utilize the available network. Measuring gains It is time to test the changes and make sure that they have given performance gain. Run a series of tests using Siege/JMeter to get new performance numbers. The tests should be performed with the same configuration to get a comparable output: $ siege -b -c 790 -r 50 -q http://192.168.2.100/hello   Transactions:               79000 hits Availability:               100.00 % Elapsed time:               24.25 secs Data transferred:           12.54 MB Response time:             0.20 secs Transaction rate:           3257.73 trans/sec Throughput:                 0.52 MB/sec Concurrency:               660.70 Successful transactions:   39500 Failed transactions:       0 Longest transaction:       3.45 Shortest transaction:       0.00 The results from Siege should be evaluated and compared to the baseline. Throughput: The transaction rate defines this as 3250 requests/second Error rate: Availability is reported as 100 percent; thus; the error rate is 0 percent Response time: The results shows a response time of 0.20 seconds Thus, these new numbers demonstrate performance improvement in various respects. After the server configuration is updated with all the changes, reperform all tests with increased numbers. The aim should be to determine the new baseline numbers for the updated configuration. Summary The article started with an overview of the NGINX configuration syntax. Going further, we discussed worker_connections and the related parameters. These allow you to take advantage of the available hardware. The article also talked about the different event processing mechanisms available on different platforms. The configuration discussed helped in processing more requests, thus improving the overall throughput. NGINX is primarily a web server; thus, it has to serve all kinds static content. Large files can take advantage of direct I/O, while smaller content can take advantage of Sendfile. The different disk modes make sure that we have an optimal configuration to serve the content. In the TCP stack, we discussed the flags available to alter the default behavior of TCP sockets. The tcp_nodelay directive helps in improving latency. The tcp_nopush directive can help in efficiently delivering the content. Both these flags lead to improved response time. In the last part of the article, we applied all the changes to our server and then did performance tests to determine the effectiveness of the changes done. In the next article, we will try to configure buffers, timeouts, and compression to improve the utilization of the available network. Resources for Article: Further resources on this subject: Using Nginx as a Reverse Proxy [article] Nginx proxy module [article] Introduction to nginx [article]
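As an editorial aside, the tcp_nodelay and tcp_nopush directives discussed above map directly onto per-socket TCP options, and toggling them by hand can make it clearer what NGINX does on our behalf. The following Python sketch is illustrative only and is not taken from the article; TCP_CORK is Linux-specific (FreeBSD and Mac OS expose TCP_NOPUSH instead, as noted earlier):

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Disable Nagle's algorithm so small writes are sent immediately (tcp_nodelay on).
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
print("TCP_NODELAY:", sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))

# "Cork" the socket so the kernel coalesces writes into full packets (tcp_nopush on),
# then remove the cork to flush any remaining partial packet.
if hasattr(socket, "TCP_CORK"):
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
    # ... the response body would be written here ...
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)

sock.close()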
Understanding mutability and immutability in Python, C#, and JavaScript

In this article by Gastón C. Hillar, author of the book, Learning Object-Oriented Programming, you will learn the concept of mutability and immutability in the programming languages such as, Python, C#, and JavaScript. What’s the difference? By default, any instance field or attribute works like a variable; therefore we can change their values. When we create an instance of a class that defines many public instance fields, we are creating a mutable object, that is, an object that can change its state. For example, let's think about a class named MutableVector3D that represents a mutable 3D vector with three public instance fields: X, Y, and Z. We can create a new MutableVector3D instance and initialize the X, Y, and Z attributes. Then, we can call the Sum method with their delta values for X, Y, and Z as arguments. The delta values specify the difference between the existing value and the new or desired value. So, for example, if we specify a positive value of 20 in the deltaX parameter, it means that we want to add 20 to the X value. The following lines show pseudocode in a neutral programming language that create a new MutableVector3D instance called myMutableVector, initialized with values for the X, Y, and Z fields. Then, the code calls the Sum method with the delta values for X, Y, and Z as arguments, as shown in the following code: myMutableVector = new MutableVector3D instance with X = 30, Y = 50 and Z = 70 myMutableVector.Sum(deltaX: 20, deltaY: 30, deltaZ: 15) The initial values for the myMutableVector field are 30 for X, 50 for Y, and 70 for Z. The Sum method changes the values of all the three fields; therefore, the object state mutates as follows: myMutableVector.X mutates from 30 to 30 + 20 = 50 myMutableVector.Y mutates from 50 to 50 + 30 = 80 myMutableVector.Z mutates from 70 to 70 + 15 = 85 The values for the myMutableVector field after the call to the Sum method are 50 for X, 80 for Y, and 85 for Z. We can say this method mutated the object's state; therefore, myMutableVector is a mutable object: an instance of a mutable class. Mutability is very important in object-oriented programming. In fact, whenever we expose fields and/or properties, we will create a class that will generate mutable instances. However, sometimes a mutable object can become a problem. In certain situations, we want to avoid objects to change their state. For example, when we work with a concurrent code, an object that cannot change its state solves many concurrency problems and avoids potential bugs. For example, we can create an immutable version of the previous MutableVector3D class to represent an immutable 3D vector. The new ImmutableVector3D class has three read-only properties: X, Y, and Z. Thus, there are only three getter methods without setter methods, and we cannot change the values of the underlying internal fields: m_X, m_Y, and m_Z. We can create a new ImmutableVector3D instance and initialize the underlying internal fields: m_X, m_Y, and m_Z. X, Y, and Z attributes. Then, we can call the Sum method with the delta values for X, Y, and Z as arguments. The following lines show the pseudocode in a neutral programming language that create a new ImmutableVector3D instance called myImmutableVector, which is initialized with values for X, Y, and Z as arguments. 
Then, the pseudocode calls the Sum method with the delta values for X, Y, and Z as arguments: myImmutableVector = new ImmutableVector3D instance with X = 30, Y = 50 and Z = 70 myImmutableSumVector = myImmutableVector.Sum(deltaX: 20, deltaY: 30, deltaZ: 15) However, this time the Sum method returns a new instance of the ImmutableVector3D class with the X, Y, and Z values initialized to the sum of X, Y, and Z and the delta values for X, Y, and Z. So, myImmutableSumVector is a new ImmutableVector3D instance initialized with X = 50, Y = 80, and Z = 85. The call to the Sum method generated a new instance and didn't mutate the existing object. The immutable version adds an overhead as compared with the mutable version because it's necessary to create a new instance of a class as a result of calling the Sum method. The mutable version just changed the values for the attributes and it wasn't necessary to generate a new instance. Obviously, the immutable version has a memory and a performance overhead. However, when we work with the concurrent code, it makes sense to pay for the extra overhead to avoid potential issues caused by mutable objects. Using methods to add behaviors to classes in Python So far, we have added instance methods to classes and used getter and setter methods combined with decorators to define properties. Now, we want to generate a class to represent the mutable version of a 3D vector. We will use properties with simple getter and setter methods for x, y, and z. The sum public instance method receives the delta values for x, y, and z and mutates an object, that is, the setter method changes the values of x, y, and z. Here is the initial code of the MutableVector3D class: class MutableVector3D: def __init__(self, x, y, z): self.__x = x self.__y = y self.__z = z def sum(self, delta_x, delta_y, delta_z): self.__x += delta_x self.__y += delta_y self.__z += delta_z @property def x(self): return self.__x @x.setter def x(self, x): self.__x = x @property def y(self): return self.__y @y.setter def y(self, y): self.__y = y @property def z(self): return self.__z @z.setter def z(self, z): self.__z = z It's a very common requirement to generate a 3D vector with all the values initialized to 0, that is, x = 0, y = 0, and z = 0. A 3D vector with these values is known as an origin vector. We can add a class method to the MutableVector3D class named origin_vector to generate a new instance of the class initialized with all the values initialized to 0. It's necessary to add the @classmethod decorator before the class method header. Instead of receiving self as the first argument, a class method receives the current class; the parameter name is usually named cls. The following code defines the origin_vector class method: @classmethod def origin_vector(cls): return cls(0, 0, 0) The preceding method returns a new instance of the current class (cls) with 0 as the initial value for the three elements. The class method receives cls as the only argument; therefore, it will be a parameterless method when we call it because Python includes a class as a parameter under the hood. The following command calls the origin_vector class method to generate a 3D vector, calls the sum method for the generated instance, and prints the values for the three elements: mutableVector3D = MutableVector3D.origin_vector() mutableVector3D.sum(5, 10, 15) print(mutableVector3D.x, mutableVector3D.y, mutableVector3D.z) Now, we want to generate a class to represent the immutable version of a 3D vector. 
In this case, we will use read-only properties for x, y, and z. The sum public instance method receives the delta values for x, y, and z and returns a new instance of the same class with the values of x, y, and z initialized with the results of the sum. Here is the code of the ImmutableVector3D class: class ImmutableVector3D: def __init__(self, x, y, z): self.__x = x self.__y = y self.__z = z def sum(self, delta_x, delta_y, delta_z): return type(self)(self.__x + delta_x, self.__y + delta_y, self.__z + delta_z) @property def x(self): return self.__x @property def y(self): return self.__y @property def z(self): return self.__z @classmethod def equal_elements_vector(cls, initial_value): return cls(initial_value, initial_value, initial_value) @classmethod def origin_vector(cls): return cls.equal_elements_vector(0) Note that the sum method uses type(self) to generate and return a new instance of the current class. In this case, the origin_vector class method returns the results of calling the equal_elements_vector class method with 0 as an argument. Remember that the cls argument refers to the actual class. The equal_elements_vector class method receives an initial_value argument for all the elements of the 3D vector, creates an instance of the actual class, and initializes all the elements with the received unique value. The origin_vector class method demonstrates how we can call another class method in a class method. The following command calls the origin_vector class method to generate a 3D vector, calls the sum method for the generated instance, and prints the values for the three elements of the new instance returned by the sum method: vector0 = ImmutableVector3D.origin_vector() vector1 = vector0.sum(5, 10, 15) print(vector1.x, vector1.y, vector1.z) As explained previously, we can change the values of the private attributes; therefore, the ImmutableVector3D class isn't 100 percent immutable. However, we are all adults and don't expect the users of a class with read-only properties to change the values of private attributes hidden under difficult to access names. Using methods to add behaviors to classes in C# So far, we have added instance methods to a class in C#, and used getter and setter methods to define properties. Now, we want to generate a class to represent the mutable version of a 3D vector in C#. We will use auto-implemented properties for X, Y, and Z. The public Sum instance method receives the delta values for X, Y, and Z (deltaX, deltaY, and deltaZ) and mutates the object, that is, the method changes the values of X, Y, and Z. The following shows the initial code of the MutableVector3D class: class MutableVector3D { public double X { get; set; } public double Y { get; set; } public double Z { get; set; } public void Sum(double deltaX, double deltaY, double deltaZ) { this.X += deltaX; this.Y += deltaY; this.Z += deltaZ; } public MutableVector3D(double x, double y, double z) { this.X = x; this.Y = y; this.Z = z; } } It's a very common requirement to generate a 3D vector with all the values initialized to 0, that is, X = 0, Y = 0, and Z = 0. A 3D vector with these values is known as an origin vector. We can add a class method to the MutableVector3D class named OriginVector to generate a new instance of the class initialized with all the values initialized to 0. Class methods are also known as static methods in C#. It's necessary to add the static keyword after the public access modifier before the class method name. 
The following commands define the OriginVector static method: public static MutableVector3D OriginVector() { return new MutableVector3D(0, 0, 0); } The preceding method returns a new instance of the MutableVector3D class with 0 as the initial value for all the three elements. The following code calls the OriginVector static method to generate a 3D vector, calls the Sum method for the generated instance, and prints the values for all the three elements on the console output: var mutableVector3D = MutableVector3D.OriginVector(); mutableVector3D.Sum(5, 10, 15); Console.WriteLine(mutableVector3D.X, mutableVector3D.Y, mutableVector3D.Z) Now, we want to generate a class to represent the immutable version of a 3D vector. In this case, we will use read-only properties for X, Y, and Z. We will use auto-generated properties with private set. The Sum public instance method receives the delta values for X, Y, and Z (deltaX, deltaY, and deltaZ) and returns a new instance of the same class with the values of X, Y, and Z initialized with the results of the sum. The code for the ImmutableVector3D class is as follows: class ImmutableVector3D { public double X { get; private set; } public double Y { get; private set; } public double Z { get; private set; } public ImmutableVector3D Sum(double deltaX, double deltaY, double deltaZ) { return new ImmutableVector3D ( this.X + deltaX, this.Y + deltaY, this.Z + deltaZ); } public ImmutableVector3D(double x, double y, double z) { this.X = x; this.Y = y; this.Z = z; } public static ImmutableVector3D EqualElementsVector(double initialValue) { return new ImmutableVector3D(initialValue, initialValue, initialValue); } public static ImmutableVector3D OriginVector() { return ImmutableVector3D.EqualElementsVector(0); } } In the new class, the Sum method returns a new instance of the ImmutableVector3D class, that is, the current class. In this case, the OriginVector static method returns the results of calling the EqualElementsVector static method with 0 as an argument. The EqualElementsVector class method receives an initialValue argument for all the elements of the 3D vector, creates an instance of the actual class, and initializes all the elements with the received unique value. The OriginVector static method demonstrates how we can call another static method in a static method. The following code calls the OriginVector static method to generate a 3D vector, calls the Sum method for the generated instance, and prints all the values for the three elements of the new instance returned by the Sum method on the console output: var vector0 = ImmutableVector3D.OriginVector(); var vector1 = vector0.Sum(5, 10, 15); Console.WriteLine(vector1.X, vector1.Y, vector1.Z); C# doesn't allow users of the ImmutableVector3D class to change the values of X, Y, and Z properties. The code doesn't compile if you try to assign a new value to any of these properties. Thus, we can say that the ImmutableVector3D class is 100 percent immutable. Using methods to add behaviors to constructor functions in JavaScript So far, we have added methods to a constructor function that produced instance methods in a generated object. In addition, we used getter and setter methods combined with local variables to define properties. Now, we want to generate a constructor function to represent the mutable version of a 3D vector. We will use properties with simple getter and setter methods for x, y, and z. 
The sum public instance method receives the delta values for x, y, and z and mutates an object, that is, the method changes the values of x, y, and z. The following code shows the initial code of the MutableVector3D constructor function: function MutableVector3D(x, y, z) { var _x = x; var _y = y; var _z = z; Object.defineProperty(this, 'x', { get: function(){ return _x; }, set: function(val){ _x = val; } }); Object.defineProperty(this, 'y', { get: function(){ return _y; }, set: function(val){ _y = val; } }); Object.defineProperty(this, 'z', { get: function(){ return _z; }, set: function(val){ _z = val; } }); this.sum = function(deltaX, deltaY, deltaZ) { _x += deltaX; _y += deltaY; _z += deltaZ; } } It's a very common requirement to generate a 3D vector with all the values initialized to 0, that is, x = 0, y = 0, and, z = 0. A 3D vector with these values is known as an origin vector. We can add a function to the MutableVector3D constructor function named originVector to generate a new instance of a class with all the values initialized to 0. The following code defines the originVector function: MutableVector3D.originVector = function() { return new MutableVector3D(0, 0, 0); }; The method returns a new instance built in the MutableVector3D constructor function with 0 as the initial value for all the three elements. The following code calls the originVector function to generate a 3D vector, calls the sum method for the generated instance, and prints all the values for all the three elements: var mutableVector3D = MutableVector3D.originVector(); mutableVector3D.sum(5, 10, 15); console.log(mutableVector3D.x, mutableVector3D.y, mutableVector3D.z); Now, we want to generate a constructor function to represent the immutable version of a 3D vector. In this case, we will use read-only properties for x, y, and z. In this case, we will use the ImmutableVector3D.prototype property to define the sum method. The method receives the values of delta for x, y, and z, and returns a new instance with the values of x, y, and z initialized with the results of the sum. The following code shows the ImmutableVector3D constructor function and the additional code that defines all the other methods: function ImmutableVector3D(x, y, z) { var _x = x; var _y = y; var _z = z; Object.defineProperty(this, 'x', { get: function(){ return _x; } }); Object.defineProperty(this, 'y', { get: function(){ return _y; } }); Object.defineProperty(this, 'z', { get: function(){ return _z; } }); } ImmutableVector3D.prototype.sum = function(deltaX, deltaY, deltaZ) { return new ImmutableVector3D( this.x + deltaX, this.y + deltaY, this.z + deltaZ); }; ImmutableVector3D.equalElementsVector = function(initialValue) { return new ImmutableVector3D(initialValue, initialValue, initialValue); }; ImmutableVector3D.originVector = function() { return ImmutableVector3D.equalElementsVector(0); }; Again, note that the preceding code defines the sum method in the ImmutableVector3D.prototype method. This method will be available to all the instances generated in the ImmutableVector3D constructor function. The sum method generates and returns a new instance of ImmutableVector3D. In this case, the originVector method returns the results of calling the equalElementsVector method with 0 as an argument. The equalElementsVector method receives an initialValue argument for all the elements of the 3D vector, creates an instance of the actual class, and initializes all the elements with the received unique value. 
The originVector method demonstrates how we can call another function defined in the constructor function. The following code calls the originVector method to generate a 3D vector, calls the sum method for the generated instance, and prints the values for all the three elements of the new instance returned by the sum method: var vector0 = ImmutableVector3D.originVector(); var vector1 = vector0.sum(5, 10, 15); console.log(vector1.x, vector1.y, vector1.z); Summary In this article, you learned the concept of mutability and immutability in the programming languages such as, Python, C#, and JavaScript and used methods to add behaviors in each of the programming language. So, what’s next? Continue Learning Object-Oriented Programming with Gastón’s book – find out more here.
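As a closing illustration (an editorial addition that assumes the Python MutableVector3D and ImmutableVector3D classes defined earlier in this article are in scope), the following sketch shows the practical consequence of the two designs: a mutable instance shared through two references changes for both of them, while the immutable variant leaves existing references untouched:

# Mutable: both names refer to the same object, so the mutation is visible through both.
shared = MutableVector3D(30, 50, 70)
alias = shared
shared.sum(20, 30, 15)
print(alias.x, alias.y, alias.z)           # 50 80 85 - the alias sees the change

# Immutable: sum() returns a new instance; the original keeps its state.
original = ImmutableVector3D(30, 50, 70)
moved = original.sum(20, 30, 15)
print(original.x, original.y, original.z)  # 30 50 70 - unchanged
print(moved.x, moved.y, moved.z)           # 50 80 85 - a brand new object

This aliasing behavior is exactly why the immutable variant is attractive in concurrent code, at the cost of allocating a new instance per operation.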
Architecting and coding high performance .NET applications

In this article by Antonio Esposito, author of Learning .NET High Performance Programming, we will learn about low-pass audio filtering implemented using .NET, and also learn about MVVM and XAML. Model-View-ViewModel and XAML The MVVM pattern is another descendant of the MVC pattern. Born from an extensive update to the MVP pattern, it is at the base of all eXtensible Application Markup Language (XAML) language-based frameworks, such as Windows presentation foundation (WPF), Silverlight, Windows Phone applications, and Store Apps (formerly known as Metro-style apps). MVVM is different from MVC, which is used by Microsoft in its main web development framework in that it is used for desktop or device class applications. The first and (still) the most powerful application framework using MVVM in Microsoft is WPF, a desktop class framework that can use the full .NET 4.5.3 environment. Future versions within Visual Studio 2015 will support built-in .NET 4.6. On the other hand, all other frameworks by Microsoft that use the XAML language supporting MVVM patterns are based on a smaller edition of .NET. This happens with Silverlight, Windows Store Apps, Universal Apps, or Windows Phone Apps. This is why Microsoft made the Portable Library project within Visual Studio, which allows us to create shared code bases compatible with all frameworks. While a Controller in MVC pattern is sort of a router for requests to catch any request and parsing input/output Models, the MVVM lies behind any View with a full two-way data binding that is always linked to a View's controls and together at Model's properties. Actually, multiple ViewModels may run the same View and many Views can use the same single/multiple instance of a given ViewModel. A simple MVC/MVVM design comparative We could assert that the experience offered by MVVM is like a film, while the experience offered by MVC is like photography, because while a Controller always makes one-shot elaborations regarding the application user requests in MVC, in MVVM, the ViewModel is definitely the view! Not only does a ViewModel lie behind a View, but we could also say that if a VM is a body, then a View is its dress. While the concrete View is the graphical representation, the ViewModel is the virtual view, the un-concrete view, but still the View. In MVC, the View contains the user state (the value of all items showed in the UI) until a GET/POST invocation is sent to the web server. Once sent, in the MVC framework, the View simply binds one-way reading data from a Model. In MVVM, behaviors, interaction logic, and user state actually live within the ViewModel. Moreover, it is again in the ViewModel that any access to the underlying Model, domain, and any persistence provider actually flows. Between a ViewModel and View, a data connection called data binding is established. This is a declarative association between a source and target property, such as Person.Name with TextBox.Text. Although it is possible to configure data binding by imperative coding (while declarative means decorating or setting the property association in XAML), in frameworks such as WPF and other XAML-based frameworks, this is usually avoided because of the more decoupled result made by the declarative choice. The most powerful technology feature provided by any XAML-based language is actually the data binding, other than the simpler one that was available in Windows Forms. XAML allows one-way binding (also reverted to the source) and two-way binding. 
Such data binding supports any source or target as a property from a Model or ViewModel or any other control's dependency property. This binding subsystem is so powerful in XAML-based languages that events are handled in specific objects named Command, and this can be data-bound to specific controls, such as buttons. In the .NET framework, an event is an implementation of the Observer pattern that lies within a delegate object, allowing a 1-N association between the only source of the event (the owner of the event) and more observers that can handle the event with some specific code. The only object that can raise the event is the owner itself. In XAML-based languages, a Command is an object that targets a specific event (in the meaning of something that can happen) that can be bound to different controls/classes, and all of those can register handlers or raise the signaling of all handlers. An MVVM performance map analysis Performance concerns Regarding performance, MVVM behaves very well in several scenarios in terms of data retrieval (latency-driven) and data entry (throughput- and scalability-driven). The ability to have an impressive abstraction of the view in the VM without having to rely on the pipelines of MVC (the actions) makes the programming very pleasurable and give the developer the choice to use different designs and optimization techniques. Data binding itself is done by implementing specific .NET interfaces that can be easily centralized. Talking about latency, it is slightly different from previous examples based on web request-response time, unavailable in MVVM. Theoretically speaking, in the design pattern of MVVM, there is no latency at all. In a concrete implementation within XAML-based languages, latency can refer to two different kinds of timings. During data binding, latency is the time between when a VM makes new data available and a View actually renders it. Instead, during a command execution, latency is the time between when a command is invoked and all relative handlers complete their execution. We usually use the first definition until differently specified. Although the nominal latency is near zero (some milliseconds because of the dictionary-based configuration of data binding), specific implementation concerns about latency actually exist. In any Model or ViewModel, an updated data notification is made by triggering the View with the INotifyPropertyChanged interface. The .NET interface causes the View to read the underlying data again. Because all notifications are made by a single .NET event, this can easily become a bottleneck because of the serialized approach used by any delegate or event handlers in the .NET world. On the contrary, when dealing with data that flows from the View to the Model, such an inverse binding is usually configured declaratively within the {Binding …} keyword, which supports specifying binding directions and trigger timing (to choose from the control's lost focus CLR event or anytime the property value changes). The XAML data binding does not add any measurable time during its execution. Although this, as said, such binding may link multiple properties or the control's dependency properties together. Linking this interaction logic could increase latency time heavily, adding some annoying delay at the View level. One fact upon all, is the added latency by any validation logic. It is even worse if such validation is other than formal, such as validating some ID or CODE against a database value. 
Talking about scalability, MVVM patterns does some work here, while we can make some concrete analysis concerning the XAML implementation. It is easy to say that scaling out is impossible because MVVM is a desktop class layered architecture that cannot scale. Instead, we can say that in a multiuser scenario with multiple client systems connected in a 2-tier or 3-tier system architecture, simple MVVM and XAML-based frameworks will never act as bottlenecks. The ability to use the full .NET stack in WPF gives us the chance to use all synchronization techniques available, in order to use a directly connected DBMS or middleware tier. Instead of scaling up by moving the application to an increased CPU clock system, the XAML-based application would benefit more from an increased CPU core count system. Obviously, to profit from many CPU cores, mastering parallel techniques is mandatory. About the resource usage, MVVM-powered architectures require only a simple POCO class as a Model and ViewModel. The only additional requirement is the implementation of the INotifyPropertyChanged interface that costs next to nothing. Talking about the pattern, unlike MVC, which has a specific elaboration workflow, MVVM does not offer this functionality. Multiple commands with multiple logic can process their respective logic (together with asynchronous invocation) with the local VM data or by going down to the persistence layer to grab missing information. We have all the choices here. Although MVVM does not cost anything in terms of graphical rendering, XAML-based frameworks make massive use of hardware-accelerated user controls. Talking about an extreme choice, Windows Forms with Graphics Device Interface (GDI)-based rendering require a lot less resources and can give a higher frame rate on highly updatable data. Thus, if a very high FPS is needed, the choice of still rendering a WPF area in GDI is available. For other XAML languages, the choice is not so easy to obtain. Obviously, this does not mean that XAML is slow in rendering with its DirectX based engine. Simply consider that WPF animations need a good Graphics Processing Unit (GPU), while a basic GDI animation will execute on any system, although it is obsolete. Talking about availability, MVVM-based architectures usually lead programmers to good programming. As MVC allows it, MVVM designs can be tested because of the great modularity. While a Controller uses a pipelined workflow to process any requests, a ViewModel is more flexible and can be tested with multiple initialization conditions. This makes it more powerful but also less predictable than a Controller, and hence is tricky to use. In terms of design, the Controller acts as a transaction script, while the ViewModel acts in a more realistic, object-oriented approach. Finally, yet importantly, throughput and efficiency are simply unaffected by MVVM-based architectures. However, because of the flexibility the solution gives to the developer, any interaction and business logic design may be used inside a ViewModel and their underlying Models. Therefore, any success or failure regarding those performance aspects are usually related to programmer work. In XAML frameworks, throughput is achieved by an intensive use of asynchronous and parallel programming assisted by a built-in thread synchronization subsystem, based on the Dispatcher class that deals with UI updates. Low-pass filtering for Audio Low-pass filtering has been available since 2008 in the native .NET code. 
NAudio is a powerful library helping any CLR programmer to create, manipulate, or analyze audio data in any format. Available through NuGet Package Manager, NAudio offers a simple and .NET-like programming framework, with specific classes and stream-reader for audio data files. Let's see how to apply the low-pass digital filter in a real audio uncompressed file in WAVE format. For this test, we will use the Windows start-up default sound file. The chart is still made in a legacy Windows Forms application with an empty Form1 file, as shown in the previous example: private async void Form1_Load(object sender, EventArgs e) {    //stereo wave file channels    var channels = await Task.Factory.StartNew(() =>        {            //the wave stream-like reader            using (var reader = new WaveFileReader("startup.wav"))            {                var leftChannel = new List<float>();              var rightChannel = new List<float>();                  //let's read all frames as normalized floats                while (reader.Position < reader.Length)                {                    var frame = reader.ReadNextSampleFrame();                   leftChannel.Add(frame[0]);                    rightChannel.Add(frame[1]);                }                  return new                {                    Left = leftChannel.ToArray(),                    Right = rightChannel.ToArray(),                };            }        });      //make a low-pass digital filter on floating point data    //at 200hz    var leftLowpassTask = Task.Factory.StartNew(() => LowPass(channels.Left, 200).ToArray());    var rightLowpassTask = Task.Factory.StartNew(() => LowPass(channels.Right, 200).ToArray());      //this let the two tasks work together in task-parallelism    var leftChannelLP = await leftLowpassTask;    var rightChannelLP = await rightLowpassTask;      //create and databind a chart    var chart1 = CreateChart();      chart1.DataSource = Enumerable.Range(0, channels.Left.Length).Select(i => new        {            Index = i,            Left = channels.Left[i],            Right = channels.Right[i],            LeftLP = leftChannelLP[i],            RightLP = rightChannelLP[i],        }).ToArray();      chart1.DataBind();      //add the chart to the form    this.Controls.Add(chart1); }   private static Chart CreateChart() {    //creates a chart    //namespace System.Windows.Forms.DataVisualization.Charting      var chart1 = new Chart();      //shows chart in fullscreen    chart1.Dock = DockStyle.Fill;      //create a default area    chart1.ChartAreas.Add(new ChartArea());      //left and right channel series    chart1.Series.Add(new Series    {        XValueMember = "Index",        XValueType = ChartValueType.Auto,        YValueMembers = "Left",        ChartType = SeriesChartType.Line,    });    chart1.Series.Add(new Series    {        XValueMember = "Index",        XValueType = ChartValueType.Auto,        YValueMembers = "Right",        ChartType = SeriesChartType.Line,    });      //left and right channel low-pass (bass) series    chart1.Series.Add(new Series    {        XValueMember = "Index",        XValueType = ChartValueType.Auto,        YValueMembers = "LeftLP",        ChartType = SeriesChartType.Line,        BorderWidth = 2,    });    chart1.Series.Add(new Series    {        XValueMember = "Index",        XValueType = ChartValueType.Auto,        YValueMembers = "RightLP",        ChartType = SeriesChartType.Line,        BorderWidth = 2,    });      return chart1; } Let's see the graphical result: The Windows 
start-up sound waveform. In bold, the bass waveform with a low-pass filter at 200 Hz.

The usage of parallelism in elaborations such as this is mandatory. Audio elaboration is a canonical example of engineering data computation because it works on a huge dataset of floating-point values. A simple file, such as the preceding one that contains less than 2 seconds of audio sampled at (only) 22,050 Hz, produces an array of more than 40,000 floating-point values per channel (stereo = 2 channels). Just to get an idea of how hard processing audio files is, note that an uncompressed CD-quality song of 4 minutes, sampled at 44,100 samples per second * 60 (seconds) * 4 (minutes), will create an array of more than 10 million floating-point items per channel.

Because of the intrinsic logic of the FFT, any low-pass filtering run must run in a single thread. This means that the only optimization we can apply when running FFT-based low-pass filtering is parallelizing on a per-channel basis. For most cases, this choice can only bring a 2X throughput improvement, regardless of the processor count of the underlying system.

Summary

In this article, we were introduced to the applications of high-performance .NET programming. We learned how MVVM and XAML play their roles in .NET to create applications for various platforms, and we also learned about their performance characteristics. Next, we learned how high-performance .NET applies to engineering problems through a practical example of low-pass audio filtering, which showed how versatile it is to apply high-performance programming to specific engineering applications.

Resources for Article:

Further resources on this subject: Windows Phone 8 Applications [article] Core .NET Recipes [article] Parallel Programming Patterns [article]
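The NAudio example above is C#-specific, but the per-channel task parallelism it demonstrates is language-neutral. The short Python sketch below is an editorial illustration of the same structure and is not part of the original article; it uses a simple moving-average filter as a stand-in for the FFT-based low-pass, and the synthetic sample data, window size, and helper names are assumptions:

import math
from concurrent.futures import ThreadPoolExecutor

def moving_average(samples, window=32):
    # A crude low-pass stand-in: average each sample with its recent predecessors.
    out, acc = [], 0.0
    for i, s in enumerate(samples):
        acc += s
        if i >= window:
            acc -= samples[i - window]
        out.append(acc / min(i + 1, window))
    return out

# Two synthetic channels: a low-frequency tone plus some high-frequency content.
n = 22050
left = [math.sin(2 * math.pi * 100 * t / n) + 0.3 * math.sin(2 * math.pi * 5000 * t / n) for t in range(n)]
right = [math.sin(2 * math.pi * 120 * t / n) + 0.3 * math.sin(2 * math.pi * 4000 * t / n) for t in range(n)]

# One task per channel, mirroring the two Task.Factory.StartNew calls in the C# sample.
with ThreadPoolExecutor(max_workers=2) as pool:
    left_lp, right_lp = pool.map(moving_average, [left, right])

print(len(left_lp), len(right_lp))

As in the C# version, the filter itself stays single-threaded and the only parallelism exploited is one task per channel, which caps the theoretical gain at roughly 2X for stereo material.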
The Essentials of Working with Python Collections

In this article by Steven F. Lott, the author of the book Python Essentials, we'll look at the break and continue statements; these modify a for or while loop to allow skipping items or exiting before the loop has processed all items. This is a fundamental change in the semantics of a collection-processing statement.

(For more resources related to this topic, see here.)

Processing collections with the for statement

The for statement is an extremely versatile way to process every item in a collection. We do this by defining a target variable, a source of items, and a suite of statements. The for statement will iterate through the source of items, assigning each item to the target variable, and also execute the suite of statements. All of the collections in Python provide the necessary methods, which means that we can use anything as the source of items in a for statement.

Here's some sample data that we'll work with. This is part of Mike Keith's poem, Near a Raven. We'll remove the punctuation to make the text easier to work with:

>>> text = '''Poe, E.
...     Near a Raven
...
... Midnights so dreary, tired and weary.'''
>>> text = text.replace(",","").replace(".","").lower()

This will put the original text, with uppercase and lowercase and punctuation, into the text variable. When we use text.split(), we get a sequence of individual words. The for loop can iterate through this sequence of words so that we can process each one. The syntax looks like this:

>>> cadaeic= {}
>>> for word in text.split():
...     cadaeic[word]= len(word)

We've created an empty dictionary, and assigned it to the cadaeic variable. The expression in the for loop, text.split(), will create a sequence of substrings. Each of these substrings will be assigned to the word variable. The for loop body, a single assignment statement, will be executed once for each value assigned to word. The resulting dictionary might look like this (irrespective of ordering):

{'raven': 5, 'midnights': 9, 'dreary': 6, 'e': 1, 'weary': 5, 'near': 4, 'a': 1, 'poe': 3, 'and': 3, 'so': 2, 'tired': 5}

There's no guaranteed order for mappings or sets. Your results may differ slightly.

In addition to iterating over a sequence, we can also iterate over the keys in a dictionary:

>>> for word in sorted(cadaeic):
...   print(word, cadaeic[word])

When we use sorted() on a tuple or a list, an interim list is created with sorted items. When we apply sorted() to a mapping, the sorting applies to the keys of the mapping, creating a sequence of sorted keys. This loop will print a list in alphabetical order of the various pilish words used in this poem. Pilish is a subset of English where the word lengths are important: they're used as mnemonic aids.

A for statement corresponds to the "for all" logical quantifier, ∀. At the end of a simple for loop we can assert that all items in the source collection have been processed. In order to build the "there exists" quantifier, ∃, we can either use the while statement, or the break statement inside the body of a for statement.

Using literal lists in a for statement

We can apply the for statement to a sequence of literal values. One of the most common ways to present literals is as a tuple. It might look like this:

for scheme in 'http', 'https', 'ftp':
   do_something(scheme)

This will assign three different values to the scheme variable. For each of those values, it will evaluate the do_something() function. From this, we can see that, strictly speaking, the () are not required to delimit a tuple object.
If the sequence of values grows, however, and we need to span more than one physical line, we'll want to add the (), making the tuple literal more explicit.

Using the range() and enumerate() functions

The range() object will provide a sequence of numbers, and it is often used in a for loop. The range() object is iterable, but it's not itself a sequence object. It's a generator, which will produce items when required. If we use range() outside a for statement, we need to use a function like list(range(x)) or tuple(range(a,b)) to consume all of the generated values and create a new sequence object. The range() object has three commonly used forms:

range(n) produces ascending numbers including 0 but not including n itself. This is a half-open interval. We could say that range(n) produces numbers, x, such that 0 ≤ x < n. The expression list(range(5)) returns [0, 1, 2, 3, 4]. This produces n values including 0 and n - 1.

range(a,b) produces ascending numbers starting from a but not including b. The expression tuple(range(-1,3)) will return (-1, 0, 1, 2). This produces b - a values including a and b - 1.

range(x,y,z) produces ascending numbers in the sequence x, x+z, x+2z, ..., stopping before y. This produces (y-x)//z values.

We can use the range() object like this:

for n in range(1, 21):
   status= str(n)
   if n % 5 == 0: status += " fizz"
   if n % 7 == 0: status += " buzz"
   print(status)

In this example, we've used a range() object to produce values, n, such that 1 ≤ n < 21.

We use the range() object to generate the index values for all items in a list:

for n in range(len(some_list)):
   print(n, some_list[n])

We've used the range() function to generate values between 0 and the length of the sequence object named some_list.

The for statement allows multiple target variables. The rules for multiple target variables are the same as for a multiple variable assignment statement: a sequence object will be decomposed and items assigned to each variable. Because of that, we can leverage the enumerate() function to iterate through a sequence and assign the index values at the same time. It looks like this:

for n, v in enumerate(some_list):
     print(n, v)

The enumerate() function is a generator function which iterates through the items in the source sequence and yields a sequence of two-tuple pairs with the index and the item. Since we've provided two variables, the two-tuple is decomposed and assigned to each variable. There are numerous use cases for this multiple-assignment for loop. We often have list-of-tuples data structures that can be handled very neatly with this multiple-assignment feature.

Iterating with the while statement

The while statement is a more general iteration than the for statement. We'll use a while loop in two situations. We'll use it in cases where we don't have a finite collection to impose an upper bound on the loop's iteration; we may suggest an upper bound in the while clause itself. We'll also use it when writing a "search" or "there exists" kind of loop; we aren't processing all items in a collection.

A desktop application that accepts input from a user, for example, will often have a while loop. The application runs until the user decides to quit; there's no upper bound on the number of user interactions. For this, we generally use a while True: loop; an explicitly infinite iteration is the recommended idiom here.
If we want to write a character-mode user interface, we could do it like this: quit_received= False while not quit_received:    command= input("prompt> ")    quit_received= process(command) This will iterate until the quit_received variable is set to True. This will process indefinitely; there's no upper boundary on the number of iterations. This process() function might use some kind of command processing. This should include a statement like this: if command.lower().startswith("quit"): return True When the user enters "quit", the process() function will return True. This will be assigned to the quit_received variable. The while expression, not quit_received, will become False, and the loop ends. A "there exists" loop will iterate through a collection, stopping at the first item that meets certain criteria. This can look complex because we're forced to make two details of loop processing explicit. Here's an example of searching for the first value that meets a condition. This example assumes that we have a function, condition(), which will eventually be True for some number. Here's how we can use a while statement to locate the minimum for which this function is True: >>> n = 1 >>> while n != 101 and not condition(n): ...     n += 1 >>> assert n == 101 or condition(n) The while statement will terminate when n == 101 or the condition(n) is True. If this expression is False, we can advance the n variable to the next value in the sequence of values. Since we're iterating through the values in order from the smallest to the largest, we know that n will be the smallest value for which the condition() function is true. At the end of the while statement we have included a formal assertion that either n is 101 or the condition() function is True for the given value of n. Writing an assertion like this can help in design as well as debugging because it will often summarize the loop invariant condition. We can also write this kind of loop using the break statement in a for loop, something we'll look at in the next section. The continue and break statements The continue statement is helpful for skipping items without writing deeply-nested if statements. The effect of executing a continue statement is to skip the rest of the loop's suite. In a for loop, this means that the next item will be taken from the source iterable. In a while loop, this must be used carefully to avoid an otherwise infinite iteration. We might see file processing that looks like this: for line in some_file:    clean = line.strip()    if len(clean) == 0:        continue    data, _, _ = clean.partition("#")    data = data.rstrip()    if len(data) == 0:        continue    process(data) In this loop, we're relying on the way files act like sequences of individual lines. For each line in the file, we've stripped whitespace from the input line, and assigned the resulting string to the clean variable. If the length of this string is zero, the line was entirely whitespace, and we'll continue the loop with the next line. The continue statement skips the remaining statements in the body of the loop. We'll partition the line into three pieces: a portion in front of any "#", the "#" (if present), and the portion after any "#". We've assigned the "#" character and any text after the "#" character to the same easily-ignored variable, _, because we don't have any use for these two results of the partition() method. We can then strip any trailing whitespace from the string assigned to the data variable. 
If the resulting string has a length of zero, then the line is entirely filled with "#" and any trailing comment text. Since there's no useful data, we can continue the loop, ignoring this line of input. If the line passes the two if conditions, we can process the resulting data.

By using the continue statement, we have avoided complex-looking, deeply-nested if statements. It's important to note that a continue statement must always be part of the suite inside an if statement, inside a for or while loop. The condition on that if statement becomes a filter condition that applies to the collection of data being processed. continue always applies to the innermost loop.

Breaking early from a loop

The break statement is a profound change in the semantics of the loop. An ordinary for statement can be summarized by "for all." We can comfortably say that "for all items in a collection, the suite of statements was processed." When we use a break statement, a loop is no longer summarized by "for all." We need to change our perspective to "there exists." A break statement asserts that at least one item in the collection matches the condition that leads to the execution of the break statement. Here's a simple example of a break statement:

for n in range(1, 100):
   factors = []
   for x in range(1,n):
       if n % x == 0: factors.append(x)
   if sum(factors) == n:
       break

We've written a loop that is bound by 1 ≤ n < 100. This loop includes a break statement, so it will not process all values of n. Instead, it will determine the smallest value of n for which n is equal to the sum of its factors. Since the loop doesn't examine all values, it shows that at least one such number exists within the given range.

We've used a nested loop to determine the factors of the number n. This nested loop creates a sequence, factors, of all values of x in the range 1 ≤ x < n such that x is a factor of the number n. This inner loop doesn't have a break statement, so we are sure it examines all values in the given range. The least value for which this is true is the number six.

It's important to note that a break statement must always be part of the suite inside an if statement inside a for or while loop. If the break isn't in an if suite, the loop will always terminate while processing the first item. The condition on that if statement becomes the "there exists" condition that summarizes the loop as a whole. Clearly, multiple if statements with multiple break statements mean that the overall loop can have a potentially confusing and difficult-to-summarize post-condition.

Using the else clause on a loop

Python's else clause can be used on a for or while statement as well as on an if statement. The else clause executes after the loop body if there was no break statement executed. To see this, here's a contrived example:

>>> for item in 1,2,3:
...     print(item)
...     if item == 2:
...         print("Found",item)
...         break
... else:
...     print("Found Nothing")

The for statement here will iterate over a short list of literal values. When a specific target value has been found, a message is printed. Then, the break statement will end the loop, avoiding the else clause. When we run this, we'll see three lines of output, like this:

1
2
Found 2

The value of three isn't shown, nor is the "Found Nothing" message in the else clause. If we change the target value in the if statement from two to a value that won't be seen (for example, zero or four), then the output will change.
Breaking early from a loop

The break statement is a profound change in the semantics of the loop. An ordinary for statement can be summarized by "for all": we can comfortably say that "for all items in a collection, the suite of statements was processed." When we use a break statement, a loop is no longer summarized by "for all." We need to change our perspective to "there exists": a break statement asserts that at least one item in the collection matches the condition that leads to the execution of the break statement. Here's a simple example of a break statement:

for n in range(1, 100):
    factors = []
    for x in range(1, n):
        if n % x == 0:
            factors.append(x)
    if sum(factors) == n:
        break

We've written a loop that is bounded by 1 ≤ n < 100. This loop includes a break statement, so it will not process all values of n. Instead, it will determine the smallest value of n for which n is equal to the sum of its factors. Since the loop doesn't examine all values, it shows that at least one such number exists within the given range. We've used a nested loop to determine the factors of the number n. This nested loop creates a sequence, factors, of all values of x in the range 1 ≤ x < n such that x is a factor of the number n. This inner loop doesn't have a break statement, so we are sure it examines all values in the given range. The least value for which this is true is the number six.

It's important to note that a break statement must always be part of the suite inside an if statement inside a for or while loop. If the break isn't in an if suite, the loop will always terminate while processing the first item. The condition on that if statement becomes the "there exists" condition that summarizes the loop as a whole. Clearly, multiple if statements with multiple break statements mean that the overall loop can have a potentially confusing and difficult-to-summarize post-condition.

Using the else clause on a loop

Python's else clause can be used on a for or while statement as well as on an if statement. The else clause executes after the loop body if no break statement was executed. To see this, here's a contrived example:

>>> for item in 1, 2, 3:
...     print(item)
...     if item == 2:
...         print("Found", item)
...         break
... else:
...     print("Found Nothing")

The for statement here will iterate over a short list of literal values. When the specific target value has been found, a message is printed. Then, the break statement ends the loop, avoiding the else clause. When we run this, we'll see three lines of output, like this:

1
2
Found 2

The value of three isn't shown, nor is the "Found Nothing" message in the else clause. If we change the target value in the if statement from two to a value that won't be seen (for example, zero or four), then the output will change. If the break statement is not executed, then the else clause will be executed. The idea here is to allow us to write contrasting break and non-break suites of statements. An if statement suite that includes a break statement can do some processing in the suite before the break statement ends the loop. An else clause allows some processing at the end of the loop when none of the break-related suites was executed.

Summary

In this article, we've looked at the for statement, which is the primary way we'll process the individual items in a collection. A simple for statement assures us that our processing has been done for all items in the collection. We've also looked at the general-purpose while loop.

Resources for Article: Further resources on this subject: Introspecting Maya, Python, and PyMEL [article] Analyzing a Complex Dataset [article] Geo-Spatial Data in Python: Working with Geometry [article]


Creating an F# Project

Packt
08 Jul 2015
5 min read
In this article by Adnan Masood, author of the book, Learning F# Functional Data Structures and Algorithms, we see how to set up the IDE and create our first F# project. "Ah yes, Haskell. Where all the types are strong, all the men carry arrows, and all the children are above average." – marked trees (on the city of Haskell) The perceived adversity of functional programming is overly exaggerated; the essence of this paradigm is to explicitly recognize and enforce the referential transparency. We will see how to set up the tooling for Visual Studio 2013 and for F# 3.1, the currently available version of F# at the time of writing. We will review the F# 4.0 preview features by the end of this project. After we get the tooling sorted out, we will review some simple algorithms; starting with recursion with typical a Fibonacci sequence and Tower of Hanoi, we will perform lazy evaluation on a quick sort example. In this article, we will cover the following topics: Setting up Visual Studio and F# compiler to work together Setting up the environment and running your F# programs (For more resources related to this topic, see here.) Setting up the IDE As developers, we love our IDEs (Integrated Development Environments) and Visual Studio.NET is probably the best IDE for .NET development; no offense to Eclipse bloatware Luna. From the open source perspective, there has been a recent major development in making the .NET framework available as open source and on Mac and Linux platforms. Microsoft announced a pre-release of F# 4.0 in Visual Studio 2015 Preview and it will be available as part of the full release. To install and run F#, there are various options available for all platforms, sizes, and budgets. For those with a fear of commitments, there is the online interactive version of TryFsharp at http://www.tryfsharp.org/ where you can code in the browser. For Windows users, you have a few options. Until VS.NET 2015 comes out, you can try out the freely available Visual Studio Community 2013 or a Visual Studio 2013 trial edition, with trial being the keyword. The trial editions include Ultimate, Premium, and Professional versions. The free community edition IDE can be downloaded from https://www.visualstudio.com/en-us/news/vs2013-community-vs.aspx and the trial editions can be downloaded from http://www.visualstudio.com/downloads/download-visual-studio-vs. Alternatively, there are express editions available at no cost. Visual Studio Express 2013 for Windows Desktop Web editions can be downloaded from http://www.visualstudio.com/downloads/download-visual-studio-vs#d-express-windows-desktop. F# support is built into Visual Studio; the Visual F# tools package the latest updates to the F# compiler: interactive, runtime, and Visual Studio integration. F# support comes with Visual Studio. However, the F# team releases regular updates in the form of F# tools. The tools can be downloaded from www.microsoft.com/en-us/download/details.aspx?id=44011. The F# tools contain the F# command-line compiler (fsc.exe) and F# Interactive (fsi.exe), which are the easiest way to get started with F#. The fsi.exe compiler can be found in C:Program Files (x86)Microsoft SDKsF#<version>Framework<version>. The <version> placeholder in the preceding path is substituted by your .NET version installed. If you just want to use the F# compiler and tools from the command line, you can download the .NET framework 4.5 or above from https://www.microsoft.com/en-in/download/details.aspx?id=30653. 
You will also need the Windows SDK for associated dependencies, which can be downloaded from http://msdn.microsoft.com/windows/desktop/bg162891. Alternatively, Tsunami is a free IDE for F# that you can download from http://tsunami.io/download.html and use to build applications. CloudSharper by IntelliFactory is in beta but shows promise as a web-based IDE. For more information regarding CloudSharper, refer to http://cloudsharper.com/. In this article, we will be using Visual Studio 2013 Professional Edition and FSI (F# Interactive), but you can use either the trial or Express edition, or the FSI command line, to run the examples and exercises.

Your first F# project

Going through installation screens and showing how to click Next would be discourteous to our reader's intelligence, so we will leave step-by-step installation to other, more verbose texts. Let's start with our first F# project in Visual Studio. In the preceding screenshot, you can see the F# Interactive window at the bottom. Here we have selected FILE | New | Project because we are attempting to open a new project of the F# type. There are a few project types available, including console applications and F# library. For ease of explanation, let's begin with a Console Application as shown in the next screenshot.

Alternatively, from within Visual Studio, we can use F# Interactive (FSI), an effective tool for testing out your code quickly. You can open the FSI window by selecting View | Other Windows | F# Interactive from the Visual Studio IDE as shown in the next screenshot. FSI lets you run code from a console, which is similar to a shell. You can access the FSI executable directly from C:\Program Files (x86)\Microsoft SDKs\F#\<version>\Framework\<version>. FSI maintains session context, which means that the constructs created earlier in the FSI session are still available in later parts of the code. The FsiAnyCPU.exe executable file is the 64-bit counterpart of F# Interactive; Visual Studio determines which executable to use based on whether the machine's architecture is 32-bit or 64-bit. You can also change the F# Interactive parameters and settings from the Options dialog as shown in the following screenshot:


What is Apache Camel?

Packt
08 Jul 2015
9 min read
In this article by Jean-Baptiste Onofré, author of the book Mastering Apache Camel, we will see how Apache Camel originated in Apache ServiceMix. Apache ServiceMix 3 was powered by the Spring framework and implemented the JBI specification. The Java Business Integration (JBI) specification proposed a plug-and-play approach to integration problems. JBI was based on WebService concepts and standards; for instance, it directly reuses the Message Exchange Patterns (MEP) concept that comes from the WebService Description Language (WSDL). Camel reuses some of these concepts; for instance, you will see that we have the concept of MEP in Camel. However, JBI suffered mostly from two issues:

In JBI, all messages between endpoints are transported in the Normalized Messages Router (NMR). In the NMR, a message has a standard XML format. As all messages in the NMR have the same format, it's easy to audit messages and the format is predictable. However, the JBI XML format has an important drawback for performance: it needs to marshall and unmarshall the messages. Some protocols (such as REST or RMI) are not easy to describe in XML. For instance, REST can work in stream mode; it doesn't make sense to marshall streams in XML. Camel is payload-agnostic. This means that you can transport any kind of message with Camel (not necessarily XML-formatted).

JBI describes a packaging. We distinguish the binding components (responsible for the interaction with the systems outside of the NMR and the handling of the messages in the NMR) and the service engines (responsible for transforming the messages inside the NMR). However, it's not possible to directly deploy endpoints based on these components. JBI requires a service unit (a ZIP file) per endpoint, each packaged in a service assembly (another ZIP file). JBI also splits the description of the endpoint from its configuration. This does not result in a very flexible packaging: with definitions and configurations scattered across different files, it is not easy to maintain. In Camel, the configuration and definition of an endpoint are gathered in a simple URI, which is easier to read. Moreover, Camel doesn't force any packaging; the same definition can be packaged in a simple XML file, an OSGi bundle, or a regular JAR file.

In addition to JBI, another foundation of Camel is the book Enterprise Integration Patterns by Gregor Hohpe and Bobby Woolf. It describes design patterns answering classical problems in enterprise application integration and message-oriented middleware. The book describes the problems and the patterns to solve them. Camel strives to implement the patterns described in the book to make them easy to use and let the developer concentrate on the task at hand. This is what Camel is: an open source framework that allows you to integrate systems and that comes with a lot of connectors and Enterprise Integration Patterns (EIP) components out of the box. And if that is not enough, one can extend and implement custom components.

Components and bean support

Apache Camel ships with a wide variety of components out of the box; currently, there are more than 100 components available. We can see:

The connectivity components that allow exposure of endpoints for external systems or communication with external systems. For instance, the FTP, HTTP, JMX, WebServices, and JMS components, among many others, are connectivity components. Creating an endpoint and the associated configuration for these components is easy, by directly using a URI.
The internal components applying rules to the messages internally to Camel. These kinds of components apply validation or transformation rules to the inflight message. For instance, validation or XSLT are internal components. Camel brings a very powerful connectivity and mediation framework. Moreover, it's pretty easy to create new custom components, allowing you to extend Camel if the default components set doesn't match your requirements. It's also very easy to implement complex integration logic by creating your own processors and reusing your beans. Camel supports beans frameworks (IoC), such as Spring or Blueprint. Predicates and expressions As we will see later, most of the EIP need a rule definition to apply a routing logic to a message. The rule is described using an expression. It means that we have to define expressions or predicates in the Enterprise Integration Patterns. An expression returns any kind of value, whereas a predicate returns true or false only. Camel supports a lot of different languages to declare expressions or predicates. It doesn't force you to use one, it allows you to use the most appropriate one. For instance, Camel supports xpath, mvel, ognl, python, ruby, PHP, JavaScript, SpEL (Spring Expression Language), Groovy, and so on as expression languages. It also provides native Camel prebuilt functions and languages that are easy to use such as header, constant, or simple languages. Data format and type conversion Camel is payload-agnostic. This means that it can support any kind of message. Depending on the endpoints, it could be required to convert from one format to another. That's why Camel supports different data formats, in a pluggable way. This means that Camel can marshall or unmarshall a message in a given format. For instance, in addition to the standard JVM serialization, Camel natively supports Avro, JSON, protobuf, JAXB, XmlBeans, XStream, JiBX, SOAP, and so on. Depending on the endpoints and your need, you can explicitly define the data format during the processing of the message. On the other hand, Camel knows the expected format and type of endpoints. Thanks to this, Camel looks for a type converter, allowing to implicitly transform a message from one format to another. You can also explicitly define the type converter of your choice at some points during the processing of the message. Camel provides a set of ready-to-use type converters, but, as Camel supports a pluggable model, you can extend it by providing your own type converters. It's a simple POJO to implement. Easy configuration and URI Camel uses a different approach based on URI. The endpoint itself and its configuration are on the URI. The URI is human readable and provides the details of the endpoint, which is the endpoint component and the endpoint configuration. As this URI is part of the complete configuration (which defines what we name a route, as we will see later), it's possible to have a complete overview of the integration logic and connectivity in a row. Lightweight and different deployment topologies Camel itself is very light. The Camel core is only around 2 MB, and contains everythingrequired to run Camel. As it's based on a pluggable architecture, all Camel components are provided as external modules, allowing you to install only what you need, without installing superfluous and needlessly heavy modules. Camel is based on simple POJO, which means that the Camel core doesn't depend on other frameworks: it's an atomic framework and is ready to use. 
All other modules (components, DSL, and so on) are built on top of this Camel core. Moreover, Camel is not tied to one container for deployment. Camel supports a wide range of containers to run. They are as follows: A J2EE application server such as WebSphere, WebLogic, JBoss, and so on A Web container such as Apache Tomcat An OSGi container such as Apache Karaf A standalone application using frameworks such as Spring Camel gives a lot of flexibility, allowing you to embed it into your application or to use an enterprise-ready container. Quick prototyping and testing support In any integration project, it's typical that we have some part of the integration logic not yet available. For instance: The application to integrate with has not yet been purchased or not yet ready The remote system to integrate with has a heavy cost, not acceptable during the development phase Multiple teams work in parallel, so we may have some kinds of deadlocks between the teams As a complete integration framework, Camel provides a very easy way to prototype part of the integration logic. Even if you don't have the actual system to integrate, you can simulate this system (mock), as it allows you to implement your integration logic without waiting for dependencies. The mocking support is directly part of the Camel core and doesn't require any additional dependency. Along the same lines, testing is also crucial in an integration project. In such a kind of project, a lot of errors can happen and most are unforeseen. Moreover, a small change in an integration process might impact a lot of other processes. Camel provides the tools to easily test your design and integration logic, allowing you to integrate this in a continuous integration platform. Management and monitoring using JMX Apache Camel uses the Java Management Extension (JMX) standard and provides a lot of insights into the system using MBeans (Management Beans), providing a detailed view of the following current system: The different integration processes with the associated metrics The different components and endpoints with the associated metrics Moreover, these MBeans provide more insights than metrics. They also provide the operations to manage Camel. For instance, the operations allow you to stop an integration process, to suspend an endpoint, and so on. Using a combination of metrics and operations, you can configure a very agile integration solution. Active community The Apache Camel community is very active. This means that potential issues are identified very quickly and a fix is available soon after. However, it also means that a lot of ideas and contributions are proposed, giving more and more features to Camel. Another big advantage of an active community is that you will never be alone; a lot of people are active on the mailing lists who are ready to answer your question and provide advice. Summary Apache Camel is an enterprise integration solution used in many large organizations with enterprise support available through RedHat or Talend. Resources for Article: Further resources on this subject: Getting Started [article] A Quick Start Guide to Flume [article] Best Practices [article]

Processing Next-generation Sequencing Datasets Using Python

Packt
07 Jul 2015
25 min read
In this article by Tiago Antao, author of Bioinformatics with Python Cookbook, you will process next-generation sequencing datasets using Python. If you work in life sciences, you are probably aware of the increasing importance of computational methods to analyze increasingly larger datasets. There is a massive need for bioinformaticians to process this data, and one the main tools is, of course, Python. Python is probably the fastest growing language in the field of data sciences. It includes a rich ecology of software libraries to perform complex data analysis. Another major point in Python is its great community, which is always ready to help and produce great documentation and high-quality reliable software. In this article, we will use Python to process next-generation sequencing datasets. This is one of the many examples of Python usability in bioinformatics; chances are that if you have a biological dataset to analyze, Python can help you. This is surely the case with population genetics, genomics, phylogenetics, proteomics, and many other fields. Next-generation Sequencing (NGS) is one of the fundamental technological developments of the decade in the field of life sciences. Whole Genome Sequencing (WGS), RAD-Seq, RNA-Seq, Chip-Seq, and several other technologies are routinely used to investigate important biological problems. These are also called high-throughput sequencing technologies with good reason; they generate vast amounts of data that need to be processed. NGS is the main reason why computational biology is becoming a "big data" discipline. More than anything else, this is a field that requires strong bioinformatics techniques. There is very strong demand for professionals with these skillsets. Here, we will not discuss each individual NGS technique per se (this will be a massive undertaking). We will use two existing WGS datasets: the Human 1000 genomes project (http://www.1000genomes.org/) and the Anopheles 1000 genomes dataset (http://www.malariagen.net/projects/vector/ag1000g). The code presented will be easily applicable for other genomic sequencing approaches; some of them can also be used for transcriptomic analysis (for example, RNA-Seq). Most of the code is also species-independent, that is, you will be able to apply them to any species in which you have sequenced data. As this is not an introductory text, you are expected to at least know what FASTA, FASTQ, BAM, and VCF files are. We will also make use of basic genomic terminology without introducing it (things such as exomes, nonsynonym mutations, and so on). You are required to be familiar with basic Python, and we will leverage that knowledge to introduce the fundamental libraries in Python to perform NGS analysis. Here, we will concentrate on analyzing VCF files. Preparing the environment You will need Python 2.7 or 3.4. You can use many of the available distributions, including the standard one at http://www.python.org, but we recommend Anaconda Python from http://continuum.io/downloads. We also recommend the IPython Notebook (Project Jupyter) from http://ipython.org/. If you use Anaconda, this and many other packages are available with a simple conda install. There are some amazing libraries to perform data analysis in Python; here, we will use NumPy (http://www.numpy.org/) and matplotlib (http://matplotlib.org/), which you may already be using in your projects. We will also make use of the less widely used seaborn library (http://stanford.edu/~mwaskom/software/seaborn/). 
For bioinformatics, we will use Biopython (http://biopython.org) and PyVCF (https://pyvcf.readthedocs.org). The code used here is available on GitHub at https://github.com/tiagoantao/bioinf-python. In your realistic pipeline, you will probably be using other tools, such as bwa, samtools, or GATK to perform your alignment and SNP calling. In our case, tabix and bgzip (http://www.htslib.org/) is needed. Analyzing variant calls After running a genotype caller (for example, GATK or samtools mpileup), you will have a Variant Call Format (VCF) file reporting on genomic variations, such as SNPs (Single-Nucleotide Polymorphisms), InDels (Insertions/Deletions), CNVs (Copy Number Variation) among others. In this recipe, we will discuss VCF processing with the PyVCF module over the human 1000 genomes project to analyze SNP data. Getting ready I am to believe that 2 to 20 GB of data for a tutorial is asking too much. Although, the 1000 genomes' VCF files with realistic annotations are in that order of magnitude, we will want to work with much less data here. Fortunately, the bioinformatics community has developed tools that allow partial download of data. As part of the samtools/htslib package (http://www.htslib.org/), you can download tabix and bgzip, which will take care of data management. For example: tabix -fh ftp://ftp- trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/supporting/vcf_ with_sample_level_annotation/ALL.chr22.phase3_shapeit2_mvncall_ integrated_v5_extra_anno.20130502.genotypes.vcf.gz 22:1-17000000 |bgzip -c > genotypes.vcf.gz tabix -p vcf genotypes.vcf.gz The first line will perform a partial download of the VCF file for chromosome 22 (up to 17 Mbp) of the 1000 genomes project. Then, bgzip will compress it. The second line will create an index, which we will need for direct access to a section of the genome. The preceding code is available at https://github.com/tiagoantao/bioinf-python/blob/master/notebooks/01_NGS/Working_with_VCF.ipynb. How to do it… Take a look at the following steps: Let's start inspecting the information that we can get per record, as shown in the following code: import vcf v = vcf.Reader(filename='genotypes.vcf.gz')   print('Variant Level information') infos = v.infos for info in infos:    print(info)   print('Sample Level information') fmts = v.formats for fmt in fmts:    print(fmt)     We start by inspecting the annotations that are available for each record (remember that each record encodes variants, such as SNP, CNV, and InDel, and the state of that variant per sample). At the variant (record) level, we will find AC: the total number of ALT alleles in called genotypes, AF: the estimated allele frequency, NS: the number of samples with data, AN: the total number of alleles in called genotypes, and DP: the total read depth. There are others, but they are mostly specific to the 1000 genomes project (here, we are trying to be as general as possible). Your own dataset might have much more annotations or none of these.     At the sample level, there are only two annotations in this file: GT: genotype and DP: the per sample read depth. Yes, you have the per variant (total) read depth and the per sample read depth; be sure not to confuse both. 
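If you want to check which of these declared annotations are actually populated in your particular file, a quick tally over the first records is enough. The sketch below is only an illustration; the cap of 1,000 records is an arbitrary choice, and it assumes the genotypes.vcf.gz file prepared above:

import vcf
from collections import Counter

v = vcf.Reader(filename='genotypes.vcf.gz')
seen = Counter()
for i, rec in enumerate(v):
    if i == 1000:
        break
    # rec.INFO behaves like a dictionary of the variant-level annotations
    for key in rec.INFO:
        seen[key] += 1

for key, cnt in seen.most_common():
    print(key, cnt)

Annotations declared in the header but absent from this listing are never filled in for the region you downloaded, so there is little point in building filters around them.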
Now that we know which information is available, let's inspect a single VCF record with the following code: v = vcf.Reader(filename='genotypes.vcf.gz') rec = next(v) print(rec.CHROM, rec.POS, rec.ID, rec.REF, rec.ALT, rec.QUAL, rec.FILTER) print(rec.INFO) print(rec.FORMAT) samples = rec.samples print(len(samples)) sample = samples[0] print(sample.called, sample.gt_alleles, sample.is_het, sample.phased) print(int(sample['DP']))     We start by retrieving standard information: the chromosome, position, identifier, reference base (typically, just one), alternative bases (can have more than one, but it is not uncommon as the first filtering approach to only accept a single ALT, for example, only accept bi-allelic SNPs), quality (PHRED scaled—as you may expect), and the FILTER status. Regarding the filter, remember that whatever the VCF file says, you may still want to apply extra filters (as in the next recipe).     Then, we will print the additional variant-level information (AC, AS, AF, AN, DP, and so on), followed by the sample format (in this case, DP and GT). Finally, we will count the number of samples and inspect a single sample checking if it was called for this variant. If available, the reported alleles, heterozygosity, and phasing status (this dataset happens to be phased, which is not that common). Let's check the type of variant and the number of nonbiallelic SNPs in a single pass with the following code: from collections import defaultdict f = vcf.Reader(filename='genotypes.vcf.gz')   my_type = defaultdict(int) num_alts = defaultdict(int)   for rec in f:    my_type[rec.var_type, rec.var_subtype] += 1    if rec.is_snp:        num_alts[len(rec.ALT)] += 1 print(num_alts) print(my_type)     We use the Python defaultdict collection type. We find that this dataset has InDels (both insertions and deletions), CNVs, and, of course, SNPs (roughly two-third being transitions with one-third transversions). There is a residual number (79) of triallelic SNPs. There's more… The purpose of this recipe is to get you up to speed on the PyVCF module. At this stage, you should be comfortable with the API. We do not delve much here on usage details because that will be the main purpose of the next recipe: using the VCF module to study the quality of your variant calls. It will probably not be a shocking revelation that PyVCF is not the fastest module on earth. This file format (highly text-based) makes processing a time-consuming task. There are two main strategies of dealing with this problem: parallel processing or converting to a more efficient format. Note that VCF developers will perform a binary (BCF) version to deal with part of these problems at http://www.1000genomes.org/wiki/analysis/variant-call-format/bcf-binary-vcf-version-2. See also The specification for VCF is available at http://samtools.github.io/hts-specs/VCFv4.2.pdf GATK is one of the most widely used variant callers; check https://www.broadinstitute.org/gatk/ samtools and htslib are both used for variant calling and SAM/BAM management; check http://htslib.org Studying genome accessibility and filtering SNP data If you are using NGS data, the quality of your VCF calls may need to be assessed and filtered. Here, we will put in place a framework to filter SNP data. More than giving filtering rules (an impossible task to be performed in a general way), we give you procedures to assess the quality of your data. With this, you can then devise your own filters. 
Getting ready In the best-case scenario, you have a VCF file with proper filters applied; if this is the case, you can just go ahead and use your file. Note that all VCF files will have a FILTER column, but this does not mean that all the proper filters were applied. You have to be sure that your data is properly filtered. In the second case, which is one of the most common, your file will have unfiltered data, but you have enough annotations. Also, you can apply hard filters (that is, no need for programmatic filtering). If you have a GATK annotated file, refer, for instance, to http://gatkforums.broadinstitute.org/discussion/2806/howto-apply-hard-filters-to-a-call-set. In the third case, you have a VCF file that has all the annotations that you need, but you may want to apply more flexible filters (for example, "if read depth > 20, then accept; if mapping quality > 30, accept if mapping quality > 40"). In the fourth case, your VCF file does not have all the necessary annotations, and you have to revisit your BAM files (or even other sources of information). In this case, the best solution is to find whatever extra information you have and create a new VCF file with the needed annotations. Some genotype callers like GATK allow you to specify with annotations you want; you may also want to use extra programs to provide more annotations, for example, SnpEff (http://snpeff.sourceforge.net/) will annotate your SNPs with predictions of their effect (for example, if they are in exons, are they coding on noncoding?). It is impossible to provide a clear-cut recipe; it will vary with the type of your sequencing data, your species of study, and your tolerance to errors, among other variables. What we can do is provide a set of typical analysis that is done for high-quality filtering. In this recipe, we will not use data from the Human 1000 genomes project; we want "dirty" unfiltered data that has a lot of common annotations that can be used to filter it. We will use data from the Anopheles 1000 genomes project (Anopheles is the mosquito vector involved in the transmission of the parasite causing malaria), which makes available filtered and unfiltered data. You can find more information about this project at http://www.malariagen.net/projects/vector/ag1000g. We will get a part of the centromere of chromosome 3L for around 100 mosquitoes, followed by a part somewhere in the middle of that chromosome (and index both), as shown in the following code: tabix -fh ftp://ngs.sanger.ac.uk/production/ag1000g/phase1/preview/ag1000g.AC. phase1.AR1.vcf.gz 3L:1-200000 |bgzip -c > centro.vcf.gz tabix -fh ftp://ngs.sanger.ac.uk/production/ag1000g/phase1/preview/ag1000g.AC. phase1.AR1.vcf.gz 3L:21000001-21200000 |bgzip -c > standard.vcf.gz tabix -p vcf centro.vcf.gz tabix -p vcf standard.vcf.gz As usual, the code to download this data is available at the https://github.com/tiagoantao/bioinf-python/blob/master/notebooks/01_NGS/Filtering_SNPs.ipynb notebook. Finally, a word of warning about this recipe: the level of Python here will be slightly more complicated than before. The more general code that we will write may be easier to reuse in your specific case. We will perform extensive use of functional programming techniques (lambda functions) and the partial function application. 
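If the partial function application is new to you, the following toy sketch (unrelated to VCF processing, purely illustrative) shows the two idioms this recipe relies on: a lambda wraps an expression as a throwaway function, and functools.partial pre-fills some arguments of an existing function, returning a new function that waits for the rest:

import functools
import numpy as np

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# A lambda: a small anonymous function used where a named def would be noise
double = lambda x: 2 * x
print(double(21))           # 42

# partial: np.percentile with q fixed at 75, still waiting for the data argument
percentile_75 = functools.partial(np.percentile, q=75)
print(percentile_75(data))  # 7.75

The recipe uses exactly this trick to pass "configured" statistics, such as the 75th percentile, into generic windowing functions that only know how to call a one-argument function.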
How to do it…

Take a look at the following steps:

Let's start by plotting the distribution of variants across the genome in both files as follows:

%matplotlib inline
from collections import defaultdict

import seaborn as sns
import matplotlib.pyplot as plt

import vcf

def do_window(recs, size, fun):
    start = None
    win_res = []
    for rec in recs:
        if not rec.is_snp or len(rec.ALT) > 1:
            continue
        if start is None:
            start = rec.POS
        my_win = 1 + (rec.POS - start) // size
        while len(win_res) < my_win:
            win_res.append([])
        win_res[my_win - 1].extend(fun(rec))
    return win_res

wins = {}
size = 2000
vcf_names = ['centro.vcf.gz', 'standard.vcf.gz']
for vcf_name in vcf_names:
    recs = vcf.Reader(filename=vcf_name)
    wins[vcf_name] = do_window(recs, size, lambda x: [1])

    We start by performing the required imports (as usual, remember to remove the first line if you are not on the IPython Notebook). Before I explain the function, note what we will do. For both files, we will compute windowed statistics: we will divide our file that includes 200,000 bp of data into windows of size 2,000 (100 windows). Every time we find a bi-allelic SNP, we will add one to the list related to that window in the window function. The window function will take a VCF record (an SNP—rec.is_snp—that is bi-allelic—len(rec.ALT) == 1), determine the window where that record belongs (by performing an integer division of its offset from the starting position by size), and extend the list of results of that window by the function that is passed to it as the fun parameter (which in our case simply returns [1]).

    So, now we have a list of 100 elements (each representing 2,000 base pairs). Each element will be another list, which will have 1 for each bi-allelic SNP found. So, if you have 200 SNPs in the first 2,000 base pairs, the first element of the list will have 200 ones. Let's continue:

def apply_win_funs(wins, funs):
    fun_results = []
    for win in wins:
        my_funs = {}
        for name, fun in funs.items():
            try:
                my_funs[name] = fun(win)
            except:
                my_funs[name] = None
        fun_results.append(my_funs)
    return fun_results

stats = {}
fig, ax = plt.subplots(figsize=(16, 9))
for name, nwins in wins.items():
    stats[name] = apply_win_funs(nwins, {'sum': sum})
    x_lim = [i * size for i in range(len(stats[name]))]
    ax.plot(x_lim, [x['sum'] for x in stats[name]], label=name)
ax.legend()
ax.set_xlabel('Genomic location in the downloaded segment')
ax.set_ylabel('Number of variant sites (bi-allelic SNPs)')
fig.suptitle('Number of bi-allelic SNPs along the genome', fontsize='xx-large')

    Here, we will perform a plot that contains statistical information for each of our 100 windows. The apply_win_funs function will calculate a set of statistics for every window. In this case, it will sum all the numbers in the window. Remember that every time we find an SNP, we add one to the window list. This means that if we have 200 SNPs, we will have 200 1s; hence, summing them will return 200.

    So, we are able to compute the number of SNPs per window in an apparently convoluted way. Why we are doing things with this strategy will become apparent soon, but for now, let's check the result of this computation for both files (refer to the following figure):

Figure 1: The number of bi-allelic SNPs distributed over windows of 2,000 bp in size for an area of 200 Kbp near the centromere (blue) and in the middle of the chromosome (green).
Both areas come from chromosome 3L for circa 100 Ugandan mosquitoes from the Anopheles 1000 genomes project     Note that the amount of SNPs in the centromere is smaller than the one in the middle of the chromosome. This is expected because calling variants in chromosomes is more difficult than calling variants in the middle and also because probably there is less genomic diversity in centromeres. If you are used to humans or other mammals, you may find the density of variants obnoxiously high, that is, mosquitoes for you! Let's take a look at the sample-level annotation. We will inspect Mapping Quality Zero (refer to https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_MappingQualityZeroBySample.php for details), which is a measure of how well all the sequences involved in calling this variant map clearly to this position. Note that there is also an MQ0 annotation at the variant-level: import functools   import numpy as np mq0_wins = {} vcf_names = ['centro.vcf.gz', 'standard.vcf.gz'] size = 5000 def get_sample(rec, annot, my_type):    res = []    samples = rec.samples    for sample in samples:        if sample[annot] is None: # ignoring nones            continue        res.append(my_type(sample[annot]))    return res   for vcf_name in vcf_names:    recs = vcf.Reader(filename=vcf_name)    mq0_wins[vcf_name] = do_window(recs, size, functools.partial(get_sample, annot='MQ0', my_type=int))     Start by inspecting this by looking at the last for; we will perform a windowed analysis by getting the MQ0 annotation from each record. We perform this by calling the get_sample function in which we return our preferred annotation (in this case, MQ0) cast with a certain type (my_type=int). We will use the partial application function here. Python allows you to specify some parameters of a function and wait for other parameters to be specified later. Note that the most complicated thing here is the functional programming style. Also, note that it makes it very easy to compute other sample-level annotations; just replace MQ0 with AB, AD, GQ, and so on. You will immediately have a computation for that annotation. If the annotation is not of type integer, no problem; just adapt my_type. This is a difficult programming style if you are not used to it, but you will reap the benefits very soon. Let's now print the median and top 75 percent percentile for each window (in this case, with a size of 5,000) as follows: stats = {} colors = ['b', 'g'] i = 0 fig, ax = plt.subplots(figsize=(16, 9)) for name, nwins in mq0_wins.items():    stats[name] = apply_win_funs(nwins, {'median': np.median, '75': functools.partial(np.percentile, q=75)})    x_lim = [j * size for j in range(len(stats[name]))]    ax.plot(x_lim, [x['median'] for x in stats[name]], label=name, color=colors[i])    ax.plot(x_lim, [x['75'] for x in stats[name]], '--', color=colors[i])    i += 1 ax.legend() ax.set_xlabel('Genomic location in the downloaded segment') ax.set_ylabel('MQ0') fig.suptitle('Distribution of MQ0 along the genome', fontsize='xx-large')     Note that we now have two different statistics on apply_win_funs: percentile and median. Again, we will pass function names as parameters (np.median) and perform the partial function application (np.percentile). 
The result can be seen in the following figure: Figure 2: Median (continuous line) and 75th percentile (dashed) of MQ0 of sample SNPs distributed on windows of 5,000 bp of size for an area of 200 Kbp near the centromere (blue) and in the middle of chromosome (green); both areas come from chromosome 3L for circa 100 Ugandan mosquitoes from the Anopheles 1000 genomes project     For the "standard" file, the median MQ0 is 0 (it is plotted at the very bottom, which is almost unseen); this is good as it suggests that most sequences involved in the calling of variants map clearly to this area of the genome. For the centromere, MQ0 is of poor quality. Furthermore, there are areas where the genotype caller could not find any variants at all; hence, the incomplete chart. Let's compare heterozygosity with the DP sample-level annotation:     Here, we will plot the fraction of heterozygosity calls as a function of the sample read depth (DP) for every SNP. We will first explain the result and only then the code that generates it.     The following screenshot shows the fraction of calls that are heterozygous at a certain depth: Figure 3: The continuous line represents the fraction of heterozygosite calls computed at a certain depth; in blue is the centromeric area, in green is the "standard" area; the dashed lines represent the number of sample calls per depth; both areas come from chromosome 3L for circa 100 Ugandan mosquitoes from the Anopheles 1000 genomes project In the preceding screenshot, there are two considerations to be taken into account:     At a very low depth, the fraction of heterozygote calls is biased low; this makes sense because the number of reads per position does not allow you to make a correct estimate of the presence of both alleles in a sample. So, you should not trust calls at a very low depth.     As expected, the number of calls in the centromere is way lower than calls outside it. The distribution of SNPs outside the centromere follows a common pattern that you can expect in many datasets. Here is the code: def get_sample_relation(recs, f1, f2):    rel = defaultdict(int)    for rec in recs:        if not rec.is_snp:              continue        for sample in rec.samples:            try:                 v1 = f1(sample)                v2 = f2(sample)                if v1 is None or v2 is None:                    continue # We ignore Nones                rel[(v1, v2)] += 1            except:                pass # This is outside the domain (typically None)    return rel   rels = {} for vcf_name in vcf_names:    recs = vcf.Reader(filename=vcf_name)    rels[vcf_name] = get_sample_relation(recs, lambda s: 1 if s.is_het else 0, lambda s: int(s['DP'])) Let's start by looking at the for loop. Again, we will use functional programming: the get_sample_relation function will traverse all the SNP records and apply the two functional parameters; the first determines heterozygosity, whereas the second gets the sample DP (remember that there is also a variant DP).     Now, as the code is complex as it is, I opted for a naive data structure to be returned by get_sample_relation: a dictionary where the key is the pair of results (in this case, heterozygosity and DP) and the sum of SNPs, which share both values. There are more elegant data structures with different trade-offs for this: scipy spare matrices, pandas' DataFrames, or maybe, you want to consider PyTables. 
The fundamental point here is to have a framework that is general enough to compute relationships among a couple of sample annotations.     Also, be careful with the dimension space of several annotations, for example, if your annotation is of float type, you might have to round it (if not, the size of your data structure might become too big). Now, let's take a look at all the plotting codes. Let's perform it in two parts; here is part 1: def plot_hz_rel(dps, ax, ax2, name, rel):    frac_hz = []    cnt_dp = []    for dp in dps:        hz = 0.0        cnt = 0          for khz, kdp in rel.keys():             if kdp != dp:                continue            cnt += rel[(khz, dp)]            if khz == 1:                hz += rel[(khz, dp)]        frac_hz.append(hz / cnt)        cnt_dp.append(cnt)    ax.plot(dps, frac_hz, label=name)    ax2.plot(dps, cnt_dp, '--', label=name)     This function will take a data structure (as generated by get_sample_relation) expecting that the first parameter of the key tuple is the heterozygosity state (0 = homozygote, 1 = heterozygote) and the second is the DP. With this, it will generate two lines: one with the fraction of samples (which are heterozygotes at a certain depth) and the other with the SNP count. Let's now call this function, as shown in the following code: fig, ax = plt.subplots(figsize=(16, 9)) ax2 = ax.twinx() for name, rel in rels.items():    dps = list(set([x[1] for x in rel.keys()]))    dps.sort()    plot_hz_rel(dps, ax, ax2, name, rel) ax.set_xlim(0, 75) ax.set_ylim(0, 0.2) ax2.set_ylabel('Quantity of calls') ax.set_ylabel('Fraction of Heterozygote calls') ax.set_xlabel('Sample Read Depth (DP)') ax.legend() fig.suptitle('Number of calls per depth and fraction of calls which are Hz',,              fontsize='xx-large')     Here, we will use two axes. On the left-hand side, we will have the fraction of heterozygozite SNPs, whereas on the right-hand side, we will have the number of SNPs. Then, we will call our plot_hz_rel for both data files. The rest is standard matplotlib code. Finally, let's compare variant DP with the categorical variant-level annotation: EFF. EFF is provided by SnpEFF and tells us (among many other things) the type of SNP (for example, intergenic, intronic, coding synonymous, and coding nonsynonymous). The Anopheles dataset provides this useful annotation. Let's start by extracting variant-level annotations and the functional programming style, as shown in the following code: def get_variant_relation(recs, f1, f2):    rel = defaultdict(int)    for rec in recs:        if not rec.is_snp:              continue        try:            v1 = f1(rec)            v2 = f2(rec)            if v1 is None or v2 is None:                continue # We ignore Nones            rel[(v1, v2)] += 1        except:            pass    return rel     The programming style here is similar to get_sample_relation, but we do not delve into the samples. Now, we will define the types of effects that we will work with and convert the effect to an integer as it would allow you to use it as in index, for example, matrices. 
Think about coding a categorical variable: accepted_eff = ['INTERGENIC', 'INTRON', 'NON_SYNONYMOUS_CODING', 'SYNONYMOUS_CODING']   def eff_to_int(rec):    try:        for annot in rec.INFO['EFF']:            #We use the first annotation            master_type = annot.split('(')[0]            return accepted_eff.index(master_type)    except ValueError:        return len(accepted_eff) We will now traverse the file; the style should be clear to you now: eff_mq0s = {} for vcf_name in vcf_names:    recs = vcf.Reader(filename=vcf_name)    eff_mq0s[vcf_name] = get_variant_relation(recs, lambda r: eff_to_int(r), lambda r: int(r.INFO['DP'])) Finally, we will plot the distribution of DP using the SNP effect, as shown in the following code: fig, ax = plt.subplots(figsize=(16,9)) vcf_name = 'standard.vcf.gz' bp_vals = [[] for x in range(len(accepted_eff) + 1)] for k, cnt in eff_mq0s[vcf_name].items():    my_eff, mq0 = k    bp_vals[my_eff].extend([mq0] * cnt) sns.boxplot(bp_vals, sym='', ax=ax) ax.set_xticklabels(accepted_eff + ['OTHER']) ax.set_ylabel('DP (variant)') fig.suptitle('Distribution of variant DP per SNP type',              fontsize='xx-large') Here, we will just print a box plot for the noncentromeric file (refer to the following screenshot). The results are as expected: SNPs in code areas will probably have more depth if they are in more complex regions (that is easier to call) than intergenic SNPs: Figure 4: Boxplot for the distribution of variant read depth across different SNP effects There's more… The approach would depend on the type of sequencing data that you have, the number of samples, and potential extra information (for example, pedigree among samples). This recipe is very complex as it is, but parts of it are profoundly naive (there is a limit of complexity that I could force on you on a simple recipe). For example, the window code does not support overlapping windows; also, data structures are simplistic. However, I hope that they give you an idea of the general strategy to process genomic high-throughput sequencing data. See also There are many filtering rules, but I would like to draw your attention to the need of reasonably good coverage (clearly more than 10 x), for example, refer to. Meynet et al "Variant detection sensitivity and biases in whole genome and exome sequencing" at http://www.biomedcentral.com/1471-2105/15/247/ Brad Chapman is one of the best known specialist in sequencing analysis and data quality with Python and the main author of Blue Collar Bioinformatics, a blog that you may want to check at https://bcbio.wordpress.com/ Brad is also the main author of bcbio-nextgen, a Python-based pipeline for high-throughput sequencing analysis. Refer to https://bcbio-nextgen.readthedocs.org Peter Cock is the main author of Biopython and is heavily involved in NGS analysis; be sure to check his blog, "Blasted Bionformatics!?" at http://blastedbio.blogspot.co.uk/ Summary In this article, we prepared the environment, analyzed variant calls and learned about genome accessibility and filtering SNP data.


Methodology for Modeling Business Processes in SOA

Packt
07 Jul 2015
27 min read
This article by Matjaz B. Juric, Sven Bernhardt, Hajo Normann, Danilo Schmiedel, Guido Schmutz, Mark Simpson, and Torsten Winterberg, authors of the book Design Principles for Process-driven Architectures Using Oracle BPM and SOA Suite 12c, describes the strategies and a methodology that can help us realize the benefits of BPM as a successful enterprise modernization strategy. In this article, we will do the following: Provide the reader with a set of actions in the course of a complete methodology that they can incorporate in order to create the desired attractiveness towards broader application throughout the enterprise Describe organizational and cultural barriers to applying enterprise BPM and discuss ways to overcome them (For more resources related to this topic, see here.) The postmature birth of enterprise BPM When enterprise architects discuss the future of the software landscape of their organization, they map the functional capabilities, such as customer relationship management and order management, to existing or new applications—some packaged and some custom. Then, they connect these applications by means of middleware. They typically use the notion of an integration middleware, such as an enterprise service bus (ESB), to depict the technical integration between these applications, exposing functionality as services, APIs, or, more trendy, "micro services". These services are used by modern, more and more mobile frontends and B2B partners. For several years now, it has been hard to find a PowerPoint slide that discusses future enterprise middleware without the notion of a BPM layer that sits on top of the frontend and the SOA service layer. So, in most organizations, we find a slide deck that contains this visual box named BPM, signifying the aim to improve process excellence by automating business processes along the management discipline known as business process management (BPM). Over the years, we have seen that the frontend layer often does materialize as a portal or through a modern mobile application development platform. The envisioned SOA services can be found living on an ESB or API gateway. Yet, the component called BPM and the related practice of modeling executable processes has failed to finally incubate until now—at least in most organizations. BPM still waits for morphing from an abstract item on a PowerPoint slide and in a shelved analyst report to some automated business processes that are actually deployed to a business-critical production machine. When we look closer—yes—there is a license for a BPM tool, and yes, some processes have even been automated, but those tend to be found rather in the less visible corners of the enterprise, seldom being of the concern for higher management and the hot project teams that work day and night on the next visible release. In short, BPM remains the hobby of some enterprise architects and the expensive consultants they pay. Will BPM ever take the often proposed lead role in the middleware architect's tool box? Will it lead to a better, more efficient, and productive organization? To be very honest, at the moment the answer to that question is rather no than yes. There is a good chance that BPM remains just one more of the silver bullets that fill some books and motivate some great presentations at conferences, yet do not have an impact on the average organization. But there is still hope for enterprise BPM as opposed to a departmental approach to process optimization. 
There is a good chance that BPM, next to other enabling technologies, will indeed be the driver for successful enterprise modernization. Large organizations all over the globe reengage with smaller and larger system integrators to tackle the process challenge. Maybe BPM as a practice needs more time than other items found in Gardner hype curves to mature before it's widely applied. This necessary level of higher maturity encompasses both the tools and the people using them. Ultimately, this question of large-scale BPM adoption will be answered individually in each organization. Only when a substantive set of enterprises experience tangible benefits from BPM will they talk about it, thus growing a momentum that leads to the success of enterprise BPM as a whole. This positive marketing based on actual project and program success will be the primary way to establish a force of attraction towards BPM that will raise curiosity and interest in the minds of the bulk of the organizations that are still rather hesitant or ignorant about using BPM. Oracle BPM Suite 12c – new business architecture features New tools in Oracle BPM Suite 12c put BPM in the mold of business architecture (BA). This new version contains new BA model types and features that help companies to move out of the IT-based, rather technical view of business processes automation and into strategic process improvement. Thus, these new model types help us to embark on the journey towards enterprise BPM. This is an interesting step in evolution of enterprise middleware—Oracle is the first vendor of a business process automation engine that moved up from concrete automated processes to strategic views on end-to-end processes, thus crossing the automation/strategic barrier. BPM Suite 12c introduces cross-departmental business process views. Thereby, it allows us to approach an enterprise modeling exercise through top-down modeling. It has become an end-to-end value chain model that sits on top of processes. It chains separated business processes together into one coherent end-to-end view. The value chain describes a bird's-eye view of the steps needed to achieve the most critical business goals of an organization. These steps comprise of business processes, of which some are automated in a BPMN engine and others actually run in packaged applications or are not automated at all. Also, BPM Suite 12c allows the capturing of the respective goals and provides the tools to measure them as KPIs and service-level agreements. In order to understand the path towards these yet untackled areas, it is important to understand where we stand today with BPM and what this new field of end-to-end process transparency is all about. Yet, before we get there, we will leave enterprise IT for a moment and take a look at the nature of a game (any game) in order to prepare for a deeper understanding of the mechanisms and cultural impact that underpin the move from a departmental to an enterprise approach to BPM. Football games – same basic rules, different methodology Any game, be it a physical sport, such as football, or a mental sport, such as chess, is defined through a set of common rules. How the game is played will look very different depending on the level of the league it is played in. A Champions League football game is so much more refined than your local team playing at the nearby stadium, not to mention the neighboring kids kicking the ball in the dirt ground. 
These kids will show creativity and pleasure in the game, yet the level of sophistication is a completely different ball game in the big stadium. You can marvel at the effort made to ensure that everybody plays their role in a well-trained symbiosis with their peers, all sharing a common set of collaboration rules and patterns. The team spent so many hours training the right combinations, establishing a deep trust. There is no time to discuss the meaning of an order shouted out by the trainer. They have worked on this common understanding of how to do things in various situations. They have established one language that they share. As an observer of a great match, you appreciate an elaborate art, not of one artist but of a coherent team. No one would argue the prevalence of this continuum in the refinement and rising sophistication in rough physical sports. It is puzzling to know to which extent we, as players in the games of enterprise IT, often tend to neglect the needs and forces that are implied by such a continuum of rising sophistication. The next sections will take a closer look at these BPM playgrounds and motivate you to take the necessary steps toward team excellence when moving from a small, departmental level BPM/SOA project to a program approach that is centered on a BPM and SOA paradigm. Which BPM game do we play? Game Silo BPM is the workflow or business process automation in organizational departments. It resembles the kids playing soccer on the neighborhood playground. After a few years of experience with automated processes, the maturity rises to resemble your local football team—yes, they play in a stadium, and it is often not elegant. Game Silo BPM is a tactical game in which work gets done while management deals with reaching departmental goals. New feature requests lead to changed or new applications and the people involved know each other very well over many years under established leadership. Workflows are automated to optimize performance. Game Enterprise BPM thrives for process excellence at Champions League. It is a strategic game in which higher management and business departments outline the future capability maps and cross-departmental business process models. In this game, players tend to regard the overall organization as a set of more or less efficient functional capabilities. One or a group of functional capabilities make up a department. Game Silo BPM – departmental workflows Today, most organizations use BPM-based process automation based on tools such as Oracle BPM Suite to improve the efficiency of the processes within departments. These processes often support the functional capability that this particular department owns by increasing its efficiency. Increasing efficiency is to do more with less. It is about automating manual steps, removing bottlenecks, and making sure that the resources are optimally allocated across your process. The driving force is typically the team manager, who is measured by the productivity of his team. The key factor to reach this goal is the automation of redundant or unnecessary human/IT interactions. Through process automation, we gain insights into the performance of the process. This insight can be called process transparency, allowing for the constant identification of areas of improvement. A typical example of the processes found in Game Silo BPM is an approval process that can be expressed as a clear path among process participants. 
Often, these processes work on documents, thus having a tight relationship with content management. We also find complex backend integration processes that involve human interaction only in the case of an exception. The following figure depicts this siloed approach to process automation. Referring to a given business unit as a silo indicates its closed and self-contained world. A word of caution is needed: the term "silo" is often used with a negative connotation. It is important, though, to recognize that a closed and coherent team with no dependencies on other teams is a preferred working environment for many employees and managers. In more general terms, it reflects an archaic type of organization that we lean towards: it allows everybody in the team to indulge in what can be called a siege mentality, as we humans gravitate to the coziness of a well-defined and manageable island of order.
Figure 1: Workflow automation in departmental silos
As discussed in the introduction, there is a chance that BPM never leaves this departmental context in many organizations. If you are curious to find out where you stand today, it is easy to determine whether a business process is stuck in the silo or whether it is part of an enterprise BPM strategy. Just ask whether the process is aligned with corporate goals or whether it is solely associated with departmental goals and KPIs. Another sign is the lack of a methodology and of a link to cross-departmental governance that defines the mode of operation and the means to create consistent models that speak one common language. To impose the enterprise architecture tools of Game Enterprise BPM on Game Silo BPM would be an overhead; you don't need them if your organization does not strive for cross-departmental process transparency.
Oracle BPM Suite 11g is made for playing Game Silo BPM
Oracle BPM Suite 11g provides all the tools and functionality necessary to automate a departmental workflow. It is not, however, sufficient for modeling business processes that span departmental boundaries and require top-down process hierarchies. The key components are the following:
The BPMN process modeling tool and the respective execution engine
The means to organize logical roles and depict them as swimlanes in BPMN process models
The human task component that involves human input in decision making
The business rules component that supports automated decision making
The technical means to call backend SOA services
The wizards to create data mappings
Business activity monitoring (BAM) to measure process performance
Oracle BPM Suite models processes in BPMN
Workflows are modeled at a fairly fine level of granularity using the BPMN 2.0 standard (and later versions). BPMN is both business-ready and technically detailed enough to allow modeled processes to be executed in a process engine. Oracle fittingly expresses this mechanism as what you see is what you execute. These workflows typically orchestrate human interaction through human tasks and functionality through SOA services. In the next sections of this article, we will move our heads out of the cocoon of the silo, looking higher and higher along the hierarchy of the enterprise until we reach the world of workshops and polished PowerPoint slides in which the strategy of the organization is defined.
Game Enterprise BPM
Enterprise BPM is a management discipline with a methodology typically found in business architecture (BA) and the respective enterprise architecture (EA) teams.
Representatives of higher management, business departments, and EA teams meet with management consultants in order to understand the current situation and the desired state of the overall organization. To this end, enterprise architects define process maps—a high-level view of the organization's business processes—for both the AS-IS state and various TO-BE states. In the next step, they define the desired future state and depict strategies and means to reach it. Business processes that generate value for the organization typically span several departments. The steps in these end-to-end processes can be mapped to the functional building blocks—the departments.
Figure 2: A cross-departmental business process needs an owner
The goal of Game Enterprise BPM is to manage enterprise business processes, making sure they realize the corporate strategy and meet the respective goals, which are ideally measured by key performance indicators, such as customer satisfaction, failure reduction, and cost reduction. It is a good practice to support these cross-departmental efforts through the notion of a common or shared language that is spoken across departmental boundaries. A governance board is a great means to establish this capability in the overall organization.
Still wide open – the business/IT divide
Organizational change in the management structure is a prerequisite for the success of Game Enterprise BPM but is not a sufficient condition. Several business process management books describe the main challenge in enterprise BPM as the still-wide-open business/IT divide. There is still a gap between how processes are understood and owned in Game Enterprise BPM and how automated processes are modeled and perceived in the departmental workflows of Game Silo BPM. Principles, goals, standards, and best practices defined in Game Enterprise BPM do not trickle down into everyday work in Game Silo BPM. One of the biggest reasons for this divide is that there is no direct link between the models and tools used by top management to depict the business strategy, IT strategy, business architecture, high-level value chains, and process models on the one hand, and the models and artifacts used in enterprise architecture—and, from there, software architecture—on the other.
Figure 3: Gap between business architecture and IT enterprise architecture in strategic BPM
So, traditionally, organizations use specific business architecture or enterprise architecture tools in order to depict AS-IS and TO-BE high-level reflections of value chains, hierarchical business processes, and capability maps alongside application heat maps. These models tend to hang in the air; they are not deeply grounded in real life. Business process models expressed as event-driven process chains (EPCs), Visio process models, and other model types often don't really reflect the flows and collaboration structures of actual office procedures. This leads to the perception of business architecture departments as ivory towers with no, or weak, links to the realities of the organization. On the other side, the tools and models of IT enterprise architecture and software architecture speak a language not understood by the members of business departments. Unified Modeling Language (UML) is the most prominent set of model types that has remained stuck in IT.
However, while UML class and activity diagrams promised to be a valid way to depict the nouns and stories of business processes, their potential to provide a shared language and a joint vocabulary and view on requirements rarely materialized. Until now, there has been no workflow tool vendor approaching these strategic, enterprise-level models and bridging the business/IT gap.
Oracle BPM Suite 12c tackles Game Enterprise BPM
With BPM Suite 12c, Oracle is starting to engage in this domain. The approach Oracle took can be summarized as applying the Pareto principle: 80 percent of the features needed for strategic enterprise modeling can be found in just 20 percent of the functionality of those high-end, enterprise-level modeling tools. So, Oracle implemented this 20 percent as business architecture models:
Enterprise maps to define the organizational and application context
Value chains to establish a root for process hierarchies
The strategy model to depict focus areas and assign technical capabilities and optimization strategies
The following figure represents the new features in Oracle BPM Suite 12c in the context of the Game Enterprise BPM methodology:
Figure 4: Oracle BPM Suite 12c new features in the context of the Game Enterprise BPM methodology
The preceding figure is based on the BPTrends Process Change Methodology introduced in the book Business Process Change by Paul Harmon. These new features promise to create a link from higher-level process models and other strategic depictions to executable processes. This link could not be established until now, since there were too many interface mismatches between enterprise tooling and workflow automation engines. The new model types in Oracle BPM Suite 12c are, as discussed, a subset of all the features and model types in the EA tools. For this subset, these interface mismatches have been made obsolete: there is a clear trace, with no tool disruption, from the enterprise map to the value chain and associated KPIs down to the BPMN processes that are automated. These features have existed in EA and BPM tools before; what is new is this undisrupted trace.
Figure 5: Undisrupted trace from the business architecture to executable processes
The preceding figure is based on the BPTrends Process Change Methodology introduced in the book Business Process Change by Paul Harmon. This holistic set of tools, which brings together aspects of modeling time, design time, and runtime, makes it more likely that the business/IT gap will finally be bridged.
Figure 6: Tighter links from business and strategy to executable software and processes
Today, we do not live in a perfect world. To understand to what extent this gap has been closed, it helps to look at how people work. If there is a tool that depicts enterprise strategy, end-to-end business processes, business capabilities, and KPIs, that is used in daily work, and that offers the means to navigate to lower-level models, then we have come quite far. The features of Oracle BPM Suite 12c, which are discussed below, are a step in this direction but are not the end of the journey.
Using business architect features
The Process Composer is a business-user-friendly web application. From the login page, it guides the user in the spirit of a business architecture methodology. All the models that we create are part of a "space". On its entry page, the main structure is divided into business architecture models in BA projects, which are described in this article.
It is a good practice to start with an enterprise map that depicts the business capabilities of our organization. The rationale is that the functions the organization is made up of tend to be more stable and less a matter of interpretation and perspective than any business process view. Enterprise maps can be used to put those value chains and process models into the organizational and application landscape context. Oracle suggests organizing the business capabilities into three segments; thus, the default enterprise map model is prepopulated with three default lanes: core, management, and support. This structure works for many organizations, but it is up to you to either use these lanes or create your own. Within each lane, we can then define the key business capabilities that make up the core business processes.
Figure 7: Enterprise map of RYLC depicting key business capabilities
Properties of BA models
Each element (goal, objective, strategy) within the model can be enriched with business properties, such as actual cost, actual time, proposed cost, and proposed time. These properties are part of the impact analysis report that can be generated to analyze the BA project.
Figure 8: Use properties to specify SLAs and other BA characteristics
Depicting organizational units
Within RYLC as an organization, we now depict its departments as organizational units. We can attach goals to each of the units, which depict its function and the role it plays in the concert of the overall ecosystem. This association of a unit with a goal is expressed via links to goals defined in the strategy model. These goals will be used in the impact analysis reports that show the effect of organizational or procedural changes. It is possible to create several organizational units, as shown in the following screenshot:
Figure 9: Define a set of organizational units
Value chains
A value chain model forms the root of a process hierarchy. A value chain consists of one direct line of steps, with no gateways and no exceptions. The modeler in Oracle BPM Suite allows each step in the chain to depict associated business goals and key performance indicators (KPIs) that can be used to measure the organization's performance rather than the details of departmental performance. The value chain model is a very simple one, depicting the flow of the most basic business process steps. Each step is a business process in its own right. At the level of the chain, no decision making is expressed. This resembles a business process expressed in BPMN that has only direct lines and no gateways.
Figure 10: Creation of a new value chain model called "Rental Request-to-Delivery"
The value chain model type allows you to structure your business processes into a hierarchy, with a value chain forming the topmost level.
Strategy models
Strategy models can be used to further motivate the KPIs that are depicted at the value chain level.
Figure 11: Building the strategy model for the strategy "Become a market leader"
These visual maps leverage existing process documentation and match it with current business trends and existing reference models. These models depict processes and associated organizational aspects, encompassing cross-departmental views on higher levels and concrete processes down to level 3. From there, they prioritize distinct processes and decide on one of several modernization strategies—process automation being just one of several!
So, in the Define strategy and performance measures (KPIs) phase of the proposed methodology shown in the following diagram, the team selects, for each business process or even subprocess, one or several means to improve process efficiency or transparency. Typically, these means are defined as "supporting capabilities". These are a technique, a tool, or an approach that helps to modernize a business process. A few typical supporting capabilities are mentioned here:
Explicit process automation
Implicit process handling and automation inside a packaged application, such as SAP ERP or Oracle Siebel
Refactoring of an existing COTS application
Business process outsourcing
Business process retirement
The way toward establishing these supporting capabilities is defined through a list of potential modernization strategies. Several of the modernization strategies relate to the existing applications that support a business process, such as refactoring, replacement, retirement, or re-interfacing of the respective supporting applications. The modernization strategy that we are most interested in within this article is establish explicit automated process. It is a best practice to create a high-level process map and define, for each of the processes, whether to leave it as it is or to apply one of several modernization strategies. When several modernization strategies are found for one process, we can drill down into the process hierarchy and stop at the level at which each subprocess has a single, distinct modernization strategy.
Figure 12: Business process automation as just one of several process optimization strategies
The preceding figure is based on the BPTrends Process Change Methodology introduced in the book Business Process Change by Paul Harmon. Again, just one of these technical capabilities in strategic BPM is the topic of this article: process automation. It is not feasible to suggest automating all the business processes of any organization.
Key performance indicators
Within a BA project (strategy and value chain models), there are three different types of KPIs that can be defined:
Manual KPI: This allows us to enter a known value
Rollup KPI: This evaluates an aggregate of its child KPIs
External KPI: This provides a way to include KPI data from applications other than BPM Suite, such as SAP, E-Business Suite, PeopleSoft, and so on
Additionally, KPIs can be defined at the BPMN process level, which is not covered in this article.
KPIs at the value chain step level
The following are the steps to configure KPIs at the value chain step level:
Open the value chain model Rental Request-to-Delivery.
Right-click on the Vehicle Reservation & Allocation chain step, and select KPI.
Click on the + (plus) sign to create a manual KPI, as illustrated in the next screenshot.
The following image shows the configuration of the KPIs:
Figure 13: Configuring a KPI
Why we need a new methodology for Game Enterprise BPM
Now, Game Enterprise BPM needs to be played everywhere. This implies that Game Silo BPM needs to diminish, meaning it needs to be replaced gradually, through managed evolution, league by league, aiming for excellence at Champions League level. We can't play Game Enterprise BPM with the same culture of ad hoc, joyful creativity that we find in Game Silo BPM. We can't just approach our colleague—let's call him Ingo Maier—who we know has drawn the model for a process we are interested in; we can't simply walk over to his desk and ask him about the meaning of an unclear section in the process.
That is because in Game Enterprise BPM, Ingo Maier, as a person whom we know as part of our silo team, does not exist anymore. We deal with process models, with SOA services, and with a language defined somewhere else, in another department. This is what makes it so hard to move up in the BPM leagues. Hiding behind the buzzword "agile" does not help. In order to raise BPM maturity when moving from Game Silo BPM to Game Enterprise BPM, the organization needs to establish a set of standards, guidelines, tools, and modes of operation that allow it to play and succeed at Champions League level. Additionally, we have to define the modes of operation and the broad steps that lead to a desired state. This formalization of collaboration in teams should be described, agreed on, and lived as our BPM methodology. The methodology strives for a team in which each player contributes to one coherent game along well-defined phases.
Political change through Game Enterprise BPM
The political problem with a cross-departmental process view becomes apparent if we look at the way organizations distribute political power in Game Silo BPM. The heads of departments form the most potent management layer, while the end-to-end business process has no potent stakeholder. Thus, it is critical for any Game Enterprise BPM effort to establish a good balance of de facto power, with process owners acting as stakeholders for the cross-departmental process view. This fills the void of end-to-end process owners left by Game Silo BPM. With Game Enterprise BPM, the focus shifts from departmental improvements to the KPIs and improvement of the core business processes.
Pair modeling the value chains and business processes
Value chains and process models down to a still fairly high level, such as level 3, can be modeled by process experts without involving technically skilled people. They should omit all technical details. To provide the foundation for automated processes, we need to add more domain knowledge and some technical detail. Therefore, these domain process experts meet with BPM tool experts to jointly define the next level of detail in BPMN. In analogy to the practice of pair programming in agile methodologies, you could call this kind of collaboration pair modeling. Ideally, the process expert(s) and the tool expert look at the same screen and discuss how to improve the flow of the process model, while the visual representation evolves to cover variants, exceptions, and a better understanding of the involved business objects. For many organizations that are used to a waterfall process, this is a fundamentally new way of requirements gathering that might be a challenge for some. The practice is analogous to the on-site customer practice in agile methodologies. This new way of close collaboration for process modeling is crucial for the success of BPM projects, since it allows us to establish a deep and shared understanding in a very pragmatic and productive way.
Figure 14: Roles and successful modes of collaboration
When the process is modeled in sufficient detail to clearly depict an algorithmic definition of its flow and all its variants, the model can be handed over to BPM developers. They add all the technical bells and whistles, such as data mappings, decision rules, service calls, and exception handling. Portal developers will work on their implementation of the use cases. SOA developers will use Oracle SOA Suite to integrate with backend systems, thereby implementing the SOA services; a minimal sketch of what such a service call looks like at the BPMN level follows below.
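As an illustration, the following is a minimal, hypothetical sketch of what a backend service call with basic exception handling can look like once these technical details have been added to the BPMN 2.0 model. The service, operation, and message names (VehicleReservationService, reserveVehicle, ReservationRequest) are assumptions for the RYLC example, not artifacts generated by Oracle's tooling; in a real project, they would be bound to concrete SOA Suite composites and WSDL definitions:

<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL"
             id="ReservationDefinitions"
             targetNamespace="http://www.example.com/rylc/bpm">
  <!-- hypothetical backend service, assumed to be exposed via Oracle SOA Suite -->
  <message id="reservationRequestMessage" name="ReservationRequest"/>
  <interface id="reservationInterface" name="VehicleReservationService">
    <operation id="reserveVehicleOp" name="reserveVehicle">
      <inMessageRef>reservationRequestMessage</inMessageRef>
    </operation>
  </interface>
  <process id="VehicleReservation" name="Vehicle Reservation and Allocation" isExecutable="true">
    <startEvent id="reservationRequested" name="Reservation requested"/>
    <serviceTask id="reserveVehicle" name="Reserve vehicle"
                 implementation="##WebService" operationRef="reserveVehicleOp"/>
    <!-- boundary error event catches faults raised by the service call -->
    <boundaryEvent id="reservationFault" attachedToRef="reserveVehicle">
      <errorEventDefinition/>
    </boundaryEvent>
    <userTask id="handleFault" name="Handle reservation error"/>
    <endEvent id="vehicleReserved" name="Vehicle reserved"/>
    <endEvent id="reservationFailed" name="Reservation failed"/>
    <sequenceFlow id="s1" sourceRef="reservationRequested" targetRef="reserveVehicle"/>
    <sequenceFlow id="s2" sourceRef="reserveVehicle" targetRef="vehicleReserved"/>
    <sequenceFlow id="s3" sourceRef="reservationFault" targetRef="handleFault"/>
    <sequenceFlow id="s4" sourceRef="handleFault" targetRef="reservationFailed"/>
  </process>
</definitions>

This is the level at which the pair-modeled business view and the developers' technical additions meet: the flow itself stays recognizable to the process experts, while data mappings, fault policies, and the binding of the service task to the actual backend service are layered on top by the development team.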
The discussed notion of a handover from higher-level business process models to development teams can also be used to mark the line at which it might make sense to outsource parts of the overall development.
Summary
In this article, we saw how BPM, as an approach to model, automate, and optimize business processes, is typically applied at a departmental level. We saw how BPM Suite 12c introduces new features that allow us to cross the bridge towards top-down, cross-departmental, enterprise-level BPM. We depicted the key characteristics of the enterprise BPM methodology, which aligns corporate or strategic activities with actual process automation projects. We learned the importance of modeling standards and guidelines, which should be used to gain business process insight and understanding broadly throughout the enterprise. The goal is to establish a shared language to talk about the capabilities and processes of the overall organization and the services it provides to its customers. We also examined the role of data in the SOA that supports business processes: a critical success factor is the definition of the business data model that, along with services, forms the connection between the process layer, the user interface layer, and the services layer. Finally, we understood how important it is to separate application logic from service logic and process logic to ensure that the benefits of a process-driven architecture are realized.
Resources for Article:
Further resources on this subject:
Introduction to Oracle BPM [article]
Oracle B2B Overview [article]
Event-driven BPEL Process [article]