
How-To Tutorials - Design Patterns


What is a multi-layered software architecture?

Packt Editorial Staff
17 May 2018
7 min read
Multi-layered software architecture is one of the most popular architectural patterns today. It moderates the increasing complexity of modern applications and makes it easier to work in a more agile manner, which is important when you consider the dominance of DevOps and other similar methodologies today. Sometimes called tiered architecture, or n-tier architecture, a multi-layered software architecture consists of various layers, each of which corresponds to a different service or integration. Because each layer is separate, making changes to one layer is easier than having to tackle the entire architecture. Let's take a look at how a multi-layered software architecture works, and what its advantages and disadvantages are. This has been taken from the book Architectural Patterns. Find it here.

What does a layered software architecture consist of?

Before we get into multi-layered architecture, let's start with the simplest form of layered architecture: three-tiered architecture. This is a good place to start because every layered software architecture contains these three elements. These are the foundations:

Presentation layer: This is the first and topmost layer of the application. It provides presentation services, that is, the presentation of content to the end user through a GUI. It can be accessed through any type of client device: desktop, laptop, tablet, mobile, thin client, and so on. For the content to be displayed to the user, the relevant web pages are fetched by the web browser or another presentation component running on the client device. To present the content, this tier must interact with the tiers beneath it.

Application layer: This is the middle tier of the architecture, in which the business logic of the application runs. Business logic is the set of rules required for running the application as per the guidelines laid down by the organization. The components of this tier typically run on one or more application servers.

Data layer: This is the lowest tier of the architecture and is mainly concerned with the storage and retrieval of application data. The data is typically stored in a database server, file server, or any other device or medium that supports data access logic. The data tier exposes only the data itself, without granting access to the storage and retrieval mechanisms, by providing an API to the application tier. This API keeps all data operations in this tier completely transparent to the application tier: for example, updates or upgrades to the systems in this tier do not affect the application tier.

The diagram below shows how a simple layered architecture with 3 tiers works.

These three layers are essential, but other layers can be built on top of them. That's when we get into multi-layered architecture. It's sometimes called n-tiered architecture because the number of tiers or layers (n) could be anything; it depends on what you need and how much complexity you're able to handle.

Multi-layered software architecture

A multi-layered software architecture still has the presentation layer and data layer. It simply splits up and expands the application layer. These additional aspects within the application layer are essentially different services.
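To make the tier separation concrete, here is a minimal, hypothetical sketch in JavaScript. It is not from the book, and the object and function names are illustrative; it only shows how each tier talks exclusively to the tier beneath it:

```js
// Hypothetical three-tier sketch; names are illustrative, not from the book.

// Data layer: exposes an API and hides the storage mechanism.
const dataLayer = {
  store: new Map(), // illustrative in-memory store; could be a database server
  getUser (id) { return this.store.get(id) },
  saveUser (id, user) { this.store.set(id, user) }
}

// Application layer: business logic only; knows nothing about how data is stored.
const applicationLayer = {
  registerUser (id, name) {
    if (!name) throw new Error('A user must have a name') // a business rule
    dataLayer.saveUser(id, { name })
  },
  greet (id) {
    return 'Hello, ' + dataLayer.getUser(id).name + '!'
  }
}

// Presentation layer: renders output for the client device.
applicationLayer.registerUser(1, 'Ada')
console.log(applicationLayer.greet(1)) // -> Hello, Ada!
```

Because each tier sees only the interface of the one beneath it, the in-memory Map could be swapped for a real database without touching the application or presentation code.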
This means your software should now be more scalable and have extra dimensions of functionality. Of course, the distribution of application code and functions among the various tiers will vary from one architectural design to another, but the concept remains the same. The diagram below illustrates what a multi-layered software architecture looks like. As you can see, it's a little more complex than a three-tiered architecture, but it does increase scalability quite significantly.

What are the benefits of a layered software architecture?

A layered software architecture has a number of benefits; that's why it has become such a popular architectural pattern in recent years. Most importantly, tiered segregation allows you to manage and maintain each layer independently. In theory it should greatly simplify the way you manage your software infrastructure. The multi-layered approach is particularly good for developing web-scale, production-grade, and cloud-hosted applications quickly and relatively risk-free. It also makes it easier to update legacy systems: when your architecture is broken up into multiple layers, the changes that need to be made should be simpler and less extensive than they might otherwise have to be.

When should you use a multi-layered software architecture?

The argument for a multi-layered software architecture is clear, but there are some instances when it is particularly appropriate:

If you are building a system in which it is possible to split the application logic into smaller components that could be spread across several servers. This could lead to the design of multiple tiers within the application tier.

If the system under consideration requires faster network communications, high reliability, and great performance. N-tier architecture can provide that, as this pattern is designed to reduce the overhead caused by network traffic.

An example of a multi-layered software architecture

We can illustrate how a multi-layered architecture works with the example of a shopping cart web application, the kind present on every e-commerce site. The shopping cart web application is used by the e-commerce site user to complete the purchase of items through the site. You'd expect the application to have several features that allow the user to:

Add selected items to the cart
Change the quantity of items in their cart
Make payments

The client tier of the shopping cart application interacts with the end user through a GUI, and also interacts with the application running on the application servers present in multiple tiers. Since the shopping cart is a web application, the client tier contains the web browser. The presentation tier displays information related to services like browsing merchandise, buying items, and adding them to the shopping cart. It communicates with the other tiers by sending results to the client tier and to the other tiers in the network. The presentation tier also makes calls to database stored procedures and web services. All these activities are done with the objective of providing a quick response time to the end user.
The presentation tier plays a vital role by acting as the glue that binds the entire shopping cart application together, allowing the functions in different tiers to communicate with each other and displaying the output to the end user through the web browser. In this multi-layered architecture, the business logic required for processing activities, such as the calculation of shipping costs, is pulled from the application tier into the presentation tier. The application tier also acts as the integration layer, allowing the application to communicate seamlessly with both the data tier and the presentation tier. The last tier, the data tier, is used to maintain data; it typically contains database servers. This layer keeps data independent from the application server and the business logic, which provides enhanced scalability and performance to the data tier.

Read next:
Microservices and Service Oriented Architecture
What is serverless architecture and why should I be interested?


Python Design Patterns in Depth: The Singleton Pattern

Packt
15 Feb 2016
14 min read
There are situations where you need to create only one instance of data throughout the lifetime of a program. This can be a class instance, a list, or a dictionary, for example. The creation of a second instance is undesirable: it can result in logical errors or malfunctioning of the program. The design pattern that allows you to create only one instance of data is called singleton. In this article, you will learn about module-level, classic, and borg singletons; you'll also learn how they work and when to use them, and you'll build a two-threaded web crawler that uses a singleton to access a shared resource.

Singleton is the best candidate when the requirements are as follows:

Controlling concurrent access to a shared resource
If you need a global point of access for the resource from multiple or different parts of the system
When you need to have only one object

Some typical use cases of a singleton are:

The logging class and its subclasses (a global point of access for the logging class to send messages to the log)
Printer spooler (your application should have only a single instance of the spooler in order to avoid conflicting requests for the same resource)
Managing a connection to a database
File manager
Retrieving and storing information in external configuration files
Read-only singletons storing some global state (user language, time, time zone, application path, and so on)

There are several ways to implement singletons. We will look at the module-level singleton, the classic singleton, and the borg singleton.

Module-level singleton

All modules are singletons by nature because of Python's module importing steps: check whether the module is already imported; if yes, return it; if not, find the module, initialize it, and return it. Initializing a module means executing its code, including all module-level assignments. When you import a module for the first time, all of the initialization is done; if you try to import the module a second time, Python returns the already-initialized module. Thus the initialization is not repeated, and you get the previously imported module with all of its data. So, if you want to quickly make a singleton, use the following approach and keep the shared data as a module attribute.

singleton.py:

```python
only_one_var = "I'm only one var"
```

module1.py:

```python
import singleton

print singleton.only_one_var
singleton.only_one_var += " after modification"

import module2
```

module2.py:

```python
import singleton

print singleton.only_one_var
```

Here, module1 imports the global variable from the singleton module and changes its value; when module2 is imported afterwards, it sees the changed variable. This approach is quick and sometimes is all that you need; however, we need to consider the following points:

It's pretty error-prone. For example, if you happen to forget the global statement inside a function, a variable local to that function will be created and the module's variable won't be changed, which is not what you want.
It's ugly, especially if you have a lot of objects that should remain singletons.
It pollutes the module namespace with unnecessary variables.
It doesn't permit lazy allocation and initialization; all global variables are loaded during the module import process.
It's not possible to reuse the code because you cannot use inheritance.
There are no special methods and no object-oriented programming benefits at all.

Classic singleton

In a classic singleton in Python, we check whether an instance is already created.
If it is created, we return it; otherwise, we create a new instance, assign it to a class attribute, and return it. Let's try to create a dedicated singleton class:

```python
class Singleton(object):
    def __new__(cls):
        if not hasattr(cls, 'instance'):
            cls.instance = super(Singleton, cls).__new__(cls)
        return cls.instance
```

Here, in the special __new__ method, which is called before __init__, we check whether an instance has already been created. If not, we create a new instance; otherwise, we return the already created one. Let's check how it works:

```python
>>> singleton = Singleton()
>>> another_singleton = Singleton()
>>> singleton is another_singleton
True
>>> singleton.only_one_var = "I'm only one var"
>>> another_singleton.only_one_var
"I'm only one var"
```

Now try to subclass the Singleton class:

```python
class Child(Singleton):
    pass
```

If Child is a successor of Singleton, all of its instances should also be instances of Singleton, thus sharing its state. But this doesn't work, as the following code illustrates:

```python
>>> child = Child()
>>> child is singleton
False
>>> child.only_one_var
AttributeError: Child instance has no attribute 'only_one_var'
```

To avoid this situation, the borg singleton is used.

Borg singleton

Borg is also known as monostate. In the borg pattern, all of the instances are different, but they share the same state. In the following code, the shared state is maintained in the _shared_state attribute, and all new instances of the Borg class will get this state, as defined in its __new__ method:

```python
class Borg(object):
    _shared_state = {}

    def __new__(cls, *args, **kwargs):
        obj = super(Borg, cls).__new__(cls, *args, **kwargs)
        obj.__dict__ = cls._shared_state
        return obj
```

Generally, Python stores the instance state in the __dict__ dictionary, and when instantiated normally, every instance has its own __dict__. Here, we deliberately assign the class variable _shared_state to all of the created instances. This is how it works with subclassing:

```python
class Child(Borg):
    pass

>>> borg = Borg()
>>> another_borg = Borg()
>>> borg is another_borg
False
>>> child = Child()
>>> borg.only_one_var = "I'm the only one var"
>>> child.only_one_var
"I'm the only one var"
```

So, despite the fact that you can't compare the objects by identity using the is operator, all child objects share the parent's state. If you want a class that is a descendant of the Borg class but has a different state, you can reset _shared_state as follows:

```python
class AnotherChild(Borg):
    _shared_state = {}

>>> another_child = AnotherChild()
>>> another_child.only_one_var
AttributeError: AnotherChild instance has no attribute 'only_one_var'
```

Which type of singleton should be used is up to you. If you expect that your singleton will not be inherited, you can choose the classic singleton; otherwise, it's better to stick with borg.

Implementation in Python

As a practical example, we'll create a simple web crawler that scans a website, follows all the links that lead to the same website but to other pages, and downloads all of the images it finds. To do this, we'll need two functions: one that scans a website for links leading to other pages, to build a set of pages to visit, and one that scans a page for images and downloads them. To make it quicker, we'll download images in two threads.
These two threads should not interfere with each other: a thread should not scan pages that another thread has already scanned, and it should not download images that have already been downloaded. So, a set of downloaded images and a set of scanned web pages will be the shared resources of our application, and we'll keep them in a singleton instance. In this example, you will need BeautifulSoup, a library for parsing and screen-scraping websites, and the HTTP client library httplib2. It should be sufficient to install both with either of the following commands:

    $ sudo pip install BeautifulSoup httplib2
    $ sudo easy_install BeautifulSoup httplib2

First of all, we'll create a Singleton class. Let's use the classic singleton in this example:

```python
import httplib2
import os
import re
import threading
import urllib
from urlparse import urlparse, urljoin

from BeautifulSoup import BeautifulSoup

class Singleton(object):
    def __new__(cls):
        if not hasattr(cls, 'instance'):
            cls.instance = super(Singleton, cls).__new__(cls)
        return cls.instance
```

It will return the singleton object to all parts of the code that request it. Next, we'll create a class for the thread in which we'll download images from the website:

```python
class ImageDownloaderThread(threading.Thread):
    """A thread for downloading images in parallel."""
    def __init__(self, thread_id, name, counter):
        threading.Thread.__init__(self)
        self.name = name

    def run(self):
        print 'Starting thread ', self.name
        download_images(self.name)
        print 'Finished thread ', self.name
```

The following function traverses the website using a BFS algorithm, finds links, and adds them to a set for further downloading. We can specify the maximum number of links to follow if the website is too large:

```python
def traverse_site(max_links=10):
    link_parser_singleton = Singleton()

    # While we have pages to parse in the queue
    while link_parser_singleton.queue_to_parse:
        # If we have collected enough links to download images from, return
        if len(link_parser_singleton.to_visit) == max_links:
            return

        url = link_parser_singleton.queue_to_parse.pop()
        http = httplib2.Http()
        try:
            status, response = http.request(url)
        except Exception:
            continue

        # Skip if not a web page
        if status.get('content-type') != 'text/html':
            continue

        # Add the link to the queue for downloading images
        link_parser_singleton.to_visit.add(url)
        print 'Added', url, 'to queue'

        bs = BeautifulSoup(response)
        for link in BeautifulSoup.findAll(bs, 'a'):
            link_url = link.get('href')
            # The <a> tag may not contain an href attribute
            if not link_url:
                continue

            parsed = urlparse(link_url)
            # If the link points to an external webpage, skip it
            if parsed.netloc and parsed.netloc != parsed_root.netloc:
                continue

            # Construct a full url from a link, which can be relative
            link_url = (parsed.scheme or parsed_root.scheme) + '://' + (parsed.netloc or parsed_root.netloc) + parsed.path or ''

            # If the link was added previously, skip it
            if link_url in link_parser_singleton.to_visit:
                continue

            # Add the link for further parsing
            link_parser_singleton.queue_to_parse = [link_url] + link_parser_singleton.queue_to_parse
```

The following function downloads images from the last web page in the singleton's to_visit queue and saves them to the images directory.
Here, we use the singleton to synchronize the shared data, the set of pages to visit, between the two threads:

```python
def download_images(thread_name):
    singleton = Singleton()

    # While we have pages from which we have not downloaded images
    while singleton.to_visit:
        url = singleton.to_visit.pop()
        http = httplib2.Http()
        print thread_name, 'Starting downloading images from', url

        try:
            status, response = http.request(url)
        except Exception:
            continue

        bs = BeautifulSoup(response)
        # Find all <img> tags
        images = BeautifulSoup.findAll(bs, 'img')
        for image in images:
            # Get the image source url, which can be absolute or relative
            src = image.get('src')
            # Construct a full url. If the image url is relative,
            # it will be prepended with the webpage domain.
            # If the image url is absolute, it will remain as is.
            src = urljoin(url, src)
            # Get a base name, for example 'image.png', to name the file locally
            basename = os.path.basename(src)
            if src not in singleton.downloaded:
                singleton.downloaded.add(src)
                print 'Downloading', src
                # Download the image to the local filesystem
                urllib.urlretrieve(src, os.path.join('images', basename))

        print thread_name, 'finished downloading images from', url
```

Our client code is as follows:

```python
if __name__ == '__main__':
    root = 'http://python.org'
    parsed_root = urlparse(root)

    singleton = Singleton()
    singleton.queue_to_parse = [root]
    # A set of urls to download images from
    singleton.to_visit = set()
    # Downloaded images
    singleton.downloaded = set()

    traverse_site()

    # Create the images directory if it does not exist
    if not os.path.exists('images'):
        os.makedirs('images')

    # Create new threads
    thread1 = ImageDownloaderThread(1, "Thread-1", 1)
    thread2 = ImageDownloaderThread(2, "Thread-2", 2)

    # Start the new threads
    thread1.start()
    thread2.start()
```

Run the crawler using the following command:

    $ python crawler.py

Your output may vary, because the order in which the threads access the shared resources is not predictable. If you go to the images directory, you will find the downloaded images there.

Summary

To learn more about design patterns in depth, the following books published by Packt Publishing (https://www.packtpub.com/) are recommended:

Learning Python Design Patterns – Second Edition (https://www.packtpub.com/application-development/learning-python-design-patterns-second-edition)
Mastering Python Design Patterns (https://www.packtpub.com/application-development/mastering-python-design-patterns)

Further resources on this subject:
Python Design Patterns in Depth: The Factory Pattern [Article]
Recommending Movies at Scale (Python) [Article]
Customizing IPython [Article]


An Introduction to Node.js Design Patterns

Packt
18 Feb 2016
27 min read
A design pattern is a reusable solution to a recurring problem; the term is really broad in its definition and can span multiple domains of application. However, the term is often associated with a well-known set of object-oriented patterns that were popularized in the 90s by the book Design Patterns: Elements of Reusable Object-Oriented Software, Pearson Education, by the almost legendary Gang of Four (GoF): Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. We will often refer to this specific set of patterns as traditional design patterns, or GoF design patterns.

Applying this set of object-oriented design patterns in JavaScript is not as linear and formal as it would be in a class-based object-oriented language. As we know, JavaScript is multi-paradigm, object-oriented, and prototype-based, and has dynamic typing; it treats functions as first-class citizens and allows functional programming styles. These characteristics make JavaScript a very versatile language, which gives tremendous power to the developer, but at the same time causes a fragmentation of programming styles, conventions, techniques, and ultimately the patterns of its ecosystem. There are so many ways to achieve the same result using JavaScript that everybody has their own opinion on the best way to approach a problem. A clear demonstration of this phenomenon is the abundance of frameworks and opinionated libraries in the JavaScript ecosystem; probably no other language has ever seen so many, especially now that Node.js has given astonishing new possibilities to JavaScript and has created so many new scenarios.

In this context, the traditional design patterns too are affected by the nature of JavaScript. There are so many ways in which they can be implemented that their traditional, strongly object-oriented implementation is not a pattern anymore, and in some cases not even possible, because JavaScript, as we know, doesn't have real classes or abstract interfaces. What doesn't change, though, is the original idea at the base of each pattern, the problem it solves, and the concepts at the heart of the solution. In this article, we will see how some of the most important GoF design patterns apply to Node.js and its philosophy, thus rediscovering their importance from another perspective. The design patterns explored in this article are: Factory, Proxy, Decorator, Adapter, Strategy, State, Template, Middleware, and Command.

This article assumes that the reader has some notion of how inheritance works in JavaScript. Please also be advised that throughout this article we will often use generic and more intuitive diagrams to describe a pattern in place of standard UML, since many patterns can have an implementation based not only on classes, but also on objects and even functions.

Factory

We begin our journey starting from what is probably the most simple and common design pattern in Node.js: Factory.

A generic interface for creating objects

We already stressed the fact that, in JavaScript, the functional paradigm is often preferred to a purely object-oriented design, for its simplicity, usability, and small surface area. This is especially true when creating new object instances. In fact, invoking a factory, instead of directly creating a new object from a prototype using the new operator or Object.create(), is much more convenient and flexible in several respects.
First and foremost, a factory allows us to separate the object creation from its implementation; essentially, a factory wraps the creation of a new instance, giving us more flexibility and control in the way we do it. Inside the factory, we can create a new instance leveraging closures, using a prototype and the new operator, using Object.create(), or even returning a different instance based on a particular condition. The consumer of the factory is totally agnostic about how the creation of the instance is carried out. The truth is that, by using new, we are binding our code to one specific way of creating an object, while in JavaScript we can have much more flexibility, almost for free. As a quick example, let's consider a simple factory that creates an Image object:

```js
function createImage(name) {
  return new Image(name);
}
var image = createImage('photo.jpeg');
```

The createImage() factory might look totally unnecessary; why not instantiate the Image class by using the new operator directly? Something like the following line of code:

```js
var image = new Image(name);
```

As we already mentioned, using new binds our code to one particular type of object; in the preceding case, to objects of type Image. A factory instead gives us much more flexibility; imagine that we want to refactor the Image class, splitting it into smaller classes, one for each image format that we support. If we exposed a factory as the only means to create new images, we can simply rewrite it as follows, without breaking any of the existing code:

```js
function createImage(name) {
  if (name.match(/\.jpeg$/)) {
    return new JpegImage(name);
  } else if (name.match(/\.gif$/)) {
    return new GifImage(name);
  } else if (name.match(/\.png$/)) {
    return new PngImage(name);
  } else {
    throw new Error('Unsupported format');
  }
}
```

Our factory also allows us to not expose the constructors of the objects it creates, and prevents them from being extended or modified (remember the principle of small surface area?). In Node.js, this can be achieved by exporting only the factory, while keeping each constructor private.

A mechanism to enforce encapsulation

A factory can also be used as an encapsulation mechanism, thanks to closures. Encapsulation refers to the technique of controlling access to some internal details of an object by preventing external code from manipulating them directly. The interaction with the object happens only through its public interface, isolating the external code from changes in the implementation details of the object. This practice is also referred to as information hiding. Encapsulation is a fundamental principle of object-oriented design, together with inheritance, polymorphism, and abstraction. As we know, in JavaScript we don't have access level modifiers (for example, we can't declare a private variable), so the only way to enforce encapsulation is through function scopes and closures.
A factory makes it straightforward to enforce private variables; consider the following code, for example:

```js
function createPerson(name) {
  var privateProperties = {};

  var person = {
    setName: function(name) {
      if (!name) throw new Error('A person must have a name');
      privateProperties.name = name;
    },
    getName: function() {
      return privateProperties.name;
    }
  };

  person.setName(name);
  return person;
}
```

In the preceding code, we leverage closures to create two objects: a person object, which represents the public interface returned by the factory, and a group of privateProperties that are inaccessible from the outside and can be manipulated only through the interface provided by the person object. For example, in the preceding code we make sure that a person's name is never empty; this would not be possible to enforce if name was just a property of the person object. Factories are only one of the techniques that we have for creating private members; other possible approaches are as follows:

Defining private variables in a constructor (as recommended by Douglas Crockford: http://javascript.crockford.com/private.html)
Using conventions, for example, prefixing the name of a property with an underscore "_" or the dollar sign "$" (this, however, does not technically prevent a member from being accessed from the outside)
Using ES6 WeakMaps (http://fitzgeraldnick.com/weblog/53/)

Building a simple code profiler

Now, let's work on a complete example using a factory. Let's build a simple code profiler, an object with the following properties:

A start() method that triggers the start of a profiling session
An end() method to terminate the session and log its execution time to the console

Let's start by creating a file named profiler.js, which will have the following content:

```js
function Profiler(label) {
  this.label = label;
  this.lastTime = null;
}

Profiler.prototype.start = function() {
  this.lastTime = process.hrtime();
}

Profiler.prototype.end = function() {
  var diff = process.hrtime(this.lastTime);
  console.log('Timer "' + this.label + '" took ' +
    diff[0] + ' seconds and ' + diff[1] + ' nanoseconds.');
}
```

There is nothing fancy in the preceding class; we simply use the default high resolution timer to save the current time when start() is invoked, and then calculate the elapsed time when end() is executed, printing the result to the console. Now, if we were to use such a profiler in a real-world application to calculate the execution time of different routines, we can easily imagine the huge amount of logging we would generate to the standard output, especially in a production environment. What we might want to do instead is redirect the profiling information to another source, for example, a database, or alternatively, disable the profiler altogether if the application is running in production mode. It's clear that if we were to instantiate a Profiler object directly by using the new operator, we would need some extra logic in the client code or in the Profiler object itself in order to switch between the different logics. We can instead use a factory to abstract the creation of the Profiler object so that, depending on whether the application runs in production or development mode, we can return a fully working Profiler object or, alternatively, a mock object with the same interface but with empty methods. Let's do this then: in the profiler.js module, instead of exporting the Profiler constructor, we will export only a function, our factory.
The following is its code:

```js
module.exports = function(label) {
  if (process.env.NODE_ENV === 'development') {
    return new Profiler(label);         //[1]
  } else if (process.env.NODE_ENV === 'production') {
    return {                            //[2]
      start: function() {},
      end: function() {}
    }
  } else {
    throw new Error('Must set NODE_ENV');
  }
}
```

The factory that we created abstracts the creation of a profiler object from its implementation:

If the application is running in development mode, we return a new, fully functional Profiler object [1]
If instead the application is running in production mode, we return a mock object where the start() and end() methods are empty functions [2]

The nice feature to highlight is that, thanks to JavaScript's dynamic typing, we were able to return an object instantiated with the new operator in one circumstance and a simple object literal in the other (this is also known as duck typing: http://en.wikipedia.org/wiki/Duck_typing). Our factory is doing its job perfectly; we can really create objects in any way that we like inside the factory function, and we can execute additional initialization steps or return a different type of object based on particular conditions, all while isolating the consumer of the object from these details. We can easily understand the power of this simple pattern. Now we can play with our profiler; this is a possible use case for the factory that we just created:

```js
var profiler = require('./profiler');

function getRandomArray(len) {
  var p = profiler('Generating a ' + len + ' items long array');
  p.start();
  var arr = [];
  for (var i = 0; i < len; i++) {
    arr.push(Math.random());
  }
  p.end();
}

getRandomArray(1e6);
console.log('Done');
```

The p variable contains the instance of our Profiler object, but we don't know how it was created or what its implementation is at this point in the code. If we include the preceding code in a file named profilerTest.js, we can easily test these assumptions. To try the program with profiling enabled, run the following command:

    export NODE_ENV=development; node profilerTest

The preceding command enables the real profiler and prints the profiling information to the console. If we want to try the mock profiler instead, we can run the following command:

    export NODE_ENV=production; node profilerTest

The example that we just presented is a simple application of the factory function pattern, but it clearly shows the advantages of separating an object's creation from its implementation.

In the wild

As we said, factories are very popular in Node.js. Many packages offer only a factory for creating new instances; some examples are the following:

Dnode (https://npmjs.org/package/dnode): This is an RPC system for Node.js. If we look into its source code, we will see that its logic is implemented in a class named D; however, this is never exposed to the outside, as the only exported interface is a factory, which allows us to create new instances of the class. You can take a look at its source code at https://github.com/substack/dnode/blob/34d1c9aa9696f13bdf8fb99d9d039367ad873f90/index.js#L7-9.
Restify (https://npmjs.org/package/restify): This is a framework to build REST APIs that allows us to create new instances of a server using the restify.createServer() factory, which internally creates a new instance of the Server class (which is not exported). You can take a look at its source code at https://github.com/mcavage/node-restify/blob/5f31e2334b38361ac7ac1a5e5d852b7206ef7d94/lib/index.js#L91-116.
Other modules expose both a class and a factory, but document the factory as the main method, or the most convenient way, to create new instances; some examples are as follows:

http-proxy (https://npmjs.org/package/http-proxy): This is a programmable proxying library, where new instances are created with httpProxy.createProxyServer(options).
The core Node.js HTTP server: This is where new instances are mostly created using http.createServer(), even though this is essentially a shortcut for new http.Server().
bunyan (https://npmjs.org/package/bunyan): This is a popular logging library; in its readme file the contributors propose a factory, bunyan.createLogger(), as the main method to create new instances, even though this would be equivalent to running new bunyan().

Some other modules provide a factory to wrap the creation of other components. Popular examples are through2 and from2, which allow us to simplify the creation of new streams using a factory approach, freeing the developer from explicitly using inheritance and the new operator.

Proxy

A proxy is an object that controls access to another object, called the subject. The proxy and the subject have an identical interface, and this allows us to transparently swap one for the other; in fact, the alternative name for this pattern is surrogate. A proxy intercepts all or some of the operations that are meant to be executed on the subject, augmenting or complementing their behavior. Because the proxy and the subject share the same interface, this is totally transparent to the client, who can use one or the other interchangeably. The proxy forwards each operation to the subject, enhancing its behavior with additional preprocessing or post-processing. It's important to observe that we are not talking about proxying between classes; the Proxy pattern involves wrapping actual instances of the subject, thus preserving its state.

A proxy is useful in several circumstances; for example, consider the following ones:

Data validation: The proxy validates the input before forwarding it to the subject
Security: The proxy verifies that the client is authorized to perform the operation, and passes the request to the subject only if the outcome of the check is positive
Caching: The proxy keeps an internal cache so that operations are executed on the subject only if the data is not yet present in the cache
Lazy initialization: If the creation of the subject is expensive, the proxy can delay it until it's really necessary
Logging: The proxy intercepts the method invocations and the relative parameters, recording them as they happen
Remote objects: A proxy can take an object that is located remotely and make it appear local

Of course, there are many more applications for the Proxy pattern, but these should give us an idea of the extent of its purpose.

Techniques for implementing proxies

When proxying an object, we can decide to intercept all of its methods or only some of them, while delegating the rest directly to the subject. There are several ways in which this can be achieved; let's analyze some of them.

Object composition

Composition is the technique whereby an object is combined with another object for the purpose of extending or using its functionality.
In the specific case of the Proxy pattern, a new object with the same interface as the subject is created, and a reference to the subject is stored internally in the proxy in the form of an instance variable or a closure variable. The subject can be injected from the client at creation time or created by the proxy itself. The following is one example of this technique using a pseudo class and a factory:

```js
function createProxy(subject) {
  var proto = Object.getPrototypeOf(subject);

  function Proxy(subject) {
    this.subject = subject;
  }
  Proxy.prototype = Object.create(proto);

  //proxied method
  Proxy.prototype.hello = function() {
    return this.subject.hello() + ' world!';
  }

  //delegated method
  Proxy.prototype.goodbye = function() {
    return this.subject.goodbye.apply(this.subject, arguments);
  }

  return new Proxy(subject);
}
```

To implement a proxy using composition, we have to intercept the methods that we are interested in manipulating (such as hello()), while simply delegating the rest of them to the subject (as we did with goodbye()). The preceding code also shows the particular case where the subject has a prototype and we want to maintain the correct prototype chain, so that executing proxy instanceof Subject will return true; we used pseudo-classical inheritance to achieve this. This is just an extra step, required only if we are interested in maintaining the prototype chain, which can be useful in order to improve the compatibility of the proxy with code initially meant to work with the subject. However, as JavaScript has dynamic typing, most of the time we can avoid using inheritance and use more immediate approaches. For example, an alternative implementation of the proxy presented in the preceding code might just use an object literal and a factory:

```js
function createProxy(subject) {
  return {
    //proxied method
    hello: function() {
      return subject.hello() + ' world!';
    },
    //delegated method
    goodbye: function() {
      return subject.goodbye.apply(subject, arguments);
    }
  };
}
```

If we want to create a proxy that delegates most of its methods, it would be convenient to generate these automatically using a library such as delegates (https://npmjs.org/package/delegates).

Object augmentation

Object augmentation (or monkey patching) is probably the most pragmatic way of proxying individual methods of an object; it consists of modifying the subject directly by replacing a method with its proxied implementation. Consider the following example:

```js
function createProxy(subject) {
  var helloOrig = subject.hello;
  subject.hello = function() {
    return helloOrig.call(this) + ' world!';
  }
  return subject;
}
```

This technique is definitely the most convenient one when we need to proxy only one or a few methods, but it has the drawback of modifying the subject object directly.

A comparison of the different techniques

Composition can be considered the safest way of creating a proxy, because it leaves the subject untouched, without mutating its original behavior. Its only drawback is that we have to manually delegate all the methods, even if we want to proxy only one of them. If needed, we might also have to delegate access to the properties of the subject. Object properties can be delegated using Object.defineProperty(); find out more at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/defineProperty. Object augmentation, on the other hand, modifies the subject, which might not always be what we want, but it does not present the various inconveniences related to delegation.
For this reason, object augmentation is definitely the most pragmatic way to implement proxies in JavaScript, and it's the preferred technique in all those circumstances where modifying the subject is not a big concern. However, there is at least one situation where composition is almost necessary: when we want to control the initialization of the subject, for example, to create it only when needed (lazy initialization). It is worth pointing out that by using a factory function (createProxy() in our examples), we can shield our code from the technique used to generate the proxy.

Creating a logging Writable stream

To see the Proxy pattern in a real example, we will now build an object that acts as a proxy to a Writable stream, intercepting all the calls to the write() method and logging a message every time this happens. We will use object composition to implement our proxy; this is how the loggingWritable.js file looks:

```js
function createLoggingWritable(writableOrig) {
  var proto = Object.getPrototypeOf(writableOrig);

  function LoggingWritable(subject) {
    this.writableOrig = subject;
  }
  LoggingWritable.prototype = Object.create(proto);

  LoggingWritable.prototype.write = function(chunk, encoding, callback) {
    if (!callback && typeof encoding === 'function') {
      callback = encoding;
      encoding = undefined;
    }
    console.log('Writing ', chunk);
    return this.writableOrig.write(chunk, encoding, function() {
      console.log('Finished writing ', chunk);
      callback && callback();
    });
  };

  LoggingWritable.prototype.on = function() {
    return this.writableOrig.on.apply(this.writableOrig, arguments);
  };

  LoggingWritable.prototype.end = function() {
    return this.writableOrig.end.apply(this.writableOrig, arguments);
  }

  return new LoggingWritable(writableOrig);
}
```

In the preceding code, we created a factory that returns a proxied version of the writable object passed as an argument. We provide an override for the write() method that logs a message to the standard output every time it is invoked and every time the asynchronous operation completes. This is also a good example of the particular case of creating proxies of asynchronous functions, which makes it necessary to proxy the callback as well; this is an important detail to consider in a platform such as Node.js. The remaining methods, on() and end(), are simply delegated to the original writable stream (to keep the code leaner, we are not considering the other methods of the Writable interface). We can now include a few more lines of code in the loggingWritable.js module to test the proxy that we just created:

```js
var fs = require('fs');

var writable = fs.createWriteStream('test.txt');
var writableProxy = createLoggingWritable(writable);

writableProxy.write('First chunk');
writableProxy.write('Second chunk');
writable.write('This is not logged');
writableProxy.end();
```

The proxy did not change the original interface of the stream or its external behavior, but if we run the preceding code, we will now see that every chunk written into the stream via the proxy is transparently logged to the console.

Proxy in the ecosystem – function hooks and AOP

In its numerous forms, Proxy is quite a popular pattern in Node.js and in its ecosystem. In fact, we can find several libraries that allow us to simplify the creation of proxies, most of the time leveraging object augmentation as the implementation approach.
In the community, this pattern is also referred to as function hooking or sometimes as Aspect Oriented Programming (AOP), which is actually a common area of application for proxies. As in AOP, these libraries usually allow the developer to set pre- or post-execution hooks for a specific method (or a set of methods), which let us execute custom code before and after the execution of the advised method, respectively. Sometimes proxies are also called middleware because, as happens in the middleware pattern, they allow us to preprocess and post-process the input and output of a function. Sometimes, they also allow us to register multiple hooks for the same method using a middleware-like pipeline. There are several libraries on npm that allow us to implement function hooks with little effort. Among them are hooks (https://npmjs.org/package/hooks), hooker (https://npmjs.org/package/hooker), and meld (https://npmjs.org/package/meld).

In the wild

Mongoose (http://mongoosejs.com) is a popular Object-Document Mapping (ODM) library for MongoDB. Internally, it uses the hooks package (https://npmjs.org/package/hooks) to provide pre- and post-execution hooks for the init, validate, save, and remove methods of its Document objects. Find out more in the official documentation at http://mongoosejs.com/docs/middleware.html.

Decorator

Decorator is a structural pattern that consists of dynamically augmenting the behavior of an existing object. It's different from classical inheritance because the behavior is not added to all the objects of the same class, but only to the instances that are explicitly decorated. Implementation-wise, it is very similar to the Proxy pattern, but instead of enhancing or modifying the behavior of the existing interface of an object, it augments it with new functionalities: the Decorator object extends the Component object by adding a new operation (say, a methodC()), while the existing methods are usually delegated to the decorated object without further processing. Of course, if necessary, we can easily combine this with the Proxy pattern, so that the calls to the existing methods can be intercepted and manipulated as well.

Techniques for implementing decorators

Although Proxy and Decorator are conceptually two different patterns with different intents, they practically share the same implementation strategies. Let's revise them.

Composition

Using composition, the decorated component is wrapped around a new object that usually inherits from it. The Decorator in this case simply needs to define the new methods, while delegating the existing ones to the original component:

```js
function decorate(component) {
  var proto = Object.getPrototypeOf(component);

  function Decorator(component) {
    this.component = component;
  }
  Decorator.prototype = Object.create(proto);

  //new method
  Decorator.prototype.greetings = function() {
    //...
  };

  //delegated method
  Decorator.prototype.hello = function() {
    this.component.hello.apply(this.component, arguments);
  };

  return new Decorator(component);
}
```

Object augmentation

Object decoration can also be achieved by simply attaching new methods directly to the decorated object, as follows:

```js
function decorate(component) {
  //new method
  component.greetings = function() {
    //...
  };
  return component;
}
```

The same caveats discussed during the analysis of the Proxy pattern are also valid for Decorator. Let's now practice the pattern with a working example!
Decorating a LevelUP database

Before we start coding the next example, let's spend a few words introducing LevelUP, the module that we are going to work with.

Introducing LevelUP and LevelDB

LevelUP (https://npmjs.org/package/levelup) is a Node.js wrapper around Google's LevelDB, a key-value store originally built to implement IndexedDB in the Chrome browser, but it's much more than that. LevelDB has been defined by Dominic Tarr as the "Node.js of databases" because of its minimalism and extensibility. Like Node.js, LevelDB provides blazing fast performance and only the most basic set of features, allowing developers to build any kind of database on top of it. The Node.js community, in this case Rod Vagg, did not miss the chance to bring the power of this database into Node.js by creating LevelUP. Born as a wrapper for LevelDB, it then evolved to support several kinds of backends: from in-memory stores to other NoSQL databases such as Riak and Redis, to web storage engines such as IndexedDB and localStorage, allowing us to use the same API on both the server and the client, which opens up some really interesting scenarios. Today, there is a full-fledged ecosystem around LevelUP made of plugins and modules that extend the tiny core to implement features such as replication, secondary indexes, live updates, query engines, and more. Complete databases have also been built on top of LevelUP, including CouchDB clones such as PouchDB (https://npmjs.org/package/pouchdb) and CouchUP (https://npmjs.org/package/couchup), and even a graph database, levelgraph (https://npmjs.org/package/levelgraph), that can work both in Node.js and in the browser! Find out more about the LevelUP ecosystem at https://github.com/rvagg/node-levelup/wiki/Modules.

Implementing a LevelUP plugin

In the next example, we are going to show how we can create a simple plugin for LevelUP using the Decorator pattern, and in particular the object augmentation technique, which is the simplest but nonetheless the most pragmatic and effective way to decorate objects with additional capabilities. For convenience, we are going to use the level package (http://npmjs.org/package/level), which bundles both levelup and the default adapter, called leveldown, which uses LevelDB as the backend. What we want to build is a plugin for LevelUP that allows us to receive notifications every time an object matching a certain pattern is saved into the database. For example, if we subscribe to a pattern such as {a: 1}, we want to receive a notification when objects such as {a: 1, b: 3} or {a: 1, c: 'x'} are saved into the database. Let's start to build our small plugin by creating a new module called levelSubscribe.js. We will then insert the following code:

```js
module.exports = function levelSubscribe(db) {
  db.subscribe = function(pattern, listener) {             //[1]
    db.on('put', function(key, val) {                      //[2]
      var match = Object.keys(pattern).every(function(k) { //[3]
        return pattern[k] === val[k];
      });
      if (match) {
        listener(key, val);                                //[4]
      }
    });
  };
  return db;
}
```

That's it for our plugin, and it's extremely simple. Let's see briefly what happens in the preceding code:

We decorated the db object with a new method named subscribe(). We simply attached the method directly to the provided db instance (object augmentation) [1].
We listen for any put operation performed on the database [2].
We perform a very simple pattern-matching algorithm, which verifies that all the properties in the provided pattern are also present in the data being inserted [3].
If we have a match, we notify the listener [4].

Let's now create some code, in a new file named levelSubscribeTest.js, to try out our new plugin:

```js
var level = require('level');                              //[1]
var db = level(__dirname + '/db', {valueEncoding: 'json'});

var levelSubscribe = require('./levelSubscribe');          //[2]
db = levelSubscribe(db);

db.subscribe({doctype: 'tweet', language: 'en'},           //[3]
  function(k, val) {
    console.log(val);
  });

db.put('1', {doctype: 'tweet', text: 'Hi', language: 'en'}); //[4]
db.put('2', {doctype: 'company', name: 'ACME Co.'});
```

This is what we did in the preceding code:

First, we initialize our LevelUP database, choosing the directory where the files will be stored and the default encoding for the values.
Then, we attach our plugin, which decorates the original db object.
At this point, we are ready to use the new feature provided by our plugin, the subscribe() method, where we specify that we are interested in all the objects with doctype: 'tweet' and language: 'en'.
Finally, we save some values in the database, so that we can see our plugin in action.

This example shows a real application of the Decorator pattern in its most simple implementation: object augmentation. It might look like a trivial pattern, but it has undoubted power if used appropriately. For simplicity, our plugin works only in combination with put operations, but it could easily be expanded to work with batch operations too (https://github.com/rvagg/node-levelup#batch).

In the wild

For more examples of how Decorator is used in the real world, we might want to inspect the code of some more LevelUP plugins:

level-inverted-index (https://github.com/dominictarr/level-inverted-index): This is a plugin that adds inverted indexes to a LevelUP database, allowing us to perform simple text searches across the values stored in the database
level-plus (https://github.com/eugeneware/levelplus): This is a plugin that adds atomic updates to a LevelUP database

Summary

To learn more about Node.js, the following books published by Packt Publishing (https://www.packtpub.com/) are recommended:

Node.js Essentials (https://www.packtpub.com/web-development/nodejs-essentials)
Node.js Blueprints (https://www.packtpub.com/web-development/nodejs-blueprints)
Learning Node.js for Mobile Application Development (https://www.packtpub.com/web-development/learning-nodejs-mobile-application-development)
Mastering Node.js (https://www.packtpub.com/web-development/mastering-nodejs)

Further resources on this subject:
Node.js Fundamentals [Article]
Learning Node.js for Mobile Application Development [Article]
Developing a Basic Site with Node.js and Express [Article]


Exploring the Strategy Behavioral Design Pattern in Node.js

Expert Network
02 Jun 2021
10 min read
A design pattern is a reusable solution to a recurring problem. The term is really broad in its definition and can span multiple domains of an application. However, the term is often associated with a well-known set of object-oriented patterns that were popularized in the 90s by the book Design Patterns: Elements of Reusable Object-Oriented Software, Pearson Education, by the almost legendary Gang of Four (GoF): Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. This article is an excerpt from the book Node.js Design Patterns, Third Edition by Mario Casciaro and Luciano Mammino, a comprehensive guide for learning proven patterns, techniques, and tricks to take full advantage of the Node.js platform. In this article, we'll look at the behavior of components in software design. We'll learn how to combine objects and how to define the way they communicate so that the behavior of the resulting structure becomes extensible, modular, reusable, and adaptable. After introducing all the behavioral design patterns, we will dive deep into the details of the Strategy pattern. Now, it's time to roll up your sleeves and get your hands dirty with some behavioral design patterns.

Types of behavioral design patterns

The Strategy pattern allows us to extract the common parts of a family of closely related components into a component called the context, and allows us to define strategy objects that the context can use to implement specific behaviors.

The State pattern is a variation of the Strategy pattern, where the strategies are used to model the behavior of a component in different states.

The Template pattern, instead, can be considered the "static" version of the Strategy pattern, where the different specific behaviors are implemented as subclasses of the template class, which models the common parts of the algorithm.

The Iterator pattern provides us with a common interface to iterate over a collection. It has now become a core pattern in Node.js: JavaScript offers native support for it (with the iterator and iterable protocols), and iterators can be used as an alternative to complex async iteration patterns and even to Node.js streams.

The Middleware pattern allows us to define a modular chain of processing steps. This is a very distinctive pattern, born from within the Node.js ecosystem, which can be used to preprocess and postprocess data and requests.

The Command pattern materializes the information required to execute a routine, allowing such information to be easily transferred, stored, and processed.

The Strategy pattern

The Strategy pattern enables an object, called the context, to support variations in its logic by extracting the variable parts into separate, interchangeable objects called strategies. The context implements the common logic of a family of algorithms, while a strategy implements the mutable parts, allowing the context to adapt its behavior depending on different factors, such as an input value, a system configuration, or user preferences. Strategies are usually part of a family of solutions, and all of them implement the same interface expected by the context. The following figure shows the situation we just described (Figure 1: General structure of the Strategy pattern). Figure 1 shows how the context object can plug different strategies into its structure as if they were replaceable parts of a piece of machinery. Imagine a car: its tires can be considered its strategy for adapting to different road conditions.
The Strategy pattern is particularly useful in all those situations where supporting variations in the behavior of a component requires complex conditional logic (lots of if...else or switch statements) or mixing different components of the same family.

Imagine an object called Order that represents an online order on an e-commerce website. The object has a method called pay() that, as the name suggests, finalizes the order and transfers the funds from the user to the online store. To support different payment systems, we have a couple of options:

Use an if...else statement in the pay() method to complete the operation based on the chosen payment option
Delegate the logic of the payment to a strategy object that implements the logic for the specific payment gateway selected by the user

In the first solution, our Order object cannot support other payment methods unless its code is modified. Also, this can become quite complex when the number of payment options grows. Instead, using the Strategy pattern enables the Order object to support a virtually unlimited number of payment methods and keeps its scope limited to only managing the details of the user, the purchased items, and the relative price, while delegating the job of completing the payment to another object. The sketch below illustrates the second option.
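Here is a minimal sketch of the second option (our own illustration; the Order shape and strategy names such as paypalStrategy are hypothetical): the Order context implements the common logic, while each payment strategy implements the gateway-specific part behind a shared pay() interface.

class Order {
  constructor (paymentStrategy) {
    this.paymentStrategy = paymentStrategy
    this.items = []
  }
  addItem (item) {
    this.items.push(item)
  }
  total () {
    return this.items.reduce((sum, item) => sum + item.price, 0)
  }
  pay () {
    // the variable part (the gateway logic) is delegated to the strategy
    return this.paymentStrategy.pay(this.total())
  }
}

// two interchangeable strategies implementing the same pay() interface
const creditCardStrategy = {
  pay: amount => console.log(`Charging ${amount} to the credit card`)
}
const paypalStrategy = {
  pay: amount => console.log(`Paying ${amount} through PayPal`)
}

const order = new Order(paypalStrategy)
order.addItem({ name: 'Node.js Design Patterns', price: 40 })
order.pay() // Paying 40 through PayPal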
Let's now demonstrate this pattern with a simple, realistic example.

Multi-format configuration objects

Let's consider an object called Config that holds a set of configuration parameters used by an application, such as the database URL, the listening port of the server, and so on. The Config object should be able to provide a simple interface to access these parameters, but also a way to import and export the configuration using persistent storage, such as a file. We want to be able to support different formats to store the configuration, for example, JSON, INI, or YAML. By applying what we learned about the Strategy pattern, we can immediately identify the variable part of the Config object, which is the functionality that allows us to serialize and deserialize the configuration. This is going to be our strategy.

Creating a new module

Let's create a new module called config.js, and let's define the generic part of our configuration manager:

import { promises as fs } from 'fs'
import objectPath from 'object-path'

export class Config {
  constructor (formatStrategy) {                           // (1)
    this.data = {}
    this.formatStrategy = formatStrategy
  }

  get (configPath) {                                       // (2)
    return objectPath.get(this.data, configPath)
  }

  set (configPath, value) {                                // (2)
    return objectPath.set(this.data, configPath, value)
  }

  async load (filePath) {                                  // (3)
    console.log(`Deserializing from ${filePath}`)
    this.data = this.formatStrategy.deserialize(
      await fs.readFile(filePath, 'utf-8')
    )
  }

  async save (filePath) {                                  // (3)
    console.log(`Serializing to ${filePath}`)
    await fs.writeFile(filePath,
      this.formatStrategy.serialize(this.data))
  }
}

This is what's happening in the preceding code:

In the constructor, we create an instance variable called data to hold the configuration data. Then we also store formatStrategy, which represents the component that we will use to parse and serialize the data.
We provide two methods, set() and get(), to access the configuration properties using a dotted path notation (for example, property.subProperty) by leveraging a library called object-path (nodejsdp.link/object-path).
The load() and save() methods are where we delegate, respectively, the deserialization and serialization of the data to our strategy. This is where the logic of the Config class is altered based on the formatStrategy passed as an input in the constructor.

As we can see, this very simple and neat design allows the Config object to seamlessly support different file formats when loading and saving its data. The best part is that the logic to support those various formats is not hardcoded anywhere, so the Config class can adapt without any modification to virtually any file format, given the right strategy.

Creating format strategies

To demonstrate this characteristic, let's now create a couple of format strategies in a file called strategies.js. Let's start with a strategy for parsing and serializing data using the INI file format, which is a widely used configuration format (more info about it here: nodejsdp.link/ini-format). For the task, we will use an npm package called ini (nodejsdp.link/ini):

import ini from 'ini'

export const iniStrategy = {
  deserialize: data => ini.parse(data),
  serialize: data => ini.stringify(data)
}

Nothing really complicated! Our strategy simply implements the agreed interface, so that it can be used by the Config object. Similarly, the next strategy that we are going to create allows us to support the JSON file format, widely used in JavaScript and in the web development ecosystem in general:

export const jsonStrategy = {
  deserialize: data => JSON.parse(data),
  serialize: data => JSON.stringify(data, null, '  ')
}

Now, to show you how everything comes together, let's create a file named index.js, and let's try to load and save a sample configuration using different formats:

import { Config } from './config.js'
import { jsonStrategy, iniStrategy } from './strategies.js'

async function main () {
  const iniConfig = new Config(iniStrategy)
  await iniConfig.load('samples/conf.ini')
  iniConfig.set('book.nodejs', 'design patterns')
  await iniConfig.save('samples/conf_mod.ini')

  const jsonConfig = new Config(jsonStrategy)
  await jsonConfig.load('samples/conf.json')
  jsonConfig.set('book.nodejs', 'design patterns')
  await jsonConfig.save('samples/conf_mod.json')
}

main()

Our test module reveals the core properties of the Strategy pattern. We defined only one Config class, which implements the common parts of our configuration manager; then, by using different strategies for serializing and deserializing data, we created different Config class instances supporting different file formats.

The example we've just seen shows us only one of the possible alternatives that we had for selecting a strategy. Other valid approaches might have been the following:

Creating two different strategy families: One for the deserialization and the other for the serialization. This would have allowed reading from one format and saving to another.
Dynamically selecting the strategy: Depending on the extension of the file provided, the Config object could have maintained a map extension → strategy and used it to select the right algorithm for the given extension. A sketch of this second approach follows the list.
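For instance, the dynamic selection approach could look like the following sketch (our own illustration, not from the book; the strategies map and the createConfig() helper are hypothetical), reusing the Config class and strategies defined earlier:

import { extname } from 'path'
import { Config } from './config.js'
import { jsonStrategy, iniStrategy } from './strategies.js'

// hypothetical map of file extension → strategy
const strategies = {
  '.json': jsonStrategy,
  '.ini': iniStrategy
}

// hypothetical factory that picks the strategy from the file extension
function createConfig (filePath) {
  const strategy = strategies[extname(filePath)]
  if (!strategy) {
    throw new Error(`Unsupported format: ${filePath}`)
  }
  return new Config(strategy)
}

const config = createConfig('samples/conf.json') // uses jsonStrategy
await config.load('samples/conf.json')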
As we can see, we have several options for selecting the strategy to use, and the right one depends only on your requirements and the tradeoff between features and simplicity that you want to obtain. Furthermore, the implementation of the pattern itself can vary a lot as well. For example, in its simplest form, the context and the strategy can both be simple functions:

function context(strategy) {...}

Even though this may seem insignificant, it should not be underestimated in a programming language such as JavaScript, where functions are first-class citizens and are used as much as fully-fledged objects. Between all these variations, though, what does not change is the idea behind the pattern; as always, the implementation can slightly change, but the core concepts that drive the pattern are always the same. The sketch below expands this function-based form.
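To illustrate, here is a minimal, self-contained sketch (our own, not from the book) of the function-based form, where both the context and the strategies are plain functions:

// the context: the common logic lives in the function body,
// while the variable part is the strategy function passed in
function formatGreeting (name, strategy) {
  const trimmed = name.trim()
  return strategy(trimmed)
}

// two interchangeable strategy functions with the same signature
const formal = name => `Good morning, ${name}.`
const casual = name => `Hey ${name}!`

console.log(formatGreeting(' Ada ', formal)) // Good morning, Ada.
console.log(formatGreeting(' Ada ', casual)) // Hey Ada!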
Summary

In this article, we dove deep into the details of the Strategy pattern, one of the behavioral design patterns in Node.js. Learn more in the book, Node.js Design Patterns, Third Edition by Mario Casciaro and Luciano Mammino.

About the Authors

Mario Casciaro is a software engineer and entrepreneur. Mario worked at IBM for a number of years, first in Rome, then in the Dublin Software Lab. He currently splits his time between Var7 Technologies (his own software company) and his role as lead engineer at D4H Technologies, where he creates software for emergency response teams.

Luciano Mammino wrote his first line of code at the age of 12 on his father's old i386. Since then he has never stopped coding. He is currently working at FabFitFun as principal software engineer, where he builds microservices to serve millions of users every day.

Implementing 5 Common Design Patterns in JavaScript (ES8)

Richa Tripathi
01 May 2018
14 min read
In this tutorial, we'll see how common design patterns can be used as blueprints for organizing larger structures.

Defining steps with template functions

A template is a design pattern that details the order in which a given set of operations is to be executed; however, a template does not outline the steps themselves. This pattern is useful when behavior is divided into phases that have some conceptual or side-effect dependency that requires them to be executed in a specific order. Here, we'll see how to use the template function design pattern. We assume you already have a workspace that allows you to create and run ES modules in your browser for all the recipes given below:

How to do it...

Open your command-line application and navigate to your workspace.
Create a new folder named 09-01-defining-steps-with-template-functions.
Copy or create an index.html file that loads and runs a main function from main.js.
Create a main.js file that defines a new abstract class named Mission:

// main.js
class Mission {
  constructor () {
    if (this.constructor === Mission) {
      throw new Error('Mission is an abstract class, must extend');
    }
  }
}

Add a method named execute that calls three instance methods (determineDestination, determinePayload, and launch):

// main.js
class Mission {
  execute () {
    this.determineDestination();
    this.determinePayload();
    this.launch();
  }
}

Create a LunarRover class that extends the Mission class, with a constructor that assigns name to an instance property:

// main.js
class LunarRover extends Mission {
  constructor (name) {
    super();
    this.name = name;
  }
}

Implement the three methods called by Mission.execute:

// main.js
class LunarRover extends Mission {
  // ... constructor as above
  determineDestination () {
    this.destination = 'Oceanus Procellarum';
  }
  determinePayload () {
    this.payload = 'Rover with camera and mass spectrometer.';
  }
  launch () {
    console.log(`
      Destination: ${this.destination}
      Payload: ${this.payload}
      Launched! Rover will arrive in a week.
    `);
  }
}

Create a JovianOrbiter class that also extends the Mission class:

// main.js
class JovianOrbiter extends Mission {
  constructor (name) {
    super();
    this.name = name;
  }
  determineDestination () {
    this.destination = 'Jovian Orbit';
  }
  determinePayload () {
    this.payload = 'Orbiter with descent module.';
  }
  launch () {
    console.log(`
      Destination: ${this.destination}
      Payload: ${this.payload}
      Launched! Orbiter will arrive in 7 years.
    `);
  }
}

Create a main function that creates both concrete mission types and executes them:

// main.js
export function main () {
  const jadeRabbit = new LunarRover('Jade Rabbit');
  jadeRabbit.execute();
  const galileo = new JovianOrbiter('Galileo');
  galileo.execute();
}

Start your Python web server and open the following link in your browser: http://localhost:8000/. Both missions' details should be logged to the console.

How it works...

The Mission abstract class defines the execute method, which calls the other instance methods in a particular order. You'll notice that the methods called are not defined by the Mission class. This implementation detail is the responsibility of the extending classes. This use of abstract classes allows child classes to be used by code that takes advantage of the interface defined by the abstract class. In the template function pattern, it is the responsibility of the child classes to define the steps. When they are instantiated, and the execute method is called, those steps are then performed in the specified order. Ideally, we'd be able to ensure that Mission.execute was not overridden by any inheriting classes. Overriding this method works against the pattern and breaks the contract associated with it. This pattern is useful for organizing data-processing pipelines. The guarantee that these steps will occur in a given order means that, if side effects are eliminated, the instances can be organized more flexibly. The implementing class can then organize these steps in the best possible way, as shown in the sketch below.
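As a small illustration of this idea (our own sketch, not part of the recipe; the Pipeline and CsvToUpperPipeline classes are hypothetical), a template class can fix the order of a tiny extract/transform/load pipeline while the subclass supplies the steps:

class Pipeline {
  run () {
    // the template method fixes the order of the steps
    const raw = this.extract();
    const clean = this.transform(raw);
    this.load(clean);
  }
}

class CsvToUpperPipeline extends Pipeline {
  extract () {
    return 'alpha,beta,gamma';          // pretend this was read from a file
  }
  transform (raw) {
    return raw.split(',').map(s => s.toUpperCase());
  }
  load (rows) {
    console.log('Loaded rows:', rows);  // pretend this writes to storage
  }
}

new CsvToUpperPipeline().run(); // Loaded rows: [ 'ALPHA', 'BETA', 'GAMMA' ]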
Assembling customized instances with builders

The previous recipe shows how to organize the operations of a class. Sometimes, object initialization can also be complicated. In these situations, it can be useful to take advantage of another design pattern: builders. Now, we'll see how to use builders to organize the initialization of more complicated objects.

How to do it...

Open your command-line application and navigate to your workspace.
Create a new folder named 09-02-assembling-instances-with-builders.
Create a main.js file that defines a new class named Mission, which takes a name constructor argument and assigns it to an instance property. Also, create a describe method that prints out some details:

// main.js
class Mission {
  constructor (name) {
    this.name = name;
  }
  describe () {
    console.log(`
      The ${this.name} mission will be launched by a ${this.rocket.name} rocket,
      and deliver a ${this.payload.name} to ${this.destination.name}.
    `);
  }
}

Create classes named Destination, Payload, and Rocket, which receive a name property as a constructor parameter and assign it to an instance property:

// main.js
class Destination {
  constructor (name) {
    this.name = name;
  }
}
class Payload {
  constructor (name) {
    this.name = name;
  }
}
class Rocket {
  constructor (name) {
    this.name = name;
  }
}

Create a MissionBuilder class that defines the setMissionName, setDestination, setPayload, and setRocket methods:

// main.js
class MissionBuilder {
  setMissionName (name) {
    this.missionName = name;
    return this;
  }
  setDestination (destination) {
    this.destination = destination;
    return this;
  }
  setPayload (payload) {
    this.payload = payload;
    return this;
  }
  setRocket (rocket) {
    this.rocket = rocket;
    return this;
  }
}

Add a build method that creates a new Mission instance with the appropriate properties:

// main.js
class MissionBuilder {
  // ... setter methods as above
  build () {
    const mission = new Mission(this.missionName);
    mission.rocket = this.rocket;
    mission.destination = this.destination;
    mission.payload = this.payload;
    return mission;
  }
}

Create a main function that uses MissionBuilder to create a new mission instance:

// main.js
export function main () {
  // build and describe a mission
  new MissionBuilder()
    .setMissionName('Jade Rabbit')
    .setDestination(new Destination('Oceanus Procellarum'))
    .setPayload(new Payload('Lunar Rover'))
    .setRocket(new Rocket('Long March 3B Y-23'))
    .build()
    .describe();
}

Start your Python web server and open the following link in your browser: http://localhost:8000/. The mission description should be logged to the console.

How it works...

The builder defines methods for assigning all the relevant properties and defines a build method that ensures that each is called and assigned appropriately. Builders are like template functions, but instead of ensuring that a set of operations is executed in the correct order, they ensure that an instance is properly configured before returning. Because each instance method of MissionBuilder returns the this reference, the methods can be chained. The last line of the main function calls describe on the new Mission instance that is returned from the build method.
Replicating instances with factories

Like builders, factories are a way of organizing object construction. They differ from builders in how they are organized. Often, the interface of a factory is a single function call. This makes factories easier to use, if less customizable, than builders. Now, we'll see how to use factories to easily replicate instances.

How to do it...

Open your command-line application and navigate to your workspace.
Create a new folder named 09-03-replicating-instances-with-factories.
Copy or create an index.html that loads and runs a main function from main.js.
Create a main.js file that defines a new class named Mission. Add a constructor that takes a name constructor argument and assigns it to an instance property. Also, define a simple describe method:

// main.js
class Mission {
  constructor (name) {
    this.name = name;
  }
  describe () {
    console.log(`
      The ${this.name} mission will be launched by a ${this.rocket.name} rocket,
      and deliver a ${this.payload.name} to ${this.destination.name}.
    `);
  }
}

Create three classes named Destination, Payload, and Rocket, that take name as a constructor argument and assign it to an instance property:

// main.js
class Destination {
  constructor (name) {
    this.name = name;
  }
}
class Payload {
  constructor (name) {
    this.name = name;
  }
}
class Rocket {
  constructor (name) {
    this.name = name;
  }
}

Create a MarsMissionFactory object with a single create method that takes two arguments: name and rocket. This method should create a new Mission using those arguments:

// main.js
const MarsMissionFactory = {
  create (name, rocket) {
    const mission = new Mission(name);
    mission.destination = new Destination('Martian surface');
    mission.payload = new Payload('Mars rover');
    mission.rocket = rocket;
    return mission;
  }
}

Create a main method that creates and describes two similar missions:

// main.js
export function main () {
  // build and describe two missions
  MarsMissionFactory
    .create('Curiosity', new Rocket('Atlas V'))
    .describe();
  MarsMissionFactory
    .create('Spirit', new Rocket('Delta II'))
    .describe();
}

Start your Python web server and open the following link in your browser: http://localhost:8000/. Both mission descriptions should be logged to the console.

How it works...

The create method takes a subset of the properties needed to create a new mission. The remaining values are provided by the method itself. This allows factories to simplify the process of creating similar instances. In the main function, you can see that two Mars missions have been created, differing only in name and Rocket instance. We've halved the number of values needed to create an instance. This pattern can help reduce instantiation logic. In this recipe, we simplified the creation of different kinds of missions by identifying the common attributes, encapsulating those in the body of the factory function, and using arguments to supply the remaining properties. In this way, commonly used instance shapes can be created without additional boilerplate code.

Processing a structure with the visitor pattern

The patterns we've seen thus far organize the construction of objects and the execution of operations. The next pattern we'll look at is specially made to traverse and perform operations on hierarchical structures. Here, we'll be looking at the visitor pattern.

How to do it...

Open your command-line application and navigate to your workspace.
Copy the 09-02-assembling-instances-with-builders folder to a new 09-04-processing-a-structure-with-the-visitor-pattern directory.
Add a class named MissionInspector to main.js. Create a visit method that calls a corresponding method for each of the following types: Mission, Destination, Rocket, and Payload:

// main.js
/* visitor that inspects mission */
class MissionInspector {
  visit (element) {
    if (element instanceof Mission) {
      this.visitMission(element);
    } else if (element instanceof Destination) {
      this.visitDestination(element);
    } else if (element instanceof Rocket) {
      this.visitRocket(element);
    } else if (element instanceof Payload) {
      this.visitPayload(element);
    }
  }
}

Create a visitMission method that logs out an ok message:

// main.js
class MissionInspector {
  // ...
  visitMission (mission) {
    console.log('Mission ok');
    mission.describe();
  }
}

Create a visitDestination method that throws an error if the destination is not in an approved list:

// main.js
class MissionInspector {
  // ...
  visitDestination (destination) {
    const name = destination.name.toLowerCase();
    if (
      name === 'mercury' ||
      name === 'venus' ||
      name === 'earth' ||
      name === 'moon' ||
      name === 'mars'
    ) {
      console.log('Destination: ', name, ' approved');
    } else {
      throw new Error('Destination: "' + name + '" not approved at this time');
    }
  }
}

Create a visitPayload method that throws an error if the payload isn't valid:

// main.js
class MissionInspector {
  // ...
  visitPayload (payload) {
    const name = payload.name.toLowerCase();
    const payloadExpr = /(orbiter)|(rover)/;
    if (payloadExpr.test(name)) {
      console.log('Payload: ', name, ' approved');
    } else {
      throw new Error('Payload: "' + name + '" not approved at this time');
    }
  }
}

Create a visitRocket method that logs out an ok message:

// main.js
class MissionInspector {
  // ...
  visitRocket (rocket) {
    console.log('Rocket: ', rocket.name, ' approved');
  }
}

Add an accept method to the Mission class that calls accept on its constituents, then tells the visitor to visit the current instance:

// main.js
class Mission {
  // other mission code ...
  accept (visitor) {
    this.rocket.accept(visitor);
    this.payload.accept(visitor);
    this.destination.accept(visitor);
    visitor.visit(this);
  }
}

Add an accept method to the Destination class that tells the visitor to visit the current instance:

// main.js
class Destination {
  // other destination code ...
  accept (visitor) {
    visitor.visit(this);
  }
}

Add an accept method to the Payload class that tells the visitor to visit the current instance:

// main.js
class Payload {
  // other payload code ...
  accept (visitor) {
    visitor.visit(this);
  }
}

Add an accept method to the Rocket class that tells the visitor to visit the current instance:

// main.js
class Rocket {
  // other rocket code ...
  accept (visitor) {
    visitor.visit(this);
  }
}

Create a main function that creates different instances with the builder, visits them with a MissionInspector instance, and logs out any thrown errors:

// main.js
export function main () {
  const jadeRabbit = new MissionBuilder()
    .setMissionName('Jade Rabbit')
    .setDestination(new Destination('Moon'))
    .setPayload(new Payload('Lunar Rover'))
    .setRocket(new Rocket('Long March 3B Y-23'))
    .build();
  const curiosity = new MissionBuilder()
    .setMissionName('Curiosity')
    .setDestination(new Destination('Mars'))
    .setPayload(new Payload('Mars Rover'))
    .setRocket(new Rocket('Delta II'))
    .build();
  // expect error from Destination
  const buzz = new MissionBuilder()
    .setMissionName('Buzz Lightyear')
    .setDestination(new Destination('Too Infinity And Beyond'))
    .setPayload(new Payload('Interstellar Orbiter'))
    .setRocket(new Rocket('Self Propelled'))
    .build();
  // expect error from Payload
  const terraformer = new MissionBuilder()
    .setMissionName('Mars Terraformer')
    .setDestination(new Destination('Mars'))
    .setPayload(new Payload('Terraformer'))
    .setRocket(new Rocket('Light Sail'))
    .build();
  const inspector = new MissionInspector();
  [jadeRabbit, curiosity, buzz, terraformer].forEach((mission) => {
    try {
      mission.accept(inspector);
    } catch (e) {
      console.error(e);
    }
  });
}

Start your Python web server and open the following link in your browser: http://localhost:8000/. The approved missions are described, and the two failing missions log errors.

How it works...

The visitor pattern has two components. The visitor processes the subject objects, and the subjects tell other related subjects about the visitor, and when the current subject should be visited. The accept method is required for each subject to receive a notification that there is a visitor. That method then makes two types of method call. The first is the accept method on its related subjects. The second is the visit method on the visitor. In this way, the visitor traverses a structure by being passed around by the subjects. The visit methods are used to process different types of node. In some languages, this is handled by language-level polymorphism. In JavaScript, we can use run-time type checks to do this. The visitor pattern is a good option for processing hierarchical structures of objects, where the structure is not known ahead of time, but the types of subjects are known.

Using a singleton to manage instances

Sometimes, there are objects that are resource intensive. They may require time, memory, battery power, or network usage that is unavailable or inconvenient. It is often useful to manage the creation and sharing of instances. Here, we'll see how to use singletons to manage instances.

How to do it...

Open your command-line application and navigate to your workspace.
Create a new folder named 09-05-singleton-to-manage-instances.
Copy or create an index.html that loads and runs a main function from main.js.
Create a main.js file that defines a new class named Rocket. Add a constructor that takes a name constructor argument and assigns it to an instance property:

// main.js
class Rocket {
  constructor (name) {
    this.name = name;
  }
}

Create a RocketManager object that has a rockets property. Add a findOrCreate method that indexes Rocket instances by the name property:

// main.js
const RocketManager = {
  rockets: {},
  findOrCreate (name) {
    const rocket = this.rockets[name] || new Rocket(name);
    this.rockets[name] = rocket;
    return rocket;
  }
}

Create a main function that creates instances with and without the manager.
Compare the instances and see whether they are identical:

// main.js
export function main () {
  const atlas = RocketManager.findOrCreate('Atlas V');
  const atlasCopy = RocketManager.findOrCreate('Atlas V');
  const atlasClone = new Rocket('Atlas V');
  console.log('Copy is the same: ', atlas === atlasCopy);
  console.log('Clone is the same: ', atlas === atlasClone);
}

Start your Python web server and open the following link in your browser: http://localhost:8000/. The copy comparison logs true, and the clone comparison logs false.

How it works...

The manager object stores references to the instances, indexed by the string value given as name. This map is created when the module loads, so it is persisted through the life of the program. The singleton is then able to look up the object, so calls to findOrCreate with the same name return the same instance. Conserving resources and simplifying communication are the primary motivations for using singletons. Creating a single object for multiple uses is more efficient in terms of the space and time needed than creating several. Plus, having single instances for messages to be communicated through makes communication between different parts of a program easier. Singletons may require more sophisticated indexing if they are relying on more complicated data.

You read an excerpt from a book written by Ross Harrison, titled ECMAScript Cookbook. This book contains over 70 recipes to help you improve your coding skills and solve practical JavaScript problems.

6 JavaScript micro optimizations you need to know
Mozilla is building a bridge between Rust and JavaScript
Behavior Scripting in C# and Javascript for game developers
Python Design Patterns in Depth: The Factory Pattern

Packt
15 Feb 2016
17 min read
Creational design patterns deal with object creation [j.mp/wikicrea]. The aim of a creational design pattern is to provide better alternatives for situations where direct object creation (which in Python happens through the __init__() function [j.mp/divefunc], [Lott14, page 26]) is not convenient.

In the Factory design pattern, a client asks for an object without knowing where the object is coming from (that is, which class is used to generate it). The idea behind a factory is to simplify object creation. It is easier to track which objects are created if this is done through a central function, in contrast to letting a client create objects using a direct class instantiation [Eckel08, page 187]. A factory reduces the complexity of maintaining an application by decoupling the code that creates an object from the code that uses it [Zlobin13, page 30]. Factories typically come in two forms: the Factory Method, which is a method (or, in Pythonic terms, a function) that returns a different object per input parameter [j.mp/factorympat], and the Abstract Factory, which is a group of Factory Methods used to create a family of related products [GOF95, page 100], [j.mp/absfpat].

Factory Method

In the Factory Method, we execute a single function, passing a parameter that provides information about what we want. We are not required to know any details about how the object is implemented and where it is coming from.

A real-life example

An example of the Factory Method pattern used in reality is in plastic toy construction. The molding powder used to construct plastic toys is the same, but different figures can be produced using different plastic molds. This is like having a Factory Method in which the input is the name of the figure that we want (soldier, dinosaur) and the output is the plastic figure that we requested. The toy construction case is shown in the figure provided by www.sourcemaking.com [j.mp/factorympat].

A software example

The Django framework uses the Factory Method pattern for creating the fields of a form. The forms module of Django supports the creation of different kinds of fields (CharField, EmailField) and customizations (max_length, required) [j.mp/djangofacm].

Use cases

If you realize that you cannot track the objects created by your application because the code that creates them is in many different places instead of a single function/method, you should consider using the Factory Method pattern [Eckel08, page 187]. The Factory Method centralizes object creation, and tracking your objects becomes much easier. Note that it is absolutely fine to create more than one Factory Method, and this is how it is typically done in practice. Each Factory Method logically groups the creation of objects that have similarities. For example, one Factory Method might be responsible for connecting you to different databases (MySQL, SQLite), another Factory Method might be responsible for creating the geometrical object that you request (circle, triangle), and so on.

The Factory Method is also useful when you want to decouple object creation from object usage. We are not coupled/bound to a specific class when creating an object; we just provide partial information about what we want by calling a function. This means that introducing changes to the function is easy without requiring any changes to the code that uses it [Zlobin13, page 30].
Another use case worth mentioning relates to improving the performance and memory usage of an application. A Factory Method can improve performance and memory usage by creating new objects only when it is absolutely necessary [Zlobin13, page 28]. When we create objects using a direct class instantiation, extra memory is allocated every time a new object is created (unless the class uses caching internally, which is usually not the case). We can see this in practice in the following code (file id.py), which creates two instances of the same class A and uses the id() function to compare their memory addresses. The addresses are also printed in the output so that we can inspect them. The fact that the memory addresses are different means that two distinct objects are created, as follows:

class A(object):
    pass

if __name__ == '__main__':
    a = A()
    b = A()
    print(id(a) == id(b))
    print(a, b)

Executing id.py on my computer gives the following output:

>>> python3 id.py
False
<__main__.A object at 0x7f5771de8f60> <__main__.A object at 0x7f5771df2208>

Note that the addresses you see if you execute the file are not the same as the ones I see, because they depend on the current memory layout and allocation. But the result must be the same: the two addresses should be different. There's one exception that happens if you write and execute the code in the Python Read-Eval-Print Loop (REPL, the interactive prompt), but that's a REPL-specific optimization which does not normally happen.

Implementation

Data comes in many forms. There are two main file categories for storing/retrieving data: human-readable files and binary files. Examples of human-readable files are XML, Atom, YAML, and JSON. Examples of binary files are the .sq3 file format used by SQLite and the .mp3 file format used to listen to music. In this example, we will focus on two popular human-readable formats: XML and JSON. Although human-readable files are generally slower to parse than binary files, they make data exchange, inspection, and modification much easier. For this reason, it is advised to prefer working with human-readable files, unless there are other restrictions that do not allow it (mainly unacceptable performance and proprietary binary formats).

In this problem, we have some input data stored in an XML and a JSON file, and we want to parse them and retrieve some information. At the same time, we want to centralize the client's connection to those (and all future) external services. We will use the Factory Method to solve this problem. The example focuses only on XML and JSON, but adding support for more services should be straightforward. First, let's take a look at the data files.
The XML file, person.xml, is based on the Wikipedia example [j.mp/wikijson] and contains information about individuals (firstName, lastName, gender, and so on) as follows: <persons>   <person>     <firstName>John</firstName>     <lastName>Smith</lastName>     <age>25</age>     <address>       <streetAddress>21 2nd Street</streetAddress>       <city>New York</city>       <state>NY</state>       <postalCode>10021</postalCode>     </address>     <phoneNumbers>       <phoneNumber type="home">212 555-1234</phoneNumber>       <phoneNumber type="fax">646 555-4567</phoneNumber>     </phoneNumbers>     <gender>       <type>male</type>     </gender>   </person>   <person>     <firstName>Jimy</firstName>     <lastName>Liar</lastName>     <age>19</age>     <address>       <streetAddress>18 2nd Street</streetAddress>       <city>New York</city>       <state>NY</state>       <postalCode>10021</postalCode>     </address>     <phoneNumbers>       <phoneNumber type="home">212 555-1234</phoneNumber>     </phoneNumbers>     <gender>       <type>male</type>     </gender>   </person>   <person>     <firstName>Patty</firstName>     <lastName>Liar</lastName>     <age>20</age>     <address>       <streetAddress>18 2nd Street</streetAddress>       <city>New York</city>       <state>NY</state>       <postalCode>10021</postalCode>     </address>     <phoneNumbers>       <phoneNumber type="home">212 555-1234</phoneNumber>       <phoneNumber type="mobile">001 452-8819</phoneNumber>     </phoneNumbers>     <gender>       <type>female</type>     </gender>   </person> </persons> The JSON file, donut.json, comes from the GitHub account of Adobe [j.mp/adobejson] and contains donut information (type, price/unit i.e. ppu, topping, and so on) as follows: [   {     "id": "0001",     "type": "donut",     "name": "Cake",     "ppu": 0.55,     "batters": {       "batter": [         { "id": "1001", "type": "Regular" },         { "id": "1002", "type": "Chocolate" },         { "id": "1003", "type": "Blueberry" },         { "id": "1004", "type": "Devil's Food" }       ]     },     "topping": [       { "id": "5001", "type": "None" },       { "id": "5002", "type": "Glazed" },       { "id": "5005", "type": "Sugar" },       { "id": "5007", "type": "Powdered Sugar" },       { "id": "5006", "type": "Chocolate with Sprinkles" },       { "id": "5003", "type": "Chocolate" },       { "id": "5004", "type": "Maple" }     ]   },   {     "id": "0002",     "type": "donut",     "name": "Raised",     "ppu": 0.55,     "batters": {       "batter": [         { "id": "1001", "type": "Regular" }       ]     },     "topping": [       { "id": "5001", "type": "None" },       { "id": "5002", "type": "Glazed" },       { "id": "5005", "type": "Sugar" },       { "id": "5003", "type": "Chocolate" },       { "id": "5004", "type": "Maple" }     ]   },   {     "id": "0003",     "type": "donut",     "name": "Old Fashioned",     "ppu": 0.55,     "batters": {       "batter": [         { "id": "1001", "type": "Regular" },         { "id": "1002", "type": "Chocolate" }       ]     },     "topping": [       { "id": "5001", "type": "None" },       { "id": "5002", "type": "Glazed" },       { "id": "5003", "type": "Chocolate" },       { "id": "5004", "type": "Maple" }     ]   } ] We will use two libraries that are part of the Python distribution for working with XML and JSON: xml.etree.ElementTree and json as follows: import xml.etree.ElementTree as etree import json The JSONConnector class parses the JSON file and has a parsed_data() method that returns all data as a 
dictionary (dict). The property decorator is used to make parsed_data appear as a normal variable instead of a method, as follows:

class JSONConnector:
    def __init__(self, filepath):
        self.data = dict()
        with open(filepath, mode='r', encoding='utf-8') as f:
            self.data = json.load(f)

    @property
    def parsed_data(self):
        return self.data

The XMLConnector class parses the XML file and has a parsed_data property that returns all data as an element tree of xml.etree.Element, as follows:

class XMLConnector:
    def __init__(self, filepath):
        self.tree = etree.parse(filepath)

    @property
    def parsed_data(self):
        return self.tree

The connection_factory() function is a Factory Method. It returns an instance of JSONConnector or XMLConnector depending on the extension of the input file path, as follows:

def connection_factory(filepath):
    if filepath.endswith('json'):
        connector = JSONConnector
    elif filepath.endswith('xml'):
        connector = XMLConnector
    else:
        raise ValueError('Cannot connect to {}'.format(filepath))
    return connector(filepath)

The connect_to() function is a wrapper of connection_factory(). It adds exception handling, as follows:

def connect_to(filepath):
    factory = None
    try:
        factory = connection_factory(filepath)
    except ValueError as ve:
        print(ve)
    return factory

The main() function demonstrates how the Factory Method design pattern can be used. The first part makes sure that exception handling is effective, as follows:

def main():
    sqlite_factory = connect_to('data/person.sq3')

The next part shows how to work with the XML files using the Factory Method. XPath is used to find all person elements that have the last name Liar. For each matched person, the basic name and phone number information are shown, as follows:

    xml_factory = connect_to('data/person.xml')
    xml_data = xml_factory.parsed_data
    liars = xml_data.findall(".//{}[{}='{}']".format('person', 'lastName', 'Liar'))
    print('found: {} persons'.format(len(liars)))
    for liar in liars:
        print('first name: {}'.format(liar.find('firstName').text))
        print('last name: {}'.format(liar.find('lastName').text))
        [print('phone number ({}):'.format(p.attrib['type']), p.text)
         for p in liar.find('phoneNumbers')]

The final part shows how to work with the JSON files using the Factory Method.
Here, there's no pattern matching, and therefore the name, price, and topping of all donuts are shown, as follows:

    json_factory = connect_to('data/donut.json')
    json_data = json_factory.parsed_data
    print('found: {} donuts'.format(len(json_data)))
    for donut in json_data:
        print('name: {}'.format(donut['name']))
        print('price: ${}'.format(donut['ppu']))
        [print('topping: {} {}'.format(t['id'], t['type']))
         for t in donut['topping']]

For completeness, here is the complete code of the Factory Method implementation (factory_method.py), as follows:

import xml.etree.ElementTree as etree
import json

class JSONConnector:
    def __init__(self, filepath):
        self.data = dict()
        with open(filepath, mode='r', encoding='utf-8') as f:
            self.data = json.load(f)

    @property
    def parsed_data(self):
        return self.data

class XMLConnector:
    def __init__(self, filepath):
        self.tree = etree.parse(filepath)

    @property
    def parsed_data(self):
        return self.tree

def connection_factory(filepath):
    if filepath.endswith('json'):
        connector = JSONConnector
    elif filepath.endswith('xml'):
        connector = XMLConnector
    else:
        raise ValueError('Cannot connect to {}'.format(filepath))
    return connector(filepath)

def connect_to(filepath):
    factory = None
    try:
        factory = connection_factory(filepath)
    except ValueError as ve:
        print(ve)
    return factory

def main():
    sqlite_factory = connect_to('data/person.sq3')
    print()

    xml_factory = connect_to('data/person.xml')
    xml_data = xml_factory.parsed_data
    liars = xml_data.findall(".//{}[{}='{}']".format('person', 'lastName', 'Liar'))
    print('found: {} persons'.format(len(liars)))
    for liar in liars:
        print('first name: {}'.format(liar.find('firstName').text))
        print('last name: {}'.format(liar.find('lastName').text))
        [print('phone number ({}):'.format(p.attrib['type']), p.text)
         for p in liar.find('phoneNumbers')]
    print()

    json_factory = connect_to('data/donut.json')
    json_data = json_factory.parsed_data
    print('found: {} donuts'.format(len(json_data)))
    for donut in json_data:
        print('name: {}'.format(donut['name']))
        print('price: ${}'.format(donut['ppu']))
        [print('topping: {} {}'.format(t['id'], t['type']))
         for t in donut['topping']]

if __name__ == '__main__':
    main()

Here is the output of this program:

>>> python3 factory_method.py
Cannot connect to data/person.sq3

found: 2 persons
first name: Jimy
last name: Liar
phone number (home): 212 555-1234
first name: Patty
last name: Liar
phone number (home): 212 555-1234
phone number (mobile): 001 452-8819

found: 3 donuts
name: Cake
price: $0.55
topping: 5001 None
topping: 5002 Glazed
topping: 5005 Sugar
topping: 5007 Powdered Sugar
topping: 5006 Chocolate with Sprinkles
topping: 5003 Chocolate
topping: 5004 Maple
name: Raised
price: $0.55
topping: 5001 None
topping: 5002 Glazed
topping: 5005 Sugar
topping: 5003 Chocolate
topping: 5004 Maple
name: Old Fashioned
price: $0.55
topping: 5001 None
topping: 5002 Glazed
topping: 5003 Chocolate
topping: 5004 Maple

Notice that although JSONConnector and XMLConnector have the same interface, what is returned by parsed_data is not handled in a uniform way. Different code must be used to work with each connector.
Although it would be nice to be able to use the same code for all connectors, this is, most of the time, not realistic unless we use some kind of common mapping for the data, which is very often provided by external data providers. Assuming that you can use exactly the same code for handling the XML and JSON files, what changes are required to support a third format, for example, SQLite? Find an SQLite file or create your own and try it. As it is now, the code does not forbid direct instantiation of a connector. Is it possible to forbid this? Try doing it (hint: functions in Python can have nested classes).

Summary

To learn more about design patterns in depth, the following books published by Packt Publishing (https://www.packtpub.com/) are recommended:

Learning Python Design Patterns (https://www.packtpub.com/application-development/learning-python-design-patterns)
Learning Python Design Patterns – Second Edition (https://www.packtpub.com/application-development/learning-python-design-patterns-second-edition)

Further resources on this subject:

Recommending Movies at Scale (Python) [article]
An In-depth Look at Ansible Plugins [article]
Elucidating the Game-changing Phenomenon of the Docker-inspired Containerization Paradigm [article]
Why we need Design Patterns?

Packt
10 Nov 2016
16 min read
In this article by Praseed Pai and Shine Xavier, authors of the book .NET Design Patterns, we will try to understand the necessity of choosing a pattern-based approach to software development. We start with some principles of software development, which one might find useful while undertaking large projects. The working example in the article starts with a requirements specification and progresses towards a preliminary implementation. We will then try to iteratively improve the solution using patterns and idioms, and come up with a good design that supports a well-defined programming interface. In this process, we will learn about some software development principles one can adhere to, including the following:

SOLID principles for OOP
Three key uses of design patterns
Arlow/Nuestadt archetype patterns
Entity, value, and data transfer objects
Leveraging the .NET Reflection API for plug-in architecture

Some principles of software development

Writing quality production code consistently is not easy without some foundational principles under your belt. The purpose of this section is to whet the developer's appetite; towards the end, some references are given for detailed study. Detailed coverage of these principles warrants a separate book of its own. The authors have tried to assimilate the following key principles of software development, which will help one write quality code:

KISS: Keep it simple, Stupid
DRY: Don't repeat yourself
YAGNI: You aren't gonna need it
Low coupling: Minimize coupling between classes
SOLID principles: Principles for better OOP

William of Ockham had framed the maxim Keep it simple, Stupid (KISS). It is also called the law of parsimony. In programming terms, it can be translated as "writing code in a straightforward manner, focusing on a particular solution that solves the problem at hand". This maxim is important because, most often, developers fall into the trap of writing code in a generic manner for unwarranted extensibility. Even though it initially looks attractive, things slowly go out of bounds. The accidental complexity introduced into the code base for catering to improbable scenarios often reduces readability and maintainability. The KISS principle can be applied to every human endeavor. Learn more about the KISS principle by consulting the Web.

Don't repeat yourself (DRY) is a maxim which most programmers often forget while implementing their domain logic. Most often, in a collaborative development scenario, code gets duplicated inadvertently due to lack of communication and proper design specifications. This bloats the code base, induces subtle bugs, and makes things really difficult to change. By following the DRY maxim at all stages of development, we can avoid additional effort and make the code consistent. The opposite of DRY is write everything twice (WET).

You aren't gonna need it (YAGNI) is a principle that complements the KISS axiom. It serves as a warning for people who try to write code in the most general manner, anticipating changes right from the word go. Too often, in practice, most of this code is never used and only creates potential code smells.

While writing code, one should try to make sure that there are no hard-coded references to concrete classes. It is advisable to program to an interface as opposed to an implementation. This is a key principle which many patterns use to provide behavior acquisition at runtime.
A dependency injection framework could be used to reduce coupling between classes.

SOLID principles are a set of guidelines for writing better object-oriented software. SOLID is a mnemonic acronym that embodies the following five principles:

1. Single Responsibility Principle (SRP): A class should have only one responsibility. If it is doing more than one unrelated thing, we need to split the class.
2. Open/Closed Principle (OCP): A class should be open for extension, but closed for modification.
3. Liskov Substitution Principle (LSP): Named after Barbara Liskov, a Turing Award laureate, who postulated that an instance of a sub-class (derived class) can be substituted for any super class (base class) reference without affecting the functionality. Even though it looks like stating the obvious, most implementations have quirks which violate this principle.
4. Interface Segregation Principle (ISP): It is more desirable to have multiple interfaces for a class (such classes can also be called components) than one Uber interface that forces implementation of all methods (both relevant and non-relevant to the solution context).
5. Dependency Inversion (DI): This is a principle which is very useful for framework design. In the case of frameworks, the client code is invoked by server code, as opposed to the usual process of the client invoking the server. The main principle here is that abstractions should not depend upon details; rather, details should depend upon abstractions. This is also called the "Hollywood Principle" (Do not call us, we will call you back).

The authors consider the preceding five principles primarily as a verification mechanism. This will be demonstrated by verifying the ensuing case study implementations for violation of these principles.

Karl Seguin has written an e-book titled Foundations of Programming – Building Better Software, which covers most of what has been outlined here. Read his book to gain an in-depth understanding of most of these topics. The SOLID principles are well covered in the Wikipedia page on the subject, which can be retrieved from https://en.wikipedia.org/wiki/SOLID_(object-oriented_design). Robert Martin's Agile Principles, Patterns, and Practices in C# is a definitive book for learning about SOLID, as Robert Martin himself is the creator of these principles, even though Michael Feathers coined the acronym.

Why are patterns required?

According to the authors, the three key advantages of pattern-oriented software development that stand out are as follows:

A language/platform-agnostic way to communicate about software artifacts
A tool for refactoring initiatives (targets for refactoring)
Better API design

With the advent of the pattern movement, the software development community got a canonical language to communicate about software design, architecture, and implementation. Software development is a craft which has trade-offs attached to each strategy, and there are multiple ways to develop software. The various pattern catalogs brought some conceptual unification to this cacophony in software development. Most developers around the world today who are worth their salt can understand and speak this language. We believe you will be able to do the same at the end of the article. Fancy yourself stating the following about your recent implementation:

For our tax computation example, we have used the Command pattern to handle the computation logic.
The commands (handlers) are configured using an XML file, and a factory method takes care of the instantiation of classes on the fly using lazy loading. We cache the commands, and avoid instantiation of more objects by imposing singleton constraints on the invocation. We support the Prototype pattern, where command objects can be cloned. The command objects have a base implementation, where concrete command objects use the Template Method pattern to override methods as necessary. The command objects are implemented using the design-by-contract idiom. The whole mechanism is encapsulated using a Façade class, which acts as an API layer for the application logic. The application logic uses entity objects (reference) to store the taxable entities; attributes like tax parameters are stored as value objects. We use a data transfer object (DTO) to transfer the data from the application layer to the computational layer. The Arlow/Nuestadt-based archetype pattern is the unit of structuring the tax computation logic.

For some developers, the preceding language/platform-independent description of the software being developed is enough to understand the approach taken. This will boost developer productivity (during all phases of the SDLC, including development, maintenance, and support), as the developers will be able to get a good mental model of the code base. Without pattern catalogs, such succinct descriptions of the design or implementation would have been impossible.

In an Agile software development scenario, we develop software in an iterative fashion. Once we reach a certain maturity in a module, developers refactor their code. While refactoring a module, patterns do help in organizing the logic. The case study given next will help you to understand the rationale behind "patterns as refactoring targets".

APIs based on well-defined patterns are easy to use and impose less cognitive load on programmers. The success of the ASP.NET MVC framework, NHibernate, and the APIs for writing HTTP modules and handlers in the ASP.NET pipeline are a few testimonies to this.

Personal income tax computation - A case study

Rather than explaining the advantages of patterns, the following example will help us to see things in action. Computation of the annual income tax is a well-known problem domain across the globe. We have chosen an application domain which is well known, so that we can focus on the software development issues.

The application should receive inputs regarding the demographic profile (UID, Name, Age, Sex, Location) of a citizen and the income details (Basic, DA, HRA, CESS, Deductions) to compute his tax liability. The system should have discriminants based on the demographic profile, and have separate logic for senior citizens, juveniles, disabled people, old females, and others. By discriminant, we mean that demographic parameters like age, sex, and location should determine the category to which a person belongs, so that category-specific computation can be applied to that individual. As a first iteration, we will implement logic for the senior citizen and ordinary citizen categories. After a preliminary discussion, our developer created a prototype screen as shown in the following image:

Archetypes and the business archetype pattern

The legendary Swiss psychologist, Carl Gustav Jung, created the concept of archetypes to explain fundamental entities which arise from a common repository of human experiences. The concept of archetypes percolated to the software industry from psychology.
The Arlow/Nuestadt patterns describe business archetype patterns like Party, Customer Call, Product, Money, Unit, Inventory, and so on. An example is the Apache Maven archetype, which helps us to generate projects of different natures like J2EE apps, Eclipse plugins, OSGI projects, and so on. The Microsoft patterns and practices describe archetypes for targeting builds like Web applications, rich client applications, mobile applications, and services applications. Various domain-specific archetypes can exist in respective contexts as organizing and structuring mechanisms. In our case, we will define some archetypes which are common in the taxation domain. Some of the key archetypes in this domain are:

1. SeniorCitizenFemale: Tax payers who are female, and above the age of 60 years
2. SeniorCitizen: Tax payers who are male, and above the age of 60 years
3. OrdinaryCitizen: Tax payers who are male/female, and above 18 years of age
4. DisabledCitizen: Tax payers who have any disability
5. MilitaryPersonnel: Tax payers who are military personnel
6. Juveniles: Tax payers whose age is less than 18 years

We will use demographic parameters as discriminants to find the archetype which corresponds to the entity. The whole idea of introducing archetypes is to organize the tax computation logic around them. Once we are able to resolve the archetypes, it is easy to locate and delegate the computations corresponding to the archetypes.

Entity, value, and data transfer objects

We are going to create a class which represents a citizen. Since a citizen needs to be uniquely identified, we are going to create an entity object, which is also called a reference object (from the DDD catalog). The universal identifier (UID) of an entity object is the handle which an application refers to. Entity objects are not identified by their attributes, as there can be two people with the same name. The ID uniquely identifies an entity object. The definition of an entity object is given as follows:

public class TaxableEntity
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
    public char Sex { get; set; }
    public string Location { get; set; }
    public TaxParamVO taxparams { get; set; }
}

In the preceding class definition, Id uniquely identifies the entity object. taxparams is a value object (from the DDD catalog) associated with the entity object. Value objects do not have a conceptual identity. They describe some attributes of things (entities). The definition of TaxParamVO is given as follows:

public class TaxParamVO
{
    public double Basic { get; set; }
    public double DA { get; set; }
    public double HRA { get; set; }
    public double Allowance { get; set; }
    public double Deductions { get; set; }
    public double Cess { get; set; }
    public double TaxLiability { get; set; }
    public bool Computed { get; set; }
}

Ever since Smalltalk, Model-View-Controller (MVC) has been the most dominant paradigm for structuring applications. The application is split into a model layer (which mostly deals with data), a view layer (which acts as a display layer), and a controller (to mediate between the two). In the Web development scenario, they are physically partitioned across machines. To transfer data between layers, the J2EE pattern catalog identified the DTO. The DTO object is defined as follows:

public class TaxDTO
{
    public int id { get; set; }
    public TaxParamVO taxparams { get; set; }
}

If the layering exists within the same process, we can transfer these objects as-is.
A computation engine

We need to separate UI processing, input validation, and computation to create a solution which can be extended to handle additional requirements. The computation engine will execute different logic depending upon the command received, and the GoF command pattern is leveraged for this. The command pattern consists of four constituents:

- The command object
- Parameters
- The command dispatcher
- The client

The command object's interface has an Execute method. The parameters to the command object are passed through a bag. The client invokes the command object by passing the parameters through the bag, to be consumed by the command dispatcher. The parameters are passed to the command object through the following data structure:

public class COMPUTATION_CONTEXT
{
    private Dictionary<String, Object> symbols = new Dictionary<String, Object>();

    public void Put(string k, Object value)
    {
        symbols.Add(k, value);
    }

    public Object Get(string k)
    {
        return symbols[k];
    }
}

The ComputationCommand interface, which all the command objects implement, has only one Execute method, which takes the bag as a parameter. The COMPUTATION_CONTEXT data structure acts as the bag here:

public interface ComputationCommand
{
    bool Execute(COMPUTATION_CONTEXT ctx);
}

Since we have already implemented a command interface and a bag to transfer the parameters, it is time to implement a command object. For the sake of simplicity, we will implement two commands in which we hardcode the tax liability:

public class SeniorCitizenCommand : ComputationCommand
{
    public bool Execute(COMPUTATION_CONTEXT ctx)
    {
        TaxDTO td = (TaxDTO)ctx.Get("tax_cargo");
        //---- Instead of computation, we are assigning
        //---- a constant tax for each archetype
        td.taxparams.TaxLiability = 1000;
        td.taxparams.Computed = true;
        return true;
    }
}

public class OrdinaryCitizenCommand : ComputationCommand
{
    public bool Execute(COMPUTATION_CONTEXT ctx)
    {
        TaxDTO td = (TaxDTO)ctx.Get("tax_cargo");
        //---- Instead of computation, we are assigning
        //---- a constant tax for each archetype
        td.taxparams.TaxLiability = 1500;
        td.taxparams.Computed = true;
        return true;
    }
}

The commands will be invoked by a CommandDispatcher object, which takes an archetype string and a COMPUTATION_CONTEXT object. The CommandDispatcher acts as an API layer for the application:

class CommandDispatcher
{
    public static bool Dispatch(string archetype, COMPUTATION_CONTEXT ctx)
    {
        if (archetype == "SeniorCitizen")
        {
            SeniorCitizenCommand cmd = new SeniorCitizenCommand();
            return cmd.Execute(ctx);
        }
        else if (archetype == "OrdinaryCitizen")
        {
            OrdinaryCitizenCommand cmd = new OrdinaryCitizenCommand();
            return cmd.Execute(ctx);
        }
        else
        {
            return false;
        }
    }
}

The application to engine communication

The data from the application UI, be it web or desktop, has to flow to the computation engine.
The following ViewHandler routine shows how data retrieved from the application UI is passed to the engine, via the command dispatcher, by a client:

public static void ViewHandler(TaxCalcForm tf)
{
    TaxableEntity te = GetEntityFromUI(tf);
    if (te == null)
    {
        ShowError();
        return;
    }
    string archetype = ComputeArchetype(te);
    COMPUTATION_CONTEXT ctx = new COMPUTATION_CONTEXT();
    TaxDTO td = new TaxDTO { id = te.Id, taxparams = te.taxparams };
    ctx.Put("tax_cargo", td);
    bool rs = CommandDispatcher.Dispatch(archetype, ctx);
    if (rs)
    {
        TaxDTO temp = (TaxDTO)ctx.Get("tax_cargo");
        tf.Liabilitytxt.Text = Convert.ToString(temp.taxparams.TaxLiability);
        tf.Refresh();
    }
}

At this point, imagine that a change in requirements has been received from the stakeholders: we now need to support tax computation for new categories. Initially, we had different computations for the senior citizen and ordinary citizen categories; now we need to add new archetypes. At the same time, to make the software extensible (loosely coupled) and maintainable, it would be ideal to support new archetypes in a configurable manner, as opposed to recompiling the application for every new archetype owing to concrete references. The CommandDispatcher object does not scale well to handle additional archetypes; its assembly must change whenever a new archetype is included, as the tax computation logic varies for each archetype. We need to create a pluggable architecture to add or remove archetypes at will.

The plugin system to make the system extensible

Writing system logic without impacting the application warrants a mechanism for loading a class on the fly. Luckily, the .NET Reflection API provides a mechanism to load a class during runtime and invoke methods within it. A developer worth his salt should learn the Reflection API in order to write systems which change dynamically. In fact, most technologies like ASP.NET, Entity Framework, .NET Remoting, and WCF work because of the availability of the Reflection API in the .NET stack. Henceforth, we will be using an XML configuration file to specify our tax computation logic. A sample XML file is given next:

<?xml version="1.0"?>
<plugins>
    <plugin archetype="OrdinaryCitizen" command="TaxEngine.OrdinaryCitizenCommand"/>
    <plugin archetype="SeniorCitizen" command="TaxEngine.SeniorCitizenCommand"/>
</plugins>

The contents of the XML file can be read very easily using LINQ to XML. We generate a Dictionary object with the following code snippet:

private Dictionary<string, string> LoadData(string xmlfile)
{
    return XDocument.Load(xmlfile)
        .Descendants("plugins")
        .Descendants("plugin")
        .ToDictionary(p => p.Attribute("archetype").Value,
                      p => p.Attribute("command").Value);
}

Summary

In this article, we covered quite a lot of ground in understanding why pattern-oriented software development is a good way to develop modern software. We started the article by citing some key principles, and progressed further to demonstrate their applicability by iteratively skinning an application which is extensible and resilient to changes.
Python Design Patterns in Depth – The Observer Pattern

In this article, you will see how to notify a group of objects when the state of another object changes. A very popular example lies in the Model-View-Controller (MVC) pattern. Assume that we are using the data of the same model in two views, for instance in a pie chart and in a spreadsheet. Whenever the model is modified, both views need to be updated. That's the role of the Observer design pattern [Eckel08, page 213].

The Observer pattern describes a publish-subscribe relationship between a single object, the publisher (also known as the subject or observable), and one or more objects, the subscribers (also known as observers). In the MVC example, the publisher is the model and the subscribers are the views. However, MVC is not the only publish-subscribe example. Subscribing to a news feed such as RSS or Atom is another example: many readers can subscribe to the feed, typically using a feed reader, and every time a new item is added, they receive the update automatically.

The ideas behind Observer are the same as the ideas behind MVC and the separation of concerns principle: to increase the decoupling between the publisher and subscribers, and to make it easy to add/remove subscribers at runtime. Additionally, the publisher is not concerned about who its observers are; it just sends notifications to all the subscribers [GOF95, page 327].

A real-life example

In reality, an auction resembles Observer. Every auction bidder has a number paddle that is raised whenever they want to place a bid. Whenever the paddle is raised by a bidder, the auctioneer acts as the subject by updating the price of the bid and broadcasting the new price to all bidders (subscribers). A figure courtesy of www.sourcemaking.com [j.mp/observerpat] in the original article shows how the Observer pattern relates to an auction.

A software example

The django-observer package [j.mp/django-obs] is a third-party Django package that can be used to register callback functions that are executed when there are changes in several Django fields. Many different types of fields are supported (CharField, IntegerField, and so forth). RabbitMQ is a library that can be used to add asynchronous messaging support to an application. Several messaging protocols are supported, such as HTTP and AMQP. RabbitMQ can be used in a Python application to implement a publish-subscribe pattern, which is nothing more than the Observer design pattern [j.mp/rabbitmqobs].

Use cases

We generally use the Observer pattern when we want to inform/update one or more objects (observers/subscribers) about a change that happened to another object (subject/publisher/observable). The number of observers, as well as who the observers are, may vary and can be changed dynamically (at runtime). We can think of many cases where Observer can be useful. Whether it is RSS, Atom, or another format, the idea is the same: you follow a feed, and every time it is updated, you receive a notification about the update [Zlobin13, page 60]. The same concept exists in social networking: if you are connected to another person using a social networking service, and your connection updates something, you are notified about it. It doesn't matter if the connection is a Twitter user that you follow, a real friend on Facebook, or a business colleague on LinkedIn. Event-driven systems are another example where Observer can be (and usually is) used. In such systems, listeners are used to "listen" for specific events. The listeners are triggered when an event they are listening to is created. This can be typing a specific key (of the keyboard), moving the mouse, and more. The event plays the role of the publisher and the listeners play the role of the observers. The key point in this case is that multiple listeners (observers) can be attached to a single event (publisher) [j.mp/magobs].
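To make the event-driven case concrete, here is a minimal sketch of my own (the Event class, the key_pressed event, and the listeners are invented for illustration, not part of the original article) showing multiple listeners attached to a single event:

class Event:
    """Plays the role of the publisher."""
    def __init__(self, name):
        self.name = name
        self.listeners = []

    def attach(self, listener):
        self.listeners.append(listener)

    def fire(self, data):
        # Every attached listener (observer) is notified.
        for listener in self.listeners:
            listener(self.name, data)

key_pressed = Event('key_pressed')
key_pressed.attach(lambda name, data: print('logger saw', name, data))
key_pressed.attach(lambda name, data: print('shortcut handler saw', name, data))
key_pressed.fire('q')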
Implementation

We will implement a data formatter. The ideas described here are based on the ActiveState Python Observer code recipe [j.mp/pythonobs]. There is a default formatter that shows a value in the decimal format. However, we can add/register more formatters. In this example, we will add a hex and a binary formatter. Every time the value of the default formatter is updated, the registered formatters are notified and take action. In this case, the action is to show the new value in the relevant format.

Observer is actually one of the patterns where inheritance makes sense. We can have a base Publisher class that contains the common functionality of adding, removing, and notifying observers. Our DefaultFormatter class derives from Publisher and adds the formatter-specific functionality. We can dynamically add and remove observers on demand. A class diagram in the original article shows an instance of the example using two observers: HexFormatter and BinaryFormatter. Note that, because class diagrams are static, they cannot show the whole lifetime of a system, only its state at a specific point in time.

We begin with the Publisher class. The observers are kept in the observers list. The add() method registers a new observer, or prints an error if it already exists. The remove() method unregisters an existing observer, or prints an error if it does not exist. Finally, the notify() method informs all observers about a change:

class Publisher:
    def __init__(self):
        self.observers = []

    def add(self, observer):
        if observer not in self.observers:
            self.observers.append(observer)
        else:
            print('Failed to add: {}'.format(observer))

    def remove(self, observer):
        try:
            self.observers.remove(observer)
        except ValueError:
            print('Failed to remove: {}'.format(observer))

    def notify(self):
        [o.notify(self) for o in self.observers]

Let's continue with the DefaultFormatter class. The first thing that __init__() does is call the __init__() method of the base class, since this is not done automatically in Python. A DefaultFormatter instance has a name, to make it easier for us to track its status. We use a single leading underscore in the _data variable to state that it should not be accessed directly. Note that accessing it is still always possible in Python [Lott14, page 54], but fellow developers have no excuse for doing so, since the code already states that they shouldn't. There is a serious reason for marking _data as non-public in this case. Stay tuned. DefaultFormatter treats the _data variable as an integer, and the default value is zero:

class DefaultFormatter(Publisher):
    def __init__(self, name):
        Publisher.__init__(self)
        self.name = name
        self._data = 0

The __str__() method returns information about the name of the publisher and the value of _data. type(self).__name__ is a handy trick to get the name of a class without hardcoding it. It is one of those things that make the code a little less readable but easier to maintain. It is up to you to decide if you like it or not:

    def __str__(self):
        return "{}: '{}' has data = {}".format(type(self).__name__, self.name, self._data)

There are two data() methods. The first one uses the @property decorator to give read access to the _data variable.
Using this, we can just execute object.data instead of object.data():

    @property
    def data(self):
        return self._data

The second data() method is more interesting. It uses the @data.setter decorator, which is called every time the assignment (=) operator is used to assign a new value to the _data variable. This method also tries to cast the new value to an integer, and does exception handling in case this operation fails:

    @data.setter
    def data(self, new_value):
        try:
            self._data = int(new_value)
        except ValueError as e:
            print('Error: {}'.format(e))
        else:
            self.notify()

The next step is to add the observers. The functionality of HexFormatter and BinaryFormatter is very similar. The only difference between them is how they format the value of data received by the publisher, that is, in hexadecimal and binary, respectively:

class HexFormatter:
    def notify(self, publisher):
        print("{}: '{}' has now hex data = {}".format(
            type(self).__name__, publisher.name, hex(publisher.data)))

class BinaryFormatter:
    def notify(self, publisher):
        print("{}: '{}' has now bin data = {}".format(
            type(self).__name__, publisher.name, bin(publisher.data)))

No example is fun without some test data. The main() function initially creates a DefaultFormatter instance named test1 and afterwards attaches (and detaches) the two available observers. Exception handling is also exercised to make sure that the application does not crash when erroneous data is passed by the user. Moreover, things such as trying to add the same observer twice or removing an observer that does not exist should cause no crashes:

def main():
    df = DefaultFormatter('test1')
    print(df)

    print()
    hf = HexFormatter()
    df.add(hf)
    df.data = 3
    print(df)

    print()
    bf = BinaryFormatter()
    df.add(bf)
    df.data = 21
    print(df)

    print()
    df.remove(hf)
    df.data = 40
    print(df)

    print()
    df.remove(hf)
    df.add(bf)
    df.data = 'hello'
    print(df)

    print()
    df.data = 15.8
    print(df)

Here's how the full code of the example (observer.py) looks:

class Publisher:
    def __init__(self):
        self.observers = []

    def add(self, observer):
        if observer not in self.observers:
            self.observers.append(observer)
        else:
            print('Failed to add: {}'.format(observer))

    def remove(self, observer):
        try:
            self.observers.remove(observer)
        except ValueError:
            print('Failed to remove: {}'.format(observer))

    def notify(self):
        [o.notify(self) for o in self.observers]

class DefaultFormatter(Publisher):
    def __init__(self, name):
        Publisher.__init__(self)
        self.name = name
        self._data = 0

    def __str__(self):
        return "{}: '{}' has data = {}".format(type(self).__name__, self.name, self._data)

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, new_value):
        try:
            self._data = int(new_value)
        except ValueError as e:
            print('Error: {}'.format(e))
        else:
            self.notify()

class HexFormatter:
    def notify(self, publisher):
        print("{}: '{}' has now hex data = {}".format(
            type(self).__name__, publisher.name, hex(publisher.data)))

class BinaryFormatter:
    def notify(self, publisher):
        print("{}: '{}' has now bin data = {}".format(
            type(self).__name__, publisher.name, bin(publisher.data)))

def main():
    df = DefaultFormatter('test1')
    print(df)

    print()
    hf = HexFormatter()
    df.add(hf)
    df.data = 3
    print(df)

    print()
    bf = BinaryFormatter()
    df.add(bf)
    df.data = 21
    print(df)

    print()
    df.remove(hf)
    df.data = 40
    print(df)

    print()
    df.remove(hf)
    df.add(bf)
    df.data = 'hello'
    print(df)

    print()
    df.data = 15.8
    print(df)

if __name__ == '__main__':
    main()
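Before looking at the program's output, note how cheap it is to extend this design: adding a new representation is a one-class affair. As a sketch of the octal observer that the text later suggests as an exercise (my own illustration, not part of the original listing), it only needs a notify() method:

class OctalFormatter:
    def notify(self, publisher):
        # oct() renders the integer in base 8, mirroring hex()/bin() above.
        print("{}: '{}' has now oct data = {}".format(
            type(self).__name__, publisher.name, oct(publisher.data)))

Registering it is just df.add(OctalFormatter()); neither Publisher nor DefaultFormatter needs to change.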
Executing observer.py gives the following output:

>>> python3 observer.py
DefaultFormatter: 'test1' has data = 0

HexFormatter: 'test1' has now hex data = 0x3
DefaultFormatter: 'test1' has data = 3

HexFormatter: 'test1' has now hex data = 0x15
BinaryFormatter: 'test1' has now bin data = 0b10101
DefaultFormatter: 'test1' has data = 21

BinaryFormatter: 'test1' has now bin data = 0b101000
DefaultFormatter: 'test1' has data = 40

Failed to remove: <__main__.HexFormatter object at 0x7f30a2fb82e8>
Failed to add: <__main__.BinaryFormatter object at 0x7f30a2fb8320>
Error: invalid literal for int() with base 10: 'hello'
BinaryFormatter: 'test1' has now bin data = 0b101000
DefaultFormatter: 'test1' has data = 40

BinaryFormatter: 'test1' has now bin data = 0b1111
DefaultFormatter: 'test1' has data = 15

What we see in the output is that as the extra observers are added, more (and relevant) output is shown, and when an observer is removed, it is not notified any longer. That's exactly what we want: runtime notifications that we are able to enable/disable on demand. The defensive programming part of the application also seems to work fine: trying to do funny things such as removing an observer that does not exist or adding the same observer twice is not allowed. The messages shown are not very user-friendly, but I leave improving them to you as an exercise. Runtime failures, such as passing a string when the API expects a number, are also properly handled without causing the application to crash/terminate.

This example would be much more interesting if it were interactive. Even a simple menu that allows the user to attach/detach observers at runtime and modify the value of DefaultFormatter would be nice, because the runtime aspect becomes much more visible. Feel free to do it. Another nice exercise is to add more observers. For example, you can add an octal formatter, a roman numeral formatter, or any other observer that uses your favorite representation. Be creative and have fun!

Summary

In this article, we covered the Observer design pattern. We use Observer when we want to be able to inform/notify all stakeholders (an object or a group of objects) when the state of an object changes. An important feature of Observer is that the number of subscribers/observers, as well as who the subscribers are, may vary and can be changed at runtime.

You can refer to more books on this topic, as follows:

Expert Python Programming: https://www.packtpub.com/application-development/expert-python-programming
Learning Python Design Patterns: https://www.packtpub.com/application-development/learning-python-design-patterns
The Factory Method Pattern

In this article by Anshul Verma and Jitendra Zaa, authors of the book Apex Design Patterns, we will discuss some problems that can occur during the creation of class instances, and how we can write the code for the creation of objects in a more simple, easy to maintain, and scalable way.

In this article, we will discuss the factory method creational design pattern. Often, we find that some classes have common features (behavior) and can be considered classes of the same family. For example, multiple payment classes represent a family of payment services: credit card, debit card, and net banking are some examples of payment classes that have common methods, such as makePayment, authorizePayment, and so on. Using the factory method pattern, we can develop controller classes which can use these payment services without knowing the actual payment type at design time.

The factory method pattern is a creational design pattern used to create objects of classes from the same family without knowing the exact class name at design time. Using the factory method pattern, classes can be instantiated from a common factory method. The advantage of using this pattern is that it delegates the creation of an object to another class and provides a good level of abstraction.

Let's learn this pattern using the following example. The Universal Call Center company is new in business and provides free admin support to customers to resolve issues related to their products. A call center agent can provide some information about the product support, for example, the Service Level Agreement (SLA) or the total number of tickets allowed to be opened per month. A developer came up with the following class:

public class AdminBasicSupport {
    /**
     * return SLA in hours
     */
    public Integer getSLA() {
        return 40;
    }

    /**
     * Total allowed support tickets allowed every month
     */
    public Integer allowedTickets() {
        // As this is basic support
        return 9999;
    }
}

Now, to get the SLA of AdminBasicSupport, we need to use the following code every time:

AdminBasicSupport support = new AdminBasicSupport();
System.debug('Support SLA is - ' + support.getSLA());

Output - Support SLA is - 40

The Universal Call Center company was doing very well, and in order to grow the business and increase the profit, they started premium support for customers who were willing to pay for cases and get quick support. To distinguish it from the basic support, they changed the SLA to 12 hours and allowed a maximum of 50 cases to be opened in one month. A developer had many choices to make this happen in the existing code. However, instead of changing the existing code, they created a new class that would handle only the premium support-related functionalities. This was a good decision because of the single responsibility principle.
public class AdminPremiumSupport {
    /**
     * return SLA in hours
     */
    public Integer getSLA() {
        return 12;
    }

    /**
     * Total allowed support tickets allowed every month is 50
     */
    public Integer allowedTickets() {
        return 50;
    }
}

Now, every time any information regarding the SLA or allowed tickets per month is needed, the following Apex code can be used:

if (Account.supportType__c == 'AdminBasic') {
    AdminBasicSupport support = new AdminBasicSupport();
    System.debug('Support SLA is - ' + support.getSLA());
} else {
    AdminPremiumSupport support = new AdminPremiumSupport();
    System.debug('Support SLA is - ' + support.getSLA());
}

As we can see in the preceding example, instead of adding conditions to the existing class, the developer decided to go with a new class. Each class has its own responsibility, and each needs to be changed for only one reason. This design principle is known as the Single Responsibility Principle.

Business was doing exceptionally well in the call center, and they planned to start golden and platinum support as well. Developers started facing issues with the current approach. They already had two classes for the basic and premium support, and requests for two more classes were in the pipeline, with no guarantee that the set of support types would remain the same in future. Every new support type needs a new class, and therefore the previous code needs to be updated to instantiate these classes as well:

if (Account.supportType__c == 'AdminBasic') {
    AdminBasicSupport support = new AdminBasicSupport();
    System.debug('Support SLA is - ' + support.getSLA());
} else if (Account.supportType__c == 'AdminPremier') {
    AdminPremiumSupport support = new AdminPremiumSupport();
    System.debug('Support SLA is - ' + support.getSLA());
} else if (Account.supportType__c == 'AdminGold') {
    AdminGoldSupport support = new AdminGoldSupport();
    System.debug('Support SLA is - ' + support.getSLA());
} else {
    AdminPlatinumSupport support = new AdminPlatinumSupport();
    System.debug('Support SLA is - ' + support.getSLA());
}

We are only considering the getSLA() method here, but in a real application there can be other methods and scenarios as well. The preceding code snippet clearly depicts the code duplication and the maintenance nightmare; an image in the original article shows the overall complexity of the example that we are discussing. Although a separate class is used for each support type, the introduction of a new support class leads to changes in all existing code locations where these classes are instantiated.

The development team started brainstorming to make sure that the code would be easy to extend in future with the least impact on the existing code. One of the developers suggested using an interface for all support classes, so that every class has the same methods and can be referred to through the interface. The following interface was finalized to reduce the code duplication:

public interface IAdminSupport {
    Integer getSLA();
    Integer allowedTickets();
}

Methods defined within an interface have no access modifiers and just contain their signatures. Once the interface was created, it was time to update the existing classes. In our case, only one line needed to be changed, and the remaining part of the code stayed the same because both classes already have the getSLA() and allowedTickets() methods.
Let's take a look at the following line of code:

public class AdminBasicSupport {

This will be changed to the following:

public class AdminBasicSupportImpl implements IAdminSupport {

Similarly, the following line of code:

public class AdminPremiumSupport {

will be changed to this:

public class AdminPremiumSupportImpl implements IAdminSupport {

In the same way, the AdminGoldSupportImpl and AdminPlatinumSupportImpl classes are written.

A class diagram is a type of Unified Modeling Language (UML) diagram which describes classes, methods, attributes, and their relationships, among other objects in a system. You can read more about class diagrams at https://en.wikipedia.org/wiki/Class_diagram. The original article includes a class diagram of the code written by the developers using an interface.

Now, the code to instantiate the different support type classes can be rewritten as follows:

IAdminSupport support = null;
if (Account.supportType__c == 'AdminBasic') {
    support = new AdminBasicSupportImpl();
} else if (Account.supportType__c == 'AdminPremier') {
    support = new AdminPremiumSupportImpl();
} else if (Account.supportType__c == 'AdminGold') {
    support = new AdminGoldSupportImpl();
} else {
    support = new AdminPlatinumSupportImpl();
}
System.debug('Support SLA is - ' + support.getSLA());

There is no switch case statement in Apex, and that's why multiple if and else statements are written. As per the product team, a new compiler may be released in 2016 and it will be supported. You can vote for this idea at https://success.salesforce.com/ideaView?id=08730000000BrSIAA0.

As we can see, the preceding code is reduced to creating the required instance of a concrete class and then using an interface to access its methods. This concept is known as program to interface, and it is one of the most recommended OOP principles. As interfaces are a kind of contract, we already know which methods will be implemented by concrete classes, and we can completely rely on the interface to call them, hiding their complex implementation and logic. This has many advantages, a few of which are loose coupling and dependency injection.

A concrete class is a complete class that can be used to instantiate objects; any class that is not abstract or an interface can be considered a concrete class.

We still have one problem in the previous approach: the code to instantiate concrete classes is still present at many locations and will still require changes if a new support type is added. If we can delegate the creation of concrete classes to some other class, then our code will be completely independent of the existing code and of new support types. This concept of delegating decisions and creation of similar types of classes is known as the factory method pattern.
The following class can be used to create the concrete classes and will act as a factory:

/**
 * This factory class is used to instantiate the concrete class
 * of the respective support type
 */
public class AdminSupportFactory {
    public static IAdminSupport getInstance(String supporttype) {
        IAdminSupport support = null;
        if (supporttype == 'AdminBasic') {
            support = new AdminBasicSupportImpl();
        } else if (supporttype == 'AdminPremier') {
            support = new AdminPremiumSupportImpl();
        } else if (supporttype == 'AdminGold') {
            support = new AdminGoldSupportImpl();
        } else if (supporttype == 'AdminPlatinum') {
            support = new AdminPlatinumSupportImpl();
        }
        return support;
    }
}

In the preceding code, we only need to call the getInstance(String) method, and this method will take the decision and return the actual implementation. As the return type is an interface, we already know the methods that are defined, and we can use them without actually knowing the implementation. This is a very good example of abstraction. The final class diagram of the factory method pattern that we discussed is shown in the original article as an image.

The following code snippet can be used repeatedly by any client code to instantiate a class of any support type:

IAdminSupport support = AdminSupportFactory.getInstance('AdminBasic');
System.debug('Support SLA is - ' + support.getSLA());

Output : Support SLA is - 40

Reflection in Apex

The problem with the preceding design is that whenever a new support type needs to be added, we need to add a condition to AdminSupportFactory. We can instead store the mapping between a support type and its concrete class name in a custom setting. This way, whenever a new concrete class is added, we don't even need to change the factory class; a new record simply needs to be added to the custom setting. Consider a custom setting named Support_Type__c with a text field named Class_Name__c and the following records:

Name: AdminBasic - Class name: AdminBasicSupportImpl
Name: AdminGolden - Class name: AdminGoldSupportImpl
Name: AdminPlatinum - Class name: AdminPlatinumSupportImpl
Name: AdminPremier - Class name: AdminPremiumSupportImpl

Using reflection, the AdminSupportFactory class can be rewritten to instantiate the service types at runtime as follows:

/**
 * This factory class is used to instantiate the concrete class
 * of the respective support type
 */
public class AdminSupportFactory {
    public static IAdminSupport getInstance(String supporttype) {
        // Read the custom setting to get the actual class name
        // on the basis of the support type
        Support_Type__c supportTypeInfo = Support_Type__c.getValues(supporttype);
        // From the custom setting, get the appropriate class name
        Type t = Type.forName(supportTypeInfo.Class_Name__c);
        IAdminSupport retVal = (IAdminSupport)t.newInstance();
        return retVal;
    }
}

In the preceding code, we are using the Type system class. This is a very powerful class used to instantiate a new class at runtime. It has the following two important methods:

forName: This returns a Type that is equivalent to the string passed
newInstance: This creates a new object of the specified Type

Inspecting classes, methods, and variables at runtime without knowing a class name, or instantiating a new object and invoking methods at runtime, is known as reflection in computer science. One more advantage of using the factory method, a custom setting, and reflection together is that if, in future, one of the support types needs to be replaced by another service type permanently, we simply need to change the appropriate mapping in the custom setting, without any changes in the code.
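As a quick illustration of how this configurability can be exercised, here is a sketch of an Apex unit test (my own assumption, not from the original article) that seeds the custom setting mapping and then resolves the concrete class through the factory:

@isTest
private class AdminSupportFactoryTest {
    @isTest
    static void resolvesConcreteClassFromCustomSetting() {
        // Seed the mapping the factory reads at runtime (hypothetical data).
        insert new Support_Type__c(Name = 'AdminBasic',
                                   Class_Name__c = 'AdminBasicSupportImpl');

        IAdminSupport support = AdminSupportFactory.getInstance('AdminBasic');

        // The client only knows the interface; the concrete class came
        // from configuration, so adding archetypes needs no code changes here.
        System.assertEquals(40, support.getSLA());
    }
}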
Summary

In this article, we discussed how to deal with various situations while instantiating objects, using the factory method design pattern.
Façade Pattern – Being Adaptive with Façade

In this article by Chetan Giridhar, author of the book Learning Python Design Patterns - Second Edition, we will get introduced to the Façade design pattern and how it is used in software application development. We will work with a sample use case and implement it in Python v3.5. In brief, we will cover the following topics in this article:

- An understanding of the Façade design pattern with a UML diagram
- A real-world use case with a Python v3.5 code implementation
- The Façade pattern and the principle of least knowledge

Understanding the Façade design pattern

Façade generally refers to the face of a building, especially an attractive one. It can also refer to a behavior or appearance that gives a false idea of someone's true feelings or situation. When people walk past a façade, they can appreciate the exterior face but aren't aware of the complexities of the structure within. This is how the Façade pattern is used: it hides the complexities of the internal system and provides an interface to the client that can access the system in a very simplified way.

Consider the example of a storekeeper. When you, as a customer, visit a store to buy certain items, you're not aware of the layout of the store. You typically approach the storekeeper, who is well aware of the store's system. Based on your requirements, the storekeeper picks up the items and hands them over to you. Isn't this easy? The customer need not know how the store looks, and gets the shopping done through a simple interface: the storekeeper.

The Façade design pattern essentially does the following:

- It provides a unified interface to a set of interfaces in a subsystem and defines a high-level interface that helps the client use the subsystem in an easy way.
- It represents a complex subsystem with a single interface object. It doesn't encapsulate the subsystem, but actually combines the underlying subsystems.
- It promotes decoupling of the implementation from its multiple clients.

A UML class diagram

We will now discuss the Façade pattern with the help of a UML diagram (included in the original article as an image). As we observe the diagram, you'll realize that there are three main participants in this pattern:

- Façade: The main responsibility of a façade is to wrap up a complex group of subsystems so that it can provide a pleasing look to the outside world.
- System: This represents a set of varied subsystems that make the whole system compound and difficult to view or work with.
- Client: The client interacts with the Façade so that it can easily communicate with the subsystem and get the work completed. It doesn't have to bother about the complex nature of the system.

You will now learn a little more about the three main participants from the data structure's perspective.

Façade

The following points will give us a better idea of Façade:

- It is an interface that knows which subsystems are responsible for a request.
- It delegates the client's requests to the appropriate subsystem objects using composition.

For example, if the client is looking for some work to be accomplished, it need not go to the individual subsystems; it can simply contact the interface (the Façade) that gets the work done.

System

In the Façade world, System is an entity that performs the following:

- It implements subsystem functionality and is represented by a class. Ideally, a System is represented by a group of classes that are responsible for different operations.
- It handles the work assigned by the Façade object, but has no knowledge of the façade and keeps no reference to it.

For instance, when the client requests a certain service from the Façade, the Façade chooses the right subsystem that delivers the service based on the type of service.

Client

Here's how we can describe the client:

- The client is a class that instantiates the Façade.
- It makes requests to the Façade to get the work done from the subsystems.

Implementing the Façade pattern in the real world

To demonstrate the applications of the Façade pattern, let's take an example that we'd have experienced in our lifetime. Consider that you have a marriage in your family and you are in charge of all the arrangements. Whoa! That's a tough job on your hands. You have to book a hotel or place for the marriage, talk to a caterer for the food arrangements, organize a florist for all the decorations, and finally handle the musical arrangements expected for the event. In yesteryears, you'd have done all this by yourself, such as talking to the relevant folks, coordinating with them, and negotiating on the pricing, but now life is simpler: you go and talk to an event manager who handles this for you. S/he will make sure that they talk to the individual service providers and get the best deal for you.

From the Façade pattern perspective, we will have the following three main participants:

- Client: It's you, who needs all the marriage preparations to be completed in time before the wedding. The arrangements should be top class and the guests should love the celebrations.
- Façade: The event manager, who's responsible for talking to all the folks that need to work on specific arrangements such as food, flower decorations, among others.
- Subsystems: They represent the systems that provide services such as catering, hotel management, and flower decorations.

Let's develop an application in Python v3.5 and implement this use case. We start with the client first. It's you! Remember, you're the one who has been given the responsibility to make sure that the marriage preparations are done and the event goes fine! However, you're being clever here and passing on the responsibility to the event manager, aren't you? Let's now look at the You class. In this example, you create an object of the EventManager class so that the manager can work with the relevant folks on the marriage preparations while you relax:

class You(object):
    def __init__(self):
        print("You:: Whoa! Marriage Arrangements??!!!")

    def askEventManager(self):
        print("You:: Let's Contact the Event Manager\n\n")
        em = EventManager()
        em.arrange()

    def __del__(self):
        print("You:: Thanks to Event Manager, all preparations done! Phew!")

Let's now move ahead and talk about the Façade class. As discussed earlier, the Façade class simplifies the interface for the client. In this case, EventManager acts as a façade and simplifies the work for You. The Façade talks to the subsystems and does all the booking and preparations for the marriage on your behalf.
Here is the Python code for the EventManager class:

class EventManager(object):
    def __init__(self):
        print("Event Manager:: Let me talk to the folks\n")

    def arrange(self):
        self.hotelier = Hotelier()
        self.hotelier.bookHotel()

        self.florist = Florist()
        self.florist.setFlowerRequirements()

        self.caterer = Caterer()
        self.caterer.setCuisine()

        self.musician = Musician()
        self.musician.setMusicType()

Now that we're done with the Façade and the client, let's dive into the subsystems. We have developed the following classes for this scenario:

- Hotelier is for the hotel bookings. It has a method to check whether the hotel is free on that day (__isAvailable) and, if it is free, a method to book the hotel (bookHotel).
- The Florist class is responsible for flower decorations. Florist has the setFlowerRequirements() method, used to set the expectations on the kind of flowers needed for the marriage decoration.
- The Caterer class is used to deal with the caterer and is responsible for the food arrangements. Caterer exposes the setCuisine() method to accept the type of cuisine to be served at the marriage.
- The Musician class is designed for the musical arrangements at the marriage. It uses the setMusicType() method to understand the music requirements for the event.

class Hotelier(object):
    def __init__(self):
        print("Arranging the Hotel for Marriage? --")

    def __isAvailable(self):
        print("Is the Hotel free for the event on given day?")
        return True

    def bookHotel(self):
        if self.__isAvailable():
            print("Registered the Booking\n\n")

class Florist(object):
    def __init__(self):
        print("Flower Decorations for the Event? --")

    def setFlowerRequirements(self):
        print("Carnations, Roses and Lilies would be used for Decorations\n\n")

class Caterer(object):
    def __init__(self):
        print("Food Arrangements for the Event --")

    def setCuisine(self):
        print("Chinese & Continental Cuisine to be served\n\n")

class Musician(object):
    def __init__(self):
        print("Musical Arrangements for the Marriage --")

    def setMusicType(self):
        print("Jazz and Classical will be played\n\n")

you = You()
you.askEventManager()

The output of the preceding code is given here:
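The original article shows the output as a screenshot; running the code above produces console output along the following lines (the exact wording comes straight from the print() calls, with the final line emitted when the You object is destroyed at interpreter exit):

You:: Whoa! Marriage Arrangements??!!!
You:: Let's Contact the Event Manager

Event Manager:: Let me talk to the folks

Arranging the Hotel for Marriage? --
Is the Hotel free for the event on given day?
Registered the Booking

Flower Decorations for the Event? --
Carnations, Roses and Lilies would be used for Decorations

Food Arrangements for the Event --
Chinese & Continental Cuisine to be served

Musical Arrangements for the Marriage --
Jazz and Classical will be played

You:: Thanks to Event Manager, all preparations done! Phew!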
In the preceding code example:

- The EventManager class is the Façade that simplifies the interface for You.
- EventManager uses composition to create objects of the subsystems such as Hotelier, Caterer, and others.

The principle of least knowledge

As you have learned in the initial parts of this article, the Façade provides a unified system that makes subsystems easy to use. It also decouples the client from the subsystem of components. The design principle employed behind the Façade pattern is the principle of least knowledge. The principle of least knowledge guides us to reduce the interactions between objects to just a few friends that are close enough to you. In real terms, it means the following:

- When designing a system, for every object created, one should look at the number of classes that it interacts with and the way in which the interaction happens.
- Following the principle, make sure that we avoid situations where many classes are created that are tightly coupled to each other.
- If there are a lot of dependencies between classes, the system becomes hard to maintain. Any changes in one part of the system can lead to unintentional changes in other parts, which means that the system is exposed to regressions, and this should be avoided.

Summary

We began the article by first understanding the Façade design pattern and the context in which it's used. We understood the basis of Façade and how it is effectively used in software architecture. We looked at how the Façade design pattern creates a simplified interface for clients to use. It simplifies the complexity of subsystems so that the client benefits. The Façade doesn't encapsulate the subsystem, and the client is free to access the subsystems even without going through the Façade. You also learned the pattern with a UML diagram and a sample code implementation in Python v3.5. We understood the principle of least knowledge and how its philosophy governs the Façade design pattern.
Low-level C# Practices

Working with generics

Visual Studio 2005 included .NET version 2.0, which introduced generics. Generics give developers the ability to design classes and methods that defer the specification of specific parts of a class or method's specification until declaration or instantiation. Generics offer features previously unavailable in .NET. One benefit of generics, potentially the most common one, is the implementation of collections that provide a consistent interface to collections of different data types without needing to write specific code for each data type. Constraints can be used to restrict the types that are supported by a generic method or class, or to guarantee specific interfaces.

Limits of generics

Constraints within generics in C# are currently limited to a parameter-less constructor, interfaces or base classes, or whether or not the type is a struct or a class (value or reference type). This really means that code within a generic method or type can either construct instances or make use of methods and properties. Due to these restrictions, code within generic types or methods cannot use operators on the generic type parameters.

Writing sequence and iterator members

Visual Studio 2005 and C# 2.0 introduced the yield keyword. The yield keyword is used within an iterator member as a means to effectively implement an IEnumerable interface without needing to implement the entire IEnumerable interface. Iterator members are members that return a type of IEnumerable or IEnumerable<T>, and return individual elements in the enumerable via yield return, or deterministically terminate the enumerable via yield break. These members can be anything that can return a value, such as methods, properties, or operators. An iterator that returns without calling yield break has an implied yield break, just as a void method has an implied return.

Iterators operate on a sequence but process and return each element as it is requested. This means that iterators implement what is known as deferred execution. Deferred execution is when some or all of the code, although reached in terms of where the instruction pointer is in relation to the code, hasn't entirely been executed yet. Iterators are methods that can be executed more than once and result in a different execution path for each execution. Let's look at an example:

public static IEnumerable<DateTime> Iterator()
{
    Thread.Sleep(1000);
    yield return DateTime.Now;
    Thread.Sleep(1000);
    yield return DateTime.Now;
    Thread.Sleep(1000);
    yield return DateTime.Now;
}

The Iterator method returns IEnumerable<DateTime>, which results in three DateTime values, and the creation of those three values is actually invoked at different times. The Iterator method is compiled in such a way that a state machine is created under the covers to keep track of how many times the code is invoked, and it is implemented as a special IEnumerable<DateTime> object. The actual invocation of the code in the method is done through each call of the resulting IEnumerator.MoveNext() method. The resulting IEnumerable is really implemented as a collection of delegates that are executed upon each invocation of the MoveNext method, where the state, in the simplest case, is really which of the delegates to invoke next. It's actually more complicated than that, especially when there are local variables and state that can change between invocations and is used across invocations, but the compiler takes care of all that. Effectively, iterators are broken up into individual bits of code between yield return statements that are executed independently, each using potentially shared local data.
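A small driver (my own illustration, not from the book) makes the deferred execution visible: each element of the preceding Iterator() is produced only when the loop asks for it, roughly one second apart.

using System;
using System.Collections.Generic;
using System.Threading;

class DeferredExecutionDemo
{
    // Repeated from the text so that the sketch is self-contained.
    public static IEnumerable<DateTime> Iterator()
    {
        Thread.Sleep(1000);
        yield return DateTime.Now;
        Thread.Sleep(1000);
        yield return DateTime.Now;
        Thread.Sleep(1000);
        yield return DateTime.Now;
    }

    static void Main()
    {
        // Nothing in Iterator() has run yet; we only hold a handle.
        var sequence = Iterator();
        foreach (var stamp in sequence)
        {
            // Each iteration runs the code up to the next yield return,
            // so the one-second Sleep happens here, between time stamps.
            Console.WriteLine(stamp.ToString("HH:mm:ss.fff"));
        }
    }
}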
What are iterators good for, other than being a really cool interview question? Well, first of all, due to the deferred execution, we can technically create sequences that don't need to be stored in memory all at one time. This is often useful when we want to project one sequence into another. Couple that with a source sequence that is also implemented with deferred execution, and we end up creating and processing IEnumerables (also known as collections) whose content is never all in memory at the same time. We can process large (or even infinite) collections without a huge strain on memory.

For example, if we wanted to model the set of positive integer values (an infinite set), we could write an iterator method shown as follows:

static IEnumerable<BigInteger> AllThePositiveIntegers()
{
    var number = new BigInteger(0);
    while (true)
        yield return number++;
}

We can then chain this iterator with another iterator, say something that gets all of the positive squares:

static IEnumerable<BigInteger> AllThePositiveIntegerSquares(
    IEnumerable<BigInteger> sourceIntegers)
{
    foreach (var value in sourceIntegers)
        yield return value * value;
}

Which we could use as follows:

foreach (var value in AllThePositiveIntegerSquares(AllThePositiveIntegers()))
    Console.WriteLine(value);

We've now effectively modeled two infinite collections of integers in memory. Of course, our AllThePositiveIntegerSquares method could just as easily be used with finite sequences of values, for example:

foreach (var value in AllThePositiveIntegerSquares(
    Enumerable.Range(0, int.MaxValue)
        .Select(v => new BigInteger(v))))
    Console.WriteLine(value);

In this example, we go through all of the positive Int32 values and square each one without ever holding a complete collection of the set of values in memory. As we see, this is a useful method for composing multiple steps that operate on, and result in, sequences of values.

We could have easily done this without IEnumerable<T>, or created an IEnumerator class whose MoveNext method performed calculations instead of navigating an array. However, this would be tedious and is likely to be error-prone. In the case of not using IEnumerable<T>, we'd be unable to operate on the data as a collection with things such as foreach.

Context: When modeling a sequence of values that is either known only at runtime, or where each element can be reliably calculated at runtime.
Practice: Consider using an iterator.

Working with lambdas

Visual Studio 2008 introduced C# 3.0. In this version of C#, lambda expressions were introduced. Lambda expressions are another form of anonymous functions. Lambdas were added to the language syntax primarily as an easier anonymous function syntax for LINQ queries. Although you can't really think of LINQ without lambda expressions, lambda expressions are a powerful aspect of the C# language in their own right. They are concise expressions that use implicitly-typed optional input parameters whose types are implied through the context of their use, rather than through explicit definition as with anonymous methods.

Along with C# 3.0 in Visual Studio 2008, the .NET Framework 3.5 was introduced, which included many new types to support LINQ expressions, such as Action<T> and Func<T>. These delegates are used primarily as definitions for different types of anonymous methods (including lambda expressions).
The following is an example of passing a lambda expression to a method that takes a Func<T1, T2, TResult> delegate and the two arguments to pass along to the delegate:

ExecuteFunc((f, s) => f + s, 1, 2);

The same statement with anonymous methods:

ExecuteFunc(delegate(int f, int s) { return f + s; }, 1, 2);

It's clear that the lambda syntax has a tendency to be much more concise, replacing the delegate and braces with the "goes to" operator (=>). Prior to anonymous functions, member methods would need to be created to pass as delegates to methods. For example:

ExecuteFunc(SomeMethod, 1, 2);

This, presumably, would use a method named SomeMethod that looked similar to:

private static int SomeMethod(int first, int second)
{
    return first + second;
}

Lambda expressions are more powerful in their type inference abilities, as we've seen from our examples so far. We need to explicitly type the parameters within anonymous methods, which is only optional for parameters in lambda expressions.

LINQ statements don't use lambda expressions exactly in their syntax; the lambda expressions are somewhat implicit. For example, if we wanted to create a new collection of integers from another collection of integers, with each value incremented by one, we could use the following LINQ statement:

var x = from i in arr select i + 1;

The i + 1 expression isn't really a lambda expression, but it gets processed as if it were first converted to method syntax using a lambda expression:

var x = arr.Select(i => i + 1);

The same with an anonymous method would be:

var x = arr.Select(delegate(int i) { return i + 1; });

What we see in the LINQ statement is much closer to a lambda expression. Using lambda expressions for all anonymous functions means that you have more consistent-looking code.

Context: When using anonymous functions.
Practice: Prefer lambda expressions over anonymous methods.

Parameters to lambda expressions can be enclosed in parentheses. For example:

var x = arr.Select((i) => i + 1);

The parentheses are only mandatory when there is more than one parameter:

var total = arr.Aggregate(0, (l, r) => l + r);

Context: When writing lambdas with a single parameter.
Practice: Prefer no parentheses around the parameter declaration.

Sometimes when using lambda expressions, the expression is being used as a delegate that takes an argument, but the corresponding parameter in the lambda expression may not be used within the right-hand expression (or statements). In these cases, to reduce the clutter in the statement, it's common to use the underscore character (_) for the name of the parameter. For example:

task.ContinueWith(_ => ProcessSecondHalfOfData());

The task.ContinueWith method takes an Action<Task> delegate. This means the previous lambda expression is actually given a task instance (the antecedent Task). In our example, we don't use that task; we just perform some completely independent operation. In this case, we use (_) not only to signify that we know we don't use that parameter, but also to reduce the clutter and potential name collisions a little bit.

Context: When writing lambda expressions that take a single parameter but the parameter is not used.
Practice: Use underscore (_) for the name of the parameter.

There are two types of lambda expressions. So far, we've seen expression lambdas. Expression lambdas are a single expression on the right-hand side that evaluates to a value or void. There is another type of lambda expression called statement lambdas.
These lambdas have one or more statements and are enclosed in braces. For example:

task.ContinueWith(_ =>
{
    var value = 10;
    value += ProcessSecondHalfOfData();
    ProcessSomeRandomValue(value);
});

As we can see, statement lambdas can declare variables, as well as have multiple statements.

Working with extension methods

Along with lambda expressions and iterators, C# 3.0 brought us extension methods. These static methods (contained in a static class, and whose first argument is modified with the this modifier) were created for LINQ, so that IEnumerable types could be queried without needing to add copious amounts of methods to the IEnumerable interface. An extension method has the basic form of:

public static class EnumerableExtensions
{
    public static IEnumerable<int> IntegerSquares(
        this IEnumerable<int> source)
    {
        return source.Select(value => value * value);
    }
}

As stated earlier, extension methods must be within a static class, be static methods, and their first parameter must be modified with the this modifier. Extension methods extend the available instance methods of a type. In our previous example, we've effectively added an instance member to IEnumerable<int> named IntegerSquares, so we get a sequence of integer values that have been squared.

Similarly, if we created an array of integer values and defined a Cubes extension method (like the one in the listing that follows), we will have added a Cubes method to that array that returns a sequence of the values cubed. For example:

var values = new int[] { 1, 2, 3 };
foreach (var v in values.Cubes())
{
    Console.WriteLine(v);
}

Having the ability to create new instance methods that operate on any public members of a specific type is a very powerful feature of the language. This, unfortunately, does not come without some caveats. Extension methods suffer inherently from a scoping problem: the only scoping that can occur with these methods is through the namespaces that have been referenced for any given C# source file. For example, we could have two static classes that contain two extension methods named Cubes. If those static classes are in the same namespace, we'd never be able to use those extension methods as extension methods, because the compiler would never be able to resolve which one to use. For example:

public static class IntegerEnumerableExtensions
{
    public static IEnumerable<int> Squares(
        this IEnumerable<int> source)
    {
        return source.Select(value => value * value);
    }

    public static IEnumerable<int> Cubes(
        this IEnumerable<int> source)
    {
        return source.Select(value => value * value * value);
    }
}

public static class EnumerableExtensions
{
    public static IEnumerable<int> Cubes(
        this IEnumerable<int> source)
    {
        return source.Select(value => value * value * value);
    }
}

If we tried to use Cubes as an extension method, we'd get a compile error. For example:

var values = new int[] { 1, 2, 3 };
foreach (var v in values.Cubes())
{
    Console.WriteLine(v);
}

This would result in error CS0121: The call is ambiguous between the following methods or properties.
To resolve the problem, we'd need to move one (or both) of the classes to another namespace, for example:

namespace Integers
{
    public static class IntegerEnumerableExtensions
    {
        public static IEnumerable<int> Squares(
            this IEnumerable<int> source)
        {
            return source.Select(value => value*value);
        }
        public static IEnumerable<int> Cubes(
            this IEnumerable<int> source)
        {
            return source.Select(value => value*value*value);
        }
    }
}

namespace Numerical
{
    public static class EnumerableExtensions
    {
        public static IEnumerable<int> Cubes(
            this IEnumerable<int> source)
        {
            return source.Select(value => value*value*value);
        }
    }
}

Then, we can scope to a particular namespace to choose which Cubes to use:

namespace ConsoleApplication
{
    using Numerical;
    internal class Program
    {
        private static void Main(string[] args)
        {
            var values = new int[] {1, 2, 3};
            foreach (var v in values.Cubes())
            {
                Console.WriteLine(v);
            }
        }
    }
}

Context: When considering extension methods, due to potential scoping problems. Practice: Use extension methods sparingly.

Context: When designing extension methods. Practice: Keep all extension methods that operate on a specific type in their own class.

Context: When designing classes to contain methods to extend a specific type, TypeName. Practice: Consider naming the static class TypeNameExtensions.

Context: When designing classes to contain methods to extend a specific type, in order to scope the extension methods. Practice: Consider placing the class in its own namespace.

Generally, there isn't much need to use extension methods on types that you own. You can simply add an instance method to contain the logic that you want to have. Where extension methods really shine is in effectively creating instance methods on interfaces. Typically, when code is necessary for shared implementations of interfaces, an abstract base class is created so each implementation of the interface can derive from it to implement these shared methods. This is a bit cumbersome in that it uses the one-and-only inheritance slot in C#, so an interface implementation would not be able to derive from or extend any other classes. Additionally, there's no guarantee that a given interface implementation will derive from the abstract base, so it runs the risk of not being able to be used in the way it was designed. Extension methods get around this problem by being entirely independent of the implementation of an interface while still being able to extend it. One of the most notable examples of this might be the System.Linq.Enumerable class introduced in .NET 3.5. The static Enumerable class almost entirely consists of extension methods that extend IEnumerable. It is easy to develop the same sort of thing for our own interfaces. For example, say we have an ICoordinate interface to model a three-dimensional position in relation to the Earth's surface:

public interface ICoordinate
{
    /// <summary>North/south degrees from equator.</summary>
    double Latitude { get; set; }
    /// <summary>East/west degrees from meridian.</summary>
    double Longitude { get; set; }
    /// <summary>Distance from sea level in meters.</summary>
    double Altitude { get; set; }
}

We could then create a static class to contain extension methods that provide shared functionality to any implementation of ICoordinate; a sketch of such a class follows below.

Context: When designing interfaces that require shared code. Practice: Consider providing extension methods instead of abstract base implementations.
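The extension class itself is not shown in this excerpt; the following is a minimal sketch of what it might look like. The ICoordinateExtensions name follows the naming practice above, but the DistanceTo method and its haversine great-circle calculation are illustrative assumptions, not the book's implementation:

public static class ICoordinateExtensions
{
    // Approximate surface distance in meters between two coordinates,
    // ignoring altitude and assuming a spherical Earth (illustration only).
    public static double DistanceTo(this ICoordinate source, ICoordinate other)
    {
        const double earthRadiusMeters = 6371000;
        var lat1 = source.Latitude * Math.PI / 180;
        var lat2 = other.Latitude * Math.PI / 180;
        var deltaLat = lat2 - lat1;
        var deltaLon = (other.Longitude - source.Longitude) * Math.PI / 180;
        var a = Math.Sin(deltaLat / 2) * Math.Sin(deltaLat / 2) +
                Math.Cos(lat1) * Math.Cos(lat2) *
                Math.Sin(deltaLon / 2) * Math.Sin(deltaLon / 2);
        return earthRadiusMeters * 2 * Math.Atan2(Math.Sqrt(a), Math.Sqrt(1 - a));
    }
}

Every implementation of ICoordinate now gets a DistanceTo instance method without having to derive from any base class.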
Performance Optimization

Packt
19 Dec 2014
30 min read
This article is written by Mark Kerzner and Sujee Maniyam, the authors of HBase Design Patterns. In it, we will talk about how to write high-performance and scalable HBase applications. In particular, we will take a look at the following topics:

The bulk loading of data into HBase
Profiling HBase applications
Tips to get good performance on writes
Tips to get good performance on reads

(For more resources related to this topic, see here.)

Loading bulk data into HBase

When deploying HBase for the first time, we usually need to import a significant amount of data. This is called initial loading or bootstrapping. There are three methods that can be used to import data into HBase, given as follows:

Using the Java API to insert data into HBase. This can be done in a single client, using single or multiple threads.
Using MapReduce to insert data in parallel (this approach also uses the Java API).
Using MapReduce to generate HBase store files in bulk and in parallel, and then importing them into HBase directly. (This approach does not require the use of the API; it does not require code and is very efficient.)

On comparing the three methods speed-wise, we have the following order: Java client < MapReduce insert < HBase file import. The Java client and MapReduce use HBase APIs to insert data. MapReduce runs on multiple machines and can exploit parallelism. However, both of these methods go through the write path in HBase. Importing HBase files directly, however, skips the usual write path: HBase files already have data in the correct format that HBase understands. That's why importing them is much faster than using MapReduce and the Java client. We covered the Java API earlier. Let's start with how to insert data using MapReduce.

Importing data into HBase using MapReduce

MapReduce is the distributed processing engine of Hadoop. Usually, programs read/write data from HDFS. Luckily, HBase supports MapReduce: HBase can be both a source and a sink for MapReduce programs. A source means MapReduce programs can read from HBase, and sink means results from MapReduce can be sent to HBase. The various combinations of sources and sinks can be summarized as follows:

Scenario 1 (source: HDFS, sink: HDFS): This is the typical MapReduce method that reads data from HDFS and also sends the results to HDFS.
Scenario 2 (source: HDFS, sink: HBase): This imports the data from HDFS into HBase. It's a very common method that is used to import data into HBase for the first time.
Scenario 3 (source: HBase, sink: HBase): Data is read from HBase and written back to it, most likely across two separate HBase clusters. It's usually used for backups and mirroring.

Importing data from HDFS into HBase

Let's say we have lots of data in HDFS and want to import it into HBase. We are going to write a MapReduce program that reads from HDFS and inserts data into HBase. This is depicted in the second scenario we just saw. Now, we'll be setting up the environment for the following discussion. In addition, you can find the code and the data for this discussion in our GitHub repository at https://github.com/elephantscale/hbase-book. The dataset we will use is the sensor data. Our (imaginary) sensor data is stored in HDFS as CSV (comma-separated values) text files.
This is how their format looks:

sensor_id, max temperature, min temperature

Here is some sample data:

sensor11,90,70
sensor22,80,70
sensor31,85,72
sensor33,75,72

We have two sample files (sensor-data1.csv and sensor-data2.csv) in our repository under the /data directory. Feel free to inspect them. The first thing we have to do is copy these files into HDFS. Create a directory in HDFS as follows:

$ hdfs dfs -mkdir hbase-import

Now, copy the files into HDFS:

$ hdfs dfs -put sensor-data* hbase-import/

Verify that the files exist as follows:

$ hdfs dfs -ls hbase-import

We are ready to insert this data into HBase. Note that we are designing the table to match the CSV files we are loading for ease of use. Our row key is sensor_id. We have one column family and we call it f (short for family). We will store two columns, max temperature and min temperature, in this column family.

Pig for MapReduce

Pig allows you to write MapReduce programs at a very high level, and inserting data into HBase is just as easy. Here's a Pig script that reads the sensor data from HDFS and writes it into HBase:

-- ## hdfs-to-hbase.pig
data = LOAD 'hbase-import/' USING PigStorage(',') AS (sensor_id:chararray, max:int, min:int);
-- describe data;
-- dump data;
STORE data INTO 'hbase://sensors' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('f:max,f:min');

In the first command, we load data from the hbase-import directory in HDFS. The schema for the data is defined as follows:

sensor_id : chararray (string)
max : int
min : int

The describe and dump statements can be used to inspect the data; in Pig, describe will give you the structure of the data object you have, and dump will output all the data to the terminal. The final STORE command is the one that inserts the data into HBase. Let's analyze how it is structured:

INTO 'hbase://sensors': This tells Pig to connect to the sensors HBase table.
org.apache.pig.backend.hadoop.hbase.HBaseStorage: This is the Pig class that will be used to write into HBase. Pig has adapters for multiple data stores.
The first field in the tuple, sensor_id, will be used as the row key.
We specify the column names for the max and min fields (f:max and f:min, respectively). Note that we have to specify the column family (f:) to qualify the columns.

Before running this script, we need to create an HBase table called sensors. We can do this from the HBase shell, as follows:

$ hbase shell
hbase> create 'sensors', 'f'
hbase> quit

Then, run the Pig script as follows:

$ pig hdfs-to-hbase.pig

Now watch the console output. Pig will execute the script as a MapReduce job. Even though we are only importing two small files here, we can insert a fairly large amount of data by exploiting the parallelism of MapReduce. At the end of the run, Pig will print out some statistics:

Input(s):
Successfully read 7 records (591 bytes) from: "hdfs://quickstart.cloudera:8020/user/cloudera/hbase-import"

Output(s):
Successfully stored 7 records in: "hbase://sensors"

Looks good! We should have seven rows in our HBase sensors table.
We can inspect the table from the HBase shell with the following commands:

$ hbase shell
hbase> scan 'sensors'

This is how your output might look:

ROW        COLUMN+CELL
sensor11   column=f:max, timestamp=1412373703149, value=90
sensor11   column=f:min, timestamp=1412373703149, value=70
sensor22   column=f:max, timestamp=1412373703177, value=80
sensor22   column=f:min, timestamp=1412373703177, value=70
sensor31   column=f:max, timestamp=1412373703177, value=85
sensor31   column=f:min, timestamp=1412373703177, value=72
sensor33   column=f:max, timestamp=1412373703177, value=75
sensor33   column=f:min, timestamp=1412373703177, value=72
sensor44   column=f:max, timestamp=1412373703184, value=55
sensor44   column=f:min, timestamp=1412373703184, value=42
sensor45   column=f:max, timestamp=1412373703184, value=57
sensor45   column=f:min, timestamp=1412373703184, value=47
sensor55   column=f:max, timestamp=1412373703184, value=55
sensor55   column=f:min, timestamp=1412373703184, value=42
7 row(s) in 0.0820 seconds

There you go; you can see that seven rows have been inserted! With Pig, it was very easy: it took us just two lines of Pig script to do the import.

Java MapReduce

We have just demonstrated MapReduce using Pig, and you now know that Pig is a concise and high-level way to write MapReduce programs, as demonstrated by our previous script of essentially two lines of Pig code. However, there are situations where you do want to use the Java API, and it would make more sense to use it than a Pig script. This can happen when you need Java to access Java libraries or do some other detailed tasks for which Pig is not a good match. For that, we have provided the Java version of the MapReduce code in our GitHub repository; a minimal sketch of its core idea follows.
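The full Java version lives in the repository; what follows is only a minimal sketch of the central piece, a map-only job whose mapper parses each CSV line and emits a Put against the sensors table (the class name and field handling here are illustrative, not the repository code):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SensorCsvMapper
        extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // A line such as "sensor11,90,70" becomes row key sensor11,
        // with f:max=90 and f:min=70.
        String[] fields = line.toString().split(",");
        byte[] rowKey = Bytes.toBytes(fields[0]);
        Put put = new Put(rowKey);
        put.add(Bytes.toBytes("f"), Bytes.toBytes("max"), Bytes.toBytes(fields[1]));
        put.add(Bytes.toBytes("f"), Bytes.toBytes("min"), Bytes.toBytes(fields[2]));
        context.write(new ImmutableBytesWritable(rowKey), put);
    }
}

In the driver, wiring the job to HBase with TableMapReduceUtil.initTableReducerJob("sensors", null, job) and setting job.setNumReduceTasks(0) makes this a map-only insert; there is nothing to aggregate, so no reducer is needed.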
Using HBase's bulk loader utility

HBase ships with a bulk loader tool called ImportTsv that can import files from HDFS into HBase tables directly. It is very easy to use, and as a bonus, it uses MapReduce internally to process files in parallel. Perform the following steps to use ImportTsv:

1. Stage the data files in HDFS (remember that the files are processed using MapReduce).
2. Create a table in HBase if required.
3. Run the import.

Staging data files into HDFS

The first step, staging data files into HDFS, has already been outlined in the previous section. The following sections explain the next two steps.

Creating an HBase table

We will do this from the HBase shell. A note on regions is in order here. Regions are shards created automatically by HBase. It is the regions that are responsible for the distributed nature of HBase. However, you need to pay some attention to them in order to assure performance. If you put all the data in one region, you will cause what is called region hotspotting. What is especially nice about the bulk loader is that, when creating a table, it lets you presplit the table into multiple regions. Precreating regions allows faster imports, because the insert requests go out to multiple region servers. Here, we are creating a single column family:

$ hbase shell
hbase> create 'sensors', {NAME => 'f'}, {SPLITS => ['sensor20', 'sensor40', 'sensor60']}
0 row(s) in 1.3940 seconds
=> Hbase::Table - sensors

hbase> describe 'sensors'
DESCRIPTION                                        ENABLED
'sensors', {NAME => 'f', DATA_BLOCK_ENCODING =>    true
'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE
=> '0', VERSIONS => '1', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', TTL => 'FOREVER',
KEEP_DELETED_CELLS => 'false', BLOCKSIZE =>
'65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.1140 seconds

The three split points give us exactly four regions: one below sensor20, one from sensor20 to sensor40, one from sensor40 to sensor60, and one above sensor60. On inspecting the table in the HBase Master UI, you can see these regions, along with the Start Key and End Key we specified.

Run the import

Now it's time to insert data into HBase. Running ImportTsv without arguments prints its usage:

$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv

Here is the command for our import:

$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=, -Dimporttsv.columns=HBASE_ROW_KEY,f:max,f:min sensors hbase-import/

The following list explains what the parameters mean:

-Dimporttsv.separator: Here, our separator is a comma (,). The default value is a tab (\t).
-Dimporttsv.columns=HBASE_ROW_KEY,f:max,f:min: This is where we map our input files into the HBase table. The first field, sensor_id, is our row key, denoted by HBASE_ROW_KEY; the rest we insert into column family f. The second field, max temp, maps to f:max, and the last field, min temp, maps to f:min.
sensors: This is the table name.
hbase-import: This is the HDFS directory where the data files are located.

When we run this command, we will see that a MapReduce job is kicked off. This is how the import is parallelized. From the console output, we can see that MapReduce is importing two files, as follows:

[main] mapreduce.JobSubmitter: number of splits:2

While the job is running, we can inspect the progress from YARN (or the JobTracker UI). One thing that we can note is that the MapReduce job only consists of mappers. This is because we are reading a bunch of files and inserting them into HBase directly; there is nothing to aggregate, so there is no need for reducers. After the job is done, inspect the counters and we can see this:

Map-Reduce Framework
Map input records=7
Map output records=7

This tells us that the mappers read seven records from the files and inserted seven records into HBase.
Let's also verify the data in HBase:

$ hbase shell
hbase> scan 'sensors'
ROW        COLUMN+CELL
sensor11   column=f:max, timestamp=1409087465345, value=90
sensor11   column=f:min, timestamp=1409087465345, value=70
sensor22   column=f:max, timestamp=1409087465345, value=80
sensor22   column=f:min, timestamp=1409087465345, value=70
sensor31   column=f:max, timestamp=1409087465345, value=85
sensor31   column=f:min, timestamp=1409087465345, value=72
sensor33   column=f:max, timestamp=1409087465345, value=75
sensor33   column=f:min, timestamp=1409087465345, value=72
sensor44   column=f:max, timestamp=1409087465345, value=55
sensor44   column=f:min, timestamp=1409087465345, value=42
sensor45   column=f:max, timestamp=1409087465345, value=57
sensor45   column=f:min, timestamp=1409087465345, value=47
sensor55   column=f:max, timestamp=1409087465345, value=55
sensor55   column=f:min, timestamp=1409087465345, value=42
7 row(s) in 2.1180 seconds

Your output might vary slightly. We can see that seven rows were inserted, confirming the MapReduce counters! Taking another quick look at the HBase Master UI shows that the inserts went to different regions. So, on an HBase cluster with many region servers, the load will be spread across the cluster, because we presplit the table into regions. Here is a question to test your understanding: run the same ImportTsv command again and see how many records are in the table. Do you get duplicates? Try to find the answer and explain why it is correct, then check it against the GitHub repository (https://github.com/elephantscale/hbase-book).

Bulk import scenarios

Here are a few bulk import scenarios:

Scenario 1: The data is already in HDFS and needs to be imported into HBase. Two methods can be used: if the ImportTsv tool works for you, use it, as it saves the time of writing custom MapReduce code; otherwise, write a custom MapReduce job to import (for example, for complex time series data, data mapping, and so on). Notes: It is probably a good idea to presplit the table before a bulk import; this spreads the insert requests across the cluster and results in a higher insert rate. If you are writing a custom MapReduce job, consider using a high-level MapReduce platform such as Pig or Hive; they are much more concise to write than Java code.

Scenario 2: The data is in another database (RDBMS/NoSQL) and you need to import it into HBase. Use a utility such as Sqoop to bring the data into HDFS and then use the tools outlined in the first scenario. Notes: Avoid writing MapReduce code that directly queries databases. Most databases cannot handle many simultaneous connections. It is best to bring the data into Hadoop (HDFS) first and then use MapReduce.

Profiling HBase applications

Just like in any software development process, once we have our HBase application working correctly, we will want to make it faster. At times, developers get too carried away and start optimizing before the application is finalized. There is a well-known rule that premature optimization is the root of all evil; one of the sources for this rule is Scott Meyers' Effective C++. We can perform some ad hoc profiling in our code by timing various function calls, and we can use profiling tools to pinpoint the trouble spots.
Using profiling tools is highly encouraged for the following reasons:

Profiling takes out the guesswork (and a good majority of developers' guesses are wrong).
There is no need to modify the code. Manual profiling would mean inserting instrumentation code all over the codebase; profilers instead work by inspecting the runtime behavior.
Most profilers have a nice and intuitive UI to visualize the program flow and the time spent.

The authors use JProfiler. It is a pretty effective profiler; however, it is neither free nor open source. So, for the purpose of this article, we are going to show you simple manual profiling, as follows:

public class UserInsert {

    static String tableName = "users";
    static String familyName = "info";

    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        // change the following to connect to remote clusters
        // config.set("hbase.zookeeper.quorum", "localhost");
        long t1a = System.currentTimeMillis();
        HTable htable = new HTable(config, tableName);
        long t1b = System.currentTimeMillis();
        System.out.println("Connected to HTable in : " + (t1b - t1a) + " ms");
        int total = 100;
        long t2a = System.currentTimeMillis();
        for (int i = 0; i < total; i++) {
            int userid = i;
            String email = "user-" + i + "@foo.com";
            String phone = "555-1234";

            byte[] key = Bytes.toBytes(userid);
            Put put = new Put(key);

            put.add(Bytes.toBytes(familyName), Bytes.toBytes("email"), Bytes.toBytes(email));
            put.add(Bytes.toBytes(familyName), Bytes.toBytes("phone"), Bytes.toBytes(phone));
            htable.put(put);
        }
        long t2b = System.currentTimeMillis();
        System.out.println("inserted " + total + " users in " + (t2b - t2a) + " ms");
        htable.close();
    }
}

The code we just saw inserts some sample user data into HBase. We are profiling two operations: connection time and actual insert time. A sample run of the Java application yields the following:

Connected to HTable in : 1139 ms
inserted 100 users in 350 ms

We spent a lot of time in connecting to HBase. This makes sense: the connection process has to go to ZooKeeper first and then to HBase, so it is an expensive operation. How can we minimize the connection cost? The answer is by using connection pooling. Luckily for us, HBase comes with a connection pool manager. The Java class for this is HConnectionManager. It is very simple to use.
Let's update our class to use HConnectionManager (file name: hbase_dp.ch8.UserInsert2.java):

package hbase_dp.ch8;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class UserInsert2 {

    static String tableName = "users";
    static String familyName = "info";

    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();
        // change the following to connect to remote clusters
        // config.set("hbase.zookeeper.quorum", "localhost");

        long t1a = System.currentTimeMillis();
        HConnection hConnection = HConnectionManager.createConnection(config);
        long t1b = System.currentTimeMillis();
        System.out.println("Connection manager in : " + (t1b - t1a) + " ms");

        // simulate the first 'connection'
        long t2a = System.currentTimeMillis();
        HTableInterface htable = hConnection.getTable(tableName);
        long t2b = System.currentTimeMillis();
        System.out.println("first connection in : " + (t2b - t2a) + " ms");

        // second connection
        long t3a = System.currentTimeMillis();
        HTableInterface htable2 = hConnection.getTable(tableName);
        long t3b = System.currentTimeMillis();
        System.out.println("second connection : " + (t3b - t3a) + " ms");

        int total = 100;
        long t4a = System.currentTimeMillis();
        for (int i = 0; i < total; i++) {
            int userid = i;
            String email = "user-" + i + "@foo.com";
            String phone = "555-1234";

            byte[] key = Bytes.toBytes(userid);
            Put put = new Put(key);

            put.add(Bytes.toBytes(familyName), Bytes.toBytes("email"), Bytes.toBytes(email));
            put.add(Bytes.toBytes(familyName), Bytes.toBytes("phone"), Bytes.toBytes(phone));
            htable.put(put);
        }
        long t4b = System.currentTimeMillis();
        System.out.println("inserted " + total + " users in " + (t4b - t4a) + " ms");
        hConnection.close();
    }
}

A sample run yields the following timings:

Connection manager in : 98 ms
first connection in : 808 ms
second connection : 0 ms
inserted 100 users in 393 ms

The first connection takes a long time, but then take a look at the time of the second connection: it is almost instant! This is cool! If you are connecting to HBase from web applications (or interactive applications), use connection pooling.

More tips for high-performing HBase writes

Here we will discuss some techniques and best practices to improve writes in HBase.

Batch writes

Currently, in our code, each time we call htable.put(one_put), we make an RPC call to an HBase region server. This round-trip delay can be minimized if we call htable.put() with a bunch of put records. Then, with one round trip, we can insert a bunch of records into HBase. This is called batch puts. Here is an example of batch puts; only the relevant section is shown for clarity.
For the full code, see hbase_dp.ch8.UserInsert3.java:

        int total = 100;
        long t4a = System.currentTimeMillis();
        List<Put> puts = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            int userid = i;
            String email = "user-" + i + "@foo.com";
            String phone = "555-1234";

            byte[] key = Bytes.toBytes(userid);
            Put put = new Put(key);

            put.add(Bytes.toBytes(familyName), Bytes.toBytes("email"), Bytes.toBytes(email));
            put.add(Bytes.toBytes(familyName), Bytes.toBytes("phone"), Bytes.toBytes(phone));

            puts.add(put); // just add to the list
        }
        htable.put(puts); // do a batch put
        long t4b = System.currentTimeMillis();
        System.out.println("inserted " + total + " users in " + (t4b - t4a) + " ms");

A sample run with a batch put is as follows:

inserted 100 users in 48 ms

The same code with individual puts took around 350 milliseconds! Use batch writes when you can to minimize latency. Note that the HTableUtil class that comes with HBase implements some smart batching options for your use and enjoyment.

Setting memory buffers

We can control when the puts are flushed by setting the client write buffer option. Once the data in memory exceeds this setting, it is flushed to disk. The default setting is 2 MB. Its purpose is to limit how much data is stored in the buffer before writing it to disk. There are two ways of setting this:

In hbase-site.xml (this setting will be cluster-wide):

<property>
  <name>hbase.client.write.buffer</name>
  <value>8388608</value> <!-- 8 MB -->
</property>

In the application (only applies to that application):

htable.setWriteBufferSize(1024*1024*10); // 10 MB

Keep in mind that a bigger buffer takes more memory on both the client side and the server side. As a practical guideline, estimate how much memory you can dedicate to the client and put the rest of the load on the cluster.

Turning off autoflush

If autoflush is enabled, each htable.put() incurs a round-trip RPC call to HRegionServer. Turning autoflush off can reduce the number of round trips and decrease latency. To turn it off, use this code:

htable.setAutoFlush(false);

The risk of turning off autoflush is that if the client crashes before the data is sent to HBase, that data will be lost. Still, when would you want to do it? The answer is: when the danger of data loss is not important and speed is paramount. Also, see the batch write recommendations we saw previously.

Turning off WAL

Before we discuss this, we need to emphasize that the write-ahead log (WAL) is there to prevent data loss in the case of server crashes. By turning it off, we bypass this protection. Be very careful when choosing this. Bulk loading is one of the cases where turning off WAL might make sense. To turn off WAL, set it for each put:

put.setDurability(Durability.SKIP_WAL);

More tips for high-performing HBase reads

So far, we looked at tips to write data into HBase. Now, let's take a look at some tips to read data faster.

The scan cache

When reading a large number of rows, it is better to set scan caching to a high number (in the hundreds or even the thousands). Otherwise, each row that is scanned will result in a trip to HRegionServer. This is especially encouraged for MapReduce jobs, as they will likely consume a lot of rows sequentially.
To set scan caching, use the following code:

Scan scan = new Scan();
scan.setCaching(1000);

Only read the families or columns needed

When fetching a row, by default, HBase returns all the families and all the columns. If you only care about one family or a few attributes, specifying them will save needless I/O. To specify a family, use this:

scan.addFamily(Bytes.toBytes("family1"));

To specify columns, use this:

scan.addColumn(Bytes.toBytes("family1"), Bytes.toBytes("col1"));

The block cache

When scanning large numbers of rows sequentially (say, in MapReduce), it is recommended that you turn off the block cache. Turning off the cache might seem completely counter-intuitive. However, caches are only effective when we repeatedly access the same rows. During a sequential scan, each row is touched only once, so keeping the block cache on would introduce a lot of churning in the cache (new data is constantly brought into the cache and old data is evicted to make room for it). So, we have the following points to consider:

Turn off the block cache for sequential scans
Leave the block cache on for random/repeated access

Benchmarking or load testing HBase

Benchmarking is a good way to verify HBase's setup and performance. There are a few good benchmarks available:

HBase's built-in benchmark
The Yahoo! Cloud Serving Benchmark (YCSB)
JMeter for custom workloads

HBase's built-in benchmark

HBase's built-in benchmark is PerformanceEvaluation. To find its usage, use this:

$ hbase org.apache.hadoop.hbase.PerformanceEvaluation

To perform a write benchmark, use this:

$ hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred randomWrite 5

Here we are using five threads and no MapReduce. To accurately measure the throughput, we need to presplit the table that the benchmark writes to (it is named TestTable):

$ hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred --presplit=3 randomWrite 5

Here, the table is split three ways. It is good practice to split the table into as many regions as there are region servers. There is a read option, along with a whole host of scan options.

YCSB

The YCSB is a comprehensive benchmark suite that works with many systems such as Cassandra, Accumulo, and HBase. Download it from GitHub, as follows:

$ git clone git://github.com/brianfrankcooper/YCSB.git

Build it like this:

$ mvn -DskipTests package

Create an HBase table to test against:

$ hbase shell
hbase> create 'ycsb', 'f1'

Now, copy hdfs-site.xml for your cluster into the hbase/src/main/conf/ directory and run the benchmark:

$ bin/ycsb load hbase -P workloads/workloada -p columnfamily=f1 -p table=ycsb

YCSB offers lots of workloads and options. Please refer to its wiki page at https://github.com/brianfrankcooper/YCSB/wiki.

JMeter for custom workloads

The standard benchmarks will give you an idea of your HBase cluster's performance. However, nothing can substitute for measuring your own workload. We want to measure at least the insert speed and the query speed, and we also want to run a stress test, so we can measure the ceiling on how much our HBase cluster can support. We can do simple instrumentation as we did earlier, but there are tools such as JMeter that can help us with load testing. Please refer to the JMeter website and check out the Hadoop or HBase plugins for JMeter.

Monitoring HBase

Running any distributed system involves decent monitoring. HBase is no exception.
Luckily, HBase has the following capabilities:

HBase exposes a lot of metrics
These metrics can be directly consumed by monitoring systems such as Ganglia
We can also obtain these metrics in JSON format via the REST interface and JMX

Monitoring is a big subject, and we consider it part of HBase administration, so in this section we will only give pointers to tools and utilities that allow you to monitor HBase.

Ganglia

Ganglia is a generic system monitor that can monitor hosts (CPU, disk usage, and so on). The Hadoop stack has had pretty good integration with Ganglia for some time now, and HBase/Ganglia integration is set up by modern installers from Cloudera and Hortonworks. To enable Ganglia metrics, update the hadoop-metrics.properties file in the HBase configuration directory. Here's a sample file:

hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
hbase.period=10
hbase.servers=ganglia-server:PORT

jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period=10
jvm.servers=ganglia-server:PORT

rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
rpc.period=10
rpc.servers=ganglia-server:PORT

This file has to be uploaded to all the HBase servers (master servers as well as region servers). The resulting Ganglia graphs (the public Wikimedia statistics are a good example) show cluster-wide resource utilization.

OpenTSDB

OpenTSDB is a scalable time series database. It can collect and visualize metrics on a large scale. OpenTSDB uses collectors, lightweight agents that send metrics to the OpenTSDB server, and there is a collector library that can gather metrics from HBase. You can see all the collectors at http://opentsdb.net/docs/build/html/user_guide/utilities/tcollector.html. An interesting factoid is that OpenTSDB is itself built on Hadoop/HBase.

Collecting metrics via the JMX interface

HBase exposes a lot of metrics via JMX. They can be accessed from the web dashboard at http://<hbase master>:60010/jmx; for example, for an HBase instance that is running locally, it will be http://localhost:60010/jmx. Here's a quick example of how to programmatically retrieve these metrics using curl:

$ curl 'localhost:60010/jmx'

Since this is a web service, we can write a script/application in any language (Java, Python, or Ruby) to retrieve and inspect the metrics.

Summary

In this article, you learned how to push the performance of your HBase applications up. We looked at how to effectively load a large amount of data into HBase. You also learned about benchmarking and monitoring HBase and saw tips on how to do high-performing reads/writes.

Resources for Article:

Further resources on this subject:
The HBase's Data Storage [article]
Hadoop and HDInsight in a Heartbeat [article]
Understanding the HBase Ecosystem [article]
Index, Item Sharding, and Projection in DynamoDB

Packt
17 Sep 2014
13 min read
Understanding the secondary index and projection should go hand in hand, because a secondary index cannot be used efficiently without specifying a projection. In this article by Uchit Vyas and Prabhakaran Kuppusamy, authors of DynamoDB Applied Design Patterns, we will take a look at local and global secondary indexes, and projection and its usage with indexes. (For more resources related to this topic, see here.) The use of projection in DynamoDB is pretty similar to that of traditional databases. However, here are a few things to watch out for:

Whenever a DynamoDB table is created, it is mandatory to create a primary key, which can be of a simple type (hash type) or of a complex type (hash and range key). For the specified primary key, an index will be created; we call this index the primary index.
Along with this primary key index, the user is allowed to create up to five secondary indexes per table.
There are two kinds of secondary index. The first is the local secondary index, in which the hash key of the index must be the same as that of the table; the second is the global secondary index, in which the hash key can be any field. In both of these secondary index types, the range key can be any field that the user needs to create an index for.

Secondary indexes

A quick question: when writing a query in any database, why does keeping the primary key field as part of the query (especially in the where condition) return results much faster than the alternative? This is because, in most databases, an index is created automatically for the primary key field. This is the case with DynamoDB too. This index is called the primary index of the table. No customization is possible on the primary index, so the primary index is seldom discussed. In order to make retrieval faster, the frequently retrieved attributes need to be made part of an index. However, a DynamoDB table can have only one primary index, and that index can have a maximum of two attributes (hash and range key). So, for faster retrieval, the user should be given privileges to create user-defined indexes. An index created by the user is called a secondary index. Similar to the table key schema, the secondary index also has a key schema. Based on its key schema attributes, a secondary index is either a local or a global secondary index. Once a secondary index exists, every item insertion into the table causes the items in the index to be rearranged, provided the item contains both the index's hash and range key attributes.

Projection

Once we have an understanding of the secondary index, we are all set to learn about projection. While creating a secondary index, it is mandatory to specify the hash and range attributes based on which the index is created. Apart from these two attributes, if the query asks for one or more other attributes (assuming that none of these attributes is projected into the index), then DynamoDB will scan the entire table. This will consume a lot of throughput capacity and will have comparatively higher latency.
The examples that follow use a table that stores book information, with attributes such as BookTitle, Edition, PubDate, Publisher, and Language. Here are a few more details about the table:

The BookTitle attribute is the hash key of both the table and the local secondary index
The Edition attribute is the range key of the table
The PubDate attribute is the range key of the index (let's call this index Idx_PubDate)

Local secondary index

While creating the secondary index, the hash and range keys of the table and index will be inserted into the index; optionally, the user can specify what other attributes need to be added. There are three kinds of projection possible in DynamoDB:

KEYS_ONLY: Using this, the index consists of the hash and range key values of the table and index
INCLUDE: Using this, the index consists of the attributes in KEYS_ONLY plus other non-key attributes that we specify
ALL: Using this, the index consists of all of the attributes from the source table

The following code shows the creation of a local secondary index named Idx_PubDate with BookTitle as the hash key (which is a must in the case of a local secondary index), PubDate as the range key, and using the KEYS_ONLY projection:

private static LocalSecondaryIndex getLocalSecondaryIndex() {
    ArrayList<KeySchemaElement> indexKeySchema =
        new ArrayList<KeySchemaElement>();
    indexKeySchema.add(new KeySchemaElement()
        .withAttributeName("BookTitle")
        .withKeyType(KeyType.HASH));
    indexKeySchema.add(new KeySchemaElement()
        .withAttributeName("PubDate")
        .withKeyType(KeyType.RANGE));
    LocalSecondaryIndex lsi = new LocalSecondaryIndex()
        .withIndexName("Idx_PubDate")
        .withKeySchema(indexKeySchema)
        .withProjection(new Projection()
        .withProjectionType("KEYS_ONLY"));
    return lsi;
}

Using the KEYS_ONLY projection type creates the smallest possible index, and using ALL creates the biggest possible index. We will discuss the trade-offs between these index types a little later. Going back to our example, let us assume that we are using the KEYS_ONLY projection, so none of the attributes other than the three key attributes is projected into the index. The row order of the index is almost the same as the table order, with one difference: the table records are grouped primarily based on the hash key, and the records that share a hash key are then ordered based on the range key of the index. In the case of the index, even though the table's range key is part of the index attributes, it plays no role in the ordering; only the index's hash and range keys take part in the ordering. There is a downside to this approach. If the user writes a query using this index to fetch BookTitle and Publisher with PubDate as 28-Dec-2008, then what happens? Will DynamoDB complain that the Publisher attribute is not projected into the index? The answer is no. Even though Publisher is not projected into the index, we can still retrieve it using the secondary index. However, retrieving a nonprojected attribute will scan the entire table. So if we are sure that certain attributes need to be fetched frequently, then we must project them into the index; otherwise, the query will consume a large number of capacity units and retrieval will be much slower as well. One more question: if the user writes a query using the local secondary index to fetch BookTitle and Publisher with PubDate as 28-Dec-2008, then what happens?
Will DynamoDB complain that the PubDate attribute is not part of the primary key and hence queries are not allowed on nonprimary key attributes? The answer is no. As a rule of thumb, we can write queries on the secondary index attributes. It is possible to include nonprimary key attributes as part of the query, but these attributes must at least be key attributes of the index. The following code shows how to add non-key attributes to the secondary index's projection:

private static Projection getProjectionWithNonKeyAttr() {
    Projection projection = new Projection()
        .withProjectionType(ProjectionType.INCLUDE);
    ArrayList<String> nonKeyAttributes = new ArrayList<String>();
    nonKeyAttributes.add("Language");
    nonKeyAttributes.add("Author2");
    projection.setNonKeyAttributes(nonKeyAttributes);
    return projection;
}

There is a slight limitation with the local secondary index. If we write a query on an attribute that is a non-key for both the table and the index, then internally DynamoDB might need to scan the entire table; this is inefficient. For example, consider a situation in which we need to retrieve the number of editions of the books in each and every language. Since both of the attributes are non-key, even if we create a local secondary index with either of the attributes (Edition and Language), the query will still result in a scan operation on the entire table.

Global secondary index

A question arises here: is there any way to create a secondary index whose keys are both different from the table's primary key attributes? The answer is the global secondary index. The following code shows how to create the global secondary index for this scenario:

private static GlobalSecondaryIndex getGlobalSecondaryIndex() {
    GlobalSecondaryIndex gsi = new GlobalSecondaryIndex()
        .withIndexName("Idx_Pub_Edtn")
        .withProvisionedThroughput(new ProvisionedThroughput()
        .withReadCapacityUnits((long) 1)
        .withWriteCapacityUnits((long) 1))
        .withProjection(new Projection().withProjectionType("KEYS_ONLY"));

    ArrayList<KeySchemaElement> indexKeySchema1 =
        new ArrayList<KeySchemaElement>();

    indexKeySchema1.add(new KeySchemaElement()
        .withAttributeName("Language")
        .withKeyType(KeyType.HASH));
    indexKeySchema1.add(new KeySchemaElement()
        .withAttributeName("Edition")
        .withKeyType(KeyType.RANGE));

    gsi.setKeySchema(indexKeySchema1);
    return gsi;
}

While deciding which attributes to project into a global secondary index, there are trade-offs to consider between provisioned throughput and storage costs. A few of these are listed as follows:

If our application doesn't need to query the table very often, but performs frequent writes or updates against the data in the table, then we must consider projecting KEYS_ONLY. The global secondary index will be of minimum size, but it will still be available when required for query activity. The smaller the index, the cheaper it is to store, and the cheaper our write costs are too.
If we only need to access a few attributes, with the lowest possible latency, then we must project only those (fewer) attributes into the global secondary index.
If we need to access almost all of the non-key attributes of the DynamoDB table on a frequent basis, we can project these attributes (even the entire table) into the global secondary index. This gives us maximum flexibility, with the trade-off that our storage cost will increase, or even double if we project the entire table's attributes into the index.
If our application will frequently retrieve some non-key attributes, we must consider projecting these non-key attributes into the global secondary index, as the additional storage costs of the index might equalize the cost of performing frequent table scans.

A sketch of how these index definitions get attached to a table at creation time follows below.
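The helper methods above return index definitions but are not shown being attached to a table. The following is a minimal sketch of that wiring with the AWS SDK for Java; the table name, attribute types, and throughput figures are illustrative assumptions:

// All classes come from com.amazonaws.services.dynamodbv2.model.
private static CreateTableRequest getCreateTableRequest() {
    return new CreateTableRequest()
        .withTableName("Books")
        // Every attribute used in any key schema (table or index) must be declared.
        .withAttributeDefinitions(
            new AttributeDefinition("BookTitle", ScalarAttributeType.S),
            new AttributeDefinition("Edition", ScalarAttributeType.S),
            new AttributeDefinition("PubDate", ScalarAttributeType.S),
            new AttributeDefinition("Language", ScalarAttributeType.S))
        .withKeySchema(
            new KeySchemaElement("BookTitle", KeyType.HASH),
            new KeySchemaElement("Edition", KeyType.RANGE))
        .withLocalSecondaryIndexes(getLocalSecondaryIndex())
        .withGlobalSecondaryIndexes(getGlobalSecondaryIndex())
        .withProvisionedThroughput(new ProvisionedThroughput(1L, 1L));
}

Passing this request to the client's createTable method creates the table together with both indexes in a single call. Note that a local secondary index can only be defined at table-creation time.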
Item sharding

Sharding, also called horizontal partitioning, is a technique in which rows are distributed among the database servers to perform queries faster. In the case of sharding, a hash operation is performed on the table rows (mostly on one of the columns) and, based on the hash output, the rows are grouped and sent to the proper database server. If all the table data is stored on a single database server, the read and write operations become slower, and the server holding the frequently accessed data has to work much harder than the server storing data that is rarely accessed. To see the advantage of sharding in a multitable, multiserver environment, consider two tables (Tbl_Places and Tbl_Sports), each with four sample rows, where a hash operation is performed on the first column only. In DynamoDB, this hashing is performed automatically. Once the hashing is done, rows with similar hashes are saved automatically on different servers (if necessary) to satisfy the specified provisioned throughput capacity. Have you ever wondered about the importance of the hash type key while creating a table (which is mandatory)? Of course, we all know what the range key does: it simply sorts items based on the range key value. So far, we might have been thinking that the range key is more important than the hash key. That may be correct, provided we neither need our table to be provisioned faster nor need to create any partitions for our table. As long as the table data is small, the importance of the hash key is realized only while writing a query. However, once the table grows, in order to satisfy the same provisioned throughput, DynamoDB needs to partition the table data based on this hash key. This partitioning of table items based on the hash key attribute is called item sharding; it means the partitions are created by splitting items and not attributes. This is the reason why a query that includes the hash key (of the table or the index) retrieves items much faster. Since the number of partitions is managed automatically by DynamoDB, we cannot just hope for things to work fine. We also need to keep certain things in mind; for example, the hash key attribute should have many distinct values. To put it simply, it is not advisable to put low-cardinality values (such as Yes or No, or Present, Past, or Future, and so on) into the hash key attribute, as this restricts the number of possible partitions. If our hash key attribute has either Yes or No values in all the items, then DynamoDB can create a maximum of only two partitions, and therefore the specified provisioned throughput cannot be achieved. Just consider that we have created a table called Tbl_Sports with a provisioned throughput capacity of 10, and then we put 10 items into the table.
Assuming that only a single partition is created, we are able to retrieve 10 items per second. After a point of time, we put 10 more items into the table. DynamoDB will create another partition (by hashing over the hash key), thereby satisfying the provisioned throughput capacity. There is a formula taken from the AWS site:

throughput per partition = total provisioned throughput / number of partitions

or, equivalently:

number of partitions = total provisioned throughput / throughput per partition

For example, a table provisioned with a throughput of 10, split across two partitions, can serve roughly 5 units of throughput from each partition. In order to satisfy the throughput capacity, the other parameters are automatically managed by DynamoDB.

Summary

In this article, we saw what the local and global secondary indexes are. We walked through projection and its usage with indexes.

Resources for Article:

Further resources on this subject:
Comparative Study of NoSQL Products [Article]
Ruby with MongoDB for Web Development [Article]
Amazon DynamoDB - Modelling relationships, Error handling [Article]
The Chain of Responsibility Pattern

Packt
05 Feb 2015
12 min read
In this article by Sakis Kasampalis, author of the book Mastering Python Design Patterns, we will see a detailed description of the Chain of Responsibility design pattern with the help of a real-life example as well as a software example. Its use cases and implementation are also discussed. (For more resources related to this topic, see here.) When developing an application, most of the time we know in advance which method should satisfy a particular request. However, this is not always the case. For example, think of any broadcast computer network, such as the original Ethernet implementation [j.mp/wikishared]. In broadcast computer networks, all requests are sent to all nodes (broadcast domains are excluded for simplicity), but only the nodes that are interested in a sent request process it. All computers that participate in a broadcast network are connected to each other using a common medium, such as a cable connecting all the nodes. If a node is not interested or does not know how to handle a request, it can perform the following actions:

Ignore the request and do nothing
Forward the request to the next node

The way in which the node reacts to a request is an implementation detail. However, we can use the analogy of a broadcast computer network to understand what the Chain of Responsibility pattern is all about. The Chain of Responsibility pattern is used when we want to give multiple objects a chance to satisfy a single request, or when we don't know in advance which object (from a chain of objects) should process a specific request. The principle is the following:

There is a chain (linked list, tree, or any other convenient data structure) of objects.
We start by sending a request to the first object in the chain.
The object decides whether it should satisfy the request or not.
The object forwards the request to the next object.
This procedure is repeated until we reach the end of the chain.

At the application level, instead of talking about cables and network nodes, we can focus on objects and the flow of a request. The figures at www.sourcemaking.com [j.mp/smchain] show how the client code sends a request through the chain of processing elements (also known as nodes or handlers) of an application. Note that the client code only knows about the first processing element, instead of having references to all of them, and each processing element only knows about its immediate next neighbor (called the successor), not about every other processing element. This is usually a one-way relationship, which in programming terms means a singly linked list, in contrast to a doubly linked list; a singly linked list does not allow navigation in both directions, while a doubly linked list does. This chain organization is used for a good reason: it achieves decoupling between the sender (client) and the receivers (processing elements) [GOF95, page 254].

A real-life example

ATMs and, in general, any kind of machine that accepts/returns banknotes or coins (for example, a snack vending machine) use the Chain of Responsibility pattern. There is always a single slot for all banknotes, as illustrated at www.sourcemaking.com. When a banknote is dropped, it is routed to the appropriate receptacle. When it is returned, it is taken from the appropriate receptacle [j.mp/smchain], [j.mp/c2chain].
We can think of the single slot as the shared communication medium and the different receptacles as the processing elements. The result contains cash from one or more receptacles; for example, a request for $175 from the ATM can be satisfied with banknotes taken from several different receptacles.

A software example

I tried to find some good examples of Python applications that use the Chain of Responsibility pattern, but I couldn't, most likely because Python programmers don't use this name. So, my apologies, but I will use other programming languages as a reference. The servlet filters of Java are pieces of code that are executed before an HTTP request arrives at a target. When using servlet filters, there is a chain of filters. Each filter performs a different action (user authentication, logging, data compression, and so forth), and either forwards the request to the next filter until the chain is exhausted, or breaks the flow if there is an error (for example, if the authentication failed three consecutive times) [j.mp/soservl]. Apple's Cocoa and Cocoa Touch frameworks use Chain of Responsibility to handle events. When a view receives an event that it doesn't know how to handle, it forwards the event to its superview. This goes on until a view is capable of handling the event or the chain of views is exhausted [j.mp/chaincocoa].

Use cases

By using the Chain of Responsibility pattern, we give a number of different objects the chance to satisfy a specific request. This is useful when we don't know in advance which object should satisfy a request. An example is a purchase system. In purchase systems, there are many approval authorities. One approval authority might be able to approve orders up to a certain value, let's say $100. If the order is for more than $100, the order is sent to the next approval authority in the chain, which can approve orders up to $200, and so forth (a minimal sketch of such an approval chain follows at the end of this section). Another case where Chain of Responsibility is useful is when we know that more than one object might need to process a single request. This is what happens in event-based programming: a single event, such as a left mouse click, can be caught by more than one listener. It is important to note that the Chain of Responsibility pattern is not very useful if all the requests can be taken care of by a single processing element, unless we really don't know which element that is. The value of this pattern is the decoupling that it offers. Instead of having a many-to-many relationship between a client and all processing elements (and the same is true regarding the relationship between a processing element and all the other processing elements), a client only needs to know how to communicate with the start (head) of the chain. The idea behind such loosely coupled systems is to simplify maintenance and make it easier for us to understand how they function [j.mp/loosecoup].
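Before moving on to the main example, here is the purchase-system use case from above as a minimal sketch; the class name, authority names, and limits are illustrative assumptions:

class ApprovalAuthority:
    """A link in the approval chain; approves orders up to its limit."""
    def __init__(self, name, limit, successor=None):
        self.name = name
        self.limit = limit
        self.successor = successor

    def approve(self, amount):
        if amount <= self.limit:
            print('{} approved the ${} order'.format(self.name, amount))
        elif self.successor:
            # not our responsibility; forward along the chain
            self.successor.approve(amount)
        else:
            print('no authority could approve the ${} order'.format(amount))

chain = ApprovalAuthority('Manager', 100,
        ApprovalAuthority('Director', 200,
        ApprovalAuthority('Vice President', 1000)))
chain.approve(175)   # Director approved the $175 order

The client holds a reference only to the head of the chain (the manager); where the request is actually satisfied is invisible to it.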
Implementation

There are many ways to implement Chain of Responsibility in Python, but my favorite implementation is the one by Vespe Savikko [j.mp/savviko]. Vespe's implementation uses dynamic dispatching in a Pythonic style to handle requests [j.mp/ddispatch]. Let's implement a simple event-based system using Vespe's implementation as a guide. The following is the UML class diagram of the system:

The Event class describes an event. We'll keep it simple, so in our case an event has only a name:

    class Event:
        def __init__(self, name):
            self.name = name

        def __str__(self):
            return self.name

The Widget class is the core class of the application. The parent aggregation shown in the UML diagram indicates that each widget can have a reference to a parent object, which, by convention, we assume is a Widget instance. Note, however, that according to the rules of inheritance, an instance of any of the subclasses of Widget (for example, an instance of MsgText) is also an instance of Widget. The default value of parent is None:

    class Widget:
        def __init__(self, parent=None):
            self.parent = parent

The handle() method uses dynamic dispatching through hasattr() and getattr() to decide who is the handler of a specific request (event). If the widget that is asked to handle an event does not support it, there are two fallback mechanisms. If the widget has a parent, then the handle() method of the parent is executed. If the widget has no parent but a handle_default() method, handle_default() is executed:

        def handle(self, event):
            handler = 'handle_{}'.format(event)
            if hasattr(self, handler):
                method = getattr(self, handler)
                method(event)
            elif self.parent:
                self.parent.handle(event)
            elif hasattr(self, 'handle_default'):
                self.handle_default(event)

At this point, you might have realized why the Widget and Event classes are only associated (there are no aggregation or composition relationships) in the UML class diagram. The association is used to show that the Widget class "knows" about the Event class but does not hold any strict reference to it, since an event only needs to be passed as a parameter to handle().

MainWindow, MsgText, and SendDialog are all widgets with different behaviors. Not all of these three widgets are expected to be able to handle the same events, and even if they can handle the same event, they might behave differently. MainWindow can handle only the close and default events:

    class MainWindow(Widget):
        def handle_close(self, event):
            print('MainWindow: {}'.format(event))

        def handle_default(self, event):
            print('MainWindow Default: {}'.format(event))

SendDialog can handle only the paint event:

    class SendDialog(Widget):
        def handle_paint(self, event):
            print('SendDialog: {}'.format(event))

Finally, MsgText can handle only the down event:

    class MsgText(Widget):
        def handle_down(self, event):
            print('MsgText: {}'.format(event))

The main() function shows how we can create a few widgets and events, and how the widgets react to those events. All events are sent to all the widgets. Note the parent relationship of each widget. The sd object (an instance of SendDialog) has as its parent the mw object (an instance of MainWindow). However, not all objects need to have a parent that is an instance of MainWindow.
For example, the msg object (an instance of MsgText) has the sd object as a parent:

    def main():
        mw = MainWindow()
        sd = SendDialog(mw)
        msg = MsgText(sd)

        for e in ('down', 'paint', 'unhandled', 'close'):
            evt = Event(e)
            print('\nSending event -{}- to MainWindow'.format(evt))
            mw.handle(evt)
            print('Sending event -{}- to SendDialog'.format(evt))
            sd.handle(evt)
            print('Sending event -{}- to MsgText'.format(evt))
            msg.handle(evt)

The following is the full code of the example (chain.py):

    class Event:
        def __init__(self, name):
            self.name = name

        def __str__(self):
            return self.name

    class Widget:
        def __init__(self, parent=None):
            self.parent = parent

        def handle(self, event):
            handler = 'handle_{}'.format(event)
            if hasattr(self, handler):
                method = getattr(self, handler)
                method(event)
            elif self.parent:
                self.parent.handle(event)
            elif hasattr(self, 'handle_default'):
                self.handle_default(event)

    class MainWindow(Widget):
        def handle_close(self, event):
            print('MainWindow: {}'.format(event))

        def handle_default(self, event):
            print('MainWindow Default: {}'.format(event))

    class SendDialog(Widget):
        def handle_paint(self, event):
            print('SendDialog: {}'.format(event))

    class MsgText(Widget):
        def handle_down(self, event):
            print('MsgText: {}'.format(event))

    def main():
        mw = MainWindow()
        sd = SendDialog(mw)
        msg = MsgText(sd)

        for e in ('down', 'paint', 'unhandled', 'close'):
            evt = Event(e)
            print('\nSending event -{}- to MainWindow'.format(evt))
            mw.handle(evt)
            print('Sending event -{}- to SendDialog'.format(evt))
            sd.handle(evt)
            print('Sending event -{}- to MsgText'.format(evt))
            msg.handle(evt)

    if __name__ == '__main__':
        main()

Executing chain.py gives us the following results:

    >>> python3 chain.py

    Sending event -down- to MainWindow
    MainWindow Default: down
    Sending event -down- to SendDialog
    MainWindow Default: down
    Sending event -down- to MsgText
    MsgText: down

    Sending event -paint- to MainWindow
    MainWindow Default: paint
    Sending event -paint- to SendDialog
    SendDialog: paint
    Sending event -paint- to MsgText
    SendDialog: paint

    Sending event -unhandled- to MainWindow
    MainWindow Default: unhandled
    Sending event -unhandled- to SendDialog
    MainWindow Default: unhandled
    Sending event -unhandled- to MsgText
    MainWindow Default: unhandled

    Sending event -close- to MainWindow
    MainWindow: close
    Sending event -close- to SendDialog
    MainWindow: close
    Sending event -close- to MsgText
    MainWindow: close

There are some interesting things that we can see in the output. For instance, sending a down event to MainWindow ends up being handled by the default MainWindow handler. Another nice case is that although a close event cannot be handled directly by SendDialog and MsgText, all the close events end up being handled properly by MainWindow. That's the beauty of using the parent relationship as a fallback mechanism.

If you want to spend some more creative time on the event example, you can replace the dumb print statements and add some actual behavior to the listed events. Of course, you are not limited to the listed events. Just add your favorite event and make it do something useful!

Another exercise is to add a MsgText instance during runtime that has MainWindow as the parent. Is this hard? Do the same for an event (add a new event to an existing widget). Which is harder?

Summary

In this article, we covered the Chain of Responsibility design pattern. This pattern is useful for modeling requests and handling events when the number and type of handlers aren't known in advance.
Examples of systems that fit Chain of Responsibility well are event-based systems, purchase systems, and shipping systems. In the Chain of Responsibility pattern, the sender has direct access only to the first node of the chain. If the request cannot be satisfied by the first node, the node forwards it to the next one. This continues until either the request is satisfied by a node or the whole chain is traversed. This design is used to achieve loose coupling between the sender and the receiver(s).

ATMs are an example of Chain of Responsibility. The single slot that is used for all banknotes can be considered the head of the chain. From there, depending on the transaction, one or more receptacles are used to process it. The receptacles can be considered the processing elements of the chain.

Java's servlet filters use the Chain of Responsibility pattern to perform different actions (for example, compression and authentication) on an HTTP request. Apple's Cocoa frameworks use the same pattern to handle events such as button presses and finger gestures.
The Design Patterns Out There and Setting Up Your Environment

Packt
14 Jan 2016
27 min read
In this article by Ivan Nikolov, author of the book Scala Design Patterns, we will see that in the world of computer programming, there are multiple ways to create a solution that does something. One might wonder whether there is a correct way of achieving a specific task. The answer is yes, there is always a right way for a given situation; however, in software development there are usually multiple ways to achieve a task, and a number of factors guide the programmer to the right solution. These factors could define many things: the actual language being used, the algorithm, the type of executable produced, the output format, and the code structure. The language is already chosen for us: Scala. There are, however, a number of ways to use Scala, and we will be focusing on them: the design patterns.

In this article, we will explain what design patterns are and why they exist. We will go through the different types of design patterns that are out there. This article aims to provide useful examples to aid you in the learning process, and being able to run them easily is key. Hence, some points on how to set up a development environment properly will be given here. The top-level topics we will go through are as follows:

What is a design pattern and why do design patterns exist?
The main types of design patterns and their features
Choosing the right design pattern
Setting up the development environment

The last point doesn't have much to do with design patterns. However, it is always a good idea to build projects properly, as this makes it much easier to work with them in the future.

Design patterns

Before delving into the Scala design patterns, we have to explain what they actually are, why they exist, and why it is worth being familiar with them.

Software is a broad subject, and there are innumerable examples of things people can do with it. At first glance, most of these things are completely different: games, websites, mobile phone applications, and specialized systems for different industries. There are, however, many similarities in how software is built. Many times, people have to deal with similar issues, no matter the type of software they create. For example, computer games as well as websites might need to access a database. And over time, through experience, developers learn how structuring their code differs for the various tasks that they perform. A formal definition for design patterns will help you understand where we are actually trying to get by using good practices in building software.

The formal definition for design patterns

A design pattern is a reusable solution to a recurring problem in software design. It is not a finished piece of code, but a template that helps solve a particular problem or family of problems. Design patterns are best practices at which the software community has arrived over a period of time. They are supposed to help us write efficient, readable, testable, and easily extendable code. In some cases, they can be the result of a programming language not being expressive enough to achieve a goal elegantly. This means that a more feature-rich language might not even need a design pattern, while others still do. Scala is one of those rich languages, and in some cases it makes a design pattern obsolete or simpler. The lack or existence of a certain functionality within a programming language also makes it able to implement additional design patterns that others cannot.
The opposite is also valid: a language might not be able to implement things that others can.

Scala and design patterns

Scala is a hybrid language that combines features from object-oriented and functional languages. This not only keeps some of the well-known object-oriented design patterns relevant, but also provides various other ways of exploiting its features to write code that is clean, efficient, testable, and extendable all at the same time. The hybrid nature of the language also makes some of the traditional object-oriented design patterns obsolete, or achievable using other, cleaner techniques.

The need for design patterns and their benefits

Everybody needs design patterns and should look into some before writing code. As we mentioned earlier, they help us write efficient, readable, extendable, and testable code. All these features are really important to companies in the industry. Even though in some cases it is preferable to quickly write a prototype and get it out, more often a piece of software is supposed to evolve. If you have ever had to extend some badly written code, you will know that it is a challenging task that takes a really long time, and sometimes it feels that rewriting it would be easier. Moreover, badly written code makes introducing bugs into the system much more likely.

Code readability is also something that should be appreciated. Of course, one could use a design pattern and still have code that is hard to read, but generally, design patterns help. Big systems are usually worked on by many people, and everyone should be able to understand what exactly is going on. Also, people who join a team are able to integrate much more easily and quickly if they work on a well-written piece of software.

Testability is something that prevents developers from introducing bugs when writing or extending code. In some cases, code could be written so badly that it is not even testable. Design patterns are supposed to eliminate these problems as well.

While efficiency is often connected to algorithms, design patterns can also affect it. A simple example is an object that takes a long time to instantiate and whose instances are used in many places in an application; it could be made a singleton instead.

Design pattern categories

The fact that software development is an extremely broad topic leads to a number of things that can be done with programming. Everything is different, and this leads to various requirements about the qualities of programs. All these facts have caused many different design patterns to be invented. This is further contributed to by the existence of various programming languages with different features and levels of expressiveness.

This article focuses on design patterns from the point of view of Scala. As we already mentioned, Scala is a hybrid language. This leads us to a few famous design patterns that are not needed anymore; one example is the null object design pattern, which can simply be replaced by Scala's Option. Other design patterns become possible using different approaches: the decorator design pattern can be implemented using stackable traits. Finally, some new design patterns become available that are applicable specifically to the Scala programming language: the cake design pattern, pimp my library, and so on. We will focus on all of these and make it clear where the richness of Scala helps us to make our code even cleaner and simpler.
Even if there are many different design patterns, they can all be grouped into a few main groups:

Creational
Structural
Behavioral
Functional
Scala-specific design patterns

Some of the design patterns that are specific to Scala can be assigned to the previous groups. They can either be additions to or replacements of the already existing ones. They are typical of Scala and take advantage of some advanced language features, or simply of features not available in other languages. The first three groups contain the famous Gang of Four design patterns. Every design pattern article covers them, and so will we. The rest, even if they can be assigned to one of the first three groups, are specific to Scala and functional programming languages. In the next few subsections, we will explain the main characteristics of these groups and briefly present the actual design patterns that fall under them.

Creational design patterns

The creational design patterns deal with object creation mechanisms. Their purpose is to create objects in a way that is suitable to the current situation, which could otherwise lead to unnecessary complexity and the need for extra knowledge. The main ideas behind the creational design patterns are as follows:

Encapsulating knowledge about the concrete classes
Hiding details about how objects are actually created and combined

We will be focusing on the following creational design patterns in this article (a short Scala sketch of two of them, the singleton and lazy initialization patterns, follows at the end of this section):

The abstract factory pattern
The factory method pattern
The lazy initialization pattern
The singleton pattern
The object pool pattern
The builder pattern
The prototype pattern

The following few sections give a brief definition of what these patterns are.

The abstract factory design pattern

This is used to encapsulate a group of individual factories that have a common theme. When it is used, the developer creates a specific implementation of the abstract factory and uses its methods in the same way as in the factory design pattern to create objects. It can be thought of as another layer of abstraction that helps with instantiating classes.

The factory method design pattern

This design pattern deals with the creation of objects without explicitly specifying the actual class that the instance will have; the class could be decided at runtime, based on many factors. Some of these factors can include operating systems, different data types, or input parameters. It gives developers the peace of mind of just calling a method rather than invoking a concrete constructor.

The lazy initialization design pattern

This pattern is an approach to delay the creation of an object or the evaluation of a value until the first time it is needed. It is much simpler in Scala than in an object-oriented language such as Java.

The singleton design pattern

This design pattern restricts the creation of a specific class to just one object. If more than one class in the application tries to use such an instance, this same instance is returned for everyone. This is another design pattern that can be easily achieved with basic Scala features.

The object pool design pattern

This pattern uses a pool of objects that are already instantiated and ready for use. Whenever someone requires an object from the pool, it is returned, and after the user is finished with it, it is put back into the pool manually or automatically. A common use for pools is database connections, which are generally expensive to create; hence, they are created once and then served to the application on request.

The builder design pattern

The builder design pattern is extremely useful for objects with many possible constructor parameters, which would otherwise require developers to create many overrides for the different scenarios in which an object could be created. This is different from the factory design pattern, which aims to enable polymorphism. Many of the modern libraries today employ this design pattern. As we will see later, Scala can achieve this pattern really easily.

The prototype design pattern

This design pattern allows object creation using a clone() method on an already created instance. It can be used in cases where a specific resource is expensive to create or when the abstract factory pattern is not desired.
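As mentioned above, the singleton and lazy initialization patterns are much simpler in Scala than in Java, thanks to the object keyword and lazy val. The following is a minimal sketch of my own (the AppConfig and expensiveValue names are illustrative), not an example from the book:

    // A singleton: Scala creates exactly one instance of an object.
    object AppConfig {
      val appName: String = "design-patterns-demo"

      // Lazy initialization: this value is computed only on first access.
      lazy val expensiveValue: Int = {
        println("Computing the expensive value...")
        42
      }
    }

    object SingletonAndLazyDemo extends App {
      println(AppConfig.appName)        // no expensive computation yet
      println(AppConfig.expensiveValue) // computed here, on first use
      println(AppConfig.expensiveValue) // reused, not recomputed
    }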
Structural design patterns

Structural design patterns exist in order to help establish the relationships between different entities and to form larger structures out of them. They define how each component should be structured so that it has very flexible interconnecting modules that can work together in a larger system. The main features of structural design patterns include the following:

The use of composition to combine the implementations of multiple objects
Helping to build a large system made of various components while maintaining a high level of flexibility

In this article, we will focus on the following structural design patterns:

Adapter
Decorator
Bridge
Composite
Facade
Flyweight
Proxy

The next subsections will shed some light on what these patterns are about.

The adapter design pattern

The adapter design pattern allows the interface of an existing class to be used from another interface. Imagine that there is a client who expects your class to expose a doWork() method. You might have the implementation ready in another class, but the method is called differently and is incompatible. It might require extra parameters too. This could also be a library that the developer doesn't have access to for modifications. This is where the adapter can help by wrapping the functionality and exposing the required methods. The adapter is useful for integrating existing components. In Scala, the adapter design pattern can be easily achieved using implicit classes, as the short sketch below illustrates.
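Here is a minimal sketch of that idea; the Logger class and the log and doWork names are illustrative assumptions of mine, not code from the book:

    // An existing class whose interface we cannot change.
    class Logger {
      def log(message: String): Unit = println(s"LOG: $message")
    }

    object AdapterDemo extends App {
      // The implicit class adapts Logger to the doWork() interface the client expects.
      implicit class LoggerAdapter(logger: Logger) {
        def doWork(task: String): Unit = logger.log(s"doing $task")
      }

      new Logger().doWork("backup")  // compiles thanks to the implicit conversion
    }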
The decorator design pattern

Decorators are a flexible alternative to subclassing. They allow developers to extend the functionality of an object without affecting other instances of the same class. This is achieved by wrapping an object of the extended class in one that extends the same class and overrides the methods whose functionality is supposed to be changed. In Scala, decorators can be built much more easily using another design pattern called stackable traits.

The bridge design pattern

The purpose of the bridge design pattern is to decouple an abstraction from its implementation so that the two can vary independently. It is useful when a class and its functionality vary a lot. The bridge is reminiscent of the adapter pattern, but the difference is that the adapter pattern is used when something is already there and cannot be changed, while the bridge design pattern is used while things are being built. It helps us avoid ending up with multiple concrete classes exposed to the client. You will get a clearer understanding when we delve deeper into the topic, but for now, let's imagine that we want a FileReader class that supports multiple different platforms. The bridge will help us end up with a FileReader that uses a different implementation, depending on the platform. In Scala, we can use self-types in order to implement the bridge design pattern.

The composite design pattern

The composite is a partitioning design pattern that represents a group of objects that are to be treated as a single object. It allows developers to treat individual objects and compositions uniformly and to build complex hierarchies without complicating the source code. An example of a composite could be a tree structure where a node can contain other nodes, and so on.

The facade design pattern

The purpose of the facade design pattern is to hide the complexity of a system and its implementation details by providing the client with a simpler interface to use. This also helps make the code more readable and reduces the dependencies of the outside code. It works as a wrapper around the system that is being simplified and, of course, can be used in conjunction with some of the other design patterns mentioned previously.

The flyweight design pattern

The flyweight design pattern provides an object that minimizes memory usage by being shared throughout the application. This object should contain as much data as possible. A common example is a word processor, where each character's graphical representation is shared with the other identical characters, and only the position of a character is stored internally as local information.

The proxy design pattern

The proxy design pattern allows developers to provide an interface to other objects by wrapping them. A proxy can also provide additional functionality, for example, security or thread-safety. Proxies can be used together with the flyweight pattern, where the references to shared objects are wrapped inside proxy objects.

Behavioral design patterns

Behavioral design patterns increase communication flexibility between objects based on the specific ways they interact with each other. Where creational patterns mostly describe a moment in time (creation) and structural patterns describe a more or less static structure, behavioral patterns describe a process or flow. They simplify this flow and make it more understandable. The main features of behavioral design patterns are as follows:

What is being described is a process or flow
The flows are simplified and made understandable
They accomplish tasks that would be difficult or impossible to achieve with objects alone

In this article, we will focus our attention on the following behavioral design patterns:

Value object
Null object
Strategy
Command
Chain of responsibility
Interpreter
Iterator
Mediator
Memento
Observer
State
Template method
Visitor

The following subsections will give brief definitions of the aforementioned behavioral design patterns.

The value object design pattern

Value objects are immutable, and their equality is based not on their identity, but on their fields being equal. They can be used as data transfer objects, and they can represent dates, colors, money amounts, numbers, and so on. Their immutability makes them really useful in multithreaded programming. The Scala programming language promotes immutability, and value objects are something that occurs there naturally.

The null object design pattern

Null objects represent the absence of a value and define a neutral behavior. This approach removes the need to check for null references and makes the code much more concise. Scala adds the concept of optional values, which can replace this pattern completely, as the short sketch below shows.
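The following is a minimal sketch of mine (the findUser name and the data are illustrative) showing how Option makes a null object unnecessary:

    object NullObjectDemo extends App {
      val users = Map(1 -> "Ivan", 2 -> "Maria")

      // Option encodes the possible absence of a value in the type itself.
      def findUser(id: Int): Option[String] = users.get(id)

      // No null checks: getOrElse supplies the neutral, "null object"-like default.
      println(findUser(1).getOrElse("anonymous"))  // Ivan
      println(findUser(9).getOrElse("anonymous"))  // anonymous
    }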
The strategy design pattern

The strategy design pattern allows algorithms to be selected at runtime. It defines a family of interchangeable, encapsulated algorithms and exposes a common interface to the client. Which algorithm is chosen could depend on various factors that are determined while the application runs. In Scala, we can simply pass a function as a parameter to a method, and depending on the function, a different action will be performed.

The command design pattern

This design pattern represents an object that is used to store information about an action that needs to be triggered at a later time. The information includes the following:

The method name
The owner of the method
Parameter values

The client then decides which commands need to be executed, and when, by the invoker. This design pattern can easily be implemented in Scala using the by-name parameters feature of the language.

The chain of responsibility design pattern

The chain of responsibility is a design pattern in which the sender of a request is decoupled from its receiver. This makes it possible for multiple objects to handle the request and keeps the logic nicely separated. The receivers form a chain through which they pass the request; each receiver processes the request if it can, and passes it to the next receiver if it cannot. There are variations in which a handler might dispatch the request to multiple other handlers at the same time. This is somewhat reminiscent of function composition, which in Scala can be achieved using the stackable traits design pattern.

The interpreter design pattern

The interpreter design pattern is based on the possibility of characterizing a well-known domain with a language with strict grammar. It defines classes for each grammar rule in order to interpret sentences in the given language. These classes are likely to represent hierarchies, as grammar is usually hierarchical as well. Interpreters can be used in different parsers, for example, for SQL or other languages.

The iterator design pattern

The iterator design pattern is one where an iterator is used to traverse a container and access its elements. It helps decouple containers from the algorithms performed on them. What an iterator should provide is sequential access to the elements of an aggregate object without exposing the internal representation of the iterated collection.

The mediator design pattern

This pattern encapsulates the communication between different classes in an application. Instead of interacting directly with each other, objects communicate through the mediator, which reduces the dependencies between them, lowers the coupling, and makes the overall application easier to read and maintain.

The memento design pattern

This pattern provides the ability to roll back an object to its previous state. It is implemented with three objects: originator, caretaker, and memento. The originator is the object with the internal state; the caretaker modifies the originator; and the memento is an object that contains the state that the originator returns. The originator knows how to handle a memento in order to restore its previous state.

The observer design pattern

This pattern allows the creation of publish/subscribe systems. There is a special object called the subject that automatically notifies all the observers when there are any changes in its state. This design pattern is popular in various GUI toolkits and generally wherever event handling is needed. It is also related to reactive programming, which is enabled by libraries such as Akka. A minimal sketch of the pattern follows.
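Here is a minimal, hand-rolled sketch of a subject and its observers; the Subject, Observer, and update names are mine, not the book's:

    trait Observer {
      def update(newState: Int): Unit
    }

    class Subject {
      private var observers = List.empty[Observer]
      private var state = 0

      def subscribe(observer: Observer): Unit = observers = observer :: observers

      def setState(newState: Int): Unit = {
        state = newState
        observers.foreach(_.update(state))  // notify every subscriber of the change
      }
    }

    object ObserverDemo extends App {
      val subject = new Subject
      subject.subscribe(new Observer {
        def update(newState: Int): Unit = println(s"Observer A saw state $newState")
      })
      subject.setState(42)  // prints: Observer A saw state 42
    }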
The state design pattern

This pattern is similar to the strategy design pattern, and it uses a state object to encapsulate different behaviors for the same object. It improves the code's readability and maintainability by avoiding the use of large conditional statements.

The template method design pattern

This pattern defines the skeleton of an algorithm in a method and defers some of the actual steps to subclasses. It allows developers to alter some of the steps of an algorithm without having to modify its structure. An example of this could be a method in an abstract class that calls other abstract methods, which will be defined in the children.

The visitor design pattern

The visitor design pattern represents an operation to be performed on the elements of an object structure. It allows developers to define a new operation without changing the original classes. Scala can minimize the verbosity of this pattern, compared to the pure object-oriented way of implementing it, by passing functions to methods.

Functional design patterns

We will be looking at all of the preceding design patterns from the point of view of Scala. This means that they will look different than in other languages, but they still weren't designed specifically for functional programming. Functional programming is much more expressive than object-oriented programming, and it has its own design patterns that help make the life of a programmer easier. We will focus on:

Monoids
Monads
Functors

After we've looked at some Scala functional programming concepts and have been through these, we will mention some interesting design patterns from the Scala world. A brief explanation of the preceding patterns follows in the next few subsections.

Monoids

Monoid is a concept that comes from mathematics. For now, it will be enough to remember that a monoid is an algebraic structure with a single associative binary operation and an identity element. Here are the key properties you should remember:

The associative binary operation: (a+b)+c = a+(b+c)
The identity element i: a+i = i+a = a

What is important about monoids is that they give us the possibility to work with many different types of values in a common way. They allow us to convert pairwise operations into operations that work on sequences; the associativity gives us the possibility of parallelization, and the identity element tells us what to do with empty lists. Monoids are great for easily describing and implementing aggregations (see the sketch after these subsections).

Monads

In functional programming, monads are structures that represent computations as sequences of steps. Monads are useful for building pipelines, cleanly adding operations with side effects to a language where everything is immutable, and implementing compositions. This definition might sound vague and unclear, but explaining monads in a few sentences is hard to achieve. We will try to show why monads are useful and what they can help with, as long as developers understand them well.

Functors

Functors come from category theory, and as with monads, it takes time to explain them properly. For now, you could remember that functors are things that allow us to lift a function of the type A => B to a function of the type F[A] => F[B].
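To make the monoid definition concrete, here is a minimal sketch of mine; the Monoid trait and the intAddition instance are illustrative, not the book's code:

    trait Monoid[A] {
      def op(a: A, b: A): A  // must be associative: op(op(a, b), c) == op(a, op(b, c))
      def identity: A        // must satisfy: op(a, identity) == op(identity, a) == a
    }

    object MonoidDemo extends App {
      val intAddition: Monoid[Int] = new Monoid[Int] {
        def op(a: Int, b: Int): Int = a + b
        val identity: Int = 0
      }

      // The identity element tells us what an empty list should aggregate to.
      def aggregate[A](xs: List[A], m: Monoid[A]): A =
        xs.foldLeft(m.identity)(m.op)

      println(aggregate(List(1, 2, 3, 4), intAddition))  // 10
      println(aggregate(List.empty[Int], intAddition))   // 0
    }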
Scala-specific design patterns

The design patterns in this group could be assigned to some of the previous groups. However, they are specific to Scala and exploit some of the language features that we will focus on, so we've decided to place them in their own group. We will focus our attention on the following ones:

The lens design pattern
The cake design pattern
Pimp my library
Stackable traits
The type class design pattern
Lazy evaluation
Partial functions
Implicit injection
Duck typing
Memoization

The next subsections will give you some brief information about these patterns.

The lens design pattern

The Scala programming language promotes immutability. Having immutable objects makes it harder to make mistakes. However, sometimes mutability is required, and the lens design pattern helps us achieve this nicely.

The cake design pattern

The cake design pattern is the Scala way to implement dependency injection. It is used quite a lot in real-life applications, and there are numerous libraries that help developers achieve it. Scala has a way of doing this using language features, and this is what the cake design pattern is all about.

Pimp my library

Many times, engineers need to work with libraries that are made to be as generic as possible. Sometimes, though, we need to do something more specific to our use case. The pimp my library design pattern provides a way to write extension methods for libraries that we cannot modify. We can use it for our own libraries as well. This design pattern also helps achieve better code readability.

Stackable traits

Stackable traits are the Scala way to implement the decorator design pattern. They can also be used to compose functions, and they are based on a few advanced Scala features.

The type class design pattern

This pattern allows us to write generic code by defining a behavior that must be supported by all members of a specific type class. For example, all numbers must support the addition and subtraction operations.

Lazy evaluation

Many times, engineers have to deal with operations that are slow and/or expensive. Sometimes, the result of these operations might not even be needed. Lazy evaluation is a technique that postpones the execution of an operation until it is actually needed. It can be used for application optimization.

Partial functions

Mathematics and functional programming are really close together. As a consequence, there exist functions that are only defined for a subset of all the possible input values they can get. A popular example is the square root function, which only works for non-negative numbers. In Scala, such functions can be used to efficiently perform multiple operations at the same time or to compose functions.

Implicit injection

Implicit injection is based on the implicit functionality of the Scala programming language. It automatically injects objects whenever they are needed, as long as they exist in a specific scope. It can be used for many things, including dependency injection.

Duck typing

This is a feature available in Scala that is similar to what some dynamic languages provide. It allows developers to write code that requires its callers to have some specific methods (but not to implement an interface). When someone uses a method with a duck type, it is actually checked during compile time whether the parameters are valid.

Memoization

This design pattern helps with optimization by remembering function results based on the inputs. This means that as long as the function is stable and will return the same result when the same parameters are passed, one can remember its results and simply return them for every consecutive identical call. A small sketch follows.
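The following is a minimal memoization sketch of my own; the memoize helper and the slowSquare function are illustrative names, not the book's code:

    object MemoizationDemo extends App {
      // Wraps a (stable) function with a cache of previously computed results.
      def memoize[A, B](f: A => B): A => B = {
        val cache = scala.collection.mutable.Map.empty[A, B]
        a => cache.getOrElseUpdate(a, f(a))
      }

      val slowSquare: Int => Int = { n =>
        Thread.sleep(1000)  // simulate an expensive computation
        n * n
      }
      val fastSquare = memoize(slowSquare)

      println(fastSquare(4))  // slow the first time
      println(fastSquare(4))  // instant; served from the cache
    }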
How to choose a design pattern

As we have already seen, there is a huge number of design patterns. In many cases, they are suitable to be used in combination as well. Unfortunately, there is no definite answer to how to choose the design of our code. Many factors could affect the final decision, and you should ask yourself the following questions:

Is this piece of code going to be fairly static, or will it change in the future?
Do we have to dynamically decide which algorithms to use?
Is our code going to be used by others?
Do we have an agreed interface?
What libraries are we planning to use, if any?
Are there any special performance requirements or limitations?

This is by no means an exhaustive list of questions; a great number of factors could dictate our decision on how we build our systems. It is, however, really important to have a clear specification, and if something seems to be missing, it should always be checked first. Questions like these should help you take the right decision before going on and writing code.

Setting up the development environment

This article aims to give real code examples for you to run and experiment with. This means that it is important to be able to easily run any examples we have provided here and not to fight with the code. We will do our best to have the code tested and properly packaged, but you should also make sure that you have everything needed for the examples.

Installing Scala

Of course, you will need the Scala programming language. It evolves quickly, and the newest version can be found at http://www.scala-lang.org/download/. There are a few tips on how to install the language on your operating system at http://www.scala-lang.org/download/install.html.

Tips about installing Scala

You can always download multiple versions of Scala and experiment with them. I use Linux, and my tips will be applicable to Mac OS users, too. Windows users can also do a similar setup. Here are the steps:

1. Install Scala under /opt/scala-{version}/.
2. Create a symlink using the following command: sudo ln -s /opt/scala-{version} scala-current.
3. Add the path to the Scala bin folder to your .bashrc (or equivalent) file using the following lines: export SCALA_HOME=/opt/scala-current and export PATH=$PATH:$SCALA_HOME/bin.

This allows us to quickly change versions of Scala by just redefining the symlink.

Another way to experiment with any Scala version is to install SBT (you can find more information on this). Then, simply run sbt in your console, type ++ 2.11.7 (or any version you want), and then issue the console command. Now you can test Scala features easily. Using SBT or Maven or any other build tool will automatically download Scala for you (a minimal sbt build definition is sketched below). If you don't need to experiment with the console, you can skip the preceding steps.

Using the preceding tips, we can use the Scala interpreter by just typing scala in the terminal, or follow the SBT installation process and experiment with different language features in the REPL.
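As an illustration, here is a minimal build.sbt that pins the Scala version for a project; the project name and version number are placeholders of mine:

    // build.sbt: sbt downloads and uses the declared Scala version automatically.
    name := "scala-design-patterns-examples"
    version := "0.1.0"
    scalaVersion := "2.11.7"  // matches the version used in this article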
Scala IDEs

There are multiple IDEs out there that support development in Scala, and there is absolutely no preference about which one to use to work with the code. Some of the most popular ones are as follows:

IntelliJ
Eclipse
NetBeans

They contain plugins to work with Scala, and downloading and using them should be straightforward.

Summary

By now, we have a fair idea of what a design pattern means and how it can affect the way we write our code. We've iterated over the most famous design patterns out there, and we have outlined the main differences between them. We saw that in many cases, we can use Scala's features to make a pattern obsolete, simpler, or different to implement compared to the classical case for pure object-oriented languages.

Knowing what to look for when picking a design pattern is important, and you should already know what specific details to watch out for and how important clear specifications are.