
How-To Tutorials

6719 Articles

CSS animation with Animate.css

Jabran Rafique
15 Nov 2016
6 min read
CSS animation is one of the best and easiest ways to make a web page interactive. These animations were made available as part of the CSS3 specifications. Animate.css is one of the best libraries available for CSS animations, with a collection of more than 50 types of animations. It is built and maintained by designer Daniel Eden and many other contributors. Animate.css is an open source library, available on GitHub under the MIT License.

Installation

There are a few ways to install the library:

Download directly from GitHub
Use directly from a Content Delivery Network (CDN)
Install using Bower: $ bower install animate.css --save
Install using npm: $ npm install animate.css --save

Usage

Now that we have Animate.css installed, using one of the above methods, we can start using it straight away in our app. For simplicity, we will use the CDN URL to consume Animate.css in our tutorial app. Here is a basic structure of our index.html:

<!DOCTYPE html>
<html>
<head>
    <title>Learning Animate.css</title>
    <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/animate.css/3.5.2/animate.min.css">
</head>
<body>
    <div class="container">
        <h1>Learning Animate.css</h1>
        <p>Animate this text!</p>
    </div>
</body>
</html>

We will use Animate.css effects on the paragraph tag in this HTML document. Animate.css has a default class, .animated, that triggers the animation. Other animation effects are used in addition to this default class. For example, let's apply a bounce animation to our paragraph text:

...
<p class="animated bounce">Animate this text!</p>
...

Here is the demo for this: See the Pen Learning Animate.css – Demo by Jabran Rafique (@jabranr) on CodePen.

Similarly, we can replace the bounce class with any other class from Animate.css and it will get that specific animation. The animations will run only once; they will run again only after we reload the browser. To make an animation run infinitely, add the .infinite class so that it becomes:

...
<p class="animated infinite bounce">This text will animate infinite times.</p>
...

Here is the demo for this: See the Pen Learning Animate.css – Demo (Infinite animation) by Jabran Rafique (@jabranr) on CodePen.

Animate.css works best when used dynamically, as this gives more control over animations. To use it dynamically, we will make use of JavaScript. Let's try that with the following basic example. First, we will add another class to the element, so we can manipulate the element via JavaScript:

...
<p class="js-animate">Animate this text dynamically!</p>
...

Now we can manipulate this DOM element using JavaScript to add/remove Animate.css animations when the text is clicked:

var $elementToAnimate = document.querySelector('.js-animate');

$elementToAnimate.addEventListener('click', function(e) {
  this.classList.toggle('animated');

  if (this.classList.contains('bounce')) {
    this.classList.remove('bounce');
  } else {
    this.classList.add('bounce');
  }
}, false);

Here is the demo for this: See the Pen Learning Animate.css – Demo (Dynamic animations) – part 3 by Jabran Rafique (@jabranr) on CodePen.

Similarly, you can also use a DOM manipulation library such as jQuery to apply Animate.css classes. In fact, the official documentation provides a short helper method that registers itself with jQuery as a plugin and can easily be used by any element in the DOM.
Here is the code for this:

$.fn.extend({
  animateCss: function (animationName) {
    var animationEnd = 'webkitAnimationEnd mozAnimationEnd MSAnimationEnd oanimationend animationend';
    this.addClass('animated ' + animationName).one(animationEnd, function() {
      $(this).removeClass('animated ' + animationName);
    });
  }
});

Now we can apply an Animate.css animation to any element as follows:

$('.js-animate').animateCss('bounce');

Customize Animate.css

Using more than 50 different animations in a single project may not be an ideal scenario. Also, if we are looking at the performance side, then adding the complete library in order to use one or a few animations may not be useful. Therefore, it is best to customize the library and use only those parts that are needed by the project. Animate.css comes with a built-in Gulp workflow that makes it easier to customize the library to our needs. In order to use the Gulp workflow, the following must be installed already:

Node
Gulp ($ npm install -g gulp)

To customize Animate.css, simply open the animate-config.json file and comment out the animations that are NOT required by the project. Then run the gulp command in the terminal to generate a customized CSS file. The terminal window will report how many animations were included in the generated customized CSS file. Customizing Animate.css not only boosts performance, but also keeps the code clean by including only those parts that are actually being used in the project.

Sass version

Sass is a CSS preprocessor language that lets us use variables and methods to reuse code inside our project. Sass files are ultimately converted into CSS files before they are used by any web platform. A Sass-only version is under development. The in-development source code can be found under the sass branch of the project at GitHub. Once released, this will provide more robust control for customized and personalized animations by relying less on JavaScript, that is, the Gulp workflow.

Contribute

Contributions to any open source project make it more robust and more useful through bugfixes and new features. Just like any other open source project, Animate.css welcomes contributions. To contribute to the Animate.css project, just head over to the GitHub project and fork the repository to work on an existing issue, or add a new one. All of the demos in this tutorial can be found at CodePen as a Collection. Hopefully, this quick getting-started guide will make it easier to use the Animate.css library in your next project to create awesome animated effects.

About the author

Jabran Rafique is a London-based web engineer. He currently works as a Front-End Web Developer at Rated People. He has a Masters in Computer Science from Staffordshire University and more than 6 years of professional experience in web systems. He also served as Regional Lead and Advocate at Google Map Maker in 2008, where he contributed to building digital maps in order to make them available to millions of people worldwide. He has organized and spoken at several international events as well. He writes on his website/blog about different things, and shares code at GitHub and thoughts on Twitter.


Installing a blockchain network using Hyperledger Fabric and Composer[Tutorial]

Savia Lobo
01 Apr 2019
6 min read
This article is an excerpt taken from the book Hands-On IoT Solutions with Blockchain, written by Maximiliano Santos and Enio Moura. In this book, you'll learn how to work with problem statements and design your solution architecture so that you can create your own integrated Blockchain and IoT solution. In this article, you will learn how to install your own blockchain network using Hyperledger Fabric and Composer.

We can install the blockchain network using Hyperledger Fabric by many means, including local servers, Kubernetes, IBM Cloud, and Docker. To begin with, we'll explore Docker and Kubernetes.

Setting up Docker

Docker can be installed using the information provided at https://www.docker.com/get-started. Hyperledger Composer works with two versions of Docker:

Docker Compose version 1.8 or higher
Docker Engine version 17.03 or higher

If you already have Docker installed but you're not sure about the version, you can find out what the version is by using the following command in the terminal or command prompt:

docker --version

Be careful: many Linux-based operating systems, such as Ubuntu, come with the most recent version of Python (Python 3.5.1). In this case, it's important to get Python version 2.7. You can get it here: https://www.python.org/download/releases/2.7/.

Installing Hyperledger Composer

We're now going to set up Hyperledger Composer and gain access to its development tools, which are mainly used to create business networks. We'll also set up Hyperledger Fabric, which can be used to run or deploy business networks locally. These business networks can be run on Hyperledger Fabric runtimes in some alternative places as well, for example, on a cloud platform.

Make sure that you've not installed the tools and used them before. If you have, you'll need to remove them before following this guide.

Components

To successfully install Hyperledger Composer, you'll need these components ready:

CLI Tools
Playground
Hyperledger Fabric
An IDE

Once these are set up, you can begin with the steps given here.

Step 1 – Setting up CLI Tools

CLI Tools, composer-cli, is a library with the most important operations, such as administrative, operational, and developmental tasks. We'll also install the following tools during this step:

Yeoman: Frontend tool for generating applications
Library generator: For generating application assets
REST server: Utility for running a REST server (local)

Let's start our setup of CLI Tools:

Install CLI Tools: npm install -g composer-cli
Install the library generator: npm install -g generator-hyperledger-composer
Install the REST server: npm install -g composer-rest-server

This will allow for integration with a local REST server to expose your business networks as RESTful APIs.

Install Yeoman: npm install -g yo

Don't use the su or sudo commands with npm, to ensure that the current user has all permissions necessary to run the environment by itself.

Step 2 – Setting up Playground

Playground gives you a UI on your local machine, running in your browser. It allows you to view your business networks, browse sample apps, and edit and test your business networks. Use the following command to install Playground:

npm install -g composer-playground

Now we can run Hyperledger Fabric.

Step 3 – Hyperledger Fabric

This step will allow you to run a Hyperledger Fabric runtime locally and deploy your business networks:

Choose a directory, such as ~/fabric-dev-servers.
Now get the .tar.gz file, which contains the tools for installing Hyperledger Fabric:

mkdir ~/fabric-dev-servers && cd ~/fabric-dev-servers
curl -O https://raw.githubusercontent.com/hyperledger/composer-tools/master/packages/fabric-dev-servers/fabric-dev-servers.tar.gz
tar -xvf fabric-dev-servers.tar.gz

You've downloaded some scripts that will allow the installation of a local Hyperledger Fabric v1.2 runtime. To download the actual environment Docker images, run the following commands in your user home directory:

cd ~/fabric-dev-servers
export FABRIC_VERSION=hlfv12
./downloadFabric.sh

Well done! Now you have everything required for a typical developer environment.

Step 4 – IDE

Hyperledger Composer allows you to work with many IDEs. Two well-known ones are Atom and VS Code, which both have good extensions for working with Hyperledger Composer. Atom lets you use the composer-atom plugin (https://github.com/hyperledger/composer-atom-plugin) for syntax highlighting of the Hyperledger Composer Modeling Language. You can download this IDE at the following link: https://atom.io/. Also, you can download VS Code at the following link: https://code.visualstudio.com/download.

Installing Hyperledger Fabric 1.3 using Docker

There are many ways to download the Hyperledger Fabric platform; Docker is the most used method. You can use an official repository. If you're using Windows, you'll want to use the Docker Quickstart Terminal for the upcoming terminal commands. If you're using Docker for Windows, follow these instructions: consult the Docker documentation for shared drives, which can be found at https://docs.docker.com/docker-for-windows/#shared-drives, and use a location under one of the shared drives.

Create a directory where the sample files will be cloned from the Hyperledger GitHub repository, and run the following commands:

$ git clone -b master https://github.com/hyperledger/fabric-samples.git

To download and install Hyperledger Fabric on your local machine, you have to download the platform-specific binaries by running the following command:

$ curl -sSL https://goo.gl/6wtTN5 | bash -s 1.1.0

The complete installation guide can be found on the Hyperledger site.

Deploying Hyperledger Fabric 1.3 to a Kubernetes environment

This step is recommended for those of you who have the experience and skills to work with Kubernetes, a cloud environment, and networks, and would like an in-depth exploration of Hyperledger Fabric 1.3. Kubernetes is a container orchestration platform and is available on major cloud providers such as Amazon Web Services, Google Cloud Platform, IBM, and Azure. Marcelo Feitoza Parisi, one of IBM's brilliant cloud architects, has created and published a guide on GitHub on how to set up a Hyperledger Fabric production-level environment on Kubernetes. The guide is available at https://github.com/feitnomore/hyperledger-fabric-kubernetes.

If you've enjoyed reading this post, head over to the book, Hands-On IoT Solutions with Blockchain, to understand how IoT and blockchain technology can help solve the current challenges of the modern food chain.

IBM announces the launch of Blockchain World Wire, a global blockchain network for cross-border payments
Google expands its Blockchain search tools, adds six new cryptocurrencies in BigQuery Public Datasets
Blockchain governance and uses beyond finance – Carnegie Mellon university podcast
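As a follow-up to Step 3 above, the fabric-dev-servers package you unpacked also includes helper scripts for managing the local runtime once the Docker images are downloaded. These are not shown in the excerpt, and script names can vary between releases, so treat the following as a sketch of the typical workflow rather than part of the book:

# Start the local Hyperledger Fabric runtime (assumes the scripts unpacked above)
cd ~/fabric-dev-servers
./startFabric.sh

# Create the PeerAdmin business network card used by Composer to deploy networks
./createPeerAdminCard.sh

# Stop or completely reset the runtime when you are done
./stopFabric.sh
./teardownFabric.sh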


Getting Started with Odoo Development

Packt
06 Apr 2015
14 min read
In this article by Daniel Reis, author of the book Odoo Development Essentials, we will see how to get started with Odoo. Odoo is a powerful open source platform for business applications. A suite of closely integrated applications was built on it, covering all business areas from CRM and Sales to Accounting and Stocks. Odoo has a dynamic and growing community around it, constantly adding features, connectors, and additional business apps. Many can be found at Odoo.com. In this article, we will guide you to install Odoo from the source code and create your first Odoo application. Inspired by the todomvc.com project, we will build a simple to-do application. It should allow us to add new tasks, mark them as completed, and finally, clear the task list of all already completed tasks. (For more resources related to this topic, see here.)

Installing Odoo from source

We will use a Debian/Ubuntu system for our Odoo server, so you will need to have it installed and available to work on. If you don't have one, you might want to set up a virtual machine with a recent version of Ubuntu Server before proceeding. For a development environment, we will install it directly from Odoo's Git repository. This gives us more control over versions and updates. We will need to make sure Git is installed. In the terminal, type the following command:

$ sudo apt-get update && sudo apt-get upgrade  # Update system
$ sudo apt-get install git                     # Install Git

To keep things tidy, we will keep all our work in the /odoo-dev directory inside our home directory:

$ mkdir ~/odoo-dev  # Create a directory to work in
$ cd ~/odoo-dev     # Go into our work directory

Now, we can use this script to install Odoo from source code on a Debian system:

$ git clone https://github.com/odoo/odoo.git -b 8.0 --depth=1
$ ./odoo/odoo.py setup_deps  # Installs Odoo system dependencies
$ ./odoo/odoo.py setup_pg    # Installs PostgreSQL & db superuser

Quick start an Odoo instance

In Odoo 8.0, we can create a directory and quick start a server instance for it. We can start by creating the directory called todo-app for our instance as shown here:

$ mkdir ~/odoo-dev/todo-app
$ cd ~/odoo-dev/todo-app

Now we can create our todo_minimal module in it and initialize the Odoo instance:

$ ~/odoo-dev/odoo/odoo.py scaffold todo_minimal
$ ~/odoo-dev/odoo/odoo.py start -i todo_minimal

The scaffold command creates a module directory using a predefined template. The start command creates a database with the current directory name and automatically adds it to the addons path so that its modules are available to be installed. Additionally, we used the -i option to also install our todo_minimal module. It will take a moment to initialize the database, and eventually we will see an INFO log message, Modules loaded. Then, the server will be ready to listen to client requests.

By default, the database is initialized with demonstration data, which is useful for development databases. Open http://<server-name>:8069 in your browser to be presented with the login screen. The default administrator account is admin with the password admin. Whenever you want to stop the Odoo server instance and return to the command line, press CTRL + C.

If you are hosting Odoo in a virtual machine, you might need to do some network configuration to be able to use it as a server. The simplest solution is to change the VM network type from NAT to Bridged; your virtualization software's documentation should help you find the appropriate settings.
Creating the application models

Now that we have an Odoo instance and a new module to work with, let's start by creating the data model. Models describe business objects, such as an opportunity, a sales order, or a partner (customer, supplier, and so on). A model has data fields and can also define specific business logic. Odoo models are implemented using a Python class derived from a template class. They translate directly to database objects, and Odoo automatically takes care of that when installing or upgrading the module. Let's edit the models.py file in the todo_minimal module directory so that it contains this:

# -*- coding: utf-8 -*-
from openerp import models, fields, api

class TodoTask(models.Model):
    _name = 'todo.task'
    name = fields.Char()
    is_done = fields.Boolean()
    active = fields.Boolean(default=True)

Our to-do tasks will have a name (title text), a done flag, and an active flag. The active field has a special meaning for Odoo; by default, records with a False value in it won't be visible to the user. We will use it to clear the tasks out of sight without actually deleting them.

Upgrading a module

For our changes to take effect, the module has to be upgraded. The simplest and fastest way to make all our changes to a module effective is to go to the terminal window where you have Odoo running, stop it (CTRL + C), and then restart it requesting the module upgrade. To start the server, upgrading the todo_minimal module in the todo-app database, use the following command:

$ cd ~/odoo-dev/todo-app  # we should be in the right directory
$ ./odoo.py start -u todo_minimal

The -u option performs an upgrade on a given list of modules. In this case, we upgrade just the todo_minimal module. Developing a module is an iterative process. You should make your changes in gradual steps and frequently install them with a module upgrade. Doing so will make it easier to detect mistakes sooner and to narrow down the culprit in case the error message is not clear enough, which can be very frequent when starting with Odoo development.

Adding menu options

Now that we have a model to store our data, let's make it available on the user interface. All we need is to add a menu option to open the to-do task model, so it can be used. This is done using an XML data file. Let's reuse the templates.xml data file and edit it so that it looks like this:

<openerp>
  <data>
    <act_window id="todo_task_action"
                name="To-do Task Action"
                res_model="todo.task" view_mode="tree,form" />
    <menuitem id="todo_task_menu"
              name="To-do Tasks"
              action="todo_task_action"
              parent="mail.mail_feeds"
              sequence="20" />
  </data>
</openerp>

Here, we have two records: a menu option and a window action. The Communication top menu was added to the user interface by the mail module dependency. We can find the identifier of the specific menu option where we want to add our own menu option by inspecting that module; it is mail_feeds. Also, our menu option executes the todo_task_action action we created, and that window action opens a tree view for the todo.task model. If we upgrade the module now and try the menu option just added, it will open an automatically generated view for our model, allowing us to add and edit records.
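For the parent menu mail.mail_feeds to be available, the module's __openerp__.py manifest has to declare the mail module as a dependency and list templates.xml as a data file. The excerpt only touches the manifest later, in the security section, so the following is just a minimal sketch of what the scaffolded Odoo 8 manifest might look like for this example; the name, author, and category values are assumptions:

# -*- coding: utf-8 -*-
# Sketch of todo_minimal/__openerp__.py (illustrative only, not from the book excerpt)
{
    'name': 'Minimal To-do Tasks',
    'version': '0.1',
    'author': 'Your name',
    'category': 'Tools',
    # 'mail' provides the Communication menu (mail.mail_feeds) used as parent above
    'depends': ['mail'],
    # Data files loaded on install/upgrade
    'data': ['templates.xml'],
    'installable': True,
    'application': True,
}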
Views should be defined for models to be exposed to the users, but Odoo is nice enough to do that automatically if we don't, so we can work with our model right away, without having any form or list views defined yet. So far so good. Let's improve our user interface now.

Creating views

Odoo supports several types of views, but the more important ones are list (also known as "tree"), form, and search views. For our simple module, we will just add a list view. Edit the templates.xml file to add the following <record> element just after the <data> opening tag at the top:

<record id="todo_task_tree" model="ir.ui.view">
    <field name="name">To-do Task Form</field>
    <field name="model">todo.task</field>
    <field name="arch" type="xml">
        <tree editable="top" colors="gray:is_done==True">
            <field name="name" />
            <field name="is_done" />
        </tree>
    </field>
</record>

This creates a tree view for the todo.task model with two columns: the title name and the is_done flag. Additionally, it has a color rule to display the tasks that are done in gray.

Adding business logic

We want to add business logic to be able to clear the already completed tasks. Our plan is to add an option on the More button, shown at the top of the list when we select lines. We will use a very simple wizard for this, opening a confirmation dialog, where we can execute a method to inactivate the done tasks. Wizards use a special type of model for temporary data: a Transient model. We will now add it to the models.py file as follows:

class TodoTaskClear(models.TransientModel):
    _name = 'todo.task.clear'

    @api.multi
    def do_clear_done(self):
        Task = self.env['todo.task']
        done_recs = Task.search([('is_done', '=', True)])
        done_recs.write({'active': False})
        return True

Transient models work just like regular models, but their data is temporary and will eventually be purged from the database. In this case, we don't need any fields, since no additional input is going to be asked of the user. It just has a method that will be called when the confirmation button is pressed. It lists all tasks that are done and then sets their active flag to False.

Next, we need to add the corresponding user interface. In the templates.xml file, add the following code:

<record id="todo_task_clear_dialog" model="ir.ui.view">
    <field name="name">To-do Clear Wizard</field>
    <field name="model">todo.task.clear</field>
    <field name="arch" type="xml">
        <form>
            All done tasks will be cleared, even if
            unselected.<br/>Continue?
            <footer>
                <button type="object"
                        name="do_clear_done"
                        string="Clear" class="oe_highlight" />
                or <button special="cancel"
                           string="Cancel"/>
            </footer>
        </form>
    </field>
</record>

<!-- More button Action -->
<act_window id="todo_task_clear_action"
    name="Clear Done"
    src_model="todo.task"
    res_model="todo.task.clear"
    view_mode="form"
    target="new" multi="True" />

The first record defines the form for the dialog window. It has a confirmation text and two buttons in the footer: Clear and Cancel. The Clear button, when pressed, will call the do_clear_done() method defined earlier. The second record is an action that adds the corresponding option to the More button for the to-do tasks model.
Configuring security

Finally, we need to set the default security configurations for our module. These configurations are usually stored inside the security/ directory. We need to add them to the __openerp__.py manifest file. Change the data attribute to the following:

'data': [
    'security/ir.model.access.csv',
    'security/todo_access_rules.xml',
    'templates.xml'],

The access control lists are defined for models and user groups in the ir.model.access.csv file. There is a pre-generated template. Edit it to look like this:

id,name,model_id:id,group_id:id,perm_read,perm_write,perm_create,perm_unlink
access_todo_task_user,To-do Task User Access,model_todo_task,base.group_user,1,1,1,1

This gives full access to all users in the base group named Employees. However, we want each user to see only their own to-do tasks. For that, we need a record rule setting a filter on the records the base group can see. Inside the security/ directory, add a todo_access_rules.xml file to define the record rule:

<openerp>
  <data>
    <record id="todo_task_user_rule" model="ir.rule">
        <field name="name">ToDo Tasks only for owner</field>
        <field name="model_id" ref="model_todo_task"/>
        <field name="domain_force">[('create_uid','=',user.id)]</field>
        <field name="groups" eval="[(4, ref('base.group_user'))]"/>
    </record>
  </data>
</openerp>

This is all we need to set up the module security.

Summary

We created a new module from scratch, covering the most frequently used elements in a module: models, user interface views, business logic in model methods, and access security. In the process, we got familiar with the module development process, which involves module upgrades and application server restarts to make the gradual changes effective in Odoo.

Resources for Article:

Further resources on this subject:

Making Goods with Manufacturing Resource Planning [article]
Machine Learning in IPython with scikit-learn [article]
Administrating Solr [article]


C++, SFML, Visual Studio, and Starting the first game

Packt
19 Aug 2016
15 min read
In this article by John Horton, author of the book Beginning C++ Game Programming, we will waste no time in getting you started on your journey to writing great games for the PC, using C++ and the OpenGL-powered SFML. We will learn absolutely everything we need to have the first part of our first game up and running. Here is what we will do now. (For more resources related to this topic, see here.)

Find out about the games we will build
Learn a bit about C++
Explore SFML and its relationship with C++
Look at the software, Visual Studio
Set up a game development environment
Create a reusable project template which will save a lot of time
Plan and prepare for the first game project, Timber!!!
Write the first C++ code and make a runnable game that draws a background

The games

The journey will be mainly smooth as we learn the fundamentals of the super-fast C++ language a step at a time and then put the new knowledge to use, adding cool features to the three games we are building.

Timber!!!

The first game is an addictive, fast-paced clone of the hugely successful Timberman (http://store.steampowered.com/app/398710/). Our game, Timber!!!, will allow us to be introduced to all the C++ basics at the same time as building a genuinely playable game. Here is what our version of the game will look like when we are done and we have added a few last-minute enhancements.

The Walking Zed

Next we will build a frantic zombie survival-shooter, not unlike the Steam hit Over 9000 Zombies (http://store.steampowered.com/app/273500/). The player will have a machine gun, the ability to gather resources, and the ability to build defenses. All this will take place in a randomly generated, scrolling world. To achieve this we will learn about object oriented programming and how it enables us to have a large code base (lots of code) that is easy to write and maintain. Expect exciting features like hundreds of enemies, rapid fire weaponry, and directional sound.

Thomas gets a real friend

The third game will be a stylish and challenging, single player and co-op, puzzle platformer. It is based on the very popular game Thomas was Alone (http://store.steampowered.com/app/220780/). Expect to learn cool topics like particle effects, OpenGL shaders, and multiplayer networking. If you want to play any of the games now, you can do so from the download bundle in the Runnable Games folder. Just double-click on the appropriate .exe file. Let's get started by introducing C++, Visual Studio, and SFML!

C++

One question you might have is: why use C++ at all? C++ is fast, very fast. What makes this so is the fact that the code that we write is directly translated into machine executable instructions. These instructions together are what makes the game. The executable game is contained within a .exe file which the player can simply double-click to run. There are a few steps in the process. First, the pre-processor looks to see if any other code needs to be included within our own and adds it if necessary. Next, all the code is compiled into object files by the compiler program. Finally, a third program called the linker joins all the object files into the executable file which is our game. In addition, C++ is well established at the same time as being extremely up-to-date. C++ is an Object Oriented Programming (OOP) language, which means we can write and organize our code in a proven way that makes our games efficient and manageable.
Most of this other code that I referred to, you might be able to guess, is SFML, and we will find out more about SFML in just a minute. The pre-processor, compiler, and linker programs I have just mentioned are all part of the Visual Studio Integrated Development Environment (IDE).

Microsoft Visual Studio

Visual Studio hides away the complexity of the pre-processing, compiling, and linking. It wraps it all up into the press of a button. In addition to this, it provides a slick user interface for us to type our code and manage what will become a large selection of code files and other project assets as well. While there are advanced versions of Visual Studio that cost hundreds of dollars, we will be able to build all three of our games in the free Express 2015 for Desktop version.

SFML

Simple Fast Media Library (SFML) is not the only C++ library for games. It is possible to make an argument to use other libraries, but SFML seems to come through in first place, for me, every time. Firstly, it is written using object oriented C++. Perhaps the biggest benefit is that all modern C++ programming uses OOP. Every C++ beginner's guide I have ever read uses and teaches OOP. OOP is the future (and the now) of coding in almost all languages, in fact. So why, if you're learning C++ from the beginning, would you want to do it any other way? SFML has a module (code) for just about anything you would ever want to do in a 2D game. SFML works using OpenGL, which can also make 3D games. OpenGL is the de-facto free-to-use graphics library for games. When you use SFML, you are automatically using OpenGL. SFML drastically simplifies:

2D graphics and animation, including scrolling game-worlds
Sound effects and music playback, including high quality directional sound
Online multiplayer features
The same code can be compiled and linked on all major desktop operating systems, and soon mobile as well!

Extensive research has not uncovered any more suitable way to build 2D games for PC, even for expert developers, and especially if you are a beginner and want to learn C++ in a fun gaming environment.

Setting up the development environment

Now that you know a bit more about how we will be making games, it is time to set up a development environment so we can get coding.

What about Mac and Linux?

The games that we make can be built to run on Windows, Mac, and Linux! The code we use will be identical for each. However, each version does need to be compiled and linked on the platform for which it is intended, and Visual Studio will not be able to help us with Mac and Linux. Although, I guess, if you are an enthusiastic Mac or Linux user and you are comfortable with your operating system, the vast majority of the challenge you will encounter will be in the initial setup of the development environment, SFML, and the first project.

For Linux, read this for an overview: http://www.sfml-dev.org/tutorials/2.0/start-linux.php
For Linux, read this for a step-by-step guide: http://en.sfml-dev.org/forums/index.php?topic=9808.0
On Mac, read this tutorial as well as the linked-out articles: http://www.edparrish.net/common/sfml-osx.html

Installing Visual Studio Express 2015 for Desktop

Installing Visual Studio can be almost as simple as downloading a file and clicking a few buttons. It will help us, however, if we carefully run through exactly how we do this. For this reason, I will walk through the installation process a step at a time. The Microsoft Visual Studio site says you need 5 GB of hard disk space.
From experience, however, I would suggest you need at least 10 GB of free space. In addition, these figures are slightly ambiguous. If you are planning to install on a secondary hard drive, you will still need at least 5 GB on the primary hard drive because no matter where you choose to install Visual Studio, it will need this space too. To summarize this ambiguous situation:

It is essential to have a full 10 GB of space on the primary hard disk, if you will be installing Visual Studio to that primary hard disk.
On the other hand, make sure you have 5 GB on the primary hard disk as well as 10 GB on the secondary, if you intend to install to a secondary hard disk.

Yep, stupid, I know!

The first thing you need is a Microsoft account and the login details. If you have a Hotmail or MSN email address, then you already have one. If not, you can sign up for a free one here: https://login.live.com/. Visit this link: https://www.visualstudio.com/en-us/downloads/download-visual-studio-vs.aspx. Click on Visual Studio 2015, then Express 2015 for desktop, then the Download button. This next image shows the three places to click.

Wait for the short download to complete and then run the downloaded file. Now you just need to follow the on-screen instructions. However, make a note of the folder where you choose to install Visual Studio. If you want to do things exactly the same as me, then create a new folder called Visual Studio 2015 on your preferred hard disk and install to this folder. This whole process could take a while depending on the speed of your Internet connection. When you see the next screen, click on Launch and enter your Microsoft account login details. Now we can turn to SFML.

Setting up SFML

This short tutorial will step through downloading the SFML files, which allow us to include the functionality contained in the library, as well as the files, called DLL files, that will enable us to link our compiled object code to the SFML compiled object code.

Visit this link on the SFML website: http://www.sfml-dev.org/download.php. Click on the button that says Latest Stable Version, as shown next. By the time you read this guide, the actual latest version will almost certainly have changed. That doesn't matter as long as you do the next step just right. We want to download the 32-bit version for Visual C++ 2014. This might sound counter-intuitive because we have just installed Visual Studio 2015 and you probably (most commonly) have a 64-bit PC. The reason we choose the download that we do is because Visual C++ 2014 is part of Visual Studio 2015 (Visual Studio does more than C++) and we will be building games in 32-bit so they run on both 32- and 64-bit machines. To be clear, click the download indicated below.

When the download completes, create a folder at the root of the same drive where you installed Visual Studio and name it SFML. Also create another folder at the root of the drive where you installed Visual Studio and call it Visual Studio Stuff. We will store all kinds of Visual Studio-related things here, so Stuff seems like a good name. Just to be clear, here is what my hard drive looks like after this step. Obviously, the folders you have in between the highlighted three folders in the image will probably be totally different to mine.

Now, ready for all the projects we will soon be making, create a new folder inside Visual Studio Stuff. Name the new folder Projects. Finally, unzip the SFML download. Do this on your desktop. When unzipping is complete, you can delete the zip folder.
You will be left with a single folder on your desktop. Its name will reflect the version of SFML that you downloaded. Mine is called SFML-2.3.2-windows-vc14-32-bit. Your file name will likely reflect a more recent version. Double-click this folder to see the contents, then double-click again into the next folder (mine is called SFML-2.3.2). The image below is what my SFML-2.3.2 folder contents look like when the entire contents have been selected. Yours should look the same.

Copy the entire contents of this folder, as seen in the previous image, and paste/drag all the contents into the SFML folder you created in step 3. I will refer to this folder simply as your SFML folder. Now we are ready to start using C++ and SFML in Visual Studio.

Creating a reusable project template

As setting up a project is a fairly fiddly process, we will create a project and then save it as a Visual Studio template. This will save us a quite significant amount of work each time we start a new game. So if you find the next tutorial a little tedious, rest assured that you will never need to do this again.

In the New Project window, click the little drop-down arrow next to Visual C++ to reveal more options, then click Win32, and then click Win32 Console Application. You can see all these selections in the next screenshot. Now, at the bottom of the New Project window, type HelloSFML in the Name: field. Next, browse to the Visual Studio Stuff\Projects folder that we created in the previous tutorial. This will be the location where all our project files will be kept. All templates are based on an actual project. So we will have a project called HelloSFML, but the only thing we will do with it is make a template from it. When you have completed the steps above, click OK.

The next image shows the Application Settings window. Check the box for Console application, and leave the other options as shown below. Click Finish and Visual Studio will create the new project.

Next we will add some fairly intricate and important project settings. This is the laborious part, but as we will create a template, we will only need to do this once. What we need to do is tell Visual Studio, or more specifically the code compiler that is part of Visual Studio, where to find a special type of code file from SFML. The special type of file I am referring to is a header file. Header files are the files that define the format of the SFML code. So when we use the SFML code, the compiler knows how to handle it. Note that the header files are distinct from the main source code files and they are contained in files with the .hpp file extension. All this will become clearer when we eventually start adding our own header files in the second project. In addition, we need to tell Visual Studio where it can find the SFML library files.

From the Visual Studio main menu, select Project | HelloSFML properties. In the resulting HelloSFML Property Pages window, take the following steps, which are numbered and can be referred to in the next image.

First, select All Configurations from the Configuration: drop-down.
Second, select C/C++ then General from the left-hand menu.
Third, locate the Additional Include Directories edit box and type the drive letter where your SFML folder is located, followed by \SFML\include. The full path to type, if you located your SFML folder on your D drive, is, as shown in the screenshot, D:\SFML\include. Vary your path if you installed SFML to a different drive.

Click Apply to save your configurations so far.
Now, still in the same window, perform these next steps, which again refer to the next image.

Select Linker then General.
Find the Additional Library Directories edit box and type the drive letter where your SFML folder is, followed by \SFML\lib. So the full path to type, if you located your SFML folder on your D drive, is, as shown in the screenshot, D:\SFML\lib. Vary your path if you installed SFML to a different drive.
Click Apply to save your configurations so far.

Finally for this stage, still in the same window, perform these steps, which again refer to the next image.

Switch the Configuration: drop-down (1) to Debug, as we will be running and testing our games in debug mode.
Select Linker then Input (2).
Find the Additional Dependencies edit box (3) and click into it at the far left-hand side. Now copy & paste/type the following at the indicated place:

sfml-graphics-d.lib;sfml-window-d.lib;sfml-system-d.lib;sfml-network-d.lib;sfml-audio-d.lib;

Again, be REALLY careful to place the cursor exactly and not to overwrite any of the text that is already there. Click OK.

Let's make the template of our HelloSFML project so we never have to do this slightly mind-numbing task again. Creating a reusable project template is really easy. In Visual Studio, select File | Export Template…. Then in the Export Template Wizard window, make sure the Project template option is selected and the HelloSFML project is selected for the From which project do you want to create a template option. Click Next and then Finish. Phew, that's it! Next time we create a project, I'll show you how to do it from this template. Let's build Timber!!!

Summary

In this article, we learned that configuring an IDE to use a C++ library can be a bit awkward and lengthy. Also, the concept of classes and objects is well known to be slightly awkward for people new to coding.

Resources for Article:

Further resources on this subject:

Game Development Using C++ [Article]
Connecting to Microsoft SQL Server Compact 3.5 with Visual Studio [Article]
Introducing the Boost C++ Libraries [Article]
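As a closing illustration for this excerpt: the stated goal was a runnable program that draws a background, and with the project template above in place, a minimal SFML program to do that looks roughly like the following. This sketch is not from the book; the window resolution and the graphics/background.png path are assumptions you would replace with your own asset:

// Minimal SFML 2.x program: open a window and draw a background sprite.
#include <SFML/Graphics.hpp>

int main()
{
    // Create a window (resolution is an assumption for this sketch)
    sf::RenderWindow window(sf::VideoMode(1920, 1080), "Timber!!!");

    // Load a texture from file and attach it to a sprite
    sf::Texture textureBackground;
    if (!textureBackground.loadFromFile("graphics/background.png"))
        return -1; // bail out if the asset is missing

    sf::Sprite spriteBackground;
    spriteBackground.setTexture(textureBackground);
    spriteBackground.setPosition(0, 0);

    // Game loop: handle events, clear, draw, display
    while (window.isOpen())
    {
        sf::Event event;
        while (window.pollEvent(event))
        {
            if (event.type == sf::Event::Closed)
                window.close();
        }

        window.clear();
        window.draw(spriteBackground);
        window.display();
    }

    return 0;
}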


Customer Management in Joomla! and VirtueMart

Packt
22 Oct 2009
7 min read
Note that all VirtueMart customers must be registered with Joomla!. However, not all Joomla! users need to be VirtueMart customers. Within the first few sections of this article, you will have a clear concept of user management in Joomla! and VirtueMart.

Customer management

Customer management in VirtueMart includes registering customers to the VirtueMart shop, assigning them to user groups for appropriate permission levels, managing fields in the registration form, viewing and editing customer information, and managing the user groups. Let's dive into these activities in the following sections.

Registration/Authentication of customers

Joomla! has a very strong user registration and authentication system. One core component in Joomla! is com_users, which manages user registration and authentication in Joomla!. However, VirtueMart needs some extra information for customers. VirtueMart collects this information through its own customer registration process, and stores the information in separate tables in the database. The extra information required by VirtueMart is stored in a table named jos_vm_user_info, which is related to the jos_users table by the user id field. Usually, when a user registers to the Joomla! site, they also register with VirtueMart. This depends on some global settings. In the following sections, we are going to learn how to enable the user registration and authentication for VirtueMart.

Revisiting registration settings

We configure the registration settings from VirtueMart's administration panel Admin | Configuration | Global screen. There is a section titled User Registration Settings, which defines how the user registration will be handled:

Ensure that your VirtueMart shop has been configured as shown in the screenshot above. The first field to configure is the User Registration Type. Selecting Normal Account Creation in this field creates both a Joomla! and a VirtueMart account during user registration. For our example shop, we will be using this setting. Joomla!'s new user activation should be disabled when we are using VirtueMart. That means the Joomla! New account activation necessary? field should read No.

Enabling the VirtueMart login module

There is a default module in Joomla! which is used for user registrations and login. When we are using this default Login Form (mod_login module), it does not collect the information required by VirtueMart, and does not create customers in VirtueMart. By default, when published, the mod_login module looks like the following screenshot:

As you see, registered users can log in to Joomla! through this form, recover their forgotten password by clicking on the Forgot your password? link, and create a new user account by clicking on the Create an account link. When a user clicks on the Create an account link, they get the form as shown in the following screenshot:

We see that normal registration in Joomla! only requires four pieces of information: Name, Username, Email, and Password. It does not collect the information needed in VirtueMart, such as the billing and shipping address, to be a customer. Therefore, we need to disable the mod_login module and enable the mod_virtuemart_login module. We have already learned how to enable and disable a module in Joomla!. We have also learned how to install modules. By default, the mod_virtuemart_login module's title is VirtueMart Login. You may prefer to show this title as Login only. In that case, click on the VirtueMart Login link in the Module Name column.
This brings the Module:[Edit] screen:

In the Title field, type Login (or any other text you want to show as the title of this module). Make sure the module is enabled and its position is set to left or right. Click on the Save icon to save your settings. Now, browse to your site's front-page (for example, http://localhost/bdosn/), and you will see the login form as shown in the following screenshot:

As you can see, this module has the same functionalities as we saw in the mod_login module of Joomla!. Let us test the account creation in this module. Click on the Register link. It brings the following screen:

The registration form has three main sections: Customer Information, Bill To Information, and Send Registration. At the end, there is the Send Registration button for submitting the form data. In the Customer Information section, type your email address, the desired username, and password. Confirm the password by typing it again in the Confirm password field. In the Bill To Information section, type the address details where bills are to be sent. In the entire form, required fields are marked with an asterisk (*). You must provide information for these required fields. In the Send Registration section, you need to agree to the Terms of Service. Click on the Terms of Service link to read it. Then, check the I agree to the Terms of Service checkbox and click on the Send Registration button to submit the form data.

If you have provided all of the required information and submitted a unique email address, the registration will be successful. On successful completion of registration, you get the following screen notification, and will be logged in to the shop automatically:

If you scroll down to the Login module, you will see that you are logged in and greeted by the store. You also see the User Menu in this screen:

Both the User Menu and the Login modules contain a Logout button. Click on either of these buttons to log out from the Joomla! site. In fact, the links in the User Menu module are for Joomla! only. Let us try the Your Details link. Click on the Your Details link, and you will see the information shown in the following screenshot:

As you see in the screenshot above, you can change your full name, email, password, frontend language, and time zone. You cannot view any information regarding the billing address, or other information about the customer. In fact, this information is for regular Joomla! users. We can only get full customer information by clicking on the Account Maintenance link in the Login module. Let us try it. Click on the Account Maintenance link, and it shows the following screenshot:

The Account Maintenance screen has three sections: Account Information, Shipping Information, and Order Information. Click on the Account Information link to see what happens. It shows the following screen:

This shows the Customer Information and Bill To Information, which were entered during user registration. The last section on this screen is the Bank Information, from where the customer can add bank account information. This section looks like the following screenshot:

As you can see, from the Bank Account Info section, customers can enter their bank account information, including the account holder's name, account number, bank's sorting code number, bank's name, account type, and IBAN (International Bank Account Number). Entering this information is important when you are using a Bank Account Debit payment method.
Now, let us go back to the Account Maintenance screen and see the other sections. Click on the Shipping Information link, and you get the following screen:

There is one default shipping address, which is the same as the billing address. The customers can create additional shipping addresses. To create a new shipping address, click on the Add Address link. It shows the following screen:


Data cleaning is the worst part of data analysis, say data scientists

Amey Varangaonkar
04 Jun 2018
5 min read
The year was 2012. Harvard Business Review had famously declared the role of data scientist as the ‘sexiest job of the 21st century’. Companies were slowly working with more data than ever before. The real actionable value of the data that could be used for commercial purposes was slowly beginning to be uncovered. Someone who could derive these actionable insights from the data was needed. The demand for data scientists was higher than ever.

Fast forward to 2018 - more data has been collected in the last 2 years than ever before. Data scientists are still in high demand, and the need for insights is higher than ever. There has been one significant change, though - the process of deriving insights has become more complex. If you ask the data scientists, the initial phase of this process, which involves data cleansing, has become a lot more cumbersome. So much so, that it is no longer a myth that data scientists spend almost 80% of their time cleaning and readying the data for analysis.

Why data cleaning is a nightmare

In the recently conducted Packt Skill-Up survey, we asked data professionals what the worst part of the data analysis process was, and a staggering 50% responded with data cleaning.

Source: Packt Skill Up Survey

We dived deep into this, and tried to understand why many data science professionals have this common feeling of dislike towards data cleaning, or scrubbing - as many call it.

Read the Skill Up report in full. Sign up to our weekly newsletter and download the PDF for free.

There is no consistent data format

Organizations these days work with a lot of data. Some of it is in a structured, readily understandable format. This kind of data is usually quite easy to clean, parse, and analyze. However, some of the data is really messy and cannot be used as-is for analysis. This includes missing data, irregularly formatted data, and irrelevant data which is not worth analyzing at all. There is also the problem of working with unstructured data, which needs to be pre-processed to get data worth analyzing. Audio or video files, email messages, presentations, XML documents, and web pages are some classic examples of this.

There's too much data to be cleaned

The volume of data that businesses deal with on a day-to-day basis is on the scale of terabytes or even petabytes. Making sense of all this data, coming from a variety of sources and in different formats, is, undoubtedly, a huge task. There are a whole host of tools designed to ease this process today, but it remains an incredibly tricky challenge to sift through the large volumes of data and prepare it for analysis.

Data cleaning is tricky and time-consuming

Data cleansing can be quite an exhaustive and time-consuming task, especially for data scientists. Cleaning the data requires removal of duplications, removing or replacing missing entries, correcting misfielded values, ensuring consistent formatting, and a host of other tasks which take a considerable amount of time. Once the data is cleaned, it needs to be placed in a secure location. Also, a log of the entire process needs to be kept to ensure the right data goes through the right process. All of this requires the data scientists to create a well-designed data scrubbing framework to avoid the risk of repetition. Much of this is grunt work and requires a lot of manual effort. Sadly, there are no tools in the market which can effectively automate this process.
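The kinds of fixes described above, removing duplicates, filling or dropping missing entries, and normalizing formatting, are exactly what libraries such as pandas are used for. The article does not prescribe any particular tool, so the following is only an illustrative sketch, with a hypothetical sales.csv file and made-up column names:

import pandas as pd

# Load a (hypothetical) raw dataset
df = pd.read_csv("sales.csv")

# Remove exact duplicate rows
df = df.drop_duplicates()

# Standardize formatting: strip whitespace and normalize case in a text column
df["city"] = df["city"].str.strip().str.title()

# Replace missing numeric values with a sensible default (here, the column median)
df["amount"] = df["amount"].fillna(df["amount"].median())

# Drop rows that are still missing critical fields
df = df.dropna(subset=["customer_id"])

# Write the cleaned data to a new file
df.to_csv("sales_clean.csv", index=False)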
Outsourcing the process is expensive

Given that data cleaning is a rather tedious job, many businesses think of outsourcing the task to third-party vendors. While this reduces a lot of time and effort on the company's end, it definitely increases the cost of the overall process. Many small and medium scale businesses may not be able to afford this, and thus are heavily reliant on the data scientist to do the job for them.

You can hate it, but you cannot ignore it

It is quite obvious that data scientists need clean, ready-to-analyze data if they are to extract actionable business insights from it. Some data scientists equate data cleaning to donkey work, suggesting there's not a lot of innovation involved in this process. However, some believe data cleaning is rather important, and pay special attention to it, given that once it is done right, most of the problems in data analysis are solved. It is very difficult to take advantage of the intrinsic value offered by a dataset if it does not adhere to the quality standards set by the business, making data cleaning a crucial component of the data analysis process.

Now that you know why data cleaning is essential, why not dive deeper into the technicalities? Check out our book Practical Data Wrangling for expert tips on turning your noisy data into relevant, insight-ready information using R and Python.

Read more:

Cleaning Data in PDF Files
30 common data science terms explained
How to create a strong data science project portfolio that lands you a job

How to perform data partitioning in PostgreSQL 10

Sugandha Lahoti
09 Mar 2018
11 min read
Partitioning refers to splitting; logically, it means breaking one large table into smaller physical pieces. PostgreSQL supports basic table partitioning. It can store up to 32 TB of data inside a single table, using 8k blocks by default. In fact, if we compile PostgreSQL with 32k blocks, we can even put up to 128 TB into a single table. However, large tables like these are not necessarily convenient, and it makes sense to partition tables to make processing easier and, in some cases, a bit faster. With PostgreSQL 10.0, partitioning has improved and offers significantly easier handling of partitioned data to end users. In this article, we will talk about both the classic way to partition data and the new features available in PostgreSQL 10.0 for data partitioning.

Creating partitions

First, we will learn the old method to partition data. Before digging deeper into the advantages of partitioning, I want to show how partitions can be created. The entire thing starts with a parent table:

test=# CREATE TABLE t_data (id serial, t date, payload text);
CREATE TABLE

In this example, the parent table has three columns. The date column will be used for partitioning, but more on that a bit later. Now that the parent table is in place, the child tables can be created. This is how it works:

test=# CREATE TABLE t_data_2016 () INHERITS (t_data);
CREATE TABLE
test=# \d t_data_2016
             Table "public.t_data_2016"
 Column  |  Type   | Modifiers
---------+---------+-----------------------------------------------------
 id      | integer | not null default nextval('t_data_id_seq'::regclass)
 t       | date    |
 payload | text    |
Inherits: t_data

The table is called t_data_2016 and inherits from t_data. () means that no extra columns are added to the child table. As you can see, inheritance means that all columns from the parent are available in the child table. Also note that the id column will inherit the sequence from the parent so that all children can share the very same numbering.

Let's create more tables:

test=# CREATE TABLE t_data_2015 () INHERITS (t_data);
CREATE TABLE
test=# CREATE TABLE t_data_2014 () INHERITS (t_data);
CREATE TABLE

So far, all the tables are identical and just inherit from the parent. However, there is more: child tables can actually have more columns than parents. Adding more fields is simple:

test=# CREATE TABLE t_data_2013 (special text) INHERITS (t_data);
CREATE TABLE

In this case, a special column has been added. It has no impact on the parent, but just enriches the children and makes them capable of holding more data. After creating a handful of tables, a row can be added:

test=# INSERT INTO t_data_2015 (t, payload) VALUES ('2015-05-04', 'some data');
INSERT 0 1

The most important thing now is that the parent table can be used to find all the data in the child tables:

test=# SELECT * FROM t_data;
 id |     t      |  payload
----+------------+-----------
  1 | 2015-05-04 | some data
(1 row)

Querying the parent allows you to gain access to everything below the parent in a simple and efficient manner.
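In the example above, the row is inserted directly into the correct child table. With inheritance-based partitioning, rows written to the parent are not routed to the children automatically, so a trigger is commonly added for that. This is not shown in the excerpt; a minimal sketch for the tables above (covering just two of the partitions) could look like this:

-- Route INSERTs on the parent to the matching child table (sketch only)
CREATE OR REPLACE FUNCTION t_data_insert_trigger() RETURNS trigger AS $$
BEGIN
    IF NEW.t >= DATE '2015-01-01' AND NEW.t < DATE '2016-01-01' THEN
        INSERT INTO t_data_2015 VALUES (NEW.*);
    ELSIF NEW.t >= DATE '2016-01-01' AND NEW.t < DATE '2017-01-01' THEN
        INSERT INTO t_data_2016 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'date out of range: %', NEW.t;
    END IF;
    RETURN NULL;  -- the row is stored in the child, not in the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_data_insert
    BEFORE INSERT ON t_data
    FOR EACH ROW EXECUTE PROCEDURE t_data_insert_trigger();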
To understand how PostgreSQL does partitioning, it makes sense to take a look at the plan: test=# EXPLAIN SELECT * FROM t_data; QUERY PLAN ----------------------------------------------------------------- Append (cost=0.00..84.10 rows=4411 width=40) -> Seq Scan on t_data (cost=0.00..0.00 rows=1 width=40) -> Seq Scan on t_data_2016 (cost=0.00..22.00 rows=1200 width=40) -> Seq Scan on t_data_2015 (cost=0.00..22.00 rows=1200 width=40) -> Seq Scan on t_data_2014 (cost=0.00..22.00 rows=1200 width=40) -> Seq Scan on t_data_2013 (cost=0.00..18.10 rows=810 width=40) (6 rows) Actually, the process is quite simple. PostgreSQL will simply unify all tables and show us all the content from all the tables inside and below the partition we are looking at. Note that all tables are independent and are just connected logically through the system catalog. Applying table constraints What happens if filters are applied? test=# EXPLAIN SELECT * FROM t_data WHERE t = '2016-01-04'; QUERY PLAN ----------------------------------------------------------------- Append (cost=0.00..95.12 rows=23 width=40) -> Seq Scan on t_data (cost=0.00..0.00 rows=1 width=40) Filter: (t = '2016-01-04'::date) -> Seq Scan on t_data_2016 (cost=0.00..25.00 rows=6 width=40) Filter: (t = '2016-01-04'::date) -> Seq Scan on t_data_2015 (cost=0.00..25.00 rows=6 width=40) Filter: (t = '2016-01-04'::date) -> Seq Scan on t_data_2014 (cost=0.00..25.00 rows=6 width=40) Filter: (t = '2016-01-04'::date) -> Seq Scan on t_data_2013 (cost=0.00..20.12 rows=4 width=40) Filter: (t = '2016-01-04'::date) (11 rows) PostgreSQL will apply the filter to all the partitions in the structure. It does not know that the table name is somehow related to the content of the tables. To the database, names are just names and have nothing to do with what you are looking for. This makes sense, of course, as there is no mathematical justification for doing anything else. The point now is: how can we teach the database that the 2016 table only contains 2016 data, the 2015 table only contains 2015 data, and so on? Table constraints are here to do exactly that. They teach PostgreSQL about the content of those tables and therefore allow the planner to make smarter decisions than before. The feature is called constraint exclusion and helps dramatically to speed up queries in many cases. The following listing shows how table constraints can be created: test=# ALTER TABLE t_data_2013 ADD CHECK (t < '2014-01-01'); ALTER TABLE test=# ALTER TABLE t_data_2014 ADD CHECK (t >= '2014-01-01' AND t < '2015-01-01'); ALTER TABLE test=# ALTER TABLE t_data_2015 ADD CHECK (t >= '2015-01-01' AND t < '2016-01-01'); ALTER TABLE test=# ALTER TABLE t_data_2016 ADD CHECK (t >= '2016-01-01' AND t < '2017-01-01'); ALTER TABLE For each table, a CHECK constraint can be added. PostgreSQL will only create the constraint if all the data in those tables is perfectly correct and if every single row satisfies the constraint. In contrast to MySQL, constraints in PostgreSQL are taken seriously and honored under any circumstances. In PostgreSQL, those constraints can overlap--this is not forbidden and can make sense in some cases. However, it is usually better to have non-overlapping constraints because PostgreSQL has the option to prune more tables. 
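Whether the planner is actually allowed to use such CHECK constraints to skip child tables is governed by the constraint_exclusion setting; its default value, partition, applies exclusion to inheritance children, which is exactly what we need here. As a quick sanity check (not shown in the original text), you can inspect and set it per session:

SHOW constraint_exclusion;              -- 'partition' is the default
SET constraint_exclusion = partition;   -- examine constraints for inheritance children and UNION ALL subqueries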
Here is what happens after adding those table constraints: test=# EXPLAIN SELECT * FROM t_data WHERE t = '2016-01-04'; QUERY PLAN ----------------------------------------------------------------- Append (cost=0.00..25.00 rows=7 width=40) -> Seq Scan on t_data (cost=0.00..0.00 rows=1 width=40) Filter: (t = '2016-01-04'::date) -> Seq Scan on t_data_2016 (cost=0.00..25.00 rows=6 width=40) Filter: (t = '2016-01-04'::date) (5 rows) The planner will be able to remove many of the tables from the query and only keep those which potentially contain the data. The query can greatly benefit from a shorter and more efficient plan. In particular, if those tables are really large, removing them can boost speed considerably. Modifying inherited structures Once in a while, data structures have to be modified. The ALTER TABLE clause is here to do exactly that. The question is: how can partitioned tables be modified? Basically, all you have to do is tackle the parent table and add or remove columns. PostgreSQL will automatically propagate those changes through to the child tables and ensure that changes are made to all the relations, as follows: test=# ALTER TABLE t_data ADD COLUMN x int; ALTER TABLE test=# \d t_data_2016 Table "public.t_data_2016" Column | Type | Modifiers ---------+---------+----------------------------------------------------- id | integer | not null default nextval('t_data_id_seq'::regclass) t | date | payload | text | x | integer | Check constraints: "t_data_2016_t_check" CHECK (t >= '2016-01-01'::date AND t < '2017-01-01'::date) Inherits: t_data As you can see, the column is added to the parent and automatically added to the child table here. Note that this works for columns and similar changes; indexes are a totally different story. In an inherited structure, every table has to be indexed separately. If you add an index to the parent table, it will only be present on the parent - it won't be deployed on those child tables. Indexing all those columns in all those tables is your task and PostgreSQL is not going to make those decisions for you. Of course, this can be seen as a feature or as a limitation. On the upside, you could say that PostgreSQL gives you all the flexibility to index things separately and therefore potentially more efficiently. However, people might also argue that deploying all those indexes one by one is a lot more work. Moving tables in and out of partitioned structures Suppose you have an inherited structure. Data is partitioned by date and you want to provide the most recent years to the end user. At some point, you might want to remove some data from the scope of the user without actually touching it. You might want to put data into some sort of archive or something. PostgreSQL provides a simple means to achieve exactly that. First, a new parent can be created: test=# CREATE TABLE t_history (LIKE t_data); CREATE TABLE The LIKE keyword allows you to create a table which has exactly the same layout as the t_data table. If you have forgotten which columns the t_data table actually has, this might come in handy as it saves you a lot of work. It is also possible to include indexes, constraints, and defaults. Then, the table can be moved away from the old parent table and put below the new one. Here is how it works: test=# ALTER TABLE t_data_2013 NO INHERIT t_data; ALTER TABLE test=# ALTER TABLE t_data_2013 INHERIT t_history; ALTER TABLE The entire process can of course be done in a single transaction to assure that the operation stays atomic.
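As a small sketch of that last point, the two ALTER TABLE statements can simply be wrapped in a transaction so that no concurrent reader ever sees t_data_2013 detached from both parents:

BEGIN;
ALTER TABLE t_data_2013 NO INHERIT t_data;
ALTER TABLE t_data_2013 INHERIT t_history;
COMMIT;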
Cleaning up data One advantage of partitioned tables is the ability to clean data up quickly. Suppose that we want to delete an entire year. If data is partitioned accordingly, a simple DROP TABLE clause can do the job: test=# DROP TABLE t_data_2014; DROP TABLE As you can see, dropping a child table is easy. But what about the parent table? There are depending objects and therefore PostgreSQL naturally errors out to make sure that nothing unexpected happens: test=# DROP TABLE t_data; ERROR: cannot drop table t_data because other objects depend on it DETAIL: default for table t_data_2013 column id depends on sequence t_data_id_seq table t_data_2016 depends on table t_data table t_data_2015 depends on table t_data HINT: Use DROP ... CASCADE to drop the dependent objects too. The DROP TABLE clause will warn us that there are depending objects and refuses to drop those tables. The CASCADE clause is needed to force PostgreSQL to actually remove those objects, along with the parent table: test=# DROP TABLE t_data CASCADE; NOTICE: drop cascades to 3 other objects DETAIL: drop cascades to default for table t_data_2013 column id drop cascades to table t_data_2016 drop cascades to table t_data_2015 DROP TABLE Understanding PostgreSQL 10.0 partitioning For many years, the PostgreSQL community has been working on built-in partitioning. Finally, PostgreSQL 10.0 offers the first implementation of in-core partitioning, which will be covered in this section. For now, the partitioning functionality is still pretty basic. However, a lot of infrastructure for future improvements is already in place. To show you how partitioning works, I have compiled a simple example featuring range partitioning: CREATE TABLE data (payload integer) PARTITION BY RANGE (payload); CREATE TABLE negatives PARTITION OF data FOR VALUES FROM (MINVALUE) TO (0); CREATE TABLE positives PARTITION OF data FOR VALUES FROM (0) TO (MAXVALUE); In this example, one partition will hold all negative values while the other one will take care of positive values. While creating the parent table, you can simply specify which way you want to partition data. In PostgreSQL 10.0, there is range partitioning and list partitioning. Support for hash partitioning and the like might be available as soon as PostgreSQL 11.0. Once the parent table has been created, it is already time to create the partitions. To do that, the PARTITION OF clause has been added. At this point, there are still some limitations. The most important one is that a tuple (= a row) cannot move from one partition to the other, for example: UPDATE data SET payload = -10 WHERE payload = 5 If there were rows satisfying this condition, PostgreSQL would simply error out and refuse to change the value. However, with a good design, it is a bad idea to change the partitioning key anyway. Also, keep in mind that you have to think about indexing each partition. We have now seen both the old way of partitioning data and the new partitioning features introduced in PostgreSQL 10.0. You read an excerpt from the book Mastering PostgreSQL 10, written by Hans-Jürgen Schönig. To learn about query optimization, stored procedures, and other techniques in PostgreSQL 10.0, you may check out the book Mastering PostgreSQL 10.


Client-Side Validation with the jQuery Validation Plugin

Jabran Rafique
23 Nov 2016
9 min read
Form validation is a critical procedure in data collection. A correct validation in place makes sure that forms collect correct and expected data. Validation must be done on the server side and optionally on the client side. Server-side validation is robust and secure because a user does not have access to modify its behaviour. However, client-side validation can be tampered with easily. Applications that rely completely on client-side validation and bypass it on the server side are more open to security threats and data exploits. Client-side validation is about giving users a better experience. Users don't have to go through everything on the page or submit a whole form to find out that they have one or even a few entries incorrect; instead, they are alerted instantly so they can correct mistakes in place. Modern browsers enforce client-side validation by default for form fields in an HTML5 document with certain attributes (that is, required). There are cross-browser limitations on how these validation messages can be styled, positioned, and labeled. JavaScript plays a vital role in client-side form validation. There are many different ways to validate a form using JavaScript. JavaScript libraries such as jQuery also provide a number of ways to validate a form. If a project is using any such library already, it would be easier to utilize the library's form validation methods. jQuery Validation is a jQuery plugin that makes it convenient to validate an HTML form. We will take a quick look at it below. Installation There are a number of ways to install this library: Download directly from GitHub Use CDN hosted files Using bower $ bower install jquery-validation --save For simplicity of this tutorial's demo we will use CDN hosted files. Usage Now that we have installed jQuery Validation we start by adding it to our HTML document. Here is the HTML for our demo app: <!DOCTYPE html> <html> <head> <title>Learn jQuery Validation</title> </head> <body> <div class="container"> <h1>Learn jQuery Validation</h1> <form method="post"> <div> <label for="first-name">First Name:</label> <input type="text" id="first-name" name="first_name"> </div> <div> <label for="last-name">Last Name:</label> <input type="text" id="last-name" name="last_name"> </div> <div> <label for="email-address">Email:</label> <input type="text" id="email-address" name="email_address"> </div> <div> <button type="submit" id="submit-cta">Submit</button> </div> </form> </div> <script src="https://code.jquery.com/jquery-2.2.4.min.js" integrity="sha256-BbhdlvQf/xTY9gja0Dq3HiwQF8LaCRTXxZKRutelT44=" crossorigin="anonymous"></script> <script src="//cdnjs.cloudflare.com/ajax/libs/jquery-validate/1.15.0/jquery.validate.min.js"></script> </body> </html> All jQuery Validation plugin methods are instantly available after the page is loaded. In the following example we add the required attribute to our form fields, which will be used by the jQuery Validation plugin for basic validation: ... <form method="post"> <div> <label for="first-name">First Name:</label> <input type="text" id="first-name" name="first_name" required> </div> <div> <label for="last-name">Last Name:</label> <input type="text" id="last-name" name="last_name" required> </div> <div> <label for="email-address">Email:</label> <input type="text" id="email-address" name="email_address" required> </div> <div> <button type="submit" id="submit-cta">Submit</button> </div> </form> ... This simply enables HTML5 browser validation in most modern browsers.
Adding the following JavaScript line will activate the jQuery Validation plugin. $('form').validate(); validate() is a special method exposed by the jQuery Validation plugin and it can help customise our validation further. We will discuss this method in more detail later in this article. Here it is in action: See the Pen jQuery Validation – Part I by Jabran Rafique (@jabranr) on CodePen. Submitting the form will validate it and, for empty fields, return an error message. The generic error messages (This field is required) you see are set as default messages by the plugin. These can easily be changed by adding a data-msg-required attribute to the form field element, as shown in the following example: ... <form method="post"> <div> <label for="first-name">First Name:</label> <input type="text" id="first-name" name="first_name" required data-msg-required="First name is required."> </div> <div> <label for="last-name">Last Name:</label> <input type="text" id="last-name" name="last_name" required data-msg-required="Last name is required."> </div> <div> <label for="email-address">Email:</label> <input type="text" id="email-address" name="email_address" required data-msg-required="Email is required."> </div> <div> <button type="submit" id="submit-cta">Submit</button> </div> </form> ... Here it is in action: See the Pen jQuery Validation – Part II by Jabran Rafique (@jabranr) on CodePen. Similarly we can add different types of validation to our form fields using attributes such as minlength, maxlength, and so on, as shown in the following examples: ... <div> <label for="phone">Phone:</label> <!-- With default error messages --> <input type="text" id="phone" name="phone" minlength="11" maxlength="15"> <!-- With custom error messages --> <input type="text" id="phone" name="phone" minlength="11" maxlength="15" data-msg-minlength="Enter minimum of 11 digits." data-msg-maxlength="Enter maximum of 15 digits."> </div> ... The alternative way of setting rules and messages is to pass these settings as arguments to the validate() method. Here are the above examples using the validate() method. Here is the HTML for the form with no custom validation messages and optionally no validation attributes: ... <form method="post"> <div> <label for="first-name">First Name:</label> <input type="text" id="first-name" name="first_name"> </div> <div> <label for="last-name">Last Name:</label> <input type="text" id="last-name" name="last_name"> </div> <div> <label for="email-address">Email:</label> <input type="text" id="email-address" name="email_address"> </div> <div> <button type="submit" id="submit-cta">Submit</button> </div> </form> ... Here is the JavaScript that enables validation and sets custom validation messages for this form: $('form').validate({ rules: { 'first_name': 'required', 'last_name': 'required', 'email_address': { required: true, email: true } }, messages: { 'first_name': 'First name is required.', 'last_name': 'Last name is required.', 'email_address': { required: 'Email is required.', email: 'A valid email is required.' } } }); As you may have noticed, there is a new validation constraint in the above code: email: true. This enables checks for a valid email address. There are many more built-in constraints in the jQuery Validation plugin. You can find those in the official documentation. validate() has other properties that can be updated to customise the behaviour of the jQuery Validation plugin completely.
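As a rough sketch of what that customisation can look like (the class name and placement below are arbitrary choices for illustration, not plugin defaults):

$('form').validate({
  errorClass: 'field-error',
  errorPlacement: function (error, element) {
    // place the message after the field's wrapping <div> rather than right after the input
    error.appendTo(element.closest('div'));
  },
  submitHandler: function (form) {
    // runs only once every rule has passed
    form.submit();
  }
});

Calling the native form.submit() inside submitHandler avoids re-triggering the plugin's own submit handling.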
Some of those properties are the following: errorClass: Set custom CSS class for error message element validClass: Set custom CSS class for a validated element errorPlacement(): Define where to put error messages highlight(): Define method to highlight error/validated states unhighlight(): Define method to unhighlight error/validated states The default plugin methods can also be overridden for custom implementations using the Validator object. Additional Methods Sooner or later in every other project, there arises a requirement for some advanced validation. Let's think of date of birth fields. We want to validate all three fields as one and it is not possible to do so using the default methods of this plugin. There are optional additional methods to include if such a requirement appears. The additional methods can either be included as a script using the CDN link, or each of them can be included individually when using the Bower installation. Contribute Contributions to any open source project make it robust and more useful with bugfixes and new features. Just like any other open source project, the jQuery Validation plugin also welcomes contributions. To contribute to the jQuery Validation plugin project, just head over to the GitHub project and fork the repository to work on an existing issue or start a new one. Don't forget to read the contribution guidelines before starting. All of the demos in this tutorial can be found at CodePen as a Collection. Hopefully this quick getting-started guide will make it easier to use the jQuery Validation plugin in your next project for better form validation. Author Jabran Rafique is a London-based web engineer. He currently works as a front-end web developer at Rated People. He has a master’s in computer science from Staffordshire University and more than 6 years of professional experience in web systems. He has also served as a regional lead and advocate at Google Map Maker since 2008, where he contributed to building digital maps in order to make them available to millions of people worldwide, as well as organizing and speaking at international events. He writes on his website/blog about different things, and shares code at GitHub and thoughts on Twitter.


Adding a Geolocation Trigger to the Salesforce Account Object

Packt
16 May 2014
8 min read
(For more resources related to this topic, see here.) Obtaining the Google API key First, you need to obtain an API key for the Google Geocoding API: Visit https://code.google.com/apis/console and sign in with your Google account (assuming you already have one). Click on the Create Project button. Enter My Salesforce Account Project for the Project name. Accept the default value for the Project ID. Click on Create. Click on APIs & auth from the left-hand navigation bar. Set the Geocoding API to ON. Select Credentials and click on CREATE NEW KEY. Click on the Browser Key button. Click on Create to generate the key. Make a note of the API key. Adding a Salesforce remote site Now, we need to add a Salesforce remote site for the Google Maps API: Navigate to Setup | Security Controls | Remote Site Settings. Click on the New Remote Site button. Enter Google_Maps_API for the Remote Site Name. Enter https://maps.googleapis.com for the Remote Site URL. Ensure that the Active checkbox is checked. Click on Save. Your remote site detail should resemble the following screenshot: Adding the Location custom field to Account Next, we need to add a Location field to the Account object: Navigate to Setup | Customize | Accounts | Fields. Click on the New button in the Custom Fields & Relationships section. Select Geolocation for the Data Type. Click on Next. Enter Location for the Field Label. The Field Name should also default to Location. Select Decimal for the Latitude and Longitude Display Notation. Enter 7 for the Decimal Places. Click on Next. Click on Next to accept the defaults for Field-Level Security. Click on Save to add the field to all account related page layouts. Adding the Apex Utility Class Next, we need an Apex utility class to geocode an address using the Google Geocoding API: Navigate to Setup | Develop | Apex Classes. All of the Apex classes for your organization will be displayed. Click on Developer Console. Navigate to File | New | Apex Class. Enter AccountGeocodeAddress for the Class Name and click on OK. Enter the following code into the Apex Code Editor in your Developer Console window: // static variable to determine if geocoding has already occurred private static Boolean geocodingCalled = false; // wrapper method to prevent calling future methods from an existing future context public static void DoAddressGeocode(id accountId) {   if (geocodingCalled || System.isFuture()) {     System.debug(LoggingLevel.WARN, '***Address Geocoding Future Method Already Called - Aborting...');     return;   }   // if not being called from future context, geocode the address   geocodingCalled = true;   geocodeAddress(accountId); } The AccountGeocodeAddress method and public static variable geocodingCalled protect us from a potential error where a future method may be called from within a future method that is already executing. If this isn't the case, we call the geocodeAddress method that is defined next. 
Enter the following code into the Apex Code Editor in your Developer Console window: // we need a future method to call Google Geocoding API from Salesforce @future (callout=true) static private void geocodeAddress(id accountId) {   // Key for Google Maps Geocoding API   String geocodingKey = '[Your API Key here]';   // get the passed in address   Account geoAccount = [SELECT BillingStreet, BillingCity, BillingState, BillingCountry, BillingPostalCode     FROM Account     WHERE id = :accountId];       // check that we have enough information to geocode the address   if ((geoAccount.BillingStreet == null) || (geoAccount.BillingCity == null)) {     System.debug(LoggingLevel.WARN, 'Insufficient Data to Geocode Address');     return;   }   // create a string for the address to pass to Google Geocoding API   String geoAddress = '';   if (geoAccount.BillingStreet != null)     geoAddress += geoAccount.BillingStreet + ', ';   if (geoAccount.BillingCity != null)     geoAddress += geoAccount.BillingCity + ', ';   if (geoAccount.BillingState != null)     geoAddress += geoAccount.BillingState + ', ';   if (geoAccount.BillingCountry != null)     geoAddress += geoAccount.BillingCountry + ', ';   if (geoAccount.BillingPostalCode != null)     geoAddress += geoAccount.BillingPostalCode;     // encode the string so we can pass it as part of URL   geoAddress = EncodingUtil.urlEncode(geoAddress, 'UTF-8');   // build and make the callout to the Geocoding API   Http http = new Http();   HttpRequest request = new HttpRequest();   request.setEndpoint('https://maps.googleapis.com/maps/api/geocode/json?address='     + geoAddress + '&key=' + geocodingKey     + '&sensor=false');   request.setMethod('GET');   request.setTimeout(60000);   try {     // make the http callout     HttpResponse response = http.send(request);     // parse JSON to extract co-ordinates     JSONParser responseParser = JSON.createParser(response.getBody());     // initialize co-ordinates     double latitude = null;     double longitude = null;     while (responseParser.nextToken() != null) {       if ((responseParser.getCurrentToken() == JSONToken.FIELD_NAME) &&       (responseParser.getText() == 'location')) {         responseParser.nextToken();         while (responseParser.nextToken() != JSONToken.END_OBJECT) {           String locationText = responseParser.getText();           responseParser.nextToken();           if (locationText == 'lat')             latitude = responseParser.getDoubleValue();           else if (locationText == 'lng')             longitude = responseParser.getDoubleValue();         }       }     }     // update co-ordinates on address if we get them back     if (latitude != null) {       geoAccount.Location__Latitude__s = latitude;       geoAccount.Location__Longitude__s = longitude;       update geoAccount;     }   } catch (Exception e) {     System.debug(LoggingLevel.ERROR, 'Error Geocoding Address - ' + e.getMessage());   } } Insert your Google API key in the following line of code: String geocodingKey = '[Your API Key here]'; Navigate to File | Save. Adding the Apex Trigger Finally, we need to implement an Apex trigger class to geocode the Billing Address when an Account is added or updated Navigate to Setup | Develop | Apex Triggers. All of the Apex triggers for your organization will be displayed. Click on Developer Console. Navigate to File | New | Apex Trigger in the Developer Console. Enter geocodeAccountAddress in the Name field. Select Account in the Objects dropdown list and click on Submit. 
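As a quick, optional sanity check (not part of the original steps), you can exercise the class from the Developer Console's Execute Anonymous window against an existing account that already has a billing street and city; once the future job completes, the Location field on that account should be populated:

// grab any account with enough address data to geocode (illustrative query)
Account acct = [SELECT Id FROM Account WHERE BillingStreet != null AND BillingCity != null LIMIT 1];
AccountGeocodeAddress.DoAddressGeocode(acct.Id);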
Enter the following code into the Apex Code Editor in your Developer Console window: trigger geocodeAccountAddress on Account (after insert, after update) {   // bulkify trigger in case of multiple accounts   for (Account account : trigger.new) {     // check if Billing Address has been updated     Boolean addressChangedFlag = false;     if (Trigger.isUpdate) {       Account oldAccount = Trigger.oldMap.get(account.Id);       if ((account.BillingStreet != oldAccount.BillingStreet) ||         (account.BillingCity != oldAccount.BillingCity) ||         (account.BillingCountry != oldAccount.BillingCountry) ||         (account.BillingPostalCode != oldAccount.BillingPostalCode)) {           addressChangedFlag = true;           System.debug(LoggingLevel.DEBUG, '***Address changed for - ' + oldAccount.Name);       }     }     // if address is null or has been changed, geocode it     if ((account.Location__Latitude__s == null) || (addressChangedFlag == true)) {       System.debug(LoggingLevel.DEBUG, '***Geocoding Account - ' + account.Name);       AccountGeocodeAddress.DoAddressGeocode(account.id);     }   } } Navigate to File | Save. The after insert / after update account trigger itself is relatively simple. If the Location field is blank, or the Billing Address has been updated, a call is made to the AccountGeocodeAddress.DoAddressGeocode method to geocode the address against the Google Maps Geocoding API. Summary Congratulations, you have now completed the Geolocation trigger for your Salesforce Account Object. With this, we can calculate distances between two objects in Salesforce or search for accounts/contacts within a certain radius. Resources for Article: Further resources on this subject: Learning to Fly with Force.com [Article] Salesforce CRM Functions [Article] Force.com: Data Management [Article]


Why don't you have a monorepo?

Viktor Charypar
01 Feb 2019
27 min read
You’ve probably heard that Facebook, Twitter, Google, Microsoft, and other tech industry behemoths keep their entire codebase, all services, applications and tools in a single huge repository - a monorepo. If you’re used to the standard way most teams manage their codebase - one application, service or tool per repository - this sounds very strange. Many people conclude it must only solve problems the likes of Google and Facebook have. This is a guest post by Viktor Charypar, Technical Director at Red Badger. But monorepos are not only useful if you can build a custom version control system to cope. They actually have many advantages even at a smaller scale that standard tools like Git handle just fine. Using a monorepo can result in fewer barriers in the software development lifecycle. It can allow faster feedback loops, less time spent looking for code, and less time reporting bugs and waiting for them to be fixed. It also makes it much easier to analyze a huge treasure trove of interesting data about how your software is actually built and where problem areas are. We’ve used a monorepo at one of our clients for almost three years and it’s been great. I really don’t see why you wouldn’t. But roughly every two months I tend to have a conversation with someone who’s not used to working in this way and the entire idea just seems totally crazy to them. And the conversation tends to always follow the same path, starting with the sheer size and quickly moving on to dependency management, testing and versioning strategies. It gets complicated. It’s time I finally wrote down a coherent reasoning behind why I believe monorepos should be the default way we manage a codebase. Especially if you’re building something even vaguely microservices based, you have multiple teams and want to share common code. What do you mean “just one repo”? Just so we’re all thinking about the same thing, when I say monorepo, I’m talking about a strategy of storing all the code you as an organization are responsible for - whether that is a project, a programme of work, or the entirety of your company's product and infrastructure code - in a single repository, under one revision history. Individual components (libraries, services, custom tools, infrastructure automation, ...) are stored alongside each other in folders. It’s analogous to the UNIX file tree which has a single root, as opposed to multiple, device based roots in Windows operating systems. People not familiar with the concept typically have a fairly strong reaction to the idea. One giant repo? Why would anyone do that? That cannot possibly scale! Many different objections come out, most of them often only tangentially related to storing all the code together. Occasionally, people get almost religious about it (I am talking about engineers after all). Despite being used by some of the largest tech companies, it is a relatively foreign concept and on the surface goes against everything you’ve been taught about not making huge monolithic things. It also seems like we’re fixing things that are not broken: everyone in the world is doing multiple repos, building and sharing artifacts (npm modules, JARs, ruby gems…), using SemVer to version and manage dependencies and long running branches to patch bugs in older versions of code, right? Surely if it’s industry standard it must be the right thing to do. Well, I don’t believe so.
I personally think almost every single one of those things is harder, more laborious, more brittle, harder to test and generally just more wasteful than the equivalent approach you get as a consequence of a monorepo. And a few of the capabilities a monorepo enables can’t be replicated in a multi-repo situation even if you build a lot of infrastructure around it, basically because you introduce distributed computing problems and get on the bad side of CAP theorem (we’ll look at this closer below). Apart from letting you make dependency management easier and testing more reliable than it can get with multiple repos, monorepo will also give you a few simple, important, but easy to underestimate advantages. The biggest advantages of using a monorepo It's easier to find and discover your code in a monorepo With a monorepo, there is no question about where all the code is and when you have access to some of it, you can see all of it. It may come as a surprise, but making code visible to the rest of the organization isn’t always the default behavior. Human insecurities get in the way and people create private repositories and squirrel code away to experiment with things “until they are ready”. Typically, when the thing does get “ready”, it now has a Continuous Integration (CI) service attached, many hyperlinks lead to it from emails, chat rooms and internal wikis, several people have cloned the repo and it’s now quite a hassle to move the code to a more visible, obvious place and so it stays where it started. As a consequence, it is often quite hard work to find all the code for the project and gain access to it, which is hard and expensive for new joiners and hurts collaboration in general. You could say this is a matter of discipline and I will agree with you, but why leave to individual discipline what you can simply prevent by telling everyone that all the code belongs in the one repo, it’s completely ok to put even little experiments and pet projects there. You never know what they will grow into and putting them in the repo has basically no cost attached. Visibility aids understanding of how to use internal APIs (internal in the sense of being designed and built by your organisation). The ability to search the entire codebase from within your editor and find usages of the call you’re considering to use is very powerful. Code editors and languages can also be set up for cross-references to work, which means you can follow references into shared libraries and find usages of shared code across the codebase. And I mean the entire codebase. This also enables all kind of analyses to be done on the codebase and its history. Knowing the totality of the codebase and having a history of all the code lets you see dependencies, find parts of the codebase only committed to by a very limited group of people, hotspots changing suspiciously frequently or by a large number of people… Your codebase is the source of truth about what your engineering organization is producing, it contains an incredible amount of interesting information we typically just ignore. Monorepos give you more flexibility when moving code Conway’s Law famously states that “organizations which design systems (...) are constrained to produce designs which are copies of the communication structures of these organisations”. This is due to the level of communication necessary to produce a coherent piece of software. 
The further away in the organisation an owner of a piece of software is, the harder it is to directly influence it, so you design strict interfaces to insulate yourself from the effect of “their” changes. This typically affects the repository structure as well. There are two problems with this: the structure is generally chosen upfront, before we know what the right shape of the software is, and changing the structure has a cost attached. With each service and library being in a separate repository, the module boundaries are quite a lot stronger than if they are all in one repository. Extracting common pieces of code into a shared library becomes more difficult and involves a setup of a whole new repository - complete with CI integration, pull request templates and labels, access control setup… hard work. In a monorepo, these boundaries are much more fluid and flexible: moving code between services and libraries, extracting new ones or inlining libraries back into their consumers all become as easy as general refactoring. There is no reason to use a completely different set of tools to change the small-scale and the large-scale structure of your codebase. The only real downside is tooling support for access control and declaring ownership. However, as monorepos get more popular, this support is getting better. GitHub now supports codeowners, for example. We will get there. A monorepo gives you a single history timeline While visibility and flexibility are quite convenient, the one feature of a monorepo which is very hard (if not impossible) to replicate is the single history timeline. We’ll go into why it’s so hard further below, but for now let’s look at the advantages it brings. A single history timeline gives us a reliable total order of changes to the codebase over time. This means that for any two contributions to the codebase, we can definitively and reliably decide which came first and which came second. It should never be ambiguous. It also means that each commit in a monorepo is a snapshot of the system as it was at that given moment. This enables a really interesting capability: it means cross-cutting changes can be made atomically, safely, in one go. Atomic cross-cutting commits Atomic cross-cutting commits make two specific scenarios much easier to achieve. First, externally forced global migrations are much easier and quicker. Let’s say multiple services use a common database and need the password and we need to rotate it. The password itself is (hopefully!) stored in a secure credential store, but at least the reference to it will be in several different places within the codebase. If the reference changes (let’s say the reference is generated every time), we can update every mention of it at once, in one commit, with a search & replace. This will get everything working again. Second, and more important, we can change APIs and update both producer and all consumers at the same time, atomically. For example, we can add an endpoint to an API service and migrate consumers to use the new endpoint. In the next commit, we can remove the old API endpoint as it’s no longer needed. If you're trying to do this across multiple repositories with their own histories, the change will have to be split into several parallel commits. This leaves the potential for the two changes to overlap and happen in the wrong order. Some consumers can get migrated, then the endpoint gets removed, then the rest of the consumers get migrated.
The mid-stage is an inconsistent state and an attempt to use the not-yet-migrated consumers will fail attempting to call an API endpoint that no longer exists. Monorepos remove inconsistencies in your dependencies Inconsistencies between dependent modules are the entire reason why dependency management and versioning exist. In a monorepo, the above scenario simply can’t happen. And that’s how the conversation about storing code in one place ends up being about versioning and dependency management. Monorepos essentially make the problem go away (which is my favourite kind of problem solving). Okay, this isn’t entirely true. There are still consequences to making breaking changes to APIs. For one, you need to update all the consumers, which is work, but you also need to build all of them, test everything works and deploy it all. This is quite hard for (micro)services that get individually deployed: making coordinated deployment of multiple services atomic is possible but not trivial. You can use a blue-green strategy, for example, to make sure there is no moment in time where only some services changed but not others. It gets harder for shared libraries. Building and publishing artifacts of new versions and updating all consumers to use the new version are still at least two commits, otherwise you’d be referring to versions that won’t exist until the builds finish. Now, things are getting inconsistent again and the view of what is supposed to work together is getting blurred in time again. And what if someone sneaks some changes in between the two commits. We are, once again, in a race. Unless... Building from the latest source Yes. What if instead of building and publishing shared code as prebuilt artifacts (binaries, jars, gems, npm modules), we build each deployable service completely from source. Every time a service changes, it is entirely rebuilt, including all dependencies. This is a fair bit of work for some compiled languages. However, it can be optimised with incremental build tools which skip work that’s already been done and cached. Some, like Go, solve it by simply having designed for a fast compiler. For dynamic languages, it’s just a matter of setting up include paths correctly and bundling all the relevant code. The added benefit here is you don’t need to do anything special if you’re working on a set of interdependent projects locally. No more `npm link`. The more interesting consequence is how this affects changing a shared dependency. When building from source, you have to make sure every time that happens, all the consumers get rebuilt using it. This is great, everyone gets the latest and greatest things immediately. ...right? Don’t worry, I can hear the alarm bells ringing in your head all the way from here. Your service depends on tens, if not hundreds of libraries. Any time anyone makes a mistake and breaks any of them, it breaks your code? Hell no. But hear me out. This is a problem of understanding dependencies and testing consumers. The important consequence of building from source is you now have a single handle on what is supposed to work together. There are no separate versions, you know what to test and it’s just one thing at any one time. Push dependency management In manual dependency update management - I will call it “pull” dependency management - you as a consumer are responsible for updating your dependencies as you see fit and making sure everything still works. If you find a bug, you simply don’t upgrade. 
Instead you report the bug to the maintainer and expect them to fix it. This can be months after the bug was introduced and the bug may have already been fixed in a newer version you haven’t yet upgraded to because things have moved on quite a bit while you were busy hitting a deadline and it would now be a sizable investment to upgrade. Now you’re a little stuck and all the ways out are a significant amount of work for someone, all that because the feedback loop is too long. Normally, as a library maintainer, you’re never quite certain how to make sure you’re not breaking anything. Even if you could run your consumers’ test suites, which consumers at what versions do you test against? And as a DevOps team doing 24/7 support for a system, how do you know which version or versions of a library is used across your services. What do you need to update to roll out that important bug fix to your customers? In push dependency management, quite a few things are the other way round. As a consumer, you’re not responsible for updating, it is done for you - effectively, you depended on the “latest” version of everything. Every time a maintainer of the library makes a change you are responsible for testing for regressions. No, not manually! You do have unit tests, right? Right?? Please have a solid regression test suite you trust, it’s 2019. So with your unit test suite, all you need to do is run it. Actually no, let’s let the maintainer run it. If they introduce a problem, they get immediate feedback from you, before their change ever hits the master branch. And this is the contract agreement in push dependency management: If you make a change and break anyone, you are responsible for fixing them. They are responsible for supplying a good enough, by their own standard, automated mechanism for you to verify things still work. The definition of “works” is the tests pass. Seriously though, you need to have a decent regression test suite! Continuous integration for push dependencies: the final piece of the monorepo puzzle The main missing piece of tooling around monorepos is support for push dependencies in CI systems. It’s quite straightforward to implement the strategy yourself, but it’s still hard enough to be worth some shared tooling. Unfortunately, the existing build tools geared towards monorepos like Bazel and Buck take over the entire build process from more familiar tools (like Maven or Babel) and you need to switch to them. Although to be fair, in exchange, you get very performant incremental builds. A lighter tooling, which lets you express dependencies between components in a monorepo, in a language agnostic way, and is only responsible for deciding which build jobs needs to be triggered given a set of changed components in a commit seems to be missing. So I built one. It’s far from perfect, but it should do the trick. Hopefully, someone with more time on their hands will eventually come up with something similarly cheap to introduce into your build system and the community will adopt it more widely. The main takeaway is that if we build from source in a monorepo we can set up a central Continuous Integration system responsible for triggering builds for all projects potentially affected by a change, intended to make sure you didn’t break anything with the work you did, whether it belongs to you or someone else. This is next to impossible in a multi-repo codebase because of the blurriness of history mentioned above. 
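To make the idea a little more concrete (this is only an illustration of the approach, not the author's tool, and the component names are made up): given a map of which components depend on which, working out the set of builds a commit should trigger is essentially a reachability computation over that graph.

// A minimal sketch in Node: which components are affected by a change?
const dependsOn = {
  'user-service': ['shared-auth', 'shared-logging'],
  'billing-service': ['shared-auth'],
  'admin-ui': ['shared-logging'],
};

function affectedBy(changed) {
  const affected = new Set(changed);
  let grew = true;
  while (grew) {
    grew = false;
    for (const [component, deps] of Object.entries(dependsOn)) {
      if (!affected.has(component) && deps.some((d) => affected.has(d))) {
        affected.add(component);
        grew = true; // keep going until no new components are pulled in
      }
    }
  }
  return [...affected];
}

// A change to shared-auth triggers builds of both services that consume it.
console.log(affectedBy(['shared-auth'])); // ['shared-auth', 'user-service', 'billing-service']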
It’s interesting to me that we have this problem today in the larger ecosystem. Everywhere. And we stumble forward and somewhat successfully live with upstream changes occasionally breaking us, because we don’t really have a better choice. We don’t have the ability to test all the consumers in the world and fix things when we break them. But if we can, at least for our own codebase, why wouldn’t we do that? Along with a “you broke it you fix it” policy. Building from source in a monorepo allows that. It also makes it much harder to make breaking changes without noticing. That said... About breaking changes There are two kinds of changes that break the consumer - the ones you introduce by accident, intending to keep backwards compatibility, and then the intentional ones. The first kind should not be too laborious to fix: once you find out what’s wrong, fix it in one place, make sure you didn’t break anything else, done. The second kind is harder. If you absolutely have to make an intentional breaking change, you will need to update all the consumers. Yes. That’s the deal. And that’s also fair. I’m not sure why we’re okay with breaking changes being intentionally introduced upstream on a whim. In any other area of human endeavour, a breach of contract makes people angry and they will expect you to make good by them. Yet we accept breaking changes in software as a fact of life. “It’s fine, I bumped the major version!” Semantic versioning: a bad idea It’s not fine. In fact, semantic versioning is just a bad idea. I know that’s a bold claim and this is a whole separate article (which I promise to write soon), but I’ll try to do it at least some justice here. Semantic versioning is meant to convey some meaning with the version number, but the chosen meanings it expresses are completely arbitrary. First of all, semver only talks about API contract, not behaviour. Adding side effects or changing performance characteristics of an API call for worse while keeping the data interface the same is a completely legal patch level change. And I bet you consider that a breaking change, because it will break your app. Second, does anyone really care about minor vs. patch? The promise is the API doesn’t break. So really we only care about major or other. Major is a dealbreaker, otherwise we’re ok. From a consumer perspective a major version bump spells trouble and potentially a lot of work. Making breaking changes is a mean thing to do to your consumers and you can and should avoid them. Just keep the old API around and working and add the new one next to it. As for version numbers, the most important meaning to convey seems to be “how old?” because code tends to rot, and so versioning by date might be a good choice. But, you say, I’ll have more and more code to maintain! Well yes, of course. And that’s the other problem with semver, the expectation that even old versions still get patches and fixes. It’s not very explicitly stated but it’s there. And because we typically maintain old versions on long-running branches in version control, it’s not even very visible in the codebase. What if you kept older APIs around, but deprecated them and the answer to bugs in them would be to migrate to the newer version of a particular call, which doesn’t have the bug? Would you care about having that old code? It just sits there in the codebase, until nobody uses it. It would also be much less work for the consumer, it’s just one particular call.
Also, the bug is typically deeper inside your code, so it’s actually more likely you can fix it in one go for all the API surfaces, old or new. Doing the same thing in the branching model is N times the work (for N maintenance branches). There are technologies that follow this model out of necessity. One example is GraphQL, which was built to solve (among other things) the problem of many old API consumers in people’s hands and the need to support all of them for at least some time. In GraphQL, you deprecate data fields in your API and they become invisible in documentation and introspection calls, but they still work as they used to. Possibly forever. Or at least until barely anyone uses them. The other option you have if you want to keep an older version of a library around and maintain it in a monorepo is to make a copy of the folder and work on the two separately. It’s the same thing as cutting a long running branch, you’re just making a copy in “file space” not “branch space”. And it’s more visible and representative of reality - both versions exist as first-class components being maintained. There are many different versioning and maintenance strategies you could adopt, but in my opinion the preference should be to invest effort into the latest version, making breaking changes only when absolutely inevitable (and at that point, isn’t the new version just a new thing? Like Luxon, the next version of Moment.js) and making updates trivial for your consumers. And if it’s trivial you can do it for them. Ultimately it was your decision to break the API, so you should also do the work, it’s only fair and it makes you evaluate the cost-benefit trade-off of the change. In a monorepo with building from source, this versioning strategy happens naturally. You can, however, adopt others. You just lose some of the guarantees and make feedback loops longer. Versioning strategy is really an orthogonal concept to storing code in a single repository, but the relative costs do change if you use one. Versioning by using a single version that cuts across the system becomes a lot cheaper, which means breaking changes becomes more expensive. This tends to lead to more discussions about versioning. This is actually true for most of the things we covered above. You can, but don’t have to adopt these strategies with a monorepo. Pay as you go monorepos It’s totally possible to just store everything in a single repo and not do anything else. You’ll get the visibility of what exists and flexibility of boundaries and ownership. You can still publish individual build artifacts and pull-manage dependencies (but you’d be missing out). Add building from source and you get the single snapshot benefit - you now know what code runs in a particular version of your system and to an extent, you can think about it as a monolith, despite being formed of many different independent modules and services. Add dependency aware continuous integration and the feedback loop around issues introduced while working on the codebase gets much much shorter, allowing you to go faster and waste less time on carefully managing versions, reporting bugs, making big forklift upgrades, etc. Things tend to get out of control much less. It’s simpler. Best of all, you can mix and match strategies. If you have a hugely popular library in your monorepo and each change in it triggers a build of hundreds of consumers, it only takes a couple of those builds being flakey and it will make it very hard to get builds for the changes in the library to pass. 
This is really a CI problem to fix (and there are so many interesting strategies out there), but sometimes you can't do that easily. You could also say the feedback loop is now too tight for the scale and start versioning the library's intentional releases. This still doesn't mean you have to publish versioned artifacts. You can have a stable copy of the library in the repo, which consumers depend on, and a development copy next to it which the maintainers work on. Releasing then means moving changes from the development folder to the release one and getting its builds to pass. Or, if you wish, you can publish artifacts and let consumers pull them on their own time and report bugs to you. And you still don't need to promise fixes for older versions without upgrading to the latest (Libraries should really have a published "code of maintenance" outlining the promises and setting maintenance expectations). And if you have to, I would again recommend making a copy, not branching. In fact, in a monorepo, branching might just not be a very good idea. Temporary branches are still useful to work on proposed changes, but long-running branches just hide the full truth about the system. And so does relying on old commits. The copies of code being used exist either way, they are still relevant and you still need to consider them for testing and security patching, they are just hidden in less apparent dimensions of the codebase "space" - branch dimension or time dimension. These are hard to think about and visualise, so maybe it's not a good idea to use them to keep relevant and current versions of the code and stick to them as change proposal and "time travel" mechanisms. Hopefully you can see that there's an entire spectrum of strategies you can follow but don't have to adopt wholesale. I'm sold, but... can't we do all this with a multi-repo? Most of the things discussed above are not really strictly dependent on monorepos, they are more a natural consequence of adopting one. You can follow versioning strategies other than semver outside of a monorepo. You can probably implement an automated version bumping system which will upgrade all the dependents of a library and test them, logging issues if they don't pass. What you can't do outside of a monorepo, as far as I can tell, is the atomic snapshotting of history to have a clear view of the system. And have the same view of the system a year ago. And be able to reproduce it. As soon as multiple parallel version histories are established, this ability goes away and you introduce distributed state. It's impossible to update all the "heads" in this multi-history at the same time, consistently. In version control, like git, the histories are ordered by the "follows" relationship. A later version follows - points to - its predecessor. To get a consistent, canonical view of time, there needs to be a single entrypoint. Without this central entry point, it's impossible to define a consistent order across the entire set, it depends on where we start looking. Essentially, you already chose Partition from the three CAP properties. Now you can pick either Consistency or Availability. Typically, availability is important and so you lose consistency. You could choose consistency instead, but that would mean you can't have availability - in order to get a consistent snapshot of the state of all of the repos, write access would need to be stopped while the snapshot is taken. In a monorepo, you don't have partitioning, and can therefore have consistency and availability.
From a physics perspective, multiple repositories with their own history effectively create a kind of spacetime, where each repository is a place and references across repos represent information propagating across space. The speed of that propagation isn’t infinite - it’s not instant. If changes happen in two places close enough in time, from the perspective of those two places, they happen in a globally inconsistent order, first the local change, then the remote change. Neither of the views is better and more true and it’s impossible to decide which of the changes came first. Unless, that is, we introduce an agreed upon central point which references all the repositories that exist and every time one of them updates, the reference in this master gets updated and a revision is created. Congratulations, we created a monorepo, well done us. The benefits of going all-in when it comes to monorepos As I said at the beginning, adopting the monorepo approach fully will result in fewer barriers in the software development lifecycle. You get faster feedback loops - the ability to test consumers of libraries before checking in a change and immediate feedback. You will spend less time looking for code and working out how it gets assembled together. You won’t need to set up repositories or ask for permissions to contribute. You can spend more time solving problems to help your customers instead of problems you created for yourself. It takes some time to get the tooling setup right, but you only do it once, all the later projects get the setup for free. Some of the tooling is a little lacking, but in our experience there are no show stoppers. A stable, reliable CI is an absolute must, but that’s regardless of monorepos. Monorepos should also help make builds repeatable. The repo does eventually get big, but it takes years and years and hundreds of people to get to a size where it actually becomes a real problem. The Linux kernel is a monorepo and it’s probably still at least an order of magnitude bigger than your project (it is bigger than ours anyway, despite having hundreds of engineers involved at this point). Basically, you’re not Google or Microsoft. And when you are, you’ll be able to afford optimising your version control system. The UX of your code review and source hosting tooling is probably the first thing that will break, not the underlying infrastructure. For smoother scaling, the one recommendation I have is to set a file size limit - accidentally committed large files are quite hard to remove, at least in git. After using a monorepo for over two years we’re still yet to have any big technical issues with it (plenty of political ones, but that’s a story for another day) and we see the same benefits as Google reported in their recent paper. I honestly don’t really know why you would start with any other strategy. Viktor Charypar is Technical Director at Red Badger, a digital consultancy based in London. You can follow him on Twitter @charypar or read more of his writing on the Red Badger blog here.
How to send email notifications using SendGrid

Packt Editorial Staff
10 Aug 2018
6 min read
SendGrid is a popular service for sending emails for different purposes. In this tutorial we will:

Create a SendGrid account
Generate a SendGrid API key
Configure the SendGrid API key in an Azure Function app
Send an email notification to the website administrator

Here, we will learn how to create a SendGrid output binding and send an email notification with static content to the administrator. In general there will be only a few administrators, so we will hard-code the administrator's email address in the To address field of the SendGrid output binding.

Getting ready

Create a SendGrid account from the Azure Management Portal.
Generate an API key from the SendGrid portal.

Create a SendGrid account

Navigate to the Azure Management Portal and create a SendGrid Email Delivery account by searching for it in the Marketplace. In the SendGrid Email Delivery blade, click on the Create button to open Create a new SendGrid Account. Select the Free tier in the Pricing tier, provide all other details, and click on the Create button. Once the account is created successfully, navigate to the SendGrid account; you can use the search box available at the top of the portal. Navigate to Settings, choose Configurations, and grab the username and SmtpServer from the Configurations blade.

Generate a SendGrid API key

In order for the Azure Functions runtime to use the SendGrid account, we need to provide a SendGrid API key as input to the Azure Function. You can generate an API key from the SendGrid portal. Navigate to the SendGrid portal by clicking on the Manage button in the Essentials blade of the SendGrid account. In the SendGrid portal, click on API Keys under the Settings section of the left-hand menu. On the API Keys page, click on Create API Key. In the Create API Key popup, provide a name, choose the API Key Permissions, and click on the Create & View button. After a moment you will be able to see the API key. Click on the key to copy it to the clipboard.

Configure the SendGrid API key with the Azure Function app

Create a new app setting holding the API key in the Azure Function app by navigating to the Application settings blade under the Platform features section of the function app, and click on the Save button after adding the app setting.

How to do it...

Navigate to the Integrate tab of the RegisterUser function and click on the New Output button to add a new output binding. Choose the SendGrid output binding and click on the Select button to add it. Provide the following parameters in the SendGrid output binding:

Message parameter name: leave the default value, message. We will use this parameter in the Run method in a moment.
SendGrid API key: provide the app setting key that you created in the application settings.
To address: provide the email address of the administrator.
From address: provide the email address from which you would like to send the email. In general, it would be something like [email protected].
Message subject: provide the subject that you would like to have in the email.
Message text: provide the email body text that you would like to have in the email.

Once you have provided and reviewed all the fields, click on Save to save the changes.
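The article configures the binding through the portal UI (the original screenshots are not reproduced here). Behind the scenes the portal stores these choices in the function's function.json file; the following is only a rough, hypothetical sketch of what that binding entry might look like. All values (the app setting name and the addresses, subject, and text) are placeholders, and exact property names can vary between Azure Functions runtime versions, so treat this as an illustration rather than the article's exact output.

{
  "bindings": [
    {
      "type": "sendGrid",
      "direction": "out",
      "name": "message",
      "apiKey": "SendGridAPIKeyAppSetting",
      "to": "admin@example.com",
      "from": "donotreply@example.com",
      "subject": "New user registration",
      "text": "A new user registration was processed successfully."
    }
  ]
}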
Navigate to the Run method and make the following changes:

Add a new reference for SendGrid and also its namespace.
Add a new out parameter, message, of type Mail.
Create an object of type Mail.

Following is the complete code of the Run method:

#r "Microsoft.WindowsAzure.Storage"
#r "SendGrid"

using System.Net;
using SendGrid.Helpers.Mail;
using Microsoft.WindowsAzure.Storage.Table;
using Newtonsoft.Json;

public static void Run(HttpRequestMessage req,
                       TraceWriter log,
                       CloudTable objUserProfileTable,
                       out string objUserProfileQueueItem,
                       out Mail message)
{
    var inputs = req.Content.ReadAsStringAsync().Result;
    dynamic inputJson = JsonConvert.DeserializeObject<dynamic>(inputs);

    string firstname = inputJson.firstname;
    string lastname = inputJson.lastname;
    string profilePicUrl = inputJson.ProfilePicUrl;

    // Queue the profile picture URL for further processing.
    objUserProfileQueueItem = profilePicUrl;

    // Store the user profile in the Azure Table storage table.
    UserProfile objUserProfile = new UserProfile(firstname, lastname, profilePicUrl);
    TableOperation objTblOperationInsert = TableOperation.Insert(objUserProfile);
    objUserProfileTable.Execute(objTblOperationInsert);

    // The actual recipient, sender, subject and body come from the
    // SendGrid output binding configured earlier.
    message = new Mail();
}

public class UserProfile : TableEntity
{
    public UserProfile(string firstname, string lastname, string profilePicUrl)
    {
        this.PartitionKey = "p1";
        this.RowKey = Guid.NewGuid().ToString();
        this.FirstName = firstname;
        this.LastName = lastname;
        this.ProfilePicUrl = profilePicUrl;
    }

    public UserProfile() { }

    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string ProfilePicUrl { get; set; }
}

Now, let's test the functionality of sending the email by navigating to the RegisterUser function and submitting a request with some test values:

{
  "firstname": "Bill",
  "lastname": "Gates",
  "ProfilePicUrl": "https://upload.wikimedia.org/wikipedia/commons/thumb/1/19/Bill_Gates_June_2015.jpg/220px-Bill_Gates_June_2015.jpg"
}

How it works...

The aim here is to send a notification via email to an administrator, saying that a new registration was created successfully. We used one of the Azure Functions experimental templates, SendGrid, as an SMTP server for sending the emails, hard-coding the following properties in the SendGrid output binding: the From email address, the To email address, the subject of the email, and the body of the email. The SendGrid output binding will use the API key provided in the app settings to invoke the required APIs of the SendGrid library for sending the emails.

To summarize, we learnt how to send an email notification using the SendGrid service.

[box type="shadow" align="" class="" width=""]This article is an excerpt from the book, Azure Serverless Computing Cookbook, written by Praveen Kumar Sriram. It contains over 50 recipes to help you build applications hosted on Serverless architecture using Azure Functions.[/box]

5 reasons why your business should adopt cloud computing
Alibaba Cloud partners with SAP to provide a versatile, one-stop cloud computing environment
Top 10 IT certifications for cloud and networking professionals in 2018
Using R to implement Kriging - A Spatial Interpolation technique for Geostatistics data

Guest Contributor
15 Nov 2017
7 min read
The Kriging interpolation technique is being increasingly used in geostatistics these days. But how does Kriging work to create a prediction, after all? To start with, Kriging is a method where the distance and direction between the sample data points indicate a spatial correlation. This correlation is then used to explain the different variations in the surface. In cases where the distance and direction give appropriate spatial correlation, Kriging will be able to predict surface variations in the most effective way. As such, we often see Kriging being used in geology and soil sciences.

Kriging generates an optimal output surface for prediction, which it estimates based on a scattered set of z-values. The procedure involves investigating the z-values' spatial behavior in an 'interactive' manner where advanced statistical relationships are measured (autocorrelation). Mathematically speaking, Kriging is somewhat similar to regression analysis, and its whole idea is to predict the unknown value of a function at a given point by calculating the weighted average of all known functional values in the neighborhood. To get the output value for a location, we take the weighted sum of already measured values in the surroundings (all the points that we intend to consider around a specific radius), using a formula such as the following:

Ẑ(s0) = Σ λi Z(si),  for i = 1, ..., N

where Z(si) are the measured values, λi are the weights, and s0 is the prediction location. In a regression equation, λi would represent weights based only on how far the points are from the prediction location. In Kriging, however, λi represent not just how far the measured points are from the prediction location, but also how the measured points are arranged spatially around it. First, variograms and covariance functions are generated to capture the spatial autocorrelation of the data. Then, that information is used to make predictions.

Thus, unlike deterministic interpolation techniques such as Inverse Distance Weighted (IDW) and Spline interpolation tools, Kriging goes beyond just estimating a prediction surface; it brings an element of certainty to that prediction surface. That is why experts rate Kriging so highly for strong predictions. Instead of a weather report forecasting 2 mm of rain on a certain Saturday, Kriging also tells you the "probability" of 2 mm of rain on that Saturday. We hope you enjoy this simple R tutorial on Kriging by Berry Boessenkool.
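For readers who want the mathematics behind those weights: the tutorial below uses ordinary kriging (type.krige="OK" in geoR), where the weights minimise the prediction variance subject to an unbiasedness constraint. The following is the standard textbook formulation, added here for context rather than taken from the tutorial itself:

\hat{Z}(s_0) = \sum_{i=1}^{N} \lambda_i Z(s_i), \qquad \sum_{i=1}^{N} \lambda_i = 1,

\sum_{j=1}^{N} \lambda_j\, \gamma(s_i, s_j) + \mu = \gamma(s_i, s_0), \quad i = 1, \dots, N,

where \gamma is the semivariogram fitted from the data (the variog/variofit step in the code below) and \mu is a Lagrange multiplier enforcing the constraint.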
Geostatistics: Kriging - spatial interpolation between points, using semivariance We will be covering following sections in our tutorial with supported illustrations: Packages read shapefile Variogram Kriging Plotting Kriging: packages install.packages("rgeos") install.packages("sf") install.packages("geoR") library(sf) # for st_read (read shapefiles), # st_centroid, st_area, st_union library(geoR) # as.geodata, variog, variofit, # krige.control, krige.conv, legend.krige ## Warning: package ’sf’ was built under R version 3.4.1 Kriging: read shapefile / few points for demonstration x <- c(1,1,2,2,3,3,3,4,4,5,6,6,6) y <- c(4,7,3,6,2,4,6,2,6,5,1,5,7) z <- c(5,9,2,6,3,5,9,4,8,8,3,6,7) plot(x,y, pch="+", cex=z/4) Kriging: read shapefile II GEODATA <- as.geodata(cbind(x,y,z)) plot(GEODATA) Kriging: Variogram I EMP_VARIOGRAM <- variog(GEODATA) ## variog: computing omnidirectional variogram FIT_VARIOGRAM <- variofit(EMP_VARIOGRAM) ## variofit: covariance model used is matern ## variofit: weights used: npairs ## variofit: minimisation function used: optim ## Warning in variofit(EMP_VARIOGRAM): initial values not provided - running the default search ## variofit: searching for best initial value ... selected values: ## sigmasq phi tausq kappa ## initial.value "9.19" "3.65" "0" "0.5" ## status "est" "est" "est" "fix" ## loss value: 401.578968904954 Kriging: Variogram II plot(EMP_VARIOGRAM) lines(FIT_VARIOGRAM) Kriging: Kriging res <- 0.1 grid <- expand.grid(seq(min(x),max(x),res), seq(min(y),max(y),res)) krico <- krige.control(type.krige="OK", obj.model=FIT_VARIOGRAM) krobj <- krige.conv(GEODATA, locations=grid, krige=krico) ## krige.conv: model with constant mean ## krige.conv: Kriging performed using global neighbourhood # KRigingObjekt Kriging: Plotting I image(krobj, col=rainbow2(100)) legend.krige(col=rainbow2(100), x.leg=c(6.2,6.7), y.leg=c(2,6), vert=T, off=-0.5, values=krobj$predict) contour(krobj, add=T) colPoints(x,y,z, col=rainbow2(100), legend=F) points(x,y) Kriging: Plotting II library("berryFunctions") # scatterpoints by color colPoints(x,y,z, add=F, cex=2, legargs=list(y1=0.8,y2=1)) Kriging: Plotting III colPoints(grid[ ,1], grid[ ,2], krobj$predict, add=F, cex=2, col2=NA, legargs=list(y1=0.8,y2=1)) Time for a real dataset Precipitation from ca 250 gauges in Brandenburg  as Thiessen Polygons with steep gradients at edges: Exercise 41: Kriging Load and plot the shapefile in PrecBrandenburg.zip with sf::st_read. With colPoints in the package berryFunctions, add the precipitation  values at the centroids of the polygons. Calculate the variogram and fit a semivariance curve. Perform kriging on a grid with a useful resolution (keep in mind that computing time rises exponentially  with grid size). Plot the interpolated  values with image or an equivalent (Rclick 4.15) and add contour lines. What went wrong? (if you used the defaults, the result will be dissatisfying.) How can you fix it? 
Solution for exercise 41.1-2: Kriging Data # Shapefile: p <- sf::st_read("data/PrecBrandenburg/niederschlag.shp", quiet=TRUE) # Plot prep pcol <- colorRampPalette(c("red","yellow","blue"))(50) clss <- berryFunctions::classify(p$P1, breaks=50)$index # Plot par(mar = c(0,0,1.2,0)) plot(p, col=pcol[clss], max.plot=1) # P1: Precipitation # kriging coordinates cent <- sf::st_centroid(p) berryFunctions::colPoints(cent$x, cent$y, p$P1, add=T, cex=0.7, legargs=list(y1=0.8,y2=1), col=pcol) points(cent$x, cent$y, cex=0.7) Solution for exercise 41.3: Variogram library(geoR) # Semivariance: geoprec <- as.geodata(cbind(cent$x,cent$y,p$P1)) vario <- variog(geoprec, max.dist=130000) ## variog: computing omnidirectional variogram fit <-variofit(vario) ## Warning in variofit(vario): initial values not provided - running the default search ## variofit: searching for best initial value ... selected values: ## sigmasq phi tausq kappa ## initial.value "1326.72" "19999.93" "0" "0.5" ## status "est" "est" "est" "fix" ## loss value: 107266266.76371 plot(vario) ; lines(fit) # distance to closest other point: d <- sapply(1:nrow(cent), function(i) min(berryFunctions::distance( cent$x[i], cent$y[i], cent$x[-i], cent$y[-i]))) hist(d/1000, breaks=20, main="distance to closest gauge [km]") mean(d) # 8 km ## [1] 8165.633 Solution for exercise 41.4-5: Kriging # Kriging: res <- 1000 # 1 km, since stations are 8 km apart on average grid <- expand.grid(seq(min(cent$x),max(cent$x),res), seq(min(cent$y),max(cent$y),res)) krico <- krige.control(type.krige="OK", obj.model=fit) krobj <- krige.conv(geoprec, locations=grid, krige=krico) ## krige.conv: model with constant mean ## krige.conv: Kriging performed using global neighbourhood # Set values outside of Brandenburg to NA: grid_sf <- sf::st_as_sf(grid, coords=1:2, crs=sf::st_crs(p)) isinp <- sapply(sf::st_within(grid_sf, p), length) > 0 krobj2 <- krobj krobj2$predict[!isinp] <- NA Solution for exercise 41.5: Kriging Visualization geoR:::image.kriging(krobj2, col=pcol) colPoints(cent$x, cent$y, p$P1, col=pcol, zlab="Prec", cex=0.7, legargs=list(y1=0.1,y2=0.8, x1=0.78, x2=0.87, horiz=F)) plot(p, add=T, col=NA, border=8)#; points(cent$x,cent$y, cex=0.7) [author title="About the author"]Berry started working with R in 2010 during his studies of Geoecology at Potsdam University, Germany. He has since then given a number of R programming workshops and tutorials, including full-week workshops in Kyrgyzstan and Kazachstan. He has left the department for environmental science in summer 2017 to focus more on software development and teaching in the data science industry. Please follow the Github link for detailed explanations on Berry’s R courses. [/author]
Splunk: How to work with multiple indexes [Tutorial]

Pravin Dhandre
20 Jun 2018
12 min read
An index in Splunk is a storage pool for events, capped by size and time. By default, all events will go to the index specified by defaultDatabase, which is called main but lives in a directory called defaultdb. In this tutorial, we put focus to index structures, need of multiple indexes, how to size an index and how to manage multiple indexes in a Splunk environment. This article is an excerpt from a book written by James D. Miller titled Implementing Splunk 7 - Third Edition. Directory structure of an index Each index occupies a set of directories on the disk. By default, these directories live in $SPLUNK_DB, which, by default, is located in $SPLUNK_HOME/var/lib/splunk. Look at the following stanza for the main index: [main] homePath = $SPLUNK_DB/defaultdb/db coldPath = $SPLUNK_DB/defaultdb/colddb thawedPath = $SPLUNK_DB/defaultdb/thaweddb maxHotIdleSecs = 86400 maxHotBuckets = 10 maxDataSize = auto_high_volume If our Splunk installation lives at /opt/splunk, the index main is rooted at the path /opt/splunk/var/lib/splunk/defaultdb. To change your storage location, either modify the value of SPLUNK_DB in $SPLUNK_HOME/etc/splunk-launch.conf or set absolute paths in indexes.conf. splunk-launch.conf cannot be controlled from an app, which means it is easy to forget when adding indexers. For this reason, and for legibility, I would recommend using absolute paths in indexes.conf. The homePath directories contain index-level metadata, hot buckets, and warm buckets. coldPath contains cold buckets, which are simply warm buckets that have aged out. See the upcoming sections The lifecycle of a bucket and Sizing an index for details. When to create more indexes There are several reasons for creating additional indexes. If your needs do not meet one of these requirements, there is no need to create more indexes. In fact, multiple indexes may actually hurt performance if a single query needs to open multiple indexes. Testing data If you do not have a test environment, you can use test indexes for staging new data. This then allows you to easily recover from mistakes by dropping the test index. Since Splunk will run on a desktop, it is probably best to test new configurations locally, if possible. Differing longevity It may be the case that you need more history for some source types than others. The classic example here is security logs, as compared to web access logs. You may need to keep security logs for a year or more, but need the web access logs for only a couple of weeks. If these two source types are left in the same index, security events will be stored in the same buckets as web access logs and will age out together. To split these events up, you need to perform the following steps: Create a new index called security, for instance Define different settings for the security index Update inputs.conf to use the new index for security source types For one year, you might make an indexes.conf setting such as this: [security] homePath = $SPLUNK_DB/security/db coldPath = $SPLUNK_DB/security/colddb thawedPath = $SPLUNK_DB/security/thaweddb #one year in seconds frozenTimePeriodInSecs = 31536000 For extra protection, you should also set maxTotalDataSizeMB, and possibly coldToFrozenDir. If you have multiple indexes that should age together, or if you will split homePath and coldPath across devices, you should use volumes. See the upcoming section, Using volumes to manage multiple indexes, for more information. 
Then, in inputs.conf, you simply need to add an index to the appropriate stanza as follows: [monitor:///path/to/security/logs/logins.log] sourcetype=logins index=security Differing permissions If some data should only be seen by a specific set of users, the most effective way to limit access is to place this data in a different index, and then limit access to that index by using a role. The steps to accomplish this are essentially as follows: Define the new index. Configure inputs.conf or transforms.conf to send these events to the new index. Ensure that the user role does not have access to the new index. Create a new role that has access to the new index. Add specific users to this new role. If you are using LDAP authentication, you will need to map the role to an LDAP group and add users to that LDAP group. To route very specific events to this new index, assuming you created an index called sensitive, you can create a transform as follows: [contains_password] REGEX = (?i)password[=:] DEST_KEY = _MetaData:Index FORMAT = sensitive You would then wire this transform to a particular sourcetype or source index in props.conf. Using more indexes to increase performance Placing different source types in different indexes can help increase performance if those source types are not queried together. The disks will spend less time seeking when accessing the source type in question. If you have access to multiple storage devices, placing indexes on different devices can help increase the performance even more by taking advantage of different hardware for different queries. Likewise, placing homePath and coldPath on different devices can help performance. However, if you regularly run queries that use multiple source types, splitting those source types across indexes may actually hurt performance. For example, let's imagine you have two source types called web_access and web_error. We have the following line in web_access: 2012-10-19 12:53:20 code=500 session=abcdefg url=/path/to/app And we have the following line in web_error: 2012-10-19 12:53:20 session=abcdefg class=LoginClass If we want to combine these results, we could run a query like the following: (sourcetype=web_access code=500) OR sourcetype=web_error | transaction maxspan=2s session | top url class If web_access and web_error are stored in different indexes, this query will need to access twice as many buckets and will essentially take twice as long. The life cycle of a bucket An index is made up of buckets, which go through a specific life cycle. Each bucket contains events from a particular period of time. The stages of this lifecycle are hot, warm, cold, frozen, and thawed. The only practical difference between hot and other buckets is that a hot bucket is being written to, and has not necessarily been optimized. These stages live in different places on the disk and are controlled by different settings in indexes.conf: homePath contains as many hot buckets as the integer value of maxHotBuckets, and as many warm buckets as the integer value of maxWarmDBCount. When a hot bucket rolls, it becomes a warm bucket. When there are too many warm buckets, the oldest warm bucket becomes a cold bucket. Do not set maxHotBuckets too low. If your data is not parsing perfectly, dates that parse incorrectly will produce buckets with very large time spans. As more buckets are created, these buckets will overlap, which means all buckets will have to be queried every time, and performance will suffer dramatically. A value of five or more is safe. 
coldPath contains cold buckets, which are warm buckets that have rolled out of homePath once there are more warm buckets than the value of maxWarmDBCount. If coldPath is on the same device, only a move is required; otherwise, a copy is required. Once the values of frozenTimePeriodInSecs, maxTotalDataSizeMB, or maxVolumeDataSizeMB are reached, the oldest bucket will be frozen. By default, frozen means deleted. You can change this behavior by specifying either of the following: coldToFrozenDir: This lets you specify a location to move the buckets once they have aged out. The index files will be deleted, and only the compressed raw data will be kept. This essentially cuts the disk usage by half. This location is unmanaged, so it is up to you to watch your disk usage. coldToFrozenScript: This lets you specify a script to perform some action when the bucket is frozen. The script is handed the path to the bucket that is about to be frozen. thawedPath can contain buckets that have been restored. These buckets are not managed by Splunk and are not included in all time searches. To search these buckets, their time range must be included explicitly in your search. I have never actually used this directory. Search https://splunk.com for restore archived to learn the procedures. Sizing an index To estimate how much disk space is needed for an index, use the following formula: (gigabytes per day) * .5 * (days of retention desired) Likewise, to determine how many days you can store an index, the formula is essentially: (device size in gigabytes) / ( (gigabytes per day) * .5 ) The .5 represents a conservative compression ratio. The log data itself is usually compressed to 10 percent of its original size. The index files necessary to speed up search brings the size of a bucket closer to 50 percent of the original size, though it is usually smaller than this. If you plan to split your buckets across devices, the math gets more complicated unless you use volumes. Without using volumes, the math is as follows: homePath = (maxWarmDBCount + maxHotBuckets) * maxDataSize coldPath = maxTotalDataSizeMB - homePath For example, say we are given these settings: [myindex] homePath = /splunkdata_home/myindex/db coldPath = /splunkdata_cold/myindex/colddb thawedPath = /splunkdata_cold/myindex/thaweddb maxWarmDBCount = 50 maxHotBuckets = 6 maxDataSize = auto_high_volume #10GB on 64-bit systems maxTotalDataSizeMB = 2000000 Filling in the preceding formula, we get these values: homePath = (50 warm + 6 hot) * 10240 MB = 573440 MB coldPath = 2000000 MB - homePath = 1426560 MB If we use volumes, this gets simpler and we can simply set the volume sizes to our available space and let Splunk do the math. Using volumes to manage multiple indexes Volumes combine pools of storage across different indexes so that they age out together. Let's make up a scenario where we have five indexes and three storage devices. The indexes are as follows: Name Data per day Retention required Storage needed web 50 GB no requirement ? security 1 GB 2 years 730 GB * 50 percent app 10 GB no requirement ? chat 2 GB 2 years 1,460 GB * 50 percent web_summary 1 GB 1 years 365 GB * 50 percent Now let's say we have three storage devices to work with, mentioned in the following table: Name Size small_fast 500 GB big_fast 1,000 GB big_slow 5,000 GB We can create volumes based on the retention time needed. Security and chat share the same retention requirements, so we can place them in the same volumes. 
We want our hot buckets on our fast devices, so let's start there with the following configuration: [volume:two_year_home] #security and chat home storage path = /small_fast/two_year_home maxVolumeDataSizeMB = 300000 [volume:one_year_home] #web_summary home storage path = /small_fast/one_year_home maxVolumeDataSizeMB = 150000 For the rest of the space needed by these indexes, we will create companion volume definitions on big_slow, as follows: [volume:two_year_cold] #security and chat cold storage path = /big_slow/two_year_cold maxVolumeDataSizeMB = 850000 #([security]+[chat])*1024 - 300000 [volume:one_year_cold] #web_summary cold storage path = /big_slow/one_year_cold maxVolumeDataSizeMB = 230000 #[web_summary]*1024 - 150000 Now for our remaining indexes, whose timeframe is not important, we will use big_fast and the remainder of big_slow, like so: [volume:large_home] #web and app home storage path = /big_fast/large_home maxVolumeDataSizeMB = 900000 #leaving 10% for pad [volume:large_cold] #web and app cold storage path = /big_slow/large_cold maxVolumeDataSizeMB = 3700000 #(big_slow - two_year_cold - one_year_cold)*.9 Given that the sum of large_home and large_cold is 4,600,000 MB, and a combined daily volume of web and app is 60,000 MB approximately, we should retain approximately 153 days of web and app logs with 50 percent compression. In reality, the number of days retained will probably be larger. With our volumes defined, we now have to reference them in our index definitions: [web] homePath = volume:large_home/web coldPath = volume:large_cold/web thawedPath = /big_slow/thawed/web [security] homePath = volume:two_year_home/security coldPath = volume:two_year_cold/security thawedPath = /big_slow/thawed/security coldToFrozenDir = /big_slow/frozen/security [app] homePath = volume:large_home/app coldPath = volume:large_cold/app thawedPath = /big_slow/thawed/app [chat] homePath = volume:two_year_home/chat coldPath = volume:two_year_cold/chat thawedPath = /big_slow/thawed/chat coldToFrozenDir = /big_slow/frozen/chat [web_summary] homePath = volume:one_year_home/web_summary coldPath = volume:one_year_cold/web_summary thawedPath = /big_slow/thawed/web_summary thawedPath cannot be defined using a volume and must be specified for Splunk to start. For extra protection, we specified coldToFrozenDir for the indexes' security and chat. The buckets for these indexes will be copied to this directory before deletion, but it is up to us to make sure that the disk does not fill up. If we allow the disk to fill up, Splunk will stop indexing until space is made available. This is just one approach to using volumes. You could overlap in any way that makes sense to you, as long as you understand that the oldest bucket in a volume will be frozen first, no matter what index put the bucket in that volume. With this, we learned to operate multiple indexes and how we can get effective business intelligence out of the data without hurting system performance. If you found this tutorial useful, do check out the book Implementing Splunk 7 - Third Edition and start creating advanced Splunk dashboards. Splunk leverages AI in its monitoring tools Splunk’s Input Methods and Data Feeds Splunk Industrial Asset Intelligence (Splunk IAI) targets Industrial IoT marketplace
Writing a Blog Application with Node.js and AngularJS

Packt
16 Feb 2016
35 min read
In this article, we are going to build a blog application by using Node.js and AngularJS. Our system will support adding, editing, and removing articles, so there will be a control panel. The MongoDB or MySQL database will handle the storing of the information and the Express framework will be used as the site base. It will deliver the JavaScript, CSS, and the HTML to the end user, and will provide an API to access the database. We will use AngularJS to build the user interface and control the client-side logic in the administration page. (For more resources related to this topic, see here.) This article will cover the following topics: AngularJS fundamentals Choosing and initializing a database Implementing the client-side part of an application with AngularJS Exploring AngularJS AngularJS is an open source, client-side JavaScript framework developed by Google. It's full of features and is really well documented. It has almost become a standard framework in the development of single-page applications. The official site of AngularJS, http://angularjs.org, provides a well-structured documentation. As the framework is widely used, there is a lot of material in the form of articles and video tutorials. As a JavaScript library, it collaborates pretty well with Node.js. In this article, we will build a simple blog with a control panel. Before we start developing our application, let's first take a look at the framework. AngularJS gives us very good control over the data on our page. We don't have to think about selecting elements from the DOM and filling them with values. Thankfully, due to the available data-binding, we may update the data in the JavaScript part and see the change in the HTML part. This is also true for the reverse. Once we change something in the HTML part, we get the new values in the JavaScript part. The framework has a powerful dependency injector. There are predefined classes in order to perform AJAX requests and manage routes. You could also read Mastering Web Development with AngularJS by Peter Bacon Darwin and Pawel Kozlowski, published by Packt Publishing. Bootstrapping AngularJS applications To bootstrap an AngularJS application, we need to add the ng-app attribute to some of our HTML tags. It is important that we pick the right one. Having ng-app somewhere means that all the child nodes will be processed by the framework. It's common practice to put that attribute on the <html> tag. In the following code, we have a simple HTML page containing ng-app: <html ng-app> <head> <script src="angular.min.js"></script> </head> <body> ... </body> </html>   Very often, we will apply a value to the attribute. This will be a module name. We will do this while developing the control panel of our blog application. Having the freedom to place ng-app wherever we want means that we can decide which part of our markup will be controlled by AngularJS. That's good, because if we have a giant HTML file, we really don't want to spend resources parsing the whole document. Of course, we may bootstrap our logic manually, and this is needed when we have more than one AngularJS application on the page. Using directives and controllers In AngularJS, we can implement the Model-View-Controller pattern. The controller acts as glue between the data (model) and the user interface (view). In the context of the framework, the controller is just a simple function. 
For example, the following HTML code illustrates that a controller is just a simple function: <html ng-app> <head> <script src="angular.min.js"></script> <script src="HeaderController.js"></script> </head> <body> <header ng-controller="HeaderController"> <h1>{{title}}</h1> </header> </body> </html>   In <head> of the page, we are adding the minified version of the library and HeaderController.js; a file that will host the code of our controller. We also set an ng-controller attribute in the HTML markup. The definition of the controller is as follows: function HeaderController($scope) { $scope.title = "Hello world"; } Every controller has its own area of influence. That area is called the scope. In our case, HeaderController defines the {{title}} variable. AngularJS has a wonderful dependency-injection system. Thankfully, due to this mechanism, the $scope argument is automatically initialized and passed to our function. The ng-controller attribute is called the directive, that is, an attribute, which has meaning to AngularJS. There are a lot of directives that we can use. That's maybe one of the strongest points of the framework. We can implement complex logic directly inside our templates, for example, data binding, filtering, or modularity. Data binding Data binding is a process of automatically updating the view once the model is changed. As we mentioned earlier, we can change a variable in the JavaScript part of the application and the HTML part will be automatically updated. We don't have to create a reference to a DOM element or attach event listeners. Everything is handled by the framework. Let's continue and elaborate on the previous example, as follows: <header ng-controller="HeaderController"> <h1>{{title}}</h1> <a href="#" ng-click="updateTitle()">change title</a> </header>   A link is added and it contains the ng-click directive. The updateTitle function is a function defined in the controller, as seen in the following code snippet: function HeaderController($scope) { $scope.title = "Hello world"; $scope.updateTitle = function() { $scope.title = "That's a new title."; } }   We don't care about the DOM element and where the {{title}} variable is. We just change a property of $scope and everything works. There are, of course, situations where we will have the <input> fields and we want to bind their values. If that's the case, then the ng-model directive can be used. We can see this as follows: <header ng-controller="HeaderController"> <h1>{{title}}</h1> <a href="#" ng-click="updateTitle()">change title</a> <input type="text" ng-model="title" /> </header>   The data in the input field is bound to the same title variable. This time, we don't have to edit the controller. AngularJS automatically changes the content of the h1 tag. Encapsulating logic with modules It's great that we have controllers. However, it's not a good practice to place everything into globally defined functions. That's why it is good to use the module system. The following code shows how a module is defined: angular.module('HeaderModule', []); The first parameter is the name of the module and the second one is an array with the module's dependencies. By dependencies, we mean other modules, services, or something custom that we can use inside the module. It should also be set as a value of the ng-app directive. 
The code so far could be translated to the following code snippet: angular.module('HeaderModule', []) .controller('HeaderController', function($scope) { $scope.title = "Hello world"; $scope.updateTitle = function() { $scope.title = "That's a new title."; } });   So, the first line defines a module. We can chain the different methods of the module and one of them is the controller method. Following this approach, that is, putting our code inside a module, we will be encapsulating logic. This is a sign of good architecture. And of course, with a module, we have access to different features such as filters, custom directives, and custom services. Preparing data with filters The filters are very handy when we want to prepare our data, prior to be displayed to the user. Let's say, for example, that we need to mention our title in uppercase once it reaches a length of more than 20 characters: angular.module('HeaderModule', []) .filter('customuppercase', function() { return function(input) { if(input.length > 20) { return input.toUpperCase(); } else { return input; } }; }) .controller('HeaderController', function($scope) { $scope.title = "Hello world"; $scope.updateTitle = function() { $scope.title = "That's a new title."; } });   That's the definition of the custom filter called customuppercase. It receives the input and performs a simple check. What it returns, is what the user sees at the end. Here is how this filter could be used in HTML: <h1>{{title | customuppercase}}</h1> Of course, we may add more than one filter per variable. There are some predefined filters to limit the length, such as the JavaScript to JSON conversion or, for example, date formatting. Dependency injection Dependency management can be very tough sometimes. We may split everything into different modules/components. They have nicely written APIs and they are very well documented. However, very soon, we may realize that we need to create a lot of objects. Dependency injection solves this problem by providing what we need, on the fly. We already saw this in action. The $scope parameter passed to our controller, is actually created by the injector of AngularJS. To get something as a dependency, we need to define it somewhere and let the framework know about it. We do this as follows: angular.module('HeaderModule', []) .factory("Data", function() { return { getTitle: function() { return "A better title."; } } }) .controller('HeaderController', function($scope, Data) { $scope.title = Data.getTitle(); $scope.updateTitle = function() { $scope.title = "That's a new title."; } });   The Module class has a method called factory. It registers a new service that could later be used as a dependency. The function returns an object with only one method, getTitle. Of course, the name of the service should match the name of the controller's parameter. Otherwise, AngularJS will not be able to find the dependency's source. The model in the context of AngularJS In the well-known Model-View-Controller pattern, the model is the part that stores the data in the application. AngularJS doesn't have a specific workflow to define models. The $scope variable could be considered a model. We keep the data in properties attached to the current scope. Later, we can use the ng-model directive and bind a property to the DOM element. We already saw how this works in the previous sections. The framework may not provide the usual form of a model, but it's made like that so that we can write our own implementation. 
The fact that AngularJS works with plain JavaScript objects, makes this task easily doable. Final words on AngularJS AngularJS is one of the leading frameworks, not only because it is made by Google, but also because it's really flexible. We could use just a small piece of it or build a solid architecture using the giant collection of features. Selecting and initializing the database To build a blog application, we need a database that will store the published articles. In most cases, the choice of the database depends on the current project. There are factors such as performance and scalability and we should keep them in mind. In order to have a better look at the possible solutions, we will have a look at the two of the most popular databases: MongoDB and MySQL. The first one is a NoSQL type of database. According to the Wikipedia entry (http://en.wikipedia.org/wiki/ NoSQL) on NoSQL databases: "A NoSQL or Not Only SQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases." In other words, it's simpler than a SQL database, and very often stores information in the key value type. Usually, such solutions are used when handling and storing large amounts of data. It is also a very popular approach when we need flexible schema or when we want to use JSON. It really depends on what kind of system we are building. In some cases, MySQL could be a better choice, while in some other cases, MongoDB. In our example blog, we're going to use both. In order to do this, we will need a layer that connects to the database server and accepts queries. To make things a bit more interesting, we will create a module that has only one API, but can switch between the two database models. Using NoSQL with MongoDB Let's start with MongoDB. Before we start storing information, we need a MongoDB server running. It can be downloaded from the official page of the database https://www.mongodb.org/downloads. We are not going to handle the communication with the database manually. There is a driver specifically developed for Node.js. It's called mongodb and we should include it in our package.json file. After successful installation via npm install, the driver will be available in our scripts. We can check this as follows: "dependencies": { "mongodb": "1.3.20" }   We will stick to the Model-View-Controller architecture and the database-related operations in a model called Articles. We can see this as follows: var crypto = require("crypto"), type = "mongodb", client = require('mongodb').MongoClient, mongodb_host = "127.0.0.1", mongodb_port = "27017", collection; module.exports = function() { if(type == "mongodb") { return { add: function(data, callback) { ... }, update: function(data, callback) { ... }, get: function(callback) { ... }, remove: function(id, callback) { ... } } } else { return { add: function(data, callback) { ... }, update: function(data, callback) { ... }, get: function(callback) { ... }, remove: function(id, callback) { ... } } } }   It starts with defining a few dependencies and settings for the MongoDB connection. Line number one requires the crypto module. We will use it to generate unique IDs for every article. The type variable defines which database is currently accessed. The third line initializes the MongoDB driver. We will use it to communicate with the database server. 
After that, we set the host and port for the connection and at the end a global collection variable, which will keep a reference to the collection with the articles. In MongoDB, the collections are similar to the tables in MySQL. The next logical step is to establish a database connection and perform the needed operations, as follows: connection = 'mongodb://'; connection += mongodb_host + ':' + mongodb_port; connection += '/blog-application'; client.connect(connection, function(err, database) { if(err) { throw new Error("Can't connect"); } else { console.log("Connection to MongoDB server successful."); collection = database.collection('articles'); } });   We pass the host and the port, and the driver is doing everything else. Of course, it is a good practice to handle the error (if any) and throw an exception. In our case, this is especially needed because without the information in the database, the frontend has nothing to show. The rest of the module contains methods to add, edit, retrieve, and delete records: return { add: function(data, callback) { var date = new Date(); data.id = crypto.randomBytes(20).toString('hex'); data.date = date.getFullYear() + "-" + date.getMonth() + "-" + date.getDate(); collection.insert(data, {}, callback || function() {}); }, update: function(data, callback) { collection.update( {ID: data.id}, data, {}, callback || function(){ } ); }, get: function(callback) { collection.find({}).toArray(callback); }, remove: function(id, callback) { collection.findAndModify( {ID: id}, [], {}, {remove: true}, callback ); } }   The add and update methods accept the data parameter. That's a simple JavaScript object. For example, see the following code: { title: "Blog post title", text: "Article's text here ..." }   The records are identified by an automatically generated unique id. The update method needs it in order to find out which record to edit. All the methods also have a callback. That's important, because the module is meant to be used as a black box, that is, we should be able to create an instance of it, operate with the data, and at the end continue with the rest of the application's logic. Using MySQL We're going to use an SQL type of database with MySQL. We will add a few more lines of code to the already working Articles.js model. The idea is to have a class that supports the two databases like two different options. At the end, we should be able to switch from one to the other, by simply changing the value of a variable. Similar to MongoDB, we need to first install the database to be able use it. The official download page is http://www.mysql.com/downloads. MySQL requires another Node.js module. It should be added again to the package. json file. We can see the module as follows: "dependencies": { "mongodb": "1.3.20", "mysql": "2.0.0" }   Similar to the MongoDB solution, we need to firstly connect to the server. To do so, we need to know the values of the host, username, and password fields. And because the data is organized in databases, a name of the database. In MySQL, we put our data into different databases. So, the following code defines the needed variables: var mysql = require('mysql'), mysql_host = "127.0.0.1", mysql_user = "root", mysql_password = "", mysql_database = "blog_application", connection;   The previous example leaves the password field empty but we should set the proper value of our system. The MySQL database requires us to define a table and its fields before we start saving data. 
So, the following code is a short dump of the table used in this article: CREATE TABLE IF NOT EXISTS `articles` ( `id` int(11) NOT NULL AUTO_INCREMENT, `title` longtext NOT NULL, `text` longtext NOT NULL, `date` varchar(100) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;   Once we have a database and its table set, we can continue with the database connection, as follows: connection = mysql.createConnection({ host: mysql_host, user: mysql_user, password: mysql_password }); connection.connect(function(err) { if(err) { throw new Error("Can't connect to MySQL."); } else { connection.query("USE " + mysql_database, function(err, rows, fields) { if(err) { throw new Error("Missing database."); } else { console.log("Successfully selected database."); } }) } });   The driver provides a method to connect to the server and execute queries. The first executed query selects the database. If everything is ok, you should see Successfully selected database as an output in your console. Half of the job is done. What we should do now is replicate the methods returned in the first MongoDB implementation. We need to do this because when we switch to the MySQL usage, the code using the class will not work. And by replicating them we mean that they should have the same names and should accept the same arguments. If we do everything correctly, at the end our application will support two types of databases. And all we have to do is change the value of the type variable: return { add: function(data, callback) { var date = new Date(); var query = ""; query += "INSERT INTO articles (title, text, date) VALUES ("; query += connection.escape(data.title) + ", "; query += connection.escape(data.text) + ", "; query += "'" + date.getFullYear() + "-" + date.getMonth() + "-" + date.getDate() + "'"; query += ")"; connection.query(query, callback); }, update: function(data, callback) { var query = "UPDATE articles SET "; query += "title=" + connection.escape(data.title) + ", "; query += "text=" + connection.escape(data.text) + " "; query += "WHERE id='" + data.id + "'"; connection.query(query, callback); }, get: function(callback) { var query = "SELECT * FROM articles ORDER BY id DESC"; connection.query(query, function(err, rows, fields) { if (err) { throw new Error("Error getting."); } else { callback(rows); } }); }, remove: function(id, callback) { var query = "DELETE FROM articles WHERE id='" + id + "'"; connection.query(query, callback); } }   The code is a little longer than the one generated in the first MongoDB variant. That's because we needed to construct MySQL queries from the passed data. Keep in mind that we have to escape the information, which comes to the module. That's why we use connection.escape(). With these lines of code, our model is completed. Now we can add, edit, remove, or get data. Let's continue with the part that shows the articles to our users. Developing the client side with AngularJS Let's assume that there is some data in the database and we are ready to present it to the users. So far, we have only developed the model, which is the class that takes care of the access to the information. To simplify the process, we will Express here. We need to first update the package.json file and include that in the framework, as follows: "dependencies": { "express": "3.4.6", "jade": "0.35.0", "mongodb": "1.3.20", "mysql": "2.0.0" }   We are also adding Jade, because we are going to use it as a template language. 
The writing of markup in plain HTML is not very efficient nowadays. By using the template engine, we can split the data and the HTML markup, which makes our application much better structured. Jade's syntax is kind of similar to HTML. We can write tags without the need to close them: body p(class="paragraph", data-id="12"). Sample text here footer a(href="#"). my site   The preceding code snippet is transformed to the following code snippet: <body> <p data-id="12" class="paragraph">Sample text here</p> <footer><a href="#">my site</a></footer> </body>   Jade relies on the indentation in the content to distinguish the tags. Let's start with the project structure, as seen in the following screenshot: We placed our already written class, Articles.js, inside the models directory. The public directory will contain CSS styles, and all the necessary client-side JavaScript: the AngularJS library, the AngularJS router module, and our custom code. We will skip some of the explanations about the following code. Our index.js file looks as follows: var express = require('express'); var app = express(); var articles = require("./models/Articles")(); app.set('views', __dirname + '/views'); app.set('view engine', 'jade'); app.use(express.static(__dirname + '/public')); app.use(function(req, res, next) { req.articles = articles; next(); }); app.get('/api/get', require("./controllers/api/get")); app.get('/', require("./controllers/index")); app.listen(3000); console.log('Listening on port 3000');   At the beginning, we require the Express framework and our model. Maybe it's better to initialize the model inside the controller, but in our case this is not necessary. Just after that, we set up some basic options for Express and define our own middleware. It has only one job to do and that is to attach the model to the request object. We are doing this because the request object is passed to all the route handlers. In our case, these handlers are actually the controllers. So, Articles.js becomes accessible everywhere via the req.articles property. At the end of the script, we placed two routes. The second one catches the usual requests that come from the users. The first one, /api/get, is a bit more interesting. We want to build our frontend on top of AngularJS. So, the data that is stored in the database should not enter the Node.js part but on the client side where we use Google's framework. To make this possible, we will create routes/controllers to get, add, edit, and delete records. Everything will be controlled by HTTP requests performed by AngularJS. In other words, we need an API. Before we start using AngularJS, let's take a look at the /controllers/api/get.js controller: module.exports = function(req, res, next) { req.articles.get(function(rows) { res.send(rows); }); }   The main job is done by our model and the response is handled by Express. It's nice because if we pass a JavaScript object, as we did, (rows is actually an array of objects) the framework sets the response headers automatically. To test the result, we could run the application with node index.js and open http://localhost:3000/api/ get. If we don't have any records in the database, we will get an empty array. If not, the stored articles will be returned. So, that's the URL, which we should hit from within the AngularJS controller in order to get the information. The code of the /controller/index.js controller is also just a few lines. 
We can see the code as follows: module.exports = function(req, res, next) { res.render("list", { app: "" }); }   It simply renders the list view, which is stored in the list.jade file. That file should be saved in the /views directory. But before we see its code, we will check another file, which acts as a base for all the pages. Jade has a nice feature called blocks. We may define different partials and combine them into one template. The following is our layout.jade file: doctype html html(ng-app="#{app}") head title Blog link(rel='stylesheet', href='/style.css') script(src='/angular.min.js') script(src='/angular-route.min.js') body block content   There is only one variable passed to this template, which is #{app}. We will need it later to initialize the administration's module. The angular.min.js and angular-route.min.js files should be downloaded from the official AngularJS site, and placed in the /public directory. The body of the page contains a block placeholder called content, which we will later fill with the list of the articles. The following is the list.jade file: extends layout block content .container(ng-controller="BlogCtrl") section.articles article(ng-repeat="article in articles") h2 {{article.title}} br small published on {{article.date}} p {{article.text}} script(src='/blog.js')   The two lines in the beginning combine both the templates into one page. The Express framework transforms the Jade template into HTML and serves it to the browser of the user. From there, the client-side JavaScript takes control. We are using the ng-controller directive saying that the div element will be controlled by an AngularJS controller called BlogCtrl. The same class should have variable, articles, filled with the information from the database. ng-repeat goes through the array and displays the content to the users. The blog.js class holds the code of the controller: function BlogCtrl($scope, $http) { $scope.articles = [ { title: "", text: "Loading ..."} ]; $http({method: 'GET', url: '/api/get'}) .success(function(data, status, headers, config) { $scope.articles = data; }) .error(function(data, status, headers, config) { console.error("Error getting articles."); }); }   The controller has two dependencies. The first one, $scope, points to the current view. Whatever we assign as a property there is available as a variable in our HTML markup. Initially, we add only one element, which doesn't have a title, but has text. It is shown to indicate that we are still loading the articles from the database. The second dependency, $http, provides an API in order to make HTTP requests. So, all we have to do is query /api/get, fetch the data, and pass it to the $scope dependency. The rest is done by AngularJS and its magical two-way data binding. To make the application a little more interesting, we will add a search field, as follows: // views/list.jade header .search input(type="text", placeholder="type a filter here", ng-model="filterText") h1 Blog hr   The ng-model directive, binds the value of the input field to a variable inside our $scope dependency. However, this time, we don't have to edit our controller and can simply apply the same variable as a filter to the ng-repeat: article(ng-repeat="article in articles | filter:filterText") As a result, the articles shown will be filtered based on the user's input. Two simple additions, but something really valuable is on the page. The filters of AngularJS can be very powerful. 
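As a small illustration of that power - a hypothetical addition following the module.filter pattern shown earlier in the article, not part of the blog's actual code - we could define an excerpt filter that shortens long article bodies in the list view:

// Hypothetical excerpt filter; assumes the page is bootstrapped with a
// named module instead of the bare ng-app used in the article's list view.
angular.module('BlogModule', [])
  .filter('excerpt', function() {
    // Return the first `limit` characters of the input,
    // appending "..." when the text had to be cut.
    return function(input, limit) {
      if (!input) { return ''; }
      limit = limit || 200;
      return input.length > limit ? input.substring(0, limit) + '...' : input;
    };
  });

It could then be chained in the template as {{article.text | excerpt:200}}, in the same way the filterText filter is applied to ng-repeat above.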
Implementing a control panel

The control panel is the place where we will manage the articles of the blog. Several things should be done in the backend before continuing with the user interface. They are as follows:

app.set("username", "admin");
app.set("password", "pass");
app.use(express.cookieParser('blog-application'));
app.use(express.session());

The previous lines of code should be added to /index.js. Our administration should be protected, so the first two lines define our credentials. We are using Express as data storage, simply creating key-value pairs. Later, if we need the username, we can get it with app.get("username"). The next two lines enable session support. We need that because of the login process. We added a middleware, which attaches the articles to the request object. We will do the same with the current user's status, as follows:

app.use(function(req, res, next) {
  if (( req.session && req.session.admin === true ) ||
      ( req.body &&
        req.body.username === app.get("username") &&
        req.body.password === app.get("password")
      )) {
    req.logged = true;
    req.session.admin = true;
  }
  next();
});

Our if statement is a little long, but it tells us whether the user is logged in or not. The first part checks whether there is a session created, and the second one checks whether the user submitted a form with the correct username and password. If either of these expressions is true, then we attach a variable, logged, to the request object and create a session that will be valid during the following requests. There is only one thing left that we need in the main application's file: a few routes that will handle the control panel operations. In the following code, we are defining them along with the needed route handler:

var protect = function(req, res, next) {
  if (req.logged) {
    next();
  } else {
    res.send(401, 'No Access.');
  }
}

app.post('/api/add', protect, require("./controllers/api/add"));
app.post('/api/edit', protect, require("./controllers/api/edit"));
app.post('/api/delete', protect, require("./controllers/api/delete"));
app.all('/admin', require("./controllers/admin"));

The three routes, which start with /api, will use the Articles.js model to add, edit, and remove articles from the database. These operations should be protected, so we add a middleware function that takes care of this. If the req.logged variable is not set, it simply responds with a 401 - Unauthorized status code. The last route, /admin, is a little different because it shows a login form instead. The following is the controller to create new articles:

module.exports = function(req, res, next) {
  req.articles.add(req.body, function() {
    res.send({success: true});
  });
}

We transfer most of the logic to the frontend, so again, there are just a few lines. What is interesting here is that we pass req.body directly to the model. It actually contains the data submitted by the user.
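In a production application, it would probably be wise to check that req.body actually contains the fields the model expects before handing it over. The following variation of the add controller is only a hedged sketch; the validation rules are assumptions, not part of the article's code:

// A sketched variation of controllers/api/add.js with basic input checking
// (the validation rules here are assumptions, not the article's code).
module.exports = function(req, res, next) {
  if (!req.body || !req.body.title || !req.body.text) {
    res.send({success: false, error: 'Both title and text are required.'});
    return;
  }
  req.articles.add(req.body, function() {
    res.send({success: true});
  });
}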
The following code is how the req.articles.add method looks for the MongoDB implementation:

add: function(data, callback) {
  data.ID = crypto.randomBytes(20).toString('hex');
  collection.insert(data, {}, callback || function() {});
}

And the MySQL implementation is as follows:

add: function(data, callback) {
  var date = new Date();
  var query = "";
  query += "INSERT INTO articles (title, text, date) VALUES (";
  query += connection.escape(data.title) + ", ";
  query += connection.escape(data.text) + ", ";
  query += "'" + date.getFullYear() + "-" + date.getMonth() + "-" + date.getDate() + "'";
  query += ")";
  connection.query(query, callback);
}

In both cases, we need title and text in the passed data object. Thankfully, due to Express' bodyParser middleware, this is exactly what we have in the req.body object, so we can forward it directly to the model. The other route handlers are almost the same:

// api/edit.js
module.exports = function(req, res, next) {
  req.articles.update(req.body, function() {
    res.send({success: true});
  });
}

What we changed is the method of the Articles.js class. It is not add but update. The same technique is applied in the route to delete an article. We can see it as follows:

// api/delete.js
module.exports = function(req, res, next) {
  req.articles.remove(req.body.id, function() {
    res.send({success: true});
  });
}

What we need for deletion is not the whole body of the request but only the unique ID of the record. Every API method sends {success: true} as a response. While we are dealing with API requests, we should always return a response, even if something goes wrong. The last thing in the Node.js part that we have to cover is the controller responsible for the user interface of the administration panel, that is, the ./controllers/admin.js file:

module.exports = function(req, res, next) {
  if(req.logged) {
    res.render("admin", { app: "admin" });
  } else {
    res.render("login", { app: "" });
  }
}

There are two templates that are rendered: /views/admin.jade and /views/login.jade. Based on the variable, which we set in /index.js, the script decides which one to show. If the user is not logged in, then a login form is sent to the browser, as follows:

extends layout
block content
  .container
    header
      h1 Administration
      hr
    section.articles
      article
        form(method="post", action="/admin")
          span Username:
          br
          input(type="text", name="username")
          br
          span Password:
          br
          input(type="password", name="password")
          br
          br
          input(type="submit", value="login")

There is no AngularJS code here. All we have is the good old HTML form, which submits its data via POST to the same URL, /admin. If the username and password are correct, the req.logged variable is set to true and the controller renders the other template:

extends layout
block content
  .container
    header
      h1 Administration
      hr
      a(href="/") Public
      span  |
      a(href="#/") List
      span  |
      a(href="#/add") Add
    section(ng-view)
  script(src='/admin.js')

The control panel needs several views to handle all the operations. AngularJS has a great router module, which works with hashtag-type URLs, that is, URLs such as /admin#/add. The same module requires a placeholder for the different partials. In our case, this is a section tag. The ng-view attribute tells the framework that this is the element prepared for that logic. At the end of the template, we add an external file, which keeps the whole client-side JavaScript code that is needed by the control panel.
While the client-side part of the application only needs to load the articles, the control panel requires a lot more functionality, so it is a good fit for the modular system of AngularJS. We need the routes and views to change, so the ngRoute module is needed as a dependency. This module is not included in the main angular.min.js build; it is placed in the angular-route.min.js file. The following code shows how our module starts:

var admin = angular.module('admin', ['ngRoute']);
admin.config(['$routeProvider',
  function($routeProvider) {
    $routeProvider
    .when('/', {})
    .when('/add', {})
    .when('/edit/:id', {})
    .when('/delete/:id', {})
    .otherwise({
      redirectTo: '/'
    });
  }
]);

We configured the router by mapping URLs to specific routes. At the moment, the routes are just empty objects, but we will fix that shortly. Every controller will need to make HTTP requests to the Node.js part of the application, so it will be nice to have such a service and use it all over our code. We can see an example as follows:

admin.factory('API', function($http) {
  var request = function(method, url) {
    return function(callback, data) {
      $http({
        method: method,
        url: url,
        data: data
      })
      .success(callback)
      .error(function(data, status, headers, config) {
        console.error("Error requesting '" + url + "'.");
      });
    }
  }
  return {
    get: request('GET', '/api/get'),
    add: request('POST', '/api/add'),
    edit: request('POST', '/api/edit'),
    remove: request('POST', '/api/delete')
  }
});

One of the best things about AngularJS is that it works with plain JavaScript objects. There are no unnecessary abstractions and no extending or inheriting special classes. We are using the .factory method to create a simple JavaScript object. It has four methods that can be called: get, add, edit, and remove. Each one of them calls a function, which is defined in the helper method request. The service has only one dependency, $http. We already know this module; it handles HTTP requests nicely. The URLs that we are going to query are the same ones that we defined in the Node.js part.

Now, let's create a controller that will show the articles currently stored in the database. First, we should replace the empty route object .when('/', {}) with the following object:

.when('/', {
  controller: 'ListCtrl',
  template: '\
    <article ng-repeat="article in articles">\
      <hr />\
      <strong>{{article.title}}</strong><br />\
      (<a href="#/edit/{{article.id}}">edit</a>)\
      (<a href="#/delete/{{article.id}}">remove</a>)\
    </article>\
  '
})

The object has to contain a controller and a template. The template is nothing more than a few lines of HTML markup. It looks a bit like the template used to show the articles on the client side. The difference is the links used to edit and delete. JavaScript doesn't allow new lines in string definitions; the backslashes at the end of the lines prevent the syntax errors that would otherwise be thrown by the browser. The following is the code for the controller. It is defined, again, in the module:

admin.controller('ListCtrl', function($scope, API) {
  API.get(function(articles) {
    $scope.articles = articles;
  });
});

And here is the beauty of AngularJS dependency injection. Our custom-defined service, API, is automatically initialized and passed to the controller. The .get method fetches the articles from the database. Later, we send the information to the current $scope dependency and the two-way data binding does the rest. The articles are shown on the page.
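The same service can be consumed from any other controller in exactly the same way, because each method returned by request takes a callback first and the payload second. The following is only an illustrative sketch; the controller name and the hard-coded draft article are invented for this example and are not part of the article's code:

// Illustrative sketch: consuming the API service from another controller.
// "QuickAddCtrl" and the draft article are made up for this example.
admin.controller('QuickAddCtrl', function($scope, API) {
  $scope.quickAdd = function() {
    API.add(function() {
      // after a successful add, refresh the list shown on the page
      API.get(function(articles) {
        $scope.articles = articles;
      });
    }, { title: 'Untitled', text: 'Draft' });
  };
});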
Working with AngularJS is so easy that we can combine the controllers for adding and editing in one place. Let's store the route object in an external variable, as follows:

var AddEditRoute = {
  controller: 'AddEditCtrl',
  template: '\
    <hr />\
    <article>\
      <form>\
        <span>Title</span><br />\
        <input type="text" ng-model="article.title"/><br />\
        <span>Text</span><br />\
        <textarea rows="7" ng-model="article.text"></textarea>\
        <br /><br />\
        <button ng-click="save()">save</button>\
      </form>\
    </article>\
  '
};

And later, assign it to both routes, as follows:

.when('/add', AddEditRoute)
.when('/edit/:id', AddEditRoute)

The template is just a form with the necessary fields and a button, which calls the save method in the controller. Notice that we bound the input field and the textarea to variables inside the $scope dependency. This comes in handy because we don't need to access the DOM to get the values. We can see this as follows:

admin.controller(
  'AddEditCtrl',
  function($scope, API, $location, $routeParams) {
    var editMode = $routeParams.id ? true : false;
    if (editMode) {
      API.get(function(articles) {
        articles.forEach(function(article) {
          if (article.id == $routeParams.id) {
            $scope.article = article;
          }
        });
      });
    }
    $scope.save = function() {
      API[editMode ? 'edit' : 'add'](function() {
        $location.path('/');
      }, $scope.article);
    }
  }
)

The controller receives four dependencies. We already know about $scope and API. The $location dependency is used when we want to change the current route or, in other words, to forward the user to another view. The $routeParams dependency is needed to fetch parameters from the URL. In our case, /edit/:id is a route with a variable inside; the id is available in $routeParams.id. The adding and editing of articles uses the same form, so with a simple check, we know what the user is currently doing. If the user is in edit mode, then we fetch the article based on the provided id and fill the form. Otherwise, the fields are empty and a new record will be created. The deletion of an article can be done by using a similar approach: adding a route object and defining a new controller. We can see the deletion as follows:

.when('/delete/:id', {
  controller: 'RemoveCtrl',
  template: ' '
})

We don't need a template in this case. Once the article is deleted from the database, we will forward the user to the list page. We have to call the remove method of the API. Here is what the RemoveCtrl controller looks like:

admin.controller(
  'RemoveCtrl',
  function($scope, $location, $routeParams, API) {
    API.remove(function() {
      $location.path('/');
    }, $routeParams);
  }
);

The preceding code uses the same dependencies as the previous controller. This time, we simply forward the $routeParams dependency to the API. And because it is a plain JavaScript object, everything works as expected.

Summary

In this article, we built a simple blog by writing the backend of the application in Node.js. The module for database communication, which we wrote, can work with a MongoDB or MySQL database and store articles. The client-side part and the control panel of the blog were developed with AngularJS. We then defined a custom service using the built-in HTTP and routing mechanisms. Node.js works well with AngularJS, mainly because both are written in JavaScript. We found out that AngularJS is built to support the developer. It removes all those boring tasks such as DOM element referencing, attaching event listeners, and so on. It's a great choice for the modern client-side coding stack.

article-image-web-scraping-python-part-2
Packt
23 Oct 2009
7 min read

Web scraping with Python (Part 2)

This article by Javier Collado expands the set of web scraping techniques shown in his previous article by looking closely into a more complex problem that cannot be solved with the tools that were explained there. For those who missed that article, here's the link: Web Scraping with Python. This article will show how to extract the desired information using the same three steps when the web page is not written directly in HTML, but is auto-generated using JavaScript to update the DOM tree.

As you may remember from that article, web scraping is the ability to extract information automatically from a set of web pages that were designed only to display information nicely to humans, which might not be suitable when a machine needs to retrieve that information. The three basic steps that were recommended for performing a scraping task were the following:

Explore the website to find out where the desired information is located in the HTML DOM tree
Download as many web pages as needed
Parse the downloaded web pages and extract the information from the places found in the exploration step

What should be taken into account when the content is not directly coded in the HTML DOM tree? The main difference, as you have probably already noted, is that the downloading methods that were suggested in the previous article (urllib2 or mechanize) just don't work. This is because they generate an HTTP request to get the web page and deliver the received HTML directly to the scraping script. However, the pieces of information that are auto-generated by the JavaScript code are not yet in that HTML, because the code is never executed, as it would be when the page is displayed in a web browser. Hence, instead of relying on a library that generates HTTP requests, we need a library that behaves as a real web browser, or even better, a library that interacts with a real web browser, so that we are sure we obtain the same data that we see when manually opening the page in a web browser. Please remember that the aim of web scraping is to parse the data that a human user sees, so interacting with a real web browser is a really nice feature.

Is there any tool out there that can do this? Fortunately, the answer is yes. In particular, there are a couple of tools used for web testing automation that can be used to solve the JavaScript execution problem: Selenium and Windmill. For the code samples in the sections below, Windmill is used. Either choice would be fine, as both are well-documented, stable tools ready to be used in production. Let's now follow the same three steps that were suggested in the previous article to scrape the contents of a web page that is partly generated by JavaScript code.

Explore

Imagine that you are a fan of the NASA Image of the Day gallery. You want to get a list of the names of all the images in the gallery, together with the link to the full-resolution picture, just in case you decide to download one later to use as a desktop wallpaper. The first thing to do is to locate the data that has to be extracted on the desired web page.
In the case of the Image of the Day gallery (see the screenshot below), there are three elements that are important to note:

The title of the image that is currently being displayed
The link to the full-resolution image file
The Next link that makes it possible to navigate through all the images

To find out the location of each piece of interesting information, as was already suggested in the previous article, it's best to use a tool such as Firebug, whose inspect functionality can be really useful. The following picture, for example, shows the location of the image title inside an h3 tag: The other two fields can be located as easily as the title, so no further explanation will be given here. Please refer to the previous article for further information.

Download

As explained in the introduction, to download the content of the web page, we will use Windmill, as it allows the JavaScript code to execute in the web browser before getting the page content. Because Windmill is mostly a testing library, instead of writing a script that calls the Windmill API, I will write a test case for Windmill to navigate through all the image web pages. The code for the test is as follows:

1 def test_scrape_iotd_gallery():
2     """
3     Scrape NASA Image of the Day Gallery
4     """
5     # Extra data massage for BeautifulSoup
6     my_massage = get_massage()
7
8     # Open main gallery page
9     client = WindmillTestClient(__name__)
10     client.open(url='http://www.nasa.gov/multimedia/imagegallery/iotd.html')
11
12     # Page isn't completely loaded until image gallery data
13     # has been updated by javascript code
14     client.waits.forElement(xpath=u"//div[@id='gallery_image_area']/img",
15                             timeout=30000)
16
17     # Scrape all images information
18     images_info = {}
19     while True:
20         image_info = get_image_info(client, my_massage)
21
22         # Break if image has been already scraped
23         # (that means that all images have been parsed
24         # since they are ordered in a circular ring)
25         if image_info['link'] in images_info:
26             break
27
28         images_info[image_info['link']] = image_info
29
30         # Click to get the information for the next image
31         client.click(xpath=u"//div[@class='btn_image_next']")
32
33     # Print results to stdout ordered by image name
34     for image_info in sorted(images_info.values(),
35                              key=lambda image_info: image_info['name']):
36         print ("Name: %(name)s\n"
37                "Link: %(link)s\n" % image_info)

As can be seen, the usage of Windmill is similar to that of other libraries such as mechanize. First of all, a client object has to be created to interact with the browser (line 9), and later, the main web page that is going to be used to navigate through all the information has to be opened (line 10). Nevertheless, Windmill also includes some facilities that take JavaScript code into account, as shown in line 14. In this line, the waits.forElement method is used to wait for a DOM element that is filled in by the JavaScript code, so that when that element (in this case, the big image in the gallery) is displayed, the rest of the script can proceed. It is important to note here that the web page processing doesn't start when the page is downloaded (this happens after line 10), but when there is some evidence that the JavaScript code has finished manipulating the DOM tree. Navigating through all the pages that contain the needed information is just a matter of clicking on the next arrow (line 31). As the images are ordered in a circular buffer, the script stops when the same image link has already been parsed (line 25).
To execute the script, instead of launching it as we would normally do for a Python script, we should call it through the windmill command to properly initialize the environment:

$ windmill firefox test=nasa_iotd.py

As can be seen in the following screenshot, Windmill takes care of opening a browser window (Firefox in this case) and a controller window in which it's possible to see the commands that the script is executing (several clicks on next in the example): The controller window is really interesting because not only does it display the progress of the test cases, but it also allows you to enter and record actions interactively, which is a nice feature when trying things out. In particular, the recording may be used in some situations to replace Firebug in the exploration step, because the captured actions can be stored in a script without spending much time on xpath expressions. For more information about how to use Windmill and the complete API, please refer to the Windmill documentation.