How-To Tutorials

Windows Drive Acquisition

Packt
21 Jul 2017
13 min read
In this article, by Oleg Skulkin and Scar de Courcier, authors of Windows Forensics Cookbook, we will cover drive acquisition in E01 format with FTK Imager, drive acquisition in RAW format with DC3DD, and mounting forensic images with Arsenal Image Mounter.

Before you can begin analysing evidence from a source, it first of all needs to be imaged. This describes a forensic process in which an exact copy of a drive is taken. It is an important step, especially if evidence needs to be taken to court, because forensic investigators must be able to demonstrate that they have not altered the evidence in any way.

The term forensic image can refer to either a physical or a logical image. A physical image is a precise replica of the drive it references, whereas a logical image is a copy of a certain volume within that drive. In general, logical images show what the machine's user will have seen and dealt with, whereas physical images give a more comprehensive picture of the device, including deleted data and unallocated space.

A hash value is generated to verify the authenticity of the acquired image. Hash values are essentially cryptographic digital fingerprints which show whether a particular item is an exact copy of another. Altering even the smallest bit of data will generate a completely new hash value, thus demonstrating that the two items are not the same. When a forensic investigator images a drive, they should generate a hash value for both the original drive and the acquired image. Some pieces of forensic software will do this for you.
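As a minimal illustration of the idea (not part of the FTK Imager workflow that follows; it assumes a Linux shell with root privileges, a source drive at /dev/sdb, and a RAW image file named drive.dd):

# Hash the source device and the acquired RAW image in one call;
# the two digests must match for the image to be considered verified.
sudo sha256sum /dev/sdb drive.dd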
There are a number of tools available for imaging hard drives, some of which are free and open source. However, the most popular way for forensic analysts to image hard drives is to use one of the more well-known forensic software vendors' solutions. This is because it is imperative to be able to explain how the image was acquired and to demonstrate its integrity, especially if you are working on a case that will be taken to court. Once you have your image, you will be able to analyse the digital evidence from a device without directly interfering with the device itself. In this article, we will be looking at various tools that can help you to image a Windows drive, and taking you through the process of acquisition.

Drive acquisition in E01 format with FTK Imager

FTK Imager is an imaging and data preview tool by AccessData, which allows an examiner not only to create forensic images in different formats, including RAW, SMART, E01 and AFF, but also to preview data sources in a forensically sound manner. In the first recipe of this article, we will show you how to create a forensic image of a hard drive from a Windows system in E01 format. E01, or EnCase's Evidence File, is a standard format for forensic images in law enforcement. Such images consist of a header with case info (including acquisition date and time, examiner's name, acquisition notes, and an optional password), a bit-by-bit copy of the acquired drive (made up of data blocks, each verified with its own CRC, or Cyclic Redundancy Check), and a footer with an MD5 hash of the bitstream.

Getting ready

First of all, let's download FTK Imager from the AccessData website. To do it, go to the SOLUTIONS tab and then to Product Downloads. Now choose DIGITAL FORENSICS, and then FTK Imager. At the time of this writing, the most up-to-date version is 3.4.3, so click the green DOWNLOAD PAGE button on the right. Now you should be at the download page. Click on the DOWNLOAD NOW button and fill in the form; after this you'll get the download link at the email address you provided. The installation process is quite straightforward; all you need to do is click Next a few times, so we won't cover it in this recipe.

How to do it...

There are two ways of initiating the drive imaging process:

- Use the Create Disk Image button on the toolbar (figure: Create Disk Image button on the Toolbar).
- Use the Create Disk Image... option from the File menu (figure: Create Disk Image... option in the File Menu).

You can choose either option. The first window you see is Select Source. Here you have five options:

- Physical Drive: This allows you to choose a physical drive as the source, with all partitions and unallocated space.
- Logical Drive: This allows you to choose a logical drive as the source, for example, the E: drive.
- Image File: This allows you to choose an image file as the source, for example, if you need to convert your forensic image from one format to another.
- Contents of a Folder: This allows you to choose a folder as the source; of course, no deleted files and so on will be included.
- Fernico Device: This allows you to restore images from multiple CDs/DVDs.

Of course, we want to image the whole drive to be able to work with deleted data and unallocated space, so:

1. Choose the Physical Drive option. The evidence source mustn't be altered in any way, so make sure you are using a hardware write blocker; you can use the one from Tableau, for example. These devices allow acquisition of drive contents without creating the possibility of modifying the data. (Figure: FTK Imager Select Source window)
2. Click Next and you'll see the next window, Select Drive. Choose the source drive from the drop-down menu; in our case it's \\.\PHYSICALDRIVE2. (Figure: FTK Imager Select Drive window)
3. Once the source drive is chosen, click Finish.
4. The next window is Create Image. We'll get back to this window soon, but for now, just click Add...
5. It's time to choose the destination image type. As we decided to create our image in EnCase's Evidence File format, choose E01. (Figure: FTK Imager Select Image Type window)
6. Click Next and you'll see the Evidence Item Information window. Here we have five fields to fill in: Case Number, Evidence Number, Unique Description, Examiner and Notes. All fields are optional. (Figure: FTK Imager Evidence Item Information window)
7. Whether you filled in the fields or not, click Next. Now choose the image destination; you can use the Browse button for this. You should also fill in the image filename. If you want your forensic image to be split, choose a fragment size (in megabytes). The E01 format supports compression, so if you want to reduce the image size, you can use this feature; as you can see in the figure, we have chosen 6. And if you want the data in the image to be secured, use the AD Encryption feature. AD Encryption is whole image encryption, so not only is the raw data encrypted, but also any metadata. Each segment or file of the image is encrypted with a randomly generated image key using AES-256. (Figure: FTK Imager Select Image Destination window)
8. We are almost done. Click Finish and you'll see the Create Image window again. Now look at the three options at the bottom of the window. The verification process is very important, so make sure the Verify images after they are created option is ticked; it helps you to be sure that the source and the image are equal. The Precalculate Progress Statistics option is also very useful: it shows you the estimated time remaining during the imaging process.
The last option will create directory listings of all files in the image for you, but of course it takes time, so use it only if you need it. (Figure: FTK Imager Create Image window)

All you need to do now is click Start, and the imaging process begins. When the image is created, the verification process starts. Finally, you'll get the Drive/Image Verify Results window (figure: FTK Imager Drive/Image Verify Results window). As you can see, in our case the source and the image are identical: both hashes matched. In the folder with the image, you will also find an info file with valuable information such as the drive model, serial number, source data size, sector count, MD5 and SHA1 checksums, and so on.

How it works...

FTK Imager uses the physical drive of your choice as the source and creates a bit-by-bit image of it in EnCase's Evidence File format. During the verification process, the MD5 and SHA1 hashes of the image and the source are compared.

See more

FTK Imager download page: http://accessdata.com/product-download/digital-forensics/ftk-imager-version-3.4.3
FTK Imager User Guide: https://ad-pdf.s3.amazonaws.com/Imager/3_4_3/FTKImager_UG.pdf

Drive acquisition in RAW format with DC3DD

DC3DD is a patched (by Jesse Kornblum) version of the classic GNU dd utility with some computer forensics features, for example, on-the-fly hashing with a number of algorithms (MD5, SHA-1, SHA-256, and SHA-512), showing the progress of the acquisition process, and so on.

Getting ready

You can find a compiled standalone 64-bit version of DC3DD for Windows at SourceForge. Just download the ZIP or 7z archive, unpack it, and you are ready to go.

How to do it...

Open the Windows Command Prompt, change directory (you can use the cd command to do it) to the one with dc3dd.exe, and type the following command:

dc3dd.exe if=\\.\PHYSICALDRIVE2 of=X:\147-2017.dd hash=sha256 log=X:\147-2017.log

Press Enter and the acquisition process will start. Of course, your command will be a bit different, so let's find out what each part of it means:

- if: This stands for input file. DD is originally a Linux utility, and in Linux everything is a file. As you can see in our command, we put physical drive 2 here (this is the drive we wanted to image, but in your case it can be another drive, depending on the number of drives connected to your workstation).
- of: This stands for output file. Here you should type the destination of your image, in RAW format; in our case it's the X: drive and the 147-2017.dd file.
- hash: As already mentioned, DC3DD supports four hashing algorithms: MD5, SHA-1, SHA-256, and SHA-512. We chose SHA-256, but you can choose the one you prefer.
- log: Here you should type the destination for the log. You will find the dc3dd version, the image hash, and so on in this file after the acquisition is completed.

How it works...

DC3DD creates a bit-by-bit image of the source drive in RAW format, so the size of the image will be the same as that of the source, and it calculates the image hash using the algorithm of the examiner's choice, in our case SHA-256.
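If you want to double-check the result afterwards, you can recompute the hash of the RAW image and compare it with the value recorded in the log. The following is a minimal sketch, not part of the original recipe; it assumes the image was written to X:\147-2017.dd as in the command above and uses certutil, the hashing tool built into Windows:

REM Recompute the SHA-256 of the acquired RAW image and compare it
REM with the sha256 value stored in X:\147-2017.log by dc3dd.
certutil -hashfile X:\147-2017.dd SHA256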
See also

DC3DD download page: https://sourceforge.net/projects/dc3dd/files/dc3dd/7.2%20-%20Windows/

Mounting forensic images with Arsenal Image Mounter

Arsenal Image Mounter is an open source tool developed by Arsenal Recon. It can help a digital forensic examiner to mount a forensic image or virtual machine disk in Windows. It supports both E01 (and Ex01) and RAW forensic images, so you can use it with any of the images we created in the previous recipes. It's very important to note that Arsenal Image Mounter mounts the contents of disk images as complete disks. The tool supports all the file systems you can find on Windows drives: NTFS, ReFS, FAT32 and exFAT. It also has temporary write support for images, which is a very useful feature, for example, if you want to boot the system from the image you are examining.

Getting ready

Go to the Arsenal Image Mounter page on the Arsenal Recon website and click on the Download button to download the ZIP archive. At the time of this writing the latest version of the tool is 2.0.010, so in our case the archive is named Arsenal_Image_Mounter_v2.0.010.0_x64.zip. Extract it to a location of your choice and you are ready to go; no installation is needed.

How to do it...

There are two ways to choose an image for mounting in Arsenal Image Mounter:

- Use the File menu and choose Mount image.
- Use the Mount image button (figure: Arsenal Image Mounter main window).

When you choose the Mount image option from the File menu or click on the Mount image button, an Open window will pop up; here you should choose the image for mounting. The next window you will see is Mount options (figure: Arsenal Image Mounter Mount options window). As you can see, there are a few options here:

- Read only: If you choose this option, the image is mounted in read-only mode, so no write operations are allowed. (Do you still remember that you mustn't alter the evidence in any way? Of course, it's already an image, not the original drive, but nevertheless.)
- Fake disk signatures: If an all-zero disk signature is found on the image, Arsenal Image Mounter reports a random disk signature to Windows, so it's mounted properly.
- Write temporary: If you choose this option, the image is mounted in read-write mode, but all modifications are written not to the original image file, but to a temporary differential file.
- Write original: Again, this option mounts the image in read-write mode, but this time the original image file will be modified.
- Sector size: This option allows you to choose the sector size.
- Create "removable" disk device: This option emulates the attachment of a USB thumb drive.

Choose the options you think you need and click OK. We decided to mount our image as read only. Now you can see a hard drive icon on the main window of the tool: the image is mounted. If you mounted only one image and want to unmount it, select the image and click on the Remove selected button. If you have a few mounted images and want to unmount all of them, click on the Remove all button.

How it works...

Arsenal Image Mounter mounts forensic images or virtual machine disks as complete disks in read-only or read-write mode. A digital forensics examiner can then access their contents even with Windows Explorer.

See also

Arsenal Image Mounter page on the Arsenal Recon website: https://arsenalrecon.com/apps/image-mounter/

Summary

In this article, we covered the process and importance of drive acquisition using imaging tools such as FTK Imager and DC3DD, as well as mounting the resulting images with Arsenal Image Mounter. Drive acquisition is the first step in the analysis of digital evidence and needs to be carried out with the utmost care, which in turn makes the analysis process smooth.
Resources for Article:

Further resources on this subject:
- Forensics Recovery [article]
- Digital and Mobile Forensics [article]
- Mobile Forensics and Its Challenges [article]


Creating and Calling Subroutines

Packt
21 Jul 2017
8 min read
In this article by James Kent Lewis, author of the book Linux Shell Scripting Bootcamp, we will learn how to create and call subroutines in a script. The topics covered in this article are as follows:

- Show subroutines that take parameters
- Mention return codes again and how they work in scripts
- How to use subroutines

First, let's start with a selection of simple but powerful scripts. These are mainly shown to give the reader an idea of just what can be done quickly with a script.

Clearing the screen

The tput clear terminal command can be used to clear the current command-line session. You could type tput clear all the time, but wouldn't just cls be nicer? Here's a simple script that clears the current screen:

Chapter 4 - Script 1

#!/bin/sh
#
# 5/9/2017
#
tput clear

Notice that this was so simple that I didn't even bother to include a Usage message or return code. Remember, to make this a command on your system, do this:

1. cd $HOME/bin
2. Create/edit a file named cls
3. Copy and paste the preceding code into this file
4. Save the file
5. Run chmod 755 cls

You can now type cls from any terminal (under that user) and your screen will clear. Try it.

File redirection

At this point, we need to go over file redirection. This is the ability to have the output from a command or script copied into a file instead of going to the screen. This is done using the redirection operator, which is really just the greater-than sign; for example, ls > myfiles.txt sends the listing produced by ls into the file myfiles.txt instead of displaying it on the screen.

Command piping

Now, let's look at command piping, which is the ability to run a command and have its output serve as the input to another command. Suppose a program or script named loop1 is running on your system and you want to know its PID. You could run the ps auxw command to a file, and then grep the file for loop1. Alternatively, you could do it in one step using a pipe, as shown below.
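The screenshot of this command and its output is not reproduced here; a minimal equivalent of what was described (process names and PIDs will differ on your system) is:

# List all processes and keep only the lines that mention loop1.
ps auxw | grep loop1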
Pretty cool, right? This is a very powerful feature in a Linux system and is used extensively. We will be seeing a lot more of this soon. The next section shows another very short script using some command piping; it clears the screen and then shows only the first 10 lines from dmesg.

Chapter 4 - Script 2

#!/bin/sh
#
# 5/9/2017
#
tput clear
dmesg | head

The next section shows file redirection.

Chapter 4 - Script 3

#!/bin/sh
#
# 5/9/2017
#
FN=/tmp/dmesg.txt
dmesg > $FN
echo "File $FN created."

Try it on your system! This shows how easy it is to create a script to perform commands that you would normally type on the command line. Also, notice the use of the FN variable. If you want to use a different filename later, you only have to make the change in one place.

Subroutines

Now, let's really get into subroutines. To do this, we will use more of the tput commands:

tput cup <row> <col>       # puts the cursor at row, col
tput cup 0 0               # puts the cursor at the upper left hand side of the terminal
tput cup $LINES $COLUMNS   # puts the cursor at the bottom right hand side
tput clear                 # clears the terminal screen
tput smso                  # bolds the text that follows
tput rmso                  # un-bolds the text that follows

The script is shown in the next section. It was mainly written to show the concept of subroutines; however, it can also be used as a guide to writing interactive tools.

Chapter 4 - Script 4

#!/bin/sh
#
# 5/9/2017
#
echo "script4 - Linux Scripting Book"

# Subroutines
cls()
{
  tput clear
  return 0
}

home()
{
  tput cup 0 0
  return 0
}

end()
{
  let x=$COLUMNS-1
  tput cup $LINES $x
  echo -n "X"          # no newline or else will scroll
}

bold()
{
  tput smso
}

unbold()
{
  tput rmso
}

underline()
{
  tput smul
}

normalline()
{
  tput rmul
}

# Code starts here
rc=0                   # return code
if [ $# -ne 1 ] ; then
  echo "Usage: script4 parameter"
  echo "Where parameter can be: "
  echo " home      - put an X at the home position"
  echo " cls       - clear the terminal screen"
  echo " end       - put an X at the last screen position"
  echo " bold      - bold the following output"
  echo " underline - underline the following output"
  exit 255
fi

parm=$1                # main parameter 1

if [ "$parm" = "home" ] ; then
  echo "Calling subroutine home."
  home
  echo -n "X"
elif [ "$parm" = "cls" ] ; then
  cls
elif [ "$parm" = "end" ] ; then
  echo "Calling subroutine end."
  end
elif [ "$parm" = "bold" ] ; then
  echo "Calling subroutine bold."
  bold
  echo "After calling subroutine bold."
  unbold
  echo "After calling subroutine unbold."
elif [ "$parm" = "underline" ] ; then
  echo "Calling subroutine underline."
  underline
  echo "After subroutine underline."
  normalline
  echo "After subroutine normalline."
else
  echo "Unknown parameter: $parm"
  rc=1
fi

exit $rc

Try this on your system, running it with each of the parameters. If you run it with the home parameter, it might look a little strange to you. The code puts a capital X at the home position (0,0) and this causes the prompt to print one character over. Nothing is wrong here, it just looks a little weird. Don't worry if this still doesn't make sense to you, just go ahead and look at Script 5.

Using parameters

Okay, let's add some routines to this script to show how to use parameters with a subroutine. In order to make the output look better, the cls routine is called first to clear the screen, as shown in the next section.

Chapter 4 - Script 5

#!/bin/sh
#
# 5/9/2017
#
echo "script5 - Linux Scripting Book"

# Subroutines
cls()
{
  tput clear
  return 0
}

home()
{
  tput cup 0 0
  return 0
}

end()
{
  let x=$COLUMNS-1
  tput cup $LINES $x
  echo -n "X"          # no newline or else will scroll
}

bold()
{
  tput smso
}

unbold()
{
  tput rmso
}

underline()
{
  tput smul
}

normalline()
{
  tput rmul
}

move()                 # move cursor to row, col
{
  tput cup $1 $2
}

movestr()              # move cursor to row, col and output string
{
  tput cup $1 $2
  echo $3
}

# Code starts here
cls                    # clear the screen to make the output look better
rc=0                   # return code
if [ $# -ne 1 ] ; then
  echo "Usage: script5 parameter"
  echo "Where parameter can be: "
  echo " home      - put an X at the home position"
  echo " cls       - clear the terminal screen"
  echo " end       - put an X at the last screen position"
  echo " bold      - bold the following output"
  echo " underline - underline the following output"
  echo " move      - move cursor to row,col"
  echo " movestr   - move cursor to row,col and output string"
  exit 255
fi

parm=$1                # main parameter 1

if [ "$parm" = "home" ] ; then
  home
  echo -n "X"
elif [ "$parm" = "cls" ] ; then
  cls
elif [ "$parm" = "end" ] ; then
  move 0 0
  echo "Calling subroutine end."
  end
elif [ "$parm" = "bold" ] ; then
  echo "Calling subroutine bold."
  bold
  echo "After calling subroutine bold."
  unbold
  echo "After calling subroutine unbold."
elif [ "$parm" = "underline" ] ; then
  echo "Calling subroutine underline."
  underline
  echo "After subroutine underline."
  normalline
  echo "After subroutine normalline."
elif [ "$parm" = "move" ] ; then
  move 10 20
  echo "This line started at row 10 col 20"
elif [ "$parm" = "movestr" ] ; then
  movestr 30 40 "This line started at 30 40"
else
  echo "Unknown parameter: $parm"
  rc=1
fi

exit $rc

Since this script only has two extra functions, you can just run them, one command at a time, as follows:

guest1 $ script5
guest1 $ script5 move
guest1 $ script5 movestr

Since we are now placing the cursor at a specific location, the output should make more sense to you. Notice how the command-line prompt reappears where the last cursor position was. You have probably noticed that the parameters to a subroutine work just like those to a script: parameter 1 is $1, parameter 2 is $2, and so on. This is both good and bad: good because you don't have to learn anything radically different, but bad in that it is very easy to get the script's $1, $2, and so on mixed up with the subroutine's if you are not careful. A possible solution, and the one I use, is to assign the $1, $2, and so on variables in the main script to a variable with a good meaningful name. For example, in these example scripts, I set parm1 equal to $1 (parm1=$1), and so on.
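To make that concrete, here is a small variation of the movestr routine from Script 5 (an illustrative sketch, not one of the book's scripts) that copies its positional parameters into named variables before using them:

movestr() # move cursor to row, col and print a string
{
  row=$1        # first subroutine parameter
  col=$2        # second subroutine parameter
  text=$3       # third subroutine parameter
  tput cup $row $col
  echo "$text"
}

# Example call: print a message starting at row 5, column 10.
movestr 5 10 "Hello from a subroutine"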
Summary

We started with some very simple scripts and then proceeded to show some simple subroutines. We then showed some subroutines that take parameters. Return codes were mentioned again to show how they work in subroutines. We included several scripts to show the concepts, and also included a special bonus script at no extra charge.

Resources for Article:

Further resources on this subject:
- Linux Shell Scripting [article]
- Linux Shell Scripting – various recipes to help you [article]
- GLSL 4.0: Using Subroutines to Select Shader Functionality [article]


Getting Started with Android Things

Packt
21 Jul 2017
16 min read
In this article, we will cover the following topics:

- Internet of Things overview
- Internet of Things components
- Android Things overview
- Things support library
- Android Things board compatibility
- How to install Android Things on Raspberry Pi
- Creating the first Android Things project

Internet of Things overview

Internet of Things, or briefly IoT, is one of the most promising trends in technology. According to many analysts, Internet of Things can be the most disruptive technology of the upcoming decade. It will have a huge impact on our lives and it promises to modify our habits. IoT is, and will be in the future, a pervasive technology that will span its effects across many sectors:

- Industry
- Healthcare
- Transportation
- Manufacturing
- Agriculture
- Retail
- Smart cities

All these areas will benefit from using IoT. Before diving into IoT projects, it is important to know what Internet of Things means. There are several definitions of Internet of Things, addressing different aspects and considering different areas of application. Anyway, it is important to underline that IoT is much more than a network of smartphones, tablets, and PCs connected to each other. Briefly, IoT is an ecosystem where objects are interconnected and, at the same time, connect to the internet. Internet of Things includes every object that can potentially connect to the internet and exchange data and information. These objects are always connected, anytime, anywhere, and they exchange data.

The concept of connected objects is not new and it has developed over the years. The level of circuit miniaturization and the increasing power of CPUs at lower consumption make it possible to imagine a future where there are millions of "things" that talk to each other. The first time that Internet of Things was officially recognized was in 2005, when the International Telecommunication Union (ITU), in a report titled The Internet of Things (https://www.itu.int/osg/spu/publications/internetofthings/InternetofThings_summary.pdf), gave the first definition:

"A new dimension has been added to the world of information and communication technologies (ICTs): from anytime, any place connectivity for anyone, we will now have connectivity for anything…. Connections will multiply and create an entirely new dynamic network of networks – an Internet of Things"

In other words, IoT is a network of smart objects (or things) that can receive and send data and that we can control remotely.

Internet of Things components

There are several elements that contribute to creating the IoT ecosystem, and it is important to know the role they play in order to have a clear picture of IoT. This will be useful to better understand the projects we will build using Android Things. The basic brick of IoT is a smart object. It is a device that connects to the internet and is capable of exchanging data. It can be a simple sensor that measures a quantity such as pressure or temperature, or a complex system. Extending this concept, our oven, our coffee machine, and our washing machine are examples of smart objects once they connect to the internet. All of these smart objects contribute to developing the Internet of Things network. Anyway, not only household appliances are examples of smart objects; cars, buildings, actuators, and so on can be included in the list.

We can reference these objects, when connected, using a unique identifier and start talking to them. At the low level, these devices exchange data using a network layer. The most important and well-known protocols at the base of Internet of Things are:

- Wi-Fi
- Bluetooth
- Zigbee
- Cellular network
- NB-IoT
- LoRa

From an application point of view, there are several application protocols widely used in the Internet of Things. Some protocols derive from different contexts (such as the web), others are IoT-specific. To name a few of them:

- HTTP
- MQTT
- CoAP
- AMQP
- REST
- XMPP
- STOMP

By now, they could be just names or empty boxes, but throughout this article we will explore how to use these protocols with Android Things.

Prototyping boards play an important role in Internet of Things and they help to grow the number of connected objects. Using prototyping boards, we can experiment with IoT projects, and in this article we will explore how to build and test IoT projects using boards compatible with Android Things. As you may already know, there are several prototyping boards available on the market, each one having specific features. Just to name a few of them:

- Arduino (in different flavors)
- Raspberry Pi (in different flavors)
- Intel Edison
- ESP8266
- NXP

We will focus our attention on Raspberry Pi 3 and Intel Edison because Android Things officially supports these two boards. We will also use other development boards so that you can understand how to integrate them.

Android Things overview

Android Things is the new operating system developed by Google to build IoT projects. It helps you to develop professional applications using trusted platforms and Android. Yes, Android, because Android Things is a modified version of Android, and we can reuse our Android knowledge to implement smart Internet of Things projects. This OS has great potential because Android developers can smoothly move to IoT and start developing and building projects in a few days.

Before diving into Android Things, it is important to have an overview. Android Things has a layered structure that is slightly different from Android OS: it is much more compact, so apps for Android Things have fewer layers beneath them and are closer to drivers and peripherals than normal Android apps. Even though Android Things derives from Android, some APIs available in Android are not supported in Android Things. We will now briefly describe the similarities and the differences.

Let us start with content providers, widely used in Android and not present in the Android Things SDK; we should pay attention to this when we develop an Android Things app. For more information about the content providers that are not supported, please refer to the official Android Things website: https://developer.android.com/things/sdk/index.html.

Moreover, like a normal Android app, an Android Things app can have a User Interface (UI), even if this is optional and depends on the type of application we are developing. A user can interact with the UI to trigger events, as they do in an Android app. From this point of view, as we will see later, the process of developing a UI is the same as in Android. This is an interesting feature because we can develop an IoT UI easily and quickly, reusing our Android knowledge.

It is worth noticing that Android Things fits perfectly into the Google services ecosystem. Almost all cloud services implemented by Google are available in Android Things, with some exceptions.
Android Things does not support Google services strictly connected to the mobile world and those that require user input or authentication. Do not forget that the user interface for an Android Things app is optional. For a detailed list of the Google services available in Android Things, refer to the official page (https://developer.android.com/things/sdk/index.html).

An important Android aspect is permission management. An Android app runs in a sandbox with limited access to resources. When an app needs to access a specific resource outside the sandbox, it has to request the permission. In an Android app, this happens in the Manifest.xml file. This is still true in Android Things, and all the permissions requested by the app are granted at installation time. Android 6 (API level 23) introduced a new way to request a permission: an app can request a permission not only at installation time (using the Manifest.xml file), but at runtime too. Android Things does not support this new feature, so we have to request all the permissions in the Manifest file.

The last thing to notice is notifications. As we will see later, the Android Things UI does not support the notification status bar, so we cannot trigger notifications from our Android Things apps. To make things simpler, you should remember that all the services related to the user interface, or that require a user interface to accomplish their task, are not guaranteed to work in Android Things.

Things support library

The Things support library is the new library developed by Google to handle the communication with peripherals and drivers. This is a completely new library not present in the Android SDK, and it is one of the most important features. It exposes a set of Java interfaces and classes (APIs) that we can use to connect to and exchange data with external devices such as sensors, actuators, and so on. This library hides the inner communication details, supporting several industry-standard protocols:

- GPIO
- I2C
- PWM
- SPI
- UART

Moreover, this library exposes a set of APIs to create and register new device drivers called user drivers. These drivers are custom components deployed with the Android Things app that extend the Android Things framework. In other words, they are custom libraries that enable an app to communicate with other device types not supported by Android Things natively.

This article will guide you, step by step, through building real-life projects using Android Things. You will explore the new Android Things APIs and how to use them. In the next sections, you will learn how to install Android Things on Raspberry Pi 3 and Intel Edison.

Android Things board compatibility

Android Things is a new operating system specifically built for IoT. At the time of writing, Android Things supported four different development boards:

- Raspberry Pi 3 Model B
- Intel Edison
- NXP Pico i.MX6UL
- Intel Joule 570x

In the near future, more boards will be added to the list. Google has already announced that it will support this new board: NXP Argon i.MX6UL. The article will focus on using the first two boards: Raspberry Pi 3 and Intel Edison. Anyway, you can develop your projects on any compatible board. This is the power of Android Things: it abstracts the underlying hardware, providing a common way to interact with peripherals and devices. The paradigm that made Java famous, Write Once and Run Anywhere (WORA), applies to Android Things too. This is a winning feature of Android Things because we can develop an Android Things app without worrying about the underlying board. Anyway, when we develop an IoT app there are some minor aspects we should consider so that our app will be portable to other compatible boards.

How to install Android Things on Raspberry Pi

Raspberry Pi 3 is the latest Raspberry Pi board. It is an upgrade of the Raspberry Pi 2 Model B and, with respect to its predecessor, it has some great features:

- Quad-core ARMv8 CPU at 1.2 GHz
- Wireless LAN 802.11n
- Bluetooth 4.0

In this section, you will learn how to install Android Things on Raspberry Pi 3 using a Windows PC or Mac OS X. Before starting the installation process you must have:

- A Raspberry Pi 3 Model B
- An SD card of at least 8 GB
- A USB cable to connect the Raspberry Pi to your PC
- An HDMI cable to connect the Raspberry Pi to a TV/monitor (optional)

If you do not have an HDMI cable, you can use a screen mirroring tool. This is useful to know the result of the installation process and when we develop the Android Things UIs. The installation steps differ depending on whether you are using Windows, OS X or Linux.

How to install Android Things using Windows

First we will cover how to install Android Things on Raspberry Pi 3 using a Windows PC:

1. Download the Android Things image from this link: https://developer.android.com/things/preview/download.html. Select the right image; in this case, you have to choose the Raspberry Pi image.
2. Accept the license and wait until the download is completed.
3. Once the download is complete, extract the zip file.
4. To install the image on the SD card, there is a great application called Win32 Disk Imager that works perfectly. It is free and you can download it from SourceForge using this link: https://sourceforge.net/projects/win32diskimager/. At the time of writing, the application version is 0.9.5.
5. After you have downloaded it, run the installation executable as Administrator.
6. Now you are ready to burn the image onto the SD card. Insert the SD card into your PC.
7. Select the image you unzipped at step 3 and be sure to select the right disk name (your SD card). At the end, click on Write.

You are done! The image is installed on the SD card and we can now start the Raspberry Pi.

How to install Android Things using OS X

If you have a Mac OS X, the steps to install Android Things are slightly different. There are several options to flash this OS to the SD card; you will learn the fastest and easiest one. These are the steps to follow:

1. Format your SD card using FAT32. Insert your SD card into your Mac and run Disk Utility; you should see the card appear there.
2. Download the Android Things OS image using this link: https://developer.android.com/things/preview/download.html.
3. Unzip the file you have downloaded.
4. Insert the SD card into your Mac.
5. Now it is time to copy the image to the SD card. Open a terminal window and write:

sudo dd bs=1m if=path_of_your_image.img of=/dev/rdiskn

Here, path_of_your_image.img is the path to the file with the img extension you downloaded at step 2. In order to find out the rdisk number n, you have to select Preferences and then System Report; the BSD name shown there is the disk name we are looking for. In this case, we have to write:

sudo dd bs=1m if=path_of_your_image.img of=/dev/rdisk1

That's all. You have to wait until the image is copied onto the SD card. Do not forget that the copying process could take a while, so be patient!
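One practical note, not from the original text: on many macOS systems dd cannot write to a disk that is still mounted, so it is usually necessary to unmount it first. Assuming the BSD name found above is disk1, a typical sequence is:

# Unmount all volumes on the card, then write the image to the raw device.
diskutil unmountDisk /dev/disk1
sudo dd bs=1m if=path_of_your_image.img of=/dev/rdisk1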
Creating the first Android Things project

Considering that Android Things derives from Android, the development process and the app structure are the same as those we use in a common Android app. For this reason, the development tool to use for Android Things is Android Studio. If you have already used Android Studio in the past, reading this section you will discover the main differences between an Android Things app and an Android app. If you are new to Android development, this section will guide you step by step in creating your first Android Things app.

Android Studio is the official development environment for developing Android Things apps; therefore, before starting, it is necessary to have it installed. If not, go to https://developer.android.com/studio/index.html, then download and install it. The development environment must adhere to these prerequisites:

- SDK tools version 24 or higher
- The SDK updated with Android 7 (API level 24)
- Android Studio 2.2 or higher

If your environment does not meet the previous conditions, you have to update your Android Studio using the update manager. Now there are two alternatives for starting a new project:

- Cloning a template project from GitHub and importing it into Android Studio
- Creating a new Android project in Android Studio

To better understand the main differences between Android and Android Things, you should follow option number 2, at least the first time.

Cloning the template project

This is the fastest path because with a few steps you are ready to develop an Android Things app. Go to https://github.com/androidthings/new-project-template and clone the repository. Open a terminal and write:

git clone https://github.com/androidthings/new-project-template.git

Now you have to import the cloned project into Android Studio.

Create the project manually

This path is a bit longer than the previous option, but it is useful for learning the main differences between these two worlds. Create a new Android project; do not forget to set the Minimum SDK to API Level 24. For now, you should create a project with an empty activity. Confirm and create the new project. There are some steps you have to follow before your Android app project turns into an Android Things app project:

1. Open the Gradle scripts folder, modify build.gradle (app-level) and replace the dependency directive with the following lines:

dependencies {
    provided 'com.google.android.things:androidthings:0.2-devpreview'
}

2. Open the res folder and remove all the files under it except strings.xml.
3. Open Manifest.xml and remove the android:theme attribute in the application tag.
4. In Manifest.xml, add the following line inside the application tag:

<uses-library android:name="com.google.android.things"/>

5. In the layout folder, open all the layout files created automatically and remove the references to values.
6. In the Activity created by default (MainActivity.java), remove this line:

import android.support.v7.app.AppCompatActivity;

7. Replace AppCompatActivity with Activity.
8. Under the java folder, remove all the folders except the one with your package name.

That's all. You have now transformed an Android app project into an Android Things app project; compiling the code, you will have no errors. From now on, you can simply clone the repository holding the project template and start coding.

Differences between Android and Android Things

As you can notice, an Android Things project is very similar to an Android project. We still have Activities, layouts, Gradle files and so on. At the same time, there are some differences:

- Android Things does not use multiple layouts to support different screen sizes, so when we develop an Android Things app we create only one layout.
- Android Things does not support themes and styles.
- Android support libraries are not available in Android Things.

Summary

In this article we covered an overview of the Internet of Things and its components, an Android Things overview, the Things support library, and Android Things board compatibility, and we also learned how to create a first Android Things project.

Resources for Article:

Further resources on this subject:
- Saying Hello to Unity and Android [article]
- Getting started with Android Development [article]
- Android Game Development with Unity3D [article]


Machine Learning Review

Packt
18 Jul 2017
20 min read
In this article by Uday Kamath and Krishna Choppella, authors of the book Mastering Java Machine Learning, we will discuss how recent years have seen a revival of interest in artificial intelligence (AI) and machine learning in particular, both in academic circles and in industry. In the last decade, AI has seen dramatic successes that eluded practitioners in the intervening years, since the original promise of the field gave way to relative decline until its re-emergence in the last few years.

What made these successes possible, in large part, was the availability of prodigious amounts of data and the inexorable increase in raw computational power. Among the areas of AI leading the resurgence, machine learning has seen spectacular developments and continues to find the widest applicability in an array of domains. The use of machine learning to help in complex decision making at the highest levels of business and, at the same time, its enormous success in improving the accuracy of what are now everyday applications, such as search, speech recognition, and personal assistants on mobile phones, has made its effects commonplace in the family room and the boardroom alike. Articles breathlessly extolling the power of "deep learning" can be found today not only in the popular science and technology press, but also in mainstream outlets such as The New York Times and The Huffington Post. Machine learning has indeed become ubiquitous in a relatively short time.

An ordinary user encounters machine learning in many ways in their day-to-day activities. Interacting with well-known e-mail providers such as Gmail gives the user automated sorting and categorization of e-mails into categories such as spam, junk, promotions, and so on, which is made possible using text mining, a branch of machine learning. When shopping online for products on e-commerce websites such as https://www.amazon.com/ or watching movies from content providers such as http://netflix.com/, one is offered recommendations for other products and content by so-called recommender systems, another branch of machine learning. Forecasting the weather, estimating real estate prices, predicting voter turnout, and even election results: all use some form of machine learning to see into the future, as it were.

The ever-growing availability of data and the promise of systems that can enrich our lives by learning from that data place a growing demand on skills from a limited workforce of professionals in the field of data science. This demand is particularly acute for well-trained experts who know their way around the landscape of machine learning techniques in the more popular languages, including Java, Python, R, and, increasingly, Scala. By far, the number and availability of machine learning libraries, tools, APIs, and frameworks in Java outstrip those in other languages. Consequently, mastery of these skills will put any aspiring professional with a desire to enter the field at a distinct advantage in the marketplace.

Perhaps you already apply machine learning techniques in your professional work, or maybe you simply have a hobbyist's interest in the subject. Clearly, you can bend Java to your will, but now you feel you're ready to dig deeper and learn how to use the best-of-breed open-source Java ML frameworks in your next data science project. Mastery of a subject, especially one that has such obvious applicability as machine learning, requires more than an understanding of its core concepts and familiarity with its mathematical underpinnings. Unlike an introductory treatment of the subject, a work that purports to help you master the subject must be heavily focused on practical aspects, in addition to introducing more advanced topics that would have stretched the scope of the introductory material. To warm up before we embark on sharpening our instrument, we will devote this article to a quick review of what we already know. For the ambitious novice with little or no prior exposure to the subject (who is nevertheless determined to get the fullest benefit from this article), here's our advice: make sure you do not skip the rest of this article; instead, use it as a springboard to explore unfamiliar concepts in more depth. Seek out external resources as necessary. Wikipedia it. Then jump right back in.

For the rest of this article, we will review the following:

- History and definitions
- What is not machine learning?
- Concepts and terminology
- Important branches of machine learning
- Different data types in machine learning
- Applications of machine learning
- Issues faced in machine learning
- The meta-process used in most machine learning projects
- Information on some well-known tools, APIs, and resources that we will employ in this article

Machine learning – history and definition

It is difficult to give an exact history, but the definition of machine learning we use today finds its usage as early as the 1860s. In Rene Descartes' Discourse on the Method, he refers to Automata and says the following:

"For we can easily understand a machine's being constituted so that it can utter words, and even emit some responses to action on it of a corporeal kind, which brings about a change in its organs; for instance, if touched in a particular part it may ask what we wish to say to it; if in another part it may exclaim that it is being hurt, and so on"

http://www.earlymoderntexts.com/assets/pdfs/descartes1637.pd
https://www.marxists.org/reference/archive/descartes/1635/discourse-method.htm

Alan Turing, in his famous publication Computing Machinery and Intelligence, gives basic insights into the goals of machine learning by asking the question "Can machines think?":

http://csmt.uchicago.edu/annotations/turing.htm
http://www.csee.umbc.edu/courses/471/papers/turing.pdf

Arthur Samuel, in 1959, wrote: "Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed."

Tom Mitchell, in recent times, gave a more exact definition of machine learning: "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E."
Machine learning has a relationship with several areas:

- Statistics: This uses the elements of data sampling, estimation, hypothesis testing, learning theory, and statistics-based modeling, to name a few
- Algorithms and computation: This uses the basics of search, traversal, parallelization, distributed computing, and so on from basic computer science
- Database and knowledge discovery: This has the ability to store, retrieve, and access information in various formats
- Pattern recognition: This has the ability to find interesting patterns from the data, either to explore, visualize, or predict
- Artificial intelligence: Though machine learning is considered a branch of artificial intelligence, it also has relationships with other branches, such as heuristics, optimization, evolutionary computing, and so on

What is not machine learning?

It is important to recognize areas that share a connection with machine learning but cannot themselves be considered part of machine learning. Some disciplines may overlap to a smaller or larger extent, yet the principles underlying machine learning are quite distinct:

- Business intelligence and reporting: Reporting Key Performance Indicators (KPIs), querying OLAP for slicing, dicing, and drilling into the data, dashboards, and so on, which form the central components of BI, are not machine learning.
- Storage and ETL: Data storage and ETL are key elements needed in any machine learning process, but by themselves they don't qualify as machine learning.
- Information retrieval, search, and queries: The ability to retrieve data or documents based on search criteria or indexes, which forms the basis of information retrieval, is not really machine learning. Many forms of machine learning, such as semi-supervised learning, can rely on searching for similar data for modeling, but that doesn't qualify search as machine learning.
- Knowledge representation and reasoning: Representing knowledge for performing complex tasks, such as Ontology, Expert Systems, and the Semantic Web, does not qualify as machine learning.

Machine learning – concepts and terminology

In this article, we will describe the different concepts and terms normally used in machine learning:

- Data or dataset: The basics of machine learning rely on understanding the data. The data or dataset normally refers to content available in structured or unstructured format for use in machine learning. Structured datasets have specific formats, and an unstructured dataset is normally in the form of some free-flowing text. Data can be available in various storage types or formats. In structured data, every element, known as an instance or an example or row, follows a predefined structure. Data can also be categorized by size: small or medium data have a few hundreds to thousands of instances, whereas big data refers to a large volume, mostly in the millions or billions, which cannot be stored or accessed using common devices or fit in the memory of such devices.
- Features, attributes, variables, or dimensions: In structured datasets, as mentioned earlier, there are predefined elements with their own semantics and data type, which are known variously as features, attributes, variables, or dimensions.
- Data types: The preceding features need some form of typing in many machine learning algorithms or techniques. The most commonly used data types are as follows:
  - Categorical or nominal: This indicates well-defined categories or values present in the dataset, for example, eye color (black, blue, brown, green, or grey) or document content type (text, image, or video).
  - Continuous or numeric: This indicates the numeric nature of the data field, for example, a person's weight measured by a bathroom scale, the temperature reading from a sensor, or the monthly balance in dollars on a credit card account.
  - Ordinal: This denotes data that can be ordered in some way, for example, garment size (small, medium, or large) or boxing weight classes (heavyweight, light heavyweight, middleweight, lightweight, and bantamweight).
- Target or label: A feature or set of features in the dataset, which is used for learning from training data and predicting in an unseen dataset, is known as a target or a label. A label can have any of the forms specified earlier, that is, categorical, continuous, or ordinal.
- Machine learning model: Each machine learning algorithm, based on what it learned from the dataset, maintains the state of its learning for predicting or giving insights into future or unseen data. This is referred to as the machine learning model.
- Sampling: Data sampling is an essential step in machine learning. Sampling means choosing a subset of examples from a population with the intent of treating the behavior seen in the (smaller) sample as being representative of the behavior of the (larger) population. In order for the sample to be representative of the population, care must be taken in the way the sample is chosen. Generally, a population consists of every object sharing the properties of interest in the problem domain, for example, all people eligible to vote in the general election, or all potential automobile owners in the next four years. Since it is usually prohibitive (or impossible) to collect data for all the objects in a population, a well-chosen subset is selected for the purposes of analysis. A crucial consideration in the sampling process is that the sample be unbiased with respect to the population. The following are types of probability-based sampling (a small command-line illustration of the simplest case follows this list):
  - Uniform random sampling: A sampling method where the sampling is done over a uniformly distributed population, that is, each object has an equal probability of being chosen.
  - Stratified random sampling: A sampling method used when the data can be categorized into multiple classes. In such cases, in order to ensure all categories are represented in the sample, the population is divided into distinct strata based on these classifications, and each stratum is sampled in proportion to the fraction of its class in the overall population. Stratified sampling is common when the population density varies across categories, and it is important to compare these categories with the same statistical power.
  - Cluster sampling: Sometimes there are natural groups among the population being studied, and each group is representative of the whole population. An example is data that spans many geographical regions. In cluster sampling, you take a random subset of the groups followed by a random sample from within each of those groups to construct the full data sample. This kind of sampling can reduce the cost of data collection without compromising the fidelity of distribution in the population.
  - Systematic sampling: Systematic or interval sampling is used when there is a certain ordering present in the sampling frame (a finite set of objects treated as the population and taken to be the source of data for sampling, for example, the corpus of Wikipedia articles arranged lexicographically by title).
    If the sample is then selected by starting at a random object and skipping a constant k number of objects before selecting the next one, that is called systematic sampling. Here k is calculated as the ratio of the population size to the sample size (for example, with a population of 10,000 objects and a desired sample of 200, every k = 50th object in the ordering is selected).
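As a small concrete illustration of uniform random sampling (not from the book; it assumes GNU coreutils and a flat file with one record per line), a random subset of records can be drawn on the command line:

# Draw a uniform random sample of 200 records, without replacement.
shuf -n 200 records.csv > sample.csv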
Examples of problems that require unsupervised learning are many; grouping customers according to their purchasing behavior is one example. In the case of biological data, tissue samples can be clustered based on similar gene expression values using unsupervised learning techniques. The following diagram represents data with inherent structure that can be revealed as distinct clusters using an unsupervised learning technique such as K-Means:
Clusters in data
Different techniques are used to detect global outliers (examples that are anomalous with respect to the entire dataset) and local outliers (examples that are misfits in their neighborhood). In the following diagram, the notion of local and global outliers is illustrated for a two-feature dataset:
Local and Global outliers
Semi-supervised learning: When the dataset has some labeled data and a large amount of data that is not labeled, learning from such a dataset is called semi-supervised learning. When dealing with financial data with the goal of detecting fraud, for example, there may be a large amount of unlabeled data and only a small number of known fraud and non-fraud transactions. In such cases, semi-supervised learning may be applied.
Graph mining: Mining data represented as graph structures is known as graph mining. It is the basis of social network analysis and structure analysis in different bioinformatics, web mining, and community mining applications.
Probabilistic graph modeling and inferencing: Learning and exploiting structures present between features to model the data comes under the branch of probabilistic graph modeling.
Time-series forecasting: This is a form of learning where data has distinct temporal behavior and the relationship with time is modeled. A common example is in financial forecasting, where the performance of stocks in a certain sector may be the target of the predictive model.
Association analysis: This is a form of learning where data is in the form of an item set or market basket, and association rules are modeled to explore and predict the relationships between the items. A common example in association analysis is to learn relationships between the most common items bought by customers when they visit the grocery store.
Reinforcement learning: This is a form of learning where machines learn to maximize performance based on feedback in the form of rewards or penalties received from the environment. A recent example that famously used reinforcement learning was AlphaGo, the machine developed by Google that beat the world Go champion Lee Sedol decisively in March 2016. Using a reward and penalty scheme, the model first trained on millions of board positions in the supervised learning stage, then played itself in the reinforcement learning stage to ultimately become good enough to triumph over the best human player.
http://www.theatlantic.com/technology/archive/2016/03/the-invisible-opponent/475611/
https://gogameguru.com/i/2016/03/deepmind-mastering-go.pdf
Stream learning or incremental learning: Learning in a supervised, unsupervised, or semi-supervised manner from stream data in real time or pseudo-real time is called stream or incremental learning. Learning the behaviors of sensors from different types of industrial systems, in order to categorize them into normal and abnormal, needs a real-time feed and real-time detection.
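Before moving on to datasets, here is a short sketch (again, not from the original text) that ties the supervised case back to the weather data shown earlier and to the stratified sampling idea. It assumes Python with pandas and scikit-learn installed, and it re-types the ARFF records by hand purely for illustration:

# Illustrative only: a decision tree on the weather data with a stratified split
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = pd.DataFrame({
    'outlook': ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast',
                'sunny', 'sunny', 'rainy', 'sunny', 'overcast', 'overcast', 'rainy'],
    'temperature': [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71],
    'humidity': [85, 90, 86, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 91],
    'windy': [False, True, False, False, False, True, True,
              False, False, False, True, True, False, True],
    'play': ['no', 'no', 'yes', 'yes', 'yes', 'no', 'yes',
             'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no'],
})

# one-hot encode the categorical outlook feature and separate the target label
X = pd.get_dummies(data.drop(columns='play'))
y = data['play']

# a stratified split keeps the yes/no proportions similar in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

model = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on the held-out rows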
Datasets used in machine learning
To learn from data, we must be able to understand and manage data in all forms. Data originates from many different sources, and consequently, datasets may differ widely in structure or have little or no structure at all. In this section, we present a high-level classification of datasets with commonly occurring examples. Based on structure, datasets may be classified as containing the following:
Structured or record data: Structured data is the most common form of dataset available for machine learning. The data is in the form of records or rows following a well-known format, with features that are either columns in a table or fields delimited by separators or tokens. There is no explicit relationship between the records or instances. The dataset is available mostly in flat files or relational databases. The records of financial transactions at a bank shown in the following screenshot are an example of structured data:
Financial Card Transactional Data with labels of Fraud.
Transaction or market data: This is a special form of structured data where each record corresponds to a collection of items. Examples of market datasets are the list of grocery items purchased by different customers, or movies viewed by customers, as shown in the following screenshot:
Market Dataset for Items bought from grocery store.
Unstructured data: Unstructured data is normally not available in well-known formats, unlike structured data. Text data, image, and video data are different formats of unstructured data. Normally, a transformation of some form is needed to extract features from these forms of data into structured datasets like those mentioned previously, so that traditional machine learning algorithms can be applied:
Sample Text Data from SMS with labels of spam and ham, by Tiago A. Almeida from the Federal University of Sao Carlos.
Sequential data: Sequential data has an explicit notion of order. The order can be some relationship between features and a time variable in time series data, or symbols repeating in some form in genomic datasets. Two examples are weather data and genomic sequence data. The following diagram shows the relationship between time and the sensor level for weather:
Time Series from Sensor Data
Three genomic sequences are taken into consideration to show the repetition of the sequences CGGGT and TTGAAAGTGGTG in all three genomic sequences:
Genomic Sequences of DNA as sequence of symbols.
Graph data: Graph data is characterized by the presence of relationships between entities in the data that form a graph structure. Graph datasets may be in a structured record format or an unstructured format. Typically, the graph relationship has to be mined from the dataset. Claims in the insurance domain can be considered structured records containing relevant claim details, with claimants related through addresses, phone numbers, and so on; this can be viewed as a graph structure. Using the World Wide Web as an example, we have web pages available as unstructured data containing links, and graphs of relationships between web pages can be built using web links, producing some of the most mined graph datasets today:
Insurance Claim Data, converted into graph structure with relationship between vehicles, drivers, policies and addresses
Machine learning applications
Given the rapidly growing use of machine learning in diverse areas of human endeavor, any attempt to list typical applications in the different industries where some form of machine learning is in use must necessarily be incomplete.
Nevertheless, in this section we list a broad set of machine learning applications by domain, uses, and the type of learning used:
Domain/Industry | Applications | Machine Learning Type
Financial | Credit Risk Scoring, Fraud Detection, Anti-Money Laundering | Supervised, Unsupervised, Graph Models, Time Series, and Stream Learning
Web | Online Campaigns, Health Monitoring, Ad Targeting | Supervised, Unsupervised, Semi-Supervised
Healthcare | Evidence-based Medicine, Epidemiological Surveillance, Drug Events Prediction, Claim Fraud Detection | Supervised, Unsupervised, Graph Models, Time Series, and Stream Learning
Internet of Things (IoT) | Cyber Security, Smart Roads, Sensor Health Monitoring | Supervised, Unsupervised, Semi-Supervised, and Stream Learning
Environment | Weather forecasting, Pollution modeling, Water quality measurement | Time Series, Supervised, Unsupervised, Semi-Supervised, and Stream Learning
Retail | Inventory, Customer Management and Recommendations, Layout and Forecasting | Time Series, Supervised, Unsupervised, Semi-Supervised, and Stream Learning
Summary
A revival of interest is seen in the area of artificial intelligence (AI) and machine learning in particular, both in academic circles and in industry. Machine learning is used to help in complex decision making at the highest levels of business, and it has also achieved enormous success in improving the accuracy of everyday applications, such as search, speech recognition, and personal assistants on mobile phones. The basics of machine learning rely on an understanding of data: structured datasets have specific formats, while an unstructured dataset is normally in the form of some free-flowing text. Two major branches were introduced: supervised learning, the most popular branch of machine learning, which is about learning from labeled data, and unsupervised learning, which is about understanding and exploring the data in order to build machine learning models when labels are not given.
Resources for Article:
Further resources on this subject:
Specialized Machine Learning Topics [article]
Machine learning in practice [article]
Introduction to Machine Learning with R [article]

Parallelize It

Packt
18 Jul 2017
15 min read
In this article by Elliot Forbes, the author of the book Learning Concurrency in Python, will explain concurrency and parallelism thoroughly, and bring necessary CPU knowledge related to it. Concurrency and parallelism are two concepts that are commonly confused. The reality though is that they are quite different and if you designed software to be concurrent when instead you needed parallel execution then you could be seriously impacting your software’s true performance potential. Due to this, it's vital to know exactly what the two concepts mean so that you can understand the differences. Through knowing these differences you’ll be putting yourself at a distinct advantage when it comes to designing your own high performance software in Python. In this article we’ll be covering the following topics: What is concurrency and what are the major bottlenecks that impact our applications? What is parallelism and how does this differ from concurrency? (For more resources related to this topic, see here.) Understanding concurrency Concurrency is essentially the practice of doing multiple things at the same time, but not specifically in parallel. It can help us to improve the perceived performance of our applications and it can also improve the speed at which our applications run. The best way to think of how concurrency works is to imagine one person working on multiple tasks and quickly switching between these tasks. Imagine this one person was working concurrently on a program and at the same time dealing with support requests. This person would focus primarily on the writing of their program and quickly context switch to fixing a bug or dealing with a support issue should there be one. Once they complete the support task, they could context switch again back to writing their program really quickly. However, in computing there are typically two performance bottlenecks that we have to watch out for and guard against when writing our programs. It’s important to know the differences between the two bottlenecks as if we tried to apply concurrency to a CPU based bottleneck then you could find that the program actually starts to see performance decreases as opposed to increases. And if you tried to apply parallelism to a task that really require a concurrent solution then again you could see the same performance hits. Properties of concurrent systems All concurrent systems share a similar set of properties, these can be defined as: Multiple actors: This represent the different processes and threads all trying to actively make progress on their own tasks. We could have multiple processes that contain multiple threads all trying to run at the same time. Shared Resources: This represents the memory, the disk and other resources that the actors in the above group must utilize in order to perform what they need to do. Rules: All concurrent systems must follow a strict set of rules that define when actors can and can’t acquire locks, access memory, modify state and so on. These rules are vital in order for these concurrent systems to work otherwise our programs would tear themselves apart. Input/Output bottlenecks Input/Output bottlenecks, or I/O bottlenecks for short, are bottlenecks where your computer spends more time waiting on various inputs and outputs than it does on processing the information. You’ll typically find this type of bottleneck when you are working with an I/O heavy application. We could take your standard web browser as an example of a heavy I/O application. 
In a browser, we typically spend a significantly longer amount of time waiting for network requests to finish (for things like style sheets, scripts, or HTML pages to load) than we do rendering the result on the screen. If the rate at which data is requested is slower than the rate at which it is consumed, then you have yourself an I/O bottleneck. One of the main ways to improve the speed of these applications is typically to either improve the speed of the underlying I/O by buying more expensive and faster hardware, or to improve the way in which we handle these I/O requests. A great example of a program bound by I/O bottlenecks would be a web crawler. The main purpose of a web crawler is to traverse the web and essentially index web pages so that they can be taken into consideration when Google runs its search ranking algorithm to decide the top 10 results for a given keyword. We'll start by creating a very simple script that just requests a page and times how long it takes to request said web page:
import urllib.request
import time

t0 = time.time()
req = urllib.request.urlopen('http://www.example.com')
pageHtml = req.read()
t1 = time.time()
print("Total Time To Fetch Page: {} Seconds".format(t1-t0))
If we break down this code, first we import the two necessary modules, urllib.request and the time module. We then record the starting time, request the web page example.com, record the ending time, and print out the time difference. Now say we wanted to add a bit of complexity and follow any links to other pages so that we could index them in the future. We could use a library such as BeautifulSoup in order to make our lives a little easier:
import urllib.request
import time
from bs4 import BeautifulSoup

t0 = time.time()
req = urllib.request.urlopen('http://www.example.com')
t1 = time.time()
print("Total Time To Fetch Page: {} Seconds".format(t1-t0))

soup = BeautifulSoup(req.read(), "html.parser")
for link in soup.find_all('a'):
    print(link.get('href'))

t2 = time.time()
print("Total Execution Time: {} Seconds".format(t2-t0))
When I execute the above program I see the results like so in my terminal: You'll notice from this output that the time to fetch the page is over a quarter of a second. Now imagine we wanted to run our web crawler for a million different web pages; our total execution time would be roughly a million times longer. The main cause of this enormous execution time would be purely down to the I/O bottleneck we face in our program: we spend a massive amount of time waiting on our network requests and a fraction of that time parsing our retrieved page for further links to crawl.
Understanding parallelism
Parallelism is the art of executing two or more actions simultaneously, as opposed to concurrency, in which you make progress on two or more things at the same time. This is an important distinction, and in order to achieve true parallelism, we'll need multiple processors on which to run our code at the same time. A good analogy for parallel processing is a queue for coffee. If you had, say, two queues of 20 people all waiting to use a single coffee machine so that they can get through the rest of the day, that would be an example of concurrency. Now say you were to introduce a second coffee machine into the mix; this would then be an example of something happening in parallel.
This is exactly how parallel processing works, each of the coffee machines in that room would represent one processing core and are able to make progress on tasks simultaneously. A real life example which highlights the true power of parallel processing is your computer’s graphics card. These graphics cards tend to have hundreds if not thousands of individual processing cores that live independently and can compute things at the same time. The reason we are able to run high-end PC games at such smooth frame rates is due to the fact we’ve been able to put so many parallel cores onto these cards. CPU bound bottleneck A CPU bound bottleneck is typically the inverse of an I/O bound bottleneck. This bottleneck is typically found in applications that do a lot of heavy number crunching or any other task that is computationally expensive. These are programs for which the rate at which they execute is bound by the speed of the CPU, if you throw a faster CPU in your machine you should see a direct increase in the speed of these programs. If the rate at which you are processing data far outweighs the rate at which you are requesting data then you have a CPU Bound Bottleneck. How do they work on a CPU? Understanding the differences outlined in the previous section between both concurrency and parallelism is essential but it’s also very important to understand more about the systems that your software will be running on. Having an appreciation of the different architecture styles as well as the low level mechanics helps you make the most informed decisions in your software design. Single core CPUs Single core processors will only ever execute one thread at any given time as that is all they are capable of. However, in order to ensure that we don’t see our applications hanging and being unresponsive, these processors rapidly switch between multiple threads of execution many thousands of times per second. This switching between threads is what is called a "context switch" and involves storing all the necessary information for a thread at a specific point of time and then restoring it at a different point further down the line. Using this mechanism of constantly saving and restoring threads allows us to make progress on quite a number of threads within a given second and it appears like the computer is doing multiple things at once. It is in fact doing only one thing at any given time but doing it at such speed that it’s imperceptible to users of that machine. When writing multi-threaded applications in Python it is important to note that these context switches are computationally quite expensive. There is no way to get around this unfortunately and much of the design of operating systems these days is about optimizing for these context switches so that we don’t feel the pain quite as much. Advantages of single core CPUs: They do not require any complex communication protocols between multiple cores Single core CPUs require less power which typically makes them better suited for IoT devices Disadvantages: They are limited in speed and larger applications will cause them to struggle and potentially freeze Heat dissipation issues place a hard limit on how fast a single core CPU can go Clock rate One of the key limitations to a single-core application running on a machine is the Clock Speed of the CPU. When we talk about Clock rate, we are essentially talking about how many clock cycles a CPU can execute every second. 
For the past 10 years, we have watched as manufacturers have been able to keep pace with Moore's law, which was essentially an observation that the number of transistors one was able to place on a piece of silicon doubled roughly every 2 years. This doubling of transistors every 2 years paved the way for exponential gains in single-CPU clock rates, and CPUs went from the low MHz to the 4-5 GHz clock speeds we are seeing on Intel's i7 6700k processor. But with transistors getting as small as a few nanometers across, this is inevitably coming to an end. We've started to hit the boundaries of physics, and unfortunately, if we go any smaller we'll start to be hit by the effects of quantum tunneling. Due to these physical limitations, we need to start looking at other methods in order to improve the speeds at which we are able to compute things. This is where Martelli's model of scalability comes into play.
Martelli model of scalability
The author of Python Cookbook, Alex Martelli, came up with a model of scalability which Raymond Hettinger discussed in his brilliant hour-long talk "Thinking about Concurrency", which he gave at PyCon Russia 2016. This model represents three different types of problems and programs:
1 core: single-threaded and single-process programs
2-8 cores: multithreaded and multiprocess programs
9+ cores: distributed computing
The first category, the single-core, single-threaded category, is able to handle a growing number of problems due to the constant improvements in the speed of single-core CPUs, and as a result the second category is being rendered more and more obsolete. We will eventually hit a limit with the speed at which a 2-8 core system can run, and as a result we'll have to start looking at other methods such as multiple-CPU systems or even distributed computing. If your problem is worth solving quickly and it requires a lot of power, then the sensible approach is to go with the distributed computing category and spin up multiple machines and multiple instances of your program in order to tackle your problems in a truly parallel manner. Large enterprise systems that handle hundreds of millions of requests are the main inhabitants of this category. You'll typically find that these enterprise systems are deployed on tens, if not hundreds, of high-performance, incredibly powerful servers in various locations across the world.
Time-sharing - the task scheduler
One of the most important parts of the operating system is the task scheduler. This acts as the maestro of the orchestra and directs everything with impeccable precision and incredible timing and discipline. This maestro has only one real goal, and that is to ensure that every task has a chance to run through till completion; the when and where of a task's execution, however, is non-deterministic. That is to say, if we gave a task scheduler two identical competing processes one after the other, there is no guarantee that the first process will complete first. This non-deterministic nature is what makes concurrent programming so challenging.
An excellent example that highlights this non-deterministic behavior can be seen if we take the following code:
import threading
import time
import random

counter = 1

def workerA():
    global counter
    while counter < 1000:
        counter += 1
        print("Worker A is incrementing counter to {}".format(counter))
        sleepTime = random.randint(0,1)
        time.sleep(sleepTime)

def workerB():
    global counter
    while counter > -1000:
        counter -= 1
        print("Worker B is decrementing counter to {}".format(counter))
        sleepTime = random.randint(0,1)
        time.sleep(sleepTime)

def main():
    t0 = time.time()
    thread1 = threading.Thread(target=workerA)
    thread2 = threading.Thread(target=workerB)
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()
    t1 = time.time()
    print("Execution Time {}".format(t1-t0))

if __name__ == '__main__':
    main()
Here we have two competing threads in Python that are each trying to accomplish their own goal of either decrementing the counter to -1,000 or, conversely, incrementing it to 1,000. On a single-core processor there is the possibility that worker A manages to complete its task before worker B has a chance to execute, and the same can be said for worker B. However, there is a third possibility, and that is that the task scheduler continues to switch between worker A and worker B an infinite number of times and neither ever completes. The above code incidentally also shows one of the dangers of multiple threads accessing shared resources without any form of synchronization (a lock-based variant is sketched at the end of this article). There is no accurate way to determine what will happen to our counter, and as such our program could be considered unreliable.
Multi-core processors
We've now got some idea as to how single-core processors work, but now it's time to take a look at multicore processors. Multicore processors contain multiple independent processing units or "cores". Each core contains everything it needs in order to execute a sequence of stored instructions. These cores each follow their own cycle:
Fetch - This step involves fetching instructions from program memory. This is dictated by a program counter (PC), which identifies the location of the next step to execute.
Decode - The core converts the instruction that it has just fetched into a series of signals that will trigger various other parts of the CPU.
Execute - Finally, we perform the execute step. This is where we run the instruction that we have just fetched and decoded, and typically the results of this execution are then stored in a CPU register.
Having multiple cores offers us the advantage of being able to work independently on multiple Fetch -> Decode -> Execute cycles. This style of architecture essentially enables us to create higher performance programs that leverage this parallel execution.
Advantages of multicore processors:
We are no longer bound by the same performance limitations that a single-core processor is bound by
Applications that are able to take advantage of multiple cores will tend to run faster if well designed
Disadvantages of multicore processors:
They require more power than your typical single-core processor
Cross-core communication is no simple feat; there are multiple different ways of doing this
Summary
In this article we covered a multitude of topics, including the differences between concurrency and parallelism. We also looked at how they both leverage the CPU in different ways.
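Picking up the forward reference above, the following is an illustrative sketch only (it is not from the original text) of how a threading.Lock can protect a shared counter, using a simplified two-incrementer variant of the example. With every read-modify-write wrapped in the lock, the final value becomes deterministic:

import threading

counter = 0
counter_lock = threading.Lock()

def safe_increment(times):
    global counter
    for _ in range(times):
        # only one thread at a time may execute this block
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # always 200000 once the updates are serialized by the lock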
Resources for Article: Further resources on this subject: Python Data Science Up and Running [article] Putting the Fun in Functional Python [article] Basics of Python for Absolute Beginners [article]

Optimizing Hadoop

Packt
18 Jul 2017
9 min read
In this article by Gurmukh Singh, the author of the book Hadoop Administration 2.x Cookbook, we will cover some Hadoop concepts. Hadoop, being a distributed system, is complex to troubleshoot, tune, and operate at scale. In a multi-tenancy environment with varied data sizes and formats, and different rates of data flow, it is important that the Hadoop clusters are set up in the optimal way, by following best practices. The recipes walk users through the phases of installation, planning, tuning, securing, and scaling out the clusters. (For more resources related to this topic, see here.)
HDFS as storage layer
One of the most important components of Hadoop is the distributed file system, HDFS. It provides high throughput by having parallel operations across multiple nodes and multiple disks. It provides fault tolerance by replicating data across multiple nodes. HDFS is a core part, designed not just for storage of data but also for the distributed cache and the localization of files and jars for easy access across all nodes in the cluster. We can have custom implementations as long as we call the Hadoop FileSystem API, and we can extend the features. Hadoop is based on a client-server architecture with the Namenode as master and the Datanodes as slaves. The Datanodes register with the Namenode and advertise their blocks, which are by default 128 MB in size. The HDFS blocks are actual files on the lower file system, with each block having one block file and one checksum file on the ext3/4 filesystem. The Namenode keeps metadata of the objects in the form of a namespace and a bitmap called the inode structure. In HDFS, everything is an object, whether file, block, or directory, and each is represented by approximately 150 bytes per object in the Namenode metadata. The entire metadata is loaded into the memory of the Namenode for its operation. Thus, the memory of the Namenode is a limiting factor on the size of the cluster. During the operation of the cluster, Datanodes send heartbeats to the Namenode, updating their status and sending the block report. Any client reading or writing data must contact the Namenode for the location of blocks or Datanodes, and once the Namenode returns the address of the Datanodes, the client communicates directly with the Datanodes for read and write operations. Protocols like RPC and HTTP help with the communication between the various parties in the cluster. Each daemon in Hadoop has a lightweight Jetty server inbuilt for presenting a web UI and also to provide REST API calls, for data transfer between primary, secondary, and other components in the cluster. The recipes talk about how we set up a Hadoop cluster and configure Hadoop to address various complex production issues: communication channels, failure recovery, the location of the Datanode blocks, the Namenode metadata location, and many more.
YARN framework
For Hadoop 2.x, the YARN framework has provided independence from the processing engine we use in Hadoop. It can be MapReduce, Spark, run with Mesos, and so on. The master ResourceManager (RM) accepts jobs and launches an Application Master, which controls the tasks and resources for that job with the help of the master. Having a separate Application Master (AM) per application relieves the RM of all the overheads of job management, letting it concentrate as a pure scheduler and honour the resource requests from the AMs. Task failure and the re-launch of containers are the responsibility of the AM, and it asks for containers on a node after getting a grant from the ResourceManager.
Each application makes a resource request, which is a combination of node address, memory, CPU cores, and disks, and the AM has to take permission from the master RM, called a grant, before being allocated a container by the NodeManager (NM). This is to make sure that any errant AM does not bring the cluster to a deadlock by aggressively asking for resources. All the job configuration files and jars are localized on the nodes where containers for that particular job will run, with the help of HDFS. The NodeManagers periodically send heartbeats to the RM, updating it about the capacity used, the capacity available, and their presence. A container reservation for a particular task can also be made on a particular node, due to locality of reference, and the framework will make the best effort to fulfil it. Any container allocation is given 600 seconds to move from the Allocated to the Running state, else it will be killed. The amount of data that a mapper or reducer can work on depends upon the memory it has; for example, mapreduce.map.memory.mb specifies the mapper memory. It is important to understand the percentage of JVM memory that will contribute to the heap, as many buffers, like mapred.io.sort.mb, are allocated from it. Increasing the HDFS block size does not mean that a job will run faster, as memory must be available to accommodate it. This is addressed with examples in the recipes, to help better understand the relationship between the various parameters and how changing one impacts the other. The recipes address the layout of YARN components for optimal performance and distribution of load.
Hadoop ecosystem
The Hadoop ecosystem consists of Hive, HBase, Sqoop, Presto, Oozie, and so on. In any organization, there will be a Hadoop stack consisting of multiple components, rather than just HDFS and YARN. The use case could be databases, analytics, NoSQL, ETL, or real-time streaming with Spark. As the number of components in a cluster increases, so do the complexity and the challenges of managing it. If we design the cluster properly from the very beginning, keeping in mind the future growth prospects, it becomes easy to scale. Many traditional technologies co-exist with Hadoop, and there will be a need to migrate data across systems using tools like Hive, HBase, Sqoop, and so on. In the recipes, we address the migration process, configuration, and best practices for Hive and HBase and their interaction. It is important to understand partitioning and bucketing in Hive to enable scaling to large datasets. The recipes address the partitioning schemes needed to work at scale and return results in a reasonable time. In large organizations with multiple tenants, different use cases, and different data formats, it is important to understand the cluster layout. With the ease of scheduling jobs to run at a particular time or on a precondition, users do not need to babysit the jobs. Capturing logs of the various components and jobs helps to diagnose issues and also helps to predict the need for cluster expansion, if any. The granularity at which log and event capture can be controlled in the different tools makes life easier for the Hadoop operations team. The Hadoop log4j configuration settings to capture job failures, memory errors, or GC events help to quickly troubleshoot any failures, as shown in the recipe.
Cluster planning and tuning
In a production environment with many nodes in a cluster and a large number of operations per second, it is important to address performance, failures, and the backup and recovery of critical components, and to keep downtime to a minimum. At any given time, there can be many clients connected to the cluster, so it is important to lay out the Datanodes evenly across racks, with the right set of disks per node. As the cluster size grows to a few thousand nodes, the load on the Namenode increases, as it has to address a larger namespace. What if, in a cluster with 3000 Datanode nodes, all Datanode daemons are started in one go? It will generate a block advertisement storm, overloading the Namenode and causing it to hang. To address this, we introduce initDelay and tune the Namenode handler count, dfs.namenode.handler.count. What about decommissioning a Datanode with 4 TB of disk? How much time will it take? All these need to be addressed by tuning the appropriate parameters (a rough back-of-envelope estimate is sketched at the end of this article). It is important to monitor the bandwidth/transfer rate to keep track of the cluster health in terms of network, disk, and other resources. As can be seen in the following image, the AM container talks with all the containers of a job. Node master1.cyrus.com is the RM, and the AM got allocated on node rt2.cyrus.com. The fan-in and fan-out of containers give a great picture of the bottlenecks and the constraints per application type, as shown in the following image: Despite running Hadoop master nodes on reliable hardware, it is important to plan for failures and upgrades. Namenode and ResourceManager HA eliminate single points of failure and avoid cluster outages. High availability (HA) can be achieved by using shared NFS storage or by using QJM.
Securing the cluster
When Hadoop becomes part of critical businesses like banks, financial domains, aviation, weather forecasting, stock exchanges, medicine, and many more, it becomes important to secure the cluster and safeguard critical and sensitive data. By default, Hadoop is not security enabled for HDFS access or job execution. Every block can be accessed by any container in the cluster, thus making it unsuitable for critical multi-tenant environments. In any organization, there will not be a separate cluster for each BU, but there might be a few clusters shared by finance, sales, and marketing. How do we safeguard each BU's data, do data masking, and make sure that any HDFS block is accessed only by an authenticated user? To enable security, we configure secure block access, in-transit encryption, Kerberos, and SSL. The container on a Datanode should access only the data blocks for which it is authorized, and at the same time it should make things transparent to the end user, hiding the complexity of the implementation. Providing isolated networks for Hadoop clusters, with VLAN segregation and firewalls, helps control the inflow and outflow of data. This is well covered with details on secure access, in-transit transfer, Kerberos, and single sign-on.
Summary
The Hadoop distributed environment is a complex ecosystem with various challenges of installation, configuration, performance, and security. There is no one solution that fits all, as the workload and data formats for each job will be different, with various file formats and data sizes. The recipes address the challenges faced by a Hadoop engineer to set up, maintain, optimize, and scale the cluster.
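As promised above, here is a rough back-of-envelope sketch for the two sizing questions raised in this section. It is illustrative only and not from the original text: the 150 bytes per namespace object comes from the HDFS discussion earlier, while the object count and re-replication rate are assumed figures that you would replace with your own measurements:

# Illustrative back-of-envelope Hadoop sizing estimates (assumed inputs)

# Namenode heap: roughly 150 bytes of metadata per object (file, block, or directory)
objects = 100_000_000                 # assumed number of namespace objects
heap_gb = objects * 150 / (1024 ** 3)
print("Approximate Namenode heap needed: {:.1f} GB".format(heap_gb))

# Decommissioning a Datanode: time to re-replicate its data elsewhere
data_mb = 4 * 1024 * 1024             # 4 TB of data on the node, in MB
rate_mb_per_s = 200                   # assumed aggregate re-replication rate
hours = data_mb / rate_mb_per_s / 3600
print("Approximate re-replication time: {:.1f} hours".format(hours))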
Resources for Article: Further resources on this subject: Event detection from the news headlines in Hadoop [article] Sizing and Configuring your Hadoop Cluster [article] Getting Started with Apache Hadoop and Apache Spark [article]

Up and Running with pandas

Packt
18 Jul 2017
15 min read
In this article by Michael Heydt, author of the book Learning Pandas - Second Edition, we will cover how to install pandas and start using its basic functionality.  The content is provided as IPython and Jupyter notebooks, and hence we will also take a quick look at using both of those tools. We will utilize the Anaconda Scientific Python distribution from Continuum. Anaconda is a popular Python distribution with both free and paid components. Anaconda provides cross-platform support—including Windows, Mac, and Linux. The base distribution of Anaconda installs pandas, IPython and Jupyter notebooks, thereby making it almost trivial to get started. (For more resources related to this topic, see here.) IPython and Jupyter Notebook So far we have executed Python from the command line / terminal. This is the default Read-Eval-Print-Loop (REPL)that comes with Python. Let’s take a brief look at both. IPython IPython is an alternate shell for interactively working with Python. It provides several enhancements to the default REPL provided by Python. If you want to learn about IPython in more detail check out the documentation at https://ipython.org/ipython-doc/3/interactive/tutorial.html To start IPython, simply execute the ipython command from the command line/terminal. When started you will see something like the following: The input prompt which shows In [1]:. Each time you enter a statement in the IPythonREPL the number in the prompt will increase. Likewise, output from any particular entry you make will be prefaced with Out [x]:where x matches the number of the corresponding In [x]:.The following demonstrates: This numbering of in and out statements will be important to the examples as all examples will be prefaced with In [x]:and Out [x]: so that you can follow along. Note that these numberings are purely sequential. If you are following through the code in the text and occur errors in input or enter additional statements, the numbering may get out of sequence (they can be reset by exiting and restarting IPython).  Please use them purely as reference. Jupyter Notebook Jupyter Notebook is the evolution of IPython notebook. It is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and markdown. The original IPython notebook was constrained to Python as the only language. Jupyter Notebook has evolved to allow many programming languages to be used including Python, R, Julia, Scala, and F#. If you want to take a deeper look at Jupyter Notebook head to the following URL http://jupyter.org/: where you will be presented with a page similar to the following: Jupyter Notebookcan be downloaded and used independently of Python. Anaconda does installs it by default. To start a Jupyter Notebookissue the following command at the command line/Terminal: $ jupyter notebook To demonstrate, let's look at how to run the example code that comes with the text. Download the code from the Packt website and and unzip the file to a directory of your choosing. In the directory, you will see the following contents: Now issue the jupyter notebook command. You should see something similar to the following: A browser page will open displaying the Jupyter Notebook homepage which is http://localhost:8888/tree. This will open a web browser window showing this page, which will be a directory listing: Clicking on a .ipynb link opens a notebook page like the following: The notebook that is displayed is HTML that was generated by Jupyter and IPython.  
It consists of a number of cells that can be one of four types: code, markdown, raw nbconvert, or heading. Jupyter runs an IPython kernel for each notebook. Cells that contain Python code are executed within that kernel and the results are added to the notebook as HTML. Double-clicking on any of the cells will make the cell editable. When done editing the contents of a cell, press Shift + Enter, at which point Jupyter/IPython will evaluate the contents and display the results. If you want to learn more about the notebook format that underlies the pages, see https://ipython.org/ipython-doc/3/notebook/nbformat.html.
The toolbar at the top of a notebook gives you a number of abilities to manipulate the notebook. These include adding, removing, and moving cells up and down in the notebook. Also available are commands to run cells, rerun cells, and restart the underlying IPython kernel. To create a new notebook, go to the File > New Notebook > Python 3 menu item. A new notebook page will be created in a new browser tab. Its name will be Untitled:
The notebook consists of a single code cell that is ready to have Python entered. Enter 1+1 in the cell and press Shift + Enter to execute. The cell has been executed and the result shown as Out [2]:. Jupyter also opened a new cell for you to enter more code or markdown. Jupyter Notebook automatically saves your changes every minute, but it's still a good thing to save manually every once in a while.
One final point before we look at a little bit of pandas. Code in the text will be in the format of command-line IPython. As an example, the cell we just created in our notebook will be shown as follows:
In [1]: 1+1
Out [1]: 2
Introducing the pandas Series and DataFrame
Let's jump into using some pandas with a brief introduction to pandas' two main data structures, the Series and the DataFrame. We will examine the following:
Importing pandas into your application
Creating and manipulating a pandas Series
Creating and manipulating a pandas DataFrame
Loading data from a file into a DataFrame
The pandas Series
The pandas Series is the base data structure of pandas. A Series is similar to a NumPy array, but it differs by having an index, which allows for much richer lookup of items instead of just a zero-based array index value. The following creates a Series from a Python list:
In [2]:
# create a four item Series
s = Series([1, 2, 3, 4])
s
Out [2]:
0    1
1    2
2    3
3    4
dtype: int64
The output consists of two columns of information. The first is the index and the second is the data in the Series. Each row of the output represents the index label (in the first column) and then the value associated with that label. Because this Series was created without specifying an index (something we will do next), pandas automatically creates an integer index with labels starting at 0 and increasing by one for each data item. The values of a Series object can then be accessed using the [] operator, passing the label for the value you require. The following gets the value for the label 1:
In [3]: s[1]
Out [3]: 2
This looks very much like normal array access in many programming languages. But as we will see, the index does not have to start at 0, nor increment by one, and it can be of many other data types than just an integer. This ability to associate flexible indexes in this manner is one of the great superpowers of pandas. Multiple items can be retrieved by specifying their labels in a Python list.
The following retrieves the values at labels 1 and 3: In [4]: # return a Series with the row with labels 1 and 3 s[[1, 3]] Out [4]: 1 2 3 4 dtype: int64 A Series object can be created with a user-defined index by using the index parameter and specifying the index labels. The following creates a Series with the same valuesbut with an index consisting of string values: In [5]:# create a series using an explicit index s = pd.Series([1, 2, 3, 4], index = ['a', 'b', 'c', 'd']) s Out [5}:a 1 b 2 c 3 d 4 dtype: int64 Data in the Series object can now be accessed by those alphanumeric index labels. The following retrieves the values at index labels 'a' and 'd': In [6]:# look up items the series having index 'a' and 'd' s[['a', 'd']] Out [6]:a 1 d 4 dtype: int64 It is still possible to refer to the elements of this Series object by their numerical 0-based position. : In [7]:# passing a list of integers to a Series that has # non-integer index labels will look up based upon # 0-based index like an array s[[1, 2]] Out [7]:b 2 c 3 dtype: int64 We can examine the index of a Series using the .index property: In [8]:# get only the index of the Series s.index Out [8]: Index(['a', 'b', 'c', 'd'], dtype='object') The index is itself actually a pandas objectand this output shows us the values of the index and the data type used for the index. In this case note that the type of the data in the index (referred to as the dtype) is object and not string. A common usage of a Series in pandas is to represent a time-series that associates date/time index labels withvalues. The following demonstrates by creating a date range can using the pandas function pd.date_range(): In [9]:# create a Series who's index is a series of dates # between the two specified dates (inclusive) dates = pd.date_range('2016-04-01', '2016-04-06') dates Out [9]:DatetimeIndex(['2016-04-01', '2016-04-02', '2016- 04-03', '2016-04-04', '2016-04-05', '2016-04-06'], dtype='datetime64[ns]', freq='D') This has created a special index in pandas referred to as a DatetimeIndex which is a specialized type of pandas index that is optimized to index data with dates and times. Now let's create a Series using this index. The data values represent high-temperatures on those specific days: In [10]:# create a Series with values (representing temperatures) # for each date in the index temps1 = Series([80, 82, 85, 90, 83, 87], index = dates) temps1 Out [10]:2016-04-01 80 2016-04-02 82 2016-04-03 85 2016-04-04 90 2016-04-05 83 2016-04-06 87 Freq: D, dtype: int64 This type of Series with a DateTimeIndexis referred to as a time-series. We can look up a temperature on a specific data by using the date as a string: In [11]:temps1['2016-04-04'] Out [11]:90 Two Series objects can be applied to each other with an arithmetic operation. The following code creates a second Series and calculates the difference in temperature between the two: In [12]:# create a second series of values using the same index temps2 = Series([70, 75, 69, 83, 79, 77], index = dates) # the following aligns the two by their index values # and calculates the difference at those matching labels temp_diffs = temps1 - temps2 temp_diffs Out [12]:2016-04-01 10 2016-04-02 7 2016-04-03 16 2016-04-04 7 2016-04-05 4 2016-04-06 10 Freq: D, dtype: int64 The result of an arithmetic operation (+, -, /, *, …) on two Series objects that are non-scalar values returns another Series object. 
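One consequence of this label-based alignment is worth a brief illustrative aside (this example is not from the original text): if two Series do not share exactly the same index labels, the arithmetic still succeeds, and the non-matching positions simply come back as NaN:

# illustrative sketch of index alignment with partially matching labels
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])
print(s1 + s2)
# a     NaN
# b    12.0
# c    23.0
# d     NaN
# dtype: float64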
Since the index is not integer-based, we can also look up values by their 0-based position:
In [13]:
# and also possible by integer position as if the
# series was an array
temp_diffs[2]
Out [13]: 16
Finally, pandas provides many descriptive statistical methods. As an example, the following returns the mean of the temperature differences:
In [14]:
# calculate the mean of the values in the Series
temp_diffs.mean()
Out [14]: 9.0
The pandas DataFrame
A pandas Series can only have a single value associated with each index label. To have multiple values per index label we can use a DataFrame. A DataFrame represents one or more Series objects aligned by index label. Each Series will be a column in the DataFrame, and each column can have an associated name. In a way, a DataFrame is analogous to a relational database table in that it contains one or more columns of data of heterogeneous types (but a single type for all items in each respective column). The following creates a DataFrame object with two columns and uses the temperature Series objects:
In [15]:
# create a DataFrame from the two series objects temps1
# and temps2 and give them column names
temps_df = DataFrame(
    {'Missoula': temps1,
     'Philadelphia': temps2})
temps_df
Out [15]:
            Missoula  Philadelphia
2016-04-01        80            70
2016-04-02        82            75
2016-04-03        85            69
2016-04-04        90            83
2016-04-05        83            79
2016-04-06        87            77
The resulting DataFrame has two columns named Missoula and Philadelphia. These columns are new Series objects contained within the DataFrame, with the values copied from the original Series objects. Columns in a DataFrame object can be accessed using an array indexer [] with the name of the column or a list of column names. The following code retrieves the Missoula column:
In [16]:
# get the column with the name Missoula
temps_df['Missoula']
Out [16]:
2016-04-01    80
2016-04-02    82
2016-04-03    85
2016-04-04    90
2016-04-05    83
2016-04-06    87
Freq: D, Name: Missoula, dtype: int64
And the following code retrieves the Philadelphia column:
In [17]:
# likewise we can get just the Philadelphia column
temps_df['Philadelphia']
Out [17]:
2016-04-01    70
2016-04-02    75
2016-04-03    69
2016-04-04    83
2016-04-05    79
2016-04-06    77
Freq: D, Name: Philadelphia, dtype: int64
A Python list of column names can also be used to return multiple columns:
In [18]:
# return both columns in a different order
temps_df[['Philadelphia', 'Missoula']]
Out [18]:
            Philadelphia  Missoula
2016-04-01            70        80
2016-04-02            75        82
2016-04-03            69        85
2016-04-04            83        90
2016-04-05            79        83
2016-04-06            77        87
There is a subtle difference in a DataFrame object as compared to a Series object: passing a list to the [] operator of a DataFrame retrieves the specified columns, whereas a Series would return rows. If the name of a column does not have spaces, it can be accessed using property-style syntax:
In [19]:
# retrieve the Missoula column through property syntax
temps_df.Missoula
Out [19]:
2016-04-01    80
2016-04-02    82
2016-04-03    85
2016-04-04    90
2016-04-05    83
2016-04-06    87
Freq: D, Name: Missoula, dtype: int64
Arithmetic operations between columns within a DataFrame are identical in operation to those on multiple Series.
To demonstrate, the following code calculates the difference between temperatures using property notation: In [20]:# calculate the temperature difference between the two #cities temps_df.Missoula - temps_df.Philadelphia Out [20]:2016-04-01 10 2016-04-02 7 2016-04-03 16 2016-04-04 7 2016-04-05 4 2016-04-06 10 Freq: D, dtype: int64 A new column can be added to DataFrame simply by assigning another Series to a column using the array indexer [] notation. The following adds a new column in the DataFrame with the temperature differences: In [21]:# add a column to temp_df which contains the difference # in temps temps_df['Difference'] = temp_diffs temps_df Out [21]: Missoula Philadelphia Difference 2016-04-01 80 70 10 2016-04-02 82 75 7 2016-04-03 85 69 16 2016-04-04 90 83 7 2016-04-05 83 79 4 2016-04-06 87 77 10 The names of the columns in a DataFrame are accessible via the.columns property. : In [22]:# get the columns, which is also an Index object temps_df.columns Out [22]:Index(['Missoula', 'Philadelphia', 'Difference'], dtype='object') The DataFrame (and Series) objects can be sliced to retrieve specific rows. The following slices the second through fourth rows of temperature difference values: In [23]:# slice the temp differences column for the rows at # location 1 through 4 (as though it is an array) temps_df.Difference[1:4] Out [23]: 2016-04-02 7 2016-04-03 16 2016-04-04 7 Freq: D, Name: Difference, dtype: int64 Entire rows from a DataFrame can be retrieved using the .loc and .iloc properties. .loc ensures that the lookup is by index label, where .iloc uses the 0-based position. - The following retrieves the second row of the DataFrame. In[24]:# get the row at array position 1 temps_df.iloc[1] Out [24]:Missoula 82 Philadelphia 75 Difference 7 Name: 2016-04-02 00:00:00, dtype: int64 Notice that this result has converted the row into a Series with the column names of the DataFrame pivoted into the index labels of the resulting Series.The following shows the resulting index of the result. In [25]:# the names of the columns have become the index # they have been 'pivoted' temps_df.iloc[1].index Out [25]:Index(['Missoula', 'Philadelphia', 'Difference'], dtype='object') Rows can be explicitly accessed via index label using the .loc property. The following code retrieves a row by the index label: In [26]:# retrieve row by index label using .loc temps_df.loc['2016-04-05'] Out [26]:Missoula 83 Philadelphia 79 Difference 4 Name: 2016-04-05 00:00:00, dtype: int64 Specific rows in a DataFrame object can be selected using a list of integer positions. The following selects the values from the Differencecolumn in rows at integer-locations 1, 3, and 5: In [27]:# get the values in the Differences column in rows 1, 3 # and 5using 0-based location temps_df.iloc[[1, 3, 5]].Difference Out [27]:2016-04-02 7 2016-04-04 7 2016-04-06 10 Freq: 2D, Name: Difference, dtype: int64 Rows of a DataFrame can be selected based upon a logical expression that is applied to the data in each row. The following shows values in the Missoulacolumn that are greater than 82 degrees: In [28]:# which values in the Missoula column are > 82? 
temps_df.Missoula > 82
Out [28]:
2016-04-01    False
2016-04-02    False
2016-04-03     True
2016-04-04     True
2016-04-05     True
2016-04-06     True
Freq: D, Name: Missoula, dtype: bool
The results from an expression can then be applied to the [] operator of a DataFrame (and a Series), which results in only the rows where the expression evaluated to True being returned:
In [29]:
# return the rows where the temps for Missoula > 82
temps_df[temps_df.Missoula > 82]
Out [29]:
            Missoula  Philadelphia  Difference
2016-04-03        85            69          16
2016-04-04        90            83           7
2016-04-05        83            79           4
2016-04-06        87            77          10
This technique is referred to as Boolean selection in pandas terminology, and it will form the basis of selecting rows based upon values in specific columns (like a query in SQL using a WHERE clause, but as we will see, also much more powerful).
Visualization
We will dive into visualization in quite some depth in Chapter 14, Visualization, but prior to then we will occasionally perform a quick visualization of data in pandas. Creating a visualization of data is quite simple with pandas. All that needs to be done is to call the .plot() method. The following demonstrates by plotting the Close value of the stock data.
In [40]: df[['Close']].plot();
Summary
In this article, we took an introductory look at the pandas Series and DataFrame objects, demonstrating some of the fundamental capabilities. This exposition showed you how to perform a few basic operations that you can use to get up and running with pandas prior to diving in and learning all the details.
Resources for Article:
Further resources on this subject:
Using indexes to manipulate pandas objects [article]
Predicting Sports Winners with Decision Trees and pandas [article]
The pandas Data Structures [article]

Introduction to IOT

Packt
17 Jul 2017
11 min read
In this article by Kallol Bosu Roy Choudhuri, the author of the book Learn Arduino Prototyping in 10 days, we will learn about IoT. (For more resources related to this topic, see here.) As per Gartner, the number of connected devices around the world is going to reach 50 billion by the year 2020. Just imagine the magnitude and scale of the hyper-connectedness that is being forged every moment, as we read through this exciting article. Figure 1: A typical IoT scenario (automobile example) As we can see in the preceding figure, a typical IoT-based scenario is composed of the following fundamental building blocks: IoT edge device IoT cloud platform An IoT device is used to serve as a bridge between existing machines on the ground and an IoT cloud platform. The IoT cloud platform provides a cloud-based infrastructure backbone for data acquisition, data storage, and computing power for data analytics and reporting. The Arduino platform can be effectively used for prototyping IoT devices for almost any IoT solution very rapidly. Building the edge device In this section, we will learn how to use the ESP8266 Wi-Fi chip with the Arduino Uno for connecting to the Internet and posting data to an IoT cloud. There are numerous IoT cloud players in the market today, including Microsoft Azure and Amazon IoT. In this article, we will use the ThingSpeak IoT platform that is very simple to use with the Arduino platform. The following parts will be required for this prototype: 1 Arduino Uno R3 1 USB Cable 1 ESP8266-01 Wi-Fi Transceiver module 1 Breadboard 1 pc. 1K Ohms resistor 1 pc. 2k ohms resistor Some jumper wires Once all the parts have been assembled, follow the breadboard circuit shown in the following figure and build the edge device: Figure 2: ESP8266 with Arduino Uno Wiring The important facts to remember in the preceding setup are: The RXD pin of the ESP8266 chip should receive a 3.3V input signal. We have ensured this by employing the voltage division method. For test purposes, the preceding setup should work fine. However, the ESP8266 chip is demanding when it comes to power (read current) consumption, especially during transmission cycles. Just in case the ESP8266 chip does not respond to the Arduino sketch or AT commands properly, then the power supply may not be enough. Try using a separate battery for the setup. When using a separate battery, remember to use a voltage regulator that steps down the voltage to 3.3 volts before supplying the ESP8266 chip. For prolonged usage, a separate battery based power supply is recommended. Smart retail project inspiration In the previous sections, we looked at the basics of achieving IoT prototyping with the Arduino platform and an IoT cloud platform. With this basic knowledge, you are encouraged to start exploring more advanced IoT scenarios. As a future inspiration, the following smart retail project idea is being provided for you to try by applying the basic principles that you have learned in this article. After all, the goal of this article has been to show you the light and make you self-reliant with the Arduino platform. Imagine a large retail store where products are displayed in hundreds of aisles and thousands of racks. Such large warehouse-type retail layouts are common in some countries, usually with furniture sellers. One of the time-consuming tasks that these retail shops face is to keep the price of their displayed inventory matched with the ever changing competitive market rates. 
For example, the price of a sofa set could be marked at 350 dollars on aisle number 47 rack number 1. Now let's think from a customer’s perspective. Imagine being a potential customer, standing in front of the sofa set we would naturally search for the prices of that sofa set on the Internet. It would not be very surprising to find a similar sofa set that is priced a little lower, maybe at 325 dollars at another store. That is exactly how and when a potential customer would change her mind. The story after this is simple. The customer leaves store A, goes to store B and purchases the sofa set at 325 dollars. In order to address these types of lost sale opportunities, the furniture company management decides to lower the prices of the sofa set to 325 dollars, in order to match the competition. Thereafter, all that needs to be done is for someone to change the price label for the sofa set in aisle number 47 rack number 1, which is a 5–10 minute walk (considering the size of the shop floor) from the shop back office. In a localized store, it is still achievable without further loss of customers. Now, let's appreciate the problem by thinking hyperscale. The furniture seller’s management is located centrally, say in Sweden and they want to dynamically change the product prices for specific price-sensitive products that are displayed across more than 350 stores, in more than 40 countries. The price change should be automatic, near real time, and simultaneous across all company stores. Given the preceding problem statement, it seems a daunting task that could leave hundreds for shop floor staff scampering all day long, just to keep changing price tags for hundreds of products. An elegant solution to this problem lies in the Internet-of-Things. Figure 3: Smart retail with IoT Referring to the preceding figure, it depicts a basic IoT solution for addressing the furniture seller’s problem to matching product prices dynamically on the fly. Remember, since this is an Arduino prototyping article, the stress is on edge device prototyping. However, IoT as a whole; encompasses many areas that include edge devices and cloud platform capabilities. Our focus in this section is to be able to comprehend and build the edge device prototype that supports this smart retail IoT solution. In the preceding solution, the cloud platform takes care of running intelligent program batches to continuously analyze market rates for price-sensitive products. Once a new price has been determined by the cloud job/program, the new product prices are updated in the cloud-hosted database. Immediately after a new price change, all the IoT edge devices attached to the price tag of specific products in the company stores are notified of the new price. We can build a smart LCD panel for displaying product prices. For Internet connectivity, we can reuse the ESP8266 Wi-Fi chip that we learned in this article. Standalone Devices We are already familiar with the basic parts required for building a prototype. The two new aspects to consider when building a standalone project are an independent power source and a project container. Figure 4 - Typical parts of a Standalone Prototype As shown above, typically a standalone device prototype will contain the following parts: The device prototype (Arduino board + Peripherals + all the required Connections) An independent power source A project container/box After the basic prototype has been built the next consideration is to make it operable on its own, like an island. 
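As a rough illustration of the display side of such a smart price tag, the following sketch drives a 16x2 LCD with the LiquidCrystal library and prints a price value. The LCD pin assignments and the hard-coded price are assumptions for illustration only; in a real device the value would be refreshed over the ESP8266 link (for example, by reading a field from the cloud platform) rather than being fixed in the sketch.

```cpp
#include <LiquidCrystal.h>

// Assumed wiring (not from the original article): RS=12, EN=11, D4=5, D5=4, D6=3, D7=2.
LiquidCrystal lcd(12, 11, 5, 4, 3, 2);

// In the full solution this value would be fetched over the ESP8266 link;
// it is hard-coded here purely for illustration.
float currentPrice = 350.00;

void setup() {
  lcd.begin(16, 2);                 // 16 columns x 2 rows
  lcd.print("Sofa set");
}

void loop() {
  // A hypothetical fetchLatestPrice() helper would update currentPrice here.
  lcd.setCursor(0, 1);
  lcd.print("Price: $");
  lcd.print(currentPrice, 2);       // two decimal places
  delay(5000);                      // refresh every few seconds
}
```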
Standalone devices

We are already familiar with the basic parts required for building a prototype. The two new aspects to consider when building a standalone project are an independent power source and a project container.

Figure 4 - Typical parts of a standalone prototype

As shown above, a standalone device prototype will typically contain the following parts:

- The device prototype (Arduino board + peripherals + all the required connections)
- An independent power source
- A project container/box

After the basic prototype has been built, the next consideration is to make it operable on its own, like an island. This is because in real-world situations you will often have to make a device that is not directly connected to, and powered from, a computer. Therefore, we need to understand the various options that are available for powering our device prototypes, and also when to choose which option. The second aspect to consider is an appropriate physical container to house the various parts of a project. A physical container ensures that all parts of a project are packaged in a safe and aesthetic manner.

A distance measurement device

Let's build an exciting project by combining an ultrasonic sensor with a 16x2 LCD character display to build an electronic distance measurement device. We will use one of the most easily available 9-volt batteries for powering this standalone device prototype. For building the distance measurement device, the following parts will be required:

- 1 Arduino Uno R3
- 1 USB connector
- 1 pc. 9-volt battery
- 1 full-sized breadboard
- 1 HC-SR04 ultrasonic sensor
- 1 pc. 16x2 LCD character display
- 1 pc. 10K potentiometer
- 2 pcs. 220 ohm resistors
- 1 pc. 150 ohm resistor
- Some jumper wires

Before we start building the device, let's understand what the device will do and the various parts involved. The purpose of the device is to measure the distance of an object from the device. The following diagram depicts an overview of the device:

Figure 5 - A standalone distance measurement device overview

First, let's quickly understand each of the components involved in the preceding setup. Then, we will jump into hands-on prototyping and coding. The ultrasonic sensor model used in this example is known as HC-SR04. The HC-SR04 is a standard, commercially available ultrasonic transceiver. A transceiver is a device that is capable of transmitting as well as receiving signals. The HC-SR04 transceiver transmits ultrasonic signals; once the signals hit an object or obstacle, they echo back to the HC-SR04. The HC-SR04 ultrasonic module is shown below for reference.

Figure 6 - The HC-SR04 ultrasonic module

The HC-SR04 has four pins. Their usage is explained below:

- Vcc: This pin is connected to a 5-volt power supply.
- Trig: This pin receives digital signals from the attached microcontroller unit in order to send out an ultrasonic burst.
- Echo: This pin outputs the measured time duration, which is proportional to the distance travelled by the ultrasonic burst.
- Gnd: This pin is connected to the ground terminal.

The total time taken for the ultrasonic signals to echo back from an obstacle can be divided by 2, and then, based on the speed of sound in air, the distance between the object and the HC-SR04 can be calculated. We will see how to calculate the distance in the sketch for this device prototype. As per the HC-SR04 data sheet, it is a 5-volt tolerant device, operating at 15 mA, with a measurement range starting from 2 centimeters up to a maximum of 4 meters. The HC-SR04 can be directly connected to the Arduino board pins.

The 16x2 LCD character display is also a standard, commercially available device; it has 16 columns and 2 rows. The LCD is controlled through its 4 data pins/lines. We will also see how to send string output to the LCD from the Arduino sketch. The power supply used in today's example is a standard 9-volt battery plugged into the Arduino's DC IN jack. Alternatively, you can use 6 pieces of either AA-sized or AAA-sized batteries in series and plug them into the VIN pin of the Arduino board.
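To make the timing-to-distance calculation concrete before wiring up the full circuit, here is a minimal sketch of the measurement loop. It uses the Trig and Echo pin assignments described in the next section (D8 and D7), while the LCD pin assignments are assumptions for illustration; treat it as a sketch of the idea rather than the book's full listing. The conversion uses the speed of sound in air, roughly 343 m/s, which is about 0.0343 cm per microsecond.

```cpp
#include <LiquidCrystal.h>

const int TRIG_PIN = 8;   // matches the circuit description: Trig driven from D8
const int ECHO_PIN = 7;   // Echo read on D7
// The LCD pin assignments below are assumptions for illustration only.
LiquidCrystal lcd(12, 11, 5, 4, 3, 2);

void setup() {
  pinMode(TRIG_PIN, OUTPUT);
  pinMode(ECHO_PIN, INPUT);
  lcd.begin(16, 2);
  lcd.print("Distance:");
}

void loop() {
  // Trigger a 10 microsecond ultrasonic burst.
  digitalWrite(TRIG_PIN, LOW);
  delayMicroseconds(2);
  digitalWrite(TRIG_PIN, HIGH);
  delayMicroseconds(10);
  digitalWrite(TRIG_PIN, LOW);

  // Echo stays HIGH for the round-trip time of the burst, in microseconds.
  unsigned long duration = pulseIn(ECHO_PIN, HIGH, 30000UL);  // time out beyond ~4 m

  // Divide by 2 for the one-way time, then multiply by the speed of sound
  // (~343 m/s, that is ~0.0343 cm per microsecond).
  float distanceCm = (duration / 2.0) * 0.0343;

  lcd.setCursor(0, 1);
  lcd.print("                ");   // clear the second row
  lcd.setCursor(0, 1);
  if (duration == 0) {
    lcd.print("Out of range");
  } else {
    lcd.print(distanceCm, 1);
    lcd.print(" cm");
  }
  delay(500);
}
```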
Distance measurement device circuit

Follow the breadboard diagram shown next to build the distance measurement device. The diagram shown on the next page is quite complex; take your time as you work through it. All the components (including the Arduino board) in this prototype are powered from the 9-volt battery.

Sometimes the LCD procured online might not ship with soldered header pins. In such a case, you will have to solder 16 header pins. It is very important to note that unless the header pins are soldered properly into the LCD board, the LCD screen will not work correctly. This is a challenging prototype to get working in one go, so make sure there are no loose jumper wires. Notice how the positive and negative terminals of the power source are plugged into the VIN and GND pins of the Arduino board respectively.

The 10K potentiometer has three legs. If you look straight at the breadboard diagram, the left-hand leg of the potentiometer is connected to the 5V power supply rail of the breadboard.

Figure 7 - Typical potentiometers

The right-hand leg is connected to the common ground rail of the breadboard. The leg in the middle is the regulated output (via the potentiometer's 10K resistance dial) that controls the LCD's V0/VEE pin. Basically, this pin controls the contrast of the display. You will also have to adjust the 10K potentiometer dial (a simple screwdriver may be used) to around halfway, at 5K, to make the characters visible on the LCD screen. Initially, you may not see anything on the LCD until the potentiometer is adjusted properly.

Figure 8 - Distance measurement device prototype

When the Trig pin receives a signal (via pin D8 in this example), the sensor sends out ultrasonic waves to the surroundings. As soon as the ultrasonic waves collide with an obstacle, they get reflected, and the reflected waves are received by the HC-SR04 sensor. The Echo pin is used to read the output of the ultrasonic sensor (via pin D7 in this example). The output read from the Echo pin is processed by the Arduino sketch to calculate the distance.

Summary

Thus, an IoT device serves as a bridge between existing machines on the ground and an IoT cloud platform.

Resources for Article:

Further resources on this subject:

Introducing IoT with Particle's Photon and Electron [article]

IoT and Decision Science [article]

IoT Analytics for the Cloud [article]

Initial Configuration of SCO 2016

Packt
17 Jul 2017
13 min read
In this article by Michael Seidl, author of the book Microsoft System Center 2016 Orchestrator Cookbook - Second Edition, will show you how to setup Orchestrator Environment and how to deploy and configure Orchestrator Integration Packs. (For more resources related to this topic, see here.) Deploying an additional Runbook designer Runbook designer is the key feature to build your Runbooks. After the initial installation, Runbook designer is installed on the server. For your daily work with orchestrator and Runbooks, you would like to install the Runbook designer on your client or on admin server. We will go through these steps in this recipe. Getting ready You must review the planning the Orchestrator deployment recipe before performing the steps in this recipe. There are a number of dependencies in the planning recipe you must perform in order to successfully complete the tasks in this recipe. You must install a management server before you can install the additional Runbook Designers. The user account performing the installation has administrative privileges on the server nominated for the SCO deployment and must also be a member of OrchestratorUsersGroup or equivalent rights. The example deployment in this recipe is based on the following configuration details: Management server called TLSCO01 with a remote database is already installed System Center 2016 Orchestrator How to do it... The Runbook designer is used to build Runbooks using standard activities and or integration pack activities. The designer can be installed on either a server class operating system or a client class operating system. Follow these steps to deploy an additional Runbook Designer using the deployment manager: Install a supported operating system and join the active directory domain in scope of the SCO deployment. In this recipe the operating system is Windows 10. Ensure you configure the allowed ports and services if the local firewall is enabled for the domain profile. See the following link for details: https://technet.microsoft.com/en-us/library/hh420382(v=sc.12).aspx. Log in to the SCO Management server with a user account with SCO administrative rights. Launch System Center 2016 Orchestrator Deployment Manager: Right-click on Runbook designers, and select Deploy new Runbook Designer: Click on Next on the welcome page. Type the computer name in the Computer field and click on Add. Click on Next. On the Deploy Integration Packs or Hotfixes page check all the integration packs required by the user of the Runbook designer (for this example we will select the AD IP). Click on Next. Click on Finish to begin the installation using the Deployment Manager. How it works... The Deployment Manager is a great option for scaling out your Runbook Servers and also for distributing the Runbook Designer without the need for the installation media. In both cases the Deployment Manager connects to the Management Server and the database server to configure the necessary settings. On the target system the deployment manager installs the required binaries and optionally deploys the integration packs selected. Using the Deployment Manager provides a consistent and coordinated approach to scaling out the components of a SCO deployment. 
See also The following official web link is a great source of the most up to date information on SCO: https://docs.microsoft.com/en-us/system-center/orchestrator/ Registering an SCO Integration Pack Microsoft System Center 2016 Orchestrator (SCO) automation is driven by process automation components. These process automation components are similar in concept to a physical toolbox. In a toolbox you typically have different types of tools which enable you to build what you desire. In the context of SCO these tools are known as Activities. Activities fall into two main categories: Built-in Standard Activities: These are the default activity categories available to you in the Runbook Designer. The standard activities on their own provide you with a set of components to create very powerful Runbooks. Integration Pack Activities: Integration Pack Activities are provided either by Microsoft, the community, solution integration organizations, or are custom created by using the Orchestrator Integration Pack Toolkit. These activities provide you with the Runbook components to interface with the target environment of the IP. For example, the Active Directory IP has the activities you can perform in the target Active Directory environment. This recipe provides the steps to find and register the second type of activities into your default implementation of SCO. Getting ready You must download the Integration Pack(s) you plan to deploy from the provider of the IP. In this example we will be deploying the Active Directory IP, which can be found at the following link: https://www.microsoft.com/en-us/download/details.aspx?id=54098. You must have deployed a System Center 2016 Orchestrator environment and have full administrative rights in the environment. How to do it... The following diagram provides a visual summary and order of the tasks you need to perform to complete this recipe: We will deploy the Microsoft Active Directory (AD) integration pack (IP). Integration pack organization A good practice is to create a folder structure for your integration packs. The folders should reflect versions of the IPs for logical grouping and management. The version of the IP will be visible in the console and as such you must perform this step after you have performed the step to load the IP(s). This approach will aid in change management when updating IPs in multiple environments. Follow these steps to deploy the Active Directory integration pack. Identify the source location for the Integration Pack in scope (for example, the AD IP for SCO2016). Download the IP to a local directory on the Management Server or UNC share. Log in to the SCO Management server. Launch the Deployment Manager: Under Orchestrator Management Server, right-click on Integration Packs. Select Register IP with the Orchestrator Management Server: Click on Next on the welcome page. Click on Add on the Select Integration Packs or Hotfixes page. Navigate to the directory where the target IP is located, click on Open, and then click on Next. Click on Finish . Click on Accept on End-User License Agreement to complete the registration. Click on Refresh to validate if the IP has successfully been registered. How it works... The process of loading an integration pack is simple. The prerequisite for successfully registering the IP (loading) is ensuring you have downloaded a supported IP to a location accessible to the SCO management server. Additionally the person performing the registration must be a SCO administrator. 
At this point, we have registered the Integration Pack using the Deployment Manager. Two steps are still necessary before we can use the Integration Pack; see the following recipes for these.

There's more...

Registering the IP is the first part of the process of making the IP activities available to Runbook Designers and Runbook Servers. The next step is the deployment of the Integration Pack to the Runbook Designer; see the next recipe for that.

Orchestrator Integration Packs are provided not only by Microsoft; third-party companies such as Cisco or NetApp also provide OIPs for their products. Additionally, there is a large community providing Orchestrator Integration Packs. There are several sources for downloading Integration Packs; here are some useful links:

http://www.techguy.at/liste-mit-integration-packs-fuer-system-center-orchestrator/

http://scorch.codeplex.com/

https://www.microsoft.com/en-us/download/details.aspx?id=54098

Deploying the IP to designers and Runbook servers

Registering the Orchestrator Integration Pack is only the first step; you also need to deploy the OIP to your Runbook Designer or Runbook Server.

Getting ready

You have to follow the steps described in the Registering an SCO Integration Pack recipe before you can start with the next steps to deploy an OIP.

How to do it...

In our example, we will deploy the Active Directory Integration Pack to our Runbook Designer. Once the IP in scope (the AD IP in our example) has successfully been registered, follow these steps to deploy it to the Runbook Designers and Runbook Servers:

1. Log in to the SCO Management server and launch the Deployment Manager.
2. Under Orchestrator Management Server, right-click on the Integration Pack in scope and select Deploy IP to Runbook Server or Runbook Designer.
3. Click on Next on the welcome page, select the IP you would like to deploy (in our example, System Center Integration Pack for Active Directory), and then click on Next.
4. On the Computer Selection page, type the name of the Runbook Server or Designer in scope and click on Add (repeat for all servers in scope).
5. On the Installation Options page you have the following three options:
- Schedule the Installation: Select this option if you want to schedule the deployment for a specific time. You still have to select one of the next two options.
- Stop all running Runbooks before installing the Integration Packs or Hotfixes: This option will, as described, stop all current Runbooks in the environment.
- Install the Integration Packs or Hotfixes without stopping the running Runbooks: This is the preferred option if you want to have a controlled deployment without impacting current jobs.
6. Click on Next after making your installation option selection.
7. Click on Finish.

The Integration Pack will be deployed to all selected Designers and Runbook Servers. You must close all Runbook Designer consoles and re-launch them to see the newly deployed Integration Pack.

How it works...

The process of deploying an Integration Pack is simple. The prerequisite for successfully deploying the IP is ensuring you have registered a supported IP in the SCO management server. Now we have successfully deployed an Orchestrator Integration Pack. If you have deployed it to a Runbook Designer, make sure you close and reopen the Designer to be able to use the activities in this Integration Pack.
Now you are able to use these activities to build your Runbooks; the only thing left to do is to follow the next recipe and configure this Integration Pack. These steps can be used for every single Integration Pack, and you can also deploy multiple OIPs with one deployment.

There's more...

You have to deploy an OIP to every single Designer and Runbook Server where you want to work with its activities. It does not matter whether you want to edit a Runbook with the Designer or run a Runbook on a specific Runbook Server; the OIP has to be deployed to both. With the Orchestrator Deployment Manager, this is an easy task to do.

Initial Integration Pack configuration

This recipe provides the steps required to configure an Integration Pack for use once it has been successfully deployed to a Runbook Designer.

Getting ready

You must deploy an Orchestrator environment and also deploy the IP you plan to configure to a Runbook Designer before following the steps in this recipe. The authors assume the user account performing the installation has administrative privileges on the server nominated for the SCO Runbook Designer.

How to do it...

Each Integration Pack serves as an interface to the actions SCO can perform in the target environment. In our example, we will be focusing on the Active Directory connector. We will have two accounts under two categories of AD tasks in our scenario:

IP name | Category of actions | Account name
Active Directory | Domain Account Management | SCOAD_ACCMGT
Active Directory | Domain Administrator Management | SCOAD_DOMADMIN

The following diagram provides a visual summary and order of the tasks you need to perform to complete this recipe:

Follow these steps to complete the configuration of the Active Directory IP options in the Runbook Designer:

1. Create or identify an existing account for the IP tasks. In our example, we are using two accounts to represent two personas of a typical Active Directory delegation model: SCOAD_ACCMGT is an account with the rights to perform account management tasks only, and SCOAD_DOMADMIN is a domain admin account for elevated tasks in Active Directory.
2. Launch the Runbook Designer as an SCO administrator, select Options from the menu bar, and select the IP to configure (in our example, Active Directory).
3. Click on Add, type AD Account Management in the Name: field, and select Microsoft Active Directory Domain Configuration in the Type field.
4. In the Properties section, type the following:
- Configuration User Name: SCOAD_ACCMGT
- Configuration Password: Enter the password for SCOAD_ACCMGT
- Configuration Domain Controller Name (FQDN): The FQDN of an accessible domain controller in the target AD (in this example, TLDC01.TRUSTLAB.LOCAL)
- Configuration Default Parent Container: This is an optional field. Leave it blank.
5. Click on OK.
6. Repeat steps 3 and 4 for the domain admin account and click on Finish to complete the configuration.

How it works...

The IP configuration is unique for each system environment SCO interfaces with for the tasks in scope of automation. The Active Directory IP configuration grants SCO the rights to perform the actions specified in the Runbook using the activities of the IP. Typical Active Directory activities include, but are not limited to, creating user and computer accounts, moving user and computer accounts into organizational units, or deleting user and computer accounts. In our example, we created two connection account configurations for the following reasons: to follow the guidance of scoping automation to the rights of the manual processes.
If we use the example of a Runbook for creating user accounts we do not need domain admin access. A service desk user performing the same action manually would typically be granted only account management rights in AD. We have more flexibility with delegating management and access to Runbooks. Runbooks with elevated rights through the connection configuration can be separated from Runbooks with lower rights using folder security. The configuration requires planning and understanding of its implication before implementing. Each IP has its own unique options which you must specify before you create Runbooks using the specified IP. The default IPs that you can download from Microsoft include the documentation on the properties you must set. There’s more… As you have seen in this recipe, we need to configure each additional Integration Pack with a Connections String, User and Password. The built in Activities from SCO, are using the Service Account rights to perform this Actions, or you can configure a different User for most of the built in Activities.  See also The official online documentation for Microsoft Integration Packs is updated regularly and should be a point for reference at https://www.microsoft.com/en-us/download/details.aspx?id=54098 The creating and maintaining a security model for Orchestrator in this article expands further on the delegation model in SCO. Summary In this article, we have covered the following: Deploying an Additional Runbook Designer Registering an SCO Integration Pack Deploying an SCO Integration Pack to Runbook Designer and Server Initial Integration Pack Configuration Resources for Article: Further resources on this subject: Deploying the Orchestrator Appliance [article] Unpacking System Center 2012 Orchestrator [article] Planning a Compliance Program in Microsoft System Center 2012 [article]

Introduction to Moodle 3

Packt
17 Jul 2017
13 min read
In this article, Ian Wild, the author of the book, Moodle 3.x Developer's Guide will be intoroducing you to Moodle 3.For any organization considering implementing an online learning environment, Moodle is often the number one choice. Key to its success is the free, open source ethos that underpins it. Not only is the Moodle source code fully available to developers, but Moodle itself has been developed to allow for the inclusion of third party plugins. Everything from how users access the platform, the kinds of teaching interactions that are available, through to how attendance and success can be reported – in fact all the key Moodle functionality – can be adapted and enhanced through plugins. (For more resources related to this topic, see here.) What is Moodle? There are three reasons why Moodle has become so important, and much talked about, in the world of online learning, one technical, one philosophical and the third educational. From a technical standpoint Moodle - an acronym for Modular Object Oriented Dynamic Learning Environment – is highly customizable. As a Moodle developer, always remember that the ‘M’ in Moodle stands for modular. If you are faced with a client feature request that demands a feature Moodle doesn’t support then don’t panic. The answer is simple: we create a new custom plugin to implement it. Check out the Moodle Plugins Directory (https://moodle.org/plugins/) for a comprehensive library of supported 3rd party plugins that Moodle developers have created and given back to the community. And this leads to the philosophical reason why Moodle dominates. Free open source software for education Moodle is grounded firmly in a community-based, open source philosophy (see https://en.wikipedia.org/wiki/Open-source_model). But what does this mean for developers? Fundamentally, it means that we have complete access to the source code and, within reason, unfettered access to the people who develop it. Access to the application itself is free – you don’t need to pay to download it and you don’t need to pay to run it. But be aware of what ‘free’ means in this context. Hosting and administration, for example, take time and resources and are very unlikely to be free. As an educational tool, Moodle was developed to support social constructionism (see https://docs.moodle.org/31/en/Pedagogy) – if you are not familiar with this concept then it is essentially suggesting that building an understanding of a concept or idea can be best achieved by interacting with a broad community. The impact on us as Moodle plugin developers is that there is a highly active group of users and developers. Before you begin developing any Moodle plugins come and join us at https://moodle.org. Plugin Development – Authentication In this article, we will be developing a novel plugin that will seamlessly integrate Moodle and the WordPress content management system. Our plugin will authorize users via WordPress when they click on a link to Moodle when on a WordPress page. The plugin discussed in this article has already been released to the Moodle community – check out the Moodle Plugins Directory at https://moodle.org/plugins/auth_wordpress for details. Let us start by learning what Moodle authentication is and how new user accounts are created. Authentication Moodle supports a range of different authentication methods out of the box, each one supported by its own plugin. 
To go to the list of available plugins, from the Administration block, click on Site administration, click Plugins, then click Authentication, and finally click on Manage authentication. The list of currently installed authentication plugins is displayed: Each plugin interfaces with an internal Application Programming Interface (API), the Access API – see the Moodle developer documentation for details here: https://docs.moodle.org/dev/Access_API Getting logged in There are two ways of prompting the Moodle authentication process: Attempting to log in from the log in page. Clicking on a link to a protected resource (i.e. a page or file that you can’t view or download without logging in). For an overview of the process, take a look in the developer documentation at https://docs.moodle.org/dev/Authentication_plugins#Overview_of_Moodle_authentication_process. After checks to determine if an upgrade is required (or if we are partway through the upgrade process), there is a short fragment of code that loads the configured authentication plugins and for each one calls a special method called loginpage_hook(): $authsequence = get_enabled_auth_plugins(true); // auths, in sequence foreach($authsequence as $authname) { $authplugin = get_auth_plugin($authname); $authplugin->loginpage_hook(); } The loginpage_hook() function gives each authentication plugin the chance to intercept the login. Assuming that the login has not been intercepted, the process then continues with a check to ensure the supplied username conforms to the configured standard before calling authenticate_user_login() which, if successful, returns a $user object. OAuth Overview The OAuth authentication mechanism provides secure delegated access. OAuth supports a number of scenarios, including: A client requests access from a server and the server responds with either a ‘confirm’ or ‘deny’. This is called two legged authentication A client requests access from a server and the server, the server then pops up a confirmation dialog so that the user can authorize the access, and then it finally responds with either a ‘confirm’ or ‘deny’. This is called three legged authentication In this article we will be implementing such a mechanism means, in practice, that: An authentication server will only talk to configured clients No passwords are exchanged between server and client – only tokens are exchanged, which are meaningless on their own By default, users need to give permission before resources are accessed Having given an overview, here is the process again described in a little more detail: A new client is configured in the authentication server. A client is allocated a unique client key, along with a secret token (referred to as client secret) The client POSTs an HTTP request to the server (identifying itself using the client key and client secret) and the server responds with a temporary access token. This token is used to request authorization to access protected resources from the server. In this case ‘protected resources’ mean the WordPress API. Access to the WordPress API will allow us to determine details of the currently logged in user. The server responds not with an HTTP response but by POSTing new permanent tokens back to the client via a callback URI (i.e. the server talks to the client directly in order to ensure security). The process ends with the client possessing permanent authorization tokens that can be used to access WP-API functions. 
Obviously, the most effective way of learning about this process is to implement it so let’s go ahead and do that now. Installing the WordPress OAuth 1.0a server The first step will be to add the OAuth 1.0a server plugin to WordPress. Why not the more recent OAuth 2.0 server plugin? This is because 2.0 only supports https:// and not http://. Also, internally (at least at time of writing) WordPress will only authenticate internally using either OAuth 1.0a or cookies. Log into WordPress as an administrator and, from the Dashboard, hover the mouse over the Plugins menu item and click on Installed Plugins. The Plugins page is displayed. At the top of the page, press the Add New button: As described previously, ensure that you install version 1.0a and not 2.0: Once installed, we need to configure a new client. From the Dashboard menu, hover the mouse over Users and you will see a new Applications menu item has been added. Click on this to display the Registered Applications page. Click the Add New button to display the Add Application page: The Consumer Name is the title for our client that will appear in the Applications page, and Description is a brief explanation of that client to aid with the identification. The Callback is the URI that WordPress will talk to (refer to the outline of the OAuth authentication steps). As we have not yet developed the Moodle/OAuth client end yet you can specify oob in Callback (this stands for ‘out of band’). Once configured, WordPress will generate new OAuth credentials, a Client Key and a Client Secret: Having installed and configured the server end now it’s time to develop the client. Creating a new Moodle auth plugin Before we begin, download the finished plugin from https://github.com/iandavidwild/moodle-auth_wordpress and install it in your local development Moodle instance. The development of a new authentication plugin is described in the developer documentation at https://docs.moodle.org/dev/Authentication_plugins. As described there, let us start with copying the none plugin (the no login authentication method) and using this as a template for our new plugin. In Eclipse, I’m going to copy the none plugin to a new authentication method called wordpress: That done, we need to update the occurrences of auth_none to auth_wordpress. Firstly, rename /auth/wordpress/lang/en/auth_none.php to auth_wordpress.php. Then, in auth.php we need to rename the class auth_plugin_none to auth_plugin_wordpress. As described, the Eclipse Find/Replace function is great for updating scripts: Next, we need to update version information in version.php. Update all the relevant names, descriptions and dates. Finally, we can check that Moodle recognises our new plugin by navigating to the Site administration menu and clicking on Notifications. If installation is successful, our new plugin will be listed on the Available authentication plugins page: Configuration Let us start with considering the plugin configuration. We will need to allow a Moodle administrator to configure the following: The URL of the WordPress installation The client key and client secret provided by WordPress There is very little flexibility in the design of an authentication plugin configuration page so at this stage, rather than creating a wireframe drawing and having this agreed with the client, we can simply go ahead and write the code. The configuration page is defined in /config.html. Remember to start declaring the relevant language strings in /lang/en/auth_wordpress.php. 
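As a reminder of what those language string declarations look like, here is a minimal sketch of /lang/en/auth_wordpress.php. Apart from pluginname, which Moodle looks up for every plugin, the exact string keys below are illustrative assumptions; use whichever keys your config.html and auth.php actually reference.

```php
<?php
// /auth/wordpress/lang/en/auth_wordpress.php
// Minimal sketch only; key names other than 'pluginname' are assumptions.

$string['pluginname'] = 'WordPress OAuth';
$string['auth_wordpressdescription'] = 'Authenticate against a WordPress site using OAuth 1.0a.';
$string['wordpress_host'] = 'WordPress URL';
$string['client_key'] = 'Client key';
$string['client_secret'] = 'Client secret';
```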
Configuration settings themselves will be managed by the Moodle framework by calling our plugin's process_config() method. Here is the declaration:

```php
/**
 * Processes and stores configuration data for this authentication plugin.
 *
 * @return bool
 */
function process_config($config) {
    // Set to defaults if undefined
    if (!isset($config->wordpress_host)) {
        $config->wordpress_host = '';
    }
    if (!isset($config->client_key)) {
        $config->client_key = '';
    }
    if (!isset($config->client_secret)) {
        $config->client_secret = '';
    }

    set_config('wordpress_host', trim($config->wordpress_host), 'auth/wordpress');
    set_config('client_key', trim($config->client_key), 'auth/wordpress');
    set_config('client_secret', trim($config->client_secret), 'auth/wordpress');

    return true;
}
```

Having dealt with configuration, let us now start managing the actual OAuth process.

Handling OAuth calls

Rather than go into the details of how we can send HTTP requests to WordPress, let's use a third-party library to do this work. The code I'm going to use is based on Abraham Williams' twitteroauth library (see https://github.com/abraham/twitteroauth). In Eclipse, take a look at the files OAuth.php and BasicOAuth.php for details. To use the library, you will need to add the following lines to the top of /wordpress/auth.php:

```php
require_once($CFG->dirroot . '/auth/wordpress/OAuth.php');
require_once($CFG->dirroot . '/auth/wordpress/BasicOAuth.php');

use OAuth1\BasicOauth;
```

Let's now start work on handling the Moodle login event.

Handling the Moodle login event

When a user clicks on a link to a protected resource, Moodle calls loginpage_hook() in each enabled authentication plugin. To handle this, let us first implement loginpage_hook(). We need to add the following lines to auth.php:

```php
/**
 * Will get called before the login page is shown.
 *
 */
function loginpage_hook() {
    $client_key = $this->config->client_key;
    $client_secret = $this->config->client_secret;
    $wordpress_host = $this->config->wordpress_host;

    if ( (strlen($wordpress_host) > 0) && (strlen($client_key) > 0) && (strlen($client_secret) > 0) ) {
        // kick off the authentication process
        $connection = new BasicOAuth($client_key, $client_secret);

        // strip the trailing slashes from the end of the host URL to avoid any
        // confusion (and to make the code easier to read)
        $wordpress_host = rtrim($wordpress_host, '/');

        $connection->host = $wordpress_host . "/wp-json";
        $connection->requestTokenURL = $wordpress_host . "/oauth1/request";

        $callback = $CFG->wwwroot . '/auth/wordpress/callback.php';
        $tempCredentials = $connection->getRequestToken($callback);

        // Store temporary credentials in the $_SESSION
    } // if
}
```

This implements the first leg of the authentication process, and the variable $tempCredentials will now contain a temporary access token. We will need to store these temporary credentials and then call on the server to ask the user to authorize the connection (leg two). Add the following lines immediately after the // Store temporary credentials in the $_SESSION comment:

```php
$_SESSION['oauth_token'] = $tempCredentials['oauth_token'];
$_SESSION['oauth_token_secret'] = $tempCredentials['oauth_token_secret'];

$connection->authorizeURL = $wordpress_host . "/oauth1/authorize";

$redirect_url = $connection->getAuthorizeURL($tempCredentials);

header('Location: ' . $redirect_url);
die;
```

Next, we need to implement the OAuth callback.
Create a new script called callback.php. The callback.php script will need to:

- Sanity check the data being passed back from WordPress and fail gracefully if there is an issue
- Get the wordpress authentication plugin instance (an instance of auth_plugin_wordpress)
- Call on a handler method that will perform the authentication (which we will then need to implement)

The script is simple, short, and available here: https://github.com/iandavidwild/moodle-auth_wordpress/blob/MOODLE_31_STABLE/callback.php

Now, in the auth.php script, we need to implement the callback_handler() method in auth_plugin_wordpress. You can check out the code on GitHub: visit https://github.com/iandavidwild/moodle-auth_wordpress/blob/MOODLE_31_STABLE/auth.php and scroll down to the callback_handler() method.

Lastly, let us add a fragment of code to the loginpage_hook() method that allows us to turn off WordPress authentication in config.php. Add the following to the very beginning of the loginpage_hook() function:

```php
global $CFG;

if (isset($CFG->disablewordpressauth) && ($CFG->disablewordpressauth == true)) {
    return;
}
```

Summary

In this article, we introduced the Moodle learning platform, investigated the open source philosophy that underpins it, and saw how Moodle's functionality can be extended and enhanced through plugins. We took a pre-existing plugin and developed a new WordPress authentication module. This will allow a user logged into WordPress to automatically log into Moodle. To do so, we implemented three-legged OAuth 1.0a WordPress to Moodle authentication. Check out the complete code at https://github.com/iandavidwild/moodle-auth_wordpress/blob/MOODLE_31_STABLE/callback.php. More information on the plugin described in this article is available from the main Moodle website at https://moodle.org/plugins/auth_wordpress.

Resources for Article:

Further resources on this subject:

Introduction to Moodle [article]

Moodle for Online Communities [article]

An Introduction to Moodle 3 and MoodleCloud [article]

Introduction to HoloLens

Packt
11 Jul 2017
10 min read
In this article, Abhijit Jana, Manish Sharma, and Mallikarjuna Rao, the authors of the book, HoloLens Blueprints, we will be covering the following points to introduce you to using HoloLens for exploratory data analysis. Digital Reality - Under the Hood Holograms in reality Sketching the scenarios 3D Modeling workflow Adding Air Tap on speaker Real-time visualization through HoloLens (For more resources related to this topic, see here.) Digital Reality - Under the Hood Welcome to the world of Digital Reality. The purpose of Digital Reality is to bring immersive experiences, such as taking or transporting you to different world or places, make you interact within those immersive, mix digital experiences with reality, and ultimately open new horizons to make you more productive. Applications of Digital Reality are advancing day by day; some of them are in the field of gaming, education, defense, tourism, aerospace, corporate productivity, enterprise applications, and so on. The spectrum and scenarios of Digital Reality are huge. In order to understand them better, they are broken down into three different categories:  Virtual Reality (VR): It is where you are disconnected from the real world and experience the virtual world. Devices available on the market for VR are Oculus Rift, Google VR, and so on. VR is the common abbreviation of Virtual Reality. Augmented Reality (AR): It is where digital data is overlaid over the real world. Pokémon GO, one of the very famous games, is an example of the this globally. A device available on the market, which falls under this category, is Google Glass. Augmented Reality is abbreviated to AR. Mixed Reality (MR): It spreads across the boundary of the real environment and VR. Using MR, you can have a seamless and immersive integration of the virtual and the real world. Mixed Reality is abbreviated to MR. This topic is mainly focused on developing MR applications using Microsoft HoloLens devices. Although these technologies look similar in the way they are used, and sometimes the difference is confusing to understand, there is a very clear boundary that distinguishes these technologies from each other. As you can see in the following diagram, there is a very clear distinction between AR and VR. However, MR has a spectrum, that overlaps across all three boundaries of real world, AR, and MR. Digital Reality Spectrum The following table describes the differences between the three: Holograms in reality Till now, we have mentioned Hologram several times. It is evident that these are crucial for HoloLens and Holographic apps, but what is a Hologram? Virtual Reality Complete Virtual World User is completely isolated from the Real World Device examples: Oculus Rift and Google VR Augmented Reality Overlays Data over the real world Often used for mobile devices Device example: Google Glass Application example: Pokémon GO Mixed Reality Seamless integration of the real and virtual world Virtual world interacts with Real world Natural interactions Device examples: HoloLens and Meta Holograms are the virtual objects which will be made up with light and sound and blend with the real world to give us an immersive MR experience with both real and virtual worlds. In other words, a Hologram is an object like any other real-world object; the only difference is that it is made up of light rather than matter. The technology behind making holograms is known as Holography. 
The following figure represent two holographic objects placed on the top of a real-size table and gives the experience of placing a real object on a real surface: Holograms objects in real environment Interacting with holograms There are basically five ways that you can interact with holograms and HoloLens. Using your Gaze, Gesture, and Voice and with spatial audio and spatial mapping. Spatial mapping provides a detailed illustration of the real-world surfaces in the environment around HoloLens. This allows developers to understand the digitalized real environments and mix holograms into the world around you. Gaze is the most usual and common one, and we start the interaction with it. At any time, HoloLens would know what you are looking at using Gaze. Based on that, the device can take further decisions on the gesture and voice that should be targeted. Spatial audio is the sound coming out from HoloLens and we use spatial audio to inflate the MR experience beyond the visual. HoloLens Interaction Model Sketching the scenarios The next step after elaborating scenario details is to come up with sketches for this scenario. There is a twofold purpose for sketching; first, it will be input to the next phase of asset development for the 3D Artist, as well as helping to validate requirements from the customer, so there are no surprises at the time of delivery. For sketching, either the designer can take it up on their own and build sketches, or they can take help from the 3D Artist. Let's start with the sketch for the primary view of the scenario, where the user is viewing the HoloLens's hologram: Roam around the hologram to view it from different angles Gaze at different interactive components Sketch for user viewing hologram for the HoloLens Sketching - interaction with speakers While viewing the hologram, a user can gaze at different interactive components. One such component, identified earlier, is the speaker. At the time of gazing at the speaker, it should be highlighted and the user can then Air Tap at it. The Air Tap action should expand the speaker hologram and the user should be able to view the speaker component in detail. Sketch for expanded speakers After the speakers are expanded, the user should be able to visualize the speaker components in detail. Now, if the user Air Taps on the expanded speakers, the application should do the following: Open the textual detail component about the speakers; the user can read the content and learn about the speakers in detail Start voice narration, detailing speaker details The user can also Air Tap on the expanded speaker component, and this action should close the expanded speaker Textual and voice narration for speaker details  As you did sketching for the speakers, apply a similar approach and do sketching for other components, such as lenses, buttons, and so on. 3D Modeling workflow Before jumping to 3D Modeling, let's understand the 3D Modeling workflow across different tools that we are going to use during the course of this topic. The following diagram explains the flow of the 3D Modeling workflow: Flow of 3D Modeling workflow Adding Air Tap on speaker In this project, we will be using the left-side speaker for applying Air Tap on speaker. However, you can apply the same for the right-side speaker as well. Similar to Lenses, we have two objects here which we need to identify from the object explorer. 
- Navigate to Left_speaker_geo and left_speaker_details_geo in the Object Hierarchy window.
- Tag them as leftspeaker and speakerDetails respectively.

By default, when you are just viewing the holograms, we will hide the speaker details section. This section only becomes visible when we Air Tap, and it is hidden again when we Air Tap once more:

Speaker with Box Collider

Add a new script inside the Scripts folder and name it ShowHideBehaviour. This script will handle the show and hide behaviour of the speakerDetails game object. Use the following script inside the ShowHideBehaviour.cs file; it can be reused for any other object that needs to be shown or hidden:

```csharp
public class ShowHideBehaviour : MonoBehaviour
{
    public GameObject showHideObject;
    public bool showhide = false;

    private void Start()
    {
        try
        {
            // Find the MeshRenderer on the target object (or its children)
            // and set its initial visibility from the showhide flag.
            MeshRenderer render = showHideObject.GetComponentInChildren<MeshRenderer>();
            if (render != null)
            {
                render.enabled = showhide;
            }
        }
        catch (System.Exception)
        {
        }
    }
}
```

The script finds the MeshRenderer component of the game object and enables or disables it based on the showhide property. In this script, showhide is exposed as a public property so that you can provide the reference to the object from the Unity scene itself. Attach ShowHideBehaviour.cs as a component to the speakerDetails-tagged object. Then drag and drop the object into the showhide property section. This just takes the reference for the current speaker details object and will hide the object in the first instance.

Attach show-hide script to the object

By default, it is unchecked: showhide is set to false and the object will be hidden from view. At this point, you must check left_speaker_details_geo on, as we are now handling visibility using code. Now, in the Air Tapped event handler, we can enable the renderer's visibility.

Add a new script by navigating from the context menu to Create | C# Scripts, and name it SpeakerGestureHandler. Open the script file in Visual Studio. As with ShowHideBehaviour, by default the SpeakerGestureHandler class will inherit from MonoBehaviour. In the next step, implement the InputClickHandler interface in the SpeakerGestureHandler class. This interface declares the OnInputClicked() method, which is invoked on click input. So, whenever you perform an Air Tap gesture, this method is invoked:

```csharp
RaycastHit hit;
bool isTapped = false;

public void OnInputClicked(InputEventData eventData)
{
    hit = GazeManager.Instance.HitInfo;
    if (hit.transform.gameObject != null)
    {
        isTapped = !isTapped;

        var lftSpeaker = GameObject.FindWithTag("LeftSpeaker");
        var lftSpeakerDetails = GameObject.FindWithTag("speakerDetails");

        MeshRenderer render = lftSpeakerDetails.GetComponentInChildren<MeshRenderer>();

        if (isTapped)
        {
            // Move the speaker out and reveal the details.
            lftSpeaker.transform.Translate(0.0f, -1.0f * Time.deltaTime, 0.0f);
            render.enabled = true;
        }
        else
        {
            // Move the speaker back and hide the details.
            lftSpeaker.transform.Translate(0.0f, 1.0f * Time.deltaTime, 0.0f);
            render.enabled = false;
        }
    }
}
```

When the speaker is gazed at, we find the game objects for both LeftSpeaker and speakerDetails by tag name. For the LeftSpeaker object, we apply a transformation based on whether it is tapped or not, which works just like what we did for the lenses. For the speaker details object, we have also taken a reference to its MeshRenderer to toggle its visibility based on the Air Tap. Attach the SpeakerGestureHandler class to the leftSpeaker game object.

Air Tap in speaker – see it in action

The Air Tap action for the speaker is now done. Save the scene, then build and run the solution in the emulator once again.
When you can see the cursor on the speaker, perform Air Tap. Default View and Air Tapped View Real-time visualization through HoloLens We have learned about the data ingress flow, where devices connect with the IoT Hub, and stream analytics processes the stream of data and pushes it to storage. Now, in this section, let's discuss how this stored data will be consumed for data visualization within holographic application. Solution to consume data through services Summary In this article, we demonstrated using HoloLens,  for exploring Digital Reality - Under the Hood, Holograms in reality, Sketching the scenarios, 3D Modeling workflow, Adding Air Tap on speaker,  and  Real-time visualization through HoloLens. Resources for Article: Further resources on this subject: Creating Controllers with Blueprints [article] Raspberry Pi LED Blueprints [article] Exploring and Interacting with Materials using Blueprints [article]

Massive Graphs on Big Data

Packt
11 Jul 2017
19 min read
In this article by Rajat Mehta, author of the book Big Data Analytics with Java, we will learn about graphs. Graphs theory is one of the most important and interesting concepts of computer science. Graphs have been implemented in real life in a lot of use cases. If you use a GPS on your phone or a GPS device and it shows you a driving direction to a place, behind the scene there is an efficient graph that is working for you to give you the best possible direction. In a social network you are connected to your friends and your friends are connected to other friends and so on. This is a massive graph running in production in all the social networks that you use. You can send messages to your friends or follow them or get followed all in this graph. Social networks or a database storing driving directions all involve massive amounts of data and this is not data that can be stored on a single machine, instead this is distributed across a cluster of thousands of nodes or machines. This massive data is nothing but big data and in this article we will learn how data can be represented in the form of a graph so that we make analysis or deductions on top of these massive graphs. In this article, we will cover: A small refresher on the graphs and its basic concepts A small introduction on graph analytics, its advantages and how Apache Spark fits in Introduction on the GraphFrames library that is used on top of Apache Spark Before we dive deeply into each individual section, let's look at the basic graph concepts in brief. (For more resources related to this topic, see here.) Refresher on graphs In this section, we will cover some of the basic concepts of graphs, and this is supposed to be a refresher section on graphs. This is a basic section, hence if some users already know this information they can skip this section. Graphs are used in many important concepts in our day-to-day lives. Before we dive into the ways of representing a graph, let's look at some of the popular use cases of graphs (though this is not a complete list) Graphs are used heavily in social networks In finding driving directions via GPS In many recommendation engines In fraud detection in many financial companies In search engines and in network traffic flows In biological analysis As you must have noted earlier, graphs are used in many applications that we might be using on a daily basis. Graphs is a form of a data structure in computer science that helps in depicting entities and the connection between them. So, if there are two entities such as Airport A and Airport B and they are connected by a flight that takes, for example, say a few hours then Airport A and Airport B are the two entities and the flight connecting between them that takes those specific hours depict the weightage between them or their connection. In formal terms, these entities are called as vertexes and the relationship between them are called as edges. So, in mathematical terms, graph G = {V, E},that is, graph is a function of vertexes and edges. Let's look at the following diagram for the simple example of a graph: As you can see, the preceding graph is a set of six vertexes and eight edges,as shown next: Vertexes = {A, B, C, D, E, F} Edges = { AB, AF, BC, CD, CF, DF, DE, EF} These vertexes can represent any entities, for example, they can be places with the edges being 'distances' between the places or they could be people in a social network with the edges being the type of relationship, for example, friends or followers. 
Thus, graphs can represent real-world entities like this. The preceding graph is also called a bidirected graph, because its edges go in either direction: the edge from A to B can be traversed both ways, from A to B as well as from B to A. Thus, in the preceding diagram, the edge AB can be treated as BA, and AF as FA. There are other types of graphs, called directed graphs, in which the direction of an edge goes one way only and does not retrace back. A simple example of a directed graph is shown as follows:

As seen in the preceding graph, the edge from A to B goes only in one direction, as does the edge from B to C. Hence, this is a directed graph. A simple linked list data structure or a tree data structure are also forms of graphs. In a tree, nodes can only have children and there are no loops, while there is no such rule in a general graph.

Representing graphs

Visualizing a graph makes it easily comprehensible, but depicting it in a program requires one of two different approaches:

Adjacency matrix: Representing a graph as a matrix is easy, and it has its own advantages and disadvantages. Let's look at the bidirected graph that we showed in the preceding diagram. If you were to represent this graph as a matrix, it would look like this:

The preceding diagram is a simple representation of our graph in matrix form. The concept of the matrix representation is simple: if there is an edge between two nodes, we mark the value as 1; else, if the edge is not present, we mark it as 0. As this is a bidirected graph, its edges flow in both directions, so the matrix is symmetric. In the matrix, the rows and columns depict the vertices. Therefore, if you look at vertex A, it has an edge to vertex B, and the corresponding matrix value is 1. As you can see, it takes just one step, or O(1), to figure out whether an edge exists between two nodes; we just need the index (row and column) in the matrix and we can extract that value from it. Also, if you look at the matrix closely, you will see that most of the entries are zero, hence this is a sparse matrix. Thus, this approach eats a lot of space in computer memory by marking even those elements that do not have an edge to each other, and this is its main disadvantage.

Adjacency list: The adjacency list solves the space wastage problem of the adjacency matrix. To solve this problem, it stores each node and its neighbors in a list (linked list), as shown in the following diagram:

For brevity, we have not shown all the vertices, but you can make out from the diagram that each vertex stores its neighbors in a linked list. So when you want to figure out the neighbors of a particular vertex, you can directly iterate over the list. Of course, this has the disadvantage of iterating when you have to figure out whether an edge exists between two nodes or not. This approach is also widely used in many algorithms in computer science. We have briefly seen how graphs can be represented; let's now see some important terms that are used heavily with graphs.
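Before moving on to terminology, the following is a minimal sketch of the adjacency-list idea in plain Java, using the vertices and edges of the sample graph shown earlier. It is an illustration rather than code from the book; a HashMap of neighbour lists stands in for the linked-list picture above.

```java
import java.util.*;

// A minimal adjacency-list representation of the undirected (bidirected) sample graph.
public class AdjacencyListGraph {
    private final Map<String, List<String>> adj = new HashMap<>();

    // Adds an undirected edge by recording each endpoint in the other's neighbour list.
    public void addEdge(String u, String v) {
        adj.computeIfAbsent(u, k -> new ArrayList<>()).add(v);
        adj.computeIfAbsent(v, k -> new ArrayList<>()).add(u);
    }

    // Returns the neighbours of a vertex (empty if the vertex is unknown).
    public List<String> neighbours(String v) {
        return adj.getOrDefault(v, Collections.emptyList());
    }

    public static void main(String[] args) {
        AdjacencyListGraph g = new AdjacencyListGraph();
        // Edges from the sample graph: AB, AF, BC, CD, CF, DF, DE, EF
        String[][] edges = {
            {"A","B"}, {"A","F"}, {"B","C"}, {"C","D"},
            {"C","F"}, {"D","F"}, {"D","E"}, {"E","F"}
        };
        for (String[] e : edges) {
            g.addEdge(e[0], e[1]);
        }
        System.out.println("Neighbours of F: " + g.neighbours("F")); // [A, C, D, E]
    }
}
```

Looking up the neighbours of a vertex is a single map access, while testing for a specific edge requires scanning that vertex's list, which is exactly the trade-off described above.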
Common terminology on graphs

We will now introduce some common terms and concepts of graphs that you can use in your analytics on top of graphs:

Vertices: As we mentioned earlier, vertices are the mathematical term for the nodes in a graph. For analytic purposes, the vertex count shows the number of nodes in the system, for example, the number of people involved in a social graph.

Edges: As we mentioned earlier, edges are the connections between vertices, and edges can carry weights. The number of edges represents the number of relations in a graph system. The weight of an edge represents the intensity of the relationship between the nodes involved; for example, in a social network, the relationship of friends is a stronger relationship than followers between nodes.

Degree: Represents the total number of connections flowing into as well as out of a node. For example, in the previous diagram the degree of node F is four. The degree count is useful; for example, in a social network graph a very high degree count can indicate how well a person is connected.

Indegree: This represents the number of connections flowing into a node. For example, in the previous diagram, the indegree value of node F is three. In a social network graph, this might represent how many people can send messages to this person or node.

Outdegree: This represents the number of connections flowing out of a node. For example, in the previous diagram, the outdegree value of node F is one. In a social network graph, this might represent how many people this person or node can send messages to.

Common algorithms on graphs

Let's look at some common algorithms that are run on graphs frequently and some of their uses:

Breadth first search: Breadth first search is an algorithm for graph traversal or searching. As the name suggests, the traversal occurs across the breadth of the graph; that is, the neighbors of the node where the traversal starts are searched first, before exploring further in the same manner. We will refer to the same graph we used earlier. If we start at vertex A, then according to breadth first search we next go to the neighbors of A, which are B and F. After that, we go to the neighbor of B, which is C, and then to the remaining neighbors of F, which are D and E. We only go through each node once, and this mimics real-life travel as well: to reach from one point to another we seldom cover the same road or path again. Thus, our breadth first traversal starting from A will be {A, B, F, C, D, E}. Breadth first search is very useful in graph analytics and can tell us things such as the friends that are not your immediate friends but just at the next level after your immediate friends in a social network, or, in the case of a graph of a flight network, the flights with just one or two stops to the destination. (A small code sketch of this traversal appears right after this list.)

Depth first search: This is another way of searching, where we start from the source vertex and keep on searching until we reach the end node or the leaf node, and then we backtrack. This algorithm is not as performant as breadth first search, as it can require lots of traversals: if you want to know whether a node A is connected to a node B, you might end up searching along a lot of wasteful nodes that have nothing to do with the original nodes A and B before arriving at the appropriate solution.

Dijkstra's shortest path: This is a greedy algorithm to find the shortest path in a graph network. So, in a weighted graph, if you need to find the shortest path between two nodes, you can start from the starting node and keep on greedily picking the next node in the path as the one reachable with the least weight (in the case of weights being distances between nodes, as in city graphs depicting interconnected cities and roads). So, in a road network, you can find the shortest path between two cities using this algorithm.

PageRank algorithm: This is a very popular algorithm that came out of Google, and it is essentially used to find the importance of a web page by figuring out how connected it is to other important websites. It gives a PageRank score to each website based on this approach, and finally the search results are built based on this score. The best part about this algorithm is that it can be applied to other areas of life too, for example, figuring out the important airports in a flight graph, or figuring out the most important people in a social network group.
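To make the traversal order concrete, here is a minimal, hypothetical sketch (not code from the book) of breadth first search over the adjacency lists of our example graph; starting from A, it prints the vertices in the order {A, B, F, C, D, E} described above (the exact order of same-level vertices depends on the order in which the neighbors are listed):

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

public class BreadthFirstSearch {

    public static List<String> bfs(Map<String, List<String>> graph, String start) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Queue<String> queue = new ArrayDeque<>();

        visited.add(start);
        queue.add(start);

        while (!queue.isEmpty()) {
            String current = queue.remove();
            order.add(current);
            // Enqueue every neighbor that has not been seen yet.
            for (String neighbor : graph.get(current)) {
                if (visited.add(neighbor)) {
                    queue.add(neighbor);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Adjacency lists for the bidirected example graph.
        Map<String, List<String>> graph = new LinkedHashMap<>();
        graph.put("A", Arrays.asList("B", "F"));
        graph.put("B", Arrays.asList("A", "C"));
        graph.put("C", Arrays.asList("B", "D", "F"));
        graph.put("D", Arrays.asList("C", "F", "E"));
        graph.put("E", Arrays.asList("D", "F"));
        graph.put("F", Arrays.asList("A", "C", "D", "E"));

        // Prints [A, B, F, C, D, E] for this graph.
        System.out.println(bfs(graph, "A"));
    }
}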
So much for the basics and the refresher on graphs. In the next sections, we will see how graphs can be used in the real world on massive datasets, such as social network data or data used in the field of biology. We will also study how graph analytics can be used on top of these graphs to derive exclusive deductions.

Plotting graphs

There is a handy open source Java library called GraphStream, which can be used to plot graphs; this is very useful especially if you want to view the structure of your graphs. While viewing, you can also figure out whether some of the vertices are very close to each other (clustered), or in general how they are placed. Using the GraphStream library is easy: just download the jar from http://graphstream-project.org and put it in the classpath of your project. Next, we will show a simple example demonstrating how easy it is to plot a graph using this library. Just create an instance of a graph. For our example, we will create a simple DefaultGraph and name it SimpleGraph. Next, we will add the nodes, or vertices, of the graph. We will also add the label attribute that is displayed on each vertex.

Graph graph = new DefaultGraph("SimpleGraph");
graph.addNode("A").setAttribute("ui.label", "A");
graph.addNode("B").setAttribute("ui.label", "B");
graph.addNode("C").setAttribute("ui.label", "C");

After building the nodes, it's now time to connect these nodes using edges. The API is simple to use: on the graph instance we can define the edges, provided an ID is given to them and the starting and ending nodes are also given.

graph.addEdge("AB", "A", "B");
graph.addEdge("BC", "B", "C");
graph.addEdge("CA", "C", "A");

All the information about nodes and edges is present on the graph instance. It's now time to plot this graph on the UI, and we can just invoke the display method on the graph instance, as shown next, to display it on the UI.

graph.display();

This would plot the graph on the UI as follows:

This library is extensive, it will be a good learning experience to explore it further, and we would urge readers to do so on their own.

Massive graphs on big data

Big data comprises a huge amount of data distributed across a cluster of thousands (if not more) of machines. Building graphs based on this massive data poses different challenges, shown as follows:

Due to the vast amount of data involved, the data for the graph is distributed across a cluster of machines. Hence, in actuality, it's not a single-node graph and we have to build a graph that spans across a cluster of machines.
A graph that spans across a cluster of machines has vertices and edges spread across different machines, and this graph data won't fit into the memory of one single machine. Consider your friends list on Facebook: some of your friends' data in your Facebook friend-list graph might lie on different machines, and this data might be just tremendous in size.
Look at an example diagram of a graph of 10 Facebook friends and their network, shown as follows:

As you can see in the preceding diagram, even for just 10 friends the data can be huge; and since the graph here is drawn by hand, we have not shown a lot of the connections, to keep the image comprehensible, while in real life each person can have more than a thousand connections. So imagine what happens to a graph with thousands, if not more, people on the list. For the reasons we just saw, building massive graphs on big data is a different ball game altogether, and there are a few main approaches for building these massive graphs. From the perspective of big data, building massive graphs involves running and storing data in parallel on many nodes. The two main approaches are bulk synchronous parallel (BSP) and the Pregel approach; Apache Spark follows the Pregel approach. Covering these approaches in detail is out of the scope of this book; if readers are interested in these topics, they should refer to other books and Wikipedia.

Graph analytics

The biggest advantage of using graphs is that you can analyze them and use them for analyzing complex datasets. You might ask what is so special about graph analytics that we can't do with relational databases. Let's try to understand this using an example: suppose we want to analyze your friends network on Facebook and pull information about your friends, such as their names, their birth dates, their recent likes, and so on. If Facebook had a relational database, then this would mean firing a query on some table using the foreign key of the user requesting this info. From the perspective of a relational database, this first-level query is easy. But what if we now ask you to go to the friends at level four in your network and fetch their data (as shown in the following diagram)? The query to get this becomes more and more complicated from a relational database perspective, but this is a trivial task on a graph or graph database (such as Neo4j). Graphs are extremely good at operations where you want to pull information from one node to another node that lies after a lot of joins and hops. As such, graph analytics is good for certain use cases (but not for all use cases; relational databases are still good for many other use cases).

As you can see, the preceding diagram depicts a huge social network (though it might just be depicting a network of a few friends only). The dots represent actual people in a social network. So if somebody asks you to pick one user on the left-most side of the diagram, follow their connections to the right-most side, and pull the friends at, say, the 10th level or more, this is something very difficult to do in a normal relational database, and doing it and maintaining it could easily get out of hand. There are four particular use cases where graph analytics is extremely useful and used frequently (though there are plenty more use cases too):

Path analytics: As the name suggests, this analytics approach is used to figure out the paths as you traverse along the nodes of a graph. There are many fields where this can be used, the simplest being road networks, figuring out details such as the shortest path between cities, or flight analytics to figure out the shortest-time or direct flights.

Connectivity analytics: As the name suggests, this approach outlines how the nodes within a graph are connected to each other.
Using this, you can figure out how many edges are flowing into a node and how many are flowing out of it. This kind of information is very useful in analysis. For example, in a social network, if there is a person who receives just one message but sends out, say, ten messages within his network, then this person can be used to market his favorite products, as he is very good at responding to messages.

Community analytics: Some graphs on big data are huge. But within these huge graphs there might be nodes that are very close to each other and are almost stacked in a cluster of their own. This is useful information, as based on this you can extract communities from your data. For example, in a social network, if there are people who are part of some community, say marathon runners, then they can be clubbed into a single community and tracked further.

Centrality analytics: This kind of analytical approach is useful for finding the central nodes in a network or graph. This is useful for figuring out sources that are single-handedly connected to many other sources. It is helpful in figuring out influential people in a social network, or a central computer in a computer network.

From the perspective of this article, we will be covering some of these use cases in our sample case studies, and for this we will be using a library on top of Apache Spark called GraphFrames.

GraphFrames

The GraphX library is advanced and performs well on massive graphs but, unfortunately, it is currently only implemented in Scala and does not have any direct Java API. GraphFrames is a relatively new library that is built on top of Apache Spark and provides support for dataframe (now dataset) based graphs. It contains a lot of methods that are direct wrappers over the underlying GraphX methods. As such, it provides similar functionality to GraphX, except that GraphX acts on Spark RDDs while GraphFrames works on dataframes, so GraphFrames is more user friendly (as dataframes are simpler to use). All the advantages of firing Spark SQL queries, joining datasets, and filtering queries are supported on it. To understand GraphFrames and the representation of massive big data graphs, we will take small baby steps first, building some simple programs using GraphFrames, before building full-fledged case studies.

Summary

In this article, we learned about graph analytics. We saw how graphs can be built even on top of massive big datasets. We learned how Apache Spark can be used to build these massive graphs, and in the process we learned about the new GraphFrames library that helps us in building them.

Resources for Article:

Further resources on this subject:

Saying Hello to Java EE [article]
Object-Oriented JavaScript [article]
Introduction to JavaScript [article]

Thread of Execution

Packt
10 Jul 2017
6 min read
In this article by Anton Polukhin Alekseevic, the author of the book Boost C++ Application Development Cookbook - Second Edition, we will see the multithreading concept. Multithreading means that multiple threads of execution exist within a single process. Threads may share process resources and also have their own resources. Those threads of execution may run independently on different CPUs, leading to faster and more responsive programs. Let's see how to create a thread of execution.

(For more resources related to this topic, see here.)

Creating a thread of execution

On modern multicore computers, to achieve maximal performance (or just to provide a good user experience), programs usually must use multiple threads of execution. Here is a motivating example in which we need to create and fill a big file in a thread that draws the user interface:

#include <algorithm>
#include <fstream>
#include <iterator>

bool is_first_run();

// Function that executes for a long time.
void fill_file(char fill_char, std::size_t size, const char* filename);

// ...
// Somewhere in thread that draws a user interface:
if (is_first_run()) {
    // This will be executing for a long time during which
    // the user interface freezes.
    fill_file(0, 8 * 1024 * 1024, "save_file.txt");
}

Getting ready

This recipe requires knowledge of boost::bind or std::bind.

How to do it...

Starting a thread of execution was never so easy:

#include <boost/thread.hpp>

// ...
// Somewhere in thread that draws a user interface:
if (is_first_run()) {
    boost::thread(boost::bind(
        &fill_file, 0, 8 * 1024 * 1024, "save_file.txt"
    )).detach();
}

How it works...

The boost::thread variable accepts a functional object that can be called without parameters (we provided one using boost::bind) and creates a separate thread of execution. That functional object is copied into the constructed thread of execution and run there.

We are using version 4 of Boost.Thread in all recipes (BOOST_THREAD_VERSION is defined to 4). Important differences between Boost.Thread versions are highlighted.

After that, we call the detach() function, which does the following:

The thread of execution is detached from the boost::thread variable but continues its execution
The boost::thread variable starts to hold a Not-A-Thread state

Note that without a call to detach(), the destructor of boost::thread will notice that it still holds an OS thread and will call std::terminate. std::terminate terminates our program without calling destructors, without freeing resources, and without any other cleanup. Default constructed threads also have a Not-A-Thread state, and they do not create a separate thread of execution.

There's more...

What if we want to make sure that the file was created and written before doing some other job? In that case we need to join the thread in the following way:

// ...
// Somewhere in thread that draws a user interface:
if (is_first_run()) {
    boost::thread t(boost::bind(
        &fill_file, 0, 8 * 1024 * 1024, "save_file.txt"
    ));

    // Do some work.
    // ...

    // Waiting for thread to finish.
    t.join();
}

After the thread is joined, the boost::thread variable holds a Not-A-Thread state and its destructor does not call std::terminate. Remember that the thread must be joined or detached before its destructor is called; otherwise, your program will terminate!

With BOOST_THREAD_VERSION=2 defined, the destructor of boost::thread calls detach(), which does not lead to std::terminate.
But doing so breaks compatibility with std::thread, and some day, when your project moves to the C++ standard library threads or when BOOST_THREAD_VERSION=2 is no longer supported, this will give you a lot of surprises. Version 4 of Boost.Thread is more explicit and strong, which is usually preferable in the C++ language.

Beware that std::terminate() is called when any exception that is not of type boost::thread_interrupted leaves the boundary of the functional object that was passed to the boost::thread constructor.

There is a very helpful RAII wrapper around the thread that allows you to emulate the BOOST_THREAD_VERSION=2 behavior; it is called boost::scoped_thread<T>, where T can be one of the following classes:

boost::interrupt_and_join_if_joinable: To interrupt and join the thread at destruction
boost::join_if_joinable: To join the thread at destruction
boost::detach: To detach the thread at destruction

Here is a small example:

#include <boost/thread/scoped_thread.hpp>

void some_func();

void example_with_raii() {
    boost::scoped_thread<boost::join_if_joinable> t(
        boost::thread(&some_func)
    );
    // 't' will be joined at scope exit.
}

The boost::thread class was accepted as a part of the C++11 standard and you can find it in the <thread> header in the std:: namespace. There is no big difference between Boost's version 4 and the C++11 standard library version of the thread class. However, boost::thread is available on C++03 compilers, so its usage is more versatile.

There is a very good reason for calling std::terminate instead of joining by default! The C and C++ languages are often used in life-critical software. Such software is controlled by other software, called a watchdog. Watchdogs can easily detect that an application has terminated, but cannot always detect deadlocks, or detect them only with big delays. For example, for defibrillator software it's safer to terminate than to hang on join() for a few seconds waiting for a watchdog reaction. Keep that in mind while designing such applications.

See also

All the recipes in this chapter use Boost.Thread. You may continue reading to get more information about the library. The official documentation has a full list of the boost::thread methods and remarks about their availability in the C++11 standard library; it can be found at http://boost.org/libs/thread. The Interrupting a thread recipe will give you an idea of what the boost::interrupt_and_join_if_joinable class does.

Summary

We saw how to create a thread of execution using some easy techniques.

Resources for Article:

Further resources on this subject:

Introducing the Boost C++ Libraries [article]
Boost.Asio C++ Network Programming [article]
Application Development in Visual C++ - The Tetris Application [article]

Queues and topics

Packt
10 Jul 2017
8 min read
In this article by Luca Stancapiano, the author of the book Mastering Java EE Development with WildFly, we will see how to implement Java Message Service (JMS) in a queue channel using the WildFly console.

(For more resources related to this topic, see here.)

JMS works with message channels that manage the messages asynchronously. These channels contain messages that they collect or remove according to the configuration and the type of channel. These channels are of two types: queues and topics. The channels are highly configurable through the WildFly console and, as with all components in WildFly, they can be installed through the console, the command line, or directly with the Maven plugins of the project. In the next two sections we will show what they mean and all the possible configurations.

Queues

Queues collect the sent messages that are waiting to be read. The messages are delivered in the order they are sent and, once read, are removed from the queue.

Create the queue from the web console

Let's see the steps to create a new queue through the web console. Connect to http://localhost:9990/. Go to Configuration | Subsystems/Messaging - ActiveMQ/default and click on Queues/Topics. Now select the Queues menu and click on the Add button. You will see this screen:

The parameters to insert are as follows:

Name: The name of the queue.
JNDI Names: The JNDI names the queue will be bound to.
Durable?: Whether the queue is durable or not.
Selector: The queue selector.

As with all enterprise components, JMS components are callable through the Java Naming and Directory Interface (JNDI). Durable queues keep messages around persistently for any suitable consumer to consume them. Durable queues do not need to concern themselves with which consumer is going to consume the messages at some point in the future; there is just one copy of a message that any consumer in the future can consume.

Message selectors allow you to filter the messages that a message consumer will receive. The filter is a relatively complex language similar to the syntax of an SQL WHERE clause. The selector can use all the message headers and properties for filtering operations, but cannot use the message content. Selectors are mostly useful for channels that broadcast a very large number of messages to their subscribers. On queues, only messages that match the selector will be returned; the others stay in the queue (and thus can be read by a MessageConsumer with a different selector). The following SQL elements are allowed in our filters and we can put them in the Selector field of the form:

AND, OR, NOT: Logical operators. Example: (releaseYear < 1986) AND NOT (title = 'Bad')
String literals: String literals in single quotes, duplicated to escape. Example: title = 'Tom''s'
Number literals: Numbers in Java syntax; they can be double or integer. Example: releaseYear = 1982
Properties: Message properties that follow Java identifier naming. Example: releaseYear = 1983
Boolean literals: TRUE and FALSE. Example: isAvailable = FALSE
( ): Round brackets. Example: (releaseYear < 1981) OR (releaseYear > 1990)
BETWEEN: Checks whether a number is in a range (both bounds inclusive). Example: releaseYear BETWEEN 1980 AND 1989
Header fields: Any headers except JMSDestination, JMSExpiration and JMSReplyTo. Example: JMSPriority = 10
=, <>, <, <=, >, >=: Comparison operators. Example: (releaseYear < 1986) AND (title <> 'Bad')
LIKE: String comparison with wildcards '_' and '%'. Example: title LIKE 'Mirror%'
IN: Finds a value in a set of strings. Example: title IN ('Piece of mind', 'Somewhere in time', 'Powerslave')
IS NULL, IS NOT NULL: Checks whether a value is null or not null. Example: releaseYear IS NULL
*, +, -, /: Arithmetic operators. Example: releaseYear * 2 > 2000 - 18

A small sketch of a consumer that uses one of these selectors is shown next.
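As a quick, hypothetical illustration (not code from the book, and using the JMS 2.0 API introduced later in this article), the following stateless bean consumes from the GPS queue only the messages whose vehicleType property (an assumed property name) equals 'bus':

import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.inject.Inject;
import javax.jms.JMSConsumer;
import javax.jms.JMSContext;
import javax.jms.Queue;

@Stateless
public class FilteredCoordinateReceiver {

    @Inject
    private JMSContext context;

    @Resource(mappedName = "java:/jms/queue/GPS")
    private Queue queue;

    // Receives only messages whose 'vehicleType' property is 'bus';
    // other messages stay in the queue for consumers with other selectors.
    public String receiveBusCoordinate() {
        JMSConsumer consumer = context.createConsumer(queue, "vehicleType = 'bus'");
        try {
            // Wait up to one second for a matching message.
            return consumer.receiveBody(String.class, 1000);
        } finally {
            consumer.close();
        }
    }
}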
Fill in the form now. In this article we will implement a messaging service that sends the GPS coordinates of buses. The queue is created and shown in the queues list:

Create the queue using the CLI and the Maven WildFly plugin

The same thing can be done with the Command Line Interface (CLI). Start a WildFly instance, go to the bin directory of WildFly, and execute the following script:

bash-3.2$ ./jboss-cli.sh
You are disconnected at the moment. Type 'connect' to connect to the server or 'help' for the list of supported commands.
[disconnected /] connect
[standalone@localhost:9990 /] /subsystem=messaging-activemq/server=default/jms-queue=gps_coordinates:add(entries=["java:/jms/queue/GPS"])
{"outcome" => "success"}

The same thing can be done through Maven. Simply add this snippet to your pom.xml:

<plugin>
  <groupId>org.wildfly.plugins</groupId>
  <artifactId>wildfly-maven-plugin</artifactId>
  <version>1.0.2.Final</version>
  <executions>
    <execution>
      <id>add-resources</id>
      <phase>install</phase>
      <goals>
        <goal>add-resource</goal>
      </goals>
      <configuration>
        <resources>
          <resource>
            <address>subsystem=messaging-activemq,server=default,jms-queue=gps_coordinates</address>
            <properties>
              <durable>true</durable>
              <entries>!!["gps_coordinates", "java:/jms/queue/GPS"]</entries>
            </properties>
          </resource>
        </resources>
      </configuration>
    </execution>
    <execution>
      <id>del-resources</id>
      <phase>clean</phase>
      <goals>
        <goal>undeploy</goal>
      </goals>
      <configuration>
        <afterDeployment>
          <commands>
            <command>/subsystem=messaging-activemq/server=default/jms-queue=gps_coordinates:remove</command>
          </commands>
        </afterDeployment>
      </configuration>
    </execution>
  </executions>
</plugin>

The Maven WildFly plugin lets you do admin operations in WildFly using the same custom protocol used by the command line. Two executions are configured:

add-resources: It hooks the install Maven phase and adds the queue, passing the name, JNDI, and durable parameters seen in the previous section.
del-resources: It hooks the clean Maven phase and removes the chosen queue by name.

Create the queue through an Arquillian test case

Alternatively, we can add and remove the queue through an Arquillian test case:

@RunWith(Arquillian.class)
@ServerSetup(MessagingResourcesSetupTask.class)
public class MessageTestCase {

    ...
    private static final String QUEUE_NAME = "gps_coordinates";
    private static final String QUEUE_LOOKUP = "java:/jms/queue/GPS";

    static class MessagingResourcesSetupTask implements ServerSetupTask {

        @Override
        public void setup(ManagementClient managementClient, String containerId) throws Exception {
            getInstance(managementClient.getControllerClient()).createJmsQueue(QUEUE_NAME, QUEUE_LOOKUP);
        }

        @Override
        public void tearDown(ManagementClient managementClient, String containerId) throws Exception {
            getInstance(managementClient.getControllerClient()).removeJmsQueue(QUEUE_NAME);
        }
    }

    ...
}

The Arquillian org.jboss.as.arquillian.api.ServerSetup annotation lets you use an external setup manager to install or remove components inside WildFly. In this case we are installing the queue declared with the two variables QUEUE_NAME and QUEUE_LOOKUP. When the test ends, the tearDown method is invoked automatically and it removes the installed queue. To use Arquillian, it's important to add the WildFly testsuite dependency to your pom.xml project:

...
<dependencies>
  <dependency>
    <groupId>org.wildfly</groupId>
    <artifactId>wildfly-testsuite-shared</artifactId>
    <version>10.1.0.Final</version>
    <scope>test</scope>
  </dependency>
</dependencies>
...

Looking in standalone-full.xml, we will find the created queue as:

<subsystem >
  <server name="default">
    ...
    <jms-queue name="gps_coordinates" entries="java:/jms/queue/GPS"/>
    ...
  </server>
</subsystem>

JMS is available in the standalone-full configuration. By default, WildFly provides four standalone configurations. They can be found in the standalone/configuration directory:

standalone.xml: It supports all components except messaging and CORBA/IIOP
standalone-full.xml: It supports all components
standalone-ha.xml: It supports all components except messaging and CORBA/IIOP, with clustering enabled
standalone-full-ha.xml: It supports all components, with clustering enabled

To start WildFly with the chosen configuration, simply add -c with the configuration to the standalone.sh script. Here is a sample that starts the standalone full configuration:

./standalone.sh -c standalone-full.xml

Create the Java client for the queue

Let's now see how to create a client that sends a message to the queue. JMS 2.0 simplifies the creation of clients very much. Here is a sample of a client inside a stateless Enterprise Java Bean (EJB):

@Stateless
public class MessageQueueSender {

    @Inject
    private JMSContext context;

    @Resource(mappedName = "java:/jms/queue/GPS")
    private Queue queue;

    public void sendMessage(String message) {
        context.createProducer().send(queue, message);
    }
}

The javax.jms.JMSContext is injectable from any EE component. We will see the JMS context in detail in the next section, The JMS context. The queue is represented in JMS by javax.jms.Queue. It can be injected as a JNDI resource through the @Resource annotation. The JMS context, through the createProducer method, creates a producer represented by the javax.jms.JMSProducer class, which is used to send the messages. We can now create a client injecting the stateless bean and sending the string message "hello!":

...
@EJB
private MessageQueueSender messageQueueSender;
...
messageQueueSender.sendMessage("hello!");

Summary

In this article we have seen how to implement Java Message Service in a queue channel using the web console, the Command Line Interface, the Maven WildFly plugin, and Arquillian test cases, and how to create Java clients for the queue.
Resources for Article: Further resources on this subject: WildFly – the Basics [article] WebSockets in Wildfly [article] Creating Java EE Applications [article]

HPC Cluster Computing

Packt
10 Jul 2017
8 min read
In this article by Raja Malleswara Rao Pattamsetti, the author of the book Distributed Computing in Java 9, we look at how the processing capabilities needed by organizational applications are more than a single sequential computer configuration can offer. This can be addressed to an extent by increasing processor capacity and other resource allocations, and while that can alleviate performance for a while, future computational requirements are limited by how much more powerful processors can be made and by the cost incurred in producing such powerful systems. There is also a need for efficient algorithms and practices to produce the best results from them. A practical and economic substitute for a single high-power computer is to establish multiple lower-capacity processors that work collectively and coordinate their processing capabilities, resulting in a powerful system: parallel computers that permit processing activities to be distributed among multiple lower-capacity machines and together obtain the best expected result.

In this article, we will cover the following topics:

Era of computing
Commanding parallel system architectures
MPP: Massively Parallel Processors
SMP: Symmetric Multiprocessors
CC-NUMA: Cache Coherent Nonuniform Memory Access
Distributed systems
Clusters
Java support for High-Performance Computing

(For more resources related to this topic, see here.)

Era of Computing

Technological developments in the computing industry are rapid, helped by advancements in system software and hardware. The hardware advancements revolve around processor design and manufacturing techniques, and high-performance processors are being produced at amazingly low cost. The hardware advancements are further boosted by high-bandwidth and low-latency network systems. Very Large Scale Integration (VLSI) as a concept brought several phenomenal advancements in producing powerful sequential and parallel processing computers. Simultaneously, system software advancements have improved the capabilities of operating systems and advanced the techniques of software development. Two dominant computing eras are observed, namely the sequential and the parallel era of computing. The following diagram shows the advancements in both eras of computing, from the last few decades to the forecast for the next two decades:

In each of these eras, it is observed that hardware architecture growth is trailed by system software advancements; that is, as more powerful hardware evolved, correspondingly advanced software programs and operating system capabilities grew in strength. Applications and problem-solving environments were then added on top of this with the advent of parallel computing, the development from mini to microprocessors, and clustered computers. Let's now review some of the commanding parallel system architectures from the last few years.

Commanding parallel system architectures

Over the last few years, multiple varieties of computing models providing great processing performance have been developed. They are classified depending on how memory is allocated to their processors and how the processors are aligned.
Some of the important parallel systems are as follows:

MPP: Massively Parallel Processors
SMP: Symmetric Multiprocessors
CC-NUMA: Cache Coherent Nonuniform Memory Access
Distributed systems
Clusters

Let's now understand each of these system architectures in a little more detail and review some of their important performance characteristics.

Massively Parallel Processors (MPP)

Massively Parallel Processors (MPP), as the name suggests, are huge parallel processing systems developed in a shared-nothing architecture. Such systems usually contain a large number of processing nodes that are integrated with a high-speed interconnect network switch. A node is nothing but an element of the computer with a diverse combination of hardware components, usually containing a memory unit and one or more processors. Some special-purpose nodes are designed to have backup disks or additional storage capacity. The following diagram represents the massively parallel processors architecture:

Symmetric Multiprocessors (SMP)

Symmetric Multiprocessors (SMP), as the name suggests, contain a limited number of processors, ranging from 2 to 64, and share most of the resources among those processors. One instance of the operating system operates over all the connected processors, while they commonly share the I/O, the memory, and the network bus. This nature of a similar set of processors connected and acting together under one operating system is the essential behavior of symmetric multiprocessors. The following diagram depicts the symmetric multiprocessors representation:

Cache Coherent Nonuniform Memory Access (CC-NUMA)

Cache Coherent Nonuniform Memory Access (CC-NUMA) is a special type of multiprocessor system with scalable additional processor capacity. In CC-NUMA, the SMP system and the other remote nodes communicate through a remote interconnection link. Each of the remote nodes contains local memory and processors of its own. The following diagram represents the cache coherent nonuniform memory access architecture:

The nature of memory access is nonuniform. Just like symmetric multiprocessors, a CC-NUMA system has a comprehensive view of the entire system memory and, as the name suggests, it takes a nonuniform amount of time to access either close or distant memory locations.

Distributed systems

Distributed systems, as we have been discussing in previous articles, are traditional sets of individual computers interconnected through an Internet/intranet, each running its own operating system. Any diverse set of computer systems can contribute to being part of a distributed system, including combinations of Massively Parallel Processors, Symmetric Multiprocessors, distinct computers, and clusters.

Clusters

A cluster is an assembly of terminals or computers connected through an internetwork to address a large processing requirement through parallel processing. Hence, clusters are usually configured with terminals or computers having higher processing capabilities, connected with a high-speed internetwork. Usually, a cluster is considered as one image that integrates a number of nodes and pools their resources.
The following diagram is a sample clustered Tomcat server for a typical J2EE web application deployment:

The following table shows some of the important performance characteristics of the different parallel architecture systems discussed so far:

Network of workstations

A network of workstations is a group of connected resources, such as system processors, network interfaces, storage, and data disks, that opens the space for newer combinations such as:

Parallel computing: As discussed in the previous sections, the group of system processors can be connected as MPP or DSM, which provides parallel processing ability.
RAM network: As a number of systems are connected, their memories can collectively work as a DRAM cache, which immensely expands the virtual memory of the entire system and improves its processing ability.
Software Redundant Array of Inexpensive Disks (RAID): A group of systems used in the network, connected as an array, improves system stability, availability, and storage capacity with the help of multiple low-cost systems connected over the local area network. This also provides simultaneous I/O support.
Multipath communication: Multipath communication is a technique of using more than one network connection between the networked workstations to allow simultaneous information exchange among system nodes.

Java support for High-Performance Computing

Java provides numerous advantages for HPC (High-Performance Computing) as a programming language, particularly with grid computing. Some of the key benefits of using Java in HPC include the following (a small code sketch follows this list):

Portability: The capability to write a program on one platform and port it to run on any other operating platform has been the biggest strength of the Java language. This continues to be an advantage when porting Java applications to HPC systems. In grid computing, this is an essential feature, as the execution environment gets decided during the execution of the program. This is possible since Java byte code executes in the Java Virtual Machine (JVM), which itself acts as an abstract operating environment.
Network centricity: As discussed in previous articles, Java provides great support for distributed systems with its network-centric features of remote procedure calls (RPC) through RMI and CORBA services. Along with these, Java's support for socket programming is another great asset for grid computing.
Software engineering: Another great feature of Java is the way it can be used for engineering solutions. We can produce object-oriented, loosely coupled software that can be independently added as an API, through jar files, to multiple other systems. Encapsulation and programming to interfaces make it a great advantage in such application development.
Security: Java is a highly secure programming language, with features like byte code verification to limit resource utilization by untrusted intruder software or programs. This is a great advantage when running applications in a distributed environment with remote communication established over a shared network.
GUI development: Java's support for platform-independent GUI development is a great advantage for developing and deploying the enterprise web applications used to interact with an HPC environment.
Availability: Java has great support across multiple operating systems and platforms, including Windows, Linux, SGI, and Compaq. It is easily available to consume and develop with, using a great set of open source frameworks built on top of it.
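As a minimal, hypothetical illustration (not code from the book) of the parallel processing support that ships with the JDK itself, the following program splits a summation across a pool of worker threads using java.util.concurrent; frameworks for cluster and grid computing apply the same divide-and-combine idea, but distribute the chunks across nodes instead of local threads:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        final long[] data = new long[10_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }

        // One worker per available processor core.
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        // Divide the array into one chunk per worker and submit the partial sums.
        int chunk = data.length / workers;
        List<Future<Long>> partials = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            final int start = w * chunk;
            final int end = (w == workers - 1) ? data.length : start + chunk;
            partials.add(pool.submit(new Callable<Long>() {
                @Override
                public Long call() {
                    long sum = 0;
                    for (int i = start; i < end; i++) {
                        sum += data[i];
                    }
                    return sum;
                }
            }));
        }

        // Combine the partial results into the final answer.
        long total = 0;
        for (Future<Long> partial : partials) {
            total += partial.get();
        }
        pool.shutdown();

        System.out.println("Total: " + total);
    }
}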
While the previous points list the advantages in favor of using Java for HPC environments, some concerns need to be reviewed while designing Java applications, including its numerics (complex numbers, fastfp, multidimensional arrays), performance, and parallel computing designs.

Summary

Through this article, you have learned about the era of computing, commanding parallel system architectures, MPP (Massively Parallel Processors), SMP (Symmetric Multiprocessors), CC-NUMA (Cache Coherent Nonuniform Memory Access), distributed systems, clusters, and Java support for High-Performance Computing.

Resources for Article:

Further resources on this subject:

The Amazon S3 SDK for Java [article]
Gradle with the Java Plugin [article]
Getting Started with Sorting Algorithms in Java [article]