How-To Tutorials

Windows Drive Acquisition

Packt
21 Jul 2017
13 min read
In this article, by Oleg Skulkin and Scar de Courcier, authors of Windows Forensics Cookbook, we will cover drive acquisition in E01 format with FTK Imager, drive acquisition in RAW format with DC3DD, and mounting forensic images with Arsenal Image Mounter.

Before you can begin analysing evidence from a source, it first of all needs to be imaged. This describes a forensic process in which an exact copy of a drive is taken. It is an important step, especially if evidence needs to be taken to court, because forensic investigators must be able to demonstrate that they have not altered the evidence in any way.

The term forensic image can refer to either a physical or a logical image. A physical image is a precise replica of the drive it references, whereas a logical image is a copy of a certain volume within that drive. In general, logical images show what the machine's user will have seen and dealt with, whereas physical images give a more comprehensive picture of the device, including deleted data and unallocated space.

A hash value is generated to verify the authenticity of the acquired image. Hash values are essentially cryptographic digital fingerprints which show whether a particular item is an exact copy of another. Altering even the smallest bit of data will generate a completely new hash value, thus demonstrating that the two items are not the same. When a forensic investigator images a drive, they should generate a hash value for both the original drive and the acquired image. Some pieces of forensic software will do this for you.
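As a minimal illustration of the idea (not part of the FTK Imager workflow that follows; it assumes a Linux shell with root privileges, a source drive at /dev/sdb, and a RAW image file named drive.dd):

# Hash the source device and the acquired RAW image in one call;
# the two digests must match for the image to be considered verified.
sudo sha256sum /dev/sdb drive.dd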
There are a number of tools available for imaging hard drives, some of which are free and open source. However, the most popular way for forensic analysts to image hard drives is to use one of the more well-known forensic software vendors' solutions. This is because it is imperative to be able to explain how the image was acquired and to demonstrate its integrity, especially if you are working on a case that will be taken to court. Once you have your image, you will be able to analyse the digital evidence from a device without directly interfering with the device itself. In this article, we will be looking at various tools that can help you to image a Windows drive, and taking you through the process of acquisition.

Drive acquisition in E01 format with FTK Imager

FTK Imager is an imaging and data preview tool by AccessData, which allows an examiner not only to create forensic images in different formats, including RAW, SMART, E01 and AFF, but also to preview data sources in a forensically sound manner. In the first recipe of this article, we will show you how to create a forensic image of a hard drive from a Windows system in E01 format. E01, or EnCase's Evidence File, is a standard format for forensic images in law enforcement. Such images consist of a header with case info (including acquisition date and time, examiner's name, acquisition notes, and an optional password), a bit-by-bit copy of the acquired drive (made up of data blocks, each verified with its own CRC, or Cyclic Redundancy Check), and a footer with an MD5 hash of the bitstream.

Getting ready

First of all, let's download FTK Imager from the AccessData website. To do it, go to the SOLUTIONS tab and then to Product Downloads. Now choose DIGITAL FORENSICS, and then FTK Imager. At the time of this writing, the most up-to-date version is 3.4.3, so click the green DOWNLOAD PAGE button on the right. Now you should be at the download page. Click on the DOWNLOAD NOW button and fill in the form; after this you'll get the download link at the email address you provided. The installation process is quite straightforward; all you need to do is click Next a few times, so we won't cover it in this recipe.

How to do it...

There are two ways of initiating the drive imaging process:

- Use the Create Disk Image button on the toolbar (figure: Create Disk Image button on the Toolbar).
- Use the Create Disk Image... option from the File menu (figure: Create Disk Image... option in the File Menu).

You can choose either option. The first window you see is Select Source. Here you have five options:

- Physical Drive: This allows you to choose a physical drive as the source, with all partitions and unallocated space.
- Logical Drive: This allows you to choose a logical drive as the source, for example, the E: drive.
- Image File: This allows you to choose an image file as the source, for example, if you need to convert your forensic image from one format to another.
- Contents of a Folder: This allows you to choose a folder as the source; of course, no deleted files and so on will be included.
- Fernico Device: This allows you to restore images from multiple CDs/DVDs.

Of course, we want to image the whole drive to be able to work with deleted data and unallocated space, so:

1. Choose the Physical Drive option. The evidence source mustn't be altered in any way, so make sure you are using a hardware write blocker; you can use the one from Tableau, for example. These devices allow acquisition of drive contents without creating the possibility of modifying the data. (Figure: FTK Imager Select Source window)
2. Click Next and you'll see the next window, Select Drive. Choose the source drive from the drop-down menu; in our case it's \\.\PHYSICALDRIVE2. (Figure: FTK Imager Select Drive window)
3. Once the source drive is chosen, click Finish.
4. The next window is Create Image. We'll get back to this window soon, but for now, just click Add...
5. It's time to choose the destination image type. As we decided to create our image in EnCase's Evidence File format, choose E01. (Figure: FTK Imager Select Image Type window)
6. Click Next and you'll see the Evidence Item Information window. Here we have five fields to fill in: Case Number, Evidence Number, Unique Description, Examiner and Notes. All fields are optional. (Figure: FTK Imager Evidence Item Information window)
7. Whether you filled in the fields or not, click Next. Now choose the image destination; you can use the Browse button for this. You should also fill in the image filename. If you want your forensic image to be split, choose a fragment size (in megabytes). The E01 format supports compression, so if you want to reduce the image size, you can use this feature; as you can see in the figure, we have chosen 6. And if you want the data in the image to be secured, use the AD Encryption feature. AD Encryption is whole image encryption, so not only is the raw data encrypted, but also any metadata. Each segment or file of the image is encrypted with a randomly generated image key using AES-256. (Figure: FTK Imager Select Image Destination window)
8. We are almost done. Click Finish and you'll see the Create Image window again. Now look at the three options at the bottom of the window. The verification process is very important, so make sure the Verify images after they are created option is ticked; it helps you to be sure that the source and the image are equal. The Precalculate Progress Statistics option is also very useful: it shows you the estimated time remaining during the imaging process.
The last option will create directory listings of all files in the image for you, but of course it takes time, so use it only if you need it. (Figure: FTK Imager Create Image window)

All you need to do now is click Start, and the imaging process begins. When the image is created, the verification process starts. Finally, you'll get the Drive/Image Verify Results window (figure: FTK Imager Drive/Image Verify Results window). As you can see, in our case the source and the image are identical: both hashes matched. In the folder with the image, you will also find an info file with valuable information such as the drive model, serial number, source data size, sector count, MD5 and SHA1 checksums, and so on.

How it works...

FTK Imager uses the physical drive of your choice as the source and creates a bit-by-bit image of it in EnCase's Evidence File format. During the verification process, the MD5 and SHA1 hashes of the image and the source are compared.

See more

FTK Imager download page: http://accessdata.com/product-download/digital-forensics/ftk-imager-version-3.4.3
FTK Imager User Guide: https://ad-pdf.s3.amazonaws.com/Imager/3_4_3/FTKImager_UG.pdf

Drive acquisition in RAW format with DC3DD

DC3DD is a patched (by Jesse Kornblum) version of the classic GNU dd utility with some computer forensics features, for example, on-the-fly hashing with a number of algorithms (MD5, SHA-1, SHA-256, and SHA-512), showing the progress of the acquisition process, and so on.

Getting ready

You can find a compiled standalone 64-bit version of DC3DD for Windows at SourceForge. Just download the ZIP or 7z archive, unpack it, and you are ready to go.

How to do it...

Open the Windows Command Prompt, change directory (you can use the cd command to do it) to the one with dc3dd.exe, and type the following command:

dc3dd.exe if=\\.\PHYSICALDRIVE2 of=X:\147-2017.dd hash=sha256 log=X:\147-2017.log

Press Enter and the acquisition process will start. Of course, your command will be a bit different, so let's find out what each part of it means:

- if: This stands for input file. DD is originally a Linux utility, and in Linux everything is a file. As you can see in our command, we put physical drive 2 here (this is the drive we wanted to image, but in your case it can be another drive, depending on the number of drives connected to your workstation).
- of: This stands for output file. Here you should type the destination of your image, in RAW format; in our case it's the X: drive and the 147-2017.dd file.
- hash: As already mentioned, DC3DD supports four hashing algorithms: MD5, SHA-1, SHA-256, and SHA-512. We chose SHA-256, but you can choose the one you prefer.
- log: Here you should type the destination for the log. You will find the dc3dd version, the image hash, and so on in this file after the acquisition is completed.

How it works...

DC3DD creates a bit-by-bit image of the source drive in RAW format, so the size of the image will be the same as that of the source, and it calculates the image hash using the algorithm of the examiner's choice, in our case SHA-256.
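If you want to double-check the result afterwards, you can recompute the hash of the RAW image and compare it with the value recorded in the log. The following is a minimal sketch, not part of the original recipe; it assumes the image was written to X:\147-2017.dd as in the command above and uses certutil, the hashing tool built into Windows:

REM Recompute the SHA-256 of the acquired RAW image and compare it
REM with the sha256 value stored in X:\147-2017.log by dc3dd.
certutil -hashfile X:\147-2017.dd SHA256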
See also

DC3DD download page: https://sourceforge.net/projects/dc3dd/files/dc3dd/7.2%20-%20Windows/

Mounting forensic images with Arsenal Image Mounter

Arsenal Image Mounter is an open source tool developed by Arsenal Recon. It can help a digital forensic examiner to mount a forensic image or virtual machine disk in Windows. It supports both E01 (and Ex01) and RAW forensic images, so you can use it with any of the images we created in the previous recipes. It's very important to note that Arsenal Image Mounter mounts the contents of disk images as complete disks. The tool supports all the file systems you can find on Windows drives: NTFS, ReFS, FAT32 and exFAT. It also has temporary write support for images, which is a very useful feature, for example, if you want to boot the system from the image you are examining.

Getting ready

Go to the Arsenal Image Mounter page on the Arsenal Recon website and click on the Download button to download the ZIP archive. At the time of this writing the latest version of the tool is 2.0.010, so in our case the archive is named Arsenal_Image_Mounter_v2.0.010.0_x64.zip. Extract it to a location of your choice and you are ready to go; no installation is needed.

How to do it...

There are two ways to choose an image for mounting in Arsenal Image Mounter:

- Use the File menu and choose Mount image.
- Use the Mount image button (figure: Arsenal Image Mounter main window).

When you choose the Mount image option from the File menu or click on the Mount image button, an Open window will pop up; here you should choose the image for mounting. The next window you will see is Mount options (figure: Arsenal Image Mounter Mount options window). As you can see, there are a few options here:

- Read only: If you choose this option, the image is mounted in read-only mode, so no write operations are allowed. (Do you still remember that you mustn't alter the evidence in any way? Of course, it's already an image, not the original drive, but nevertheless.)
- Fake disk signatures: If an all-zero disk signature is found on the image, Arsenal Image Mounter reports a random disk signature to Windows, so it's mounted properly.
- Write temporary: If you choose this option, the image is mounted in read-write mode, but all modifications are written not to the original image file, but to a temporary differential file.
- Write original: Again, this option mounts the image in read-write mode, but this time the original image file will be modified.
- Sector size: This option allows you to choose the sector size.
- Create "removable" disk device: This option emulates the attachment of a USB thumb drive.

Choose the options you think you need and click OK. We decided to mount our image as read only. Now you can see a hard drive icon on the main window of the tool: the image is mounted. If you mounted only one image and want to unmount it, select the image and click on the Remove selected button. If you have a few mounted images and want to unmount all of them, click on the Remove all button.

How it works...

Arsenal Image Mounter mounts forensic images or virtual machine disks as complete disks in read-only or read-write mode. A digital forensics examiner can then access their contents even with Windows Explorer.

See also

Arsenal Image Mounter page on the Arsenal Recon website: https://arsenalrecon.com/apps/image-mounter/

Summary

In this article, we covered the process and importance of drive acquisition using imaging tools such as FTK Imager and DC3DD, as well as mounting the resulting images with Arsenal Image Mounter. Drive acquisition is the first step in the analysis of digital evidence and needs to be carried out with the utmost care, which in turn makes the analysis process smooth.
Resources for Article:

Further resources on this subject:
- Forensics Recovery [article]
- Digital and Mobile Forensics [article]
- Mobile Forensics and Its Challenges [article]


Creating and Calling Subroutines

Packt
21 Jul 2017
8 min read
In this article by James Kent Lewis, author of the book Linux Shell Scripting Bootcamp, we will learn how to create and call subroutines in a script. The topics covered in this article are as follows:

- Show subroutines that take parameters
- Mention return codes again and how they work in scripts
- How to use subroutines

First, let's start with a selection of simple but powerful scripts. These are mainly shown to give the reader an idea of just what can be done quickly with a script.

Clearing the screen

The tput clear terminal command can be used to clear the current command-line session. You could type tput clear all the time, but wouldn't just cls be nicer? Here's a simple script that clears the current screen:

Chapter 4 - Script 1

#!/bin/sh
#
# 5/9/2017
#
tput clear

Notice that this was so simple that I didn't even bother to include a Usage message or return code. Remember, to make this a command on your system, do this:

1. cd $HOME/bin
2. Create/edit a file named cls
3. Copy and paste the preceding code into this file
4. Save the file
5. Run chmod 755 cls

You can now type cls from any terminal (under that user) and your screen will clear. Try it.

File redirection

At this point, we need to go over file redirection. This is the ability to have the output from a command or script copied into a file instead of going to the screen. This is done using the redirection operator, which is really just the greater-than sign; for example, ls > myfiles.txt sends the listing produced by ls into the file myfiles.txt instead of displaying it on the screen.

Command piping

Now, let's look at command piping, which is the ability to run a command and have its output serve as the input to another command. Suppose a program or script named loop1 is running on your system and you want to know its PID. You could run the ps auxw command to a file, and then grep the file for loop1. Alternatively, you could do it in one step using a pipe, as shown below.
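The screenshot of this command and its output is not reproduced here; a minimal equivalent of what was described (process names and PIDs will differ on your system) is:

# List all processes and keep only the lines that mention loop1.
ps auxw | grep loop1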
Pretty cool, right? This is a very powerful feature in a Linux system and is used extensively. We will be seeing a lot more of this soon. The next section shows another very short script using some command piping; it clears the screen and then shows only the first 10 lines from dmesg.

Chapter 4 - Script 2

#!/bin/sh
#
# 5/9/2017
#
tput clear
dmesg | head

The next section shows file redirection.

Chapter 4 - Script 3

#!/bin/sh
#
# 5/9/2017
#
FN=/tmp/dmesg.txt
dmesg > $FN
echo "File $FN created."

Try it on your system! This shows how easy it is to create a script to perform commands that you would normally type on the command line. Also, notice the use of the FN variable. If you want to use a different filename later, you only have to make the change in one place.

Subroutines

Now, let's really get into subroutines. To do this, we will use more of the tput commands:

tput cup <row> <col>       # puts the cursor at row, col
tput cup 0 0               # puts the cursor at the upper left hand side of the terminal
tput cup $LINES $COLUMNS   # puts the cursor at the bottom right hand side
tput clear                 # clears the terminal screen
tput smso                  # bolds the text that follows
tput rmso                  # un-bolds the text that follows

The script is shown in the next section. It was mainly written to show the concept of subroutines; however, it can also be used as a guide to writing interactive tools.

Chapter 4 - Script 4

#!/bin/sh
#
# 5/9/2017
#
echo "script4 - Linux Scripting Book"

# Subroutines
cls()
{
  tput clear
  return 0
}

home()
{
  tput cup 0 0
  return 0
}

end()
{
  let x=$COLUMNS-1
  tput cup $LINES $x
  echo -n "X"          # no newline or else will scroll
}

bold()
{
  tput smso
}

unbold()
{
  tput rmso
}

underline()
{
  tput smul
}

normalline()
{
  tput rmul
}

# Code starts here
rc=0                   # return code
if [ $# -ne 1 ] ; then
  echo "Usage: script4 parameter"
  echo "Where parameter can be: "
  echo " home      - put an X at the home position"
  echo " cls       - clear the terminal screen"
  echo " end       - put an X at the last screen position"
  echo " bold      - bold the following output"
  echo " underline - underline the following output"
  exit 255
fi

parm=$1                # main parameter 1

if [ "$parm" = "home" ] ; then
  echo "Calling subroutine home."
  home
  echo -n "X"
elif [ "$parm" = "cls" ] ; then
  cls
elif [ "$parm" = "end" ] ; then
  echo "Calling subroutine end."
  end
elif [ "$parm" = "bold" ] ; then
  echo "Calling subroutine bold."
  bold
  echo "After calling subroutine bold."
  unbold
  echo "After calling subroutine unbold."
elif [ "$parm" = "underline" ] ; then
  echo "Calling subroutine underline."
  underline
  echo "After subroutine underline."
  normalline
  echo "After subroutine normalline."
else
  echo "Unknown parameter: $parm"
  rc=1
fi

exit $rc

Try this on your system, running it with each of the parameters. If you run it with the home parameter, it might look a little strange to you. The code puts a capital X at the home position (0,0) and this causes the prompt to print one character over. Nothing is wrong here, it just looks a little weird. Don't worry if this still doesn't make sense to you, just go ahead and look at Script 5.

Using parameters

Okay, let's add some routines to this script to show how to use parameters with a subroutine. In order to make the output look better, the cls routine is called first to clear the screen, as shown in the next section.

Chapter 4 - Script 5

#!/bin/sh
#
# 5/9/2017
#
echo "script5 - Linux Scripting Book"

# Subroutines
cls()
{
  tput clear
  return 0
}

home()
{
  tput cup 0 0
  return 0
}

end()
{
  let x=$COLUMNS-1
  tput cup $LINES $x
  echo -n "X"          # no newline or else will scroll
}

bold()
{
  tput smso
}

unbold()
{
  tput rmso
}

underline()
{
  tput smul
}

normalline()
{
  tput rmul
}

move()                 # move cursor to row, col
{
  tput cup $1 $2
}

movestr()              # move cursor to row, col and output string
{
  tput cup $1 $2
  echo $3
}

# Code starts here
cls                    # clear the screen to make the output look better
rc=0                   # return code
if [ $# -ne 1 ] ; then
  echo "Usage: script5 parameter"
  echo "Where parameter can be: "
  echo " home      - put an X at the home position"
  echo " cls       - clear the terminal screen"
  echo " end       - put an X at the last screen position"
  echo " bold      - bold the following output"
  echo " underline - underline the following output"
  echo " move      - move cursor to row,col"
  echo " movestr   - move cursor to row,col and output string"
  exit 255
fi

parm=$1                # main parameter 1

if [ "$parm" = "home" ] ; then
  home
  echo -n "X"
elif [ "$parm" = "cls" ] ; then
  cls
elif [ "$parm" = "end" ] ; then
  move 0 0
  echo "Calling subroutine end."
  end
elif [ "$parm" = "bold" ] ; then
  echo "Calling subroutine bold."
  bold
  echo "After calling subroutine bold."
  unbold
  echo "After calling subroutine unbold."
elif [ "$parm" = "underline" ] ; then
  echo "Calling subroutine underline."
  underline
  echo "After subroutine underline."
  normalline
  echo "After subroutine normalline."
elif [ "$parm" = "move" ] ; then
  move 10 20
  echo "This line started at row 10 col 20"
elif [ "$parm" = "movestr" ] ; then
  movestr 30 40 "This line started at 30 40"
else
  echo "Unknown parameter: $parm"
  rc=1
fi

exit $rc

Since this script only has two extra functions, you can just run them, one command at a time, as follows:

guest1 $ script5
guest1 $ script5 move
guest1 $ script5 movestr

Since we are now placing the cursor at a specific location, the output should make more sense to you. Notice how the command-line prompt reappears where the last cursor position was. You have probably noticed that the parameters to a subroutine work just like those to a script: parameter 1 is $1, parameter 2 is $2, and so on. This is both good and bad: good because you don't have to learn anything radically different, but bad in that it is very easy to get the script's $1, $2, and so on mixed up with the subroutine's if you are not careful. A possible solution, and the one I use, is to assign the $1, $2, and so on variables in the main script to a variable with a good meaningful name. For example, in these example scripts, I set parm1 equal to $1 (parm1=$1), and so on.
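To make that concrete, here is a small variation of the movestr routine from Script 5 (an illustrative sketch, not one of the book's scripts) that copies its positional parameters into named variables before using them:

movestr() # move cursor to row, col and print a string
{
  row=$1        # first subroutine parameter
  col=$2        # second subroutine parameter
  text=$3       # third subroutine parameter
  tput cup $row $col
  echo "$text"
}

# Example call: print a message starting at row 5, column 10.
movestr 5 10 "Hello from a subroutine"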
Summary

We started with some very simple scripts and then proceeded to show some simple subroutines. We then showed some subroutines that take parameters. Return codes were mentioned again to show how they work in subroutines. We included several scripts to show the concepts, and also included a special bonus script at no extra charge.

Resources for Article:

Further resources on this subject:
- Linux Shell Scripting [article]
- Linux Shell Scripting – various recipes to help you [article]
- GLSL 4.0: Using Subroutines to Select Shader Functionality [article]


Getting Started with Android Things

Packt
21 Jul 2017
16 min read
In this article, we will cover the following topics:

- Internet of Things overview
- Internet of Things components
- Android Things overview
- Things support library
- Android Things board compatibility
- How to install Android Things on Raspberry Pi
- Creating the first Android Things project

Internet of Things overview

Internet of Things, or briefly IoT, is one of the most promising trends in technology. According to many analysts, Internet of Things can be the most disruptive technology of the upcoming decade. It will have a huge impact on our lives and it promises to modify our habits. IoT is, and will be in the future, a pervasive technology that will span its effects across many sectors:

- Industry
- Healthcare
- Transportation
- Manufacturing
- Agriculture
- Retail
- Smart cities

All these areas will benefit from using IoT. Before diving into IoT projects, it is important to know what Internet of Things means. There are several definitions of Internet of Things, addressing different aspects and considering different areas of application. Anyway, it is important to underline that IoT is much more than a network of smartphones, tablets, and PCs connected to each other. Briefly, IoT is an ecosystem where objects are interconnected and, at the same time, connect to the internet. Internet of Things includes every object that can potentially connect to the internet and exchange data and information. These objects are always connected, anytime, anywhere, and they exchange data.

The concept of connected objects is not new and it has developed over the years. The level of circuit miniaturization and the increasing power of CPUs at lower consumption make it possible to imagine a future where there are millions of "things" that talk to each other. The first time that Internet of Things was officially recognized was in 2005, when the International Telecommunication Union (ITU), in a report titled The Internet of Things (https://www.itu.int/osg/spu/publications/internetofthings/InternetofThings_summary.pdf), gave the first definition:

"A new dimension has been added to the world of information and communication technologies (ICTs): from anytime, any place connectivity for anyone, we will now have connectivity for anything…. Connections will multiply and create an entirely new dynamic network of networks – an Internet of Things"

In other words, IoT is a network of smart objects (or things) that can receive and send data and that we can control remotely.

Internet of Things components

There are several elements that contribute to creating the IoT ecosystem, and it is important to know the role they play in order to have a clear picture of IoT. This will be useful to better understand the projects we will build using Android Things. The basic brick of IoT is a smart object. It is a device that connects to the internet and is capable of exchanging data. It can be a simple sensor that measures a quantity such as pressure or temperature, or a complex system. Extending this concept, our oven, our coffee machine, and our washing machine are examples of smart objects once they connect to the internet. All of these smart objects contribute to developing the Internet of Things network. Anyway, not only household appliances are examples of smart objects; cars, buildings, actuators, and so on can be included in the list.

We can reference these objects, when connected, using a unique identifier and start talking to them. At the low level, these devices exchange data using a network layer. The most important and well-known protocols at the base of Internet of Things are:

- Wi-Fi
- Bluetooth
- Zigbee
- Cellular network
- NB-IoT
- LoRa

From an application point of view, there are several application protocols widely used in the Internet of Things. Some protocols derive from different contexts (such as the web), others are IoT-specific. To name a few of them:

- HTTP
- MQTT
- CoAP
- AMQP
- REST
- XMPP
- STOMP

By now, they could be just names or empty boxes, but throughout this article we will explore how to use these protocols with Android Things.

Prototyping boards play an important role in Internet of Things and they help to grow the number of connected objects. Using prototyping boards, we can experiment with IoT projects, and in this article we will explore how to build and test IoT projects using boards compatible with Android Things. As you may already know, there are several prototyping boards available on the market, each one having specific features. Just to name a few of them:

- Arduino (in different flavors)
- Raspberry Pi (in different flavors)
- Intel Edison
- ESP8266
- NXP

We will focus our attention on Raspberry Pi 3 and Intel Edison because Android Things officially supports these two boards. We will also use other development boards so that you can understand how to integrate them.

Android Things overview

Android Things is the new operating system developed by Google to build IoT projects. It helps you to develop professional applications using trusted platforms and Android. Yes, Android, because Android Things is a modified version of Android, and we can reuse our Android knowledge to implement smart Internet of Things projects. This OS has great potential because Android developers can smoothly move to IoT and start developing and building projects in a few days.

Before diving into Android Things, it is important to have an overview. Android Things has a layered structure that is slightly different from Android OS: it is much more compact, so apps for Android Things have fewer layers beneath them and are closer to drivers and peripherals than normal Android apps. Even though Android Things derives from Android, some APIs available in Android are not supported in Android Things. We will now briefly describe the similarities and the differences.

Let us start with content providers, widely used in Android and not present in the Android Things SDK; we should pay attention to this when we develop an Android Things app. For more information about the content providers that are not supported, please refer to the official Android Things website: https://developer.android.com/things/sdk/index.html.

Moreover, like a normal Android app, an Android Things app can have a User Interface (UI), even if this is optional and depends on the type of application we are developing. A user can interact with the UI to trigger events, as they do in an Android app. From this point of view, as we will see later, the process of developing a UI is the same as in Android. This is an interesting feature because we can develop an IoT UI easily and quickly, reusing our Android knowledge.

It is worth noticing that Android Things fits perfectly into the Google services ecosystem. Almost all cloud services implemented by Google are available in Android Things, with some exceptions.
Android Things does not support Google services strictly connected to the mobile world and those that require user input or authentication. Do not forget that the user interface for an Android Things app is optional. For a detailed list of the Google services available in Android Things, refer to the official page (https://developer.android.com/things/sdk/index.html).

An important Android aspect is permission management. An Android app runs in a sandbox with limited access to resources. When an app needs to access a specific resource outside the sandbox, it has to request the permission. In an Android app, this happens in the Manifest.xml file. This is still true in Android Things, and all the permissions requested by the app are granted at installation time. Android 6 (API level 23) introduced a new way to request a permission: an app can request a permission not only at installation time (using the Manifest.xml file), but at runtime too. Android Things does not support this new feature, so we have to request all the permissions in the Manifest file.

The last thing to notice is notifications. As we will see later, the Android Things UI does not support the notification status bar, so we cannot trigger notifications from our Android Things apps. To make things simpler, you should remember that all the services related to the user interface, or that require a user interface to accomplish their task, are not guaranteed to work in Android Things.

Things support library

The Things support library is the new library developed by Google to handle the communication with peripherals and drivers. This is a completely new library not present in the Android SDK, and it is one of the most important features. It exposes a set of Java interfaces and classes (APIs) that we can use to connect to and exchange data with external devices such as sensors, actuators, and so on. This library hides the inner communication details, supporting several industry-standard protocols:

- GPIO
- I2C
- PWM
- SPI
- UART

Moreover, this library exposes a set of APIs to create and register new device drivers called user drivers. These drivers are custom components deployed with the Android Things app that extend the Android Things framework. In other words, they are custom libraries that enable an app to communicate with other device types not supported by Android Things natively.

This article will guide you, step by step, through building real-life projects using Android Things. You will explore the new Android Things APIs and how to use them. In the next sections, you will learn how to install Android Things on Raspberry Pi 3 and Intel Edison.

Android Things board compatibility

Android Things is a new operating system specifically built for IoT. At the time of writing, Android Things supported four different development boards:

- Raspberry Pi 3 Model B
- Intel Edison
- NXP Pico i.MX6UL
- Intel Joule 570x

In the near future, more boards will be added to the list. Google has already announced that it will support this new board: NXP Argon i.MX6UL. The article will focus on using the first two boards: Raspberry Pi 3 and Intel Edison. Anyway, you can develop your projects on any compatible board. This is the power of Android Things: it abstracts the underlying hardware, providing a common way to interact with peripherals and devices. The paradigm that made Java famous, Write Once and Run Anywhere (WORA), applies to Android Things too. This is a winning feature of Android Things because we can develop an Android Things app without worrying about the underlying board. Anyway, when we develop an IoT app there are some minor aspects we should consider so that our app will be portable to other compatible boards.

How to install Android Things on Raspberry Pi

Raspberry Pi 3 is the latest Raspberry Pi board. It is an upgrade of the Raspberry Pi 2 Model B and, with respect to its predecessor, it has some great features:

- Quad-core ARMv8 CPU at 1.2 GHz
- Wireless LAN 802.11n
- Bluetooth 4.0

In this section, you will learn how to install Android Things on Raspberry Pi 3 using a Windows PC or Mac OS X. Before starting the installation process you must have:

- A Raspberry Pi 3 Model B
- An SD card of at least 8 GB
- A USB cable to connect the Raspberry Pi to your PC
- An HDMI cable to connect the Raspberry Pi to a TV/monitor (optional)

If you do not have an HDMI cable, you can use a screen mirroring tool. This is useful to know the result of the installation process and when we develop the Android Things UIs. The installation steps differ depending on whether you are using Windows, OS X or Linux.

How to install Android Things using Windows

First we will cover how to install Android Things on Raspberry Pi 3 using a Windows PC:

1. Download the Android Things image from this link: https://developer.android.com/things/preview/download.html. Select the right image; in this case, you have to choose the Raspberry Pi image.
2. Accept the license and wait until the download is completed.
3. Once the download is complete, extract the zip file.
4. To install the image on the SD card, there is a great application called Win32 Disk Imager that works perfectly. It is free and you can download it from SourceForge using this link: https://sourceforge.net/projects/win32diskimager/. At the time of writing, the application version is 0.9.5.
5. After you have downloaded it, run the installation executable as Administrator.
6. Now you are ready to burn the image onto the SD card. Insert the SD card into your PC.
7. Select the image you unzipped at step 3 and be sure to select the right disk name (your SD card). At the end, click on Write.

You are done! The image is installed on the SD card and we can now start the Raspberry Pi.

How to install Android Things using OS X

If you have a Mac OS X, the steps to install Android Things are slightly different. There are several options to flash this OS to the SD card; you will learn the fastest and easiest one. These are the steps to follow:

1. Format your SD card using FAT32. Insert your SD card into your Mac and run Disk Utility; you should see the card appear there.
2. Download the Android Things OS image using this link: https://developer.android.com/things/preview/download.html.
3. Unzip the file you have downloaded.
4. Insert the SD card into your Mac.
5. Now it is time to copy the image to the SD card. Open a terminal window and write:

sudo dd bs=1m if=path_of_your_image.img of=/dev/rdiskn

Here, path_of_your_image.img is the path to the file with the img extension you downloaded at step 2. In order to find out the rdisk number n, you have to select Preferences and then System Report; the BSD name shown there is the disk name we are looking for. In this case, we have to write:

sudo dd bs=1m if=path_of_your_image.img of=/dev/rdisk1

That's all. You have to wait until the image is copied onto the SD card. Do not forget that the copying process could take a while, so be patient!
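One practical note, not from the original text: on many macOS systems dd cannot write to a disk that is still mounted, so it is usually necessary to unmount it first. Assuming the BSD name found above is disk1, a typical sequence is:

# Unmount all volumes on the card, then write the image to the raw device.
diskutil unmountDisk /dev/disk1
sudo dd bs=1m if=path_of_your_image.img of=/dev/rdisk1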
Creating the first Android Things project

Considering that Android Things derives from Android, the development process and the app structure are the same as those we use in a common Android app. For this reason, the development tool to use for Android Things is Android Studio. If you have already used Android Studio in the past, reading this section you will discover the main differences between an Android Things app and an Android app. If you are new to Android development, this section will guide you step by step in creating your first Android Things app.

Android Studio is the official development environment for developing Android Things apps; therefore, before starting, it is necessary to have it installed. If not, go to https://developer.android.com/studio/index.html, then download and install it. The development environment must adhere to these prerequisites:

- SDK tools version 24 or higher
- The SDK updated with Android 7 (API level 24)
- Android Studio 2.2 or higher

If your environment does not meet the previous conditions, you have to update your Android Studio using the update manager. Now there are two alternatives for starting a new project:

- Cloning a template project from GitHub and importing it into Android Studio
- Creating a new Android project in Android Studio

To better understand the main differences between Android and Android Things, you should follow option number 2, at least the first time.

Cloning the template project

This is the fastest path because with a few steps you are ready to develop an Android Things app. Go to https://github.com/androidthings/new-project-template and clone the repository. Open a terminal and write:

git clone https://github.com/androidthings/new-project-template.git

Now you have to import the cloned project into Android Studio.

Create the project manually

This path is a bit longer than the previous option, but it is useful for learning the main differences between these two worlds. Create a new Android project; do not forget to set the Minimum SDK to API Level 24. For now, you should create a project with an empty activity. Confirm and create the new project. There are some steps you have to follow before your Android app project turns into an Android Things app project:

1. Open the Gradle scripts folder, modify build.gradle (app-level) and replace the dependency directive with the following lines:

dependencies {
    provided 'com.google.android.things:androidthings:0.2-devpreview'
}

2. Open the res folder and remove all the files under it except strings.xml.
3. Open Manifest.xml and remove the android:theme attribute in the application tag.
4. In Manifest.xml, add the following line inside the application tag:

<uses-library android:name="com.google.android.things"/>

5. In the layout folder, open all the layout files created automatically and remove the references to values.
6. In the Activity created by default (MainActivity.java), remove this line:

import android.support.v7.app.AppCompatActivity;

7. Replace AppCompatActivity with Activity.
8. Under the java folder, remove all the folders except the one with your package name.

That's all. You have now transformed an Android app project into an Android Things app project; compiling the code, you will have no errors. From now on, you can simply clone the repository holding the project template and start coding.

Differences between Android and Android Things

As you can notice, an Android Things project is very similar to an Android project. We still have Activities, layouts, Gradle files and so on. At the same time, there are some differences:

- Android Things does not use multiple layouts to support different screen sizes, so when we develop an Android Things app we create only one layout.
- Android Things does not support themes and styles.
- Android support libraries are not available in Android Things.

Summary

In this article we covered an overview of the Internet of Things and its components, an Android Things overview, the Things support library, and Android Things board compatibility, and we also learned how to create a first Android Things project.

Resources for Article:

Further resources on this subject:
- Saying Hello to Unity and Android [article]
- Getting started with Android Development [article]
- Android Game Development with Unity3D [article]


Machine Learning Review

Packt
18 Jul 2017
20 min read
In this article by Uday Kamath and Krishna Choppella, authors of the book Mastering Java Machine Learning, we will discuss how recent years have seen a revival of interest in artificial intelligence (AI) and machine learning in particular, both in academic circles and in industry. In the last decade, AI has seen dramatic successes that eluded practitioners in the intervening years, since the original promise of the field gave way to relative decline until its re-emergence in the last few years.

What made these successes possible, in large part, was the availability of prodigious amounts of data and the inexorable increase in raw computational power. Among the areas of AI leading the resurgence, machine learning has seen spectacular developments and continues to find the widest applicability in an array of domains. The use of machine learning to help in complex decision making at the highest levels of business and, at the same time, its enormous success in improving the accuracy of what are now everyday applications, such as search, speech recognition, and personal assistants on mobile phones, has made its effects commonplace in the family room and the boardroom alike. Articles breathlessly extolling the power of "deep learning" can be found today not only in the popular science and technology press, but also in mainstream outlets such as The New York Times and The Huffington Post. Machine learning has indeed become ubiquitous in a relatively short time.

An ordinary user encounters machine learning in many ways in their day-to-day activities. Interacting with well-known e-mail providers such as Gmail gives the user automated sorting and categorization of e-mails into categories such as spam, junk, promotions, and so on, which is made possible using text mining, a branch of machine learning. When shopping online for products on e-commerce websites such as https://www.amazon.com/ or watching movies from content providers such as http://netflix.com/, one is offered recommendations for other products and content by so-called recommender systems, another branch of machine learning. Forecasting the weather, estimating real estate prices, predicting voter turnout, and even election results: all use some form of machine learning to see into the future, as it were.

The ever-growing availability of data and the promise of systems that can enrich our lives by learning from that data place a growing demand on skills from a limited workforce of professionals in the field of data science. This demand is particularly acute for well-trained experts who know their way around the landscape of machine learning techniques in the more popular languages, including Java, Python, R, and, increasingly, Scala. By far, the number and availability of machine learning libraries, tools, APIs, and frameworks in Java outstrip those in other languages. Consequently, mastery of these skills will put any aspiring professional with a desire to enter the field at a distinct advantage in the marketplace.

Perhaps you already apply machine learning techniques in your professional work, or maybe you simply have a hobbyist's interest in the subject. Clearly, you can bend Java to your will, but now you feel you're ready to dig deeper and learn how to use the best-of-breed open-source Java ML frameworks in your next data science project. Mastery of a subject, especially one that has such obvious applicability as machine learning, requires more than an understanding of its core concepts and familiarity with its mathematical underpinnings. Unlike an introductory treatment of the subject, a work that purports to help you master the subject must be heavily focused on practical aspects, in addition to introducing more advanced topics that would have stretched the scope of the introductory material. To warm up before we embark on sharpening our instrument, we will devote this article to a quick review of what we already know. For the ambitious novice with little or no prior exposure to the subject (who is nevertheless determined to get the fullest benefit from this article), here's our advice: make sure you do not skip the rest of this article; instead, use it as a springboard to explore unfamiliar concepts in more depth. Seek out external resources as necessary. Wikipedia it. Then jump right back in.

For the rest of this article, we will review the following:

- History and definitions
- What is not machine learning?
- Concepts and terminology
- Important branches of machine learning
- Different data types in machine learning
- Applications of machine learning
- Issues faced in machine learning
- The meta-process used in most machine learning projects
- Information on some well-known tools, APIs, and resources that we will employ in this article

Machine learning – history and definition

It is difficult to give an exact history, but the definition of machine learning we use today finds its usage as early as the 1860s. In Rene Descartes' Discourse on the Method, he refers to Automata and says the following:

"For we can easily understand a machine's being constituted so that it can utter words, and even emit some responses to action on it of a corporeal kind, which brings about a change in its organs; for instance, if touched in a particular part it may ask what we wish to say to it; if in another part it may exclaim that it is being hurt, and so on"

http://www.earlymoderntexts.com/assets/pdfs/descartes1637.pd
https://www.marxists.org/reference/archive/descartes/1635/discourse-method.htm

Alan Turing, in his famous publication Computing Machinery and Intelligence, gives basic insights into the goals of machine learning by asking the question "Can machines think?":

http://csmt.uchicago.edu/annotations/turing.htm
http://www.csee.umbc.edu/courses/471/papers/turing.pdf

Arthur Samuel, in 1959, wrote: "Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed."

Tom Mitchell, in recent times, gave a more exact definition of machine learning: "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E."
Machine learning has a relationship with several areas:

- Statistics: This uses the elements of data sampling, estimation, hypothesis testing, learning theory, and statistics-based modeling, to name a few
- Algorithms and computation: This uses the basics of search, traversal, parallelization, distributed computing, and so on from basic computer science
- Database and knowledge discovery: This has the ability to store, retrieve, and access information in various formats
- Pattern recognition: This has the ability to find interesting patterns from the data, either to explore, visualize, or predict
- Artificial intelligence: Though machine learning is considered a branch of artificial intelligence, it also has relationships with other branches, such as heuristics, optimization, evolutionary computing, and so on

What is not machine learning?

It is important to recognize areas that share a connection with machine learning but cannot themselves be considered part of machine learning. Some disciplines may overlap to a smaller or larger extent, yet the principles underlying machine learning are quite distinct:

- Business intelligence and reporting: Reporting Key Performance Indicators (KPIs), querying OLAP for slicing, dicing, and drilling into the data, dashboards, and so on, which form the central components of BI, are not machine learning.
- Storage and ETL: Data storage and ETL are key elements needed in any machine learning process, but by themselves they don't qualify as machine learning.
- Information retrieval, search, and queries: The ability to retrieve data or documents based on search criteria or indexes, which forms the basis of information retrieval, is not really machine learning. Many forms of machine learning, such as semi-supervised learning, can rely on searching for similar data for modeling, but that doesn't qualify search as machine learning.
- Knowledge representation and reasoning: Representing knowledge for performing complex tasks, such as Ontology, Expert Systems, and the Semantic Web, does not qualify as machine learning.

Machine learning – concepts and terminology

In this article, we will describe the different concepts and terms normally used in machine learning:

- Data or dataset: The basics of machine learning rely on understanding the data. The data or dataset normally refers to content available in structured or unstructured format for use in machine learning. Structured datasets have specific formats, and an unstructured dataset is normally in the form of some free-flowing text. Data can be available in various storage types or formats. In structured data, every element, known as an instance or an example or row, follows a predefined structure. Data can also be categorized by size: small or medium data have a few hundreds to thousands of instances, whereas big data refers to a large volume, mostly in the millions or billions, which cannot be stored or accessed using common devices or fit in the memory of such devices.
- Features, attributes, variables, or dimensions: In structured datasets, as mentioned earlier, there are predefined elements with their own semantics and data type, which are known variously as features, attributes, variables, or dimensions.
- Data types: The preceding features need some form of typing in many machine learning algorithms or techniques. The most commonly used data types are as follows:
  - Categorical or nominal: This indicates well-defined categories or values present in the dataset, for example, eye color (black, blue, brown, green, or grey) or document content type (text, image, or video).
  - Continuous or numeric: This indicates the numeric nature of the data field, for example, a person's weight measured by a bathroom scale, the temperature reading from a sensor, or the monthly balance in dollars on a credit card account.
  - Ordinal: This denotes data that can be ordered in some way, for example, garment size (small, medium, or large) or boxing weight classes (heavyweight, light heavyweight, middleweight, lightweight, and bantamweight).
- Target or label: A feature or set of features in the dataset, which is used for learning from training data and predicting in an unseen dataset, is known as a target or a label. A label can have any of the forms specified earlier, that is, categorical, continuous, or ordinal.
- Machine learning model: Each machine learning algorithm, based on what it learned from the dataset, maintains the state of its learning for predicting or giving insights into future or unseen data. This is referred to as the machine learning model.
- Sampling: Data sampling is an essential step in machine learning. Sampling means choosing a subset of examples from a population with the intent of treating the behavior seen in the (smaller) sample as being representative of the behavior of the (larger) population. In order for the sample to be representative of the population, care must be taken in the way the sample is chosen. Generally, a population consists of every object sharing the properties of interest in the problem domain, for example, all people eligible to vote in the general election, or all potential automobile owners in the next four years. Since it is usually prohibitive (or impossible) to collect data for all the objects in a population, a well-chosen subset is selected for the purposes of analysis. A crucial consideration in the sampling process is that the sample be unbiased with respect to the population. The following are types of probability-based sampling (a small command-line illustration of the simplest case follows this list):
  - Uniform random sampling: A sampling method where the sampling is done over a uniformly distributed population, that is, each object has an equal probability of being chosen.
  - Stratified random sampling: A sampling method used when the data can be categorized into multiple classes. In such cases, in order to ensure all categories are represented in the sample, the population is divided into distinct strata based on these classifications, and each stratum is sampled in proportion to the fraction of its class in the overall population. Stratified sampling is common when the population density varies across categories, and it is important to compare these categories with the same statistical power.
  - Cluster sampling: Sometimes there are natural groups among the population being studied, and each group is representative of the whole population. An example is data that spans many geographical regions. In cluster sampling, you take a random subset of the groups followed by a random sample from within each of those groups to construct the full data sample. This kind of sampling can reduce the cost of data collection without compromising the fidelity of distribution in the population.
  - Systematic sampling: Systematic or interval sampling is used when there is a certain ordering present in the sampling frame (a finite set of objects treated as the population and taken to be the source of data for sampling, for example, the corpus of Wikipedia articles arranged lexicographically by title).
    If the sample is then selected by starting at a random object and skipping a constant k number of objects before selecting the next one, that is called systematic sampling. Here k is calculated as the ratio of the population size to the sample size (for example, with a population of 10,000 objects and a desired sample of 200, every k = 50th object in the ordering is selected).
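As a small concrete illustration of uniform random sampling (not from the book; it assumes GNU coreutils and a flat file with one record per line), a random subset of records can be drawn on the command line:

# Draw a uniform random sample of 200 records, without replacement.
shuf -n 200 records.csv > sample.csv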
Examples of problems that require unsupervised learning are many; grouping customers according to their purchasing behavior is one example. In the case of biological data, tissue samples can be clustered based on similar gene expression values using unsupervised learning techniques. The following diagram represents data with inherent structure that can be revealed as distinct clusters using an unsupervised learning technique such as K-Means:
Clusters in data
Different techniques are used to detect global outliers (examples that are anomalous with respect to the entire dataset) and local outliers (examples that are misfits in their neighborhood). In the following diagram, the notion of local and global outliers is illustrated for a two-feature dataset:
Local and Global outliers
Semi-supervised learning: When the dataset has some labeled data and a large amount of data that is not labeled, learning from such a dataset is called semi-supervised learning. When dealing with financial data with the goal of detecting fraud, for example, there may be a large amount of unlabeled data and only a small number of known fraud and non-fraud transactions. In such cases, semi-supervised learning may be applied.
Graph mining: Mining data represented as graph structures is known as graph mining. It is the basis of social network analysis and structure analysis in different bioinformatics, web mining, and community mining applications.
Probabilistic graph modeling and inferencing: Learning and exploiting structures present between features to model the data comes under the branch of probabilistic graph modeling.
Time-series forecasting: This is a form of learning where data has distinct temporal behavior and the relationship with time is modeled. A common example is in financial forecasting, where the performance of stocks in a certain sector may be the target of the predictive model.
Association analysis: This is a form of learning where data is in the form of an item set or market basket, and association rules are modeled to explore and predict the relationships between the items. A common example in association analysis is to learn relationships between the most common items bought by customers when they visit the grocery store.
Reinforcement learning: This is a form of learning where machines learn to maximize performance based on feedback in the form of rewards or penalties received from the environment. A recent example that famously used reinforcement learning was AlphaGo, the machine developed by Google that beat the world Go champion Lee Sedol decisively in March 2016. Using a reward and penalty scheme, the model first trained on millions of board positions in the supervised learning stage, then played itself in the reinforcement learning stage to ultimately become good enough to triumph over the best human player.
http://www.theatlantic.com/technology/archive/2016/03/the-invisible-opponent/475611/
https://gogameguru.com/i/2016/03/deepmind-mastering-go.pdf
Stream learning or incremental learning: Learning in a supervised, unsupervised, or semi-supervised manner from stream data in real time or pseudo-real time is called stream or incremental learning. Learning the behaviors of sensors from different types of industrial systems, in order to categorize them into normal and abnormal, needs a real-time feed and real-time detection.
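Before moving on to datasets, here is a short sketch (again, not from the original text) that ties the supervised case back to the weather data shown earlier and to the stratified sampling idea. It assumes Python with pandas and scikit-learn installed, and it re-types the ARFF records by hand purely for illustration:

# Illustrative only: a decision tree on the weather data with a stratified split
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = pd.DataFrame({
    'outlook': ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast',
                'sunny', 'sunny', 'rainy', 'sunny', 'overcast', 'overcast', 'rainy'],
    'temperature': [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71],
    'humidity': [85, 90, 86, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 91],
    'windy': [False, True, False, False, False, True, True,
              False, False, False, True, True, False, True],
    'play': ['no', 'no', 'yes', 'yes', 'yes', 'no', 'yes',
             'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no'],
})

# one-hot encode the categorical outlook feature and separate the target label
X = pd.get_dummies(data.drop(columns='play'))
y = data['play']

# a stratified split keeps the yes/no proportions similar in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

model = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on the held-out rows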
Datasets used in machine learning
To learn from data, we must be able to understand and manage data in all forms. Data originates from many different sources, and consequently, datasets may differ widely in structure or have little or no structure at all. In this section, we present a high-level classification of datasets with commonly occurring examples. Based on structure, datasets may be classified as containing the following:
Structured or record data: Structured data is the most common form of dataset available for machine learning. The data is in the form of records or rows following a well-known format, with features that are either columns in a table or fields delimited by separators or tokens. There is no explicit relationship between the records or instances. The dataset is available mostly in flat files or relational databases. The records of financial transactions at a bank shown in the following screenshot are an example of structured data:
Financial Card Transactional Data with labels of Fraud.
Transaction or market data: This is a special form of structured data where each record corresponds to a collection of items. Examples of market datasets are the list of grocery items purchased by different customers, or movies viewed by customers, as shown in the following screenshot:
Market Dataset for Items bought from grocery store.
Unstructured data: Unstructured data is normally not available in well-known formats, unlike structured data. Text data, image, and video data are different formats of unstructured data. Normally, a transformation of some form is needed to extract features from these forms of data into structured datasets like those mentioned previously, so that traditional machine learning algorithms can be applied:
Sample Text Data from SMS with labels of spam and ham, by Tiago A. Almeida from the Federal University of Sao Carlos.
Sequential data: Sequential data has an explicit notion of order. The order can be some relationship between features and a time variable in time series data, or symbols repeating in some form in genomic datasets. Two examples are weather data and genomic sequence data. The following diagram shows the relationship between time and the sensor level for weather:
Time Series from Sensor Data
Three genomic sequences are taken into consideration to show the repetition of the sequences CGGGT and TTGAAAGTGGTG in all three genomic sequences:
Genomic Sequences of DNA as sequence of symbols.
Graph data: Graph data is characterized by the presence of relationships between entities in the data that form a graph structure. Graph datasets may be in a structured record format or an unstructured format. Typically, the graph relationship has to be mined from the dataset. Claims in the insurance domain can be considered structured records containing relevant claim details, with claimants related through addresses, phone numbers, and so on; this can be viewed as a graph structure. Using the World Wide Web as an example, we have web pages available as unstructured data containing links, and graphs of relationships between web pages can be built using web links, producing some of the most mined graph datasets today:
Insurance Claim Data, converted into graph structure with relationship between vehicles, drivers, policies and addresses
Machine learning applications
Given the rapidly growing use of machine learning in diverse areas of human endeavor, any attempt to list typical applications in the different industries where some form of machine learning is in use must necessarily be incomplete.
Nevertheless, in this section we list a broad set of machine learning applications by domain, uses, and the type of learning used:
Domain/Industry | Applications | Machine Learning Type
Financial | Credit Risk Scoring, Fraud Detection, Anti-Money Laundering | Supervised, Unsupervised, Graph Models, Time Series, and Stream Learning
Web | Online Campaigns, Health Monitoring, Ad Targeting | Supervised, Unsupervised, Semi-Supervised
Healthcare | Evidence-based Medicine, Epidemiological Surveillance, Drug Events Prediction, Claim Fraud Detection | Supervised, Unsupervised, Graph Models, Time Series, and Stream Learning
Internet of Things (IoT) | Cyber Security, Smart Roads, Sensor Health Monitoring | Supervised, Unsupervised, Semi-Supervised, and Stream Learning
Environment | Weather forecasting, Pollution modeling, Water quality measurement | Time Series, Supervised, Unsupervised, Semi-Supervised, and Stream Learning
Retail | Inventory, Customer Management and Recommendations, Layout and Forecasting | Time Series, Supervised, Unsupervised, Semi-Supervised, and Stream Learning
Summary
A revival of interest is seen in the area of artificial intelligence (AI) and machine learning in particular, both in academic circles and in industry. Machine learning is used to help in complex decision making at the highest levels of business, and it has also achieved enormous success in improving the accuracy of everyday applications, such as search, speech recognition, and personal assistants on mobile phones. The basics of machine learning rely on an understanding of data: structured datasets have specific formats, while an unstructured dataset is normally in the form of some free-flowing text. Two major branches were introduced: supervised learning, the most popular branch of machine learning, which is about learning from labeled data, and unsupervised learning, which is about understanding and exploring the data in order to build machine learning models when labels are not given.
Resources for Article:
Further resources on this subject:
Specialized Machine Learning Topics [article]
Machine learning in practice [article]
Introduction to Machine Learning with R [article]

Parallelize It

Packt
18 Jul 2017
15 min read
In this article by Elliot Forbes, the author of the book Learning Concurrency in Python, will explain concurrency and parallelism thoroughly, and bring necessary CPU knowledge related to it. Concurrency and parallelism are two concepts that are commonly confused. The reality though is that they are quite different and if you designed software to be concurrent when instead you needed parallel execution then you could be seriously impacting your software’s true performance potential. Due to this, it's vital to know exactly what the two concepts mean so that you can understand the differences. Through knowing these differences you’ll be putting yourself at a distinct advantage when it comes to designing your own high performance software in Python. In this article we’ll be covering the following topics: What is concurrency and what are the major bottlenecks that impact our applications? What is parallelism and how does this differ from concurrency? (For more resources related to this topic, see here.) Understanding concurrency Concurrency is essentially the practice of doing multiple things at the same time, but not specifically in parallel. It can help us to improve the perceived performance of our applications and it can also improve the speed at which our applications run. The best way to think of how concurrency works is to imagine one person working on multiple tasks and quickly switching between these tasks. Imagine this one person was working concurrently on a program and at the same time dealing with support requests. This person would focus primarily on the writing of their program and quickly context switch to fixing a bug or dealing with a support issue should there be one. Once they complete the support task, they could context switch again back to writing their program really quickly. However, in computing there are typically two performance bottlenecks that we have to watch out for and guard against when writing our programs. It’s important to know the differences between the two bottlenecks as if we tried to apply concurrency to a CPU based bottleneck then you could find that the program actually starts to see performance decreases as opposed to increases. And if you tried to apply parallelism to a task that really require a concurrent solution then again you could see the same performance hits. Properties of concurrent systems All concurrent systems share a similar set of properties, these can be defined as: Multiple actors: This represent the different processes and threads all trying to actively make progress on their own tasks. We could have multiple processes that contain multiple threads all trying to run at the same time. Shared Resources: This represents the memory, the disk and other resources that the actors in the above group must utilize in order to perform what they need to do. Rules: All concurrent systems must follow a strict set of rules that define when actors can and can’t acquire locks, access memory, modify state and so on. These rules are vital in order for these concurrent systems to work otherwise our programs would tear themselves apart. Input/Output bottlenecks Input/Output bottlenecks, or I/O bottlenecks for short, are bottlenecks where your computer spends more time waiting on various inputs and outputs than it does on processing the information. You’ll typically find this type of bottleneck when you are working with an I/O heavy application. We could take your standard web browser as an example of a heavy I/O application. 
In a browser, we typically spend a significantly longer amount of time waiting for network requests to finish (for things like style sheets, scripts, or HTML pages to load) than we do rendering the result on the screen. If the rate at which data is requested is slower than the rate at which it is consumed, then you have yourself an I/O bottleneck. One of the main ways to improve the speed of these applications is typically to either improve the speed of the underlying I/O by buying more expensive and faster hardware, or to improve the way in which we handle these I/O requests. A great example of a program bound by I/O bottlenecks would be a web crawler. The main purpose of a web crawler is to traverse the web and essentially index web pages so that they can be taken into consideration when Google runs its search ranking algorithm to decide the top 10 results for a given keyword. We'll start by creating a very simple script that just requests a page and times how long it takes to request said web page:
import urllib.request
import time

t0 = time.time()
req = urllib.request.urlopen('http://www.example.com')
pageHtml = req.read()
t1 = time.time()
print("Total Time To Fetch Page: {} Seconds".format(t1-t0))
If we break down this code, first we import the two necessary modules, urllib.request and the time module. We then record the starting time, request the web page example.com, record the ending time, and print out the time difference. Now say we wanted to add a bit of complexity and follow any links to other pages so that we could index them in the future. We could use a library such as BeautifulSoup in order to make our lives a little easier:
import urllib.request
import time
from bs4 import BeautifulSoup

t0 = time.time()
req = urllib.request.urlopen('http://www.example.com')
t1 = time.time()
print("Total Time To Fetch Page: {} Seconds".format(t1-t0))

soup = BeautifulSoup(req.read(), "html.parser")
for link in soup.find_all('a'):
    print(link.get('href'))

t2 = time.time()
print("Total Execution Time: {} Seconds".format(t2-t0))
When I execute the above program I see the results like so in my terminal: You'll notice from this output that the time to fetch the page is over a quarter of a second. Now imagine we wanted to run our web crawler for a million different web pages; our total execution time would be roughly a million times longer. The main cause of this enormous execution time would be purely down to the I/O bottleneck we face in our program: we spend a massive amount of time waiting on our network requests and a fraction of that time parsing our retrieved page for further links to crawl.
Understanding parallelism
Parallelism is the art of executing two or more actions simultaneously, as opposed to concurrency, in which you make progress on two or more things at the same time. This is an important distinction, and in order to achieve true parallelism, we'll need multiple processors on which to run our code at the same time. A good analogy for parallel processing is a queue for coffee. If you had, say, two queues of 20 people all waiting to use a single coffee machine so that they can get through the rest of the day, that would be an example of concurrency. Now say you were to introduce a second coffee machine into the mix; this would then be an example of something happening in parallel.
This is exactly how parallel processing works, each of the coffee machines in that room would represent one processing core and are able to make progress on tasks simultaneously. A real life example which highlights the true power of parallel processing is your computer’s graphics card. These graphics cards tend to have hundreds if not thousands of individual processing cores that live independently and can compute things at the same time. The reason we are able to run high-end PC games at such smooth frame rates is due to the fact we’ve been able to put so many parallel cores onto these cards. CPU bound bottleneck A CPU bound bottleneck is typically the inverse of an I/O bound bottleneck. This bottleneck is typically found in applications that do a lot of heavy number crunching or any other task that is computationally expensive. These are programs for which the rate at which they execute is bound by the speed of the CPU, if you throw a faster CPU in your machine you should see a direct increase in the speed of these programs. If the rate at which you are processing data far outweighs the rate at which you are requesting data then you have a CPU Bound Bottleneck. How do they work on a CPU? Understanding the differences outlined in the previous section between both concurrency and parallelism is essential but it’s also very important to understand more about the systems that your software will be running on. Having an appreciation of the different architecture styles as well as the low level mechanics helps you make the most informed decisions in your software design. Single core CPUs Single core processors will only ever execute one thread at any given time as that is all they are capable of. However, in order to ensure that we don’t see our applications hanging and being unresponsive, these processors rapidly switch between multiple threads of execution many thousands of times per second. This switching between threads is what is called a "context switch" and involves storing all the necessary information for a thread at a specific point of time and then restoring it at a different point further down the line. Using this mechanism of constantly saving and restoring threads allows us to make progress on quite a number of threads within a given second and it appears like the computer is doing multiple things at once. It is in fact doing only one thing at any given time but doing it at such speed that it’s imperceptible to users of that machine. When writing multi-threaded applications in Python it is important to note that these context switches are computationally quite expensive. There is no way to get around this unfortunately and much of the design of operating systems these days is about optimizing for these context switches so that we don’t feel the pain quite as much. Advantages of single core CPUs: They do not require any complex communication protocols between multiple cores Single core CPUs require less power which typically makes them better suited for IoT devices Disadvantages: They are limited in speed and larger applications will cause them to struggle and potentially freeze Heat dissipation issues place a hard limit on how fast a single core CPU can go Clock rate One of the key limitations to a single-core application running on a machine is the Clock Speed of the CPU. When we talk about Clock rate, we are essentially talking about how many clock cycles a CPU can execute every second. 
For the past 10 years, we have watched as manufacturers have been able to keep pace with Moore's law, which was essentially an observation that the number of transistors one was able to place on a piece of silicon doubled roughly every 2 years. This doubling of transistors every 2 years paved the way for exponential gains in single-CPU clock rates, and CPUs went from the low MHz to the 4-5 GHz clock speeds we are seeing on Intel's i7 6700k processor. But with transistors getting as small as a few nanometers across, this is inevitably coming to an end. We've started to hit the boundaries of physics, and unfortunately, if we go any smaller we'll start to be hit by the effects of quantum tunneling. Due to these physical limitations, we need to start looking at other methods in order to improve the speeds at which we are able to compute things. This is where Martelli's model of scalability comes into play.
Martelli model of scalability
The author of Python Cookbook, Alex Martelli, came up with a model of scalability which Raymond Hettinger discussed in his brilliant hour-long talk "Thinking about Concurrency", which he gave at PyCon Russia 2016. This model represents three different types of problems and programs:
1 core: single-threaded and single-process programs
2-8 cores: multithreaded and multiprocess programs
9+ cores: distributed computing
The first category, the single-core, single-threaded category, is able to handle a growing number of problems due to the constant improvements in the speed of single-core CPUs, and as a result the second category is being rendered more and more obsolete. We will eventually hit a limit with the speed at which a 2-8 core system can run, and as a result we'll have to start looking at other methods such as multiple-CPU systems or even distributed computing. If your problem is worth solving quickly and it requires a lot of power, then the sensible approach is to go with the distributed computing category and spin up multiple machines and multiple instances of your program in order to tackle your problems in a truly parallel manner. Large enterprise systems that handle hundreds of millions of requests are the main inhabitants of this category. You'll typically find that these enterprise systems are deployed on tens, if not hundreds, of high-performance, incredibly powerful servers in various locations across the world.
Time-sharing - the task scheduler
One of the most important parts of the operating system is the task scheduler. This acts as the maestro of the orchestra and directs everything with impeccable precision and incredible timing and discipline. This maestro has only one real goal, and that is to ensure that every task has a chance to run through till completion; the when and where of a task's execution, however, is non-deterministic. That is to say, if we gave a task scheduler two identical competing processes one after the other, there is no guarantee that the first process will complete first. This non-deterministic nature is what makes concurrent programming so challenging.
An excellent example that highlights this non-deterministic behavior can be seen if we take the following code:
import threading
import time
import random

counter = 1

def workerA():
    global counter
    while counter < 1000:
        counter += 1
        print("Worker A is incrementing counter to {}".format(counter))
        sleepTime = random.randint(0,1)
        time.sleep(sleepTime)

def workerB():
    global counter
    while counter > -1000:
        counter -= 1
        print("Worker B is decrementing counter to {}".format(counter))
        sleepTime = random.randint(0,1)
        time.sleep(sleepTime)

def main():
    t0 = time.time()
    thread1 = threading.Thread(target=workerA)
    thread2 = threading.Thread(target=workerB)
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()
    t1 = time.time()
    print("Execution Time {}".format(t1-t0))

if __name__ == '__main__':
    main()
Here we have two competing threads in Python that are each trying to accomplish their own goal of either decrementing the counter to -1,000 or, conversely, incrementing it to 1,000. On a single-core processor there is the possibility that worker A manages to complete its task before worker B has a chance to execute, and the same can be said for worker B. However, there is a third possibility, and that is that the task scheduler continues to switch between worker A and worker B an infinite number of times and neither ever completes. The above code incidentally also shows one of the dangers of multiple threads accessing shared resources without any form of synchronization (a lock-based variant is sketched at the end of this article). There is no accurate way to determine what will happen to our counter, and as such our program could be considered unreliable.
Multi-core processors
We've now got some idea as to how single-core processors work, but now it's time to take a look at multicore processors. Multicore processors contain multiple independent processing units or "cores". Each core contains everything it needs in order to execute a sequence of stored instructions. These cores each follow their own cycle:
Fetch - This step involves fetching instructions from program memory. This is dictated by a program counter (PC), which identifies the location of the next step to execute.
Decode - The core converts the instruction that it has just fetched into a series of signals that will trigger various other parts of the CPU.
Execute - Finally, we perform the execute step. This is where we run the instruction that we have just fetched and decoded, and typically the results of this execution are then stored in a CPU register.
Having multiple cores offers us the advantage of being able to work independently on multiple Fetch -> Decode -> Execute cycles. This style of architecture essentially enables us to create higher performance programs that leverage this parallel execution.
Advantages of multicore processors:
We are no longer bound by the same performance limitations that a single-core processor is bound by
Applications that are able to take advantage of multiple cores will tend to run faster if well designed
Disadvantages of multicore processors:
They require more power than your typical single-core processor
Cross-core communication is no simple feat; there are multiple different ways of doing this
Summary
In this article we covered a multitude of topics, including the differences between concurrency and parallelism. We also looked at how they both leverage the CPU in different ways.
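Picking up the forward reference above, the following is an illustrative sketch only (it is not from the original text) of how a threading.Lock can protect a shared counter, using a simplified two-incrementer variant of the example. With every read-modify-write wrapped in the lock, the final value becomes deterministic:

import threading

counter = 0
counter_lock = threading.Lock()

def safe_increment(times):
    global counter
    for _ in range(times):
        # only one thread at a time may execute this block
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # always 200000 once the updates are serialized by the lock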
Resources for Article: Further resources on this subject: Python Data Science Up and Running [article] Putting the Fun in Functional Python [article] Basics of Python for Absolute Beginners [article]

Optimizing Hadoop

Packt
18 Jul 2017
9 min read
In this article by Gurmukh Singh, the author of the book Hadoop Administration 2.x Cookbook, we will cover some Hadoop concepts. Hadoop, being a distributed system, is complex to troubleshoot, tune, and operate at scale. In a multi-tenancy environment with varied data sizes and formats, and different rates of data flow, it is important that the Hadoop clusters are set up in the optimal way, by following best practices. The recipes walk users through the phases of installation, planning, tuning, securing, and scaling out the clusters. (For more resources related to this topic, see here.)
HDFS as storage layer
One of the most important components of Hadoop is the distributed file system, HDFS. It provides high throughput by having parallel operations across multiple nodes and multiple disks. It provides fault tolerance by replicating data across multiple nodes. HDFS is a core part, designed not just for storage of data but also for the distributed cache and the localization of files and jars for easy access across all nodes in the cluster. We can have custom implementations as long as we call the Hadoop FileSystem API, and we can extend the features. Hadoop is based on a client-server architecture with the Namenode as master and the Datanodes as slaves. The Datanodes register with the Namenode and advertise their blocks, which are by default 128 MB in size. The HDFS blocks are actual files on the lower file system, with each block having one block file and one checksum file on the ext3/4 filesystem. The Namenode keeps metadata of the objects in the form of a namespace and a bitmap called the inode structure. In HDFS, everything is an object, whether file, block, or directory, and each is represented by approximately 150 bytes per object in the Namenode metadata. The entire metadata is loaded into the memory of the Namenode for its operation. Thus, the memory of the Namenode is a limiting factor on the size of the cluster. During the operation of the cluster, Datanodes send heartbeats to the Namenode, updating their status and sending the block report. Any client reading or writing data must contact the Namenode for the location of blocks or Datanodes, and once the Namenode returns the address of the Datanodes, the client communicates directly with the Datanodes for read and write operations. Protocols like RPC and HTTP help with the communication between the various parties in the cluster. Each daemon in Hadoop has a lightweight Jetty server inbuilt for presenting a web UI and also to provide REST API calls, for data transfer between primary, secondary, and other components in the cluster. The recipes talk about how we set up a Hadoop cluster and configure Hadoop to address various complex production issues: communication channels, failure recovery, the location of the Datanode blocks, the Namenode metadata location, and many more.
YARN framework
For Hadoop 2.x, the YARN framework has provided independence from the processing engine we use in Hadoop. It can be MapReduce, Spark, run with Mesos, and so on. The master ResourceManager (RM) accepts jobs and launches an Application Master, which controls the tasks and resources for that job with the help of the master. Having a separate Application Master (AM) per application relieves the RM of all the overheads of job management, letting it concentrate as a pure scheduler and honour the resource requests from the AMs. Task failure and the re-launch of containers are the responsibility of the AM, and it asks for containers on a node after getting a grant from the ResourceManager.
Each application makes a resource request, which is a combination of node address, memory, CPU cores, and disks, and the AM has to take permission from the master RM, called a grant, before being allocated a container by the NodeManager (NM). This is to make sure that any errant AM does not bring the cluster to a deadlock by aggressively asking for resources. All the job configuration files and jars are localized on the nodes where containers for that particular job will run, with the help of HDFS. The NodeManagers periodically send heartbeats to the RM, updating it about the capacity used, the capacity available, and their presence. A container reservation for a particular task can also be made on a particular node, due to locality of reference, and the framework will make the best effort to fulfil it. Any container allocation is given 600 seconds to move from the Allocated to the Running state, else it will be killed. The amount of data that a mapper or reducer can work on depends upon the memory it has; for example, mapreduce.map.memory.mb specifies the mapper memory. It is important to understand the percentage of JVM memory that will contribute to the heap, as many buffers, like mapred.io.sort.mb, are allocated from it. Increasing the HDFS block size does not mean that a job will run faster, as memory must be available to accommodate it. This is addressed with examples in the recipes, to help better understand the relationship between the various parameters and how changing one impacts the other. The recipes address the layout of YARN components for optimal performance and distribution of load.
Hadoop ecosystem
The Hadoop ecosystem consists of Hive, HBase, Sqoop, Presto, Oozie, and so on. In any organization, there will be a Hadoop stack consisting of multiple components, rather than just HDFS and YARN. The use case could be databases, analytics, NoSQL, ETL, or real-time streaming with Spark. As the number of components in a cluster increases, so do the complexity and the challenges of managing it. If we design the cluster properly from the very beginning, keeping in mind the future growth prospects, it becomes easy to scale. Many traditional technologies co-exist with Hadoop, and there will be a need to migrate data across systems using tools like Hive, HBase, Sqoop, and so on. In the recipes, we address the migration process, configuration, and best practices for Hive and HBase and their interaction. It is important to understand partitioning and bucketing in Hive to enable scaling to large datasets. The recipes address the partitioning schemes needed to work at scale and return results in a reasonable time. In large organizations with multiple tenants, different use cases, and different data formats, it is important to understand the cluster layout. With the ease of scheduling jobs to run at a particular time or on a precondition, users do not need to babysit the jobs. Capturing logs of the various components and jobs helps to diagnose issues and also helps to predict the need for cluster expansion, if any. The granularity at which log and event capture can be controlled in the different tools makes life easier for the Hadoop operations team. The Hadoop log4j configuration settings to capture job failures, memory errors, or GC events help to quickly troubleshoot any failures, as shown in the recipe.
Cluster planning and tuning
In a production environment with many nodes in a cluster and a large number of operations per second, it is important to address performance, failures, and the backup and recovery of critical components, and to keep downtime to a minimum. At any given time, there can be many clients connected to the cluster, so it is important to lay out the Datanodes evenly across racks, with the right set of disks per node. As the cluster size grows to a few thousand nodes, the load on the Namenode increases, as it has to address a larger namespace. What if, in a cluster with 3000 Datanode nodes, all Datanode daemons are started in one go? It will generate a block advertisement storm, overloading the Namenode and causing it to hang. To address this, we introduce initDelay and tune the Namenode handler count, dfs.namenode.handler.count. What about decommissioning a Datanode with 4 TB of disk? How much time will it take? All these need to be addressed by tuning the appropriate parameters (a rough back-of-envelope estimate is sketched at the end of this article). It is important to monitor the bandwidth/transfer rate to keep track of the cluster health in terms of network, disk, and other resources. As can be seen in the following image, the AM container talks with all the containers of a job. Node master1.cyrus.com is the RM, and the AM got allocated on node rt2.cyrus.com. The fan-in and fan-out of containers give a great picture of the bottlenecks and the constraints per application type, as shown in the following image: Despite running Hadoop master nodes on reliable hardware, it is important to plan for failures and upgrades. Namenode and ResourceManager HA eliminate single points of failure and avoid cluster outages. High availability (HA) can be achieved by using shared NFS storage or by using QJM.
Securing the cluster
When Hadoop becomes part of critical businesses like banks, financial domains, aviation, weather forecasting, stock exchanges, medicine, and many more, it becomes important to secure the cluster and safeguard critical and sensitive data. By default, Hadoop is not security enabled for HDFS access or job execution. Every block can be accessed by any container in the cluster, thus making it unsuitable for critical multi-tenant environments. In any organization, there will not be a separate cluster for each BU, but there might be a few clusters shared by finance, sales, and marketing. How do we safeguard each BU's data, do data masking, and make sure that any HDFS block is accessed only by an authenticated user? To enable security, we configure secure block access, in-transit encryption, Kerberos, and SSL. The container on a Datanode should access only the data blocks for which it is authorized, and at the same time it should make things transparent to the end user, hiding the complexity of the implementation. Providing isolated networks for Hadoop clusters, with VLAN segregation and firewalls, helps control the inflow and outflow of data. This is well covered with details on secure access, in-transit transfer, Kerberos, and single sign-on.
Summary
The Hadoop distributed environment is a complex ecosystem with various challenges of installation, configuration, performance, and security. There is no one solution that fits all, as the workload and data formats for each job will be different, with various file formats and data sizes. The recipes address the challenges faced by a Hadoop engineer to set up, maintain, optimize, and scale the cluster.
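As promised above, here is a rough back-of-envelope sketch for the two sizing questions raised in this section. It is illustrative only and not from the original text: the 150 bytes per namespace object comes from the HDFS discussion earlier, while the object count and re-replication rate are assumed figures that you would replace with your own measurements:

# Illustrative back-of-envelope Hadoop sizing estimates (assumed inputs)

# Namenode heap: roughly 150 bytes of metadata per object (file, block, or directory)
objects = 100_000_000                 # assumed number of namespace objects
heap_gb = objects * 150 / (1024 ** 3)
print("Approximate Namenode heap needed: {:.1f} GB".format(heap_gb))

# Decommissioning a Datanode: time to re-replicate its data elsewhere
data_mb = 4 * 1024 * 1024             # 4 TB of data on the node, in MB
rate_mb_per_s = 200                   # assumed aggregate re-replication rate
hours = data_mb / rate_mb_per_s / 3600
print("Approximate re-replication time: {:.1f} hours".format(hours))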
Resources for Article: Further resources on this subject: Event detection from the news headlines in Hadoop [article] Sizing and Configuring your Hadoop Cluster [article] Getting Started with Apache Hadoop and Apache Spark [article]

Up and Running with pandas

Packt
18 Jul 2017
15 min read
In this article by Michael Heydt, author of the book Learning Pandas - Second Edition, we will cover how to install pandas and start using its basic functionality.  The content is provided as IPython and Jupyter notebooks, and hence we will also take a quick look at using both of those tools. We will utilize the Anaconda Scientific Python distribution from Continuum. Anaconda is a popular Python distribution with both free and paid components. Anaconda provides cross-platform support—including Windows, Mac, and Linux. The base distribution of Anaconda installs pandas, IPython and Jupyter notebooks, thereby making it almost trivial to get started. (For more resources related to this topic, see here.) IPython and Jupyter Notebook So far we have executed Python from the command line / terminal. This is the default Read-Eval-Print-Loop (REPL)that comes with Python. Let’s take a brief look at both. IPython IPython is an alternate shell for interactively working with Python. It provides several enhancements to the default REPL provided by Python. If you want to learn about IPython in more detail check out the documentation at https://ipython.org/ipython-doc/3/interactive/tutorial.html To start IPython, simply execute the ipython command from the command line/terminal. When started you will see something like the following: The input prompt which shows In [1]:. Each time you enter a statement in the IPythonREPL the number in the prompt will increase. Likewise, output from any particular entry you make will be prefaced with Out [x]:where x matches the number of the corresponding In [x]:.The following demonstrates: This numbering of in and out statements will be important to the examples as all examples will be prefaced with In [x]:and Out [x]: so that you can follow along. Note that these numberings are purely sequential. If you are following through the code in the text and occur errors in input or enter additional statements, the numbering may get out of sequence (they can be reset by exiting and restarting IPython).  Please use them purely as reference. Jupyter Notebook Jupyter Notebook is the evolution of IPython notebook. It is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and markdown. The original IPython notebook was constrained to Python as the only language. Jupyter Notebook has evolved to allow many programming languages to be used including Python, R, Julia, Scala, and F#. If you want to take a deeper look at Jupyter Notebook head to the following URL http://jupyter.org/: where you will be presented with a page similar to the following: Jupyter Notebookcan be downloaded and used independently of Python. Anaconda does installs it by default. To start a Jupyter Notebookissue the following command at the command line/Terminal: $ jupyter notebook To demonstrate, let's look at how to run the example code that comes with the text. Download the code from the Packt website and and unzip the file to a directory of your choosing. In the directory, you will see the following contents: Now issue the jupyter notebook command. You should see something similar to the following: A browser page will open displaying the Jupyter Notebook homepage which is http://localhost:8888/tree. This will open a web browser window showing this page, which will be a directory listing: Clicking on a .ipynb link opens a notebook page like the following: The notebook that is displayed is HTML that was generated by Jupyter and IPython.  
It consists of a number of cells that can be one of four types: code, markdown, raw nbconvert, or heading. Jupyter runs an IPython kernel for each notebook. Cells that contain Python code are executed within that kernel and the results are added to the notebook as HTML. Double-clicking on any of the cells will make the cell editable. When done editing the contents of a cell, press Shift + Enter, at which point Jupyter/IPython will evaluate the contents and display the results. If you want to learn more about the notebook format that underlies the pages, see https://ipython.org/ipython-doc/3/notebook/nbformat.html.
The toolbar at the top of a notebook gives you a number of abilities to manipulate the notebook. These include adding, removing, and moving cells up and down in the notebook. Also available are commands to run cells, rerun cells, and restart the underlying IPython kernel. To create a new notebook, go to the File > New Notebook > Python 3 menu item. A new notebook page will be created in a new browser tab. Its name will be Untitled:
The notebook consists of a single code cell that is ready to have Python entered. Enter 1+1 in the cell and press Shift + Enter to execute. The cell has been executed and the result shown as Out [2]:. Jupyter also opened a new cell for you to enter more code or markdown. Jupyter Notebook automatically saves your changes every minute, but it's still a good thing to save manually every once in a while.
One final point before we look at a little bit of pandas. Code in the text will be in the format of command-line IPython. As an example, the cell we just created in our notebook will be shown as follows:
In [1]: 1+1
Out [1]: 2
Introducing the pandas Series and DataFrame
Let's jump into using some pandas with a brief introduction to pandas' two main data structures, the Series and the DataFrame. We will examine the following:
Importing pandas into your application
Creating and manipulating a pandas Series
Creating and manipulating a pandas DataFrame
Loading data from a file into a DataFrame
The pandas Series
The pandas Series is the base data structure of pandas. A Series is similar to a NumPy array, but it differs by having an index, which allows for much richer lookup of items instead of just a zero-based array index value. The following creates a Series from a Python list:
In [2]:
# create a four item Series
s = Series([1, 2, 3, 4])
s
Out [2]:
0    1
1    2
2    3
3    4
dtype: int64
The output consists of two columns of information. The first is the index and the second is the data in the Series. Each row of the output represents the index label (in the first column) and then the value associated with that label. Because this Series was created without specifying an index (something we will do next), pandas automatically creates an integer index with labels starting at 0 and increasing by one for each data item. The values of a Series object can then be accessed using the [] operator, passing the label for the value you require. The following gets the value for the label 1:
In [3]: s[1]
Out [3]: 2
This looks very much like normal array access in many programming languages. But as we will see, the index does not have to start at 0, nor increment by one, and it can be of many other data types than just an integer. This ability to associate flexible indexes in this manner is one of the great superpowers of pandas. Multiple items can be retrieved by specifying their labels in a Python list.
The following retrieves the values at labels 1 and 3: In [4]: # return a Series with the row with labels 1 and 3 s[[1, 3]] Out [4]: 1 2 3 4 dtype: int64 A Series object can be created with a user-defined index by using the index parameter and specifying the index labels. The following creates a Series with the same valuesbut with an index consisting of string values: In [5]:# create a series using an explicit index s = pd.Series([1, 2, 3, 4], index = ['a', 'b', 'c', 'd']) s Out [5}:a 1 b 2 c 3 d 4 dtype: int64 Data in the Series object can now be accessed by those alphanumeric index labels. The following retrieves the values at index labels 'a' and 'd': In [6]:# look up items the series having index 'a' and 'd' s[['a', 'd']] Out [6]:a 1 d 4 dtype: int64 It is still possible to refer to the elements of this Series object by their numerical 0-based position. : In [7]:# passing a list of integers to a Series that has # non-integer index labels will look up based upon # 0-based index like an array s[[1, 2]] Out [7]:b 2 c 3 dtype: int64 We can examine the index of a Series using the .index property: In [8]:# get only the index of the Series s.index Out [8]: Index(['a', 'b', 'c', 'd'], dtype='object') The index is itself actually a pandas objectand this output shows us the values of the index and the data type used for the index. In this case note that the type of the data in the index (referred to as the dtype) is object and not string. A common usage of a Series in pandas is to represent a time-series that associates date/time index labels withvalues. The following demonstrates by creating a date range can using the pandas function pd.date_range(): In [9]:# create a Series who's index is a series of dates # between the two specified dates (inclusive) dates = pd.date_range('2016-04-01', '2016-04-06') dates Out [9]:DatetimeIndex(['2016-04-01', '2016-04-02', '2016- 04-03', '2016-04-04', '2016-04-05', '2016-04-06'], dtype='datetime64[ns]', freq='D') This has created a special index in pandas referred to as a DatetimeIndex which is a specialized type of pandas index that is optimized to index data with dates and times. Now let's create a Series using this index. The data values represent high-temperatures on those specific days: In [10]:# create a Series with values (representing temperatures) # for each date in the index temps1 = Series([80, 82, 85, 90, 83, 87], index = dates) temps1 Out [10]:2016-04-01 80 2016-04-02 82 2016-04-03 85 2016-04-04 90 2016-04-05 83 2016-04-06 87 Freq: D, dtype: int64 This type of Series with a DateTimeIndexis referred to as a time-series. We can look up a temperature on a specific data by using the date as a string: In [11]:temps1['2016-04-04'] Out [11]:90 Two Series objects can be applied to each other with an arithmetic operation. The following code creates a second Series and calculates the difference in temperature between the two: In [12]:# create a second series of values using the same index temps2 = Series([70, 75, 69, 83, 79, 77], index = dates) # the following aligns the two by their index values # and calculates the difference at those matching labels temp_diffs = temps1 - temps2 temp_diffs Out [12]:2016-04-01 10 2016-04-02 7 2016-04-03 16 2016-04-04 7 2016-04-05 4 2016-04-06 10 Freq: D, dtype: int64 The result of an arithmetic operation (+, -, /, *, …) on two Series objects that are non-scalar values returns another Series object. 
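One consequence of this label-based alignment is worth a brief illustrative aside (this example is not from the original text): if two Series do not share exactly the same index labels, the arithmetic still succeeds, and the non-matching positions simply come back as NaN:

# illustrative sketch of index alignment with partially matching labels
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])
print(s1 + s2)
# a     NaN
# b    12.0
# c    23.0
# d     NaN
# dtype: float64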
Since the index is not integer-based, we can also look up values by their 0-based position:
In [13]:
# and also possible by integer position as if the
# series was an array
temp_diffs[2]
Out [13]: 16
Finally, pandas provides many descriptive statistical methods. As an example, the following returns the mean of the temperature differences:
In [14]:
# calculate the mean of the values in the Series
temp_diffs.mean()
Out [14]: 9.0
The pandas DataFrame
A pandas Series can only have a single value associated with each index label. To have multiple values per index label we can use a DataFrame. A DataFrame represents one or more Series objects aligned by index label. Each Series will be a column in the DataFrame, and each column can have an associated name. In a way, a DataFrame is analogous to a relational database table in that it contains one or more columns of data of heterogeneous types (but a single type for all items in each respective column). The following creates a DataFrame object with two columns and uses the temperature Series objects:
In [15]:
# create a DataFrame from the two series objects temps1
# and temps2 and give them column names
temps_df = DataFrame(
    {'Missoula': temps1,
     'Philadelphia': temps2})
temps_df
Out [15]:
            Missoula  Philadelphia
2016-04-01        80            70
2016-04-02        82            75
2016-04-03        85            69
2016-04-04        90            83
2016-04-05        83            79
2016-04-06        87            77
The resulting DataFrame has two columns named Missoula and Philadelphia. These columns are new Series objects contained within the DataFrame, with the values copied from the original Series objects. Columns in a DataFrame object can be accessed using an array indexer [] with the name of the column or a list of column names. The following code retrieves the Missoula column:
In [16]:
# get the column with the name Missoula
temps_df['Missoula']
Out [16]:
2016-04-01    80
2016-04-02    82
2016-04-03    85
2016-04-04    90
2016-04-05    83
2016-04-06    87
Freq: D, Name: Missoula, dtype: int64
And the following code retrieves the Philadelphia column:
In [17]:
# likewise we can get just the Philadelphia column
temps_df['Philadelphia']
Out [17]:
2016-04-01    70
2016-04-02    75
2016-04-03    69
2016-04-04    83
2016-04-05    79
2016-04-06    77
Freq: D, Name: Philadelphia, dtype: int64
A Python list of column names can also be used to return multiple columns:
In [18]:
# return both columns in a different order
temps_df[['Philadelphia', 'Missoula']]
Out [18]:
            Philadelphia  Missoula
2016-04-01            70        80
2016-04-02            75        82
2016-04-03            69        85
2016-04-04            83        90
2016-04-05            79        83
2016-04-06            77        87
There is a subtle difference in a DataFrame object as compared to a Series object: passing a list to the [] operator of a DataFrame retrieves the specified columns, whereas a Series would return rows. If the name of a column does not have spaces, it can be accessed using property-style syntax:
In [19]:
# retrieve the Missoula column through property syntax
temps_df.Missoula
Out [19]:
2016-04-01    80
2016-04-02    82
2016-04-03    85
2016-04-04    90
2016-04-05    83
2016-04-06    87
Freq: D, Name: Missoula, dtype: int64
Arithmetic operations between columns within a DataFrame are identical in operation to those on multiple Series.
To demonstrate, the following code calculates the difference between temperatures using property notation: In [20]:# calculate the temperature difference between the two #cities temps_df.Missoula - temps_df.Philadelphia Out [20]:2016-04-01 10 2016-04-02 7 2016-04-03 16 2016-04-04 7 2016-04-05 4 2016-04-06 10 Freq: D, dtype: int64 A new column can be added to DataFrame simply by assigning another Series to a column using the array indexer [] notation. The following adds a new column in the DataFrame with the temperature differences: In [21]:# add a column to temp_df which contains the difference # in temps temps_df['Difference'] = temp_diffs temps_df Out [21]: Missoula Philadelphia Difference 2016-04-01 80 70 10 2016-04-02 82 75 7 2016-04-03 85 69 16 2016-04-04 90 83 7 2016-04-05 83 79 4 2016-04-06 87 77 10 The names of the columns in a DataFrame are accessible via the.columns property. : In [22]:# get the columns, which is also an Index object temps_df.columns Out [22]:Index(['Missoula', 'Philadelphia', 'Difference'], dtype='object') The DataFrame (and Series) objects can be sliced to retrieve specific rows. The following slices the second through fourth rows of temperature difference values: In [23]:# slice the temp differences column for the rows at # location 1 through 4 (as though it is an array) temps_df.Difference[1:4] Out [23]: 2016-04-02 7 2016-04-03 16 2016-04-04 7 Freq: D, Name: Difference, dtype: int64 Entire rows from a DataFrame can be retrieved using the .loc and .iloc properties. .loc ensures that the lookup is by index label, where .iloc uses the 0-based position. - The following retrieves the second row of the DataFrame. In[24]:# get the row at array position 1 temps_df.iloc[1] Out [24]:Missoula 82 Philadelphia 75 Difference 7 Name: 2016-04-02 00:00:00, dtype: int64 Notice that this result has converted the row into a Series with the column names of the DataFrame pivoted into the index labels of the resulting Series.The following shows the resulting index of the result. In [25]:# the names of the columns have become the index # they have been 'pivoted' temps_df.iloc[1].index Out [25]:Index(['Missoula', 'Philadelphia', 'Difference'], dtype='object') Rows can be explicitly accessed via index label using the .loc property. The following code retrieves a row by the index label: In [26]:# retrieve row by index label using .loc temps_df.loc['2016-04-05'] Out [26]:Missoula 83 Philadelphia 79 Difference 4 Name: 2016-04-05 00:00:00, dtype: int64 Specific rows in a DataFrame object can be selected using a list of integer positions. The following selects the values from the Differencecolumn in rows at integer-locations 1, 3, and 5: In [27]:# get the values in the Differences column in rows 1, 3 # and 5using 0-based location temps_df.iloc[[1, 3, 5]].Difference Out [27]:2016-04-02 7 2016-04-04 7 2016-04-06 10 Freq: 2D, Name: Difference, dtype: int64 Rows of a DataFrame can be selected based upon a logical expression that is applied to the data in each row. The following shows values in the Missoulacolumn that are greater than 82 degrees: In [28]:# which values in the Missoula column are > 82? 
temps_df.Missoula > 82
Out [28]:
2016-04-01    False
2016-04-02    False
2016-04-03     True
2016-04-04     True
2016-04-05     True
2016-04-06     True
Freq: D, Name: Missoula, dtype: bool
The results from an expression can then be applied to the [] operator of a DataFrame (and a Series), which results in only the rows where the expression evaluated to True being returned:
In [29]:
# return the rows where the temps for Missoula > 82
temps_df[temps_df.Missoula > 82]
Out [29]:
            Missoula  Philadelphia  Difference
2016-04-03        85            69          16
2016-04-04        90            83           7
2016-04-05        83            79           4
2016-04-06        87            77          10
This technique is referred to as Boolean selection in pandas terminology, and it will form the basis of selecting rows based upon values in specific columns (like a query in SQL using a WHERE clause, but as we will see, also much more powerful).
Visualization
We will dive into visualization in quite some depth in Chapter 14, Visualization, but prior to then we will occasionally perform a quick visualization of data in pandas. Creating a visualization of data is quite simple with pandas. All that needs to be done is to call the .plot() method. The following demonstrates by plotting the Close value of the stock data.
In [40]: df[['Close']].plot();
Summary
In this article, we took an introductory look at the pandas Series and DataFrame objects, demonstrating some of the fundamental capabilities. This exposition showed you how to perform a few basic operations that you can use to get up and running with pandas prior to diving in and learning all the details.
Resources for Article:
Further resources on this subject:
Using indexes to manipulate pandas objects [article]
Predicting Sports Winners with Decision Trees and pandas [article]
The pandas Data Structures [article]

Introduction to IOT

Packt
17 Jul 2017
11 min read
In this article by Kallol Bosu Roy Choudhuri, the author of the book Learn Arduino Prototyping in 10 days, we will learn about IoT. (For more resources related to this topic, see here.) As per Gartner, the number of connected devices around the world is going to reach 50 billion by the year 2020. Just imagine the magnitude and scale of the hyper-connectedness that is being forged every moment, as we read through this exciting article. Figure 1: A typical IoT scenario (automobile example) As we can see in the preceding figure, a typical IoT-based scenario is composed of the following fundamental building blocks: IoT edge device IoT cloud platform An IoT device is used to serve as a bridge between existing machines on the ground and an IoT cloud platform. The IoT cloud platform provides a cloud-based infrastructure backbone for data acquisition, data storage, and computing power for data analytics and reporting. The Arduino platform can be effectively used for prototyping IoT devices for almost any IoT solution very rapidly. Building the edge device In this section, we will learn how to use the ESP8266 Wi-Fi chip with the Arduino Uno for connecting to the Internet and posting data to an IoT cloud. There are numerous IoT cloud players in the market today, including Microsoft Azure and Amazon IoT. In this article, we will use the ThingSpeak IoT platform that is very simple to use with the Arduino platform. The following parts will be required for this prototype: 1 Arduino Uno R3 1 USB Cable 1 ESP8266-01 Wi-Fi Transceiver module 1 Breadboard 1 pc. 1K Ohms resistor 1 pc. 2k ohms resistor Some jumper wires Once all the parts have been assembled, follow the breadboard circuit shown in the following figure and build the edge device: Figure 2: ESP8266 with Arduino Uno Wiring The important facts to remember in the preceding setup are: The RXD pin of the ESP8266 chip should receive a 3.3V input signal. We have ensured this by employing the voltage division method. For test purposes, the preceding setup should work fine. However, the ESP8266 chip is demanding when it comes to power (read current) consumption, especially during transmission cycles. Just in case the ESP8266 chip does not respond to the Arduino sketch or AT commands properly, then the power supply may not be enough. Try using a separate battery for the setup. When using a separate battery, remember to use a voltage regulator that steps down the voltage to 3.3 volts before supplying the ESP8266 chip. For prolonged usage, a separate battery based power supply is recommended. Smart retail project inspiration In the previous sections, we looked at the basics of achieving IoT prototyping with the Arduino platform and an IoT cloud platform. With this basic knowledge, you are encouraged to start exploring more advanced IoT scenarios. As a future inspiration, the following smart retail project idea is being provided for you to try by applying the basic principles that you have learned in this article. After all, the goal of this article has been to show you the light and make you self-reliant with the Arduino platform. Imagine a large retail store where products are displayed in hundreds of aisles and thousands of racks. Such large warehouse-type retail layouts are common in some countries, usually with furniture sellers. One of the time-consuming tasks that these retail shops face is to keep the price of their displayed inventory matched with the ever changing competitive market rates. 
For example, the price of a sofa set could be marked at 350 dollars on aisle number 47 rack number 1. Now let's think from a customer’s perspective. Imagine being a potential customer, standing in front of the sofa set we would naturally search for the prices of that sofa set on the Internet. It would not be very surprising to find a similar sofa set that is priced a little lower, maybe at 325 dollars at another store. That is exactly how and when a potential customer would change her mind. The story after this is simple. The customer leaves store A, goes to store B and purchases the sofa set at 325 dollars. In order to address these types of lost sale opportunities, the furniture company management decides to lower the prices of the sofa set to 325 dollars, in order to match the competition. Thereafter, all that needs to be done is for someone to change the price label for the sofa set in aisle number 47 rack number 1, which is a 5–10 minute walk (considering the size of the shop floor) from the shop back office. In a localized store, it is still achievable without further loss of customers. Now, let's appreciate the problem by thinking hyperscale. The furniture seller’s management is located centrally, say in Sweden and they want to dynamically change the product prices for specific price-sensitive products that are displayed across more than 350 stores, in more than 40 countries. The price change should be automatic, near real time, and simultaneous across all company stores. Given the preceding problem statement, it seems a daunting task that could leave hundreds for shop floor staff scampering all day long, just to keep changing price tags for hundreds of products. An elegant solution to this problem lies in the Internet-of-Things. Figure 3: Smart retail with IoT Referring to the preceding figure, it depicts a basic IoT solution for addressing the furniture seller’s problem to matching product prices dynamically on the fly. Remember, since this is an Arduino prototyping article, the stress is on edge device prototyping. However, IoT as a whole; encompasses many areas that include edge devices and cloud platform capabilities. Our focus in this section is to be able to comprehend and build the edge device prototype that supports this smart retail IoT solution. In the preceding solution, the cloud platform takes care of running intelligent program batches to continuously analyze market rates for price-sensitive products. Once a new price has been determined by the cloud job/program, the new product prices are updated in the cloud-hosted database. Immediately after a new price change, all the IoT edge devices attached to the price tag of specific products in the company stores are notified of the new price. We can build a smart LCD panel for displaying product prices. For Internet connectivity, we can reuse the ESP8266 Wi-Fi chip that we learned in this article. Standalone Devices We are already familiar with the basic parts required for building a prototype. The two new aspects to consider when building a standalone project are an independent power source and a project container. Figure 4 - Typical parts of a Standalone Prototype As shown above, typically a standalone device prototype will contain the following parts: The device prototype (Arduino board + Peripherals + all the required Connections) An independent power source A project container/box After the basic prototype has been built the next consideration is to make it operable on its own, like an island. 
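As a rough illustration of the display side of such a smart price tag, the following sketch drives a 16x2 LCD with the LiquidCrystal library and prints a price value. The LCD pin assignments and the hard-coded price are assumptions for illustration only; in a real device the value would be refreshed over the ESP8266 link (for example, by reading a field from the cloud platform) rather than being fixed in the sketch.

```cpp
#include <LiquidCrystal.h>

// Assumed wiring (not from the original article): RS=12, EN=11, D4=5, D5=4, D6=3, D7=2.
LiquidCrystal lcd(12, 11, 5, 4, 3, 2);

// In the full solution this value would be fetched over the ESP8266 link;
// it is hard-coded here purely for illustration.
float currentPrice = 350.00;

void setup() {
  lcd.begin(16, 2);                 // 16 columns x 2 rows
  lcd.print("Sofa set");
}

void loop() {
  // A hypothetical fetchLatestPrice() helper would update currentPrice here.
  lcd.setCursor(0, 1);
  lcd.print("Price: $");
  lcd.print(currentPrice, 2);       // two decimal places
  delay(5000);                      // refresh every few seconds
}
```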
Standalone devices

We are already familiar with the basic parts required for building a prototype. The two new aspects to consider when building a standalone project are an independent power source and a project container.

Figure 4 - Typical parts of a standalone prototype

As shown above, a standalone device prototype will typically contain the following parts:

- The device prototype (Arduino board + peripherals + all the required connections)
- An independent power source
- A project container/box

After the basic prototype has been built, the next consideration is to make it operable on its own, like an island. This is because in real-world situations you will often have to make a device that is not directly connected to, and powered from, a computer. Therefore, we need to understand the various options that are available for powering our device prototypes, and also when to choose which option. The second aspect to consider is an appropriate physical container to house the various parts of a project. A physical container ensures that all parts of a project are packaged in a safe and aesthetic manner.

A distance measurement device

Let's build an exciting project by combining an ultrasonic sensor with a 16x2 LCD character display to build an electronic distance measurement device. We will use one of the most easily available 9-volt batteries for powering this standalone device prototype. For building the distance measurement device, the following parts will be required:

- 1 Arduino Uno R3
- 1 USB connector
- 1 pc. 9-volt battery
- 1 full-sized breadboard
- 1 HC-SR04 ultrasonic sensor
- 1 pc. 16x2 LCD character display
- 1 pc. 10K potentiometer
- 2 pcs. 220 ohm resistors
- 1 pc. 150 ohm resistor
- Some jumper wires

Before we start building the device, let's understand what the device will do and the various parts involved. The purpose of the device is to measure the distance of an object from the device. The following diagram depicts an overview of the device:

Figure 5 - A standalone distance measurement device overview

First, let's quickly understand each of the components involved in the preceding setup. Then, we will jump into hands-on prototyping and coding. The ultrasonic sensor model used in this example is known as HC-SR04. The HC-SR04 is a standard, commercially available ultrasonic transceiver. A transceiver is a device that is capable of transmitting as well as receiving signals. The HC-SR04 transceiver transmits ultrasonic signals; once the signals hit an object or obstacle, they echo back to the HC-SR04. The HC-SR04 ultrasonic module is shown below for reference.

Figure 6 - The HC-SR04 ultrasonic module

The HC-SR04 has four pins. Their usage is explained below:

- Vcc: This pin is connected to a 5-volt power supply.
- Trig: This pin receives digital signals from the attached microcontroller unit in order to send out an ultrasonic burst.
- Echo: This pin outputs the measured time duration, which is proportional to the distance travelled by the ultrasonic burst.
- Gnd: This pin is connected to the ground terminal.

The total time taken for the ultrasonic signals to echo back from an obstacle can be divided by 2, and then, based on the speed of sound in air, the distance between the object and the HC-SR04 can be calculated. We will see how to calculate the distance in the sketch for this device prototype. As per the HC-SR04 data sheet, it is a 5-volt tolerant device, operating at 15 mA, with a measurement range starting from 2 centimeters up to a maximum of 4 meters. The HC-SR04 can be directly connected to the Arduino board pins.

The 16x2 LCD character display is also a standard, commercially available device; it has 16 columns and 2 rows. The LCD is controlled through its 4 data pins/lines. We will also see how to send string output to the LCD from the Arduino sketch. The power supply used in today's example is a standard 9-volt battery plugged into the Arduino's DC IN jack. Alternatively, you can use 6 pieces of either AA-sized or AAA-sized batteries in series and plug them into the VIN pin of the Arduino board.
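To make the timing-to-distance calculation concrete before wiring up the full circuit, here is a minimal sketch of the measurement loop. It uses the Trig and Echo pin assignments described in the next section (D8 and D7), while the LCD pin assignments are assumptions for illustration; treat it as a sketch of the idea rather than the book's full listing. The conversion uses the speed of sound in air, roughly 343 m/s, which is about 0.0343 cm per microsecond.

```cpp
#include <LiquidCrystal.h>

const int TRIG_PIN = 8;   // matches the circuit description: Trig driven from D8
const int ECHO_PIN = 7;   // Echo read on D7
// The LCD pin assignments below are assumptions for illustration only.
LiquidCrystal lcd(12, 11, 5, 4, 3, 2);

void setup() {
  pinMode(TRIG_PIN, OUTPUT);
  pinMode(ECHO_PIN, INPUT);
  lcd.begin(16, 2);
  lcd.print("Distance:");
}

void loop() {
  // Trigger a 10 microsecond ultrasonic burst.
  digitalWrite(TRIG_PIN, LOW);
  delayMicroseconds(2);
  digitalWrite(TRIG_PIN, HIGH);
  delayMicroseconds(10);
  digitalWrite(TRIG_PIN, LOW);

  // Echo stays HIGH for the round-trip time of the burst, in microseconds.
  unsigned long duration = pulseIn(ECHO_PIN, HIGH, 30000UL);  // time out beyond ~4 m

  // Divide by 2 for the one-way time, then multiply by the speed of sound
  // (~343 m/s, that is ~0.0343 cm per microsecond).
  float distanceCm = (duration / 2.0) * 0.0343;

  lcd.setCursor(0, 1);
  lcd.print("                ");   // clear the second row
  lcd.setCursor(0, 1);
  if (duration == 0) {
    lcd.print("Out of range");
  } else {
    lcd.print(distanceCm, 1);
    lcd.print(" cm");
  }
  delay(500);
}
```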
Distance measurement device circuit

Follow the breadboard diagram shown next to build the distance measurement device. The diagram shown on the next page is quite complex; take your time as you work through it. All the components (including the Arduino board) in this prototype are powered from the 9-volt battery.

Sometimes the LCD procured online might not ship with soldered header pins. In such a case, you will have to solder 16 header pins. It is very important to note that unless the header pins are soldered properly into the LCD board, the LCD screen will not work correctly. This is a challenging prototype to get working in one go, so make sure there are no loose jumper wires. Notice how the positive and negative terminals of the power source are plugged into the VIN and GND pins of the Arduino board respectively.

The 10K potentiometer has three legs. If you look straight at the breadboard diagram, the left-hand leg of the potentiometer is connected to the 5V power supply rail of the breadboard.

Figure 7 - Typical potentiometers

The right-hand leg is connected to the common ground rail of the breadboard. The leg in the middle is the regulated output (via the potentiometer's 10K resistance dial) that controls the LCD's V0/VEE pin. Basically, this pin controls the contrast of the display. You will also have to adjust the 10K potentiometer dial (a simple screwdriver may be used) to around halfway, at 5K, to make the characters visible on the LCD screen. Initially, you may not see anything on the LCD until the potentiometer is adjusted properly.

Figure 8 - Distance measurement device prototype

When the Trig pin receives a signal (via pin D8 in this example), the sensor sends out ultrasonic waves to the surroundings. As soon as the ultrasonic waves collide with an obstacle, they get reflected, and the reflected waves are received by the HC-SR04 sensor. The Echo pin is used to read the output of the ultrasonic sensor (via pin D7 in this example). The output read from the Echo pin is processed by the Arduino sketch to calculate the distance.

Summary

Thus, an IoT device serves as a bridge between existing machines on the ground and an IoT cloud platform.

Resources for Article:

Further resources on this subject:

Introducing IoT with Particle's Photon and Electron [article]

IoT and Decision Science [article]

IoT Analytics for the Cloud [article]

Initial Configuration of SCO 2016

Packt
17 Jul 2017
13 min read
In this article by Michael Seidl, author of the book Microsoft System Center 2016 Orchestrator Cookbook - Second Edition, will show you how to setup Orchestrator Environment and how to deploy and configure Orchestrator Integration Packs. (For more resources related to this topic, see here.) Deploying an additional Runbook designer Runbook designer is the key feature to build your Runbooks. After the initial installation, Runbook designer is installed on the server. For your daily work with orchestrator and Runbooks, you would like to install the Runbook designer on your client or on admin server. We will go through these steps in this recipe. Getting ready You must review the planning the Orchestrator deployment recipe before performing the steps in this recipe. There are a number of dependencies in the planning recipe you must perform in order to successfully complete the tasks in this recipe. You must install a management server before you can install the additional Runbook Designers. The user account performing the installation has administrative privileges on the server nominated for the SCO deployment and must also be a member of OrchestratorUsersGroup or equivalent rights. The example deployment in this recipe is based on the following configuration details: Management server called TLSCO01 with a remote database is already installed System Center 2016 Orchestrator How to do it... The Runbook designer is used to build Runbooks using standard activities and or integration pack activities. The designer can be installed on either a server class operating system or a client class operating system. Follow these steps to deploy an additional Runbook Designer using the deployment manager: Install a supported operating system and join the active directory domain in scope of the SCO deployment. In this recipe the operating system is Windows 10. Ensure you configure the allowed ports and services if the local firewall is enabled for the domain profile. See the following link for details: https://technet.microsoft.com/en-us/library/hh420382(v=sc.12).aspx. Log in to the SCO Management server with a user account with SCO administrative rights. Launch System Center 2016 Orchestrator Deployment Manager: Right-click on Runbook designers, and select Deploy new Runbook Designer: Click on Next on the welcome page. Type the computer name in the Computer field and click on Add. Click on Next. On the Deploy Integration Packs or Hotfixes page check all the integration packs required by the user of the Runbook designer (for this example we will select the AD IP). Click on Next. Click on Finish to begin the installation using the Deployment Manager. How it works... The Deployment Manager is a great option for scaling out your Runbook Servers and also for distributing the Runbook Designer without the need for the installation media. In both cases the Deployment Manager connects to the Management Server and the database server to configure the necessary settings. On the target system the deployment manager installs the required binaries and optionally deploys the integration packs selected. Using the Deployment Manager provides a consistent and coordinated approach to scaling out the components of a SCO deployment. 
See also The following official web link is a great source of the most up to date information on SCO: https://docs.microsoft.com/en-us/system-center/orchestrator/ Registering an SCO Integration Pack Microsoft System Center 2016 Orchestrator (SCO) automation is driven by process automation components. These process automation components are similar in concept to a physical toolbox. In a toolbox you typically have different types of tools which enable you to build what you desire. In the context of SCO these tools are known as Activities. Activities fall into two main categories: Built-in Standard Activities: These are the default activity categories available to you in the Runbook Designer. The standard activities on their own provide you with a set of components to create very powerful Runbooks. Integration Pack Activities: Integration Pack Activities are provided either by Microsoft, the community, solution integration organizations, or are custom created by using the Orchestrator Integration Pack Toolkit. These activities provide you with the Runbook components to interface with the target environment of the IP. For example, the Active Directory IP has the activities you can perform in the target Active Directory environment. This recipe provides the steps to find and register the second type of activities into your default implementation of SCO. Getting ready You must download the Integration Pack(s) you plan to deploy from the provider of the IP. In this example we will be deploying the Active Directory IP, which can be found at the following link: https://www.microsoft.com/en-us/download/details.aspx?id=54098. You must have deployed a System Center 2016 Orchestrator environment and have full administrative rights in the environment. How to do it... The following diagram provides a visual summary and order of the tasks you need to perform to complete this recipe: We will deploy the Microsoft Active Directory (AD) integration pack (IP). Integration pack organization A good practice is to create a folder structure for your integration packs. The folders should reflect versions of the IPs for logical grouping and management. The version of the IP will be visible in the console and as such you must perform this step after you have performed the step to load the IP(s). This approach will aid in change management when updating IPs in multiple environments. Follow these steps to deploy the Active Directory integration pack. Identify the source location for the Integration Pack in scope (for example, the AD IP for SCO2016). Download the IP to a local directory on the Management Server or UNC share. Log in to the SCO Management server. Launch the Deployment Manager: Under Orchestrator Management Server, right-click on Integration Packs. Select Register IP with the Orchestrator Management Server: Click on Next on the welcome page. Click on Add on the Select Integration Packs or Hotfixes page. Navigate to the directory where the target IP is located, click on Open, and then click on Next. Click on Finish . Click on Accept on End-User License Agreement to complete the registration. Click on Refresh to validate if the IP has successfully been registered. How it works... The process of loading an integration pack is simple. The prerequisite for successfully registering the IP (loading) is ensuring you have downloaded a supported IP to a location accessible to the SCO management server. Additionally the person performing the registration must be a SCO administrator. 
At this point, we have registered the Integration Pack using the Deployment Manager. Two steps are still necessary before we can use the Integration Pack; see the following recipes for these.

There's more...

Registering the IP is the first part of the process of making the IP activities available to Runbook Designers and Runbook Servers. The next step is the deployment of the Integration Pack to the Runbook Designer; see the next recipe for that.

Orchestrator Integration Packs are provided not only by Microsoft; third-party companies such as Cisco or NetApp also provide OIPs for their products. Additionally, there is a large community providing Orchestrator Integration Packs. There are several sources for downloading Integration Packs; here are some useful links:

http://www.techguy.at/liste-mit-integration-packs-fuer-system-center-orchestrator/

http://scorch.codeplex.com/

https://www.microsoft.com/en-us/download/details.aspx?id=54098

Deploying the IP to designers and Runbook servers

Registering the Orchestrator Integration Pack is only the first step; you also need to deploy the OIP to your Runbook Designer or Runbook Server.

Getting ready

You have to follow the steps described in the Registering an SCO Integration Pack recipe before you can start with the next steps to deploy an OIP.

How to do it...

In our example, we will deploy the Active Directory Integration Pack to our Runbook Designer. Once the IP in scope (the AD IP in our example) has successfully been registered, follow these steps to deploy it to the Runbook Designers and Runbook Servers:

1. Log in to the SCO Management server and launch the Deployment Manager.
2. Under Orchestrator Management Server, right-click on the Integration Pack in scope and select Deploy IP to Runbook Server or Runbook Designer.
3. Click on Next on the welcome page, select the IP you would like to deploy (in our example, System Center Integration Pack for Active Directory), and then click on Next.
4. On the Computer Selection page, type the name of the Runbook Server or Designer in scope and click on Add (repeat for all servers in scope).
5. On the Installation Options page you have the following three options:
- Schedule the Installation: Select this option if you want to schedule the deployment for a specific time. You still have to select one of the next two options.
- Stop all running Runbooks before installing the Integration Packs or Hotfixes: This option will, as described, stop all current Runbooks in the environment.
- Install the Integration Packs or Hotfixes without stopping the running Runbooks: This is the preferred option if you want to have a controlled deployment without impacting current jobs.
6. Click on Next after making your installation option selection.
7. Click on Finish.

The Integration Pack will be deployed to all selected Designers and Runbook Servers. You must close all Runbook Designer consoles and re-launch them to see the newly deployed Integration Pack.

How it works...

The process of deploying an Integration Pack is simple. The prerequisite for successfully deploying the IP is ensuring you have registered a supported IP in the SCO management server. Now we have successfully deployed an Orchestrator Integration Pack. If you have deployed it to a Runbook Designer, make sure you close and reopen the Designer to be able to use the activities in this Integration Pack.
Now you are able to use these activities to build your Runbooks; the only thing left to do is to follow the next recipe and configure this Integration Pack. These steps can be used for every single Integration Pack, and you can also deploy multiple OIPs with one deployment.

There's more...

You have to deploy an OIP to every single Designer and Runbook Server where you want to work with its activities. It does not matter whether you want to edit a Runbook with the Designer or run a Runbook on a specific Runbook Server; the OIP has to be deployed to both. With the Orchestrator Deployment Manager, this is an easy task to do.

Initial Integration Pack configuration

This recipe provides the steps required to configure an Integration Pack for use once it has been successfully deployed to a Runbook Designer.

Getting ready

You must deploy an Orchestrator environment and also deploy the IP you plan to configure to a Runbook Designer before following the steps in this recipe. The authors assume the user account performing the installation has administrative privileges on the server nominated for the SCO Runbook Designer.

How to do it...

Each Integration Pack serves as an interface to the actions SCO can perform in the target environment. In our example, we will be focusing on the Active Directory connector. We will have two accounts under two categories of AD tasks in our scenario:

IP name | Category of actions | Account name
Active Directory | Domain Account Management | SCOAD_ACCMGT
Active Directory | Domain Administrator Management | SCOAD_DOMADMIN

The following diagram provides a visual summary and order of the tasks you need to perform to complete this recipe:

Follow these steps to complete the configuration of the Active Directory IP options in the Runbook Designer:

1. Create or identify an existing account for the IP tasks. In our example, we are using two accounts to represent two personas of a typical Active Directory delegation model: SCOAD_ACCMGT is an account with the rights to perform account management tasks only, and SCOAD_DOMADMIN is a domain admin account for elevated tasks in Active Directory.
2. Launch the Runbook Designer as an SCO administrator, select Options from the menu bar, and select the IP to configure (in our example, Active Directory).
3. Click on Add, type AD Account Management in the Name: field, and select Microsoft Active Directory Domain Configuration in the Type field.
4. In the Properties section, type the following:
- Configuration User Name: SCOAD_ACCMGT
- Configuration Password: Enter the password for SCOAD_ACCMGT
- Configuration Domain Controller Name (FQDN): The FQDN of an accessible domain controller in the target AD (in this example, TLDC01.TRUSTLAB.LOCAL)
- Configuration Default Parent Container: This is an optional field. Leave it blank.
5. Click on OK.
6. Repeat steps 3 and 4 for the domain admin account and click on Finish to complete the configuration.

How it works...

The IP configuration is unique for each system environment SCO interfaces with for the tasks in scope of automation. The Active Directory IP configuration grants SCO the rights to perform the actions specified in the Runbook using the activities of the IP. Typical Active Directory activities include, but are not limited to, creating user and computer accounts, moving user and computer accounts into organizational units, or deleting user and computer accounts. In our example, we created two connection account configurations for the following reasons: to follow the guidance of scoping automation to the rights of the manual processes.
If we use the example of a Runbook for creating user accounts we do not need domain admin access. A service desk user performing the same action manually would typically be granted only account management rights in AD. We have more flexibility with delegating management and access to Runbooks. Runbooks with elevated rights through the connection configuration can be separated from Runbooks with lower rights using folder security. The configuration requires planning and understanding of its implication before implementing. Each IP has its own unique options which you must specify before you create Runbooks using the specified IP. The default IPs that you can download from Microsoft include the documentation on the properties you must set. There’s more… As you have seen in this recipe, we need to configure each additional Integration Pack with a Connections String, User and Password. The built in Activities from SCO, are using the Service Account rights to perform this Actions, or you can configure a different User for most of the built in Activities.  See also The official online documentation for Microsoft Integration Packs is updated regularly and should be a point for reference at https://www.microsoft.com/en-us/download/details.aspx?id=54098 The creating and maintaining a security model for Orchestrator in this article expands further on the delegation model in SCO. Summary In this article, we have covered the following: Deploying an Additional Runbook Designer Registering an SCO Integration Pack Deploying an SCO Integration Pack to Runbook Designer and Server Initial Integration Pack Configuration Resources for Article: Further resources on this subject: Deploying the Orchestrator Appliance [article] Unpacking System Center 2012 Orchestrator [article] Planning a Compliance Program in Microsoft System Center 2012 [article]

Introduction to Moodle 3

Packt
17 Jul 2017
13 min read
In this article, Ian Wild, the author of the book, Moodle 3.x Developer's Guide will be intoroducing you to Moodle 3.For any organization considering implementing an online learning environment, Moodle is often the number one choice. Key to its success is the free, open source ethos that underpins it. Not only is the Moodle source code fully available to developers, but Moodle itself has been developed to allow for the inclusion of third party plugins. Everything from how users access the platform, the kinds of teaching interactions that are available, through to how attendance and success can be reported – in fact all the key Moodle functionality – can be adapted and enhanced through plugins. (For more resources related to this topic, see here.) What is Moodle? There are three reasons why Moodle has become so important, and much talked about, in the world of online learning, one technical, one philosophical and the third educational. From a technical standpoint Moodle - an acronym for Modular Object Oriented Dynamic Learning Environment – is highly customizable. As a Moodle developer, always remember that the ‘M’ in Moodle stands for modular. If you are faced with a client feature request that demands a feature Moodle doesn’t support then don’t panic. The answer is simple: we create a new custom plugin to implement it. Check out the Moodle Plugins Directory (https://moodle.org/plugins/) for a comprehensive library of supported 3rd party plugins that Moodle developers have created and given back to the community. And this leads to the philosophical reason why Moodle dominates. Free open source software for education Moodle is grounded firmly in a community-based, open source philosophy (see https://en.wikipedia.org/wiki/Open-source_model). But what does this mean for developers? Fundamentally, it means that we have complete access to the source code and, within reason, unfettered access to the people who develop it. Access to the application itself is free – you don’t need to pay to download it and you don’t need to pay to run it. But be aware of what ‘free’ means in this context. Hosting and administration, for example, take time and resources and are very unlikely to be free. As an educational tool, Moodle was developed to support social constructionism (see https://docs.moodle.org/31/en/Pedagogy) – if you are not familiar with this concept then it is essentially suggesting that building an understanding of a concept or idea can be best achieved by interacting with a broad community. The impact on us as Moodle plugin developers is that there is a highly active group of users and developers. Before you begin developing any Moodle plugins come and join us at https://moodle.org. Plugin Development – Authentication In this article, we will be developing a novel plugin that will seamlessly integrate Moodle and the WordPress content management system. Our plugin will authorize users via WordPress when they click on a link to Moodle when on a WordPress page. The plugin discussed in this article has already been released to the Moodle community – check out the Moodle Plugins Directory at https://moodle.org/plugins/auth_wordpress for details. Let us start by learning what Moodle authentication is and how new user accounts are created. Authentication Moodle supports a range of different authentication methods out of the box, each one supported by its own plugin. 
To go to the list of available plugins, from the Administration block, click on Site administration, click Plugins, then click Authentication, and finally click on Manage authentication. The list of currently installed authentication plugins is displayed: Each plugin interfaces with an internal Application Programming Interface (API), the Access API – see the Moodle developer documentation for details here: https://docs.moodle.org/dev/Access_API Getting logged in There are two ways of prompting the Moodle authentication process: Attempting to log in from the log in page. Clicking on a link to a protected resource (i.e. a page or file that you can’t view or download without logging in). For an overview of the process, take a look in the developer documentation at https://docs.moodle.org/dev/Authentication_plugins#Overview_of_Moodle_authentication_process. After checks to determine if an upgrade is required (or if we are partway through the upgrade process), there is a short fragment of code that loads the configured authentication plugins and for each one calls a special method called loginpage_hook(): $authsequence = get_enabled_auth_plugins(true); // auths, in sequence foreach($authsequence as $authname) { $authplugin = get_auth_plugin($authname); $authplugin->loginpage_hook(); } The loginpage_hook() function gives each authentication plugin the chance to intercept the login. Assuming that the login has not been intercepted, the process then continues with a check to ensure the supplied username conforms to the configured standard before calling authenticate_user_login() which, if successful, returns a $user object. OAuth Overview The OAuth authentication mechanism provides secure delegated access. OAuth supports a number of scenarios, including: A client requests access from a server and the server responds with either a ‘confirm’ or ‘deny’. This is called two legged authentication A client requests access from a server and the server, the server then pops up a confirmation dialog so that the user can authorize the access, and then it finally responds with either a ‘confirm’ or ‘deny’. This is called three legged authentication In this article we will be implementing such a mechanism means, in practice, that: An authentication server will only talk to configured clients No passwords are exchanged between server and client – only tokens are exchanged, which are meaningless on their own By default, users need to give permission before resources are accessed Having given an overview, here is the process again described in a little more detail: A new client is configured in the authentication server. A client is allocated a unique client key, along with a secret token (referred to as client secret) The client POSTs an HTTP request to the server (identifying itself using the client key and client secret) and the server responds with a temporary access token. This token is used to request authorization to access protected resources from the server. In this case ‘protected resources’ mean the WordPress API. Access to the WordPress API will allow us to determine details of the currently logged in user. The server responds not with an HTTP response but by POSTing new permanent tokens back to the client via a callback URI (i.e. the server talks to the client directly in order to ensure security). The process ends with the client possessing permanent authorization tokens that can be used to access WP-API functions. 
Obviously, the most effective way of learning about this process is to implement it so let’s go ahead and do that now. Installing the WordPress OAuth 1.0a server The first step will be to add the OAuth 1.0a server plugin to WordPress. Why not the more recent OAuth 2.0 server plugin? This is because 2.0 only supports https:// and not http://. Also, internally (at least at time of writing) WordPress will only authenticate internally using either OAuth 1.0a or cookies. Log into WordPress as an administrator and, from the Dashboard, hover the mouse over the Plugins menu item and click on Installed Plugins. The Plugins page is displayed. At the top of the page, press the Add New button: As described previously, ensure that you install version 1.0a and not 2.0: Once installed, we need to configure a new client. From the Dashboard menu, hover the mouse over Users and you will see a new Applications menu item has been added. Click on this to display the Registered Applications page. Click the Add New button to display the Add Application page: The Consumer Name is the title for our client that will appear in the Applications page, and Description is a brief explanation of that client to aid with the identification. The Callback is the URI that WordPress will talk to (refer to the outline of the OAuth authentication steps). As we have not yet developed the Moodle/OAuth client end yet you can specify oob in Callback (this stands for ‘out of band’). Once configured, WordPress will generate new OAuth credentials, a Client Key and a Client Secret: Having installed and configured the server end now it’s time to develop the client. Creating a new Moodle auth plugin Before we begin, download the finished plugin from https://github.com/iandavidwild/moodle-auth_wordpress and install it in your local development Moodle instance. The development of a new authentication plugin is described in the developer documentation at https://docs.moodle.org/dev/Authentication_plugins. As described there, let us start with copying the none plugin (the no login authentication method) and using this as a template for our new plugin. In Eclipse, I’m going to copy the none plugin to a new authentication method called wordpress: That done, we need to update the occurrences of auth_none to auth_wordpress. Firstly, rename /auth/wordpress/lang/en/auth_none.php to auth_wordpress.php. Then, in auth.php we need to rename the class auth_plugin_none to auth_plugin_wordpress. As described, the Eclipse Find/Replace function is great for updating scripts: Next, we need to update version information in version.php. Update all the relevant names, descriptions and dates. Finally, we can check that Moodle recognises our new plugin by navigating to the Site administration menu and clicking on Notifications. If installation is successful, our new plugin will be listed on the Available authentication plugins page: Configuration Let us start with considering the plugin configuration. We will need to allow a Moodle administrator to configure the following: The URL of the WordPress installation The client key and client secret provided by WordPress There is very little flexibility in the design of an authentication plugin configuration page so at this stage, rather than creating a wireframe drawing and having this agreed with the client, we can simply go ahead and write the code. The configuration page is defined in /config.html. Remember to start declaring the relevant language strings in /lang/en/auth_wordpress.php. 
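As a reminder of what those language string declarations look like, here is a minimal sketch of /lang/en/auth_wordpress.php. Apart from pluginname, which Moodle looks up for every plugin, the exact string keys below are illustrative assumptions; use whichever keys your config.html and auth.php actually reference.

```php
<?php
// /auth/wordpress/lang/en/auth_wordpress.php
// Minimal sketch only; key names other than 'pluginname' are assumptions.

$string['pluginname'] = 'WordPress OAuth';
$string['auth_wordpressdescription'] = 'Authenticate against a WordPress site using OAuth 1.0a.';
$string['wordpress_host'] = 'WordPress URL';
$string['client_key'] = 'Client key';
$string['client_secret'] = 'Client secret';
```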
Configuration settings themselves will be managed by the Moodle framework by calling our plugin's process_config() method. Here is the declaration:

```php
/**
 * Processes and stores configuration data for this authentication plugin.
 *
 * @return bool
 */
function process_config($config) {
    // Set to defaults if undefined
    if (!isset($config->wordpress_host)) {
        $config->wordpress_host = '';
    }
    if (!isset($config->client_key)) {
        $config->client_key = '';
    }
    if (!isset($config->client_secret)) {
        $config->client_secret = '';
    }

    set_config('wordpress_host', trim($config->wordpress_host), 'auth/wordpress');
    set_config('client_key', trim($config->client_key), 'auth/wordpress');
    set_config('client_secret', trim($config->client_secret), 'auth/wordpress');

    return true;
}
```

Having dealt with configuration, let us now start managing the actual OAuth process.

Handling OAuth calls

Rather than go into the details of how we can send HTTP requests to WordPress, let's use a third-party library to do this work. The code I'm going to use is based on Abraham Williams' twitteroauth library (see https://github.com/abraham/twitteroauth). In Eclipse, take a look at the files OAuth.php and BasicOAuth.php for details. To use the library, you will need to add the following lines to the top of /wordpress/auth.php:

```php
require_once($CFG->dirroot . '/auth/wordpress/OAuth.php');
require_once($CFG->dirroot . '/auth/wordpress/BasicOAuth.php');

use OAuth1\BasicOauth;
```

Let's now start work on handling the Moodle login event.

Handling the Moodle login event

When a user clicks on a link to a protected resource, Moodle calls loginpage_hook() in each enabled authentication plugin. To handle this, let us first implement loginpage_hook(). We need to add the following lines to auth.php:

```php
/**
 * Will get called before the login page is shown.
 *
 */
function loginpage_hook() {
    $client_key = $this->config->client_key;
    $client_secret = $this->config->client_secret;
    $wordpress_host = $this->config->wordpress_host;

    if ( (strlen($wordpress_host) > 0) && (strlen($client_key) > 0) && (strlen($client_secret) > 0) ) {
        // kick off the authentication process
        $connection = new BasicOAuth($client_key, $client_secret);

        // strip the trailing slashes from the end of the host URL to avoid any
        // confusion (and to make the code easier to read)
        $wordpress_host = rtrim($wordpress_host, '/');

        $connection->host = $wordpress_host . "/wp-json";
        $connection->requestTokenURL = $wordpress_host . "/oauth1/request";

        $callback = $CFG->wwwroot . '/auth/wordpress/callback.php';
        $tempCredentials = $connection->getRequestToken($callback);

        // Store temporary credentials in the $_SESSION
    } // if
}
```

This implements the first leg of the authentication process, and the variable $tempCredentials will now contain a temporary access token. We will need to store these temporary credentials and then call on the server to ask the user to authorize the connection (leg two). Add the following lines immediately after the // Store temporary credentials in the $_SESSION comment:

```php
$_SESSION['oauth_token'] = $tempCredentials['oauth_token'];
$_SESSION['oauth_token_secret'] = $tempCredentials['oauth_token_secret'];

$connection->authorizeURL = $wordpress_host . "/oauth1/authorize";

$redirect_url = $connection->getAuthorizeURL($tempCredentials);

header('Location: ' . $redirect_url);
die;
```

Next, we need to implement the OAuth callback.
Create a new script called callback.php. The callback.php script will need to:

- Sanity check the data being passed back from WordPress and fail gracefully if there is an issue
- Get the wordpress authentication plugin instance (an instance of auth_plugin_wordpress)
- Call on a handler method that will perform the authentication (which we will then need to implement)

The script is simple, short, and available here: https://github.com/iandavidwild/moodle-auth_wordpress/blob/MOODLE_31_STABLE/callback.php

Now, in the auth.php script, we need to implement the callback_handler() method in auth_plugin_wordpress. You can check out the code on GitHub: visit https://github.com/iandavidwild/moodle-auth_wordpress/blob/MOODLE_31_STABLE/auth.php and scroll down to the callback_handler() method.

Lastly, let us add a fragment of code to the loginpage_hook() method that allows us to turn off WordPress authentication in config.php. Add the following to the very beginning of the loginpage_hook() function:

```php
global $CFG;

if (isset($CFG->disablewordpressauth) && ($CFG->disablewordpressauth == true)) {
    return;
}
```

Summary

In this article, we introduced the Moodle learning platform, investigated the open source philosophy that underpins it, and saw how Moodle's functionality can be extended and enhanced through plugins. We took a pre-existing plugin and developed a new WordPress authentication module. This will allow a user logged into WordPress to automatically log into Moodle. To do so, we implemented three-legged OAuth 1.0a WordPress to Moodle authentication. Check out the complete code at https://github.com/iandavidwild/moodle-auth_wordpress/blob/MOODLE_31_STABLE/callback.php. More information on the plugin described in this article is available from the main Moodle website at https://moodle.org/plugins/auth_wordpress.

Resources for Article:

Further resources on this subject:

Introduction to Moodle [article]

Moodle for Online Communities [article]

An Introduction to Moodle 3 and MoodleCloud [article]

Introduction to HoloLens

Packt
11 Jul 2017
10 min read
In this article, Abhijit Jana, Manish Sharma, and Mallikarjuna Rao, the authors of the book, HoloLens Blueprints, we will be covering the following points to introduce you to using HoloLens for exploratory data analysis. Digital Reality - Under the Hood Holograms in reality Sketching the scenarios 3D Modeling workflow Adding Air Tap on speaker Real-time visualization through HoloLens (For more resources related to this topic, see here.) Digital Reality - Under the Hood Welcome to the world of Digital Reality. The purpose of Digital Reality is to bring immersive experiences, such as taking or transporting you to different world or places, make you interact within those immersive, mix digital experiences with reality, and ultimately open new horizons to make you more productive. Applications of Digital Reality are advancing day by day; some of them are in the field of gaming, education, defense, tourism, aerospace, corporate productivity, enterprise applications, and so on. The spectrum and scenarios of Digital Reality are huge. In order to understand them better, they are broken down into three different categories:  Virtual Reality (VR): It is where you are disconnected from the real world and experience the virtual world. Devices available on the market for VR are Oculus Rift, Google VR, and so on. VR is the common abbreviation of Virtual Reality. Augmented Reality (AR): It is where digital data is overlaid over the real world. Pokémon GO, one of the very famous games, is an example of the this globally. A device available on the market, which falls under this category, is Google Glass. Augmented Reality is abbreviated to AR. Mixed Reality (MR): It spreads across the boundary of the real environment and VR. Using MR, you can have a seamless and immersive integration of the virtual and the real world. Mixed Reality is abbreviated to MR. This topic is mainly focused on developing MR applications using Microsoft HoloLens devices. Although these technologies look similar in the way they are used, and sometimes the difference is confusing to understand, there is a very clear boundary that distinguishes these technologies from each other. As you can see in the following diagram, there is a very clear distinction between AR and VR. However, MR has a spectrum, that overlaps across all three boundaries of real world, AR, and MR. Digital Reality Spectrum The following table describes the differences between the three: Holograms in reality Till now, we have mentioned Hologram several times. It is evident that these are crucial for HoloLens and Holographic apps, but what is a Hologram? Virtual Reality Complete Virtual World User is completely isolated from the Real World Device examples: Oculus Rift and Google VR Augmented Reality Overlays Data over the real world Often used for mobile devices Device example: Google Glass Application example: Pokémon GO Mixed Reality Seamless integration of the real and virtual world Virtual world interacts with Real world Natural interactions Device examples: HoloLens and Meta Holograms are the virtual objects which will be made up with light and sound and blend with the real world to give us an immersive MR experience with both real and virtual worlds. In other words, a Hologram is an object like any other real-world object; the only difference is that it is made up of light rather than matter. The technology behind making holograms is known as Holography. 
The following figure represent two holographic objects placed on the top of a real-size table and gives the experience of placing a real object on a real surface: Holograms objects in real environment Interacting with holograms There are basically five ways that you can interact with holograms and HoloLens. Using your Gaze, Gesture, and Voice and with spatial audio and spatial mapping. Spatial mapping provides a detailed illustration of the real-world surfaces in the environment around HoloLens. This allows developers to understand the digitalized real environments and mix holograms into the world around you. Gaze is the most usual and common one, and we start the interaction with it. At any time, HoloLens would know what you are looking at using Gaze. Based on that, the device can take further decisions on the gesture and voice that should be targeted. Spatial audio is the sound coming out from HoloLens and we use spatial audio to inflate the MR experience beyond the visual. HoloLens Interaction Model Sketching the scenarios The next step after elaborating scenario details is to come up with sketches for this scenario. There is a twofold purpose for sketching; first, it will be input to the next phase of asset development for the 3D Artist, as well as helping to validate requirements from the customer, so there are no surprises at the time of delivery. For sketching, either the designer can take it up on their own and build sketches, or they can take help from the 3D Artist. Let's start with the sketch for the primary view of the scenario, where the user is viewing the HoloLens's hologram: Roam around the hologram to view it from different angles Gaze at different interactive components Sketch for user viewing hologram for the HoloLens Sketching - interaction with speakers While viewing the hologram, a user can gaze at different interactive components. One such component, identified earlier, is the speaker. At the time of gazing at the speaker, it should be highlighted and the user can then Air Tap at it. The Air Tap action should expand the speaker hologram and the user should be able to view the speaker component in detail. Sketch for expanded speakers After the speakers are expanded, the user should be able to visualize the speaker components in detail. Now, if the user Air Taps on the expanded speakers, the application should do the following: Open the textual detail component about the speakers; the user can read the content and learn about the speakers in detail Start voice narration, detailing speaker details The user can also Air Tap on the expanded speaker component, and this action should close the expanded speaker Textual and voice narration for speaker details  As you did sketching for the speakers, apply a similar approach and do sketching for other components, such as lenses, buttons, and so on. 3D Modeling workflow Before jumping to 3D Modeling, let's understand the 3D Modeling workflow across different tools that we are going to use during the course of this topic. The following diagram explains the flow of the 3D Modeling workflow: Flow of 3D Modeling workflow Adding Air Tap on speaker In this project, we will be using the left-side speaker for applying Air Tap on speaker. However, you can apply the same for the right-side speaker as well. Similar to Lenses, we have two objects here which we need to identify from the object explorer. 
- Navigate to Left_speaker_geo and left_speaker_details_geo in the Object Hierarchy window.
- Tag them as leftspeaker and speakerDetails respectively.

By default, when you are just viewing the holograms, we will hide the speaker details section. This section only becomes visible when we Air Tap, and it is hidden again when we Air Tap once more:

Speaker with Box Collider

Add a new script inside the Scripts folder and name it ShowHideBehaviour. This script will handle the show and hide behaviour of the speakerDetails game object. Use the following script inside the ShowHideBehaviour.cs file; it can be reused for any other object that needs to be shown or hidden:

```csharp
public class ShowHideBehaviour : MonoBehaviour
{
    public GameObject showHideObject;
    public bool showhide = false;

    private void Start()
    {
        try
        {
            // Find the MeshRenderer on the target object (or its children)
            // and set its initial visibility from the showhide flag.
            MeshRenderer render = showHideObject.GetComponentInChildren<MeshRenderer>();
            if (render != null)
            {
                render.enabled = showhide;
            }
        }
        catch (System.Exception)
        {
        }
    }
}
```

The script finds the MeshRenderer component of the game object and enables or disables it based on the showhide property. In this script, showhide is exposed as a public property so that you can provide the reference to the object from the Unity scene itself. Attach ShowHideBehaviour.cs as a component to the speakerDetails-tagged object. Then drag and drop the object into the showhide property section. This just takes the reference for the current speaker details object and will hide the object in the first instance.

Attach show-hide script to the object

By default, it is unchecked: showhide is set to false and the object will be hidden from view. At this point, you must check left_speaker_details_geo on, as we are now handling visibility using code. Now, in the Air Tapped event handler, we can enable the renderer's visibility.

Add a new script by navigating from the context menu to Create | C# Scripts, and name it SpeakerGestureHandler. Open the script file in Visual Studio. As with ShowHideBehaviour, by default the SpeakerGestureHandler class will inherit from MonoBehaviour. In the next step, implement the InputClickHandler interface in the SpeakerGestureHandler class. This interface declares the OnInputClicked() method, which is invoked on click input. So, whenever you perform an Air Tap gesture, this method is invoked:

```csharp
RaycastHit hit;
bool isTapped = false;

public void OnInputClicked(InputEventData eventData)
{
    hit = GazeManager.Instance.HitInfo;
    if (hit.transform.gameObject != null)
    {
        isTapped = !isTapped;

        var lftSpeaker = GameObject.FindWithTag("LeftSpeaker");
        var lftSpeakerDetails = GameObject.FindWithTag("speakerDetails");

        MeshRenderer render = lftSpeakerDetails.GetComponentInChildren<MeshRenderer>();

        if (isTapped)
        {
            // Move the speaker out and reveal the details.
            lftSpeaker.transform.Translate(0.0f, -1.0f * Time.deltaTime, 0.0f);
            render.enabled = true;
        }
        else
        {
            // Move the speaker back and hide the details.
            lftSpeaker.transform.Translate(0.0f, 1.0f * Time.deltaTime, 0.0f);
            render.enabled = false;
        }
    }
}
```

When the speaker is gazed at, we find the game objects for both LeftSpeaker and speakerDetails by tag name. For the LeftSpeaker object, we apply a transformation based on whether it is tapped or not, which works just like what we did for the lenses. For the speaker details object, we have also taken a reference to its MeshRenderer to toggle its visibility based on the Air Tap. Attach the SpeakerGestureHandler class to the leftSpeaker game object.

Air Tap in speaker – see it in action

The Air Tap action for the speaker is now done. Save the scene, then build and run the solution in the emulator once again.
When you can see the cursor on the speaker, perform Air Tap. Default View and Air Tapped View Real-time visualization through HoloLens We have learned about the data ingress flow, where devices connect with the IoT Hub, and stream analytics processes the stream of data and pushes it to storage. Now, in this section, let's discuss how this stored data will be consumed for data visualization within holographic application. Solution to consume data through services Summary In this article, we demonstrated using HoloLens,  for exploring Digital Reality - Under the Hood, Holograms in reality, Sketching the scenarios, 3D Modeling workflow, Adding Air Tap on speaker,  and  Real-time visualization through HoloLens. Resources for Article: Further resources on this subject: Creating Controllers with Blueprints [article] Raspberry Pi LED Blueprints [article] Exploring and Interacting with Materials using Blueprints [article]

Massive Graphs on Big Data

Packt
11 Jul 2017
19 min read
In this article by Rajat Mehta, author of the book Big Data Analytics with Java, we will learn about graphs. Graphs theory is one of the most important and interesting concepts of computer science. Graphs have been implemented in real life in a lot of use cases. If you use a GPS on your phone or a GPS device and it shows you a driving direction to a place, behind the scene there is an efficient graph that is working for you to give you the best possible direction. In a social network you are connected to your friends and your friends are connected to other friends and so on. This is a massive graph running in production in all the social networks that you use. You can send messages to your friends or follow them or get followed all in this graph. Social networks or a database storing driving directions all involve massive amounts of data and this is not data that can be stored on a single machine, instead this is distributed across a cluster of thousands of nodes or machines. This massive data is nothing but big data and in this article we will learn how data can be represented in the form of a graph so that we make analysis or deductions on top of these massive graphs. In this article, we will cover: A small refresher on the graphs and its basic concepts A small introduction on graph analytics, its advantages and how Apache Spark fits in Introduction on the GraphFrames library that is used on top of Apache Spark Before we dive deeply into each individual section, let's look at the basic graph concepts in brief. (For more resources related to this topic, see here.) Refresher on graphs In this section, we will cover some of the basic concepts of graphs, and this is supposed to be a refresher section on graphs. This is a basic section, hence if some users already know this information they can skip this section. Graphs are used in many important concepts in our day-to-day lives. Before we dive into the ways of representing a graph, let's look at some of the popular use cases of graphs (though this is not a complete list) Graphs are used heavily in social networks In finding driving directions via GPS In many recommendation engines In fraud detection in many financial companies In search engines and in network traffic flows In biological analysis As you must have noted earlier, graphs are used in many applications that we might be using on a daily basis. Graphs is a form of a data structure in computer science that helps in depicting entities and the connection between them. So, if there are two entities such as Airport A and Airport B and they are connected by a flight that takes, for example, say a few hours then Airport A and Airport B are the two entities and the flight connecting between them that takes those specific hours depict the weightage between them or their connection. In formal terms, these entities are called as vertexes and the relationship between them are called as edges. So, in mathematical terms, graph G = {V, E},that is, graph is a function of vertexes and edges. Let's look at the following diagram for the simple example of a graph: As you can see, the preceding graph is a set of six vertexes and eight edges,as shown next: Vertexes = {A, B, C, D, E, F} Edges = { AB, AF, BC, CD, CF, DF, DE, EF} These vertexes can represent any entities, for example, they can be places with the edges being 'distances' between the places or they could be people in a social network with the edges being the type of relationship, for example, friends or followers. 
Thus, graphs can represent real-world entities like this. The preceding graph is also called a bidirected graph, because its edges go in either direction: the edge from A to B can be traversed both ways, from A to B as well as from B to A. Thus, in the preceding diagram, the edge AB can be treated as BA, and AF as FA. There are other types of graphs, called directed graphs, in which the direction of an edge goes one way only and does not retrace back. A simple example of a directed graph is shown as follows:

As seen in the preceding graph, the edge from A to B goes only in one direction, as does the edge from B to C. Hence, this is a directed graph. A simple linked list data structure or a tree data structure are also forms of graphs. In a tree, nodes can only have children and there are no loops, while there is no such rule in a general graph.

Representing graphs

Visualizing a graph makes it easily comprehensible, but depicting it in a program requires one of two different approaches:

Adjacency matrix: Representing a graph as a matrix is easy, and it has its own advantages and disadvantages. Let's look at the bidirected graph that we showed in the preceding diagram. If you were to represent this graph as a matrix, it would look like this:

The preceding diagram is a simple representation of our graph in matrix form. The concept of the matrix representation is simple: if there is an edge between two nodes, we mark the value as 1; else, if the edge is not present, we mark it as 0. As this is a bidirected graph, its edges flow in both directions, so the matrix is symmetric. In the matrix, the rows and columns depict the vertices. Therefore, if you look at vertex A, it has an edge to vertex B, and the corresponding matrix value is 1. As you can see, it takes just one step, or O(1), to figure out whether an edge exists between two nodes; we just need the index (row and column) in the matrix and we can extract that value from it. Also, if you look at the matrix closely, you will see that most of the entries are zero, hence this is a sparse matrix. Thus, this approach eats a lot of space in computer memory by marking even those elements that do not have an edge to each other, and this is its main disadvantage.

Adjacency list: The adjacency list solves the space wastage problem of the adjacency matrix. To solve this problem, it stores each node and its neighbors in a list (linked list), as shown in the following diagram:

For brevity, we have not shown all the vertices, but you can make out from the diagram that each vertex stores its neighbors in a linked list. So when you want to figure out the neighbors of a particular vertex, you can directly iterate over the list. Of course, this has the disadvantage of iterating when you have to figure out whether an edge exists between two nodes or not. This approach is also widely used in many algorithms in computer science. We have briefly seen how graphs can be represented; let's now see some important terms that are used heavily with graphs.
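Before moving on to terminology, the following is a minimal sketch of the adjacency-list idea in plain Java, using the vertices and edges of the sample graph shown earlier. It is an illustration rather than code from the book; a HashMap of neighbour lists stands in for the linked-list picture above.

```java
import java.util.*;

// A minimal adjacency-list representation of the undirected (bidirected) sample graph.
public class AdjacencyListGraph {
    private final Map<String, List<String>> adj = new HashMap<>();

    // Adds an undirected edge by recording each endpoint in the other's neighbour list.
    public void addEdge(String u, String v) {
        adj.computeIfAbsent(u, k -> new ArrayList<>()).add(v);
        adj.computeIfAbsent(v, k -> new ArrayList<>()).add(u);
    }

    // Returns the neighbours of a vertex (empty if the vertex is unknown).
    public List<String> neighbours(String v) {
        return adj.getOrDefault(v, Collections.emptyList());
    }

    public static void main(String[] args) {
        AdjacencyListGraph g = new AdjacencyListGraph();
        // Edges from the sample graph: AB, AF, BC, CD, CF, DF, DE, EF
        String[][] edges = {
            {"A","B"}, {"A","F"}, {"B","C"}, {"C","D"},
            {"C","F"}, {"D","F"}, {"D","E"}, {"E","F"}
        };
        for (String[] e : edges) {
            g.addEdge(e[0], e[1]);
        }
        System.out.println("Neighbours of F: " + g.neighbours("F")); // [A, C, D, E]
    }
}
```

Looking up the neighbours of a vertex is a single map access, while testing for a specific edge requires scanning that vertex's list, which is exactly the trade-off described above.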
Common terminology on graphs

We will now introduce some common terms and concepts of graphs that you can use in your analytics on top of graphs:

Vertices: As we mentioned earlier, vertices are the mathematical term for the nodes in a graph. For analytic purposes, the vertex count shows the number of nodes in the system, for example, the number of people involved in a social graph.

Edges: As we mentioned earlier, edges are the connections between vertices, and edges can carry weights. The number of edges represents the number of relations in a graph system. The weight of an edge represents the intensity of the relationship between the nodes involved; for example, in a social network, the relationship of friends is a stronger relationship than followers between nodes.

Degree: Represents the total number of connections flowing into as well as out of a node. For example, in the previous diagram the degree of node F is four. The degree count is useful; for example, in a social network graph a very high degree count can indicate how well a person is connected.

Indegree: This represents the number of connections flowing into a node. For example, in the previous diagram, the indegree value of node F is three. In a social network graph, this might represent how many people can send messages to this person or node.

Outdegree: This represents the number of connections flowing out of a node. For example, in the previous diagram, the outdegree value of node F is one. In a social network graph, this might represent how many people this person or node can send messages to.

Common algorithms on graphs

Let's look at some common algorithms that are run on graphs frequently and some of their uses:

Breadth first search: Breadth first search is an algorithm for graph traversal or searching. As the name suggests, the traversal occurs across the breadth of the graph; that is, the neighbors of the node where the traversal starts are searched first, before exploring further in the same manner. We will refer to the same graph we used earlier. If we start at vertex A, then according to breadth first search we next go to the neighbors of A, which are B and F. After that, we go to the neighbor of B, which is C, and then to the remaining neighbors of F, which are D and E. We only go through each node once, and this mimics real-life travel as well: to reach from one point to another we seldom cover the same road or path again. Thus, our breadth first traversal starting from A will be {A, B, F, C, D, E}. Breadth first search is very useful in graph analytics and can tell us things such as the friends that are not your immediate friends but just at the next level after your immediate friends in a social network, or, in the case of a graph of a flight network, the flights with just one or two stops to the destination. (A small code sketch of this traversal appears right after this list.)

Depth first search: This is another way of searching, where we start from the source vertex and keep on searching until we reach the end node or the leaf node, and then we backtrack. This algorithm is not as performant as breadth first search, as it can require lots of traversals: if you want to know whether a node A is connected to a node B, you might end up searching along a lot of wasteful nodes that have nothing to do with the original nodes A and B before arriving at the appropriate solution.

Dijkstra's shortest path: This is a greedy algorithm to find the shortest path in a graph network. So, in a weighted graph, if you need to find the shortest path between two nodes, you can start from the starting node and keep on greedily picking the next node in the path as the one reachable with the least weight (in the case of weights being distances between nodes, as in city graphs depicting interconnected cities and roads). So, in a road network, you can find the shortest path between two cities using this algorithm.

PageRank algorithm: This is a very popular algorithm that came out of Google, and it is essentially used to find the importance of a web page by figuring out how connected it is to other important websites. It gives a PageRank score to each website based on this approach, and finally the search results are built based on this score. The best part about this algorithm is that it can be applied to other areas of life too, for example, figuring out the important airports in a flight graph, or figuring out the most important people in a social network group.
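To make the traversal order concrete, here is a minimal, hypothetical sketch (not code from the book) of breadth first search over the adjacency lists of our example graph; starting from A, it prints the vertices in the order {A, B, F, C, D, E} described above (the exact order of same-level vertices depends on the order in which the neighbors are listed):

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

public class BreadthFirstSearch {

    public static List<String> bfs(Map<String, List<String>> graph, String start) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Queue<String> queue = new ArrayDeque<>();

        visited.add(start);
        queue.add(start);

        while (!queue.isEmpty()) {
            String current = queue.remove();
            order.add(current);
            // Enqueue every neighbor that has not been seen yet.
            for (String neighbor : graph.get(current)) {
                if (visited.add(neighbor)) {
                    queue.add(neighbor);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Adjacency lists for the bidirected example graph.
        Map<String, List<String>> graph = new LinkedHashMap<>();
        graph.put("A", Arrays.asList("B", "F"));
        graph.put("B", Arrays.asList("A", "C"));
        graph.put("C", Arrays.asList("B", "D", "F"));
        graph.put("D", Arrays.asList("C", "F", "E"));
        graph.put("E", Arrays.asList("D", "F"));
        graph.put("F", Arrays.asList("A", "C", "D", "E"));

        // Prints [A, B, F, C, D, E] for this graph.
        System.out.println(bfs(graph, "A"));
    }
}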
So much for the basics and the refresher on graphs. In the next sections, we will see how graphs can be used in the real world on massive datasets, such as social network data or data used in the field of biology. We will also study how graph analytics can be used on top of these graphs to derive exclusive deductions.

Plotting graphs

There is a handy open source Java library called GraphStream, which can be used to plot graphs; this is very useful especially if you want to view the structure of your graphs. While viewing, you can also figure out whether some of the vertices are very close to each other (clustered), or in general how they are placed. Using the GraphStream library is easy: just download the jar from http://graphstream-project.org and put it in the classpath of your project. Next, we will show a simple example demonstrating how easy it is to plot a graph using this library. Just create an instance of a graph. For our example, we will create a simple DefaultGraph and name it SimpleGraph. Next, we will add the nodes, or vertices, of the graph. We will also add the label attribute that is displayed on each vertex.

Graph graph = new DefaultGraph("SimpleGraph");
graph.addNode("A").setAttribute("ui.label", "A");
graph.addNode("B").setAttribute("ui.label", "B");
graph.addNode("C").setAttribute("ui.label", "C");

After building the nodes, it's now time to connect these nodes using edges. The API is simple to use: on the graph instance we can define the edges, provided an ID is given to them and the starting and ending nodes are also given.

graph.addEdge("AB", "A", "B");
graph.addEdge("BC", "B", "C");
graph.addEdge("CA", "C", "A");

All the information about nodes and edges is present on the graph instance. It's now time to plot this graph on the UI, and we can just invoke the display method on the graph instance, as shown next, to display it on the UI.

graph.display();

This would plot the graph on the UI as follows:

This library is extensive, it will be a good learning experience to explore it further, and we would urge readers to do so on their own.

Massive graphs on big data

Big data comprises a huge amount of data distributed across a cluster of thousands (if not more) of machines. Building graphs based on this massive data poses different challenges, shown as follows:

Due to the vast amount of data involved, the data for the graph is distributed across a cluster of machines. Hence, in actuality, it's not a single-node graph and we have to build a graph that spans across a cluster of machines.
A graph that spans across a cluster of machines has vertices and edges spread across different machines, and this graph data won't fit into the memory of one single machine. Consider your friends list on Facebook: some of your friends' data in your Facebook friend-list graph might lie on different machines, and this data might be just tremendous in size.
Look at an example diagram of a graph of 10 Facebook friends and their network, shown as follows:

As you can see in the preceding diagram, even for just 10 friends the data can be huge; and since the graph here is drawn by hand, we have not shown a lot of the connections, to keep the image comprehensible, while in real life each person can have more than a thousand connections. So imagine what happens to a graph with thousands, if not more, people on the list. For the reasons we just saw, building massive graphs on big data is a different ball game altogether, and there are a few main approaches for building these massive graphs. From the perspective of big data, building massive graphs involves running and storing data in parallel on many nodes. The two main approaches are bulk synchronous parallel (BSP) and the Pregel approach; Apache Spark follows the Pregel approach. Covering these approaches in detail is out of the scope of this book; if readers are interested in these topics, they should refer to other books and Wikipedia.

Graph analytics

The biggest advantage of using graphs is that you can analyze them and use them for analyzing complex datasets. You might ask what is so special about graph analytics that we can't do with relational databases. Let's try to understand this using an example: suppose we want to analyze your friends network on Facebook and pull information about your friends, such as their names, their birth dates, their recent likes, and so on. If Facebook had a relational database, then this would mean firing a query on some table using the foreign key of the user requesting this info. From the perspective of a relational database, this first-level query is easy. But what if we now ask you to go to the friends at level four in your network and fetch their data (as shown in the following diagram)? The query to get this becomes more and more complicated from a relational database perspective, but this is a trivial task on a graph or graph database (such as Neo4j). Graphs are extremely good at operations where you want to pull information from one node to another node that lies after a lot of joins and hops. As such, graph analytics is good for certain use cases (but not for all use cases; relational databases are still good for many other use cases).

As you can see, the preceding diagram depicts a huge social network (though it might just be depicting a network of a few friends only). The dots represent actual people in a social network. So if somebody asks you to pick one user on the left-most side of the diagram, follow their connections to the right-most side, and pull the friends at, say, the 10th level or more, this is something very difficult to do in a normal relational database, and doing it and maintaining it could easily get out of hand. There are four particular use cases where graph analytics is extremely useful and used frequently (though there are plenty more use cases too):

Path analytics: As the name suggests, this analytics approach is used to figure out the paths as you traverse along the nodes of a graph. There are many fields where this can be used, the simplest being road networks, figuring out details such as the shortest path between cities, or flight analytics to figure out the shortest-time or direct flights.

Connectivity analytics: As the name suggests, this approach outlines how the nodes within a graph are connected to each other.
Using this, you can figure out how many edges are flowing into a node and how many are flowing out of it. This kind of information is very useful in analysis. For example, in a social network, if there is a person who receives just one message but sends out, say, ten messages within his network, then this person can be used to market his favorite products, as he is very good at responding to messages.

Community analytics: Some graphs on big data are huge. But within these huge graphs there might be nodes that are very close to each other and are almost stacked in a cluster of their own. This is useful information, as based on this you can extract communities from your data. For example, in a social network, if there are people who are part of some community, say marathon runners, then they can be clubbed into a single community and tracked further.

Centrality analytics: This kind of analytical approach is useful for finding the central nodes in a network or graph. This is useful for figuring out sources that are single-handedly connected to many other sources. It is helpful in figuring out influential people in a social network, or a central computer in a computer network.

From the perspective of this article, we will be covering some of these use cases in our sample case studies, and for this we will be using a library on top of Apache Spark called GraphFrames.

GraphFrames

The GraphX library is advanced and performs well on massive graphs but, unfortunately, it is currently only implemented in Scala and does not have any direct Java API. GraphFrames is a relatively new library that is built on top of Apache Spark and provides support for dataframe (now dataset) based graphs. It contains a lot of methods that are direct wrappers over the underlying GraphX methods. As such, it provides similar functionality to GraphX, except that GraphX acts on Spark RDDs while GraphFrames works on dataframes, so GraphFrames is more user friendly (as dataframes are simpler to use). All the advantages of firing Spark SQL queries, joining datasets, and filtering queries are supported on it. To understand GraphFrames and the representation of massive big data graphs, we will take small baby steps first, building some simple programs using GraphFrames, before building full-fledged case studies.

Summary

In this article, we learned about graph analytics. We saw how graphs can be built even on top of massive big datasets. We learned how Apache Spark can be used to build these massive graphs, and in the process we learned about the new GraphFrames library that helps us in building them.

Resources for Article:

Further resources on this subject:

Saying Hello to Java EE [article]
Object-Oriented JavaScript [article]
Introduction to JavaScript [article]

Thread of Execution

Packt
10 Jul 2017
6 min read
In this article by Anton Polukhin Alekseevic, the author of the book Boost C++ Application Development Cookbook - Second Edition, we will see the multithreading concept. Multithreading means that multiple threads of execution exist within a single process. Threads may share process resources and also have their own resources. Those threads of execution may run independently on different CPUs, leading to faster and more responsive programs. Let's see how to create a thread of execution.

(For more resources related to this topic, see here.)

Creating a thread of execution

On modern multicore computers, to achieve maximal performance (or just to provide a good user experience), programs usually must use multiple threads of execution. Here is a motivating example in which we need to create and fill a big file in a thread that draws the user interface:

#include <algorithm>
#include <fstream>
#include <iterator>

bool is_first_run();

// Function that executes for a long time.
void fill_file(char fill_char, std::size_t size, const char* filename);

// ...
// Somewhere in thread that draws a user interface:
if (is_first_run()) {
    // This will be executing for a long time during which
    // the user interface freezes.
    fill_file(0, 8 * 1024 * 1024, "save_file.txt");
}

Getting ready

This recipe requires knowledge of boost::bind or std::bind.

How to do it...

Starting a thread of execution was never so easy:

#include <boost/thread.hpp>

// ...
// Somewhere in thread that draws a user interface:
if (is_first_run()) {
    boost::thread(boost::bind(
        &fill_file, 0, 8 * 1024 * 1024, "save_file.txt"
    )).detach();
}

How it works...

The boost::thread variable accepts a functional object that can be called without parameters (we provided one using boost::bind) and creates a separate thread of execution. That functional object is copied into the constructed thread of execution and run there.

We are using version 4 of Boost.Thread in all recipes (BOOST_THREAD_VERSION is defined to 4). Important differences between Boost.Thread versions are highlighted.

After that, we call the detach() function, which does the following:

The thread of execution is detached from the boost::thread variable but continues its execution
The boost::thread variable starts to hold a Not-A-Thread state

Note that without a call to detach(), the destructor of boost::thread will notice that it still holds an OS thread and will call std::terminate. std::terminate terminates our program without calling destructors, without freeing resources, and without any other cleanup. Default constructed threads also have a Not-A-Thread state, and they do not create a separate thread of execution.

There's more...

What if we want to make sure that the file was created and written before doing some other job? In that case we need to join the thread in the following way:

// ...
// Somewhere in thread that draws a user interface:
if (is_first_run()) {
    boost::thread t(boost::bind(
        &fill_file, 0, 8 * 1024 * 1024, "save_file.txt"
    ));

    // Do some work.
    // ...

    // Waiting for thread to finish.
    t.join();
}

After the thread is joined, the boost::thread variable holds a Not-A-Thread state and its destructor does not call std::terminate. Remember that the thread must be joined or detached before its destructor is called; otherwise, your program will terminate!

With BOOST_THREAD_VERSION=2 defined, the destructor of boost::thread calls detach(), which does not lead to std::terminate.
But doing so breaks compatibility with std::thread, and some day, when your project moves to the C++ standard library threads or when BOOST_THREAD_VERSION=2 is no longer supported, this will give you a lot of surprises. Version 4 of Boost.Thread is more explicit and strong, which is usually preferable in the C++ language.

Beware that std::terminate() is called when any exception that is not of type boost::thread_interrupted leaves the boundary of the functional object that was passed to the boost::thread constructor.

There is a very helpful RAII wrapper around the thread that allows you to emulate the BOOST_THREAD_VERSION=2 behavior; it is called boost::scoped_thread<T>, where T can be one of the following classes:

boost::interrupt_and_join_if_joinable: To interrupt and join the thread at destruction
boost::join_if_joinable: To join the thread at destruction
boost::detach: To detach the thread at destruction

Here is a small example:

#include <boost/thread/scoped_thread.hpp>

void some_func();

void example_with_raii() {
    boost::scoped_thread<boost::join_if_joinable> t(
        boost::thread(&some_func)
    );
    // 't' will be joined at scope exit.
}

The boost::thread class was accepted as a part of the C++11 standard and you can find it in the <thread> header in the std:: namespace. There is no big difference between Boost's version 4 and the C++11 standard library version of the thread class. However, boost::thread is available on C++03 compilers, so its usage is more versatile.

There is a very good reason for calling std::terminate instead of joining by default! The C and C++ languages are often used in life-critical software. Such software is controlled by other software, called a watchdog. Watchdogs can easily detect that an application has terminated, but cannot always detect deadlocks, or detect them only with big delays. For example, for defibrillator software it's safer to terminate than to hang on join() for a few seconds waiting for a watchdog reaction. Keep that in mind while designing such applications.

See also

All the recipes in this chapter use Boost.Thread. You may continue reading to get more information about the library. The official documentation has a full list of the boost::thread methods and remarks about their availability in the C++11 standard library; it can be found at http://boost.org/libs/thread. The Interrupting a thread recipe will give you an idea of what the boost::interrupt_and_join_if_joinable class does.

Summary

We saw how to create a thread of execution using some easy techniques.

Resources for Article:

Further resources on this subject:

Introducing the Boost C++ Libraries [article]
Boost.Asio C++ Network Programming [article]
Application Development in Visual C++ - The Tetris Application [article]

Queues and topics

Packt
10 Jul 2017
8 min read
In this article by Luca Stancapiano, the author of the book Mastering Java EE Development with WildFly, we will see how to implement Java Message Service (JMS) in a queue channel using the WildFly console.

(For more resources related to this topic, see here.)

JMS works with message channels that manage the messages asynchronously. These channels contain messages that they collect or remove according to the configuration and the type of channel. These channels are of two types: queues and topics. The channels are highly configurable through the WildFly console and, as with all components in WildFly, they can be installed through the console, the command line, or directly with the Maven plugins of the project. In the next two sections we will show what they mean and all the possible configurations.

Queues

Queues collect the sent messages that are waiting to be read. The messages are delivered in the order they are sent and, once read, are removed from the queue.

Create the queue from the web console

Let's see the steps to create a new queue through the web console. Connect to http://localhost:9990/. Go to Configuration | Subsystems/Messaging - ActiveMQ/default and click on Queues/Topics. Now select the Queues menu and click on the Add button. You will see this screen:

The parameters to insert are as follows:

Name: The name of the queue.
JNDI Names: The JNDI names the queue will be bound to.
Durable?: Whether the queue is durable or not.
Selector: The queue selector.

As with all enterprise components, JMS components are callable through the Java Naming and Directory Interface (JNDI). Durable queues keep messages around persistently for any suitable consumer to consume them. Durable queues do not need to concern themselves with which consumer is going to consume the messages at some point in the future; there is just one copy of a message that any consumer in the future can consume.

Message selectors allow you to filter the messages that a message consumer will receive. The filter is a relatively complex language similar to the syntax of an SQL WHERE clause. The selector can use all the message headers and properties for filtering operations, but cannot use the message content. Selectors are mostly useful for channels that broadcast a very large number of messages to their subscribers. On queues, only messages that match the selector will be returned; the others stay in the queue (and thus can be read by a MessageConsumer with a different selector). The following SQL elements are allowed in our filters and we can put them in the Selector field of the form:

AND, OR, NOT: Logical operators. Example: (releaseYear < 1986) AND NOT (title = 'Bad')
String literals: String literals in single quotes, duplicated to escape. Example: title = 'Tom''s'
Number literals: Numbers in Java syntax; they can be double or integer. Example: releaseYear = 1982
Properties: Message properties that follow Java identifier naming. Example: releaseYear = 1983
Boolean literals: TRUE and FALSE. Example: isAvailable = FALSE
( ): Round brackets. Example: (releaseYear < 1981) OR (releaseYear > 1990)
BETWEEN: Checks whether a number is in a range (both bounds inclusive). Example: releaseYear BETWEEN 1980 AND 1989
Header fields: Any headers except JMSDestination, JMSExpiration and JMSReplyTo. Example: JMSPriority = 10
=, <>, <, <=, >, >=: Comparison operators. Example: (releaseYear < 1986) AND (title <> 'Bad')
LIKE: String comparison with wildcards '_' and '%'. Example: title LIKE 'Mirror%'
IN: Finds a value in a set of strings. Example: title IN ('Piece of mind', 'Somewhere in time', 'Powerslave')
IS NULL, IS NOT NULL: Checks whether a value is null or not null. Example: releaseYear IS NULL
*, +, -, /: Arithmetic operators. Example: releaseYear * 2 > 2000 - 18

A small sketch of a consumer that uses one of these selectors is shown next.
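As a quick, hypothetical illustration (not code from the book, and using the JMS 2.0 API introduced later in this article), the following stateless bean consumes from the GPS queue only the messages whose vehicleType property (an assumed property name) equals 'bus':

import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.inject.Inject;
import javax.jms.JMSConsumer;
import javax.jms.JMSContext;
import javax.jms.Queue;

@Stateless
public class FilteredCoordinateReceiver {

    @Inject
    private JMSContext context;

    @Resource(mappedName = "java:/jms/queue/GPS")
    private Queue queue;

    // Receives only messages whose 'vehicleType' property is 'bus';
    // other messages stay in the queue for consumers with other selectors.
    public String receiveBusCoordinate() {
        JMSConsumer consumer = context.createConsumer(queue, "vehicleType = 'bus'");
        try {
            // Wait up to one second for a matching message.
            return consumer.receiveBody(String.class, 1000);
        } finally {
            consumer.close();
        }
    }
}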
Fill in the form now. In this article we will implement a messaging service that sends the GPS coordinates of buses. The queue is created and shown in the queues list:

Create the queue using the CLI and the Maven WildFly plugin

The same thing can be done with the Command Line Interface (CLI). Start a WildFly instance, go to the bin directory of WildFly, and execute the following script:

bash-3.2$ ./jboss-cli.sh
You are disconnected at the moment. Type 'connect' to connect to the server or 'help' for the list of supported commands.
[disconnected /] connect
[standalone@localhost:9990 /] /subsystem=messaging-activemq/server=default/jms-queue=gps_coordinates:add(entries=["java:/jms/queue/GPS"])
{"outcome" => "success"}

The same thing can be done through Maven. Simply add this snippet to your pom.xml:

<plugin>
  <groupId>org.wildfly.plugins</groupId>
  <artifactId>wildfly-maven-plugin</artifactId>
  <version>1.0.2.Final</version>
  <executions>
    <execution>
      <id>add-resources</id>
      <phase>install</phase>
      <goals>
        <goal>add-resource</goal>
      </goals>
      <configuration>
        <resources>
          <resource>
            <address>subsystem=messaging-activemq,server=default,jms-queue=gps_coordinates</address>
            <properties>
              <durable>true</durable>
              <entries>!!["gps_coordinates", "java:/jms/queue/GPS"]</entries>
            </properties>
          </resource>
        </resources>
      </configuration>
    </execution>
    <execution>
      <id>del-resources</id>
      <phase>clean</phase>
      <goals>
        <goal>undeploy</goal>
      </goals>
      <configuration>
        <afterDeployment>
          <commands>
            <command>/subsystem=messaging-activemq/server=default/jms-queue=gps_coordinates:remove</command>
          </commands>
        </afterDeployment>
      </configuration>
    </execution>
  </executions>
</plugin>

The Maven WildFly plugin lets you do admin operations in WildFly using the same custom protocol used by the command line. Two executions are configured:

add-resources: It hooks the install Maven phase and adds the queue, passing the name, JNDI, and durable parameters seen in the previous section.
del-resources: It hooks the clean Maven phase and removes the chosen queue by name.

Create the queue through an Arquillian test case

Alternatively, we can add and remove the queue through an Arquillian test case:

@RunWith(Arquillian.class)
@ServerSetup(MessagingResourcesSetupTask.class)
public class MessageTestCase {

    ...
    private static final String QUEUE_NAME = "gps_coordinates";
    private static final String QUEUE_LOOKUP = "java:/jms/queue/GPS";

    static class MessagingResourcesSetupTask implements ServerSetupTask {

        @Override
        public void setup(ManagementClient managementClient, String containerId) throws Exception {
            getInstance(managementClient.getControllerClient()).createJmsQueue(QUEUE_NAME, QUEUE_LOOKUP);
        }

        @Override
        public void tearDown(ManagementClient managementClient, String containerId) throws Exception {
            getInstance(managementClient.getControllerClient()).removeJmsQueue(QUEUE_NAME);
        }
    }

    ...
}

The Arquillian org.jboss.as.arquillian.api.ServerSetup annotation lets you use an external setup manager to install or remove components inside WildFly. In this case we are installing the queue declared with the two variables QUEUE_NAME and QUEUE_LOOKUP. When the test ends, the tearDown method is invoked automatically and it removes the installed queue. To use Arquillian, it's important to add the WildFly testsuite dependency to your pom.xml project:

...
<dependencies>
  <dependency>
    <groupId>org.wildfly</groupId>
    <artifactId>wildfly-testsuite-shared</artifactId>
    <version>10.1.0.Final</version>
    <scope>test</scope>
  </dependency>
</dependencies>
...

Looking in standalone-full.xml, we will find the created queue as:

<subsystem >
  <server name="default">
    ...
    <jms-queue name="gps_coordinates" entries="java:/jms/queue/GPS"/>
    ...
  </server>
</subsystem>

JMS is available in the standalone-full configuration. By default, WildFly provides four standalone configurations. They can be found in the standalone/configuration directory:

standalone.xml: It supports all components except messaging and CORBA/IIOP
standalone-full.xml: It supports all components
standalone-ha.xml: It supports all components except messaging and CORBA/IIOP, with clustering enabled
standalone-full-ha.xml: It supports all components, with clustering enabled

To start WildFly with the chosen configuration, simply add -c with the configuration to the standalone.sh script. Here is a sample that starts the standalone full configuration:

./standalone.sh -c standalone-full.xml

Create the Java client for the queue

Let's now see how to create a client that sends a message to the queue. JMS 2.0 simplifies the creation of clients very much. Here is a sample of a client inside a stateless Enterprise Java Bean (EJB):

@Stateless
public class MessageQueueSender {

    @Inject
    private JMSContext context;

    @Resource(mappedName = "java:/jms/queue/GPS")
    private Queue queue;

    public void sendMessage(String message) {
        context.createProducer().send(queue, message);
    }
}

The javax.jms.JMSContext is injectable from any EE component. We will see the JMS context in detail in the next section, The JMS context. The queue is represented in JMS by javax.jms.Queue. It can be injected as a JNDI resource through the @Resource annotation. The JMS context, through the createProducer method, creates a producer represented by the javax.jms.JMSProducer class, which is used to send the messages. We can now create a client injecting the stateless bean and sending the string message "hello!":

...
@EJB
private MessageQueueSender messageQueueSender;
...
messageQueueSender.sendMessage("hello!");

Summary

In this article we have seen how to implement Java Message Service in a queue channel using the web console, the Command Line Interface, the Maven WildFly plugin, and Arquillian test cases, and how to create Java clients for the queue.
Resources for Article: Further resources on this subject: WildFly – the Basics [article] WebSockets in Wildfly [article] Creating Java EE Applications [article]

HPC Cluster Computing

Packt
10 Jul 2017
8 min read
In this article by Raja Malleswara Rao Pattamsetti, the author of the book Distributed Computing in Java 9, we look at how the processing capabilities needed by organizational applications are more than a single sequential computer configuration can offer. This can be addressed to an extent by increasing processor capacity and other resource allocations, and while that can alleviate performance for a while, future computational requirements are limited by how much more powerful processors can be made and by the cost incurred in producing such powerful systems. There is also a need for efficient algorithms and practices to produce the best results from them. A practical and economic substitute for a single high-power computer is to establish multiple lower-capacity processors that work collectively and coordinate their processing capabilities, resulting in a powerful system: parallel computers that permit processing activities to be distributed among multiple lower-capacity machines and together obtain the best expected result.

In this article, we will cover the following topics:

Era of computing
Commanding parallel system architectures
MPP: Massively Parallel Processors
SMP: Symmetric Multiprocessors
CC-NUMA: Cache Coherent Nonuniform Memory Access
Distributed systems
Clusters
Java support for High-Performance Computing

(For more resources related to this topic, see here.)

Era of Computing

Technological developments in the computing industry are rapid, helped by advancements in system software and hardware. The hardware advancements revolve around processor design and manufacturing techniques, and high-performance processors are being produced at amazingly low cost. The hardware advancements are further boosted by high-bandwidth and low-latency network systems. Very Large Scale Integration (VLSI) as a concept brought several phenomenal advancements in producing powerful sequential and parallel processing computers. Simultaneously, system software advancements have improved the capabilities of operating systems and advanced the techniques of software development. Two dominant computing eras are observed, namely the sequential and the parallel era of computing. The following diagram shows the advancements in both eras of computing, from the last few decades to the forecast for the next two decades:

In each of these eras, it is observed that hardware architecture growth is trailed by system software advancements; that is, as more powerful hardware evolved, correspondingly advanced software programs and operating system capabilities grew in strength. Applications and problem-solving environments were then added on top of this with the advent of parallel computing, the development from mini to microprocessors, and clustered computers. Let's now review some of the commanding parallel system architectures from the last few years.

Commanding parallel system architectures

Over the last few years, multiple varieties of computing models providing great processing performance have been developed. They are classified depending on how memory is allocated to their processors and how the processors are aligned.
Some of the important parallel systems are as follows:

MPP: Massively Parallel Processors
SMP: Symmetric Multiprocessors
CC-NUMA: Cache Coherent Nonuniform Memory Access
Distributed systems
Clusters

Let's now understand each of these system architectures in a little more detail and review some of their important performance characteristics.

Massively Parallel Processors (MPP)

Massively Parallel Processors (MPP), as the name suggests, are huge parallel processing systems developed in a shared-nothing architecture. Such systems usually contain a large number of processing nodes that are integrated with a high-speed interconnect network switch. A node is nothing but an element of the computer with a diverse combination of hardware components, usually containing a memory unit and one or more processors. Some special-purpose nodes are designed to have backup disks or additional storage capacity. The following diagram represents the massively parallel processors architecture:

Symmetric Multiprocessors (SMP)

Symmetric Multiprocessors (SMP), as the name suggests, contain a limited number of processors, ranging from 2 to 64, and share most of the resources among those processors. One instance of the operating system operates over all the connected processors, while they commonly share the I/O, the memory, and the network bus. This nature of a similar set of processors connected and acting together under one operating system is the essential behavior of symmetric multiprocessors. The following diagram depicts the symmetric multiprocessors representation:

Cache Coherent Nonuniform Memory Access (CC-NUMA)

Cache Coherent Nonuniform Memory Access (CC-NUMA) is a special type of multiprocessor system with scalable additional processor capacity. In CC-NUMA, the SMP system and the other remote nodes communicate through a remote interconnection link. Each of the remote nodes contains local memory and processors of its own. The following diagram represents the cache coherent nonuniform memory access architecture:

The nature of memory access is nonuniform. Just like symmetric multiprocessors, a CC-NUMA system has a comprehensive view of the entire system memory and, as the name suggests, it takes a nonuniform amount of time to access either close or distant memory locations.

Distributed systems

Distributed systems, as we have been discussing in previous articles, are traditional sets of individual computers interconnected through an Internet/intranet, each running its own operating system. Any diverse set of computer systems can contribute to being part of a distributed system, including combinations of Massively Parallel Processors, Symmetric Multiprocessors, distinct computers, and clusters.

Clusters

A cluster is an assembly of terminals or computers connected through an internetwork to address a large processing requirement through parallel processing. Hence, clusters are usually configured with terminals or computers having higher processing capabilities, connected with a high-speed internetwork. Usually, a cluster is considered as one image that integrates a number of nodes and pools their resources.
The following diagram is a sample clustered Tomcat server for a typical J2EE web application deployment:

The following table shows some of the important performance characteristics of the different parallel architecture systems discussed so far:

Network of workstations

A network of workstations is a group of connected resources, such as system processors, network interfaces, storage, and data disks, that opens the space for newer combinations such as:

Parallel computing: As discussed in the previous sections, the group of system processors can be connected as MPP or DSM, which provides parallel processing ability.
RAM network: As a number of systems are connected, their memories can collectively work as a DRAM cache, which immensely expands the virtual memory of the entire system and improves its processing ability.
Software Redundant Array of Inexpensive Disks (RAID): A group of systems used in the network, connected as an array, improves system stability, availability, and storage capacity with the help of multiple low-cost systems connected over the local area network. This also provides simultaneous I/O support.
Multipath communication: Multipath communication is a technique of using more than one network connection between the networked workstations to allow simultaneous information exchange among system nodes.

Java support for High-Performance Computing

Java provides numerous advantages for HPC (High-Performance Computing) as a programming language, particularly with grid computing. Some of the key benefits of using Java in HPC include the following (a small code sketch follows this list):

Portability: The capability to write a program on one platform and port it to run on any other operating platform has been the biggest strength of the Java language. This continues to be an advantage when porting Java applications to HPC systems. In grid computing, this is an essential feature, as the execution environment gets decided during the execution of the program. This is possible since Java byte code executes in the Java Virtual Machine (JVM), which itself acts as an abstract operating environment.
Network centricity: As discussed in previous articles, Java provides great support for distributed systems with its network-centric features of remote procedure calls (RPC) through RMI and CORBA services. Along with these, Java's support for socket programming is another great asset for grid computing.
Software engineering: Another great feature of Java is the way it can be used for engineering solutions. We can produce object-oriented, loosely coupled software that can be independently added as an API, through jar files, to multiple other systems. Encapsulation and programming to interfaces make it a great advantage in such application development.
Security: Java is a highly secure programming language, with features like byte code verification to limit resource utilization by untrusted intruder software or programs. This is a great advantage when running applications in a distributed environment with remote communication established over a shared network.
GUI development: Java's support for platform-independent GUI development is a great advantage for developing and deploying the enterprise web applications used to interact with an HPC environment.
Availability: Java has great support across multiple operating systems and platforms, including Windows, Linux, SGI, and Compaq. It is easily available to consume and develop with, using a great set of open source frameworks built on top of it.
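As a minimal, hypothetical illustration (not code from the book) of the parallel processing support that ships with the JDK itself, the following program splits a summation across a pool of worker threads using java.util.concurrent; frameworks for cluster and grid computing apply the same divide-and-combine idea, but distribute the chunks across nodes instead of local threads:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        final long[] data = new long[10_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }

        // One worker per available processor core.
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        // Divide the array into one chunk per worker and submit the partial sums.
        int chunk = data.length / workers;
        List<Future<Long>> partials = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            final int start = w * chunk;
            final int end = (w == workers - 1) ? data.length : start + chunk;
            partials.add(pool.submit(new Callable<Long>() {
                @Override
                public Long call() {
                    long sum = 0;
                    for (int i = start; i < end; i++) {
                        sum += data[i];
                    }
                    return sum;
                }
            }));
        }

        // Combine the partial results into the final answer.
        long total = 0;
        for (Future<Long> partial : partials) {
            total += partial.get();
        }
        pool.shutdown();

        System.out.println("Total: " + total);
    }
}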
While the previous points list the advantages in favor of using Java for HPC environments, some concerns need to be reviewed while designing Java applications, including its numerics (complex numbers, fastfp, multidimensional arrays), performance, and parallel computing designs.

Summary

Through this article, you have learned about the era of computing, commanding parallel system architectures, MPP (Massively Parallel Processors), SMP (Symmetric Multiprocessors), CC-NUMA (Cache Coherent Nonuniform Memory Access), distributed systems, clusters, and Java support for High-Performance Computing.

Resources for Article:

Further resources on this subject:

The Amazon S3 SDK for Java [article]
Gradle with the Java Plugin [article]
Getting Started with Sorting Algorithms in Java [article]