How-To Tutorials

22 Jun 2018

13 min read

Xamarin: How to add a MVVM pattern to an app [Tutorial]

In our previous tutorial, we created a basic travel app using Xamarin.Forms. In this post, we will look at adding the Model-View-ViewModel (MVVM) pattern to our travel app. The MVVM elements are offered with the Xamarin.Forms toolkit, and we can expand on them to truly take advantage of the power of the pattern. As we dig into MVVM, we will apply what we have learned to the TripLog app that we started building in our previous tutorial.

This article is an excerpt from the book Mastering Xamarin.Forms by Ed Snider.

Understanding the MVVM pattern

At its core, MVVM is a presentation pattern designed to control the separation between user interfaces and the rest of an application. The key elements of the MVVM pattern are as follows:

- Models: Models represent the business entities of an application. When responses come back from an API, they are typically deserialized to models.
- Views: Views represent the actual pages or screens of an application, along with all of the elements that make them up, including custom controls. Views are very platform-specific and depend heavily on platform APIs to render the application's user interface (UI).
- ViewModels: ViewModels control and manipulate the Views by serving as their data context. ViewModels are made up of a series of properties represented by Models. These properties are part of what is bound to the Views to provide the data that is displayed to users, or to collect the data that is entered or selected by users. In addition to model-backed properties, ViewModels can also contain commands, which are action-backed properties that bind the actual functionality and execution to events that occur in the Views, such as button taps or list item selections.
- Data binding: Data binding is the concept of connecting data properties and actions in a ViewModel with the user interface elements in a View. The actual implementation of how data binding happens can vary and, in most cases, is provided by a framework, toolkit, or library.
In Windows app development, data binding is provided declaratively in XAML. In traditional (non-Xamarin.Forms) Xamarin app development, data binding is either a manual process or dependent on a framework such as MvvmCross (https://github.com/MvvmCross/MvvmCross), a popular framework in the .NET mobile development community. Data binding in Xamarin.Forms follows a very similar approach to Windows app development.

Adding MVVM to the app

The first step of introducing MVVM into an app is to set up the structure by adding folders that will represent the core tenets of the pattern, such as Models, ViewModels, and Views. Traditionally, the Models and ViewModels live in a core library (usually a portable class library or .NET Standard library), whereas the Views live in a platform-specific library. Thanks to the power of the Xamarin.Forms toolkit and its abstraction of platform-specific UI APIs, the Views in a Xamarin.Forms app can also live in the core library.

Just because the Views can live in the core library with the ViewModels and Models, this doesn't mean that separation between the user interface and the app logic isn't important.

When implementing a specific structure to support a design pattern, it is helpful to have your application namespaces organized in a similar structure. This is not a requirement, but it is something that can be useful. By default, Visual Studio for Mac will associate namespaces with directory names, as shown in the following screenshot:

Setting up the app structure

For the TripLog app, we will let the Views, ViewModels, and Models all live in the same core portable class library. In our solution, this is the project called TripLog. We already added a Models folder in our previous tutorial, so we just need to add a ViewModels folder and a Views folder to the project to complete the MVVM structure. In order to set up the app structure, perform the following steps:

- Add a new folder named ViewModels to the root of the TripLog project.
- Add a new folder named Views to the root of the TripLog project.
- Move the existing XAML page files (MainPage.xaml, DetailPage.xaml, and NewEntryPage.xaml, along with their .cs code-behind files) into the Views folder that we have just created.
- Update the namespace of each Page from TripLog to TripLog.Views.
- Update the x:Class attribute of each Page's root ContentPage from TripLog.MainPage, TripLog.DetailPage, and TripLog.NewEntryPage to TripLog.Views.MainPage, TripLog.Views.DetailPage, and TripLog.Views.NewEntryPage, respectively.
- Update the using statements on any class that references the Pages. Currently, this should only be in the App class in App.xaml.cs, where MainPage is instantiated.

Once the MVVM structure has been added, the folder structure in the solution should look similar to the following screenshot:

In MVVM, the term View is used to describe a screen. Xamarin.Forms uses the term View to describe controls, such as buttons or labels, and uses the term Page to describe a screen. In order to avoid confusion, I will stick with the Xamarin.Forms terminology and refer to screens as Pages, and will only use the term Views in reference to screens for the folder where the Pages will live, in order to stick with the MVVM pattern.

Adding ViewModels

In most cases, Views (Pages) and ViewModels have a one-to-one relationship. However, it is possible for a View (Page) to contain multiple ViewModels or for a ViewModel to be used by multiple Views (Pages). For now, we will simply have a single ViewModel for each Page. Before we create our ViewModels, we will start by creating a base ViewModel class, which will be an abstract class containing the basic functionality that each of our ViewModels will inherit. Initially, the base ViewModel abstract class will only contain a couple of members and will implement INotifyPropertyChanged, but we will add to this class as we continue to build upon the TripLog app throughout this book.
In order to create a base ViewModel, perform the following steps:

- Create a new abstract class named BaseViewModel in the ViewModels folder using the following code:

    public abstract class BaseViewModel
    {
        protected BaseViewModel()
        {
        }
    }

- Update BaseViewModel to implement INotifyPropertyChanged:

    using System.ComponentModel;
    using System.Runtime.CompilerServices;

    public abstract class BaseViewModel : INotifyPropertyChanged
    {
        protected BaseViewModel()
        {
        }

        public event PropertyChangedEventHandler PropertyChanged;

        // Raises PropertyChanged. When propertyName is omitted, the compiler
        // supplies the name of the calling property via [CallerMemberName].
        protected virtual void OnPropertyChanged(
            [CallerMemberName] string propertyName = null)
        {
            PropertyChanged?.Invoke(this,
                new PropertyChangedEventArgs(propertyName));
        }
    }

The implementation of INotifyPropertyChanged is key to the behavior and role of the ViewModels and data binding. It allows a Page to be notified when the properties of its ViewModel have changed.

Now that we have created a base ViewModel, we can start adding the actual ViewModels that will serve as the data context for each of our Pages. We will start by creating a ViewModel for MainPage.

Adding MainViewModel

The main purpose of a ViewModel is to separate the business logic, for example, data access and data manipulation, from the user interface logic. Right now, our MainPage directly defines the list of data that it is displaying. This data will eventually be dynamically loaded from an API, but for now, we will move this initial static data definition to its ViewModel so that it can be data bound to the user interface.

In order to create the ViewModel for MainPage, perform the following steps:

- Create a new class file in the ViewModels folder and name it MainViewModel.
- Update the MainViewModel class to inherit from BaseViewModel:

    public class MainViewModel : BaseViewModel
    {
        // ...
    }

- Add an ObservableCollection<T> property to the MainViewModel class and name it LogEntries.
  This property will be used to bind to the ItemsSource property of the ListView element on MainPage.xaml:

    public class MainViewModel : BaseViewModel
    {
        ObservableCollection<TripLogEntry> _logEntries;
        public ObservableCollection<TripLogEntry> LogEntries
        {
            get { return _logEntries; }
            set
            {
                _logEntries = value;
                OnPropertyChanged();
            }
        }

        // ...
    }

- Next, remove the List<TripLogEntry> that populates the ListView element on MainPage.xaml and repurpose that logic in the MainViewModel. We will put it in the constructor for now:

    public MainViewModel()
    {
        LogEntries = new ObservableCollection<TripLogEntry>();

        LogEntries.Add(new TripLogEntry
        {
            Title = "Washington Monument",
            Notes = "Amazing!",
            Rating = 3,
            Date = new DateTime(2017, 2, 5),
            Latitude = 38.8895,
            Longitude = -77.0352
        });

        LogEntries.Add(new TripLogEntry
        {
            Title = "Statue of Liberty",
            Notes = "Inspiring!",
            Rating = 4,
            Date = new DateTime(2017, 4, 13),
            Latitude = 40.6892,
            Longitude = -74.0444
        });

        LogEntries.Add(new TripLogEntry
        {
            Title = "Golden Gate Bridge",
            Notes = "Foggy, but beautiful.",
            Rating = 5,
            Date = new DateTime(2017, 4, 26),
            Latitude = 37.8268,
            Longitude = -122.4798
        });
    }

- Set MainViewModel as the BindingContext property for MainPage. Do this by simply setting the BindingContext property of MainPage in its code-behind file to a new instance of MainViewModel. The BindingContext property comes from the Xamarin.Forms.ContentPage base class:

    public MainPage()
    {
        InitializeComponent();

        BindingContext = new MainViewModel();
    }

- Finally, update how the ListView element on MainPage.xaml gets its items. Currently, its ItemsSource property is being set directly in the Page's code-behind. Remove this and instead update the ListView element's tag in MainPage.xaml to bind to the MainViewModel LogEntries property:

    <ListView ...
        ItemsSource="{Binding LogEntries}">

Adding DetailViewModel

Next, we will add another ViewModel to serve as the data context for DetailPage, as follows:

- Create a new class file in the ViewModels folder and name it DetailViewModel.
- Update the DetailViewModel class to inherit from the BaseViewModel abstract class:

    public class DetailViewModel : BaseViewModel
    {
        // ...
    }

- Add a TripLogEntry property to the class and name it Entry. This property will be used to bind details about an entry to the various labels on DetailPage:

    public class DetailViewModel : BaseViewModel
    {
        TripLogEntry _entry;
        public TripLogEntry Entry
        {
            get { return _entry; }
            set
            {
                _entry = value;
                OnPropertyChanged();
            }
        }

        // ...
    }

- Update the DetailViewModel constructor to take a TripLogEntry parameter named entry. Use this constructor parameter to populate the public Entry property created in the previous step:

    public class DetailViewModel : BaseViewModel
    {
        // ...

        public DetailViewModel(TripLogEntry entry)
        {
            Entry = entry;
        }
    }

- Set DetailViewModel as the BindingContext for DetailPage and pass in the TripLogEntry object that is being passed to DetailPage:

    public DetailPage(TripLogEntry entry)
    {
        InitializeComponent();

        BindingContext = new DetailViewModel(entry);

        // ...
    }

- Next, remove the code at the end of the DetailPage constructor that directly sets the Text properties of the Label elements:

    public DetailPage(TripLogEntry entry)
    {
        // ...

        // Remove these lines of code:
        //title.Text = entry.Title;
        //date.Text = entry.Date.ToString("M");
        //rating.Text = $"{entry.Rating} star rating";
        //notes.Text = entry.Notes;
    }

- Next, update the Label element tags in DetailPage.xaml to bind their Text properties to the DetailViewModel Entry property:

    <Label ... Text="{Binding Entry.Title}" />
    <Label ... Text="{Binding Entry.Date, StringFormat='{0:M}'}" />
    <Label ... Text="{Binding Entry.Rating, StringFormat='{0} star rating'}" />
    <Label ...
        Text="{Binding Entry.Notes}" />

- Finally, update the map to get the values it is plotting from the ViewModel. Since the Xamarin.Forms Map control does not have bindable properties, the values have to be set directly from the ViewModel properties. The easiest way to do this is to add a private property to the page that returns the value of the page's BindingContext and then use that property to set the values on the map:

    public partial class DetailPage : ContentPage
    {
        // Convenience accessor for the page's ViewModel.
        DetailViewModel _vm
        {
            get { return BindingContext as DetailViewModel; }
        }

        public DetailPage(TripLogEntry entry)
        {
            InitializeComponent();

            BindingContext = new DetailViewModel(entry);

            TripMap.MoveToRegion(MapSpan.FromCenterAndRadius(
                new Position(_vm.Entry.Latitude, _vm.Entry.Longitude),
                Distance.FromMiles(.5)));

            TripMap.Pins.Add(new Pin
            {
                Type = PinType.Place,
                Label = _vm.Entry.Title,
                Position = new Position(_vm.Entry.Latitude, _vm.Entry.Longitude)
            });
        }
    }

Adding NewEntryViewModel

Finally, we will need to add a ViewModel for NewEntryPage, as follows:

- Create a new class file in the ViewModels folder and name it NewEntryViewModel.
- Update the NewEntryViewModel class to inherit from BaseViewModel:

    public class NewEntryViewModel : BaseViewModel
    {
        // ...
    }

- Add public properties to the NewEntryViewModel class that will be used to bind it to the values entered into the EntryCell elements in NewEntryPage.xaml:

    public class NewEntryViewModel : BaseViewModel
    {
        string _title;
        public string Title
        {
            get { return _title; }
            set { _title = value; OnPropertyChanged(); }
        }

        double _latitude;
        public double Latitude
        {
            get { return _latitude; }
            set { _latitude = value; OnPropertyChanged(); }
        }

        double _longitude;
        public double Longitude
        {
            get { return _longitude; }
            set { _longitude = value; OnPropertyChanged(); }
        }

        DateTime _date;
        public DateTime Date
        {
            get { return _date; }
            set { _date = value; OnPropertyChanged(); }
        }

        int _rating;
        public int Rating
        {
            get { return _rating; }
            set { _rating = value; OnPropertyChanged(); }
        }

        string _notes;
        public string Notes
        {
            get { return _notes; }
            set { _notes = value; OnPropertyChanged(); }
        }

        // ...
    }

- Update the NewEntryViewModel constructor to initialize the Date and Rating properties:

    public NewEntryViewModel()
    {
        Date = DateTime.Today;
        Rating = 1;
    }

- Add a public Command property to NewEntryViewModel and name it SaveCommand. This property will be used to bind to the Save ToolbarItem in NewEntryPage.xaml. The Xamarin.Forms Command type implements System.Windows.Input.ICommand to provide an Action to run when the command is executed, and a Func to determine whether the command can be executed:

    public class NewEntryViewModel : BaseViewModel
    {
        // ...

        Command _saveCommand;
        public Command SaveCommand
        {
            get
            {
                return _saveCommand ??
                    (_saveCommand = new Command(ExecuteSaveCommand, CanSave));
            }
        }

        void ExecuteSaveCommand()
        {
            var newItem = new TripLogEntry
            {
                Title = Title,
                Latitude = Latitude,
                Longitude = Longitude,
                Date = Date,
                Rating = Rating,
                Notes = Notes
            };
        }

        bool CanSave()
        {
            return !string.IsNullOrWhiteSpace(Title);
        }
    }

- In order to keep the CanExecute function of the SaveCommand up to date, we will need to call the SaveCommand.ChangeCanExecute() method in any property setters that impact the results of that CanExecute function. In our case, this is only the Title property:

    public string Title
    {
        get { return _title; }
        set
        {
            _title = value;
            OnPropertyChanged();
            SaveCommand.ChangeCanExecute();
        }
    }

  The CanExecute function is not required, but by providing it, you can automatically manipulate the state of the control in the UI that is bound to the Command so that it is disabled until all of the required criteria are met, at which point it becomes enabled.

- Next, set NewEntryViewModel as the BindingContext for NewEntryPage:

    public NewEntryPage()
    {
        InitializeComponent();

        BindingContext = new NewEntryViewModel();

        // ...
    }

- Next, update the EntryCell elements in NewEntryPage.xaml to bind to the NewEntryViewModel properties:

    <EntryCell Label="Title" Text="{Binding Title}" />
    <EntryCell Label="Latitude" Text="{Binding Latitude}" ... />
    <EntryCell Label="Longitude" Text="{Binding Longitude}" ... />
    <EntryCell Label="Date" Text="{Binding Date, StringFormat='{0:d}'}" />
    <EntryCell Label="Rating" Text="{Binding Rating}" ... />
    <EntryCell Label="Notes" Text="{Binding Notes}" />

- Finally, we will need to update the Save ToolbarItem element in NewEntryPage.xaml to bind to the NewEntryViewModel SaveCommand property:

    <ToolbarItem Text="Save" Command="{Binding SaveCommand}" />

Now, when we run the app and navigate to the new entry page, we can see the data binding in action, as shown in the following screenshots.
Notice how the Save button is disabled until the title field contains a value:

To summarize, we updated the app that we created in the previous article, Create a basic travel app using Xamarin.Forms. We removed data and data-related logic from the Pages, offloading it to a series of ViewModels and then binding the Pages to those ViewModels.

If you liked this tutorial, read our book, Mastering Xamarin.Forms, to create an architecture-rich mobile application with good design patterns and best practices using Xamarin.Forms.

Sugandha Lahoti

04 Feb 2019

11 min read

How to recover deleted data from an Android device [Tutorial]

In this tutorial, we are going to learn about data recovery techniques that enable us to view data that has been deleted from a device. Deleted data could contain highly sensitive information, and thus data recovery is a crucial aspect of mobile forensics. This article will cover the following topics:

- Data recovery overview
- Recovering data deleted from an SD card
- Recovering data deleted from a phone's internal storage

This article is taken from the book Learning Android Forensics by Oleg Skulkin, Donnie Tindall, and Rohit Tamma. This book is a comprehensive guide to Android forensics, from setting up the workstation to analyzing key artifacts.

Data recovery overview

Data recovery is a powerful concept within digital forensics. It is the process of retrieving deleted data from a device or SD card when it cannot be accessed normally. Being able to recover data that has been deleted by a user could help solve civil or criminal cases. This is because many accused just delete data from their device hoping that the evidence will be destroyed. Thus, in most criminal cases, deleted data could be crucial because it may contain information the user wanted to erase from their Android device. For example, consider the scenario where a mobile phone has been seized from a terrorist. Wouldn't it be of the greatest importance to know which items were deleted by them? Access to any deleted SMS messages, pictures, dialed numbers, and so on could be of critical importance as they may reveal a lot of sensitive information.

From a normal user's point of view, recovering data that has been deleted would usually mean referring to the operating system's built-in solutions, such as the Recycle Bin in Windows. While it's true that data can be recovered from these locations, due to an increase in user awareness, these options often don't work. For instance, on a desktop computer, people now use Shift + Del whenever they want to delete a file completely from their desktop.
Similarly, in mobile environments, users are aware of the restore operations provided by apps and so on. In spite of these situations, data recovery techniques allow a forensic investigator to access the data that has been deleted from the device.

With respect to Android, it is possible to recover most of the deleted data, including SMS, pictures, application data, and so on. But it is important to seize the device in a proper manner and follow certain procedures; otherwise, data might be deleted permanently. To ensure that the deleted data is not lost forever, it is recommended to keep the following points in mind:

- Do not use the phone for any activity after seizing it. A deleted text message exists on the device until space is needed by some other incoming data, so the phone must not be used for any sort of activity to prevent the data from being overwritten.
- Even when the phone is not used, without any intervention from our end, data can be overwritten. For instance, an incoming SMS would automatically occupy the space, which overwrites the deleted data. Also, remote wipe commands can wipe the content present on the device. To prevent such events, you can consider the option of placing the device in a Faraday bag. Thus, care should be taken to prevent delivery of any new messages or data through any means of communication.

How can deleted files be recovered?

When a user deletes any data from a device, the data is not actually erased from the device and continues to exist on it. What gets deleted is the pointer to that data. All filesystems contain metadata, which maintains information about the hierarchy of files, filenames, and so on. Deletion will not really erase the data but instead removes the file system metadata. Thus, when text messages or any other files are deleted from a device, they are just made invisible to the user, but the files are still present on the device as long as they are not overwritten by some other data.
Hence, there is the possibility of recovering them before new data is added and occupies the space. Deleting the pointer and marking the space as available is an extremely fast operation compared to actually erasing all the data from the device. Hence, to increase performance, operating systems just delete the metadata.

Recovering deleted data on an Android device involves three scenarios:

- Recovering data that is deleted from the SD card, such as pictures, videos, and so on
- Recovering data that is deleted from SQLite databases, such as SMS, chats, web history, and so on
- Recovering data that is deleted from the device's internal storage

The following sections cover the techniques that can be used to recover deleted data from SD cards and from the internal storage of the Android device.

Recovering deleted data from SD cards

Data present on an SD card can reveal lots of information that is useful during a forensic investigation. The fact that pictures, videos, voice recordings, and application data are stored on the SD card adds weight to this. As mentioned in the previous chapters, Android devices often use FAT32 or exFAT file systems on their SD card. The main reason for this is that these file systems are widely supported by most operating systems, including Windows, Linux, and macOS.

The maximum file size on a FAT32 formatted drive is around 4 GB. With increasingly high-resolution formats now available, this limit is commonly reached, which is why newer devices support exFAT, a file system that doesn't have this limitation.

Recovering the data deleted from an external SD card is pretty easy if it can be mounted as a drive. If the SD card is removable, it can be mounted as a drive by connecting it to a computer using a card reader. Any files can be transferred to the SD card while it's mounted. Some of the older devices that use USB mass storage also mount the device to a drive when connected through a USB cable.
In order to make sure that the original evidence is not modified, a physical image of the disk is taken and all further experimentation is done on the image itself. Similarly, in the case of SD card analysis, an image of the SD card needs to be taken. Once the imaging is done, we have a raw image file. In our example, we will use FTK Imager by AccessData, which is an imaging utility. In addition to creating disk images, it can also be used to explore the contents of a disk image. The following are the steps that can be followed to recover the contents of an SD card using this tool:

- Start FTK Imager and click on File and then Add Evidence Item... in the menu, as shown in the following screenshot:

  (Screenshot: Adding evidence source to FTK Imager)

- Select Image File in the Select Source dialog and click on Next.
- In the Select File dialog, browse to the location where you downloaded the sdcard.dd file, select it, and click on Finish, as shown in the following screenshot:

  (Screenshot: Selecting the image file for analysis in FTK Imager)

- FTK Imager's default display will appear with the contents of the SD card visible in the View pane at the lower right. You can also click on the Properties tab below the lower left pane to view the properties for the disk image.
- Now, on the left pane, the drive has opened. You can open folders by clicking on the + sign. When a folder is highlighted, its contents are shown on the right pane. When a file is selected, its contents can be seen on the bottom pane. As shown in the following screenshot, the deleted files will have a red X over the icon derived from their file extension:

  (Screenshot: Deleted files shown with red X over the icons)

- As shown in the following screenshot, to export the file, right-click on the file that contains the picture and select Export Files...:

Sometimes, only a fragment of the file is recoverable, which cannot be read or viewed directly. In that case, we need to look through free or unallocated space for more data.
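To make the idea of searching unallocated space more concrete, here is a minimal Python sketch of signature-based carving. It is an illustration only, not how FTK Imager or PhotoRec are implemented. It assumes JPEG files, which begin with the bytes FF D8 FF and end with FF D9; the function name carve_jpegs and the max_size limit are hypothetical choices made for this example.

```python
from typing import List

# JPEG files start with an SOI marker followed by another marker byte (FF D8 FF)
# and end with an EOI marker (FF D9).
JPEG_START = b"\xff\xd8\xff"
JPEG_END = b"\xff\xd9"


def carve_jpegs(data: bytes, max_size: int = 20 * 1024 * 1024) -> List[bytes]:
    """Return byte runs in `data` that look like complete JPEG files.

    Hypothetical helper for illustration: it pairs each start signature
    with the next end signature and ignores file system metadata entirely.
    """
    carved = []
    pos = 0
    while True:
        start = data.find(JPEG_START, pos)
        if start == -1:
            break  # no further start signatures
        end = data.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break  # start without an end: only a fragment remains
        end += len(JPEG_END)
        if end - start <= max_size:
            carved.append(data[start:end])
        pos = end  # continue scanning after this candidate
    return carved


# Toy "unallocated space": filler bytes around one fake JPEG-like record.
raw = b"\x00" * 16 + JPEG_START + b"fake-image-bytes" + JPEG_END + b"\x00" * 16
for index, blob in enumerate(carve_jpegs(raw)):
    print(f"candidate {index}: {len(blob)} bytes")
```

This naive pairing can cut a file short when the end marker also appears inside the image data (for example, in an embedded thumbnail), and it cannot reassemble fragmented files, which is part of why dedicated carving tools are used in practice. A real image would also be read from disk in chunks rather than held in memory.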
Carving can be used to recover files from free and unallocated space. PhotoRec is one of the tools that can help you to do that. You will learn more about file carving with PhotoRec in the following sections.

Recovering deleted data from internal memory

Recovering files deleted from Android's internal memory, such as app data and so on, is not as easy as recovering such data from SD cards and SQLite databases, but, of course, it's not impossible. Many commercial forensic tools are capable of recovering deleted data from Android devices, provided that a physical acquisition is possible and the user data partition isn't encrypted. But this is not very common for modern devices, especially those running the most recent versions of the operating system, such as Oreo and Pie.

Most Android devices, especially modern smartphones and tablets, use the EXT4 file system to organize data in their internal storage. This file system is very common for Linux-based devices. So, if we want to recover deleted data from the device's internal storage, we need a tool capable of recovering deleted files from the EXT4 file system. One such tool is extundelete. The tool is available for downloading here: http://extundelete.sourceforge.net/.

To recover the contents of an inode, extundelete searches a file system's journal for an old copy of that inode. Information contained in the inode helps the tool to locate the file within the file system. To recover not only the file's contents, but also its name, extundelete is able to search the deleted entries in a directory to match the inode number of a file to a file name.

To use this tool, you will need a Linux workstation. Most forensic Linux distributions have it already on board.
For example, the following is a screenshot from SIFT Workstation, a popular digital forensics and incident response Linux distribution created by Rob Lee and his team from the SANS Institute (https://digital-forensics.sans.org/community/downloads):

(Screenshot: extundelete command-line options)

Before you can start the recovery process, you will need to mount a previously imaged userdata partition. In this example, we are going to use an Android device imaged via the chip-off technique. First of all, we need to determine the location of the userdata partition within the image. To do this, we can use mmls from the Sleuth Kit, as shown in the following screenshot:

(Screenshot: Android device partitions)

As you can see in the screenshot, the userdata partition is the last one and starts in sector 9199616. To make sure the userdata partition is EXT4 formatted, let's use fsstat, as shown in the following example:

(Screenshot: A part of fsstat output)

All you need now is to mount the userdata partition and run extundelete against it, as shown in the following example:

    extundelete /userdata/partition/mount/point --restore-all

All recovered files will be saved to a subdirectory of the current directory named RECOVERED_FILES. If you are interested in recovering files deleted before or after a specified date, you can use the --before and --after options. It's important to note that these dates must be in UNIX Epoch format. There are quite a lot of both online and offline tools capable of converting timestamps; for example, you can use https://www.epochconverter.com/.
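Two small calculations come up in the steps above: turning the start sector reported by mmls into a byte offset, and turning a calendar date into a UNIX Epoch timestamp for the date filters. The following is a minimal Python sketch of both, assuming 512-byte sectors (check the units reported in your own mmls output) and dates interpreted as midnight UTC. The function names are hypothetical, and the cutoff date is only an example.

```python
from datetime import datetime, timezone

# Assumption: the image uses 512-byte sectors, as is common in mmls output.
SECTOR_SIZE = 512


def partition_byte_offset(start_sector: int, sector_size: int = SECTOR_SIZE) -> int:
    """Convert a partition's start sector into a byte offset within the image."""
    return start_sector * sector_size


def to_epoch(year: int, month: int, day: int) -> int:
    """Return the UNIX Epoch timestamp for midnight UTC on the given date."""
    moment = datetime(year, month, day, tzinfo=timezone.utc)
    return int(moment.timestamp())


# Start sector of the userdata partition taken from the example above.
offset = partition_byte_offset(9199616)
print("userdata byte offset:", offset)

# Example cutoff date (an arbitrary choice for illustration).
cutoff = to_epoch(2019, 1, 1)
print("epoch timestamp:", cutoff)
```

The byte offset is the kind of value typically supplied when loop-mounting a single partition out of a full image (for instance, through the offset option of mount), and the timestamp is the form the date filters expect. If the investigation works in local time rather than UTC, the timezone in to_epoch would need to be adjusted accordingly.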
As you can see, this method is neither easy nor fast, but there is a better way: using Autopsy, an open source digital forensic tool. In the following example, we used a built-in file extension filter to find all the images on the Android device, and found a lot of deleted artifacts:

(Screenshot: Recovering deleted files from an EXT4 partition with Autopsy)

Summary

Data recovery is the process of retrieving deleted data from the device and thus is a very important concept in forensics. In this chapter, we have seen various techniques to recover deleted data from both the SD card and the internal memory. While recovering the data from a removable SD card is easy, recovering data from internal memory involves a few complications. SQLite file parsing and file carving techniques aid a forensic analyst in recovering the deleted items present in the internal memory of an Android device.

In order to understand the forensic perspective and the analysis of Android apps, read our book Learning Android Forensics.

Richard Gall

31 May 2018

3 min read

A really basic guide to batch file programming

Batch file programming is a way of making a computer do things simply by creating, yes, you guessed it, a batch file. It's a way of doing things you might ordinarily do in the command prompt, but it automates some tasks, which means you don't have to write so much code. If it sounds straightforward, that's because it is, generally. Which is why it's worth learning...

Batch file programming is a good place to start learning how computers work

Of course, if you already know your way around batch files, I'm sure you'll agree it's a good way for someone relatively experienced in software to get to know their machine a little better. If you know someone that you think would get a lot from learning batch file programming, share this short guide with them!

Why would I write a batch script?

There are a number of reasons you might write batch scripts. It's particularly useful for resolving network issues, installing a number of programs on different machines, even organizing files and folders on your computer. Imagine you have a recurring issue: with a batch file you can solve it quickly and easily wherever you are, without having to write copious lines of code in the command line. Or maybe your desktop simply looks like a mess; with a little knowledge of batch file programming you can clean things up without too much effort.

How to write a batch file

Clearly, batch file programming can make your life a lot easier. Let's take a look at the key steps to begin writing batch scripts.

Step 1: Open your text editor

Batch file programming is really about writing commands, so you'll need your text editor open to begin. Notepad, WordPad, it doesn't matter!

Step 2: Begin writing code

As we've already seen, batch file programming is really about writing commands for your computer. The code is essentially the same as what you would write in the command prompt.
Here are a few batch file commands you might want to know to get started:

- ipconfig - this presents network information like your IP and MAC address.
- start "" [website] - this opens a specified website in your browser.
- rem - this is used if you want to make a comment or remark in your code (i.e. for documentation purposes).
- pause - this, as you'd expect, pauses the script so it can be read before it continues.
- echo - this command will display text in the command prompt.
- %%a - this is a loop variable used with the for command, for example to refer to each file in a given folder in turn.
- if - this is a conditional command.

The list of batch file commands is pretty long. There are plenty of other resources with an exhaustive list of commands you can use, but a good place to begin is this page on Wikipedia.

Step 3: Save your batch file

Once you've written your commands in the text editor, you'll then need to save your document as a batch file. Title it, and suffix it with the .bat extension. You'll also need to make sure save as type is set as 'All files'.

That's basically it when it comes to batch file programming. Of course, there are some complex things you can do, but once you know the basics, getting into the code is where you can start to experiment.

Fatema Patrawala

22 May 2018

14 min read

Documenting RESTful Java Web Services using Swagger

Packt

05 Oct 2015

16 min read

Object Detection Using Image Features in JavaScript

In this article by Foat Akhmadeev, author of the book Computer Vision for the Web, we will discuss how we can detect an object on an image using several JavaScript libraries. In particular, we will see techniques such as FAST feature detection, and BRIEF and ORB descriptor matching. Finally, an object detection example will be presented.

There are many ways to detect an object on an image. Color object detection, which is the detection of changes in intensity of an image, is just one simple computer vision method. There are some fundamental things that every computer vision enthusiast should know. The libraries we use here are:

JSFeat (http://inspirit.github.io/jsfeat/)
tracking.js (http://trackingjs.com)

Detecting key points

What information do we get when we see an object on an image? An object usually consists of some regular parts or unique points, which represent this particular object. Of course, we can compare each pixel of an image, but that is not a good idea in terms of computational speed. We could instead take unique points randomly, thus reducing the computation cost significantly, but we would still not get much information from random points. Using all of the information, we can get too much noise and lose important parts of an object representation. In the end, we need to accept that both ideas, getting all pixels and selecting random pixels, are really bad. So, what can we do in that case?

We are working with a grayscale image and we need to get unique points of an image. So we need to focus on the intensity information, for example, by getting object edges with the Canny edge detector or the Sobel filter. We are closer to the solution! But still not close enough. What if we have a long edge? Don't you think that it is a bit bad that we have too many unique pixels that lie on this edge?
An edge of an object has end points or corners; if we reduce our edge to those corners, we will get enough unique pixels and remove unnecessary information.

There are various methods of getting keypoints from an image, many of which extract corners as those keypoints. To get them, we will use the FAST (Features from Accelerated Segment Test) algorithm. It is really simple and you can easily implement it by yourself if you want. But you do not need to. The algorithm implementation is provided by both the tracking.js and JSFeat libraries.

The idea of the FAST algorithm can be captured from the following image:

Suppose we want to check whether the pixel P is a corner. We will check 16 pixels around it. If at least 9 pixels in an arc around P are much darker or brighter than the value of P, then we say that P is a corner. How much darker or brighter should those pixels be? The decision is made by applying a threshold to the difference between the value of P and the values of the pixels around P.

A practical example

First, we will start with an example of FAST corner detection for the tracking.js library. Before we do anything, we can set the detector threshold. The threshold defines the minimum difference between a tested corner and the points around it:

    tracking.Fast.THRESHOLD = 30;

It is usually good practice to apply a Gaussian blur to an image before we start the method. It significantly reduces the noise of an image:

    var imageData = context.getImageData(0, 0, cols, rows);
    var gray = tracking.Image.grayscale(imageData.data, cols, rows, true);
    var blurred4 = tracking.Image.blur(gray, cols, rows, 3);

Remember that the blur function returns a 4-channel (RGBA) array. In that case, we need to convert it to 1-channel.
Since we can easily skip other channels, it should not be a problem:

    var blurred1 = new Array(blurred4.length / 4);
    for (var i = 0, j = 0; i < blurred4.length; i += 4, ++j) {
        blurred1[j] = blurred4[i];
    }

Next, we run a corner detection function on our image array:

    var corners = tracking.Fast.findCorners(blurred1, cols, rows);

The result is an array whose length is twice the number of corners. The array is returned in the format [x0,y0,x1,y1,...], where [xn, yn] are the coordinates of a detected corner. To print the result on a canvas, we will use the fillRect function:

    for (i = 0; i < corners.length; i += 2) {
        context.fillStyle = '#0f0';
        context.fillRect(corners[i], corners[i + 1], 3, 3);
    }

Let's see an example with the JSFeat library, for which the steps are very similar to those of tracking.js. First, we set the global threshold with a function:

    jsfeat.fast_corners.set_threshold(30);

Then, we apply a Gaussian blur to an image matrix before running the corner detection:

    jsfeat.imgproc.gaussian_blur(matGray, matBlurred, 3);

We need to preallocate keypoints for the corners result. The keypoint_t function is just a new type which is useful for keypoints of an image. The first two parameters represent the coordinates of a point and the other parameters set the point score (which checks whether the point is good enough to be a key point), the point level (which you can use in an image pyramid, for example), and the point angle (which is usually used for the gradient orientation):

    var corners = [];
    var i = cols * rows;
    while (--i >= 0) {
        corners[i] = new jsfeat.keypoint_t(0, 0, 0, 0, -1);
    }

After all this, we execute the FAST corner detection method. As the last parameter of the detection function, we define a border size. The border is used to constrain circles around each possible corner. For example, you cannot precisely say whether the point is a corner for the [0,0] pixel.
There is no [0, -3] pixel in our matrix:

    var count = jsfeat.fast_corners.detect(matBlurred, corners, 3);

Since we preallocated the corners, the function returns the number of calculated corners for us. The result is an array of structures with the x and y fields, so we can print it using those fields:

    for (var i = 0; i < count; i++) {
        context.fillStyle = '#0f0';
        context.fillRect(corners[i].x, corners[i].y, 3, 3);
    }

The result is nearly the same for both algorithms. The difference is in some implementation details. Let's look at the following example:

From left to right: tracking.js without blur, JSFeat without blur, tracking.js and JSFeat with blur.

If you look closely, you can see the difference between the tracking.js and JSFeat results, but it is not easy to spot. Look at how much noise was reduced by applying just a small 3 x 3 Gaussian filter! A lot of noisy points were removed from the background. And now the algorithm can focus on points that represent flowers and the pattern of the vase.

We have extracted key points from our image, and we successfully reached the goal of reducing the number of keypoints and focusing on the unique points of an image. Now, we need to compare or match those points somehow. How can we do that?

Descriptors and object matching

Image features by themselves are a bit useless. Yes, we have found unique points on an image. But what did we get? Only values of pixels and that's it. If we try to compare these values, it will not give us much information. Moreover, if we change the overall image brightness, we will not find the same keypoints on the same image! Taking all of this into account, we need the information that surrounds our key points. Moreover, we need a method to efficiently compare this information. First, we need to describe the image features, which is the job of image descriptors. In this part, we will see how these descriptors can be extracted and matched.
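To make the segment test from the FAST description earlier in this article more concrete, here is a minimal sketch in plain JavaScript. It is not the tracking.js or JSFeat implementation: `CIRCLE` and `isFastCorner` are hypothetical names, and the circle of radius 3 and the requirement that the 9 pixels be contiguous are assumptions based on the description of the algorithm.

```javascript
// Offsets of the 16 pixels on a circle of radius 3 around the tested pixel P
// (assumed layout; listed clockwise starting from the pixel directly above P).
const CIRCLE = [
  [0, -3], [1, -3], [2, -2], [3, -1], [3, 0], [3, 1], [2, 2], [1, 3],
  [0, 3], [-1, 3], [-2, 2], [-3, 1], [-3, 0], [-3, -1], [-2, -2], [-1, -3]
];

// gray is a 1-channel array of intensities stored row by row.
// Returns true when at least `arc` contiguous circle pixels are all brighter
// than P + threshold, or all darker than P - threshold.
function isFastCorner(gray, cols, rows, x, y, threshold, arc = 9) {
  // Pixels closer than 3 to the border have no full circle around them.
  if (x < 3 || y < 3 || x >= cols - 3 || y >= rows - 3) return false;

  const p = gray[y * cols + x];
  // Label each circle pixel: 1 = brighter, -1 = darker, 0 = similar to P.
  const labels = CIRCLE.map(([dx, dy]) => {
    const v = gray[(y + dy) * cols + (x + dx)];
    if (v > p + threshold) return 1;
    if (v < p - threshold) return -1;
    return 0;
  });

  // Walk the circle twice so that arcs wrapping past the last pixel count.
  let run = 0;
  let prev = 0;
  for (let i = 0; i < 2 * CIRCLE.length; i++) {
    const label = labels[i % CIRCLE.length];
    if (label !== 0 && label === prev) run++;
    else run = label !== 0 ? 1 : 0;
    prev = label;
    if (run >= arc) return true;
  }
  return false;
}

// Usage: a single bright pixel on a dark 7 x 7 image passes the test.
const demoImage = new Uint8Array(7 * 7);
demoImage[3 * 7 + 3] = 100;
console.log(isFastCorner(demoImage, 7, 7, 3, 3, 30));
```

The early return for border pixels mirrors the border size of 3 passed to the JSFeat detector: there is no full circle around a pixel such as [0,0].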
The tracking.js and JSFeat libraries provide different methods for image descriptors. We will discuss both.

BRIEF and ORB descriptors

The descriptors theory is focused on changes in image pixels' intensities. The tracking.js library provides BRIEF (Binary Robust Independent Elementary Features) descriptors, and JSFeat provides their extension, ORB (Oriented FAST and Rotated BRIEF). As the ORB name suggests, it is rotation invariant. This means that even if you rotate an object, the algorithm can still detect it. Moreover, the authors of the JSFeat library provide an example using the image pyramid, which is scale invariant too.

Let's start by explaining BRIEF, since it is the source for ORB descriptors. As a first step, the algorithm takes the computed image features, and it takes unique pairs of elements around each feature. Based on these pairs' intensities, it forms a binary string. For example, if we have a pair of positions i and j, and if I(i) < I(j) (where I(pos) means the image value at the position pos), then the result is 1, else 0. We add this result to the binary string. We do that for N pairs, where N is taken as a power of 2 (128, 256, 512). Since descriptors are just binary strings, we can compare them in an efficient manner. To match these strings, the Hamming distance is usually used. It shows the minimum number of substitutions required to change one string into another. For example, we have two binary strings: 10011 and 11001. The Hamming distance between them is 2, since we need to change 2 bits of information to change the first string into the second.

The JSFeat library provides the functionality to apply ORB descriptors. The core idea is very similar to BRIEF. However, there are two major differences:

The implementation is scale invariant, since the descriptors are computed for an image pyramid.
The descriptors are rotation invariant; the direction is computed using the intensity of the patch around a feature. Using this orientation, ORB manages to compute the BRIEF descriptor in a rotation-invariant manner.

Implementation of descriptors and their matching

Our goal is to find an object from a template on a scene image. We can do that by finding features and descriptors on both images and matching descriptors from a template to an image.

We start with the tracking.js library and BRIEF descriptors. The first thing that we can do is set the number of location pairs:

    tracking.Brief.N = 512;

By default, it is 256, but you can choose a higher value. The larger the value, the more information you will get, and the more memory and computation it requires.

Before starting the computation, do not forget to apply the Gaussian blur to reduce the image noise. Next, we find the FAST corners and compute descriptors on both images. Here and in the next example, we use the suffix Object for a template image and Scene for a scene image:

    var cornersObject = tracking.Fast.findCorners(grayObject, colsObject, rowsObject);
    var cornersScene = tracking.Fast.findCorners(grayScene, colsScene, rowsScene);
    var descriptorsObject = tracking.Brief.getDescriptors(grayObject, colsObject, cornersObject);
    var descriptorsScene = tracking.Brief.getDescriptors(grayScene, colsScene, cornersScene);

Then we do the matching:

    var matches = tracking.Brief.reciprocalMatch(cornersObject, descriptorsObject, cornersScene, descriptorsScene);

We need to pass both the corners and the descriptors to the function, since it returns coordinate information as a result. Next, we print both images on one canvas. To draw the matches on this combined canvas, we need to shift our scene keypoints by the width of the template image, as keypoint1 of a match is a point on the template and keypoint2 is a point on the scene image.
The keypoint1 and keypoint2 are arrays with x and y coordinates at indexes 0 and 1, respectively:

    for (var i = 0; i < matches.length; i++) {
        // Pad to six hex digits so that the color string is always valid
        var color = '#' + ('000000' + Math.floor(Math.random() * 16777216).toString(16)).slice(-6);
        context.fillStyle = color;
        context.strokeStyle = color;
        context.fillRect(matches[i].keypoint1[0], matches[i].keypoint1[1], 5, 5);
        context.fillRect(matches[i].keypoint2[0] + colsObject, matches[i].keypoint2[1], 5, 5);
        context.beginPath();
        context.moveTo(matches[i].keypoint1[0], matches[i].keypoint1[1]);
        context.lineTo(matches[i].keypoint2[0] + colsObject, matches[i].keypoint2[1]);
        context.stroke();
    }

The JSFeat library provides most of the code for pyramids and scale invariant features not in the library itself, but in the examples, which are available at https://github.com/inspirit/jsfeat/blob/gh-pages/sample_orb.html. We will not provide the full code here, because it requires too much space. But do not worry, we will highlight the main topics here.

Let's start with functions that are included in the library. First, we need to preallocate the descriptors matrix, where 32 is the length of a descriptor and 500 is the maximum number of descriptors. Again, 32 is a power of two:

    var descriptors = new jsfeat.matrix_t(32, 500, jsfeat.U8C1_t);

Then, we compute the ORB descriptors for each corner. We need to do that for both the template and scene images:

    jsfeat.orb.describe(matBlurred, corners, num_corners, descriptors);

The matching function from the example uses global variables, which mainly define the input descriptors and the output matches:

    function match_pattern()

The resulting match_t contains the following fields:

screen_idx: This is the index of a scene descriptor
pattern_lev: This is the index of a pyramid level
pattern_idx: This is the index of a template descriptor

Since ORB works with the image pyramid, it returns corners and matches for each level:

    var s_kp = screen_corners[m.screen_idx];
    var p_kp = pattern_corners[m.pattern_lev][m.pattern_idx];

We can print each matching as shown here.
Again, we use a shift, since we computed descriptors on separate images, but print the result on one canvas:

    context.fillRect(p_kp.x, p_kp.y, 4, 4);
    context.fillRect(s_kp.x + shift, s_kp.y, 4, 4);

Working with a perspective

Let's step aside for a moment. Sometimes, an object you want to detect is affected by a perspective distortion. In that case, you may want to rectify an object plane. For example, a building wall:

Looks good, doesn't it? How do we do that? Let's look at the code:

    var imgRectified = new jsfeat.matrix_t(mat.cols, mat.rows, jsfeat.U8_t | jsfeat.C1_t);
    var transform = new jsfeat.matrix_t(3, 3, jsfeat.F32_t | jsfeat.C1_t);
    jsfeat.math.perspective_4point_transform(transform,
        0, 0, 0, 0,         // first pair: x1_src, y1_src, x1_dst, y1_dst
        640, 0, 640, 0,     // x2_src, y2_src, x2_dst, y2_dst and so on
        640, 480, 640, 480,
        0, 480, 180, 480);
    jsfeat.matmath.invert_3x3(transform, transform);
    jsfeat.imgproc.warp_perspective(mat, imgRectified, transform, 255);

First, as we did earlier, we define a result matrix object. Next, we allocate a matrix for the image perspective transformation. We calculate it based on four pairs of corresponding points. For example, the last, that is, the fourth point of the original image, which is [0, 480], should be projected to the point [180, 480] on the rectified image. Here, the first coordinate refers to x and the second to y. Then, we invert the transform matrix to be able to apply it to the original image, the mat variable. We pick the background color as white (255 for an unsigned byte). As a result, we get a nice image without any perspective distortion.

Finding an object location

Returning to our primary goal, we found a match. That is great. But we have not yet found the object's location. There is no function for that in the tracking.js library, but JSFeat provides such functionality in the examples section.

First, we need to compute a perspective transform matrix.
This is why we discussed the example of such a transformation previously. We have points from two images but we do not have a transformation for the whole image.

First, we define a transform matrix:

    var homo3x3 = new jsfeat.matrix_t(3, 3, jsfeat.F32C1_t);

To compute the homography, we need only four points. But after the matching, we get too many. In addition, there can be noisy points, which we will need to skip somehow. For that, we use the RANSAC (Random Sample Consensus) algorithm. It is an iterative method for estimating a mathematical model from a dataset that contains outliers (noise). It identifies the outliers and generates a model that is computed without the noisy data.

Before we start, we need to define the algorithm parameters. The first parameter is a match mask, where all matches will be marked as good (1) or bad (0):

    var match_mask = new jsfeat.matrix_t(500, 1, jsfeat.U8C1_t);

Our mathematical model to find:

    var mm_kernel = new jsfeat.motion_model.homography2d();

Minimum number of points to estimate a model (4 points to get a homography):

    var num_model_points = 4;

Maximum threshold to classify a data point as an inlier or a good match:

    var reproj_threshold = 3;

Finally, the variable that holds the main parameters. Its last two arguments define the maximum ratio of outliers (0.5) and the probability of success (0.99) that the algorithm must reach before it stops:

    var ransac_param = new jsfeat.ransac_params_t(num_model_points, reproj_threshold, 0.5, 0.99);

Then, we run the RANSAC algorithm. The last parameter represents the maximum number of iterations for the algorithm:

    jsfeat.motion_estimator.ransac(ransac_param, mm_kernel, object_xy, screen_xy, count, homo3x3, match_mask, 1000);

The shape finding can be applied to both the tracking.js and JSFeat libraries. You just need to pass the matches as object_xy and screen_xy, where each of those arguments must hold an array of objects with the x and y fields.
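As a minimal sketch of that last step, the tracking.js matches (whose keypoint1 and keypoint2 are [x, y] arrays, as described earlier) could be converted into the two point arrays like this. The helper name `matchesToPointPairs` and the sample data are hypothetical and are not part of either library.

```javascript
// Hypothetical helper: turns tracking.js-style matches into two arrays of
// { x, y } objects, one for the template (object) and one for the scene.
function matchesToPointPairs(matches) {
  const object_xy = [];
  const screen_xy = [];
  for (const match of matches) {
    // keypoint1 is a point on the template, keypoint2 a point on the scene.
    object_xy.push({ x: match.keypoint1[0], y: match.keypoint1[1] });
    screen_xy.push({ x: match.keypoint2[0], y: match.keypoint2[1] });
  }
  return { object_xy, screen_xy, count: matches.length };
}

// Usage with made-up coordinates:
const sampleMatches = [
  { keypoint1: [10, 20], keypoint2: [110, 25] },
  { keypoint1: [30, 40], keypoint2: [131, 44] }
];
const pairs = matchesToPointPairs(sampleMatches);
console.log(pairs.object_xy, pairs.screen_xy, pairs.count);
```

The returned `object_xy`, `screen_xy`, and `count` values line up with the arguments of the same names in the ransac call above.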
After we find the transformation matrix, we compute the projected shape of an object on a new image:

    var shape_pts = tCorners(homo3x3.data, colsObject, rowsObject);

After the computation is done, we draw the computed shapes on our images:

As we see, our program successfully found an object in both cases. In practice, the two methods can perform differently, mainly depending on the thresholds you set.

Summary

Image features and descriptor matching are powerful tools for object detection. Both the JSFeat and tracking.js libraries provide different functionalities to match objects using these features. In addition to this, the JSFeat project contains algorithms for object finding. These methods can be useful for tasks such as unique object detection, face tracking, and creating a human interface by tracking various objects very efficiently.
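The tCorners function used above comes from the JSFeat example and is not reproduced in this article. As a rough sketch of what such a projection involves, the function below maps the four corners of the template through a 3 x 3 homography. The name `projectCorners` is hypothetical, and the code assumes the matrix is passed as a 9-element array in row-major order.

```javascript
// Hypothetical stand-in for the projection step: h is a 9-element array
// holding a 3 x 3 homography in row-major order (an assumption).
function projectCorners(h, width, height) {
  const corners = [
    { x: 0, y: 0 },
    { x: width, y: 0 },
    { x: width, y: height },
    { x: 0, y: height }
  ];
  return corners.map(({ x, y }) => {
    const px = h[0] * x + h[1] * y + h[2];
    const py = h[3] * x + h[4] * y + h[5];
    const w = h[6] * x + h[7] * y + h[8];
    // Divide by the homogeneous coordinate to get image coordinates.
    return { x: px / w, y: py / w };
  });
}

// Usage: a pure translation by (10, 20) moves every corner by that offset.
const translation = [1, 0, 10, 0, 1, 20, 0, 0, 1];
console.log(projectCorners(translation, 640, 480));
```

Drawing lines between the four returned points, in order, gives the outline of the detected object on the scene image.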

Richard Gall

12 Sep 2019

7 min read

The CAP Theorem in practice: The consistency vs. availability trade-off in distributed databases

When you choose a database you are making a design decision. One of the best frameworks for understanding what this means in practice is the CAP Theorem.

What is the CAP Theorem?

The CAP Theorem, developed by computer scientist Eric Brewer in the late nineties, states that databases can only ever fulfil two out of three elements:

Consistency - that reads are always up to date, which means any client making a request to the database will get the same view of data.
Availability - database requests always receive a response (when valid).
Partition tolerance - that the system keeps working even when a network fault prevents messaging between nodes.

In the context of distributed (NoSQL) databases, this means there is always going to be a trade-off between consistency and availability. This is because distributed systems are always necessarily partition tolerant (i.e. it simply wouldn’t be a distributed database if it wasn’t partition tolerant).

How do you use the CAP Theorem when making database decisions?

Although the CAP Theorem can feel quite abstract, it has practical, real-world consequences. From both a technical and business perspective the trade-offs will lead you to some very important questions. There are no right answers. Ultimately it will be all about the context in which your database is operating, the needs of the business, and the expectations and needs of users. You will have to consider things like:

Is it important to avoid throwing up errors in the client?
Or are we willing to sacrifice the visible user experience to ensure consistency?
Is consistency actually an important part of the user’s experience?
Or can we actually do what we want with a relational database and avoid the need for partition tolerance altogether?

As you can see, these are ultimately user experience questions.
To properly understand those, you need to be sensitive to the overall goals of the project, and, as said above, the context in which your database solution is operating. (E.g. is it powering an internal analytics dashboard? Or is it supporting a widely used external-facing website or application?) And, as the final question in the list highlights, it’s always worth considering whether the consistency vs. availability trade-off should matter at all. Avoid the temptation to think a complex database solution will always be better when a simple, more traditional solution will do the job.

Of course, it’s important to note that a system that isn’t partition tolerant is a single point of failure. That introduces the potential for unreliability.

Prioritizing consistency in a distributed database

It’s possible to get into a lot of technical detail when talking about consistency and availability, but at a really fundamental level the principle is straightforward: you need consistency (or what is called a CP database) if the data in the database must always be up to date and aligned, even in the instance of a network failure (e.g. the partitioned nodes are unable to communicate with one another for whatever reason).

Consistency is the priority in use cases where multiple clients need to have the same view of the data. For example, where you’re dealing with financial information or personal information, a database that prioritizes consistency gives you confidence that the data you are looking at is up to date, even in a situation where the network is unreliable or fails.
Examples of CP databases

MongoDB (further reading: Learning MongoDB 4 [Video]; MongoDB 4 Quick Start Guide; MongoDB, Express, Angular, and Node.js Fundamentals)
Redis (further reading: Build Complex Express Sites with Redis and Socket.io [Video]; Learning Redis)
HBase (further reading: Learn by Example : HBase - The Hadoop Database [Video]; HBase Design Patterns)

Prioritizing availability in a distributed database

Availability is essential when data accumulation is a priority. Think here of things like behavioral data or user preferences. In scenarios like these, you will want to capture as much information as possible about what a user or customer is doing, but it isn’t critical that the database is constantly up to date. It simply needs to be accessible and available even when network connections aren’t working. The growing demand for offline application use is also one reason why you might use a NoSQL database that prioritizes availability over consistency.

Examples of AP databases

Cassandra (further reading: Learn Apache Cassandra in Just 2 Hours [Video]; Mastering Apache Cassandra 3.x - Third Edition)
DynamoDB (further reading: Managed NoSQL Database In The Cloud - Amazon AWS DynamoDB [Video]; Hands-On Amazon DynamoDB for Developers [Video])

Limitations and criticisms of CAP Theorem

It’s worth noting that the CAP Theorem can pose problems. As with most things, the truth is a little more complicated. Even Eric Brewer is circumspect about the theorem, especially given what we now expect from distributed databases. Back in 2012, twelve years after he first put his theorem into the world, he wrote that:

“Although designers still need to choose between consistency and availability when partitions are present, there is an incredible range of flexibility for handling partitions and recovering from them. The modern CAP goal should be to maximize combinations of consistency and availability that make sense for the specific application.
Such an approach incorporates plans for operation during a partition and for recovery afterward, thus helping designers think about CAP beyond its historically perceived limitations.”

So, this means we must think about the trade-off between consistency and availability as a balancing act, rather than a binary design decision.

Elsewhere, there have been more robust criticisms of CAP Theorem. Software engineer Martin Kleppmann, for example, pleaded “Please stop calling databases CP or AP” in 2015. In a blog post he argues that CAP Theorem only works if you adhere to specific definitions of consistency, availability, and partition tolerance. “If your use of words matches the precise definitions of the proof, then the CAP theorem applies to you,” he writes. “But if you’re using some other notion of consistency or availability, you can’t expect the CAP theorem to still apply.”

The consequences of this are much like those described in Brewer’s piece from 2012. You need to take a nuanced approach to database trade-offs in which you think them through on your own terms and up against your own needs.

The PACELC Theorem

One of the developments of this line of argument is an extension to the CAP Theorem: the PACELC Theorem. This moves beyond thinking about consistency and availability and instead places an emphasis on the trade-off between consistency and latency. The PACELC Theorem builds on the CAP Theorem (the ‘PAC’) and adds an else (the ‘E’). What this means is that while you need to choose between availability and consistency if communication between partitions has failed in a distributed system, even if things are running properly and there are no network issues, there is still going to be a trade-off between consistency and latency (the ‘LC’).

Conclusion: Learn to align context with technical specs

Although the CAP Theorem might seem somewhat outdated, it is valuable in providing a way to think about database architecture design.
It not only forces engineers and architects to ask questions about what they want from the technologies they use, but it also forces them to think carefully about the requirements of a given project. What are the business goals? What are user expectations? The PACELC Theorem builds on CAP in an effective way. However, the most important thing about these frameworks is how they help you to think about your problems. Of course the CAP Theorem has limitations. Because it abstracts a problem it is necessarily going to lack nuance. There are going to be things it simplifies. It’s important, as Kleppmann reminds us, to be mindful of these nuances. But at the same time, we shouldn’t let an obsession with nuance and detail allow us to miss the bigger picture.

Packt

09 Jan 2017

20 min read

Exploring Structure from Motion Using OpenCV

Packt

12 Aug 2015

7 min read

Creating test suites, specs and expectations in Jest

In this article by Artemij Fedosejev, the author of React.js Essentials, we will take a look at test suites, specs, and expectations.

To write a test for JavaScript functions, you need a testing framework. Fortunately, Facebook built their own unit test framework for JavaScript called Jest. It is built on top of Jasmine - another well-known JavaScript test framework. If you’re familiar with Jasmine you’ll find Jest's approach to testing very similar. However I'll make no assumptions about your prior experience with testing frameworks and discuss the basics first.

The fundamental idea of unit testing is that you test only one piece of functionality in your application, which is usually implemented by one function. And you test it in isolation - meaning that all other parts of your application which that function depends on are not used by your tests. Instead, they are imitated by your tests. To imitate a JavaScript object is to create a fake one that simulates the behavior of the real object. In unit testing the fake object is called a mock and the process of creating it is called mocking.

Jest automatically mocks dependencies when you're running your tests. Better yet, it automatically finds tests to execute in your repository. Let's take a look at the example.

Create a directory called ./snapterest/source/js/utils/ and create a new file called TweetUtils.js within it, with the following contents:

    function getListOfTweetIds(tweets) {
        return Object.keys(tweets);
    }

    module.exports.getListOfTweetIds = getListOfTweetIds;

The TweetUtils.js file is a module with the getListOfTweetIds() utility function for our application to use. Given an object with tweets, getListOfTweetIds() returns an array of tweet IDs. Using the CommonJS module pattern we export this function:

    module.exports.getListOfTweetIds = getListOfTweetIds;

Jest Unit Testing

Now let's write our first unit test with Jest. We'll be testing our getListOfTweetIds() function.
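Before writing the test, it may help to see what the function returns for a small input. This is only an illustrative sketch: the `sampleTweets` object and its keys are made up, and the function body is copied from TweetUtils.js so that the snippet runs on its own.

```javascript
// Same implementation as in TweetUtils.js, repeated here so the snippet is
// self-contained.
function getListOfTweetIds(tweets) {
  return Object.keys(tweets);
}

// Hypothetical input: the keys play the role of tweet IDs.
const sampleTweets = { tweet1: {}, tweet2: {} };

console.log(getListOfTweetIds(sampleTweets)); // prints [ 'tweet1', 'tweet2' ]
```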
Create a new directory: ./snapterest/source/js/utils/__tests__/. Jest will run any tests in any __tests__ directories that it finds within your project structure. So it's important to name your directories with tests: __tests__.

Create a TweetUtils-test.js file inside of __tests__:

    jest.dontMock('../TweetUtils');

    describe('Tweet utilities module', function () {
        it('returns an array of tweet ids', function () {
            var TweetUtils = require('../TweetUtils');
            var tweetsMock = { tweet1: {}, tweet2: {}, tweet3: {} };
            var expectedListOfTweetIds = ['tweet1', 'tweet2', 'tweet3'];
            var actualListOfTweetIds = TweetUtils.getListOfTweetIds(tweetsMock);

            expect(actualListOfTweetIds).toEqual(expectedListOfTweetIds);
        });
    });

First we tell Jest not to mock our TweetUtils module:

    jest.dontMock('../TweetUtils');

We do this because Jest will automatically mock modules returned by the require() function. In our test we're requiring the TweetUtils module:

    var TweetUtils = require('../TweetUtils');

Without the jest.dontMock('../TweetUtils') call, Jest would return an imitation of our TweetUtils module, instead of the real one. But in this case we actually need the real TweetUtils module, because that's what we're testing.

Creating test suites

Next we call a global Jest function describe(). In our TweetUtils-test.js file we're not just creating a single test, instead we're creating a suite of tests. A suite is a collection of tests that collectively test a bigger unit of functionality. For example a suite can have multiple tests which test all individual parts of a larger module. In our example, we have a TweetUtils module with a number of utility functions. In that situation we would create a suite for the TweetUtils module and then create tests for each individual utility function, like getListOfTweetIds().

describe defines a suite and takes two parameters:

Suite name: the description of what is being tested: 'Tweet utilities module'.
Suite implementation: the function that implements this suite.
In our example, the suite is:

    describe('Tweet utilities module', function () {
        // Suite implementation goes here...
    });

Defining specs

How do you create an individual test? In Jest, individual tests are called specs. They are defined by calling another global Jest function it(). Just like describe(), it() takes two parameters:

Spec name: the title that describes what is being tested by this spec: 'returns an array of tweet ids'.
Spec implementation: the function that implements this spec.

In our example, the spec is:

    it('returns an array of tweet ids', function () {
        // Spec implementation goes here...
    });

Let's take a closer look at the implementation of our spec:

    var TweetUtils = require('../TweetUtils');
    var tweetsMock = { tweet1: {}, tweet2: {}, tweet3: {} };
    var expectedListOfTweetIds = ['tweet1', 'tweet2', 'tweet3'];
    var actualListOfTweetIds = TweetUtils.getListOfTweetIds(tweetsMock);

    expect(actualListOfTweetIds).toEqual(expectedListOfTweetIds);

This spec tests whether the getListOfTweetIds() method of our TweetUtils module returns an array of tweet IDs when given an object with tweets.

First we import the TweetUtils module:

    var TweetUtils = require('../TweetUtils');

Then we create a mock object that simulates the real tweets object:

    var tweetsMock = { tweet1: {}, tweet2: {}, tweet3: {} };

The only requirement for this mock object is to have tweet IDs as object keys. The values are not important, hence we choose empty objects. Key names are not important either, so we can name them tweet1, tweet2 and tweet3. This mock object doesn't fully simulate the real tweet object. Its sole purpose is to simulate the fact that its keys are tweet IDs.

The next step is to create an expected list of tweet IDs:

    var expectedListOfTweetIds = ['tweet1', 'tweet2', 'tweet3'];

We know what tweet IDs to expect because we've mocked a tweets object with the same IDs.

The next step is to extract the actual tweet IDs from our mocked tweets object.
For that we use getListOfTweetIds(), which takes the tweets object and returns an array of tweet IDs: var actualListOfTweetIds = TweetUtils.getListOfTweetIds(tweetsMock); We pass tweetsMock to that method and store the results in actualListOfTweetIds. This variable is named actualListOfTweetIds because this list of tweet IDs is produced by the actual getListOfTweetIds() function that we're testing. Setting Expectations The final step will introduce us to a new important concept: expect(actualListOfTweetIds).toEqual(expectedListOfTweetIds); Let's think about the process of testing. We need to take an actual value produced by the method that we're testing, getListOfTweetIds(), and match it to the expected value that we know in advance. The result of that match will determine if our test has passed or failed. We can guess in advance what getListOfTweetIds() will return because we've prepared the input for it, which is our mock object: var tweetsMock = { tweet1: {}, tweet2: {}, tweet3: {}}; So we can expect the following output from calling TweetUtils.getListOfTweetIds(tweetsMock): ['tweet1', 'tweet2', 'tweet3'] But because something can go wrong inside of getListOfTweetIds() we cannot guarantee this result; we can only expect it. That's why we need to create an expectation. In Jest, an expectation is built using expect(), which takes an actual value, for example actualListOfTweetIds: expect(actualListOfTweetIds) Then we chain it with a matcher function that compares the actual value with the expected value and tells Jest whether the expectation was met: expect(actualListOfTweetIds).toEqual(expectedListOfTweetIds); In our example we use the toEqual() matcher function to compare two arrays by their contents. See the Jest documentation for a list of all built-in matcher functions. And that's how you create a spec. A spec contains one or more expectations. Each expectation tests the state of your code. A spec can be either a passing spec or a failing spec. 
A spec is a passing spec only when all expectations are met, otherwise it's a failing spec. Well done, you've written your first testing suite with a single spec that has one expectation. Continue reading React.js Essentials to continue your journey into testing.

Sugandha Lahoti

09 Aug 2018

4 min read

Why Golang is the fastest growing language on GitHub

Google’s Go language, also known as Golang, is currently one of the fastest growing programming languages in the software industry. Its speed, simplicity, and reliability make it the perfect choice for all kinds of developers. Now, its popularity has gained further momentum. According to a report, Go is the fastest growing language on GitHub in Q2 of 2018. Go has grown almost 7% overall with a 1.5% change from the previous quarter. Source: Madnight.github.io What makes Golang so popular? A person was quoted on Reddit saying, “What I would have done in Python, Ruby, C, C# or C++, I'm now doing in Go.” Such is the impact of Go. Let’s see what makes Golang so popular. Go is cross-platform, so you can target an operating system of your choice when compiling a piece of code. Go offers a native concurrency model that is unlike most mainstream programming languages. Go relies on a concurrency model called CSP (Communicating Sequential Processes). Instead of locking variables to share memory, Golang allows you to communicate the value stored in your variable from one thread to another. Go has a fairly mature standard library of its own. Once you install Go, you can build production-level software that covers a wide range of use cases, from RESTful web APIs to encryption software, before needing to consider any third-party packages. Go code typically compiles to a single native binary, which makes deploying an application written in Go as easy as copying the application file to the destination server. Go is also rapidly being adopted as the go-to cloud native language and by leading projects like Docker and Ethereum. Its concurrency features and easy deployment make it a popular choice for cloud development. Can Golang replace Python? Reddit is abuzz with people sharing their thoughts about whether Golang would replace Python. A user commented that “Writing a utility script is quicker in Go than in Python or JS. 
Not quicker as in performance, but in terms of raw development speed.” Another Reddit user pointed out three reasons not to use Python in a Reddit discussion, Why are people ditching python for go?: Dynamic compilation of Python can result in errors that exist in code, but they are in fact not detected. CPython really is very slow; very specifically, procedures that are invoked multiple times are not optimized to run more quickly in future runs (like pypy); they always run at the same slow speed. Python has a terrible distribution story; it's really hard to ship all your Python dependencies onto a new system. Go addresses those points pretty sharply. It has a good distribution story with static binaries. It has a repeatable build process, and it's pretty fast. In the same discussion, however, a user nicely sums it up saying, “There is nothing wrong with python except maybe that it is not statically typed and can be a bit slow, which also depends on the use case. Go is the new kid on the block, and while Go is nice, it doesn't have nearly as many libraries as python does. When it comes to stable, mature third-party packages, it can't beat python at the moment.” If you’re still thinking about whether or not to begin coding with Go, here’s a quirky rendition of the popular song Let it Go from Disney’s Frozen to inspire you. Write in Go! Write in Go! Go Cloud is Google’s bid to establish Golang as the go-to language of cloud Writing test functions in Golang [Tutorial] How Concurrency and Parallelism works in Golang [Tutorial]

Gebin George

27 Jun 2018

19 min read

Build an IoT application with Google Cloud [Tutorial]

Amey Varangaonkar

24 Jul 2018

9 min read

Top 5 Deep Learning Architectures

If you are a deep learning practitioner or someone who wants to get into the world of deep learning, you might be well acquainted with neural networks already. Neural networks, inspired by biological neural networks, are pretty useful when it comes to solving complex, multi-layered computational problems. Deep learning has stood out pretty well in several high-profile research fields - including facial and speech recognition, natural language processing, machine translation, and more. In this article, we look at the top 5 popular and widely-used deep learning architectures you should know in order to advance your knowledge or deep learning research. Convolutional Neural Networks Convolutional Neural Networks, or CNNs in short, are the popular choice of neural networks for different Computer Vision tasks such as image recognition. The name ‘convolution’ is derived from a mathematical operation involving the convolution of different functions. There are 4 primary steps or stages in designing a CNN: Convolution: The input signal is received at this stage Subsampling: Inputs received from the convolution layer are smoothened to reduce the sensitivity of the filters to noise or any other variation Activation: This layer controls how the signal flows from one layer to the other, similar to the neurons in our brain Fully connected: In this stage, all the layers of the network are connected with every neuron from a preceding layer to the neurons from the subsequent layer Here is an in-depth look at the CNN Architecture and its working, as explained by the popular AI Researcher Giancarlo Zaccone. 
A sample CNN in action Advantages of CNN Very good for visual recognition Once a segment within a particular sector of an image is learned, the CNN can recognize that segment present anywhere else in the image Disadvantages of CNN CNN is highly dependent on the size and quality of the training data Highly susceptible to noise Recurrent Neural Networks Recurrent Neural Networks (RNNs) have been very popular in areas where the sequence in which the information is presented is crucial. As a result, they find a lot of applications in real-world domains such as natural language processing, speech synthesis and machine translation. RNNs are called ‘recurrent’ mainly because a uniform task is performed for every single element of a sequence, with the output dependent on the previous computations as well. Think of these networks as having a memory, where every piece of calculated information is captured, stored and utilized to calculate the final outcome. Over the years, quite a few varieties of RNNs have been researched and developed: Bidirectional RNN - The output in this type of RNN depends not only on the past but also the future outcomes Deep RNN - In this type of RNN, there are multiple layers present per step, allowing for a greater rate of learning and more accuracy RNNs can be used to build industry-standard chatbots that can be used to interact with customers on websites. Given a sequence of signals from an audio wave, RNNs can also be used to predict a correct sequence of phonetic segments with a given probability. Advantages of RNN Unlike a traditional neural network, an RNN shares the same parameters across all steps. This greatly reduces the number of parameters that we need to learn RNNs can be used along with CNNs to generate accurate descriptions for unlabeled images. Disadvantages of RNN RNNs find it difficult to track long-term dependencies. This is especially true in the case of long sentences and paragraphs that have too many words between the noun and the verb. 
RNNs cannot be stacked into very deep models. This is due to the activation function used in RNN models, which makes the gradient decay over multiple layers. Autoencoders Autoencoders apply the principle of backpropagation in an unsupervised environment. Autoencoders, interestingly, have a close resemblance to PCA (Principal Component Analysis) except that they are more flexible. One of the popular applications of autoencoders is anomaly detection, for example detecting fraud in financial transactions in banks. Basically, the core task of autoencoders is to identify and determine what constitutes regular, normal data and then identify the outliers or anomalies. Autoencoders usually represent data through multiple hidden layers such that the output signal is as close as possible to the input signal. There are 4 major types of autoencoders being used today: Vanilla autoencoder - the simplest form of autoencoder there is, i.e. a neural net with one hidden layer Multilayer autoencoder - when one hidden layer is not enough, an autoencoder can be extended to include more hidden layers Convolutional autoencoder - In this type, convolutions are used in the autoencoders instead of fully-connected layers Regularized autoencoder - this type of autoencoder uses a special loss function that enables the model to have properties beyond the basic ability to copy a given input to the output. This article demonstrates training an autoencoder using H2O, a popular machine learning and AI platform. 
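To make the "vanilla" case concrete, here is a minimal sketch of a linear autoencoder with a single hidden unit, trained by plain gradient descent to reproduce its own input. It is an illustration only: the toy dataset (points on the line y = 2x), the function names, the initial weights, and the learning rate are all assumptions made for this example and do not come from the article or from any particular framework.

```python
# Vanilla (single hidden layer) linear autoencoder: 2 inputs -> 1 hidden -> 2 outputs.
# Data, names, and hyperparameters are illustrative assumptions.

def reconstruct(sample, enc, dec):
    """Encode a 2-D sample into one hidden value, then decode it back to 2-D."""
    hidden = enc[0] * sample[0] + enc[1] * sample[1]
    return hidden, (dec[0] * hidden, dec[1] * hidden)


def mean_loss(data, enc, dec):
    """Mean squared reconstruction error over the whole dataset."""
    total = 0.0
    for sample in data:
        _, out = reconstruct(sample, enc, dec)
        total += (out[0] - sample[0]) ** 2 + (out[1] - sample[1]) ** 2
    return total / len(data)


def train(data, epochs=500, lr=0.05):
    """Full-batch gradient descent on the reconstruction error (backpropagation by hand)."""
    enc = [0.1, 0.1]
    dec = [0.1, 0.1]
    n = len(data)
    for _ in range(epochs):
        g_enc = [0.0, 0.0]
        g_dec = [0.0, 0.0]
        for sample in data:
            hidden, out = reconstruct(sample, enc, dec)
            # Derivative of the squared error with respect to each output.
            err = (2 * (out[0] - sample[0]), 2 * (out[1] - sample[1]))
            g_dec[0] += err[0] * hidden
            g_dec[1] += err[1] * hidden
            # Propagate the error back through the decoder to the hidden unit.
            g_hidden = err[0] * dec[0] + err[1] * dec[1]
            g_enc[0] += g_hidden * sample[0]
            g_enc[1] += g_hidden * sample[1]
        for i in range(2):
            enc[i] -= lr * g_enc[i] / n
            dec[i] -= lr * g_dec[i] / n
    return enc, dec


# Toy data lying on a line, so one hidden unit is enough to represent it.
data = [(x, 2 * x) for x in [-1 + 0.25 * i for i in range(9)]]
initial_loss = mean_loss(data, [0.1, 0.1], [0.1, 0.1])
enc, dec = train(data)
final_loss = mean_loss(data, enc, dec)
print("loss before training:", initial_loss)
print("loss after training:", final_loss)
```

Because the network has no labels and is scored only on how well its output matches its input, this is unsupervised learning driven by backpropagation, as described above. A sample that does not lie near the learned line would reconstruct poorly, which is the idea behind using reconstruction error for anomaly detection.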
A basic representation of Autoencoder Advantages of Autoencoders Autoencoders give a resultant model which is primarily based on the data rather than predefined filters Low complexity means they are easier to train Disadvantages of Autoencoders Training time can be very high sometimes If the training data is not representative of the testing data, then the information that comes out of the model can be obscured and unclear Some autoencoders, especially of the variational type, introduce a deterministic bias into the model Generative Adversarial Networks The basic premise of Generative Adversarial Networks (GANs) is the training of two deep learning models simultaneously. These deep learning networks basically compete with each other: the model that tries to generate new instances or examples is called the generator. The other model, which tries to classify whether a particular instance originates from the training data or from the generator, is called the discriminator. GANs, a recent breakthrough in the field of deep learning, were put forth by the popular deep learning expert Ian Goodfellow in 2014. They find large and important applications in Computer Vision, especially image generation. Read more about the structure and the functionality of the GAN from the official paper submitted by Ian Goodfellow. General architecture of GAN (Source: deeplearning4j) Advantages of GAN Per Goodfellow, GANs allow for efficient training of classifiers in a semi-supervised manner Because of the improved accuracy of the model, the generated data is almost indistinguishable from the original data GANs do not introduce any deterministic bias unlike variational autoencoders Disadvantages of GAN Generator and discriminator working efficiently is crucial to the success of GAN. The whole system fails even if one of them fails Both the generator and discriminator are separate systems and trained with different loss functions. 
Hence the time required to train the entire system can get quite high. Interested in knowing more about GANs? Here’s what you need to know about them. ResNets Ever since they gained popularity in 2015, ResNets or Deep Residual Networks have been widely adopted and used by many data scientists and AI researchers. As you already know, CNNs are highly useful when it comes to solving image classification and visual recognition problems. As these tasks become more complex, training of the neural network starts to get a lot more difficult, as additional deep layers are required to compute and enhance the accuracy of the model. Residual learning is a concept designed to tackle this very problem, and the resultant architecture is popularly known as a ResNet. A ResNet consists of a number of residual modules, where each module represents a layer. Each layer consists of a set of functions to be performed on the input. The depth of a ResNet can vary greatly: the one developed by Microsoft researchers for an image classification problem had 152 layers! A basic building block of ResNet (Source: Quora) Advantages of ResNets ResNets are more accurate and require fewer weights than LSTMs and RNNs in some cases They are highly modular. Hundreds and thousands of residual layers can be added to create a network and then trained. ResNets can be designed to determine how deep a particular network needs to be. Disadvantages of ResNets If the layers in a ResNet are too deep, errors can be hard to detect and cannot be propagated back quickly and correctly. At the same time, if the layers are too narrow, the learning might not be very efficient. Apart from the ones above, a few more deep learning models are being increasingly adopted and preferred by data scientists. These definitely deserve an honorable mention: LSTM: LSTMs are a special kind of Recurrent Neural Network that includes a special memory cell that can hold information for long periods of time. 
A set of gates is used to determine when a particular piece of information enters the memory and when it is forgotten. SqueezeNet: One of the newer but very powerful deep learning architectures, which is extremely efficient for low bandwidth platforms such as mobile. CapsNet: CapsNet, or Capsule Networks, is a recent breakthrough in the field of Deep Learning and neural network modeling. It is mainly used for accurate image recognition tasks and is an advanced variation of CNNs. SegNet: A popular deep learning architecture especially used to solve the image segmentation problem. Seq2Seq: An upcoming deep learning architecture being increasingly used for machine translation and building efficient chatbots So there you have it! Thanks to the intense efforts in research in deep learning and AI, we now have a variety of deep learning models at our disposal to solve a variety of problems, both functional and computational. What’s even better is that we have the liberty to choose the most appropriate deep learning architecture based on the problem at hand. Editor’s Tip: It is very important to know the best deep learning frameworks you can use to train your models. Here are the top 10 deep learning frameworks for you. In contrast to the traditional programming approach where we tell the computer what to do, the deep learning models figure out the problem and devise the most appropriate solution on their own, however complex the problem may be. No wonder these deep learning architectures are being researched and deployed on a large scale by major market players such as Google, Facebook, Microsoft and many others. Packt Explains… Deep Learning in 90 seconds Behind the scenes: Deep learning evolution and core concepts Facelifting NLP with Deep Learning

Oli Huggins

16 Feb 2014

10 min read

Sizing and Configuring your Hadoop Cluster

This article, written by Khaled Tannir, the author of Optimizing Hadoop for MapReduce, discusses two of the most important aspects to consider while optimizing Hadoop for MapReduce: sizing and configuring the Hadoop cluster correctly. Sizing your Hadoop cluster Hadoop's performance depends on multiple factors based on well-configured software layers and well-dimensioned hardware resources that utilize its CPU, memory, hard drive (storage I/O) and network bandwidth efficiently. Planning the Hadoop cluster remains a complex task that requires a minimum knowledge of the Hadoop architecture and may be out of the scope of this book. This is what we are trying to make clearer in this section by providing explanations and formulas to help you best estimate your needs. We will introduce a basic guideline that will help you make your decisions while sizing your cluster and answer some "How to plan" questions about the cluster's needs, such as the following: How to plan my storage? How to plan my CPU? How to plan my memory? How to plan the network bandwidth? While sizing your Hadoop cluster, you should also consider the data volume that the final users will process on the cluster. The answer to this question will lead you to determine how many machines (nodes) you need in your cluster to process the input data efficiently and determine the disk/memory capacity of each one. Hadoop has a Master/Slave architecture and needs a lot of memory and CPU. It has two main components: JobTracker: This is the critical component in this architecture and monitors jobs that are running on the cluster TaskTracker: This runs tasks on each node of the cluster To work efficiently, HDFS must have high throughput hard drives with an underlying filesystem that supports the HDFS read and write pattern (large block). This pattern defines one big read (or write) at a time with a block size of 64 MB, 128 MB, up to 256 MB. 
Also, the network layer should be fast enough to cope with the transfer of intermediate data and blocks. HDFS is itself based on a Master/Slave architecture with two main components: the NameNode / Secondary NameNode and DataNode components. These are critical components and need a lot of memory to store the file's meta information, such as attributes, file location, directory structure, and names, and to process data. The NameNode component ensures that data blocks are properly replicated in the cluster. The second component, the DataNode component, manages the state of an HDFS node and interacts with its data blocks. It requires a lot of I/O for processing and data transfer. Typically, the MapReduce layer has two main prerequisites. First, input datasets must be large enough to fill a data block and be split into smaller and independent data chunks (for example, a 10 GB text file can be split into 40 blocks of 256 MB each, and each line of text in any data block can be processed independently). The second prerequisite is that it should consider data locality, which means that the MapReduce code is moved to where the data lies, not the opposite (it is more efficient to move a few megabytes of code close to the data to be processed than to move many data blocks over the network or the disk). This involves having a distributed storage system that exposes data locality and allows the execution of code on any storage node. Network bandwidth is used in two instances: during the replication process following a file write, and during the balancing of the replication factor when a node fails. The most common practice is to size a Hadoop cluster based on the amount of storage required. The more data in the system, the more machines are required. Each time you add a new node to the cluster, you get more computing resources in addition to the new storage capacity. 
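Storage-driven sizing comes down to dividing the volume to be stored by the usable capacity of one node. The following is a minimal sketch of that arithmetic, using the figures from the example growth plan that follows (100 GB of daily input, replication factor 3, 5% monthly growth, 4 TB disks with 30% reserved for non-HDFS use and 25% for intermediate MapReduce data). The function names are hypothetical, and the way growth is compounded month over month is an assumption chosen to reproduce the plan's figures.

```python
import math


def node_capacity_gb(disk_gb, reserved_pct, intermediate_pct):
    """Usable HDFS space on one node after the two reservations (integer percent math)."""
    return disk_gb * (100 - reserved_pct - intermediate_pct) // 100


def nodes_needed(volume_gb, capacity_gb):
    """Round up: a partially filled node is still a node."""
    return math.ceil(volume_gb / capacity_gb)


daily_input_gb = 100
replication_factor = 3
monthly_growth = 0.05

# Each stored byte is written replication_factor times.
daily_stored_gb = daily_input_gb * replication_factor
# 30 days of stored input plus 5% growth.
first_month_gb = daily_stored_gb * 30 * (1 + monthly_growth)
# Assumption: the monthly volume keeps compounding at 5% for 12 months.
twelfth_month_gb = first_month_gb * (1 + monthly_growth) ** 12
whole_year_gb = sum(first_month_gb * (1 + monthly_growth) ** k for k in range(1, 13))

capacity_gb = node_capacity_gb(4000, reserved_pct=30, intermediate_pct=25)

first_month_nodes = nodes_needed(first_month_gb, capacity_gb)
twelfth_month_nodes = nodes_needed(twelfth_month_gb, capacity_gb)
whole_year_nodes = nodes_needed(whole_year_gb, capacity_gb)

print("usable capacity per node (GB):", capacity_gb)
print("nodes for the first month:", first_month_nodes)
print("nodes for the 12th month:", twelfth_month_nodes)
print("nodes for the whole year:", whole_year_nodes)
```

Rounding up rather than to the nearest integer is deliberate: running short of storage is worse than having a partly empty node.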
Let's consider an example cluster growth plan based on storage and learn how to determine the storage needed, the amount of memory, and the number of DataNodes in the cluster.

Daily data input: 100 GB
HDFS replication factor: 3
Monthly growth: 5%
Intermediate MapReduce data: 25%
Non-HDFS reserved space per disk: 30%
Size of a hard drive disk: 4 TB

Storage space used by daily data input = daily data input * replication factor = 300 GB
Monthly volume = (300 * 30) + 5% = 9450 GB
After one year = 9450 * (1 + 0.05)^12 = 16971 GB
Dedicated space = HDD size * (1 - (Non-HDFS reserved space per disk / 100 + Intermediate MapReduce data / 100)) = 4 * (1 - (0.25 + 0.30)) = 1.8 TB (which is the node capacity)

Number of DataNodes needed to process:
Whole first month data = 9450 / 1800 ~= 6 nodes
The 12th month data = 16971 / 1800 ~= 10 nodes
Whole year data = 157938 / 1800 ~= 88 nodes

Do not use RAID array disks on a DataNode. HDFS provides its own replication mechanism. It is also important to note that for every disk, 30 percent of its capacity should be reserved for non-HDFS use.

It is easy to determine the memory needed for both NameNode and Secondary NameNode. The memory needed by NameNode to manage the HDFS cluster metadata in memory and the memory needed for the OS must be added together. Typically, the memory needed by Secondary NameNode should be identical to NameNode. Then you can apply the following formulas to determine the memory amount:

NameNode memory: 2 GB - 4 GB
Secondary NameNode memory: 2 GB - 4 GB
OS memory: 4 GB - 8 GB
HDFS memory: 2 GB - 8 GB

Memory amount = HDFS cluster management memory + NameNode memory + OS memory
At least NameNode (Secondary NameNode) memory = 2 + 2 + 4 = 8 GB

It is also easy to determine the DataNode memory amount. But this time, the memory amount depends on the number of physical CPU cores installed on each DataNode.
DataNode process memory: 4 GB - 8 GB
DataNode TaskTracker memory: 4 GB - 8 GB
OS memory: 4 GB - 8 GB
CPU's core number: 4+
Memory per CPU core: 4 GB - 8 GB

Memory amount = Memory per CPU core * number of CPU cores + DataNode process memory + DataNode TaskTracker memory + OS memory
At least DataNode memory = 4*4 + 4 + 4 + 4 = 28 GB

Regarding how to determine the CPU and the network bandwidth, we suggest using modern multicore CPUs with at least four physical cores per CPU. The more physical CPU cores you have, the more you will be able to enhance your job's performance (according to all rules discussed to avoid underutilization or overutilization). For the network switches, we recommend using equipment with a high throughput, such as 10 Gb Ethernet intra rack with N x 10 Gb Ethernet inter rack.

Configuring your cluster correctly

To run Hadoop and get maximum performance, it needs to be configured correctly. But the question is how to do that. Well, based on our experience, we can say that there is not one single answer to this question. Experience gave us a clear indication that the Hadoop framework should be adapted to the cluster it is running on and sometimes also to the job. In order to configure your cluster correctly, we recommend running a Hadoop job (or jobs) the first time with the default configuration to get a baseline. Then, you check for resource weaknesses (if any exist) by analyzing the job history logfiles and report the results (the measured time it took to run the jobs). After that, you iteratively tune your Hadoop configuration and re-run the job until you get the configuration that fits your business needs.

The number of mapper and reducer tasks that a job should use is important. Picking the right number of tasks for a job can have a huge impact on Hadoop's performance. The number of reducer tasks should be less than the number of mapper tasks. Google reports one reducer for 20 mappers; the others give different guidelines.
This is because mapper tasks often process a lot of data, and the result of those tasks are passed to the reducer tasks. Often, a reducer task is just an aggregate function that processes a minor portion of the data compared to the mapper tasks. Also, the correct number of reducers must also be considered. The number of mappers and reducers is related to the number of physical cores on the DataNode, which determines the maximum number of jobs that can run in parallel on DataNode. In a Hadoop cluster, master nodes typically consist of machines where one machine is designed as a NameNode, and another as a JobTracker, while all other machines in the cluster are slave nodes that act as DataNodes and TaskTrackers. When starting the cluster, you begin starting the HDFS daemons on the master node and DataNode daemons on all data nodes machines. Then, you start the MapReduce daemons: JobTracker on the master node and the TaskTracker daemons on all slave nodes. The following diagram shows the Hadoop daemon's pseudo formula: When configuring your cluster, you need to consider the CPU cores and memory resources that need to be allocated to these daemons. In a huge data context, it is recommended to reserve 2 CPU cores on each DataNode for the HDFS and MapReduce daemons. While in a small and medium data context, you can reserve only one CPU core on each DataNode. Once you have determined the maximum mapper's slot numbers, you need to determine the reducer's maximum slot numbers. Based on our experience, there is a distribution between the Map and Reduce tasks on DataNodes that give good performance result to define the reducer's slot numbers the same as the mapper's slot numbers or at least equal to two-third mapper slots. 
Let's learn to correctly configure the number of mappers and reducers and assume the following cluster examples:

Cluster machine             | Nb | Medium data size   | Large data size
DataNode CPU cores          | 8  | Reserve 1 CPU core | Reserve 2 CPU cores
DataNode TaskTracker daemon | 1  | 1                  | 1
DataNode HDFS daemon        | 1  | 1                  | 1
Data block size             |    | 128 MB             | 256 MB
DataNode CPU % utilization  |    | 95% to 120%        | 95% to 150%
Cluster nodes               |    | 20                 | 40
Replication factor          |    | 2                  | 3

We want to use at least 95 percent of the CPU resources, and due to Hyper-Threading, one CPU core might process more than one job at a time, so we can set the Hyper-Threading factor range between 120 percent and 170 percent.

Maximum mapper's slot numbers on one node in a large data context = (number of physical cores - reserved cores) * (0.95 -> 1.5)
Reserved cores = 1 for TaskTracker + 1 for HDFS
Let's say the CPU on the node will use up to 120% (with Hyper-Threading)
Maximum number of mapper slots = (8 - 2) * 1.2 = 7.2, rounded down to 7
Let's apply the 2/3 mappers/reducers technique:
Maximum number of reducer slots = 7 * 2/3 ~= 5
Let's define the number of slots for the cluster:
Mapper's slots = 7 * 40 = 280
Reducer's slots = 5 * 40 = 200

The block size is also used to enhance performance. The default Hadoop configuration uses 64 MB blocks, while we suggest using 128 MB in your configuration for a medium data context and 256 MB for a very large data context. This means that a mapper task can process one data block (for example, 128 MB) by only opening one block. With the default 64 MB block size, two mapper tasks are needed to process the same amount of data. This may be considered a drawback because initializing one more mapper task and opening one more file takes more time.

Summary

In this article, we learned about sizing and configuring the Hadoop cluster to optimize it for MapReduce.
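The slot arithmetic from the large data example in this article can be sketched in a few lines. This is a minimal illustration under the article's own numbers (8 physical cores, 2 reserved cores, a 1.2 utilization factor, 40 nodes); the function names are hypothetical, and rounding the reducer count to the nearest integer is an assumption made to match the figure of 5 used in the example.

```python
import math


def mapper_slots(physical_cores, reserved_cores, utilization):
    """Cores left after reserving some for daemons, scaled by the utilization factor."""
    return math.floor((physical_cores - reserved_cores) * utilization)


def reducer_slots(mappers, ratio=2 / 3):
    """Two-thirds rule; rounded to the nearest integer (an assumption)."""
    return round(mappers * ratio)


cluster_nodes = 40

mappers_per_node = mapper_slots(physical_cores=8, reserved_cores=2, utilization=1.2)
reducers_per_node = reducer_slots(mappers_per_node)

cluster_mapper_slots = mappers_per_node * cluster_nodes
cluster_reducer_slots = reducers_per_node * cluster_nodes

print("mapper slots per node:", mappers_per_node)
print("reducer slots per node:", reducers_per_node)
print("cluster mapper slots:", cluster_mapper_slots)
print("cluster reducer slots:", cluster_reducer_slots)
```

Changing the reserved cores to 1 and the node count to 20 gives the corresponding figures for the medium data column of the table.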
Resources for Article: Further resources on this subject: Hadoop Tech Page Hadoop and HDInsight in a Heartbeat Securing the Hadoop Ecosystem Advanced Hadoop MapReduce Administration

Richard Gall

24 Oct 2019

11 min read

3 programming languages some people think are dead but definitely aren’t

Packt

25 Apr 2011

8 min read

How to Configure Squid Proxy Server

Melisha Dsouza

20 Mar 2019

14 min read

How to remotely monitor hosts over Telnet and SSH [Tutorial]

In this tutorial, you will learn how to carry out basic configurations on a server with Telnet and SSH configured. We will begin by using the Telnet module, after which we will implement the same configurations using the preferred method: SSH using different modules in Python. You will also learn how the telnetlib, subprocess, fabric, Netmiko, and paramiko modules work. This tutorial is an excerpt from a book written by Ganesh Sanjiv Naik titled Mastering Python Scripting for System Administrators. This book will take you through a set of specific software patterns and you will learn, in detail, how to apply these patterns and build working software on top of a serverless system.

The telnetlib module

In this section, we are going to learn about the Telnet protocol and then we will do Telnet operations using the telnetlib module over a remote server. Telnet is a network protocol that allows a user to communicate with remote servers. It is mostly used by network administrators to remotely access and manage devices. To access the device, run the Telnet command with the IP address or hostname of a remote server in your Terminal. Telnet uses TCP on the default port number 23.

To use Telnet, make sure it is installed on your system. If not, run the following command to install it:

$ sudo apt-get install telnetd

To run Telnet using the simple Terminal, you just have to enter the following command:

$ telnet ip_address_of_your_remote_server

Python has the telnetlib module to perform Telnet functions through Python scripts. Note that telnetlib was deprecated in Python 3.11 and removed from the standard library in Python 3.13, so the script in this section needs an earlier Python 3 release.
Before telnetting to your remote device or router, make sure it is configured properly. If not, you can do a basic configuration by using the following commands in the router's Terminal:

    configure terminal
    enable password 'set_Your_password_to_access_router'
    username 'set_username' password 'set_password_for_remote_access'
    line vty 0 4
    login local
    transport input all
    interface f0/0
    ip add 'set_ip_address_to_the_router' 'put_subnet_mask'
    no shut
    end
    show ip interface brief

Now, let's see an example of Telnetting to a remote device. For that, create a telnet_example.py script and write the following content in it:

    import getpass
    import telnetlib

    HOST_IP = "your host ip address"
    host_user = input("Enter your telnet username: ")
    password = getpass.getpass()

    # Open the connection and log in
    t = telnetlib.Telnet(HOST_IP)
    t.read_until(b"Username:")
    t.write(host_user.encode("ascii") + b"\n")
    if password:
        t.read_until(b"Password:")
        t.write(password.encode("ascii") + b"\n")

    # Enter privileged mode and configure two loopback interfaces
    t.write(b"enable\n")
    t.write(b"enter_remote_device_password\n")  # password of your remote device
    t.write(b"conf t\n")
    t.write(b"int loop 1\n")
    t.write(b"ip add 10.1.1.1 255.255.255.255\n")
    t.write(b"int loop 2\n")
    t.write(b"ip add 20.2.2.2 255.255.255.255\n")
    t.write(b"end\n")
    t.write(b"exit\n")

    # Print everything the device sent back until it closed the connection
    print(t.read_all().decode("ascii"))

Run the script and you will get the output as follows:

    student@ubuntu:~$ python3 telnet_example.py
    Output:
    Enter your telnet username: student
    Password:
    server>enable
    Password:
    server#conf t
    Enter configuration commands, one per line. End with CNTL/Z.
    server(config)#int loop 1
    server(config-if)#ip add 10.1.1.1 255.255.255.255
    server(config-if)#int loop 2
    server(config-if)#ip add 20.2.2.2 255.255.255.255
    server(config-if)#end
    server#exit

In the preceding example, we accessed and configured a Cisco router using the telnetlib module. In this script, first, we took the username and password from the user to initialize the Telnet connection with a remote device.
When the connection was established, we did further configuration on the remote device. After telnetting, we are able to access a remote server or device. But the Telnet protocol has one very important disadvantage: all the data, including usernames and passwords, is sent over the network in plain text, which is a security risk. Because of that, Telnet is rarely used nowadays and has been replaced by a much more secure protocol named Secure Shell, known as SSH.

Install SSH by running the following command in your Terminal:

    $ sudo apt install ssh

Also, an SSH server must be installed and running on the remote server that the user wants to communicate with. SSH uses the TCP protocol and works on port number 22 by default. You can run the ssh command through the Terminal as follows, where host_name is the login name on the remote machine:

    $ ssh host_name@host_ip_address

Now, you will learn to do SSH by using different modules in Python, such as subprocess, fabric, Netmiko, and Paramiko. We will look at those modules one by one.

The subprocess.Popen class

The Popen class handles process creation and management in the subprocess module. Using this class, developers can handle the less common cases that the module's convenience functions do not cover. The child program is executed in a new process. To execute a child program on Unix/Linux, the class uses os.execvp()-like behavior. To execute a child program on Windows, the class uses the CreateProcess() function.

Now, let's see some useful arguments of subprocess.Popen(), shown here with their Python 3 defaults (the full signature has further arguments):

    class subprocess.Popen(args, bufsize=-1, executable=None, stdin=None,
                           stdout=None, stderr=None, preexec_fn=None,
                           close_fds=True, shell=False, cwd=None, env=None,
                           universal_newlines=None, startupinfo=None,
                           creationflags=0, ...)

Let's look at each argument:

args: It can be a sequence of program arguments or a single string. If args is a sequence, the first item in args is the program to execute. Passing args as a sequence is generally recommended.

shell: The shell argument is set to False by default and specifies whether to use the shell to execute the program. If shell is True, it is recommended to pass args as a string rather than a sequence; the string then specifies the command to execute through the shell. In Linux, if shell=True, the shell defaults to /bin/sh.

bufsize: If bufsize is 0, it means unbuffered, and if bufsize is 1, it means line buffered (in text mode). If bufsize is any other positive value, a buffer of approximately that size is used. If bufsize is negative (the default is -1), the system default buffer size is used.

executable: It specifies a replacement program to execute. It is very seldom needed.

stdin, stdout, and stderr: These arguments define the standard input, standard output, and standard error of the child program, respectively.

preexec_fn: This is set to a callable object that will be called in the child process just before the child is executed (Unix/Linux only).

close_fds: In Linux, if close_fds is true (the default), all file descriptors except 0, 1, and 2 will be closed before the child process is executed. In Windows, if close_fds is true, then the child process will inherit no handles.

env: If the value is not None, it must be a mapping that defines the environment variables for the new process.

universal_newlines: If the value is True, then stdin, stdout, and stderr will be opened as text streams in universal newlines mode.

Now, we are going to see an example of subprocess.Popen().
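The arguments described above can first be tried locally, without any remote host. The sketch below is an illustration under stated assumptions: the helper name run_child is hypothetical, and the child program is simply the current Python interpreter (sys.executable) printing one line.

```python
import subprocess
import sys


def run_child(args):
    """Run args (a sequence) without a shell; return (returncode, stdout, stderr)."""
    proc = subprocess.Popen(
        args,                     # sequence: first item is the program to run
        shell=False,              # do not go through /bin/sh
        stdout=subprocess.PIPE,   # capture standard output
        stderr=subprocess.PIPE,   # capture standard error
        universal_newlines=True,  # return str instead of bytes
    )
    # communicate() reads both pipes to the end and waits for the child to exit
    out, err = proc.communicate()
    return proc.returncode, out, err


if __name__ == "__main__":
    code, out, err = run_child([sys.executable, "-c", "print('hello from child')"])
    print(code, out.strip())
```

Using communicate() rather than reading stdout and stderr one after the other avoids a deadlock when the child fills one pipe while the parent is blocked on the other. The SSH example that follows passes the same stdout and stderr arguments, but runs the ssh client against a remote host.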
For the SSH example, create an ssh_using_sub.py script and write the following content in it:

    import subprocess
    import sys

    HOST = "your host username@host ip"
    COMMAND = "ls"

    # Run the ssh client as a child process and capture its output
    ssh_obj = subprocess.Popen(["ssh", "%s" % HOST, COMMAND],
                               shell=False,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    result = ssh_obj.stdout.readlines()
    if result == []:
        # Nothing on stdout: report whatever ssh wrote to stderr
        err = ssh_obj.stderr.readlines()
        print("ERROR: %s" % err, file=sys.stderr)
    else:
        print(result)

Run the script and you will get the output as follows:

    student@ubuntu:~$ python3 ssh_using_sub.py

Output:

    [email protected]'s password:
    [b'Desktop\n', b'Documents\n', b'Downloads\n', b'examples.desktop\n', b'Music\n', b'Pictures\n', b'Public\n', b'sample.py\n', b'spark\n', b'spark-2.3.1-bin-hadoop2.7\n', b'spark-2.3.1-bin-hadoop2.7.tgz\n', b'ssh\n', b'Templates\n', b'test_folder\n', b'test.txt\n', b'Untitled1.ipynb\n', b'Untitled.ipynb\n', b'Videos\n', b'work\n']

In the preceding example, first, we imported the subprocess module, then we defined the host address with which we want to establish the SSH connection. After that, we defined one simple command to execute on the remote device. We then passed this information to subprocess.Popen(), which started the ssh client with those arguments and created a connection with the remote device. After the SSH connection was established, our command was executed and returned its result. Finally, we printed the result on the Terminal, as shown in the output.

SSH using fabric module

Fabric is a Python library as well as a command-line tool for working over SSH. It is used for system administration and application deployment over the network. We can also execute shell commands over SSH with it.

To use the fabric module, first you have to install it using the following command. The fabric3 package is a Python 3 port of Fabric 1.x and provides the fabric.api interface used in this section:

    $ pip3 install fabric3

Now, we will see an example.
Create a fabfile.py script and write the following content in it:

    from fabric.api import *

    env.hosts = ["host_name@host_ip"]
    env.password = 'your password'

    def dir():
        run('mkdir fabric')
        print('Directory named fabric has been created on your host network')

    def diskspace():
        run('df')

Run the script and you will get the output as follows:

    student@ubuntu:~$ fab dir

Output:

    [[email protected]] Executing task 'dir'
    [[email protected]] run: mkdir fabric
    Done.
    Disconnecting from 192.168.0.106... done.

In the preceding example, first, we imported the fabric.api module, then we set the hostname and password used to connect to the remote host. After that, we defined the tasks to perform over SSH. To run the program, we did not call python3 fabfile.py; instead, we used the fab utility (fab dir), which looks up the named task in fabfile.py and executes it. In our case, we performed the dir task, which creates a directory named 'fabric' on the remote host. You can add your own tasks to the Python file and execute them with the fab utility in the same way.

SSH using the Paramiko library

Paramiko is a library that implements the SSHv2 protocol for secure connections to remote devices. Paramiko is a pure Python interface around SSH.

Before using Paramiko, make sure you have installed it properly on your system. If it is not installed, you can install it by running the following command in your Terminal:

    $ sudo pip3 install paramiko

Now, we will see an example of using Paramiko. For this Paramiko connection, we are using a Cisco device. Paramiko supports both password-based and key-pair-based authentication for a secure connection with the server. In our script, we are using password-based authentication, which means that a password is supplied and authentication is attempted using plain username/password authentication.
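For comparison, the two authentication styles differ only in what is passed to SSHClient.connect(). The sketch below is an illustration, not the book's method: the helpers build_connect_kwargs and open_client are hypothetical names, and the key path and address in the usage note are placeholders. The keyword names (hostname, port, username, password, key_filename, look_for_keys, allow_agent) are parameters of Paramiko's connect(). Paramiko is imported inside open_client so that the argument-building part can be tried without a device or the library installed.

```python
def build_connect_kwargs(host, user, password=None, key_path=None, port=22):
    """Return keyword arguments for paramiko.SSHClient.connect().

    Exactly one of password or key_path must be given.
    """
    if (password is None) == (key_path is None):
        raise ValueError("give exactly one of password or key_path")
    kwargs = {"hostname": host, "port": port, "username": user}
    if key_path is not None:
        # Key-pair authentication: point Paramiko at a private key file
        kwargs["key_filename"] = key_path
    else:
        # Password authentication: skip key files and the SSH agent
        kwargs["password"] = password
        kwargs["look_for_keys"] = False
        kwargs["allow_agent"] = False
    return kwargs


def open_client(connect_kwargs):
    """Open an SSH connection; unknown hosts are rejected (Paramiko's default)."""
    import paramiko  # third-party: pip3 install paramiko

    client = paramiko.SSHClient()
    client.load_system_host_keys()  # trust hosts already in known_hosts
    client.connect(**connect_kwargs)
    return client
```

A call such as open_client(build_connect_kwargs("192.0.2.10", "admin", key_path="/home/admin/.ssh/id_rsa")) would then attempt key-pair authentication, where the address, username, and path are placeholders for your own values.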
Before connecting over SSH to your remote device or multi-layer router, make sure it is configured properly. If it is not, you can do a basic configuration by entering the following commands in the multi-layer router's Terminal:

    configure t
    ip domain-name cciepython.com
    crypto key generate rsa
    How many bits in the modulus [512]: 1024
    interface range f0/0 - 1
    switchport mode access
    switchport access vlan 1
    no shut
    int vlan 1
    ip add 'set_ip_address_to_the_router' 'put_subnet_mask'
    no shut
    exit
    enable password 'set_Your_password_to_access_router'
    username 'set_username' password 'set_password_for_remote_access'
    username 'username' privilege 15
    line vty 0 4
    login local
    transport input all
    end

Now, create a pmiko.py script and write the following content in it:

    import paramiko
    import time

    ip_address = "host_ip_address"
    usr = "host_username"
    pwd = "host_password"

    c = paramiko.SSHClient()
    # Accept unknown host keys automatically (for testing only)
    c.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    c.connect(hostname=ip_address, username=usr, password=pwd)
    print("SSH connection is successfully established with ", ip_address)

    # Open an interactive shell and create VLANs 2 to 5
    rc = c.invoke_shell()
    for n in range(2, 6):
        print("Creating VLAN " + str(n))
        rc.send("vlan database\n")
        rc.send("vlan " + str(n) + "\n")
        rc.send("exit\n")
        time.sleep(0.5)

    time.sleep(1)
    output = rc.recv(65535)
    print(output)
    c.close()

Run the script and you will get the output as follows:

    student@ubuntu:~$ python3 pmiko.py

Output:

    SSH connection is successfully established with 192.168.0.70
    Creating VLAN 2
    Creating VLAN 3
    Creating VLAN 4
    Creating VLAN 5

In the preceding example, first, we imported the paramiko module, then we defined the SSH credentials required to connect to the remote device. After providing the credentials, we created an instance, c, of paramiko.SSHClient(), which is the primary client used to establish connections with the remote device and execute commands or operations. Creating an SSHClient object allows us to establish remote connections using the .connect() function.
We also set the missing host key policy for the connection because, by default, paramiko.SSHClient uses the reject policy, which refuses to connect to any host whose key is not already known. In our script, we avoid that rejection by using AutoAddPolicy(), which automatically adds the server's host key without prompting. This policy is acceptable for testing purposes only; it is not a good option in a production environment for security reasons.

When an SSH connection is established, you can do any configuration or operation that you want on your device. Here, we created a few virtual LANs on a remote device. After creating the VLANs, we just closed the connection.

SSH using the Netmiko library

In this section, we will learn about Netmiko. Netmiko is a multi-vendor library that is built on top of Paramiko. It simplifies making an SSH connection to a network device and performing operations on that device. Before connecting over SSH to your remote device or multi-layer router, make sure it is configured properly. If it is not, you can do a basic configuration with the commands mentioned in the Paramiko section.

Now, let's see an example. Create a nmiko.py script and write the following code in it:

    from netmiko import ConnectHandler

    remote_device = {
        'device_type': 'cisco_ios',
        'ip': 'your remote_device ip address',
        'username': 'username',
        'password': 'password',
    }

    remote_connection = ConnectHandler(**remote_device)
    # remote_connection.find_prompt()

    for n in range(2, 6):
        print("Creating VLAN " + str(n))
        commands = ['exit', 'vlan database', 'vlan ' + str(n), 'exit']
        output = remote_connection.send_config_set(commands)
        print(output)

    command = remote_connection.send_command('show vlan-switch brief')
    print(command)

Run the script and you will get the output as follows:

    student@ubuntu:~$ python3 nmiko.py

Output:

    Creating VLAN 2
    config term
    Enter configuration commands, one per line. End with CNTL/Z.
    server(config)#exit
    server #vlan database
    server (vlan)#vlan 2
    VLAN 2 modified:
    server (vlan)#exit
    APPLY completed.
    Exiting....
    server #
    ..
    ..
    ..
    ..
    switch#
    Creating VLAN 5
    config term
    Enter configuration commands, one per line. End with CNTL/Z.
    server (config)#exit
    server #vlan database
    server (vlan)#vlan 5
    VLAN 5 modified:
    server (vlan)#exit
    APPLY completed.
    Exiting....

    VLAN Name                             Status    Ports
    ---- -------------------------------- --------- -------------------------------
    1    default                          active    Fa0/0, Fa0/1, Fa0/2, Fa0/3,
                                                    Fa0/4, Fa0/5, Fa0/6, Fa0/7,
                                                    Fa0/8, Fa0/9, Fa0/10, Fa0/11,
                                                    Fa0/12, Fa0/13, Fa0/14, Fa0/15
    2    VLAN0002                         active
    3    VLAN0003                         active
    4    VLAN0004                         active
    5    VLAN0005                         active
    1002 fddi-default                     active
    1003 token-ring-default               active
    1004 fddinet-default                  active
    1005 trnet-default                    active

In the preceding example, we used the Netmiko library to do SSH, instead of Paramiko. In this script, first, we imported ConnectHandler from the Netmiko library, which we used to establish an SSH connection to the remote network device by passing in the device dictionary. In our case, that dictionary is remote_device. After the connection was established, we executed configuration commands to create a number of virtual LANs using the send_config_set() function.

When we use send_config_set() to pass commands to a remote device, it automatically puts the device in configuration mode; this is why each command list starts with exit, because the vlan database command is entered from outside configuration mode. After sending the configuration commands, we also sent a simple command with send_command() to get information about the configured device.

Summary

In this tutorial, you learned about Telnet and SSH, and about the different Python modules, such as telnetlib, subprocess, fabric, Netmiko, and Paramiko, with which we can perform Telnet and SSH operations. SSH encrypts its traffic and uses public-key cryptography for security purposes, which makes it more secure than Telnet.
To learn how to leverage the features and libraries of Python to administrate your environment efficiently, check out our book Mastering Python Scripting for System Administrators.
