How-To Tutorials

article-image-2018-new-year-resolutions-to-thrive-in-the-algorithmic-world-part-3-of-3

05 Jan 2018

5 min read

2018 new year resolutions to thrive in the Algorithmic World - Part 3 of 3

05 Jan 2018

We have already talked about a simple learning roadmap for you to develop your data science skills in the first resolution. We also talked about the importance of staying relevant in an increasingly automated job market, in our second resolution. Now it’s time to think about the kind of person you want to be and the legacy you will leave behind. 3rd Resolution: Choose projects wisely and be mindful of their impact. Your work has real consequences. And your projects will often be larger than what you know or can do. As such, the first step toward creating impact with intention is to define the project scope, purpose, outcomes and assets clearly. The next most important factor is choosing the project team. 1. Seek out, learn from and work with a diverse group of people To become a successful data scientist you must learn how to collaborate. Not only does it make projects fun and efficient, but it also brings in diverse points of view and expertise from other disciplines. This is a great advantage for machine learning projects that attempt to solve complex real-world problems. You could benefit from working with other technical professionals like web developers, software programmers, data analysts, data administrators, game developers etc. Collaborating with such people will enhance your own domain knowledge and skills and also let you see your work from a broader technical perspective. Apart from the people involved in the core data and software domain, there are others who also have a primary stake in your project’s success. These include UX designers, people with humanities background if you are building a product intended to participate in society (which most products often are), business development folks, who actually sell your product and bring revenue, marketing people, who are responsible for bringing your product to a much wider audience to name a few. Working with people of diverse skill sets will help market your product right and make it useful and interpretable to the target audience. In addition to working with a melange of people with diverse skill sets and educational background it is also important to work with people who think differently from you, and who have experiences that are different from yours to get a more holistic idea of the problems your project is trying to tackle and to arrive at a richer and unique set of solutions to solve those problems. 2. Educate yourself on ethics for data science As an aspiring data scientist, you should always keep in mind the ethical aspects surrounding privacy, data sharing, and algorithmic decision-making. Here are some ways to develop a mind inclined to designing ethically-sound data science projects and models. Listen to seminars and talks by experts and researchers in fairness, accountability, and transparency in machine learning systems. Our favorites include Kate Crawford’s talk on The trouble with bias, Tricia Wang on The human insights missing from big data and Ethics & Data Science by Jeff Hammerbacher. Follow top influencers on social media and catch up with their blogs and about their work regularly. Some of these researchers include Kate Crawford, Margaret Mitchell, Rich Caruana, Jake Metcalf, Michael Veale, and Kristian Lum among others. Take up courses which will guide you on how to eliminate unintended bias while designing data-driven algorithms. We recommend Data Science Ethics by the University of Michigan, available on edX. You can also take up a course on basic Philosophy from your choice of University. Start at the beginning. Read books on ethics and philosophy when you get long weekends this year. You can begin with Aristotle's Nicomachean Ethics to understand the real meaning of ethics, a term Aristotle helped develop. We recommend browsing through The Stanford Encyclopedia of Philosophy, which is an online archive of peer-reviewed publication of original papers in philosophy, freely accessible to Internet users. You can also try Practical Ethics, a book by Peter Singer and The Elements of Moral Philosophy by James Rachels. Attend or follow upcoming conferences in the field of bringing transparency in socio-technical systems. For starters, FAT* (Conference on Fairness, Accountability, and Transparency) is scheduled on February 23 and 24th, 2018 at New York University, NYC. We also have the 5th annual conference of FAT/ML, later in the year. 3. Question/Reassess your hypotheses before, during and after actual implementation Finally, for any data science project, always reassess your hypotheses before, during, and after the actual implementation. Always ask yourself these questions after each of the above steps and compare them with the previous answers. What question are you asking? What is your project about? Whose needs is it addressing? Who could it adversely impact? What data are you using? Is the data-type suitable for your type of model? Is the data relevant and fresh? What are its inherent biases and limitations? How robust are your workarounds for them? What techniques are you going to try? What algorithms are you going to implement? What would be its complexity? Is it interpretable and transparent? How will you evaluate your methods and results? What do you expect the results to be? Are the results biased? Are they reproducible? These pointers will help you evaluate your project goals from a customer and business point of view. Additionally, it will also help you in building efficient models which can benefit the society and your organization at large. With this, we come to the end of our new year resolutions for an aspiring data scientist. However, the beauty of the ideas behind these resolutions is that they are easily transferable to anyone in any job. All you gotta do is get your foundations right, stay relevant, and be mindful of your impact. We hope this gives a great kick start to your career in 2018. “Motivation is what gets you started. Habit is what keeps you going.” ― Jim Ryun Happy New Year! May the odds and the God(s) be in your favor this year to help you build your resolutions into your daily routines and habits!

0
0
2017

article-image-getting-to-know-different-big-data-characteristics

Gebin George

05 Jan 2018

4 min read

Getting to know different Big data Characteristics

Gebin George

05 Jan 2018

4 min read

[box type="note" align="" class="" width=""]This article is an excerpt from a book written by Osvaldo Martin titled Mastering Predictive Analytics with R, Second Edition. This book will help you leverage the flexibility and modularity of R to experiment with a range of different techniques and data types.[/box] Our article will quickly walk you through all the fundamental characteristics of Big Data. For you to determine if your data source qualifies as big data or as needing special handling, you can start by examining your data source in the following areas: The volume (amount) of data. The variety of data. The number of different sources and spans of the data. Let's examine each of these areas. Volume If you are talking about the number of rows or records, then most likely your data source is not a big data source since big data is typically measured in gigabytes, terabytes, and petabytes. However, space doesn't always mean big, as these size measurements can vary greatly in terms of both volume and functionality. Additionally, data sources of several million records may qualify as big data, given their structure (or lack of structure). Varieties Data used in predictive models may be structured or unstructured (or both) and include transactions from databases, survey results, website logs, application messages, and so on (by using a data source consisting of a higher variety of data, you are usually able to cover a broader context for the analytics you derive from it). Variety, much like volume, is considered a normal qualifier for big data. Sources and spans If the data source for your predictive analytics project is the result of integrating several sources, you most likely hit on both criteria of volume and variety and your data qualifies as big data. If your project uses data that is affected by governmental mandates, consumer requests is a historical analysis, you are almost certainty using big data. Government regulations usually require that certain types of data need to be stored for several years. Products can be consumer driven over the lifetime of the product and with today's trends, historical analysis data is usually available for more than five years. Again, all examples of big data sources. Structure You will often find that data sources typically fall into one of the following three categories: 1. Sources with little or no structure in the data (such as simple text files). 2. Sources containing both structured and unstructured data (like data that is sourced from document management systems or various websites, and so on). 3. Sources containing highly structured data (like transactional data stored in a relational database example). How your data source is categorized will determine how you prepare and work with your data in each phase of your predictive analytics project. Although data sources with structure can obviously still fall into the category of big data, it's data containing both structured and unstructured data (and of course totally unstructured data) that fit as big data and will require special handling and or pre-processing. Statistical noise Finally, we should take a note here that other factors (other than those discussed already in the chapter) can qualify your project data source as being unwieldy, overly complex, or a big data source. These include (but are not limited to): Statistical noise (a term for recognized amounts of unexplained variations within the data) Data suffering from mismatched understandings (the differences in interpretations of the data by communities, cultures, practices, and so on) Once you have determined that the data source that you will be using in your predictive analytics project seems to qualify as big (again as we are using the term here) then you can proceed with the process of deciding how to manage and manipulate that data source, based upon the known challenges this type of data demands, so as to be most effective. In the next section, we will review some of these common problems, before we go on to offer useable solutions. We have learned fundamental characteristics which define Big Data, to further use them for Analytics. If you enjoyed our post, check out the book Mastering Predictive Analytics with R, Second Edition to learn complex machine learning models using R.

0
0
2108

article-image-creating-reports-using-sql-server-2016-reporting-services

Kunal Chaudhari

04 Jan 2018

6 min read

Creating reports using SQL Server 2016 Reporting Services

Kunal Chaudhari

04 Jan 2018

6 min read

[box type="note" align="" class="" width=""]This article is an excerpt from a book authored by Dinesh Priyankara and Robert C. Cain, titled SQL Server 2016 Reporting Services Cookbook.This book will help you create cross-browser and cross-platform reports using SQL Server 2016 Reporting Services.[/box] In today’s tutorial, we explore steps to create reports on multiple axis charts with SQL Server 2016. Often you will want to have multiple items plotted on a chart. In this article, we will plot two values over time, in this case, the Total Sales Amount (Excluding Tax) and the Total Tax Amount. As you might expect though, the tax amounts are going to be a small percentage of the sales amounts. By default, this would create a chart with a huge gap in the middle and a Y Axis that is quite large and difficult to pinpoint values on. To prevent this, Reporting Services allows us to place a second Y Axis on our charts. With this article, we'll explore both adding a second line to our chart as well as having it plotted on a second Y-Axis. Getting ready First, we'll create a new Reporting Services project to contain it. Name this new project Chapter03. Within the new project, create a Shared Data Source that will connect to the WideWorldImportersDW database. Name the new data source after the database, WideWorldImportersDW. Next, we'll need data. Our data will come from the sales table, and we will want to sum our totals by year, so we can plot our years across the X-Axis. For the Y-Axis, we'll use the totals of two fields: TotalExcludingTax and TaxAmount. Here is the query by which we will accomplish this: SELECT YEAR([Invoice Date Key]) AS InvoiceYear ,SUM([Total Excluding Tax]) AS TotalExcludingTax ,SUM([Tax Amount]) AS TotalTaxAmount FROM [Fact].[Sale] GROUP BY YEAR([Invoice Date Key]) How to do it… Right-click on the Reports branch in the Solution Explorer. Go to Add | New Item… from the pop-up menu. On the Add New Item dialog, select Report from the choice of templates in the middle (do not select Report wizard). At the bottom, name the report Report 03-01 Multi Axis Charts.rdl and click on Add. Go to the Report Data tool pane. Right-click on Data Sources and then click Add Data Source… from the menu. In the Name: area, enter WideWorldImportersDW. Change the data source option to the Use shared dataset source reference. In the dropdown, select WideWorldImportersDW. Click on OK to close the DataSet Properties window. Right-click on the Datasets branch and select Add Dataset…. Name the dataset SalesTotalsOverTime. Select the Use a dataset embedded in my report option. Select WideWorldImportersDW in the Data source dropdown. Paste in the query from the Getting ready area of this article: When your window resembles that of the preceding figure, click on OK. Next, go to the Toolbox pane. Drag and drop a Chart tool onto the report. Select the leftmost Line chart from the Select Chart Type window, and click on OK. Resize the chart to a larger size. (For this demo, the exact size is not important. For your production reports, you can resize as needed using the Properties window, as seen previously.) Click inside the main chart area to make the Chart Data dialog appear to the right of the chart. Click on the + (plus button) to the right of the Values. Select TotalExcludingTax. Click on the plus button again, and now pick TotalTaxAmount. Click on the + (plus button) beside Category Groups, and pick InvoiceYear. Click on Preview. You will note the large gap between the two graphed lines. In addition, the values for the Total Tax Amount are almost impossible to guess, as shown in the following figure: Return to the designer, and again click in the chart area to make the Chart Data dialog appear. In the Chart Data dialog, click on the dropdown beside TotalTaxAmount: Select Series Properties…. Click on the Axes and Chart Area page, and for Vertical axis, select Secondary: Click on OK to close the Series Properties window. Right-click on the numbers now appearing on the right in the vertical axis area, and select Secondary Vertical Axis Properties in the menu. In the Axis Options, uncheck Always include zero. Click on the Number page. Under Category, select Currency. Change the Decimal places to 0, and place a check in Use 1000 separator. Click on OK to close this window. Now move to the vertical axis on the left-hand side of the chart, right-click, and pick Vertical Axis Properties. Uncheck Always include zero. On the Number page, pick Currency, set Decimal places to 0, and check Use 1000 separator. Click on OK to close. Click on the Preview tab to see the results: You can now see a chart with a second axis. The monetary amounts are much easier to read. Further, the plotted lines have a similar rise and fall, indicating the taxes collected matched the sales totals in terms of trending. SSRS is capable of plotting multiple lines on a chart. Here we've just placed two fields, but you can add as many as you need. But do realize that the more lines included, the harder the chart can become to read. All that is needed is to put the additional fields into the Values area of the Chart Data window. When these values are of similar scale, for example, sales broken up by state, this works fine. There are times though when the scale between plotted values is so great that it distorts the entire chart, leaving one value in a slender line at the top and another at the bottom, with a huge gap in the middle. To fix this, SSRS allows a second Y-Axis to be included. This will create a scale for the field (or fields) assigned to that axis in the Series Properties window. To summarize, we learned how creating reports with multiple axis is much more simpler with SQL Server 2016 Reporting Services. If you liked our post, check out the book SQL Server 2016 Reporting Services Cookbook to know more about different types of reportings and Power BI integrations.

0
0
2851

article-image-2018-data-science-part-2-of-3

Savia Lobo

04 Jan 2018

7 min read

2018 new year resolutions to thrive in the Algorithmic World - Part 2 of 3

Savia Lobo

04 Jan 2018

7 min read

In our first resolution, we talked about learning the building blocks of data science i.e developing your technical skills. In this second resolution, we walk you through steps to stay relevant in your field and how to dodge jobs that have a high possibility of getting automated in the near future. 2nd Resolution: Stay relevant in your field even as job automation is on the rise (Time investment: half an hour every day, 2 hours on weekends) Once you have got your fundamentals right, it is important to stay relevant through continuous learning and reskilling. In addition to honing your technical skills, you must also deepen your domain expertise and keep adding to your portfolio of soft skills to stay ahead of not the just human competition but also to thrive in an automated job market. We list below some simple ways to do all these in a systematic manner. All it requires is a commitment of half an hour to one hour of your time daily for your professional development. 1. Commit to and execute a daily learning-practice-participation ritual Here are some ways to stay relevant. Follow data science blogs and podcasts relevant to your area of interest. Here are some of our favorites: Data Science 101, the journey of a data scientist The Data Skeptic for a healthy dose of scientific skepticism Data Stories for data visualization This Week in Machine Learning & AI for informative discussions with prominent people in the data science/machine learning community Linear Digressions, a podcast co-hosted by a data scientist and a software engineer attempting to make data science accessible You could also follow individual bloggers/vloggers in this space like Siraj Raval, Sebastian Raschka, Denny Britz, Rodney Brookes, Corinna Cortes, Erin LeDell Newsletters are a great way to stay up-to-date and to get a macro-level perspective. You don’t have to spend an awful lot of time doing the research yourself on many different subtopics. So, subscribe to useful newsletters on data science. You can subscribe to our newsletter here. It is a good idea to subscribe to multiple newsletters on your topic of interest to get a balanced and comprehensive view of the topic. Try to choose newsletters that have distinct perspectives, are regular and are published by people passionate about the topic. Twitter gives a whole new meaning to ‘breaking news’. Also, it is a great place to follow contemporary discussions on topics of interest where participation is open to all. When done right, it can be a gold mine for insights and learning. But often it is too overwhelming as it is viewed as a broadcasting marketing tool. Follow your role models in data science on Twitter. Or you could follow us on Twitter @PacktDataHub for curated content from key data science influencers and our own updates about the world of data science. You could also click here to keep a track of 737 twitter accounts most followed by the members of the NIPS2017 community. Quora, Reddit, Medium, and StackOverflow are great places to learn about topics in depth when you have a specific question in mind or a narrow focus area. They help you get multiple informed opinions on topics. In other words, when you choose a topic worth learning, these are great places to start. Follow them up by reading books on the topic and also by reading the seminal papers to gain a robust technical appreciation. Create a Github account and participate in Kaggle competitions. Nothing sticks as well as learning by doing. You can also browse into Data Helpers, a site voluntarily set up by Angela Bass where interested data science people can offer to help newcomers with their queries on entering the required field and anything else. 2. Identify your strengths and interests to realign your career trajectory OK, now that you have got your daily learning routine in place, it is time to think a little more strategically about your career trajectory, goals and eventually the kind of work you want to be doing. This means: Getting out of jobs that can be automated Developing skills that augment or complement AI driven tasks Finding your niche and developing deep domain expertise that AI will find hard to automate in the near future Here are some ideas to start thinking about some of the above ideas. The first step is to assess your current job role and understand how likely it is to get automated. If you are in a job that has well-defined routines and rules to follow, it is quite likely to go the AI job apocalypse route. Eg: data entry, customer support that follows scripts, invoice processing, template-based software testing or development etc. Even “creative” job such as content summarization, news aggregation, template-based photo-editing/video editing etc fall in this category. In the world of data professionals, jobs like data cleaning, database optimization, feature generation, even model building (gasp!) among others could head the same way given the right incentives. Choose today to transition out of jobs that may not exist in the next 10 years. Then instead of hitting the panic button, invest in redefining your skills in a way that would be helpful in the long run. If you are a data professional, skills such as data interpretation, data-driven storytelling, data pipeline architecture and engineering, feature engineering, and others that require a high level of human judgment skills are least likely to be replicated by machines anytime soon. By mastering skills that complement AI driven tasks and jobs, you should be able to present yourself as a lucrative option to potential employers in a highly competitive job market space. In addition to reskilling, try to find your niche and dive deep. By niche, we mean, if you are a data scientist, choose a specific technical aspect in your field, something that interests you. It could be anything from computer vision to NLP to even a class of algorithms like neural nets or a type of problem that machine learning solves such as recommender systems or classification systems. It could even be a specific phase of a data science project such as data visualization or data pipeline engineering. Master your niche while keeping up with what’s happening in other related areas. Next, understand where your strengths lie. In other words, what your expertise is, what industry or domain do you understand well or have amassed experience in. For instance, NLP, a subset of machine learning abilities, can be applied to customer reviews to mine useful insights, perform sentiment analysis, build recommendation systems in conjunction with predictive modeling among other things. In order to build an NLP model to mine some kind of insights from customer feedback, we must have some idea of what we are looking for. Your domain expertise can be of great value here. If you are in the publishing business, you would know what keywords matter most in reviews and more importantly why they matter and how to convert the findings into actionable insights - aspects that your model or even a machine learning engineer outside your industry may not understand or appreciate. Take the case of Brendan Frey and the team of researchers at Deep Genomics as a real-world example. They applied AI and machine learning (their niche expertise) to build a neural network to identify pathological mutations in genes (their domain expertise). Their knowledge of how genes get created and how they work, what a mutation looks like etc helped them feed the features and hyperparameters into their model. Similarly, you can pick up any of your niche skills and apply them in whichever field you find interesting and worthwhile. Based on your domain knowledge and area of expertise, it could range from sorting a person into a Hogwarts house because you are a Harry Potter fan to sorting them into potential patients with a high likelihood to develop diabetes because you have a background in biotechnology. This brings us to the next resolution where we cover aspects related to how your work will come to define you and why it matters that you choose your projects well.

0
0
3168

article-image-how-to-recognize-patterns-with-neural-networks-in-java

Kunal Chaudhari

04 Jan 2018

8 min read

How to recognize Patterns with Neural Networks in Java

Kunal Chaudhari

04 Jan 2018

8 min read

0
0
6834

article-image-2018-new-year-resolutions-algorithmic-world-part-1-of-3

Sugandha Lahoti

03 Jan 2018

6 min read

2018 new year resolutions to thrive in an Algorithmic World - Part 1 of 3

Sugandha Lahoti

03 Jan 2018

6 min read

We often think of Data science and machine learning as skills essential to a niche group of researchers, data scientists, and developers. But the world as we know today revolves around data and algorithms, just as it used to revolve around programming a decade back. As data science and algorithms get integrated into all aspects of businesses across industries, data science like Microsoft Excel will become ubiquitous and will serve as a handy tool which makes you better at your job no matter what your job is. Knowing data science is key to having a bright career in this algoconomy (algorithm driven economy). If you are big on new year resolutions, make yourself a promise to carve your place in the algorithm-powered world by becoming data science savvy. Follow these three resolutions to set yourself up for a bright data-driven career. Get the foundations right: Start with the building blocks of data science, i.e. developing your technical skills. Stay relevant: Keep yourself updated on the latest developments in your field and periodically invest in reskilling and upskilling. Be mindful of your impact: Finally, always remember that your work has real-world implications. Choose your projects wisely and your project goals, hypotheses, and contributors with even more care. In this three-part series, we expand on how data professionals could go about achieving these three resolutions. But the principles behind the ideas are easily transferable to anyone in any job. Think of them as algorithms that can help you achieve your desired professional outcome! You simply need to engineer the features and fine-tune the hyperparameters specific to your industry and job role. 1st Resolution: Learn the building blocks of data science If you are interested in starting a career in data science or in one that involves data, here is a simple learning roadmap for you to develop your technical skills. Start off with learning a data-friendly programming language, one that you find easy and interesting. Next, brush up your statistics skills. Nothing fancy, just your high school math and stats would do nicely. Next, learn about algorithms - what they do, what questions they answer, how many types are there and how to write one. Finally, you can put all that learning to practice by building models on top of your choice of Machine Learning framework. Now let’s see, how you can accomplish each of these tasks 1. Learn Python or any another popular data friendly programming language you find interesting (Learning period: 1 week - 2 months) If you see yourself as a data scientist in the near future, knowing a programming language is one of the first things to check off your list. We suggest you learn a data-friendly programming language like Python or R. Python is a popular choice because of its strong, fast, and easy computational capabilities for the Data Science workflow. Moreover, because of a large and active community, the likelihood of finding someone in your team or your organization who knows Python is quite high, which is an added advantage. “Python has become the most popular programming language for data science because it allows us to forget about the tedious parts of programming and offers us an environment where we can quickly jot down our ideas and put concepts directly into action.” - Sebastian Raschka We suggest learning the basics from the book Learn Python in 7 days by Mohit, Bhaskar N. Das. Then you can move on to learning Python specifically for data science with Python Data Science Essentials by Alberto Boschetti. Additionally, you can learn R, which is a highly useful language when it comes to statistics and data. For learning R, we recommend R Data science Essentials by Raja B. Koushik. You can learn more about how Python and R stand against each other in the data science domain here. Although R and Python are the most popular choices for new developers and aspiring data scientists, you can also use Java for data science, if that is your cup of tea. Scala is another alternative. 2. Brush up on Statistics (Learning period: 1 week - 3 weeks) While you are training your programming muscle, we recommend that you brush through basic mathematics (probability and statistics). Remember, you already know everything to get started with data science from your high school days. You just need to refresh your memory with a little practice. A good place to start is to understand concepts like standard deviation, probability, mean, mode, variance, kurtosis among others. Now, your normal high-school books should be enough to get started, however, an in-depth understanding is required to leverage the power of data science. We recommend the book Statistics for Data Science by James D. Miller for this. 3. Learn what machine learning algorithms do and which ones to learn (Learning period: 1 month - 3 months) Machine Learning is a powerful tool to make predictions based on huge amounts of data. According to a recent study, in the next ten years, ML algorithms are expected to replace a quarter of the jobs across the world, in fields like transport, manufacturing, architecture, healthcare and many others. So the next step in your data science journey is learning about machine learning algorithms. There are new algorithms popping up almost every day. We’ve collated a list of top ten algorithms that you should learn to effectively design reliable and robust ML systems. But fear not, you don’t need to know all of them to get started. Start with some basic algorithms that are majorly used in the real world applications like linear regression, naive bayes, and decision trees. 4. Learn TensorFlow, Keras, or any other popular machine learning framework (Learning period: 1 month - 3 months) After you have familiarized yourself with some of the machine learning algorithms, it is time you put that learning to practice by building models based on those algorithms. While there are many cloud-based machine learning options that have click-based model building features available, the best way to learn a skill is to get your hands dirty. There is a growing range of frameworks that make it easy to build complex models while allowing for high degrees of customization. Here is a list of top 10 deep learning frameworks at your disposal to choose from. Our favorite pick is TensorFlow. It’s Python-based, backed by Google, has a very good documentation, and there are tons of tutorials and videos available on the internet to guide you. You can find a comprehensive list of books for learning Tensorflow here. We also recommend learning Keras, which is a good option if you have some knowledge of Python programming and want to get started with deep learning. Try the book Deep Learning with Keras, by Antonio Gulli and Sujit Pal, to get you started. If you find learning from multiple sources daunting, just learn from Sebastian Raschka’s Python machine learning book. Once you have got your fundamentals right, it is important to stay relevant through continuous learning and reskilling. Check out part 2 where we explore how you could about doing this in a systematic and time efficient manner. In part 3, we look at ways you can own your work and become aware of its outcome.

0
0
2177

article-image-popular-data-sources-and-models-in-sap-analytics-cloud

Kunal Chaudhari

03 Jan 2018

12 min read

Popular Data sources and models in SAP Analytics Cloud

Kunal Chaudhari

03 Jan 2018

12 min read

0
0
6050

article-image-write-first-blockchain-learning-solidity-programming-15-minutes

Aaron Lazar

03 Jan 2018

15 min read

Write your first Blockchain: Learning Solidity Programming in 15 minutes

Aaron Lazar

03 Jan 2018

15 min read

[box type="note" align="" class="" width=""]This post is a book extract from the title Mastering Blockchain, authored by Imran Bashir. The book begins with the technical foundations of blockchain, teaching you the fundamentals of cryptography and how it keeps data secure.[/box] Our article aims to quickly get you up to speed with Blockchain development using the Solidity Programming language. Introducing solidity Solidity is a domain-specific language of choice for programming contracts in Ethereum. There are, however, other languages, such as serpent, Mutan, and LLL but solidity is the most popular at the time of writing this. Its syntax is closer to JavaScript and C. Solidity has evolved into a mature language over the last few years and is quite easy to use, but it still has a long way to go before it can become advanced and feature-rich like other well established languages. Nevertheless, this is the most widely used language available for programming contracts currently. It is a statically typed language, which means that variable type checking in solidity is carried out at compile time. Each variable, either state or local, must be specified with a type at compile time. This is beneficial in the sense that any validation and checking is completed at compile time and certain types of bugs, such as interpretation of data types, can be caught earlier in the development cycle instead of at run time, which could be costly, especially in the case of the blockchain/smart contracts paradigm. Other features of the language include inheritance, libraries, and the ability to define composite data types. Solidity is also a called contract-oriented language. In solidity, contracts are equivalent to the concept of classes in other object-oriented programming languages. Types Solidity has two categories of data types: value types and reference types. Value types These are explained in detail here. Boolean This data type has two possible values, true or false, for example: bool v = true; This statement assigns the value true to v. Integers This data type represents integers. A table is shown here, which shows various keywords used to declare integer data types. For example, in this code, note that uint is an alias for uint256: uint256 x; uint y; int256 z; These types can also be declared with the constant keyword, which means that no storage slot will be reserved by the compiler for these variables. In this case, each occurrence will be replaced with the actual value: uint constant z=10+10; State variables are declared outside the body of a function, and they remain available throughout the contract depending on the accessibility assigned to them and as long as the contract persists. Address This data type holds a 160-bit long (20 byte) value. This type has several members that can be used to interact with and query the contracts. These members are described here: Balance The balance member returns the balance of the address in Wei. Send This member is used to send an amount of ether to an address (Ethereum's 160-bit address) and returns true or false depending on the result of the transaction, for example, the following: address to = 0x6414cc08d148dce9ebf5a2d0b7c220ed2d3203da; address from = this; if (to.balance < 10 && from.balance > 50) to.send(20); Call functions The call, callcode, and delegatecall are provided in order to interact with functions that do not have Application Binary Interface (ABI). These functions should be used with caution as they are not safe to use due to the impact on type safety and security of the contracts. Array value types (fixed size and dynamically sized byte arrays) Solidity has fixed size and dynamically sized byte arrays. Fixed size keywords range from bytes1 to bytes32, whereas dynamically sized keywords include bytes and strings. bytes are used for raw byte data and string is used for strings encoded in UTF-8. As these arrays are returned by the value, calling them will incur gas cost. length is a member of array value types and returns the length of the byte array. An example of a static (fixed size) array is as follows: bytes32[10] bankAccounts; An example of a dynamically sized array is as follows: bytes32[] trades; Get length of trades: trades.length; Literals These are used to represent a fixed value. Integer literals Integer literals are a sequence of decimal numbers in the range of 0-9. An example is shown as follows: uint8 x = 2; String literals String literals specify a set of characters written with double or single quotes. An example is shown as follows: 'packt' "packt” Hexadecimal literals Hexadecimal literals are prefixed with the keyword hex and specified within double or single quotation marks. An example is shown as follows: (hex'AABBCC'); Enums This allows the creation of user-defined types. An example is shown as follows: enum Order{Filled, Placed, Expired }; Order private ord; ord=Order.Filled; Explicit conversion to and from all integer types is allowed with enums. Function types There are two function types: internal and external functions. Internal functions These can be used only within the context of the current contract. External functions External functions can be called via external function calls. A function in solidity can be marked as a constant. Constant functions cannot change anything in the contract; they only return values when they are invoked and do not cost any gas. This is the practical implementation of the concept of call as discussed in the previous chapter. The syntax to declare a function is shown as follows: function <nameofthefunction> (<parameter types> <name of the variable>) {internal|external} [constant] [payable] [returns (<return types> <name of the variable>)] Reference types As the name suggests, these types are passed by reference and are discussed in the following section. Arrays Arrays represent a contiguous set of elements of the same size and type laid out at a memory location. The concept is the same as any other programming language. Arrays have two members named length and push: uint[] OrderIds; Structs These constructs can be used to group a set of dissimilar data types under a logical group. These can be used to define new types, as shown in the following example: Struct Trade { uint tradeid; uint quantity; uint price; string trader; } Data location Data location specifies where a particular complex data type will be stored. Depending on the default or annotation specified, the location can be storage or memory. This is applicable to arrays and structs and can be specified using the storage or memory keywords. As copying between memory and storage can be quite expensive, specifying a location can be helpful to control the gas expenditure at times. Calldata is another memory location that is used to store function arguments. Parameters of external functions use calldata memory. By default, parameters of functions are stored in memory, whereas all other local variables make use of storage. State variables, on the other hand, are required to use storage. Mappings Mappings are used for a key to value mapping. This is a way to associate a value with a key. All values in this map are already initialized with all zeroes, for example, the following: mapping (address => uint) offers; This example shows that offers is declared as a mapping. Another example makes this clearer: mapping (string => uint) bids; bids["packt"] = 10; This is basically a dictionary or a hash table where string values are mapped to integer values. The mapping named bids has a packt string value mapped to value 10. Global variables Solidity provides a number of global variables that are always available in the global namespace. These variables provide information about blocks and transactions. Additionally, cryptographic functions and address-related variables are available as well. A subset of available functions and variables is shown as follows: keccak256(...) returns (bytes32) This function is used to compute the keccak256 hash of the argument provided to the Function: ecrecover(bytes32 hash, uint8 v, bytes32 r, bytes32 s) returns (address) This function returns the associated address of the public key from the elliptic curve signature: block.number This returns the current block number. Control structures Control structures available in solidity are if - else, do, while, for, break, continue, return. They work in a manner similar to how they work in C-language or JavaScript. Events Events in solidity can be used to log certain events in EVM logs. These are quite useful when external interfaces are required to be notified of any change or event in the contract. These logs are stored on the blockchain in transaction logs. Logs cannot be accessed from the contracts but are used as a mechanism to notify change of state or the occurrence of an event (meeting a condition) in the contract. In a simple example here, the valueEvent event will return true if the x parameter passed to function Matcher is equal to or greater than 10: contract valueChecker { uint8 price=10; event valueEvent(bool returnValue); function Matcher(uint8 x) returns (bool) { if (x>=price) { valueEvent(true); return true; } } } Inheritance Inheritance is supported in solidity. The is keyword is used to derive a contract from another contract. In the following example, valueChecker2 is derived from the valueChecker contract. The derived contract has access to all nonprivate members of the parent contract: contract valueChecker { uint8 price=10; event valueEvent(bool returnValue); function Matcher(uint8 x) returns (bool) { if (x>=price) { valueEvent(true); return true; } } } contract valueChecker2 is valueChecker { function Matcher2() returns (uint) { return price + 10; } } In the preceding example, if uint8 price = 10 is changed to uint8 private price = 10, then it will not be accessible by the valuechecker2 contract. This is because now the member is declared as private, it is not allowed to be accessed by any other contract. Libraries Libraries are deployed only once at a specific address and their code is called via CALLCODE/DELEGATECALL Opcode of the EVM. The key idea behind libraries is code reusability. They are similar to contracts and act as base contracts to the calling contracts. A library can be declared as shown in the following example: library Addition { function Add(uint x,uint y) returns (uint z) { return x + y; } } This library can then be called in the contract, as shown here. First, it needs to be imported and it can be used anywhere in the code. A simple example is shown as follows: Import "Addition.sol" function Addtwovalues() returns(uint) { return Addition.Add(100,100); } There are a few limitations with libraries; for example, they cannot have state variables and cannot inherit or be inherited. Moreover, they cannot receive Ether either; this is in contrast to contracts that can receive Ether. Functions Functions in solidity are modules of code that are associated with a contract. Functions are declared with a name, optional parameters, access modifier, optional constant keyword, and optional return type. This is shown in the following example: function orderMatcher(uint x) private constant returns(bool returnvalue) In the preceding example, function is the keyword used to declare the function. orderMatcher is the function name, uint x is an optional parameter, private is the access modifier/specifier that controls access to the function from external contracts, constant is an optional keyword used to specify that this function does not change anything in the contract but is used only to retrieve values from the contract instead, and returns (bool returnvalue) is the optional return type of the function. How to define a function: The syntax of defining a function is shown as follows: function <name of the function>(<parameters>) <visibility specifier> returns (<return data type> <name of the variable>) { <function body> } Function signature: Functions in solidity are identified by its signature, which is the first four bytes of the keccak-256 hash of its full signature string. This is also visible in browser solidity, as shown in the following screenshot. D99c89cb is the first four bytes of 32 byte keccak-256 hash of the function named Matcher. In this example function, Matcher has the signature hash of d99c89cb. This information is useful in order to build interfaces. Input parameters of a function: Input parameters of a function are declared in the form of <data type> <parameter name>. This example clarifies the concept where uint x and uint y are input parameters of the checkValues function: contract myContract { function checkValues(uint x, uint y) { } } Output parameters of a function: Output parameters of a function are declared in the form of <data type> <parameter name>. This example shows a simple function returning a uint value: contract myContract { Function getValue() returns (uint z) { z=x+y; } } A function can return multiple values. In the preceding example function, getValue only returns one value, but a function can return up to 14 values of different data types. The names of the unused return parameters can be omitted optionally. Internal function calls: Functions within the context of the current contract can be called internally in a direct manner. These calls are made to call the functions that exist within the same contract. These calls result in simple JUMP calls at the EVM byte code level. External function calls: External function calls are made via message calls from a contract to another contract. In this case, all function parameters are copied to the memory. If a call to an internal function is made using the this keyword, it is also considered an external call. The this variable is a pointer that refers to the current contract. It is explicitly convertible to an address and all members for a contract are inherited from the address. Fall back functions: This is an unnamed function in a contract with no arguments and return data. This function executes every time ether is received. It is required to be implemented within a contract if the contract is intended to receive ether; otherwise, an exception will be thrown and ether will be returned. This function also executes if no other function signatures match in the contract. If the contract is expected to receive ether, then the fall back function should be declared with the payable modifier. The payable is required; otherwise, this function will not be able to receive any ether. This function can be called using the address.call() method as, for example, in the following: function () { throw; } In this case, if the fallback function is called according to the conditions described earlier; it will call throw, which will roll back the state to what it was before making the call. It can also be some other construct than throw; for example, it can log an event that can be used as an alert to feed back the outcome of the call to the calling application. Modifier functions: These functions are used to change the behavior of a function and can be called before other functions. Usually, they are used to check some conditions or verification before executing the function. _(underscore) is used in the modifier functions that will be replaced with the actual body of the function when the modifier is called. Basically, it symbolizes the function that needs to be guarded. This concept is similar to guard functions in other languages. Constructor function: This is an optional function that has the same name as the contract and is executed once a contract is created. Constructor functions cannot be called later on by users, and there is only one constructor allowed in a contract. This implies that no overloading functionality is available. Function visibility specifiers (access modifiers): Functions can be defined with four access specifiers as follows: External: These functions are accessible from other contracts and transactions. They cannot be called internally unless the this keyword is used. Public: By default, functions are public. They can be called either internally or using messages. Internal: Internal functions are visible to other derived contracts from the parent contract. Private: Private functions are only visible to the same contract they are declared in. Other important keywords/functions throw: throw is used to stop execution. As a result, all state changes are reverted. In this case, no gas is returned to the transaction originator because all the remaining gas is consumed. Layout of a solidity source code file Version pragma In order to address compatibility issues that may arise from future versions of the solidity compiler version, pragma can be used to specify the version of the compatible compiler as, for example, in the following: pragma solidity ^0.5.0 This will ensure that the source file does not compile with versions smaller than 0.5.0 and versions starting from 0.6.0. Import Import in solidity allows the importing of symbols from the existing solidity files into the current global scope. This is similar to import statements available in JavaScript, as for example, in the following: Import "module-name"; Comments Comments can be added in the solidity source code file in a manner similar to C-language. Multiple line comments are enclosed in /* and */, whereas single line comments start with //. An example solidity program is as follows, showing the use of pragma, import, and comments: To summarize, we went through a brief introduction to the solidity language. Detailed documentation and coding guidelines are available online. If you found this article useful, and would like to learn more about building blockchains, go ahead and grab the book Mastering Blockchain, authored by Imran Bashir.

0
0
4083

article-image-getting-started-with-linear-and-logistic-regression

Richa Tripathi

03 Jan 2018

7 min read

Getting started with Linear and logistic regression

Richa Tripathi

03 Jan 2018

7 min read

[box type="note" align="" class="" width=""]This article is an excerpt from a book written by Alberto Boschetti and Luca Massaron, titled Python Data Science Essentials - Second Edition. This book provides the fundamentals of data science with Python by leveraging the latest tools and libraries such as Jupyter notebooks, NumPy, pandas and scikit-learn.[/box] In this article, we will learn about two easy and effective classifiers known as linear and logistic regressors. Linear and logistic regressions are the two methods that can be used to linearly predict a target value or a target class, respectively. Let's start with an example of linear regression predicting a target value. In this article, we will again use the Boston dataset, which contains 506 samples, 13 features (all real numbers), and a (real) numerical target (which renders it ideal for regression problems). We will divide our dataset into two sections by using a train/test split cross- validation to test our methodology (in the example, 80 percent of our dataset goes in training and 20 percent in test): In: from sklearn.datasets import load_boston boston = load_boston() from sklearn.cross_validation import train_test_split X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=0) The dataset is now loaded and the train/test pairs have been created. In the next few steps, we're going to train and fit the regressor in the training set and predict the target variable in the test dataset. We are then going to measure the accuracy of the regression task by using the MAE score. As for the scoring function, we decided on the mean absolute error in order to penalize errors just proportionally to the size of the error itself (using the more common mean squared error would have emphasized larger errors more, since errors are squared): In: from sklearn.linear_model import LinearRegression regr = LinearRegression() regr.fit(X_train, Y_train) Y_pred = regr.predict(X_test) from sklearn.metrics import mean_absolute_error print ("MAE", mean_absolute_error(Y_test, Y_pred)) Out: MAE 3.84281058945 Great! We achieved our goal in the simplest possible way. Now, let's take a look at the time needed to train the system: In: %timeit regr.fit(X_train, y_train) Out: 1000 loops, best of 3: 381 µs per loop That was really quick! The results, of course, are not all that great. However, linear regression offers a very good trade-off between performance and speed of training and simplicity. Now, let's take a look under the hood of the algorithm. Why is it so fast but not that accurate? The answer is somewhat expected-this is so because it's a very simple linear method. Let's briefly dig into a mathematical explanation of this technique. Let's name X(i) the ith sample (it is actually a row vector of numerical features) and Y(i) its target. The goal of linear regression is to find a good weight (column) vector W, which is best suited for approximating the target value when multiplied by the observation vector, that is, X(i) * W ≈ Y(i) (note that this is a dot product). W should be the same, and the best for every observation. Thus, solving the following equation becomes easy: W can be found easily with the help of a matrix inversion (or, more likely, a pseudo- inversion, which is a computationally efficient way) and a dot product. Here's the reason linear regression is so fast. Note that this is a simplistic explanation—the real method adds another virtual feature to compensate for the bias of the process. However, this does not change the complexity of the regression algorithm much. We progress now to logistic regression. In spite of what the name suggests, it is a classifier and not a regressor. It must be used in classification problems where you are dealing with only two classes (binary classification). Typically, target labels are Boolean; that is, they have values as either True/False or 0/1 (indicating the presence or absence of the expected outcome). In our example, we keep on using the same dataset. The target is to guess whether a house value is over or under the average of a threshold value we are interested in. In essence, we moved from a regression problem to a binary classification one because now our target is to guess how likely an example is to be a part of a group. We start preparing the dataset by using the following commands: In: import numpy as np avg_price_house = np.average(boston.target) high_priced_idx = (Y_train >= avg_price_house) Y_train[high_priced_idx] = 1 Y_train[np.logical_not(high_priced_idx)] = 0 Y_train = Y_train.astype(np.int8) high_priced_idx = (Y_test >= avg_price_house) Y_test[high_priced_idx] = 1 Y_test[np.logical_not(high_priced_idx)] = 0 Y_test = Y_test.astype(np.int8) Now, we will train and apply the classifier. To measure its performance, we will simply print the classification report: In: from sklearn.linear_model import LogisticRegression clf = LogisticRegression() clf.fit(X_train, Y_train) Y_pred = clf.predict(X_test) from sklearn.metrics import classification_report print (classification_report(Y_test, Y_pred)) Out: precision recall f1-score support 0 0.81 0.90 0.85 61 1 0.82 0.68 0.75 41 avg / total 0.83 0.81 0.81 102 The output of this command can change on your machine depending on the optimization process of the LogisticRegression classifier (no seed has been set for replicability of the results). The precision and recall values are over 80 percent. This is already a good result for a very simple method. The training speed is impressive, too. Thanks to Jupyter Notebook, we can have a comparison of the algorithm with a more advanced classifier in terms of performance and speed: In: %timeit clf.fit(X_train, y_train) 100 loops, best of 3: 2.54 ms per loop What's under the hood of a logistic regression? The simplest classifier a person could imagine (apart from a mean) is a linear regressor followed by a hard threshold: Here, sign(a) = +1 if a is greater or equal than zero, and 0 otherwise. To smooth down the hardness of the threshold and predict the probability of belonging to a class, logistic regression resorts to the logit function. Its output is a (0 to 1] real number (0.0 and 1.0 are attainable only via rounding, otherwise the logit function just tends toward them), which indicates the probability that the observation belongs to class 1. Using a formula, that becomes: Here Why the logistic function instead of some other function? Well, because it just works pretty well in most real cases. In the remaining cases, if you're not completely satisfied with its results, you may want to try some other nonlinear functions instead (there is limited variety of suitable ones, though). To summarize, we learned about two classic algorithms used in machine learning namely linear and logistic regression. With the help of an example, we put the theory into practice by predicting a target value which helped us understand the trade-offs and benefits. If you enjoyed this excerpt, check out the book Python Data Science Essentials - Second Edition to know more about other popular machine learning algorithms such as Naive Bayes, k-Nearest Neighbors (kNN), Support Vector Machines (SVM) etc.

0
0
2673

article-image-getting-started-with-web-scraping-with-beautiful-soup

Richa Tripathi

02 Jan 2018

5 min read

Getting started with Web Scraping with Beautiful Soup

Richa Tripathi

02 Jan 2018

5 min read

[box type="note" align="" class="" width=""]This article is an excerpt from a book written by Alberto Boschetti and Luca Massaron, titled Python Data Science Essentials - Second Edition. This book will take you beyond the fundamentals of data science by diving into the world of data visualizations, web development, and deep learning.[/box] In this article, we will learn to scrape a web page and learn to download it manually. This process happens more often than you might expect; and it's a very popular topic of interest in data science. For example: Financial institutions scrape the Web to extract fresh details and information about the companies in their portfolio. Newspapers, social networks, blogs, forums, and corporate websites are the ideal targets for these analyses. Advertisement and media companies analyze sentiment and the popularity of many pieces of the Web to understand people's reactions. Companies specialized in insight analysis and recommendation scrape the Web to understand patterns and model user behaviors. Comparison websites use the web to compare prices, products, and services, offering the user an updated synoptic table of the current situation. Unfortunately, understanding websites is a very hard work since each website is built and maintained by different people, with different infrastructures, locations, languages, and structures. The only common aspect among them is represented by the standard exposed language, which, most of the time, is HTML. That's why the vast majority of the web scrapers, available as of today, are only able to understand and navigate HTML pages in a general-purpose way. One of the most used web parsers is named BeautifulSoup. It's written in Python, and it's very stable and simple to use. Moreover, it's able to detect errors and pieces of malformed code in the HTML page (always remember that web pages are often human-made products and prone to errors). A complete description of Beautiful Soup would require an entire book; here we will see just a few bits. First at all, Beautiful Soup is not a crawler. In order to download a web page, we should use the urllib library, for example. Let's now download the code behind the William Shakespeare page on Wikipedia: In: import urllib.request url = 'https://en.wikipedia.org/wiki/William_Shakespeare' request = urllib.request.Request(url) response = urllib.request.urlopen(request) It's time to instruct Beautiful Soup to read the resource and parse it using the HTML parser: In: from bs4 import BeautifulSoup soup = BeautifulSoup(response, 'html.parser') Now the soup is ready, and can be queried. To extract the title, we can simply ask for the title attribute: In: soup.title Out: <title>William Shakespeare - Wikipedia, the free encyclopedia</title> As you can see, the whole title tag is returned, allowing a deeper investigation of the nested HTML structure. What if we want to know the categories associated with the Wikipedia page of William Shakespeare? It can be very useful to create a graph of the entry, simply recurrently downloading and parsing adjacent pages. We should first manually analyze the HTML page itself to figure out what's the best HTML tag containing the information we're looking for. Remember here the “no free lunch” theorem in data science: there are no auto- discovery functions, and furthermore, things can change if Wikipedia modifies its format. After a manual analysis, we discover that categories are inside a div named “mw-normal- catlinks”; excluding the first link, all the others are okay. Now it's time to program. Let's put into code what we've observed, printing for each category, the title of the linked page and the relative link to it: In: section = soup.find_all(id='mw-normal-catlinks')[0] for catlink in section.find_all("a")[1:]: print(catlink.get("title"), "->", catlink.get("href")) Out: Category:William Shakespeare -> /wiki/Category:William_Shakespeare Category:1564 births -> /wiki/Category:1564_births Category:1616 deaths -> /wiki/Category:1616_deaths Category:16th-century English male actors -> /wiki/Category:16th- century_English_male_actors Category:English male stage actors -> /wiki/Category:English_male_stage_actors Category:16th-century English writers -> /wiki/Category:16th- century_English_writers We've used the find_all method twice to find all the HTML tags with the text contained in the argument. In the first case, we were specifically looking for an ID; in the second case, we were looking for all the "a" tags. Given the output then, and using the same code with the new URLs, it's possible to download recursively the Wikipedia category pages, arriving at this point at the ancestor categories. A final note about scraping: always remember that this practice is not always allowed, and when so, remember to tune down the rate of the download (at high rates, the website's server may think you're doing a small-scale DoS attack and will probably blacklist/ban your IP address). For more information, you can read the terms and conditions of the website, or simply contact the administrators. Downloading data from various sites where there are copyright laws in place is bound to get you into real legal trouble. That’s also why most companies that employ web scraping use external vendors for this task, or have a special arrangement with the site owners. We learned the process of data scraping, its practical use case in the industries and finally we put the theory to practice by learning how to scrape data from web with the help of Beautiful Soup. If you enjoyed this excerpt, check out the book Python Data Science Essentials - Second Edition to know more about different libraries such as pandas and NumPy, that can provide you with all the tools to load and effectively manage your data

0
0
3052

article-image-6-popular-regression-techniques-need-know

Amey Varangaonkar

02 Jan 2018

8 min read

6 Popular Regression Techniques You Need To Know

Amey Varangaonkar

02 Jan 2018

8 min read

0
0
2603

article-image-create-treemap-packed-bubble-chart-tableau

Sugandha Lahoti

02 Jan 2018

4 min read

How to create a Treemap and Packed Bubble Chart in Tableau

Sugandha Lahoti

02 Jan 2018

4 min read

[box type="note" align="" class="" width=""]This article is an excerpt from a book written by Shweta Sankhe-Savale titled Tableau Cookbook – Recipes for Data Visualization. This cookbook has simple recipes for creating visualizations in Tableau. It covers the fundamentals of data visualization such as getting familiarized with Tableau Desktop and also goes to more complex problems like creating dynamic analytics with parameters, and advanced calculations.[/box] In today’s tutorial, we will learn how to create a Treemap and a packed Bubble chart in Tableau. Treemaps Treemaps are useful for representing hierarchical (tree-structured) data as a part-to-whole relationship. It shows data as a set of nested rectangles, and each branch of the tree is given a rectangle, which represents the amount of data it comprises. These can then be further divided into smaller rectangles that represent sub branches, based on its proportion to the whole. We can show information via the color and size of the rectangles and find out patterns that would be difficult to spot in other ways. They make efficient use of the space and hence can display a lot of items in a single visualization simultaneously. Getting Ready We will create a Treemap to show the sales and profit across various product subcategories. Let's see how to create a Treemap. How to Do it We will first create a new sheet and rename it as Treemap. Next, we will drag Sales from the Measures pane and drop it into the Size shelf. We will then drag Profit from Measures pane and drop it into the Color shelf. Our Mark type will automatically change to show squares. Refer to the following image: 5. Next, we will drop Sub-Category into the Label shelf in the Marks card, and we will get the output as shown in the following image: How it Works In the preceding image, since we have placed Sales in the Size shelf, we are inferring this: the greater the size, the higher the sales value; the smaller the size, the smaller the sales value. Since the Treemap is sorted in descending order of Size, we will see the biggest block in the top left-hand side corner and the smaller block in the bottom right-hand side corner. Further, we placed Profit in the Color shelf. There are some subcategories where the profit is negative and hence Tableau selects the orange/blue diverging color. Thus, when the color blue is the darkest, it indicates the Most profit. However, the orange color indicates that a particular subcategory is in a loss scenario. So, in the preceding chart, Phones has the maximum number of sales. Further, Copiers has the highest profit. Tables, on the other hand, is non-profitable. Packed Bubble Charts A Packed bubble chart is a cluster of circles where we use dimensions to define individual bubbles, and the size and/or color of the individual circles represent measures. Bubble charts have many benefits and one of them is to let us spot categories easily and compare them to the rest of the data by looking at the size of the bubble. This simple data visualization technique can provide insight in a visually attractive format. The Packed Bubble chart in Tableau uses the Circle mark type. Getting Ready To create a packed bubble chart, we will continue with the same example that we saw in the Treemap recipe. In the following section, we will see how we can convert the Treemap we created earlier into a Packed Bubble chart. How to Do it Let us duplicate the Tree Map sheet name and rename it to Packed Bubble chart. Next, change the marks from Square to Circle from the Marks dropdown in the Marks card. The output will be as shown in the following image: How it works In the Packed Bubble chart, there is no specific sort of order for Bubbles. The size and/or color are what defines the chart; the bigger or darker the circle, the greater the value. So, in the preceding example, we have Sales in the Size shelf, Profit in the Color shelf, and Sub-Category in the Label shelf. Thus, when we look at it, we understand that Phones has the most sales. Further, Copiers has the highest profit. Tables, on the other hand, is non-profitable even though the size indicates that the sales are fairly good. We saw two ways to visualize data by using Treemap and Packed Bubble chart types in Tableau. If you found this post is useful, do check out the book Tableau Cookbook – Recipes for Data Visualization to create more such charts, interactive dashboards and other beautiful data visualizations with Tableau.

0
0
4937

article-image-2017-generative-adversarial-networks-gans-research-milestones

Savia Lobo

30 Dec 2017

9 min read

2017 Generative Adversarial Networks (GANs) Research Milestones

Savia Lobo

30 Dec 2017

9 min read

Generative Adversarial Models, introduced by Ian Goodfellow, are the next big revolution in the field of deep learning. Why? Because of their ability to perform semi-supervised learning where there is a vast majority of data is unlabelled. Here, GANs can efficiently carry out image generation tasks and other tasks such as converting sketches to an image, conversion of satellite images to a map, and many other tasks. GANs are capable of generating realistic images in any circumstances, for instance, giving some text written in a particular handwriting as an input to the generative model in order to generate more texts in the similar handwriting. The speciality of these GANs is that as compared to discriminative models, these generative models make use of a joint distribution probability to generate more likely samples. In short, these generative models or GANs are an improvisation to the discriminative models. Let’s explore some of the research papers that are contributing to further advancements in GANs. CycleGAN: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks This paper talks about CycleGANs, a class of generative Adversarial networks that carry out Image-to-Image translation. This means, capturing special characteristics of one image collection and figuring out how these characteristics could be translated into the other image collection, all in the absence of any paired training examples. CycleGANs method can also be applied in variety of applications such as Collection Style Transfer, Object Transfiguration, season transfer and photo enhancement. Cycle GAN architecture Source: GitHub CycleGANs are built upon the advantages of PIX2PIX architecture. The key advantage of CycleGANs model is, it allows to point the model at two discrete, unpaired collection of images.For example, one image collection say Group A, would consist photos of landscapes in summer whereas Group B would include photos of landscapes in winter. The CycleGAN model can learn to translate the images between these two aesthetics without the need to merge tightly correlated matches together into a single X/Y training image. Source: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks The way CycleGANs are able to learn such great translations without having explicit X/Y training images involves introducing the idea of a full translation cycle to determine how good the entire translation system is, thus improving both generators at the same time. Source: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks Currently, the applications of CycleGANs can be seen in Image-to-Image translation and video translations. For example they can be seen used in Animal Transfiguration, Turning portrait faces into doll faces, and so on. Further ahead, we could potentially see its implementations in audio, text, etc., would help us in generating new data for training. Although this method has compelling results, it also has some limitations The geometric changes within an image are not fully successful (for instance, the cat to dog transformation showed minute success). This could be caused by the generator architecture choices, which are tailored for good performance on the appearance changes. Thus, handling more varied and extreme transformations, especially geometric changes, is an important problem. Failure caused by the distribution characteristics of the training datasets. For instance, in the horse to zebra transfiguration, the model got confused as it was trained on the wild horse and zebra synsets of ImageNet, which does not contain images of a person riding a horse or zebra. These and some other limitations are described in the research paper. To read more about CycleGANs in detail visit the link here. Wasserstein GAN In this paper, we get an exposure to Wasserstein GANs and how they overcomes the drawbacks in original GANs. Although GANs have shown a drastic success in realistic image generation, the training however is not that easy as the process is slow and unstable. In the paper proposed for WGANs, it is empirically shown that WGANs cure the training problem. Wasserstein distance, also known as Earth Mover’s (EM) distance, is a measure of distance between two probability distributions. The basic idea in WGAN is to replace the loss function so that there always exists a non-zero gradient. This can be done using Wasserstein distance between the generator distribution and the data distribution. Training these WGANs does not require keeping a balance in training of the discriminator and the generator. It also doesn’t require a design of the network architecture too. One of the most fascinating practical benefits of WGANs is the ability to continuously estimate the EM distance by training the discriminator to an optimal level. The learning curves when used for plotting are useful for debugging and hyperparameter searches. These curves also correlate well with the observed sample quality and improved stability of the optimization process. Thus, Wasserstein GANs are an alternative to traditional GAN training with features such as: Improvement in the stability of learning Elimination of problems like mode collapse Provide meaningful learning curves useful for debugging and hyperparameter searches Furthermore, the paper also showcases that the corresponding optimization problem is sound, and provides extensive theoretical work highlighting the deep connections to other distances between distributions. The Wasserstein GAN has been utilized to train a language translation machine. The condition here is that there is no parallel data between the word embeddings between the two languages. Wasserstein GANs have been used to perform English-Russian and English-Chinese language mappings. Limitations of WGANs: WGANs suffer from unstable training at times, when one uses a momentum based optimizer or when one uses high learning rates. Includes slow convergence after weight clipping, especially when clipping window is too large. It also suffers from the vanishing gradient problem when the clipping window is too small. To have a detailed understanding of WGANs have a look at the research paper here. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets This paper describes InfoGAN (Information-theoretic extension to the Generative Adversarial Network). It can learn disentangled representations in a completely unsupervised manner. In traditional GANs, learned dataset is entangled i.e. encoded in a complex manner within the data space. However, if the representation is disentangled, it would be easy to implement and easy to apply tasks on it. InfoGAN solves the entangled data problem in GANs. Specifically, InfoGAN successfully disentangles writing styles from digit shapes on the MNIST dataset, extracts poses of objects correctly irrespective of the lighting conditions within the 3D rendered images, and background digits from the central digit on the SVHN dataset. It also discovers visual concepts that include hairstyles, presence/absence of eyeglasses, and emotions on the CelebA face dataset. InfoGAN does not require any kind of supervision. In comparison to InfoGAN, the only other unsupervised method that learns disentangled representations is hossRBM, a higher-order extension of the spike-and-slab restricted Boltzmann machine which disentangles emotion from identity on the Toronto Face Dataset. However, hossRBM can only disentangle discrete latent factors, and its computation cost grows exponentially in the number of factors. Whereas, InfoGAN can disentangle both discrete and continuous latent factors, scale to complicated datasets, and typically requires no more training time than regular GAN. In the experiments given in the paper, firstly the comparison of InfoGAN with prior approaches on relatively clean datasets is shown. Another experiment shown is, where InfoGAN can learn interpretable representations on complex datasets (here no previous unsupervised approach is known to learn representations of comparable quality.) Thus, InfoGAN is completely unsupervised and learns interpretable and disentangled representations on challenging datasets. Additionally, InfoGAN adds only negligible computation cost on top of GAN and is easy to train. The core idea of using mutual information to induce representation can be applied to other methods like VAE (Variational AutoEncoder) in future. The other possibilities with InfoGAN in future could be,learning hierarchical latent representations, improving semi-supervised learning with better codes, and using InfoGAN as a high-dimensional data discovery tool. To know more about this research paper in detail, visit the link given here. Progressive growing of GANs for improved Quality, Stability, and Variation This paper describes a brand new method for training your Generative Adversarial Networks. The basic idea here is to train both the generator and the discriminator progressively. This means, starting from a low resolution and adding new layers so that the model increases in providing images with finer details as training progresses. Such a method speeds up the training and also stabilizes it to a greater extent, which in turn produces images of unprecedented quality. For instance, a higher quality version of the CELEBA images dataset that provides output resolutions up to 10242 pixels. Source: https://arxiv.org/pdf/1710.10196.pdf When new layers are added to the networks, they fade in smoothly. This helps in avoiding the sudden shocks to the already well-trained, smaller resolution layers. Also, the progressive training has various other benefits. The generation of smaller images is substantially more stable because there is less class information and fewer modes By increasing the resolution little by little, we are continuously asking a much simpler question compared to the end goal of discovering a mapping from latent vectors to e.g. 10242 images Progressive growing of GANs also reduces the training time. In addition to this, most of the iterations are done at lower resolutions, and the quality of the result obtained is upto 2-6 times faster, depending on the resolution of the final output. Thus, by progressively training GANs results into better quality, stability, and variation in images. This may also lead to true photorealism in near future. The paper concludes with the fact that, though there are certain limitations with this training method, which include semantic sensibility and understanding dataset-dependent constraints(such as certain objects being straight rather than curved). This leaves a lot to be desired from GANs and there is also room for improvement in the micro-structure of the images. To have a thorough understanding of this research paper, read the paper here.

0
0
3085

article-image-create-box-whisker-plot-tableau

Sugandha Lahoti

30 Dec 2017

5 min read

How to create a Box and Whisker Plot in Tableau

Sugandha Lahoti

30 Dec 2017

5 min read

[box type="note" align="" class="" width=""]This article is an excerpt from a book written by Shweta Sankhe-Savale, titled Tableau Cookbook – Recipes for Data Visualization. With the recipes in this book, learn to create beautiful data visualizations in no time on Tableau.[/box] In today’s tutorial, we will learn how to create a Box and Whisker plot in Tableau. The Box plot, or Box and Whisker plot as it is popularly known, is a convenient statistical representation of the variation in a statistical population. It is a great way of showing a number of data points as well as showing the outliers and the central tendencies of data. This visual representation of the distribution within a dataset was first introduced by American mathematician John W. Tukey in 1969. A box plot is significantly easier to plot than say a histogram and it does not require the user to make assumptions regarding the bin sizes and number of bins; and yet it gives significant insight into the distribution of the dataset. The box plot primarily consists of four parts: The median provides the central tendency of our dataset. It is the value that divides our dataset into two parts, values that are either higher or lower than the median. The position of the median within the box indicates the skewness in the data as it shifts either towards the upper or lower quartile. The upper and lower quartiles, which form the box, represent the degree of dispersion or spread of the data between them. The difference between the upper and lower quartile is called the Interquartile Range (IQR) and it indicates the mid-spread within which 50 percentage of the points in our dataset lie. The upper and lower whiskers in a box plot can either be plotted at the maximum and minimum value in the dataset, or 1.5 times the IQR on the upper and lower side. Plotting the whiskers at the maximum and minimum values includes 100 percentage of all values in the dataset including all the outliers. Whereas plotting the whiskers at 1.5 times the IQR on the upper and lower side represents outliers in the data beyond the whiskers. The points lying between the lower whisker and the lower quartile are the lower 25 percent of values in the dataset, whereas the points lying between the upper whisker and the upper quartile are the upper 25 percent of values in the dataset. In a typical normal distribution, each part of the box plot will be equally spaced. However, in most cases, the box plot will quickly show the underlying variations and trends in data and allows for easy comparison between datasets: Getting Ready Create a Box and Whisker plot in a new sheet in a workbook. For this purpose, we will connect to an Excel file named Data for Box plot & Gantt chart, which has been uploaded on https://1drv.ms/f/ s!Av5QCoyLTBpnhkGyrRrZQWPHWpcY. Let us save this Excel file in Documents | My Tableau Repository | Datasources | Tableau Cookbook data folder. The data contains information about customers in terms of their gender and recorded weight. The data contains 100 records, one record per customer. Using this data, let us look at how we can create a Box and Whisker plot. How to do it Once we have downloaded and saved the data from the link provided in the Getting ready section, we will create a new worksheet in our existing workbook and rename it to Box and Whisker plot. Since we haven't connected to the new dataset yet, establish a new data connection by pressing Ctrl + D on our keyboard. Select the Excel option and connect to the Data for Box plot & Gantt chart file, which is saved in our Documents | My Tableau Repository | Datasources | Tableau Cookbook data folder. Next let us select the table named Box and Whisker plot data by doubleclicking on it. Let us go ahead with the Live option to connect to this data. Next let us multi-select the Customer and Gender field from the Dimensions pane and the Weight from the Measures pane by doing a Ctrl + Select. Refer to the following image: 6. Next let us click on the Show Me! button and select the box-and-whisker plot. Refer to the highlighted section in the following image: 7. Once we click on the box-and-whisker plot option, we will see the following view: How it works In the preceding chart, we get two box and whisker plots: one for each gender. The whiskers are the maximum and minimum extent of the data. Furthermore, in each category we can see some circles, which are essentially representing a customer. Thus, within each gender category, the graph is showing the distribution of customers by their respective weights. When we hover over any of these circles, we can see details of the customer in terms of name, gender, and recorded weight in the tooltip. Refer to the following image: However, when we hover over the box (gray section), we will see the details in terms of median, lower quartiles, upper quartiles, and so on. Refer to the following image: Thus, a summary of the box plot that we created is as follows: In more simple terms, for the female category, the majority of the population lies between the weight range of 44 to 75, whereas for the male category, the majority of the population lies between the weight range of 44 to 82. Please note that in our visualization, even though the Row shelf displays SUM(Weight), since we have Customer in the Detail shelf, there's only one entry per customer, so SUM(Weight) is actually the same as MIN(Weight), MAX(Weight), or AVG(Weight). We learnt the basics of Box and Whisker plot and how to create them using Tableau. If you had fun with this recipe, do check out our book Tableau Cookbook – Recipes for Data Visualization to create interactive dashboards and beautiful data visualizations with Tableau.

0
0
7265

article-image-saving-backups-on-cloud-services-with-elasticsearch-plugins

Savia Lobo

29 Dec 2017

9 min read

Saving backups on cloud services with ElasticSearch plugins

Savia Lobo

29 Dec 2017

9 min read

[box type="note" align="" class="" width=""]Our article is a book excerpt taken from Mastering Elasticsearch 5.x. written by Bharvi Dixit. This book guides you through the intermediate and advanced functionalities of Elasticsearch, such as querying, indexing, searching, and modifying data. In other words, you will gain all the knowledge necessary to master Elasticsearch and put into efficient use.[/box] This article will explain how Elasticsearch, with the help of additional plugins, allows us to push our data outside of the cluster to the cloud. There are three possibilities where our repository can be located, at least using officially supported plugins: The S3 repository: AWS The HDFS repository: Hadoop clusters The GCS repository: Google cloud services The Azure repository: Microsoft's cloud platform Let's go through these repositories to see how we can push our backup data on the cloud services. The S3 repository The S3 repository is a part of the Elasticsearch AWS plugin, so to use S3 as the repository for snapshotting, we need to install the plugin first on every node of the cluster and each node must be restarted after the plugin installation: sudo bin/elasticsearch-plugin install repository-s3 After installing the plugin on every Elasticsearch node in the cluster, we need to alter their configuration (the elasticsearch.yml file) so that the AWS access information is available. The example configuration can look like this: cloud: aws: access_key: YOUR_ACCESS_KEY secret_key: YOUT_SECRET_KEY To create the S3 repository that Elasticsearch will use for snapshotting, we need to run a command similar to the following one: curl -XPUT 'http://localhost:9200/_snapshot/my_s3_repository' -d '{ "type": "s3", "settings": { "bucket": "bucket_name" } }' The following settings are supported when defining an S3-based repository: bucket: This is the required parameter describing the Amazon S3 bucket to which the Elasticsearch data will be written and from which Elasticsearch will read the data. region: This is the name of the AWS region where the bucket resides. By default, the US Standard region is used. base_path: By default, Elasticsearch puts the data in the root directory. This parameter allows you to change it and alter the place where the data is placed in the repository. server_side_encryption: By default, encryption is turned off. You can set this parameter to true in order to use the AES256 algorithm to store data. chunk_size: By default, this is set to 1GB and specifies the size of the data chunk that will be sent. If the snapshot size is larger than the chunk_size, Elasticsearch will split the data into smaller chunks that are not larger than the size specified in the chunk_size. The chunk size can be specified in size notations such as 1GB, 100mb, and 1024kB. buffer_size: The size of this buffer is set to 100mb by default. When the chunk size is greater than the value of the buffer_size, Elasticsearch will split it into buffer_size fragments and use the AWS multipart API to send it. The buffer size cannot be set lower than 5 MB because it disallows the use of the multipart API. endpoint: This defaults to AWS's default S3 endpoint. Setting a region overrides the endpoint setting. protocol: Specifies whether to use http or https. It default to cloud.aws.protocol or cloud.aws.s3.protocol. compress: Defaults to false and when set to true. This option allows snapshot metadata files to be stored in a compressed format. Please note that index files are already compressed by default. read_only: Makes a repository to be read only. It defaults to false. max_retries: This specifies the number of retries Elasticsearch will take before giving up on storing or retrieving the snapshot. By default, it is set to 3. In addition to the preceding properties, we are allowed to set two additional properties that can overwrite the credentials stored in elasticsearch.yml, which will be used to connect to S3. This is especially handy when you want to use several S3 repositories, each with its own security settings: access_key: This overwrites cloud.aws.access_key from elasticsearch.yml secret_key: This overwrites cloud.aws.secret_key from elasticsearch.yml Note: AWS instances resolve S3 endpoints to a public IP. If the Elasticsearch instances reside in a private subnet in an AWS VPC then all traffic to S3 will go through that VPC's NAT instance. If your VPC's NAT instance is a smaller instance size (for example, a t1.micro) or is handling a high volume of network traffic, your bandwidth to S3 may be limited by that NAT instance's networking bandwidth limitations. So, if you running your Elasticsearch cluster inside a VPC then make sure that you are using instances with a high networking bandwidth and there is no network congestion. Note: Instances residing in a public subnet in an AWS VPC will connect to S3 via the VPC's Internet gateway and not be bandwidth limited by the VPC's NAT instance. The HDFS repository If you use Hadoop and its HDFS (http://wiki.apache.org/hadoop/HDFS) filesystem, a good alternative to back up the Elasticsearch data is to store it in your Hadoop cluster. As with the case of S3, there is a dedicated plugin for this. To install it, we can use the following command: sudo bin/elasticsearch-plugin install repository-hdfs Note : The HDFS snapshot/restore plugin is built against the latest Apache Hadoop 2.x (currently 2.7.1). If your Hadoop distribution is not protocol compatible with Apache Hadoop, you can replace the Hadoop libraries inside the plugin folder with your own (you might have to adjust the security permissions required). Note: Even if Hadoop is already installed on the Elasticsearch nodes, for security reasons, the required libraries need to be placed under the plugin folder. Note that in most cases, if the distribution is compatible, one simply needs to configure the repository with the appropriate Hadoop configuration files. After installing the plugin on each node in the cluster and restarting every node, we can use the following command to create a repository in our Hadoop cluster: curl -XPUT 'http://localhost:9200/_snapshot/es_hdfs_repository' -d '{ "type": "hdfs" "settings": { "uri": "hdfs://namenode:8020/", "path": "elasticsearch_snapshots/es_hdfs_repository" } }' The available settings that we can use are as follows: uri: This is a required parameter that tells Elasticsearch where HDFS resides. It should have a format like hdfs://HOST:PORT/. path: This is the information about the path where snapshot files should be stored. It is a required parameter. load_default: This specifies whether the default parameters from the Hadoop configuration should be loaded and set to false if the reading of the settings should be disabled. This setting is enabled by default. chunk_size: This specifies the size of the chunk that Elasticsearch will use to split the snapshot data. If you want the snapshotting to be faster, you can use smaller chunks and more streams to push the data to HDFS. By default, it is disabled. conf.<key>: This is an optional parameter and tells where a key is in any Hadoop argument. The value provided using this property will be merged with the configuration. As an alternative, you can define your HDFS repository and its settings inside the elasticsearch.yml file of each node as follows: repositories: hdfs: uri: "hdfs://<host>:<port>/" path: "some/path" load_defaults: "true" conf.<key> : "<value>" compress: "false" chunk_size: "10mb" The Azure repository Just like Amazon S3, we are able to use a dedicated plugin to push our indices and metadata to Microsoft cloud services. To do this, we need to install a plugin on every node of the cluster, which we can do by running the following command: sudo bin/elasticsearch-plugin install repository-azure The configuration is also similar to the Amazon S3 plugin configuration. Our elasticsearch.yml file should contain the following section: cloud: azure: storage: my_account: account: your_azure_storage_account key: your_azure_storage_key Do not forget to restart all the nodes after installing the plugin. After Elasticsearch is configured, we need to create the actual repository, which we do by running the following command: curl -XPUT 'http://localhost:9200/_snapshot/azure_repository' -d '{ "type": "azure" }' The following settings are supported by the Elasticsearch Azure plugin: account: Microsoft Azure account settings to be used. container: As with the bucket in Amazon S3, every piece of information must reside in the container. This setting defines the name of the container in the Microsoft Azure space. The default value is elasticsearch-snapshots. base_path: This allows us to change the place where Elasticsearch will put the data. By default, the value for this setting is empty which causes Elasticsearch to put the data in the root directory. compress: This defaults to false and when enabled it allows us to compress the metadata files during the snapshot creation. chunk_size: This is the maximum chunk size used by Elasticsearch (set to 64m by default, and this is also the maximum value allowed). You can change it to change the size when the data should be split into smaller chunks. You can change the chunk size using size value notations such as, 1g, 100m, or 5k. An example of creating a repository using the settings follows: curl -XPUT "http://localhost:9205/_snapshot/azure_repository" -d' { "type": "azure", "settings": { "container": "es-backup-container", "base_path": "backups", "chunk_size": "100m", "compress": true } }' The Google cloud storage repository Similar to Amazon S3 and Microsoft Azure, we can use a GCS repository plugin for snapshotting and restoring of our indices. The settings for this plugin are almost similar to other cloud plugins. To know how to work with the Google cloud repository plugin please refer to the following URL: https://www.elastic.co/guide/en/elasticsearch/plugins/5.0/repository-gcs.htm Thus, in the article we learn how to carry out backup of your data from Elasticsearch clusters to the cloud, i.e. within different cloud repositories by making use of the additional plugin options with Elasticsearch. If you found our excerpt useful, you may explore other interesting features and advanced concepts of Elasticsearch 5.x like aggregation, index control, sharding, replication, and clustering in the book Mastering Elasticsearch 5.x.

0
1
5294

2018 new year resolutions to thrive in the Algorithmic World - Part 3 of 3

Getting to know different Big data Characteristics

Creating reports using SQL Server 2016 Reporting Services

2018 new year resolutions to thrive in the Algorithmic World - Part 2 of 3

How to recognize Patterns with Neural Networks in Java

2018 new year resolutions to thrive in an Algorithmic World - Part 1 of 3

Popular Data sources and models in SAP Analytics Cloud

Write your first Blockchain: Learning Solidity Programming in 15 minutes

Getting started with Linear and logistic regression

Getting started with Web Scraping with Beautiful Soup

Trending Topics

6 Popular Regression Techniques You Need To Know

How to create a Treemap and Packed Bubble Chart in Tableau

2017 Generative Adversarial Networks (GANs) Research Milestones

How to create a Box and Whisker Plot in Tableau

Saving backups on cloud services with ElasticSearch plugins