
Tech Guides - Data

281 Articles
How should web developers learn machine learning?

Chris Tava
12 Jun 2017
6 min read
Do you have the motivation to learn machine learning? Given its relevance in today's landscape, you should be motivated to learn about this field. But if you're a web developer, how do you go about learning it? In this article, I show you how.

So, let's break this down. What is machine learning? You may be wondering why machine learning matters to you, or how you would even go about learning it. Machine learning is a smart way to create software that finds patterns in data without having to explicitly program for each condition. Sounds too good to be true? Well, it is. Quite frankly, many of the state-of-the-art solutions to the toughest machine learning problems don't even come close to reaching 100 percent accuracy and precision. This might not sound right to you if you've been trained, or have learned, to be precise and deterministic with the solutions you provide for the web applications you've worked on.

In fact, machine learning is such a challenging problem domain that data scientists describe problems as either tractable or not. Computer algorithms can solve tractable problems in a reasonable amount of time with a reasonable amount of resources, whereas intractable problems simply can't be solved. Decades more of R&D are needed at a deep theoretical level to bring forward approaches and frameworks that will then take years to be applied and become useful to society.

Did I scare you off? No? Okay, great. Then you accept the challenge to learn machine learning. But before we dive into how to learn machine learning, let's answer the question: why does learning machine learning matter to you? Well, you're a technologist, and as a result it's your duty, your obligation, to be on the cutting edge. The technology world is moving at a fast clip, and it's accelerating. Take, for example, the shrinking interval between public accomplishments of machine learning systems against top gaming experts: it took a long time to get to the 2011 Watson v. Jeopardy! champion, and far less time between AlphaGo and Libratus.

So what's the significance for you and your professional software engineering career? Elementary, my dear Watson: just as there is a so-called digital divide between non-technical and technical people, there is already the start of a technological divide between top systems engineers and the rest of the field in terms of making an impact and disrupting the way the world works. Don't believe me? When's the last time you programmed a self-driving car or a neural network that can guess your drawings?

Making an impact and how to learn machine learning

The toughest part about getting started with machine learning is figuring out what type of problem you have at hand, because you run the risk of jumping to potential solutions too quickly, before understanding the problem. Sure, you can say this of any software design task, but the point can't be stressed enough when thinking about how to get machines to recognize patterns in data. There are specific applications of machine learning algorithms that solve a very specific problem in a very specific way, and it's difficult to know how to solve a meta-problem if you haven't studied the field from a conceptual standpoint. For me, a breakthrough in learning machine learning came from taking Andrew Ng's machine learning course on Coursera, so taking online courses can be a good way to start learning. If you don't have the time, you can learn about machine learning through numbers and images. Let's take a look.
Numbers

Conceptually speaking, predicting a pattern in a single variable based on a direct (otherwise known as linear) relationship with another piece of data is probably the easiest machine learning problem and solution to understand and implement. The following script predicts the amount of data that will be created based on fitting a sample data set to a linear regression model: https://github.com/ctava/linearregression-go. Because the sample data fits a linear model reasonably well, the machine learning program predicted that the data created in the fictitious Bob's system will grow from 2017 to 2018:

Bob's data, 2017: 4401
Bob's data, 2018 (prediction): 5707

This is great news for Bob, and for you. You see, machine learning isn't so tough after all. I'd like to encourage you to save data for a single variable, also known as a feature, to a CSV file and see whether the data has a linear relationship with time (a minimal Python sketch of the same idea appears at the end of this article). The following website is handy for calculating the number of days between two dates: https://www.timeanddate.com/date/duration.html. Be sure to choose your starting day and year appropriately at the top of the file to fit your data.

Images

Machine learning on images is exciting! It's fun to see what the computer comes up with in terms of pattern recognition, or image recognition. Here's an example that uses computer vision to detect that Grumpy Cat is actually a Persian cat: https://github.com/ctava/tensorflow-go-imagerecognition. If setting up TensorFlow from source isn't your thing, not to worry; here's a Docker image to start off with: https://github.com/ctava/tensorflow-go. Once you've followed the instructions in the readme.md file, simply:

Get github.com/ctava/tensorflow-go-imagerecognition
Run main.go -dir=./ -image=./grumpycat.jpg
Result: BEST MATCH: (66% likely) Persian cat

Sure, there is a whole discussion to be had on this topic alone in terms of what TensorFlow is, what a tensor is, and what image recognition is. But I just wanted to spark your interest so that maybe you'll start to look at the amazing advances in the computer vision field. Hopefully this has motivated you to learn more about machine learning, based on reading about the recent advances in the field and seeing two simple examples: predicting numbers and classifying images. I'd like to encourage you to keep up with data science in general.

About the author

Chris Tava is a software engineering / product leader with 20 years of experience delivering applications for B2C and B2B businesses. His specialties include program strategy, product and project management, agile software engineering, resource management, recruiting, people development, business analysis, machine learning, ObjC / Swift, Golang, Python, Android, Java, and JavaScript.
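As a rough companion to the Go example linked above, here is a minimal Python sketch of the same idea: fitting a single feature (elapsed days) to a count with linear regression and extrapolating ahead. The numbers are made up for illustration, and the script assumes NumPy and scikit-learn are installed.

```python
# Minimal linear-regression sketch (illustrative only; values are made up).
# Requires: pip install numpy scikit-learn
import numpy as np
from sklearn.linear_model import LinearRegression

# Days elapsed since the start of observation, and records created on each day.
days = np.array([[0], [30], [60], [90], [120], [150]])   # feature: elapsed days
records = np.array([400, 520, 610, 750, 860, 990])       # target: records created

model = LinearRegression()
model.fit(days, records)

# Extrapolate one year (365 days) past the first observation.
prediction = model.predict(np.array([[365]]))
print(f"Predicted records created on day 365: {prediction[0]:.0f}")
```

If your own CSV data roughly follows a straight line when plotted against time, a fit like this will give a reasonable first prediction; if it doesn't, that's a sign a linear model is the wrong tool.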

5 Ways Artificial Intelligence is Transforming the Gaming Industry

Amey Varangaonkar
01 Dec 2017
7 min read
Imagine yourself playing a strategy game, like Age of Empires perhaps. You are in a world that looks real, you are pitted against the computer, and your mission is to protect your empire and defeat the computer at the same time. What if you could create an army of soldiers who could explore the map and attack the enemies on their own, based on just a simple command you give them? What if your soldiers could have real, unscripted conversations with you, as their commander-in-chief, to seek instructions? And what if the game's scenes changed spontaneously based on your decisions and interactions with the game elements, like a movie? Sounds too good to be true? It's not far-fetched at all, thanks to the rise of Artificial Intelligence!

The gaming industry today is a market worth over a hundred billion dollars. The Global Games Market Report says that about 2.2 billion gamers across the world are expected to generate an incredible $108.9 billion in game revenue by the end of 2017. As such, gaming industry giants are seeking newer and more innovative ways to attract more customers and expand their brands. While terms like Virtual Reality, Augmented Reality, and Mixed Reality come to mind immediately as the future of games, the rise of Artificial Intelligence is an equally important stepping stone in making games smarter, more interactive, and as close to reality as possible. In this article, we look at five ways AI is revolutionizing the gaming industry, in a big way!

Making games smarter

While scripting is still commonly used to control NPCs (non-playable characters) in many games today, many heuristic algorithms and game AIs are also being incorporated to control these NPCs. Not just that: the characters also learn from the actions taken by the player and modify their behaviour accordingly. This concept can be seen implemented in Nintendogs, a real-time pet simulation video game by Nintendo. The ultimate aim of game creators in the future will be to design robust systems within games that understand speech, noise, and other sounds within the game and tweak the game scenario accordingly. This will also require modern AI techniques such as pattern recognition and reinforcement learning, where the characters within the games self-learn from their own actions and evolve accordingly. The game industry has identified this, and some studios have started implementing these ideas; games like F.E.A.R. and The Sims are a testament to this. Although the adoption of popular AI techniques in gaming is still quite limited, their possible applications in the near future have the entire gaming industry buzzing.

Making games more realistic

This is one area where the game industry has grown leaps and bounds over the last 10 years. There have been incredible advancements in 3D visualization techniques, physics-based simulations, and, more recently, the inclusion of Virtual Reality and Augmented Reality in games. These tools have empowered game developers to create interactive, visually appealing games that one could never have imagined a decade ago. Meanwhile, gamers have evolved too. They don't just want good graphics anymore; they want games to resemble reality. This is a massive challenge for game developers, and AI is playing a huge role in addressing this need. Imagine a game that can interpret and respond to your in-game actions, anticipate your next move, and act accordingly.
Not the usual scripts where an action X gives a response Y, but an AI program that chooses the best possible response to your action in real time, making the game more realistic and enjoyable for you (a toy sketch of this kind of decision logic appears at the end of this article).

Improving the overall gaming experience

Let's take a real-world example here. If you've played EA Sports' FIFA 17, you may be well versed with its Ultimate Team mode. For the uninitiated, it's more of a fantasy draft, where you can pick one of five player choices given to you for each position in your team, and the AI automatically determines the team chemistry based on your choices. The team chemistry here is important: the higher the team chemistry, the better the chances of your team playing well. The in-game AI also makes the playing experience better by making it more interactive. Suppose you're losing a match against an opponent; the AI reacts by boosting your team's morale through increased fan chants, which in turn affects player performance positively. Gamers these days pay a lot of attention to detail. This not only includes the visual appearance and high-end graphics, but also how immersive and interactive the game is, in all possible ways. Through real-time customization of scenarios, AI has the capability to play a crucial role in taking the gaming experience to the next level.

Transforming developer skills

The game developer community has always been quick to adopt cutting-edge technology to hone its technical skills and creativity. Reinforcement learning, the subset of machine learning behind AlphaGo, the AI program that beat the world's best human Go player, is a case in point. Even for traditional game developers, the rising adoption of AI in games will mean a change in the way games are developed. In an interview with Gamasutra, AiGameDev.com's Alex Champandard says something interesting: "Game design that hinges on more advanced AI techniques is slowly but surely becoming more commonplace. Developers are more willing to let go and embrace more complex systems."

It's safe to say that the notion of game AI is changing drastically. Concepts such as smarter function-based movement, pathfinding, genetic algorithms, and rule-based AI such as fuzzy logic are being increasingly incorporated into games, although not yet at a very large scale. There are implementation challenges around bringing academic AI techniques into games, but with time these algorithms and techniques are expected to blend more seamlessly with traditional game development skills. As such, in addition to knowledge of traditional game development tools and techniques, game developers will now have to skill up on these AI techniques to make smarter, more realistic, and more interactive games.

Making smarter mobile games

The rise of the mobile game industry is evident from the fact that close to 50% of game revenue in 2017 will come from mobile games, whether on smartphones or tablets. The increasingly high processing power of these devices has allowed developers to create more interactive and immersive mobile games. However, it is important to note that the processing power of mobile devices has yet to catch up with their desktop counterparts, not to mention dedicated gaming consoles, which are beyond comparison at this stage.
To tackle this issue, mobile game developers are experimenting with different machine learning and AI algorithms to impart 'smartness' to mobile games while still adhering to processing power limits. Compare today's mobile games to the ones from five years back and you'll notice a tremendous shift in the visual appearance of the games and how interactive they have become. New machine learning and deep learning frameworks and libraries are being developed to cater specifically to the mobile platform; Google's TensorFlow Lite and Facebook's Caffe2 are instances of such development. Soon, these tools will come to developers' rescue to build smarter and more interactive mobile games.

In conclusion

Gone are the days when games were just about entertainment and passing time. The gaming industry is now one of the most profitable industries of today. As it continues to grow, the demands of the gaming community and the games themselves keep evolving. The need for realism in games is higher than ever, and AI has an important role to play in making games more interactive, immersive, and intelligent. With the rate at which new AI techniques and algorithms are developing, it's an exciting time for game developers to showcase their full potential. Are you ready to start building AI for your own games? Here are some books to help you get started:

Practical Game AI Programming
Learning game AI programming with Lua
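To make the "choose the best possible response" idea from the realism section above a little more concrete, here is a minimal, illustrative Python sketch of utility-based action selection for an NPC. The actions, scoring weights, and game state are invented for the example and are not taken from any particular engine or from the books listed above.

```python
# Toy utility-based action selection for an NPC (illustrative only).
# Each candidate action is scored against the current game state, and the
# NPC picks the highest-scoring one instead of following a fixed script.

def score_attack(state):
    # Attacking looks better when the NPC is healthy and the player is weak.
    return state["npc_health"] * 0.6 + (1.0 - state["player_health"]) * 0.8

def score_retreat(state):
    # Retreating looks better when the NPC is badly hurt.
    return (1.0 - state["npc_health"]) * 1.2

def score_call_allies(state):
    # Calling for help is attractive when allies are nearby.
    return state["allies_nearby"] * 0.9

ACTIONS = {
    "attack": score_attack,
    "retreat": score_retreat,
    "call_allies": score_call_allies,
}

def choose_action(state):
    # Pick the action with the highest utility for the current state.
    return max(ACTIONS, key=lambda name: ACTIONS[name](state))

if __name__ == "__main__":
    game_state = {"npc_health": 0.3, "player_health": 0.9, "allies_nearby": 0.7}
    print(choose_action(game_state))  # prints "retreat" for this state
```

Real game AI layers much more on top of this (learning, pathfinding, fuzzy rules), but the core shift the article describes is exactly this: scoring alternatives against the live game state rather than hard-coding one response per action.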

Predictive Analytics with AWS: A quick look at Amazon ML

Natasha Mathur
09 Aug 2018
9 min read
As artificial intelligence and big data have become a ubiquitous part of our everyday lives, cloud-based machine learning services are part of a rising billion-dollar industry. Among the several services currently available in the market, Amazon Machine Learning stands out for its simplicity. In this article, we will look at Amazon Machine Learning, MLaaS, and other related concepts. This article is an excerpt taken from the book 'Effective Amazon Machine Learning' written by Alexis Perrier.

Machine Learning as a Service

Amazon Machine Learning is an online service by Amazon Web Services (AWS) that does supervised learning for predictive analytics. Launched in April 2015 at the AWS Summit, Amazon ML joins a growing list of cloud-based machine learning services, such as Microsoft Azure, Google Prediction, IBM Watson, Prediction IO, BigML, and many others. These online machine learning services form an offering commonly referred to as Machine Learning as a Service, or MLaaS, following the naming pattern of other cloud-based services such as SaaS, PaaS, and IaaS (Software, Platform, and Infrastructure as a Service, respectively).

Studies show that MLaaS is a potentially big business trend. ABI Research, a business intelligence consultancy, estimates that machine learning-based data analytics tools and services revenues will hit nearly $20 billion in 2021 as MLaaS services take off, as outlined in its business report. Eugenio Pasqua, a Research Analyst at ABI Research, said the following:

"The emergence of the Machine-Learning-as-a-Service (MLaaS) model is good news for the market, as it cuts down the complexity and time required to implement machine learning and thus opens the doors to an increase in its adoption level, especially in the small-to-medium business sector."

The increased accessibility is a direct result of using an API-based infrastructure to build machine-learning models instead of developing applications from scratch. Offering efficient predictive analytics models without the need to code, host, and maintain complex code bases lowers the bar and makes ML available to smaller businesses and institutions.

Amazon ML takes this democratization approach further than the other actors in the field by significantly simplifying the predictive analytics process and its implementation. This simplification revolves around four design decisions that are embedded in the platform:

- A limited set of tasks: binary classification, multiclass classification, and regression
- A single linear algorithm
- A limited choice of metrics to assess the quality of the prediction
- A simple set of tuning parameters for the underlying predictive algorithm

That somewhat constrained environment is simple enough while addressing most predictive analytics problems relevant to business, and it can be leveraged across an array of different industries and use cases. Let's see how!

Leveraging full AWS integration

The AWS data ecosystem of pipelines, storage, environments, and Artificial Intelligence (AI) is also a strong argument in favor of choosing Amazon ML as a business platform for predictive analytics needs. Although Amazon ML is simple, the service evolves to greater complexity and more powerful features once it is integrated into a larger structure of AWS data-related services. AWS is already a major factor in cloud computing.
Here's what an excerpt from The Economist, August 2016, has to say about AWS (http://www.economist.com/news/business/21705849-how-open-source-software-and-cloud-computing-have-set-up-it-industry):

AWS shows no sign of slowing its progress towards full dominance of cloud computing's wide skies. It has ten times as much computing capacity as the next 14 cloud providers combined, according to Gartner, a consulting firm. AWS's sales in the past quarter were about three times the size of its closest competitor, Microsoft's Azure.

This gives an edge to Amazon ML, as many companies that are using cloud services are likely to be already using AWS. Adding simple and efficient machine learning tools to the product mix anticipates the rise of predictive analytics features as a standard component of web services. Seamless integration with other AWS services is a strong argument in favor of using Amazon ML despite its apparent simplicity. A case study from an AWS January 2016 white paper titled Big Data Analytics Options on AWS (http://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf) presents a potential AWS architecture for sentiment analysis on social media and shows how Amazon ML can be part of a more complex architecture of AWS services.

Comparing performances in Amazon ML services

Keeping systems and applications simple is always difficult, but often worth it for the business. Examples abound of overloaded UIs bringing down the user experience, while products with simple, elegant interfaces and minimal features enjoy widespread popularity. The Keep It Simple mantra is even more difficult to adhere to in a context such as predictive analytics, where performance is key. This is the challenge Amazon took on with its Amazon ML service.

A typical predictive analytics project is a sequence of complex operations: getting the data, cleaning the data, selecting, optimizing, and validating a model, and finally making predictions. In the scripting approach, data scientists develop codebases using machine learning libraries such as the Python scikit-learn library or R packages to handle all these steps, from data gathering to predictions in production. Just as a developer breaks down the necessary steps into modules for maintainability and testability, Amazon ML breaks down a predictive analytics project into different entities: datasource, model, evaluation, and predictions. It's the simplicity of each of these steps that makes AWS so powerful for implementing successful predictive analytics projects.

Engineering data versus model variety

Having a large choice of algorithms for your predictions is always a good thing, but at the end of the day, domain knowledge and the ability to extract meaningful features from clean data are often what win the game. Kaggle is a well-known platform for predictive analytics competitions, where the best data scientists across the world compete to make predictions on complex datasets. In these competitions, gaining a few decimals on your prediction score is what makes the difference between earning the prize or being just an extra line on the public leaderboard among thousands of other competitors. One thing Kagglers quickly learn is that choosing and tuning the model is only half the battle. Feature extraction, or how to extract relevant predictors from the dataset, is often the key to winning the competition.
In real life, when working on business-related problems, the quality of the data processing phase and the ability to extract meaningful signal out of raw data are the most important and time-consuming parts of building an effective predictive model. It is well known that "data preparation accounts for about 80% of the work of data scientists" (http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/). Model selection and algorithm optimization remain an important part of the work but are often not the deciding factor when it comes to implementation. A solid and robust implementation that is easy to maintain and connects to your ecosystem seamlessly is often preferred to an overly complex model developed and coded in-house, especially when the scripted model only produces small gains compared to a service-based implementation.

Amazon's expertise and the gradient descent algorithm

Amazon has been using machine learning for the retail side of its business and has built serious expertise in predictive analytics. This expertise translates into the choice of algorithm powering the Amazon ML service. The Stochastic Gradient Descent (SGD) algorithm powers Amazon ML linear models and is ultimately responsible for the accuracy of the predictions generated by the service. SGD is one of the most robust, resilient, and optimized algorithms. It has been used in many diverse environments, from signal processing to deep learning, and for a wide variety of problems since the 1960s, with great success. SGD has also given rise to many highly efficient variants adapted to a wide variety of data contexts. We will come back to this important algorithm in a later chapter; suffice it to say at this point that SGD is the Swiss army knife of predictive analytics algorithms (a from-scratch sketch of SGD for a linear model appears at the end of this article).

Several benchmarks and tests of the Amazon ML service can be found across the web (Amazon, Google, and Azure: https://blog.onliquid.com/machine-learning-services-2/ and Amazon versus scikit-learn: http://lenguyenthedat.com/minimal-data-science-2-avazu/). Overall results show that Amazon ML's performance is on a par with other MLaaS platforms, as well as with scripted solutions based on popular machine learning libraries such as scikit-learn. For a given problem in a specific context, with an available dataset and a particular choice of scoring metric, it is probably possible to code a predictive model using an adequate library and obtain better performance than Amazon ML. But what Amazon ML offers is stability, an absence of coding, and a very solid benchmark record, as well as seamless integration with the Amazon Web Services ecosystem that already powers a large portion of the Internet.

Amazon ML service pricing strategy

As with other MLaaS providers and AWS services, Amazon ML only charges for what you consume.
The cost is broken down into the following:

- An hourly rate for the computing time used to build predictive models
- A prediction fee per thousand prediction samples
- In the context of real-time (streaming) predictions, a fee based on the memory allocated upfront for the model

The computational time increases as a function of the following:

- The complexity of the model
- The size of the input data
- The number of attributes
- The number and types of transformations applied

At the time of writing, these charges are as follows:

- $0.42 per hour for data analysis and model building fees
- $0.10 per 1,000 predictions for batch predictions
- $0.0001 per prediction for real-time predictions
- $0.001 per hour for each 10 MB of memory provisioned for your model

These prices do not include fees related to data storage (S3, Redshift, or RDS), which are charged separately. During the creation of your model, Amazon ML gives you a cost estimate based on the data source that has been selected. The Amazon ML service is not part of the AWS free tier, a 12-month offer applicable to certain AWS services for free under certain conditions.

To summarize, we presented a simple introduction to the Amazon ML service. Amazon ML is built on solid ground, with a simple yet very efficient algorithm driving its predictions. If you found this post useful, be sure to check out the book 'Effective Amazon Machine Learning' to learn about predictive analytics and other concepts in AWS machine learning.

Integrate applications with AWS services: Amazon DynamoDB & Amazon Kinesis [Tutorial]
AWS makes Amazon Rekognition, its image recognition AI, available for Asia-Pacific developers
AWS Elastic Load Balancing: support added for Redirects and Fixed Responses in Application Load Balancer
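The internals of the Amazon ML service are not public, so the following is only a minimal, from-scratch Python sketch of the general stochastic gradient descent idea behind linear models (squared loss, one weight update per sample). It is not Amazon's implementation; the learning rate and data are invented for illustration.

```python
# Minimal stochastic gradient descent for a linear model (illustrative only).
import random

# Toy data: y is roughly 3*x + 2 with a little noise baked into the values.
data = [(0.0, 2.1), (1.0, 5.0), (2.0, 7.9), (3.0, 11.2), (4.0, 13.8), (5.0, 17.1)]

w, b = 0.0, 0.0          # model parameters
learning_rate = 0.01

for epoch in range(200):
    random.shuffle(data)              # "stochastic": visit samples in random order
    for x, y in data:
        prediction = w * x + b
        error = prediction - y        # gradient of 0.5 * error**2 w.r.t. prediction
        w -= learning_rate * error * x
        b -= learning_rate * error

print(f"learned w={w:.2f}, b={b:.2f}")  # should end up close to w=3, b=2
```

The appeal of this family of algorithms for a managed service is clear from the sketch: each update touches one sample at a time, so the same loop scales from toy data to very large datasets streamed from storage.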

Julia for machine learning: will the new language pick up pace?

Prasad Ramesh
20 Oct 2018
4 min read
Machine learning can be done in many languages, with Python and R being the most popular. But one language has been overlooked for some time: Julia.

Why isn't machine learning in Julia a thing?

Julia isn't an obvious choice for machine learning simply because it's a new language that has only recently hit version 1.0. While Python is well established, with a large community and many libraries, Julia simply doesn't have the community to shout about it. And that's a shame.

Right now Julia is used in various fields. From optimizing milk production in dairy farms to parallel supercomputing for astronomy, Julia has a wide range of applications. A common theme here is that these tasks all require numerical, scientific, and sometimes parallel computation. Julia is well suited to the sort of tasks where intensive computation is essential. Viral Shah, CEO of Julia Computing, said to Forbes: "Amazon, Apple, Disney, Facebook, Ford, Google, Grindr, IBM, Microsoft, NASA, Oracle and Uber are other Julia users, partners and organizations hiring Julia programmers." Clearly, Julia is powering the analytical nous of some of the most high-profile organizations on the planet. Perhaps it just needs more cheerleading to go truly mainstream.

Why Julia is a great language for machine learning

Julia was originally designed for high-performance numerical analysis. This means that everything that has gone into its design is built for the very things you need to do to build effective machine learning systems.

Speed and functionality

Julia combines the functionality of popular languages like Python, R, MATLAB, SAS, and Stata with the speed of C++ and Java. A lot of standard LaTeX symbols can be used in Julia, with the syntax usually being the same as LaTeX. This mathematical syntax makes it easy to implement mathematical formulae in code and makes machine learning in Julia practical. Julia also has built-in support for parallelism, which allows the utilization of multiple cores at once, making it fast at computation. Julia's loops and functions are pretty fast; fast enough that you will probably notice significant performance differences compared with other languages. The performance can be almost comparable to C, with very little code actually used. With packages like ArrayFire, generic code can be run on GPUs. In Julia, the multiple dispatch feature is very useful for defining number and array-like datatypes. Matrices and data tables work with good compatibility and performance. Julia has automatic garbage collection and a collection of libraries for mathematical calculations, linear algebra, random number generation, and regular expression matching.

Libraries and scalability

Machine learning in Julia can be done with powerful packages like MLBase.jl, Flux.jl, and Knet.jl, which can be used for machine learning and artificial intelligence systems. It also has a scikit-learn implementation called ScikitLearn.jl. Although ScikitLearn.jl is not an official port, it is a useful additional tool for building machine learning systems with Julia. As if all those weren't enough, Julia also has TensorFlow.jl and MXNet.jl, so if you already have experience with these tools in other implementations, the transition is a little easier than learning everything from scratch. Julia is also incredibly scalable. It can be deployed on large clusters quickly, which is vital if you're working with big data across a distributed system.

Should you consider Julia for machine learning?
Because it's fast and possesses a great range of features, Julia could potentially overtake both Python and R to become the language of choice for machine learning in the future. Okay, maybe we shouldn't get ahead of ourselves. But with Julia reaching the 1.0 milestone, and the language rising on the TIOBE index, you certainly shouldn't rule it out when it comes to machine learning. Julia is also available in the popular tool Jupyter Notebook, paving a path for wider adoption.

A note of caution, however, is important. Rather than simply dropping everything for Julia, it is worth monitoring the growth of the language. Over the next 12 to 24 months we'll likely see new projects and libraries, and the Julia machine learning community expanding. If you start hearing more noise about the language, it becomes a much safer option to invest your time and energy in learning it. If you are just starting off with machine learning, you should stick to the other popular languages. An experienced engineer, however, who already has a good grip on other languages, shouldn't be scared of experimenting with Julia: it gives you another option, and it might just help you uncover new ways of working and solving problems.

Julia 1.0 has just been released
What makes functional programming a viable choice for artificial intelligence projects?
Best Machine Learning Datasets for beginners

Why should enterprises use Splunk?

Sunith Shetty
25 Jul 2018
4 min read
Splunk is a multinational software company that offers its core platform, Splunk Enterprise, as well as many related offerings built on the Splunk platform. The platform helps a wide variety of organizational personas, such as analysts, operators, developers, testers, managers, and executives, get analytical insights from machine-created data. It collects and stores this data and provides powerful analytical capabilities, enabling organizations to act on the often powerful insights derived from it.

The Splunk Enterprise platform was built with IT operations in mind. When companies had IT infrastructure problems, troubleshooting and solving them was immensely difficult, complicated, and manual. Splunk was built to collect log files from IT systems and make them searchable and accessible. It is commonly used for information security and development operations, as well as more advanced use cases for custom machines, the Internet of Things, and mobile devices. Most organizations will start using Splunk in one of three areas: IT operations management, information security, or development operations (DevOps). In today's post, we will look at the thoughts, concepts, and ideas involved in applying Splunk at an organizational level. This article is an excerpt from a book written by J-P Contreras, Erickson Delgado, and Betsy Page Sigman titled Splunk 7 Essentials, Third Edition.

IT operations

IT operations have moved from predominantly being a cost center to also being a revenue center. Today, many of the world's oldest companies also make money based on IT services and/or systems. As a result, the delivery of these IT services must be monitored and, ideally, proactively remedied before failures occur. Ensuring that hardware such as servers, storage, and network devices is functioning properly via its log data is important. Organizations can also log and monitor mobile and browser-based software applications for any issues. Ultimately, organizations will want to correlate these sets of data together to get a complete picture of IT health. In this regard, Splunk takes the expertise accumulated over the years and offers a paid-for application known as IT Service Intelligence (ITSI) to help give companies a framework for tackling large IT environments. Complicating matters for many traditional organizations is the use of cloud computing technologies, which now drive logs captured from both internally and externally hosted systems.

Cybersecurity

With the relentless focus in today's world on cybersecurity, there is a good chance your organization will need a tool such as Splunk to address a wide variety of information security needs as well. It acts as a log data consolidation and reporting engine, capturing essential security-related log data from devices and software such as vulnerability scanners, phishing prevention, firewalls, and user management and behavior, just to name a few. Companies need to ensure they are protected from external as well as internal threats, and for this Splunk offers the paid-for applications Enterprise Security and User Behavior Analytics (UBA). Similar to ITSI, these applications deliver frameworks to help companies meet their specific requirements in these areas.
In addition to cybersecurity to protect the business, companies often have to comply with, and audit against, specific security standards. These can be industry-related, such as PCI compliance for financial transactions; customer-related, such as National Institute of Standards and Technology (NIST) requirements when working with the US government; or data-privacy-related, such as the Health Insurance Portability and Accountability Act (HIPAA) or the European Union's General Data Protection Regulation (GDPR).

Software development and support operations

Commonly referred to as DevOps, Splunk's ability to ingest and correlate data from many sources solves many challenges faced in software development, testing, and release cycles. Using Splunk will help teams provide higher-quality software more efficiently. Then, with those controls in place, it provides visibility into released software, its use, and changes in user behavior, intended or not. This set of use cases is particularly applicable to organizations that develop their own software.

Internet of Things

Many organizations today are looking to build upon the converging trends in computing, mobility, wireless communications, and data to capture data from more and more devices. Examples include data captured from sensors placed on machinery such as wind turbines, trains, and heating and cooling systems. These sensors provide access to the data they capture in standard formats such as JavaScript Object Notation (JSON) through application programming interfaces (APIs); a minimal sketch of sending such JSON data into Splunk follows at the end of this article.

To summarize, we saw how Splunk can be used at an organizational level for IT operations, cybersecurity, software development and support, and the Internet of Things. To learn more about how Splunk can be used to make informed decisions in areas such as IT operations, information security, and the Internet of Things, do check out the book Splunk 7 Essentials, Third Edition.

Create a data model in Splunk to enable interactive reports and dashboards
Splunk leverages AI in its monitoring tools
Splunk Industrial Asset Intelligence (Splunk IAI) targets Industrial IoT marketplace
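As a rough illustration of the JSON-over-API point above, here is a minimal Python sketch that posts one sensor reading to a Splunk HTTP Event Collector (HEC) endpoint. It assumes HEC is enabled on your Splunk instance; the host, port, token, and field names are placeholders, and the `requests` library must be installed.

```python
# Illustrative sketch: send one JSON sensor reading to Splunk's HTTP Event Collector.
# Assumes HEC is enabled; the host, token, and field names below are placeholders.
import time
import requests

SPLUNK_HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"  # placeholder token

event = {
    "time": time.time(),
    "sourcetype": "_json",
    "event": {                      # the actual sensor payload
        "device_id": "turbine-42",
        "temperature_c": 71.3,
        "vibration_mm_s": 4.8,
    },
}

response = requests.post(
    SPLUNK_HEC_URL,
    headers={"Authorization": f"Splunk {HEC_TOKEN}"},
    json=event,
    verify=False,  # only acceptable in a lab setup with self-signed certificates
    timeout=10,
)
print(response.status_code, response.text)
```

Once events like this are indexed, the same search and correlation capabilities described above for IT operations and security apply to IoT data as well.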

The rise of machine learning in the investment industry

Natasha Mathur
15 Feb 2019
13 min read
The investment industry has evolved dramatically over the last several decades and continues to do so amid increased competition, technological advances, and a challenging economic environment. In this article, we will review several key trends that have shaped the investment environment in general, and the context for algorithmic trading more specifically. This article is an excerpt taken from the book 'Hands-On Machine Learning for Algorithmic Trading' written by Stefan Jansen. The book explores the strategic perspective, conceptual understanding, and practical tools to add value from applying ML to the trading and investment process. The trends that have propelled algorithmic trading and ML to their current prominence include:

- Changes in the market microstructure, such as the spread of electronic trading and the integration of markets across asset classes and geographies
- The development of investment strategies framed in terms of risk-factor exposure, as opposed to asset classes
- The revolutions in computing power, data generation and management, and analytic methods
- The outperformance of the pioneers in algorithmic trading relative to human, discretionary investors

In addition, the financial crises of 2001 and 2008 have affected how investors approach diversification and risk management, and have given rise to low-cost passive investment vehicles in the form of exchange-traded funds (ETFs). Amid low yields and low volatility after the 2008 crisis, cost-conscious investors shifted $2 trillion from actively managed mutual funds into passively managed ETFs. Competitive pressure is also reflected in lower hedge fund fees, which dropped from the traditional 2% annual management fee and 20% take of profits to an average of 1.48% and 17.4%, respectively, in 2017. Let's have a look at how ML has come to play a strategic role in algorithmic trading.

Factor investing and smart beta funds

The return provided by an asset is a function of the uncertainty, or risk, associated with the financial investment. An equity investment implies, for example, assuming a company's business risk, and a bond investment implies assuming default risk. To the extent that specific risk characteristics predict returns, identifying and forecasting the behavior of these risk factors becomes a primary focus when designing an investment strategy. It yields valuable trading signals and is the key to superior active-management results. The industry's understanding of risk factors has evolved very substantially over time and has impacted how ML is used for algorithmic trading.

Modern Portfolio Theory (MPT) introduced the distinction between idiosyncratic and systematic sources of risk for a given asset. Idiosyncratic risk can be eliminated through diversification, but systematic risk cannot. In the early 1960s, the Capital Asset Pricing Model (CAPM) identified a single factor driving all asset returns: the return on the market portfolio in excess of T-bills. The market portfolio consisted of all tradable securities, weighted by their market value. The systematic exposure of an asset to the market is measured by beta, which captures how the returns of the asset co-move with the market portfolio (a small pandas sketch of estimating beta and a momentum signal appears at the end of this article). The recognition that the risk of an asset does not depend on the asset in isolation, but rather on how it moves relative to other assets and the market as a whole, was a major conceptual breakthrough.
In other words, assets do not earn a risk premium because of their specific, idiosyncratic characteristics, but because of their exposure to underlying factor risks. However, a large body of academic literature and long investing experience have disproved the CAPM prediction that asset risk premiums depend only on their exposure to a single factor measured by the asset's beta. Instead, numerous additional risk factors have since been discovered. A factor is a quantifiable signal, attribute, or any variable that has historically correlated with future stock returns and is expected to remain correlated in the future. These risk factors were labeled anomalies, since they contradicted the Efficient Market Hypothesis (EMH), which held that market equilibrium would always price securities according to the CAPM, so that no other factors should have predictive power. The economic theory behind factors can be either rational, where factor risk premiums compensate for low returns during bad times, or behavioral, where agents fail to arbitrage away excess returns.

Well-known anomalies include the value, size, and momentum effects, which help predict returns while controlling for the CAPM market factor. The size effect, discovered by Banz (1981) and Reinganum (1981), rests on small firms systematically outperforming large firms. The value effect (Basu 1982) states that firms with low valuation metrics outperform: firms with low price multiples, such as the price-to-earnings or price-to-book ratios, perform better than their more expensive peers (as suggested by the inventors of value investing, Benjamin Graham and David Dodd, and popularized by Warren Buffett). The momentum effect, discovered in the late 1980s by, among others, Clifford Asness, the founding partner of AQR, states that stocks with good momentum, in terms of recent 6-12 month returns, have higher returns going forward than poor-momentum stocks with similar market risk.

Researchers have also found that value and momentum factors explain returns for stocks outside the US, as well as for other asset classes such as bonds, currencies, and commodities, along with additional risk factors. In fixed income, the value strategy is called riding the yield curve and is a form of the duration premium. In commodities, it is called the roll return, with a positive return for an upward-sloping futures curve and a negative return otherwise. In foreign exchange, the value strategy is called carry. There is also an illiquidity premium: securities that are more illiquid trade at low prices and have high average excess returns relative to their more liquid counterparts. Bonds with higher default risk tend to have higher returns on average, reflecting a credit risk premium. And since investors are willing to pay for insurance against high volatility when returns tend to crash, sellers of volatility protection in options markets tend to earn high returns.

Multifactor models define risks in broader and more diverse terms than just the market portfolio. In 1976, Stephen Ross proposed arbitrage pricing theory, which asserted that investors are compensated for multiple systematic sources of risk that cannot be diversified away. The three most important macro factors are growth, inflation, and volatility, in addition to productivity, demographic, and political risk. In 1992, Eugene Fama and Kenneth French combined the equity risk factors size and value with a market factor into a single model that better explained cross-sectional stock returns.
They later added a model that also included bond risk factors to simultaneously explain returns for both asset classes. A particularly attractive aspect of risk factors is their low or negative correlation. Value and momentum risk factors, for instance, are negatively correlated, reducing the risk and increasing the risk-adjusted returns above and beyond the benefit implied by the individual factors. Furthermore, using leverage and long-short strategies, factor strategies can be combined into market-neutral approaches. The combination of long positions in securities exposed to positive risks with underweight or short positions in securities exposed to negative risks allows for the collection of dynamic risk premiums.

As a result, the factors that explained returns above and beyond the CAPM were incorporated into investment styles that tilt portfolios in favor of one or more factors, and assets began to migrate into factor-based portfolios. The 2008 financial crisis underlined how asset-class labels can be highly misleading and create a false sense of diversification when investors do not look at the underlying factor risks, as asset classes came crashing down together. Over the past several decades, quantitative factor investing has evolved from a simple approach based on two or three styles to multifactor smart or exotic beta products. Smart beta funds crossed $1 trillion in AUM in 2017, testifying to the popularity of this hybrid investment strategy, which combines active and passive management. Smart beta funds take a passive strategy but modify it according to one or more factors, such as favoring cheaper stocks or screening them according to dividend payouts, to generate better returns. This growth has coincided with increasing criticism of the high fees charged by traditional active managers as well as heightened scrutiny of their performance. The ongoing discovery and successful forecasting of risk factors that, either individually or in combination with other risk factors, significantly impact future asset returns across asset classes is a key driver of the surge in ML in the investment industry.

Algorithmic pioneers outperform humans at scale

The track record and growth of Assets Under Management (AUM) of firms that spearheaded algorithmic trading have played a key role in generating investor interest and subsequent industry efforts to replicate their success. Systematic funds differ from HFT in that trades may be held significantly longer, seeking to exploit arbitrage opportunities rather than advantages from sheer speed. Systematic strategies that mostly or exclusively rely on algorithmic decision-making were most famously introduced by the mathematician James Simons, who founded Renaissance Technologies in 1982 and built it into the premier quant firm. Its secretive Medallion Fund, which is closed to outsiders, has earned an estimated annualized return of 35% since 1982.

DE Shaw, Citadel, and Two Sigma, three of the most prominent quantitative hedge funds that use systematic strategies based on algorithms, rose to the all-time top 20 performers for the first time in 2017, in terms of total dollars earned for investors, after fees, and since inception. DE Shaw, founded in 1988 and with $47 billion in AUM in 2018, joined the list at number 3. Citadel, started in 1990 by Kenneth Griffin, manages $29 billion and ranks 5th, and Two Sigma, started only in 2001 by DE Shaw alumni John Overdeck and David Siegel, has grown from $8 billion in AUM in 2011 to $52 billion in 2018.
Bridgewater, started in 1975 and with over $150 billion in AUM, continues to lead thanks to its Pure Alpha Fund, which also incorporates systematic strategies. Similarly, on the Institutional Investor 2017 Hedge Fund 100 list, five of the top six firms rely largely or completely on computers and trading algorithms to make investment decisions, and all of them have been growing their assets in an otherwise challenging environment. Several quantitatively focused firms climbed several ranks and in some cases grew their assets by double-digit percentages. Number-2-ranked Applied Quantitative Research (AQR) grew its hedge fund assets 48% in 2017 to $69.7 billion and managed $187.6 billion firm-wide. Among all hedge funds ranked by compounded performance over the last three years, the quant-based funds run by Renaissance Technologies achieved ranks 6 and 24, Two Sigma rank 11, DE Shaw ranks 18 and 32, and Citadel ranks 30 and 37. Beyond the top performers, algorithmic strategies have worked well in the last several years: in the past five years, quant-focused hedge funds gained about 5.1% per year, while the average hedge fund rose 4.3% per year over the same period.

ML-driven funds attract $1 trillion in AUM

The familiar three revolutions in computing power, data, and ML methods have made the adoption of systematic, data-driven strategies not only more compelling and cost-effective but a key source of competitive advantage. As a result, algorithmic approaches are not only finding wider application in the hedge fund industry that pioneered these strategies, but across a broader range of asset managers and even passively managed vehicles such as ETFs. In particular, predictive analytics using machine learning and algorithmic automation play an increasingly prominent role in all steps of the investment process across asset classes, from idea generation and research to strategy formulation and portfolio construction, trade execution, and risk management.

Estimates of industry size vary because there is no objective definition of a quantitative or algorithmic fund, and many traditional hedge funds, or even mutual funds and ETFs, are introducing computer-driven strategies or integrating them into a discretionary environment in a human-plus-machine approach. Morgan Stanley estimated in 2017 that algorithmic strategies have grown at 15% per year over the past six years and control about $1.5 trillion between hedge funds, mutual funds, and smart beta ETFs. Other reports suggest the quantitative hedge fund industry was about to exceed $1 trillion in AUM, nearly doubling its size since 2010 amid outflows from traditional hedge funds. In contrast, total hedge fund industry capital hit $3.21 trillion, according to the latest global Hedge Fund Research report. The market research firm Preqin estimates that almost 1,500 hedge funds make a majority of their trades with help from computer models. Quantitative hedge funds are now responsible for 27% of all US stock trades by investors, up from 14% in 2013. But many use data scientists, or quants, who in turn use machines to build large statistical models (WSJ).

In recent years, however, funds have moved toward true ML, where artificially intelligent systems can analyze large amounts of data at speed and improve themselves through such analyses. Recent examples include Rebellion Research, Sentient, and Aidyia, which rely on evolutionary algorithms and deep learning to devise fully automatic Artificial Intelligence (AI)-driven investment platforms.
From the core hedge fund industry, the adoption of algorithmic strategies has spread to mutual funds and even passively managed exchange-traded funds in the form of smart beta funds, and to discretionary funds in the form of quantamental approaches.

The emergence of quantamental funds

Two distinct approaches have evolved in active investment management: systematic (or quant) and discretionary investing. Systematic approaches rely on algorithms for a repeatable and data-driven approach to identifying investment opportunities across many securities; in contrast, a discretionary approach involves an in-depth analysis of a smaller number of securities. These two approaches are becoming more similar as fundamental managers take more data-science-driven approaches. Even fundamental traders now arm themselves with quantitative techniques, accounting for $55 billion of systematic assets, according to Barclays. Agnostic to specific companies, quantitative funds trade patterns and dynamics across a wide swath of securities. Quants now account for about 17% of total hedge fund assets, data compiled by Barclays shows. Point72 Asset Management, with $12 billion in assets, has been shifting about half of its portfolio managers to a man-plus-machine approach. Point72 is also investing tens of millions of dollars into a group that analyzes large amounts of alternative data and passes the results on to traders.

Investments in strategic capabilities

Rising investments in related capabilities (technology, data, and, most importantly, skilled humans) highlight how significant algorithmic trading using ML has become for competitive advantage, especially in light of the rising popularity of passive, indexed investment vehicles, such as ETFs, since the 2008 financial crisis. Morgan Stanley noted that only 23% of its quant clients say they are not considering using, or not already using, ML, down from 44% in 2016. Guggenheim Partners LLC built what it calls a supercomputing cluster for $1 million at the Lawrence Berkeley National Laboratory in California to help crunch numbers for Guggenheim's quant investment funds; electricity for the computers costs another $1 million a year.

AQR is a quantitative investment group that relies on academic research to identify and systematically trade factors that have, over time, proven to beat the broader market. The firm used to eschew the purely computer-powered strategies of quant peers such as Renaissance Technologies or DE Shaw. More recently, however, AQR has begun to seek profitable patterns in markets using ML to parse through novel datasets, such as satellite pictures of shadows cast by oil wells and tankers. The leading firm BlackRock, with over $5 trillion in AUM, also bets on algorithms to beat discretionary fund managers by heavily investing in SAE, a systematic trading firm it acquired during the financial crisis. Franklin Templeton bought Random Forest Capital, a debt-focused, data-led investment company, for an undisclosed amount, hoping that its technology can support the wider asset manager.

We looked at how ML plays a role in different industry trends around algorithmic trading. If you want to learn more about the design and execution of algorithmic trading strategies, and use cases of ML in algorithmic trading, be sure to check out the book 'Hands-On Machine Learning for Algorithmic Trading'.

Using machine learning for phishing domain detection [Tutorial]
Anatomy of an automated machine learning algorithm (AutoML)
10 machine learning algorithms every engineer needs to know
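As promised in the factor-investing section above, here is a minimal pandas/NumPy sketch of two building blocks discussed there: estimating an asset's market beta from return series, and computing a simple 12-month-minus-1-month momentum signal. The return series are randomly generated for illustration; this is a sketch of the calculations, not a trading strategy or an excerpt from the book.

```python
# Illustrative factor calculations: market beta and a 12-1 momentum signal.
# Data is randomly generated; requires numpy and pandas.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-31", periods=60, freq="M")  # 5 years of monthly data

market = pd.Series(rng.normal(0.01, 0.04, len(dates)), index=dates)
# Asset returns: exposure to the market plus idiosyncratic noise.
asset = 1.2 * market + pd.Series(rng.normal(0.0, 0.03, len(dates)), index=dates)

# CAPM beta: covariance of asset and market returns, scaled by market variance.
beta = asset.cov(market) / market.var()
print(f"estimated beta: {beta:.2f}")   # close to the true exposure of 1.2

# 12-1 momentum: cumulative return over the past 12 months, skipping the most recent one.
prices = (1 + asset).cumprod()
momentum = prices.shift(1) / prices.shift(12) - 1
print(momentum.dropna().tail())
```

Ranking a universe of stocks by signals like these, and going long the top and short the bottom of the ranking, is the basic mechanic behind the factor and smart beta portfolios described in the article.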

What is Core ML?

Savia Lobo
28 Sep 2018
5 min read
Introduced by Apple, Core ML is a machine learning framework that enables iOS app developers to integrate machine learning technology into their apps. It supports natural language processing (NLP), image analysis, and various other conventional models to provide top-notch on-device performance with a minimal memory footprint and low power consumption. This article is an extract taken from the book Machine Learning with Core ML written by Joshua Newnham. In this article, you will get to know the basics of what Core ML is and its typical workflow.

With the release of iOS 11 and Core ML, performing inference is just a matter of a few lines of code. Prior to iOS 11, inference was possible, but it required some work to take a pre-trained model and port it across using an existing framework such as Accelerate or Metal Performance Shaders (MPS). Accelerate and MPS are still used under the hood by Core ML, but Core ML takes care of deciding which underlying framework your model should use (Accelerate uses the CPU for memory-heavy tasks and MPS uses the GPU for compute-heavy tasks). It also takes care of abstracting a lot of the details away; this layer of abstraction is shown in the book's architecture diagram.

There are additional layers too; iOS 11 has introduced and extended domain-specific layers that further abstract a lot of the common tasks you may encounter when working with image and text data, such as face detection, object tracking, language translation, and named entity recognition (NER). These domain-specific layers are encapsulated in the Vision and natural language processing (NLP) frameworks; we won't be going into any details of these frameworks here, but you will get a chance to use them in later chapters. It's worth noting that these layers are not mutually exclusive, and it is common to find yourself using them together, especially the domain-specific frameworks, which provide useful preprocessing methods you can use to prepare your data before sending it to a Core ML model.

So what exactly is Core ML? You can think of Core ML as a suite of tools used to facilitate the process of bringing ML models to iOS and wrapping them in a standard interface so that you can easily access and make use of them in your code. Let's now take a closer look at the typical workflow when working with Core ML.

Core ML workflow

As described previously, the two main tasks of an ML workflow are training and inference. Training involves obtaining and preparing the data, defining the model, and then the real training. Once your model has achieved satisfactory results during training and is able to perform adequate predictions (including on data it hasn't seen before), it can be deployed and used for inference on data outside of the training set.

Core ML provides a suite of tools to facilitate getting a trained model into iOS; one of them is the Python package called Core ML Tools, which is used to take a model (consisting of the architecture and weights) from one of the many popular frameworks and export a .mlmodel file, which can then be imported into your Xcode project. Once imported, Xcode will generate an interface for the model, making it easily accessible via code you are familiar with. Finally, when you build your app, the model is further optimized and packaged up within your application.
A summary of the process of generating the model is shown in the following diagram. The diagram illustrates the process of creating the .mlmodel file, either by using an existing model from one of the supported frameworks or by training it from scratch. Core ML Tools supports most of the popular frameworks, either natively or through third-party plugins, including Keras, Turi, Caffe, scikit-learn, LIBSVM, and XGBoost. Apple has also made this package open source and modular, so it can easily be adapted for other frameworks, either by the community or by yourself. The process of importing the model is illustrated in this diagram.

In addition, there are frameworks with tighter integration with Core ML that handle generating the Core ML model, such as Turi Create, IBM Watson Services for Core ML, and Create ML. We will be introducing Create ML in chapter 10; for those interested in learning more about Turi Create and IBM Watson Services for Core ML, please refer to the official webpages via the following links:

Turi Create: https://github.com/apple/turicreate
IBM Watson Services for Core ML: https://developer.apple.com/ibm/

Once the model is imported, as mentioned previously, Xcode generates an interface that wraps the model, its inputs, and its outputs.

Thus, in this post, we learned about the workflow of training and how to import a model. If you've enjoyed this post, head over to the book Machine Learning with Core ML to delve into the details of what this model is and what Core ML currently supports.

Emotional AI: Detecting facial expressions and emotions using CoreML [Tutorial]
Build intelligent interfaces with CoreML using a CNN [Tutorial]
Watson-CoreML: IBM and Apple's new machine learning collaboration project

article-image-deploy-self-service-business-intelligence-qlik-sense
Amey Varangaonkar
31 May 2018
7 min read
Save for later

Best practices for deploying self-service BI with Qlik Sense

As part of a successful deployment of Qlik Sense, it is important that IT recognizes that self-service Business Intelligence has its own dynamics and adoption rules. The various use cases and subsequent user groups need to be assessed and captured. Governance should always be present, but power users should never get the feeling that they are restricted. Once they are won over, the rest of the traction and the adoption of other user types is very easy. In this article, we will look at the most important points to keep in mind while deploying self-service with Qlik Sense. The following excerpt is taken from the book Mastering Qlik Sense, authored by Martin Mahler and Juan Ignacio Vitantonio. This book demonstrates useful techniques to design and build highly profitable Business Intelligence solutions using Qlik Sense. Here's the list of points to keep in mind:

Qlik Sense is not QlikView

Not even nearly. The biggest challenge and fallacy is that the organization was sold, by Qlik or someone else, just the next version of the tool. It did not help at all that Qlik itself was working for years on Qlik Sense under the initial product name Qlik.Next. However it is being sold to you, Qlik Sense is at best the cousin of QlikView. Same family, but no blood relation. Thinking otherwise sets the wrong expectation: the business gives the wrong message to stakeholders and does not make IT aware that self-service BI cannot be deployed in the same fashion as guided analytics (QlikView, in this case). Disappointment is imminent when stakeholders realize Qlik Sense cannot replicate their QlikView dashboards.

Simply installing Qlik Sense does not create a self-service BI environment

Installing Qlik Sense and giving users access to the tool is a start, but there is more to it than simply installing it. The infrastructure requires design and planning, data quality processing, data collection, and determining who intends to use the platform to consume what type of data. If data is not available and accessible to the user, data analytics serves no purpose. Make sure a data warehouse or similar is in place and the business has a use case for self-service data analytics. A good indicator for this is when the business or project works with a lot of data, and there are business users who have lots of Excel spreadsheets lying around analyzing it in different ways. That's your best-case candidate for Qlik Sense.

IT should monitor the Qlik Sense environment rather than control it

IT needs to unlearn in order to learn new things, and the same applies when it comes to deploying self-service. Create a framework with guidelines and principles and monitor that users are following it, rather than limiting them in their capabilities. This framework needs the input of the users as well, and it has to be elastic. Also, not many IT professionals agree with giving away too much power to the user in the development process, believing this leads to chaos and anarchy. While the risk is there, this fear needs to be overcome. Users love data analytics, and they are keen to get the help of IT to create the most valuable dashboard possible and ensure it will be well received by a wide audience.

Identifying key users and user groups is crucial

For a strong adoption of the tool, IT needs to prepare the environment and identify the key power users in the organization and win them over to using the technology.
It is important that they are intensively supported, especially in the beginning, and that they are allowed to drive how the technology should be used rather than having principles imposed on them. Governance should always be present, but power users should never get the feeling they are restricted by it. Once they are won over, the rest of the traction and the adoption of other user types is very easy.

Qlik Sense sells well - do a lot of demos

Data analytics, compelling visualizations, and the interactivity of Qlik Sense are something almost everyone is interested in. The business wants to see its own data aggregated and distilled in a cool and glossy dashboard. Utilize the momentum and do as many demos as you can to win advocates of the technology and promote a consciousness of becoming a data-driven culture in the organization. Even the simplest Qlik Sense dashboards amaze people and boost their creativity for use cases where data analytics in their area could apply and create value.

Promote collaboration

Sharing is caring. This not only applies to insights, which naturally are shared with the excitement of having found out something new and valuable, but also to how the new insight has been derived. People keep their secrets on the approach and methodology to themselves, but this is counterproductive. It is important that applications, visualizations, and dashboards created with Qlik Sense are shared and demonstrated to other Qlik Sense users as frequently as possible. This not only promotes a data-driven culture but also encourages the collaboration of users and teams across various business functions, which would not have happened otherwise. They could be sharing knowledge, tips, and tricks, or even realizing they look at the same slices of data and could create additional value by connecting them together.

Market the success of Qlik Sense within the organization

If Qlik Sense has had a successful achievement in a project, tell others about it. Create a success story and propose doing demos of the dashboard and its analytics. IT has historically been very bad at promoting its work, which is counterproductive. Data analytics creates value and there is nothing embarrassing about boasting about its success; as Muhammad Ali suggested, it's not bragging if it's true.

Introduce guidelines on design and terminology

Avoid the pitfall of having multiple different-looking dashboards by promoting a consistent branding look across all Qlik Sense dashboards and applications, including terminology and best practices. Ensure the document is easily accessible to all users. Also, create predesigned templates with some sample sheets so that users can duplicate them, modify them to their liking, and extend them while applying the same design.

Protect less experienced users from complexities

Don't overwhelm users if they have never developed in their life. Approach less technically savvy users in a different way by providing them with sample data and sample templates, including a library of predefined visualizations, dimensions, or measures (so-called Master Key Items). Be aware that what is intuitive to Qlik professionals or power users is not necessarily intuitive to other users - be patient and appreciative of their feedback, and try to understand how a typical business user might think.
If you found the excerpt useful, make sure you check out the book Mastering Qlik Sense to learn more of these techniques for efficient Business Intelligence with Qlik Sense.

Read more
How Qlik Sense is driving self-service Business Intelligence
Overview of a Qlik Sense® Application's Life Cycle
What we learned from Qlik Qonnections 2018

article-image-quantum-expert-robert-sutor-explains-the-basics-of-quantum-computing
Packt Editorial Staff
12 Dec 2019
9 min read
Save for later

Quantum expert Robert Sutor explains the basics of Quantum Computing

What if we could do chemistry inside a computer instead of in a test tube or beaker in the laboratory? What if running a new experiment were as simple as running an app and having it completed in a few seconds? For this to really work, we would want it to happen with complete fidelity. The atoms and molecules as modeled in the computer should behave exactly like they do in the test tube. The chemical reactions that happen in the physical world would have precise computational analogs. We would need a completely accurate simulation.

If we could do this at scale, we might be able to compute the molecules we want and need. These might be new materials for shampoos or even alloys for cars and airplanes. Perhaps we could more efficiently discover medicines that are customized to your exact physiology. Maybe we could get better insight into how proteins fold, thereby understanding their function, and possibly create custom enzymes to positively change our body chemistry.

Is this plausible? We have massive supercomputers that can run all kinds of simulations. Can we model molecules in the above ways today? This article is an excerpt from the book Dancing with Qubits written by Robert Sutor. Robert helps you understand how quantum computing works and delves into the math behind it with this quantum computing textbook.

Can supercomputers model chemical simulations?

Let's start with C8H10N4O2 - 1,3,7-Trimethylxanthine. This is a very fancy name for a molecule that millions of people around the world enjoy every day: caffeine. An 8-ounce cup of coffee contains approximately 95 mg of caffeine, and this translates to roughly 2.95 × 10^20 molecules. Written out, this is 295,000,000,000,000,000,000 molecules. A 12-ounce can of a popular cola drink has 32 mg of caffeine, the diet version has 42 mg, and energy drinks often have about 77 mg.

These numbers are large because we are counting physical objects in our universe, which we know is very big. Scientists estimate, for example, that there are between 10^49 and 10^50 atoms in our planet alone. To put these values in context, one thousand = 10^3, one million = 10^6, one billion = 10^9, and so on. A gigabyte of storage is one billion bytes, and a terabyte is 10^12 bytes.

Getting back to the question I posed at the beginning of this section, can we model caffeine exactly on a computer? We don't have to model the huge number of caffeine molecules in a cup of coffee, but can we fully represent a single molecule at a single instant? Caffeine is a small molecule and contains protons, neutrons, and electrons. In particular, if we just look at the energy configuration that determines the structure of the molecule and the bonds that hold it all together, the amount of information to describe this is staggering. The number of bits, the 0s and 1s, needed is approximately 10^48: 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000. And this is just one molecule!

Yet somehow nature manages to deal quite effectively with all this information. It handles the single caffeine molecule, all those in your coffee, tea, or soft drink, and every other molecule that makes up you and the world around you. How does it do this? We don't know! Of course, there are theories, and these live at the intersection of physics and philosophy. However, we do not need to understand it fully to try to harness its capabilities. We have no hope of providing enough traditional storage to hold this much information.
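Those two headline numbers are easy to sanity-check yourself. The short Python sketch below (not from the book) recomputes the count of caffeine molecules in a 95 mg serving from the molar mass of C8H10N4O2 and Avogadro's number, and compares the roughly 10^48 bits mentioned above with the 2^160 basis states of the 160 qubits discussed next.

```python
# Back-of-the-envelope checks for the caffeine example.
AVOGADRO = 6.022e23           # molecules per mole
MOLAR_MASS_CAFFEINE = 194.19  # g/mol for C8H10N4O2

caffeine_grams = 0.095        # 95 mg in an 8-ounce cup of coffee
molecules = caffeine_grams / MOLAR_MASS_CAFFEINE * AVOGADRO
print(f"molecules per cup ~ {molecules:.2e}")   # ~2.95e+20

# Classical bits needed for an exact description (order of magnitude
# quoted in the text) versus the state space of 160 qubits.
bits_needed = 1e48
qubit_states = 2 ** 160
print(f"2^160 ~ {qubit_states:.2e}")            # ~1.46e+48
print("enough head room:", qubit_states > bits_needed)
```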
Our dream of exact representation appears to be dashed. This is what Richard Feynman meant in his quote: "Nature isn't classical." However, 160 qubits (quantum bits) could hold 2^160 ≈ 1.46 × 10^48 bits while the qubits were involved in a computation. To be clear, I'm not saying how we would get all the data into those qubits, and I'm also not saying how many more we would need to do something interesting with the information. It does give us hope, however. In the classical case, we will never fully represent the caffeine molecule. In the future, with enough very high-quality qubits in a powerful quantum computing system, we may be able to perform chemistry on a computer.

How quantum computing is different from classical computing

I can write a little app on a classical computer that can simulate a coin flip. This might be for my phone or laptop. Instead of heads or tails, let's use 1 and 0. The routine, which I call R, starts with one of those values and randomly returns one or the other. That is, 50% of the time it returns 1 and 50% of the time it returns 0. We have no knowledge whatsoever of how R does what it does. When you see "R," think "random." This is called a "fair flip." It is not weighted to slightly prefer one result over the other. Whether we can produce a truly random result on a classical computer is another question. Let's assume our app is fair.

If I apply R to 1, half the time I expect 1 and the other half 0. The same is true if I apply R to 0. I'll call these applications R(1) and R(0), respectively. If I look at the result of R(1) or R(0), there is no way to tell if I started with 1 or 0. This is just like a secret coin flip, where I can't tell whether I began with heads or tails just by looking at how the coin has landed. By "secret coin flip," I mean that someone else has flipped it and I can see the result, but I have no knowledge of the mechanics of the flip itself or the starting state of the coin.

If R(1) and R(0) are randomly 1 and 0, what happens when I apply R twice? I write this as R(R(1)) and R(R(0)). It's the same answer: a random result with an equal split. The same thing happens no matter how many times we apply R. The result is random, and we can't reverse things to learn the initial value.

Now for the quantum version. Instead of R, I use H. It too returns 0 or 1 with equal chance, but it has two interesting properties. First, it is reversible: though it produces a random 1 or 0 starting from either of them, we can always go back and see the value with which we began. Second, it is its own reverse (or inverse) operation: applying it two times in a row is the same as having done nothing at all.

There is a catch, though. You are not allowed to look at the result of what H does if you want to reverse its effect. If you apply H to 0 or 1, peek at the result, and apply H again to that, it is the same as if you had used R. If you observe what is going on in the quantum case at the wrong time, you are right back at strictly classical behavior.

To summarize using the coin language: if you flip a quantum coin and then don't look at it, flipping it again will yield the heads or tails with which you started. If you do look, you get classical randomness. A second area where quantum is different is in how we can work with simultaneous values. Your phone or laptop uses bytes as individual units of memory or storage. That's where we get phrases like "megabyte," which means one million bytes of information. A byte is further broken down into eight bits, which we've seen before.
Each bit can be a 0 or 1. Doing the math, each byte can represent 2^8 = 256 different numbers composed of eight 0s or 1s, but it can only hold one value at a time. Eight qubits can represent all 256 values at the same time. This is through superposition, but also through entanglement, the way we can tightly tie together the behavior of two or more qubits. This is what gives us the (literally) exponential growth in the amount of working memory.

How quantum computing can help artificial intelligence

Artificial intelligence and one of its subsets, machine learning, are extremely broad collections of data-driven techniques and models. They are used to help find patterns in information, learn from the information, and automatically perform more "intelligently." They also give humans help and insight that might have been difficult to get otherwise. Here is a way to start thinking about how quantum computing might be applicable to large, complicated, computation-intensive systems of processes such as those found in AI and elsewhere. These three cases are, in some sense, the "small, medium, and large" ways quantum computing might complement classical techniques:

There is a single mathematical computation somewhere in the middle of a software component that might be sped up via a quantum algorithm.
There is a well-described component of a classical process that could be replaced with a quantum version.
There is a way to avoid the use of some classical components entirely in the traditional method because of quantum, or the entire classical algorithm can be replaced by a much faster or more effective quantum alternative.

As I write this, quantum computers are not "big data" machines. This means you cannot take millions of records of information and provide them as input to a quantum calculation. Instead, quantum may be able to help where the number of inputs is modest but the computations "blow up" as you start examining relationships or dependencies in the data. In the future, however, quantum computers may be able to input, output, and process much more data. Even if it is just theoretical now, it makes sense to ask if there are quantum algorithms that can be useful in AI someday.

To summarize, we explored how quantum computing works and different applications of artificial intelligence in quantum computing. Get the quantum computing book Dancing with Qubits by Robert Sutor today, in which he explores the inner workings of quantum computing. The book entails some sophisticated mathematical exposition and is therefore best suited for those with a healthy interest in mathematics, physics, engineering, and computer science.

Intel introduces cryogenic control chip, 'Horse Ridge' for commercially viable quantum computing
Microsoft announces Azure Quantum, an open cloud ecosystem to learn and build scalable quantum solutions
Amazon re:Invent 2019 Day One: AWS launches Braket, its new quantum service and releases
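The behavior of H described above is easy to check numerically. The small sketch below (my own illustration, not from the book) represents H as the 2×2 Hadamard matrix and confirms two facts: applying H twice returns the starting state, and measuring right after a single H gives 0 or 1 with equal probability.

```python
# Numerical sketch of the H (Hadamard) operation on a single qubit.
import numpy as np

H = np.array([[1, 1],
              [1, -1]]) / np.sqrt(2)   # Hadamard matrix
ket0 = np.array([1.0, 0.0])            # the |0> state

# Property 1: H is its own inverse, so applying it twice does nothing.
print(np.allclose(H @ H, np.eye(2)))   # True

# Property 2: after one H, measuring gives 0 or 1 with equal chance.
state = H @ ket0
probabilities = np.abs(state) ** 2     # Born rule: squared amplitudes
print(probabilities)                   # [0.5 0.5]

# "Peeking" corresponds to a measurement that collapses the state to a
# plain 0 or 1; applying H again after that behaves like the classical R.
outcome = np.random.choice([0, 1], p=probabilities)
print("measured:", outcome)
```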

article-image-8-myths-rpa-robotic-process-automation
Savia Lobo
08 Nov 2017
9 min read
Save for later

8 Myths about RPA (Robotic Process Automation)

Many say we are on the cusp of the fourth industrial revolution that promises to blur the lines between the real, virtual, and biological worlds. Among many trends, Robotic Process Automation (RPA) is one of the buzzwords surrounding the hype of the fourth industrial revolution. Although poised to be a $6.7 trillion industry by 2025, RPA is shrouded in just as much fear as it is brimming with potential. We have heard time and again how automation can improve productivity, efficiency, and effectiveness while conducting business in transformative ways. We have also heard how automation, and machine-driven automation in particular, can displace humans and thereby lead to a dystopian world. As humans, we make assumptions based on what we see and understand. But sometimes those assumptions become so ingrained that they evolve into myths, which many start accepting as facts. Here is a closer look at some of the myths surrounding RPA.

Myth 1: RPA means robots will automate processes

The term robot evokes in our minds a picture of a metal humanoid with stiff joints that speaks in a monotone. RPA does mean robotic process automation, but the robot doing the automation is nothing like the ones we are used to seeing in the movies. These are software robots that perform routine processes within organizations. They are often referred to as virtual workers or a digital workforce, complete with their own identity and credentials. They essentially consist of algorithms programmed by RPA developers with the aim of automating mundane business processes. These processes are repetitive, highly structured, fall within a well-defined workflow, consist of a finite set of tasks/steps, and may often be monotonous and labor intensive.

Let us consider a real-world example here: automating the invoice generation process. The RPA system will run through all the emails in the system and download the PDF files containing details of the relevant transactions. Then, it will fill a spreadsheet with the details and maintain all the records therein. Later, it will log on to the enterprise system and generate appropriate invoice reports for each entry in the spreadsheet. Once the invoices are created, the system will send a confirmation mail to the relevant stakeholders. Here, the RPA user only specifies the individual tasks that are to be automated, and the system takes care of the rest of the process.

So, yes, while it is true that RPA involves robots automating processes, it is a myth that these robots are physical entities or that they can automate all processes.

Myth 2: RPA is useful only in industries that rely heavily on software

"Almost anything that a human can do on a PC, the robot can take over without the need for IT department support." - Richard Bell, former Procurement Director at Averda

RPA is software which can be injected into a business process. Traditional industries such as banking and finance, healthcare, and manufacturing, which have significant tasks that are routine and depend on software for some of their functioning, can benefit from RPA. Loan processing and patient data processing are some examples. RPA, however, cannot help with automating the assembly line in a manufacturing unit or with performing regular tests on patients. Even in industries that maintain essential daily utilities such as cooking gas, electricity, or telephone services, RPA can be put to use for generating automated bills, invoices, meter readings, and so on.
By adopting RPA, businesses, irrespective of the industry they belong to, can achieve significant cost savings, operational efficiency, and higher productivity. To leverage the benefits of RPA, rather than understanding the SDLC process, it is important that users have a clear understanding of business workflow processes and domain knowledge. Industry professionals can be easily trained on how to put RPA into practice. The bottom line: RPA is not limited to industries that rely heavily on software to exist. But it is true that RPA can be used only in situations where some form of software is used to perform tasks manually.

Myth 3: RPA will replace humans in most frontline jobs

Many organizations employ a large workforce in frontline roles to do routine tasks such as data entry operations, managing processes, customer support, and IT support. But frontline jobs are just as diverse as the people performing them. Take sales reps, for example. They bring in new business through their expert understanding of the company's products and their potential customer base, coupled with the associated soft skills. Currently, they spend significant time on administrative tasks such as developing and finalizing business contracts, updating the CRM database, and making daily status reports. Imagine the spike in productivity if these aspects could be taken off the plates of sales reps and they could just focus on cultivating relationships and converting leads. By replacing human effort in mundane tasks within frontline roles, RPA can help employees focus on higher value-yielding tasks.

In conclusion, RPA will not replace humans in most frontline jobs. It will, however, replace humans in a few roles that are very rule-based and narrow in scope, such as simple data entry operators or basic invoice processing executives. In most frontline roles, like sales or customer support, RPA is quite likely to change, at least in some ways, how one sees their job responsibilities. Also, the adoption of RPA will generate new job opportunities around the development, maintenance, and sale of RPA-based software.

Myth 4: Only large enterprises can afford to deploy RPA

The cost of implementing and maintaining the RPA software and training employees to use it can be quite high. This can make it an unfavorable business proposition for SMBs with fairly simple organizational processes and cross-departmental considerations. On the other hand, large organizations with higher revenue generation capacity, complex business processes, and a large army of workers can deploy an RPA system to automate high-volume tasks quite easily and recover that cost within a few months. It is obvious that large enterprises will benefit from RPA systems due to the economies of scale offered and the faster recovery of investments made. SMBs (small to medium-sized businesses) can also benefit from RPA to automate their business processes, but this is possible only if they look at RPA as a strategic investment whose cost will be recovered over a longer time period of, say, 2-4 years.

Myth 5: RPA adoption should be owned and driven by the organization's IT department

The RPA team handling the automation process need not be from the IT department. The main role of the IT department is providing the necessary resources for the software to function smoothly. An RPA reliability team trained in using RPA tools does not consist of IT professionals but rather business operations professionals.
In simple terms, RPA is not owned by the IT department but by the whole business, and it is driven by the RPA team.

Myth 6: RPA is an AI virtual assistant specialized to do a narrow set of tasks

An RPA bot performs a narrow set of tasks based on the given data and instructions. It is a system of rule-based algorithms which can be used to capture, process, and interpret streams of data, trigger appropriate responses, and communicate with other processes. However, it cannot learn on its own - a key trait of an AI system. Advanced AI concepts such as reinforcement learning and deep learning are yet to be incorporated into robotic process automation systems. Thus, an RPA bot is not an AI virtual assistant like Apple's Siri, for example. That said, it is not impractical to think that in the future these systems will be able to think on their own, decide the best possible way to execute a business process, and learn from their own actions to improve the system.

Myth 7: To use the RPA software, one needs to have basic programming skills

Surprisingly, this is not true. Associates who use the RPA system need not have any programming knowledge. They only need to understand how the software works on the front end and how they can assign tasks to the RPA worker for automation. On the other hand, RPA system developers do require some programming skills, such as knowledge of scripting languages. Today, there are various platforms for developing RPA tools, such as UiPath and Blue Prism, which empower RPA developers to build these systems without any hassle, reducing their coding responsibilities even more.

Myth 8: RPA software is fully automated and does not require human supervision

This is a big myth. RPA is often misunderstood as a completely automated system. Humans are indeed required to program the RPA bots, to feed them tasks for automation, and to manage them. The automation factor here lies in aggregating and performing various tasks which otherwise would require more than one human to complete. There's also the efficiency factor that comes into play: the RPA systems are fast and almost completely avoid faults in the system or the process that are otherwise caused by human error. Having a digital workforce in place is far more profitable than recruiting a human workforce.

Conclusion

One of the most talked-about areas in terms of technological innovation, RPA is clearly still in its early days and is surrounded by a lot of myths. However, there's little doubt that its adoption will take off rapidly as RPA systems become more scalable, more accurate, and faster to deploy. AI-, cognitive-, and analytics-driven RPA will take it up a notch or two and help businesses improve their processes even more by taking dull, repetitive tasks away from people. Hype can get ahead of reality, as we've seen quite a few times, but RPA is an area definitely worth keeping an eye on despite all the hype.
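To ground the idea that an RPA bot is just rule-based software (see myths 1 and 6 above), here is a minimal, hypothetical Python sketch of an invoice-style workflow similar to the one described earlier: it scans a folder of downloaded transaction PDFs, records their details in a spreadsheet-style CSV file, and prints a simple summary. The folder layout, file-naming convention, and field names are all assumptions for illustration; commercial RPA platforms such as UiPath or Blue Prism provide their own designers and runtimes for this kind of flow.

```python
# Toy rule-based "bot": collect transaction PDFs and log them to a CSV.
import csv
import re
from pathlib import Path

INBOX = Path("downloads")       # hypothetical folder of attachment PDFs
LEDGER = Path("invoices.csv")   # spreadsheet the bot maintains

# Assumed naming convention: transaction_<id>_<amount>.pdf
PATTERN = re.compile(r"transaction_(?P<txn_id>\w+)_(?P<amount>\d+(?:\.\d+)?)\.pdf")

def run_bot() -> int:
    rows = []
    for pdf in sorted(INBOX.glob("*.pdf")):
        match = PATTERN.match(pdf.name)
        if not match:            # rule-based bots simply skip what they
            continue             # were not programmed to recognize
        rows.append({"file": pdf.name,
                     "txn_id": match["txn_id"],
                     "amount": match["amount"]})

    with LEDGER.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["file", "txn_id", "amount"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)

if __name__ == "__main__":
    count = run_bot()
    print(f"Processed {count} transaction files into {LEDGER}")
```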
article-image-3-natural-language-processing-models-every-engineer-should-know
Amey Varangaonkar
18 Dec 2017
5 min read
Save for later

3 Natural Language Processing Models every ML Engineer should know

[box type="note" align="" class="" width=""]This interesting excerpt is taken from the book Mastering Text Mining with R, co-authored by Ashish Kumar and Avinash Paul. The book gives an advanced view of different natural language processing techniques and their implementation in R. [/box] Our article given below aims to introduce to the concept of language models and their relevance to natural language processing. In terms of natural language processing, language models generate output strings that help to assess the likelihood of a bunch of strings to be a sentence in a specific language. If we discard the sequence of words in all sentences of a text corpus and basically treat it like a bag of words, then the efficiency of different language models can be estimated by how accurately a model restored the order of strings in sentences. Which sentence is more likely: I am learning text mining or I text mining learning am? Which word is more likely to follow “I am…”? Basically, a language model assigns the probability of a sentence being in a correct order. The probability is assigned over the sequence of terms by using conditional probability. Let us define a simple language modeling problem. Assume a bag of words contains words W1, W2,………………….,Wn. A language model can be defined to compute any of the following: Estimate the probability of a sentence S1: P (S1) = P (W1, W2, W3, W4, W5) Estimate the probability of the next word in a sentence or set of strings: P (W3|W2, W1) How to compute the probability? We will use chain law, by decomposing the sentence probability as a product of smaller string probabilities: P(W1W2W3W4) = P(W1)P(W2|W1)P(W3|W1W2)P(W4|W1W2W3) N-gram models N-grams are important for a wide range of applications. They can be used to build simple language models. Let's consider a text T with W tokens. Let SW be a sliding window. If the sliding window consists of one cell (wi wi wi wi) then the collection of strings is called a unigram; if the sliding window consists of two cells, the output Is (w1 , w2)(w3 , w4)(w5 w5 , w5)(w1 , w2)(w3 , w4)(w5 , w5), this is called a bigram .Using conditional probability, we can define the probability of a word having seen the previous word; this is known as bigram probability. So the conditional probability of an element, given the previous element (wi -1), is: Extending the sliding window, we can generalize that n-gram probability as the conditional probability of an element given previous n-1 element: The most common bigrams in any corpus are not very interesting, such as on the, can be, in it, it is. In order to get more meaningful bigrams, we can run the corpus through a part-of-speech (POS) tagger. This would filter the bigrams to more content-related pairs such as infrastructure development, agricultural subsidies, banking rates; this can be one way of filtering less meaningful bigrams. A better way to approach this problem is to take into account collocations; a collocation is the string created when two or more words co-occur in a language more frequently. One way to do this over a corpus is pointwise mutual information (PMI). The concept behind PMI is for two words, A and B, we would like to know how much one word tells us about the other. For example, given an occurrence of A, a, and an occurrence of B, b, how much does their joint probability differ from the expected value of assuming that they are independent. 
This can be expressed as follows:

PMI(a, b) = log [ P(a, b) / (P(a) P(b)) ]

Unigram model: Puni(W1W2W3W4) = P(W1) P(W2) P(W3) P(W4)
Bigram model: Pbi(W1W2W3W4) = P(W1) P(W2|W1) P(W3|W2) P(W4|W3)
In general: P(w1w2…wn) = ∏ P(wi | w1w2…wi-1)

Applying the chain rule over long contexts can be difficult to estimate; the Markov assumption is applied to handle such situations.

Markov assumption

If we assume that the current string is independent of some of the word strings further in the past, we can drop those strings to simplify the probability. Say the history consists of three words, Wi, Wi-1, Wi-2; instead of estimating the probability P(Wi+1) using P(Wi+1 | Wi, Wi-1, Wi-2), we can directly apply P(Wi+1 | Wi, Wi-1).

Hidden Markov models

Markov chains are used to study systems that are subject to random influences. Markov chains model systems that move from one state to another in steps governed by probabilities. The same set of outcomes in a sequence of trials is called states. Knowing the probabilities of the states is called the state distribution, and the state distribution in which the system starts is the initial state distribution. The probability of going from one state to another is called the transition probability. A Markov chain consists of a collection of states along with transition probabilities. The study of Markov chains is useful for understanding the long-term behavior of a system. Each arc is associated with a certain probability value, and all arcs coming out of each node must exhibit a probability distribution. In simple terms, there is a probability associated with every transition between states.

Hidden Markov models are non-deterministic Markov chains. They are an extension of Markov models in which the output symbol is not the same as the state. Language models are widely used in machine translation, spelling correction, speech recognition, text summarization, questionnaires, and many more real-world use cases. If you would like to learn more about how to implement the above language models, check out our book Mastering Text Mining with R. This book will help you leverage language models using popular packages in R for effective text mining.
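As a small illustration of these ideas (the book's examples are in R, but the sketch below uses Python simply to keep it self-contained), here is a toy bigram model built from raw counts. It estimates P(next word | previous word) exactly as in the count-ratio formula above; the tiny corpus is made up for the example.

```python
# Toy bigram language model estimated from raw counts.
from collections import Counter

corpus = [
    "i am learning text mining",
    "i am learning r",
    "text mining is fun",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def bigram_prob(prev, word):
    """P(word | prev) = count(prev, word) / count(prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

print(bigram_prob("i", "am"))           # 1.0  ("i" is always followed by "am")
print(bigram_prob("learning", "text"))  # 0.5
print(bigram_prob("text", "mining"))    # 1.0
```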

article-image-everything-know-ethereum
Packt Editorial Staff
10 Apr 2018
8 min read
Save for later

Everything you need to know about Ethereum

Ethereum was first conceived of by Vitalik Buterin in November 2013. The critical idea proposed was the development of a Turing-complete language that allows the development of arbitrary programs (smart contracts) for the blockchain and decentralized applications. This concept is in contrast to Bitcoin, where the scripting language is limited in nature and allows necessary operations only. This is an excerpt from the second edition of Mastering Blockchain by Imran Bashir.

The following list shows all the releases of Ethereum, from the first release to the planned final release:

Olympic - May 2015
Frontier - July 30, 2015
Homestead - March 14, 2016
Byzantium (first phase of Metropolis) - October 16, 2017
Metropolis - to be released
Serenity (final version of Ethereum) - to be released

The first version of Ethereum, called Olympic, was released in May 2015. Two months later, a second version was released, called Frontier. After about a year, another version named Homestead, with various improvements, was released in March 2016. The latest Ethereum release is called Byzantium. This is the first part of the development phase called Metropolis. This release implemented a planned hard fork at block number 4,370,000 on October 16, 2017. The second part of this release, called Constantinople, is expected in 2018, but there is no exact time frame available yet. The final planned release of Ethereum is called Serenity. Serenity is planned to introduce the final version of a PoS-based blockchain instead of PoW.

The yellow paper

The Yellow Paper, written by Dr. Gavin Wood, serves as a formal definition of the Ethereum protocol. Anyone can implement an Ethereum client by following the protocol specifications defined in the paper. While this paper is a challenging read, especially for those who do not have a background in algebra or mathematics, it contains a complete formal specification of Ethereum. This specification can be used to implement a fully compliant Ethereum client. The list of symbols used in the paper, with their meanings, is provided here with the anticipation that it will make reading the yellow paper more accessible.

≡ : Is defined as
= : Is equal to
≠ : Is not equal to
║...║ : Length of
∈ : Is an element of
∉ : Is not an element of
∀ : For all
∃ : There exists
∪ : Union
∧ : Logical AND
∨ : Logical OR
⊕ : Exclusive OR
: : Such that
{} : Set
() : Function of tuple
[] : Array indexing
(a, b) : Real numbers >= a and < b
> : Is greater than
≤ : Less than or equal to
+ : Addition
- : Subtraction
∑ : Summation
⌊ ⌋ : Floor, lowest element
⌈ ⌉ : Ceiling, highest element
∅ : Empty set, null
{ : Describing various cases of if, otherwise
. : Sequence concatenation
σ : Sigma, world state
μ : Mu, machine state
Υ : Upsilon, Ethereum state transition function
ᴧ : Contract creation function

The paper also defines notation for the block-level state transition function, for increments, and for the number of bytes of a value.

Ethereum blockchain

Ethereum, like any other blockchain, can be visualized as a transaction-based state machine. This definition is referred to in the Yellow Paper. The core idea is that in the Ethereum blockchain, a genesis state is transformed into a final state by executing transactions incrementally. The final transformation is then accepted as the absolute undisputed version of the state.
In the following diagram, the Ethereum state transition function is shown, where a transaction execution has resulted in a state transition. In the example, a transfer of two Ether from address 4718bf7a to address 741f7a2 is initiated. The initial state represents the state before the transaction execution, and the final state is what the morphed state looks like. Mining plays a central role in state transition, and we will elaborate on the mining process in detail in later sections. The state is stored on the Ethereum network as the world state. This is the global state of the Ethereum blockchain.

How Ethereum works from a user's perspective

For all the conversation around cryptocurrencies, it's very rare for anyone to actually explain how they work from the perspective of a user. Let's take a look at how it works in practice. In this example, I'll use the case of one person (Bashir) transferring money to another (Irshad). You may also want to read our post on whether Ethereum will eclipse Bitcoin. For the purposes of this example, we're using the Jaxx wallet; however, you can use any cryptocurrency wallet for this.

1. First, either a user requests money by sending the request to the sender, or the sender decides to send money to the receiver. The request can be sent by sending the receiver's Ethereum address to the sender. For example, there are two users, Bashir and Irshad. If Irshad requests money from Bashir, then she can send a request to Bashir by using a QR code. This request is encoded as a QR code, shown in the following screenshot, which can be shared via email, text, or any other communication method.

2. Once Bashir receives this request, he will either scan this QR code or manually type or copy Irshad's Ethereum address into the Ethereum wallet software and initiate a transaction. This process is shown in the following screenshot, where the Jaxx Ethereum wallet software on iOS is used to send money to Irshad. The screenshot shows that the sender has entered both the amount and the destination address for sending Ether. Just before sending the Ether, the final step is to confirm the transaction, which is also shown here.

3. Once the request (transaction) for sending money is constructed in the wallet software, it is then broadcast to the Ethereum network. The transaction is digitally signed by the sender as proof that he is the owner of the Ether.

4. This transaction is then picked up by nodes called miners on the Ethereum network for verification and inclusion in the block. At this stage, the transaction is still unconfirmed.

5. Once it is verified and included in the block, the PoW process begins. Once a miner finds the answer to the PoW problem, by repeatedly hashing the block with a new nonce, this block is immediately broadcast to the rest of the nodes, which then verify the block and the PoW.

6. If all the checks pass, then this block is added to the blockchain, and miners are paid rewards accordingly.

7. Finally, Irshad gets the Ether, and it is shown in her wallet software.

On the blockchain, this transaction is identified by the following transaction hash: 0xc63dce6747e1640abd63ee63027c3352aed8cdb92b6a02ae25225666e171009e. Details regarding this transaction can be visualized from the block explorer, as shown in the following screenshot. This walkthrough should give you some idea of how it works.
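For developers who would rather drive this flow from code than from a wallet app, here is a minimal, hypothetical sketch using the web3.py library. Ethereum itself does not mandate any particular client library, the node URL and receiver address below are placeholders, and web3.py method names have changed between major versions (for example, toWei versus to_wei), so treat this as an illustration of the same request-sign-broadcast cycle rather than a definitive recipe.

```python
# Minimal sketch: sending 2 Ether with web3.py (v6-style snake_case names).
from web3 import Web3

# Placeholder node endpoint; an unlocked local node is assumed, so the node
# signs the transaction on our behalf.
w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))

sender = w3.eth.accounts[0]  # first account managed by the local node
receiver = "0x0000000000000000000000000000000000000001"  # placeholder address

tx_hash = w3.eth.send_transaction({
    "from": sender,
    "to": receiver,
    "value": w3.to_wei(2, "ether"),  # the 2 Ether transfer from the example
})

# Wait until a miner includes the transaction in a block (steps 4-6 above).
receipt = w3.eth.wait_for_transaction_receipt(tx_hash)
print("included in block:", receipt.blockNumber)
```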
Different Ethereum networks

The Ethereum network is a peer-to-peer network where nodes participate in order to maintain the blockchain and contribute to the consensus mechanism. Networks can be divided into three types, based on requirements and usage. These types are described in the following subsections.

Mainnet

Mainnet is the current live network of Ethereum. The current version of mainnet is Byzantium (Metropolis) and its chain ID is 1. The chain ID is used to identify the network. A block explorer, which shows detailed information about blocks and other relevant metrics, is available and can be used to explore the Ethereum blockchain.

Testnet

Testnet is the widely used test network for the Ethereum blockchain. This test blockchain is used to test smart contracts and DApps before they are deployed to the production live blockchain. Because it is a test network, it allows experimentation and research. The main testnet is called Ropsten, which contains all the features of the other smaller and special-purpose testnets that were created for specific releases. For example, other testnets include Kovan and Rinkeby, which were developed for testing Byzantium releases. The changes that were implemented on these smaller testnets have also been implemented on Ropsten. The Ropsten test network now contains all the properties of Kovan and Rinkeby.

Private net

As the name suggests, this is a private network that can be created by generating a new genesis block. This is usually the case in private blockchain distributed ledger networks, where a private group of entities start their own blockchain and use it as a permissioned blockchain.

The following list shows the Ethereum networks and their network IDs. These network IDs are used by Ethereum clients to identify the network.

Ethereum mainnet - 1
Morden - 2
Ropsten - 3
Rinkeby - 4
Kovan - 42
Ethereum Classic mainnet - 61

You should now have a good foundation of knowledge to get started with Ethereum. To learn more about Ethereum and other cryptocurrencies, check out the new edition of Mastering Blockchain.

Other posts from this book
A brief history of Blockchain
Write your first Blockchain: Learning Solidity Programming in 15 minutes
15 ways to make Blockchains scalable, secure and safe!
What is Bitcoin

article-image-9-data-science-myths-debunked
Amey Varangaonkar
03 Jul 2018
9 min read
Save for later

9 Data Science Myths Debunked

The benefits of data science are evident for all to see. Not only does it equip you with the tools and techniques to make better business decisions, the predictive power of analytics also allows you to determine future outcomes - something that can prove to be crucial to businesses. Despite all these advantages, data science is a touchy topic for many businesses. It's worth looking at some glaring stats that show why businesses are reluctant to adopt data science:

Poor data across businesses and organizations - in both the private and public sectors - costs the U.S. economy close to $3 trillion per year.
Only 29% of enterprises are able to properly leverage the power of Big Data and derive useful business value from it.

These stats show a general lack of awareness or knowledge when it comes to data science. It could be due to some preconceived notions, or simply a lack of knowledge of data science and its applications, that seems to be a huge hurdle for these companies. In this article, we attempt to take down some of these notions and give a much clearer picture of what data science really is. Here are 9 of the most common myths or misconceptions in data science, and why they are absolutely wrong:

Data science is just a fad, it won't last long

This is probably the most common misconception. Many tend to forget that although 'data science' is a recently coined term, this field of study is a culmination of decades of research and innovation in statistical methodologies and tools. It has been in use since the 1960s or even before - just that the scale at which it was being used then was small. Back in the day, there were no 'data scientists', just statisticians and economists who used now little-known terms such as 'data fishing' or 'data dredging'. Even the terms 'data analysis' and 'data mining' only went mainstream in the 1990s, but they were in use way before that period. Data science's rise to fame has coincided with the exponential rise in the amount of data being generated every minute. The need to understand this information and make positive use of it led to an increase in the demand for data science. Now with Big Data and the Internet of Things going wild, the rate of data generation and the subsequent need for its analysis will only increase. So if you think data science is a fad that will go away soon, think again.

Data science and Business Intelligence are the same

Those who are unfamiliar with what data science and Business Intelligence actually entail often get confused and think they're one and the same. No, they're not. Business Intelligence is an umbrella term for the tools and techniques that give answers to the operational and contextual aspects of your business or organization. Data science, on the other hand, has more to do with collecting information in order to build patterns and insights. Learning about your customers or your audience is Business Intelligence. Understanding why something happened, or whether it will happen again, is data science. If you want to gauge how changing a certain process will affect your business, data science - not Business Intelligence - is what will help you.

Data science is only meant for large organizations with large resources

Many businesses and entrepreneurs are wrongly of the opinion that data science is - or works best - only for large organizations. It is a wrongly perceived notion that you need sophisticated infrastructure to process and get the most value out of your data.
In reality, all you need is a bunch of smart people who know how to get the best value out of the available data. When it comes to taking a data-driven approach, there's no need to invest a fortune in setting up an analytics infrastructure for an organization of any scale. There are many open source tools out there which can be easily leveraged to process large-scale data with efficiency and accuracy. All you need is a good understanding of the tools.

It is difficult to integrate data science systems with the organizational workflow

With the advancement of tech, one critical challenge that has now become very easy to overcome is collaborating with different software systems at once. With the rise of general-purpose programming languages, it is now possible to build a variety of software systems using a single programming language. Take Python, for example. You can use it to analyze your data, perform machine learning, or develop neural networks to work on more complex data models. All the while, you can link your web API designed in Python to communicate with these data science systems. Provisions are also being made to integrate code written in different programming languages while ensuring smooth interoperability and no loss of latency. So if you're wondering how to incorporate your analytics workflow into your organizational workflow, don't worry too much.

Data scientists will be replaced by Artificial Intelligence soon

Although there has been an increased adoption of automation in data science, the notion that the work of a data scientist will be taken over by an AI algorithm soon is rather questionable. Currently, there is an acute shortage of data scientists, as this McKinsey Global Report suggests. Could this change in the future? Will automation completely replace human efforts when it comes to data science? Surely machines are a lot better than humans at finding patterns; AI beat the best Go player, remember. This is what the common perception seems to be, but it is not true. However sophisticated the algorithms become in automating data science tasks, we will always need a capable data scientist to oversee them and fine-tune their performance. Not just that, businesses will always need professionals with strong analytical and problem-solving skills and relevant domain knowledge. They will always need someone to communicate the insights coming out of the analysis to non-technical stakeholders. Machines don't ask questions of data. Machines don't convince people. Machines don't understand the 'why'. Machines don't have intuition. At least, not yet. Data scientists are here to stay, and their demand is not expected to go down anytime soon.

You need a Ph.D. in statistics to be a data scientist

No, you don't. Data science involves crunching numbers to get interesting insights, and it often involves the use of statistics to better understand the results. When it comes to performing some advanced tasks such as machine learning and deep learning, sure, an advanced knowledge of statistics helps. But that does not imply that people who do not have a degree in maths or statistics cannot become expert data scientists. Today, organizations are facing a severe shortage of data professionals capable of leveraging the data to get useful business insights. This has led to the rise of citizen data scientists - professionals who are not experts in data science but can use data science tools and techniques to create efficient data models.
These data scientists are not experts in statistics and maths; they just know the tools inside out, ask the right questions, and have the necessary knowledge of turning data into insights.

Having expertise in the data science tools is enough

Many people wrongly think that learning a statistical tool such as SAS, or mastering Python and its associated data science libraries, is enough to earn the data scientist tag. While learning a tool or skill is always helpful (and also essential), by no means is it the only requisite for doing effective data science. One needs to go beyond the tools and also master skills such as non-intuitive thinking, problem-solving, and knowing the correct practical applications of a tool to tackle any given business problem. Not just that, it requires you to have excellent communication skills to present your insights and findings related to the most complex of analyses to other stakeholders, in a way they can easily understand and interpret. So if you think that a SAS certification is enough to get you a high-paying data science job and keep it, think again.

You need to have access to a lot of data to get useful insights

Many small to medium-sized businesses don't adopt a data science framework because they think it takes lots and lots of data to be able to use the analytics tools and techniques. Data, when present in bulk, always helps, true, but you don't need hundreds of thousands of records to identify a pattern or to extract relevant insights. Per IBM, data science is defined by the 4 Vs of data, meaning Volume, Velocity, Veracity and Variety. If you are able to model your existing data into one of these formats, it automatically becomes useful and valuable. Volume is important to an extent, but it's the other three parameters that add the required quality.

More data = more accuracy

Many businesses collect large hoards of information and use the modern tools and frameworks available at their disposal for analyzing this data. Unfortunately, this does not always guarantee accurate results. Neither does it guarantee useful actionable insights or more value. Once the data is collected, a preliminary analysis of what needs to be done with the data is required. Then, we use the tools and frameworks at our disposal to extract the relevant insights and build an appropriate data model. These models need to be fine-tuned as per the processes for which they will be used. Then, eventually, we get the desired degree of accuracy from the model. Data in itself is quite useless. It's how we work on it - more precisely, how effectively we work on it - that makes all the difference.

So there you have it! Data science is one of the most popular skills to have on your resume today, but it is important to first clear up all the confusion and misconceptions that you may have about it. Lack of information or misinformation can do more harm than good when it comes to leveraging the power of data science within a business - especially considering it could prove to be a differentiating factor in its success or failure. Do you agree with our list? Do you think there are any other commonly observed myths around data science that we may have missed? Let us know.

Read more
30 common data science terms explained
Why is data science important?
15 Useful Python Libraries to make your Data Science tasks Easier
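Returning to the integration myth above: a common, lightweight pattern is to wrap a trained model behind a small web API so that other systems in the organization can call it like any other service. The sketch below is an illustration of that pattern, with a made-up model and endpoint, using Python's Flask and scikit-learn.

```python
# Minimal sketch: exposing a trained model to the rest of the organization
# as a web API. The model and feature layout here are toy examples.
from flask import Flask, jsonify, request
from sklearn.linear_model import LogisticRegression
import numpy as np

app = Flask(__name__)

# Train a throwaway model at startup; in practice you would load a model
# that was trained and validated elsewhere.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()                 # e.g. {"value": 2.5}
    features = np.array([[payload["value"]]])
    prediction = int(model.predict(features)[0])
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(port=5000)  # any internal system can now POST to /predict
```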
article-image-how-artificial-intelligence-and-machine-learning-can-turbocharge-a-game-developers-career
Guest Contributor
06 Sep 2018
7 min read
Save for later

How Artificial Intelligence and Machine Learning can turbocharge a Game Developer's career

Gaming - whether board games or games set in the virtual realm - has been a massively popular form of entertainment since time immemorial. In the pursuit of creating more sophisticated, thrilling, and intelligent games, game developers have delved into ML and AI technologies to fuel innovation in the gaming sphere. The gaming domain is the ideal experimentation bed for evolving technologies because not only do games put up complex and challenging problems for ML and AI to solve, they also pose as a ground for creativity - a meeting ground for machine learning and the art of interaction.

Machine Learning and Artificial Intelligence in Gaming

The reliance on AI for gaming is not a recent development. In fact, it dates back to 1949, when the famous cryptographer and mathematician Claude Shannon made his musings public about how a computer could be made to master Chess. Then again, in 1952, a graduate student in the UK developed an AI that could play tic-tac-toe with ultimate perfection.

However, it isn't just ML and AI that are progressing through experimentation on games. Game development, too, has benefited a great deal from these pioneering technologies. AI and ML have helped enhance the gaming experience on many grounds, such as game design, the interactive quotient, and the inner functionalities of games. These AI use cases focus on two primary things: one is to impart enhanced realism to the virtual gaming environment, and the second is to create a more naturalistic interface between the gaming environment and the players.

As of now, the focus of game developers, data scientists, and ML researchers lies in two specific categories of the gaming domain - games of perfect information and games of imperfect information. In games of perfect information, a player is aware of all the aspects of the game throughout the playing session, whereas in games of imperfect information, players are oblivious to specific aspects of the game.

When it comes to games of perfect information such as Chess and Go, AI has shown various instances of overpowering human intelligence. Back in 1997, IBM's Deep Blue defeated world Chess champion Garry Kasparov in a six-game match. In 2016, Google's AlphaGo emerged as the victor in a Go match, scoring 4-1 after defeating South Korean Go champion Lee Sedol. One of the most advanced chess AIs developed yet, Stockfish, uses a combination of advanced heuristics and brute force to compute numeric values for each and every move in a specific position in Chess. It also effectively eliminates bad moves using the alpha-beta pruning search algorithm.

While the progress and contribution of AI and ML to the field of games of perfect information is laudable, researchers are now intrigued by games of imperfect information. Games of imperfect information offer much more challenging situations that are essentially difficult for machines to learn and master. Thus, the next evolution in the world of gaming will be to create spontaneous gaming environments using AI technology, in which developers will build only the gaming environment and its mechanics instead of creating a game with pre-programmed/scripted plots. In such a scenario, the AI will have to confront and solve spontaneous challenges with personalized scenarios generated on the spot. Games like StarCraft and StarCraft II have stirred up massive interest among game researchers and developers.
In games like StarCraft, players are only partially aware of the game state, and the outcome is determined not just by the AI's moves and the previous state of the game, but also by the moves of the other players. With so little knowledge of a rival's moves, decisions have to be taken on the go. The recent win of OpenAI Five over amateur human players in Dota 2 is a good case in point. OpenAI Five is a team of five neural networks that leverages a scaled-up version of Proximal Policy Optimization and uses a separate LSTM for each hero to learn recognizable strategies. The progress of OpenAI Five shows that even without human data, reinforcement learning can facilitate long-term planning, allowing further progress in games of imperfect information.

Career in Game Development With ML and AI

As ML and AI continue to penetrate the gaming industry, they are creating a huge demand for talented and skilled game developers who are well-versed in these technologies. Game development is at a point where it is no longer necessary to build games using time-consuming manual techniques. ML and AI have made the task of game developers easier: by leveraging these technologies, they can design and build innovative gaming environments and test them automatically.

The integration of AI and ML in the gaming domain is giving rise to new job positions like Gameplay Software Engineer (AI), Gameplay Programmer (AI), and Game Security Data Scientist, to name a few. The salaries of traditional game developers stand in stark contrast to those of developers with AI/ML skills. While the average salary of game developers is usually around $44,000, it can scale up to and beyond $120,000 for those who possess AI/ML skills.

Gameplay Engineer (average salary: $73,000 - $116,000)
Gameplay engineers are usually part of the core game dev team and are entrusted with enhancing existing gameplay systems to enrich the player experience. Companies today look for gameplay engineers who are proficient in C/C++ and well-versed in AI/ML technologies.

Gameplay Programmer (average salary: $98,000 - $149,000)
Gameplay programmers work in close collaboration with the production and design teams to develop cutting-edge features in existing and upcoming gameplay systems. Programming skills are a must, and knowledge of AI/ML technologies is an added bonus.

Game Security Data Scientist (average salary: $73,000 - $106,000)
The role of a game security data scientist is to combine security and data science approaches to detect anomalies and fraudulent behavior in games. This calls for a high degree of expertise in AI, ML, and other statistical methods.

With impressive salaries and exciting job opportunities cropping up fast in the game development sphere, the industry is attracting some major talent. Game developers and software developers around the world are choosing the field for its promise of rapid career growth. If you wish to bag better and more challenging roles in game development, you should upskill by mastering the fields of ML and AI.

Packt Publishing is the leading UK provider of technology eBooks, coding eBooks, videos, and blogs, helping IT professionals put software to work. It offers several books and videos on game development with AI and machine learning. It's never too late to learn new disciplines and expand your knowledge base.
There are numerous online platforms that offer great artificial intelligence courses. The perk of learning from a reputable online platform is that you can learn and grow at your own pace and at your convenience. So, enroll yourself in one and spice up your career in game development!

About the author: Abhinav Rai is a Data Analyst at UpGrad, an online education platform providing industry-oriented programs in collaboration with world-class institutes such as MICA, IIIT Bangalore, and BITS, and with industry leaders including MakeMyTrip, Ola, and Flipkart.

Read more:
Best game engines for AI game development
Implementing Unity game engine and assets for 2D game development [Tutorial]
How to use arrays, lists, and dictionaries in Unity for 3D game development

Introducing Dask: The library that makes scalable analytics in Python easier

Amey Varangaonkar
22 May 2018
6 min read
Python's rise as the preferred language of choice in data science is unprecedented, but not really unexpected. Apart from being a general-purpose language that can be used for a variety of tasks, from scripting to networking, Python offers a rich suite of libraries for common data science tasks such as scientific computing, data visualization, and more. However, one big challenge faced by data scientists is that these packages were not designed for scale. This matters in today's Big Data era, where tons of data needs to be processed and analyzed on the go. A platform that supports the existing Python ecosystem and lets it scale across multiple machines and clusters without hurting performance was conspicuously missing. Enter Dask.

What is Dask?

Dask is a flexible parallel computing library written in Python for analytics, designed mainly to offer scalability and enhanced power to existing packages and libraries. It allows users to scale their existing Python-based projects written with popular libraries such as NumPy, SciPy, pandas, and more.

[Architecture diagram - image courtesy: Slideshare]

The two key components of Dask that interact with the Python libraries are:
Dynamic task schedulers, which take care of the intensive computational workloads.
"Big Data" Dask collections, consisting of dataframes, parallel arrays, and interfaces that allow computations to run on distributed environments.

Why use Dask?

Given that there are already quite a few distributed platforms for large-scale data processing, such as Apache Spark, Apache Storm, and Flink, why and when should one go for Dask? What advantages does this Python library offer? Here are four major reasons to prefer Dask for distributed, scalable analytics in Python:

Easy to get started: If you are an existing Python user, you have probably already worked with popular packages such as NumPy, SciPy, matplotlib, scikit-learn, and pandas. Dask offers a similar, intuitive interface, and since it is part of the bigger Python ecosystem, getting started is very easy. Dask mirrors the existing Python APIs, so switching between the popular packages and their Dask equivalents requires very little porting effort (a minimal sketch appears at the end of this article). For beginners, Dask is a logical option for scalable analytics once they have grasped the fundamentals of Python and the associated libraries.

Scales up and down quite easily: You can run your project with Dask on a single machine, or on a cluster with thousands of cores, without essentially affecting the speed and performance of your code. Dask uses the multi-core CPUs within a single system optimally to process hundreds of terabytes of data without the need for additional hardware. Similarly, for moderate to large datasets spanning 100+ gigabytes, which often don't fit on a single storage device, the computing power of clusters can be coupled with Dask for effective analytics.

Supports complex applications: Many companies tackle complex computations by writing custom code that runs on popular Big Data tools such as Hadoop MapReduce and Apache Spark. With Dask's dynamic task scheduler, it is now possible to run and process complex applications without introducing any additional code.
Dask itself takes care of tasks such as network communication, load balancing, and diagnostics behind the scenes.

Clear, responsive, real-time feedback: One of the most important features of Dask is its user-friendliness. Dask provides a real-time dashboard that highlights the key metrics of the processing task at hand, such as current progress, memory consumption, and more (see the second sketch at the end of this article). It also offers an in-built IPython kernel that allows the user to investigate an ongoing computation from a terminal.

How Dask compares with Apache Spark

Apache Spark is one of the most popular and widely used Big Data tools for distributed data processing and analytics. Dask and Apache Spark have many features in common, prompting us and many other developers to ask the question: which tool is better? While Spark has been around for quite some time and has gained many standard, stable features over years of development, Dask is quite new and is still being improved as a tool. The important differences between Dask and Apache Spark are summarized in the table below:

Criteria | Apache Spark | Dask
Primary language | Scala | Python
Scale | Supports a single node up to thousands of nodes in a cluster | Supports a single node up to thousands of nodes in a cluster
Ecosystem | All-in-one, self-sufficient ecosystem | Integrates with popular libraries within the Python ecosystem
Flexibility | Low | High
Stream processing | Built-in module (Spark Streaming) | Real-time interface that is fairly low-level and requires more work than Apache Spark
Graph processing | Possible with the GraphX module | Not possible
Machine learning | Uses the Spark MLlib module | Integrates with scikit-learn and XGBoost
Popularity | Very high; a commonly used tool in the Big Data ecosystem | Fairly new, but has already found its place in the pandas, scikit-learn, and Jupyter stack

You can read a detailed comparison of Apache Spark and Dask on the official Dask documentation page.

What we can expect from Dask

As the comparison above shows, it is fairly easy to port an existing Python project that uses high-profile libraries such as NumPy and scikit-learn. Python developers and data scientists will appreciate the high flexibility and complex computational capabilities offered by Dask. The limited stream processing and graph processing features are big areas for improvement, but we can expect developments on this front in the near future. Even though Dask is still relatively new, it looks very promising due to its close affinity with the Python ecosystem. With Python's clout rising, many people would prefer a Python-based data processing tool that works at scale without having to switch to an external Big Data framework. Dask may well be the superhero that comes to developers' rescue in such cases. You can learn more about the latest developments in Dask on its official GitHub page.

Read more:
Is Apache Spark today's Hadoop?
Apache Spark 2.3 now has native Kubernetes support!
Should you move to Python 3? 7 Python experts' opinions
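To make the pandas-like interface described above a little more concrete, here is a minimal sketch of Dask's collections in action. The file pattern and column names are hypothetical placeholders; the point is that the code reads like ordinary pandas/NumPy, while Dask builds a lazy task graph and only executes it when .compute() is called.

```python
# Minimal sketch of Dask collections with a pandas/NumPy-like API.
# The file pattern and column names below are hypothetical placeholders.
import dask.dataframe as dd
import dask.array as da

# Many CSV files are read as one logical dataframe, partitioned under the hood.
df = dd.read_csv("data/2018-*.csv")

# Familiar pandas-style operations build a task graph instead of running eagerly.
daily_mean = df.groupby("day")["amount"].mean()

# Nothing has executed yet; .compute() hands the graph to Dask's scheduler.
print(daily_mean.compute().head())

# The same idea for arrays: a NumPy-like array processed in 1000 x 1000 chunks.
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))
print(x.mean(axis=0).compute()[:5])
```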
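And here is a minimal sketch of the scaling and dashboard story. Calling Client() with no arguments starts a local cluster on the current machine and exposes the real-time diagnostic dashboard mentioned above; pointing the client at a remote scheduler address instead runs the same code on a cluster. Again, the file pattern is a hypothetical placeholder.

```python
# Minimal sketch of Dask's distributed scheduler and live dashboard.
from dask.distributed import Client
import dask.dataframe as dd

# With no arguments this spins up a local cluster of worker processes;
# Client("tcp://scheduler-address:8786") would connect to a real cluster instead.
client = Client()
print(client.dashboard_link)  # URL of the live dashboard (progress, memory, task stream)

df = dd.read_csv("data/2018-*.csv")   # hypothetical file pattern, as before
print(df["amount"].sum().compute())   # this computation shows up on the dashboard as it runs

client.close()
```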