How-To Tutorials

How far will Facebook go to fix what it broke: Democracy, Trust, Reality

Aarthi Kumaraswamy
24 Sep 2018
19 min read
Facebook, along with other tech media giants like Twitter and Google, broke the democratic process in 2016. Facebook also broke the trust of many of its users as scandal after scandal kept surfacing, telling the same story in different ways - the story of user data and trust abused in exchange for growth and revenue. The week before last, Mark Zuckerberg posted a long explanation on Facebook titled 'Preparing for Elections'. It is the first in a series of reflections by Zuckerberg that 'address the most important issues facing Facebook'. That post explored what Facebook is doing to avoid ending up in a situation similar to the 2016 elections, when the platform 'inadvertently' became a super-effective channel for election interference of various kinds. It follows just weeks after Facebook COO Sheryl Sandberg appeared in front of a Senate Intelligence hearing alongside Twitter CEO Jack Dorsey on the topic of social media's role in election interference.

Zuckerberg's mobile-first rigor oversimplifies the issues

Zuckerberg opened his post with a strong commitment to addressing the issues plaguing Facebook using the highest levels of rigor the company has known in its history. He wrote, "I am bringing the same focus and rigor to addressing these issues that I've brought to previous product challenges like shifting our services to mobile."

To understand the weight of this statement, we must go back to how Facebook became a mobile-first company that beat investor expectations wildly. Suffice to say, it went through painful years of restructuring and reorientation in the process. If you are unfamiliar with that phase of Facebook, please read the section 'How far did Facebook go to become a mobile-first company?' at the end of this post for more details.

To be fair, Zuckerberg does acknowledge that pivoting to mobile was a lot easier than what it will take to tackle the current set of challenges. He writes, "These issues are even harder because people don't agree on what a good outcome looks like, or what tradeoffs are acceptable to make. When it comes to free expression, thoughtful people come to different conclusions about the right balances. When it comes to implementing a solution, certainly some investors disagree with my approach to invest so much on security. We have a lot of work ahead, but I am confident we will end this year with much more sophisticated approaches than we began, and that the focus and investments we've put in will be better for our community and the world over the long term."

However, what Zuckerberg does not acknowledge in the above statement is that the current set of issues is not merely a product challenge, but a business ethics and sustainability challenge. Unless 'an honest look in the mirror' kind of analysis is done on that side of Facebook, any level of product improvement will only result in cosmetic changes that end in an 'operation successful, patient dead' scenario. In the coming sections, I attempt to dissect Zuckerberg's post in the context of the above points, reading between the lines to see how serious the platform really is about changing its ways to 'be better for our community and the world over the long term'.

Why does Facebook's commitment to change feel hollow?

Let's focus on election interference in this analysis, as Zuckerberg limits his views to this topic in his post. Facebook has been at the center of this story on many levels. Here is some context on where Zuckerberg is coming from.
Facebook's involvement in the 2016 election meddling

Apart from the traditional cyber-attacks (which they had, even back then, managed to prevent successfully), there were Russia-backed coordinated misinformation campaigns found on the platform. Then there was also the misuse of user data by the data analytics firm Cambridge Analytica, which consulted on election campaigning. They micro-profiled users based on their psychographics (the way they think and behave) to ensure more effective ad spending by political parties. There was also the issue of certain kinds of ads, subliminal messages, and peer pressure sent out to specific Facebook users during elections to prompt them to vote for certain candidates while others did not receive similar messages. There were also alleged reports of a certain set of users having been sent 'dark posts' (posts that aren't publicly visible to all, but visible only to those on the target list) to discourage them from voting altogether.

It also appears that Facebook staff offered to assist both the Clinton and the Trump campaigns with Facebook advertising. The former declined the offer while the latter accepted.

We don't know which of the above decisions and actions impacted the outcome of the 2016 US presidential elections, or to what extent each of them did. But one thing is certain: collectively they had a significant enough impact for Zuckerberg and team to acknowledge that these are serious problems they need to address, NOW!

Deconstructing Zuckerberg's 'Preparing for Elections'

Before diving into what is problematic about the measures that are taken (or not taken) by Facebook, I must commend them for taking ownership of their role in election interference in the past and for attempting to rectify the wrongs. I like that Zuckerberg has made himself vulnerable by sharing his corrective plans with the public while they are a work in progress, and that he is engaging with the public at a personal level. Facebook's openness to academic research using anonymized Facebook data, and its willingness to permit publishing findings without Facebook's approval, is also noteworthy. Other initiatives such as the political ad transparency report, the AI-enabled fake account and fake news reduction strategy, doubling the content moderator base, and improving their recommendation algorithms are all steps in the right direction.

However, this is where my list of nice things to say ends. The overall tone of Zuckerberg's post is that of bargaining rather than acceptance. Interestingly, this was exactly the tone adopted by Sandberg in the Senate hearing earlier this month as well, down to some very similar phrases. This makes one question if everything isn't just one well-orchestrated PR disaster management plan. Disappointingly, most of the actions stated in Zuckerberg's post feel like half-measures; I get the sense that they aren't willing to go the full distance to achieve the objectives they set for themselves. I hope to be wrong.

1. Zuckerberg focuses too much on the 'what' and 'how', and ignores the 'why'

Zuckerberg identifies three key issues he wants to address in 2018: preventing election interference, protecting the community from abuse, and providing users with better control over their information. This clarity is a good starting point. In this post, he only focuses on the first issue, so I will reserve my detailed thoughts on the other two for now.
What I would say for now is that the key to addressing all issues on Facebook is taking a hard look at Facebook's policies, including privacy, from a mission statement perspective. In other words, be honest about why Facebook exists. Users are annoyed, advertisers are not satisfied, and neither are shareholders confident about Facebook's future. Trying to be everyone's friend is clearly not working for Facebook. As such, I expected this to be the opening part of the series. 'Be better for our community and the world over the long term' is too vague a mission statement to be of any practical use.

2. The political ad transparency report is necessary, but not sufficient

In May this year, Facebook released its first political ad transparency report as a gesture to show its commitment to minimizing political interference. The report allows one to see who sponsored which issue advertisement and for how much. This was a move welcomed by everyone, and soon others like Twitter and Google followed suit. By doing this, Facebook hopes to allow its users to form more informed views about political causes and other issues.

Here is my problem with this feature. (Yes, I do view this report as a 'feature' of the new Facebook app, one which serves a very specific need: to satisfy regulators and the media.) The average Facebook user is not a politically or technologically savvy consumer. They use Facebook to connect with friends and family, and maybe play silly games now and then. The majority of these users aren't going to proactively check out this ad transparency report or the political ad database to arrive at the right conclusions. The people who will find this report interesting are academic researchers, campaign managers, and analysts. It is one more rich data point for understanding campaign strategy and thereby inferring who the target audience is. This could most likely lead to a downward spiral of more and more polarizing ads from parties across the spectrum.

3. How election campaigning, hate speech, and real violence are linked but unacknowledged

Another issue closely tied to political ads is hate speech and violence-inciting, polarising content that isn't necessarily paid advertising. This is typically content in the form of posts, images, or videos posted as a response to political ads or discourses. These act as carriers that amplify the political message, often in ways unintended by the campaigners themselves. The echo chambers still exist. And the more one's ecosystem or 'look-alike audience' responds to certain types of ads or posts, the more likely users are to keep seeing them, thanks to Facebook's algorithms. Seeing something that is endorsed by one's friends often primes one to trust what is said without verifying the facts for oneself, thus enabling fake news to go viral. The algorithm does the rest to ensure everyone who will engage with the content sees it. Newsy political ads will thrive in such a setup while Facebook gets away with saying 'we made full disclosure in our report'. All of this is great for Facebook's platform, as it not only gets great engagement from the content but also increased ad spending from all political parties, who can't afford to be missing in action on Facebook. A by-product of this ultra-polarised scenario, though, is more protectionism and less free, open, and meaningful dialog and debate between candidates as well as supporters on the platform. That's bad news for the democratic process.
4. Facebook's election interference prevention model is not scalable

Their single-minded focus on eliminating US election interference on Facebook's platforms through a multipronged approach to content moderation is worth appreciating. This also makes one optimistic about Facebook's role in consciously attempting to do the right thing when it comes to respecting election processes in other nations as well. But the current approach of creating an 'election war room' is neither scalable nor sustainable. What happens every time a US constituency holds an election, or some other part of the world does? What happens when multiple elections take place across the world simultaneously? Whom does Facebook prioritize for election interference defense support, and why? Also, I wouldn't go so far as to trust that they will uphold individual liberties in troubled nations with strong regimes or strongly divisive political discourses. What happens when the ruling party is the one interfering with the elections? Who is Facebook answerable to?

5. Facebook's headcount hasn't kept up with its own growth ambitions

Zuckerberg proudly states in his post that they've deleted a billion fake accounts with machine learning and have doubled the number of people hired to work on safety and security:

"With advances in machine learning, we have now built systems that block millions of fake accounts every day. In total, we removed more than one billion fake accounts -- the vast majority within minutes of being created and before they could do any harm -- in the six months between October and March. ...it is still very difficult to identify the most sophisticated actors who build their networks manually one fake account at a time. This is why we've also hired a lot more people to work on safety and security -- up from 10,000 last year to more than 20,000 people this year."

'People working on safety and security' could cover a wide range of job responsibilities, from network security engineers to security guards hired at Facebook offices. What is conspicuously missing in the above picture is a breakdown of the number of people hired specifically to fact-check, moderate content, resolve policy-related disputes, and review flagged content. With billions of users posting on Facebook, the job of content moderators and policy enforcers, even when assisted by algorithms, is massive. It is important that they are rightly incentivized to do their job well and are set clear and measurable goals. The post talks neither about how Facebook plans to reward moderators nor about what the yardsticks for performance in this area would be. Facebook fails to acknowledge that it is not fully prepared, partly because it is understaffed.

6. The new 'Product Policy Director, Human Rights' role is a glorified public relations job

The weekend following Zuckerberg's post, a new job opening appeared on Facebook's careers page for the position of 'Product policy director, human rights' (the snippet of responsibilities discussed below is taken from that job posting; source: Facebook careers). What is listed there is typically what a public relations head does as well. Not only are the responsibilities heavily based on communication and public-perception building, there's not much given in terms of authority for this role to influence how other teams achieve their goals. Simply put, this role 'works with, coordinates, or advises teams'; it does not 'guide or direct teams'.
Also, another key point to observe is that this role aims to add another layer of distance to further minimize exposure for Zuckerberg, Sandberg, and other key top executives in public forums such as congressional hearings or press meets. Any role or area that is important to a business typically finds a place at the C-suite table. Had this new role been a C-suite role, it would have been advertised as such, and it might have had some teeth. Of the 24 key executives at Facebook, only one is concerned with privacy and policy: the 'Chief Privacy Officer & VP of U.S. Public Policy'. Even this role does not have a global directive or public welfare in mind. On the other hand, there are multiple product development, creative, and business development roles in Facebook's C-suite. There is even a separate Watch product head, a messaging product head, and one dedicated just to China, called 'Head of Creative Shop - Greater China'.

This is why Facebook's plan to protect elections will fail

I am afraid Facebook's greatest strength is also its Achilles' heel. The tech industry's deified hacker culture is embodied perfectly by Facebook. Facebook's flawed ad-revenue-based business model is the ingenious creation of that very hacker culture. Any attempt to correct everything else is futile without correcting the issues with the current model. The ad revenue model is why the Facebook app is designed the way it is: with 'relevant' news feeds, filter bubbles, and look-alike audience segmentation. It is the reason why viral content gets rewarded irrespective of its authenticity or the impact it has on society. It is also the reason why Facebook has a 'move fast and break things' internal culture where growth at all costs is favored and idolized.

Facebook's Q2 2018 earnings summary highlights the above points succinctly (source: Facebook's SEC filing). That snapshot means that even if we assume all 30k-odd employees do some form of content moderation (the probability of which is zero), every employee is responsible for 50k users' content daily. Let's say every user only posts one post a day. If we assume Facebook's news feed algorithms are super efficient and only find 2% of the user content questionable/fake (as speculated by Sandberg in her Senate hearing this month), that would still mean nearly 1k posts per person to review every day!
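To make that back-of-the-envelope arithmetic explicit (the roughly 1.47 billion daily active users and 30,000-odd employees are the Q2 2018 figures from the earnings snapshot referenced above):

$$\frac{1.47 \times 10^9 \ \text{daily active users}}{\approx 30{,}000 \ \text{employees}} \approx 50{,}000 \ \text{users per employee}$$

$$50{,}000 \ \text{posts/day} \times 2\% = 1{,}000 \ \text{posts to review per employee per day}$$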
What can Facebook do to turn over a new leaf?

Unless Facebook attempts to sincerely address at least some of the below, I will continue to be skeptical of any number of beautifully written posts by Zuckerberg or patriotically orated speeches by Sandberg:

- A content moderation transparency report that shares not just the number of posts moderated and the number of people working to moderate content on Facebook, but also the nature of the content moderated, the moderators' job satisfaction levels, their tenure, qualifications, career aspirations, and challenges, and how much Facebook is investing in people, processes, and technology to make its platform safe and objective for everyone to engage with others.
- A general ad transparency report that not only lists advertisers on Facebook but also their spending and chosen ad filters, for the public and academia to review or analyze at any time.
- Taking responsibility for the real-world consequences of actions enabled by Facebook, like the recent gender and age discrimination in employment ads shown on Facebook.
- Really banning hate speech and fake viral content.
- Bringing in a business/AI ethics head who answers only to Zuckerberg and is equal to Sandberg's COO role.
- Exploring and experimenting with alternative revenue channels to tackle the problems of the current ad-driven business model.
- Resolving the UI problem so that users can regain control over their data and can easily choose not to participate in Facebook's data experiments. This would mean a potential loss of some ad revenue.
- Fixing the 'growth hacker' culture problem that is a byproduct of years of moving fast and breaking things. This would mean a significant change in behavior by everyone, starting from the top, and probably restructuring the way teams are organized and business is done. It would also mean a different definition and measurement of success, which could lead to shareholder backlash. But Mark is uniquely placed to withstand these pressures, given his clout over the board's voting power.

Like his role model Augustus Caesar, Zuckerberg has a chance to make history. But he might have to put the company through hard and sacrificing times in exchange for the proverbial 200 years of world peace. He's got the best minds and limitless resources at his disposal to right what he and his platform wronged. But he would have to turn against the hands that feed him. Will he rise to the challenge? Like Augustus, who is rumored to have killed his grandson, will Zuckerberg ever be prepared to kill his ad-revenue-generating brainchild?

In the meantime, we must not underestimate the power of good digital citizenry. We must continue to fight the good fight to move tech giants like Facebook in the right direction. Just as persistently trickling water droplets can erode mountains and create new pathways, so can our mindful actions as digital platform users prompt major tech reforms. It could be as bold as deleting one's Facebook account (I haven't been on the platform for years now, and I don't miss it at all). You could organize groups to create awareness of topics like digital privacy, fake news, and filter bubbles, or deliberately choose to engage with those whose views differ from yours to understand their perspective, and thereby do your part in reversing algorithmically accentuated polarity. It could also be by selecting the right individuals to engage in informed dialog with tech conglomerates. Not every action needs to be hard, though. It could be as simple as customizing your default privacy settings, choosing to spend only a select amount of time on such platforms, or deciding to verify the authenticity and assess the toxicity of a post you wish to like, share, or forward to your network.

Addendum: How far did Facebook go to become a mobile-first company?

Following are some of the things Facebook did to become the largest mobile advertising platform in the world, surpassing Google by a huge margin:

- Clear purpose and reason for the change: "For one, there are more mobile users. Second, they're spending more time on it... third, we can have better advertising on mobile, make more money," said Zuckerberg at TechCrunch Disrupt back in 2012 on why they were becoming mobile-first. In other words, there was a lot of growth and revenue potential in investing in this space. This was a simple and clear 'what's in it for me' incentive for everyone working to make the transition, as well as for stockholders and advertisers to place their trust in Zuckerberg's endeavors.
- Setting company-wide accountability: "We realigned the company around, so everybody was responsible for mobile," said the then President of Business and Marketing Partnerships, David Fischer, to Fortune in 2013.
- Willing to sacrifice desktop for mobile: Facebook made a bold gamble, risking its desktop users to grow its unproven mobile platform. Essentially, it was willing to bet its only cash cow on a dark horse whose success depended on many other factors going right.
- Strict consequences for non-compliance: Back in the days of transitioning to a mobile-first company, Zuckerberg famously told his product teams that when they came in for reviews: "Come in with mobile. If you come in and try to show me a desktop product, I'm going to kick you out. You have to come in and show me a mobile product."
- Expanding resources and investing in reskilling: They grew from a team of 20 mobile engineers to literally all engineers at Facebook undergoing training courses on iOS and Android development. "We've completely changed the way we do product development. We've trained all our engineers to do mobile first," said Facebook's VP of corporate development, Vaughan Smith, to TechCrunch by the end of 2012.
- Realigning product design philosophy: They designed custom features for the mobile-first interface instead of trying to adapt web features to mobile. In other words, they began with mobile as their default user interface.
- Local and global user behavior sensitization: Some of their engineering teams even did field visits to developing nations like the Philippines to see first-hand how mobile apps were being used there.
- Environmental considerations in app design: Facebook even had the foresight to consider scenarios where mobile users may not have a quality internet signal or may face battery-life issues, and designed its apps with these needs in mind.


Understanding Deep Reinforcement Learning by understanding the Markov Decision Process [Tutorial]

Savia Lobo
24 Sep 2018
10 min read
This article is an excerpt taken from the book Hands-On Intelligent Agents with OpenAI Gym, written by Praveen Palanisamy. In this article, the author introduces us to the Markov Decision Process, followed by an understanding of deep reinforcement learning.

A Markov Decision Process (MDP) provides a formal framework for reinforcement learning. It is used to describe a fully observable environment where the outcomes are partly random and partly dependent on the actions taken by the agent or the decision maker. A Markov Process progresses into a Markov Decision Process through the Markov Reward Process. These stages can be described as follows:

- A Markov Process (or a Markov chain) is a sequence of random states s1, s2, ... that obeys the Markov property. In simple terms, it is a random process without any memory of its history.
- A Markov Reward Process (MRP) is a Markov Process (also called a Markov chain) with values.
- A Markov Decision Process is a Markov Reward Process with decisions.

Dynamic programming with the Markov Decision Process

Dynamic programming is a very general method for efficiently solving problems that can be decomposed into overlapping sub-problems. If you have used any type of recursive function in your code, you might have already got some preliminary flavor of dynamic programming. Dynamic programming, in simple terms, tries to cache or store the results of sub-problems so that they can be used later if required, instead of computing the results again. Okay, so how is that relevant here, you may ask. Well, dynamic programming is pretty useful for solving a fully defined MDP: an agent that has full knowledge of the MDP can use it to find the most optimal way to act in an environment and achieve the highest reward. The following summarizes the inputs and outputs when we are interested in sequential prediction or control:

- Prediction: the input is an MDP (or MRP) and a policy π; the output is the value function Vπ.
- Control: the input is an MDP; the output is the optimal value function V* and the optimal policy π*.

Monte Carlo learning and temporal difference learning

At this point, we understand that it is very useful for an agent to learn the state-value function Vπ(s), which informs the agent about the long-term value of being in state s, so that the agent can decide if it is a good state to be in or not. The Monte Carlo (MC) and temporal difference (TD) learning methods enable an agent to learn that! The goal of MC and TD learning is to learn the value function from the agent's experience as the agent follows its policy π. The following summarizes the value estimate's update equation for the MC and TD learning methods:

- Monte Carlo: V(S_t) ← V(S_t) + α(G_t − V(S_t))
- Temporal Difference (TD(0)): V(S_t) ← V(S_t) + α(R_{t+1} + γV(S_{t+1}) − V(S_t))

MC learning updates the value towards the actual return G_t = R_{t+1} + γR_{t+2} + ... + γ^(T−t−1)R_T, which is the total discounted reward from time step t until the end of the sequence. It is important to note that we can calculate this value only after the end of the sequence, whereas TD learning (TD(0), to be precise) updates the value towards the estimated return R_{t+1} + γV(S_{t+1}), which can be calculated after every step.
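To make the difference concrete, here is a minimal TD(0) sketch in plain Python (my illustration, not from the book), estimating state values on the classic five-state random walk under a uniformly random policy; the toy environment, the starting state, and the constants are assumptions chosen for illustration:

```python
import random

# Toy episodic environment: states 0..4 on a line, starting at state 2.
# Stepping right from state 4 terminates with reward 1;
# stepping left from state 0 terminates with reward 0.
def step(state, action):               # action is -1 (left) or +1 (right)
    next_state = state + action
    if next_state < 0:
        return None, 0.0               # terminal, reward 0
    if next_state > 4:
        return None, 1.0               # terminal, reward 1
    return next_state, 0.0

alpha, gamma = 0.1, 1.0
V = [0.0] * 5                          # value estimates V(s)

for episode in range(5000):
    s = 2
    while s is not None:
        a = random.choice([-1, 1])     # uniformly random policy
        s_next, r = step(s, a)
        v_next = 0.0 if s_next is None else V[s_next]
        # TD(0): move V(s) towards the estimated return r + gamma * V(s')
        # after every single step, without waiting for the episode to end
        V[s] += alpha * (r + gamma * v_next - V[s])
        s = s_next

print([round(v, 2) for v in V])        # approaches roughly [0.17, 0.33, 0.5, 0.67, 0.83]
```

An MC learner would instead record the whole episode and only then nudge each visited state's value towards the actual return it observed.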
SARSA and Q-learning

It is also very useful for an agent to learn the action-value function Qπ(s, a), which informs the agent about the long-term value of taking action a in state s, so that the agent can take those actions that will maximize its expected, discounted future reward. The SARSA and Q-learning algorithms enable an agent to learn that! The following summarizes the update equation for the SARSA algorithm and the Q-learning algorithm:

- SARSA: Q(S, A) ← Q(S, A) + α(R + γQ(S', A') − Q(S, A))
- Q-learning: Q(S, A) ← Q(S, A) + α(R + γ max_a' Q(S', a') − Q(S, A))

SARSA is so named because of the sequence State -> Action -> Reward -> State' -> Action' that the algorithm's update step depends on. The description of the sequence goes like this: the agent, in state S, takes an action A and gets a reward R, and ends up in the next state S', after which the agent decides to take an action A' in the new state. Based on this experience, the agent can update its estimate of Q(S, A). Q-learning is a popular off-policy learning algorithm, and it is similar to SARSA, except for one thing: instead of using the Q value estimate for the new state and the action that the agent took in that new state, it uses the Q value estimate that corresponds to the action that leads to the maximum obtainable Q value from that new state, S'.
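Here is a short tabular Q-learning sketch (my illustration, not from the book), assuming a hypothetical environment object env with the classic Gym-style API: env.reset() returns a state index and env.step(a) returns (next_state, reward, done, info). Swapping the max in the target for the Q value of the action actually chosen in S' would turn this into SARSA:

```python
import random

def q_learning(env, n_states, n_actions, episodes=1000,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    # Tabular action-value estimates Q(s, a), initialized to zero
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy behavior policy: mostly greedy, sometimes random
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s_next, r, done, _ = env.step(a)
            # Off-policy target: bootstrap from the best action available
            # in S', regardless of what the behavior policy does next
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q
```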
Deep reinforcement learning

With a basic understanding of reinforcement learning, you are now in a better state (hopefully you are not in a strictly Markov state where you have forgotten the history/things you have learned so far) to understand the basics of the cool new suite of algorithms that have been rocking the field of AI in recent times. Deep reinforcement learning emerged naturally when people made advancements in the deep learning field and applied them to reinforcement learning.

We learned about the state-value function, the action-value function, and the policy. Let's briefly look at how they can be represented mathematically or realized through computer code. The state-value function V(s) is a real-valued function that takes the current state s as input and outputs a real number (such as 4.57). This number is the agent's prediction of how good it is to be in state s, and the agent keeps updating the value function based on the new experiences it gains. Likewise, the action-value function Q(s, a) is also a real-valued function, which takes an action a as input in addition to the state s, and outputs a real number. One way to represent these functions is using neural networks, because neural networks are universal function approximators, capable of representing complex, non-linear functions. For an agent trying to play a game of Atari by just looking at the images on the screen (like we do), the state could be the pixel values of the image on the screen. In such cases, we could use a deep neural network with convolutional layers to extract the visual features from the state/image, and then a few fully connected layers to finally output V(s) or Q(s, a), depending on which function we want to approximate. Recall from the earlier sections of this chapter that V(s) is the state-value function and provides an estimate of the value of being in state s, and Q(s, a) is the action-value function, which provides an estimate of the value of each action a given the state s. If we do this, then we are doing deep reinforcement learning! Easy enough to understand? I hope so.

Let's look at some other ways in which we can use deep learning in reinforcement learning. Recall that a policy is represented as a = μ(s) in the case of deterministic policies, and as π(a|s) in the case of stochastic policies, where actions could be discrete (such as "move left," "move right," or "move straight ahead") or continuous values (such as "0.05" for acceleration, "0.67" for steering, and so on), and they can be single- or multi-dimensional. Therefore, a policy can be a complicated function at times! It might have to take in a multi-dimensional state (such as an image) as input and output a multi-dimensional vector of probabilities (in the case of stochastic policies). So, this does look like it will be a monster function, doesn't it? Yes, it does. That's where deep neural networks come to the rescue! We could approximate an agent's policy using a deep neural network and directly learn to update the policy (by updating the parameters of the deep neural network). This is called policy optimization-based deep reinforcement learning, and it has been shown to be quite efficient in solving several challenging control problems, especially in robotics.

So, in summary, deep reinforcement learning is the application of deep learning to reinforcement learning, and so far researchers have applied deep learning to reinforcement learning successfully in two ways. One way is to use deep neural networks to approximate the value functions, and the other is to use a deep neural network to represent the policy. These ideas have been known from the early days, when researchers were trying to use neural networks as value function approximators, even back in 2005. But they rose to stardom only recently because, although neural networks and other non-linear value function approximators can better represent the complex values of environment states and actions, they were prone to instability and often led to sub-optimal functions. Only recently have researchers such as Volodymyr Mnih and his colleagues at DeepMind (now part of Google) figured out the trick of stabilizing the learning and trained agents, with deep non-linear function approximators, that converged to near-optimal value functions. In the later chapters of this book, we will, in fact, reproduce some of their then-groundbreaking results, which surpassed human Atari game playing capabilities!

Practical applications of reinforcement and deep reinforcement learning algorithms

Until recently, practical applications of reinforcement learning and deep reinforcement learning were limited, due to sample complexity and instability. But these algorithms have proved to be quite powerful in solving some really hard practical problems. Some of them are listed here to give you an idea:

- Learning to play video games better than humans: This news has probably reached you by now. Researchers at DeepMind and elsewhere developed a series of algorithms, starting with DeepMind's Deep Q-Network, or DQN for short, which reached human-level performance in playing Atari games. We will actually be implementing this algorithm in a later chapter of this book! In essence, it is a deep variant of the Q-learning algorithm we briefly saw in this chapter, with a few changes that increased the speed of learning and the stability. It was able to reach human-level performance in terms of game scores after several games. What is more impressive is that the same algorithm achieved this level of play without any game-specific fine-tuning or changes!
- Mastering the game of Go: Go is a Chinese game that has challenged AI for several decades. It is played on a full-size 19 x 19 board and is orders of magnitude more complex than chess because of the enormous number (roughly 10^170) of possible board positions. Until recently, no AI algorithm or software was able to play anywhere close to the level of humans at this game. AlphaGo, the AI agent from DeepMind that uses deep reinforcement learning and Monte Carlo tree search, changed all this and beat the world champion Lee Sedol (4-1) and the European champion Fan Hui (5-0).
  DeepMind has since released more advanced versions of its AI agent, named AlphaGo Zero (which uses zero human knowledge and learned to play all by itself!) and AlphaZero (which can play the games of Go, chess, and shogi!), all of which used deep reinforcement learning as the core algorithm.
- Helping AI win Jeopardy!: IBM's Watson, the AI system that came to fame by beating humans at Jeopardy!, used an extension of TD learning to create its daily-double wagering strategies, which helped it win against human champions.
- Robot locomotion and manipulation: Both reinforcement learning and deep reinforcement learning have enabled the control of complex robots, both for locomotion and navigation. Several recent works from researchers at UC Berkeley have shown how, using deep reinforcement learning, they can train policies that provide vision and control for robotic manipulation tasks, and generate joint actuations to make a complex bipedal humanoid walk and run.

Summary

To summarize, in this article we learned about the Markov Decision Process, deep reinforcement learning, and its applications. If you've enjoyed this post, head over to the book Hands-On Intelligent Agents with OpenAI Gym to implement learning algorithms for machine software agents that solve discrete or continuous sequential decision-making and control tasks, and much more.

- Budget and Demand Forecasting using Markov model in SAS [Tutorial]
- Implement Reinforcement learning using Markov Decision Process [Tutorial]
- What are generative adversarial networks (GANs) and how do they work? [Video]


Performing Sentiment Analysis with R on Obama's State of the Union speeches [Tutorial]

Sugandha Lahoti
23 Sep 2018
16 min read
For this article, we will take a look at former President Obama's State of the Union speeches and perform sentiment analysis on them with R. The two main analytical goals are to build topic models on the seven State of the Union speeches and then compare the first speech in 2010 and the last in January 2016 on sentence-based textual measures, such as sentiment and dispersion. This tutorial is taken from the book Mastering Machine Learning with R - Second Edition by Cory Lesmeister. In this book, you will master machine learning techniques with R to deliver insights in complex projects.

Preparing our data and performing text transformations

The primary package that we will use is tm, the text mining package. We will also need SnowballC for the stemming of the words, RColorBrewer for the color palettes in wordclouds, and the wordcloud package. Please ensure that you have these packages installed before attempting to load them:

```
> library(tm)
> library(wordcloud)
> library(RColorBrewer)
```

The data files are available for download at https://github.com/datameister66/data. Please ensure you put the text files into a separate directory because they will all go into our corpus for analysis. Download the seven .txt files, for example, sou2012.txt, into your working R directory. You can identify your current working directory and set it with these functions:

```
> getwd()
> setwd(".../data")
```

We can now begin to create the corpus by first creating an object with the path to the speeches and then seeing how many files are in this directory and what they are named:

```
> name <- file.path(".../text")
> length(dir(name))
[1] 7
> dir(name)
[1] "sou2010.txt" "sou2011.txt" "sou2012.txt" "sou2013.txt"
[5] "sou2014.txt" "sou2015.txt" "sou2016.txt"
```

We will name our corpus docs and create it with the Corpus() function, wrapped around the directory source function, DirSource(), which is also part of the tm package:

```
> docs <- Corpus(DirSource(name))
> docs
<<VCorpus>>
Metadata: corpus specific: 0, document level (indexed): 0
Content: documents: 7
```

Note that there is no corpus or document level metadata. There are functions in the tm package to apply things such as authors' names and timestamp information, among others, at both the document and corpus level. We will not utilize this for our purposes. We can now begin the text transformations using the tm_map() function from the tm package. These will be the transformations that we discussed previously: lowercase letters, remove numbers, remove punctuation, remove stop words, strip out the whitespace, and stem the words:

```
> docs <- tm_map(docs, tolower)
> docs <- tm_map(docs, removeNumbers)
> docs <- tm_map(docs, removePunctuation)
> docs <- tm_map(docs, removeWords, stopwords("english"))
> docs <- tm_map(docs, stripWhitespace)
```

At this point, it is a good idea to eliminate unnecessary words. For example, during the speeches, when Congress applauds a statement, you will find (Applause) in the text. This must be removed:

```
> docs <- tm_map(docs, removeWords, c("applause", "can", "cant", "will",
    "that", "weve", "dont", "wont", "youll", "youre"))
```

After completing the transformations and removal of other words, make sure that your documents are plain text, put them in a document-term matrix, and check the dimensions:

```
> docs <- tm_map(docs, PlainTextDocument)
> dtm <- DocumentTermMatrix(docs)
> dim(dtm)
[1] 7 4738
```

The seven speeches contain 4,738 distinct words. It is optional, but one can remove the sparse terms with the removeSparseTerms() function.
You will need to specify a number between zero and one, where the higher the number, the higher the percentage of sparsity allowed in the matrix. Sparsity is the relative frequency of a term in the documents. So, if your sparsity threshold is 0.75, only terms with sparsity greater than 0.75 are removed. For us, that would be (1 - 0.75) * 7, which is equal to 1.75. Therefore, any term in fewer than two documents would be removed:

```
> dtm <- removeSparseTerms(dtm, 0.75)
> dim(dtm)
[1] 7 2254
```

As we don't have the metadata on the documents, it is important to name the rows of the matrix so that we know which document is which:

```
> rownames(dtm) <- c("2010", "2011", "2012", "2013", "2014", "2015", "2016")
```

Using the inspect() function, you can examine the matrix. Here, we will look at the seven rows and the first five columns:

```
> inspect(dtm[1:7, 1:5])
      Terms
Docs   abandon ability able abroad absolutely
  2010       0       1    1      2          2
  2011       1       0    4      3          0
  2012       0       0    3      1          1
  2013       0       3    3      2          1
  2014       0       0    1      4          0
  2015       1       0    1      1          0
  2016       0       0    1      0          0
```

It appears that our data is ready for analysis, starting with looking at the word frequency counts. Let me point out that the output demonstrates why I've been trained to not favor wholesale stemming. You may be thinking that 'ability' and 'able' could be combined. If you stemmed the document, you would end up with 'abl'. How does that help the analysis? I think you lose context, at least in the initial analysis. Again, I recommend applying stemming thoughtfully and judiciously.

Data modeling and evaluation

Modeling will be broken into two distinct parts. The first will focus on word frequency and correlation and culminate in the building of a topic model. In the next portion, we will examine many different quantitative techniques by utilizing the power of the qdap package in order to compare two different speeches.

Word frequency and topic models

As we have everything set up in the document-term matrix, we can move on to exploring word frequencies by creating an object with the column sums, sorted in descending order. It is necessary to use as.matrix() in the code to sum the columns. The default order is ascending, so putting - in front of freq will change it to descending:

```
> freq <- colSums(as.matrix(dtm))
> ord <- order(-freq)
```

We will examine the head and tail of the object with the following code:

```
> freq[head(ord)]
    new america  people    jobs     now   years
    193     174     168     163     157     148
> freq[tail(ord)]
    wright    written    yearold   youngest youngstown       zero
         2          2          2          2          2          2
```

The most frequent word is new and, as you might expect, the president mentions america frequently. Also, notice how important employment is, with the frequency of jobs. I find it interesting that he mentions Youngstown, as in Youngstown, OH, a couple of times. To look at the frequency of the word frequencies, you can create tables, as follows:

```
> head(table(freq))
freq
  2   3   4   5   6   7
596 354 230 141 137  89
> tail(table(freq))
freq
148 157 163 168 174 193
  1   1   1   1   1   1
```

What these tables show is the number of words with that specific frequency. So 354 words occurred three times, and one word, new in our case, occurred 193 times. Using findFreqTerms(), we can see which words occurred at least 125 times:

```
> findFreqTerms(dtm, 125)
 [1] "america"   "american"  "americans" "jobs"  "make"  "new"
 [7] "now"       "people"    "work"      "year"  "years"
```

You can find associations with words by correlation with the findAssocs() function.
Let's look at jobs as an example, using 0.85 as the correlation cutoff:

```
> findAssocs(dtm, "jobs", corlimit = 0.85)
$jobs
colleges    serve   market shouldnt  defense      put      tax     came
    0.97     0.91     0.89     0.88     0.87     0.87     0.87     0.86
```

For visual portrayal, we can produce wordclouds and a bar chart. We will do two wordclouds to show the different ways to produce them: one with a minimum frequency and the other by specifying the maximum number of words to include. The first one, with a minimum frequency, also includes code to specify the color. The scale syntax determines the minimum and maximum word size by frequency; in this case, the minimum frequency is 70:

```
> wordcloud(names(freq), freq, min.freq = 70, scale = c(3, .5),
    colors = brewer.pal(6, "Dark2"))
```

One can forgo all the fancy graphics, as we will in the next wordcloud, capturing the 25 most frequent words:

```
> wordcloud(names(freq), freq, max.words = 25)
```

To produce a bar chart, the code can get a bit complicated, whether you use base R, ggplot2, or lattice. The following code will show you how to produce a bar chart for the 10 most frequent words in base R:

```
> freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
> wf <- data.frame(word = names(freq), freq = freq)
> wf <- wf[1:10, ]
> barplot(wf$freq, names = wf$word, main = "Word Frequency",
    xlab = "Words", ylab = "Counts", ylim = c(0, 250))
```

We will now move on to building topic models using the topicmodels package, which offers the LDA() function. The question now is how many topics to create. It seems logical to solve for three topics (k = 3). Certainly, I encourage you to try other numbers of topics:

```
> library(topicmodels)
> set.seed(123)
> lda3 <- LDA(dtm, k = 3, method = "Gibbs")
> topics(lda3)
2010 2011 2012 2013 2014 2015 2016
   2    1    1    1    3    3    2
```

We can see an interesting transition over time. The first and last addresses have the same topic grouping, almost as if he opened and closed his tenure with the same themes. Using the terms() function produces a list of an ordered word frequency for each topic. The number of words to list is specified in the function, so let's look at the top 25 per topic:

```
> terms(lda3, 25)
      Topic 1      Topic 2      Topic 3
 [1,] "jobs"       "people"     "america"
 [2,] "now"        "one"        "new"
 [3,] "get"        "work"       "every"
 [4,] "tonight"    "just"       "years"
 [5,] "last"       "year"       "like"
 [6,] "energy"     "know"       "make"
 [7,] "tax"        "economy"    "time"
 [8,] "right"      "americans"  "need"
 [9,] "also"       "businesses" "american"
[10,] "government" "even"       "world"
[11,] "home"       "give"       "help"
[12,] "well"       "many"       "lets"
[13,] "american"   "security"   "want"
[14,] "two"        "better"     "states"
[15,] "congress"   "come"       "first"
[16,] "country"    "still"      "country"
[17,] "reform"     "workers"    "together"
[18,] "must"       "change"     "keep"
[19,] "deficit"    "take"       "back"
[20,] "support"    "health"     "americans"
[21,] "business"   "care"       "way"
[22,] "education"  "families"   "hard"
[23,] "companies"  "made"       "today"
[24,] "million"    "future"     "working"
[25,] "nation"     "small"      "good"
```

Topic 2 covers the first and last speeches. Nothing really stands out as compelling in that topic compared to the others. It will be interesting to see how the next analysis can yield insights into those speeches. Topic 1 covers the next three speeches. Here, the message transitions to "jobs", "energy", "reform", and the "deficit", not to mention the comments about "education" and, as we saw above, the correlation of "jobs" and "colleges".
Topic 3 brings us to the next two speeches. The focus seems to really shift to the economy and business, with mentions of "security" and healthcare. In the next section, we can dig into the exact speech content further, along with comparing and contrasting the first and last State of the Union addresses.

Additional quantitative analysis

This portion of the analysis will focus on the power of the qdap package. It allows you to compare multiple documents over a wide array of measures. Our effort will be on comparing the 2010 and 2016 speeches. For starters, we will need to turn the text into data frames, perform sentence splitting, and then combine them into one data frame with a variable created that specifies the year of the speech. We will use this as our grouping variable in the analyses. Dealing with text data, even in R, can be tricky. The code that follows seemed to work best, in this case, to get the data loaded and ready for analysis. We first load the qdap package. Then, to bring in the data from a text file, we will use the readLines() function from base R, collapsing the results to eliminate unnecessary whitespace. I also recommend converting your text encoding to ASCII, otherwise you may run into some bizarre text that will mess up your analysis. That is done with the iconv() function:

```
> library(qdap)
> speech16 <- paste(readLines("sou2016.txt"), collapse = " ")
Warning message:
In readLines("sou2016.txt") : incomplete final line found on 'sou2016.txt'
> speech16 <- iconv(speech16, "latin1", "ASCII", "")
```

The warning message is not an issue, as it is just telling us that the final line of text is not the same length as the other lines in the .txt file. We now apply the qprep() function from qdap. This function is a wrapper for a number of other replacement functions, and using it will speed up pre-processing, but it should be used with caution if a more detailed analysis is required. The functions it passes through are as follows:

- bracketX(): applies bracket removal
- replace_abbreviation(): replaces abbreviations
- replace_number(): converts numbers to words, for example '100' becomes 'one hundred'
- replace_symbol(): converts symbols to words, for example @ becomes 'at'

```
> prep16 <- qprep(speech16)
```

The other pre-processing we should do is to replace contractions (can't to cannot), remove stopwords (in our case, the top 100), and remove unwanted characters, with the exception of periods and question marks. They will come in handy shortly:

```
> prep16 <- replace_contraction(prep16)
> prep16 <- rm_stopwords(prep16, Top100Words, separate = F)
> prep16 <- strip(prep16, char.keep = c("?", "."))
```

Critical to this analysis is to now split the text into sentences and add what will be the grouping variable, the year of the speech. This also creates the tot variable, which stands for Turn of Talk, serving as an indicator of sentence order.
This is especially helpful in a situation where you are analyzing dialogue, say in a debate or a question and answer session:

```
> sent16 <- data.frame(speech = prep16)
> sent16 <- sentSplit(sent16, "speech")
> sent16$year <- "2016"
```

Repeat the steps for the 2010 speech:

```
> speech10 <- paste(readLines("sou2010.txt"), collapse = " ")
> speech10 <- iconv(speech10, "latin1", "ASCII", "")
> speech10 <- gsub("(Applause.)", "", speech10)
> prep10 <- qprep(speech10)
> prep10 <- replace_contraction(prep10)
> prep10 <- rm_stopwords(prep10, Top100Words, separate = F)
> prep10 <- strip(prep10, char.keep = c("?", "."))
> sent10 <- data.frame(speech = prep10)
> sent10 <- sentSplit(sent10, "speech")
> sent10$year <- "2010"
```

Concatenate the separate years into one dataframe:

```
> sentences <- data.frame(rbind(sent10, sent16))
```

One of the great things about the qdap package is that it facilitates basic text exploration, as we did before. Let's see a plot of frequent terms:

```
> plot(freq_terms(sentences$speech))
```

You can create a word frequency matrix that provides the counts for each word by speech:

```
> wordMat <- wfm(sentences$speech, sentences$year)
> head(wordMat[order(wordMat[, 1], wordMat[, 2], decreasing = TRUE), ])
          2010 2016
our        120   85
us          33   33
year        29   17
americans   28   15
why         27   10
jobs        23    8
```

This can also be converted into a document-term matrix with the function as.dtm(), should you so desire. Let's next build wordclouds by year, with qdap functionality:

```
> trans_cloud(sentences$speech, sentences$year, min.freq = 10)
```

Comprehensive word statistics are also available. Here is a plot of the stats available in the package. The plot loses some of its visual appeal with just two speeches but is revealing nonetheless. A complete explanation of the stats is available under ?word_stats:

```
> ws <- word_stats(sentences$speech, sentences$year, rm.incomplete = T)
> plot(ws, label = T, lab.digits = 2)
```

Notice that the 2016 speech was much shorter, with over a hundred fewer sentences and almost a thousand fewer words. Also, there seems to be the use of asking questions as a rhetorical device in 2016 versus 2010 (n.quest 10 versus n.quest 4). To compare the polarity (sentiment scores), use the polarity() function, specifying the text and grouping variables:

```
> pol <- polarity(sentences$speech, sentences$year)
> pol
  year total.sentences total.words ave.polarity sd.polarity stan.mean.polarity
1 2010             435        3900        0.052       0.432              0.121
2 2016             299        2982        0.105       0.395              0.267
```

The stan.mean.polarity value represents the standardized mean polarity, which is the average polarity divided by the standard deviation. We see that 2016 was slightly higher (0.267) than 2010 (0.121). This is in line with what we would expect, wanting to end on a more positive note. You can also plot the data. The plot produces two charts: the first shows the polarity by sentence over time, and the second shows the distribution of the polarity:

```
> plot(pol)
```

This plot may be a challenge to read, but let me do my best to interpret it. The 2010 speech starts out with a strong negative sentiment and is slightly more negative than 2016.
We can identify the most negative sentiment sentence by creating a dataframe of the pol object, finding the sentence number, and producing it:

```
> pol.df <- pol$all
> which.min(pol.df$polarity)
[1] 12
> pol.df$text.var[12]
[1] "One year ago, I took office amid two wars, an economy rocked by a severe
recession, a financial system on the verge of collapse, and a government
deeply in debt."
```

Now that is negative sentiment! Ironically, the government is even more in debt today. We will look at the readability index next:

```
> ari <- automated_readability_index(sentences$speech, sentences$year)
> ari$Readability
  year word.count sentence.count character.count Automated_Readability_Index
1 2010       3900            435           23859                     11.86709
2 2016       2982            299           17957                     11.91929
```

I think it is no surprise that they are basically the same. Formality analysis is next. This takes a couple of minutes to run in R:

```
> form <- formality(sentences$speech, sentences$year)
> form
  year word.count formality
1 2016       2983     65.61
2 2010       3900     63.88
```

This also looks very similar. We can examine the proportions of the parts of speech. A plot is available, but it adds nothing to the analysis in this instance:

```
> form$form.prop.by
  year word.count  noun   adj prep articles pronoun  verb adverb interj other
1 2010       3900 44.18 15.95 3.67        0    4.51 23.49   7.77   0.05  0.38
2 2016       2982 43.46 17.37 4.49        0    4.96 21.73   7.41   0.00  0.57
```

Now, the diversity measures are produced. Again, they are nearly identical. A plot is also available (plot(div)), but being so similar, it once again adds no value. It is important to note that Obama's speechwriter for 2010 was Jon Favreau, and in 2016 it was Cody Keenan:

```
> div <- diversity(sentences$speech, sentences$year)
> div
  year   wc simpson shannon collision berger_parker brillouin
1 2010 3900   0.998   6.825     5.970         0.031     6.326
2 2015 2982   0.998   6.824     6.008         0.029     6.248
```

One of my favorite plots is the dispersion plot. This shows the dispersion of a word throughout the text. Let's examine the dispersion of "security", "jobs", and "economy":

```
> dispersion_plot(sentences$speech, rm.vars = sentences$year,
    c("security", "jobs", "economy"), color = "black", bg.color = "white")
```

This completes our analysis of the two speeches. The analysis showed that, although the speeches had a similar style, the core messages changed over time as the political landscape changed. This extract is taken from the book Mastering Machine Learning with R - Second Edition. Read the book to learn more advanced prediction, algorithms, and learning methods with R.

- Understanding Sentiment Analysis and other key NLP concepts
- Twitter Sentiment Analysis
- Sentiment Analysis of the 2017 US elections on Twitter


Build your first neural network with PyTorch [Tutorial]

Sugandha Lahoti
22 Sep 2018
14 min read
Understanding the basic building blocks of a neural network, such as tensors, tensor operations, and gradient descent, is important for building complex neural networks. In this article, we will build our first Hello world program in PyTorch. This tutorial is taken from the book Deep Learning with PyTorch. In this book, you will build neural network models in text, vision, and advanced analytics using PyTorch.

Let's assume that we work for one of the largest online companies, Wondermovies, which serves videos on demand. Our training dataset contains a feature that represents the average hours spent by users watching movies on the platform, and we would like to predict how much time each user will spend on the platform in the coming week. It's just an imaginary use case; don't think too much about it. Some of the high-level activities for building such a solution are as follows:

- Data preparation: The get_data function prepares the tensors (arrays) containing input and output data
- Creating learnable parameters: The get_weights function provides us with tensors containing random values that we will optimize to solve our problem
- Network model: The simple_network function produces the output for the input data, applying a linear rule, multiplying weights with input data, and adding the bias term (y = Wx + b)
- Loss: The loss_fn function provides information about how good the model is
- Optimizer: The optimize function helps us in adjusting the random weights created initially so that the model computes the target values more accurately

Let's consider the following linear regression equation for our neural network: y = Wx + b. Let's write our first neural network in PyTorch:

```
x, y = get_data()    # x - represents training data, y - represents target variables

w, b = get_weights() # w, b - learnable parameters

for i in range(500):
    y_pred = simple_network(x)  # function which computes wx + b
    loss = loss_fn(y, y_pred)   # calculates sum of the squared differences of y and y_pred
    if i % 50 == 0:
        print(loss)
    optimize(learning_rate)     # adjust w, b to minimize the loss
```
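The excerpt uses these five helper functions without defining them. As a rough sketch of what they could look like (a minimal stand-in, not the book's exact implementation: the toy data values, shapes, learning rate, and the use of module-level w and b are my assumptions), written against the Variable-era PyTorch 0.x API that this excerpt uses:

```python
import torch
from torch.autograd import Variable

learning_rate = 1e-4

def get_data():
    # Toy stand-in data: average hours watched -> hours expected next week
    train_x = torch.FloatTensor([3.3, 4.4, 5.5, 6.7, 6.9, 4.1, 9.8, 6.2])
    train_y = torch.FloatTensor([1.7, 2.8, 2.1, 3.2, 1.7, 1.6, 3.4, 2.6])
    x = Variable(train_x.view(-1, 1))            # shape: (8, 1)
    y = Variable(train_y)
    return x, y

def get_weights():
    # Randomly initialized learnable parameters; requires_grad=True
    # tells autograd to track gradients for them
    w = Variable(torch.randn(1), requires_grad=True)
    b = Variable(torch.randn(1), requires_grad=True)
    return w, b

def simple_network(x):
    # The linear rule y = wx + b (w and b are the module-level
    # variables returned by get_weights in the training script)
    return torch.matmul(x, w) + b

def loss_fn(y, y_pred):
    # Sum of squared errors; backward() fills in w.grad and b.grad
    loss = (y_pred - y).pow(2).sum()
    for param in [w, b]:
        if param.grad is not None:
            param.grad.data.zero_()              # clear stale gradients
    loss.backward()
    return loss.data[0]

def optimize(learning_rate):
    # One step of plain gradient descent on the tracked gradients
    w.data -= learning_rate * w.grad.data
    b.data -= learning_rate * b.grad.data
```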
Torch provides a utility function called from_numpy(), which converts a numpy array into a torch tensor. The shape of the resulting tensor is 506 rows x 13 columns:

from sklearn.datasets import load_boston
boston = load_boston()

boston_tensor = torch.from_numpy(boston.data)
boston_tensor.size()

Output: torch.Size([506, 13])

boston_tensor[:2]

Output:
Columns 0 to 7
 0.0063  18.0000   2.3100   0.0000   0.5380   6.5750  65.2000   4.0900
 0.0273   0.0000   7.0700   0.0000   0.4690   6.4210  78.9000   4.9671
Columns 8 to 12
 1.0000 296.0000  15.3000 396.9000   4.9800
 2.0000 242.0000  17.8000 396.9000   9.1400
[torch.DoubleTensor of size 2x13]

3-D tensors

When we add multiple matrices together, we get a 3-D tensor. 3-D tensors are used to represent data such as images. Images can be represented as numbers in a matrix, which are stacked together. An example of an image shape is 224, 224, 3, where the first index represents height, the second represents width, and the third represents a channel (RGB). Let's see how a computer sees a panda, using the next code snippet:

import numpy as np
import torch
import matplotlib.pyplot as plt
from PIL import Image

# Read a panda image from disk using a library called PIL and convert it to a numpy array
panda = np.array(Image.open('panda.jpg').resize((224,224)))
panda_tensor = torch.from_numpy(panda)
panda_tensor.size()

Output - torch.Size([224, 224, 3])

# Display panda
plt.imshow(panda)

Since displaying the tensor of size 224, 224, 3 would occupy a couple of pages in the book, we will display the image and learn to slice the image into smaller tensors to visualize it.

Slicing tensors

A common thing to do with a tensor is to slice a portion of it. A simple example could be choosing the first five elements of a one-dimensional tensor; let's call the tensor sales. We use a simple notation, sales[:slice_index], where slice_index represents the index up to which you want to slice the tensor:

sales = torch.FloatTensor([1000.0,323.2,333.4,444.5,1000.0,323.2,333.4,444.5])

sales[:5]
 1000.0000
  323.2000
  333.4000
  444.5000
 1000.0000
[torch.FloatTensor of size 5]

sales[:-5]
 1000.0000
  323.2000
  333.4000
[torch.FloatTensor of size 3]

Let's do more interesting things with our panda image, such as seeing what the panda image looks like when only one channel is chosen, and seeing how to select the face of the panda. Here, we select only one channel from the panda image:

plt.imshow(panda_tensor[:,:,0].numpy()) # 0 represents the first channel of RGB

Now, let's crop the image. Say we want to build a face detector for pandas and we need just the face of a panda for that. We crop the tensor image such that it contains only the panda's face:

plt.imshow(panda_tensor[25:175,60:130,0].numpy())

Another common example would be where you need to pick a specific element of a tensor:

# torch.eye(shape) produces a diagonal matrix with 1 as its diagonal elements
sales = torch.eye(3,3)
sales[0,1]

Output - 0.0

Most of the PyTorch tensor operations are very similar to NumPy operations.

4-D tensors

One common example of four-dimensional tensor types is a batch of images. Modern CPUs and GPUs are optimized to perform the same operations on multiple examples faster, so they take a similar time to process one image or a batch of images. It is therefore common to use a batch of examples rather than a single image at a time. Choosing the batch size is not straightforward; it depends on several factors. One major restriction on using a bigger batch or the complete dataset is GPU memory limitations—16, 32, and 64 are commonly used batch sizes.
Let's look at an example where we load a batch of cat images of size 64 x 224 x 224 x 3, where 64 represents the batch size or the number of images, 224 represents height and width, and 3 represents channels:

from glob import glob

# Read cat images from disk; data_path points at a folder of cat images
cats = glob(data_path+'*.jpg')
# Convert images into numpy arrays
cat_imgs = np.array([np.array(Image.open(cat).resize((224,224))) for cat in cats[:64]])
cat_imgs = cat_imgs.reshape(-1,224,224,3)
cat_tensors = torch.from_numpy(cat_imgs)
cat_tensors.size()

Output - torch.Size([64, 224, 224, 3])

Tensors on GPU

We have learned how to represent different forms of data in a tensor representation. Some of the common operations we perform once we have data in the form of tensors are addition, subtraction, multiplication, dot product, and matrix multiplication. All of these operations can be performed on either the CPU or the GPU. PyTorch provides a simple function called cuda() to copy a tensor on the CPU to the GPU. We will take a look at some of the operations and compare the performance of matrix multiplication on the CPU and GPU.

Tensor addition can be obtained by using the following code:

# Various ways you can perform tensor addition
a = torch.rand(2,2)
b = torch.rand(2,2)
c = a + b
d = torch.add(a,b)
# For in-place addition
a.add_(5)

# Multiplication of different tensors
a*b
a.mul(b)
# For in-place multiplication
a.mul_(b)

For tensor matrix multiplication, let's compare the code performance on CPU and GPU. Any tensor can be moved to the GPU by calling the .cuda() function. Multiplication on the GPU runs as follows:

a = torch.rand(10000,10000)
b = torch.rand(10000,10000)

a.matmul(b)
Time taken: 3.23 s

# Move the tensors to GPU
a = a.cuda()
b = b.cuda()

a.matmul(b)
Time taken: 11.2 µs

These fundamental operations of addition, subtraction, and matrix multiplication can be used to build complex operations, such as a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN).

Variables

Deep learning algorithms are often represented as computation graphs. Each circle in such a computation graph represents a variable. A variable forms a thin wrapper around a tensor object, its gradients, and a reference to the function that created it. The gradients refer to the rate of change of the loss function with respect to the various parameters (w, b). For example, if the gradient of a is 2, then any change in the value of a would modify the value of Y by two times. If that is not clear, do not worry—most deep learning frameworks take care of calculating gradients for us. In this part, we learn how to use these gradients to improve the performance of our model. Apart from gradients, a variable also has a reference to the function that created it, which in turn tells us how each variable was created. For example, the variable a has the information that it was generated as a result of the product between X and W.

Let's look at an example where we create variables and check the gradients and the function reference:

from torch.autograd import Variable

x = Variable(torch.ones(2,2),requires_grad=True)
y = x.mean()
y.backward()

x.grad
Variable containing:
 0.2500  0.2500
 0.2500  0.2500
[torch.FloatTensor of size 2x2]

x.grad_fn
Output - None

x.data
 1  1
 1  1
[torch.FloatTensor of size 2x2]

y.grad_fn
<torch.autograd.function.MeanBackward at 0x7f6ee5cfc4f8>

In the preceding example, we called a backward operation on the variable to compute the gradients.
By default, the gradients of the variables are None. The grad_fn of a variable points to the function that created it. If the variable was created by a user, like the variable x in our case, then the function reference is None. In the case of variable y, grad_fn refers to its function reference, MeanBackward. The data attribute accesses the tensor associated with the variable.

Creating data for our neural network

The get_data function in our first neural network code creates two variables, x and y, of sizes (17, 1) and (17). We will take a look at what happens inside the function:

def get_data():
    train_X = np.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
                          7.042,10.791,5.313,7.997,5.654,9.27,3.1])
    train_Y = np.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
                          2.827,3.465,1.65,2.904,2.42,2.94,1.3])
    dtype = torch.FloatTensor
    X = Variable(torch.from_numpy(train_X).type(dtype),requires_grad=False).view(17,1)
    y = Variable(torch.from_numpy(train_Y).type(dtype),requires_grad=False)
    return X,y

Creating learnable parameters

In our neural network example, we have two learnable parameters, w and b, and two fixed parameters, x and y. We have created variables x and y in our get_data function. Learnable parameters are created using random initialization and have the requires_grad parameter set to True, unlike x and y, where it is set to False. Let's take a look at our get_weights function:

def get_weights():
    w = Variable(torch.randn(1),requires_grad = True)
    b = Variable(torch.randn(1),requires_grad = True)
    return w,b

Most of the preceding code is self-explanatory; torch.randn creates a random tensor of any given shape.

Neural network model

Once we have defined the inputs and outputs of the model using PyTorch variables, we have to build a model which learns how to map the outputs from the inputs. In traditional programming, we build a function by hand-coding different logic to map the inputs to the outputs. However, in deep learning and machine learning, we learn the function by showing it the inputs and the associated outputs. In our example, we implement a simple neural network which tries to map the inputs to outputs, assuming a linear relationship. The linear relationship can be represented as y = wx + b, where w and b are learnable parameters. Our network has to learn the values of w and b, so that wx + b will be closer to the actual y. If we visualize our training dataset and the model that our neural network has to learn, it is a straight line fitted on the input data points: the dark-gray (blue) line represents the model that our network learns.

Network implementation

As we have all the parameters (x, w, b, and y) required to implement the network, we perform a matrix multiplication between w and x. Then, we sum the result with b. That will give our predicted y. The function is implemented as follows:

def simple_network(x):
    y_pred = torch.matmul(x,w)+b
    return y_pred

PyTorch also provides a higher-level abstraction in torch.nn called layers, which takes care of most of the underlying initialization and the operations associated with the most common techniques in neural networks. We are using the lower-level operations here to understand what happens inside these functions. The previous model, which maps one input feature to one output, can be represented as a torch.nn layer, as follows:

f = nn.Linear(1,1) # Much simpler.
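To make the equivalence concrete, here is a minimal sketch (our own, not from the book) showing that an nn.Linear layer computes the same y = wx + b mapping as the hand-written simple_network; the random input below is a stand-in for our 17 training samples:

# A hedged illustration: nn.Linear versus the manual wx + b (PyTorch 0.3-style API)
import torch
import torch.nn as nn
from torch.autograd import Variable

x = Variable(torch.randn(17, 1))  # 17 samples, 1 feature each (stand-in data)
f = nn.Linear(1, 1)               # 1 input feature -> 1 output
y_pred = f(x)                     # internally computes x @ W^T + b
print(y_pred.size())              # torch.Size([17, 1])
print(f.weight, f.bias)           # the layer's own learnable w and b

The layer owns its weight and bias as learnable parameters, so we no longer need to create them by hand with get_weights.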
Now that we have calculated the y values, we need to know how good our model is, which is done in the loss function.

Loss function

As we start with random values, our learnable parameters, w and b, will result in y_pred, which will not be anywhere close to the actual y. So, we need to define a function which tells the model how close its predictions are to the actual values. Since this is a regression problem, we use a loss function called the sum of squared error (SSE). We take the difference between the predicted y and the actual y and square it. SSE helps the model to understand how close the predicted values are to the actual values. The torch.nn library has different loss functions, such as MSELoss and cross-entropy loss. However, for this chapter, let's implement the loss function ourselves:

def loss_fn(y,y_pred):
    loss = (y_pred-y).pow(2).sum()
    for param in [w,b]:
        if not param.grad is None:
            param.grad.data.zero_()
    loss.backward()
    return loss.data[0]

Apart from calculating the loss, we also call the backward operation, which calculates the gradients of our learnable parameters, w and b. As we will use the loss function more than once, we remove any previously calculated gradients by calling the grad.data.zero_() operation. The first time we call the backward function, the gradients are empty, so we zero the gradients only when they are not None.

Optimize the neural network

We started with random weights to predict our targets and calculated the loss for our algorithm. We calculate the gradients by calling the backward function on the final loss variable. This entire process repeats for one epoch, that is, for the entire set of examples. In most real-world examples, we will do the optimization step per iteration, which is a small subset of the total set. Once the loss is calculated, we optimize the values with the calculated gradients so that the loss reduces, which is implemented in the following function:

def optimize(learning_rate):
    w.data -= learning_rate * w.grad.data
    b.data -= learning_rate * b.grad.data

The learning rate is a hyperparameter, which allows us to adjust the values in the variables by a small amount of the gradients, where the gradients denote the direction in which each variable (w and b) needs to be adjusted. Different optimizers, such as Adam, RMSprop, and SGD, are already implemented for use in the torch.optim package.
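As a quick illustration (our own sketch, not part of the book excerpt), the manual optimize function above could be replaced with torch.optim.SGD; w and b are the same learnable variables returned by get_weights:

# A hedged sketch: replacing the manual update with torch.optim.SGD
import torch.optim as optim

optimizer = optim.SGD([w, b], lr=1e-4)  # register the learnable parameters
for i in range(500):
    optimizer.zero_grad()               # clears old gradients instead of zeroing them by hand
    y_pred = simple_network(x)
    loss = (y_pred - y).pow(2).sum()
    loss.backward()                     # computes gradients of w and b
    optimizer.step()                    # applies w.data -= lr * w.grad.data, and likewise for b

The optimizer encapsulates both the gradient zeroing and the update rule, which is why the hand-rolled loss_fn above had to do its own grad.data.zero_() bookkeeping.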
The final network architecture is a model for learning to predict average hours spent by users on our Wondermovies platform. Next, to learn about PyTorch's built-in modules for building network architectures, read our book Deep Learning with PyTorch.

Can a production ready PyTorch 1.0 give TensorFlow a tough time?
PyTorch 0.3.0 releases, ending stochastic functions
Is Facebook-backed PyTorch better than Google's TensorFlow?

Enhancing Markov's Decision Process with Bellman Equation [Tutorial]

Sugandha Lahoti
21 Sep 2018
14 min read
Reinforcement learning, one of the foundations of machine learning, supposes learning through trial and error by interacting with an environment. Reinforcement learning often uses the Markov Decision Process (MDP). MDP contains a memoryless and unlabeled action-reward equation with a learning parameter. This equation, the Bellman equation (often coined as the Q function), was used to beat world-class Atari gamers. In this article, we are going to tackle the Markov Decision Process (Q function) and apply it to reinforcement learning with the Bellman equation. This tutorial is taken from the book Artificial Intelligence By Example by Denis Rothman. In this book, you will develop machine intelligence from scratch using real artificial intelligence use cases.

Step 1 – Markov Decision Process in natural language

Step 1 of any artificial intelligence problem is to transpose it into something you know in your everyday life (work or personal). Let's say you are an e-commerce business driver delivering a package in an area you do not know. You are the operator of a self-driving vehicle. You have a GPS system with a beautiful color map on it. The areas around you are represented by the letters A to F, as shown in the simplified map in the following diagram. You are presently at F. Your goal is to reach area C. You are happy, listening to the radio. Everything is going smoothly, and it looks like you are going to be there on time. The following graph represents the locations and routes that you can possibly cover.

The guiding system's state indicates the complete path to reach C. It is telling you that you are going to go from F to B to D and then to C. It looks good! To break things down further, let's say:

The present state is the letter s.
Your next action is the letter a (action). This action a is not location A.
The next action a (not location A) is to go to location B. You look at your guiding system; it tells you there is no traffic, and that to go from your present state F to your next state B will take you only a few minutes.
Let's say that the next state B is the letter B.

At this point, you are still quite happy, and we can sum up your situation with the following sequence of events:

The letter s is your present state, your present situation.
The letter a is the action you're deciding on, which is to go to the next area; there you will be in another state, s'.

We can say that thanks to the action a, you will go from s to s'. Now, imagine that the driver is not you anymore. You are tired for some reason. That is when a self-driving vehicle comes in handy. You set your car to autopilot. Now you are not driving anymore; the system is. Let's call that system the agent. At point F, you set your car to autopilot and let the self-driving agent take over. The agent now sees what you have asked it to do and checks its mapping environment, which represents all the areas in the previous diagram from A to F.

In the meantime, you are rightly worried. Is the agent going to make it or not? You are wondering whether its strategy meets yours. You have your policy P—your way of thinking—which is to take the shortest paths possible. Will the agent agree? What's going on in its mind? You observe and begin to realize things you never noticed before. Since this is the first time you are using this car and guiding system, the agent is memoryless, which is an MDP feature. This means the agent just doesn't know anything about what went on before. It seems to be happy with just calculating from this state s at area F.
It will use machine power to run as many calculations as necessary to reach its goal. Another thing you are watching is the total distance from F to C, to check whether things are OK. That means that the agent is calculating all the states from F to C. In this case, state F is state 1, which we can simplify by writing s1. B is state 2, which we can simplify by writing s2. D is s3 and C is s4. The agent is calculating all of these possible states to make a decision.

The agent knows that when it reaches D, C will be better because the reward will be higher for going to C than anywhere else. Since it cannot eat a piece of cake to reward itself, the agent uses numbers. Our agent is a real number cruncher. When it is wrong, it gets a poor reward or nothing in this model. When it's right, it gets a reward represented by the letter R. This action-value (reward) transition, often named the Q function, is the core of many reinforcement learning algorithms. When our agent goes from one state to another, it performs a transition and gets a reward. For example, the transition can be from F to B, state 1 to state 2, or s1 to s2.

You are feeling great and are going to be on time. You are beginning to understand how the machine learning agent in your self-driving car is thinking. Suddenly, your guiding system breaks down. All you can see on the screen is a static image of the areas from the last calculation. You look up and see that a traffic jam is building up. Area D is still far away, and now you do not know whether it would be good to go from D to C, or from D to E to get a taxi that can take special lanes. You are going to need your agent! The agent takes the traffic jam into account, is stubborn, and increases its reward to get to C by the shortest way. Its policy is to stick to the initial plan. You do not agree. You have another policy. You stop the car. You both have to agree before continuing. You have your opinion and policy; the agent does not agree. Before continuing, your views need to converge. Convergence is the key to making sure that your calculations are correct. This is the kind of problem that people delivering parcels, and soon self-driving delivery vehicles (not to speak of drone air jams), encounter all day long to get the workload done. The number of parcels to deliver per hour is an example of the workload that needs to be taken into account when making a decision. To represent the problem at this point, the best way is to express this whole process mathematically.

Step 2 – the mathematical representation of the Bellman equation and MDP

Mathematics involves a whole change in your perspective on a problem. You are going from words to functions, the pillars of source coding. The goal here is to pick up enough mathematics to implement a solution in real-life companies. It is necessary to think a problem through by finding something familiar around us, such as the delivery itinerary example covered before. It is a good thing to write it down with some abstract letters and symbols as described before, with a meaning an action and s meaning a state. Once you have understood the problem and expressed the parameters in a way you are used to, you can proceed further.

From MDP to the Bellman equation

In the previous step 1, the agent went from F, or state 1, or s, to B, which was state 2, or s'. To do that, there was a strategy—a policy represented by P.
All of this can be shown in one mathematical expression, the MDP state transition function:

Pa(s,s') = Pr(st+1 = s' | st = s, at = a)

P is the policy, the strategy made by the agent to go from F to B through action a. When going from F to B, this state transition is called the state transition function, where:

a is the action
s is state 1 (F), and s' is state 2 (B)

This is the basis of MDP. The reward (right or wrong) is represented in the same way, as Ra(s,s'). That means R is the reward for the action of going from state s to state s'. Going from one state to another will be a random process. This means that, potentially, all states can go to another state. The example we will be working on inputs a reward matrix so that the program can choose its best course of action. Then, the agent will go from state to state, learning the best trajectories for every possible starting location point. The goal of the MDP is to go to C (line 3, column 3 in the reward matrix), which has a starting value of 100 in the following Python code:

# Markov Decision Process (MDP) - The Bellman equations adapted to
# Reinforcement Learning
# R is The Reward Matrix for each state
R = ql.matrix([ [0,0,0,0,1,0],
                [0,0,0,1,0,1],
                [0,0,100,1,0,0],
                [0,1,1,0,1,0],
                [1,0,0,1,0,0],
                [0,1,0,0,0,0] ])

Each line in the matrix in the example represents a letter from A to F, and each column represents a letter from A to F. All possible states are represented. The 1 values represent the nodes (vertices) of the graph. Those are the possible locations. For example, line 1 represents the possible moves for letter A, line 2 for letter B, and line 6 for letter F. On the first line, A cannot go to C directly, so a 0 value is entered. But it can go to E, so a 1 value is added. Some models start with -1 for impossible choices, such as B going directly to C, and 0 values to define the locations. This model starts with 0 and 1 values. It sometimes takes weeks to design functions that will create a reward matrix.

To sum it up, we have three tools:

Pa(s,s'): A policy, P, or strategy to move from one state to another
Ta(s,s'): A T, or stochastic (random) transition, function to carry out that action
Ra(s,s'): An R, or reward, for that action, which can be negative, null, or positive

T is the transition function, which makes the agent decide to go from one point to another with a policy. In this case, it will be random. That's what machine power is for, and that's how reinforcement learning is often implemented. Randomness is a property of MDP. The following code describes the choice the agent is going to make:

next_action = int(ql.random.choice(PossibleAction,1))
return next_action

Once the code has been run, a new random action (state) has been chosen. The Bellman equation is the road to programming reinforcement learning. Bellman's equation completes the MDP. To calculate the value of a state, let's use Q, for the Q action-reward (or value) function. The pre-source code of Bellman's equation can be expressed as follows for one individual state:

Q(s,a) = R(s,a) + γ · max(Q(s'))

The source code then translates the equation into a machine representation, as in the following code:

# The Bellman equation
Q[current_state, action] = R[current_state, action] + gamma * MaxValue

The source code variables of the Bellman equation are as follows:

Q(s): This is the value calculated for this state—the total reward. In step 1, when the agent went from F to B, the driver had to be happy. Maybe she/he had a crunch of a candy bar to feel good, which is the human counterpart of the reward matrix.
The automatic driver maybe ate (reward matrix) some electricity, renewable energy of course! The reward is a number such as 50 or 100 to show the agent that it's on the right track. It's like when a student gets a good grade in an exam.

R(s): This is the sum of the values up to there. It's the total reward at that point.

γ = gamma: This is here to remind us that trial and error has a price. We're wasting time, money, and energy. Furthermore, we don't even know whether the next step is right or wrong, since we're in a trial-and-error mode. Gamma is often set to 0.8. What does that mean? Suppose you're taking an exam. You study and study, but you don't really know the outcome. You might have 80 out of 100 (0.8) chances of clearing it. That's painful, but that's life. This is what makes Bellman's equation and MDP realistic and efficient.

max(s'): s' is one of the possible states that can be reached with Pa(s,s'); max is the highest value on the line of that state (the location line in the reward matrix).

Step 3 – implementing the solution in Python

In step 1, a problem was described in natural language to be able to talk to experts and understand what was expected. In step 2, an essential mathematical bridge was built between natural language and source coding. Step 3 is the software implementation phase. The code is a reinforcement learning program using the Q function with the following reward matrix:

import numpy as ql

R = ql.matrix([ [0,0,0,0,1,0],
                [0,0,0,1,0,1],
                [0,0,100,1,0,0],
                [0,1,1,0,1,0],
                [1,0,0,1,0,0],
                [0,1,0,0,0,0] ])

Q = ql.matrix(ql.zeros([6,6]))

gamma = 0.8

R is the reward matrix described in the mathematical analysis. Q inherits the same structure as R, but all values are set to 0 since this is a learning matrix. It will progressively contain the results of the decision process. The gamma variable is a double reminder that the system is learning and that its decisions have only an 80% chance of being correct each time. As the following code shows, the system explores the possible actions during the process:

agent_s_state = 1

# The possible "a" actions when the agent is in a given state
def possible_actions(state):
    current_state_row = R[state,]
    possible_act = ql.where(current_state_row > 0)[1]
    return possible_act

# Get available actions in the current state
PossibleAction = possible_actions(agent_s_state)

The agent starts in state 1, for example. You can start wherever you want because it's a random process. Note that only values > 0 are taken into account. They represent the possible moves (decisions). The current state goes through an analysis process to find possible actions (next possible states). You will note that there is no algorithm in the traditional sense, with many rules. It's a pure random calculation, as the following random.choice function shows.
def ActionChoice(available_actions_range):
    next_action = int(ql.random.choice(PossibleAction,1))
    return next_action

# Sample next action to be performed
action = ActionChoice(PossibleAction)

Now comes the core of the system, containing Bellman's equation, translated into the following source code:

def reward(current_state, action, gamma):
    Max_State = ql.where(Q[action,] == ql.max(Q[action,]))[1]
    if Max_State.shape[0] > 1:
        Max_State = int(ql.random.choice(Max_State, size = 1))
    else:
        Max_State = int(Max_State)
    MaxValue = Q[action, Max_State]
    # Q function
    Q[current_state, action] = R[current_state, action] + gamma * MaxValue

# Rewarding Q matrix
reward(agent_s_state,action,gamma)

You can see that the agent looks for the maximum value of the next possible state chosen at random. The best way to understand this is to run the program in your Python environment and print() the intermediate values. I suggest that you open a spreadsheet and note the values. It will give you a clear view of the process. The last part is simply about running the learning process 50,000 times, just to be sure that the system learns everything there is to find. During each iteration, the agent will detect its present state, choose a course of action, and update the Q function matrix:

for i in range(50000):
    current_state = ql.random.randint(0, int(Q.shape[0]))
    PossibleAction = possible_actions(current_state)
    action = ActionChoice(PossibleAction)
    reward(current_state,action,gamma)

# Displaying Q before the norm of Q phase
print("Q :")
print(Q)

# Norm of Q
print("Normed Q :")
print(Q/ql.max(Q)*100)

After the process is repeated, and once the learning process is over, the program will print the result in Q and the normed result. The normed result is obtained by dividing all values by the maximum value found, expressed as a percentage.
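For convenience, here is a self-contained sketch that stitches the fragments above into one runnable script. The logic follows the book's fragments; only the packaging into a single file (and making ActionChoice use its own argument) is ours:

# A hedged, consolidated version of the Q-learning fragments above
import numpy as ql

# R is the reward matrix for each state
R = ql.matrix([[0,0,0,0,1,0],
               [0,0,0,1,0,1],
               [0,0,100,1,0,0],
               [0,1,1,0,1,0],
               [1,0,0,1,0,0],
               [0,1,0,0,0,0]])

Q = ql.matrix(ql.zeros([6,6]))   # the learning matrix, filled in by the Bellman equation
gamma = 0.8                      # the learning parameter

def possible_actions(state):
    # The possible "a" actions when the agent is in a given state
    return ql.where(R[state,] > 0)[1]

def ActionChoice(available_actions_range):
    # Pure random choice among the possible moves
    return int(ql.random.choice(available_actions_range, 1))

def reward(current_state, action, gamma):
    # The Bellman equation: Q(s,a) = R(s,a) + gamma * max(Q(s'))
    Max_State = ql.where(Q[action,] == ql.max(Q[action,]))[1]
    Max_State = int(ql.random.choice(Max_State, size=1)) if Max_State.shape[0] > 1 else int(Max_State)
    Q[current_state, action] = R[current_state, action] + gamma * Q[action, Max_State]

for i in range(50000):
    current_state = ql.random.randint(0, int(Q.shape[0]))
    action = ActionChoice(possible_actions(current_state))
    reward(current_state, action, gamma)

print("Normed Q :")
print(Q / ql.max(Q) * 100)       # the best trajectories emerge as the highest percentages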
View the Python program at https://github.com/PacktPublishing/Artificial-Intelligence-By-Example/blob/master/Chapter01/MDP.py. In this article, we talked about MDP, a stochastic random action-reward (value) system enhanced by Bellman's equation, as an effective solution provider for many AI problems in corporate environments. Next, to discover how to create the reward matrix in the first place, through explanations and source code, read our book Artificial Intelligence By Example.

How Reinforcement Learning works
Convolutional Neural Networks with Reinforcement Learning
Implement Reinforcement learning using Markov Decision Process [Tutorial]

Build a Neural Network to recognize handwritten numbers in Keras and MNIST

Fatema Patrawala
20 Sep 2018
8 min read
A neural network is made up of many artificial neurons. Is it a representation of the brain, or is it a mathematical representation of some knowledge? Here, we will simply try to understand how a neural network is used in practice. A convolutional neural network (CNN) is a very special kind of multi-layer neural network, designed to recognize visual patterns directly from images with minimal processing. The field of neural networks was originally inspired by the goal of modeling biological neural systems, but it has since branched in different directions and become a matter of engineering and attaining good results in machine learning tasks. In this article, we will look at the building blocks of neural networks and build a neural network in Keras that will recognize handwritten numbers from 0-9 using MNIST. This article is an excerpt taken from the book Practical Convolutional Neural Networks, written by Mohit Sewak, Md Rezaul Karim and Pradeep Pujari and published by Packt Publishing.

An artificial neuron is a function that takes an input and produces an output. The number of neurons that are used depends on the task at hand. It could be as low as two or as many as several thousand. There are numerous ways of connecting artificial neurons together to create a CNN. One such topology that is commonly used is known as a feed-forward network: each neuron receives inputs from other neurons, and the effect of each input line on the neuron is controlled by a weight, which can be positive or negative. The entire neural network learns to perform useful computations for recognizing objects. The neurons in each layer feed their output forward to the next layer until we get a final output. For a single neuron this can be written as output = 1 / (1 + e^-(Σ wᵢxᵢ + b)), the sigmoid of the weighted sum of the inputs plus a bias, which is exactly what the following implementation computes:

import numpy as np
import math

class Neuron(object):
    def __init__(self):
        self.weights = np.array([1.0, 2.0])
        self.bias = 0.0
    def forward(self, inputs):
        """ Assuming that inputs and weights are 1-D numpy arrays and the bias is a number """
        a_cell_sum = np.sum(inputs * self.weights) + self.bias
        result = 1.0 / (1.0 + math.exp(-a_cell_sum)) # This is the sigmoid activation function
        return result

neuron = Neuron()
output = neuron.forward(np.array([1,1]))
print(output)

Now that we have understood what the building blocks of neural networks are, let us get to building a neural network that will recognize handwritten numbers from 0-9.

Handwritten number recognition with Keras and MNIST

A typical neural network for a digit recognizer may have 784 input pixels connected to 1,000 neurons in the hidden layer, which in turn connect to 10 output targets, one for each digit. Each layer is fully connected to the layer above, where x are the inputs, h are the hidden neurons, and y are the output class variables. In this notebook, we will build a neural network that will recognize handwritten numbers from 0-9. The type of neural network that we are building is used in a number of real-world applications, such as recognizing phone numbers and sorting postal mail by address. To build this network, we will use the MNIST dataset.
We will begin, as shown in the following code, by importing all the required modules, after which the data will be loaded, and then finally build the network:

# Import Numpy, Keras and MNIST data
import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.utils import np_utils

Retrieving training and test data

The MNIST dataset already comprises both training and test data. There are 60,000 data points of training data and 10,000 points of test data. If you do not have the data file locally at '~/.keras/datasets/', it will be downloaded to that location when the data is first loaded. Each MNIST data point has:

An image of a handwritten digit
A corresponding label, a number from 0-9, to help identify the image

The images will be called X and will be the input to our neural network; their corresponding labels are y. We want our labels as one-hot vectors. One-hot vectors are vectors of many zeros and a single one. It's easiest to see this in an example. The number 0 is represented as [1, 0, 0, 0, 0, 0, 0, 0, 0, 0], and 4 is represented as [0, 0, 0, 0, 1, 0, 0, 0, 0, 0] as a one-hot vector.

Flattened data

We will use flattened data in this example: a representation of MNIST images in one dimension rather than two. Thus, each 28 x 28 pixel image will be represented as a 784-pixel one-dimensional array. By flattening the data, information about the 2D structure of the image is thrown away; however, our data is simplified. With the help of this, all our training data can be contained in one array of shape (60,000, 784), wherein the first dimension represents the number of training images and the second depicts the number of pixels in each image. This kind of data is easy to analyze using a simple neural network, as follows:

# Retrieving the training and test data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

print('X_train shape:', X_train.shape)
print('X_test shape: ', X_test.shape)
print('y_train shape:', y_train.shape)
print('y_test shape: ', y_test.shape)

Visualizing the training data

The following function will help you visualize the MNIST data.
By passing in the index of a training example, the display_digit function will display that training image along with its corresponding label in the title:

# Visualize the data
import matplotlib.pyplot as plt
%matplotlib inline

# Displaying a training image by its index in the MNIST set
def display_digit(index):
    label = y_train[index]   # called before one-hot encoding, so the label is still an integer
    image = X_train[index]
    plt.title('Training data, index: %d, Label: %d' % (index, label))
    plt.imshow(image, cmap='gray_r')
    plt.show()

# Displaying the first (index 0) training image
display_digit(0)

X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print("Train the matrix shape", X_train.shape)
print("Test the matrix shape", X_test.shape)

# One-hot encoding of labels.
from keras.utils.np_utils import to_categorical
print(y_train.shape)
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
print(y_train.shape)

Building the network

For this example, you'll define the following:

The input layer, which you should expect for each piece of MNIST data, as it tells the network the number of inputs
Hidden layers, as they recognize patterns in data and also connect the input layer to the output layer
The output layer, as it defines how the network learns and gives a label as the output for a given image

This is done as follows:

# Defining the neural network
def build_model():
    model = Sequential()
    model.add(Dense(512, input_shape=(784,)))
    model.add(Activation('relu'))
    # An "activation" is just a non-linear function applied to the output
    # of the layer above. In this case, with a "rectified linear unit",
    # we clamp all values below 0 to 0.
    model.add(Dropout(0.2))
    # Dropout helps protect the model from memorizing or "overfitting" the training data
    model.add(Dense(512))
    model.add(Activation('relu'))
    model.add(Dropout(0.2))
    model.add(Dense(10))
    model.add(Activation('softmax'))
    # This special "softmax" activation ensures that the output is a valid
    # probability distribution: the values obtained are all non-negative and sum up to 1.
    return model

# Building the model
model = build_model()
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

Training the network

Now that we've constructed the network, we feed it with data and train it, as follows:

# Training
model.fit(X_train, y_train, batch_size=128, epochs=4, verbose=1,
          validation_data=(X_test, y_test))

Testing

After you're satisfied with the training output and accuracy, you can run the network on the test dataset to measure its performance! A good result will obtain an accuracy higher than 95%. Some simple models have been known to achieve even up to 99.7% accuracy! We can test the model as shown here:

# Comparing the labels predicted by our model with the actual labels
score = model.evaluate(X_test, y_test, batch_size=32, verbose=1, sample_weight=None)
# Printing the result
print('Test score:', score[0])
print('Test accuracy:', score[1])
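To see an individual result rather than an aggregate score, the following short sketch (our addition, not part of the book excerpt) uses Keras's model.predict to classify the first test image and compares it with the true label:

# A hedged extra step: inspecting a single prediction from the trained model
predictions = model.predict(X_test[:1])          # probabilities over the 10 digits
print('Predicted digit:', np.argmax(predictions[0]))
print('Actual digit:', np.argmax(y_test[0]))     # y_test is one-hot encoded, so argmax recovers the label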
To summarize, we got to know the building blocks of neural networks, and we successfully built a neural network that recognizes handwritten numbers using the MNIST dataset in Keras. To implement award-winning and cutting-edge CNN architectures, check out this one-stop guide published by Packt, Practical Convolutional Neural Networks.

Are Recurrent Neural Networks capable of warping time?
Recurrent neural networks and the LSTM architecture
Build a generative chatbot using recurrent neural networks (LSTM RNNs)
Create your first OpenAI Gym environment [Tutorial]

Savia Lobo
19 Sep 2018
7 min read
OpenAI Gym is an open source toolkit that provides a diverse collection of tasks, called environments, with a common interface for developing and testing your intelligent agent algorithms. The toolkit introduces a standard Application Programming Interface (API) for interfacing with environments designed for reinforcement learning. Each environment has a version attached to it, which ensures meaningful comparisons and reproducible results with the evolving algorithms and the environments themselves. This article is an excerpt taken from the book Hands-On Intelligent Agents with OpenAI Gym, written by Praveen Palanisamy. In this article, you will get to know what OpenAI Gym is and its features, and later create your own OpenAI Gym environment.

The Gym toolkit, through its various environments, provides an episodic setting for reinforcement learning, where an agent's experience is broken down into a series of episodes. In each episode, the initial state of the agent is randomly sampled from a distribution, and the interaction between the agent and the environment proceeds until the environment reaches a terminal state. Do not worry if you are not familiar with reinforcement learning.

The OpenAI Gym natively has about 797 environments spread over different categories of tasks. The famous Atari category has the largest share, with about 116 (half with screen inputs and half with RAM inputs) environments! The categories of tasks/environments supported by the toolkit are listed here:

Algorithmic
Atari
Board games
Box2D
Classic control
Doom (unofficial)
Minecraft (unofficial)
MuJoCo
Soccer
Toy text
Robotics (newly added)

Keep in mind that you may need some additional tools and packages installed on your system to run environments in each of these categories. For a detailed overview of each of these categories, head over to the book. With that, you have a very good overview of all the different categories and types of environment that are available as part of the OpenAI Gym toolkit. It is worth noting that the release of the OpenAI Gym toolkit was accompanied by an OpenAI Gym website (gym.openai.com), which maintained a scoreboard for every algorithm that was submitted for evaluation. It showcased the performance of user-submitted algorithms, and some submissions were also accompanied by detailed explanations and source code. Unfortunately, OpenAI decided to withdraw support for the evaluation website, and the service went offline in September 2017.

Now you have a good picture of the various categories of environment available in OpenAI Gym and what each category provides you with. Next, we will look at the key features of OpenAI Gym that make it an indispensable component in many of today's advancements in intelligent agent development, especially those that use reinforcement learning or deep reinforcement learning.

Understanding the features of OpenAI Gym

Here, we will take a look at the key features that have made the OpenAI Gym toolkit very popular in the reinforcement learning community and led to it becoming widely adopted.

Simple environment interface

OpenAI Gym provides a simple and common Python interface to environments.
Specifically, it takes an action as input and provides observation, reward, done, and an optional info object, based on the action, as the output at each step. If this does not make perfect sense to you yet, do not worry. We will go over the interface again in a more detailed manner to help you understand; this paragraph is just to give you an overview of the interface and make it clear how simple it is. It provides great flexibility for users, as they can design and develop their agent algorithms based on any paradigm they like, and not be constrained to a particular one because of this simple and convenient interface.

Comparability and reproducibility

We intuitively feel that we should be able to compare the performance of an agent or an algorithm in a particular task to the performance of another agent or algorithm in the same task. For example, if an agent gets a score of 1,000 on average in the Atari game of Space Invaders, we should be able to tell that this agent is performing worse than an agent that scores 5,000 on average in the Space Invaders game in the same amount of training time. But what happens if the scoring system for the game is slightly changed? Or if the environment interface was modified to include additional information about the game states that would provide an advantage to the second agent? This would make the score-to-score comparison unfair, right?

To handle such changes in the environment, OpenAI Gym uses strict versioning for environments. The toolkit guarantees that if there is any change to an environment, it will be accompanied by a different version number. Therefore, if the original version of the Atari Space Invaders game environment was named SpaceInvaders-v0 and there were some changes made to the environment to provide more information about the game states, then the environment's name would be changed to SpaceInvaders-v1. This simple versioning system makes sure we are always comparing performance measured on the exact same environment setup. This way, the results obtained are comparable and reproducible.

Ability to monitor progress

All the environments available as part of the Gym toolkit are equipped with a monitor. This monitor logs every time step of the simulation and every reset of the environment. What this means is that the environment automatically keeps track of how our agent is learning and adapting with every step. You can even configure the monitor to automatically record videos of the game while your agent is learning to play.

Creating your first OpenAI Gym environment

This section provides a quick way to get started with the OpenAI Gym Python API on Linux and macOS using virtualenv, so that you can get a sneak peek into the Gym! macOS and Ubuntu Linux systems come with Python installed by default. You can check which version of Python is installed by running python --version from a terminal window. If this returns python followed by a version number, then you are good to proceed to the next steps! If you get an error saying the Python command was not found, then you have to install Python.

Install virtualenv:

$pip install virtualenv

If pip is not installed on your system, you can install it by typing sudo easy_install pip.
Create a virtual environment named openai-gym using the virtualenv tool:

$virtualenv openai-gym

Activate the openai-gym virtual environment:

$source openai-gym/bin/activate

Install all the packages for the Gym toolkit from upstream:

$pip install -U gym

If you get permission denied or failed with error code 1 when you run the pip install command, it is most likely because the permissions on the directory you are trying to install the package to (the openai-gym directory inside virtualenv in this case) need special/root privileges. You can either run sudo -H pip install -U gym[all] to solve the issue, or change permissions on the openai-gym directory by running sudo chmod -R o+rw ~/openai-gym.

Test to make sure the installation is successful:

$python -c 'import gym; gym.make("CartPole-v0");'

Creating and visualizing a new Gym environment

In just a minute or two, you have created an instance of an OpenAI Gym environment to get started! Let's open a new Python prompt and import the gym module:

>>> import gym

Once the gym module is imported, we can use the gym.make method to create our new environment like this:

>>> env = gym.make('CartPole-v0')
>>> env.reset()
>>> env.render()

This will bring up a window showing the CartPole environment. Hooray!
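To go one step beyond rendering a static window, the following sketch (our own, not from the book excerpt) runs one full episode with random actions, using the step interface described earlier, which returns the observation, reward, done, and info values:

# A hedged sketch: one episode of CartPole driven by random actions
import gym

env = gym.make('CartPole-v0')
obs = env.reset()                        # start a new episode
done = False
total_reward = 0.0
while not done:
    env.render()
    action = env.action_space.sample()   # sample a random valid action
    obs, reward, done, info = env.step(action)
    total_reward += reward
print('Episode finished with total reward:', total_reward)
env.close()

A random agent will rarely balance the pole for long, which is exactly the baseline a learning agent should beat.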
Summary

In this post, you learned what OpenAI Gym is and its features, and you created your first OpenAI Gym environment. You now have a very good idea about OpenAI Gym. If you've enjoyed this post, head over to the book, Hands-On Intelligent Agents with OpenAI Gym, to learn about other recent learning environments and learning algorithms.

Extending OpenAI Gym environments with Wrappers and Monitors [Tutorial]
How to build a cartpole game using OpenAI Gym
Top 5 tools for reinforcement learning

10 useful Google Cloud AI services for your next machine learning project [Tutorial]

Savia Lobo
18 Sep 2018
9 min read
Google Cloud seems to be using artificial intelligence as a strategy to unlock more customers in the race across an increasingly hyper-competitive cloud infrastructure landscape. Cloud AI provides modern machine learning services, including pre-trained models and a service to create your own tailored models, with increased accuracy compared to other deep learning systems. Google Cloud AI is fast, scalable, and easy to use. In this tutorial, we will learn about the various Google Cloud AI services. This article is an excerpt from a book written by Arvind Ravulavaru, titled Google Cloud AI Services Quick Start Guide.

Cloud AutoML Alpha

As of April 2018, Cloud AutoML is in alpha and is only available on request, subject to GCP terms and conditions. AutoML helps us develop custom machine learning models with minimal ML knowledge and experience, using the power of Google's transfer learning and Neural Architecture Search technology. Under this service, the first custom service that Google is releasing is named AutoML Vision. This service will help users train custom vision models for their own use cases. Other services will follow. Some of the key AutoML features are the following:

Integration with human labeling
Powered by Google's Transfer Learning and AutoML
Fully integrated with other services of Google Cloud

You can read more about AutoML here: https://cloud.google.com/automl/.

Cloud TPU Beta

As of today, this service is in beta, and we need to explicitly request a TPU quota for our processing needs. Using Cloud TPUs, one can easily request large computation power to run one's own machine learning algorithms. This service helps us not only with the required computing; by using Google's TensorFlow, we can also accelerate the complete setup. This service can be used to perform heavy-duty machine learning, both training and prediction. Some of the key Cloud TPU features are the following:

High performance
Utilizing the power of GCP
Referencing data models
Fully integrated with other services of Google Cloud
Connecting Cloud TPUs to custom machine types

You can read more about Cloud TPU here: https://cloud.google.com/tpu/.

Cloud Machine Learning Engine

Cloud Machine Learning Engine helps us easily build machine learning models that work on any type of data, of any size. Cloud Machine Learning Engine can take any TensorFlow model and perform large-scale training on a managed cluster. Additionally, it can also manage the trained models for large-scale online and batch predictions, and it can seamlessly transition from training to prediction, using online and batch prediction services. Cloud Machine Learning Engine uses the same scalable and distributed infrastructure, with GPU acceleration, that powers Google's own ML products. Some of the key Cloud Machine Learning Engine features are the following:

Fully integrated with other Google Cloud services
Discover and share samples
HyperTune your models
Managed and scalable service
Notebook developer experience
Portable models

You can read more about Cloud Machine Learning Engine here: https://cloud.google.com/ml-engine/.

Cloud Job Discovery Private Beta

Matching qualified people with the right jobs doesn't have to be so hard; that is the premise of Cloud Job Discovery. Today's job portals and career sites match people to job roles based on keywords. This approach, most of the time, results in a mismatch between the candidate and the role.
That is where Cloud Job Discovery comes into the picture, bridging the gap between employer and employee. Job Discovery provides plug-and-play access to Google's search and machine learning capabilities, enabling the entire recruiting ecosystem—company career sites, job boards, applicant-tracking systems, and staffing agencies—to improve job site engagement and candidate conversion. Before we continue, you can navigate to https://cloud.google.com/job-discovery/ and try out the Job Discovery demo; you should see results based on your selection. The key takeaway from the demo is how Job Discovery relates a profile to a keyword. Some of the key advantages of Cloud Job Discovery over a standard keyword search are the following:

Keyword matching
Company jargon recognition
Abbreviation recognition
Commute search
Spelling correction
Concept recognition
Title detection
Real-time query broadening
Employer recognition
Job enrichment
Advanced location mapping
Location expansion
Seniority alignment

Dialogflow Enterprise Edition Beta

Dialogflow is a development suite used for building interfaces for websites, mobile applications, some of the popular machine learning platforms, and IoT devices. It is powered by machine learning to recognize the intent and context of what a user says, allowing your conversational interface to provide highly efficient and accurate responses. Natural language understanding recognizes a user's intent and extracts prebuilt entities such as time, date, and numbers. You can train your agent to identify custom entity types by providing a small dataset of examples. This service offers cross-platform and multi-language support and can work well with the Google Cloud Speech service. You can read more about Dialogflow Enterprise Edition here: https://cloud.google.com/dialogflow-enterprise/.

Cloud Natural Language

Google's Cloud Natural Language service helps us better understand the structure and meaning of a piece of text by providing powerful machine learning models. These models can be queried through a REpresentational State Transfer (REST) API. We can use it to understand sentiment about our product on social media, or to parse intent from customer conversations happening in a call center or through a messaging app. Before we continue with Cloud Natural Language, I would recommend heading over to https://cloud.google.com/natural-language/ and trying out the API. As you will see, this service offers various insights regarding a piece of text. Some of the key features are:

Syntax analysis
Entity recognition
Sentiment analysis
Content classification
Multi-language
Integrated REST API

You can read more about the Cloud Natural Language service here: https://cloud.google.com/natural-language/.
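As an illustration of how the REST API can be consumed, here is a hedged sketch using the google-cloud-language Python client as it existed around the time of writing (assumptions: the library is installed via pip install google-cloud-language and Google Cloud credentials are configured; the sample sentence is our own):

# A hedged sketch: document sentiment with the Cloud Natural Language client library
from google.cloud import language

client = language.LanguageServiceClient()
document = language.types.Document(
    content='The new release is fast, scalable, and easy to use!',  # hypothetical input
    type=language.enums.Document.Type.PLAIN_TEXT)

# Ask the service for document-level sentiment
sentiment = client.analyze_sentiment(document=document).document_sentiment
print('Score: {}, Magnitude: {}'.format(sentiment.score, sentiment.magnitude))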
Cloud Speech API

Cloud Speech API uses powerful neural network models to convert audio to text in real time. This service is exposed as a REST API, as we have seen with the Google Cloud Natural Language API. The API can recognize over 110 languages, and users can use this service to convert speech to text in real time, recognize audio uploaded in a request, and integrate with audio storage on Google Cloud Storage, using the same technology Google uses to power its own products. Before we continue with the Cloud Speech API, I would recommend heading over to https://cloud.google.com/speech/ and trying out the API.

I was actually playing a song in the background when I tried the speech-to-text. I was very impressed with the results, except for one part, where I said "with a song playing" and the API transcribed it as "with the song playing"; still, pretty good! I think it is only a matter of time, and continued use of these services, before their accuracy increases further. Some of the key features of the Cloud Speech API are:

Automatic Speech Recognition (ASR)
Global vocabulary
Streaming recognition
Word hints
Real-time or prerecorded audio support
Noise robustness
Inappropriate content filtering
Integrated API

You can read more about the Cloud Speech API here: https://cloud.google.com/speech/.

Cloud Translation API

Using state-of-the-art Neural Machine Translation, the Cloud Translation service converts text from one language to another. The Translation API is highly responsive, so websites and applications can integrate with it for fast, dynamic translation of source text from the source language to a target language. Before we continue with the Cloud Translation API, I would recommend heading over to https://cloud.google.com/translate/ and trying out the API. Some of the key features of the Cloud Translation API are as follows:

Programmatic access – REST API-driven
Text translation
Language detection
Continuous updates

You can read more about the Cloud Translation API here: https://cloud.google.com/translate/.
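In the same spirit, here is a hedged sketch using the google-cloud-translate Python client (assumptions: pip install google-cloud-translate and configured credentials; the sample text is our own):

# A hedged sketch: translating text with the Cloud Translation client library
from google.cloud import translate

translate_client = translate.Client()

# Translate an English sentence into Spanish
result = translate_client.translate('A picture is worth ten thousand words',
                                    target_language='es')
print('Detected source language:', result['detectedSourceLanguage'])
print('Translation:', result['translatedText'])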
Cloud Vision API

Fred R. Barnard of Printers' Ink stated, "A picture is worth ten thousand words", but no one really knows what those words are. Here comes the Google Cloud Vision API to decipher them for us. The Cloud Vision API takes an image as input and returns the contents of the image as text: it can understand what is in the image, and the service can be accessed over a REST API. Before we continue with the Cloud Vision API, I would recommend heading over to https://cloud.google.com/vision/ and trying out the demo.

I tried it on a photo of myself, taken when I was going through a trying-to-grow-long-hair phase after having fun at the beach. What is important is how the Vision service was able to look at the image and detect my mood. The same service can perform label detection, and can detect web entities related to the image, among other things.

Some of the key features of this service are:

- Explicit content detection
- Logo and label detection
- Landmark detection
- Optical character recognition
- Face detection
- Image attributes
- Integrated REST API

To find out more about the Cloud Vision API, check out https://cloud.google.com/vision/.
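As with the other services, the Vision API can be exercised with a plain REST call. Here is a minimal Python sketch using the requests library; the API key and the image filename (beach_photo.jpg) are placeholders you would substitute with your own:

import base64
import requests

API_KEY = "YOUR_GOOGLE_API_KEY"  # placeholder API key
url = "https://vision.googleapis.com/v1/images:annotate?key=" + API_KEY

# Read a local image and base64-encode it for the request body
with open("beach_photo.jpg", "rb") as f:
    image_content = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "requests": [{
        "image": {"content": image_content},
        "features": [
            {"type": "LABEL_DETECTION", "maxResults": 5},
            {"type": "FACE_DETECTION"}
        ]
    }]
}

response = requests.post(url, json=payload)
for label in response.json()["responses"][0].get("labelAnnotations", []):
    print(label["description"], label["score"])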
Cloud Video Intelligence

Cloud Video Intelligence is one of the latest cognitive services released by Google. The Cloud Video Intelligence API does almost all the things that the Cloud Vision API can do, but on videos: the service extracts metadata from a video frame by frame, so we can search for any moment in a video file. Before we continue with Cloud Video Intelligence, I would recommend heading over to https://cloud.google.com/video-intelligence/ and trying out the demo. I selected the dinosaur and bicycle video, and you can see the analysis it produces.

Some of the key features of Cloud Video Intelligence are:

- Label detection
- Shot change detection
- Explicit content detection
- Video transcription (Alpha)

This concludes the overview of the various services offered as part of the Cloud AI vertical. In this book, we are going to use a few of these to make a simple web application smart.

Summary

In this tutorial, we have seen what is truly inside Google Cloud AI, covering the different services it offers along with their key features. To leverage the power of the various Google Cloud AI services by building a smart web application using the MEAN stack, check out the book Google Cloud AI Services Quick Start Guide.

Google's event-driven serverless platform, Cloud Function, is now generally available
What's new in Google Cloud Functions serverless platform
Google Cloud Next: Fei-Fei Li reveals new AI tools for developers

Bias-Variance tradeoff: How to choose between bias and variance for your machine learning model [Tutorial]
Savia Lobo
17 Sep 2018
15 min read
This article is an excerpt taken from the book Hands-On Data Science and Python Machine Learning authored by Frank Kane. In this article, we're going to start by talking about the bias-variance trade-off, which is a more principled way of talking about the different ways you might overfit and underfit data, and how the two interrelate. We will later talk about the k-fold cross-validation technique, an important tool in your chest for combating overfitting, and look at how to implement it using Python. Finally, we look at how to detect outliers and deal with them.

Bias is just how far off you are from the correct values, that is, how good your predictions are overall at landing on the right value. If you take the mean of all your predictions, are they more or less on the right spot? Or are your errors all consistently skewed in one direction or another? If so, then your predictions are biased in that direction.

Variance is just a measure of how spread out, or how scattered, your predictions are. So, if your predictions are all over the place, that's high variance. But if they're very tightly focused on the correct values, or even on an incorrect value in the case of high bias, then your variance is small.

In reality, you often need to choose between bias and variance. It comes down to overfitting versus underfitting your data. Let's take a look at the following example, which offers a slightly different way of thinking about bias and variance. In the left graph, we have a straight line, and you can think of that as having very low variance relative to the observations. But the bias, the error from each individual point, is high. Now, contrast that with the overfitted data in the graph on the right, where we've gone out of our way to fit the observations. The line has high variance but low bias, because each individual point is pretty close to where it should be. So, this is an example of where we traded off variance for bias.

At the end of the day, you're not out to just reduce bias or just reduce variance; you want to reduce error. That's what really matters, and it turns out you can express error as a function of bias and variance:

Error = Bias² + Variance

Looking at this, error is equal to bias squared plus variance, so both of these contribute to the overall error, with the bias term entering as its square. But keep in mind, it's error you really want to minimize, not the bias or the variance specifically. An overly complex model will probably end up having high variance and low bias, whereas a too-simple model will have low variance and high bias; however, they could both end up having similar error terms at the end of the day. You just have to find the right happy medium of these two things when you're trying to fit your data. That's the bias-variance trade-off: the decision you have to make between how accurate your values are overall, and how spread out or how tightly clustered they are. Bias and variance both contribute to the overall error, which is the thing you really care about minimizing. So, keep those terms in mind!
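To make that decomposition tangible, here is a small illustrative sketch (the distributions and numbers are made up purely for demonstration). It simulates predictions from a high-bias/low-variance model and a low-bias/high-variance model, and checks that the mean squared error matches bias squared plus variance:

import numpy as np

np.random.seed(42)
true_value = 10.0

# Simulated predictions from many hypothetical refits of two models
low_var_high_bias = np.random.normal(12.0, 0.5, 100000)  # consistently off target
high_var_low_bias = np.random.normal(10.0, 2.0, 100000)  # on target, but scattered

def decompose(preds, truth):
    bias_squared = (preds.mean() - truth) ** 2
    variance = preds.var()
    mse = ((preds - truth) ** 2).mean()
    return bias_squared, variance, mse

for name, preds in [("low variance / high bias", low_var_high_bias),
                    ("high variance / low bias", high_var_low_bias)]:
    b2, var, mse = decompose(preds, true_value)
    print(name, "-> bias^2:", round(b2, 2), "variance:", round(var, 2), "mse:", round(mse, 2))

In both cases, mse comes out approximately equal to bias^2 + variance, which is exactly the error decomposition described above.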
K-fold cross-validation to avoid overfitting

Train/test is a good way of preventing overfitting and of actually measuring how well your model can perform on data it has never seen before. We can take that to the next level with a technique called k-fold cross-validation. So, let's talk about this powerful tool in your arsenal for fighting overfitting, k-fold cross-validation, and learn how it works.

The idea, although it sounds complicated, is fairly simple:

- Instead of dividing our data into two buckets, one for training and one for testing, we divide it into K buckets.
- Each bucket takes a turn as the test set, used for evaluating the results of our model, while we train the model against the remaining K-1 buckets.
- We average the resulting error metrics, that is, those r-squared values, together to get a final error metric from k-fold cross-validation.

Example of k-fold cross-validation using scikit-learn

Fortunately, scikit-learn makes this really easy to do, and it's even easier than doing a normal train/test! It's extremely simple to do k-fold cross-validation, so you may as well just do it.

Now, the way this all works in practice is that you will have a model you're trying to tune, and different variations of that model, different parameters you might want to tweak on it, such as the degree of polynomial for a polynomial fit. So, the idea is to try different values of your model, different variations, measure them all using k-fold cross-validation, and find the one that minimizes error against your test dataset. That's your sweet spot. In practice, you want to use k-fold cross-validation to measure the accuracy of your model against a test dataset, and just keep refining that model, trying different values within it, trying different variations of it or maybe even different models entirely, until you find the technique that reduces error the most.

Please go ahead and open up KFoldCrossValidation.ipynb and follow along if you will. We're going to look at the Iris dataset again; remember, we introduced this when we talked about dimensionality reduction? We're going to use the SVC model. If you remember, that's just a way of classifying data that's pretty robust. There's a section on that if you need to go and refresh your memory:

import numpy as np
from sklearn import cross_validation
from sklearn import datasets
from sklearn import svm

iris = datasets.load_iris()

# Split the iris data into train/test data sets with
# 40% reserved for testing
X_train, X_test, y_train, y_test = cross_validation.train_test_split(iris.data,
    iris.target, test_size=0.4, random_state=0)

# Build an SVC model for predicting iris classifications
# using training data
clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)

# Now measure its performance with the test data
clf.score(X_test, y_test)

What we do is use the cross_validation library from scikit-learn, and we start by doing a conventional single train/test split to see how that works. To do that, we have a train_test_split() function that makes it pretty easy. The way this works is that we feed train_test_split() a set of feature data: iris.data contains all the actual measurements of each flower, and iris.target is the thing we're trying to predict, in this case the species of each flower. test_size says what percentage we want to reserve for testing: 0.4 means we're going to extract 40% of the data randomly for testing purposes, and use 60% for training purposes.
What this gives us back is four datasets: a training dataset and a test dataset for both the feature data and the target data. So, X_train ends up containing 60% of our Iris measurements, and X_test contains the 40% of the measurements used for testing the results of our model. y_train and y_test contain the actual species for each of those segments.

After that, we go ahead and build an SVC model for predicting Iris species given their measurements, and we build it only using the training data. We fit this SVC model, using a linear kernel, using only the training feature data and the training species (that is, target) data. We call that model clf. Then, we call the score() function on clf to measure its performance against our test dataset. So, we score this model against the test Iris measurements and test Iris species we reserved, and see how well it does.

It turns out it does really well! Over 96% of the time, our model is able to correctly predict the species of an Iris that it had never seen before, just based on the measurements of that Iris. So that's pretty cool! But this is a fairly small dataset, about 150 flowers if I remember right. So, we're only using 60% of 150 flowers for training and only 40% of 150 flowers for testing. These are still fairly small numbers, so we could still be overfitting to the specific train/test split that we made. Let's use k-fold cross-validation to protect against that. It turns out that k-fold cross-validation, even though it's a more robust technique, is actually even easier to use than train/test. So, let's see how that works:

# We give cross_val_score a model, the entire data set and its "real" values, and the number of folds:
scores = cross_validation.cross_val_score(clf, iris.data, iris.target, cv=5)

# Print the accuracy for each fold:
print scores

# And the mean accuracy of all 5 folds:
print scores.mean()

We have a model already, the SVC model that we defined for this prediction, and all you need to do is call cross_val_score() on the cross_validation package. You pass this function a model of a given type (clf), the entire dataset of measurements, that is, all of the feature data (iris.data), and all of the target data (all of the species), iris.target. I want cv=5, which means it's going to use 5 different training datasets while reserving 1 for testing; basically, it's going to run it 5 times, and that's all we need to do. That will automatically evaluate our model against the entire dataset, split up five different ways, and give us back the individual results. If we print the output, it gives us back a list of the actual error metric from each of those iterations, that is, each of those folds. We can average those together to get an overall error metric based on k-fold cross-validation.

When we do this over 5 folds, we can see that our results are even better than we thought! 98% accuracy. So that's pretty cool! In fact, in a couple of the runs we had perfect accuracy. So that's pretty amazing stuff.
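If you are following along with a recent version of scikit-learn, note that the cross_validation module used here was deprecated in version 0.18 and later removed. A minimal equivalent sketch using the model_selection module (and Python 3 print functions) looks like this:

from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)

scores = cross_val_score(clf, iris.data, iris.target, cv=5)
print(scores)
print(scores.mean())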
Now let's see if we can do even better. We used a linear kernel before; what if we used a polynomial kernel and got even fancier? Will that be overfitting, or will it actually better fit the data that we have? That depends on whether there's actually a linear or a polynomial relationship between these petal measurements and the actual species. So, let's try that out:

clf = svm.SVC(kernel='poly', C=1).fit(X_train, y_train)
scores = cross_validation.cross_val_score(clf, iris.data, iris.target, cv=5)
print scores
print scores.mean()

We'll just run this all again, using the same technique. But this time, we're using a polynomial kernel. We'll fit that to our training dataset, and it doesn't really matter what you fit to in this case, because cross_val_score() will just keep re-running it for you. It turns out that when we use a polynomial fit, we end up with an overall score that's even lower than our original run. So, this tells us that the polynomial kernel is probably overfitting: k-fold cross-validation reveals a lower score than we saw with our linear kernel.

The important point here is that if we had just used a single train/test split, we wouldn't have realized that we were overfitting. We would actually have gotten the same result on a single train/test split here as we did on the linear kernel. So, we might inadvertently have been overfitting our data there, and not even have known it had we not used k-fold cross-validation. This is a good example of where k-fold comes to the rescue and warns you of overfitting, where a single train/test split might not have caught it. So, keep that in your tool chest.

If you want to play around with this some more, go ahead and try different degrees. You can actually specify a different number of degrees for the polynomial kernel; the default is 3, but you can try a different one, such as 2.

Detecting outliers

A common problem with real-world data is outliers. You'll always have some strange users, or some strange agents, polluting your data by acting abnormally and atypically compared to the typical user. They might be legitimate outliers, caused by real people rather than by some sort of malicious traffic or fake data. So sometimes it's appropriate to remove them, and sometimes it isn't.

Dealing with outliers

So, let's take some example code and see how you might handle outliers in practice. Let's mess around with some outliers; it's a pretty simple section, and a little bit of review, actually. If you want to follow along, we're in Outliers.ipynb, so go ahead and open that up if you'd like:

import numpy as np

incomes = np.random.normal(27000, 15000, 10000)
incomes = np.append(incomes, [1000000000])

import matplotlib.pyplot as plt
plt.hist(incomes, 50)
plt.show()

What we're going to do is start off with a normal distribution of incomes that has a mean of $27,000 per year, with a standard deviation of $15,000. I'm going to create 10,000 fake Americans that have an income in that distribution. This is totally made-up data, by the way, although it's not that far off from reality. Then, I'm going to stick in an outlier - call it Donald Trump, who has a billion dollars. We're going to stick this guy in at the end of our dataset, so we have a normally distributed dataset around $27,000, and then Donald Trump at the end. We'll go ahead and plot that as a histogram.

What we see is the entire normal distribution of everyone else in the country squeezed into one bucket of the histogram, while Donald Trump, out at the right side, screws up the whole thing at a billion dollars. The other problem is what happens if I'm trying to answer the question of how much money the typical American makes.
If I take the mean to try and figure that out, it's not going to be a very good, useful number:

incomes.mean()

The output of the preceding code is as follows:

126892.66469341301

Donald Trump has pushed that number up all by himself to $126,000 and some odd change, when I know that the real mean of my normally distributed data, excluding Donald Trump, is only $27,000. So, the right thing to do there would be to use the median instead of the mean. A better thing to do would be to actually measure the standard deviation of your dataset, and identify outliers as being some multiple of a standard deviation away from the median. Following is a little function that I wrote that does just that. It's called reject_outliers():

def reject_outliers(data):
    u = np.median(data)
    s = np.std(data)
    filtered = [e for e in data if (u - 2 * s < e < u + 2 * s)]
    return filtered

filtered = reject_outliers(incomes)

plt.hist(filtered, 50)
plt.show()

It takes in a list of data, finds the median, and also finds the standard deviation of the dataset. It then filters the data so that only data points within two standard deviations of the median are preserved. So, I can use this handy-dandy reject_outliers() function on my income data to strip out weird outliers automatically.

Sure enough, it works! I get a much prettier graph now that excludes Donald Trump and focuses in on the more typical dataset in the center. So, pretty cool stuff! That's one example of identifying outliers and automatically removing them, or dealing with them however you see fit. Remember, always do this in a principled manner: don't just throw out outliers because they're inconvenient. Understand where they're coming from, and how they actually affect the thing you're trying to measure in spirit. By the way, our mean is also much more meaningful now; much closer to the 27,000 that it should be, now that we've gotten rid of that outlier.

In this article, we came across the bias-variance trade-off and how to minimize error. We also saw the concept of k-fold cross-validation and how to implement it in Python to prevent overfitting. If you've enjoyed this excerpt, head over to the book Hands-On Data Science and Python Machine Learning to prepare your data for analysis, train machine learning models, visualize the final data analysis, and much more.

20 lessons on bias in machine learning systems by Kate Crawford at NIPS 2017
Here's how you can handle the bias variance trade-off in your ML models

Understanding the TensorFlow data model [Tutorial]
Sugandha Lahoti
16 Sep 2018
12 min read
TensorFlow is mathematical software and an open source framework for deep learning developed by the Google Brain Team in 2011, and it can be used to help us analyze data in order to predict an effective business outcome. Although the initial target of TensorFlow was to conduct research in ML and in Deep Neural Networks (DNNs), the system is general enough to be applicable to a wide variety of classical machine learning algorithms, such as Support Vector Machines (SVMs), logistic regression, decision trees, random forests, and so on.

In this article, we will talk about the data model in TensorFlow. The data model in TensorFlow is represented by tensors. Without using complex mathematical definitions, we can say that a tensor (in TensorFlow) identifies a multidimensional numerical array. We will see more details on tensors in the next subsection. This article is taken from the book Deep Learning with TensorFlow - Second Edition by Giancarlo Zaccone and Md. Rezaul Karim. In this book, we will delve into neural networks, implement deep learning algorithms, and explore layers of data abstraction with the help of TensorFlow.

Tensors in a data model

Let's see the formal definition of a tensor on Wikipedia, as follows:

"Tensors are geometric objects that describe linear relations between geometric vectors, scalars, and other tensors. Elementary examples of such relations include the dot product, the cross product, and linear maps. Geometric vectors, often used in physics and engineering applications, and scalars themselves are also tensors."

This data structure is characterized by three parameters: rank, shape, and type (Figure 6: Tensors are nothing but geometric objects with a shape, rank, and type, used to hold a multidimensional array). A tensor can thus be thought of as the generalization of a matrix that specifies an element with an arbitrary number of indices. The syntax for tensors is more or less the same as nested vectors.

Note: Tensors just define the type of a value and the means by which that value should be calculated during the session. Therefore, they do not represent or hold any value produced by an operation.

Some people love to compare NumPy and TensorFlow. However, in reality, TensorFlow and NumPy are quite similar in the sense that both are N-d array libraries! Well, it's true that NumPy has n-dimensional array support, but it doesn't offer methods to create tensor functions and automatically compute derivatives (and it has no GPU support). (Figure 7 gives a short, one-to-one comparison of NumPy and TensorFlow.)

Now let's see an alternative way of creating tensors before they can be fed (we will see other feeding mechanisms later on) to the TensorFlow graph:

>>> X = [[2.0, 4.0],
         [6.0, 8.0]]   # X is a list of lists
>>> Y = np.array([[2.0, 4.0],
                  [6.0, 6.0]], dtype=np.float32)  # Y is a NumPy array
>>> Z = tf.constant([[2.0, 4.0],
                     [6.0, 8.0]])  # Z is a tensor

Here, X is a list, Y is an n-dimensional array from the NumPy library, and Z is a TensorFlow tensor object. Now let's see their types:

>>> print(type(X))
>>> print(type(Y))
>>> print(type(Z))

#Output
<class 'list'>
<class 'numpy.ndarray'>
<class 'tensorflow.python.framework.ops.Tensor'>

Well, their types are printed correctly.
However, a more convenient function when we want to formally deal with tensors, as opposed to the other types, is tf.convert_to_tensor(), used as follows:

t1 = tf.convert_to_tensor(X, dtype=tf.float32)
t2 = tf.convert_to_tensor(Y, dtype=tf.float32)

Now let's see their types using the following code:

>>> print(type(t1))
>>> print(type(t2))

#Output:
<class 'tensorflow.python.framework.ops.Tensor'>
<class 'tensorflow.python.framework.ops.Tensor'>

Fantastic! That's enough discussion about tensors for now. So, we can think about the structure that is characterized by the term rank.

Rank and shape of tensors

A unit of dimensionality called rank describes each tensor. It identifies the number of dimensions of the tensor; for this reason, rank is also known as the order or n-dimensions of a tensor. A rank zero tensor is a scalar, a rank one tensor is a vector, and a rank two tensor is a matrix. The following code defines a TensorFlow scalar, vector, matrix, and cube_matrix; in the next example, we will show how rank works:

import tensorflow as tf

scalar = tf.constant(100)
vector = tf.constant([1,2,3,4,5])
matrix = tf.constant([[1,2,3],[4,5,6]])
cube_matrix = tf.constant([[[1],[2],[3]],[[4],[5],[6]],[[7],[8],[9]]])

print(scalar.get_shape())
print(vector.get_shape())
print(matrix.get_shape())
print(cube_matrix.get_shape())

The results are printed here:

>>>
()
(5,)
(2, 3)
(3, 3, 1)
>>>

The shape of a tensor is the number of elements it has along each dimension; for a matrix, that is its number of rows and columns. Now we will see how to relate the shape of a tensor to its rank:

>>> scalar.get_shape()
TensorShape([])
>>> vector.get_shape()
TensorShape([Dimension(5)])
>>> matrix.get_shape()
TensorShape([Dimension(2), Dimension(3)])
>>> cube_matrix.get_shape()
TensorShape([Dimension(3), Dimension(3), Dimension(1)])

Data type of tensors

In addition to rank and shape, tensors have a data type. Here is a list of the data types:

- DT_FLOAT (tf.float32): 32-bit floating point
- DT_DOUBLE (tf.float64): 64-bit floating point
- DT_INT8 (tf.int8): 8-bit signed integer
- DT_INT16 (tf.int16): 16-bit signed integer
- DT_INT32 (tf.int32): 32-bit signed integer
- DT_INT64 (tf.int64): 64-bit signed integer
- DT_UINT8 (tf.uint8): 8-bit unsigned integer
- DT_STRING (tf.string): Variable-length byte array; each element of a tensor is a byte array
- DT_BOOL (tf.bool): Boolean
- DT_COMPLEX64 (tf.complex64): Complex number made of two 32-bit floating points, the real and imaginary parts
- DT_COMPLEX128 (tf.complex128): Complex number made of two 64-bit floating points, the real and imaginary parts
- DT_QINT8 (tf.qint8): 8-bit signed integer used in quantized ops
- DT_QINT32 (tf.qint32): 32-bit signed integer used in quantized ops
- DT_QUINT8 (tf.quint8): 8-bit unsigned integer used in quantized ops

The preceding list is self-explanatory, so we have not provided a detailed discussion of the data types. The TensorFlow APIs are implemented to manage data to and from NumPy arrays. Thus, to build a tensor with a constant value, pass a NumPy array to the tf.constant() operator, and the result will be a tensor with that value:

import tensorflow as tf
import numpy as np

array_1d = np.array([1,2,3,4,5,6,7,8,9,10])
tensor_1d = tf.constant(array_1d)

with tf.Session() as sess:
    print(tensor_1d.get_shape())
    print(sess.run(tensor_1d))

# Close the TensorFlow session when you're done
sess.close()

Running the example, we obtain the following:

>>>
(10,)
[ 1  2  3  4  5  6  7  8  9 10]
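As a quick illustrative sketch of how these data types come into play, tf.cast() converts a tensor from one type to another (this snippet assumes the same TensorFlow 1.x session style used throughout this article):

import tensorflow as tf

t_int = tf.constant([1, 2, 3])          # dtype is inferred as tf.int32
t_double = tf.cast(t_int, tf.float64)   # explicit conversion to DT_DOUBLE

with tf.Session() as sess:
    print(t_int.dtype)         # <dtype: 'int32'>
    print(t_double.dtype)      # <dtype: 'float64'>
    print(sess.run(t_double))  # the same values, now as 64-bit floats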
To build a tensor with variable values, use a NumPy array and pass it to the tf.Variable() constructor. The result will be a variable tensor with that initial value:

import tensorflow as tf
import numpy as np

# Create a sample NumPy array
array_2d = np.array([(1,2,3),(4,5,6),(7,8,9)])

# Now pass the preceding array to tf.Variable()
tensor_2d = tf.Variable(array_2d)

# Execute the preceding op under an active session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(tensor_2d.get_shape())
    print(sess.run(tensor_2d))

# Finally, close the TensorFlow session when you're done
sess.close()

In the preceding code block, tf.global_variables_initializer() is used to initialize all the ops we created before. If you need to create a variable with an initial value dependent on another variable, use the other variable's initialized_value(). This ensures that variables are initialized in the right order. The result is as follows:

>>>
(3, 3)
[[1 2 3]
 [4 5 6]
 [7 8 9]]

For ease of use in interactive Python environments, we can use the InteractiveSession class, and then use that session for all Tensor.eval() and Operation.run() calls:

import tensorflow as tf  # Import TensorFlow
import numpy as np       # Import NumPy

# Create an interactive TensorFlow session
interactive_session = tf.InteractiveSession()

# Create a 1d NumPy array
array1 = np.array([1,2,3,4,5])  # An array

# Then convert the preceding array into a tensor
tensor = tf.constant(array1)    # convert to tensor

print(tensor.eval())            # evaluate the tensor op
interactive_session.close()     # close the session

Note: tf.InteractiveSession() is just convenient syntactic sugar for keeping a default session open in IPython.

The result is as follows:

>>>
[1 2 3 4 5]

This can be easier in an interactive setting, such as the shell or an IPython Notebook, as it can be tedious to pass around a session object everywhere.

Note: The IPython Notebook is now known as the Jupyter Notebook. It is an interactive computational environment in which you can combine code execution, rich text, mathematics, plots, and rich media. For more information, interested readers should refer to https://ipython.org/notebook.html.

Another way to define a tensor is using the tf.convert_to_tensor statement:

import tensorflow as tf
import numpy as np

tensor_3d = np.array([[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
                      [[9, 10, 11], [12, 13, 14], [15, 16, 17]],
                      [[18, 19, 20], [21, 22, 23], [24, 25, 26]]])
tensor_3d = tf.convert_to_tensor(tensor_3d, dtype=tf.float64)

with tf.Session() as sess:
    print(tensor_3d.get_shape())
    print(sess.run(tensor_3d))

# Finally, close the TensorFlow session when you're done
sess.close()

Following is the output of the preceding code:

>>>
(3, 3, 3)
[[[  0.   1.   2.]
  [  3.   4.   5.]
  [  6.   7.   8.]]
 [[  9.  10.  11.]
  [ 12.  13.  14.]
  [ 15.  16.  17.]]
 [[ 18.  19.  20.]
  [ 21.  22.  23.]
  [ 24.  25.  26.]]]

Variables

Variables are TensorFlow objects used to hold and update parameters. A variable must be initialized so that you can save and restore it to analyze your code later on. Variables are created by using either tf.Variable() or tf.get_variable() statements; tf.get_variable() is the recommended way, whereas tf.Variable() is a lower-level abstraction.
In the following example, we want to count up from 0 by repeatedly incrementing a variable, but let's import TensorFlow first:

import tensorflow as tf

We created a variable that will be initialized to the scalar value 0:

value = tf.get_variable("value", shape=[], dtype=tf.int32,
    initializer=None, regularizer=None, trainable=True, collections=None)

The assignment and addition operators are just nodes of the computation graph, so they do not execute the assignment until the session is run:

one = tf.constant(1)
update_value = tf.assign_add(value, one)
initialize_var = tf.global_variables_initializer()

We can instantiate the computation graph:

with tf.Session() as sess:
    sess.run(initialize_var)
    print(sess.run(value))
    for _ in range(5):
        sess.run(update_value)
        print(sess.run(value))

# Close the session
sess.close()

Let's recall that a tensor object is a symbolic handle to the result of an operation, but it does not actually hold the values of the operation's output:

>>>
0
1
2
3
4
5

Fetches

To fetch the output of an operation, the graph can be executed by calling run() on the session object and passing in the tensors. Apart from fetching a single tensor node, you can also fetch multiple tensors. In the following example, the sum and multiply tensors are fetched together using the run() call:

import tensorflow as tf

constant_A = tf.constant([100.0])
constant_B = tf.constant([300.0])
constant_C = tf.constant([3.0])

sum_ = tf.add(constant_A, constant_B)
mul_ = tf.multiply(constant_A, constant_C)

with tf.Session() as sess:
    result = sess.run([sum_, mul_])  # _ means throw away afterwards
    print(result)

# Finally, close the TensorFlow session when you're done:
sess.close()

The output is as follows:

>>>
[array([ 400.], dtype=float32), array([ 300.], dtype=float32)]

It should be noted that all the ops that need to be executed (that is, in order to produce tensor values) are run once (not once per requested tensor).

Feeds and placeholders

There are four methods of getting data into a TensorFlow program (for more information, see https://www.tensorflow.org/api_guides/python/reading_data):

- The Dataset API: This enables you to build complex input pipelines from simple, reusable pieces, to read from distributed filesystems, and to perform complex operations. Using the Dataset API is recommended if you are dealing with large amounts of data in different data formats. It introduces two new abstractions to TensorFlow for creating a feedable dataset: tf.contrib.data.Dataset (by creating a source or applying transformation operations) and tf.contrib.data.Iterator. A minimal sketch of this API follows this list.
- Feeding: This allows us to inject data into any tensor in a computation graph.
- Reading from files: This allows us to develop an input pipeline using Python's built-in mechanism for reading data from data files at the beginning of the graph.
- Preloaded data: For a small dataset, we can use either constants or variables in the TensorFlow graph to hold all the data.
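Here is the promised illustrative sketch of the Dataset API, assuming TensorFlow 1.4 or later, where the contrib abstractions mentioned above graduated to tf.data.Dataset and its iterators:

import numpy as np
import tensorflow as tf

# Build a dataset from an in-memory NumPy array and read it in batches
features = np.arange(10, dtype=np.float32)
dataset = tf.data.Dataset.from_tensor_slices(features).batch(4)

iterator = dataset.make_one_shot_iterator()
next_batch = iterator.get_next()

with tf.Session() as sess:
    while True:
        try:
            print(sess.run(next_batch))
        except tf.errors.OutOfRangeError:
            break  # the dataset is exhausted

Each call to sess.run(next_batch) pulls the next batch of four elements (the final batch holds the remaining two).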
In this section, we will see an example of the feeding mechanism. TensorFlow provides a feed mechanism that allows us to inject data into any tensor in a computation graph. You can provide the feed data through the feed_dict argument to a run() or eval() invocation that initiates the computation.

Note: Feeding using the feed_dict argument is the least efficient way to feed data into a TensorFlow execution graph and should only be used for small experiments with small datasets. It can also be used for debugging.

We can also replace any tensor with feed data (that is, variables and constants). Best practice is to use a TensorFlow placeholder node, created using tf.placeholder() (https://www.tensorflow.org/api_docs/python/tf/placeholder). A placeholder exists exclusively to serve as the target of feeds. An empty placeholder is not initialized, so it does not contain any data. Therefore, it will always generate an error if it is executed without a feed, so you won't forget to feed it. The following example shows how to feed data to build a random 3×2 matrix:

import tensorflow as tf
import numpy as np

a = 3
b = 2

x = tf.placeholder(tf.float32, shape=(a, b))
y = tf.add(x, x)

data = np.random.rand(a, b)

sess = tf.Session()
print(sess.run(y, feed_dict={x: data}))
sess.close()  # close the session

The output is as follows:

>>>
[[ 1.78602004  1.64606333]
 [ 1.03966308  0.99269408]
 [ 0.98822606  1.50157797]]
>>>

We understood the data model in TensorFlow. To understand the TensorFlow computational graph and the TensorFlow code structure, read our book Deep Learning with TensorFlow - Second Edition.

Why TensorFlow always tops machine learning and artificial intelligence tool surveys
TensorFlow 2.0 is coming. Here's what we can expect.
Getting to know and manipulate Tensors in TensorFlow

How to perform sentiment analysis using Python [Tutorial]
Sugandha Lahoti
15 Sep 2018
4 min read
Sentiment analysis is one of the most popular applications of NLP. Sentiment analysis refers to the process of determining whether a given piece of text is positive or negative. In some variations, we consider "neutral" as a third option. This technique is commonly used to discover how people feel about a particular topic, and it is used to analyze the sentiments of users in various contexts, such as marketing campaigns, social media, e-commerce, and so on. In this article, we will perform sentiment analysis using Python. This extract is taken from Python Machine Learning Cookbook by Prateek Joshi. The book contains 100 recipes that teach you how to perform various machine learning tasks in the real world.

How to perform sentiment analysis in Python

Step 1: Create a new Python file, and import the following packages:

import nltk.classify.util
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews

Step 2: Define a function to extract features:

def extract_features(word_list):
    return dict([(word, True) for word in word_list])

Step 3: We need training data for this, so we will use the movie reviews in NLTK:

if __name__=='__main__':
    # Load positive and negative reviews
    positive_fileids = movie_reviews.fileids('pos')
    negative_fileids = movie_reviews.fileids('neg')

Step 4: Let's separate these into positive and negative reviews:

    features_positive = [(extract_features(movie_reviews.words(fileids=[f])),
            'Positive') for f in positive_fileids]
    features_negative = [(extract_features(movie_reviews.words(fileids=[f])),
            'Negative') for f in negative_fileids]

Step 5: Divide the data into training and testing datasets:

    # Split the data into train and test (80/20)
    threshold_factor = 0.8
    threshold_positive = int(threshold_factor * len(features_positive))
    threshold_negative = int(threshold_factor * len(features_negative))

Step 6: Extract the features:

    features_train = features_positive[:threshold_positive] + features_negative[:threshold_negative]
    features_test = features_positive[threshold_positive:] + features_negative[threshold_negative:]

    print "\nNumber of training datapoints:", len(features_train)
    print "Number of test datapoints:", len(features_test)

Step 7: We will use a Naive Bayes classifier. Define the object and train it:

    # Train a Naive Bayes classifier
    classifier = NaiveBayesClassifier.train(features_train)
    print "\nAccuracy of the classifier:", nltk.classify.util.accuracy(classifier, features_test)

Step 8: The classifier object contains the most informative words that it obtained during analysis. These words basically have a strong say in what's classified as a positive or a negative review. Let's print them out:

    print "\nTop 10 most informative words:"
    for item in classifier.most_informative_features()[:10]:
        print item[0]
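Before creating the test sentences, it is worth pausing to see what extract_features() actually hands to the NLTK classifier: a dictionary mapping each unique word to True. A quick illustrative check (the review here is made up, and the key order of the printed dictionary may vary):

>>> extract_features("it is an amazing movie".split())
{'it': True, 'is': True, 'an': True, 'amazing': True, 'movie': True}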
Step 9: Create a couple of random input sentences:

    # Sample input reviews
    input_reviews = [
        "It is an amazing movie",
        "This is a dull movie. I would never recommend it to anyone.",
        "The cinematography is pretty great in this movie",
        "The direction was terrible and the story was all over the place"
    ]

Step 10: Run the classifier on those input sentences and obtain the predictions:

    print "\nPredictions:"
    for review in input_reviews:
        print "\nReview:", review
        probdist = classifier.prob_classify(extract_features(review.split()))
        pred_sentiment = probdist.max()

Step 11: Print the output:

        print "Predicted sentiment:", pred_sentiment
        print "Probability:", round(probdist.prob(pred_sentiment), 2)

If you run this code, you will see three main things printed in the Terminal. The first is the accuracy of the classifier on the test dataset. The next is the list of the most informative words. The last is the list of predictions for the input sentences.

How does the code work?

We use NLTK's Naive Bayes classifier for our task here. In the feature extractor function, we basically extract all the unique words. However, the NLTK classifier needs the data to be arranged in the form of a dictionary, so we arranged it in such a way that the NLTK classifier object can ingest it. Once we divide the data into training and testing datasets, we train the classifier to categorize the sentences into positive and negative. If you look at the top informative words, you can see that we have words such as "outstanding" to indicate positive reviews and words such as "insulting" to indicate negative reviews. This is interesting information because it tells us which words are being used to indicate strong reactions.

Thus, we have learned how to perform sentiment analysis in Python. For more interesting machine learning recipes, read the book Python Machine Learning Cookbook.

Understanding Sentiment Analysis and other key NLP concepts
Twitter Sentiment Analysis
Sentiment Analysis of the 2017 US elections on Twitter

How Facebook is advancing artificial intelligence [Video]
Richard Gall
14 Sep 2018
4 min read
Facebook is playing a huge role in artificial intelligence research. It's not only a core part of the Facebook platform, it's central to how the organization works. The company launched its AI research lab - FAIR - back in 2013. Today, led by some of the best minds in the field, it's not only helping Facebook to leverage artificial intelligence, it's also making it more accessible to researchers and engineers around the world. Let's take a look at some of the tools built by Facebook that are doing just that.

PyTorch: Facebook's leading artificial intelligence tool

PyTorch is a hugely popular deep learning framework (rivalling Google's TensorFlow) that, by combining flexibility and dynamism with stability, bridges the gap between research and production. Using a tape-based auto-differentiation system, PyTorch can be modified and changed by engineers without losing speed. That's good news for everyone. Although PyTorch steals the headlines, there is a range of supporting tools that are making artificial intelligence and deep learning more accessible and achievable for other engineers.

Read next: Is PyTorch better than Google's TensorFlow? Find PyTorch eBooks and videos on the Packt website.

Facebook's computer vision tools

Another field that Facebook has revolutionized is computer vision and image processing. Detectron, Facebook's state-of-the-art object detection software system, has powered many research projects, including Mask R-CNN - a simple and flexible way of developing Convolutional Neural Networks for image processing. Mask R-CNN has also helped to power DensePose, a tool that maps all human pixels of an RGB image to a 3D surface-based representation of the human body. Facebook has also heavily contributed to research in detecting and recognizing human-object interactions. Its contribution to the field of generative modeling is equally important, with tasks such as minimizing variations in image quality, JPEG compression, and image quantization now becoming easier and more accessible.

Facebook, language and artificial intelligence

We share updates, we send messages - language is a cornerstone of Facebook. This is why it's such an important area for Facebook's AI researchers. There is a whole host of libraries and tools built for language problems. FastText is a library for text representation and classification, while ParlAI is a platform pushing the boundaries of dialog research. The platform is focused on tackling five key AI tasks: question answering, sentence completion, goal-oriented dialog, chit-chat dialog, and visual dialog. The ultimate aim for ParlAI is to develop a general dialog AI. There are also a few more language tools in Facebook's AI toolkit - Fairseq and Translate are helping with translation and text generation, while Wav2Letter is an Automatic Speech Recognition system that can be used for transcription tasks.

Rational artificial intelligence for gaming and smart decision making

Although Facebook isn't known for gaming, its interest in developing artificial intelligence that can reason could have an impact on the way games are built in the future. ELF is a tool developed by Facebook that allows game developers to train and test AI algorithms in a gaming environment. ELF was used by Facebook researchers to recreate DeepMind's AlphaGo Zero, the AI bot that has defeated Go champions. Running on a single GPU, the ELF OpenGo bot defeated four professional Go players 14-0. Impressive, right?
There are other tools built by Facebook that aim to build AI into game reasoning. TorchCraft is probably the most notable example - it's a library that's making AI research on StarCraft - a strategy game - accessible to game developers and AI specialists alike.

Facebook is defining the future of artificial intelligence

As you can see, Facebook is doing a lot to push the boundaries of artificial intelligence. However, it's not just keeping these tools for itself - all these tools are open source, which means they can be used by anyone.

Emotional AI: Detecting facial expressions and emotions using CoreML [Tutorial]
Savia Lobo
14 Sep 2018
11 min read
Computers increasingly allow natural forms of interaction, and they are becoming more ubiquitous, more capable, and more ingrained in our daily lives. They are becoming less like heartless, dumb tools and more like friends, able to entertain us, look out for us, and assist us with our work. This article is an excerpt taken from the book Machine Learning with Core ML authored by Joshua Newnham.

With this shift comes a need for computers to be able to understand our emotional state. For example, you don't want your social robot cracking a joke after you arrive back from work having lost your job (to an AI bot!). This is a field of computer science known as affective computing (also referred to as artificial emotional intelligence or emotional AI), which studies systems that can recognize, interpret, process, and simulate human emotions. The first stage of this is being able to recognize the emotional state. In this article, we will be creating a model that can detect facial expressions and emotions using Core ML.

Input data and preprocessing

We will implement the preprocessing functionality required to transform images into something the model is expecting. We will build up this functionality in a playground project before migrating it across to our project in the next section. If you haven't done so already, pull down the latest code from the accompanying repository: https://github.com/packtpublishing/machine-learning-with-core-ml. Once downloaded, navigate to the directory Chapter4/Start/ and open the playground project ExploringExpressionRecognition.playground. Once loaded, you will see the playground for this extract.

Before starting, to avoid looking at images of me, please replace the test images with either personal photos of your own or royalty-free images from the internet, ideally a set expressing a range of emotions. Along with the test images, this playground includes a compiled Core ML model with its generated set of wrappers for inputs, outputs, and the model itself. Also included are some extensions for UIImage, UIImageView, and CGImagePropertyOrientation, and an empty CIImage extension, to which we will return later in the extract. The others provide utility functions to help us visualize the images as we work through this playground.

When developing machine learning applications, you have two broad paths. The first, which is becoming increasingly popular, is to use an end-to-end machine learning model capable of just being fed the raw input and producing adequate results. One particular field that has had great success with end-to-end models is speech recognition. Prior to end-to-end deep learning, speech recognition systems were made up of many smaller modules, each one focusing on extracting specific pieces of data to feed into the next module, and each typically manually engineered. Modern speech recognition systems use end-to-end models that take the raw input and output the result. Both of the described approaches can be seen in the following diagram.

Obviously, this approach is not constrained to speech recognition, and we have seen it applied to image recognition tasks, too, along with many others. But there are two things that make this particular case different. The first is that we can simplify the problem by first extracting the face. This means our model has fewer features to learn, and we get a smaller, more specialized model that we can tune.
The second thing, which is no doubt obvious, is that our training data consisted of only faces, not natural images. So, we have no other choice but to run our data through two models: the first to extract faces, and the second to perform expression recognition on the extracted faces, as shown in the diagram.

Luckily for us, Apple has mostly taken care of our first task of detecting faces through the Vision framework it released with iOS 11. The Vision framework provides performant image analysis and computer vision tools, exposing them through a simple API. This allows for face detection, feature detection and tracking, and classification of scenes in images and video. The latter task, expression recognition, is something we will take care of using the Core ML model introduced earlier.

Prior to the introduction of the Vision framework, face detection would typically be performed using a Core Image filter. Going back further, you had to use something like OpenCV. You can learn more about Core Image here: https://developer.apple.com/library/content/documentation/GraphicsImaging/Conceptual/CoreImaging/ci_detect_faces/ci_detect_faces.html.

Now that we have a bird's-eye view of the work that needs to be done, let's turn our attention to the editor and start putting all of this together. Start by loading the images; add the following snippet to your playground:

var images = [UIImage]()
for i in 1...3{
    guard let image = UIImage(named:"images/joshua_newnham_\(i).jpg") else{
        fatalError("Failed to extract features")
    }
    images.append(image)
}

let faceIdx = 0
let imageView = UIImageView(image: images[faceIdx])
imageView.contentMode = .scaleAspectFit

In the preceding snippet, we are simply loading each of the images we have included in our resources' Images folder and adding them to an array we can access conveniently throughout the playground. Once all the images are loaded, we set the constant faceIdx, which will ensure that we access the same image throughout our experiments. Finally, we create a UIImageView to easily preview it. Once it has finished running, click on the eye icon in the right-hand panel to preview the loaded image.

Next, we will take advantage of the functionality available in the Vision framework to detect faces. The typical flow when working with the Vision framework is to define a request, which determines what analysis you want to perform, and to define a handler, which is responsible for executing the request and providing the means of obtaining the results (either through delegation or explicitly queried). The result of the analysis is a collection of observations that you need to cast into the appropriate observation type; concrete examples of each of these can be seen in the accompanying diagram.

As illustrated in that diagram, the request determines what type of image analysis will be performed; the handler, using one or more requests and an image, performs the actual analysis and generates the results (also known as observations). These are accessible via a property, or via a delegate if one has been assigned. The type of observation depends on the request performed. It's worth highlighting that the Vision framework is tightly integrated into Core ML and provides another layer of abstraction and uniformity between you, the data, and the process. For example, using a classification Core ML model would return an observation of type VNClassificationObservation.
This layer of abstraction not only simplifies things but also provides a consistent way of working with machine learning models. In the previous figure, we showed a request handler specifically for static images. Vision also provides a specialized request handler for handling sequences of images, which is more appropriate when dealing with requests such as tracking.

So, when do you use VNImageRequestHandler and when VNSequenceRequestHandler? Though the names provide clues as to when one should be used over the other, it's worth outlining some differences. The image request handler is for interactive exploration of an image; it holds a reference to the image for its life cycle and allows optimizations of various request types. The sequence request handler is more appropriate for performing tasks such as tracking and does not optimize for multiple requests on an image.

Let's see how this all looks in code; add the following snippet to your playground:

let faceDetectionRequest = VNDetectFaceRectanglesRequest()
let faceDetectionRequestHandler = VNSequenceRequestHandler()

Here, we are simply creating the request and the handler; as discussed in the preceding code, the request encapsulates the type of image analysis, while the handler is responsible for executing the request. Next, we will get faceDetectionRequestHandler to run faceDetectionRequest; add the following code:

try? faceDetectionRequestHandler.perform(
    [faceDetectionRequest],
    on: images[faceIdx].cgImage!,
    orientation: CGImagePropertyOrientation(images[faceIdx].imageOrientation))

The perform function of the handler can throw an error if it fails; for this reason, we wrap the call with try? at the beginning of the statement, and we can interrogate the error property of the handler to identify the reason for failing. We pass the handler a list of requests (in this case, only our faceDetectionRequest), the image we want to perform the analysis on, and, finally, the orientation of the image, which can be used by the request during analysis.

Once the analysis is done, we can inspect the observations obtained through the results property of the request itself, as shown in the following code:

if let faceDetectionResults = faceDetectionRequest.results as? [VNFaceObservation]{
    for face in faceDetectionResults{
        // ADD THE NEXT SNIPPET OF CODE HERE
    }
}

The type of observation depends on the analysis; in this case, we're expecting a VNFaceObservation. Hence, we cast it to the appropriate type and then iterate through all the observations. Next, we will take each recognized face and extract the bounding box. Then, we'll proceed to draw it in the image (using an extension method of UIImageView found within the UIImageViewExtension.swift file).
Add the following block within the for loop shown in the preceding code:

if let currentImage = imageView.image{
    let bbox = face.boundingBox

    let imageSize = CGSize(
        width: currentImage.size.width,
        height: currentImage.size.height)

    let w = bbox.width * imageSize.width
    let h = bbox.height * imageSize.height
    let x = bbox.origin.x * imageSize.width
    let y = bbox.origin.y * imageSize.height

    let faceRect = CGRect(
        x: x,
        y: y,
        width: w,
        height: h)

    let invertedY = imageSize.height - (faceRect.origin.y + faceRect.height)
    let invertedFaceRect = CGRect(
        x: x,
        y: invertedY,
        width: w,
        height: h)

    imageView.drawRect(rect: invertedFaceRect)
}

We can obtain the bounding box of each face via its boundingBox property; the result is normalized, so we need to scale it based on the dimensions of the image. For example, you can obtain the width by multiplying the bounding box's width by the width of the image: bbox.width * imageSize.width. Next, we invert the y axis, as the coordinate system of Quartz 2D is inverted with respect to UIKit's coordinate system. We invert our coordinates by subtracting the bounding box's origin and height from the height of the image, and then pass the result to our UIImageView to render the rectangle. Click on the eye icon in the right-hand panel, in line with the statement imageView.drawRect(rect: invertedFaceRect), to preview the results.

An alternative to inverting the face rectangle would be to use a CGAffineTransform, such as:

var transform = CGAffineTransform(scaleX: 1, y: -1)
transform = transform.translatedBy(x: 0, y: -imageSize.height)
let invertedFaceRect = faceRect.applying(transform)

This approach leads to less code and therefore fewer chances of error, so it is the recommended approach; the long-hand approach was taken previously to help illuminate the details.

As a designer and builder of intelligent systems, it is your task to interpret these results and present them to the user. Some questions you'll want to ask yourself are as follows:

- What is an acceptable threshold of probability before setting a class as true?
- Can this threshold depend on the probabilities of other classes, to remove ambiguity? That is, if Sad and Happy both have a probability of 0.3, you can infer that the prediction is inaccurate, or at least not useful.
- Is there a way to accept multiple probabilities?
- Is it useful to expose the threshold to the user and have them manually set and/or tune it?

These are only a few of the questions you should ask; the specific questions and their answers will depend on your use case and users. At this point, we have everything we need to preprocess images and perform inference.

We briefly explored some use cases showing how emotion recognition could be applied. For a detailed overview of this experiment, check out our book, Machine Learning with Core ML, to further implement Core ML for visual-based applications using the principles of transfer learning and neural networks.

Amazon Rekognition can now 'recognize' faces in a crowd at real-time
5 cool ways Transfer Learning is being used today
My friend, the robot: Artificial Intelligence needs Emotional Intelligence

AWS machine learning: Learning AWS CLI to execute a simple Amazon ML workflow [Tutorial]
Melisha Dsouza
13 Sep 2018
15 min read
Using the AWS web interface to manage and run your projects is time-consuming. We will, therefore, start running our projects via the command line with the AWS Command Line Interface (AWS CLI). With just one tool to download and configure, multiple AWS services can be controlled from the command line, and they can be automated through scripts. The code files for this article are available on GitHub. This article is an excerpt from a book written by Alexis Perrier titled Effective Amazon Machine Learning.

Getting started and setting up

Creating a well-performing predictive model from raw data requires many trials and errors, much back and forth. Creating new features, cleaning up data, and trying out new parameters for the model are needed to ensure the robustness of the model. There is a constant back and forth between the data, the models, and the evaluations. Scripting this workflow via the AWS CLI will give us the ability to speed up the create, test, select loop.

Installing AWS CLI

In order to set up your CLI credentials, you need your access key ID and your secret access key. You can simply create them from the IAM console (https://console.aws.amazon.com/iam). Navigate to Users, select your IAM user name, and click on the Security credentials tab. Choose Create Access Key and download the CSV file. Store the keys in a secure location; we will need them in a few minutes to set up the AWS CLI. But first, we need to install the AWS CLI.

Docker environment – This tutorial will help you use the AWS CLI within a Docker container: https://blog.flowlog-stats.com/2016/05/03/aws-cli-in-a-docker-container/. A Docker image for running the AWS CLI is available at https://hub.docker.com/r/fstab/aws-cli/.

There is no need to rewrite the AWS documentation on how to install the AWS CLI. It is complete and up to date, and available at http://docs.aws.amazon.com/cli/latest/userguide/installing.html. In a nutshell, installing the CLI requires you to have Python and pip already installed. Then, run the following:

$ pip install --upgrade --user awscli

Add AWS to your $PATH:

$ export PATH=~/.local/bin:$PATH

Reload the bash configuration file (this is for OSX):

$ source ~/.bash_profile

Check that everything works with the following command:

$ aws --version

You should see something similar to the following output:

aws-cli/1.11.47 Python/3.5.2 Darwin/15.6.0 botocore/1.5.10

Once installed, we need to configure the AWS CLI. Type:

$ aws configure

Now input the access keys you just created:

$ aws configure
AWS Access Key ID [None]: ABCDEF_THISISANEXAMPLE
AWS Secret Access Key [None]: abcdefghijk_THISISANEXAMPLE
Default region name [None]: us-west-2
Default output format [None]: json

Choose the region that is closest to you and the format you prefer (JSON, text, or table); JSON is the default format. The aws configure command creates two files: a config file and a credentials file. On OSX, the files are ~/.aws/config and ~/.aws/credentials. You can directly edit these files to change your access or configuration. You will need to create different profiles if you need to access multiple AWS accounts.
You will need to create different profiles if you need to access multiple AWS accounts. You can do so via the aws configure command:

$ aws configure --profile user2

You can also do so directly in the config and credentials files. The config file, ~/.aws/config, looks as follows:

[default]
output = json
region = us-east-1
[profile user2]
output = text
region = us-west-2

You can edit the credentials file, ~/.aws/credentials, as follows:

[default]
aws_access_key_id = ABCDEF_THISISANEXAMPLE
aws_secret_access_key = abcdefghijk_THISISANEXAMPLE
[user2]
aws_access_key_id = ABCDEF_ANOTHERKEY
aws_secret_access_key = abcdefghijk_ANOTHERKEY

Refer to the AWS CLI setup page for more in-depth information: http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html

Picking up CLI syntax

The overall format of any AWS CLI command is as follows:

$ aws <service> [options] <command> <subcommand> [parameters]

Here, the terms are as follows:

<service>: The name of the service you are managing: S3, machine learning, EC2, and so on
[options]: Allows you to set the region, the profile, and the output of the command
<command> <subcommand>: The actual command you want to execute
[parameters]: The parameters for these commands

A simple example will help you understand the syntax better. To list the content of an S3 bucket named aml.packt, the command is as follows:

$ aws s3 ls aml.packt

Here, s3 is the service, ls is the command, and aml.packt is the parameter. The aws help command will output a list of all available services. There are many more examples and explanations in the AWS documentation, available at http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-using.html.

Passing parameters using JSON files

For some services and commands, the list of parameters can become long and difficult to check and maintain. For instance, in order to create an Amazon ML model via the CLI, you need to specify at least seven different elements: the model ID, name, and type, the model's parameters, the ID of the training datasource, and the recipe and its URI (see aws machinelearning create-ml-model help). When possible, we will use the CLI's ability to read parameters from a JSON file instead of specifying them on the command line.

The AWS CLI also offers a way to generate a JSON template, which you can then fill in with the right parameters. To generate that JSON parameter file model (the JSON skeleton), simply add --generate-cli-skeleton after the command name. For instance, to generate the JSON skeleton for the create model command of the machine learning service, write the following:

$ aws machinelearning create-ml-model --generate-cli-skeleton

This will give the following output:

{
    "MLModelId": "",
    "MLModelName": "",
    "MLModelType": "",
    "Parameters": {
        "KeyName": ""
    },
    "TrainingDataSourceId": "",
    "Recipe": "",
    "RecipeUri": ""
}

You can then configure this to your liking. To have the skeleton command write a JSON file rather than simply print the skeleton in the terminal, add > filename.json:

$ aws machinelearning create-ml-model --generate-cli-skeleton > filename.json

This will create a filename.json file containing the JSON template. Once all the required parameters are specified, you create the model with the following command (assuming filename.json is in the current folder):

$ aws machinelearning create-ml-model file://filename.json
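As a side note, if you have jq installed, you can also fill in skeleton fields directly from the shell instead of editing the file by hand. This is a minimal sketch of that pattern; the field values here are placeholders:

$ aws machinelearning create-ml-model --generate-cli-skeleton \
    | jq '.MLModelId = "my_model_001" | .MLModelName = "[MDL] Example" | .MLModelType = "REGRESSION"' \
    > filename.json

This keeps the whole workflow scriptable, which is the point of moving to the CLI in the first place.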
Before we dive further into the machine learning workflow via the CLI, we need to introduce the dataset we will be using in this chapter.

Introducing the Ames Housing dataset

We will use the Ames Housing dataset, which was compiled by Dean De Cock for use in data science education. It is a great alternative to the popular but older Boston Housing dataset. The Ames Housing dataset is used in the Advanced Regression Techniques challenge on the Kaggle website: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/. The original version of the dataset is available at http://www.amstat.org/publications/jse/v19n3/decock/AmesHousing.xls and in the GitHub repository for this chapter. For more information on the genesis of this dataset and an in-depth explanation of the different variables, read the paper by Dean De Cock, available in PDF at https://ww2.amstat.org/publications/jse/v19n3/decock.pdf.

We will start by splitting the dataset into a train and a validate set and build a model on the train set. Both train and validate sets are available in the GitHub repository as ames_housing_training.csv and ames_housing_validate.csv. The entire dataset is in the ames_housing.csv file.

Splitting the dataset with shell commands

We will use shell commands to shuffle, split, and create training and validation subsets of the Ames Housing dataset:

First, extract the header line into a separate file, ames_housing_header.csv:

$ head -n 1 ames_housing.csv > ames_housing_header.csv

Then write all the lines after the header into a new, headerless file:

$ tail -n +2 ames_housing.csv > ames_housing_nohead.csv

Then randomly shuffle the rows in place (gshuf is the OSX equivalent of the Linux shuf shell command; it can be installed via brew install coreutils):

$ gshuf ames_housing_nohead.csv -o ames_housing_nohead.csv

Extract the first 2,050 rows as the training file and the last 880 rows as the validation file:

$ head -n 2050 ames_housing_nohead.csv > ames_housing_training.csv
$ tail -n 880 ames_housing_nohead.csv > ames_housing_validate.csv

Finally, add the header back into both the training and validation files:

$ cat ames_housing_header.csv ames_housing_training.csv > tmp.csv
$ mv tmp.csv ames_housing_training.csv
$ cat ames_housing_header.csv ames_housing_validate.csv > tmp.csv
$ mv tmp.csv ames_housing_validate.csv

A simple project using AWS CLI

We are now ready to execute a simple Amazon ML workflow using the CLI. This includes the following:

Uploading files to S3
Creating a datasource and the recipe
Creating a model
Creating an evaluation
Creating batch and real-time predictions

Let's start by uploading the training and validation files to S3. In the following lines, replace the bucket name aml.packt with your own bucket name. To upload the files to the S3 location s3://aml.packt/data/ch8/, run the following command lines:

$ aws s3 cp ./ames_housing_training.csv s3://aml.packt/data/ch8/
upload: ./ames_housing_training.csv to s3://aml.packt/data/ch8/ames_housing_training.csv

$ aws s3 cp ./ames_housing_validate.csv s3://aml.packt/data/ch8/
upload: ./ames_housing_validate.csv to s3://aml.packt/data/ch8/ames_housing_validate.csv
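To double-check that both files landed where expected, you can list the prefix (same example bucket as above):

$ aws s3 ls s3://aml.packt/data/ch8/

You should see ames_housing_training.csv and ames_housing_validate.csv in the listing.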
That's it for the S3 part. Now let's explore the CLI for Amazon's machine learning service.

An overview of Amazon ML CLI commands

All Amazon ML CLI commands are documented at http://docs.aws.amazon.com/cli/latest/reference/machinelearning/. There are 30 commands, which can be grouped by object and action. You can perform the following actions:

create: creates the object
describe: searches objects given some parameters (location, dates, names, and so on)
get: given an object ID, returns information
update: given an object ID, updates the object
delete: deletes an object

These can be performed on the following elements:

datasource: create-data-source-from-rds, create-data-source-from-redshift, create-data-source-from-s3, describe-data-sources, delete-data-source, get-data-source, update-data-source
ml-model: create-ml-model, describe-ml-models, get-ml-model, delete-ml-model, update-ml-model
evaluation: create-evaluation, describe-evaluations, get-evaluation, delete-evaluation, update-evaluation
batch prediction: create-batch-prediction, describe-batch-predictions, get-batch-prediction, delete-batch-prediction, update-batch-prediction
real-time endpoint: create-realtime-endpoint, delete-realtime-endpoint, predict

You can also handle tags and set waiting times. Note that the AWS CLI gives you the ability to create datasources from S3, Redshift, and RDS, while the web interface only allows datasources from S3 and Redshift.

Creating the datasource

We will start by creating the datasource. Let's first see what parameters are needed by generating the following skeleton:

$ aws machinelearning create-data-source-from-s3 --generate-cli-skeleton

This generates the following JSON object:

{
    "DataSourceId": "",
    "DataSourceName": "",
    "DataSpec": {
        "DataLocationS3": "",
        "DataRearrangement": "",
        "DataSchema": "",
        "DataSchemaLocationS3": ""
    },
    "ComputeStatistics": true
}

The different parameters are mostly self-explanatory, and further information can be found in the AWS documentation at http://docs.aws.amazon.com/cli/latest/reference/machinelearning/create-data-source-from-s3.html.

A word on the schema: when creating a datasource from the web interface, you can use a wizard that guides you through the creation of the schema. The wizard facilitates the process by guessing the type of the variables, thus making available a default schema that you can then modify. There is no default schema available via the AWS CLI; you have to define the entire schema yourself, either inline in JSON format in the DataSchema field, or by uploading a schema file to S3 and specifying its location in the DataSchemaLocationS3 field. Since our dataset has many variables (79), we cheated and used the wizard to create a default schema, which we uploaded to S3. Throughout the rest of the chapter, we will specify the schema location, not its JSON definition (a truncated sketch of the schema file is shown after the datasource examples below).

In this example, we will create the following datasource parameter file, dsrc_ames_housing_001.json:

{
    "DataSourceId": "ch8_ames_housing_001",
    "DataSourceName": "[DS] Ames Housing 001",
    "DataSpec": {
        "DataLocationS3": "s3://aml.packt/data/ch8/ames_housing_training.csv",
        "DataSchemaLocationS3": "s3://aml.packt/data/ch8/ames_housing.csv.schema"
    },
    "ComputeStatistics": true
}

For the validation subset (save to dsrc_ames_housing_002.json):

{
    "DataSourceId": "ch8_ames_housing_002",
    "DataSourceName": "[DS] Ames Housing 002",
    "DataSpec": {
        "DataLocationS3": "s3://aml.packt/data/ch8/ames_housing_validate.csv",
        "DataSchemaLocationS3": "s3://aml.packt/data/ch8/ames_housing.csv.schema"
    },
    "ComputeStatistics": true
}
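For reference, here is a truncated, hypothetical sketch of what the ames_housing.csv.schema file looks like; the real file lists all of the dataset's columns, with the attribute types guessed by the wizard:

{
    "version": "1.0",
    "targetAttributeName": "salePrice",
    "dataFormat": "CSV",
    "dataFileContainsHeader": true,
    "attributes": [
        { "attributeName": "lotArea", "attributeType": "NUMERIC" },
        { "attributeName": "neighborhood", "attributeType": "CATEGORICAL" },
        { "attributeName": "salePrice", "attributeType": "NUMERIC" }
    ]
}

The attribute names here are illustrative; the actual names must match the CSV header exactly.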
Since we have already split our data into a training and a validation set, there is no need to specify the DataRearrangement field. Alternatively, we could have avoided splitting our dataset and specified a DataRearrangement on the original dataset, assuming it had already been shuffled. Note that the splitting instruction is itself a JSON string embedded in the JSON file, so its inner quotes must be escaped. For the training set (save to dsrc_ames_housing_003.json):

{
    "DataSourceId": "ch8_ames_housing_003",
    "DataSourceName": "[DS] Ames Housing training 003",
    "DataSpec": {
        "DataLocationS3": "s3://aml.packt/data/ch8/ames_housing_shuffled.csv",
        "DataRearrangement": "{\"splitting\":{\"percentBegin\":0,\"percentEnd\":70}}",
        "DataSchemaLocationS3": "s3://aml.packt/data/ch8/ames_housing.csv.schema"
    },
    "ComputeStatistics": true
}

For the validation set (save to dsrc_ames_housing_004.json):

{
    "DataSourceId": "ch8_ames_housing_004",
    "DataSourceName": "[DS] Ames Housing validation 004",
    "DataSpec": {
        "DataLocationS3": "s3://aml.packt/data/ch8/ames_housing_shuffled.csv",
        "DataRearrangement": "{\"splitting\":{\"percentBegin\":70,\"percentEnd\":100}}",
        "DataSchemaLocationS3": "s3://aml.packt/data/ch8/ames_housing.csv.schema"
    },
    "ComputeStatistics": true
}

Here, the ames_housing.csv file has previously been shuffled using the gshuf command and uploaded to S3:

$ gshuf ames_housing_nohead.csv -o ames_housing_nohead.csv
$ cat ames_housing_header.csv ames_housing_nohead.csv > tmp.csv
$ mv tmp.csv ames_housing_shuffled.csv
$ aws s3 cp ./ames_housing_shuffled.csv s3://aml.packt/data/ch8/

Note that we don't need all four of these datasources; 003 and 004 simply illustrate an alternative way to create them. We create the first datasource by running the following:

$ aws machinelearning create-data-source-from-s3 --cli-input-json file://dsrc_ames_housing_001.json

The command returns immediately; the datasource creation itself runs asynchronously. In return, we get the datasource ID we had specified:

{
    "DataSourceId": "ch8_ames_housing_001"
}

We can then obtain information on that datasource with the following:

$ aws machinelearning get-data-source --data-source-id ch8_ames_housing_001

This returns the following:

{
    "Status": "COMPLETED",
    "NumberOfFiles": 1,
    "CreatedByIamUser": "arn:aws:iam::178277xxxxxxx:user/alexperrier",
    "LastUpdatedAt": 1486834110.483,
    "DataLocationS3": "s3://aml.packt/data/ch8/ames_housing_training.csv",
    "ComputeStatistics": true,
    "StartedAt": 1486833867.707,
    "LogUri": "https://eml-prod-emr.s3.amazonaws.com/178277513911-ds-ch8_ames_housing_001/.....",
    "DataSourceId": "ch8_ames_housing_001",
    "CreatedAt": 1486030865.965,
    "ComputeTime": 880000,
    "DataSizeInBytes": 648150,
    "FinishedAt": 1486834110.483,
    "Name": "[DS] Ames Housing 001"
}

Note that we have access to the operation's log URI, which can be useful for analyzing the model training later on.

Creating the model

Creating the model with the create-ml-model command follows the same steps:

Generate the skeleton with the following:

$ aws machinelearning create-ml-model --generate-cli-skeleton > mdl_ames_housing_001.json

Write the configuration file:

{
    "MLModelId": "ch8_ames_housing_001",
    "MLModelName": "[MDL] Ames Housing 001",
    "MLModelType": "REGRESSION",
    "Parameters": {
        "sgd.shuffleType": "auto",
        "sgd.l2RegularizationAmount": "1.0E-06",
        "sgd.maxPasses": "100"
    },
    "TrainingDataSourceId": "ch8_ames_housing_001",
    "RecipeUri": "s3://aml.packt/data/ch8/recipe_ames_housing_001.json"
}

Note the parameters of the algorithm: here, we used mild L2 regularization and 100 passes.

Launch the model creation with the following:

$ aws machinelearning create-ml-model --cli-input-json file://mdl_ames_housing_001.json

The model ID is returned:

{
    "MLModelId": "ch8_ames_housing_001"
}
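Amazon ML queues dependent operations for you (more on this below), but if a script needs to wait for the model itself, for example before fetching an evaluation's results, one minimal way to block until training completes is to poll the status with a JMESPath query. This is a sketch; the 30-second interval is arbitrary, and a production script should also bail out on a FAILED status:

$ until aws machinelearning get-ml-model --ml-model-id ch8_ames_housing_001 \
    --query Status --output text | grep -q COMPLETED; do sleep 30; done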
The get-ml-model command gives you a status update on the operation, as well as the URL of the log:

$ aws machinelearning get-ml-model --ml-model-id ch8_ames_housing_001

The watch command allows you to repeat a shell command every n seconds. To get the status of the model creation every 10 seconds, just write the following:

$ watch -n 10 aws machinelearning get-ml-model --ml-model-id ch8_ames_housing_001

The output of get-ml-model will be refreshed every 10 seconds until you kill it.

It is not possible to create the default recipe via the AWS CLI commands. You can always define a blank recipe that does not carry out any transformation on the data; however, the default recipe has been shown to have a positive impact on model performance. To obtain this default recipe, we created it via the web interface and copied it into a file that we uploaded to S3. The resulting file, recipe_ames_housing_001.json, is available in our GitHub repository. Its content is not reproduced here for brevity, as the dataset has 79 variables.

Evaluating our model with create-evaluation

Our model is now trained, and we would like to evaluate it on the validation subset. For that, we will use the create-evaluation CLI command:

Generate the skeleton:

$ aws machinelearning create-evaluation --generate-cli-skeleton > eval_ames_housing_001.json

Configure the parameter file:

{
    "EvaluationId": "ch8_ames_housing_001",
    "EvaluationName": "[EVL] Ames Housing 001",
    "MLModelId": "ch8_ames_housing_001",
    "EvaluationDataSourceId": "ch8_ames_housing_002"
}

Launch the evaluation creation:

$ aws machinelearning create-evaluation --cli-input-json file://eval_ames_housing_001.json

Get the evaluation information:

$ aws machinelearning get-evaluation --evaluation-id ch8_ames_housing_001

From that output, we get the performance of the model in the form of the RMSE:

"PerformanceMetrics": {
    "Properties": {
        "RegressionRMSE": "29853.250469108018"
    }
}

The value may seem big, but it is relative to the range of the salePrice variable, which has a mean of 181300.0 and a standard deviation of 79886.7. So an RMSE of 29853.2 is a decent score.

You don't have to wait for the datasource creation to be completed in order to launch the model training. Amazon ML will simply wait for the parent operation to conclude before launching the dependent one. This makes chaining operations possible.

At this point, we have a trained and evaluated model. In this tutorial, we walked through the detailed steps for getting started with the AWS CLI and implemented a simple project to get comfortable with it. To understand how to leverage Amazon's powerful platform for your predictive analytics needs, check out the book Effective Amazon Machine Learning.

Part 1. Learning AWS CLI
Part 2. ChatOps with Slack and AWS CLI
Automate tasks using Azure PowerShell and Azure CLI [Tutorial]

How to predict viral content using random forest regression in Python [Tutorial]

Prasad Ramesh
12 Sep 2018
9 min read
Understanding sharing behavior is big business. As consumers become blind to traditional advertising, the push is to go beyond simple pitches to tell engaging stories. In this article, we will build a predictive content scoring model that predicts whether content will go viral, using random forest regression. This article is an excerpt from a book written by Alexander T. Combs titled Python Machine Learning Blueprints: Intuitive data projects you can relate to. You can download the code and other relevant files used in this article from this GitHub link.

What does research tell us about content virality?

Increasingly, the success of these endeavors is measured in social shares. Why go to so much trouble? Because as a brand, every share that I receive represents another consumer that I've reached, all without spending an additional cent. Due to this value, several researchers have examined sharing behavior in the hopes of understanding what motivates it. Among the reasons researchers have found:

To provide practical value to others (an altruistic motive)
To associate ourselves with certain ideas and concepts (an identity motive)
To bond with others around a common emotion (a communal motive)

With regard to the last motive, one particularly well-designed study looked at 7,000 pieces of content from the New York Times to examine the effect of emotion on sharing. The researchers found that simple emotional sentiment was not enough to explain sharing behavior, but when combined with emotional arousal, the explanatory power was greater. For example, while sadness has a strong negative valence, it is considered a low-arousal state; anger, on the other hand, has a negative valence paired with a high-arousal state. As such, stories that sadden the reader tend to generate far fewer shares than anger-inducing stories.

Source: "What Makes Online Content Viral?" by Jonah Berger and Katherine L. Milkman

Building a predictive content scoring model

Let's create a model that can estimate the share counts for a given piece of content. Ideally, we would have a much larger sample of content, especially content with more typical share counts; however, we'll make do with what we have here. We're going to use an algorithm called random forest regression and attempt to predict the share counts directly. We could bucket our share counts into ranges, but it is preferable to use regression when dealing with continuous variables.

To begin, we'll create a bare-bones model using the number of images, the site, and the word count as features, and we'll train our model on the number of Facebook likes. We'll first import scikit-learn, then prepare our data by removing the rows with nulls, resetting our index, and finally splitting the frame into our training and testing sets:

from sklearn.ensemble import RandomForestRegressor

all_data = dfc.dropna(subset=['img_count', 'word_count'])
all_data.reset_index(inplace=True, drop=True)

train_index = []
test_index = []
for i in all_data.index:
    result = np.random.choice(2, p=[.65,.35])
    if result == 1:
        test_index.append(i)
    else:
        train_index.append(i)

We used a random number generator with probabilities set to approximately 2/3 and 1/3 to determine which row items (based on their index) would be placed in each set. Setting the probabilities this way ensures that we get approximately twice the number of rows in our training set as in the test set.
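As an aside, a more idiomatic way to get a similar split is scikit-learn's train_test_split; this hypothetical equivalent (not part of the original code) uses test_size=0.35 to mirror the proportions above, with a fixed seed for reproducibility:

from sklearn.model_selection import train_test_split

# Split the DataFrame directly; random_state makes the split reproducible
train_df, test_df = train_test_split(all_data, test_size=0.35, random_state=42)

The manual index loop above does have one advantage: it keeps the index bookkeeping explicit, which the rest of the walkthrough relies on.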
We can see this as follows:

print('test length:', len(test_index), '\ntrain length:', len(train_index))

The preceding code prints the lengths of the two index lists.

Now, we'll continue preparing our data. Next, we need to set up categorical encoding for our sites. Currently, our DataFrame object represents the name of each site with a string, so we need to use dummy encoding. This creates a column for each site: if a row is for that particular site, that column is filled in with 1, and all the other site columns are filled in with 0. Let's do that now:

sites = pd.get_dummies(all_data['site'])
sites

The output of the sites frame shows the dummy encoding. We'll now continue by splitting our data into training and test sets, as follows:

y_train = all_data.iloc[train_index]['fb'].astype(int)
X_train_nosite = all_data.iloc[train_index][['img_count', 'word_count']]
X_train = pd.merge(X_train_nosite, sites.iloc[train_index], left_index=True, right_index=True)

y_test = all_data.iloc[test_index]['fb'].astype(int)
X_test_nosite = all_data.iloc[test_index][['img_count', 'word_count']]
X_test = pd.merge(X_test_nosite, sites.iloc[test_index], left_index=True, right_index=True)

With this, we've set up our X_train, X_test, y_train, and y_test variables. We'll now use them to build our model:

clf = RandomForestRegressor(n_estimators=1000)
clf.fit(X_train, y_train)

With these two lines of code, we have trained our model. Let's now use it to predict the Facebook likes for our testing set:

y_pred = clf.predict(X_test)
y_actual = y_test

deltas = pd.DataFrame(list(zip(y_pred, y_actual, (y_pred - y_actual)/(y_actual))), columns=['predicted', 'actual', 'delta'])
deltas

Here, we see the predicted value, the actual value, and the difference as a percentage. Let's take a look at the descriptive stats for the deltas:

deltas['delta'].describe()

Our median error is 0! Well, unfortunately, this isn't a particularly useful bit of information, as errors fall on both sides, positive and negative, and tend to average out, which is what we see here. Let's now look at a more informative metric to evaluate our model: root mean square error as a percentage of the actual mean.

To illustrate why this is more useful, let's run the following scenario on two sample series:

a = pd.Series([10,10,10,10])
b = pd.Series([12,8,8,12])

np.sqrt(np.mean((b-a)**2))/np.mean(a)

This results in 0.2. Now compare this to the plain mean of the differences:

(b-a).mean()

This results in 0.0. Clearly, the former is the more meaningful statistic. Let's now run the RMSE metric for our model:

np.sqrt(np.mean((y_pred-y_actual)**2))/np.mean(y_actual)

This gives us our baseline error rate.
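Before adding more features, it can be instructive to see which of the current ones the forest actually leans on. This step isn't part of the original walkthrough, but RandomForestRegressor exposes impurity-based feature importances that line up with the columns of X_train:

import pandas as pd

# Rank the model's features by their learned importance
importances = pd.Series(clf.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False).head(10))

If word_count or img_count dominates here, that hints at where additional feature engineering is likely to pay off.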
Let's now add features that count the words in each title and see if they help our model. We'll use a count vectorizer to do this. Much like what we did with the site names, we'll transform individual words and n-grams into features:

from sklearn.feature_extraction.text import CountVectorizer

vect = CountVectorizer(ngram_range=(1,3))
X_titles_all = vect.fit_transform(all_data['title'])

X_titles_train = X_titles_all[train_index]
X_titles_test = X_titles_all[test_index]

X_test = pd.merge(X_test, pd.DataFrame(X_titles_test.toarray(), index=X_test.index), left_index=True, right_index=True)
X_train = pd.merge(X_train, pd.DataFrame(X_titles_train.toarray(), index=X_train.index), left_index=True, right_index=True)

In these lines, we joined our existing features to our new n-gram features. Let's now train our model and see whether we have any improvement:

clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

deltas = pd.DataFrame(list(zip(y_pred, y_actual, (y_pred - y_actual)/(y_actual))), columns=['predicted', 'actual', 'delta'])
deltas

Checking our error again, we run the following:

np.sqrt(np.mean((y_pred-y_actual)**2))/np.mean(y_actual)

The error comes out slightly lower than before, so it appears that we have a modestly improved model. Now, let's add one more feature, the word count of the title, as follows:

all_data = all_data.assign(title_wc = all_data['title'].map(lambda x: len(x.split(' '))))

X_train = pd.merge(X_train, all_data[['title_wc']], left_index=True, right_index=True)
X_test = pd.merge(X_test, all_data[['title_wc']], left_index=True, right_index=True)

clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

np.sqrt(np.mean((y_pred-y_actual)**2))/np.mean(y_actual)

It appears that each feature has modestly improved our model. There are certainly more features that we could add. For example, we could add the day of the week and the hour of the posting, we could determine whether the article is a listicle by running a regex on the headline, or we could examine the sentiment of each article. This only begins to touch on the features that could be important for modeling virality; we would certainly need to go much further to continue reducing the error in our model.

We have also performed only the most cursory testing of our model. Each measurement should be run multiple times to get a more accurate representation of the true error rate. It is possible that there is no statistically discernible difference between our last two models, as we only performed one test.

To summarize, we learned how to build a model that predicts content virality using random forest regression. To know more about this and other machine learning projects in Python, check out Python Machine Learning Blueprints: Intuitive data projects you can relate to.

Writing web services with functional Python programming [Tutorial]
Visualizing data in R and Python using Anaconda [Tutorial]
Python 3.7 beta is available as the second generation Google App Engine standard runtime