How-To Tutorials

How far will Facebook go to fix what it broke: Democracy, Trust, Reality

Aarthi Kumaraswamy
24 Sep 2018
19 min read
Facebook, along with other tech media giants like Twitter and Google, broke the democratic process in 2016. Facebook also broke the trust of many of its users as scandal after scandal kept surfacing, telling the same story in different ways - the story of user data and trust abused in exchange for growth and revenue. The week before last, Mark Zuckerberg posted a long explanation on Facebook titled 'Preparing for Elections'. It is the first in a series of reflections by Zuckerberg that 'address the most important issues facing Facebook'. That post explored what Facebook is doing to avoid ending up in a situation similar to the 2016 elections, when the platform 'inadvertently' became a super-effective channel for election interference of various kinds. It follows just weeks after Facebook COO Sheryl Sandberg appeared in front of a Senate Intelligence hearing alongside Twitter CEO Jack Dorsey on the topic of social media's role in election interference.

Zuckerberg's mobile-first rigor oversimplifies the issues

Zuckerberg opened his post with a strong commitment to addressing the issues plaguing Facebook using the highest levels of rigor the company has known in its history. He wrote, "I am bringing the same focus and rigor to addressing these issues that I've brought to previous product challenges like shifting our services to mobile."

To understand the weight of this statement, we must go back to how Facebook became a mobile-first company that beat investor expectations wildly. Suffice to say, it went through painful years of restructuring and reorientation in the process. If you are unfamiliar with that phase of Facebook, please read the section 'How far did Facebook go to become a mobile-first company?' at the end of this post for more details.

To be fair, Zuckerberg does acknowledge that pivoting to mobile was a lot easier than what it will take to tackle the current set of challenges. He writes, "These issues are even harder because people don't agree on what a good outcome looks like, or what tradeoffs are acceptable to make. When it comes to free expression, thoughtful people come to different conclusions about the right balances. When it comes to implementing a solution, certainly some investors disagree with my approach to invest so much on security. We have a lot of work ahead, but I am confident we will end this year with much more sophisticated approaches than we began, and that the focus and investments we've put in will be better for our community and the world over the long term."

However, what Zuckerberg does not acknowledge in the above statement is that the current set of issues is not merely a product challenge, but a business ethics and sustainability challenge. Unless 'an honest look in the mirror' kind of analysis is done on that side of Facebook, any level of product improvement will only result in cosmetic changes that end in an 'operation successful, patient dead' scenario. In the coming sections, I attempt to dissect Zuckerberg's post in the context of the above points, reading between the lines to see how serious the platform really is about changing its ways to 'be better for our community and the world over the long term'.

Why does Facebook's commitment to change feel hollow?

Let's focus on election interference in this analysis, as Zuckerberg limits his views to this topic in his post. Facebook has been at the center of this story on many levels. Here is some context on where Zuckerberg is coming from.
Facebook's involvement in the 2016 election meddling

Apart from the traditional cyber-attacks (which they had, even back then, managed to prevent successfully), there were Russia-backed coordinated misinformation campaigns found on the platform. Then there was also the misuse of user data by the data analytics firm Cambridge Analytica, which consulted on election campaigning. They micro-profiled users based on their psychographics (the way they think and behave) to ensure more effective ad spending by political parties. There was also the issue of certain kinds of ads, subliminal messages, and peer pressure sent out to specific Facebook users during elections to prompt them to vote for certain candidates while others did not receive similar messages. There were also alleged reports of a certain set of users having been sent 'dark posts' (posts that aren't publicly visible to all, but visible only to those on the target list) to discourage them from voting altogether.

It also appears that Facebook staff offered to assist both the Clinton and the Trump campaigns with Facebook advertising. The former declined the offer while the latter accepted.

We don't know which of the above decisions and actions impacted the outcome of the 2016 US presidential elections, or to what extent each of them did. But one thing is certain: collectively they had a significant enough impact for Zuckerberg and team to acknowledge that these are serious problems they need to address, NOW!

Deconstructing Zuckerberg's 'Preparing for Elections'

Before diving into what is problematic about the measures that are taken (or not taken) by Facebook, I must commend them for taking ownership of their role in election interference in the past and for attempting to rectify the wrongs. I like that Zuckerberg has made himself vulnerable by sharing his corrective plans with the public while they are a work in progress, and that he is engaging with the public at a personal level. Facebook's openness to academic research using anonymized Facebook data, and its willingness to permit publishing findings without Facebook's approval, is also noteworthy. Other initiatives such as the political ad transparency report, the AI-enabled fake account and fake news reduction strategy, doubling the content moderator base, and improving their recommendation algorithms are all steps in the right direction.

However, this is where my list of nice things to say ends. The overall tone of Zuckerberg's post is that of bargaining rather than acceptance. Interestingly, this was exactly the tone adopted by Sandberg in the Senate hearing earlier this month as well, down to some very similar phrases. This makes one question if everything isn't just one well-orchestrated PR disaster management plan. Disappointingly, most of the actions stated in Zuckerberg's post feel like half-measures; I get the sense that they aren't willing to go the full distance to achieve the objectives they set for themselves. I hope to be wrong.

1. Zuckerberg focuses too much on the 'what' and 'how', and ignores the 'why'

Zuckerberg identifies three key issues he wants to address in 2018: preventing election interference, protecting the community from abuse, and providing users with better control over their information. This clarity is a good starting point. In this post, he only focuses on the first issue, so I will reserve my detailed thoughts on the other two for now.
What I would say for now is that the key to addressing all issues on Facebook is taking a hard look at Facebook's policies, including privacy, from a mission statement perspective. In other words, be honest about why Facebook exists. Users are annoyed, advertisers are not satisfied, and neither are shareholders confident about Facebook's future. Trying to be everyone's friend is clearly not working for Facebook. As such, I expected this to be the opening part of the series. 'Be better for our community and the world over the long term' is too vague a mission statement to be of any practical use.

2. The political ad transparency report is necessary, but not sufficient

In May this year, Facebook released its first political ad transparency report as a gesture to show its commitment to minimizing political interference. The report allows one to see who sponsored which issue advertisement and for how much. This was a move welcomed by everyone, and soon others like Twitter and Google followed suit. By doing this, Facebook hopes to allow its users to form more informed views about political causes and other issues.

Here is my problem with this feature. (Yes, I do view this report as a 'feature' of the new Facebook app, one which serves a very specific need: to satisfy regulators and the media.) The average Facebook user is not a politically or technologically savvy consumer. They use Facebook to connect with friends and family, and maybe play silly games now and then. The majority of these users aren't going to proactively check out this ad transparency report or the political ad database to arrive at the right conclusions. The people who will find this report interesting are academic researchers, campaign managers, and analysts. It is one more rich data point for understanding campaign strategy and thereby inferring who the target audience is. This could most likely lead to a downward spiral of more and more polarizing ads from parties across the spectrum.

3. How election campaigning, hate speech, and real violence are linked but unacknowledged

Another issue closely tied to political ads is hate speech and violence-inciting, polarising content that isn't necessarily paid advertising. This is typically content in the form of posts, images, or videos posted as a response to political ads or discourses. These act as carriers that amplify the political message, often in ways unintended by the campaigners themselves. The echo chambers still exist. And the more one's ecosystem or 'look-alike audience' responds to certain types of ads or posts, the more likely users are to keep seeing them, thanks to Facebook's algorithms. Seeing something that is endorsed by one's friends often primes one to trust what is said without verifying the facts for oneself, thus enabling fake news to go viral. The algorithm does the rest to ensure everyone who will engage with the content sees it. Newsy political ads will thrive in such a setup while Facebook gets away with saying 'we made full disclosure in our report'. All of this is great for Facebook's platform, as it not only gets great engagement from the content but also increased ad spending from all political parties, who can't afford to be missing in action on Facebook. A by-product of this ultra-polarised scenario, though, is more protectionism and less free, open, and meaningful dialog and debate between candidates as well as supporters on the platform. That's bad news for the democratic process.
4. Facebook's election interference prevention model is not scalable

Their single-minded focus on eliminating US election interference on Facebook's platforms through a multipronged approach to content moderation is worth appreciating. This also makes one optimistic about Facebook's role in consciously attempting to do the right thing when it comes to respecting election processes in other nations as well. But the current approach of creating an 'election war room' is neither scalable nor sustainable. What happens every time a US constituency holds an election, or some other part of the world does? What happens when multiple elections take place across the world simultaneously? Whom does Facebook prioritize for election interference defense support, and why? Also, I wouldn't go so far as to trust that they will uphold individual liberties in troubled nations with strong regimes or strongly divisive political discourses. What happens when the ruling party is the one interfering with the elections? Who is Facebook answerable to?

5. Facebook's headcount hasn't kept up with its own growth ambitions

Zuckerberg proudly states in his post that they've deleted a billion fake accounts with machine learning and have doubled the number of people hired to work on safety and security:

"With advances in machine learning, we have now built systems that block millions of fake accounts every day. In total, we removed more than one billion fake accounts -- the vast majority within minutes of being created and before they could do any harm -- in the six months between October and March. ...it is still very difficult to identify the most sophisticated actors who build their networks manually one fake account at a time. This is why we've also hired a lot more people to work on safety and security -- up from 10,000 last year to more than 20,000 people this year."

'People working on safety and security' could cover a wide range of job responsibilities, from network security engineers to security guards hired at Facebook offices. What is conspicuously missing in the above picture is a breakdown of the number of people hired specifically to fact-check, moderate content, resolve policy-related disputes, and review flagged content. With billions of users posting on Facebook, the job of content moderators and policy enforcers, even when assisted by algorithms, is massive. It is important that they are rightly incentivized to do their job well and are set clear and measurable goals. The post talks neither about how Facebook plans to reward moderators nor about what the yardsticks for performance in this area would be. Facebook fails to acknowledge that it is not fully prepared, partly because it is understaffed.

6. The new 'Product Policy Director, Human Rights' role is a glorified public relations job

The weekend following Zuckerberg's post, a new job opening appeared on Facebook's careers page for the position of 'Product policy director, human rights' (the snippet of responsibilities discussed below is taken from that job posting; source: Facebook careers). What is listed there is typically what a public relations head does as well. Not only are the responsibilities heavily based on communication and public-perception building, there's not much given in terms of authority for this role to influence how other teams achieve their goals. Simply put, this role 'works with, coordinates, or advises teams'; it does not 'guide or direct teams'.
Also, another key point to observe is that this role aims to add another layer of distance to further minimize exposure for Zuckerberg, Sandberg, and other key top executives in public forums such as congressional hearings or press meets. Any role or area that is important to a business typically finds a place at the C-suite table. Had this new role been a C-suite role, it would have been advertised as such, and it might have had some teeth. Of the 24 key executives at Facebook, only one is concerned with privacy and policy: the 'Chief Privacy Officer & VP of U.S. Public Policy'. Even this role does not have a global directive or public welfare in mind. On the other hand, there are multiple product development, creative, and business development roles in Facebook's C-suite. There is even a separate Watch product head, a messaging product head, and one dedicated just to China, called 'Head of Creative Shop - Greater China'.

This is why Facebook's plan to protect elections will fail

I am afraid Facebook's greatest strength is also its Achilles' heel. The tech industry's deified hacker culture is embodied perfectly by Facebook. Facebook's flawed ad-revenue-based business model is the ingenious creation of that very hacker culture. Any attempt to correct everything else is futile without correcting the issues with the current model. The ad revenue model is why the Facebook app is designed the way it is: with 'relevant' news feeds, filter bubbles, and look-alike audience segmentation. It is the reason why viral content gets rewarded irrespective of its authenticity or the impact it has on society. It is also the reason why Facebook has a 'move fast and break things' internal culture where growth at all costs is favored and idolized.

Facebook's Q2 2018 earnings summary highlights the above points succinctly (source: Facebook's SEC filing). That snapshot means that even if we assume all 30k-odd employees do some form of content moderation (the probability of which is zero), every employee is responsible for 50k users' content daily. Let's say every user only posts one post a day. If we assume Facebook's news feed algorithms are super efficient and only find 2% of the user content questionable/fake (as speculated by Sandberg in her Senate hearing this month), that would still mean nearly 1k posts per person to review every day!
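To make that back-of-the-envelope arithmetic explicit (the roughly 1.47 billion daily active users and 30,000-odd employees are the Q2 2018 figures from the earnings snapshot referenced above):

$$\frac{1.47 \times 10^9 \ \text{daily active users}}{\approx 30{,}000 \ \text{employees}} \approx 50{,}000 \ \text{users per employee}$$

$$50{,}000 \ \text{posts/day} \times 2\% = 1{,}000 \ \text{posts to review per employee per day}$$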
What can Facebook do to turn over a new leaf?

Unless Facebook attempts to sincerely address at least some of the below, I will continue to be skeptical of any number of beautifully written posts by Zuckerberg or patriotically orated speeches by Sandberg:

- A content moderation transparency report that shares not just the number of posts moderated and the number of people working to moderate content on Facebook, but also the nature of the content moderated, the moderators' job satisfaction levels, their tenure, qualifications, career aspirations, and challenges, and how much Facebook is investing in people, processes, and technology to make its platform safe and objective for everyone to engage with others.
- A general ad transparency report that not only lists advertisers on Facebook but also their spending and chosen ad filters, for the public and academia to review or analyze at any time.
- Taking responsibility for the real-world consequences of actions enabled by Facebook, like the recent gender and age discrimination in employment ads shown on Facebook.
- Really banning hate speech and fake viral content.
- Bringing in a business/AI ethics head who answers only to Zuckerberg and is equal to Sandberg's COO role.
- Exploring and experimenting with alternative revenue channels to tackle the problems of the current ad-driven business model.
- Resolving the UI problem so that users can regain control over their data and can easily choose not to participate in Facebook's data experiments. This would mean a potential loss of some ad revenue.
- Fixing the 'growth hacker' culture problem that is a byproduct of years of moving fast and breaking things. This would mean a significant change in behavior by everyone, starting from the top, and probably restructuring the way teams are organized and business is done. It would also mean a different definition and measurement of success, which could lead to shareholder backlash. But Mark is uniquely placed to withstand these pressures, given his clout over the board's voting power.

Like his role model Augustus Caesar, Zuckerberg has a chance to make history. But he might have to put the company through hard and sacrificing times in exchange for the proverbial 200 years of world peace. He's got the best minds and limitless resources at his disposal to right what he and his platform wronged. But he would have to turn against the hands that feed him. Will he rise to the challenge? Like Augustus, who is rumored to have killed his grandson, will Zuckerberg ever be prepared to kill his ad-revenue-generating brainchild?

In the meantime, we must not underestimate the power of good digital citizenry. We must continue to fight the good fight to move tech giants like Facebook in the right direction. Just as persistently trickling water droplets can erode mountains and create new pathways, so can our mindful actions as digital platform users prompt major tech reforms. It could be as bold as deleting one's Facebook account (I haven't been on the platform for years now, and I don't miss it at all). You could organize groups to create awareness of topics like digital privacy, fake news, and filter bubbles, or deliberately choose to engage with those whose views differ from yours to understand their perspective, and thereby do your part in reversing algorithmically accentuated polarity. It could also be by selecting the right individuals to engage in informed dialog with tech conglomerates. Not every action needs to be hard, though. It could be as simple as customizing your default privacy settings, choosing to spend only a select amount of time on such platforms, or deciding to verify the authenticity and assess the toxicity of a post you wish to like, share, or forward to your network.

Addendum: How far did Facebook go to become a mobile-first company?

Following are some of the things Facebook did to become the largest mobile advertising platform in the world, surpassing Google by a huge margin:

- Clear purpose and reason for the change: "For one, there are more mobile users. Second, they're spending more time on it... third, we can have better advertising on mobile, make more money," said Zuckerberg at TechCrunch Disrupt back in 2012 on why they were becoming mobile-first. In other words, there was a lot of growth and revenue potential in investing in this space. This was a simple and clear 'what's in it for me' incentive for everyone working to make the transition, as well as for stockholders and advertisers to place their trust in Zuckerberg's endeavors.
- Setting company-wide accountability: "We realigned the company around, so everybody was responsible for mobile," said the then President of Business and Marketing Partnerships, David Fischer, to Fortune in 2013.
- Willing to sacrifice desktop for mobile: Facebook made a bold gamble, risking its desktop users to grow its unproven mobile platform. Essentially, it was willing to bet its only cash cow on a dark horse whose success depended on many other factors going right.
- Strict consequences for non-compliance: Back in the days of transitioning to a mobile-first company, Zuckerberg famously told his product teams that when they came in for reviews: "Come in with mobile. If you come in and try to show me a desktop product, I'm going to kick you out. You have to come in and show me a mobile product."
- Expanding resources and investing in reskilling: They grew from a team of 20 mobile engineers to literally all engineers at Facebook undergoing training courses on iOS and Android development. "We've completely changed the way we do product development. We've trained all our engineers to do mobile first," said Facebook's VP of corporate development, Vaughan Smith, to TechCrunch by the end of 2012.
- Realigning product design philosophy: They designed custom features for the mobile-first interface instead of trying to adapt web features to mobile. In other words, they began with mobile as their default user interface.
- Local and global user behavior sensitization: Some of their engineering teams even did field visits to developing nations like the Philippines to see first-hand how mobile apps were being used there.
- Environmental considerations in app design: Facebook even had the foresight to consider scenarios where mobile users may not have a quality internet signal or may face battery-life issues, and designed its apps with these needs in mind.


Understanding Deep Reinforcement Learning by understanding the Markov Decision Process [Tutorial]

Savia Lobo
24 Sep 2018
10 min read
This article is an excerpt taken from the book Hands-On Intelligent Agents with OpenAI Gym, written by Praveen Palanisamy. In this article, the author introduces us to the Markov Decision Process, followed by an understanding of deep reinforcement learning.

A Markov Decision Process (MDP) provides a formal framework for reinforcement learning. It is used to describe a fully observable environment where the outcomes are partly random and partly dependent on the actions taken by the agent or the decision maker. A Markov Process progresses into a Markov Decision Process through the Markov Reward Process. These stages can be described as follows:

- A Markov Process (or a Markov chain) is a sequence of random states s1, s2, ... that obeys the Markov property. In simple terms, it is a random process without any memory of its history.
- A Markov Reward Process (MRP) is a Markov Process (also called a Markov chain) with values.
- A Markov Decision Process is a Markov Reward Process with decisions.

Dynamic programming with the Markov Decision Process

Dynamic programming is a very general method for efficiently solving problems that can be decomposed into overlapping sub-problems. If you have used any type of recursive function in your code, you might have already got some preliminary flavor of dynamic programming. Dynamic programming, in simple terms, tries to cache or store the results of sub-problems so that they can be used later if required, instead of computing the results again. Okay, so how is that relevant here, you may ask. Well, dynamic programming is pretty useful for solving a fully defined MDP: an agent that has full knowledge of the MDP can use it to find the most optimal way to act in an environment and achieve the highest reward. The following summarizes the inputs and outputs when we are interested in sequential prediction or control:

- Prediction: the input is an MDP (or MRP) and a policy π; the output is the value function Vπ.
- Control: the input is an MDP; the output is the optimal value function V* and the optimal policy π*.

Monte Carlo learning and temporal difference learning

At this point, we understand that it is very useful for an agent to learn the state-value function Vπ(s), which informs the agent about the long-term value of being in state s, so that the agent can decide if it is a good state to be in or not. The Monte Carlo (MC) and temporal difference (TD) learning methods enable an agent to learn that! The goal of MC and TD learning is to learn the value function from the agent's experience as the agent follows its policy π. The following summarizes the value estimate's update equation for the MC and TD learning methods:

- Monte Carlo: V(S_t) ← V(S_t) + α(G_t − V(S_t))
- Temporal Difference (TD(0)): V(S_t) ← V(S_t) + α(R_{t+1} + γV(S_{t+1}) − V(S_t))

MC learning updates the value towards the actual return G_t = R_{t+1} + γR_{t+2} + ... + γ^(T−t−1)R_T, which is the total discounted reward from time step t until the end of the sequence. It is important to note that we can calculate this value only after the end of the sequence, whereas TD learning (TD(0), to be precise) updates the value towards the estimated return R_{t+1} + γV(S_{t+1}), which can be calculated after every step.
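To make the difference concrete, here is a minimal TD(0) sketch in plain Python (my illustration, not from the book), estimating state values on the classic five-state random walk under a uniformly random policy; the toy environment, the starting state, and the constants are assumptions chosen for illustration:

```python
import random

# Toy episodic environment: states 0..4 on a line, starting at state 2.
# Stepping right from state 4 terminates with reward 1;
# stepping left from state 0 terminates with reward 0.
def step(state, action):               # action is -1 (left) or +1 (right)
    next_state = state + action
    if next_state < 0:
        return None, 0.0               # terminal, reward 0
    if next_state > 4:
        return None, 1.0               # terminal, reward 1
    return next_state, 0.0

alpha, gamma = 0.1, 1.0
V = [0.0] * 5                          # value estimates V(s)

for episode in range(5000):
    s = 2
    while s is not None:
        a = random.choice([-1, 1])     # uniformly random policy
        s_next, r = step(s, a)
        v_next = 0.0 if s_next is None else V[s_next]
        # TD(0): move V(s) towards the estimated return r + gamma * V(s')
        # after every single step, without waiting for the episode to end
        V[s] += alpha * (r + gamma * v_next - V[s])
        s = s_next

print([round(v, 2) for v in V])        # approaches roughly [0.17, 0.33, 0.5, 0.67, 0.83]
```

An MC learner would instead record the whole episode and only then nudge each visited state's value towards the actual return it observed.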
SARSA and Q-learning

It is also very useful for an agent to learn the action-value function Qπ(s, a), which informs the agent about the long-term value of taking action a in state s, so that the agent can take those actions that will maximize its expected, discounted future reward. The SARSA and Q-learning algorithms enable an agent to learn that! The following summarizes the update equation for the SARSA algorithm and the Q-learning algorithm:

- SARSA: Q(S, A) ← Q(S, A) + α(R + γQ(S', A') − Q(S, A))
- Q-learning: Q(S, A) ← Q(S, A) + α(R + γ max_a' Q(S', a') − Q(S, A))

SARSA is so named because of the sequence State -> Action -> Reward -> State' -> Action' that the algorithm's update step depends on. The description of the sequence goes like this: the agent, in state S, takes an action A and gets a reward R, and ends up in the next state S', after which the agent decides to take an action A' in the new state. Based on this experience, the agent can update its estimate of Q(S, A). Q-learning is a popular off-policy learning algorithm, and it is similar to SARSA, except for one thing: instead of using the Q value estimate for the new state and the action that the agent took in that new state, it uses the Q value estimate that corresponds to the action that leads to the maximum obtainable Q value from that new state, S'.
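Here is a short tabular Q-learning sketch (my illustration, not from the book), assuming a hypothetical environment object env with the classic Gym-style API: env.reset() returns a state index and env.step(a) returns (next_state, reward, done, info). Swapping the max in the target for the Q value of the action actually chosen in S' would turn this into SARSA:

```python
import random

def q_learning(env, n_states, n_actions, episodes=1000,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    # Tabular action-value estimates Q(s, a), initialized to zero
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy behavior policy: mostly greedy, sometimes random
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s_next, r, done, _ = env.step(a)
            # Off-policy target: bootstrap from the best action available
            # in S', regardless of what the behavior policy does next
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q
```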
Deep reinforcement learning

With a basic understanding of reinforcement learning, you are now in a better state (hopefully you are not in a strictly Markov state where you have forgotten the history/things you have learned so far) to understand the basics of the cool new suite of algorithms that have been rocking the field of AI in recent times. Deep reinforcement learning emerged naturally when people made advancements in the deep learning field and applied them to reinforcement learning.

We learned about the state-value function, the action-value function, and the policy. Let's briefly look at how they can be represented mathematically or realized through computer code. The state-value function V(s) is a real-valued function that takes the current state s as input and outputs a real number (such as 4.57). This number is the agent's prediction of how good it is to be in state s, and the agent keeps updating the value function based on the new experiences it gains. Likewise, the action-value function Q(s, a) is also a real-valued function, which takes an action a as input in addition to the state s, and outputs a real number. One way to represent these functions is using neural networks, because neural networks are universal function approximators, capable of representing complex, non-linear functions. For an agent trying to play a game of Atari by just looking at the images on the screen (like we do), the state could be the pixel values of the image on the screen. In such cases, we could use a deep neural network with convolutional layers to extract the visual features from the state/image, and then a few fully connected layers to finally output V(s) or Q(s, a), depending on which function we want to approximate. Recall from the earlier sections of this chapter that V(s) is the state-value function and provides an estimate of the value of being in state s, and Q(s, a) is the action-value function, which provides an estimate of the value of each action a given the state s. If we do this, then we are doing deep reinforcement learning! Easy enough to understand? I hope so.

Let's look at some other ways in which we can use deep learning in reinforcement learning. Recall that a policy is represented as a = μ(s) in the case of deterministic policies, and as π(a|s) in the case of stochastic policies, where actions could be discrete (such as "move left," "move right," or "move straight ahead") or continuous values (such as "0.05" for acceleration, "0.67" for steering, and so on), and they can be single- or multi-dimensional. Therefore, a policy can be a complicated function at times! It might have to take in a multi-dimensional state (such as an image) as input and output a multi-dimensional vector of probabilities (in the case of stochastic policies). So, this does look like it will be a monster function, doesn't it? Yes, it does. That's where deep neural networks come to the rescue! We could approximate an agent's policy using a deep neural network and directly learn to update the policy (by updating the parameters of the deep neural network). This is called policy optimization-based deep reinforcement learning, and it has been shown to be quite efficient in solving several challenging control problems, especially in robotics.

So, in summary, deep reinforcement learning is the application of deep learning to reinforcement learning, and so far researchers have applied deep learning to reinforcement learning successfully in two ways. One way is to use deep neural networks to approximate the value functions, and the other is to use a deep neural network to represent the policy. These ideas have been known from the early days, when researchers were trying to use neural networks as value function approximators, even back in 2005. But they rose to stardom only recently because, although neural networks and other non-linear value function approximators can better represent the complex values of environment states and actions, they were prone to instability and often led to sub-optimal functions. Only recently have researchers such as Volodymyr Mnih and his colleagues at DeepMind (now part of Google) figured out the trick of stabilizing the learning and trained agents, with deep non-linear function approximators, that converged to near-optimal value functions. In the later chapters of this book, we will, in fact, reproduce some of their then-groundbreaking results, which surpassed human Atari game playing capabilities!

Practical applications of reinforcement and deep reinforcement learning algorithms

Until recently, practical applications of reinforcement learning and deep reinforcement learning were limited, due to sample complexity and instability. But these algorithms have proved to be quite powerful in solving some really hard practical problems. Some of them are listed here to give you an idea:

- Learning to play video games better than humans: This news has probably reached you by now. Researchers at DeepMind and elsewhere developed a series of algorithms, starting with DeepMind's Deep Q-Network, or DQN for short, which reached human-level performance in playing Atari games. We will actually be implementing this algorithm in a later chapter of this book! In essence, it is a deep variant of the Q-learning algorithm we briefly saw in this chapter, with a few changes that increased the speed of learning and the stability. It was able to reach human-level performance in terms of game scores after several games. What is more impressive is that the same algorithm achieved this level of play without any game-specific fine-tuning or changes!
- Mastering the game of Go: Go is a Chinese game that has challenged AI for several decades. It is played on a full-size 19 x 19 board and is orders of magnitude more complex than chess because of the enormous number (roughly 10^170) of possible board positions. Until recently, no AI algorithm or software was able to play anywhere close to the level of humans at this game. AlphaGo, the AI agent from DeepMind that uses deep reinforcement learning and Monte Carlo tree search, changed all this and beat the world champion Lee Sedol (4-1) and the European champion Fan Hui (5-0).
  DeepMind has since released more advanced versions of its AI agent, named AlphaGo Zero (which uses zero human knowledge and learned to play all by itself!) and AlphaZero (which can play the games of Go, chess, and shogi!), all of which used deep reinforcement learning as the core algorithm.
- Helping AI win Jeopardy!: IBM's Watson, the AI system that came to fame by beating humans at Jeopardy!, used an extension of TD learning to create its daily-double wagering strategies, which helped it win against human champions.
- Robot locomotion and manipulation: Both reinforcement learning and deep reinforcement learning have enabled the control of complex robots, both for locomotion and navigation. Several recent works from researchers at UC Berkeley have shown how, using deep reinforcement learning, they can train policies that provide vision and control for robotic manipulation tasks, and generate joint actuations to make a complex bipedal humanoid walk and run.

Summary

To summarize, in this article we learned about the Markov Decision Process, deep reinforcement learning, and its applications. If you've enjoyed this post, head over to the book Hands-On Intelligent Agents with OpenAI Gym to implement learning algorithms for machine software agents that solve discrete or continuous sequential decision-making and control tasks, and much more.

- Budget and Demand Forecasting using Markov model in SAS [Tutorial]
- Implement Reinforcement learning using Markov Decision Process [Tutorial]
- What are generative adversarial networks (GANs) and how do they work? [Video]


Performing Sentiment Analysis with R on Obama's State of the Union speeches [Tutorial]

Sugandha Lahoti
23 Sep 2018
16 min read
For this article, we will take a look at former President Obama's State of the Union speeches and perform sentiment analysis on them with R. The two main analytical goals are to build topic models on the seven State of the Union speeches and then compare the first speech in 2010 and the last in January 2016 on sentence-based textual measures, such as sentiment and dispersion. This tutorial is taken from the book Mastering Machine Learning with R - Second Edition by Cory Lesmeister. In this book, you will master machine learning techniques with R to deliver insights in complex projects.

Preparing our data and performing text transformations

The primary package that we will use is tm, the text mining package. We will also need SnowballC for the stemming of the words, RColorBrewer for the color palettes in wordclouds, and the wordcloud package. Please ensure that you have these packages installed before attempting to load them:

```
> library(tm)
> library(wordcloud)
> library(RColorBrewer)
```

The data files are available for download at https://github.com/datameister66/data. Please ensure you put the text files into a separate directory because they will all go into our corpus for analysis. Download the seven .txt files, for example, sou2012.txt, into your working R directory. You can identify your current working directory and set it with these functions:

```
> getwd()
> setwd(".../data")
```

We can now begin to create the corpus by first creating an object with the path to the speeches and then seeing how many files are in this directory and what they are named:

```
> name <- file.path(".../text")
> length(dir(name))
[1] 7
> dir(name)
[1] "sou2010.txt" "sou2011.txt" "sou2012.txt" "sou2013.txt"
[5] "sou2014.txt" "sou2015.txt" "sou2016.txt"
```

We will name our corpus docs and create it with the Corpus() function, wrapped around the directory source function, DirSource(), which is also part of the tm package:

```
> docs <- Corpus(DirSource(name))
> docs
<<VCorpus>>
Metadata: corpus specific: 0, document level (indexed): 0
Content: documents: 7
```

Note that there is no corpus or document level metadata. There are functions in the tm package to apply things such as authors' names and timestamp information, among others, at both the document and corpus level. We will not utilize this for our purposes. We can now begin the text transformations using the tm_map() function from the tm package. These will be the transformations that we discussed previously: lowercase letters, remove numbers, remove punctuation, remove stop words, strip out the whitespace, and stem the words:

```
> docs <- tm_map(docs, tolower)
> docs <- tm_map(docs, removeNumbers)
> docs <- tm_map(docs, removePunctuation)
> docs <- tm_map(docs, removeWords, stopwords("english"))
> docs <- tm_map(docs, stripWhitespace)
```

At this point, it is a good idea to eliminate unnecessary words. For example, during the speeches, when Congress applauds a statement, you will find (Applause) in the text. This must be removed:

```
> docs <- tm_map(docs, removeWords, c("applause", "can", "cant", "will",
    "that", "weve", "dont", "wont", "youll", "youre"))
```

After completing the transformations and removal of other words, make sure that your documents are plain text, put them in a document-term matrix, and check the dimensions:

```
> docs <- tm_map(docs, PlainTextDocument)
> dtm <- DocumentTermMatrix(docs)
> dim(dtm)
[1] 7 4738
```

The seven speeches contain 4,738 distinct words. It is optional, but one can remove the sparse terms with the removeSparseTerms() function.
You will need to specify a number between zero and one, where the higher the number, the higher the percentage of sparsity allowed in the matrix. Sparsity is the relative frequency of a term in the documents. So, if your sparsity threshold is 0.75, only terms with sparsity greater than 0.75 are removed. For us, that would be (1 - 0.75) * 7, which is equal to 1.75. Therefore, any term in fewer than two documents would be removed:

```
> dtm <- removeSparseTerms(dtm, 0.75)
> dim(dtm)
[1] 7 2254
```

As we don't have the metadata on the documents, it is important to name the rows of the matrix so that we know which document is which:

```
> rownames(dtm) <- c("2010", "2011", "2012", "2013", "2014", "2015", "2016")
```

Using the inspect() function, you can examine the matrix. Here, we will look at the seven rows and the first five columns:

```
> inspect(dtm[1:7, 1:5])
      Terms
Docs   abandon ability able abroad absolutely
  2010       0       1    1      2          2
  2011       1       0    4      3          0
  2012       0       0    3      1          1
  2013       0       3    3      2          1
  2014       0       0    1      4          0
  2015       1       0    1      1          0
  2016       0       0    1      0          0
```

It appears that our data is ready for analysis, starting with looking at the word frequency counts. Let me point out that the output demonstrates why I've been trained to not favor wholesale stemming. You may be thinking that 'ability' and 'able' could be combined. If you stemmed the document, you would end up with 'abl'. How does that help the analysis? I think you lose context, at least in the initial analysis. Again, I recommend applying stemming thoughtfully and judiciously.

Data modeling and evaluation

Modeling will be broken into two distinct parts. The first will focus on word frequency and correlation and culminate in the building of a topic model. In the next portion, we will examine many different quantitative techniques by utilizing the power of the qdap package in order to compare two different speeches.

Word frequency and topic models

As we have everything set up in the document-term matrix, we can move on to exploring word frequencies by creating an object with the column sums, sorted in descending order. It is necessary to use as.matrix() in the code to sum the columns. The default order is ascending, so putting - in front of freq will change it to descending:

```
> freq <- colSums(as.matrix(dtm))
> ord <- order(-freq)
```

We will examine the head and tail of the object with the following code:

```
> freq[head(ord)]
    new america  people    jobs     now   years
    193     174     168     163     157     148
> freq[tail(ord)]
    wright    written    yearold   youngest youngstown       zero
         2          2          2          2          2          2
```

The most frequent word is new and, as you might expect, the president mentions america frequently. Also, notice how important employment is, with the frequency of jobs. I find it interesting that he mentions Youngstown, as in Youngstown, OH, a couple of times. To look at the frequency of the word frequencies, you can create tables, as follows:

```
> head(table(freq))
freq
  2   3   4   5   6   7
596 354 230 141 137  89
> tail(table(freq))
freq
148 157 163 168 174 193
  1   1   1   1   1   1
```

What these tables show is the number of words with that specific frequency. So 354 words occurred three times, and one word, new in our case, occurred 193 times. Using findFreqTerms(), we can see which words occurred at least 125 times:

```
> findFreqTerms(dtm, 125)
 [1] "america"   "american"  "americans" "jobs"  "make"  "new"
 [7] "now"       "people"    "work"      "year"  "years"
```

You can find associations with words by correlation with the findAssocs() function.
Let's look at jobs as an example, using 0.85 as the correlation cutoff:

```
> findAssocs(dtm, "jobs", corlimit = 0.85)
$jobs
colleges    serve   market shouldnt  defense      put      tax     came
    0.97     0.91     0.89     0.88     0.87     0.87     0.87     0.86
```

For visual portrayal, we can produce wordclouds and a bar chart. We will do two wordclouds to show the different ways to produce them: one with a minimum frequency and the other by specifying the maximum number of words to include. The first one, with a minimum frequency, also includes code to specify the color. The scale syntax determines the minimum and maximum word size by frequency; in this case, the minimum frequency is 70:

```
> wordcloud(names(freq), freq, min.freq = 70, scale = c(3, .5),
    colors = brewer.pal(6, "Dark2"))
```

One can forgo all the fancy graphics, as we will in the next wordcloud, capturing the 25 most frequent words:

```
> wordcloud(names(freq), freq, max.words = 25)
```

To produce a bar chart, the code can get a bit complicated, whether you use base R, ggplot2, or lattice. The following code will show you how to produce a bar chart for the 10 most frequent words in base R:

```
> freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
> wf <- data.frame(word = names(freq), freq = freq)
> wf <- wf[1:10, ]
> barplot(wf$freq, names = wf$word, main = "Word Frequency",
    xlab = "Words", ylab = "Counts", ylim = c(0, 250))
```

We will now move on to building topic models using the topicmodels package, which offers the LDA() function. The question now is how many topics to create. It seems logical to solve for three topics (k = 3). Certainly, I encourage you to try other numbers of topics:

```
> library(topicmodels)
> set.seed(123)
> lda3 <- LDA(dtm, k = 3, method = "Gibbs")
> topics(lda3)
2010 2011 2012 2013 2014 2015 2016
   2    1    1    1    3    3    2
```

We can see an interesting transition over time. The first and last addresses have the same topic grouping, almost as if he opened and closed his tenure with the same themes. Using the terms() function produces a list of an ordered word frequency for each topic. The number of words to list is specified in the function, so let's look at the top 25 per topic:

```
> terms(lda3, 25)
      Topic 1      Topic 2      Topic 3
 [1,] "jobs"       "people"     "america"
 [2,] "now"        "one"        "new"
 [3,] "get"        "work"       "every"
 [4,] "tonight"    "just"       "years"
 [5,] "last"       "year"       "like"
 [6,] "energy"     "know"       "make"
 [7,] "tax"        "economy"    "time"
 [8,] "right"      "americans"  "need"
 [9,] "also"       "businesses" "american"
[10,] "government" "even"       "world"
[11,] "home"       "give"       "help"
[12,] "well"       "many"       "lets"
[13,] "american"   "security"   "want"
[14,] "two"        "better"     "states"
[15,] "congress"   "come"       "first"
[16,] "country"    "still"      "country"
[17,] "reform"     "workers"    "together"
[18,] "must"       "change"     "keep"
[19,] "deficit"    "take"       "back"
[20,] "support"    "health"     "americans"
[21,] "business"   "care"       "way"
[22,] "education"  "families"   "hard"
[23,] "companies"  "made"       "today"
[24,] "million"    "future"     "working"
[25,] "nation"     "small"      "good"
```

Topic 2 covers the first and last speeches. Nothing really stands out as compelling in that topic compared to the others. It will be interesting to see how the next analysis can yield insights into those speeches. Topic 1 covers the next three speeches. Here, the message transitions to "jobs", "energy", "reform", and the "deficit", not to mention the comments about "education" and, as we saw above, the correlation of "jobs" and "colleges".
Topic 3 brings us to the next two speeches. The focus seems to really shift to the economy and business, with mentions of "security" and healthcare. In the next section, we can dig into the exact speech content further, along with comparing and contrasting the first and last State of the Union addresses.

Additional quantitative analysis

This portion of the analysis will focus on the power of the qdap package. It allows you to compare multiple documents over a wide array of measures. Our effort will be on comparing the 2010 and 2016 speeches. For starters, we will need to turn the text into data frames, perform sentence splitting, and then combine them into one data frame with a variable created that specifies the year of the speech. We will use this as our grouping variable in the analyses. Dealing with text data, even in R, can be tricky. The code that follows seemed to work best, in this case, to get the data loaded and ready for analysis. We first load the qdap package. Then, to bring in the data from a text file, we will use the readLines() function from base R, collapsing the results to eliminate unnecessary whitespace. I also recommend converting your text encoding to ASCII, otherwise you may run into some bizarre text that will mess up your analysis. That is done with the iconv() function:

```
> library(qdap)
> speech16 <- paste(readLines("sou2016.txt"), collapse = " ")
Warning message:
In readLines("sou2016.txt") : incomplete final line found on 'sou2016.txt'
> speech16 <- iconv(speech16, "latin1", "ASCII", "")
```

The warning message is not an issue, as it is just telling us that the final line of text is not the same length as the other lines in the .txt file. We now apply the qprep() function from qdap. This function is a wrapper for a number of other replacement functions, and using it will speed up pre-processing, but it should be used with caution if a more detailed analysis is required. The functions it passes through are as follows:

- bracketX(): applies bracket removal
- replace_abbreviation(): replaces abbreviations
- replace_number(): converts numbers to words, for example '100' becomes 'one hundred'
- replace_symbol(): converts symbols to words, for example @ becomes 'at'

```
> prep16 <- qprep(speech16)
```

The other pre-processing we should do is to replace contractions (can't to cannot), remove stopwords (in our case, the top 100), and remove unwanted characters, with the exception of periods and question marks. They will come in handy shortly:

```
> prep16 <- replace_contraction(prep16)
> prep16 <- rm_stopwords(prep16, Top100Words, separate = F)
> prep16 <- strip(prep16, char.keep = c("?", "."))
```

Critical to this analysis is to now split the text into sentences and add what will be the grouping variable, the year of the speech. This also creates the tot variable, which stands for Turn of Talk, serving as an indicator of sentence order.
This is especially helpful in a situation where you are analyzing dialogue, say in a debate or a question and answer session:

```
> sent16 <- data.frame(speech = prep16)
> sent16 <- sentSplit(sent16, "speech")
> sent16$year <- "2016"
```

Repeat the steps for the 2010 speech:

```
> speech10 <- paste(readLines("sou2010.txt"), collapse = " ")
> speech10 <- iconv(speech10, "latin1", "ASCII", "")
> speech10 <- gsub("(Applause.)", "", speech10)
> prep10 <- qprep(speech10)
> prep10 <- replace_contraction(prep10)
> prep10 <- rm_stopwords(prep10, Top100Words, separate = F)
> prep10 <- strip(prep10, char.keep = c("?", "."))
> sent10 <- data.frame(speech = prep10)
> sent10 <- sentSplit(sent10, "speech")
> sent10$year <- "2010"
```

Concatenate the separate years into one dataframe:

```
> sentences <- data.frame(rbind(sent10, sent16))
```

One of the great things about the qdap package is that it facilitates basic text exploration, as we did before. Let's see a plot of frequent terms:

```
> plot(freq_terms(sentences$speech))
```

You can create a word frequency matrix that provides the counts for each word by speech:

```
> wordMat <- wfm(sentences$speech, sentences$year)
> head(wordMat[order(wordMat[, 1], wordMat[, 2], decreasing = TRUE), ])
          2010 2016
our        120   85
us          33   33
year        29   17
americans   28   15
why         27   10
jobs        23    8
```

This can also be converted into a document-term matrix with the function as.dtm(), should you so desire. Let's next build wordclouds by year, with qdap functionality:

```
> trans_cloud(sentences$speech, sentences$year, min.freq = 10)
```

Comprehensive word statistics are also available. Here is a plot of the stats available in the package. The plot loses some of its visual appeal with just two speeches but is revealing nonetheless. A complete explanation of the stats is available under ?word_stats:

```
> ws <- word_stats(sentences$speech, sentences$year, rm.incomplete = T)
> plot(ws, label = T, lab.digits = 2)
```

Notice that the 2016 speech was much shorter, with over a hundred fewer sentences and almost a thousand fewer words. Also, there seems to be the use of asking questions as a rhetorical device in 2016 versus 2010 (n.quest 10 versus n.quest 4). To compare the polarity (sentiment scores), use the polarity() function, specifying the text and grouping variables:

```
> pol <- polarity(sentences$speech, sentences$year)
> pol
  year total.sentences total.words ave.polarity sd.polarity stan.mean.polarity
1 2010             435        3900        0.052       0.432              0.121
2 2016             299        2982        0.105       0.395              0.267
```

The stan.mean.polarity value represents the standardized mean polarity, which is the average polarity divided by the standard deviation. We see that 2016 was slightly higher (0.267) than 2010 (0.121). This is in line with what we would expect, wanting to end on a more positive note. You can also plot the data. The plot produces two charts: the first shows the polarity by sentence over time, and the second shows the distribution of the polarity:

```
> plot(pol)
```

This plot may be a challenge to read, but let me do my best to interpret it. The 2010 speech starts out with a strong negative sentiment and is slightly more negative than 2016.
We can identify the most negative sentiment sentence by creating a dataframe of the pol object, finding the sentence number, and producing it:

```
> pol.df <- pol$all
> which.min(pol.df$polarity)
[1] 12
> pol.df$text.var[12]
[1] "One year ago, I took office amid two wars, an economy rocked by a severe
recession, a financial system on the verge of collapse, and a government
deeply in debt."
```

Now that is negative sentiment! Ironically, the government is even more in debt today. We will look at the readability index next:

```
> ari <- automated_readability_index(sentences$speech, sentences$year)
> ari$Readability
  year word.count sentence.count character.count Automated_Readability_Index
1 2010       3900            435           23859                     11.86709
2 2016       2982            299           17957                     11.91929
```

I think it is no surprise that they are basically the same. Formality analysis is next. This takes a couple of minutes to run in R:

```
> form <- formality(sentences$speech, sentences$year)
> form
  year word.count formality
1 2016       2983     65.61
2 2010       3900     63.88
```

This also looks very similar. We can examine the proportions of the parts of speech. A plot is available, but it adds nothing to the analysis in this instance:

```
> form$form.prop.by
  year word.count  noun   adj prep articles pronoun  verb adverb interj other
1 2010       3900 44.18 15.95 3.67        0    4.51 23.49   7.77   0.05  0.38
2 2016       2982 43.46 17.37 4.49        0    4.96 21.73   7.41   0.00  0.57
```

Now, the diversity measures are produced. Again, they are nearly identical. A plot is also available (plot(div)), but being so similar, it once again adds no value. It is important to note that Obama's speechwriter for 2010 was Jon Favreau, and in 2016 it was Cody Keenan:

```
> div <- diversity(sentences$speech, sentences$year)
> div
  year   wc simpson shannon collision berger_parker brillouin
1 2010 3900   0.998   6.825     5.970         0.031     6.326
2 2015 2982   0.998   6.824     6.008         0.029     6.248
```

One of my favorite plots is the dispersion plot. This shows the dispersion of a word throughout the text. Let's examine the dispersion of "security", "jobs", and "economy":

```
> dispersion_plot(sentences$speech, rm.vars = sentences$year,
    c("security", "jobs", "economy"), color = "black", bg.color = "white")
```

This completes our analysis of the two speeches. The analysis showed that, although the speeches had a similar style, the core messages changed over time as the political landscape changed. This extract is taken from the book Mastering Machine Learning with R - Second Edition. Read the book to learn more advanced prediction, algorithms, and learning methods with R.

- Understanding Sentiment Analysis and other key NLP concepts
- Twitter Sentiment Analysis
- Sentiment Analysis of the 2017 US elections on Twitter


Build your first neural network with PyTorch [Tutorial]

Sugandha Lahoti
22 Sep 2018
14 min read
Understanding the basic building blocks of a neural network, such as tensors, tensor operations, and gradient descent, is important for building complex neural networks. In this article, we will build our first Hello world program in PyTorch. This tutorial is taken from the book Deep Learning with PyTorch. In this book, you will build neural network models in text, vision, and advanced analytics using PyTorch.

Let's assume that we work for one of the largest online companies, Wondermovies, which serves videos on demand. Our training dataset contains a feature that represents the average hours spent by users watching movies on the platform, and we would like to predict how much time each user will spend on the platform in the coming week. It's just an imaginary use case; don't think too much about it. Some of the high-level activities for building such a solution are as follows:

- Data preparation: The get_data function prepares the tensors (arrays) containing input and output data
- Creating learnable parameters: The get_weights function provides us with tensors containing random values that we will optimize to solve our problem
- Network model: The simple_network function produces the output for the input data, applying a linear rule, multiplying weights with input data, and adding the bias term (y = Wx + b)
- Loss: The loss_fn function provides information about how good the model is
- Optimizer: The optimize function helps us in adjusting the random weights created initially so that the model computes the target values more accurately

Let's consider the following linear regression equation for our neural network: y = Wx + b. Let's write our first neural network in PyTorch:

```
x, y = get_data()    # x - represents training data, y - represents target variables

w, b = get_weights() # w, b - learnable parameters

for i in range(500):
    y_pred = simple_network(x)  # function which computes wx + b
    loss = loss_fn(y, y_pred)   # calculates sum of the squared differences of y and y_pred
    if i % 50 == 0:
        print(loss)
    optimize(learning_rate)     # adjust w, b to minimize the loss
```
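The excerpt uses these five helper functions without defining them. As a rough sketch of what they could look like (a minimal stand-in, not the book's exact implementation: the toy data values, shapes, learning rate, and the use of module-level w and b are my assumptions), written against the Variable-era PyTorch 0.x API that this excerpt uses:

```python
import torch
from torch.autograd import Variable

learning_rate = 1e-4

def get_data():
    # Toy stand-in data: average hours watched -> hours expected next week
    train_x = torch.FloatTensor([3.3, 4.4, 5.5, 6.7, 6.9, 4.1, 9.8, 6.2])
    train_y = torch.FloatTensor([1.7, 2.8, 2.1, 3.2, 1.7, 1.6, 3.4, 2.6])
    x = Variable(train_x.view(-1, 1))            # shape: (8, 1)
    y = Variable(train_y)
    return x, y

def get_weights():
    # Randomly initialized learnable parameters; requires_grad=True
    # tells autograd to track gradients for them
    w = Variable(torch.randn(1), requires_grad=True)
    b = Variable(torch.randn(1), requires_grad=True)
    return w, b

def simple_network(x):
    # The linear rule y = wx + b (w and b are the module-level
    # variables returned by get_weights in the training script)
    return torch.matmul(x, w) + b

def loss_fn(y, y_pred):
    # Sum of squared errors; backward() fills in w.grad and b.grad
    loss = (y_pred - y).pow(2).sum()
    for param in [w, b]:
        if param.grad is not None:
            param.grad.data.zero_()              # clear stale gradients
    loss.backward()
    return loss.data[0]

def optimize(learning_rate):
    # One step of plain gradient descent on the tracked gradients
    w.data -= learning_rate * w.grad.data
    b.data -= learning_rate * b.grad.data
```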
Torch provides a utility function called from_numpy(), which converts a numpy array into a torch tensor. The shape of the resulting tensor is 506 rows x 13 columns:

from sklearn.datasets import load_boston
boston = load_boston()

boston_tensor = torch.from_numpy(boston.data)
boston_tensor.size()

Output: torch.Size([506, 13])

boston_tensor[:2]

Output:
Columns 0 to 7
 0.0063  18.0000   2.3100   0.0000   0.5380   6.5750  65.2000   4.0900
 0.0273   0.0000   7.0700   0.0000   0.4690   6.4210  78.9000   4.9671
Columns 8 to 12
 1.0000 296.0000  15.3000 396.9000   4.9800
 2.0000 242.0000  17.8000 396.9000   9.1400
[torch.DoubleTensor of size 2x13]

3-D tensors

When we add multiple matrices together, we get a 3-D tensor. 3-D tensors are used to represent data such as images. Images can be represented as numbers in a matrix, which are stacked together. An example of an image shape is 224, 224, 3, where the first index represents height, the second represents width, and the third represents a channel (RGB). Let's see how a computer sees a panda, using the next code snippet:

import numpy as np
import torch
import matplotlib.pyplot as plt
from PIL import Image

# Read a panda image from disk using a library called PIL and convert it to a numpy array
panda = np.array(Image.open('panda.jpg').resize((224,224)))
panda_tensor = torch.from_numpy(panda)
panda_tensor.size()

Output - torch.Size([224, 224, 3])

# Display panda
plt.imshow(panda)

Since displaying the tensor of size 224, 224, 3 would occupy a couple of pages in the book, we will display the image and learn to slice the image into smaller tensors to visualize it.

Slicing tensors

A common thing to do with a tensor is to slice a portion of it. A simple example could be choosing the first five elements of a one-dimensional tensor; let's call the tensor sales. We use a simple notation, sales[:slice_index], where slice_index represents the index up to which you want to slice the tensor:

sales = torch.FloatTensor([1000.0,323.2,333.4,444.5,1000.0,323.2,333.4,444.5])

sales[:5]
 1000.0000
  323.2000
  333.4000
  444.5000
 1000.0000
[torch.FloatTensor of size 5]

sales[:-5]
 1000.0000
  323.2000
  333.4000
[torch.FloatTensor of size 3]

Let's do more interesting things with our panda image, such as seeing what the panda image looks like when only one channel is chosen, and seeing how to select the face of the panda. Here, we select only one channel from the panda image:

plt.imshow(panda_tensor[:,:,0].numpy()) # 0 represents the first channel of RGB

Now, let's crop the image. Say we want to build a face detector for pandas and we need just the face of a panda for that. We crop the tensor image such that it contains only the panda's face:

plt.imshow(panda_tensor[25:175,60:130,0].numpy())

Another common example would be where you need to pick a specific element of a tensor:

# torch.eye(shape) produces a diagonal matrix with 1 as its diagonal elements
sales = torch.eye(3,3)
sales[0,1]

Output - 0.0

Most of the PyTorch tensor operations are very similar to NumPy operations.

4-D tensors

One common example of four-dimensional tensor types is a batch of images. Modern CPUs and GPUs are optimized to perform the same operations on multiple examples faster, so they take a similar time to process one image or a batch of images. It is therefore common to use a batch of examples rather than a single image at a time. Choosing the batch size is not straightforward; it depends on several factors. One major restriction on using a bigger batch or the complete dataset is GPU memory limitations—16, 32, and 64 are commonly used batch sizes.
Let's look at an example where we load a batch of cat images of size 64 x 224 x 224 x 3, where 64 represents the batch size or the number of images, 224 represents height and width, and 3 represents channels:

from glob import glob

# Read cat images from disk; data_path points at a folder of cat images
cats = glob(data_path+'*.jpg')
# Convert images into numpy arrays
cat_imgs = np.array([np.array(Image.open(cat).resize((224,224))) for cat in cats[:64]])
cat_imgs = cat_imgs.reshape(-1,224,224,3)
cat_tensors = torch.from_numpy(cat_imgs)
cat_tensors.size()

Output - torch.Size([64, 224, 224, 3])

Tensors on GPU

We have learned how to represent different forms of data in a tensor representation. Some of the common operations we perform once we have data in the form of tensors are addition, subtraction, multiplication, dot product, and matrix multiplication. All of these operations can be performed on either the CPU or the GPU. PyTorch provides a simple function called cuda() to copy a tensor on the CPU to the GPU. We will take a look at some of the operations and compare the performance of matrix multiplication on the CPU and GPU.

Tensor addition can be obtained by using the following code:

# Various ways you can perform tensor addition
a = torch.rand(2,2)
b = torch.rand(2,2)
c = a + b
d = torch.add(a,b)
# For in-place addition
a.add_(5)

# Multiplication of different tensors
a*b
a.mul(b)
# For in-place multiplication
a.mul_(b)

For tensor matrix multiplication, let's compare the code performance on CPU and GPU. Any tensor can be moved to the GPU by calling the .cuda() function. Multiplication on the GPU runs as follows:

a = torch.rand(10000,10000)
b = torch.rand(10000,10000)

a.matmul(b)
Time taken: 3.23 s

# Move the tensors to GPU
a = a.cuda()
b = b.cuda()

a.matmul(b)
Time taken: 11.2 µs

These fundamental operations of addition, subtraction, and matrix multiplication can be used to build complex operations, such as a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN).

Variables

Deep learning algorithms are often represented as computation graphs. Each circle in such a computation graph represents a variable. A variable forms a thin wrapper around a tensor object, its gradients, and a reference to the function that created it. The gradients refer to the rate of change of the loss function with respect to the various parameters (w, b). For example, if the gradient of a is 2, then any change in the value of a would modify the value of Y by two times. If that is not clear, do not worry—most deep learning frameworks take care of calculating gradients for us. In this part, we learn how to use these gradients to improve the performance of our model. Apart from gradients, a variable also has a reference to the function that created it, which in turn tells us how each variable was created. For example, the variable a has the information that it was generated as a result of the product between X and W.

Let's look at an example where we create variables and check the gradients and the function reference:

from torch.autograd import Variable

x = Variable(torch.ones(2,2),requires_grad=True)
y = x.mean()
y.backward()

x.grad
Variable containing:
 0.2500  0.2500
 0.2500  0.2500
[torch.FloatTensor of size 2x2]

x.grad_fn
Output - None

x.data
 1  1
 1  1
[torch.FloatTensor of size 2x2]

y.grad_fn
<torch.autograd.function.MeanBackward at 0x7f6ee5cfc4f8>

In the preceding example, we called a backward operation on the variable to compute the gradients.
By default, the gradients of the variables are None. The grad_fn of a variable points to the function that created it. If the variable was created by a user, like the variable x in our case, then the function reference is None. In the case of variable y, grad_fn refers to its function reference, MeanBackward. The data attribute accesses the tensor associated with the variable.

Creating data for our neural network

The get_data function in our first neural network code creates two variables, x and y, of sizes (17, 1) and (17). We will take a look at what happens inside the function:

def get_data():
    train_X = np.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
                          7.042,10.791,5.313,7.997,5.654,9.27,3.1])
    train_Y = np.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
                          2.827,3.465,1.65,2.904,2.42,2.94,1.3])
    dtype = torch.FloatTensor
    X = Variable(torch.from_numpy(train_X).type(dtype),requires_grad=False).view(17,1)
    y = Variable(torch.from_numpy(train_Y).type(dtype),requires_grad=False)
    return X,y

Creating learnable parameters

In our neural network example, we have two learnable parameters, w and b, and two fixed parameters, x and y. We have created variables x and y in our get_data function. Learnable parameters are created using random initialization and have the requires_grad parameter set to True, unlike x and y, where it is set to False. Let's take a look at our get_weights function:

def get_weights():
    w = Variable(torch.randn(1),requires_grad = True)
    b = Variable(torch.randn(1),requires_grad = True)
    return w,b

Most of the preceding code is self-explanatory; torch.randn creates a random tensor of any given shape.

Neural network model

Once we have defined the inputs and outputs of the model using PyTorch variables, we have to build a model which learns how to map the outputs from the inputs. In traditional programming, we build a function by hand-coding different logic to map the inputs to the outputs. However, in deep learning and machine learning, we learn the function by showing it the inputs and the associated outputs. In our example, we implement a simple neural network which tries to map the inputs to outputs, assuming a linear relationship. The linear relationship can be represented as y = wx + b, where w and b are learnable parameters. Our network has to learn the values of w and b, so that wx + b will be closer to the actual y. If we visualize our training dataset and the model that our neural network has to learn, it is a straight line fitted on the input data points: the dark-gray (blue) line represents the model that our network learns.

Network implementation

As we have all the parameters (x, w, b, and y) required to implement the network, we perform a matrix multiplication between w and x. Then, we sum the result with b. That will give our predicted y. The function is implemented as follows:

def simple_network(x):
    y_pred = torch.matmul(x,w)+b
    return y_pred

PyTorch also provides a higher-level abstraction in torch.nn called layers, which takes care of most of the underlying initialization and the operations associated with the most common techniques in neural networks. We are using the lower-level operations here to understand what happens inside these functions. The previous model, which maps one input feature to one output, can be represented as a torch.nn layer, as follows:

f = nn.Linear(1,1) # Much simpler.
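To make the equivalence concrete, here is a minimal sketch (our own, not from the book) showing that an nn.Linear layer computes the same y = wx + b mapping as the hand-written simple_network; the random input below is a stand-in for our 17 training samples:

# A hedged illustration: nn.Linear versus the manual wx + b (PyTorch 0.3-style API)
import torch
import torch.nn as nn
from torch.autograd import Variable

x = Variable(torch.randn(17, 1))  # 17 samples, 1 feature each (stand-in data)
f = nn.Linear(1, 1)               # 1 input feature -> 1 output
y_pred = f(x)                     # internally computes x @ W^T + b
print(y_pred.size())              # torch.Size([17, 1])
print(f.weight, f.bias)           # the layer's own learnable w and b

The layer owns its weight and bias as learnable parameters, so we no longer need to create them by hand with get_weights.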
Now that we have calculated the y values, we need to know how good our model is, which is done in the loss function.

Loss function

As we start with random values, our learnable parameters, w and b, will result in y_pred, which will not be anywhere close to the actual y. So, we need to define a function which tells the model how close its predictions are to the actual values. Since this is a regression problem, we use a loss function called the sum of squared error (SSE). We take the difference between the predicted y and the actual y and square it. SSE helps the model to understand how close the predicted values are to the actual values. The torch.nn library has different loss functions, such as MSELoss and cross-entropy loss. However, for this chapter, let's implement the loss function ourselves:

def loss_fn(y,y_pred):
    loss = (y_pred-y).pow(2).sum()
    for param in [w,b]:
        if not param.grad is None:
            param.grad.data.zero_()
    loss.backward()
    return loss.data[0]

Apart from calculating the loss, we also call the backward operation, which calculates the gradients of our learnable parameters, w and b. As we will use the loss function more than once, we remove any previously calculated gradients by calling the grad.data.zero_() operation. The first time we call the backward function, the gradients are empty, so we zero the gradients only when they are not None.

Optimize the neural network

We started with random weights to predict our targets and calculated the loss for our algorithm. We calculate the gradients by calling the backward function on the final loss variable. This entire process repeats for one epoch, that is, for the entire set of examples. In most real-world examples, we will do the optimization step per iteration, which is a small subset of the total set. Once the loss is calculated, we optimize the values with the calculated gradients so that the loss reduces, which is implemented in the following function:

def optimize(learning_rate):
    w.data -= learning_rate * w.grad.data
    b.data -= learning_rate * b.grad.data

The learning rate is a hyperparameter, which allows us to adjust the values in the variables by a small amount of the gradients, where the gradients denote the direction in which each variable (w and b) needs to be adjusted. Different optimizers, such as Adam, RMSprop, and SGD, are already implemented for use in the torch.optim package.
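As a quick illustration (our own sketch, not part of the book excerpt), the manual optimize function above could be replaced with torch.optim.SGD; w and b are the same learnable variables returned by get_weights:

# A hedged sketch: replacing the manual update with torch.optim.SGD
import torch.optim as optim

optimizer = optim.SGD([w, b], lr=1e-4)  # register the learnable parameters
for i in range(500):
    optimizer.zero_grad()               # clears old gradients instead of zeroing them by hand
    y_pred = simple_network(x)
    loss = (y_pred - y).pow(2).sum()
    loss.backward()                     # computes gradients of w and b
    optimizer.step()                    # applies w.data -= lr * w.grad.data, and likewise for b

The optimizer encapsulates both the gradient zeroing and the update rule, which is why the hand-rolled loss_fn above had to do its own grad.data.zero_() bookkeeping.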
The final network architecture is a model for learning to predict average hours spent by users on our Wondermovies platform. Next, to learn about PyTorch's built-in modules for building network architectures, read our book Deep Learning with PyTorch.

Can a production ready PyTorch 1.0 give TensorFlow a tough time?
PyTorch 0.3.0 releases, ending stochastic functions
Is Facebook-backed PyTorch better than Google's TensorFlow?

Enhancing Markov's Decision Process with Bellman Equation [Tutorial]

Sugandha Lahoti
21 Sep 2018
14 min read
Reinforcement learning, one of the foundations of machine learning, supposes learning through trial and error by interacting with an environment. Reinforcement learning often uses the Markov Decision Process (MDP). MDP contains a memoryless and unlabeled action-reward equation with a learning parameter. This equation, the Bellman equation (often coined as the Q function), was used to beat world-class Atari gamers. In this article, we are going to tackle the Markov Decision Process (Q function) and apply it to reinforcement learning with the Bellman equation. This tutorial is taken from the book Artificial Intelligence By Example by Denis Rothman. In this book, you will develop machine intelligence from scratch using real artificial intelligence use cases.

Step 1 – Markov Decision Process in natural language

Step 1 of any artificial intelligence problem is to transpose it into something you know in your everyday life (work or personal). Let's say you are an e-commerce business driver delivering a package in an area you do not know. You are the operator of a self-driving vehicle. You have a GPS system with a beautiful color map on it. The areas around you are represented by the letters A to F, as shown in the simplified map in the following diagram. You are presently at F. Your goal is to reach area C. You are happy, listening to the radio. Everything is going smoothly, and it looks like you are going to be there on time. The following graph represents the locations and routes that you can possibly cover.

The guiding system's state indicates the complete path to reach C. It is telling you that you are going to go from F to B to D and then to C. It looks good! To break things down further, let's say:

The present state is the letter s.
Your next action is the letter a (action). This action a is not location A.
The next action a (not location A) is to go to location B. You look at your guiding system; it tells you there is no traffic, and that to go from your present state F to your next state B will take you only a few minutes.
Let's say that the next state B is the letter B.

At this point, you are still quite happy, and we can sum up your situation with the following sequence of events:

The letter s is your present state, your present situation.
The letter a is the action you're deciding on, which is to go to the next area; there you will be in another state, s'.

We can say that thanks to the action a, you will go from s to s'. Now, imagine that the driver is not you anymore. You are tired for some reason. That is when a self-driving vehicle comes in handy. You set your car to autopilot. Now you are not driving anymore; the system is. Let's call that system the agent. At point F, you set your car to autopilot and let the self-driving agent take over. The agent now sees what you have asked it to do and checks its mapping environment, which represents all the areas in the previous diagram from A to F.

In the meantime, you are rightly worried. Is the agent going to make it or not? You are wondering whether its strategy meets yours. You have your policy P—your way of thinking—which is to take the shortest paths possible. Will the agent agree? What's going on in its mind? You observe and begin to realize things you never noticed before. Since this is the first time you are using this car and guiding system, the agent is memoryless, which is an MDP feature. This means the agent just doesn't know anything about what went on before. It seems to be happy with just calculating from this state s at area F.
It will use machine power to run as many calculations as necessary to reach its goal. Another thing you are watching is the total distance from F to C, to check whether things are OK. That means that the agent is calculating all the states from F to C. In this case, state F is state 1, which we can simplify by writing s1. B is state 2, which we can simplify by writing s2. D is s3 and C is s4. The agent is calculating all of these possible states to make a decision.

The agent knows that when it reaches D, C will be better because the reward will be higher for going to C than anywhere else. Since it cannot eat a piece of cake to reward itself, the agent uses numbers. Our agent is a real number cruncher. When it is wrong, it gets a poor reward or nothing in this model. When it's right, it gets a reward represented by the letter R. This action-value (reward) transition, often named the Q function, is the core of many reinforcement learning algorithms. When our agent goes from one state to another, it performs a transition and gets a reward. For example, the transition can be from F to B, state 1 to state 2, or s1 to s2.

You are feeling great and are going to be on time. You are beginning to understand how the machine learning agent in your self-driving car is thinking. Suddenly, your guiding system breaks down. All you can see on the screen is a static image of the areas from the last calculation. You look up and see that a traffic jam is building up. Area D is still far away, and now you do not know whether it would be good to go from D to C, or from D to E to get a taxi that can take special lanes. You are going to need your agent! The agent takes the traffic jam into account, is stubborn, and increases its reward to get to C by the shortest way. Its policy is to stick to the initial plan. You do not agree. You have another policy. You stop the car. You both have to agree before continuing. You have your opinion and policy; the agent does not agree. Before continuing, your views need to converge. Convergence is the key to making sure that your calculations are correct. This is the kind of problem that people delivering parcels, and soon self-driving delivery vehicles (not to speak of drone air jams), encounter all day long to get the workload done. The number of parcels to deliver per hour is an example of the workload that needs to be taken into account when making a decision. To represent the problem at this point, the best way is to express this whole process mathematically.

Step 2 – the mathematical representation of the Bellman equation and MDP

Mathematics involves a whole change in your perspective on a problem. You are going from words to functions, the pillars of source coding. The goal here is to pick up enough mathematics to implement a solution in real-life companies. It is necessary to think a problem through by finding something familiar around us, such as the delivery itinerary example covered before. It is a good thing to write it down with some abstract letters and symbols as described before, with a meaning an action and s meaning a state. Once you have understood the problem and expressed the parameters in a way you are used to, you can proceed further.

From MDP to the Bellman equation

In the previous step 1, the agent went from F, or state 1, or s, to B, which was state 2, or s'. To do that, there was a strategy—a policy represented by P.
All of this can be shown in one mathematical expression, the MDP state transition function:

Pa(s,s') = Pr(st+1 = s' | st = s, at = a)

P is the policy, the strategy made by the agent to go from F to B through action a. When going from F to B, this state transition is called the state transition function, where:

a is the action
s is state 1 (F), and s' is state 2 (B)

This is the basis of MDP. The reward (right or wrong) is represented in the same way, as Ra(s,s'). That means R is the reward for the action of going from state s to state s'. Going from one state to another will be a random process. This means that, potentially, all states can go to another state. The example we will be working on inputs a reward matrix so that the program can choose its best course of action. Then, the agent will go from state to state, learning the best trajectories for every possible starting location point. The goal of the MDP is to go to C (line 3, column 3 in the reward matrix), which has a starting value of 100 in the following Python code:

# Markov Decision Process (MDP) - The Bellman equations adapted to
# Reinforcement Learning
# R is The Reward Matrix for each state
R = ql.matrix([ [0,0,0,0,1,0],
                [0,0,0,1,0,1],
                [0,0,100,1,0,0],
                [0,1,1,0,1,0],
                [1,0,0,1,0,0],
                [0,1,0,0,0,0] ])

Each line in the matrix in the example represents a letter from A to F, and each column represents a letter from A to F. All possible states are represented. The 1 values represent the nodes (vertices) of the graph. Those are the possible locations. For example, line 1 represents the possible moves for letter A, line 2 for letter B, and line 6 for letter F. On the first line, A cannot go to C directly, so a 0 value is entered. But it can go to E, so a 1 value is added. Some models start with -1 for impossible choices, such as B going directly to C, and 0 values to define the locations. This model starts with 0 and 1 values. It sometimes takes weeks to design functions that will create a reward matrix.

To sum it up, we have three tools:

Pa(s,s'): A policy, P, or strategy to move from one state to another
Ta(s,s'): A T, or stochastic (random) transition, function to carry out that action
Ra(s,s'): An R, or reward, for that action, which can be negative, null, or positive

T is the transition function, which makes the agent decide to go from one point to another with a policy. In this case, it will be random. That's what machine power is for, and that's how reinforcement learning is often implemented. Randomness is a property of MDP. The following code describes the choice the agent is going to make:

next_action = int(ql.random.choice(PossibleAction,1))
return next_action

Once the code has been run, a new random action (state) has been chosen. The Bellman equation is the road to programming reinforcement learning. Bellman's equation completes the MDP. To calculate the value of a state, let's use Q, for the Q action-reward (or value) function. The pre-source code of Bellman's equation can be expressed as follows for one individual state:

Q(s,a) = R(s,a) + γ · max(Q(s'))

The source code then translates the equation into a machine representation, as in the following code:

# The Bellman equation
Q[current_state, action] = R[current_state, action] + gamma * MaxValue

The source code variables of the Bellman equation are as follows:

Q(s): This is the value calculated for this state—the total reward. In step 1, when the agent went from F to B, the driver had to be happy. Maybe she/he had a crunch of a candy bar to feel good, which is the human counterpart of the reward matrix.
The automatic driver maybe ate (reward matrix) some electricity, renewable energy of course! The reward is a number such as 50 or 100 to show the agent that it's on the right track. It's like when a student gets a good grade in an exam.

R(s): This is the sum of the values up to there. It's the total reward at that point.

γ = gamma: This is here to remind us that trial and error has a price. We're wasting time, money, and energy. Furthermore, we don't even know whether the next step is right or wrong, since we're in a trial-and-error mode. Gamma is often set to 0.8. What does that mean? Suppose you're taking an exam. You study and study, but you don't really know the outcome. You might have 80 out of 100 (0.8) chances of clearing it. That's painful, but that's life. This is what makes Bellman's equation and MDP realistic and efficient.

max(s'): s' is one of the possible states that can be reached with Pa(s,s'); max is the highest value on the line of that state (the location line in the reward matrix).

Step 3 – implementing the solution in Python

In step 1, a problem was described in natural language to be able to talk to experts and understand what was expected. In step 2, an essential mathematical bridge was built between natural language and source coding. Step 3 is the software implementation phase. The code is a reinforcement learning program using the Q function with the following reward matrix:

import numpy as ql

R = ql.matrix([ [0,0,0,0,1,0],
                [0,0,0,1,0,1],
                [0,0,100,1,0,0],
                [0,1,1,0,1,0],
                [1,0,0,1,0,0],
                [0,1,0,0,0,0] ])

Q = ql.matrix(ql.zeros([6,6]))

gamma = 0.8

R is the reward matrix described in the mathematical analysis. Q inherits the same structure as R, but all values are set to 0 since this is a learning matrix. It will progressively contain the results of the decision process. The gamma variable is a double reminder that the system is learning and that its decisions have only an 80% chance of being correct each time. As the following code shows, the system explores the possible actions during the process:

agent_s_state = 1

# The possible "a" actions when the agent is in a given state
def possible_actions(state):
    current_state_row = R[state,]
    possible_act = ql.where(current_state_row > 0)[1]
    return possible_act

# Get available actions in the current state
PossibleAction = possible_actions(agent_s_state)

The agent starts in state 1, for example. You can start wherever you want because it's a random process. Note that only values > 0 are taken into account. They represent the possible moves (decisions). The current state goes through an analysis process to find possible actions (next possible states). You will note that there is no algorithm in the traditional sense, with many rules. It's a pure random calculation, as the following random.choice function shows.
def ActionChoice(available_actions_range):
    next_action = int(ql.random.choice(PossibleAction,1))
    return next_action

# Sample next action to be performed
action = ActionChoice(PossibleAction)

Now comes the core of the system, containing Bellman's equation, translated into the following source code:

def reward(current_state, action, gamma):
    Max_State = ql.where(Q[action,] == ql.max(Q[action,]))[1]
    if Max_State.shape[0] > 1:
        Max_State = int(ql.random.choice(Max_State, size = 1))
    else:
        Max_State = int(Max_State)
    MaxValue = Q[action, Max_State]
    # Q function
    Q[current_state, action] = R[current_state, action] + gamma * MaxValue

# Rewarding Q matrix
reward(agent_s_state,action,gamma)

You can see that the agent looks for the maximum value of the next possible state chosen at random. The best way to understand this is to run the program in your Python environment and print() the intermediate values. I suggest that you open a spreadsheet and note the values. It will give you a clear view of the process. The last part is simply about running the learning process 50,000 times, just to be sure that the system learns everything there is to find. During each iteration, the agent will detect its present state, choose a course of action, and update the Q function matrix:

for i in range(50000):
    current_state = ql.random.randint(0, int(Q.shape[0]))
    PossibleAction = possible_actions(current_state)
    action = ActionChoice(PossibleAction)
    reward(current_state,action,gamma)

# Displaying Q before the norm of Q phase
print("Q :")
print(Q)

# Norm of Q
print("Normed Q :")
print(Q/ql.max(Q)*100)

After the process is repeated, and once the learning process is over, the program will print the result in Q and the normed result. The normed result is obtained by dividing all values by the maximum value found, expressed as a percentage.
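For convenience, here is a self-contained sketch that stitches the fragments above into one runnable script. The logic follows the book's fragments; only the packaging into a single file (and making ActionChoice use its own argument) is ours:

# A hedged, consolidated version of the Q-learning fragments above
import numpy as ql

# R is the reward matrix for each state
R = ql.matrix([[0,0,0,0,1,0],
               [0,0,0,1,0,1],
               [0,0,100,1,0,0],
               [0,1,1,0,1,0],
               [1,0,0,1,0,0],
               [0,1,0,0,0,0]])

Q = ql.matrix(ql.zeros([6,6]))   # the learning matrix, filled in by the Bellman equation
gamma = 0.8                      # the learning parameter

def possible_actions(state):
    # The possible "a" actions when the agent is in a given state
    return ql.where(R[state,] > 0)[1]

def ActionChoice(available_actions_range):
    # Pure random choice among the possible moves
    return int(ql.random.choice(available_actions_range, 1))

def reward(current_state, action, gamma):
    # The Bellman equation: Q(s,a) = R(s,a) + gamma * max(Q(s'))
    Max_State = ql.where(Q[action,] == ql.max(Q[action,]))[1]
    Max_State = int(ql.random.choice(Max_State, size=1)) if Max_State.shape[0] > 1 else int(Max_State)
    Q[current_state, action] = R[current_state, action] + gamma * Q[action, Max_State]

for i in range(50000):
    current_state = ql.random.randint(0, int(Q.shape[0]))
    action = ActionChoice(possible_actions(current_state))
    reward(current_state, action, gamma)

print("Normed Q :")
print(Q / ql.max(Q) * 100)       # the best trajectories emerge as the highest percentages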
View the Python program at https://github.com/PacktPublishing/Artificial-Intelligence-By-Example/blob/master/Chapter01/MDP.py. In this article, we talked about MDP, a stochastic random action-reward (value) system enhanced by Bellman's equation, as an effective solution provider for many AI problems in corporate environments. Next, to discover how to create the reward matrix in the first place, through explanations and source code, read our book Artificial Intelligence By Example.

How Reinforcement Learning works
Convolutional Neural Networks with Reinforcement Learning
Implement Reinforcement learning using Markov Decision Process [Tutorial]

Build a Neural Network to recognize handwritten numbers in Keras and MNIST

Fatema Patrawala
20 Sep 2018
8 min read
A neural network is made up of many artificial neurons. Is it a representation of the brain, or is it a mathematical representation of some knowledge? Here, we will simply try to understand how a neural network is used in practice. A convolutional neural network (CNN) is a very special kind of multi-layer neural network, designed to recognize visual patterns directly from images with minimal processing. The field of neural networks was originally inspired by the goal of modeling biological neural systems, but it has since branched in different directions and become a matter of engineering and attaining good results in machine learning tasks. In this article, we will look at the building blocks of neural networks and build a neural network in Keras that will recognize handwritten numbers from 0-9 using MNIST. This article is an excerpt taken from the book Practical Convolutional Neural Networks, written by Mohit Sewak, Md Rezaul Karim and Pradeep Pujari and published by Packt Publishing.

An artificial neuron is a function that takes an input and produces an output. The number of neurons that are used depends on the task at hand. It could be as low as two or as many as several thousand. There are numerous ways of connecting artificial neurons together to create a CNN. One such topology that is commonly used is known as a feed-forward network: each neuron receives inputs from other neurons, and the effect of each input line on the neuron is controlled by a weight, which can be positive or negative. The entire neural network learns to perform useful computations for recognizing objects. The neurons in each layer feed their output forward to the next layer until we get a final output. For a single neuron this can be written as output = 1 / (1 + e^-(Σ wᵢxᵢ + b)), the sigmoid of the weighted sum of the inputs plus a bias, which is exactly what the following implementation computes:

import numpy as np
import math

class Neuron(object):
    def __init__(self):
        self.weights = np.array([1.0, 2.0])
        self.bias = 0.0
    def forward(self, inputs):
        """ Assuming that inputs and weights are 1-D numpy arrays and the bias is a number """
        a_cell_sum = np.sum(inputs * self.weights) + self.bias
        result = 1.0 / (1.0 + math.exp(-a_cell_sum)) # This is the sigmoid activation function
        return result

neuron = Neuron()
output = neuron.forward(np.array([1,1]))
print(output)

Now that we have understood what the building blocks of neural networks are, let us get to building a neural network that will recognize handwritten numbers from 0-9.

Handwritten number recognition with Keras and MNIST

A typical neural network for a digit recognizer may have 784 input pixels connected to 1,000 neurons in the hidden layer, which in turn connect to 10 output targets, one for each digit. Each layer is fully connected to the layer above, where x are the inputs, h are the hidden neurons, and y are the output class variables. In this notebook, we will build a neural network that will recognize handwritten numbers from 0-9. The type of neural network that we are building is used in a number of real-world applications, such as recognizing phone numbers and sorting postal mail by address. To build this network, we will use the MNIST dataset.
We will begin, as shown in the following code, by importing all the required modules, after which the data will be loaded, and then finally build the network:

# Import Numpy, Keras and MNIST data
import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.utils import np_utils

Retrieving training and test data

The MNIST dataset already comprises both training and test data. There are 60,000 data points of training data and 10,000 points of test data. If you do not have the data file locally at '~/.keras/datasets/', it will be downloaded to that location when the data is first loaded. Each MNIST data point has:

An image of a handwritten digit
A corresponding label, a number from 0-9, to help identify the image

The images will be called X and will be the input to our neural network; their corresponding labels are y. We want our labels as one-hot vectors. One-hot vectors are vectors of many zeros and a single one. It's easiest to see this in an example. The number 0 is represented as [1, 0, 0, 0, 0, 0, 0, 0, 0, 0], and 4 is represented as [0, 0, 0, 0, 1, 0, 0, 0, 0, 0] as a one-hot vector.

Flattened data

We will use flattened data in this example: a representation of MNIST images in one dimension rather than two. Thus, each 28 x 28 pixel image will be represented as a 784-pixel one-dimensional array. By flattening the data, information about the 2D structure of the image is thrown away; however, our data is simplified. With the help of this, all our training data can be contained in one array of shape (60,000, 784), wherein the first dimension represents the number of training images and the second depicts the number of pixels in each image. This kind of data is easy to analyze using a simple neural network, as follows:

# Retrieving the training and test data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

print('X_train shape:', X_train.shape)
print('X_test shape: ', X_test.shape)
print('y_train shape:', y_train.shape)
print('y_test shape: ', y_test.shape)

Visualizing the training data

The following function will help you visualize the MNIST data.
By passing in the index of a training example, the display_digit function will display that training image along with its corresponding label in the title:

# Visualize the data
import matplotlib.pyplot as plt
%matplotlib inline

# Displaying a training image by its index in the MNIST set
def display_digit(index):
    label = y_train[index]   # called before one-hot encoding, so the label is still an integer
    image = X_train[index]
    plt.title('Training data, index: %d, Label: %d' % (index, label))
    plt.imshow(image, cmap='gray_r')
    plt.show()

# Displaying the first (index 0) training image
display_digit(0)

X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print("Train the matrix shape", X_train.shape)
print("Test the matrix shape", X_test.shape)

# One-hot encoding of labels.
from keras.utils.np_utils import to_categorical
print(y_train.shape)
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
print(y_train.shape)

Building the network

For this example, you'll define the following:

The input layer, which you should expect for each piece of MNIST data, as it tells the network the number of inputs
Hidden layers, as they recognize patterns in data and also connect the input layer to the output layer
The output layer, as it defines how the network learns and gives a label as the output for a given image

This is done as follows:

# Defining the neural network
def build_model():
    model = Sequential()
    model.add(Dense(512, input_shape=(784,)))
    model.add(Activation('relu'))
    # An "activation" is just a non-linear function applied to the output
    # of the layer above. In this case, with a "rectified linear unit",
    # we clamp all values below 0 to 0.
    model.add(Dropout(0.2))
    # Dropout helps protect the model from memorizing or "overfitting" the training data
    model.add(Dense(512))
    model.add(Activation('relu'))
    model.add(Dropout(0.2))
    model.add(Dense(10))
    model.add(Activation('softmax'))
    # This special "softmax" activation ensures that the output is a valid
    # probability distribution: the values obtained are all non-negative and sum up to 1.
    return model

# Building the model
model = build_model()
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

Training the network

Now that we've constructed the network, we feed it with data and train it, as follows:

# Training
model.fit(X_train, y_train, batch_size=128, epochs=4, verbose=1,
          validation_data=(X_test, y_test))

Testing

After you're satisfied with the training output and accuracy, you can run the network on the test dataset to measure its performance! A good result will obtain an accuracy higher than 95%. Some simple models have been known to achieve even up to 99.7% accuracy! We can test the model as shown here:

# Comparing the labels predicted by our model with the actual labels
score = model.evaluate(X_test, y_test, batch_size=32, verbose=1, sample_weight=None)
# Printing the result
print('Test score:', score[0])
print('Test accuracy:', score[1])
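To see an individual result rather than an aggregate score, the following short sketch (our addition, not part of the book excerpt) uses Keras's model.predict to classify the first test image and compares it with the true label:

# A hedged extra step: inspecting a single prediction from the trained model
predictions = model.predict(X_test[:1])          # probabilities over the 10 digits
print('Predicted digit:', np.argmax(predictions[0]))
print('Actual digit:', np.argmax(y_test[0]))     # y_test is one-hot encoded, so argmax recovers the label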
To summarize, we got to know the building blocks of neural networks, and we successfully built a neural network that recognizes handwritten numbers using the MNIST dataset in Keras. To implement award-winning and cutting-edge CNN architectures, check out this one-stop guide published by Packt, Practical Convolutional Neural Networks.

Are Recurrent Neural Networks capable of warping time?
Recurrent neural networks and the LSTM architecture
Build a generative chatbot using recurrent neural networks (LSTM RNNs)
Create your first OpenAI Gym environment [Tutorial]

Savia Lobo
19 Sep 2018
7 min read
OpenAI Gym is an open source toolkit that provides a diverse collection of tasks, called environments, with a common interface for developing and testing your intelligent agent algorithms. The toolkit introduces a standard Application Programming Interface (API) for interfacing with environments designed for reinforcement learning. Each environment has a version attached to it, which ensures meaningful comparisons and reproducible results with the evolving algorithms and the environments themselves. This article is an excerpt taken from the book Hands-On Intelligent Agents with OpenAI Gym, written by Praveen Palanisamy. In this article, you will get to know what OpenAI Gym is and its features, and later create your own OpenAI Gym environment.

The Gym toolkit, through its various environments, provides an episodic setting for reinforcement learning, where an agent's experience is broken down into a series of episodes. In each episode, the initial state of the agent is randomly sampled from a distribution, and the interaction between the agent and the environment proceeds until the environment reaches a terminal state. Do not worry if you are not familiar with reinforcement learning.

The OpenAI Gym natively has about 797 environments spread over different categories of tasks. The famous Atari category has the largest share, with about 116 (half with screen inputs and half with RAM inputs) environments! The categories of tasks/environments supported by the toolkit are listed here:

Algorithmic
Atari
Board games
Box2D
Classic control
Doom (unofficial)
Minecraft (unofficial)
MuJoCo
Soccer
Toy text
Robotics (newly added)

Keep in mind that you may need some additional tools and packages installed on your system to run environments in each of these categories. For a detailed overview of each of these categories, head over to the book. With that, you have a very good overview of all the different categories and types of environment that are available as part of the OpenAI Gym toolkit. It is worth noting that the release of the OpenAI Gym toolkit was accompanied by an OpenAI Gym website (gym.openai.com), which maintained a scoreboard for every algorithm that was submitted for evaluation. It showcased the performance of user-submitted algorithms, and some submissions were also accompanied by detailed explanations and source code. Unfortunately, OpenAI decided to withdraw support for the evaluation website, and the service went offline in September 2017.

Now you have a good picture of the various categories of environment available in OpenAI Gym and what each category provides you with. Next, we will look at the key features of OpenAI Gym that make it an indispensable component in many of today's advancements in intelligent agent development, especially those that use reinforcement learning or deep reinforcement learning.

Understanding the features of OpenAI Gym

Here, we will take a look at the key features that have made the OpenAI Gym toolkit very popular in the reinforcement learning community and led to it becoming widely adopted.

Simple environment interface

OpenAI Gym provides a simple and common Python interface to environments.
Specifically, it takes an action as input and provides observation, reward, done, and an optional info object, based on the action, as the output at each step. If this does not make perfect sense to you yet, do not worry. We will go over the interface again in a more detailed manner to help you understand; this paragraph is just to give you an overview of the interface and make it clear how simple it is. It provides great flexibility for users, as they can design and develop their agent algorithms based on any paradigm they like, and not be constrained to a particular one because of this simple and convenient interface.

Comparability and reproducibility

We intuitively feel that we should be able to compare the performance of an agent or an algorithm in a particular task to the performance of another agent or algorithm in the same task. For example, if an agent gets a score of 1,000 on average in the Atari game of Space Invaders, we should be able to tell that this agent is performing worse than an agent that scores 5,000 on average in the Space Invaders game in the same amount of training time. But what happens if the scoring system for the game is slightly changed? Or if the environment interface was modified to include additional information about the game states that would provide an advantage to the second agent? This would make the score-to-score comparison unfair, right?

To handle such changes in the environment, OpenAI Gym uses strict versioning for environments. The toolkit guarantees that if there is any change to an environment, it will be accompanied by a different version number. Therefore, if the original version of the Atari Space Invaders game environment was named SpaceInvaders-v0 and there were some changes made to the environment to provide more information about the game states, then the environment's name would be changed to SpaceInvaders-v1. This simple versioning system makes sure we are always comparing performance measured on the exact same environment setup. This way, the results obtained are comparable and reproducible.

Ability to monitor progress

All the environments available as part of the Gym toolkit are equipped with a monitor. This monitor logs every time step of the simulation and every reset of the environment. What this means is that the environment automatically keeps track of how our agent is learning and adapting with every step. You can even configure the monitor to automatically record videos of the game while your agent is learning to play.

Creating your first OpenAI Gym environment

This section provides a quick way to get started with the OpenAI Gym Python API on Linux and macOS using virtualenv, so that you can get a sneak peek into the Gym! macOS and Ubuntu Linux systems come with Python installed by default. You can check which version of Python is installed by running python --version from a terminal window. If this returns python followed by a version number, then you are good to proceed to the next steps! If you get an error saying the Python command was not found, then you have to install Python.

Install virtualenv:

$pip install virtualenv

If pip is not installed on your system, you can install it by typing sudo easy_install pip.
Create a virtual environment named openai-gym using the virtualenv tool:

$virtualenv openai-gym

Activate the openai-gym virtual environment:

$source openai-gym/bin/activate

Install all the packages for the Gym toolkit from upstream:

$pip install -U gym

If you get permission denied or failed with error code 1 when you run the pip install command, it is most likely because the permissions on the directory you are trying to install the package to (the openai-gym directory inside virtualenv in this case) need special/root privileges. You can either run sudo -H pip install -U gym[all] to solve the issue, or change permissions on the openai-gym directory by running sudo chmod -R o+rw ~/openai-gym.

Test to make sure the installation is successful:

$python -c 'import gym; gym.make("CartPole-v0");'

Creating and visualizing a new Gym environment

In just a minute or two, you have created an instance of an OpenAI Gym environment to get started! Let's open a new Python prompt and import the gym module:

>>> import gym

Once the gym module is imported, we can use the gym.make method to create our new environment like this:

>>> env = gym.make('CartPole-v0')
>>> env.reset()
>>> env.render()

This will bring up a window showing the CartPole environment. Hooray!
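To go one step beyond rendering a static window, the following sketch (our own, not from the book excerpt) runs one full episode with random actions, using the step interface described earlier, which returns the observation, reward, done, and info values:

# A hedged sketch: one episode of CartPole driven by random actions
import gym

env = gym.make('CartPole-v0')
obs = env.reset()                        # start a new episode
done = False
total_reward = 0.0
while not done:
    env.render()
    action = env.action_space.sample()   # sample a random valid action
    obs, reward, done, info = env.step(action)
    total_reward += reward
print('Episode finished with total reward:', total_reward)
env.close()

A random agent will rarely balance the pole for long, which is exactly the baseline a learning agent should beat.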
Summary

In this post, you learned what OpenAI Gym is and its features, and you created your first OpenAI Gym environment. You now have a very good idea about OpenAI Gym. If you've enjoyed this post, head over to the book, Hands-On Intelligent Agents with OpenAI Gym, to learn about other recent learning environments and learning algorithms.

Extending OpenAI Gym environments with Wrappers and Monitors [Tutorial]
How to build a cartpole game using OpenAI Gym
Top 5 tools for reinforcement learning

10 useful Google Cloud AI services for your next machine learning project [Tutorial]

Savia Lobo
18 Sep 2018
9 min read
Google Cloud seems to be using artificial intelligence as a strategy to unlock more customers in the race across an increasingly hyper-competitive cloud infrastructure landscape. Cloud AI provides modern machine learning services, including pre-trained models and a service to create your own tailored models, with increased accuracy compared to other deep learning systems. Google Cloud AI is fast, scalable, and easy to use. In this tutorial, we will learn about the various Google Cloud AI services. This article is an excerpt from a book written by Arvind Ravulavaru, titled Google Cloud AI Services Quick Start Guide.

Cloud AutoML Alpha

As of April 2018, Cloud AutoML is in alpha and is only available on request, subject to GCP terms and conditions. AutoML helps us develop custom machine learning models with minimal ML knowledge and experience, using the power of Google's transfer learning and Neural Architecture Search technology. Under this service, the first custom service that Google is releasing is named AutoML Vision. This service will help users train custom vision models for their own use cases. Other services will follow. Some of the key AutoML features are the following:

Integration with human labeling
Powered by Google's Transfer Learning and AutoML
Fully integrated with other services of Google Cloud

You can read more about AutoML here: https://cloud.google.com/automl/.

Cloud TPU Beta

As of today, this service is in beta, and we need to explicitly request a TPU quota for our processing needs. Using Cloud TPUs, one can easily request large computation power to run one's own machine learning algorithms. This service helps us not only with the required computing; by using Google's TensorFlow, we can also accelerate the complete setup. This service can be used to perform heavy-duty machine learning, both training and prediction. Some of the key Cloud TPU features are the following:

High performance
Utilizing the power of GCP
Referencing data models
Fully integrated with other services of Google Cloud
Connecting Cloud TPUs to custom machine types

You can read more about Cloud TPU here: https://cloud.google.com/tpu/.

Cloud Machine Learning Engine

Cloud Machine Learning Engine helps us easily build machine learning models that work on any type of data, of any size. Cloud Machine Learning Engine can take any TensorFlow model and perform large-scale training on a managed cluster. Additionally, it can also manage the trained models for large-scale online and batch predictions, and it can seamlessly transition from training to prediction, using online and batch prediction services. Cloud Machine Learning Engine uses the same scalable and distributed infrastructure, with GPU acceleration, that powers Google's own ML products. Some of the key Cloud Machine Learning Engine features are the following:

Fully integrated with other Google Cloud services
Discover and share samples
HyperTune your models
Managed and scalable service
Notebook developer experience
Portable models

You can read more about Cloud Machine Learning Engine here: https://cloud.google.com/ml-engine/.

Cloud Job Discovery Private Beta

Matching qualified people with the right jobs doesn't have to be so hard; that is the premise of Cloud Job Discovery. Today's job portals and career sites match people to job roles based on keywords. This approach, most of the time, results in a mismatch between the candidate and the role.
That is where Cloud Job Discovery comes into the picture, bridging the gap between employer and employee. Job Discovery provides plug-and-play access to Google's search and machine learning capabilities, enabling the entire recruiting ecosystem—company career sites, job boards, applicant-tracking systems, and staffing agencies—to improve job site engagement and candidate conversion. Before we continue, you can navigate to https://cloud.google.com/job-discovery/ and try out the Job Discovery demo; you should see results based on your selection. The key takeaway from the demo is how Job Discovery relates a profile to a keyword. Some of the key advantages of Cloud Job Discovery over a standard keyword search are the following:

Keyword matching
Company jargon recognition
Abbreviation recognition
Commute search
Spelling correction
Concept recognition
Title detection
Real-time query broadening
Employer recognition
Job enrichment
Advanced location mapping
Location expansion
Seniority alignment

Dialogflow Enterprise Edition Beta

Dialogflow is a development suite used for building interfaces for websites, mobile applications, some of the popular machine learning platforms, and IoT devices. It is powered by machine learning to recognize the intent and context of what a user says, allowing your conversational interface to provide highly efficient and accurate responses. Natural language understanding recognizes a user's intent and extracts prebuilt entities such as time, date, and numbers. You can train your agent to identify custom entity types by providing a small dataset of examples. This service offers cross-platform and multi-language support and can work well with the Google Cloud Speech service. You can read more about Dialogflow Enterprise Edition here: https://cloud.google.com/dialogflow-enterprise/.

Cloud Natural Language

Google's Cloud Natural Language service helps us better understand the structure and meaning of a piece of text by providing powerful machine learning models. These models can be queried through a REpresentational State Transfer (REST) API. We can use it to understand sentiment about our product on social media, or to parse intent from customer conversations happening in a call center or through a messaging app. Before we continue with Cloud Natural Language, I would recommend heading over to https://cloud.google.com/natural-language/ and trying out the API. As you will see, this service offers various insights regarding a piece of text. Some of the key features are:

Syntax analysis
Entity recognition
Sentiment analysis
Content classification
Multi-language
Integrated REST API

You can read more about the Cloud Natural Language service here: https://cloud.google.com/natural-language/.
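As an illustration of how the REST API can be consumed, here is a hedged sketch using the google-cloud-language Python client as it existed around the time of writing (assumptions: the library is installed via pip install google-cloud-language and Google Cloud credentials are configured; the sample sentence is our own):

# A hedged sketch: document sentiment with the Cloud Natural Language client library
from google.cloud import language

client = language.LanguageServiceClient()
document = language.types.Document(
    content='The new release is fast, scalable, and easy to use!',  # hypothetical input
    type=language.enums.Document.Type.PLAIN_TEXT)

# Ask the service for document-level sentiment
sentiment = client.analyze_sentiment(document=document).document_sentiment
print('Score: {}, Magnitude: {}'.format(sentiment.score, sentiment.magnitude))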
Cloud Speech API

Cloud Speech API uses powerful neural network models to convert audio to text in real time. This service is exposed as a REST API, as we have seen with the Google Cloud Natural Language API. The API can recognize over 110 languages, and users can use this service to convert speech to text in real time, recognize audio uploaded in a request, and integrate with audio storage on Google Cloud Storage, using the same technology Google uses to power its own products. Before we continue with the Cloud Speech API, I would recommend heading over to https://cloud.google.com/speech/ and trying out the API.

I was actually playing a song in the background when I tried the speech-to-text. I was very impressed with the results, except for one part, where I said "with a song playing" and the API transcribed it as "with the song playing"; still, pretty good! I think it is only a matter of time, and continued use of these services, before their accuracy increases further. Some of the key features of the Cloud Speech API are:

Automatic Speech Recognition (ASR)
Global vocabulary
Streaming recognition
Word hints
Real-time or prerecorded audio support
Noise robustness
Inappropriate content filtering
Integrated API

You can read more about the Cloud Speech API here: https://cloud.google.com/speech/.

Cloud Translation API

Using state-of-the-art Neural Machine Translation, the Cloud Translation service converts text from one language to another. The Translation API is highly responsive, so websites and applications can integrate with it for fast, dynamic translation of source text from the source language to a target language. Before we continue with the Cloud Translation API, I would recommend heading over to https://cloud.google.com/translate/ and trying out the API. Some of the key features of the Cloud Translation API are as follows:

Programmatic access – REST API-driven
Text translation
Language detection
Continuous updates

You can read more about the Cloud Translation API here: https://cloud.google.com/translate/.
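In the same spirit, here is a hedged sketch using the google-cloud-translate Python client (assumptions: pip install google-cloud-translate and configured credentials; the sample text is our own):

# A hedged sketch: translating text with the Cloud Translation client library
from google.cloud import translate

translate_client = translate.Client()

# Translate an English sentence into Spanish
result = translate_client.translate('A picture is worth ten thousand words',
                                    target_language='es')
print('Detected source language:', result['detectedSourceLanguage'])
print('Translation:', result['translatedText'])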
Cloud Vision API

Fred R. Barnard of Printers' Ink stated, "A picture is worth ten thousand words", but no one really knows what those words are. Here comes the Google Cloud Vision API to decipher them for us. The Cloud Vision API takes an image as input and returns the contents of the image as text: it can understand what is in the image, and the service can be accessed over a REST API. Before we continue with the Cloud Vision API, I would recommend heading over to https://cloud.google.com/vision/ and trying out the demo.

I tried it on a photo of myself, taken when I was going through a trying-to-grow-long-hair phase after having fun at the beach. What is important is how the Vision service was able to look at the image and detect my mood. The same service can perform label detection, and can detect web entities related to the image, among other things.

Some of the key features of this service are:

- Explicit content detection
- Logo and label detection
- Landmark detection
- Optical character recognition
- Face detection
- Image attributes
- Integrated REST API

To find out more about the Cloud Vision API, check out https://cloud.google.com/vision/.
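As with the other services, the Vision API can be exercised with a plain REST call. Here is a minimal Python sketch using the requests library; the API key and the image filename (beach_photo.jpg) are placeholders you would substitute with your own:

import base64
import requests

API_KEY = "YOUR_GOOGLE_API_KEY"  # placeholder API key
url = "https://vision.googleapis.com/v1/images:annotate?key=" + API_KEY

# Read a local image and base64-encode it for the request body
with open("beach_photo.jpg", "rb") as f:
    image_content = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "requests": [{
        "image": {"content": image_content},
        "features": [
            {"type": "LABEL_DETECTION", "maxResults": 5},
            {"type": "FACE_DETECTION"}
        ]
    }]
}

response = requests.post(url, json=payload)
for label in response.json()["responses"][0].get("labelAnnotations", []):
    print(label["description"], label["score"])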
Cloud Video Intelligence

Cloud Video Intelligence is one of the latest cognitive services released by Google. The Cloud Video Intelligence API does almost all the things that the Cloud Vision API can do, but on videos: the service extracts metadata from a video frame by frame, so we can search for any moment in a video file. Before we continue with Cloud Video Intelligence, I would recommend heading over to https://cloud.google.com/video-intelligence/ and trying out the demo. I selected the dinosaur and bicycle video, and you can see the analysis it produces.

Some of the key features of Cloud Video Intelligence are:

- Label detection
- Shot change detection
- Explicit content detection
- Video transcription (Alpha)

This concludes the overview of the various services offered as part of the Cloud AI vertical. In this book, we are going to use a few of these to make a simple web application smart.

Summary

In this tutorial, we have seen what is truly inside Google Cloud AI, covering the different services it offers along with their key features. To leverage the power of the various Google Cloud AI services by building a smart web application using the MEAN stack, check out the book Google Cloud AI Services Quick Start Guide.

Google's event-driven serverless platform, Cloud Function, is now generally available
What's new in Google Cloud Functions serverless platform
Google Cloud Next: Fei-Fei Li reveals new AI tools for developers

Bias-Variance tradeoff: How to choose between bias and variance for your machine learning model [Tutorial]
Savia Lobo
17 Sep 2018
15 min read
This article is an excerpt taken from the book Hands-On Data Science and Python Machine Learning authored by Frank Kane. In this article, we're going to start by talking about the bias-variance trade-off, which is a more principled way of talking about the different ways you might overfit and underfit data, and how the two interrelate. We will later talk about the k-fold cross-validation technique, an important tool in your chest for combating overfitting, and look at how to implement it using Python. Finally, we look at how to detect outliers and deal with them.

Bias is just how far off you are from the correct values, that is, how good your predictions are overall at landing on the right value. If you take the mean of all your predictions, are they more or less on the right spot? Or are your errors all consistently skewed in one direction or another? If so, then your predictions are biased in that direction.

Variance is just a measure of how spread out, or how scattered, your predictions are. So, if your predictions are all over the place, that's high variance. But if they're very tightly focused on the correct values, or even on an incorrect value in the case of high bias, then your variance is small.

In reality, you often need to choose between bias and variance. It comes down to overfitting versus underfitting your data. Let's take a look at the following example, which offers a slightly different way of thinking about bias and variance. In the left graph, we have a straight line, and you can think of that as having very low variance relative to the observations. But the bias, the error from each individual point, is high. Now, contrast that with the overfitted data in the graph on the right, where we've gone out of our way to fit the observations. The line has high variance but low bias, because each individual point is pretty close to where it should be. So, this is an example of where we traded off variance for bias.

At the end of the day, you're not out to just reduce bias or just reduce variance; you want to reduce error. That's what really matters, and it turns out you can express error as a function of bias and variance:

Error = Bias² + Variance

Looking at this, error is equal to bias squared plus variance, so both of these contribute to the overall error, with the bias term entering as its square. But keep in mind, it's error you really want to minimize, not the bias or the variance specifically. An overly complex model will probably end up having high variance and low bias, whereas a too-simple model will have low variance and high bias; however, they could both end up having similar error terms at the end of the day. You just have to find the right happy medium of these two things when you're trying to fit your data. That's the bias-variance trade-off: the decision you have to make between how accurate your values are overall, and how spread out or how tightly clustered they are. Bias and variance both contribute to the overall error, which is the thing you really care about minimizing. So, keep those terms in mind!
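To make that decomposition tangible, here is a small illustrative sketch (the distributions and numbers are made up purely for demonstration). It simulates predictions from a high-bias/low-variance model and a low-bias/high-variance model, and checks that the mean squared error matches bias squared plus variance:

import numpy as np

np.random.seed(42)
true_value = 10.0

# Simulated predictions from many hypothetical refits of two models
low_var_high_bias = np.random.normal(12.0, 0.5, 100000)  # consistently off target
high_var_low_bias = np.random.normal(10.0, 2.0, 100000)  # on target, but scattered

def decompose(preds, truth):
    bias_squared = (preds.mean() - truth) ** 2
    variance = preds.var()
    mse = ((preds - truth) ** 2).mean()
    return bias_squared, variance, mse

for name, preds in [("low variance / high bias", low_var_high_bias),
                    ("high variance / low bias", high_var_low_bias)]:
    b2, var, mse = decompose(preds, true_value)
    print(name, "-> bias^2:", round(b2, 2), "variance:", round(var, 2), "mse:", round(mse, 2))

In both cases, mse comes out approximately equal to bias^2 + variance, which is exactly the error decomposition described above.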
K-fold cross-validation to avoid overfitting

Train/test is a good way of preventing overfitting and of actually measuring how well your model can perform on data it has never seen before. We can take that to the next level with a technique called k-fold cross-validation. So, let's talk about this powerful tool in your arsenal for fighting overfitting, k-fold cross-validation, and learn how it works.

The idea, although it sounds complicated, is fairly simple:

- Instead of dividing our data into two buckets, one for training and one for testing, we divide it into K buckets.
- Each bucket takes a turn as the test set, used for evaluating the results of our model, while we train the model against the remaining K-1 buckets.
- We average the resulting error metrics, that is, those r-squared values, together to get a final error metric from k-fold cross-validation.

Example of k-fold cross-validation using scikit-learn

Fortunately, scikit-learn makes this really easy to do, and it's even easier than doing a normal train/test! It's extremely simple to do k-fold cross-validation, so you may as well just do it.

Now, the way this all works in practice is that you will have a model you're trying to tune, and different variations of that model, different parameters you might want to tweak on it, such as the degree of polynomial for a polynomial fit. So, the idea is to try different values of your model, different variations, measure them all using k-fold cross-validation, and find the one that minimizes error against your test dataset. That's your sweet spot. In practice, you want to use k-fold cross-validation to measure the accuracy of your model against a test dataset, and just keep refining that model, trying different values within it, trying different variations of it or maybe even different models entirely, until you find the technique that reduces error the most.

Please go ahead and open up KFoldCrossValidation.ipynb and follow along if you will. We're going to look at the Iris dataset again; remember, we introduced this when we talked about dimensionality reduction? We're going to use the SVC model. If you remember, that's just a way of classifying data that's pretty robust. There's a section on that if you need to go and refresh your memory:

import numpy as np
from sklearn import cross_validation
from sklearn import datasets
from sklearn import svm

iris = datasets.load_iris()

# Split the iris data into train/test data sets with
# 40% reserved for testing
X_train, X_test, y_train, y_test = cross_validation.train_test_split(iris.data,
    iris.target, test_size=0.4, random_state=0)

# Build an SVC model for predicting iris classifications
# using training data
clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)

# Now measure its performance with the test data
clf.score(X_test, y_test)

What we do is use the cross_validation library from scikit-learn, and we start by doing a conventional single train/test split to see how that works. To do that, we have a train_test_split() function that makes it pretty easy. The way this works is that we feed train_test_split() a set of feature data: iris.data contains all the actual measurements of each flower, and iris.target is the thing we're trying to predict, in this case the species of each flower. test_size says what percentage we want to reserve for testing: 0.4 means we're going to extract 40% of the data randomly for testing purposes, and use 60% for training purposes.
What this gives us back is four datasets: a training dataset and a test dataset for both the feature data and the target data. So, X_train ends up containing 60% of our Iris measurements, and X_test contains the 40% of the measurements used for testing the results of our model. y_train and y_test contain the actual species for each of those segments.

After that, we go ahead and build an SVC model for predicting Iris species given their measurements, and we build it only using the training data. We fit this SVC model, using a linear kernel, using only the training feature data and the training species (that is, target) data. We call that model clf. Then, we call the score() function on clf to measure its performance against our test dataset. So, we score this model against the test Iris measurements and test Iris species we reserved, and see how well it does.

It turns out it does really well! Over 96% of the time, our model is able to correctly predict the species of an Iris that it had never seen before, just based on the measurements of that Iris. So that's pretty cool! But this is a fairly small dataset, about 150 flowers if I remember right. So, we're only using 60% of 150 flowers for training and only 40% of 150 flowers for testing. These are still fairly small numbers, so we could still be overfitting to the specific train/test split that we made. Let's use k-fold cross-validation to protect against that. It turns out that k-fold cross-validation, even though it's a more robust technique, is actually even easier to use than train/test. So, let's see how that works:

# We give cross_val_score a model, the entire data set and its "real" values, and the number of folds:
scores = cross_validation.cross_val_score(clf, iris.data, iris.target, cv=5)

# Print the accuracy for each fold:
print scores

# And the mean accuracy of all 5 folds:
print scores.mean()

We have a model already, the SVC model that we defined for this prediction, and all you need to do is call cross_val_score() on the cross_validation package. You pass this function a model of a given type (clf), the entire dataset of measurements, that is, all of the feature data (iris.data), and all of the target data (all of the species), iris.target. I want cv=5, which means it's going to use 5 different training datasets while reserving 1 for testing; basically, it's going to run it 5 times, and that's all we need to do. That will automatically evaluate our model against the entire dataset, split up five different ways, and give us back the individual results. If we print the output, it gives us back a list of the actual error metric from each of those iterations, that is, each of those folds. We can average those together to get an overall error metric based on k-fold cross-validation.

When we do this over 5 folds, we can see that our results are even better than we thought! 98% accuracy. So that's pretty cool! In fact, in a couple of the runs we had perfect accuracy. So that's pretty amazing stuff.
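If you are following along with a recent version of scikit-learn, note that the cross_validation module used here was deprecated in version 0.18 and later removed. A minimal equivalent sketch using the model_selection module (and Python 3 print functions) looks like this:

from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)

scores = cross_val_score(clf, iris.data, iris.target, cv=5)
print(scores)
print(scores.mean())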
Now let's see if we can do even better. We used a linear kernel before; what if we used a polynomial kernel and got even fancier? Will that be overfitting, or will it actually better fit the data that we have? That depends on whether there's actually a linear or a polynomial relationship between these petal measurements and the actual species. So, let's try that out:

clf = svm.SVC(kernel='poly', C=1).fit(X_train, y_train)
scores = cross_validation.cross_val_score(clf, iris.data, iris.target, cv=5)
print scores
print scores.mean()

We'll just run this all again, using the same technique. But this time, we're using a polynomial kernel. We'll fit that to our training dataset, and it doesn't really matter what you fit to in this case, because cross_val_score() will just keep re-running it for you. It turns out that when we use a polynomial fit, we end up with an overall score that's even lower than our original run. So, this tells us that the polynomial kernel is probably overfitting: k-fold cross-validation reveals a lower score than we saw with our linear kernel.

The important point here is that if we had just used a single train/test split, we wouldn't have realized that we were overfitting. We would actually have gotten the same result on a single train/test split here as we did on the linear kernel. So, we might inadvertently have been overfitting our data there, and not even have known it had we not used k-fold cross-validation. This is a good example of where k-fold comes to the rescue and warns you of overfitting, where a single train/test split might not have caught it. So, keep that in your tool chest.

If you want to play around with this some more, go ahead and try different degrees. You can actually specify a different number of degrees for the polynomial kernel; the default is 3, but you can try a different one, such as 2.

Detecting outliers

A common problem with real-world data is outliers. You'll always have some strange users, or some strange agents, polluting your data by acting abnormally and atypically compared to the typical user. They might be legitimate outliers, caused by real people rather than by some sort of malicious traffic or fake data. So sometimes it's appropriate to remove them, and sometimes it isn't.

Dealing with outliers

So, let's take some example code and see how you might handle outliers in practice. Let's mess around with some outliers; it's a pretty simple section, and a little bit of review, actually. If you want to follow along, we're in Outliers.ipynb, so go ahead and open that up if you'd like:

import numpy as np

incomes = np.random.normal(27000, 15000, 10000)
incomes = np.append(incomes, [1000000000])

import matplotlib.pyplot as plt
plt.hist(incomes, 50)
plt.show()

What we're going to do is start off with a normal distribution of incomes that has a mean of $27,000 per year, with a standard deviation of $15,000. I'm going to create 10,000 fake Americans that have an income in that distribution. This is totally made-up data, by the way, although it's not that far off from reality. Then, I'm going to stick in an outlier - call it Donald Trump, who has a billion dollars. We're going to stick this guy in at the end of our dataset, so we have a normally distributed dataset around $27,000, and then Donald Trump at the end. We'll go ahead and plot that as a histogram.

What we see is the entire normal distribution of everyone else in the country squeezed into one bucket of the histogram, while Donald Trump, out at the right side, screws up the whole thing at a billion dollars. The other problem is what happens if I'm trying to answer the question of how much money the typical American makes.
If I take the mean to try and figure that out, it's not going to be a very good, useful number:

incomes.mean()

The output of the preceding code is as follows:

126892.66469341301

Donald Trump has pushed that number up all by himself to $126,000 and some odd change, when I know that the real mean of my normally distributed data, excluding Donald Trump, is only $27,000. So, the right thing to do there would be to use the median instead of the mean. A better thing to do would be to actually measure the standard deviation of your dataset, and identify outliers as being some multiple of a standard deviation away from the median. Following is a little function that I wrote that does just that. It's called reject_outliers():

def reject_outliers(data):
    u = np.median(data)
    s = np.std(data)
    filtered = [e for e in data if (u - 2 * s < e < u + 2 * s)]
    return filtered

filtered = reject_outliers(incomes)

plt.hist(filtered, 50)
plt.show()

It takes in a list of data, finds the median, and also finds the standard deviation of the dataset. It then filters the data so that only data points within two standard deviations of the median are preserved. So, I can use this handy-dandy reject_outliers() function on my income data to strip out weird outliers automatically.

Sure enough, it works! I get a much prettier graph now that excludes Donald Trump and focuses in on the more typical dataset in the center. So, pretty cool stuff! That's one example of identifying outliers and automatically removing them, or dealing with them however you see fit. Remember, always do this in a principled manner: don't just throw out outliers because they're inconvenient. Understand where they're coming from, and how they actually affect the thing you're trying to measure in spirit. By the way, our mean is also much more meaningful now; much closer to the 27,000 that it should be, now that we've gotten rid of that outlier.

In this article, we came across the bias-variance trade-off and how to minimize error. We also saw the concept of k-fold cross-validation and how to implement it in Python to prevent overfitting. If you've enjoyed this excerpt, head over to the book Hands-On Data Science and Python Machine Learning to prepare your data for analysis, train machine learning models, visualize the final data analysis, and much more.

20 lessons on bias in machine learning systems by Kate Crawford at NIPS 2017
Here's how you can handle the bias variance trade-off in your ML models

Understanding the TensorFlow data model [Tutorial]
Sugandha Lahoti
16 Sep 2018
12 min read
TensorFlow is mathematical software and an open source framework for deep learning developed by the Google Brain Team in 2011, and it can be used to help us analyze data in order to predict an effective business outcome. Although the initial target of TensorFlow was to conduct research in ML and in Deep Neural Networks (DNNs), the system is general enough to be applicable to a wide variety of classical machine learning algorithms, such as Support Vector Machines (SVMs), logistic regression, decision trees, random forests, and so on.

In this article, we will talk about the data model in TensorFlow. The data model in TensorFlow is represented by tensors. Without using complex mathematical definitions, we can say that a tensor (in TensorFlow) identifies a multidimensional numerical array. We will see more details on tensors in the next subsection. This article is taken from the book Deep Learning with TensorFlow - Second Edition by Giancarlo Zaccone and Md. Rezaul Karim. In this book, we will delve into neural networks, implement deep learning algorithms, and explore layers of data abstraction with the help of TensorFlow.

Tensors in a data model

Let's see the formal definition of a tensor on Wikipedia, as follows:

"Tensors are geometric objects that describe linear relations between geometric vectors, scalars, and other tensors. Elementary examples of such relations include the dot product, the cross product, and linear maps. Geometric vectors, often used in physics and engineering applications, and scalars themselves are also tensors."

This data structure is characterized by three parameters: rank, shape, and type (Figure 6: Tensors are nothing but geometric objects with a shape, rank, and type, used to hold a multidimensional array). A tensor can thus be thought of as the generalization of a matrix that specifies an element with an arbitrary number of indices. The syntax for tensors is more or less the same as nested vectors.

Note: Tensors just define the type of a value and the means by which that value should be calculated during the session. Therefore, they do not represent or hold any value produced by an operation.

Some people love to compare NumPy and TensorFlow. However, in reality, TensorFlow and NumPy are quite similar in the sense that both are N-d array libraries! Well, it's true that NumPy has n-dimensional array support, but it doesn't offer methods to create tensor functions and automatically compute derivatives (and it has no GPU support). (Figure 7 gives a short, one-to-one comparison of NumPy and TensorFlow.)

Now let's see an alternative way of creating tensors before they can be fed (we will see other feeding mechanisms later on) to the TensorFlow graph:

>>> X = [[2.0, 4.0],
         [6.0, 8.0]]   # X is a list of lists
>>> Y = np.array([[2.0, 4.0],
                  [6.0, 6.0]], dtype=np.float32)  # Y is a NumPy array
>>> Z = tf.constant([[2.0, 4.0],
                     [6.0, 8.0]])  # Z is a tensor

Here, X is a list, Y is an n-dimensional array from the NumPy library, and Z is a TensorFlow tensor object. Now let's see their types:

>>> print(type(X))
>>> print(type(Y))
>>> print(type(Z))

#Output
<class 'list'>
<class 'numpy.ndarray'>
<class 'tensorflow.python.framework.ops.Tensor'>

Well, their types are printed correctly.
However, a more convenient function when we want to formally deal with tensors, as opposed to the other types, is tf.convert_to_tensor(), used as follows:

t1 = tf.convert_to_tensor(X, dtype=tf.float32)
t2 = tf.convert_to_tensor(Y, dtype=tf.float32)

Now let's see their types using the following code:

>>> print(type(t1))
>>> print(type(t2))

#Output:
<class 'tensorflow.python.framework.ops.Tensor'>
<class 'tensorflow.python.framework.ops.Tensor'>

Fantastic! That's enough discussion about tensors for now. So, we can think about the structure that is characterized by the term rank.

Rank and shape of tensors

A unit of dimensionality called rank describes each tensor. It identifies the number of dimensions of the tensor; for this reason, rank is also known as the order or n-dimensions of a tensor. A rank zero tensor is a scalar, a rank one tensor is a vector, and a rank two tensor is a matrix. The following code defines a TensorFlow scalar, vector, matrix, and cube_matrix; in the next example, we will show how rank works:

import tensorflow as tf

scalar = tf.constant(100)
vector = tf.constant([1,2,3,4,5])
matrix = tf.constant([[1,2,3],[4,5,6]])
cube_matrix = tf.constant([[[1],[2],[3]],[[4],[5],[6]],[[7],[8],[9]]])

print(scalar.get_shape())
print(vector.get_shape())
print(matrix.get_shape())
print(cube_matrix.get_shape())

The results are printed here:

>>>
()
(5,)
(2, 3)
(3, 3, 1)
>>>

The shape of a tensor is the number of elements it has along each dimension; for a matrix, that is its number of rows and columns. Now we will see how to relate the shape of a tensor to its rank:

>>> scalar.get_shape()
TensorShape([])
>>> vector.get_shape()
TensorShape([Dimension(5)])
>>> matrix.get_shape()
TensorShape([Dimension(2), Dimension(3)])
>>> cube_matrix.get_shape()
TensorShape([Dimension(3), Dimension(3), Dimension(1)])

Data type of tensors

In addition to rank and shape, tensors have a data type. Here is a list of the data types:

- DT_FLOAT (tf.float32): 32-bit floating point
- DT_DOUBLE (tf.float64): 64-bit floating point
- DT_INT8 (tf.int8): 8-bit signed integer
- DT_INT16 (tf.int16): 16-bit signed integer
- DT_INT32 (tf.int32): 32-bit signed integer
- DT_INT64 (tf.int64): 64-bit signed integer
- DT_UINT8 (tf.uint8): 8-bit unsigned integer
- DT_STRING (tf.string): Variable-length byte array; each element of a tensor is a byte array
- DT_BOOL (tf.bool): Boolean
- DT_COMPLEX64 (tf.complex64): Complex number made of two 32-bit floating points, the real and imaginary parts
- DT_COMPLEX128 (tf.complex128): Complex number made of two 64-bit floating points, the real and imaginary parts
- DT_QINT8 (tf.qint8): 8-bit signed integer used in quantized ops
- DT_QINT32 (tf.qint32): 32-bit signed integer used in quantized ops
- DT_QUINT8 (tf.quint8): 8-bit unsigned integer used in quantized ops

The preceding list is self-explanatory, so we have not provided a detailed discussion of the data types. The TensorFlow APIs are implemented to manage data to and from NumPy arrays. Thus, to build a tensor with a constant value, pass a NumPy array to the tf.constant() operator, and the result will be a tensor with that value:

import tensorflow as tf
import numpy as np

array_1d = np.array([1,2,3,4,5,6,7,8,9,10])
tensor_1d = tf.constant(array_1d)

with tf.Session() as sess:
    print(tensor_1d.get_shape())
    print(sess.run(tensor_1d))

# Close the TensorFlow session when you're done
sess.close()

Running the example, we obtain the following:

>>>
(10,)
[ 1  2  3  4  5  6  7  8  9 10]
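As a quick illustrative sketch of how these data types come into play, tf.cast() converts a tensor from one type to another (this snippet assumes the same TensorFlow 1.x session style used throughout this article):

import tensorflow as tf

t_int = tf.constant([1, 2, 3])          # dtype is inferred as tf.int32
t_double = tf.cast(t_int, tf.float64)   # explicit conversion to DT_DOUBLE

with tf.Session() as sess:
    print(t_int.dtype)         # <dtype: 'int32'>
    print(t_double.dtype)      # <dtype: 'float64'>
    print(sess.run(t_double))  # the same values, now as 64-bit floats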
To build a tensor with variable values, use a NumPy array and pass it to the tf.Variable() constructor. The result will be a variable tensor with that initial value:

import tensorflow as tf
import numpy as np

# Create a sample NumPy array
array_2d = np.array([(1,2,3),(4,5,6),(7,8,9)])

# Now pass the preceding array to tf.Variable()
tensor_2d = tf.Variable(array_2d)

# Execute the preceding op under an active session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(tensor_2d.get_shape())
    print(sess.run(tensor_2d))

# Finally, close the TensorFlow session when you're done
sess.close()

In the preceding code block, tf.global_variables_initializer() is used to initialize all the ops we created before. If you need to create a variable with an initial value dependent on another variable, use the other variable's initialized_value(). This ensures that variables are initialized in the right order. The result is as follows:

>>>
(3, 3)
[[1 2 3]
 [4 5 6]
 [7 8 9]]

For ease of use in interactive Python environments, we can use the InteractiveSession class, and then use that session for all Tensor.eval() and Operation.run() calls:

import tensorflow as tf  # Import TensorFlow
import numpy as np       # Import NumPy

# Create an interactive TensorFlow session
interactive_session = tf.InteractiveSession()

# Create a 1d NumPy array
array1 = np.array([1,2,3,4,5])  # An array

# Then convert the preceding array into a tensor
tensor = tf.constant(array1)    # convert to tensor

print(tensor.eval())            # evaluate the tensor op
interactive_session.close()     # close the session

Note: tf.InteractiveSession() is just convenient syntactic sugar for keeping a default session open in IPython.

The result is as follows:

>>>
[1 2 3 4 5]

This can be easier in an interactive setting, such as the shell or an IPython Notebook, as it can be tedious to pass around a session object everywhere.

Note: The IPython Notebook is now known as the Jupyter Notebook. It is an interactive computational environment in which you can combine code execution, rich text, mathematics, plots, and rich media. For more information, interested readers should refer to https://ipython.org/notebook.html.

Another way to define a tensor is using the tf.convert_to_tensor statement:

import tensorflow as tf
import numpy as np

tensor_3d = np.array([[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
                      [[9, 10, 11], [12, 13, 14], [15, 16, 17]],
                      [[18, 19, 20], [21, 22, 23], [24, 25, 26]]])
tensor_3d = tf.convert_to_tensor(tensor_3d, dtype=tf.float64)

with tf.Session() as sess:
    print(tensor_3d.get_shape())
    print(sess.run(tensor_3d))

# Finally, close the TensorFlow session when you're done
sess.close()

Following is the output of the preceding code:

>>>
(3, 3, 3)
[[[  0.   1.   2.]
  [  3.   4.   5.]
  [  6.   7.   8.]]
 [[  9.  10.  11.]
  [ 12.  13.  14.]
  [ 15.  16.  17.]]
 [[ 18.  19.  20.]
  [ 21.  22.  23.]
  [ 24.  25.  26.]]]

Variables

Variables are TensorFlow objects used to hold and update parameters. A variable must be initialized so that you can save and restore it to analyze your code later on. Variables are created by using either tf.Variable() or tf.get_variable() statements; tf.get_variable() is the recommended way, whereas tf.Variable() is a lower-level abstraction.
In the following example, we want to count up from 0 by repeatedly incrementing a variable, but let's import TensorFlow first:

import tensorflow as tf

We created a variable that will be initialized to the scalar value 0:

value = tf.get_variable("value", shape=[], dtype=tf.int32,
    initializer=None, regularizer=None, trainable=True, collections=None)

The assignment and addition operators are just nodes of the computation graph, so they do not execute the assignment until the session is run:

one = tf.constant(1)
update_value = tf.assign_add(value, one)
initialize_var = tf.global_variables_initializer()

We can instantiate the computation graph:

with tf.Session() as sess:
    sess.run(initialize_var)
    print(sess.run(value))
    for _ in range(5):
        sess.run(update_value)
        print(sess.run(value))

# Close the session
sess.close()

Let's recall that a tensor object is a symbolic handle to the result of an operation, but it does not actually hold the values of the operation's output:

>>>
0
1
2
3
4
5

Fetches

To fetch the output of an operation, the graph can be executed by calling run() on the session object and passing in the tensors. Apart from fetching a single tensor node, you can also fetch multiple tensors. In the following example, the sum and multiply tensors are fetched together using the run() call:

import tensorflow as tf

constant_A = tf.constant([100.0])
constant_B = tf.constant([300.0])
constant_C = tf.constant([3.0])

sum_ = tf.add(constant_A, constant_B)
mul_ = tf.multiply(constant_A, constant_C)

with tf.Session() as sess:
    result = sess.run([sum_, mul_])  # _ means throw away afterwards
    print(result)

# Finally, close the TensorFlow session when you're done:
sess.close()

The output is as follows:

>>>
[array([ 400.], dtype=float32), array([ 300.], dtype=float32)]

It should be noted that all the ops that need to be executed (that is, in order to produce tensor values) are run once (not once per requested tensor).

Feeds and placeholders

There are four methods of getting data into a TensorFlow program (for more information, see https://www.tensorflow.org/api_guides/python/reading_data):

- The Dataset API: This enables you to build complex input pipelines from simple, reusable pieces, to read from distributed filesystems, and to perform complex operations. Using the Dataset API is recommended if you are dealing with large amounts of data in different data formats. It introduces two new abstractions to TensorFlow for creating a feedable dataset: tf.contrib.data.Dataset (by creating a source or applying transformation operations) and tf.contrib.data.Iterator. A minimal sketch of this API follows this list.
- Feeding: This allows us to inject data into any tensor in a computation graph.
- Reading from files: This allows us to develop an input pipeline using Python's built-in mechanism for reading data from data files at the beginning of the graph.
- Preloaded data: For a small dataset, we can use either constants or variables in the TensorFlow graph to hold all the data.
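Here is the promised illustrative sketch of the Dataset API, assuming TensorFlow 1.4 or later, where the contrib abstractions mentioned above graduated to tf.data.Dataset and its iterators:

import numpy as np
import tensorflow as tf

# Build a dataset from an in-memory NumPy array and read it in batches
features = np.arange(10, dtype=np.float32)
dataset = tf.data.Dataset.from_tensor_slices(features).batch(4)

iterator = dataset.make_one_shot_iterator()
next_batch = iterator.get_next()

with tf.Session() as sess:
    while True:
        try:
            print(sess.run(next_batch))
        except tf.errors.OutOfRangeError:
            break  # the dataset is exhausted

Each call to sess.run(next_batch) pulls the next batch of four elements (the final batch holds the remaining two).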
In this section, we will see an example of the feeding mechanism. TensorFlow provides a feed mechanism that allows us to inject data into any tensor in a computation graph. You can provide the feed data through the feed_dict argument to a run() or eval() invocation that initiates the computation.

Note: Feeding using the feed_dict argument is the least efficient way to feed data into a TensorFlow execution graph and should only be used for small experiments with small datasets. It can also be used for debugging.

We can also replace any tensor with feed data (that is, variables and constants). Best practice is to use a TensorFlow placeholder node, created using tf.placeholder() (https://www.tensorflow.org/api_docs/python/tf/placeholder). A placeholder exists exclusively to serve as the target of feeds. An empty placeholder is not initialized, so it does not contain any data. Therefore, it will always generate an error if it is executed without a feed, so you won't forget to feed it. The following example shows how to feed data to build a random 3×2 matrix:

import tensorflow as tf
import numpy as np

a = 3
b = 2

x = tf.placeholder(tf.float32, shape=(a, b))
y = tf.add(x, x)

data = np.random.rand(a, b)

sess = tf.Session()
print(sess.run(y, feed_dict={x: data}))
sess.close()  # close the session

The output is as follows:

>>>
[[ 1.78602004  1.64606333]
 [ 1.03966308  0.99269408]
 [ 0.98822606  1.50157797]]
>>>

We understood the data model in TensorFlow. To understand the TensorFlow computational graph and the TensorFlow code structure, read our book Deep Learning with TensorFlow - Second Edition.

Why TensorFlow always tops machine learning and artificial intelligence tool surveys
TensorFlow 2.0 is coming. Here's what we can expect.
Getting to know and manipulate Tensors in TensorFlow

How to perform sentiment analysis using Python [Tutorial]
Sugandha Lahoti
15 Sep 2018
4 min read
Sentiment analysis is one of the most popular applications of NLP. Sentiment analysis refers to the process of determining whether a given piece of text is positive or negative. In some variations, we consider "neutral" as a third option. This technique is commonly used to discover how people feel about a particular topic, and it is used to analyze the sentiments of users in various contexts, such as marketing campaigns, social media, e-commerce, and so on. In this article, we will perform sentiment analysis using Python. This extract is taken from Python Machine Learning Cookbook by Prateek Joshi. The book contains 100 recipes that teach you how to perform various machine learning tasks in the real world.

How to perform sentiment analysis in Python

Step 1: Create a new Python file, and import the following packages:

import nltk.classify.util
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews

Step 2: Define a function to extract features:

def extract_features(word_list):
    return dict([(word, True) for word in word_list])

Step 3: We need training data for this, so we will use the movie reviews in NLTK:

if __name__=='__main__':
    # Load positive and negative reviews
    positive_fileids = movie_reviews.fileids('pos')
    negative_fileids = movie_reviews.fileids('neg')

Step 4: Let's separate these into positive and negative reviews:

    features_positive = [(extract_features(movie_reviews.words(fileids=[f])),
            'Positive') for f in positive_fileids]
    features_negative = [(extract_features(movie_reviews.words(fileids=[f])),
            'Negative') for f in negative_fileids]

Step 5: Divide the data into training and testing datasets:

    # Split the data into train and test (80/20)
    threshold_factor = 0.8
    threshold_positive = int(threshold_factor * len(features_positive))
    threshold_negative = int(threshold_factor * len(features_negative))

Step 6: Extract the features:

    features_train = features_positive[:threshold_positive] + features_negative[:threshold_negative]
    features_test = features_positive[threshold_positive:] + features_negative[threshold_negative:]

    print "\nNumber of training datapoints:", len(features_train)
    print "Number of test datapoints:", len(features_test)

Step 7: We will use a Naive Bayes classifier. Define the object and train it:

    # Train a Naive Bayes classifier
    classifier = NaiveBayesClassifier.train(features_train)
    print "\nAccuracy of the classifier:", nltk.classify.util.accuracy(classifier, features_test)

Step 8: The classifier object contains the most informative words that it obtained during analysis. These words basically have a strong say in what's classified as a positive or a negative review. Let's print them out:

    print "\nTop 10 most informative words:"
    for item in classifier.most_informative_features()[:10]:
        print item[0]
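Before creating the test sentences, it is worth pausing to see what extract_features() actually hands to the NLTK classifier: a dictionary mapping each unique word to True. A quick illustrative check (the review here is made up, and the key order of the printed dictionary may vary):

>>> extract_features("it is an amazing movie".split())
{'it': True, 'is': True, 'an': True, 'amazing': True, 'movie': True}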
Step 9: Create a couple of random input sentences:

    # Sample input reviews
    input_reviews = [
        "It is an amazing movie",
        "This is a dull movie. I would never recommend it to anyone.",
        "The cinematography is pretty great in this movie",
        "The direction was terrible and the story was all over the place"
    ]

Step 10: Run the classifier on those input sentences and obtain the predictions:

    print "\nPredictions:"
    for review in input_reviews:
        print "\nReview:", review
        probdist = classifier.prob_classify(extract_features(review.split()))
        pred_sentiment = probdist.max()

Step 11: Print the output:

        print "Predicted sentiment:", pred_sentiment
        print "Probability:", round(probdist.prob(pred_sentiment), 2)

If you run this code, you will see three main things printed in the Terminal. The first is the accuracy of the classifier on the test dataset. The next is the list of the most informative words. The last is the list of predictions for the input sentences.

How does the code work?

We use NLTK's Naive Bayes classifier for our task here. In the feature extractor function, we basically extract all the unique words. However, the NLTK classifier needs the data to be arranged in the form of a dictionary, so we arranged it in such a way that the NLTK classifier object can ingest it. Once we divide the data into training and testing datasets, we train the classifier to categorize the sentences into positive and negative. If you look at the top informative words, you can see that we have words such as "outstanding" to indicate positive reviews and words such as "insulting" to indicate negative reviews. This is interesting information because it tells us which words are being used to indicate strong reactions.

Thus, we have learned how to perform sentiment analysis in Python. For more interesting machine learning recipes, read the book Python Machine Learning Cookbook.

Understanding Sentiment Analysis and other key NLP concepts
Twitter Sentiment Analysis
Sentiment Analysis of the 2017 US elections on Twitter

How Facebook is advancing artificial intelligence [Video]
Richard Gall
14 Sep 2018
4 min read
Facebook is playing a huge role in artificial intelligence research. It's not only a core part of the Facebook platform, it's central to how the organization works. The company launched its AI research lab - FAIR - back in 2013. Today, led by some of the best minds in the field, it's not only helping Facebook to leverage artificial intelligence, it's also making it more accessible to researchers and engineers around the world. Let's take a look at some of the tools built by Facebook that are doing just that.

PyTorch: Facebook's leading artificial intelligence tool

PyTorch is a hugely popular deep learning framework (rivalling Google's TensorFlow) that, by combining flexibility and dynamism with stability, bridges the gap between research and production. Using a tape-based auto-differentiation system, PyTorch can be modified and changed by engineers without losing speed. That's good news for everyone. Although PyTorch steals the headlines, there is a range of supporting tools that are making artificial intelligence and deep learning more accessible and achievable for other engineers.

Read next: Is PyTorch better than Google's TensorFlow? Find PyTorch eBooks and videos on the Packt website.

Facebook's computer vision tools

Another field that Facebook has revolutionized is computer vision and image processing. Detectron, Facebook's state-of-the-art object detection software system, has powered many research projects, including Mask R-CNN - a simple and flexible way of developing Convolutional Neural Networks for image processing. Mask R-CNN has also helped to power DensePose, a tool that maps all human pixels of an RGB image to a 3D surface-based representation of the human body. Facebook has also heavily contributed to research in detecting and recognizing human-object interactions. Its contribution to the field of generative modeling is equally important, with tasks such as minimizing variations in image quality, JPEG compression, and image quantization now becoming easier and more accessible.

Facebook, language and artificial intelligence

We share updates, we send messages - language is a cornerstone of Facebook. This is why it's such an important area for Facebook's AI researchers. There is a whole host of libraries and tools built for language problems. FastText is a library for text representation and classification, while ParlAI is a platform pushing the boundaries of dialog research. The platform is focused on tackling five key AI tasks: question answering, sentence completion, goal-oriented dialog, chit-chat dialog, and visual dialog. The ultimate aim for ParlAI is to develop a general dialog AI. There are also a few more language tools in Facebook's AI toolkit - Fairseq and Translate are helping with translation and text generation, while Wav2Letter is an Automatic Speech Recognition system that can be used for transcription tasks.

Rational artificial intelligence for gaming and smart decision making

Although Facebook isn't known for gaming, its interest in developing artificial intelligence that can reason could have an impact on the way games are built in the future. ELF is a tool developed by Facebook that allows game developers to train and test AI algorithms in a gaming environment. ELF was used by Facebook researchers to recreate DeepMind's AlphaGo Zero, the AI bot that has defeated Go champions. Running on a single GPU, the ELF OpenGo bot defeated four professional Go players 14-0. Impressive, right?
There are other tools built by Facebook that aim to build AI into game reasoning. TorchCraft is probably the most notable example - it's a library that's making AI research on StarCraft - a strategy game - accessible to game developers and AI specialists alike.

Facebook is defining the future of artificial intelligence

As you can see, Facebook is doing a lot to push the boundaries of artificial intelligence. However, it's not just keeping these tools for itself - all these tools are open source, which means they can be used by anyone.

Emotional AI: Detecting facial expressions and emotions using CoreML [Tutorial]
Savia Lobo
14 Sep 2018
11 min read
Computers increasingly allow natural forms of interaction, and they are becoming more ubiquitous, more capable, and more ingrained in our daily lives. They are becoming less like heartless, dumb tools and more like friends, able to entertain us, look out for us, and assist us with our work. This article is an excerpt taken from the book Machine Learning with Core ML authored by Joshua Newnham.

With this shift comes a need for computers to be able to understand our emotional state. For example, you don't want your social robot cracking a joke after you arrive back from work having lost your job (to an AI bot!). This is a field of computer science known as affective computing (also referred to as artificial emotional intelligence or emotional AI), which studies systems that can recognize, interpret, process, and simulate human emotions. The first stage of this is being able to recognize the emotional state. In this article, we will be creating a model that can detect facial expressions and emotions using Core ML.

Input data and preprocessing

We will implement the preprocessing functionality required to transform images into something the model is expecting. We will build up this functionality in a playground project before migrating it across to our project in the next section. If you haven't done so already, pull down the latest code from the accompanying repository: https://github.com/packtpublishing/machine-learning-with-core-ml. Once downloaded, navigate to the directory Chapter4/Start/ and open the playground project ExploringExpressionRecognition.playground. Once loaded, you will see the playground for this extract.

Before starting, to avoid looking at images of me, please replace the test images with either personal photos of your own or royalty-free images from the internet, ideally a set expressing a range of emotions. Along with the test images, this playground includes a compiled Core ML model with its generated set of wrappers for inputs, outputs, and the model itself. Also included are some extensions for UIImage, UIImageView, and CGImagePropertyOrientation, and an empty CIImage extension, to which we will return later in the extract. The others provide utility functions to help us visualize the images as we work through this playground.

When developing machine learning applications, you have two broad paths. The first, which is becoming increasingly popular, is to use an end-to-end machine learning model capable of just being fed the raw input and producing adequate results. One particular field that has had great success with end-to-end models is speech recognition. Prior to end-to-end deep learning, speech recognition systems were made up of many smaller modules, each one focusing on extracting specific pieces of data to feed into the next module, and each typically manually engineered. Modern speech recognition systems use end-to-end models that take the raw input and output the result. Both of the described approaches can be seen in the following diagram.

Obviously, this approach is not constrained to speech recognition, and we have seen it applied to image recognition tasks, too, along with many others. But there are two things that make this particular case different. The first is that we can simplify the problem by first extracting the face. This means our model has fewer features to learn, and we get a smaller, more specialized model that we can tune.
The second thing, which is no doubt obvious, is that our training data consisted of only faces, not natural images. So, we have no other choice but to run our data through two models: the first to extract faces, and the second to perform expression recognition on the extracted faces, as shown in the diagram.

Luckily for us, Apple has mostly taken care of our first task of detecting faces through the Vision framework it released with iOS 11. The Vision framework provides performant image analysis and computer vision tools, exposing them through a simple API. This allows for face detection, feature detection and tracking, and classification of scenes in images and video. The latter task, expression recognition, is something we will take care of using the Core ML model introduced earlier.

Prior to the introduction of the Vision framework, face detection would typically be performed using a Core Image filter. Going back further, you had to use something like OpenCV. You can learn more about Core Image here: https://developer.apple.com/library/content/documentation/GraphicsImaging/Conceptual/CoreImaging/ci_detect_faces/ci_detect_faces.html.

Now that we have a bird's-eye view of the work that needs to be done, let's turn our attention to the editor and start putting all of this together. Start by loading the images; add the following snippet to your playground:

var images = [UIImage]()
for i in 1...3{
    guard let image = UIImage(named:"images/joshua_newnham_\(i).jpg") else{
        fatalError("Failed to extract features")
    }
    images.append(image)
}

let faceIdx = 0
let imageView = UIImageView(image: images[faceIdx])
imageView.contentMode = .scaleAspectFit

In the preceding snippet, we are simply loading each of the images we have included in our resources' Images folder and adding them to an array we can access conveniently throughout the playground. Once all the images are loaded, we set the constant faceIdx, which will ensure that we access the same image throughout our experiments. Finally, we create a UIImageView to easily preview it. Once it has finished running, click on the eye icon in the right-hand panel to preview the loaded image.

Next, we will take advantage of the functionality available in the Vision framework to detect faces. The typical flow when working with the Vision framework is to define a request, which determines what analysis you want to perform, and to define a handler, which is responsible for executing the request and providing the means of obtaining the results (either through delegation or explicitly queried). The result of the analysis is a collection of observations that you need to cast into the appropriate observation type; concrete examples of each of these can be seen in the accompanying diagram.

As illustrated in that diagram, the request determines what type of image analysis will be performed; the handler, using one or more requests and an image, performs the actual analysis and generates the results (also known as observations). These are accessible via a property, or via a delegate if one has been assigned. The type of observation depends on the request performed. It's worth highlighting that the Vision framework is tightly integrated into Core ML and provides another layer of abstraction and uniformity between you, the data, and the process. For example, using a classification Core ML model would return an observation of type VNClassificationObservation.
This layer of abstraction not only simplifies things but also provides a consistent way of working with machine learning models. In the previous figure, we showed a request handler specifically for static images. Vision also provides a specialized request handler for handling sequences of images, which is more appropriate when dealing with requests such as tracking.

So, when do you use VNImageRequestHandler and when VNSequenceRequestHandler? Though the names provide clues as to when one should be used over the other, it's worth outlining some differences. The image request handler is for interactive exploration of an image; it holds a reference to the image for its life cycle and allows optimizations of various request types. The sequence request handler is more appropriate for performing tasks such as tracking and does not optimize for multiple requests on an image.

Let's see how this all looks in code; add the following snippet to your playground:

let faceDetectionRequest = VNDetectFaceRectanglesRequest()
let faceDetectionRequestHandler = VNSequenceRequestHandler()

Here, we are simply creating the request and the handler; as discussed in the preceding code, the request encapsulates the type of image analysis, while the handler is responsible for executing the request. Next, we will get faceDetectionRequestHandler to run faceDetectionRequest; add the following code:

try? faceDetectionRequestHandler.perform(
    [faceDetectionRequest],
    on: images[faceIdx].cgImage!,
    orientation: CGImagePropertyOrientation(images[faceIdx].imageOrientation))

The perform function of the handler can throw an error if it fails; for this reason, we wrap the call with try? at the beginning of the statement, and we can interrogate the error property of the handler to identify the reason for failing. We pass the handler a list of requests (in this case, only our faceDetectionRequest), the image we want to perform the analysis on, and, finally, the orientation of the image, which can be used by the request during analysis.

Once the analysis is done, we can inspect the observations obtained through the results property of the request itself, as shown in the following code:

if let faceDetectionResults = faceDetectionRequest.results as? [VNFaceObservation]{
    for face in faceDetectionResults{
        // ADD THE NEXT SNIPPET OF CODE HERE
    }
}

The type of observation depends on the analysis; in this case, we're expecting a VNFaceObservation. Hence, we cast it to the appropriate type and then iterate through all the observations. Next, we will take each recognized face and extract the bounding box. Then, we'll proceed to draw it in the image (using an extension method of UIImageView found within the UIImageViewExtension.swift file).
Add the following block within the for loop shown in the preceding code:

if let currentImage = imageView.image{
    let bbox = face.boundingBox

    let imageSize = CGSize(
        width: currentImage.size.width,
        height: currentImage.size.height)

    let w = bbox.width * imageSize.width
    let h = bbox.height * imageSize.height
    let x = bbox.origin.x * imageSize.width
    let y = bbox.origin.y * imageSize.height

    let faceRect = CGRect(
        x: x,
        y: y,
        width: w,
        height: h)

    let invertedY = imageSize.height - (faceRect.origin.y + faceRect.height)
    let invertedFaceRect = CGRect(
        x: x,
        y: invertedY,
        width: w,
        height: h)

    imageView.drawRect(rect: invertedFaceRect)
}

We can obtain the bounding box of each face via its boundingBox property; the result is normalized, so we need to scale it based on the dimensions of the image. For example, you can obtain the width by multiplying the bounding box's width by the width of the image: bbox.width * imageSize.width. Next, we invert the y axis, as the coordinate system of Quartz 2D is inverted with respect to UIKit's coordinate system. We invert our coordinates by subtracting the bounding box's origin and height from the height of the image, and then pass the result to our UIImageView to render the rectangle. Click on the eye icon in the right-hand panel, in line with the statement imageView.drawRect(rect: invertedFaceRect), to preview the results.

An alternative to inverting the face rectangle would be to use a CGAffineTransform, such as:

var transform = CGAffineTransform(scaleX: 1, y: -1)
transform = transform.translatedBy(x: 0, y: -imageSize.height)
let invertedFaceRect = faceRect.applying(transform)

This approach leads to less code and therefore fewer chances of error, so it is the recommended approach; the long-hand approach was taken previously to help illuminate the details.

As a designer and builder of intelligent systems, it is your task to interpret these results and present them to the user. Some questions you'll want to ask yourself are as follows:

- What is an acceptable threshold of probability before setting a class as true?
- Can this threshold depend on the probabilities of other classes, to remove ambiguity? That is, if Sad and Happy both have a probability of 0.3, you can infer that the prediction is inaccurate, or at least not useful.
- Is there a way to accept multiple probabilities?
- Is it useful to expose the threshold to the user and have them manually set and/or tune it?

These are only a few of the questions you should ask; the specific questions and their answers will depend on your use case and users. At this point, we have everything we need to preprocess images and perform inference.

We briefly explored some use cases showing how emotion recognition could be applied. For a detailed overview of this experiment, check out our book, Machine Learning with Core ML, to further implement Core ML for visual-based applications using the principles of transfer learning and neural networks.

Amazon Rekognition can now 'recognize' faces in a crowd at real-time
5 cool ways Transfer Learning is being used today
My friend, the robot: Artificial Intelligence needs Emotional Intelligence

AWS machine learning: Learning AWS CLI to execute a simple Amazon ML workflow [Tutorial]
Melisha Dsouza
13 Sep 2018
15 min read
Using the AWS web interface to manage and run your projects is time-consuming. We will, therefore, start running our projects via the command line with the AWS Command Line Interface (AWS CLI). With just one tool to download and configure, multiple AWS services can be controlled from the command line, and they can be automated through scripts. The code files for this article are available on GitHub. This article is an excerpt from a book written by Alexis Perrier titled Effective Amazon Machine Learning.

Getting started and setting up

Creating a well-performing predictive model from raw data requires many trials and errors, much back and forth. Creating new features, cleaning up data, and trying out new parameters for the model are needed to ensure the robustness of the model. There is a constant back and forth between the data, the models, and the evaluations. Scripting this workflow via the AWS CLI will give us the ability to speed up the create, test, select loop.

Installing AWS CLI

In order to set up your CLI credentials, you need your access key ID and your secret access key. You can simply create them from the IAM console (https://console.aws.amazon.com/iam). Navigate to Users, select your IAM user name, and click on the Security credentials tab. Choose Create Access Key and download the CSV file. Store the keys in a secure location; we will need them in a few minutes to set up the AWS CLI. But first, we need to install the AWS CLI.

Docker environment – This tutorial will help you use the AWS CLI within a Docker container: https://blog.flowlog-stats.com/2016/05/03/aws-cli-in-a-docker-container/. A Docker image for running the AWS CLI is available at https://hub.docker.com/r/fstab/aws-cli/.

There is no need to rewrite the AWS documentation on how to install the AWS CLI. It is complete and up to date, and available at http://docs.aws.amazon.com/cli/latest/userguide/installing.html. In a nutshell, installing the CLI requires you to have Python and pip already installed. Then, run the following:

$ pip install --upgrade --user awscli

Add AWS to your $PATH:

$ export PATH=~/.local/bin:$PATH

Reload the bash configuration file (this is for OSX):

$ source ~/.bash_profile

Check that everything works with the following command:

$ aws --version

You should see something similar to the following output:

aws-cli/1.11.47 Python/3.5.2 Darwin/15.6.0 botocore/1.5.10

Once installed, we need to configure the AWS CLI. Type:

$ aws configure

Now input the access keys you just created:

$ aws configure
AWS Access Key ID [None]: ABCDEF_THISISANEXAMPLE
AWS Secret Access Key [None]: abcdefghijk_THISISANEXAMPLE
Default region name [None]: us-west-2
Default output format [None]: json

Choose the region that is closest to you and the format you prefer (JSON, text, or table); JSON is the default format. The aws configure command creates two files: a config file and a credentials file. On OSX, the files are ~/.aws/config and ~/.aws/credentials. You can directly edit these files to change your access or configuration. You will need to create different profiles if you need to access multiple AWS accounts.
You will need to create different profiles if you need to access multiple AWS accounts. You can do so via the aws configure command:

$ aws configure --profile user2

You can also do so directly in the config and credentials files. The config file, ~/.aws/config, looks as follows:

[default]
output = json
region = us-east-1
[profile user2]
output = text
region = us-west-2

You can edit the credentials file, ~/.aws/credentials, as follows:

[default]
aws_access_key_id = ABCDEF_THISISANEXAMPLE
aws_secret_access_key = abcdefghijk_THISISANEXAMPLE
[user2]
aws_access_key_id = ABCDEF_ANOTHERKEY
aws_secret_access_key = abcdefghijk_ANOTHERKEY

Refer to the AWS CLI setup page for more in-depth information: http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html

Picking up CLI syntax

The overall format of any AWS CLI command is as follows:

$ aws <service> [options] <command> <subcommand> [parameters]

Here, the terms are as follows:

<service>: The name of the service you are managing: S3, machine learning, EC2, and so on
[options]: Allows you to set the region, the profile, and the output of the command
<command> <subcommand>: The actual command you want to execute
[parameters]: The parameters for these commands

A simple example will help you understand the syntax better. To list the content of an S3 bucket named aml.packt, the command is as follows:

$ aws s3 ls aml.packt

Here, s3 is the service, ls is the command, and aml.packt is the parameter. The aws help command will output a list of all available services. There are many more examples and explanations in the AWS documentation, available at http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-using.html.

Passing parameters using JSON files

For some services and commands, the list of parameters can become long and difficult to check and maintain. For instance, in order to create an Amazon ML model via the CLI, you need to specify at least seven different elements: the model ID, name, and type, the model's parameters, the ID of the training datasource, and the recipe and its URI (see aws machinelearning create-ml-model help). When possible, we will use the CLI's ability to read parameters from a JSON file instead of specifying them on the command line.

The AWS CLI also offers a way to generate a JSON template, which you can then fill in with the right parameters. To generate that JSON parameter file model (the JSON skeleton), simply add --generate-cli-skeleton after the command name. For instance, to generate the JSON skeleton for the create model command of the machine learning service, write the following:

$ aws machinelearning create-ml-model --generate-cli-skeleton

This will give the following output:

{
    "MLModelId": "",
    "MLModelName": "",
    "MLModelType": "",
    "Parameters": {
        "KeyName": ""
    },
    "TrainingDataSourceId": "",
    "Recipe": "",
    "RecipeUri": ""
}

You can then configure this to your liking. To have the skeleton command write a JSON file rather than simply print the skeleton in the terminal, add > filename.json:

$ aws machinelearning create-ml-model --generate-cli-skeleton > filename.json

This will create a filename.json file containing the JSON template. Once all the required parameters are specified, you create the model with the following command (assuming filename.json is in the current folder):

$ aws machinelearning create-ml-model file://filename.json
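As a side note, if you have jq installed, you can also fill in skeleton fields directly from the shell instead of editing the file by hand. This is a minimal sketch of that pattern; the field values here are placeholders:

$ aws machinelearning create-ml-model --generate-cli-skeleton \
    | jq '.MLModelId = "my_model_001" | .MLModelName = "[MDL] Example" | .MLModelType = "REGRESSION"' \
    > filename.json

This keeps the whole workflow scriptable, which is the point of moving to the CLI in the first place.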
Before we dive further into the machine learning workflow via the CLI, we need to introduce the dataset we will be using in this chapter.

Introducing the Ames Housing dataset

We will use the Ames Housing dataset, which was compiled by Dean De Cock for use in data science education. It is a great alternative to the popular but older Boston Housing dataset. The Ames Housing dataset is used in the Advanced Regression Techniques challenge on the Kaggle website: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/. The original version of the dataset is available at http://www.amstat.org/publications/jse/v19n3/decock/AmesHousing.xls and in the GitHub repository for this chapter. For more information on the genesis of this dataset and an in-depth explanation of the different variables, read the paper by Dean De Cock, available in PDF at https://ww2.amstat.org/publications/jse/v19n3/decock.pdf.

We will start by splitting the dataset into a train and a validate set and build a model on the train set. Both train and validate sets are available in the GitHub repository as ames_housing_training.csv and ames_housing_validate.csv. The entire dataset is in the ames_housing.csv file.

Splitting the dataset with shell commands

We will use shell commands to shuffle, split, and create training and validation subsets of the Ames Housing dataset:

First, extract the header line into a separate file, ames_housing_header.csv:

$ head -n 1 ames_housing.csv > ames_housing_header.csv

Then write all the lines after the header into a new, headerless file:

$ tail -n +2 ames_housing.csv > ames_housing_nohead.csv

Then randomly shuffle the rows in place (gshuf is the OSX equivalent of the Linux shuf shell command; it can be installed via brew install coreutils):

$ gshuf ames_housing_nohead.csv -o ames_housing_nohead.csv

Extract the first 2,050 rows as the training file and the last 880 rows as the validation file:

$ head -n 2050 ames_housing_nohead.csv > ames_housing_training.csv
$ tail -n 880 ames_housing_nohead.csv > ames_housing_validate.csv

Finally, add the header back into both the training and validation files:

$ cat ames_housing_header.csv ames_housing_training.csv > tmp.csv
$ mv tmp.csv ames_housing_training.csv
$ cat ames_housing_header.csv ames_housing_validate.csv > tmp.csv
$ mv tmp.csv ames_housing_validate.csv

A simple project using AWS CLI

We are now ready to execute a simple Amazon ML workflow using the CLI. This includes the following:

Uploading files to S3
Creating a datasource and the recipe
Creating a model
Creating an evaluation
Creating batch and real-time predictions

Let's start by uploading the training and validation files to S3. In the following lines, replace the bucket name aml.packt with your own bucket name. To upload the files to the S3 location s3://aml.packt/data/ch8/, run the following command lines:

$ aws s3 cp ./ames_housing_training.csv s3://aml.packt/data/ch8/
upload: ./ames_housing_training.csv to s3://aml.packt/data/ch8/ames_housing_training.csv

$ aws s3 cp ./ames_housing_validate.csv s3://aml.packt/data/ch8/
upload: ./ames_housing_validate.csv to s3://aml.packt/data/ch8/ames_housing_validate.csv
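To double-check that both files landed where expected, you can list the prefix (same example bucket as above):

$ aws s3 ls s3://aml.packt/data/ch8/

You should see ames_housing_training.csv and ames_housing_validate.csv in the listing.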
That's it for the S3 part. Now let's explore the CLI for Amazon's machine learning service.

An overview of Amazon ML CLI commands

All Amazon ML CLI commands are documented at http://docs.aws.amazon.com/cli/latest/reference/machinelearning/. There are 30 commands, which can be grouped by object and action. You can perform the following actions:

create: creates the object
describe: searches objects given some parameters (location, dates, names, and so on)
get: given an object ID, returns information
update: given an object ID, updates the object
delete: deletes an object

These can be performed on the following elements:

datasource: create-data-source-from-rds, create-data-source-from-redshift, create-data-source-from-s3, describe-data-sources, delete-data-source, get-data-source, update-data-source
ml-model: create-ml-model, describe-ml-models, get-ml-model, delete-ml-model, update-ml-model
evaluation: create-evaluation, describe-evaluations, get-evaluation, delete-evaluation, update-evaluation
batch prediction: create-batch-prediction, describe-batch-predictions, get-batch-prediction, delete-batch-prediction, update-batch-prediction
real-time endpoint: create-realtime-endpoint, delete-realtime-endpoint, predict

You can also handle tags and set waiting times. Note that the AWS CLI gives you the ability to create datasources from S3, Redshift, and RDS, while the web interface only allows datasources from S3 and Redshift.

Creating the datasource

We will start by creating the datasource. Let's first see what parameters are needed by generating the following skeleton:

$ aws machinelearning create-data-source-from-s3 --generate-cli-skeleton

This generates the following JSON object:

{
    "DataSourceId": "",
    "DataSourceName": "",
    "DataSpec": {
        "DataLocationS3": "",
        "DataRearrangement": "",
        "DataSchema": "",
        "DataSchemaLocationS3": ""
    },
    "ComputeStatistics": true
}

The different parameters are mostly self-explanatory, and further information can be found in the AWS documentation at http://docs.aws.amazon.com/cli/latest/reference/machinelearning/create-data-source-from-s3.html.

A word on the schema: when creating a datasource from the web interface, you can use a wizard that guides you through the creation of the schema. The wizard facilitates the process by guessing the type of the variables, thus making available a default schema that you can then modify. There is no default schema available via the AWS CLI; you have to define the entire schema yourself, either inline in JSON format in the DataSchema field, or by uploading a schema file to S3 and specifying its location in the DataSchemaLocationS3 field. Since our dataset has many variables (79), we cheated and used the wizard to create a default schema, which we uploaded to S3. Throughout the rest of the chapter, we will specify the schema location, not its JSON definition (a truncated sketch of the schema file is shown after the datasource examples below).

In this example, we will create the following datasource parameter file, dsrc_ames_housing_001.json:

{
    "DataSourceId": "ch8_ames_housing_001",
    "DataSourceName": "[DS] Ames Housing 001",
    "DataSpec": {
        "DataLocationS3": "s3://aml.packt/data/ch8/ames_housing_training.csv",
        "DataSchemaLocationS3": "s3://aml.packt/data/ch8/ames_housing.csv.schema"
    },
    "ComputeStatistics": true
}

For the validation subset (save to dsrc_ames_housing_002.json):

{
    "DataSourceId": "ch8_ames_housing_002",
    "DataSourceName": "[DS] Ames Housing 002",
    "DataSpec": {
        "DataLocationS3": "s3://aml.packt/data/ch8/ames_housing_validate.csv",
        "DataSchemaLocationS3": "s3://aml.packt/data/ch8/ames_housing.csv.schema"
    },
    "ComputeStatistics": true
}
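For reference, here is a truncated, hypothetical sketch of what the ames_housing.csv.schema file looks like; the real file lists all of the dataset's columns, with the attribute types guessed by the wizard:

{
    "version": "1.0",
    "targetAttributeName": "salePrice",
    "dataFormat": "CSV",
    "dataFileContainsHeader": true,
    "attributes": [
        { "attributeName": "lotArea", "attributeType": "NUMERIC" },
        { "attributeName": "neighborhood", "attributeType": "CATEGORICAL" },
        { "attributeName": "salePrice", "attributeType": "NUMERIC" }
    ]
}

The attribute names here are illustrative; the actual names must match the CSV header exactly.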
Since we have already split our data into a training and a validation set, there is no need to specify the DataRearrangement field. Alternatively, we could have avoided splitting our dataset and specified a DataRearrangement on the original dataset, assuming it had already been shuffled. Note that the splitting instruction is itself a JSON string embedded in the JSON file, so its inner quotes must be escaped. For the training set (save to dsrc_ames_housing_003.json):

{
    "DataSourceId": "ch8_ames_housing_003",
    "DataSourceName": "[DS] Ames Housing training 003",
    "DataSpec": {
        "DataLocationS3": "s3://aml.packt/data/ch8/ames_housing_shuffled.csv",
        "DataRearrangement": "{\"splitting\":{\"percentBegin\":0,\"percentEnd\":70}}",
        "DataSchemaLocationS3": "s3://aml.packt/data/ch8/ames_housing.csv.schema"
    },
    "ComputeStatistics": true
}

For the validation set (save to dsrc_ames_housing_004.json):

{
    "DataSourceId": "ch8_ames_housing_004",
    "DataSourceName": "[DS] Ames Housing validation 004",
    "DataSpec": {
        "DataLocationS3": "s3://aml.packt/data/ch8/ames_housing_shuffled.csv",
        "DataRearrangement": "{\"splitting\":{\"percentBegin\":70,\"percentEnd\":100}}",
        "DataSchemaLocationS3": "s3://aml.packt/data/ch8/ames_housing.csv.schema"
    },
    "ComputeStatistics": true
}

Here, the ames_housing.csv file has previously been shuffled using the gshuf command and uploaded to S3:

$ gshuf ames_housing_nohead.csv -o ames_housing_nohead.csv
$ cat ames_housing_header.csv ames_housing_nohead.csv > tmp.csv
$ mv tmp.csv ames_housing_shuffled.csv
$ aws s3 cp ./ames_housing_shuffled.csv s3://aml.packt/data/ch8/

Note that we don't need all four of these datasources; 003 and 004 simply illustrate an alternative way to create them. We create the first datasource by running the following:

$ aws machinelearning create-data-source-from-s3 --cli-input-json file://dsrc_ames_housing_001.json

The command returns immediately; the datasource creation itself runs asynchronously. In return, we get the datasource ID we had specified:

{
    "DataSourceId": "ch8_ames_housing_001"
}

We can then obtain information on that datasource with the following:

$ aws machinelearning get-data-source --data-source-id ch8_ames_housing_001

This returns the following:

{
    "Status": "COMPLETED",
    "NumberOfFiles": 1,
    "CreatedByIamUser": "arn:aws:iam::178277xxxxxxx:user/alexperrier",
    "LastUpdatedAt": 1486834110.483,
    "DataLocationS3": "s3://aml.packt/data/ch8/ames_housing_training.csv",
    "ComputeStatistics": true,
    "StartedAt": 1486833867.707,
    "LogUri": "https://eml-prod-emr.s3.amazonaws.com/178277513911-ds-ch8_ames_housing_001/.....",
    "DataSourceId": "ch8_ames_housing_001",
    "CreatedAt": 1486030865.965,
    "ComputeTime": 880000,
    "DataSizeInBytes": 648150,
    "FinishedAt": 1486834110.483,
    "Name": "[DS] Ames Housing 001"
}

Note that we have access to the operation's log URI, which can be useful for analyzing the model training later on.

Creating the model

Creating the model with the create-ml-model command follows the same steps:

Generate the skeleton with the following:

$ aws machinelearning create-ml-model --generate-cli-skeleton > mdl_ames_housing_001.json

Write the configuration file:

{
    "MLModelId": "ch8_ames_housing_001",
    "MLModelName": "[MDL] Ames Housing 001",
    "MLModelType": "REGRESSION",
    "Parameters": {
        "sgd.shuffleType": "auto",
        "sgd.l2RegularizationAmount": "1.0E-06",
        "sgd.maxPasses": "100"
    },
    "TrainingDataSourceId": "ch8_ames_housing_001",
    "RecipeUri": "s3://aml.packt/data/ch8/recipe_ames_housing_001.json"
}

Note the parameters of the algorithm: here, we used mild L2 regularization and 100 passes.

Launch the model creation with the following:

$ aws machinelearning create-ml-model --cli-input-json file://mdl_ames_housing_001.json

The model ID is returned:

{
    "MLModelId": "ch8_ames_housing_001"
}
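Amazon ML queues dependent operations for you (more on this below), but if a script needs to wait for the model itself, for example before fetching an evaluation's results, one minimal way to block until training completes is to poll the status with a JMESPath query. This is a sketch; the 30-second interval is arbitrary, and a production script should also bail out on a FAILED status:

$ until aws machinelearning get-ml-model --ml-model-id ch8_ames_housing_001 \
    --query Status --output text | grep -q COMPLETED; do sleep 30; done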
The get-ml-model command gives you a status update on the operation, as well as the URL of the log:

$ aws machinelearning get-ml-model --ml-model-id ch8_ames_housing_001

The watch command allows you to repeat a shell command every n seconds. To get the status of the model creation every 10 seconds, just write the following:

$ watch -n 10 aws machinelearning get-ml-model --ml-model-id ch8_ames_housing_001

The output of get-ml-model will be refreshed every 10 seconds until you kill it.

It is not possible to create the default recipe via the AWS CLI commands. You can always define a blank recipe that does not carry out any transformation on the data; however, the default recipe has been shown to have a positive impact on model performance. To obtain this default recipe, we created it via the web interface and copied it into a file that we uploaded to S3. The resulting file, recipe_ames_housing_001.json, is available in our GitHub repository. Its content is not reproduced here for brevity, as the dataset has 79 variables.

Evaluating our model with create-evaluation

Our model is now trained, and we would like to evaluate it on the validation subset. For that, we will use the create-evaluation CLI command:

Generate the skeleton:

$ aws machinelearning create-evaluation --generate-cli-skeleton > eval_ames_housing_001.json

Configure the parameter file:

{
    "EvaluationId": "ch8_ames_housing_001",
    "EvaluationName": "[EVL] Ames Housing 001",
    "MLModelId": "ch8_ames_housing_001",
    "EvaluationDataSourceId": "ch8_ames_housing_002"
}

Launch the evaluation creation:

$ aws machinelearning create-evaluation --cli-input-json file://eval_ames_housing_001.json

Get the evaluation information:

$ aws machinelearning get-evaluation --evaluation-id ch8_ames_housing_001

From that output, we get the performance of the model in the form of the RMSE:

"PerformanceMetrics": {
    "Properties": {
        "RegressionRMSE": "29853.250469108018"
    }
}

The value may seem big, but it is relative to the range of the salePrice variable, which has a mean of 181300.0 and a standard deviation of 79886.7. So an RMSE of 29853.2 is a decent score.

You don't have to wait for the datasource creation to be completed in order to launch the model training. Amazon ML will simply wait for the parent operation to conclude before launching the dependent one. This makes chaining operations possible.

At this point, we have a trained and evaluated model. In this tutorial, we walked through the detailed steps for getting started with the AWS CLI and implemented a simple project to get comfortable with it. To understand how to leverage Amazon's powerful platform for your predictive analytics needs, check out the book Effective Amazon Machine Learning.

Part 1. Learning AWS CLI
Part 2. ChatOps with Slack and AWS CLI
Automate tasks using Azure PowerShell and Azure CLI [Tutorial]

How to predict viral content using random forest regression in Python [Tutorial]

Prasad Ramesh
12 Sep 2018
9 min read
Understanding sharing behavior is big business. As consumers become blind to traditional advertising, the push is to go beyond simple pitches to tell engaging stories. In this article, we will build a predictive content scoring model that predicts whether content will go viral, using random forest regression. This article is an excerpt from a book written by Alexander T. Combs titled Python Machine Learning Blueprints: Intuitive data projects you can relate to. You can download the code and other relevant files used in this article from this GitHub link.

What does research tell us about content virality?

Increasingly, the success of these endeavors is measured in social shares. Why go to so much trouble? Because as a brand, every share that I receive represents another consumer that I've reached, all without spending an additional cent. Due to this value, several researchers have examined sharing behavior in the hopes of understanding what motivates it. Among the reasons researchers have found:

To provide practical value to others (an altruistic motive)
To associate ourselves with certain ideas and concepts (an identity motive)
To bond with others around a common emotion (a communal motive)

With regard to the last motive, one particularly well-designed study looked at 7,000 pieces of content from the New York Times to examine the effect of emotion on sharing. The researchers found that simple emotional sentiment was not enough to explain sharing behavior, but when combined with emotional arousal, the explanatory power was greater. For example, while sadness has a strong negative valence, it is considered a low-arousal state; anger, on the other hand, has a negative valence paired with a high-arousal state. As such, stories that sadden the reader tend to generate far fewer shares than anger-inducing stories.

Source: "What Makes Online Content Viral?" by Jonah Berger and Katherine L. Milkman

Building a predictive content scoring model

Let's create a model that can estimate the share counts for a given piece of content. Ideally, we would have a much larger sample of content, especially content with more typical share counts; however, we'll make do with what we have here. We're going to use an algorithm called random forest regression and attempt to predict the share counts directly. We could bucket our share counts into ranges, but it is preferable to use regression when dealing with continuous variables.

To begin, we'll create a bare-bones model using the number of images, the site, and the word count as features, and we'll train our model on the number of Facebook likes. We'll first import scikit-learn, then prepare our data by removing the rows with nulls, resetting our index, and finally splitting the frame into our training and testing sets:

from sklearn.ensemble import RandomForestRegressor

all_data = dfc.dropna(subset=['img_count', 'word_count'])
all_data.reset_index(inplace=True, drop=True)

train_index = []
test_index = []
for i in all_data.index:
    result = np.random.choice(2, p=[.65,.35])
    if result == 1:
        test_index.append(i)
    else:
        train_index.append(i)

We used a random number generator with probabilities set to approximately 2/3 and 1/3 to determine which row items (based on their index) would be placed in each set. Setting the probabilities this way ensures that we get approximately twice the number of rows in our training set as in the test set.
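As an aside, a more idiomatic way to get a similar split is scikit-learn's train_test_split; this hypothetical equivalent (not part of the original code) uses test_size=0.35 to mirror the proportions above, with a fixed seed for reproducibility:

from sklearn.model_selection import train_test_split

# Split the DataFrame directly; random_state makes the split reproducible
train_df, test_df = train_test_split(all_data, test_size=0.35, random_state=42)

The manual index loop above does have one advantage: it keeps the index bookkeeping explicit, which the rest of the walkthrough relies on.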
We can see this as follows:

print('test length:', len(test_index), '\ntrain length:', len(train_index))

The preceding code prints the lengths of the two index lists.

Now, we'll continue preparing our data. Next, we need to set up categorical encoding for our sites. Currently, our DataFrame object represents the name of each site with a string, so we need to use dummy encoding. This creates a column for each site: if a row is for that particular site, that column is filled in with 1, and all the other site columns are filled in with 0. Let's do that now:

sites = pd.get_dummies(all_data['site'])
sites

The output of the sites frame shows the dummy encoding. We'll now continue by splitting our data into training and test sets, as follows:

y_train = all_data.iloc[train_index]['fb'].astype(int)
X_train_nosite = all_data.iloc[train_index][['img_count', 'word_count']]
X_train = pd.merge(X_train_nosite, sites.iloc[train_index], left_index=True, right_index=True)

y_test = all_data.iloc[test_index]['fb'].astype(int)
X_test_nosite = all_data.iloc[test_index][['img_count', 'word_count']]
X_test = pd.merge(X_test_nosite, sites.iloc[test_index], left_index=True, right_index=True)

With this, we've set up our X_train, X_test, y_train, and y_test variables. We'll now use them to build our model:

clf = RandomForestRegressor(n_estimators=1000)
clf.fit(X_train, y_train)

With these two lines of code, we have trained our model. Let's now use it to predict the Facebook likes for our testing set:

y_pred = clf.predict(X_test)
y_actual = y_test

deltas = pd.DataFrame(list(zip(y_pred, y_actual, (y_pred - y_actual)/(y_actual))), columns=['predicted', 'actual', 'delta'])
deltas

Here, we see the predicted value, the actual value, and the difference as a percentage. Let's take a look at the descriptive stats for the deltas:

deltas['delta'].describe()

Our median error is 0! Well, unfortunately, this isn't a particularly useful bit of information, as errors fall on both sides, positive and negative, and tend to average out, which is what we see here. Let's now look at a more informative metric to evaluate our model: root mean square error as a percentage of the actual mean.

To illustrate why this is more useful, let's run the following scenario on two sample series:

a = pd.Series([10,10,10,10])
b = pd.Series([12,8,8,12])

np.sqrt(np.mean((b-a)**2))/np.mean(a)

This results in 0.2. Now compare this to the plain mean of the differences:

(b-a).mean()

This results in 0.0. Clearly, the former is the more meaningful statistic. Let's now run the RMSE metric for our model:

np.sqrt(np.mean((y_pred-y_actual)**2))/np.mean(y_actual)

This gives us our baseline error rate.
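Before adding more features, it can be instructive to see which of the current ones the forest actually leans on. This step isn't part of the original walkthrough, but RandomForestRegressor exposes impurity-based feature importances that line up with the columns of X_train:

import pandas as pd

# Rank the model's features by their learned importance
importances = pd.Series(clf.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False).head(10))

If word_count or img_count dominates here, that hints at where additional feature engineering is likely to pay off.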
Let's now add features that count the words in each title and see if they help our model. We'll use a count vectorizer to do this. Much like what we did with the site names, we'll transform individual words and n-grams into features:

from sklearn.feature_extraction.text import CountVectorizer

vect = CountVectorizer(ngram_range=(1,3))
X_titles_all = vect.fit_transform(all_data['title'])

X_titles_train = X_titles_all[train_index]
X_titles_test = X_titles_all[test_index]

X_test = pd.merge(X_test, pd.DataFrame(X_titles_test.toarray(), index=X_test.index), left_index=True, right_index=True)
X_train = pd.merge(X_train, pd.DataFrame(X_titles_train.toarray(), index=X_train.index), left_index=True, right_index=True)

In these lines, we joined our existing features to our new n-gram features. Let's now train our model and see whether we have any improvement:

clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

deltas = pd.DataFrame(list(zip(y_pred, y_actual, (y_pred - y_actual)/(y_actual))), columns=['predicted', 'actual', 'delta'])
deltas

Checking our error again, we run the following:

np.sqrt(np.mean((y_pred-y_actual)**2))/np.mean(y_actual)

The error comes out slightly lower than before, so it appears that we have a modestly improved model. Now, let's add one more feature, the word count of the title, as follows:

all_data = all_data.assign(title_wc = all_data['title'].map(lambda x: len(x.split(' '))))

X_train = pd.merge(X_train, all_data[['title_wc']], left_index=True, right_index=True)
X_test = pd.merge(X_test, all_data[['title_wc']], left_index=True, right_index=True)

clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

np.sqrt(np.mean((y_pred-y_actual)**2))/np.mean(y_actual)

It appears that each feature has modestly improved our model. There are certainly more features that we could add. For example, we could add the day of the week and the hour of the posting, we could determine whether the article is a listicle by running a regex on the headline, or we could examine the sentiment of each article. This only begins to touch on the features that could be important for modeling virality; we would certainly need to go much further to continue reducing the error in our model.

We have also performed only the most cursory testing of our model. Each measurement should be run multiple times to get a more accurate representation of the true error rate. It is possible that there is no statistically discernible difference between our last two models, as we only performed one test.

To summarize, we learned how to build a model that predicts content virality using random forest regression. To know more about this and other machine learning projects in Python, check out Python Machine Learning Blueprints: Intuitive data projects you can relate to.

Writing web services with functional Python programming [Tutorial]
Visualizing data in R and Python using Anaconda [Tutorial]
Python 3.7 beta is available as the second generation Google App Engine standard runtime