Data | 0 articles | Tech News, Tutorials & Expert Insights

article-image-nips-2017-special-machine-learning-genomics-bridging-gap-research-clinical-trial-success-brendan-frey

14 Dec 2017

10 min read

NIPS 2017 Special: How machine learning for genomics is bridging the gap between research and clinical trial success by Brendan Frey

14 Dec 2017

Brendan Frey is the founder and CEO of Deep Genomics. He is the professor of engineering and medicine at the University of Toronto. His major work focuses on using machine learning to model genome biology and understand genetic disorders. This article attempts to bring our readers to Brendan’s Keynote speech at NIPS 2017. It highlights how the human genome can be reprogrammed using Machine Learning and gives a glimpse into some of the significant work going on in this field. After reading this article, head over to the NIPS Facebook page for the complete keynote. All images in this article come from Brendan’s presentation slides and do not belong to us. 65% of people in their lifetime are at a risk of acquiring a disease with a genetic basis. 8 million births per year are estimated to have a serious genetic defect. According to the US healthcare system, the lifetime average cost of such a baby is 5M$ per child. These are just some statistics. If we also add the emotional component to this data, it gives us an alarming picture of the state of the healthcare industry today. According to a recent study, investing in pharma is no longer as lucrative as it used to be in the 90s. Funding for this sector is dwindling, which serves as a barrier to drug discovery, trial, and deployment. All of these, in turn, add to the rising cost of healthcare. Better to stuff your money in a mattress than put it in a pharmaceutical company! Genomics as a field is rich in data. Experts in genomics strive to determine complete DNA sequences and perform genetic mapping to help understand a disease. However, the main problem confronting Genome Biology and Genomics is the inability to decipher information from the human genome i.e. how to convert the genome into actionable information. What genes are made of and why sequencing matter Essentially each gene consists of a promoter region, which basically activates the gene. Following the promoter region, there are alternating Exons and Introns. Introns are almost 10,000 nucleotides long. Exons are relatively short, around 100 nucleotides long. In software terms, you can think of Exons as print statements. Exons are the part that ends up in proteins. Introns get cut out/removed. However, Introns contain crucial control logic. There are words embedded in introns that tells the cells how to cut and paste these exons together and make the gene. A DNA sequence is transcribed into RNA and then the RNA is processed in various ways to translate into proteins. However, the picture is much more complicated than this. Proteins go back and interact with the DNA. Proteins also interact with RNA. even, RNA interacts with protein. So all these entities are interrelated. All these technicalities and interrelationships make biology generally complex for researchers or even a group of researchers to fully understand and make sense of the data. Another way to look at this is, that in the recent years our ability to measure biology (fitbits, tabloids, genomes) and the ability to alter biology (DNA editing) has far surpassed our ability to understand biology. In short, in this field, we have become very good at collecting data but not as good with interpreting it. Machine Learning brought to genomes Deep Genomics, is a genetic medicine company that uses an AI-driven platform to support geneticists, molecular biologists and chemists in the development of genetic therapies. In 2010, Deep Genomics used machine learning to understand how words embedded in introns control print statements splicing which puts exons into proteins. They also used machine learning to reverse engineer to infer those code words using datasets. Another deep genomic research project talks about Protein-DNA binding data. There are datasets which allow you to measure interactions between protein and DNA and understand how that works. In this research, they took a dataset from Ray et al 2013 which consisted of 240,000 designed sequences and then evaluated which are the proteins that the sequence likes to stick to. Thus generating a big data matrix of proteins and designed sequences. The machine learning task here was to learn to take a sequence and predict whether the protein will bind to that sequence. How was this done? They took batches of data containing the designed sequences and fed it into a convolutional neural network. The CNN swept across those sequences to generate an intermediary representation. This representation was then fed into different layers of convolutional pooling and fully connected layers produced the output. The output was then compared to the measurement (the data matrix of proteins and designed sequences described earlier ) and backpropagation was used to update the parameters. One of the challenges was figuring out the right metric. For this, they compared the measured binding affinity (how much protein sticks to the sequence) to the output of the neural network and determined the right cos function for producing a neural network that is useful in practice. Usecase One of the use cases of this neural network is to identify the pathological mutation and fix them. The above illustration is a sequence of the cholesterol gene. The researchers artificially in silico looked at every possible mutation in the promoter. So for each nucleotide, say if the nucleotide had a value of A, they switched it to G, C, and T and for each of those possibilities, they ran the entire promoter through a neural network and looked at its output. The neural network then predicted the mutations that will disrupt the protein binding. The heights of the letter showed the measured binding affinity i.e the output of the neural network. The white box displays how much the mutation changed the output. Pink or bright red was used in case of positive mutation, blue in case of negative mutation and white for no change. This map was then compared with known results to see the accuracy and also make predictions never seen before in a clinical trial. As shown in the image, Blues, which are the potential or known harmful mutations have correctly fallen in the white spaces. But there are some unknown mutations as well. Machine learning output such as this can help researchers narrow their focus on learning about new diseases and also in diagnosing existing ones and treating them. Another group of researchers used a neural network to figure out the 3D structure or the chromatin interaction structure of the DNA. The data used was a matrix form and showed how strongly two parts of a DNA are likely to interact. The researchers trained a multilayer convolutional network to take as input the raw DNA sequence and also a signal called chromatin accessibility( tells how available the DNA is) and fed it into CNN. The output of that system predicted the probability of contact which is crucial for gene expression. Deep genomics: Using AI to build a new universe of digital medicines The founding belief at Deep Genomics is that the future of medicine will rely on artificial intelligence, because biology is too complex for humans to understand. The goal of deep genomics is to build AI platform for detecting and treating genetic disease. Genome tools Genome processing tools are tools which help in identification of mutation, e.g DeepVariant. At deep genomics, the tool used is called genomic kit which is 20 to 800 times faster than other existing tools. Disease mechanism Prediction This is about figuring whether the disease mechanism is pathological or the mutation which simply changes hair color. Therapeutic Development Helping patients by providing them with better medicines. These are the basics of any drug development procedure. We start with patient genetic data and clinical mutations. Then we find the disease mechanism and figure the mechanism of action( the steps to remediate the problem). However, the disease mechanism and mechanism of action of a potential drug may not be the inverse of one another. The next step is to design a drug. With Digital medicines, if we know the mechanism of action that we are trying to achieve, and we have ML systems, like the ones described earlier, we can simulate the effects of modifying DNA or RNA. Thus we can, in silico design the compound we want to test. Next, we test the experimental work in the wet lab to actually see if it alters the way in which ML systems predicted. The next thing is toxicity or off-target effects. This evaluates if the compound is going to change some other part of the genome or has some unintended consequences. Next, we have clinical trials. In case of clinical trials, the biggest problems facing pharmaceutical companies is patient’s gratification. Then comes the marketing and distribution of that drug which is highly costly. This includes marketing strategies to convince people to buy those drugs, insurance companies to pay for them, and legal companies to deal with litigations. Here’s how long it took Ionis and Biogen, to develop Spinraza, which is a drug for curing Spinal Muscular Atrophy (SMA). It is the most effective drug for curing SMA. It has saved hundreds of lives already. However, it costs 750,000$ per child per year. Why does it cost so much? If we look at the timeline of the development of Spinraza, the initial period of testing was quite long. The goal of deep Genomics is to use ML to accelerate the research period of drugs such as Spinraza from 8 years down to a couple of years. They also aim to use AI to accelerate clinical trials, toxicity studies, and other aspects of drug development. The whole idea is to reduce the amount of time needed to develop the drug. Deep genomics uses AI to automate and accelerate each of these steps and make it fast and accurate. However, apart from AI, they also test compounds at their wet lab in human cells to see if they work. They also use Cloud Laboratory. At cloud lab, they upload a python script. Once uploaded, it specifies the experimental protocols and then robots conduct these experiments. These labs rapidly scale up the ability to do experiments, test compounds, and solve other problems. Earning trust of stakeholders One of the major issues ML systems face in the genomics industry is earning the trust of the stakeholders. These stakeholders include the patients, the physicians treating the patients, the insurance companies paying for the treatments, different technology providers, and the hospitals. Machine learning practitioners are also often criticized for producing black boxes, that are not open to interpretation. The way to gain this trust is to exactly figure out what these stakeholders need. For this, machine learning systems need to explain the intermediary steps of a prediction. For instance, instead of directly recommending double mastectomy, the system says you have a mutation, the mutation is going to cause splicing to go wrong, leading to malfunctioning protein, which is likely to lead to breast cancer. The likelihood is x%. The road ahead Researchers at Deep Genomics pare currently working primarily on Project Saturn. The idea is to use a Machine learning system to scan a vast space of 69 billion molecules all in silico and identify about a thousand active compounds. Active compounds allow us to manipulate cell biology. Think about it as 1000 control switches which we can turn and twist to adjust what is going inside a cell, a toolkit for therapeutic development. They plan to have 3 compounds in clinical trials within the next 3 years.

0
0
3248

article-image-big-data-analysis-using-googles-pagerank

Sugandha Lahoti

14 Dec 2017

8 min read

Getting started with big data analysis using Google's PageRank algorithm

Sugandha Lahoti

14 Dec 2017

8 min read

[box type="note" align="" class="" width=""]The article given below is a book excerpt from Java Data Analysis written by John R. Hubbard. Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the aim of discovering useful information. Java is one of the most popular languages to perform your data analysis tasks. This book will help you learn the tools and techniques in Java to conduct data analysis without any hassle. [/box] This post aims to help you learn how to analyse big data using Google’s PageRank algorithm. The term big data generally refers to algorithms for the storage, retrieval, and analysis of massive datasets that are too large to be managed by a single file server. Commercially, these algorithms were pioneered by Google, Google’s PageRank being one of them is considered in this article. Google PageRank algorithm Within a few years of the birth of the web in 1990, there were over a dozen search engines that users could use to search for information. Shortly after it was introduced in 1995, AltaVista became the most popular among them. These search engines would categorize web pages according to the topics that the pages themselves specified. But the problem with these early search engines was that unscrupulous web page writers used deceptive techniques to attract traffic to their pages. For example, a local rug-cleaning service might list "pizza" as a topic in their web page header, just to attract people looking to order a pizza for dinner. These and other tricks rendered early search engines nearly useless. To overcome the problem, various page ranking systems were attempted. The objective was to rank a page based upon its popularity among users who really did want to view its contents. One way to estimate that is to count how many other pages have a link to that page. For example, there might be 100,000 links to https://en.wikipedia.org/wiki/Renaissance, but only 100 to https://en.wikipedia.org/wiki/Ernest_Renan, so the former would be given a much higher rank than the latter. But simply counting the links to a page will not work either. For example, the rug-cleaning service could simply create 100 bogus web pages, each containing a link to the page they want users to view. In 1996, Larry Page and Sergey Brin, while students at Stanford University, invented their PageRank algorithm. It simulates the web itself, represented by a very large directed graph, in which each web page is represented by a node in the graph, and each page link is represented by a directed edge in the graph. The directed graph shown in the figure below could represent a very small network with the same properties: This has four nodes, representing four web pages, A, B, C, and D. The arrows connecting them represent page links. So, for example, page A has a link to each of the other three pages, but page B has a link only to A. To analyze this tiny network, we first identify its transition matrix, M : This square has 16 entries, mij, for 1 ≤ i ≤ 4 and 1 ≤ j ≤ 4. If we assume that a web crawler always picks a link at random to move from one page to another, then mij, equals the probability that it will move to node i from node j, (numbering the nodes A, B, C, and D as 1, 2, 3, and 4). So m12 = 1 means that if it's at node B, there's a 100% chance that it will move next to A. Similarly, m13 = m43 = ½ means that if it's at node C, there's a 50% chance of it moving to A and a 50% chance of it moving to D. Suppose a web crawler picks one of those four pages at random, and then moves to another page, once a minute, picking each link at random. After several hours, what percentage of the time will it have spent at each of the four pages? Here is a similar question. Suppose there are 1,000 web crawlers who obey that transition matrix as we've just described, and that 250 of them start at each of the four pages. After several hours, how many will be on each of the four pages? This process is called a Markov chain. It is a mathematical model that has many applications in physics, chemistry, computer science, queueing theory, economics, and even finance. The diagram in the above figure is called the state diagram for the process, and the nodes of the graph are called the states of the process. Once the state diagram is given, the meaning of the nodes (web pages, in this case) becomes irrelevant. Only the structure of the diagram defines the transition matrix M, and from that we can answer the question. A more general Markov chain would also specify transition probabilities between the nodes, instead of assuming that all transition choices are made at random. In that case, those transition probabilities become the non-zero entries of the M. A Markov chain is called irreducible if it is possible to get to any state from any other state. According to the mathematical theory of Markov chains, if the chain is irreducible, then we can compute the answer to the preceding question using the transition matrix. What we want is the steady state solution; that is, a distribution of crawlers that doesn't change. The crawlers themselves will change, but the number at each node will remain the same. To calculate the steady state solution mathematically, we first have to realize how to apply the transition matrix M. The fact is that if x = (x1 , x2 , x3 , x4 ) is the distribution of crawlers at one minute, and the next minute the distribution is y = (y1 , y2 , y3 , y4 ), then y = Mx , using matrix multiplication. So now, if x is the steady state solution for the Markov chain, then Mx = x. This vector equation gives us four scalar equations in four unknowns: One of these equations is redundant (linearly dependent). But we also know that x1 + x2 + x3 + x4 = 1, since x is a probability vector. So, we're back to four equations in four unknowns. The solution is: The point of that example is to show that we can compute the steady state solution to a static Markov chain by solving an n × n matrix equation, where n is the number of states. By static here, we mean that the transition probabilities mij do not change. Of course, that does not mean that we can mathematically compute the web. In the first place, n > 30,000,000,000,000 nodes! And in the second place, the web is certainly not static. Nevertheless, this analysis does give some insight about the web; and it clearly influenced the thinking of Larry Page and Sergey Brin when they invented the PageRank algorithm. The purpose of the PageRank algorithm is to rank the web pages according to some criteria that would resemble their importance, or at least their frequency of access. The original simple (pre-PageRank) idea was to count the number of links to each page and use something proportional to that count for the rank. Following that line of thought, we can imagine that, if x = (x1 , x2 ,..., xn )T is the page rank for the web (that is, if xj is the relative rank of page j and ∑xj = 1), then Mx = x, at least approximately. Another way to put that is that repeated applications of M to x should nudge x closer and closer to that (unattainable) steady state. That brings us (finally) to the PageRank formula: where ε is a very small positive constant, z is the vector of all 1s, and n is the number of nodes. The vector expression on the right defines the transformation function f which replaces a page rank estimate x with an improved page rank estimate. Repeated applications of this function gradually converge to the unknown steady state. Note that in the formula, f is a function of more than just x. There are really four inputs: x, M, ε , and n. Of course, x is being updated, so it changes with each iteration. But M, ε , and n change too. M is the transition matrix, n is the number of nodes, and ε is a coefficient that determines how much influence the z/n vector has. For example, if we set ε to 0.00005, then the formula becomes: This is how Google's PageRank algorithm can be utilized for the analysis of very large datasets. To learn how to implement this algorithm and various other machine learning algorithms for big data, data visualization, and more using Java, check out this book Java Data Analysis.

0
0
5574

article-image-how-to-build-a-cold-start-friendly-content-based-recommender-using-apache-spark-sql

Sunith Shetty

14 Dec 2017

12 min read

How to build a cold-start friendly content-based recommender using Apache Spark SQL

Sunith Shetty

14 Dec 2017

12 min read

[box type="note" align="" class="" width=""]This article is an excerpt taken from a book Big Data Analytics with Java written by Rajat Mehta. In this book, you will learn how to perform analytics on big data using Java, machine learning and other big data tools. [/box] From the below given article, you will learn how to create the content-based recommendation system using movielens dataset. Getting started with content-based recommendation systems In content-based recommendations, the recommendation systems check for similarity between the items based on their attributes or content and then propose those items to the end users. For example, if there is a movie and the recommendation system has to show similar movies to the users, then it might check for the attributes of the movie such as the director name, the actors in the movie, the genre of the movie, and so on or if there is a news website and the recommendation system has to show similar news then it might check for the presence of certain words within the news articles to build the similarity criteria. As such the recommendations are based on actual content whether in the form of tags, metadata, or content from the item itself (as in the case of news articles). Dataset For this article, we are using the very popular movielens dataset, which was collected by the GroupLens Research Project at the University of Minnesota. This dataset contains a list of movies that are rated by their customers. It can be downloaded from the site https:/grouplens.org/datasets/movielens. There are few files in this dataset, they are: u.item:This contains the information about the movies in the dataset. The attributes in this file are: Movie Id This is the ID of the movie Movie title This is the title of the movie Video release date This is the release date of the movie Genre (isAction, isComedy, isHorror) This is the genre of the movie. This is represented by multiple flats such as isAction, isComedy, isHorror, and so on and as such it is just a Boolean flag with 1 representing users like this genre and 0 representing user do not like this genre. u.data: This contains the data about the users rating for the movies that they have watched. As such there are three main attributes in this file and they are: Userid This is the ID of the user rating the movie Item id This is the ID of the movie Rating This is the rating given to the movie by the user u.user: This contains the demographic information about the users. The main attributes from this file are: User id This is the ID of the user rating the movie Age This is the age of the user Gender This is the rating given to the movie by the user ZipCode This is the zip code of the place of the user As the data is pretty clean and we are only dealing with a few parameters, we are not doing any extensive data exploration in this article for this dataset. However, we urge the users to try and practice data exploration on this dataset and try creating bar charts using the ratings given, and find patterns such as top-rated movies or top-rated action movies, or top comedy movies, and so on. Content-based recommender on MovieLens dataset We will build a simple content-based recommender using Apache Spark SQL and the MovieLens dataset. For figuring out the similarity between movies, we will use the Euclidean Distance. This Euclidean Distance would be run using the genre properties, that is, action, comedy, horror, and so on of the movie items and also run on the average rating for each movie. First is the usual boiler plate code to create the SparkSession object. For this we first build the Spark configuration object and provide the master which in our case is local since we are running this locally in our computer. Next using this Spark configuration object we build the SparkSession object and also provide the name of the application here and that is ContentBasedRecommender. SparkConf sc = new SparkConf().setMaster("local"); SparkSession spark = SparkSession .builder() .config(sc) .appName("ContentBasedRecommender") .getOrCreate(); Once the SparkSession is created, load the rating data from the u.data file. We are using the Spark RDD API for this, the user can feel free to change this code and use the dataset API if they prefer to use that. On this Java RDD of the data, we invoke a map function and within the map function code we extract the data from the dataset and populate the data into a Java POJO called RatingVO: JavaRDD<RatingVO> ratingsRDD = spark.read().textFile("data/movie/u.data").javaRDD() .map(row -> { RatingVO rvo = RatingVO.parseRating(row); return rvo; }); As you can see, the data from each row of the u.data dataset is passed to the RatingVO.parseRating method in the lambda function. This parseRating method is declared in the POJO RatingVO. Let's look at the RatingVO POJO first. For maintaining brevity of the code, we are just showing the first few lines of the POJO here: public class RatingVO implements Serializable { private int userId; private int movieId; private float rating; private long timestamp; private int like; As you can see in the preceding code we are storing movieId, rating given by the user and the userId attribute in this POJO. This class RatingVO contains one useful method parseRating as shown next. In this method, we parse the actual data row (that is tab separated) and extract the values of rating, movieId, and so on from it. In the method, do note the if...else block, it is here that we check whether the rating given is greater than three. If it is, then we take it as a 1 or like and populate it in the like attribute of RatingVO, else we mark it as 0 or dislike for the user: public static RatingVO parseRating(String str) { String[] fields = str.split("t"); ... int userId = Integer.parseInt(fields[0]); int movieId = Integer.parseInt(fields[1]); float rating = Float.parseFloat(fields[2]); if(rating > 3) return new RatingVO(userId, movieId, rating, timestamp,1); return new RatingVO(userId, movieId, rating, timestamp,0); } Once we have the RDD object built with the POJO's RatingVO contained per row, we are now ready to build our beautiful dataset object. We will use the createDataFrame method on the spark session and provide the RDD containing data and the corresponding POJO class, that is, RatingVO. We also register this dataset as a temporary view called ratings so that we can fire SQL queries on it: Dataset<Row> ratings = spark.createDataFrame(ratingsRDD, RatingVO.class); ratings.createOrReplaceTempView("ratings"); ratings.show(); The show method in the preceding code would print the first few rows of this dataset as shown next: Next we will fire a Spark SQL query on this view (ratings) and in this SQL query we will find the average rating in this dataset for each movie. For this, we will group by on the movieId in this query and invoke an average function on the rating column as shown here: Dataset<Row> moviesLikeCntDS = spark.sql("select movieId,avg(rating) likesCount from ratings group by movieId"); The moviesLikeCntDS dataset now contains the results of our group by query. Next we load the data for the movies from the u.item data file. As we did for users, we store this data for movies in a MovieVO POJO. This MovieVO POJO contains the data for the movies such as the MovieId, MovieTitle and it also stores the information about the movie genre such as action, comedy, animation, and so on. The genre information is stored as 1 or 0. For maintaining the brevity of the code, we are not showing the full code of the lambda function here: JavaRDD<MovieVO> movieRdd = spark.read().textFile("data/movie/u.item").javaRDD() .map(row -> { String[] strs = row.split("|"); MovieVO mvo = new MovieVO(); mvo.setMovieId(strs[0]); mvo.setMovieTitle(strs[1]); ... mvo.setAction(strs[6]); mvo.setAdventure(strs[7]); ... return mvo; }); As you can see in the preceding code we split the data row and from the split result, which is an array of strings, we extract our individual values and store in the MovieVO object. The results of this operation are stored in the movieRdd object, which is a Spark RDD. Next, we convert this RDD into a Spark dataset. To do so, we invoke the Spark createDataFrame function and provide our movieRdd to it and also supply the MovieVO POJO here. After creating the dataset for the movie RDD, we perform an important step here of combining our movie dataset with the moviesLikeCntDS dataset we created earlier (recall that moviesLikeCntDS dataset contains our movie ID and the average rating for that review in this movie dataset). We also register this new dataset as a temporary view so that we can fire Spark SQL queries on it: Dataset<Row> movieDS = spark.createDataFrame(movieRdd.rdd(), MovieVO.class).join(moviesLikeCntDS, "movieId"); movieDS.createOrReplaceTempView("movies"); Before we move further on this program, we will print the results of this new dataset. We will invoke the show method on this new dataset: movieDS.show(); This would print the results as (for brevity we are not showing the full columns in the screenshot): Now comes the turn for the meat of this content recommender program. Here we see the power of Spark SQL, we will now do a self join within a Spark SQL query to the temporary view movie. Here, we will make a combination of every movie to every other movie in the dataset except to itself. So if you have say three movies in the set as (movie1, movie2, movie3), this would result in the combinations (movie1, movie2), (movie1, movie3), (movie2, movie3), (movie2, movie1), (movie3, movie1), (movie3, movie2). You must have noticed by now that this query would produce duplicates as (movie1, movie2) is same as (movie2, movie1), so we will have to write separate code to remove those duplicates. But for now the code for fetching these combinations is shown as follows, for maintaining brevity we have not shown the full code: Dataset<Row> movieDataDS = spark.sql("select m.movieId movieId1,m.movieTitle movieTitle1,m.action action1,m.adventure adventure1, ... " + "m2.movieId movieId2,m2.movieTitle movieTitle2,m2.action action2,m2.adventure adventure2, ..." If you invoke show on this dataset and print the result, it would print a lot of columns and their first few values. We will show some of the values next (note that we show values for movie1 on the left-hand side and the corresponding movie2 on the right-hand side): Did you realize by now that on a big data dataset this operation is massive and requires extensive computing resources. Thankfully on Spark you can distribute this operation to run on multiple computer notes. For this, you can tweak the spark-submit job with the following parameters so as to extract the last bit of juice from all the clustered nodes: spark-submit executor-cores <Number of Cores> --num-executor <Number of Executors> The next step is the most important one in this content management program. We will run over the results of the previous query and from each row we will pull out the data for movie1 and movie2 and cross compare the attributes of both the movies and find the Euclidean Distance between them. This would show how similar both the movies are; the greater the distance, the less similar the movies are. As expected, we convert our dataset to an RDD and invoke a map function and within the lambda function for that map we go over all the dataset rows and from each row we extract the data and find the Euclidean Distance. For maintaining conciseness, we depict only a portion of the following code. Also note that we pull the information for both movies and store the calculated Euclidean Distance in a EuclidVO Java POJO object: JavaRDD<EuclidVO> euclidRdd = movieDataDS.javaRDD().map( row -> { EuclidVO evo = new EuclidVO(); evo.setMovieId1(row.getString(0)); evo.setMovieTitle1(row.getString(1)); evo.setMovieTitle2(row.getString(22)); evo.setMovieId2(row.getString(21)); int action = Math.abs(Integer.parseInt(row.getString(2)) – Integer.parseInt(row.getString(23)) ); ... double likesCnt = Math.abs(row.getDouble(20) - row.getDouble(41)); double euclid = Math.sqrt(action * action + ... + likesCnt * likesCnt); evo.setEuclidDist(euclid); return evo; }); As you can see in the bold text in the preceding code, we are calculating the Euclid Distance and storing it in a variable. Next, we convert our RDD to a dataset object and again register it in a temporary view movieEuclids and now are ready to fire queries for our predictions: Dataset<Row> results = spark.createDataFrame(euclidRdd.rdd(), EuclidVO.class); results.createOrReplaceTempView("movieEuclids"); Finally, we are ready to make our predictions using this dataset. Let's see our first prediction; let's find the top 10 movies that are closer to Toy Story, and this movie has movieId of 1. We will fire a simple query on the view movieEuclids to find this: spark.sql("select * from movieEuclids where movieId1 = 1 order by euclidDist asc").show(20); As you can see in the preceding query, we order by Euclidean Distance in ascending order as lesser distance means more similarity. This would print the first 20 rows of the result as follows: As you can see in the preceding results, they are not bad for content recommender system with little to no historical data. As you can see the first few movies returned are also famous animation movies like Toy Story and they are Aladdin, Winnie the Pooh, Pinocchio, and so on. So our little content management system we saw earlier is relatively okay. You can do many more things to make it even better by trying out more properties to compare with and using a different similarity coefficient such as Pearson Coefficient, Jaccard Distance, and so on. From the article, we learned how content recommenders can be built on zero to no historical data. These recommenders are based on the attributes present on the item itself using which we figure out the similarity with other items and recommend them. To know more about data analysis, data visualization and machine learning techniques you can refer to the book Big Data Analytics with Java.

0
0
6685

article-image-tinkering-with-ticks-in-matplotlib-2-0

Sugandha Lahoti

13 Dec 2017

6 min read

Tinkering with ticks in Matplotlib 2.0

Sugandha Lahoti

13 Dec 2017

6 min read

[box type="note" align="" class="" width=""]This is an excerpt from the book titled Matplotlib 2.x By Example written by Allen Chi Shing Yu, Claire Yik Lok Chung, and Aldrin Kay Yuen Yim,. The book covers basic know-how on how to create and customize plots by Matplotlib. It will help you learn to visualize geographical data on maps and implement interactive charts. [/box] The article talks about how you can manipulate ticks in Matplotlib 2.0. It includes steps to adjust tick spacing, customizing tick formats, trying out the ticker locator and formatter, and rotating tick labels. What are Ticks Ticks are dividers on an axis that help readers locate the coordinates. Tick labels allow estimation of values or, sometimes, labeling of a data series as in bar charts and box plots. Adjusting tick spacing Tick spacing can be adjusted by calling the locator methods: ax.xaxis.set_major_locator(xmajorLocator) ax.xaxis.set_minor_locator(xminorLocator) ax.yaxis.set_major_locator(ymajorLocator) ax.yaxis.set_minor_locator(yminorLocator) Here, ax refers to axes in a Matplotlib figure. Since set_major_locator() or set_minor_locator() cannot be called from the pyplot interface but requires an axis, we call pyplot.gca() to get the current axes. We can also store a figure and axes as variables at initiation, which is especially useful when we want multiple axes. Removing ticks NullLocator: No ticks Drawing ticks in multiples Spacing ticks in multiples of a given number is the most intuitive way. This can be done by using MultipleLocator space ticks in multiples of a given value. Automatic tick settings MaxNLocator: This finds the maximum number of ticks that will display nicely AutoLocator: MaxNLocator with simple defaults AutoMinorLocator: Adds minor ticks uniformly when the axis is linear Setting ticks by the number of data points IndexLocator: Sets ticks by index (x = range(len(y)) Set scaling of ticks by mathematical functions LinearLocator: Linear scale LogLocator: Log scale SymmetricalLogLocator: Symmetrical log scale, log with a range of linearity LogitLocator: Logit scaling Locating ticks by datetime There is a series of locators dedicated to displaying date and time: MinuteLocator: Locate minutes HourLocator: Locate hours DayLocator: Locate days of the month WeekdayLocator: Locate days of the week MonthLocator: Locate months, for example, 8 for August YearLocator: Locate years that in multiples RRuleLocator: Locate using matplotlib.dates.rrulewrapper The rrulewrapper is a simple wrapper around a dateutil.rrule (dateutil) that allows almost arbitrary date tick specifications AutoDateLocator: On autoscale, this class picks the best MultipleDateLocator to set the view limits and the tick locations Customizing tick formats Tick formatters control the style of tick labels. They can be called to set the major and minor tick formats on the x and y axes as follows: ax.xaxis.set_major_formatter( xmajorFormatter ) ax.xaxis.set_minor_formatter( xminorFormatter ) ax.yaxis.set_major_formatter( ymajorFormatter ) ax.yaxis.set_minor_formatter( yminorFormatter ) Removing tick labels NullFormatter: No tick labels Fixing labels FixedFormatter: Labels are set manually Setting labels with strings IndexFormatter: Take labels from a list of strings StrMethodFormatter: Use the string format method Setting labels with user-deﬁned functions FuncFormatter: Labels are set by a user-defined function Formatting axes by numerical values ScalarFormatter: The format string is automatically selected for scalars by default The following formatters set values for log axes: LogFormatter: Basic log axis LogFormatterExponent: Log axis using exponent = log_base(value) LogFormatterMathtext: Log axis using exponent = log_base(value) using Math text LogFormatterSciNotation: Log axis with scientific notation LogitFormatter: Probability formatter Trying out the ticker locator and formatter To demonstrate the ticker locator and formatter, here we use Netflix subscriber data as an example. Business performance is often measured seasonally. Television shows are even more "seasonal". Can we better show it in the timeline? import numpy as np import matplotlib.pyplot as plt import matplotlib.ticker as ticker """ Number for Netflix streaming subscribers from 2012-2017 Data were obtained from Statista on https://www.statista.com/statistics/250934/quarterly-number-of-netflix-stre aming-subscribers-worldwide/ on May 10, 2017. The data were originally published by Netflix in April 2017. """ # Prepare the data set x = range(2011,2018) y = [26.48,27.56,29.41,33.27,36.32,37.55,40.28,44.35, 48.36,50.05,53.06,57.39,62.27,65.55,69.17,74.76,81.5, 83.18,86.74,93.8,98.75] # quarterly subscriber count in millions # Plot lines with different line styles plt.plot(y,'^',label = 'Netflix subscribers',ls='-') # get current axes and store it to ax ax = plt.gca() # set ticks in multiples for both labels ax.xaxis.set_major_locator(ticker.MultipleLocator(4)) # set major marks # every 4 quarters, ie once a year ax.xaxis.set_minor_locator(ticker.MultipleLocator(1)) # set minor marks # for each quarter ax.yaxis.set_major_locator(ticker.MultipleLocator(10)) # ax.yaxis.set_minor_locator(ticker.MultipleLocator(2)) # label the start of each year by FixedFormatter ax.get_xaxis().set_major_formatter(ticker.FixedFormatter(x)) plt.legend() plt.show() From this plot, we see that Netflix has a pretty linear growth of subscribers from the year 2012 to 2017. We can tell the seasonal growth better after formatting the x axis in a quarterly manner. In 2016, Netflix was doing better in the latter half of the year. Any TV shows you watched in each season? Rotating tick labels A figure can get too crowded or some tick labels may get skipped when we have too many tick labels or if the label strings are too long. We can solve this by rotating the ticks, for example, by pyplot.xticks(rotation=60): import matplotlib.pyplot as plt import numpy as np import matplotlib as mpl mpl.style.use('seaborn') techs = ['Google Adsense','DoubleClick.Net','Facebook Custom Audiences','Google Publisher Tag', 'App Nexus'] y_pos = np.arange(len(techs)) # Number of websites using the advertising technologies # Data were quoted from builtwith.com on May 8th 2017 websites = [14409195,1821385,948344,176310,283766] plt.bar(y_pos, websites, align='center', alpha=0.5) # set x-axis tick rotation plt.xticks(y_pos, techs, rotation=25) plt.ylabel('Live site count') plt.title('Online advertising technologies usage') plt.show() Use pyplot.tight_layout() to avoid image clipping. Using rotated labels can sometimes result in image clipping, as follows, if you save the figure by pyplot.savefig(). You can call pyplot.tight_layout() before pyplot.savefig() to ensure a complete image output. We saw how ticks can be adjusted, customized, rotated and formatted in Matplotlib 2.0 for easy readability, labelling and estimation of values. To become well-versed with Matplotlib for your day to day work, check out this book Matplotlib 2.x By Example.

0
0
6690

article-image-how-to-customize-lines-and-markers-in-matplotlib-2-0

Sugandha Lahoti

13 Dec 2017

6 min read

How to Customize lines and markers in Matplotlib 2.0

Sugandha Lahoti

13 Dec 2017

6 min read

[box type="note" align="" class="" width=""]This article is an excerpt from a book by Allen Chi Shing Yu, Claire Yik Lok Chung, and Aldrin Kay Yuen Yim, titled Matplotlib 2.x By Example. The book illustrates methods and applications of various plot types through real world examples.[/box] In this post we demonstrate how you can manipulate Lines and Markers in Matplotlib 2.0. It covers steps to plot, customize, and adjust line graphs and markers. What are Lines and Markers Lines and markers are key components found among various plots. Many times, we may want to customize their appearance to better distinguish different datasets or for better or more consistent styling. Whereas markers are mainly used to show data, such as line plots and scatter plots, lines are involved in various components, such as grids, axes, and box outlines. Like text properties, we can easily apply similar settings for different line or marker objects with the same method. Lines Most lines in Matplotlib are drawn with the lines class, including the ones that display the data and those setting area boundaries. Their style can be adjusted by altering parameters in lines.Line2D. We usually set color, linestyle, and linewidth as keyword arguments. These can be written in shorthand as c, ls, and lw respectively. In the case of simple line graphs, these parameters can be parsed to the plt.plot() function: import numpy as np import matplotlib.pyplot as plt # Prepare a curve of square numbers x = np.linspace(0,200,100) # Prepare 100 evenly spaced numbers from # 0 to 200 y = x**2 # Prepare an array of y equals to x squared # Plot a curve of square numbers plt.plot(x,y,label = '$x^2$',c='burlywood',ls=('dashed'),lw=2) plt.legend() plt.show() With the preceding keyword arguments for line color, style, and weight, you get a woody dashed curve: Choosing dash patterns Whether a line will appear solid or with dashes is set by the keyword argument linestyle. There are a few simple patterns that can be set by the linestyle name or the corresponding shorthand. We can also define our own dash pattern: 'solid' or '-': Simple solid line (default) 'dashed' or '--': Dash strokes with equal spacing 'dashdot' or '-.': Alternate dashes and dots 'None', ' ', or '': No lines (offset, on-off-dash-seq): Customized dashes; we will demonstrate in the following advanced example Setting capstyle of dashes The cap of dashes can be rounded by setting the dash_capstyle parameter if we want to create a softer image such as in promotion: import numpy as np import matplotlib.pyplot as plt # Prepare 6 lines x = np.linspace(0,200,100) y1 = x*0.5 y2 = x y3 = x*2 y4 = x*3 y5 = x*4 y6 = x*5 # Plot lines with different dash cap styles plt.plot(x,y1,label = '0.5x', lw=5, ls=':',dash_capstyle='butt') plt.plot(x,y2,label = 'x', lw=5, ls='--',dash_capstyle='butt') plt.plot(x,y3,label = '2x', lw=5, ls=':',dash_capstyle='projecting') plt.plot(x,y4,label = '3x', lw=5, ls='--',dash_capstyle='projecting') plt.plot(x,y5,label = '4x', lw=5, ls=':',dash_capstyle='round') plt.plot(x,y6,label = '5x', lw=5, ls='--',dash_capstyle='round') plt.show() Looking closely, you can see that the top two lines are made up of rounded dashes. The middle two lines with projecting capstyle have closer spaced dashes than the lower two with butt one, given the same default spacing: Markers A marker is another type of important component for illustrating data, for example, in scatter plots, swarm plots, and time series plots. Choosing markers There are two groups of markers, unfilled markers and filled_markers. The full set of available markers can be found by calling Line2D.markers, which will output a dictionary of symbols and their corresponding marker style names. A subset of filled markers that gives more visual weight is under Line2D.filled_markers. Here are some of the most typical markers: 'o' : Circle 'x' : Cross '+' : Plus sign 'P' : Filled plus sign 'D' : Filled diamond 'S' : Square '^' : Triangle Here is a scatter plot of random numbers to illustrate the various marker types: import numpy as np import matplotlib.pyplot as plt from matplotlib.lines import Line2D # Prepare 100 random numbers to plot x = np.random.rand(100) y = np.random.rand(100) # Prepare 100 random numbers within the range of the number of # available markers as index # Each random number will serve as the choice of marker of the # corresponding coordinates markerindex = np.random.randint(0, len(Line2D.markers), 100) # Plot all kinds of available markers at random coordinates # for each type of marker, plot a point at the above generated # random coordinates with the marker type for k, m in enumerate(Line2D.markers): i = (markerindex == k) plt.scatter(x[i], y[i], marker=m) plt.show() The different markers suit different densities of data for better distinction of each point: Adjusting marker sizes We often want to change the marker sizes so as to make them clearer to read from a slideshow. Sometimes we need to adjust the markers to have a different numerical value of marker size to: import numpy as np import matplotlib.pyplot as plt import matplotlib.ticker as ticker # Prepare 5 lines x = np.linspace(0,20,10) y1 = x y2 = x*2 y3 = x*3 y4 = x*4 y5 = x*5 # Plot lines with different marker sizes plt.plot(x,y1,label = 'x', lw=2, marker='s', ms=10) # square size 10 plt.plot(x,y2,label = '2x', lw=2, marker='^', ms=12) # triangle size 12 plt.plot(x,y3,label = '3x', lw=2, marker='o', ms=10) # circle size 10 plt.plot(x,y4,label = '4x', lw=2, marker='D', ms=8) # diamond size 8 plt.plot(x,y5,label = '5x', lw=2, marker='P', ms=12) # filled plus sign # size 12 # get current axes and store it to ax ax = plt.gca() plt.show() After tuning the marker sizes, the different series look quite balanced: If all markers are set to have the same markersize value, the diamonds and squares may look heavier: Thus, we learned how to customize lines and markers in a Matplotlib plot for better visualization and styling. To know more about how to create and customize plots in Matplotlib, check out this book Matplotlib 2.x By Example.

0
0
20363

article-image-nips-2017-6-challenges-deep-learning-robotics-pieter-abbeel

Aaron Lazar

13 Dec 2017

10 min read

NIPS 2017 Special: 6 Key Challenges in Deep Learning for Robotics by Pieter Abbeel

Aaron Lazar

13 Dec 2017

10 min read

Pieter Abbeel is a professor at UC Berkeley and a former Research Scientist at OpenAI. His current research focuses on robotics and machine learning with particular focus on deep reinforcement learning, deep imitation learning, deep unsupervised learning, meta-learning, learning-to-learn, and AI safety. This article attempts to bring our readers to Pieter’s fantastic Keynote speech at NIPS 2017. It talks about the implementation of Deep Reinforcement Learning in Robotics, what challenges exist and how these challenges can be overcome. Once you’ve been through this article, we’re certain you’d be extremely interested in watching the entire video on the NIPS Facebook page. All images in this article come from his presentation slides and do not belong to us. Robotics in ML has been growing in leaps and bounds with several companies investing huge amounts to tie both these technologies together in the best way possible. Although, there are still several aspects that are not thoroughly accomplished when it comes to AI Robotics. Here are a few of them: Maximize Signal Extracted from Real World Experience Faster/Data efficient Reinforcement Learning Long Horizon Reasoning Taskability (Imitation Learning) Lifelong Learning (Continuous Adaptation) Leverage Simulation Maximise signal extracted from real world experience We need more real world data, so we need to extract as much signal from it. In the diagram below, are the different layers of machine learning that engineers perform. There are engineers who look at the entire cake and train the agent to take both the learning from the reward and from auxiliary signals. This is because using only Reinforcement Learning doesn’t give you a lot of signal. Is there then, a possibility of having a Reward Signal in RL that ties more RL into the system? There’s something known as Hindsight Experience Replay. The idea is to get a reward signal from any experience by assuming the goal equals whatever happened, and not just from success like in usual RL. For this, we need to assume that whatever the agent does is a success. We use Q-learning and instead of a standard Q function, we use multiple goals even though they were not really a goal when you were acting. Here, a replay buffer collects experience, Q-learning is then applied and a hindsight replay is performed to infuse a new reward for everything the agent has done. For various robotic tasks like pushing, sliding and picking and placing objects, this does very well. Faster Reinforcement Learning When we’re talking about faster RL, we’re talking about much more data efficient RL. Here is a diagram that demonstrates standard RL: An agent lets a robot perform an action in a particular environment or situation in order to achieve a reward. Here, the goal is to maximise the reward. As against Supervised Learning, there is no supervision as to whether the actions taken by the agents are right or wrong. That brings in a few additional challenges in RL, which are: Credit assignment: This is a major problem and is where you get the signal from in RL Stability: Because of the feedback loop, the system could destabilize and destroy itself Exploration: Doing things you’ve never done before when the only way to learn is based on what you’ve done before Despite this, there have been great improvements in Reinforcement Learning in the past few years, enabling AI systems to play games like Go, Dota, etc. It has also been implemented in building robots by NASA for planetary exploration. But the question still exists: “How good is learning?” In the game of pong, a human takes roughly 2 hours to learn what Deep Q-Network (DQN) learns in 40 hours! A more careful study reveals that after 15 minutes, humans tend to outperform DDQN that has trained for 115 hours. This is a tremendous gap in terms of learning efficiency. So, how do we overcome the challenge? Several fully generalised algorithms like Trust Region Policy Optimization (TRPO), DQN, Asynchronous Actor-Critic Agents (A3C) and Rainbow are available, meaning that they can be applied to any kind of environment. Although, only a very small subset of environments are actually encountered in the real world. Can we develop fast RL algorithms that take advantage of this situation? RL Agents can be reused to train various policies. The RL algorithm is developed to train the policy to adapt to a particular environment A. This can then be replicated to environment B and so on. Humans develop the RL algorithm and then rely on it to train the policy. Despite this, none of the algorithms are as good as human learners. Do we have an alternative then? Indeed, yes! Why not let the system learn not just the policy but the algorithm as well or in other words, the entire agent? Enter Meta-Reinforcement Learning In Meta-RL, the learning algorithm itself is being learnt. You could relate this to meta-programming, where one program is trained to write another. This process helps a system learn the world better so it can pick up on learning a new situation quicker. So how does this work? The system is faced with many environments, so that it learns the algorithms and then outputs a faster RL Agent. So, when faced with a new environment, it quickly adapts to it. For evaluating the actual performance, the Multi-armed bandits problem can be considered. Here’s the setting: each bandit has its own distribution over payouts, and in each episode you can choose one bandit. A good RL agent should be able to explore a sufficient number of bandits and exploit the best ones. We need to come up with an algorithm that pulls a higher probability of payoff, rather than a low probability. There are already several asymptotically optimal algorithms like Gittins index, UCB1, Thompson Sampling, that have been created to solve this problem. Here’s a comparison of some of them with the Meta-RL algorithm. The result is quite impressive. The Meta-RL algorithm is equally competitive with Gittins. In a case where the task is to obtain an on target running direction as well as attain the maximum speed, the agent when dropped into an environment is able to master the the task almost instantly. However, meta-learning succeeds only 2/3rd of the time. It doesn’t succeed the rest of the time due to two main reasons. Overfitting: You would usually tend to overfit to the current situation rather than generically fitting to situations Underfitting: This is when you don’t get enough signal to get any rewards The solution is to put a different structure underneath the system. Instead of using an RNN, we use a wavenet like architecture or maybe Simple Neural Attentive Meta-Learner (SNAIL). SNAIL is able to perform a bit better than RL2 in the same Bandits problem. Longer Horizon Reasoning We need to learn to reason over longer horizons than what canonical algorithms do. For this, we need hierarchy. For example, suppose a robot has to perform 10 tasks in a day. This would mean it has 10 timesteps per day? Each of these 10 tasks would have subtasks under them. Let’s assume that would make it a total of 1000 time steps. To perform these tasks, the robot would need footstep planning, which would amount to 100,000 time steps. Footsteps in turn require commands to be sent to motors, which would make it 100,000,000 time steps. This is a very long horizon. We can formulate this as a meta-learning problem. The agent has to solve a distribution of related long-horizon tasks with the goal of learning new tasks in the distribution quickly. If that is our objective, hierarchy would fall out. Taskability (Imitation Learning) There are several things we want from robots. We need to be able to tell them what to do and we can do this by giving them examples. This is called Imitation Learning, which can be successfully implemented to a variety of use cases. The idea is to collect many demonstrations, then train something from those demonstrations, then deploy the learn policy. The problem with this is that everytime there is a new task, you start from scratch. The solution to this problem is experience through several demonstrations, as in the case of humans. Although, instead of running the agent through several demos, it is trained completely on one, then showed a frame of a second demo, where it uses it to predict what the outcome would be. This is known as One-Shot imitation learning which is a part of supervised learning, where in several demonstrations are used to train the system to be able to handle any new environment it is put into. Lifelong learning (Continuous Adaptation) What we usually do in ML can be divided into two broad steps: Run Machine Learning Deploy it, which is a canonical way In this case, all the learning happens ahead of time, before the deployment. However, in real world cases, what you learn from past data might not work in the future. There is a necessity to learn during deployment, which is a lifelong learning spirit. This brings us to Continuous Adaptation. Can we train an agent to be good at non stationary environments? We need to find whether at the time of meta training the agent is able to adapt to a new/changing task. We can try changing the dynamics since it’s hard to do ML training in the real world. At the same time, we can also use competitor environments; which means you’re in an environment with other agents who are trying to beat your agent. The only way to succeed is to continuously adapt more quickly than the others. Leverage Simulation Simulation is very helpful and it’s not that expensive. It’s fast and scalable and lets you label more easily. However, the challenge is how to get useful things out of the simulator. One approach is to build realistic simulators. This is quite expensive. Another way is to use a close enough simulator that uses real world data through domain confusion or adaptation. It allows to learn from a small amount of real world data and is quite successful. Further, another approach to look at is Domain Randomisation, which is also working well in the real world. If the model sees enough simulated variations, the real world might appear like just the next simulator. This has worked in the context of using simulator data to train a quadcopter to avoid collision. Moreover, when pre trained from imagenet or just training in simulation, both performances were similar, after around 8000 examples. To conclude, the beauty of meta learning is that it enables the discovery of algorithms that are data driven, as against those that are created from pure human ingenuity. This requires more compute power, but several companies like Nvidia and Intel are working hard to overcome this challenge. This will surely power meta-learning to great heights to be implemented in robotics. While we figure out these above mentioned technical challenges of incorporating AI in robotics, some significant other challenges that we must focus on in parallel are safe learning, and value alignment among others.

0
0
3505

article-image-3-simple-steps-convert-mathematical-formula-model

Vedika Naik

13 Dec 2017

3 min read

3 simple steps to convert a mathematical formula to a model

Vedika Naik

13 Dec 2017

3 min read

[box type="note" align="" class="" width=""]The following is an excerpt from the book Scala for Machine Learning, Second Edition, written by Patrick R. Nicolas. The book will help you leverage Scala and Machine Learning to study and construct systems that can learn from data. [/box] This article aims to teach how a mathematical formula can be converted into a machine learning model in three easy steps: Stackable traits enable developers to follow a strict mathematical formalism while implementing a model in Scala. Scientists use a universally accepted template to solve mathematical problems: Declare the variables relevant to the problem. Define a model (equation, algorithm, formulas…) as the solution to the problem. Instantiate the variables and execute the model to solve the problem. Let's consider the example of the concept of kernel functions, a model that consists of the composition of two mathematical functions, and its potential implementation in Scala. Step 1 – variable declaration The implementation consists of wrapping (scope) the two functions into traits and defining these functions as abstract values. The mathematical formalism is as follows: The Scala implementation is represented here: type V = Vector[Double] trait F{ val f: V => V} trait G{ val g: V => Double } Step 2 – model definition The model is defined as the composition of the two functions. The stack of traits G, F describes the type of compatible functions that can be composed using the self-referenced constraint self: G with F: Formalism h = f o g The implementation is as follows: class H {self: G with F =>def apply(v:V): Double =g(f(v))} Step 3 – instantiation The model is executed once the variable f and g are instantiated. The formalism is as follows: The implementation is as follows: val h = new H with G with F { val f: V=>V = (v: V) =>v.map(exp(_)) val g: V => Double = (v: V) =>v.sum Note: Lazy value trigger In the preceding example, the value of h(v) = g(f(v)) can be automatically computed as soon as g and f are initialized, by declaring h as a lazy value. Clearly, Scala preserves the formalism of mathematical models, making it easier for scientists and developers to migrate their existing projects written in scientific oriented languages such as R. Note: Emulation of R Most data scientists use the language R to create models and apply learning strategies. They may consider Scala as an alternative to R in some cases, as Scala preserves the mathematical formalism used in models implemented in R. In this article we explained the concept preservation of mathematical formalism. This needs to be further extended to dynamic creation of workflows using traits. Design patterns, such as the cake pattern, can be used to do this. If you enjoyed this excerpt, be sure to check out the book Scala Machine Learning, Second Edition to gain solid foundational knowledge in machine learning with Scala.

0
0
3619

article-image-write-effective-stored-procedures-postgresql

Amey Varangaonkar

12 Dec 2017

8 min read

How to write effective Stored Procedures in PostgreSQL

Amey Varangaonkar

12 Dec 2017

8 min read

[box type="note" align="" class="" width=""]This article is an excerpt from the book Mastering PostgreSQL 9.6 written by Hans-Jürgen Schönig. PostgreSQL is an open source database used not only for handling large datasets (Big Data) but also as a JSON document database. It finds applications in the software as well as the web domains. This book will enable you to build better PostgreSQL applications and administer databases more efficiently. [/box] In this article, we explain the concept of Stored Procedures, and how to write them effectively in PostgreSQL 9.6. When it comes to stored procedures, PostgreSQL differs quite significantly from other database systems. Most database engines force you to use a certain programming language to write server-side code. Microsoft SQL Server offers Transact-SQL while Oracle encourages you to use PL/SQL. PostgreSQL does not force you to use a certain language but allows you to decide on what you know best and what you like best. The reason PostgreSQL is so flexible is actually quite interesting too in a historical sense. Many years ago, one of the most well-known PostgreSQL developers (Jan Wieck), who had written countless patches back in its early days, came up with the idea of using TCL as the server-side programming language. The trouble was simple—nobody wanted to use TCL and nobody wanted to have this stuff in the database engine. The solution to the problem was to make the language interface so flexible that basically any language can be integrated with PostgreSQL easily. Then, the CREATE LANGUAGE clause was born: test=# h CREATE LANGUAGE Command: CREATE LANGUAGE Description: define a new procedural language Syntax: CREATE [ OR REPLACE ] [ PROCEDURAL ] LANGUAGE name CREATE [ OR REPLACE ] [ TRUSTED ] [ PROCEDURAL ] LANGUAGE name HANDLER call_handler [ INLINE inline_handler ] [ VALIDATOR valfunction ] Nowadays, many different languages can be used to write stored procedures. The flexibility added to PostgreSQL back in the early days has really paid off, and so you can choose from a rich set of programming languages. How exactly does PostgreSQL handle languages? If you take a look at the syntax of the CREATE LANGUAGE clause, you will see a couple of keywords: HANDLER: This function is actually the glue between PostgreSQL and any external language you want to use. It is in charge of mapping PostgreSQL data structures to whatever is needed by the language and helps to pass the code around. VALIDATOR: This is the policeman of the infrastructure. If it is available, it will be in charge of delivering tasty syntax errors to the end user. Many languages are able to parse the code before actually executing it. PostgreSQL can use that and tell you whether a function is correct or not when you create it. Unfortunately, not all languages can do this, so in some cases, you will still be left with problems showing up at runtime. INLINE: If it is present, PostgreSQL will be able to run anonymous code blocks in this function. The anatomy of a stored procedure Before actually digging into a specific language, I want to talk a bit about the anatomy of a typical stored procedure. For demo purposes, I have written a function that just adds up two numbers: test=# CREATE OR REPLACE FUNCTION mysum(int, int) RETURNS int AS ' SELECT $1 + $2; ' LANGUAGE 'sql'; CREATE FUNCTION The first thing you can see is that the procedure is written in SQL. PostgreSQL has to know which language we are using, so we have to specify that in the definition. Note that the code of the function is passed to PostgreSQL as a string ('). That is somewhat noteworthy because it allows a function to become a black box to the execution machinery. In other database engines, the code of the function is not a string but is directly attached to the statement. This simple abstraction layer is what gives the PostgreSQL function manager all its power. Inside the string, you can basically use all that the programming language of your choice has to offer. In my example, I am simply adding up two numbers passed to the function. For this example, two integer variables are in use. The important part here is that PostgreSQL provides you with function overloading. In other words, the mysum(int, int) function is not the same as the mysum(int8, int8) function. PostgreSQL sees these things as two distinct functions. Function overloading is a nice feature; however, you have to be very careful not to accidentally deploy too many functions if your parameter list happens to change from time to time. Always make sure that functions that are not needed anymore are really deleted. The CREATE OR REPLACE FUNCTION clause will not change the parameter list. You can, therefore, use it only if the signature does not change. It will either error out or simply deploy a new function. Let's run the function: test=# SELECT mysum(10, 20); Mysum ------- 30 (1 row) The result is not really surprising. Introducing dollar quoting Passing code to PostgreSQL as a string is very flexible. However, using single quotes can be an issue. In many programming languages, single quotes show up frequently. To be able to use quotes, people have to escape them when passing the string to PostgreSQL. For many years this has been the standard procedure. Fortunately, those old times have passed by and new means to pass the code to PostgreSQL are available: test=# CREATE OR REPLACE FUNCTION mysum(int, int) RETURNS int AS $$ SELECT $1 + $2; $$ LANGUAGE 'sql'; CREATE FUNCTION The solution to the problem of quoting strings is called dollar quoting. Instead of using quotes to start and end strings, you can simple use $$. Currently, I am only aware of two languages that have assigned a meaning to $$. In Perl as well as in bash scripts, $$ represents the process ID. To overcome even this little obstacle, you can use $ almost anything $ to start and end the string. The following example shows how that works: test=# CREATE OR REPLACE FUNCTION mysum(int, int) RETURNS int AS $ $ SELECT $1 + $2; $ $ LANGUAGE 'sql'; CREATE FUNCTION All this flexibility allows you to really overcome the problem of quoting once and for all. As long as the start string and the end string match, there won't be any problems left. Making use of anonymous code blocks So far, you have learned to write the most simplistic stored procedures possible, and you have learned to execute code. However, there is more to code execution than just full-blown stored procedures. In addition to full-blown procedures, PostgreSQL allows the use of anonymous code blocks. The idea is to run code that is needed only once. This kind of code execution is especially useful to deal with administrative tasks. Anonymous code blocks don't take parameters and are not permanently stored in the database as they don't have a name anyway. Here is a simple example: test=# DO $$ BEGIN RAISE NOTICE 'current time: %', now(); END; $$ LANGUAGE 'plpgsql'; NOTICE: current time: 2016-12-12 15:25:50.678922+01 CONTEXT: PL/pgSQL function inline_code_block line 3 at RAISE DO In this example, the code only issues a message and quits. Again, the code block has to know which language it uses. The string is again passed to PostgreSQL using simple dollar quoting. Using functions and transactions As you know, everything that PostgreSQL exposes in user land is a transaction. The same, of course, applies if you are writing stored procedures. The procedure is always part of the transaction you are in. It is not autonomous—it is just like an operator or any other operation. Here is an example: All three function calls happen in the same transaction. This is important to understand because it implies that you cannot do too much transactional flow control inside a function. Suppose the second function call commits. What happens in such a case anyway? It cannot work. However, Oracle has a mechanism that allows for autonomous transactions. The idea is that even if a transaction rolls back, some parts might still be needed and should be kept. The classical example is as follows: Start a function to look up secret data Add a log line to the document that somebody has modified this important secret data Commit the log line but roll back the change You still want to know that somebody attempted to change data To solve problems like this one, autonomous transactions can be used. The idea is to be able to commit a transaction inside the main transaction independently. In this case, the entry in the log table will prevail while the change will be rolled back. As of PostgreSQL 9.6, autonomous transactions are not happening. However, I have already seen patches floating around that implement this feature. We will see when these features make it to the core. To give you an impression of how things will most likely work, here is a code snippet based on the first patches: AS $$ DECLARE PRAGMA AUTONOMOUS_TRANSACTION; BEGIN FOR i IN 0..9 LOOP START TRANSACTION; INSERT INTO test1 VALUES (i); IF i % 2 = 0 THEN COMMIT; ELSE ROLLBACK; END IF; END LOOP; RETURN 42; END; $$; ... The point in this example is that we can decide on the fly whether to commit or to roll back the autonomous transaction. More about stored procedures - on how to write them in different languages and how you can improve their performance - can be found in the book Mastering PostgreSQL 9.6. Make sure you order your copy now!

0
0
11180

article-image-sequence-generator-transformation-informatica-powercenter

Savia Lobo

12 Dec 2017

7 min read

How to integrate Sequence Generator transformation in Informatica PowerCenter 10.x

Savia Lobo

12 Dec 2017

7 min read

0
0
5392

article-image-working-targets-informatica-powercenter-10-x

Savia Lobo

11 Dec 2017

8 min read

Working with Targets in Informatica PowerCenter 10.x

Savia Lobo

11 Dec 2017

8 min read

0
0
7162

article-image-introducing-generative-adversarial-networks

Amey Varangaonkar

11 Dec 2017

6 min read

Implementing a simple Generative Adversarial Network (GANs)

Amey Varangaonkar

11 Dec 2017

6 min read

[box type="note" align="" class="" width=""]The following excerpt is taken from Chapter 2 - Learning Features with Unsupervised Generative Networks of the book Deep Learning with Theano, written by Christopher Bourez. This book talks about modeling and training effective deep learning models with Theano, a popular Python-based deep learning library. [/box] In this article, we introduce you to the concept of Generative Adversarial Networks, a popular class of Artificial Intelligence algorithms used in unsupervised machine learning. Code files for this particular chapter are available for download towards the end of the post. Generative adversarial networks are composed of two models that are alternatively trained to compete with each other. The generator network G is optimized to reproduce the true data distribution, by generating data that is difficult for the discriminator D to differentiate from real data. Meanwhile, the second network D is optimized to distinguish real data and synthetic data generated by G. Overall, the training procedure is similar to a two-player min-max game with the following objective function: Here, x is real data sampled from real data distribution, and z the noise vector of the generative model. In some ways, the discriminator and the generator can be seen as the police and the thief: to be sure the training works correctly, the police is trained twice as much as the thief. Let's illustrate GANs with the case of images as data. In particular, let's again take our example from Chapter 2, Classifying Handwritten Digits with a Feedforward Network about MNIST digits, and consider training a generative adversarial network, to generate images, conditionally on the digit we want. The GAN method consists of training the generative model using a second model, the discriminative network, to discriminate input data between real and fake. In this case, we can simply reuse our MNIST image classification model as discriminator, with two classes, real or fake, for the prediction output, and also condition it on the label of the digit that is supposed to be generated. To condition the net on the label, the digit label is concatenated with the inputs: def conv_cond_concat(x, y): return T.concatenate([x, y*T.ones((x.shape[0], y.shape[1], x.shape[2], x.shape[3]))], axis=1) def discrim(X, Y, w, w2, w3, wy): yb = Y.dimshuffle(0, 1, 'x', 'x') X = conv_cond_concat(X, yb) h = T.nnet.relu(dnn_conv(X, w, subsample=(2, 2), border_mode=(2, 2)), alpha=0.2 ) h = conv_cond_concat(h, yb) h2 = T.nnet.relu(batchnorm(dnn_conv(h, w2, subsample=(2, 2), border_mode=(2, 2))), alpha=0.2) h2 = T.flatten(h2, 2) h2 = T.concatenate([h2, Y], axis=1) h3 = T.nnet.relu(batchnorm(T.dot(h2, w3))) h3 = T.concatenate([h3, Y], axis=1) y = T.nnet.sigmoid(T.dot(h3, wy)) return y Note the use of two leaky rectified linear units, with a leak of 0.2, as activation for the first two convolutions. To generate an image given noise and label, the generator network consists of a stack of deconvolutions, using an input noise vector z that consists of 100 real numbers ranging from 0 to 1: To create a deconvolution in Theano, a dummy convolutional forward pass is created, which gradient is used as deconvolution: def deconv(X, w, subsample=(1, 1), border_mode=(0, 0), conv_ mode='conv'): img = gpu_contiguous(T.cast(X, 'float32')) kerns = gpu_contiguous(T.cast(w, 'float32')) desc = GpuDnnConvDesc(border_mode=border_mode, subsample=subsample, conv_mode=conv_mode)(gpu_alloc_empty(img.shape[0], kerns.shape[1], img.shape[2]*subsample[0], img.shape[3]*subsample[1]).shape, kerns. shape) out = gpu_alloc_empty(img.shape[0], kerns.shape[1], img. shape[2]*subsample[0], img.shape[3]*subsample[1]) d_img = GpuDnnConvGradI()(kerns, img, out, desc) return d_img def gen(Z, Y, w, w2, w3, wx): yb = Y.dimshuffle(0, 1, 'x', 'x') Z = T.concatenate([Z, Y], axis=1) h = T.nnet.relu(batchnorm(T.dot(Z, w))) h = T.concatenate([h, Y], axis=1) h2 = T.nnet.relu(batchnorm(T.dot(h, w2))) h2 = h2.reshape((h2.shape[0], ngf*2, 7, 7)) h2 = conv_cond_concat(h2, yb) h3 = T.nnet.relu(batchnorm(deconv(h2, w3, subsample=(2, 2), border_mode=(2, 2)))) h3 = conv_cond_concat(h3, yb) x = T.nnet.sigmoid(deconv(h3, wx, subsample=(2, 2), border_ mode=(2, 2))) return x Real data is given by the tuple (X,Y), while generated data is built from noise and label (Z,Y): X = T.tensor4() Z = T.matrix() Y = T.matrix() gX = gen(Z, Y, *gen_params) p_real = discrim(X, Y, *discrim_params) p_gen = discrim(gX, Y, *discrim_params) Generator and discriminator models compete during adversarial learning: The discriminator is trained to label real data as real (1) and label generated data as generated (0), hence minimizing the following cost function: d_cost = T.nnet.binary_crossentropy(p_real, T.ones(p_real.shape)).mean() + T.nnet.binary_crossentropy(p_gen, T.zeros(p_gen.shape)). mean() The generator is trained to deceive the discriminator as much as possible. The training signal for the generator is provided by the discriminator network (p_gen) to the generator: g_cost = T.nnet.binary_crossentropy(p_gen,T.ones(p_gen.shape)). mean() The same as usual follows. Cost with respect to the parameters for each model is computed and training optimizes the weights of each model alternatively, with two times more the discriminator. In the case of GANs, competition between discriminator and generator does not lead to decreases in each loss. From the first epoch: To the 45th epoch: Generated examples look closer to real ones: Generative models, and especially Generative Adversarial Networks are currently the trending areas of Deep Learning. It has also found its way in a few practical applications as well. For example, a generative model can successfully be trained to generate the next most likely video frames by learning the features of the previous frames. Another popular example where GANs can be used is, search engines that predict the next likely word before it is even entered by the user, by studying the sequence of the previously entered words. If you found this excerpt useful, do check out more comprehensive coverage of popular deep learning topics in our book Deep Learning with Theano. [box type="download" align="" class="" width=""] Download files [/box]

0
0
7655

article-image-20-lessons-bias-machine-learning-systems-nips-2017

Aarthi Kumaraswamy

08 Dec 2017

9 min read

20 lessons on bias in machine learning systems by Kate Crawford at NIPS 2017

Aarthi Kumaraswamy

08 Dec 2017

9 min read

Kate Crawford is a Principal Researcher at Microsoft Research and a Distinguished Research Professor at New York University. She has spent the last decade studying the social implications of data systems, machine learning, and artificial intelligence. Her recent publications address data bias and fairness, and social impacts of artificial intelligence among others. This article attempts to bring our readers to Kate’s brilliant Keynote speech at NIPS 2017. It talks about different forms of bias in Machine Learning systems and the ways to tackle such problems. By the end of this article, we are sure you would want to listen to her complete talk on the NIPS Facebook page. All images in this article come from Kate's presentation slides and do not belong to us. The rise of Machine Learning is every bit as far reaching as the rise of computing itself. A vast new ecosystem of techniques and infrastructure are emerging in the field of machine learning and we are just beginning to learn their full capabilities. But with the exciting things that people can do, there are some really concerning problems arising. Forms of bias, stereotyping and unfair determination are being found in machine vision systems, object recognition models, and in natural language processing and word embeddings. High profile news stories about bias have been on the rise, from women being less likely to be shown high paying jobs to gender bias and object recognition datasets like MS COCO, to racial disparities in education AI systems. 20 lessons on bias in machine learning systems Interest in the study of bias in ML systems has grown exponentially in just the last 3 years. It has more than doubled in the last year alone. We are speaking different languages when we talk about bias. I.e., it means different things to different people/groups. Eg: in law, in machine learning, in geometry etc. Read more on this in the ‘What is bias?’ section below. In the simplest terms, for the purpose of understanding fairness in machine learning systems, we can consider ‘bias’ as a skew that produces a type of harm. Bias in MLaaS is harder to identify and also correct as we do not build them from scratch and are not always privy to how it works under the hood. Data is not neutral. Data cannot always be neutralized. There is no silver bullet for solving bias in ML & AI systems. There are two main kinds of harms caused by bias: Harms of allocation and harms of representation. The former takes an economically oriented view while the latter is more cultural. Allocative harm is when a system allocates or withholds certain groups an opportunity or resource. To know more, jump to the ‘harms of allocation’ section. When systems reinforce the subordination of certain groups along the lines of identity like race, class, gender etc., they cause representative harm. This is further elaborated in the ‘Harms of representation’ section. Harm can further be classified into five types: stereotyping, recognition, denigration, under-representation and ex-nomination. There are many technical approaches to dealing with the problem of bias in a training dataset such as scrubbing to neutral, demographic sampling etc among others. But they all still suffer from bias. Eg: who decides what is ‘neutral’. When we consider bias purely as a technical problem, which is hard enough, we are already missing part of the picture. Bias in systems is commonly caused by bias in training data. We can only gather data about the world we have which has a long history of discrimination. So, the default tendency of these systems would be to reflect our darkest biases. Structural bias is a social issue first and a technical issue second. If we are unable to consider both and see it as inherently socio-technical, then these problems of bias are going to continue to plague the ML field. Instead of just thinking about ML contributing to decision making in say hiring or criminal justice, we also need to think of the role of ML in the harmful representation of human identity. While technical responses to bias are very important and we need more of them, they won’t get us all the way to addressing representational harms to group identity. Representational harms often exceed the scope of individual technical interventions. Developing theoretical fixes that come from the tech world for allocational harms is necessary but not sufficient. The ability to move outside our disciplinary boundaries is paramount to cracking the problem of bias in ML systems. Every design decision has consequences and powerful social implications. Datasets reflect not only the culture but also the hierarchy of the world that they were made in. Our current datasets stand on the shoulder of older datasets building on earlier corpora. Classifications can be sticky and sometimes they stick around longer than we intend them to, even when they are harmful. ML can be deployed easily in contentious forms of categorization that could have serious repercussions. Eg: free-of-bias criminality detector that has Physiognomy at the heart of how it predicts the likelihood of a person being a criminal based on his appearance. What is bias? 14th century: an oblique or diagonal line 16th century: undue prejudice 20th century: systematic differences between the sample and a population In ML: underfitting (low variance and high bias) vs overfitting (high variance and low bias) In Law: judgments based on preconceived notions or prejudices as opposed to the impartial evaluation of facts. Impartiality underpins jury selection, due process, limitations placed on judges etc. Bias is hard to fix with model validation techniques alone. So you can have an unbiased system in an ML sense producing a biased result in a legal sense. Bias is a skew that produces a type of harm. Where does bias come from? Commonly from Training data. It can be incomplete, biased or otherwise skewed. It can draw from non-representative samples that are wholly defined before use. Sometimes it is not obvious because it was constructed in a non-transparent way. In addition to human labeling, other ways that human biases and cultural assumptions can creep in ending up in exclusion or overrepresentation of subpopulation. Case in point: stop-and-frisk program data used as training data by an ML system. This dataset was biased due to systemic racial discrimination in policing. Harms of allocation Majority of the literature understand bias as harms of allocation. Allocative harm is when a system allocates or withholds certain groups, an opportunity or resource. It is an economically oriented view primarily. Eg: who gets a mortgage, loan etc. Allocation is immediate, it is a time-bound moment of decision making. It is readily quantifiable. In other words, it raises questions of fairness and justice in discrete and specific transactions. Harms of representation It gets tricky when it comes to systems that represent society but don't allocate resources. These are representational harms. When systems reinforce the subordination of certain groups along the lines of identity like race, class, gender etc. It is a long-term process that affects attitudes and beliefs. It is harder to formalize and track. It is a diffused depiction of humans and society. It is at the root of all of the other forms of allocative harm. 5 types of allocative harms Source: Kate Crawford’s NIPS 2017 Keynote presentation: Trouble with Bias Stereotyping 2016 paper on word embedding that looked at Gender stereotypical associations and the distances between gender pronouns and occupations. Google translate swaps the genders of pronouns even in a gender-neutral language like Turkish Recognition When a group is erased or made invisible by a system In a narrow sense, it is purely a technical problem. i.e., does a system recognize a face inside an image or video? Failure to recognize someone’s humanity. In the broader sense, it is about respect, dignity, and personhood. The broader harm is whether the system works for you. Eg: system could not process darker skin tones, Nikon’s camera s/w mischaracterized Asian faces as blinking, HP's algorithms had difficulty recognizing anyone with a darker shade of pale. Denigration When people use culturally offensive or inappropriate labels Eg: autosuggestions when people typed ‘jews should’ Under-representation An image search of 'CEOs' yielded only one woman CEO at the bottom-most part of the page. The majority were white male. ex-nomination Technical responses to the problem of biases Improve accuracy Blacklist Scrub to neutral Demographics or equal representation Awareness Politics of classification Where did identity categories come from? What if bias is a deeper and more consistent issue with classification? Source: Kate Crawford’s NIPS 2017 Keynote presentation: Trouble with Bias The fact that bias issues keep creeping into our systems and manifesting in new ways, suggests that we must understand that classification is not simply a technical issue but a social issue as well. One that has real consequences for people that are being classified. There are two themes: Classification is always a product of its time We are currently in the biggest experiment of classification in human history Eg: labeled faces in the wild dataset has 77.5% men, and 83.5% white. An ML system trained on this dataset will work best for that group. What can we do to tackle these problems? Start working on fairness forensics Test our systems: eg: build pre-release trials to see how a system is working across different populations How do we track the life cycle of a training dataset to know who built it and what the demographics skews might be in that dataset Start taking interdisciplinarity seriously Working with people who are not in our field but have deep expertise in other areas Eg: FATE (Fairness Accountability Transparency Ethics) group at Microsoft Research Build spaces for collaboration like the AI now institute. Think harder on the ethics of classification The ultimate question for fairness in machine learning is this. Who is going to benefit from the system we are building? And who might be harmed?

0
0
7730

article-image-how-to-implement-discrete-convolution-on-a-2d-dataset

Pravin Dhandre

08 Dec 2017

7 min read

How to implement discrete convolution on a 2D dataset

Pravin Dhandre

08 Dec 2017

7 min read

[box type="note" align="" class="" width=""]This article is an excerpt from the book by Rodolfo Bonnin titled Machine Learning for Developers. Surprisingly the question frequently asked by developers across the globe is, “How do I get started in Machine Learning?”. One reason could be attributed to the vastness of the subject area. This book is a systematic guide teaching you how to implement various Machine Learning techniques and their day-to-day application and development. [/box] In the tutorial given below, we have implemented convolution in a practical example to see it applied to a real image and get intuitive ideas of its effect.We will use different kernels to detect high-detail features and execute subsampling operation to get optimized and brighter image. This is a simple intuitive implementation of discrete convolution concept by applying it to a sample image with different types of kernel. Let's import the required libraries. As we will implement the algorithms in the clearest possible way, we will just use the minimum necessary ones, such as NumPy: import matplotlib.pyplot as plt import imageio import numpy as np Using the imread method of the imageio package, let's read the image (imported as three equal channels, as it is grayscale). We then slice the first channel, convert it to a floating point, and show it using matplotlib: arr = imageio.imread("b.bmp") [:,:,0].astype(np.float) plt.imshow(arr, cmap=plt.get_cmap('binary_r')) plt.show() Now it's time to define the kernel convolution operation. As we did previously, we will simplify the operation on a 3 x 3 kernel in order to better understand the border conditions. apply3x3kernel will apply the kernel over all the elements of the image, returning a new equivalent image. Note that we are restricting the kernels to 3 x 3 for simplicity, and so the 1 pixel border of the image won't have a new value because we are not taking padding into consideration: class ConvolutionalOperation: def apply3x3kernel(self, image, kernel): # Simple 3x3 kernel operation newimage=np.array(image) for m in range(1,image.shape[0]-2): for n in range(1,image.shape[1]-2): newelement = 0 for i in range(0, 3): for j in range(0, 3): newelement = newelement + image[m - 1 + i][n - 1+ j]*kernel[i][j] newimage[m][n] = newelement return (newimage) As we saw in the previous sections, the different kernel configurations highlight different elements and properties of the original image, building filters that in conjunction can specialize in very high-level features after many epochs of training, such as eyes, ears, and doors. Here, we will generate a dictionary of kernels with a name as the key, and the coefficients of the kernel arranged in a 3 x 3 array. The Blur filter is equivalent to calculating the average of the 3 x 3 point neighborhood, Identity simply returns the pixel value as is, Laplacian is a classic derivative filter that highlights borders, and then the two Sobel filters will mark horizontal edges in the first case, and vertical ones in the second case: kernels = {"Blur":[[1./16., 1./8., 1./16.], [1./8., 1./4., 1./8.], [1./16., 1./8., 1./16.]] ,"Identity":[[0, 0, 0], [0., 1., 0.], [0., 0., 0.]] ,"Laplacian":[[1., 2., 1.], [0., 0., 0.], [-1., -2., -1.]] ,"Left Sobel":[[1., 0., -1.], [2., 0., -2.], [1., 0., -1.]] ,"Upper Sobel":[[1., 2., 1.], [0., 0., 0.], [-1., -2., -1.]]} Let's generate a ConvolutionalOperation object and generate a comparative kernel graphical chart to see how they compare: conv = ConvolutionalOperation() plt.figure(figsize=(30,30)) fig, axs = plt.subplots(figsize=(30,30)) j=1 for key,value in kernels.items(): axs = fig.add_subplot(3,2,j) out = conv.apply3x3kernel(arr, value) plt.imshow(out, cmap=plt.get_cmap('binary_r')) j=j+1 plt.show() <matplotlib.figure.Figure at 0x7fd6a710a208> In the final image you can clearly see how our kernel has detected several high-detail features on the image—in the first one, you see the unchanged image because we used the unit kernel, then the Laplacian edge finder, the left border detector, the upper border detector, and then the blur operator: Having reviewed the main characteristics of the convolution operation for the continuous and discrete fields, we can conclude by saying that, basically, convolution kernels highlight or hide patterns. Depending on the trained or (in our example) manually set parameters, we can begin to discover many elements in the image, such as orientation and edges in different dimensions. We may also cover some unwanted details or outliers by blurring kernels, for example. Additionally, by piling layers of convolutions, we can even highlight higher-order composite elements, such as eyes or ears. This characteristic of convolutional neural networks is their main advantage over previous data-processing techniques: we can determine with great flexibility the primary components of a certain dataset, and represent further samples as a combination of these basic building blocks. Now it's time to look at another type of layer that is commonly used in combination with the former—the pooling layer. Subsampling operation (pooling) The subsampling operation consists of applying a kernel (of varying dimensions) and reducing the extension of the input dimensions by dividing the image into mxn blocks and taking one element representing that block, thus reducing the image resolution by some determinate factor. In the case of a 2 x 2 kernel, the image size will be reduced by half. The most well-known operations are maximum (max pool), average (avg pool), and minimum (min pool). The following image gives you an idea of how to apply a 2 x 2 maxpool kernel, applied to a one-channel 16 x 16 matrix. It just maintains the maximum value of the internal zone it covers: Now that we have seen this simple mechanism, let's ask ourselves, what's the main purpose of it? The main purpose of subsampling layers is related to the convolutional layers: to reduce the quantity and complexity of information while retaining the most important information elements. In other word, they build a compact representation of the underlying information. Now it's time to write a simple pooling operator. It's much easier and more direct to write than a convolutional operator, and in this case we will only be implementing max pooling, which chooses the brightest pixel in the 4 x 4 vicinity and projects it to the final image: class PoolingOperation: def apply2x2pooling(self, image, stride): # Simple 2x2 kernel operation newimage=np.zeros((int(image.shape[0]/2),int(image.shape[1]/2)),np.float32) for m in range(1,image.shape[0]-2,2): for n in range(1,image.shape[1]-2,2): newimage[int(m/2),int(n/2)] = np.max(image[m:m+2,n:n+2]) return (newimage) Let's apply the newly created pooling operation, and as you can see, the final image resolution is much more blocky, and the details, in general, are brighter: plt.figure(figsize=(30,30)) pool=PoolingOperation() fig, axs = plt.subplots(figsize=(20,10)) axs = fig.add_subplot(1,2,1) plt.imshow(arr, cmap=plt.get_cmap('binary_r')) out=pool.apply2x2pooling(arr,1) axs = fig.add_subplot(1,2,2) plt.imshow(out, cmap=plt.get_cmap('binary_r')) plt.show() Here you can see the differences, even though they are subtle. The final image is of lower precision, and the chosen pixels, being the maximum of the environment, produce a brighter image: This simple implementation with various kernels simplified the working mechanism of discrete convolution operation on a 2D dataset. Using various kernels and subsampling operation, the hidden patterns of dataset are unveiled and the image is made more sharpened, with maximum pixels and much brighter image thereby producing compact representation of the dataset. If you found this article interesting, do check Machine Learning for Developers and get to know about the advancements in deep learning, adversarial networks, popular programming frameworks to prepare yourself in the ubiquitous field of machine learning.

0
0
2577

article-image-3-ways-use-structures-machine-learning-lise-getoor-nips-2017

Sugandha Lahoti

08 Dec 2017

11 min read

3 great ways to leverage Structures for Machine Learning problems by Lise Getoor at NIPS 2017

Sugandha Lahoti

08 Dec 2017

11 min read

0
0
2168

article-image-top-research-papers-showcased-nips-2017-part-1

Sugandha Lahoti

07 Dec 2017

6 min read

Top Research papers showcased at NIPS 2017 - Part 1

Sugandha Lahoti

07 Dec 2017

6 min read

The ongoing 31st annual Conference on Neural Information Processing Systems (NIPS 2017) in Long Beach, California is scheduled from December 4-9, 2017. The 6-day conference is hosting a number of invited talks, demonstrations, tutorials, and paper presentations pertaining to the latest in machine learning, deep learning and AI research. This year the conference has grown larger than life with a record-high 3,240 papers, 678 selected ones, and a completely sold-out event. Top tech members from Google, Microsoft, IBM, DeepMind, Facebook, Amazon, are among other prominent players who enthusiastically participated this year. Here is a quick roundup of some of the top research papers till date. Generative Adversarial Networks Generative Adversarial Networks are a hot topic of research at the ongoing NIPS conference. GANs cast out an easy way to train the DL algorithms by slashing out the amount of data required in training with unlabelled data. Here are a few research papers on GANs. Regularization can stabilize training of GANs Microsoft researchers have proposed a new regularization approach to yield a stable GAN training procedure at low computational costs. Their new model overcomes the fundamental limitation of GANs occurring due to a dimensional mismatch between the model distribution and the true distribution. This results in their density ratio and the associated f-divergence to be undefined. Their paper “Stabilizing Training of Generative Adversarial Networks through Regularization” turns GAN models into reliable building blocks for deep learning. They have also used this for several datasets including image generation tasks. AdaGAN: Boosting GAN Performance Training GANs can at times be a hard task. They can also suffer from the problem of missing modes where the model is not able to produce examples in certain regions of the space. Google researchers have developed an iterative procedure called AdaGAN in their paper “AdaGAN: Boosting Generative Models”, an approach inspired by boosting algorithms, where many potentially weak individual predictors are greedily aggregated to form a strong composite predictor. It adds a new component into a mixture model at every step by running a GAN algorithm on a re-weighted sample. The paper also addresses the problem of missing modes. Houdini: Generating Adversarial Examples The generation of adversarial examples is considered as a critical milestone for evaluating and upgrading the robustness of learning in machines. Also, current methods are confined to classification tasks and are unable to alter the performance measure of the problem at hand. In order to tackle such an issue, Facebook researchers have come up with a research paper titled “Houdini: Fooling Deep Structured Prediction Models”, a novel and a flexible approach for generating adversarial examples distinctly tailormade for the final performance measure of the task taken into account (combinational and non-decomposable tasks). Stochastic hard-attention for Memory Addressing in GANs DeepMind researchers showcased a new method which uses stochastic hard-attention to retrieve memory content in generative models. Their paper titled “Variational memory addressing in generative models” was presented at the 2nd day of the conference and is an advancement over the popular differentiable soft-attention mechanism. Their new technique allows developers to apply variational inference to memory addressing. This leads to more precise memory lookups using target information, especially in models with large memory buffers and with many confounding entries in the memory. Image and Video Processing A lot of hype was also around developing sophisticated models and techniques for image and video processing. Here is a quick glance at the top presentations. Fader Networks: Image manipulation through disentanglement Facebook researchers have introduced Fader Networks, in their paper titled “Fader Networks: Manipulating Images by Sliding Attributes”. These fader networks use an encoder-decoder architecture to reconstruct images by disentangling their salient information and the values of particular attributes directly in a latent space. Disentanglement helps in manipulating these attributes to generate variations of pictures of faces while preserving their naturalness. This innovative approach results in much simpler training schemes and scales for manipulating multiple attributes jointly. Visual interaction networks for Video simulation Another paper titled “Visual interaction networks: Learning a physics simulator from video Tuesday” proposes a new neural-network model to learn physical objects without prior knowledge. Deepmind’s Visual Interaction Network is used for video analysis and is able to infer the states of multiple physical objects from just a few frames of video. It then uses these to predict object positions many steps into the future. It can also deduce the locations of invisible objects. Transfer, Reinforcement, and Continual Learning A lot of research is going on in the field of Transfer, Reinforcement, and Continual learning to make stable and powerful deep learning models. Here are a few research papers presented in this domain. Two new techniques for Transfer Learning Currently, a large set of input/output (I/O) examples are required for learning any underlying input-output mapping. By leveraging information based on the related tasks, the researchers at Microsoft have addressed the problem of data and computation efficiency of program induction. Their paper “Neural Program Meta-Induction” uses two approaches for cross-task knowledge transfer. First is Portfolio adaption, where a set of induction models is pretrained on a set of related tasks, and the best model is adapted towards the new task using transfer learning. The second one is Meta program induction, a k-shot learning approach which makes a model generalize itself to new tasks without requiring any additional training. Hybrid Reward Architecture to solve the problem of generalization in Reinforcement Learning A new paper from Microsoft “Hybrid Reward Architecture for Reinforcement Learning” highlights a new method to address the generalization problem faced by a typical deep RL method. Hybrid Reward Architecture (HRA) takes a decomposed reward function as the input and learns a separate value function for each component reward function. This is especially useful in domains where the optimal value function cannot easily be reduced to a low-dimensional representation. In the new approach, the overall value function is much smoother and can be easier approximated by a low-dimensional representation, enabling more effective learning. Gradient Episodic Memory to counter catastrophic forgetting in continual learning models Continual learning is all about improving the ability of models to solve sequential tasks without forgetting previously acquired knowledge. In the paper “Gradient Episodic Memory for Continual Learning”, Facebook researchers have proposed a set of metrics to evaluate models over a continuous series of data. These metrics characterize models by their test accuracy and the ability to transfer knowledge across tasks. They have also proposed a model for continual learning, called Gradient Episodic Memory (GEM) that reduces the problem of catastrophic forgetting. They have also experimented with variants of the MNIST and CIFAR-100 datasets to demonstrate the performance of GEM when compared to other methods. In our next post, we will cover a selection of papers presented so far at NIPS 2017 in the areas of Predictive Modelling, Machine Translation, and more. For live content coverage, you can visit NIPS’ Facebook page.

0
0
3178

How-To Tutorials - Data

NIPS 2017 Special: How machine learning for genomics is bridging the gap between research and clinical trial success by Brendan Frey

Getting started with big data analysis using Google's PageRank algorithm

How to build a cold-start friendly content-based recommender using Apache Spark SQL

Tinkering with ticks in Matplotlib 2.0

How to Customize lines and markers in Matplotlib 2.0

NIPS 2017 Special: 6 Key Challenges in Deep Learning for Robotics by Pieter Abbeel

3 simple steps to convert a mathematical formula to a model

How to write effective Stored Procedures in PostgreSQL

How to integrate Sequence Generator transformation in Informatica PowerCenter 10.x

Working with Targets in Informatica PowerCenter 10.x

Trending Topics

Implementing a simple Generative Adversarial Network (GANs)

20 lessons on bias in machine learning systems by Kate Crawford at NIPS 2017

How to implement discrete convolution on a 2D dataset

3 great ways to leverage Structures for Machine Learning problems by Lise Getoor at NIPS 2017

Top Research papers showcased at NIPS 2017 - Part 1