Tech News - Data

What is Meta Learning?

Sugandha Lahoti
21 Mar 2018
5 min read
Meta learning, originally a concept from cognitive psychology, is now being applied to machine learning. By the psychological definition, meta learning is the state of being aware of, and taking control of, one's own learning. Applied to machine learning, the idea is that a meta learning algorithm uses prior experience to change certain aspects of a learning algorithm so that the modified algorithm performs better than the original. Put simply, meta learning is how an algorithm learns how to learn.

Meta Learning: Making a versatile AI agent

Current AI systems excel at mastering a single skill: playing Go, holding human-like conversations, predicting a disaster, and so on. However, now that AI and machine learning are being integrated into everyday tasks, we need a single AI system that can solve a variety of problems. Today, a Go-playing agent cannot navigate roads or find new places, and an AI navigation controller cannot hold a convincing human-like conversation. What machine learning algorithms need is versatility: the capability of doing many different things.

Versatility is achieved by intelligently combining meta learning with related techniques such as reinforcement learning (finding suitable actions to maximize a reward), transfer learning (re-purposing a model trained on one task for a second, related task), and active learning (letting the learning algorithm choose the data it wants to learn from). Together, these techniques give an AI agent the ability to perform multiple tasks without learning every new task from scratch, making it capable of adapting intelligently to a wide variety of new, unseen situations. Apart from creating versatile agents, recent research also focuses on using meta learning for hyperparameter and neural network optimization, fast reinforcement learning, finding good network architectures, and specific cases such as few-shot image recognition. With meta learning, AI agents learn how to learn new tasks by reusing prior experience rather than examining each new task in isolation.

Various approaches to meta learning algorithms

A wide variety of approaches come under the umbrella of meta learning. Let's take a quick look at these algorithms and techniques.

Algorithm learning (selection)

Algorithm selection chooses a learning algorithm based on the characteristics of the problem instance. For example, suppose you have a set of ML algorithms (random forest, SVM, DNN), data sets as the instances, and error rate as the cost metric. The goal of algorithm selection is then to predict which machine learning algorithm will have a small error on each data set.

Hyper-parameter optimization

Many machine learning algorithms have numerous hyper-parameters that can be optimized, and the choice of these hyper-parameters determines how well the algorithm learns. A recent paper, "Evolving Deep Neural Networks", provides a meta learning algorithm for optimizing deep learning architectures through evolution.

Ensemble methods

Ensemble methods combine several models or approaches to achieve better predictive performance. There are three basic types: bagging, boosting, and stacked generalization. In bagging, each model runs independently and the outputs are aggregated at the end without preference for any model. Boosting refers to a group of algorithms that use weighted averages to turn weak learners into stronger learners; boosting is all about "teamwork". Stacked generalization has a layered architecture: each set of base classifiers is trained on a dataset, successive layers receive the predictions of the immediately preceding layer as input, and a single classifier at the topmost level produces the final prediction.

Dynamic bias selection

In dynamic bias selection, the bias of the learning algorithm is adjusted dynamically to suit the new problem instance. The performance of a base learner can trigger the need to explore additional hypothesis spaces, normally through small variations of the current hypothesis space. The bias selection can either be a form of data variation or a time-dependent feature.

Inductive transfer

Inductive transfer describes learning that uses previous knowledge from related tasks, achieved by transferring meta-knowledge across domains or tasks. The goal here is to incorporate the meta-knowledge into the new learning task rather than matching meta-features with a meta-knowledge base.

Adding enhancements to meta learning algorithms

Supervised meta-learning: the meta-learner is trained with supervised learning. In supervised learning we have both input and output variables, and the algorithm learns the mapping function from input to output.

RL meta-learning: standard deep RL techniques are used to train a recurrent neural network in such a way that the recurrent network can then implement its own reinforcement learning procedure.

Model-agnostic meta-learning (MAML): MAML trains over a wide range of tasks for a representation that can be quickly adapted to a new task via a few gradient steps. The meta-learner seeks an initialization that is not only useful for adapting to various problems but can also be adapted quickly.

The ultimate goal of any meta learning algorithm and its variations is to be fully self-referential, meaning it can automatically inspect and improve every part of its own code. A regenerative meta learning algorithm, along the lines of how a lizard regenerates its limbs, would not only blur the distinction between the variations described above but would also improve the future performance and versatility of machine learning algorithms.
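To make the MAML-style inner/outer loop described above concrete, here is a minimal first-order sketch in plain Python/NumPy. It illustrates the idea only and is not the algorithm from the MAML paper or any particular library; the toy tasks (linear functions with random slopes), the learning rates, and the helper names are assumptions chosen for brevity.

import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    # Each toy task is a linear function y = a * x with a randomly drawn slope.
    a = rng.uniform(-2.0, 2.0)
    x = rng.uniform(-1.0, 1.0, size=40)
    return (x[:20], a * x[:20]), (x[20:], a * x[20:])   # (support set, query set)

def grad(w, x, y):
    # Gradient of mean squared error for the one-parameter model y_hat = w * x.
    return 2.0 * np.mean((w * x - y) * x)

w_meta, inner_lr, meta_lr = 0.0, 0.1, 0.01
for step in range(5000):
    (xs, ys), (xq, yq) = sample_task()
    # Inner loop: adapt to the task with one gradient step from the meta-initialization.
    w_task = w_meta - inner_lr * grad(w_meta, xs, ys)
    # Outer loop (first-order approximation): update the meta-initialization using
    # the adapted parameters' gradient on the held-out query set.
    w_meta -= meta_lr * grad(w_task, xq, yq)

print("meta-learned initialization:", w_meta)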

'Developers' lives matter': Chinese developers protest over the “996 work schedule” on GitHub

Natasha Mathur
29 Mar 2019
3 min read
Working long hours, devoid of any work-life balance, is rife in China's tech industry. Earlier this week, on Tuesday, a GitHub user with the name "996icu" created a webpage, shared on GitHub, to protest against the "996" work culture in Chinese tech companies. The "996" work culture is an unofficial work schedule that requires employees to work from 9 am to 9 pm, 6 days a week, totaling up to 60 hours of work per week.

The 996icu webpage cites the Labor Law of the People's Republic of China, according to which an employer can ask its employees to work long hours due to the needs of production or business, but the extended work time should not exceed 36 hours a month. Also, as per the Labor Law, employees following the "996" work schedule should be paid 2.275 times their base salary. In reality, however, Chinese employees following the 996 work rule rarely get paid that much.

GitHub users also called out companies like Youzan and Jingdong, which both follow the 996 work rule. The webpage cites the example of a Jingdong PR account that posted on maimai (a Chinese business social network): "Our culture is to devote ourselves with all our hearts (to achieve the business objectives)". The 996 work schedule started to gain popularity in recent years but has been a "secret practice" for quite a while.

The 996icu webpage went viral and ranked first on GitHub's trending page on Thursday; it has currently amassed more than 90,000 stars (a post-bookmarking tool). The page is also being widely shared on Chinese social media platforms such as Weibo and WeChat, where many users are sharing their experiences as tech workers who followed the 996 schedule.

This gladiatorial work environment in Chinese firms has long been a bone of contention. South China Morning Post writer Zheping Huang published a post sharing stories of Chinese tech employees that shed light on the grotesque reality of China's Silicon Valley. One example is a 33-year-old Beijing native, Yang, who works as a product manager at a Chinese internet company and wakes up at 6 am every day for a two-and-a-half-hour commute to work. Another is Bu, a 20-something marketing specialist who relocated to an old complex near her workplace; she pays high rent, shares a room with two other women, and no longer has access to coffee shops or good restaurants.

A user named "discordance" on Hacker News commented on the GitHub protest, asking developers in China to move to better companies: "Leave your company, take your colleagues and start one with better conditions. You are some of the best engineers I've worked with and deserve better". Another user, "ceohockey60", commented: "The Chinese colloquial term for a developer is "码农". Its literal English translation is "code peasants" -- not the most flattering or respectful way to call software engineers. I've recently heard horror stories, where 9-9-6 is no longer enough inside one of the Chinese tech giants, and 10-10-7 is expected (10am-10pm, 7 days/week)".

The 996icu webpage states that people who "consistently follow the '996' work schedule.. run the risk of getting..into the Intensive Care Unit. Developers' lives matter".
- What the US-China tech and AI arms race means for the world – Frederick Kempe at Davos 2019
- China’s Huawei technologies accused of stealing Apple’s trade secrets, reports The Information
- Is China’s facial recognition powered airport kiosks an attempt to invade privacy via an easy flight experience

Google Brain’s Universal Transformers: an extension to its standard translation system

Fatema Patrawala
22 Aug 2018
4 min read
Last year in August, Google released the Transformer, a novel neural network architecture based on a self-attention mechanism that is particularly well suited for language understanding. Before the Transformer, most neural-network-based approaches to machine translation relied on recurrent neural networks (RNNs), which operate sequentially using recurrence. In contrast, the Transformer uses no recurrence: it processes all words or symbols in the sequence at once and lets each word attend to every other word over multiple processing steps, using self-attention to incorporate context from words farther away. This approach lets the Transformer train much faster than recurrent models and yield better translation results than RNNs.

"However, on smaller and more structured language understanding tasks, or even simple algorithmic tasks such as copying a string (e.g. to transform an input of “abc” to “abcabc”), the Transformer does not perform very well," say Stephan Gouws and Mostafa Dehghani from the Google Brain team. Hence, this year the team has come up with the Universal Transformer, an extension to the standard Transformer that is computationally universal, using a novel and efficient flavor of parallel-in-time recurrence. The Universal Transformer is built to yield stronger results across a wider range of tasks.

How does the Universal Transformer function?

The Universal Transformer builds on the parallel structure of the Transformer to retain its fast training speed. It replaces the Transformer's fixed stack of different transformation functions with several applications of a single, parallel-in-time recurrent transformation function. Crucially, where an RNN processes a sequence symbol by symbol (left to right), the Universal Transformer processes all symbols at the same time (like the Transformer), but then refines its interpretation of every symbol in parallel over a variable number of recurrent processing steps using self-attention. This parallel-in-time recurrence mechanism is both faster than the serial recurrence used in RNNs and makes the Universal Transformer more powerful than the standard feedforward Transformer.

[Image source: Google AI Blog]

At each step, information is communicated from each symbol (e.g. a word in the sentence) to all other symbols using self-attention, just like in the original Transformer. However, the number of times this transformation is applied to each symbol (i.e. the number of recurrent steps) can either be set manually ahead of time (e.g. to some fixed number or to the input length), or be decided dynamically by the Universal Transformer itself. To achieve the latter, the team added an adaptive computation mechanism to each position, which allocates more processing steps to symbols that are ambiguous or require more computation.

Furthermore, on a diverse set of challenging language understanding tasks, the Universal Transformer generalizes significantly better and achieves a new state of the art on the bAbI linguistic reasoning task and the challenging LAMBADA language modeling task. But perhaps the larger feat is that the Universal Transformer also improves translation quality by 0.9 BLEU over a base Transformer with the same number of parameters, trained in the same way on the same training data. "Putting things in perspective, this almost adds another 50% relative improvement on top of the previous 2.0 BLEU improvement that the original Transformer showed over earlier models when it was released last year," says the Google Brain team.

The code to train and evaluate Universal Transformers can be found in the open-source Tensor2Tensor repository. Read more about Universal Transformers on the Google AI blog.

- Create an RNN based Python machine translation system [Tutorial]
- FAE (Fast Adaptation Engine): iOlite’s tool to write Smart Contracts using machine translation
- Setting up the Basics for a Drupal Multilingual site: Languages and UI Translation
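To illustrate the self-attention step the article describes (every symbol attending to every other symbol in parallel, with a shared transformation applied repeatedly), here is a minimal single-head scaled dot-product attention sketch in NumPy. It is a generic sketch, not the Tensor2Tensor implementation; the sequence length, dimensions, weight initialization, and number of refinement steps are arbitrary assumptions.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X has shape (sequence_length, d_model); every position attends to every other.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # scaled dot-product similarities
    return softmax(scores) @ V                  # weighted sum of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

H = X
for step in range(2):   # a Universal-Transformer-style shared step applied repeatedly
    H = self_attention(H, Wq, Wk, Wv)
print(H.shape)          # (5, 8): one refined vector per input symbol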

Google’s Cloud Robotics platform, to be launched in 2019, will combine the power of AI, robotics and the cloud

Melisha Dsouza
25 Oct 2018
3 min read
Earlier this week, Google announced its plans to launch a 'Cloud Robotics platform' for developers in 2019. Since the early onset of 'cloud robotics' in 2010, Google has explored various aspects of the field. Now, with the launch of the Cloud Robotics platform, Google will combine the power of AI, robotics, and the cloud to deploy cloud-connected collaborative robots. The platform will encourage efficient robotic automation in highly dynamic environments. The core infrastructure of the platform will be open source, and users will pay only for the services they use.

Features of the Cloud Robotics platform:

#1 Critical infrastructure

The platform will introduce secure and robust connectivity between robots and the cloud. Kubernetes will be used for the management and distribution of digital assets, and Stackdriver will assist with logging, monitoring, alerting, and dashboarding. Developers will gain access to Google's data management and AI capabilities, ranging from Cloud Bigtable to Cloud AutoML. Standardized data types and open APIs will help developers build reusable automation components. Moreover, open APIs support interoperability, which means integrators can compose end-to-end solutions with collaborative robots from different vendors.

#2 Specialized tools

The tools provided with the platform will help developers build, test, and deploy software for robots with ease. Automation solutions can be composed and deployed in customers' environments through system integrators, and operators can monitor robot fleets and ongoing missions. Users pay only for the services they use, and if a user decides to move to another cloud provider, they can take their data with them.

#3 Fostering powerful first-party services and third-party innovation

Google's initial Cloud Robotics services can be applied to use cases such as robot localization and object tracking. The services will process sensor data from multiple sources and use machine learning to obtain information and insights about the state of the physical world. This will encourage an ecosystem of hardware and applications that can be used and re-used for collaborative automation.

#4 Industrial automation made easy

Industrial automation requires extensive custom integration. Collaborative robots can help improve the flexibility of the overall process, save costs, and avoid vendor lock-in. That said, it is difficult to program robots to understand and react to the unpredictable changes of the physical human world. The Google Cloud platform aims to solve these issues by providing flexible automation services such as the Cartographer service, the Spatial Intelligence service, and the Object Intelligence service.

Watch this video to know more about these services: https://www.youtube.com/watch?v=eo8MzGIYGzs&feature=youtu.be

Alternatively, head over to Google's blog to know more about this announcement.

- What’s new in Google Cloud Functions serverless platform
- Cloud Filestore: A new high performance storage option by Google Cloud Platform
- Machine Learning as a Service (MLaaS): How Google Cloud Platform, Microsoft Azure, and AWS are democratizing Artificial Intelligence

Google open sources Active Question Answering (ActiveQA), a Reinforcement Learning based Q&A system

Natasha Mathur
15 Oct 2018
3 min read
Google announced last week that it is open-sourcing Active Question Answering (ActiveQA), a research project that involves training artificial agents for question answering using reinforcement learning. As part of the open-source release, Google has published a TensorFlow package for the ActiveQA system.

The TensorFlow ActiveQA package comprises three main components, along with the code necessary to train and run the ActiveQA agent. The first component is a pre-trained sequence-to-sequence model that takes a question as input and returns its reformulations. The second component is an answer selection model that uses a convolutional neural network and scores each triplet of original question, reformulation, and answer; the selector uses pre-trained, publicly available word embeddings (GloVe). The third component is a question answering system (the environment) based on BiDAF, a popular question answering model.

"ActiveQA.. learns to ask questions that lead to good answers. However, because training data in the form of question pairs, with an original question and a more successful variant, is not readily available, ActiveQA uses reinforcement learning, an approach to machine learning concerned with training agents so that they take actions that maximize a reward, while interacting with an environment", reads the Google AI blog. The concept of ActiveQA was first introduced in Google's ICLR 2018 paper "Ask the Right Questions: Active Question Reformulation with Reinforcement Learning".

ActiveQA differs considerably from traditional QA systems. Traditional QA systems use supervised learning techniques together with labeled data to train a system that can answer arbitrary input questions. However, such a system cannot deal with uncertainty the way humans would: it cannot reformulate questions, issue multiple searches, or evaluate the responses, which leads to poor-quality answers. ActiveQA, on the other hand, comprises an agent that consults the QA system repeatedly. The agent reformulates the original question many times, which helps it select the best answer. Each reformulated question is evaluated on the basis of how good its corresponding answer is. If the answer is good, the learning algorithm adjusts the model's parameters accordingly, so that the question reformulation that led to the right answer is more likely to be generated again. This approach lets the agent engage in a dynamic interaction with the QA system, leading to better quality of the returned answers.

In an example mentioned by Google, consider the question "When was Tesla born?". The agent reformulates the question in two different ways, "When is Tesla's birthday?" and "Which year was Tesla born?", and retrieves answers to both from the QA system. Using all this information collectively, it returns the answer "July 10, 1856".

"We envision that this research will help us design systems that provide better and more interpretable answers, and hope it will help others develop systems that can interact with the world using natural language", mentions Google. For more information, read the official Google AI blog.
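To sketch the reformulate-ask-select loop described above, here is a schematic toy example in Python. The reformulate, answer, and score stand-ins below are hypothetical placeholders for illustration only; they are not the API of the released TensorFlow package, and the toy knowledge base is invented.

def active_qa(question, reformulate, answer, score, n_rewrites=4):
    # Try the original question plus several rewrites, and keep the answer whose
    # (original question, rewrite, answer) triple scores best.
    candidates = [question] + [reformulate(question, i) for i in range(n_rewrites)]
    best = max(candidates, key=lambda q: score(question, q, answer(q)))
    return answer(best)

# Toy stand-ins so the sketch runs end to end (hypothetical, not the real models).
toy_kb = {
    "which year was tesla born?": "1856",
    "when is tesla's birthday?": "July 10, 1856",
}
rewrites = ["when is tesla's birthday?", "which year was tesla born?",
            "tesla date of birth?", "where was tesla born?"]
reformulate = lambda q, i: rewrites[i % len(rewrites)]
answer = lambda q: toy_kb.get(q.lower(), "unknown")
score = lambda original, rewrite, ans: 0.0 if ans == "unknown" else len(ans)

print(active_qa("When was Tesla born?", reformulate, answer, score))   # July 10, 1856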
- Google, Harvard researchers build a deep learning model to forecast earthquake aftershocks location with over 80% accuracy
- Google strides forward in deep learning: open sources Google Lucid to answer how neural networks make decisions
- Google moving towards data centers with 24/7 carbon-free energy

Michelangelo PyML: Introducing Uber’s platform for rapid machine learning development

Amey Varangaonkar
25 Oct 2018
3 min read
Transportation network giant Uber has developed Michelangelo PyML, a Python-powered platform for rapid prototyping of machine learning models. The aim of the platform is to offer machine learning as a service that democratizes machine learning and makes it possible to scale AI models to meet business needs efficiently. Michelangelo PyML integrates with Michelangelo, the platform Uber built for large-scale machine learning in 2017, and will make it possible for Uber's data scientists and engineers to build intelligent Python-based models that run at scale for online as well as offline tasks.

Why Uber chose PyML for Michelangelo

Uber developed Michelangelo in September 2017 with a clear focus on high performance and scalability. It currently enables Uber's product teams to design, build, deploy, and maintain machine learning solutions at scale, and powers roughly 1 million predictions per second. However, that came at the cost of flexibility. Users mainly faced two critical issues:

- Models could only be trained using the algorithms natively supported by Michelangelo. To run unsupported algorithms, the platform's capabilities had to be extended with additional training and deployment components, which caused a lot of inconvenience at times.
- Users could not apply any feature transformations apart from those offered by Michelangelo's DSL (Domain Specific Language).

Apart from these constraints, Uber also observed that data scientists usually preferred Python over other programming languages, given the rich suite of libraries and frameworks available in Python for analytics and machine learning. Many data scientists also gathered and worked with data locally using tools such as pandas, scikit-learn, and TensorFlow, as opposed to Big Data tools such as Apache Spark and Hive, which take hours to set up.

How PyML improves Michelangelo

Based on these challenges, Uber decided to revamp the platform by integrating PyML to make it more flexible. PyML provides a concrete framework for data scientists to build and train machine learning models that can be deployed quickly, safely, and reliably across different environments, without restricting the types of data they can use or the algorithms they can choose, which makes it an ideal tool to integrate with a platform like Michelangelo. By integrating Python-based models that can operate at scale with Michelangelo, Uber will be able to handle online as well as offline queries and serve predictions quite easily. This could be a potential masterstroke by Uber as it tries to boost its business and revenue growth after it slowed down over the last year.

- Why did Uber create Hudi, an open source incremental processing framework on Apache Hadoop?
- Uber’s Head of corporate development, Cameron Poetzscher, resigns following a report on a 2017 investigation into sexual misconduct
- Uber’s Marmaray, an Open Source Data Ingestion and Dispersal Framework for Apache Hadoop

Anaconda 5.2 releases!

Sunith Shetty
01 Jun 2018
2 min read
The Anaconda team has announced the release of Anaconda Distribution 5.2. The new version brings several changes in terms of platform updates, user-facing improvements, and backend improvements.

Anaconda is a free, open-source distribution of Python that provides a fast, easy, and powerful way to perform data science and machine learning tasks. It is an efficient platform for large-scale data processing, scientific computing, and more. With over 6 million users, it includes more than 250 data science packages for all major operating systems (Windows, Linux, and macOS), with every package version managed by the conda package management system.

Some of the noteworthy changes in Anaconda Distribution 5.2 are:

Major highlights

- More than 100 packages have been updated or added (notable updates include Qt v5.9.5, OpenSSL v1.0.2o, NumPy 1.14.3, SciPy v1.1.0, Matplotlib v2.2.2, and pandas 0.23.0).
- Windows installers now control their environment more carefully, so even if menu shortcuts fail to get created, it won't lead to broader installation issues.
- The macOS pkg installers' developer certificate has been updated to Anaconda, Inc.

User-facing improvements

- All default channels now point to repo.anaconda.com instead of repo.continuum.io.
- More dynamic shortcut working-directory behavior improves Windows multi-user installations.
- To prevent usability issues, Windows installers now disallow the characters ! % ^ = in the installation path.

Backend improvements

- Security fixes for more than 20 packages based on in-depth Common Vulnerabilities and Exposures (CVE) reviews.
- Improved behavior of --prune, because the history file is now updated correctly in the conda-meta directory.
- The Windows installer now uses a trimmed-down value for the PATH environment variable to avoid DLL-hell problems with existing software.

In addition, several changes have been made across x86 platforms, Linux distributions, and Windows distributions. For the complete list, refer to the release notes.

If you want to download the new version of Anaconda Distribution 5.2, you can get the installer from the official page. Alternatively, you can update an existing Anaconda installation to version 5.2 with:

conda update conda
conda install anaconda=5.2

- 30 common data science terms explained
- Data science on Windows is a big no
- 10 Machine Learning Tools to watch in 2018

A Bitwise study presented to the SEC reveals that 95% of CoinMarketCap’s BTC trading volume report is fake

Savia Lobo
25 Mar 2019
2 min read
A research report by Bitwise Asset Management last week revealed that 95% of the Bitcoin trading volume reported by CoinMarketCap.com is fake, artificially created by unregulated exchanges. Notably, the inflated figures come from CoinMarketCap.com, the most widely cited source for bitcoin volume, which is also used by most major media outlets. CoinMarketCap hasn't yet responded to the findings.

"Despite its widespread use, the CoinMarketCap.com data is wrong. It includes a large amount of fake and/or non-economic trading volume, thereby giving a fundamentally mistaken impression of the true size and nature of the bitcoin market", the Bitwise report states. The report also claims that only 10 cryptocurrency exchanges have real volume, including major names like Binance, Coinbase, Kraken, Gemini, and Bittrex.

https://twitter.com/BitwiseInvest/status/1109114656944209921

The key takeaways of the report are:

- 95% of reported BTC spot volume is fake; the likely motive is listing fees (which can be $1-3M).
- Real daily spot volume is roughly $270M.
- 10 exchanges make up almost all of the real trading volume, and the majority of them are regulated.
- Spreads are below 0.10%, and arbitrage is highly efficient.

CoinMarketCap.com (CMC) originally reported a combined $6 billion in average daily trading volume. However, the 226-slide presentation by Bitwise to the U.S. Securities and Exchange Commission (SEC) revealed that only $273 million of CMC's reported BTC trading volume was legitimate. The report also includes a detailed breakdown of all the exchanges that report more than $1 million in daily trading volume on CoinMarketCap.

Matthew Hougan, the global head of Bitwise's research division, said, "People looked at cryptocurrency and said this market is a mess; that's because they were looking at data that was manipulated". Bitwise also posted on its official Twitter account, "Arbitrage between the 10 real exchanges has improved significantly. The avg price deviation of any one exchange from the aggregate price is now less than 0.10%! Well below the arbitrage band considering exchange-level fees (0.10-0.30%) & hedging costs."

https://twitter.com/BitwiseInvest/status/1109114686635687936

To know more, head over to the complete Bitwise report.

- 200+ Bitcoins stolen from Electrum wallet in an ongoing phishing attack
- Can Cryptocurrency establish a new economic world order?
- Crypto-cash is missing from the wallet of dead cryptocurrency entrepreneur Gerald Cotten – find it, and you could get $100,000

Amazon re:Invent 2018: AWS Snowball Edge comes with a GPU option and more computing power

Bhagyashree R
27 Nov 2018
2 min read
Amazon re:Invent 2018 commenced yesterday in Las Vegas. The five-day event comprises various sessions, chalk talks, and hackathons covering core AWS topics, and Amazon is launching several new products and making some crucial announcements. Adding to this list, yesterday Amazon announced that AWS Snowball Edge will now come in two options: Snowball Edge Storage Optimized and Snowball Edge Compute Optimized. Snowball Edge Compute Optimized, in addition to more computing power, comes with optional GPU support.

What is AWS Snowball Edge?

AWS Snowball Edge is a physical appliance used for data migration and edge computing. It supports specific Amazon EC2 instance types and AWS Lambda functions. With Snowball Edge, customers can develop and test in AWS, then deploy their applications on remote devices to collect, pre-process, and return data. Common use cases include data migration, data transport, image collation, IoT sensor stream capture, and machine learning.

What is new in Snowball Edge?

Snowball Edge will now come in two options:

- Snowball Edge Storage Optimized: provides 100 TB of capacity and 24 vCPUs, well suited for local storage and large-scale data transfer.
- Snowball Edge Compute Optimized: available with or without a GPU. Both variants come with 42 TB of S3-compatible storage and 7.68 TB of NVMe SSD storage, and can run any combination of instances that consume up to 52 vCPUs and 208 GiB of memory.

The main highlight here is the support for an optional GPU. With Snowball Edge with GPU, you can do things like real-time full-motion video analysis and processing, machine learning inferencing, and other highly parallel compute-intensive work. To gain access to the GPU, you need to launch an sbe-g instance; you can select the "with GPU" option using the console.

[Images: console screenshot and instance specifications table. Source: Amazon]

You can read more about the re:Invent announcements regarding Snowball Edge on the AWS website.

- AWS updates the face detection, analysis and recognition capabilities in Amazon Rekognition
- AWS announces more flexibility in its Certification Exams, drops its exam prerequisites
- Introducing Automatic Dashboards by Amazon CloudWatch for monitoring all AWS Resources

OpenAI’s gradient checkpointing: A package that makes huge neural nets fit into memory

Savia Lobo
17 Jan 2018
5 min read
OpenAI has released a Python/TensorFlow package, gradient checkpointing, that lets you fit neural nets roughly 10x larger into memory at the cost of an additional 20% computation time. The tools in this package, a joint development of Tim Salimans and Yaroslav Bulatov, help rewrite a TensorFlow model to use less memory.

Computing the gradient of the loss by backpropagation is the memory-intensive part of training deep neural networks. By checkpointing nodes in the computation graph defined by your model, and recomputing the parts of the graph between those nodes during backpropagation, it is possible to calculate this gradient at reduced memory cost. For a deep feed-forward neural network consisting of n layers, the memory consumption can be reduced to O(sqrt(n)), at the cost of performing one additional forward pass. A graph in the post compares the memory used while training TensorFlow's official CIFAR-10 ResNet example with the regular tf.gradients function and with the optimized gradient function.

To see how it works, take a simple feed-forward neural network with:

- f: the activations of the neural network layers
- b: the gradients of the loss with respect to the activations and parameters of these layers

All f nodes are evaluated in order during the forward pass, and all b nodes in reverse order during the backward pass. The results obtained for the f nodes are required to compute the b nodes. Hence, after the forward pass, all the f nodes are kept in memory and can be erased only when backpropagation has progressed far enough to have computed all dependencies (children) of an f node. This means that in simple backpropagation, the memory required grows linearly with the number of neural net layers n.

Graph 1: Vanilla backpropagation

Vanilla backpropagation computes each node once. However, recomputing nodes can save a lot of memory: we could simply recompute every node from the forward pass as and when required. The order of execution, and the memory used, then appear as follows:

Graph 2: Backpropagation with poor memory

With this strategy, the memory required to compute gradients is constant in the number of neural network layers n, which is optimal in terms of memory. However, the number of node evaluations now scales as n^2, where previously it scaled as n: each of the n nodes is recomputed on the order of n times. As a result, the computation becomes much slower for deep networks, which makes this method impractical for deep learning.

To strike a balance between memory and computation, OpenAI uses a strategy that allows nodes to be recomputed, but not too often. The strategy is to mark a subset of the neural net activations as checkpoint nodes.

[Image: graph with a chosen checkpoint node]

These checkpoint nodes are kept in memory after the forward pass, while the remaining nodes are recomputed at most once. After recomputation, the non-checkpoint nodes are stored in memory until they are no longer required. For a simple feed-forward neural net, all neuron activation nodes are graph separators (articulation points) of the graph defined by the forward pass. This means the nodes between a b node and the last checkpoint preceding it need to be recomputed when computing that b node during backprop. When backprop has progressed far enough to reach a checkpoint node, all nodes that were recomputed from it can be erased from memory. The order of computation and memory usage then appear as:

Graph 3: Checkpointed backpropagation

The package implements this checkpointed backprop by taking the graph for standard (vanilla) backprop (Graph 1) and automatically rewriting it using the TensorFlow graph editor. For graphs that contain articulation points (single-node graph separators), checkpoints are selected automatically using the sqrt(n) strategy, giving sqrt(n) memory usage for feed-forward networks. For general graphs that only contain multi-node graph separators, the checkpointed backprop implementation still works, but currently the checkpoints have to be selected manually by the user.

Summing up, the biggest advantage of gradient checkpointing is that it can save a lot of memory for large neural network models. The package does have some limitations:

- The provided code does all graph manipulation in Python before running your model, which slows down the process for large graphs.
- The current algorithm for automatically selecting checkpoints is purely heuristic and is expected to fail on some models outside the class that was tested. In such cases, manual checkpoint selection is preferable.

To learn more about gradient checkpointing, or for a further explanation of computation graphs, memory usage, and gradient computation strategies, read Yaroslav Bulatov's Medium post on gradient-checkpointing.
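As a rough illustration of the recompute-from-checkpoint idea described above (not the OpenAI package itself, which rewrites TensorFlow graphs automatically), here is a toy NumPy sketch for a chain of element-wise layers. The layer definition, checkpoint interval, and toy loss are assumptions chosen for brevity: only every k-th activation is stored during the forward pass, and each segment is recomputed from its checkpoint during the backward pass.

import numpy as np

n = 16                                 # number of layers in the chain
k = 4                                  # checkpoint interval, roughly sqrt(n)
biases = np.linspace(-0.5, 0.5, n)

def layer(x, b):
    return np.tanh(x + b)

def layer_grad(x, b, upstream):
    # d tanh(x + b) / dx, multiplied by the gradient flowing in from above.
    return upstream * (1.0 - np.tanh(x + b) ** 2)

x0 = np.array([0.3])

# Forward pass: keep only every k-th layer input (the checkpoints), discard the rest.
checkpoints = {}
x = x0
for i in range(n):
    if i % k == 0:
        checkpoints[i] = x
    x = layer(x, biases[i])
upstream = np.ones_like(x)             # pretend dLoss/d(output) = 1

# Backward pass: recompute each segment from its checkpoint, then backprop through it.
for seg_start in sorted(checkpoints, reverse=True):
    seg_end = min(seg_start + k, n)
    xs = [checkpoints[seg_start]]      # recompute only this segment's activations
    for i in range(seg_start, seg_end - 1):
        xs.append(layer(xs[-1], biases[i]))
    for i in reversed(range(seg_start, seg_end)):
        upstream = layer_grad(xs[i - seg_start], biases[i], upstream)

print("dLoss/dx0:", upstream)          # gradient of the toy loss w.r.t. the input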

Google open sources BERT, an NLP pre-training technique

Prasad Ramesh
05 Nov 2018
2 min read
Google open-sourced Bidirectional Encoder Representations from Transformers (BERT) last Friday for NLP pre-training. Natural language processing (NLP) covers tasks like sentiment analysis, language translation, question answering, and other language-related problems. Large NLP datasets containing millions, or billions, of annotated training examples are scarce. Google says that with BERT, you can train your own state-of-the-art question answering system in 30 minutes on a single Cloud TPU, or in a few hours using a single GPU. The source code is built on top of TensorFlow, and a number of pre-trained language representation models are also included.

BERT features

BERT improves on recent work in pre-training contextual representations, including semi-supervised sequence learning, generative pre-training, ELMo, and ULMFiT. BERT differs from these models: it is the first deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus (Wikipedia). Context-free models like word2vec generate a single word embedding representation for every word, whereas contextual models generate a representation of each word based on the other words in the sentence. BERT is deeply bidirectional because it considers both the previous and the next words.

Bidirectionality

It is not possible to train bidirectional models by simply conditioning each word on the words before and after it; doing so would allow the word being predicted to indirectly see itself in a multi-layer model. To solve this, Google researchers used a straightforward technique: mask out some words in the input and condition each word bidirectionally in order to predict the masked words. The idea is not new, but BERT is the first technique where it was successfully used to pre-train a deep neural network.

Results

On the Stanford Question Answering Dataset (SQuAD) v1.1, BERT achieved a 93.2% F1 score, surpassing the previous state-of-the-art score of 91.6% and the human-level score of 91.2%. BERT also improves the state of the art by 7.6% absolute on the very challenging GLUE benchmark, a set of 9 diverse Natural Language Understanding (NLU) tasks.

For more details, visit the Google Blog.

- Intel AI Lab introduces NLP Architect Library
- FAT Conference 2018 Session 3: Fairness in Computer Vision and NLP
- Implement Named Entity Recognition (NER) using OpenNLP and Java
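To make the masking idea above concrete, here is a toy sketch of building masked-word prediction targets in Python. It only illustrates the general idea; BERT's actual preprocessing (WordPiece tokenization, its replacement scheme, and so on) is more involved, and the 15% rate below is just an assumed illustrative figure.

import random

random.seed(0)
tokens = "the quick brown fox jumps over the lazy dog".split()

# Pick roughly 15% of the positions to hide from the model.
n_mask = max(1, round(0.15 * len(tokens)))
mask_positions = set(random.sample(range(len(tokens)), n_mask))

masked = ["[MASK]" if i in mask_positions else t for i, t in enumerate(tokens)]
targets = {i: tokens[i] for i in mask_positions}   # what the model must predict

print(" ".join(masked))
print(targets)   # the training loss is computed only on these masked positions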

Google AdaNet, a TensorFlow-based AutoML framework

Sugandha Lahoti
31 Oct 2018
3 min read
Google researchers have come up with a new AutoML framework that can automatically learn high-quality models with minimal expert intervention. Google AdaNet is a fast, flexible, and lightweight TensorFlow-based framework for learning a neural network architecture and learning to ensemble subnetworks to obtain even better models.

How does Google AdaNet work?

AdaNet automatically searches over neural architectures and learns to combine the best ones into a high-quality model. It implements an adaptive algorithm for learning a neural architecture as an ensemble of subnetworks: it can add subnetworks of different depths and widths to create a diverse ensemble, and it trades off performance improvement against the number of parameters. This saves ML engineers the time otherwise spent selecting optimal neural network architectures.

[Image source: Google]

AdaNet: built on TensorFlow

AdaNet implements the TensorFlow Estimator interface, which simplifies machine learning programming by encapsulating training, evaluation, prediction, and export for serving. AdaNet also integrates with open-source tools like TensorFlow Hub modules, TensorFlow Model Analysis, and Google Cloud's Hyperparameter Tuner. TensorBoard integration helps monitor subnetwork training, ensemble composition, and performance; TensorBoard is one of the best TensorFlow features for visualizing model metrics during training. When AdaNet is done training, it exports a SavedModel that can be deployed with TensorFlow Serving.

How to extend AdaNet to your own projects

Machine learning engineers and enthusiasts can define their own adanet.subnetwork.Builder using high-level TensorFlow APIs like tf.layers. Users who have already integrated a TensorFlow model into their system can use adanet.Estimator to boost model performance while obtaining learning guarantees. Users can also supply their own custom loss functions via canned or custom tf.contrib.estimator.Heads in order to train regression, classification, and multi-task learning problems, and can fully define the search space of candidate subnetworks to explore by extending the adanet.subnetwork.Generator class.

Experiments: NASNet-A versus AdaNet

Google researchers took an open-source implementation of a NASNet-A CIFAR architecture and transformed it into a subnetwork. After eight AdaNet iterations they were able to improve upon the CIFAR-10 results, and the model achieves this with fewer parameters.

[Figure: performance of a NASNet-A model versus AdaNet learning to combine small NASNet-A subnetworks on CIFAR-10. Source: Google]

You can check out the GitHub repo and walk through the tutorial notebooks for more details. You can also have a look at the research paper.

- Top AutoML libraries for building your ML pipelines
- Anatomy of an automated machine learning algorithm (AutoML)
- AmoebaNets: Google’s new evolutionary AutoML
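To give a feel for the adaptive idea described above (growing an ensemble of candidates while trading off accuracy against parameter count), here is a generic toy sketch in NumPy. It is not AdaNet or its API; the polynomial "subnetworks", the penalty weight, and the stopping rule are all assumptions chosen to keep the example tiny.

import numpy as np

rng = np.random.default_rng(0)

# Toy data: a noisy cubic. The "subnetworks" are polynomial regressors of varying degree.
x = rng.uniform(-1, 1, 200)
y = x**3 - 0.5 * x + 0.1 * rng.normal(size=x.size)
xtr, ytr, xva, yva = x[:150], y[:150], x[150:], y[150:]

def fit_poly(x, residual, degree):
    return np.polyfit(x, residual, degree)          # candidate "subnetwork"

def predict(members, x):
    return sum(np.polyval(c, x) for c in members) if members else np.zeros_like(x)

def objective(members, lam=1e-3):
    err = np.mean((predict(members, xva) - yva) ** 2)
    n_params = sum(len(c) for c in members)
    return err + lam * n_params                     # accuracy vs. model-size trade-off

members, best = [], objective([])
for iteration in range(5):
    residual = ytr - predict(members, xtr)
    # Try candidate subnetworks of different complexity; keep the best grown ensemble.
    trials = [members + [fit_poly(xtr, residual, d)] for d in (1, 2, 3)]
    candidate = min(trials, key=objective)
    if objective(candidate) >= best:                # stop when nothing helps anymore
        break
    members, best = candidate, objective(candidate)

print(f"ensemble size: {len(members)}, penalized validation objective: {best:.4f}")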

AMD ROCm GPUs now support TensorFlow v1.8, a major milestone for AMD’s deep learning plans

Prasad Ramesh
28 Aug 2018
2 min read
AMD has announced support for TensorFlow v1.8 on its ROCm-enabled GPUs, including the Radeon Instinct MI25. ROCm stands for Radeon Open Compute; it is an open-source, Hyperscale-class (HPC) platform for GPU computing and is programming-language independent. This is a major milestone in AMD's efforts toward accelerating deep learning.

ROCm, the Radeon Open Ecosystem, is AMD's open-source software foundation for GPU computing on Linux. Mayank Daga, Director of Deep Learning Software at AMD, stated: "Our TensorFlow implementation leverages MIOpen, a library of highly optimized GPU routines for deep learning."

There is a pre-built whl package available for a simple install, similar to installing generic TensorFlow on Linux, as well as a pre-built Docker image for fast installation. AMD is also working toward upstreaming all the ROCm-specific enhancements to the TensorFlow master repository, in addition to supporting TensorFlow v1.8. While that work is in progress, AMD will release and maintain future ROCm-enabled TensorFlow versions, such as v1.10. In the post, Daga stated, "We believe the future of deep learning optimization, portability, and scalability has its roots in domain-specific compilers. We are motivated by the early results of XLA, and are also working towards enabling and optimizing XLA for AMD GPUs."

Current CPUs that support PCIe Gen3 + PCIe Atomics are:

- AMD Ryzen CPUs
- AMD EPYC CPUs
- Intel Xeon E7 v3 or newer CPUs
- Intel Xeon E5 v3 or newer CPUs
- Intel Xeon E3 v3 or newer CPUs
- Intel Core i7 v4, Core i5 v4, Core i3 v4 or newer CPUs (i.e. the Haswell family or newer)

The installation is simple. First, you'll need the open-source ROCm stack. Then, the ROCm libraries need to be installed via APT:

sudo apt update
sudo apt install rocm-libs miopen-hip cxlactivitylogger

Finally, install TensorFlow itself via AMD's pre-built whl package:

sudo apt install wget python3-pip
wget http://repo.radeon.com/rocm/misc/tensorflow/tensorflow-1.8.0-cp35-cp35m-manylinux1_x86_64.whl
pip3 install ./tensorflow-1.8.0-cp35-cp35m-manylinux1_x86_64.whl

For more details on how to get started, visit the GitHub repository. There are also examples of image recognition, audio recognition, and multi-GPU training on ImageNet on the GPUOpen website.

- Nvidia unveils a new Turing architecture: “The world’s first ray tracing GPU”
- AMD open sources V-EZ, the Vulkan wrapper library
- Sugar operating system: A new OS to enhance GPU acceleration security in web apps

Turbo: Google’s new color palette for data visualization addresses shortcomings of the common rainbow palette, 'Jet'

Sugandha Lahoti
23 Aug 2019
4 min read
Google has released a new color palette, named Turbo, to address some of the shortcomings of the popular rainbow palette Jet, including false detail, banding, and color-blindness ambiguity. According to the blog post, Turbo provides better depth perception in data visualizations. The aim with Turbo is to provide a color map that is smooth and color-blind-accessible, yet still suited to day-to-day tasks where the requirements are not as stringent. The blog post specifies that Turbo is meant for cases where perceptual uniformity is not critical, but one still wants a high-contrast, smooth visualization of the underlying data.

Google researchers created a simple interface to interactively adjust the sRGB curves using a 7-knot cubic spline while comparing the result on a selection of sample images as well as other well-known color maps. "This approach," the blog post reads, "provides control while keeping the curve C2 continuous. The resulting color map is not 'perceptually linear' in the quantitative sense, but it is more smooth than Jet, without introducing false detail."

Comparison of Turbo with other color maps

Viridis and Inferno are two linear color maps that fix most issues of Jet and are generally recommended when false color is needed. However, some feel that they can be harsh on the eyes, which hampers visibility when used for extended periods. Turbo, on the other hand, mimics the lightness profile of Jet, going from low to high and back down to low, without banding. Turbo's lightness slope is generally double that of Viridis, allowing subtle changes to be seen more easily. "This is a valuable feature," the researchers note, "since it greatly enhances detail when color can be used to disambiguate the low and high ends."

[Figure: lightness plots generated by converting the sRGB values to CIECAM02-UCS and displaying the lightness value (J) in greyscale; the black line traces the lightness value from the low end of the color map (left) to the high end (right). Source: Google blog]

The lightness plots show that Viridis and Inferno are linear, while Jet's plot is erratic and peaky. Turbo's plot has a similar asymmetric profile to Jet's, with the lows darker than the highs. Although the low-high-low curve increases detail, it comes at the cost of lightness ambiguity, which makes Turbo inappropriate for grayscale printing and for people with the rare condition of achromatopsia (total color blindness). When used for semantic layers, Turbo is much smoother than Jet and shows no "false layers" due to banding. Because the attention system prioritizes hue, differences in color are easier to judge than differences in lightness. Turbo's color map can also be used as a diverging colormap. The researchers tested Turbo with a color-blindness simulator and found that, for all conditions except achromatopsia, the map remains distinguishable and smooth.

NASA data viz lead argues Turbo comes with flaws

Joshua Stevens, data visualization and cartography lead at NASA, posted a detailed Twitter thread pointing out certain flaws in Google's Turbo color map. He points out that "Color palettes should change linearly in lightness. However, Turbo admittedly does not do this. While it avoids the 'peaks' and banding of Jet, Turbo's luminance curve is still humped. Moreover, the slopes on either side are not equal, the curve is still irregular, and it starts out darker than it finishes."

He also contradicts Google's statement that "our attention system prioritizes hue": the paper that Google links to specifies that experimental results showed brightness and saturation levels to be more important than the hue component in attracting attention. He clarifies further, "This is not to say that Turbo is not an improvement over Jet. It is! But there is too much known about visual perception to reimagine another rainbow. The effort is stellar, but IMO Turbo is a crutch that further slows adoption of more sensible palettes."

Google has made the color map data and usage instructions for Python and C/C++ available, along with a polynomial approximation for cases where a look-up table may not be desirable.

- DeOldify: Colorising and restoring B&W images and videos using a NoGAN approach
- Implementing color and shape-based object detection and tracking with OpenCV and CUDA [Tutorial]
- Matplotlib 3.0 is here with new cyclic colormaps, and convenience methods
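If you want to compare the palettes discussed above yourself, here is a small sketch using Matplotlib, which ships a 'turbo' colormap in recent versions (roughly 3.3 and later); on older versions you would need to register Google's published lookup table yourself. The sample 2D field is arbitrary and only stands in for real data such as a depth map.

import numpy as np
import matplotlib.pyplot as plt

# Arbitrary smooth 2D field to visualize (a stand-in for depth/disparity data).
x, y = np.meshgrid(np.linspace(-3, 3, 300), np.linspace(-3, 3, 300))
z = np.sin(x) * np.cos(y) + 0.3 * x

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, cmap in zip(axes, ["jet", "viridis", "turbo"]):
    im = ax.imshow(z, cmap=cmap)      # same data, three different color maps
    ax.set_title(cmap)
    ax.set_axis_off()
    fig.colorbar(im, ax=ax, shrink=0.8)
plt.tight_layout()
plt.show()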

Researchers introduce a deep learning method that converts mono audio recordings into 3D sounds using video scenes

Natasha Mathur
28 Dec 2018
4 min read
Ruohan Gao of the University of Texas and Kristen Grauman of Facebook AI Research published a method earlier this month that can teach an AI system to convert ordinary mono recordings into binaural sound. The researchers call the concept "2.5D visual sound": it uses the accompanying video to generate synthetic 3D audio.

Background

According to the researchers, binaural audio gives a listener the sensation of 3D sound, allowing a rich experience of the scene. However, such recordings are not easily available and require expertise and equipment to obtain. Humans generally determine the direction of a sound with the help of visual cues, so the researchers used a similar technique: a machine learning system is given a video of a scene together with a mono recording, figures out the direction of the sounds, and then adjusts the interaural time and level differences to create the effect of 3D sound for the listener.

The researchers devised a deep convolutional neural network that learns to decode the monaural (single-channel) soundtrack into its binaural counterpart, injecting visual information about objects and the scene into the CNN during the process. "We call the resulting output 2.5D visual sound—the visual stream helps 'lift' the flat single channel audio into spatialized sound. In addition to sound generation, we show the self-supervised representation learned by our network benefits audio-visual source separation", the researchers say.

Training method used

For training, the researchers first created a database of examples of the effect they wanted the machine learning system to learn: binaural recordings of 2,265 musical clips, captured together with video. The researchers write, "Our intent was to capture a variety of sound-making objects in a variety of spatial contexts, by assembling different combinations of instruments and people in the room. We post-process the raw data into 10s clips. In the end, our BINAURAL-MUSIC-ROOM dataset consists of 2,265 short clips of musical performances, totaling 6.3 hours".

The equipment used for the project consisted of a 3Dio Free Space XLR binaural microphone, a GoPro HERO6 Black camera, and a Tascam DR-60D recorder used as an audio pre-amplifier. The GoPro camera was mounted on top of the 3Dio binaural microphone to mimic a person seeing and hearing at the same time; it records video at 30 fps with stereo audio. These recordings were then used to train a machine learning algorithm to recognize the direction of sound from a video of the scene. Once trained, the system can watch a video and distort a monaural recording to simulate where the sound ought to be coming from.

Results

A demonstration video compares the 2.5D results against the original monaural recordings, and the results are quite convincing. However, the method cannot generate fully 3D sound, and there are certain situations the algorithm finds difficult to deal with: it cannot account for a sound source that is not visible in the video, or for sources it has not been trained on. The researchers say the method works best for music videos and that they plan to extend its applications. "Generating binaural audio for off-the-shelf video could potentially close the gap between transporting audio and visual experiences, and will be useful for new applications in VR/AR. As future work, we plan to explore ways to incorporate object localization and motion, and explicitly model scene sounds", the researchers say.

For more information, check out the official research paper.

- Italian researchers conduct an experiment to prove that quantum communication is possible on a global scale
- Stanford researchers introduce DeepSolar, a deep learning framework that mapped every solar panel in the US
- Researchers unveil a new algorithm that allows analyzing high-dimensional data sets more effectively, at NeurIPS conference
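To give a rough sense of the interaural time and level differences the article mentions (without any of the learned, video-driven machinery), here is a toy Python/NumPy sketch that pans a mono tone to one side by delaying and attenuating one ear. The azimuth, the ~0.7 ms maximum delay, and the ~6 dB maximum level difference are assumed illustrative values, not figures from the paper.

import numpy as np

sr = 16000
t = np.arange(sr) / sr
mono = np.sin(2 * np.pi * 440 * t)          # 1 second of a 440 Hz tone as the mono input

# Hypothetical source direction: 30 degrees to the listener's right.
azimuth = np.deg2rad(30)
itd = 0.0007 * np.sin(azimuth)              # interaural time difference (assumed ~0.7 ms max)
ild = 10 ** (6 * np.sin(azimuth) / 20)      # interaural level difference (assumed ~6 dB max)

delay = int(round(abs(itd) * sr))           # the far (left) ear hears the sound later...
left = np.concatenate([np.zeros(delay), mono[:len(mono) - delay]]) / ild   # ...and quieter
right = mono                                # near ear: no delay, full level
stereo = np.stack([left, right], axis=1)    # crude binaural rendering of the mono tone

print(stereo.shape)                         # (16000, 2): a two-channel signal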