
Tech News - Data

Introducing Deon, a tool for data scientists to add an ethics checklist

Natasha Mathur
06 Sep 2018
5 min read
DrivenData has come out with a new tool, named Deon, which allows you to easily add an ethics checklist to your data science projects. Deon is aimed at pushing forward the conversation about ethics in data science, machine learning, and artificial intelligence by providing actionable reminders to data scientists. According to the Deon team, "it's not up to data scientists alone to decide what the ethical course of action is. This has always been a responsibility of organizations that are part of civil society. This checklist is designed to provoke conversations around issues where data scientists have particular responsibility and perspective".

Deon comes with a default checklist, but you can also develop your own custom checklists by removing items and sections, or by marking items as N/A, depending on the needs of the project. Each item in the default checklist is also linked to real-world examples. To run Deon on your data science projects, you need Python 3 or greater. Let's now discuss the two types of checklists, default and custom, that come with Deon.

Default checklist

The default checklist comprises sections on Data Collection, Data Storage, Analysis, Modeling, and Deployment.

Data Collection: This section covers informed consent, collection bias, and limiting PII exposure. Informed consent means having a mechanism for gathering consent in which users clearly understand what they are consenting to. Collection bias checks for sources of bias introduced during data collection and survey design. Lastly, limiting PII exposure covers ways to minimize the exposure of personally identifiable information (PII).

Data Storage: This section covers data security, the right to be forgotten, and a data retention plan. Data security refers to a plan to protect and secure data. The right to be forgotten requires a mechanism by which an individual can have their personal information removed. Data retention consists of a plan to delete the data when it is no longer needed.

Analysis: This section covers missing perspectives, dataset bias, honest representation, privacy in analysis, and auditability. Missing perspectives addresses blind spots in data analysis through engagement with relevant stakeholders. Dataset bias involves examining the data for possible sources of bias and taking steps to mitigate or address them. Honest representation checks whether visualizations, summary statistics, and reports honestly represent the underlying data. Privacy in analysis ensures that data containing PII is not used or displayed unless necessary for the analysis. Auditability refers to producing an analysis that is well documented and reproducible.

Modeling: This section covers proxy discrimination, fairness across groups, metric selection, explainability, and communicating bias. Proxy discrimination is about ensuring that the model does not rely on variables or proxies that are discriminatory. Fairness across groups cross-checks whether the model results have been tested for fairness with respect to different affected groups. Metric selection considers the effects of optimizing for the defined metrics as well as additional metrics. Explainability is about explaining the model's decisions in understandable terms. Communicating bias makes sure that the shortcomings, limitations, and biases of the model have been properly communicated to relevant stakeholders.

Deployment: This section covers redress, roll back, concept drift, and unintended use. Redress involves discussing with the organization a plan for responding in case users are harmed by the results. Roll back refers to a way to turn off or roll back the model in production when required. Concept drift refers to the relationship between input and output data in a problem changing over time; this item reminds the user to test for and monitor concept drift, to ensure that the model remains fair over time. Unintended use prompts the user to take steps to identify and prevent unintended uses and abuse of the model.

Custom checklists

For projects with particular concerns, it is recommended to create your own checklist.yml file. Custom checklists are required to follow the same schema as checklist.yml: they need a top-level title, which is a string, and sections, which are a list. Each section in the list must have a title, a section_id, and then a list of lines. Each line must include a line_id, a line_summary, and a line string which is the content (see the sketch below). When changing the default checklist, keep in mind that Deon's goal is to have checklist items that are actionable. This is why users are advised to avoid items that are either vague (e.g., "do no harm") or extremely specific (e.g., "remove social security numbers from data").

For more information, be sure to check out the official DrivenData blog post.

Read next
The Cambridge Analytica scandal and ethics in data science
OpenAI charter puts safety, standards, and transparency first
20 lessons on bias in machine learning systems by Kate Crawford at NIPS 2017
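To make the custom checklist schema described above concrete, here is a minimal sketch of what a checklist.yml might look like, written as a Python dictionary and dumped to YAML with PyYAML. The item texts and IDs are illustrative placeholders, not Deon's default checklist, and you should check the project's documentation for how to point the deon command line at a custom file.

```python
# Illustrative sketch of a custom Deon checklist following the schema described in the
# article: a top-level title, a list of sections (title, section_id, lines), and lines
# with line_id, line_summary, and line. The content below is made up for the example.
import yaml  # pip install pyyaml

custom_checklist = {
    "title": "Project Ethics Checklist",
    "sections": [
        {
            "title": "Data Collection",
            "section_id": "A",
            "lines": [
                {
                    "line_id": "A.1",
                    "line_summary": "Informed consent",
                    "line": "If there are human subjects, have they given informed "
                            "consent with a clear understanding of the data uses?",
                },
                {
                    "line_id": "A.2",
                    "line_summary": "Collection bias",
                    "line": "Have we considered sources of bias introduced during "
                            "data collection and survey design?",
                },
            ],
        },
    ],
}

# Write the custom checklist to disk so it can be supplied to Deon in place of the default.
with open("checklist.yml", "w") as f:
    yaml.safe_dump(custom_checklist, f, sort_keys=False)
```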

California replaces cash bail with algorithms

Richard Gall
05 Sep 2018
2 min read
Last week (August 28), California Governor Jerry Brown signed a bill that will see cash bail replaced by an algorithm. Set to take effect in October 2019, it will mean that if you're accused of a crime you won't simply be able to pay money as a form of collateral before your trial. Instead, you'll be 'graded' by an algorithm according to how likely you are to abscond or commit another crime. However, the algorithm won't make the final decision - the grade is rather a guide for a county official who will then decide whether to grant bail. In a statement, Brown said: "Today, California reforms its bail system so that rich and poor alike are treated fairly." However, there are plenty who disagree that that will be the case.

Criticism of the legislation

The move has been met with criticism from civil liberties groups and AI watchdogs. Although cash bail has long drawn criticism for making wealth the arbiter of someone's freedom, placing judicial decision making in the hands of algorithms could, these groups argue, similarly discriminate and entrench established injustice and social divisions. Rashida Richardson, policy director at AI think tank AI Now, speaking to Quartz, said that "a lot of these criminal justice algorithmic-based systems are relying on data collected through the criminal justice system." This means, Richardson explains, "you have data collection that's flawed with a lot of the same biases as the criminal justice system." Raj Jayadev, from Silicon Valley Debug, also speaking to Quartz, said that the legislation will "lead to an increase in pretrial detention."

The details have yet to be finalized, but it's believed that the bill's impact will be reviewed in 2023. The most crucial element for this project to work is transparency - whether lawmakers and law enforcement will provide transparency on the algorithm and how it is used remains to be seen.

Read next
Amazon is selling facial recognition technology to police
Alarming ways governments are using surveillance tech to watch you
Lerna development team quickly reverses decision to block ICE contractors from using its software

Twitter’s CEO, Jack Dorsey’s Senate Testimony: On Twitter algorithms, platform health, role in elections and more

Sugandha Lahoti
05 Sep 2018
5 min read
Last week, on Monday, the House Energy and Commerce Committee announced that Twitter CEO Jack Dorsey will testify before the committee regarding Twitter algorithms and content monitoring. The hearing is scheduled for the afternoon of Wednesday, September 5, 2018, and Dorsey has now released his testimony ahead of appearing before the committee today.

In mid-August, Jack Dorsey announced plans to rethink how Twitter works to combat fake news and data scandals. In July, Twitter deleted 70 million fake accounts in an attempt to curb fake news and improve Twitter algorithms, and it has been constantly suspending accounts that are inauthentic, spammy, or created via malicious automated bots. In his testimony, Dorsey pushed back against critics who have accused Twitter of censoring political content and of not acting quickly enough to take down hateful expression: "Twitter does not use political ideology to make any decisions, whether related to ranking content on our service or how we enforce our rules." His testimony provided information on four important factors that E&C sought answers for: improving Twitter's health, Twitter algorithms, Twitter's work on Russian meddling in the 2016 elections, and recent malicious activity Twitter saw on the platform.

Improving Twitter's health

Regarding Twitter's health, Dorsey asserted that the platform has developed a more "comprehensive framework" to encourage healthy debate, conversations, and critical thinking. Earlier this year, Twitter began collaborating with the non-profit research center Cortico and the MIT Media Lab on exploring how to measure aspects of the health of the public sphere, and asked outside experts to propose health metrics for Twitter. Following this, the social platform is partnering with the University of Oxford, Leiden University, and other academic institutions to better measure the health of Twitter, focusing on "informational echo chambers and unhealthy discourse on Twitter".

Twitter algorithms

In 2016, Twitter introduced a new Home Timeline ranking feature to show people the Tweets they might find most interesting first. Twitter also has a notification timeline that enables people to see who has liked, retweeted, and replied to their Tweets, as well as who mentioned or followed them. Twitter algorithms also support "Safe Search": in this mode, potentially sensitive content (such as spam, adult content, and content from accounts an individual has muted or blocked) is excluded from search results. Individual accounts may also mark their own posts as sensitive. Twitter further uses behavioral signals to determine how Tweets are organized. When a Tweet is identified as potentially detracting from a healthy conversation, it will only be available to view if you click on "Show more replies" or choose to see everything in your search settings. Using behavioral signals, Twitter has seen a 4 percent drop in abuse reports from search and 8 percent fewer abuse reports from conversations. In preparation for the hearing, Twitter data scientists analyzed Tweets sent by all members of the House and Senate that have Twitter accounts over a 30-day period spanning July 23, 2018 to August 13, 2018. They observed no statistically significant difference between the number of times a Tweet by a Democrat is viewed and a Tweet by a Republican; their performance is the same because the Twitter platform itself does not take sides.

Twitter's work on Russian meddling in the 2016 elections

After Russian interference in the 2016 US presidential elections, Twitter conducted a comprehensive review of platform activity. The company identified 50,258 automated accounts that were Russian-linked and Tweeting election-related content. It also conducted an analysis of accounts that promoted election-related Tweets on the platform throughout 2016 in the form of paid ads. The two most active accounts were affiliated with Russia Today ("RT"), which Twitter subsequently barred from advertising. Per Dorsey's testimony, "Twitter's main focus is on promoting healthy public discourse through protection of the democratic process. We must continue to work together with our elected officials, government partners, industry peers, outside experts, and other stakeholders so that the American people and the global community can understand the full context in which these threats arise."

Recent malicious activity Twitter saw on its platform

Twitter has suspended a total of 3,843 malicious accounts affiliated with the Russian Internet Research Agency. "As an example of Twitter's ongoing efforts, Twitter identified 18 accounts in March 2018, believed to be linked to the Internet Research Agency uncovered by our ongoing additional reviews." Twitter has also suspended 770 accounts located in Iran for violating its policies. These accounts were in violation of Twitter's platform manipulation policies and were engaged in coordinated activity intended to propagate messages artificially across accounts. Dorsey says, "Twitter has been in close contact with our industry peers about the malicious accounts located within Iran - we have received detailed information from them that has assisted us in our investigation, and we have shared our own details and work with other companies."

Dorsey ends his testimony by saying that increased transparency is critical to promoting healthy public conversation on Twitter and earning trust. "We remain vigilant about identifying and eliminating abuse on the platform perpetrated by hostile foreign actors, and we will continue to invest in resources and leverage our technological capabilities to do so," he wrote. You can read Dorsey's entire testimony here.

Read next
Facebook, Twitter take down hundreds of fake accounts with ties to Russia and Iran, suspected to influence the US midterm elections
Twitter's trying to shed its skin to combat fake news and data scandals, says Jack Dorsey
Twitter allegedly deleted 70 million fake accounts in an attempt to curb fake news

Ethereum Constantinople hard fork to move Ethereum from PoW (proof-of-work) to PoS (proof-of-stake) model

Prasad Ramesh
05 Sep 2018
2 min read
The Ethereum development team streamed a live meeting recently, in which they talked about a hard fork on the Ethereum blockchain. A hard fork is a drastic change in the platform's protocol. This hard fork, named 'Constantinople', is a step toward moving Ethereum from a proof-of-work (PoW) to a proof-of-stake (PoS) model. The meeting happened on 31st August and did not involve Ethereum founder Vitalik Buterin.

The move to proof-of-stake will affect mineable coins through what is known as the "difficulty bomb", which the dev team decided to delay for another year. Until then, ETH will still be mineable as usual, including with ASIC cards. The difficulty bomb will act as a deterrent for miners who choose to continue using PoW after Ethereum shifts to PoS, with speculation pointing to early next year. After the implementation, mining will stay, but the rewards will be lower: the hard fork will reduce the block reward from 3 ETH to 1 ETH, cutting it to a third of its current value. This is in line with a PoS model, where mining is not used to maintain the cryptographic integrity of the ledger.

YouTube commentators viewed the move as a way to reduce block rewards and push up ETH's market price, and on Twitter, too, some see it as a move to drive up the ETH price. There is no announced date for Constantinople to take effect, but another meeting is scheduled in two weeks, in which more information should be made available. The ETH price is depressed as miners have to sell it to cover mining costs; as of now, Ethereum trades for about $284, a drop from $411, which was the price in early August.

There are Ethereum Improvement Proposals (EIPs) that include proposals to reduce rewards, bitwise shifting instructions in the EVM, simpler blockhash refactoring, and others. You can read about the EIPs on GitHub. For more details, you can view the streamed YouTube video.

Read next
Vitalik Buterin's new consensus algorithm to make Ethereum 99% fault tolerant
Ethereum Blockchain dataset now available in BigQuery for smart contract analytics
How to set up an Ethereum development environment [Tutorial]

Apache Flink founders data Artisans could transform stream processing with patent-pending tool

Richard Gall
04 Sep 2018
2 min read
data Artisans, the stream processing team behind Apache Flink, today unveiled data Artisans Streaming Ledger at the Flink Forward conference in Berlin. Streaming Ledger, according to data Artisans, "extends the scope of stream processing with fast, serializable ACID transactions directly on streaming data." This is significant because previously, performing serializable transactions across streaming data without losing data consistency was impossible. If data Artisans are right about Streaming Ledger, that's not only good news for them, it's good news for developers and system architects struggling to manage streaming data within their applications.

Read next: Say hello to streaming analytics

How Streaming Ledger fits into a data streaming architecture

Streaming Ledger is essentially a new component within data Artisans' existing data streaming architecture, which includes Apache Flink.

(Image: the architecture of data Artisans Platform, via data-artisans.com)

Stephan Ewen, co-founder and CTO at data Artisans, said that "guaranteeing serializable ACID transactions is the crown discipline of data management." He also claimed that Streaming Ledger does "something that even some large established databases fail to provide. We are very proud to have come up with a way to solve this problem for real time data streams, and make it fast and easy to use."

Read next: Apache Flink version 1.6.0 released!

How Streaming Ledger works

It's not easy for streaming technologies to process event streams across shared states and tables. That's why streaming is so tough (okay, just about impossible) when used with relational databases. Streaming Ledger, however, works by isolating tables from concurrent changes as they are modified in transactions. This helps ensure consistency is maintained across your data, as you might expect in a really robust relational database.

(Image: data Artisans Streaming Ledger functionality, via data-artisans.com)

data Artisans have also produced a white paper that details how Streaming Ledger works, as well as further information about why you might want to use it. You need to provide details to gain access, but you can find it here.

UN meetings ended with US & Russia avoiding formal talks to ban AI-enabled killer robots

Fatema Patrawala
04 Sep 2018
2 min read
The United States and Russia were among a small number of countries that blocked the UN from moving toward talks on whether to ban so-called killer robots. As per Politico, a week of UN meetings in Geneva concluded in the early hours of Saturday. During the meetings, a group at the United Nations' Convention on Conventional Weapons (CCW) discussed whether to take negotiations on fully autonomous weapons powered by artificial intelligence to a formal level that could lead to a treaty banning them. However, the list of non-binding recommendations that participating countries agreed on dodged the question of whether to move on to formal negotiations.

Mary Wareham, coordinator of the Campaign to Stop Killer Robots, said that Russia, the US, South Korea, Israel, and Australia were the main countries to oppose this call. "It's a disappointment, of course, that a small minority of large military powers can hold back the will of the majority," she said. Wareham's group represents 75 non-governmental organizations in 32 countries fighting for a ban on weapons that use AI technology to choose their targets. It says 26 countries endorse a full ban on such weapons.

Throughout the meeting, many of those countries reiterated their call for strong regulation, pushing for the UN to start formal negotiations at least by next year. Doing so would be the next step toward binding international rules, but opponents of the ban stood firm. The document issued at the end of the meeting recommends that non-binding talks should continue.

Read next
Russian censorship board threatens to block search giant Yandex due to pirated content
The New AI Cold War Between China and the USA
Microsoft claims it halted Russian spearphishing cyberattacks
How Netflix uses AVA, an Image Discovery tool to find the perfect title image for each of its shows

Melisha Dsouza
04 Sep 2018
5 min read
Netflix, the video-on-demand streaming company, has seen a surge both in its number of users and in the viewership of its TV shows, and it is constantly striving to provide an enriching experience to its viewers. To keep pace with the ever-increasing demands of user experience, Netflix is introducing a collection of tools and algorithms to make its content more relevant to its audience. AVA (Aesthetic Visual Analysis) analyses large volumes of images obtained from the video frames of a particular TV show in order to select a title image for that show. Netflix understands that a visually appealing title image plays an incredibly important role in helping a viewer find new shows and movies to watch.

How title images are selected normally

Usually, content editors have to go through tens of thousands of video frames for a show to select a good title image. To give you a sense of the effort required, a single one-hour episode of 'Stranger Things' consists of nearly 86,000 static video frames. Imagine sifting through each of these frames painstakingly to find the perfect title image that will not only connect with the viewers but also give them a gist of the storyline. To top it all off, the number of frames can go up to a million depending on the number of episodes in a show. Manually screening the frames is labor intensive to the point of being almost impossible. Additionally, the editors choosing the image stills need in-depth expertise in the source content they are intended to represent. Considering Netflix has an exponentially growing catalog of shows, this places a very challenging expectation on editors to surface meaningful images from videos. Enter AVA, which uses image classification algorithms to surface the right image at the right time.

What is AVA?

The ever-growing number of images on the internet has led to challenges in their processing and classification. To address this concern, a research team from the University of Barcelona, Spain, in collaboration with Xerox Corporation, developed a method called Aesthetic Visual Analysis (AVA) as a research project. The project contains a vast database of more than 250,000 images combined with metadata such as aesthetic scores, semantic labels for more than 60 categories of images, and many other characteristics. Using statistical measures such as standard deviation, mean score, and variance, AVA rates images; based on the distributions computed from these statistics, the researchers assess the semantic challenges and choose the right images for the database. AVA primarily alleviates the need for extensive benchmarking and allows more images to be used for training. It also helps surface images with better aesthetic appeal, and computing performance can be significantly optimised to reduce the impact on hardware. You can get more insights by reading the research paper.

The 'AVA' approach used at Netflix

The process takes place in three steps. AVA starts by analysing images obtained through the process of frame annotation. This involves processing and annotating many different variables on every individual frame of video to derive what the frame contains and to understand its importance to the story. To keep up with its growing catalog of content, Netflix uses the Archer framework to process videos more efficiently; Archer splits the video into very small chunks to aid parallel video processing. After the frames are obtained, they are subjected to a series of image recognition algorithms to build metadata.

The metadata is further classified as visual, contextual, and composition metadata. To give a brief overview:

Visual metadata: for brightness, sharpness, and color.
Contextual metadata: a combination of elements used to derive meaning from the actions or movement of the actors, objects, and camera in the frame, e.g. face detection, motion estimation, object detection, and camera shot identification.
Composition metadata: for intricate image details based on core principles of photography, cinematography, and visual aesthetic design, such as depth of field and symmetry.

Choosing the right picture!

The 'best' image is chosen with three important aspects in mind: the lead actors, visual range, and sensitivity filters (see the sketch below). Emphasis is given first to the lead actors of the show, since they make a visual impact. In order to identify the key character for a given episode, AVA utilizes a combination of face clustering and actor recognition to filter main characters from secondary characters or extras. Next comes the diversity of the images present in the video frames, which includes camera positions and image details such as brightness, color, and contrast, to name a few. Keeping these in mind, image frames are easy to group based on similarities. This helps in developing image support vectors, which primarily assist in designing an image diversity index where all the relevant images collected for an episode or even a movie can be scored based on visual appeal. Sensitive factors such as violence, nudity, and advertisements are filtered and allotted low priority in the image vectors; this way they are screened out completely in the process.

Source: Netflix Blog

What's in this for Netflix and its users?

Netflix's decision to use AVA will not only save manual labour but also reduce the cost of having people sift through millions of images to get that one perfect shot. This approach will help in obtaining meaningful images from video and thus enable creative teams to invest their time in designing stunning artwork. As for its users, a good title image means establishing a deeper connection to the show's characters and storyline, thus improving their overall experience. To understand the intricate workings of AVA, you can read the Netflix engineering team's original post on this topic on Medium.

Read next
How everyone at Netflix uses Jupyter notebooks from data scientists, machine learning engineers, to data analysts
Netflix releases FlameScope
Netflix brings in Verna Myers as new VP of Inclusion strategy to boost cultural diversity
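The selection step described above boils down to scoring annotated frames on their metadata, boosting frames that feature lead actors, and screening out sensitive ones. The toy Python sketch below illustrates that idea only; the field names, weights, and thresholds are invented for the example and are not Netflix's actual code or metadata schema.

```python
# Toy sketch: rank candidate frames by visual/contextual metadata, drop sensitive frames,
# and boost frames featuring lead actors. All fields and weights are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class FrameMetadata:
    frame_id: int
    brightness: float       # visual metadata, normalized to 0..1
    sharpness: float        # visual metadata, normalized to 0..1
    has_lead_actor: bool    # contextual metadata from face clustering / actor recognition
    is_sensitive: bool      # violence / nudity / advertisement filter

def score(frame: FrameMetadata) -> float:
    if frame.is_sensitive:
        return 0.0                           # sensitive frames are screened out entirely
    s = 0.4 * frame.brightness + 0.4 * frame.sharpness
    if frame.has_lead_actor:
        s += 0.2                             # lead actors get priority
    return s

candidates = [
    FrameMetadata(1, 0.80, 0.90, True, False),
    FrameMetadata(2, 0.90, 0.70, False, False),
    FrameMetadata(3, 0.95, 0.95, True, True),   # filtered out despite strong visuals
]
best = max(candidates, key=score)
print(f"best title-image candidate: frame {best.frame_id}")
```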

PyTorch-based HyperLearn Statsmodels aims to implement a faster and leaner GPU Sklearn

Melisha Dsouza
04 Sep 2018
3 min read
HyperLearn is a statsmodels-like package built on PyTorch, NoGil Numba, NumPy, Pandas, SciPy, and LAPACK, with an interface similar to Scikit-Learn. The project was started last month by Daniel Hanchen and still has some unstable packages. He aims to make Linear Regression, Ridge, PCA, and LDA/QDA faster, which then flows on to other algorithms being faster. The combination incorporates novel algorithms to make models up to 50% faster while using 50% less RAM, along with a leaner GPU Sklearn. HyperLearn also has embedded statistical inference measures, which can be called with Scikit-Learn-like syntax (model.confidence_interval_).

HyperLearn's speed/memory comparison

There is a 50%+ improvement on Quadratic Discriminant Analysis (with similar improvements for other models), as can be seen below:

Source: GitHub

Time(s) is Fit + Predict. RAM(mb) = max( RAM(Fit), RAM(Predict) )

Key methodologies and aims of the HyperLearn project

#1 Parallel for loops: HyperLearn's for loops will include memory sharing and memory management, with CUDA parallelism made possible through PyTorch and Numba.

#2 50%+ faster and leaner: Improved matrix operations include matrix multiplication ordering, element-wise matrix multiplication reducing complexity to O(n^2) from O(n^3), reducing matrix operations to Einstein notation, and evaluating one-time matrix operations in succession to reduce RAM overhead. Applying QR decomposition followed by SVD (singular value decomposition) might be faster in some cases. The structure of the matrix is used to compute inverses faster, and computing SVD(X) and then getting pinv(X) is sometimes faster than pure pinv(X) (see the sketch below).

#3 Statsmodels is sometimes slow: Confidence and prediction intervals, hypothesis tests, and goodness-of-fit tests for linear models are optimized, using Einstein notation and Hadamard products where possible, computing only what is necessary (the diagonal of a matrix only), and fixing the flaws of Statsmodels around notation, speed, memory issues, and storage of variables.

#4 Deep learning drop-in modules with PyTorch: using PyTorch to create Scikit-Learn-like drop-in replacements.

#5 20%+ less code, along with cleaner, clearer code: using decorators and functions wherever possible, intuitive middle-level function names (isTensor, isIterable), and handling parallelism easily through hyperlearn.multiprocessing.

#6 Access to old and exciting new algorithms: matrix completion algorithms such as non-negative least squares and NNMF, Batch Similarity Latent Dirichlet Allocation (BS-LDA), correlation regression, and many more.

Daniel also published some preliminary algorithm timing results on a range of algorithms from MKL SciPy, PyTorch, MKL NumPy, HyperLearn's methods, and Numba JIT-compiled algorithms. Here are his key findings on the HyperLearn statsmodel:

HyperLearn's pseudoinverse has no speed improvement.
HyperLearn's PCA will have over a 200% boost in speed.
HyperLearn's linear solvers will be over 1 times faster, i.e. a 100%+ improvement in speed.

You can find all the details of the test on reddit.com. For more insights on HyperLearn, check out the release notes on GitHub.

Read next
A new geometric deep learning extension library for PyTorch releases!
NVIDIA leads the AI hardware race. But which of its GPUs should you use for deep learning?
Introduction to Sklearn
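One of the claims above, that computing SVD(X) and then deriving pinv(X) can compete with calling pinv directly, is easy to try for yourself. The NumPy sketch below does exactly that on a random tall-and-skinny matrix; it is a quick experiment, not HyperLearn's implementation, and the timings will vary with your BLAS/LAPACK build and matrix shape.

```python
# Quick experiment: pseudoinverse via an explicit SVD versus np.linalg.pinv.
import time
import numpy as np

def pinv_via_svd(X, rcond=1e-15):
    """Compute the Moore-Penrose pseudoinverse as V diag(1/s) U^T from a thin SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    cutoff = rcond * s.max()
    s_inv = np.where(s > cutoff, 1.0 / s, 0.0)
    return (Vt.T * s_inv) @ U.T

rng = np.random.default_rng(0)
X = rng.standard_normal((4000, 500))      # tall-and-skinny design matrix

t0 = time.perf_counter(); P1 = np.linalg.pinv(X); t1 = time.perf_counter()
P2 = pinv_via_svd(X);                      t2 = time.perf_counter()

print(f"np.linalg.pinv: {t1 - t0:.3f}s, SVD route: {t2 - t1:.3f}s, "
      f"max abs diff: {np.abs(P1 - P2).max():.2e}")
```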

Amazon is supporting research into conversational AI with Alexa fellowships

Sugandha Lahoti
03 Sep 2018
3 min read
Amazon has chosen recipients from all over the world to be awarded the Alexa fellowships. The Alexa Fellowships program is open for PhD and post-doctoral students specializing in conversational AI at select universities. The program was launched last year, when four researchers won awards.

Amazon's Alexa Graduate Fellowship

The Alexa Graduate Fellowship supports conversational AI research by providing funds and mentorship to PhD and postdoctoral students. Faculty Advisors and Alexa Graduate Fellows will also teach conversational AI to undergraduate and graduate students using the Alexa Skills Kit (ASK) and Alexa Voice Services (AVS). The graduate fellowship recipients are selected based on their research interests, planned coursework and existing conversational AI curriculum. This year the institutions include six in the United States, two in the United Kingdom, one in Canada and one in India. The 10 universities are:

Carnegie Mellon University, Pittsburgh, PA
International Institute of Information Technology, Hyderabad, India
Johns Hopkins University, Baltimore, MD
MIT App Inventor, Boston, MA
University of Cambridge, Cambridge, United Kingdom
University of Sheffield, Sheffield, United Kingdom
University of Southern California, Los Angeles, CA
University of Texas at Austin, Austin, TX
University of Washington, Seattle, WA
University of Waterloo, Waterloo, Ontario, Canada

Amazon's Alexa Innovation Fellowship

The Alexa Innovation Fellowship is dedicated to innovations in conversational AI. The program was introduced this year and Amazon has partnered with university entrepreneurship centers to help student-led startups build their innovative conversational interfaces. The fellowship also provides resources to faculty members. This year ten leading entrepreneurship center faculty members were selected as the inaugural class of Alexa Innovation Fellows. They are invited to learn from the Alexa team and network with successful Alexa Fund entrepreneurs. Instructors will receive funding, Alexa devices, hardware kits and regular training, as well as introductions to successful Alexa Fund-backed entrepreneurs. The 10 universities selected to receive the 2018-2019 Alexa Innovation Fellowship include:

Arizona State University, Tempe, AZ
California State University, Northridge, CA
Carnegie Mellon University, Pittsburgh, PA
Dartmouth College, Hanover, NH
Emerson College, Boston, MA
Texas A&M University, College Station, TX
University of California, Berkeley, CA
University of Illinois, Urbana-Champaign, IL
University of Michigan, Ann Arbor, MI
University of Southern California, Los Angeles, CA

"We want to make it easier and more accessible for smart people outside of the company to get involved with conversational AI. That's why we launched the Alexa Skills Kit (ASK) and Alexa Voice Services (AVS) and allocated $200 million to promising startups innovating with voice via the Alexa Fund," wrote Kevin Crews, Senior Product Manager for the Amazon Alexa Fellowship, in a blog post. Read more about the 2018-2019 Alexa Fellowship class on the Amazon blog.

Read next
Cortana and Alexa become best friends: Microsoft and Amazon release a preview of this integration
Voice, natural language, and conversations: Are they the next web UI?

REVOLVER: A machine learning approach to forecast cancer growth

Bhagyashree R
03 Sep 2018
3 min read
A team of researchers from the Institute of Cancer Research London (ICR) and the University of Edinburgh have devised a method named REVOLVER (repeated evolution in cancer). It uses a machine learning approach, specifically transfer learning, to find patterns in DNA mutation within cancers and uses that information to forecast future genetic changes. REVOLVER exploits multiple independent noisy observations taken from single patients and transfers information between patients to de-noise the data and highlight hidden evolutionary patterns. Along with explaining the data in each patient, the individual models also highlight subgroups of tumors that evolved similarly.

The goal of this model is to address one of the biggest challenges in oncology: a tumor can, over time, progress from benign to malignant, become metastatic, and develop resistance to certain therapies. This occurs through a process of clonal evolution that involves cancer cells and their microenvironment, and results in intratumor heterogeneity (ITH). ITH contributes to the deadly outcome of cancer by providing the substrate of phenotypic variation on which adaptation can occur.

How does REVOLVER work?

To accurately detect and compare changes in each tumour, the team used 768 tumour samples from 178 patients reported in previous studies of lung, breast, kidney and bowel cancer, and analysed the data within each cancer type separately.

Source: Nature Methods

First, genomic ITH is characterized with the help of multi-region sampling. Patient subgroups share some evolutionary trajectories with common somatic drivers, but these remain hidden because of the apparent variability in genomic patterns between patients. Using the standard approach, the phylogenetic tree (evolutionary model) for every patient is inferred and compared across the n trees; because the trees are independently inferred, the statistical signal for repeated evolution is weak and few trajectories are identified. REVOLVER instead uses transfer learning to infer the n models jointly and increase their structural correlation. These n trees explain the data in each patient while highlighting repeated evolutionary trajectories in the subgroup.

How will it help in cancer treatment?

Combining current knowledge of cancer with the identified repeated patterns, scientists could predict the future trajectory of tumour development. This method gives doctors advance knowledge of how a tumour will evolve, so that they can help the patient at an earlier stage. The researchers also found a link between certain sequences of repeated tumour mutations and survival outcome. Repeated patterns of DNA mutations could be used to anticipate the likely course of a cancer, which could help in shaping future treatment. The method could also be used to predict whether patients will develop resistance in future, if tumours with certain patterns are found to develop resistance to a particular treatment.

Dr Andrea Sottoriva, a team leader in evolutionary genomics and modelling at the ICR who was part of this study, believes that this AI tool could help doctors find a treatment at an earlier stage: "By giving us a peek into the future, we could potentially use this AI tool to intervene at an earlier stage, predicting cancer's next move."

To explore the REVOLVER method in more depth, check out the paper: Detecting repeated cancer evolution from multi-region tumor sequencing data.
Read next
Google, Harvard researchers build a deep learning model to forecast earthquake aftershocks location with over 80% accuracy
8 Machine learning best practices
How everyone at Netflix uses Jupyter notebooks from data scientists, machine learning engineers, to data analysts
iCAN module uses Faster R-CNN for detecting human-object interaction

Savia Lobo
03 Sep 2018
3 min read
Researchers from Virginia Tech, Chen Gao, Yuliang Zou, and Jia-Bin Huang, recently published a paper on 'iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection.' In it, they propose an instance-centric attention module (iCAN) for human-object interaction detection. The module builds on the Faster R-CNN object detection framework, which makes it much more effective at identifying and understanding human-object interactions.

In order to understand the situation in a scene or an image, computers need to recognize how humans interact with surrounding objects. This is the task of human-object interaction (HOI) detection, which localizes a person and an object and then identifies the relationship, or interaction, between them. The core idea of this research is that the appearance of a person or an object in an image contains informational cues about the parts of the image that are most relevant for an algorithm to attend to, which should make predictions easier. To exploit this cue, the researchers propose an instance-centric attention module that learns to dynamically highlight regions in an image conditioned on the appearance of each instance (see the sketch below). The network can thus selectively aggregate features relevant for recognizing human-object interactions. The researchers validated the efficacy of the proposed network on the COCO and HICO-DET datasets and showed that this approach compares favorably with the state of the art.

iCAN module

Highlights of the iCAN paper include:

The researchers have introduced an instance-centric attention module that allows the network to dynamically highlight informative regions for improving HOI detection.
They have established a new state-of-the-art performance on two large-scale HOI benchmark datasets.
They conducted a detailed ablation study and error analysis to identify the relative contributions of the individual components and quantify different types of errors.
They also released the source code and pre-trained models to facilitate future research.

Advantages of the iCAN module

Unlike hand-designed contextual features based on pose, the entire image, or secondary regions, iCAN's attention map is automatically learned and jointly trained with the rest of the network to improve performance.
Compared with attention modules designed for image-level classification, the instance-centric attention map provides greater flexibility, as it allows attending to different regions of an image depending on the object instance.

To learn about iCAN in detail, head on to the research paper.

Read next
Build intelligent interfaces with CoreML using a CNN [Tutorial]
CapsNet: Are Capsule networks the antidote for CNNs kryptonite?
A new Stanford artificial intelligence camera uses a hybrid optical-electronic CNN for rapid decision making
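As a rough illustration of the instance-centric attention idea, the PyTorch sketch below computes an attention map over a convolutional feature map conditioned on one instance's appearance feature and uses it to pool a contextual feature. The layer sizes and projection scheme are assumptions made for the example; this is a conceptual sketch, not the authors' released implementation.

```python
# Conceptual sketch: attend over a conv feature map conditioned on one instance's appearance.
import torch
import torch.nn as nn
import torch.nn.functional as F

class InstanceCentricAttention(nn.Module):
    def __init__(self, inst_dim, map_dim, embed_dim=512):
        super().__init__()
        self.inst_proj = nn.Linear(inst_dim, embed_dim)                 # project instance feature
        self.map_proj = nn.Conv2d(map_dim, embed_dim, kernel_size=1)    # project feature map

    def forward(self, inst_feat, feat_map):
        # inst_feat: (B, inst_dim) appearance feature of a detected person/object
        # feat_map:  (B, map_dim, H, W) convolutional features of the whole image
        q = self.inst_proj(inst_feat)                                   # (B, E)
        k = self.map_proj(feat_map)                                     # (B, E, H, W)
        B, E, H, W = k.shape
        scores = torch.einsum('be,behw->bhw', q, k) / E ** 0.5          # similarity at every location
        attn = F.softmax(scores.view(B, -1), dim=1).view(B, 1, H, W)    # instance-centric attention map
        context = (attn * feat_map).sum(dim=(2, 3))                     # (B, map_dim) attended context
        return context, attn

# toy usage with random tensors standing in for detector outputs
m = InstanceCentricAttention(inst_dim=2048, map_dim=1024)
ctx, attn = m(torch.randn(2, 2048), torch.randn(2, 1024, 25, 38))
print(ctx.shape, attn.shape)
```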

Baidu Apollo autonomous driving vehicles get a machine learning-based auto-calibration system

Bhagyashree R
03 Sep 2018
2 min read
The Apollo community has built a machine learning-based auto-calibration system for autonomous driving vehicles. By August 2018, the system had been tested for more than two thousand hours over around ten thousand kilometers (6,213 miles) of road tests and has proven to be effective. The system is automated and intelligent, which makes it suitable for mass-scale self-driving vehicle deployment.

Why was the Apollo auto-calibration system introduced?

The main issues with the current approach are the following:

Manual calibration is time consuming and error prone: The performance and safety of an autonomous driving vehicle depend on its control module. This module includes control algorithms that require vehicle dynamics as input and then send commands to manipulate the vehicle. Performing this calibration in real time is difficult, which is why most research-oriented autonomous vehicles are calibrated manually, one by one. Manual calibration consumes a lot of time and is prone to human error.

Variation in vehicle dynamics: Vehicle dynamics change while driving (loads change, vehicle parts wear out over time, surface friction varies), and manual calibration cannot possibly cover all of these variations.

How does the Apollo auto-calibration system work?

The auto-calibration system depends on the Apollo control module and consists of an offline model and an online learning algorithm (see the sketch below).

Offline model: First, a calibration table is generated from human driving data that best reflects the vehicle's longitudinal performance at the time of driving. The offline stage performs three functions: it collects human driving data, preprocesses the data and selects input features, and generates the calibration table through machine learning models.

Online learning: The online algorithm updates the offline table based on real-time feedback in self-driving mode. It tries to best match the current vehicle dynamics, starting from the offline model established from manual driving data. It collects vehicle status and feedback in real time, preprocesses and filters the data, and adjusts the calibration table accordingly.

To learn more about how this model works and helps to solve the manual calibration problem, check out the published paper: Baidu Apollo Auto-Calibration System - An Industry-Level Data-Driven and Learning based Vehicle Longitude Dynamic Calibrating Algorithm.

Read next
Apollo 11 source code: A small step for a woman, and a huge leap for 'software engineering'
Baidu open sources ApolloScape and collaborates with Berkeley DeepDrive to further machine learning in automotives
Tesla is building its own AI hardware for self-driving cars
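To illustrate the offline-table-plus-online-update idea in the simplest possible terms, here is a small Python sketch that averages logged human-driving samples into a (speed, command) to acceleration table and then nudges individual cells from real-time feedback. The bin edges, smoothing factor, and data layout are all assumptions made for the example; Apollo's actual system uses machine-learned models rather than this naive averaging.

```python
# Illustrative only: offline calibration table from logged driving data, plus a simple
# online update. None of the constants or layouts below come from Apollo itself.
import numpy as np

speed_bins = np.linspace(0, 30, 16)        # vehicle speed bins, m/s
command_bins = np.linspace(-100, 100, 21)  # brake(-) / throttle(+) percentage bins

def fit_offline_table(speeds, commands, accels):
    """Average observed acceleration in each (speed, command) cell."""
    table = np.zeros((len(speed_bins) - 1, len(command_bins) - 1))
    counts = np.zeros_like(table)
    si = np.clip(np.digitize(speeds, speed_bins) - 1, 0, table.shape[0] - 1)
    ci = np.clip(np.digitize(commands, command_bins) - 1, 0, table.shape[1] - 1)
    np.add.at(table, (si, ci), accels)
    np.add.at(counts, (si, ci), 1)
    return np.divide(table, counts, out=np.zeros_like(table), where=counts > 0)

def online_update(table, speed, command, measured_accel, alpha=0.05):
    """Blend a new real-time measurement into the matching cell (exponential smoothing)."""
    si = min(max(np.digitize(speed, speed_bins) - 1, 0), table.shape[0] - 1)
    ci = min(max(np.digitize(command, command_bins) - 1, 0), table.shape[1] - 1)
    table[si, ci] = (1 - alpha) * table[si, ci] + alpha * measured_accel
    return table

# toy data standing in for logged human-driving samples
rng = np.random.default_rng(0)
speeds, commands = rng.uniform(0, 30, 1000), rng.uniform(-100, 100, 1000)
accels = 0.03 * commands - 0.02 * speeds + rng.normal(0, 0.1, 1000)
table = fit_offline_table(speeds, commands, accels)
table = online_update(table, speed=12.0, command=40.0, measured_accel=0.9)
```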

Ethereum Blockchain dataset now available in BigQuery for smart contract analytics

Savia Lobo
03 Sep 2018
3 min read
Google made the Bitcoin dataset publicly available for analysis in Google BigQuery in February this year. Along the same lines, it recently announced, on August 29th, the availability of an Ethereum dataset in BigQuery for smart contract analytics.

The Ethereum blockchain, like its predecessor Bitcoin, is an immutable distributed ledger. However, Vitalik Buterin, Ethereum's creator, extended its set of capabilities by including a virtual machine that can execute arbitrary code stored on the blockchain as smart contracts. The Ethereum blockchain data are now available for exploration with BigQuery; all historical data are in the ethereum_blockchain dataset, which is updated daily.

Need for Ethereum blockchain data availability on Google Cloud

The Ethereum blockchain's peer-to-peer software has an API for a subset of commonly used random-access functions, for instance checking transaction status, looking up wallet-transaction associations, and checking wallet balances. However, API endpoints exist neither for easy access to all of the data stored on-chain nor for viewing the blockchain data in aggregate. Below is an example chart showing the total Ether transferred and the average transaction cost, aggregated by day:

Source: Google

Such a visualization, underpinned by a database query, aids in making business decisions, from prioritizing improvements to the Ethereum architecture itself to balance sheet adjustments (see the query sketch below). BigQuery has strong OLAP capabilities to support this kind of analysis, both ad hoc and in general, and it does not require additional API implementation.

Accordingly, Google built a software system on Google Cloud that:

Synchronizes the Ethereum blockchain to computers running Parity in Google Cloud.
Performs a daily extraction of data from the Ethereum blockchain ledger, including the results of smart contract transactions such as token transfers.
De-normalizes and stores date-partitioned data in BigQuery for easy and cost-effective exploration.

Google has also demonstrated a number of interesting queries and visualizations based on the Ethereum dataset. The analyses focus on three topics: smart contract function calls, on-chain transaction time series and transaction networks, and smart contract function analytics.

The Ethereum blockchain dataset is also available on Kaggle. You can query the live data in Kernels, Kaggle's no-charge in-browser coding environment, using the BigQuery Python client library. The Ethereum ETL project on GitHub contains all the source code used to extract data from the Ethereum blockchain and load it into BigQuery. Read more about this news in detail on the Google Cloud blog.

Read next
Vitalik Buterin's new consensus algorithm to make Ethereum 99% fault tolerant
How to set up an Ethereum development environment [Tutorial]
Everything you need to know about Ethereum
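As a starting point for the kind of daily aggregation shown in the chart above, the sketch below uses the BigQuery Python client to sum Ether transferred per day. The dataset, table, and column names follow the ethereum_blockchain public dataset as described in the post, but treat them as assumptions and confirm them against the dataset's schema before relying on the results.

```python
# Minimal sketch with the BigQuery Python client (pip install google-cloud-bigquery pandas).
# Requires Google Cloud credentials and a project with BigQuery enabled. Table and column
# names (transactions, block_timestamp, value, gas_price, receipt_gas_used) are assumptions
# based on the public ethereum_blockchain dataset; values are denominated in wei.
from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT
  DATE(block_timestamp) AS tx_date,
  SUM(value) / 1e18 AS total_ether_transferred,
  AVG(gas_price * receipt_gas_used) / 1e18 AS avg_tx_cost_ether
FROM `bigquery-public-data.ethereum_blockchain.transactions`
GROUP BY tx_date
ORDER BY tx_date
"""

df = client.query(query).to_dataframe()   # pandas DataFrame of daily aggregates
print(df.tail())
```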
Anima Anandkumar, the machine learning guru behind AWS bids adieu to AWS

Sugandha Lahoti
01 Sep 2018
3 min read
Anima Anandkumar has bid adieu to AWS after working as a principal scientist at Amazon Web Services (AWS). She joined AWS in November 2016 as Principal Scientist on Deep Learning. She is best known for her work on the development and analysis of tensor algorithms and on the design, development, and launch of Amazon SageMaker. Anima has earned several prestigious awards, including the Alfred P. Sloan Research Fellowship, the NSF CAREER award, and the Young Investigator Research award.

After her successful two-year stint at AWS, she has left her post and written a heartwarming post on her personal blog. In her own words, "I want to recollect the rich learning experiences I had and the amazing things we accomplished over the last two years."

Amazon was Anima's first industry job out of academia. She saw huge potential to democratize AI and hence chose AWS, the most comprehensive and broadly adopted cloud platform. During her tenure at Amazon she worked on the latest GPU instances, DeepLens, and on computer vision, natural language processing, speech recognition, and other technologies. Her most important contribution, however, remains Amazon SageMaker, whose broad adoption led to AWS increasing its ML user base by more than 250 percent over the last year. Anima says, "It was personally fulfilling to build topic modeling on SageMaker (and AWS Comprehend) based on my academic research, which uses tensor decompositions. SageMaker topic-modeling automatically categorizes documents at scale and is several times faster than any other (open-source) framework. Taking the tensor algorithm from its theoretical roots to an AWS production service was a big highlight for me."

As part of applied research at AWS, she worked on deep active learning, crowdsourcing, and semi-supervised learning methods in a number of domains. She contributed to Amazon's community outreach by building partnerships with universities and non-profit organizations to democratize AI. She also represented AWS at many prominent venues, including Deep Learning Indaba 2017, the first pan-African deep learning summit; the Mulan forum for Chinese women entrepreneurs; the Geekpark forum for startups in China; and Shaastra 2018 at IIT Madras in India.

Anima has always been a supporter of women in tech. When she went to IIT Madras, she was struck by how few women were around her (the female-to-male ratio at IIT Madras was 1:20 then). "Even though I missed having more women in IIT, the women who got in there were remarkable since they overcame other barriers and still performed well; it gave a lot of confidence. Though I do wish there were more women and I'm always looking how to improve the diversity, it should be towards helping women overcome barriers (without compromising on performance/quality)." Her contributions are a reminder that women in tech have an outsized impact, even when their numbers are small.

Read Anima's adieu blog for a trip down her memory lane at AWS Cloud.

Read next
Apollo 11 source code: A small step for a woman, and a huge leap for 'software engineering'
"Technology opens up so many doors" - An Interview with Sharon Kaur from School of Code
Netflix brings in Verna Myers as new VP of Inclusion strategy to boost cultural diversity

Google, Harvard researchers build a deep learning model to forecast earthquake aftershocks location with over 80% accuracy

Bhagyashree R
31 Aug 2018
3 min read
Google and Harvard have teamed up to find a way to predict the locations where earthquake aftershocks might occur, with the help of deep learning. Currently, it is only possible to predict the timing and size of aftershocks, with the help of empirical laws such as Bäth's Law and Omori's Law; forecasting where these events will occur is more challenging. The researchers at Google, together with Brendan Meade, a professor of Earth and Planetary Sciences at Harvard University, and Phoebe DeVries, a post-doctoral fellow working in his lab, are using machine learning to try to find a way to forecast the locations where aftershocks occur.

DeVries believes that aftershock forecasting is a problem well suited to machine learning: "I'm very excited for the potential for machine learning going forward with these kind of problems -- it's a very important problem to go after. Aftershock forecasting in particular is a challenge that's well-suited to machine learning because there are so many physical phenomena that could influence aftershock behavior and machine learning is extremely good at teasing out those relationships. I think we've really just scratched the surface of what could be done with aftershock forecasting...and that's really exciting."

How does this deep learning algorithm work?

They started with a database containing information on nearly 118 major earthquakes from around the world. Next, they applied a neural network to analyze the relationships between the static changes caused by mainshocks and aftershock locations. The algorithm was able to extract useful patterns from the data, and the result was an improved model to forecast aftershock locations (a conceptual sketch follows below). This model is not absolutely precise, but it proved to be significantly more reliable than most existing models, such as Coulomb failure stress change: on an accuracy scale of 0 to 1, the deep learning model reached 0.849. They have also published a paper documenting their findings.

What are the future applications of this model?

The deep learning-based model could help in deploying emergency measures such as structural modifications and the storing of supplies and emergency kits, and in making informed evacuation plans for areas at risk of an aftershock, beforehand. The model is far from ready to deploy in the real world, but it has given researchers motivation to investigate the relevance of deep learning in mitigating earthquake aftershocks. To learn more about how Google and Harvard teamed up to tackle earthquake aftershocks using deep learning, check out Google's blog post.

Read next
AutoAugment: Google's research initiative to improve deep learning performance
Deep Learning in games - Neural Networks set to design virtual worlds
Google strides forward in deep learning: open sources Google Lucid to answer how neural networks make decisions
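For readers curious what such a model might look like mechanically, here is a small PyTorch sketch that trains a fully connected classifier to map per-grid-cell stress-change features to an "aftershock here or not" label. Everything in it (features, labels, and architecture) is a synthetic stand-in for illustration; it is not the network or database used in the actual study.

```python
# Illustrative sketch only: a small fully connected binary classifier over per-cell
# stress-change features, trained on synthetic data invented for this example.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_cells, n_features = 5000, 12           # grid cells around a mainshock, features per cell
X = torch.randn(n_cells, n_features)     # stand-in for stress-change features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).float().unsqueeze(1)  # synthetic "aftershock occurred" labels

model = nn.Sequential(
    nn.Linear(n_features, 50), nn.ReLU(),
    nn.Linear(50, 50), nn.ReLU(),
    nn.Linear(50, 1),                     # logit for "aftershock in this cell"
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(200):                  # full-batch training on the toy data
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

with torch.no_grad():
    probs = torch.sigmoid(model(X))
    acc = ((probs > 0.5).float() == y).float().mean()
print(f"training accuracy on synthetic data: {acc:.3f}")
```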