
Tech News - Data

1208 Articles

Filestack Workflows comes with machine learning capabilities to help businesses manage their digital images

Sugandha Lahoti
25 Oct 2018
3 min read
Filestack has launched Filestack Workflows, a machine learning powered solution that helps businesses detect, analyze, moderate, and curate content in scalable and automated ways.

Filestack has traditionally provided tools for companies to handle content as it is uploaded: checking for NSFW content, cropping photos, performing copyright detection on Word docs, and so on. However, handling content at scale using tools built in-house was proving difficult, because it relied heavily on developers to implement the code or set up a chain of events. This led Filestack to develop a new interface that allows businesses to upload, moderate, transform, and understand content at scale, freeing them to innovate more and manage less.

The Filestack Workflows platform is built on logic-driven intelligence functionality that uses machine learning to provide quick analysis of images and return actionable insights, including object recognition and detection, explicit content detection, optical character recognition, and copyright detection. Workflows can be integrated either through Filestack's own API or from a simple user interface.

Workflows also has several new features that extend far beyond simple image transformation:

Optical Character Recognition (OCR) lets users extract text from any given image. Images of everything from tax documents to street signs can be uploaded through the system, which returns the raw text of all characters in the image.

Not Safe for Work (NSFW) detection filters out content that is not appropriate for the workplace. The image tagging feature can automate content moderation by assigning "safe for work" and "not safe for work" scores.

Copyright detection determines whether a file is an original work. A single API call displays the copyright status of one or more images.

Filestack has also released a quick demo to highlight the features of Filestack Workflows. The demo creates a Workflow that takes uploaded content (images or documents), determines the filetype, and then curates 'safe for work' images. It decides what to do with a file using the following logic (sketched in code after the links below):

If it is an image, determine whether the image is safe for work.
- If it is safe, store it to a specific storage source.
- If it is not safe, pixelate the image, then store it to a specific storage source for modified images.
If it is a document, store it to a specific storage source for documents.

Read more about the news on Filestack's blog.

Facebook introduces Rosetta, a scalable OCR system that understands text on images using Faster-RCNN and CNN
How Netflix uses AVA, an Image Discovery tool to find the perfect title image for each of its shows
Datasets and deep learning methodologies to extend image-based applications to videos
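The following minimal Python sketch mirrors the demo's branching logic described above. All function names and the NSFW threshold are illustrative stand-ins, not part of the Filestack API.

```python
def detect_filetype(name: str) -> str:
    # Illustrative stand-in: infer the type from the file extension.
    return "image" if name.endswith((".jpg", ".png")) else "document"

def nsfw_score(name: str) -> float:
    # Stand-in for Workflows' NSFW detection, which scores an image.
    return 0.1

def pixelate(name: str) -> str:
    return name + ".pixelated"

def store(name: str, bucket: str) -> None:
    print(f"storing {name} in {bucket}")

def run_workflow(name: str) -> None:
    # The demo's logic: branch on filetype, then on the NSFW score.
    if detect_filetype(name) == "image":
        if nsfw_score(name) < 0.5:  # assumed "safe for work" threshold
            store(name, bucket="safe-images")
        else:
            store(pixelate(name), bucket="modified-images")
    else:
        store(name, bucket="documents")

run_workflow("photo.jpg")  # -> storing photo.jpg in safe-images
```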


Google’s Cloud Robotics platform, to be launched in 2019, will combine the power of AI, robotics and the cloud

Melisha Dsouza
25 Oct 2018
3 min read
Earlier this week, Google announced its plans to launch a 'Cloud Robotics platform' for developers in 2019. Since the early onset of cloud robotics in 2010, Google has explored various aspects of the field. Now, with the launch of the Cloud Robotics platform, Google will combine the power of AI, robotics, and the cloud to deploy cloud-connected collaborative robots. The platform will encourage efficient robotic automation in highly dynamic environments. The core infrastructure of the platform will be open source, and users will pay only for the services they use.

Features of the Cloud Robotics platform:

#1 Critical infrastructure
The platform will introduce secure and robust connectivity between robots and the cloud. Kubernetes will be used for the management and distribution of digital assets, while Stackdriver will assist with logging, monitoring, alerting, and dashboarding. Developers will gain access to Google's data management and AI capabilities, ranging from Cloud Bigtable to Cloud AutoML. Standardized data types and open APIs will help developers build reusable automation components. Moreover, open APIs support interoperability, which means integrators can compose end-to-end solutions with collaborative robots from different vendors.

#2 Specialized tools
The tools provided with the platform will help developers build, test, and deploy software for robots with ease. Automation solutions can easily be composed and deployed in customers' environments through system integrators, and operators can monitor robot fleets and ongoing missions as well. Users pay only for the services they use, and if a user decides to move to another cloud provider, they can take their data with them.

#3 Fostering powerful first-party services and third-party innovation
Google's initial Cloud Robotics services can be applied to use cases like robot localization and object tracking. The services will process sensor data from multiple sources and use machine learning to obtain information and insights about the state of the physical world. This will encourage an ecosystem of hardware and applications that can be used and re-used for collaborative automation.

#4 Industrial automation made easy
Industrial automation requires extensive custom integration. Collaborative robots can help improve the flexibility of the overall process, save costs, and avoid vendor lock-in. That said, it is difficult to program robots to understand and react to the unpredictable changes of the physical human world. The Cloud Robotics platform aims to solve these issues by providing flexible automation services such as the Cartographer, Spatial Intelligence, and Object Intelligence services.

Watch this video to know more about these services: https://www.youtube.com/watch?v=eo8MzGIYGzs&feature=youtu.be

Alternatively, head over to Google's blog to know more about this announcement.

What's new in Google Cloud Functions serverless platform
Cloud Filestore: A new high performance storage option by Google Cloud Platform
Machine Learning as a Service (MLaaS): How Google Cloud Platform, Microsoft Azure, and AWS are democratizing Artificial Intelligence


Michelangelo PyML: Introducing Uber’s platform for rapid machine learning development

Amey Varangaonkar
25 Oct 2018
3 min read
Transportation network giant Uber has developed Michelangelo PyML, a Python-powered platform for rapid prototyping of machine learning models. The aim of the platform is to offer machine learning as a service, democratizing machine learning and making it possible to scale AI models to meet business needs efficiently. Michelangelo PyML extends Michelangelo, the platform Uber built for large-scale machine learning in 2017, and will make it possible for Uber's data scientists and engineers to build intelligent Python-based models that run at scale for online as well as offline tasks.

Why Uber chose PyML for Michelangelo

Uber developed Michelangelo in September 2017 with a clear focus on high performance and scalability. It currently enables Uber's product teams to design, build, deploy, and maintain machine learning solutions at scale, and powers roughly one million predictions per second. However, that performance came at the cost of flexibility. Users mainly faced two critical issues:

- Models could be trained only with algorithms natively supported by Michelangelo. To run unsupported algorithms, the platform's capability had to be extended with additional training and deployment components, which caused a lot of inconvenience at times.
- Users could not apply any feature transformations apart from those offered by Michelangelo's DSL (Domain Specific Language).

Apart from these constraints, Uber also observed that data scientists usually preferred Python over other programming languages, given the rich suite of libraries and frameworks available in Python for analytics and machine learning. Many data scientists gathered and worked with data locally using tools such as pandas, scikit-learn, and TensorFlow, as opposed to Big Data tools such as Apache Spark and Hive, which take hours to set up.

How PyML improves Michelangelo

Based on these challenges, Uber decided to revamp the platform by integrating PyML to make it more flexible. PyML provides a concrete framework for data scientists to build and train machine learning models that can be deployed quickly, safely, and reliably across different environments, without any restriction on the types of data they can use or the algorithms they can choose, which makes it an ideal tool to integrate with a platform like Michelangelo (a sketch of the kind of locally prototyped model this targets appears after the links below).

By integrating Python-based models that can operate at scale with Michelangelo, Uber will now be able to handle online as well as offline queries and serve smart predictions quite easily. This could be a potential masterstroke by Uber as it tries to boost its business and revenue growth after it slowed down over the last year.

Read more

Why did Uber create Hudi, an open source incremental processing framework on Apache Hadoop?
Uber's Head of corporate development, Cameron Poetzscher, resigns following a report on a 2017 investigation into sexual misconduct
Uber's Marmaray, an Open Source Data Ingestion and Dispersal Framework for Apache Hadoop
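As a hedged illustration, here is the kind of Python model a data scientist might prototype locally with pandas and scikit-learn, as the article describes. This is illustrative only; it does not use Uber's actual PyML packaging API, and the column names and prediction task are made up.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

class TripModel:
    """A locally prototyped Python model with a simple fit/predict contract."""

    def __init__(self):
        self.clf = LogisticRegression()

    def fit(self, df: pd.DataFrame) -> "TripModel":
        self.clf.fit(df[["distance_km", "hour"]], df["surge"])
        return self

    def predict(self, df: pd.DataFrame):
        # Probability of the positive class for each row.
        return self.clf.predict_proba(df[["distance_km", "hour"]])[:, 1]

# Toy training data; columns are hypothetical.
train = pd.DataFrame({
    "distance_km": [1.2, 5.4, 0.8, 9.9],
    "hour": [8, 18, 23, 17],
    "surge": [0, 1, 0, 1],
})
model = TripModel().fit(train)
print(model.predict(train.drop(columns="surge")))
```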


PipelineDB 1.0.0, the high performance time-series aggregation for PostgreSQL, released!

Melisha Dsouza
25 Oct 2018
3 min read
Three years ago, the PipelineDB team published the very first release of PipelineDB as a fork of PostgreSQL. It received enormous support and feedback from thousands of organizations worldwide, including several Fortune 100 companies, and releasing the fork as an extension of PostgreSQL was highly requested. Yesterday, the team released PipelineDB 1.0.0 as a PostgreSQL extension under the liberal Apache 2.0 license.

What is PipelineDB?

PipelineDB is designed for storing huge amounts of time-series data that needs to be continuously aggregated. It stores only the compact output of these continuous queries as incrementally updated table rows, which can be evaluated with minimal query latency. It is used for analytics use cases that only require summary data, for instance real-time reporting dashboards. PipelineDB is especially beneficial in scenarios where queries are known in advance: these queries can be run continuously, making the data infrastructure that powers real-time analytics applications simpler, faster, and cheaper compared to the traditional "store first, query later" data processing model.

How does PipelineDB work?

PipelineDB uses SQL to write time-series events to streams, which are structured like tables. A continuous view then performs an aggregation over a stream. Even if billions of rows are written to the stream, the continuous view ensures that only one physical row per hour is actually persisted within the database. Once the continuous view has read new incoming events and the aggregate (for example, a distinct count) has been updated to reflect the new information, the raw events are discarded and not stored in PipelineDB (a toy illustration of this idea follows below). This enables:

- Enormous levels of raw event throughput on modest hardware footprints
- Extremely low read query latencies
- Breaking the traditional dependence between data volumes ingested and data volumes stored

All of this facilitates high performance that the system can sustain indefinitely. PipelineDB also supports another type of continuous query called continuous transforms. Continuous transforms are stateless: they apply a transformation to a stream and write the result out to another stream.

Features of PipelineDB

PipelineDB 1.0.0 brings several changes relative to version 0.9.7. The main highlights are as follows:

- Non-standard syntax has been removed.
- Configuration parameters are now qualified by pipelinedb.
- PostgreSQL pg_dump, pg_restore, and pg_upgrade tooling is now used instead of the PipelineDB variants.
- Certain functions and aggregates have been renamed to be descriptive about the problem they solve for users: "Top-K" now represents Filtered-Space-Saving, "Distributions" now refer to T-Digests, and "Frequency" now refers to Count-Min-Sketch.
- Bloom filters have been introduced for set membership analysis.
- Distribution and percentile analysis is now possible.

What's more?

Continuous queries can be chained together into arbitrarily complex topologies of continuous computation. Each continuous query produces its own output stream of incremental updates, which can be consumed by another continuous query like any other stream. The team aims to follow up with automated partitioning for continuous views in an upcoming release.

You can head over to the PipelineDB blog for more insights on this news.
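To make the "one physical row per hour" behavior concrete, here is a toy Python sketch of continuous aggregation, deliberately independent of PipelineDB's actual SQL syntax: events stream in, only an incrementally updated aggregate per hour is kept, and the raw events are discarded.

```python
from collections import defaultdict
from datetime import datetime

hourly_counts = defaultdict(int)  # one incrementally updated row per hour

def ingest(event_time: datetime) -> None:
    bucket = event_time.strftime("%Y-%m-%d %H:00")
    hourly_counts[bucket] += 1    # update the aggregate, then drop the event

# Two hours of per-minute events -> only two persisted rows.
for minute in range(120):
    ingest(datetime(2018, 10, 25, 9 + minute // 60, minute % 60))

print(dict(hourly_counts))  # {'2018-10-25 09:00': 60, '2018-10-25 10:00': 60}
```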
Citus Data to donate 1% of its equity to non-profit PostgreSQL organizations
PostgreSQL 11 is here with improved partitioning performance, query parallelism, and JIT compilation
PostgreSQL group releases an update to 9.6.10, 9.5.14, 9.4.19, 9.3.24


Lyft acquires computer vision startup Blue Vision Labs, in a bid to win the self-driving car race

Prasad Ramesh
24 Oct 2018
3 min read
Lyft created its Level 5 division last year with the aim of developing self-driving cars. It is now acquiring London-based computer vision startup Blue Vision Labs in a bid to bring safe and reliable autonomous driving to the streets first. Self-driving cars are one of the most challenging areas of applied machine learning today. In just a year, the Level 5 division has grown into a team of 300 engineers and researchers. This is the first acquisition for Level 5 and also its first step into the UK self-driving space.

Blue Vision uses computer vision to build large-scale robotics and augmented reality applications. It was founded in 2016 by graduates of the University of Oxford and Imperial College London, and today consists of 40 skilled experts in computer vision and robotics. With the technology from Blue Vision Labs, entire 3D city maps can be built just from cameras mounted on cars; these maps make a car aware of its environment with high accuracy. In a Medium post, Luc Vincent, VP of Engineering at Lyft, says: "Blue Vision Labs is the first company able to build city-scale 3D maps from cell phone acquired imagery. This is truly amazing tech."

Vincent also hinted at bigger plans for the role Blue Vision Labs will play in the growth of Lyft's self-driving division: "It also has applications well beyond self-driving. For example, we are keen to explore how we can leverage Blue Vision Labs' stack to more precisely pinpoint drivers' and riders' locations, and create new augmented reality interfaces that make transportation simpler and better for everyone."

In-vehicle advertising is a space all tech titans serious about autonomous tech, like Alphabet's Waymo and Apple's secretive self-driving car project, are vying for, and Lyft seems to understand the value of being first to market in this area with this promising acquisition. Although there is no official statement about the acquisition details, some sources told TechCrunch the deal is "around $72 million with $30 million on top of that based on hitting certain milestones."

This acquisition will drive Lyft's self-driving vision on the streets of the UK. Self-driving cars vacate an extra seat and can contribute towards reducing problems like pollution and traffic, believes the Lyft Level 5 team. To know more details about Lyft's self-driving efforts, visit the Lyft website.

nuScenes: The largest open-source dataset for self-driving vehicles by Scale and nuTonomy
This self-driving car can drive in its imagination using deep reinforcement learning
Tesla is building its own AI hardware for self-driving cars


Citus Data to donate 1% of its equity to non-profit PostgreSQL organizations

Sugandha Lahoti
24 Oct 2018
2 min read
Citus Data, which works on Postgres database technologies, announced that it will donate 1 percent of its equity to non-profit PostgreSQL organizations in the US and Europe. The aim is to support the growth, education, and future innovation of the open-source Postgres database in both the US and Europe. The company is also joining the Pledge 1% movement, which provides a platform where companies can pledge to give back to the community through one of four options: 1% of equity, time, product, or profit.

Citus Data builds an extension to Postgres that transforms PostgreSQL into a distributed database.

Citus Data CEO Umur Cubukcu said, "You can contribute to open source in different ways. You can open source software you've created, you can maintain certain features and projects, and you can contribute to events with speakers and sponsorships—all of which our team spends a lot of time on. We are excited to create a new way to contribute to open source, by this donation."

According to Ozgun Erdogan, one of Citus Data's founders, "This 1% stock donation is a way for us to give back and to share a piece of our future success. And we believe the donation will make a real difference to future projects in the Postgres community."

RedMonk analyst and co-founder James Governor said, "Citus Data is both making an innovative bet, and paying it forward, by applying the 1% Pledge model to underpin the renaissance of the Postgres community."

Magnus Hagander, open source advocate, PostgreSQL core team member, and president of PostgreSQL Europe, says, "What do I think about this donation of 1 percent equity from the team at Citus Data? I think it's a generous way to support the PostgreSQL community, and shines a light on the importance of supporting open source projects that underpin so many products and companies today."

Read more about the news on the Citus Data blog.

PostgreSQL 11 is here with improved partitioning performance, query parallelism, and JIT compilation
How to perform full-text search (FTS) in PostgreSQL
Azure Database services are now generally available for MySQL and PostgreSQL

NIPS Foundation decides against name change as poll finds it an unpopular superficial move; instead increases ‘focus on diversity and inclusivity initiatives’

Melisha Dsouza
24 Oct 2018
5 min read
The 'Neural Information Processing Systems' conference, also known as 'NIPS', is well known for hosting the most influential AI conferences around the globe over the past 32 years. The conference is organized by the NIPS Foundation and brings together researchers from biological, psychological, technological, mathematical, and theoretical areas of science and engineering, including big names of the tech industry like Google, Nvidia, Facebook, and Microsoft.

The acronym of the conference has been receiving a lot of attention from members worldwide over the past few years. Some members of the community have pointed out that the acronym 'NIPS' has unintended connotations that make the name sound sexist. On the other hand, the prospect of a name change only added further confusion and frustration.

In August 2018, taking a cue from several well-publicized incidents of insensitivity at past conferences, the organizers conducted a poll on the NIPS website asking people whether they agreed or disagreed with a potential name change. The poll requested alternative names for the conference and ratings of the existing and alternative names, and encouraged additional comments from members.

"Arguments in favor of keeping the existing name include a desire to respect the intellectual tradition that brought the meeting to where it is today and the strong brand that comes with this tradition. Arguments in favor of changing the name include a desire to better reflect the modern scope of the conference and to avoid distasteful connotations of the name." - Organizers of NIPS

Of the 2270 participants who took the survey, over 86% were male, around 13% were female, and 0.01% were of other gender or non-responsive. A key question in the poll was: "Do you think we should change the name of the NIPS conference?" Around 30% of the respondents supported the name change (28% of males and about 44% of females), while 31% 'strongly disagreed' with the name change proposal (31% of males and 25% of females). A summary of the response distribution is available at nips.cc.

Some respondents also questioned whether the name was deliberately selected for a double entendre, but the foundation denies these claims: the name was selected in 1987, and sources such as the Oxford English Dictionary show that the slang reference to a body part did not come into usage until years later.

To the foundation, the results of the poll did not provide any useful insight into the situation. The first poll resulted in a long list of alternative names, most of them unsuitable for reasons such as clashing with an existing brand, being too close to the names of other conferences, or having offensive connotations in some language. After shortlisting six names, a second poll was conducted; none of these names were strongly preferred by the community. Since the polls did not return a consensus result, the foundation has decided not to change the name of the conference, at least for now.

Here are some of the comments posted on the NIPS website (with permission):

"Thanks for considering the name change. I am not personally bothered by the current name, which is semi-accurate and has no ill intent -- but I think the gesture of making a name change will send a much-needed inclusive vibe in the right direction"

"If it were up to me, I'd call off this nice but symbolic gesture and use whatever time, money, and energy it requires to make actual changes that boost inclusivity, like providing subsidized child care so that parents can attend, or offering more travel awards to scholars from lesser-developed countries"

"Please, please please change the name. It is sexist and a racist slur!!! I'm embarrassed every time I have to say the name of the conference"

"As a woman, I find it offensive that the board is seriously considering changing the name of the meeting because of an adolescent reference to a woman's body. From my point of view, it shows that the board does not see me as an equal member of the community, but as a woman first and a scientist second"

"I am a woman, I have experienced being harassed by male academics, and I would like this problem to be discussed and addressed. But not in this frankly almost offensive way"

Much of the feedback received from members pointed towards taking a more substantive approach to diversity and inclusivity. Taking this into account, the NIPS code of conduct was implemented, two Inclusion and Diversity chairs were appointed to the organizing committee, and childcare support for the NIPS 2018 conference in Montreal has been introduced. In addition, NIPS has welcomed the formation of several co-located workshops focused on diversity in the field, and is extending support to additional groups, including Black in AI (BAI), Queer in AI@NIPS, Latinx in AI (LXAI), and Jews in ML (JIML).

Twitter saw some pretty strong opinions on this decision:
https://twitter.com/StephenLJames/status/1054996053177589760

The foundation hopes the community's support will help improve the inclusiveness of the conference for its diverse membership. Head over to the Neural Information Processing Systems blog post for more insights on this news.

NIPS 2017 Special: 6 Key Challenges in Deep Learning for Robotics by Pieter Abbeel
NIPS 2017 Special: How machine learning for genomics is bridging the gap between research and clinical trial success by Brendan Frey


Baidu releases a new AI translation system, STACL, that can do simultaneous interpretation

Sugandha Lahoti
24 Oct 2018
3 min read
Baidu has released a new AI-powered tool called STACL that performs simultaneous interpretation. A simultaneous interpreter translates concurrently with the speaker's speech, with a delay of only a few seconds. Baidu goes a step further by predicting and anticipating the words a speaker is about to say a few seconds in the future.

Current translation systems are generally prone to latency, such as a "3-word delay", and the systems are overcomplicated and slow to train. Baidu's STACL overcomes these limitations by predicting the verb to come, based on all the sentences it has seen in the past. The system uses a simple "wait-k" model trained to generate the target sentence concurrently with the source sentence, but always k words behind, for any given k (a toy sketch of this decoding scheme appears after the links below). STACL directly predicts target words, and seamlessly integrates anticipation and translation in a single model. It is also flexible in terms of the latency-quality trade-off: the user can specify any arbitrary latency requirement (e.g., a one-word or five-word delay).

Presently, STACL works on text-to-text and speech-to-text translation. The model is trained on newswire articles where the same story appeared in multiple languages; in the paper, the researchers demonstrate its capabilities by translating from Chinese to English (image source: Baidu).

The researchers have also come up with a new latency metric called "Averaged Lagging", which addresses deficiencies in previous metrics.

The system is, of course, far from perfect. For instance, at present it can't correct its mistakes or apologize for them. However, it is adjustable in the sense that users can trade off speed against accuracy, and it can be made more accurate by training it on a particular subject so that it understands the sentences likely to appear in presentations on that subject. The researchers also plan to add speech-to-speech translation capabilities to STACL, which will require integrating speech synthesis into the system while trying to make it sound natural.

STACL will be demoed at the Baidu World conference on November 1st, where it will provide a live simultaneous translation of the speeches. Baidu has previously shown off a prototype consumer device that does sentence-by-sentence translation, and Liang Huang, principal scientist of Baidu's Silicon Valley AI Lab, says his team plans to integrate STACL into that gadget.

Go through the research paper and video demos for extensive coverage.

Baidu announces ClariNet, a neural network for text-to-speech synthesis
Baidu Security Lab's MesaLink, a cryptographic memory safe library alternative to OpenSSL
Baidu releases EZDL – a platform that lets you build AI and machine learning models without any coding knowledge
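Here is a toy Python sketch of wait-k decoding as described above: emission of target words begins only after k source words have arrived, so the output trails the speaker by roughly k words. The translate_next_word function is a stand-in for the real prediction model, not Baidu's code.

```python
def translate_next_word(source_prefix, target_prefix):
    # Stand-in for the model: predict the next target word (possibly
    # anticipating a verb the speaker has not said yet).
    return f"t{len(target_prefix) + 1}"

def wait_k_decode(source_words, k=3):
    target = []
    for i in range(1, len(source_words) + 1):  # source words arrive one by one
        if i >= k:  # start emitting once k source words have been heard
            target.append(translate_next_word(source_words[:i], target))
    while len(target) < len(source_words):     # flush the tail once the source ends
        target.append(translate_next_word(source_words, target))
    return target

print(wait_k_decode(["s1", "s2", "s3", "s4", "s5"], k=3))
# -> ['t1', 't2', 't3', 't4', 't5'], generated concurrently but k words behind
```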


Dejavu 2.0, the open source browser for Elasticsearch, now lets you build search UIs visually

Melisha Dsouza
19 Oct 2018
3 min read
Today, the Dejavu team announced a new version of the open-source browser for Elasticsearch. Dejavu 2.0 is now generally available and comes with upgrades like search previews, improvements to the UI, and better navigation. While working with NoSQL databases or Elasticsearch, Dejavu helps users import data, map it to data types, create and share filtered data views, and export the data out.

Features of Dejavu 2.0

- The browser now comes with a Search Preview functionality that enables users to create a visual search UI from their Elasticsearch index.
- The browser has a better UI color scheme, and the team has added navigation to help users import their dataset via Import, browse data via Browse, perform a Search Preview, and manage their schemas via the Mappings view.
- In the previous version, running a query could fail with 'Refused to execute JavaScript URL' because it violated a Content Security Policy directive. The issue is fixed, and Dejavu's Chrome extension can run in incognito mode once enabled by the user.
- Mappings created from Dejavu's UI now set an ignore_above value of 256, which prevents very long values from being indexed as keywords and matches Elasticsearch's default limit (see the sketch after the links below).
- In the previous version, adding a URL ending with a / threw an authentication error. This bug is now fixed.
- Dejavu's build process dropped the use of Bower, resulting in better maintainability.

Why use Dejavu?

Dejavu allows users to connect to any of the indexes present in their cluster, and makes clusters easily accessible while browsing by caching each connected index locally. Visual filters allow sorting through data, finding information visually, and hiding irrelevant data, helping users interpret all the numbers and text they see. Based on an Elasticsearch query, Dejavu shows filtered views, and supports bulk updating or deleting documents via the query DSL. Besides this, the browser supports an infinite-scroll-based UI, and users can update and delete data either individually or in bulk via queries.

You can check out the GitHub page for more information on the other features of Dejavu as well as its comparison with other data browsers. After this update, the team is focusing on completely rewriting Dejavu to improve its performance. The browser will then support multi-index and full-cluster views out of the box, along with a configurable page-size view and a mobile-responsive view mode. Head over to the GitHub page to know more about the features of this release.

How does Elasticsearch work? [Tutorial]
How to perform Numeric Metric Aggregations with Elasticsearch
Installing and Configuring X-pack on Elasticsearch and Kibana
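As a brief illustration of the ignore_above setting mentioned above, here is what it looks like inside an Elasticsearch index mapping. The field name is hypothetical, and on Elasticsearch 6.x the properties block would sit under a document type.

```python
import json

# A keyword field whose terms are indexed only up to 256 characters,
# matching Elasticsearch's default dynamic-mapping limit.
mapping = {
    "properties": {
        "title": {
            "type": "keyword",
            "ignore_above": 256,  # longer values are stored but not indexed
        }
    }
}
print(json.dumps(mapping, indent=2))
```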


Graph Nets – DeepMind's library for graph networks in Tensorflow and Sonnet

Sunith Shetty
19 Oct 2018
3 min read
Graph Nets is DeepMind's new library for building graph networks in TensorFlow and Sonnet. Last week, the paper 'Relational inductive biases, deep learning, and graph networks' was published on arXiv by researchers from DeepMind, Google Brain, MIT, and the University of Edinburgh. The paper introduces a machine learning framework called graph networks, which is expected to bring new innovations to the artificial general intelligence realm.

What are graph networks?

Graph networks generalize and extend various types of neural networks to perform calculations on graphs. They can implement relational inductive bias, a technique used for reasoning about inter-object relations. The graph networks framework is based on graph-to-graph modules. Each graph's features are represented by three characteristics:

- Nodes
- Edges: relations between the nodes
- Global attributes: system-level properties

A graph network takes a graph as input, performs the required operations and calculations over the edges, the nodes, and the global attributes, and then returns a new graph as output. The research paper argues that graph networks can support two critical human-like capabilities:

- Relational reasoning: drawing logical conclusions about how different objects and things relate to one another
- Combinatorial generalization: constructing new inferences, behaviors, and predictions from known building blocks

To understand and learn more about graph networks, you can refer to the official research paper.

Graph Nets

The Graph Nets library can be installed from pip. To install the library, run the following command:

$ pip install graph_nets

The installation is compatible with Linux/Mac OS X and Python versions 2.7 and 3.4+.

The library includes Jupyter notebook demos which allow you to create, manipulate, and train graph networks to perform a shortest-path-finding task, a sorting task, and a prediction task. Each demo uses the same graph network architecture, showing the flexibility of the approach. You can try the demos in your browser using Colaboratory; in other words, you don't need to install anything locally when running the demos in the browser (or on a phone) via the cloud Colaboratory backend. You can also run the demos on your local machine by installing the necessary dependencies (a small sketch of describing a graph for the library appears after the links below).

What's ahead?

The concept was released with ideas based not only in artificial intelligence research but also in the computer and cognitive sciences. Graph networks are still an early-stage research theory that does not yet offer convincing experimental results, but it will be very interesting to see how well graph networks live up to the hype as they mature.

To try out the open source library, you can visit the official GitHub page. To provide comments or suggestions, you can contact [email protected].

Read more

2018 is the year of graph databases. Here's why.
Why Neo4j is the most popular graph database
Pytorch.org revamps for Pytorch 1.0 with design changes and added Static graph support
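Following the conventions used in the library's demo notebooks, a graph is described as a dict of nodes, edges, globals, and sender/receiver indices. The sketch below is based on those demos; treat the exact call as an assumption and see the official notebooks for canonical usage.

```python
import numpy as np
from graph_nets import utils_np

# A 3-node graph with two directed edges (0 -> 1 and 1 -> 2).
graph_dict = {
    "globals": np.array([0.0], dtype=np.float32),         # system-level attribute
    "nodes": np.array([[1.0], [2.0], [3.0]], dtype=np.float32),
    "edges": np.array([[0.5], [0.7]], dtype=np.float32),  # per-edge features
    "senders": np.array([0, 1]),                          # edge i starts at senders[i]
    "receivers": np.array([1, 2]),                        # ... and ends at receivers[i]
}

graphs_tuple = utils_np.data_dicts_to_graphs_tuple([graph_dict])
print(graphs_tuple.nodes)
```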

PostgreSQL 11 is here with improved partitioning performance, query parallelism, and JIT compilation

Natasha Mathur
19 Oct 2018
3 min read
After releasing PostgreSQL 11 beta 1 back in May, the PostgreSQL Global Development Group finally released PostgreSQL 11 yesterday. PostgreSQL 11 offers features such as increased performance for partitioning, support for transactions in stored procedures, improved capabilities for query parallelism, and Just-in-Time (JIT) compilation for expressions, among other updates. PostgreSQL is a popular open source relational database management system that offers reliability, robustness, and enhanced performance. Let's have a look at these features in PostgreSQL 11.

Increased performance for partitioning

PostgreSQL 11 adds the ability to partition data using a hash key, known as hash partitioning. This adds to the existing ability to partition data by a list of values or by a range. PostgreSQL 11 also improves data federation abilities with functionality improvements for partitions that use the PostgreSQL foreign data wrapper, postgres_fdw.

For managing these partitions, PostgreSQL 11 introduces a "catch-all" default partition for data that doesn't match a partition key, along with the ability to create primary keys, foreign keys, indexes, and triggers on partitioned tables. The release also supports automatic movement of rows to the correct partition when the partition key for that row is updated.

Additionally, PostgreSQL 11 improves query performance when reading from partitions with a new partition elimination strategy, and supports the popular "upsert" feature on partitioned tables. The upsert feature helps users simplify application code and reduce network overhead when interacting with their data.

Support for transactions in stored procedures

PostgreSQL 11 adds SQL procedures that can perform full transaction management within their body, enabling developers to build advanced server-side applications such as ones involving incremental bulk data loading. SQL procedures are created using the CREATE PROCEDURE command, executed using the CALL command, and supported by the server-side procedural languages PL/pgSQL, PL/Perl, PL/Python, and PL/Tcl.

Improved capabilities for query parallelism

PostgreSQL 11 improves parallel query performance, with performance gains in parallel sequential scans and hash joins, along with more efficient scans of partitioned data. PostgreSQL 11 adds parallelism for a range of data definition commands, especially the creation of B-tree indexes generated by the standard CREATE INDEX command. Data definition commands that create tables or materialize views from queries are also enabled with parallelism, including CREATE TABLE .. AS, SELECT INTO, and CREATE MATERIALIZED VIEW.

Just-in-Time (JIT) compilation for expressions

PostgreSQL 11 offers support for Just-in-Time (JIT) compilation, which accelerates the execution of certain expressions during query execution. JIT expression compilation uses the LLVM project to speed up the execution of expressions in WHERE clauses, target lists, aggregates, projections, and some internal operations.

Other improvements

ALTER TABLE .. ADD COLUMN .. DEFAULT .. with a non-NULL default no longer rewrites the whole table on execution, which offers a significant performance boost when running this command. Additional functionality has been added for working with window functions, including allowing RANGE to use PRECEDING/FOLLOWING, GROUPS, and frame exclusion. Keywords such as "quit" and "exit" have been added to the PostgreSQL command-line interface to make it easier to leave the command-line tool.

For more information, check out the official release notes. A short sketch of two of these features follows the links below.

PostgreSQL group releases an update to 9.6.10, 9.5.14, 9.4.19, 9.3.24
How to perform data partitioning in PostgreSQL 10
How to write effective Stored Procedures in PostgreSQL
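As a hedged illustration of two of the features above, here is a sketch using Python's psycopg2 driver showing PostgreSQL 11's hash partitioning and a transaction-controlling SQL procedure. The connection string, table, and procedure names are hypothetical.

```python
import psycopg2

conn = psycopg2.connect("dbname=demo")
conn.autocommit = True  # CALL with in-procedure COMMIT needs no outer transaction
cur = conn.cursor()

# Hash partitioning, new in PostgreSQL 11.
cur.execute("CREATE TABLE events (id bigint, payload text) PARTITION BY HASH (id);")
for i in range(4):
    cur.execute(
        f"CREATE TABLE events_p{i} PARTITION OF events "
        f"FOR VALUES WITH (MODULUS 4, REMAINDER {i});"
    )

# A SQL procedure with transaction management in its body
# (CREATE PROCEDURE / CALL are new in PostgreSQL 11).
cur.execute("""
    CREATE PROCEDURE load_batch() LANGUAGE plpgsql AS $$
    BEGIN
        INSERT INTO events VALUES (1, 'first');
        COMMIT;                      -- commit mid-procedure
        INSERT INTO events VALUES (2, 'second');
    END $$;
""")
cur.execute("CALL load_batch();")
```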


Redis 5 is now out

Bhagyashree R
18 Oct 2018
2 min read
After announcing Redis 5 RC1 in May earlier this year, the Redis team released the stable version of Redis 5 yesterday. This release comes with a new Stream data type, LFU/LRU information in RDB, active defragmentation version 2, HyperLogLog improvements, and many other enhancements.

What is new in Redis 5?

- Redis 5 comes with a new data type called Stream, which models a log data structure in a more abstract way (a short usage sketch appears after the links below).
- Modules gained three important new APIs: a Cluster API, a Timer API, and a Dictionary API. With these APIs, you can build a distributed system with Redis, using it as a framework and creating your own protocols.
- To provide better caching accuracy after a restart or when a slave does a full sync, RDB now stores LFU and LRU information. Future releases are likely to add a feature that sends TOUCH commands to slaves to update their information about hot keys.
- The cluster manager has been ported from Ruby to C and is integrated with redis-cli. As a result, it is faster and no longer has any dependencies. To learn more about the cluster manager, run the redis-cli --cluster help command. Also, many commands with subcommands now have a HELP subcommand.
- The sorted set commands ZPOPMIN/ZPOPMAX and their blocking variants have been introduced. These commands are useful in applications such as time series and leaderboards.
- With active defragmentation version 2, the process of defragmenting the memory of a running server is better than before. This is very useful for long-running workloads that tend to fragment Jemalloc. Jemalloc itself has been upgraded to version 5.1.
- The implementation of the HyperLogLog data structure has been improved with refined algorithms that offer a more accurate cardinality estimation.
- This version comes with better memory reporting capabilities.
- Redis 5 provides improved networking, especially related to emitting large objects, plus CLIENT UNBLOCK and CLIENT ID for useful patterns around connection pools and blocking commands.

Read the full Redis 5 release notes on GitHub.

MongoDB switches to Server Side Public License (SSPL) to prevent cloud providers from exploiting its open source code
Facebook open sources LogDevice, a distributed data store for logs
RxDB 8.0.0, a reactive, offline-first, multiplatform database for JavaScript released!
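Here is a short sketch of the new Stream type and the ZPOPMIN command using the redis-py client. It assumes a local Redis 5 server and a redis-py version that supports these commands; the key names are illustrative.

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# Streams: append log-like entries and read them back.
r.xadd("sensor:1", {"temp": "21.5"})
r.xadd("sensor:1", {"temp": "21.7"})
print(r.xread({"sensor:1": "0"}))  # read all entries from the beginning

# Sorted sets: ZPOPMIN pops the member with the lowest score.
r.zadd("leaderboard", {"alice": 120, "bob": 95})
print(r.zpopmin("leaderboard"))    # -> [(b'bob', 95.0)]
```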


Creator-Side Optimization: How LinkedIn’s new feed model helps small creators

Melisha Dsouza
18 Oct 2018
4 min read
LinkedIn is used by 567M users, creating new opportunities for connecting with professionals all over the globe, with more than a million posts, videos, and articles published on the LinkedIn feed each day. However, the team identified that the growth in the number of post creators and viewers has led to issues: almost no recognition for lesser-known creators, and viral posts drowning out posts from closer connections. To combat this, the team combined multiple experimental techniques and came up with a smarter feed relevance model.

The problem: almost no recognition for small creators

The team discovered that reactions given by members were not equally distributed across creators' posts. In other words, the number of creators who get zero feedback after making a post was actually increasing. This posed a huge problem, as getting feedback is a motivational boost for creators to continue posting in the future. Influencers with millions of followers get far more reactions than the average person; if feed viewers kept giving feedback to the top 1% of posters, who were already getting plenty of attention, lesser-known creators would not be recognized at all. A second issue was anecdotal reports that irrelevant hyper-viral posts were gaming the feed and crowding out posts from closer connections.

Issues with the old feed model

The original LinkedIn feed model was designed so that if many people had already enjoyed, liked, and shared a piece of content, the feed would correctly guess that a new viewer was also highly likely to enjoy it, and hence show highly viral content. This meant viewers missed important posts from close connections and people they know personally. Moreover, the model was not programmed to consider how much the creator might appreciate receiving feedback from the viewer.

The solution: a new optimization function

The team added an additional term to the optimization function of the relevance model. The term quantifies the value the creator receives when a viewer gives feedback on their post. Now that the feed knows how much a given creator will appreciate feedback from a given viewer, it uses this information to rank posts (a toy sketch of this scoring idea appears after the links below). The model also takes 'spam feedback' into account, considering the quality of the post, to avoid spamming viewers with low-quality posts. This consideration for small creators ensures that no one is left behind and that they can reach the community.

Optimizing from the creator's perspective

To test the model, the team used 'upstream/downstream' metrics. One upstream metric is 'first likes given', which quantifies how often a feed viewer likes a post that didn't previously have any likes: if the viewer sees a post without any likes and clicks the like button, that creates a 'first like given' event. The other metric, called 'creator love', describes how the creator feels about the viewer's actions and the impact those actions have on a creator's post. The suite of metrics contains several variations on test cases involving comments, the freshness of the post, and the changing value of feedback beyond the first piece; it all boils down to measuring the value given to the creator. The team also used edge-based bootstrapping on Bernoulli randomization, pioneered by Jim Sorenson.

Did the new feed model help?

The answer is yes: the change turned out to be successful for both creators and feed viewers. The team believes it is helping posts from close connections appear at the top of a member's feed, and members like seeing more content from people they know. The model especially benefited creators with smaller networks. It was expected to take about 8% of feedback away from the top 0.1% of creators and redistribute it to the bottom 98%; this worked, and showed a 5% increase in creators returning to post again. As for top creators, taking 8% of the likes away from the top 0.1% still leaves them better off than they were a year ago. These changes help ensure equality among all members of the network, and it will be interesting to see the impact the new model has on viewers' feeds and their reaction to it.

To learn more about the experiments the team performed on the model and their line of thought, you can visit their official blog.

What is Statistical Analysis and why does it matter?
Working with Azure container service cluster [Tutorial]
Performing Sentiment Analysis with R on Obama's State of the Union speeches [Tutorial]
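The following toy Python sketch illustrates the idea described above: the feed score gains a term for the value the creator would get from the viewer's feedback. The weights and feature functions are made up for illustration and are not LinkedIn's actual model.

```python
def p_viewer_engages(viewer, post):
    # Stand-in for the learned probability that the viewer likes/comments.
    return 0.2

def value_of_feedback(viewer, post):
    # Stand-in: feedback is worth more to creators with little of it,
    # e.g. a potential "first like given".
    return 1.0 if post["likes"] == 0 else 1.0 / (1 + post["likes"])

def feed_score(viewer, post, alpha=0.3):
    relevance = p_viewer_engages(viewer, post)       # classic viewer-side term
    creator_value = value_of_feedback(viewer, post)  # new creator-side term
    return relevance + alpha * creator_value

print(feed_score("viewer-a", {"likes": 0}))    # a small creator's post is boosted
print(feed_score("viewer-a", {"likes": 500}))  # a viral post gets little extra
```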

DeepMind open sources TRFL, a new library of reinforcement learning building blocks

Natasha Mathur
18 Oct 2018
3 min read
The DeepMind team announced yesterday that it is open sourcing a new library, named TRFL, that comprises useful building blocks for writing reinforcement learning (RL) agents in TensorFlow. The TRFL library was created by the research engineering team at DeepMind and is a collection of key algorithmic components used in a large number of DeepMind's agents, such as DQN, DDPG, and the Importance Weighted Actor-Learner Architecture.

A typical deep reinforcement learning agent comprises a large number of interacting components, including at minimum the environment and some deep network representing values or policies, and often additional components such as a learned model of the environment, pseudo-reward functions, or a replay system. These components interact in subtle ways, which makes it difficult to identify bugs in large computational graphs. One recommended fix is to open-source complete agent implementations: although large agent codebases are useful for reproducing research, they are hard to modify and extend. A different and complementary approach is to provide reliable, well-tested implementations of common building blocks that can be used in a variety of different RL agents.

TRFL takes this second approach: it includes functions that help implement both classical RL algorithms and other cutting-edge techniques. The loss functions and other operations that come with TRFL are implemented in pure TensorFlow. They are not complete algorithms, but implementations of RL-specific mathematical operations needed when building fully-functional RL agents (a hedged usage sketch appears after the links below).

The DeepMind team provides TensorFlow ops for value-based reinforcement learning in discrete action spaces, such as TD-learning, Sarsa, Q-learning, and their variants, as well as ops for implementing continuous control algorithms such as DPG and for learning distributional value functions. Finally, TRFL also comes with an implementation of the auxiliary pseudo-reward functions used by UNREAL, which improve data efficiency in a wide range of domains.

"This is not a one-time release. Since this library is used extensively within DeepMind, we will continue to maintain it as well as add new functionalities over time. We are also eager to receive contributions to the library by the wider RL community," mentioned the DeepMind team.

For more information, check out the official DeepMind blog.

Google open sources Active Question Answering (ActiveQA), a Reinforcement Learning based Q&A system
Microsoft open sources Infer.NET, its popular model-based machine learning framework
Salesforce Einstein team open sources TransmogrifAI, their automated machine learning library
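Below is a hedged sketch of TRFL's Q-learning loss op, following the usage shown in DeepMind's announcement. TRFL targets TF1-style graphs; the shapes here are a toy batch of one, and the exact return fields should be checked against the library's documentation.

```python
import tensorflow as tf
import trfl

q_tm1 = tf.constant([[1.0, 2.0, 3.0]])  # Q-values at time t-1 (batch of 1)
a_tm1 = tf.constant([1])                # action taken at t-1
r_t = tf.constant([0.5])                # reward received
pcont_t = tf.constant([0.99])           # discount (0 at episode end)
q_t = tf.constant([[1.5, 0.5, 1.0]])    # Q-values at time t

# Returns a per-element loss tensor plus extra diagnostics (e.g. TD error).
loss, extra = trfl.qlearning(q_tm1, a_tm1, r_t, pcont_t, q_t)

with tf.Session() as sess:
    print(sess.run([loss, extra.td_error]))
```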


Spammy bots most likely influenced FCC’s decision on net neutrality repeal, says a new Stanford study

Melisha Dsouza
17 Oct 2018
4 min read
In December 2017, the Federal Communications Commission voted to kill net neutrality protections, ignoring overwhelming public support for safeguarding the open internet. Now, according to a study by a Stanford University researcher, of the 22 million comments filed to the agency addressing the move to revoke the regulations, nearly 100% were fake or duplicated. Ryan Singel, a media and strategy fellow at Stanford, assisted by data scientist Jeff Kao, sifted through all submitted comments to produce the findings.

Using a machine learning program, Kao segregated the millions of comments that were fake or duplicated, most of them taken from form and letter campaigns. He took the 60+ GB dataset of comments, mapped each comment into a semantic vector space, and clustered the comments by meaning, which resulted in approximately 150 clusters of comment submissions (a toy sketch of this deduplication idea appears after the links below). In the end, he was left with about 800,000 unique comments, and out of those, only 0.3 percent supported the repeal of net neutrality.

The question then arises: on what basis did the FCC decide to repeal net neutrality? And did it not have a system in place to filter out comments sent by bots? The answer to the latter question is a clear 'no'. The report suggests the FCC did nothing to prevent comment stuffing and comment fraud; many comments were submitted under false identities, using emails belonging to journalists, lawmakers, and dead people. Kao contacted commenters by email, asking whether they had submitted the comment associated with their address. While the responses varied, users who submitted pro-net-neutrality comments confirmed that they had submitted them. Moreover, after the public had cast its vote, no information was released to help the public, journalists, and policymakers understand what Americans had actually told the FCC about the repeal of the 2015 Open Internet Order.

Singel's findings were released on October 15 and first reported by Motherboard. The report, entitled "Filtering Out the Bots: What Americans Actually Told the FCC about Net Neutrality Repeal," points out that Americans were well informed on the topic of net neutrality. Singel and Kao also matched and sorted comments by geographic area, matching 646,041 unique comments to Congressional districts; the resulting report for each district explores citizens' concerns over net neutrality.

The report also suggests measures for the FCC and other government agencies to avoid comment stuffing while keeping it easy for Americans to participate in nationwide discussions:

- A confirmation email should be sent once a comment is posted. The owner of the email address can confirm or deny that they sent the comment; comments from users without an email address can be marked "no email address given". Comments could then be labeled "confirmed", "unconfirmed", "denied", "invalid email address", or "no email address given", helping researchers and policymakers identify likely fake comments.
- Known fake email addresses could be registered and shared across federal agencies to combat comment stuffing.
- To identify bot-controlled email addresses, the system could mark every comment with a count of the number of submissions from that particular address, helping discard repetitive comments from the same address.

You can download the full Filtering Out the Bots report to explore links to the individual reports for every Congressional district and state.

The U.S. Justice Department sues to block the new California Net Neutrality law
Furthering the Net Neutrality debate, GOP proposes the 21st Century Internet Act
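The following toy Python sketch illustrates the deduplication approach described above: map comments into vectors and cluster near-duplicates. The study used semantic vectors over a 60+ GB corpus; this illustration uses TF-IDF and scikit-learn on a handful of invented strings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

comments = [
    "Repeal the Title II order, it hurts investment.",
    "Please repeal the Title II order; it hurts broadband investment.",
    "I support net neutrality and the 2015 Open Internet Order.",
    "Keep net neutrality. Do not repeal the Open Internet Order.",
]

# Vectorize each comment, then group similar comments together.
vectors = TfidfVectorizer().fit_transform(comments)
labels = KMeans(n_clusters=2, random_state=0).fit_predict(vectors)
print(labels)  # near-duplicate form letters land in the same cluster
```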