
Tech News - Data


NVIDIA brings new deep learning updates at CVPR conference

Sunith Shetty
20 Jun 2018
4 min read
The NVIDIA team announced a new set of deep learning updates across its cloud computing software and hardware stack during the Computer Vision and Pattern Recognition conference (CVPR 2018), held in Salt Lake City. Key announcements include Apex, an early release of a new open-source PyTorch extension; NVIDIA DALI and NVIDIA nvJPEG for efficient data pipelines and image decoding; a release candidate of Kubernetes on NVIDIA GPUs; and version 4 of the TensorRT runtime engine. Let's look at some noteworthy updates made during the conference.

Apex

Apex is an open-source PyTorch extension that bundles the NVIDIA-maintained utilities needed for optimized, efficient mixed precision and distributed training in PyTorch. It helps machine learning engineers and data scientists maximize deep learning training performance on NVIDIA Volta GPUs, and its core promise is to deliver up-to-date utilities to users as quickly as possible. The NVIDIA PyTorch team drew on state-of-the-art mixed precision training in tasks such as sentiment analysis, translation networks, and image classification, building a set of tools that bring these methods to all levels of PyTorch users. Apex's mixed precision utilities are designed to improve training speed while maintaining the accuracy and stability of single-precision training. With Apex, four or fewer line changes to existing code enable automatic loss scaling, automated execution of operations in FP16 or FP32, and automatic handling of master parameter conversion. To install and use Apex in your own development environment, you will need CUDA 9, PyTorch 0.4 or later, and Python 3. The extension is still an early release, so expect its modules and utilities to change.
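The automatic loss scaling mentioned above guards against small FP16 gradients underflowing to zero: the loss is multiplied by a large factor before backpropagation, and the gradients are divided by the same factor afterwards. A toy illustration in plain Python (FP16 underflow is emulated here with a hard cutoff; this is a conceptual sketch, not Apex's implementation):

```python
FP16_MIN = 6.1e-5   # approx. smallest positive normal float16 value

def to_fp16(x):
    """Crude emulation of FP16 underflow: magnitudes below the
    smallest normal float16 value flush to zero."""
    return 0.0 if abs(x) < FP16_MIN else x

def scaled_gradients(grads, scale):
    """Scale gradients up (as if the loss had been multiplied by
    `scale` before backprop), store them in 'FP16', then unscale."""
    return [to_fp16(g * scale) / scale for g in grads]

grads = [3e-6, 2e-4, -7e-7]           # small gradients, common in training
print(scaled_gradients(grads, 1))     # two of the three gradients are lost
print(scaled_gradients(grads, 1024))  # all three survive
```

Because the scale is a power of two, scaling and unscaling are exact in floating point, so the surviving gradients are recovered bit-for-bit.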
To download the code and get started with the tutorials and examples, visit the GitHub page; the official announcement page has more details.

NVIDIA DALI and NVIDIA nvJPEG

NVIDIA DALI harnesses the GPU through the NVIDIA nvJPEG library to work on images at much greater speed, addressing the performance bottlenecks that image decoding and augmentation create in deep-learning-powered computer vision applications. DALI is an open-source, GPU-accelerated data augmentation and image loading library for optimizing the data pipelines of deep learning frameworks; see its GitHub page to learn more. nvJPEG is a GPU-accelerated library for JPEG decoding, available as a release candidate for feedback and testing. Together they give deep learning practitioners and researchers optimized training performance on image classification models such as ResNet-50 with MXNet, TensorFlow, and PyTorch across Amazon Web Services P3 8-GPU instances or DGX-1 systems with Volta GPUs. See the official announcement page for more details.

Kubernetes on NVIDIA GPUs

The NVIDIA team also announced a release candidate of Kubernetes on NVIDIA GPUs, freely available to developers for testing. It lets enterprises scale up training and deployment to multi-cloud GPU clusters smoothly, ensuring automated deployment, maintenance, scheduling, and operation of multiple GPU-accelerated containers across clusters of nodes, and lets you manage growing resources on heterogeneous GPU clusters. To learn more about this update, see the official announcement page.

TensorRT 4

This new release of NVIDIA's inference optimizer and runtime engine adds new layers such as recurrent neural networks and multilayer perceptrons, an ONNX parser, and integration with TensorFlow to ease deep learning tasks.
It also adds the ability to execute custom neural network layers in FP16 precision, and support for the Xavier SoC through the NVIDIA DRIVE AI platform. TensorRT speeds up deep learning tasks such as machine translation, speech and image processing, and recommender systems on GPUs; across these application areas it accelerates processing by 45x to 190x. All members of the NVIDIA Registered Developer Program can use TensorRT 4 for free. For detailed information about the new features and updates, visit the official developer page.

Read more:
NVIDIA open sources NVVL, library for machine learning training
Nvidia's Volta Tensor Core GPU hits performance milestones. But is it the best?
Nvidia Tesla V100 GPUs publicly available in beta on Google Compute Engine and Kubernetes Engine

A new geometric deep learning extension library for PyTorch releases!

Sunith Shetty
19 Jun 2018
2 min read
PyTorch Geometric is a new geometric deep learning extension library for PyTorch. With it, you can perform deep learning on graphs and other irregular structures using the methods and features the library offers, including an easy-to-use mini-batch loader and helpful transforms for complex operations. The library also provides a large number of datasets behind simple interfaces, and all of these features work on arbitrary graphs as well as on 3D meshes and point clouds. The following methods are currently implemented (refer to each method's research paper for details):

SplineConv: spline-based CNNs for irregularly structured and geometric input such as graphs or meshes.
GCNConv: a scalable approach to semi-supervised learning on graph-structured data.
ChebConv: a generalized CNN model with fast localized spectral filtering on graphs.
NNConv: a neural message passing algorithm for quantum chemistry.
GATConv: graph attention networks that operate on graph-structured data.
AGNNProp: attention-based graph neural networks for graph-based semi-supervised learning.
SAGEConv: representation learning on large graphs, achieving strong results in a variety of prediction tasks.
Graclus Pooling: weighted graph cuts without eigenvectors.
Voxel Grid Pooling is also implemented. To learn more about installation, data handling mechanisms, and the full list of implemented methods and datasets, refer to the documentation; for simple hands-on practice, see the examples/ directory. The library is currently in its first alpha release, and you can contribute to the project by raising an issue if you notice anything unexpected.

Read more:
Can a production ready Pytorch 1.0 give TensorFlow a tough time?
Is Facebook-backed PyTorch better than Google's TensorFlow?
Python, Tensorflow, Excel and more – Data professionals reveal their top tools
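Most of the operators listed above are variations on a message passing scheme: each node aggregates feature vectors from its neighbors and updates its own. A minimal mean-aggregation round in plain Python (a conceptual toy, not the PyTorch Geometric API):

```python
def message_passing_round(features, edges):
    """One round of mean-neighbor aggregation, the core idea behind
    operators like GCNConv and SAGEConv: each node's new feature is
    the average of its own feature and its neighbors' features."""
    neighbors = {v: [v] for v in features}   # include a self-loop
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    return {
        v: sum(features[n] for n in ns) / len(ns)
        for v, ns in neighbors.items()
    }

# Path graph a - b - c with scalar features.
feats = {"a": 1.0, "b": 3.0, "c": 5.0}
updated = message_passing_round(feats, edges=[("a", "b"), ("b", "c")])
print(updated)   # a: (1+3)/2, b: (3+1+5)/3, c: (5+3)/2
```

Real operators replace the plain average with learned weight matrices (GCNConv), attention coefficients (GATConv), or learned message functions (NNConv), but the aggregate-and-update shape is the same.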

The most valuable skills for web developers to learn in 2018

Natasha Mathur
18 Jun 2018
7 min read
Machine learning is gradually transforming the development landscape. As the hottest technology in the software industry right now, it has everyone from professionals to beginners hopping on the bandwagon. Machine learning holds immense potential, paving the way for cutting-edge applications across different domains, which is why application developers have started incorporating it into their development process to make it more effective. A web or app developer who knows ML has a competitive edge over one who doesn't. In this year's Skill Up 2018 survey, we asked developers about the most valuable skill they would like to adopt, and the answer was machine learning.

Source: Packt Skill Up Survey 2018

But how does machine learning help with the web and app development process?

Impact of machine learning on web and app development

Self-driving cars, robots, face detectors: all have a common denominator, machine learning. In these well-known areas, we have seen ML models work wonders by separating the best from the worst of user-generated content to make the web a more valuable experience. But machine learning is everywhere. It has helped find and eradicate the web spam that used to damage the user experience, and Google's artificial neural network for email spam filtering now blocks almost 99% of spam from reaching our inboxes. Companies like Pinterest and Instagram use ML to surface ever more interesting and engaging content in their apps. Another example is the Uber app, which uses machine learning to create a seamless and reliable experience for customers: ML helps estimate arrival times and travel costs, and provides customers with real-time information about the driver's location.
Among other areas, Uber uses ML to run an efficient ride-sharing marketplace, identify suspicious or fraudulent accounts, suggest optimal pickup and dropoff points, and even facilitate UberEATS delivery. Machine learning has the potential to take development skills to the next level, so if you want to be a versatile developer, ML no longer has to be a skill you put on the back burner. That's not the whole story, either: there are plenty of other examples of companies using ML to build their products, and plenty of reasons and opportunities for web developers to dive into machine learning. Let's take a look at them one by one.

Machine learning for data mining

Organizations across the globe use data mining techniques to examine their large databases and discover new information. ML is well suited to data mining because it is effective at detecting new patterns in huge amounts of data, using pattern recognition techniques and computational learning for prediction. Web developers can leverage web mining, a subset of data mining that uncovers distinct usage patterns from web data to understand and better serve the needs of web-based applications. It helps developers discover useful data such as users' browsing history and where web users come from, and web structure mining can further help developers analyze the node and connection structure of a website to describe HTML or XML tag usage.

Comprehending customer behavior

Web and mobile apps use supervised machine learning algorithms to address issues faced by users, which in turn improves the entire customer service process. For instance, contact forms are prevalent on websites these days, eliminating the need for users to self-select an issue and fill out endless form fields to get in touch with a customer care executive.
All you have to do now is fill in the contact form and you'll hear back from the appropriate customer care center, which streamlines the customer service process. Another great example is chatbots. A chatbot helps a website or app better understand patterns in customer behavior: what do customers search for most, what are their buying tendencies, what problems are they facing? These questions can be answered by a chatbot built on machine learning algorithms, and as a developer you can build such innovative solutions to enhance the whole process.

Personalizing content

The number one example of machine learning helping developers personalize the content within their sites is Facebook. In fact, several social media applications heavily leverage machine learning to provide users with more personalized and relevant content. Facebook uses ML for automatic friend tagging suggestions, mutual friend analysis, a personalized news feed, and video recommendations tailored to each user's tastes, combining predictive analytics and statistical analysis to detect patterns in user data and keep the content engaging. Recently, Twitter also started using machine learning algorithms to value users' time with a deeply personalized feed custom-tailored to each user's likes. Machine learning has become a game-changer for social media websites, and developers should grab the opportunity with both hands.

Dealing with security threats

Machine learning techniques such as logistic regression can help developers find and evaluate malicious websites. Classification algorithms can likewise detect and predict phishing websites, based on factors such as security features, domain identity, and data encryption techniques.
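The logistic regression mentioned here reduces to a weighted sum of site features squashed through a sigmoid into a probability. A hand-weighted toy sketch (the feature names and weights below are illustrative assumptions, not a trained model):

```python
import math

def sigmoid(z):
    """Squash a score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative, hand-picked weights - a real model would learn these
# from labeled examples of phishing and legitimate sites.
WEIGHTS = {"has_https": -1.5, "suspicious_domain": 2.5, "has_ip_url": 2.0}
BIAS = -1.0

def phishing_probability(site):
    z = BIAS + sum(WEIGHTS[f] * site.get(f, 0) for f in WEIGHTS)
    return sigmoid(z)

legit = {"has_https": 1, "suspicious_domain": 0, "has_ip_url": 0}
shady = {"has_https": 0, "suspicious_domain": 1, "has_ip_url": 1}
print(f"legit: {phishing_probability(legit):.2f}")   # low probability
print(f"shady: {phishing_probability(shady):.2f}")   # high probability
```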
Prevalent applications that use ML in web and app development include Snapchat, Tinder, and Netflix. Snapchat, for instance, uses machine learning to recognize people's facial features, while Netflix uses linear regression, logistic regression, and other machine learning algorithms that track users' activity to provide personalized content for viewers. We expect machine learning to transform the development process and help web developers take a bigger leap in their careers. The Skill Up survey also revealed another skill developers are keen to learn in the next 12 months: Python.

Python: the go-to language for both machine learning and web development

Python is one of the top languages for both web development and machine learning. Its easy syntax and fast development time make it a good choice for developers, and it offers a vast number of ML libraries, such as scikit-learn, Keras, TensorFlow, and SciPy, along with a rich and vibrant machine learning community. The versatile features of Python have helped build some of the most robust and popular websites, including Instagram, Quora, and YouTube. Likewise, the powerful capabilities of machine learning have made our lives simpler, introducing us to virtual assistants like Siri and Cortana and to face detection technology, among others. Machine learning is an incredible breakthrough for businesses and consumers alike. So if you're interested in joining the upper echelons of the development world, be ready to expand your toolbelt: dive into machine learning and brace yourself for the opportunities that will find their way to you.

Read more:
Asking if good developers can be great entrepreneurs is like asking if moms can excel at work and motherhood
What are web developers favorite front-end tools?
Packt's Skill Up report reveals all
Developers think managers don't know enough about technology. And that's hurting business.

Google's translation tool is now offline - and more powerful than ever thanks to AI

Pravin Dhandre
13 Jun 2018
2 min read
Google today rolled out its fast translation package in offline mode, delivering accurate and natural machine translations to users without a live internet connection. The team at Google worked for nearly two years to bring its powerful neural machine translation (NMT) technology to the native Google Translate applications on smartphones. Using neural nets, the package provides instant, accurate, human-sounding translations for both Android and iOS users. Previously, the offline translation tool worked by breaking down sentences and translating each phrase individually. With AI-powered NMT, the app translates the whole sentence swiftly in one go. NMT uses millions of translated examples collected from sources including books, documents, articles, and search engine results, and uses this information to work out how a given sentence can be formulated naturally while remaining true to its intended context. The offline feature is also surprisingly compact: each language package is just 35 MB, so you can download one to your phone without using up all of your precious storage. Google says the package will roll out in 59 languages over the next couple of days, including European, Indian, and several other languages. At present, you can translate the following languages offline: Afrikaans, Albanian, Arabic, Belarusian, Bengali, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Macedonian, Malay, Maltese, Marathi, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese and Welsh.
To use offline translations in the Google Translate app, browse to the Offline Translation settings and tap the symbol next to a language name to download its package. To learn more, check out the official announcement on the Google Blog.

Read more:
FAE (Fast Adaptation Engine): iOlite's tool to write Smart Contracts using machine translation
How to auto-generate texts from Shakespeare writing using deep recurrent neural networks
Implement Named Entity Recognition (NER) using OpenNLP and Java

IBM unveils world’s fastest supercomputer with AI capabilities, Summit

Natasha Mathur
11 Jun 2018
3 min read
IBM and the Department of Energy's Oak Ridge National Laboratory have revealed Summit, the world's most powerful and smartest supercomputer. It can perform 200 quadrillion calculations per second, a speed of 200 petaflops, roughly equivalent to all 7.6 billion people on the planet doing 26 million calculations per second on a basic calculator. Summit was funded back in 2014 as part of a $325 million Department of Energy program called Coral, but it took several years to build. Summit delivers its speed through new processors, fast storage, fast internal communications, and a versatile design suited to artificial intelligence methods, which also makes it quite expensive. Let's look at what the Summit supercomputer offers.

Supercomputer and AI integration

Dave Turek, vice president of high-performance computing and cognitive systems at IBM, said that AI and high-performance computing are not separate domains; the two are deeply interconnected, which is why Summit will use AI methods for different purposes. Summit will mainly be used for AI development and machine learning. Beyond AI, Oak Ridge will use Summit for scientific research in subjects such as chemical formula design, studying the links between cancer and genes at large scale, fusion energy investigation, astrophysics research into the universe, and simulating Earth's changing climate.

A super big supercomputer

Source: Oak Ridge National Laboratory

Summit consists of 4,608 interconnected computer servers housed in huge refrigerator-sized cabinets. It takes up an eighth of an acre, which, to put it into perspective, is the size of two tennis courts. Summit's peak energy consumption is 15 megawatts, enough to power more than 7,000 homes. Each server has two IBM Power9 chips running at 3.1 GHz, each with 22 cores running in parallel, plus six Nvidia Tesla V100 GPUs.
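These headline numbers are easy to sanity-check with a couple of divisions:

```python
# Two quick checks on the article's figures.
total_flops = 200e15            # 200 petaflops = 2.0e17 calculations/second

# 1) "7.6 billion people each doing 26 million calculations per second":
per_person = total_flops / 7.6e9
print(f"{per_person / 1e6:.1f} million calculations per person per second")

# 2) Compute per server, given 4,608 servers:
per_server = total_flops / 4608
print(f"{per_server / 1e12:.1f} teraflops per server")
```

The first works out to about 26 million, matching the comparison above; the second comes to roughly 43 teraflops per server.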
Each server holds 1.6 terabytes of memory, and data can be written at 2.2 terabytes per second to a 250-petabyte storage system, about 1,000 times the storage capacity of a high-end laptop.

Measuring supercomputer performance

Supercomputer performance is measured with the Linpack benchmark for the TOP500 list, where China's Sunway TaihuLight holds the highest score at 93 petaflops. But Turek feels that measuring a machine's value by a single figure of merit is inaccurate; a machine should instead prove it can scale on real applications. Summit is IBM's step toward exascale: with it, IBM is convinced it can reach its goal of building a system capable of a quintillion calculations per second (five times Summit's speed). Alongside Summit, work is also underway on a less powerful computer, Sierra, and both are scheduled to go online sometime this year. This advances the U.S. arsenal of supercomputers in a competition where the top spots have lately been held by other countries; Summit is the United States' chance to retake the lead.

Read more:
PyCon US 2018 Highlights: Quantum computing, blockchains, and serverless rule!
Quantum A.I.: An intelligent mix of Quantum+A.I.
Q# 101: Getting to know the basics of Microsoft's new quantum computing language

TensorFlow 1.9.0-rc0 release announced

Pravin Dhandre
08 Jun 2018
2 min read
The TensorFlow community keeps rolling out updates. The first release candidate for the next minor version, 1.9.0, was unveiled today with a healthy list of features, improvements, and bug fixes. In the previous 1.8.0 release, the team focused on GPU memory support, running on multiple GPUs, and cloud performance; in this release, the emphasis is on Keras support, gradient estimators, and layer improvements.

Major features and improvements in TensorFlow 1.9.0-rc0:
Updated tf.keras to the Keras 2.1.6 API.
tfe.Network is deprecated; inherit from tf.keras.Model instead.
Added support for core feature columns and losses to gradient boosted trees estimators.
The distributions.Bijector API supports broadcasting for Bijectors, with new API changes.
Layered variable names have changed.

Bug fixes and other changes in TensorFlow 1.9.0-rc0:
The DatasetBase::DebugString() method is now const.
Added the tf.contrib.data.sample_from_datasets() API for randomly sampling from multiple datasets.
Fixes for eager execution and Accelerated Linear Algebra (XLA).
tf.keras.Model.save_weights now saves in TensorFlow format by default.
Fixes for the TensorFlow Debugger (tfdbg) CLI.
Added "constrained_optimization" to tensorflow/contrib.
tf.contrib.framework.zero_initializer supports ResourceVariable.
tf.contrib.data.make_csv_dataset() supports line breaks in quoted strings.

Miscellaneous changes:
Added GCS configuration ops.
Changed the MakeIterator signature to enable propagating error status.
KL divergence for two Dirichlet distributions.
More consistent GcsFileSystem behavior for reads past EOF.
Added a benchmark for tf.scan in graph and eager modes.
Added complex128 support to FFT, FFT2D, FFT3D, IFFT, IFFT2D, and IFFT3D.
Support for preventing tf.gradients() from backpropagating through integer tensors.
Indicator column support in boosted trees.
Conv3D, Conv3DBackpropInput, and Conv3DBackpropFilter now support arbitrary.
LinearOperator[1D,2D,3D]Circulant added to tensorflow.linalg.
Allows LinearOperator to broadcast.

For the complete list of bug fixes and improvements, read the release notes on TensorFlow's GitHub page, where you can also download the source code to try all the features of TensorFlow 1.9.0-rc0.

Read more:
Implementing feedforward networks with TensorFlow
How TFLearn makes building TensorFlow models easier
Distributed TensorFlow: Working with multiple GPUs and servers
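Among the additions listed above, tf.contrib.data.sample_from_datasets() is easy to picture: it interleaves elements from several input datasets according to sampling weights. Its behavior can be sketched in plain Python (a toy stand-in, not the TensorFlow implementation):

```python
import random

def sample_from_datasets(datasets, weights, n, seed=0):
    """Toy stand-in for tf.contrib.data.sample_from_datasets():
    on each step, pick one of the input datasets according to
    `weights` and yield its next element, until `n` elements have
    been produced or every dataset is exhausted."""
    rng = random.Random(seed)
    iters = [iter(d) for d in datasets]
    out = []
    while len(out) < n and iters:
        i = rng.choices(range(len(iters)), weights=weights, k=1)[0]
        try:
            out.append(next(iters[i]))
        except StopIteration:
            # Drop the exhausted dataset and its weight, keep sampling.
            del iters[i]
            weights = weights[:i] + weights[i + 1:]
    return out

# Roughly three quarters of the samples come from the first dataset.
mixed = sample_from_datasets([["a"] * 100, ["b"] * 100], [0.75, 0.25], 20)
print(mixed)
```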
Keras 2.2.0 releases!

Sunith Shetty
08 Jun 2018
3 min read
The Keras team has announced version 2.2.0, with notable features to let developers perform deep learning with ease. The release brings new API changes, a new input mode, bug fixes, and performance improvements to the high-level neural network API. Keras is a popular neural network API that can run on top of TensorFlow, CNTK, or Theano. The Python API is developed with a focus on fast experimentation, minimizing the delay between idea and result; it is a highly efficient library that enables easy and fast prototyping and runs seamlessly on CPU and GPU. Some of the noteworthy changes in Keras 2.2.0:

New areas of improvement
A new API for model definition: Model subclassing.
A new input mode: the ability to call models on TensorFlow tensors directly (TensorFlow backend only).
Improved feature coverage of Keras with the CNTK and Theano backends.
Lots of bug fixes and performance improvements to the Keras API.
The Keras engine now follows a much more modular structure, improving code structure, code health, and test time.
The Keras modules applications and preprocessing are now externalized to their own repositories, keras-applications and keras-preprocessing.

New API changes
Added the MobileNetV2 application, available for all backends.
Enabled CNTK and Theano support for the Xception and MobileNet applications, the SeparableConv1D and SeparableConv2D layers, and the backend methods separable_conv1d and separable_conv2d, all previously available only for TensorFlow.
You can now feed symbolic tensors to models with the TensorFlow backend.
Support for input masking in the TimeDistributed layer.
ReLU activation is easier to configure, while retaining easy serialization, via a new advanced_activation layer, ReLU.
For the complete list of new API changes, visit GitHub.

Breaking changes
The legacy Merge layers and their related functionality, remnants of Keras 0, have been removed. These layers were deprecated in May 2016, with full removal scheduled for August 2017; models from the Keras 0 API that use them can no longer be loaded in Keras 2.2.0 and above.
The truncated_normal base initializer now returns values scaled by ~0.9, giving the correct variance after truncation.

For the full list of updates, refer to the release notes.

Read more:
Why you should use Keras for deep learning
Implementing Deep Learning with Keras
2 ways to customize your deep learning models with Keras
How to build Deep convolutional GAN using TensorFlow and Keras
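The ~0.9 factor in the truncated_normal change comes from the geometry of truncation: a standard normal restricted to ±2 standard deviations has a standard deviation of about 0.88, not 1. A quick Monte Carlo check in plain Python (this only verifies the factor; it is not the Keras implementation):

```python
import math
import random

def truncated_normal(rng, limit=2.0):
    """Sample a standard normal, rejecting anything beyond ±limit,
    as a truncated-normal initializer does."""
    while True:
        x = rng.gauss(0.0, 1.0)
        if abs(x) <= limit:
            return x

rng = random.Random(42)
samples = [truncated_normal(rng) for _ in range(200_000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((x - mean) ** 2 for x in samples) / len(samples))
print(f"standard deviation after truncation at 2 sigma: {std:.3f}")
```

The measured value comes out near 0.88, which is the ~0.9 scale factor the release note is compensating for.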

Project Hydrogen: Making Apache Spark play nice with other distributed machine learning frameworks

Sunith Shetty
06 Jun 2018
5 min read
The Apache Spark team revealed a new venture, Project Hydrogen, during a keynote at the Spark + AI Summit. The project focuses on eliminating the obstacles organizations face when using Spark with deep learning frameworks such as TensorFlow and MXNet. The rise of Apache Spark is evident from the fact that it is one of the most widely adopted platforms for big data processing, outperforming other big data frameworks like Hadoop, and it has shown significant growth in the big data field. Thanks to its excellent functionality and services, Spark is one of the most used unified big data frameworks for data processing, SQL querying, real-time streaming analytics, and machine learning. If you want to understand why Apache Spark is gaining popularity, check out our interview with Romeo Kienzler, Chief Data Scientist in the IBM Watson IoT worldwide team.

What are the current limitations of Apache Spark?

Apache Spark works fine on its own in the big data field, but the power of its single framework breaks down when you try to use third-party distributed machine learning or deep learning frameworks with it. Spark has its own machine learning library, Spark MLlib, which provides noteworthy machine learning functionality. But given the pace of development and research in machine learning and artificial intelligence, data scientists and practitioners want to explore the power of leading deep learning frameworks such as TensorFlow, Keras, MXNet, Caffe2, and more. The problem is that Apache Spark and deep learning frameworks don't play well together. As requirements grow and tasks become more advanced, Spark users want to combine Spark with these frameworks to handle complex functionality, but the way the Spark scheduler works is incompatible with how the machine learning frameworks work.

Do we have any in-house solutions?
There are basically two options for combining Spark with other deep learning frameworks.

Option 1: use two separate clusters.

Source: Databricks - Spark AI Summit 2018

As the preceding image shows, we have two clusters. All the data processing work, including data prep and data cleansing, is performed in the Spark cluster, and the final result is written to a storage repository (HDFS or S3). A second cluster running the distributed machine learning framework reads the data from the repository. This architecture is no longer unified: one of the core challenges is handling two disparate systems, since you need to understand how each of them works. Each cluster may have its own debugging schemes and log files, making the whole setup very difficult to operate.

Option 2: run both on one cluster.

Some users have tried to tackle the operational difficulties, debugging, and testing challenges of option 1 by running Spark and the distributed machine learning framework on a single cluster, as the following image shows. The result, however, is not convincing, because of the inconsistency in how the two systems work. There is a great difference between how Spark tasks are scheduled and how deep learning tasks are scheduled. In Spark, each job is divided into a number of subtasks that are independent of each other. Deep learning frameworks use different scheduling schemes: depending on the job, they use MPI or their own custom RPCs for communication, and they assume complete coordination and dependency among their set of tasks.

Source: Databricks - Spark AI Summit 2018

You can see the consequences of this mismatch clearly when tasks fail.
For example, as shown in the following figure, in the Spark model, when a task fails the Spark scheduler simply restarts that single task, and the entire job recovers. In deep learning frameworks, because of the complete dependency among tasks, if any task fails, all the tasks need to be launched again.

Source: Databricks - Spark AI Summit 2018

The Solution: Project Hydrogen

Project Hydrogen aims to solve the challenges of using Spark and other deep learning frameworks together. It is positioned as a solution that lets data scientists plug other deep learning frameworks into Spark. The project introduces a new scheduling primitive, gang scheduling, which addresses the dependency problem that the deep learning schedulers introduce in option 2.

Source: Databricks - Spark AI Summit 2018

Gang scheduling is all-or-nothing: either all of a job's tasks are scheduled in one go, or none of them are scheduled at all. This bridges the disparity between how the two systems work.

What's next?

The Project Hydrogen APIs are not ready yet; we can expect them to be added to the core Apache Spark project later this year. The primary goal of the project is to embrace all distributed machine learning frameworks in the Spark ecosystem, allowing every framework to run as smoothly as Spark's own MLlib. Alongside scheduling, the project is also working on speeding up data exchange, which often becomes a bottleneck in machine learning and deep learning jobs, and on accelerator awareness so that GPUs and FPGAs can be used comfortably in Spark clusters.

Apache Spark 2.3 now has native Kubernetes support!
How to win Kaggle competition with Apache SparkML
How to build a cold-start friendly content-based recommender using Apache Spark SQL
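The all-or-nothing behavior of gang scheduling can be illustrated with a toy, stdlib-only simulation. This is an illustrative sketch of the scheduling idea, not the actual Spark scheduler API; the function names are hypothetical:

```python
# Toy illustration of gang scheduling: a job's tasks are admitted
# all at once or not at all, unlike Spark's independent task scheduling.

def gang_schedule(tasks, free_slots):
    """Admit all tasks if enough slots exist, otherwise admit none."""
    if len(tasks) <= free_slots:
        return list(tasks), free_slots - len(tasks)  # all scheduled
    return [], free_slots                            # nothing scheduled

def independent_schedule(tasks, free_slots):
    """Spark-style: schedule as many independent tasks as slots allow."""
    admitted = list(tasks)[:free_slots]
    return admitted, free_slots - len(admitted)

job = ["t0", "t1", "t2", "t3"]

# With only 3 slots, gang scheduling refuses to start a 4-task job...
gang_admitted, _ = gang_schedule(job, free_slots=3)
# ...while independent scheduling happily runs a partial job, which
# would deadlock MPI-style frameworks that expect all peers to be up.
indep_admitted, _ = independent_schedule(job, free_slots=3)

print(len(gang_admitted))   # 0
print(len(indep_admitted))  # 3
```

The point of the sketch is the failure mode it avoids: a partially scheduled deep learning job holds resources while waiting forever for peers that were never launched, whereas all-or-nothing admission either makes progress or leaves the cluster free.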

Databricks open sources MLflow, simplifying end-to-end Machine Learning Lifecycle

Pravin Dhandre
06 Jun 2018
2 min read
Machine learning has energized software applications with highly accurate predictions, driving up demand for the products of tech-driven companies. However, while developing such smart applications, data scientists and machine learning professionals face numerous machine learning challenges and software development issues. Today, Databricks open sourced its newly developed framework, MLflow, which aims to simplify complex machine learning experiments through automation and to make machine learning models deployable across any platform.

With MLflow, users can standardize the complex processes involved in building and deploying machine learning and predictive models. The framework gives data scientists tools to track experiments, package their machine learning code, and manage their models on any of the popular machine learning frameworks. The current platform offers the following three components:

MLflow Tracking: logs code versions, data files, configurations, and results. It also lets you query your experiments, so you can visualize and compare experiments and parameters swiftly and without hassle.

MLflow Projects: provides a structured format for packaging machine learning code, along with API and CLI tools. This allows data scientists to reuse and reproduce their code and to chain projects and workflows together.

MLflow Models: a standard format for packaging and distributing machine learning models across different downstream tools; examples include Azure ML-compatible models, deploying with Amazon SageMaker, or serving through a local REST API.

The current version is an alpha release, and more features will be added before the full release.
To get more details on its core offerings, APIs, and command-line interfaces, read the official documentation at mlflow.org.

MachineLabs, the browser based machine learning platform, goes open source
Microsoft Open Sources ML.NET, a cross-platform machine learning framework
Google announces Cloud TPUs on the Cloud Machine Learning Engine (ML Engine)

Apache Flink 1.5.0 is out

Pravin Dhandre
01 Jun 2018
2 min read
After almost five months of hard work by the Flink community, the team is happy to roll out its newest release, Apache Flink 1.5.0. This is a major release in the 1.x series, featuring advanced capabilities along with more than 750 fixed bugs and issues.

Apache Flink is an open-source big data processing framework used for real-time analytics, stream processing, and batch processing applications. The framework delivers fast, efficient, accurate, and highly fault-tolerant handling of massive streams of events. With more than 330 active contributors, Apache Flink is one of the most active stream processing projects at the Apache Software Foundation.

Key new features and improvements:

Rewritten deployment and process model
Dynamic allocation and release of resources on YARN and Mesos.
Simplified deployment on Kubernetes.
Job submission, cancellation, and status requests to the JobManager now go through REST.

Broadcast state
Connects a broadcast stream (such as context data or machine learning models) with other streams.
Broadcast state can be checkpointed and restored.
Unblocks the implementation of the "dynamic patterns" feature.

Improvements to Flink's network stack
Credit-based flow control for high throughput.
Improved performance by lowering latencies without reducing throughput.

Task-local state recovery
Keeps a copy of the application state on the local disk of each machine.
Improves failure recovery.

Extended join support for SQL and Table API
Joins of tables on bounded time ranges in both event time and processing time.
Full-history matching, as in standard SQL joins.

SQL CLI client
A new SQL CLI client for running exploratory queries on data streams.
Serves both streaming and batch SQL queries.
Various other features and improvements:
Support for OpenStack's S3-like file system.
Improved reading and writing of JSON messages from and to connectors.
Application rescaling without manual triggers.
Improved watermark and latency metrics.

For the complete list of features and improvements, please review the release notes on the official Apache Flink page.

Flink Complex Event Processing
Top 5 programming languages for crunching Big Data effectively
Working with Kafka Streams

Anaconda 5.2 releases!

Sunith Shetty
01 Jun 2018
2 min read
The Anaconda team has announced a new release, Anaconda Distribution 5.2. This version brings several changes across platform updates, user-facing improvements, and backend improvements.

Anaconda is a free, open-source distribution of Python that offers a fast, easy, and powerful way to do data science and machine learning. It is an efficient platform for large-scale data processing, scientific computing, and more. With over 6 million users, it includes more than 250 data science packages for all major operating systems: Windows, Linux, and macOS. Package versions are managed by the conda package manager.

Some of the noteworthy changes in Anaconda Distribution 5.2:

Major highlights
More than 100 packages have been updated or added (notable updates include Qt 5.9.5, OpenSSL 1.0.2o, NumPy 1.14.3, SciPy 1.1.0, Matplotlib 2.2.2, and pandas 0.23.0).
Windows installers now control their environment more carefully, so even if menu shortcuts fail to be created, installation won't run into serious issues.
The developer certificate for macOS .pkg installers has been updated to Anaconda, Inc.

User-facing improvements
All default channels now point to repo.anaconda.com instead of repo.continuum.io.
More dynamic shortcut working-directory behavior improves Windows multi-user installations.
To prevent usability issues, Windows installers now disallow the characters ! % ^ = in the installation path.
Backend improvements
Security fixes for more than 20 packages based on in-depth Common Vulnerabilities and Exposures (CVE) analysis.
Improved behavior of --prune, since the history file is now updated correctly in the conda-meta directory.
The Windows installer now uses a trimmed-down value for the PATH environment variable to avoid DLL-hell problems with existing software.

In addition, several changes target all x86 platforms, Linux distributions, and Windows distributions. For the complete list, refer to the release notes. To download Anaconda Distribution 5.2, get the installer from the official page; alternatively, update an existing installation to version 5.2 with conda update conda followed by conda install anaconda=5.2.

30 common data science terms explained
Data science on Windows is a big no
10 Machine Learning Tools to watch in 2018

Intel AI Lab introduces NLP Architect Library

Sunith Shetty
30 May 2018
3 min read
Data is an integral part of every business and organization, used to make valuable decisions in changing circumstances. Natural Language Processing (NLP) is a widely adopted set of techniques that machines use to understand and communicate with humans in human language, enabling people to access, analyze, and extract information more intelligently from huge amounts of unstructured data.

Intel AI Lab's team of NLP researchers and developers has introduced NLP Architect, a new open-source Python library. The library is a platform for future research and for developing state-of-the-art deep learning techniques for natural language processing and natural language understanding. Rapid advances in deep learning and neural network paradigms have driven growth in the NLP domain, and this library offers flexibility in implementing NLP solutions that draw on Intel AI Lab's past and ongoing NLP research and development.

NLP Architect overview

The current version of NLP Architect offers noteworthy features that form the backbone of both research and practical development. All of the following models ship with the required training and inference processes:

Core NLP models such as the BIST parser and NP chunker, which allow powerful extraction of linguistic features for NLP workflows
NLU models such as intent extraction (IE) and named entity recognition (NER), used in intent-based applications
Modules that address semantic understanding
Components that are key to conversational AI, such as chatbot and dialog applications
End-to-end deep learning applications such as Q&A and reading comprehension

Source: Intel AI blog

This library of NLP components provides the functionality needed to extend NLP solutions for a range of audiences, and it serves as a medium for analyzing and optimizing Intel software and hardware on NLP workloads.
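To make the NP chunker mentioned above concrete: a noun-phrase chunker groups consecutive tokens into noun phrases. The following is a toy, rule-based sketch of the idea using only the standard library; it is not the NLP Architect API, whose chunker is a trained model rather than a set of hand-written rules:

```python
# Toy noun-phrase (NP) chunker: group determiner/adjective/noun runs
# from pre-tagged (token, part-of-speech) pairs. Illustrative only --
# real NP chunkers, like NLP Architect's, are learned models.

NP_TAGS = {"DET", "ADJ", "NOUN"}

def np_chunks(tagged_tokens):
    chunks, current = [], []
    for token, tag in tagged_tokens:
        if tag in NP_TAGS:
            current.append(token)
        elif current:
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

sentence = [("the", "DET"), ("quick", "ADJ"), ("fox", "NOUN"),
            ("jumps", "VERB"), ("over", "ADP"),
            ("the", "DET"), ("lazy", "ADJ"), ("dog", "NOUN")]

print(np_chunks(sentence))  # ['the quick fox', 'the lazy dog']
```

Extracted chunks like these are the "linguistic features" that downstream NLP components, such as intent extraction, build on.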
In addition to these models, the library adds features such as data pipelines, common functional calls, and NLP-related utilities that are frequently used when deploying models. To learn more about the updates, refer to the official Intel AI blog.

How NLP Architect can be used
Train models using the provided datasets, configurations, and algorithms
Train models on your own data
Create new models or extend existing ones
Explore common (and not-so-common) NLP challenges using deep learning models
Optimize and extend state-of-the-art deep learning algorithms
Integrate modules and utilities from the library into NLP solutions

Deep learning framework support

The repository supports several open-source deep learning frameworks:
Intel Nervana Graph
Intel Neon
Intel-optimized TensorFlow
Dynet
Keras

Note: the list of supported frameworks is expected to grow, and all models run on Python 3.5+.

To download the open-source Python library, or to contribute to the project and provide feedback, get the code from GitHub. Complete documentation for all core modules, with end-to-end examples, can be found on the official page.

Intel takes Facebook's help on AI chip; Cisco uses AI to predict IT services; and more
Introducing Intel's OpenVINO computer vision toolkit for edge computing
Facelifting NLP with Deep Learning

MariaDB 10.3.7 releases

Pravin Dhandre
28 May 2018
2 min read
Last Friday, the MariaDB Foundation officially announced the general availability of MariaDB 10.3.7, a new stable version and a substantial release within the 10.3 series. MariaDB is fast, scalable, and robust, with a rich ecosystem of storage engines and plugins for a wide variety of use cases across banking, social sites, e-commerce, and more.

Improvement highlights
The MyRocks storage engine 1.0, with its high compression ratio, is now stable in MariaDB 10.3.7.
The Spider storage engine 3.3.13 is now stable, supporting partitioning and XA transactions.
Added two new ALTER TABLE algorithm options, INSTANT and NOCOPY, for instant metadata-only changes and for operations that avoid rebuilding the clustered index, respectively.
SSL support in the embedded server library when connecting to remote servers.
New status variables, feature_json and feature_system_versioning, for monitoring the use of JSON functionality and system versioning respectively.
The InnoDB version number (5.7) is removed from MariaDB 10.3 onwards.
Bug fixes for ADD COLUMN.
Improved ALTER TABLE algorithms, including ALGORITHM=INSTANT and ALGORITHM=NOCOPY.
Various performance fixes and code cleanup, including cleaned-up InnoDB parameter validation, a fix for a hang while shutting down InnoDB, and performance improvements so FLUSH TABLES…FOR EXPORT no longer hangs.
Future releases drop support for Debian 7 "Wheezy" and Fedora 26; users need to move to Debian 8 "Jessie" or Fedora 27 and onwards.

With these added features and performance improvements, MariaDB developers are better equipped to turn their data into well-structured information. Please refer to the release notes and changelog for more details.

MySQL 8.0 is generally available with added features
Why Oracle is losing the Database Race
Neo4j 3.4 aims to make connected data even more accessible

PostgreSQL 11 Beta 1 is out!

Sunith Shetty
25 May 2018
4 min read
The PostgreSQL team has announced the first beta of PostgreSQL 11, offering a sneak peek at the features that will ship in the final release, which is likely to arrive in late 2018. The major features center on database simplicity, handling large datasets, and addressing various performance bottlenecks. Some minor changes can be expected before the final release, and since this is a beta, it is strongly advised not to run it in production.

PostgreSQL is an open-source relational database management system that has grown in popularity over the years. With more than 30 years of continuous development, it is one of the most popular databases in use today, and it was named DBMS of 2017 for its reliability, robustness, and performance.

Some of the noteworthy changes in PostgreSQL 11 Beta 1:

Partitioning improvements

Partitioning plays an integral part in splitting a large dataset into smaller pieces so that complex operations can be carried out with ease. PostgreSQL 11 contains several new features and improvements for working with partitioned data:

Hash partitioning, a new feature, lets you partition using a hash key.
UPDATE statements on a partition key now move the affected rows to the appropriate partitions.
Enhanced partition elimination during query processing and execution improves SELECT query performance.
Full support for PRIMARY KEY, FOREIGN KEY, triggers, and indexes on partitions.
A new feature allows queries to push grouping and aggregation down to partitioned tables before the final aggregation. To enable it, set enable_partitionwise_aggregate = on in your configuration file; it is disabled by default.
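The routing logic behind hash partitioning can be sketched in a few lines. This is a stdlib-only illustration of the concept, not PostgreSQL's internal hash functions:

```python
# Toy illustration of hash partitioning: each row is routed to a
# partition by hashing its partition key modulo the partition count.
import zlib

NUM_PARTITIONS = 4

def partition_for(key):
    # Use a stable hash (unlike Python's randomized hash()) so routing
    # is deterministic across runs, as a database requires.
    return zlib.crc32(str(key).encode()) % NUM_PARTITIONS

rows = [("alice", 1), ("bob", 2), ("carol", 3), ("dave", 4)]
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for name, user_id in rows:
    partitions[partition_for(user_id)].append(name)

# Every row lands in exactly one partition, and the same key always
# routes to the same partition -- which is what makes partition
# elimination possible when a query filters on the key.
print({p: names for p, names in partitions.items() if names})
```

Because the key-to-partition mapping is deterministic, a query with an equality filter on the key only needs to scan one partition, which is the basis of the improved partition elimination described above.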
Parallelism improvements

New features build out the parallel query infrastructure so workloads are managed and executed more efficiently, providing significant performance gains:

Hash joins and CREATE INDEX for B-tree indexes are now parallelized.
Certain queries with UNION can now run in parallel.

SQL stored procedures

PostgreSQL 11 introduces SQL stored procedures, which allow embedded transactions (BEGIN, COMMIT/ROLLBACK, and more) within a procedure.

Just-in-time compilation

JIT compilation optimizes the execution of code and operations at run time, and it provides a framework for further optimizations in the future. If you are building PostgreSQL 11 from source, you can enable JIT compilation using the --with-llvm flag.

Window functions

In PostgreSQL 11, window functions support all options in the SQL:2011 standard.

SCRAM authentication

PostgreSQL 11 adds channel binding for SCRAM authentication, a security feature that helps prevent man-in-the-middle attacks. SCRAM authentication itself was already available, improving how passwords are stored and transmitted on the basis of a standard protocol.

Simplicity and user experience improvements

Although PostgreSQL provides a healthy set of features, not all of them are easy to use in development and production environments, so the team has made several user-experience improvements. For example, you can now quit the PostgreSQL command line (psql) with the keywords quit and exit.

Additional improvements and features

Many other improvements and features have been added to PostgreSQL 11; refer to the release notes for the complete list.
If you want to contribute to the project by testing this new release to find bugs and issues, download PostgreSQL 11 Beta 1 from the official page. Existing open issues are tracked in the PostgreSQL wiki, and bugs can be reported through the bug report form on the PostgreSQL website.

How to perform data partitioning in PostgreSQL 10
New updates to Microsoft Azure services for SQL Server, MySQL, and PostgreSQL
2018 is the year of graph databases. Here's why

Amazon is selling facial recognition technology to police

Richard Gall
23 May 2018
4 min read
The American Civil Liberties Union (ACLU) has revealed that Amazon has been selling its facial recognition software, called Rekognition, to a number of law enforcement agencies in the U.S. Using freedom of information requests, the ACLU obtained correspondence between the respective departments and Amazon.

According to the ACLU, Rekognition is a dangerous step towards a surveillance state that could lead to serious infringements on civil liberties. Here's what the ACLU had to say in a post published on Tuesday 22 May:

People should be free to walk down the street without being watched by the government. By automating mass surveillance, facial recognition systems like Rekognition threaten this freedom, posing a particular threat to communities already unjustly targeted in the current political climate. Once powerful surveillance systems like these are built and deployed, the harm will be extremely difficult to undo.

How is Rekognition currently being used?

Two U.S. police departments are using Rekognition. In Oregon, the Washington County Sheriff's Office is using the facial recognition tool to identify persons of interest against a database of 300,000 mugshots, a project that has been underway for some time: Chris Adzima, Senior Information Systems Analyst for the Washington County Sheriff's Office, wrote a guest post on the AWS website in June 2017 outlining how they were using Rekognition. Once the architecture was in place, the team built a mobile app to make the technology usable for officers.

In Orlando, meanwhile, police have been using AWS for 'consulting and advisory services' and are seeking to implement Rekognition in a project referred to in the documentation as 'Orlando Safety Video POC'. Orlando City police are paying $39,000 for AWS' time on the project.
Civil liberties organizations pen an open letter to Jeff Bezos

The ACLU, along with a number of other organizations including the Electronic Frontier Foundation and Data for Black Lives, penned an open letter to Jeff Bezos to express their concern. Appealing to Amazon's past commitment to civil liberties, the letter stated:

In the past, Amazon has opposed secret government surveillance. And you have personally supported First Amendment freedoms and spoken out against the discriminatory Muslim Ban. But Amazon's Rekognition product runs counter to these values. As advertised, Rekognition is a powerful surveillance system readily available to violate rights and target communities of color.

The letter is an impassioned plea for Amazon to consider the ways in which it is complicit with government agencies, and it offers a serious warning about the potential consequences of facial recognition technology in the hands of law enforcement.

Amazon defends collaborating with police

Amazon has been quick to defend itself. In a statement emailed to various news organizations, the company said: "Our quality of life would be much worse today if we outlawed new technology because some people could choose to abuse the technology. Imagine if customers couldn't buy a computer because it was possible to use that computer for illegal purposes? Like any of our AWS services, we require our customers to comply with the law and be responsible when using Amazon Rekognition."

The key issue with Amazon's statement is that the analogy with personal computers doesn't quite hold. Individuals aren't responsible for maintaining the law, and neither do they hold the same power that law enforcement agencies do. Technology might change how individuals behave, but that behavior must still comply with the law. The current scenario is a little different; the concern is around how technology might actually change the way the law functions.
There isn't, strictly speaking at least, any way of governing how that happens. Whatever you make of Amazon's work with law enforcement, it's clear that we are about to enter a new era of disruption and innovation in public institutions. For some people, collaboration between public and private realms opens up plenty of opportunities. But there are many dangers that must be monitored and challenged.

Read next:
Top 10 Tools for Computer Vision
Admiring the many faces of Facial Recognition with Deep Learning