Tech News - Data

14th Feb 2018 – Data Science News Daily Roundup

Packt Editorial Staff
14 Feb 2018
3 min read
Keras 2.1.4, updates to the TensorFlow Object Detection API, PyTorch 0.3.1, new releases of Pgpool-II, and more in today's top stories around machine learning, deep learning, and data science news.

1. Keras 2.1.4 released

Keras 2.1.4 has been released with bug fixes and improvements to performance and example scripts. The major changes include:

- In ImageDataGenerator, the default interpolation for image transforms changes from nearest to bilinear.
- Stateful metrics are now allowed in model.compile(..., metrics=[...]). A stateful metric inherits from Layer and implements __call__ and reset_states (see the sketch at the end of this roundup).
- Support for a constants argument in StackedRNNCells.
- Some TensorBoard features (loss and metric plotting) are enabled in the TensorBoard callback with non-TensorFlow backends.
- A reshape argument has been added to model.load_weights() to optionally reshape the loaded weights to the size of the target weights in the model.

The full list of changes is available in the release notes.

2. TensorFlow Object Detection API gets updated with instance segmentation

TensorFlow announced the addition of instance segmentation to its Object Detection API. Instance segmentation segments an object's region once the object is detected, giving finer-grained information about the extent of the object within its bounding box. With this update, TensorFlow now supports a number of instance segmentation models similar to those discussed in the Mask R-CNN paper; the models predict masks in addition to object bounding boxes. Four instance segmentation config files are provided for training models:

- mask_rcnn_inception_resnet_v2_atrous_coco
- mask_rcnn_resnet101_atrous_coco
- mask_rcnn_resnet50_atrous_coco
- mask_rcnn_inception_v2_coco

More details are available in the official GitHub repo.

3. PyTorch 0.3.1 released

PyTorch has published minor release 0.3.1, with bug fixes and performance improvements:

- Support for CUDA capabilities 3.0 and 5.0 has been removed, and binary releases for CUDA 7.5 have been discontinued. CPU-only binary releases, roughly 10x smaller than the full CUDA binaries, will now be provided.
- Added a cosine annealing learning rate scheduler.
- Added a reduce argument to PoissonNLLLoss so unreduced losses can be computed.
- Added random_split, which randomly splits a dataset into non-overlapping new datasets of the given lengths.
- Introduced scopes to annotate ONNX graphs for better TensorBoard visualization of models.
- map_location in torch.load may now be a string, such as map_location='cpu' or map_location='cuda:2'.

Further bug fixes and improvements are listed in the changelog.

4. Pgpool-II 3.7.2, 3.6.9, 3.5.13, 3.4.16, and 3.3.20 officially released

Pgpool-II is a tool that adds useful features to PostgreSQL, such as connection pooling, load balancing, and automatic failover. The Pgpool Global Development Group has announced the availability of Pgpool-II versions 3.7.2, 3.6.9, 3.5.13, 3.4.16, and 3.3.20. The changes include:

- Fixed a socket-writing bug introduced in Pgpool-II 3.7.0, 3.6.6, and 3.5.10.
- Allowed building with LibreSSL.
- Set TCP_NODELAY and non-blocking mode on the frontend socket.
- Changed the systemd service file to use STOP_OPTS=" -m fast".
- Changed pgpool_setup to add restore_command to recovery.conf.

For more information, take a look at the release notes.

5. OmniDB 2.5 released with support for Oracle databases

OmniDB 2.5, the browser-based database management tool, has been released. It lets users manage multiple databases in a unified workspace through a user-friendly, fast interface. The following features and improvements have been added:

- Basic support for Oracle databases. Users can connect to, manage, and interact with Oracle databases using most of the same features provided for PostgreSQL databases.
- A new DDL panel. The panel, located below the treeview, displays the properties and DDL of the currently selected node.

For a complete list of updates, read the OmniDB change tracker.
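To make the stateful-metric change in Keras 2.1.4 concrete, here is a minimal sketch of such a metric, following the Layer-based pattern the release notes describe (the metric itself is a hypothetical example):

```python
import keras.backend as K
from keras.layers import Layer

class BinaryTruePositives(Layer):
    """Stateful metric: counts true positives accumulated over an epoch."""

    def __init__(self, name='true_positives', **kwargs):
        super(BinaryTruePositives, self).__init__(name=name, **kwargs)
        self.stateful = True  # tells Keras to reset the state each epoch
        self.true_positives = K.variable(value=0, dtype='int32')

    def reset_states(self):
        K.set_value(self.true_positives, 0)

    def __call__(self, y_true, y_pred):
        y_true = K.cast(y_true, 'int32')
        y_pred = K.cast(K.round(y_pred), 'int32')
        correct = K.cast(K.equal(y_true, y_pred), 'int32')
        true_pos = K.cast(K.sum(correct * y_true), 'int32')
        current = self.true_positives * 1
        self.add_update(K.update_add(self.true_positives, true_pos),
                        inputs=[y_true, y_pred])
        return current + true_pos
```

The metric is then passed like any other, for example model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=[BinaryTruePositives()]).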

13th Feb 2018 – Data Science News Daily Roundup

Packt Editorial Staff
13 Feb 2018
4 min read
Cloud TPUs available in beta, Microsoft's plans for blockchain, Oracle's PaaS, Scanpy, and more in today's top stories around machine learning, blockchain, and data science news.

1. Google's Cloud TPUs now available in beta for accelerating machine learning

Google's Cloud TPUs, the family of Google-designed hardware accelerators, are now available in beta. These custom chips are optimized to speed up and scale specific ML workloads programmed with TensorFlow. The company first announced Cloud TPUs at its I/O developer conference on May 17-19, 2017, for a limited number of developers and researchers. Each Cloud TPU features four custom ASICs and packs up to 180 teraflops of floating-point performance and 64 GB of high-bandwidth memory onto a single board. Developers who already use TensorFlow don't have to make major changes to their code to use the service. Usage is billed at $6.50 per Cloud TPU per hour. Using a single Cloud TPU, developers can train ResNet-50 to the expected accuracy on the ImageNet benchmark in less than a day, for well under $200. Once TPU pods (arrays of Cloud TPUs connected via an ultra-fast, dedicated network to form multi-petaflop ML supercomputers) become available, ResNet-50 and Transformer training times are expected to drop from almost a day to under 30 minutes.

2. Microsoft plans to use blockchain technology for identity management

In a recent blog post, Microsoft revealed plans to use blockchain technology, in the form of decentralized identity systems, to address problems with personal data and identity management online. A decentralized identity system is not controlled by any single, centralized institution; it removes the possibility of censorship and gives individuals full control over their identity and reputation. The platform takes inspiration from Microsoft's commitment to the ID2020 alliance. Initially, Microsoft will support blockchain-based decentralized IDs (DIDs) through the Microsoft Authenticator app, working with DID method implementations that follow the standard outlined by a W3C working group. According to Ankur Patel, PM in Microsoft's identity division, "Using our technology individuals will get a secure encrypted digital hub where they can store their identity data and easily control access to it."

3. Oracle adds advanced autonomous capabilities to its Cloud Platform

Oracle has laid out a broader vision for Oracle Cloud Platform with a range of autonomous service capabilities. Oracle PaaS (Platform as a Service) capabilities support the needs of the entire organization, including developers, enterprise architects, data scientists, IT operations, and business users. The autonomous PaaS services include advanced capabilities such as automatic code generation, self-defining data flows, and automated data discovery and preparation, along with voice-enabled integration links, machine learning-based continuous data analysis, and self-learning bots that understand user intent and continually refine that understanding. These services aim to speed IT deployments by letting developers jump straight to creating new functionality rather than spending time on routine tasks, and they promise lower IT costs and better security because they require less human management and eliminate human error.

4. Scientists develop Scanpy to help manage enormous datasets

Scientists from the Helmholtz Zentrum München have developed Scanpy, a program for managing enormous datasets. Scanpy was built to analyze the gene-expression data of large numbers of individual cells, enabling comprehensive analysis of large gene-expression datasets with a broad range of machine-learning and statistical methods. Scanpy is based on Python and uses a graph-like coordinate system: instead of characterizing a single cell by its expression values for thousands of genes, the system characterizes cells by identifying their closest neighbors, much like the connections in social networks. In fact, to identify cell types, Scanpy uses the same community-detection algorithms that Facebook uses to identify communities. A short sketch of this workflow appears at the end of this roundup. To read more, visit the official documentation.

5. Accelirate partners with Chirrp.ai to strengthen its enterprise-class chatbot solutions

Accelirate has partnered with Chirrp.ai, an AI-powered communication channel provider, to strengthen its chatbot solutions. With this partnership, the enterprise-grade chatbots will be able to handle low-, medium-, and high-complexity use cases. In a high-complexity use case, clients use chatbots as an NLP/NLU-powered application-delivery mechanism that handles complex user queries as well as application rules and workflows from within the chatbot interface. Initially, the chatbots will be configured to understand structured and unstructured customer queries and provide appropriate answers without involving a human. The chatbot gathers the relevant customer information, queries the backend systems (which can be accomplished using RPA robots), and presents the information to the customer interactively, all without human intervention.
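As a rough illustration of Scanpy's neighbor-graph workflow described above, here is a minimal sketch (the input file is a placeholder; the calls follow Scanpy's documented API, so treat the exact names as an assumption):

```python
import scanpy as sc

# Load a cells-by-genes expression matrix (placeholder file name).
adata = sc.read('gene_expression.h5ad')

# Characterize each cell by its nearest neighbors in expression space
# instead of by thousands of raw per-gene values.
sc.pp.neighbors(adata, n_neighbors=15)

# Louvain community detection on the neighbor graph, the same family of
# algorithms used to find communities in social networks.
sc.tl.louvain(adata)

# Cluster sizes, i.e. candidate cell types.
print(adata.obs['louvain'].value_counts())
```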

12th Feb 2018 – Data Science News Daily Roundup

Packt Editorial Staff
12 Feb 2018
4 min read
DeepMind's IMPALA, Dynamometer open-sourced, VoltDB v8.0, and more in today's top stories around machine learning, deep learning, and data science news.

1. DeepMind introduces IMPALA, an efficient distributed architecture capable of solving many tasks at once

DeepMind has developed a new distributed agent named IMPALA (Importance-Weighted Actor-Learner Architectures) that maximizes data throughput using an efficient distributed architecture built with TensorFlow. IMPALA was developed to tackle the challenging DMLab-30 suite, a set of environments designed using DeepMind Lab's open-source RL environment. These environments let any deep RL researcher test systems on a large spectrum of interesting tasks, either individually or in a multi-task setting. IMPALA is inspired by the popular A3C architecture, which uses multiple distributed actors to learn the agent's parameters. When tested on the DMLab-30 levels, IMPALA was 10 times more data efficient and achieved double the final score compared to distributed A3C. Moreover, IMPALA showed positive transfer when trained in multi-task settings compared to single-task training. To learn more about IMPALA, read the research paper.

2. LinkedIn open-sources Dynamometer, a new tool for testing big-data performance

LinkedIn has open-sourced Dynamometer, a tool focused on stress-testing large Hadoop big-data deployments without using massive amounts of infrastructure. With Dynamometer, IT teams can test production workloads and ensure they will cope with any changes to their Hadoop clusters. It is designed both for those running large-scale Hadoop deployments and for those who propose changes to the core Hadoop project and want to ensure new features don't hurt performance. Visit the GitHub repo for detailed information on LinkedIn's Dynamometer.

3. VoltDB introduces VoltDB v8.0, a translytical database for powering real-time decisions

VoltDB, provider of an enterprise-class translytical database for business-critical applications, announced version 8.0 of its flagship solution. According to Forrester analyst Mike Gualtieri, a translytical database is a "single unified database that supports transaction and analytics in real time without sacrificing transactional integrity, performance, and scale." The new version delivers more predictable long-tail latency based on real-time data and historical intelligence, improving real-time processing and offering self-service analysis. What's new in VoltDB v8.0:

- Improved network security
- User-defined functions
- Common table expressions
- Kafka enhancements
- Python 3 API

For detailed information on v8.0, read the release notes.

4. Amazon adds encryption at rest to the DynamoDB database service

Amazon Web Services has added a new encryption feature to its DynamoDB database service to help secure users' data. DynamoDB, Amazon's NoSQL database service, is designed to store and retrieve unstructured data and is typically used for big-data workloads and analysis. With the update, users can choose to encrypt data stored "at rest," that is, when the data is not being used. The option is not switched on by default, so users must enable it manually when creating a new database table; a sketch follows this roundup. Visit AWS' official post for a detailed read on this topic.

5. Apache Flink Master Branch Monthly: new in Flink in January 2018

The Apache Flink team highlighted a selection of features merged into Flink's master branch during the past month in its "Flink Master Monthly" blog post. The merged features include:

- Improvements to Flink's deployment and process model (FLIP-6)
- Groundwork for task recovery from local state, which speeds up failure recovery
- Improved state backend abstraction
- Network stack changes to improve performance
- Application-level flow control for improved control of checkpointing behavior
- Improved Mesos integration with Docker
- Table API / streaming SQL work
- Ecosystem integrations
- Generated config tables integrated into the documentation
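As a sketch of opting in to the new DynamoDB encryption at rest at table-creation time (using boto3; the table name, key schema, and region are placeholders):

```python
import boto3

dynamodb = boto3.client('dynamodb', region_name='us-east-1')

# Encryption at rest is off by default, so it must be requested
# explicitly when the table is created.
dynamodb.create_table(
    TableName='events',
    AttributeDefinitions=[{'AttributeName': 'event_id', 'AttributeType': 'S'}],
    KeySchema=[{'AttributeName': 'event_id', 'KeyType': 'HASH'}],
    ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5},
    SSESpecification={'Enabled': True},  # server-side encryption at rest
)
```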

9th Feb 2018 – Data Science News Daily Roundup

Packt Editorial Staff
09 Feb 2018
3 min read
PostgreSQL 10.2, 9.6.7, 9.5.11, 9.4.16, and 9.3.21 released; Bokeh 0.12.14 released; Cloudera Altus Analytic DB beta; and more in today's top stories around machine learning, deep learning, and data science news.

1. PostgreSQL 10.2, 9.6.7, 9.5.11, 9.4.16, and 9.3.21 released!

The PostgreSQL Global Development Group has released updates 10.2, 9.6.7, 9.5.11, 9.4.16, and 9.3.21. This release:

- Fixes two security issues
- Fixes issues with VACUUM, GIN indexes, and hash indexes that could lead to data corruption
- Fixes issues with parallel queries and logical replication

Read the detailed release notes on the official website.

2. Bokeh 0.12.14 released

The Bokeh organization announced the incremental release of Bokeh 0.12.14, with two highlights:

- New multi-gesture tools for editing glyphs directly (see the sketch after this roundup)
- An update for compatibility with the upcoming Tornado 5.0

The release also includes bug fixes and documentation improvements. Visit the changelog on GitHub and the official documentation for details.

3. MapR simplifies the end-to-end workflow for data scientists with MapR Expansion Pack (MEP) 4.1

MapR Technologies announced the availability of MapR Expansion Pack (MEP) 4.1, which lets data scientists and engineers build scalable deep learning pipelines and makes operational data instantly available for data science, with over 2x performance improvements across a variety of data discovery and ad hoc queries. MEP 4.1 supports building real-time pipelines and brings data science capabilities to a broader set of users through new language support. The team also added features to MapR-DB, the MapR Data Science Refinery, and Apache Drill 1.12, including:

- MapR Data Science Refinery extends support for distributing Python archives for PySpark, letting data scientists use popular Python data science libraries in a distributed way to create scalable deep learning pipelines.
- MapR Data Science Refinery enables Apache Zeppelin to leverage a diverse set of Python libraries and environments that can be shared and stored in MapR-XD.
- PySpark jobs can read from and write to MapR-DB OJAI directly, making operational data instantly available for data science.
- Python and Java bindings for the MapR-DB OJAI Connector for Apache Spark let developers read from and write to MapR-DB from Spark using Java and Python, so they can build data-intensive business applications in those languages.
- A new version of Apache Drill, 1.12, enables fast data exploration on operational data in MapR-DB and historical data in Parquet.

4. Cloudera Altus Analytic DB beta available

Cloudera announced the beta of its Altus Analytic DB, built on the Cloudera Altus platform-as-a-service foundation; it also supports the Altus Data Engineering service. Cloudera's Altus Analytic DB:

- Maintains a single shared repository of data in open file formats
- Provides multiple clusters over shared data
- Provides fully controlled data security
- Makes it easy to provision a cluster

Read more about each feature in detail on Cloudera's official website.

5. Extract! 4.0, the first fully deep learning-powered resume parsing solution

Textkernel announced Extract! 4.0, the first resume parsing solution powered by deep learning. The software is currently available in English. Matt McNair, VP Global Services at CareerBuilder, Textkernel's parent company, said, "Deep Learning has transformed entire industries including automotive, healthcare, retail and financial services. Today, Textkernel is revolutionizing the HR domain with its launch of Extract! 4.0." For detailed information on Extract! 4.0 and its use of deep learning, visit Textkernel's official website.
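As a sketch of the new glyph-editing tools in Bokeh 0.12.14 (this assumes the PointDrawTool is among the new multi-gesture editing tools; the data values are placeholders):

```python
from bokeh.models import PointDrawTool
from bokeh.plotting import figure, show

p = figure(x_range=(0, 10), y_range=(0, 10), title='Editable points')
renderer = p.circle([1, 5, 9], [1, 5, 9], size=12)

# Attach the editing tool: tap to add points, drag to move them,
# and press backspace to delete the current selection.
draw_tool = PointDrawTool(renderers=[renderer])
p.add_tools(draw_tool)
p.toolbar.active_tap = draw_tool

show(p)
```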

8th Feb 2018 – Data Science News Daily Roundup

Packt Editorial Staff
08 Feb 2018
4 min read
RapidMiner Studio 8.1.0, MySQL Shell 8.0.4, Project Maestro's new release, upgrades to S/4HANA, and more in today's top stories around machine learning, deep learning, and data science news.

1. What's new in RapidMiner Studio 8.1.0

RapidMiner announced the release of Auto Model and RapidMiner 8.1 to accelerate data science. Auto Model is a new workflow for rapid creation, comparison, and exploration of models. The release also adds a powerful global search: users can now search for operators, repository contents, UI actions, and Marketplace content. Other enhancements include:

- New process templates upgraded to use the latest operator versions.
- Read Excel now allows sheet selection by name.
- Read CSV, Read XML, and Read Excel have a new expert parameter to read all values as polynomial, which lets the user disable type guessing.
- Passwords are hidden in the Password Manager dialog and stored with stronger encryption.
- Search Twitter and Get Twitter User Statuses support 280-character tweets.

For other bug fixes and enhancements, read the official documentation.

2. MySQL announces Shell 8.0.4 and the general availability of Oracle Enterprise Manager for MySQL Database

MySQL introduces a new upgrade checker (UC) utility with the latest release of MySQL Shell, aimed at getting 5.7 systems ready for the MySQL 8.0 upgrade. The UC connects to a specified server and runs a series of checks; if any issues are discovered, it displays them along with advice on resolving them. It also prints a summary and returns an integer value describing the severity of the issues found:

- 0 – no issues, or only issues categorized as notices
- 1 – no fatal errors were found, but some potential issues were detected
- 2 – the UC found errors that must be fixed before upgrading to 8.0

A usage sketch follows this roundup. More information is available on the MySQL Server Blog. The MySQL development team has also announced the general availability of Oracle Enterprise Manager for MySQL Database, the official MySQL plugin that provides comprehensive performance, availability, and configuration information for Oracle's enterprise IT management product line and Oracle Enterprise Manager (13c or later). More information on the contents of this release is available in the changelog.

3. Tableau releases Beta 3 of Project Maestro

Tableau's latest Project Maestro release improves data cleaning to get dirty data ready for analysis quickly and accurately. The major changes include:

- Quick text cleaning, which applies common calculations to text fields to change case or remove unwanted characters without writing the calculation manually.
- Fast, visual filters. The new quick filter experience makes it easy to filter ranges of dates and numbers; users can also write a calculation for more complex filtering tasks.
- Easy debugging features to find errors and navigate to them.
- One-click removal of columns or steps.

Full details are available on the official blog.

4. S/4HANA cloud service from SAP gets a major upgrade

SAP has unveiled a major update to its S/4HANA Cloud service, adding more intelligent functionality in machine learning, in-memory analytics, and in-context collaboration. The changes mostly target the finance, procurement, sales, manufacturing, and professional services sectors. The update includes:

- A new improvement-request submission form on the Customer Influence site.
- Automated Payment Advice Processing, powered by machine learning and SAP Leonardo, which helps users turn documents into structured data by automatically extracting payment information from PDF files.
- A Predictive Quotation Conversion Rates calculator to understand probable orders and predicted sales volumes, allowing more accurate forecasting.
- A Release Billing Proposal application for a transparent view of non-billable services in professional services firms.

5. HarperDB launches a database solution for IoT, app developers, and the enterprise

HarperDB has launched an HTAP (hybrid transactional/analytical processing) database solution powered by a data storage algorithm that ingests both unstructured and structured data into a fully indexed, single-model data store. Both NoSQL and SQL capabilities are provided natively in real time, with no increase in storage footprint. The solution is available for IoT and can run on the edge. It is also useful for app developers, letting them spend more time on coding and less on managing a complex database, and it provides a single model for structured and unstructured data in the enterprise.
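A minimal sketch of invoking the upgrade checker from MySQL Shell's Python mode (the snake_case name assumes Shell's usual Python naming convention, and the connection URI is a placeholder):

```python
# Start MySQL Shell, switch to Python mode with \py, then run:
util.check_for_server_upgrade('root@localhost:3306')

# The report lists each failed check with advice on resolving it and
# prints a summary; the overall result corresponds to the 0/1/2
# severity levels described above.
```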

7th Feb 2018 – Data Science News Daily Roundup

Packt Editorial Staff
07 Feb 2018
4 min read
Sisense 7.0, the Distilled IMPACT behavioral analytics model, the Gnocchi 4.2 release, and more in today's top stories around machine learning, deep learning, and data science news.

1. Sisense 7.0 helps non-technical users gain data expertise

Sisense announced the release of Sisense 7.0, which delivers an intuitive, visual, drag-and-drop interface for data preparation that non-technical business users can use to easily find, add, and combine complex data sources. It delivers smart, machine learning-driven recommendations that guide users through data preparation by recommending specific fields to 'mash up' data sources, saving time, unveiling new insights, and reducing the chance of errors. Using advanced machine learning for smart data preparation and visualization field suggestions is a new step toward making analytics accessible to everyone, regardless of technical skill or expertise. For more information, read the detailed coverage.

2. Distilled Analytics releases the Distilled IMPACT behavioral analytics model

Distilled Analytics announced the release of Distilled IMPACT, an approach to quantitatively measuring the non-financial factors associated with for-profit investment, using advanced behavioral analytics supported by artificial intelligence. The Distilled IMPACT platform quantifies non-financial activities with granular, discrete measures, particularly around human factors, enabling asset growth for impact investing through greater trust and transparency. It helps organizations understand impact by analyzing patterns of movement from aggregated and third-party data sources, revealing fundamental insights into human behavior.

3. IBM's Watson Captioning leverages artificial intelligence to automate the closed captioning process

IBM is using artificial intelligence to automate the closed captioning process as part of its latest Watson Captioning service. The new service provides businesses with:

- A scalable solution that saves time and capital
- Streamlined workflows that maximize productivity
- Caption accuracy that increases over time

The new IBM offering provides a seamless user experience via tools including Machine Generated Captions, Embedded Smart Layout, Watson Caption Editor, and Live Captioning.

4. Gnocchi 4.2 released, with added features and performance improvements

Gnocchi 4.2 has been released. Gnocchi is an open-source time series database designed to store large amounts of aggregates while remaining performant, scalable, and fault-tolerant. The features added in Gnocchi 4.2 include:

- Wildcards can be used instead of metric names in the dynamic aggregates API.
- The dynamic aggregates API has a new method called 'rateofchange'.
- A new format for the batch payload allows passing the archive policy description.
- Gnocchi now strictly respects the configured archive policy timespan when storing aggregates.
- A new data type, 'datetime', is available for resource type attributes.
- A new /v1/influxdb endpoint allows ingesting data from InfluxDB clients (only writes are implemented), easing the transition for users of InfluxDB tools such as Telegraf; see the sketch after this roundup.
- Metricd exposes a new option called greedy (true by default) that controls whether eager processing of new measures is enabled.
- The Gnocchi API can act as a Prometheus remote write adapter to receive Prometheus metrics. The endpoint to configure in the Prometheus configuration is https://<gnocchi-host-port>/v1/prometheus/write.
- The deprecated dynamic aggregation (moving average) has been removed.

To learn about these features in detail, visit the official website.

5. Podium Data releases Podium 3.2 to take its data lake catalog to the cloud

Podium Data Inc. brings self-service big data to the cloud with version 3.2 of its Data Marketplace. Data Marketplace is a data catalog used with data lakes to eliminate the extensive extraction and massaging procedures that characterize pure-Hadoop models. Podium promotes the software as providing self-service, on-demand access to quality data. With the 3.2 release, users can combine on-premises and cloud data, the company says. Podium's architecture separates storage from computing so that data from the data delivery teams can support multiple variations of an analytical application from a single store. With version 3.2, sources now include the Amazon Web Services and Microsoft Azure clouds, and assets inside and outside the cloud can be merged and joined. For a detailed understanding of the Data Marketplace, visit the official website.
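A rough sketch of writing through that InfluxDB-compatible endpoint with the standard influxdb Python client; the host, port, and database name here are pure placeholders, and only the /v1/influxdb path comes from the release notes:

```python
from influxdb import InfluxDBClient

# Point an ordinary InfluxDB client at Gnocchi's compatibility endpoint
# (only writes are implemented on the Gnocchi side).
client = InfluxDBClient(host='gnocchi.example.com', port=8041,
                        path='/v1/influxdb', database='telemetry')

client.write_points([
    {'measurement': 'cpu_load',
     'tags': {'host': 'web01'},
     'fields': {'value': 0.64}},
])
```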

6th Feb 2018 – Data Science News Daily Roundup

Packt Editorial Staff
06 Feb 2018
3 min read
TensorFlow 1.6.0-rc, RocksDB 5.10.2, Grafana v5.0, the upcoming release of Spark 2.3, and more in today's top stories around machine learning, deep learning, and data science news.

1. TensorFlow 1.6.0-rc released

The TensorFlow 1.6 release candidate arrives with some breaking changes and other major features and improvements:

- Prebuilt binaries are now built against CUDA 9.0 and cuDNN 7.
- Prebuilt binaries now use AVX instructions (this may break TensorFlow on older CPUs).
- tf.estimator.{FinalExporter,LatestExporter} can now export stripped SavedModels, improving the forward compatibility of the SavedModel format.
- FFT support added to XLA CPU/GPU.

For bug fixes and other changes, visit the GitHub repo.

2. Facebook's RocksDB 5.10.2 released

RocksDB, the high-performance embedded database for key-value data built by Facebook, has released version 5.10.2. The new features include:

- CRC32C now uses the 3-way pipelined SSE algorithm crc32c_3way on supported platforms to improve performance.
- Lifetime hints are provided when writing files on Linux, reducing hardware write amplification on storage devices supporting multiple streams.
- A new DB stat, NUMBER_ITER_SKIP, returns the number of internal keys skipped during iteration.
- PerfContext counters key_lock_wait_count and key_lock_wait_time measure how often, and for how long, transactions wait on key locks.

The complete release notes are available on the official GitHub repo.

3. Grafana v5.0 is out in beta

Grafana, the open platform for analytics and monitoring, is now available in version 5.0 beta. The major new features and enhancements include:

- A new dashboard layout engine with easier drag, drop, and resize behavior and new types of layouts.
- UX and UI improvements in both look and function.
- Dashboard folders for organizing dashboards.
- Permissions on folders and dashboards to help manage larger Grafana installations.
- Data source provisioning, to set up data sources and dashboards via config files.
- Persistent dashboard URLs, making it possible to rename dashboards without breaking links.

The full set of changes can be read in the official documentation.

4. What to expect from the upcoming Apache Spark 2.3 release

Apache Spark 2.3.0 will soon be released and presented in an upcoming live webinar. The expected changes include:

- New DataSource APIs to help developers easily read and write data for continuous processing in Structured Streaming.
- PySpark support for vectorization, giving Python developers the ability to run native Python code fast.
- Improved performance by taking advantage of NVMe SSDs.
- Native Kubernetes support.

5. Ian Goodfellow releases code for SN-GAN and the projection discriminator

Ian Goodfellow, the inventor of GANs, has released the code for SN-GAN and the projection discriminator. Spectral normalization for GANs is a novel weight normalization technique to stabilize the training of the discriminator (a small sketch of the idea follows this roundup). cGANs with a projection discriminator incorporate conditional information into the discriminator in a projection-based way that respects the role of that information in the underlying probabilistic model. The Chainer implementation covers conditional image generation on the ILSVRC2012 (ImageNet) dataset with spectral normalization and a projection discriminator. The full code is available on GitHub.
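The core idea of spectral normalization is to divide each weight matrix by an estimate of its largest singular value, usually obtained with a few steps of power iteration. A minimal NumPy sketch of that idea (not the released Chainer code):

```python
import numpy as np

def spectral_normalize(W, n_power_iterations=5):
    """Scale W by its largest singular value, estimated via power iteration."""
    u = np.random.randn(W.shape[0])
    for _ in range(n_power_iterations):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ (W @ v)  # estimated largest singular value
    return W / sigma

W = np.random.randn(64, 128)
W_sn = spectral_normalize(W)
# The largest singular value of the normalized matrix is close to 1.
print(np.linalg.svd(W_sn, compute_uv=False)[0])
```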

AutoML: Developments and where it is heading

Savia Lobo
05 Feb 2018
6 min read
With the growing demand for ML applications, there is also demand for machine learning tasks such as data preprocessing and hyperparameter optimization to be handled easily by non-experts. These tasks are repetitive and, because of their complexity, were long considered the domain of ML experts. To address this, and to maintain off-the-shelf quality in machine learning methods without requiring expert knowledge, Google started a project named AutoML, an approach that automates the design of ML models. You can also refer to our article on Automated Machine Learning (AutoML) for a clear understanding of how AutoML functions.

Trying AutoML on smaller datasets

AutoML brought new dimensions to machine learning workflows, where repetitive tasks performed by human experts could be taken over by machines. When Google started with AutoML, it applied the approach to two smaller deep learning datasets, CIFAR-10 and Penn Treebank, to test it on image recognition and language modeling tasks respectively. The result: the AutoML approach could design models on par with those designed by ML experts. Moreover, when the designs drafted by humans and by AutoML were compared, the machine-suggested architectures included new elements, later recognized to alleviate vanishing and exploding gradients, suggesting that the machines produced architectures that could be useful for multiple tasks. The machine-designed architecture also has many channels through which gradients can flow backwards, which may help explain why LSTM RNNs work better than standard RNNs.

Trying AutoML on larger datasets

After its success on small datasets, Google tested AutoML on large-scale datasets such as ImageNet and the COCO object detection dataset. Testing AutoML on these was a challenge because of their higher orders of magnitude; applying AutoML directly to ImageNet would require many months of training. To make the approach tractable at this scale, Google made some alterations:

- Redesigning the search space so that AutoML could find the best layer, which can then be stacked many times in a flexible manner to create a final network.
- Carrying out architecture search on the CIFAR-10 dataset and transferring the best learned architecture to ImageNet image classification and COCO object detection.

AutoML thus found two good layers, a normal cell and a reduction cell, which when combined yielded a novel architecture called "NASNet". These work well on CIFAR-10, and also on ImageNet and COCO object detection. NASNet achieved a prediction accuracy of 82.7% on the validation set, according to Google, surpassing all previous Inception models built by the company. Further, the features learned from ImageNet classification were transferred to object detection tasks on the COCO dataset; combined with Faster R-CNN, they produced state-of-the-art predictive performance on COCO object detection in both the largest and mobile-optimized models. Google suspects that the image features learned on ImageNet and COCO can be reused for various other computer vision applications. Hence, Google open-sourced NASNet for inference on image classification and object detection in the Slim and Object Detection TensorFlow repositories.

Towards Cloud AutoML: an automated machine learning platform for everyone

Cloud AutoML is Google's latest offering for its customers, making AI available to everyone. Using Google's advanced techniques, such as learning2learn and transfer learning, Cloud AutoML helps businesses with limited ML expertise start building their own high-quality custom models. Cloud AutoML also benefits AI experts by improving their productivity and letting them explore new fields in AI, and those experts can in turn help less-skilled engineers build powerful systems. Companies such as Disney and Urban Outfitters are using AutoML to make search and shopping on their websites more relevant.

With AutoML moving to the cloud, Google released its first Cloud AutoML product, Cloud AutoML Vision, an image recognition tool for building custom ML models quickly and easily. The tool has a drag-and-drop interface for uploading images, training and managing models, and deploying the trained models directly on Google Cloud. When used to classify popular public datasets such as ImageNet and CIFAR, Cloud AutoML Vision has shown state-of-the-art results, with fewer misclassifications than the generic ML APIs.

Some highlights of Cloud AutoML Vision:

- It is built on Google's leading image recognition approaches, along with transfer learning and neural architecture search technologies, so an accurate model can be expected even with limited ML expertise.
- A simple model can be built in minutes, or a full production-ready model in a day, to pilot an AI-enabled application.
- AutoML Vision has a simple graphical UI for specifying data, which it then turns into a high-quality model customized for specific needs.

Starting with images, Google plans to roll out Cloud AutoML tools and services for text and audio too. However, Google isn't alone in the race: competitors including AWS and Microsoft are also shipping tools, such as Amazon's SageMaker and Microsoft's service for customizing image recognition models, to help developers automate machine learning. Some other automated tools include:

- Auto-sklearn: an automated project that helps users of scikit-learn, the package of common machine learning functions, choose the right estimator. Auto-sklearn includes a generic estimator that analyzes a problem to determine the best algorithm and set of hyperparameters for a given scikit-learn job (a minimal usage sketch appears at the end of this article).
- Auto-WEKA: inspired by Auto-sklearn, for machine learners using Java and the Weka ML package. Auto-WEKA uses a fully automated approach to select a learning algorithm and set its hyperparameters, unlike previous methods that addressed these in isolation.
- H2O Driverless AI: a web-based UI designed for business users who want insights from data without getting into the intricacies of machine learning algorithms. Users choose one or more target variables in the dataset, and the system provides the answer; results come as interactive charts, explained with annotations in plain English.

Currently, Google's AutoML leads the pack, and it will be exciting to see how Google scales an automated ML environment to match traditional ML. Google is not alone: other businesses are also contributing to the movement toward an automated machine learning ecosystem. We have seen several tools join the automation league and can expect more to follow. These tools may also move to the cloud in the future for broader availability to non-experts, much like Google's Cloud AutoML. With machine learning going automated, we can expect more and more systems to move a step closer to widening the scope of AI.
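As promised above, a minimal auto-sklearn sketch (the class and parameter names are taken from the auto-sklearn documentation; the dataset and time budget are arbitrary):

```python
import autosklearn.classification
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Search over estimators and hyperparameters for five minutes,
# then keep the best ensemble found.
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300)
automl.fit(X_train, y_train)
print(automl.score(X_test, y_test))
```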

5th Feb 2018 – Data Science News Daily Roundup

Packt Editorial Staff
05 Feb 2018
3 min read
MySQL Cluster 7.6.4, new features in dbForge SQL Complete v5.8, a new chip linking IoT and blockchain, and more in today's top stories around machine learning, deep learning, and data science news.

1. MySQL Cluster 7.6.4 is out

MySQL Cluster 7.6.4 contains a number of attractive features, including:

- A rewritten local checkpoint algorithm, designed to scale to DataMemory sizes of at least 16 TB.
- Improvements to the MySQL Cluster Configurator (MCC, Auto Installer).
- A new cloud feature for configuring nodes with a LocationDomainId.
- A new ODirectSyncFlag, improving disk write speeds by around 2-3x.
- Changed default restart configuration behavior, leading to a very significant reduction in restore times.
- Improvements to the parallel query implementation (pushdown join, SPJ).
- A parallel UNDO log applier for disk columns.
- Bug fixes.

For a detailed read on the features, visit the blog post.

2. New productivity features in dbForge SQL Complete v5.8

Devart, one of the leading developers of database tools and administration software, announced the release of dbForge SQL Complete v5.8, a useful add-in for Microsoft SQL Server Management Studio and Microsoft Visual Studio. The new release includes productivity features such as:

- Result grid aggregates
- Find in results grid
- Execution warnings
- A CRUD generator, and much more

For detailed information on the new features and improvements, refer to the release notes.

3. A new chip that helps IoT devices communicate with blockchains

Filament, an industrial internet solutions startup, has developed a new Blocklet chip that allows IoT devices to communicate with any blockchain technology. The chip has a very small footprint and low power consumption, and it is secure, containing a robust cryptographic chain-of-custody protocol. It currently supports the Hyperledger Sawtooth blockchain and will shortly be expanded to encompass the Ethereum ledger.

4. Nexus Earth collaborates with SingularityNET to integrate artificial intelligence with blockchain technology

Nexus (NXS) has partnered with SingularityNET (AGI) to explore new technologies together. The partnership could result in a secure, scalable, censorship-resistant blockchain AI infrastructure. SingularityNET is looking to expand its horizons by creating a decentralized AI network based on blockchain, and it plans to explore the use of Nexus' satellite-based alternative internet protocol. For Nexus, the collaboration provides a valuable use case for deploying its 3D Chain architecture and exploring AI applications on layers 1 and 2 of its network.

5. Microsoft forms the Cortana Intelligence Institute to advance AI

Microsoft announced on Thursday the establishment of the Cortana Intelligence Institute, a collaboration with the Royal Melbourne Institute of Technology (RMIT) focused on broadening the capabilities of its virtual assistant. Researchers from RMIT will work with Microsoft personnel to apply AI to tasks that neural networks currently cannot handle. The first task on the agenda is assembling a "multidimensional" user dataset for development purposes: Microsoft aims to gather a wide variety of information, ranging from online activity patterns to location. It is also looking to build new AI models that understand contextual data well enough to interpret and carry out complex user requests involving multiple steps. Read the complete coverage for detailed information on this establishment.

How Deep Neural Networks can improve Speech Recognition and generation

Sugandha Lahoti
02 Feb 2018
7 min read
While watching your favorite movie or TV show, you have probably found it difficult at times to decipher what the characters are saying, especially if they are talking fast or the show is in a language you don't know. You quickly add subtitles and voila, the problem is solved. But do you know how these subtitles work? Instead of a person writing them, a computer automatically recognizes the characters' speech and generates the script. This is just a trivial example of what computers and neural networks can do in the field of speech understanding and generation. Today, we'll talk about what deep neural networks have achieved in improving the ability of our computing systems to understand and generate human speech.

How traditional speech recognition systems work

Traditional speech recognition models used classification algorithms to arrive at a distribution of possible phonemes for each frame, based on highly specialized features such as MFCCs. Hidden Markov Models (HMMs) were used in the decoding phase, together with a pre-trained language model, to find the most likely sequence of phones that could be mapped to output words. With the emergence of deep learning, neural networks were applied to many aspects of speech recognition, such as phoneme classification, isolated word recognition, audio-visual speech recognition, audio-visual speaker recognition, and speaker adaptation.

Deep learning enabled the development of Automatic Speech Recognition (ASR) systems. These ASR systems require separate models: an acoustic model (AM), a pronunciation model (PM), and a language model (LM). The AM is typically trained to recognize context-dependent states or phonemes, bootstrapping from an existing model used for alignment. The PM maps the sequences of phonemes produced by the AM into word sequences. Word sequences are then scored by an LM trained on large amounts of text data, which estimates the probabilities of word sequences. However, training independent components adds complexity and is suboptimal compared to training all components jointly. This motivated the ASR community to develop end-to-end systems, which attempt to learn the separate components of an ASR system jointly, as a single system.

A single-system speech recognition model

End-to-end trained neural networks can essentially recognize speech without an external pronunciation lexicon or a separate language model. End-to-end systems directly map the input acoustic speech signal to word sequences. In such sequence-to-sequence models, the AM, PM, and LM are trained jointly in a single system. Since these models directly predict words, the process of decoding utterances is also greatly simplified, and they do not require bootstrapping from decision trees or time alignments generated by a separate system, making training simpler than for conventional ASR systems.

There are several sequence-to-sequence approaches, including connectionist temporal classification (CTC), the recurrent neural network (RNN) transducer, and attention-based models. CTC models are used to train end-to-end systems that directly predict grapheme sequences; this approach was proposed by Graves et al. as a way of training end-to-end models without requiring a frame-level alignment of the target labels for each training utterance.
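As a small illustration of that alignment-free objective, here is a minimal PyTorch sketch using nn.CTCLoss (the shapes, class count, and label lengths are arbitrary):

```python
import torch
import torch.nn as nn

T, N, C = 50, 4, 29   # time steps, batch size, classes (28 graphemes + blank)
S = 12                # target transcript length

# Per-frame log-probabilities, as an acoustic encoder would emit them.
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(2)
targets = torch.randint(1, C, (N, S), dtype=torch.long)  # grapheme labels (0 = blank)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

# CTC marginalizes over all possible alignments, so no frame-level
# labels are needed for the target transcripts.
ctc_loss = nn.CTCLoss(blank=0)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```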
This basic CTC model was extended by Graves to include a separate recurrent LM component, in a model referred to as the recurrent neural network (RNN) transducer. The RNN transducer augments the encoder network from the CTC architecture with a separate recurrent prediction network over the output symbols. Attention-based models are another family of end-to-end sequence models. They consist of an encoder network, which maps the input acoustics into a higher-level representation, and an attention-based decoder, which predicts the next output symbol conditioned on the previous predictions.

[Figure: a schematic representation of the various sequence-to-sequence modeling approaches]

Google's Listen-Attend-Spell (LAS) end-to-end architecture is one such attention-based model. Its end-to-end system achieves a word error rate (WER) of 5.6%, a 16% relative improvement over a strong conventional system that achieves 6.7% WER. Additionally, the end-to-end model used to output the initial word hypothesis, before any hypothesis rescoring, is 18 times smaller than the conventional model. These sequence-to-sequence models are comparable with traditional approaches on dictation test sets; however, traditional models still outperform end-to-end systems on voice-search test sets. Work continues on better models for voice search, and on multi-dialect and multi-lingual systems, so that data for all dialects and languages can be combined to train one network without the need for a separate AM, PM, and LM for each dialect or language.

Enough with understanding speech. Let's talk about generating it

Text-to-speech (TTS) conversion, i.e., generating natural-sounding speech from text and letting people converse with machines, has been one of the top research goals in recent years. Deep neural networks have greatly improved the overall development of TTS systems, as well as individual pieces of such systems. In 2012, Google first used deep neural networks (DNNs) instead of the Gaussian Mixture Models (GMMs) that were then the core technology behind these systems; DNNs assessed sounds at every instant in time, increasing speech recognition accuracy. Later, better neural network acoustic models were built using CTC and sequence-discriminative training techniques based on RNNs.

Although blazingly fast and accurate, these systems were largely based on concatenative TTS, in which a very large database of short speech fragments is recorded from a single speaker and then recombined to form complete utterances. This led to the development of parametric TTS, where all the information required to generate the audio is stored in the parameters of the model, and the contents and characteristics of the speech are controlled via the inputs to the model. WaveNet further enhanced these parametric models by directly modeling the raw waveform of the audio signal, one sample at a time; it yielded more natural-sounding speech and was able to model any kind of audio, including music. Baidu then came out with its Deep Voice TTS system, constructed entirely from deep neural networks, which could perform audio synthesis in real time, giving up to a 400x speedup over previous WaveNet inference implementations. Google then released Tacotron, an end-to-end generative TTS model that synthesizes speech directly from characters.

Tacotron achieved a 3.82 mean opinion score (MOS), outperforming the traditional parametric system in speech naturalness. It was also considerably faster than sample-level autoregressive methods because it generates speech at the frame level. Most recently, Google has released Tacotron 2, which draws on past work on both Tacotron and WaveNet. It features a Tacotron-style recurrent sequence-to-sequence feature prediction network that generates mel spectrograms, followed by a modified version of WaveNet that generates time-domain waveform samples conditioned on the generated mel spectrogram frames. The model achieved a MOS of 4.53, compared to 4.58 for professionally recorded speech.

Deep neural networks have been a strong force behind the development of end-to-end speech recognition and generation models. Although these end-to-end models compare well against classical approaches, more work remains to be done. As of now, end-to-end speech models cannot process speech in real time, and real-time processing is a strong requirement for latency-sensitive applications such as voice search, so more progress is expected in this area. End-to-end models also fall short of expectations when evaluated on live production data, and they have difficulty learning the proper spellings of rarely used words such as proper nouns, something that is handled easily when a separate PM is used. More effort will be needed to address these challenges as well.

2nd Feb 2018 – Data Science News Daily Roundup

Packt Editorial Staff
02 Feb 2018
3 min read
PdVega, a new library for pandas; Elastic Cloud Enterprise version 1.1.3; Google Analytics' Audiences report; EverString's data platform; and more in today's top stories around machine learning, blockchain, and data science news.

1. Introducing PdVega, a library for creating interactive Vega-Lite plots from pandas

The PdVega library allows quick creation of interactive Vega-Lite plots from pandas DataFrames. Vega-Lite is a visualization specification that lets users declaratively describe which data features should map to which visualization features, using a well-defined JSON schema. PdVega uses an API nearly identical to pandas' built-in plotting API and is designed for easy use within the Jupyter notebook; the resulting plots are attractive, dynamic data visualizations with a minimum of boilerplate (see the sketch after this roundup). More information is available in the official documentation.

2. Elastic Cloud Enterprise version 1.1.3 released

Elastic Cloud Enterprise (ECE) 1.1.3 has been released with an important bug fix to support Elasticsearch 6.1.x deployments. The release adds support for Elasticsearch and Kibana 6.1.3 by fixing a potential data loss bug triggered by cluster configuration changes (meaning any cluster configuration change, such as an upgrade or the addition of capacity). Other bug fixes include:

- For stack versions 6.1.0 and above, Kibana now navigates to the home page when there is no data.
- A check now ensures that clusters are reallocated only after data has been successfully migrated.
- An internal configuration flag set when starting ZooKeeper corrects a failure during ECE upgrades.

Other minor bug fixes and changes can be found in the release notes.

3. Google Analytics rolls out a new 'Audiences' report for analyzing a website's custom audiences

Google Analytics has introduced a new report called Audiences, which analyzes a website's custom audiences. The new Audience dimension can be used in segments and custom reports. With the Audiences report, users can view how their audiences are performing and evaluate their remarketing efforts. The report can display the following metrics:

- Acquisition: the volume of users an audience is sending, and how well the audience works to generate potential new business.
- Behavior: how well a site engages a particular audience, based on bounce rate, pages per session, and time on site.
- Conversions: how well an audience is performing in terms of goal completions and transactions.

4. Hortonworks updates its streaming analytics platform for better data flow management

Hortonworks has released updates to its Hortonworks DataFlow (HDF) streaming analytics platform. It can now share and publish data flows directly to production, with improved support for complex processes. According to Scott Gnau, Hortonworks' CTO, "The new release will be particularly useful for companies in regulated environments that need to rigorously document and govern their data." HDF can now also be integrated with the Apache Atlas data governance and metadata framework, Hortonworks' SmartSense problem resolution and optimization software, and the Apache Knox authentication gateway. The new release is available as of February 1, 2018, and all Hortonworks enhancements have been contributed back to their respective open-source projects.

5. EverString announces an ML-powered data platform for B2B marketing firms

EverString announced the launch of a new data platform powered by machine learning to provide sales and marketing teams with company intelligence. The platform combines machine learning and AI to keep contact, firmographic, technographic, and intent insights up to date in real time. It can automatically identify problematic data and apply machine learning to improve system-wide accuracy. With this platform, B2B companies can prioritize their pipeline to focus time and resources on high-value prospects and maintain their growing databases with accurate data on relevant prospects.
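A minimal PdVega sketch; the .vgplot accessor mirrors pandas' .plot API (the data here is made up, and the chart renders inside a Jupyter notebook):

```python
import pandas as pd
import pdvega  # importing registers the .vgplot accessor on DataFrames

df = pd.DataFrame({'x': list(range(10)),
                   'y': [v ** 2 for v in range(10)]})
df.vgplot.line(x='x', y='y')  # interactive Vega-Lite line chart
```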

1st Feb 2018 – Data Science News Daily Roundup

Packt Editorial Staff
01 Feb 2018
4 min read
OpenAI’s seven unsolved problems, NVIDIA’s fastest GPU integrated into IBM Cloud, open-sourcing Psychlab, InterSystems IRIS data platform generally available, and more in today’s top stories around machine learning, deep learning, and data science news.

1. OpenAI releases a new batch of seven unsolved problems

OpenAI has released a new batch of seven unsolved problems that came up during the course of its research. These problems offer a meaningful way for new people to enter the field, as well as for practitioners to hone their skills. They are also a great way for people to get a job at OpenAI. The unsolved problems include:

- Implement and solve a multiplayer clone of the classic Snake game as a Gym environment (one can refer to slither.io for inspiration; see the environment skeleton sketched after this item).
- Explore the effect of parameter averaging schemes on sample complexity and amount of communication in RL algorithms.
- Transfer Learning Between Different Games via Generative Models.
- Use linear attention for the Transformer model (which uses soft attention with softmax) in order to use the resulting model for RL.
- Use a learned VAE of data to perform “learned data augmentation”.
- Experimentally investigate (and qualitatively explain) the effect of different regularization methods on an RL algorithm of choice.

Excited? Have a detailed read on the OpenAI blog.
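As a taste of what the first problem involves, here is a minimal, hypothetical skeleton of such an environment using the classic gym.Env interface of the era (reset returns an observation; step returns observation, reward, done, info). SnakeEnv, its grid observation, and the elided game logic are all illustrative assumptions, not OpenAI's reference solution:

```python
import gym
import numpy as np
from gym import spaces

class SnakeEnv(gym.Env):
    """Hypothetical skeleton for a multiplayer Snake clone as a Gym environment."""

    def __init__(self, size=16):
        self.size = size
        self.action_space = spaces.Discrete(4)  # up, down, left, right
        # The board exposed as an image-like grid observation
        self.observation_space = spaces.Box(
            low=0, high=255, shape=(size, size, 3), dtype=np.uint8)
        self.board = np.zeros((size, size, 3), dtype=np.uint8)

    def reset(self):
        self.board[:] = 0
        # ... place the snakes and the food here ...
        return self.board

    def step(self, action):
        # ... move the snake, handle food, collisions, and opponents ...
        reward, done = 0.0, False
        return self.board, reward, done, {}
```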
2. DeepMind open-sources Psychlab

DeepMind has open-sourced Psychlab, a platform built on top of DeepMind Lab, for others to use. Psychlab allows the direct application of methods from fields like cognitive psychology to study the behaviour of artificial agents in a controlled environment. Along with open-sourcing Psychlab, the DeepMind team has also built a series of classic experimental tasks to run on the virtual computer monitor, with a flexible and easy-to-learn API that enables others to build their own tasks. Read more about Psychlab and the added tasks on DeepMind’s blog.

3. NVIDIA integrates its fastest GPU accelerator with IBM Cloud, boosting AI and HPC workloads

IBM has announced the availability of the NVIDIA Tesla V100 GPU on its cloud, which aims to accelerate enterprise efforts in mission-critical artificial intelligence (AI), deep learning, and HPC workloads. The V100 is NVIDIA's fastest and most advanced GPU accelerator on the market, says John Considine, general manager of cloud infrastructure services for IBM Watson and Cloud Platform. Users can now configure individual IBM Cloud bare metal servers with up to two NVIDIA Tesla V100 PCIe GPU accelerators. This combination of IBM's high-speed network connectivity and bare metal servers with V100 GPUs should provide a major boost to compute-intensive workloads. In a blog post, Considine said, "With the Tesla P100 GPU accelerator, you can leverage up to 65 percent more deep learning capabilities and 50 times the performance of its predecessor". For details, visit IBM’s blog.

4. DeepMind’s new research paper on achieving symbolic generalisation in deep neural networks

DeepMind has published a new paper in the Journal of Artificial Intelligence Research (JAIR) showing how deep neural networks can be extended to generalise both visually and symbolically. The paper proposes a Differentiable Inductive Logic framework that can solve the tasks traditional Inductive Logic Programming (ILP) systems are suited for, while also showing a robustness to noise and error in the training data that ILP cannot cope with. Further, as it is trained by backpropagation, it can be hybridised by connecting it with neural networks over ambiguous data. This provides data efficiency and generalisation beyond what neural networks on their own can achieve. Read the detailed information on DeepMind’s blog. You can also read the research paper here.

5. InterSystems IRIS data platform is now generally available

InterSystems announced the general availability of the InterSystems IRIS Data Platform, the first data platform to deliver multi-workload and multi-model data management, native interoperability, and an open analytics platform in a single product. InterSystems IRIS is a complete, unified data platform that makes it faster and easier to build real-time, data-rich applications. It allows organizations to combine event and transactional data with large sets of historical and other data, both to capture untapped business opportunities and to improve operational efficiency. The InterSystems IRIS Data Platform aims:

- To deliver concurrent transactional and analytic processing, and multiple data representations (including relational and non-relational models, which are always synchronized) in a single database;
- To provide a complete set of interoperability capabilities for integrating disparate data and applications, and to create seamless real-time business processes;
- To include business intelligence and natural language processing capabilities, and an open analytics platform that allows best-of-breed, third-party analytics to be easily incorporated through dedicated connectors and industry standards;
- To support flexible deployment options for public and private cloud, on-premises, and hybrid environments.

Have a detailed read at the official press release.

31st Jan 2018 – Data Science News Daily Roundup

Packt Editorial Staff
31 Jan 2018
4 min read
Hyperledger Sawtooth 1.0, an implementation of AlphaGo Zero, enhancements in PostgreSQL 11, SAS AI offerings, and more in today’s top stories around machine learning, blockchain, and data science news.

1. Hyperledger Sawtooth 1.0, the second blockchain framework from Hyperledger, is now production ready

Hyperledger announced the availability of its second blockchain framework, Hyperledger Sawtooth 1.0. It is the latest open-source digital ledger project, after Hyperledger Fabric, which reached version 1.0 in July 2017. Sawtooth 1.0 is equipped with several new enterprise features:

- On-chain governance: users can utilize smart contracts to vote on blockchain configuration settings, such as the allowed participants and smart contracts.
- Advanced transaction execution engine: the engine can process transactions in parallel to accelerate block creation and validation.
- Support for Ethereum: Sawtooth runs Solidity smart contracts and allows integration with Ethereum tooling.
- Dynamic consensus: users can upgrade or swap the blockchain consensus protocol on the fly, enabling the integration of more scalable algorithms as they become available.

Interested users can download the code here. They can also read the official documentation here.

2. Minigo: an open-source implementation of the AlphaGo Zero algorithm

Minigo is a pure Python implementation of a neural-network-based Go AI using TensorFlow, inspired by DeepMind's AlphaGo algorithm. Minigo is based on Brian Lee's MuGo, a pure Python implementation of the first AlphaGo paper. The project provides a clear set of learning examples using TensorFlow, Kubernetes, and Google Cloud Platform for establishing reinforcement learning pipelines on various hardware accelerators. It reproduces the methods of the original DeepMind AlphaGo papers through an open-source implementation and open-source pipeline tools, and aims to share its data, results, and discoveries for the benefit of the Go, machine learning, and Kubernetes communities. More information is available at the official GitHub repo.

3. PostgreSQL 11 plans to add enhancements to partitioning and indexes

PostgreSQL 11 is due to be released this year, and the team plans to add several enhancements to partitioning and indexes. The overall idea is to allow partitioned tables to have referential integrity, by way of primary keys and foreign keys, and some additional tweaks can be expected. Foreign keys (FKs) are implemented using row triggers, so supporting triggers on partitioned tables would allow FKs to be enforced there; primary keys are implemented using unique indexes, so supporting indexes on partitioned tables would allow uniqueness to be enforced. The features, in the order in which they have to be implemented, are:

1. Create index on partitioned tables
2. Allow unique indexes on partitioned tables
3. Create triggers on partitioned tables
4. Allow FKs on partitioned tables

A sketch of the planned DDL follows this item. For a detailed read on this news, visit the website.
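Here is a short sketch of how the planned features would look in practice, driven from Python with psycopg2. The connection string and the measurements table are hypothetical, and the parent-level index statements assume the syntax as planned for PostgreSQL 11:

```python
import psycopg2  # assumes a reachable PostgreSQL 11 development instance

conn = psycopg2.connect("dbname=demo")  # hypothetical connection string
cur = conn.cursor()

# A declaratively partitioned table (declarative partitioning shipped in PostgreSQL 10)
cur.execute("""
    CREATE TABLE measurements (
        city_id  int  NOT NULL,
        logdate  date NOT NULL,
        reading  int
    ) PARTITION BY RANGE (logdate);
""")
cur.execute("""
    CREATE TABLE measurements_2018 PARTITION OF measurements
        FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');
""")

# Planned for PostgreSQL 11: an index created on the partitioned parent
# cascades to every partition...
cur.execute("CREATE INDEX ON measurements (logdate);")
# ...and a unique index (which must include the partition key) becomes
# possible, which is the building block for primary keys.
cur.execute("CREATE UNIQUE INDEX ON measurements (city_id, logdate);")
conn.commit()
```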
4. SAS launches new AI offerings for text analytics, data mining, and machine learning

SAS, an analytics software firm, has released a variety of new offerings for its SAS Viya platform, including SAS Visual Text Analytics and significant enhancements to SAS Visual Data Mining and Machine Learning. SAS Visual Text Analytics is a modern, flexible framework that can perform text mining, contextual extraction, categorization, sentiment analysis, and search. It extracts value from unstructured data using NLP, machine learning, and linguistic rules, and allows users to prepare data for analysis, visually explore topics, build text models, and deploy them within existing systems or business processes. SAS Visual Data Mining and Machine Learning, meanwhile, now offers an end-to-end visual environment for data access, data wrangling, sophisticated model building, and deployment, with in-memory, distributed processing to solve critical business queries. It also supports programming in popular open source languages such as Python and R.

5. Cisco advances its intent-based networking with new analytics services

Cisco has introduced three new analytics tools to advance its intent-based networking services. These assurance products span the entire networking portfolio:

- Network Assurance Engine continually verifies network health and uses models to pinpoint issues. It applies continuous verification of the entire network to help keep a business running as intended, even as the network changes dynamically. Cisco's ACI and Tetration connect to the Network Assurance Engine to link network and application monitoring.
- DNA Center Assurance connects user and application behavior to make predictions. It provides problem isolation so IT teams can find a root cause quickly, replicate problems, and get guided remediation.
- Meraki Network Health is a cloud IT management tool that automates network and IT operations, finds poorly performing access points, and provides insights to improve service.

30th Jan 2018 – Data Science News Daily Roundup

Packt Editorial Staff
30 Jan 2018
4 min read
Microsoft releases new data science tools, Tensorflow publishes an implementation of SPINN, MachineLabs now supports private labs, and more in today’s top stories around machine learning, deep learning, and data science news.

1. Microsoft releases Data Science Tools for interactive data exploration and data modeling

Microsoft has introduced an early preview release of the Data Science Utilities developed by the Team Data Science Process (TDSP). The Data Science Utilities are published in a GitHub repository and include:

- Interactive Data Exploration, Analysis, and Reporting (IDEAR), in R, MRS, and Python: tools developed for data scientists to interactively explore, visualize, and analyze data sets before building models. The Python version of IDEAR is delivered through Jupyter notebooks and runs in any Jupyter Notebook server or notebook service with a Python 2.7 or 3.5 kernel, as long as the required Python libraries are installed on the notebook server.
- Automated Modeling and Reporting in R (AMAR in R): a tool that creates an automated workflow for generating and comparing multiple modeling approaches on a dataset.

One can easily run these utilities on the sample data in the Data/Common directory. To read more, visit the GitHub repo.

2. Tensorflow publishes an implementation of SPINN written with eager execution

Tensorflow recently published an implementation of SPINN written with eager execution. The Stack-Augmented Parser-Interpreter Neural Network (SPINN) is a recursive neural network that utilizes syntactic parse information for natural language understanding. It was originally described in the paper "A Fast Unified Model for Parsing and Sentence Understanding". The TensorFlow implementation is based on James Bradbury's PyTorch implementation. It includes model definition and training routines, a pipeline for loading and preprocessing the SNLI data and GloVe word embeddings written using the tf.data API, saving and loading of checkpoints, TensorBoard summaries for monitoring and visualization, and more (a small eager-execution sketch follows this item). More information can be found at the GitHub repo.
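Eager execution is a natural fit for SPINN because the network's structure follows each sentence's parse tree, which is data-dependent control flow. A minimal sketch follows, assuming the TensorFlow 1.5-era preview in which eager execution lived under tf.contrib.eager; the reduce_pair stand-in is illustrative, not SPINN's actual TreeLSTM reduce step:

```python
import tensorflow as tf
import tensorflow.contrib.eager as tfe  # eager preview location in the TF 1.5 era

tfe.enable_eager_execution()  # call once, at program startup

def reduce_pair(left, right):
    # Illustrative stand-in for SPINN's learned reduce (TreeLSTM) step
    return (left + right) / 2.0

# Ops execute immediately and return concrete values -- no graph or Session --
# so walking a parse-dependent stack is just ordinary Python control flow
stack = [tf.constant([1.0, 2.0]), tf.constant([3.0, 4.0]), tf.constant([5.0, 6.0])]
while len(stack) > 1:
    right, left = stack.pop(), stack.pop()
    stack.append(reduce_pair(left, right))
print(stack[0])
```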
3. MachineLabs now supports private labs

MachineLabs has announced support for Private Labs. MachineLabs is an open platform for sharing machine learning experiments: labs can be viewed by and shared with everyone, even via a browser. Users who want to work in secrecy, use the platform for a company's internal tasks, or simply avoid cluttering the public labs with trial-and-error runs can now use private labs. Making a lab private is as easy as setting the private flag in the lab's settings. Once private, the lab, including its executions, is visible only to its owner. All public and private labs appear on the owner's profile page, with private labs recognizable by a little "private" badge. Know more about Private Labs in MachineLabs' blog post.

4. Aureum 5.3 to power predictive analytics for a data-driven industrial world

Peaxy announced the release of Aureum 5.3, a data access solution that provides a foundation for industrial digital twins and predictive analytics. Manuel Terranova, CEO of Peaxy, says, "Aureum has evolved since 2012 from an advanced distributed data platform to an incredibly useful infrastructure component in complex analytical solutions and predictive applications. Our team of engineers are experts in supporting predictive analytics solutions to difficult industrial problems at enterprise scale." Aureum 5.3 is being used by Fortune 100 companies in the aviation, power generation, and oil & gas industries as an essential data staging area for analytics that solve real-world business problems. Know more on the website.

5. Dodge Data & Analytics launches Dodge Construction Central

Dodge Data & Analytics announced the launch of Dodge Construction Central, a single unified hub where all construction industry and project stakeholders can discover, share, and access new and unique insights from across the entire construction ecosystem, along the full project lifecycle, to make timely, data-driven decisions. Dodge Construction Central delivers deep intelligence to project stakeholders from a comprehensive industry data cloud. It empowers stakeholders to collaborate with project teams and integrate insights directly into their business processes by leveraging artificial intelligence, advanced analytics, collaboration, and workflow-automation technologies. To learn about the new capabilities offered by Dodge Construction Central, you can visit this website.

29th Jan 2018 – Data Science News Daily Roundup

Packt Editorial Staff
29 Jan 2018
4 min read
Tensorflow 1.5.0, DataNucleus AccessPlatform 5.1.6, Databricks comes to Microsoft Azure, Citus 7.2, and more in today’s top stories around machine learning, deep learning, and data science news.

1. Tensorflow 1.5.0 now generally available with a preview of TensorFlow Lite

Tensorflow 1.5.0 is now generally available; the 1.5 release candidate was previously announced on 4th January 2018. TensorFlow 1.5 brings many new features and changes. The breaking changes revolve around prebuilt binaries: they are now built against CUDA 9 and cuDNN 7, and starting from the 1.6 release, prebuilt binaries will use AVX instructions. Other major features and improvements include:

- A preview version of eager execution.
- A dev preview of TensorFlow Lite.
- Accelerated Linear Algebra (XLA) related changes.
- The addition of streaming_precision_recall_at_equal_thresholds, a method for computing streaming precision and recall with O(num_thresholds + size of predictions) time and space complexity.
- RunConfig's default behavior no longer sets a random seed, making random behavior independently random on distributed workers.
- The implementation of tf.flags replaced with absl.flags.
- Support for CUBLAS_TENSOR_OP_MATH in fp16 GEMM.
- Support for CUDA on NVIDIA Tegra devices.

For details on bug fixes and other changes, see the full release notes here.

2. A new version of DataNucleus AccessPlatform is now available

DataNucleus AccessPlatform is Apache 2 licensed and provides persistence and retrieval of Java objects to a range of datastores using the JDO/JPA/REST APIs, with a range of query languages. Version 5.1.6 has now been released with a number of enhancements and bug fixes:

- Fixed ClassUtils.getConstructorWithArguments not allowing the type check of one of the arguments to be skipped.
- Added support for queries with "IS NULL" / "IS NOT NULL".
- Added support for String toUpperCase/toLowerCase/trim/trimLeft/trimRight/substring in JDOQL/JPQL.
- Added support for numeric cos/sin/tan/acos/asin/atan/toDegrees/toRadians in JDOQL/JPQL.
- Fixed being unable to execute an UPDATE JPQL query against a domain class that contains 'Set' in its name.
- Fixed lists that might appear empty while they are actually not (forEach).
- Fixed retrieval code not handling primitive retrieval when the value does not exist in the database.
- Fixed the inequality filter method .ne() giving a QueryExecutionException.
- Fixed a query whose candidate is the base of an inheritance tree using the "complete-table" strategy failing when overriding the "id" column name.
- Fixed @Basic @Lob ArrayList<byte[]> entity fields resulting in an erroneous metamodel.
- Fixed @Basic @Lob Serializable entity fields resulting in an erroneous metamodel.

The entire changelog can be found in the release notes.

3. Databricks integrates with Microsoft Azure

Until now, Databricks' services were available as a single cloud offering based on the Amazon Web Services (AWS) cloud. As of 27th January 2018, a new flavor of the Apache Spark service is available: Azure Databricks (ADB), which is based on and tightly integrated with Microsoft Azure. The Apache Spark-based analytics platform is optimized for the Microsoft Azure cloud services platform and provides one-click setup, streamlined workflows, and an interactive workspace that lets data scientists, data engineers, and business analysts collaborate. This new service is a first-party offering from Microsoft. It consists of three major parts: a notebook-based collaborative workspace, the Databricks Runtime, and a serverless compute model. ADB has direct support for Azure Blob Storage and Azure Data Lake Store, and it also integrates with Cosmos DB and Azure Active Directory. More information is available here.

4. A new version of Citus (7.2), the distributed database, is now released

Citus has announced version 7.2 of its distributed database, which extends distributed SQL support for queries that run on data spread across a cluster of machines. A quick overview of the changes for distributed queries in Citus 7.2 (a small example follows this item):

- Common Table Expressions (CTEs).
- Complex subqueries.
- Set operations (UNION, INTERSECT, etc.).
- Joins between distributed and local tables through CTEs.
- Joins that include non-equality clauses.
- Partition management automation with pg_partman.

Citus 7.2 is compatible with PostgreSQL 9.6 and 10. It can be downloaded by following the instructions here, or deployed in a single click through the Citus Cloud console. To learn more about the new features, visit the official blog.
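Below is a small sketch of the headline feature, a CTE over a distributed table, using plain psycopg2 against a hypothetical Citus 7.2 coordinator; the events table and connection string are illustrative, while create_distributed_table is Citus's standard sharding call:

```python
import psycopg2  # Citus is a PostgreSQL extension, so the stock driver works

conn = psycopg2.connect("dbname=demo")  # hypothetical Citus coordinator
cur = conn.cursor()

# Illustrative table, sharded across the cluster on tenant_id
cur.execute("CREATE TABLE events (tenant_id int, created_at timestamptz, payload jsonb);")
cur.execute("SELECT create_distributed_table('events', 'tenant_id');")

# New in Citus 7.2: CTEs are supported in queries over distributed tables
cur.execute("""
    WITH daily AS (
        SELECT tenant_id, date_trunc('day', created_at) AS day, count(*) AS n
        FROM events
        GROUP BY 1, 2
    )
    SELECT day, avg(n) FROM daily GROUP BY day ORDER BY day;
""")
print(cur.fetchall())
```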
5. CapLinked announces TransitNet, a new blockchain framework

CapLinked has launched TransitNet, a new blockchain framework to protect and record enterprise transactions. TransitNet is a decentralized protocol that protects digital assets and permanently records data access. The protocol is accessible via an API and is used to apply protections and activity tracking for information exchanged during business deals. TransitNet adds security to the transfer of funds, as opposed to Ripple and other decentralized technologies that address payments. TransitNet's decentralized application will allow users to apply protections and track their digital assets when they are transferred to third parties: they will be able to encrypt, watermark, and set access parameters for digital assets being moved, and track their movement on an immutable decentralized ledger.