Tech News - Data

PyTorch 1.0 is here with JIT, C++ API, and new distributed packages

Natasha Mathur
10 Dec 2018
4 min read
It was just two months ago that Facebook announced the release of PyTorch 1.0 RC1. Facebook is now out with the stable release of PyTorch 1.0. The latest release, announced last week at the NeurIPS conference, brings new features such as JIT, a brand new distributed package, and Torch Hub, along with breaking changes, bug fixes, and other improvements.

PyTorch is an open source, Python-based deep learning framework. "It accelerates the workflow involved in taking AI from research prototyping to production deployment, and makes it easier and more accessible to get started", reads the announcement page. Let's now have a look at what's new in PyTorch 1.0.

New Features

JIT

JIT is a set of compiler tools capable of bridging the gap between research in PyTorch and production. It enables the creation of models that can run without any dependency on the Python interpreter. PyTorch 1.0 offers two ways to make your existing code compatible with the JIT: tracing with torch.jit.trace or annotating with torch.jit.script. Once the models have been annotated, Torch Script code can be optimized and serialized for later use in the new C++ API, which doesn't depend on Python.

Brand new distributed package

In PyTorch 1.0, the new torch.distributed package and torch.nn.parallel.DistributedDataParallel come backed by a brand new, redesigned distributed library. Major highlights of the new library are as follows:

- The new torch.distributed is performance driven and operates entirely asynchronously for all backends: Gloo, NCCL, and MPI.
- There are significant DistributedDataParallel performance improvements for hosts with slower networks, such as Ethernet-based hosts.
- It adds async support for all distributed collective operations in the torch.distributed package.

C++ frontend [API unstable]

The C++ frontend is a complete C++ interface to the PyTorch backend. It follows the API and architecture of the established Python frontend and is meant to enable research in high-performance, low-latency, bare-metal C++ applications. It also offers equivalents to torch.nn, torch.optim, torch.data, and other components of the Python frontend. The PyTorch team has released the C++ frontend marked as "API Unstable" as part of PyTorch 1.0: although it is ready to use for research applications, it still needs to be stabilized over future releases.

Torch Hub

Torch Hub is a pre-trained model repository designed to facilitate research reproducibility. It supports publishing pre-trained models (model definitions and pre-trained weights) to a GitHub repository with the help of a hubconf.py file. Once published, users can load the pre-trained models with the torch.hub.load API.

Breaking Changes

- Indexing a 0-dimensional tensor now raises an error instead of a warning.
- torch.legacy has been removed.
- torch.masked_copy_ has been removed; use torch.masked_scatter_ instead.
- torch.distributed: the TCP backend has been removed. It is recommended to use the Gloo and MPI backends for CPU collectives and the NCCL backend for GPU collectives.
- The torch.tensor function with a Tensor argument can now return a detached Tensor (i.e. a Tensor where grad_fn is None).
- torch.nn.functional.multilabel_soft_margin_loss now returns Tensors of shape (N,) instead of (N, C). This matches the behaviour of torch.nn.MultiMarginLoss and is also more numerically stable.
- Support for C extensions has been removed in PyTorch 1.0.
- torch.utils.trainer has been deprecated.

Bug Fixes

- torch.multiprocessing now correctly handles CUDA tensors, requires_grad settings, and hooks.
- A memory leak during packing of tuples has been fixed.
- The "RuntimeError: storages that don't support slicing" error when loading models saved with PyTorch 0.3 has been fixed.
- Incorrectly calculated output sizes of torch.nn.Conv modules with stride and dilation have been fixed.
- torch.dist has been fixed for infinity, zero, and minus infinity norms.
- torch.nn.InstanceNorm1d now correctly accepts 2-dimensional inputs.
- The incorrect error message shown by torch.nn.Module.load_state_dict has been fixed.
- A broadcasting bug in torch.distributions.studentT.StudentT has been fixed.

Other Changes

- "Advanced Indexing" performance has been considerably improved on both CPU and GPU.
- torch.nn.PReLU speed has been improved on both CPU and GPU.
- Printing large tensors has become faster.
- N-dimensional empty tensors have been added, allowing tensors with 0 elements to have an arbitrary number of dimensions. They also support indexing and other torch operations.

For more information, check out the official release notes.

Related reading:
- Can a production-ready PyTorch 1.0 give TensorFlow a tough time?
- Pytorch.org revamps for PyTorch 1.0 with design changes and added static graph support
- What is PyTorch and how does it work?
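To make the two routes into the JIT concrete, here is a minimal sketch (the module, shapes, and file name are invented for illustration; this is not code from the release notes):

import torch

# Tracing: run an example input through an existing nn.Module and record the ops.
class TwoLayerNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(4, 8)
        self.fc2 = torch.nn.Linear(8, 2)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

traced = torch.jit.trace(TwoLayerNet(), torch.randn(1, 4))

# Scripting: annotate a function so data-dependent control flow is preserved,
# which tracing alone could not capture faithfully.
@torch.jit.script
def halve_until_small(x):
    while bool(x.norm() > 1.0):
        x = x / 2.0
    return x

# Serialized Torch Script modules can later be loaded from the C++ API.
traced.save("two_layer_net.pt")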

Netflix open sources Polynote, an IDE-like polyglot notebook with Scala support, Apache Spark integration, multi-language interoperability, and more

Vincy Davis
31 Oct 2019
4 min read
Last week, Netflix announced the open source launch of Polynote, a polyglot notebook. It comes with full-scale Scala support, Apache Spark integration, multi-language interoperability (including Scala, Python, and SQL), and IDE-like features such as interactive autocomplete, a rich text editor with LaTeX support, and more.

Polynote provides a seamless integration of Netflix's Scala-based, JVM-backed ML platform with Python's machine learning and visualization libraries. It is currently used by Netflix's personalization and recommendation teams and is also being integrated with the rest of the Netflix research platform. The Netflix team says, "Polynote originated from a frustration with the shortcomings of existing notebook tools, especially with respect to their support of Scala." Also, "we found that our users were also frustrated with the code editing experience within notebooks, especially those accustomed to using IntelliJ IDEA or Eclipse."

Key features supported by Polynote

Reproducibility

A traditional notebook generally relies on a read-eval-print loop (REPL) environment to build an interactive session. According to Netflix, the expressions and results of a REPL evaluation are quite rigid, so Netflix built Polynote's code interpretation from scratch instead of relying on a REPL. This lets Polynote keep track of the variables defined in each cell by constructing the input state for a given cell from the cells that have run above it. By making the position of a cell important to its execution semantics, Polynote encourages users to read the notebook from top to bottom, which improves reproducibility by increasing the chances that the notebook can be run sequentially.

Editing Improvements

Polynote provides editing enhancements such as:

- Code editing integrated with the Monaco editor for interactive auto-complete.
- Errors highlighted inline to help users rectify them quickly.
- A rich text editor for text cells that lets users easily insert LaTeX equations.

Visibility

One of the major guiding principles of Polynote is visibility. It enables a live view of what the kernel is doing at any given time, without requiring logs. A single glance at the user interface conveys a lot of information:

- The notebook view and task list display the currently running cell as well as the queue of cells still to be run.
- The exact statement running in the system is highlighted in colour.
- Job- and stage-level Spark progress information is shown in the task list.
- The kernel status area provides information about the execution status of the kernel.

Polyglot

Currently, Polynote supports Scala, Python, and SQL cell types and lets users move seamlessly from one language to another within the same notebook. When a cell runs, the kernel hands the typed input values over to the cell's language interpreter, and the interpreter returns the resulting typed output values back to the kernel. This allows a cell in a Polynote notebook to run with the same context and the same shared state regardless of its language.

Dependency and Configuration Management

To further ease reproducibility, Polynote stores configuration and dependency setup within the notebook itself. It also provides a user-friendly Configuration section where users can set dependencies for each notebook. This allows Polynote to fetch the dependencies locally and load the Scala dependencies into an isolated ClassLoader, which reduces the chance of a class conflict between Polynote and the Spark libraries. When Polynote is used in Spark mode, it creates a Spark Session for the notebook, with the Python and Scala dependencies automatically added to the Spark Session.

Data Visualization

One of the most important use cases of a notebook is the ability to explore and visualize data. Polynote integrates with two open source visualization libraries, Vega and Matplotlib. It also has native support for data exploration, including a data schema view, a table inspector, and a plot constructor, which helps users learn about their data without cluttering their notebooks.

Users have appreciated Netflix's effort to open source Polynote and have liked its features:

https://twitter.com/SpirosMargaris/status/1187164558382845952
https://twitter.com/suzatweet/status/1187531789763399682
https://twitter.com/julianharris/status/1188013908587626497

Visit the Netflix Techblog for more information on Polynote. You can also check out the Polynote website for more details.

Related reading:
- Netflix security engineers report several TCP networking vulnerabilities in FreeBSD and Linux kernels
- Netflix adopts Spring Boot as its core Java framework
- Netflix's culture is too transparent to be functional, reports the WSJ
- Linux foundation introduces strict telemetry data collection and usage policy for all its projects
- Fedora 31 releases with performance improvements, dropping support for 32 bit and Docker package

Baidu announces ClariNet, a neural network for text-to-speech synthesis

Sugandha Lahoti
23 Jul 2018
2 min read
Text-to-speech synthesis has been a booming research area, with Google, Facebook, DeepMind, and other tech giants showcasing their research and trying to build better TTS models. Now Baidu has stolen the show with ClariNet, the first fully end-to-end TTS model, which converts text directly to a speech waveform in a single neural network.

Classical TTS models such as DeepMind's WaveNet usually have separate text-to-spectrogram and waveform synthesis models, and having two models may result in suboptimal performance. ClariNet combines the two into one fully convolutional neural network. Not only that, Baidu claims its text-to-wave model significantly outperforms the previous separate TTS models.

Baidu's ClariNet consists of four components:

- Encoder, which encodes textual features into an internal hidden representation.
- Decoder, which decodes the encoder representation into the log-mel spectrogram in an autoregressive manner.
- Bridge-net: an intermediate processing block, which processes the hidden representation from the decoder and predicts the log-linear spectrogram. It also upsamples the hidden representation from frame level to sample level.
- Vocoder: a Gaussian autoregressive WaveNet that synthesizes the waveform, conditioned on the upsampled hidden representation from the bridge-net.

(Figure: ClariNet's architecture)

Baidu has also proposed a new parallel wave generation method based on the Gaussian inverse autoregressive flow (IAF). This mechanism generates all samples of an audio waveform in parallel, speeding up waveform synthesis dramatically compared to traditional autoregressive methods. To train a parallel waveform synthesizer, they use a Gaussian autoregressive WaveNet as the teacher-net and the Gaussian IAF as the student-net. The Gaussian autoregressive WaveNet is trained with maximum likelihood estimation (MLE). The Gaussian IAF is then distilled from the autoregressive WaveNet by minimizing the KL divergence between their peaked output distributions, which stabilizes the training process.

For more details on ClariNet, you can check out Baidu's paper and audio samples.

Related reading:
- How Deep Neural Networks can improve Speech Recognition and generation
- AI learns to talk naturally with Google's Tacotron 2
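The distillation objective mentioned above compares two Gaussian distributions (teacher and student outputs), and the KL divergence between two Gaussians has a simple closed form. As a rough illustration of that per-sample term (this is not Baidu's code, just the standard formula):

import math

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    # KL(q || p) between univariate Gaussians q = N(mu_q, sigma_q^2) and p = N(mu_p, sigma_p^2).
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
            - 0.5)

# During distillation, the student (IAF) output distribution is pushed toward the
# teacher (autoregressive WaveNet) distribution, so a term like this is averaged
# over the timesteps of the waveform.
print(gaussian_kl(0.0, 1.0, 0.0, 1.0))  # identical distributions -> 0.0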

DARPA’s $2 Billion ‘AI Next’ campaign includes a Next-Generation Nonsurgical Neurotechnology (N3) program

Savia Lobo
11 Sep 2018
3 min read
Last Friday (7th September, 2018), DARPA announced a multi-year investment of more than $2 billion in a new program called the 'AI Next' campaign. DARPA director Dr. Steven Walker officially unveiled the large-scale effort during D60, DARPA's 60th Anniversary Symposium held in Maryland. The campaign seeks contextual reasoning in AI systems in order to create deeper trust and collaborative partnerships between humans and machines.

The key areas the AI Next campaign may include are:

- Automating critical DoD (Department of Defense) business processes, such as security clearance vetting in a week or accrediting software systems in one day for operational deployment.
- Improving the robustness and reliability of AI systems, and enhancing the security and resiliency of machine learning and AI technologies.
- Reducing power, data, and performance inefficiencies.
- Pioneering the next generation of AI algorithms and applications, such as 'explainability' and commonsense reasoning.

The Next-Generation Nonsurgical Neurotechnology (N3) program

At the conference, DARPA officials also described the next frontier of neuroscience research: technologies for able-bodied soldiers that give them super abilities. Following this, they introduced the Next-Generation Nonsurgical Neurotechnology (N3) program, which was announced in March. This program aims to fund research on tech that can transmit high-fidelity signals between the brain and some external machine without requiring that the user be cut open for rewiring or implantation.

Al Emondi, manager of N3, told IEEE Spectrum that he is currently picking the researchers who will be funded under the program, and that an announcement can be expected in early 2019. The program has two tracks:

- Completely non-invasive: The N3 program aims for new non-invasive tech that can match the high performance currently achieved only with implanted electrodes that are nestled in the brain tissue and therefore have a direct interface with neurons, either recording the electrical signals when the neurons "fire" into action or stimulating them to cause that firing.
- Minutely invasive: DARPA says it doesn't want its new brain tech to require even a tiny incision. Instead, minutely invasive tech might come into the body in the form of an injection, a pill, or even a nasal spray. Emondi imagines "nanotransducers" that can sit inside neurons, converting the electrical signal when a neuron fires into some other type of signal that can be picked up through the skull.

Justin Sanchez, director of DARPA's Biological Technologies Office, said that making brain tech easy to use will open the floodgates. He added, "We can imagine a future of how this tech will be used. But this will let millions of people imagine their own futures".

To know more about the AI Next campaign and the N3 program in detail, visit the DARPA blog.

Related reading:
- Skepticism welcomes Germany's DARPA-like cybersecurity agency – The federal agency tasked with creating cutting-edge defense technology
- DARPA on the hunt to catch deepfakes with its AI forensic tools underway

Spotify releases Chartify, a new data visualization library in python for easier chart creation

Natasha Mathur
19 Nov 2018
2 min read
Spotify announced last week that it has come out with Chartify, a new open source Python data visualization library that makes it easy for data scientists to create charts. It comes with features such as concise, user-friendly syntax and consistent data formatting, among others. Let's have a look at the features of this new library.

Concise and user-friendly syntax

Despite the abundance of tools such as Seaborn, Matplotlib, Plotly, and Bokeh used by data scientists at Spotify, chart creation has always been a major pain point in the data science workflow. Chartify addresses this, as its syntax is considerably more concise and user-friendly than that of the other tools. Suggestions are included in the docstrings, allowing users to recall the most common formatting options. This saves time and lets data scientists spend less time configuring chart aesthetics and more time actually creating charts.

Consistent data formatting

Another common problem faced by data scientists is that different plotting methods need different input data formats, requiring users to completely reformat their input data. This leads to data scientists spending a lot of time manipulating data frames into the right state for their charts. Chartify's consistent input data formatting allows you to quickly create and iterate on charts, since less time is spent on data munging.

Other features

Since a majority of charting problems can be solved with just a few chart types, Chartify focuses mainly on these use cases and comes with a complete example notebook that presents the full list of chart types Chartify is capable of generating. Moreover, adding color to charts greatly helps simplify the charting process, which is why Chartify has different palette types aligned to the different use cases for color. Additionally, Chartify offers support for Bokeh, an interactive Python data visualization library, giving users the option to fall back on manipulating Chartify charts with Bokeh if they need more control.

For more information, check out the official Chartify blog post.

Related reading:
- cstar: Spotify's Cassandra orchestration tool is now open source!
- Spotify has "one of the most intricate uses of JavaScript in the world," says former engineer
- 8 ways to improve your data visualizations
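A small sketch of what the concise syntax looks like in practice, based on the library's quickstart (the data frame here is invented, and argument names may differ slightly between Chartify versions):

import pandas as pd
import chartify

# Chartify expects tidy data frames as input.
df = pd.DataFrame({
    'fruit': ['apple', 'banana', 'grape'],
    'quantity': [10, 17, 4],
})

ch = chartify.Chart(blank_labels=True, x_axis_type='categorical')
ch.set_title("Fruit sold")
ch.plot.bar(data_frame=df,
            categorical_columns='fruit',
            numeric_column='quantity')
ch.show()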

Introducing Intel's OpenVINO computer vision toolkit for edge computing

Pravin Dhandre
17 May 2018
2 min read
Almost a week after Microsoft's announcement of its plan to develop a computer vision development kit for edge computing, Intel introduced its latest offering in the Internet of Things (IoT) and Artificial Intelligence (AI) space, called OpenVINO. The toolkit is a comprehensive computer vision solution that brings computer vision and deep learning capabilities to edge devices smoothly.

The OpenVINO (Open Visual Inference and Neural Network Optimization) toolkit supports popular open source frameworks such as OpenCV, Caffe, and TensorFlow. It works with Intel's traditional CPUs, AI chips, field programmable gate array (FPGA) chips, and the Movidius vision processing unit (VPU).

The toolkit has the potential to address a wide range of challenges faced by developers in delivering distributed, end-to-end intelligence. With OpenVINO, developers can streamline their deep learning inference and deploy high-performance computer vision solutions across a wide range of use cases. Computer vision limitations related to bandwidth, latency, and storage are expected to be resolved to an extent. The toolkit should also help developers optimize AI-integrated computer vision applications and scale distributed vision applications, which generally requires a complete redesign of the solution.

Until now, edge computing has been more of a prospect for the IoT market. With OpenVINO, Intel stands as the only industry leader delivering IoT solutions from the edge, providing an unparalleled solution to meet the AI needs of businesses. OpenVINO is already being used by companies like GE Healthcare, Dahua, Amazon Web Services, and Honeywell across their digital imaging and IoT solutions.

To explore more about its capabilities and performance, visit Intel's official OpenVINO product documentation.

A gentle note to readers: OpenVINO is not to be confused with Openvino, an open-source winery and wine-backed cryptoasset.

Related reading:
- Should you go with Arduino Uno or Raspberry Pi 3 for your next IoT project?
- AWS Greengrass brings machine learning to the edge
- Cognitive IoT: How Artificial Intelligence is remoulding Industrial and Consumer IoT

SQLite adopts the rule of St. Benedict as its Code of Conduct, drops it to adopt Mozilla’s community participation guidelines, in a week

Natasha Mathur
29 Oct 2018
4 min read
SQLite adopted the Mozilla Community Participation Guidelines as its new Code of Conduct on Saturday. After being nagged by clients and businesses to implement a code of conduct, SQLite founder D. Richard Hipp had come out with a code of conduct based on the "instruments of good works" from chapter 4 of The Rule of St. Benedict last Monday. But it faced major criticism, as the majority of developers across the world did not approve of it. SQLite then adopted the Mozilla guidelines.

"The original document we put here was more of a Code of Ethics of the Project Founder. While we stand by those principles, they are not in line with the modern technical meaning of a Code of Conduct and have hence been renamed", reads the SQLite Code of Conduct page.

SQLite is one of the most used database engines across the world. It is a self-contained, high-reliability, embedded, full-featured, public-domain SQL database engine.

Earlier, Hipp stated that the former CoC "was created (in a slightly different format) for the purpose of filling in a box on "supplier registration" forms submitted to the SQLite developers by various minor clients. But it is not a Code of Conduct in the same sense that many communities mean a Code of Conduct. Rather, the foundational ethical principles upon which SQLite is based...a succinct description of the SQLite Founder's idea of what it means to be virtuous".

The former code of conduct comprised an overview, the instruments of good works, the scope of application, and "The Rule". It received a lot of criticism from developers. In fact, many of them were confused about whether the CoC was a sarcastic reply to the clients asking SQLite to set up a CoC, or whether SQLite was serious about it. Here's the former code of conduct by SQLite. When asked by users whether the CoC was legitimate, D. Richard Hipp replied on the SQLite forum: "Yes. Clients were encouraging me to have a code of conduct. (Having a CoC seems to be a trendy thing nowadays.) So I looked around and came up with what you found, submitted the idea to the whole staff, and everybody approved".

Public reaction to the former CoC varied. Some believed that it was impractical and excluded people based on religion, while others loved it.

https://twitter.com/DarrenPMeyer/status/1054364170232258562
https://twitter.com/panzertime/status/1054407789257330688
https://twitter.com/geek/status/1054423437249253376
https://twitter.com/brionv/status/1054371935629373440
https://twitter.com/aaronbieber/status/1054403524686200838
https://twitter.com/_sagesharp_/status/1054404033518043137

The former CoC comprised 72 rules such as "Do not murder", "Do not commit adultery", "Do not steal", "Do not covet", "Do not bear false witness", "Chastise the body", "Do not become attached to pleasures", "Love fasting", "Clothe the naked", and so forth. "No one is required to follow The Rule, to know The Rule, or even to think that The Rule is a good idea... anyone who follows The Rule will live a happier and more productive life, but individuals are free to dispute or ignore that advice if they wish", reads the former CoC.

SQLite's new code of conduct, based on Mozilla's community participation guidelines, asks contributors to be respectful, be direct but professional, be inclusive, understand different perspectives, appreciate differences, lead by example, and so forth. This change, however, is not the result of the public criticism received, as the SQLite team states, "While we are not doing so in reaction to any current or ongoing issues, we believe that this will be a helpful part of maintaining the long-term sustainability of the project".

Here's what people feel about SQLite's decision to adopt Mozilla's community participation guidelines. Some approve of it, while others liked the former CoC better.

https://twitter.com/ZanBaldwin/status/1055994596369547264
https://twitter.com/ann_arcana/status/1055399132230246400
https://twitter.com/TheQuQu/status/1055170161425240064

Related reading:
- SQLite 3.25.0 is out with better query optimizer and support for window functions
- How to use SQLite with Ionic to store data?
- Introduction to SQL and SQLite

How Verizon and a BGP Optimizer caused a major internet outage affecting Amazon, Facebook, CloudFlare among others

Savia Lobo
25 Jun 2019
5 min read
Yesterday, many parts of the Internet faced an unprecedented outage as Verizon, the popular Internet transit provider, accidentally rerouted IP packets after it wrongly accepted a network misconfiguration from a small ISP in Pennsylvania, USA. According to The Register, "systems around the planet were automatically updated, and connections destined for Facebook, Cloudflare, and others, ended up going through DQE and Allegheny, which buckled under the strain, causing traffic to disappear into a black hole".

According to Cloudflare, "What exacerbated the problem today was the involvement of a 'BGP Optimizer' product from Noction. This product has a feature that splits up received IP prefixes into smaller, contributing parts (called more-specifics). For example, our own IPv4 route 104.20.0.0/20 was turned into 104.20.0.0/21 and 104.20.8.0/21".

Many Google users were unable to access the web using the Google browser, and some users say Google Calendar went down too. Amazon users were also unable to use some services, such as Amazon books, as they could not reach the site. (Source: Downdetector outage reports)

In another incident, on June 6, more than 70,000 BGP routes were leaked from Swiss colocation company Safe Host to China Telecom in Frankfurt, Germany, which then announced them on the global internet. "This resulted in a massive rerouting of internet traffic via China Telecom systems in Europe, disrupting connectivity for netizens: a lot of data that should have gone to European cellular networks was instead piped to China Telecom-controlled boxes", The Register reports.

How BGP turned this into a major blunder

The Internet is made up of networks called Autonomous Systems (AS), and each of these networks has a unique identifier, called an AS number. These networks are interconnected using the Border Gateway Protocol (BGP), which joins them together and enables traffic to travel, for example, from an ISP to a popular website at a far-off location. (Source: Cloudflare)

With the help of BGP, networks exchange route information that can be either specific, similar to finding a specific city on your GPS, or very general, like pointing your GPS to a state. DQE Communications (AS33154), an Internet Service Provider in Pennsylvania, was using a BGP optimizer in its network. It announced these specific routes to its customer, Allegheny Technologies Inc (AS396531), a steel company based in Pittsburgh. This routing information was then sent to Verizon (AS701), which accepted it and passed it on to the world. "Verizon's lack of filtering turned this into a major incident that affected many Internet services", Cloudflare mentions. "What this means is that suddenly Verizon, Allegheny, and DQE had to deal with a stampede of Internet users trying to access those services through their network. None of these networks were suitably equipped to deal with this drastic increase in traffic, causing disruption in service."

Job Snijders, an internet architect for NTT Communications, wrote in a network operators' mailing list, "While it is easy to point at the alleged BGP optimizer as the root cause, I do think we now have observed a cascading catastrophic failure both in process and technologies."

https://twitter.com/bgpmon/status/1143149817473847296

Cloudflare's CTO Graham-Cumming told El Reg's Richard Speed, "A customer of Verizon in the US started announcing essentially that a very large amount of the internet belonged to them. For reasons that are a bit hard to understand, Verizon decided to pass that on to the rest of the world." He added that normally a large ISP like Verizon "would filter it out if some small provider said they own the internet".

"If Verizon had used RPKI, they would have seen that the advertised routes were not valid, and the routes could have been automatically dropped by the router", Cloudflare said.

https://twitter.com/eastdakota/status/1143182575680143361
https://twitter.com/atoonk/status/1143139749915320321

Rerouting is highly dangerous, as criminals, hackers, or government spies could be lurking around to grab such a free flow of data. This raises security concerns among users, as their data can be used for surveillance, disruption, and financial theft. Cloudflare was majorly affected by this outage: "It is unfortunate that while we tried both e-mail and phone calls to reach out to Verizon, at the time of writing this article (over 8 hours after the incident), we have not heard back from them, nor are we aware of them taking action to resolve the issue", the company said in its blog post.

One user commented, "BGP needs a SERIOUS revamp with Security 101 in mind.....RPKI + ROA's is 100% needed and the ISPs need to stop being CHEAP. Either build it by Federal Requirement, at least in the Nation States that take their internet traffic as Citizen private data or do it as Internet 3.0 cause 2.0 flaked! Either way, "Path Validation" is another component of BGP that should be looked at but honestly, that is going to slow path selection down and to instrument it at a scale where the internet would benefit = not worth it and won't happen. SMH largest internet GAP = BGP "accidental" hijacks"

Verizon, in a statement to The Register, said, "There was an intermittent disruption in internet service for some [Verizon] FiOS customers earlier this morning. Our engineers resolved the issue around 9 am ET."

https://twitter.com/atoonk/status/1143145626516914176

To know more about this news in detail, head over to Cloudflare's blog.

Related reading:
- OpenSSH code gets an update to protect against side-channel attacks
- Red Badger Tech Director Viktor Charypar talks monorepos, lifelong learning, and the challenges facing open source software [Interview]
- Facebook signs on more than a dozen backers for its GlobalCoin cryptocurrency including Visa, Mastercard, PayPal and Uber
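To make the "more-specifics" idea concrete: a BGP optimizer that splits a /20 into two /21s is doing simple prefix arithmetic, which you can reproduce with Python's standard ipaddress module using Cloudflare's example route from the article:

import ipaddress

parent = ipaddress.ip_network('104.20.0.0/20')

# Splitting it into "more-specifics" one bit longer yields exactly two /21s.
more_specifics = list(parent.subnets(prefixlen_diff=1))
print(more_specifics)  # [IPv4Network('104.20.0.0/21'), IPv4Network('104.20.8.0/21')]

# Because BGP prefers the most specific matching route, any network that
# propagates these /21s attracts all the traffic destined for the whole /20.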

Canva faced security breach, 139 million users data hacked: ZDNet reports

Fatema Patrawala
28 May 2019
3 min read
Last Friday, ZDNet reported a data breach at Canva, a popular Sydney-based startup that offers a graphic design service. According to the hacker, who contacted ZDNet directly, the data of roughly 139 million users was compromised during the breach.

The hacker responsible for the breach is known online as GnosticPlayers. Since February this year, they have put up for sale the data of 932 million users, reportedly stolen from 44 companies around the world. "I download everything up to May 17," the hacker told ZDNet. "They detected my breach and closed their database server." (Source: ZDNet)

In a statement on the Canva website, the company confirmed the attack and said it has notified the relevant authorities. They also tweeted about the data breach on 24th May, as soon as they discovered the hack, and recommended that their users change their passwords immediately.

https://twitter.com/canva/status/1132086889408749573

"At Canva, we are committed to protecting the data and privacy of all our users and believe in open, transparent communication that puts our communities' needs first," the statement said. "On May 24, we became aware of a security incident. As soon as we were notified, we immediately took steps to identify and remedy the cause, and have reported the situation to authorities (including the FBI). We're aware that a number of our community's usernames and email addresses have been accessed."

The stolen data included details such as customer usernames, real names, email addresses, and city and country information. For 61 million users, password hashes were also present in the database. The passwords were hashed with the bcrypt algorithm, currently considered one of the most secure password-hashing algorithms around. For other users, the stolen information included Google tokens, which users had used to sign up for the site without setting a password. Of the total 139 million users, 78 million had a Gmail address associated with their Canva account.

Canva is one of Australia's biggest tech companies. Since its founding in 2012, the site has shot up the Alexa website traffic rank and now ranks among the top 200 most popular websites. Three days ago, the company announced it had raised $70 million in a Series D funding round and is now valued at a whopping $2.5 billion. Canva also recently acquired two of the world's biggest free stock content sites, Pexels and Pixabay. Details of Pexels and Pixabay users were not included in the data stolen by the hacker.

According to reports from Business Insider, the community was dissatisfied with how Canva responded to the attack. IT consultant Dave Hall criticized the wording Canva used in a communication sent to users on Saturday, and believes Canva did not respond fast enough.

https://twitter.com/skwashd/status/1132258055767281664

One Hacker News user commented, "It seems as though these breaches have limited effect on user behaviour. Perhaps I'm just being cynical but if you are aren't getting access and you are just getting hashed passwords, do people even care? Does it even matter? Of course names and contact details are not great. I get that. But will this even effect Canva?" Another user said, "How is a design website having 189M users? This is astonishing more than the hack!"

Related reading:
- Facebook again, caught tracking Stack Overflow user activity and data
- Ireland's Data Protection Commission initiates an inquiry into Google's online Ad Exchange services
- Adobe warns users of "infringement claims" if they continue using older versions of its Creative Cloud products
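For context on why exposed bcrypt hashes are less immediately useful to an attacker than plaintext passwords: bcrypt is a deliberately slow, salted hash. A rough sketch of how a site might hash and verify passwords with the Python bcrypt package (illustrative only, not Canva's code):

import bcrypt

password = b"correct horse battery staple"

# gensalt() embeds a per-password random salt and a configurable work factor,
# which is what makes large-scale offline cracking of a leaked dump expensive.
hashed = bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))

# Verification re-derives the hash from the stored salt and compares.
assert bcrypt.checkpw(password, hashed)
assert not bcrypt.checkpw(b"wrong guess", hashed)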

pandas will drop support for Python 2 this month with pandas 0.24

Prasad Ramesh
04 Jan 2019
2 min read
The next version of the Python library pandas, version 0.24.0, will not support Python 2. pandas is a popular Python library widely used for data manipulation and data analysis, in areas like numerical tables and time series data. Jeff Reback, a pandas maintainer, tweeted on Wednesday:

https://twitter.com/jreback/status/1080603676882935811

Many major Python libraries are removing Python 2 support

One of the first tools to drop support for Python 2 was IPython, in 2017. This was followed by matplotlib and, more recently, NumPy. Other popular libraries like scikit-learn and SciPy will also remove support for Python 2 this year. IDEs and tools such as Spyder and Pythran are also on the list.

Python 2 support ends in 2020

The core Python developers will stop supporting Python 2 no later than 2020. This move is intended to control fragmentation and save the workforce needed to maintain Python 2. Python 2 will no longer receive any new features, and all support for it will cease next year. As stated on the official website: "2.7 will receive bugfix support until January 1, 2020. After the last release, 2.7 will receive no support." Python 2 support was originally due to end in 2015 but was extended by five years in consideration of the user base.

Users seem to welcome the move forward, as a comment on Hacker News says: "Time to move forward. Python 2 is so 2010."

Related reading:
- NumPy drops Python 2 support. Now you need Python 3.5 or later.
- Python governance vote results are here: The steering council model is the winner
- NYU and AWS introduce Deep Graph Library (DGL), a python package to build neural network graphs

Data Scientist: The sexiest role of the 21st century

Aarthi Kumaraswamy
08 Nov 2017
6 min read
"Information is the oil of the 21st century, and analytics is the combustion engine." -Peter Sondergaard, Gartner Research By 2018, it is estimated that companies will spend $114 billion on big data-related projects, an increase of roughly 300%, compared to 2013 (https://www.capgemini-consulting.com/resource-file-access/resource/pdf/big_dat a_pov_03-02-15.pdf). Much of this increase in expenditure is due to how much data is being created and how we are better able to store such data by leveraging distributed filesystems such as Hadoop. However, collecting the data is only half the battle; the other half involves data extraction, transformation, and loading into a computation system, which leverages the power of modern computers to apply various mathematical methods in order to learn more about data and patterns and extract useful information to make relevant decisions. The entire data workflow has been boosted in the last few years by not only increasing the computation power and providing easily accessible and scalable cloud services (for example, Amazon AWS, Microsoft Azure, and Heroku) but also by a number of tools and libraries that help to easily manage, control, and scale infrastructure and build applications. Such a growth in the computation power also helps to process larger amounts of data and to apply algorithms that were impossible to apply earlier. Finally, various computation- expensive statistical or machine learning algorithms have started to help extract nuggets of information from data. Finding a uniform definition of data science is akin to tasting wine and comparing flavor profiles among friends—everyone has their own definition and no one description is more accurate than the other. At its core, however, data science is the art of asking intelligent questions about data and receiving intelligent answers that matter to key stakeholders. Unfortunately, the opposite also holds true—ask lousy questions of the data and get lousy answers! Therefore, careful formulation of the question is the key for extracting valuable insights from your data. For this reason, companies are now hiring data scientists to help formulate and ask these questions. At first, it's easy to paint a stereotypical picture of what a typical data scientist looks like: t- shirt, sweatpants, thick-rimmed glasses, and debugging a chunk of code in IntelliJ... you get the idea. Aesthetics aside, what are some of the traits of a data scientist? One of our favorite posters describing this role is shown here in the following diagram: Math, statistics, and general knowledge of computer science is given, but one pitfall that we see among practitioners has to do with understanding the business problem, which goes back to asking intelligent questions of the data. It cannot be emphasized enough: asking more intelligent questions of the data is a function of the data scientist's understanding of the business problem and the limitations of the data; without this fundamental understanding, even the most intelligent algorithm would be unable to come to solid conclusions based on a wobbly foundation. A day in the life of a data scientist This will probably come as a shock to some of you—being a data scientist is more than reading academic papers, researching new tools, and model building until the wee hours of the morning, fueled on espresso; in fact, this is only a small percentage of the time that a data scientist gets to truly play (the espresso part however is 100% true for everyone)! 
Most part of the day, however, is spent in meetings, gaining a better understanding of the business problem(s), crunching the data to learn its limitations (take heart, this book will expose you to a ton of different feature engineering or feature extractions tasks), and how best to present the findings to non data-sciencey people. This is where the true sausage making process takes place, and the best data scientists are the ones who relish in this process because they are gaining more understanding of the requirements and benchmarks for success. In fact, we could literally write a whole new book describing this process from top-to-tail! So, what (and who) is involved in asking questions about data? Sometimes, it is process of saving data into a relational database and running SQL queries to find insights into data: "for the millions of users that bought this particular product, what are the top 3 OTHER products also bought?" Other times, the question is more complex, such as, "Given the review of a movie, is this a positive or negative review?" This book is mainly focused on complex questions, like the latter. Answering these types of questions is where businesses really get the most impact from their big data projects and is also where we see a proliferation of emerging technologies that look to make this Q and A system easier, with more functionality. Some of the most popular, open source frameworks that look to help answer data questions include R, Python, Julia, and Octave, all of which perform reasonably well with small (X < 100 GB) datasets. At this point, it's worth stopping and pointing out a clear distinction between big versus small data. Our general rule of thumb in the office goes as follows: If you can open your dataset using Excel, you are working with small data. Working with big data What happens when the dataset in question is so vast that it cannot fit into the memory of a single computer and must be distributed across a number of nodes in a large computing cluster? Can't we just rewrite some R code, for example, and extend it to account for more than a single-node computation? If only things were that simple! There are many reasons why the scaling of algorithms to more machines is difficult. Imagine a simple example of a file containing a list of names: B D X A D A We would like to compute the number of occurrences of individual words in the file. If the file fits into a single machine, you can easily compute the number of occurrences by using a combination of the Unix tools, sort and uniq: bash> sort file | uniq -c The output is as shown ahead: 2 A 1 B 1 D 1 X However, if the file is huge and distributed over multiple machines, it is necessary to adopt a slightly different computation strategy. For example, compute the number of occurrences of individual words for every part of the file that fits into the memory and merge the results together. Hence, even simple tasks, such as counting the occurrences of names, in a distributed environment can become more complicated. The above is an excerpt from the book  Mastering Machine Learning with Spark 2.x by Alex Tellez, Max Pumperla and Michal Malohlava. If you would like to learn how to solve the above problem and other cool machine learning tasks a data scientist carries out such as the following, check out the book. 
Use Spark streams to cluster tweets online Run the PageRank algorithm to compute user influence Perform complex manipulation of DataFrames using Spark Define Spark pipelines to compose individual data transformations Utilize generated models for off-line/on-line prediction
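The distributed version of the word-count task sketched above is the canonical first Spark program. A minimal PySpark version might look like this (the file path is hypothetical, and this assumes a local Spark installation rather than the book's own code):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("name-count").getOrCreate()

# Each partition of the file is counted where it lives; Spark then merges the
# partial counts, which is exactly the "count locally, merge results" strategy
# described in the excerpt above.
counts = (spark.sparkContext.textFile("names.txt")   # hypothetical input path
          .map(lambda name: (name.strip(), 1))
          .reduceByKey(lambda a, b: a + b))

for name, n in counts.collect():
    print(n, name)

spark.stop()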

Keras 2.3.0, the first release of multi-backend Keras with TensorFlow 2.0 support is now out

Bhagyashree R
18 Sep 2019
4 min read
Yesterday, the Keras team announced the release of Keras 2.3.0, the first release of multi-backend Keras with TensorFlow 2.0 support. It is also the last major release of multi-backend Keras. It is backward-compatible with TensorFlow 1.14, 1.13, Theano, and CNTK.

Keras to focus mainly on tf.keras while continuing support for Theano/CNTK

This release comes with a lot of API changes to bring the multi-backend Keras API "in sync" with tf.keras, TensorFlow's high-level API. However, there are some TensorFlow 2.0 features that are not supported. This is why the team recommends that developers switch their Keras code to tf.keras in TensorFlow 2.0.

Read also: TensorFlow 2.0 beta releases with distribution strategy, API freeze, easy model building with Keras and more

Moving to tf.keras gives developers access to features like eager execution, TPU training, and much better integration between low-level TensorFlow and high-level concepts like Layer and Model. Following this release, the team plans to focus mainly on the further development of tf.keras. "Development will focus on tf.keras going forward. We will keep maintaining multi-backend Keras over the next 6 months, but we will only be merging bug fixes. API changes will not be ported," the team writes. To make it easier for the community to contribute to the development of Keras, the team will be developing tf.keras in its own standalone GitHub repository at keras-team/keras.

François Chollet, the creator of Keras, further explained on Twitter why they are moving away from multi-backend Keras:

https://twitter.com/fchollet/status/1174019142774452224

API updates in Keras 2.3.0

Here are some of the API updates in Keras 2.3.0:

- The add_metric method is added to Layer/Model, which is similar to the add_loss method but for metrics.
- Keras 2.3.0 introduces several class-based losses, including MeanSquaredError, MeanAbsoluteError, BinaryCrossentropy, Hinge, and more. With this update, losses can be parameterized via constructor arguments.
- Many class-based metrics are added, including Accuracy, MeanSquaredError, Hinge, FalsePositives, BinaryAccuracy, and more. This update enables metrics to be stateful and parameterized via constructor arguments.
- The train_on_batch and test_on_batch methods now have a new argument called reset_metrics. You can set this argument to False to maintain metric state across different batches when writing lower-level training or evaluation loops.
- The model.reset_metrics() method is added to Model to clear metric state at the start of an epoch when writing lower-level training or evaluation loops.

Breaking changes in Keras 2.3.0

Along with the API changes, Keras 2.3.0 includes a few breaking changes. In this release, batch_size, write_grads, embeddings_freq, and embeddings_layer_names are deprecated and hence are ignored when used with TensorFlow 2.0. Metrics and losses are now reported under the exact name specified by the user. Also, the default recurrent activation is changed from hard_sigmoid to sigmoid in all RNN layers.

Read also: Build your first Reinforcement learning agent in Keras [Tutorial]

The release started a discussion on Hacker News, where developers appreciated that Keras will mainly focus on the development of tf.keras. A user commented, "Good move. I'd much rather it worked well for one backend then sucked mightily on all of them. Eager mode means that for the first time ever you can _easily_ debug programs using the TensorFlow backend. That will be music to the ears of anyone who's ever tried to debug a complex TF-backed model."

Some also raised the question of whether Google might acquire Keras in the future, considering TensorFlow has already included Keras in its codebase and its creator, François Chollet, works as an AI researcher at Google. Check out the official announcement to know more about what has landed in Keras 2.3.0.

Other news in Data:
- The CAP Theorem in practice: The consistency vs. availability trade-off in distributed databases
- InfluxData launches new serverless time series cloud database platform, InfluxDB Cloud 2.0
- Different types of NoSQL databases and when to use them
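A small sketch of what the class-based losses and metrics look like in practice, shown here with tf.keras (the layer sizes and random data are invented for illustration):

import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Class-based losses and metrics can be parameterized via constructor arguments
# and keep their own state across batches.
model.compile(
    optimizer="adam",
    loss=keras.losses.BinaryCrossentropy(label_smoothing=0.1),
    metrics=[keras.metrics.BinaryAccuracy(), keras.metrics.FalsePositives()],
)

x = np.random.rand(32, 8).astype("float32")
y = np.random.randint(0, 2, size=(32, 1)).astype("float32")
model.fit(x, y, epochs=1, batch_size=8)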

TensorFlow 1.11.0 releases

Pravin Dhandre
28 Sep 2018
2 min read
It's been just a month since the release of TensorFlow 1.10, and the TensorFlow community has introduced version 1.11 with a few major additions, lots of bug fixes, and numerous performance improvements.

Major features of TensorFlow 1.11.0:

- Prebuilt binaries built for Nvidia GPU
- Experimental tf.data integration for Keras
- Preview support for eager execution on Google Cloud TPUs
- Multi-GPU DistributionStrategy support in tf.keras for model distribution
- Multi-worker DistributionStrategy support in Estimator
- C, C++, and Python functions added for querying kernels
- Simple Tensor and DataType classes added to TensorFlow Lite Java

Bug fixes and other changes:

- Default values for the tf.keras RandomUniform, RandomNormal, and TruncatedNormal initializers have changed
- A pruning mode has been added for boosted trees
- Old checkpoints are no longer deleted by default
- Total disk space for dumped tensor data is limited to 100 GB
- Experimental IndexedDatasets have been added

Performance improvements and other additions:

- Enhanced performance for StringSplitOp and StringSplitV2Op
- Improved performance of regex replace operations
- Toco compilation/execution fixed for Windows
- Added a GoogleZoneProvider class for detecting Google Cloud Engine zones
- Enabled TensorFlow import for tensor.proto.h
- Added documentation clarifying the differences between tf.fill and tf.constant
- Added a selective registration target using the lite proto runtime
- Support for bitcasting to and from uint32 and uint64
- Added an Estimator subclass that can be created from a SavedModelEstimator
- Added argument leaf index modes

Please see the full release notes for complete details on added features and changes. You can also check the GitHub repository to find various interesting use cases of TensorFlow.

Related reading:
- Top 5 Deep Learning Architectures
- A new Model optimization Toolkit for TensorFlow can make models 3x faster
- Intelligent mobile projects with TensorFlow: Build your first Reinforcement Learning model on Raspberry Pi

DeepMind introduces OpenSpiel, a reinforcement learning-based framework for video games

Savia Lobo
28 Aug 2019
3 min read
A few days ago, researchers at DeepMind introduced OpenSpiel, a framework for writing games and algorithms for research in general reinforcement learning and search/planning in games. The core API and games are implemented in C++ and exposed to Python. Algorithms and tools are written in both C++ and Python, and there is also a pure Swift branch in the swift subdirectory. In their paper, the researchers write, "We hope that OpenSpiel could have a similar effect on general RL in games as the Atari Learning Environment has had on single-agent RL."

Read Also: Google Research Football Environment: A Reinforcement Learning environment for AI agents to master football

OpenSpiel allows written games and algorithms to be evaluated on a variety of benchmark games, as it includes implementations of over 20 different game types, including simultaneous-move games, perfect- and imperfect-information games, gridworld games, an auction game, and several normal-form/matrix games. It includes tools to analyze learning dynamics and other common evaluation metrics. It also supports n-player (single- and multi-agent), zero-sum, cooperative, general-sum, one-shot, and sequential games, among others.

OpenSpiel has been tested on Linux (Debian 10 and Ubuntu 19.04). The researchers have not tested the framework on macOS or Windows, but "since the code uses freely available tools, we do not anticipate any (major) problems compiling and running under other major platforms," they add.

The purpose of OpenSpiel is to promote "general multiagent reinforcement learning across many different game types, in a similar way as general game-playing but with a heavy emphasis on learning and not in competition form," the research paper mentions. The framework is "designed to be easy to install and use, easy to understand, easy to extend ('hackable'), and general/broad."

Read Also: DeepMind's AI uses reinforcement learning to defeat humans in multiplayer games

Design constraints for OpenSpiel

The two main design criteria that OpenSpiel is based on are:

- Simplicity: OpenSpiel provides easy-to-read, easy-to-use code that can be used to learn from and to build a prototype with, rather than a fully optimized codebase that would require additional assumptions.
- Dependency-free: The researchers say, "dependencies can be problematic for long-term compatibility, maintenance, and ease-of-use." Hence, the OpenSpiel framework does not introduce dependencies, keeping it portable and easy to install.

Swift OpenSpiel: a port to use Swift for TensorFlow

The swift/ folder contains a port of OpenSpiel to use Swift for TensorFlow. This Swift port explores using a single programming language for the entire OpenSpiel environment, from game implementations to the algorithms and deep learning models. The port is intended for serious research use; as the Swift for TensorFlow platform matures and gains additional capabilities (e.g. distributed training), the kinds of algorithms that are expressible and tractable to train are expected to grow significantly.

OpenSpiel also ships tools for visualization and evaluation, among them the α-Rank algorithm, which leverages evolutionary game theory to rank AI agents interacting in multiplayer games. OpenSpiel currently supports using α-Rank for both single-population (symmetric) and multi-population games.

Developers are excited about this release and want to try out the framework:

https://twitter.com/SMBrocklehurst/status/1166435811581202443
https://twitter.com/sharky6000/status/1166349178412261376

To know more about this news in detail, head over to the research paper. You can also check out the GitHub page.

Related reading:
- Terrifyingly realistic Deepfake video of Bill Hader transforming into Tom Cruise is going viral on YouTube
- DeepCode, the AI startup for code review, raises $4M seed funding; will be free for educational use and enterprise teams with 30 developers
- Baidu open sources ERNIE 2.0, a continual pre-training NLP model that outperforms BERT and XLNet on 16 NLP tasks
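For a feel of the Python API, here is a minimal sketch of loading a bundled game and playing a random playout, based on the examples in the project's repository (details may differ between OpenSpiel versions):

import random
import pyspiel

# Load one of the bundled games and run a uniformly random playout.
game = pyspiel.load_game("tic_tac_toe")
state = game.new_initial_state()

while not state.is_terminal():
    action = random.choice(state.legal_actions())
    state.apply_action(action)

print(state)            # final board
print(state.returns())  # per-player returns, e.g. [1.0, -1.0] for a win/loss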

Say hello to FASTER: a new key-value store for large state management by Microsoft

Natasha Mathur
20 Aug 2018
3 min read
The Microsoft research team announced a new key-value store named FASTER at SIGMOD 2018, in June. FASTER offers support for fast and frequent lookups of data, and it helps with updating large volumes of state information, which poses a problem for cloud applications today.

Consider IoT as a scenario: billions of devices report and update state such as per-device performance counters, and applications end up underutilizing resources such as storage and networking on the machine. FASTER helps solve this problem by exploiting the temporal locality in these applications to control the in-memory footprint of the system. According to Microsoft, "FASTER is a single-node shared memory key-value store library". A key-value store is a NoSQL database that uses a simple key/value method for data storage.

FASTER consists of two important innovations:

- A cache-friendly, concurrent, and latch-free hash index that maintains logical pointers to records in a log. The FASTER hash index is an array of cache-line-sized hash buckets, each with 8-byte entries that hold hash tags and logical pointers to records stored separately.
- A new concurrent, hybrid log record allocator. This backs the index and spans fast storage (such as cloud storage and SSD) and main memory.

What makes FASTER different?

Traditional key-value stores use log-structured record organizations. FASTER is different because it has a hybrid log that combines log-structuring with read-copy-updates (good for external storage) and in-place updates (good for in-memory performance). The hybrid log head, which lies in storage, uses read-copy-update, whereas the hybrid log tail, which lies in main memory, uses in-place updates. A read-only region in memory lies between these two regions and gives records another chance to be copied back to the tail. This captures the temporal locality of updates and allows a natural clustering of hot records in memory.

As a result, FASTER is capable of outperforming even pure in-memory data structures like the Intel TBB hash map. It also performs far better than today's popular key-value stores and caching systems, such as RocksDB and Redis, says Microsoft.

Beyond that, FASTER also provides support for failure recovery: it has a recovery strategy that helps bring the system back to a recent consistent state at low cost. This differs from the recovery mechanism in traditional database systems, as it does not involve blocking or creating a separate "write-ahead log".

For more information, check out the official research paper.

Related reading:
- Google, Microsoft, Twitter, and Facebook team up for Data Transfer Project
- Microsoft Azure's new governance DApp: An enterprise blockchain without mining
- Microsoft announces the general availability of Azure SQL Data Sync
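FASTER itself ships as C# and C++ libraries, but the core idea of "a hash index holding logical pointers into a record log" can be illustrated with a deliberately tiny, purely conceptual Python sketch. This is not FASTER's API, and it ignores concurrency, latch-freedom, and the hybrid-log regions entirely:

class ToyLogStore:
    # Conceptual sketch: keys map to logical offsets into an append-only log.
    def __init__(self):
        self.log = []      # append-only record log (in FASTER this spans memory and storage)
        self.index = {}    # hash index: key -> logical address (offset) of the latest record

    def upsert(self, key, value):
        # Append a new record and repoint the index entry; older versions
        # remain in the log until they are garbage collected or compacted.
        self.index[key] = len(self.log)
        self.log.append((key, value))

    def read(self, key):
        addr = self.index.get(key)
        if addr is None:
            return None
        return self.log[addr][1]

store = ToyLogStore()
store.upsert("device-42", {"counter": 1})
store.upsert("device-42", {"counter": 2})
print(store.read("device-42"))  # {'counter': 2}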