
Tech News - Data

1208 Articles

Elvis Pranskevichus on limitations in SQL and how EdgeQL can help

Bhagyashree R
10 May 2019
3 min read
Structured Query Language (SQL), once considered “not a serious language” by its authors, has become the dominant query language for relational databases in the industry. Its battle-tested solutions, stability, and portability make it a reliable choice for operating on your stored data. However, it does have its share of weak points, and that is what Elvis Pranskevichus, founder of EdgeDB, listed in a post titled “We Can Do Better Than SQL” published yesterday. He explained that we now need a “better SQL” and introduced the EdgeQL language, which aims to address the limitations of SQL.

SQL’s shortcomings

Following are some of the shortcomings Pranskevichus discusses in his post:

“Lack of Orthogonality”

Orthogonality is the property that a change in one component has no side effects on any other component. For a query language, it means allowing users to combine a small set of primitive constructs in a small number of ways. Orthogonality leads to a more compact and consistent design; a language without it accumulates many exceptions and caveats. Giving an example, Pranskevichus wrote, “A good example of orthogonality in a programming language is the ability to substitute an arbitrary part of an expression with a variable, or a function call, without any effect on the final result.” SQL does not permit this kind of generic substitution.

“Lack of Compactness”

One side effect of not being orthogonal is a lack of compactness. SQL is also considered verbose because of its goal of being an English-like language catering to “non-professionals”. “However, with the growth of the language, this verbosity has contributed negatively to the ability to write and comprehend SQL queries. We learnt this lesson with COBOL, and the world has long since moved on to newer, more succinct programming languages. In addition to keyword proliferation, the orthogonality issues discussed above make queries more verbose and harder to read,” wrote Pranskevichus in his post.

“Lack of Consistency”

Pranskevichus further adds that SQL is inconsistent in terms of both syntax and semantics. There is also a standardization problem, as different database vendors implement their own versions of SQL, which often end up incompatible with other SQL variants.

Introducing EdgeQL

With EdgeQL, Pranskevichus aims to give users a language that is orthogonal, consistent, and compact, and that at the same time works with the generally applicable relational model. In short, he aims to make SQL better. EdgeQL treats every value as a set and every expression as a function over sets. This design allows you to factor any part of an EdgeQL expression into a view or a function without changing other parts of the query. It has no NULL; a missing value is simply an empty set, which comes with the advantage of having only two boolean logic states. Read Pranskevichus’s original post for more details on EdgeQL.

Building a scalable PostgreSQL solution
PostgreSQL security: a quick look at authentication best practices [Tutorial]
How to handle backup and recovery with PostgreSQL 11 [Tutorial]
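To make the set-based semantics concrete, here is a toy Python model (not EdgeQL itself, and not how EdgeDB is implemented; `apply_fn` is our own illustrative name) of the two rules described above: every value is a set, and every function applies element-wise over the cross product of its input sets, so an empty set propagates naturally where SQL would need NULL:

```python
from itertools import product

def apply_fn(fn, *sets):
    # Apply fn element-wise over the Cartesian product of the input sets.
    # If any input set is empty, the result is empty -- no NULL required.
    return [fn(*args) for args in product(*sets)]

print(apply_fn(lambda a, b: a + b, [1, 2], [10]))  # [11, 12]
print(apply_fn(lambda a, b: a + b, [1, 2], []))    # []
```

In SQL, `1 + NULL` yields NULL and drags three-valued logic through every predicate; in the set model a missing value simply contributes nothing, so boolean logic stays two-valued.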

‘Tableau Day’ highlights: Augmented Analytics, Tableau Prep Builder and Conductor, and more!

Savia Lobo
10 May 2019
4 min read
The Tableau community held a Tableau Day in Mumbai, India, yesterday, where it announced some exciting upcoming developments in Tableau. Highlights of the day included an in-depth explanation of the new Tableau Prep Builder and Conductor, Tableau’s plans to move into augmented analytics, and more. The conference also included a customer story from Nishtha Sharma, Manager at Times Internet, who shared how Tableau helped Times Internet optimize sales and revenue, manage cost per customer, and make business predictions with the help of Tableau dashboards. She said that Times Internet initially solved around 10 business problems with 7 dashboards; following that early success, they are now solving close to 30 business cases with 15 dashboards. Let us have a look at some of the highlights below.

Augmented Analytics: the next step for Tableau

Varun Tandon, a Tableau solution consultant, explained how Tableau is adopting intelligent, or augmented, analytics. Tableau may be moving toward augmented analytics for its platform, where ML and AI can be used to enhance data access and data quality, uncover previously hidden insights, suggest analyses, deliver predictive analytics, suggest actions, and handle many other tasks. Several attendees asked whether Ask Data, Tableau’s new natural language capability included in Tableau 2018.2, was a result of Tableau’s acquisition of Empirical Systems last year. The representatives confirmed the acquisition and mentioned that Tableau plans to build analytics and augmented analytics into Tableau without the need for additional third-party add-ons. However, they did not clarify whether Ask Data resulted from the Empirical Systems acquisition. With Empirical’s NLP module, Tableau users may easily gain insights, make better data-driven decisions, and explore many more features without knowledge of data science or query languages. Doug Henschen, a technology analyst at Constellation Research, explored in his report “Tableau Advances the Era of Smart Analytics” the smart features Tableau Software has introduced and is investing in, and how these capabilities will benefit Tableau customers.

Creating a single hub for data from various sources

The conference explained in detail, with examples, how Tableau can be used as a single hub for data coming from various sources such as NetSuite, Excel, Salesforce, and so on.

New features in Tableau Prep Builder and Conductor

Tableau’s new Prep Builder and Conductor, which saves massive amounts of data preparation time, was also demonstrated and its new features explained in detail, in a session conducted by Shilpa Bhatia, a customer consultant at Tableau Software. Attendees asked whether Tableau Prep Builder and Conductor would replace ETL. The representatives said that it does a good job with data preparation, but users should not confuse it with ETL; they called Tableau Prep Builder and Conductor a “mini ETL”. Since the tool is still evolving, Tableau is shipping monthly updates and will continue to do so for the near future. A question was also asked about the ability to pull data from Tableau Prep into a Jupyter notebook for building data frames; this is not currently possible with Prep Builder and Conductor. The representatives said Tableau Prep is extremely simple to use, but it is a resource-heavy tool, and a dedicated machine with more than 16 GB of RAM will be needed to avoid system lag on large datasets.

The self-service mode in Tableau

Jayen Thakker, a sales consultant at Tableau, explained how one can go beyond dashboards with Tableau. He said that with Tableau’s self-service mode, users can explore and build dashboards on their own without waiting for a developer to build them.

Upcoming Tableau versions

The conference also revealed that Tableau 2019.2 is currently in Beta 2 and is expected to be released next month, with a Beta 3 version before the final release. Each release of Tableau includes around 100 to 150 changes; a few were discussed, including spatial data functions (MakePoint and MakeLine) and next steps for moving beyond Ask Data toward advanced analytics and AI features. The representatives also mentioned that Tableau is working on serving people who need more traditional reporting. To know more about the ‘Tableau Day’ highlights from Mumbai, watch this space or visit Tableau’s official website.

Alteryx vs. Tableau: Choosing the right data analytics tool for your business
Tableau 2019.1 beta announced at Tableau Conference 2018
How to do data storytelling well with Tableau [Video]

Artist Holly Herndon releases an album featuring an artificial intelligence 'musician'

Richard Gall
10 May 2019
6 min read
The strange mixture of panic and excitement around artificial intelligence only grows as the world uncovers new and more novel ways of using it. These waves of innovation feed continuing cycles of stories that have a habit of perpetuating misleading ideas about both the threats and the opportunities it presents. It shouldn't be surprising, then, that there's a serious misunderstanding of what artificial intelligence really is and how it works; as Rowel Atienza told us last month, "we're still very far from robots taking over society." However, artist Holly Herndon (who, incidentally, is a researcher at Stanford) is getting listeners to think differently about artificial intelligence. On her latest album PROTO, released today, she uses it to augment and complement her music.

Holly Herndon's AI agent, Spawn

The special guest that makes PROTO remarkable is Spawn, an AI agent created by Herndon, her husband, and a software engineer. What makes Spawn particularly significant is that Herndon doesn't use it to replace or recreate something, but as something that exists alongside human activity and creativity.

How does Spawn work?

Spawn was 'trained' on the music that Herndon and her band were writing for the album. In essence, this makes it quite different from the way AI is typically used, in that it was developed around a new dataset, not an existing one. When we use existing datasets, especially when we use them uncritically, without any awareness of how they reproduce or hide certain biases, the AI develops around those very biases. By learning from new 'data' that bears all the marks of Herndon's creative decision making, Spawn almost becomes a 'creative' AI agent. If you listen to the album, it's not always easy to spot which parts are created by the artificial intelligence and which are made by human musicians. This combination of creative 'sources' means Herndon's album forces us to ask questions about how we use AI and how it interacts with our lives. It quite deliberately engages with the conversation around ethics in AI that has been taking place across the tech industry over the last year or so.

https://open.spotify.com/album/3PkYFFSJTPxOhnSYBtyZsk?si=OgFCY5p4Tu2u2rK-3mFYjA

"The advent of sampling raised many questions about the ethical use of material created by others," Herndon wrote in a statement published on Twitter at the end of 2018, "but the era of machine legible culture accelerates and abstracts that conversation."

https://twitter.com/hollyherndon/status/1069978436851113985

What does Holly Herndon's album tell us about artificial intelligence?

PROTO raises a number of really important questions about artificial intelligence. First and foremost, it suggests that artificial intelligence isn't a replacement for human intelligence. Spawn isn't used to take jobs from any musicians, but rather extends what's sonically possible; it adds to their capabilities, giving their work a new dimension. Furthermore, just as Herndon refuses to see artificial intelligence as something that can replicate human labor, or creativity, the album also points out one of the very problems with that kind of understanding: the idea that AI can 'replicate' human intelligence at all. Instead, the album's merging of the human and the artificial is a way of exploring the weaknesses of artificial intelligence. This is a way of making AI more transparent. It opens up something that seems so seamless, and highlights the ways it doesn't quite work; it almost refracts, rather than mimics, the sound the human musicians make. As Herndon said in an interview with Jezebel publication The Muse, "the technology is impressive and it's cool but it's really early still. We really wanted to be honest about that and show its mistakes and show how kind of rough the technology is still because... it's more honest and more interesting, to allow it to have its own aesthetic."

https://www.youtube.com/watch?v=r4sROgbaeOs

Read next: Why an algorithm will never win a Pulitzer

The human side of AI technology

But the album does more than just present AI as a flawed tool that can complement human ingenuity. It also asks us about ownership and creativity. It uses the technology as a way of tackling human questions like "what does it mean to create something?" and "who's even allowed to create things?" This is important when we consider that not only does someone control and own a given algorithm, literally owning the intellectual property, but someone also owns and controls the swathes of data that are, at a really fundamental level, crucial to artificial intelligence being possible at all. "The history of music and our shared, human, intellectual project that leads up to today, is a shared resource that we all tap into and we all learn from," Herndon also said in the interview with Jezebel. "So if an individual can just scrape that and then claim so much of that as their own because they hold the keys to this AI, and then they can recreate it, of course it's going to give people anxiety because there's an ethical issue with that."

Read next: Sex robots, artificial intelligence, and ethics: How desire shapes and is shaped by algorithms

Instrumental and aesthetic artificial intelligence

One of the main reasons artificial intelligence has become a buzzword is that it's a tool for industry. It has commercial value; it can improve efficiency by allowing us to do more with less. The value of an album like PROTO, even if it's not the sort of thing you'd usually listen to, is that it removes artificial intelligence from a context in which it is instrumentalized and puts it into one that's purely aesthetic. It changes something we'd typically think about in a functional manner (is it working? is it doing what it's supposed to do?) into something whose very function is open to question. If Herndon's album is able to do that in even the smallest way, then that can only be a good thing, right? And even if it doesn't, at least it sounds good...

Singapore passes controversial bill that criminalizes publishing “fake” news

Vincy Davis
10 May 2019
3 min read
Yesterday, Singapore passed a law criminalizing the publication of fake news, allowing the government to block and order the removal of such content. The bill, ‘The Protection from Online Falsehoods and Manipulation’, was passed by a vote of 72-9 in the Singapore parliament. The law allows the government to demand corrections, order the removal of content, or block websites deemed to be propagating falsehoods contrary to the public interest. Two months ago, Russia passed a law allowing the government to punish individuals and online media for spreading “fake” news and information that disrespects the state. In recent months, other countries like France and Germany have also passed tough laws against fake news or hate speech. Singapore is ranked 151 out of 180 countries in this year's World Press Freedom Index.

What does the Bill cover?

‘The Protection from Online Falsehoods and Manipulation Bill’ gives the Singapore government the power to ban fake news deemed detrimental to Singapore or capable of influencing elections. The government can demand removal of such content or block it outright. Offenders could face a jail term of up to 10 years and hefty fines. Last month, during a visit to Malaysia, Singapore Prime Minister Lee Hsien Loong said that “fake news was a serious problem and other countries including France, Germany and Australia were legislating to combat it”. He added that Singapore’s proposed laws “will be a significant step forward” and that “We’ve deliberated on this now for almost two years. What we have done has worked for Singapore, it is our objective to continue to do things which will work for Singapore.”

Reactions to the Bill

Under the legislation, all of the Singapore government's ministers will be handed powers to demand corrections or order websites to be blocked if they are found to be propagating “falsehoods” contrary to the public interest. Very few people have praised the law; many believe it will target free speech more than fake news. Phil Robertson, deputy Asia director at Human Rights Watch, said, “Singapore’s new 'fake news' law is a disaster for online expression by ordinary #Singaporeans, and a hammer blow against the independence of many online news portals they rely on to get real news about their country beyond the ruling People's Action Party political filter”. He added, “You’re basically giving the autocrats another weapon to restrict speech, and speech is pretty restricted in the region already.” Social media firms have strongly criticized the law, arguing that it hurts freedom of speech by forcing platforms to censor users in order to avoid potential fines. Google, Facebook, and Twitter have all voiced reservations about the ‘fake news’ bill. According to Reuters, Google, which has its Asia headquarters in Singapore, said it was "concerned that this law will hurt innovation" and that "how the law is implemented matters." Authorities around the world may be of the opinion that laws restricting ‘fake news’ are the need of the hour, but it would be good if they first decided what is worse: some fake news on the web, or some big daddy deciding what is right for the people. To know more details about the bill, read the released document.

Facebook hires top EFF lawyer and Facebook critic as WhatsApp privacy policy manager
Will Facebook enforce its updated “remove, reduce, and inform” policy to curb fake news and manage problematic content?
OpenAI’s new versatile AI model, GPT-2 can efficiently write convincing fake news from just a few words

Linux Foundation forms Urban Computing Foundation: a set of open source tools to build autonomous vehicles and smart infrastructure

Fatema Patrawala
09 May 2019
3 min read
The Linux Foundation, the nonprofit organization enabling mass innovation through open source, on Tuesday announced the formation of the Urban Computing Foundation (UCF). UCF will accelerate open source software that improves mobility, safety, road infrastructure, traffic congestion, and energy consumption in connected cities. Its mission is to enable developers, data scientists, visualization specialists, and engineers to improve urban environments, quality of life, and city operation systems, and to build connected urban infrastructure. Founding members of UCF include Facebook, Google, IBM, Interline Technologies, Uber, and UC San Diego, among others. Jim Zemlin, executive director of the Linux Foundation, told VentureBeat that the Foundation will adopt an open governance model developed by its Technical Advisory Council (TAC), which will include technical and IP stakeholders in urban computing who will guide its work by reviewing and curating projects. The intent, added Zemlin, is to provide platforms to developers who seek to address traffic congestion, pollution, and other problems plaguing modern metros.

Here’s the list of TAC members:

  • Drew Dara-Abrams, principal, Interline Technologies
  • Oliver Fink, director, Here XYZ, Here Technologies
  • Travis Gorkin, engineering manager of data visualization, Uber
  • Shan He, project leader of Kepler.gl, Uber
  • Randy Meech, CEO, StreetCred Labs
  • Michal Migurski, engineering manager of spatial computing, Facebook
  • Drishtie Patel, product manager of maps, Facebook
  • Paolo Santi, senior researcher, MIT
  • Max Sills, attorney, Google

On Tuesday, Facebook announced its participation as a founding member of the Urban Computing Foundation. https://twitter.com/fb_engineering/status/1125783991452180481 Facebook mentions in its post: “We are using our expertise — including a predictive model for mapping electrical grids, disaster maps, and more accurate population density maps — to improve access to this type of technology”. Facebook further mentions that UCF will establish a neutral space for this critical work, which will include adapting geospatial and temporal machine learning techniques for urban environments and developing simulation methodologies for modeling and predicting citywide phenomena. Uber also announced its joining, along with its contribution of Kepler.gl as the initiative’s first official project. Kepler.gl is Uber’s open source, no-code geospatial analysis tool for large-scale data sets. Released in 2018, it is currently used by Airbnb, Atkins Global, Cityswifter, Lime, Mapbox, Sidewalk Labs, and UBILabs, among others, to generate visualizations of location data. While all of this sets a path toward smarter cities, it also raises an alarm about yet another avenue for violating privacy and mishandling user data, given the tech industry's history. Recently, Amnesty International in Canada said the Google Sidewalk Labs project in Toronto normalizes mass surveillance and is a direct threat to human rights. Questions have been raised about tech companies forming a foundation to address traffic congestion but not privacy violations or online extremism. https://twitter.com/shannoncoulter/status/1126199285530238976

The Linux Foundation announces the CHIPS Alliance project for deeper open source hardware integration
Mapzen, an open-source mapping platform, joins the Linux Foundation project
Uber becomes a Gold member of the Linux Foundation

Sherin Thomas explains how to build a pipeline in PyTorch for deep learning workflows

Packt Editorial Staff
09 May 2019
8 min read
A typical deep learning workflow starts with ideation and research around a problem statement, where the architectural design and model decisions come into play. Following this, the theoretical model is tested with prototypes. This includes trying out different models or techniques, such as skip connections, or making decisions on what not to try out. PyTorch started as a research framework built by a Facebook intern, and it has grown into a framework used both for research and prototyping and for writing efficient models with serving modules. The PyTorch deep learning workflow is broadly equivalent to the workflow implemented by almost everyone in the industry, even for highly sophisticated implementations, with slight variations. In this article, we explain the core of the ideation, planning, design, and experimentation phases of the PyTorch deep learning workflow.

This article is an excerpt from the book PyTorch Deep Learning Hands-On by Sherin Thomas and Sudhanshu Passi. The book attempts to provide an entirely practical introduction to PyTorch, with numerous examples and dynamic AI applications demonstrating the simplicity and efficiency of the PyTorch approach to machine intelligence and deep learning.

Ideation and planning

Usually, in an organization, the product team comes up with a problem statement for the engineering team, to find out whether they can solve it or not. This is the start of the ideation phase. In academia, this could instead be the decision phase, where candidates have to find a problem for their thesis. In the ideation phase, engineers brainstorm and find the theoretical implementations that could potentially solve the problem. In addition to converting the problem statement to a theoretical solution, the ideation phase is where we decide what the data types are and what dataset we should use to build the proof of concept (POC) of the minimum viable product (MVP). This is also the stage where the team decides which framework to go with, by analyzing the behavior of the problem statement, the available implementations, available pretrained models, and so on. This stage is very common in the industry, and I have come across numerous examples where a well-planned ideation phase helped the team roll out a reliable product on time, while a poorly planned ideation phase destroyed the whole product.

Design and experimentation

The crucial part of design and experimentation lies in the dataset and its preprocessing. For any data science project, the major share of time is spent on data cleaning and preprocessing, and deep learning is no exception. Data preprocessing is one of the vital parts of building a deep learning pipeline. Real-world datasets are usually not cleaned or formatted for a neural network to process; conversion to floats or integers, normalization, and so on are required before further processing. Building a data processing pipeline is also a non-trivial task, which involves writing a lot of boilerplate code. To make this much easier, dataset builders and the DataLoader pipeline package are built into the core of PyTorch.

The dataset and DataLoader classes

Different types of deep learning problems require different types of datasets, and each of them might require different types of preprocessing depending on the neural network architecture we use. This is one of the core problems in deep learning pipeline building. Although the community has made datasets for different tasks freely available, writing a preprocessing script is almost always painful. PyTorch solves this problem by providing abstract classes for writing custom datasets and data loaders. The example given here is a simple dataset class to load the fizzbuzz dataset, but extending it to handle any type of dataset is fairly straightforward.
PyTorch's official documentation uses a similar approach to preprocess an image dataset before passing it to a complex convolutional neural network (CNN) architecture. A dataset class in PyTorch is a high-level abstraction that handles almost everything required by the data loaders. The custom dataset class defined by the user needs to override the __len__ and __getitem__ functions of the parent class: __len__ is used by the data loaders to determine the length of the dataset, and __getitem__ is used by the data loaders to get each item. The __getitem__ function expects the index as an argument and returns the item that resides at that index:

from dataclasses import dataclass
from torch.utils.data import Dataset, DataLoader

@dataclass(eq=False)
class FizBuzDataset(Dataset):
    input_size: int
    start: int = 0
    end: int = 1000

    def encoder(self, num):
        # binary-encode the number, left-padded with zeros to input_size
        ret = [int(i) for i in '{0:b}'.format(num)]
        return [0] * (self.input_size - len(ret)) + ret

    def __getitem__(self, idx):
        x = self.encoder(idx)
        if idx % 15 == 0:
            y = [1, 0, 0, 0]
        elif idx % 5 == 0:
            y = [0, 1, 0, 0]
        elif idx % 3 == 0:
            y = [0, 0, 1, 0]
        else:
            y = [0, 0, 0, 1]
        return x, y

    def __len__(self):
        return self.end - self.start

The implementation of this custom dataset uses the brand-new dataclasses from Python 3.7. dataclasses help eliminate boilerplate code for Python magic functions, such as __init__, using dynamic code generation. This requires the code to be type-hinted, and that's what the first three lines inside the class are for. You can read more about dataclasses in the official Python documentation (https://docs.python.org/3/library/dataclasses.html). The __len__ function returns the difference between the end and start values passed to the class. In the fizzbuzz dataset, the data is generated by the program.
The implementation of data generation is inside the __getitem__ function, where the class instance generates the data based on the index passed by DataLoader. PyTorch made the class abstraction as generic as possible, so that the user can define what the data loader should return for each id. In this particular case, the class instance returns the input and output for each index, where the input x is the binary-encoded version of the index itself, and the output is the one-hot-encoded label with four states. The four states represent whether the number is a multiple of three (fizz), a multiple of five (buzz), a multiple of both three and five (fizzbuzz), or not a multiple of either.

Note: For Python newbies, the way the dataset works can be understood by looking first at the loop that iterates over the integers from zero to the length of the dataset (the length is returned by the __len__ function when len(object) is called). The following snippet shows the simple loop (note that FizBuzDataset declares no default for input_size, so it must be passed explicitly):

dataset = FizBuzDataset(input_size=10)
for i in range(len(dataset)):
    x, y = dataset[i]

dataloader = DataLoader(dataset, batch_size=10, shuffle=True,
                        num_workers=4)
for batch in dataloader:
    print(batch)

The DataLoader class accepts a dataset class that inherits from torch.utils.data.Dataset. DataLoader takes the dataset and performs non-trivial operations such as mini-batching, multithreading, shuffling, and so on, to fetch the data. It accepts a dataset instance from the user and uses a sampler strategy to sample data as mini-batches. The num_workers argument decides how many parallel workers should be operating to fetch the data. This helps to avoid a CPU bottleneck, so that the CPU can keep up with the GPU's parallel operations. Data loaders also allow users to specify whether to use pinned CUDA memory, which copies the data tensors into CUDA's pinned memory before returning them to the user.
Using pinned memory is the key to fast data transfers between devices, since the data is loaded into pinned memory by the data loader itself, using multiple CPU cores. Most often, especially while prototyping, custom datasets might not be available, and developers have to rely on existing open datasets. The good thing about working with open datasets is that most of them are free from licensing burdens, and thousands of people have already tried preprocessing them, so the community can help out. PyTorch provides utility packages for all three major types of datasets (vision, text, and audio), with pretrained models, preprocessed datasets, and utility functions for working with them.

This article covered how to build a basic pipeline for deep learning development. The system we defined here is a very common, general approach, followed with slight changes by different sorts of companies. The benefit of starting with a generic workflow like this is that you can build a really complex workflow on top of it as your team or project grows.

Build deep learning workflows and take deep learning models from prototyping to production with PyTorch Deep Learning Hands-On written by Sherin Thomas and Sudhanshu Passi.

F8 PyTorch announcements: PyTorch 1.1 releases with new AI tools, open sourcing BoTorch and Ax, and more
Facebook AI open-sources PyTorch-BigGraph for faster embeddings in large graphs
Top 10 deep learning frameworks
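To see what DataLoader's mini-batching and shuffling amount to without installing PyTorch, here is a plain-Python sketch (the function name data_loader and the stand-in dataset are our own, not part of the PyTorch API): it samples shuffled indices and yields fixed-size batches, which is the core of what DataLoader does on top of its multiprocessing and pinned-memory machinery:

```python
import random

def data_loader(dataset, batch_size=10, shuffle=True, seed=0):
    # Minimal sketch of DataLoader's sampling loop: shuffle the
    # indices (the "sampler"), then yield collated mini-batches.
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = [dataset[i] for i in indices[start:start + batch_size]]
        xs, ys = zip(*batch)  # collate (x, y) pairs into two sequences
        yield list(xs), list(ys)

# Stand-in dataset of (x, y) pairs, like FizBuzDataset's __getitem__ output.
dataset = [(i, i % 15 == 0) for i in range(25)]
batches = list(data_loader(dataset, batch_size=10))
print([len(xs) for xs, ys in batches])  # [10, 10, 5]
```

The last batch is smaller when the dataset size is not a multiple of batch_size, which is why the real DataLoader offers a drop_last option.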
#MSBuild2019: Microsoft launches new products to secure elections and political campaigns

Sugandha Lahoti
07 May 2019
2 min read
It seems big tech giants are getting pretty serious about protecting election integrity and adopting data protection measures. At the ongoing Microsoft Build 2019 developer conference, CEO Satya Nadella announced ElectionGuard, a free open-source software development kit (SDK), as an extension of Microsoft's Defending Democracy Program.

ElectionGuard SDK

It is an open-source SDK and voting system reference implementation that was developed in partnership with Galois. This SDK will provide voting system vendors with the ability to enable end-to-end verifiability and improved risk-limiting audit capabilities for elections in their systems. It will be offered free to voting system vendors, either to integrate into their existing systems or to use to build all-new election systems.

"One of the things we want to ensure is real transparency and verifiability in election systems. And so this is an open source project that will be alive on GitHub by the end of this month, which will even bring some new technology from Microsoft Research around homomorphic encryption, so that you can have the software stack that can modernize all of the election infrastructure everywhere in the world," CEO Satya Nadella said onstage today at Microsoft's annual Build developer conference in Seattle.

The ElectionGuard SDK and reference implementation will be available on GitHub in June, just ahead of the EU elections.

Microsoft 365 for Campaigns

Microsoft 365 for Campaigns provides the security capabilities of Microsoft 365 Business to political parties and individual candidates. M365 for Campaigns will be rolled out to customers this summer for $5 per user per month. Any campaign using M365 for Campaigns will have free access to Microsoft's AccountGuard service. Microsoft claims it'll be affordable, naturally, and "preconfigured to optimize for the unique operating environments campaigns face."
Starting next month, M365 for Campaigns will be available for all federal election campaign candidates, federal candidate committees, and national party committees in the United States.

Microsoft Build is in its 6th year and will continue till 8th May. The conference hosts over 6,000 attendees, with nearly 500 student-age developers and over 2,600 customers and partners in attendance. Watch this space for more coverage of Microsoft Build 2019.

Microsoft introduces Remote Development extensions to make remote development easier on VS Code

Docker announces a collaboration with Microsoft's .NET at DockerCon 2019

How Visual Studio Code can help bridge the gap between full-stack development and DevOps [Sponsored by Microsoft]
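The homomorphic encryption Nadella alluded to is the key technical idea behind ElectionGuard's verifiability: encrypted ballots can be added up, and the tally decrypted, without ever decrypting an individual vote. ElectionGuard's actual construction is not described in this article, so the following is only a toy, deliberately insecure sketch of an additively homomorphic scheme (exponential ElGamal with small, illustrative parameters):

```python
import random

# Toy parameters: a known Mersenne prime and a small base.
# Real systems use large, carefully chosen groups; this is insecure.
p = 2**61 - 1
g = 3

sk = random.randrange(2, p - 1)   # secret key
pk = pow(g, sk, p)                # public key

def encrypt(vote):
    """Encrypt a 0/1 vote as the ElGamal pair (g^r, g^vote * pk^r)."""
    r = random.randrange(2, p - 1)
    return pow(g, r, p), (pow(g, vote, p) * pow(pk, r, p)) % p

def add(c1, c2):
    """Multiplying ciphertexts componentwise adds the underlying votes."""
    return (c1[0] * c2[0]) % p, (c1[1] * c2[1]) % p

def decrypt(c):
    """Recover the (small) tally by brute-force discrete log of g^m."""
    a, b = c
    gm = (b * pow(a, p - 1 - sk, p)) % p   # b / a^sk mod p
    m, acc = 0, 1
    while acc != gm:
        m, acc = m + 1, (acc * g) % p
    return m
```

Summing encrypt(1), encrypt(0), and encrypt(1) with add and then decrypting yields 2, without any individual ballot being opened.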
OpenAI: Two new versions and the output dataset of GPT-2 out!

Vincy Davis
07 May 2019
3 min read
Today, OpenAI released new versions of GPT-2, its AI model capable of generating coherent paragraphs of text without needing any task-specific training. The release includes a medium 345M-parameter version and a small 117M-parameter version of GPT-2. OpenAI has also shared the 762M and 1.5B versions with partners in the AI and security communities who are working to improve societal preparedness for large language models.

GPT was first released in 2018. In February 2019, OpenAI announced GPT-2 along with many samples and policy implications.

Read More: OpenAI's new versatile AI model, GPT-2 can efficiently write convincing fake news from just a few words

The team at OpenAI has decided on a staged release of GPT-2, gradually releasing the family of models over time. The reason behind the staged release is to give people time to assess the properties of these models, discuss their societal implications, and evaluate the impacts of release after each stage.

The 345M-parameter version of GPT-2 has improved performance relative to the 117M version, though it does not offer the same ease of generating coherent text as the larger versions; the 345M version would also be more difficult to misuse. Many factors, such as ease of use for generating coherent text, the role of humans in the text generation process, the likelihood and timing of future replication and publication by others, evidence of use in the wild, and expert-informed inferences about unobservable uses, were considered while releasing this staged 345M version.

The team is hopeful that ongoing research on bias, detection, and misuse will encourage them to publish larger models, and in six months they will share a fuller analysis of language models' societal implications and their heuristics for release decisions.
The team at OpenAI is looking for partnerships with academic institutions, non-profits, and industry labs that will focus on increasing societal preparedness for large language models. They are also open to collaborating with researchers working on language model output detection, bias, and publication norms, and with organizations potentially affected by large language models.

The output dataset contains GPT-2 outputs from all four model sizes, with and without top-k truncation, as well as a subset of the WebText corpus used to train GPT-2. The dataset features approximately 250,000 samples per model/hyperparameter pair, which should be sufficient to help a wider range of researchers perform quantitative and qualitative analysis.

To know more about the release, head over to the official release announcement.

OpenAI introduces MuseNet: A deep neural network for generating musical compositions

OpenAI researchers have developed Sparse Transformers, a neural network which can predict what comes next in a sequence

OpenAI Five bots destroyed human Dota 2 players this weekend
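The "top-k truncation" mentioned for the output dataset refers to restricting sampling at each step to the k most likely next tokens. A minimal sketch of the idea (using NumPy, not OpenAI's actual code; the default k=40 is illustrative):

```python
import numpy as np

def top_k_sample(logits, k=40, rng=None):
    """Sample a token id after discarding all but the k largest logits."""
    rng = rng if rng is not None else np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    keep = np.argpartition(logits, -k)[-k:]   # indices of the k largest logits
    masked = np.full_like(logits, -np.inf)
    masked[keep] = logits[keep]
    # softmax over the surviving logits; exp(-inf) = 0 removes the rest
    probs = np.exp(masked - masked[keep].max())
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)
```

With truncation off (k equal to the vocabulary size), every token stays eligible, which is the "without top-k truncation" half of the released dataset.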
DuckDuckGo proposes “Do-Not-Track Act of 2019” to require sites to respect DNT browser setting

Sugandha Lahoti
07 May 2019
3 min read
DuckDuckGo, known for its privacy protection policies, has proposed draft legislation that would require sites to respect the Do Not Track browser setting. Called the "Do-Not-Track Act of 2019", this legislation would mandate that websites not track people who have enabled the DNT signal in their browsers. Per a recent study conducted by DuckDuckGo, a quarter of people have turned on this setting, and most were unaware that big sites do not respect it.

A "Do-Not-Track Signal" means a signal, sent by a web browser or similar User Agent, that conveys a User's choice regarding online Tracking, reflects a deliberate choice by the user, and complies with the latest Tracking Preference Expression (DNT) specification published by the World Wide Web Consortium (W3C).

DuckDuckGo's act comes just days after Google announced more privacy controls for its users. Last week, Google launched a new feature allowing users to manually delete all or part of their location history and web and app activity data. It also added a time limit for how long activity data is saved, 3 or 18 months, before deleting it automatically. However, it does not have an option to not store history at all.

DuckDuckGo's proposed Do-Not-Track Act of 2019 details the following points:

No third-party tracking by default. Data brokers would no longer be legally able to use hidden trackers to slurp up your personal information from the sites you visit. And the companies that deploy the most trackers across the web, led by Google, Facebook, and Twitter, would no longer be able to collect and use your browsing history without your permission.

No first-party tracking outside what the user expects. For example, if you use Whatsapp, its parent company (Facebook) wouldn't be able to use your data from Whatsapp in unrelated situations (like for advertising on Instagram, also owned by Facebook).
As another example, if you go to a weather site, it could give you the local forecast, but not share or sell your location history.

The legislation would have exceptions for debugging, auditing, security, non-commercial research, and journalism. However, each of these exceptions would only apply if a site adopts strict data-minimization practices. These include using the least amount of personal information needed, and anonymizing it when possible. Also, the restrictions would only come into play if a consumer has turned on the Do Not Track setting in their browser.

In case of violation of the Do-Not-Track Act of 2019, DuckDuckGo proposes that legislators could charge an amount no less than $50,000 and no more than $10,000,000 or 2% of an organization's annual revenue, whichever is greater.

If the act passes into law, sites would be required to cease certain user tracking methods, which means less data available to inform marketing and advertising campaigns. The proposal is still quite far from turning into law, but presidential candidate Elizabeth Warren's recent proposal to regulate "big tech companies" may give it a much-needed boost.

Twitter users complimented the act.

https://twitter.com/Bendineliot/status/1123579280892538881
https://twitter.com/jmhaigh/status/1123574469950414848
https://twitter.com/n0ahrabbit/status/1123572013153439745

For the full text, download the proposed Do-Not-Track Act of 2019.

DuckDuckGo now uses Apple MapKit JS for its map and location-based searches

DuckDuckGo chooses to improve its products without sacrificing user privacy

'Ethical mobile operating system' /e/, an alternative for Android and iOS, is in beta
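On the wire, the DNT signal the act hinges on is just an HTTP request header, DNT: 1, per the W3C Tracking Preference Expression specification. A site honoring it only needs a check like the hypothetical helper below (the function name is an illustrative assumption):

```python
def tracking_allowed(headers):
    """Return False when the request carries a Do-Not-Track opt-out (DNT: 1).

    `headers` is any mapping of request header names to values; keys are
    normalized here so that plain dicts with any casing work too.
    """
    normalized = {k.lower(): v for k, v in headers.items()}
    return normalized.get("dnt") != "1"
```

Under the proposed act, a False result would oblige the site to disable third-party trackers and unexpected first-party tracking for that request.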
An unsupervised deep neural network cracks 250 million protein sequences to reveal biological structures and functions

Vincy Davis
07 May 2019
4 min read
One of the goals for artificial intelligence in biology is the creation of controllable predictive and generative models that can read and generate biology in its native language. Artificial neural networks, with their proven pattern recognition capabilities, have been utilized in many areas of bioinformatics. Accordingly, research is necessary into methods that can learn intrinsic biological properties directly from protein sequences, which can then be transferred to prediction and generation.

Last week, Alexander Rives and Rob Fergus from the Dept. of Computer Science, New York University, together with Siddharth Goyal, Joshua Meier, Demi Guo, Myle Ott, C. Lawrence Zitnick and Jerry Ma from the Facebook AI Research team, published a paper titled 'Biological Structure And Function Emerge From Scaling Unsupervised Learning to 250 Million Protein Sequences'. This paper investigates scaling high-capacity neural networks to extract general and transferable information about proteins from raw sequences.

Next-generation sequencing (NGS) has revolutionized the biological field. It has enabled a wide variety of applications and made it possible to study biological systems at a detailed level. Recently, due to reductions in the cost of this technology, there has been exponential growth in the size of biological sequence datasets. When data is sampled across diverse sequences, it helps in studying predictive and generative techniques for biology using artificial intelligence. In this paper, the team has investigated deep learning across evolution at the scale of the largest available protein sequence databases.

What does the research involve

The researchers have applied self-supervision to the problem of understanding protein sequences and explored what information representation learning can capture. They trained a neural network by predicting masked amino acids.
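The training objective, predicting masked amino acids, is the masked-language-modelling idea familiar from BERT, applied to protein sequences. A deliberately tiny sketch of that objective follows; the model size, masking rate, and vocabulary layout are assumptions for illustration, nothing like the paper's high-capacity network:

```python
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard residues
MASK_ID, VOCAB = 20, 21                 # one extra slot for the mask token

class TinyProteinLM(nn.Module):
    """A toy stand-in for the paper's high-capacity Transformer."""
    def __init__(self, d=64):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d)
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d, VOCAB)

    def forward(self, tokens):
        return self.head(self.encoder(self.embed(tokens)))

def mask_tokens(tokens, rate=0.15):
    """Replace ~15% of positions with the mask token; remember where."""
    is_masked = torch.rand(tokens.shape) < rate
    masked = tokens.clone()
    masked[is_masked] = MASK_ID
    return masked, is_masked

# one self-supervised training step on random stand-in "sequences"
tokens = torch.randint(0, len(AMINO_ACIDS), (8, 100))   # batch of 8, length 100
masked, is_masked = mask_tokens(tokens)
logits = TinyProteinLM()(masked)
# the loss is computed only at the masked positions: the network must
# recover the original residue from its surrounding context
loss = nn.functional.cross_entropy(logits[is_masked], tokens[is_masked])
loss.backward()
```

Repeating this step over millions of real sequences is what lets the learned representations pick up biological structure without any labels.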
For training the neural network, a dataset containing 250M protein sequences with 86 billion amino acids was used during the research. The resulting model maps raw sequences to representations of biological properties without any prior domain knowledge. The neural network represents the identity of each amino acid in its input and output embeddings. The space of representations learned from sequences provides biological structure information at many levels, including that of amino acids, proteins, groups of orthologous genes, and species. Information about secondary and tertiary structure is internalized and represented within the network in a generalizable form.

Observations from the research

Finally, the paper states that it is possible to adapt networks that have been trained on evolutionary data to give results using only features that have been learned from sequences, i.e., without any prior knowledge. It was also observed that even the highest-capacity models trained still underfit the 250M sequences, due to insufficient model capacity. The researchers are certain that using trained network architectures along with predictive models will help in generating and optimizing new sequences for desired functions. This should also work for sequences that have not been seen before in nature but that are biologically active. They have tried to use unsupervised learning to recover representations that can map multiple levels of biological granularity.

https://twitter.com/soumithchintala/status/1123236593903423490

But the results of the paper do not satisfy the community completely. Some are of the opinion that the paper is hard to follow and leaves some information unarticulated. For example, it is not specified which representations of biological properties the model maps.
A user on Reddit commented, "Like some of the other ML/AI posts that made it to the top page today, this research too does not give any clear way to reproduce the results. I looked through the pre-print page as well as the full manuscript itself. Without reproducibility and transparency in the code and data, the impact of this research is ultimately limited. No one else can recreate, iterate, and refine the results, nor can anyone rigorously evaluate the methodology used."

Another user added, "This is cool, but would be significantly cooler if they did some kind of biological follow up. Perhaps getting their model to output an "ideal" sequence for a desired enzymatic function and then swapping that domain into an existing protein lacking the new function."

Create machine learning pipelines using unsupervised AutoML [Tutorial]

Rigetti develops a new quantum algorithm to supercharge unsupervised Machine Learning

RIP Nils John Nilsson; an AI visionary, inventor of A* algorithm, STRIPS automatic planning system and many more
Palantir’s software was used to separate families in a 2017 operation reveals Mijente

Savia Lobo
06 May 2019
4 min read
Documents released this week reveal that the data mining firm Palantir was responsible for a 2017 operation that targeted and arrested family members of children crossing the border alone. The documents stand in stark contrast to what Palantir said its software was doing. This discrepancy was first identified by Mijente, an advocacy organization that has closely tracked Palantir's murky role in immigration enforcement. The documents confirm "the role Palantir technology played in facilitating hundreds of arrests, only a small fraction of which led to criminal prosecutions", The Intercept reports.

Palantir, a software firm founded by Peter Thiel, one of President Trump's most vocal supporters in Silicon Valley, develops software that helps agents analyze massive amounts of personal data and build profiles for prosecution and arrest.

In May 2018, Amazon employees, in a letter to Jeff Bezos, protested against the sale of its facial recognition tech to Palantir, saying they "refuse to contribute to tools that violate human rights" and citing the mistreatment of refugees and immigrants by ICE.

Read Also: Amazon addresses employees dissent regarding the company's law enforcement policies at an all-staff meeting, in a first

Palantir earlier said it was not involved with the part of ICE strictly devoted to deportations and the enforcement of immigration laws, whereas Palantir's $38 million contract was with Homeland Security Investigations, or HSI, a component of ICE with a far broader criminal enforcement mandate.

https://twitter.com/ConMijente/status/1124056308943138834

The 2017 ICE operation was designed to dissuade children from joining family members in the United States by targeting parents and sponsors for arrest.
According to The Intercept, "Documents obtained through Freedom of Information Act litigation and provided to The Intercept show that this claim, that Palantir software is strictly involved in criminal investigations as opposed to deportations, is false."

As part of the operation, ICE arrested 443 people solely for being undocumented. Palantir's software was used throughout, helping agents build profiles of immigrant children and their family members for the prosecution and arrest of any undocumented person they encountered in their investigation.

https://twitter.com/ConMijente/status/1124056314106322944

"The operation was underway as the Trump administration detained hundreds of children in shelters throughout the country. Unaccompanied children were taken by border agents, sent to privately-run facilities, and held indefinitely. Any undocumented parent or family member who came forward to claim children was arrested by ICE for deportation. More children were kept in detention longer, as relatives stopped coming forward", Mijente reports.

Mijente further mentions in their post, "Mijente is urging Palantir to drop its contract with ICE and stop providing software to agencies that aid in tracking, detaining, and deporting migrants, refugees, and asylum seekers. As Palantir plans its initial public offering, Mijente is also calling on investors not to invest in a company that played a key role in family separation."

The seven-page document, titled "Unaccompanied Alien Children Human Smuggling Disruption Initiative," details how one of Palantir's software solutions, Investigative Case Management (ICM), can be used by agents stationed at the border to build cases against unaccompanied children and their families. Mijente further mentions, "This document is further proof that Palantir's software directly aids in prosecutions for deportation carried out by HSI agents. Not only are HSI agents involved in deportations in the interior, but they are also actively aiding border agents by investigating and prosecuting relatives of unaccompanied children hoping to join their families."

Jesse Franzblau, senior policy analyst for the National Immigrant Justice Center, said in an email to The Intercept, "The detention and deportation machine is not only driven by hate, but also by profit. Palantir profits from its contract with ICE to help the administration target parents and sponsors of children, and also pays Amazon to use its servers in the process. The role of private tech behind immigration enforcement deserves more attention, particularly with the growing influence of Silicon Valley in government policymaking."

Yet, Palantir's executives have made no move to cancel their work with ICE. Its co-founder and CEO, Alex Karp, said he's "proud" to work with the United States government. Last year, he reportedly ignored employees who "begged" him to end the firm's contract with ICE, the Mijente report mentions.

To know more about this news in detail, head over to the official report.

Lerna relicenses to ban major tech giants like Amazon, Microsoft, Palantir from using its software as a protest against ICE

Free Software Foundation updates their licensing materials, adds Commons Clause and Fraunhofer FDK AAC license

"We can sell dangerous surveillance systems to police or we can stand up for what's right. We can't do both," says a protesting Amazon employee
Deeplearning4J 1.0.0-beta4 released with full multi-datatype support, new attention layers, and more!

Vincy Davis
03 May 2019
3 min read
Yesterday, Deep Learning for Java (DL4J) released its new beta version, DL4J 1.0.0-beta4. The main highlight of this version is full multi-datatype support for ND4J and DL4J, unlike past releases. The previous version, deeplearning4j-1.0.0-beta3, was released last year. The 1.0.0-beta4 version also includes the addition of MKL-DNN support, new attention layers, and more, along with optimizations and bug fixes.

What's new in DL4J 1.0.0-beta4?

Full multi-datatype support

In past releases, all N-dimensional arrays in ND4J were limited to a single datatype, set globally. Now, arrays of all datatypes may be used simultaneously. The supported datatypes are Double, Float, Half, Long, Int, Short, Ubyte, Byte, Bool and UTF8.

CUDA Support

CUDA 10.1 support has been added and CUDA 9.0 support has been dropped. DL4J 1.0.0-beta4 also supports CUDA versions 9.2, 10.0 and 10.1. Mac (OSX) CUDA binaries are no longer provided. However, support for Linux and Windows CUDA, and OSX CPU (x86_64), is still available.

Memory Management Changes

In DL4J 1.0.0-beta4, periodic garbage collection is disabled by default; instead, garbage collection (GC) will be called only when it is required to reclaim memory from arrays that are allocated outside of workspaces.

Deeplearning4J: Bug Fixes and Optimizations

cuDNN helpers will no longer attempt to fall back on built-in layer implementations if an out-of-memory exception is thrown. Batch normalization global variance is reparameterized to avoid underflow and zero/negative variance in some cases during distributed training. A bug where dropout instances were incorrectly shared between layers when using transfer learning with dropout has been fixed. An issue where tensorAlongDimension could result in an incorrect array order for edge cases, and hence exceptions in LSTMs, has been fixed.
ND4J and SameDiff: Features and Enhancements

Reliance on periodic garbage collection calls for handling memory management of out-of-workspace (detached) INDArrays has been removed. New additions include the TensorFlowImportValidator tool, the INDArray.close() method, the Nd4j.createFromNpzFile method, support for importing BERT models into SameDiff, SameDiff GraphTransformUtil, and more. Evaluation, RegressionEvaluation, etc. now support 4d (CNN segmentation) data formats.

Bug Fixes and Optimizations

The bug with InvertMatrix.invert() with [1,1]-shape matrices has been fixed. An edge-case bug for Updater instances with length-1 state arrays has been fixed. In SameDiff, gradients are now no longer defined for non-floating-point variables, or for variables that aren't required to calculate loss or parameter gradients; thus, gradient calculation performance has improved.

To know more about the release, check the detailed release notes.

Deeplearning4j 1.0.0-alpha arrives!

7 things Java programmers need to watch for in 2019

Deep Learning Indaba presents the state of Natural Language Processing in 2018
Oakland Privacy Advisory Commission lay out privacy principles for Oaklanders and propose ban on facial recognition tech

Amrata Joshi
30 Apr 2019
5 min read
Privacy issues are becoming a matter of concern. With Silicon Valley coming under the radar every now and then, and lawmakers taking a stand for users' privacy, it seems a lot of governments are now making an effort in this direction. In the US, lawmakers have already started working on lawsuits and regulations around violations of consumer data privacy. States like California have taken steps on issues related to privacy and surveillance. Last week, the Oakland Privacy Advisory Commission released two key documents in an initiative to protect Oaklanders' privacy, namely a proposed ban on facial recognition and the City of Oakland Privacy Principles.

https://twitter.com/cfarivar/status/1123081921498636288

Proposal to ban facial recognition tech

The committee has written this document, which covers regulations on Oakland's acquisition and use of surveillance technology. It defines Face Recognition Technology "as an automated or semi-automated process that assists in identifying or verifying an individual based on an individual's face."

According to this document, it will be unlawful for any city staff to retain, obtain, request, access, or use any Face Recognition Technology or any information obtained from Face Recognition Technology. City staff's unintentional receipt, access to, or use of any information obtained from Face Recognition Technology shall not violate the above, provided that the city staff does not request or solicit its receipt, access to, or use of such information, and that the city staff logs such access, receipt, or use in its Annual Surveillance Report.

Oakland privacy principles laid out by the committee

The Oakland Privacy Advisory Commission has listed a few principles with regard to users' data privacy for Oaklanders.
Following are the privacy principles:

Design and use equitable privacy practices

According to the first principle, community safety and access to city services shouldn't come at the cost of any Oaklander's right to privacy. The Commission aims to collect information in a way that won't discriminate against any Oaklander or Oakland community. Whenever possible, alternatives to the collection of personal data will be communicated at the time of data collection.

Limit collection and retention of personal information

According to this principle, personal information should be collected and stored only when, and for as long as, is justified to serve the purpose of collecting it in the first place. Information related to Oaklanders' safety, health, or access to city services should be protected. Oaklanders' views on the collection of information will be considered by the Commission.

Manage personal information with diligence

Oaklanders' personal information should be treated with respect and handled with care, regardless of how or by whom it was collected. To maintain the security of its systems, the software and applications that interact with Oaklanders' personal information are regularly updated and reviewed by the Commission. Personal information gathered from different departments will be combined only when there is a need. According to the Oakland Privacy Advisory Commission, encryption, minimization, deletion, and anonymization can reduce misuse of personal information; the Commission aims to make effective use of these tools and practices.

Extend privacy protections to our relationships with third parties

According to the Oakland Privacy Advisory Commission, the responsibility to protect Oaklanders' privacy should extend to vendors and partners. Oaklanders' personal information should be shared by the Commission with third parties only to provide city services, and only when doing so is consistent with these privacy principles.
The Commission will disclose the identity of parties with whom they share personal information, where the law permits them to do so.

Safeguard individual privacy in public records disclosures

According to the Commission, providing relevant information to interested parties about their services and governance is essential to democratic participation as well as civic engagement. The Commission will protect Oaklanders' individual privacy interests and the City's information security interests while still preserving the fundamental objective of the California Public Records Act of encouraging transparency.

Be transparent and open

The Commission states that Oaklanders have the right to access and understand explanations of why and how it collects, uses, manages, and shares personal information, and it aims to communicate these explanations to Oakland communities in plain and accessible language on the City of Oakland website.

Be accountable to Oaklanders

The Commission publicly reviews and discusses departmental requests for acquiring and using technology that can be used for surveillance purposes. The Commission further encourages Oaklanders to share their views and concerns regarding any system or department that collects and uses their personal information, or has the potential to do so, and to share their views on its compliance with these Principles.

Well, it seems Oakland has clearly signalled that development at the cost of Oaklanders' privacy won't be acceptable; there is still a long way to go for cities around the world with respect to their user privacy laws.

Russia opens civil cases against Facebook and Twitter over local data laws

Microsoft says tech companies are "not comfortable" storing their data in Australia thanks to the new anti-encryption law

Harvard Law School launches its Caselaw Access Project API and bulk data service making almost 6.5 million cases available
Google’s Sidewalk Lab smart city project threatens privacy and human rights: Amnesty Intl, CA says

Fatema Patrawala
30 Apr 2019
6 min read
Sidewalk Toronto, a joint venture between Sidewalk Labs, which is owned by Google parent company Alphabet Inc., and Waterfront Toronto, is proposing a high-tech neighbourhood called Quayside for the city's eastern waterfront. In March 2017, Waterfront Toronto had shared a request for proposal for this project with the Sidewalk Labs team. It ultimately got approval in October 2017, and the project is currently led by Alphabet's Eric Schmidt and Sidewalk Labs CEO Daniel Doctoroff.

As per reports from Daneilla Barreto, a digital activism coordinator for Amnesty International Canada, the project will normalize mass surveillance and is a direct threat to human rights.

https://twitter.com/AmnestyNow/status/1122932137513164801

The 12-acre smart city, which will be located between East Bayfront and the Port Lands, promises to tackle the social and policy challenges affecting Toronto: affordable housing, traffic congestion and the impacts of climate change. Imagine self-driving vehicles shuttling you around a 24/7 neighbourhood featuring low-cost, modular buildings that easily switch uses based on market demand. Picture buildings heated or cooled by a thermal grid that doesn't rely on fossil fuels, or garbage collection by industrial robots. Underpinning all of this is a network of sensors and other connected technology that will monitor and track environmental and human behavioural data.

That last part, about tracking human data, has sparked concerns. Much ink has been spilled in the press about privacy protections, and the issue has been raised repeatedly by citizens in two of four recent community consultations held by Sidewalk Toronto. The venture proposes to build the waterfront neighbourhood from scratch, embed sensors and cameras throughout, and effectively create a "digital layer". This digital layer may result in monitoring the actions of individuals and the collection of their data.
In the Responsible Data Use Policy Framework released last year, the Sidewalk Toronto team made a number of commitments with regard to privacy, such as not selling personal information to third parties or using it for advertising purposes. Barreto further argues that privacy was declared a human right and is protected under the Universal Declaration of Human Rights adopted by the United Nations in 1948. In the Sidewalk Labs conversation, however, privacy has been framed as a purely digital tech issue. Debates have focused on questions of data access: who owns it, how it will be used, where it should be stored and what should be collected.

In other words, the project would collect the minutest details of an individual's everyday life. For example, it could track what medical offices they enter, what locations they frequent and who their visitors are, in turn giving away clues to physical or mental health conditions, immigration status, whether an individual is involved in any kind of sex work, their sexual orientation or gender identity, or the kind of political views they might hold. Further down the line, this could affect their health status, employment, where they are allowed to live, or where they can travel.

All of this raises a question: do citizens want their data to be collected at this scale at all? That conversation remains long overdue. Not all communities have agreed to participate in this initiative, as marginalized and racialized communities will be affected most by surveillance. The Canadian Civil Liberties Association (CCLA) has threatened to sue the Sidewalk Toronto project, arguing that privacy protections should be spelled out before the project proceeds. Toronto's Mayor John Tory showed little interest in addressing these concerns during a panel on tech investment in Canada at South by Southwest (SXSW) on March 10.
Tory attended the event to promote the city as a go-to tech hub to the international audience at SXSW and other industry events. Last October, Saadia Muzaffar announced her resignation from Waterfront Toronto's Digital Strategy Advisory Panel. "Waterfront Toronto's apathy and utter lack of leadership regarding shaky public trust and social license has been astounding," the author and founder of TechGirls Canada said in her resignation letter. Later that month, Dr. Ann Cavoukian, a privacy expert and consultant for Sidewalk Labs, resigned as well, because she wanted all data collection to be anonymized or "de-identified" at the source, protecting the privacy of citizens.

Why does big tech really want your data?

Data is often called the "new oil": a rich resource that can be mined in a number of ways, from licensing it for commercial purposes to making it open to the public and freely shareable. Like oil, data has the power to create class warfare, permitting those who own it to control the agenda and leaving those who don't at their mercy. With the flow of data now contributing more to world GDP than the flow of physical goods, there's a lot at stake for the different players, and the data can benefit them in different ways.

Corporations are the primary beneficiaries of personal data, monetizing it through advertising, marketing and sales. Facebook, for example, has repeatedly come under the radar over the past two to three years for violating user privacy and mishandling data. For governments, data may serve the public good, improving quality of life for citizens via data-driven design and policies. But in some cases minorities and the poor are disproportionately harmed by mass surveillance, discriminatory algorithms and other data-driven technological applications. Mass surveillance can also discourage public and private dissent, curtailing freedom of speech and expression.
As per a NY Times report, low-income Americans have experienced a long history of disproportionate surveillance. The poor bear the burden of both ends of the spectrum of privacy harms: they are subject to greater suspicion and monitoring while applying for government benefits, live in heavily policed neighborhoods, and in some cases lose out on education and job opportunities. https://twitter.com/JulieSBrill/status/1122954958544916480

In more promising news, today the Oakland Privacy Advisory Commission released two key documents: one on Oakland's privacy principles and the other on a ban on facial recognition tech. https://twitter.com/cfarivar/status/1123081921498636288

The framework places strong emphasis on privacy, stating: "Privacy is a fundamental human right, a California state right, and instrumental to Oaklanders' safety, health, security, and access to city services. We seek to safeguard the privacy of every Oakland resident in order to promote fairness and protect civil liberties across all of Oakland's diverse communities."

Safety will be paramount for smart city initiatives such as Sidewalk Toronto. But we need more laws and policies like Oakland's that protect and support privacy and human rights, so that we can use technology safely and nothing happens that we did not consent to.

#NotOkGoogle: Employee-led town hall reveals hundreds of stories of retaliation at Google

Google announces new policy changes for employees to report misconduct amid complaints of retaliation and harassment

#GoogleWalkout organizers face backlash at work, tech workers show solidarity

AI can now help speak your mind: UC researchers introduce a neural decoder that translates brain signals to natural-sounding speech

Bhagyashree R
29 Apr 2019
4 min read
In research published in the journal Nature on Monday, a team of neuroscientists from the University of California, San Francisco, introduced a neural decoder that can synthesize natural-sounding speech based on brain activity. The research was led by Gopala Anumanchipalli, a speech scientist, and Josh Chartier, a bioengineering graduate student in the Chang lab, and was developed in the laboratory of Edward Chang, a professor of Neurological Surgery at the University of California.

Why is this neural decoder being introduced?

Many people lose their voice because of stroke, traumatic brain injury, or neurodegenerative diseases such as Parkinson's disease, multiple sclerosis, and amyotrophic lateral sclerosis. Assistive devices that track very small eye or facial muscle movements already exist, enabling people with severe speech disabilities to express their thoughts by writing them letter-by-letter. However, generating text or synthesized speech with such devices is often time-consuming, laborious, and error-prone. These devices have another limitation: they permit a maximum of about 10 words per minute, compared to the 100 to 150 words per minute of natural speech.

This research shows that it is possible to generate a synthesized version of a person's voice that can be controlled by their brain activity. The researchers believe that in the future this device could enable individuals with severe speech disability to communicate fluently. It could even reproduce some of the "musicality" of the human voice that expresses the speaker's emotions and personality. "For the first time, this study demonstrates that we can generate entire spoken sentences based on an individual's brain activity," said Chang. "This is an exhilarating proof of principle that with technology that is already within reach, we should be able to build a device that is clinically viable in patients with speech loss."

How does this system work?
This research builds on an earlier study by Josh Chartier and Gopala K. Anumanchipalli, which showed how the speech centers in our brain choreograph the movements of the lips, jaw, tongue, and other vocal tract components to produce fluent speech. In the new study, Anumanchipalli and Chartier asked five patients being treated at the UCSF Epilepsy Center to read several sentences aloud. These patients had electrodes implanted into their brains to map the source of their seizures in preparation for neurosurgery. Simultaneously, the researchers recorded activity from a brain region known to be involved in language production.

The researchers used the audio recordings of the volunteers' voices to understand the vocal tract movements needed to produce those sounds. With this detailed map of sound to anatomy in hand, the scientists created a realistic virtual vocal tract for each volunteer that could be controlled by their brain activity. The system comprises two neural networks: a decoder that transforms brain activity patterns produced during speech into movements of the virtual vocal tract, and a synthesizer that converts these vocal tract movements into a synthetic approximation of the volunteer's voice.

Here's a video depicting the working of this system: https://www.youtube.com/watch?v=kbX9FLJ6WKw&feature=youtu.be

The researchers observed that the synthetic speech produced by this two-stage system was much better than synthetic speech decoded directly from the volunteers' brain activity. The generated sentences were also understandable to hundreds of human listeners in crowdsourced transcription tests conducted on the Amazon Mechanical Turk platform. The system is still in its early stages. Explaining its limitations, Chartier said, "We still have a ways to go to perfectly mimic spoken language.
We’re quite good at synthesizing slower speech sounds like ‘sh’ and ‘z’ as well as maintaining the rhythms and intonations of speech and the speaker’s gender and identity, but some of the more abrupt sounds like ‘b’s and ‘p’s get a bit fuzzy. Still, the levels of accuracy we produced here would be an amazing improvement in real-time communication compared to what’s currently available.” Read the full report on UCSF’s official website. OpenAI introduces MuseNet: A deep neural network for generating musical compositions Interpretation of Functional APIs in Deep Neural Networks by Rowel Atienza Google open-sources GPipe, a pipeline parallelism Library to scale up Deep Neural Network training  
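To make the two-stage design concrete, here is a minimal, purely illustrative sketch of the pipeline the article describes: neural activity is first mapped to vocal tract movements, which are then mapped to acoustic features. Everything here is an assumption for illustration: the array shapes, the feature counts, and the use of single random linear layers with a tanh nonlinearity. The actual system uses recurrent neural networks trained on electrocorticography (ECoG) recordings and produces audible speech, not just feature matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ELECTRODES = 256   # number of ECoG channels (assumed for illustration)
N_ARTICULATORS = 33  # vocal tract movement features (assumed)
N_AUDIO = 32         # acoustic output features, e.g. spectral bins (assumed)

# Stage 1: the "decoder", mapping neural activity to articulator kinematics.
# Here it is just an untrained random linear layer standing in for a trained RNN.
W_decode = rng.normal(size=(N_ELECTRODES, N_ARTICULATORS)) * 0.01

# Stage 2: the "synthesizer", mapping kinematics to acoustic features.
W_synth = rng.normal(size=(N_ARTICULATORS, N_AUDIO)) * 0.01

def decode_speech(brain_activity: np.ndarray) -> np.ndarray:
    """Map (time, N_ELECTRODES) neural data to (time, N_AUDIO) acoustics."""
    articulators = np.tanh(brain_activity @ W_decode)  # stage 1: movements
    acoustics = np.tanh(articulators @ W_synth)        # stage 2: sound features
    return acoustics

# One second of fake neural data sampled at 200 Hz.
signal = rng.normal(size=(200, N_ELECTRODES))
audio_features = decode_speech(signal)
print(audio_features.shape)  # (200, 32)
```

The key design point the study reports is visible even in this toy version: the intermediate articulator representation is what distinguishes the two-stage approach from decoding acoustics directly from brain activity.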