
Tech News - Data

1208 Articles

Do Google Ads secretly track Stack Overflow users?

Vincy Davis
27 Jun 2019
5 min read
Update: A day after a user found a bug on Stack Overflow's website, Nick Craver, the Architecture Lead for Stack Overflow, posted an update on the investigation. He says the fingerprinting issue stems from ads relayed through third-party providers. Stack Overflow has been reaching out to experts and the Google Chrome security team, and has also filed a bug in the Chrome tracker. Stack Overflow has contacted Google, their ad server, for assistance, and is testing deployment of SafeFrame to all ads. The SafeFrame API configures whether all ads on the page are forced to render inside a SafeFrame container. Stack Overflow is also trying to deploy the Feature-Policy header to block access to most browser features from all components in the page. Craver also specified in the update that Stack Overflow has decided not to turn off these ad campaigns immediately, as they need the repro to fix these issues.

A user by the name greggman discovered a bug on Stack Overflow. While working in his browser's devtools on the site, he noticed the following message:

Image source: Stack Overflow Meta website

greggman then raised the query "Why is Stack Overflow trying to start audio?" on the Stack Overflow Meta website, which is intended for bugs, features, and discussion of Stack Overflow by its users. He found that the message appears whenever a particular ad is shown on the website. The ad is from Microsoft via Google.

Image source: Stack Overflow Meta website

Later, another user, TylerH, investigated and revealed some intriguing information about the identified bug. He found that the Google ad was employing the audio API to collect information from the user's browser, in an attempt to fingerprint it. He says, "This isn't general speculation, I've spent the last half hour going through the source code linked above, and it goes to considerable lengths to de-anonymize viewers. Your browser may be blocking this particular API, but it's not blocking most of the data."

TylerH claims that this fingerprint tracking of users is definitely not done for legitimate feature detection. He adds that the technique is applied in aggregate to generate a user fingerprint, which is included along with the advertising ID when recording analytics for the publisher. This is done to detect the following:

  • Users' system resolution and accessibility settings
  • The audio API capabilities supported by the user's browser
  • The mobile browser-specific APIs supported by the user's browser

TylerH states that this bug can detect many other details about the user without the user's consent. Hence he issues a warning to all Stack Overflow users: "Use an Ad blocker!"

As both these findings gained momentum on the Stack Overflow Meta website, Nick Craver, the Architecture Lead for Stack Overflow, replied to greggman and TylerH, "Thanks for letting us know about this. We are aware of it. We are not okay with it." Craver also mentioned that Stack Overflow had reached out to Google to obtain their support. He also notified users that "This is not related to ads being tested on the network and is a distinctly separate issue. Programmatic ads are not being tested on Stack Overflow at all."

Users are annoyed at this response from Craver. Many are not ready to believe that the Architecture Lead for Stack Overflow had no idea about this and is only now going to work on it. A user on Hacker News comments that this response from Craver "encapsulates the entire problem with the current state of digital advertising in 1 simple sentence." A few users feel this is not surprising at all, as most websites use ads as tracking mechanisms. A HN user says, "Audio feature detection isn't even a novel technique. I've seen trackers look at download stream patterns to detect whether or not BBR congestion control is used, I have seen mouse latency based on the difference between mouse ups and downs in double clicks and I have seen speed-of-interaction checks in mouse movements."

Another comment reads, "I think ad blocking is a misnomer. What people are trying to do when blocking ads is prevent marketing people from spying on them. And the performance and resource consumption that comes from that. Personal opinion: Laws are needed to make what advertisers are doing illegal. Advertisers are spying on people to the extent where if the government did it they'd need a warrant."

Another user thinks the situation is not that bad, with Stack Overflow at least taking responsibility for the bug. The user on Hacker News wrote, "Let's be adults here. This is SO, and I imagine you've used and enjoyed the use of their services just like the rest of us. Support them by letting passive ads sit on the edge of the page, and appreciate that they are actually trying to solve this issue."

Approx. 250 public network users affected during Stack Overflow's security attack
Stack Overflow confirms production systems hacked
Facebook again, caught tracking Stack Overflow user activity and data
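None of the signals TylerH lists identifies a user on its own; it is their aggregation that yields a fingerprint. Below is a minimal Python sketch of that aggregation idea. The signal names and the hashing step are illustrative assumptions, not code from the actual ad script:

```python
import hashlib
import json

def fingerprint(signals: dict) -> str:
    """Hash a canonical JSON form of the collected signals into one stable ID."""
    canonical = json.dumps(signals, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical signals of the kind TylerH describes: screen resolution and
# accessibility settings, audio API capabilities, mobile-specific API support.
browser_a = {"resolution": "1920x1080", "audio_api": True, "touch_events": False}
browser_b = {"resolution": "1920x1080", "audio_api": True, "touch_events": True}

# Identical signal sets always map to the same ID; a single differing
# capability is enough to separate two otherwise similar browsers.
print(fingerprint(browser_a) == fingerprint(browser_b))  # False
```

The more distinct signals a tracker probes, the fewer browsers share the same combination, which is why probing an obscure API like audio is valuable to an advertiser even when the probe fails.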

Elastic Stack 7.2.0 releases Elastic SIEM and general availability of Elastic App Search

Vincy Davis
27 Jun 2019
4 min read
Yesterday, the team behind Elastic Stack announced the release of Elastic Stack 7.2.0. The major highlight of this release is the free availability of Elastic SIEM (Security Information and Event Management) as part of Elastic's default distribution. The Elastic SIEM app provides interactivity, ad hoc search, and responsive drill-downs, packaged into an intuitive product experience. Elastic Stack 7.2.0 also makes Elastic App Search freely available to users; until now it was available only as a hosted service. With this release, Elastic has also advanced its Kubernetes and container monitoring initiative to include monitoring of the NATS open source messaging system and CoreDNS, and to support CRI-O format container logs.

https://youtu.be/bmx13X87e2s

What is Elastic SIEM?

The SIEM app is an interactive UI workspace for security teams to triage events and perform initial investigations. It provides a Timeline Event Viewer which allows analysts to gather and store evidence of an attack, pin and comment on relevant events, and share their findings, all from within Kibana (an open source data visualization plugin for Elasticsearch). Elastic SIEM is being introduced as a beta in the 7.2 release of the Elastic Stack.

Image source: Elastic blog

The Elastic SIEM app enables analysis of host-related and network-related security events as part of alert investigations or interactive threat hunting, including the following:

  • The Hosts view provides key metrics regarding host-related security events, and a set of data tables that enable interaction with the Timeline Event Viewer.
  • The Network view informs analysts of key network activity metrics, facilitates investigation-time enrichment, and provides network event tables that enable interaction with the Timeline Event Viewer.
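Views like the Hosts view ultimately boil down to Elasticsearch queries. A hypothetical sketch of the kind of filter-plus-aggregation request such a view might issue is below; the host name and field names are illustrative assumptions loosely following the Elastic Common Schema, not the SIEM app's actual source:

```python
import json

# A filter-plus-aggregation request: restrict to one host over the last day,
# then bucket matching security events by their action.
query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"host.name": "web-01"}},              # one host
                {"range": {"@timestamp": {"gte": "now-24h"}}},  # last 24 hours
            ]
        }
    },
    "aggs": {"by_action": {"terms": {"field": "event.action", "size": 10}}},
    "size": 0,  # return only aggregation buckets, no raw hits
}

print(json.dumps(query, indent=2))
```

A real deployment would POST a body like this to an index's `_search` endpoint; the SIEM app assembles comparable filters from the objects analysts drag into the Timeline Event Viewer.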
Analysts can easily drag objects of interest into the Timeline Event Viewer to create the query filter needed to get to the bottom of an alert. With auto-saving, the results of an investigation remain available for incident response teams. Elastic SIEM is available on the Elasticsearch Service on Elastic Cloud, or for download. Since this is a major feature of Elastic Stack, it has got people quite excited.

https://twitter.com/cbnetsec/status/1143661272594096128
https://twitter.com/neu5ron/status/1143623893476958208
https://twitter.com/netdogca/status/1143581280837107714
https://twitter.com/tommyyyyyyyy/status/1143791589325725696

General availability of Elastic App Search on-premise

With the Elastic Stack 7.2.0 version, the Elastic App Search product becomes freely available as a downloadable, self-managed search solution. Though Elastic App Search has been around for over a decade as a cloud-based solution, Elastic users will now have greater flexibility to build fluid and engaging search experiences. As part of this release, the following services will be offered in downloadable form:

  • Simple and focused data ingestion
  • Powerful search APIs and UI frameworks
  • Insightful analytics
  • Intuitive relevance controls

Elastic Stack 7.2.0 also introduces the Metrics Explorer, which enables users to quickly visualize the most important infrastructure metrics and interact with them using common tags and chart groupings inside the Infrastructure app. With this feature, users can create a chart and see it on the dashboard.

Other highlights

  • Elasticsearch simplifies search-as-you-type, adds a UI around snapshot/restore, gives more control over relevance without sacrificing performance, and much more.
  • Kibana makes it even easier to build a secure, multi-tenant Kibana instance with advanced RBAC for Spaces. Elastic Stack 7.2.0 also introduces a kiosk mode for Canvas, and maps created in the new Maps app can now be embedded in any Kibana dashboard. There are also new easy-on-the-eyes dark-mode map tiles and much more.
  • Beats improves edge-based processing with a new JavaScript processor, and more.
  • Logstash gets faster, with the Java execution pipeline going GA. It now fully supports JMS as an input and output, and more.

Users are very impressed with the features introduced in Elastic Stack 7.2.0.

https://twitter.com/mikhail_khusid/status/1143695869411307526
https://twitter.com/markcartertm/status/1143652867284189184

Visit the Elastic blog for more details.

Core security features of Elastic Stack are now free!
Elasticsearch 7.0 rc1 releases with new allocation and security features
Elastic Stack 6.7 releases with Elastic Maps, Elastic Update and much more!

Introducing TensorWatch, a debugging and visualization tool

Amrata Joshi
26 Jun 2019
3 min read
Yesterday, the team at Microsoft introduced TensorWatch, an open source debugging and visualization tool designed for deep learning, data science, and reinforcement learning.

https://twitter.com/MSFTResearch/status/1143574610820026368

TensorWatch works in Jupyter Notebook and shows real-time visualization of machine learning training. It can also perform several key analysis tasks for models and data. It is flexible and extensible, so users can build their own custom visualizations, UIs, and dashboards. It can execute arbitrary queries against a live ML training process, return a stream as the result of the query, and view this stream using a visualizer. TensorWatch is under development and aims to provide a platform for debugging machine learning in an easy to use, extensible, and hackable package. The official blog post reads, "We like to think of TensorWatch as the Swiss Army knife of debugging tools with many advanced capabilities researchers and engineers will find helpful in their work. We presented TensorWatch at the 2019 ACM SIGCHI Symposium on Engineering Interactive Computing Systems."

Key features of TensorWatch

Easy customization and visualizations

TensorWatch uses Jupyter Notebook instead of prepackaged user interfaces that are often difficult to customize. It provides interactive debugging of real-time training processes using either a composable UI in Jupyter Notebooks or live shareable dashboards in Jupyter Lab. As TensorWatch is a Python library, users can build their own custom UIs or use TensorWatch within the vast Python data science ecosystem. It supports several standard visualization types, including histograms, bar charts, pie charts, and 3D variations.

Streams

In TensorWatch's architecture, data and other objects such as files, console, sockets, cloud storage, and even visualizations themselves are treated as streams. TensorWatch streams can listen to other streams, which leads to the creation of custom data-flow graphs and allows users to implement a variety of advanced scenarios. The blog post reads, "For example, you can render many streams into the same visualization, or one stream can be rendered in many visualizations simultaneously, or a stream can be persisted in many files, or not persisted at all."

Lazy logging mode

With TensorWatch, the team introduced a lazy logging mode which doesn't require explicit logging of all the information beforehand. TensorWatch lets users observe and track variables, including large models or entire batches, during training. Users can perform interactive queries that run in the context of these variables and return streams as results. The blog reads, "For example, you can write a lambda expression that computes mean weight gradients in each layer in the model at the completion of each batch and send the result as a stream of tensors that can be plotted as a bar chart."

Users seem to be excited about this news, as TensorWatch will help visualize streams of data in real time.

https://twitter.com/CSITsites/status/1143735826028908544
https://twitter.com/alxndrkalinin/status/1136386187336269834
https://twitter.com/RitchieNg/status/1133678155015704576

To know more about this news, check out Microsoft's blog post.

Docker and Microsoft collaborate over WSL 2, future of Docker Desktop for Windows is near
Microsoft finally makes Hyper-V Server 2019 available, after a delay of more than six months
Microsoft quietly deleted 10 million faces from MS Celeb, the world's largest facial recognition database
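The stream model is easy to picture with a toy implementation. The class below is a conceptual mimic of the idea described in the blog post, not the tensorwatch API: a derived stream listens to a source stream and applies a query (here, a lambda reducing each gradient batch to its mean, in the spirit of the lazy-logging example):

```python
class Stream:
    """A toy stream: values written to it are optionally transformed and
    forwarded to every listening stream, forming a small data-flow graph."""

    def __init__(self, transform=None):
        self.transform = transform
        self.listeners = []
        self.values = []

    def listen(self, source):
        source.listeners.append(self)
        return self

    def write(self, value):
        if self.transform is not None:
            value = self.transform(value)
        self.values.append(value)
        for listener in self.listeners:
            listener.write(value)

# A raw stream of per-batch weight gradients, and a derived stream whose
# "query" reduces each batch to its mean.
grads = Stream()
mean_grads = Stream(transform=lambda batch: sum(batch) / len(batch)).listen(grads)

grads.write([1.0, 2.0, 3.0])
grads.write([4.0, 6.0])
print(mean_grads.values)  # [2.0, 5.0]
```

The same mechanism is what lets one stream feed many visualizations, or one visualization render many streams, as the blog post describes: each is simply another listener.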

Apache Kafka 2.3 is here! 

Vincy Davis
26 Jun 2019
3 min read
Two days ago, the Apache Kafka team released the latest version of their open source distributed data streaming software, Apache Kafka 2.3. This release brings several improvements to Kafka Core, Kafka Connect, and the Kafka Streams API. It adds a new maximum log compaction lag, improves monitoring for partitions, improves fairness in SocketServer processors, and much more.

What's new in Apache Kafka 2.3?

Kafka Core

Reduced the amount of time the broker spends scanning log files: this release optimizes log recovery so that Kafka checks only its unflushed log segments. In earlier versions, the time required for log recovery was proportional to the total number of log segments; with Kafka 2.3 it is proportional to the number of unflushed log segments, yielding up to a 50% reduction in broker startup time.

Improved monitoring for partitions which have lost replicas: this release adds metrics showing partitions that have exactly the minimum number of in-sync replicas. By monitoring these metrics, users can spot partitions that are on the verge of becoming under-replicated. Also, the --under-min-isr command line flag has been added to the kafka-topics command, letting users easily see which topics have fewer than the minimum number of in-sync replicas.

Added a maximum log compaction lag: in earlier versions, once the latest value for a key was written, the previous values for that key would, to a first-order approximation, get compacted away after some time, but with no upper bound on when. With this release, it is possible to set the maximum amount of time an old value may stick around: the new parameter max.compaction.lag.ms specifies how long an old value may live in a compacted topic. This helps Apache Kafka comply with data retention regulations such as the GDPR.

Improved fairness in SocketServer processors: Apache Kafka 2.3 prioritizes existing connections over new ones, improving the broker's resilience to connection storms. It also adds a per-broker max.connections setting. Core Kafka has also improved failure handling in the replica fetcher.

Kafka Connect

Incremental cooperative rebalancing: in Kafka Connect, worker tasks are distributed among the available worker nodes. When a connector is reconfigured or a new connector is deployed, as well as when a worker is added or removed, the tasks must be rebalanced across the Connect cluster to ensure that all of the worker nodes do a fair share of the Connect work. With Kafka 2.3, these configuration changes become less disruptive. Kafka Connect has also added connector contexts to Connect worker logs.

Kafka Streams

Record timestamps in RocksDB: Kafka Streams now includes timestamps in the state store. This lays the groundwork for future features like handling out-of-order messages in KTables and implementing TTLs for KTables.

In-memory window store and session store: this release adds in-memory implementations of the Kafka Streams window store and session store. The in-memory implementations provide higher performance in exchange for a lack of persistence to disk. Kafka Streams has also added KStream.flatTransform and KStream.flatTransformValues.

https://twitter.com/apachekafka/status/1138872848678653952

These are some selected updates; head over to the Apache blog for more details.

Amazon Managed Streaming for Apache Kafka (Amazon MSK) is now generally available
Confluent, an Apache Kafka service provider adopts a new license to fight against cloud service providers
Twitter adopts Apache Kafka as their Pub/Sub System
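The new compaction bound is simple to state: an old value may survive in a compacted topic for at most the configured lag. A sketch of the eligibility check, as I understand the semantics; the helper name and the 7-day figure are illustrative, not Kafka source code:

```python
MAX_COMPACTION_LAG_MS = 7 * 24 * 60 * 60 * 1000  # e.g. a 7-day retention bound

def must_compact(segment_created_ms: int, now_ms: int,
                 max_lag_ms: int = MAX_COMPACTION_LAG_MS) -> bool:
    """True when a dirty log segment has exceeded the maximum compaction lag
    and must therefore be picked up by the log cleaner."""
    return now_ms - segment_created_ms > max_lag_ms

now = 10 * 24 * 60 * 60 * 1000        # pretend "now" is day 10
print(must_compact(0, now))           # day-0 segment is past the 7-day bound
print(must_compact(now - 1000, now))  # a 1-second-old segment may still wait
```

This bounded delay is what makes the feature useful for GDPR-style retention: operators can promise that superseded values disappear from a compacted topic within a known window instead of "eventually".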

How Verizon and a BGP Optimizer caused a major internet outage affecting Amazon, Facebook, CloudFlare among others

Savia Lobo
25 Jun 2019
5 min read
Yesterday, many parts of the Internet faced an unprecedented outage as Verizon, the popular Internet transit provider, accidentally rerouted IP packets after it wrongly accepted a network misconfiguration from a small ISP in Pennsylvania, USA. According to The Register, "systems around the planet were automatically updated, and connections destined for Facebook, Cloudflare, and others, ended up going through DQE and Allegheny, which buckled under the strain, causing traffic to disappear into a black hole".

According to Cloudflare, "What exacerbated the problem today was the involvement of a 'BGP Optimizer' product from Noction. This product has a feature that splits up received IP prefixes into smaller, contributing parts (called more-specifics). For example, our own IPv4 route 104.20.0.0/20 was turned into 104.20.0.0/21 and 104.20.8.0/21".

Many Google users were unable to access the web using the Google browser, and some report that Google Calendar went down too. Amazon users were also unable to use some services, such as Amazon books, as they could not reach the site.

Image source: Downdetector

In a separate incident on June 6, more than 70,000 BGP routes were leaked from Swiss colocation company Safe Host to China Telecom in Frankfurt, Germany, which then announced them on the global internet. "This resulted in a massive rerouting of internet traffic via China Telecom systems in Europe, disrupting connectivity for netizens: a lot of data that should have gone to European cellular networks was instead piped to China Telecom-controlled boxes", The Register reports.

How BGP turned a misconfiguration into an outage

The Internet is made up of networks called Autonomous Systems (AS), each with a unique identifier called an AS number. All these networks are interconnected using the Border Gateway Protocol (BGP), which joins them together and enables traffic to travel from, say, an ISP to a popular website at a far-off location.

Image source: Cloudflare

With the help of BGP, networks exchange route information that can either be specific, similar to finding a specific city on your GPS, or very general, like pointing your GPS at a state. DQE Communications (AS33154), an Internet Service Provider in Pennsylvania, was using a BGP optimizer in its network. It announced these more-specific routes to its customer, Allegheny Technologies Inc (AS396531), a steel company based in Pittsburgh. This entire routing information was sent to Verizon (AS701), which accepted it and passed it on to the world. "Verizon's lack of filtering turned this into a major incident that affected many Internet services", Cloudflare mentions.

What this meant is that Verizon, Allegheny, and DQE suddenly had to deal with a stampede of Internet users trying to access those services through their networks. None of these networks were suitably equipped to deal with this drastic increase in traffic, causing disruption in service.

Job Snijders, an internet architect for NTT Communications, wrote in a network operators' mailing list, "While it is easy to point at the alleged BGP optimizer as the root cause, I do think we now have observed a cascading catastrophic failure both in process and technologies."

https://twitter.com/bgpmon/status/1143149817473847296

Cloudflare CTO John Graham-Cumming told El Reg's Richard Speed, "A customer of Verizon in the US started announcing essentially that a very large amount of the internet belonged to them. For reasons that are a bit hard to understand, Verizon decided to pass that on to the rest of the world." "But normally [a large ISP like Verizon] would filter it out if some small provider said they own the internet", he further added.
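The "more-specifics" splitting Cloudflare describes is plain prefix arithmetic, and it matters because routers prefer the most specific matching prefix. A quick illustration with Python's standard ipaddress module, using the exact route from Cloudflare's example:

```python
import ipaddress

# Cloudflare's example: the BGP optimizer split the announced /20 into two
# /21 "more-specifics".
route = ipaddress.ip_network("104.20.0.0/20")
more_specifics = list(route.subnets(prefixlen_diff=1))
print([str(n) for n in more_specifics])  # ['104.20.0.0/21', '104.20.8.0/21']

# Longest-prefix match: an address in 104.20.8.0/21 matches both the /20 and
# one /21, but the /21 is more specific and therefore wins route selection.
addr = ipaddress.ip_address("104.20.9.1")
matches = [n for n in more_specifics + [route] if addr in n]
best = max(matches, key=lambda n: n.prefixlen)
print(str(best))  # 104.20.8.0/21
```

Because the synthetic /21s are more specific than the legitimate /20, every router that accepted them steered Cloudflare-bound traffic toward the leaking networks, which is exactly what happened once Verizon propagated the routes.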
"If Verizon had used RPKI, they would have seen that the advertised routes were not valid, and the routes could have been automatically dropped by the router", Cloudflare said.

https://twitter.com/eastdakota/status/1143182575680143361
https://twitter.com/atoonk/status/1143139749915320321

Rerouting is highly dangerous, as criminals, hackers, or government spies could be lurking around to grab such a free flow of data. It also raises security concerns among users, as rerouted data can be used for surveillance, disruption, and financial theft.

Cloudflare was majorly affected by this outage: "It is unfortunate that while we tried both e-mail and phone calls to reach out to Verizon, at the time of writing this article (over 8 hours after the incident), we have not heard back from them, nor are we aware of them taking action to resolve the issue", the company said in their blog post.

One user commented, "BGP needs a SERIOUS revamp with Security 101 in mind.....RPKI + ROA's is 100% needed and the ISPs need to stop being CHEAP. Either build it by Federal Requirement, at least in the Nation States that take their internet traffic as Citizen private data or do it as Internet 3.0 cause 2.0 flaked! Either way, "Path Validation" is another component of BGP that should be looked at but honestly, that is going to slow path selection down and to instrument it at a scale where the internet would benefit = not worth it and won't happen. SMH largest internet GAP = BGP "accidental" hijacks"

Verizon, in a statement to The Register, said, "There was an intermittent disruption in internet service for some [Verizon] FiOS customers earlier this morning. Our engineers resolved the issue around 9 am ET."

https://twitter.com/atoonk/status/1143145626516914176

To know more about this news in detail, head over to Cloudflare's blog.

OpenSSH code gets an update to protect against side-channel attacks
Red Badger Tech Director Viktor Charypar talks monorepos, lifelong learning, and the challenges facing open source software [Interview]
Facebook signs on more than a dozen backers for its GlobalCoin cryptocurrency including Visa, Mastercard, PayPal and Uber

CMU and Google researchers present XLNet: a new pre-training method for language modeling that outperforms BERT on 20 tasks

Amrata Joshi
25 Jun 2019
7 min read
Last week, Carnegie Mellon University (CMU) and Google researchers presented the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding, which introduces the XLNet model.

https://twitter.com/quocleix/status/1141511813709717504

In the paper, the researchers explain XLNet and how it uses a permutation language modeling objective to combine the advantages of AR and AE methods. They compared XLNet with BERT and showed that XLNet surpasses BERT on 20 tasks using the RACE, SQuAD, and GLUE datasets.

What is the need for XLNet?

Among unsupervised pre-training objectives, autoregressive (AR) language modeling and autoencoding (AE) have been the two most successful. AR language modeling estimates the probability distribution of a text corpus with an autoregressive model. Such a model is trained only to encode a unidirectional context and is not effective at modeling deep bidirectional contexts. But downstream language understanding tasks usually need bidirectional context information, which creates a gap between AR language modeling and effective pretraining.

In contrast, AE-based pretraining does not perform density estimation but instead works toward reconstructing the original data from corrupted input. As density estimation is not part of the objective, BERT can utilize bidirectional contexts for reconstruction, which closes the bidirectional information gap of AR language modeling and improves performance. BERT (Bidirectional Encoder Representations from Transformers) achieves better performance than pretraining approaches based on autoregressive language modeling. But it relies on corrupting the input with masks, neglects dependencies between the masked positions, and suffers from a pretrain-finetune discrepancy.

Considering these pros and cons, the researchers from CMU and Google proposed XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order, and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. XLNet also integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. It outperforms BERT on 20 tasks, usually by a large margin, and achieves state-of-the-art results on 18 tasks, including question answering, sentiment analysis, natural language inference, and document ranking. The researchers observed that naively applying a Transformer(-XL) architecture to permutation-based language modeling does not work, because the factorization order is random and the target is ambiguous. To solve this, they proposed reparameterizing the Transformer(-XL) network to remove the ambiguity.

https://twitter.com/rsalakhu/status/1141539269565132800?s=19

XLNet compared with BERT

Comparing with BERT, the researchers observed that both BERT and XLNet perform partial prediction, that is, predicting only a subset of tokens in the sequence. This is necessary for BERT because if all tokens were masked, it would be impossible to make any meaningful predictions. Partial prediction also reduces optimization difficulty for both BERT and XLNet by predicting tokens with sufficient context. XLNet improves the architectural designs for pretraining and improves performance on tasks involving longer text sequences. XLNet does not rely on data corruption, so it does not suffer from the pretrain-finetune discrepancy that BERT does. The autoregressive objective also provides a natural way to use the product rule to factorize the joint probability of the predicted tokens, eliminating the independence assumption made in BERT.

XLNet maximizes the expected log-likelihood of a sequence with respect to all possible permutations of the factorization order, instead of using a fixed forward or backward factorization order. According to the researchers, given a text sequence x = [x1, · · · , xT], BERT factorizes the joint conditional probability p(x̄ | x̂) based on the assumption that all masked tokens x̄ are reconstructed independently of each other. The researchers call this the independence assumption, and according to them it prevents BERT from modeling dependencies between targets.

The researchers explain the difference between XLNet and BERT with an example: "Let's consider a concrete example [New, York, is, a, city]. Suppose both BERT and XLNet select the two tokens [New, York] as the prediction targets and maximize log p(New York | is a city). Also suppose that XLNet samples the factorization order [is, a, city, New, York]. In this case, BERT and XLNet respectively reduce to the following objectives: JBERT = log p(New | is a city) + log p(York | is a city), JXLNet = log p(New | is a city) + log p(York | New, is a city). Notice that XLNet is able to capture the dependency between the pair (New, York), which is omitted by BERT." In this example BERT does learn some dependency pairs, such as (New, city) and (York, city), but the researchers conclude that XLNet always learns more dependency pairs given the same targets and thus contains "denser" effective training signals, which offer better performance.

XLNet compared with language modeling

According to the researchers, a standard AR language model like GPT is only able to cover the dependency (x = York, U = {New}) but not (x = New, U = {York}). XLNet, on the other hand, covers both in expectation over all factorization orders. This limitation of AR language modeling can be a critical issue in real-world applications.
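The quoted [New, York] objectives can be checked with toy numbers. All probabilities below are invented for illustration; only the structure of the two objectives comes from the paper:

```python
import math

# Toy conditional probabilities for the [New, York, is, a, city] example.
p_new = 0.2             # p(New  | is a city)
p_york = 0.1            # p(York | is a city)
p_york_given_new = 0.8  # p(York | New, is a city)

# BERT's independence assumption: both targets conditioned only on context.
j_bert = math.log(p_new) + math.log(p_york)

# XLNet with factorization order [is, a, city, New, York]: the prediction of
# "York" may also condition on the earlier target "New".
j_xlnet = math.log(p_new) + math.log(p_york_given_new)

print(j_xlnet > j_bert)  # True: modeling the (New, York) dependency helps
```

Whenever "New" genuinely predicts "York" (p_york_given_new > p_york), the autoregressive factorization assigns the pair a higher log-likelihood than the independent one, which is the "denser training signal" the authors describe.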
The researchers concluded that AR language modeling is not able to cover the dependency but XLNet is able to cover all dependencies in expectation. There has always been a gap between language modeling and pretraining because of the lack of the capability of bidirectional context modeling. But XLNet generalizes language modeling and bridges the gap. Implementation and conclusion The researchers used the BooksCorpus and English Wikipedia as part of their pre-training data, which contains 13GB plain text combined. They experimented on four datasets including RACE dataset, SQuAD dataset, ClueWeb09-B Dataset, and GLUE dataset. “They further studied three major aspects: The effectiveness of the permutation language modeling objective, especially compared to the denoising auto-encoding objective used by BERT.  The importance of using Transformer-XL as the backbone neural architecture and employing segment-level recurrence (i.e. using memory). The necessity of some implementation details including span-based prediction, the bidirectional input pipeline, and next-sentence prediction.” The researchers concluded that XLNet is a generalized AR pre-training method and it uses a permutation language modeling objective for combining the advantages of AR and AE methods. According to them, the neural architecture of XLNet is developed to work seamlessly with the AR objective that integrates Transformer-XL. It also achieves state-of-the-art results in various tasks with improvement. The paper reads, “In the future, we envision applications of XLNet to a wider set of tasks such as vision and reinforcement learning.” A lot of users seem to be excited about this news and they think it can get even better. One of the users commented on Reddit, “The authors are currently trying to see the text generation capability of XLNet. 
If they confirm that it's on par with left-to-right model (hence better than BERT), then their work would be even more impressive." Others think it would be better if the researchers used more diverse datasets for experimentation. Another user commented, "The result seems to me as if the substantial improvement in this setting is coming mostly from the use of Transformer-XL (i.e. larger context size). Probably using more data and greater context size (and more diverse dataset) is far more important than doing anything else proposed in the paper." Many others are excited about this research and think that XLNet is better than BERT.
https://twitter.com/eturner303/status/1143174828804857856
https://twitter.com/ST4Good/status/1143182779460608001
https://twitter.com/alex_conneau/status/1141489936022953984
To know more about this, check out the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding.
Curl's lead developer announces Google's "plan to reimplement curl in Libcrurl"
Google rejects all 13 shareholder proposals at its annual meeting, despite protesting workers
Google Calendar was down for nearly three hours after a major outage
Bipartisan US legislators introduce the Dashboard act to force big tech to disclose their user data monetization practices

Sugandha Lahoti
25 Jun 2019
4 min read
Bipartisan Senators Mark Warner and Josh Hawley introduced a new bill on Monday that requires Facebook, Google, Amazon, and other major platforms to disclose the value of their users' data. Called the DASHBOARD Act (Designing Accounting Safeguards to Help Broader Oversight and Regulations on Data), the bill would force companies (services with over 100 million monthly active users) to regularly disclose to consumers the ways in which their data is being used, the third parties it is being shared with, and what their data is worth to the platform. The tech companies would have to undergo an assessment of the data's value once every 90 days and file an annual report with the Securities and Exchange Commission.
https://twitter.com/SenHawleyPress/status/1143177612786880519
The use of personal data for monetization purposes by tech companies has been a bone of contention for governments and activists. Consumers lack the transparency needed to fully understand the terms of the exchange and to decide for themselves whether they are getting a fair deal from the platform companies that monetize their data. This also serves as a major obstacle for agencies like the Federal Trade Commission (FTC) seeking to address competitive and consumer harms. "For years, social media companies have told consumers that their products are free to the user. But that's not true — you are paying with your data instead of your wallet," Warner said in a statement. "But the overall lack of transparency and disclosure in this market have made it impossible for users to know what they're giving up, who else their data is being shared with, or what it's worth to the platform," he added.
The bill suggests the following amendments:
Require commercial data operators (defined as services with over 100 million monthly users) to disclose the types of data collected as well as regularly provide their users with an assessment of the value of that data.
Require commercial data operators to file an annual report on the aggregate value of user data they've collected, as well as contracts with third parties involving data collection.
Require commercial data operators to allow users to delete all, or individual fields of, collected data, and to disclose to users all the ways in which their data is being used.
Empower the SEC (Securities and Exchange Commission) to develop methodologies for calculating data value, while encouraging the agency to facilitate flexibility to enable businesses to adopt methodologies that reflect the different uses, sectors, and business models.
With this bill, the senators want to serve three important goals. Consumers will be able to determine the true value of the data they are providing to platforms. Making the value more transparent could increase competition by attracting competitors to the market. Disclosing the economic value of consumer data will also assist antitrust enforcers in identifying unfair and anticompetitive transactions and practices. Public opinion on the bill was largely appreciative, with people calling it the right move to protect user data.
https://twitter.com/davidshepardson/status/1142936991790768128
https://twitter.com/profcarroll/status/1142975892442025985
However, some find it insufficient. Lindsey Barrett, a staff attorney at Georgetown Law's Institute for Public Representation Communications and Technology Clinic, noted that greater transparency might not change tech companies' practices.
https://twitter.com/LAM_Barrett/status/1142942803716182016
She also questioned how people are to marshal the information this bill would give them into better decision-making.
https://twitter.com/LAM_Barrett/status/1143169637732904960
ITIF also argues that Hawley gets "paying" with data wrong.
https://twitter.com/ITIFdc/status/1143194101204094977
Prior to the DASHBOARD Act, Senator Hawley introduced the Do Not Track Act last month.
The Do Not Track Act would prohibit web companies from collecting more data than they need to operate their services. Per the act, "first parties" (meaning sites users intentionally visit, like Amazon or Google's search engine) would be prohibited from collecting or sharing data for ad targeting when they encounter users who have activated do-not-track. The act is modeled after the Federal Trade Commission's (FTC) "Do Not Call" list and would allow users to opt out of non-essential data collection. The bill has been introduced in Congress and is up for consideration by the Senate. Last month, DuckDuckGo, the search engine known for its privacy protection policies, also proposed draft legislation that would require sites to respect the Do Not Track browser setting. In March, presidential candidate Elizabeth Warren also proposed regulating "big tech companies" by breaking up Google and Facebook. A section-by-section summary of the DASHBOARD Act is available here. Bill text is available here.
UK's data protection regulator ICO releases report, concludes that adtech industry operates illegally
Facebook fails to block ECJ data security case from proceeding
Experts present most pressing issues facing global lawmakers on citizens' privacy and rights to freedom of speech.
Now there is a Deepfake that can animate your face with just your voice and a picture using temporal GANs

Savia Lobo
24 Jun 2019
6 min read
Last week, researchers from Imperial College London and Samsung's AI research center in the UK revealed how deepfakes can be used to generate a singing or talking video portrait from a still image of a person and an audio clip containing speech. In their paper titled "Realistic Speech-Driven Facial Animation with GANs", the researchers used a temporal GAN with 3 discriminators focused on achieving detailed frames, audio-visual synchronization, and realistic expressions. Source: arxiv.org "The generated videos are evaluated based on sharpness, reconstruction quality, lip-reading accuracy, synchronization as well as their ability to generate natural blinks", the researchers mention in their paper.
https://youtu.be/9Ctm4rTdVTU
The researchers used the GRID, TCD TIMIT, CREMA-D, and LRW datasets. The GRID dataset has 33 speakers each uttering 1000 short phrases, containing 6 words randomly chosen from a limited dictionary. The TCD TIMIT dataset has 59 speakers uttering approximately 100 phonetically rich sentences each. The CREMA-D dataset includes 91 actors from a variety of age groups and races, each uttering 12 sentences; each sentence is acted out multiple times for different emotions and intensities. The researchers used the recommended data split for the TCD TIMIT dataset but excluded some of the test speakers and used them as a validation set. They performed data augmentation on the training set by mirroring the videos.

Metrics used to assess the quality of generated videos
The researchers evaluated the videos using traditional image reconstruction and sharpness metrics. These metrics can be used to determine frame quality; however, they fail to reflect other important aspects of the video such as audio-visual synchrony and the realism of facial expressions. Hence they have also proposed alternative methods capable of capturing these aspects of the generated videos.
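The mirroring augmentation mentioned above amounts to horizontally flipping every frame, which doubles the training data. A minimal sketch, assuming videos are NumPy arrays laid out as (frames, height, width, channels) — the layout is an illustrative assumption, not the paper's code:

```python
import numpy as np

def mirror_video(video):
    """Horizontally flip every frame of a (frames, height, width, channels) array."""
    return video[:, :, ::-1, :]  # reverse the width axis of each frame

video = np.arange(12).reshape(2, 2, 3, 1)  # 2 frames of 2x3 single-channel pixels
mirrored = mirror_video(video)

# The leftmost column of each frame becomes the rightmost one.
assert (mirrored[:, :, 0, :] == video[:, :, -1, :]).all()
```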
Reconstruction Metrics
This method uses common reconstruction metrics such as the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) index to evaluate the generated videos. However, the researchers note that "reconstruction metrics will penalize videos for any facial expression that does not match those in the ground truth videos".

Sharpness Metrics
The frame sharpness is evaluated using the cumulative probability blur detection (CPBD) measure, which determines blur based on the presence of edges in the image. For this metric, as for the reconstruction metrics, larger values imply better quality.

Content Metrics
The content of the videos is evaluated based on how well the video captures the identity of the target and on the accuracy of the spoken words. The researchers verified the identity of the speaker using the average content distance (ACD), which measures the average Euclidean distance between the still image representation, obtained using OpenFace, and the representations of the generated frames. The accuracy of the spoken message is measured using the word error rate (WER) achieved by a pre-trained lip-reading model. They used the LipNet model, which exceeds the performance of human lip-readers on the GRID dataset. For both content metrics, lower values indicate better accuracy.

Audio-Visual Synchrony Metrics
Synchrony is quantified using the method of Joon Son Chung and Andrew Zisserman's "Out of time: automated lip sync in the wild". In this work, Chung et al. propose the SyncNet network, which calculates the Euclidean distance between the audio and video encodings on small (0.2 second) sections of the video. The audio-visual offset is obtained by using a sliding window approach to find where the distance is minimized. The offset is measured in frames and is positive when the audio leads the video. For audio and video pairs that correspond to the same content, the distance will increase on either side of the point where the minimum distance occurs.
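Two of these metrics are simple enough to sketch. The sketch below assumes plain NumPy arrays: in the paper, the offset search runs over learned SyncNet audio/video encodings, which are approximated here by ordinary vectors, so this is an illustration of the idea rather than the evaluation pipeline itself.

```python
import numpy as np

def psnr(reference, generated, max_val=255.0):
    """Peak signal-to-noise ratio between two frames (higher = better)."""
    mse = np.mean((reference.astype(float) - generated.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(max_val ** 2 / mse)

def av_offset(audio_emb, video_emb, max_offset=5):
    """Slide the audio embeddings against the video embeddings and return
    the offset (in frames) that minimizes the mean Euclidean distance."""
    distances = {}
    for off in range(-max_offset, max_offset + 1):
        pairs = [(audio_emb[i + off], video_emb[i])
                 for i in range(len(video_emb))
                 if 0 <= i + off < len(audio_emb)]
        distances[off] = np.mean([np.linalg.norm(a - v) for a, v in pairs])
    return min(distances, key=distances.get)

frame = np.zeros((4, 4))
assert psnr(frame, frame) == float("inf")  # identical frames: zero error

emb = np.arange(20.0).reshape(20, 1)       # stand-in for learned encodings
assert av_offset(emb, emb) == 0            # identical streams are in sync
```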
However, for uncorrelated audio and video, the distance is expected to be stable. Based on this fluctuation, they further propose using the difference between the minimum and the median of the Euclidean distances as an audio-visual (AV) confidence score which determines the audio-visual correlation. Higher scores indicate a stronger correlation, whereas confidence scores smaller than 0.5 indicate that the audio and video are uncorrelated.

Limitations and the possible misuse of Deepfake
The limitation of this new deepfake method is that it only works for well-aligned frontal faces. "The natural progression of this work will be to produce videos that simulate in wild conditions", the researchers mention. While this research appears to be the next milestone for GANs in generating videos from still photos, it may also be misused for spreading misinformation by morphing video content from any still photograph. Recently, at a House Intelligence Committee hearing, Top House Democrat Rep. Adam Schiff (D-CA) issued a warning on Thursday that deepfake videos could have a disastrous effect on the 2020 election cycle. "Now is the time for social media companies to put in place policies to protect users from this kind of misinformation not in 2021 after viral deepfakes have polluted the 2020 elections," Schiff said. "By then it will be too late." The hearing came only a few weeks after a real-life instance of a doctored political video, in which footage edited to make House Speaker Nancy Pelosi appear drunk spread widely on social media. "Every platform responded to the video differently, with YouTube removing the content, Facebook leaving it up while directing users to coverage debunking it, and Twitter simply letting it stand," The Verge reports. YouTube took the video down; Facebook, however, refused to remove it. Neil Potts, Public Policy Director of Facebook, had stated that if someone posted a doctored video of Zuckerberg, like the one of Pelosi, it would stay up.
After this, on June 11, a fake video of Mark Zuckerberg was posted on Instagram under the username bill_posters_uk. In the video, Zuckerberg appears to give a threatening speech about the power of Facebook.
https://twitter.com/motherboard/status/1138536366969688064
Omer Ben-Ami, one of the founders of Canny, says that the video was made to educate the public on the uses of AI and to make them realize its potential. Though the Zuckerberg video was meant to demonstrate the educational value of deepfakes, it also shows how the technology can be misused. Although some users say it has interesting applications, many are concerned that the chances of this software being misused outweigh the chances of it being put to good use.
https://twitter.com/timkmak/status/1141784420090863616
A user commented on Reddit, "It has some really cool applications though. For example in your favorite voice acted video game, if all of the characters lips would be in sync with the vocals no matter what language you are playing the game in, without spending tons of money having animators animate the characters for every vocalization." To know more about this new deepfake, read the official research paper.
Lawmakers introduce new Consumer privacy bill and Malicious Deep Fake Prohibition Act to support consumer privacy and battle deepfakes
Worried about Deepfakes? Check out the new algorithm that manipulate talking-head videos by altering the transcripts
Machine generated videos like Deepfakes – Trick or Treat?
Amazon patents AI-powered drones to provide ‘surveillance as a service’

Savia Lobo
21 Jun 2019
7 min read
At its first re:MARS event early this month, Amazon proposed plans to further digitize its delivery services by having AI-powered drones deliver orders. Amazon was granted a US patent on June 4 for these 'unmanned aerial vehicles (UAVs) or drones' to provide "surveillance as a service." The patent, which was filed on June 12, 2015, describes how Amazon's UAVs could keep an eye on customers' property between deliveries while supposedly maintaining their privacy. "The property may be defined by a geo-fence, which may be a virtual perimeter or boundary around a real-world geographic area. The UAV may image the property to generate surveillance images, and the surveillance images may include image data of objects inside the geo-fence and image data of objects outside the geo-fence," the patent states. A diagram from the patent shows how delivery drones could be diverted to survey a location. Source: USPTO According to The Telegraph, "The drones would look for signs of break-ins, such as smashed windows, doors left open, and intruders lurking on people's property. Anything unusual could then be photographed and passed on to the customer and the police". "Drones have long been used for surveillance, particularly by the military, but companies are now beginning to explore how they might be used for home security", The Verge reports. Amazon's competitor, Alphabet Inc.'s Wing, became the first drone operator to win FAA approval to operate as a small airline, in April. Amazon, however, has received approval to start making drone deliveries only in remote parts of the United States; the company says it hopes to launch a commercial service "in a matter of months." The drones could be programmed to trigger automated text or phone alerts if the system's computer-vision algorithms spot something that could be a concern. Those alerts might go to the subscriber, or directly to the authorities.
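The geo-fence the patent describes is, in essence, a polygon around the property, and deciding whether a point falls inside one is a textbook point-in-polygon test. The following ray-casting sketch is purely illustrative (the function name and coordinates are invented here), not Amazon's implementation:

```python
def inside_geofence(point, fence):
    """Ray-casting point-in-polygon test.
    fence: list of (x, y) polygon vertices; point: (x, y)."""
    x, y = point
    inside = False
    n = len(fence)
    for i in range(n):
        x1, y1 = fence[i]
        x2, y2 = fence[(i + 1) % n]
        # Does a ray cast to the right from `point` cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

fence = [(0, 0), (10, 0), (10, 10), (0, 10)]  # the subscriber's property
assert inside_geofence((5, 5), fence)         # keep: inside the perimeter
assert not inside_geofence((15, 5), fence)    # geo-clip: outside the fence
```

In the patent's terms, image data mapping to points that fail this test would be clipped before the surveillance image is stored or shared.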
"For example, if the surveillance event is the determination that a garage door was left open, an alert may be a text message to a user, while if the surveillance event is a fire, an alert may be a text message or telephone call to a security provider or fire department," the inventors write. But this raises a lot of data privacy concerns, as it may allow drones to peep into people's houses and collect information they are not supposed to. Amazon's patent, however, states that "Geo-clipped surveillance images may be generated by physically constraining a sensor of the UAV, by performing pre-image capture processing, or post-image capture processing. Geo-clipped surveillance images may be limited to authorized property, so privacy is ensured for private persons and property." Amazon has been collecting a lot of user data through various products, including the smart doorbell made by Ring, which Amazon bought for more than $1 billion in February last year. This smart doorbell sends a video feed that customers can check and answer from their smartphone. Amazon also launched Neighbors, a crime-reporting social network that encourages users to upload videos straight from their Ring security cameras and tag posts with labels like "Crime," "Safety," and "Suspicious." Over 50 local US police departments have partnered with Ring to gain access to its owners' security footage. Amazon Key allows Prime members to have packages delivered straight into their homes, if they install its smart lock on their door and Amazon security cameras inside their homes. Last month, the US House Oversight and Reform Committee held its first hearing on the use of facial recognition technology. The hearing included discussion of the use of facial recognition by government and commercial entities, flaws in the technology, lack of regulation, and its impact on citizens' civil rights and liberties.
Joy Buolamwini, founder of the Algorithmic Justice League, highlighted one of the major failure points of this technology: 'misidentification', which can lead to false arrests and accusations, a risk especially for marginalized communities. Earlier this year, in January, activist shareholders proposed a resolution to limit the sale of Amazon's facial recognition tech, called Rekognition, to law enforcement and government agencies. Rekognition, which runs image and video analysis of faces, was found to be biased and inaccurate, and is regarded as an enabler of racial discrimination against minorities. It has been sold to two states, and Amazon has also pitched it to Immigration and Customs Enforcement. Last month, Amazon shareholders rejected the proposal to ban selling its facial recognition tech to governments. Amazon pushed back on claims that the technology is inaccurate and called on the U.S. Securities and Exchange Commission to block the shareholder proposal prior to its annual shareholder meeting, while the ACLU blocked Amazon's efforts to stop the vote amid growing scrutiny of the product. According to an Amazon spokeswoman, the resolutions failed by a wide margin. Amazon has defended its work and said all users must follow the law. It also added a web portal for people to report any abuse of the service. The votes were non-binding, thus allowing the company to reject the outcome of the vote. In April, Bloomberg reported that Amazon workers "listen to voice recordings captured in Echo owners' homes and offices. The recordings are transcribed, annotated and then fed back into the software as part of an effort to eliminate gaps in Alexa's understanding of human speech and help it better respond to commands". Also, this month, two lawsuits were filed in Seattle alleging that Amazon is recording voiceprints of children using its Alexa devices without their consent.
This suggests Amazon may be quietly amassing user data, and with surveillance drones it could gain a view into users' homes as well. What more could a company driven by user data ask for? We'll have to see whether Amazon stays true to what it has stated in its patent. While drones hovering overhead for surveillance may seem interesting, they would be collecting large volumes of user data, including a lot of private information. Black hat hackers, who use their skills to break into systems and access data and programs without the owners' permission, could gain access to this data, which is a real risk. They could then sell the data to third-party buyers, including advertising companies who may use it to push advertisements for the particular products people use. Amazon employees managing the data from these drones may also have some access to it: a network administrator's or security professional's rights and privileges allow access to most of the data on the systems of a user's network, and anyone with access to the recovery agent account can easily decrypt the data. This raises alarming questions: is this highly private data safe, and to what extent could intruders misuse it? According to The Verge, "Amazon has patented some pretty eccentric drone technologies over the years that have never made it to market; including a floating airship that could act as a warehouse for delivery drones, a parachute shipping label, and a system that lets a drone understand when you shout or wave at it".
https://twitter.com/drewharwell/status/1141712282184867840
https://twitter.com/drewharwell/status/1141793761283989504
To know more about 'surveillance as a service', read the patent.
Amazon announces general availability of Amazon Personalize, an AI-based recommendation service US regulators plan to probe Google on anti-trust issues; Facebook, Amazon & Apple also under legal scrutiny Amazon shareholders reject proposals to ban sale of facial recognition tech to govt and to conduct independent review of its human and civil rights impact
UK's data protection regulator ICO releases report, concludes that adtech industry operates illegally

Sugandha Lahoti
21 Jun 2019
6 min read
UK's data protection regulator, the ICO (Information Commissioner's Office), has published a report highlighting how thousands of companies are sharing the personal data of hundreds of millions of people every day without a legal basis. The report also concludes that most of today's online advertising is illegal at a 'general, systemic' level. The report was in response to a series of complaints made in the UK about the security and legality of the adtech ecosystem. These complaints were made by Mr. Veale, an academic, and Jim Killock, executive director of the Open Rights Group, as well as campaign group Privacy International. [box type="shadow" align="" class="" width=""]Adtech is a term used to describe tools that analyze and manage information (including personal data) for online advertising campaigns and automate the processing of advertising transactions. RTB (real-time bidding) uses adtech to enable the buying and selling of advertising inventory in real time on an impression-by-impression basis, typically involving an auction pricing mechanism. It is a type of online advertising that is most commonly used at present for selling visual inventory online, either on the website of a publisher or via a publisher's app.[/box] RTB relies on the potential advertiser seeing information about you. That information can be as basic as the device you're using to view the webpage, or where in the country you are. But it can paint a much more detailed picture, including the websites you visited, what your perceived interests are, even what health condition you've been searching for information about. The complexity of this type of online advertising poses a number of risks to data protection compliance, so the ICO has investigated the issue and summarized how the adtech sector should comply with GDPR. In this report, the ICO has prioritized two areas: the processing of special category data, and issues caused by relying solely on contracts for data sharing across the supply chain.
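The RTB flow described in the box above can be sketched as a toy second-price auction: a bid request carrying a user profile is broadcast to bidders, and the impression is sold to the highest bidder at the runner-up's price. The bidder names, bid-request fields, and prices here are invented for illustration; real RTB follows the OpenRTB protocol with far richer payloads.

```python
def run_auction(bid_request, bidders):
    """bidders: dict of name -> function(bid_request) -> bid price."""
    bids = {name: bidder(bid_request) for name, bidder in bidders.items()}
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    # Second-price rule: the winner pays the runner-up's bid.
    clearing_price = ranked[1][1] if len(ranked) > 1 else ranked[0][1]
    return winner, clearing_price

# The bid request is where the privacy concern lies: every bidder sees
# this profile, whether or not it wins the impression.
bid_request = {"device": "mobile", "geo": "UK", "interests": ["running"]}

bidders = {
    "sportswear_dsp": lambda req: 2.5 if "running" in req["interests"] else 0.1,
    "generic_dsp": lambda req: 1.0,
}

winner, price = run_auction(bid_request, bidders)
assert winner == "sportswear_dsp" and price == 1.0
```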
The report highlights: "Under data protection law, using people's sensitive personal data to serve adverts requires their explicit consent, which is not happening right now. Sharing people's data with potentially hundreds of companies, without properly assessing and addressing the risk of these counterparties, raises questions around the security and retention of this data."

Key findings from ICO's report

Adtech is disregarding special and non-special category data
Non-special category data is being processed unlawfully at the point of collection. Online advertisers believe that legitimate interests can be used as a basis for placing and/or reading a cookie or other technology (rather than obtaining the consent PECR requires). Even if an argument could be made for reliance on legitimate interests, participants within the ecosystem are unable to demonstrate that they have properly carried out the legitimate interests tests and implemented appropriate safeguards. Special category data, relating to especially sensitive information such as ethnic origin, health background, religion, and political and sexual orientation, is also being processed unlawfully, because explicit consent is not being collected. DPIAs (data protection impact assessments) are tools that organizations can use to identify and minimize the data protection risks of any processing operation. Article 35 of the GDPR specifies several circumstances that require DPIAs where there is large-scale processing of special category data. The ICO states that there appears to be a lack of understanding of, and potentially compliance with, the DPIA requirements of data protection law. This increases the risks associated with RTB, which are probably not being fully assessed and mitigated.

Individuals have no control over their privacy
The ICO claims that the privacy information provided to individuals lacks clarity, as it is overly complex. Individuals have no guarantees about the security of their personal data within the ecosystem.
Moreover, individual profiles are extremely detailed and repeatedly shared among organizations for any one bid request, all without the individuals' knowledge. Not only that, these organizations are processing bid requests with inadequate technical and organizational measures to secure the data in transit and at rest, and there is little to no consideration of the requirements of data protection law regarding international transfers of personal data. The ICO says organizations must understand, document, and be able to demonstrate: how their processing operations work; what they do; who they share any data with; and how they can enable individuals to exercise their rights.

The contract-only approach to data protection should stop
The adtech industry currently uses contractual controls to provide a level of guarantee about data-protection-compliant processing of personal data. However, this contract-only approach does not satisfy the requirements of data protection legislation. Organizations cannot rely on standard terms and conditions by themselves, without undertaking appropriate monitoring and ensuring technical and organizational controls back up those terms. The ICO says that controllers must: assess whether the processor is competent to process personal data in line with the GDPR; put in place a contract or other legal act meeting the requirements in Article 28(3); and ensure a processor's compliance on an ongoing basis (for example through audits and inspections), in order for the controller to comply with the accountability principle and demonstrate due diligence.

What's next for the ICO
The ICO states that its report requires further analysis and exploration. It will undertake targeted information-gathering activities related to the data supply chain and profiling aspects, the controls in place, and the DPIAs that have been undertaken, starting in July 2019. It will also continue targeted engagement with key stakeholders.
They will continue bilateral engagement with IAB Europe and Google, and may also undertake a further industry review in six months' time. The scope and nature of such an exercise will depend on their findings over the forthcoming months. As expected, the report was well received by netizens.
https://twitter.com/mark_barratt/status/1141702170334695424
https://twitter.com/jason_kint/status/1141881508619313154
https://twitter.com/DataEthicsEU/status/1141943677687926784
However, some people took issue with it being just a guidance report, lacking real enforcement action.
https://twitter.com/neil_neilzone/status/1141769209778778113
They also criticized the next-steps section.
https://twitter.com/WolfieChristl/status/1141698725015937024
https://twitter.com/mikarv/status/1141643837712080898
Another issue raised was that, in spite of these problems, the adtech industry is responsible for generating a large percentage of revenues.
https://twitter.com/jonmundy/status/1141960501485867009
The ICO replied: "RTB is an innovative means of ad delivery, but one that lacks data protection maturity in its current implementation. Whilst it is more the practices than the underlying technology that concerns us, it's also the case that, if an online service is looking to generate revenue from digital advertising, there are a number of different ways available to do this. RTB is just one of these. Whatever form organizations choose, if it involves either accessing or storing information on user devices, and/or the processing of personal data, there are laws that they have to comply with." Read the full report here.
GDPR complaint in EU claims billions of personal data records leaked via online advertising bids
European Union fined Google 1.49 billion euros for antitrust violations in online advertising
GDPR complaint claims Google and IAB leaked 'highly intimate data' of web users for behavioral advertising.
FTC to investigate YouTube over mishandling children’s data privacy

Savia Lobo
20 Jun 2019
5 min read
The Federal Trade Commission (FTC) has launched an investigation into YouTube over mishandling children's private data, and may levy a potential fine on the popular video-sharing website. The probe has already prompted the tech giant to reevaluate some of its business practices. Google, which owns YouTube, declined to comment on the investigation. A report from the Washington Post says the investigation was triggered by complaints from children's health and privacy groups, which said that YouTube improperly collected data from kids using the video service, thus violating the Children's Online Privacy Protection Act, a 1998 law known as COPPA that forbids the tracking and targeting of users younger than 13. According to consumer advocates cited by the Washington Post, "some of the problems highlighted by the YouTube investigation are shared by many of the most popular online services, including social media sites, such as Instagram and Snapchat, and games such as Fortnite". YouTube has come under scrutiny for exposing children to dangerous conspiracy theories, hate speech, violence, and sexual content, and even for catering to pedophiles, the New York Times reported. "The companies say their services are intended for adults and that they take action when they find users who are underage. Still, they remain widely popular with children, especially preteens, according to surveys and other data, raising concerns that the companies' efforts — and federal law — have not kept pace with the rapidly evolving online world," the Washington Post reports. In February, YouTube received major criticism from companies and individuals for recommending videos of minors and allowing pedophiles to comment on these posts, with specific timestamps marking moments where a child's exposed private parts were visible.
YouTube was also condemned for monetizing these videos, allowing advertisements for major brands like Alfa Romeo, Fiat, Fortnite, Grammarly, L'Oreal, Maybelline, Metro: Exodus, Peloton, and SingleMuslims.com to be displayed alongside them.

Read Also: YouTube disables all comments on videos featuring children in an attempt to curb predatory behavior and appease advertisers

According to The Verge, "The YouTube app, although generally safer than the main platform, has faced an array of moderation challenges, including graphic discussions about pornography and suicide, explicit sexual language in cartoons, and modeling unsafe behaviors like playing with lit matches." "One of the biggest requests that YouTube executives have received from policymakers, critics, and even some employees is to stop recommending videos that contain children", The Verge reports. A YouTube spokesperson told The New York Times earlier this month that doing so would hurt creators. Instead, the company has limited "recommendations on videos that it deems as putting children at risk," according to the Times.

Marc Groman, a privacy lawyer who previously worked for the FTC and the White House, said, "YouTube is a really high-profile target, and for obvious reasons because all of our kids are on it. But the issues on YouTube that we're all grappling with are elsewhere and everywhere."

In a statement to the Washington Post, YouTube spokesperson Andrea Faville emphasized that not all discussions about product changes come to fruition. "We consider lots of ideas for improving YouTube and some remain just that — ideas," she said. 
"Others, we develop and launch, like our restrictions to minors live-streaming or updated hate speech policy."

The Wall Street Journal reported that YouTube was planning to migrate all children's content off the service into a separate app, YouTube Kids, to better protect younger viewers from problematic material, "a change that would be difficult to implement because of the sheer volume of content on YouTube, and potentially could be costly to the company in lost advertising revenue."

David Monahan of the Campaign for a Commercial-Free Childhood, a Boston-based advocacy group, told the Post, "YouTube's business model puts profits first, and kids' well-being last. When we filed a COPPA complaint with the FTC a year ago, Google's response was ridiculous — that YouTube is not a site for kids, when it's actually the most popular children's site on the Internet. We hope the FTC will act soon, and require YouTube to move all kids' content to YouTube Kids with no marketing, no autoplay or recommendations, and strong protections for children's privacy," he further added.

https://twitter.com/CBSThisMorning/status/1141690074909892608

U.S. Senator Ed Markey said in a statement to Gizmodo, "In the coming weeks, I will introduce legislation that will combat online design features that coerce children and create bad habits, commercialization, and marketing that manipulate kids and push them into consumer culture, and the amplification of inappropriate and harmful content on the internet. It's time for the adults in the room to step in and ensure that corporate profits no longer come before kids' privacy."

To know more about this news in detail, head over to The Washington Post.

YouTube CEO, Susan Wojcicki says reviewing content before upload isn't a good idea
YouTube's new policy to fight online hate and misinformation misfires due to poor execution, as usual
YouTube demonetizes anti-vaccination videos after Buzzfeed News reported that it is promoting medical misinformation

Fatema Patrawala
20 Jun 2019
5 min read

Facebook content moderators work in filthy, stressful conditions and experience emotional trauma daily, reports The Verge

Yesterday, The Verge published a gut-wrenching investigative report about the terrible working conditions of Facebook moderators at one of its contract vendor sites in North America, in Tampa, Florida. The Tampa site is operated by the professional services firm Cognizant. It is one of the lowest-performing sites in North America and has never consistently enforced Facebook's policies with 98 percent accuracy, as required by Cognizant's contract.

In February, The Verge had published a similar report on the deplorable working conditions of content moderators at Facebook's Arizona site. Both reports were investigated and written by acclaimed tech reporter Casey Newton. Yesterday's article is based on interviews with 12 current and former moderators and managers at the Tampa site. In most cases, pseudonyms are used to protect employees from potential retaliation by Facebook and Cognizant, but for the first time, three former moderators for Facebook agreed to break their nondisclosure agreements and discuss working conditions at the site on the record.

https://twitter.com/CaseyNewton/status/1141317045881069569

The working conditions for the content moderators are filthy and stressful, to the point that one moderator's death has been linked to the emotional trauma the job inflicts every day. Keith Utley was a lieutenant commander in the military, and after his retirement he chose to work as a Facebook moderator at the Tampa site.

https://twitter.com/CaseyNewton/status/1141316396942602240

Keith worked the overnight shift, moderating the worst content posted on Facebook on a daily basis, including hate speech, murders, and child pornography. Utley had a heart attack at his desk and died last year. Senior management initially discouraged employees from discussing the incident and tried to hide the fact that Keith had died, for fear it would hurt productivity. 
But Keith's father visited the site to collect his son's belongings and, breaking down emotionally, said, "My son died here."

The moderators further mention that the Tampa site has only one bathroom for all 800 employees working at the site, and that it is repeatedly found smeared with feces and menstrual blood. The office coordinators did not care about cleaning the site, which was infested with bed bugs; workers also found fingernails and pubic hair on their desks.

"Bed bugs can be found virtually every place people tend to gather, including the workplace," Cognizant said in a statement. "No associate at this facility has formally asked the company to treat an infestation in their home. If someone did make such a request, management would work with them to find a solution."

There have been instances of sexual harassment at the workplace as well: workers have filed two such cases since April, which are now before the US Equal Employment Opportunity Commission. Physical and verbal fights in the office are frequent, and thefts from the office premises were common.

One former moderator bluntly told The Verge's reporter that only one change is needed: Facebook should shut down.

https://twitter.com/jephjacques/status/1141330025897168897

Many significant voices have joined the call to break up Facebook. One of them is Elizabeth Warren, US presidential candidate for 2020, who wants to break up big tech. Another comes from Chris Hughes, one of the founders of Facebook, who published an op-ed on why he thinks it's time to break up Facebook.

In response to this investigation, Facebook spokesperson Chris Harrison says the company will conduct an audit of its partner sites and make other changes to promote the well-being of its contractors. He said the company would consider making more moderators full-time employees in the future, and hopes to provide counseling for moderators after they leave. 
This news garnered public anger and rage toward Facebook; commenters have said that Facebook defecates on humanity and profits enormously while getting away with it easily.

https://twitter.com/pemullen/status/1141357359861645318

Another comment reads that Facebook's mission of connecting the world has been an abject failure, and that the world is worse off for being connected in the ways Facebook has done it. There are also comments on how this story is a reminder of how little these big tech firms care about people.

https://twitter.com/stautistic/status/1141424512736485376

Siva Vaidhyanathan, author of the book Antisocial Media and a columnist at the Guardian, applauds Casey Newton, The Verge reporter, for bringing up this story. But he also notes that Newton ignored the work of Sarah T. Roberts, who has written an entire book on this topic called Behind the Screen.

https://twitter.com/sivavaid/status/1141330295376863234

Check out the full story covered by The Verge on their official blog post.

Facebook releases Pythia, a deep learning framework for vision and language multimodal research
After refusing to sign the Christchurch Call to fight online extremism, Trump admin launches tool to defend "free speech" on social media platforms
How Genius used embedded hidden Morse code in lyrics to catch plagiarism in Google search results

Fatema Patrawala
19 Jun 2019
5 min read

‘I code in my dreams too’, say developers in Jetbrains State of Developer Ecosystem 2019 Survey

Last week, JetBrains published its annual survey results, known as The State of Developer Ecosystem 2019. More than 19,000 people participated in this developer ecosystem survey, but responses from only 7,000 developers across 17 countries were included in the report. The survey had over 150 questions; key results have been published, and the complete results along with the raw data will be shared later. JetBrains also prepared an infographic based on the survey answers they received. Let us take a look at the key takeaways.

Key takeaways from the survey

- Java is the most popular primary programming language.
- Python is the most studied language in 2019.
- Cloud services are getting more popular. The share of local and private servers dropped 8% and 3%, respectively, compared to 2018.
- Machine learning professionals have less fear that AI will replace developers one day.
- 44% of JavaScript developers use TypeScript regularly. In total, a quarter of all developers are using it in 2019, compared to 17% last year.
- The use of containerized environments by PHP developers is growing steadily, by 12% per year.
- 73% of Rust devs use a Unix/Linux development environment, though Linux is not a primary environment for most of them.
- Go Modules appeared recently, but already 40% of Go developers use it and 17% want to migrate to it.
- 71% of Kotlin developers use Kotlin for work, mainly for new projects (96%), but more than a third are also migrating their existing projects to it.
- The popularity of Vue.js is growing year on year: it gained 11 percentage points since last year and has almost doubled its share since 2017.
- The most frequently used toolchain among developers involved in infrastructure development is Docker + Terraform + Ansible.
- The more people code at work, the more likely they are to code in their dreams.

Developers choose Java as their primary language

The participants were asked three questions about their language preferences. 
First, they were asked which languages they used last year; second, about their primary language preference; and finally, they were asked to rank them. The most loved programming languages among developers are Java and Python, with second place a tie between C# and JavaScript.

Common secondary languages include HTML, SQL, and shell scripting. A lot of software developers have some practice with these secondary languages, but very few work with them as their major language. For example, while 56% practice SQL, only 19% called it their primary language and only 1.5% rank it as their first language. Java, on the other hand, is the leading 'solo' language: 44% of its users use only Java or use Java first. The next top solo language is JavaScript, with a mere 17%.

Android and React Native remain popular among mobile developers, Flutter gains momentum

Asked about mobile operating system preference, 83% of participants said they develop for Android, followed by 59% for iOS. Two thirds of mobile developers use native tools to develop for mobile operating systems, while every other developer uses cross-platform technologies or frameworks. 42% said they use React Native as a cross-platform mobile framework. Interestingly, Flutter came in second, preferred by 30% of the audience. Others included Cordova, Ionic, Xamarin, Unity, etc.

Other takeaways from the survey and a few fun facts

The most interesting question asked in this year's survey was whether developers code in their dreams. 52% responded yes, which suggests that the more people code at work (as a primary activity), the more likely they are to code in their dreams. Another really interesting result came from the question of whether AI will replace developers in the future: 57% of participants responded that AI may partially replace programmers, but those who do machine learning professionally were more skeptical about AI than those who do it as a hobby. 
27% think that AI will never replace developers, while 6% agreed that it will fully replace programmers and another 11% were not sure.

There were other questions, such as the most preferred operating system for the development environment: 57% of participants said they prefer Windows, followed by 49% for macOS and 48% for Unix/Linux. Asked what types of applications they prefer to develop, the largest share went to web back-end applications, followed by web front-end, mobile applications, libraries and frameworks, desktop applications, etc.

41% responded no when asked whether they contribute to open-source projects on a regular basis; only 11% said they contribute to open source regularly, that is, every month. 71% have unit tests in their projects, while 16% responded that they do not have any tests in their projects, even among fully employed senior developers. Source code collaboration tools are used regularly by 80% of developers; other tools such as standalone IDEs, lightweight desktop editors, continuous integration or continuous delivery tools, issue trackers, etc. are also used regularly.

Demographics of the survey

69% of respondents are fully employed by a company or organization, and 75% identify as a developer/programmer/software engineer. One in 14 people polled occupies a senior leadership role. Two thirds of the developers practice pair programming. The survey also revealed that more experienced people spend less time learning new tools, technologies, and programming languages. The gender ratio of participants was not revealed.

Check out the infographic to know more about the survey results. 
What the Python Software Foundation & JetBrains 2017 Python Developer Survey had to reveal
Python Software Foundation and JetBrains' Python Developers Survey 2018
PyCon 2019 highlights: Python Steering Council discusses the changes in the current Python governance structure
Amrata Joshi
19 Jun 2019
3 min read

MongoDB announces new cloud features, beta version of MongoDB Atlas Data Lake and MongoDB Atlas Full-Text Search and more!

Yesterday, the team at MongoDB announced new cloud services and features that offer better ways to work with data. The beta versions of MongoDB Atlas Data Lake and MongoDB Atlas Full-Text Search give users access to new features in a fully managed MongoDB environment.

MongoDB Charts adds embedded charts in web applications

The general availability of MongoDB Charts helps customers create charts and graphs, build and share dashboards, and embed those charts, graphs, and dashboards directly into web apps for better user experiences. MongoDB Charts is generally available to Atlas as well as on-premise customers, enabling real-time visualization of MongoDB data. New features in MongoDB Charts include embedded charts in external web applications, geospatial data visualization with new map charts, and built-in workload isolation to eliminate the impact of analytics queries on an operational application.

Dev Ittycheria, CEO and President, MongoDB, said, "Our new offerings radically expand the ways developers can use MongoDB to better work with data." He further added, "We strive to help developers be more productive and remove infrastructure headaches --- with additional features along with adjunct capabilities like full-text search and data lake. IDC predicts that by 2025 global data will reach 175 Zettabytes and 49% of it will reside in the public cloud. It's our mission to give developers better ways to work with data wherever it resides, including in public and private clouds."

MongoDB Query Language added to MongoDB Atlas Data Lake

MongoDB Atlas Data Lake lets customers quickly query data on S3 in any format, such as BSON, CSV, JSON, TSV, Parquet, and Avro, with the help of the MongoDB Query Language (MQL). One of the major advantages of MQL is that it is expressive, allowing developers to query their data naturally. 
Developers can now use the same query language across data on S3, making querying massive data sets easy and cost-effective. With MQL added to MongoDB Atlas Data Lake, users can run queries and explore their data by giving access to existing S3 storage buckets with a few clicks from the MongoDB Atlas console. Since Atlas Data Lake is completely serverless, there is no infrastructure to set up or manage, and customers pay only for the queries they run when they are actively working with the data. The team plans to make MongoDB Atlas Data Lake available on Google Cloud Storage and Azure Storage in the future.

Atlas Full-Text Search offers rich text search capabilities

Atlas Full-Text Search offers rich text search capabilities, based on Apache Lucene 8, against fully managed MongoDB databases, with no additional infrastructure or systems to manage. Full-Text Search helps end users filter, rank, and sort their data to bring out the most relevant results, so users are not required to pair their database with an external search engine.

To know more about this news, check out the official press release.

12,000+ unsecured MongoDB databases deleted by Unistellar attackers
MongoDB is going to acquire Realm, the mobile database management system, for $39 million
MongoDB withdraws controversial Server Side Public License from the Open Source Initiative's approval process
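The point of adding MQL to Atlas Data Lake is that a query over files in S3 looks exactly like a query over an ordinary MongoDB collection. The sketch below illustrates that idea with a standard aggregation pipeline expressed as plain Python documents; the field names (`region`, `product`, `amount`) and the `datalake.sales` namespace are hypothetical placeholders, not values from the announcement.

```python
# Standard MQL aggregation pipeline, expressed as plain Python/JSON documents.
# The same pipeline works unchanged whether the collection is backed by
# MongoDB storage or by CSV/JSON/Parquet files sitting in an S3 bucket.
pipeline = [
    {"$match": {"region": "EMEA"}},                                  # filter rows
    {"$group": {"_id": "$product", "total": {"$sum": "$amount"}}},   # aggregate per product
    {"$sort": {"total": -1}},                                        # rank by total
]

# With a driver such as PyMongo, the pipeline would be submitted against a
# Data Lake connection like any other (connection string, database name
# 'datalake' and collection name 'sales' are hypothetical placeholders):
#
#   client = pymongo.MongoClient("mongodb://<user>:<pass>@<your-data-lake-host>/")
#   results = list(client["datalake"]["sales"].aggregate(pipeline))
```

Because the pipeline is ordinary MQL, nothing in application code needs to change when the underlying storage moves between a live cluster and the data lake.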

Fatema Patrawala
18 Jun 2019
3 min read

How Genius used embedded hidden Morse code in lyrics to catch plagiarism in Google search results

Have you ever noticed that when you google the lyrics of a particular song, Google displays them in a card right on its search results page? The lyrics website Genius Media Group Inc. has accused Google of stealing lyrics from its site and reposting them in search results without permission. Additionally, Genius claims to have caught Google "red-handed" with the help of a Morse code watermark embedded in its lyrics.

On June 16, the Wall Street Journal reported that Genius's web traffic has dropped in recent years as Google has posted lyrics in "information boxes" on its search results page instead of routing users to lyrics sites like Genius. In March, 62 percent of mobile searches on Google did not result in a click-through to another site.

https://twitter.com/WSJ/status/1140201102102732800

Companies like Genius and other lyrics websites depend on search engines like Google to send music lovers to their sites, which stock hard-to-decipher lyrics of hip-hop songs and other pop hits. While Google posting song lyrics itself is not a crime, Genius claims that Google has been lifting the song lyrics directly from Genius without permission and reposting them on the search results page.

Genius has also presented evidence, based on two forms of apostrophe embedded in Genius-hosted lyrics. The company started to collect proof in 2016: the team at Genius positioned both "straight" and "curly" apostrophes in their lyrics so that, when the apostrophes were converted into dots and dashes like Morse code, they spelled out the words "Red Handed." Using these apostrophes, Genius says it found over 100 instances of Google using Genius's own lyrics in Google search results.

Check out the video posted by the WSJ to see how Genius caught Google copying the lyrics from its website.

"Over the last two years, we've shown Google irrefutable evidence again and again that they are displaying lyrics copied from Genius," Genius's chief strategy officer Ben Gross told the Wall Street Journal. 
"We noticed that Google's lyrics matched our lyrics down to the character."

The Wall Street Journal confirmed Genius's accusations by matching the results for three songs chosen at random from the list of 100 instances. The songs included Alessia Cara's "Not Today" as well as Genius's lyrics for Desiigner's near-indecipherable "Panda," whose lyrics the rapper himself submitted to the site.

According to the New York Post, Google has denied the accusations, pointing to its partnership with LyricFind, which provides the search engine with lyrics through a deal with music publishers. "We take data quality and creator rights very seriously and hold our licensing partners accountable to the terms of our agreement," Google said. Google also issued a second statement saying it is investigating the issue and will terminate its agreements with partners that aren't "upholding good practices." "We do not source lyrics from Genius," LyricFind Chief Executive Darryl Ballantyne said.

Canva faced security breach, 139 million users data hacked: ZDNet reports
Microsoft open sources SPTAG algorithm to make Bing smarter!
Time for data privacy: DuckDuckGo CEO Gabe Weinberg in an interview with Kara Swisher
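The watermark described in this article works because a straight apostrophe (U+0027) and a curly apostrophe (U+2019) render almost identically but are distinct characters, so a sequence of them can carry hidden binary data. The toy sketch below illustrates the idea in Python; the dot/dash assignment (straight = dot, curly = dash) and the per-letter grouping are my own assumptions for illustration, not Genius's actual encoding.

```python
# Toy illustration of an apostrophe watermark. Assumption (not Genius's
# real scheme): straight apostrophe ' = Morse dot, curly apostrophe ’ = dash.
MORSE = {
    "R": ".-.", "E": ".", "D": "-..", "H": "....", "A": ".-", "N": "-.",
}
REVERSE = {code: letter for letter, code in MORSE.items()}

STRAIGHT, CURLY = "\u0027", "\u2019"  # ' and ’

def encode(word: str) -> list:
    """Turn each letter into a run of apostrophes (one run per letter),
    which could then be sprinkled through a lyric as ordinary punctuation."""
    return [
        MORSE[ch].replace(".", STRAIGHT).replace("-", CURLY)
        for ch in word
    ]

def decode(runs: list) -> str:
    """Recover the hidden word from the extracted apostrophe runs."""
    return "".join(
        REVERSE[run.replace(STRAIGHT, ".").replace(CURLY, "-")]
        for run in runs
    )

hidden = encode("REDHANDED")
assert decode(hidden) == "REDHANDED"
```

If a scraper copies the lyrics verbatim, the apostrophe pattern survives the copy, and decoding the pattern on the scraper's page reveals the watermark.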