Data | 0 articles | Tech News, Tutorials & Expert Insights

article-image-privacy-experts-urge-the-senate-commerce-committee-for-a-strong-federal-privacy-bill-that-sets-a-floor-not-a-ceiling

11 Oct 2018

9 min read

Privacy experts urge the Senate Commerce Committee for a strong federal privacy bill “that sets a floor, not a ceiling”

11 Oct 2018

The Senate Commerce Committee held a hearing yesterday on consumer data privacy. The hearing focused on the perspective of privacy advocates and other experts. These advocates encouraged federal lawmakers to create strict data protection regulation rules, giving consumers more control over their personal data. The major focus was on implementing a strong common federal consumer privacy bill “that sets a floor, not a ceiling.” Representatives included Andrea Jelinek, the chair of the European Data Protection Board; Alastair Mactaggart, the advocate behind California's Consumer Privacy Act; Laura Moy, executive director of the Georgetown Law Center on Privacy and Technology; and Nuala O'Connor, president of the Center for Democracy and Technology. The Goal: Protect user privacy, allow innovation John Thune, the Committee Chairman said in his opening statement, “Over the last few decades, Congress has tried and failed to enact comprehensive privacy legislation. Also in light of recent security incidents including Facebook’s Cambridge Analytica and another security breach, and of the recent data breach in Google+, it is increasingly clear that industry self-regulation in this area is not sufficient. A national standard for privacy rules of the road is needed to protect consumers.” Senator Edward Markey, in his opening statement, spoke on data protection and privacy saying that “Data is the oil of the 21st century”. He further adds, “Though it has come with an unexpected cost to the users, any data-driven website that uses their customer’s personal information as a commodity, collecting, and selling user information without their permission.” He said that the goal of this hearing was to give users meaningful control over their personal information while maintaining a thriving competitive data ecosystem in which entrepreneurs can continue to develop. What did the industry tell the Senate Commerce Committee in the last hearing on the topic of consumer privacy? A few weeks ago, the Commerce committee held a discussion with Google, Facebook, Amazon, AT&T, and other industry players to understand their perspective on the same topic. The industry unanimously agreed that privacy regulations need to be put in place However, these companies pushed for the committee to make online privacy policy at the federal level rather than at the state level to avoid a nightmarish patchwork of policies for businesses to comply by. They also shared that complying by GDPR has been quite resource intensive. While they acknowledged that it was too soon to assess the impact of GDPR, they cautioned the Senate Commerce Committee that policies like the GDPR and CCPA could be detrimental to growth and innovation and thereby eventually cost the consumer more. As such, they expressed interest in being part of the team that formulates the new federal privacy policy. Also, they believed that the FTC was the right body to oversee the implementation of the new privacy laws. Overall, the last hearing’s meta-conversation between the committee and the industry was heavy with defensive stances and scripted almost colluded recommendations. The Telcos wanted tech companies to do better. The message was that user privacy and tech innovation are too interlinked and there is a need to strike a delicate balance to make privacy work practically. The key message from yesterday’s Senate Commerce Committee hearing with privacy advocates and EU regulator This time, the hearing was focused solely on establishing strict privacy laws and to draft clear guidelines regarding, definitions of ‘sensitive’ data, prohibited uses of data, and establishing limits for how long corporations can hold on to consumer data for various uses. A focal point of the hearing was to give users the key elements of Knowledge, Notice, and No. Consumers need knowledge that their data is being shared and how it is used, notice when their data is compromised, and the ability to say no to the entities that want their personal information. It should also include limits on how companies can use consumer’s information. The bill should prohibit companies from giving financial incentives to users in exchange for their personal information. Privacy must not become a luxury good that only the fortunate can afford. The bill should also ban “take it or leave it” offerings, in which a company requires a consumer to forfeit their privacy in order to consume a product. Companies should not be able to coerce users into providing their personal information by threatening to deprive them of a service. The law should include individual rights like the ability to access, correct, delete, and remove information. Companies should only collect user data which is absolutely necessary to carry out the service and keep that private information safe and secure. The legislation should also include special protections for children and teenagers. The federal government should be given strong enforcement powers and robust rule-making authority in order to ensure rules keep pace with changing technologies. Some of the witnesses believed that the FTC may not the right body to do this and that a new entity focused on this aspect may do a better and more agile job. “We can’t be shy about data regulation”, Laura Moy Laura Moy, Deputy Director of the Privacy and Technology center at Georgetown University law center talked at length about Data regulation. “This is not a time to be shy about data regulation,” Moy said. “Now is the time to intervene.” She emphasized that information should not in any way be used for discrimination. Nor it should be used to amplify hate speech, be sold to data brokers or used to target misinformation or disinformation. She also talked about Robust Enforcement, where she said she plans to call for legislation to “enable robust enforcement both by a federal agency and state attorneys general and foster regulatory agility.” She also addressed the question of whether companies should be able to tell consumers that if they don’t agree to share non-essential data, they cannot receive products or service? She disagreed saying that if companies do so, they have violated the idea of “Free choice”. She also addressed issues as to whether companies should be eligible for offering financial initiatives in exchange for user personal information, “GDPR was not a revolution, but just an evolution of a law [that existed for 20 years]”, Andrea Jelinek Andrea Jelinek, Chairperson, European Data Protection Board, highlighted the key concepts of GDPR and how it can be an inspiration to develop a policy in the U.S. at the federal level. In her opening statements, she said, “The volume of Digital information doubles every two years and deeply modifies our way of life. If we do not modify the roots of data processing gains with legislative initiatives, it will turn into a losing game for our economy, society, and each individual.” She addressed the issue of how GDPR is going to be enforced in the investigation of Facebook by Ireland’s Data protection authority. She also gave stats on the number of GDPR investigations opened in the EU so far. From the figures dating till October 1st, GDPR has 272 cases regarding identifying the lead supervisory authority and concern supervisory authority. There are 243 issues on mutual assistance according to Article 61 of the GDPR. There are also 223 opinions regarding data protection impact assessment. Company practices that have generated the most complaints and concerns from consumers revolved around “User Consent”. She explained why GDPR went with the “regulation route”, choosing one data privacy policy for the entire continent instead of each member country having their own. Jelinek countered Google’s point about compliance taking too much time and effort from the team by saying that given Google’s size, it would have taken around 3.5 hours per employee to get the compliance implemented. She also observed that it could have been reduced a lot, had they followed good data practices, to begin with. She also clarified that GDPR was not a really new or disruptive regulatory framework. In addition to the two years provided to companies to comply with the new rules, there was a 20-year-old data protection directive already in place in Europe in various forms. In that sense she said, GDPR was not a revolution, but just an evolution of a law that existed for 20 years. Californians for Consumer Privacy Act Alastair McTaggart, Chairman of Californians for consumer privacy, talked about CCPA’s two main elements. First, the Right to know, which allows Californians to know the information corporations have collected concerning them. Second, the Right to say no to businesses to stop selling their personal information. He said, “CCPA puts the focus on giving choice back to the consumer and enforced data security, a choice which is sorely needed." He also addressed questions like, “If he believes federal law should also grant permission for 13, 14, and 15-year-old?” What should the new Federal Privacy law look like according to CDT’s O’Connor Center for Democracy and Technology (CDT) President and CEO, Laura O'Connor said, "As with many new technological advancements and emerging business models, we have seen exuberance and abundance, and we have seen missteps and unintended consequences. International bodies and US states have responded by enacting new laws, and it is time for the US federal government to pass omnibus federal privacy legislation to protect individual digital rights and human dignity, and to provide certainty, stability, and clarity to consumers and companies in the digital world." She also highlighted five important pointers that should be kept in mind while designing the new Federal Privacy law. A comprehensive federal privacy law should apply broadly to all personal data and unregulated commercial entities, not just to tech companies. The law should include individual rights like the ability to access, correct, delete, and remove information. Congress should prohibit the collection, use, and sharing of certain types of data when not necessary for the immediate provision of the service. The FTC should be expressly empowered to investigate data abuses that result in discriminatory advertising and other practices. A federal privacy law should be clear on its face and provide specific guidance to companies and markets about legitimate data practices. It is promising to see the Senate Commerce committee sincerely taking in notes from both industry and privacy advocates to enable building strict privacy standards. They are hoping this new legislation is more focused on protecting consumer data than the businesses that profit from it. Only time will tell if a bipartisan consensus to this important initiative will be reached. For a detailed version of this story, it is recommended to hear the full Senate Commerce Committee hearing. Consumer protection organizations submit a new data protection framework to the Senate Commerce Committee. Google, Amazon, AT&T met the U.S Senate Committee to discuss consumer data privacy. Facebook, Twitter open up at Senate Intelligence hearing, the committee does ‘homework’ this time.

0
0
2457

article-image-microsoft-open-sources-infer-net-its-popular-model-based-machine-learning-framework

Melisha Dsouza

08 Oct 2018

3 min read

Microsoft open sources Infer.NET, it’s popular model-based machine learning framework

Melisha Dsouza

08 Oct 2018

3 min read

Last week, Microsoft open sourced Infer.NET, the cross-platform framework used for model-based machine learning. This popular machine learning engine used in Office, Xbox and Azure, will be available on GitHub under the permissive MIT license for free use in commercial applications. Features of Infer.NET The team at Microsoft Research in Cambridge initially envisioned Infer.NET as a research tool and released it for academic use in 2008. The framework has served as a base to publish hundreds of papers across a variety of fields, including information retrieval and healthcare. The team then started using the framework as a machine learning engine within a wide range of Microsoft products. A model-based approach to machine learning Infer.NET allows users to incorporate domain knowledge into their model. The framework can be used to build bespoke machine learning algorithms directly from their model. To sum it up, this framework actually constructs a learning algorithm for users based on the model they have provided. Facilitates interpretability Infer.NET also facilitates interpretability. If users have designed the model themselves and the learning algorithm follows that model, they can understand why the system behaves in a particular way or makes certain predictions. Probabilistic Approach In Infer.NET, models are described using a probabilistic program. This is used to describe real-world processes in a language that machines understand. Infer.NET compiles the probabilistic program into high-performance code for implementing something cryptically called deterministic approximate Bayesian inference. This approach allows a notable amount of scalability. For instance, it can be used in a system that automatically extracts knowledge from billions of web pages, comprising petabytes of data. Additional Features The framework also supports the ability of the system to learn as new data arrives. The team is also working towards developing and growing it further. Infer.NET will become a part of ML.NET (the machine learning framework for .NET developers). They have already set up the repository under the .NET Foundation and moved the package and namespaces to Microsoft.ML.Probabilistic. Being cross platform, Infer.NET supports .NET Framework 4.6.1, .NET Core 2.0, and Mono 5.0. Windows users get to use Visual Studio 2017, while macOS and Linux folks have command-line options, which could be incorporated into the code wrangler of their choice. Download the framework to learn more about Infer.NET. You can also check the documentation for a detailed User Guide. To know more about this news, head over to Microsoft’s official blog. Microsoft announces new Surface devices to enhance user productivity, with style and elegance Neural Network Intelligence: Microsoft’s open source automated machine learning toolkit Microsoft’s new neural text-to-speech service lets machines speak like people

0
0
3988

article-image-7-tips-for-using-git-and-github-the-right-way

Sugandha Lahoti

06 Oct 2018

3 min read

7 tips for using Git and GitHub the right way

Sugandha Lahoti

06 Oct 2018

3 min read

GitHub has become a widely accepted and integral part of software development owing to the imperative features of change tracking that it offers. It was created in 2005 by Linus Torvalds to support the development of the Linux kernel. In this post, Alex Magana and Joseph Muli, the authors of Introduction to Git and GitHub course, discuss some of the best practices you should keep in mind while learning or using Git and GitHub. Document everything A good best practice that eases work in any team is ample documentation. Documenting something as simple as a repository goes a long way in presenting work and attracting contributors. It’s more of a first impression aspect when it comes to looking for a tool to aid in development. Utilize the README.MD and wikis One should also utilize the README.MD and wikis to elucidate the functionality delivered by the application and categorize guide topics and material. You should Communicate the solution of the code in the repository avails. Specify the guidelines and rules of engagement that govern contributing to the codebase. Indicate the dependencies required to set-up the working environment. Stipulate set-up instructions to get a working version of the application in a contributor’s local environment. Keep simple and concise naming conventions Naming conventions are also highly encouraged when it comes to repositories and branches. They should be simple, descriptive and concise. For instance, a repository that houses code intended for a Git course could be simply named as “git-tutorial-material”. As a learner on the bespoke course, it’s easier for a user to get the material, compared to a repository with a name such as “material”. Adopt naming prefixes You should also adopt institute naming prefixes for different task types for branch naming. For example, you may use feat- for feature branches, bug- for bugs and fix- for fix branches. Also, make use of templates that encompass a checklist for Pull Requests. Correspond a PR and Branch to a ticket or task A PR and Branch should correspond to a ticket or task on the project management board. This aids in aligning efforts employed on a product to the appropriate milestones and product vision. Organize and track tasks using issues Tasks such as technical debts, bugs, and workflow improvements should be organized and tracked using Issues. You should also enforce Push and Pull restrictions on the default branch and use webhooks to automate deployment and run pre-merge test suites. Use atomic commits Another best practice is to use atomic commits. An atomic commit is an operation that applies a set of distinct changes as a single operation. You should persist changes in small changesets and use descriptive and concise commit messages to record effected changes. You read a guest post from Alex Magana and Joseph Muli, the authors of Introduction to Git and GitHub. We hope that these best practices help you manage your Git and GitHub more smoothly. Don’t forget to check out Alex and Joseph’s Introduction to Git and GitHub course to learn how to create and enforce checks and controls for the introduction, scrutiny, approval, merging, and reversal of changes. GitHub introduces ‘Experiments’, a platform to share live demos of their research projects. Packt’s GitHub portal hits 2,000 repositories

0
0
4062

article-image-production-ready-pytorch-1-0-preview-release-is-here-with-torch-jit-c10d-distributed-library-c-api

Aarthi Kumaraswamy

02 Oct 2018

4 min read

PyTorch 1.0 preview release is production ready with torch.jit, c10d distributed library, C++ API

Aarthi Kumaraswamy

02 Oct 2018

4 min read

Back in May, the PyTorch team shared their roadmap for PyTorch 1.0 release and highlighted that this most anticipated version will not only continue to provide stability and simplicity of use to its users, but will also make it production ready while making it a hassle-free migration experience for its users. Today, Facebook announced the release of PyTorch 1.0 RC1. The official announcement states, “PyTorch 1.0 accelerates the workflow involved in taking breakthrough research in artificial intelligence to production deployment. With deeper cloud service support from Amazon, Google, and Microsoft, and tighter integration with technology providers ARM, Intel, IBM, NVIDIA, and Qualcomm, developers can more easily take advantage of PyTorch’s ecosystem of compatible software, hardware, and developer tools. The more software and hardware that is compatible with PyTorch 1.0, the easier it will be for AI developers to quickly build, train, and deploy state-of-the-art deep learning models.” PyTorch is an open-source Python-based deep learning framework which provides powerful GPU acceleration. PyTorch is known for advanced indexing and functions, imperative style, integration support and API simplicity. This is one of the key reasons why developers prefer PyTorch for research and hackability. On the downside, it has struggled with adoption in production environments. The pyTorch team acknowledged this in their roadmap and have worked on improving this aspect significantly in pyTorch 1.0 not just in terms of improving the library but also by enriching its ecosystems but partnering with key software and hardware vendors. “One of its biggest downsides has been production-support. What we mean by production-support is the countless things one has to do to models to run them efficiently at massive scale: exporting to C++-only runtimes for use in larger projects optimizing mobile systems on iPhone, Android, Qualcomm and other systems using more efficient data layouts and performing kernel fusion to do faster inference (saving 10% of speed or memory at scale is a big win) quantized inference (such as 8-bit inference)”, stated the pyTorch team in their roadmap post. Below are some key highlights of this major milestone for PyTorch. JIT The JIT is a set of compiler tools for bridging the gap between research in PyTorch and production. It includes a language called Torch Script ( a subset of Python), and two ways (Tracing mode and Script mode) in which the existing code can be made compatible with the JIT. Torch Script code can be aggressively optimized and it can be serialized for later use in the new C++ API, which doesn't depend on Python at all. torch.distributed new "C10D" library The torch.distributed package and torch.nn.parallel.DistributedDataParallel module are backed by the new "C10D" library. The main highlights of the new library are: C10D is performance driven and operates entirely asynchronously for all backends: Gloo, NCCL, and MPI. Significant Distributed Data Parallel performance improvements especially for slower network like ethernet-based hosts Adds async support for all distributed collective operations in the torch.distributed package. Adds send and recv support in the Gloo backend C++ Frontend [API Unstable] The C++ frontend is a pure C++ interface to the PyTorch backend that follows the API and architecture of the established Python frontend. It is intended to enable research in high performance, low latency and bare metal C++ applications. It provides equivalents to torch.nn, torch.optim, torch.data and other components of the Python frontend. The C++ frontend is marked as "API Unstable" as part of PyTorch 1.0. This means it is ready to be used for building research applications, but still has some open construction sites that will stabilize over the next month or two. In other words, it is not ready for use in production, yet. N-dimensional empty tensors, a collection of new operators inspired from numpy and scipy and new distributions such as Weibull, Negative binomial and multivariate log gamma distributions have been introduced. There have also been a lot of breaking changes, bug fixes, and other improvements made to pyTorch 1.0. For more details read the official announcement and also the official release notes for pyTorch. What is PyTorch and how does it work? Build your first neural network with PyTorch [Tutorial] Is Facebook-backed PyTorch better than Google’s TensorFlow? Can a production-ready Pytorch 1.0 give TensorFlow a tough time?

0
0
5874

article-image-ethical-dilemmas-developers-on-artificial-intelligence-products-must-consider

Amey Varangaonkar

29 Sep 2018

10 min read

The ethical dilemmas developers working on Artificial Intelligence products must consider

Amey Varangaonkar

29 Sep 2018

10 min read

0
0
5086

article-image-brainnet-an-interface-to-communicate-between-human-brains-could-soon-make-telepathy-real

Sunith Shetty

28 Sep 2018

3 min read

BrainNet, an interface to communicate between human brains, could soon make Telepathy real

Sunith Shetty

28 Sep 2018

3 min read

BrainNet provides the first multi-person brain-to-brain interface which allows a nonthreatening direct collaboration between human brains. It can help small teams collaborate to solve a range of tasks using direct brain-to-brain communication. How does BrainNet operate? The noninvasive interface combines electroencephalography (EEG) to record brain signals and transcranial magnetic stimulation (TMS) to deliver the required information to the brain. For now, the interface allows three human subjects to collaborate, handle and solve a task using direct brain-to-brain communication. Two out of three human subjects are “Senders”. The senders’ brain signals are decoded using real-time EEG data analysis. This technique allows extracting decisions which are vital in communicating in order to solve the required challenges. Let’s take an example of a Tetris-like game--where you need quick decisions to decide whether to rotate a block or drop as it is in order to fill a line. The senders’ signals (decisions) are transmitted to the third subject human brain via the Internet, the “Receiver” in this case. The decisions are sent to the receiver brain via magnetic stimulation of the occipital cortex. The receiver can’t see the game screen to decide if the rotation of the block is required. The receiver integrates the decisions received and makes an informed call using an EEG interface regarding turning the position of the block or keeping it in the same position. The second round of the game allows the senders to validate the previous move and provide the necessary feedback to the receiver’s action. How did the results look? The group of researchers has implemented this technique for the Tetris game to evaluate the performance of BrainNet considering the following factors: Group-level performance during the game True/False positive rates of subject’s decisions Mutual information between subjects This was implemented among five groups of three human brain subjects to perform the Tetris task using BrainNet interface. The average accuracy result for the task was 0.813. Furthermore, they also tried varying the information reliability by injecting artificially generated noise into one of the senders’ signals. However, the receiver was able to classify which sender is more reliable based on the information transmitted to their brains. These positive results have open the gates and the possibilities of future brain-to-brain interfaces which holds the power of enabling cooperative problem solving by humans using a "social network" of connected brains. To know more, you can refer to the research paper. Read more Diffractive Deep Neural Network (D2NN): UCLA-developed AI device can identify objects at the speed of light Baidu announces ClariNet, a neural network for text-to-speech synthesis Optical training of Neural networks is making AI more efficient

0
0
4549

article-image-did-you-know-facebook-shares-the-data-you-share-with-them-for-security-reasons-with-advertisers

Natasha Mathur

28 Sep 2018

5 min read

Did you know Facebook shares the data you share with them for ‘security’ reasons with advertisers?

Natasha Mathur

28 Sep 2018

5 min read

Facebook is constantly under the spotlight these days when it comes to controversies regarding user’s data and privacy. A new research paper published by the Princeton University researchers states that Facebook shares the contact information you handed over for security purposes, with their advertisers. This study was first brought to light by a Gizmodo writer, Kashmir Hill. “Facebook is not content to use the contact information you willingly put into your Facebook profile for advertising. It is also using contact information you handed over for security purposes and contact information you didn’t hand over at all, but that was collected from other people’s contact books, a hidden layer of details Facebook has about you that I’ve come to call “shadow contact information”, writes Hill. Recently, Facebook introduced a new feature called custom audiences. Unlike traditional audiences, the advertiser is allowed to target specific users. To do so, the advertiser uploads user’s PII (personally identifiable information) to Facebook. After the uploading is done, Facebook then matches the given PII against platform users. Facebook then develops an audience that comprises the matched users and allows the advertiser to further track the specific audience. Essentially with Facebook, the holy grail of marketing, which is targeting an audience of one, is practically possible; nevermind whether that audience wanted it or not. In today’s world, different social media platforms frequently collect various kinds of personally identifying information (PII), including phone numbers, email addresses, names and dates of birth. Majority of this PII often represent extremely accurate, unique, and verified user data. Because of this, these services have the incentive to exploit and use this personal information for other purposes. One such scenario includes providing advertisers with more accurate audience targeting. The paper titled ‘Investigating sources of PII used in Facebook’s targeted advertising’ is written by Giridhari Venkatadri, Elena Lucherini, Piotr Sapiezynski, and Alan Mislove. “In this paper, we focus on Facebook and investigate the sources of PII used for its PII-based targeted advertising feature. We develop a novel technique that uses Facebook’s advertiser interface to check whether a given piece of PII can be used to target some Facebook user and use this technique to study how Facebook’s advertising service obtains users’ PII,” reads the paper. The researchers developed a novel methodology, which involved studying how Facebook obtains the PII to provide custom audiences to advertisers. “We test whether PII that Facebook obtains through a variety of methods (e.g., directly from the user, from two-factor authentication services, etc.) is used for targeted advertising, whether any such use is clearly disclosed to users, and whether controls are provided to users to help them limit such use,” reads the paper. The paper uses size estimates to study what sources of PII are used for PII-based targeted advertising. Researchers used this methodology to investigate which range of sources of PII was actually used by Facebook for its PII-based targeted advertising platform. They also examined what information gets disclosed to users and what control users have over PII. What sources of PII are actually being used by Facebook? Researchers found out that Facebook allows its users to add contact information (email addresses and phone numbers) on their profiles. While any arbitrary email address or phone number can be added, it is not displayed to other users unless verified (through a confirmation email or confirmation SMS message, respectively). This is the most direct and explicit way of providing PII to advertisers. Researchers then further moved on to examine whether PII provided by users for security purposes such as two-factor authentication (2FA) or login alerts are being used for targeted advertising. They added and verified a phone number for 2FA to one of the authors’ accounts. The added phone number became targetable after 22 days. This proved that a phone number provided for 2FA was indeed used for PII-based advertising, despite having set the privacy controls to the choice. What control do users have over PII? Facebook allows users the liberty of choosing who can see each PII listed on their profiles, the current list of possible general settings being: Public, Friends, Only Me. Users can also restrict the set of users who can search for them using their email address or their phone number. Users are provided with the following options: Everyone, Friends of Friends, and Friends. Facebook provides users a list of advertisers who have included them in a custom audience using their contact information. Users can opt out of receiving ads from individual advertisers listed here. But, information about what PII is used by advertisers is not disclosed. What information about how Facebook uses PII gets disclosed to the users? On adding mobile phone numbers directly to one’s Facebook profile, no information about the uses of that number is directly disclosed to them. This Information is only disclosed to users when adding a number from the Facebook website. As per the research results, there’s very little disclosure to users, often in the form of generic statements that do not refer to the uses of the particular PII being collected or that it may be used to allow advertisers to target users. “Our paper highlights the need to further study the sources of PII used for advertising, and shows that more disclosure and transparency needs to be provided to the user,” says the researchers in the paper. For more information, check out the official research paper. Ex-employee on contract sues Facebook for not protecting content moderators from mental trauma How far will Facebook go to fix what it broke: Democracy, Trust, Reality Mark Zuckerberg publishes Facebook manifesto for safeguarding against political interference

0
0
3074

article-image-9-recommended-blockchain-online-courses

Guest Contributor

27 Sep 2018

7 min read

9 recommended blockchain online courses

Guest Contributor

27 Sep 2018

7 min read

Blockchain is reshaping the world as we know it. And we are not talking metaphorically because the new technology is really influencing everything from online security and data management to governance and smart contracting. Statistical reports support these claims. According to the study, the blockchain universe grows by over 40% annually, while almost 70% of banks are already experimenting with this technology. IT experts at the Editing AussieWritings.com Services claim that the potential in this field is almost limitless: “Blockchain offers a myriad of practical possibilities, so you definitely want to get acquainted with it more thoroughly.” Developers who are curious about blockchain can turn it into a lucrative career opportunity since it gives them the chance to master the art of cryptography, hierarchical distribution, growth metrics, transparent management, and many more. There were 5,743 mostly full-time job openings calling for blockchain skills in the last 12 months - representing the 320% increase - while the biggest freelancing website Upwork reported more than 6,000% year-over-year growth. In this post, we will recommend our 9 best blockchain online courses. Let’s take a look! Udemy Udemy offers users one of the most comprehensive blockchain learning sources. The target audience is people who have heard a little bit about the latest developments in this field, but want to understand more. This online course can help you to fully understand how the blockchain works, as well as get to grips with all that surrounds it. Udemy breaks down the course into several less complicated units, allowing you to figure out this complex system rather easily. It costs $19.99, but you can probably get it with a 40% discount. The one downside, however, is that content quality in terms of subject scope can vary depending on the instructor, but user reviews are a good way to gauge quality. Each tutorial lasts approximately 30 minutes, but it also depends on your own tempo and style of work. Pluralsight Pluralsight is an excellent beginner-level blockchain course. It comes in three versions: Blockchain Fundamentals, Surveying Blockchain Technologies for Enterprise, and Introduction to Bitcoin and Decentralized Technology. Course duration varies from 80 to 200 minutes depending on the package. The price of Pluralsight is $29 a month or $299 a year. Choosing one of these options, you are granted access to the entire library of documents, including course discussions, learning paths, channels, skill assessments, and other similar tools. Packt Publishing Packt Publishing has a wide portfolio of learning products on Blockchain for varying levels of experience in the field from beginners to experts. And what’s even more interesting is that you can choose your learning format from books, ebooks to videos, courses and live courses. Or you could simply subscribe to MAPT, their library to gain access to all products at a reasonable price of $29 monthly and $150 annually. It offers several books and videos on the leading blockchain technology. You can purchase 5 blockchain titles at a discounted rate of $50. Here’s the list of top blockchain courses offered by Packt Publishing: Exploring Blockchain and Crypto-currencies: You will gain the foundational understanding of blockchain and crypto-currencies through various use-cases. Building Blockchain Projects: In this, you will be able to develop real-time practical DApps with Ethereum and JavaScript. Mastering Blockchain - Second Edition: You can learn about cryptography and cryptocurrencies, so you can build highly secure, decentralized applications and conduct trusted in-app transactions. Hands-On Blockchain with Hyperledger: This book will help you leverage the power of Hyperledger Fabric to develop Blockchain-based distributed ledgers with ease. Learning Blockchain Application Development [video ]: This interactive video will help you learn build smart contracts and DApps on Ethereum. Create Ethereum and Blockchain Applications using Solidity [video ]: This video will help you learn about Ethereum, Solidity, DAO, ICO, Bitcoin, Altcoin, Website Security, Ripple, Litecoin, Smart Contracts, and Apps. Cryptozombies Cryptozombies is an online blockchain course based on gamification elements. The tool teaches you to write smart contracts in Solidity through building your own crypto-collectibles game. It is entirely Ethereum-focused, but you don’t need any previous experience to understand how Solidity works. There is a step by step guide that explains to you even the smallest details, so you can quickly learn to create your own fully-functional blockchain-based game. The best thing about Cryptozombies is that you can test it for free and give up in case you don’t like it. Coursera The blockchain is the epicenter of the cryptocurrency world, so it’s necessary to study it if you want to deal with Bitcoin and other digital currencies. Coursera is the leading online resource in the field of virtual currencies, so you might want to check it out. After this course like Blockchain Specialization, you’ll know everything you need to be able to separate fact from fiction when reading claims about Bitcoin and other cryptocurrencies. You’ll have the conceptual foundations you need to engineer to secure software that interacts with the Bitcoin network. And you’ll be able to integrate ideas from Bitcoin in your own projects. The course is a 4-part course spanning a duration 4 weeks, but you can take each part separately. The price depends on the level and features you choose. LinkedIn Learning (formerly known as Lynda) LinkedIn Learning (what used to be Lynda) doesn't offer a specific blockchain course, but it does have a wide range of industry-related learning sources. A search for ‘blockchain’ will present you with almost 100 relevant video courses. You can find all sorts of lessons here, from beginner to expert levels. Lynda allows you to customize selection according to video duration, authors, software, subjects, etc. You can access the library for $15 a month. B9Lab B9Lab ETH-25 Certified Online Ethereum Developer Course is another course that promotes blockchain technology aimed at the Ethereum platform. It’s a 12-week in-depth learning solution that targets experienced programmers. B9Lab introduces everything there is to know about blockchain and how to build useful applications. Participants are taught about the Ethereum platform, the programming language Solidity, how to use web3 and the Truffle framework, and how to tie everything together. The price is €1450 or about $1700. IBM IBM made a self-paced blockchain course, titled Blockchain Essentials that lasts over two hours. The video lectures and lab in this course help you learn about blockchain for business and explore key use cases that demonstrate how the technology adds value. You can learn how to leverage blockchain benefits, transform your business with the new technology, and transfer assets. Besides that, you get a nice wrap-up and a quiz to test your knowledge upon completion. IBM’s course is free of charge. Khan Academy Khan Academy is the last, but certainly not the least important online course on our list. It gives users a comprehensive overview of blockchain-powered systems, particularly Bitcoin. Using this platform, you can learn more on cryptocurrency transactions, security, proof of work, etc. As an online education platform, Khan Academy won’t cost you a dime. [dropcap]B[/dropcap]lockchain is the groundbreaking technology that opens new boundaries in almost every field of business. It directly influences financial markets, data management, digital security, and a variety of other industries. In this post, we presented 9 best blockchain online courses you should try. These sources can teach you everything there is to know about the blockchain basics. Take some time to check them out and you won’t regret it! Author Bio: Olivia is a passionate blogger who writes on topics of digital marketing, career, and self-development. She constantly tries to learn something new and to share this experience on various websites. Connect with her on Facebook and Twitter. Google introduces Machine Learning courses for AI beginners Microsoft start AI School to teach Machine Learning and Artificial Intelligence.

0
0
5451

article-image-the-white-house-is-reportedly-launching-an-antitrust-investigation-against-social-media-companies

Sugandha Lahoti

26 Sep 2018

3 min read

The White House is reportedly launching an antitrust investigation against social media companies

Sugandha Lahoti

26 Sep 2018

3 min read

According to information obtained by Bloomberg, The White House is reportedly making a draft executive order against online platform bias in Social Media firms. Per this draft, federal antitrust and law enforcement agencies are instructed to investigate into the practices of Google, Facebook, and other social media companies. The existence of the draft was first reported by Capital Forum. Federal law enforcers are required to investigate primarily against two violations. First, if an online platform has acted in violation of the antitrust laws. Second, to remove anti-competitive spirit among online platforms and address online platform bias. Per the sources by Capital Forum, the draft is written in two parts. The first part is a policy statement stating that online platforms are central to the flow of information and commerce and need to be held accountable through competition. The second part instructs agencies to investigate bias and anticompetitive conduct in online platforms where they have the authority. In case of lack of authorization, they are required to report concerns or issues to the Federal Trade Commission or the Department of Justice. No online platforms are mentioned by name in the draft. It’s unclear when or if the White House will decide to issue the order. Donald Trump and the White House have always been vocal about the prevalent bias in Social media platforms. In August, Trump tweeted about Social Media discriminating against Republican and Conservative voices. Source: Twitter He also went on to claim that Google search results for “Trump News” reports fake news. He accused the search engines’ algorithms of being rigged. However, that allegation having not been backed by evidence, let Google slam Trump’s accusations, asserting that its search engine algorithms do not favor any political ideology. Earlier this month, Facebook COO Sheryl Sandberg and Twitter CEO Jack Dorsey faced the Senate Select Intelligence Committee, to discuss foreign interference through social media platforms in US elections. Google, Facebook, and Twitter also released a Testimony ahead of appearing before the committee. As reported by the Wall Street Journal, Google CEO Sundar Pichai also plans to meet privately with top Republican lawmakers this Friday to discuss a variety of topics, including the company's alleged political bias in search results. This meeting is organized by the House Majority Leader, Kevin McCarthy. Pichai said on Tuesday that “I look forward to meeting with members on both sides of the aisle, answering a wide range of questions, and explaining our approach." Google is also facing public scrutiny over a report that it intends to launch a censored search engine in China. Google’s custom search engine would link Chinese users’ search queries to their personal phone numbers, thus making it easier for the government to track their searches. About a thousand Google employees frustrated with a series of controversies involving Google have signed a letter to demand transparency on building the alleged search engine. Google’s new Privacy Chief officer proposes a new framework for Security Regulation. Amazon is the next target on EU’s antitrust hitlist. Mark Zuckerberg publishes Facebook manifesto for safeguarding against political interference.

0
0
1932

article-image-what-is-a-convolutional-neural-network-cnn-video

Richard Gall

25 Sep 2018

5 min read

What is a convolutional neural network (CNN)? [Video]

Richard Gall

25 Sep 2018

5 min read

0
0
27202

article-image-how-far-will-facebook-go-to-fix-what-it-broke-democracy-trust-reality

Aarthi Kumaraswamy

24 Sep 2018

19 min read

How far will Facebook go to fix what it broke: Democracy, Trust, Reality

Aarthi Kumaraswamy

24 Sep 2018

19 min read

Facebook, along with other tech media giants, like Twitter and Google, broke the democratic process in 2016. Facebook also broke the trust of many of its users as scandal after scandal kept surfacing telling the same story in different ways - the story of user data and trust abused in exchange for growth and revenue. The week before last, Mark Zuckerberg posted a long explanation on Facebook titled ‘Preparing for Elections’. It is the first of a series of reflections by Zuckerberg that ‘address the most important issues facing Facebook’. That post explored what Facebook is doing to avoid ending up in a situation similar to the 2016 elections when the platform ‘inadvertently’ became a super-effective channel for election interference of various kinds. It follows just weeks after Facebook COO, Sheryl Sandberg appeared in front of a Senate Intelligence hearing alongside Twitter CEO, Jack Dorsey on the topic of social media’s role in election interference. Zuckerberg’s mobile-first rigor oversimplifies the issues Zuckerberg opened his post with a strong commitment to addressing the issues plaguing Facebook using the highest levels of rigor the company has known in its history. He wrote, “I am bringing the same focus and rigor to addressing these issues that I've brought to previous product challenges like shifting our services to mobile.” To understand the weight of this statement we must go back to how Facebook became a mobile-first company that beat investor expectations wildly. Suffice to say it went through painful years of restructuring and reorientation in the process. Those unfamiliar with that phase of Facebook, please read the section ‘How far did Facebook go to become a mobile-first company?’ at the end of this post for more details. To be fair, Zuckerberg does acknowledge that pivoting to mobile was a lot easier than what it will take to tackle the current set of challenges. He writes, “These issues are even harder because people don't agree on what a good outcome looks like, or what tradeoffs are acceptable to make. When it comes to free expression, thoughtful people come to different conclusions about the right balances. When it comes to implementing a solution, certainly some investors disagree with my approach to invest so much on security. We have a lot of work ahead, but I am confident we will end this year with much more sophisticated approaches than we began, and that the focus and investments we've put in will be better for our community and the world over the long term.” However, what Zuckerberg does not acknowledge in the above statement is that the current set of issues is not merely a product challenge, but a business ethics and sustainability challenge. Unless ‘an honest look in the mirror’ kind of analysis is done on that side of Facebook, any level of product improvements will only result in cosmetic changes that will end in an ‘operation successful, patient dead’ scenario. In the coming sections, I attempt to dissect Zuckerberg’s post in the context of the above points by reading between the lines to see how serious the platform really is about changing its ways to ‘be better for our community and the world over the long term’. Why does Facebook’s commitment to change feel hollow? Let’s focus on election interference in this analysis as Zuckerberg limits his views to this topic in his post. Facebook has been at the center of this story on many levels. Here is some context on where Zuckerberg is coming from. Facebook’s involvement in the 2016 election meddling Apart from the traditional cyber-attacks (which they had even back then managed to prevent successfully), there were Russia-backed coordinated misinformation campaigns found on the platform. Then there was also the misuse of its user data by data analytics firm, Cambridge Analytica, which consulted on election campaigning. They micro-profiled users based on their psychographics (the way they think and behave) to ensure more effective ad spending by political parties. There was also the issue of certain kinds of ads, subliminal messages and peer pressure sent out to specific Facebook users during elections to prompt them to vote for certain candidates while others did not receive similar messages. There were also alleged reports of a certain set of users having been sent ‘dark posts’ (posts that aren’t publicly visible to all, but visible only to those on the target list) to discourage them from voting altogether. It also appears that Facebook staff offered both the Clinton and the Trump campaigns to assist with Facebook advertising. The former declined the offer while the latter accepted. We don’t know which of the above and to what extent each of these decisions and actions impacted the outcome of the 2016 US presidential elections. But one thing is certain, collectively they did have a significant enough impact for Zuckerberg and team to acknowledge these are serious problems that they need to address, NOW! Deconstructing Zuckerberg’s ‘Protecting Elections’ Before diving into what is problematic about the measures that are taken (or not taken) by Facebook, I must commend them for taking ownership of their role in election interference in the past and for attempting to rectify the wrongs. I like that Zuckerberg has made himself vulnerable by sharing his corrective plans with the public while it is a work in progress and is engaging with the public at a personal level. Facebook’s openness to academic research using anonymized Facebook data and their willingness to permit publishing findings without Facebook’s approval is also noteworthy. Other initiatives such as the political ad transparency report, AI enabled fake account & fake news reduction strategy, doubling the content moderator base, improving their recommendation algorithms are all steps in the right direction. However, this is where my list of nice things to say ends. The overall tone of Zuckerberg’s post is that of bargaining rather than that of acceptance. Interestingly this was exactly the tone adopted by Sandberg as well in the Senate hearing earlier this month, down to some very similar phrases. This makes one question if everything isn’t just one well-orchestrated PR disaster management plan. Disappointingly, most of the actions stated in Zuckerberg's post feel like half-measures; I get the sense that they aren’t willing to go the full distance to achieve the objectives they set for themselves. I hope to be wrong. 1. Zuckerberg focuses too much on ‘what’ and ‘how’, is ignoring the ‘why’ Zuckerberg identifies three key issues he wants to address in 2018: preventing election interference, protecting the community from abuse, and providing users with better control over their information. This clarity is a good starting point. In this post, he only focuses on the first issue. So I will reserve sharing my detailed thoughts on the other two for now. What I would say for now is that the key to addressing all issues on Facebook is taking a hard look at Facebook policies, including privacy, from a mission statement perspective. In other words, be honest about ‘Why Facebook exists’. Users are annoyed, advertisers are not satisfied and neither are shareholders confident about Facebook’s future. Trying to be everyone’s friend is clearly not working for Facebook. As such, I expected this in the opening part of the series. ‘Be better for our community and the world over the long term’ is too vague of a mission statement to be of any practical use. 2. Political Ad transparency report is necessary, but not sufficient In May this year, Facebook released its first political ad transparency report as a gesture to show its commitment to minimizing political interference. The report allows one to see who sponsored which issue advertisement and for how much. This was a move unanimously welcomed by everyone and soon others like Twitter and Google followed suit. By doing this, Facebook hopes to allow its users to form more informed views about political causes and other issues. Here is my problem with this feature. (Yes, I do view this report as a ‘feature’ of the new Facebook app which serves a very specific need: to satisfy regulators and media.) The average Facebook user is not the politically or technologically savvy consumer. They use Facebook to connect with friends and family and maybe play silly games now and then. The majority of these users aren’t going to proactively check out this ad transparency report or the political ad database to arrive at the right conclusions. The people who will find this report interesting are academic researchers, campaign managers, and analysts. It is one more rich data point to understand campaign strategy and thereby infer who the target audience is. This could most likely lead to a downward spiral of more and more polarizing ads from parties across the spectrum. 3. How election campaigning, hate speech, and real violence are linked but unacknowledged Another issue closely tied with political ads is hate speech and violence-inciting polarising content that aren’t necessarily paid ads. These are typical content in the form of posts, images or videos that are posted as a response to political ads or discourses. These act as carriers that amplify the political message, often in ways unintended by the campaigners themselves. The echo chambers still exist. And the more one's ecosystem or ‘look-alike audience’ responds to certain types of ads or posts, users are more likely to keep seeing them, thanks to Facebook's algorithms. Seeing something that is endorsed by one’s friends often primes one to trust what is said without verifying the facts for themselves thus enabling fake news to go viral. The algorithm does the rest to ensure everyone who will engage with the content sees it. Newsy political ads will thrive in such a setup while getting away with saying ‘we made full disclosure in our report’. All of this is great for Facebook’s platform as it not only gets great engagement from the content but also increased ad spendings from all political parties as they can’t afford to be missing from action on Facebook. A by-product of this ultra-polarised scenario though is more protectionism and less free, open and meaningful dialog and debate between candidates as well as supporters on the platform. That’s bad news for the democratic process. 4. Facebook’s election interference prevention model is not scalable Their single-minded focus on eliminating US election interference on Facebook’s platforms through a multipronged approach to content moderation is worth appreciating. This also makes one optimistic about Facebook’s role in consciously attempting to do the right thing when it comes to respecting election processes in other nations as well. But the current approach of creating an ‘election war room’ is neither scalable nor sustainable. What happens everytime a constituency in the US has some election or some part of the world does? What happens when multiple elections take place across the world simultaneously? Who does Facebook prioritize to provide election interference defense support and why? Also, I wouldn’t go too far to trust that they will uphold individual liberties in troubled nations with strong regimes or strong divisive political discourses. What happens when the ruling party is the one interfering with the elections? Who is Facebook answerable to? 5. Facebook’s headcount hasn’t kept up with its own growth ambitions Zuckerberg proudly states in his post that they’ve deleted a billion fake accounts with machine learning and have double the number of people hired to work on safety and security. "With advances in machine learning, we have now built systems that block millions of fake accounts every day. In total, we removed more than one billion fake accounts -- the vast majority within minutes of being created and before they could do any harm -- in the six months between October and March. ....it is still very difficult to identify the most sophisticated actors who build their networks manually one fake account at a time. This is why we've also hired a lot more people to work on safety and security -- up from 10,000 last year to more than 20,000 people this year." ‘People working on safety and security’ could have a wide range of job responsibilities from network security engineers to security guards hired at Facebook offices. What is missing conspicuously in the above picture is a breakdown of the number of people hired specifically to fact check, moderate content and resolve policy related disputes and review flagged content. With billions of users posting on Facebook, the job of content moderators and policy enforcers, even when assisted by algorithms, is massive. It is important that they are rightly incentivized to do their job well and are set clear and measurable goals. The post neither talks of how Facebook plans to reward moderators and neither does it talk about what the yardsticks for performance in this area would be. Facebook fails to acknowledge that it is not fully prepared, partly because it is understaffed. 6. The new Product Policy Director, human rights role is a glorified Public Relations job The weekend following Zuckerberg’s post, a new job opening appeared on Facebook’s careers page for the position of ‘Product policy director, human rights’. Below snippet is taken from that job posting. Source: Facebook careers The above is typically what a Public relations head does as well. Not only are the responsibilities cited above heavily communication and public perception building based, there’s not much given in terms of authority to this role to influence how other teams achieve their goals. Simply put, this role ‘works with, coordinates or advises teams’, it does not ‘guide or direct teams’. Als,o another key point to observe is that this role aims to add another layer of distance to further minimize exposure for Zuckerberg, Sandberg and other top key executives in public forums such as congressional hearings or press meets. Any role/area that is important to a business typically finds a place at the C-suite table. Had this new role been one of the c-suite roles it would have been advertised so, and it may have had some teeth. Of the 24 key executives in Facebook, only one is concerned with privacy and policy, ‘Chief Privacy Officer & VP of U.S. Public Policy’. Even this role does not have a global directive or public welfare in mind. On the other hand, there are multiple product development, creative and business development roles on Facebook’s c-suite. There is even a separate watch product head, a messaging product head, and one just dedicated to China called ‘Head of Creative Shop - Greater China’. This is why Facebook’s plan to protect elections will fail I am afraid Facebook’s greatest strength is also it’s Achilles heel. The tech industry’s deified hacker culture is embodied perfectly by Facebook. Facebook’s ad revenue based flawed business model is the ingenious creation of that very hacker culture. Any attempts to correct everything else is futile without correcting the issues with the current model. The ad revenue based model is why the Facebook app is designed the way it is: with ‘relevant’ news feeds, filter bubbles and look-alike audience segmentation. It is the reason why viral content gets rewarded irrespective of its authenticity or the impact it has on society. It is also the reason why Facebook has a ‘move fast and break things’ internal culture where growth at all costs is favored and idolized. Facebook’s Q2 2018 Earnings summary highlights the above points succinctly. Source: Facebook's SEC Filing The above snapshot means that even if we assume all 30k odd employees do some form of content moderation (the probability of which is zero), every employee is responsible for 50k users’ content daily. Let’s say every user only posts 1 post a day. If we assume Facebook’s news feed algorithms are super efficient and only find 2% of the user content questionable/fake (as speculated by Sandberg in her Senate hearing this month), that would still mean nearly 1k posts per person to review every day! What can Facebook do to turn over a new leaf? Unless Facebook attempts to sincerely address at least some of the below, I will continue to be skeptical of any number of beautifully written posts by Zuckerberg or patriotically orated speeches by Sandberg. A content moderation transparency report that shares not just the number of posts moderated, the number of people working to moderate content on Facebook but also the nature of content moderated, the moderators’ job satisfaction levels, their tenure, qualifications, career aspirations, their challenges, and how much Facebook is investing in people, processes and technology to make its platform safe and objective for everyone to engage with others. A general Ad transparency report that not only lists advertisers on Facebook but also their spendings and chosen ad filters for the public and academia to review or analyze any time. Taking responsibility for the real-world consequences of actions enabled by Facebook. Like the recent gender and age discrimination employment ads shown on Facebook. Really banning hate speech and fake viral content. Bring in a business/AI ethics head who is only next to Zuckerberg and equal to Sandberg’s COO role. Exploring and experimenting with other alternative revenue channels to tackle the current ad-driven business model problem. Resolving the UI problem so that users can gain back control over their data and make it easy to choose to not participate in Facebook’s data experiments. This would mean a potential loss in some ad revenue. The ‘grow hacker’ culture problem that is a byproduct of years of moving fast and breaking things. This would mean a significant change in behavior by everyone starting from the top and probably restructuring the way teams are organized and business is done. It would also mean a different definition and measurement of success which could lead to shareholder backlash. But Mark is uniquely placed to withstand these pressures given his clout over the board voting powers. Like Augustus Caesar his role model, Zuckerberg has a chance to make history. But he might have to put the company through hard and sacrificing times in exchange for the proverbial 200 years of world peace. He’s got the best minds and limitless resources at his disposal to right what he and his platform wronged. But he would have to make enemies with the hands that feed him. Would he rise to the challenge? Like Augustus who is rumored to have killed his grandson, will Zuckerberg ever be prepared to kill his ad revenue generating brainchild? In the meanwhile, we must not underestimate the power of good digital citizenry. We must continue to fight the good fight to move tech giants like Facebook in the right direction. Just as persistent trickling water droplets can erode mountains and create new pathways, so can our mindful actions as digital platform users prompt major tech reforms. It could be as bold as deleting one's Facebook account (I haven’t been on the platform for years now, and I don’t miss it at all). You could organize groups to create awareness on topics like digital privacy, fake news, filter bubbles, or deliberately choose to engage with those whose views differ from yours to understand their perspective on topics and thereby do your part in reversing algorithmically accentuated polarity. It could also be by selecting the right individuals to engage in informed dialog with tech conglomerates. Not every action needs to be hard though. It could be as simple as customizing your default privacy settings or choosing to only spend a select amount of time on such platforms, or deciding to verify the authenticity and assessing the toxicity of a post you wish to like, share or forward to your network. Addendum How far did Facebook go to become a mobile-first company? Following are some of the things Facebook did to become the largest mobile advertising platform in the world, surpassing Google by a huge margin. Clear purpose and reason for the change: “For one, there are more mobile users. Second, they’re spending more time on it... third, we can have better advertising on mobile, make more money,” said Zuckerberg at TechCrunch Disrupt back in 2012 on why they were becoming mobile first. In other words, there was a lot of growth and revenue potential in investing in this space. This was a simple and clear ‘what’s in it for me’ incentive for everyone working to make the transition as well for stockholders and advertisers to place their trust in Zuckerberg’s endeavors. Setting company-wide accountability: “We realigned the company around, so everybody was responsible for mobile.”, said the then President of Business and Marketing Partnerships David Fischer to Fortune in 2013. Willing to sacrifice desktop for mobile: Facebook decided to make a bold gamble to lose its desktop users to grow its unproven mobile platform. Essentially it was willing to bet its only cash cow for a dark horse that was dependent on so many other factors to go right. Strict consequences for non-compliance: Back in the days of transitioning to a mobile-first company Zuckerberg famously said to all his product teams that when they went in for reviews: “Come in with mobile. If you come in and try to show me a desktop product, I’m going to kick you out. You have to come in and show me a mobile product.” Expanding resources and investing in reskilling: They grew their team of 20 mobile engineers to literally all engineers at Facebook undergoing training courses on iOS and Android development. “we’ve completely changed the way we do product development. We’ve trained all our engineers to do mobile first.”, said Facebook’s VP of corporate development, Vaughan Smith to TechCrunch by the end of 2012. Realigning product design philosophy: Designed custom features for the mobile-first interface instead of trying to adapt the features for the web to mobile. In other words, they began with mobile as their default user interface. Local and global user behavior sensitization: Some of their engineering teams even did field visits to developing nations like the Philippines to see first hand how mobile apps are being used there. Environmental considerations in app design: Facebook even had the foresight to consider scenarios where mobile users may not have quality internet signals or poor quality mobile battery related issues. They designed their apps keeping these future needs in mind.

0
0
2855

article-image-how-facebook-is-advancing-artificial-intelligence-video

Richard Gall

14 Sep 2018

4 min read

How Facebook is advancing artificial intelligence [Video]

Richard Gall

14 Sep 2018

4 min read

0
0
2804

article-image-getting-started-with-amazon-machine-learning-workflow-tutorial

Melisha Dsouza

02 Sep 2018

14 min read

Getting started with Amazon Machine Learning workflow [Tutorial]

Melisha Dsouza

02 Sep 2018

14 min read

Amazon Machine Learning is useful for building ML models and generating predictions. It also enables the development of robust and scalable smart applications. The process of building ML models with Amazon Machine Learning consists of three operations: data analysis model training evaluation. The code files for this article are available on Github. This tutorial is an excerpt from a book written by Alexis Perrier titled Effective Amazon Machine Learning. The Amazon Machine Learning service is available at https://console.aws.amazon.com/machinelearning/. The Amazon ML workflow closely follows a standard Data Science workflow with steps: Extract the data and clean it up. Make it available to the algorithm. Split the data into a training and validation set, typically a 70/30 split with equal distribution of the predictors in each part. Select the best model by training several models on the training dataset and comparing their performances on the validation dataset. Use the best model for predictions on new data. As shown in the following Amazon ML menu, the service is built around four objects: Datasource ML model Evaluation Prediction The Datasource and Model can also be configured and set up in the same flow by creating a new Datasource and ML model. Let us take a closer look at each one of these steps. Understanding the dataset used We will use the simple Predicting Weight by Height and Age dataset (from Lewis Taylor (1967)) with 237 samples of children's age, weight, height, and gender, which is available at https://v8doc.sas.com/sashtml/stat/chap55/sect51.htm. This dataset is composed of 237 rows. Each row has the following predictors: sex (F, M), age (in months), height (in inches), and we are trying to predict the weight (in lbs) of these children. There are no missing values and no outliers. The variables are close enough in range and normalization is not required. We do not need to carry out any preprocessing or cleaning on the original dataset. Age, height, and weight are numerical variables (real-valued), and sex is a categorical variable. We will randomly select 20% of the rows as the held-out subset to use for prediction on previously unseen data and keep the other 80% as training and evaluation data. This data split can be done in Excel or any other spreadsheet editor: By creating a new column with randomly generated numbers Sorting the spreadsheet by that column Selecting 190 rows for training and 47 rows for prediction (roughly a 80/20 split) Let us name the training set LT67_training.csv and the held-out set that we will use for prediction LT67_heldout.csv, where LT67 stands for Lewis and Taylor, the creator of this dataset in 1967. As with all datasets, scripts, and resources mentioned in this book, the training and holdout files are available in the GitHub repository at https://github.com/alexperrier/packt-aml. It is important for the distribution in age, sex, height, and weight to be similar in both subsets. We want the data on which we will make predictions to show patterns that are similar to the data on which we will train and optimize our model. Loading the data on S3 Follow these steps to load the training and held-out datasets on S3: Go to your s3 console at https://console.aws.amazon.com/s3. Create a bucket if you haven't done so already. Buckets are basically folders that are uniquely named across all S3. We created a bucket named aml.packt. Since that name has now been taken, you will have to choose another bucket name if you are following along with this demonstration. Click on the bucket name you created and upload both the LT67_training.csv and LT67_heldout.csv files by selecting Upload from the Actions drop-down menu: Both files are small, only a few KB, and hosting costs should remain negligible for that exercise. Note that for each file, by selecting the Properties tab on the right, you can specify how your files are accessed, what user, role, group or AWS service may download, read, write, and delete the files, and whether or not they should be accessible from the Open Web. When creating the datasource in Amazon ML, you will be prompted to grant Amazon ML access to your input data. You can specify the access rules to these files now in S3 or simply grant access later on. Our data is now in the cloud in an S3 bucket. We need to tell Amazon ML where to find that input data by creating a datasource. We will first create the datasource for the training file ST67_training.csv. Declaring a datasource Go to the Amazon ML dashboard, and click on Create new... | Datasource and ML model. We will use the faster flow available by default: As shown in the following screenshot, you are asked to specify the path to the LT67_training.csv file {S3://bucket}{path}{file}. Note that the S3 location field automatically populates with the bucket names and file names that are available to your user: Specifying a Datasource name is useful to organize your Amazon ML assets. By clicking on Verify, Amazon ML will make sure that it has the proper rights to access the file. In case it needs to be granted access to the file, you will be prompted to do so as shown in the following screenshot: Just click on Yes to grant access. At this point, Amazon ML will validate the datasource and analyze its contents. Creating the datasource An Amazon ML datasource is composed of the following: The location of the data file: The data file is not duplicated or cloned in Amazon ML but accessed from S3 The schema that contains information on the type of the variables contained in the CSV file: Categorical Text Numeric (real-valued) Binary It is possible to supply Amazon ML with your own schema or modify the one created by Amazon ML. At this point, Amazon ML has a pretty good idea of the type of data in your training dataset. It has identified the different types of variables and knows how many rows it has: Move on to the next step by clicking on Continue, and see what schema Amazon ML has inferred from the dataset as shown in the next screenshot: Amazon ML needs to know at that point which is the variable you are trying to predict. Be sure to tell Amazon ML the following: The first line in the CSV file contains te column name The target is the weight We see here that Amazon ML has correctly inferred the following: sex is categorical age, height, and weight are numeric (continuous real values) Since we chose a numeric variable as the target Amazon ML, will use Linear Regression as the predictive model. For binary or categorical values, we would have used Logistic Regression. This means that Amazon ML will try to find the best a, b, and c coefficients so that the weight predicted by the following equation is as close as possible to the observed real weight present in the data: predicted weight = a * age + b * height + c * sex Amazon ML will then ask you if your data contains a row identifier. In our present case, it does not. Row identifiers are useful when you want to understand the prediction obtained for each row or add an extra column to your dataset later on in your project. Row identifiers are for reference purposes only and are not used by the service to build the model. You will be asked to review the datasource. You can go back to each one of the previous steps and edit the parameters for the schema, the target and the input data. Now that the data is known to Amazon ML, the next step is to set up the parameters of the algorithm that will train the model. Understanding the model We select the default parameters for the training and evaluation settings. Amazon ML will do the following: Create a recipe for data transformation based on the statistical properties it has inferred from the dataset Split the dataset (ST67_training.csv) into a training part and a validation part, with a 70/30 split. The split strategy assumes the data has already been shuffled and can be split sequentially. The recipe will be used to transform the data in a similar way for the training and the validation datasets. The only transformation suggested by Amazon ML is to transform the categorical variable sex into a binary variable, where m = 0 and f = 1 for instance. No other transformation is needed. The default advanced settings for the model are shown in the following screenshot: We see that Amazon ML will pass over the data 10 times, shuffle splitting the data each time. It will use an L2 regularization strategy based on the sum of the square of the coefficients of the regression to prevent overfitting. We will evaluate the predictive power of the model using our LT67_heldout.csv dataset later on. Regularization comes in 3 levels with a mild (10^-6), medium (10^-4), or aggressive (10^-02) setting, each value stronger than the previous one. The default setting is mild, the lowest, with a regularization constant of 0.00001 (10^-6) implying that Amazon ML does not anticipate much overfitting on this dataset. This makes sense when the number of predictors, three in our case, is much smaller than the number of samples (190 for the training set). Clicking on the Create ML model button will launch the model creation. This takes a few minutes to resolve, depending on the size and complexity of your dataset. You can check its status by refreshing the model page. In the meantime, the model status remains pending. At that point, Amazon ML will split our training dataset into two subsets: a training and a validation set. It will use the training portion of the data to train several settings of the algorithm and select the best one based on its performance on the training data. It will then apply the associated model to the validation set and return an evaluation score for that model. By default, Amazon ML will sequentially take the first 70% of the samples for training and the remaining 30% for validation. It's worth noting that Amazon ML will not create two extra files and store them on S3, but instead create two new datasources out of the initial datasource we have previously defined. Each new datasource is obtained from the original one via a Data rearrangement JSON recipe such as the following: { "splitting": { "percentBegin": 0, "percentEnd": 70 } } You can see these two new datasources in the Datasource dashboard. Three datasources are now available where there was initially only one, as shown by the following screenshot: While the model is being trained, Amazon ML runs the Stochastic Gradient algorithm several times on the training data with different parameters: Varying the learning rate in increments of powers of 10: 0.01, 0.1, 1, 10, and 100. Making several passes over the training data while shuffling the samples before each path. At each pass, calculating the prediction error, the Root Mean Squared Error (RMSE), to estimate how much of an improvement over the last pass was obtained. If the decrease in RMSE is not really significant, the algorithm is considered to have converged, and no further pass shall be made. At the end of the passes, the setting that ends up with the lowest RMSE wins, and the associated model (the weights of the regression) is selected as the best version. Once the model has finished training, Amazon ML evaluates its performance on the validation datasource. Once the evaluation itself is also ready, you have access to the model's evaluation. Evaluating the model Amazon ML uses the standard metric RMSE for linear regression. RMSE is defined as the sum of the squares of the difference between the real values and the predicted values: Here, ŷ is the predicted values, and y the real values we want to predict (the weight of the children in our case). The closer the predictions are to the real values, the lower the RMSE is. A lower RMSE means a better, more accurate prediction. Making batch predictions We now have a model that has been properly trained and selected among other models. We can use it to make predictions on new data. A batch prediction consists in applying a model to a datasource in order to make predictions on that datasource. We need to tell Amazon ML which model we want to apply on which data. Batch predictions are different from streaming predictions. With batch predictions, all the data is already made available as a datasource, while for streaming predictions, the data will be fed to the model as it becomes available. The dataset is not available beforehand in its entirety. In the Main Menu select Batch Predictions to access the dashboard predictions and click on Create a New Prediction: The first step is to select one of the models available in your model dashboard. You should choose the one that has the lowest RMSE: The next step is to associate a datasource to the model you just selected. We had uploaded the held-out dataset to S3 at the beginning of this chapter (under the Loading the data on S3 section) but had not used it to create a datasource. We will do so now.When asked for a datasource in the next screen, make sure to check My data is in S3, and I need to create a datasource, and then select the held-out dataset that should already be present in your S3 bucket: Don't forget to tell Amazon ML that the first line of the file contains columns. In our current project, our held-out dataset also contains the true values for the weight of the students. This would not be the case for "real" data in a real-world project where the real values are truly unknown. However, in our case, this will allow us to calculate the RMSE score of our predictions and assess the quality of these predictions. The final step is to click on the Verify button and wait for a few minutes: Amazon ML will run the model on the new datasource and will generate predictions in the form of a CSV file. Contrary to the evaluation and model-building phase, we now have real predictions. We are also no longer given a score associated with these predictions. After a few minutes, you will notice a new batch-prediction folder in your S3 bucket. This folder contains a manifest file and a results folder. The manifest file is a JSON file with the path to the initial datasource and the path to the results file. The results folder contains a gzipped CSV file: Uncompressed, the CSV file contains two columns, trueLabel, the initial target from the held-out set, and score, which corresponds to the predicted values. We can easily calculate the RMSE for those results directly in the spreadsheet through the following steps: Creating a new column that holds the square of the difference of the two columns. Summing all the rows. Taking the square root of the result. The following illustration shows how we create a third column C, as the squared difference between the trueLabel column A and the score (or predicted value) column B: As shown in the following screenshot, averaging column C and taking the square root gives an RMSE of 11.96, which is even significantly better than the RMSE we obtained during the evaluation phase (RMSE 14.4): The fact that the RMSE on the held-out set is better than the RMSE on the validation set means that our model did not overfit the training data, since it performed even better on new data than expected. Our model is robust. The left side of the following graph shows the True (Triangle) and Predicted (Circle) Weight values for all the samples in the held-out set. The right side shows the histogram of the residuals. Similar to the histogram of residuals we had observed on the validation set, we observe that the residuals are not centered on 0. Our model has a tendency to overestimate the weight of the students: In this tutorial, we have successfully performed the loading of the data on S3 and let Amazon ML infer the schema and transform the data. We also created a model and evaluated its performance. Finally, we made a prediction on the held -out dataset. To understand how to leverage Amazon's powerful platform for your predictive analytics needs, check out this book Effective Amazon Machine Learning. Four interesting Amazon patents in 2018 that use machine learning, AR, and robotics Amazon Sagemaker makes machine learning on the cloud easy Amazon ML Solutions Lab to help customers “work backwards” and leverage machine learning

0
0
2148

article-image-understanding-amazon-machine-learning-workflow

Natasha Mathur

24 Aug 2018

11 min read

Understanding Amazon Machine Learning Workflow [ Tutorial ]

Natasha Mathur

24 Aug 2018

11 min read

This article presents an overview of the workflow of a simple Amazon Machine Learning (Amazon ML) project. Amazon Machine Learning is an online service by Amazon Web Services (AWS) that does supervised learning for predictive analytics. Launched in April 2015 at the AWS Summit, Amazon ML joins a growing list of cloud-based machine learning services, such as Microsoft Azure, Google prediction, IBM Watson, Prediction IO, BigML, and many others. These online machine learning services form an offer commonly referred to as Machine Learning as a Service or MLaaS following a similar denomination pattern of other cloud-based services such as SaaS, PaaS, and IaaS respectively for Software, Platform, or Infrastructure as a Service. The Amazon ML workflow closely follows a standard Data Science workflow with steps: Extract the data and clean it up. Make it available to the algorithm. Split the data into a training and validation set, typically a 70/30 split with equal distribution of the predictors in each part. Select the best model by training several models on the training dataset and comparing their performances on the validation dataset. Use the best model for predictions on new data. This article is an excerpt taken from the book 'Effective Amazon Machine Learning' written by Alexis Perrier. As shown in the following Amazon ML menu, the service is built around four objects: Datasource ML model Evaluation Prediction The Datasource and Model can also be configured and set up in the same flow by creating a new Datasource and ML model. We will take a closer look at the Datasource and ML model. Amazon ML dataset For the rest of the article, we will use the simple Predicting Weight by Height and Age dataset (from Lewis Taylor (1967)) with 237 samples of children's age, weight, height, and gender, which is available at https://v8doc.sas.com/sashtml/stat/chap55/sect51.htm. This dataset is composed of 237 rows. Each row has the following predictors: sex (F, M), age (in months), height (in inches), and we are trying to predict the weight (in lbs) of these children. There are no missing values and no outliers. The variables are close enough in range and normalization is not required. In short, we do not need to carry out any preprocessing or cleaning on the original dataset. Age, height, and weight are numerical variables (real-valued), and sex is a categorical variable. We will randomly select 20% of the rows as the held-out subset to use for the prediction of previously unseen data and keep the other 80% as training and evaluation data. This data split can be done in Excel or any other spreadsheet editor: By creating a new column with randomly generated numbers Sorting the spreadsheet by that column Selecting 190 rows for training and 47 rows for prediction (roughly a 80/20 split) Let us name the training set LT67_training.csv and the held-out set that we will use for prediction LT67_heldout.csv, where LT67 stands for Lewis and Taylor, the creator of this dataset in 1967. Note that it is important for the distribution in age, sex, height, and weight to be similar in both subsets. We want the data on which we will make predictions to show patterns that are similar to the data on which we will train and optimize our model. Loading the data on Amazon S3 Follow these steps to load the training and held-out datasets on S3: Go to your s3 console at https://console.aws.amazon.com/s3. Create a bucket if you haven't done so already. Buckets are basically folders that are uniquely named across all S3. We created a bucket named aml.packt. Since that name has now been taken, you will have to choose another bucket name if you are following along with this demonstration. Click on the bucket name you created and upload both the LT67_training.csv and LT67_heldout.csv files by selecting Upload from the Actions drop-down menu: Both files are small, only a few KB, and hosting costs should remain negligible for that exercise. Note that for each file, by selecting the Properties tab on the right, you can specify how your files are accessed, what user, role, group or AWS service may download, read, write, and delete the files, and whether or not they should be accessible from the Open Web. When creating the datasource in Amazon ML, you will be prompted to grant Amazon ML access to your input data. You can specify the access rules to these files now in S3 or simply grant access later on. Our data is now in the cloud in an S3 bucket. We need to tell Amazon ML where to find that input data by creating a datasource. We will first create the datasource for the training file ST67_training.csv. Declaring a datasource Go to the Amazon ML dashboard, and click on Create new... | Datasource and ML model. We will use the faster flow available by default: As shown in the following screenshot, you are asked to specify the path to the LT67_training.csv file {S3://bucket}{path}{file}. Note that the S3 location field automatically populates with the bucket names and file names that are available to your user: Specifying a Datasource name is used to organize your Amazon ML assets. By clicking on Verify, Amazon ML will make sure that it has the proper rights to access the file. In case it needs to be granted access to the file, you will be prompted to do so as shown in the following screenshot: Just click on Yes to grant access. At this point, Amazon ML will validate the datasource and analyze its contents. Creating the datasource An Amazon ML datasource is composed of the following: The location of the data file: The data file is not duplicated or cloned in Amazon ML but accessed from S3 The schema that contains information on the type of the variables contained in the CSV file: Categorical Text Numeric (real-valued) Binary It is possible to supply Amazon ML with your own schema or modify the one created by Amazon ML. At this point, Amazon ML has a pretty good idea of the type of data in your training dataset. It has identified the different types of variables and knows how many rows it has: Move on to the next step by clicking on Continue, and see what schema Amazon ML has inferred from the dataset as shown in the next screenshot: Amazon ML needs to know at that point which is the variable you are trying to predict. Be sure to tell Amazon ML the following: The first line in the CSV file contains te column name The target is the weight We see here that Amazon ML has correctly inferred the following: sex is categorical age, height, and weight are numeric (continuous real values) Since we chose a numeric variable as the target Amazon ML, will use Linear Regression as the predictive model. For binary or categorical values, we would have used Logistic Regression. This means that Amazon ML will try to find the best a, b, and c coefficients so that the weight predicted by the following equation is as close as possible to the observed real weight present in the data: predicted weight = a * age + b * height + c * sex Amazon ML will then ask you if your data contains a row identifier. In our present case, it does not. Row identifiers are used when you want to understand the prediction obtained for each row or add an extra column to your dataset later on in your project. Row identifiers are for reference purposes only and are not used by the service to build the model. You will be asked to review the datasource. You can go back to each one of the previous steps and edit the parameters for the schema, the target, and the input data. Now that the data is known to Amazon ML, the next step is to set up the parameters of the algorithm that will train the model. The machine learning model We select the default parameters for the training and evaluation settings. Amazon ML will do the following: Create a step for data transformation based on the statistical properties it has inferred from the dataset Split the dataset (ST67_training.csv) into a training part and a validation part, with a 70/30 split. The split strategy assumes the data has already been shuffled and can be split sequentially. The step will be used to transform the data in a similar way for the training and the validation datasets. The only transformation suggested by Amazon ML is to transform the categorical variable sex into a binary variable, where m = 0 and f = 1 for instance. No other transformation is needed. The default advanced settings for the model are shown in the following screenshot: We see that Amazon ML will pass over the data 10 times, shuffle splitting the data each time. It will use an L2 regularization strategy based on the sum of the square of the coefficients of the regression to prevent overfitting. We will evaluate the predictive power of the model using our LT67_heldout.csv dataset later on. Regularization comes in 3 levels with a mild (10^-6), medium (10^-4), or aggressive (10^-02) setting, each value stronger than the previous one. The default setting is mild, the lowest, with a regularization constant of 0.00001 (10^-6) implying that Amazon ML does not anticipate much overfitting on this dataset. This makes sense when the number of predictors, three in our case, is much smaller than the number of samples (190 for the training set). Clicking on the Create ML model button will launch the model creation. This takes a few minutes to resolve, depending on the size and complexity of your dataset. You can check its status by refreshing the model page. In the meantime, the model status remains pending. At that point, Amazon ML will split our training dataset into two subsets: a training and a validation set. It will use the training portion of the data to train several settings of the algorithm and select the best one based on its performance on the training data. It will then apply the associated model to the validation set and return an evaluation score for that model. By default, Amazon ML will sequentially take the first 70% of the samples for training and the remaining 30% for validation. It's worth noting that Amazon ML will not create two extra files and store them on S3, but instead create two new datasources out of the initial datasource we have previously defined. Each new datasource is obtained from the original one via a Data rearrangement JSON recipe such as the following: { "splitting": { "percentBegin": 0, "percentEnd": 70 } } You can see these two new datasources in the Datasource dashboard. Three datasources are now available where there was initially only one, as shown by the following screenshot: While the model is being trained, Amazon ML runs the Stochastic Gradient algorithm several times on the training data with different parameters: Varying the learning rate in increments of powers of 10: 0.01, 0.1, 1, 10, and 100. Making several passes over the training data while shuffling the samples before each path. At each pass, calculating the prediction error, the Root Mean Squared Error (RMSE), to estimate how much of an improvement over the last pass was obtained. If the decrease in RMSE is not really significant, the algorithm is considered to have converged, and no further pass shall be made. At the end of the passes, the setting that ends up with the lowest RMSE wins, and the associated model (the weights of the regression) is selected as the best version. Once the model has finished training, Amazon ML evaluates its performance on the validation datasource. Once the evaluation itself is also ready, you have access to the model's evaluation. The Amazon ML flow is smooth and facilitates the inherent data science loop: data, model, evaluation, and prediction. We looked at an overview of the workflow of a simple Amazon Machine Learning (Amazon ML) project. We discussed two objects of the Amazon ML menu: Datasource and ML model. If you found this post useful, be sure to check out the book 'Effective Amazon Machine Learning' to learn about evaluation and prediction in Amazon ML along with other AWS ML concepts. Integrate applications with AWS services: Amazon DynamoDB & Amazon Kinesis [Tutorial] AWS makes Amazon Rekognition, its image recognition AI, available for Asia-Pacific developers

0
0
2250

article-image-four-ibm-facial-recognition-patents-in-2018-we-found-intriguing

Natasha Mathur

11 Aug 2018

10 min read

Four IBM facial recognition patents in 2018, we found intriguing

Natasha Mathur

11 Aug 2018

10 min read

0
0
5818

How-To Tutorials - Data

Privacy experts urge the Senate Commerce Committee for a strong federal privacy bill “that sets a floor, not a ceiling”

Microsoft open sources Infer.NET, it’s popular model-based machine learning framework

7 tips for using Git and GitHub the right way

PyTorch 1.0 preview release is production ready with torch.jit, c10d distributed library, C++ API

The ethical dilemmas developers working on Artificial Intelligence products must consider

BrainNet, an interface to communicate between human brains, could soon make Telepathy real

Did you know Facebook shares the data you share with them for ‘security’ reasons with advertisers?

9 recommended blockchain online courses

The White House is reportedly launching an antitrust investigation against social media companies

What is a convolutional neural network (CNN)? [Video]

Trending Topics

How far will Facebook go to fix what it broke: Democracy, Trust, Reality

How Facebook is advancing artificial intelligence [Video]

Getting started with Amazon Machine Learning workflow [Tutorial]

Understanding Amazon Machine Learning Workflow [ Tutorial ]

Four IBM facial recognition patents in 2018, we found intriguing

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access