
How-To Tutorials

Optimizing GraphQL with Apollo Engine [Tutorial]

Bhagyashree R
02 Apr 2019
13 min read
Apollo Engine is a commercial product from MDG (the Meteor Development Group), the company behind Apollo. It provides many great features, which we'll explore in this article. Using Apollo Engine, we'll also answer the following questions: how is our GraphQL API performing, are there any errors, and how can we improve the GraphQL schema?

This article is taken from the book Hands-on Full-Stack Web Development with GraphQL and React by Sebastian Grebe. The book guides you through implementing applications using React, Apollo, Node.js, and SQL; by the end of it, you will be proficient in using GraphQL and React for your full-stack development requirements. To follow along with the examples implemented in this article, you can download the code from the book's GitHub repository.

Setting up Apollo Engine

First, you need to sign up for an Apollo Engine account. At the time of writing, they offer three different plans, which you can find on their plans page. When signing up, you get a two-week trial of the Team plan, which is one of the paid plans; afterward, you'll be downgraded to the free plan. You should compare all three plans to understand how they differ; they're all worth checking out.

To sign up, go to the login page. Currently, you can only sign up using a GitHub account, so create one if you don't have one already. After logging in, you will see a dashboard that looks as follows:

The next step is to add a service with the NEW SERVICE button in the top-right corner. The first thing you need to enter is an id for your service that is unique across all Apollo Engine services. This id is auto-generated from the organization you select, but it can be customized.

Secondly, you will be asked to publish your GraphQL schema to Apollo Engine. Publishing your GraphQL schema means that you upload your schema to Apollo Engine so that it can be processed; it won't be made public to external users. You can do this using the command provided by Apollo Engine. For me, this command looked as follows:

    npx apollo service:push --endpoint="http://localhost:8000/graphql" --key="YOUR_KEY"

The endpoint must match your GraphQL route. The key comes from Apollo Engine itself, so you don't generate it on your own. Before running the command, you have to start the server; otherwise, the GraphQL schema isn't accessible. Once you've uploaded the schema, Apollo Engine will redirect you to the service you just set up.

Notice that the GraphQL introspection feature needs to be enabled. Introspection means that you can ask your GraphQL API which operations it supports. Introspection is only enabled when you run your Apollo Server in a development environment, or if you explicitly enable it in production. I highly discourage the latter, because it gives away information about the queries and mutations that your back end accepts. However, if you want to enable it, you can do so by setting the introspection field when initializing Apollo Server. It can be added inside the index.js file of the graphql folder:

    const server = new ApolloServer({
      schema: executableSchema,
      introspection: true,
      // ...the rest of the existing configuration
    });

Ensure that you remove the introspection field when deploying your application. If you aren't able to run the GraphQL server, you also have the option to specify a schema file instead.

Once you publish the GraphQL schema, the setup process for your Apollo Engine service is done. We'll explore the features that we can now use in the following sections of this article.
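One note on the introspection setting above: to avoid shipping it by accident, the flag can be derived from an environment variable instead of being hardcoded. The following is a minimal sketch, not taken from the book, that assumes apollo-server-express (as in an Express-based setup), the same executableSchema used above, and a standard NODE_ENV variable:

    // Enable introspection only outside production, so the field never has to be
    // removed by hand before a deployment. executableSchema is assumed to be
    // built elsewhere, as in the article's project.
    const { ApolloServer } = require('apollo-server-express');

    const server = new ApolloServer({
      schema: executableSchema,
      introspection: process.env.NODE_ENV !== 'production',
    });

With a guard like this in place, development keeps introspection available while production builds disable it automatically.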
Before doing this, however, we have to change one thing to get Apollo Engine working with our back end. We already used our API key to upload our GraphQL schema to Apollo Engine. Everything, such as error tracking and performance analysis, relies on this key, so we also have to insert it in our GraphQL server. If you enter a valid API key, all requests will be collected in Apollo Engine. Open index.js in the server's graphql folder and add the following object to the ApolloServer initialization:

    engine: {
      apiKey: ENGINE_KEY
    }

The ENGINE_KEY variable should be extracted from the environment variables at the top of the file, where we also extract JWT_SECRET:

    const { JWT_SECRET, ENGINE_KEY } = process.env;

Verify that everything is working by running some GraphQL requests. You can view all past requests by clicking on the Clients tab in Apollo Engine. You should see that a number of requests happened under the Activity in the last hour panel. If this isn't the case, there must be a problem with the Apollo Server configuration.

Analyzing schemas with Apollo Engine

The Community plan of Apollo Engine offers schema registry and explorer tools. You can find them by clicking on the Explorer tab in the left-hand panel. If your setup has gone well, the page should look as follows:

Let's take a closer look at this screenshot. On the page, you see the last GraphQL schema that you published. Each schema you publish gets a unique version, as long as the schema includes changes. Beneath the version number, you can see your entire GraphQL schema, and you can inspect all operations and types. All relations between types and operations are directly linked to each other. Next to each operation, type, and field, you can see the number of clients and various usage statistics. You can search through your GraphQL schema in the top bar and filter the usage statistics in the panel on the right.

You can also switch to the Deprecation tab at the top. This page gives you a list of fields that are deprecated. We won't use this page, because we are using the latest field definitions, but it's vital if you're running an application for a longer time.

Having an overview of our schema is beneficial. In production, every new release of our application is likely to bring changes to the GraphQL schema as well. With Apollo Engine, you can track those changes easily. This feature is called schema-change validation and is only included in the paid Team plan of Apollo Engine. It's worth the extra money because it allows you to track schema changes and compare how the affected fields are used, which lets you draw conclusions about which clients and versions are in use at the moment. I have created an example for you in the following screenshot:

Here, I published an initial version of our current GraphQL schema. Afterward, I added a demonstration type with one field, called example. On the right-hand side, you can see the schema difference between the initial and second releases of the GraphQL schema. Viewing your schema inside Apollo Engine, including the history of all previous schemas, is very useful.

Performance metrics with Apollo Engine

When your application is live and heavily used, you can't check the status of every feature yourself; it would be an impossible amount of work. Apollo Engine can tell you how your GraphQL API is performing by collecting statistics with each request it receives.
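For these statistics to mean anything, the server needs some traffic. A quick way to generate a few requests during development is a small script against the GraphQL endpoint. The following is a minimal sketch, not taken from the book, that assumes the node-fetch package and the local endpoint used earlier; the query here is a trivial one that is valid against any schema, so swap in an operation from your own schema (such as postsFeed) to see meaningful traces:

    // Send a named GraphQL operation to the local server so that Apollo Engine
    // has something to report on.
    const fetch = require('node-fetch');

    fetch('http://localhost:8000/graphql', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query: 'query EnginePing { __typename }' }),
    })
      .then(res => res.json())
      .then(result => console.log(result))
      .catch(err => console.error(err));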
You always have an overview of the general usage of your application: the number of requests it receives, the request latency, and the time taken to process each operation and type, right down to each field that is returned. Apollo Server can provide these precise analytics because each field is represented by a resolver function; the time taken to resolve each field is collected and stored inside Apollo Engine.

At the top of the Metrics page, you have four tabs. The first tab looks as follows:

If your GraphQL API has been running for more than a day, you'll receive an overview like the one here. The left-hand graph shows you the request rate over the last day. The graph in the middle shows the service time, which sums up the processing time of all requests. The right-hand graph gives you the number of errors, along with the queries that caused them. Under the overview, you'll find details about the current day, including the requests per minute, the request latency over time, and the request latency distribution:

Requests Per Minute (rpm): This is useful when your API is used very often. It indicates which requests are sent more often than others.
Latency over time: This is useful when requests to your API take too long to process. You can use this information to look for a correlation between the number of requests and increasing latency.
Request-latency distribution: This shows you the processing time and the number of requests, letting you compare the number of slow requests with the number of fast requests.

In the right-hand panel of Apollo Engine, under Metrics, you'll see all your GraphQL operations. If you select one of them, you get even more detailed statistics. Now, switch to the Traces tab at the top. The first chart on this page looks as follows:

The latency distribution chart shows all the different latencies for the currently selected operation, including the number of requests sent with that latency. In the preceding example, I used the postsFeed query. Each request latency has its own execution timetable, which you can see by clicking on any column in the preceding chart. The table should look like the following screenshot:

The execution timetable is a big foldable tree. It starts at the top with the root query, postsFeed in this case, and also shows the overall time it took to process the operation. Each resolver function has its own latency, which might include, for example, the time taken for each post and user to be queried from the database. All the times from within the tree are summed up and result in a total time of about 90 milliseconds. You should always check all operations and their latencies to identify performance breakdowns: your users should always have responsive access to your API, and this can easily be monitored with Apollo Engine.

Error tracking with Apollo Engine

We've already looked at how to inspect single operations using Apollo Engine. Under the Clients tab, you will find a separate view that covers all client types and their requests:

In this tab, you can directly see the percentage of errors that happened during each operation. In the currentUser query, 37.14% of requests resulted in errors. If you take a closer look at the left-hand side of the image, you will see that it says Unidentified clients. Since version 2.2.3 of Apollo Server, client awareness is supported. It allows you to identify the client and track how consumers use your API.
Apollo automatically extracts an extensions field inside each GraphQL operation, which can hold a name and version. Both fields, Name and Version, are then transferred directly to Apollo Engine, where we can filter by them. We will have a look at how to implement this in our back end next.

In this example, we'll use HTTP header fields to track the client type. There will be two header fields, apollo-client-name and apollo-client-version. We'll use these to set custom values so that we can filter requests later on the Clients page. Open the index.js file from the graphql folder and add the following function to the engine property of the ApolloServer initialization:

    engine: {
      apiKey: ENGINE_KEY,
      generateClientInfo: ({ request }) => {
        const headers = request.http.headers;
        const clientName = headers.get('apollo-client-name');
        const clientVersion = headers.get('apollo-client-version');
        if (clientName && clientVersion) {
          return { clientName, clientVersion };
        } else {
          return {
            clientName: "Unknown Client",
            clientVersion: "Unversioned",
          };
        }
      },
    },

The generateClientInfo function is executed with every request. We extract the two fields from the header. If they exist, we return an object with the clientName and clientVersion properties set to the values from the headers; otherwise, we return a static "Unknown Client" text.

To get both of our clients (the front end and the back end) set up, we have to add these fields. Perform the following steps:

1. Open the index.js file in the client's apollo folder. Add a new InfoLink to the file to set the two new header fields:

    const InfoLink = (operation, next) => {
      operation.setContext(context => ({
        ...context,
        headers: {
          ...context.headers,
          'apollo-client-name': 'Apollo Frontend Client',
          'apollo-client-version': '1'
        },
      }));
      return next(operation);
    };

Like AuthLink, this link will add the two new header fields next to the authorization header. It sets the version header to '1' and the name of the client to 'Apollo Frontend Client'. We will see both in Apollo Engine soon.

2. Add InfoLink in front of AuthLink in the ApolloLink.from function.

3. On the back end, we need to edit the apollo.js file in the ssr folder:

    const InfoLink = (operation, next) => {
      operation.setContext(context => ({
        ...context,
        headers: {
          ...context.headers,
          'apollo-client-name': 'Apollo Backend Client',
          'apollo-client-version': '1'
        },
      }));
      return next(operation);
    };

The link is almost the same as the one for the front end, except that it sets a different apollo-client-name header. Add it just before AuthLink in the ApolloLink.from function.

The client name differs between the front-end and back-end code so that you can compare both clients inside Apollo Engine. If you execute some requests from the back end and front end, you can see the result of these changes directly in Apollo Engine. Here, you can see an example of how that result should look:

At the top of the screenshot, we see the number of requests the back end has made. In the middle, all the clients that we have no further information on are listed, while at the bottom, we can see all requests that have been made by the client-side code. Unknown clients might be external applications that are accessing your API. When releasing a new version of your application, you can increase the version number of the client; the version number gives you another field to compare by.

We now know which clients have accessed our API from the information provided by Apollo Engine. Let's take a look at what Apollo Engine can tell us about errors.
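As a quick aside, here is a minimal sketch of how the composed client-side link chain might look once InfoLink is in place. It is not taken from the book; AuthLink and the exact endpoint are assumptions carried over from the earlier setup, so adjust the names to match your project:

    // InfoLink is placed before AuthLink so both can modify the request context
    // (headers) before the terminating HTTP link sends the operation.
    import { ApolloClient } from 'apollo-client';
    import { ApolloLink } from 'apollo-link';
    import { createHttpLink } from 'apollo-link-http';
    import { InMemoryCache } from 'apollo-cache-inmemory';

    const client = new ApolloClient({
      link: ApolloLink.from([
        InfoLink, // sets the apollo-client-name and apollo-client-version headers
        AuthLink, // the existing authorization link from the book's setup (assumed)
        createHttpLink({ uri: 'http://localhost:8000/graphql' }),
      ]),
      cache: new InMemoryCache(),
    });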
When you visit the Error tab, you will be presented with a screen that looks like the following screenshot:

The first chart shows the number of errors over a timeline. Under the graph, you can see each error with a timestamp and the stack trace. You can follow the link to see the trace in detail, including the location of the error. If you pay for the Team plan, you can also set alerts for when the number of errors increases or the latency goes up. You can find these alerts under the Integrations tab.

This article walked you through how to sign up for and set up Apollo Engine, analyze schemas, check how your GraphQL API is performing, and track errors using Apollo Engine. If you found this post useful, do check out the book Hands-on Full-Stack Web Development with GraphQL and React, which teaches you how to build scalable full-stack applications while learning to solve complex problems with GraphQL.

Applying Modern CSS to Create React App Projects [Tutorial]
Keeping animations running at 60 FPS in a React Native app [Tutorial]
React Native development tools: Expo, React Native CLI, CocoaPods [Tutorial]

Zuckerberg wants to set the agenda for tech regulation in yet another “digital gangster” move

Sugandha Lahoti
01 Apr 2019
7 min read
Facebook has probably made the biggest April Fools' joke of this year. Over the weekend, Mark Zuckerberg, CEO of Facebook, penned a post detailing the need for tech regulation in four major areas: “harmful content, election integrity, privacy, and data portability”. However, privacy advocates and tech experts were frustrated rather than pleased with this announcement, stating that, given recent privacy scandals, Facebook's CEO shouldn't be the one making the rules.

The term ‘digital gangster’ was first coined by the Guardian when the Digital, Culture, Media and Sport Committee published its final report on Facebook's disinformation and ‘fake news’ practices. Per the publication, “Facebook behaves like a ‘digital gangster’ destroying democracy. It considers itself to be ‘ahead of and beyond the law’. It ‘misled’ parliament. It gave statements that were ‘not true’”.

Last week, Facebook rolled out a new Ad Library to provide more stringent transparency aimed at preventing interference in worldwide elections. It also rolled out a policy to ban white nationalist content from its platforms.

Zuckerberg's four new regulation ideas

“I believe we need a more active role for governments and regulators. By updating the rules for the internet, we can preserve what’s best about it — the freedom for people to express themselves and for entrepreneurs to build new things — while also protecting society from broader harms,” writes Zuckerberg.

Reducing harmful content

For harmful content, Zuckerberg talks about having a certain set of rules that govern what types of content tech companies should consider harmful. According to him, governments should set "baselines" for online content that require filtering. He suggests that third-party organizations should also set standards governing the distribution of harmful content and measure companies against those standards. "Internet companies should be accountable for enforcing standards on harmful content," he writes. "Regulation could set baselines for what’s prohibited and require companies to build systems for keeping harmful content to a bare minimum."

Ironically, over the weekend, Facebook was accused of enabling the spread of anti-Semitic propaganda after its refusal to take down repeatedly flagged hate posts. Facebook stated that it will not remove the posts, as they do not breach its hate speech rules and are not against UK law.

Preserving election integrity

The second tech regulation area revolves around election integrity. Facebook has been taking steps in this direction by making significant changes to its advertising policies. Facebook's new Ad Library, which was released last week, now provides advertising transparency for all active ads running on a Facebook page, including politics or issue ads. Ahead of the European Parliamentary election in May 2019, Facebook is also introducing ads transparency tools in the EU. Zuckerberg advises other tech companies to build a searchable ad archive as well. "Deciding whether an ad is political isn’t always straightforward. Our systems would be more effective if regulation created common standards for verifying political actors," Zuckerberg says. He also talks about improving online political advertising laws to cover political issues rather than focusing primarily on candidates and elections.
“I believe,” he says, “legislation should be updated to reflect the reality of the threats and set standards for the whole industry.”

What is surprising is that just 24 hours after Zuckerberg published his post committing to preserve election integrity, Facebook took down over 700 pages, groups, and accounts that were engaged in “coordinated inauthentic behavior” around Indian politics ahead of the country's national elections. According to DFRLab, who analyzed these pages, Facebook was in fact quite late in taking action against them. Per DFRLab, "Last year, AltNews, an open-source fact-checking outlet, reported that a related website called theindiaeye.com was hosted on Silver Touch servers. Silver Touch managers denied having anything to do with the website or the Facebook page, but Facebook’s statement attributed the page to “individuals associated with” Silver Touch. The page was created in 2016. Even after several regional media outlets reported that the page was spreading false information related to Indian politics, the engagements on posts kept increasing, with a significant uptick from June 2018 onward."

Adhering to privacy and data portability

For privacy, Zuckerberg talks about the need to develop a “globally harmonized framework” along the lines of the European Union's GDPR rules for the US and other countries. “I believe a common global framework — rather than regulation that varies significantly by country and state — will ensure that the internet does not get fractured, entrepreneurs can build products that serve everyone, and everyone gets the same protections,” he writes. This makes us wonder what is stopping him from implementing EU-style GDPR protections on Facebook globally until a common framework is agreed upon by countries.

Lastly, he adds, “regulation should guarantee the principle of data portability”, allowing people to freely port their data across different services. “True data portability should look more like the way people use our platform to sign into an app than the existing ways you can download an archive of your information. But this requires clear rules about who’s responsible for protecting information when it moves between services.” He also endorses the need for a standard data transfer format by supporting the open source Data Transfer Project.

Why this call for regulation now?

Zuckerberg's post comes at a strategic point in time, when Facebook is battling a large number of investigations. The most recent of these is the housing discrimination charge by the U.S. Department of Housing and Urban Development (HUD), which alleged that Facebook is using its advertising tools to violate the Fair Housing Act. It is also worth noting that Zuckerberg's blog post comes weeks after Senator Elizabeth Warren stated that, if elected president in 2020, her administration will break up Facebook. Facebook was quick to remove, and then restore, several ads placed by Warren that called for the breakup of Facebook and other tech giants.

A possible explanation for Zuckerberg's post is that Facebook will now be able to say that it's actually pro-government regulation. This means it can lobby governments to make decisions that would be the most beneficial for the company. It may also set up its own approach to political advertising and content moderation as the standard for other industries. By blaming decisions on third parties, it may also reduce scrutiny from lawmakers.
According to a report by Business Insider, around the time Zuckerberg published his post, a large number of Zuckerberg's previous posts and announcements were deleted from the Facebook blog. Reached for comment, a Facebook spokesperson told Business Insider that the posts were "mistakenly deleted" due to "technical errors." Whether this was a deliberate mistake or an unintentional one, we don't know.

Zuckerberg's post sparked a huge discussion on Hacker News, with most people drawing negative conclusions from Zuckerberg's writeup. Here are some of the views:

“I think Zuckerberg's intent is to dilute the real issue (privacy) with these other three points. FB has a bad record when it comes to privacy and they are actively taking measures against it. For example, they lobby against privacy laws. They create shadow profiles and they make it difficult or impossible to delete your account.”

“harmful content, election integrity, privacy, data portability Shut down Facebook as a company and three of those four problems are solved.”

“By now it's pretty clear, to me at least, that Zuckerberg simply doesn't get it. He could have fixed the issues for over a decade. And even in 2019, after all the evidence of mismanagement and public distrust, he still refuses to relinquish any control of the company. This is a tone-deaf opinion piece.”

Twitterati also shared the same sentiment.

https://twitter.com/futureidentity/status/1112455687169327105
https://twitter.com/BrendanCarrFCC/status/1112150281066819584
https://twitter.com/davidcicilline/status/1112085338342727680
https://twitter.com/DamianCollins/status/1112082926232092672
https://twitter.com/MaggieL/status/1112152675699834880

Ahead of EU 2019 elections, Facebook expands its Ad Library to provide advertising transparency in all active ads
Facebook will ban white nationalism and separatism content in addition to white supremacy content
Are the lawmakers and media being really critical towards Facebook?

Installing a blockchain network using Hyperledger Fabric and Composer [Tutorial]

Savia Lobo
01 Apr 2019
6 min read
This article is an excerpt from the book Hands-On IoT Solutions with Blockchain, written by Maximiliano Santos and Enio Moura. The book teaches you how to work with problem statements and design your solution architecture so that you can create your own integrated blockchain and IoT solution. In this article, you will learn how to install your own blockchain network using Hyperledger Fabric and Composer.

We can install the blockchain network using Hyperledger Fabric by many means, including local servers, Kubernetes, IBM Cloud, and Docker. To begin with, we'll explore Docker and Kubernetes.

Setting up Docker

Docker can be installed using the information provided at https://www.docker.com/get-started. Hyperledger Composer works with two versions of Docker:

Docker Compose version 1.8 or higher
Docker Engine version 17.03 or higher

If you already have Docker installed but you're not sure about the version, you can find out what it is by using the following command in the terminal or command prompt:

    docker --version

Be careful: many Linux-based operating systems, such as Ubuntu, come with a recent version of Python 3 (for example, Python 3.5.1). In this case, it's important to also get Python version 2.7, which you can get here: https://www.python.org/download/releases/2.7/.

Installing Hyperledger Composer

We're now going to set up Hyperledger Composer and gain access to its development tools, which are mainly used to create business networks. We'll also set up Hyperledger Fabric, which can be used to run or deploy business networks locally. These business networks can also be run on Hyperledger Fabric runtimes elsewhere, for example, on a cloud platform.

Make sure that you haven't installed and used the tools before; if you have, you'll need to uninstall them before following this guide.

Components

To successfully install Hyperledger Composer, you'll need these components ready:

CLI Tools
Playground
Hyperledger Fabric
An IDE

Once these are set up, you can begin with the steps given here.

Step 1 – Setting up CLI Tools

CLI Tools, composer-cli, is a library with the most important operations, such as administrative, operational, and developmental tasks. We'll also install the following tools during this step:

Yeoman: a frontend tool for generating applications
Library generator: for generating application assets
REST server: a utility for running a local REST server

Let's start our setup of CLI Tools:

1. Install CLI Tools:

    npm install -g composer-cli

2. Install the library generator:

    npm install -g generator-hyperledger-composer

3. Install the REST server:

    npm install -g composer-rest-server

This allows for integration with a local REST server to expose your business networks as RESTful APIs.

4. Install Yeoman:

    npm install -g yo

Don't use the su or sudo commands with npm, to ensure that the current user has all the permissions necessary to run the environment by itself.

Step 2 – Setting up Playground

Playground gives you a UI on your local machine, running in your browser. It allows you to display your business networks, and to edit and test them. Use the following command to install Playground:

    npm install -g composer-playground

Now we can run Hyperledger Fabric.

Step 3 – Hyperledger Fabric

This step will allow you to run a Hyperledger Fabric runtime locally and deploy your business networks:

1. Choose a directory, such as ~/fabric-dev-servers.
2. Now get the .tar.gz file, which contains the tools for installing Hyperledger Fabric:

    mkdir ~/fabric-dev-servers && cd ~/fabric-dev-servers
    curl -O https://raw.githubusercontent.com/hyperledger/composer-tools/master/packages/fabric-dev-servers/fabric-dev-servers.tar.gz
    tar -xvf fabric-dev-servers.tar.gz

You've downloaded some scripts that will allow the installation of a local Hyperledger Fabric v1.2 runtime.

3. To download the actual environment Docker images, run the following commands in your user home directory:

    cd ~/fabric-dev-servers
    export FABRIC_VERSION=hlfv12
    ./downloadFabric.sh

Well done! Now you have everything required for a typical developer environment.

Step 4 – IDE

Hyperledger Composer allows you to work with many IDEs. Two well-known ones are Atom and VS Code, which both have good extensions for working with Hyperledger Composer. Atom lets you use the composer-atom plugin (https://github.com/hyperledger/composer-atom-plugin) for syntax highlighting of the Hyperledger Composer Modeling Language. You can download Atom at https://atom.io/ and VS Code at https://code.visualstudio.com/download.

Installing Hyperledger Fabric 1.3 using Docker

There are many ways to download the Hyperledger Fabric platform; Docker is the most commonly used method, and you can use an official repository. If you're using Windows, you'll want to use the Docker Quickstart Terminal for the upcoming terminal commands. If you're using Docker for Windows, follow these instructions: consult the Docker documentation for shared drives, which can be found at https://docs.docker.com/docker-for-windows/#shared-drives, and use a location under one of the shared drives.

Create a directory where the sample files will be cloned from the Hyperledger GitHub repository, and run the following command:

    $ git clone -b master https://github.com/hyperledger/fabric-samples.git

To download and install Hyperledger Fabric on your local machine, you have to download the platform-specific binaries by running the following command:

    $ curl -sSL https://goo.gl/6wtTN5 | bash -s 1.1.0

The complete installation guide can be found on the Hyperledger site.

Deploying Hyperledger Fabric 1.3 to a Kubernetes environment

This step is recommended for those of you who have the experience and skills to work with Kubernetes, a cloud environment, and networks, and who would like an in-depth exploration of Hyperledger Fabric 1.3. Kubernetes is a container orchestration platform and is available on major cloud providers such as Amazon Web Services, Google Cloud Platform, IBM, and Azure. Marcelo Feitoza Parisi, one of IBM's brilliant cloud architects, has created and published a guide on GitHub on how to set up a production-level Hyperledger Fabric environment on Kubernetes. The guide is available at https://github.com/feitnomore/hyperledger-fabric-kubernetes.

If you've enjoyed reading this post, head over to the book, Hands-On IoT Solutions with Blockchain, to understand how IoT and blockchain technology can help solve the modern food chain's current challenges.

IBM announces the launch of Blockchain World Wire, a global blockchain network for cross-border payments
Google expands its Blockchain search tools, adds six new cryptocurrencies in BigQuery Public Datasets
Blockchain governance and uses beyond finance – Carnegie Mellon university podcast

Understanding the cost of a cybersecurity attack: The losses organizations face

Savia Lobo
31 Mar 2019
15 min read
The average cost of a cybersecurity attack has been increasing over time. The rewards to hackers in cyberheists have also been increasing, which has motivated them to come up with ever better tools and techniques that allow them to steal more money and data. Several cybersecurity companies have published their estimates for the average costs of cyber attacks in 2017/2018.

This article is an excerpt from the book Hands-On Cybersecurity for Finance, written by Dr. Erdal Ozkaya and Milad Aslaner. In this book you will learn how to successfully defend your system against common cyber threats, making sure your financial services are a step ahead in terms of security. In this article, you will learn about the different losses an organization faces after a cyber attack.

According to IBM, a tech giant in both hardware and software products, the average cost of a cybersecurity breach has been increasing and is now at $3,860,000. This is a 6.4% increase on their estimate for 2017. The company also estimates that the cost of each stolen record containing sensitive information in 2018 is $148, a rise of 4.8% compared to their estimate for 2017. The following is from IBM's report on the cost of a cyber breach in 2018: "This year's study reports the global average cost of a data breach is up 6.4% over the previous year to $3.86 million. The average cost for each lost or stolen record containing sensitive and confidential information also increased by 4.8% year over year to $148."

The cost of different cyber attacks

While it might be easy to say that the average cost of a hack is around $3,000,000, not all types of attacks will be around that figure. Some attacks are more costly than others, and costs also differ with the frequency of an attack against an organization. Consequently, it's good to look at how costs vary among common cyber attacks. The following screenshot is Accenture's graphical representation of the costs of the most common attacks based on their frequency in 2016 and 2017. The data was collected from 254 companies around the world:

To interpret this data, one should note that frequency was taken into consideration, so the most frequent attacks have higher averages. As can be seen from the graph, insider threats are the most frequent and costly threats to an organization. Attacks related to malicious insiders led to losses averaging $173,516 in 2017. The reason for this high cost is the amount of information that insider threats possess when carrying out an attack. Since they've worked with the victim company for some time, they know exactly what to target and are familiar with which security loopholes to exploit. This isn't a guessing game but an assured attack with a clear aim and a preplanned execution. According to the graph by Accenture, malicious insiders were followed by denial of service (DoS) attacks at an annual cost of $129,450, and then malicious code at an annual cost of $112,419.

However, when frequency is not considered, there are several changes to the report, as can be seen from the following graphical representation. This graph is more representative of the situation in the real world. As can be seen, malware attacks are collectively the costliest: organizations hit by malware lose an average of $2,400,000 per attack. This is because of the establishment of an underground market that supports the quick purchase of new malware, and the huge number of unpatched systems.
Malware has also become more sophisticated, as highly skilled black hats sell their malware on the dark web at affordable prices. As a result, script kiddies have been getting highly effective malware that they can deploy in attacks. Web-based attacks come in second at $2,000,000, while DoS attacks are ranked third at $1,565,000. DoS attacks are ranked high due to the losses that they can cause a company to incur.

Breakdown of the costs of a cyber attack

The direct financial losses that have been discussed are not only the result of money stolen during an attack or records copied and advertised for sale on the deep web. All cyber attacks come bundled with other losses to the company, some of which are felt even years after the attack has happened. This is why some attacks that don't involve the direct theft of money are still ranked among the most costly. For instance, DoS does not involve the theft of money from an organization, yet each DDoS attack is said to average about $1,500,000, because of the other costs that come with it. The following is a breakdown of the costs that come with a cyber attack.

Production loss

During a cyber attack, productive processes in some organizations will come to a halt. For instance, an e-commerce shop will be unable to keep its business processes running once it's hit by a DDoS attack or a web-based attack. Organizations have also had their entire networks taken down during attacks, preventing any form of electronic communication from taking place. In various industries, cyber attacks can take a toll on production systems. Weaponized cyber attacks can even destroy industrial machines by interfering with hardware controls. For instance, the Stuxnet cyber attack against Iran's nuclear facility led to the partial destruction of the facility. This shows the effect that an attack can have even on highly secured facilities.

With looming cyber warfare and worsening political tensions between countries, it can only be feared that there will be a wave of cyber attacks targeted at key players in the industrial sector. There has been a radical shift in hacking tendencies, in that hackers are no longer just looking to embezzle funds or extort money from companies. Instead, they are causing maximum damage by attacking automated processes and systems that control production machines. Cyber attacks are heading into a dangerous phase where they can be weaponized by competitors or enemy states, enabling them to cause physical damage and even the loss of life. There are fears that some states already have the capability to take over smart grids and traffic lights in US cities. ISIS, a terrorist group, was once also reported to be trying to hack into the US energy grid.

In any case, production losses are reaching new heights and becoming more costly. The WannaCry ransomware attack of 2017 encrypted many computers used in industrial processes. Some hospitals were affected, and critical computers, such as those used to maintain life support systems or schedule operations in healthcare facilities, were no longer usable. This led to the ultimate loss: human life. Other far-reaching impacts are environmental impacts, regulatory risks, and criminal liability on the side of the victim.

Economic losses

Cybercrime has become an economic disaster in many countries. It is estimated that at least $600,000,000,000 is drained from the global economy through cybercrime annually.
This is a huge figure, and its impact is already being felt: the loss has affected many things, including jobs. Cybercrime is hurting the economy and, in turn, hurting the job market (https://www.zdnet.com/article/cybercrime-drains-600-billion-a-year-from-the-global-economy-says-report/): "Global businesses are losing the equivalent of nearly 1% of global Gross Domestic Product (GDP) a year to cybercrime, and it's impacting job creation, innovation, and economic growth. So says a report from cybersecurity firm McAfee and the Center for Strategic and International Studies (CSIS), which estimates that cybercrime costs the global economy $600,000,000,000 a year—up from a 2014 study which put the figure at $445,000,000,000."

Companies are being targeted with industrial espionage, and their business secrets are being stolen by overseas competitors. In the long run, companies have been facing losses due to the flooding of markets with similar but cheap and substandard products. This has forced companies that were once growing fast, opening multiple branches, and hiring thousands, to start downsizing and retrenching their employees. In the US, it's estimated that cybercrime has already caused the loss of over 200,000 jobs. The loss of jobs and the drain of money from national economies have made cybercrime a major concern globally.

However, it might be too late for the loss to be averted. It's said that many industries have already had their business secrets stolen. In the US, it's estimated that a large number of organizations are not even aware that they have been breached and their business secrets stolen. Therefore, the economic loss might continue for a while. In 2015, then US President Barack Obama agreed to a digital truce to put an end to the hacking of companies for trade secrets, because US companies were losing too much data. The following is a snippet from the BBC (https://www.bbc.co.uk/news/world-asia-china-34360934) about the agreement between Xi and Obama: "US President Barack Obama and Chinese President Xi Jinping have said they will take new steps to address cybercrime. Speaking at a joint news conference at the White House, Mr. Obama said they had agreed that neither country would engage in cyber economic espionage."

Political tensions with China during Donald Trump's presidency are threatening this truce, and an increase in hacking against US companies could occur if these tensions run too high. Unlike Obama, Trump is taking on China head-on and has been hinting at retaliatory moves, such as cutting off Chinese tech companies, such as Huawei, from the US market. The US arrests of Huawei employees are likely to cause retaliatory attacks from the Chinese; China may hack more US companies and, ultimately, the two countries might enter into a cyber war.

Damaged brand and reputation

An organization will spend a lot of money on building its brand in order to keep a certain market share and also to keep investors satisfied. Without trusted brand names, some companies could fall into oblivion. Cyber attacks tend to attract negative press, and this damages a company's brand and reputation. Investors are thrown into a frenzy of selling their shares to prevent further loss in value. Shareholders that are left holding onto their shares are unsure whether they will ever recover the money trapped in them. Consequently, customers stop trusting the victim company's goods and services.
Competitors then take advantage of the situation and intensify their marketing in order to win over the customers and investors of the victim company. This could all happen within a day or a week because of a single cyber attack. Investors will always want to keep their money with companies that they trust, and customers will always want to buy from companies that they trust. When a cyber attack breaks this trust, both investors and customers run away.

Damage to a brand is very costly. A good example is Yahoo, where, after three breaches, Verizon purchased the company for $4,000,000,000 less than the amount offered in the previous year, before the hacks were public knowledge. Therefore, in a single company, almost $4,000,000,000 was lost due to the brand-damaging effects of a cyber attack. The class-action lawsuits against Yahoo also contributed to its lower valuation.

Loss of data

Despite the benefits, organizations are said to have been sluggish in adopting cloud-based services due to security fears. Those that have bought into the idea of the cloud have mostly done so halfway, not risking their mission-critical data with cloud vendors. Many organizations spend a lot of resources on protecting their systems and networks from the potential loss of data. The reason they go through all of this trouble is so that they don't lose their valuable data, such as business secrets.

If a hacker were to discover the secret code used to securely unlock iPhones, they could make a lot of money selling that code on underground markets. Such information is of such high value that Apple was unwilling to give authorities a way to compromise the lock protection and aid with the investigation of terrorists. It wasn't because Apple isn't supportive of the war against terrorism; it was a decision made to protect all Apple users. Here is a snippet from an article (https://www.theguardian.com/technology/2016/feb/22/tim-cook-apple-refusal-unlock-iphone-fbi-civil-liberties) on Apple's refusal to unlock an iPhone for the FBI: "Apple boss Tim Cook told his employees on Monday that the company's refusal to cooperate with a US government to unlock an iPhone used by Syed Farook, one of the two shooters in the San Bernardino attack, was a defense of civil liberties."

No company will trust a third party with such sensitive information. In Apple's case, if a hacker were to steal documentation relating to the safety measures in Apple devices and their shortcomings, the company would face a fall in share prices and a loss of customers. The loss of data is even more sensitive in institutions that offer far more sensitive services. For instance, in June 2018, it was reported that a US Navy contractor lost a large amount of data to hackers. Among the sensitive data stolen were details about undersea warfare, plans for supersonic anti-ship missiles, and other armament and defense details of US ships and submarines.

Fines, penalties, and litigations

The loss of data in a cyber attack is dreaded by all organizations, particularly if the data lost is sensitive in nature. The loss of health, personal, and financial data will cause a company agony when it considers the consequences that will follow. The loss of these types of data comes with many more losses, in the form of fines, penalties, and litigations. If a company is hacked, instead of receiving consolation, it's dragged into court cases and slapped with heavy fines and penalties.
Several regulations have been put in place to ensure the protection of sensitive, personally identifiable information (PII) by the organizations that collect it, because of the impact of the theft of such information. The demand for PII is on the rise on the dark web, because PII is valuable in several ways. If, for instance, hackers were to discover that some of the data stolen from a hospital included the health information of a politician, they could use it to extort huge amounts of money from the politician. In another scenario, hackers can use PII to socially engineer its owners. Armed with personal details, such as a name, date of birth, real physical address, and current contact details, it's very easy for a skilled social engineer to scam a target. This is part of the reason why governments have ensured that there are very tough laws protecting PII.

Losses due to recovery techniques

After an attack, an organization will have to do everything it can to salvage itself. The aftermath of a serious attack is not pretty, and a lot of money has to be spent cleaning up the mess created by the hackers. Some companies prefer to do a complete audit of their information systems to find out the exact causes of, or influential factors in, the attack. Post-breach activities, such as IT audits, can unearth important information that can be used to prevent the same type of attack from being executed again. Some companies prefer to pay digital forensics experts to identify the cause of an attack as well as to track the hackers or the data and money stolen. Digital forensics is sometimes even able to recover some of the lost assets or funds.

For instance, Ubiquiti Networks was hacked in 2015 and $46,000,000 was stolen through social engineering. Using digital forensics, $8,000,000 was recovered from one of the overseas accounts that the hackers had requested the money be sent to. Sometimes all the stolen money can be recovered, but in most instances that's not the case. The following is from an article on the recovery of $8,100,000 by Ubiquiti Networks after the attack:

"The incident involved employee impersonation and fraudulent requests from an outside entity targeting the Company's finance department. This fraud resulted in transfers of funds aggregating $46,700,000 held by a Company subsidiary incorporated in Hong Kong to other overseas accounts held by third parties."

"Ubiquiti says it has so far managed to recover $8,100,000 of the lost funds, and it expects to regain control of another $6,800,000. The rest? Uncertain."

In short, the costs associated with a cyber attack are high, and charges can continue for several years after the actual attack happens. The current estimate of each attack costing around $3,000,000 per victim organization is a mere statistic; individual companies suffer huge losses. However, the costs associated with cybersecurity are not solely tied to the negative aftermath of an attack. Cybersecurity products are an added, but necessary, expenditure for organizations. Analysts say that 75% of cyber attacks happen to people or organizations that don't have any cybersecurity products.

If you've enjoyed reading this article, head over to the book Hands-On Cybersecurity for Finance to learn more about the different types of threat actor groups and their motivations.
Defensive Strategies Industrial Organizations Can Use Against Cyber Attacks
Hydro cyber attack shuts down several metal extrusion plants
5 nation joint Activity Alert Report finds most threat actors use publicly available tools for cyber attacks

Knowing the threat actors behind a cyber attack

Savia Lobo
30 Mar 2019
7 min read
This article is an excerpt from the book Hands-On Cybersecurity for Finance, written by Dr. Erdal Ozkaya and Milad Aslaner. In this book you will learn how to successfully defend your system against common cyber threats, making sure your financial services are a step ahead in terms of security. In this article, you will learn about the different types of threat actor groups and their motivations.

The attackers behind cyber attacks can be classified into the following categories:

Cybercriminals
Cyber terrorists
Hacktivists

"What really concerns me is the sophistication of the capability, which is becoming good enough to really threaten parts of our critical infrastructure, certainly in the financial, banking sector." – Director of Europol Robert Wainwright

Hacktivism

Hacktivism, as defined by the Industrial Control Systems Cyber Emergency Response Team (ICS-CERT), refers to threat actors that depend on propaganda rather than damage to critical infrastructure. Their goal is to support their own political agenda, which varies between anti-corruption, religious, environmental, and anti-establishment concerns. Their sub-goals are propaganda and causing damage to achieve notoriety for their cause. One of the most prominent hacktivist threat actor groups is Anonymous, known primarily for its distributed denial of service (DDoS) attacks on governments and the Church of Scientology. The following screenshot shows "the man without a head", which is commonly used by Anonymous as its emblem:

Hacktivists target companies and governments based on the organization's mission statement or ethics. Given that the financial services industry is responsible for economic wealth, it tends to be a popular target for hacktivists. The ideologies held by hacktivists vary, but at their core they focus on bringing attention to social issues such as warfare or what they consider to be illegal activities. To spread their beliefs, they choose targets that allow them to spread their message as quickly as possible. The primary reason hacktivists choose organizations in the financial services industry is that these organizations typically have a large user base, allowing the attackers to raise the profile of their beliefs very quickly once they have successfully breached the organization's security controls.

Case study – Dakota Access Pipeline

The Dakota Access Pipeline (DAPL) was a 2016 construction of a 1,172-mile-long pipeline spanning several states in the US. Native American tribes protested against the DAPL out of fear that it would damage sacred grounds and drinking water. Shortly after the protests began, the hacktivist group Anonymous publicly announced their support under the name OpNoDAPL. During the construction, Anonymous launched numerous DDoS attacks against the organizations involved in the DAPL. Anonymous leaked the personal information of employees who were responsible for the DAPL and threatened that this would continue if they did not quit. The following screenshot shows how this attack spread on Twitter:

Case study – Panama Papers

In 2015, an offshore law firm called Mossack Fonseca had 11.5 million of its documents leaked. These documents contained confidential financial information on more than 214,488 offshore entities, in what later became known as the Panama Papers. In the leaked documents, several national leaders, politicians, and industry leaders were identified, including a trail to Vladimir Putin.
The following diagram shows how much was exposed as part of this attack:

While there is not much information available on how the cyber attack occurred, various security researchers have analyzed the operation. According to a WikiLeaks post, which claims to show a client communication from Mossack Fonseca, the firm confirmed that there was a breach of its "email server". Considering the size of the data leak, it is believed that a direct attack occurred on the email servers.

Cyber terrorists

Extremist and terrorist organizations such as Al Qaeda and the Islamic State of Iraq and Syria (ISIS) use the internet to distribute their propaganda, recruit new terrorists, and communicate. An example of this is the 2008 attack in Mumbai, in which one of the gunmen confirmed that they used Google Earth to familiarize themselves with the locations of buildings. Cyber terrorism is an extension of traditional terrorism into cyber space.

Case study – Operation Ababil

In 2012, the Islamic group Izz ad-Din al-Qassam Cyber Fighters, which is a military wing of Hamas, attacked a series of American financial institutions. On September 18th, 2012, this threat actor group confirmed that it was behind the cyber attacks and justified them by citing the relationship of the United States government with Israel. They also claimed that this was a response to the Innocence of Muslims video released by the American pastor Terry Jones. As part of a DDoS attack, they targeted the New York Stock Exchange as well as banks such as J.P. Morgan Chase.

Cyber criminals

Cyber criminals are individuals or groups of hackers who use technology to commit crimes in the digital world. The primary drivers of cyber criminals are financial gain and/or service disruption. Cyber criminals use computers in three broad ways:

Select computers as their target: these criminals attack other people's computers to perform malicious activities, such as spreading viruses, data theft, identity theft, and more.
Use computers as their weapon: they use computers to carry out "conventional crime", such as spam, fraud, illegal gambling, and more.
Use computers as an accessory: they use computers to store stolen or illegal data.

The following provides the larger picture of how cyber criminals have penetrated the finance sector and wreaked havoc. Becky Pinkard, vice president of service delivery and intelligence at Digital Shadows Ltd, states that "Attackers can harm the bank by adding or subtracting a zero with every balance, or even by deleting entire accounts".

Case study – FIN7

On August 1st, 2018, the United States Attorney's Office for the Western District of Washington announced the arrest of several members of the cyber criminal organization FIN7, which had been tracked since 2015. To this date, security researchers believe that FIN7 is one of the largest threat actor groups targeting the financial services industry. Combi Security is a FIN7 shell company. The screenshot presented here shows a phishing email sent by FIN7 to victims, claiming it was sent by the US Food and Drug Administration (FDA).

Case study – Carbanak APT attack

Carbanak is an advanced persistent threat (APT) attack that is believed to have been executed by the threat actor group Cobalt Strike Group in 2014. In this operation, the threat actor group was able to generate a total financial loss for victims of more than 1 billion US dollars.
The following depicts how the Carbanak cyber-gang stole $1bn by targeting a bank:

Case study – OurMine operation

In 2016, the threat actor group OurMine, which is suspected of operating out of Saudi Arabia, conducted a DDoS attack against HSBC's websites hosted in the USA and UK. The following screenshot shows the communication by the threat actor:

The result of the DDoS attack was that the HSBC websites for the US and the UK were unavailable. The following screenshot shows the HSBC USA website after the DDoS attack:

Summary

The financial services industry is one of the most frequently targeted industries for cybercrime. Thus, in this article, you learned about different threat actor groups and their motivations. It is important to understand these in order to build and execute a successful cybersecurity strategy. Head over to the book, Hands-On Cybersecurity for Finance, to know more about the costs associated with cyber attacks and cybersecurity.

RSA Conference 2019 Highlights: Top 5 cybersecurity products announced
Security experts, Wolf Halton and Bo Weaver, discuss pentesting and cybersecurity [Interview]
Hackers are our society's immune system – Keren Elazari on the future of Cybersecurity
Why did McDonalds acqui-hire $300 million machine learning startup, Dynamic Yield?

Fatema Patrawala
29 Mar 2019
7 min read
Mention McDonald's to someone today, and they're more likely to think about the Big Mac than Big Data. But that could soon change, as the fast-food giant embraces machine learning, with plans to become a tech innovator in a fittingly super-sized way. McDonald's stunned a lot of people when it announced its biggest acquisition in 20 years, one that reportedly cost it over $300 million. It plans to acquire Dynamic Yield, a New York-based startup that provides retailers with algorithmically driven "decision logic" technology. When you add an item to an online shopping cart, "decision logic" is the tech that nudges you about what other customers bought as well. Dynamic Yield's client list includes blue-chip retail clients like Ikea, Sephora, and Urban Outfitters.

McDonald's vetted around 30 firms offering similar personalization engine services, and landed on Dynamic Yield. The startup has recently been valued in the hundreds of millions of dollars; people familiar with the details of the McDonald's offer put it at over $300 million. That makes it the company's largest purchase in two decades, as per a tweet by McDonald's CEO Steve Easterbrook.

https://twitter.com/SteveEasterbrk/status/1110313531398860800

The burger giant can certainly afford it; in 2018 alone it tallied nearly $6 billion of net income, and ended the year with a free cash flow of $4.2 billion.

McDonald's, a food-tech innovator from the start

Over the last several years, McDonald's has invested heavily in technology by bringing stores up to date with self-serve kiosks. The company also launched an app and partnered with Uber Eats in that time, in addition to a number of infrastructure improvements. It even relocated its headquarters less than a year ago from the suburbs to Chicago's vibrant West Town neighborhood, in a bid to attract young talent. Collectively, McDonald's serves around 68 million customers every single day, and the majority of them order at the drive-thru window; they never get out of their cars, instead placing and picking up their orders from the window. That's where McDonald's is planning to deploy Dynamic Yield's tech first.

"What we hadn't done is begun to connect the technology together, and get the various pieces talking to each other," says Easterbrook. "How do you transition from mass marketing to mass personalization? To do that, you've really got to unlock the data within that ecosystem in a way that's useful to a customer."

Here's what that looks like in practice: When you drive up to place your order at a McDonald's today, a digital display greets you with a handful of banner items or promotions. As you inch up toward the ordering area, you eventually get to the full menu. Both of these, as currently implemented, are largely static, aside from the obvious changes like rotating in new offers, or switching over from breakfast to lunch. But in a pilot program at a McDonald's restaurant in Miami, powered by Dynamic Yield, those displays have taken on new dexterity. In the new McDonald's machine-learning paradigm, that particular display screen will show customers what other items have been popular at that location, and prompt them with potential upsells. Thanks for your Happy Meal order; maybe you'd like a Sprite to go with it.

"We've never had an issue in this business with a lack of data," says Easterbrook.
"It's drawing the insight and the intelligence out of it."

Revenue likely to grow with the acquisition

McDonald's hasn't shared any specific insights gleaned so far, or numbers around the personalization engine's effect on sales. But it's not hard to imagine some of the possible scenarios. If someone orders two Happy Meals at 5 o'clock, for instance, that's probably a parent ordering for their kids; highlight a coffee or snack for them, and they might decide to treat themselves to a pick-me-up. And as with any machine-learning system, the real benefits will likely come from the unexpected. While customer satisfaction may be the goal, the avenues McDonald's takes to get there will increase revenues along the way.

Customer personalization is another goal to achieve

As you may think, McDonald's didn't spend over $300 million on a machine-learning company only to juice up its drive-thru sales. An important part is figuring out how to leverage the "personalization" part of a personalization engine. Fine-tuned insights at the store level are one thing, but Easterbrook envisions something even more granular. "If customers are willing to identify themselves—there's all sorts of ways you can do that—we can be even more useful to them, because now we call up their favorites," according to Easterbrook, who stresses that privacy is paramount. As for what form that might ultimately take, Easterbrook raises a handful of possibilities. McDonald's already uses geofencing around its stores to know when a mobile app customer is approaching, and to prepare their order accordingly.

On the downside of this tech integration

When you know you have to change so much in your company, it's easy to forget some of the consequences. You race to implement all the new things in tech and don't adequately think about what your employees might make of it all. This seems to be happening to McDonald's. As the fast-food chain tries to catch up to food trends that have been established for some time, its employees don't seem happy about it. As Bloomberg reports, the more McDonald's introduces fresh beef, touchscreen ordering, and delivery, the more its employees are thinking: "This is all too much work." One McDonald's franchisee revealed as much at the beginning of this year. "Employee turnover is at an all-time high for us," he said, adding, "Our restaurants are way too stressful, and people do not want to work in them."

Workers are walking away rather than dealing with new technologies and menu options. The result: customers will wait longer. Already, drive-through times at McDonald's slowed to 239 seconds last year, more than 30 seconds slower than in 2016, according to QSR magazine. Turnover at U.S. fast-food restaurants jumped to 150%, meaning a store employing 20 workers would go through 30 in one year. Having said that, it does not come as a surprise that McDonald's on Tuesday told the National Restaurant Association that it will no longer participate in lobbying efforts against minimum-wage hikes at the federal, state, or local level. That makes sense when the company is already paying low wages and an all-time-high attrition rate looms as the bigger problem.

Of course, technology is supposed to solve all the world's problems, while simultaneously eliminating the need for many people. It looks like McDonald's has put all its eggs in the machine learning and automation basket.
Would it not be a rich irony if people saw technology being introduced and walked out, deciding it was all too much trouble for just a burger?

25 Startups using machine learning differently in 2018: From farming to brewing beer to elder care
An AI startup now wants to monitor your kids' activities to help them grow 'securly'
Microsoft acquires AI startup Lobe, a no code visual interface tool to build deep learning models easily
Brett Lantz on implementing a decision tree using C5.0 algorithm in R

Packt Editorial Staff
29 Mar 2019
9 min read
Decision tree learners are powerful classifiers that utilize a tree structure to model the relationships among the features and the potential outcomes. This structure earned its name due to the fact that it mirrors the way a literal tree begins at a wide trunk and splits into narrower and narrower branches as it is followed upward. In much the same way, a decision tree classifier uses a structure of branching decisions that channel examples into a final predicted class value. In this article, we demonstrate the implementation of a decision tree using the C5.0 algorithm in R.

This article is taken from the book Machine Learning with R, Fourth Edition, written by Brett Lantz. This 10th Anniversary Edition of the classic R data science book is updated to R 4.0.0 with newer and better libraries. The book features several new chapters that reflect the progress of machine learning in the last few years and help you build your data science skills and tackle more challenging problems.

There are numerous implementations of decision trees, but the most well-known is the C5.0 algorithm. This algorithm was developed by computer scientist J. Ross Quinlan as an improved version of his prior algorithm, C4.5 (C4.5 itself is an improvement over his Iterative Dichotomiser 3 (ID3) algorithm). Although Quinlan markets C5.0 to commercial clients (see http://www.rulequest.com/ for details), the source code for a single-threaded version of the algorithm was made public, and has therefore been incorporated into programs such as R.

The C5.0 decision tree algorithm

The C5.0 algorithm has become the industry standard for producing decision trees because it does well for most types of problems directly out of the box. Compared to other advanced machine learning models, the decision trees built by C5.0 generally perform nearly as well but are much easier to understand and deploy. Additionally, as summarized in the following lists, the algorithm's weaknesses are relatively minor and can be largely avoided.

Strengths:
- An all-purpose classifier that does well on many types of problems.
- Highly automatic learning process, which can handle numeric or nominal features, as well as missing data.
- Excludes unimportant features.
- Can be used on both small and large datasets.
- Results in a model that can be interpreted without a mathematical background (for relatively small trees).
- More efficient than other complex models.

Weaknesses:
- Decision tree models are often biased toward splits on features having a large number of levels.
- It is easy to overfit or underfit the model.
- Can have trouble modeling some relationships due to reliance on axis-parallel splits.
- Small changes in training data can result in large changes to decision logic.
- Large trees can be difficult to interpret and the decisions they make may seem counterintuitive.

To keep things simple, our earlier decision tree example ignored the mathematics involved with how a machine would employ a divide and conquer strategy. Let's explore this in more detail to examine how this heuristic works in practice.

Choosing the best split

The first challenge that a decision tree will face is to identify which feature to split upon. In the previous example, we looked for a way to split the data such that the resulting partitions contained examples primarily of a single class. The degree to which a subset of examples contains only a single class is known as purity, and any subset composed of only a single class is called pure.
There are various measurements of purity that can be used to identify the best decision tree splitting candidate. C5.0 uses entropy, a concept borrowed from information theory that quantifies the randomness, or disorder, within a set of class values. Sets with high entropy are very diverse and provide little information about other items that may also belong in the set, as there is no apparent commonality. The decision tree hopes to find splits that reduce entropy, ultimately increasing homogeneity within the groups.

Typically, entropy is measured in bits. If there are only two possible classes, entropy values can range from 0 to 1. For n classes, entropy ranges from 0 to log2(n). In each case, the minimum value indicates that the sample is completely homogenous, while the maximum value indicates that the data are as diverse as possible, and no group has even a small plurality. In mathematical notation, entropy is specified as:

Entropy(S) = -\sum_{i=1}^{c} p_i \log_2(p_i)

In this formula, for a given segment of data (S), the term c refers to the number of class levels, and p_i refers to the proportion of values falling into class level i. For example, suppose we have a partition of data with two classes: red (60 percent) and white (40 percent). We can calculate the entropy as:

> -0.60 * log2(0.60) - 0.40 * log2(0.40)
[1] 0.9709506

We can visualize the entropy for all possible two-class arrangements. If we know the proportion of examples in one class is x, then the proportion in the other class is (1 - x). Using the curve() function, we can then plot the entropy for all possible values of x:

> curve(-x * log2(x) - (1 - x) * log2(1 - x),
    col = "red", xlab = "x", ylab = "Entropy", lwd = 4)

This results in the following figure:

The total entropy as the proportion of one class varies in a two-class outcome

As illustrated by the peak in entropy at x = 0.50, a 50-50 split results in the maximum entropy. As one class increasingly dominates the other, the entropy reduces to zero. To use entropy to determine the optimal feature to split upon, the algorithm calculates the change in homogeneity that would result from a split on each possible feature, a measure known as information gain. The information gain for a feature F is calculated as the difference between the entropy in the segment before the split (S1) and the partitions resulting from the split (S2):

InfoGain(F) = Entropy(S_1) - Entropy(S_2)

One complication is that after a split, the data is divided into more than one partition. Therefore, the function to calculate Entropy(S2) needs to consider the total entropy across all of the partitions. It does this by weighting each partition's entropy according to the proportion of all records falling into that partition. This can be stated in a formula as:

Entropy(S_2) = \sum_{i=1}^{n} w_i \times Entropy(P_i)

In simple terms, the total entropy resulting from a split is the sum of the entropy of each of the n partitions weighted by the proportion of examples falling in the partition (w_i). The higher the information gain, the better a feature is at creating homogeneous groups after a split on that feature. If the information gain is zero, there is no reduction in entropy for splitting on this feature. On the other hand, the maximum information gain is equal to the entropy prior to the split. This would imply the entropy after the split is zero, which means that the split results in completely homogeneous groups. The previous formulas assume nominal features, but decision trees use information gain for splitting on numeric features as well.
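Before turning to numeric features, here is a small worked example of the information gain calculation above. The numbers are illustrative and not taken from the book: suppose the segment before the split, S1, holds 10 red and 10 white examples, and a candidate split on feature F produces two equally sized partitions, one with 8 red and 2 white examples and the other with 2 red and 8 white.

Entropy(S_1) = -0.5 \log_2(0.5) - 0.5 \log_2(0.5) = 1
Entropy(P_1) = Entropy(P_2) = -0.8 \log_2(0.8) - 0.2 \log_2(0.2) \approx 0.722
Entropy(S_2) = 0.5 \times 0.722 + 0.5 \times 0.722 \approx 0.722
InfoGain(F) = 1 - 0.722 \approx 0.278

A split that produced two perfectly pure partitions would instead achieve the maximum possible gain of 1 bit, which is the full entropy of S1 prior to the split.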
To handle numeric features, a common practice is to test various splits that divide the values into groups greater than or less than a threshold. This reduces the numeric feature to a two-level categorical feature that allows information gain to be calculated as usual. The numeric cut point yielding the largest information gain is chosen for the split.

Note: Though it is used by C5.0, information gain is not the only splitting criterion that can be used to build decision trees. Other commonly used criteria are the Gini index, the chi-squared statistic, and the gain ratio. For a review of these (and many more) criteria, refer to An Empirical Comparison of Selection Measures for Decision-Tree Induction, Mingers, J, Machine Learning, 1989, Vol. 3, pp. 319-342.

Pruning the decision tree

As mentioned earlier, a decision tree can continue to grow indefinitely, choosing splitting features and dividing into smaller and smaller partitions until each example is perfectly classified or the algorithm runs out of features to split on. However, if the tree grows overly large, many of the decisions it makes will be overly specific and the model will be overfitted to the training data. The process of pruning a decision tree involves reducing its size such that it generalizes better to unseen data.

One solution to this problem is to stop the tree from growing once it reaches a certain number of decisions or when the decision nodes contain only a small number of examples. This is called early stopping or prepruning the decision tree. As the tree avoids doing needless work, this is an appealing strategy. However, one downside to this approach is that there is no way to know whether the tree will miss subtle but important patterns that it would have learned had it grown to a larger size.

An alternative, called post-pruning, involves growing a tree that is intentionally too large and pruning leaf nodes to reduce the size of the tree to a more appropriate level. This is often a more effective approach than prepruning because it is quite difficult to determine the optimal depth of a decision tree without growing it first. Pruning the tree later on allows the algorithm to be certain that all of the important data structures were discovered.

Note: The implementation details of pruning operations are very technical and beyond the scope of this book. For a comparison of some of the available methods, see A Comparative Analysis of Methods for Pruning Decision Trees, Esposito, F, Malerba, D, Semeraro, G, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, Vol. 19, pp. 476-491.

One of the benefits of the C5.0 algorithm is that it is opinionated about pruning: it takes care of many of the decisions automatically using fairly reasonable defaults. Its overall strategy is to post-prune the tree. It first grows a large tree that overfits the training data. Later, the nodes and branches that have little effect on the classification errors are removed. In some cases, entire branches are moved further up the tree or replaced by simpler decisions. These processes of grafting branches are known as subtree raising and subtree replacement, respectively.

Getting the right balance of overfitting and underfitting is a bit of an art, but if model accuracy is vital, it may be worth investing some time with various pruning options to see if it improves the test dataset performance. To summarize, decision trees are widely used due to their high accuracy and ability to formulate a statistical model in plain language.
Here, we looked at C5.0, a highly popular and easily configurable decision tree algorithm. The major strength of the C5.0 algorithm over other decision tree implementations is that it is very easy to adjust the training options. Harness the power of R to build flexible, effective, and transparent machine learning models with Brett Lantz's latest book, Machine Learning with R, Fourth Edition.

Dr. Brandon explains Decision Trees to Jon
Building a classification system with Decision Trees in Apache Spark 2.0
Implementing Decision Trees
Amazon joins NSF in funding research exploring fairness in AI amidst public outcry over big tech #ethicswashing

Sugandha Lahoti
27 Mar 2019
5 min read
On the heels of Stanford's HCAI Institute (which, mind you, received public backlash for its non-representative faculty makeup), Amazon is collaborating with the National Science Foundation (NSF) to develop systems based on fairness in AI. The company and NSF will each be investing $10M in artificial intelligence research grants over a three-year period. The official announcement was made by Prem Natarajan, VP of natural understanding in the Alexa AI group, who wrote in a blog post: "With the increasing use of AI in everyday life, fairness in artificial intelligence is a topic of increasing importance across academia, government, and industry. Here at Amazon, the fairness of the machine learning systems we build to support our businesses is critical to establishing and maintaining our customers' trust."

Per the blog post, Amazon will be collaborating with NSF to build trustworthy AI systems to address modern challenges. They will explore topics of transparency, explainability, accountability, potential adverse biases and effects, mitigation strategies, validation of fairness, and considerations of inclusivity. Proposals will be accepted from March 26 until May 10, and are expected to result in new open source tools, publicly available data sets, and publications. The two organizations plan to continue the program with calls for additional proposals in 2020 and 2021. There will be 6 to 9 awards of type Standard Grant or Continuing Grant. The award size will be $750,000, up to a maximum of $1,250,000, for periods of up to 3 years. The anticipated funding amount is $7,600,000.

"We are excited to announce this new collaboration with Amazon to fund research focused on fairness in AI," said Jim Kurose, NSF's head for Computer and Information Science and Engineering. "This program will support research related to the development and implementation of trustworthy AI systems that incorporate transparency, fairness, and accountability into the design from the beginning."

The insidious nexus of private funding in public research: What does Amazon gain from collab with NSF?

Amazon's foray into fairness systems looks more like a publicity stunt than a serious effort to eliminate AI bias. For starters, Amazon said that it will not be making the award determinations for this project; NSF will make the awards solely in accordance with its merit review process. However, Amazon said that Amazon researchers may be involved with the projects as advisors, but only at the request of an awardee, or of NSF with the awardee's consent. As advisors, Amazon may host student interns who wish to gain further industry experience, which seems a bit dicey. Amazon will also not participate in the review process or receive proposal information. NSF will only be sharing with Amazon summary-level information that is necessary to evaluate the program, specifically the number of proposal submissions, the number of submitting organizations, and the numbers rated across various review categories. There was also the question of who exactly is funding what, since section VII.B of the solicitation states: "Individual awards selected for joint funding by NSF and Amazon will be funded through separate NSF and Amazon funding instruments."
https://twitter.com/nniiicc/status/1110335108634951680
https://twitter.com/nniiicc/status/1110335004989521920

Nic Weber, the author of the above tweets and Assistant Professor at UW iSchool, also raises another important question: "Why does Amazon get to put its logo on a national solicitation (for a paltry $7.6 million dollars in basic research) when it profits in the multi-billions off of AI that is demonstrably unfair and harmful."

Twitter was abundant with tweets from those working in tech questioning Amazon's collaboration.

https://twitter.com/mer__edith/status/1110560653872373760
https://twitter.com/patrickshafto/status/1110748217887649793
https://twitter.com/smunson/status/1110657292549029888
https://twitter.com/haldaume3/status/1110697325251448833

Amazon has already been under fire for its controversial decisions in the recent past. In June last year, when the US Immigration and Customs Enforcement agency (ICE) began separating migrant children from their parents, Amazon came under fire as one of the tech companies that aided ICE with the software required to do so. Amazon has also faced constant criticism since the news came out that it had sold its facial recognition product, Rekognition, to a number of law enforcement agencies in the U.S. in the first half of 2018. Amazon also faced backlash after a study by the Massachusetts Institute of Technology in January found Amazon Rekognition incapable of reliably determining the sex of female and darker-skinned faces in certain scenarios. Amazon is yet to fix this AI-bias anomaly, and yet it has now started a new collaboration with NSF that ironically focuses on building bias-free AI systems. Amazon's Ring (a smart doorbell company) also came under public scrutiny in January, after it gave its employees access to watch live footage from customers' cameras.

In other news, yesterday, Google also formed an external AI advisory council to help advance the responsible development of AI. More details here.

Amazon won't be opening its HQ2 in New York due to public protests
Amazon admits that facial recognition technology needs to be regulated
Amazon's Ring gave access to its employees to watch live footage of the customers, The Intercept reports
Dr Joshua Eckroth on performing Sentiment Analysis on social media platforms using CoreNLP

Packt Editorial Staff
27 Mar 2019
9 min read
Sentiment analysis is achieved by labeling individual words as positive or negative, among other possible sentiments such as happy, worried, and so on. The sentiment of the sentence or phrase as a whole is determined by a procedure that aggregates the sentiment of individual words. In this article, we'll demonstrate how to perform sentiment analysis on social media platforms using the CoreNLP library. This article is taken from the book AI Blueprints written by Dr Joshua Eckroth. The book covers several paradigms of AI, including deep learning, natural language processing, planning, and logic programming.

Consider the sentence, I didn't like a single minute of this film. A simplistic sentiment analysis system would probably label the word like as positive and the other words as neutral, yielding an overall positive sentiment. More advanced systems analyze the "dependency tree" of the sentence to identify which words are modifiers for other words. In this case, didn't is a modifier for like, so the sentiment of like is reversed due to this modifier. Likewise, a phrase such as It's definitely not dull exhibits a similar property, and ...not only good but amazing exhibits a further nuance of the English language. It is clear that a simple dictionary of positive and negative words is insufficient for accurate sentiment analysis. The presence of modifiers can change the polarity of a word.

Wilson and others' work on sentiment analysis (Recognizing contextual polarity in phrase-level sentiment analysis, Wilson, Theresa, Janyce Wiebe, and Paul Hoffmann, published in Proceedings of the conference on human language technology and empirical methods in natural language processing, pp. 347-354, 2005) is foundational in the dependency tree approach. They start with a lexicon (that is, a collection) of 8,000 words that serve as "subjectivity clues" and are tagged with polarity (positive or negative). Using just this dictionary, they achieved 48% accuracy in identifying the sentiment of about 3,700 phrases. To improve on this, they adopted a two-step approach. First, they used a statistical model to determine whether a subjectivity clue is used in a neutral or polar context. When used in a neutral context, the word can be ignored as it does not contribute to the overall sentiment. The statistical model for determining whether a word is used in a neutral or polar context uses 28 features, including the nearby words, binary features such as whether the word not appears immediately before, and part-of-speech information such as whether the word is a noun, verb, adjective, and so on. Next, words that have polarity, that is, those that have not been filtered out by the neutral/polar context identifier, are fed into another statistical model that determines their polarity: positive, negative, both, or neutral. Ten features are used for polarity classification, including the word itself and its polarity from the lexicon, whether or not the word is being negated, and the presence of certain nearby modifiers such as little, lack, and abate. These modifiers themselves have polarity: neutral, negative, and positive, respectively. Their final procedure achieves 65.7 percent accuracy for detecting sentiment. Their approach is implemented in the open source OpinionFinder.

Sentiment analysis using Natural Language Processing

A more modern approach may be found in Stanford's open source CoreNLP project.
CoreNLP supports a wide range of NLP processing such as sentence detection, word detection, part-of-speech tagging, named-entity recognition (finding names of people, places, dates, and so on), and sentiment analysis. Several NLP features, such as sentiment analysis, depend on prior processing including sentence detection, word detection, and part-of-speech tagging. As described in the following text, a sentence's dependency tree, which shows the subject, object, verbs, adjectives, and prepositions of a sentence, is critical for sentiment analysis. CoreNLP's sentiment analysis technique has been shown to achieve 85.4% accuracy for detecting positive/negative sentiment of sentences. Their technique is state-of-the-art and has been specifically designed to better handle negation in various places in a sentence, a limitation of simpler sentiment analysis techniques as previously described. CoreNLP's sentiment analysis uses a technique known as recursive neural tensor networks (RNTN) (Here, a sentence or phrase is parsed into a binary tree, as seen in Figure 1. Every node is labeled with its part-of-speech: NP (noun phrase), VP (verb phrase), NN (noun), JJ (adjective), and so on. Each leaf node, that is, each word node, has a corresponding word vector. A word vector is an array of about 30 numbers (the actual size depends on a parameter that is determined experimentally). The values of the word vector for each word are learned during training, as is the sentiment of each individual word. Just having word vectors will not be enough since we have already seen how sentiment cannot be accurately determined by looking at words independently of their context. The next step in the RNTN procedure collapses the tree, one node at a time, by calculating a vector for each node based on its children. The bottom-right node of the figure, the NP node with children own and crashes, will have a vector that is the same size of the word vectors but is computed based on those child word vectors. The computation multiplies each child word vector and sums the results. The exact multipliers to use are learned during training. The RNTN approach, unlike prior but similar tree collapsing techniques, uses a single combiner function for all nodes. Ultimately, the combiner function and the word vectors are learned simultaneously using thousands of example sentences with the known sentiment. Figure 1: CoreNLP's dependency tree parse of the sentence, "Self-driving car companies should not be allowed to investigate their own crashes" The dependency tree from the preceding figure has twelve leaf nodes and twelve combiner nodes. Each leaf node has an associated word vector learned during training. The sentiment of each leaf node is also learned during training. Thus, the word crashes, for example, has a neutral sentiment with 0.631 confidence, while the word not has negative sentiment with 0.974 confidence. The parent node of allowed and the phrase to investigate their own crashes has a negative sentiment, confidence 0.614, even though no word or combiner node among its descendants have anything but neutral sentiment. This demonstrates that the RNTN learned a complex combiner function that operates on the word vectors of its children and not just a simple rule such as, If both children are neutral, then this node is neutral, or if one child is neutral, but one is positive, this node is positive, .... 
The sentiment value and confidence of each node in the tree are shown in the CoreNLP output in the following code block. Note that sentiment values are coded:

- 0 = very negative
- 1 = negative
- 2 = neutral
- 3 = positive
- 4 = very positive

(ROOT|sentiment=1|prob=0.606
  (NP|sentiment=2|prob=0.484
    (JJ|sentiment=2|prob=0.631 Self-driving)
    (NP|sentiment=2|prob=0.511
      (NN|sentiment=2|prob=0.994 car)
      (NNS|sentiment=2|prob=0.631 companies)))
  (S|sentiment=1|prob=0.577
    (VP|sentiment=2|prob=0.457
      (VP|sentiment=2|prob=0.587
        (MD|sentiment=2|prob=0.998 should)
        (RB|sentiment=1|prob=0.974 not))
      (VP|sentiment=1|prob=0.703
        (VB|sentiment=2|prob=0.994 be)
        (VP|sentiment=1|prob=0.614
          (VBN|sentiment=2|prob=0.969 allowed)
          (S|sentiment=2|prob=0.724
            (TO|sentiment=2|prob=0.990 to)
            (VP|sentiment=2|prob=0.557
              (VB|sentiment=2|prob=0.887 investigate)
              (NP|sentiment=2|prob=0.823
                (PRP|sentiment=2|prob=0.997 their)
                (NP|sentiment=2|prob=0.873
                  (JJ|sentiment=2|prob=0.996 own)
                  (NNS|sentiment=2|prob=0.631 crashes))))))))
    (.|sentiment=2|prob=0.997 .)))

We see from these sentiment values that allowed to investigate their own crashes is labeled with negative sentiment. We can investigate how CoreNLP handles words such as allowed and not by running through a few variations. These are shown in the following table:

Sentence | Sentiment | Confidence
They investigate their own crashes. | Neutral | 0.506
They are allowed to investigate their own crashes. | Negative | 0.697
They are not allowed to investigate their own crashes. | Negative | 0.672
They are happy to investigate their own crashes. | Positive | 0.717
They are not happy to investigate their own crashes. | Negative | 0.586
They are willing to investigate their own crashes. | Neutral | 0.507
They are not willing to investigate their own crashes. | Negative | 0.599
They are unwilling to investigate their own crashes. | Negative | 0.486
They are not unwilling to investigate their own crashes. | Negative | 0.625

Table 1: Variations of a sentence with CoreNLP's sentiment analysis

It is clear from Table 1 that the phrase investigate their own crashes is not contributing strongly to the sentiment of the whole sentence. The choice of verb (allowed, happy, or willing) can dramatically change the sentiment. The modifier not can flip the sentiment, though curiously not unwilling is still considered negative. We should be particularly careful to study CoreNLP's sentiment analysis with sentence fragments and other kinds of invalid English that are commonly seen on Twitter. For example, the Twitter API will deliver phrases such as, Ford's self-driving car network will launch 'at scale' in 2021 - Ford hasn't been shy about... with the ... in the actual tweet. CoreNLP labels this sentence as negative with confidence 0.597. CoreNLP was trained on movie reviews, so news articles, tweets, and Reddit comments may not match the same kind of words and grammar found in movie reviews. We might have a domain mismatch between the training domain and the actual domain. CoreNLP can be trained on a different dataset, but doing so requires that thousands (or tens or hundreds of thousands) of examples with known sentiment are available. Every node in the dependency tree of every sentence must be labeled with a known sentiment. This is very time-consuming. The authors of CoreNLP used Amazon Mechanical Turk to recruit humans to perform this labeling task.
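If you want to reproduce these variations yourself, one convenient route (not covered in the excerpt above) is to run CoreNLP as a local server and query it over HTTP. The following is a minimal, hedged sketch under that assumption: it presumes a CoreNLP server has been started separately, for example with java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000, and that the requests package is installed; the URL and the sample sentence are placeholders.

# Minimal, illustrative sketch (not from the book): ask a locally running
# CoreNLP server for sentence-level sentiment labels.
import json
import requests

CORENLP_URL = "http://localhost:9000"  # assumed local CoreNLP server address

def sentence_sentiments(text):
    """Return (sentence, sentiment label) pairs reported by CoreNLP."""
    properties = {
        "annotators": "tokenize,ssplit,pos,parse,sentiment",
        "outputFormat": "json",
    }
    response = requests.post(
        CORENLP_URL,
        params={"properties": json.dumps(properties)},
        data=text.encode("utf-8"),
    )
    response.raise_for_status()
    results = []
    for sentence in response.json()["sentences"]:
        words = " ".join(token["word"] for token in sentence["tokens"])
        results.append((words, sentence["sentiment"]))
    return results

if __name__ == "__main__":
    for sentence, label in sentence_sentiments(
        "They are not happy to investigate their own crashes."
    ):
        print(label, "-", sentence)

Looping a helper like this over the sentences in Table 1 is a quick way to reproduce the comparison above, and to check how the model behaves on text from your own domain before relying on it.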
To summarize, in this article we demonstrated how to identify the sentiment, or general mood, of feedback from customers and the general public on social media platforms (for example, Twitter). We also performed sentiment analysis using a machine-learning-based method built on CoreNLP. Master essential AI blueprints to program real-world business applications from the book AI Blueprints by Dr Joshua Eckroth.

How to perform sentiment analysis using Python [Tutorial]
Understanding Sentiment Analysis and other key NLP concepts
Sentiment Analysis of the 2017 US elections on Twitter
Apple’s March Event: Apple changes gears to services, is now your bank, news source, gaming zone, and TV

Sugandha Lahoti
26 Mar 2019
7 min read
Apple’s main business model has always been hardware-centric - to sell phones and computers. However, in light of the recent news of Apple’s iPhone sales dwindling, the company is now shifting its focus to other means of revenue growth to keep its consumers occupied in the world of Apple. That is exactly what happened yesterday when Apple unveiled a set of new services at the Apple March Event. Gadi Schwartz, NBC News correspondent rightly sums up Apple’s latest plan. https://twitter.com/GadiNBC/status/1110270953001410560 Here’s the detailed report. Video subscription service: Apple TV+ The Apple TV+ is a new television subscription service (yes in the likes of Netflix and Amazon Prime) which will give subscribers access to the many shows the company has been developing. Apple TV+, the company says, “will become the new home for the world’s most creative storytellers featuring exclusive original shows, movies, and documentaries.”  Apple plans to launch Apple TV+ in over 100 countries sometime this fall, though it did not disclose the pricing. The subscription service will be ad-free, available on demand, and viewable both online and offline. https://youtu.be/Bt5k5Ix_wS8 Apple also announced Apple TV Channels as a part of the Apple TV app, which will let customers pay and watch HBO, Showtime, Starz, CBS All Access, and other services directly through the TV app. Apple TV + puts the company in direct competition to Netflix, Hulu, and Disney who also offer their own video subscription services. The company is trying to find new ways to market the Apple experience to consumers. With iPhone’s sales slowly receding, Apple’s foray into video subscription services is a new initiative to bring everyone into the walled orchids of Apple. https://twitter.com/DylanByers/status/1110144534132908037 Media subscription service: Apple news+ Next in line, is the Apple News Plus service, which adds newspapers and magazines to the Apple News app. The service costing $9.99 per month will feature almost 300 magazines and newspapers including People, Vogue, National Geographic Magazine, ELLE, Glamour, The Wall Street Journal, Los Angeles Times, and more. Surprisingly, The New York Times and The Washington Post, have opted out of joining the subscription service. Although the publishers were not authorized to speak publicly about the plans, it is speculated that opting out from this subscription service is because of two major reasons. First, Apple is asking for a cut of roughly half of the subscription revenue involved in the service. Second, Apple has also asked publishers to give unlimited access to all their content which is concerning. Combined, the subscriptions provided through Apple News+ would cost more than $8,000 per year. Apple News Plus will also come with “Live Covers,” which shows animated images instead of static photos for a magazine’s cover. https://youtu.be/Im5c5WR9vMQ Apple has been quite vocal about maintaining privacy. A striking feature of Apple news+ is the heavy emphasis on private recommendations inside the news app, including magazines. The app downloads a set of articles and manages recommendations on-device. It also does not give any data to advertisers. The company noted in the live stream, "Apple doesn't know what you read." Apple News+ is available in the U.S. and Canada. The first month is free. In Canada, the service will be offered at $12.99 per month. Later this year, Apple News+ will arrive in Europe and Australia. 
Game subscription service: Apple Arcade Apple is now your new gaming zone with a new game subscription service, the Apple Arcade. Presented as the “world’s first game subscription service for mobile, desktop, and living room”, it will feature over 100 new and exclusive games. These games will be from acclaimed indie developers, major studios as well as renowned creators. Apple will also be contributing to the development costs for such games. With the subscription service, players can try any game in the service without risk. Every game includes access to the full experience, including all game features, content and future updates with no additional purchases required. Apple says Arcade games don’t track usage, gameplay or the titles a user plays more. Apple Arcade will launch in fall 2019 in more than 150 countries. Arcade as a single subscription package may also possibly bring premium games the traction they may have been lacking otherwise. People also pointed out that Apple’s primary target for Arcade may be parents. A comment on Hacker News reads, “I think a lot of folks looking at this from the point of view of an adult gamer are missing the point: the audience for this is parents. For 10 bucks (or whatever) a month you can load the iPad up with games and not worry about microtransactions or scummy ads targeting your kids. "Curated" is a signal that can trust age recommendations and not worry about inappropriate content.” Netizens also believe that gaming subscription services will payout more than traditional models. “The difference between this and music is that most people do not want to hear the same songs over and over again. The $10 is spread across so many artists. Video games will capture the attention of a person for hours in a month. I can see that a big chunk of the monthly fee going to a couple of titles.”, reads a comment on Hacker News. Payment subscription service: Apple Card Probably the most important service, Apple is now venturing into the banking sector, with a new digital credit card with simpler applications, no fees, lower interest rates, and daily rewards. The Apple Card is created in partnership with Goldman Sachs and Mastercard. It is available as two options. First, as a digital card which users will be able to access by signing up on their iPhone in the Apple Wallet app.  Second, as a physical titanium card with no credit card number, CVV, expiration date, or signature. All of the authorization information is stored directly in the Apple Wallet app. The card makes use of machine learning and Apple Maps to label stores and categorize them based on color. Users can easily track purchases across categories like “food and drink” or “shopping.” It also has a rewards program, “Daily Cash,” which adds 2 percent of the daily purchase amount in cash to your Apple Cash account, also within the Wallet app. Though, purchases made through the physical card will get just 1 percent cash back. Again, privacy is the most important feature here. Apple will store the spending, tracking and other information directly on the device. Jennifer Bailey, VP of Apple Pay said, “Apple doesn’t know what you bought, where you bought it, and how much you paid for it. Goldman Sachs will never sell your data to third parties for marketing and advertising.” This is probably the service that has got people the most excited. https://twitter.com/byjacobward/status/1110237925889851393 https://twitter.com/DylanByers/status/1110248441152561158 Apple Card will be available in the US this summer. 
Why is Apple changing focus to services? With iPhone sales growth slowing, Apple needs new measures to bring in a large number of users to its world. What better than to foray into subscription streaming services and premium original content. With the new announcements made, Apple is indeed playing the perfect middleman between its users and TV, gaming, news, and other services, bolstering privacy as their major selling point, while also earning huge revenues. As perfectly summed by a report from NBC News, “The better Apple's suite of services — movies and shows, but also music, news, fitness tracking, mobile payments, etc. — the more revenue Apple will see from subscribers.” Spotify files an EU antitrust complaint against Apple; Apple says Spotify’s aim is to make more money off other’s work. Donald Trump called Apple CEO Tim Cook ‘Tim Apple’ Apple to merge the iPhone, iPad, and Mac apps by 2021
Adrian Pruteanu shows how to evade Intrusion Detection Systems using Proxy Cannon [Tutorial]

Packt Editorial Staff
26 Mar 2019
10 min read
These days, it is fairly common for mature companies to implement Intrusion detection system (IDS), intrusion prevention systems (IPS), and security information and event management (SIEM) when they detect abuse against a particular application. When an unknown IP is performing too many operations in a short time on a protected application, IDS or IPS may take action against the source. If we are conducting a password spraying attack, we may avoid lockouts but we're still hammering the server from one source: our machine. A good way to evade these types of detection systems is to distribute the connection requests from the attacker machine over many IPs, which is commonly done by malicious actors through networks of compromised hosts. With the advent of cloud computing and computing time becoming increasingly cheap, even free in some cases, we don't have to stray outside of the law and build a botnet. In this article we'll see how to use Proxy cannon to evade intrusion detection systems (IDS). This article is taken from the book Becoming the Hacker written by Adrian Pruteanu. This book will teach you how to approach web penetration testing with an attacker's mindset. While testing web applications for performance is common, the ever-changing threat landscape makes security testing much more difficult for the defender. The Tor Project was started to provide a way for users to browse the internet anonymously. It is by far the best way to anonymize traffic and best of all, it's free. It is an effective way to change the public IP during an attack. The Tor network Tor is a network of independently operated nodes interconnected to form a network through which packets can be routed. The following graphic shows how a user, Alice, can connect to Bob through a randomly generated path or circuit, through the Tor network: Figure 1: The Tor network traffic flow (source: https://www.torproject.org/) Instead of connecting directly to the destination, the client connection from Alice to Bob will be routed through a randomly chosen set of nodes in the Tor network. Each packet is encrypted and every node can only decrypt enough information to route it to the next hop along the path. The exit node is the final node in the chain, which will make the connection to the intended destination on behalf of the client. When the packet arrives at Bob's machine, the request will look like it's coming from the exit node and not Alice's public IP. Note: More information on Tor can be found on the official site: https://www.torproject.org. While Tor is important for anonymity, we're not really concerned with staying completely anonymous. We can, however, leverage the randomly chosen exit nodes to mask the public IP when attacking an application. There are a couple of issues with conducting attacks through the Tor network. The routing protocol is inherently slower than a more direct connection. This is because Tor adds several layers of encryption to each transmission, and each transmission is forwarded through three Tor nodes on top of the normal routing that internet communication requires. This process improves anonymity but also increases communication delay significantly. The lag is noticeable for normal web browsing, but this is a tolerable trade-off. For large volume scans, it may not be the ideal transport. Warning: It should also be noted that Tor is used heavily in regions of the world where privacy is of utmost importance. 
Conducting large volume attacks through Tor is discouraged, as it can lead to unnecessary network slowdowns and can impact legitimate users. Low and slow attacks shouldn't cause any problems. Some red-team engagements may even require testing from the Tor network to verify related IDS/IPS rules are working as intended, but caution should be taken when launching attacks through a limited-resource public medium. Proxy cannon An alternative to using Tor for diversifying our attack IPs is to simply use the cloud. There are countless Infrastructure as a Service (IaaS) providers, each with a large IP space available for free to VM instances. VMs are cheap and sometimes free as well, so routing our traffic through them should be fairly cost effective. Amazon, Microsoft, and Google all have an easy-to-use API for automating the management of VM instances. If we can spawn a new VM with a new external IP periodically, we can route our traffic to the target application through it and mask our true origin. This should make it much more difficult for automated systems to detect and alert on our activities. Cue ProxyCannon, a great tool that does all the heavy lifting of talking to Amazon's AWS API, creating and destroying VM instances, rotating external IPs, and routing our traffic through them. Note: ProxyCannon was developed by Shellntel and is available on GitHub: https://github.com/Shellntel/scripts/blob/master/proxyCannon.py. ProxyCannon requires boto, a Python library that provides API access to Amazon's AWS. We can use Python's pip command to install the required dependency: root@kali:~/tools# pip install -U boto Collecting boto  Downloading boto-2.48.0-py2.py3-none-any.whl (1.4MB) [...] Installing collected packages: boto Successfully installed boto-2.48.0 The ProxyCannon tool should now be ready to use with the -h option showing all of the available options: root@kali:~/tools# python proxyCannon.py -h usage: proxyCannon.py [-h] [-id [IMAGE_ID]] [-t [IMAGE_TYPE]]             [--region [REGION]] [-r] [-v] [--name [NAME]]             [-i [INTERFACE]] [-l]             num_of_instances positional arguments:  num_of_instances   The number of amazon instances you'd like to launch. optional arguments:  -h, --help         show this help message and exit  -id [IMAGE_ID], --image-id [IMAGE_ID]             Amazon ami image ID. Example: ami-d05e75b8. If not             set, ami-d05e75b8.  -t [IMAGE_TYPE], --image-type [IMAGE_TYPE]             Amazon ami image type Example: t2.nano. If not             set, defaults to t2.nano.  --region [REGION] Select the region: Example: us-east-1. If             not set, defaults to us-east-1. positional arguments:  num_of_instances   The number of amazon instances you'd like to launch. optional arguments:  -h, --help         show this help message and exit  -id [IMAGE_ID], --image-id [IMAGE_ID]             Amazon ami image ID. Example: ami-d05e75b8. If not             set, ami-d05e75b8.  -t [IMAGE_TYPE], --image-type [IMAGE_TYPE]             Amazon ami image type Example: t2.nano. If not             set, defaults to t2.nano.  --region [REGION] Select the region: Example: us-east-1. If             not set, defaults to us-east-1. Output is to /tmp/ By default, ProxyCannon creates t2.nano virtual instances in AWS, which should be free for a limited time with new accounts. They have very little resources but are typically enough for most attacks. To change the type of instance, we can supply the -t switch. 
The default region is us-east-1 and can be adjusted using the --region switch. ProxyCannon will create as many instances as specified in the num_of_instances and using the -r switch, it will rotate them regularly. The -l switch is also useful to keep track of what public IPs ProxyCannon is using over the course of the execution. This is useful for reporting purposes: the blue team may need a list of all the IPs used in the attack. In order for the tool to be able to communicate with our AWS account and to manage instances automatically, we have to create API access keys in the AWS console. The interface is fairly straightforward and can be accessed in the account Security Credentials page. The access key ID and the secret keys are randomly generated and should be stored securely. Once the engagement is over, you should delete the keys in the AWS console. Figure 2: Generating a new AWS API access key We can start ProxyCannon using the -r and -l switches, and specify that we want 3 instances running at the same time. root@kali:~/tools# python proxyCannon.py -r -l 3 What is the AWS Access Key Id: d2hhdCBhcmUgeW91IGRvaW5n What is the AWS Secret Access Key: dW5mb3J0dW5hdGVseSB0aGlzIGlzIG5vdCB0aGUgcmVhbCBrZXku [...] Upon first run, ProxyCannon will ask you for these values and store them in the ~/.boto file. root@kali:~/tools# cat ~/.boto [default] aws_access_key_id = d2hhdCBhcmUgeW91IGRvaW5n aws_secret_access_key = dW5mb3J0dW5hdGVseSB0aGlzIGlzIG5vdCB0aGUgcmVhbCBrZXku As you can see, these are stored in plaintext, so make sure this file is properly protected. Amazon recommends that these keys are rotated frequently. It's probably a good idea to create new ones for each engagement and delete them from AWS as soon as they're not required anymore. ProxyCannon will connect to Amazon EC2, setup the SSH keys, adjust the security groups, and start the VM instances. This process may take a couple of minutes to complete. [*] Connecting to Amazon's EC2... [*] Generating ssh keypairs... [*] Generating Amazon Security Group... [~] Starting 3 instances, please give about 4 minutes for them to fully boot [====================] 100% ProxyCannon will overwrite the current system iptables configuration to properly route all traffic through whatever instance is chosen: [*] Provisioning Hosts..... [*] Saving existing iptables state [*] Building new iptables... [*] Done! +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + Leave this terminal open and start another to run your commands.+ +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ [~] Press ctrl + c to terminate the script gracefully. [...] As promised, ProxyCannon will periodically rotate our effective external IP using SSH tunnels and by modifying the routing table. All of this is done automatically, in the background, while Burp Suite or ZAP runs the password spraying attack. The following is the periodic output from ProxyCannon showing the IPs being rotated: [*] Rotating IPs. [*] Replaced 107.21.177.36 with 34.207.187.254 on tun0 [*] Replaced 34.234.91.233 with 52.91.91.157 on tun1 [*] Replaced 34.202.237.230 with 34.228.167.195 on tun2 [*] Replaced 34.207.187.254 with 34.228.158.208 on tun0 [*] Replaced 52.91.91.157 with 54.198.223.114 on tun1 On the AWS console, we can see the started t2.nano instances and their public IPs: Figure 3: AWS instances created to route our traffic through We can test ProxyCannon by repeating a curl request to our target application using the watch command. 
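As an aside on the reporting requirement mentioned earlier, the list of public IPs can also be pulled straight from EC2. The following is an illustrative sketch, not part of ProxyCannon itself: it uses the boto3 library (rather than the older boto library the tool depends on) and assumes AWS credentials are already configured on the machine.

# Hedged, illustrative sketch: list the public IPs of running EC2 instances,
# for example to build the list of attack IPs the blue team may ask for.
import boto3

def running_public_ips(region="us-east-1"):
    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    ips = []
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            public_ip = instance.get("PublicIpAddress")
            if public_ip:
                ips.append((instance["InstanceId"], public_ip))
    return ips

if __name__ == "__main__":
    for instance_id, public_ip in running_public_ips():
        print(instance_id, public_ip)

Because ProxyCannon rotates addresses over time, running a check like this periodically (or simply relying on the tool's -l log) gives a more complete picture than a single snapshot. With that bookkeeping out of the way, back to the test itself.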
We don't need to drop in a shell similar to torsocks because ProxyCannon modifies the local system routing to help us change our external IP. root@kali:~# watch -n30 curl http://c2.spider.ml On the target application side, c2.spider.ml, the server log, shows connection attempts from various IPs belonging to the Amazon address space: 52.91.91.157 - - [13:01:16] "GET / HTTP/1.1" 200 - 52.91.91.157 - - [13:01:22] "GET / HTTP/1.1" 200 - 34.228.158.208 - - [13:01:43] "GET / HTTP/1.1" 200 - 34.228.158.208 - - [13:01:48] "GET / HTTP/1.1" 200 - 54.198.223.114 - - [13:06:34] "GET / HTTP/1.1" 200 - 54.198.223.114 - - [13:06:39] "GET / HTTP/1.1" 200 - It should be noted that there is a lower limit to how often we can rotate the IPs on Amazon or any cloud provider for that matter. It takes a while for instances to boot and IP addresses to be reserved, associated, and become active. ProxyCannon has a hardcoded value of about 90 seconds to ensure the effective IP actually changes. In this article, we looked at Proxy cannon for staying under the radar while conducting brute-force attacks during an engagement. Becoming the Hacker is a playbook to help you become an ethical hacker and protect the web. Learn about the tricks of a web attacker. 6 common use cases of Reverse Proxy scenarios MarioNet: A browser-based attack that allows hackers to run malicious code even if users’ exit a web page Black Hat hackers used IPMI cards to launch JungleSec Ransomware, affects most of the Linux servers

How AI is transforming the Smart Cities IoT? [Tutorial]

Natasha Mathur
23 Mar 2019
11 min read
According to Techopedia, a smart city is a city that utilizes information and communication technologies to enhance the quality and performance of urban services (such as energy and transportation), so that there's a reduction in resource consumption, wastage, and overall costs. In this article, we will look at the components of a smart city and its AI-powered IoT use cases, how AI helps with the adoption of IoT in smart cities, and an example of an AI-powered IoT solution.

Deakin and Al Waer list four factors that contribute to the definition of a smart city:

Using a wide range of electronic and digital technologies in the city infrastructure
Employing Information and Communication Technology (ICT) to transform living and working environments
Embedding ICT in government systems
Implementing practices and policies that bring people and ICT together to promote innovation and enhance the knowledge that they offer

Hence, a smart city would be a city that not only possesses ICT but also employs technology in a way that positively impacts the inhabitants.

This article is an excerpt taken from the book 'Hands-On Artificial Intelligence for IoT' written by Amita Kapoor. The book explores building smarter systems by combining artificial intelligence and the Internet of Things, two of the most talked about topics today.

Artificial Intelligence (AI), together with IoT, has the potential to address the key challenges posed by an excessive urban population; they can help with traffic management, healthcare, the energy crisis, and many other issues. IoT data and AI technology can improve the lives of the citizens and businesses that inhabit a smart city. Let's see how.

Smart city and its AI-powered IoT use cases

A smart city has lots of use cases for AI-powered, IoT-enabled technology, from maintaining a healthier environment to enhancing public transport and safety. In the following diagram, you can see some of the use cases for a smart city:

Smart city components

Let's have a look at some of the most popular use cases that have already been implemented in smart cities across the world.

Smart traffic management

AI and IoT can implement smart traffic solutions to ensure that inhabitants of a smart city get from one point to another in the city as safely and efficiently as possible. Los Angeles, one of the most congested cities in the world, has implemented a smart traffic solution to control the flow of traffic. It has installed road-surface sensors and closed-circuit television cameras that send real-time updates about the traffic flow to a central traffic management system. The data feed from the sensors and cameras is analyzed, and it notifies users of congestion and traffic signal malfunctions. In July 2018, the city further installed Advanced Transportation Controller (ATC) cabinets at each intersection. Enabled with vehicle-to-infrastructure (V2I) communications and 5G connectivity, these allow the city to communicate with cars that have the traffic light information feature, such as the Audi A4 or Q7. You can learn more about the Los Angeles smart transportation system from their website.

The launch of automated vehicles embedded with sensors can provide both the location and speed of each vehicle; they can communicate directly with the smart traffic lights and prevent congestion. Additionally, using historical data, future traffic could be predicted and used to prevent any possible congestion.
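To make that last point a little more concrete, here is a toy sketch of how historical counts from road-surface sensors could be turned into a simple forecast. It is purely illustrative and not taken from the book or from any city's actual system; the file name and column names are assumptions.

import pandas as pd

# Assumed input: one row per sensor reading, with a timestamp and a vehicle count
df = pd.read_csv("hourly_counts.csv", parse_dates=["timestamp"])
df["hour"] = df["timestamp"].dt.hour

# Build an average traffic profile for each hour of the day from the history
profile = df.groupby("hour")["vehicle_count"].mean()

# A naive forecast: expect tomorrow's 08:00 volume to match the historical 08:00 average
print("Expected volume at 08:00:", round(profile.loc[8]))

A real deployment would use far richer features (weather, events, signal timings) and a proper time-series model, but the shape of the problem, learning from history so the city can act before congestion forms, is the same.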
Smart parking

Anyone living in a city must have felt the struggle of finding a parking spot, especially during the holiday season. Smart parking can ease the struggle. With road-surface sensors embedded in the ground at parking spots, smart parking solutions can determine whether the parking spots are free or occupied and create a real-time parking map.

The city of Adelaide installed a smart parking system in February 2018, and it is also launching a mobile app, Park Adelaide, which will provide the user with accurate and real-time parking information. The app gives users the ability to locate, pay for, and even extend the parking session remotely. The smart parking system of the city of Adelaide also aims to improve traffic flow, reduce traffic congestion, and decrease carbon emissions. The details of the smart parking system are available on the city of Adelaide website.

The San Francisco Municipal Transportation Agency (SFMTA) implemented SFpark, a smart parking system. It uses wireless sensors to detect real-time parking-space occupancy in metered spaces. Launched in the year 2013, SFpark has reduced weekday greenhouse gas emissions by 25%, the traffic volume has gone down, and drivers' search time has reduced by 50%.

In London, the city of Westminster also established a smart parking system in the year 2014 in association with Machina Research. Earlier, drivers had to wait an average of 12 minutes, resulting in congestion and pollution, but since the installation of the smart parking system, there's no need to wait; drivers can find an available parking spot using the mobile app.

These are just some of the use cases. Other use cases include smart waste management, smart policing, smart lighting, and smart governance.

What can AI do for IoT adoption in smart cities?

Building a smart city is not a one-day business, nor is it the work of one person or organization. It requires the collaboration of many strategic partners, leaders, and even citizens. Let's explore what the AI community can do and which areas provide us with a career or entrepreneurship opportunity. Any IoT platform will necessarily require the following:

A network of smart things (sensors, cameras, actuators, and so on) for gathering data
Field (cloud) gateways that can gather the data from low-power IoT devices, store it, and forward it securely to the cloud
A streaming data processor for aggregating numerous data streams and distributing them to a data lake and control applications
A data lake for storing all the raw data, even the data that seems of no value yet
A data warehouse that can clean and structure the collected data
Tools for analyzing and visualizing the data collected by sensors
AI algorithms and techniques for automating city services based on long-term data analysis and finding ways to improve the performance of control applications
Control applications for sending commands to the IoT actuators
User applications for connecting smart things and citizens

Besides this, there will be issues regarding security and privacy, and the service provider will have to ensure that these smart services do not pose any threat to citizens' wellbeing. The services themselves should be easy to use and employ so that citizens can adopt them.
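To make the list above more tangible, here is a minimal, purely illustrative sketch of the middle of that pipeline: simulated sensor readings flow through a tiny streaming processor that aggregates them and hands anomalies to a control application. Every name, threshold, and value in it is an assumption made for illustration, not something prescribed by the book.

import random
import statistics

def sensor_readings(n=100):
    # Simulate a network of smart things emitting noise-level readings in decibels
    for _ in range(n):
        yield {"sensor": f"node-{random.randint(1, 5)}", "db": random.gauss(55, 10)}

def control_application(avg_db):
    # In a real deployment this would send a command to an actuator or notify city staff
    print(f"Average noise {avg_db:.1f} dB exceeds the threshold - escalating to the control application")

def stream_processor(readings, window=20, threshold=60):
    # Aggregate readings in fixed-size windows and escalate anomalies
    buffer = []
    for reading in readings:
        buffer.append(reading["db"])
        if len(buffer) == window:
            average = statistics.mean(buffer)
            if average > threshold:
                control_application(average)
            buffer.clear()

stream_processor(sensor_readings())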
As you can see, this offers a range of job opportunities, specifically for AI engineers. The IoT-generated data needs to be processed, and to truly benefit from it, we will need to go beyond monitoring and basic analysis. The AI tools will be required to identify patterns and hidden correlations in the sensor data. Analysis of historical sensor data using ML/AI tools can help in identifying trends and creating predictive models based on them. These models can then be used by control applications that send commands to IoT devices' actuators.

The process of building a smart city will be an iterative one, with more processing and analysis added at each iteration. Let's now have a look at an example of an AI-powered IoT solution.

Detecting crime using San Francisco crime data

The city of San Francisco also has an open data portal, providing data from different departments online. In this section, we take the dataset providing about 12 years (from January 2003 to May 2015) of crime reports from across all of San Francisco's neighborhoods and train a model to predict the category of crime that occurred. There are 39 discrete crime categories, thus it's a multi-class classification problem. We will make use of Apache's PySpark and its easy-to-use text processing features for this dataset.

The first step is to import the necessary modules and create a Spark session:

from pyspark.ml.classification import LogisticRegression as LR
from pyspark.ml.feature import RegexTokenizer as RT
from pyspark.ml.feature import StopWordsRemover as SWR
from pyspark.ml.feature import CountVectorizer
from pyspark.ml.feature import OneHotEncoder, StringIndexer, VectorAssembler
from pyspark.ml import Pipeline
from pyspark.sql.functions import col
from pyspark.sql import SparkSession
# Needed later to evaluate the trained model on the test data
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

spark = SparkSession.builder \
    .appName("Crime Category Prediction") \
    .config("spark.executor.memory", "70g") \
    .config("spark.driver.memory", "50g") \
    .config("spark.memory.offHeap.enabled", True) \
    .config("spark.memory.offHeap.size", "16g") \
    .getOrCreate()

We load the dataset, which is available in a CSV file:

data = spark.read.format("csv"). \
    options(header="true", inferschema="true"). \
    load("sf_crime_dataset.csv")
data.columns

The data contains nine columns: [Dates, Category, Descript, DayOfWeek, PdDistrict, Resolution, Address, X, Y]. We will need only the Category and Descript fields for the training and testing datasets:

drop_data = ['Dates', 'DayOfWeek', 'PdDistrict', 'Resolution', 'Address', 'X', 'Y']
data = data.select([column for column in data.columns if column not in drop_data])
data.show(5)

The dataset we now have contains textual data, so we will need to perform text processing. The three important text processing steps are: tokenizing the data, removing the stop words, and vectorizing the words into vectors. We will use RegexTokenizer, which uses a regex to tokenize each sentence into a list of words; since punctuation and special characters do not add anything to the meaning, we retain only the words containing alphanumeric content. There are some words, like the, which will be very commonly present in the text but not add any meaning to the context. We can remove these words (also called stop words) using the built-in StopWordsRemover class. We use the standard stop words ["http","https","amp","rt","t","c","the"]. And finally, using the CountVectorizer, we convert the words to a numeric vector (features). It's these numeric features that will be used as input to train the model. The output for our data is the Category column, but it's also textual, with 39 distinct categories, so we need to convert it to a numeric label index; PySpark's StringIndexer can be easily used for this.
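Before wiring these stages into a pipeline, it can help to see what the three text processing steps do to a single description. The following plain-Python sketch is illustrative only; the real work is done by the PySpark stages that follow, and the sample description string is made up rather than taken from the dataset.

import re
from collections import Counter

descript = "ROBBERY OF A CHAIN STORE WITH A GUN"  # hypothetical Descript value
stop_words = ["http", "https", "amp", "rt", "t", "c", "the"]  # same list the article uses

# Tokenize: split on non-word characters and keep the alphanumeric tokens, like RegexTokenizer
words = [w.lower() for w in re.split(r"\W+", descript) if w]

# Remove stop words, like StopWordsRemover
filtered = [w for w in words if w not in stop_words]

# Count word occurrences, the same bag-of-words idea CountVectorizer applies at scale
print(Counter(filtered))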
We add all these transformations into our data pipeline:

# regular expression tokenizer
re_Tokenizer = RT(inputCol="Descript", outputCol="words", pattern="\\W")

# stop words
stop_words = ["http","https","amp","rt","t","c","the"]
stop_words_remover = SWR(inputCol="words", outputCol="filtered").setStopWords(stop_words)

# bag of words count
count_vectors = CountVectorizer(inputCol="filtered", outputCol="features", vocabSize=10000, minDF=5)

# encode the label (Category) as a numeric index
label_string_Idx = StringIndexer(inputCol="Category", outputCol="label")

# create the pipeline
pipeline = Pipeline(stages=[re_Tokenizer, stop_words_remover, count_vectors, label_string_Idx])

# fit the pipeline to the data
pipeline_fit = pipeline.fit(data)
dataset = pipeline_fit.transform(data)
dataset.show(5)

Now that the data is ready, we split it into training and test datasets:

# split the data randomly into training and test data sets
(trainingData, testData) = dataset.randomSplit([0.7, 0.3], seed=100)
print("Training Dataset Size: " + str(trainingData.count()))
print("Test Dataset Size: " + str(testData.count()))

Let's fit a simple logistic regression model to it. On the test dataset, it provides 97% accuracy. Yahoo!:

# build the model
logistic_regressor = LR(maxIter=20, regParam=0.3, elasticNetParam=0)

# train the model with the training data
model = logistic_regressor.fit(trainingData)

# make predictions on the test data
predictions = model.transform(testData)

# evaluate the model on the test data set
evaluator = MulticlassClassificationEvaluator(predictionCol="prediction")
evaluator.evaluate(predictions)

AI is changing the way cities operate, deliver, and maintain public amenities, from lighting and transportation to connectivity and health services. However, adoption can be obstructed by the selection of technology that doesn't work together efficiently or integrate with other city services. For cities to truly benefit from the potential that smart cities offer, a change in mindset is required. The authorities should plan for the longer term and across multiple departments. The city of Barcelona is a prime example, where the implementation of IoT systems created an estimated 47,000 jobs, saved €42.5 million on water, and generated an extra €36.5 million a year through smart parking.

We can easily see that cities can benefit tremendously from the technological advances that utilize AI-powered IoT solutions. AI-powered IoT solutions can help connect cities and manage multiple infrastructures and public services. In this article, we looked at use cases of smart cities, from smart lighting and road traffic to connected public transport and waste management. We also learned to use tools that can help categorize the data from the San Francisco crime reports covering a period of 12 years. If you want to explore more topics, be sure to check out the book 'Hands-On Artificial Intelligence for IoT'.

IBM Watson announces pre-trained AI tools to accelerate IoT operations
Implementing cost-effective IoT analytics for predictive maintenance [Tutorial]
AI and the Raspberry Pi: Machine Learning and IoT, What's the Impact?

Four versions of Wikipedia go offline in a protest against the EU Copyright Directive, which will affect free speech online

Savia Lobo
22 Mar 2019
5 min read
Yesterday, March 21, four versions of Wikipedia (German, Danish, Czech, and Slovak) were blacked out as a move to oppose the recent EU Copyright Directive, which will be up for voting on Tuesday, March 26.

These long-awaited updates to the copyright law include "important wins for the open community in the current text", the Wikimedia Foundation reports. However, "the inclusion of Articles 11 and 13 will harm the way people find and share information online", Wikimedia further states. The major opposition, however, is towards the controversial Article 13.

Article 11 states that if a text contains more than a snippet from an article, it must be licensed and paid for by whoever quotes the text. "While each country can define 'snippet' however it wants, the Directive does not stop countries from making laws that pass using as little as three words from a news story", the Electronic Frontier Foundation mentions.

Article 13 is, however, the most controversial and is all set to restructure how copyright works on the web. As of now, in order to take down content that is subject to copyright infringement, the rights holder just has to send a 'takedown notice'. However, with Article 13 in place, there will be no such protection for online services, and it also "relieves rights-holders of the need to check the Internet for infringement and send out notices. Instead, it says that online platforms have a duty to ensure that none of their users infringe copyright."

According to The Next Web, "To make people understand how serious the effects of the Copyright Reform will be if it's passed, Reddit and Wikipedia will hinder access to their sites in the EU to mimic the effects of the directive."

Both Article 11 and Article 13 were reintroduced under the leadership of German Member of the European Parliament (MEP) Axel Voss, even though they had already been discarded as unworkable after expert advice. "Voss's insistence that Articles 11 and 13 be included in the final Directive has been a flashpoint for public anger, drawing criticism from the world's top technical, copyright, journalistic, and human rights experts and organizations", the Electronic Frontier Foundation reports.

"Critics say the politicians behind the legislation do not understand the breadth of the laws they are proposing, and that the directive, if implemented, will harm free expression online", The Verge reports. Platforms such as Tumblr, YouTube, and many others that host user-generated content will be under the radar if Article 13 is passed, and they will be legally responsible if their users upload copyrighted content. According to The Verge, "The only way to stop these uploads, say critics, will be to scan content before its uploaded, leading to the creation of filters that will likely be error-prone and abused by copyright trolls."

Many have protested against Article 13 in recent weeks. In Germany, about 3,500 people took part in a rally in Berlin as a protest against the new copyright plans. A petition, 'Save the Internet', has already gathered more than five million signatures.

Reddit has also taken action against the Copyright Directive by flashing a simulated error message when Reddit desktop users in EU countries attempt to make a top-level post.
According to Reddit, "This experience, meant to mimic the automated filters that users would encounter should the Directive pass, will last through March 23rd, when IRL demonstrations are planned across Europe."

Julia Reda, a member of the European Parliament from Germany, in her blog post mentions, "For two years we've debated different drafts and versions of the controversial Articles 11 and 13. Now, there is no more ambiguity: This law will fundamentally change the internet as we know it – if it is adopted in the upcoming final vote. But we can still prevent that!"

United Nations' free-speech rapporteur, David Kaye, said, "Europe has a responsibility to modernize its copyright law to address the challenges of the digital age. But this should not be done at the expense of the freedom of expression that Europeans enjoy today… Article 13 of the proposed Directive appears destined to drive internet platforms toward monitoring and restriction of user-generated content even at the point of upload. Such sweeping pressure for pre-publication filtering is neither a necessary nor proportionate response to copyright infringement online."

A user on HackerNews writes, "I hope they win and that Article 11 and 13 will be removed. I think this is an important moment in the birth of EU democracy because it feels to me that one of the first times, there is a big public discussion about an issue and the people at the center aren't national politicians like Merkel or Macron but EU MEPs, namely Voss vs Reda. The EU has rightfully been criticized of not being democratic enough, and this discussion feels like it's very much democratic."

https://twitter.com/Wikipedia/status/1108595296068501504

Five EU countries oppose the EU copyright directive
Reddit's 2018 Transparency report includes copyright removals, restorations, and more!
Drafts of Article 13 and the EU Copyright Directive have been finalized

Google and Facebook working hard to clean up their image after the media backlash from the Christchurch terrorist attack

Fatema Patrawala
22 Mar 2019
8 min read
Last Friday's uncontrolled spread of horrific videos of the Christchurch mosque attack, and the propaganda coup it provided to those espousing hateful ideologies, raised questions about social media. The tech companies scrambled to take action in time due to the speed and volume of content that was uploaded, reuploaded, and shared by users worldwide.

In Washington and Silicon Valley, the incident crystallized growing concerns about the extent to which government and market forces have failed to check the power of social media. The failure highlighted how social media companies struggle to police content on platforms that are massively lucrative and persistently vulnerable to outside manipulation, despite years of promises to do better.

After the white supremacist live-streamed the attack and uploaded the video to Facebook, Twitter, YouTube, and other platforms across the internet, these tech companies faced a backlash from the media and internet users worldwide, to the extent that they were regarded as complicit in promoting white supremacism.

In response to the backlash, Google and Facebook provided status reports on what they went through when the video was reported, the kind of challenges they faced, and their next steps to combat such incidents in the future.

Google's report so far...

Google, in an email to Motherboard, says it employs 10,000 people to moderate the company's platforms and products. It also described the process it follows when a user reports a piece of potentially violating content, such as the attack video:

The user-flagged report goes to a human moderator to assess.
The moderator is instructed to flag all pieces of content related to the attack as "Terrorist Content," including full-length or partial sections of the manifesto.
Because of the document's length, the email tells moderators not to spend an extensive amount of time trying to confirm whether a piece of content does contain part of the manifesto. Instead, if the moderator is unsure, they should err on the side of caution and still label the content as "Terrorist Content," which will then be reviewed by a second moderator.
The second moderator is told to take time to verify that it is a piece of the manifesto and to mark the content as terrorism appropriately, no matter how long or short the section may be.
Moderators are told to mark the manifesto or video as terrorist content unless there is an Educational, Documentary, Scientific, or Artistic (EDSA) context to it.

Google further adds that it wants to preserve journalistic or educational coverage of the event, but does not want to allow the video or manifesto itself to spread throughout the company's services without additional context.

At some point, Google took the unusual step of automatically rejecting any footage of violence from the attack video, cutting out the process of a human determining the context of the clip. If, say, a news organization was impacted by this change, the outlet could appeal the decision, Google commented.

"We made the call to basically err on the side of machine intelligence, as opposed to waiting for human review," YouTube's Chief Product Officer Neal Mohan told the Washington Post in an article published Monday.

Google also tweaked the search function to show results from authoritative news sources. It suspended the ability to search for clips by upload date, making it harder for people to find copies of the attack footage.
"Since Friday’s horrific tragedy, we’ve removed tens of thousands of videos and terminated hundreds of accounts created to promote or glorify the shooter," a YouTube spokesperson said. “Our teams are continuing to work around the clock to prevent violent and graphic content from spreading, we know there is much more work to do,” the statement added. Facebook’s update so far... Facebook on Wednesday also shared an update on how they have been working with the New Zealand Police to support their investigation. It provided additional information on how their products were used to circulate videos and how they plan to improve them. So far Facebook has provided the following information: The video was viewed fewer than 200 times during the live broadcast. No users reported the video during the live broadcast. Including the views during the live broadcast, the video was viewed about 4,000 times in total before being removed from Facebook. Before Facebook was alerted to the video, a user on 8chan posted a link to a copy of the video on a file-sharing site. The first user report on the original video came in 29 minutes after the video started, and 12 minutes after the live broadcast ended. In the first 24 hours, Facebook removed more than 1.2 million videos of the attack at upload, which were therefore prevented from being seen on our services. Approximately 300,000 additional copies were removed after they were posted. As there were questions asked to Facebook about why artificial intelligence (AI) didn’t detect the video automatically. Facebook says AI has made massive progress over the years to proactively detect the vast majority of the content it can remove. But it’s not perfect. “To achieve that we will need to provide our systems with large volumes of data of this specific kind of content, something which is difficult as these events are thankfully rare.” says Guy Rosen VP Product Management at Facebook. Guy further adds, “AI is an incredibly important part of our fight against terrorist content on our platforms, and while its effectiveness continues to improve, it is never going to be perfect. People will continue to be part of the equation, whether it’s the people on our team who review content, or people who use our services and report content to us. That’s why last year Facebook more than doubled the number of people working on safety and security to over 30,000 people, including about 15,000 content reviewers to report content that they find disturbing.” Facebook further plans to: Improve the image and video matching technology so that they can stop the spread of viral videos of such nature, regardless of how they were originally produced. React faster to this kind of content on a live streamed video. Continue to combat hate speech of all kinds on their platform. Expand industry collaboration through the Global Internet Forum to Counter Terrorism (GIFCT). Challenges Google and Facebook faced to report the video content According to Motherboard, Google saw an unprecedented number of attempts to post footage from the attack, sometimes as fast as a piece of content per second. But the challenge they faced was to block access to the killer’s so-called manifesto, a 74-page document that spouted racist views and explicit calls for violence. Google described the difficulties of moderating the manifesto, pointing to its length and the issue of users sharing the snippets of the manifesto that Google’s content moderators may not immediately recognise. 
"The manifesto will be particularly challenging to enforce against given the length of the document and that you may see various segments of various lengths within the content you are reviewing," says Google.

A source with knowledge of Google's strategy for moderating the New Zealand attack material said this can complicate moderation efforts, because some outlets did use parts of the video and manifesto. UK newspaper The Daily Mail let readers download the terrorist's manifesto directly from the paper's own website, and Sky News Australia aired parts of the attack footage, BuzzFeed News reported.

Facebook, on the other hand, faces the challenge of automatically discerning such content from visually similar, innocuous content. For example, if thousands of videos from live-streamed video games were flagged by its systems, reviewers could miss the important real-world videos where they could alert first responders to get help on the ground.

Another challenge for Facebook is similar to the one Google faces: the proliferation of many different variants of the video makes it difficult for the image and video matching technology to prevent it from spreading further. Facebook found that a core community of bad actors worked together to continually re-upload edited versions of the video in ways designed to defeat its detection. Second, a broader set of people distributed the video and unintentionally made it harder to match copies. Websites and pages, eager to get attention from people seeking out the video, re-cut and re-recorded the video into various formats. In total, Facebook found and blocked over 800 visually distinct variants of the video that were circulating.

Both companies seem to be working hard to improve their products and win back users' trust and confidence.

How social media enabled and amplified the Christchurch terrorist attack
Google to be the founding member of CDF (Continuous Delivery Foundation)
Google announces the stable release of Android Jetpack Navigation

Qt installation on different platforms [Tutorial]

Amrata Joshi
22 Mar 2019
10 min read
Qt provides a different look for mobile and embedded devices, where users expect a different style of presentation. This is controlled within the framework, so the developer can concentrate on developing a single application.

The Qt framework is released in two separate distributions, one commercial and one open source (known as dual licensing). In this manner, they can support open source-compliant applications for free, while providing unrestricted usage for closed source commercial projects. Before the year 2000 (with the release of 2.2), the source code for the free distribution had been under various licenses that some groups considered incompatible with common open source initiatives. For the 2.2 release, it was changed to GPL licensing, which settled any concerns about the group's commitment to true open source freedoms. With the release of Qt 4.5, LGPL was added as an option for developers who prefer the more permissive license.

This article is an excerpt taken from the book Hands-On GUI Application Development in Go. The book covers the benefits and complexities of building native graphical applications, the procedure for building on each platform, and developing graphical Windows applications using Walk. This article covers the basics of therecipe/qt, installing Qt on multiple platforms, installing qt (the bindings), and much more.

Getting started with therecipe/qt

To begin our exploration of Qt and the binding to Go, we'll build a simple hello world application. To be able to do so, we first need to install therecipe/qt, which depends on various prerequisites that we must set up first. As with Go-GTK, we'll be relying on a native library, which requires that we both set up the CGo functionality and install the Qt library appropriate for the current platform.

Preparing CGo

The Qt Go bindings, like many of the other toolkits featured in this book, require the presence of CGo to utilize native libraries. On a full development system, it's likely that this is already set up.

Installing Qt

The Qt website offers various methods of installation, including a customized online installer available to anyone with a Qt account (which is free to sign up for). Typically, a Qt installation comes with Qt Creator (the project IDE), the GUI designer, additional tools, and examples. Visiting the preceding site will automatically detect your system and suggest the most appropriate download (this is normally the best option). Be aware that the Qt installation can be quite large. If you don't have at least 40 GB of space on your hard drive, you will need to make a little space before installing.

Some operating systems offer Qt libraries and tools as part of their package manager, which often provides a more lightweight installation that'll automatically stay up to date.

Installing Qt on multiple platforms

macOS

On Apple macOS, the best approach to installation is to use the installer application available from the Qt download site. Visit www.qt.io/download and download the macOS installer. Once it has downloaded, open the package and run the program inside; this will install the selected compilers, tools, and supporting applications. If you encounter any errors during installation, the first step would be to check that your Xcode installation is complete and up to date.
Windows

Installing on Windows is more straightforward than with some of the other toolkits we've looked at, as the Qt installer has a mingw package bundled to set up most of the compiling requirements (though it's still recommended to have your own compiler set up for the binding phase that follows). To install it, go to the download page listed previously and access the Windows installer. Run the downloaded executable and follow the onscreen instructions. It's recommended to install to the default location. Once that's done, you're ready to set up the bindings.

Linux

Using the online installer from Qt's website is the easiest approach, though it may be possible to install through your system's package manager (if you want to try the package manager approach, first read the Qt Linux documentation). On most Linux platforms, the Qt downloads website will correctly detect the platform and offer a simple run installer. After downloading the file, you should make it executable and then run it:

On Linux, you need to make the install file executable and run it

This will start the installer just as on macOS; from here, follow the onscreen instructions and complete the installation.

License / Qt account

When you come to the login screen, you should enter your Qt account details if you have them. If you qualify for their open source license (GPL or LGPL), you can skip this step; to do so, make sure the email and password fields are empty.

Installing qt (the bindings)

To use qt (the Go Qt bindings), we need to download the project and its dependencies and then run a setup script to configure and compile the library. If using Windows, it's recommended to use the MSYS2 Terminal. If you installed the Qt download to anything other than the default location, then make sure to set the QT_DIR environment variable to the location you chose.

First, the library and its dependencies should be installed using the go tools, by running go get github.com/sirupsen/logrus and go get github.com/therecipe/qt. Once the download has completed, we need to run the qtsetup tool, which is included in the qt project; so, within the cmd/qtsetup folder, execute go run main.go. Using a Linux Terminal, it should look something like this:

Executing the qtsetup script for therecipe/qt bindings

Once this process completes, the bindings should be ready to use. If you encounter errors, then it's probably because the Qt tools aren't correctly installed, or the location was customized and you forgot to set the QT_DIR environment variable.

Build

To build our first qt application with Go, let's make another Hello World application. As with previous examples, we'll make use of a simple vertical box layout within a single application window. The following code should be sufficient to load your first application:

package main

import (
    "os"

    "github.com/therecipe/qt/widgets"
)

func main() {
    app := widgets.NewQApplication(len(os.Args), os.Args)

    window := widgets.NewQMainWindow(nil, 0)
    window.SetWindowTitle("Hello World")

    widget := widgets.NewQWidget(window, 0)
    widget.SetLayout(widgets.NewQVBoxLayout())
    window.SetCentralWidget(widget)

    label := widgets.NewQLabel2("Hello World!", window, 0)
    widget.Layout().AddWidget(label)

    button := widgets.NewQPushButton2("Quit", window)
    button.ConnectClicked(func(bool) {
        app.QuitDefault()
    })
    widget.Layout().AddWidget(button)

    window.Show()
    widgets.QApplication_Exec()
}

Let's note a few details from this code snippet.
You'll see that each of the widget constructor functions takes (typically) two parameters: the parent widget and a flags parameter. Additional values passed in will usually be added before these, with a note in the function name that there are additional parameters. For example, widgets.NewQLabel2(title, parent, flags) is equivalent to widgets.NewQLabel(parent, flags).SetText(title). Additionally, you'll see that the layout is applied to a new widgets.QWidget through SetLayout(layout), and that widget is set as the window content through window.SetCentralWidget(widget). To load the display and run the application, we call window.Show() and then widgets.QApplication_Exec().

This file is built in the usual way with go build hello.go:

Building is simple, though the output file is rather large

The file built is quite large due to the size of the Qt framework. This will be reduced significantly when packaging for a specific distribution.

Run

The output of the build phase is a binary that can be executed on the current computer, either on the command line or by double-clicking in a file manager. Additionally, you could execute it directly with go run hello.go. Either way, you should see a simple window, as shown here:

qt Hello on Linux

Running on macOS

At this stage, the binaries can be executed on a computer with the same architecture that also has Qt installed.

Object model and event handling

The Qt framework is written in the C++ language, and so much of its architecture will be familiar to those who've coded in C++ before. It's important to note that Go isn't a complete object-oriented language and, as such, doesn't match these capabilities directly. In particular, we should look at inheritance, as it's important to the Qt object model.

Inheritance

The Qt API is a fully object-oriented model that makes heavy use of inheritance. While Go doesn't truly support object-oriented inheritance in the traditional manner, its composition approach is very powerful and works well in its place. The result means that you probably won't notice the difference!

Memory management

As you'll have noticed in the preceding example, each widget expects the parent to be passed to the constructing function. This enables the Qt framework to handle the tidying up and freeing of memory when a tree of widgets is removed. QObject (which is the base object for all of the Qt API) keeps track of its child objects and so, when being removed, can remove its children too. This makes the creation and deletion of complex widget hierarchies easier to handle correctly. To make use of this feature, you should always remember to pass the parent object to a widget's constructor (the Go functions starting with New...), despite the fact that passing nil may look like it works.

Signals and slots

Qt, like GTK+, is an event-driven framework and uses signals extensively to handle event management and data communications. In Qt, this concept is split into signals and slots; a signal is what will be generated when an event occurs, and a slot is what can receive a signal. The action of setting a slot to receive a signal is called connecting, and this causes the slot function to be called when its connected signal is invoked. In Qt, these are typed events, meaning that each signal has a list of type parameters associated with it. When the signal is defined, this type is set, and any slot wishing to connect to the signal will need to have the same type.
s.ConnectMySignal(func(msg string) {
    log.Println("Signalled message", msg)
})

Signals and slots are what power user interfaces generated with Qt Designer, and they are the recommended way of handling multi-threaded applications. A signal may fire from a background thread, and the user interface code can connect this signal to its own slot, in essence listening for the signal. When the signal fires, any associated data (parameters to the signal) will be passed from one thread to another so it can be used safely within the GUI updates.

As qt is a lightweight binding to the Qt API, the Go-specific documentation is minimal, but you can find out a lot more about the Qt design and all of the available classes in the official Qt documentation.

In this article, we have learned about the Qt framework, installing it on multiple platforms, therecipe/qt, installing qt (the bindings), and much more. To know more about Go-GTK and platforms with GTK, check out the book Hands-On GUI Application Development in Go.

Qt Creator 4.9 Beta released with QML support, programming language support and more!
Qt Design Studio 1.1 released with Qt Photoshop bridge, updated timeline and more
Qt for Python 5.12 released with PySide2, Qt GUI and more