
How-To Tutorials

6719 Articles

Analyzing Moby Dick through frequency distribution with NLTK

Richard Gall
30 Mar 2018
2 min read
What is frequency distribution and why does it matter? In the context of natural language processing, a frequency distribution is simply a tally of the number of times each unique word is used in a text. Recording the individual word counts of a text helps us understand not only what topics are being discussed and what information is important, but also how that information is being discussed. It's a useful method for better understanding language and different types of texts. This video tutorial has been taken from Natural Language Processing with Python.

Word frequency distribution is central to performing content analysis with NLP, and its applications are wide ranging. From understanding and characterizing an author's writing style to analyzing the vocabulary of rappers, the technique is playing a large part in wider cultural conversations. It's also used in psychological research in a number of ways to analyze how patients use language to form frameworks for thinking about themselves and the world. Trivial or serious, word frequency distribution is becoming more and more important in the world of research.

Of course, manually creating such a word frequency distribution model would be time consuming and inconvenient for data scientists. Fortunately for us, NLTK, Python's toolkit for natural language processing, makes life much easier.

How to use NLTK for frequency distribution

Take a look at how to use NLTK to create a frequency distribution for Herman Melville's Moby Dick in the video tutorial above. In it, you'll find a step-by-step guide to performing an important data analysis task. Once you've done that, you can try it for yourself, or have a go at performing a similar analysis on another data set.

Read Analyzing Textual information using the NLTK library. Learn more about natural language processing - read How to create a conversational assistant or chatbot using Python.
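As a companion to the video, here is a minimal sketch of the kind of frequency distribution described above. It is not taken from the tutorial itself, and it assumes NLTK is installed and that the gutenberg and stopwords corpora have been downloaded with nltk.download().

```python
# Minimal sketch: frequency distribution for Moby Dick with NLTK.
# Assumes: pip install nltk, plus nltk.download('gutenberg') and
# nltk.download('stopwords') have been run beforehand.
import nltk
from nltk.corpus import gutenberg, stopwords

# Load the tokenized text of Moby Dick that ships with the Gutenberg corpus.
words = gutenberg.words('melville-moby_dick.txt')

# Keep alphabetic tokens only and drop common English stop words.
stop_words = set(stopwords.words('english'))
tokens = [w.lower() for w in words if w.isalpha() and w.lower() not in stop_words]

# FreqDist tallies how many times each unique word occurs.
freq_dist = nltk.FreqDist(tokens)
print(freq_dist.most_common(10))  # the ten most frequent content words
```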


Getting started with Django and Django REST frameworks to build a RESTful app

Sugandha Lahoti
29 Mar 2018
12 min read
In this article, we will learn how to install Django and Django REST framework in an isolated environment. We will also look at the Django folders, files, and configurations, and how to create an app with Django. We will also introduce various command-line and GUI tools that are used to interact with RESTful Web Services.

Installing Django and Django REST framework in an isolated environment

First, run the following command to install the Django web framework:

pip install django==1.11.5

The last lines of the output will indicate that the django package has been successfully installed. The process will also install the pytz package, which provides world time zone definitions. Take into account that you may also see a notice to upgrade pip. The next lines show a sample of the last four lines of the output generated by a successful pip installation:

Collecting django
Collecting pytz (from django)
Installing collected packages: pytz, django
Successfully installed django-1.11.5 pytz-2017.2

Now that we have installed the Django web framework, we can install Django REST framework. Django REST framework works on top of Django and provides us with a powerful and flexible toolkit to build RESTful Web Services. We just need to run the following command to install this package:

pip install djangorestframework==3.6.4

The last lines of the output will indicate that the djangorestframework package has been successfully installed, as shown here:

Collecting djangorestframework
Installing collected packages: djangorestframework
Successfully installed djangorestframework-3.6.4

After following the previous steps, we will have Django REST framework 3.6.4 and Django 1.11.5 installed in our virtual environment.

Creating an app with Django

Now, we will create our first app with Django and we will analyze the directory structure that Django creates. First, go to the root folder for the virtual environment, named 01.

In Linux or macOS, enter the following command:

cd ~/HillarDjangoREST/01

If you prefer Command Prompt, run the following command in the Windows command line:

cd /d %USERPROFILE%\HillarDjangoREST\01

If you prefer Windows PowerShell, run the following command in Windows PowerShell:

cd $env:USERPROFILE\HillarDjangoREST\01

In Linux or macOS, run the following command to create a new Django project named restful01. The command won't produce any output:

python bin/django-admin.py startproject restful01

In Windows, in either Command Prompt or PowerShell, run the following command to create a new Django project named restful01. The command won't produce any output:

python Scripts\django-admin.py startproject restful01

The previous command creates a restful01 folder with other subfolders and Python files. Now, go to the recently created restful01 folder. Just execute the following command on any platform:

cd restful01

Then, run the following command to create a new Django app named toys within the restful01 Django project. The command won't produce any output:

python manage.py startapp toys

The previous command creates a new restful01/toys subfolder with the following files: views.py, tests.py, models.py, apps.py, admin.py, and __init__.py. In addition, the restful01/toys folder will have a migrations subfolder with an __init__.py Python script.
The following diagram shows the folders and files in the directory tree, starting at the restful01 folder with two subfolders - toys and restful01:

Understanding Django folders, files, and configurations

After we create our first Django project and then a Django app, there are many new folders and files. First, use your favorite editor or IDE to check the Python code in the apps.py file within the restful01/toys folder (restful01\toys in Windows). The following lines show the code for this file:

from django.apps import AppConfig

class ToysConfig(AppConfig):
    name = 'toys'

The code declares the ToysConfig class as a subclass of the django.apps.AppConfig class, which represents a Django application and its configuration. The ToysConfig class just defines the name class attribute and sets its value to 'toys'.

Now, we have to add toys.apps.ToysConfig as one of the installed apps in the restful01/settings.py file that configures settings for the restful01 Django project. We build this string by concatenating three values: app name + .apps. + class name, that is, toys + .apps. + ToysConfig. In addition, we have to add the rest_framework app to make it possible for us to use Django REST framework.

The restful01/settings.py file is a Python module with module-level variables that define the configuration of Django for the restful01 project. We will make some changes to this Django settings file. Open the restful01/settings.py file and locate the lines that specify the string list that declares the installed apps. The following code shows the first lines of the settings.py file. Note that the file has more code:

"""
Django settings for restful01 project.
Generated by 'django-admin startproject' using Django 1.11.5.
For more information on this file, see
https://docs.djangoproject.com/en/1.11/topics/settings/
For the full list of settings and their values, see
https://docs.djangoproject.com/en/1.11/ref/settings/
"""

import os

# Build paths inside the project like this: os.path.join(BASE_DIR, ...)
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

# Quick-start development settings - unsuitable for production
# See https://docs.djangoproject.com/en/1.11/howto/deployment/checklist/

# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = '+uyg(tmn%eo+fpg+fcwmm&x(2x0gml8)=cs@$nijab%)y$a*xe'

# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = True

ALLOWED_HOSTS = []

# Application definition

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
]

Add the following two strings to the INSTALLED_APPS list and save the changes to the restful01/settings.py file: 'rest_framework' and 'toys.apps.ToysConfig'. The following lines show the new code that declares the INSTALLED_APPS string list, with comments that explain what each added string means. The code file for the sample is included in the hillar_django_restful_01 folder:

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    # Django REST framework
    'rest_framework',
    # Toys application
    'toys.apps.ToysConfig',
]

This way, we have added Django REST framework and the toys application to our initial Django project named restful01.
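A quick way to confirm that the packages installed earlier are active and that both new entries in INSTALLED_APPS were registered is to query Django from the project's shell. The following is a sketch of that check, not part of the book's code bundle.

```python
# Sketch (not from the book): run inside `python manage.py shell` after
# editing settings.py, to confirm versions and app registration.
import django
import rest_framework
from django.apps import apps

print(django.get_version())                 # expected: 1.11.5
print(rest_framework.VERSION)               # expected: 3.6.4
print(apps.is_installed('rest_framework'))  # True once added to INSTALLED_APPS
print(apps.get_app_config('toys').name)     # 'toys', resolved through ToysConfig
```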
Installing tools

Now, we will leave Django for a while and install many tools that we will use to interact with the RESTful Web Services that we will develop throughout this book. We will use the following different kinds of tools to compose and send HTTP requests and visualize the responses: command-line tools, GUI tools, Python code, web browsers, and JavaScript code. You can use any other application that allows you to compose and send HTTP requests. There are many apps that run on tablets and smartphones that allow you to accomplish this task. However, we will focus our attention on the most useful tools when building RESTful Web Services with Django.

Installing Curl

We will start by installing command-line tools. One of the key advantages of command-line tools is that you can easily run the HTTP requests again after you have built them for the first time, and you don't need to use the mouse or tap the screen to run requests. You can also easily build a script with batch requests and run them. As happens with any command-line tool, it can take more time to perform the first requests compared with GUI tools, but it becomes easier once we have performed many requests and we can easily reuse the commands we have written in the past to compose new requests.

Curl, also known as cURL, is a very popular open source command-line tool and library that allows us to easily transfer data. We can use the curl command-line tool to easily compose and send HTTP requests and check their responses.

In Linux or macOS, you can open a Terminal and start using curl from the command line. In Windows, you have two options. You can work with curl in Command Prompt, or you can install curl as part of the Cygwin package installation option and execute it from the Cygwin terminal. You can read more about the Cygwin terminal and its installation procedure at http://cygwin.com/install.html. Windows PowerShell includes a curl alias that calls the Invoke-WebRequest command, and therefore, if you want to work with curl in Windows PowerShell, it is necessary to remove the curl alias.

If you want to use the curl command within Command Prompt, you just need to download and unzip the latest version of curl from the curl download page: https://curl.haxx.se/download.html. Make sure you download the version that includes SSL and SSH. The following screenshot shows the available downloads for Windows. The Win64 - Generic section includes the versions that we can run in Command Prompt or Windows PowerShell. After you unzip the .7zip or .zip file you have downloaded, include the folder that contains curl.exe in your path. For example, if you unzip the Win64 x86_64.7zip file, you will find curl.exe in the bin folder. The following screenshot shows the results of executing curl --version on Command Prompt in Windows 10. The --version option makes curl display its version and all the libraries, protocols, and features it supports:

Installing HTTPie

Now, we will install HTTPie, a command-line HTTP client written in Python that makes it easy to send HTTP requests and uses a syntax that is easier than curl. By default, HTTPie displays colorized output and uses multiple lines to display the response details. In some cases, HTTPie makes it easier to understand the responses than the curl utility.
However, one of the great disadvantages of HTTPie as a command-line utility is that it takes more time to load than curl, and therefore, if you want to code scripts with many commands, you have to evaluate whether it makes sense to use HTTPie.

We just need to make sure we run the following command in the virtual environment we have just created and activated. This way, we will install HTTPie only for our virtual environment. Run the following command in the terminal, Command Prompt, or Windows PowerShell to install the httpie package:

pip install --upgrade httpie

The last lines of the output will indicate that the httpie package has been successfully installed:

Collecting httpie
Collecting colorama>=0.2.4 (from httpie)
Collecting requests>=2.11.0 (from httpie)
Collecting Pygments>=2.1.3 (from httpie)
Collecting idna<2.7,>=2.5 (from requests>=2.11.0->httpie)
Collecting urllib3<1.23,>=1.21.1 (from requests>=2.11.0->httpie)
Collecting chardet<3.1.0,>=3.0.2 (from requests>=2.11.0->httpie)
Collecting certifi>=2017.4.17 (from requests>=2.11.0->httpie)
Installing collected packages: colorama, idna, urllib3, chardet, certifi, requests, Pygments, httpie
Successfully installed Pygments-2.2.0 certifi-2017.7.27.1 chardet-3.0.4 colorama-0.3.9 httpie-0.9.9 idna-2.6 requests-2.18.4 urllib3-1.22

Now, we will be able to use the http command to easily compose and send HTTP requests to the RESTful Web Services we will build with Django. The following screenshot shows the results of executing http on Command Prompt in Windows 10. HTTPie displays the valid options and indicates that a URL is required:

Installing the Postman REST client

So far, we have installed two terminal-based or command-line tools to compose and send HTTP requests to our Django development server: cURL and HTTPie. Now, we will start installing Graphical User Interface (GUI) tools.

Postman is a very popular GUI tool for API testing that allows us to easily compose and send HTTP requests, among other features. Postman is available as a standalone app in Linux, macOS, and Windows. You can download the versions of the Postman app from the following URL: https://www.getpostman.com. The following screenshot shows the HTTP GET request builder in Postman:

Installing Stoplight

Stoplight is a very useful GUI tool that focuses on helping architects and developers model complex APIs. If we need to consume our RESTful Web Service in many different programming languages, we will find Stoplight extremely helpful. Stoplight provides an HTTP request maker that allows us to compose and send requests and generate the necessary code to make them in different programming languages, such as JavaScript, Swift, C#, PHP, Node, and Go, among others. Stoplight provides a web version and is also available as a standalone app in Linux, macOS, and Windows. You can download the versions of Stoplight from the following URL: http://stoplight.io/. The following screenshot shows the HTTP GET request builder in Stoplight with the code generation at the bottom:

Installing iCurlHTTP

We can also use apps that compose and send HTTP requests from mobile devices to work with our RESTful Web Services. For example, we can work with the iCurlHTTP app on iOS devices such as the iPad and iPhone: https://itunes.apple.com/us/app/icurlhttp/id611943891. On Android devices, we can work with the HTTP Request app: https://play.google.com/store/apps/details?id=air.http.request&hl=en.
The following screenshot shows the UI for the iCurlHTTP app running on an iPad Pro. At the time of writing, the mobile apps that allow you to compose and send HTTP requests do not provide all the features you can find in Postman or command-line utilities.

In this article, we learnt to set up a virtual environment with Django and Django REST framework and created an app with Django. We looked at Django folders, files, and configurations, and installed command-line and GUI tools to interact with RESTful Web Services.

This article is an excerpt from the book Django RESTful Web Services, written by Gaston C. Hillar. The book serves as an easy guide to building Python RESTful APIs and web services with Django. The code bundle for the article is hosted on GitHub.
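Python code is listed among the request-composition tools earlier in this article. The following is a minimal sketch of that option (not from the book); it assumes the requests package is installed and that some HTTP service is already listening at the placeholder URL below.

```python
# Sketch: composing an HTTP GET with the requests library instead of curl
# or HTTPie. The URL is a placeholder for whatever endpoint you are testing.
import requests

response = requests.get('http://localhost:8000/',
                        headers={'Accept': 'application/json'})
print(response.status_code)   # e.g. 200
print(response.text)          # raw response body
```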


How to build and deploy Microservices using Payara Micro

Gebin George
28 Mar 2018
9 min read
Payara Micro offers a new way to run Java EE or microservice applications. It is based on the Web profile of GlassFish and bundles a few additional APIs. The distribution is designed with modern containerized environments in mind. Payara Micro is available to download as a standalone executable JAR, as well as a Docker image. It's an open source MicroProfile-compatible runtime. Today, we will learn to use Payara Micro to build and deploy microservices.

Here's a list of APIs that are supported in Payara Micro: Servlets, JSTL, EL, and JSPs; WebSockets; JSF; JAX-RS; EJB Lite; JTA; JPA; Bean Validation; CDI; Interceptors; JBatch; Concurrency; and JCache. We will be exploring how to build our services using Payara Micro in the next section.

Building services with Payara Micro

Let's start building parts of our Issue Management System (IMS), which is going to be a one-stop destination for collaboration among teams. As the name implies, this system will be used for managing issues that are raised as tickets and get assigned to users for resolution. To begin the project, we will identify our microservice candidates based on the business model of IMS. Here, let's define three functional services, which will be hosted in their own independent Git repositories: ims-micro-users, ims-micro-tasks, and ims-micro-notify.

You might wonder, why these three and why separate repositories? We could create much more fine-grained services and perhaps it wouldn't be wrong to do so. The answer lies in understanding the following points:

Isolating what varies: We need to be able to independently develop and deploy each unit. Changes to one business capability or domain shouldn't require changes in other services more often than desired.

Organisation or team structure: If you define teams by business capability, then they can work independently of others and release features with greater agility. The tasks team should be able to evolve independently of the teams that are handling users or notifications. The functional boundaries should allow independent version and release cycle management.

Transactional boundaries for consistency: Distributed transactions are not easy; creating services for related features that are too fine grained leads to more complexity than desired. You would need to become familiar with concepts like eventual consistency, but these are not easy to achieve in practice.

Source repository per service: Setting up a single repository that hosts all the services is ideal when it's the same team that works on these services and the project is relatively small. But we are building our fictional IMS, which is a large, complex system with many moving parts. Separate teams would get tightly coupled by sharing a repository. Moreover, versioning and tagging of releases will be yet another problem to solve.

The projects are created as standard Java EE projects, which are Skinny WARs, that will be deployed using the Payara Micro server. Payara Micro allows us to delay the decision of using a Fat JAR or Skinny WAR. This gives us flexibility in picking the deployment choice at a later stage.
As Maven is a widely adopted build tool among developers, we will use the same to create our example projects, using the following commands:

mvn archetype:generate -DgroupId=org.jee8ng -DartifactId=ims-micro-users -DarchetypeArtifactId=maven-archetype-webapp -DinteractiveMode=false

mvn archetype:generate -DgroupId=org.jee8ng -DartifactId=ims-micro-tasks -DarchetypeArtifactId=maven-archetype-webapp -DinteractiveMode=false

mvn archetype:generate -DgroupId=org.jee8ng -DartifactId=ims-micro-notify -DarchetypeArtifactId=maven-archetype-webapp -DinteractiveMode=false

Once the structure is generated, update the properties and dependencies sections of pom.xml with the following contents, for all three projects:

<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <maven.compiler.source>1.8</maven.compiler.source>
  <maven.compiler.target>1.8</maven.compiler.target>
  <failOnMissingWebXml>false</failOnMissingWebXml>
</properties>

<dependencies>
  <dependency>
    <groupId>javax</groupId>
    <artifactId>javaee-api</artifactId>
    <version>8.0</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
    <scope>test</scope>
  </dependency>
</dependencies>

Next, create a beans.xml file under the WEB-INF folder for all three projects:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://xmlns.jcp.org/xml/ns/javaee"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
                           http://xmlns.jcp.org/xml/ns/javaee/beans_2_0.xsd"
       bean-discovery-mode="all">
</beans>

You can delete the index.jsp and web.xml files, as we won't be needing them. The following is the project structure of ims-micro-users. The same structure will be used for ims-micro-tasks and ims-micro-notify.

The package names for the users, tasks, and notify services will be as follows:

org.jee8ng.ims.users (inside ims-micro-users)
org.jee8ng.ims.tasks (inside ims-micro-tasks)
org.jee8ng.ims.notify (inside ims-micro-notify)

Each of the above will in turn have sub-packages called boundary, control, and entity. The structure follows the Boundary-Control-Entity (BCE)/Entity-Control-Boundary (ECB) pattern. The JaxrsActivator shown as follows is required to enable the JAX-RS API and thus needs to be placed in each of the projects:

import javax.ws.rs.ApplicationPath;
import javax.ws.rs.core.Application;

@ApplicationPath("resources")
public class JaxrsActivator extends Application {}

All three projects will have REST endpoints that we can invoke over HTTP. When doing RESTful API design, a popular convention is to use plural names for resources, especially if the resource could represent a collection. For example: /users, /tasks. The resource class names in the projects use the plural form, as it's consistent with the resource URL naming used. This avoids confusion such as a resource URL being called a users resource while the class is named UserResource. Given that this is an opinionated approach, feel free to use singular class names if desired. Here's the relevant code for the ims-micro-users, ims-micro-tasks, and ims-micro-notify projects respectively.
Under ims-micro-users, define the UsersResource endpoint:

package org.jee8ng.ims.users.boundary;

import javax.ws.rs.*;
import javax.ws.rs.core.*;

@Path("users")
public class UsersResource {

  @GET
  @Produces(MediaType.APPLICATION_JSON)
  public Response get() {
    return Response.ok("user works").build();
  }
}

Under ims-micro-tasks, define the TasksResource endpoint:

package org.jee8ng.ims.tasks.boundary;

import javax.ws.rs.*;
import javax.ws.rs.core.*;

@Path("tasks")
public class TasksResource {

  @GET
  @Produces(MediaType.APPLICATION_JSON)
  public Response get() {
    return Response.ok("task works").build();
  }
}

Under ims-micro-notify, define the NotificationsResource endpoint:

package org.jee8ng.ims.notify.boundary;

import javax.ws.rs.*;
import javax.ws.rs.core.*;

@Path("notifications")
public class NotificationsResource {

  @GET
  @Produces(MediaType.APPLICATION_JSON)
  public Response get() {
    return Response.ok("notification works").build();
  }
}

Once you build all three projects using mvn clean install, you will get your Skinny WAR files generated in the target directory, which can be deployed on the Payara Micro server.

Running services with Payara Micro

Download the Payara Micro server if you haven't already, from this link: https://www.payara.fish/downloads. The micro server will have the name payara-micro-xxx.jar, where xxx is the version number, which might be different when you download the file. Here's how you can start Payara Micro with our services deployed locally. When doing so, we need to ensure that the instances start on different ports, to avoid any port conflicts:

java -jar payara-micro-xxx.jar --deploy ims-micro-users/target/ims-micro-users.war --port 8081
java -jar payara-micro-xxx.jar --deploy ims-micro-tasks/target/ims-micro-tasks.war --port 8082
java -jar payara-micro-xxx.jar --deploy ims-micro-notify/target/ims-micro-notify.war --port 8083

This will start three instances of Payara Micro running on the specified ports. This makes our applications available under these URLs:

http://localhost:8081/ims-micro-users/resources/users/
http://localhost:8082/ims-micro-tasks/resources/tasks/
http://localhost:8083/ims-micro-notify/resources/notifications/

Payara Micro can be started on a non-default port by using the --port parameter, as we did earlier. This is useful when running multiple instances on the same machine. Another option is to use the --autoBindHttp parameter, which will attempt to bind on 8080 as the default port, and if that port is unavailable, it will try to bind on the next port up, repeating until it finds an available port.

Uber JAR option: There's one more feature that Payara Micro provides. We can generate an Uber JAR as well, which would be the Fat JAR approach that we learnt about in the Fat JAR section. To package our ims-micro-users project as an Uber JAR, we can run the following command:

java -jar payara-micro-xxx.jar --deploy ims-micro-users/target/ims-micro-users.war --outputUberJar users.jar

This will generate the users.jar file in the directory where you run this command. The size of this JAR will naturally be larger than our WAR file, since it will also bundle the Payara Micro runtime in it. Here's how you can start the application using the generated JAR:

java -jar users.jar

The server parameters that we used earlier can be passed to this runnable JAR file too. Apart from the two choices we saw for running our microservice projects, there's a third option as well.
Payara Micro provides an API-based approach, which can be used to programmatically start the embedded server. We will expand upon these three services as we progress further into the realm of cloud-based Java EE. We saw how to leverage the power of Payara Micro to run Java EE or microservice applications.

You read an excerpt from the book Java EE 8 and Angular, written by Prashant Padmanabhan. This book helps you build high-performing enterprise applications using Java EE powered by Angular at the frontend.
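As a quick way to confirm that the three Payara Micro instances started above are responding, a short script can hit each endpoint. This is a sketch rather than part of the book's code; it assumes the instances are running on ports 8081-8083 and that Python with the requests package is available.

```python
# Sketch: smoke test for the three Payara Micro instances deployed above.
import requests

endpoints = [
    'http://localhost:8081/ims-micro-users/resources/users/',
    'http://localhost:8082/ims-micro-tasks/resources/tasks/',
    'http://localhost:8083/ims-micro-notify/resources/notifications/',
]

for url in endpoints:
    response = requests.get(url, headers={'Accept': 'application/json'})
    # Each resource returns a simple confirmation string such as "user works".
    print(url, response.status_code, response.text)
```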


How to build Microservices using REST framework

Gebin George
28 Mar 2018
7 min read
Today, we will learn to build microservices using the REST framework. Our microservices are Java EE 8 web projects, built using Maven and published as separate Payara Micro instances, running within Docker containers. The separation allows them to scale individually, as well as have independent operational activities. Given the BCE pattern used, we have the business component split into boundary, control, and entity, where the boundary comprises the web resource (REST endpoint) and business service (EJB). The web resource will publish the CRUD operations and the EJB will in turn provide the transactional support for each of them, along with making external calls to other resources. Here's a logical view for the boundary consisting of the web resource and business service.

The microservices will have the following REST endpoints published for the projects shown, along with the boundary classes XXXResource and XXXService. Server-Sent Events are covered in Power Your APIs with JAX-RS and CDI; in IMS, we publish task/issue updates to the browser using an SSE endpoint. The code observes the events using the CDI event notification model and triggers the broadcast.

The ims-users and ims-issues endpoints are similar in API format and behavior. While one deals with creating, reading, updating, and deleting a User, the other does the same for an Issue. Let's look at this in action. After you have the containers running, we can start firing requests to the /users web resource.

The following curl command maps the URI /users to the @GET resource method named getAll() and returns a collection (JSON array) of users. The Java code will simply return a Set<User>, which gets converted to a JsonArray due to the JSON binding support of JSON-B. The method invoked is as follows:

@GET
public Response getAll() {... }

curl -v -H 'Accept: application/json' http://localhost:8081/ims-users/resources/users
...
HTTP/1.1 200 OK
...
[{
  "id":1,"name":"Marcus","email":"[email protected]",
  "credential":{"password":"1234","username":"marcus"}
},
{
  "id":2,"name":"Bob","email":"[email protected]",
  "credential":{"password":"1234","username":"bob"}
}]

Next, for selecting one of the users, such as Marcus, we will issue the following curl command, which uses the /users/xxx path. This will map the URI to the @GET method which has the additional @Path("{id}") annotation as well. The value of the id is captured using the @PathParam("id") annotation placed before the field. The response is a User entity wrapped in the Response object returned. The method invoked is as follows:

@GET
@Path("{id}")
public Response get(@PathParam("id") Long id) { ... }

curl -v -H 'Accept: application/json' http://localhost:8081/ims-users/resources/users/1
...
HTTP/1.1 200 OK
...
{
  "id":1,"name":"Marcus","email":"[email protected]",
  "credential":{"password":"1234","username":"marcus"}
}

In both the preceding methods, we saw the response returned as 200 OK. This is achieved by using a Response builder. Here's the snippet for the method:

return Response.ok( ..entity here.. ).build();

Next, for submitting data to the resource method, we use the @POST annotation. You might have noticed earlier that the signature of the method also made use of a UriInfo object. This is injected at runtime for us via the @Context annotation. A curl command can be used to submit the JSON data of a user entity. The method invoked is as follows:

@POST
public Response add(User newUser, @Context UriInfo uriInfo)

We make use of the -d flag to send the JSON body in the request.
The POST request is implied:

curl -v -H 'Content-Type: application/json' http://localhost:8081/ims-users/resources/users -d '{"name": "james", "email":"[email protected]", "credential": {"username":"james","password":"test123"}}'
...
HTTP/1.1 201 Created
...
Location: http://localhost:8081/ims-users/resources/users/3

The 201 status code is sent by the API to signal that an entity has been created, and it also returns the location of the newly created entity. Here's the relevant snippet to do this:

// uriInfo is injected via the @Context parameter to this method
URI location = uriInfo.getAbsolutePathBuilder()
    .path(newUserId) // This is the new entity ID
    .build();
// To send the 201 status with the new Location
return Response.created(location).build();

Similarly, we can also send an update request using the PUT method. The method invoked is as follows:

@PUT
@Path("{id}")
public Response update(@PathParam("id") Long id, User existingUser)

curl -v -X PUT -H 'Content-Type: application/json' http://localhost:8081/ims-users/resources/users/3 -d '{"name": "jameson", "email":"[email protected]"}'
...
HTTP/1.1 200 Ok

The last method we need to map is the DELETE method, which is similar to the GET operation, with the only difference being the HTTP method used. The method invoked is as follows:

@DELETE
@Path("{id}")
public Response delete(@PathParam("id") Long id)

curl -v -X DELETE http://localhost:8081/ims-users/resources/users/3
...
HTTP/1.1 200 Ok

You can try out the Issues endpoint in a similar manner. For the GET requests of /users or /issues, the code simply fetches and returns a set of entity objects. But when requesting an item within this collection, the resource method has to look up the entity by the passed-in id value, captured by @PathParam("id"), and if found, return the entity, or else a 404 Not Found is returned. Here's a snippet showing just that:

final Optional<Issue> issueFound = service.get(id); // id obtained
if (issueFound.isPresent()) {
    return Response.ok(issueFound.get()).build();
}
return Response.status(Response.Status.NOT_FOUND).build();

The issue instance can be fetched from a database of issues, which the service object interacts with. The persistence layer can return a JPA entity object which gets converted to JSON for the calling code. We will look at persistence using JPA in a later section.

For the update request, which is sent as an HTTP PUT, the code captures the identifier using @PathParam("id"), similar to the previous GET operation, and then uses that to update the entity. The entity itself is submitted as JSON input and gets converted to an entity instance from the message body of the payload. Here's the code snippet for that:

@PUT
@Path("{id}")
public Response update(@PathParam("id") Long id, Issue updated) {
    updated.setId(id);
    boolean done = service.update(updated);
    return done
        ? Response.ok(updated).build()
        : Response.status(Response.Status.NOT_FOUND).build();
}

The code is simple to read and does one thing: it updates the identified entity and returns the response containing the updated entity, or a 404 for a non-existing entity. The service references that we have looked at so far are @Stateless beans which are injected into the resource class as fields:

// Project: ims-comments
@Stateless
public class CommentsService {... }

// Project: ims-issues
@Stateless
public class IssuesService {... }

// Project: ims-users
@Stateless
public class UsersService {... }

These will in turn have the EntityManager injected via @PersistenceContext.
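The curl calls above translate directly into Python. The following is a sketch rather than the book's code; it assumes the ims-users service is running on port 8081, that the requests package is installed, and the field values are illustrative only.

```python
# Sketch: the same CRUD calls as the curl commands above, using requests.
import requests

BASE = 'http://localhost:8081/ims-users/resources/users'

new_user = {
    'name': 'james',
    'email': 'james@example.com',   # illustrative address
    'credential': {'username': 'james', 'password': 'test123'},
}

created = requests.post(BASE, json=new_user)
print(created.status_code)                   # expect 201 Created
location = created.headers.get('Location')   # URI of the new user

print(requests.get(BASE).json())             # list all users
if location:
    requests.put(location, json={'name': 'jameson',
                                 'email': 'jameson@example.com'})
    print(requests.delete(location).status_code)  # expect 200
```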
Combined with the resource and service, our components have made the boundary ready for clients to use. Similar to the WebSockets section in Chapter 6, Power Your APIs with JAX-RS and CDI, in IMS we use a @ServerEndpoint which maintains the list of active sessions and then uses that to broadcast a message to all users who are connected. A ChatThread keeps track of the messages being exchanged through the @ServerEndpoint class. For the message to be sent, we use the stream of sessions and filter it by open sessions, then send the message for each of the sessions:

chatSessions.getSessions().stream().filter(Session::isOpen)
    .forEach(s -> {
        try {
            s.getBasicRemote().sendObject(chatMessage);
        } catch (Exception e) {...}
    });

To summarize, we saw in practice how to leverage the REST framework to build microservices.

This article is an excerpt from the book Java EE 8 and Angular, written by Prashant Padmanabhan. The book covers building modern, user-friendly web apps with Java EE.


Getting started with Django RESTful Web Services

Sugandha Lahoti
27 Mar 2018
19 min read
In this article, we will kick-start with learning RESTful Web Services and subsequently learn how to create models and perform migration, serialization, and deserialization in Django. Here's a quick glance at what to expect in this article:

Defining the requirements for RESTful Web Service
Analyzing and understanding Django tables and the database
Controlling serialization and deserialization in Django

Defining the requirements for RESTful Web Service

Imagine a team of developers working on a mobile app for iOS and Android that requires a RESTful Web Service to perform CRUD operations with toys. We definitely don't want to use a mock web service and we don't want to spend time choosing and configuring an ORM (short for Object-Relational Mapping). We want to quickly build a RESTful Web Service and have it ready as soon as possible to start interacting with it in the mobile app.

We really want the toys to persist in a database but we don't need it to be production-ready. Therefore, we can use the simplest possible relational database, as long as we don't have to spend time performing complex installations or configurations. Django REST framework, also known as DRF, will allow us to easily accomplish this task and start making HTTP requests to the first version of our RESTful Web Service. In this case, we will work with a very simple SQLite database, the default database for a new Django REST framework project.

First, we must specify the requirements for our main resource: a toy. We need the following attributes or fields for a toy entity:

An integer identifier
A name
An optional description
A toy category description, such as action figures, dolls, or playsets
A release date
A bool value indicating whether the toy has been on the online store's homepage at least once

In addition, we want to have a timestamp with the date and time of the toy's addition to the database table, which will be generated to persist toys.

In a RESTful Web Service, each resource has its own unique URL. In our web service, each toy will have its own unique URL. The following table shows the HTTP verbs, the scope, and the semantics of the methods that our first version of the web service must support. Each method is composed of an HTTP verb and a scope. All the methods have a well-defined meaning for toys and collections:

HTTP verb | Scope | Semantics
GET | Toy | Retrieve a single toy
GET | Collection of toys | Retrieve all the stored toys in the collection, sorted by their name in ascending order
POST | Collection of toys | Create a new toy in the collection
PUT | Toy | Update an existing toy
DELETE | Toy | Delete an existing toy

In the previous table, the GET HTTP verb appears twice but with two different scopes: toys and collection of toys. The first row shows a GET HTTP verb applied to a toy, that is, to a single resource. The second row shows a GET HTTP verb applied to a collection of toys, that is, to a collection of resources.

We want our web service to be able to differentiate collections from a single resource of the collection in the URLs. When we refer to a collection, we will use a slash (/) as the last character for the URL, as in http://localhost:8000/toys/. When we refer to a single resource of the collection, we won't use a slash (/) as the last character for the URL, as in http://localhost:8000/toys/5.

Let's consider that http://localhost:8000/toys/ is the URL for the collection of toys. If we add a number to the previous URL, we identify a specific toy with an ID or primary key equal to the specified numeric value.
For example, http://localhost:8000/toys/42 identifies the toy with an ID equal to 42.

We have to compose and send an HTTP request with the POST HTTP verb and the http://localhost:8000/toys/ request URL to create a new toy and add it to the toys collection. In this example, our RESTful Web Service will work with JSON (short for JavaScript Object Notation), and therefore we have to provide the JSON key-value pairs with the field names and the values to create the new toy. As a result of the request, the server will validate the provided values for the fields, make sure that it is a valid toy, and persist it in the database. The server will insert a new row with the new toy in the appropriate table and it will return a 201 Created status code and a JSON body with the recently added toy serialized to JSON, including the assigned ID that was automatically generated by the database and assigned to the toy object:

POST http://localhost:8000/toys/

We have to compose and send an HTTP request with the GET HTTP verb and the http://localhost:8000/toys/{id} request URL to retrieve the toy whose ID matches the specified numeric value in {id}. For example, if we use the request URL http://localhost:8000/toys/25, the server will retrieve the toy whose ID matches 25. As a result of the request, the server will retrieve a toy with the specified ID from the database and create the appropriate toy object in Python. If a toy is found, the server will serialize the toy object into JSON, return a 200 OK status code, and return a JSON body with the serialized toy object. If no toy matches the specified ID, the server will return only a 404 Not Found status:

GET http://localhost:8000/toys/{id}

We have to compose and send an HTTP request with the PUT HTTP verb and the request URL http://localhost:8000/toys/{id} to retrieve the toy whose ID matches the value in {id} and replace it with a toy created with the provided data. In addition, we have to provide the JSON key-value pairs with the field names and the values to create the new toy that will replace the existing one. As a result of the request, the server will validate the provided values for the fields, make sure that it is a valid toy, and replace the one that matches the specified ID with the new one in the database. The ID for the toy will be the same after the update operation. The server will update the existing row in the appropriate table and it will return a 200 OK status code and a JSON body with the recently updated toy serialized to JSON. If we don't provide all the necessary data for the new toy, the server will return a 400 Bad Request status code. If the server doesn't find a toy with the specified ID, the server will only return a 404 Not Found status:

PUT http://localhost:8000/toys/{id}

We have to compose and send an HTTP request with the DELETE HTTP verb and the request URL http://localhost:8000/toys/{id} to remove the toy whose ID matches the specified numeric value in {id}. For example, if we use the request URL http://localhost:8000/toys/34, the server will delete the toy whose ID matches 34. As a result of the request, the server will retrieve a toy with the specified ID from the database and create the appropriate toy object in Python. If a toy is found, the server will request that the ORM delete the toy row associated with this toy object and return a 204 No Content status code. If no toy matches the specified ID, the server will return only a 404 Not Found status:

DELETE http://localhost:8000/toys/{id}
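Taken together, the request/response contract above maps to client calls like the following sketch. It is not part of the book's code; it assumes the finished first version of the service is running at http://localhost:8000 with the requests package installed, the field values are illustrative, and pk refers to the primary-key field exposed by the serializer shown later in this article.

```python
# Sketch: exercising the toys collection and a single toy resource,
# following the verb/scope table above.
import requests

COLLECTION = 'http://localhost:8000/toys/'   # trailing slash = collection

new_toy = {
    'name': 'Action figure',
    'description': 'A sample toy',
    'toy_category': 'Action figures',
    'release_date': '2017-10-09T12:11:25.090335Z',
    'was_included_in_home': False,
}

created = requests.post(COLLECTION, json=new_toy)        # POST -> 201 Created
toy_id = created.json()['pk']                            # ID assigned by the database

print(requests.get(COLLECTION).json())                   # all toys, ordered by name
print(requests.get(f'{COLLECTION}{toy_id}').json())      # single toy -> 200 OK
requests.put(f'{COLLECTION}{toy_id}', json=new_toy)      # full update -> 200 OK
requests.delete(f'{COLLECTION}{toy_id}')                 # delete -> 204 No Content
```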
Creating the first Django model

Now, we will create a simple Toy model in Django, which we will use to represent and persist toys. Open the toys/models.py file. The following lines show the initial code for this file, with just one import statement and a comment that indicates we should create the models:

from django.db import models

# Create your models here.

The following lines show the new code that creates a Toy class, specifically, a Toy model, in the toys/models.py file. The code file for the sample is included in the hillar_django_restful_02_01 folder in the restful01/toys/models.py file:

from django.db import models


class Toy(models.Model):
    created = models.DateTimeField(auto_now_add=True)
    name = models.CharField(max_length=150, blank=False, default='')
    description = models.CharField(max_length=250, blank=True, default='')
    toy_category = models.CharField(max_length=200, blank=False, default='')
    release_date = models.DateTimeField()
    was_included_in_home = models.BooleanField(default=False)

    class Meta:
        ordering = ('name',)

The Toy class is a subclass of the django.db.models.Model class and defines the following attributes: created, name, description, toy_category, release_date, and was_included_in_home. Each of these attributes represents a database column or field. We specified the field types, maximum lengths, and defaults for many attributes. The class declares a Meta inner class that declares an ordering attribute and sets its value to a tuple of strings whose first value is the 'name' string. This way, the inner class indicates to Django that, by default, we want the results ordered by the name attribute in ascending order.

Running the initial migration

Now, it is necessary to create the initial migration for the new Toy model we recently coded. We will also synchronize the SQLite database for the first time. By default, Django uses the popular self-contained and embedded SQLite database, and therefore we don't need to make changes in the initial ORM configuration. In this example, we will be working with this default configuration. Of course, we will upgrade to another database after we have a sample web service built with Django. We will only use SQLite for this example.

We just need to run the following Python script in the virtual environment that we activated in the previous chapter. Make sure you are in the restful01 folder within the main folder for the virtual environment when you run the following command:

python manage.py makemigrations toys

The following lines show the output generated after running the previous command:

Migrations for 'toys':
  toys/migrations/0001_initial.py:
    - Create model Toy

The output indicates that the restful01/toys/migrations/0001_initial.py file includes the code to create the Toy model. The following lines show the code for this file that was automatically generated by Django.
The code file for the sample is included in the hillar_django_restful_02_01 folder in the restful01/toys/migrations/0001_initial.py file:

# -*- coding: utf-8 -*-
# Generated by Django 1.11.5 on 2017-10-08 05:19
from __future__ import unicode_literals

from django.db import migrations, models


class Migration(migrations.Migration):

    initial = True

    dependencies = [
    ]

    operations = [
        migrations.CreateModel(
            name='Toy',
            fields=[
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('created', models.DateTimeField(auto_now_add=True)),
                ('name', models.CharField(default='', max_length=150)),
                ('description', models.CharField(blank=True, default='', max_length=250)),
                ('toy_category', models.CharField(default='', max_length=200)),
                ('release_date', models.DateTimeField()),
                ('was_included_in_home', models.BooleanField(default=False)),
            ],
            options={
                'ordering': ('name',),
            },
        ),
    ]

Understanding migrations

The automatically generated code defines a subclass of the django.db.migrations.Migration class named Migration, which defines an operation that creates the Toy model's table and includes it in the operations attribute. The call to the migrations.CreateModel method specifies the model's name, the fields, and the options to instruct the ORM to create a table that will allow the underlying database to persist the model. The fields argument is a list of tuples that includes information about the field name, the field type, and additional attributes based on the data we provided in our model, that is, in the Toy class.

Now, run the following Python script to apply all the generated migrations. Make sure you are in the restful01 folder within the main folder for the virtual environment when you run the following command:

python manage.py migrate

The following lines show the output generated after running the previous command:

Operations to perform:
  Apply all migrations: admin, auth, contenttypes, sessions, toys
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying auth.0001_initial... OK
  Applying admin.0001_initial... OK
  Applying admin.0002_logentry_remove_auto_add... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying auth.0002_alter_permission_name_max_length... OK
  Applying auth.0003_alter_user_email_max_length... OK
  Applying auth.0004_alter_user_username_opts... OK
  Applying auth.0005_alter_user_last_login_null... OK
  Applying auth.0006_require_contenttypes_0002... OK
  Applying auth.0007_alter_validators_add_error_messages... OK
  Applying auth.0008_alter_user_username_max_length... OK
  Applying sessions.0001_initial... OK
  Applying toys.0001_initial... OK

After we run the previous command, we will notice that the root folder for our restful01 project now has a db.sqlite3 file that contains the SQLite database. We can use the SQLite command line or any other application that allows us to easily check the contents of the SQLite database to check the tables that Django generated.

The first migration will generate many tables required by Django and its installed apps before running the code that creates the table for the Toy model. These tables provide support for user authentication, permissions, groups, logs, and migration management. We will work with the models related to these tables after we add more features and security to our web services. After the migration process creates all these Django tables in the underlying database, the first migration runs the Python code that creates the table required to persist our model. Thus, the last line of the running migrations section displays Applying toys.0001_initial.
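At this point, a quick way to confirm that the Toy model and its table work as expected is to create a test instance from python manage.py shell. The following is a sketch rather than part of the book's code, and the field values are illustrative only.

```python
# Sketch: sanity-checking the Toy model from `python manage.py shell`
# after the migration has been applied.
from django.utils import timezone
from toys.models import Toy

toy = Toy.objects.create(
    name='Action figure',
    description='A sample toy used to verify the table',
    toy_category='Action figures',
    release_date=timezone.now(),
    was_included_in_home=False,
)
print(toy.pk, toy.created)   # auto-generated ID and creation timestamp
print(Toy.objects.count())   # 1
```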
Analyzing the database

In most modern Linux distributions and macOS, SQLite is already installed, and therefore you can run the sqlite3 command-line utility. In Windows, if you want to work with the sqlite3.exe command-line utility, you have to download the bundle of command-line tools for managing SQLite database files from the downloads section of the SQLite webpage at http://www.sqlite.org/download.html. For example, the ZIP file that includes the command-line tools for version 3.20.1 is sqlite-tools-win32-x86-3200100.zip. The name of the file changes with the SQLite version. You just need to make sure that you download the bundle of command-line tools and not the ZIP file that provides the SQLite DLLs. After you unzip the file, you can include the folder that contains the command-line tools in the PATH environment variable, or you can access the sqlite3.exe command-line utility by specifying the full path to it.

Run the following command to list the generated tables. The first argument, db.sqlite3, specifies the file that contains the SQLite database and the second argument indicates the command that we want the sqlite3 command-line utility to run against the specified database:

sqlite3 db.sqlite3 ".tables"

The following lines show the output for the previous command with the list of tables that Django generated in the SQLite database:

auth_group                  django_admin_log
auth_group_permissions      django_content_type
auth_permission             django_migrations
auth_user                   django_session
auth_user_groups            toys_toy
auth_user_user_permissions

The following command will allow you to check the contents of the toys_toy table after we compose and send HTTP requests to the RESTful Web Service and the web service makes CRUD operations to the toys_toy table:

sqlite3 db.sqlite3 "SELECT * FROM toys_toy ORDER BY name;"

Instead of working with the SQLite command-line utility, you can use a GUI tool to check the contents of the SQLite database. DB Browser for SQLite is a useful, free, multiplatform GUI tool that allows us to easily check the database contents of an SQLite database in Linux, macOS, and Windows. You can read more information about this tool and download its different versions from http://sqlitebrowser.org. Once you have installed the tool, you just need to open the db.sqlite3 file and you can check the database structure and browse the data for the different tables. After we start working with the first version of our web service, you need to check the contents of the toys_toy table with this tool.
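If you prefer to stay in Python rather than install extra tools, the standard library's sqlite3 module can run the same checks. This is a small sketch, not from the book; run it from the restful01 folder that contains db.sqlite3.

```python
# Sketch: inspecting the generated tables with Python's built-in sqlite3
# module instead of the sqlite3 command-line utility.
import sqlite3

connection = sqlite3.connect('db.sqlite3')
cursor = connection.cursor()

# Equivalent of the ".tables" command: list the tables in the database.
cursor.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;")
print([row[0] for row in cursor.fetchall()])

# Equivalent of the SELECT shown above, once the web service has added toys.
cursor.execute("SELECT * FROM toys_toy ORDER BY name;")
print(cursor.fetchall())

connection.close()
```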
The SQLite database engine and the database file name are specified in the restful01/settings.py Python file. The following lines show the declaration of the DATABASES dictionary, which contains the settings for all the databases that Django uses. The nested dictionary maps the database named default with the django.db.backends.sqlite3 database engine and the db.sqlite3 database file located in the BASE_DIR folder (restful01):

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
    }
}

After we execute the migrations, the SQLite database will have the following tables. Django uses prefixes to identify the modules and applications that each table belongs to. The tables that start with the auth_ prefix belong to the Django authentication module. The table that starts with the toys_ prefix belongs to our toys application. If we add more models to our toys application, Django will create new tables with the toys_ prefix:

auth_group: Stores authentication groups
auth_group_permissions: Stores permissions for authentication groups
auth_permission: Stores permissions for authentication
auth_user: Stores authentication users
auth_user_groups: Stores authentication user groups
auth_user_groups_permissions: Stores permissions for authentication user groups
django_admin_log: Stores the Django administrator log
django_content_type: Stores Django content types
django_migrations: Stores the scripts generated by Django migrations and the date and time at which they were applied
django_session: Stores Django sessions
toys_toy: Persists the Toy model
sqlite_sequence: Stores sequences for SQLite primary keys with autoincrement fields

Understanding the table generated by Django

The toys_toy table persists in the database the Toy class we recently created, specifically, the Toy model. Django's integrated ORM generated the toys_toy table based on our Toy model. Run the following command to retrieve the SQL used to create the toys_toy table:

sqlite3 db.sqlite3 ".schema toys_toy"

The following lines show the output for the previous command together with the SQL that the migrations process executed to create the toys_toy table that persists the Toy model. The next lines are formatted to make it easier to understand the SQL code. Notice that the output from the command is formatted in a different way:

CREATE TABLE IF NOT EXISTS "toys_toy" (
    "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
    "created" datetime NOT NULL,
    "name" varchar(150) NOT NULL,
    "description" varchar(250) NOT NULL,
    "toy_category" varchar(200) NOT NULL,
    "release_date" datetime NOT NULL,
    "was_included_in_home" bool NOT NULL
);

The toys_toy table has the following columns (also known as fields) with their SQLite types, all of them not nullable:

id: The integer primary key, an autoincrement field
created: datetime
name: varchar(150)
description: varchar(250)
toy_category: varchar(200)
release_date: datetime
was_included_in_home: bool

Controlling serialization and deserialization

Our RESTful Web Service has to be able to serialize and deserialize the Toy instances into JSON representations. In Django REST framework, we just need to create a serializer class for the Toy instances to manage serialization to JSON and deserialization from JSON. Now, we will dive deep into the serialization and deserialization process in Django REST framework. It is very important to understand how it works because it is one of the most important components for all the RESTful Web Services we will build.

Django REST framework uses a two-phase process for serialization. The serializers are mediators between the model instances and Python primitives. Parsers and renderers act as mediators between Python primitives and HTTP requests and responses. We will configure our mediator between the Toy model instances and Python primitives by creating a subclass of the rest_framework.serializers.Serializer class to declare the fields and the necessary methods to manage serialization and deserialization.

We will repeat some of the information about the fields that we have included in the Toy model so that we understand all the things that we can configure in a subclass of the Serializer class. However, we will work with shortcuts, which will reduce boilerplate code, later in the following examples.
We will write less code in the following examples by using the ModelSerializer class. Now, go to the restful01/toys folder and create a new Python code file named serializers.py. The following lines show the code that declares the new ToySerializer class. The code file for the sample is included in the hillar_django_restful_02_01 folder, in the restful01/toys/serializers.py file:

from rest_framework import serializers
from toys.models import Toy


class ToySerializer(serializers.Serializer):
    pk = serializers.IntegerField(read_only=True)
    name = serializers.CharField(max_length=150)
    description = serializers.CharField(max_length=250)
    release_date = serializers.DateTimeField()
    toy_category = serializers.CharField(max_length=200)
    was_included_in_home = serializers.BooleanField(required=False)

    def create(self, validated_data):
        return Toy.objects.create(**validated_data)

    def update(self, instance, validated_data):
        instance.name = validated_data.get('name', instance.name)
        instance.description = validated_data.get('description', instance.description)
        instance.release_date = validated_data.get('release_date', instance.release_date)
        instance.toy_category = validated_data.get('toy_category', instance.toy_category)
        instance.was_included_in_home = validated_data.get('was_included_in_home', instance.was_included_in_home)
        instance.save()
        return instance

The ToySerializer class declares the attributes that represent the fields that we want to be serialized. Notice that we have omitted the created attribute that was present in the Toy model. When there is a call to the save method that ToySerializer inherits from the serializers.Serializer superclass, the overridden create and update methods define how to create a new instance or update an existing instance. In fact, these methods must be implemented in our class because they only raise a NotImplementedError exception in their base declaration in the serializers.Serializer superclass.

The create method receives the validated data in the validated_data argument. The code creates and returns a new Toy instance based on the received validated data. The update method receives an existing Toy instance that is being updated and the new validated data in the instance and validated_data arguments. The code updates the values of the attributes of the instance with the updated attribute values retrieved from the validated data. Finally, the code calls the save method on the updated Toy instance and returns the updated and saved instance.

We designed a RESTful Web Service to interact with a simple SQLite database and perform CRUD operations with toys. We defined the requirements for our web service and understood the tasks performed by each HTTP method and different scope.

You read an excerpt from the book Django RESTful Web Services, written by Gaston C. Hillar. This book helps developers build complex RESTful APIs from scratch with Django and the Django REST Framework. The code bundle for the article is hosted on GitHub.
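Returning to the ToySerializer defined above, the following sketch shows one way to exercise it from python manage.py shell, serializing a Toy to JSON and parsing JSON back into validated data. It is not part of the book's code bundle, and the field values are illustrative only.

```python
# Sketch: a serialization/deserialization round trip with ToySerializer,
# using DRF's JSONRenderer and JSONParser.
from io import BytesIO

from django.utils import timezone
from rest_framework.renderers import JSONRenderer
from rest_framework.parsers import JSONParser

from toys.models import Toy
from toys.serializers import ToySerializer

toy = Toy.objects.create(
    name='Action figure',
    description='Sample toy',
    toy_category='Action figures',
    release_date=timezone.now(),
)

# Model instance -> Python primitives -> JSON bytes
serializer = ToySerializer(toy)
json_bytes = JSONRenderer().render(serializer.data)
print(json_bytes)

# JSON bytes -> Python primitives -> validated data -> new Toy instance
parsed = JSONParser().parse(BytesIO(json_bytes))
deserialized = ToySerializer(data=parsed)
if deserialized.is_valid():
    new_toy = deserialized.save()   # calls the create method defined above
    print(new_toy.pk)
```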

How to perform full-text search (FTS) in PostgreSQL

Sugandha Lahoti
27 Mar 2018
8 min read
[box type="note" align="" class="" width=""]This article is an excerpt from the book, Mastering  PostgreSQL 10, written by Hans-Jürgen Schönig. This book provides expert techniques on PostgreSQL 10 development and administration.[/box] If you are looking up names or for simple strings, you are usually querying the entire content of a field. In Full-Text-Search (FTS), this is different. The purpose of the full-text search is to look for words or groups of words, which can be found in a text. Therefore, FTS is more of a contains operation as you are basically never looking for an exact string. In this article, we will show how to perform a full-text search operation in PostgreSQL. In PostgreSQL, FTS can be done using GIN indexes. The idea is to dissect a text, extract valuable lexemes (= "preprocessed tokens of words"), and index those elements rather than the underlying text. To make your search even more successful, those words are preprocessed. Here is an example: test=# SELECT to_tsvector('english', 'A car, I want a car. I would not even mind having many cars'); to_tsvector --------------------------------------------------------------- 'car':2,6,14 'even':10 'mani':13 'mind':11 'want':4 'would':8 (1 row) The example shows a simple sentence. The to_tsvector function will take the string, apply English rules, and perform a stemming process. Based on the configuration (english), PostgreSQL will parse the string, throw away stop words, and stem individual words. For example, car and cars will be transformed to the car. Note that this is not about finding the word stem. In the case of many, PostgreSQL will simply transform the string to mani by applying standard rules working nicely with the English language. Note that the output of the to_tsvector function is highly language dependent. If you tell PostgreSQL to treat the string as dutch, the result will be totally different: test=# SELECT to_tsvector('dutch', 'A car, I want a car. I would not even mind having many cars'); to_tsvector ----------------------------------------------------------------- 'a':1,5 'car':2,6,14 'even':10 'having':12 'i':3,7 'many':13 'mind':11 'not':9 'would':8 (1 row) To figure out which configurations are supported, consider running the following query: SELECT cfgname FROM pg_ts_config; Comparing strings After taking a brief look at the stemming process, it is time to figure out how a stemmed text can be compared to a user query. The following code snippet checks for the word wanted: test=# SELECT to_tsvector('english', 'A car, I want a car. I would not even mind having many cars') @@ to_tsquery('english', 'wanted'); ?column? ---------- t (1 row) Note that wanted does not actually show up in the original text. Still, PostgreSQL will return true. The reason is that want and wanted are both transformed to the same lexeme, so the result is true. Practically, this makes a lot of sense. Imagine you are looking for a car on Google. If you find pages selling cars, this is totally fine. Finding common lexemes is, therefore, an intelligent idea. Sometimes, people are not only looking for a single word, but want to find a set of words. With to_tsquery, this is possible, as shown in the next example: test=# SELECT to_tsvector('english', 'A car, I want a car. I would not even mind having many cars') @@ to_tsquery('english', 'wanted & bmw'); ?column? ---------- f (1 row) In this case, false is returned because bmw cannot be found in our input string. In the to_tsquery function, & means and and | means or. 
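As a quick illustration of combining these operators on the same sample sentence (the result shown below is what you should expect to see, not output copied from the excerpt):

test=# SELECT to_tsvector('english', 'A car, I want a car. I would not even mind having many cars')
           @@ to_tsquery('english', 'wanted | bmw');
 ?column?
----------
 t
(1 row)

Because | means or, the match on the want lexeme is enough, even though bmw is still missing from the text.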
It is therefore easily possible to build complex search strings. Defining GIN indexes If you want to apply text search to a column or a group of columns, there are basically two choices: Create a functional index using GIN Add a column containing ready-to-use tsvectors and a trigger to keep them in sync In this section, both options will be outlined. To show how things work, I have created some sample data: test=# CREATE TABLE t_fts AS SELECT comment FROM pg_available_extensions; SELECT 43 Indexing the column directly with a functional index is definitely a slower but more space efficient way to get things done: test=# CREATE INDEX idx_fts_func ON t_fts USING gin(to_tsvector('english', comment)); CREATE INDEX Deploying an index on the function is easy, but it can lead to some overhead. Adding a materialized column needs more space, but will lead to a better runtime behavior: test=# ALTER TABLE t_fts ADD COLUMN ts tsvector; ALTER TABLE The only trouble is, how do you keep this column in sync? The answer is by using a trigger: test=# CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE ON t_fts FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(somename, 'pg_catalog.english', 'comment'); Fortunately, PostgreSQL already provides a C function that can be used by a trigger to sync the tsvector column. Just pass a name, the desired language, as well as a couple of columns to the function, and you are already done. The trigger function will take care of all that is needed. Note that a trigger will always operate within the same transaction as the statement making the modification. Therefore, there is no risk of being inconsistent. Debugging your search Sometimes, it is not quite clear why a query matches a given search string. To debug your query, PostgreSQL offers the ts_debug function. From a user's point of view, it can be used just like to_tsvector. It reveals a lot about the inner workings of the FTS infrastructure: test=# x Expanded display is on. test=# SELECT * FROM ts_debug('english', 'go to www.postgresql-support.de'); -[ RECORD 1 ]+---------------------------- alias  | asciiword description | Word, all ASCII token      | go dictionaries | {english_stem} dictionary           | english_stem lexemes    | {go} -[ RECORD 2 ]+---------------------------- alias  | blank Description | Space symbols token   |         dictionaries | {}         dictionary     |         lexemes       | -[ RECORD 3 ]+---------------------------- alias  | asciiword description | Word, all ASCII token      | to dictionaries | {english_stem} dictionary   | english_stem lexemes    | {} -[ RECORD 4 ]+---------------------------- alias  | blank description | Space symbols token | dictionaries | {} dictionary   | lexemes          | -[ RECORD 5 ]+---------------------------- alias  | host description | Host token      | www.postgresql-support.de dictionaries | {simple} dictionary | simple lexemes    | {www.postgresql-support.de} ts_debug will list every token found and display information about the token. You will see which token the parser found, the dictionary used, as well as the type of object. In my example, blanks, words, and hosts have been found. You might also see numbers, email addresses, and a lot more. Depending on the type of string, PostgreSQL will handle things differently. For example, it makes absolutely no sense to stem hostnames and e-mail addresses. Gathering word statistics Full-text search can handle a lot of data. 
To give end users more insights into their texts, PostgreSQL offers the ts_stat function, which returns a list of words:

SELECT * FROM ts_stat('SELECT to_tsvector(''english'', comment)
                       FROM pg_available_extensions')
ORDER BY 2 DESC
LIMIT 3;

   word   | ndoc | nentry
----------+------+--------
 function |   10 |     10
 data     |   10 |     10
 type     |    7 |      7
(3 rows)

The word column contains the stemmed word, ndoc tells us the number of documents a certain word occurs in, and nentry indicates how often a word was found altogether.

Taking advantage of exclusion operators

So far, indexes have been used to speed things up and to ensure uniqueness. However, a couple of years ago, somebody came up with the idea of using indexes for even more. As you have seen in this chapter, GiST supports operations such as intersects, overlaps, contains, and a lot more. So, why not use those operations to manage data integrity? Here is an example:

test=# CREATE EXTENSION btree_gist;
test=# CREATE TABLE t_reservation (
    room int,
    from_to tsrange,
    EXCLUDE USING GiST (room with =, from_to with &&)
);
CREATE TABLE

The EXCLUDE USING GiST clause defines additional constraints. If you are selling rooms, you might want to allow different rooms to be booked at the same time. However, you don't want to sell the same room twice during the same period. What the EXCLUDE clause says in my example is this: if a room is booked twice at the same time, an error should pop up (the values in from_to must not overlap (&&) if they relate to the same room). The following two rows will not violate constraints:

test=# INSERT INTO t_reservation VALUES (10, '["2017-01-01", "2017-03-03"]');
INSERT 0 1
test=# INSERT INTO t_reservation VALUES (13, '["2017-01-01", "2017-03-03"]');
INSERT 0 1

However, the next INSERT will cause a violation because the data overlaps:

test=# INSERT INTO t_reservation VALUES (13, '["2017-02-02", "2017-08-14"]');
ERROR: conflicting key value violates exclusion constraint "t_reservation_room_from_to_excl"
DETAIL: Key (room, from_to)=(13, ["2017-02-02 00:00:00","2017-08-14 00:00:00"]) conflicts with existing key (room, from_to)=(13, ["2017-01-01 00:00:00","2017-03-03 00:00:00"]).

The use of exclusion operators is very useful and can provide you with highly advanced means to handle integrity. To summarize, we learnt how to perform full-text search operations in PostgreSQL. If you liked our article, check out the book Mastering PostgreSQL 10 to understand how to perform operations such as indexing, query optimization, concurrent transactions, table partitioning, server tuning, and more.

Creating 2D and 3D plots using Matplotlib

Pravin Dhandre
22 Mar 2018
10 min read
[box type="note" align="" class="" width=""]This article is an excerpt from a book written by L. Felipe Martins, Ruben Oliva Ramos and V Kishore Ayyadevara titled SciPy Recipes. This book provides data science recipes for users to effectively process, manipulate, and visualize massive datasets using SciPy.[/box] In today’s tutorial, we will demonstrate how to create two-dimensional and three-dimensional plots for displaying graphical representation of data using a full-fledged scientific library -  Matplotlib. Creating two-dimensional plots of functions and data We will present the basic kind of plot generated by Matplotlib: a two-dimensional display, with axes, where datasets and functional relationships are represented by lines. Besides the data being displayed, a good graph will contain a title (caption), axes labels, and, perhaps, a legend identifying each line in the plot. Getting ready Start Jupyter and run the following commands in an execution cell: %matplotlib inline import numpy as np import matplotlib.pyplot as plt How to do it… Run the following code in a single Jupyter cell: xvalues = np.linspace(-np.pi, np.pi) yvalues1 = np.sin(xvalues) yvalues2 = np.cos(xvalues) plt.plot(xvalues, yvalues1, lw=2, color='red', label='sin(x)') plt.plot(xvalues, yvalues2, lw=2, color='blue', label='cos(x)') plt.title('Trigonometric Functions') plt.xlabel('x') plt.ylabel('sin(x), cos(x)') plt.axhline(0, lw=0.5, color='black') plt.axvline(0, lw=0.5, color='black') plt.legend() None This code will insert the plot shown in the following screenshot into the Jupyter Notebook: How it works… We start by generating the data to be plotted, with the three following statements: xvalues = np.linspace(-np.pi, np.pi, 300) yvalues1 = np.sin(xvalues) yvalues2 = np.cos(xvalues) We first create an xvalues array, containing 300 equally spaced values between -π and π. We then compute the sine and cosine functions of the values in xvalues, storing the results in the yvalues1 and yvalues2 arrays. Next, we generate the first line plot with the following statement: plt.plot(xvalues, yvalues1, lw=2, color='red', label='sin(x)') The arguments to the plot() function are described as follows: xvalues and yvalues1 are arrays containing, respectively, the x and y coordinates of the points to be plotted. These arrays must have the same length. The remaining arguments are formatting options. lw specifies the line width and color the line color. The label argument is used by the legend() function, discussed as follows. The next line of code generates the second line plot and is similar to the one explained previously. After the line plots are defined, we set the title for the plot and the legends for the axes with the following commands: plt.title('Trigonometric Functions') plt.xlabel('x') plt.ylabel('sin(x), cos(x)') We now generate axis lines with the following statements: plt.axhline(0, lw=0.5, color='black') plt.axvline(0, lw=0.5, color='black') The first arguments in axhline() and axvline() are the locations of the axis lines and the options specify the line width and color. We then add a legend for the plot with the following statement: plt.legend() Matplotlib tries to place the legend intelligently, so that it does not interfere with the plot. In the legend, one item is being generated by each call to the plot() function and the text for each legend is specified in the label option of the plot() function. 
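If you are working outside the Jupyter Notebook, the same recipe can be packaged as a small standalone script. The following is a sketch rather than code from the book; the only real changes are dropping the %matplotlib inline magic and calling plt.show() (or plt.savefig()) at the end:

import numpy as np
import matplotlib.pyplot as plt

# Same data as in the recipe: 300 points between -pi and pi
xvalues = np.linspace(-np.pi, np.pi, 300)
yvalues1 = np.sin(xvalues)
yvalues2 = np.cos(xvalues)

plt.plot(xvalues, yvalues1, lw=2, color='red', label='sin(x)')
plt.plot(xvalues, yvalues2, lw=2, color='blue', label='cos(x)')
plt.title('Trigonometric Functions')
plt.xlabel('x')
plt.ylabel('sin(x), cos(x)')
plt.axhline(0, lw=0.5, color='black')
plt.axvline(0, lw=0.5, color='black')
plt.legend()

# Opens an interactive window; use plt.savefig('trig.png') to write a file instead
plt.show()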
Generating multiple plots in a single figure Wouldn't it be interesting to know how to generate multiple plots in a single figure? Well, let's get started with that. Getting ready Start Jupyter and run the following three commands in an execution cell: %matplotlib inline import numpy as np import matplotlib.pyplot as plt How to do it… Run the following commands in a Jupyter cell: plt.figure(figsize=(6,6)) xvalues = np.linspace(-2, 2, 100) plt.subplot(2, 2, 1) yvalues = xvalues plt.plot(xvalues, yvalues, color='blue') plt.xlabel('$x$') plt.ylabel('$x$') plt.subplot(2, 2, 2) yvalues = xvalues ** 2 plt.plot(xvalues, yvalues, color='green') plt.xlabel('$x$') plt.ylabel('$x^2$') plt.subplot(2, 2, 3) yvalues = xvalues ** 3 plt.plot(xvalues, yvalues, color='red') plt.xlabel('$x$') plt.ylabel('$x^3$') plt.subplot(2, 2, 4) yvalues = xvalues ** 4 plt.plot(xvalues, yvalues, color='black') plt.xlabel('$x$') plt.ylabel('$x^3$') plt.suptitle('Polynomial Functions') plt.tight_layout() plt.subplots_adjust(top=0.90) None Running this code will produce results like those in the following screenshot: How it works… To start the plotting constructions, we use the figure() function, as shown in the following line of code: plt.figure(figsize=(6,6)) The main purpose of this call is to set the figure size, which needs adjustment, since we plan to make several plots in the same figure. After creating the figure, we add four plots with code, as demonstrated in the following segment: plt.subplot(2, 2, 3) yvalues = xvalues ** 3 plt.plot(xvalues, yvalues, color='red') plt.xlabel('$x$') plt.ylabel('$x^3$') In the first line, the plt.subplot(2, 2, 3) call tells pyplot that we want to organize the plots in a two-by-two layout, that is, in two rows and two columns. The last argument specifies that all following plotting commands should apply to the third plot in the array. Individual plots are numbered, starting with the value 1 and counting across the rows and columns of the plot layout. We then generate the line plot with the following statements: yvalues = xvalues ** 3 plt.plot(xvalues, yvalues, color='red') The first line of the preceding code computes the yvalues array, and the second draws the corresponding graph. Notice that we must set options such as line color individually for each subplot. After the line is plotted, we use the xlabel() and ylabel() functions to create labels for the axes. Notice that these have to be set up for each individual subplot too. After creating the subplots, we explain the subplots: plt.suptitle('Polynomial Functions') sets a common title for all Subplots plt.tight_layout() adjusts the area taken by each subplot, so that axes' legends do not overlap plt.subplots_adjust(top=0.90) adjusts the overall area taken by the plots, so that the title displays correctly Creating three-dimensional plots Matplotlib offers several different ways to visualize three-dimensional data. 
In this recipe, we will demonstrate the following methods: Drawing surfaces plots Drawing two-dimensional contour plots Using color maps and color bars Getting ready Start Jupyter and run the following three commands in an execution cell: %matplotlib inline import numpy as np import matplotlib.pyplot as plt How to do it… Run the following code in a Jupyter code cell: from mpl_toolkits.mplot3d import Axes3D from matplotlib import cm f = lambda x,y: x**3 - 3*x*y**2 fig = plt.figure(figsize=(12,6)) ax = fig.add_subplot(1,2,1,projection='3d') xvalues = np.linspace(-2,2,100) yvalues = np.linspace(-2,2,100) xgrid, ygrid = np.meshgrid(xvalues, yvalues) zvalues = f(xgrid, ygrid) surf = ax.plot_surface(xgrid, ygrid, zvalues, rstride=5, cstride=5, linewidth=0, cmap=cm.plasma) ax = fig.add_subplot(1,2,2) plt.contourf(xgrid, ygrid, zvalues, 30, cmap=cm.plasma) fig.colorbar(surf, aspect=18) plt.tight_layout() None Running this code will produce a plot of the monkey saddle surface, which is a famous example of a surface with a non-standard critical point. The displayed graph is shown in the following screenshot: How it works… We start by importing the Axes3D class from the mpl_toolkits.mplot3d library, which is the Matplotlib object used for creating three-dimensional plots. We also import the cm class, which represents a color map. We then define a function to be plotted, with the following line of code: f = lambda x,y: x**3 - 3*x*y**2 The next step is to define the Figure object and an Axes object with a 3D projection, as done in the following lines of code: fig = plt.figure(figsize=(12,6)) ax = fig.add_subplot(1,2,1,projection='3d') Notice that the approach used here is somewhat different than the other recipes in this chapter. We are assigning the output of the figure() function call to the fig variable and then adding the subplot by calling the add_subplot() method from the fig object. This is the recommended method of creating a three-dimensional plot in the most recent version of Matplotlib. Even in the case of a single plot, the add_subplot() method should be used, in which case the command would be ax = fig.add_subplot(1,1,1,projection='3d'). The next few lines of code, shown as follows, compute the data for the plot: xvalues = np.linspace(-2,2,100) yvalues = np.linspace(-2,2,100) xgrid, ygrid = np.meshgrid(xvalues, yvalues) zvalues = f(xgrid, ygrid) The most important feature of this code is the call to meshgrid(). This is a NumPy convenience function that constructs grids suitable for three-dimensional surface plots. To understand how this function works, run the following code: xvec = np.arange(0, 4) yvec = np.arange(0, 3) xgrid, ygrid = np.meshgrid(xvec, yvec) After running this code, the xgrid array will contain the following values: array([[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]) The ygrid array will contain the following values: array([[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]]) Notice that the two arrays have the same dimensions. Each grid point is represented by a pair of the (xgrid[i,j],ygrid[i,j]) type. This convention makes the computation of a vectorized function on a grid easy and efficient, with the f(xgrid, ygrid) expression. The next step is to generate the surface plot, which is done with the following function call: surf = ax.plot_surface(xgrid, ygrid, zvalues, rstride=5, cstride=5, linewidth=0, cmap=cm.plasma) The first three arguments, xgrid, ygrid, and zvalues, specify the data to be plotted. 
We then use the rstride and cstride options to select a subset of the grid points. Notice that the xvalues and yvalues arrays both have length 100, so that xgrid and ygrid will have 10,000 entries each. Using all grid points would be inefficient and produce a poor plot from the visualization point of view. Thus, we set rstride=5 and cstride=5, which results in a plot containing every fifth point across each row and column of the grid. The next option, linewidth=0, sets the line width of the plot to zero, preventing the display of a wireframe. The final argument, cmap=cm.plasma, specifies the color map for the plot. We use the cm.plasma color map, which has the effect of plotting higher functional values with a hotter color. Matplotlib offer as large number of built-in color maps, listed at https:/​/​matplotlib.​org/​examples/​color/​colormaps_​reference.​html.​ Next, we add the filled contour plot with the following code: ax = fig.add_subplot(1,2,2) ax.contourf(xgrid, ygrid, zvalues, 30, cmap=cm.plasma) Notice that, when selecting the subplot, we do not specify the projection option, which is not necessary for two-dimensional plots. The contour plot is generated with the contourf() method. The first three arguments, xgrid, ygrid, zvalues, specify the data points, and the fourth argument, 30, sets the number of contours. Finally, we set the color map to be the same one used for the surface plot. The final component of the plot is a color bar, which provides a visual representation of the value associated with each color in the plot, with the fig.colorbar(surf, aspect=18) method call. Notice that we have to specify in the first argument which plot the color bar is associated to. The aspect=18 option is used to adjust the aspect ratio of the bar. Larger values will result in a narrower bar. To finish the plot, we call the tight_layout() function. This adjusts the sizes of each plot, so that axis labels are displayed correctly.   We generated 2D and 3D plots using Matplotlib and represented the results of technical computation in graphical manner. If you want to explore other types of plots such as scatter plot or bar chart, you may read Visualizing 3D plots in Matplotlib 2.0. Do check out the book SciPy Recipes to take advantage of other libraries of the SciPy stack and perform matrices, data wrangling and advanced computations with ease.

How to secure data in Salesforce Einstein Analytics

Amey Varangaonkar
22 Mar 2018
5 min read
[box type="note" align="" class="" width=""]The following excerpt is taken from the book Learning Einstein Analytics written by Santosh Chitalkar. This book includes techniques to build effective dashboards and Business Intelligence metrics to gain useful insights from data.[/box]

Before getting into security in Einstein Analytics, it is important to set up your organization and define user types so that the platform is ready to use. In this article we explore key aspects of security in Einstein Analytics. The following are key points to consider for data security in Salesforce:

- Salesforce admins can restrict access to data by setting up field-level security and object-level security in Salesforce. These settings prevent dataflows from loading sensitive Salesforce data into a dataset.
- Dataset owners can restrict data access by using row-level security.
- Analytics supports security predicates, a robust row-level security feature that enables you to model many different types of access control on datasets.
- Analytics also supports sharing inheritance.

Take a look at the following diagram:

Salesforce data security

In Einstein Analytics, dataflows bring the data to the Analytics Cloud from Salesforce. It is important that Einstein Analytics has all the necessary permissions and access to objects as well as fields. If an object or a field is not accessible to Einstein, the dataflow fails and it cannot extract data from Salesforce. So we need to make sure that the required access is given to the integration user and the security user. We can configure the permission set for these users. Let's configure permissions for an integration user by performing the following steps:

1. Switch to classic mode and enter Profiles in the Quick Find / Search… box.
2. Select and clone the Analytics Cloud Integration User profile and the Analytics Cloud Security User profile for the integration user and security user respectively.
3. Save the cloned profiles and then edit them.
4. Set the permission to Read for all objects and fields.
5. Save the profile and assign it to users.

Take a look at the following diagram:

Data pulled from Salesforce can be made secure from both sides: Salesforce as well as Einstein Analytics. It is important to understand that Salesforce and Einstein Analytics are two independent databases, so a user security setting given to Einstein will not affect the data in Salesforce. There are the following ways to secure data pulled from Salesforce:

Salesforce Security | Einstein Analytics Security
Roles and profiles | Inheritance security
Organization-Wide Defaults (OWD) and record ownership | Security predicates
Sharing rules | Application-level security

Sharing mechanism in Einstein

All Analytics users start off with Viewer access to the default Shared App that's available out-of-the-box; administrators can change this default setting to restrict or extend access. All other applications created by individual users are private, by default; the application owner and administrators have Manager access and can extend access to other users, groups, or roles. The following diagram shows how the sharing mechanism works in Einstein Analytics.

Here's a summary of what users can do with Viewer, Editor, and Manager access:

- View dashboards, lenses, and datasets in the application (if the underlying dataset is in a different application than a lens or dashboard, the user must have access to both applications to view the lens or dashboard): Viewer, Editor, Manager
- See who has access to the application: Viewer, Editor, Manager
- Save contents of the application to another application that the user has Editor or Manager access to: Viewer, Editor, Manager
- Save changes to existing dashboards, lenses, and datasets in the application (saving dashboards requires the appropriate permission set license and permission): Editor, Manager
- Change the application's sharing settings: Manager
- Rename the application: Manager
- Delete the application: Manager

Confidentiality, integrity, and availability together are referred to as the CIA Triad, and it is designed to help organizations decide what security policies to implement within the organization. Salesforce knows that keeping information private and restricting access by unauthorized users is essential for business. By sharing the application, we can share a lens, dashboard, and dataset all together with one click. To share the entire application, do the following:

1. Go to your Einstein Analytics and then to Analytics Studio.
2. Click on the APPS tab and then the icon for the application that you want to share, as shown in the following screenshot.
3. Click on Share and it will open a new popup window, as shown in the following screenshot. Using this window, you can share the application with an individual user, a group of users, or a particular role. You can define the access level as Viewer, Editor, or Manager.
4. After selecting User, click on the user you wish to add and click on Add.
5. Save and then close the popup.

And that's it. It's done.

Mass-sharing the application

Sometimes, we are required to share the application with a wide audience. There are multiple approaches to mass-sharing the Wave application, such as by role or by username:

1. In the Salesforce classic UI, navigate to Setup | Public Groups | New.
2. For example, to share a sales application, label a public group as Analytics_Sales_Group.
3. Search and add users to the group by Role, Roles and Subordinates, or by Users (username).
4. Search for the Analytics_Sales public group.
5. Add the Viewer option as shown in the following screenshot.
6. Click on Save.

Protecting data from breaches, theft, or from any unauthorized user is very important, and we saw that Einstein Analytics provides the necessary tools to ensure the data is secure. If you found this excerpt useful and want to know more about securing your analytics in Einstein, make sure to check out this book Learning Einstein Analytics.

How to embed Einstein dashboards on Salesforce Classic

Amey Varangaonkar
21 Mar 2018
5 min read
[box type="note" align="" class="" width=""]The following excerpt is taken from the book Learning Einstein Analytics written by Santosh Chitalkar. This book highlights the key techniques and know-how to unlock critical insights from your data using Salesforce Einstein Analytics.[/box] With Einstein Analytics, users have the power to embed their dashboards on various third-party applications and even on their web applications. In this article, we will show how to embed an Einstein dashboard on Salesforce Classic. In order to start embedding the dashboard, let's create a sample dashboard by performing the following steps: Navigate to Analytics Studio | Create | Dashboard. Add three chart widgets on the dashboard. Click on the Chart button in the middle and select the Opportunity dataset. Select Measures as Sum of Amount and select BillingCountry under Group by. Click on Done. Repeat the second step for the second widget, but select Account Source under Group by and make it a donut chart. Repeat the second step for the third widget but select Stage under Group by and make it a funnel chart. Click on Save (s) and enter Embedding Opportunities in the title field, as shown in the following screenshot: Now that we have created a dashboard, let's embed this dashboard in Salesforce Classic. In order to start embedding the dashboard, exit from the Einstein Analytics platform and go to Classic mode. The user can embed the dashboard on the record detail page layout in Salesforce Classic. The user can view the dashboard, drill in, and apply a filter, just like in the Einstein Analytics window. Let's add the dashboard to the account detail page by performing the following steps: Navigate to Setup | Customize | Accounts | Page Layouts as shown in the following screenshot: Click on Edit of Account Layout and it will open a page layout editor which has two parts: a palette on the upper portion of the screen, and the page layout on the lower portion of the screen. The palette contains the user interface elements that you can add to your page layout, such as Fields, Buttons, Links, and Actions, and Related Lists, as shown in the following screenshot: Click on the Wave Analytics Assets option from the palette and you can see all the dashboards on the right-side panel. Drag and drop a section onto the page layout, name it Einstein Dashboard, and click on OK. Drag and drop the dashboard which you wish to add to the record detail page. We are going to add Embedded Opportunities. Click on Save. Go to any accounting record and you should see a new section within the dashboard: Users can easily configure the embedded dashboards by using attributes. To access the dashboard properties, go to edit page layout again, and go to the section where we added the dashboard to the layout. Hover over the dashboard and click on the Tool icon. It will open an Asset Properties window: The Asset Properties window gives the user the option to change the following features: Width (in pixels or %): This feature allows you to adjust the width of the dashboard section. Height (in pixels): This feature allows you to adjust the height of the dashboard section. Show Title: This feature allows you to display or hide the title of the dashboard. Show Sharing Icon: Using this feature, by default, the share icon is disabled. The Show Sharing Icon option gives the user a flexibility to include the share icon on the dashboard. Show Header: This feature allows you to display or hide the header. 
Hide on error: This feature gives you control over whether the Analytics asset appears if there is an error.
Field mapping: Last but not least, field mapping is used to filter the relevant data to the record on the dashboard. To set up the dashboard to show only the data that's relevant to the record being viewed, use field mapping. Field mapping links data fields in the dashboard to the object's fields. We are using the Embedded Opportunity dashboard; let's add field mapping to it. The following is the format for field mapping:

{
  "datasets": {
    "datasetName": [{
      "fields": ["Actual Field name from object"],
      "filter": {"operator": "matches", "values": ["$dataset Fieldname"]}
    }]
  }
}

Let's add field mapping for account by using the following format:

{
  "datasets": {
    "Account": [{
      "fields": ["Name"],
      "filter": {"operator": "matches", "values": ["$Name"]}
    }]
  }
}

If your dashboard uses multiple datasets, then you can use the following format:

{
  "datasets": {
    "datasetName1": [{
      "fields": ["Actual Field name from object"],
      "filter": {"operator": "matches", "values": ["$dataset1 Fieldname"]}
    }],
    "datasetName2": [{
      "fields": ["Actual Field name from object"],
      "filter": {"operator": "matches", "values": ["$dataset2 Fieldname"]}
    }]
  }
}

Let's add field mapping for account and opportunities:

{
  "datasets": {
    "Opportunities": [{
      "fields": ["Account.Name"],
      "filter": {"operator": "matches", "values": ["$Name"]}
    }],
    "Account": [{
      "fields": ["Name"],
      "filter": {"operator": "matches", "values": ["$Name"]}
    }]
  }
}

Now that we have added field mapping, save the page layout and go to the actual record. Observe that the dashboard is now getting filtered per record, as shown in the following screenshot. To summarize, we saw it's fairly easy to embed your custom dashboards in Salesforce. Similarly, you can do so on other platforms such as Lightning, Visualforce pages, and even on your own websites and web applications. If you are keen to learn more, you may check out the book Learning Einstein Analytics.

How to write high quality code in Python: 15+ tips for data scientists and researchers

Aarthi Kumaraswamy
21 Mar 2018
5 min read
Writing code is easy. Writing high quality code is much harder. Quality is to be understood both in terms of actual code (variable names, comments, docstrings, and so on) and architecture (functions, modules, and classes). In general, coming up with a well-designed code architecture is much more challenging than the implementation itself. In this post, we will give a few tips about how to write high quality code. This is a particularly important topic in academia, as more and more scientists without prior experience in software development need to code. High quality code writing first principles Writing readable code means that other people (or you in a few months or years) will understand it quicker and will be more willing to use it. It also facilitates bug tracking. Modular code is also easier to understand and to reuse. Implementing your program's functionality in independent functions that are organized as a hierarchy of packages and modules is an excellent way of achieving high code quality. It is easier to keep your code loosely coupled when you use functions instead of classes. Spaghetti code is really hard to understand, debug, and reuse. Iterate between bottom-up and top-down approaches while working on a new project. Starting with a bottom-up approach lets you gain experience with the code before you start thinking about the overall architecture of your program. Still, make sure you know where you're going by thinking about how your components will work together. How these high quality code writing first principles translate in Python? Take the time to learn the Python language seriously. Review the list of all modules in the standard library—you may discover that functions you implemented already exist. Learn to write Pythonic code, and do not translate programming idioms from other languages such as Java or C++ to Python. Learn common design patterns; these are general reusable solutions to commonly occurring problems in software engineering. Use assertions throughout your code (the assert keyword) to prevent future bugs (defensive programming). Start writing your code with a bottom-up approach; write independent Python functions that implement focused tasks. Do not hesitate to refactor your code regularly. If your code is becoming too complicated, think about how you can simplify it. Avoid classes when you can. If you can use a function instead of a class, choose the function. A class is only useful when you need to store persistent state between function calls. Make your functions as pure as possible (no side effects). In general, prefer Python native types (lists, tuples, dictionaries, and types from Python's collections module) over custom types (classes). Native types lead to more efficient, readable, and portable code. Choose keyword arguments over positional arguments in your functions. Argument names are easier to remember than argument ordering. They make your functions self-documenting. Name your variables carefully. Names of functions and methods should start with a verb. A variable name should describe what it is. A function name should describe what it does. The importance of naming things well cannot be overstated. Every function should have a docstring describing its purpose, arguments, and return values, as shown in the following example. You can also look at the conventions chosen in popular libraries such as NumPy. The exact convention does not matter, the point is to be consistent within your code. 
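The docstring example referred to above is not included in this excerpt; a minimal sketch in the NumPy/reST spirit might look like the following (the function itself is only an illustration):

import numpy as np


def moving_average(values, window):
    """Compute the simple moving average of a one-dimensional array.

    Parameters
    ----------
    values : array-like
        The input samples.
    window : int
        The number of consecutive samples to average over.

    Returns
    -------
    numpy.ndarray
        The averaged values, of length len(values) - window + 1.
    """
    assert window >= 1, "window must be a positive integer"
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode='valid')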
You can use a markup language such as Markdown or reST to do that. Follow (at least partly) Guido van Rossum's Style Guide for Python, also known as Python Enhancement Proposal number 8 (PEP8). It is a long read, but it will help you write well-readable Python code. It covers many little things such as spacing between operators, naming conventions, comments, and docstrings. For instance, you will learn that it is considered a good practice to limit any line of your code to 79 or 99 characters. This way, your code can be correctly displayed in most situations (such as in a command-line interface or on a mobile device) or side by side with another file. Alternatively, you can decide to ignore certain rules. In general, following common guidelines is beneficial on projects involving many developers. You can check your code automatically against most of the style conventions in PEP8 with the pycodestyle Python package. You can also automatically make your code PEP8-compatible with the autopep8 package. Use a tool for static code analysis such as flake8 or Pylint. It lets you find potential errors or low-quality code statically, that is, without running your code. Use blank lines to avoid cluttering your code (see PEP8). You can also demarcate sections in a long Python module with salient comments. A Python module should not contain more than a few hundreds lines of code. Having too many lines of code in a module may be a sign that you need to split it into several modules. Organize important projects (with tens of modules) into subpackages (subdirectories). Take a look at how major Python projects are organized. For example, the code of IPython is well-organized into a hierarchy of subpackages with focused roles. Reading the code itself is also quite instructive. Learn best practices to create and distribute a new Python package. Make sure that you know setuptools, pip, wheels, virtualenv, PyPI, and so on. Also, you are highly encouraged to take a serious look at conda, a powerful and generic packaging system created by Anaconda. Packaging has long been a rapidly evolving topic in Python, so read only the most recent references. You enjoyed an excerpt from Cyrille Rossant’s latest book, IPython Cookbook, Second Edition. This book contains 100+ recipes for high-performance scientific computing and data analysis, from the latest IPython/Jupyter features to the most advanced tricks, to help you write better and faster code. For free recipes from the book, head over to the Ipython Cookbook Github page. If you loved what you saw, support Cyrille’s work by buying a copy of the book today!

The Cambridge Analytica scandal and ethics in data science

Richard Gall
20 Mar 2018
5 min read
Earlier this month, Stack Overflow published the results of its 2018 developer survey. In it, there was an interesting set of questions around the concept of 'ethical code'. The main takeaway was ultimately that the area remains a gray area. The Cambridge Analytica scandal, however, has given the issue of 'ethical code' a renewed urgency in the last couple of days. The data analytics company are alleged to have not only been involved in votes in the UK and US, but also of harvesting copious amounts of data from Facebook (illegally). For whistleblower Christopher Wylie, the issue of ethical code is particularly pronounced. “I created Steve Bannon’s psychological mindfuck tool” he told Carole Cadwalladr in an interview in the Guardian. Cambridge Analytica: psyops or just market research? Wylie is a data scientist whose experience over the last half a decade or so has been impressive. It’s worth noting however, that Wylie’s career didn’t begin in politics. His academic career was focused primarily on fashion forecasting. That might all seem a little prosaic, but it underlines the fact that data science never happens in a vacuum. Data scientists always operate within a given field. It might be tempting to view the world purely through the prism of impersonal data and cold statistics. To a certain extent you have to if you’re a data scientist or a statistician. But at the very least this can be unhelpful; at worst a potential threat to global democracy. At one point in the interview Wylie remarks that: ...it’s normal for a market research company to amass data on domestic populations. And if you’re working in some country and there’s an auxiliary benefit to a current client with aligned interests, well that’s just a bonus. This is potentially the most frightening thing. Cambridge Analytica’s ostensible role in elections and referenda isn’t actually that remarkable. For all the vested interests and meetings between investors, researchers and entrepreneurs, the scandal is really just the extension of data mining and marketing tactics employed by just about every organization with a digital presence on the planet. Data scientists are always going to be in a difficult position. True, we're not all going to end up working alongside Steve Bannon. But your skills are always being deployed with a very specific end in mind. It’s not always easy to see the effects and impact of your work until later, but it’s still essential for data scientists and analysts to be aware of whose data is being collected and used, how it’s being used and why. Who is responsible for the ethics around data and code? There was another interesting question in the Stack Overflow survey that's relevant to all of this. The survey asked respondents who was ultimately most responsible for code that accomplishes something unethical. 57.5% claimed upper management were responsible, 22.8% said the person who came up with the idea, and 19.7% said it was the responsibility of the developer themselves. Clearly the question is complex. The truth lies somewhere between all three. Management make decisions about what’s required from an organizational perspective, but the engineers themselves are, of course, a part of the wider organizational dynamic. They should be in a position where they are able to communicate any personal misgivings or broader legal issues with the work they are being asked to do. The case of Wylie and Cambridge Analytica is unique, however. 
But it does highlight that data science can be deployed in ways that are difficult to predict. And without proper channels of escalation and the right degree of transparency it's easy for things to remain secretive, hidden in small meetings, email threads and paper trails. That's another thing that data scientists need to remember. Office politics might be a fact of life, but when you're a data scientist you're sitting on the apex of legal, strategic and political issues. To refuse to be aware of this would be naive. What the Cambridge Analytica story can teach data scientists But there's something else worth noting. This story also illustrates something more about the world in which data scientists are operating. This is a world where traditional infrastructure is being dismantled. This is a world where privatization and outsourcing is viewed as the route towards efficiency and 'value for money'. Whether you think that’s a good or bad thing isn’t really the point here. What’s important is that it makes the way we use data, even the code we write more problematic than ever because it’s not always easy to see how it’s being used. Arguably Wylie was naive. His curiosity and desire to apply his data science skills to intriguing and complex problems led him towards people who knew just how valuable he could be. Wylie has evidently developed greater self-awareness. This is perhaps the main reason why he has come forward with his version of events. But as this saga unfolds it’s worth remembering the value of data scientists in the modern world - for a range of organizations. It’s made the concept of the 'citizen data scientist' take on an even more urgent and literal meaning. Yes data science can help to empower the economy and possibly even toy with democracy. But it can also be used to empower people, improve transparency in politics and business. If anything, the Cambridge Analytica saga proves that data science is a dangerous field - not only the sexiest job of the twenty-first century, but one of the most influential in shaping the kind of world we're going to live in. That's frightening, but it's also pretty exciting.

Getting started with Python Web Scraping

Amarabha Banerjee
20 Mar 2018
13 min read
[box type="note" align="" class="" width=""]Our article is an excerpt from the book Web Scraping with Python, written by Richard Lawson. This book contains step by step tutorials on how to leverage Python programming techniques for ethical web scraping. [/box] The amount of data available on the web is consistently growing both in quantity and in form. Businesses require this data to make decisions, particularly with the explosive growth of machine learning tools which require large amounts of data for training. Much of this data is available via Application Programming Interfaces, but at the same time a lot of valuable data is still only available through the process of web scraping. Python is the choice of programing language for many who build systems to perform scraping. It is an easy to use programming language with a rich ecosystem of tools for other tasks. In this article, we will focus on the fundamentals of setting up a scraping environment and perform basic requests for data with several tools of trade. Setting up a Python development environment If you have not used Python before, it is important to have a working development  environment. The recipes in this book will be all in Python and be a mix of interactive examples, but primarily implemented as scripts to be interpreted by the Python interpreter. This recipe will show you how to set up an isolated development environment with virtualenv and manage project dependencies with pip . We also get the code for the book and install it into the Python virtual environment. Getting ready We will exclusively be using Python 3.x, and specifically in my case 3.6.1. While Mac and Linux normally have Python version 2 installed, and Windows systems do not. So it is likely that in any case that Python 3 will need to be installed. You can find references for Python installers at www.python.org. You can check Python's version with python --version pip comes installed with Python 3.x, so we will omit instructions on its installation. Additionally, all command line examples in this book are run on a Mac. For Linux users the commands should be identical. On Windows, there are alternate commands (like dir instead of ls), but these alternatives will not be covered. How to do it We will be installing a number of packages with pip. These packages are installed into a Python environment. There often can be version conflicts with other packages, so a good practice for following along with the recipes in the book will be to create a new virtual Python environment where the packages we will use will be ensured to work properly. Virtual Python environments are managed with the virtualenv tool. This can be installed with the following command: ~ $ pip install virtualenv Collecting virtualenv Using cached virtualenv-15.1.0-py2.py3-none-any.whl Installing collected packages: virtualenv Successfully installed virtualenv-15.1.0 Now we can use virtualenv. But before that let's briefly look at pip. This command installs Python packages from PyPI, a package repository with literally 10's of thousands of packages. We just saw using the install subcommand to pip, which ensures a package is installed. We can also see all currently installed packages with pip list: ~ $ pip list alabaster (0.7.9) amqp (1.4.9) anaconda-client (1.6.0) anaconda-navigator (1.5.3) anaconda-project (0.4.1) aniso8601 (1.3.0) Packages can also be uninstalled using pip uninstall followed by the package name. I'll leave it to you to give it a try. Now back to virtualenv. 
Using virtualenv is very simple. Let's use it to create an environment and install the code from github. Let's walk through the steps: Create a directory to represent the project and enter the directory. ~ $ mkdir pywscb ~ $ cd pywscb Initialize a virtual environment folder named env: pywscb $ virtualenv env Using base prefix '/Users/michaelheydt/anaconda' New python executable in /Users/michaelheydt/pywscb/env/bin/python copying /Users/michaelheydt/anaconda/bin/python => /Users/michaelheydt/pywscb/env/bin/python copying /Users/michaelheydt/anaconda/bin/../lib/libpython3.6m.dylib => /Users/michaelheydt/pywscb/env/lib/libpython3. 6m.dylib Installing setuptools, pip, wheel...done. This creates an env folder. Let's take a look at what was installed. pywscb $ ls -la env total 8 drwxr-xr-x 6 michaelheydt staff 204 Jan 18 15:38 . drwxr-xr-x 3 michaelheydt staff 102 Jan 18 15:35 .. drwxr-xr-x 16 michaelheydt staff 544 Jan 18 15:38 bin drwxr-xr-x 3 michaelheydt staff 102 Jan 18 15:35 include drwxr-xr-x 4 michaelheydt staff 136 Jan 18 15:38 lib -rw-r--r-- 1 michaelheydt staff 60 Jan 18 15:38 pipselfcheck. json New we activate the virtual environment. This command uses the content in the env folder to configure Python. After this all python activities are relative to this virtual environment. pywscb $ source env/bin/activate (env) pywscb $ We can check that python is indeed using this virtual environment with the following command: (env) pywscb $ which python /Users/michaelheydt/pywscb/env/bin/python With our virtual environment created, let's clone the books sample code and take a look at its structure. (env) pywscb $ git clone https://github.com/PacktBooks/PythonWebScrapingCookbook.git Cloning into 'PythonWebScrapingCookbook'... remote: Counting objects: 420, done. remote: Compressing objects: 100% (316/316), done. remote: Total 420 (delta 164), reused 344 (delta 88), pack-reused 0 Receiving objects: 100% (420/420), 1.15 MiB | 250.00 KiB/s, done. Resolving deltas: 100% (164/164), done. Checking connectivity... done. This created a PythonWebScrapingCookbook directory. (env) pywscb $ ls -l total 0 drwxr-xr-x 9 michaelheydt staff 306 Jan 18 16:21 PythonWebScrapingCookbook drwxr-xr-x 6 michaelheydt staff 204 Jan 18 15:38 env Let's change into it and examine the content. (env) PythonWebScrapingCookbook $ ls -l total 0 drwxr-xr-x 15 michaelheydt staff 510 Jan 18 16:21 py drwxr-xr-x 14 michaelheydt staff 476 Jan 18 16:21 www There are two directories. Most the the Python code is is the py directory. www contains some web content that we will use from time-to-time using a local web server. Let's look at the contents of the py directory: (env) py $ ls -l total 0 drwxr-xr-x 9 michaelheydt staff 306 Jan 18 16:21 01 drwxr-xr-x 25 michaelheydt staff 850 Jan 18 16:21 03 drwxr-xr-x 21 michaelheydt staff 714 Jan 18 16:21 04 drwxr-xr-x 10 michaelheydt staff 340 Jan 18 16:21 05 drwxr-xr-x 14 michaelheydt staff 476 Jan 18 16:21 06 drwxr-xr-x 25 michaelheydt staff 850 Jan 18 16:21 07 drwxr-xr-x 14 michaelheydt staff 476 Jan 18 16:21 08 drwxr-xr-x 7 michaelheydt staff 238 Jan 18 16:21 09 drwxr-xr-x 7 michaelheydt staff 238 Jan 18 16:21 10 drwxr-xr-x 9 michaelheydt staff 306 Jan 18 16:21 11 drwxr-xr-x 8 michaelheydt staff 272 Jan 18 16:21 modules Code for each chapter is in the numbered folder matching the chapter (there is no code for chapter 2 as it is all interactive Python). Note that there is a modules folder. Some of the recipes throughout the book use code in those modules. 
Make sure that your Python path points to this folder. On Mac and Linux you can sets this in your .bash_profile file (and environments variables dialog on Windows): Export PYTHONPATH="/users/michaelheydt/dropbox/packt/books/pywebscrcookbook/code/py/modules" export PYTHONPATH The contents in each folder generally follows a numbering scheme matching the sequence of the recipe in the chapter. The following is the contents of the chapter 6 folder: (env) py $ ls -la 06 total 96 drwxr-xr-x 14 michaelheydt staff 476 Jan 18 16:21 . drwxr-xr-x 14 michaelheydt staff 476 Jan 18 16:26 .. -rw-r--r-- 1 michaelheydt staff 902 Jan 18 16:21 01_scrapy_retry.py -rw-r--r-- 1 michaelheydt staff 656 Jan 18 16:21 02_scrapy_redirects.py -rw-r--r-- 1 michaelheydt staff 1129 Jan 18 16:21 03_scrapy_pagination.py -rw-r--r-- 1 michaelheydt staff 488 Jan 18 16:21 04_press_and_wait.py -rw-r--r-- 1 michaelheydt staff 580 Jan 18 16:21 05_allowed_domains.py -rw-r--r-- 1 michaelheydt staff 826 Jan 18 16:21 06_scrapy_continuous.py -rw-r--r-- 1 michaelheydt staff 704 Jan 18 16:21 07_scrape_continuous_twitter.py -rw-r--r-- 1 michaelheydt staff 1409 Jan 18 16:21 08_limit_depth.py -rw-r--r-- 1 michaelheydt staff 526 Jan 18 16:21 09_limit_length.py -rw-r--r-- 1 michaelheydt staff 1537 Jan 18 16:21 10_forms_auth.py -rw-r--r-- 1 michaelheydt staff 597 Jan 18 16:21 11_file_cache.py -rw-r--r-- 1 michaelheydt staff 1279 Jan 18 16:21 12_parse_differently_based_on_rules.py In the recipes I'll state that we'll be using the script in <chapter directory>/<recipe filename>. Now just the be complete, if you want to get out of the Python virtual environment, you can exit using the following command: (env) py $ deactivate py $ And checking which python we can see it has switched back: py $ which python /Users/michaelheydt/anaconda/bin/python Scraping Python.org with Requests and Beautiful Soup In this recipe we will install Requests and Beautiful Soup and scrape some content from www.python.org. We'll install both of the libraries and get some basic familiarity with them. We'll come back to them both in subsequent chapters and dive deeper into each. Getting ready In this recipe, we will scrape the upcoming Python events from https:/ / www. python. org/events/ pythonevents. The following is an an example of The Python.org Events Page (it changes frequently, so your experience will differ): We will need to ensure that Requests and Beautiful Soup are installed. We can do that with the following: pywscb $ pip install requests Downloading/unpacking requests Downloading requests-2.18.4-py2.py3-none-any.whl (88kB): 88kB downloaded Downloading/unpacking certifi>=2017.4.17 (from requests) Downloading certifi-2018.1.18-py2.py3-none-any.whl (151kB): 151kB downloaded Downloading/unpacking idna>=2.5,<2.7 (from requests) Downloading idna-2.6-py2.py3-none-any.whl (56kB): 56kB downloaded Downloading/unpacking chardet>=3.0.2,<3.1.0 (from requests) Downloading chardet-3.0.4-py2.py3-none-any.whl (133kB): 133kB downloaded Downloading/unpacking urllib3>=1.21.1,<1.23 (from requests) Downloading urllib3-1.22-py2.py3-none-any.whl (132kB): 132kB downloaded Installing collected packages: requests, certifi, idna, chardet, urllib3 Successfully installed requests certifi idna chardet urllib3 Cleaning up... pywscb $ pip install bs4 Downloading/unpacking bs4 Downloading bs4-0.0.1.tar.gz Running setup.py (path:/Users/michaelheydt/pywscb/env/build/bs4/setup.py) egg_info for package bs4 How to do it Now let's go and learn to scrape a couple events. 
For this recipe, we will start by using interactive Python. Start it with the ipython command: $ ipython Python 3.6.1 |Anaconda custom (x86_64)| (default, Mar 22 2017, 19:25:17) Type "copyright", "credits" or "license" for more information. IPython 5.1.0 -- An enhanced Interactive Python. ? -> Introduction and overview of IPython's features. %quickref -> Quick reference. help -> Python's own help system. object? -> Details about 'object', use 'object??' for extra details. In [1]: Next, we import Requests: In [1]: import requests We now use Requests to make a GET HTTP request for the following URL: https://www.python.org/events/python-events/: In [2]: url = 'https://www.python.org/events/python-events/' In [3]: req = requests.get(url) That downloaded the page content, which is stored in our response object req. We can retrieve the content using the .text property. This prints the first 200 characters: req.text[:200] Out[4]: '<!doctype html>\n<!--[if lt IE 7]> <html class="no-js ie6 lt-ie7 lt-ie8 lt-ie9"> <![endif]-->\n<!--[if IE 7]> <html class="no-js ie7 lt-ie8 lt-ie9"> <![endif]-->\n<!--[if IE 8]> <h' We now have the raw HTML of the page. We can now use Beautiful Soup to parse the HTML and retrieve the event data. First, import Beautiful Soup: In [5]: from bs4 import BeautifulSoup Now we create a BeautifulSoup object and pass it the HTML. In [6]: soup = BeautifulSoup(req.text, 'lxml') Now we tell Beautiful Soup to find the main <ul> tag for the recent events, and then to get all the <li> tags below it. In [7]: events = soup.find('ul', {'class': 'list-recent-events'}).findAll('li') And finally, we can loop through each of the <li> elements, extract the event details, and print each to the console:

In [13]: for event in events:
    ...:     event_details = dict()
    ...:     event_details['name'] = event.find('h3').find("a").text
    ...:     event_details['location'] = event.find('span', {'class': 'event-location'}).text
    ...:     event_details['time'] = event.find('time').text
    ...:     print(event_details)
    ...:

{'name': 'PyCascades 2018', 'location': 'Granville Island Stage, 1585 Johnston St, Vancouver, BC V6H 3R9, Canada', 'time': '22 Jan. – 24 Jan. 2018'} {'name': 'PyCon Cameroon 2018', 'location': 'Limbe, Cameroon', 'time': '24 Jan. – 29 Jan. 2018'} {'name': 'FOSDEM 2018', 'location': 'ULB Campus du Solbosch, Av. F. Roosevelt 50, 1050 Bruxelles, Belgium', 'time': '03 Feb. – 05 Feb. 2018'} {'name': 'PyCon Pune 2018', 'location': 'Pune, India', 'time': '08 Feb. – 12 Feb. 2018'} {'name': 'PyCon Colombia 2018', 'location': 'Medellin, Colombia', 'time': '09 Feb. – 12 Feb. 2018'} {'name': 'PyTennessee 2018', 'location': 'Nashville, TN, USA', 'time': '10 Feb. – 12 Feb. 2018'} This entire example is available in the 01/01_events_with_requests.py script file. The following is its content; it pulls together, step by step, all of what we just did:

import requests
from bs4 import BeautifulSoup

def get_upcoming_events(url):
    req = requests.get(url)
    soup = BeautifulSoup(req.text, 'lxml')
    events = soup.find('ul', {'class': 'list-recent-events'}).findAll('li')
    for event in events:
        event_details = dict()
        event_details['name'] = event.find('h3').find("a").text
        event_details['location'] = event.find('span', {'class': 'event-location'}).text
        event_details['time'] = event.find('time').text
        print(event_details)

get_upcoming_events('https://www.python.org/events/python-events/')

You can run this using the following command from the terminal: $ python 01_events_with_requests.py {'name': 'PyCascades 2018', 'location': 'Granville Island Stage, 1585 Johnston St, Vancouver, BC V6H 3R9, Canada', 'time': '22 Jan. – 24 Jan. 2018'} {'name': 'PyCon Cameroon 2018', 'location': 'Limbe, Cameroon', 'time': '24 Jan. – 29 Jan. 2018'} {'name': 'FOSDEM 2018', 'location': 'ULB Campus du Solbosch, Av. F. D. Roosevelt 50, 1050 Bruxelles, Belgium', 'time': '03 Feb. – 05 Feb. 2018'} {'name': 'PyCon Pune 2018', 'location': 'Pune, India', 'time': '08 Feb. – 12 Feb. 2018'} {'name': 'PyCon Colombia 2018', 'location': 'Medellin, Colombia', 'time': '09 Feb. – 12 Feb. 2018'} {'name': 'PyTennessee 2018', 'location': 'Nashville, TN, USA', 'time': '10 Feb. – 12 Feb. 2018'} How it works We will dive into details of both Requests and Beautiful Soup in the next chapter, but for now let's just summarize a few key points about how this works. The following are important points about Requests: Requests is used to execute HTTP requests. We used it to make a GET verb request of the URL for the events page. The response object returned by Requests holds the results of the request. This is not only the page content, but also many other items about the result, such as HTTP status codes and headers. Requests is used only to get the page; it does not do any parsing. We use Beautiful Soup to do the parsing of the HTML and also the finding of content within the HTML. To understand how this worked, note that the Upcoming Events section of the page is marked up as a <ul> element with the class list-recent-events, containing one <li> per event. We used the power of Beautiful Soup to: Find the <ul> element representing the section, which is found by looking for a <ul> whose class attribute has a value of list-recent-events. From that object, we find all the <li> elements. Each of these <li> tags represents a different event. We iterate over each of those, making a dictionary from the event data found in child HTML tags: The name is extracted from the <a> tag that is a child of the <h3> tag. The location is the text content of the <span> with a class of event-location. And the time is extracted from the text of the <time> tag. To summarize, we saw how to set up a Python environment for effective data scraping from the web and also explored ways to use Beautiful Soup to perform preliminary data scraping for ethical purposes. If you liked this post, be sure to check out Web Scraping with Python, which consists of useful recipes for scraping and processing data from the web with Python.
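As an optional refinement of the recipe above (not from the book), the following is a minimal sketch of a slightly more defensive version of the script: it returns the events as a list instead of printing them and guards against missing tags, so a layout change on python.org degrades gracefully rather than raising an AttributeError. The class names and URL are the ones used in the recipe; the timeout value is an arbitrary choice, and the 'lxml' parser requires the lxml package (swap in 'html.parser' if it is not installed).

import requests
from bs4 import BeautifulSoup

def get_upcoming_events(url):
    """Return a list of event dicts scraped from python.org's events page."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()                  # fail loudly on HTTP errors
    soup = BeautifulSoup(response.text, 'lxml')  # 'html.parser' also works

    events = []
    event_list = soup.find('ul', {'class': 'list-recent-events'})
    if event_list is None:                       # layout changed or section missing
        return events

    for li in event_list.findAll('li'):
        name_tag = li.find('h3')
        location_tag = li.find('span', {'class': 'event-location'})
        time_tag = li.find('time')
        events.append({
            'name': name_tag.find('a').text if name_tag and name_tag.find('a') else None,
            'location': location_tag.text if location_tag else None,
            'time': time_tag.text if time_tag else None,
        })
    return events

if __name__ == '__main__':
    for event in get_upcoming_events('https://www.python.org/events/python-events/'):
        print(event)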

article-image-25-datasets-deep-learning-iot
Sugandha Lahoti
20 Mar 2018
8 min read
Save for later

25 Datasets for Deep Learning in IoT

Deep learning is one of the major players in facilitating analytics and learning in the IoT domain. A really good roundup of the state of deep learning advances for big data and IoT is described in the paper Deep Learning for IoT Big Data and Streaming Analytics: A Survey by Mehdi Mohammadi, Ala Al-Fuqaha, Sameh Sorour, and Mohsen Guizani. In this article, we have attempted to draw inspiration from this research paper to establish the importance of IoT datasets for deep learning applications. The paper also provides a handy list of commonly used datasets suitable for building deep learning applications in IoT, which we have added at the end of the article. IoT and Big Data: The relationship IoT and big data have a two-way relationship. IoT is the main producer of big data, and as such an important target for big data analytics to improve the processes and services of IoT. However, there is a difference between the two. Large-scale streaming data: IoT data is large-scale streaming data, because a large number of IoT devices generate streams of data continuously. Big data, on the other hand, lacks real-time processing. Heterogeneity: IoT data is heterogeneous, as various IoT data acquisition devices gather different information. Big data devices are generally homogeneous in nature. Time and space correlation: IoT sensor devices are also attached to a specific location, and thus have a location and time-stamp for each of the data items. Big data sensors lack time-stamp resolution. High noise data: IoT data is highly noisy, owing to the tiny pieces of data in IoT applications, which are prone to errors and noise during acquisition and transmission. Big data, in contrast, is generally less noisy. Big data, on the other hand, is classified according to the conventional 3V's: Volume, Velocity, and Variety. As such, techniques used for big data analytics are not sufficient to analyze the kind of data that is being generated by IoT devices. For instance, autonomous cars need to make fast decisions on driving actions such as lane or speed changes. These decisions should be supported by fast analytics with data streaming from multiple sources (e.g., cameras, radars, left/right signals, traffic lights, and so on). This extends the classification of IoT big data to 6V's. Volume: The quantity of data generated by IoT devices is much greater than before and clearly fits this feature. Velocity: Advanced tools and technologies for analytics are needed to efficiently handle the high rate of data production. Variety: Big data may be structured, semi-structured, or unstructured data. The data types produced by IoT include text, audio, video, sensory data, and so on. Veracity: Veracity refers to the quality, consistency, and trustworthiness of the data, which in turn leads to accurate analytics. Variability: This property refers to the different rates of data flow. Value: Value is the transformation of big data into useful information and insights that bring competitive advantage to organizations. Despite the recent advancements in DL for big data, there are still significant challenges that need to be addressed to mature this technology. Each of the six characteristics of IoT big data imposes a challenge for DL techniques. One common denominator for all of them is the lack of availability of IoT big data datasets.
IoT datasets and why they are needed Deep learning methods have been promising, with state-of-the-art results in several areas such as signal processing, natural language processing, and image recognition, and the trend is going up in IoT verticals as well. IoT datasets play a major role in improving IoT analytics: real-world IoT datasets provide more data, which in turn improves the accuracy of DL algorithms. However, the lack of availability of large real-world datasets for IoT applications is a major hurdle for incorporating DL models in IoT. The shortage of these datasets acts as a barrier to the deployment and acceptance of IoT analytics based on DL, since the empirical validation and evaluation of such systems needs to be shown to be promising in the real world. The lack of availability is mainly because: Most IoT datasets are held by large organizations that are unwilling to share them easily. Access is restricted by copyright or privacy considerations; these are more common in domains with human data, such as healthcare and education. While there is a lot of ground to be covered in terms of making datasets for IoT available, here is a list of commonly used datasets suitable for building deep learning applications in IoT.

Dataset Name | Domain | Provider | Notes | Address/Link
CGIAR dataset | Agriculture, Climate | CCAFS | High-resolution climate datasets for a variety of fields, including agriculture | http://www.ccafs-climate.org/
Educational Process Mining | Education | University of Genova | Recordings of 115 subjects' activities through a logging application while learning with an educational simulator | http://archive.ics.uci.edu/ml/datasets/Educational+Process+Mining+%28EPM%29%3A+A+Learning+Analytics+Data+Set
Commercial Building Energy Dataset | Energy, Smart Building | IIITD | Energy-related dataset from a commercial building where data is sampled more than once a minute | http://combed.github.io/
Individual household electric power consumption | Energy, Smart home | EDF R&D, Clamart, France | One-minute sampling rate over a period of almost 4 years | http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption
AMPds dataset | Energy, Smart home | S. Makonin | Electricity, water, and natural gas measurements at one-minute intervals for 2 years of monitoring | http://ampds.org/
UK Domestic Appliance-Level Electricity | Energy, Smart Home | Kelly and Knottenbelt | Power demand from five houses; in each house, both the whole-house mains power demand and the power demand of individual appliances are recorded | http://www.doc.ic.ac.uk/~dk3810/data/
PhysioBank databases | Healthcare | PhysioNet | Archive of over 80 physiological datasets | https://physionet.org/physiobank/database/
Saarbruecken Voice Database | Healthcare | Universität des Saarlandes | A collection of voice recordings from more than 2000 persons for pathological voice detection | http://www.stimmdatebank.coli.uni-saarland.de/help_en.php4
T-LESS | Industry | CMP at Czech Technical University | An RGB-D dataset and evaluation methodology for detection and 6D pose estimation of texture-less objects | http://cmp.felk.cvut.cz/t-less/
CityPulse Dataset Collection | Smart City | CityPulse EU FP7 project | Road traffic data, pollution data, weather, parking | http://iot.ee.surrey.ac.uk:8080/datasets.html
Open Data Institute - node Trento | Smart City | Telecom Italia | Weather, air quality, electricity, telecommunication | http://theodi.fbk.eu/openbigdata/
Malaga datasets | Smart City | City of Malaga | A broad range of categories such as energy, ITS, weather, industry, sport, etc. | http://datosabiertos.malaga.eu/dataset
Gas sensors for home activity monitoring | Smart home | Univ. of California San Diego | Recordings of 8 gas sensors under three conditions, including background, wine, and banana presentations | http://archive.ics.uci.edu/ml/datasets/Gas+sensors+for+home+activity+monitoring
CASAS datasets for activities of daily living | Smart home | Washington State University | Several public datasets related to Activities of Daily Living (ADL) performance in a two-story home, an apartment, and an office setting | http://ailab.wsu.edu/casas/datasets.html
ARAS Human Activity Dataset | Smart home | Bogazici University | Human activity recognition datasets collected from two real houses with multiple residents during two months | https://www.cmpe.boun.edu.tr/aras/
MERLSense Data | Smart home, building | Mitsubishi Electric Research Labs | Motion sensor data of residual traces from a network of over 200 sensors for two years, containing over 50 million records | http://www.merl.com/wmd
SportVU | Sport | Stats LLC | Video of basketball and soccer games captured from 6 cameras | http://go.stats.com/sportvu
RealDisp | Sport | O. Banos | Includes a wide range of physical activities (warm up, cool down, and fitness exercises) | http://orestibanos.com/datasets.htm
Taxi Service Trajectory | Transportation | Prediction Challenge, ECML PKDD 2015 | Trajectories performed by all 442 taxis running in the city of Porto, Portugal | http://www.geolink.pt/ecmlpkdd2015-challenge/dataset.html
GeoLife GPS Trajectories | Transportation | Microsoft | GPS trajectories represented as sequences of time-stamped points | https://www.microsoft.com/en-us/download/details.aspx?id=52367
T-Drive trajectory data | Transportation | Microsoft | Contains one-week trajectories of 10,357 taxis | https://www.microsoft.com/en-us/research/publication/t-drive-trajectory-data-sample/
Chicago Bus Traces data | Transportation | M. Doering | Bus traces from the Chicago Transport Authority for 18 days, with a rate between 20 and 40 seconds | http://www.ibr.cs.tu-bs.de/users/mdoering/bustraces/
Uber trip data | Transportation | FiveThirtyEight | About 20 million Uber pickups in New York City during 12 months | https://github.com/fivethirtyeight/uber-tlc-foil-response
Traffic Sign Recognition | Transportation | K. Lim | Three datasets: Korean daytime, Korean nighttime, and German daytime traffic signs based on Vienna traffic rules | https://figshare.com/articles/Traffic_Sign_Recognition_Testsets/4597795
DDD17 | Transportation | J. Binas | End-to-end DAVIS driving dataset | http://sensors.ini.uzh.ch/databases.html
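To give a flavour of how one of these datasets might be fed into a deep learning pipeline, the following is a minimal sketch (not part of the original survey or article) that loads the UCI Individual household electric power consumption dataset with pandas and turns one column into fixed-length windows suitable for a sequence model. The file name, the ';' separator, the '?' missing-value marker, and the Global_active_power column name are assumptions based on how the UCI archive distributes this dataset; adjust them to match whatever you actually download.

import numpy as np
import pandas as pd

# Assumes household_power_consumption.txt has been downloaded from the UCI archive.
df = pd.read_csv('household_power_consumption.txt', sep=';', na_values='?', low_memory=False)
df = df.dropna(subset=['Global_active_power'])
series = df['Global_active_power'].astype(float).to_numpy()

def make_windows(values, window=60, horizon=1):
    """Slice a 1-D series into (window, target) pairs for supervised training."""
    X, y = [], []
    for i in range(len(values) - window - horizon + 1):
        X.append(values[i:i + window])
        y.append(values[i + window + horizon - 1])
    return np.array(X), np.array(y)

X, y = make_windows(series[:10000])   # use a slice to keep the example fast
print(X.shape, y.shape)               # e.g. (9940, 60) (9940,)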
article-image-create-prepare-first-dataset-salesforce-einstein
Amey Varangaonkar
19 Mar 2018
3 min read
Save for later

How to create and prepare your first dataset in Salesforce Einstein

Note: The following extract is taken from the book Learning Einstein Analytics written by Santosh Chitalkar. This book will help you learn Salesforce Einstein Analytics, to get insights faster and understand your customers better. In this article, we see how to start your analytics journey using Salesforce Einstein by taking the first step in the process, that is, by creating and preparing your dataset. A dataset is a set of source data, specially formatted and optimized for interactive exploration. Here are the steps to create a new dataset in Salesforce Einstein: 1. Click on the Create button in the top-right corner and then click on Dataset. You can see the following three options to create datasets: CSV File, Salesforce, and Informatica Rev. 2. Select CSV File and click on Continue, as shown in the following screenshot: 3. Select the Account_data.csv file or drag and drop the file. 4. Click on Next. The next screen loads the user interface to create a single dataset by using the external .csv file: 5. Click on Next to proceed, as shown in the following screenshot: 6. Change the dataset name if you want. You can select an application to store the dataset. You can also replace the CSV file from this screen. 7. In the Data Schema File section, select the Replace File option to change the file. You can also download the uploaded .csv file from here, as shown in the following screenshot: 8. Click on Next. In the next screen, you can change field attributes such as column name, dimensions, field type, and so on. 9. Click on the Next button and it will start uploading the file to Analytics and queuing it in the dataflow. Once done, click on the Got it button. 10. Wait for 10-15 minutes (depending on the data, it may take a longer time to create the dataset). 11. Go to Analytics Studio and open the DATASETS tab. You can see the Account_data dataset as shown in the following screenshot: Congrats!!! You have created your first dataset. Let's now update this dataset with the same information but with some additional columns. Updating datasets We need to update the dataset to add new fields, change application settings, remove fields, and so on. Einstein Analytics gives users the flexibility to update the dataset. Here are the steps to update an existing dataset: 1. Create a CSV file that includes some new fields and name it Account_Data_Updated. Save the file to a location that you can easily remember. 2. In Salesforce, go to the Analytics Studio home page and find the dataset. 3. Hover over the dataset, click on the button that appears, and then click on Edit, as shown in the following screenshot: 4. Salesforce displays the dataset editing screen. Click on the Replace Data button in the top-right corner of the page: 5. Click on the Next button and upload your new CSV file using the upload UI. 6. Click on the Next button again to get to the next screen for editing, and click on Next again. 7. Click on Replace, as shown in the following screenshot: Voila! You've successfully updated your dataset. As you can see, it's fairly easy to create and then update a dataset using Einstein, without any hassle. If you found this post useful, make sure to check out our book Learning Einstein Analytics for more tips and techniques on using Einstein Analytics effectively to uncover unique insights from your data.
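The steps above use the Analytics Studio UI. As an aside that goes beyond the extract, dataset loads can also be scripted against the Analytics External Data API; the following is a rough Python sketch of that approach using the simple_salesforce package. The credentials, the dataset alias, and the choice of the Overwrite operation are placeholders, and you should check your org's Analytics documentation for the exact field values and part-size limits before relying on this.

import base64
from simple_salesforce import Salesforce

# Placeholder credentials - replace with values for your own org.
sf = Salesforce(username='user@example.com', password='password', security_token='token')

# Read the CSV and base64-encode it, as the External Data API expects.
with open('Account_data.csv', 'rb') as f:
    csv_b64 = base64.b64encode(f.read()).decode('ascii')

# 1. Create the header record that describes the upload.
header = sf.InsightsExternalData.create({
    'Format': 'Csv',
    'EdgemartAlias': 'Account_data',   # API name of the dataset
    'Operation': 'Overwrite',          # 'Append' or 'Upsert' for incremental loads
    'Action': 'None',
})

# 2. Attach the encoded CSV as a data part (large files are split into multiple parts).
sf.InsightsExternalDataPart.create({
    'InsightsExternalDataId': header['id'],
    'PartNumber': 1,
    'DataFile': csv_b64,
})

# 3. Flip the action to Process so Analytics builds the dataset asynchronously.
sf.InsightsExternalData.update(header['id'], {'Action': 'Process'})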

article-image-perform-crud-operations-on-mongodb-with-php
Amey Varangaonkar
17 Mar 2018
6 min read
Save for later

Perform CRUD operations on MongoDB with PHP

Note: This article is an excerpt from the book Mastering MongoDB 3.x authored by Alex Giamas. This book covers the key concepts, and tips & tricks needed to build fault-tolerant applications in MongoDB. It gives you the power to become a true expert when it comes to the world's most popular NoSQL database. In today's tutorial, we will cover the CRUD (Create, Read, Update, and Delete) operations using the popular PHP language with the official MongoDB driver. The examples assume that $collection already refers to a collection object obtained from a connected MongoDB client. Create and delete operations To perform the create and delete operations, run the following code: $document = array( "isbn" => "401", "name" => "MongoDB and PHP" ); $result = $collection->insertOne($document); var_dump($result); This is the output: MongoDB\InsertOneResult Object ( [writeResult:MongoDB\InsertOneResult:private] => MongoDB\Driver\WriteResult Object ( [nInserted] => 1 [nMatched] => 0 [nModified] => 0 [nRemoved] => 0 [nUpserted] => 0 [upsertedIds] => Array ( ) [writeErrors] => Array ( ) [writeConcernError] => [writeConcern] => MongoDB\Driver\WriteConcern Object ( ) ) [insertedId:MongoDB\InsertOneResult:private] => MongoDB\BSON\ObjectID Object ( [oid] => 5941ac50aabac9d16f6da142 ) [isAcknowledged:MongoDB\InsertOneResult:private] => 1 ) The rather lengthy output contains all the information that we may need. We can get the ObjectId of the inserted document; the number of inserted, matched, modified, removed, and upserted documents from the fields prefixed with n; and information about writeError or writeConcernError. There are also convenience methods in the $result object if we want to get the information: $result->getInsertedCount(): To get the number of inserted objects $result->getInsertedId(): To get the ObjectId of the inserted document We can also use the ->insertMany() method to insert many documents at once, like this: $documentAlpha = array( "isbn" => "402", "name" => "MongoDB and PHP, 2nd Edition" ); $documentBeta = array( "isbn" => "403", "name" => "MongoDB and PHP, revisited" ); $result = $collection->insertMany([$documentAlpha, $documentBeta]); print_r($result); The result is: ( [writeResult:MongoDB\InsertManyResult:private] => MongoDB\Driver\WriteResult Object ( [nInserted] => 2 [nMatched] => 0 [nModified] => 0 [nRemoved] => 0 [nUpserted] => 0 [upsertedIds] => Array ( ) [writeErrors] => Array ( ) [writeConcernError] => [writeConcern] => MongoDB\Driver\WriteConcern Object ( ) ) [insertedIds:MongoDB\InsertManyResult:private] => Array ( [0] => MongoDB\BSON\ObjectID Object ( [oid] => 5941ae85aabac9d1d16c63a2 ) [1] => MongoDB\BSON\ObjectID Object ( [oid] => 5941ae85aabac9d1d16c63a3 ) ) [isAcknowledged:MongoDB\InsertManyResult:private] => 1 ) Again, $result->getInsertedCount() will return 2, whereas $result->getInsertedIds() will return an array with the two newly created ObjectIds: array(2) { [0]=> object(MongoDB\BSON\ObjectID)#13 (1) { ["oid"]=> string(24) "5941ae85aabac9d1d16c63a2" } [1]=> object(MongoDB\BSON\ObjectID)#14 (1) { ["oid"]=> string(24) "5941ae85aabac9d1d16c63a3" } } Deleting documents is similar to inserting, but with the deleteOne() and deleteMany() methods; an example of deleteMany() is shown here: $deleteQuery = array( "isbn" => "401"); $deleteResult = $collection->deleteMany($deleteQuery); print_r($deleteResult); print($deleteResult->getDeletedCount()); Here is the output: MongoDB\DeleteResult Object ( [writeResult:MongoDB\DeleteResult:private] => MongoDB\Driver\WriteResult Object ( [nInserted] => 0 [nMatched] => 0 [nModified] => 0 [nRemoved] => 2 [nUpserted] => 0 [upsertedIds] => Array ( )
[writeErrors] => Array ( ) [writeConcernError] => [writeConcern] => MongoDB\Driver\WriteConcern Object ( ) ) [isAcknowledged:MongoDB\DeleteResult:private] => 1 ) 2 In this example, we used ->getDeletedCount() to get the number of affected documents, which is printed out in the last line of the output. Bulk write The new PHP driver supports the bulk write interface to minimize network calls to MongoDB: $manager = new MongoDB\Driver\Manager('mongodb://localhost:27017'); $bulk = new MongoDB\Driver\BulkWrite(array("ordered" => true)); $bulk->insert(array( "isbn" => "401", "name" => "MongoDB and PHP" )); $bulk->insert(array( "isbn" => "402", "name" => "MongoDB and PHP, 2nd Edition" )); $bulk->update(array("isbn" => "402"), array('$set' => array("price" => 15))); $bulk->insert(array( "isbn" => "403", "name" => "MongoDB and PHP, revisited" )); $result = $manager->executeBulkWrite('mongo_book.books', $bulk); print_r($result); The result is: MongoDB\Driver\WriteResult Object ( [nInserted] => 3 [nMatched] => 1 [nModified] => 1 [nRemoved] => 0 [nUpserted] => 0 [upsertedIds] => Array ( ) [writeErrors] => Array ( ) [writeConcernError] => [writeConcern] => MongoDB\Driver\WriteConcern Object ( ) ) In the preceding example, we executed two inserts, one update, and a third insert in an ordered fashion. The WriteResult object contains a total of three inserted documents and one modified document. The main difference compared to simple create/delete queries is that executeBulkWrite() is a method of the MongoDB\Driver\Manager class, which we instantiate on the first line. Read operation The querying interface is similar to inserting and deleting, with the findOne() and find() methods used to retrieve the first result or all results of a query: $document = $collection->findOne( array("isbn" => "101") ); $cursor = $collection->find( array( "name" => new MongoDB\BSON\Regex("mongo", "i") ) ); In the second example, we are using a regular expression to search for a key name with the value mongo (case-insensitive). Embedded documents can be queried using the . notation, as with the other languages that we examined earlier in this chapter: $cursor = $collection->find( array('meta.price' => 50) ); We do this to query for an embedded field price inside the meta key field. Similarly to Ruby and Python, in PHP we can query using comparison operators, like this: $cursor = $collection->find( array( 'price' => array('$gte'=> 60) ) ); Querying with multiple key-value pairs is an implicit AND, whereas queries using $or, $in, $nin, or AND ($and) combined with $or can be achieved with nested queries: $cursor = $collection->find( array( '$or' => array( array("price" => array( '$gte' => 60)), array("price" => array( '$lte' => 20)) ))); This finds documents that have price>=60 OR price<=20. Update operation Updating documents has a similar interface, with the ->updateOne() or ->updateMany() method. The first parameter is the query used to find documents and the second one will update our documents. We can use any of the update operators explained at the end of this chapter to update in place, or specify a new document to completely replace the document in the query: $result = $collection->updateOne( array( "isbn" => "401"), array( '$set' => array( "price" => 39 ) ) ); We can use single quotes or double quotes for key names, but if we have special operators starting with $, we need to use single quotes. We can use array( "key" => "value" ) or ["key" => "value"]. We prefer the more explicit array() notation in this book.
The ->getMatchedCount() and ->getModifiedCount() methods will return the number of documents matched by the query part and the number modified by the update, respectively. If the new value is the same as the existing value of a document, it will not be counted as modified. As we saw, it is fairly easy and advantageous to use PHP as a language and tool for performing efficient CRUD operations in MongoDB. If you are interested in more information on how to effectively handle data using MongoDB, you may check out the book Mastering MongoDB 3.x.