Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

Neo4j – Modeling Bookings and Users

Save for later
  • 840 min read
  • 2015-08-11 00:00:00

article-image

In this article, by Mahesh Lal, author of the book Neo4j Graph Data Modeling, we will explore how graphs can be used to solve problems that are dominantly solved using RDBMS, for example, bookings.

We will discuss the following topics in this article:

  • Modeling bookings in an RDBMS
  • Modeling bookings in a graph
  • Adding bookings to graphs
  • Using Cypher to find bookings and journeys

(For more resources related to this topic, see here.)

Building a data model for booking flights

We have a graph that allows people to search flights. At this point, a logical extension to the problem statement could be to allow users to book flights online after they decide the route on which they wish to travel. We were only concerned with flights and the cities. However, we need to tweak the model to include users, bookings, dates, and capacity of the flight in order to make bookings. Most teams choose to use an RDBMS for sensitive data such as user information and bookings. Let's understand how we can translate a model from an RDBMS to a graph.

A flight booking generally has many moving parts. While it would be great to model all of the parts of a flight booking, a smaller subset would be more feasible, to demonstrate how to model data that is normally stored in a RDBMS.

A flight booking will contain information about the user who booked it along with the date of booking. It's not uncommon to change multiple flights to get from one city to another. We can call these journey legs or journeys, and model them separately from the booking that has these journeys. It is also possible that the person booking the flight might be booking for some other people. Because of this, it is advisable to model passengers with their basic details separately from the user.

We have intentionally skipped details such as payment and costs in order to keep the model simple.

A simple model of the bookings ecosystem

A booking generally contains information such as the date of booking, the user who booked it, and a date of commencement of the travel. A journey contains information about the flight code. Other information about the journey such as the departure and arrival time, and the source and destination cities can be evaluated on the basis of the flight which the journey is being undertaken. Both booking and journey will have their own specific IDs to identify them uniquely. Passenger information related to the booking must have the name of the passengers at the very least, but more commonly will have more information such as the age, gender, and e-mail.

A rough model of the Booking, Journey, Passenger, and User looks like this:

neo4j-modeling-bookings-and-users-img-0

Figure 4.1: Bookings ecosystem

Modeling bookings in an RDBMS

To model data shown in Figure 4.1 in an RDBMS, we will have to create tables for bookings, journeys, passengers, and users. In the previous model, we have intentionally added booking_id to Journeys and user_id to Bookings. In an RDBMS, these will be used as foreign keys.

We also need an additional table Bookings_Passengers_Relationships so that we can depict the many relationships between Bookings and Passengers. The multiple relationships between Bookings and Passengers help us to ensure that we capture passenger details for two purposes. The first is that a user can have a master list of travelers they have travelled with and the second use is to ensure that all the journeys taken by a person can be fetched when the passenger logs into their account or creates an account in the future.

We are naming the foreign key references with a prefix fk_ in adherence to the popular convention.

neo4j-modeling-bookings-and-users-img-1

Figure 4.2: Modeling bookings in an RDBMS

In an RDBMS, every record is a representation of an entity (or a relationship in case of relationship tables). In our case, we tried to represent a single booking record as a single block. This applies to all other entities in the system, such as the journeys, passengers, users, and flights. Each of the records has its own ID by which it can be uniquely identified. The properties starting with fk_ are foreign keys, which should be present in the tables to which the key points.

In our model, passengers may or may not be the users of our application. Hence, we don't add a foreign key constraint to the Passengers table. To infer whether the passenger is one of the users or not, we will have to use other means of inferences, for example, the e-mail ID. Given the relationships of the data, which are inferred using the foreign key relationships and other indirect means, we can draw the logical graph of bookings as shown in the following diagram:

neo4j-modeling-bookings-and-users-img-2

Figure 4.3: Visualizing related entities in an RDBMS

Figure 4.3 shows us the logical graph of how entities are connected in our domain. We can translate this into a Bookings subgraph. From the related entities of Figure 4.3, we can create a specification of the Bookings subgraph, which is as follows:

neo4j-modeling-bookings-and-users-img-3

Figure 4.4: Specification of subgraph of bookings

Comparing Figure 5.3 and Figure 5.4, we observe that all the fk_ properties are removed from the nodes that represent the entities. Since we have explicit relationships that can now be used to traverse the graph, we don't need implicit relationships that rely on foreign keys to be enforced. We put the date of booking on the booking itself rather than on the relationship between User and Bookings.

The date of booking can be captured either in the booking node or in the :MADE_BOOKING relationship. The advantage of capturing it in the booking node is that we can further run queries efficiently on it rather than relying on crude filtering methods to extract information from the subgraph.

An important addition to the Bookings object is adding the properties year, month, and day. Since date is not a datatype supported by Neo4j, range queries become difficult. Timestamps solve this problem to some extent, for example, if we want to find all bookings made between June 01, 2015 and July 01, 2015, we can convert them into timestamps and search for all bookings that have timestamps between these two timestamps. This, however, is a very expensive process, and would need a store scan of bookings. To alleviate these problems, we can capture the year, day, and month on the booking.

While adapting to the changing needs of the system, remodeling the data model is encouraged. It is also important that we build a data model with enough data captured for our needs—both current and future. It is a judgment-based decision, without any correct answer. As long as the data might be easily derived from existing data in the node, we recommend not to add it until needed. In this case, converting a timestamp to its corresponding date with its components might require additional programming effort. To avoid that, we can begin capturing the data right away. There might be other cases, for example, we want to introduce a property Name on a node with First name and Last name as properties. The derivation of Name from First name and Last name is straightforward. In this case, we advise not to capture the data till the need arises.

Creating bookings and users in Neo4j

For bookings to exist, we should create users in our data model.

Creating users

To create users, we create a constraint on the e-mail of the user, which we will use as an unique identifier as shown in the following query:

neo4j-sh (?)$ CREATE CONSTRAINT ON (user:User)   ASSERT user.email IS UNIQUE;

The output of the preceding query is as follows:

+-------------------+
| No data returned. |
+-------------------+
Constraints added: 1

With the constraint added, let's create a few users in our system:

Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at ₹800/month. Cancel anytime
neo4j-sh (?)$ CREATE (:User{name:"Mahesh Lal",   email:"[email protected]"}),
(:User{name:"John Doe", email:"[email protected]"}),
(:User{name:"Vishal P", email:"[email protected]"}),
(:User{name:"Dave Coeburg", email:"[email protected]"}),
(:User{name:"Brian Heritage",     email:"[email protected]"}),
(:User{name:"Amit Kumar", email:"[email protected]"}),
(:User{name:"Pramod Bansal",     email:"[email protected]"}),
(:User{name:"Deepali T", email:"[email protected]"}),
(:User{name:"Hari Seldon", email:"[email protected]"}),
(:User{name:"Elijah", email:"[email protected]"});

The output of the preceding query is as follows:

+-------------------+
| No data returned. |
+-------------------+
Nodes created: 10
Properties set: 20
Labels added: 10

Please add more users from users.cqy.

Creating bookings in Neo4j

As discussed earlier, a booking has multiple journey legs, and a booking is only complete when all its journey legs are booked.

Bookings in our application aren't a single standalone entity. They involve multiple journeys and passengers. To create a booking, we need to ensure that journeys are created and information about passengers is captured. This results in a multistep process.

To ensure that booking IDs remain unique and no two nodes have the same ID, we should add a constraint on the id property of booking:

neo4j-sh (?)$ CREATE CONSTRAINT ON (b:Booking)   ASSERT b.id IS UNIQUE;

The output will be as follows:

+-------------------+
| No data returned. |
+-------------------+
Constraints added: 1

We will create similar constraints for Journey as shown here:

neo4j-sh (?)$ CREATE CONSTRAINT ON (journey:Journey)   ASSERT journey._id IS UNIQUE;

The output is as follows:

+-------------------+
| No data returned. |
+-------------------+
Constraints added: 1

Add a constraint for the e-mail of passengers to be unique, as shown here:

neo4j-sh (?)$ CREATE CONSTRAINT ON (p:Passenger)   ASSERT p.email IS UNIQUE;

The output is as shown:

+-------------------+
| No data returned. |
+-------------------+
Constraints added: 1

With constraint creation, we can now focus on how bookings can be created. We will be running this query in the Neo4j browser, as shown:

//Get all flights and users
MATCH (user:User{email:"[email protected]"})
MATCH (f1:Flight{code:"VS9"}), (f2:Flight{code:"AA9"})
//Create a booking for a date
MERGE (user)-[m:MADE_BOOKING]->(booking:Booking {_id:"0f64711c-7e22-11e4-a1af-14109fda6b71", booking_date:1417790677.274862, year: 2014, month: 12, day: 5})
//Create or get passengers
MERGE (p1:Passenger{email:"[email protected]"}) ON CREATE SET p1.name = "Vishal Punyani", p1.age= 30
MERGE (p2:Passenger{email:"[email protected]"}) ON CREATE SET p2.name = "John Doe", p2.age= 25
//Create journeys to be taken by flights
MERGE (j1:Journey{_id: "712785b8-1aff-11e5-abd4-6c40089a9424", date_of_journey:1422210600.0, year:2015, month: 1, day: 26})-[:BY_FLIGHT]-> (f1)
MERGE (j2:Journey{_id:"843de08c-1aff-11e5-8643-6c40089a9424", date_of_journey:1422210600.0, year:2015, month: 1, day: 26})-[:BY_FLIGHT]-> (f2)
WITH user, booking, j1, j2, f1, f2, p1, p2
//Merge journeys and booking, Create and Merge passengers with bookings, and return data
MERGE (booking)-[:HAS_PASSENGER]->(p1)
MERGE (booking)-[:HAS_PASSENGER]->(p2)
MERGE (booking)-[:HAS_JOURNEY]->(j1)
MERGE (booking)-[:HAS_JOURNEY]->(j2)
RETURN user, p1, p2, j1, j2, f1, f2, booking

The output is as shown in the following screenshot:

neo4j-modeling-bookings-and-users-img-4

Figure 4.5: Booking that was just created

We have added comments to the query to explain the different parts of the query. The query can be divided into the following parts:

  • Finding flights and user
  • Creating bookings
  • Creating journeys
  • Creating passengers and link to booking
  • Linking journey to booking

We have the same start date for both journeys, but in general, the start dates of journeys in the same booking will differ if:

  • The traveler is flying across time zones. For example, if a traveler is flying from New York to Istanbul, the journeys from New York to London and from London to Istanbul will be on different dates.
  • The traveler is booking multiple journeys in which they will be spending some time at a destination.

Let's use bookings.cqy to add a few more bookings to the graph. We will use them to run further queries.

Queries to find journeys and bookings

With the data on bookings added in, we can now explore some interesting queries that can help us.

Finding all journeys of a user

All journeys that a user has undertaken will be all journeys that they have been a passenger on. We can use the user's e-mail to search for journeys on which the user has been a passenger.

To find all the journeys that the user has been a passenger on, we should find the journeys via the bookings, and then using the bookings, we can find the journeys, flights, and cities as shown:

neo4j-sh (?)$ MATCH (b:Booking)-[:HAS_PASSENGER]->(p:Passenger{email:"[email protected]"})
WITH b
MATCH (b)-[:HAS_JOURNEY]->(j:Journey)-[:BY_FLIGHT]->(f:Flight)
WITH b._id as booking_id, j.date_of_journey as date_of_journey, COLLECT(f) as flights ORDER BY date_of_journey DESC
MATCH (source:City)-[:HAS_FLIGHT]->(f)-[:FLYING_TO]->(destination:City)
WHERE f in flights
RETURN booking_id, date_of_journey, source.name as from, f.code as by_flight, destination.name as to;

The output of this query is as follows:

neo4j-modeling-bookings-and-users-img-5

While this query is useful to get all the journeys of the user, it can also be used to map all the locations the user has travelled to.

Queries for finding the booking history of a user

The query for finding all bookings by a user is straightforward, as shown here:

neo4j-sh (?)$ MATCH (user:User{email:"[email protected]"})-[:MADE_BOOKING]->(b:Booking) RETURN b._id as booking_id;

The output of the preceding query is as follows:

+----------------------------------------+
| booking_id                             |
+----------------------------------------+
| "251679be-1b3f-11e5-820e-6c40089a9424" |
| "ff3dd694-7e7f-11e4-bb93-14109fda6b71" |
| "7c63cc35-7e7f-11e4-8ffe-14109fda6b71" |
| "f5f15252-1b62-11e5-8252-6c40089a9424" |
| "d45de0c2-1b62-11e5-98a2-6c40089a9424" |
| "fef04c30-7e2d-11e4-8842-14109fda6b71" |
| "f87a515e-7e2d-11e4-b170-14109fda6b71" |
| "75b3e78c-7e2b-11e4-a162-14109fda6b71" |
+----------------------------------------+
8 rows

Upcoming journeys of a user

Upcoming journeys of a user is straightforward. We can construct it by simply comparing today's date to the journey date as shown:

neo4j-sh (?)$ MATCH (user:User{email:"[email protected]"})-[:MADE_BOOKING]->(:Booking)-[:HAS_JOURNEY]-(j:Journey)
WHERE j.date_of_journey >=1418055307
WITH COLLECT(j) as journeys
MATCH (j:Journey)-[:BY_FLIGHT]->(f:Flight)
WHERE j in journeys
WITH j.date_of_journey as date_of_journey, COLLECT(f) as flights
MATCH (source:City)-[:HAS_FLIGHT]->(f)-[:FLYING_TO]->(destination:City)
WHERE f in flights
RETURN date_of_journey, source.name as from, f.code as by_flight, destination.name as to;

The output of the preceding query is as follows:

+-------------------------------------------------------------+
| date_of_journey | from         | by_flight | to           |
+-------------------------------------------------------------+
| 1.4226426E9     | "New York"   | "VS8"     | "London"     |
| 1.4212602E9     | "Los Angeles" | "UA1262" | "New York"   |
| 1.4212602E9     | "Melbourne"   | "QF94"   | "Los Angeles" |
| 1.4304186E9     | "New York"   | "UA1507" | "Los Angeles" |
| 1.4311962E9     | "Los Angeles" | "AA920"   | "New York"   |
+-------------------------------------------------------------+
5 rows

Summary

In this article, you learned how you can model a domain that has traditionally been implemented using RDBMS. We saw how tables can be changed to nodes and relationships, and we explored what happened to relationship tables. You also learned about transactions in Cypher and wrote Cypher to manipulate the database.

Resources for Article:


Further resources on this subject: