Chapter 4. Modelling and Refactoring Patterns
Now that you’ve mastered the principles of graph modelling, you’re ready to explore some common modelling patterns. You’ll learn to recognise when to apply these patterns and weigh their pros and cons. Sooner or later you’ll also need to refactor your graph, so the second half of this chapter takes you through refactoring patterns with a hands-on approach.
Hyperedges– N-way Relationships
There’s a new trend–everybody appears to be interested in covers of songs. ElectricHarmony wants to jump onto the bandwagon by introducing a new feature that helps listeners find covers of their favourite tracks easily. This feature should also help them discover if some of their favourite tracks are indeed covers–and, if so, who the original artist is. You find yourself at the whiteboard with the team once again to model this use case.
The most obvious way to start is like in Figure 4-1: by quantifying the ARTIST relationship to the Track to specify whether their recording of the track is a cover or not.
Good modelling practice involves revisiting existing use cases to see if your refactoring has impacted them (perhaps adversely). You immediately think back to a key use case–fetching the track artist. So far, ElectricHarmony has leaned towards always showing a track’s original artist, and you don’t want that to change. However, now that this information is contained in a property on the relationship, Neo4j is doing extra work to traverse all the ARTIST
relationships from a TRACK
, examine the cover property value, and discard the majority. While that isn’t a major problem, because it is rare for tracks to have an extremely high number of artists, your team doesn’t feel like this is the right solution.
Instead, based on what you’ve learned in the previous chapter about relationship granularity, you decide to introduce a new relationship type (Figure 4-2): COVERED_BY
.
This leaves the key use case undisturbed. When Neo4j needs to fetch covers–or conversely, find all tracks that a particular artist has covered, it traverses the COVERED_BY
relationship.
What about finding any artist who has performed any version of the track? Simply include both relationship types in the pattern and Neo4j will traverse them both:
MATCH (t:Track {id:"4J4gApJKSC0himDViFotdy"})-[:ARTIST|COVERED_BY]-(a:Artist) RETURN a
Draw some examples on the whiteboard (like in Figure 4-3) to validate that this approach works.
It does work: it’s easy to find both the original artist and other artists who’ve covered these tracks. But does it really work? Are you now asking your team whether it was Simon & Garfunkel or Disturbed who recorded the album Immortalized? What happened?
The model has now lost information: it is no longer possible to associate an album with a particular version of a track. What you want is an n-way relationship, one that connects more than two nodes. The model in Figure 4-4 would be ideal, but in graph databases such as Neo4j, a relationship must have exactly two nodes - one on either end.
The solution here is to capture the shared context at the junction of the n-way relationship into a node called a hyperedge. This hyperedge is bound to its related nodes, and the data it holds is only applicable in this local context.
The hyperedges in Figure 4-5 are represented by the label Recording
. Every time a new version of a track is created by the ingestion process, a new Recording
node must be created to hold context for that particular recording. What’s nice about this model is that it can be extended to model other concepts that are specific to the hyperedge, such as whether the track has won a Grammy award. In Figure 4-6, the graph makes it clear via the Award node that the Simon & Garfunkel version won a Grammy in 1968. The previous models would have obscured this information.
The model can also represent performances, since graphs’ schema-free nature supports exceptions and variations with ease. Figure 4-7 depicts how not every track has an Album
:
Usually, hyperedges emerge to be entities themselves that were previously undiscovered. If that’s not the case, consider carefully whether you want to use them: remember, they add more nodes to the graph, and consequently more hops during traversal.
Key takeaways
Use hyperedges when your graph would otherwise lose context that cannot be represented clearly by two nodes and the relationship that connects them. If you find yourself needing to start a traversal at a relationship, that’s a hint that a hyperedge might be a good idea. Weigh the tradeoffs of introducing more nodes into the graph that will also have to be created or updated during data ingestion flows.
Time-based Versioning
-
Determining how the network changed over time
-
Tracking behaviour changes
-
Updating entities without affecting past records
This section applies a variation of the time-based versioning pattern to a simplified use case for ElectricHarmony: versioning a user’s “Favourites” playlist to track the music tastes of the user over time. This feature will be especially useful by applying time decay to artists and genres–applying lower weights to artists and genres that a user might have loved in the past, but not as much now, to improve the relevance of recommendations.1
Every user has a default Favourites
playlist which contains their favourite songs, as shown in Figure 4-8. Over time, they add or remove tracks from this playlist. To apply the principles of time-based versioning, you’ll separate the object from the state: here, the object is the Favourites
playlist, and the state is its contents at various points in time.
After some debate about time-period granularity, ElectricHarmony settles on six months to detect if a user’s listening preference has changed. This means that their application will create and maintain a new state for six months. You decide to capture this timeframe as a property of relationships between the objects and their states, as pictured in Figure 4-9.
Every time a track is added or removed from the Favourites playlist, the HAS_TRACK
relationship is created or deleted between the Track
and the active FavouriteState
node for the current time period, which is matched with a condition in the query, such as:
MATCH (u:User {id:”100”})-[:OWNS]->(p:Playlist {name:”Favourites”})-[r:HAS_STATE]->(f:FavouriteState) WHERE r.from <= timestamp() <=r.to RETURN f
Tip
It’s worth familiarizing yourself with the temporal instant types available in Neo4j. In this case, you have two main options. One is to represent time as a Timestamp
.2 The other is to represent it as a DateTime
. Timestamps are integers, which are compact and straightforward to compare. However, timestamps for dates before the epoch will be represented as negative integers. A DateTime
is a richer object, making it easier to calculate durations, convert formats, or extract components (such as the day or month). For time versioning in this example, a timestamp does the job.
Introducing time-based versioning brings its own tradeoffs–the model is more complex, you need to adjust your queries, and writing data to the graph is more involved. One way of reducing complexity in the main graph is to offload all states except the active state to another database. We’ll discuss that and explore composite graphs when we revisit that particular case in chapter X.
Another avenue to explore is encoding the time period into the relationship type. This is helpful for cases where you want to capture when relationships were created or when they were valid or applicable. Recall the section on Bucketed Relationships from Chapter 3–the relationship from the Favourites playlist to a track could also be recorded as FAVOURITE_2024_H1
for the time period of the first half of the year.
Key takeaways
Time-based versioning is useful when you want to track or audit changes made to the graph or go backwards in time to compare the states of the graph. It adds complexity, however, so consider carefully.
Representing Sequences
Recall that, in chapter 1, you learned that Playlists
are related to their tracks via the HAS_TRACK
relationship, which includes a position
property representing the order of the tracks (Figure 4-10). While this did the job for your proof-of-concept model, in working with it, the team has uncovered some disadvantages. You agree to revisit the question of track order in playlists.
Finding the last track in a playlist
The recommendation query in the proof-of-concept model is based on the last track in a playlist, which is queried in an awkward way. Neo4j traverses every HAS_TRACK
relationship from the playlist to inspect the position
property and compare it to the total number of tracks on the playlist (determined by the COUNT
subquery):
MATCH (p)-[r:HAS_TRACK]->(t) WHERE r.position = COUNT { (p)-[:HAS_TRACK]->()} WITH p AS playlist, t AS lastTrack
It also assumes that all position
properties are correct and sequential, with no holes, since the last track position must match the count of all tracks. This brings us to the next problem.
Maintaining track order
Users frequently reorder tracks in a playlist: new tracks are added, others are removed. Every time one of these operations takes place, your code must recalculate the positions of all tracks in the playlist.
You can make that happen by writing a procedure that extends Neo4j. The disadvantage of going down this route is that you have to write procedures in a JVM language (Kotlin, Scala etc.) and compile them into JAR files, which you must then deploy as plugins to the Neo4j database server.
Since you’re working with a graph, there is a more natural way to model sequences–whether they are tracks in a playlist, episodes in a TV series, or jobs in your employment history: linked lists. You’ll also find linked lists to be especially useful when splitting text into chunks and linking them in order to represent the original document– chapter X on GenAI will refer to this concept. Linked lists are fundamental data structures where each element maintains a reference or pointer to the next. The graph fetches the elements in sequence by simply following the pointer to the next element in the list. The complexity of insertion and deletion are, at best, O(1)
and, at worst, O(n).
In a graph, the pointers are implicitly represented by relationships, and traversing across relationships is the sweet spot for graph databases. Figure 4-11 shows a playlist modelled as a linked list.
Let’s connect some of the concepts you’ve read in chapter 3. The coarse-grained HAS_TRACK
relationship from the Playlist
to a Track
has been replaced with a more specific relationship, PLAYLIST_TRACK
. (The PLAYLIST_TRACK
relationship can also be called NEXT: it’s just a matter of preference.)
Consider Figure 4-12. Can you tell which tracks are part of which playlist?
You can probably tell that “Nobody’s Fool” is the first track of two playlists, “Rock Ballads” and “Drive,” but the tracks that follow it in each playlist are unclear.
To avoid losing context about how tracks are related to playlists, let’s introduce a new node: PlaylistItem
. The PlaylistItem
node makes it much easier for the graph to traverse the playlist and to modify its order.
Figure 4-13 contains one more modification you can make. Since finding the last track of any Playlist is a top use case for ElectricHarmony, instead of traversing all tracks in the sequence, you can give the graph a shortcut and send it straight to the last track by adding a LAST_PLAYLIST_TRACK relationship. Only add this relationship, however, if it’s useful for the majority of your use cases or if it shows a substantial performance advantage when traversing over extremely long playlists. Premature optimisation in the form of adding the LAST_PLAYLIST_TRACK leads to issues in other areas– you increase complexity because the pointer to the tail of the linked list has to be maintained, especially when the list is updated frequently.
The recommendation query you wrote on Day 4 in chapter 1 contained the following segment to find the last track:
// Find the last track MATCH (p)-[r:HAS_TRACK]->(t) WHERE r.position = COUNT { (p)-[:HAS_TRACK]->()} WITH p AS playlist, t AS lastTrack, popularTracks With the introduction of the LAST_PLAYLIST_TRACK relationship, this part of the query reduces to: MATCH (p)-[:LAST_PLAYLIST_ITEM]->()<-[:PLAYLIST_ITEM]-(lastTrack)
In the next section, you will refactor the graph you created in chapter 1 to this enhanced model.
Be careful not to model a doubly linked list data structure by introducing a PREVIOUS_PLAYLIST_TRACK
relationship to navigate the playlist in reverse, as shown in Figure 4-14.
As you learned in the section on bidirectional relationships, these are redundant and bad practice.
Key takeaways
When you encounter sequences in your domain, think about whether you can model them as linked lists for easier modification and faster traversal through the sequenced nodes. Weigh the pros and cons of query complexity against the frequency of updates to the list.
Refactoring Patterns
At some point, you’ll need to enhance your graph model for performance or new data or new use cases, and that means refactoring your graph data. There are two basic ways to refactor:
- Reingest all data
-
This method assumes that the graph can be completely reconstructed from original sources. Here, you modify the ingestion queries or scripts to write data to the graph as per the new model. The benefit of this approach is that ingestion scripts are always current. The downside is that replacing the graph in production will require some minimal downtime.
- Refactor live
-
When you can’t reconstruct your graph or when it’s quite small, live refactoring is the way to go– refactor queries are executed directly on the live graph. Typically, you’ll do this on a backup of the graph and validate it before applying the refactor in production.
You can also combine these two approaches. If data is constantly streaming into the graph, refactoring will feel like a never-ending quest. One way of tackling this problem is to first update the ingestion scripts or queries to write new data to the graph in accordance with the refactored model. This serves to stop propagating the current graph model. Then you can perform a live refactor to address the current graph.
In most cases, you can maintain backward compatibility. The current version of the graph and the refactored version can coexist, which lets you migrate queries gradually with no downtime. However, you need to plan what to do in case things go wrong. For example, if the graph is too large to apply the refactor in a single transaction, you run the risk of applying a partial refactor until you can reattempt the rest of the transactions that have failed. This is why we advise keeping both versions of the graph in place until you’ve verified the success of the operation. Then you can remove the previous version.
A huge benefit of graph refactoring is that it lets dependent applications catch up and migrate over time. You can phase out the “old” graph model once all queries and applications have switched over to the new one.3 (You will see this in action in the following sections.)
You’ve already performed the first type of refactoring: in chapter 1, you converted the data type of the position
property of the HAS_TRACK
relationship from a string to an integer by updating the LOAD CSV
query and reingesting the data. You also removed the redundant SIMILAR
relationship earlier in this chapter.
The next three sections describe a few more refactoring patterns that you’re likely to come across, all of which use live refactoring.
Refactoring to change the type of a relationship
After you read the section on relationship granularity in chapter 3, you decide to refactor the graph to replace the HAS_TRACK
relationship between Playlists
and Tracks
with a ON_PLAYLIST
relationship from Tracks
to Playlists
.
Neo4j does not support changing the type of a relationship, so you need to delete the HAS_TRACK
relationship type and create ON_PLAYLIST
. Since you want to keep the graph backward-compatible until ElectricHarmony’s engineering team updates their recommendation queries, you’ll refactor in the following stages:
Stage 1
Match all HAS_TRACK
relationships from a Playlist
to a Track
. For each pair of Track
and Playlist
matches, create an ON_PLAYLIST
relationship from the Track
to the Playlist
and set the position on the new relationship. At this point, both models co-exist in the same graph; the recommendation query will not break.
First, count how many relationships need to be refactored in the graph you that you ended with from chapter 1:
MATCH (:Playlist)-[r:HAS_TRACK]->(:Track) RETURN count(r)
There are 73,153 relationships to be refactored. This is not a large number; Neo4j can refactor these relationships in a single transaction.
Execute the following query to refactor the graph:
//Step 1 MATCH (p:Playlist)-[hastrack:HAS_TRACK]->(t:Track) //Step 2 MERGE (t)-[onplaylist:ON_PLAYLIST]->(p) SET onplaylist.position=hastrack.position;
If you pause here and inspect a playlist in the graph, you’ll see that both versions of the model are maintained, like the example in figure 4-15.
Stage 2
Update all queries that depend on the HAS_TRACK relationship to use the new ON_PLAYLIST relationship.
Here’s the updated recommendation query that you can test:
//Find popular tracks MATCH (popularTrack:Track)-[:ON_PLAYLIST]-(:Playlist) WITH popularTrack, count(*) as playlistCount ORDER BY playlistCount DESC LIMIT 10 WITH collect(elementId(popularTrack)) as popularTracks // For a given Playlist MATCH (p:Playlist) WHERE p.name = "all that jazz" // Find the last track MATCH (p)<-[r:ON_PLAYLIST]-(t) WHERE r.position = COUNT { (p)<-[:ON_PLAYLIST]-()} WITH p AS playlist, t AS lastTrack, popularTracks // Get the previous tracks WITH playlist, lastTrack, popularTracks, COLLECT { MATCH (playlist)<-[:ON_PLAYLIST]-(previous) WHERE previous <> lastTrack RETURN elementId(previous) } AS previousTracks // Find other playlists that have the same the last track MATCH (lastTrack)-[:ON_PLAYLIST]->(otherPlaylist)-[:SIMILAR]-(playlist) // Find other tracks which are not in the given playlist MATCH (otherPlaylist)<-[:ON_PLAYLIST]-(recommendation) WHERE NOT elementId(recommendation) IN previousTracks AND NOT elementId(recommendation) IN popularTracks // Score them by how frequently they appear RETURN recommendation.id as recommendedTrackId, recommendation.name AS recommendedTrack, otherPlaylist.name AS fromPlaylist, count(*) AS score ORDER BY score DESC LIMIT 5
Note that the WHERE
clause from the original query:
// Find other playlists that have the same the last track MATCH (lastTrack)<-[:HAS_TRACK]-(otherPlaylist)-[:SIMILAR]-(playlist) WHERE otherPlaylist <> playlist
has been dropped, since the graph now has a single SIMILAR
relationship between any pair of playlists
Stage 3
Delete the HAS_TRACK
relationship.
//Step 3 MATCH (p:Playlist)-[hastrack:HAS_TRACK]->(t:Track) DELETE hastrack
Refactoring to create a node from a property
To see this refactoring pattern in action, you’ll first add a property to a few Artist
nodes that represents genre
. Then you’’ll refactor the graph to extract genres as nodes.
The accompanying Github repository contains a CSV file of genres for a limited number of artists, which you can import into the graph using LOAD CSV
. Since this data source does not contain artist IDs, you will match artists by their names. Adding an index on artist name will speed up the MATCH
and hence the import:
CREATE INDEX artist_name FOR (n:Artist) ON n.name;
The data source isn’t particularly clean and contains a slash-delimited set of genres for each artist. The following query will split the genres on the / delimiter, trim spaces, convert them to lowercase, and store them as a list of strings on the Artist
node:
LOAD CSV WITH HEADERS FROM "file:///genres.csv" AS row WITH row WITH row.Artist as artist, split(row.Genre,"/") AS genreList UNWIND genreList AS genre WITH artist, collect(trim(toLower(genre))) AS genres MATCH (a:Artist {name:artist}) SET a.genres=genres
Figure 4-16 shows a sampling of artist-genre pairs after this data has been ingested.
Your next task is to create Genre
nodes from the genres
property. Since you don’t want duplicate Genre
nodes, you will use MERGE
. This will result in a single Genre
node for “alternative rock,” even though this genre is repeated across the artists listed in figure 4-16.
The best practice is to create the constraint first:
CREATE CONSTRAINT genre_name FOR (genre:Genre) REQUIRE genre.name IS UNIQUE
The following steps will complete the refactor:
-
Match all
Artist
nodes that have agenres
property. -
UNWIND
the genres, which will convert the list of genres into individual rows -
Merge a
Genre
node for each unwound genre. -
Merge a
GENRE
relationship from theArtist
to eachGenre
node from the previous step. -
Now both models coexist in the graph. Once you have refactored all queries that use
genres
property on anArtist
to traverse theGENRE
relationship, drop the property.
Here’s the Cypher query to execute on your graph:
//Step 1 MATCH (a:Artist) WHERE a.genres IS NOT NULL //Step 2 WITH a UNWIND a.genres as genreName //Step 3 MERGE (g:Genre {name:genreName}) //Step 4 MERGE (a)-[:GENRE]->(g) //Step 5 REMOVE a.genres
Figure 4-17 depicts the refactored graph.
Refactoring to create a node from a relationship
Now it’s time to refactor the ON_PLAYLIST
relationships into new PlaylistTrack
nodes and create linked lists, following the model described in the “Representing Sequences” section earlier in this chapter. Refresh your memory using Figure 4-18 which shows the transformation required to accomplish this:
Tip
Use realistic examples to expose potential problems with your refactoring early. The graph in figure 4-18 includes “Back in Black”, a track that is on two playlists.
While creating a node from a relationship looks complicated, breaking it up into stages makes it more manageable.
Stage 1
Before getting into the creation of the linked list, you’ll only create PlaylistTrack
nodes in the first stage. Remember that the PlaylistTrack
nodes are nodes that represent a track at a particular spot in the playlist. The result should look like Figure 4-19.
Can you spot the problem with this new graph? Look closely at the new PlaylistTrack
nodes. How will you sequence them in a linked list later on?
The lists to create are Klasic-Progresif-Hard -> A -> B -> C
and jump up -> D
.
Introducing the PlaylistTrack
nodes has muddied the waters, and it is not clear how to figure out which PlaylistTrack
for “Back in Black” belongs to which Playlist, except by correlating its position numbers on the ON_PLAYLIST
and PLAYLIST_ITEM
relationships. If the same track is at the same position in multiple playlists, then the problem is magnified.
Don’t be afraid to use temporary relationships during refactoring to help you write simple but predictable queries- you’ll delete these relationships when they’ve done their job of assisting with the refactor. Here, try introducing a temporary TRACK_ITEM_TEMP
relationship between the Playlist
and the PlaylistTrack
(as shown in Figure 4-20). This relationship is essentially a copy of the ON_PLAYLIST
relationship, but it terminates at the new PlaylistTrack
nodes and will help reconstruct the sequence for the linked list.
Now it’s time to write the Cypher query for this stage. It consists of three steps. First, find all tracks and the playlists they’re on. Then, create a PlaylistTrack
node for every Track
in the Playlist
. Finally, create the TRACK_ITEM_TEMP
relationship from the Playlist
to the PlaylistTrack
, recording the position
of the track in the playlist on the relationship.
Since the number of playlists and tracks are fairly large, you’ll execute this in transactions, with a default batch size of 1,000:
//Stage 1 //Step 1 MATCH (p:Playlist) CALL { WITH p MATCH (t:Track)-[r:ON_PLAYLIST]->(p:Playlist) //Step 2 CREATE (pt:PlaylistTrack) CREATE (t)-[:PLAYLIST_ITEM]->(pt) //Step 3 CREATE (p)-[:TRACK_ITEM_TEMP {position:r.position}]->(pt) } IN TRANSACTIONS
Stage 2
This stage simply links the Playlist
to the head and tail PlaylistTrack
nodes.
//Stage 2 MATCH (p:Playlist) CALL { WITH p MATCH (p)-[:TRACK_ITEM_TEMP {position:1}]->(firstTrack:PlaylistTrack) CREATE (p)-[:PLAYLIST_TRACK]->(firstTrack) WITH p MATCH (p)-[r:TRACK_ITEM_TEMP]->(lastTrack:PlaylistTrack) WHERE r.position = COUNT {()-[:ON_PLAYLIST]->(p)} CREATE (p)-[:LAST_PLAYLIST_TRACK]->(lastTrack) } IN TRANSACTIONS
Stage 3
This is where the linked list is produced by creating relationships between PlaylistTrack
nodes. You’ll use the position
property on the TRACK_ITEM_TEMP
relationship to maintain the playlist’s track order.
One way to do this in Cypher is to collect a playlist’s PlaylistTrack
nodes, in order. Then use a combination of RANGE and UNWIND to obtain a pair of adjacent tracks and create the relationship between them:
//Stage 3 //Step 1 MATCH (p:Playlist) CALL { WITH p MATCH (p)-[r:TRACK_ITEM_TEMP]->(t:PlaylistTrack) WITH r,t ORDER BY r.position WITH COLLECT(t) AS playlistTracks UNWIND RANGE(0,SIZE(playlistTracks) - 2) as idx WITH playlistTracks[idx] AS t1, playlistTracks[idx+1] AS t2 MERGE (t1)-[:PLAYLIST_TRACK]->(t2) } IN TRANSACTIONS
Tip
The apoc library contains a handy list of refactoring procedures, such as apoc.nodes.link to creat a linked list – this would simplify the query you just wrote.
Next, drop the temporary relationships as they’ve done their job. You can batch the deletes in transactions to avoid running out of memory:
//Stage 3 //Step 2 MATCH ()-[r:TRACK_ITEM_TEMP]-() CALL { WITH r DELETE r } IN TRANSACTIONS
Now both models coexist in the same graph, as can be seen in Figure 4-21.
Stage 4
To use the linked list, rework the recommendation query from chapter 1 as follows:
//Stage 4 //Find popular tracks MATCH (popularTrack:Track)-[:PLAYLIST_ITEM]->() WITH popularTrack, count(*) as playlistCount ORDER BY playlistCount DESC LIMIT 10 WITH collect(elementId(popularTrack)) as popularTracks // For a given Playlist MATCH (playlist:Playlist) WHERE playlist.name = "all that jazz" // Find the last track MATCH (playlist)-[:LAST_PLAYLIST_TRACK]->(lastTrackItem) // Get the previous tracks WITH playlist, lastTrackItem, popularTracks, COLLECT { MATCH (playlist) (()-[:PLAYLIST_TRACK]->()) {1,100} (previousTrackItem) WHERE previousTrackItem <> lastTrackItem RETURN elementId(previousTrackItem) } AS previousTrackItems // Find other playlists that have the same the last track MATCH (playlist)-[:SIMILAR]-(otherPlaylist:Playlist)-[:LAST_PLAYLIST_TRACK]->(otherLastTrack)<-[:PLAYLIST_ITEM]-(:Track)-[:PLAYLIST_ITEM]->()<-[:LAST_PLAYLIST_TRACK]-(playlist) // Find other tracks which are not in the given playlist MATCH (otherPlaylist) (()-[:PLAYLIST_TRACK]->()) {1,100} (recommendationItem)<-[:PLAYLIST_ITEM]-(recommendation) WHERE NOT elementId(recommendationItem) IN previousTrackItems AND NOT elementId(recommendationItem) IN popularTracks // Score them by how frequently they appear RETURN recommendation.id as recommendedTrackId, recommendation.name AS recommendedTrack, otherPlaylist.name AS fromPlaylist, count(*) AS score ORDER BY score DESC LIMIT 5
Finally, drop the ON_PLAYLIST
relationships:
//Stage 4 MATCH ()-[r:ON_PLAYLIST]-() CALL { WITH r DELETE r } IN TRANSACTIONS
Key takeaways
Refactoring is a normal part of graph modelling. You can refactor a live graph or reingest all the data to comply with an updated model. A model’s old and new versions of the model can coexist in the same graph: this provides backward compatibility and eases the query migration process.
Sometimes refactoring can result in more complex queries, such as with the recommendation query. Always consider the cost and benefits, taking the principles of modelling into account.
Summary
You can now recognize when specific modelling patterns such as linked lists and hyperedges can be applied. Most domains are straightforward to model, and with experience, you’ll find yourself using these patterns intuitively. You’ve also practised refactoring your graph. Soon this will be second nature.
The next chapter addresses query performance and tuning: we’ll show you how queries are executed and how your modelling decisions affect it.
1 A common modelling pattern for this was first introduced by Ian Robinson; although the original article appears to have vanished, the topic has subsequently been covered by Ljubica Lazarevic.
2 This refers to time elapsed in milliseconds since the epoch: midnight UTC on January 1, 1970.
3 This is similar to Martin Fowler’s description of the Strangler Fig Application.
Get Neo4j: The Definitive Guide now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.