Chapter 4. Detecting Fraud and Money Laundering Patterns

In this chapter, we take on the serious problem of fraud and money laundering. Fraud is typically conducted by one or more parties as a multi-step process. Sometimes, the only way to distinguish fraud or money laundering from legitimate activity is to detect a characteristic or unusual pattern of activity. Modeling the activity and relationships with a graph enables us to detect suspicious activity by searching for those patterns along with checking for their frequency.

After completing this chapter, you should be able to:

  • Describe coordinated activity among multiple parties in terms of a graph pattern.

  • Use a multi-hop or iterated single-hop graph traversal to perform a deep search.

  • Describe bidirectional search and its advantage.

  • Understanding the use of timestamps to find a time sequence.

Goal: Detect Money Laundering

Financial institutions are responsible for averting criminal money flows through the economic infrastructure. According to The Financial Action Task Force (FATF), illicit funds amount to 3.6% of global GDP1. A well-known criminal activity is money laundering. Money laundering is the process of disguising the illegal origin of the money. According to the FATF 2.7% of the global GDP is laundered per year. As part of the banking license, banks are obligated to investigate their clients’ payment behavior and report any suspicious activities.

Banks have built a wide range of applications and procedures into their daily operations to identify and detect money laundering. Broadly speaking, these financial crime detection efforts can fall into two areas. The first area of investigation, Know Your Customer (KYC), looks into the client profile. Similar to what we’ve seen in Chapter 3 with the Customer 360 use case, analysts need to conduct client due diligence. This client risk assessment can happen at multiple stages of the client lifecycle, such as during new client take over (NCTO) or during a periodic review. The second area of investigation,Transaction Monitoring, mainly focuses on identifying criminal behavior through bank transactions. Here analysts try to identify unusual payment patterns between sender and beneficiaries. Although these two investigation areas often overlap from a bank operational perspective and on a risk management level, we will mainly focus on transaction monitoring in this chapter.

Transaction monitoring involves thorough investigations into entities that show suspicious payment behavior. Analysts start these investigations from an entity that has been flagged by business rules and moves from there to other related instances to explore high-risk interactions. Thus analysts do not know the complete picture of how the money flows and lack visibility on where the flagged entity is in the entire money trail. To gain this visibility, they have to query step-by-step the next payment interaction to build up a complete picture of the payment network. Therefore analysts need an approach that helps them retrieve a set of consecutive payments and the parties involved in those payments.

Solution: Modeling Financial Crimes as Network Patterns

Traditional transaction monitoring relies on rule-based systems where fixed risk indicators are triggered if it applies to a client’s payment behavior. Such a risk indicator could be when, for example, clients received 15,000 USD cash in their account and immediately sent that money to several third-party accounts. This kind of activity is known as layering in the context of anti-money laundering (AML). It indicates a suspicious activity because it revolves around a large amount of cash, and that money moves to several third parties, making it harder to backtrack its origin. There are two major problems with relying on rule-based risk indicators. First, the analyst is still required to do an in-depth follow-up investigation on flagged clients, which involves querying consecutive payments between different clients. Second, rule-based risk indicators are set up based on analysts’ known patterns. Thus they only cover cases from historical records, making it challenging to identify risks from unforeseen patterns.

When modeling this problem as a network, it becomes easier to identify high-risk patterns because we can visualize the money flow directly from the graph data model. Doing so shows us how the money moves in a network and which parties are involved in those payment interactions. This graph approach solves the first problem because the graph pattern search will discover the consecutive payments for the analyst. It also solves the second problem because the network will expose all the relationships between involved parties, including those that the analysts do not explicitly query.

Implementing Financial Crime Pattern Searches

TigerGraph provides a graph for detecting money laundering as a cloud Starter Kit. Follow the installation steps from chapter 2 to install the Starter Kit. After the installation we will use the Starter Kit to design our money laundering network and explore how we can detect suspicious payment interactions on this network.

The Fraud and Money Laundering Detection Starter Kit

UsingTigerGraph Cloud, deploy a new cloud solution and select Fraud and Money Laundering as the Starter Kit. Once this Starter Kit is installed, you can load the data following the Load Data and Install Queries steps for a Starter Kit in chapter 2.

Graph Schema

The Fraud and Money Laundering Detection Starter Kit contains over 4.3M vertices and 7M edges, with a schema that has four vertex types and five edge types. Figure 4-1 shows the graph schema of this Starter Kit.

Graph schema for Fraud and Money Laundering Detection Starter Kit
Figure 4-1. Graph schema for Fraud and Money Laundering Detection Starter Kit

In Table 4-1 we describe the four vertex types. A User has a central role in a payment interaction, where it can receive and send payments. Transaction is the payment itself. Device_Token is a unique ID number that refers to the device used for the payment, and Payment_Instrument refers to the type of instrument used for the payment.

Table 4-1. Vertex types in Fraud and Money Laundering Detection Starter Kit
Vertex Type Description
User A person who is involved in a payment
Transaction A payment
Device_Token A unique ID number used to carry out the Transaction
Payment_Instrument An instrument to execute the payment

There are two types of relationships between User and Transaction. A User can receive a transaction, denoted with User_Receive_Transaction, or a User can send a transaction, marked with User_Transfer_Transaction. A User can refer another User, which is indicated by User_Refer_User. The edge type User_to_Payment links a User to a Payment_Instrument (check, cash, warrant, etc.) used to carry out a transaction. Finally, the User_to_Device edge type connects a User to the Device_Token used when making an electronic payment.

Queries and Analytics

The queries included in this starter kit showcase how graphs can help analysts detect high-risk payment behavior to combat fraud and money laundering. We’ll first give a high-level description of the pattern that each query looks for and how this pertains to transaction fraud or money laundering. We then go into depth for three of them, to give you a better idea of how they are implemented.

  1. Circle Detection: This query detects when money moves in a circular flow. It selects all the Transaction elements which form a chain from an input User and then return to that User. If the amount of money that comes back is close to the amount that went out, then this may indicate money laundering.

  2. Invited User Behavior: This query looks for suspicious patterns of User_Refer_User behavior, which may indicate that a User is collaborating with other parties to collect referral bonuses. It looks at the number of referrals within two hops of a source User, and at the number of transactions these users have conducted.

  3. Repeated User: This query discovers if there is a connection among User elements that send money to the same receiver. It starts with the input User who receives money and selects all other User elements that send the money to that input User. Then it checks if there is a path between those senders using Device_Token, Payment_Instrument, and User.

  4. Same Receiver Sender: This query detects if a User uses a fake account to send money to itself. Given an input Transaction, this query returns true if the receiver and sender can be linked to each other by Device_Token and Payment_Instrument.

  5. Transferred Amount: This query looks within a given time window for the total amount of funds transferred out from the Users who are connected within a few hops of a source User. While not directly suspicious, a high volume of funds could help to build the case for AML layering.

  6. Multi Transaction: This query showcases the payments between two networks of User elements. Starting with an input Transaction, the first group is a network of User elements related to the sending party. The second group is a network of User elements from the receiving party. Then visualize all the money flow between the User elements of these two groups.

We now take a closer look at Invited user Behavior, Multi Transaction, and Circle Detection.

Invited User Behavior

This pattern assumes that a User can earn a referral bonus for referring a new User to the bank. This query contains a two-hop traversal implementation. We start our traversal from a given input_user. The first hop selects all the invited User elements that are invited by this input_user. Then, with the second hop, we collect all the User elements that the first order invitees invite. We then aggregate the transaction amount of those invitees. The input_user is a fraudulent User if the amount of money directly being transferred is high, while the aggregated money from the second-order invitees is low or zero. The intuition behind this is that input_user has many fake referrals that fuel itself with referral bonuses so that it can send a large number of transactions.

Graph traversal pattern to detect fraudulent users that conduct activities to earn referral bonus
Figure 4-2. Graph traversal pattern to detect fraudulent users that conduct activities to earn referral bonus

First we declare some accumulator variables to store our aggregated data.

  SumAccum<INT> @@num_invited_persons;
  SumAccum<FLOAT> @@total_amount_sent;
  SetAccum<EDGE> @@edges_to_display;

The variable @@num_invited_persons stores the number of second-hop invitees. We aggregate the amount of all transactions from the one-hop invitees and store that in @@total_amount_sent. With @@edges_to_display we store all the edges (User_Ref_User) between the input User and a referred User, so that the visualization system knows to display them.

Then, we find the one-hop invitees referred by the source User. We save each edge between the Start user and an invitee in @@display_edge_set.

    Start = {input_user};

    First_invitees = SELECT t
        FROM Start:s - (User_Refer_User>:e) - :t
        ACCUM @@edges_to_display += e;

In the FROM clause, we don’t need to specify what type of vertices we are targeting, because the edge type (User_Refer_User) only permits one type of target vertex (User)..

Next, we add up the amount of money that these first-order invitees have sent out. Each Transaction has an attribute called amount.

    Trans = SELECT t
        FROM First_invitees:s - (User_Transfer_Transaction>:e) - :t
            @@total_amount_sent += t.amount,
            @@edges_to_display += e;

Finally, we get the additional invitees referred by first-hop invitees. This search looks very much like the first hop, with two additional steps:

We check that we are not hopping back to the source User.

We count the number of second-order invitees.

    Second_invitees = SELECT t
        FROM First_invitees:s - (User_Refer_User>:e) - :t
        WHERE t != input_user
        ACCUM @@edges_to_display += e
        POST-ACCUM (t) @@num_invited_persons += 1;

Multi Transaction

Analysts motivate that criminals transfer money between two networks. The following query exposes this intuition. Given any input transaction, the first network consists of related accounts from the sender of that transaction, and the second network consists of associated accounts from the receiving party. Then we look for payment activities between all parties from those two networks.

Graph traversal pattern to find transaction networks from sending and receiving parties
Figure 4-3. Graph traversal pattern to find transaction networks from sending and receiving parties

We start with selecting the sender and receiver User elements for a given Transaction by traversing User_Transfer_Transaction or User_Receive_Transaction edge types.

    Start (ANY) = {source_transaction};
    Start = SELECT t FROM 

In the FROM clause, we are traversing from a Transaction (source_transaction) to User elements, which is the reverse direction of the User_Receive_Transaction and User_Transfer_Transaction edges. That is why the direction arrows point to the left and are on the left side of the edge type names. Alternatively, if those edges have reverse edge types defined, we could use their reverse edges instead (and use right-facing arrows.)

We use cases to determine if the User is a receiving or sending party of the Transaction. If a User connects to a Transaction via User_Receive_Transaction, we set @from__receiver to true and add that User to the @@receiver_set. In other cases, the User is a sending party of the Transaction, so we set @from_sender to true and add this User to @@sender_set.

    CASE WHEN e.type == "User_Receive_Transaction" THEN
      t.@from_receiver += true,
      @@receiver_set += t
      t.@from_sender += true,
      @@sender_set += t

Now that we know the sender and receiver, we find User elements that belong to the receiving or sending party. That is, we traverse over User_to_Device or User_to_Payment edges and add User elements to either the @@sender_set or @@receiver_set if they exist within four hops (WHILE Start.size() > 0 LIMIT 4 DO). Since it takes two hops to make a transaction (Sender - Transaction - Receiver), 4 hops equals a chain of two transactions.

    WHILE Start.size() > 0 LIMIT 4 DO
        Start = SELECT t FROM Start:s - ((User_to_Device|User_to_Payment):e) - :t
            WHERE t.@from_receiver == false AND t.@from_sender == false
                t.@from_receiver += s.@from_receiver,
                t.@from_sender += s.@from_sender,
                @@edgeSet += e
            POST-ACCUM (t)
                CASE WHEN t.type == "User" AND t.@from_sender == true THEN
                    @@sender_set += t
                WHEN t.@from_receiver == true THEN
                    @@receiver_set += t

If we end up at a User vertex type and that User is a sending party, we add that User to @@sender_set. If t.@from_receiver is true, then the User belongs to the receiving party, and we add that User to @@receiver_set.

After forming the sending and receiving groups, we now look for transactions other than the source transaction which connect the sender and receiver groups. First we find transactions adjacent to the receiver set:

    Start = {@@receiver_set};
    Rec_T = SELECT t FROM 
        Start:s - ((User_Receive_Transaction>|User_Transfer_Transaction>):e) - :t

Then we find transactions adjacent to the sender set.

    Start = {@@sender)set};
    Send_T = SELECT t FROM    
        Start:s - ((User_Receive_Transaction>|User_Transfer_Transaction>):e) - :t
            WHERE t != transaction
                t.@from_sender += s.@from_sender,
                @@edge_set += e
            HAVING t.@from_receiver AND t.@from_sender;

The HAVING clause checks whether a transaction is considered part of the receiving group and the sending group.

Circle Detection

The essence of money laundering is to transfer money so that it becomes a challenge to backtrace its origin. Criminals have several routing schemas to mask the source of their illicit money. A popular transfer schema is one where the money is transferred via various intermediaries to return eventually to one of the senders. In this case, the money traverses in a circular pattern. With graphs, we can detect such circular patterns easier than with traditional databases because we can hop repeatedly from one transaction to the next until a transaction arrives at the originator. As we explained in Chapter 2, graph hops are computationally much cheaper than table joins in a relational database.

Example circular money flow
Figure 4-4. Example circular money flow

In Figure 4-4 we see such a circular money flow. In this example, Adam is the originator and sends 100 USD to Ben. Ben sends 60 USD to Cor, and she sends 40 USD to Daisy, who in her turn, sends 100 USD back to Adam. We show in this example that Ben, Cor, and Daisy do not send the same amount of money they have received to the next person in the chain. Criminals do this to add another layer of noise by making the starting amount branch out into different chunks across various intermediaries, making it harder to find out who the originator is and what amount is being laundered.

The query circle_detection finds all the circular transaction chains starting from a given User (source_id) which have up to a maximum number of transactions per circle (max_transactions). Since there are two hops per transaction (Sender → Transaction → Recipient), the circles can have up to twice as many hops. To be a valid circle, the sequence of transactions in a circle must move forward in time. For example, for this to be a valid circle:

source_id → txn_A → txn_B → txn_C → source_id

Then txn_A.ts < txn_B.ts < txn_C.ts, where ts is the timestamp of a transaction.

Because there are so many possible paths to check, the query’s implementation employs a couple of performance and filtering techniques. The first one is bidirectional search. Bidirectional search searches forward from the starting point while simultaneously searching backward from the ending point. It is faster to conduct two half-length searches than one full-length search. When the two searches intersect, you have a complete path.

The second technique is filtering out implausible times. The query starts with a preparatory step; one of its purposes is to set some boundary times.

  Seed = {source_id};
  Seed = SELECT src
      FROM Seed:src - ((User_Transfer_Transaction>|User_Receive_Transaction>):e)
                    - Transaction:tgt
          CASE WHEN 
            e.type == "User_Transfer_Transaction" 
            @@min_src_send_time += tgt.ts
            @@max_src_receive_time += tgt.ts
      HAVING @@max_src_receive_time >= @@min_src_send_time;

Starting from source_id, make one step both forward (User_Transfer_Transaction) and backward (User_Receive_Transaction). Find the earliest time of any transaction sent by source_id (@@min_src_send_time) and the latest time of any transaction received by source_id (@@max_src_receive_time). Check to make sure that @@max_src_receive_time >= @@min_src_send_time. These global limits will also be used later to check the plausibility of other transactions which are candidates for a circular path.

Then we begin Phase 1 of the search. Starting from source_id, step forward two hops (equals one transaction). Using Figure 4-4 as an example, this would step from Adam to Ben. Also traverse two hops backward (Adam to Daisy). Iterate this combination of steps, moving forward (or backward) in time until each direction has stepped halfway around a maximum size circle. Table 5-2 shows the paths that would be traversed if we consider the graph of Figure 4-4.

Table 4-2. Forward and reverse paths, using the graph of Figure 4-4
Iteration 1 2 3
Forward Adam→Ben Ben→Cor Cor→Daisy
Reverse Adam→Daisy Daisy→Cor Cor→Ben

The code snippet below shows a simplified version of one iteration of the forward traversal. For brevity, the checking of timing and step constraints has been omitted.

          Fwd_set = SELECT tgt
              FROM Fwd_set:src - (User_Transfer_Transaction>:e) - Transaction:tgt
              WHERE tgt.ts >= @@min_src_send_time 
                  AND src.@min_fwd_dist < GSQL_INT_MAX 
                  AND tgt.@min_fwd_dist == GSQL_INT_MAX
              ACCUM tgt.@min_fwd_dist += src.@min_fwd_dist + 1
              … // POST-ACCUM clause to check time and step constraints
          Fwd_set = SELECT tgt
              FROM Fwd_set:src - (<User_Receive_Transaction:e) - User:tgt
              WHERE src.@min_fwd_dist < GSQL_INT_MAX 
                  AND tgt.@min_fwd_dist == GSQL_INT_MAX
              ACCUM tgt.@min_fwd_dist += src.@min_fwd_dist + 1               
	      … // POST-ACCUM clause to check time and step constraints
              HAVING tgt != source_id;

Looking at Table 5-2, we see that after the second iteration, a forward path and a reverse path have met at a common point, Cor. We have a circle! But wait. What if the Ben→Cor timestamp is later than the Cor→Daisy timestamp? If so, then it’s not a valid circle.

In Phase 2 of the query, we discover and validate circular paths by doing the following : For the forward search, continue traversing forward but only along paths which were previously traversed in the reverse direction and which move forward in time. In our example, if max_transactions = 2 so that Phase 1 got as far as Ben→Cor, then Phase 2 could continue on to Cor→Daisy, but only because we had already traversed Daisy→Cor in Phase 1 and only if the timestamps continue to increase.

          Fwd_set = SELECT tgt
              FROM Fwd_set:src - (User_Transfer_Transaction>:e) - Transaction:tgt
              // tgt must have been touched in the reverse search above
              WHERE tgt.@min_rev_dist < GSQL_INT_MAX
                  AND tgt.ts >= @@min_src_send_time 
                  AND src.@min_fwd_dist < GSQL_INT_MAX 
                  AND tgt.@min_fwd_dist == GSQL_INT_MAX
              ACCUM tgt.@min_fwd_dist += src.@min_fwd_dist + 1               
                  CASE WHEN 
                    tgt.@min_fwd_dist < GSQL_INT_MAX 
                    AND tgt.@min_rev_dist < GSQL_INT_MAX
                    AND tgt.@min_fwd_dist + tgt.@min_rev_dist
                    	<= 2 * STEP_HIGH_LIMIT
                    tgt.@is_valid = TRUE
          Fwd_set = SELECT tgt
              FROM Fwd_set:src - (<User_Receive_Transaction:e) - User:tgt
              //tgt must have been touched in the reverse search above
              WHERE tgt.@min_rev_dist < GSQL_INT_MAX
                  AND src.@min_fwd_dist < GSQL_INT_MAX 
                  AND tgt.@min_fwd_dist == GSQL_INT_MAX
              ACCUM tgt.@min_fwd_dist += src.@min_fwd_dist + 1               
                  CASE WHEN 
                    tgt.@min_fwd_dist < GSQL_INT_MAX 
                    AND tgt.@min_rev_dist < GSQL_INT_MAX
                    AND tgt.@min_fwd_dist + tgt.@min_rev_dist
                    	<= 2 * STEP_HIGH_LIMIT
                    tgt.@is_valid = TRUE
              HAVING tgt != source_id;

After Phase 2, we have found our circles. There is a Phase 3 which traverses the circles and marks the vertices and edges so that they can be displayed. Figures 5-5 and 5-6 show example results from circle detection, for maximum circle sizes of 4, 5, and 6 transactions. As the circle size limit increases, more circles are found.

Circle detection results for source_id 111 and max_transactions of 4 and 5  respectively.
Figure 4-5. Circle detection results for source_id=111 and max_transactions of 4 and 5, respectively.
Circle detection results for source_id 111 and max_transactions 6
Figure 4-6. Circle detection results for source_id=111 and max_transactions=6

Chapter Summary

Financial fraud is a serious and costly problem that most businesses and all financial institutions must face. Better and faster techniques to detect and stop fraud are needed. We showed that graph data modeling and graph queries are a powerful way to detect suspicious patterns of activity that would have otherwise gone unnoticed. Graph modeling makes it easy to address three key phases for searching for patterns: describing the search, performing the search, and examining the results.

We showed that even with a simple graph schema, queries can be aimed at detecting a variety of different behaviors, such as a high concentration of referral bonuses without a corresponding increase in business, a concentration of transactions from one community to another community, and a high concentration and high dollar value of circular transaction sequences.

In the next chapter, we will offer a systematic approach to analyzing graphs. In particular, we will delve into the rewarding world of graph measures and graph algorithms.

Get Graph-Powered Analytics and Machine Learning with TigerGraph now with the O’Reilly learning platform.

O’Reilly members experience live online training, plus books, videos, and digital content from nearly 200 publishers.