Use CodeLlama to Develop a Fraud Detection Program

The first step in identifying potential fraud is an initial screening of the dataset. A simple but effective way to perform this screening is to use Benford’s law, also known as the “law of first digits.” Hal Varian has pointed out that Benford’s law has been applied to detect fraud in accounting data.

According to this law, the leading digits in many naturally occurring datasets follow a logarithmic distribution in which smaller leading digits are more likely to occur. More formally, according to Benford’s law, a leading digit d (from 1 through 9) should occur with probability P(d) = log10(d+1) - log10(d). So, for example, the leading digit 1 should occur approximately 30.1% of the time, the digit 2 about 17.6% of the time, and so on, down to about 4.6% for the digit 9. If a dataset follows this law, any given number is more than six times as likely to start with 1 as with 9:

Figure 1. Distribution of first digits according to Benford’s law. Each bar represents the expected frequency of that digit in the first position of a number. Source: Wikipedia.
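The expected frequencies can be reproduced directly from the formula P(d) = log10(d+1) - log10(d). The following is a minimal sketch in Python; the function name benford_expected is an illustrative choice, not something taken from the original text.

```python
import math


def benford_expected():
    """Return the expected probability of each leading digit 1..9.

    Implements P(d) = log10(d + 1) - log10(d). The function name is
    hypothetical and used only for illustration.
    """
    return {d: math.log10(d + 1) - math.log10(d) for d in range(1, 10)}


expected = benford_expected()

# Print one line per digit, as a percentage.
for digit, prob in expected.items():
    print(f"{digit}: {prob:.1%}")

# Ratio of the most likely leading digit (1) to the least likely (9).
ratio_1_to_9 = expected[1] / expected[9]
print(f"1 vs. 9 ratio: {ratio_1_to_9:.2f}")
```

Because the nine terms telescope to log10(10) - log10(1), the probabilities sum to 1, which is a quick sanity check on any implementation.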

Benford’s law can be used as an initial screening mechanism to identify datasets that require further investigation. If the distribution of first digits in a dataset deviates significantly from what Benford’s ...
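One possible way to quantify such a deviation is a chi-square goodness-of-fit statistic over the nine leading digits. The sketch below is an assumption about how the screening could be done, not the method used in the original text; the helper names first_digit and benford_chi_square are hypothetical, and the 5% critical value for 8 degrees of freedom (15.507) comes from standard chi-square tables.

```python
import math
from collections import Counter

# Standard tabulated chi-square critical value: 8 degrees of freedom, 5% level.
CHI2_CRITICAL_8DF_05 = 15.507


def first_digit(x):
    """Return the leading nonzero digit of x, or None for 0, NaN, or infinity."""
    x = abs(float(x))
    if x == 0 or math.isnan(x) or math.isinf(x):
        return None
    # Scientific notation always starts with the leading significant digit.
    return int(f"{x:.15e}"[0])


def benford_chi_square(values):
    """Chi-square statistic of observed leading digits against Benford's law."""
    digits = [d for d in map(first_digit, values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no usable values")
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * (math.log10(d + 1) - math.log10(d))
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat


# Illustrative inputs (not real accounting data): powers of 2 are known to
# track Benford's law closely, while three-digit integers spread evenly
# from 100 to 999 do not.
benford_like = [2.0 ** k for k in range(1, 201)]
uniform_like = list(range(100, 1000))

stat_benford = benford_chi_square(benford_like)
stat_uniform = benford_chi_square(uniform_like)

for name, stat in [("powers of 2", stat_benford), ("uniform 100-999", stat_uniform)]:
    flag = "investigate further" if stat > CHI2_CRITICAL_8DF_05 else "no red flag"
    print(f"{name}: chi-square = {stat:.2f} ({flag})")
```

A statistic above the critical value only marks a dataset for closer inspection. It is not evidence of fraud by itself, since many legitimate datasets (for example, values confined to a narrow range) do not follow Benford’s law.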
