Chapter 6. Fraud Detection

Generating relational data as in the previous chapter is very useful, but there are also many cases where the samplers packaged into the base log-synth system are simply not sufficient for simulating certain kinds of data.

This is particularly true when we need to generate stateful transactional histories, largely because maintaining and modifying state is much easier with the capabilities of a full programming language to draw on. Common use cases where this approach is needed include network monitoring (different sources produce different kinds of events), marine or air position tracking (different craft must follow tracks that reflect realistic physics), and financial transaction generation (different consumers behave differently and may change their behavior). The key aspect of all stateful transaction streams is that the next transaction depends on the current state, and the current state changes as a result of each transaction. We deliberately chose not to build a real programming language into log-synth schema definitions, so for these cases you need something more powerful than a schema alone.
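
To make that feedback loop concrete, here is a minimal sketch in Python. It is not log-synth code and does not use the log-synth API. The `ConsumerState` class, the `next_transaction` and `generate_history` functions, the starting balance, and the spending shares are all hypothetical choices made for illustration only.

```python
import random
from dataclasses import dataclass


@dataclass
class ConsumerState:
    """Hypothetical per-consumer state; not a log-synth type."""
    balance: float
    big_spender: bool = False


def next_transaction(state: ConsumerState, rng: random.Random) -> dict:
    """Draw one transaction from the current state, then update the state."""
    # The draw depends on the current state: the amount is capped by a share
    # of the balance, and the share depends on the consumer's current behavior.
    share = 0.5 if state.big_spender else 0.1
    amount = round(rng.uniform(0.0, share * state.balance), 2)

    # The transaction changes the state that the next draw will see.
    state.balance = round(state.balance - amount, 2)

    # Assumed for illustration: a small chance that behavior changes for good.
    if not state.big_spender and rng.random() < 0.05:
        state.big_spender = True

    return {
        "amount": amount,
        "balance_after": state.balance,
        "big_spender": state.big_spender,
    }


def generate_history(n: int, seed: int = 0) -> list:
    """Generate n transactions for one consumer, reproducibly for a given seed."""
    rng = random.Random(seed)
    state = ConsumerState(balance=1000.0)
    return [next_transaction(state, rng) for _ in range(n)]


if __name__ == "__main__":
    for tx in generate_history(5, seed=42):
        print(tx)
```

Each record here cannot be sampled independently of the ones before it, because both the balance and the behavior flag carry over from one draw to the next. That carried-over state is what a sampler with no memory between samples cannot express.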

Fortunately, these stateful use cases can be handled fairly easily by defining a new sampler in Java and then referencing that sampler in a schema. Log-synth already includes one such sampler, the common-point-of-compromise sampler, and you can use it as a template to design other extensions. But first, we describe how this sampler works and how it was used in the real-world fraud example ...
