Sharing Big Data Safely by Ellen Friedman, Ted Dunning

Chapter 6. Fraud Detection

Generating relational data as in the previous chapter is very useful, but in many cases the samplers packaged with the base log-synth system are simply not sufficient to simulate the kind of data you need.

This is particularly true when we need to generate stateful transactional histories, largely because maintaining and modifying state is much easier with the capabilities of a full programming language to draw on. Common use cases where this approach is needed include network monitoring (different sources produce different kinds of events), marine or air position tracking (different craft need tracks that reflect realistic physics), and financial transaction generation (different consumers behave differently and may change their behavior). The key aspect of all stateful transaction streams is that the next transaction depends on the current state, and the current state changes as a result of transactions. We deliberately chose not to build a real programming language into log-synth schema definitions, so for these cases you need something more powerful.
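The feedback loop between state and transactions can be illustrated with a minimal sketch in plain Java. It does not use the log-synth API; the class names `StatefulTransactionSketch`, `Consumer`, and `Transaction`, and the rule that a consumer spends at most a tenth of the remaining balance, are assumptions made up purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative only: not part of log-synth. All names and rules are hypothetical.
public class StatefulTransactionSketch {

    /** One generated event: the amount spent and the balance left afterwards. */
    static final class Transaction {
        final int amountCents;
        final int balanceAfterCents;

        Transaction(int amountCents, int balanceAfterCents) {
            this.amountCents = amountCents;
            this.balanceAfterCents = balanceAfterCents;
        }
    }

    /** A hypothetical consumer whose spending depends on the remaining balance. */
    static final class Consumer {
        private final Random rand;
        private int balanceCents; // the state carried from one transaction to the next

        Consumer(long seed, int startingBalanceCents) {
            this.rand = new Random(seed);
            this.balanceCents = startingBalanceCents;
        }

        Transaction next() {
            // The next transaction depends on the current state:
            // assumed rule, spend at most a tenth of what is left (at least 1 cent).
            int limit = Math.max(1, balanceCents / 10);
            int amount = balanceCents == 0 ? 0 : 1 + rand.nextInt(limit);
            // The state changes as a result of the transaction.
            balanceCents -= amount;
            return new Transaction(amount, balanceCents);
        }
    }

    /** Generates n transactions for a single consumer. */
    static List<Transaction> generate(long seed, int startingBalanceCents, int n) {
        Consumer consumer = new Consumer(seed, startingBalanceCents);
        List<Transaction> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            result.add(consumer.next());
        }
        return result;
    }

    public static void main(String[] args) {
        for (Transaction t : generate(42L, 100_000, 5)) {
            System.out.println(t.amountCents + " " + t.balanceAfterCents);
        }
    }
}
```

Even in this toy version, each amount is bounded by a balance that earlier transactions have already reduced. That kind of dependency is awkward to express in a declarative schema but only takes a few lines in a general-purpose language.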

However, these stateful use cases can be handled fairly easily by defining a new sampler in Java and then referencing that sampler in a schema. Log-synth already includes one such sampler, the common-point-of-compromise, and you can use it as a template to design other extensions. But first, we describe how this sampler works and how it was used in the real-world fraud example ...
