Appendix A. Additional Resources

Many of the following resources were mentioned in the text; others provide additional options for digging deeper into the topics discussed in this book.

Log-synth Open Source Software

Log-synth is open source software that gives you a simple way to generate synthetic data shaped to the needs of your project. It can generate many kinds of data, and it is fast and very flexible.

Please note that log-synth is not only open source but also open community: contributions are very welcome.

  • Log-synth on GitHub: The site includes the software with various prepackaged samplers, extensions related to the fraud detection use case, and documentation. http://bit.ly/tdunning-log-synth

  • Sample code for this book on GitHub: Sharing Big Data Safely: Managing Data Security, Chapters 5 and 6. http://bit.ly/log-synth-share-data

    The site includes the source code for the Chapter 5 example that builds sample relational data, as well as source code that shows how to generate and analyze data for the single-point-of-compromise fraud model described in Chapter 6.

  • “Realistic Fake Data” whiteboard walkthrough video by Ted Dunning: https://www.mapr.com/log-synth

Apache Drill and Drill SQL Views

Apache Drill is an open source, open community Apache project that provides a highly scalable, highly flexible SQL query engine for data stored in Apache Hadoop distributions, MongoDB, Apache HBase, MapR-DB, and more.

You’re invited to get active in the ...
