Preface
The future of data science and artificial intelligence has never looked brighter. AI now beats humans at games ranging from twitchy, reflexive Pong to deep, contemplative Go. Deep learning models recognize objects nearly as well as humans. Some even say self-driving cars perform better than their distracted human counterparts. The past decade's massive gains in data volume, storage capacity, and computing power have enabled rapid advances in data science.
And of course technology has spread into every facet of your business (from finance and sales to production and logistics). However, is each part of your business turbocharged by data science and AI? Likely not. As wonderful as they might be, if you are not designing a self-driving car or predicting customer behavior, you are probably not using these technologies.
Many organizations have access to business data from an enterprise resource planning (ERP) system such as SAP, and yours is likely no different. Data coming from a business system such as SAP is largely clean, because validations and checks are often in place before it is saved to the database (and cleaning the data is one of the most essential and least rewarding tasks of a data scientist). This means ERP data in SAP is ripe for the picking, and data science is here to do the harvesting!
Let's take a hypothetical scenario. The SAP Team at Big Bonanza Warehouse is in a constant state of process improvement. They know how to configure their SAP system to do the tasks their users want, and they play that system like a fiddle, dutifully taking requests and delivering solutions. However, there is a bit of a problem with reporting and analytics; they have a data warehouse and a business intelligence system, but developing reports is a multimonth process. The team often resorts to using standard ALV (ABAP List Viewer) reports, which are quite limited in power because they require a developer to code; in addition, it is very hard to harness the wealth of public data that could be used in conjunction with SAP. Just like at countless other enterprises, SAP data at Big Bonanza Warehouse is an island, siloed within its own system. Teams that don't work with SAP have no idea what's in there, and the teams that do work with it spend so much time maintaining the systems that they don't get the chance to look outside them.
SAP data shouldn't be an island, though. The team knows their data, how to find it, and what they want to do with it. However, when it comes to analyzing that data, everyone's hands are tied by that multimonth report development process.
Sound familiar? It's the story at nearly every SAP shop with whom we've ever worked. And that's a lot in our combined 30+ years of experience.
We want to give that SAP team (and yours!) some modern insight: tools and techniques they can use without defining data cubes or data warehouse objects, and without learning complex frontend reports. In this book, we'll present simple scenarios such as dumping data straight out of SAP into a flat file and into a reporting tool. This is useful for ad hoc reporting and investigations. We'll also consider more complex scenarios, including using extractor tools and neural network models in the cloud to analyze data in ways not possible within SAP or contemporary data warehouses.
How to Read This Book
You'll need to approach this book from a conceptual point of view. We present alternative techniques for analyzing business data. We ask (nay, we beg) the reader to think about business data (in particular SAP data) in new and interesting ways. This book is designed to awaken ideas around how to bridge the gap between your particular business data and the advances in data science. You need not be an expert in the complex algorithms that calculate gradient descent in a neural network, nor do you need to be an expert in your business data. But you do need to have a desire to straddle these two camps and have fun in the process.1
From the data scientist's perspective, the data science principles in this book are an introduction. If you can spot a sigmoid, tanh, or ReLU activation function at fifty paces, you can skip those parts. But we're betting that if your guru level is that high in data science, you're a novice at the SAP stuff. Focus on the SAP stories, which show you how to pull data out of the system and work with the real business data it holds.
From the SAP professional's perspective, you'll break out of traditional reporting and analytics models. You'll learn to think of business applications and reporting in machine and deep learning terms. This may sound mystical, but by the end of the book you will have the tools necessary to take this step. Along the way you'll automatically detect anomalies in sales data, predict the future from past data, process text as natural language, segment customers into smart groups, visualize all these things brilliantly, and teach bots to use business data.
In our world of AI and data science, asking the same old questions of your data is stale, naive, and (quite frankly) boring. We want you to ask questions of your data that you didn't even know you could ask. Maybe the price of tea in China really does have an outsize effect on your sales.
From the developer's perspective, you'll be inspired to learn wonderful programming languages like Python and R. We don't teach you these languages, but we challenge you to dip your toe into these warm and effervescent waters. If you are already an experienced R or Python developer, you're in good shape for the code sections. For the novice, we will point you to resources to get you started. Don't feel left out if you are inclined to use another language such as Java. The "meta" goal of this book is to get you to think about business data differently, and if that means you want to use Java, by all means do so.
Operationalizing data science is a whole book in itself. We'll frequently touch on how to operationalize ideas we present, but it is beyond the scope of this book to dive deep on creating robust pipelines.
Tip
Data scientists may be able to skip over Chapter 2. SAP professionals, you might be able to skip Chapter 3. The stories we tell later in the book merge these two disciplines, so we want readers who come from one or the other side to get a fair understanding of how we'll be poking around to work our magic.
Conventions Used in This Book
The following typographical conventions are used in this book:
- Italic: Indicates new terms, URLs, email addresses, filenames, and file extensions.
- Constant width: Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
- Constant width bold: Shows commands or other text that should be typed literally by the user.
- Constant width italic: Shows text that should be replaced with user-supplied values or by values determined by context.
Tip
This element signifies a tip or suggestion.
Note
This element signifies a general note.
Warning
This element indicates a warning or caution.
Using Code Examples
This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.
We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: "Practical Data Science with SAP by Greg Foss and Paul Modderman (O'Reilly). Copyright 2019 Greg Foss and Paul Modderman, 978-1-492-04644-8."
If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.
O'Reilly Online Learning
Note
For almost 40 years, O'Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.
Our unique network of experts and innovators share their knowledge and expertise through books, articles, conferences, and our online learning platform. O'Reilly's online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O'Reilly and 200+ other publishers. For more information, please visit http://oreilly.com.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
- O'Reilly Media, Inc.
- 1005 Gravenstein Highway North
- Sebastopol, CA 95472
- 800-998-9938 (in the United States or Canada)
- 707-829-0515 (international or local)
- 707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/practical-data-sci-sap.
To comment or ask technical questions about this book, send email to bookquestions@oreilly.com.
For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia
Acknowledgments
We would like to thank our technical reviewers Hau Ngo, Jesse Stiff, Franco Rizzo, Brad Barker, and Christoph Wertz for their valuable feedback. Each chapter was made better by their suggestions.
To Nicole, our fearless editor: you helped us stay calm and grounded in the process. Without your editorial guidance, we would have been lost in data scientific meanderings and code ramblings. Thank you for making each thing you touched more readable.
Greg would like to thank his wife Alycia for her patience, support, and insight and his brother Cory for help with the graphics. Of course, a leviathan thanks to Paul Modderman for his vision, ingenuity, and courage to embark on this journey.
Paul would like to thank his partner Christa Modderman for her wisdom and strength, his grandmother Lois Stratmann for the example set by her remarkable life in creative work, and his parents Mark and Linda for...everything. He wishes to acknowledge Tony Vanderpoel, Dean Stoffel, and Gavin Quinn for respectively encouraging, entrusting, and inspiring him to better himself professionally. Largest thanks goes to Greg Foss: a remarkable author who never backed down from a commitment to quality. Eleanor Modderman is and always will be his favorite.
Special thanks to Wade Krzmarzick for help with CRM scenarios.
1 If you're not the kind of person who has fun with data, how did you find this book?