Perspectives on Data Science for Software Engineering presents the best practices of seasoned data miners in software engineering. The idea for this book originated at the 2014 conference at Dagstuhl, an invitation-only gathering of leading computer scientists who meet to identify and discuss cutting-edge informatics topics.
At the 2014 conference, many discussions centered on how to transfer knowledge from seasoned software engineers and data scientists to newcomers in the field. While many books cover data mining and software engineering basics, they present only the fundamentals and lack the perspective that comes from real-world experience. This book offers unique insights into the wisdom of the community’s leaders, who gathered to share hard-won lessons from the trenches.
Ideas are presented in digestible chapters designed to be applicable across many domains. Topics include data collection, data sharing, data mining, and how to use these techniques in successful software projects. Newcomers to software engineering data science will learn the tips and tricks of the trade, while more experienced data scientists will benefit from war stories that show what traps to avoid.
- Presents the wisdom of community experts, derived from a summit on software analytics
- Provides contributed chapters that share discrete ideas and techniques from the trenches
- Covers top areas of concern, including mining security and social data, data visualization, and cloud-based data
- Presents ideas in clear chapters designed to be applicable across many domains
Table of contents
- Cover image
- Title page
- Table of Contents
- Perspectives on data science for software engineering
- Software analytics and its application in practice
Seven principles of inductive software engineering: What we do is different
- Different and Important
- Principle #1: Humans Before Algorithms
- Principle #2: Plan for Scale
- Principle #3: Get Early Feedback
- Principle #4: Be Open Minded
- Principle #5: Be Smart With Your Learning
- Principle #6: Live With the Data You Have
- Principle #7: Develop a Broad Skill Set That Uses a Big Toolkit
- The need for data analysis patterns (in software engineering)
- From software data to software theory: The path less traveled
- Why theory matters
- Mining apps for anomalies
- Embrace dynamic artifacts
- Mobile app store analytics
- The naturalness of software
- Advances in release readiness
- How to tame your online services
- Measuring individual productivity
- Stack traces reveal attack surfaces
- Visual analytics for software engineering data
- Gameplay data plays nicer when divided into cohorts
- A success story in applying data science in practice
- There's never enough time to do all the testing you want
- The perils of energy mining: measure a bunch, compare just once
- Identifying fault-prone files in large industrial software systems
- A tailored suit: The big opportunity in personalizing issue tracking
- What counts is decisions, not numbers—Toward an analytics design sheet
- A large ecosystem study to understand the effect of programming languages on code quality
- Code reviews are not for finding defects—Even established tools need occasional evaluation
- Look for state transitions in temporal data
- Card-sorting: From text to themes
- Tools! Tools! We need tools!
- Evidence-based software engineering
Which machine learning method do you need?
- Learning Styles
- Do Additional Data Arrive Over Time?
- Are Changes Likely to Happen Over Time?
- If You Have a Prediction Problem, What Do You Really Need to Predict?
- Do You Have a Prediction Problem Where Unlabeled Data are Abundant and Labeled Data are Expensive?
- Are Your Data Imbalanced?
- Do You Need to Use Data From Different Sources?
- Do You Have Big Data?
- Do You Have Little Data?
- In Summary…
- Structure your unstructured data first!: The case of summarizing unstructured data with tag clouds
Parse that data! Practical tips for preparing your raw data for analysis
- Use Assertions Everywhere
- Print Information About Broken Records
- Use Sets or Counters to Store Occurrences of Categorical Variables
- Restart Parsing in the Middle of the Data Set
- Test on a Small Subset of Your Data
- Redirect Stdout and Stderr to Log Files
- Store Raw Data Alongside Cleaned Data
- Finally, Write a Verifier Program to Check the Integrity of Your Cleaned Data
- Natural language processing is no free lunch
- Aggregating empirical evidence for more trustworthy decisions
- If it is software engineering, it is (probably) a Bayesian factor
- Becoming Goldilocks: Privacy and data sharing in “just right” conditions
- The wisdom of the crowds in predictive modeling for software engineering
- Combining quantitative and qualitative methods (when mining software data)
- A process for surviving survey design and sailing through survey deployment
- Log it all?
- Why provenance matters
- Open from the beginning
- Reducing time to insight
- Five steps for success: How to deploy data science in your organizations
- How the release process impacts your software analytics
- Security cannot be measured
- Gotchas from mining bug reports
- Make visualization part of your analysis process
- Don't forget the developers! (and be careful with your assumptions)
- Limitations and context of research
- Actionable metrics are better metrics
- Replicated results are more trustworthy
- Diversity in software engineering research
- Once is not enough: Why we need replication
- Mere numbers aren't enough: A plea for visualization
- Don’t embarrass yourself: Beware of bias in your data
- Operational data are missing, incorrect, and decontextualized
- Data science revolution in process improvement and assessment?
- Correlation is not causation (or, when not to scream “Eureka!”)
- Software analytics for small software companies: More questions than answers
- Software analytics under the lamp post (or what Star Trek teaches us about the importance of asking the right questions)
What can go wrong in software engineering experiments?
- Operationalize Constructs
- Evaluate Different Design Alternatives
- Match Data Analysis and Experimental Design
- Do Not Rely on Statistical Significance Alone
- Do a Power Analysis
- Find Explanations for Results
- Follow Guidelines for Reporting Experiments
- Improving the reliability of experimental results
- One size does not fit all
- While models are good, simple explanations are better
- The white-shirt effect: Learning from failed expectations
- Simpler questions can lead to better insights
- Continuously experiment to assess values early on
Lies, damned lies, and analytics: Why big data needs thick data
- How Great It Is, to Have Data Like You
- Looking for Answers in All the Wrong Places
- Beware the Reality Distortion Field
- Build It and They Will Come, but Should We?
- To Classify Is Human, but Analytics Relies on Algorithms
- Lean in: How Ethnography Can Improve Software Analytics and Vice Versa
- Finding the Ethnographer Within
- The world is your test suite
- Title: Perspectives on Data Science for Software Engineering
- Release date: July 2016
- Publisher(s): Morgan Kaufmann
- ISBN: 9780128042618