The Art and Science of Analyzing Software Data

Name: The Art and Science of Analyzing Software Data
ISBN: 9780124115439

by Christian Bird, Tim Menzies, Thomas Zimmermann

Published: September 2015

Level: Beginner to intermediate

Length: 672 pages

Estimated reading time: 22h 58m

Language: English

Publisher: Morgan Kaufmann

Cover image
Title page
Table of Contents
Copyright
List of Contributors
Chapter 1: Past, Present, and Future of Analyzing Software Data
  Abstract
  Acknowledgments
  1.1 Definitions
  1.2 The Past: Origins
  1.3 Present Day
  1.4 Conclusion
Part 1: Tutorial-Techniques
Chapter 2: Mining Patterns and Violations Using Concept Analysis
  Abstract
  Acknowledgments
  2.1 Introduction
  2.2 Patterns and Blocks
  2.3 Computing All Blocks
  2.4 Mining Shopping Carts with Colibri
  2.5 Violations
  2.6 Finding Violations
  2.7 Two Patterns or One Violation?
  2.8 Performance
  2.9 Encoding Order
  2.10 Inlining
  2.11 Related Work
  2.12 Conclusions
Chapter 3: Analyzing Text in Software Projects
  Abstract
  3.1 Introduction
  3.2 Textual Software Project Data and Retrieval
  3.3 Manual Coding
  3.4 Automated Analysis
  3.5 Two Industrial Studies
  3.6 Summary
Chapter 4: Synthesizing Knowledge from Software Development Artifacts
  Abstract
  4.1 Problem Statement
  4.2 Artifact Lifecycle Models
  4.3 Code Review
  4.4 Lifecycle Analysis
  4.5 Other Applications
  4.6 Conclusion
Chapter 5: A Practical Guide to Analyzing IDE Usage Data
  Abstract
  Acknowledgments
  5.1 Introduction
  5.2 Usage Data Research Concepts
  5.3 How to Collect Data
  5.4 How to Analyze Usage Data
  5.5 Limits of What You Can Learn from Usage Data
  5.6 Conclusion
  5.7 Code Listings
Chapter 6: Latent Dirichlet Allocation: Extracting Topics from Software Engineering Data
  Abstract
  6.1 Introduction
  6.2 Applications of LDA in Software Analysis
  6.3 How LDA Works
  6.4 LDA Tutorial
  6.5 Pitfalls and Threats to Validity
  6.6 Conclusions
Chapter 7: Tools and Techniques for Analyzing Product and Process Data
  Abstract
  7.1 Introduction
  7.2 A Rational Analysis Pipeline
  7.3 Source Code Analysis
  7.4 Compiled Code Analysis
  7.5 Analysis of Configuration Management Data
  7.6 Data Visualization
  7.7 Concluding Remarks
Part 2: Data/Problem Focussed
Chapter 8: Analyzing Security Data
  Abstract
  8.1 Vulnerability
  8.2 Security Data “Gotchas”
  8.3 Measuring Vulnerability Severity
  8.4 Method of Collecting and Analyzing Vulnerability Data
  8.5 What Security Data has Told Us Thus Far
  8.6 Summary
Chapter 9: A Mixed Methods Approach to Mining Code Review Data: Examples and a Study of Multicommit Reviews and Pull Requests
  Abstract
  9.1 Introduction
  9.2 Motivation for a Mixed Methods Approach
  9.3 Review Process and Data
  9.4 Quantitative Replication Study: Code Review on Branches
  9.5 Qualitative Approaches
  9.6 Triangulation
  9.7 Conclusion
Chapter 10: Mining Android Apps for Anomalies
  Abstract
  Acknowledgments
  10.1 Introduction
  10.2 Clustering Apps by Description
  10.3 Identifying Anomalies by APIs
  10.4 Evaluation
  10.5 Related Work
  10.6 Conclusion and Future Work
Chapter 11: Change Coupling Between Software Artifacts: Learning from Past Changes
  Abstract
  11.1 Introduction
  11.2 Change Coupling
  11.3 Change Coupling Identification Approaches
  11.4 Challenges in Change Coupling Identification
  11.5 Change Coupling Applications
  11.6 Conclusion
Part 3: Stories from the Trenches
Chapter 12: Applying Software Data Analysis in Industry Contexts: When Research Meets Reality
  Abstract
  12.1 Introduction
  12.2 Background
  12.3 Six Key Issues when Implementing a Measurement Program in Industry
  12.4 Conclusions
Chapter 13: Using Data to Make Decisions in Software Engineering: Providing a Method to our Madness
  Abstract
  13.1 Introduction
  13.2 Short History of Software Engineering Metrics
  13.3 Establishing Clear Goals
  13.4 Review of Metrics
  13.5 Challenges with Data Analysis on Software Projects
  13.6 Example of Changing Product Development Through the Use of Data
  13.7 Driving Software Engineering Processes with Data
Chapter 14: Community Data for OSS Adoption Risk Management
  Abstract
  Acknowledgments
  14.1 Introduction
  14.2 Background
  14.3 An Approach to OSS Risk Adoption Management
  14.4 OSS Communities Structure and Behavior Analysis: The XWiki Case
  14.5 A Risk Assessment Example: The Moodbile Case
  14.6 Related Work
  14.7 Conclusions
Chapter 15: Assessing the State of Software in a Large Enterprise: A 12-Year Retrospective
  Abstract
  Acknowledgments
  15.1 Introduction
  15.2 Evolution of the Process and the Assessment
  15.3 Impact Summary of the State of Avaya Software Report
  15.4 Assessment Approach and Mechanisms
  15.5 Data Sources
  15.6 Examples of Analyses
  15.7 Software Practices
  15.8 Assessment Follow-up: Recommendations and Impact
  15.9 Impact of the Assessments
  15.10 Conclusions
  15.11 Appendix
  Author Biographies
Chapter 16: Lessons Learned from Software Analytics in Practice
  Abstract
  16.1 Introduction
  16.2 Problem Selection
  16.3 Data Collection
  16.4 Descriptive Analytics
  16.5 Predictive Analytics
  16.6 Road Ahead
Part 4: Advanced Topics
Chapter 17: Code Comment Analysis for Improving Software Quality
  Abstract
  17.1 Introduction
  17.2 Text Analytics: Techniques, Tools, and Measures
  17.3 Studies of Code Comments
  17.4 Automated Code Comment Analysis for Specification Mining and Bug Detection
  17.5 Studies and Analysis of API Documentation
  17.6 Future Directions and Challenges
Chapter 18: Mining Software Logs for Goal-Driven Root Cause Analysis
  Abstract
  18.1 Introduction
  18.2 Approaches to Root Cause Analysis
  18.3 Root Cause Analysis Framework Overview
  18.4 Modeling Diagnostics for Root Cause Analysis
  18.5 Log Reduction
  18.6 Reasoning Techniques
  18.7 Root Cause Analysis for Failures Induced by Internal Faults
  18.8 Root Cause Analysis for Failures due to External Threats
  18.9 Experimental Evaluations
  18.10 Conclusions
Chapter 19: Analytical Product Release Planning
  Abstract
  Acknowledgments
  19.1 Introduction and Motivation
  19.2 Taxonomy of Data-intensive Release Planning Problems
  19.3 Information Needs for Software Release Planning
  19.4 The Paradigm of Analytical Open Innovation
    Analysis phase
    Synthesize phase
  19.5 Analytical Release Planning—A Case Study
  19.6 Summary and Future Research
  19.7 Appendix: Feature Dependency Constraints
Part 5: Data Analysis at Scale (Big Data)
Chapter 20: Boa: An Enabling Language and Infrastructure for Ultra-Large-Scale MSR Studies
  Abstract
  20.1 Objectives
  20.2 Getting Started with Boa
  20.3 Boa’s Syntax and Semantics
  20.4 Mining Project and Repository Metadata
  20.5 Mining Source Code with Visitors
  20.6 Guidelines for Replicable Research
  20.7 Conclusions
  20.8 Practice Problems
    Project and Repository Metadata Problems
    Source Code Problems
Chapter 21: Scalable Parallelization of Specification Mining Using Distributed Computing
  Abstract
  21.1 Introduction
  21.2 Background
  21.3 Distributed Specification Mining
  21.4 Implementation and Empirical Evaluation
  21.5 Related Work
  21.6 Conclusion and Future Work

Overview

The Art and Science of Analyzing Software Data describes analysis techniques commonly used to derive insight from software data. It shares best practices from leading data scientists, drawn from their experience training software engineering students and practitioners in data science.

The book covers topics such as the analysis of security data, code reviews, app stores, log files, and user telemetry, among others. It presents a wide variety of techniques, such as co-change analysis, text analysis, topic analysis, and concept analysis, as well as advanced topics such as release planning and the analysis of source code comments. It also includes stories from the trenches, in which expert data scientists illustrate how to apply data analysis in industry and open source, present results to stakeholders, and drive decisions.

Presents best practices, hints, and tips to analyze data and apply tools in data science projects
Presents research methods and case studies that have emerged over the past few years to further understanding of software data
Shares stories from the trenches of successful data science initiatives in industry
