Making Software

by Andy Oram, Greg Wilson

October 2010

Beginner to intermediate

624 pages

24h 9m

English

O'Reilly Media, Inc.

Making Software
Preface
Organization of This Book
Conventions Used in This Book
Safari® Books Online
Using Code Examples
How to Contact Us
I. General Principles of Searching For and Using Evidence
1. The Quest for Convincing Evidence
In the Beginning

The State of Evidence Today
Challenges to the Elegance of Studies; Challenges to Statistical Strength; Challenges to Replicability of Results
Change We Can Believe In
The Effect of Context
Looking Toward the Future
References
2. Credibility, or Why Should I Insist on Being Convinced?
How Evidence Turns Up in Software Engineering
Credibility and Relevance
Fitness for Purpose, or Why What Convinces You Might Not Convince Me; Quantitative Versus Qualitative Evidence: A False Dichotomy
Aggregating Evidence
Limitations and Bias
Types of Evidence and Their Strengths and Weaknesses
Controlled Experiments and Quasi-Experiments; Credibility; Relevance; Surveys; Credibility; Relevance; Experience Reports and Case Studies; Credibility; Relevance; Other Methods; Indications of Credibility (or Lack Thereof) in Reporting; General characteristics; A clear research question; An informative description of the study setup; A meaningful and graspable data presentation; A transparent statistical analysis (if any); An honest discussion of limitations; Conclusions that are solid yet relevant
Society, Culture, Software Engineering, and You
Acknowledgments
References
3. What We Can Learn from Systematic Reviews
An Overview of Systematic Reviews
The Strengths and Weaknesses of Systematic Reviews
The Systematic Review Process; Planning the review; Conducting the review; Reporting the review; Problems Associated with Conducting a Review
Systematic Reviews in Software Engineering
Cost Estimation Studies; The accuracy of cost estimation models; The accuracy of cost estimates in industry; Agile Methods; Dybå and Dingsøyr; Hannay, Dybå, Arisholm, and Sjøberg; Inspection Methods
Conclusion
References
4. Understanding Software Engineering Through Qualitative Methods
What Are Qualitative Methods?
Reading Qualitative Research
Using Qualitative Methods in Practice
Generalizing from Qualitative Results
Qualitative Methods Are Systematic
References
5. Learning Through Application: The Maturing of the QIP in the SEL
What Makes Software Engineering Uniquely Hard to Research
A Realistic Approach to Empirical Research
The NASA Software Engineering Laboratory: A Vibrant Testbed for Empirical Research
The Quality Improvement Paradigm
Characterize; Set Goals; Select Process; Execute Process; Analyze; Package
Conclusion
References
6. Personality, Intelligence, and Expertise: Impacts on Software Development
How to Recognize Good Programmers
Individual Differences: Fixed or Malleable; Personality; Intelligence; The Task of Programming; Programming Performance; Expertise; Software Effort Estimation
Individual or Environment
Skill or Safety in Software Engineering; Collaboration; Personality Again; A Broader View of Intelligence
Concluding Remarks
References
7. Why Is It So Hard to Learn to Program?
Do Students Have Difficulty Learning to Program?
The 2001 McCracken Working Group; The Lister Working Group
What Do People Understand Naturally About Programming?
Making the Tools Better by Shifting to Visual Programming
Contextualizing for Motivation
Conclusion: A Fledgling Field
References
8. Beyond Lines of Code: Do We Need More Complexity Metrics?
Surveying Software
Measuring the Source Code
A Sample Measurement
Source Lines of Code (SLOC); Lines of Code (LOC); Number of C Functions; McCabe’s Cyclomatic Complexity; Halstead’s Software Science Metrics
Statistical Analysis
Overall Analysis; Differences Between Header and Nonheader Files; The Confounding Effect: Influence of File Size in the Intensity of Correlation; Effects of size on correlations for header files; Effects of size on correlations for nonheader files; Effect on the Halstead’s Software Science metrics; Summary of the confounding effect of file size
Some Comments on the Statistical Methodology
So Do We Need More Complexity Metrics?
References
Bibliography
II. Specific Topics in Software Engineering
9. An Automated Fault Prediction System
Fault Distribution
Characteristics of Faulty Files
Overview of the Prediction Model
Replication and Variations of the Prediction Model
The Role of Developers; Predicting Faults with Other Types of Models
Building a Tool
The Warning Label
References
10. Architecting: How Much and When?
Does the Cost of Fixing Software Increase over the Project Life Cycle?
How Much Architecting Is Enough?
Cost-to-Fix Growth Evidence
Using What We Can Learn from Cost-to-Fix Data About the Value of Architecting
The Foundations of the COCOMO II Architecture and Risk Resolution (RESL) Factor; Economies and diseconomies of scale; Reducing software rework via architecture and risk resolution; A successful example: CCPDS-R; The Architecture and Risk Resolution Factor in Ada COCOMO and COCOMO II; How the Ada Process Model promoted risk-driven concurrent engineering software processes; Architecture and risk resolution (RESL) factor in COCOMO II; Improvement shown by incorporating architecture and risk resolution; ROI for Software Systems Engineering Improvement Investments
So How Much Architecting Is Enough?
Does the Architecting Need to Be Done Up Front?
Conclusions
References
11. Conway’s Corollary
Conway’s Law
Coordination, Congruence, and Productivity
Implications
Organizational Complexity Within Microsoft
Implications
Chapels in the Bazaar of Open Source Software
Conclusions
References
Bibliography
12. How Effective Is Test-Driven Development?
The TDD Pill—What Is It?
Summary of Clinical TDD Trials
The Effectiveness of TDD
Internal Quality; External Quality; Productivity; Test Quality
Enforcing Correct TDD Dosage in Trials
Cautions and Side Effects
Conclusions
Acknowledgments
General References
Clinical TDD Trial References
Bibliography
13. Why Aren’t More Women in Computer Science?
Why So Few Women?
Ability Deficits, Preferences, and Cultural Biases; Evidence for deficits in female mathematical-spatial abilities; The role of preferences and lifestyle choices; Biases, Stereotypes, and the Role of Male Computer-Science Culture
Should We Care?
What Can Society Do to Reverse the Trend?; Implications of Cross-National Data
Conclusion
References
14. Two Comparisons of Programming Languages
A Language Shoot-Out over a Peculiar Search Algorithm
The Programming Task: Phonecode; Comparing Execution Speed; Comparing Memory Consumption; Comparing Productivity and Program Length; Comparing Reliability; Comparing Program Structure; Should I Believe This?
Plat_Forms: Web Development Technologies and Cultures
The Development Task: People-by-Temperament; Lay Your Bets; Comparing Productivity; Comparing Artifact Size; Comparing Modifiability; Comparing Robustness and Security; Hey, What About <Insert-Your-Favorite-Topic>?
So What?
References
Bibliography
15. Quality Wars: Open Source Versus Proprietary Software
Past Skirmishes
The Battlefield
Into the Battle
File Organization; Code Structure; Code Style; Preprocessing; Data Organization
Outcome and Aftermath
Acknowledgments and Disclosure of Interest
References
Bibliography
16. Code Talkers
A Day in the Life of a Programmer
Diary Study; Observational Study; Were the Programmers on Their Best Behavior?
What Is All This Talk About?
Getting Answers to Questions; The Search for Rationale; Interruptions and Multitasking; What Questions Do Programmers Ask?; Are Agile Methods Better for Communication?
A Model for Thinking About Communication
References
Bibliography
17. Pair Programming
A History of Pair Programming
Pair Programming in an Industrial Setting
Industry Practices in Pair Programming; Results of Using Pair Programming in Industry
Pair Programming in an Educational Setting
Practices Specific to Education; Results of Using Pair Programming in Education
Distributed Pair Programming
Challenges
Lessons Learned
Acknowledgments
References
18. Modern Code Review
Common Sense
A Developer Does a Little Code Review
Focus Fatigue; Speed Kills; Size Kills; The Importance of Context
Group Dynamics
Are Meetings Required?; False-Positives; Are External Reviewers Required At All?
Conclusion
References
Bibliography
19. A Communal Workshop or Doors That Close?
Doors That Close
A Communal Workshop
Work Patterns
One More Thing…
References
Bibliography
20. Identifying and Managing Dependencies in Global Software Development
Why Is Coordination a Challenge in GSD?
Dependencies and Their Socio-Technical Duality
The Technical Dimension; Syntactic dependencies and their impact on productivity and quality; Logical dependencies and their impact on productivity and quality; The Socio-Organizational Dimension; Different types of work dependencies and their impacts on productivity and quality; The Socio-Technical Dimension
From Research to Practice
Leveraging the Data in Software Repositories; The Role of Team Leads and Managers in Supporting the Management of Dependencies; Developers, Work Items, and Distributed Development
Future Directions
Software Architectures Suitable for Global Software Development; Collaborative Software Engineering Tools; Balancing Standardization and Flexibility
References
21. How Effective Is Modularization?
The Systems
What Is a Change?
What Is a Module?
The Results
Change Locality; Examined Modules; Emergent Modularity
Threats to Validity
Summary
References
22. The Evidence for Design Patterns
Design Pattern Examples
Why Might Design Patterns Work?
The First Experiment: Testing Pattern Documentation
Design of the Experiment; Results
The Second Experiment: Comparing Pattern Solutions to Simpler Ones
The Third Experiment: Patterns in Team Communication
Lessons Learned
Conclusions
Acknowledgments
References
23. Evidence-Based Failure Prediction
Introduction
Code Coverage
Code Churn
Code Complexity
Code Dependencies
People and Organizational Measures
Integrated Approach for Prediction of Failures
Summary
Acknowledgments
References
24. The Art of Collecting Bug Reports
Good and Bad Bug Reports
What Makes a Good Bug Report?
Survey Results
Contents of Bug Reports (Developers); Contents of Bug Reports (Reporters)
Evidence for an Information Mismatch
Problems with Bug Reports
The Value of Duplicate Bug Reports
Not All Bug Reports Get Fixed
Conclusions
Acknowledgments
References
Bibliography
25. Where Do Most Software Flaws Come From?
Studying Software Flaws
Context of the Study
Phase 1: Overall Survey
Summary of Questionnaire; Summary of the Data; Summary of the Phase 1 Study
Phase 2: Design/Code Fault Survey
The Questionnaire; Statistical Analysis; Finding and fixing faults; Faults; Fault Frequency Adjusted by Effort; Underlying causes; Means of prevention; Underlying causes and means of prevention; Interface Faults Versus Implementation Faults
What Should You Believe About These Results?
Are We Measuring the Right Things?; Did We Do It Right?; What Can You Do with the Results?
What Have We Learned?
Acknowledgments
References
26. Novice Professionals: Recent Graduates in a First Software Engineering Job
Study Methodology
Subjects; Task Analysis; Task Sample; Reflection Methodology; Threats to Validity
Software Development Task
Task Breakdown; Communication; Documentation; Working on bugs; Programming; Project management and tools; Design specifications and testing
Strengths and Weaknesses of Novice Software Developers
Strengths; Weaknesses
Reflections
Managing Getting Engaged; Persistence, Uncertainty, and Noviceness; Large-Scale Software Team Setting
Misconceptions That Hinder Learning
Reflecting on Pedagogy
Pair Programming; Legitimate Peripheral Participation; Mentoring
Implications for Change
New Developer Onboarding; Educational Curricula
References
27. Mining Your Own Evidence
What Is There to Mine?
Designing a Study
A Mining Primer
Step 1: Determining Which Data to Use; Step 2: Data Retrieval; Step 3: Data Conversion (Optional); Step 4: Data Extraction; Step 5: Parsing the Bug Reports; Step 6: Linking Data Sets; Linking code changes to bug reports; Linking bug reports to code changes (optional); Step 6: Checking for Missing Links; Step 7: Mapping Bugs to Files
Where to Go from Here
Acknowledgments
References
28. Copy-Paste as a Principled Engineering Tool
An Example of Code Cloning
Detecting Clones in Software
Investigating the Practice of Code Cloning
Forking; Templating; Customizing
Our Study
Conclusions
References
29. How Usable Are Your APIs?
Why Is It Important to Study API Usability?
First Attempts at Studying API Usability
Study Design; Summary of Findings from the First Study
If At First You Don’t Succeed...
Design of the Second Study; Summary of Findings from the Second Study; Cognitive Dimensions
Adapting to Different Work Styles
Scenario-Based Design
Conclusion
References
30. What Does 10x Mean? Measuring Variations in Programmer Productivity
Individual Productivity Variation in Software Development
Extremes in Individual Variation on the Bad Side; What Makes a Real 10x Programmer
Issues in Measuring Productivity of Individual Programmers
Productivity in Lines of Code per Staff Month; Productivity in Function Points; What About Complexity?; Is There Any Way to Measure Individual Productivity?
Team Productivity Variation in Software Development
References
A. Contributors
Index
About the Authors
Colophon
Copyright

Content preview from Making Software

Detecting Clones in Software

The problem of detecting code clones is technically interesting: if an existing code fragment is duplicated and then changed to fit a new purpose, how can an automated tool recognize this? If all code clones were verbatim copies that were never subsequently altered (and some clones do fit this description), then code clone detection would be pretty easy. Usually, however, clones are adapted for new uses: existing lines are changed or removed, and new code may be added. So it is worth pausing at this point to ask: just what is a software clone?

Well, first we should note that almost all clone detection techniques actually measure the similarity of code chunks. That is, we typically don’t have access to logs that record actual copy-paste edit events; we just infer that they happened when a similarity measure meets a given threshold.
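To make the idea of inferring clones from similarity concrete, here is a minimal sketch in Python. It is not taken from the book or from any particular detection tool: the function names, the sample fragments, and the 0.7 threshold are all assumptions chosen for illustration. It compares two fragments line by line using the standard library's `difflib.SequenceMatcher` and flags them as a likely clone when the ratio meets the threshold.

```python
from difflib import SequenceMatcher


def similarity(fragment_a: str, fragment_b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two code fragments."""
    # Compare line by line, ignoring indentation and blank lines.
    lines_a = [ln.strip() for ln in fragment_a.splitlines() if ln.strip()]
    lines_b = [ln.strip() for ln in fragment_b.splitlines() if ln.strip()]
    return SequenceMatcher(None, lines_a, lines_b).ratio()


def looks_like_clone(fragment_a: str, fragment_b: str,
                     threshold: float = 0.7) -> bool:
    """Infer a copy-paste relationship when similarity meets the threshold.

    The default threshold is an arbitrary, illustrative value.
    """
    return similarity(fragment_a, fragment_b) >= threshold


# Hypothetical fragments: the second is the first with one line adapted.
ORIGINAL = """
total = 0
for item in items:
    total += item.price
print(total)
"""

ADAPTED = """
total = 0
for item in items:
    total += item.price * item.quantity
print(total)
"""

UNRELATED = """
import os
print(os.getcwd())
"""

if __name__ == "__main__":
    print(similarity(ORIGINAL, ADAPTED))        # 3 of 4 lines match
    print(looks_like_clone(ORIGINAL, ADAPTED))
    print(looks_like_clone(ORIGINAL, UNRELATED))
```

Note that the result depends entirely on the chosen measure and threshold: the same pair of fragments can count as a clone under one setting and not under another.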

Second, there is no consensus on what similarity means concretely or what thresholds are reasonable. (Ira Baxter, a researcher in the community, likes to say, “Software clones are segments of code that are similar…according to some definition of similarity.”) Detection tools use a wide variety of techniques. Some tools treat programs as a sequence of character strings, and perform textual comparisons. Other tools compare token streams, abstract syntax trees (ASTs), and program dependence graphs (PDGs). Some compute metrics or compare lightweight semantic models of program components. And ...
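As a rough illustration of the token-stream approach mentioned above, the following Python sketch abstracts identifiers and literals before comparing two fragments, so that a copy with renamed variables still matches. It is an assumption-laden toy, not the method of any tool named in the text: it only handles Python source, relies on the standard `tokenize` module, and the placeholder names `ID` and `LIT` are invented here.

```python
import io
import keyword
import tokenize
from difflib import SequenceMatcher

# Token types that carry layout only and are dropped before comparison.
_SKIPPED = (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
            tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER)


def normalized_tokens(source: str) -> list[str]:
    """Turn Python source into a token stream with names and literals abstracted."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")    # any identifier
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            tokens.append("LIT")   # any numeric or string literal
        else:
            tokens.append(tok.string)  # keywords, operators, punctuation
    return tokens


def token_similarity(source_a: str, source_b: str) -> float:
    """Similarity ratio (0.0 to 1.0) of the two normalized token streams."""
    return SequenceMatcher(None, normalized_tokens(source_a),
                           normalized_tokens(source_b)).ratio()


# Hypothetical fragments: B is A with every identifier renamed.
FRAGMENT_A = "def area(width, height):\n    return width * height\n"
FRAGMENT_B = "def volume(x, y):\n    return x * y\n"
FRAGMENT_C = "for i in range(3):\n    print(i)\n"

if __name__ == "__main__":
    print(normalized_tokens(FRAGMENT_A))
    print(token_similarity(FRAGMENT_A, FRAGMENT_B))
    print(token_similarity(FRAGMENT_A, FRAGMENT_C))
```

A purely textual comparison would treat `FRAGMENT_A` and `FRAGMENT_B` as mostly different, while the normalized token streams are identical. That gap is one reason different techniques disagree about what counts as a clone.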

ISBN: 9780596808310
