Build Your Own Programming Language

Book description

Written by the creator of the Unicon programming language, this book will show you how to implement programming languages to reduce the time and cost of creating applications for new or specialized areas of computing

Key Features

  • Reduce development time and solve pain points in your application domain by building a custom programming language
  • Learn how to create parsers, code generators, file readers, analyzers, and interpreters
  • Create an alternative to frameworks and libraries to solve domain-specific problems

Book Description

The need for different types of computer languages is growing rapidly and developers prefer creating domain-specific languages for solving specific application domain problems. Building your own programming language has its advantages. It can be your antidote to the ever-increasing size and complexity of software.

In this book, you’ll start with implementing the frontend of a compiler for your language, including a lexical analyzer and parser. The book covers a series of traversals of syntax trees, culminating with code generation for a bytecode virtual machine. Moving ahead, you’ll learn how domain-specific language features are often best represented by operators and functions that are built into the language, rather than library functions. We’ll conclude with how to implement garbage collection, including reference counting and mark-and-sweep garbage collection. Throughout the book, Dr. Jeffery weaves in his experience of building the Unicon programming language to give better context to the concepts where relevant examples are provided in both Unicon and Java so that you can follow the code of your choice of either a very high-level language with advanced features, or a mainstream language.

By the end of this book, you’ll be able to build and deploy your own domain-specific languages, capable of compiling and running programs.

What you will learn

  • Perform requirements analysis for the new language and design language syntax and semantics
  • Write lexical and context-free grammar rules for common expressions and control structures
  • Develop a scanner that reads source code and generate a parser that checks syntax
  • Build key data structures in a compiler and use your compiler to build a syntax-coloring code editor
  • Implement a bytecode interpreter and run bytecode generated by your compiler
  • Write tree traversals that insert information into the syntax tree
  • Implement garbage collection in your language

Who this book is for

This book is for software developers interested in the idea of inventing their own language or developing a domain-specific language. Computer science students taking compiler construction courses will also find this book highly useful as a practical guide to language implementation to supplement more theoretical textbooks. Intermediate-level knowledge and experience working with a high-level language such as Java or the C++ language are expected to help you get the most out of this book.

Table of contents

  1. Build Your Own Programming Language
  2. Contributors
  3. About the author
  4. About the reviewers
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Code in Action
    6. Download the color images
    7. Conventions used
    8. Get in touch
    9. Share Your Thoughts
  6. Section 1: Programming Language Frontends
  7. Chapter 1: Why Build Another Programming Language?
    1. So, you want to write your own programming language…
      1. Types of programming language implementations
      2. Organizing a bytecode language implementation
      3. Languages used in the examples
    2. Language versus library – what's the difference?
    3. Applicability to other software engineering tasks
    4. Establishing the requirements for your language
    5. Case study – requirements that inspired the Unicon language
      1. Unicon requirement #1 – preserve what people love about Icon
      2. Unicon requirement #2 – support large-scale programs working on big data
      3. Unicon requirement #3 – high-level input/output for modern applications
      4. Unicon requirement #4 – provide universally implementable system interfaces
    6. Summary
    7. Questions
  8. Chapter 2: Programming Language Design
    1. Determining the kinds of words and punctuation to provide in your language
    2. Specifying the control flow
    3. Deciding on what kinds of data to support
      1. Atomic types
      2. Composite types
      3. Domain-specific types
    4. Overall program structure
    5. Completing the Jzero language definition
    6. Case study – designing graphics facilities in Unicon
      1. Language support for 2D graphics
      2. Adding support for 3D graphics
    7. Summary
    8. Questions
  9. Chapter 3: Scanning Source Code
    1. Technical requirements
    2. Lexemes, lexical categories, and tokens
    3. Regular expressions
      1. Regular expression rules
      2. Regular expression examples
    4. Using UFlex and JFlex
      1. Header section
      2. Regular expressions section
      3. Writing a simple source code scanner
      4. Running your scanner
      5. Tokens and lexical attributes
      6. Expanding our example to construct tokens
    5. Writing a scanner for Jzero
      1. The Jzero flex specification
      2. Unicon Jzero code
      3. Java Jzero code
      4. Running the Jzero scanner
    6. Regular expressions are not always enough
    7. Summary
    8. Questions
  10. Chapter 4: Parsing
    1. Technical requirements
    2. Analyzing syntax
    3. Understanding context-free grammars
      1. Writing context-free grammar rules
      2. Writing rules for programming constructs
    4. Using iyacc and BYACC/J
      1. Declaring symbols in the header section
      2. Putting together the yacc context-free grammar section
      3. Understanding yacc parsers
      4. Fixing conflicts in yacc parsers
      5. Syntax error recovery
      6. Putting together a toy example
    5. Writing a parser for Jzero
      1. The Jzero lex specification
      2. The Jzero yacc specification
      3. Unicon Jzero code
      4. Java Jzero parser code
      5. Running the Jzero parser
    6. Improving syntax error messages
      1. Adding detail to Unicon syntax error messages
      2. Adding detail to Java syntax error messages
      3. Using Merr to generate better syntax error messages
    7. Summary
    8. Questions
  11. Chapter 5: Syntax Trees
    1. Technical requirements
    2. Using GNU make
    3. Learning about trees
      1. Defining a syntax tree type
      2. Parse trees versus syntax trees
    4. Creating leaves from terminal symbols
      1. Wrapping tokens in leaves
      2. Working with YACC's value stack
      3. Wrapping leaves for the parser's value stack
      4. Determining which leaves you need
    5. Building internal nodes from production rules
      1. Accessing tree nodes on the value stack
      2. Using the tree node factory method
    6. Forming syntax trees for the Jzero language
    7. Debugging and testing your syntax tree
      1. Avoiding common syntax tree bugs
      2. Printing your tree in a text format
      3. Printing your tree using dot
    8. Summary
    9. Questions
  12. Section 2: Syntax Tree Traversals
  13. Chapter 6: Symbol Tables
    1. Technical requirements
    2. Establishing the groundwork for symbol tables
      1. Declarations and scopes
      2. Assigning and dereferencing variables
      3. Choosing the right tree traversal for the job
    3. Creating and populating symbol tables for each scope
      1. Adding semantic attributes to syntax trees
      2. Defining classes for symbol tables and symbol table entries
      3. Creating symbol tables
      4. Populating symbol tables
      5. Synthesizing the isConst attribute
    4. Checking for undeclared variables
      1. Identifying the bodies of methods
      2. Spotting uses of variables within method bodies
    5. Finding redeclared variables
      1. Inserting symbols into the symbol table
      2. Reporting semantic errors
    6. Handling package and class scopes in Unicon
      1. Mangling names
      2. Inserting self for member variable references
      3. Inserting self as the first parameter in method calls
    7. Testing and debugging symbol tables
    8. Summary
    9. Questions
  14. Chapter 7: Checking Base Types
    1. Technical requirements
    2. Type representation in the compiler
      1. Defining a base class for representing types
      2. Subclassing the base class for complex types
    3. Assigning type information to declared variables
      1. Synthesizing types from reserved words
      2. Inheriting types into a list of variables
    4. Determining the type at each syntax tree node
      1. Determining the type at the leaves
      2. Calculating and checking the types at internal nodes
    5. Runtime type checks and type inference in Unicon
    6. Summary
    7. Questions
  15. Chapter 8: Checking Types on Arrays, Method Calls, and Structure Accesses
    1. Technical requirements
    2. Checking operations on array types
      1. Handling array variable declarations
      2. Checking types during array creation
      3. Checking types during array accesses
    3. Checking method calls
      1. Calculating the parameters and return type information
      2. Checking the types at each method call site
      3. Checking the type at return statements
    4. Checking structured type accesses
      1. Handling instance variable declarations
      2. Checking types at instance creation
      3. Checking types at instance accesses
    5. Summary
    6. Questions
  16. Chapter 9: Intermediate Code Generation
    1. Technical requirements
    2. Preparing to generate code
      1. Why generate intermediate code?
      2. Learning about the memory regions in the generated program
      3. Introducing data types for intermediate code
      4. Adding the intermediate code attributes to the tree
      5. Generating labels and temporary variables
    3. An intermediate code instruction set
      1. Instructions
      2. Declarations
    4. Annotating syntax trees with labels for control flow
    5. Generating code for expressions
    6. Generating code for control flow
      1. Generating label targets for condition expressions
      2. Generating code for loops
      3. Generating intermediate code for method calls
      4. Reviewing the generated intermediate code
    7. Summary
  17. Chapter 10: Syntax Coloring in an IDE
    1. Downloading the example IDEs used in this chapter
    2. Integrating a compiler into a programmer's editor
      1. Analyzing source code from within the IDE
      2. Sending compiler output to the IDE
    3. Avoiding reparsing the entire file on every change
    4. Using lexical information to colorize tokens
      1. Extending the EditableTextList component to support color
      2. Coloring individual tokens as they are drawn
    5. Highlighting errors using parse results
    6. Adding Java support
    7. Summary
  18. Section 3: Code Generation and Runtime Systems
  19. Chapter 11: Bytecode Interpreters
    1. Technical requirements
    2. Understanding what bytecode is
    3. Comparing bytecode with intermediate code
    4. Building a bytecode instruction set for Jzero
      1. Defining the Jzero bytecode file format
      2. Understanding the basics of stack machine operation
    5. Implementing a bytecode interpreter
      1. Loading bytecode into memory
      2. Initializing the interpreter state
      3. Fetching instructions and advancing the instruction pointer
      4. Instruction decoding
      5. Executing instructions
      6. Starting up the Jzero interpreter
    6. Writing a runtime system for Jzero
    7. Running a Jzero program
    8. Examining iconx, the Unicon bytecode interpreter
      1. Understanding goal-directed bytecode
      2. Leaving type information in at runtime
      3. Fetching, decoding, and executing instructions
      4. Crafting the rest of the runtime system
    9. Summary
    10. Questions
  20. Chapter 12: Generating Bytecode
    1. Technical requirements
    2. Converting intermediate code to Jzero bytecode
      1. Adding a class for bytecode instructions
      2. Mapping intermediate code addresses to bytecode addresses
      3. Implementing the bytecode generator method
      4. Generating bytecode for simple expressions
      5. Generating code for pointer manipulation
      6. Generating bytecode for branches and conditional branches
      7. Generating code for method calls and returns
      8. Handling labels and other pseudo-instructions in intermediate code
    3. Comparing bytecode assembler with binary formats
      1. Printing bytecode in assembler format
      2. Printing bytecode in binary format
    4. Linking, loading, and including the runtime system
    5. Unicon example – bytecode generation in icont
    6. Summary
    7. Questions
  21. Chapter 13: Native Code Generation
    1. Technical requirements
    2. Deciding whether to generate native code
    3. Introducing the x64 instruction set
      1. Adding a class for x64 instructions
      2. Mapping memory regions to x64 register-based address modes
    4. Using registers
      1. Starting from a null strategy
      2. Assigning registers to speed up the local region
    5. Converting intermediate code to x64 code
      1. Mapping intermediate code addresses to x64 locations
      2. Implementing the x64 code generator method
      3. Generating x64 code for simple expressions
      4. Generating code for pointer manipulation
      5. Generating native code for branches and conditional branches
      6. Generating code for method calls and returns
      7. Handling labels and pseudo-instructions
    6. Generating x64 output
      1. Writing the x64 code in assembly language format
      2. Going from native assembler to an object file
      3. Linking, loading, and including the runtime system
    7. Summary
    8. Questions
  22. Chapter 14: Implementing Operators and Built-In Functions
    1. Implementing operators
      1. Asking whether operators imply hardware support and vice versa
      2. Adding String concatenation to intermediate code generation
      3. Adding String concatenation to the bytecode interpreter
      4. Adding String concatenation to the native runtime system
    2. Writing built-in functions
      1. Adding built-in functions to the bytecode interpreter
      2. Writing built-in functions for use with the native code implementation
    3. Integrating built-ins with control structures
    4. Developing operators and functions for Unicon
      1. Writing operators in Unicon
      2. Developing Unicon's built-in functions
    5. Summary
    6. Questions
  23. Chapter 15: Domain Control Structures
    1. Knowing when you need a new control structure
      1. Defining what a control structure is
      2. Reducing excessive redundant parameters
    2. Scanning strings in Icon and Unicon
      1. Scanning environments and their primitive operations
      2. Eliminating excessive parameters via a control structure
    3. Rendering regions in Unicon
      1. Rendering 3D graphics from a display list
      2. Specifying rendering regions using built-in functions
      3. Varying graphical levels of detail using nested rendering regions
      4. Creating a rendering region control structure
    4. Summary
    5. Questions
  24. Chapter 16: Garbage Collection
    1. Appreciating the importance of garbage collection
    2. Counting references to objects
      1. Adding reference counting to Jzero
      2. Generating code for heap allocation
      3. Modifying the generated code for the assignment operator
      4. Considering the drawbacks and limitations of reference counting
    3. Marking live data and sweeping the rest
      1. Organizing heap memory regions
      2. Traversing the basis to mark live data
      3. Reclaiming live memory and placing it into contiguous chunks
    4. Summary
    5. Questions
  25. Chapter 17: Final Thoughts
    1. Reflecting on what was learned from writing this book
    2. Deciding where to go from here
      1. Studying programming language design
      2. Learning about implementing interpreters and bytecode machines
      3. Acquiring expertise in code optimization
      4. Monitoring and debugging program executions
      5. Designing and implementing IDEs and GUI builders
    3. Exploring references for further reading
      1. Studying programming language design
      2. Learning about implementing interpreters and bytecode machines
      3. Acquiring expertise in native code and code optimization
      4. Monitoring and debugging program executions
      5. Designing and implementing IDEs and GUI builders
    4. Summary
  26. Section 4: Appendix
  27. Appendix: Unicon Essentials
    1. Running Unicon
    2. Using Unicon's declarations and data types
      1. Declaring different kinds of program components
      2. Using atomic data types
      3. Organizing multiple values using structure types
    3. Evaluating expressions
      1. Forming basic expressions using operators
      2. Invoking procedures, functions, and methods
      3. Iterating and selecting what and how to execute
      4. Generators
    4. Debugging and environmental issues
      1. Learning the basics of the UDB debugger
      2. Environment variables
      3. Preprocessor
    5. Function mini-reference
    6. Selected keywords
  28. Assessments
    1. Chapter 1
    2. Chapter 2
    3. Chapter 3
    4. Chapter 4
    5. Chapter 5
    6. Chapter 6
    7. Chapter 7
    8. Chapter 8
    9. Chapter 11
    10. Chapter 12
    11. Chapter 13
    12. Chapter 14
    13. Chapter 15
    14. Chapter 16
    15. Why subscribe?
  29. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts

Product information

  • Title: Build Your Own Programming Language
  • Author(s): Clinton L. Jeffery
  • Release date: December 2021
  • Publisher(s): Packt Publishing
  • ISBN: 9781800204805