Chapter 1. Code Generation
Artificial intelligence can significantly amplify productivity and creativity in code generation and autocompletion. This chapter explores how AI-driven tools are redefining the coding experience, transforming a time-intensive manual process into an interactive, efficient, and error-reducing endeavor.
The advent of AI in code generation is not merely about accelerating developers’ typing speed; it’s about understanding the context of their work, suggesting relevant code snippets, and even generating complex code blocks with minimal inputs. These tools, powered by sophisticated machine-learning algorithms, have the ability to learn from vast repositories of code available in public and private databases to continuously improve their suggestions and accuracy.
I will examine how a software engineer can go from doing 100% of the work in a given software-development task to becoming a reviewer of the contributions provided by AI tools. This entails ensuring you give these tools proper input about what you require and thoroughly reviewing their outputs to make sure the deliverable fulfills the requirements.
These AI tools are powerful and impressive, and it’s easy to fall into the trap of using their output without proper precautions, such as opening a pull request or pushing code to production without validating how and why the code works. That careless approach carries two important risks:
- Outdated code: Most AI tools are trained on dated training data, which means they may suggest outdated frameworks or functionalities.
- Wrong answers: LLMs, the technology underlying all these tools, sometimes generate what are commonly described as “hallucinations.” That means their output may include false statements, bugs, or code functions or API endpoints that don’t exist.
Software engineers and developers must use AI tools to help them work better and faster, but not to replace their own judgment, much as we do with the autocomplete functionality that has become popular in most integrated development environments (IDEs). It helps a lot to simply hit the Tab key instead of typing every character, of course, but autocomplete suggestions range from perfectly relevant to useless. It’s up to your judgment whether to use or discard them.
The AI tools I cover in this chapter require the same constant assessment. Many times, the code these tools generate will work and fit the task requirements flawlessly. In other cases, it will be only partially complete or will contain bugs, performance issues, or some other flaw that must be revised. It’s your job to use, discard, or revise it.
Types of Code Generation Tools
The AI tools reviewed for this chapter fall into two main categories, whose usage in software development differs slightly:
- Browser-based tools: With these tools, such as ChatGPT, you can log in and interact with the model right there in your browser. There’s no activity happening on your local computer, just an interaction with a website over the internet. These tools are easy to use and adapt well to a wider range of use cases, but their biggest con is the limited context window: you must manually type or copy/paste context into the prompt for each interaction, which is limiting when you’re dealing with large codebases or extensive documentation.
- IDE-based tools: These tools, such as GitHub Copilot, work as plugins installed in the IDE you use to write code on your local computer. Once installed, they become embedded in your software development experience, in the actual environment where you write code. Their biggest pro is the large context window: these tools can ingest a whole codebase as context for each interaction.
Use Cases
Millions of software engineers are adopting AI tools to support their daily tasks. Perhaps the five most prominent use cases where these tools influence development are:
- Generating code snippets: Instead of typing every single word and function in a codebase, you provide the AI tool with specific requirements that the code should fulfill. It outputs ready-to-use code in any of the most popular programming languages (such as Java, Python, PHP, or JavaScript). This can speed up prototyping as well as the development process. The tools described in this chapter can generate code for a wide range of applications, including web development, data analysis, automation scripts, and mobile applications. In general, this is a use case where AI helps bridge the gap between conceptualization and implementation, making technology development more accessible and efficient.
- Debugging: This use case is especially valuable because debugging can often be a time-consuming and frustrating part of software development. These AI tools analyze error messages and problematic code snippets and suggest specific changes or improvements (see the debugging sketch after this list). This not only saves time but also serves as an educational tool, enhancing your debugging skills over time. Furthermore, some tools (like ChatGPT) can explain why certain errors occur and sometimes even the architectural tradeoffs implied in avoiding them. This deeper understanding of common pitfalls in software development is a key reason why so many developers use this tool as their coding assistant.
- Accelerating learning: AI tools can serve as instructors if you’re trying to get up to speed in a technology stack you aren’t proficient in, learn a new programming language or framework, or understand specific implementation details, like adding indexes to a table in a MySQL database or pulling last month’s transactions from the Stripe API (sketched after this list). They can provide tutorials, examples, and concise summaries of documentation for a wide range of technologies. This educational interaction with AI tools can speed your learning progress regardless of the specific technology or the scope of what you’re learning.
- Optimizing code: Many software engineers use AI tools to review code and make it more efficient, readable, and maintainable (see the refactoring sketch after this list). This includes recommendations for refactoring code, using more efficient algorithms, or applying best practices for performance or security. Code optimization is an ongoing challenge and can be easy to forget about. Left unchecked, though, suboptimal code piles up into technical debt that eventually must be refactored across the codebase, on a large and thus very costly scale. Using AI tools to review code at the task level can make a significant impact on the quality of the overall codebase.
- Automating documentation: Documentation is essential for maintaining and understanding software projects, yet developers often overlook or underprioritize it. Some AI tools can generate documentation, including in-line comments and details about functions, classes, and modules (see the JSDoc sketch after this list). This saves time and also ensures that documentation is consistently updated alongside the codebase. By providing clear, comprehensive documentation, AI tools help improve code readability and make it easier for teams to collaborate. This use case is particularly beneficial for large teams and open-source projects, where clear documentation is crucial for enabling other developers to contribute effectively. Automating documentation also enhances projects’ maintainability and facilitates better knowledge transfer within development teams.
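To make a few of these use cases concrete, here are some short sketches. All the code and data in them are hypothetical, written for illustration; none of it is output from any specific tool. First, debugging: you might paste a failing snippet and its error message into a chat tool and ask for an explanation and a fix:

// Hypothetical buggy snippet and the error it throws, as you might
// paste them into a chat tool.
const users = [{ name: "Ada" }, null, { name: "Lin" }];

// Calling listNames(users) throws:
// TypeError: Cannot read properties of null (reading 'name')
function listNames(list) {
    return list.map((u) => u.name.toUpperCase());
}

// A typical suggested fix: drop null/undefined entries before mapping.
function listNamesFixed(list) {
    return list.filter(Boolean).map((u) => u.name.toUpperCase());
}

console.log(listNamesFixed(users)); // [ 'ADA', 'LIN' ]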
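For accelerated learning, take the Stripe example above. Here is a sketch of the kind of answer a chat tool might produce for “pull last month’s transactions,” assuming the official stripe npm package and a placeholder test key (check the current Stripe documentation before relying on any of it):

// Sketch only: assumes the official `stripe` npm package
// (npm install stripe) and a placeholder test API key.
const Stripe = require("stripe");
const stripe = new Stripe("sk_test_your_key_here");

// Lists last month's charges by passing a Unix-timestamp range
// in the `created` filter.
async function lastMonthCharges() {
    const now = new Date();
    const start = new Date(now.getFullYear(), now.getMonth() - 1, 1);
    const end = new Date(now.getFullYear(), now.getMonth(), 1);
    const charges = await stripe.charges.list({
        created: {
            gte: Math.floor(start.getTime() / 1000),
            lt: Math.floor(end.getTime() / 1000),
        },
        limit: 100, // fetch further pages via `starting_after`
    });
    return charges.data;
}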
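For code optimization, a classic suggestion is replacing a quadratic lookup with a hash-based one. A hypothetical before-and-after:

// Before: `includes` inside `filter` makes this O(n * m).
function findCommon(a, b) {
    return a.filter((x) => b.includes(x));
}

// After: the kind of refactor an AI review typically suggests,
// using a Set for O(1) membership checks (O(n + m) overall).
function findCommonFast(a, b) {
    const setB = new Set(b);
    return a.filter((x) => setB.has(x));
}

console.log(findCommonFast([1, 2, 3, 4], [2, 4, 6])); // [ 2, 4 ]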
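And for documentation, a tool can annotate an existing function with JSDoc-style comments. A hypothetical example of what that generated documentation might look like, applied to the function above:

/**
 * Returns the elements of `a` that also appear in `b`, preserving
 * their original order in `a`.
 *
 * @param {Array<*>} a - The array to filter.
 * @param {Array<*>} b - The array whose elements are kept.
 * @returns {Array<*>} The intersection of the two arrays.
 */
function findCommonFast(a, b) {
    const setB = new Set(b);
    return a.filter((x) => setB.has(x));
}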
Evaluation Process
I evaluated more than 50 AI tools in order to shortlist the ones I highlight in this chapter. Every tool covered here meets the following criteria:
- It is a professional project with a competent team behind it
- The code it generates has a high quality threshold
- It offers some level of functionality for free or on a trial basis
- It has a high level of adoption at the time of writing (early 2024)
My process in this chapter was as follows: I submitted a brief code challenge to each of the selected code tools, ran the same challenge several times on each tool, and compared their output. I then gave each tool a rating on a scale from 1 to 10, with 1 being the worst (a solution that errors out and doesn’t run at all) and 10 being a flawless solution. A 5 would be a solution that runs but solves only part of the problem. I look closely at the top product in each category, detailing its pros and cons, then provide more concise information on the runner-up.
It’s also important to note that all tests described in this chapter were run in March 2024. Given the fast pace of evolution of each of these tools and underlying models, it’s likely that you could get a different result at a later time for the same prompt.
Browser-based Tools
This chapter will look first at browser-based AI tools, then at IDE-based tools.
ChatGPT
ChatGPT is an artificial intelligence developed by OpenAI and powered by its GPT-3.5 architecture. Imagine it as a multitool for software engineers, offering a broad range of functions from conversational engagement to intricate problem-solving, way beyond the specific scope I discuss in this particular chapter (generating software code).
As described on OpenAI’s website, ChatGPT is like a highly intelligent virtual assistant that understands the nuances of human language and can generate text that feels as if it were written by a human. It’s an example of modern natural language processing (NLP) technology. It has been meticulously trained on a wide array of internet text, giving it a broad knowledge base that developers and non-developers alike can tap into.
ChatGPT has gained massive adoption, reaching 100 million users within two months of its launch on November 30, 2022, making it the fastest-growing product ever. Of course, this large user base includes many software engineers. But before we dive in, it’s important to mention data security, concerns over which caused 14 prominent tech companies and even 15 countries’ governments to ban ChatGPT in its early days, on the grounds that it wasn’t compliant with the EU’s General Data Protection Regulation (GDPR). Most of these bans have since been revoked, and at the time of writing (early 2024) several public authorities are officially starting to use ChatGPT, including the government of Pennsylvania in the US and the UK Judicial Office.
Pros
Let’s look at the pros and cons of ChatGPT, starting with the positive. Note that these also largely apply to most similar tools.
- The 24/7 pair programmer: Given the wide range of use cases, from research to code generation to documentation, ChatGPT can be an always-on peer to brainstorm ideas, review code, generate comments, tests, documentation, and more.
- Versatility: ChatGPT can generate code in any popular programming language and understand any technical topic included in its training dataset, such as databases, cloud infrastructure, API documentation, and so on.
- Browsing: Whenever ChatGPT receives prompts that involve recent events, products, or framework updates that happened after its 2022 training-data cutoff date, it automatically browses for the answer online and includes close-to-real-time knowledge as part of its reply. At the time of writing in early 2024, this feature is only available to premium users (see the Cons section below).
- Structured thinking: One of the best aspects of ChatGPT is that it structures the code it generates in a very logical and holistic manner, often including the packages to be installed and the environment variables to set up. It uses numbered bullets, as in a how-to tutorial, which makes it easy to transfer those answers to a codebase.
Cons
Now let’s look at some of the drawbacks of ChatGPT and tools like it:
- Security risks: For high-security applications, code provided by ChatGPT may not always adhere to best security practices. It’s also unclear whether its suggestions include copyrighted materials from other companies or publicly available sources. Always apply a high level of critical thinking when reviewing its outputs and considering whether to add them to your codebase. Many companies are publishing their own rules for employees’ ChatGPT use, which range from outright bans to training materials to no rules at all. If you’re using ChatGPT for professional purposes, abide by your company’s policy.
- Limited knowledge base: ChatGPT works on a model that is pretrained on a knowledge base with an end date (at the time of writing, January 2022 for GPT-3.5, the model available in the free plan). This is a moving window, and the date gets pushed forward with each new update, but there will always be limitations when you ask ChatGPT to generate code that depends on recent events, such as product launches, framework updates, or security patches. For premium users, ChatGPT includes browsing, which significantly reduces this drawback.
- Reduced scope: ChatGPT has a limited context window, which means that the amount of text you can include in the prompt is often equivalent to one file’s worth of software code. Many bugs and issues span a whole repository, and sometimes the root cause lies even beyond the codebase itself. ChatGPT is not very helpful for tackling challenges of such broad scope, so it’s not well suited for repository-wide solutions. Its replies are text-based, not file-based, and its context window is limited both for input and for output.
- Hallucinations: Despite its general accuracy, ChatGPT is not infallible. It sometimes generates “hallucinations” such as incorrect code, inaccurate explanations of programming concepts, or false claims about software functionality. Again, approach ChatGPT’s output with a critical mindset. Verify the accuracy of the code and any other information it provides against trusted sources, like official documentation, reputable forums, or peer review. Think of ChatGPT as a brainstorming tool rather than an absolute authority.
Practical example
I asked each of the AI tools covered in this chapter to generate code to solve the exact same challenge, a classic 2D-array problem I’ve used in many technical interviews over my career as a CTO. Here is the prompt as I provided it to ChatGPT’s GPT-3.5 model (in March 2024):
Generate code in javascript to solve the following challenge.
Context:
- We have one 2D array, filled with zeros and ones.
- We have to find the starting point and ending point of all rectangles filled with 0.
- It is given that rectangles are separated and do not touch each other however they can touch the boundary of the array.
- A rectangle might contain only one element.
Desired output:
- You should return an array, each element representing one rectangle.
- Each of those array elements contains an array with 4 elements that compose the rectangle (top left X, top left Y, bottom right X, bottom right Y).
Example arrays:
input1 = [ [1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 0, 0, 0, 1], [1, 1, 1, 0, 0, 0, 1], [1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1] ]
input2 = [ [0, 1, 1, 1, 1, 1, 0], [1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 0, 0, 0, 1], [1, 1, 1, 0, 0, 0, 1], [1, 1, 1, 1, 1, 1, 1], [1, 0, 0, 1, 1, 1, 1], [1, 0, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0, 0] ]
In less than 10 seconds, ChatGPT generated the code shown in Example 1-2.
Example 1-2. ChatGPT-generated code to solve the challenge.
function findRectangles(input) {
    const rectangles = [];
    const numRows = input.length;
    const numCols = input[0].length;

    // Helper function to check if a cell is within bounds and contains a zero
    function isValidCell(row, col) {
        return row >= 0 && row < numRows && col >= 0 && col < numCols && input[row][col] === 0;
    }

    // Helper function to perform depth-first search to find all cells in a rectangle
    function dfs(row, col, rect) {
        if (!isValidCell(row, col)) return;

        input[row][col] = 1; // Marking cell as visited
        rect[0] = Math.min(rect[0], col); // Update top left X
        rect[1] = Math.min(rect[1], row); // Update top left Y
        rect[2] = Math.max(rect[2], col); // Update bottom right X
        rect[3] = Math.max(rect[3], row); // Update bottom right Y

        // Check adjacent cells
        dfs(row + 1, col, rect); // Down
        dfs(row - 1, col, rect); // Up
        dfs(row, col + 1, rect); // Right
        dfs(row, col - 1, rect); // Left
    }

    // Iterate through each cell
    for (let i = 0; i < numRows; i++) {
        for (let j = 0; j < numCols; j++) {
            if (input[i][j] === 0) {
                const rectangle = [j, i, j, i]; // Initialize rectangle [top left X, top left Y, bottom right X, bottom right Y]
                dfs(i, j, rectangle); // Perform depth-first search to find all cells in the rectangle
                rectangles.push(rectangle);
            }
        }
    }

    return rectangles;
}

// Example arrays
const input1 = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1]
];

const input2 = [
    [0, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 1, 1, 1],
    [1, 0, 0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1, 0, 0]
];

console.log(findRectangles(input1));
console.log(findRectangles(input2));
I copied this code snippet into a local file on my computer, named it chatgpt.js, and ran it in the terminal. The result is shown in Example 1-3.
Example 1-3. Output of ChatGPT’s solution for the code challenge.
sergiopereira@Sergios-MacBook-Pro oreilly_book % node chatgpt.js
[ [ 3, 2, 5, 3 ] ]
[ [ 0, 0, 0, 0 ], [ 6, 0, 6, 0 ], [ 3, 2, 5, 3 ], [ 1, 5, 2, 7 ], [ 5, 6, 6, 7 ] ]
This output is absolutely correct! Furthermore, the generated code is well structured, organized into granular, well-defined functions with relevant comments.
To fully explore the boundaries of each tool, I tried this exact same prompt about 10 times per tool. ChatGPT generated a new and original reply every time, even for the exact same prompt. Not only did the generated code change, it didn’t always work correctly. Here’s how its answers break down:
- In most instances, it generated the correct result (as shown above), though with some variations in variable names and code organization.
- Twice, it generated partial results (for example, finding only the top left but not the bottom right of each rectangle).
- Once, it “hallucinated” by outputting the literal strings “X” and “Y” in each position of the array instead of making the necessary calculations.
For all these reasons, I rate ChatGPT a 9 out of 10 for this code challenge. While it generated the correct result most of the time, it can’t be a 10 out of 10, because it generated widely varying solutions that, in some cases, didn’t correctly solve the problem.
Google Gemini
Gemini, Google’s direct competitor to ChatGPT, is the company’s latest and most advanced AI model, succeeding previous models like LaMDA and PaLM 2. Its pros and cons for code generation are very much in line with those of ChatGPT. So let’s compare their performance on the same practical example and see how Google Gemini solved the 2D array challenge.
First, it took significantly longer to reply. Gemini doesn’t have ChatGPT’s streaming user experience, where you can see the reply building up as if someone were typing it. It felt like it was just processing for almost a full minute until it finally produced the result, which you can see in full in the google_gemini.js file in the book’s GitHub repository. This result, shown in Example 1-4, is partially correct.
Example 1-4. Console output after running Google Gemini’s solution for the code challenge.
sergiopereira@Sergios-MacBook-Pro oreilly_book % node google_gemini.js
[ [ 3, 2, 5, 3 ] ]
[ [ 3, 2, 5, 3 ], [ 1, 5, 2, 7 ], [ 5, 6, 6, 7 ] ]
Gemini returned the correct solution for the first input array (with only one rectangle, a simpler problem scope), but it only found 3 out of 5 rectangles in the second input array. I repeated the experiment a few times, just like with ChatGPT, but Gemini produced fewer variations than ChatGPT in the code it generated. Every solution it gave returned this exact same output.
The reason for Google Gemini’s partial failure appears to be that it misunderstood the requirements, which read in part, “A rectangle might contain only one element.” Gemini’s solution included a validation to exclude single elements in the output array, as shown in Example 1-5. The two missing rectangles in Gemini’s output were the two with only one element.
Example 1-5. Part of the code that caused Google Gemini’s partially failed solution.
// Check if it's a rectangle (not a single 0)
if (bottomRightX > topLeftX && bottomRightY > topLeftY) {
    rectangles.push([topLeftX, topLeftY, bottomRightX, bottomRightY]);
}
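Removing that guard would have fixed the output. A minimal correction, assuming the surrounding code computes the corner variables correctly (as it appeared to), is to push every rectangle unconditionally, single-element ones included:

// Hypothetical one-line fix (my edit, not Gemini's output):
// push every rectangle, including 1x1 ones.
rectangles.push([topLeftX, topLeftY, bottomRightX, bottomRightY]);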
I rate Google Gemini’s solution an 8 out of 10. While it didn’t generate code that correctly solves the full challenge, it did produce the correct algorithm; it just added an unnecessary validation that violated one of the guidelines in the brief. This was more a scope misunderstanding than a genuinely wrong code solution. It generated consistently similar code snippets, and it never produced any hallucinations.
The other tools I tried were unable to solve the challenge or sometimes even to generate code that would run.
IDE-based Tools
Next, let’s review the top IDE-based tools, beginning with the top contender: GitHub Copilot.
GitHub Copilot
GitHub Copilot is a collaborative creation by GitHub, OpenAI, and Microsoft. As its documentation states: “GitHub Copilot is an AI pair programmer that helps you write code faster and with less work. It draws context from comments and code to suggest individual lines and whole functions instantly.”
In fact, GitHub Copilot launched before any of the other tools reviewed in this chapter: in June 2022, roughly 5 months before ChatGPT’s launch. Soon after launch, GitHub claimed that 20,000 organizations were using Copilot. That number has grown to 37,000 at the time of writing (early 2024). In October 2023, Microsoft CEO Satya Nadella claimed that more than a million people were paying to use Copilot. This perhaps paints the most accurate picture of the tool’s usage: users pay at least US$10 per month for access. (Copilot’s free tier, as of early 2024, is reserved for “verified students, teachers, and maintainers of popular open source projects.” Others can sign up for a 30-day free trial.)
While GitHub Copilot uses OpenAI’s GPT models, including GPT-4 (the model available in ChatGPT’s premium tier), its training data focuses on public GitHub code repositories, documentation, and code comments. This has generated some controversy, with critics alleging that its output often copies code snippets verbatim from the training data. Since some of those repositories are copyrighted, it’s no surprise that copyright-infringement lawsuits have already been filed against GitHub for this exact reason.
When you install GitHub Copilot, you’ll be asked some questions about what type of code you want it to include in the code it generates. You can allow any code from the training data or place restrictions around copyright and publicly available code. I’d expect many developments on this legal front for Copilot (and most other tools, too), especially as regulators delineate what’s acceptable for AI generation and what constitutes an unacceptable copyright violation.
Pros
Now that you have some context, let’s look at the advantages of GitHub Copilot:
- Maximum convenience: Like other IDE-based tools, Copilot lives inside your IDE and generates code directly in the file in which you’re already coding. This allows for a higher level of integration into the software development flow.
- Context window includes the entire codebase: Unlike browser-based tools, which require you to type or paste context into the browser window, Copilot and other IDE-based tools already have the context: they use the whole codebase whenever you ask them to generate code. This makes them especially suitable for generating code with dependencies on functions or variables declared in different files in the same repository.
Cons
So what are Copilot’s drawbacks?
- It can generate copyrighted code: As mentioned above, GitHub Copilot has been seen generating code copied verbatim from repositories in its training data, which in some cases could be copyrighted. Using such code could cause trouble for you.
- Lack of depth: Most browser-based tools have many general-purpose applications that go way beyond just generating code. With those, you can have a broader discussion about research, brainstorm implementation options, and assess tradeoffs. In GitHub Copilot and other IDE-based tools, this is not as feasible, though Copilot has recently rolled out a chat function that aims to provide a user experience comparable to the browser-based tools.
Practical example
I used GitHub Copilot to solve the exact same code challenge I gave to all the other tools. But its user experience was quite different from the browser-based tools covered earlier. Let me walk you through that experience.
I installed the GitHub Copilot extension in my IDE (Visual Studio Code), which is where the action happens. In any empty file, Copilot prompts me to press a key command that opens its widget, as shown in Figure 1-1.
When I press ⌘ I as instructed, the widget opens (Figure 1-2) and I paste in the exact same prompt I used with ChatGPT.
As I hit the Enter key, GitHub Copilot starts generating code right there in the code file inside the IDE. The user experience is very much in line with ChatGPT’s, in that it starts writing the code immediately when I submit the prompt and renders the code as if someone’s typing it very fast, line by line. Both tools take about 10 seconds to generate the full solution.
Now, there’s one big difference: GitHub Copilot’s solution is incorrect. Example 1-6 shows the console output when I run the code it generated.
Example 1-6. Console output for the solution generated by GitHub Copilot.
sergiopereira@Sergios-MacBook-Pro oreilly_book % node github_copilot.js
[ [ 3, 2, 5, 3 ], [ 4, 2, 5, 3 ], [ 5, 2, 5, 3 ], [ 3, 3, 5, 3 ], [ 4, 3, 5, 3 ], [ 5, 3, 5, 3 ] ]
[ [ 0, 0, 0, 0 ], [ 6, 0, 6, 0 ], [ 3, 2, 5, 3 ], [ 4, 2, 5, 3 ], [ 5, 2, 5, 3 ], [ 3, 3, 5, 3 ], [ 4, 3, 5, 3 ], [ 5, 3, 5, 3 ], [ 1, 5, 2, 7 ], [ 2, 5, 2, 7 ], [ 1, 6, 2, 7 ], [ 2, 6, 2, 7 ], [ 5, 6, 6, 7 ], [ 6, 6, 6, 7 ], [ 1, 7, 2, 7 ], [ 2, 7, 2, 7 ], [ 5, 7, 6, 7 ], [ 6, 7, 6, 7 ] ]
While Copilot’s solution correctly found the top left and bottom right corners of all rectangles, it also added an entry to the rectangles array for every zero cell, as if each individual element were a rectangle in itself. The code that flags elements as top left corners is missing the logic to check whether an element already lies inside a previously found rectangle before flagging it as the start of a new one. (I sketch a possible fix after Example 1-7.)
So GitHub Copilot’s algorithm is mostly correct, but misses a key part of the logic. Example 1-7 shows the code it generated, which you can also find in the file github_copilot.js in the book’s GitHub repository.
Example 1-7. Code generated by GitHub Copilot to solve the challenge.
function findRectangles(matrix) {
    const rectangles = [];
    const rows = matrix.length;
    const cols = matrix[0].length;

    for (let i = 0; i < rows; i++) {
        for (let j = 0; j < cols; j++) {
            if (matrix[i][j] === 0) {
                let topLeftX = j;
                let topLeftY = i;
                let bottomRightX = j;
                let bottomRightY = i;

                // Find the bottom right corner of the rectangle
                while (bottomRightX + 1 < cols && matrix[i][bottomRightX + 1] === 0) {
                    bottomRightX++;
                }
                while (bottomRightY + 1 < rows && matrix[bottomRightY + 1][j] === 0) {
                    bottomRightY++;
                }

                rectangles.push([topLeftX, topLeftY, bottomRightX, bottomRightY]);
            }
        }
    }

    return rectangles;
}

const input1 = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1]
];

const input2 = [
    [0, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 1, 1, 1],
    [1, 0, 0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1, 0, 0]
];

const rectangles1 = findRectangles(input1);
const rectangles2 = findRectangles(input2);
console.log(rectangles1);
console.log(rectangles2);
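Here is a sketch of the missing check (my code, not Copilot’s output): a helper that tests whether a zero cell already falls inside a previously found rectangle, so the double loop can skip such cells. Given the brief’s guarantee that rectangles are separated and do not touch, this change makes the solution produce the expected output for both example arrays:

// Hypothetical helper, not generated by Copilot: returns true if the
// cell (row, col) lies inside any rectangle already found.
// Rectangles are stored as [topLeftX, topLeftY, bottomRightX, bottomRightY],
// where X is the column index and Y is the row index.
function insideExisting(rectangles, row, col) {
    return rectangles.some(
        ([x1, y1, x2, y2]) => col >= x1 && col <= x2 && row >= y1 && row <= y2
    );
}

// In the double loop, replace the corner test with:
// if (matrix[i][j] === 0 && !insideExisting(rectangles, i, j)) { ... }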
For these reasons, I rate GitHub Copilot a 6 out of 10 on this task. Its solution only partially solved the problem, but the solutions it generated were quite consistent over multiple attempts and never included any hallucinations.
Amazon CodeWhisperer
In the race for AI code-generation market share, CodeWhisperer is Amazon’s contender. It was released after most of the other tools covered in this chapter but has gained some momentum, especially among the very large user base of Amazon Web Services (AWS). Let’s see how it handles our code challenge.
Example 1-8. Console output for the code generated by Amazon CodeWhisperer.
sergiopereira@Sergios-MacBook-Pro oreilly_book % node amazon_code_whisperer.js
[ [ 3, 2, 5, 3 ] ]
[ [ 0, 0, 0, 0 ], [ 6, 0, 6, 0 ], [ 3, 2, 5, 3 ], [ 1, 5, 2, 7 ], [ 5, 6, 6, 7 ] ]
As shown in Example 1-8, Amazon’s solution returned the correct result, but only on my sixth try. I must also tell you that CodeWhisperer has the clunkiest user experience of all the tools in this chapter.
First, it took me some time to figure out the exact comment syntax I needed to use to generate the code. Second, CodeWhisperer generated more hallucinations than any other tool. Before it produced this correct result, I tried five times, during which it generated simple code comments without any actual code, code that would throw console errors based on undeclared variables, and solutions that only partially solved the challenge. Amazon’s tool had the widest range of discrepancies among attempts to solve the same problem.
For these reasons, I rate Amazon CodeWhisperer a 7 out of 10. Its correct output feels like a lucky strike, given the range of useless hallucinations it generated for the exact same prompt right before producing the correct solution.
Tool Comparison
If I were to select a single tool to solve algorithmic functions like this 2D array challenge, ChatGPT would be my go-to choice: it returned the correct result in most instances. Table 1-1 provides an overview of the tools reviewed here.
Table 1-1. Overview of the tools reviewed in this chapter.

Tool | Interface | Test performance
---|---|---
ChatGPT | Browser | 9/10
Google Gemini | Browser | 8/10
GitHub Copilot | IDE | 6/10
Amazon CodeWhisperer | IDE | 7/10
However, for more general-purpose software development, which usually involves a much broader scope and more nuanced requests, I’d probably use GitHub Copilot, for the convenience of having it in my IDE at a code comment’s distance.
Conclusion
I’ve used the 2D array code challenge from this example dozens of times in interviews over the years. Usually, I start an hour-long live-coding interview by giving the candidate the challenge brief pretty much exactly as I’ve given it here. The candidates then code the solution, thinking out loud as they work and occasionally searching Google for help.
In that hour-long interview, only a very few candidates have ever managed to solve the full scope of the challenge (multiple rectangles). Most write partial solutions that find only one rectangle, or only the top left corners, or some other variation.
It’s incredible that a free tool like ChatGPT (GPT-3.5, in this case) can produce the same outcomes as those top performers in only 10 seconds. However, it’s also important to stress that it didn’t always produce the correct answer. Even with this objective, straightforward prompt, it produced partial solutions and even hallucinated once.
While every tool reviewed in this chapter was considered best-in-class for code generation at the time of writing (early 2024), none performed better than ChatGPT. The only other tool that generated a correct solution was Amazon CodeWhisperer, and that was a one-off among the wild hallucinations it generated in all my other attempts. The remaining tools generated either partial solutions or solutions that didn’t run.
None of the tools reviewed here produced a correct result for this challenge on all attempts, and most failed to produce a correct solution at all. Even for those that did produce a correct result, I have no way to know if the code they generated is copyrighted. Again, you must exercise caution.
Most of the prompts a software engineer would use on a daily basis are way more complex or subjective than this challenge, which would increase the likelihood of these tools generating wrong or misleading results. Again, critical thinking is key when using these tools.
Most software engineers treat the review of an AI tool’s reply as simply confirming that it actually solves their problem. While that is of course important, I recommend a few additional rules of thumb.
First, always review AI-generated code before pushing it to production or opening a pull request. Make the code yours, regardless of how much of it was generated by your tool. Second, test your code. Run it against a test suite that covers a wide range of cases, from the happy path to edge cases and error states. Getting all tests green is a solid confirmation that the code fulfills your requirements. And finally, while I’ve said it before, be sure to revisit your company’s guidelines for any AI tools you use for professional purposes.
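For the 2D array challenge, for instance, a minimal check using Node’s built-in assert module might look like the sketch below (assuming findRectangles and the input arrays from Example 1-2 are in scope; the expected values come from Example 1-3):

// Minimal test sketch using Node's built-in assert module.
const assert = require("node:assert");

assert.deepStrictEqual(findRectangles(input1), [[3, 2, 5, 3]]);
assert.deepStrictEqual(findRectangles(input2), [
    [0, 0, 0, 0],
    [6, 0, 6, 0],
    [3, 2, 5, 3],
    [1, 5, 2, 7],
    [5, 6, 6, 7],
]);
console.log("All tests passed");

Note that this assumes the rectangles come back in row-major scan order, as ChatGPT’s solution returns them; a more robust test would sort both arrays before comparing.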