Chapter 4. AI Use Cases For Mass-Scale Refactoring and Analysis
At Moderne, our mission is to help our customers understand and transform their large, unwieldy codebases at mass scale. Whether it’s migrating frameworks, modernizing for the cloud, remediating security vulnerabilities, or performing impact analyses, these are tasks that typically scale beyond individual developers working in single repositories.
In our platform, we have brought together AI models, LST artifacts, and OpenRewrite rules-based recipes to automate code refactoring and analysis work at mass scale. This chapter details four of our current use cases:
- FIND: Implementing an AI-based search, enabling users to quickly find recipes that can help them
- ASSIST: Using rules-based recipes to ask AI to help with simple operations, leveraging specific context from the LSTs
- DIAGNOSE: Using AI models, working with LST data, to recommend recipes based on what the code needs
- BUILD: Using GenAI to assist in OpenRewrite recipe development to enable automated code fixes at mass scale
These use cases illustrate how to put the techniques and technologies we’ve explained earlier into practice—from building RAG-like pipelines to improving AI assistant responses.
Using AI to Search a Recipe Catalog (FIND)
The objective of human-computer interaction (HCI) is to enhance usability, which encompasses building an intuitive platform as well as features that help users get the most out of an interface. The latest AI-based interfaces let users interact with computers in natural language, simplifying that interaction even further.
This section will walk through how we upgraded the search function of the Moderne Platform with an AI-based interface to ensure that our users could more easily find the OpenRewrite recipe they were looking for, as well as discover other relevant search and refactoring recipes to use across their large codebases. This use case demonstrates a multistep pipeline using two different embedding models.
Problem: Failing Exact Match Recipe Searches
The first-generation search in the Moderne Platform used the Apache Lucene library (i.e., string-matching) and required exact match keywords, which often prevented users from finding the desired recipe. For example, a search for “Find method invocations” wouldn’t return the recipe titled “Find method usages,” despite the terms “invocations” and “usages” being synonymous in this context.
We needed a way for our users to search the recipe catalog of thousands of recipes based on concepts instead of exact words, which led us to an AI-based solution. This function needed to be fast and operationally inexpensive (both in time and financially), allowing us to scale as the recipe catalog grows. We also wanted to use an open source solution where we had more control over security for air-gapped and restricted environments, so using OpenAI’s API or other providers was out of the question.
Solution: Build an AI-Powered Search Engine
We built an efficient, semantic, AI-powered search engine that lets users quickly find the recipe they are looking for based on word representations and concepts. This is made possible through embeddings: vector representations of concepts that allow searching based on the distance between vectors.
The reality is that most recipes will be poor overall matches for a given query. Our solution incorporates two embedding models to get to a relevant subset. The first model conducts a preliminary sweep of all recipes, while the second, more sophisticated but slower model, meticulously refines the results of the initial pass, as shown in Figure 4-1.
Stage 1: Retrieving initial search results
There are many different retrievers you can use for returning a response to a query. We evaluated three of them for our use case:
- Regular retriever: Native to the vector database, this simple retriever is typically based on the distance between embeddings, and the results are the elements closest to the query’s embedding.
- Multiquery retriever: This uses a generation model (such as OpenAI GPT or Code Llama) to produce a range of similar queries rather than just the one user-provided query, enhancing the diversity and comprehensiveness of the set of results retrieved.
- Ensemble retriever: This technique fetches search results using multiple retrievers, combining the best of embedding-based search and keyword matching.
We found that the regular retriever worked well for our use case without adding unsupportable overhead. We ended up using a BAAI retrieval model, specifically bge-small-en-v1.5.
The multiquery retriever was not a fit for several reasons. It comes with a significant computational cost, which in turn leads to higher latency, and to get useful queries you need an LLM that is good at generation, such as an OpenAI model or an OSS model like Mistral or Code Llama. With the ensemble retriever, we did not see a significant performance difference for our use case beyond what we were already getting from the regular retriever.
Retrieval is quick at search time because all the embeddings, except for the query, are already computed when initializing the recipe database at startup. This means that all you have to do is compare the distance between a recipe’s embedding and the query’s embedding.
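As a minimal sketch of this stage (assuming the sentence-transformers library and a toy in-memory catalog, not Moderne’s production code), the retrieval step looks roughly like this:

```python
# Retrieval sketch: embed the catalog once, then embed only the query at search time.
from sentence_transformers import SentenceTransformer, util

retriever = SentenceTransformer("BAAI/bge-small-en-v1.5")

# Hypothetical recipe descriptions standing in for the real catalog.
recipes = [
    "Find method usages: locate invocations of a given method",
    "Change method name: rename a method across the codebase",
    "Upgrade to Spring Boot 3 from 2.x",
]

# Recipe embeddings are computed when the recipe database is initialized.
recipe_embeddings = retriever.encode(recipes, normalize_embeddings=True)

def retrieve(query: str, k: int = 2):
    # Only the query is embedded at search time; relevance is the cosine
    # similarity between the query embedding and each recipe embedding.
    query_embedding = retriever.encode(query, normalize_embeddings=True)
    scores = util.cos_sim(query_embedding, recipe_embeddings)[0]
    top_k = scores.argsort(descending=True)[:k]
    return [(recipes[int(i)], float(scores[i])) for i in top_k]

print(retrieve("Find method invocations"))
```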
Stage 2: Reranking to finesse AI search results
While the retriever has to be efficient for large sets of recipes, it might return irrelevant candidates. Also, the order in which even the top few recipes are returned in practice can be counterintuitive.
Reranking is a technique to finesse your search results, up-leveling the more likely candidates and providing a more intuitive ordering. It takes the query and a recipe retrieved in the first stage and passes them simultaneously to a transformer neural network, which then outputs a single score between 0 and 1 indicating how relevant the recipe is to the query. Because the reranking model is a cross encoder, it has access to the query and recipe text together, meaning it can better grasp the subtleties of how they relate.
But this also means that reranker scores can’t be calculated in advance of the query coming in. While the qualitative performance of the reranker is better than a regular retriever, scoring thousands of query/recipe pairs (i.e., for every recipe in the catalog) would be prohibitively expensive. In addition to the lack of cacheability, reranker models are simply larger than retriever models. In our case, we use another model from BAAI called bge-reranker-base, which is 1.11 GB. This is significantly larger than the 134 MB for our retrieval model.
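To illustrate (this is a hedged sketch using the sentence-transformers CrossEncoder wrapper, not Moderne’s exact implementation), cross-encoder scoring of query/recipe pairs looks like this:

```python
# Reranking sketch: the cross encoder sees the query and recipe text together.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-base")

query = "Find method invocations"
candidates = [
    "Find method usages: locate invocations of a given method",
    "Change method name: rename a method across the codebase",
]

# Each (query, recipe) pair is scored jointly; scores fall between 0 and 1.
scores = reranker.predict([(query, candidate) for candidate in candidates])
print(list(zip(candidates, scores)))
```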
Combining the stages into a pipeline
Now let’s look at the pipeline in operation, as shown in Figure 4-1. By taking just the top-k from the retriever step, the obvious poor recipe matches are swept away in Stage 1. We then pass these top-k recipes to the reranker model in Stage 2. The reranker scores the recipes, discards the recipes that don’t meet a predetermined threshold, and orders the remaining set of recipes based on their respective scores—without any consideration given to the order from the original retrieval step.
This two-step pipeline, using two models of varying sizes, yields the best results both for performance and accuracy.
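The glue logic that combines the two sketches above might look like the following; the top-k value and threshold are illustrative placeholders, not our production settings:

```python
# Two-stage search sketch reusing retrieve() and reranker from the sketches above.
def search(query: str, k: int = 10, threshold: float = 0.5):
    # Stage 1: a cheap sweep over the whole catalog keeps only the top-k recipes.
    candidates = retrieve(query, k=k)
    # Stage 2: rerank only those survivors.
    scores = reranker.predict([(query, recipe) for recipe, _ in candidates])
    kept = [(recipe, float(score))
            for (recipe, _), score in zip(candidates, scores)
            if score >= threshold]
    # Final ordering comes solely from the reranker scores.
    return sorted(kept, key=lambda item: item[1], reverse=True)
```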
Using AI to Support Rules-Based Refactoring at Scale (ASSIST)
We talked about using AI as another tool in your toolbox. To that end, LLMs can become a useful assistant to OpenRewrite recipes, adding further analysis and transformation options for codebases. Connecting rules-based techniques and AI-based techniques is a powerful combination, one that can help reconcile the need for a “human in the loop” with GenAI.
This section details how the Moderne Platform provides the framework for a recipe to walk through a large codebase (aggregated LSTs) in a deterministic way, calling the AI model only when needed. This not only safeguards the model and focuses it on precise places in the code, but also makes models more efficient, as they are used only when needed.
Problem: Misencoded French Characters in Code
One of our customers came to us with a problem ripe for an automated fix that leverages AI. Their older code had gone through multiple stages of character encoding transformation through the years, leading to misencoded French characters being unrenderable.
French text includes accented characters such as é or è, as well as characters like ç or œ. These special characters could be found in comments, Javadocs, and basically anywhere there was textual data in the customer’s codebase. ASCII is a 7-bit character encoding standard designed primarily for the English alphabet, supporting a total of 128 characters (0–127). This set includes letters, digits, punctuation marks, and control characters, but not accented characters such as é or other non-English characters. When a character has an encoding issue, it is replaced by ? or �.
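To see how such corruption arises, the snippet below reproduces two typical failure modes; this is an illustration, not the customer’s actual pipeline:

```python
# Illustration of how accented French characters get garbled.
comment = "Vérifie que la donnée est présente"  # original UTF-8 text

# Decoding UTF-8 bytes as Latin-1, a common legacy-encoding mistake, produces mojibake.
garbled = comment.encode("utf-8").decode("latin-1")
print(garbled)  # VÃ©rifie que la donnÃ©e est prÃ©sente

# Forcing the text into ASCII replaces every accented character outright.
print(comment.encode("ascii", errors="replace").decode("ascii"))  # V?rifie que la donn?e est pr?sente
```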
Other ramifications for this customer were that the misencoded French characters in Javadoc comments caused the Javadoc compiler itself to fail, which meant consumers of that code did not have ready access to documentation on the APIs they were using. It also hindered automatic test generation.
Solution: Find and Fix Misencoded Characters
With the Moderne Platform and a little help from AI, we were able to quickly solve this problem. We decided to use AI to figure out what the words were supposed to be and to fill in the appropriate modern UTF-8 characters.
We wrote an OpenRewrite recipe to use AI to fix the misencoded comments and Javadocs. The recipe walks through the codebase (i.e., the LSTs) until it finds either a comment or a Javadoc. It then sends the text in the comment or Javadoc to one LLM sidecar that determines whether it’s French text. If it is, that text is then sent to another sidecar to generate a predicted fix for the misencoded text. We were able to integrate two models for a transformation that can be used at scale. By having a recipe that guides the changes, our customer could be certain that the changes would only be on comments or Javadocs, essentially safeguarding their code from any unnecessary change.
We focused on OSS-specialized LLMs that can run on CPUs for maximum operational security and efficiency within the Moderne Platform. This enabled us to provision the models on the same CPU-based worker nodes where the recipes were manipulating the LST code representations.
On the Moderne Platform, models run as a Python-based sidecar using the Gradio Python library. Each Python sidecar hosts a different model, allowing for a variety of tasks to be performed. A recipe then has several tools in its toolbox, and different recipes can also use the same sidecar.
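As a hedged sketch of what such a sidecar might look like (the model checkpoint and interface here are illustrative assumptions, not Moderne’s actual sidecar), a Gradio app can expose a language-detection model over HTTP:

```python
# Sketch of a Python sidecar serving one model via Gradio.
import gradio as gr
from transformers import pipeline

# Hypothetical stand-in for the "is this French?" classifier.
detector = pipeline("text-classification",
                    model="papluca/xlm-roberta-base-language-detection")

def is_french(text: str) -> bool:
    # The pipeline returns the top label; "fr" means the text is French.
    return detector(text)[0]["label"] == "fr"

gr.Interface(fn=is_french, inputs="text", outputs="json").launch(server_port=7860)
```

A recipe running on the same worker node can then call such an endpoint for each comment or Javadoc it encounters.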
When a recipe is running on a worker, it can search LSTs for the necessary data, pass it to the appropriate LLM, and receive a response. Only the parts of the LST that need to be evaluated or transformed are sent to the model. The recipe then inserts the LLM’s response back into the LST. The Moderne Platform produces diffs for developers to review and commit back to source code management (SCM), ensuring models are doing their job with precision. See the process in Figure 4-2.
Fixing misencoded French text in a codebase is challenging for both purely rules-based systems and LLMs alone. Rules-based systems can’t identify the natural language in comments, and LLMs struggle to identify comments or other code syntax/semantic data. By using recipes to guide and focus the LLM, we achieve more predictable and reliable results.
We tested the same fix on ChatGPT alone and found too many instances where the LLM failed to fix the misencoded comments, as shown in Figure 4-3. For example, it fails to understand that “this” refers to the Java keyword rather than the French determiner. It also “fixes” “class” to “classe,” which could be frustrating for developers.
Using AI to Discover Problems in a Codebase (DIAGNOSE)
We’ve found that it’s really useful for AI chatbots to have access to tools, something we call “AI gets a computer.” Tools such as a calculator enable AI models to behave more autonomously, mimicking traits of an agent that can interact with its environment, while limiting hallucinations and significantly improving the quality of their output.
This section details the use case of how the Moderne Platform helps LLMs sample our customers’ large codebases, diagnose issues in the code, and recommend applicable fixes (i.e., recipes).
Problem: What’s Wrong with My Code?
In large enterprise codebases spanning millions if not billions of lines of code, the old adage “you don’t know what you don’t know” rings especially true. Our customers may not be aware of repositories with older framework versions that are reaching end of life, or whether they’ve missed a security vulnerability, such as Log4Shell, somewhere in the code.
How could we help our customers cut through the noise to discover issues in their codebase, as well as fix the issues for them?
Solution: AI Sampling a Codebase and Recommending Recipes That Can Fix Problems
We developed a recommendations tool designed to diagnose issues in a codebase and recommend recipe fixes specific to that codebase. This solution has three main stages (shown in Figure 4-4):
1. A recipe extracts methods or classes from the codebase and feeds them to a model to produce embeddings. We cluster the embeddings using k-means, then select samples from each cluster to give to a generative model that makes recommendations (see the sketch after this list).
2. Using these recommendations, our in-house recipe search (see the FIND use case) discovers recipes that can perform the required modernization, fix, or migration. This validates the AI recommendation.
3. We prove the efficacy of the AI recommendation by automatically running the safe, tested OpenRewrite recipes on the code; only recipes that actually produce changes are shown.
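A minimal sketch of the first stage might look like the following, assuming sentence-transformers for embeddings and scikit-learn for k-means; the model, cluster count, and sampling strategy are illustrative placeholders rather than Moderne’s production choices:

```python
# Sketch of sampling a codebase to seed recommendations.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")  # placeholder embedding model

# Hypothetical method bodies a recipe might extract from the LSTs.
methods = [
    "public Boolean isActive() { return Boolean.TRUE; }",
    "void log(String msg) { System.out.println(msg); }",
    "List<String> names = new Vector<String>();",
]

embeddings = embedder.encode(methods)

# Cluster the embeddings, then take one representative sample per cluster.
kmeans = KMeans(n_clusters=2, n_init="auto").fit(embeddings)
samples = [methods[int(np.where(kmeans.labels_ == c)[0][0])]
           for c in range(kmeans.n_clusters)]

# The samples are then placed in a prompt for a generative model, and its
# recommendations are validated by searching for and running real recipes.
prompt = "Suggest modernizations and migrations for this code:\n" + "\n".join(samples)
```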
It’s important to use recipes for validation as the last step before making such huge transformations. Looking at the second stage of the recommendation pipeline, hallucinations were something to worry about. One hallucination we saw when prompting the generative model for modernization recommendations was a suggestion to upgrade the Java version because “the Boolean class was introduced in Java 5, but it has been deprecated in favor of the boolean primitive type.” This is absolutely false. If you were not a Java coder or very knowledgeable about versions, you might not know that both the Boolean class and the primitive boolean are still valid ways to represent a boolean in Java 8. So what do we do with recommendations that may contain hallucinations?
Thankfully, at Moderne, we have a set of recipes that we trust and know are accurate. The next step for us is simple: search for recipes based on the recommendation. If there are any recipes that deal with the Boolean class or Java migrations, we can run them to see whether they produce changes. The accurate recipe is the only thing that can make a change to the source code.
Using AI to Develop OpenRewrite Recipes (BUILD)
Given the nondeterministic nature of GenAI, as well as the fact that a coding assistant can change only a single file at a time, large-scale code remediation efforts are not feasible with AI assistants alone. Even a slight risk of error is unacceptable, and the human review that would be required simply doesn’t scale.
However, AI assistants can be helpful in writing the OpenRewrite recipes that can then be run across a large codebase to stamp out the change—cookie-cutter style. Recipes and their unit tests are highly structured. A recipe can be an imperative program, a declarative composite recipe expressed in YAML, or a declarative template-style recipe (“Refaster style”) that specifies before and after templates for code transformation.
Recipes are tested and reviewed by a human first and then can be used to produce perfect cookies (i.e., code changes) at scale. Involving a human in the recipe creation process ensures reliability and predictability in the changes implemented by the recipe.
This section walks through how we can use AI during the writing process for OpenRewrite recipes. LLMs are great at writing repetitive but necessary parts of code or sketching a possible solution (proposed code) to a problem, such as a large-scale migration. But the actual application of changes must remain deterministic to be effective at scale.
Problem: Generating OpenRewrite Migration Recipes Faster
Modern applications are assembled with as much as 90% of their code coming from dependencies. These dependencies are outside of business control and evolve at their own pace, remediating vulnerabilities and developing new capabilities—in effect changing API signatures or deprecating and deleting APIs.
OpenRewrite makes it easy to provide a refactoring recipe to lift consumers of APIs to a new library version. This enables framework and library authors to freely make changes without worrying about affecting their consumers because upgrades are now automated. However, given that the ecosystem of OSS and third-party vendor libraries is so broad, how do we as a community catch up and keep producing refactoring recipes faster?
Solution: Leveraging GenAI to Write Recipes
At Moderne, our developers write recipes with assistance from Copilot, just like any other program, with the human reviewing each suggestion and supervising the production of the correct recipe for refactoring or remediation. But it is still the human who needs to go read release notes and decide what recipes to build.
Determining what has changed is frequently not straightforward, as it relies on the quality of release notes, security reports, and other documentation. When such information is available, however, the AI assistant can leverage it to full effect: inserted into an IDE as a comment prefixing a specific recipe declaration, or included in a ChatGPT prompt asking for a declarative OpenRewrite migration generated from the release notes.
OpenRewrite’s declarative YAML format is perfect for capturing the majority of migration changes, such as API and dependency version changes. We still rely on developers to correct the definition, but in our experiments, the time saved by having AI write out the boilerplate is significant. We recommend preprocessing the release notes before feeding them to the AI assistant: remove bug fixes, contributor lists, and other extraneous information so that the AI is focused on just what has changed.
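For reference, a declarative migration of the kind the assistant might draft looks like the sketch below; the recipe name, method pattern, and versions are hypothetical, while ChangeMethodName and UpgradeDependencyVersion are real OpenRewrite recipes being composed:

```yaml
# Hypothetical declarative migration recipe in OpenRewrite's YAML format.
type: specs.openrewrite.org/v1beta/recipe
name: com.example.UpgradeAcmeLib
displayName: Upgrade acme-lib to 2.x
description: Example migration sketched from hypothetical release notes.
recipeList:
  - org.openrewrite.java.ChangeMethodName:
      methodPattern: com.acme.Client send(..)
      newMethodName: sendMessage
  - org.openrewrite.java.dependencies.UpgradeDependencyVersion:
      groupId: com.acme
      artifactId: acme-lib
      newVersion: 2.x
```

A developer reviews and corrects a definition like this once, and the recipe can then be run across every affected repository.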
We’re also working to leverage AI in improving the recipe feedback cycle. Determining what has changed between versions of libraries is not a straightforward task. For example, understanding the high-level Spring framework release notes is not enough. If you use Kafka, you will likely also need to migrate Kafka libraries, and if you use AWS cloud APIs, you will also need to migrate that. Unfortunately, the lack of recipe coverage usually shows up during the compilation of repos after applying a recipe. With AI, we can help people understand the additional recipe coverage they need and fill the gaps more quickly—such as finding existing recipes or understanding the changes necessary to accomplish an upgrade.