Chapter 1. Introduction

Why Generative AI services will power future applications

We use computers to automate solutions to everyday problems.

In the past, automating a process required you to manually write code, which could become tedious for complex issues like spam detection. Nowadays, you can develop a model by training it with sufficient data that contains the necessary patterns to understand the nuances of the business process. Once trained, this model can then replace your manually written application code.

This gave rise to a wave of AI-powered applications in the market, solving a range of problems such as price optimization, product recommendation, and weather forecasting. As part of this wave, generative models emerged that differed from other types of AI in their ability to generate novel outputs rather than just predicting, analyzing, or classifying data.

As a software engineer, I believe these models have certain capabilities that will drive the roadmap of future applications. They can:

  1. Facilitate the creative process

  2. Suggest contextually relevant solutions

  3. Personalize user experience

  4. Minimize delay in resolving customer queries

  5. Act as an interface to complex systems

  6. Automate manual back office tasks

  7. Scale and democratize content generation

Let’s look at each capability in more detail.

Facilitating the creative process

Mastering skills and acquiring knowledge are cognitively demanding. You can spend a long time studying and practicing before you form your own original ideas for producing novel, creative content like an essay or a design. The creative process requires a deep understanding of the purpose behind your creation and a clear awareness of the sources of inspiration and ideas you intend to draw upon. When you sit down to make something new, such as an original essay, you may find it difficult to start from a blank page; you need to have done extensive research on the topic to form your own opinions and the narrative you want to write about. The same applies to design. When designing a user interface, for instance, you may need a few hours of research browsing design websites for ideas on color palette, layout, and composition before you originate a design of your own. Creating something truly original from a blank canvas can feel like climbing a wall bare-handed: you need inspiration, and you have to follow a creative process.

Producing original content requires creativity. Creativity involves complex, non-linear thinking and emotional intelligence, which makes it challenging to replicate or automate with scripting and algorithms. Yet it is now possible to mimic creativity with generative AI.

It is an extraordinary feat for humans to produce original ideas from scratch. New ideas and creations are often based on inspirations, connecting ideas and adaptations of other works.

Generative AI allows us to streamline this process by bridging various ideas and concepts drawn from an extensive repository of human knowledge. These models embed knowledge into “latent” mathematical spaces and then leverage interpolation techniques to navigate those spaces and produce new content that never existed in the model’s training data. With a prompt, you can navigate the latent space and request content in any shape or form to solve a problem or answer a query.
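To make the idea of interpolation a little more concrete, here is a minimal, illustrative sketch in Python. The embedding vectors are made up for demonstration purposes; real latent spaces have hundreds or thousands of dimensions, and the interpolated points would be decoded by the model back into text or images.

import numpy as np

# Two made-up points in a latent space, standing in for the embeddings of
# two different concepts (real embeddings are far higher-dimensional).
concept_a = np.array([0.9, 0.1, 0.4])
concept_b = np.array([0.1, 0.8, 0.7])

# Linear interpolation walks between the two concepts; decoding points along
# this path is one way a generative model can blend ideas into new content.
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    blended = (1 - alpha) * concept_a + alpha * concept_b
    print(f"alpha={alpha}: {blended.round(2)}")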

This paves the path towards a new wave of AI-powered applications that facilitate the creative process. During that process you may run into writer’s block, difficulty imagining and visualizing scenes, navigating ideas, creating narratives, constructing arguments, or understanding the relationships between concepts. You can overcome these challenges and reduce the cognitive load by interacting with a GenAI service. You may also stumble upon novel ideas that would otherwise require grasping a large body of interconnected knowledge and the interactions between several concepts.

Another place where GenAI plays a role in the creative process is in helping you imagine hard-to-visualize concepts. Here is an example of a concept that can be hard to visualize:

Example 1-1. A scene description prompt for an image generation model
grayscale 4k, In the distance, two moons overlap in a perfect eclipse,
their edges shimmering with an impossible spectrum of gray,
casting an ethereal glow over the scene.

This can be quite difficult to imagine unless you’re accustomed to imagining such concepts. However, with the help of a generative model, anyone can now visualize and communicate challenging concepts to others.

Providing the scene description in Example 1-1 to a GenAI tool such as Midjourney produces an output like the one shown in Figure 1-1:

Figure 1-1. Midjourney output for the scene described in Example 1-1 [Source: https://midjourney.com]

It is fascinating to see how these GenAI tools can help you visualize and communicate challenging concepts.

These tools allow you to expand your imagination and nudge your creativity. When you feel stuck or find it difficult to communicate or imagine novel ideas, you can turn to these tools for help.

In the future, I can see applications including similar features to help users in their creative process. If your application gives users several suggestions to build upon, it can help them get onboarded and build momentum.

Suggesting contextually relevant solutions

Often you will find yourself facing niche problems that don’t have a common solution. Solutions to these problems aren’t obvious and require a lot of research, trial and error, consulting with other experts, and reading. Developers are familiar with this situation, as finding relevant solutions to programming problems can be tricky.

This is because developers must solve problems with a certain context in mind. A problem can’t be defined without a thorough description of the circumstances and the situation in which it arises.

Context narrows down the potential solutions to a problem. With search engines, you look for sources of information with a few keywords that may or may not contain a relevant solution. When developers search for solutions, they paste error logs into Google and are directed to Q&A programming websites like StackOverflow. They must then hope that someone has encountered the exact same problem in the same context and that a solution has been provided. This method of finding solutions to programming problems is not very efficient, and you may not always find the solution you’re looking for. As a result, there has been a significant decrease in traffic to these sites since the introduction of ChatGPT and GitHub Copilot.

Developers are now turning to generative AI to solve programming issues. By providing a prompt that describes the context of a problem, the AI can generate potential solutions. It’s then up to you to determine whether the proposed solution is appropriate. While this method does require some understanding of the problem area, it’s often quicker than searching for solutions on online forums and websites, because the model can generate solutions that are not only contextually relevant but also grounded in a learned knowledge base that may be sourced from those same forums and websites. Importantly, this approach isn’t confined to programming issues.
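As a rough sketch of what this looks like in practice, the snippet below sends a context-rich prompt describing a programming problem to a hosted large language model. It assumes the openai Python package and an API key in the environment; the model name and the error scenario are only examples.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A context-rich prompt: the error, the runtime, and the offending snippet.
prompt = """I'm getting `TypeError: Object of type datetime is not JSON serializable`
when returning a response from a Python 3.12 FastAPI endpoint.

Relevant code:
    return {"created_at": datetime.utcnow()}

How should I fix this?"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name; any chat-capable model works
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)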

Personalizing the user experience

Customers and users of modern software expect a certain level of personalization and interactivity. Previously, providing this required complex onboarding forms, extensive collection of user data, data analysis, and user segmentation; you would then adjust the user experience based on these preference settings. Now, you can give users access to a large language model connected to relevant systems, such as a product database, to personalize their experience and the application interface without needing to know much about them up front. They can talk to the model and ask it for what they want. The model can then perform actions on the system to adjust to the user’s preferences or direct them to the relevant resource or pages.

When you interact with a service powered by a generative model, you can ask for the information you seek or for an action to be performed on your behalf. For example, when browsing a travel planning site, you can describe your ideal holiday and have a bot prepare an itinerary for you based on the platform’s access to airlines, accommodation providers, and its database of package holiday deals. Or, if you’ve already booked a holiday, you can ask for sightseeing recommendations, since the bot can read the itinerary details from your account data.

Large Language Models can act as customer service agents, asking relevant questions until they can map customer preferences and unique needs to a product catalog and generate personalized recommendations. This is similar to having a personal virtual assistant that understands your desires and suggests choices for you to consider. If you don’t like the suggestions, you can give the assistant feedback to refine them to your liking.
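The sketch below illustrates this feedback loop by accumulating the conversation history, so that each new round of suggestions takes the user’s feedback into account. It assumes the openai Python client; the store, products, and dialog are hypothetical.

from openai import OpenAI

client = OpenAI()


def ask(messages: list[dict]) -> str:
    """Send the running conversation to a chat model and return its reply."""
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return response.choices[0].message.content


messages = [
    {"role": "system", "content": "You are a shopping assistant for an outdoor gear store."},
    {"role": "user", "content": "I need a tent for two people, mostly for summer camping."},
]

first_suggestions = ask(messages)
messages.append({"role": "assistant", "content": first_suggestions})

# The user's feedback becomes part of the context for the next round.
messages.append({"role": "user", "content": "Those are too heavy. I'll be hiking with it."})
refined_suggestions = ask(messages)
print(refined_suggestions)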

In education, these GenAI models can be used to describe or visualize challenging concepts tailored to each student’s learning preferences and abilities.

In gaming and Virtual Reality (VR), GenAI can be used to construct dynamic environments and worlds based on the user’s interactions with the application. For instance, in Role-Playing Games (RPGs), you can produce narratives and character stories on the fly, based on the user’s decisions and dialog choices in real time, using a baked-in large language model. This creates a unique experience for gamers and other users of these applications.

These examples only scratch the surface of the possible features that can be integrated into existing applications. The flexibility and agility of generative models open up many possibilities for novel applications in the future.

Minimizing delay in resolving customer queries

Businesses selling products or services often need a customer service function. As the business grows in operational complexity and customer numbers, resolving customer queries in a timely manner can become expensive and require extensive staff training.

GenAI can streamline this process for customers and businesses alike. Customers can chat or get on a call with a large language model capable of accessing databases and other relevant sources depending on the nature of the query. As customers describe their issues, the model can address them in accordance with business policies and direct customers to relevant resources when necessary.

Autonomous bots will be the first point of contact for customers who want their queries swiftly answered before cases are escalated to human agents. As a customer, you may also prefer talking to a bot first if it means avoiding long queues and achieving a quick resolution.

Acting as an interface to complex systems

Many people still struggle when interacting with complex systems such as databases or developer tools. Non-developers may need to access information or perform tasks without having the skills required to execute commands on these systems. LLMs and GenAI models can act as the interface between these systems and their users.

Users can provide a prompt in natural language, and GenAI models can write and execute queries on complex systems. For instance, an investment manager can ask a GenAI bot to aggregate the performance of the portfolio in the company’s database without having to submit requests for reports to be produced by specialists.

With the rise of new AI startups, we will see novel AI-powered applications where users can chat with a bot in natural language to perform actions. There will no longer be a need to click around the application or follow complex workflows.

Automating Manual Back Office Tasks

Across many large and long-standing companies, there are often several teams performing manual back office tasks that are less visible to front-of-house teams and their customers.

A typical manual task in the back office involves handling documents with complex layouts, like invoices, purchase orders, and remittance slips. Since each company may arrange information uniquely in these visual documents, sales and accounting teams in the back office have to process them manually.

The reason these tasks were never automated and have been carried out manually for years is that software developers were either unaware of them or found them too challenging to automate prior to the invention of Large Language Models (LLMs).

Now, LLMs (and other GenAI tools) can enable the automation of many manual processes that were previously unfeasible. The most significant obstacle is simply identifying one of these processes to address.

Scaling and democratizing content generation

People love new content and are always on the lookout for new ideas to explore. Writers can now research and ideate when writing a blog post with the help of GenAI tools. By conversing with a model, they can brainstorm ideas and generate outlines.

The productivity boost for content generation is enormous. You no longer have to perform low-level cognitive tasks like summarizing research or rewording sentences yourself. The time it takes to produce a quality blog post is slashed from days to hours. You can now focus on the outline, flow, and structure of the content, then use GenAI to fill in the gaps and polish and refine it. It can really help when you struggle to sequence the right words for clarity and brevity.

Many businesses have already started using these tools to explore ideas, draft documents, proposals, social media and blog posts.

These are several of the reasons why I believe more developers will integrate GenAI features into their applications in the future. However, the technology is still in its infancy, and there are many challenges to overcome before it can be widely adopted.

What prevents the adoption of generative AI services

Organizations face several challenges when adopting generative AI services. There are issues related to the inaccuracy, relevance, quality, and consistency of the outputs generated by the AI. In addition, there are concerns about data privacy, cyber-security, and potential abuse and misuse of the models in production. Integrating an AI service with existing systems, such as internal databases, web interfaces, and external APIs, can also be challenging due to compatibility issues, the need for technical expertise, potential disruption of existing processes, and similar concerns about data security and privacy.

Companies that want to use the service for customer-facing applications would want consistency and relevance in the model’s responses and to ensure outputs are not offensive or inappropriate.

There are limits to the originality and quality of the content that generative models can produce. As covered earlier, these tools effectively bridge ideas and concepts that they’ve been trained on within certain domains; they can’t produce totally unseen or novel ideas. Furthermore, they follow common patterns during generation, which can make their outputs generic, repetitive, and boring to use without any fine-tuning.

Some challenges—such as data privacy and security issues—can be solved with software engineering best practices which you will read more about in this book.

Solving other challenges requires prompt engineering and fine-tuning models to specific domains to improve the relevance, quality, coherence, and consistency of their outputs. With fine-tuning, you re-train the model on your own data; with prompt engineering, you effectively write prompts that “code” or “program” the model to produce outputs in a certain way.
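As a small illustration of prompt engineering, the snippet below builds a system prompt that “programs” the model to respond in a fixed tone and structure without any re-training. The company name and formatting rules are hypothetical.

# The system prompt constrains tone, length, and output structure.
system_prompt = """You are a support assistant for Acme Travel (a fictional company).
Always answer in a polite, concise tone, keep responses under 100 words, and end
with a single JSON line of the form {"category": "<topic>", "escalate": true|false}."""

user_query = "My flight was cancelled and I still haven't received a refund."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_query},
]
# `messages` can now be sent to any chat-completion style model, which will follow
# the structure "coded" into the system prompt.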

Providing more autonomy to the model is another way to remove GenAI adoption blockers. Autonomous models can act on the user’s behalf, gathering additional information and performing actions in line with the user’s intent.

Besides addressing security and scalability challenges, giving models more autonomy can have a significant impact on GenAI adoption. Let’s look at this point in more detail.

Making generative services autonomous

When you prompt a model with only a few keywords, you will get a response that may or may not satisfy your intent.

To explain why, let’s look at a search engine and how it works.

Google has invested a lot of capital into building a search engine that can infer your intent based on the few keywords you enter into the search bar. The search results then show you pages that closely match your intent.

The fewer keywords you provide, the harder it is for the search engine to infer exactly what you want and to show you relevant results. For instance, if you search for “ties”, the search engine has to make assumptions and infer whether your intent is to shop for ties (the clothing item) or to learn about ties. If you instead search for “trinity tie” or “types of tie”, there is an educational intent in your query, and Google can show the appropriate results.

This is also true for generative models. When you provide little to no context in a prompt, the model has to infer your intent and produce something that is most likely to correlate with it. A generic response is the most likely fit for this type of query, which is why you often get generic outputs from generic or low-keyword prompts.

The more detailed and contextually rich your queries are, the better and more relevant the responses from your model will be. Another option is to fine-tune your model to assume a particular intent. Otherwise, the model will return irrelevant or generic responses.

That is why context-rich prompts are crucial to getting relevant, specific and high quality results when working with generative models.

However, you may not always know the full context of your query in advance. For instance, you can’t know what product description to generate for a sales brochure to maximize the chances of a customer making a purchase without first looking at that customer’s profile. In such scenarios, you will want to give your model a set of tools and abilities that make it autonomous. But how do you go about doing that?

Prompting techniques allow you to translate between natural language and SQL, so that the model can talk to a database given its schema. Using these techniques, the model can potentially look up information in the database and respond, or perform actions such as updating a record. This is particularly useful when asking challenging questions that require multiple sub-queries, joins, and aggregations to answer. In this scenario, the model can do the heavy lifting for you.
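Here is a minimal sketch of the text-to-SQL idea: the prompt carries the database schema and a natural-language question, and the model is asked to return only SQL. It assumes the openai Python client; the schema, question, and model name are hypothetical, and any generated SQL should be reviewed before it is executed.

from openai import OpenAI

client = OpenAI()

# A hypothetical schema; in practice you would read this from your database metadata.
schema = """
CREATE TABLE customers (id INTEGER, name TEXT, country TEXT);
CREATE TABLE orders (id INTEGER, customer_id INTEGER, total NUMERIC, created_at DATE);
"""

question = "What were the total sales per country in 2024?"

prompt = f"""Given this SQLite schema:
{schema}
Write a single SQL query that answers: {question}
Return only the SQL, with no explanation."""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[{"role": "user", "content": prompt}],
)
generated_sql = response.choices[0].message.content
print(generated_sql)  # review before executing: generated SQL can be wrong or unsafe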

However, text-to-SQL techniques are still in development and are specialized to work mostly with relational databases. They can’t yet work with external systems and APIs, and their probabilistic nature reduces the accuracy and reliability of their results. Because of this, you may not want to use them in production scenarios.

To provide context to your model from different sources including databases and APIs, a reliable solution is to wrap the model with a web server. The web server then gets explicit instructions—via an HTTP request—to query the database and external services for enriching the prompt context given to the model.

With this approach, you won’t need to fine-tune your model or use separate models to generate code. However, you will have to build a client for your databases and any API you want to interface with, so that your web server can make the calls on behalf of your model.
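A minimal sketch of this pattern with FastAPI might look like the following. The web server, not the model, decides what context to fetch; the database lookup is a stub standing in for a real client, and the openai client and model name are assumptions for illustration.

from fastapi import FastAPI
from openai import OpenAI
from pydantic import BaseModel

app = FastAPI()
client = OpenAI()


class QueryRequest(BaseModel):
    customer_id: int
    question: str


def fetch_customer_context(customer_id: int) -> str:
    # Stub for a real database client; in practice you would query your database here.
    return f"Customer {customer_id}: premium plan, 3 open orders, last login yesterday."


@app.post("/ask")
def ask(request: QueryRequest) -> dict:
    # The server enriches the prompt with explicit, permission-checked context.
    context = fetch_customer_context(request.customer_id)
    prompt = f"Context:\n{context}\n\nQuestion: {request.question}"
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": prompt}],
    )
    return {"answer": response.choices[0].message.content}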

Writing code to connect with external services may be less efficient than having your model generate and execute its own code, but it provides more control and determinism when retrieving data from these services. It also safeguards your systems from any malicious code the model might generate if it were directly connected to them. You will also be able to moderate what data is returned to the user or the model based on their permission levels and system restrictions. Furthermore, you can keep the system maintainable when the schemas of databases and external resources change frequently; with text-to-SQL and similar approaches, the model may face issues interfacing with services whose schemas change periodically.

You can auto-generate database clients to match updated database schemas. I will cover auto-generating database clients with Prisma in Chapter 7 of this book. External service providers also often publish fully typed software development kits (SDKs) that reflect changes to their API schemas.

Why build generative AI services with FastAPI

Generative AI services require performant web frameworks as backend engines powering event-driven services and applications. FastAPI can match the performance of Go or Node.js web frameworks while holding onto the richness of Python’s deep learning ecosystem. Non-Python frameworks lack the direct integration required to interact with a generative AI model within a single service.

Within the Python ecosystem, there are several core web frameworks for building API services. The most popular options include:

Django

A full-stack framework that comes with batteries included. It is a mature framework with a large community and a lot of support.

Flask

A micro web framework that is lightweight and extensible.

FastAPI

A modern, high-performance web framework that comes with batteries included.

Despite its recent entry into the Python web framework space, FastAPI has gained significant traction and popularity. As of writing, it is the second most popular Python web framework on GitHub and is on a trajectory to overtake Django in GitHub stars, as shown in Figure 1-2.

Figure 1-2. GitHub star history of popular Python web frameworks, as of October 2023 [Source: https://star-history.com]

It is also the fastest growing Python web framework in terms of package downloads.

Flask leads in downloads due to its reputation, community support, and extensibility. However, it is important to note that Flask is a micro web framework and doesn’t provide the same features as FastAPI. FastAPI is a full-stack framework that provides many features out of the box, such as data validation, type safety, automatic documentation, and a built-in web server.
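To give a feel for these features, here is a minimal FastAPI application (the file, route, and field names are arbitrary). The request body is validated against the declared types, and interactive documentation is generated automatically.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Item(BaseModel):
    name: str
    price: float


@app.post("/items")
def create_item(item: Item) -> Item:
    # FastAPI validates the request body against Item, rejecting bad input with
    # a 422 response, and serves interactive docs at /docs out of the box.
    return item

Saved as main.py, it can be served during development with uvicorn main:app --reload.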

Among the frameworks mentioned, Django falls short of FastAPI in performance, and Flask lacks out-of-the-box support for schema validation. Because of this, developers familiar with Python may be switching from older, opinionated frameworks like Django to FastAPI. I assume the exceptional developer experience, development freedom, excellent performance, and the recent support for serving AI models via lifecycle events may be contributing to this trend.

This book covers the implementation details of developing generative AI services that can autonomously perform actions and interact with external services, all powered by the FastAPI web framework.

To learn the relevant concepts, I will be guiding you through a capstone project that you can work on as you read through the book. Let’s take a look.

Overview of the Capstone Project

In this book, I will lead you through building an autonomous generative AI service using FastAPI as the underlying web framework. The application you will build is a job application assessment bot capable of assessing candidates for technical positions. It will fetch information from various sources, list its sources, and perform actions on databases and external services as needed to reach its own conclusions.

The service will:

  • Be integrated with multiple models including a large language model and a Stable Diffusion model

  • Generate real-time responses to user queries as text, audio or image

  • Use the Retrieval Augmented Generation (RAG) technique to read uploaded application documents with a vector database when responding

  • Scrape the web and communicate with internal databases, external systems and APIs to gather sufficient information when responding to queries

  • Collaborate with other models to produce outputs in a variety of formats (text, audio, and video)

  • Restrict responses based on user permissions and guard against prompt hijacking attempts

  • Provide sufficient protections against misuse and abuse

  • Produce monitoring action logs and conversation histories for human-in-the-loop reviews

As the focus of this book is on building a service, I will provide the relevant user interface (UI) code in the accompanying code repository, built with React, a popular frontend UI library.

Summary

In this chapter, we looked at the role of AI in software development and how it has helped automate complex business problems using data. We then discussed how GenAI differs from other types of AI models: it can generate novel text, audio, images, or code from patterns learned from its training data.

You also learned why GenAI can drive the roadmap of future applications through its capabilities in facilitating the creative process, eliminating intermediaries, personalizing the user experience, and democratizing access to complex systems and content generation.

You were then introduced to several challenges preventing the widespread adoption of GenAI, along with solutions to them, including making these models autonomous.

Finally, you learned more about the project that you will build as you follow tutorials in this book.

In the next chapter, we will talk in more detail about a web framework that you can use to build autonomous GenAI applications.
