Chapter 91. Understand the Risks of Using AI in Application Development

Yasir Ali

Generative AI in software development is going mainstream, as evidenced by the more than 40,000 organizations using GitHub Copilot. These tools promise enhanced productivity by automatically generating code snippets from given prompts, but they also raise significant concerns about security vulnerabilities. A Stanford study found that developers using OpenAI's Codex tend to produce more insecure code, escalating risk across the software development life cycle (SDLC).
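
To see the kind of vulnerability the Stanford study describes, consider the sketch below. It is a hypothetical illustration, not code from the study: the first function shows the SQL-injection-prone pattern an assistant trained on insecure examples might suggest, and the second shows the parameterized alternative.

    import sqlite3

    def find_user_insecure(conn: sqlite3.Connection, username: str):
        # String interpolation makes this query injectable;
        # e.g., username = "' OR 1=1 --" returns every row.
        query = f"SELECT id, email FROM users WHERE username = '{username}'"
        return conn.execute(query).fetchall()

    def find_user_secure(conn: sqlite3.Connection, username: str):
        # Parameterized query: the driver handles escaping,
        # closing the injection hole.
        query = "SELECT id, email FROM users WHERE username = ?"
        return conn.execute(query, (username,)).fetchall()

Both functions behave identically on benign input, which is exactly why the insecure variant slips through review so easily.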

Most industry studies report an average of 20–30 bugs per 1,000 lines of code in a given project. Multiply that rate by the billions of lines of code that modern LLMs have likely been trained on, and the odds that real security issues are inadvertently reproduced in generated code become enormous.
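
A rough back-of-envelope calculation makes the scale concrete. The figures below are illustrative assumptions (the 25 bugs/KLOC midpoint of the range above and a corpus of 1 billion lines), not measurements:

    # Back-of-envelope estimate of latent defects in an LLM training corpus.
    # All figures are illustrative assumptions, not measurements.
    bugs_per_kloc = 25              # midpoint of the 20-30 bugs per 1,000 LOC range
    training_lines = 1_000_000_000  # assume roughly 1 billion lines of training code

    latent_bugs = bugs_per_kloc * training_lines / 1_000
    print(f"Estimated latent defects in the corpus: {latent_bugs:,.0f}")
    # Prints: Estimated latent defects in the corpus: 25,000,000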

Traditionally, vulnerability concerns centered on software components. But with the rise in popularity of transfer learning, the reuse of pretrained models, and the use of crowdsourced training data, these issues have extended to AI systems themselves. Malicious package creation grew 49% quarter over quarter (QoQ) in 2023, which suggests the problem will become much larger in 2024.
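
One practical defense against this trend is screening declared dependencies before installation. The sketch below is a minimal, hypothetical example: the allowlist, the similarity threshold, and the package names are all assumptions, and a real pipeline would use a dedicated supply chain scanner or registry intelligence feed instead.

    from difflib import SequenceMatcher

    # Hypothetical allowlist of packages the team has already vetted.
    KNOWN_GOOD = {"requests", "numpy", "pandas", "flask"}

    def similarity(a: str, b: str) -> float:
        return SequenceMatcher(None, a, b).ratio()

    def flag_suspicious(dependencies: list[str], threshold: float = 0.85) -> list[str]:
        # Flag unvetted names that look confusingly close to a vetted one,
        # a common typosquatting pattern in malicious packages.
        suspicious = []
        for dep in dependencies:
            if dep in KNOWN_GOOD:
                continue
            if any(similarity(dep, good) >= threshold for good in KNOWN_GOOD):
                suspicious.append(dep)
        return suspicious

    print(flag_suspicious(["reqeusts", "numpy", "totally-new-lib"]))
    # Prints: ['reqeusts'] -- one transposition away from "requests"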

Main Risk Categories and Recent Incidents

Three main risk categories are associated with LLM use in software development:

  • Security issues such as malicious code insertion and sensitive data leakage (a prompt-scanning sketch follows this list)

  • Legal liabilities, including ...
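
To make the first category concrete, the sketch below shows one way to reduce sensitive data leakage: scanning text for obvious secrets before it is sent to a hosted model. The regex patterns are illustrative assumptions and nowhere near exhaustive; a production setup would rely on a dedicated secret scanner.

    import re

    # Illustrative patterns only; real secret scanners ship far broader rule sets.
    SECRET_PATTERNS = {
        "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
        "generic API key": re.compile(
            r"(?i)api[_-]?key\s*[:=]\s*['\"][A-Za-z0-9]{16,}['\"]"),
        "private key header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    }

    def find_secrets(text: str) -> list[str]:
        # Return the names of any secret patterns found in text
        # that is about to be sent to an LLM.
        return [name for name, pattern in SECRET_PATTERNS.items()
                if pattern.search(text)]

    snippet = 'api_key = "abcd1234efgh5678ijkl"'
    hits = find_secrets(snippet)
    if hits:
        print(f"Blocked prompt: possible {', '.join(hits)} detected")
    # Prints: Blocked prompt: possible generic API key detected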
