Chapter 17. Large Language Models and the Practice of Data Science

According to one estimate, almost four thousand jobs in the US were lost to advances in AI in May 2023, representing almost 5% of all jobs lost that month. A report from a global investment bank estimates that AI could substitute for 25% of all jobs, and OpenAI, one of the main players in the field, estimates that almost 19% of all occupations have significant exposure, as measured by the fraction of their tasks that could be impacted by AI. Some analysts claim that data science itself is likely to be affected.

So how will large language models (LLMs) like GPT-4, PaLM 2, or Llama 2 change the practice of data science? Will the hard parts presented in this book, or elsewhere, remain important for your professional development and career advancement?

This chapter is quite different from the previous ones: rather than discussing techniques, I'll speculate on the potential short- and medium-term impact of AI on the practice of data science. I will also discuss whether this book's content is likely to stand the test of time amid the current AI disruption.

The Current State of AI

AI is a broad field that encompasses many different techniques, methods, and approaches, but it is currently associated mostly with the use of very large neural networks and datasets. In the past few years, the pace of progress in image recognition and natural language processing has increased substantially, but it is the latter, with ...
