Generative AI in the Real World: Measuring Skills with Kian Katanforoosh

Audiobook
by Kian Katanforoosh, Ben Lorica
January 2025
30m
English
O'Reilly Media, Inc.

Overview

How do we measure skills in an age of AI? That question affects everything from hiring to building productive teams. Join Kian Katanforoosh, founder and CEO of Workera, and Ben Lorica for a discussion of how we can use AI to assess skills more effectively. How do we get beyond pass/fail exams to true measures of a person’s ability?

About the Generative AI in the Real World podcast: In 2023, ChatGPT put AI on everyone’s agenda. In 2025, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.

Points of Interest

  • 0:00: Introduction
  • 0:28: Can you give a sense of how big the market for skills verification is?
  • 0:42: It’s extremely large. Anything that touches skills data is on the rise. Extrapolate from university admissions across the rest of someone’s career, and you realize there are many points where they need to validate their skills.
  • 1:59: Roughly what’s the breakdown between B2B and B2C?
  • 2:04: Workera is exclusively B2B and federal. However, there are also assessments focused on B2C. Workera has free assessments for consumers.
  • 3:00: Five years ago, there were tech companies working on skill assessment. What did the solutions look like before the rise of generative AI?
  • 3:27: Historically, assessments have been used for summative purposes: pass/fail, high stakes, where the goal is to admit or reject you. We introduced assessments that let people know where they stand, compare themselves to the market, and decide what to study next. That takes different technology.
  • 4:50: Generative AI became much more prominent with the rise of ChatGPT. What changed?
  • 5:09: Skills change faster than ever, so you need to update them much more frequently. The half-life of skills used to be over 10 years; today, it’s estimated to be around 2.5 years in the digital arena. Writing a quiz is easy; writing a good assessment is extremely hard. Validity means that what you intend to measure is what you’re actually measuring. AI can help.
  • 6:39: AI can help with modeling the competencies you want to measure.
  • 6:57: AI can help streamline the creation of an assessment.
  • 7:22: AI can help test the assessment with synthetic users.
  • 7:42: AI can help with post-assessment monitoring. There are a lot of things that can go wrong.
  • 8:25: Five years ago in programming, people used tests to filter candidates out. That has changed; people use coding assistants on the job. Why shouldn’t I be able to use a coding assistant when I’m taking an assessment?
  • 9:16: You should be able to use it, but the assessment has to change. The previous generation of assessments focused on syntax: do we care if you forgot a semicolon? Assessments should focus on higher cognitive levels, such as analyzing and synthesizing information.
  • 10:06: Because of generative models, it’s become easier to build an impressive prototype. Evaluation is the hard part. Assessment is all about evaluation, so the bar is much higher for you.
  • 10:48: Absolutely. We have a study that calculates the number of skills needed to prototype versus deploy AI. You need about 1,000 skills to prototype AI. You need about 10,000 skills for production AI.
  • 12:39: If I want to do skills assessment on an unfamiliar workflow, say full stack web development, what’s your process for onboarding?
  • 13:17: We have one agent that’s responsible for competency modeling. A subject-matter expert (SME) can share a job description, task analysis, or job architecture. We take that information and break it down into the granular tasks worth measuring. At that point, there’s a human in the loop. (A sketch of this flow appears after this list.)
  • 14:27: Where does AI help? What does the AI need? What would you like to see from people using your tool?
  • 15:04: Language models have been trained on pretty much everything online, so you can get a pretty good answer from AI; the SME takes that from 80% to 100%. There are issues with that process, though. We separate the core catalog of skills from the custom catalog, where customers create custom assessments. A standardized assessment lets you benchmark against other people or companies.
  • 16:32: If you take a custom assessment, it’s highly relevant to your needs, even though comparisons aren’t possible.
  • 16:41: It’s obviously anonymized, right?
  • 16:51: That’s right. It’s anonymized at the industry level.
  • 17:03: Some domains bring tools to the table. Most of the frontier model builders are interested in math, and math has automated theorem provers and verification tools. Are there other domains with tools that you can leverage?
  • 18:18: We use a pyramid based on Bloom’s taxonomy to represent the cognitive levels we measure: the knowledge level, the application level, and solving open-ended problems. Each level needs to be measured differently: some can be validated in closed form; others are very difficult to validate. (These levels are sketched as a data structure after this list.)
  • 19:39: What percentage of Workera usage is at the job application stage or higher levels?
  • 20:00: The majority of people right now are measuring in order to upskill: to acquire AI skills, data skills, soft skills. The second application is probably project resourcing: managers have no idea who’s good at what. Where are my experts? What level are they at? Then there’s a lot of internal mobility; companies are looking to maximize and upskill the current workforce.
  • 21:11: In the past, people used some kind of knowledge graph to find out who to put on teams. Now, you can more accurately determine who worked on what.
  • 21:45: There’s a large category in HR called talent intelligence. They look at the data in the company and try to find matches internally. For example, I need a Python programmer. In practice, everyone has Python on their profile. That’s where assessments become important.
  • 23:00: The worst-case scenario was a consulting company that sends 10 people and the client rejects all but one. Now the consulting company can do that filtering before sending people over.
  • 23:34: You’re right. And it’s not an issue if the consultants have already been assessed.
  • 24:04: What would you like to request from the frontier model builders?
  • 24:09: One pain point is observability: LLM observability is very different from software observability. The second big problem is repeatability: for the same calls, we should get the same results. We want an assessment that is trusted and repeatable. (See the determinism sketch after this list.)
  • 25:18: We tailor the assessment based on HR data.
  • 25:54: Is there a limitation in the nature of the models themselves? What about visual or multimodal models?
  • 26:09: We already have text and audio. Video is big; we’ve tried it, but it’s not good enough yet.
  • 26:30: We have an Effective Communication Essentials assessment. We want those to be situational: pitching an idea to the CFO. Today you do it through text or audio. But adding a video layer makes it more immersive.
  • 27:17: Are there assessments that require the user to interact with a graphical user interface? You would need the model to understand what the user is doing in the GUI.
  • 27:39: Today we have whiteboards, but they’re not that advanced yet. The whiteboard is prefilled; the person works on it and submits. The next step is real-time feedback, and maybe the ability of the assessment to draw on the whiteboard alongside the user.
  • 28:07: We’ve talked about skills and skills assessment. On your website, you also mention mentorship. Is that where you want this to go?
  • 28:29: Mentorship is big. But when you look at mentorship as a system, it’s really three subsystems: the mentor assesses the student; sets goals for the student, helping them dream bigger; and finally provides guidance. Guidance is what people usually picture as mentorship, but assessment and goal-setting are the most important parts.
  • 30:03: Goal-setting requires a lot of human interaction, right?
  • 30:21: Most goal-setting is qualitative. But on Workera, you can set a goal in terms of scores on standard assessments. You can start thinking about goals more quantitatively.
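
Workera hasn’t published its pipeline, but as a rough illustration of the agent flow described at 13:17 (granularize an SME’s job description into measurable tasks, draft one item per task, keep a human in the loop), here is a minimal Python sketch. The OpenAI client, the gpt-4o model, the prompts, and every function name are illustrative assumptions, not Workera’s actual stack.

```python
# Minimal sketch of the assessment-creation flow from the episode:
# 1) granularize an SME-supplied job description into measurable tasks,
# 2) draft one assessment item per task,
# 3) pause for human (SME) review before anything ships.
# The OpenAI client, model, and prompts are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def _ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def granularize_tasks(job_description: str) -> list[str]:
    """Step 1: break a job description into discrete, measurable tasks."""
    text = _ask(
        "List the discrete, measurable tasks implied by this job "
        "description, one per line, no numbering:\n" + job_description
    )
    return [line.strip("-* ").strip() for line in text.splitlines() if line.strip()]

def draft_item(task: str) -> str:
    """Step 2: draft one open-ended assessment item for a single task."""
    return _ask("Write one open-ended assessment question that measures "
                "the ability to: " + task)

def sme_approves(item: str) -> bool:
    """Step 3: the human in the loop -- an SME reviews every drafted item."""
    return input(f"Approve this item? (y/n)\n{item}\n> ").strip().lower() == "y"

def build_assessment(job_description: str) -> list[str]:
    items = [draft_item(t) for t in granularize_tasks(job_description)]
    return [i for i in items if sme_approves(i)]
```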
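
One way to jot down the Bloom’s-taxonomy levels from 18:18 as a data structure. The level names follow the episode; the item types and validation notes paraphrase the discussion rather than quote any actual Workera schema.

```python
# Illustrative mapping of the three cognitive levels from the episode to
# how each might be measured. The schema is a paraphrase, not Workera's.
BLOOM_LEVELS = {
    "knowledge": {
        "example_item": "multiple-choice recall question",
        "validation": "closed form; easy to auto-grade",
    },
    "application": {
        "example_item": "hands-on task with a checkable result",
        "validation": "harder; often needs executable or rubric checks",
    },
    "open_ended": {
        "example_item": "free-form problem with many valid solutions",
        "validation": "very difficult; needs a rubric plus human or LLM grading",
    },
}
```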
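
On the repeatability pain point at 24:09: a minimal sketch of the knobs most chat APIs expose for this, again assuming the OpenAI Python client. Setting temperature to 0 and fixing a seed makes calls more repeatable, but not perfectly so, which is why this remains a request to the frontier model builders.

```python
# Sketch of nudging an LLM call toward repeatability. temperature=0 plus a
# fixed seed narrows the variance, but providers do not guarantee identical
# outputs across calls -- exactly the pain point raised in the episode.
from openai import OpenAI

client = OpenAI()

def repeatable_grade(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        temperature=0,   # near-greedy decoding
        seed=42,         # best-effort determinism, not a hard guarantee
        messages=[{"role": "user", "content": prompt}],
    )
    # Logging the backend fingerprint helps explain drift when two
    # "identical" calls still return different results.
    print("system_fingerprint:", resp.system_fingerprint)
    return resp.choices[0].message.content
```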

Publisher Resources

ISBN: 9780642572013