To Be or Not to Be Agentic with Maximilian Vogel

KIMBERLY NEVALA: Welcome to Pondering AI. I'm your host, Kimberly Nevala. In this episode, it is a pure pleasure to be joined by Maximilian Vogel. Maximilian is the co-founder of BIG PICTURE, where he focuses on AI strategy and agentic AI. He joins us today to talk about what agentic AI is in its current instantiation, and what is the work required to make it work. So thank you for joining us, Maximilian.

MAXIMILIAN VOGEL: Thank you for the invitation. Great to be here.

KIMBERLY NEVALA: Now, before we wade hip-deep deep into the topic of agentic AI, I noticed that some of your educational background was in philosophy and information sciences. And I was wondering what it was about those domains that initially sparked your curiosity and presumably started you down the path to your work today.

MAXIMILIAN VOGEL: Yeah. Actually, in philosophy we talked a lot about AI in those days, which is a little bit in the past now. But it was from a completely different perspective. It was the so-called mind-body problem, which came up with AI: can a being which is not a human, which doesn't have a real mind, a real soul, as the philosophers put it, really think? Can it even feel? Does it have a consciousness? Things like that. That's the path from philosophy.

From information science, it's basically that the term data science didn't really exist at that time in academia. It was more like information science: really structuring data and doing that with the first tools in AI and things like that. That brought me a little bit onto this path.

KIMBERLY NEVALA: So it sounds like you've always had a bit of a natural predilection for this particular area of study and application.

MAXIMILIAN VOGEL: Yeah.

KIMBERLY NEVALA: And it strikes me that that combination of factors and education also stands you very well, not just to address the technical aspects of AI and agentic AI, but also to talk a little bit about the human factors. So why don't we just-- we'll get right to it, then.

I’d like to start with just laying a little bit of a foundation about what it is that we mean by agentic AI. So for you, what is it that distinguishes an agentic system from a non-agentic system? Or, I suppose the other way to ask that is, what are the characteristics of an agentic system?

MAXIMILIAN VOGEL: Yeah. Basically modern agentic systems are all built on generative AI. So the basis is similar, or basically the same. But what differentiates them is that agentic systems are capable of multi-step processing, which the models, at the beginning, usually weren't able to do.

There are some things now which are built into the models to help them do a little bit more than one step and build conversations within a session based on the previous turns. But something like processing an insurance claim, for instance - one thing we did - runs over hours, days, weeks, and maybe even months. That requires an agentic system which can completely keep the context and do a lot of processing steps, which may not be based entirely on generative AI but often on deterministic processes as well. So that's the key differentiation.

KIMBERLY NEVALA: Is it true that a lone large language model does not an agentic system make? There's more componentry required. Correct?

MAXIMILIAN VOGEL: Yes, you usually need more components. The big vendors build things - agentic components - at least into their chat interfaces. But just a model, a generative model, an LLM, for instance, or even an LRM, is by definition not an agentic system.

KIMBERLY NEVALA: And why is that? Because we certainly think - or a lot of people are talking about it - and I suppose a little bit tongue-in-cheek, I would say: you can't prompt your way into an agentic system. So why is it really important for people to understand that it is a component and not the whole?

MAXIMILIAN VOGEL: Yeah. Basically, it can be made into an agentic system by, for instance, adding tools - being able to perform a web search, or calling MCP (Model Context Protocol) tools, or some tools like these. What really differentiates it? On its own, it usually can't do the next step in a process based on a classification or a decision. So basically it lacks what an agent should have: agency. Basically the ability to decide, I have this data now, I should do this and hand it over to that process. And then this other process carries on.

So this is usually something which is not already built into the model platforms and which is not part of the core models like GPT-5 from scratch. A little bit of agency is built into some of the chat interfaces now. But it's not a real agent. If I tell it, process these complex things based on this data, usually it will stop and come back to me and will not resolve the case end to end.
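To make that distinction concrete, here is a minimal, illustrative sketch (in Python, not code from the systems discussed here) of where the "agency" actually lives: a deterministic controller, not the model, decides the next processing step based on the model's classification. The helpers call_llm, extract_tracking_id, lookup_record, send_reply, and hand_over_to_human are hypothetical placeholders.

```python
# Illustrative sketch only: a deterministic controller, not the model,
# decides the next processing step based on the model's classification.
# call_llm, extract_tracking_id, lookup_record, send_reply, and
# hand_over_to_human are hypothetical placeholder functions.

def run_claim_agent(email_text: str) -> str:
    """Route an incoming email through classification-driven steps."""
    classification = call_llm(
        "Classify this message as one of: refund_request, status_inquiry, "
        "other.\n\n" + email_text
    )

    # The 'agency' lives here: explicit, auditable branching on the
    # classification, instead of asking the model to handle it end to end.
    if classification == "status_inquiry":
        record = lookup_record(extract_tracking_id(email_text))
        return send_reply(template="status_update", data=record)
    if classification == "refund_request":
        return hand_over_to_human(email_text, reason="refund requires approval")
    return hand_over_to_human(email_text, reason="unrecognized case")
```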

KIMBERLY NEVALA: And I want to talk about the work required to make sure that we're applying agentic AI to the right problems and to problems that we're confident it can solve. Also, how do we get to the highest levels of accuracy?

But what you're saying there speaks to a frustration we hear a lot from organizations. Where it seems like you can prototype things very, very quickly. But getting the system into production… And I think what you've said is the core of the issue. Which is it's easy to mock it up, if you will, even just with a sort of standalone LLM or foundation model. But in fact - and maybe it's the agentic system, that system is the important word - these are, in fact, engineered workflows. Correct? These are engineered systems. And in a very real sense, they require the same level of rigor, development, and thought as any other non-AI-based system. Is that right?

MAXIMILIAN VOGEL: Yeah. Yeah. Absolutely. Usually you have to build around the models. You have to build a deterministic process around them: taking the answers of the models, then getting data from databases based on the data the models extracted - like, for instance, a customer identification number - then coming back with the record, giving it back to the next model session, trying to get the data which is needed out of that, and then going on and on with the processing. Usually you need a lot of algorithms around it to really make it work.
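A minimal sketch of this "build around the model" pattern, assuming hypothetical helpers call_llm and fetch_customer: the model only extracts fields, while deterministic code does the lookups and the sequencing.

```python
# A minimal sketch of the "build around the model" pattern, not the production
# code discussed here: the model only extracts fields, while deterministic code
# does the lookups and the sequencing. call_llm and fetch_customer are
# hypothetical helpers.

import json

def process_message(email_text: str) -> dict:
    # Step 1: the model extracts the customer identification number.
    raw = call_llm(
        "Extract the customer ID from this email as JSON like "
        '{"customer_id": "..."}.\n\n' + email_text
    )
    customer_id = json.loads(raw)["customer_id"]

    # Step 2: deterministic lookup against the database.
    record = fetch_customer(customer_id)

    # Step 3: hand the record to the next model session to pull out the
    # fields needed for the following processing step.
    raw_fields = call_llm(
        "Given this customer record:\n" + json.dumps(record) + "\n"
        "and this email:\n" + email_text + "\n"
        "return the claim type and claimed amount as JSON."
    )
    return json.loads(raw_fields)
```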

KIMBERLY NEVALA: And I think there are also a number of, I'm going to call them aspirational, statements. There are also probably assumptions that are incorrect or just not true when we think about developing these systems.

One of them, I think is around this idea of what it means to be confident in the system. And you have said that confidence doesn't come from a model. A model can't determine confidence. Confidence has to come from the humans in defining and describing the system, identifying the problems.

So can you talk about where people get - I don't want to say it's wrong - misled when they think about confidence, and how you actually approach this question?

MAXIMILIAN VOGEL: Yeah. It's super complex.

Basically, the background to this confidence thing is that if you're, for instance, talking to ChatGPT, it will answer almost any question. Yeah? And that's nice in a direct interaction with you, because you can see, OK, no, maybe that's leading in the wrong direction.

But if there is an autonomous agent which really must run on its own, which must be reliable, safe, secure, you need some point where the model says, OK, I don't know based on the data. I can't take a decision. I can't process this. I can't extract the data. I'm not going to give you some proxy data or such; I should hand over to a human colleague who maybe has other means to look into this problem.

In our first step into agentic processes, we thought, yeah, we'll just ask the model if it's confident or if it wants to hand over. It almost never wanted to hand over. It was always confident. Yeah, that's a question I can answer. Yeah, no, it's not a problem. Maybe… but you have to look at it. That's not the thing we wanted to have. What we wanted is basically one number - we boiled it down to a floating-point number between zero and one - what is your confidence? One: 100% confidence. Zero: no confidence at all, I don't know.

And this was, yeah, surprisingly hard to get trained into the models because usually they are confident. The answer they give is usually the answer with the highest log probs, with the highest probability of tokens pushed out. And so yes, they are confident. We had to build it into the models, in another way, and use a lot of training data.

For instance, if you get multiple IDs (multiple customer IDs, multiple tracking IDs, maybe Social Security IDs as well) in the mail from a customer, don't just pick one, or the first, or the one appearing most often. Instead, tell us: I can't really find the one number you are looking for, the one we could use to look up that record in the database. There is more than one number, so maybe I should not be confident here and should set my confidence lower.

And so we trained the models toward a confidence. Here in this case, for instance, if there are multiple numbers and you should just get one, your confidence that you picked the right number must be lower. It must be something like 60%, or 0.6, or 0.4, or something like that. And yeah, that was hard. The models bring a lot of confidence to everything they say. To bring them back into a mode where they just say, I don't know, was not easy.
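As an illustration of the kind of confidence gate described here - a sketch under assumptions, not the actual implementation - the model is asked to return a score between zero and one alongside its extraction, and anything below a tuned threshold goes to a human. The threshold value and the call_llm and hand_over_to_human helpers are hypothetical.

```python
# Illustrative only: the model returns a confidence between 0 and 1 alongside
# its extraction, and anything below a tuned threshold is handed to a human.
# The threshold value and the call_llm / hand_over_to_human helpers are
# assumptions, not the actual implementation.

import json

CONFIDENCE_THRESHOLD = 0.8  # assumed value, tuned against an evaluation set

def extract_customer_id(email_text: str) -> str | None:
    raw = call_llm(
        "Extract the single customer ID referenced in this email. "
        'Respond as JSON: {"customer_id": "...", "confidence": 0.0}. '
        "If several different IDs appear and you cannot tell which one is "
        "meant, lower your confidence.\n\n" + email_text
    )
    result = json.loads(raw)

    if result["confidence"] < CONFIDENCE_THRESHOLD:
        # Multiple candidate IDs, missing data, etc.: route to a human colleague.
        hand_over_to_human(email_text, reason="low extraction confidence")
        return None
    return result["customer_id"]
```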

KIMBERLY NEVALA: When you are trying to assess them, is there a role in that processing stream or in that workflow, if you will, for the human team to first say, in a very deterministic way, these are the rules for what confidence looks like or not? Which means that, in some sense, it's not really the model - if we're talking about an LLM in this case; and the models you're talking about here might be other components which might not be LLMs, right? To say that, in this case, for instance, there's more than one answer that was returned, and therefore, I need to --

MAXIMILIAN VOGEL: Yeah.

KIMBERLY NEVALA: -- go down a different flow.

MAXIMILIAN VOGEL: Actually, when we build an agentic solution, we usually build it with the business experts or domain experts who are working with that. And the first question is always: what would you do? So what would you do when you get something like this?

And then they would say, oh yeah, that's complicated. Maybe we have to look multiple records up or we write back to the sender of the email, hey, please send us one number so we can look up your record. Otherwise it would be complicated. We can't process multiple records. And if they say something like that, we say, OK, then this should not be a process.

It's maybe the 1% case, the 0.5% case among the process cases. And this should not be one thing we train the model to really do. This should be handed over to a human team, to a real domain expert, and here the confidence of the model should be low.

So it all comes from humans. All the processes and how to cope with these outliers come from humans, basically from the definitions of the domain experts.

KIMBERLY NEVALA: And we might assume that, if we think about what makes a good problem for an agentic system, it has to be a workflow where all the instances or the cases or the events can be 100% processed by the agentic workflow. But what you're saying there, and also, I think, what I understood in your article, is that that's actually really not the case. You have a different threshold or a different way of thinking about that. Can you talk about that?

MAXIMILIAN VOGEL: Yeah. Yeah. At the beginning, we thought a little bit like that: if we can't fully solve 100% of the problems, maybe we should do another case. But we couldn't identify cases that were completely without outliers. Usually those are things where people have already put everything into web forms, into fields, and selected dropdowns. Why would you need a model then? Maybe you could process it completely with algorithms.

But if you have free text and people are writing things, scanning documents, adding attachments, usually you can't really completely foresee what is coming into such a process. There are humans on the other side, so basically anything could come in.

So we switched to basically a business case where we say we want to process 70% or 80%, something in that dimension, fully automated - which means from the first email to the closing of the process. And maybe 20% or 25% or 10%, something like that, can be handed over to a human operator. Still, this would be a great case for automation. And if you take something like that, you can even automate the work of, yeah, somewhat complex departments. Where you don't get rid of the human team but hand over to them the really complex things - often the cases which have a higher business value.

For instance, in many businesses those are even the cases which have a higher monetary value - for instance, insurance cases where the house was completely destroyed or something like that. You wouldn't process that with the model, because there are many, many documents and photographs and things like that. But we can help the team if we process the cases where something was destroyed in the value of $300 or $400, and then help them to focus on the high-value cases.

KIMBERLY NEVALA: So what else do you look to determine whether a certain problem or processing flow is a good candidate for agentic AI or automation? And what are the characteristics that you would say it's actually not-- this is not a good place for us to spend our time or effort or to address with agentic AI?

MAXIMILIAN VOGEL: First thing is, because we do a lot of testing and evaluation, we usually pick cases where our business partner tells us we can clearly say whether something has been processed correctly or not.

For instance, this can be in logistics. Did a package arrive? In health care, is this the correct diagnosis or not? It's super hard in some cases. In marketing, is this a good copy? Maybe. Maybe not. Maybe it's a little bit generic. Maybe it's on overdrive. That's not the way we communicate, but it's still good. Something like that. If you are not able to determine whether the processing was correct, it's super hard to get better, to do the evaluation and then, in the end, implement it in a successful way.

Basically we focus, yeah, not on the unicorn use cases where the model does something magic, but on the mule use cases where the model does some work reliably and safely and securely and things like that. That's easier to implement.

And then another thing is the number of cases. That's super important. Because these things are complex to build, I wouldn't automate something with only 10 cases, 20 cases, 100 cases. If you look beyond 200, 500 cases a month, then it's worthwhile to really automate it.
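A toy example of why clearly verifiable correctness matters: with labeled cases you can score a pipeline automatically instead of asking the client every time. This is a generic sketch, not the evaluation tooling used in the projects discussed; the pipeline under test is passed in as a callable.

```python
# A toy evaluation harness, assuming labeled cases where correctness can be
# judged unambiguously (e.g. "did the package arrive", not "is this good
# copy"). Generic sketch; the pipeline under test is passed in as a callable.

from typing import Callable

def evaluate(pipeline: Callable[[str], str], labeled_cases: list[dict]) -> float:
    """labeled_cases: [{"input": "...", "expected": "..."}, ...]"""
    correct = 0
    for case in labeled_cases:
        predicted = pipeline(case["input"])
        if predicted == case["expected"]:
            correct += 1
    accuracy = correct / len(labeled_cases)
    print(f"{correct}/{len(labeled_cases)} correct ({accuracy:.1%})")
    return accuracy
```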

KIMBERLY NEVALA: And again, I think that goes back to what's a little bit of an underlying thread in the whole conversation. Which is there is genuine effort, development time, business time required to develop these. And so you want to make sure, as in anything else. This is not the unicorn 'agentic AI will go and figure it out for you' scenario we sometimes hear about.

MAXIMILIAN VOGEL: Yeah.

KIMBERLY NEVALA: You mentioned getting to an accuracy. Can we talk a little bit more about what it might take? And maybe just some examples of how you ensure it - I think in one example you had driven the error rate from 5% to 0.05%. So this is a very, very low tolerance for errors. What were some of the steps and the mechanisms you used to do that?

Because I do think that sometimes we also have an overconfidence - not to overuse that word - in the ability of something like RAG. Or we're going to just improve our prompts and we're going to be hallucination-free. And hallucinations and accuracy are not always the same thing. But what are some of the concrete things here, so that people have a sense of the types of mechanisms required?

MAXIMILIAN VOGEL: Yeah. Basically one thing is prompt engineering - and I did a lot of prompt engineering in the past, but still, it was surprising. It's really building the prompt, tinkering with the text in the prompt, trying it out, and letting it run through the evaluation set. So one thing is really prompt engineering, and the prompts really get very long. But the state-of-the-art models are able to process really long prompts. That helped us a lot to add a lot of cases.

And often we get better by integrating cases which were processed incorrectly before. Where we see, OK, this was not processed correctly. Why not? Is it reproducible? Maybe we have to change the prompt. This is a new thing we didn't foresee; we have to bring it into the prompt. So it's really prompt engineering.

Super interesting, and it's a little bit hard to do it on a high level because there are not many prompt engineers out there. So if somebody with a computer science degree hears this podcast: really work yourself into prompt engineering - it is needed. It's hard, and you need really good prompt engineers for that.

On the other side we had a few tricks. But you had a question on prompt engineering, or?

KIMBERLY NEVALA: Oh, no. Well, I was actually interested, when we talk about prompt engineering - I don't know if maybe it's the mindset or the proclivity, but the ways a scientist and an engineer think are often a little bit different. So I'm interested, when you are thinking about the idea of a prompt engineer, is this more the profile of what we might think of as a traditional data scientist or data analyst? Or is this someone with more of a traditional engineering bent, if you will? Or is it some combination thereof? Hopefully not a magical combination thereof, but…

MAXIMILIAN VOGEL: Yeah. Yeah, maybe a magical combination is not required. Often I prefer people with a data science background who ventured into this. Because often, for the guys from the engineering department, the problem is solved if it works three times. Yeah?

KIMBERLY NEVALA: [Laughing] I don’t know what engineers you're hanging out with, but I think I might be hanging out with different ones.

MAXIMILIAN VOGEL: I mean, maybe one. It works on my computer.

But the data scientists know, OK, they really have to run it against large sets of data points, then get the numbers out, and then know whether it works or not. And this approach, prompt engineering, is a little bit of data science, a little bit of engineering as well. It's a little bit of black art as well, because some things work and you don't know exactly why they work, but they do, and you still keep them in your prompts. So data science is usually a good fit. But I've had quite a couple of engineers who worked their way into this field as well, and that worked too.

KIMBERLY NEVALA: And back to that question of accuracy and addressing things like hallucinations.
Because I think as we both well know, hallucinations really are a feature and not necessarily a bug of a large language model.

And so one of the examples I had come across, I think it was perhaps in your article, that I thought was really interesting was that although you were using… and I think this was about crafting a response to a customer. Maybe it was an insurance claim or not. I don't remember explicitly.

You go through this process, and there were a bunch of models and things in that workflow that would identify what the object was, figure out what the value was, and then maybe use the LLM to generate the written response for the client. But you then took that response and checked it against a validated list of responses. Which, in theory, I think then allows you to avoid this issue, for instance, with Delta where we had someone write saying, oh yes, we absolutely will refund. Or that situation where somebody managed to get a $1 deal on a very expensive car.
It did also beg the question, or it could beg the question, I suppose, of, well why do you need the LLM in there if you still have to go to that sort of set list anyway? And I think this is interesting--

MAXIMILIAN VOGEL: It's a super interesting question. Yeah.

The thing is you can talk almost any model, even the best models, into granting you a refund if you're smart enough. Because of this late package my grandmother died, and I think I'm going to sue you - but if you grant me this upgrade… Something like that. At one point or the other, if you hit the right tone, get the right message, the model will give in. And it's super, super hard, and it requires a lot of engineering, to avoid that.

So our way was: there is no such thing as a refund message, except in one case where we have a classification and so on. In processing we see that, and then there is a refund message and there is a defined refund which we can give to the client. They will never be able to talk us into a refund. That will just be ignored.

So we have defined messages a model can give out. And the magic we get from the model, or borrow from the model - what the algorithmic processes can't do - is basically first the classification, to do a really correct classification. And the classification is never something like: the customer is really in great need and now needs a lot of money immediately. The classification is: this is justified because it's case seven - we lost a package, or the flight was not on time, and things like that. And if this classification is correct, there are quite a couple of text templates the model can then choose from based on the classification. Or even the algorithm can choose based on the classification. And the extracted data from the models is filled in.

And the thing is, when we talk to departments on the customer side - we don't work like this in our daily work; we are software developers, scientists, and so on. But on the customer side, in their departments, they work with these text templates as well. For instance, when you get a message about a refund, about grants, or something - all things involving money - usually it's not somebody typing an email to you. They are using predefined text and filling in the data. And for them, that was completely natural and absolutely no problem.

And if you're going into the finance area, like insurance or banking, usually they prefer everybody to use these templates. Because then they are sure, on the legal side, there's nothing an employee could write which a client could take to their lawyer and say, hey, we can sue them, they wrote this and that. Everything is vetted and checked. And they prefer using something like that for these new jobs. Basically, you can only do that for the mule jobs. For the unicorn jobs, it would make no sense at all.
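A small sketch of the bounded-response idea described above, with invented template wording and case names: the model (or a deterministic rule) only picks a template key and supplies extracted fields, and the outgoing text is always one of the vetted templates. The hand_over_to_human helper is hypothetical.

```python
# A small sketch of the bounded-response idea, with invented template wording
# and case names: the model (or a deterministic rule) only picks a template
# key and supplies extracted fields; the outgoing text is always one of the
# vetted templates. hand_over_to_human is a hypothetical helper.

TEMPLATES = {
    "lost_package_refund": (
        "Dear {name}, we are sorry your package {tracking_id} was lost. "
        "We have issued a refund of {amount}."
    ),
    "status_update": (
        "Dear {name}, your shipment {tracking_id} is currently: {status}."
    ),
}

def compose_reply(classification: str, fields: dict) -> str | None:
    if classification not in TEMPLATES:
        # No vetted template for this case: never let the model free-write.
        hand_over_to_human(fields, reason="no approved template")
        return None
    return TEMPLATES[classification].format(**fields)
```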

KIMBERLY NEVALA: Yeah. And those are the unicorn jobs that you actually wouldn't--

MAXIMILIAN VOGEL: You're not doing that.
KIMBERLY NEVALA: Yeah. You wouldn't recommend going down this path anyway. But again, I think what's really interesting there is the work required to discretely define what those cases are and to define a bounded set of responses. Which, again, when we think about LLMs out in the wild, there's a proclivity to think, oh, well, the great thing about this is that all of these can be customized responses. That's really not what we're trying to achieve here. This goes back to the point: safe, secure, fully automated, highly, highly accurate processing.

MAXIMILIAN VOGEL: Yeah. Yeah. Absolutely.

KIMBERLY NEVALA: Now, I guess maybe to that point too, you also really encourage folks in identifying what problems to solve and then what success looks like with an agentic system to, just like any other developed, engineered system, to focus on KPIs versus features. Can you talk a little bit about why that's important and what you even mean by that? And what dangerous areas you can get into if you're not careful about this distinction?

MAXIMILIAN VOGEL: Yeah. One standard way in software development is that the client delivers something like a specification to the service provider and/or vendor and says, OK, this is our specification. Please implement that. We want to have this implemented.

You can do that with AI as well, but it can be a super complicated and cumbersome process, both for the client and for the service provider, because usually 80% of the work is working on the outliers, on the special cases, and things like that.

So we turned things around and told our clients - and they were really happy to accept it - that basically, that's not what you want to do. You don't want to implement a process as specified; you want to achieve something. Maybe save time, or process more, and do a little bit of automation. So let's just agree on KPIs. Something like 80% processed fully automated, and of this 80%, 99.5% or 99% processed correctly.

That's basically something they have in their human teams as well. Humans make errors, write wrong emails, mix up data, and things like that in some of the cases too. And then we identified many cases, clustered them into larger case groups, and said, OK, if we add up these 12 cases or so, it's more than 80% automated processing - let's focus on that. Everything else goes into the manual bucket and is processed manually. And even of this 85% we selected, 5% or 7% will go into the manual bucket during processing as well, because the confidence gets too low at some point and the model says, OK, let's give that to a human.
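For illustration, here is one simple way such KPI agreements might be checked over a month of logged cases. The field names and targets below are assumptions for the sketch, not the actual contract terms from these projects.

```python
# For illustration only: checking agreed KPIs over a month of logged cases.
# The field names ("resolved_by", "correct") and targets are assumptions,
# not the actual contract terms from these projects.

def check_kpis(cases: list[dict],
               target_automation: float = 0.80,
               target_accuracy: float = 0.995) -> None:
    automated = [c for c in cases if c["resolved_by"] == "agent"]
    automation_rate = len(automated) / len(cases)

    correct = sum(1 for c in automated if c["correct"])
    accuracy = correct / len(automated) if automated else 0.0

    print(f"Automation rate: {automation_rate:.1%} (target {target_automation:.0%})")
    print(f"Accuracy on automated cases: {accuracy:.2%} (target {target_accuracy:.1%})")
```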

KIMBERLY NEVALA: So you've been working and actually truly developing and deploying agentic systems, again, as currently defined. I think folks will quibble - because these are obviously not systems that are creating their own objectives and deciding what to go work on. That is actually decided for them.
But that quibble with the original definition aside, are there any other key lessons learned or recommendations you would give to folks? That you've developed or that relate to things you hear people saying out in the wild, if you will, that will not serve them well in going down this path?

MAXIMILIAN VOGEL: Yeah. One thing you really need is data.

Not data in the sense that everything must be cleaned up before you build the agentic system - we usually work on live data. So if the client's users fill in this or that form, we take that. If the guys from the warehouse, from the finance department, and so on write their emails this or that way, we take them and have to process them. So we are working on live data, but we need evaluation data.

So from a set of live data, we take evaluation data where we can say: here this was processed correctly; here this was not processed correctly; here this assessment or classification was done correctly; here it was not. You need a lot of data. If you don't have that data, you have to go back to the client again every time - is this correct, is this not correct? - and you can't test it yourself. So that's one thing which helped us a lot.

Maybe useful for other service providers as well: another thing that helped us is how we bring the process live. We usually put the process live at a client in a way that we have something like a three-month period where nothing leaves the company - no money, no letter, no email, nothing - without a human in the loop. So we usually have a three-month period of human in the loop.

We offer them a mechanism, a website where they can look in on the process, look at the results, and then say, OK, fine, that's going out - or not. And this is helping - I'm using it again - helping to build confidence on the client side that the process really works and that they really have it in their hands. And they don't install a system and then, yeah, have a major shitstorm because hundreds of their clients are complaining: we got this email, this is completely wrong, or something like that. In this phase, we can still learn. Usually we go from an accuracy of maybe 97% to 99%, because in a real productive system we identify other cases which we didn't see in the test system, which we can then optimize or correct. That's another thing.
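A rough sketch of such a human-in-the-loop gate during the ramp-up period, assuming hypothetical review-queue and mail-sending helpers: nothing goes out until a reviewer approves it, and rejections feed back into improving the system.

```python
# A rough sketch of the human-in-the-loop gate during the ramp-up period:
# nothing goes out until a reviewer approves it, and rejections feed back into
# improving prompts and test sets. review_queue, send_email, and
# log_for_retraining are hypothetical placeholders.

def dispatch(draft_reply: str, case_id: str, hitl_enabled: bool = True) -> None:
    if hitl_enabled:
        # During the initial months every outgoing message is queued for review.
        review_queue.put({"case_id": case_id, "draft": draft_reply})
    else:
        send_email(case_id, draft_reply)

def on_review_decision(item: dict, approved: bool, reviewer_note: str = "") -> None:
    if approved:
        send_email(item["case_id"], item["draft"])
    else:
        # Rejected drafts become new test cases and prompt fixes.
        log_for_retraining(item, reviewer_note)
```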

KIMBERLY NEVALA: Well, and I think that's actually a really important point. Because it speaks to a point of-- it's a sort of a deployment phase, transition phase that I imagine, as you said, provides you the opportunity to ensure that the rules and the boundaries that you've designed and defined are, in fact, working.

But also, humans are innately creative, in the sense that I imagine no matter how much test data and how many scenarios are pulled together, someone like myself, who is just particularly stubborn about things, will get in there and perturb the system in some way. Just because you sort of refuse to give up. And let's be honest, sometimes customer service processes do not favor those who are not prepared to invest some time and energy. But that aside.
But then this also shouldn't be seen as the traditional post-deployment 'we're just going to monitor' phase in the traditional software sense. This is actually still an active development and testing phase, right? Which means the way that you resource the project and the levels of support are very different. It still does not negate the need to do that very, very thoroughly in the previous phase. So it's additional.

MAXIMILIAN VOGEL: Yeah.

KIMBERLY NEVALA: It's an additional phase, I would say, to what we think of as the traditional.

MAXIMILIAN VOGEL: It's basically an additional phase. And here, our model for this process is a little bit human teamwork and human operations as well.

So if you bring a trainee into your team, for instance - if it's a financial service provider or something like that - you may not let them answer client emails on the first day. In the first days, they have to give it to their supervisor. And then, if the supervisor gets confident and says, hey, good man, good woman, they are really doing a great job, then they will give them the permission to answer the clients' emails directly, or whatever. It's a little bit modeled after the human process. Maybe that's one thing.

Maybe another point. We think of the agentic system not so much as a software system but as a new colleague, or a team of colleagues, supporting the rest of the team. So usually we build it so that they communicate with the other team members via tickets or emails or something like that. Not via some weird interface where you have to log in and go there - they communicate with the rest of the team with the same means the team members use when they communicate with each other. That makes it easy to integrate into the process, and quite easy, yeah, to supervise as well, because there are supervision structures in every application which all of these customers usually already work with. So that's a little bit of a model which helped us greatly to bring it into the organization.
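To illustrate this "agent as colleague" integration, a hedged sketch assuming a hypothetical ticket client and processing function: the agent answers through the same ticket channel the team uses and escalates by reassigning the ticket with a note.

```python
# A hedged sketch of the "agent as colleague" integration: the agent answers
# through the same ticket channel the team uses and escalates by reassigning
# the ticket with a note. The ticket client (tickets) and the processing
# function (process_case) are hypothetical.

CONFIDENCE_THRESHOLD = 0.8  # assumed cut-off, tuned per process

def handle_ticket(ticket: dict) -> None:
    result = process_case(ticket["body"])  # e.g. {"reply": str, "confidence": float}

    if result["confidence"] >= CONFIDENCE_THRESHOLD and result["reply"]:
        # Answer through the normal ticket channel, as a team member would.
        tickets.reply(ticket["id"], result["reply"])
        tickets.close(ticket["id"])
    else:
        # Escalate the way a junior colleague would: reassign with a note,
        # rather than failing silently inside a separate interface.
        tickets.assign(ticket["id"], team="claims-experts",
                       note="Agent confidence too low; please take over.")
```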

KIMBERLY NEVALA: Mhm. Now I could, and perhaps another episode will, quibble a little bit with the characterization of it as a colleague or a teammate, because I think there are other considerations there. But what I do take from that, and agree with, is what you're saying about the interface and what we think of as the system outputs. Maybe in the old, old days it used to be an error log, or a report. Now the interface, the outputs, and what gets put back are in the same format - whether it's an email or a chat or something - that interface is in natural language. And it uses the same sort of forms it would have, as you do with your human colleagues.

MAXIMILIAN VOGEL: And there's nothing in the way the agent does this - unlike other software systems, where the human sits in front of it and says, oh, there's nothing I can do, the system decided this or that. Here they can take the case, do it their way, and say, yeah, no, the way you did it is wrong. It's more like a counterpart than an IT system they communicate with. And that helps. Usually the humans are much smarter and much, much better at their business processes. So usually the way it goes is from the system to the human: please help me.

KIMBERLY NEVALA: Yeah. I think the language, the language gets tricky. And as I said, I tend to take issue with it more because I think that when we talk about systems as colleagues or people, there are implicit assumptions people then make about capability, about capacity, about even agency, funnily enough, that then I think lead us down paths. And it's the old conversations about overreliance and also all of those kinds of things. But that's a conversation, I think, for a whole other day, because it's a whole other conversation.

But that's a good point, though, about thinking about what the interaction model is between that system and the humans in the loop. And so because this is Pondering AI and we always want to do this, I would love to touch a little bit on your observations of that people factor. There are quite a few, as you've said, aspects to that.

But we are fairly unapologetically here talking about being able to automate 80% of certain processing flows, which certainly has some impact one way or the other on people's jobs. So how do you think about that? And do you see a difference in how people react or respond to that, for instance, in Europe versus on this side of the pond where I am in the US? Is there a difference in attitude about that?

MAXIMILIAN VOGEL: Mhm. Yeah.

So the first thing is, when we install the system to interface with the people more like a colleague, one problem does not arise which you usually have with other IT systems: that the humans have to learn something new. They don't. They do the same job; they just focus. That's the only thing. Their job gets much more complex, because they really only get the complex cases, because the easy cases are done by the system.

So usually the people, the remaining people at the top, are quite happy, because they get a little bit of an upgrade in their work - which is even an upgrade financially, because they're more in a supervision position. And that helps them. But still, very often a rationale for doing this is paying less in wages - moving part of the team, of the department, maybe into other departments and things like that.

It's a problem. It's an open question. Until a few days ago, when this MIT study came out, I thought that it was already really making an impact on hiring and layoffs. But they say, no, not really, if you look at all the numbers. But maybe it will. For instance, these junior jobs, where people do easy, repetitive work to get into the job, to learn the work - this is clearly endangered. This is clear, and we have to talk about it. I don't have a solution for that.

I think in the history of automation, people went into new jobs. But there is this question, clearly. So basically, what we have seen is that the teams in Europe sort of appreciated it, really, because they got other jobs in the company which are a little bit more demanding. And usually in most European companies, you have something like a job guarantee, so you are not losing your position. That might be quite different in the US, where they say, OK, basically now we don't need you anymore. But yeah, still, this is an issue. And in Europe, it will translate, I think, into fewer hirings. They will not employ the same number of people if they're doing automation. It will translate into fewer hirings, and this is still an issue - youth unemployment and things like that.

KIMBERLY NEVALA: Yeah, so something I think we are all going to have to grapple with and address somehow. And certainly not in the tongue-in-cheek, well, now you'll just have time to do the things that give you bliss.

There's also an interesting point there that you were talking about. Because you are working to develop these systems in close conjunction with the people who use them, with the employees themselves. Which is how they actually now have the opportunity to work on more interesting or more complicated cases.
However, they have likely developed their expertise to be able to do that by cutting their teeth on those simpler cases. We really are going to have to think about how you actually develop that. That's obviously a broader open question across lots and lots of domains and lots and lots of areas. Which is, there's a ladder there and we've taken a few of the lower rungs out, and what is that going to look like? But as I said, a conversation for another day, I'm sure. But I appreciate very much that, even as you are doing the work, you and your company are also thinking about and cognizant of that.

So all of that being said, as you look at the work you've been doing with developing truly and productionizing truly agentic systems and the broader conversation, any sort of final thoughts or things you'd like to leave with the audience before we go?

MAXIMILIAN VOGEL: Yeah. Maybe the only thing - it's something we already mentioned - is really: focus on the dull jobs. Not the fancy stuff.

If you've seen the OpenAI presentation on GPT-5, it's all the fancy stuff. What can the model do? I write a few lines of code, and it builds a learning application for my girlfriend to learn French, or something like that, with a mini game integrated. Yeah, bullshit. Basically, yeah, it's nice. It's nice, it's playful, and I like it. I like to watch it and things like that.

But companies are too often focused on things like that. Not anymore, but in Europe there were people in larger enterprises who were really trying to build a digital twin of the CEO. Oh, no. Focus on the dull jobs, and focus on accuracy, reliability, security, safety. Then you will not be part of the 95% that the MIT study says will fail, but of the maybe 5% who really make an impact on your business. Yeah. Last word. Long last word.

KIMBERLY NEVALA: I think it's a very important and wise and well-seasoned last word from someone who is actually deploying these systems and deploying them at scale. Which is still, despite the publicity, very rare. So I think everyone would be well-served to listen to that insight. And I really just want to thank you so much for coming on today and sharing your experiences and those lessons learned, really, from the trenches.

MAXIMILIAN VOGEL: Thank you very much.

KIMBERLY NEVALA: Awesome. And if you'd like to continue learning from thinkers, doers, and advocates such as Maximilian, you can subscribe to Pondering AI now. You'll find us wherever you listen to podcasts and also on YouTube.

Creators and Guests

Kimberly Nevala - Host, Strategic advisor at SAS
Maximilian Vogel - Guest, Co-Founder, BIG PICTURE