AI Agents: Building Autonomous Systems
Insights
- Multi-agent systems transform foundation models into coordinated, task-specific teams that can plan, code, validate, and self-improve.
- Autonomous scientific workflows can now complete months of cosmological research in minutes—with results surpassing previous human benchmarks.
- AI agents bring transparency and reliability by breaking problems into interpretable, data-grounded steps, reducing hallucinations and improving accuracy.
How close are we to a world where AI conducts science on its own?
This talk features Professor James Fergusson, Executive Director of Data Intensive Science at the University of Cambridge and Director of the Infosys-Cambridge AI Centre.
Professor Fergusson breaks down how large foundation models—like ChatGPT, Gemini, and Stable Diffusion—serve as the “engines” of the AI revolution, while agents act as the “car,” giving those engines direction, control, and purpose. He illustrates how planning, control, and validation agents can work together like a scientific team—designing experiments, coding analysis, reviewing outputs, and checking consistency.
At Cambridge, his group has built a multi-agent system that autonomously conducts cosmological analysis, compressing six months of work into ten minutes and producing results eight times more accurate than previous research. This marks a leap toward a future of self-driving laboratories and machine-led discovery.
This session offers scientists, engineers, and technologists a window into the next era of AI-driven science—where agents don’t just analyze data but generate knowledge itself.
Professor James Fergusson:
Welcome to this talk on AI agents and how we use them to build autonomous systems. I'm Professor James Fergusson. I'm the executive director of the Data Intensive Science Group here in Cambridge, and I'm also the director of the Infosys-Cambridge AI Centre. So let's think about agents. The real revolution we've seen in AI has come from foundation models: ChatGPT was the first, but there's also Gemini and Claude on the language side. On the image side, you have things like Stable Diffusion and Segment Anything, and there are also things like Polymathic, a foundation model for numerical data that we're building. These models are really what's powering the AI revolution. What we've seen since is the ability to harness them with agents. Lots of different frameworks have come up, AG2, LangChain, LangGraph, CrewAI, which are essentially fancy little hats we can put onto foundation models that allow us to tune them to specific tasks. So we can take the very great power of ChatGPT, but say: right, now you only write code. That's something we've seen a lot with things like Copilot.

I think the really nice way to think about this is that foundation models are the engine. They're the steam engine driving the revolution. They're very, very powerful, but on their own they're not that useful. What agents do is turn the engine into a car. You take the engine, but now you have to build the wheels, put in a steering wheel, add bumpers and windscreen wipers. That's what agents are: an attempt to take that raw power and harness it into a really useful tool. So when we think about agents, we take that large language model, tune it to a specific task, and then we can build teams of them.

Here are a lot of common tasks we tune agents towards. We can have data agents, which interact specifically with data sources: a particular database, a big folder full of documents, or the web. You have tool agents, which actually command systems, and they can do this because the way to talk to systems is via code, and code is essentially just a specialized form of language. So there are ones that can execute and work with software, ones that can search the web, ones that write reports, generate plots, and handle references. Then you have coding agents, which can actually write or execute code. This is one of the great powers of agents: rather than asking a language model to generate information, saying "what's my sales forecast going to be?", you can say to an agent, "please write some code that will calculate what my sales forecast could be next month." Those three, in the bluish-greenish colours, are the worker agents. They do the hard work, and they're all tuned to a specific task.
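That "tuning" is, at its core, just a role-specific system prompt wrapped around a foundation model; frameworks like AG2 or LangGraph package the idea up, but a minimal sketch fits in a few lines. This one assumes the OpenAI Python client as the backend; the model name, prompts, and the `make_agent` helper are illustrative choices, not any particular framework's API.

```python
# Minimal sketch: "tuning" a foundation model into a worker agent with a
# role-specific system prompt. The model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def make_agent(role_prompt: str):
    """Return a callable agent whose behaviour is fixed by its role prompt."""
    def agent(task: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[
                {"role": "system", "content": role_prompt},
                {"role": "user", "content": task},
            ],
        )
        return response.choices[0].message.content
    return agent

# A coding agent: asked for a forecast, it writes code that computes the
# answer from data instead of guessing a number.
coder = make_agent(
    "You are a coding agent. Reply only with runnable Python code. "
    "Never state results directly; write code that computes them from data."
)
print(coder("Write code to forecast next month's sales from sales.csv"))
```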
Then you have the coordination ones, the yellow ones, the system agents. You have a planner that says: if I'm given a task, how should I break it down, and in what order should we do those segments? You have a memory agent that keeps track of what everything has done, and a control agent that decides which agent should do what and when. This is where a human can go into the loop to help decide: yes, I approve the plan; yes, we should pass this to the coding agent to write up; yes, the code that came out of it is good enough, we should run it.

But the really exciting thing about agents is the last part, the red part. In the car analogy, this is where we build the bumpers and the brakes. We can add validation agents. If the planner generates a plan, we can validate it: does the plan make sense? Is the report we generate well written? Does it follow the data, or is it making stuff up? This is where we fight hallucinations. Are the plots relevant to the text? Do they reflect the data? We can have checkers: if we say some data came from a source, we can have an agent that goes and googles that source and checks it actually matches what we say it does. Does the table we generate match the text? Is it full of real numbers from real places, or has it been made up? And then consistency: are all the things we're saying consistent with each other?
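A minimal sketch of that generate-then-validate pattern, again assuming the OpenAI Python client: a writer agent drafts, a validation agent criticises the draft against the data it was given, and the loop repeats until the validator approves. The prompts, the APPROVED convention, and the model name are placeholder choices, not the Cambridge system's actual protocol.

```python
# Minimal sketch: a writer drafts a report, a validator criticises it
# against the supplied data, and the draft is revised until approved.
from openai import OpenAI

client = OpenAI()

def llm(system: str, user: str) -> str:
    r = client.chat.completions.create(
        model="gpt-4o",  # placeholder
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return r.choices[0].message.content

def write_and_validate(task: str, data_summary: str, max_rounds: int = 3) -> str:
    draft = llm("You write concise technical reports grounded strictly in "
                "the data you are given. Do not invent numbers or sources.",
                f"Task: {task}\nData:\n{data_summary}")
    for _ in range(max_rounds):
        critique = llm(
            "You are a validation agent. Check the report against the data. "
            "Flag every claim, number, or citation the data does not support. "
            "Reply APPROVED if everything checks out.",
            f"Data:\n{data_summary}\n\nReport:\n{draft}")
        if critique.strip().startswith("APPROVED"):
            break
        # Feed the critique back so the writer can repair the draft.
        draft = llm("Revise the report to address every point in the critique, "
                    "staying grounded in the data.",
                    f"Data:\n{data_summary}\nCritique:\n{critique}\nReport:\n{draft}")
    return draft
```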
I think this is the great thing: LLMs, these big foundation models, are just like people in that they're much better at criticizing than they are at generating. So having these validation agents really helps solve a lot of the key problems we have with AI, with hallucination and with fabrication.

So how do we put these all together? At the moment, one of the most common approaches is a planning-control strategy. You have the top level, the planner and the plan reviewer. You give it the task; the planner says, okay, in order to complete that task, I need to do these things in this order. The plan reviewer checks that plan and says, actually, can you make it better? Can you revise it? They do a certain number of iterations, and once the plan is agreed they pass it to the control agent. The control agent then takes those tasks and distributes them between the worker agents, who complete them and work towards the solution you're after. So you define each chunk that needs to be done, it's completed by agents, and you define the hierarchy. Because LLMs are trained on human output, they are a bit like people, and so the way we structure them is very much like a company. You can think of the planning and plan-review stages like the C-suite and their board deciding what the company needs to do: how do we want to strategize this, what is our real goal? Once they've agreed, they pass it to the control agent, which is really like a manager working out how to achieve the task with their team.

We've built things like this in Cambridge, and we've built them for automating cosmology. Our goal is to build a self-driving lab that can do all of scientific data analysis in a fully autonomous way. This takes its inspiration from the wet labs in biology, where they've managed to build fully automated labs that generate experiments, run them, and collect the data without a human having to sit there doing everything manually. Here's an example of what our system can do. Building one of these is, a bit like building a car, a difficult engineering challenge. Once you've built it, though, it's a very, very useful tool, and obviously you can build bad cars and good cars, but if you've made a really good one, you can do things like this. We asked our system to build an emulator for the matter power spectrum, a calculation we use in cosmology quite a lot. It comes up with a plan. It says: okay, I'm going to go and work out what the real candidate neural-network architectures are that we should consider; I want to generate all my training data; I've got to code up all of the emulator architectures I've found in the literature; I want to train all of them, evaluate the results, and generate a report showing the evidence for which one is best, how accurate they are, and how they perform. And it can do this. We give it that task, and it comes back with a nice paper on the efficient emulation of the matter power spectrum, with nice plots showing what the accuracy is like and how they all perform. And the amazing thing is it does all of this in ten minutes.
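To make "emulator" concrete, here is a toy sketch of the underlying idea: a small neural network learns the map from cosmological parameters to P(k), so an expensive Boltzmann-code call can be replaced by a fast regression. The training data below comes from a made-up stand-in function, not a real cosmology code, and the parameter ranges and architecture are arbitrary illustrative choices.

```python
# Toy sketch of a matter power spectrum emulator: fit a small network to
# map (Omega_m, sigma_8) -> P(k). The "physics" here is a fake stand-in.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
k = np.logspace(-3, 0, 50)  # wavenumbers, roughly h/Mpc

def fake_power_spectrum(omega_m, sigma8):
    # Stand-in for a slow call to a real code such as CAMB or CLASS.
    return sigma8**2 * k**(omega_m - 2.0) / (1.0 + k**2)

# Sample parameter space and generate training spectra from the slow model.
params = rng.uniform([0.2, 0.6], [0.4, 1.0], size=(500, 2))
spectra = np.array([np.log(fake_power_spectrum(*p)) for p in params])

emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                        random_state=0).fit(params, spectra)

# The trained emulator now returns an approximate log P(k) near-instantly.
print(np.exp(emulator.predict(np.array([[0.3, 0.8]])))[0][:5])
```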
Not only does it do this in ten minutes, the result it came back with was eight times better than anything in the literature before. This is essentially taking what would be six months of research and doing it while you go away and have a coffee.

So this is the real power of multi-agent systems: they allow you to automate complex workflows by breaking them into small, achievable chunks, the same way you get any organization to achieve a complicated task, by breaking it into small chunks and giving each one to a person who can do it. They are interpretable: because they work in small chunks and interact with each other via language, they leave a clear record of what they have done, which makes them much, much more interpretable than using, say, a deep neural network to solve a problem in a black-box way. Once they're built, they're very flexible. Our CMBAgent code was built for cosmology, but we can use it for essentially any data analysis task; we've also used it to write essays and to analyze very different data in finance and other areas. They're flexible because they understand language and can adjust to what you ask them to do far more easily than the more rigid systems we're used to. They can be robust, and they will be as robust as you design them to be: the same way you can build a car with no brakes that is very dangerous, or a car with excellent brakes that is very, very safe. If you build in these opposing agents that check everything you do, you can make something really robust and really reliable. The output they give you can be grounded in data: they can write code to analyze the data rather than making up things that sound sensible, which is the problem we've had before when using large language models for particular tasks. And the last thing is that once you've built these, they really free you up. They allow you to explore data, but also processes: a bit like the emulator example, we can ask what the best way of doing something is, and it can do the legwork of trying everything out and giving you a full report very, very quickly. That, I think, is the really exciting power: you can automate things, but you can also explore really, really fast.
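To close the loop, here is a minimal sketch of the planning-control strategy described earlier: a planner drafts an ordered step list, a reviewer iterates on it, and a controller dispatches each step to the right worker agent. It again assumes the OpenAI Python client; the "worker: step" line format and the fixed number of review rounds are simplifying assumptions, and this illustrates the pattern rather than the actual CMBAgent implementation.

```python
# Minimal sketch of the planning-control pattern: plan, review, dispatch.
from openai import OpenAI

client = OpenAI()

def llm(system: str, user: str) -> str:
    r = client.chat.completions.create(
        model="gpt-4o",  # placeholder
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}])
    return r.choices[0].message.content

def solve(task: str, workers: dict) -> list:
    # Planner proposes an ordered plan, one "worker: step" per line.
    plan = llm("You are a planner. Break the task into ordered steps, one per "
               f"line, formatted 'worker: step'. Workers: {list(workers)}",
               task)
    for _ in range(2):  # a fixed number of review iterations
        plan = llm("You are a plan reviewer. Improve the plan: remove "
                   "redundant steps, fix the ordering, keep the same format.",
                   f"Task: {task}\nPlan:\n{plan}")
    # Controller walks the agreed plan, routing each step to its worker.
    results = []
    for line in plan.splitlines():
        if ":" in line:
            name, step = line.split(":", 1)
            worker = workers.get(name.strip().lower())
            if worker:
                results.append(worker(step.strip()))
    return results

# Usage: workers can be role-tuned agents like the ones sketched earlier,
# e.g. solve("Forecast next month's sales", {"coder": coder, "writer": writer})
```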