The Potential of AI Agents: Three Stories Drawn from Real-World Experiences
About three years ago I started following and writing about Generative AI (GenAI), Large Language Models (LLMs), chatbots, and related topics. Then a few months ago I started hearing about agentic AI, and since then I’ve been trying to better understand what it’s all about.
I asked Babson College professor Tom Davenport, a prolific writer whom I have long known, whether he had written any recent articles on agentic AI. Davenport recommended Agentic Artificial Intelligence: Harnessing AI Agents to Reinvent Business, Work and Life, a book published in March of 2025. The book was the result of a unique collaboration among a diverse team of 27 professionals in business, academia, programming, and research; Davenport was one of its principal authors.
As one might expect given the number of co-authors, this is a long, comprehensive book: over 550 pages across 14 chapters. I found the book’s Introduction chapter to be a very good overview of the value of Agentic AI, particularly its explanation of the difference between Agentic and Generative AI. Let me summarize what I learned from the Introduction.
The Critical Limitations of Current Generative AI Systems
“While most businesses are still figuring out how to use ChatGPT for writing emails and creating chatbots, a new breed of organizations is fundamentally reimagining what’s possible with AI,” the book noted. “But here’s the paradox that’s holding most organizations back: We’ve built generative AI systems that can think brilliantly but can’t actually do anything. They can analyze complex data in seconds, write compelling presentations, and offer brilliant insights on any topic. Yet they can’t press a button, send an email, or make a simple reservation. We’ve created a world of brilliant advisors who can’t lift a finger to help.”
“This situation isn’t just inefficient—it’s actively harmful,” the authors added. “The more sophisticated AI becomes at thinking and analyzing, the more humans are forced to handle mechanical, repetitive tasks. Knowledge workers now spend up to 60% of their time on ‘work about work’: copying data between systems, fact-checking AI-generated content, and manually executing what generative AI recommends.”
“We’re treating humans like robots and AI like creatives. It’s time to flip the equation. … Companies invest millions in cutting-edge AI only to find their employees spending more time managing these systems than doing meaningful work. The machines dream while humans grind.”
The book illustrates the limitations of current GenAI systems with three interesting stories drawn from real-world experiences. I will briefly describe each of these stories.
The Family Vacation: When Machines Dream and Humans Grind
Brian asked ChatGPT for help in planning his family’s long-awaited vacation to Greece. “Show me a two-week itinerary for a family of four in Greece,” he typed into ChatGPT, adding details about his children’s interests in Greek mythology. Within seconds ChatGPT generated a perfectly crafted daily itinerary “filled with hidden gems, local experiences, and thoughtful touches” including charming family-run hotels and restaurants, out-of-the-way museums, ancient buildings, hidden beaches, and much more.
Brian was delighted with the AI’s impressive suggestions, which included a detailed daily breakdown for their two-week vacation. But now it was up to him to do the real work: “checking availability, comparing prices, and attempting to turn the AI’s perfect fantasy into bookable reality.” After a few hours his delight turned to frustration.
“His experience crystallizes what so many of us expect from AI versus what we actually get. We want technology to handle the tedious parts — the endless browsing of flight options, the cross-referencing of hotel reviews, and the mind-numbing task of finding availability across dozens of booking systems. Instead, AI has become remarkably good at the enjoyable parts of planning — dreaming up possibilities, suggesting adventures, painting pictures of perfect moments — while leaving humans to handle all the practical details.”
When AI Met Reality: A Cautionary Tale from the Research World
Three weeks before Dr. Jessica Sing was scheduled to present her groundbreaking research on the impact of climate change on global food security at a UN Climate Summit, she received a call that her father had suffered a severe stroke. She immediately flew to Singapore to be with him in his final days, delegating the research completion to her capable but inexperienced team of postdocs and research assistants. “Use whatever tools you need,” she told her research team during a rushed call from the hospital. “Just make sure everything is verified and rock-solid. The world will be watching.”
Her team had taken that permission and run with it, using multiple AI tools to complete the massive data analysis on schedule, finding patterns and citations they hadn’t considered before. But when Jessica checked the cited papers two days before the Summit, she found that they didn’t exist and the analysis didn’t match any known studies. With the Summit looming, Jessica called the organizers and withdrew from her keynote slot.
This story highlights a dangerous gap between generative AI’s apparent capabilities and its actual limitations. While GenAI is able to generate impressive-looking research content, it lacks the crucial abilities needed for reliable scientific work: fact-checking, maintaining consistency, comparing sources, and building coherent arguments over time. “Current generative AI systems lack coherent persistence — the ability to maintain consistent knowledge and logical relationships across different interactions and contexts. Each analysis exists in its own bubble, unable to detect or resolve contradictions with other analyses.”
When Minutes Matter: AI’s Life-Critical Disconnect
“Maria arrived at the emergency room clutching her abdomen, her face pale with pain.” The hospital’s experimental GenAI system sprang into action immediately to assist with patient intake. The AI gathered Maria’s symptoms, vital signs, and medical history. Within seconds, it had generated a preliminary assessment: possible complications from recent gastric bypass surgery complicated by Type 2 diabetes. But Jennifer, the emergency nurse assigned to Maria’s case, watched with growing frustration as the AI system couldn’t access Maria’s surgical records from a nearby hospital and was treating her surgery as brand new information.
Suddenly, the various AI systems monitoring Maria’s vital signs flashed a series of warnings: her blood pressure was dropping; the lab analysis AI identified markers suggesting internal bleeding; the medication management system flagged dangerous drug interactions; the patient history AI noted patterns suggesting post-surgical complications. “However, none of these systems could communicate with each other or take action. Jennifer had to manually check each system’s alerts, copy critical values between systems, input data into protocols, and coordinate responses herself.”
This story illustrates another major limitation of current GenAI systems: they lack collaborative intelligence, a particularly serious shortcoming in settings where minutes matter, such as a hospital emergency room. While each individual AI system shows impressive capabilities in its own domain, the systems lack the ability to communicate with each other, take coordinated action, proactively identify needs, and adapt to changing situations in real time. Each system may independently recognize that a serious situation is developing, but it must then wait for human intervention, since the systems cannot talk to one another, jointly identify the problem, and act on their own.
These three stories illustrate three fundamental limitations in our current approach to generative AI:
The Execution Gap. Our AI systems can generate perfect plans but can’t take real-world actions to implement them.
The Learning Gap. Our AI systems can’t build reliable knowledge over time or adapt based on experience.
The Coordination Gap. Our AI systems can’t work together effectively.
The Imperative of Agentic AI
“The solution lies in a fundamentally different approach to AI — one that focuses not just on making AI systems smarter but on making them more capable of autonomous, coordinated action.”
What sets agentic AI apart is its ability to act independently in pursuit of defined goals. “Unlike generative AI systems that simply respond to queries or generate outputs, agentic AI systems can understand a goal, take initiative, maintain persistent objectives, and adapt their strategies based on real-world feedback. Put simply, an AI agent is a system that uses AI and tools to accomplish actions in order to reach a given goal autonomously.”
An agentic AI system doesn’t just generate insights; it takes action. “It can interact with applications, manipulate data, control hardware, and execute real tasks to achieve specific goals.” In principle, an AI agent can be trained to do anything a human can do on a computer. An AI agent operates in a continuous loop of planning, reasoning, and execution, learning from each step to refine its approach until the goal is achieved. In essence, it’s like having a highly capable assistant who doesn’t just know what to do but actually does it. But the book cautions that “success depends on providing clear and precise goals and instructions.”
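To make that loop concrete, here is a minimal sketch in Python of a plan-act-observe cycle. Everything in it, including the tool names, the trivial planner, and the stopping condition, is a hypothetical placeholder of my own; it is not the book’s design or any particular framework’s API.

```python
# Minimal sketch of an agentic loop: plan, act, observe, refine.
# All functions are hypothetical placeholders, not real APIs.

from dataclasses import dataclass, field

@dataclass
class AgentState:
    goal: str
    observations: list = field(default_factory=list)
    done: bool = False

def plan_next_step(state: AgentState) -> str:
    """Stand-in for an LLM call that picks the next action
    from the goal and everything observed so far."""
    if "flights_ok" in state.observations:
        return "book_hotel"
    return "search_flights"

def execute(action: str) -> str:
    """Stand-in for a real tool call (booking API, browser, etc.)."""
    results = {"search_flights": "flights_ok", "book_hotel": "hotel_booked"}
    return results.get(action, "error")

def run_agent(goal: str, max_steps: int = 10) -> AgentState:
    state = AgentState(goal=goal)
    for _ in range(max_steps):             # hard cap: a simple guardrail
        action = plan_next_step(state)     # reason about what to do next
        result = execute(action)           # act on the (simulated) world
        state.observations.append(result)  # observe and remember
        if result == "hotel_booked":       # stop when the goal is reached
            state.done = True
            break
    return state

print(run_agent("Book a two-week family trip to Greece"))
```

The step cap reflects the book’s caution about clear goals and oversight: even a capable agent should not run unbounded.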
For example, in planning Brian’s family vacation, instead of just dreaming up the perfect itinerary, an AI agent would work like a seasoned travel professional. “It would start by checking real-time availability and pricing across multiple booking systems. When a hotel was full, or a flight was too expensive, it would automatically adjust the plan, find alternatives, and even make reservations. Most importantly, it would keep track of all the details, from confirmation numbers to cancellation policies, maintaining a complete picture of the trip-planning process.”
In the tale from the research world, an AI agent would approach research challenges the way a seasoned scientist does. It would verify its scientific sources, carefully checking each for credibility and relevance, and actively searching for and flagging any contradictions. Beyond just fact-checking, the system would build a logical framework, connecting different pieces of evidence and ensuring that conclusions flow naturally from verified data. And when new information becomes available, it would be integrated into this framework.
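As a rough illustration of that verification step, here is a sketch in which no claim enters the analysis unless its citation resolves in a trusted index. The index, DOIs, and claims below are invented for illustration; a real system might check against a citation database such as Crossref.

```python
# Hypothetical sketch: accept a claim only if its cited source
# actually exists. All entries below are made up for illustration.

TRUSTED_INDEX = {  # stand-in for a real citation database
    "10.1000/exampledoi": "Smith et al., 2023",
}

def verify_claim(claim: str, doi: str) -> bool:
    """Reject any claim whose citation cannot be resolved."""
    if doi in TRUSTED_INDEX:
        print(f"OK: '{claim}' backed by {TRUSTED_INDEX[doi]}")
        return True
    print(f"REJECT: '{claim}' cites unknown source {doi}")
    return False

claims = [
    ("Yields fall 5% per degree of warming", "10.1000/exampledoi"),
    ("Yields rise under drought", "10.9999/fabricated"),
]
verified = [c for c, d in claims if verify_claim(c, d)]
```

Had Jessica’s team run their AI output through even this crude gate, the fabricated citations would have been flagged weeks before the Summit.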
And in the emergency room story, instead of independent AI agents working in parallel, a set of coordinated agents would work together like a well-rehearsed medical team, integrating information from multiple sources in real time, proactively monitoring vital signs across different systems, and coordinating their efforts. The moment the vital signs indicated trouble, the agents would alert the right medical personnel and ensure that critical information flowed seamlessly across all departments.
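One simple way to picture this coordination is agents publishing alerts to a shared channel that a coordinator correlates and escalates, in contrast to the siloed systems Jennifer had to reconcile by hand. The sketch below is my own assumption of how such a bus might look; the agent names and clinical thresholds are illustrative, not from the book or any medical guideline.

```python
# Hypothetical sketch of coordinated agents sharing one event bus.

from queue import Queue

bus: Queue = Queue()  # shared channel all agents publish to

class VitalsAgent:
    def observe(self, bp_systolic: int) -> None:
        if bp_systolic < 90:  # illustrative threshold
            bus.put(("vitals", "blood pressure dropping"))

class LabAgent:
    def observe(self, hemoglobin: float) -> None:
        if hemoglobin < 8.0:  # illustrative threshold
            bus.put(("lab", "markers suggest internal bleeding"))

def coordinator() -> None:
    """Correlates alerts across agents and escalates once,
    instead of a nurse manually checking each system."""
    sources = set()
    while not bus.empty():
        source, msg = bus.get()
        sources.add(source)
        print(f"[{source}] {msg}")
    if {"vitals", "lab"} <= sources:
        print("ESCALATE: page surgical team, forward combined record")

VitalsAgent().observe(bp_systolic=85)
LabAgent().observe(hemoglobin=7.2)
coordinator()
```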
The Promise (and Limitations) of AI Agents
“Every major technological shift reshapes the way we live and work,” the authors added. “The printing press democratized knowledge. The internet connected humanity. AI, in its agentic form, has the potential to amplify human capabilities in ways we’re only beginning to comprehend”:
specialized medical agents coordinating patient experience across entire health systems, not just analyzing symptoms;
educational agents becoming true learning partners, adapting to your unique pace and style;
autonomous agents orchestrating global responses to climate change;
a collective intelligence tackling our planet’s most pressing challenges.
The potential of AI agents is clearly fascinating, but “fully autonomous agents, capable of handling complex, multifaceted tasks without human intervention, are not yet there.” Today’s agents are powerful but limited:
Current AI agents are task-oriented. They excel at orchestrated sequences of actions using well-defined tools and highly detailed instructions; they automate workflows rather than replace entire job roles.
Deployment is harder than development. Many projects fail not because the agent is weak but because the systems around it — data quality, workflow integration, user adoption — are not ready.
Strict human oversight remains essential. In most cases, AI agents are not fully reliable. Their potential inaccuracies, implementation issues, and unexpected failures require close human supervision.
Technical expertise remains essential. Deploying AI agents in enterprises still requires programming expertise in areas such as error handling and security.
“The gap between expectation and reality is wide. Those who fail to understand it risk wasting time, money, and credibility.” But AI agents learn and improve over time. The more they are used, the more they improve.
Irving Wladawsky-Berger
Irving Wladawsky-Berger is a Research Affiliate at MIT’s Sloan School of Management and at Cybersecurity at MIT Sloan (CAMS), and a Fellow of the MIT Initiative on the Digital Economy, of MIT Connection Science, and of the Stanford Digital Economy Lab.
Dr. Wladawsky-Berger retired from IBM in May of 2007 after 37 years with the company, where he was responsible for identifying emerging technologies and marketplace developments critical to the future of the IT industry. He was also responsible for the university relations office and for the IBM Academy of Technology, where he served as Chairman of the Board of Governors. He led a number of IBM’s company-wide initiatives, including the Internet, supercomputing, and Linux.
Since retiring from IBM, Irving has been an Adviser on Digital Strategy and Innovation at Citigroup, HBO, and Mastercard. He has been writing a weekly blog, irvingwb.com, since 2005, and from April of 2012 until July 2020 he was a guest columnist for the Wall Street Journal’s CIO Journal.
Irving served on, and later became co-chair of, the President’s Information Technology Advisory Committee from 1997 to 2001, and was a founding member of the Computer Science and Telecommunications Board of the National Research Council in 1986. He is a former member of the University of Chicago Board of Governors for Argonne National Laboratory, the Board of Overseers for Fermilab, and BP’s Technology Advisory Council. He is a Fellow of the American Academy of Arts and Sciences. Born in Cuba, he came to the US at the age of 15, and in 2001 he was named Hispanic Engineer of the Year. Irving has an M.S. and a Ph.D. in physics from the University of Chicago.