The key to achieving production-grade agentic AI

Insights

  • Agentic AI has the potential to transform core business processes across industries in 2026. But many current agentic AI initiatives are too risky for large-scale adoption.
  • Scaling AI effectively means building a poly cloud, poly AI architecture, established on a foundation of responsible governance.
  • This platform-based approach enables access-controlled, performance-optimized, and regulatory-compliant guardrails that work across multiple cloud vendors and generative AI models, while providing a centralized repository for agent selection and orchestration.
  • In our implementation with a global footwear conglomerate, the agentic solution for inventory discrepancy was built in stages, transitioning from a monolithic architecture to one focusing on the key poly cloud, poly AI tenets of modularity, event-driven design, and seamless integration.
  • Using this blueprint, organizations can quickly move from proof of concept to enterprise-wide, cloud- and AI-agnostic implementations.

Large language models (LLMs) and agentic AI continue to advance: new techniques, along with improvements in the models themselves, including better inference and deep search, are enhancing agentic AI’s cognitive and reasoning capabilities. This progress has paved the way for more accurate and reliable agentic AI applications that can be used for business purposes at enterprise scale.

However, many enterprise deployments are currently in the planning or proof-of-concept phase, with just a fifth achieving full business value, according to our Business Value Radar 2025 report.

And caution remains around agentic AI. One 2025 study by Stanford University found that although almost 60% of organizations are actively exploring agentic AI, 42% said, “AI agents are currently too risky for large-scale adoption.”

So how can organizations transform proof-of-concept, point solutions into scalable, responsible, and less risky production-grade implementations?

In our own work with clients, we have found that scaling AI effectively means building what we term a poly cloud, poly AI architecture, established on a foundation of responsible governance. This technical architecture addresses some of the key challenges large organizations face when building agentic AI applications.

The problem with building production-grade agents

Beyond general risk concerns, scaling agentic AI reveals vulnerabilities seen in generative reasoning models over the past two years, along with challenges unique to enterprise-wide agent orchestration.

Deploying agents at scale means managing multiple failure points. For example, agentic AI’s reliance on large models and complex reasoning drives up token usage and costs, impacting return on investment (ROI). Real-time applications suffer from latency due to multiple LLM calls and long-running transactions, such as database or knowledge-base queries, which require persistent state management, i.e., maintaining an application’s state so it survives across sessions.
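Persistent state management of this kind can be as simple as checkpointing an agent’s progress to durable storage between sessions. The following is a minimal sketch, not part of the implementation described here; the table schema and session keys are illustrative:

```python
import json
import sqlite3


class AgentStateStore:
    """Persist agent state so long-running transactions survive restarts."""

    def __init__(self, path: str = "agent_state.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS state (session_id TEXT PRIMARY KEY, payload TEXT)"
        )

    def save(self, session_id: str, state: dict) -> None:
        # Checkpoint the agent's progress as JSON, keyed by session.
        self.conn.execute(
            "INSERT OR REPLACE INTO state VALUES (?, ?)",
            (session_id, json.dumps(state)),
        )
        self.conn.commit()

    def load(self, session_id: str) -> dict:
        row = self.conn.execute(
            "SELECT payload FROM state WHERE session_id = ?", (session_id,)
        ).fetchone()
        return json.loads(row[0]) if row else {}


# A long-running inventory query checkpoints where it left off,
# so a restarted session can resume rather than re-run the whole transaction.
store = AgentStateStore(":memory:")
store.save("session-42", {"step": "awaiting_db_result", "sku": "SHOE-123"})
resumed = store.load("session-42")
```

In production this role is typically played by a database or workflow engine; the point is that the agent’s reasoning loop reads and writes checkpoints rather than holding all state in memory.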

Security is another concern. Agentic applications need access to enterprise systems for real-time problem-solving, but insecure access can lead to risks like unintended system modifications.

Finally, the rapid evolution of agentic AI frameworks introduces instability. Frequent updates to open-source tools like LangChain and model context protocol (MCP) require careful integration planning.

Design considerations and guiding principles

The key to tackling these issues is an architecture that is both poly cloud and poly AI. This platform-based approach enables access-controlled, performance-optimized, and regulatory-compliant guardrails that work across multiple cloud vendors and with various generative AI models, while providing a centralized repository for agent selection and orchestration (Figure 1).

Figure 1. Deployment view for productizing agentic applications, at scale

Source: Infosys

This architecture solves the challenges listed above using tried-and-tested techniques at scale. These include MCP and MCP gateways. The protocol itself, introduced by Anthropic in late 2024, provides a secure bridge between agents and commercial off-the-shelf (COTS) systems. Agents gain access to the capabilities of these internal business systems, but not to the systems themselves, which keeps data safe and aligns agentic processes with business policies. The gateway then controls the flow of requests and manages authentication and authorization for the various agents.
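The gateway pattern can be illustrated with a minimal sketch. This is not the MCP SDK or the implementation described here; the tool names and roles are hypothetical, and a real gateway would verify signed credentials rather than an in-memory role table:

```python
class MCPGateway:
    """Illustrative gateway: agents invoke capabilities, never backend systems directly."""

    def __init__(self):
        self._tools = {}        # tool name -> (handler, required role)
        self._agent_roles = {}  # agent id -> set of granted roles

    def register_tool(self, name, handler, required_role):
        self._tools[name] = (handler, required_role)

    def register_agent(self, agent_id, roles):
        self._agent_roles[agent_id] = set(roles)

    def call(self, agent_id, tool_name, **kwargs):
        handler, required_role = self._tools[tool_name]
        # Authorization happens at the gateway: the agent sees only the
        # capability, while the backend system stays behind the handler.
        if required_role not in self._agent_roles.get(agent_id, set()):
            raise PermissionError(f"{agent_id} may not call {tool_name}")
        return handler(**kwargs)


gateway = MCPGateway()
# The handler stands in for a COTS inventory lookup; values are illustrative.
gateway.register_tool("get_stock", lambda sku: {"sku": sku, "qty": 12}, "inventory.read")
gateway.register_agent("demand-verifier", ["inventory.read"])

result = gateway.call("demand-verifier", "get_stock", sku="SHOE-123")
```

The design choice worth noting is that authentication and authorization live in one choke point, so adding a new agent never means granting it direct database or API credentials.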

Other techniques used in the platform architecture include centralized governance across all agentic systems; deploying only those models that can be explained and monitored effectively; and importantly, integrating agents across multiple cloud platforms, ensuring interoperability and reducing lock-in.

The need to access both small and large language models is also important, given the various reasoning and cost benefits of each in different contexts. The system architecture should also include an agent hub for controlled usage, integration of responsible AI guardrails within the agentic AI platform itself, and the introduction of telemetry and observability to trace reasoning processes and monitor token consumption.
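Routing between small and large models can be expressed as a simple policy. The following is a hedged sketch: the model names are placeholders for whatever your providers expose, and the complexity heuristics (multi-step reasoning, context length) are one reasonable choice among many:

```python
# Placeholder model identifiers; substitute your providers' actual model names.
SMALL_MODEL = "small-llm-v1"
LARGE_MODEL = "large-llm-v1"


def route_model(task: dict) -> str:
    """Pick a model by estimated task complexity, balancing cost against reasoning depth.

    Simple lookups go to the cheaper small model; multi-step reasoning or
    long-context tasks go to the large model.
    """
    needs_reasoning = task.get("requires_multi_step", False)
    long_context = len(task.get("context", "")) > 4000
    return LARGE_MODEL if (needs_reasoning or long_context) else SMALL_MODEL


# A routine ATP lookup stays on the small model; root-cause analysis escalates.
cheap = route_model({"context": "check stock for SHOE-123"})
costly = route_model({"requires_multi_step": True, "context": "reconcile ledgers"})
```

In practice such a router sits inside the agent hub, so the policy can be tuned centrally rather than hard-coded into each agent.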

Agentic AI architecture brought to life

So, what does all this mean in practice, and how can we put these design considerations and guiding principles to work in a production setting like solving inventory discrepancies in retail?

Modern global retail is a complex and challenging environment. Today’s leading retailers must fulfill orders across a fragmented set of channels, systems, and partners.

To help, large retailers generally build well-integrated supply chain inventory systems to match their online, physical, and partner networks. However, within these channels, and their underlying systems and processes, discrepancies and mismatches naturally arise.

These discrepancies can result in both oversell and undersell of a retailer’s product. Oversell happens when a product is promised and sold for a particular delivery window but is not fulfilled within that window, or, in the worst cases, not fulfilled at all. Undersell, by contrast, happens when a product sits in a store or warehouse with no record of its existence.

Infosys’ work with retail clients, which we detail in our upcoming Tech Navigator report on use cases of agentic AI in various industries, has found that agentic AI-based solutions are effective at identifying inventory discrepancies. Once problems are detected and understood, agentic systems can then execute corrective actions.

In our implementation with a global footwear conglomerate, for example, the goal was to resolve inventory discrepancies. The system splits the goal into executable tasks and identifies the relevant and available agentic tools and skills needed for each.

Multiple agents are created as part of this solution, including, but not limited to, the following:

  • Available-to-promise (ATP) monitor agent: Tracks ATP messages regularly being published to the channels. ATP represents the quantity of products that a retailer can confidently commit to deliver to customers based on current stock and expected supply.
  • Demand verifier agent: Compares the demand across systems.
  • Supply verifier agent: Compares the supply across systems.
  • Decision agent: Decides if supply or demand needs to be updated in any systems.
  • Mismatch resolver agent: This agent updates the discrepancy between the recorded inventory data and the actual physical stock on hand, as required.
  • Orchestrator agent: This meta-agent summons the other agents to connect to various systems, such as order management systems and the inventory visibility systems, and compares the values for the given stock keeping units.

In this architecture, task-level agents such as the demand verifier agent or decision agent activate the application programming interfaces (APIs), or connect to the COTS database, to fetch the required details, with the orchestrator agent acting as the cognitive center.
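The orchestrator-and-task-agent pattern can be sketched as follows. This is an illustration, not the client implementation: the agent classes mirror the roles listed above, but the stubbed return values stand in for real API and database calls:

```python
class Agent:
    """Base class for task-level agents; run() returns facts about one SKU."""

    def run(self, sku: str) -> dict:
        raise NotImplementedError


class DemandVerifier(Agent):
    def run(self, sku: str) -> dict:
        # In production this would query the order management system's API.
        return {"demand": 100}


class SupplyVerifier(Agent):
    def run(self, sku: str) -> dict:
        # In production this would query the inventory visibility system.
        return {"supply": 95}


class Orchestrator:
    """Cognitive center: summons task agents and compares their results per SKU."""

    def __init__(self, agents):
        self.agents = agents

    def check(self, sku: str) -> dict:
        results = {}
        for agent in self.agents:
            results.update(agent.run(sku))
        # A mismatch flags the SKU for the decision and resolver agents.
        return {"sku": sku, "mismatch": results["demand"] != results["supply"], **results}


orchestrator = Orchestrator([DemandVerifier(), SupplyVerifier()])
report = orchestrator.check("SHOE-123")
```

The orchestrator never touches the underlying systems itself; it only composes the answers its task agents return, which is what keeps the cognitive center swappable.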

When discrepancies occur, the solution identifies which system needs to be corrected based on history and error logs, and then resolves the discrepancies.

For global operations, the inventory architecture should be able to support multiple AI models, tools, and agents across cloud providers. In other words, the architecture should be poly AI and poly cloud. The agent hub (or agent registry) allows other agents to discover and register tools and models from various cloud providers, while the MCP gateway enforces centralized governance and security.

An important part of this implementation is fine-grained access control to secure enterprise data and APIs. Agents are granted only the minimum permission required for their tasks. Role-based agent profiles, such as those listed above, enforce restricted access to memory, tools, and data, ensuring compliance with security policies and reducing the attack surface.
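One way to encode such least-privilege profiles is as immutable data checked before every tool call. The profile contents below are hypothetical examples keyed to the agents described above, not the actual policy used in the implementation:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    """Least-privilege profile: the tools and data scopes one agent may touch."""

    name: str
    allowed_tools: frozenset
    data_scopes: frozenset


# Hypothetical profiles: verifiers read, only the resolver may write.
PROFILES = {
    "demand-verifier": AgentProfile(
        "demand-verifier", frozenset({"read_orders"}), frozenset({"demand"})
    ),
    "mismatch-resolver": AgentProfile(
        "mismatch-resolver",
        frozenset({"update_inventory"}),
        frozenset({"demand", "supply"}),
    ),
}


def authorize(agent: str, tool: str, scope: str) -> bool:
    """Allow a call only if the agent's profile grants both the tool and the scope."""
    profile = PROFILES.get(agent)
    return bool(profile) and tool in profile.allowed_tools and scope in profile.data_scopes


ok = authorize("demand-verifier", "read_orders", "demand")
denied = authorize("demand-verifier", "update_inventory", "supply")
```

Keeping profiles declarative like this means the attack surface can be audited by reading one table, rather than tracing credentials through every agent’s code.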

The solution can be built in stages. In this case, the team started with a monolithic architecture and then, through iterative learning, transitioned to a modular, distributed design that significantly improved flexibility and resilience. This evolution demonstrates the importance of modularity, event-driven design, and seamless integration across components to support scalable, maintainable agentic systems.

How to drive agentic AI integration at scale

Although this poly cloud, poly AI blueprint is essential for becoming truly AI-native, it isn’t everything. Securing the requisite talent is a foremost consideration in any agentic AI change management strategy. Organizations shouldn’t just hire more data scientists and AI experts; they should treat this new technology as one that will redefine current roles, requiring training in the continuous, adaptive oversight of agentic AI systems.

Indeed, it is unclear as of August 2025 how much human-in-the-loop involvement will be needed in at-scale deployments. What is clear is that these autonomous digital laborers will need at least some governance and feedback, not least because of the vulnerabilities already mentioned. Re-engineered business processes will need to cater for this.

Some guiding principles for building scalable, resilient agentic AI applications include:

  • Create a modular agent core: Decompose an agent’s functionality into discrete, self-contained components, enabling developers to upgrade or replace individual components without affecting the entire system, improving system stability and explainability.
  • Ensure event-driven inter-agent and agent-tools communication: Agents should communicate through asynchronous events, which let other work proceed concurrently, rather than through direct function calls. Decoupling communication in this way enhances resilience and scalability and is effective for managing long-running agentic workflows.
  • Enable authentication and authorization for agents: Fine-grained access control is essential to secure enterprise data and APIs. Agents should be granted only the minimum permission required for their tasks. Role-based agent profiles should enforce restricted access to memory, tools, and data, ensuring compliance with security policies and reducing the attack surface.
  • Embed agent guardrails in the platform: Agentic systems must embed responsible AI safeguards. These include guardrail agents for input/output filtering, LLM-based governance, and traceability mechanisms.
  • Ensure observability and traceability: Monitoring agentic behavior is crucial for debugging, optimization, and security audits. Observability enhances explainability, mitigates hallucinations, and strengthens security within the agent runtime environment. Traceability provides transparency into agent decision-making processes, aiding in debugging failures in complex workflows.
  • Optimize costs: Strategies include caching, controlling LLM invocations, identifying agents stuck in loops, and deploying intensive models only when necessary. Monitoring token usage, adaptive scaling, and workload optimization further enhance cost efficiency while maintaining performance.
  • Incorporate centralized governance: Finally, in multicloud environments, centralized governance is essential for control, compliance, and reusability. A unified registry for models, agents, and tools enables scalable deployment while maintaining security and operational standards. Centralized capabilities also support cost tracking, performance monitoring, and continuous improvement, ensuring responsible and reliable AI operations.
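The event-driven principle above can be sketched with an in-process event bus. This is a minimal illustration using Python’s standard asyncio queue, not a production message broker; the agent and event names are hypothetical:

```python
import asyncio


async def demand_verifier(bus: asyncio.Queue, results: list) -> None:
    """Consumes events from the bus; publishers never block waiting on this agent."""
    while True:
        event = await bus.get()
        if event is None:  # shutdown sentinel
            return
        results.append(f"verified demand for {event['sku']}")


async def main() -> list:
    bus: asyncio.Queue = asyncio.Queue()
    results: list = []
    consumer = asyncio.create_task(demand_verifier(bus, results))
    # The orchestrator emits an event and moves on; it does not call
    # the agent directly, so a slow consumer cannot stall the publisher.
    await bus.put({"sku": "SHOE-123"})
    await bus.put(None)
    await consumer
    return results


output = asyncio.run(main())  # ['verified demand for SHOE-123']
```

In a distributed deployment the in-memory queue would be replaced by a durable broker, but the decoupling is the same: agents subscribe to events rather than to each other.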

Agentic AI has the potential to transform core business processes across industries in 2026 — a key theme outlined in the second edition of our Tech Navigator report, launching soon. Using this blueprint and a trusted partner, organizations can quickly move from proof of concept to enterprise-wide, cloud- and AI-agnostic implementations. This is good news, especially in a business environment that sorely needs the higher levels of efficiency, adaptability, and business insight that agentic AI can deliver to those willing to do the hard work now.
