Unlocking the AI-first enterprise: A generative AI blueprint
- Firms should embrace an enterprise-centric, platform-based approach to generative AI, moving away from use-case centric and siloed approaches.
- The platform must endure changing AI frameworks, a fast-moving product and vendor landscape, and evolving industry regulations. Most important, it should factor in evolving business needs to embrace AI.
- The platform should encourage participation among enterprise stakeholders, including employees, contractors, AI vendors, and the open-source community.
- While individual platform components may have certain specifications and associated constraints, the platform should not hinder the organization’s AI-first journey.
- The reference architecture detailed here enables organizations to build their own enterprise AI platform and onboard generative AI quickly and at scale.
Our initial viewpoint on enterprise AI development covered a generic AI platform architecture, designed to democratize AI’s capabilities at scale.
This predates the recent breakthrough in generative AI, a transformation technology set to impact many industries.
Our ongoing research tracks enterprise adoption of generative AI-powered tools and services. Firms that deploy this technology effectively will become more innovative, unlock efficiencies at scale, grow faster, and build a connected ecosystem.
Firms should embrace an enterprise-centric, platform-based approach to generative AI, moving away from use-case centric and siloed approaches. A platform approach that abstracts the underlying engineering complexity will help firms develop and rollout generative AI tools and services through agile techniques.
Though AI development and deployment technologies remain largely the same, generative AI-related challenges necessitate a new reference platform architecture.
The significance of an AI platform
Our first paper showed a reference platform architecture to future proof and democratize AI adoption through modular, holistic, and agile AI development.
The updated platform architecture introduced here sets out essential guidelines for firms adopting large language models (LLMs) beyond a use-case-only approach.
Organizations can tailor these principles to suit their specific use cases.
One key plank here is layering, where each component of the platform is built independently and then layered into a comprehensive framework.
Nonetheless, technology is only part of the equation. Business considerations must play a pivotal role during platform development. In such a collaborative environment, AI will evolve and inspire the whole organization to become more productive, innovative, and customer centric.
Why the need for a generative AI-first experience?
Traditional AI models focus on pattern recognition and provide analytical insights within horizon 1 (H1) and H2. Generative AI models are way ahead in what we term as H3, with a unique capability to generate output including words, code, images and more, using the knowledge from their training data.
Generative AI automates mundane tasks such as code completion and document summary, which enables individuals to focus on more meaningful work and become more productive.
For example, a Kubernetes assistant autonomously checks the cluster health and then provides real-time analysis. This frees up the support engineer to tackle other tasks, such as managing a code defect using another generative AI code assistant.
Generative AI’s other enterprise applications:
- Content creation: AI models, especially LLMs such as ChatGPT, generate a wide range of content, including written text, voice, images, video, and even code.
- Prompt engineering to drive tasks: Done well, this simplifies machine learning (ML) tasks. Users achieve various ML applications by giving just a few instructions to a single LLM, eliminating the need for use-case specific/task-centric fine tuning.
- Combining multiple tools to achieve the end goal: LLMs perform a range of tasks with multistep goals, such as finding Kubernetes cluster issues with just an English language prompt, and a new category of applications called agent frameworks does the rest. This technique has wide-ranging applications like automation in ITOps and multimodal chatbots.
- Fine tuning for the business domain: Advances in fine tuning, such as reduction of trainable parameters for downstream tasks, adapter-based fine tuning, feedback-based fine tuning using reinforcement learning with human feedback (RLHF), and supervised fine tuning (SFT), enable customization of large models for organizational data and experiences.
Distinguishing use-case centric and enterprise-level AI
An isolated use-case centric AI approach diverges significantly from an enterprise-level AI roadmap. In the former, solution architects mainly focus on handling projected transaction volumes, response time, data security, and availability. They also grapple with the challenge of seamlessly integrating AI capabilities into the organization’s wider ecosystem. As a result, architectural decisions are local to use cases and influenced by the specific technology resources available at the business unit level.
Generative AI demands an enterprise-level perspective rather than remaining isolated within business units.
Key considerations include:
- Generative and autoregressive models are evolving fast. For instance, Hugging Face, an open-source generative AI provider, hosts more than 100,000 models, many with nonrestrictive licenses. Predicting growth is difficult due to the technology’s nonlinear and disruptive nature.
- A thriving ecosystem supports these models, with autonomous agents, prompt engineering frameworks, and long- and short-term memory stores for application and agent context management.
- Firms need to address challenges around data safety and IP protection.
- LLM management requires specialized hardware, cost optimization, and specific computing resources.
- Generative AI models hallucinate (i.e., make things up) and pose risks such as prompt poisoning, privacy issues, and safety concerns. Firms should stay vigilant to manage and monitor these issues.
- Finally, the challenge lies in moving from fragmented use-case centric approaches to scalable, multiuse case enterprise solutions.
Platform-based approach is the solution
These challenges require an enterprise platform-based approach, offering common infrastructure, services, processes, and governance to manage multiple AI use cases across the organization. A platform approach ensures swift availability of generative AI tools.
Figure 1 lists key principles to build an enterprise platform for diverse AI use cases. These principles are prioritized and updated based on each organization’s unique business context.
Figure 1. Key architecture principles to develop an enterprise AI platform
The platform must endure changing AI frameworks, a fast-moving product and vendor landscape, and evolving industry regulations. Most important, it should factor in evolving business needs to embrace AI (Table 1).
Table 1. Architecture principles for a futuristic platform
|What does this mean?
|Why is this important?
The platform should encourage participation among enterprise stakeholders, including employees, contractors, AI vendors, and the open-source community. This approach promotes healthy competition among providers and ease of use to consumers (Table 2).
Table 2. Architecture principles for platform democratization
|What does this mean?
|Why is this important?
While individual platform components may have certain specifications and associated constraints, the platform should not hinder the organization’s AI-first journey (Table 3).
Table 3. Architecture principles for platform scalability
|What does this mean?
|Why is this important?
Responsible by Design and Poly AI
Responsible by Design – safety, bias, security, explainability and privacy should be adhered to across the lifecycle of AI development and deployment. This builds trust in data, model, and process, ensuring regulatory and legal compliance.
Poly AI – this ensures various tooling and processes are transparent, measured, and monitored homogenously across multiple hyperscalers. Additionally, Poly AI provides flexibility to use various AI hardware across different vendors for different purposes.
The architecture principles should be:
- Repeatable: Enables automated reproduction of experiments and generative AI models for similar outcomes.
- Secured: Ensures data privacy and security beyond traditional vectors, with guardrails for fine-tuning and model usage.
- Monitored and measured: Implements mechanisms to automate and capture model predictions for business and technical users. This extends to different versions of each generative AI model.
The platform should be futuristic, democratized, and scalable, while also providing repeatability, security, and governance.
Core components of the generative AI architecture
This reference architecture enables organizations to build their own enterprise AI platform.
Figure 2 provides an overview of the platform’s capabilities and benefits, utilizing the reference architecture in our earlier paper. Figure 3 is a detailed reference architecture approach extended for generative AI adoption. The basic MLOps principles remain same for Figure 2 (H1 & H2) and Figure 3 (H3).
Figure 2. Main capabilities and benefits of the enterprise AI platform
Figure 3. Detailed reference architecture for enterprise AI platform
The Figure 3 reference architecture consists of five loosely coupled layers, each with a unique purpose and goal.
- Cognitive API layer defines business-relevant services and exposes AI models through well-defined APIs, facilitating user interaction.
- Autonomous agent apps layer onboards application frameworks (like agent and chain-based), leveraging LLM’s ability of using tools and APIs. Semantic memory and semantic tools are two key emerging patterns that can be implemented here:
- Semantic memory — autonomous agents, as well as applications built for search and “retrieval-based embeddings”, require both short- and long-term memory storage. Agents require memory for interaction scenarios, while retrieval tasks require enterprise knowledge extraction.
- Semantic tools — Generative AI agents can use enterprise APIs and tools to perform tasks autonomously. But they must be provided with the right interfaces and prompts. Each API tool may have different security mechanisms and functional interfaces. Prompts must be configured and tested for each tool.
- Guardrails and command center layer provides visibility and operational optimization for generative AI models across the business. The firm can monitor metrics such as token usage and response time, along with AI-specific metrics such as accuracy and drift through user feedback monitoring. This layer also flags privacy and safety issues from AI-generated content.
- Poly AI/MLOps/AI engineering lifecycle management layer enables data scientists to fine tune and train and deploy their models at enterprise scale without dealing with underlying engineering complexity. It also enables the firm to standardize the AI lifecycle, from training and tuning to experiment tracking, model validation, deployment, and monitoring.
- Prompt engine enables LLM adoption and ensures a common prompt database that can be tailored to specific LLMs. The engine also guards against prompt poisoning through prompt monitoring techniques.
- AI infra and polycloud management layer manages the development, fine tuning, and inferencing clusters for AI models, on-premises and in the cloud. It covers the complexity of onboarding compute and storage from private and public cloud and provides the best-fit cloud for the ongoing task.
The importance of layering
Each layer in the reference architecture functions as an independent application, with distinct user personas, interfaces, technology, services, and deployment. Key considerations include:
- Each layer maintains a clear boundary interface. For instance, the Poly AI MLOps layer can leverage Kubernetes for orchestration, job management, and training frameworks. It offers cloud-agnostic deployment, simplifying model deployment for data science teams and supporting various model architectures.
- Each layer operates independently with its own business case and feature roadmap. For example, Infosys first builds the Poly AI MLOps layer to implement MLOps processes and provide a model repository, and then the Guardrails and Command Center layer to monitor models.
- Each layer is implemented using the preferred technology vendor, either through build or buy. This flexibility ensures organizations can choose the best-fit solutions for each layer without vendor lock-in.
- While every organization can use an open-source framework or third-party product to implement a specific layer, they should ensure that the enterprise AI platform is not locked into any underlying product or framework. For example, the Poly AI/MLOps layer enables consistent interfaces for various MLOps services and can mix and match different capabilities across on-premises and cloud providers.
In short, organizations should treat every layer separately and choose the best option independently for each. This enables iterative development of each layer over time.
To accelerate its AI-first journey, Infosys is developing a vendor-agnostic generative AI reference architecture platform, emphasizing collaboration between business and IT in development and procurement decisions.
This year’s Tech Navigator report, The AI-first Organization, details four AI pillars to go AI-first – AI experience, engineering, governance, talent, and operating model.
The generative AI reference architecture platform is one piece in the larger business strategy and shifts firms away from a use-case approach to a plug and play, constantly learning and evolving live enterprise.
AI will be the linchpin for the next big technological revolution; going AI-first through such a platform-approach will ensure all business functions are well connected and agile to foster new growth when the next major wave of innovation arrives.