Article

Unlocking the AI-first enterprise: A generative AI blueprint

By Kaushal Desai, Prasanna Gawade, Amit Kumar Gupta, Harry Keir Hughes

06 Dec, 2023
9 min read

Insights

Firms should embrace an enterprise-centric, platform-based approach to generative AI, moving away from use-case centric and siloed approaches.
The platform must endure changing AI frameworks, a fast-moving product and vendor landscape, and evolving industry regulations. Most important, it should factor in evolving business needs to embrace AI.
The platform should encourage participation among enterprise stakeholders, including employees, contractors, AI vendors, and the open-source community.
While individual platform components may have certain specifications and associated constraints, the platform should not hinder the organization’s AI-first journey.
The reference architecture detailed here enables organizations to build their own enterprise AI platform and onboard generative AI quickly and at scale.

Our initial viewpoint on enterprise AI development covered a generic AI platform architecture, designed to democratize AI’s capabilities at scale.

This predates the recent breakthrough in generative AI, a transformation technology set to impact many industries.

Our ongoing research tracks enterprise adoption of generative AI-powered tools and services. Firms that deploy this technology effectively will become more innovative, unlock efficiencies at scale, grow faster, and build a connected ecosystem.

Firms should embrace an enterprise-centric, platform-based approach to generative AI, moving away from use-case centric and siloed approaches. A platform approach that abstracts the underlying engineering complexity will help firms develop and rollout generative AI tools and services through agile techniques.

Though AI development and deployment technologies remain largely the same, generative AI-related challenges necessitate a new reference platform architecture.

The significance of an AI platform

Our first paper showed a reference platform architecture to future proof and democratize AI adoption through modular, holistic, and agile AI development.

The updated platform architecture introduced here sets out essential guidelines for firms adopting large language models (LLMs) beyond a use-case-only approach.

Organizations can tailor these principles to suit their specific use cases.

One key plank here is layering, where each component of the platform is built independently and then layered into a comprehensive framework.

Nonetheless, technology is only part of the equation. Business considerations must play a pivotal role during platform development. In such a collaborative environment, AI will evolve and inspire the whole organization to become more productive, innovative, and customer centric.

Why the need for a generative AI-first experience?

Traditional AI models focus on pattern recognition and provide analytical insights within horizon 1 (H1) and H2. Generative AI models are way ahead in what we term as H3, with a unique capability to generate output including words, code, images and more, using the knowledge from their training data.

Generative AI automates mundane tasks such as code completion and document summary, which enables individuals to focus on more meaningful work and become more productive.

For example, a Kubernetes assistant autonomously checks the cluster health and then provides real-time analysis. This frees up the support engineer to tackle other tasks, such as managing a code defect using another generative AI code assistant.

Generative AI’s other enterprise applications:

Content creation: AI models, especially LLMs such as ChatGPT, generate a wide range of content, including written text, voice, images, video, and even code.
Prompt engineering to drive tasks: Done well, this simplifies machine learning (ML) tasks. Users achieve various ML applications by giving just a few instructions to a single LLM, eliminating the need for use-case specific/task-centric fine tuning.
Combining multiple tools to achieve the end goal: LLMs perform a range of tasks with multistep goals, such as finding Kubernetes cluster issues with just an English language prompt, and a new category of applications called agent frameworks does the rest. This technique has wide-ranging applications like automation in ITOps and multimodal chatbots.
Fine tuning for the business domain: Advances in fine tuning, such as reduction of trainable parameters for downstream tasks, adapter-based fine tuning, feedback-based fine tuning using reinforcement learning with human feedback (RLHF), and supervised fine tuning (SFT), enable customization of large models for organizational data and experiences.

Distinguishing use-case centric and enterprise-level AI

An isolated use-case centric AI approach diverges significantly from an enterprise-level AI roadmap. In the former, solution architects mainly focus on handling projected transaction volumes, response time, data security, and availability. They also grapple with the challenge of seamlessly integrating AI capabilities into the organization’s wider ecosystem. As a result, architectural decisions are local to use cases and influenced by the specific technology resources available at the business unit level.

Generative AI demands an enterprise-level perspective rather than remaining isolated within business units.

Key considerations include:

Generative and autoregressive models are evolving fast. For instance, Hugging Face, an open-source generative AI provider, hosts more than 100,000 models, many with nonrestrictive licenses. Predicting growth is difficult due to the technology’s nonlinear and disruptive nature.
A thriving ecosystem supports these models, with autonomous agents, prompt engineering frameworks, and long- and short-term memory stores for application and agent context management.
Firms need to address challenges around data safety and IP protection.
LLM management requires specialized hardware, cost optimization, and specific computing resources.
Generative AI models hallucinate (i.e., make things up) and pose risks such as prompt poisoning, privacy issues, and safety concerns. Firms should stay vigilant to manage and monitor these issues.
Finally, the challenge lies in moving from fragmented use-case centric approaches to scalable, multiuse case enterprise solutions.

Platform-based approach is the solution

These challenges require an enterprise platform-based approach, offering common infrastructure, services, processes, and governance to manage multiple AI use cases across the organization. A platform approach ensures swift availability of generative AI tools.

Figure 1 lists key principles to build an enterprise platform for diverse AI use cases. These principles are prioritized and updated based on each organization’s unique business context.

Figure 1. Key architecture principles to develop an enterprise AI platform

Figure 1. Key architecture principles to develop an enterprise AI platform

Source: Infosys

Futuristic

The platform must endure changing AI frameworks, a fast-moving product and vendor landscape, and evolving industry regulations. Most important, it should factor in evolving business needs to embrace AI (Table 1).

Table 1. Architecture principles for a futuristic platform

	Principle	What does this mean?	Why is this important?
1	Layered architecture	Includes distinct, loosely coupled layers with well-defined responsibilities. Incorporates definite boundary objects to separate one layer from another, with targeted responsibility aligned to organizational roles.	Future changes have a localized impact within a single layer. Avoids vendor lock-in.
2	Data readiness	Provides easy access to data tools for experimenting with both structured and unstructured data. Supports data lineage, provenance, versioning, quality, and dataset catalogue for training and validation.	Ability to explore and use data from enterprise repository. Provides data observability using standard sampling and snapshot-level monitoring using standard metrics for recency, quality, and distribution.
3	Inflow AI	Ability to build AI experiences in app experience. Supports capturing implicit and explicit feedback.	Unified experiences for various roles and corresponding tools increase adoption and productivity. Ability to monitor the performance of models and use the feedback for further experience refinement.

Democratization

The platform should encourage participation among enterprise stakeholders, including employees, contractors, AI vendors, and the open-source community. This approach promotes healthy competition among providers and ease of use to consumers (Table 2).

Table 2. Architecture principles for platform democratization

	Principle	What does this mean?	Why is this important?
1	Unified visibility	Publishes metrics about AI components and use cases, including usage, performance, and experience monitoring. Publishes interfaces between consumers and providers.	Provides visibility to the business community about enterprise AI adoption and maturity. Creates a level playing field among small and large providers.
2	Self-service	Breaks siloes of high-end skills between different roles, including data engineers, ML engineers, and data scientists. Provides specialized skill-based practices for specific functions regardless of organizational boundaries.	Enables individuals to contribute to, and consume, AI content without barriers. Provides an inclusive AI journey.
3	Crowd sourcing	Provides tools and incentives to different roles for dataset preparation, agent frameworks, and model fine-tuning development. Hosts hackathons and competitions	Enables the firm to source diverse AI assets through community engagement Provides an inclusive AI journey

Scalability

While individual platform components may have certain specifications and associated constraints, the platform should not hinder the organization’s AI-first journey (Table 3).

Table 3. Architecture principles for platform scalability

	Principle	What does this mean?	Why is this important?
1	Cloud native	Provide a microservices-based architecture, for independent deployment and scalability. Containerized for higher efficiency, speed, and scalable performance.	Provides inherent scalability of technology components. Enables service composability to orchestrate new capabilities and applications
2	Rapid adoption	Provides rapid experimentation with new frameworks and models, supported by standard frameworks for validation and deployment. Supports experimentation and proof-of-concepts using challenger technology. Incorporates new capabilities easily.	Enables the onboarding of new AI technology and components anytime.
3	Self-governed	Provides policy-based orchestration of models and infrastructure. Includes responsible AI actions like debiasing	Optimizes AI components and infrastructure at an enterprise scale. Eases the implementation of AI governance at scale.

Responsible by Design and Poly AI

Responsible by Design – safety, bias, security, explainability and privacy should be adhered to across the lifecycle of AI development and deployment. This builds trust in data, model, and process, ensuring regulatory and legal compliance.

Poly AI – this ensures various tooling and processes are transparent, measured, and monitored homogenously across multiple hyperscalers. Additionally, Poly AI provides flexibility to use various AI hardware across different vendors for different purposes.

The architecture principles should be:

Repeatable: Enables automated reproduction of experiments and generative AI models for similar outcomes.
Secured: Ensures data privacy and security beyond traditional vectors, with guardrails for fine-tuning and model usage.
Monitored and measured: Implements mechanisms to automate and capture model predictions for business and technical users. This extends to different versions of each generative AI model.

The platform should be futuristic, democratized, and scalable, while also providing repeatability, security, and governance.

Core components of the generative AI architecture

This reference architecture enables organizations to build their own enterprise AI platform.

Figure 2 provides an overview of the platform’s capabilities and benefits, utilizing the reference architecture in our earlier paper. Figure 3 is a detailed reference architecture approach extended for generative AI adoption. The basic MLOps principles remain same for Figure 2 (H1 & H2) and Figure 3 (H3).

Figure 2. Main capabilities and benefits of the enterprise AI platform

Figure 2. Main capabilities and benefits of the enterprise AI platform

Source: Infosys

Figure 3. Detailed reference architecture for enterprise AI platform

Figure 3. Detailed reference architecture for enterprise AI platform

Source: Infosys

The Figure 3 reference architecture consists of five loosely coupled layers, each with a unique purpose and goal.

Cognitive API layer defines business-relevant services and exposes AI models through well-defined APIs, facilitating user interaction.
Autonomous agent apps layer onboards application frameworks (like agent and chain-based), leveraging LLM’s ability of using tools and APIs. Semantic memory and semantic tools are two key emerging patterns that can be implemented here:
- Semantic memory — autonomous agents, as well as applications built for search and “retrieval-based embeddings”, require both short- and long-term memory storage. Agents require memory for interaction scenarios, while retrieval tasks require enterprise knowledge extraction.
- Semantic tools — Generative AI agents can use enterprise APIs and tools to perform tasks autonomously. But they must be provided with the right interfaces and prompts. Each API tool may have different security mechanisms and functional interfaces. Prompts must be configured and tested for each tool.
Guardrails and command center layer provides visibility and operational optimization for generative AI models across the business. The firm can monitor metrics such as token usage and response time, along with AI-specific metrics such as accuracy and drift through user feedback monitoring. This layer also flags privacy and safety issues from AI-generated content.
Poly AI/MLOps/AI engineering lifecycle management layer enables data scientists to fine tune and train and deploy their models at enterprise scale without dealing with underlying engineering complexity. It also enables the firm to standardize the AI lifecycle, from training and tuning to experiment tracking, model validation, deployment, and monitoring.
- Prompt engine enables LLM adoption and ensures a common prompt database that can be tailored to specific LLMs. The engine also guards against prompt poisoning through prompt monitoring techniques.
AI infra and polycloud management layer manages the development, fine tuning, and inferencing clusters for AI models, on-premises and in the cloud. It covers the complexity of onboarding compute and storage from private and public cloud and provides the best-fit cloud for the ongoing task.

The importance of layering

Each layer in the reference architecture functions as an independent application, with distinct user personas, interfaces, technology, services, and deployment. Key considerations include:

Each layer maintains a clear boundary interface. For instance, the Poly AI MLOps layer can leverage Kubernetes for orchestration, job management, and training frameworks. It offers cloud-agnostic deployment, simplifying model deployment for data science teams and supporting various model architectures.
Each layer operates independently with its own business case and feature roadmap. For example, Infosys first builds the Poly AI MLOps layer to implement MLOps processes and provide a model repository, and then the Guardrails and Command Center layer to monitor models.
Each layer is implemented using the preferred technology vendor, either through build or buy. This flexibility ensures organizations can choose the best-fit solutions for each layer without vendor lock-in.
While every organization can use an open-source framework or third-party product to implement a specific layer, they should ensure that the enterprise AI platform is not locked into any underlying product or framework. For example, the Poly AI/MLOps layer enables consistent interfaces for various MLOps services and can mix and match different capabilities across on-premises and cloud providers.

In short, organizations should treat every layer separately and choose the best option independently for each. This enables iterative development of each layer over time.

Looking ahead

To accelerate its AI-first journey, Infosys is developing a vendor-agnostic generative AI reference architecture platform, emphasizing collaboration between business and IT in development and procurement decisions.

This year’s Tech Navigator report, The AI-first Organization, details four AI pillars to go AI-first – AI experience, engineering, governance, talent, and operating model.

The generative AI reference architecture platform is one piece in the larger business strategy and shifts firms away from a use-case approach to a plug and play, constantly learning and evolving live enterprise.

AI will be the linchpin for the next big technological revolution; going AI-first through such a platform-approach will ensure all business functions are well connected and agile to foster new growth when the next major wave of innovation arrives.

Authors

Kaushal Desai, Prasanna Gawade, Amit Kumar Gupta, Harry Keir Hughes