Autonomous Telco Operations powered by NVIDIA

The telecom industry is facing a dual challenge: delivering a better customer experience through its network while upgrading to newer technologies, such as 5G and Edge Computing. The expectation from telecommunications providers to provide seamless digital experiences to their customers is increasing daily, and network operations play a significant role in fulfilling this expectation. However, the cost of operating the network is also growing. According to many studies, a typical Telco’s network operating cost is 15-30% of its total cost. Therefore, telecommunications providers must find a way to deliver better user experiences while operating more efficiently. This is where the concept of Autonomous Operations becomes highly relevant.

TMForum (IG1252) has proposed a comprehensive Autonomous Network (AN) Level Evaluation framework, which helps telecommunications providers to assess their current AN maturity and plan for transformation. The framework identifies key capabilities, including awareness, analysis, decision-making, execution, and intent interpretation, that are required to achieve higher autonomy levels.

TMForum IG 1252 Autonomous Networks Level Evaluation Methodology

Most telecommunications providers exhibit a maturity level between 1.5 and 2 today and aspire to reach a level between 3.5 and 4. Telecommunications providers have focused on network operations first for autonomy, as it promises a higher ROI.

Telco’s challenges hinder network autonomy progress

However, Telcos face many challenges that must be resolved before moving to Autonomous Operations Level 3 and above.

1. Lack of end-to-end correlation
Modern networks are complex, heterogeneous and growing fast. Each network domain (core, wireless access, fixed access, transport, and data center) has distinct operational characteristics due to the specific technologies, protocols, devices, vendors, services offered, and deployment strategies. This distinct boundary makes it challenging to build end-to-end correlation capability by bringing together data, systems and insights across domains. Hence, cross-domain correlation is a key success factor for achieving autonomous operations.

2. Reactive fault management
A lack of unified access to different types of network data (alarms, telemetry data, logs, topology and so on) hinders the scope and effectiveness of predictive AI models and automation being used. Typically, as seen at most Telcos, predictive assurance models are deployed minimally on production networks.

3. Low degree of automation
Most Telcos lack a systematic and uniform approach to building network automation capabilities. NOC engineers usually build automation scripts based on their knowledge and experience. This approach does not scale well and limits adoption.

4. Lack of closed-loop assurance
The ability to automatically take remedial action in response to faults or abnormal operational states is a crucial factor for autonomous operations. However, telcos lack a flexible and configurable automation platform that can help build self-healing flows. Issue resolution is heavily dependent on the knowledge and experience of NOC engineers.


Infosys Smart Network Assurance – Advanced AIOps solution for Autonomous Network Operations

Infosys Smart Network Assurance (ISNA) is an advanced, cloud-native network AIOps solution that combines Agentic AI techniques with supervised and unsupervised AI/ML models to propel Telcos and other enterprises towards building autonomous network operations.

ISNA is designed as a flexible AIOps platform intended to sit alongside the network/ OSS landscape of Telcos and address the gaps and challenges without disrupting their fundamental operating model. The core design principles of ISNA have emerged from the learnings and best practices gathered through Infosys’s extensive industry experience with various Telcos worldwide. ISNA has many prebuilt features that help accelerate a Telco’s journey towards an autonomous network operations maturity level of 4.

Key highlights of ISNA are:

Icon 1

Observability

  • Single pane of glass visualization
  • Configurable integration adapters for alarms, metrics, logs ingestion
  • Topology visualization
  • Real time dashboards and reports
  • Gen AI Operations Copilot
Icon 2

Diagnose

  • Real-time AI/ML based event correlation
  • RCA identification
  • Agentic AI Driven Operations
  • Metric anomaly detection
  • Dynamic Thresholding
  • MLOps support
Icon 3

Self-Heal

  • Zero/No-Code Automation framework for self heal workflows
  • Closed loop assurance
  • Intuitive drag and drop workflow designer
  • Configurable adapters
  • Seamless third-party integrations

Infosys Smart Network Assurance Features

Role of classical AI/ML, Gen AI and Agentic AI in accelerating the journey towards Autonomous Network level 4 and beyond

  • Classical AI/ML models
    ISNA solution comes prepackaged with several classical AI/ ML models for solving many industry-wide fault and performance management use cases, such as:
    • Pattern mining and fault correlation models facilitate accurate RCA by leveraging topology correlation-derived insights. This is particularly useful for cross-domain scenarios.
    • Performance anomaly detection models help proactively identify abnormal operational behavior at the device or interface levels and accurately project potential service impacts.
    • Dynamic thresholding models are used to observe and act on real-time network insights.
  • Operations Copilot
    ISNA offers an Operations Copilot service that uses AI agents to help network operators in troubleshooting faults and recommending the next best action. The Copilot can automatically detect the network engineer’s context and respond accordingly. Many task-focused agents are available out of the box with the solution. The respective agents for the task are allocated automatically by the Copilot supervisor agent.
  • Agentic AI for specialized assurance functions
    ISNA offers a configuration-driven Agentic AI framework for developing, managing, and rolling out specialized AI agents that perform specific functions to realize network assurance use cases. ISNA enables seamless configuration and building of multi-agent flows, helping Telcos design and implement various autonomous operations use cases. Some examples include, but are not limited to, site troubleshooting, RCA and resolution, service impact monitoring, remediation and traffic optimization.

How NVIDIA accelerates ISNA

The Agentic AI service of ISNA leverages NVIDIA LLM models Llama 3.1-Nemotron-70B and Mistral Nemo. The integrated tests using the operations Copilot and the agentic AI service yielded very good results. The cognitive correctness and the degree of hallucination detection were consistently good. Given the telco-specific challenges we highlighted at the beginning, the reasoning capability of LLM/SLM is highly critical to ensuring the production readiness of these agents.

Using NVIDIA NIM allows the ISNA solution to benefit from:

  • Fast, Scalable deployment
  • Flexible deployment on any infrastructure—on-prem, cloud, or hybrid—via standard APIs
  • Secure and Reliable operations

Providing the right compute

Dell’s PowerEdge XE8640 server stands at the heart of this use case, delivering unmatched reliability, performance, and scalability essential for autonomous telecom operations. With its robust architecture, the PowerEdge XE8640 is meticulously designed to handle demanding workloads, ensuring seamless compatibility with NVIDIA’s NIM to drive high-efficiency AIOps processes. Its cutting-edge cooling technology and dense compute capabilities empower telecoms to achieve peak performance while optimizing energy consumption. Furthermore, Dell’s commitment to Communications Service Providers (CSPs) provides them with a trusted partner, ensuring smooth integration and operation of this next-generation solution. Together, these features enable CSPs to transform their networks with confidence and agility, paving the way for AI automation. 

Infosys Smart Network Assurance integrated with NVIDIA and Dell Technologies

Key Benefits

  • Plug and Play for Telcos – The integrated solution can be deployed as a packaged solution and easily integrated with the assurance stack.
  • Provides benefits from Day 1 – With historic data and machine learning of patterns, the solution can be pre-trained for faults and be efficient from Day 1.
  • Reduced operational cost – The observability and diagnosis features reduce the operational effort in monitoring, fault isolation, root cause analysis and fault remediation activities, optimizing the operations cost by 30-40%.
  • Reduced MTTR – Automated RCA and advanced Co-pilot features reduce diagnosis (L2) time, resulting in a 30% reduction in MTTR.
  • Flexibility for ecosystem integration – The agentic AI architecture allows easy integration of specialized agents from network OEMs or custom models.

Conclusion

As Telcos embark on the journey towards autonomous networks, it is important to define the roadmap judiciously. Maturity assessment and gap analysis are important preliminary stages in defining the target architecture. Identifying the right solution levers that will help incrementally evolve to the target solution is critical for the Telco to realize faster ROI. A collaborative ecosystem comprising multiple partners is vital in this context. In this blog, we have examined a successful collaboration model involving Infosys Smart Network Assurance (ISNA), NVIDIA NIM, and Dell PowerEdge XE8640, which delivers a Next-Generation AI-OPS platform for Telco networks.

Authors

Sreekanth Sasidharan
Associate Vice President & Unit Technology Officer – Network, Communication, Media Technology Engineering, Infosys

Nikhil Mohan
Principal Technology Architect, Engineering Services, Infosys

Contributors

Randip Sinha
Vice President, Group Manager - Client Services at Communications, Media & Technology, Infosys