Smarter, smaller, safer: The case for small language models in financial services

Insights

  • Large language models (LLMs) used by banks suffer from high latency, high infrastructure and operating costs, and limited accuracy, all of which restrict enterprise adoption.
  • Small language models (SLMs) are designed for specific banking tasks and can be more accurate and easier to manage.
  • SLMs cost much less to train and use compared to LLMs, helping banks save millions each year.
  • Banks can train SLMs with their own data, keeping sensitive information safe and meeting privacy laws.
  • SLMs can run on edge devices, delivering the fast responses needed for payments and fraud checks.
  • Banks have full control over SLMs, making it easier to explain decisions and follow regulations.
  • Banks can customize existing SLMs or build their own for even more control.
  • SLMs can be updated with new data so they stay useful as regulations and markets change.

Financial institutions were quick to explore the potential of large language models (LLMs) in areas like fraud detection, risk prediction, and wealth management. These models have proved they can process vast amounts of unstructured data, automate customer interactions, and support complex decision-making. From AI-powered chatbots in retail banking to intelligent document processing in compliance and underwriting, LLMs have demonstrated strong capabilities.

However, as banks and financial institutions move from experimentation to enterprise-scale deployment, several limitations of LLMs have become apparent:

  • Limited domain-specific responses: LLMs require extensive fine-tuning to meet domain-specific needs. Yet even after fine-tuning, accuracy often does not meet enterprise requirements.
  • High latency: Most LLMs are hosted in the cloud, which causes response delays. This high latency hampers real-time, business-critical applications.
  • Lack of data governance and regulatory explainability: Enterprises get limited visibility into the training datasets and face challenges in meeting model explainability requirements for regulatory oversight.
  • High infrastructure and operating costs: LLMs demand significant graphics processing unit (GPU) infrastructure and ongoing operational expenses through token-based per-call application programming interface (API) pricing models.

As agentic AI gains ground in financial institutions, these limitations have become more pronounced, depressing the return on AI investment. Agentic AI refers to a class of AI tools that work autonomously toward specific goals, and these tools require greater accuracy than generic LLMs can deliver. Agentic systems also benefit from a modular setup that uses multiple small models for different tasks rather than one large monolithic model, which improves maintainability and enables dynamic orchestration.

Banks can benefit from agentic AI systems that run on less compute, offer lower latency, and are cheaper to deploy, especially on edge devices or in private environments. The global agentic AI market is expected to grow rapidly, from $7.6 billion in 2025 to $199 billion in 2034. Enabling this growth requires cost-effective, regulatory-compliant reasoning models.

Small language models: A strategic alternative

Small language models (SLMs) deliver higher accuracy for specific tasks, use less compute, offer lower costs, preserve data privacy, demonstrate domain-specific intelligence, and produce explainable outcomes. As a result, they can provide better return on investment than traditional LLMs.

What are SLMs and how are they different from LLMs?

SLMs are compact versions of LLMs, designed to deliver domain-specific reasoning and natural language processing capabilities with much lower computational and memory requirements (Figure 1).

Figure 1: Comparison of LLMs and SLMs

Source: Infosys Knowledge Institute

“Small language models are engineered for enterprise-grade intelligence. Trained on proprietary data, they deliver context-aware insights tailored to business needs. Their ability to operate securely within the firewall ensures trust, while low inferencing costs make them highly scalable. When combined with context engineering, SLMs empower organizations to scale AI responsibly — with precision, privacy, and performance at the core.”

Ashok Panda, Infosys vice president and global head, AI and automation services

Why SLMs are well-suited for financial services

  1. Cost: LLMs are expensive. Anthropic CEO Dario Amodei has said that training costs could rise from $100 million into the tens of billions of dollars. LLMs run on up to trillions of parameters (GPT-4 alone has 1.76 trillion) and demand top-shelf GPUs like Nvidia's A100 or H100.

    Meanwhile, an SLM, usually under 14 billion parameters, costs roughly 50% to 75% less to train and can even run on high-end Xeon servers instead of powerful GPUs.

    Inference, when the trained model delivers results from input data, tells the same story. GPT-4 costs $0.09 per 1,000 tokens, so a bank with 2 million retail customers averaging 100,000 API calls a day would pay about $3.2 million a year for an off-the-shelf LLM. Mistral-7B, a 7-billion-parameter SLM priced at $0.0004 per 1,000 tokens, handles the same volume for less than $15,000 a year (see the back-of-the-envelope calculation after this list).
  2. Data privacy and explainability: Data and trust go hand in hand. LLMs learn from the open web. The financial institution doesn’t control that data or its quality. In contrast, an SLM can be trained on a bank’s own books and records. Every answer traces back to a known source. Nothing leaves the firewall, so banks stay on the right side of regulations such as the EU’s GDPR, the US’ HIPAA, and the EU AI Act.
  3. Low latency: Speed counts. Deploy an SLM on premises or at the edge and latency falls. That's a must for payments, fraud checks, and customer service.
  4. On-premises deployable: Lightweight SLMs with fewer than 14 billion parameters can be easily deployed on premises. At consistent GPU utilization above 60% to 70%, on-premises SLMs can save between 30% and 50% over cloud-hosted LLMs across three years.
  5. Higher accuracy: Financial services hinge on precise language. Errors in contracts, transactions, or risk models cause losses and regulatory penalties. SLMs trained on domain data handle financial terms such as “open-to-buy” (a retailer’s available budget for new inventory) and “NAV per unit” (the per unit value of a fund’s assets after liabilities) more accurately than general LLMs. This higher precision reduces fines, audits, and reputational damage.
  6. Sovereignty: SLM sovereignty lets a bank fully control, govern, and operate its models without third-party clouds or APIs. On-premises builds keep data in-house, enable full audits, and ensure compliance with federal laws. Data localization has also become a key part of regulatory environments in each country, and in the future regulators could require AI models to be hosted within national boundaries.
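
To put the cost comparison above in concrete terms, the short sketch below reproduces the arithmetic, assuming roughly 1,000 tokens per API call. That token count is an illustrative assumption rather than a figure from the example; actual volumes vary by use case.

```python
# Back-of-the-envelope annual inference cost, assuming ~1,000 tokens per call
# (an illustrative assumption; real token counts vary by use case).
CALLS_PER_DAY = 100_000
TOKENS_PER_CALL = 1_000
DAYS_PER_YEAR = 365

def annual_cost(price_per_1k_tokens: float) -> float:
    """Annual inference spend for a given per-1,000-token price."""
    daily_cost = CALLS_PER_DAY * (TOKENS_PER_CALL / 1_000) * price_per_1k_tokens
    return daily_cost * DAYS_PER_YEAR

print(f"Off-the-shelf LLM at $0.09/1K tokens:  ${annual_cost(0.09):,.0f} per year")
print(f"Mistral-7B SLM at $0.0004/1K tokens:   ${annual_cost(0.0004):,.0f} per year")
# Prints on the order of $3 million a year versus under $15,000 a year.
```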

Potential use cases for SLMs in financial services

Financial institutions hold vast, domain-rich datasets thanks to their customer-centric operating model. JP Morgan, for instance, generates about 12 to 27 terabytes of new information each day — transaction logs, compliance and payment data, customer-service transcripts, and even social-media signals. Training an SLM on this proprietary data keeps capital and operating costs in check while delivering high-accuracy reasoning on premises.

Consider payments. A payments processor such as Stripe could build a payments-specific SLM using millions of historical transactions, legacy business requirements, and years of compliance documentation. When deployed in-house, the model can sharpen fraud detection, optimize routing, automate back-office tasks, and push toward fully straight-through processing.

The mortgage market offers similar opportunities. A mid-size lender that handles 10,000 loans a month could train a mortgage-centric SLM to elevate every step of the value chain — lead generation, loan origination, underwriting, servicing, and closure.

Contract management is another fertile ground. One large American bank processes roughly 10,000 customer contracts a month. A contract-analysis SLM could streamline drafting, verify legal clauses, flag risks, run compliance checks, summarize terms, and automate monitoring through to close.

An example of a banking SLM is the Infosys Topaz Banking SLM. It is trained on general banking knowledge, Infosys proprietary (Finacle) documentation, and Banking Industry Architecture Network (BIAN) standards. The Banking SLM can be used as a service, including pretraining as a service and fine-tuning as a service. This enables banks to securely build their own custom AI models that comply with industry standards.

SLMs can serve either a financial institution's line of business or a horizontal function across all business lines:

Figure 2: Examples of horizontal and vertical SLMs

Source: Infosys Knowledge Institute

How to build an SLM

There are two ways financial institutions can approach building an SLM: fine-tune a pretrained SLM such as Phi-3 (Microsoft), Gemma (Google), Mistral 7B, or Llama (Meta) on a domain-specific dataset, or build an SLM from scratch by training a transformer architecture model on a domain-specific dataset.

  1. Fine-tuning a pretrained SLM: This is the process of taking a foundational pretrained model and customizing it through targeted training on a smaller, domain- or task-specific dataset to optimize performance for specialized financial applications. This leverages the base model's general knowledge while adapting its behavior to a financial institution's specific needs. Here are the steps to fine-tune an SLM:
    1. Define the objective and prioritize key use cases: Establish clear value propositions by identifying high-impact, domain-specific applications that drive measurable business outcomes. For example, an SLM for the cards business could enable agentic AI use cases across the customer life cycle: originations decisioning; real-time authorization; fraud detection, prevention, and processing; accounts receivable automation; collections workflows; and dispute resolution.
    2. Data collection, curation, and tokenization: Create a high-quality, cards business-specific dataset that includes historical business requirement documentation, functional and technical design specifications, compliance and regulatory manuals, and labeled transactions. Execute data quality assurance protocols to eliminate redundancies, standardize nomenclature, and ensure data integrity across all sources. Transform the data into a structured format the model can understand, then tokenize it into the numerical tokens the model consumes, which also helps preserve data privacy.
    3. Model selection and parameter setup: Select a pretrained base model that fits the bank's needs. In our experience, open-source models from the Phi-3, Llama, or Mistral families are suitable for a cards business SLM. These models are small, easy to customize, and fast, saving banks money and time; they also give banks full control over deployment and avoid reliance on third-party APIs. Modify the model's parameters (weights and biases) using Parameter-Efficient Fine-Tuning (PEFT) techniques such as LoRA (Low-Rank Adaptation), which freezes the original model weights and injects a small number of new, trainable parameters, reducing the memory and compute required for training.
    4. Train the model: Use a deep learning framework such as PyTorch or TensorFlow to run the training loop so the model learns from the data. These frameworks offer fast, efficient training with built-in tooling and LoRA support, saving time and cost while ensuring scalability. Tokenized data is fed into the model in batches of a predetermined size; for each batch, the model's output is compared with the expected output from the dataset, and backpropagation adjusts the trainable parameters to reduce the error. (A hedged code sketch of this setup appears after this list.)
    5. Evaluate and deploy: Use a golden test dataset (one curated and validated by humans) to evaluate the model's output against the bank's predefined metrics, such as the F1-score, a measure of classification performance that combines precision and recall. Retrain the model until performance meets the desired level for the target use case. Deploy the cards business SLM on an edge device or a higher-grade CPU such as an Intel Xeon or Core i7/i9 processor.
    6. Build AI applications using the SLM: The cards business SLM can be leveraged for all AI use cases involving cards, including reasoning, summarization, and decisioning tasks. Agentic AI use cases in card originations, fraud, and collections can deliver more accurate outcomes thanks to this domain-specific training.
  2. Building an SLM from scratch: Unlike fine-tuning an existing SLM, building a proprietary one from scratch gives the business governance oversight and control over data sourcing, explainability, parameter tuning, and model performance. This approach ensures full IP ownership, regulatory compliance, and alignment with specific business objectives while maintaining transparency.

    The most critical step is to choose the right transformer architecture.


    A transformer is a type of deep learning model designed for language processing. Banks can choose a decoder-only architecture, such as GPT, when the model needs to generate text like responses or summaries. For tasks that involve understanding and analyzing text, such as classifying transactions or detecting fraud, encoder-only models like BERT are more suitable. After selecting the architecture, the model is trained from scratch on the bank's own curated, domain-specific data so it learns the specific language and processes of the domain.

    After the transformer architecture has been selected, the remaining steps are the same as those set out above for fine-tuning an SLM. However, the training effort and hardware requirements for building an SLM from scratch are likely to be higher.
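
To make steps 2 to 4 of the fine-tuning path more concrete, here is a minimal sketch using the Hugging Face transformers, peft, and datasets libraries. The base model, dataset file, and hyperparameters are illustrative assumptions rather than recommendations; a production pipeline would add validation splits, checkpointing, and governance controls.

```python
# Minimal LoRA fine-tuning sketch (illustrative; not a production pipeline).
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

BASE_MODEL = "mistralai/Mistral-7B-v0.1"   # assumed open-source base model

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token  # this model has no pad token by default
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# LoRA freezes the original weights and trains a small set of low-rank adapters.
lora_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()         # typically well under 1% of all weights

# Hypothetical curated cards-business corpus, one JSON object per line: {"text": ...}
dataset = load_dataset("json", data_files="cards_corpus.jsonl")["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="cards-slm-lora",
                           per_device_train_batch_size=4,
                           num_train_epochs=1,
                           learning_rate=2e-4,
                           logging_steps=50),
    train_dataset=dataset,
    # Causal-LM collator copies the input tokens as labels for next-token prediction.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("cards-slm-lora")        # saves the trained adapter weights
```

Evaluation against a golden test set (step 5) and deployment on the chosen hardware then follow, with the saved adapter weights reloaded in the target environment.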

Enterprises can choose to fine-tune an SLM when they are constrained by time, budget, or computational resources and have fewer data explainability concerns. But for financial institutions and banks, building a domain-specific SLM from scratch is often the best approach because:

  • They have the right data: Banks routinely collect massive amounts of customer and transaction data that's perfect for training specialized models.
  • Regulations are strict: Financial institutions face tough requirements around data transparency, explainability, and proving where their data comes from. Building from scratch gives them control over every aspect, making these compliance demands easier to meet.
  • Better control: Starting fresh means the bank decides exactly what data goes in, how the model learns, and how it makes decisions, which is crucial for regulatory approval and risk management.

This approach gives banks the specialized AI they need while staying compliant with financial regulations.

How continual fine-tuning keeps SLMs relevant for financial institutions

Continual fine-tuning (CFT) is the process of incrementally updating a model with new data, ensuring it evolves in line with changing business realities rather than remaining static after initial training. For SLMs, this capability is particularly critical in dynamic environments where agility and relevance are paramount. In financial services, banks can leverage CFT to ensure their enterprise SLMs remain aligned with rapidly shifting regulatory mandates and market conditions.
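
As a rough illustration, CFT can reuse the same LoRA machinery sketched earlier: reload the previously trained adapter and run a short additional training pass on newly collected data. The adapter path and data file below are hypothetical, carried over from the earlier sketch.

```python
# Continual fine-tuning sketch: resume training an existing adapter on new data.
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import PeftModel
from datasets import load_dataset

BASE_MODEL = "mistralai/Mistral-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Load the previously trained adapter (hypothetical path) and keep it trainable.
model = PeftModel.from_pretrained(base, "cards-slm-lora", is_trainable=True)

# Newly collected data, e.g. the latest regulatory updates (hypothetical file).
new_data = load_dataset("json", data_files="q3_regulatory_updates.jsonl")["train"]
new_data = new_data.map(lambda b: tokenizer(b["text"], truncation=True, max_length=512),
                        batched=True, remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments(output_dir="cards-slm-lora-v2",
                           num_train_epochs=1, learning_rate=1e-4),
    train_dataset=new_data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```

In practice, each incremental update would be re-evaluated against the golden test set before the refreshed adapter replaces the production version, so regressions are caught before they reach customers.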

Alternatives to SLMs

Although SLMs are the preferred choice for agentic AI adoption because they align with the modular, task-specific nature of agents and offer better economics, they aren't the only option. Financial institutions can also fine-tune an LLM or use the retrieval-augmented generation (RAG) approach (Figure 3).

RAG retrieves information from trusted documents or databases and supplies it to the LLM to generate an answer. This reduces hallucination and keeps responses accurate, current, and grounded in real, verified knowledge.
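
The pattern can be sketched in a few lines. The example below uses the sentence-transformers library for retrieval and hands the assembled prompt to whatever generator the institution has chosen, whether an LLM API or a local SLM; the documents, embedding model, and question are illustrative assumptions.

```python
# Minimal RAG sketch: retrieve trusted passages, then ground the generator on them.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # assumed embedding model

# Trusted knowledge base (in practice: policy manuals, rate sheets, product docs).
documents = [
    "Chargeback disputes must be filed within 120 days of the transaction date.",
    "The standard variable mortgage rate is 6.2% per the latest rate sheet.",
    "The daily contactless payment limit for retail cards is 500 euros.",
]
doc_embeddings = embedder.encode(documents, convert_to_tensor=True)

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Return the stored passages most similar to the question."""
    query_embedding = embedder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, doc_embeddings, top_k=top_k)[0]
    return [documents[hit["corpus_id"]] for hit in hits]

question = "How long does a customer have to dispute a card charge?"
context = "\n".join(retrieve(question))
prompt = (f"Answer using only the context below.\n\n"
          f"Context:\n{context}\n\nQuestion: {question}")
# The prompt is then passed to the chosen generator (an LLM API or a local SLM).
print(prompt)
```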

Figure 3: Comparing SLMs vs LLM fine-tuning vs RAG

Source: Infosys Knowledge Institute

The final decision will be driven by key parameters such as the use case, domain-specific data availability, resource constraints, and the importance of data explainability.

However, these approaches are not mutually exclusive. Banks can use a hybrid approach to serve their purpose. For example:

  • SLM with RAG: Use a domain-specific, low-latency SLM as the generator in a RAG pipeline. This balances the efficiency of the SLM with the up-to-date, factually grounded retrieval that RAG provides. For example, a global bank can create a mortgage SLM and then pair it with RAG to fetch country-specific, real-time compliance rules and regulations.
  • RAG plus LLM fine-tuning: An LLM can be fine-tuned on a bank's domain-specific dataset and then combined with RAG to retrieve from a live knowledge base, producing more accurate, up-to-date responses to reasoning prompts. For instance, a bank can fine-tune a foundational model like GPT for its enterprise contract management process and then supplement it with RAG to fetch the latest customer-specific details from a live database.

Implementation roadmap for SLM adoption

Financial institutions can expedite enterprise-level AI implementation through structured SLM adoption:

  1. Align SLM adoption with business priorities: Focus on use cases that deliver measurable business value. Prioritize applications addressing immediate operational challenges and regulatory requirements.
  2. Establish a cross-functional AI task force: Create teams spanning business units to identify high-priority SLM use cases. Include stakeholders from risk, compliance, IT, and business operations to ensure comprehensive evaluation.
  3. Build or acquire domain-specific data assets: Develop comprehensive datasets tailored to financial services applications. Quality data drives SLM effectiveness and regulatory compliance.
  4. Choose appropriate base models: Select foundation models aligning with specific use cases. Begin training with domain-centric data to enhance performance and accuracy.
  5. Modernize infrastructure for on-premises or edge deployment: Invest in infrastructure supporting local deployment. This ensures data privacy, reduces latency, and maintains regulatory compliance.
  6. Implement AI governance and risk controls: Establish frameworks meeting regulatory requirements. Create clear audit trails and explainability mechanisms for compliance review.
  7. Measure ROI and scale strategically: Create metrics tracking performance and business impact. Industry data shows that 71% of organizations use AI in financial operations, with 57% of leaders reporting ROI exceeding expectations.

The future of financial AI is smart, not just big

The advantages of SLMs extend beyond technical capabilities. They deliver strategic value across operational efficiency, business growth, risk management, and regulatory compliance. The Infosys Bank Tech Index research indicates that banks expect AI to have the most positive impact in areas that improve operational efficiency — including productivity, quality, growth, and speed. Over 20% of banks see AI generating the most value from business operations. Generative AI alone could inject between $200 billion and $340 billion annually into the banking sector.

Financial institutions implementing SLMs can achieve faster deployment, greater transparency, and more controllable AI solutions. This combination of focused capability and regulatory compliance makes SLMs increasingly attractive to banking leaders seeking innovation while managing risk.

SLMs represent a cost-effective, secure, and accurate pathway to enterprise-level AI success in financial services. They address core challenges limiting LLM adoption: cost, privacy, regulatory compliance, and explainability. The technology is ready, the framework exists, and the ROI is demonstrable. The question is not whether to adopt SLMs, but how quickly institutions can implement them effectively.
