Speech

Trend 5: Adoption of neural machine translation- and transcription-based systems to mine conversational insights

Historically, translation systems were built on statistical machine translation (SMT), which relies primarily on count-based models. SMT systems are lightweight and were best suited to short sentences with common nouns and phrases. Neural machine translation and transcription systems have since brought significant improvements in accuracy and speed. The gains come from deep learning, specifically multi-head self-attention in an encoder-decoder transformer architecture, applied through transfer learning from models pre-trained on large corpora. These models are usually large, with millions to billions of parameters, and typically need more than one GPU. They also make zero-shot translation possible: for example, a model trained on Spanish-English and English-Portuguese data can translate directly from Spanish to Portuguese even though it has never seen Spanish-Portuguese parallel data.
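As a minimal sketch of this encoder-decoder approach, the snippet below translates Spanish to English with a pre-trained MarianMT transformer via the Hugging Face transformers pipeline. The library and the Helsinki-NLP checkpoint name are assumptions for illustration, not the systems referenced in this trend.

# Sketch: translation with a pre-trained encoder-decoder transformer.
# Assumes the Hugging Face transformers package and the public
# Helsinki-NLP/opus-mt-es-en MarianMT checkpoint.
from transformers import pipeline

# The checkpoint is a multi-head self-attention encoder-decoder model
# trained for one language direction (Spanish to English).
es_en = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en")

result = es_en("El tren llega tarde por una falla en la línea.")
print(result[0]["translation_text"])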

For a large railroad company in the U.S., Infosys assisted in transcribing call center conversations using custom speech models to identify which product lines had the most reported issues, which agents were driving customer satisfaction versus which ones needed training, and how rises or drops in call volume correlated with events such as line failures, new product launches and reported faults.
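As a hedged sketch of the transcription step alone, the snippet below uses the open-source openai-whisper package to turn a recording into text for downstream mining; the package choice, model size and file name are illustrative assumptions, not the custom models built for this engagement.

# Sketch: transcribing a call recording with open-source Whisper.
# The model size ("base") and the file path are illustrative only.
import whisper

model = whisper.load_model("base")
result = model.transcribe("call_0001.wav")  # hypothetical recording
print(result["text"])  # plain-text transcript for downstream analytics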

Infosys partnered with a large global airplane manufacturer to transcribe conversations between pilots and ground staff. The recordings were riddled with cockpit noise, strong regional accents, multiple languages and heavy ambient noise. We custom-trained the models to handle the variation in accents and delivered high transcription accuracy, ran language analytics to infer causes of landing delays and in-air accidents, and used the transcriptions and insights to improve ground staff and pilot training.

For a large retailer in the U.S., Infosys transcribed call center conversations to derive caller intents, conversation sentiment and key topics. We also clustered subject-specific intents, such as those related to order delivery, payments and product returns, to identify and close gaps in supply chain operations.
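A minimal sketch of intent clustering along these lines, assuming the sentence-transformers and scikit-learn packages; the checkpoint name, sample utterances and cluster count are illustrative, not the retailer's data:

# Sketch: grouping call center utterances into intent clusters.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

utterances = [
    "Where is my order? It was due yesterday.",       # delivery
    "My card was charged twice for one purchase.",    # payment
    "I want to send this item back for a refund.",    # returns
    "The package tracking has not updated in days.",  # delivery
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # sentence embedding model
embeddings = encoder.encode(utterances)

# Three illustrative buckets: delivery, payment, returns.
labels = KMeans(n_clusters=3, random_state=0).fit_predict(embeddings)
for text, label in zip(utterances, labels):
    print(label, text)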

Speech

Trend 6: Speech biometrics

Speaker-based authentication and verification is another key trend, adopted as a supplementary biometric alongside methods enterprises already deploy, such as thumbprint or facial recognition. The COVID-19 pandemic has made it even more relevant, since voice authentication requires no physical contact.

Capabilities that are now standard on smartphones, where the Android and iOS operating systems provide hooks to capture a user's voice, train on it, and then use it for authentication, search, queries and other functions, are gradually making their way into the enterprise as well.

Speaker verification and identification are used in several ways: identifying a caller and greeting the person by name to make the interaction highly personalized, or pulling back-end data to offer contextual recommendations and suggestions without explicitly asking for the name. Both have a significant influence on the user's experience.
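A minimal sketch of the verification logic follows: an enrolled voice embedding is compared with a live sample by cosine similarity against a tuned threshold. The embed() function here is a deliberately fake stand-in, since a production system would use a trained speaker-embedding model such as an x-vector or ECAPA network.

# Sketch: speaker verification by comparing voice embeddings.
# embed() is a placeholder stub, NOT a real speaker model; it exists
# only so the enrollment/verification flow below runs end to end.
import hashlib
import numpy as np

def embed(audio_bytes: bytes, dim: int = 192) -> np.ndarray:
    """Placeholder: deterministic pseudo-embedding derived from the bytes."""
    seed = int.from_bytes(hashlib.sha256(audio_bytes).digest()[:8], "big")
    vec = np.random.default_rng(seed).standard_normal(dim)
    return vec / np.linalg.norm(vec)

def verify(enrolled: np.ndarray, sample: np.ndarray, threshold: float = 0.7) -> bool:
    """Accept the caller if cosine similarity clears the tuned threshold."""
    return float(np.dot(enrolled, sample)) >= threshold  # unit vectors: dot = cosine

# Enrollment: store the embedding of the customer's registered sample.
enrolled = embed(b"registered voice sample")

# Verification: embed the live call audio and compare.
print(verify(enrolled, embed(b"registered voice sample")))    # True (identical input)
print(verify(enrolled, embed(b"some other caller's audio")))  # almost surely False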

In this context, we worked with a large global financial institution, among others, to develop speaker authentication for its contact center.

Subscribe

To keep yourself updated on the latest technology and industry trends, subscribe to the Infosys Knowledge Institute's publications.
