Natural language processing

Trend 3: Active learning for content intelligence from documents

Enterprises embed information in many types of documents, digital or handwritten, including research study documents, know-your-customer (KYC) forms, payslips, and invoices. Extracting and systematically digitizing this information is a major challenge. One advanced technique for deriving content intelligence from documents is active learning: an AI classifier examines unlabeled data and picks the parts that should go to humans for labeling. This process improves data quality because the classifier controls data selection and picks only the examples it cannot yet handle well. In one legal use case (labeling contractual clauses), active learning increased accuracy from 66% to 80%, even with fewer data points. Labeling time and cost were also significantly lower; avoiding tagging by subject matter experts reduced costs by 18%.
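The selection step can be illustrated with a minimal sketch of uncertainty sampling, one common active learning strategy. The source does not say which strategy was used, so this is an assumption. The function names, the clause ids, and the probability values are all hypothetical; in practice the probabilities would come from the classifier's own predictions on unlabeled documents.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of one predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_for_labeling(predictions, budget):
    """Return the ids of the `budget` items the classifier is least sure about.

    predictions: dict mapping an item id to its list of class probabilities.
    Higher entropy means the classifier is less certain about that item.
    """
    ranked = sorted(
        predictions,
        key=lambda item_id: entropy(predictions[item_id]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical classifier output for four unlabeled contract clauses
# (illustrative values, not data from the use case above).
predictions = {
    "clause_a": [0.98, 0.01, 0.01],
    "clause_b": [0.40, 0.35, 0.25],
    "clause_c": [0.55, 0.44, 0.01],
    "clause_d": [0.90, 0.05, 0.05],
}

# Only the most ambiguous clauses are sent to human annotators.
to_label = select_for_labeling(predictions, budget=2)
print(to_label)  # ['clause_b', 'clause_c']
```

In a full loop, the newly labeled items would be added to the training set, the classifier retrained, and the selection repeated until the labeling budget runs out or accuracy stops improving.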

A large global seed manufacturer partnered with Infosys to extract various data points from intellectual property documents covering studies and experiment details. These documents were spread across geographies in different shared locations, languages, and versions.



Infosys TechCompass