Trend 4: HTAP demonstrates efficiency in real-time analytics

Hybrid transactional/analytical processing (HTAP) can reduce the time lag between a business event and its visibility in analytics. By combining the strengths of a transactional database (high speed, atomicity-consistency-isolation-durability (ACID) compliance, and a SQL-friendly interface) with the broad analytical capabilities of an OLAP data warehouse, HTAP reduces complexity and enables faster decision-making.
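
To make the pattern concrete, here is a minimal, self-contained sketch of the HTAP access pattern in Python. It uses sqlite3 purely as a stand-in engine so the example runs anywhere; the point it illustrates is that one system serves both the transactional write and the analytical query, with no ETL hop in between. The table and values are illustrative assumptions.

```python
import sqlite3

# Minimal sketch of the HTAP access pattern. sqlite3 is used here only as a
# stand-in engine so the example runs anywhere; a real HTAP database such as
# SingleStore would be reached the same way through its SQL interface.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE payments (
        id      INTEGER PRIMARY KEY,
        account TEXT    NOT NULL,
        amount  REAL    NOT NULL,
        ts      TEXT    NOT NULL
    )
""")

# Transactional side: each business event is committed as a system of record.
with conn:  # opens a transaction; commits on success, rolls back on error
    conn.execute(
        "INSERT INTO payments (account, amount, ts) VALUES (?, ?, ?)",
        ("ACC-1042", 250.00, "2022-06-01T10:15:00"),
    )

# Analytical side: the same store answers aggregate queries immediately,
# with no ETL hop into a separate warehouse.
total, count = conn.execute(
    "SELECT SUM(amount), COUNT(*) FROM payments WHERE account = ?",
    ("ACC-1042",),
).fetchone()
print(f"account ACC-1042: {count} payment(s) totalling {total:.2f}")
```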

HTAP databases use in-memory structures to allow rapid ingestion of business events, which are reliably stored as a system of record. By horizontally scaling out data storage and processing, HTAP databases provide enough headroom to run complex analytical queries without an additional ETL process. Costs and complexity can be reduced further by consuming HTAP as a managed cloud service. Databases that support this model include SingleStore, GridGain, MongoDB Atlas, and Couchbase.
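
As an illustration of the as-a-service pattern, the hedged sketch below batch-ingests events into a cloud HTAP cluster and immediately runs an aggregate over them. SingleStore is reachable over the MySQL wire protocol, so the standard pymysql driver is used; the endpoint, credentials, and table names are hypothetical.

```python
import pymysql  # SingleStore speaks the MySQL wire protocol

# Hedged sketch of rapid ingestion plus in-place analytics against a managed
# cloud HTAP service. Host, credentials, and table names are hypothetical.
conn = pymysql.connect(
    host="svc-example.singlestore.com",  # hypothetical managed endpoint
    user="app_user",
    password="...",
    database="events_db",
)

events = [
    ("2022-06-01T10:15:00", "checkout", 250.00),
    ("2022-06-01T10:15:01", "refund", -40.00),
]

with conn.cursor() as cur:
    # Rapid ingestion: events land in memory-optimized structures and are
    # durably stored as the system of record.
    cur.executemany(
        "INSERT INTO business_events (ts, kind, amount) VALUES (%s, %s, %s)",
        events,
    )
    conn.commit()

    # Analytics run on the same horizontally scaled cluster, with no
    # separate ETL pipeline feeding a warehouse.
    cur.execute("SELECT kind, SUM(amount) FROM business_events GROUP BY kind")
    for kind, total in cur.fetchall():
        print(kind, total)
```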

A large North American bank used SingleStore to build an HTAP database for master data previously stored in a mainframe DB2 database. Using SingleStore's in-memory capabilities, the bank offloaded over 11TB of data from the mainframe while sustaining over 1,000 transactions per second. In its first month of operation, the new database served over 27 million read requests, including analytics workloads.

Trend 5: AI/ML-driven data engineering gains prominence

Businesses today rely on an ecosystem of enterprises to deliver customer value. Each partner in this symbiotic ecosystem is a node that depends on internal information as well as intelligence shared by other partners. This forces organizations to be data- and intelligence-driven and agile in their responses. Yet while organizations have adopted agile ways of working, they still rely on traditional hand-cranked data pipelines that are requirements-driven and limit the pace at which the business can respond.

To overcome these limitations, enterprises are shifting toward AI- and ML-driven data engineering combined with industry semantics. AI/ML techniques are applied to source-to-target mapping, automated data curation, smart and contextual data quality management, data rights management, and collaborative data management and governance. The goal is to enable engineered systems to process data from disparate sources, learn from experience, and work with humans and machines in a symbiotic relationship.
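
To ground one of these techniques, the toy sketch below illustrates AI-assisted source-to-target mapping: each source column is scored against the target schema and the best match is proposed for human confirmation. The column names, target schema, and 0.5 threshold are illustrative assumptions; a production framework would add semantic ontologies, data profiling, and learned models on top of simple name similarity.

```python
from difflib import SequenceMatcher

# Toy sketch of assisted source-to-target mapping: score each source column
# against the target schema by name similarity and propose the best match
# for a human to confirm. All names and the threshold are assumptions.
SOURCE_COLUMNS = ["cust_nm", "txn_amt", "txn_dt", "store_id"]
TARGET_SCHEMA = ["customer_name", "transaction_amount",
                 "transaction_date", "outlet_id"]

def similarity(a: str, b: str) -> float:
    """Crude lexical similarity in [0, 1]; a stand-in for a learned model."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

for src in SOURCE_COLUMNS:
    best = max(TARGET_SCHEMA, key=lambda tgt: similarity(src, tgt))
    score = similarity(src, best)
    verdict = "auto-map" if score >= 0.5 else "needs human review"
    print(f"{src:10s} -> {best:20s} (score {score:.2f}, {verdict})")
```

The split between "auto-map" and "needs human review" reflects the machine-run, human-assisted division of labor described in the case study below.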

A leading global CPG company built a retail data intelligence cloud to listen to, collate, curate, and derive shopper and category intelligence from retailer data. Previously, this was a time- and effort-intensive task involving multiple stakeholders. Infosys implemented an AI-powered data engineering framework that uses ontology-driven services to organize and process point-of-sale (POS) data signals across retailers and to apply ML-driven data quality checks. The solution turned this into a machine-run, human-assisted process, reducing retailer onboarding time by 50% and time to insight by 30%.