Data engineering

Trend 10: AI-powered technologies enhance data scientists' experience

Even today, many data scientists analyze data manually, using a variety of techniques and applying data cleansing steps by hand. There is no standardized set of tools for data wrangling, analytics, feature engineering, and model experimentation. However, data science is increasingly shifting from an artisanal to an industrialized ecosystem, leading to wider adoption of automated advisory tools for data cleansing and wrangling, which speed up feature engineering and quality analysis.

Legitimate privacy concerns, new regulations, cost pressure, and inherent data bias have pushed enterprises to explore data augmentation, an automated process for preparing and synthesizing data. One subset of this approach is synthetic data generation, in which data is synthesized from scratch when no data is available or when outliers and edge cases are rare in real-world data. This approach should be considered when safe, reliable, fair, and inclusive ML models are required.
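As a rough illustration of synthesizing extra examples when edge cases are rare, the sketch below creates new rows by interpolating between random pairs of real rare rows, an idea similar in spirit to SMOTE-style oversampling. It is a minimal sketch under stated assumptions, not a description of any specific product: the function name `synthesize_rare_rows` and the two example columns are hypothetical.

```python
import random


def synthesize_rare_rows(rare_rows, n_new, seed=0):
    """Create n_new synthetic rows by interpolating between pairs of rare rows.

    rare_rows: list of equal-length lists of numeric feature values.
    """
    if len(rare_rows) < 2:
        raise ValueError("need at least two rare rows to interpolate between")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(rare_rows, 2)  # two distinct real examples
        t = rng.random()                 # interpolation weight in [0, 1)
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


# Hypothetical rare cases with columns (monthly_spend, support_calls).
rare = [[120.0, 9.0], [95.0, 7.0], [140.0, 11.0]]
new_rows = synthesize_rare_rows(rare, n_new=5)
```

Each synthetic row lies on the line segment between two real rows, so it stays inside the observed range of every column. That keeps the sketch simple, but it also means it cannot invent genuinely new edge cases, and it does not by itself address privacy or bias.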

A Europe-based telco wanted to use customer data to enhance client retention. The company worked with Infosys to build datasets to effectively predict customer churn. The telco reduced churn by 10%-15% by developing a catalog of customized offers.

Trend 11: Responsible data crucial for safe and sound AI development

Explainable AI through responsible data is still evolving. Bias in data can have devastating effects on business outcomes and raise serious ethical and regulatory issues. Applying responsible and ethical data policies in AI development benefits both businesses and society.

The growing dependence of AI applications on data for developing and training algorithms highlights the importance of secure and reliable systems. Businesses must address the following elements as part of their AI design principles:

  • Identify data origin and data lineage.
  • Identify the use of internal and public data for building models.
  • Detect potential data corruption and anomalies.
  • Protect individual data privacy rights.
  • Resist cyberattacks.
  • Comply with legal and regulatory requirements.
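Two of the elements above, recording data origin and lineage and detecting data corruption, can be sketched with a small lineage record that stores a content fingerprint. This is an illustrative sketch only: `LineageRecord`, `register`, and `is_intact` are hypothetical names, and the sample bytes stand in for a real dataset.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LineageRecord:
    dataset: str   # name of this dataset
    origin: str    # e.g. "internal" or "public"
    sha256: str    # fingerprint of the content at registration time
    parents: list = field(default_factory=list)  # datasets it was derived from
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the dataset content."""
    return hashlib.sha256(data).hexdigest()


def register(dataset, origin, data, parents=()):
    """Record where a dataset came from and what its content looked like."""
    return LineageRecord(dataset, origin, fingerprint(data), list(parents))


def is_intact(record, data):
    """True if the content still matches the fingerprint in the record."""
    return record.sha256 == fingerprint(data)


# Hypothetical raw export and a dataset derived from it.
raw = b"customer_id,tenure\n1,12\n2,30\n"
raw_record = register("crm_export", "internal", raw)
features = raw.replace(b"tenure", b"tenure_months")
features_record = register(
    "churn_features", "internal", features, parents=[raw_record.dataset]
)
```

A checksum only shows that content has changed since it was registered. It does not say whether the original content was correct, so statistical anomaly detection would still be needed alongside it.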

Trend 12: AI-based tools enhance data quality

Whether it is for decision-making by corporate executives, frontline staff, or intelligent ML models, any intelligent enterprise needs high-quality data to operate. However, data quality issues are widespread, and AI-based data-quality analysis has become an integral part of the MLOps pipeline.

Enterprises have started to treat data engineering as an integral part of their data strategy. Capabilities such as lakehouse platforms, metadata management, data lineage, data quality, and data discovery will play a significant role in the data engineering architecture.

For data sharing between large enterprises, another technique of note is "cooperative computing". It is deployed when robust datasets are needed to build innovative corporate ML models at scale and speed. In this paradigm, datasets are consolidated and encoded so that different users can work with them efficiently and effectively.

An investment firm undergoing a modernization exercise wanted to build a data pipeline on AWS for corporate customers. This involved identifying, ingesting, cleansing, and loading existing data from its legacy, mainframe-based IT ecosystem. The firm partnered with Infosys and used Infosys Data Workbench to build a quality gate where data from the mainframes could be profiled and ingested for building data-centric services, including demand forecasting and identifying the next best action. The firm improved marketing campaign effectiveness by 70%, with effort savings of around 45% for the commercial sales line.
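To make the idea of a quality gate concrete, the sketch below profiles a batch of records and admits it for ingestion only if missing-value and duplicate rates stay under set thresholds. It is a generic, rule-based illustration and not a depiction of Infosys Data Workbench: the function names, column names, and threshold values are all assumptions.

```python
def profile(rows, required):
    """Compute simple quality metrics for a batch of dict records."""
    total = len(rows)
    missing = {col: 0 for col in required}
    seen, duplicates = set(), 0
    for row in rows:
        for col in required:
            if row.get(col) in (None, ""):
                missing[col] += 1
        key = tuple(row.get(col) for col in required)
        if key in seen:
            duplicates += 1  # same values in all required columns seen before
        seen.add(key)
    return {
        "rows": total,
        "missing_rate": {
            col: (n / total if total else 0.0) for col, n in missing.items()
        },
        "duplicate_rate": duplicates / total if total else 0.0,
    }


def passes_gate(report, max_missing=0.05, max_duplicates=0.01):
    """Admit a batch only if it is non-empty and within both thresholds."""
    if report["rows"] == 0:
        return False
    worst_missing = max(report["missing_rate"].values(), default=0.0)
    return (
        worst_missing <= max_missing
        and report["duplicate_rate"] <= max_duplicates
    )


# Hypothetical batch extracted from a legacy system.
batch = [
    {"account_id": "A1", "amount": 100.0},
    {"account_id": "A2", "amount": None},   # missing amount
    {"account_id": "A1", "amount": 100.0},  # duplicate of the first row
    {"account_id": "A3", "amount": 42.5},
]
report = profile(batch, required=["account_id", "amount"])
admitted = passes_gate(report)
```

In this example the batch has one missing amount and one duplicate row out of four, so both rates are 0.25 and the batch is held back. An AI-based approach would typically learn such thresholds or detect anomalies from historical profiles instead of hard-coding them.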


Infosys TechCompass