Data assurance

Trend 10. Cloud data validation for reliable data clouds and lakes

Data errors and inconsistencies accumulate as data moves into and out of the cloud and data lakes. The lack of proper cloud data validation is therefore an existential threat to data-sensitive organizations.

Each data repository has its own data validation rules, making rule identification difficult even for medium-size repositories. Most data-quality checks are dynamic, hard to code, and need constant updating. Understanding data access controls is also crucial: even when an external service provider holds the data, customers remain responsible for the security and integrity of the data they own. Data in the cloud typically resides in a shared environment alongside other customers' data, so each customer's data must be encrypted and segregated to preserve integrity. Data recovery also matters for integrity, which calls for a data retention strategy with hot, warm, and cold tiers. Together, these factors underscore the importance of validating cloud data.
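As a minimal sketch of what repository-specific validation rules and integrity checksums might look like, the snippet below applies a set of hypothetical, declarative field rules to a record and computes a stable fingerprint for comparing copies of the same record across environments. The rule set, field names, and sample records are illustrative assumptions, not part of any specific product.

```python
import hashlib

# Hypothetical declarative validation rules; each maps a field to a check.
RULES = {
    "customer_id": lambda v: isinstance(v, str) and v.isdigit(),
    "amount": lambda v: isinstance(v, (int, float)) and v >= 0,
    "currency": lambda v: v in {"USD", "EUR", "INR"},
}

def validate(record: dict) -> list:
    """Return the names of fields that fail their validation rule."""
    return [field for field, check in RULES.items()
            if field not in record or not check(record[field])]

def row_fingerprint(record: dict) -> str:
    """Stable checksum of a record, used to compare source and target copies."""
    payload = "|".join(f"{k}={record[k]}" for k in sorted(record))
    return hashlib.sha256(payload.encode()).hexdigest()

good = {"customer_id": "1001", "amount": 250.0, "currency": "USD"}
bad = {"customer_id": "A-17", "amount": -5, "currency": "GBP"}
print(validate(good))  # []
print(validate(bad))   # all three fields fail
```

Keeping rules declarative, as above, makes the constantly changing checks easier to update than hand-coded conditionals scattered across scripts.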

Companies have implemented innovative solutions using custom utilities and the Infosys Data Testing Workbench (IDTW). Enterprises have established direct connectivity to their legacy databases for automated data validation, enabling a single automation platform for end-to-end data validation, from on-premises legacy systems to various AWS cloud data sources. IDTW is used for end-to-end automated validation of data in Amazon Redshift and Amazon S3.
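A common building block of such end-to-end validation is reconciling an on-premises extract against its cloud copy by row count and checksum. The sketch below shows one possible approach using only the standard library; the in-memory row lists stand in for query results from a legacy database and a cloud target, and all names here are illustrative assumptions.

```python
import hashlib

def dataset_checksum(rows):
    """Order-independent checksum over an iterable of row tuples."""
    digest = 0
    for row in rows:
        h = hashlib.sha256("|".join(map(str, row)).encode()).hexdigest()
        digest ^= int(h, 16)  # XOR keeps the result order-independent
    return digest

def reconcile(source_rows, target_rows):
    """Report count and checksum mismatches between source and target extracts."""
    return {
        "source_count": len(source_rows),
        "target_count": len(target_rows),
        "counts_match": len(source_rows) == len(target_rows),
        "checksums_match": dataset_checksum(source_rows) == dataset_checksum(target_rows),
    }

legacy = [(1, "alice", 250.0), (2, "bob", 99.5)]
cloud = [(2, "bob", 99.5), (1, "alice", 250.0)]  # same rows, different order
print(reconcile(legacy, cloud))
```

The order-independent checksum matters because cloud targets rarely guarantee the same row ordering as the legacy source.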


Trend 11. Developing end-to-end, self-service test data management

Organizations have shown increased interest in test data management (TDM) in recent times, as they realize that proper test data can prevent financial losses caused by production defects. Test data has evolved from a few sample files to comprehensive test data sets with high coverage.

Also, with the growth of Agile and DevOps, quality assurance has become more integral to the sprint cycle. Accommodating tight delivery schedules requires frequent tests with self-service, on-demand test data. The DevOps framework should have end-to-end, self-service TDM embedded in it, providing accurate test data quickly and efficiently. This enables high-quality, continuous, and on-time software delivery. The core of TDM is to address regulatory compliance in testing, data privacy, test coverage, and on-demand data availability.
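To make the self-service idea concrete, here is a minimal sketch of an on-demand provisioning call a pipeline stage might invoke: it samples records from a gold copy and masks a sensitive field before handing the data to a test. The gold-copy schema, the `provision_test_data` function, and the masking rule are hypothetical illustrations, not an actual TDM product API.

```python
import random

def mask_email(email: str) -> str:
    """Mask the local part of an email while keeping the domain for routing tests."""
    local, _, domain = email.partition("@")
    return local[0] + "***@" + domain

def provision_test_data(gold_copy, count, seed=None):
    """Self-service request: return `count` masked records sampled from the gold copy."""
    rng = random.Random(seed)  # seeded for reproducible test runs
    sample = rng.sample(gold_copy, min(count, len(gold_copy)))
    return [{**rec, "email": mask_email(rec["email"])} for rec in sample]

gold = [
    {"id": 1, "email": "alice@example.com"},
    {"id": 2, "email": "bob@example.com"},
    {"id": 3, "email": "carol@example.com"},
]
batch = provision_test_data(gold, count=2, seed=42)
print(batch)
```

Masking at provisioning time, rather than copying production data verbatim, is how such a service would address the privacy and compliance concerns noted above.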

The trend of end-to-end, self-service TDM covers synthetic data generation and data subsetting across multiple formats, gold copy creation and data provisioning, and self-service data requests.
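The two remaining capabilities, synthetic data generation and data subsetting, can be sketched as follows. Synthetic records are fabricated from value pools rather than derived from production, and subsetting filters a large set down to just the rows a test scenario needs. Field names, value pools, and the `subset` helper are illustrative assumptions.

```python
import random

def generate_synthetic_customers(n, seed=None):
    """Generate privacy-safe synthetic customer records (no production data involved)."""
    rng = random.Random(seed)
    names = ["Ana", "Ben", "Chen", "Dara"]
    regions = ["NA", "EU", "APAC"]
    return [
        {
            "id": i,
            "name": rng.choice(names),
            "region": rng.choice(regions),
            "balance": round(rng.uniform(0, 10_000), 2),
        }
        for i in range(1, n + 1)
    ]

def subset(records, predicate, limit):
    """Data subsetting: keep only records matching a test condition, capped at `limit`."""
    return [r for r in records if predicate(r)][:limit]

customers = generate_synthetic_customers(100, seed=7)
eu_sample = subset(customers, lambda r: r["region"] == "EU", limit=5)
print(len(eu_sample))
```

Because the records are wholly synthetic, no masking or compliance review of production data is needed before the subset reaches a test environment.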