As data moves into and out of cloud platforms and data lakes, errors and inconsistencies accumulate. As a result, fewer than 40% of cloud data stores and data lakes are reliable and usable. The lack of cloud data validation is an existential threat to data-sensitive organizations.
Each data repository has its own distinct validation rules, which makes identifying those rules difficult even for medium-size repositories. Most data-quality checks are dynamic, hard to code, and must be updated constantly. Understanding data access controls is also crucial: even when data is held by an external service provider, customers remain responsible for the security and integrity of the data they own. Cloud data typically resides in a shared environment alongside other customers' data, so preserving data integrity requires encrypting and segregating each customer's data from everyone else's. Data recovery is equally important for data integrity, and it must be paired with a data retention strategy (hot, warm, cold tiers, etc.). Together, these factors have driven the rising importance of validating cloud data.
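The kinds of repository-level data-quality checks described above can be expressed as small, composable rules. The sketch below is a minimal illustration, not any product's implementation; the rule names, columns, and thresholds are assumptions chosen for the example.

```python
# Minimal sketch of automated data-quality checks over tabular records.
# Rows are represented as plain dicts; in practice they would come from a
# database cursor or a file in a data lake. All names here are illustrative.

def check_not_null(rows, column):
    """Return indices of rows where a required column is missing or empty."""
    return [i for i, r in enumerate(rows) if not r.get(column)]

def check_range(rows, column, lo, hi):
    """Return indices of rows whose numeric value falls outside [lo, hi]."""
    return [i for i, r in enumerate(rows)
            if not (lo <= float(r[column]) <= hi)]

def check_row_counts(source_count, target_count):
    """Reconcile record counts between a source and a target store."""
    return source_count == target_count

rows = [
    {"id": "1", "amount": "250.00"},
    {"id": "2", "amount": "-5.00"},
    {"id": "",  "amount": "99.99"},
]
print(check_not_null(rows, "id"))            # → [2]
print(check_range(rows, "amount", 0, 1000))  # → [1]
print(check_row_counts(len(rows), 3))        # → True
```

Because each rule returns a simple, inspectable result, new checks can be added or updated independently as the underlying data changes, which is the main reason dynamic checks are easier to maintain as code than as one monolithic validation script.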
Companies have implemented innovative solutions using custom utilities and the Infosys Data Testing Workbench (IDTW) as a single automation platform for end-to-end validation of huge volumes of data, all the way from on-premises legacy systems to various AWS cloud data sources. Enterprises have established direct connectivity to legacy databases, Amazon Redshift, and Amazon S3 through IDTW to fully automate end-to-end data validation.
Major financial losses caused by production defects have led to a rapid rise of interest in test data management (TDM) in the testing industry. Many of these losses could have been prevented had the defects been detected through testing with proper test data. Test data has evolved from a few sample files to powerful test data sets with high coverage.
Also, with the growth of Agile and DevOps, quality assurance has become more integral to the sprint cycle. Accommodating tight delivery schedules requires frequent testing with self-service, on-demand test data. To be successful, a DevOps framework should have end-to-end, self-service TDM embedded in it. This gives the right teams accurate test data quickly and efficiently, making it possible to drive high-quality, continuous, and on-time software delivery. The core of TDM is to address regulatory compliance, data privacy, test coverage, and on-demand data availability.
The trend of end-to-end self-service TDM covers synthetic data generation and data subsetting across multiple formats, gold-copy creation and data provisioning, and self-service data requests.
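The three TDM capabilities just listed can be illustrated together in a short sketch: generating synthetic records, masking any realistic identifiers, and subsetting a reproducible gold copy for a specific test scenario. Everything below is an assumption for illustration; the field names, masking scheme, and thresholds are not drawn from any particular TDM tool.

```python
# Minimal sketch of synthetic test-data generation, masking, and subsetting.
# All record fields and the masking scheme are illustrative assumptions.
import hashlib
import random

def mask_email(email):
    """Replace a realistic email with a deterministic, irreversible token,
    preserving uniqueness for joins while protecting data privacy."""
    digest = hashlib.sha256(email.encode()).hexdigest()[:10]
    return f"user_{digest}@example.test"

def generate_customer(rng):
    """Produce one synthetic customer record."""
    uid = rng.randrange(10_000, 99_999)
    return {"id": uid,
            "email": mask_email(f"cust{uid}@corp.example"),
            "balance": round(rng.uniform(0, 5000), 2)}

def subset(records, predicate, limit):
    """Pull a small, targeted slice of data for one test scenario."""
    return [r for r in records if predicate(r)][:limit]

# A fixed seed makes the generated set reproducible — the "gold copy"
# that self-service requests can re-create on demand.
rng = random.Random(42)
gold_copy = [generate_customer(rng) for _ in range(100)]

# Scenario-specific subset: a handful of high-balance customers.
high_value = subset(gold_copy, lambda r: r["balance"] > 4000, limit=5)
```

Seeding the generator is the key design choice here: any team can regenerate the same gold copy on demand instead of copying production data, which addresses both availability and privacy at once.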