A leading investment management company in North America required a significant data testing effort for its data modernization journey. The program involved data movement across distributed upstream and downstream applications, structured data in heterogeneous storage environments, and petabytes of data.

Existing data in conventional databases and the big data landscape was being migrated to the cloud for its proven benefits. This involved:

  • Migration testing of the historic data
  • Intake of quality data into the cloud from upstream systems for strategic decision-making and reporting
  • Ensuring that the new infrastructure supported the existing ETL capabilities

Key challenges

  • Inefficient manual testing due to the lack of a proven cloud testing solution in the client’s data landscape
  • Voluminous data to be tested (~20 TB) and 20K+ test cases
  • Validation required for 15-20 releases per month


The Solution

Amazon Web Services (AWS) cloud-native solution in a full-stack automation framework

A full-stack automation framework was designed and developed for the data warehouse testing program. The test strategy developed for big data, data lake, cloud migration, and BI report validation and rationalization covered both solutions and skill development.

With no in-house cloud testing solution, the client's testing team was forced to validate migrated data through manual, sample-based testing. The Infosys QA team addressed this inefficient process by building a cloud solution that automates cloud testing. This AWS cloud-native solution was proactively offered to the client and implemented after a successful proof of concept (POC). Key features of the solution were:

  • Cloud adoption leveraging private / public cloud
  • Automated and optimized data validations in the cloud using the cloud-native solution
  • Spark-based utility that runs on an Amazon Elastic MapReduce (EMR) cluster and can validate data stored in Amazon S3 buckets, AWS relational databases, and other external Java Database Connectivity (JDBC) compliant databases
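
The kind of source-to-target validation such a utility performs can be illustrated with a minimal sketch. The example below is plain Python rather than Spark, and it is an assumption about the general approach, not the implementation used in this program: each row is keyed by its first column, the remaining columns are hashed, and the two sides are compared by key. The function names (`row_digest`, `reconcile`) and the sample rows are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

Row = Tuple  # assumption: the first element of each row is its primary key


def row_digest(row: Row) -> str:
    """Hash all non-key columns so whole rows can be compared cheaply."""
    # repr() keeps None, '' and numeric types distinguishable from each other.
    payload = "\x1f".join(repr(value) for value in row[1:])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def index_rows(rows: Iterable[Row]) -> Dict[object, str]:
    """Map each primary key to the digest of its row."""
    return {row[0]: row_digest(row) for row in rows}


def reconcile(source_rows: Iterable[Row], target_rows: Iterable[Row]) -> Dict[str, List]:
    """Compare source and target by key and report every discrepancy."""
    src = index_rows(source_rows)
    tgt = index_rows(target_rows)
    shared = src.keys() & tgt.keys()
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in shared if src[k] != tgt[k]),
    }


# Hypothetical sample data: key 2 differs, key 3 was not migrated, key 4 is extra.
source = [(1, "ACME", 100.0), (2, "Globex", 250.5), (3, "Initech", 75.0)]
target = [(1, "ACME", 100.0), (2, "Globex", 250.0), (4, "Umbrella", 10.0)]
result = reconcile(source, target)
print(result)
```

Because every key is checked, this covers the full dataset rather than a sample. A distributed engine such as Spark would apply the same key-and-digest comparison across partitions instead of in-memory dictionaries.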

Other solution features include:

  • Full stack automation for in-sprint regression and automated Test Data Management (TDM)
  • Workforce transformation with teams upskilled on a diversified technology stack including ETL, cloud, R, and Amazon services
  • Code tuning on the big data framework to consolidate multiple utilities, built with various technologies, into one framework
  • Centralized data management team for end-to-end data testing of the data mart (big data) and data hub
  • R-based data validation framework
  • DevOps adoption for shift-left quality measures
  • Detailed reporting capability with drill down to each record
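
As an illustration of what record-level drill-down in a validation report can look like, the following is a minimal Python sketch under stated assumptions: rows are dictionaries, `key` names the primary-key column, and the function name `drill_down` and the sample data are hypothetical. It is not the reporting component described above, only one simple way to list each discrepancy per record and per column.

```python
from typing import Dict, List, Sequence


def drill_down(columns: Sequence[str], key: str,
               source: List[Dict], target: List[Dict]) -> List[Dict]:
    """Return one finding per missing record or per differing column value."""
    target_by_key = {row[key]: row for row in target}
    findings: List[Dict] = []
    for src_row in source:
        tgt_row = target_by_key.get(src_row[key])
        if tgt_row is None:
            # The whole record is absent on the target side.
            findings.append({"key": src_row[key], "column": None,
                             "issue": "missing in target"})
            continue
        for col in columns:
            if src_row.get(col) != tgt_row.get(col):
                findings.append({"key": src_row[key], "column": col,
                                 "issue": "value mismatch",
                                 "source": src_row.get(col),
                                 "target": tgt_row.get(col)})
    return findings


# Hypothetical sample data for illustration only.
source_rows = [
    {"id": 1, "fund": "Alpha", "nav": 10.5},
    {"id": 2, "fund": "Beta", "nav": 20.0},
    {"id": 3, "fund": "Gamma", "nav": 7.25},
]
target_rows = [
    {"id": 1, "fund": "Alpha", "nav": 10.5},
    {"id": 2, "fund": "Beta", "nav": 20.1},
]
report = drill_down(["fund", "nav"], "id", source_rows, target_rows)
for finding in report:
    print(finding)
```

A summary view can then be derived by counting findings per column or per issue type, while the individual findings remain available for inspection.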

Levers implemented for Quality Engineering Transformation

  • Cloud adoption
  • Automation-first approach
  • Centralized data management
  • Workforce transformation
  • 100% Agile and DevOps adoption


On-demand test environment provisioning and access control through successful cloud adoption

100% data validation for datasets of over 1 billion records

Effortless handling of multiple releases with 100% agile adoption and 70% regression automation

Up to 70% increase in execution productivity and optimized data comparison