Overview

On-premise Hadoop-based ecosystems help enterprises process varied data sets and build actionable analytics. However, as these platforms are adopted at scale, enterprises face challenges with cluster provisioning, rising costs, governance, and performance. Analytical and sandbox environments need on-demand compute, which is difficult to provision on on-premise Hadoop architecture because it does not decouple compute and storage.


Enterprises can address these problems by migrating to a stable, secure, governed cloud platform that scales on demand, manages costs effectively, offers pay-per-use pricing, and meets compliance requirements. Analytical users can also tap into on-demand provisioning of infrastructure and leverage a large base of prebuilt library components. Hadoop migration to the cloud plays a key role in data landscape modernization and can help enterprises capitalize on the opportunities offered by the data economy.

Our Hadoop migration strategy and accelerators help enterprises move to the cloud efficiently.

The Infosys data and analytics team has built a solution, combining a well-defined strategy with a suite of tools, to accelerate the Hadoop migration journey to the cloud.

We have identified four approaches for efficient migration to the cloud:

  • Lift and Shift: migrate on-premise processes to the cloud with no changes
  • Retrofit: migrate objects with minimal changes, such as storage components and functions made compatible with the new environment
  • Re-architect: redesign the application to realize the benefits of the modernized platform
  • Hybrid: migrate applications using a combination of the above patterns
Data Operations Service Offerings

Fig 1: Hadoop Migration to AWS - Patterns

We have designed accelerators and processes to help migrate on-premise data lake objects and applications using any of the above patterns, followed by an implementation strategy that helps clients achieve scaled and predictable outcomes.


Fig 2: Implementation Strategy

Accelerate your cloud migration with Infosys Wizard and AWS


Accelerates the cloud migration journey by 50% with capabilities including:

  • Inventory Metadata collection
  • Schema conversion
  • Historical Data migration & catch-up loads
  • Data Certification
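As a rough illustration of the Data Certification capability, the sketch below compares row counts and order-insensitive content fingerprints between a source extract and its migrated copy. The function names and row format are assumptions for illustration, not the Data Wizard's actual interface.

```python
import hashlib

def table_fingerprint(rows):
    """Order-insensitive fingerprint of a table extract: hash each row,
    then XOR the digests so load order does not affect the result."""
    fp = 0
    for row in rows:
        digest = hashlib.sha256("|".join(map(str, row)).encode()).hexdigest()
        fp ^= int(digest, 16)
    return format(fp, "064x")

def certify(source_rows, target_rows):
    """Certify a migrated table: row counts and fingerprints must both match."""
    return (len(source_rows) == len(target_rows)
            and table_fingerprint(source_rows) == table_fingerprint(target_rows))

# A target loaded in a different order still certifies.
source = [(1, "alice"), (2, "bob")]
target = [(2, "bob"), (1, "alice")]
print(certify(source, target))  # True
```

In practice the extracts would come from the source Hive table and the migrated target table; the same count-plus-checksum idea also supports certifying catch-up loads incrementally.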

The Infosys Data Wizard can help accelerate the migration process. The solution consists of the following components:

  • Assessment: a comprehensive assessment framework that identifies usage patterns of source data stores and recommends the best-suited target data store
  • Modernization Recommendation: a decision matrix that helps identify the right approach for each type of data store
  • Database Object Migration: solution accelerators that help migrate the different classes of DB object inventory
  • Code/Pipeline Migration: solution accelerators that help migrate the different classes of data processing object inventory
  • Consumption Migration: solution accelerators that help migrate the different classes of consumption object inventory
  • History Data Migration: solution accelerators that help migrate historical data to the target data platform
  • Testing and Validation: a comprehensive testing solution that accelerates validation of migrated assets
  • Partner Ecosystem: vendor partnerships that complement the migration framework and solutions
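To illustrate what a schema-conversion accelerator does at its core, here is a minimal sketch that maps Hive column types to Amazon Redshift types and emits a target DDL. The type mapping is deliberately simplified and assumed; a production converter also handles complex types, precision, partitioning, and distribution/sort keys.

```python
# Assumed, simplified mapping from Hive column types to Redshift types.
HIVE_TO_REDSHIFT = {
    "string": "VARCHAR(65535)",
    "int": "INTEGER",
    "bigint": "BIGINT",
    "double": "DOUBLE PRECISION",
    "boolean": "BOOLEAN",
    "timestamp": "TIMESTAMP",
}

def convert_ddl(table, columns):
    """Build a Redshift CREATE TABLE from (name, hive_type) column pairs."""
    cols = ",\n  ".join(
        f"{name} {HIVE_TO_REDSHIFT[hive_type.lower()]}" for name, hive_type in columns
    )
    return f"CREATE TABLE {table} (\n  {cols}\n);"

print(convert_ddl("sales", [("order_id", "bigint"), ("customer", "string")]))
```

The same table-driven approach extends to other targets (for example Glue Data Catalog or Athena) by swapping in a different mapping dictionary.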

We offer varied approaches to meet client-specific needs when migrating workflows and code, keeping them compatible with tools across different platforms.

Migration from Hadoop to AWS can be enabled in any of the following ways:

  • Hadoop platform on cloud
  • Hadoop to AWS EMR
  • Hadoop to AWS next-gen services (native + 3rd party)
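For the Hadoop-to-EMR path, a transient (ephemeral) cluster runs its steps and terminates itself. The sketch below only builds the request dictionary; passing it to `boto3.client('emr').run_job_flow(**config)` would actually launch the cluster. The release label, instance types, bucket names, and IAM role names are assumptions to adapt.

```python
def ephemeral_emr_config(name, log_uri, steps):
    """Build a run_job_flow request for a transient EMR cluster that
    terminates itself once all steps finish. Pass the result to
    boto3.client('emr').run_job_flow(**config) to actually launch it."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release label; choose your own
        "LogUri": log_uri,
        "Applications": [{"Name": "Spark"}, {"Name": "Hive"}],
        "Instances": {
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # Ephemeral model: shut the cluster down when the steps are done.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": steps,
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default EMR roles; yours may differ
        "ServiceRole": "EMR_DefaultRole",
    }

# Hypothetical Spark step submitted via EMR's command-runner.
config = ephemeral_emr_config(
    name="nightly-etl",
    log_uri="s3://my-bucket/emr-logs/",  # hypothetical bucket
    steps=[{
        "Name": "spark-etl",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://my-bucket/jobs/etl.py"],
        },
    }],
)
```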

Challenges & Solutions

  • Establish a value realization framework at the beginning, then capture and monitor it throughout the program
  • Leverage capabilities offered by the target platform, such as:
    • Managed services to simplify administration and save on its cost
    • Temporary, on-demand storage and processing clusters (the ephemeral model) instead of persistent ones
    • Regular review of storage/compute design to keep costs down
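The cost case for the ephemeral model is simple arithmetic. With an assumed, purely illustrative rate for a 10-node cluster, an always-on cluster versus one spun up four hours a day for batch work compares as:

```python
HOURLY_RATE = 0.192 * 10  # assumed: 10 nodes at ~$0.192/hr each (illustrative only)

def monthly_cost(hourly_rate, hours_per_day, days=30):
    """Monthly cost of a cluster that runs hours_per_day, every day."""
    return hourly_rate * hours_per_day * days

persistent = monthly_cost(HOURLY_RATE, 24)  # always-on cluster
ephemeral = monthly_cost(HOURLY_RATE, 4)    # up only for the nightly batch window

print(f"persistent: ${persistent:.2f}/month, ephemeral: ${ephemeral:.2f}/month")
# → persistent: $1382.40/month, ephemeral: $230.40/month
```

The six-fold difference is purely a function of hours used, which is why decoupled storage (data stays in S3) plus ephemeral compute is the standard cost lever on the cloud.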


  • Ensure the right migration approach (Lift and Shift, Retrofit, Re-architect, etc.) is chosen by considering the benefits of the target platform's tools; depending on the workload, a combination of these approaches may be used instead of just one
  • Start small: build a test sandbox, run POCs with smaller, non-critical data and their associated jobs, and tune target product configurations
  • Identify dataflow patterns (pattern, tool, business area) and build foundational components for data ingestion, data engineering, common data libraries, and data governance (quality, metadata, lineage) in the target tools
  • Leverage migration tools from the target product vendor or its partners
  • Leverage an off-the-shelf testing tool (recommended by the target product vendor)

  • Construct the right migration team with a clear RACI (Responsible, Accountable, Consulted, Informed) matrix
  • De-risk the program appropriately
  • Make a comprehensive program plan cutting across governance, hardware, Hadoop software, architecture, application (data, objects, code, workflow, consumption), testing, and deployment

  • Split the data domains by timestamp, business line, and workload, and convert each into an apt MVP (Minimum Viable Product) sprint in the plan
  • People churn is inevitable, so treat knowledge management and issue management as critical activities

  • Consider security (authorization/access) and migration monitoring (auditing, logging) from the beginning
  • Validate each of the target technology components for security compliance (network, firewall, software, applications, encryption at rest/in motion) with a smaller set of data
  • Run security checks after each major task and before the migration is released to production
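These recurring security checks can be partly automated. The sketch below compares an asset's settings against a required baseline; the config keys here are illustrative assumptions, not a real AWS API shape.

```python
# Required security baseline for each migrated asset (assumed, simplified keys).
REQUIRED = {
    "encryption_at_rest": True,
    "encryption_in_transit": True,
    "public_access": False,
    "audit_logging": True,
}

def compliance_gaps(asset):
    """Return the settings that deviate from the required security baseline."""
    return {k: asset.get(k) for k, want in REQUIRED.items() if asset.get(k) != want}

bucket = {
    "encryption_at_rest": True,
    "encryption_in_transit": True,
    "public_access": True,   # violation: bucket is publicly accessible
    "audit_logging": True,
}
print(compliance_gaps(bucket))  # {'public_access': True}
```

Running such a check after each major task, fed from the platform's actual configuration APIs, makes the "security sweep before production release" a repeatable gate rather than a one-off review.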