A lift-and-shift migration to AWS usually involves infrastructure-as-a-service (IaaS) components such as EC2 instances and auto-scaling groups, along with other AWS services such as application/network load balancers, CloudWatch, IAM, and Route 53. In a hybrid deployment, AWS Direct Connect can help the migrated applications use on-prem services such as Active Directory and other enterprise services. It is also a best practice to plan and architect for high availability and disaster recovery. Figure 1 depicts a reference deployment architecture for applications migrating to AWS using a lift-and-shift approach.
Figure 1. Reference deployment architecture for a lift-and-shift approach
Cloud migration can present a variety of challenges, as each application involves unique technical constraints.
However, some common challenges in migration programs are shown in Figure 2.
Figure 2. AWS migration challenges
|Challenge|Description|
|---|---|
|IaaS vs. PaaS|Choosing between IaaS and PaaS for AWS migration is a challenge. PaaS for databases may be desirable, but it may increase complexity and effort|
|Database cutover strategy|Performing the cutover with minimal downtime and zero loss of data is one of the most challenging tasks|
|Source code availability|Source code is unavailable for certain applications|
|Security protocols, encryption|Latest security protocols and encryption algorithms on the cloud can break applications|
|Organizational processes|Enterprise processes can be cumbersome and impact migration timelines|
|Automation, environment setup|Automating the setup of applications in the different environments (DEV, TEST, UAT, PROD) could be a challenge in the cloud|
|Deployment downtime|Critical applications will need zero downtime during deployments|
|Batch job migration|Ensuring critical jobs experience no downtime is a challenge|
|Legacy features|Legacy features such as iFrames may stop working as expected on the cloud|
We will now delve into these challenges and their corresponding solutions in detail:
IaaS versus PaaS
- Legacy DB servers may host tens or hundreds of databases that are interlinked with each other (or with other DB servers) through mechanisms such as linked servers and aliases.
- In SQL Server, for instance, various features such as distributed transactions, CLR functions, reporting services, integration services, etc. may have been used over the years.
- Implementing data rollback from RDS to on-prem databases or setting up an availability group with both on-prem and RDS for database cutover becomes highly complex.
- Certain features may not seamlessly function after migration to RDS, requiring significant reengineering effort.
- Analyze the complexity and features used in the existing databases.
- If the complexity is significant, or unsupported features are used, the IaaS approach (e.g., SQL Server on EC2) will reduce the overall complexity and enable the migration with minimal effort.
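The analysis step above can be sketched as a small decision helper. This is a minimal illustration, not a definitive rule: the `DatabaseProfile` type, the `recommend_target` function, and the feature names are hypothetical, and the set of unsupported features is deliberately passed in by the caller because it depends on the target engine and version and should be taken from current documentation.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseProfile:
    """Hypothetical inventory record for one legacy database."""
    name: str
    features_used: set = field(default_factory=set)
    cross_server_links: int = 0  # linked servers, aliases, etc.


def recommend_target(profile, unsupported, max_links=0):
    """Return ("IaaS" or "PaaS", reasons).

    `unsupported` is the caller-supplied set of features that the managed
    (PaaS) target cannot run; nothing is assumed about it here.
    """
    reasons = sorted(profile.features_used & set(unsupported))
    if profile.cross_server_links > max_links:
        reasons.append(f"{profile.cross_server_links} cross-server links")
    return ("IaaS" if reasons else "PaaS", reasons)


if __name__ == "__main__":
    orders = DatabaseProfile("orders", {"clr_functions", "full_text"}, 3)
    # The unsupported set below is an example value only.
    print(recommend_target(orders, unsupported={"clr_functions"}))
```

Keeping the reasons alongside the recommendation makes the result easy to review with the database owners before the effort is estimated.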
Unavailability of source code
- Some applications may remain unchanged and not be actively maintained for years, resulting in the production code version differing from the one in source control.
- When dealing with such applications, always compare the production version to the one in source control. Decompiling tools may be used to enable this comparison, especially for components like DLLs.
- If substantial differences are found between the versions, consider migrating the complete set of production artifacts as-is. Upload the deployed code/artifacts to a repository like Nexus and proceed with cloud deployment.
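A first pass at the comparison can be automated by hashing both sets of files. The sketch below assumes the deployed production artifacts and a fresh build from source control have each been copied to a local directory; the function names are illustrative. A hash mismatch on a compiled file such as a DLL does not prove the code differs, since builds are not always byte-for-byte reproducible, so treat it only as a signal for which files to decompile and inspect.

```python
import hashlib
from pathlib import Path


def file_digests(root):
    """Map each file path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_artifacts(production_dir, build_dir):
    """Report files that exist on only one side or whose contents differ."""
    prod = file_digests(production_dir)
    build = file_digests(build_dir)
    return {
        "only_in_production": sorted(prod.keys() - build.keys()),
        "only_in_build": sorted(build.keys() - prod.keys()),
        "different": sorted(
            name for name in prod.keys() & build.keys() if prod[name] != build[name]
        ),
    }
```

If the `only_in_production` and `different` lists are long, that supports migrating the production artifacts as-is rather than rebuilding from source control.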
Compatibility issues with HTTPS/TLS/encryption algorithms
- Upgrading and migrating applications to the cloud may lead to encryption algorithm mismatches between cloud and on-prem servers, especially in hybrid deployments (e.g., with on-prem DB or other services).
- In frameworks such as .NET, explicitly setting the security protocol to the required version (e.g., TLS 1.2) in the code can resolve many issues caused by the default security protocols used by the application.
- It is crucial to verify the list of supported ciphers on both cloud and on-prem servers. If the two sides have no cipher suite in common, the TLS handshake fails and the application won't function in hybrid mode. In that case, update the cipher configuration on both the cloud and on-prem servers so that they share a common set.
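The two checks above can be illustrated in Python (the article's own example is .NET, so this is an analogous sketch rather than the .NET fix itself). `tls12_client_context` pins the minimum protocol version using the standard `ssl` module, and `shared_ciphers` is a hypothetical helper that intersects two cipher lists gathered from the cloud and on-prem servers; how those lists are collected is outside the sketch.

```python
import ssl


def tls12_client_context():
    """Client-side context that refuses anything older than TLS 1.2."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def shared_ciphers(cloud, on_prem):
    """Cipher suites offered by both sides, in the cloud side's order."""
    on_prem_set = set(on_prem)
    return [name for name in cloud if name in on_prem_set]


if __name__ == "__main__":
    # Example inventories only; real lists come from each server's configuration.
    cloud = ["ECDHE-RSA-AES256-GCM-SHA384", "ECDHE-RSA-AES128-GCM-SHA256"]
    on_prem = ["AES256-SHA", "ECDHE-RSA-AES128-GCM-SHA256"]
    common = shared_ciphers(cloud, on_prem)
    print(common or "no common cipher: the handshake would fail")
```

An empty result from `shared_ciphers` is the case in which the hybrid connection cannot be established until the cipher configuration is aligned.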
Database cutover strategy
- Critical external-facing applications should experience minimal or zero downtime when transitioning to the cloud.
- Data migration to the cloud should be done without any loss of data.
- Applications and components should function as they did when the database was on-premises.
- When SQL Server databases are migrated, the concept of a stretch cluster can be leveraged: a DB instance is set up on the cloud as part of the same cluster as the on-prem instances, with an Always On availability group configuration.
- The existing listener names can be repurposed without any changes to the application code/configuration. The listener will direct the traffic to the cloud DB after the cutover.
- Before the cutover, the on-prem DB is the primary, and the cloud DB is the secondary.
- After the cutover, the roles switch, with the on-prem DB becoming the secondary and the cloud DB becoming the primary.
- For applications using the DB server name instead of the listener name, a CNAME entry could be added to route traffic from the old server to the new server.
- This approach requires the SQL Server version on the cloud to match the on-prem version, preventing migration to the latest version on AWS or RDS.
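The role switch described above can be pictured with a toy model. This is not SQL Server code and issues no real commands; `AvailabilityGroup`, the host names, and the `secondary_in_sync` flag are all hypothetical and exist only to show why applications that connect through the listener need no configuration change.

```python
from dataclasses import dataclass


@dataclass
class AvailabilityGroup:
    """Toy model: one listener name in front of a primary and a secondary."""
    listener: str
    primary: str
    secondary: str

    def resolve(self):
        """Host that connections made to the listener currently reach."""
        return self.primary

    def fail_over(self, secondary_in_sync):
        """Swap roles, refusing if the secondary has not caught up (data-loss risk)."""
        if not secondary_in_sync:
            raise RuntimeError("secondary not synchronized; cutover aborted")
        self.primary, self.secondary = self.secondary, self.primary


if __name__ == "__main__":
    ag = AvailabilityGroup("sql-listener", "onprem-sql01", "aws-sql01")
    print(ag.listener, "->", ag.resolve())  # before cutover: on-prem is primary
    ag.fail_over(secondary_in_sync=True)
    print(ag.listener, "->", ag.resolve())  # after cutover: cloud is primary
```

The listener name stays constant across the cutover; only the host behind it changes, and the former primary remains available as a secondary.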
DB migration challenges
- When a database is migrated to the cloud, it is necessary to identify all dependent apps. For instance, some might be utilizing the database without anyone’s knowledge.
- Discrepancies in firewall/security rules between cloud and on-premises setups could lead to post-migration issues, with potential blockage of traffic from specific applications.
- Databases could be accessed in several ways (SQL Auth, Integrated Auth, etc.). If there is a mismatch in permissions between cloud and on-prem, it can lead to application failure.
- Monitor the production database for a period, to find all the client applications and make necessary changes for seamless integration with the cloud DB after migration.
- Ensure that all applications are tested in lower environments with similar types of integrations (e.g., on-prem-to-cloud connections). Track all connectivity issues found in the lower environments and apply the corresponding fixes in the PROD environment before cutover.
- Properly migrate all users and logins to the cloud DB.
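The monitoring step above amounts to sampling connections over time and grouping them by client. The sketch below assumes each sample is a dictionary with `host`, `program`, and `login` keys; that record shape is an assumption for illustration (for SQL Server, comparable fields can be read from the `sys.dm_exec_sessions` view), and `known_programs` stands for whatever application inventory the migration team already has.

```python
from collections import Counter


def summarize_clients(connection_samples):
    """Count sampled connections per (host, program, login) triple."""
    return Counter(
        (s["host"], s["program"], s["login"]) for s in connection_samples
    )


def unknown_clients(summary, known_programs):
    """Clients whose program name is missing from the migration inventory."""
    return sorted(key for key in summary if key[1] not in known_programs)


if __name__ == "__main__":
    samples = [
        {"host": "web01", "program": "OrderPortal", "login": "svc_orders"},
        {"host": "web01", "program": "OrderPortal", "login": "svc_orders"},
        {"host": "rpt07", "program": "LegacyReports", "login": "sa"},
    ]
    summary = summarize_clients(samples)
    print(unknown_clients(summary, known_programs={"OrderPortal"}))
```

Sampling should cover at least one full business cycle (for example, a month-end), since rarely run clients are the ones most likely to be missed.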
Organizational processes (ARB, ISO, CAB, etc.)
Enterprise-level processes vary across organizations. Approvals from different governing bodies are necessary for production deployment.
Effort estimation for the migration should consider all these aspects:
- List all the end-to-end processes involved in migration of an application.
- Arrive at an agreement on which processes should be followed and which ones can be waived for the migration program.
Some of the typical approvals include:
- Information security office approval: Review of application architecture from a security perspective, irrespective of its on-premises security posture.
- Data governance approval: Assessment of sensitive data (e.g., PII and PHI) handling to ensure compliance with data protection standards.
- Architecture review board approval: Solution and enterprise architects validate the proposed architecture.
- Change advisory board approval: Ensures adherence to all necessary processes for an application deployment.
- Application security scans (e.g., Fortify and InsightVM): Conduct scans to identify and resolve vulnerabilities before deployment.
Automation, environment setup
When applications are migrated to the cloud, the recommended practice is to completely automate the deployment from scratch. This includes installation and setup of the required environment, software, and frameworks on the cloud server, as well as application deployment.
- The service accounts used by the on-prem applications may be old, and their passwords may no longer be available.
- The details of the existing environment for an application may not be readily available.
- The necessary installable artifacts, as well as the automation scripts to install them, may not be available.
- The deployment pipeline for cloud will likely be different from the existing deployment pipeline.
- The new process may need all application vulnerabilities (static, dynamic, etc.) to be addressed.
- Custom images (e.g., AMIs) would have to be built from scratch, and version control for them may be absent.
- Create new service accounts that mirror the old service accounts with respect to access to DBs, shared folders and other services.
- During effort estimation, clarify whether all required environment information is available, and record the answer in the assumptions.
- Estimate appropriately for the cloud environment automation, including the setup of all environment software in an automated manner.
- Estimate appropriately for IaC requirements.
- Clarify the requirements for addressing existing vulnerabilities and estimate the effort involved.
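One way to keep DEV, TEST, UAT, and PROD consistent is to derive every environment from a single baseline plus a small set of overrides, which an IaC tool can then consume. The sketch below is only an illustration of that idea: the setting names, instance sizes, and the `render_environment` function are hypothetical and not tied to any particular IaC product.

```python
# Hypothetical baseline shared by every environment.
BASE = {
    "instance_type": "t3.medium",
    "min_instances": 1,
    "max_instances": 2,
    "multi_az": False,
}

# Only the values that differ from the baseline are listed per environment.
ENV_OVERRIDES = {
    "DEV": {},
    "TEST": {},
    "UAT": {"multi_az": True},
    "PROD": {
        "instance_type": "m5.large",
        "min_instances": 2,
        "max_instances": 6,
        "multi_az": True,
    },
}


def render_environment(env):
    """Merge the baseline with one environment's overrides."""
    overrides = ENV_OVERRIDES[env]  # KeyError for an undefined environment
    unknown = set(overrides) - set(BASE)
    if unknown:
        # Catch typos such as "mutli_az" before they reach a deployment.
        raise ValueError(f"unknown settings for {env}: {sorted(unknown)}")
    return {"environment": env, **BASE, **overrides}


if __name__ == "__main__":
    for name in ENV_OVERRIDES:
        print(render_environment(name))
```

Because every environment has the same keys, differences between, say, UAT and PROD are visible in one place instead of being scattered across manually built servers.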
In AWS, the recommended practice is to use Auto Scaling groups (ASGs) instead of standalone EC2 instances. This makes it easier to terminate older instances and launch new ones with the latest security patches and updates. Even during an application deployment, the older EC2 instance is terminated and a new EC2 instance is created to host the newer version of the application.
- This approach can cause considerable downtime for an application (e.g., one hour), which is not acceptable for critical applications.
- Implement a blue-green strategy, where the older instances (Blue) and their load balancers keep running while the newer version is deployed on new instances (Green). Once all sanity checks pass on the Green instances, traffic can be switched from Blue to Green.
- The old instance (Blue) may be kept running for a few days, to enable a rollback if the new instance (Green) encounters issues.
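The control flow of the blue-green switch can be sketched as follows. This is a simplified model and makes no AWS calls: `BlueGreenRouter`, the target names, and the check functions are hypothetical, and in a real deployment the switch would be performed at the load balancer or DNS layer.

```python
class BlueGreenRouter:
    """Toy traffic router that gates the switch on sanity checks."""

    def __init__(self, blue, green):
        self.targets = {"blue": blue, "green": green}
        self.live = "blue"  # Blue serves traffic until Green is verified

    def live_target(self):
        return self.targets[self.live]

    def switch_to_green(self, sanity_checks):
        """Switch only if every check passes; return the names of failed checks."""
        green = self.targets["green"]
        failed = [name for name, check in sanity_checks.items() if not check(green)]
        if not failed:
            self.live = "green"
        return failed

    def roll_back(self):
        """Return traffic to Blue, which is kept running for this purpose."""
        self.live = "blue"


if __name__ == "__main__":
    router = BlueGreenRouter(blue="asg-blue", green="asg-green")
    checks = {"health_endpoint": lambda target: True}  # placeholder check
    print(router.switch_to_green(checks), router.live_target())
```

Because Blue is never modified during the deployment, the rollback is a routing change rather than a redeployment.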
Older versions of application framework
- Applications running on an older application framework (e.g., .NET 1.0)
- Source code unavailability of such applications
- In such cases, it may not be possible to deploy the application on the latest OS on the cloud. Utilize an image on the cloud that runs on an OS (e.g., Windows Server 2016 instead of Windows Server 2019) that supports the older framework version.
Batch job migration strategy
- Certain critical batch jobs cannot tolerate any downtime.
- There cannot be two instances of the same job running at the same time.
- Disable automatic instance refresh, which can cause downtime in production.
- Implement a strategy where deployment is done on the existing server instance instead of creating a new server instance.
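The requirement that two instances of the same job never run at once can be enforced inside the job itself, as an extra safeguard alongside the deployment strategy. The sketch below uses an exclusively created lock file; the names are illustrative, and it assumes all runs of the job share the same local file system. It guards a single server only, and a lock left behind by a crashed run would have to be cleared manually.

```python
import os
from contextlib import contextmanager


class JobAlreadyRunning(RuntimeError):
    """Raised when another run of the same job holds the lock."""


@contextmanager
def single_instance(lock_path):
    """Hold an exclusive lock file for the duration of the job."""
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise JobAlreadyRunning(lock_path) from None
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for diagnostics
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)
```

A job would wrap its body in `with single_instance(path):` and treat `JobAlreadyRunning` as a signal to exit without doing any work.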
Cookies in iFrame or third-party sites
- Older applications often use features such as iFrame, where components from one domain are loaded into another domain's application. These apps may rely on session cookies for proper functioning.
- When upgrading an application before migration (e.g., .NET framework upgrade), the newer framework may enforce stricter security, blocking cookies from being shared between different domains. Consequently, the migrated application may not function as intended.
- Use tools such as Fiddler to intercept the web application traffic and observe the cookies sent and received between successive requests and responses.
- Implement the necessary changes in the app, typically in the configuration (e.g., the SessionState element), to enable cookie sharing between domains.
- On-prem timeout/idle time settings may differ for load balancers, web servers, etc. If these settings are different from those in the cloud when dealing with long-running service calls, DB calls, etc., timeout issues may arise.
- Maintaining shared paths/folders on-premises can lead to performance problems, especially during the transfer of large files.
- Identify the on-prem timeout/idle time settings for the various components and ensure similar values are applied in the cloud.
- Replace on-prem shared paths/folders with an S3 + Storage Gateway implementation. However, make sure to identify and update the upstream/downstream apps that use the shared path.
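The timeout comparison above lends itself to a simple check once the settings have been collected. In this sketch the component names and values are hypothetical, all values are assumed to be in seconds, and `timeout_gaps` is an illustrative helper; gathering the actual settings from load balancers and web servers is left out.

```python
def timeout_gaps(on_prem, cloud):
    """Components whose cloud timeout is missing or shorter than on-prem.

    Returns {component: (on_prem_seconds, cloud_seconds_or_None)}.
    """
    gaps = {}
    for component, seconds in on_prem.items():
        cloud_seconds = cloud.get(component)
        if cloud_seconds is None or cloud_seconds < seconds:
            gaps[component] = (seconds, cloud_seconds)
    return gaps


if __name__ == "__main__":
    # Example values only.
    on_prem = {"load_balancer_idle": 300, "web_request": 120, "db_command": 600}
    cloud = {"load_balancer_idle": 60, "web_request": 120}
    for component, (old, new) in timeout_gaps(on_prem, cloud).items():
        print(f"{component}: on-prem {old}s, cloud {new}")
```

Any component reported here is a candidate for timeout errors on long-running service or DB calls after the migration.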
A successful migration strategy requires thorough consideration of requirements and cost-vs-benefit analysis. While a re-engineered or re-architected target state could have significant advantages, the associated costs may pose challenges. The lift-and-shift approach is quicker and easier but involves execution challenges. Companies must prepare mitigation plans to address these challenges while striking the right balance between benefits and costs for a seamless and efficient cloud migration.