Cloud-to-cloud migration is not for the faint of heart. However, after a merger or acquisition, it is sometimes unavoidable.
This article examines our work for Zego, a provider of flexible insurance for commercial vehicles. Following Zego's acquisition of Drivit, a vehicle insurance telematics specialist, a cloud-to-cloud migration of Drivit's Azure-hosted platform began. To maximise consistency and cohesion, all systems and applications needed to be hosted on AWS. As with every cloud-to-cloud move, however, there were significant technical obstacles to overcome, and we were tasked with managing the process.
Transferring tens of millions of datafiles from Azure to AWS
Drivit's use of Azure Blob Storage to receive and analyse data about driving patterns posed a particular difficulty. As part of the move, this portion of the platform was redesigned to use AWS S3 uploads, which simplified the data flow. Nevertheless, around 45TB of existing datafiles still had to be transferred to S3 so that they could benefit from more cost-effective lifecycle management. The average file size was under 1MB, and the sheer number of files (tens of millions) made standard tools such as AWS Snowball or Rclone ineffective.
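A rough estimate shows why the file count, rather than the data volume, was the problem. Taking the 45TB total and the sub-1MB average at face value, writing $\bar{s}$ for the average file size and using decimal units, the number of files $N$ must satisfy:

```latex
N = \frac{45\ \text{TB}}{\bar{s}} > \frac{45 \times 10^{12}\ \text{B}}{10^{6}\ \text{B}} = 4.5 \times 10^{7}
\qquad \text{since } \bar{s} < 1\ \text{MB}
```

That is at least 45 million files, consistent with "tens of millions", which suggests that per-file overhead (one listing entry and one copy request per object) mattered more than raw bandwidth.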
A partitioning-based strategy
Traditional data transfer tools scan the source location before migrating and hold the resulting inventory in memory. With so many datafiles to transfer, there was a real risk that memory usage would overwhelm the system and cause it to fail.
We needed a data discovery approach that could keep the inventory outside memory while adding minimal complexity.
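One way to keep the inventory out of memory is to stream the listing straight into a queue in small batches, so that only one batch is ever held at a time. The sketch below assumes a lazy listing (for example, `ContainerClient.list_blobs` in the Azure Python SDK returns a paged iterator) and assumes the queue is Amazon SQS, whose `SendMessageBatch` call accepts at most 10 messages; `list_names` and `publish` are hypothetical hooks, not part of the original system.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List

# Assumption: the queue is Amazon SQS, where SendMessageBatch takes at most 10 messages.
BATCH_SIZE = 10


def batched(names: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` names, pulling lazily from the source iterator."""
    it = iter(names)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def stream_inventory(list_names: Callable[[], Iterable[str]],
                     publish: Callable[[List[str]], None],
                     batch_size: int = BATCH_SIZE) -> int:
    """Publish every listed name in small batches; only one batch is in memory at a time.

    `list_names` and `publish` are hypothetical hooks, e.g. wrappers around a lazy
    Azure Blob listing and a queue client.
    """
    published = 0
    for batch in batched(list_names(), batch_size):
        publish(batch)
        published += len(batch)
    return published


if __name__ == "__main__":
    def fake_listing() -> Iterator[str]:
        # Stand-in for the object-store scan: names are generated, never stored as a list.
        for i in range(25):
            yield f"user{i:03d}/trip.json"

    sizes: List[int] = []
    total = stream_inventory(fake_listing, lambda batch: sizes.append(len(batch)))
    print(total, sizes)
```

Because both the listing and the batching are generators, memory use depends on the batch size rather than on the total number of files.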
Our initial approach was a queue-based system capable of handling millions of files. Following the architecture outlined below, we iterated over the Azure object store and published a message for each datafile to the queue, where a consumer collected the message and carried out the file transfer asynchronously. Because the object store scan was long-running, we used AWS Batch to run the export_files.py script on the Fargate container platform.
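The producer/consumer pattern described above can be sketched with the standard library. This is an illustration only: in-process threads and `queue.Queue` stand in for the managed queue and its consumers, and `copy_file` is a hypothetical callable that would copy one object from Azure to S3.

```python
import queue
import threading
from typing import Callable, Iterable, List, Optional, Tuple


def run_transfer(names: Iterable[str],
                 copy_file: Callable[[str], None],
                 num_workers: int = 4) -> List[Tuple[str, Exception]]:
    """Producer/consumer sketch: the producer enqueues names, worker threads copy them.

    Returns (name, error) pairs for files whose copy raised an exception.
    """
    # Bounded queue: if the workers fall behind, the producer blocks instead of
    # accumulating the whole inventory in memory.
    work: "queue.Queue[Optional[str]]" = queue.Queue(maxsize=1000)
    failures: List[Tuple[str, Exception]] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            name = work.get()
            try:
                if name is None:  # sentinel: no more work for this thread
                    return
                copy_file(name)  # hypothetical Azure-to-S3 copy
            except Exception as exc:  # record the failure and keep consuming
                with lock:
                    failures.append((name, exc))
            finally:
                work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for name in names:  # producer: in the real pipeline, this is the object-store scan
        work.put(name)
    for _ in threads:
        work.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return failures
```

Note that the producer loop is still a single sequential scan, so the consumers can only go as fast as names are discovered.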
This strategy performed well in testing but was too slow in production: even on AWS Batch, export_files.py took many days to complete.
The problem was comparable to looking up a name in a phone directory that isn't in alphabetical order. You can start at the top of the first page and work your way down, but it will take a very long time. The scan worked the same way: it was single-threaded, so it was limited to a single CPU core and adding resources had no effect. We needed a multithreaded process.
Our final solution
Using data partitioning, we ultimately built a parallel, multithreaded approach. Because Drivit's file naming scheme incorporated unique user IDs, we split the entire dataset into 62 simple alphanumeric partitions (26 upper-case letters, 26 lower-case letters and ten digits). We then modified export_files.py to use Python threads to boost throughput. We also took the opportunity to add a filter step that removes already-migrated files from the processing queue. The resulting architecture was as follows:
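The partitioning idea can be sketched as 62 single-character prefixes scanned concurrently by a thread pool. This assumes that blob names begin with an alphanumeric user ID, so that the first character selects exactly one partition; `list_with_prefix` is a hypothetical hook that might wrap `ContainerClient.list_blobs(name_starts_with=prefix)` from the Azure Python SDK. Threads help here because listing is network-bound, and CPython threads release the GIL while waiting on I/O.

```python
import string
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

# 26 upper-case letters + 26 lower-case letters + 10 digits = 62 partitions.
PREFIXES: List[str] = list(string.ascii_uppercase + string.ascii_lowercase + string.digits)


def scan_partition(prefix: str,
                   list_with_prefix: Callable[[str], Iterable[str]],
                   publish: Callable[[str], None]) -> int:
    """Scan one partition, publishing each name; returns how many names it found."""
    count = 0
    for name in list_with_prefix(prefix):
        publish(name)
        count += 1
    return count


def parallel_scan(list_with_prefix: Callable[[str], Iterable[str]],
                  publish: Callable[[str], None],
                  max_workers: int = 16) -> Dict[str, int]:
    """Scan all 62 partitions concurrently; `publish` must be thread-safe."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        counts = pool.map(lambda p: scan_partition(p, list_with_prefix, publish), PREFIXES)
        # Consume the results inside the `with` block, before the pool shuts down.
        return dict(zip(PREFIXES, counts))
```

Single-character prefixes are disjoint, so no file is listed twice. The work is only evenly spread if user IDs start with roughly uniformly distributed characters; the `max_workers` value of 16 is an arbitrary illustrative default.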
The export_files.py AWS Batch job sends messages to the "filter queue", which the export_worker.py Lambda function reads. Depending on whether the file already exists in S3, the message is either discarded or moved to the primary queue. Finally, the import_files.py Lambda function uploads the file to S3.
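The filter and import stages can be sketched as plain functions that a Lambda handler might call. The `event` shape mirrors an SQS-triggered Lambda event (a `"Records"` list whose items carry a `"body"` string); the JSON body `{"name": ...}` is a hypothetical message format, and `already_migrated`, `forward`, `download` and `upload` are hypothetical hooks. In practice, `already_migrated` might call boto3's `head_object` on the target bucket and treat a 404 error as "not yet migrated".

```python
import json
from typing import Any, Callable, Dict


def filter_stage(event: Dict[str, Any],
                 already_migrated: Callable[[str], bool],
                 forward: Callable[[str], None]) -> Dict[str, int]:
    """Filter stage: discard names that already exist in S3, forward the rest."""
    stats = {"forwarded": 0, "discarded": 0}
    for record in event["Records"]:
        name = json.loads(record["body"])["name"]
        if already_migrated(name):
            stats["discarded"] += 1
        else:
            forward(json.dumps({"name": name}))  # message for the primary queue
            stats["forwarded"] += 1
    return stats


def import_stage(event: Dict[str, Any],
                 download: Callable[[str], bytes],
                 upload: Callable[[str, bytes], None]) -> int:
    """Import stage: copy each queued file from the source store to S3."""
    copied = 0
    for record in event["Records"]:
        name = json.loads(record["body"])["name"]
        upload(name, download(name))
        copied += 1
    return copied


if __name__ == "__main__":
    azure = {"a/1.json": b"x", "b/2.json": b"y", "c/3.json": b"z"}  # stand-in source
    s3 = {"a/1.json": b"x"}  # stand-in target; one file was migrated earlier
    primary_queue = []
    filter_event = {"Records": [{"body": json.dumps({"name": n})} for n in azure]}
    print(filter_stage(filter_event, lambda n: n in s3, primary_queue.append))
    import_event = {"Records": [{"body": body} for body in primary_queue]}
    print(import_stage(import_event, azure.__getitem__, s3.__setitem__), sorted(s3))
```

Checking the destination before copying makes the pipeline safe to re-run: files that were migrated by an earlier pass are skipped rather than copied again.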
Managing cloud-to-cloud complexity
Cloud-to-cloud migrations are notoriously difficult. With problems like those discussed in this article, it can sometimes feel as though there is no feasible solution. However, we have several years of experience managing difficult cloud migrations, and our perseverance and pragmatism helped Zego find a viable solution, culminating in the complete migration of Drivit's historic data to AWS.