International Journal of Leading Research Publication

E-ISSN: 2582-8010     Impact Factor: 9.56

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal

Migrating Spark Jobs from On-Premises to GCP Cloud Dataproc

Author(s) Suhas Hanumanthaiah
Country United States
Abstract The growing demand for scalable, cost-efficient, and agile data processing solutions has driven organizations to migrate big data workloads from on-premises environments to cloud platforms. Apache Spark, a widely adopted distributed computing framework, plays a pivotal role in processing large-scale datasets, and its migration to the cloud has become a strategic imperative. This research paper provides a comprehensive exploration of migrating Apache Spark jobs to Google Cloud Platform (GCP) using Dataproc—a fully managed, scalable, and cost-effective service for Spark and Hadoop workloads. The study evaluates various migration strategies, including lift-and-shift, cloud-native re-architecting, and hybrid approaches, while analyzing critical factors such as resource management, job scheduling, storage integration, and configuration optimization. Emphasis is placed on performance tuning through intelligent frameworks, zero-execution configuration techniques, and reinforcement learning-based optimization, all of which significantly enhance Spark performance in the cloud. Real-world case studies from domains such as healthcare, bioinformatics, and real-time analytics illustrate practical benefits including performance gains, operational efficiency, and improved scalability. The paper concludes by offering best practices for successful migration, recommendations for production readiness, and insights into future trends such as serverless computing, AI integration, and edge convergence. These findings provide a robust foundation for enterprises planning to modernize their big data infrastructure through cloud migration.
Keywords Apache Spark, Google Cloud Platform (GCP), Dataproc, Cloud Migration
Field Engineering
Published In Volume 6, Issue 1, January 2025
Published On 2025-01-08
Cite This Migrating Spark Jobs from On-Premises to GCP Cloud Dataproc - Suhas Hanumanthaiah - IJLRP Volume 6, Issue 1, January 2025. DOI 10.70528/IJLRP.v6.i1.1695
DOI https://doi.org/10.70528/IJLRP.v6.i1.1695
Short DOI https://doi.org/g9v44x
