Integration frameworks for Data Management in GCP

Girish Kurup
May 29, 2023

--

Various scenarios of integration

Within a GCP project

  1. For a single process, service, or job, use Cloud Scheduler.
  2. To integrate multiple services, processes, or jobs, use Workflows. It suits data pipelines that are defined in YAML.
  3. For ETL- or ELT-based data pipelines that are written in Python, use Cloud Composer. Cloud Composer can also manage data pipelines that run in other cloud providers or on-premises. Both Dataproc batch transformation pipelines and Dataflow transformation pipelines can be orchestrated with Cloud Composer.
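
The three rules above can be read as a small decision function. The sketch below is only an illustration: the Workload class, its field names, and the order in which the rules are checked are assumptions made for this example, not part of any GCP API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of what needs to be orchestrated."""
    job_count: int               # number of services, processes, or jobs
    is_etl_or_elt: bool = False  # True for ETL/ELT data pipelines
    python_based: bool = False   # True if the pipeline is written in Python


def choose_orchestrator(w: Workload) -> str:
    """Return the GCP service suggested by the three rules in the list."""
    if w.job_count < 1:
        raise ValueError("job_count must be at least 1")
    # Rule 3: Python-based ETL/ELT pipelines go to Cloud Composer.
    # Checking this rule first is an assumption of this sketch.
    if w.is_etl_or_elt and w.python_based:
        return "Cloud Composer"
    # Rule 1: a single process, service, or job goes to Cloud Scheduler.
    if w.job_count == 1:
        return "Cloud Scheduler"
    # Rule 2: several services, processes, or jobs go to Workflows.
    return "Workflows"
```

For example, choose_orchestrator(Workload(job_count=4)) suggests Workflows, while the same workload flagged as a Python-based ETL pipeline suggests Cloud Composer.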

Between GCP projects

Use a Kafka-based integration framework in which producers (data products) publish to their respective publish/subscribe topics. Producers publish once to Kafka topics, and consumers (downstream data products or apps) subscribe to the topics relevant to them.
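
The publish-once, subscribe-many pattern described above can be sketched with a toy in-memory broker. This is not a Kafka client: the InMemoryBroker class, the "orders" topic, and the inbox names are hypothetical and exist only to show the shape of the pattern. A real Kafka consumer pulls messages and tracks offsets, which this sketch deliberately leaves out.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class InMemoryBroker:
    """Toy stand-in for a topic-based broker; not a Kafka client."""

    def __init__(self) -> None:
        # Maps each topic name to the handlers subscribed to it.
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a consumer callback for a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver one message to every subscriber; return the delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = InMemoryBroker()
billing_inbox: List[dict] = []    # hypothetical downstream data product
analytics_inbox: List[dict] = []  # hypothetical downstream app

# Two consumers subscribe to the same topic.
broker.subscribe("orders", billing_inbox.append)
broker.subscribe("orders", analytics_inbox.append)

# The producer publishes once; both consumers receive the message.
delivered = broker.publish("orders", {"order_id": 1})
```

The producer never needs to know which consumers exist, so a new downstream data product in another project can be added by subscribing to the topic, without changing the producer.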

--

Girish Kurup

Passionate about writing. I am a technology and data science enthusiast. Reach me at girishkurup21@gmail.com.