Integration frameworks for Data Management in GCP
May 29, 2023
Various scenarios of integration
Within a GCP project
- For a single process, service, or job, use GCP Cloud Scheduler.
- To integrate multiple services, processes, or jobs, use GCP Workflows. It suits data pipelines whose orchestration is defined in YAML.
- For ETL- or ELT-based data pipelines written in Python, use Cloud Composer. Cloud Composer can also manage data pipelines that run in other cloud providers or on premises. Both Dataproc batch transformation pipelines and Dataflow transformation pipelines can be orchestrated with Cloud Composer.
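The within-project rules above can be captured in a small helper, for example to record the choice in a platform team's tooling. This is a minimal sketch. `PipelineScenario`, `choose_orchestrator`, and their fields are hypothetical names. The rule order is also an assumption about how to resolve overlapping cases: anything that runs outside GCP goes to Cloud Composer first, and only then does a single job go to Cloud Scheduler.

```python
from dataclasses import dataclass


@dataclass
class PipelineScenario:
    """Hypothetical description of what needs orchestrating inside one GCP project."""
    job_count: int                   # number of processes, services, or jobs involved
    definition_language: str         # assumed values: "python" or "yaml"
    runs_outside_gcp: bool = False   # steps in other clouds or on premises


def choose_orchestrator(s: PipelineScenario) -> str:
    """Apply the rules of thumb from the list, in an assumed priority order."""
    if s.runs_outside_gcp:
        # Cloud Composer can manage pipelines in other clouds or on premises.
        return "Cloud Composer"
    if s.job_count <= 1:
        # A single process, service, or job only needs a schedule.
        return "Cloud Scheduler"
    if s.definition_language == "python":
        # Python-based ETL/ELT pipelines (e.g. Dataproc or Dataflow steps).
        return "Cloud Composer"
    if s.definition_language == "yaml":
        # Multiple services or jobs stitched together in a YAML definition.
        return "Workflows"
    raise ValueError(f"no rule for {s.definition_language!r}")


print(choose_orchestrator(PipelineScenario(job_count=3, definition_language="python")))
```

A lookup like this is mainly useful as executable documentation. Real decisions also depend on cost, team skills, and existing tooling, which the sketch does not model.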
Between GCP projects
Use a Kafka-style publish/subscribe integration framework (Kafka, or Pub/Sub on GCP). Producers (data products) publish to their respective topics, and consumers (downstream data products or apps) subscribe to the topics relevant to them. Each producer publishes once, regardless of how many consumers read the data.
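To illustrate the publish-once, subscribe-many pattern, the sketch below models topics and subscriptions in memory. It is not the Kafka or Pub/Sub client API. `InMemoryBroker`, its methods, and the topic and subscription names are hypothetical. The sketch assumes Pub/Sub-style semantics, where every subscription on a topic receives its own copy of each message.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Toy topic/subscription fan-out model (hypothetical, not a real client API)."""

    def __init__(self):
        # topic name -> {subscription name -> queue of undelivered messages}
        self._subs = defaultdict(dict)

    def create_subscription(self, topic, subscription):
        # Each consumer (downstream data product or app) gets its own queue.
        self._subs[topic].setdefault(subscription, deque())

    def publish(self, topic, message):
        # The producer publishes once; the broker copies the message into
        # every subscription on the topic. In this model, a message published
        # before a subscription exists is not delivered to that subscription.
        for queue in self._subs[topic].values():
            queue.append(message)
        return len(self._subs[topic])  # number of subscriptions that received it

    def pull(self, topic, subscription):
        # Drain and return all pending messages for one subscription.
        queue = self._subs[topic][subscription]
        messages = list(queue)
        queue.clear()
        return messages


# Usage: one producer, two independent consumers of the same topic.
broker = InMemoryBroker()
broker.create_subscription("orders", "billing-app")
broker.create_subscription("orders", "analytics-product")
fanout = broker.publish("orders", {"order_id": 1})
billing = broker.pull("orders", "billing-app")
analytics = broker.pull("orders", "analytics-product")
```

In a real deployment the broker also handles durability, acknowledgements, and retries. Kafka consumer groups and Pub/Sub subscriptions play the role of the per-consumer queues modelled here.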