Data Files Transfer


The data files transfer are done using Airflow in Google Cloud Composer. The first scenario is to transfer data from database to data warehouse (Google BigQuery). The connection of both database source and BigQuery has to be defined first in the connection list in Airflow. By creating a pipeline through DAG (Directed Acyclic Graph) in Airflow, the automation of data file transfer can be made.

Weekly Data Aggregation

The image above shows that the extracted query result from the database are stored into the Google Cloud Storage (GCS) temporarily. Then the stored data are loaded into the desired datamart in the BigQuery. The scheduler is configured to control the frequency of data transfer.

The next scenario is transfering the flat files (mostly excel files) from Google Drive to BigQuery. In this case, the service account that has permission to particular Google Drive is utilized to download the files.

Weekly Data Aggregation

After extracting the files, there are some transformations applied in order to comply with BigQuery table format before the data is stored in GCS. Similarly, the stored data are loaded into the desired datamart in the BigQuery.

The last scenario is transfering the Google Sheets files in Google Drive by streaming directly using BigQuery external table feature.

Weekly Data Aggregation

Since the streamed external table could not be altered, hence it may be required to do another query to cleanse or modify the table and stored it into another table or view.


Copyright © 2023 Giovanni Abel Christian