Currently, when setting up a cloud storage destination (such as Google Cloud Storage), it is difficult to organize exported data into a standard partitioned folder structure (e.g., folder/YYYY/MM/DD/file.parquet), because the "Upload Path" field often treats date tags as literal strings or lacks support for granular date variables.
The Solution: I suggest implementing native support for date variables in the Upload Path field. Ideally, this would include:
- Standardized Tags: Support for tags like `{YYYY}`, `{MM}`, and `{DD}` that resolve based on the data's date range or the execution date.
- Dynamic Subfolder Creation: The ability to use the forward slash (`/`) combined with these tags to automatically generate the directory structure in the bucket.
- Hive Partitioning Format: Enabling users to define paths like `year={YYYY}/month={MM}/day={DD}/` to facilitate seamless integration with Data Lake tools like BigQuery External Tables, AWS Athena, or Spark.
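The intended resolution behavior can be sketched in a few lines. This is a hypothetical illustration of how the proposed tags could resolve, not an existing API; the function name `resolve_upload_path` is an assumption.

```python
from datetime import date

def resolve_upload_path(template: str, run_date: date) -> str:
    """Replace the proposed date tags with zero-padded values from run_date.

    Hypothetical sketch: tag names match the feature request; the resolver
    itself is assumed, not part of any current product API.
    """
    return (
        template.replace("{YYYY}", f"{run_date.year:04d}")
        .replace("{MM}", f"{run_date.month:02d}")
        .replace("{DD}", f"{run_date.day:02d}")
    )

# Standard partitioned layout
print(resolve_upload_path("folder/{YYYY}/{MM}/{DD}/file.parquet", date(2026, 1, 28)))
# -> folder/2026/01/28/file.parquet

# Hive-style layout for BigQuery External Tables / Athena / Spark
print(resolve_upload_path("year={YYYY}/month={MM}/day={DD}/file.parquet", date(2026, 1, 28)))
# -> year=2026/month=01/day=28/file.parquet
```

Note that zero-padding (`01`, not `1`) matters for both lexicographic sorting in the bucket and consistent Hive partition keys.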
Why this is important:
- Data Organization: Manually managing massive amounts of data in a single root folder is not scalable.
- Query Performance: Partitioning is essential for optimizing query costs and speed in BigQuery/Athena.
- Automation: It eliminates the need for intermediary scripts (such as Cloud Functions or Glue jobs) whose only purpose is to move files into the correct date-based folders.
Use Case Example: A user wants to export Google Ads data daily. With this feature, a configured path like google_ads/{YYYY}/{MM}/{DD}/data.parquet would automatically resolve to google_ads/2026/01/28/data.parquet on each run, without manual intervention.
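The daily-export use case could equally be implemented by mapping the proposed tags onto `strftime` codes, which keeps the resolver trivial. Again a sketch under stated assumptions: the tag-to-format mapping and function name are hypothetical.

```python
from datetime import date

# Assumed mapping from the proposed tags to strftime format codes.
TAG_TO_STRFTIME = {"{YYYY}": "%Y", "{MM}": "%m", "{DD}": "%d"}

def resolve_daily_path(template: str, execution_date: date) -> str:
    """Resolve a templated Upload Path for one scheduled run (sketch)."""
    for tag, fmt in TAG_TO_STRFTIME.items():
        template = template.replace(tag, fmt)
    return execution_date.strftime(template)

print(resolve_daily_path("google_ads/{YYYY}/{MM}/{DD}/data.parquet", date(2026, 1, 28)))
# -> google_ads/2026/01/28/data.parquet
```

A scheduler would call this once per run with that run's execution date, so each day's export lands in its own folder with no post-processing step.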
