
Google Cloud Storage Export V2 Integration

You can write job results directly to a Google Cloud Storage bucket.

Prerequisites

  • Basic knowledge of Treasure Data, including TD Toolbelt.
  • A Google Cloud Platform account with permission to create service accounts or Workload Identity Pools and to grant access to the destination bucket.

Static IP Address of Treasure Data Integration

If your security policy requires IP allowlisting, add Treasure Data's static IP addresses to your allowlist to ensure a successful connection.

The complete list of static IP addresses, organized by region, is available in the following document.

Obtain the Destination Bucket in Google Cloud Storage

Identify the destination bucket in your list of Cloud Storage buckets. Buckets are listed in lexicographic order by name.

To list the buckets in a project:

  1. Open the Cloud Storage browser in the Google Cloud Console.
  2. Select Cloud Storage in the left menu, then choose Buckets.

Buckets that belong to the currently selected project appear in the browser list.

(Optional) Create the Destination Bucket in Google Cloud Storage

To create a new storage bucket:

  1. Open the Cloud Storage browser in the Google Cloud Console.
  2. Select Create bucket to open the bucket creation form.
  3. Enter your bucket information and select Continue to complete each step:

    • Specify a Name, subject to the bucket name requirements.
    • Select a Location type and Location where the bucket data will be permanently stored.
    • Select a Default storage class for the bucket. This storage class is assigned to all objects uploaded to the bucket unless you specify another one.
    • Select an Access control model to determine how you control access to the bucket's objects. To support Workload Identity Federation, choose Uniform.
    • Optionally, set Data protection and Data encryption.
  4. Select Create.
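The bucket name requirements can be checked locally before you create the bucket. The following Python sketch covers only a simplified subset of Google's published naming rules (allowed characters, length limits, no dotted-decimal IP form, no "goog" prefix or "google" substring). `is_valid_bucket_name` is a hypothetical helper for illustration, not an authoritative validator.

```python
import ipaddress
import re

# Lowercase letters, digits, dots, dashes, underscores; must start and end
# with a letter or digit. (Simplified subset of the published rules.)
_ALLOWED = re.compile(r"[a-z0-9][a-z0-9._-]*[a-z0-9]")


def is_valid_bucket_name(name):
    """Return True if name passes this simplified set of bucket-name checks."""
    if not _ALLOWED.fullmatch(name):
        return False
    # 3-63 characters, or up to 222 if the name contains dots.
    max_len = 222 if "." in name else 63
    if not 3 <= len(name) <= max_len:
        return False
    # Each dot-separated component is limited to 63 characters.
    if any(len(part) > 63 for part in name.split(".")):
        return False
    # Reserved prefix and word (close misspellings of "google" are not checked).
    if name.startswith("goog") or "google" in name:
        return False
    # Names that parse as dotted-decimal IPv4 addresses are not allowed.
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        return True
    return False


for candidate in ["samplebucket", "Sample", "192.168.5.4", "goog-exports"]:
    print(candidate, is_valid_bucket_name(candidate))
```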

Obtain the Google JSON Credentials

The integration with Google Cloud Storage is based on server-to-server API authentication.

The service account used to generate the JSON credentials must be granted the Storage Object User role.

  1. Visit your Google Developer Console.

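  2. Select Credentials under APIs & Services in the left menu.

  3. Select Create credentials, then choose Service account.

  4. Under Permissions, add the Storage Object User role.

  5. Open the new service account, select the Keys tab, then select Add key > Create new key. Choose JSON and select Create. The JSON credentials file is downloaded to your computer.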
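Before entering the service account key in Treasure Data, you may want to sanity-check the downloaded file. This Python sketch assumes the key contains at least the fields that appear in the json_keyfile sample in the CLI section of this guide; real key files usually contain additional fields. `check_service_account_key` is a hypothetical helper.

```python
import json

# Fields present in the sample json_keyfile shown in this guide. Real key
# files usually contain more (for example project_id and token_uri).
REQUIRED_FIELDS = ("type", "private_key_id", "private_key", "client_email", "client_id")


def check_service_account_key(raw):
    """Return a list of problems found in a service account JSON key (empty if none)."""
    try:
        key = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(key, dict):
        return ["top-level value is not a JSON object"]
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS if field not in key]
    if key.get("type") != "service_account":
        problems.append('type must be "service_account"')
    if not str(key.get("private_key", "")).startswith("-----BEGIN PRIVATE KEY-----"):
        problems.append("private_key does not look like a PEM private key")
    return problems


# Usage (hypothetical file name):
# with open("service-account-key.json") as f:
#     print(check_service_account_key(f.read()) or "key looks usable")
```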

Obtain Application Default Credentials (ADC) Keyfile

  1. Under IAM & Admin > Workload Identity Pools, select Create pool, or choose an existing pool.

  2. Add an AWS provider with account ID 523683666290.

  3. Under Configure provider attributes, click Add mapping. Add an attribute with the name attribute.account and the value assertion.account, then click Save.

  4. In Workload Identity Pools, select the pool you created, then select Download config from Connected service accounts.

  5. Select the AWS provider created for account ID 523683666290, then click Download and store the config file (the Application Default Credentials keyfile).
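The downloaded config file is a JSON document. Assuming it follows Google's external-account credential format for AWS (type "external_account", an AWS subject_token_type, and an audience that names the pool and provider), the following Python sketch checks those fields and extracts the project number and pool ID, which are the values needed for the bucket permission in the next step. `parse_adc_keyfile` and the audience pattern are assumptions for illustration.

```python
import json
import re

# Assumed audience format in an AWS-based external_account config file.
_AUDIENCE = re.compile(
    r"//iam\.googleapis\.com/projects/(?P<project_number>\d+)"
    r"/locations/global/workloadIdentityPools/(?P<pool_id>[^/]+)"
    r"/providers/(?P<provider_id>[^/]+)"
)
_AWS_TOKEN_TYPE = "urn:ietf:params:aws:token-type:aws4_request"


def parse_adc_keyfile(raw):
    """Check an AWS-based ADC keyfile and return the identifiers it references."""
    cfg = json.loads(raw)
    if cfg.get("type") != "external_account":
        raise ValueError('expected "type": "external_account"')
    if cfg.get("subject_token_type") != _AWS_TOKEN_TYPE:
        raise ValueError("keyfile was not generated for an AWS provider")
    match = _AUDIENCE.fullmatch(str(cfg.get("audience", "")))
    if match is None:
        raise ValueError("unrecognized audience: " + repr(cfg.get("audience")))
    # {'project_number': ..., 'pool_id': ..., 'provider_id': ...}
    return match.groupdict()


# Usage (hypothetical file name):
# with open("adc-keyfile.json") as f:
#     print(parse_adc_keyfile(f.read()))
```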

Grant the AWS Provider Permission to Access the Destination Bucket

  1. Select the destination bucket from your bucket list, then click the Permissions tab.

  2. Click Grant access, then add the principal principalSet://iam.googleapis.com/projects/{PROJECT_NUMBER}/locations/global/workloadIdentityPools/{POOL_ID}/attribute.account/523683666290 with the Storage Object User role, where {PROJECT_NUMBER} is your Google Cloud project number and {POOL_ID} is the ID of your Workload Identity Pool.
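The principal identifier is a fixed template filled in with your project number and pool ID. A minimal Python sketch that assembles it; `td_principal_set` is a hypothetical helper, and it rejects non-numeric input because the template expects the numeric project number rather than the project ID.

```python
# AWS account ID used by Treasure Data, as given in this guide.
TD_AWS_ACCOUNT_ID = "523683666290"


def td_principal_set(project_number, pool_id, account_id=TD_AWS_ACCOUNT_ID):
    """Build the principalSet identifier to grant on the destination bucket."""
    if not str(project_number).isdigit():
        # The template needs the numeric project number, not the project ID.
        raise ValueError("project_number must be numeric")
    return ("principalSet://iam.googleapis.com/"
            f"projects/{project_number}/locations/global/"
            f"workloadIdentityPools/{pool_id}/attribute.account/{account_id}")


print(td_principal_set("123456789012", "td-export-pool"))
```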

Use the Treasure Console to Create Your Connection

Create a New Authentication

In Treasure Data, you must create and configure the data connection before running your query. As part of the data connection, you provide authentication to access the integration.

  1. Open Treasure Console.

  2. Navigate to Integrations Hub > Catalog.

  3. Search for and select Google Cloud Storage V2.

  4. Select Create Authentication.

  5. Choose an authentication method (JSON key or Workload Identity Federation) and enter the corresponding credentials.

  6. Type a name for your connection.

  7. Select Continue.

Define your Query

  1. Complete the instructions in Creating a Destination Integration.

  2. Navigate to Data Workbench > Queries.

  3. Select a query for which you would like to export data.

  4. Run the query to validate the result set.

  5. Select Export Results.

  6. Select an existing integration authentication.

  7. Define any additional Export Results details, using the integration parameters described in the next section. Your Export Results screen may differ, or there may be no additional details to fill out.

  8. Select Done.

  9. Run your query.

  10. Validate that your data moved to the destination you specified.

Integration Parameters for Google Cloud Storage

| Parameter | Mandatory | Description |
|---|---|---|
| Bucket | Yes | Google Cloud Storage bucket name. |
| File Path | Yes | Object path, including the filename. Example: path/to/filename.csv. |
| Content type | No | MIME type of the output file. Default: application/octet-stream. |
| Format | No | Output file format. Default: csv. |
| Encoders | No | Compression applied to the exported file. Default: none. |
| Public Key | Yes, if Encoders is PGP Encryption | The public key to use for encryption. |
| Key Identifier | No; applies only if Encoders is PGP Encryption | The Key ID or fingerprint (as a hexadecimal string) of the public key used for encryption. |
| Armor | No; applies only if Encoders is PGP Encryption | Whether to use ASCII armor for the encrypted output. |
| Compression Type | No; applies only if Encoders is PGP Encryption | Compression algorithm used to compress the file. Default: none. |
| Header line? | No | Write a header line with column names as the first line. Default: true. |
| Delimiter | No | Character used to separate columns. Default: default. |
| Null string | No | Substitution string for NULL values. Default: default. |
| End-of-line character | No | Line termination character. Default: CRLF. |
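To illustrate how Header line?, Delimiter, Null string, and End-of-line character shape the output file, the following Python sketch renders rows with the standard csv module. It only approximates the connector's behavior; the quoting and escaping rules Treasure Data actually applies may differ, and `render_rows` is a hypothetical helper.

```python
import csv
import io


def render_rows(columns, rows, header_line=True, delimiter=",",
                null_string="", newline="\r\n"):
    """Render rows roughly as the output options describe (illustration only)."""
    buf = io.StringIO()
    # lineterminator plays the role of End-of-line character (CRLF -> "\r\n").
    writer = csv.writer(buf, delimiter=delimiter, lineterminator=newline)
    if header_line:                      # Header line?
        writer.writerow(columns)
    for row in rows:                     # Null string replaces None values
        writer.writerow([null_string if value is None else value for value in row])
    return buf.getvalue()


print(repr(render_rows(["id", "comment"], [[1, None], [2, "ok"]],
                       null_string="NULL", newline="\n")))
# -> 'id,comment\n1,NULL\n2,ok\n'
```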

Example Query

SELECT 
  col_1
FROM 
  tbl 
WHERE col_1 != 'email'

Validating Export Results

Upon successful completion of the query, the results are automatically exported to the specified Google Cloud Storage destination.

Activate a Segment in Audience Studio

You can also send segment data to the target platform by creating an activation in the Audience Studio.

  1. Navigate to Audience Studio.
  2. Select a parent segment.
  3. Open the target segment, right-click, and then select Create Activation.
  4. In the Details panel, enter an Activation name and configure the activation according to the Integration Parameters for Google Cloud Storage section above.
  5. Customize the activation output in the Output Mapping panel.

    • Attribute Columns
      • Select Export All Columns to export all columns without making any changes.
      • Select + Add Columns to add specific columns for the export. The Output Column Name pre-populates with the same Source column name. You can update the Output Column Name. Continue to select + Add Columns to add new columns for your activation output.
    • String Builder
      • Select + Add string to create strings for export. Select from the following values:
        • String: Choose any value; use text to create a custom value.
        • Timestamp: The date and time of the export.
        • Segment Id: The segment ID number.
        • Segment Name: The segment name.
        • Audience Id: The parent segment number.
  6. Set a Schedule.

    • Select the values to define your schedule and optionally include email notifications.
  7. Select Create.

If you need to create an activation for a batch journey, review Creating a Batch Journey Activation.

Exporting Data to Google Cloud Storage V2 Using the CLI

Use the td sched:create command to create a scheduled query that sends query results to Google Cloud Storage.

With authentication mode JSON key (auth_method: json_key)

  • Specify your JSON key in the json_keyfile field, following the sample syntax below. The key must be escaped as a JSON string.
  • Use a backslash to break a line without breaking the command syntax.
'{"type":"gcs_v2","bucket":"samplebucket","file_path":"output/test.csv","format":"csv","compression":"none","header_line":false,"delimiter":",","null_string":"","newline":"CRLF","auth_method":"json_key","json_keyfile":"{\"private_key_id\": \"ABCDEFGHIJ\", \"private_key\": \"-----BEGIN PRIVATE KEY-----\\nABCDEFGHIJ\\nABCDEFGHIJ\\n-----END PRIVATE KEY-----\\n\", \"client_email\": \"ABCDEFGHIJ@developer.gserviceaccount.com\", \"client_id\": \"ABCDEFGHIJ.apps.googleusercontent.com\", \"type\": \"service_account\"}"}'

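With authentication mode Workload Identity Federation (auth_method: wif)

  • Specify your ADC keyfile in the adc_keyfile field, following the sample syntax below. The keyfile must be escaped as a JSON string.
  • Use a backslash to break a line without breaking the command syntax.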
'{"type":"gcs_v2","bucket":"samplebucket","file_path":"output/test.csv","format":"csv","compression":"none","header_line":false,"delimiter":",","null_string":"","newline":"CRLF","auth_method":"wif","adc_keyfile":"{\"universe_domain\": \"googleapis.com\"......}"}'

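For example, the following command creates a schedule named scheduled_gcs_v2 that runs daily at 06:10 and exports the query results using JSON key authentication: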

$ td sched:create scheduled_gcs_v2 "10 6 * * *" \
-d dataconnector_db "SELECT id,account,purchase,comment,time FROM data_connectors" \
-r '{"type":"gcs_v2","bucket":"samplebucket","file_path":"output/test.csv","format":"csv","compression":"none","header_line":false,"delimiter":",","null_string":"","newline":"CRLF","auth_method":"json_key","json_keyfile":"{\"private_key_id\": \"ABCDEFGHIJ\", \"private_key\": \"-----BEGIN PRIVATE KEY-----\\nABCDEFGHIJ\\nABCDEFGHIJ\\n-----END PRIVATE KEY-----\\n\", \"client_email\": \"ABCDEFGHIJ@developer.gserviceaccount.com\", \"client_id\": \"ABCDEFGHIJ.apps.googleusercontent.com\", \"type\": \"service_account\"}"}'
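The -r value is a JSON object whose json_keyfile field is itself a JSON string, so the key is encoded twice, which is why the sample contains \" and \\n sequences. Escaping this by hand is error-prone. A minimal Python sketch, assuming you have the key as a dict, that builds the value with `json.dumps` and quotes the full td sched:create command with `shlex.quote`. The helper names are hypothetical; for Workload Identity Federation you would set auth_method to wif and supply the keyfile as adc_keyfile instead.

```python
import json
import shlex


def build_result_json(bucket, file_path, json_key, **options):
    """Return the -r value for td sched:create using JSON key authentication.

    json_key is the parsed service account key (a dict). Extra keyword
    arguments (format, header_line, newline, ...) are copied into the config.
    """
    config = {
        "type": "gcs_v2",
        "bucket": bucket,
        "file_path": file_path,
        "auth_method": "json_key",
        # First encoding: the key becomes a JSON string. The second encoding
        # (json.dumps(config) below) adds the \" and \\n escaping.
        "json_keyfile": json.dumps(json_key),
    }
    config.update(options)
    return json.dumps(config)


def build_sched_command(name, cron, database, query, result_json):
    """Return a shell-safe td sched:create command line."""
    args = ["td", "sched:create", name, cron, "-d", database, query, "-r", result_json]
    return " ".join(shlex.quote(arg) for arg in args)


# Usage with placeholder values (not a real key).
key = {
    "type": "service_account",
    "client_email": "ABCDEFGHIJ@developer.gserviceaccount.com",
    "private_key": "-----BEGIN PRIVATE KEY-----\nABCDEFGHIJ\n-----END PRIVATE KEY-----\n",
}
result = build_result_json("samplebucket", "output/test.csv", key,
                           format="csv", header_line=False, newline="CRLF")
print(build_sched_command("scheduled_gcs_v2", "10 6 * * *", "dataconnector_db",
                          "SELECT id,account,purchase,comment,time FROM data_connectors",
                          result))
```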

Parameters

| Parameter | Data Type | Mandatory | Default Value | Description |
|---|---|---|---|---|
| bucket | string | yes | N/A | Google Cloud Storage bucket name. |
| file_path | string | yes | N/A | Object path, including the filename. Example: path/to/filename.csv. |
| content_type | string | no | application/octet-stream | MIME type of the output file. |
| format | string | no | csv | Output file format. Supported values: csv, tsv. |
| compression | string | no | none | Compression applied to the exported file. Supported values: 'none', 'gz', 'bzip2', 'encrypt_pgp'. |
| public_key | string | yes, if compression is encrypt_pgp | N/A | The public key to use for encryption. |
| key_identifier | string | no; applies only if compression is encrypt_pgp | N/A | The Key ID or fingerprint (as a hexadecimal string) of the public key used for encryption. |
| armor | string | no; applies only if compression is encrypt_pgp | N/A | Whether to use ASCII armor for the encrypted output. |
| compression_type | string | no; applies only if compression is encrypt_pgp | N/A | Compression algorithm used to compress the file. Supported values: 'none', 'gzip', 'bzip2', 'bzip2_built_in', 'zip_built_in', 'zlib_built_in'. |
| header_line | boolean | no | true | Write a header line with column names as the first line. Supported values: true, false. |
| delimiter | string | no | default | Character used to separate columns. Supported values: 'default', ',', '\t', '\|'. |
| null_string | string | no | default | Substitution string for NULL values. Supported values: 'default', '', '\N', 'NULL', 'null'. |
| newline | string | no | CRLF | Line termination character. Supported values: 'CRLF', 'LF', 'CR'. |
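The parameter table can be turned into a pre-flight check on a -r config before you create the schedule. This Python sketch is built only from the supported values listed above; it assumes '\t' means a tab character after JSON decoding and that '\N' is a literal backslash followed by N. `validate_config` is hypothetical and does not replace the connector's own validation.

```python
# Supported values copied from the parameter table above. Assumption: '\t'
# means a tab character after JSON decoding, and '\N' is backslash + N.
SUPPORTED = {
    "format": {"csv", "tsv"},
    "compression": {"none", "gz", "bzip2", "encrypt_pgp"},
    "compression_type": {"none", "gzip", "bzip2", "bzip2_built_in",
                         "zip_built_in", "zlib_built_in"},
    "delimiter": {"default", ",", "\t", "|"},
    "null_string": {"default", "", "\\N", "NULL", "null"},
    "newline": {"CRLF", "LF", "CR"},
}


def validate_config(cfg):
    """Return a list of problems in a gcs_v2 result config (empty if none found)."""
    errors = [f"missing required parameter: {name}"
              for name in ("bucket", "file_path") if not cfg.get(name)]
    for name, allowed in SUPPORTED.items():
        if name in cfg and cfg[name] not in allowed:
            errors.append(f"unsupported {name}: {cfg[name]!r}")
    if "header_line" in cfg and not isinstance(cfg["header_line"], bool):
        errors.append("header_line must be true or false")
    # public_key is mandatory only when PGP encryption is selected.
    if cfg.get("compression") == "encrypt_pgp" and not cfg.get("public_key"):
        errors.append("public_key is required when compression is encrypt_pgp")
    return errors


print(validate_config({"bucket": "samplebucket", "file_path": "output/test.csv",
                       "format": "json"}))
```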

Other configurations

  • The Result Export can be scheduled to upload data to a target destination periodically.
  • All import and export integrations can be added to a Treasure Workflow. The td workflow operator can be used to export a query result to a specified connector. For more information, see Workflow Operators.

References

The Embulk-encoder-Encryption document

FAQ for the GCS V2 Data Connector

Note: Please ensure that you compress your file before encrypting and uploading.

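  1. If you used a non-built-in compression type (gzip or bzip2), decrypting the file returns it in a compressed format such as .gz or .bz2, which you then need to decompress.

  2. If you used a built-in compression type (bzip2_built_in, zip_built_in, or zlib_built_in), decrypting the file returns the raw data.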
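After decrypting an export with your PGP tool, you can tell which case applies by inspecting the file's leading bytes: gzip data begins with 0x1f 0x8b and bzip2 data begins with "BZh". A minimal Python sketch, assuming the decrypted bytes are already in memory; `restore_decrypted` is a hypothetical helper, and a raw file that happens to begin with those bytes would be misidentified.

```python
import bz2
import gzip


def restore_decrypted(data):
    """Return raw data from a decrypted export, decompressing gzip or bzip2 if present."""
    if data[:2] == b"\x1f\x8b":   # gzip magic number (non-built-in gzip)
        return gzip.decompress(data)
    if data[:3] == b"BZh":        # bzip2 magic number (non-built-in bzip2)
        return bz2.decompress(data)
    return data                   # built-in compression: already raw data


# Usage (hypothetical file names):
# with open("export.csv.gz.decrypted", "rb") as f:
#     raw = restore_decrypted(f.read())
```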