Skip to content

Replacing Data Tank 1.0 with 2.0

This article covers changes around integrations when replacing Data Tank with Data Tank 2.0. As part of its commitment to security and protection of sensitive data, Treasure Data recommends reviewing the entire document before beginning any steps.

Feature Differences between Data Tank and Data Tank 2.0

This section describes the feature gaps between Data Tank and Data Tank 2.0.

FeatureStatusData Tank 1.0Data Tank 2.0
EndpointDIFFStatic IP addressWriter endpoint URI; associated IP address cannot be fixed
Database nameDIFFdatatankaciddb
Access controls (roles and schemas)DIFFTreasure Data provides tank_user and tank_integration_user accountsUsers and schemas are created in Management UI (accounts are called Access Keys); access control is managed within the UI
DDL statementsDIFFNo additional command requiredSET ROLE _owner_aciddb_<schema_name> must be executed before running DDL
SSL encryptionDIFFNot enforcedAll connections require SSL
Maintenance windowsDIFFNo regular window; support coordinates as neededCustomers define a weekly maintenance window; patches are applied during that time (see Understanding Data Tank 2.0 Maintenance)
IP allowlist sizeDIFFNo limitUp to 200 IP addresses
Engine behavior gapsDIFFBased on PostgreSQL 12Based on Aurora PostgreSQL 12.13; some engine differences apply
Custom port numberUNSUPPORTEDDefault 5432, customizable via SupportFixed to port 5432
Port forwardingUNSUPPORTEDSupport can add ports 80, 443, or 5439Not supported
IPv6 allowlistUNSUPPORTEDSupports IPv6IPv6 is not supported
ExtensionsUNSUPPORTEDplpgsql, pgcrypto, cstore_fdw, mysql_fdw, tds_fdw, foreign_table_exposer installed by defaultOnly plpgsql and pgcrypto available; SERVER, FOREIGN DATA WRAPPER, FOREIGN TABLE, and USER MAPPING objects cannot be migrated
Management UINEWNot availableWeb interface for managing configurations, roles, schemas, and monitoring
Log collectionNEWNot availablePostgreSQL logs viewable in Management UI
MonitoringNEWDashboards provided via DataDog linkDashboards available directly in Management UI

Refer to the AWS Aurora PostgreSQL extension documentation for the full list of extensions supported in Data Tank 2.0 (Aurora PostgreSQL 12.13).

Migration overview from Data Tank 1.0 to 2.0

Within the Management UI you can configure database settings, create and manage schemas and access keys, and review logs and dashboards from a single interface.

Legacy MonitoringManagement UI Monitoring
Legacy DataDog monitoring dashboardManagement UI monitoring dashboard

Change Integration with Data Tank to that with Data Tank 2.0

The following two functions can be used to import records from Data Tank or export the result of a Hive/Trino query to Data Tank.

To use these integrations, you need to create an Authentication in advance in most cases.

Authentications for Data Tank

The following Authentications are prepared by default. tank_integration_user is used as a database user for both of the following Authentications.

  • datatank
  • datatank_cstore
Default Authentications for Data Tank

You can also create user-defined Authentications.

Authentications for Data Tank 2.0

Authentications are not prepared by default. You need to create new Authentications.

Import from Data Tank

To import from Data Tank, you need to run a Bulk Load job. To run this type of job, you use either of the following. If you want to replace Data Tank with Data Tank 2.0, you must change the import source from Data Tank to Data Tank 2.0.

  • Sources (Treasure Console)
  • CLI (Toolbelt, td_load> operator of Workflow)

Sources

If you have existing Sources for Data Tank, you can see those Sources connected to the Authentication in the Treasure Console as follows. The red arrow in the image points to the number of Sources associated with the Authentication named databank.

Sources connected to Authentication in Treasure Console

If an existing Source imports from Data Tank 2.0, a new Source associated with the Authentication for Data Tank 2.0 is needed because each Source's authentication cannot be replaced. When you create a new Source that imports from Data Tank 2.0, it is recommended that the existing Source not be scheduled to run.

CLI (TD Toolbelt)

If you use CLI to import to Plazma DB using an Authentication, you use the td_authentication_id option in your YAML file as follows. The Authentication ID set in this option must be changed to change the import source. In addition, you also need to change other options like the database option.

in:
  type: postgresql
  td_authentication_id: <your_authentication_id>
  ...

If you don't use the td_authentication_id option in the YAML file, the credentials of Data Tank would be specified instead of Authentication as shown in the following example. If so, replace them with Data Tank 2.0 credentials.

in:
  type: postgresql
  host: xx.xxx.xxx.xxx
  user: tank_integration_user
  password: xxxxx
  ...

Export Result of Hive/Trino to Data Tank

Either of the following features can export the result of a Hive/Trino job to Data Tank. If you are using these features to export to Data Tank, you will need to change the destination to export to Data Tank 2.0.

  • Query Editor on Treasure Console
  • td_run> operator of Workflow
  • td> operator of Workflow
  • CLI (Toolbelt)
  • REST API

Query Editor on Treasure Console

You can configure the export settings by selecting Export Results as shown below. Then, you need to configure it again with a new Authentication to replace the existing one. Once you have deleted the existing export settings, use the Authentication for Data Tank 2.0 to make new export settings.

Export Results configuration in Query Editor

If DDL is required, you need to set _owner_<database_name>_<schema_name> to the Set Role option as follows:

Set Role option configuration for DDL

td_run> Operator of Workflow

This workflow operator calls the existing Saved Query. Therefore, you need to do the same operation as in the case of Query Editor.

td> Operator of Workflow

For this workflow operator, you specify your Authentication name for the result_connection parameter as follows. Replace this parameter with the Authentication for Data Tank 2.0 as well as any other parameters.

+export_task:
  td>: queries/sample.sql
  result_connection: data_tank # Authentication Name
  result_settings:
    database: datatank
    table: ...
    ...

In addition, if DDL is required, you need to specify the set_role option as follows.

+export_task:
  td>: queries/sample.sql
  result_connection: <your_authentication_name>
  result_settings:
    database: aciddb
    table: ...
    schema: <schema_name>
    set_role: _owner_aciddb_<schema_name>
    ...

If you don't specify your Authentication for result_connection, use the result_url parameter instead of result_connection and result_settings as shown.

+export_task:
  td>: queries/sample.sql
  result_url: postgresql://tank_integration_user:xxx@......

In this case, modify the value for the result_url parameter.

In addition, add set_role=_owner_aciddb_<schema_name> as a query parameter if a DDL statement is required.

CLI (Toolbelt) for Export

For CLI, you specify the information for export in URL format.

For example, you can export the result of the query to Data Tank by using the td query command as follows. As you specify the information for export to the -r option directly, specify the setting for Data Tank 2.0.

td query -d kazzy_test -w -r "postgresql://tank_integration_user:<password>@<ip_address>/datatank/<table_name>?mode=replace" "SELECT 1 AS col1;"

Connect to Data Tank via pg> Operator of Workflow

You can run any query statement in Data Tank from Workflow by using the pg> operator as shown in the following example.

_export:
  pg:
    host: ....
    port: 5432
    database: datatank
    user: tank_integration_user
    schema: public

+access_data_tank:
  pg>: queries/sample.sql
  ...

You need to change the values set for the options of the pg> operator in order to run the query statements in Data Tank 2.0. If you use DDL, you need to add a SET ROLE statement before the DDL statements.

Connect to Data Tanks via Third-Party Tools

To connect to your Data Tank via any third-party tool, you specify the connection information in the tool. Replace the connection information with that of Data Tank 2.0.

References