# Treasure Data CDP Integration

Treasure AI Voice can push your organization's recordings, transcripts, AI-generated summaries, and device inventory directly into your own [Treasure Data CDP](/products/customer-data-platform) database. Once the data lands in your CDP, you can join it with the rest of your customer data, build segments, drive journeys, and run analytics over what was actually said in your meetings and calls.

The integration is **push-based and on demand**: an administrator runs an export from the AI Voice Console, and Treasure AI Voice streams a snapshot of the selected data into the tables you specify. There is no standing pipeline to manage, and no data sits in an intermediate location. Your API key is used only to authenticate that single run and is never stored.

**Who can use this:** The Treasure Data CDP export is available to **Enterprise Administrators** only, and must be enabled for your organization before it appears. See [Enabling the integration](#enabling-the-integration) below.

## What gets exported

Each export writes to four logical tables in your CDP database. The `recordings`, `transcript_segments`, and `summaries` tables reflect the recordings in scope for the run; the `devices` table always reflects your full device inventory regardless of the selected date range.

Recordings are included regardless of processing status. The `status` column reflects each recording's state, and `transcription_text`, `summary_text`, and the related fields are empty for recordings that have not finished transcribing or summarizing yet.

| Table | Contents | Granularity |
|  --- | --- | --- |
| **`recordings`** | One row per recording: metadata, the full speaker-labeled transcript, the AI summary, and action items. | One row per recording |
| **`transcript_segments`** | The transcript broken into individual timed utterances, with speaker attribution and per-segment language detection. | One row per utterance |
| **`summaries`** | The AI-generated summary and action items, separated out for easy joining. Only recordings that have a summary produce a row. | One row per summarized recording |
| **`devices`** | Your organization's PLAUD device fleet: assignment, status, firmware, and last-seen timestamps. | One row per device |


Every row carries a stable `uuid` derived from the source record. Because the same record always maps to the same `uuid`, you can safely re-run an export over an overlapping date range and deduplicate on `uuid` in your CDP (for example, keep the most recent row per `uuid`) to refresh state without creating duplicates.
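
The deduplication described above can be sketched in a few lines. This is a minimal illustration, not the product's own logic: it assumes rows are read in ingestion order (oldest export first), so the last row seen for a `uuid` is the most recent one. The sample `uuid` values and the `PROCESSING` status are hypothetical.

```python
from typing import Iterable


def dedupe_by_uuid(rows: Iterable[dict]) -> list[dict]:
    """Keep one row per uuid, preferring the row seen last.

    Assumption: `rows` is iterated in ingestion order (oldest export first),
    so the last row seen for a uuid is the most recent one.
    """
    latest: dict[str, dict] = {}
    for row in rows:
        # A later row with the same uuid overwrites the earlier one.
        latest[row["uuid"]] = row
    return list(latest.values())


# Hypothetical sample rows from two overlapping export runs.
rows = [
    {"uuid": "rec-1", "status": "PROCESSING", "summary_text": ""},
    {"uuid": "rec-2", "status": "COMPLETED", "summary_text": "Kickoff notes"},
    # Second run over the same date range: rec-1 has finished processing.
    {"uuid": "rec-1", "status": "COMPLETED", "summary_text": "Budget review"},
]
deduped = dedupe_by_uuid(rows)
```

In a CDP query you would typically express the same idea with a window function over `uuid`, ordered by whatever ingestion marker your pipeline records.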

Every row also carries a `time` column (Unix epoch seconds) that Treasure Data CDP uses for native time partitioning, so analysts can efficiently filter by time. The `time` value is anchored to the most meaningful timestamp for each table — `recorded_at` for recordings and summaries, the utterance start for segments, and the last heartbeat (falling back to provisioning time) for devices.
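
To sanity-check `time` values against the ISO 8601 columns, a conversion along these lines may help. It is a sketch under stated assumptions: the helper names are hypothetical, timestamps without an offset are treated as UTC, and the device fallback simply mirrors the rule described above rather than the exporter's actual code.

```python
from datetime import datetime, timezone
from typing import Optional


def iso_to_epoch_seconds(value: str) -> int:
    """Convert an ISO 8601 timestamp to Unix epoch seconds."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def device_time(last_heartbeat_at: Optional[str], provisioned_at: str) -> int:
    """Use the last heartbeat if the device has reported, else provisioning time."""
    return iso_to_epoch_seconds(last_heartbeat_at or provisioned_at)


recording_time = iso_to_epoch_seconds("2024-01-15T09:30:00Z")
never_reported = device_time("", "2024-01-01T00:00:00Z")
```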

## Schema reference

The schema is owned by Treasure AI Voice. Columns are appended to your CDP tables on first write — you do not need to pre-declare them.

### `recordings`

| Column | Type | Description |
|  --- | --- | --- |
| `time` | long | Unix epoch seconds of `recorded_at`. Used for time partitioning. |
| `uuid` | string | Stable identifier for this recording row. Deduplicate on this. |
| `id` | string | Recording ID (matches `recording_id` in the other tables). |
| `organization_id` | string | Your organization's identifier. |
| `recorded_at` | string | ISO 8601 timestamp of when the recording was captured. |
| `user_email` | string | Email of the member who created the recording. |
| `user_name` | string | Display name of the member who created the recording. |
| `file_name` | string | Original file name of the recording. |
| `duration_seconds` | long | Length of the recording in seconds. |
| `status` | string | Processing status (e.g., `COMPLETED`). |
| `transcription_language` | string | Primary detected language of the transcription. |
| `transcription_text` | string | The full transcript, rendered as speaker-labeled, speaker-grouped text — the same form an administrator sees when opening the recording in the AI Voice Console. |
| `summary_text` | string | The AI-generated summary of the recording. |
| `action_items` | array<string> | AI-extracted action items. |
| `inferred_name` | string | AI-inferred title for the recording. |
| `tags` | array<string> | Tags applied to the recording. |
| `device_serial_number` | string | Serial number of the PLAUD device used to capture the recording. |
| `team_id` | string | Identifier of the team the recording is associated with, if any. |


### `transcript_segments`

| Column | Type | Description |
|  --- | --- | --- |
| `time` | long | Unix epoch seconds of the utterance start. Used for time partitioning. |
| `uuid` | string | Stable identifier for this segment row. |
| `recording_id` | string | The parent recording's ID. Join to `recordings.id`. |
| `organization_id` | string | Your organization's identifier. |
| `segment_index` | long | Zero-based position of the segment within the recording. |
| `start_ms` | long | Segment start offset, in milliseconds from the beginning of the recording. |
| `end_ms` | long | Segment end offset, in milliseconds from the beginning of the recording. |
| `speaker_id` | string | Raw speaker identifier from diarization (e.g., `spk_0`). |
| `speaker_label` | string | Human-friendly speaker name when one has been assigned in the AI Voice Console; otherwise empty. |
| `language` | string | Detected language of the segment. |
| `language_probability` | double | Confidence of the language detection for the segment. |
| `text` | string | The transcribed text of the segment. |
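
As an illustration of how these columns fit together, the sketch below rebuilds a speaker-grouped transcript from segment rows. The output format (`Speaker: text`) and the sample rows are assumptions for demonstration; the exact rendering used for `recordings.transcription_text` may differ.

```python
def render_transcript(segments: list[dict]) -> str:
    """Group consecutive segments by speaker and render labeled text."""
    ordered = sorted(segments, key=lambda s: s["segment_index"])
    blocks: list[tuple[str, list[str]]] = []
    for seg in ordered:
        # Fall back to the raw diarization id when no label is assigned.
        speaker = seg["speaker_label"] or seg["speaker_id"]
        if blocks and blocks[-1][0] == speaker:
            # Same speaker as the previous segment: extend the current block.
            blocks[-1][1].append(seg["text"])
        else:
            blocks.append((speaker, [seg["text"]]))
    return "\n".join(f"{speaker}: {' '.join(texts)}" for speaker, texts in blocks)


# Hypothetical sample rows, deliberately out of order.
segments = [
    {"segment_index": 1, "speaker_id": "spk_0", "speaker_label": "Aiko", "text": "Let's start."},
    {"segment_index": 0, "speaker_id": "spk_0", "speaker_label": "Aiko", "text": "Hello everyone."},
    {"segment_index": 2, "speaker_id": "spk_1", "speaker_label": "", "text": "Sounds good."},
]
transcript = render_transcript(segments)
```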


### `summaries`

| Column | Type | Description |
|  --- | --- | --- |
| `time` | long | Unix epoch seconds of `recorded_at`. |
| `uuid` | string | Stable identifier for this summary row. |
| `recording_id` | string | The recording this summary belongs to. Join to `recordings.id`. |
| `organization_id` | string | Your organization's identifier. |
| `recorded_at` | string | ISO 8601 timestamp of the underlying recording. |
| `summary_text` | string | The AI-generated summary. |
| `action_items` | array<string> | AI-extracted action items. |
| `inferred_name` | string | AI-inferred title for the recording. |


### `devices`

| Column | Type | Description |
|  --- | --- | --- |
| `time` | long | Unix epoch seconds of the last heartbeat (or provisioning time if the device has never reported). |
| `uuid` | string | Stable identifier for this device row. |
| `serial_number` | string | The device serial number. |
| `organization_id` | string | Your organization's identifier. |
| `status` | string | Current device status (e.g., `ACTIVE`, `LOCKED`). |
| `device_model` | string | Device model (e.g., Note Pro, NotePin S). |
| `firmware_version` | string | Firmware version currently installed. |
| `assigned_user_email` | string | Email of the member the device is assigned to. |
| `assigned_user_name` | string | Display name of the member the device is assigned to. |
| `assigned_at` | string | ISO 8601 timestamp when the device was assigned. |
| `provisioned_at` | string | ISO 8601 timestamp when the device was provisioned. |
| `last_heartbeat_at` | string | ISO 8601 timestamp of the device's most recent status report. |
| `last_sync_at` | string | ISO 8601 timestamp of the device's most recent sync with the mobile app. |


## Prerequisites

Before running an export, make sure you have:

- An **Enterprise Administrator** account on Treasure AI Voice.
- The integration **enabled** for your organization (see below).
- A **Treasure Data CDP API key** that holds the **IMPORT** role and can write to the target database.
- A **destination database** in your Treasure Data CDP. The export writes to the tables you specify within that database, creating them on first write.


**Use a write-only key, not your Master key.** Provide a dedicated **write-only API key** scoped to the destination database and limited to the **IMPORT** role. **Do not use your Master key**: it grants full account access and far exceeds what an export needs. The key you supply is used only to write events for a single export run; it is held in memory for the duration of the run and is never stored, logged, or read back.

Because the export writes to whatever destination the supplied key can reach, treat the Enterprise Administrator role and the keys used here as part of your data-governance controls — restrict who holds the role and which databases the key can write to.

**Japan region only.** Treasure AI Voice exports to the **Japan (AP01)** region of Treasure Data CDP only. Make sure your destination database lives in the Japan region; keys and databases in other regions will not accept the export.

## Enabling the integration

The integration is **off by default**. To turn it on:

1. Sign in to the AI Voice Console as an Enterprise Administrator.
2. Open the **Export** page.
3. Select the **Treasure Data CDP** destination.
4. Click **Enable export**.


If you do not see the option to enable it, contact your Treasure AI representative.

## Running an export

1. On the **Export** page, with **Treasure Data CDP** selected, enter:
   - **API Key**: your Treasure Data CDP API key (IMPORT role).
   - **Database**: the destination database name (for example, `ai_voice_export`).
2. (Optional) Expand **Advanced** to customize:
   - **Recordings table** and **Transcript segments table** names (defaults: `recordings`, `transcript_segments`).
   - **From** / **To** date range to limit which recordings are exported. The range applies to when each recording was captured, and the **To** date is inclusive through the end of that day.
3. Click **Export**.


The AI Voice Console reports progress while it reads your recordings, joins transcript data, and sends the events to Treasure Data CDP. A typical run completes in under a minute.

**Date range and devices.** The **From** / **To** range filters recordings, transcript segments, and summaries. The **devices** snapshot always reflects your full fleet, so you will see device rows even when no recordings match the selected range.
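
One way to reason about the inclusive **To** date is to treat the range as a half-open interval that ends at midnight of the following day. The sketch below assumes day boundaries are evaluated in UTC, which this page does not specify; the function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone


def range_to_epoch_bounds(from_date: date, to_date: date) -> tuple[int, int]:
    """Return (start, end_exclusive) epoch seconds for an inclusive date range.

    Assumption: day boundaries are evaluated in UTC.
    """
    start = datetime.combine(from_date, time.min, tzinfo=timezone.utc)
    # Inclusive "To" date: the range ends at midnight of the following day.
    end = datetime.combine(to_date + timedelta(days=1), time.min, tzinfo=timezone.utc)
    return int(start.timestamp()), int(end.timestamp())


def in_range(recorded_epoch: int, bounds: tuple[int, int]) -> bool:
    """Check whether a recording's capture time falls inside the bounds."""
    start, end_exclusive = bounds
    return start <= recorded_epoch < end_exclusive


bounds = range_to_epoch_bounds(date(2024, 1, 1), date(2024, 1, 15))
```

With these bounds, a recording captured at 09:30 UTC on January 15 is in scope, while one captured at exactly midnight on January 16 is not.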

## Understanding the result

Each run finishes in one of these states:

| Result | Meaning |
|  --- | --- |
| **Done** | All rows in scope were accepted by Treasure Data CDP. The summary shows how many recordings and segments were sent. |
| **Done with some rejections** | The export completed, but Treasure Data CDP rejected some rows. The summary shows the accepted and rejected counts. Review the audit log or your Treasure Data CDP table receipts to investigate the rejected rows. |
| **No recordings to export** | Nothing matched the selected scope: either your organization has no recordings yet, or the date range filtered them all out. (A device snapshot may still be sent.) |
| **API key rejected** | Treasure Data CDP did not accept the API key. Double-check it and try again. |
| **API key lacks the IMPORT role** | The key is valid but cannot write events. Ask your Treasure AI administrator to grant the IMPORT role to the key, or use a key scoped to the destination database. |
| **A recording is too large to send** | A single recording's transcript exceeded Treasure Data CDP's per-request size limit, so the export stopped. This is rare and affects only unusually long transcripts. Narrow the date range to exclude that recording, or contact your Treasure AI representative. |


Every export run — including its outcome and the row counts — is recorded in the [Audit Logs](/products/ai-voice/administrator-guide#audit-logs) under the `TD_CDP_EXPORT` event type. The API key is never written to the audit log.

## Querying the exported data

The `transcript_segments` and `summaries` tables both carry a `recording_id` that matches `recordings.id`, and `recordings.device_serial_number` matches `devices.serial_number`, so you can reconstruct any view of your meeting data in your CDP. For example:

- Join `summaries` to `recordings` on `summaries.recording_id = recordings.id` to get summaries alongside the full recording metadata.
- Join `transcript_segments` to `recordings` on `transcript_segments.recording_id = recordings.id`, ordered by `segment_index`, to reconstruct or analyze the conversation utterance by utterance.
- Group `transcript_segments` by `speaker_label` (or `speaker_id`) and sum `end_ms - start_ms` to analyze talk time per participant.
- Join `recordings` to `devices` on `recordings.device_serial_number = devices.serial_number` to attribute conversations to devices and assigned users.
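
The joins above can be tried out locally before you write them against your CDP. The sketch below uses Python's built-in `sqlite3` module as a stand-in query engine, with a cut-down version of the schema and hypothetical sample rows. The SQL dialect of your CDP query engine may differ, so treat the query as a starting point rather than something to paste in unchanged.

```python
import sqlite3

# In-memory SQLite database standing in for the CDP (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recordings (id TEXT, inferred_name TEXT, device_serial_number TEXT);
CREATE TABLE transcript_segments (
    recording_id TEXT, segment_index INTEGER, start_ms INTEGER, end_ms INTEGER,
    speaker_id TEXT, speaker_label TEXT
);
CREATE TABLE devices (serial_number TEXT, assigned_user_email TEXT);

-- Hypothetical sample data.
INSERT INTO recordings VALUES ('r1', 'Weekly sync', 'SN-001');
INSERT INTO transcript_segments VALUES
    ('r1', 0, 0, 4000, 'spk_0', 'Aiko'),
    ('r1', 1, 4000, 5500, 'spk_1', ''),
    ('r1', 2, 5500, 9000, 'spk_0', 'Aiko');
INSERT INTO devices VALUES ('SN-001', 'aiko@example.com');
""")

# Talk time per speaker, attributed to the recording and the device's assignee.
talk_time = conn.execute("""
    SELECT r.inferred_name,
           COALESCE(NULLIF(s.speaker_label, ''), s.speaker_id) AS speaker,
           SUM(s.end_ms - s.start_ms) / 1000.0 AS talk_seconds,
           d.assigned_user_email
    FROM transcript_segments AS s
    JOIN recordings AS r ON s.recording_id = r.id
    LEFT JOIN devices AS d ON r.device_serial_number = d.serial_number
    GROUP BY r.id, r.inferred_name,
             COALESCE(NULLIF(s.speaker_label, ''), s.speaker_id),
             d.assigned_user_email
    ORDER BY talk_seconds DESC
""").fetchall()
```

The `LEFT JOIN` to `devices` keeps recordings whose serial number has no matching device row, which is a deliberate choice for this sketch.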


**Refreshing data.** To keep your CDP current, re-run the export periodically over the relevant date range and deduplicate on `uuid` in your downstream models. Re-running over the same scope is safe because matching records keep the same `uuid`.