
Bulk Data Import

Bulk data import loads large amounts of data into a database in a relatively short period of time. When you need to load a lot of data all at once, inserting one row at a time is inefficient; bulk import operations use more efficient methods.

Bulk operations might bypass triggers and integrity checks (such as constraints). Bypassing these can significantly improve data loading performance.

Bulk import integration options

Use Embulk to bulk import data into Treasure AI. Treasure AI offers integrations that allow you to import data in bulk from the following:

Installing Embulk

You can import data using Treasure AI's open-source bulk data loader Embulk. Embulk helps transfer data between various databases, storage locations, file formats, and cloud services.

Prerequisites

  • Basic knowledge of Treasure AI
  • Basic knowledge of Embulk
  • Java installed (Embulk is a Java application)
  • JRuby installed and configured (Embulk v0.10.50 and v0.11.0 do not bundle JRuby; see the "JRuby" section of the article "Embulk v0.11 is coming soon")
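
The Java prerequisite can be checked from a terminal before you install anything. The following is a minimal sketch, not part of Embulk: check_cmd is a hypothetical helper name, and it only tests whether a command is on your PATH, not which version is installed.

```shell
# check_cmd is a hypothetical helper (not an Embulk command).
# It reports whether the named command can be found on PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Embulk is a Java application, so a Java runtime must be available.
check_cmd java || echo "Install a Java runtime before installing Embulk."
```

The same helper can be reused after installation, for example check_cmd embulk, to confirm that ~/.embulk/bin is on your PATH.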

Installing Embulk from the command line

curl --create-dirs -o ~/.embulk/bin/embulk -L "https://dl.embulk.org/embulk-latest.jar"
chmod +x ~/.embulk/bin/embulk
echo 'export PATH="$HOME/.embulk/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

Installing the Embulk Treasure Data plugin

Embulk plugins load data to or from various systems and file formats. See the list of Embulk plugins.

Install the embulk-output-td plugin (imports records to Treasure AI):

embulk gem install embulk-output-td
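
Once the plugin is installed, an Embulk configuration file can name td as its output type. The following is a minimal sketch that writes such a file from the shell. The input path, column names, database, table, and API key are all placeholders assumed for illustration; replace them with your own values and check the plugin documentation for the full option list.

```shell
# Write a sample config.yml. Every value below is a placeholder.
cat > config.yml <<'EOF'
in:
  type: file
  path_prefix: ./data/sample_        # assumed location of local CSV files
  parser:
    type: csv
    skip_header_lines: 1
    columns:
      - {name: time, type: timestamp, format: "%Y-%m-%d %H:%M:%S"}
      - {name: message, type: string}
out:
  type: td                           # provided by embulk-output-td
  apikey: YOUR_API_KEY               # placeholder, do not commit real keys
  endpoint: api.treasuredata.com
  database: my_database              # placeholder database name
  table: my_table                    # placeholder table name
  time_column: time
EOF

# Next steps, once the placeholders are replaced:
#   embulk preview config.yml   # dry run that shows the parsed records
#   embulk run config.yml       # performs the import
```

Running embulk preview first is a cheap way to confirm that the parser settings match your files before any data is uploaded.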

Using a proxy server

If uploads fail, check whether your network routes traffic through a proxy. If it does, pass the proxy settings to the JVM with -J-D command-line options:

embulk -J-Dhttp.proxyHost=HOST -J-Dhttp.proxyPort=PORT -J-Dhttp.proxyUser=USER -J-Dhttp.proxyPassword=PASS run config.yml
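
If you run imports from scripts, the proxy options can be assembled from shell variables instead of being typed out each time. The following is a minimal sketch: proxy_flags and the PROXY_* variable names are hypothetical, chosen for this example, and the values must not contain spaces. It prints the command rather than running it.

```shell
# proxy_flags is a hypothetical helper. It builds the -J-D options from
# PROXY_HOST and PROXY_PORT, and adds credentials only if PROXY_USER is set.
proxy_flags() {
  flags="-J-Dhttp.proxyHost=${PROXY_HOST} -J-Dhttp.proxyPort=${PROXY_PORT}"
  if [ -n "${PROXY_USER:-}" ]; then
    flags="$flags -J-Dhttp.proxyUser=${PROXY_USER} -J-Dhttp.proxyPassword=${PROXY_PASS:-}"
  fi
  echo "$flags"
}

# Example values (placeholders).
PROXY_HOST=proxy.example.com
PROXY_PORT=8080

# Print the resulting command; drop the echo to execute it.
echo "embulk $(proxy_flags) run config.yml"
```

Note that credentials passed on the command line can be visible to other users on the same machine through the process list, so prefer an unauthenticated proxy route or a restricted host where possible.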

Using environment variables with bulk import

See Using Environment Variables with Bulk Import.