{"templateId":"markdown","sharedDataIds":{"sidebar":"sidebar-sidebars.yaml"},"props":{"metadata":{"markdoc":{"tagList":["admonition"]},"redocly_category":"Integrations","type":"markdown"},"seo":{"title":"Embulk Bulk Import From Tsv Files","description":"Treasure Data Product Documentation · Collect and Unify · Segment and Activate · Experiment and Analyze · Decisioning Automate with AI Scale and Trust.","siteUrl":"https://docs.treasuredata.com","lang":"en-US","llmstxt":{"hide":false,"sections":[{"title":"Table of contents","includeFiles":["**/*"],"excludeFiles":[]}],"excludeFiles":[]}},"dynamicMarkdocComponents":[],"compilationErrors":[],"ast":{"$$mdtype":"Tag","name":"article","attributes":{},"children":[{"$$mdtype":"Tag","name":"Heading","attributes":{"level":1,"id":"embulk-bulk-import-from-tsv-files","__idx":0},"children":["Embulk Bulk Import From Tsv Files"]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["You can import data from TSV files into Treasure Data using ",{"$$mdtype":"Tag","name":"em","attributes":{},"children":["Embulk"]},", an open-source bulk data loader."]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":[{"$$mdtype":"Tag","name":"img","attributes":{"src":"/assets/image-20191021-194448.33463c75cf3cb8b1a37595872537be60b30cee4f234d918acbc8fa2cb005f933.d2564ff8.png","alt":""},"children":[]}]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":2,"id":"prerequisites","__idx":1},"children":["Prerequisites"]},{"$$mdtype":"Tag","name":"ul","attributes":{},"children":[{"$$mdtype":"Tag","name":"li","attributes":{},"children":["Basic knowledge of Treasure Data."]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":["Basic Knowledge of 
",{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"http://www.embulk.org/docs/"},"children":["Embulk"]},"."]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"https://www.jruby.org/download"},"children":["JRuby"]}," needs to be installed and configured. Embulk v0.10.50 and v0.11.0 do not include JRuby, so you will need to manually install and configure it. (For details on configuring embulk with JRuby see the  \"JRuby\" section of ",{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"https://www.embulk.org/articles/2023/04/13/embulk-v0.11-is-coming-soon.html"},"children":["Embulk v0.11 is coming soon"]},". )"]}]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":2,"id":"what-is-embulk","__idx":2},"children":["What is Embulk?"]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["Embulk enables you to transfer data between various databases, storage locations, file formats, and cloud services."]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":2,"id":"how-to-install-embulk","__idx":3},"children":["How to Install Embulk"]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":3,"id":"linux-mac-and-bsd","__idx":4},"children":["Linux, Mac and BSD"]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["Embulk is a Java application. 
Make sure that Java is installed."]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["The following four commands install Embulk to your home directory:"]},{"$$mdtype":"Tag","name":"CodeBlock","attributes":{"data-language":"bash","header":{"controls":{"copy":{}}},"source":"curl --create-dirs -o ~/.embulk/bin/embulk -L \"http://dl.embulk.org/embulk-latest.jar\"\nchmod +x ~/.embulk/bin/embulk\necho 'export PATH=\"$HOME/.embulk/bin:$PATH\"' >> ~/.bashrc\nsource ~/.bashrc\n","lang":"bash"},"children":[]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":3,"id":"windows","__idx":5},"children":["Windows"]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["Embulk is a Java application. Make sure that Java is installed."]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["You can download ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["embulk.bat"]}," using the following command in cmd.exe or PowerShell.exe:"]},{"$$mdtype":"Tag","name":"CodeBlock","attributes":{"header":{"controls":{"copy":{}}},"source":"PowerShell -Command \"& {Invoke-WebRequest http://dl.embulk.org/embulk-latest.jar -OutFile embulk.bat}\"\n"},"children":[]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":2,"id":"how-to-install-treasure-data-plugin","__idx":6},"children":["How to install Treasure Data Plugin"]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["You can use plugins to load data to and from various systems and file formats. 
For publicly released plugins, see the ",{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"https://plugins.embulk.org/"},"children":["list of plugins by category"]},"."]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["The following command installs the embulk-output-td plugin, which imports records into Treasure Data."]},{"$$mdtype":"Tag","name":"CodeBlock","attributes":{"data-language":"bash","header":{"controls":{"copy":{}}},"source":"embulk gem install embulk-output-td\n","lang":"bash"},"children":[]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":1,"id":"create-a-seed-configuration-file","__idx":7},"children":["Create a Seed Configuration File"]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["Using your favorite text editor, create an Embulk configuration file (for example, seed.yml) that defines the input (file) and output (TD) parameters. Example:"]},{"$$mdtype":"Tag","name":"CodeBlock","attributes":{"data-language":"yaml","header":{"controls":{"copy":{}}},"source":"in:\n    type: file\n    path_prefix: /path/to/files/sample_    # path prefix of the *.csv or *.tsv files on your local machine\nout:\n    type: td\n    apikey: xxxxxxxxxxxx\n    endpoint: api.treasuredata.com\n    database: dbname\n    table: tblname\n    time_column: time\n    mode: replace\n    # By default, mode: append is used if mode is not defined. 
Imported records are appended to the target table in this mode.\n    # mode: replace replaces the existing target table.\n    default_timestamp_format: '%Y-%m-%d %H:%M:%S'\n","lang":"yaml"},"children":[]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["The following is sample data:"]},{"$$mdtype":"Tag","name":"CodeBlock","attributes":{"header":{"controls":{"copy":{}}},"source":"id\taccount\ttime\tpurchase\tcomment\n1\t32864\t2015-01-27 19:23:49\t20150127\tembulk\n2\t14824\t2015-01-27 19:01:23\t20150127\tembulk jruby\n3\t27559\t2015-01-28 02:20:02\t20150128\t\"Embulk \"\"TSV\"\" parser plugin\"\n4\t11270\t2015-01-29 11:54:36\t20150129\tNULL\n"},"children":[]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["For further details about the additional parameters available for embulk-local-file-input, refer to ",{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"http://www.embulk.org/docs/built-in.html#local-file-input-plugin"},"children":["Embulk Local file input"]},". For details about embulk-output-td, refer to the ",{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"https://github.com/treasure-data/embulk-output-td#td-output-plugin-for-embulk"},"children":["TD output plugin for Embulk"]},"."]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":2,"id":"guess-fields-generate-loadyml","__idx":8},"children":["Guess Fields (Generate load.yml)"]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["The Embulk guess command uses ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["seed.yml"]}," to read the target file, automatically guesses the column types and settings, and creates a new file, ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["load.yml"]},", with this information."]},{"$$mdtype":"Tag","name":"CodeBlock","attributes":{"data-language":"bash","header":{"controls":{"copy":{}}},"source":"embulk guess seed.yml -o 
load.yml\n","lang":"bash"},"children":[]},{"$$mdtype":"Tag","name":"CodeBlock","attributes":{"data-language":"yaml","header":{"controls":{"copy":{}}},"source":"in:\n  type: file\n  path_prefix: /path/to/files/sample_\n  last_path: /path/to/files/sample_02.tsv\n  parser:\n    charset: UTF-8\n    newline: CRLF\n    type: csv    # the built-in csv parser also handles TSV\n    delimiter: \"\\t\"\n    quote: '\"'\n    escape: '\"'\n    null_string: 'NULL'\n    trim_if_not_quoted: false\n    skip_header_lines: 1\n    allow_extra_columns: false\n    allow_optional_columns: false\n    columns:\n      - name: id\n        type: long\n      - name: account\n        type: long\n      - name: time\n        type: timestamp\n        format: '%Y-%m-%d %H:%M:%S'\n      - name: purchase\n        type: timestamp\n        format: '%Y%m%d'\n      - name: comment\n        type: string\nout:\n  type: td\n  apikey: xxxxx\n  endpoint: api.treasuredata.com\n  database: dbname\n  table: tblname\n  time_column: time\n  mode: replace\n  default_timestamp_format: '%Y-%m-%d %H:%M:%S'\n","lang":"yaml"},"children":[]},{"$$mdtype":"Tag","name":"Admonition","attributes":{"type":"info"},"children":[{"$$mdtype":"Tag","name":"p","attributes":{},"children":["Add the \"auto_create_table: true\" parameter to load.yml so that tables that do not exist are created automatically."]}]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["The following is a sample of the auto_create_table parameter in a .yml file."]},{"$$mdtype":"Tag","name":"CodeBlock","attributes":{"data-language":"yaml","header":{"controls":{"copy":{}}},"source":"out:\n  type: td\n  apikey: your apikey\n  endpoint: api.treasuredata.com\n  database: dbname\n  table: tblname\n  time_column: created_at\n  auto_create_table: true\n  mode: append\n","lang":"yaml"},"children":[]},{"$$mdtype":"Tag","name":"Admonition","attributes":{"type":"info"},"children":[{"$$mdtype":"Tag","name":"p","attributes":{},"children":["You must create the database and table in TD before executing the 
load job. If you need to create a database, or if you need to create a table because the auto_create_table parameter is not set in the .yml file, run the following TD commands:"]}]},{"$$mdtype":"Tag","name":"CodeBlock","attributes":{"data-language":"bash","header":{"controls":{"copy":{}}},"source":"td database:create dbname\ntd table:create dbname tblname\n","lang":"bash"},"children":[]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["You can also create the database and table using ",{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"https://console.treasuredata.com/app/databases"},"children":["Treasure Console"]},"."]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["Now you can preview the data using the ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["embulk preview load.yml"]}," command. If any of the column types or data seem incorrect, you can edit the ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["load.yml"]}," file directly and preview again to verify. 
If the ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["guess"]}," command does not yield satisfactory results, you can manually change the parameters in ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["load.yml"]}," to suit your requirements by using the ",{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"http://www.embulk.org/docs/built-in.html#csv-parser-plugin"},"children":["CSV/TSV parser plugin options"]},"."]},{"$$mdtype":"Tag","name":"CodeBlock","attributes":{"header":{"controls":{"copy":{}}},"source":"embulk preview load.yml\n"},"children":[]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":2,"id":"execute-load-job","__idx":9},"children":["Execute Load Job"]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["Finally, start the import job by running the following command:"]},{"$$mdtype":"Tag","name":"CodeBlock","attributes":{"header":{"controls":{"copy":{}}},"source":"embulk run load.yml\n"},"children":[]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["The job may take anywhere from a few minutes to several hours to complete, depending on the size of the data."]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":2,"id":"appendix","__idx":10},"children":["Appendix"]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["You can also import data from TSV files using the Bulk Import program (td-import). However, td-import is not actively maintained and is a candidate for future deprecation. 
Therefore, we strongly recommend using Embulk."]}]},"headings":[{"value":"Embulk Bulk Import From Tsv Files","id":"embulk-bulk-import-from-tsv-files","depth":1},{"value":"Prerequisites","id":"prerequisites","depth":2},{"value":"What is Embulk?","id":"what-is-embulk","depth":2},{"value":"How to Install Embulk","id":"how-to-install-embulk","depth":2},{"value":"Linux, Mac and BSD","id":"linux-mac-and-bsd","depth":3},{"value":"Windows","id":"windows","depth":3},{"value":"How to install Treasure Data Plugin","id":"how-to-install-treasure-data-plugin","depth":2},{"value":"Create a Seed Configuration File","id":"create-a-seed-configuration-file","depth":1},{"value":"Guess Fields (Generate load.yml)","id":"guess-fields-generate-loadyml","depth":2},{"value":"Execute Load Job","id":"execute-load-job","depth":2},{"value":"Appendix","id":"appendix","depth":2}],"frontmatter":{"seo":{"title":"Embulk Bulk Import From Tsv Files"}},"lastModified":"2026-06-01T09:09:59.000Z","pagePropGetterError":{"message":"","name":""}},"slug":"/int/embulk-bulk-import-from-tsv-files","userData":{"isAuthenticated":false,"teams":["anonymous"]},"isPublic":true}