{"templateId":"markdown","sharedDataIds":{"sidebar":"sidebar-sidebars.yaml"},"props":{"metadata":{"markdoc":{"tagList":[]},"redocly_category":"Integrations","type":"markdown"},"seo":{"title":"Legacy Bulk Import For Aws S3","description":"Treasure Data Product Documentation · Collect and Unify · Segment and Activate · Experiment and Analyze · Decisioning Automate with AI Scale and Trust.","siteUrl":"https://docs.treasuredata.com","lang":"en-US","llmstxt":{"hide":false,"sections":[{"title":"Table of contents","includeFiles":["**/*"],"excludeFiles":[]}],"excludeFiles":[]}},"dynamicMarkdocComponents":[],"compilationErrors":[],"ast":{"$$mdtype":"Tag","name":"article","attributes":{},"children":[{"$$mdtype":"Tag","name":"Heading","attributes":{"level":1,"id":"legacy-bulk-import-for-aws-s3","__idx":0},"children":["Legacy Bulk Import For Aws S3"]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["This article explains how to import data directly from Amazon S3 to Treasure Data."]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":1,"id":"install-bulk-import","__idx":1},"children":["Install Bulk Import"]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["First, install the ",{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"https://support.treasuredata.com/hc/en-us/articles/command-line"},"children":["Toolbelt"]},", which includes the bulk loader program, on your computer."]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":2,"id":"downloads","__idx":2},"children":["Downloads"]},{"$$mdtype":"Tag","name":"ul","attributes":{},"children":[{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"https://toolbelt.treasuredata.com/win"},"children":["Toolbelt Installer for Windows"]}]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"https://toolbelt.treasuredata.com/mac"},"children":["Toolbelt Installer for 
Mac OS X"]}]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"/tools/cli-and-sdks/quickstart"},"children":["Toolbelt Installer for Linux"]}]}]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["After installation, the ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["td"]}," command is available on your computer. Open a terminal and type ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["td"]}," to run it. Also make sure that ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["java"]}," is installed. Then run ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["td import:jar_update"]}," to download the latest version of the bulk loader:"]},{"$$mdtype":"Tag","name":"CodeBlock","attributes":{"header":{"controls":{"copy":{}}},"source":"$ td\nusage: td [options] COMMAND [args]\n$ java\nUsage: java [-options] class [args...]\n$ td import:jar_update\nInstalled td-import.jar 0.x.xx into /path/to/.td/java\n"},"children":[]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":1,"id":"authenticate","__idx":3},"children":["Authenticate"]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["Log in to your Treasure Data account."]},{"$$mdtype":"Tag","name":"CodeBlock","attributes":{"header":{"controls":{"copy":{}}},"source":"$ td account -f\nEnter your Treasure Data credentials.\nEmail: xxxxx\nPassword (typing will be hidden):\nAuthenticated successfully.\nUse 'td db:create db_name' to create a database.\n"},"children":[]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":1,"id":"importing-data-from-amazon-s3","__idx":4},"children":["Importing data from Amazon S3"]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["The bulk loader can read data from files stored in Amazon S3 in all three supported file 
formats:"]},{"$$mdtype":"Tag","name":"ul","attributes":{},"children":[{"$$mdtype":"Tag","name":"li","attributes":{},"children":["CSV (default)"]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":["JSON"]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":["TSV"]}]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["Suppose you have a file called ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["data.csv"]}," on Amazon S3 with these contents:"]},{"$$mdtype":"Tag","name":"CodeBlock","attributes":{"header":{"controls":{"copy":{}}},"source":"\"host\",\"log_name\",\"date_time\",\"method\",\"url\",\"res_code\",\"bytes\",\"referer\",\"user_agent\"\n\"64.242.88.10\",\"-\",\"2004-03-07 16:05:49\",\"GET\",\"/twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables\",401,12846,\"\",\"\"\n\"64.242.88.10\",\"-\",\"2004-03-07 16:06:51\",\"GET\",\"/twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2\",200,4523,\"\",\"\"\n\"64.242.88.10\",\"-\",\"2004-03-07 16:10:02\",\"GET\",\"/mailman/listinfo/hsdivision\",200,6291,\"\",\"\"\n\"64.242.88.10\",\"-\",\"2004-03-07 16:11:58\",\"GET\",\"/twiki/bin/view/TWiki/WikiSyntax\",200,7352,\"\",\"\"\n"},"children":[]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["Run the following commands to import the CSV file:"]},{"$$mdtype":"Tag","name":"CodeBlock","attributes":{"header":{"controls":{"copy":{}}},"source":"$ td db:create my_db\n$ td table:create my_db my_tbl\n$ td import:auto \\\n  --format csv --column-header \\\n  --time-column date_time \\\n  --time-format \"%Y-%m-%d %H:%M:%S\" \\\n  --auto-create my_db.my_tbl \\\n  \"s3://s3_access_key:s3_secret_key@/my_bucket/path/to/data.csv\"\n"},"children":[]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["Here, the file location is given as an S3 path with your AWS access key ID and secret access key embedded in it."]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["Because 
",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["td import:auto"]}," runs MapReduce jobs to check for invalid rows, it takes at least ",{"$$mdtype":"Tag","name":"strong","attributes":{},"children":["1-2 minutes"]},". If the column specified with ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["--time-column"]}," contains Unix epoch timestamps, you don't need the ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["--time-format"]}," option."]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["The command above assumes that:"]},{"$$mdtype":"Tag","name":"ul","attributes":{},"children":[{"$$mdtype":"Tag","name":"li","attributes":{},"children":["The CSV file is stored in the Amazon S3 bucket ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["my_bucket"]}," under the key prefix ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["/path/to/"]},"."]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":["The first line of the file contains the column names, so the ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["--column-header"]}," option is specified. 
If the first row does not contain the column names, specify them with the ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["--columns"]}," option, and optionally specify the column types with the ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["--column-types"]}," option or with the ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["--column-type"]}," option for each column."]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":["The time column is “date_time”, specified with the ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["--time-column"]}," option."]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":["The time format is ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["%Y-%m-%d %H:%M:%S"]},", specified with the ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["--time-format"]}," option."]}]},{"$$mdtype":"Tag","name":"Heading","attributes":{"level":3,"id":"wildcards","__idx":5},"children":["Wildcards"]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["You can specify the source files for the bulk loader either as full Amazon S3 paths or with wildcards. 
Here are some examples:"]},{"$$mdtype":"Tag","name":"ul","attributes":{},"children":[{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"code","attributes":{},"children":["s3://my_bucket/path/to/data*"]}," ","All files under ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["my_bucket/path/to/"]}," with prefix ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["data"]},";"]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"code","attributes":{},"children":["s3://my_bucket/path/to/data*.csv"]}," ","All files under ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["my_bucket/path/to/"]}," with prefix ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["data"]}," and extension ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":[".csv"]},";"]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"code","attributes":{},"children":["s3://my_bucket/path/to/*.csv"]}," ","All files under ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["my_bucket/path/to/"]}," with extension ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":[".csv"]},";"]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"code","attributes":{},"children":["s3://my_bucket/path/to/*"]}," ","All files under ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["my_bucket/path/to/"]},";"]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"code","attributes":{},"children":["s3://my_bucket/path/to/*/*.csv"]}," ","All files in the direct subfolders of ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["my_bucket/path/to/"]}," with extension 
",{"$$mdtype":"Tag","name":"code","attributes":{},"children":[".csv"]},";"]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"code","attributes":{},"children":["s3://my_bucket/**/*.csv"]}," ","All files in all subfolders of ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":["my_bucket"]}," with extension ",{"$$mdtype":"Tag","name":"code","attributes":{},"children":[".csv"]},"."]}]},{"$$mdtype":"Tag","name":"p","attributes":{},"children":["For more details, see the following pages:"]},{"$$mdtype":"Tag","name":"ul","attributes":{},"children":[{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"/int/legacy-bulk-import-internals"},"children":["Bulk Import Internals"]}]},{"$$mdtype":"Tag","name":"li","attributes":{},"children":[{"$$mdtype":"Tag","name":"MarkdownLink","attributes":{"href":"/int/legacy-bulk-import-tips-and-tricks"},"children":["Bulk Import Tips and Tricks"]}]}]}]},"headings":[{"value":"Legacy Bulk Import For Aws S3","id":"legacy-bulk-import-for-aws-s3","depth":1},{"value":"Install Bulk Import","id":"install-bulk-import","depth":1},{"value":"Downloads","id":"downloads","depth":2},{"value":"Authenticate","id":"authenticate","depth":1},{"value":"Importing data from Amazon S3","id":"importing-data-from-amazon-s3","depth":1},{"value":"Wildcards","id":"wildcards","depth":3}],"frontmatter":{"seo":{"title":"Legacy Bulk Import For Aws S3"}},"lastModified":"2026-06-01T09:09:59.000Z","pagePropGetterError":{"message":"","name":""}},"slug":"/int/legacy-bulk-import-for-aws-s3","userData":{"isAuthenticated":false,"teams":["anonymous"]},"isPublic":true}