# JSON Parser

The JSON parser plugin for Treasure Data's data connectors parses JSON data. You can use the following option:

| **Option** | **Description** | **required?** |
|  --- | --- | --- |
| `stop_on_invalid_record` | Stop bulk load transaction if a file includes an invalid record. | false by default |
| invalid_string_escapes | Escape strategy for invalid JSON strings. (see below) | PASSTHROUGH by default |
| root | Specify when pointing a JSON object value as a record via JSON pointer expression | optional |
| flatten_json_array | Set to true if you want to treat elements in a JSON array as multiple records. | false by default |
| columns | Columns (see below) | optional |


If you set flatten_json_array, the job can fail because the parser will need a lot of memory to read and store the entire JSON array before processing it. In the case you have only one large JSON array, Treasure Data recommends that you either reduce the number of JSON objects in the array or extract and store the JSON object as illustrated in this example:


```jsonl
{"time":1455829282,"ip":"93.184.216.34","name":"frsyuki"}   
{"time":1455829282,"ip":"172.36.8.109","name":"sadayuki"}   
{"time":1455829284,"ip":"example.com","name":"Treasure Data"}   
{"time":1455829282,"ip":"10.98.43.1","name":"MessagePack"}
```

If you set invalid_string_escapes, and the parser encounters an invalid JSON string, it takes the following the action:

| invalid_string_escapes | converts to |
|  --- | --- |
| PASSTHROUGH | \a |
| SKIP | empty string |
| UNESCAPE | a |


The **columns** option declares the list of columns, and specifies the way the parser will extract JSON values into columns.

| Name | Description |
|  --- | --- |
| name | Name of the column. The JSON value with this name is extracted if **element_at** has not been specified. |
| type | Type of the column (same as used with CSV parser) |
| element_at | Descendant element to be extracted as the column, expressed as a relative JSON Pointer (optional) |
| format | Format of the timestamp, if element is of type timestamp |


## Configuration

You can set the parser to JSON automatically by using the `guess` command to generate the `load.yml`. Or set the `parser` section’s type to `json` in `load.yml`, for example:


```yaml
in:
  ...
  parser:
    type: json
out:
  ...
```

## Use Case 1: Load JSON File into One Column

The following example shows that the source file is JSON data and how to configure the .yml file.

- Example of Source data


```jsonl
{"time":1455829282,"ip":"93.184.216.34","name":"frsyuki"}
{"time":1455829282,"ip":"172.36.8.109","name":"sadayuki"}
{"time":1455829284,"ip":"example.com","name":"Treasure Data"}
{"time":1455829282,"ip":"10.98.43.1","name":"MessagePack"}
```

`load.yml` example is as follows.


```yaml
in:
  ...
    parser:
      type: json
    ...
```

You can show parsed JSON data which in the following example is named `record` column by `preview` command.


```bash
$ td connector:preview load.yml
```


```bash
+---------------------------------------------------------------------------+
| record:json                                                           |
+---------------------------------------------------------------------------+
| "{\"ip\":\"93.184.216.34\",\"time\":1455829282,\"name\":\"frsyuki\"}"     |
| "{\"ip\":\"172.36.8.109\",\"time\":1455829282,\"name\":\"sadayuki\"}"     |
| "{\"ip\":\"example.com\",\"time\":1455829284,\"name\":\"Treasure Data\"}" |
| "{\"ip\":\"10.98.43.1\",\"time\":1455829282,\"name\":\"MessagePack\"}"    |
+---------------------------------------------------------------------------+
```

- Import to Treasure Data table.


```bash
$ td connector:issue load.yml --database <database name> --table <table name>  --auto-create-table
```

The result table is as follows:

![](/assets/image-20191108-212134.d991d7b1b7e896a1eb07945f10b318d48413d54329b1d94b1893810b34d50977.b2039cd8.png)

The JSON data shows `string` type, but you can use [JSON Functions](https://trino.io/docs/423/functions/json.html) as follows:


```sql
SELECT 
  json_extract(
    record,
    '$.ip'
  ) AS IP
FROM
  json
```

ip

"93.184.216.34"
"172.36.8.109"
"example.com"
"10.98.43.1"

## Use Case 2: Expand a JSON File into Multiple Columns

The following example shows the source file is JSON data and how to configure the .yml file.

Example of Source data


```jsonl
{"time":1455829282,"user_info":{"ip":"93.184.216.34","name":"frsyuki"}}
{"time":1455829282,"user_info":{"ip":"172.36.8.109","name":"sadayuki"}}
{"time":1455829284,"user_info":{"ip":"example.com","name":"Treasure Data"}}
{"time":1455829282,"user_info":{"ip":"10.98.43.1","name":"MessagePack"}}
```

The sub JSON is:


```json
{"ip":"93.184.216.34","name":"frsyuki"}
```

You can configure the data connector to expand the file.


```yaml
    in:
    ...
      parser:
        type: json
        columns:
        - name: time
          type: long
        - name: ip
          type: string
          element_at: /user_info/ip
        - name: name
          type: string
          element_at: /user_info/name
    out:
    ...
```

The result is as follows:

![](/assets/expandjsonresult.288672bf26649dd8e14097d5b484d18c045e4392542c051fa979bbcd115fbd26.b2039cd8.png)

## Use Case 3: Flattening JSON array

You can flatten a JSON array by setting **flatten_json_array** to true.


```yaml
    in:
    ...
      parser:
        type: json
        flatten_json_array: true
        root: /records
```

- Example Input


```json
{
  "records": [
    {"col1":1,"col2":"test1"},
    {"col1":2,"col2":"test2"}
  ]
}
```

You can see two records by flattening JSON array as follows.


```bash
$ td connector:preview load.yml
```


```bash
+-----------------------------------+
| record:json                       |
+-----------------------------------+
| "{\"col1\":1,\"col2\":\"test1\"}" |
| "{\"col1\":2,\"col2\":\"test2\"}" |
+-----------------------------------+
2 rows in set
```