JSON

JSON (Java Script Object Notation) is used both as a file format and as a serialization method. The JSON file format is flexible and is commonly used for dynamic, nested, and semi-structured data representations.

The BLUE JSON parser supports the RFC 8259 data interchange format and supports both JSON objects and JSON object arrays.

Only the JSON Lines data format is supported by BLUE.

Foreign Data Wrapper Prerequisites

Before proceeding, ensure the following Foreign Data Wrapper (FDW) prerequisites:

  • File Existence: Verify that the file you are ingesting data from exists at the specified path.

  • Path Accuracy: Confirm that all path elements are present and correctly spelled. Any inaccuracies may lead to data retrieval issues.

  • Bucket Access Permissions: Ensure that you have the necessary access permissions to the bucket from which you are ingesting data. Lack of permissions can hinder the data retrieval process.

  • Wildcard Accuracy: If using wildcards, double-check their spelling and configuration. Misconfigured wildcards may result in unintended data ingestion.

Mapping between JSON and BLUE

A JSON field consists of a key name and a value.

Key names, which are case sensitive, are mapped to BLUE columns. Key names which do not have corresponding BLUE table columns are treated as errors by default, unless the IGNORE_EXTRA_FIELDS parameter is set to true, in which case these key names will be ignored during the mapping process.

BLUE table columns which do not have corresponding JSON fields are automatically set to null as a value.

Values may be one of the following reserved words (lower-case): false, true, or null, or any of the following data types:

JSON Data Type

Representation in BLUE

Number

TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, NUMERIC

String

TEXT

JSON Literal

NULL, TRUE, FALSE

JSON Array

TEXT

JSON Object

TEXT

Character Escaping

The ASCII 10 character (LF) marks the end of JSON objects. Use \\n to escape the \n character when you do not mean it be a new line.

Parameters

The following parameters are supported by json_fdw:

Parameter

Description

DATETIME_FORMAT

Default format is yyyy-mm-dd. Other supported date formats are:iso8601, iso8601c, dmy, ymd, mdy, yyyymmdd, yyyy-m-d, yyyy-mm-dd, yyyy/m/d, yyyy/mm/dd, d/m/yyyy, dd/mm/yyyy, mm/dd/yyyy, dd-mon-yyyy, yyyy-mon-dd.

IGNORE_EXTRA_FIELDS

Default value is false. When value is true, key names which do not have corresponding BLUE table columns will be ignored. Parameter may be used with the COPY TO and IGNORE FOREIGN TABLE statements.

COMPRESSION

Supported values are auto, gzip, and none. auto means that the compression type is automatically detected upon import. Parameter is not supported for exporting. gzip means that a gzip compression is applied. none means that no compression or an attempt to decompress will take place.

LOCATION

A path on the local filesystem, on S3, or on HDFS URI. The local path must be an absolute path that BLUE can access.

LIMIT

When specified, tells BLUE to stop ingesting after the specified number of rows. Unlimited if unset.

OFFSET

The row number from which to start ingesting.

ERROR_LOG

If when using the COPY command, copying a row fails, the ERROR LOG command writes error information to the error log specified in the ERROR LOG command.

  • If an existing file path is specified, the file will be overwritten.

  • Specifying the same file for ERROR_LOG and REJECTED_DATA is not allowed and will result in error.

  • Specifying an error log when creating a foreign table will write a new error log for every query on the foreign table.

CONTINUE_ON_ERROR

Specifies if errors should be ignored or skipped. When set to true, the transaction will continue despite rejected data. This parameter should be set together with ERROR_COUNT. When reading multiple files, if an entire file cannot be opened, it will be skipped.

ERROR_COUNT

Specifies the maximum number of faulty records that will be ignored. This setting must be used in conjunction with continue_on_error.

MAX_FILE_SIZE

Sets the maximum file size (bytes).

ENFORCE_SINGLE_FILE

Permitted values are true or false. When set to true, a single file of unlimited size is created. This single file is not limited by the MAX_FILE_SIZE parameter. false permits creating several files together limited by the MAX_FILE_SIZE parameter. Default value: false.

AWS_ID, AWS_SECRET

Specifies the authentication details for secured S3 buckets.

Automatic Schema Inference

You may let BLUE automatically infer the schema of a foreign table when using json_fdw.

Automatic Schema Inference example:

CREATE FOREIGN TABLE t
WRAPPER
  json_fdw
OPTIONS
  (LOCATION = 's3://sqream-docs/nba.json');

Examples

JSON object array:

{ "name":"Avery Bradley", "age":25, "position":"PG" }
{ "name":"Jae Crowder", "age":25, "position":"PG" }
{ "name":"John Holland", "age":27, "position":"SG" }

JSON objects:

[
{ "name":"Avery Bradley", "age":25, "position":"PG" },
{ "name":"Jae Crowder", "age":25, "position":"SF" },
{ "name":"John Holland", "age":27, "position":"SG" }
]

When using the CREATE FOREIGN TABLE statement, make sure that the table schema corresponds with the JSON file structure.

CREATE FOREIGN TABLE t (ID INT NOT NULL)
WRAPPER
  json_fdw
OPTIONS
  (location = 's3://sqream-docs/nba.json');

Tip

An exact match must exist between the BLUE and JSON types. For unsupported column types, you can set the type to any type and exclude it from subsequent queries.