JSON¶
JSON (Java Script Object Notation) is used both as a file format and as a serialization method. The JSON file format is flexible and is commonly used for dynamic, nested, and semi-structured data representations.
The BLUE JSON parser supports the RFC 8259 data interchange format and supports both JSON objects and JSON object arrays.
Only the JSON Lines data format is supported by BLUE.
Foreign Data Wrapper Prerequisites¶
Before proceeding, ensure the following Foreign Data Wrapper (FDW) prerequisites:
File Existence: Verify that the file you are ingesting data from exists at the specified path.
Path Accuracy: Confirm that all path elements are present and correctly spelled. Any inaccuracies may lead to data retrieval issues.
Bucket Access Permissions: Ensure that you have the necessary access permissions to the bucket from which you are ingesting data. Lack of permissions can hinder the data retrieval process.
Wildcard Accuracy: If using wildcards, double-check their spelling and configuration. Misconfigured wildcards may result in unintended data ingestion.
Mapping between JSON and BLUE¶
A JSON field consists of a key name and a value.
Key names, which are case sensitive, are mapped to BLUE columns. Key names which do not have corresponding BLUE table columns are treated as errors by default, unless the IGNORE_EXTRA_FIELDS
parameter is set to true
, in which case these key names will be ignored during the mapping process.
BLUE table columns which do not have corresponding JSON fields are automatically set to null
as a value.
Values may be one of the following reserved words (lower-case): false
, true
, or null
, or any of the following data types:
JSON Data Type |
Representation in BLUE |
---|---|
Number |
|
String |
|
JSON Literal |
|
JSON Array |
|
JSON Object |
|
Character Escaping¶
The ASCII 10 character (LF) marks the end of JSON objects. Use \\n
to escape the \n
character when you do not mean it be a new line.
Parameters¶
The following parameters are supported by json_fdw
:
Parameter |
Description |
---|---|
|
Default format is |
|
Default value is |
|
Supported values are |
|
A path on the local filesystem, on S3, or on HDFS URI. The local path must be an absolute path that BLUE can access. |
|
When specified, tells BLUE to stop ingesting after the specified number of rows. Unlimited if unset. |
|
The row number from which to start ingesting. |
|
If when using the
|
|
Specifies if errors should be ignored or skipped. When set to true, the transaction will continue despite rejected data. This parameter should be set together with |
|
Specifies the maximum number of faulty records that will be ignored. This setting must be used in conjunction with |
|
Sets the maximum file size (bytes). |
|
Permitted values are |
|
Specifies the authentication details for secured S3 buckets. |
Automatic Schema Inference¶
You may let BLUE automatically infer the schema of a foreign table when using json_fdw
.
Automatic Schema Inference example:
CREATE FOREIGN TABLE t
WRAPPER
json_fdw
OPTIONS
(LOCATION = 's3://sqream-docs/nba.json');
Examples¶
JSON object array:
{ "name":"Avery Bradley", "age":25, "position":"PG" }
{ "name":"Jae Crowder", "age":25, "position":"PG" }
{ "name":"John Holland", "age":27, "position":"SG" }
JSON objects:
[
{ "name":"Avery Bradley", "age":25, "position":"PG" },
{ "name":"Jae Crowder", "age":25, "position":"SF" },
{ "name":"John Holland", "age":27, "position":"SG" }
]
When using the CREATE FOREIGN TABLE
statement, make sure that the table schema corresponds with the JSON file structure.
CREATE FOREIGN TABLE t (ID INT NOT NULL)
WRAPPER
json_fdw
OPTIONS
(location = 's3://sqream-docs/nba.json');
Tip
An exact match must exist between the BLUE and JSON types. For unsupported column types, you can set the type to any type and exclude it from subsequent queries.