JSON
Overview
JSON (Java Script Object Notation) is used both as a file format and as a serialization method. The JSON file format is flexible and is commonly used for dynamic, nested, and semi-structured data representations.
The SQream DB JSON parser supports the RFC 8259 data interchange format and supports both JSON objects and JSON object arrays.
Only the JSON Lines data format is supported by SQream.
Making JSON Files Accessible to Workers
To give workers access to files, every node in your system must have access to the storage being used.
The following are required for JSON files to be accessible to workers:
For files hosted on NFS, ensure that the mount is accessible from all servers.
For HDFS, ensure that SQream servers have access to the HDFS NameNode with the correct user-id. For more information, see HDFS Environment.
For S3, ensure network access to the S3 endpoint. For more information, see Amazon Web Services.
For more information about configuring worker access, see Workload Manager.
Mapping between JSON and SQream
A JSON field consists of a key name and a value.
Key names, which are case sensitive, are mapped to SQream columns. Key names which do not have corresponding SQream table columns are treated as errors by default, unless the IGNORE_EXTRA_FIELDS
parameter is set to true
, in which case these key names will be ignored during the mapping process.
SQream table columns which do not have corresponding JSON fields are automatically set to null
as a value.
Values may be one of the following reserved words (lower-case): false
, true
, or null
, or any of the following data types:
JSON Data Type |
Representation in SQream |
---|---|
Number |
|
String |
|
JSON Literal |
|
JSON Array |
|
JSON Object |
|
Character Escaping
The ASCII 10 character (LF) marks the end of JSON objects. Use \\n
to escape the \n
character when you do not mean it be a new line.
Ingesting JSON Data into SQream
Syntax
To access JSON files, use the json_fdw
with a COPY FROM
, COPY TO
, or CREATE FOREIGN TABLE
statement.
The Foreign Data Wrapper (FDW) syntax is:
json_fdw [OPTIONS(option=value[,...])]
Parameters
The following parameters are supported by json_fdw
:
Parameter |
Description |
---|---|
|
Default format is |
|
Default value is |
|
Supported values are |
|
A path on the local filesystem, on S3, or on HDFS URI. The local path must be an absolute path that SQream DB can access. |
|
When specified, tells SQream DB to stop ingesting after the specified number of rows. Unlimited if unset. |
|
The row number from which to start ingesting. |
|
If when using the
|
|
Specifies if errors should be ignored or skipped. When set to true, the transaction will continue despite rejected data. This parameter should be set together with |
|
Specifies the maximum number of faulty records that will be ignored. This setting must be used in conjunction with |
|
Sets the maximum file size (bytes). |
|
Permitted values are |
|
Specifies the authentication details for secured S3 buckets. |
Automatic Schema Inference
You may let SQream DB automatically infer the schema of a foreign table when using json_fdw
.
For more information, follow the Automatic Foreign Table DDL Resolution page.
Automatic Schema Inference example:
CREATE FOREIGN TABLE t
WRAPPER json_fdw
OPTIONS
(
location = 'somefile.json'
)
;
Examples
JSON objects:
[
{ "name":"Avery Bradley", "age":25, "position":"PG" },
{ "name":"Jae Crowder", "age":25, "position":"SF" },
{ "name":"John Holland", "age":27, "position":"SG" }
]
Using the COPY FROM
statement:
COPY t
FROM WRAPPER json_fdw
OPTIONS
(
location = 'somefile.json'
)
;
Note that JSON files generated using the COPY TO
statement will store objects, and not object arrays.
COPY t
TO WRAPPER json_fdw
OPTIONS
(
location = 'somefile.json'
)
;
When using the CREATE FOREIGN TABLE
statement, make sure that the table schema corresponds with the JSON file structure.
CREATE FOREIGN TABLE t
(
id int not null
)
WRAPPER json_fdw
OPTIONS
(
location = 'somefile.json'
)
;
The following is an example of loading data from a JSON file into SQream:
WRAPPER json_fdw
OPTIONS
(
LOCATION = 'somefile.json'
);
Tip
An exact match must exist between the SQream and JSON types. For unsupported column types, you can set the type to any type and exclude it from subsequent queries.