JSON
JSON (Java Script Object Notation) is used both as a file format and as a serialization method. The JSON file format is flexible and is commonly used for dynamic, nested, and semi-structured data representations.
The SQreamDB JSON parser supports the RFC 8259 data interchange format and supports both JSON objects and JSON object arrays.
Only the JSON Lines data format is supported by SQreamDB.
Foreign Data Wrapper Prerequisites
Before proceeding, ensure the following Foreign Data Wrapper (FDW) prerequisites:
File Existence: Verify that the file you are ingesting data from exists at the specified path.
Path Accuracy: Confirm that all path elements are present and correctly spelled. Any inaccuracies may lead to data retrieval issues.
Bucket Access Permissions: Ensure that you have the necessary access permissions to the bucket from which you are ingesting data. Lack of permissions can hinder the data retrieval process.
Wildcard Accuracy: If using wildcards, double-check their spelling and configuration. Misconfigured wildcards may result in unintended data ingestion.
Making JSON Files Accessible to Workers
To give workers access to files, every node in your system must have access to the storage being used.
The following are required for JSON files to be accessible to workers:
For files hosted on NFS, ensure that the mount is accessible from all servers.
For HDFS, ensure that SQream servers have access to the HDFS NameNode with the correct user-id. For more information, see HDFS Environment.
For S3, ensure network access to the S3 endpoint. For more information, see Amazon Web Services.
For more information about configuring worker access, see Workload Manager.
Mapping between JSON and SQream
A JSON field consists of a key name and a value.
Key names, which are case sensitive, are mapped to SQream columns. Key names which do not have corresponding SQream table columns are treated as errors by default, unless the IGNORE_EXTRA_FIELDS
parameter is set to true
, in which case these key names will be ignored during the mapping process.
SQream table columns which do not have corresponding JSON fields are automatically set to null
as a value.
Values may be one of the following reserved words (lower-case): false
, true
, or null
, or any of the following data types:
JSON Data Type |
Representation in SQream |
---|---|
Number |
|
String |
|
JSON Literal |
|
JSON Array |
|
JSON Object |
|
Character Escaping
The ASCII 10 character (LF) marks the end of JSON objects. Use \\n
to escape the \n
character when you do not mean it be a new line.
Ingesting JSON Data into SQream
Syntax
To access JSON files, use the json_fdw
with a COPY FROM
, COPY TO
, or CREATE FOREIGN TABLE
statement.
The Foreign Data Wrapper (FDW) syntax is:
json_fdw [OPTIONS(option=value[,...])]
Parameters
The following parameters are supported by json_fdw
:
Parameter |
Description |
---|---|
|
Default format is |
|
Default value is |
|
Supported values are |
|
A path on the local filesystem, on S3, or on HDFS URI. The local path must be an absolute path that SQream DB can access. |
|
When specified, tells SQream DB to stop ingesting after the specified number of rows. Unlimited if unset. |
|
The row number from which to start ingesting. |
|
If when using the
|
|
Specifies if errors should be ignored or skipped. When set to true, the transaction will continue despite rejected data. This parameter should be set together with |
|
Specifies the maximum number of faulty records that will be ignored. This setting must be used in conjunction with |
|
Sets the maximum file size (bytes). |
|
Permitted values are |
|
Specifies the authentication details for secured S3 buckets. |
Automatic Schema Inference
SQreamDB can read the file metadata, enabling the automatic inference of column structure and data types.
CREATE FOREIGN TABLE nba
WRAPPER
json_fdw
OPTIONS
(LOCATION = 's3://sqream-docs/nba.json');
For more information, follow the CREATE FOREIGN TABLE page.
Examples
JSON objects:
[
{ "name":"Avery Bradley", "age":25, "position":"PG" },
{ "name":"Jae Crowder", "age":25, "position":"SF" },
{ "name":"John Holland", "age":27, "position":"SG" }
]
Using the COPY FROM
statement:
COPY
nba
FROM
WRAPPER
json_fdw
OPTIONS
(LOCATION = 's3://sqream-docs/nba.json');
Note that JSON files generated using the COPY TO
statement will store objects, and not object arrays.
COPY
nba
TO
WRAPPER
json_fdw
OPTIONS
(location = 's3://sqream-docs/nba.json');
When using the CREATE FOREIGN TABLE
statement, make sure that the table schema corresponds with the JSON file structure.
CREATE FOREIGN TABLE t (id int not null)
WRAPPER
json_fdw
OPTIONS
(location = 'sqream-docs.json');
The following is an example of loading data from a JSON file into SQream:
WRAPPER
json_fdw
OPTIONS
(LOCATION = 'sqream-docs.json');
Tip
An exact match must exist between the SQream and JSON types. For unsupported column types, you can set the type to any type and exclude it from subsequent queries.