Python Functions

Python functions in SQream allow you to execute custom Python logic on your data and return the results. This functionality integrates the power of Python’s data processing libraries (like Pandas) directly into your SQL queries, supporting two main types: Table Functions (which return a table) and Scalar Functions (which return a single value).

1. Syntax Overview

Python Table Functions (PTFs) and Python Scalar Functions (PSFs) are based on creating a Module.

Module Creation:

CREATE MODULE module_name OPTIONS (
        PATH = 'module.py',
        ENTRY_POINTS = [
                [
                        NAME = 'function_name',
                        ARGUMENTS [type1, type2, ...] | LIKE table_name,
                        RETURNS TABLE (col1 type1, col2 type2, ...) | LIKE table_name | SCALAR type,
                        LITERAL_PARAMETERS = number,
                        GPU = true|false
                ]
        ]
);

Syntax

Description

Mandatory/Optional

PATH

Specifies the file path to your Python script on the server.

Mandatory

ENTRY_POINTS

A list of the Python functions within the script that can be called from SQream.

Mandatory

NAME

The name of the Python function.

Mandatory

ARGUMENTS

Can specify explicit types [int, text, float] or reference existing table structure LIKE table_name.

Mandatory

RETURNS

The RETURNS clause supports the following options: 1. TABLE (column_name data_type, ...) – Defines an explicit table structure. 2. LIKE table_name – Inherits structure from an existing table. 3. SCALAR data_type – Specifies a scalar return type.

Mandatory

GPU=true/false

Whether the function executes on GPU or CPU. Default is CPU.

Optional

LITERAL_PARAMETERS

Number of literal parameters the function accepts.

Optional

Notes:

1. Input Casting

The system applies automatic type casting to all input arguments.

2. Output Casting

  • All return types are automatically marked as nullable (isNullable = true).

  • Output columns are typed according to the function’s declared return specification.

3. Literal Parameters

  • Literal values can be passed to table functions.

  • Literal parameters are stored as Seq[String] and provided to the function at execution time.

4. Dataframe Input Parameter to the Python Function

Both PTFs and PSFs require a dataFrame to be passed as the input parameter to the Python function.

Python Table Functions (PTFs): The <function_clause>

This clause defines the execution of your Python Table Function. It has the following structure:

table(<table_function>(cursor(<sub_query>), <literal_param>*));

Syntax

Description

Mandatory/Optional

table()

This is the main function wrapper that tells SQream to execute the Python Table Function and treat its result as a relation (a table).

Mandatory

table_function

This is the fully qualified name of the Python function, including the module name. For example, arr_varif.final_array.

Mandatory

cursor(<sub_query>)

This is an optional argument that passes the result of a subquery (any valid SELECT statement) to your Python function. The data is provided to the Python function as a Pandas DataFrame.

Optional

<literal_param>*

These are optional string literals that are passed as additional arguments to your Python function. They must be defined in the module’s literal_parameters option and will be cast to strings in the Python code.

Optional

Python Scalar Functions (PSFs): The <scalar_function>

A Python Scalar Function is called directly by its fully qualified name and accepts one or more arguments, which can be column names or literal values.

<module_name>.<scalar_function>([<arg1>, <arg2>, ...<literal_params>*]);

Syntax

Description

Mandatory/Optional

<module_name>

See the section for Python Module

Mandatory

<scalar_function>

The fully qualified name of the Python Scalar Function

Mandatory

[<arg1>, <arg2>, …]

The arguments passed to the function. These can be columns from the queried table (e.g., t.my_column) or literal values (e.g., 10, ‘hello’).

Optional

<literal_param>*

These are optional string literals that are passed as additional arguments to your Python function. They must be defined in the module’s literal_parameters option and will be cast to strings in the Python code.

Optional

Example: Defining a Module with Both Function Types

CREATE OR REPLACE MODULE my_funcs
    OPTIONS(
path='/home/sqream/udf/my_functions.py',
entry_points =
[
    -- PTF Definition
    [
        name = 'process_data_table',
        arguments [text, int],
        returns table (processed_text text, processed_value int),
        gpu=true
    ],
    -- PSF Definition
    [
        name = 'add_one',
        arguments [int],
        returns int,
        gpu=false
    ]
]
    );

2. Usage Examples

Example 1: Python Scalar Function (PSF)

This example demonstrates calling a PSF in the SELECT list.

  • Python Function:

def multiply(df):
"""
Takes a dataframe containing 'number' and 'multiplier' columns,
calculates the product row-by-row, and returns the modified dataframe.
"""
# Create a copy to ensure we don't modify the original data unintentionally
df_copy = df.copy()

# Vectorized multiplication: Multiplies aligned rows instantly
df_result = df_copy['number'] * df_copy['multiplier']

return df_result
  • Creating the Module in Sqream:

CREATE OR REPLACE MODULE my_mod1
OPTIONS(
    PATH='/tmp/py_udf.py',
    ENTRY_POINTS=[
        [
            NAME = 'multiply',
            ARGUMENTS [int, int],
            RETURNS SCALAR int,
            GPU = false
        ]
    ]
);
  • How to use the Module?

create or replace table t (number int, multiplier int);
INSERT INTO t VALUES ( 1, 2 ),( 2, 2 );
select my_mod1.multiply(number, multiplier) as multiple_results from t;
  • Results:

multiple_results

2

4

Example 2: Passing Literal Parameters For Python Scalar Function (PSF)

This example demonstrates how to pass literals to PSF.

  • Python Function:

    def multiply(df,literals_map):
    """
Takes a dataframe containing 'number' and 'multiplier' columns,
calculates the product row-by-row, and returns the modified dataframe.
"""
num1= int(literals_map['0'])
num2= int(literals_map['1'])

df_copy = df.copy()

df_result = df_copy['number'] * df_copy['multiplier'] * num1 * num2
  • Creating the Module in Sqream:

CREATE OR REPLACE MODULE my_mod2
OPTIONS (
                PATH = '/app/scalar_multipy_function_with_literals.py',
                ENTRY_POINTS = [
                                [
                                                NAME = 'multiply',
                                                ARGUMENTS [int, int],
                                                LITERAL_PARAMETERS = 2,
                                                RETURNS SCALAR int,
                                                GPU = false
                                ]
                ]
);
  • How to use the Module?

create or replace table t (number int, multiplier int);
INSERT INTO t VALUES ( 2, 4 ),( 5 , 7 );
select my_mod2.multiply(number, multiplier,'10','20') as multiple_results from t;
  • Results:

multiple_results

1600

7000

Example 3: Python Table Function (PTF)

This example demonstrates how to use cursor() to pass an entire table’s data to a PTF.

  • Python Function:

# Defined in 'my_functions.py'
def last_column(df):
df_new = df.iloc[:, -1:]
return df_new
  • Creating the Module in Sqream:

CREATE OR REPLACE MODULE my_mod3
OPTIONS (
    PATH = '/app/my_functions1.py',
    ENTRY_POINTS = [
        [
            NAME = 'last_column',
            ARGUMENTS [date, datetime, text],
            RETURNS TABLE (col_name text),
            GPU = true
        ]
    ]
);
  • How to use the Module?

CREATE OR REPLACE TABLE t (xdate DATE, xdatetime DATETIME, xtext TEXT);
INSERT INTO t VALUES (DATE '2025-09-11', DATETIME '2025-09-11 14:30:00', 'some sample text');
SELECT * FROM table(my_mod3.last_column(CURSOR(SELECT * FROM t)));
  • Results:

col_name

some sample text

Note

The select * from t subquery passes the t table to the function.

Example 4: Passing Literal Parameters For Python Table Function (PTF)

This example demonstrates how to pass literals to PTF.

  • Python Function:

    # Defined in 'my_functions.py'
def add_literal_column(df, literals_map ):
df_new = df.copy()
df_new["col4"] = literals_map['0']
df_new["col5"] = literals_map['1']
return df_new
  • Creating the Module in Sqream:

CREATE OR REPLACE MODULE my_mod4
OPTIONS (
    PATH = '/app/passing_literals.py',
    ENTRY_POINTS = [
        [
            NAME = 'add_literal_column',
            ARGUMENTS [boolean, int, date],
            LITERAL_PARAMETERS = 2,
            RETURNS TABLE (col1 boolean, col2 int, col3 date, col4 text, col5 int),
            GPU = true
        ]
    ]
);
  • How to use the Module?

CREATE OR REPLACE TABLE t (col1 boolean, col2 int, col3 date);
INSERT INTO t VALUES (0, 1, '2025-09-11'),(1, 2, '2027-01-01');

SELECT col1, col2, col3, col4, (col5+5) FROM TABLE( my_mod4.add_literal_column(  CURSOR(SELECT * FROM t), 'Text1', '1000') );
  • Results:

col1

col2

col3

col4

EXPR$4

0

1

2025-09-11

Text1

1005

1

2

2027-01-01

Text1

1005

Example 5: Join Statment on a Python Table Function (PTF)

In the following example, the employees table stores employee attributes, while the sales_orders table contains sales records, including the employee responsible for each sale and the corresponding sale amount. A standard join between these two tables allows us to retrieve metrics such as the total number of sales per employee and the overall sales amount. To enhance this data, we may want to convert the sales amount from USD to EUR. The Parameterized Table Function (PTF) performs this enrichment by applying the appropriate conversion rate, which is supplied by the user at runtime. The PTF returns the enriched dataset with the total amount expressed in EUR.

  • Python Function:

# Defined in 'my_functions.py'
def convertAmountBasedOnRate(df, literals_map):
    df_new = df.copy()
    rate = literals_map['0']

    if 'totalamount' in df_new.columns:
        df_new["ConvertedAmount"] = df_new['totalamount'] * rate
    else:
        # Handle case where the expected column isn't present
        print("Warning: 'totalamount' column not found. 'commission' column not added.")

    return df_new
  • Creating the Module in Sqream:

CREATE OR REPLACE MODULE my_mod5
OPTIONS (
    PATH = '/app/my_functions.py',
    ENTRY_POINTS = [
        [
            NAME = 'convertAmountBasedOnRate',
            ARGUMENTS [int, int, date, double],
            LITERAL_PARAMETERS = 1,
            RETURNS TABLE (orderid int, employeeid int, orderdate date, totalamount double, ConvertedAmount double),
            GPU = true
        ]
    ]
);
  • How to use the Module?

create table employees (EmployeeId int , FirstName text, LastName text, HireDate date, DepartmentId int);

insert into employees values (101,'Alex','Johnson','2023-01-15','3'),(102,'Sarah','Chen','2020-07-01','1'),
(103,'David','Lee','2024-11-20','4'),(104,'Emily','Smith','2021-10-25','4'),(105,'Ryan','Garcia','2023-05-10','2');

create or replace table sales_orders (OrderId int, EmployeeId int, OrderDate date, TotalAmount double);

insert into sales_orders values (5001,102,'2025-02-10',1250.00),
                                (5002,101,'2025-05-01',890.50),
                                (5003,102,'2025-09-15',3400.00),
                                (5004,104,'2025-08-22',520.25),
                                (5005,101,'2025-11-05',150.00),
                                (5006,102,'2025-10-01',1800.00),
                                (5007,104,'2025-06-18',985.00),
                                (5008,105,'2025-03-20',2500.00);

SELECT
    emp.EmployeeId AS "EmployeeId",
    COUNT(*) AS "NumberOfSales",
    SUM(convSales.totalamount) AS "TotalSalesAmountUsd",
    SUM(convSales.ConvertedAmount) AS "TotalSalesAmountEur"
FROM
    employees AS emp
JOIN
    TABLE(
        my_mod5.convertAmountBasedOnRate(
            CURSOR(SELECT * FROM sales_orders),
            '0.86'
        )
    ) AS convSales
    ON emp.EmployeeId = convSales.EmployeeId
GROUP BY
    emp.EmployeeId
ORDER BY
    NumberOfSales DESC;
  • Results:

EmployeeId

NumberOfSales

TotalSalesAmountUsd

TotalSalesAmountEur

102

3

6450.0

5547.0

101

2

1040.5

894.83

104

2

1505.25

1294.515

105

1

2500.0

2150.0

3. Return Values

  • Python Table Function(PTF): Your Python function must return a Pandas DataFrame. The column names and data types of this DataFrame must exactly match the schema defined in the returns table(…) clause of the module’s entry point.

  • Python Scalar Function(PSF): Your Python function must return a single, non-DataFrame Python value (e.g., an integer, string, or float). This value will be automatically converted to the single SQL data type defined in the returns <data_type> clause of the module’s entry point.