Python (pysqream)

The current Pysqream connector supports Python version 3.9 and newer. It includes a set of packages that allows Python programs to connect to SQreamDB. The base pysqream package conforms to Python DB-API specifications PEP-249.

pysqream is a pure Python connector that can be installed with pip on any operating system, including Linux, Windows, and macOS. pysqream-sqlalchemy is a SQLAlchemy dialect for pysqream.

Installing the Python Connector 

Prerequisites

It is essential that you have the following installed:

Python 

The connector requires Python version 3.9 or newer.

To see your current Python version, run the following command:

$ python --version

PIP 

The Python connector is installed via pip, the standard package manager for Python, which is used to install, upgrade and manage Python packages (libraries) and their dependencies.

We recommend upgrading to the latest version of pip before installing.

To verify that you have the latest version, run the following command:

$ python3 -m pip install --upgrade pip
Collecting pip
   Downloading https://files.pythonhosted.org/packages/00/b6/9cfa56b4081ad13874b0c6f96af8ce16cfbc1cb06bedf8e9164ce5551ec1/pip-19.3.1-py2.py3-none-any.whl (1.4MB)
         |████████████████████████████████| 1.4MB 1.6MB/s
Installing collected packages: pip
  Found existing installation: pip 19.1.1
        Uninstalling pip-19.1.1:
          Successfully uninstalled pip-19.1.1
Successfully installed pip-19.3.1

Note

On macOS, you may want to use virtualenv to install Python and the connector, to ensure compatibility with the built-in Python environment
If you encounter an error including SSLError or WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available. - please be sure to reinstall Python with SSL enabled, or use virtualenv or Anaconda.

OpenSSL for Linux 

The Python connector relies on OpenSSL for secure connections to SQreamDB. Some distributions of Python do not include OpenSSL.

To install OpenSSL on RHEL, run the following command:

$ sudo yum install -y libffi-devel openssl-devel

Installing via PIP with an internet connection

The Python connector is available via PyPi.

To install the connector using pip, it is advisable to use the -U or --user flags instead of sudo, as it ensures packages are installed per user. However, it is worth noting that the connector can only be accessed under the same user.

To install pysqream and pysqream-sqlalchemy with the --user flag, run the following command:

$ pip3.9 install pysqream pysqream-sqlalchemy --user

pip3 will automatically install all necessary libraries and modules.

Installing via PIP without an internet connection

To get the .whl package file, contact you SQreamDB support representative.
Run the following command:

tar -xf pysqream_connector_5.2.0.tar.gz
cd pysqream_connector_5.2.0
#Install all packages with --no-index --find-links .
python3 -m pip install *.whl -U --no-index --find-links .
python3.9 -m pip install pysqream-5.2.0.zip -U --no-index --find-links .
python3.9 -m pip install pysqream-sqlalchemy-1.3.zip  -U --no-index --find-links .

Upgrading an Existing Installation

The Python drivers are updated periodically. To upgrade an existing pysqream installation, use pip’s -U flag:

$ pip3.9 install pysqream pysqream-sqlalchemy -U

SQLAlchemy 

SQLAlchemy is an Object-Relational Mapper (ORM) for Python. When you install the SQream dialect (pysqream-sqlalchemy) you can use frameworks such as Pandas, TensorFlow, and Alembic to query SQream directly.

Before You Begin

Download pysqream-sqlalchemy

Limitation

Does not support the ARRAY data type

Creating a Standard Connection

Parameter	Description
`username`	Username of a role to use for connection
`password`	Specifies the password of the selected role
`host`	Specifies the hostname
`port`	Specifies the port number
`port_ssl`	An optional parameter
`database`	Specifies the database name
`clustered`	Establishing a multi-clustered connection. Input values: `True`, `False`. Default is `False`
`service`	Specifies service queue to use

import sqlalchemy as sa
from sqlalchemy.engine.url import URL


engine_url = sa.engine.url.URL('sqream',
                               username='<user_name>',
                               password='<password>',
                               host='<host_name>',
                               port=<port_number>,
                               port_ssl=<port_ssl>,
                               database='<database_name>')

engine = sa.create_engine(engine_url,connect_args={"clustered": False, "service": "<service_name>"})

Pulling a Table into Pandas

The following example shows how to pull a table in Pandas. This example uses the URL method to create the connection string:

import sqlalchemy as sa
import pandas as pd
from sqlalchemy.engine.url import URL


engine_url = sa.engine.url.URL('sqream',
                               username='sqream',
                               password='12345',
                               host='127.0.0.1',
                               port=3108,
                               database='master')

engine = sa.create_engine(engine_url,connect_args={"clustered": True, "service": "admin"})
table_df = pd.read_sql("select * from nba", con=engine)

API 

Using the Cursor 

The DB-API specification includes several methods for fetching results from the cursor. This sections shows an example using the nba table, which looks as follows:

nba
Name	Team	Number	Position	Age	Height	Weight	College	Salary
Avery Bradley	Boston Celtics	0.0	PG	25.0	6-2	180.0	Texas	7730337.0
Jae Crowder	Boston Celtics	99.0	SF	25.0	6-6	235.0	Marquette	6796117.0
John Holland	Boston Celtics	30.0	SG	27.0	6-5	205.0	Boston University
R.J. Hunter	Boston Celtics	28.0	SG	22.0	6-5	185.0	Georgia State	1148640.0
Jonas Jerebko	Boston Celtics	8.0	PF	29.0	6-10	231.0		5000000.0
Amir Johnson	Boston Celtics	90.0	PF	29.0	6-9	240.0		12000000.0
Jordan Mickey	Boston Celtics	55.0	PF	21.0	6-8	235.0	LSU	1170960.0
Kelly Olynyk	Boston Celtics	41.0	C	25.0	7-0	238.0	Gonzaga	2165160.0
Terry Rozier	Boston Celtics	12.0	PG	22.0	6-2	190.0	Louisville	1824360.0

As before, you must import the library and create a Connection(), followed by execute() on a simple SELECT * query:

import pysqream


con = pysqream.connect(host='127.0.0.1',
                       port=3108,
                       database='master',
                       username='rhendricks',
                       password='Tr0ub4dor&3',
                       clustered=True)

cur = con.cursor() # Create a new cursor
# The select statement:
statement = 'SELECT * FROM nba'
cur.execute(statement)

When the statement has finished executing, you have a Connection cursor object waiting. A cursor is iterable, meaning that it advances the cursor to the next row when fetched.

You can use fetchone() to fetch one record at a time:

first_row = cur.fetchone() # Fetch one row at a time (first row)

second_row = cur.fetchone() # Fetch one row at a time (second row)

To fetch several rows at a time, use fetchmany():

# executing `fetchone` twice is equivalent to this form:
third_and_fourth_rows = cur.fetchmany(2)

To fetch all rows at once, use fetchall():

# To get all rows at once, use `fetchall`
remaining_rows = cur.fetchall()

cur.close()


# Close the connection when done
con.close()

The following is an example of the contents of the row variables used in our examples:

>>> print(first_row)
('Avery Bradley', 'Boston Celtics', 0, 'PG', 25, '6-2', 180, 'Texas', 7730337)
>>> print(second_row)
('Jae Crowder', 'Boston Celtics', 99, 'SF', 25, '6-6', 235, 'Marquette', 6796117)
>>> print(third_and_fourth_rows)
[('John Holland', 'Boston Celtics', 30, 'SG', 27, '6-5', 205, 'Boston University', None), ('R.J. Hunter', 'Boston Celtics', 28, 'SG', 22, '6-5', 185, 'Georgia State', 1148640)]
>>> print(remaining_rows)
[('Jonas Jerebko', 'Boston Celtics', 8, 'PF', 29, '6-10', 231, None, 5000000), ('Amir Johnson', 'Boston Celtics', 90, 'PF', 29, '6-9', 240, None, 12000000), ('Jordan Mickey', 'Boston Celtics', 55, 'PF', 21, '6-8', 235, 'LSU', 1170960), ('Kelly Olynyk', 'Boston Celtics', 41, 'C', 25, '7-0', 238, 'Gonzaga', 2165160),
[...]

Note

Calling a fetch command after all rows have been fetched will return an empty array ([]).

Reading Result Metadata 

When you execute a statement, the connection object also contains metadata about the result set, such as column names, types, etc).

The metadata is stored in the Connection.description object of the cursor:

import pysqream


con = pysqream.connect(host='127.0.0.1',
                       port=3108,
                       database='master',
                       username='rhendricks',
                       password='Tr0ub4dor&3',
                       clustered=True)
cur = con.cursor()
statement = 'SELECT * FROM nba'
cur.execute(statement)
print(cur.description)
# [('Name', 'STRING', 24, 24, None, None, True), ('Team', 'STRING', 22, 22, None, None, True), ('Number', 'NUMBER', 1, 1, None, None, True), ('Position', 'STRING', 2, 2, None, None, True), ('Age (as of 2018)', 'NUMBER', 1, 1, None, None, True), ('Height', 'STRING', 4, 4, None, None, True), ('Weight', 'NUMBER', 2, 2, None, None, True), ('College', 'STRING', 21, 21, None, None, True), ('Salary', 'NUMBER', 4, 4, None, None, True)]

You can fetch a list of column names by iterating over the description list:

>>> [ i[0] for i in cur.description ]
['Name', 'Team', 'Number', 'Position', 'Age (as of 2018)', 'Height', 'Weight', 'College', 'Salary']

Loading Data into a Table 

This example shows how to load 10,000 rows of dummy data to an instance of SQreamDB.

To load data 10,000 rows of dummy data to an instance of SQreamDB:

Run the following:

import pysqream
import sqlalchemy.orm as orm
from datetime import date, datetime
from time import time


con = pysqream.connect(host='127.0.0.1',
                       port=3108,
                       database='master',
                       username='rhendricks',
                       password='Tr0ub4dor&3',
                       clustered=True)

cur = con.cursor()

Create a table for loading:

create = 'create or replace table perf (b bool, t tinyint, sm smallint, i int, bi bigint, f real, d double, s text(12), ss text, dt date, dtt datetime)'
cur.execute(create)

Create a session:

session = orm.sessionmaker(bind=engine)()

Load your data into table using the INSERT command.

Create dummy data matching the table you created:

data = (False, 2, 12, 145, 84124234, 3.141, -4.3, "Marty McFly" , u"キウイは楽しい鳥です" , date(2019, 12, 17), datetime(1955, 11, 4, 1, 23, 0, 0))

row_count = 10**4

Get a new cursor:

insert = 'insert into perf values (?,?,?,?,?,?,?,?,?,?,?)'
start = time()
cur.executemany(insert, [data] * row_count)
print (f"Total insert time for {row_count} rows: {time() - start} seconds")

Close this cursor:
```
cur.close()
```

Verify that the data was inserted correctly:

cur = con.cursor()
cur.execute('select count(*) from perf')
result = cur.fetchall() # `fetchall` collects the entire data set
print (f"Count of inserted rows: {result[0][0]}")

Close the cursor:
```
cur.close()
```
Close the connection:

con.close()

Using SQLAlchemy ORM to Create and Populate Tables 

This section shows how to use the ORM to create and populate tables from Python objects.

To use SQLAlchemy ORM to create and populate tables:

Run the following:

import sqlalchemy as sa
import pandas as pd

engine_url = "sqream://rhendricks:secret_password@localhost:5000/raviga"

engine = sa.create_engine(engine_url)

Build a metadata object and bind it:

metadata = sa.MetaData()
metadata.bind = engine

Create a table in the local metadata:

employees = sa.Table(
    'employees',
    metadata,
    sa.Column('id', sa.Integer),
    sa.Column('name', sa.TEXT(32)),
    sa.Column('lastname', sa.TEXT(32)),
    sa.Column('salary', sa.Float)
)

The create_all() function uses the SQreamDB engine object.

Create all the defined table objects:
```
metadata.create_all(engine)
```
Populate your table.

Build the data rows:

insert_data = [ {'id': 1, 'name': 'Richard','lastname': 'Hendricks',   'salary': 12000.75},
                {'id': 3,  'name': 'Bertram', 'lastname': 'Gilfoyle', 'salary': 8400.0},
                {'id': 8,  'name': 'Donald', 'lastname': 'Dunn', 'salary': 6500.40}]

Build the INSERT command:
```
ins = employees.insert(insert_data)
```
Execute the command:
```
result = session.execute(ins)
```

Prepared Statements 

Prepared statements, also known as parameterized queries, are a safer and more efficient way to execute SQL statements. They prevent SQL injection attacks by separating SQL code from data, and they can improve performance by reusing prepared statements. In SQream, we use ? as a placeholder for the relevant value in parameterized queries. Prepared statements INSERT, SELECT, UPDATE and DELETE

Prepared Statement Limitations

Prepared Statement do not support the use of Identifiers and Keywords as input parameters.

Prepared Statements code example

import pysqream
from datetime import date, datetime
from time import time

# SQreamDB Connection Setting
con = pysqream.connect(host='<your-host-ip>', port=3108, database='master'
                                                                   , username='<SQDB role name>', password='<SQDB role password>'
                                                                   , clustered=True)
cur = con.cursor()

# CREATE
create = 'create or replace table perf (b bool, t tinyint, sm smallint, i int, bi bigint, f real, d double, s text(12), ss text, dt date, dtt datetime)'
cur.execute(create)

# DATA
data = (False, 2, 12, 145, 84124234, 3.141, -4.3, "STRING1" , "STRING2", date(2024, 11, 11), datetime(2024, 11, 11, 11, 11, 11, 11))
row_count = 10**2

# Insert
insert = 'insert into perf values (?,?,?,?,?,?,?,?,?,?,?)'
start = time()

# Prepared Statement
cur.executemany(insert, [data] * row_count)
print (f"Total insert time for {row_count} rows: {time() - start} seconds")


# Results(Table Count)
cur = con.cursor()
cur.execute('select count(*) from perf')
result = cur.fetchall() # `fetchall` collects the entire data set
print (f"Count of inserted rows: {result[0][0]}")


# SELECT
query = "SELECT * FROM perf WHERE s = ?"
params = [("STRING1",)]

# Prepared Statement
cur.execute(query,params)


# Result
rows = cur.fetchall()
print(rows)

for row in rows:
                print(row)


# UPDATE
query = "UPDATE perf SET s = '?' WHERE s = '?'"
params = [("STRING3", "STRING2")]

# Prepared Statement
cur.execute(query,params)

print("Update completed.")


# DELETE
query = "DELETE FROM perf WHERE s = ?"
params = [("STRING1",)]

# Prepared Statement
cur.execute(query,params)

print("Delete completed.")

# Conn Close
cur.close()
con.close()