Pentaho Data Integration
Overview
This document is a Quick Start Guide that describes how to install Pentaho, create a transformation, and define your output.
The Connecting to SQream Using Pentaho page describes the following:
Installing Pentaho
To install PDI, see the Pentaho Community Edition (CE) Installation Guide.
The Pentaho Community Edition (CE) Installation Guide describes how to do the following:
Downloading the PDI software.
Installing the JRE (Java Runtime Environment) and JDK (Java Development Kit).
Setting up the JRE and JDK environment variables for PDI.
Installing and Setting Up the JDBC Driver
After installing Pentaho you must install and set up the JDBC driver. This section explains how to set up the JDBC driver using Pentaho. These instructions use Spoon, the graphical transformation and job designer associated with the PDI suite.
You can install the driver by copying and pasting the SQream JDBC .jar file into your <directory>/design-tools/data-integration/lib directory.
Creating a Transformation
After installing Pentaho you can create a transformation.
To create a transformation:
Use the CLI to open the PDI client for your operating system (Windows):
$ spoon.bat
Open the spoon.bat file from its folder location.
In the View tab, right-click Transformations and click New.
A new transformation tab is created.
In the Design tab, click Input to show its file contents.
Drag and drop the CSV file input item to the new transformation tab that you created.
Double-click CSV file input. The CSV file input panel is displayed.
In the Step name field, type a name.
To the right of the Filename field, click Browse.
Select the file that you want to read from and click OK.
In the CSV file input window, click Get Fields.
In the Sample data window, enter the number of lines you want to sample and click OK. The default setting is 100.
The tool reads the file and suggests the field name and type.
In the CSV file input window, click Preview.
In the Preview size window, enter the number of rows you want to preview and click OK. The default setting is 1000.
Verify that the preview data is correct and click Close.
Click OK in the CSV file input window.
Defining Your Output
After creating your transformation you must define your output.
To define your output:
In the Design tab, click Output.
The Output folder is opened.
Drag and drop Table output item to the Transformation window.
Double-click Table output to open the Table output dialog box.
From the Table output dialog box, type a Step name and click New to create a new connection. Your steps are the building blocks of a transformation, such as file input or a table output.
The Database Connection window is displayed with the General tab selected by default.
Enter or select the following information in the Database Connection window and click Test.
The following table shows and describes the information that you need to fill out in the Database Connection window:
No.
Element Name
Description
1
Connection name
Enter a name that uniquely describes your connection, such as sampledata.
2
Connection type
Select Generic database.
3
Access
Select Native (JDBC).
4
Custom connection URL
Insert jdbc:Sqream://<host:port>/<database name>;user=<username>;password=<password>;[<optional parameters>; …];. The IP is a node in your SQream cluster and is the name or schema of the database you want to connect to. Verify that you have not used any leading or trailing spaces.
5
Custom driver class name
Insert com.sqream.jdbc.SQDriver. Verify that you have not used any leading or trailing spaces.
6
Username
Your SQreamdb username. If you leave this blank, you will be prompted to provide it when you connect.
7
Password
Your password. If you leave this blank, you will be prompted to provide it when you connect.
Click OK in the window above, in the Database Connection window, and Table Output window.
Importing Data
After defining your output you can begin importing your data.
For more information about backing up users, permissions, or schedules, see Backup and Restore Pentaho Repositories
To import data:
Double-click the Table output connection that you just created.
To the right of the Target schema field, click Browse and select a schema name.
Click OK. The selected schema name is displayed in the Target schema field.
Create a new hop connection between the CSV file input and Table output steps:
On the CSV file input step item, click the new hop connection icon.
Drag an arrow from the CSV file input step item to the Table output step item.
Release the mouse button. The following options are displayed.
Select Main output of step.
Double-click Table output to open the Table output dialog box.
In the Target table field, define a target table name.
Click SQL to open the Simple SQL editor.
In the Simple SQL editor, click Execute.
The system processes and displays the results of the SQL statements.
Close all open dialog boxes.
Click the play button to execute the transformation.
The Run Options dialog box is displayed.
Click Run.
The Execution Results are displayed.