Pentaho Data Integration

Overview

This document is a Quick Start Guide that describes how to install Pentaho, create a transformation, and define your output.

The Connecting to SQream Using Pentaho page describes the following:

Installing Pentaho

To install PDI, see the Pentaho Community Edition (CE) Installation Guide.

The Pentaho Community Edition (CE) Installation Guide describes how to do the following:

  • Downloading the PDI software.

  • Installing the JRE (Java Runtime Environment) and JDK (Java Development Kit).

  • Setting up the JRE and JDK environment variables for PDI.

Back to Overview

Installing and Setting Up the JDBC Driver

After installing Pentaho you must install and set up the JDBC driver. This section explains how to set up the JDBC driver using Pentaho. These instructions use Spoon, the graphical transformation and job designer associated with the PDI suite.

You can install the driver by copying and pasting the SQream JDBC .jar file into your <directory>/design-tools/data-integration/lib directory.

Back to Overview

Creating a Transformation

After installing Pentaho you can create a transformation.

To create a transformation:

  1. Use the CLI to open the PDI client for your operating system (Windows):

$ spoon.bat
  1. Open the spoon.bat file from its folder location.

  1. In the View tab, right-click Transformations and click New.

    A new transformation tab is created.

  2. In the Design tab, click Input to show its file contents.

  1. Drag and drop the CSV file input item to the new transformation tab that you created.

  1. Double-click CSV file input. The CSV file input panel is displayed.

  1. In the Step name field, type a name.

  1. To the right of the Filename field, click Browse.

  1. Select the file that you want to read from and click OK.

  1. In the CSV file input window, click Get Fields.

  1. In the Sample data window, enter the number of lines you want to sample and click OK. The default setting is 100.

    The tool reads the file and suggests the field name and type.

  2. In the CSV file input window, click Preview.

  1. In the Preview size window, enter the number of rows you want to preview and click OK. The default setting is 1000.

  1. Verify that the preview data is correct and click Close.

  1. Click OK in the CSV file input window.

Back to Overview

Defining Your Output

After creating your transformation you must define your output.

To define your output:

  1. In the Design tab, click Output.

    The Output folder is opened.

  2. Drag and drop Table output item to the Transformation window.

  1. Double-click Table output to open the Table output dialog box.

  1. From the Table output dialog box, type a Step name and click New to create a new connection. Your steps are the building blocks of a transformation, such as file input or a table output.

    The Database Connection window is displayed with the General tab selected by default.

  2. Enter or select the following information in the Database Connection window and click Test.

    The following table shows and describes the information that you need to fill out in the Database Connection window:

    No.

    Element Name

    Description

    1

    Connection name

    Enter a name that uniquely describes your connection, such as sampledata.

    2

    Connection type

    Select Generic database.

    3

    Access

    Select Native (JDBC).

    4

    Custom connection URL

    Insert jdbc:Sqream://<host:port>/<database name>;user=<username>;password=<password>;[<optional parameters>; …];. The IP is a node in your SQream cluster and is the name or schema of the database you want to connect to. Verify that you have not used any leading or trailing spaces.

    5

    Custom driver class name

    Insert com.sqream.jdbc.SQDriver. Verify that you have not used any leading or trailing spaces.

    6

    Username

    Your SQreamdb username. If you leave this blank, you will be prompted to provide it when you connect.

    7

    Password

    Your password. If you leave this blank, you will be prompted to provide it when you connect.

  3. Click OK in the window above, in the Database Connection window, and Table Output window.

Back to Overview

Importing Data

After defining your output you can begin importing your data.

For more information about backing up users, permissions, or schedules, see Backup and Restore Pentaho Repositories

To import data:

  1. Double-click the Table output connection that you just created.

  1. To the right of the Target schema field, click Browse and select a schema name.

  1. Click OK. The selected schema name is displayed in the Target schema field.

  1. Create a new hop connection between the CSV file input and Table output steps:

    1. On the CSV file input step item, click the new hop connection icon.

    2. Drag an arrow from the CSV file input step item to the Table output step item.

    3. Release the mouse button. The following options are displayed.

    4. Select Main output of step.

  1. Double-click Table output to open the Table output dialog box.

  1. In the Target table field, define a target table name.

  1. Click SQL to open the Simple SQL editor.

  1. In the Simple SQL editor, click Execute.

    The system processes and displays the results of the SQL statements.

  2. Close all open dialog boxes.

  1. Click the play button to execute the transformation.

    The Run Options dialog box is displayed.

  2. Click Run.

    The Execution Results are displayed.

Back to Overview