IBM Information Server 8.X (DataStage): Parallel Transformer Stage

DataStage: What is Transformer Stage?

DataStage provides several stages for extraction, transformations and loading the data into the data warehouse or data marts. The stages are classified into General, Database, Developement and Debug, File, Processing, Real time etc. These stages will fall into active or passive stage categories.

The transformer stage is a processing stage.

This stage allows us to create transformations to apply to your data based on the given business rules.

It can have a single input and any number of outputs. It can also have a reject link that takes any rows which have not been written to any of the outputs links by reason of a write failure or expression evaluation failure or by null handling rejects.

The transformer stage is divided into

1. Link Area

  • Define column definition
  • Define Stage Variables

2. Metadata Area

  • Define column metadata for input and outputs

Output links:

  1. Pass some data straight through the Transformer stage unaltered
  2. Modify the derivation by entering transform expression.
  3. Specify constraints that operate on entire output links
  4. Can also specify a constraint otherwise link, which is an output link that carries all the data not output on other links, that is, columns that have not met the criteria.

A constraint is an expression that specifies criteria that data must meet before it can be passed to the output link.

Reject link:

You can also specify another link which takes rows that have not be written to any other links because of write failure or expression evaluation failure. This is specified outside the stage by adding a link and converting it to a reject link. All the records that gets dropped because of null handling will also be written to reject link.

If Runtime Column Propagation is enabled, then metadata is not required for the outputs.

The Find and Replace capabilities allows to find the particular string with in an expression or search for column names or find an empty expression in the expression types.

Defining output column derivations:

  • Use drag-and-drop or copy and paste to copy an input column to outputs
  • Column auto-match facility to automatically set the columns derived from their matching input columns.

Column-auto match

  1. Choose the output link that you want to match columns with the input link from the drop down list.
  2. Match type area.
    • Location match – This will set column derivations to the input link columns in the equivalent positions.
    • Name match – The output derivations set based on the name match.

CONSTRAINTS and OTHERWISE/Log

A constraint is an expression that specifies criteria that data must meet before it can be passed to the output link.

  • Clicking on the Otherwise/Log field so a tick appears and leaving the Constraint fields blank. This will catch any rows that have failed to meet constraints on all the previous output links.
  • Clicking on the Otherwise/Log field will result in the number of rows written to that link (that is, rows which satisfy the constraint) recorded in the job log as a warning message

Along with these, we can define the local stage variables, use the system variables and also we can set the partitioning methods and sorting operations.