Writing Transformations

This section explains how to write your own transformations.

Step-by-step instructions

To start, extend class Transformer. Transformer is generic, so it's best to specify a data type. I'll use the Sorter transformer provided by Transform4J as an example. Sorter reads all records from the data source and sorts them in the order you specify. You're required to implement process(T). As you can see, this transformation merely records the data to be sorted in a List; the records are sorted later.

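import java.util.ArrayList;
import java.util.List;

public class Sorter extends Transformer<DataRecord> {

	// Records are buffered here as they arrive; sorting happens later.
	private List<DataRecord> recordList = new ArrayList<DataRecord>();

	@Override
	protected void process(DataRecord input) {
		recordList.add(input);
	}

}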
			

This excerpt from Sorter is incomplete; there's no code yet that lets users specify the ascending or descending sort fields, and none that performs the sort. The ascending and descending sort fields are simply properties on the class (I've omitted the accessors and mutators for brevity).

public class Sorter extends Transformer<DataRecord> {
	private String[] sortFields;
	private String[] sortFieldDescending;
}
			

To perform the sort, we rely on a Comparator to determine the order. The Comparator used here is DataRecordComparator (see the DataRecordComparator javadoc). We override init(), which is executed before the transformation starts, to establish the Comparator.

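public class Sorter extends Transformer<DataRecord> {
	// Built in init() from the configured sort fields; used later to order the records.
	private DataRecordComparator comparator;
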
	@Override
	protected void init() {
		super.init();
		comparator = new DataRecordComparator(sortFields, sortFieldDescending);
	}
}
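
The idea behind a field-based comparator can be sketched in plain Java. This is not DataRecordComparator itself: FieldComparatorSketch is a hypothetical stand-in that treats a record as a Map, compares fields in the order given, and assumes the second array names the fields to be sorted in descending order (the exact contract of sortFields and sortFieldDescending is not spelled out here).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for illustration only; not Transform4J's DataRecordComparator.
class FieldComparatorSketch<V extends Comparable<V>> implements Comparator<Map<String, V>> {

    private final String[] sortFields;
    private final Set<String> descending;

    // Assumption: descendingFields lists the sort fields whose order is reversed.
    FieldComparatorSketch(String[] sortFields, String[] descendingFields) {
        this.sortFields = sortFields.clone();
        this.descending = new HashSet<>(Arrays.asList(descendingFields));
    }

    @Override
    public int compare(Map<String, V> left, Map<String, V> right) {
        // Compare field by field; the first field that differs decides the order.
        // Assumes every sort field is present and non-null in both records.
        for (String field : sortFields) {
            V l = left.get(field);
            V r = right.get(field);
            int result = descending.contains(field) ? r.compareTo(l) : l.compareTo(r);
            if (result != 0) {
                return result;
            }
        }
        return 0;
    }

    // Sorts three sample rows by "dept" ascending, then "rank" descending.
    static List<String> demo() {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(Map.of("dept", "b", "rank", "1"));
        rows.add(Map.of("dept", "a", "rank", "1"));
        rows.add(Map.of("dept", "a", "rank", "2"));
        rows.sort(new FieldComparatorSketch<String>(
                new String[] {"dept", "rank"}, new String[] {"rank"}));
        List<String> out = new ArrayList<>();
        for (Map<String, String> row : rows) {
            out.add(row.get("dept") + row.get("rank"));
        }
        return out;
    }
}
```

Calling FieldComparatorSketch.demo() returns the rows in the order a2, a1, b1.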
			

Finally, we override the close() method, which sorts the buffered records and writes them to the provided data target.

public class Sorter extends Transformer<DataRecord> {
	@Override
	protected void close() {
		super.close();
		
		Collections.sort(recordList, comparator);
		for (DataRecord record: recordList) {
			this.getTransformerDataTarget().insert(record);
		}
		recordList.clear();
	}
	
}
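
To see why nothing reaches the target until close() runs, here is a self-contained sketch of the same buffer-then-sort pattern. LifecycleSketch and BufferingSorterSketch are hypothetical stand-ins written for this example; they only mimic the init/process/close sequence described above and are not the Transform4J classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a transformer base class, reduced to three lifecycle hooks.
abstract class LifecycleSketch<T> {
    protected void init() {}
    protected abstract void process(T input);
    protected void close() {}

    // Drives the lifecycle: init once, process once per record, close once.
    final void run(Iterable<T> source) {
        init();
        for (T record : source) {
            process(record);
        }
        close();
    }
}

// Buffers every record in process() and only writes to the target in close().
class BufferingSorterSketch<T> extends LifecycleSketch<T> {
    private final List<T> buffer = new ArrayList<>();
    private final Comparator<? super T> comparator;
    private final Consumer<? super T> target;

    BufferingSorterSketch(Comparator<? super T> comparator, Consumer<? super T> target) {
        this.comparator = comparator;
        this.target = target;
    }

    @Override
    protected void process(T input) {
        buffer.add(input);
    }

    @Override
    protected void close() {
        buffer.sort(comparator);
        for (T record : buffer) {
            target.accept(record);
        }
        buffer.clear();
    }

    // Feeds 3, 1, 2 through the sorter and returns what the target received.
    static List<Integer> demo() {
        List<Integer> written = new ArrayList<>();
        new BufferingSorterSketch<Integer>(Comparator.naturalOrder(), written::add)
                .run(List.of(3, 1, 2));
        return written;
    }
}
```

One consequence of this design is that the whole input is held in memory until close(), which is inherent to sorting a complete data set.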
			

Implementors simply throw an exception if they can't process a record properly. For additional detail on error handling, see Error handling for transformations.

If you write data to relational databases, transaction management is handled for you. For additional detail on transaction management, see JDBC Transaction Management.

JDBC Transaction Management

If the transformation uses JdbcDataTarget, transaction management is taken care of for you. Each record from the data source is processed by the transformation via the process() method. If processing of a record completes without an exception, a JDBC commit() is issued; if an error occurs, a rollback() is issued instead.
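
The per-record commit and rollback behavior described above can be pictured with standard JDBC calls. This is only a sketch of the pattern, not Transform4J's implementation: PerRecordTxSketch and its RecordWork callback are hypothetical names, and the demo uses a recording fake Connection so that it runs without a database.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class PerRecordTxSketch {

    // Hypothetical callback standing in for the work process() does for one record.
    interface RecordWork<T> {
        void apply(Connection connection, T record) throws SQLException;
    }

    // Treats each record as its own transaction; returns how many records were rolled back.
    static <T> int processAll(Connection connection, Iterable<T> records, RecordWork<T> work)
            throws SQLException {
        connection.setAutoCommit(false); // commits and rollbacks are issued explicitly
        int failures = 0;
        for (T record : records) {
            try {
                work.apply(connection, record);
                connection.commit();   // record processed without exception
            } catch (SQLException | RuntimeException e) {
                connection.rollback(); // undo this record's writes, then move on
                failures++;
            }
        }
        return failures;
    }

    // Runs three records, the second of which fails, and returns the Connection calls made.
    static List<String> demo() {
        List<String> calls = new ArrayList<>();
        Connection fake = (Connection) Proxy.newProxyInstance(
                PerRecordTxSketch.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                (proxy, method, args) -> {
                    calls.add(method.getName());
                    return null;
                });
        try {
            processAll(fake, List.of("ok", "bad", "ok"), (conn, record) -> {
                if (record.equals("bad")) {
                    throw new IllegalStateException("cannot process " + record);
                }
            });
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return calls;
    }
}
```

The demo records the calls setAutoCommit, commit, rollback, commit: one commit per successful record and one rollback for the failing one.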

If you use CompositeTarget and/or ChainedTarget to perform multiple writes for a transformation or to string multiple transformations together, transaction management is handled for all JDBC data targets and is managed at the outermost transformation in the string. Consider the following example:

  • Transformer 1 has a CompositeTarget that contains a JdbcDataTarget and a ChainedTarget referencing Transformer 2. When Transformer 1 processes a record, it may write to that composite target; a write to the JdbcDataTarget is then initiated, and Transformer 2 processes the written data. This activity occurs during record processing for Transformer 1.
  • Transformer 2 has a CompositeTarget that contains a JdbcDataTarget and a ChainedTarget referencing Transformer 3. When Transformer 2 processes a record, it may write to that composite target; a write to the JdbcDataTarget is then initiated, and Transformer 3 processes the written data. This activity occurs during record processing for Transformers 1 and 2.
  • Transformer 3 has a JdbcDataTarget and initiates a write to it. This activity occurs during record processing for Transformers 1, 2, and 3.
  • If any of the three transformers or data targets throws an exception, all JDBC transactions in process are rolled back. This is true even if those JDBC transactions occurred on different databases or even different servers. If the record is processed without exception, a commit() is issued when Transformer 1 finishes processing that record (which means that Transformers 2 and 3 have also finished processing it).
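
The scenario above can be sketched as a small coordinator that commits or rolls back only when the outermost transformer finishes a record. OutermostTxSketch and TxResource are hypothetical names invented for this illustration (a JDBC connection would play the TxResource role); how Transform4J tracks nesting internally is not described here, so the depth counter is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class OutermostTxSketch {

    // Hypothetical minimal transactional resource; a JDBC connection would play this role.
    interface TxResource {
        void commit();
        void rollback();
    }

    private final List<TxResource> enlisted = new ArrayList<>();
    private int depth = 0;

    void enlist(TxResource resource) {
        enlisted.add(resource);
    }

    // Wraps one transformer's handling of one record; nested calls join the outer unit of work.
    void inRecord(Runnable work) {
        depth++;
        try {
            work.run();
        } catch (RuntimeException e) {
            if (depth == 1) { // only the outermost level rolls everything back
                enlisted.forEach(TxResource::rollback);
                enlisted.clear();
            }
            throw e;
        } finally {
            depth--;
        }
        if (depth == 0) {     // outermost level finished cleanly: commit everything
            enlisted.forEach(TxResource::commit);
            enlisted.clear();
        }
    }

    static TxResource resource(String name, List<String> log) {
        return new TxResource() {
            public void commit() { log.add("commit " + name); }
            public void rollback() { log.add("rollback " + name); }
        };
    }

    // Three nested "transformers", each with its own resource; the third fails on "bad".
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        OutermostTxSketch tx = new OutermostTxSketch();
        for (String record : List.of("ok", "bad")) {
            try {
                tx.inRecord(() -> {
                    tx.enlist(resource("db1", log));
                    tx.inRecord(() -> {
                        tx.enlist(resource("db2", log));
                        tx.inRecord(() -> {
                            tx.enlist(resource("db3", log));
                            if (record.equals("bad")) {
                                throw new IllegalStateException("transformer 3 failed");
                            }
                        });
                    });
                });
            } catch (IllegalStateException e) {
                log.add("skipped " + record);
            }
        }
        return log;
    }
}
```

In the demo, the first record commits db1, db2, and db3 together once the outermost call returns, while the failure in the innermost transformer rolls back all three. Note that the commits in this sketch are issued one after another, so it makes no claim about what happens if a commit itself fails partway through.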

    Error handling for transformations

    Transformer implementors throw runtime exceptions if a transformation errors out. If a transformer throws an exception while processing a record, the transformation is not terminated; processing proceeds to the next input record. At the end of the transformation, an exception containing details for all exceptions encountered is thrown.
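
The continue-then-report behavior described above can be sketched as follows. ContinueOnErrorSketch and AggregateFailure are hypothetical names; the actual exception type Transform4J throws at the end of a transformation is not named here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ContinueOnErrorSketch {

    // Hypothetical aggregate exception carrying every failure seen during the run.
    static class AggregateFailure extends RuntimeException {
        final List<RuntimeException> failures;

        AggregateFailure(List<RuntimeException> failures) {
            super(failures.size() + " record(s) failed");
            this.failures = List.copyOf(failures);
            failures.forEach(this::addSuppressed);
        }
    }

    // Processes every record; a failure on one record does not stop the rest.
    static <T> void runAll(Iterable<T> source, Consumer<T> process) {
        List<RuntimeException> failures = new ArrayList<>();
        for (T record : source) {
            try {
                process.accept(record);
            } catch (RuntimeException e) {
                failures.add(e); // remember the failure and move to the next record
            }
        }
        if (!failures.isEmpty()) {
            throw new AggregateFailure(failures);
        }
    }

    // Even numbers fail; odd numbers are processed. Returns a short summary.
    static String demo() {
        List<Integer> processed = new ArrayList<>();
        try {
            runAll(List.of(1, 2, 3, 4), n -> {
                if (n % 2 == 0) {
                    throw new IllegalArgumentException("even: " + n);
                }
                processed.add(n);
            });
            return "no failure";
        } catch (AggregateFailure e) {
            return processed + " / " + e.getMessage();
        }
    }
}
```

ContinueOnErrorSketch.demo() returns "[1, 3] / 2 record(s) failed": both odd records were processed even though the record between them failed.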

    A classic problem for ETL toolsets is the potential volume of errors. If you're processing millions of records and experience exceptions on even a small percentage of them, there will be an extremely large number of exceptions to wade through and resolve. Transform4J assists you with this problem by consolidating exception information. In all likelihood, many of the exceptions you receive will be of the same type, occur on exactly the same line, and have identical stack traces. Transform4J consolidates those exception reports to reduce the amount of exception output developers need to examine. By default, Transform4J reports data for the first five records receiving each exception type (you can configure that number, and even make it unlimited).
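
One way to picture this kind of consolidation is to group failures by exception type plus stack trace, count them, and keep only the first few offending records per group. ExceptionConsolidationSketch is a hypothetical illustration; the grouping key and data structures Transform4J actually uses are not documented here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ExceptionConsolidationSketch {

    // One consolidated report: how often it happened and a few sample records.
    static final class Group {
        int count;
        final List<Object> sampleRecords = new ArrayList<>();
    }

    private final int maxSamples;
    private final Map<String, Group> groups = new LinkedHashMap<>();

    ExceptionConsolidationSketch(int maxSamples) {
        this.maxSamples = maxSamples;
    }

    void report(Object record, Throwable error) {
        // Assumption: same exception type and identical stack trace means "the same problem".
        String key = error.getClass().getName() + Arrays.toString(error.getStackTrace());
        Group group = groups.computeIfAbsent(key, k -> new Group());
        group.count++;
        if (group.sampleRecords.size() < maxSamples) {
            group.sampleRecords.add(record);
        }
    }

    // Always throws from the same line, so every failure shares one stack trace.
    static void failing(int record) {
        throw new IllegalStateException("bad record " + record);
    }

    // Eight failures from the same place collapse into one group with five samples.
    static String demo() {
        ExceptionConsolidationSketch consolidator = new ExceptionConsolidationSketch(5);
        for (int i = 0; i < 8; i++) {
            try {
                failing(i);
            } catch (IllegalStateException e) {
                consolidator.report(i, e);
            }
        }
        Group only = consolidator.groups.values().iterator().next();
        return consolidator.groups.size() + " group, " + only.count
                + " occurrences, samples " + only.sampleRecords;
    }
}
```

The message text is deliberately left out of the key so that failures differing only in the record they mention still fall into the same group.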

    A common need for ETL product customers is to capture the records that erred out during processing. In some cases, you might need this data to test fixes to your transformations or to re-process it after the problems have been fixed. Configuring an additional target for input data that errors out is easy. Consider the following example:

    JdbcDataSource source = new JdbcDataSource(myDataSource, "select * from customer");
    CsvDataTarget target = new CsvDataTarget(new File("c:/extracts/myFile.csv"));
    CsvDataTarget errorTarget = new CsvDataTarget(new File("c:/extracts/myInputErrors.csv"));
    
    MyTransformation myTransformation = new MyTransformation(source, target);
    Transform4j.runTransformation(myTransformation, errorTarget);
    				
