04 July, 2008

Kettle's Regex Sample

Kettle (Pentaho Data Integration) is a very popular open source ETL tool. Built on the Java platform, it runs on multiple operating systems, such as Windows, Linux, Mac, and other Unix-based platforms.

Regular expressions are a powerful construct for manipulating text, and the Java language supports them. Using the Regex Evaluation step and Java scripting in Kettle, I give a simple example of how to read an "unstructured" log file and turn it into tabular form.
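To give a feel for the idea outside Kettle, here is a minimal plain-Java sketch of what a regex with capture groups does to one log line. It is not the sample transformation itself: the log layout (date, time, level in square brackets, message), the class name LogLineParser, and the method name parse are all assumptions made up for illustration. Each capture group becomes one column of the resulting row, which is roughly what a regex step does when it maps groups to output fields.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {
    // Assumed (hypothetical) log layout: "2008-07-04 10:15:30 [ERROR] Disk full"
    // Group 1 = date, group 2 = time, group 3 = level, group 4 = message.
    private static final Pattern LINE = Pattern.compile(
        "^(\\d{4}-\\d{2}-\\d{2}) (\\d{2}:\\d{2}:\\d{2}) \\[(\\w+)\\] (.*)$");

    /** Returns {date, time, level, message}, or null when the line does not match. */
    public static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null; // not a recognised log line; the caller decides what to do
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "2008-07-04 10:15:30 [ERROR] Disk full",
            "2008-07-04 10:15:31 [INFO] Retrying write",
            "a line that does not follow the layout"
        };
        for (String line : lines) {
            String[] row = parse(line);
            if (row != null) {
                // Print the captured groups as one pipe-separated row.
                System.out.println(String.join(" | ", row));
            }
        }
    }
}
```

Lines that do not match are simply skipped here; in a real transformation you would more likely route them to a separate error stream so they can be inspected.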

Get the sample transformation file from this wiki page.