04 July, 2008

Kettle's Regex Sample

Kettle (Pentaho Data Integration) is a very popular open source ETL tool. Built on the Java platform, it runs on multiple operating systems, including Windows, Linux, Mac, and other Unix-based platforms.

Regular expressions are a powerful construct for manipulating text, and the Java language supports them. Using the Regex Evaluation step and the Java scripting capability in Kettle, I give a simple example of how to read an "unstructured" log file and turn it into tabular data.
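To make the idea concrete outside of Kettle, here is a minimal sketch in plain Java of turning a log line into columns with regex capture groups. It is not taken from the sample transformation: the class name LogLineParser, the method parse, and the line layout "date time LEVEL message" are all assumptions made for illustration, and a real Tomcat log will likely need a different pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {
    // Assumed (hypothetical) line layout: "<yyyy-MM-dd> <HH:mm:ss> <LEVEL> <message>".
    // Each pair of parentheses is a capture group that becomes one column.
    private static final Pattern LINE = Pattern.compile(
        "^(\\d{4}-\\d{2}-\\d{2}) (\\d{2}:\\d{2}:\\d{2}) ([A-Z]+) (.*)$");

    // Returns {date, time, level, message}, or null when the line does not match.
    public static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        // Made-up sample lines, for illustration only.
        String[] lines = {
            "2008-07-04 10:15:30 INFO Server startup in 1234 ms",
            "this line does not follow the assumed layout",
            "2008-07-04 10:16:02 SEVERE Error deploying web application"
        };
        for (String line : lines) {
            String[] row = parse(line);
            if (row != null) {
                // Print matching lines as tab-separated columns; skip the rest.
                System.out.println(String.join("\t", row));
            }
        }
    }
}
```

Lines that do not match are simply skipped here; in a transformation you would more likely route them to a separate stream for inspection.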

Get the sample transformation file from this wiki page.

tomcat_file_log_transformation

4 comments:

Mr. PH said...

Sir, I have a question: is there no Pentaho community in Indonesia yet?

I'm looking to start learning it. Thanks.

IndoNY said...

Wow, this is heavy stuff.

Daniel Einspanjer said...

Have you checked out Kettle 3.1-RC1 yet? I checked in an enhancement to the RegexEval step to actually allow the creation of fields via regex capture groups. Also included is an example transformation that demonstrates parsing an NCSA web access log file.

Feris Thia said...

Hi Daniel,

Yes, I've been working with 3.1-RC1 for a little while. There are many changes in the UI features.

Thanks for pointing me to the sample. It is a very good sample for the use of REGEX EVAL step :)

It makes me want to write a wiki article explaining this sample in detail.

Feris