Simply add new column for call/program execution result

Both when calling another EasyMorph project and when running an external program, I would like a transformation that simply returns the value(s) as additional column(s).

For instance, I just wrote a little PHP script that takes a URL and returns the likely language of the page (“en”, “fr”, etc).

For this case, ideally I would be able to add a transformation where:

  • I build the full command, as in the current EasyMorph Run Program transformation
  • I indicate a new field name where the result should go
  • This new column is simply added to the existing columns

So if I had a table like the following…

URL



… this transformation would call the external program for each line, and when it returned “en” the first time, and then “fr” the next time it was called, the resulting table would be:

URL | language

So it’s kind of a blend of Calculate (which adds a column with the calculated value) and Run Program.
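To make the requested behavior concrete, here is a minimal Python sketch of the idea. It is not EasyMorph functionality: `add_result_column` and `STUB_COMMAND` are hypothetical names, and the stub (a Python one-liner that echoes its argument in upper case) only stands in for a real external program such as the PHP script mentioned above.

```python
import subprocess
import sys

# Hypothetical stand-in for the external program: prints its first
# argument in upper case. A real command would replace this list.
STUB_COMMAND = [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"]


def add_result_column(rows, command, input_field, result_field):
    """Run `command` once per row and add its output as a new column.

    The value of `input_field` is passed as the last command-line
    argument; the program's trimmed standard output is stored under
    `result_field`. Existing columns are left unchanged.
    """
    result = []
    for row in rows:
        completed = subprocess.run(
            command + [str(row[input_field])],
            capture_output=True,  # collect stdout instead of printing it
            text=True,            # decode output as text
            check=True,           # raise if the program exits with an error
        )
        new_row = dict(row)
        new_row[result_field] = completed.stdout.strip()
        result.append(new_row)
    return result


if __name__ == "__main__":
    table = [{"URL": "first"}, {"URL": "second"}]
    print(add_result_column(table, STUB_COMMAND, "URL", "result"))
```

The key point is that the table keeps its rows and columns, and the program's output lands in exactly one new field per row.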

Ideally the external program could return multiple values (perhaps space-delimited) that would be added as multiple columns, although the Split transformation could also accomplish this.

Basically, something like the “Result fieldname” option of Pentaho ETL’s “Execute a process”: http://wiki.pentaho.com/display/EAI/Execute+a+process

Similarly, I have an EasyMorph project that normalizes URLs. It takes a URL as input and returns a normalized URL and a hash of that normalized URL (which I use as a key throughout). Ideally I would:

  • Indicate which column(s) are passed to the project that normalizes URLs (perhaps just “URL”)
  • The called project then returns two values (perhaps “normalized_URL” and “URL_hash”); ideally it’s just the final table in the default
  • The main project then just adds these new columns to the table
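The "function" idea in the list above could look roughly like the following Python sketch. Everything here is an assumption for illustration: the normalization rules (lower-casing the scheme and host, dropping default ports, user info, and fragments, defaulting an empty path to "/") and the choice of SHA-256 are not taken from the actual EasyMorph project, and `normalize_url` and `add_normalized_columns` are hypothetical names.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

# Assumed default ports, used to drop a redundant ":80" or ":443".
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Return (normalized_url, url_hash) for one URL (assumed rules)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # hostname excludes user info and port
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # The fragment is dropped by passing an empty string as the last item.
    normalized = urlunsplit((scheme, netloc, path, parts.query, ""))
    url_hash = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return normalized, url_hash


def add_normalized_columns(rows, url_field="URL"):
    """Add "normalized_URL" and "URL_hash" to every row, keeping the
    other columns untouched, however many there are."""
    result = []
    for row in rows:
        normalized, url_hash = normalize_url(row[url_field])
        result.append({**row, "normalized_URL": normalized, "URL_hash": url_hash})
    return result
```

Because the caller only names the input column and receives two new columns back, the same routine works whether the surrounding table has one column or dozens.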

At least for how I’m structuring things, this allows for a cleaner implementation. For one thing, that URL normalizing project can be used in more contexts (where the table before the transformation sometimes has tons of columns and sometimes not, but I always would have a “function” to call to normalize the URL).

The Append transformation can work in “append columns” mode. If the resulting dataset returned by Run Program has the same number of rows then it can simply be appended to the right of the original dataset using the Append transformation.

Would that work?
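As a rough illustration of the "append columns" idea (a Python sketch of the concept, not of how EasyMorph implements the Append transformation), two tables with the same number of rows are joined side by side by position. The name `append_columns` and the sample values are hypothetical.

```python
def append_columns(left_rows, right_rows):
    """Place the columns of `right_rows` to the right of `left_rows`.

    Rows are matched purely by position, so both tables must have the
    same number of rows and the same row order.
    """
    if len(left_rows) != len(right_rows):
        raise ValueError(
            f"row counts differ: {len(left_rows)} vs {len(right_rows)}"
        )
    return [{**left, **right} for left, right in zip(left_rows, right_rows)]


if __name__ == "__main__":
    original = [{"URL": "first"}, {"URL": "second"}]
    program_output = [{"language": "en"}, {"language": "fr"}]
    print(append_columns(original, program_output))
```

Note that this approach depends on the external program returning exactly one output row per input row, in the same order; a skipped or failed row would shift every value below it.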