About EasyMorph Tutorials & Examples Web-help

Transformation: Remove empty columns


#1

Blank columns can be common in input data or when using a transformation to load files with slightly different column names. It would be useful to be able to automatically remove all columns where all its values are empty (after filtering for example)


#2

I’m thinking how this transformation can be generalized. Actually, this is conditional removal of columns. Conditions can be something like:

  • Column contains only empty values
  • Column contains empty values or empty text
  • Column contains empty values or empty text or whitespace
  • Column doesn’t contain errors (so that columns with errors remain)

The last one can be useful as well because it would allow detecting columns with error values. It would play well with conditionally derived tables and even tell exactly which columns contain errors. With addition of a transformation that filters rows with error values it would make possible to detect and pinpoint error values in any column/row.

Any other ideas what kind of conditions can be used for conditional column removal/selection?


#3

These are all good. Another option would be to specify a formula as the condition to remove. If the formula is true for all rows then the column gets removed. The formula would need to be able to reference the column so maybe something like [Input]

This may be more advanced than what’s desired. I do like the simplicity of what you propose.

I would suggest the following from your list:

  • Column contains empty values or empty text or whitespace
  • Column doesn’t contain errors (so that columns with errors remain)

Also, I later realized it’s possible to implement this without a new transformation using Iterate Table. Unfortunately, with Iterate Table the results are appended as rows and and pivot would be necessary. It would also be nice to have an “Append columns to the right” option for Iterate Table. Here’s a sample project… It’s not very efficient though on large data. It’s been running for 10 minutes on a 2,780,000 x 84 table. I didn’t make any attempt to optimize as it was just a POC. I suspect the slowdown in this is the pivot.

Remove Blanks.zip (3.9 KB)


#4

Another generalization may be “Column with only one value”… This would work for empty, white space, or if the column only has a value like “X” or 0. We get data like that fairly regularly which is just noise