Table-Wide Find function?

I have a table with hundreds of columns - is there some sort of way/function to search the entire table for a specific value and then tell me where it is in the table.

Very much exactly like Excel's Find All feature - ctrl+f.

Yes, there is. Go to the Analysis View (explained in the Advanced topics in this tutorial article: EasyMorph | Profiling data), and press "Search dataset".

Helpful! But now how can I combine that to tell me which rows in the dataset has the searched value. My dataset has 10 million rows/records. Do I need to write a calculated field to look at each column to search for my value? It could be in one of many columns upward of 50-100.

Well, that's a bit different story. Filtering all rows that have a particular value in any column is possible, but not with one action.

Typically, it could be done by unpivoting the whole table, and then searching in 1 column with all the values. But your dataset is too big for that - unpivoting would produce several billion rows.

Here is one algorithm:

  1. Save the dataset into a delimited text file (e.g. CSV)
  2. Split it into 1000 smaller files (with the "Split delimited file" action)
  3. Process each file in a loop.
  4. For each file, enumerate rows and then unpivot it, and search the value in 1 column. Keep only the row numbers where it's found.

Alternatively, you can use the "Iterate column" action to process every column one by one. In each iteration:

  1. Enumerate rows
  2. Filter only the rows that contain the necessary value
  3. Save into a file (e.g. CSV file) in the "Append" mode so that row numbers accumulate on each iteration

When iteration finishes, load the generated file and deduplicate row numbers. Finally, filter the original dataset to keep only the filtered rows by their numbers.

1 Like