
Data Quality on columns


#1

Many of today’s data wrangling tools have a feature that shows data quality on columns. You can see the different values and their distribution at a glance. You can also filter dynamically on the same screen and watch the distributions of the other columns change at the same time.

You could consider adding a new panel at the bottom of EasyMorph that details the content of the columns in the table the user has selected.

Also, I am still looking forward to being able to create cross tables and export them, like you can do with charts.


#2

We’ve added data profiling in version 3.8 which has just been released.

It shows various counts for a selected column, with the ability to instantly create a filter action for each count.

Also, the profiler shows a distribution histogram for numerical values. The histogram has two rulers for filtering a range of values visually.

The profiler window is floating, which means you can click and profile other columns without closing it.


#3

That’s good but quite limited.
Example: if you have a string column containing names, you would expect to see the count for each name in a histogram.

Here I get the impression that only numbers can be profiled. Is that correct?


#4

Yes, the histogram shows the distribution for numeric values only. Counts per distinct value are not available in the profiler yet. If you need to see counts for distinct values, right-click the column and pick Aggregate/Count.
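
For readers who want to see what a count per distinct value amounts to, here is a minimal Python sketch. It is not EasyMorph code and does not use any EasyMorph API. The function name `count_distinct_values` and the sample rows are hypothetical and only serve as illustration.

```python
from collections import Counter


def count_distinct_values(rows, column):
    """Return (value, count) pairs for one column, most frequent first."""
    counts = Counter(row[column] for row in rows)
    return counts.most_common()


# Hypothetical sample data standing in for a table with a "name" column.
rows = [
    {"name": "Alice"},
    {"name": "Bob"},
    {"name": "Alice"},
    {"name": "Carol"},
    {"name": "Alice"},
    {"name": "Bob"},
]

name_counts = count_distinct_values(rows, "name")
for value, count in name_counts:
    print(f"{value}: {count}")
```

The resulting pairs are the kind of data a bar chart of counts by name would be drawn from.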


#5

Okay, that’s a good start. I am just waiting for value counts and, if possible, interaction with the bars to filter automatically.
That would be a good addition. A dedicated panel showing this information for all columns at once (not one by one) would also be nice!


#6

Good point! Is there a workaround to profile all the columns with the current profiling options? It does not seem possible yet, because I cannot see an option to write the output of the profiling to a table.

Does EasyMorph foresee expanding the profiling options in the near future?


#7

Not possible at this point.

We will be adding automated data quality suggestions in 3.8.1 and full-table profiling in 3.9.


#8

Hi Dmitry,

I have installed version 3.9, but it appears that full-table profiling is not implemented yet.
Is this still on the roadmap? When will it be released?

Kind regards


#9

It’s still on the roadmap but not assigned to a specific release at this point.


#10

Any news on this topic? We would really appreciate being able to profile entire datasets automatically, so that we do not have to incorporate a set of tests in each project.
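
To make the request concrete, a profile of an entire dataset could look roughly like the Python sketch below, which produces one summary row per column. It is only an illustration written outside EasyMorph, not an EasyMorph feature or API. The function name `profile_table`, the chosen metrics, and the sample rows are all assumptions.

```python
from collections import Counter


def profile_table(rows):
    """Build one summary row per column: row count, empty count,
    distinct count, and the most frequent non-empty value."""
    columns = list(rows[0].keys()) if rows else []
    profile = []
    for column in columns:
        values = [row.get(column) for row in rows]
        # Treat None and the empty string as "empty" (an assumption).
        non_empty = [v for v in values if v is not None and v != ""]
        counts = Counter(non_empty)
        top_value, top_count = counts.most_common(1)[0] if counts else (None, 0)
        profile.append({
            "column": column,
            "rows": len(values),
            "empty": len(values) - len(non_empty),
            "distinct": len(counts),
            "top_value": top_value,
            "top_count": top_count,
        })
    return profile


# Hypothetical sample dataset.
dataset = [
    {"name": "Alice", "city": "Paris", "age": 30},
    {"name": "Bob", "city": "", "age": 25},
    {"name": "Alice", "city": "Lyon", "age": None},
    {"name": "Carol", "city": "Paris", "age": 30},
]

table_profile = profile_table(dataset)
for summary in table_profile:
    print(summary)
```

Because the result is itself a list of rows, it can be written out as a table and reused across projects.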


#11

No release assigned for this feature yet.