5.2.1 : Catalog item type: Database table/query

Hi,

This is what you announce on your official web site. Question: what is it? What would it look like? And what does it bring compared to datasets?

It will be possible to specify a query in the item settings (see below).

When the item is retrieved in Desktop, the query can be modified by the user before execution (see below).

The result of the modified query is a regular dataset that can be additionally filtered in the Dataset Viewer (just like any other dataset) and/or exported to Excel or CSV.

It will still be possible to declare and use parameters in such queries, even though there will be no linked projects. When such a catalog item is retrieved from the Server UI, it will not be possible to edit the query, only to change its parameters.

That’s interesting, thank you for this information. We have a lot of simple projects containing one query that just has to be updated with filters. For this, users today have to understand EasyMorph, follow tutorials, install the Desktop, and use it. This feature will simplify the process, but not completely, as it is still required to install and use the Desktop. It would be a dream to get a graphical interface on the Server!

We will eventually increase the functionality of the web interface but the Desktop will remain the main UI for a long time. What in your opinion makes the Desktop inconvenient to use? Users still need to launch some application and click some buttons. What’s the difference between launching a web browser or the Desktop?

I would say quite easily: installation and skill. You have to know where to find the installer, which version to install (remember, we are a big company), how to answer the installation questions, how to manage your license, how to connect to the Server, etc. And then you have to understand how the Desktop works: the menus, what a table is, what an action is, and so on.

The web browser avoids all of this: no training or installation needed. Imagine a user who just wants to send a query and get the Excel result; he does not want or need to use the Desktop in this case.


We will find a way to make the Desktop installation as simple as, for instance, installing Zoom or Teams :slight_smile: For instance, it could be a link in the Server UI that downloads, installs, and configures the Desktop automatically in one click. And the Desktop versions will be controlled by the Server administrator.

The Data Catalog can be as easy to use as a website, with no need to learn about tables and actions. A user still has to learn how to use a website; it’s not much different from learning an application.


Well, you know, in big companies you never have the right to install things on your own … This part is complex. I would have to show you one day what it looks like on our side.

I can imagine. I’ve done several BI/DWH projects in big banks in the US and Canada in my pre-EasyMorph times. The rules are strict.

Why don’t you just create an unscheduled task on the Server with parameters that the user can set when they run the task? The parameters are your filters in the SQL query, and there is no need for a software installation. Unless I am missing something, this seems like a good solution. And if you need to catalog each individual query, you can output the project documentation, the resulting dataset, and the project itself to be stored in the catalog. I hope that, in future iterations, there will be APIs so you will be able to add these items to the Data Catalog programmatically.

@dgudkov Correct me if I have misunderstood the thread.

Thanks!

Hi Casey,

The problem with parameters is that the “Fixed list” and “Multiple choice” parameter types have a fixed list of values. That may not work in cases where the user should select one or more values from a list that changes (e.g. every day).

Also, a query allows the construction of more complex filters that would be challenging to create with parameters. For instance: include A, but exclude B when C = 1 or D = “N”.
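As a loose illustration of that kind of conditional exclusion, here is a sketch in Python with SQLite (not EasyMorph itself); the table and column names (`t`, `category`, `c`, `d`) are invented for the example, and one plausible reading of the rule is encoded in the `WHERE` clause:

```python
import sqlite3

# Hypothetical table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (category TEXT, c INTEGER, d TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("A", 0, "Y"),   # kept: category A is always included
    ("B", 0, "Y"),   # kept: B, but neither c = 1 nor d = 'N'
    ("B", 1, "Y"),   # dropped: B with c = 1
    ("B", 0, "N"),   # dropped: B with d = 'N'
])

# "Include A, but exclude B when c = 1 or d = 'N'"
rows = conn.execute("""
    SELECT category, c, d
    FROM t
    WHERE category = 'A'
       OR (category = 'B' AND NOT (c = 1 OR d = 'N'))
""").fetchall()
print(rows)  # [('A', 0, 'Y'), ('B', 0, 'Y')]
```

Expressing this with "Fixed list" or "Multiple choice" parameters would require decomposing the condition into several independent choices, whereas in a free-form query it stays a single expression.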

Finally, certain heavy pre-aggregations on large tables (hundreds of millions or billions of records) and post-filtering (i.e. filters applied on aggregation results) can be done right in the query (i.e. in the database) without pulling large datasets into EasyMorph.
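A minimal sketch of that pushdown idea, again using SQLite via Python as a stand-in for a real database (the `sales` table and its columns are made up): the aggregation runs inside the database and the post-filter is a `HAVING` clause on the aggregate, so only the small result set ever leaves the database.

```python
import sqlite3

# Hypothetical fact table; in practice this would be a large table
# with hundreds of millions of rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [
    ("North", 100.0), ("North", 250.0),
    ("South", 40.0), ("South", 30.0),
])

# Pre-aggregate in the database and post-filter on the aggregate
# (HAVING), instead of pulling all raw rows into the client.
result = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    HAVING SUM(amount) > 100
""").fetchall()
print(result)  # [('North', 350.0)]
```

Only the aggregated, filtered rows cross the wire, which is the whole point of keeping such filters in the query rather than in a downstream tool.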

Absolutely agree, but the comment was about “simple one-query projects,” and that made me think this would be an easy route. Also, I think the parameter notes are displayed on the Server task, but you could easily add examples/instructions, which might be easier than the software install and subsequent user education. More complex filters and aggregations might require a different approach, but I am guessing there is low-hanging fruit that can be picked using tasks on the Server.

And I am not as technical and deep in the product as you are. Only devising strategic ways to use your awesome software to automate all of our data wrangling!


Yes, Server tasks with parameters are one way of doing things. After all, computed Catalog items and tasks have a lot in common. However, tasks are designed for processing data rather than retrieving it. This is why we decided to design the Data Catalog:

  • It’s designed specifically for retrieving data
  • It’s intended for even less technical users
  • It provides data search capabilities

Here are the big differences between the Data Catalog and unscheduled tasks:

  • The major difference is the result: generally speaking, the result of a task is a shared file on the server. But as I was told in the forum, the result of a catalog item will be private, visible only to the requesting user. As we implement row-level security based on users’ logins, we could not afford to share Excel output files between users.

  • The Data Catalog is more convenient and far better documented. A catalog item AND each of its fields can be documented AND made searchable. If I wanted to do the same with tasks, I would have to use the “note” field, which is not very well displayed, and the automated documentation, which describes the workflow more than the data.

  • Also, you can access and retrieve the last 10 results, if I remember correctly. And there is an interesting cache effect for common datasets (it happens a lot on our side that two users request the exact same dataset).

More broadly, you can use the Data Catalog as a big dictionary of all your data sources, including web links to other sources you may have. Users get access to a centralized repository of data, mainly organized as documented datasets, rather than just a simple list of tasks.
