Is there any teasing about the future data catalog?

RJO · April 2, 2021, 1:58pm

I wonder what it will look like. A data catalog is a nice way to show metadata to users and to give them a common way of querying data whatever the source (file, database, …), hiding the implementation. I don’t know if that is the purpose, or how it will be managed. I guess there will be no data modeling, as that’s not the primary purpose of EasyMorph?

So many questions and great expectations about this new feature!

dgudkov · April 2, 2021, 4:18pm

We don’t have UI sketches ready yet, but from a functional perspective the concept is as follows:

The data catalog consists of entities (e.g. orders or customers). Each entity is basically a table. Behind each entity, there is an EasyMorph project that produces a dataset on demand, according to specified parameter values (such as start date or end date, country or region, product group, etc.).

Each entity has a searchable general description and column descriptions, and a sample result.

Each entity can be certified. Certification expires after 3, 6, or 12 months. It will be possible to forbid using non-certified entities. This should encourage keeping projects up to date.
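As a rough illustration of the entity and certification concept described above, here is a minimal Python sketch. It is not EasyMorph code: the `Entity` class, its fields, and the `can_use` helper are hypothetical names made up for this example. It only models the rules stated in the post (certification lasts 3, 6, or 12 months, and non-certified entities can optionally be forbidden).

```python
import calendar
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional


def add_months(d: date, months: int) -> date:
    """Return the date `months` later, clamping the day to the month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


@dataclass
class Entity:
    # All field names are assumptions for illustration only.
    name: str
    description: str
    column_descriptions: Dict[str, str] = field(default_factory=dict)
    certified_on: Optional[date] = None
    validity_months: int = 12  # the post mentions 3, 6, or 12 months

    def __post_init__(self) -> None:
        if self.validity_months not in (3, 6, 12):
            raise ValueError("validity must be 3, 6, or 12 months")

    def is_certified(self, today: date) -> bool:
        """True while the certification exists and has not yet expired."""
        if self.certified_on is None:
            return False
        return today < add_months(self.certified_on, self.validity_months)


def can_use(entity: Entity, today: date, forbid_non_certified: bool) -> bool:
    """Apply the optional 'certified entities only' policy."""
    return entity.is_certified(today) or not forbid_non_certified


orders = Entity(
    name="orders",
    description="Customer orders by date and region",
    column_descriptions={"region": "Sales region", "amount": "Order total"},
    certified_on=date(2021, 4, 2),
    validity_months=3,
)
```

With this sketch, `orders` counts as certified until July 2, 2021; after that date, `can_use` returns `False` whenever the "forbid non-certified" policy is switched on.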

In the most basic scenarios, users will be able to retrieve a data entity by finding it in the catalog, pressing “Retrieve” and providing parameter values (if needed).

It will be possible to access the data catalog from both Desktop (in the initial release) and Server (later releases).

Access to the data catalog from Desktop and Server will be included in the Professional license. Accessing from Server UI only will require a separate user license (EasyMorph Viewer) which will be less expensive than Professional. Accessing the data catalog from Excel is on the long-term roadmap too.

Now a few words about the most unusual part of the data catalog. It will be bi-directional. It will be possible to not only read data from a data entity but also write (export) into it using a special EasyMorph action. The underlying project will receive data using the “Input” action and export it into the target system as necessary.

RJO · April 6, 2021, 8:28am

That’s really promising! It’s just as I imagined it, and it will probably help us on one of our projects where we have to query a big table by region, exactly as you mentioned in your post. Certification is also a very good idea.

The only thing that may be missing is a way to query entities. I understand that they will be queried by parameters, which is of course the most natural way. The best option would probably be a query editor on top of them, though that would not be easy to implement at all. At the very least, I guess we can manage with a text parameter containing a “where” clause and use it in the entity project. But a true graphical query editor would be awesome.
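To make the “where” clause idea concrete outside of EasyMorph, here is a small Python sketch using the standard `sqlite3` module. It is only an assumption about how such filtering could be done safely: instead of pasting free text into SQL, conditions are passed as (column, operator, value) triples, checked against a whitelist, and bound as query parameters. The table, the column names, and the `build_where` helper are hypothetical.

```python
import sqlite3

# Hypothetical whitelist: only these columns and operators may be filtered on.
ALLOWED_COLUMNS = {"region", "amount"}
ALLOWED_OPERATORS = {"=", "<>", "<", ">", "<=", ">="}


def build_where(conditions):
    """Turn (column, operator, value) triples into a parameterized WHERE clause."""
    clauses, values = [], []
    for column, operator, value in conditions:
        if column not in ALLOWED_COLUMNS or operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported condition: {column} {operator}")
        clauses.append(f"{column} {operator} ?")  # value is bound, never inlined
        values.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return where, values


# Small in-memory table standing in for the "big table to query by region".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EMEA", 120.0), ("EMEA", 40.0), ("APAC", 300.0)],
)

where, values = build_where([("region", "=", "EMEA"), ("amount", ">", 50)])
rows = conn.execute("SELECT region, amount FROM orders" + where, values).fetchall()
```

The structured form is a deliberate choice here: a raw text “where” parameter is more flexible, but it pushes SQL knowledge (and injection risk) onto the user, which is what a catalog is meant to hide.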

dgudkov · April 6, 2021, 12:53pm

Yes, we’ve come to a similar conclusion. It would be very helpful to be able to specify at least filtering conditions visually and similarly to parameters. It’s not on the roadmap yet, but once we ship a couple of versions 5.x we may get back to this question.

Overall, the idea of the data catalog is exactly as you described - to isolate users from the complexity of the underlying systems and to make the user experience simpler and less technical.

RJO · April 14, 2021, 1:46pm

We are interested in this data catalog access on Server. Currently we have built tasks for that, but it’s not very user-friendly (people have to search for the results in the file tab after a task completes; I’ve already posted a suggestion in the forum to improve that).

The ideal would be a new “Data Catalog” tab with the possibility to choose a data source. Then you enter your parameters and get your results on the same web page, ready to be exported in txt or Excel format. A preview or paginated results would be good too.

dgudkov · April 14, 2021, 2:26pm

We envision it in a similar way. It will be a separate tab, "Data catalog" with a list of data entities. When a data entity is selected, the user provides parameters and the output format - xls, dset, or txt. The result will remain available for some time (e.g. 1 hour, configurable in space settings). Alternatively, a download link to the result will be sent when the task finishes.

However, in the first rollout phase the Server UI won't be ready yet. In the beginning, the data catalog will only be available via Desktop. The Server UI will be added in later phases.

A sample data preview will be available for each data entity (as metadata) so that users can get an idea about what kind of data they would get from this data entity. A preview of result datasets is not planned.

I believe the bi-directional export feature should also be interesting to you. It's just very unusual for a data catalog, so it may not be immediately clear what use cases it may have. But it's actually quite cool, as it enables non-technical users to securely and reliably export/update data in enterprise systems without knowing SQL or web APIs. You will also no longer need to build web forms or applications to let users see/modify data in enterprise systems.

RJO · April 14, 2021, 2:59pm

Yes, it’s interesting. One common use case we have is commenting on data. You would manage a comment table as a data source in the data catalog, and users would just fill in parameters to mass-update the comment table at a specified level of aggregation in the source.

RJO · May 18, 2021, 4:35pm

If I try to put the whole puzzle together: will it become possible to link a list-of-values parameter to a data entity? In another post you said it was more logical to bind list-of-values parameters to modules. So wouldn’t it be even more logical to be able to dynamically bind lists of values to entities?

That would change the EasyMorph logic, because in order to run one project you would first, and implicitly, have to run others, with dependencies, since your parameters would depend on entities. It would be even more difficult on the Server, because parameters are displayed each time you display task properties. Maybe you will have to handle a cache of data for entities. Not an easy thing for sure!

dgudkov · May 18, 2021, 5:44pm

It will be possible to import a data entity using an action, and therefore it will be technically possible to use it for a list of values in a dynamic parameter. I’m not sure if it’s a good use case for a data catalog entity, because it will overcomplicate things.

We envision that a dynamic parameter will obtain a list of values from a module that either imports the list directly from a database or reads it from a file.

Caching is a good question and we’re thinking about it too. Caching can already be done using existing actions in EasyMorph, but we would like to offer simple ways to cache datasets in workflows and data catalog entities.
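For readers wondering what caching entity results could look like in principle, here is a minimal Python sketch of a time-limited cache keyed by entity name and parameter values. This is not how EasyMorph implements or plans to implement caching; the `DatasetCache` class and all names in it are hypothetical, and the one-hour lifetime is just an example value.

```python
import time


class DatasetCache:
    """Keep computed datasets for a limited time, keyed by entity and parameters."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._store = {}

    def get_or_compute(self, entity, params, compute):
        # Sort the parameters so that their order does not affect the cache key.
        key = (entity, tuple(sorted(params.items())))
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: reuse the stored dataset
        result = compute()  # expired or missing: run the underlying project
        self._store[key] = (now, result)
        return result


def load_orders():
    """Stand-in for running the project behind an entity (hypothetical)."""
    return [("EMEA", 120.0), ("EMEA", 40.0)]


cache = DatasetCache(ttl_seconds=3600)
first = cache.get_or_compute("orders", {"region": "EMEA"}, load_orders)
second = cache.get_or_compute("orders", {"region": "EMEA"}, load_orders)
```

In this sketch the second call returns the stored result without running `load_orders` again, as long as it happens within the hour.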

dgudkov · November 2, 2021, 10:34pm

The thread continues here: