Data Catalog: instructions for beta-testers

The Data Catalog is a feature of EasyMorph Server. However, for the purpose of beta-testing, you don't have to install EasyMorph Server. Instead, we've made a public demo site that is available at https://demo.datacatalog.com.

Configure the test environment

1) Request a demo account

Send me a DM (direct message) here on this forum to obtain a demo account. The demo account is a Server space (with password protection). Note that in the production version, the Data Catalog will only be available for spaces with Active Directory authentication (for licensing purposes).

Test spaces are isolated from each other using Server workers that run under different Windows accounts. Beta-testers can't access each other's files.

2) Download EasyMorph Desktop v5.2 (BETA).

Here is the download link: Download EasyMorph Desktop v5.2 (BETA)

Trial key valid until May 31st: EasyMorph.zip (you will need it if you install the beta under a different Windows account)

To avoid overlapping with an existing EasyMorph installation, it is recommended to install the beta Desktop under a temporary Windows account on your machine, or on a machine without an EasyMorph installation.

Certain features of the beta version may not work correctly. Don't use it for production data or workflows.

3) Configure Server Link

Configure Server Link to point to demo.datacatalog.com, use HTTPS, port 443.

image

Also, configure your Connector Manager to use connectors from your test space:

image

Data Catalog overview

The purpose of the Data Catalog is to simplify access to various sources of business data and provide means for access audit and governance.

Data Catalog has a hierarchical folder-like structure of directories. Each directory contains catalog items and/or other directories.

:exclamation: Note that catalog directories are NOT subfolders of the Public folder (tab Files). The directories are a separate data structure that is stored in an internal database and not accessible externally.

Catalog items can be of 3 types:

  • Datasets
  • Files
  • URLs

In turn, each of the item types can be static or computed (using an EasyMorph workflow). Therefore, in total there are 6 item types:

  1. Static EasyMorph datasets - regular .dset files stored in the Public folder of EasyMorph Server.

  2. Computed datasets - EasyMorph datasets computed dynamically on the fly using a published project stored in the Public folder of the Server. The result table of the project is the resulting dataset.

  3. Static file - any file stored in the Public folder of EasyMorph Server. For instance, a PDF file.

  4. Computed file - a file stored in the Public folder. The relative path to the file is computed dynamically on the fly using a published project stored in the Public folder. The first value of the first column is the resulting relative path to the file.

  5. Static URL - any URL (e.g. a hyperlink to a PowerBI report)

  6. Computed URL - a URL that is computed dynamically on the fly using a published project stored in the Public folder. The first value of the first column is the resulting URL.
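The six combinations above are simply three item kinds crossed with two modes. A minimal Python sketch of that idea follows; the names `ItemKind`, `ItemMode`, and `CatalogItemType` are hypothetical and chosen only for illustration, they are not part of any EasyMorph API.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class ItemKind(Enum):
    """What the catalog item delivers (hypothetical names)."""
    DATASET = "dataset"
    FILE = "file"
    URL = "URL"


class ItemMode(Enum):
    """Whether the item is stored as-is or produced by a published project."""
    STATIC = "static"
    COMPUTED = "computed"


@dataclass(frozen=True)
class CatalogItemType:
    kind: ItemKind
    mode: ItemMode

    def label(self) -> str:
        # e.g. "Static dataset" or "Computed URL"
        return f"{self.mode.value.capitalize()} {self.kind.value}"


# Every kind combined with every mode gives the six item types.
ALL_ITEM_TYPES = [CatalogItemType(kind, mode) for kind, mode in product(ItemKind, ItemMode)]

for item_type in ALL_ITEM_TYPES:
    print(item_type.label())
```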

As you may have noticed, computed URLs and file paths use the top (first) value of the first (leftmost) column. Example:

image

Any additional columns or rows in the result dataset are ignored. The name of the first column doesn't matter.
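The "first value of the first column" rule can be sketched in a few lines of Python. This is only an illustration of the rule, not how EasyMorph implements it: the function name `first_cell` and the row-list representation of the result table are assumptions, and raising an error on an empty result is also an assumption, since the behavior for an empty result isn't described here.

```python
from typing import Any, Sequence


def first_cell(rows: Sequence[Sequence[Any]]) -> str:
    """Return the top value of the leftmost column as a string.

    All other rows and columns are ignored, and column names play no role.
    """
    if not rows or not rows[0]:
        # Assumption: an empty result is treated as an error.
        raise ValueError("the result table is empty")
    return str(rows[0][0])


# A hypothetical result table of a published project (rows of values).
result_rows = [
    ["https://example.com/report", "ignored column"],
    ["ignored row", "also ignored"],
]

print(first_cell(result_rows))
```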

Since computed items (datasets, files, and URLs) use a project to calculate their results, project parameters can be specified before retrieving a computed item. This makes computed catalog items somewhat similar to Server tasks. However, they are not Server tasks. The Catalog is a separate feature of EasyMorph Server.

More item types will become available in later releases.

Working with the Data Catalog in Desktop

EasyMorph Desktop now consists of two integrated applications - Workflow Editor and Data Catalog. You can switch between them in the app bar, introduced in version 5.2 (see below). The Workflow Editor is what EasyMorph Desktop was prior to v5.2. Let's look closer at the Data Catalog part:

The app sidebar (1) is where you switch to the Catalog. It's been introduced in v5.2.

The "Recent computations" sidebar (2) shows the recent computed items. Note that static items aren't displayed here.

Finally, the biggest part of the screen is occupied by the catalog browser (3). Here you can browse directories and sub-directories, create new directories and catalog items, and retrieve items.

Let's look closer at a catalog item:

image

When you hover over it with the mouse, it reveals two buttons - "More details" and "Retrieve". The former displays a dialog with more information about the item (such as annotation or related items). The latter retrieves the item.

Retrieving items

What happens when you press "Retrieve" depends on the item type:

Retrieving a dataset will download it to your machine and open it in the Dataset Viewer (more on that below).

Retrieving a file will save it to the specified folder on your machine.

Retrieving a URL will open the URL in the default web-browser on your machine.
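The three behaviors above amount to a lookup by item kind. The Python sketch below only restates that mapping; the function name `describe_retrieval` and the kind strings are hypothetical and do not correspond to any EasyMorph API.

```python
def describe_retrieval(kind: str) -> str:
    """Describe what pressing "Retrieve" does for a given item kind."""
    actions = {
        "dataset": "download the dataset and open it in the Dataset Viewer",
        "file": "save the file to the specified folder on your machine",
        "url": "open the URL in the default web browser",
    }
    try:
        return actions[kind.lower()]
    except KeyError:
        raise ValueError(f"unknown item kind: {kind!r}") from None


for kind in ("dataset", "file", "URL"):
    print(f"{kind}: {describe_retrieval(kind)}")
```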

Parameters

When retrieving a computed item, you may be prompted to provide parameters first. Note that parameter annotations are displayed as well.

image

When a catalog item is created/edited, parameters to enter should be explicitly selected from a list of available project parameters. By default, computed items have no parameters.
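As a rough illustration of the selection rule, the Python sketch below returns only the parameters that were explicitly selected for an item, and nothing when no selection was made. The function name `prompted_parameters`, the dictionary layout (parameter name mapped to its annotation), and the sample parameter names are all assumptions made for the example.

```python
from typing import Dict, Iterable


def prompted_parameters(
    project_parameters: Dict[str, str],
    selected: Iterable[str] = (),
) -> Dict[str, str]:
    """Return the parameters (name -> annotation) a user would be prompted for.

    Only explicitly selected parameters are included; by default none are.
    """
    selected = list(selected)
    unknown = [name for name in selected if name not in project_parameters]
    if unknown:
        # Assumption: selecting a parameter the project doesn't have is an error.
        raise KeyError(f"not a project parameter: {unknown}")
    return {name: project_parameters[name] for name in selected}


# Hypothetical parameters of a published project, with their annotations.
project_parameters = {
    "Start date": "First day of the reporting period",
    "Region": "Sales region code",
}

print(prompted_parameters(project_parameters))              # nothing selected
print(prompted_parameters(project_parameters, ["Region"]))  # one parameter selected
```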

Dataset Viewer

The Dataset Viewer displays retrieved datasets, computed or static. In the Viewer you can:

  • View dataset
  • See dataset metadata (unique counts, etc.)
  • Find values
  • Send dataset to a sandbox (new or existing) in Workflow Editor
  • Save dataset in a supported file format (csv, xlsx, dset, etc.)
  • View catalog item details (description, related items, parameters, etc.)
  • Discard dataset from memory

The Dataset Viewer can keep several datasets open. Since all of them are stored in memory, make sure to discard large datasets when they are no longer needed to avoid running out of memory.

Adding catalog items

Adding catalog items is done from the start screen of the Catalog.

image

Note that if you are creating a computed item, you will need to create a corresponding EasyMorph project first, and publish it to the Server. Then, when creating a computed item, you can specify the published project in item settings.

Item fields

It is possible to describe fields of each item, be it a dataset or a file (e.g. a PDF file). These fields are searchable - you can find an item by a field. In the initial release, item fields should be created manually. In later releases, we will add tools for automated field creation/editing.

Working with Data Catalog in the web UI

After logging in to EasyMorph Server, the Catalog is available in a new tab unsurprisingly named "Catalog".

From the web UI, it is possible to view item details and retrieve items (computed or not). Note that viewing datasets is not possible from the web UI. Retrieved datasets will be available for downloading and further viewing locally in EasyMorph Desktop.

Retrieving files and opening URLs works as expected.

Journal

Most operations with catalog items are recorded in the Server journal. Therefore you can view who accessed what item and when.

Accessing the Server journal of the public demo site is not possible for beta testers. If you would like to test journalling of catalog operations, request a Server installer of v5.2 from our support.

Here is my first feedback. Globally it's working quite well and I understand the logic :

  • In the desktop, when I click on details on a catalog item, I have a description tab but I should have a Fields tab and it's not there. On server, the tab is there. So Fields description are not displayed in the desktop, you can only use them to search

  • On Server, the recent computations look quite similar between them. It would be good to add the values of parameters used, at least the date of running. Same on Desktop. Or we can not differentiate results even if I guess that they are displayed by last run date desc. It's a bit confusing.

  • On Server when I click on success on last results I have an error : Unable to get or render event details: Event not found

  • On Server for computed dataset, I see that I can just download the file .dset. That is really disappointing because I would expect to be able to download at least as excel as in the desktop. I really appreciate this feature with the fact that you don't need to store the .dset. The thing missing is the output : users would expect more possibilities => they will always have to use the desktop in this case

  • Question : can we disable the last results in desktop and server ? I mean disable the storage of last computations ? Because I see that anyone in the workspace can download the result of another one. So if you intend to apply a security based on login for example, you can't because each one has access to others computations.

  • I had bugs with file computations. I understand that I have to store the result on the server in this case. I used this path : Dataset\states.csv. On desktop, If I wanted to save in X folder, I understood that I had to create MYSELF the folder Dataset under X so that it can work. This will be not clear for a user. More than that, It's not working : I think you are searching the file to download in the local saving folder defined by the user but you should rather search on the server, where I computed it. So it's not finding the good file. On server when I wanted to download I had a red "OK" message but no download.

  • On dataset from desktop, when you click to open the dataset and before the dataset is opening there might be seconds or minutes if it's very big. I would suggest to display a loading animation or something or we don't really know what is happening.

Edit : the "Morph it" button after dataset upload does nothing for me. I think it's the only way to store running configurations including parameters. It would be good to have bookmarks including on the server, as we already spoke together

It seems I can now download the computed file on my laptop only if I create before the server folder tree where the file is supposed to be. On the server, I still can not download the file, there is an "OK" message in red and that's all.

Note that when you click on retrieve on a computed file, you are directed to a wrong page => it's redirecting to "default" workspace instead of my workspace.

Regular users will be a bit surprised by this behaviour. They expect to get immediately their file and instead, they must wait for the catalog task to finish and click on download, not so obvious. And as I said in the previous post, there should be at least the time of execution and also the requester. Other idea : filter by default to only see YOUR own executions, and add a filter to see all executions.

Edit : There is something you can implement on server and desktop. Imagine you do as described above : only your own last submissions are shown by default, not others (there would be a button for that). You then can also add a button to rerun a submission. That would act as bookmark : one user would be able to rerun the same extract with the same parameters every day.

Thank you very much for the feedback, Romain! A few comments:

The Fields tab will be available in one of the next updates of Desktop 5.2 BETA.

In the final release, it will be possible to download a dataset in different formats - CSV, XLSX, etc.

To simplify beta-testing, we don't require Active Directory authentication, but this in turn makes it impossible to distinguish between users. In the final release, the Catalog will only be available with AD authentication and users will only see their own computations and results.

Animation will appear in further updates of Desktop 5.2 BETA.

A few questions from my end:

  • Does the whole idea of the Catalog seem useful for your organization? The Catalog is basically a virtualized view of datasets (and other analytical assets) stored elsewhere or obtained on the fly. This is different from a traditional centralized data warehouse, where data is "materialized".
  • Does the "URL" item type make sense? Do you see a use for it in your organization?
  • Is the process of creating and publishing catalog items sufficiently simple for non-technical people?

Hi Dmitry,

From my perspective, I was expecting a kind of shared super data catalog (see attached project). In this example, from an open data connection, I select a set of datasets and then an iterated module would add each dataset to a catalog. A kind of "add to catalog" action. A set of dedicated actions to manage data catalog could be defined. (suppress, update if exist, freeze…)
To retrieve the datacatalog datasets in a project, I was expecting a 3rd tab in the connection manager, a catalog folder could be the equivalent of a connector name.
To sum up, something like data catalog/data virtualization/semantic layer combined and done the EM way.

As usual, your team and you did an awesome work. I've said it many times but EM is the most innovative software I've seen in the last 10 years.

Regards

Open Data CVO WIZ data cat 2.morph (11.6 KB)


That sounds great thanks !

Yes of course data catalog is useful, I'm only regretting it costs more on server side :frowning: But yes for users it's very convenient because they can build new datasets on their own or use the one we will create for them. Everything is centralized and searchable, that's a big plus. The equivalent I have in mind is datalake + Atlas. What we have now, tasks and files, are not so easy to use and everyone can see the files. It's more a batch thing than what you propose now, which is more suitable to end users in interactive mode.

Links are interesting for my team because we provide a lot of links including power bi reports, paginated reports and so on. This is a way to centralize everything, and users really need to stay in the same tool to avoid wasting time. I think you should keep it, and I'm sure it's not hard to maintain.

The process of creating and publishing items is very simple. You can hardly make it easier than that.


Thank you for the feedback, Christophe!

The Catalog will have tools for automation and an API, just not in the initial release. There will be a new action, "Catalog command", that will allow users to automatically create/update/delete catalog items and their metadata such as fields. Therefore it will be possible to retrieve a list of database tables or files, and generate catalog items with an EasyMorph workflow.

The beta-testing of the Data Catalog is now open. The topic has been moved to the main category.

Hi Dmitry
I uninstalled EasyMorph on my computer, then installed the Beta version.
In Diagnostics: EasyMorph version: 5.1.2.20 (ef8c88, 'Beta release')

I configured the Server Link
I also configured your Connector Manager, but cannot see any connector names.
I can also not see the Data Catalog.
In the notes above you indicated that the Data Catalog is available from version 5.2.
Am I on the wrong version?

Thanks

Hi Rykie,

This is what you should see after configuring Server Link and switching to Data Catalog in EasyMorph Desktop.

There are no catalog items because you haven't created any yet. Each beta tester has his/her own space which is initially empty. Create a few catalog items first. See "Adding catalog items" in my post above.

Hi Dmitry

I do not see that.

This is what I see. The data catalog is not available.

Where am I going wrong?

Try reinstalling the beta using the link in the opening post. I've updated it to the most recent build.

Thanks, Dmitry.
It was user error (me).
I did not install the key.


Yes, the free edition doesn't have access to the Data Catalog.

Hi Dmitry

My feedback from a user's perspective:

  1. The Data catalog concept is important
  2. I do not totally understand the computed imports
  3. I know that saving to your hard drive is great for users, but it can be challenging from a data perspective one source of the truth.
  4. Adding fields - Drop and drag would have been great or importing the headers.
  5. You cannot change the description or long description after publishing

I had some experience in working with Samenta with setting up a data catalog about 4 years ago.

We did it centrally and set up related links "that you could see in a diagramme".
I am trying to get my head around individuals loading their data catalogs. I am used to a central point (such as from a server side) that loads the "web of data" & metadata and then gives access to individuals to a dataset. I am not sure about the data governance if anyone can load data.

Another suggestion: From Server, can you pick up all data sources being used and start a catalog that way.

I hope my comments helped.

Thank you for the feedback, Rykie. See my answers below:

I do not totally understand the computed imports

A computed item means that the result is produced on the fly using an EasyMorph project stored on Server in the Public folder. For instance, a computed dataset means that the resulting dataset isn't a ready .dset file, but instead computed on the fly using an EasyMorph project.

Similarly, a computed hyperlink (URL) is computed by an EasyMorph project, that should return a URL in the 1st row of the 1st column. Then the resulting URL is opened by EasyMorph.

I know that saving to your hard drive is great for users, but it can be challenging from a data perspective one source of the truth.

The assumption is that the Catalog serves as the single source of truth. If everyone obtains the same data from the Catalog, then everyone is on the same page.

Adding fields - Drop and drag would have been great or importing the headers.

Yes, it's fully manual in the initial release, but we will add ways to semi-automate or fully automate adding field information.

You cannot change the description or long description after publishing

Every item property, including the descriptions, can be changed. For that, click the black arrow in the item and select "Edit" in the item menu. Edit a description and then press Publish.

image

Perhaps, I should do a webinar and demo the Catalog and answer questions about it. I understand that the concept is new and it requires a bit of adjusting especially for long-time EasyMorph users.

Thanks, Dmitry. Yes, a demo will be good.

@dgudkov

Hi Dmitry,

It would be very interesting to see a video explaining the capabilities of the data catalog (product showcase) and maybe another one explaining more in depth how to work with it.

For example: how does one distribute the catalog to other users who are outside your organisation ?
Can we grant access rights for each catalog item ?

Thanks !
Nikolaas

We will do a webinar on Data Catalog on June 24th at 10am EST (3pm GMT). In the webinar, I will demonstrate the Catalog and answer questions. The registration link is below.

image

>> Click to register <<


Hi Dmitry,

Great initiative !
Unfortunately, I have another item in my agenda at that point.
Will there be a recording of this webinar ?

Thanks !
Nikolaas