Schedule tasks for load to datawarehouse

reynsnivea · May 7, 2020, 6:24pm

Hi,

When we want to load a data warehouse using EasyMorph, we need to carefully plan the order of the ETL jobs for the different business areas. Unfortunately, in EasyMorph Server, there’s no option to specify dependencies between tasks.
What’s the best practice with the current functionality to achieve this: creating one morph file with calls to all other EasyMorph projects?

Thanks
Nikolaas

dgudkov · May 7, 2020, 8:12pm

Yes, create a master project to run all other projects in the required order. Note that you can run projects as projects using the “Call” action, or run them as Server tasks using the “EasyMorph Server command” action with the command “Run task”.
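
For readers who think in code, the ordering that such a master project encodes can be sketched as a dependency-ordered run. This is only an illustration in Python, not EasyMorph functionality: the job names and the `run_job` callable are hypothetical stand-ins, and in EasyMorph the same order would simply be the sequence of “Call” or “EasyMorph Server command” actions in the master project.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical ETL jobs, each mapped to the jobs it depends on.
DEPENDENCIES = {
    "load_staging": set(),
    "load_dimensions": {"load_staging"},
    "load_facts": {"load_dimensions"},
    "refresh_reports": {"load_facts"},
}


def execution_order(dependencies):
    """Return job names so that every job comes after its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


def run_all(dependencies, run_job):
    """Run jobs one by one in dependency order.

    `run_job` is a hypothetical callable that starts one job and raises
    an exception on failure, which stops the remaining jobs from running.
    """
    completed = []
    for job in execution_order(dependencies):
        run_job(job)
        completed.append(job)
    return completed
```

Stopping at the first failure mirrors the usual expectation for a warehouse load: downstream jobs should not run on top of a failed upstream load.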

reynsnivea · May 21, 2020, 7:13pm

Hi Dmitry,

Thanks. Creating a flow with several server commands triggering different tasks seems to be an interesting option here.

I have created a Server Link, but I have the impression that we need to create a connector for each space on EM Server?

What if data domains of the data warehouse need to be loaded with different frequencies (e.g. a flow that needs to be executed as soon as a file arrives, versus a flow that extracts data from a database on a monthly basis, versus flows that run once a year on specific dates)?
How can we alter the “master flow” for loading our data warehouse to cope with these differences? Could this be solved by using a separate module per frequency or data domain and executing each one when some condition is satisfied?

Kind regards
Nikolaas

dgudkov · May 22, 2020, 7:20pm

Yes, the "EM Server Command" action doesn't work with Server Link; a connector is required.

I would assume that one master flow should be run on one schedule. If you need different schedules, you will probably need different master flows.
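
As a rough sketch of that idea, the flows can be grouped by how they are triggered, with each group becoming its own master flow with its own schedule. The flow names and trigger labels below are hypothetical examples based on the frequencies mentioned in the question, and the Python is only an illustration, not something EasyMorph runs.

```python
from collections import defaultdict

# Hypothetical catalogue of flows and the trigger each one needs.
FLOWS = [
    ("import_incoming_files", "on_file_arrival"),
    ("extract_database_a", "monthly"),
    ("load_yearly_figures", "yearly"),
    ("extract_database_b", "monthly"),
]


def group_by_schedule(flows):
    """Group flow names by trigger; each group maps to one master flow."""
    groups = defaultdict(list)
    for name, schedule in flows:
        groups[schedule].append(name)
    return dict(groups)


if __name__ == "__main__":
    for schedule, names in group_by_schedule(FLOWS).items():
        print(f"master flow for '{schedule}': {names}")
```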

reynsnivea · May 25, 2020, 12:00pm

Hi Dmitry,

Is it possible to clarify the difference between the server connection that we can establish via the button on the home page (first picture) and the second one that we can create via the add connectors button (second picture)?

The first one allows adding multiple spaces to the connector, while the second one can only contain 1 space.
Do I understand correctly that if we want to use the server command, we have to create 1 connector for each space in which we want to run tasks using that server command action?

Additional questions:

Can we use the “call another project” action to call other EasyMorph projects that reside in other spaces?
The EasyMorph Server command does not allow using a parameter for the name of the task that EM Server has to run. So if we want to deploy our master flow that orchestrates all tasks to another space or server, we have to name the tasks exactly the same. Is that correct (same task name and description)?

dgudkov · May 28, 2020, 10:23pm

The first picture shows Server Link configuration. See this post for more details: Server Link explained

Yes, the "Server command" action requires a connector to EM Server. Each connector can connect to 1 space only.

The "Call another module/project" action doesn't "know" about spaces. It refers to a project by its path (absolute or relative). If you need a project to run a task in a space, use the "EM Server command" action.

Alternatively, you can use a parameter to provide a project with a path to another project that may happen to be in another space. But that can be inconvenient, because by default the file picker in Server UI doesn't allow picking files from public folders of other spaces. You can enable picking files from the whole disk drive, but that can be undesirable from a security perspective if untrusted people can run the task.

The "Run task" command in the "EM Server command" action identifies a task by an internal ID rather than a name. That's why it doesn't support parameters. Two tasks may have the same name but different IDs. So that won't work.
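
To illustrate why matching names is not enough, here is a small conceptual sketch in Python. The `Task` record, the ID strings, and the `prod_gebouwen` space name are invented for the illustration and do not reflect how EM Server actually stores tasks or formats IDs; the only point is that a lookup by name can be ambiguous while a lookup by ID is not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: str  # hypothetical ID format, for illustration only
    space: str
    name: str


# Two tasks with the same name in different spaces.
TASKS = [
    Task("id-0001", "dev_gebouwen", "Load buildings"),
    Task("id-0002", "prod_gebouwen", "Load buildings"),
]


def find_by_name(tasks, name):
    """A name may match several tasks, so this returns a list."""
    return [t for t in tasks if t.name == name]


def find_by_id(tasks, task_id):
    """An ID matches exactly one task, or the lookup fails."""
    matches = [t for t in tasks if t.task_id == task_id]
    if len(matches) != 1:
        raise LookupError(f"no unique task with ID {task_id!r}")
    return matches[0]
```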

reynsnivea · June 19, 2020, 6:22am

Hi Dmitry,

The proposed solution to coordinate all tasks is a bit cumbersome when we have multiple spaces. We have multiple development spaces: dev_gebouwen, dev_installaties, etc. As I understand it, I need to create a connector to each space (so that I can use it in the EasyMorph Server command action) and then select a task to run in that action.

Since we are moving some data flows to production at short notice, I would like to find the best possible solution.

Ideally, it should be possible to copy all the tasks from our development spaces to our production spaces. Is there absolutely no way to do this so that we do not have to create the tasks manually?
- Maybe by creating a script that creates all the tasks in the required space? I have looked at the command line interface for EM Server, but creating a task is not available as a command.
- If we have a “master” ETL flow that triggers all other ETL flows, it should be possible to parameterize in some way the task that is triggered by the EasyMorph Server command action.

Thanks !

dgudkov · July 5, 2020, 9:06pm

I understood your problem: you need to replicate spaces, but it should work without restarting the Server service.

Let me discuss it with our development team.

reynsnivea · August 11, 2020, 1:46pm

Thanks !

dgudkov · August 19, 2020, 8:56pm

We had an internal discussion about it. Hot reloading of spaces (i.e. without restarting Server) isn’t currently available and isn’t trivial to implement.

We’re currently working on a major redesign of Server in order to support running spaces under a different account than the Server service. It’s planned for release in version 4.5. After the new version is released we will be able to get back to discussing this issue, but not before that.

reynsnivea · January 17, 2021, 4:13pm

Hi Dmitry,

Has there been any progress in EM Server development that would let us replicate tasks created in e.g. a development space to a test and a production space?

As our data warehouse and the number of EasyMorph tasks grow, we need a reliable way to run the load to the data warehouse (= a number of EM Server tasks) in the right order.
If we also have to replicate the tasks from development to production manually, this is prone to error.

It would be nice if we could select them and copy them between spaces in one operation or create a script using ems-cmd to create tasks in a space.

Thanks !
Nikolaas

reynsnivea · April 26, 2021, 8:36am

@dgudkov : Hi Dmitry,

Is there any progress on being able to create tasks in other spaces, e.g. with a script in ems-cmd or with functionality to copy tasks between spaces?

Thanks !

dgudkov · April 27, 2021, 9:37am

Hi Nikolaas,

No progress on that so far. We’re entirely focused on version 5, which is the top priority at this point.