How to run task when file appears in folder

dgudkov · August 25, 2020, 10:14pm

Currently, EasyMorph (both Launcher and Server) can only trigger tasks either manually or on schedule. Event-based task triggering (for Server) is on the product roadmap but in order to get there, we will have to redesign the way tasks are executed on the Server and add the ability to run multiple instances of the same task (currently only one instance of a task can run).

Meanwhile, here are a few ideas for how you can trigger a task on a file event, such as a file appearing in a folder.

Ideas #1, #2, and #3 only work when the next file is placed in the folder after the task has finished processing the previous file.

Idea 1: Run task every 5 minutes

In the simplest case, you can just run the task every 5 minutes using the Continuous schedule available in both Launcher and Server. The project should do the following:

Use the “List files” action to obtain a list of files in the specified folder.
Optionally, use a filtering action to keep only the file (or files) with specific name(s). If the needed file(s) are not there, the list should be empty at this point.
Use conditional branching to continue only if the list is not empty, for instance, a conditionally derived table. In that table call a module that does what’s needed with the file(s).
When that module finishes, delete the processed files, or move them to another folder. Otherwise they will be processed again and again by the task.

Pros: Simple.
Cons: Latency can be as big as 5 minutes.
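
For readers who think in code, the steps above can be sketched in Python. This is only an illustration of the logic, not how EasyMorph implements it: the function name poll_once, the folder arguments, and the process callback are all hypothetical.

```python
import shutil
from pathlib import Path


def poll_once(inbox, done, pattern, process):
    """One scheduled run: list, filter, process if anything matched, then move."""
    inbox, done = Path(inbox), Path(done)
    done.mkdir(parents=True, exist_ok=True)
    # Steps 1-2: list the folder and keep only the files of interest.
    matches = sorted(p for p in inbox.glob(pattern) if p.is_file())
    # Step 3: an empty list means there is nothing to do on this run.
    if not matches:
        return []
    handled = []
    for path in matches:
        process(path)
        # Step 4: move the file away so the next run does not process it again.
        shutil.move(str(path), str(done / path.name))
        handled.append(path.name)
    return handled
```

A scheduler would call poll_once every 5 minutes; most calls return an empty list and do nothing.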

Idea 1A: Run 5 tasks every 5 minutes with different delays

Basically the same idea as the previous one, but instead insert the “Wait” action as the very 1st action in the project. The delay of the “Wait” action must be configured using a parameter, e.g. {Start delay}.

Now create 5 tasks in Launcher/Server that run the project every 5 minutes (i.e. all 5 tasks run the same project). However, each of the tasks runs with a different value of the parameter {Start delay}.

Task 1 runs with {Start delay} = 0.
Task 2 runs with {Start delay} = 60.
Task 3 runs with {Start delay} = 120.
Task 4 runs with {Start delay} = 180.
Task 5 runs with {Start delay} = 240.

In this case the worst-case latency drops to about 1 minute.
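
The arithmetic behind the staggered delays can be checked with a small Python sketch. The helper names start_delays and wait_for_next_check are made up for this illustration, and the model ignores the time the task itself takes.

```python
def start_delays(period_s, task_count):
    """Spread task_count start delays evenly over one schedule period."""
    step = period_s // task_count
    return [i * step for i in range(task_count)]


def wait_for_next_check(period_s, delays, arrival_s):
    """Seconds from a file's arrival until the next task looks at the folder.

    arrival_s is measured from the start of a schedule period.
    """
    # Checks in this period plus the same checks in the following period.
    checks = sorted(delays) + [d + period_s for d in delays]
    return min(c - arrival_s for c in checks if c > arrival_s)


delays = start_delays(300, 5)  # five tasks on a 5-minute schedule
worst = max(wait_for_next_check(300, delays, a) for a in range(300))
```

With five tasks on a 300-second schedule, delays is 0, 60, 120, 180, 240 and worst is 60 seconds, compared with 300 seconds for a single task.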

The idea can be adapted for the case when processing an input file takes longer than 1 minute (i.e. another task will be triggered before the file is processed). In this case the input file must first be moved to a temporary folder that belongs to the task that picked it up. Make sure that each task works with its own temp folder, and no temp folder is shared between two tasks. Another parameter can be used to specify a temp folder name for each task.

Pros: scalable, low latency.
Cons: multiplies the number of tasks that need to be set up.
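
As a rough Python sketch of the "move it to your own temp folder first" step: the function name claim and its arguments are hypothetical, and the sketch assumes the input folder and the temp folder are on the same volume, so that the move is a single rename that only one task can win.

```python
import os
from pathlib import Path


def claim(path, temp_dir):
    """Move path into this task's private temp folder.

    Returns the new location, or None if the file was already taken.
    """
    path, temp_dir = Path(path), Path(temp_dir)
    temp_dir.mkdir(parents=True, exist_ok=True)
    target = temp_dir / path.name
    try:
        # A rename within one volume either succeeds fully or not at all.
        os.replace(path, target)
    except FileNotFoundError:
        # Another task moved the file between listing and claiming.
        return None
    return target
```

Each task would pass its own temp_dir (the second parameter mentioned above) and process only the files for which claim returned a path.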

Idea 2: Lower latency

If a lower latency is required, you can wrap the project from Idea #1 in a loop, and add the “Wait” action to insert a pause between iterations. The pause can be, for instance, 30 seconds. In this case, you will need to have a loop with 9 iterations, with each iteration lasting no longer than 30 seconds, so that the total delay is no longer than 4 minutes and 30 seconds. Then trigger the loop every 5 minutes using the Continuous schedule. The main loop will look as follows:

Use the “Generate sequence” action to generate a table with 9 rows.
Use the “Iterate” action to run the project from Idea #1. The project should additionally have the “Wait” action configured for a 30-second delay. Make sure that the delay is enforced in both branches of the conditional branching, i.e. both when the necessary file(s) are found and when they are not.

Pros: lower latency.
Cons: works reliably only with short tasks (up to 5-6 seconds) or when new files appear no more often than every 5 minutes. If the main task takes a long time, or new files appear more often than every 5 minutes, there is a risk that the task will be started again while it is still running, which will cause the task to fail.
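
The loop can be pictured with the following Python sketch. It is an illustration only: run_cycle and spare_seconds are invented names, and check stands for one pass of the Idea #1 project that returns True when a file was found and processed.

```python
import time


def run_cycle(check, iterations=9, pause_s=30.0, sleep=time.sleep):
    """Call check() `iterations` times with a fixed pause after every call."""
    found = 0
    for _ in range(iterations):
        if check():
            found += 1
        # The pause is applied in both branches: file found or not found.
        sleep(pause_s)
    return found


def spare_seconds(period_s=300, iterations=9, pause_s=30):
    """Time left in one schedule period for the actual work."""
    return period_s - iterations * pause_s
```

With the numbers from this post the pauses alone take 270 seconds, so spare_seconds() is 30: all nine iterations together have about half a minute for the real work before the next scheduled start.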

Idea 3: Trigger task via Server API

This idea can be suitable for cases when the system that generates the source files can run an external application immediately after it places a source file into the designated folder. In that case the external system can run the ems-cmd utility from the command line, which in turn will trigger the task via the Server API. For more information on ems-cmd, see: https://github.com/easymorph/server-cmd

Pros: The lowest latency time possible.
Cons: the external system must be able to run programs from the command line.
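
If the external system happens to be scriptable, its side of the hand-off could look roughly like this Python sketch. Everything here is an assumption for illustration: the function name deliver_and_trigger, the ".part" suffix, and above all trigger_cmd, which is a placeholder for whatever ems-cmd command line you build from the documentation linked above (its arguments are deliberately not shown here).

```python
import os
import subprocess
from pathlib import Path


def deliver_and_trigger(data, inbox, name, trigger_cmd):
    """Write a file completely, publish it under its final name, then trigger."""
    inbox = Path(inbox)
    inbox.mkdir(parents=True, exist_ok=True)
    partial = inbox / (name + ".part")
    # Write under a temporary name so a reader never sees a half-written file.
    partial.write_bytes(data)
    os.replace(partial, inbox / name)
    # trigger_cmd is a placeholder list such as ["ems-cmd", ...]; take the
    # real arguments from the ems-cmd documentation.
    return subprocess.run(trigger_cmd, check=False).returncode
```

The returned exit code lets the external system notice when the trigger command itself failed.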

Idea 4: Queued processing.

Unlike the previous ideas, this idea may work even if a new file appears before the previous one is processed.

What if new files appear frequently, e.g. every few seconds? In this case, the best approach would be to queue files in a folder and then process them in one batch, i.e. every 5 minutes run a task that processes a batch of files and removes them from the input folder (cleans up the queue). It might be a good idea to move input files into a temporary folder before processing them.

Pros: Reliable and simple.
Cons: Latency can be big. May not work if incoming files have the same names.
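
A minimal Python sketch of taking one batch out of the queue, under the assumption that each scheduled run gets its own batch folder. The names take_batch, work_root, and batch_id are hypothetical.

```python
import shutil
from pathlib import Path


def take_batch(inbox, work_root, batch_id):
    """Move everything currently queued in inbox into a fresh batch folder."""
    inbox = Path(inbox)
    batch_dir = Path(work_root) / batch_id
    # exist_ok=False: refuse to reuse a batch folder from an earlier run.
    batch_dir.mkdir(parents=True, exist_ok=False)
    moved = []
    for path in sorted(inbox.iterdir()):
        if path.is_file():
            target = batch_dir / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Files that arrive while a batch is being processed simply stay in the input folder until the next run. This does not help when a new file overwrites a queued file of the same name before it is picked up.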

Notes

If you can do batch processing, do it rather than anything quasi-real-time. Batch processing is the most reliable way of processing files so far.
If processing a file takes a long time and low latency is expected, Idea 1A is probably the best.
Keep in mind that when a file appears in a folder, it may still be in the process of being copied and thus incomplete (especially large files). If you read an incomplete file, you may lose data.

If there are no lock files (aka marker files) used, it might be a good idea to only process files that were created at least N seconds ago. The “List files” action allows obtaining the file creation timestamp. So the filtering condition may include something like this:

  now() - [Created] > 1/24/60/60 * 5   // Keep only files created at least 5 seconds ago (1/24/60/60 is one second expressed in days)
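
The same "old enough" filter, sketched in Python for comparison. This is an approximation rather than an equivalent: it uses the last-modified time instead of the creation time, because that is what Python exposes consistently across platforms, and the name settled_files is made up.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 24 * 60 * 60  # so 5 seconds is 5 / SECONDS_PER_DAY days


def settled_files(folder, min_age_s=5.0, now=None):
    """Return files whose last modification is more than min_age_s seconds old."""
    now = time.time() if now is None else now
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and now - p.stat().st_mtime > min_age_s
    )
```

A file that is still being written keeps getting a fresh modification time, so it stays out of the result until writing has stopped for at least min_age_s seconds.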

RJO · August 27, 2020, 2:57pm

We implemented Idea 1 (every x minutes), but the “problem” is that after the file is processed, the task keeps running for nothing.

It’s still possible to build a new task that disables the tasks that have already completed and reactivates them at midnight, for example. But there are two problems: it is impossible to disable a task by id/name (only via the drop-down list in EasyMorph), and if your midnight job fails, nothing will run the next day.

Hendrik_Lombard · November 22, 2020, 7:21pm

I found an excellent, free utility called “Directory Monitor” which you can set up to monitor a folder and then run a morph project when a new file is detected.

DirMon exe

Tomas_French · January 15, 2024, 6:53pm

Hi Dmitry,
I've implemented Idea 1A and it runs perfectly. I've selected the "halt project execution" option on the first export so the task runs every 5 minutes but doesn't execute after the first one. Do you see any problems with this approach?
Secondly, and most importantly, if my PC is turned off and Windows is not active, the task will not run. Is there a workaround to have the scheduled task run when Windows is not active?

dgudkov · January 15, 2024, 9:57pm

You would typically have another computer (server) that is always on to run your EasyMorph tasks.

Richard · October 23, 2024, 11:50am

Hi Dmitry,
I'm keen on having the same ability to trigger EasyMorph processing once a file arrives in a directory - has this ability been included in the roadmap since this original conversation thread?

If not, do you know if it's due to be available in the near future or would I have to use one of your options listed above for now?
Many thanks,
Richard.

dgudkov · October 23, 2024, 3:03pm

Hi Richard,

For now, use one of the options described above. In version 5.9, we're introducing the concept of triggers - various events that can trigger tasks. I can't confirm yet that the "file appears in a folder" trigger will be available in the initial release, but in any case, it will be added rather quickly after triggers become available, as it's a frequent feature request.