The server may fail because one or more tasks use too much memory. Do you have a good way to identify those tasks?
It would be good to have native monitoring of memory consumption per task. Currently the logs only show the remaining memory globally; that's useful information, but not enough to explain issues.
By configuring the hard and soft limits (both are optional) in any combination, you can ensure that the Server doesn't run out of memory in most cases.
Notes:
Workload management is not available for the "Default" worker.
A worker with configured workload limits can technically still run out of memory if it runs an external application that doesn't respond to termination requests from Windows.
You can use that feature to manage "misbehaving" tasks by creating a "quarantine space" for such tasks:
Create a separate worker and configure its workload limits.
Create a space that runs under that worker.
Move the task in question to that space.
Once a task is running in that space, you can monitor its memory and CPU consumption in near real time:
You are right about the limits; we definitely have to use them, but that doesn't solve the problem.
The problem is that before moving tasks into a quarantine space, you have to identify them. We have thousands of tasks, so it's really difficult to identify the ones causing memory issues.
What if the EasyMorph Server Command, for example, were able to retrieve the RAM/CPU used by a worker, as shown in your screenshot? We would then be able to automate this monitoring, right?
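In the meantime, a rough stopgap could be to sample per-process memory on the Server machine with the standard Windows `tasklist` command and log the biggest consumers. This is only a sketch: it sees processes, not tasks, so it can narrow down which worker is affected and when, but not which task. The `worker.exe` process name used below is a placeholder I made up, not the real name of the worker process.

```python
import re
import subprocess


def parse_tasklist_csv(text):
    """Parse the output of `tasklist /FO CSV /NH` into (image, pid, mem_kb) tuples."""
    import csv

    rows = []
    for fields in csv.reader(text.splitlines()):
        # Expected columns: Image Name, PID, Session Name, Session#, Mem Usage
        if len(fields) < 5:
            continue
        image, pid, mem = fields[0], fields[1], fields[4]
        # "Mem Usage" looks like "12,345 K"; the separator depends on the locale,
        # so keep digits only.
        digits = re.sub(r"\D", "", mem)
        if not pid.isdigit() or not digits:
            continue
        rows.append((image, int(pid), int(digits)))
    return rows


def top_memory(rows, name_filter="", limit=5):
    """Return the `limit` largest processes whose image name contains `name_filter`."""
    wanted = [r for r in rows if name_filter.lower() in r[0].lower()]
    return sorted(wanted, key=lambda r: r[2], reverse=True)[:limit]


def snapshot():
    """Take one sample of all processes on the local Windows machine."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_tasklist_csv(result.stdout)
```

Calling `top_memory(snapshot(), "worker.exe")` on a schedule (with the placeholder replaced by the actual process name) and writing the result to a file with a timestamp would give a timeline to compare against the task start times in the Server journal.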