We're working on tight Python integration in EasyMorph - EasyMorph workflows will be able to call Python scripts as if they were another workflow. This will be achieved with a native Python package (plug-in) that will allow the following:
Obtaining workflow parameters in Python
Obtaining the input dataset in Python
Reading .dset files in Python
Building in Python an output dataset that will be returned to the calling EasyMorph workflow or saved into a .dset file
Accessing Shared Memory from Python
Retrieving dataset assets from EasyMorph's catalog in Python
A Python script for use in EasyMorph might look something like this (the syntax is approximate, just to illustrate the concept):
import easymorph
from datetime import datetime

# somewhere down the code

# get a parameter
date = easymorph.parameters["start date"].as_date()

output_dset = easymorph.Dataset()

# process the input dataset row by row
for row in easymorph.input.rows:
    description = row["description"]
    print(description)
    output_row = {
        "timestamp": datetime.now(),
        "description": description.upper(),
    }
    # build the output dataset
    output_dset.add(output_row)

easymorph.output = output_dset
Imagine combining the power of EasyMorph workflows for extracting, manipulating, cleansing, transforming, and automating data with the capabilities of Python for performing advanced data science or creating machine learning models. Isn't this what all of us data nerds dream of?
We have been using EasyMorph for about two years now, and there are very few things we haven't been able to achieve without Python. However, for the few cases where we do use it (e.g., interacting with the Qlik Sense API server), and to open up a new range of possibilities while keeping a good balance between tech and non-tech users, this is, in my opinion, a great enhancement.
Wonderful!
Question: what exactly is the Python plugin? A venv with a special easymorph package but otherwise able to use whatever wheel I want (pip install xxx)? Or is it something with more restrictions?
This is still an early announcement, so we have not yet settled on the distribution method (for now, in development, it's just a .whl; no venv). Let us know any specific use cases or constraints you have.
At the moment I have workflows where I output data, parse it with Python, then pull the parsed data back in, so it would be amazing to be able to do that within EasyMorph.
It would also be good to have the same AI feature that you rolled out for writing functions available for writing Python code, so non-coders could write a description of what they want the Python to do and have it generated for them. The alternative is doing it in Gemini, ChatGPT, etc., but without the benefit of parameter integration, which they would then have to work out themselves.
@vlad_dzhos will be the best person to confirm exactly how, but I've been told it should be possible to work with pandas along with other commonly used data science and machine learning libraries.
Hi Matt,
Thanks for the quick response! That's great to hear that pandas and other data science libraries should be supported.
Having seamless pandas integration will make the Python workflows even more powerful.
Looking forward to testing this out once it's ready!
To clarify - in the first release, integration is possible with minimal 'glue' code. For example:
import pandas as pd
import easymorph as em

def dataset_to_df(ds: em.Dataset) -> pd.DataFrame:
    # each EasyMorph column becomes a DataFrame column of the same name
    return pd.DataFrame({col.name: list(col) for col in ds.columns})

def df_to_dataset(df: pd.DataFrame) -> em.Dataset:
    b = em.DatasetBuilder()
    for name, series in df.items():
        b.add_column(name, series)
    return b.to_dataset()
Direct to/from_dataframe API is not planned for this initial release. This first version will be fairly lightweight and without dependencies. We are looking for feedback and would of course consider expanding the API in the future.
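For example, a round trip through pandas with the helpers above might look like this (my own sketch; the easymorph calls follow the approximate API shown earlier in this thread, and the "description" column name is made up):

df = dataset_to_df(em.input)                        # EasyMorph input -> DataFrame
df["description"] = df["description"].str.upper()   # any pandas transformation here
em.output = df_to_dataset(df)                       # DataFrame -> EasyMorph output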
Hi @vlad_dzhos,
Thank you for the clarification and the example code - this is really helpful! The glue code approach looks straightforward and clean. I appreciate that you're keeping the first release lightweight without external dependencies, which makes a lot of sense.
A couple of questions on the initial implementation:
The column-based conversion approach looks efficient. Will it preserve data types (dates, numbers, etc.) when converting between formats?
For the DatasetBuilder API, is it possible to add multiple rows at once, or would we iterate row by row? Just thinking about performance for larger datasets.
The minimal approach actually gives nice flexibility - we can customize the conversion based on our specific needs.
I think this is a solid foundation for the first release. Having the ability to work with pandas (even with glue code) opens up so many possibilities for data processing and analysis within EasyMorph workflows.
Looking forward to testing this out.
Regards,
Knut Petter
The dataset that you get from the workflow (em.input) will have the following type mappings:
Empty cell -> None
Number cell -> float
Text cell -> str
Boolean cell -> bool
Similarly, the same types (None, float, str, bool) are accepted as inputs in the dataset builder.
Additionally, datetime is accepted and converted to an OADate number, the same way EasyMorph handles dates in-engine.
Generally, the idea is that at least for now users would use DatasetBuilder to prepare the output dataset (or .dset file), so the types it accepts should match what EasyMorph can work with. Similarly, the Dataset returned by input is immutable and contains only the types the workflow can produce.
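To make the mappings concrete, here is a small illustration of my own (not part of the package): branching on the cell types listed above, plus the datetime-to-OADate conversion, which counts fractional days since the OLE Automation epoch of 1899-12-30. The column name is made up.

import easymorph as em
from datetime import datetime

for row in em.input.rows:
    value = row["some column"]
    if value is None:                 # empty cell
        ...
    elif isinstance(value, bool):     # boolean cell
        ...
    elif isinstance(value, float):    # number cell
        ...
    elif isinstance(value, str):      # text cell
        ...

def to_oadate(dt: datetime) -> float:
    # fractional days since 1899-12-30, matching EasyMorph's in-engine dates
    return (dt - datetime(1899, 12, 30)).total_seconds() / 86400.0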
Adding multiple rows at once is possible:
import easymorph as em
from datetime import datetime

# src is an EM dset, builder is a DatasetBuilder
def builder_demo(src, builder):
    # add many rows positionally
    builder.add_rows([row for row in src.rows if row["count"] > 40])
    # add many 'dict' rows (multiple 'list'/'sequence' rows can be added too in the same way)
    builder.add_rows([
        {"col_1": 1.0, "col_2": "foo", "col_3": None},
        {"col_2": 1.0, "col_1": "foo"},  # different order
        {"col_1": float('nan')},         # NaN -> EasyMorph empty cell
        {"col_1": True},                 # bool stays bool
        {"col_1": datetime.now()},       # converted to EM number cell (OADate repr)
    ])
    em.yield_output(builder)
There is also add_row which accepts a list or dict and adds a single row.
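For instance (a sketch based on that description; the column names are made up):

builder.add_row([1.0, "foo", None])               # a single positional row
builder.add_row({"col_1": 2.0, "col_2": "bar"})   # a single 'dict' row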
Please let us know if this kind of API fits your intended use case.
Thank you for the detailed explanation! This is exactly the kind of information I was hoping for.
The type mappings look very sensible:
Great to see that None/empty cells are handled properly
The datetime → OADate conversion makes perfect sense for EasyMorph compatibility
Having NaN → empty cell mapping is a nice touch for pandas integration
This API definitely fits our use cases well, thanks to the combination of:
Immutable input datasets
Flexible row addition methods
Clear type mappings