Opinions needed: Python integration

We're working on tight Python integration in EasyMorph - EasyMorph workflows will be able to call Python scripts as if they were another workflow. This will be achieved with a native Python package (plug-in) that will allow the following:

  • Obtaining workflow parameters in Python
  • Obtaining the input dataset in Python
  • Reading .dset files in Python
  • Building in Python an output dataset that will be returned to the calling EasyMorph workflow or saved into a .dset file
  • Accessing Shared Memory from Python
  • Retrieving dataset assets from EasyMorph's catalog in Python

A Python script for use in EasyMorph might look something like this (the syntax is approximate, just to illustrate the concept):

import easymorph
from datetime import datetime

# somewhere down the code

# get a parameter
date = easymorph.parameters["start date"].as_date()

# build the output dataset row by row from the input dataset
output_dset = easymorph.Dataset()
for row in easymorph.input.rows:
    description = row["description"]
    print(description)
    output_row = {
        "timestamp": datetime.now(),
        "description": description.upper(),
    }
    output_dset.add(output_row)

# return the result to the calling workflow
easymorph.output = output_dset

Thoughts? Suggestions? Questions?

2 Likes

Combining the power of EasyMorph workflows for extracting, manipulating, cleansing, transforming, and automating data with the capabilities of Python for advanced data science and machine learning: isn't this what all of us data nerds dream of? :pray:

Good move from a product standpoint for sure!

3 Likes

We have been using EasyMorph for about two years now, and there are very few things we have needed Python for. However, for the few cases where we do use it (e.g., interacting with the Qlik Sense API server), and to open up a new range of possibilities while keeping a good balance between tech and non-tech users, this is, in my opinion, a great enhancement.

3 Likes

Wonderful!
Question: what exactly is the Python plugin? A venv with a special easymorph package, but able to use whatever wheel I want (pip install xxx)? Or is it something with more restrictions?

1 Like

Hey @Samuel_Flandrin,

This is still an early announcement, so we have not yet settled on the distribution method (for now, in development, it's just a .whl; no venv). Let us know any specific use cases or constraints you have.

This is fantastic news!

At the moment I have workflows where I output data, parse it with Python, then pull the parsed data back in, so it would be amazing to be able to do all of that within EasyMorph.

It would also be good to have the same AI feature you rolled out for writing functions, but for Python code: non-coders could write a description of what they want the Python to do and have it generated for them. The alternative is that they write it in Gemini, ChatGPT, etc., but without the benefit of parameter integration, so they would then have to work that part out themselves.

2 Likes

Hi,
This looks great!
Will the integration support working with pandas DataFrames as well?

An example might be:

import easymorph
import pandas as pd

# Convert the input dataset to a pandas DataFrame
df = easymorph.input.to_dataframe()

# Process with pandas
df['description'] = df['description'].str.upper()
df['timestamp'] = pd.Timestamp.now()
df['category'] = 'processed'  # add a new column

# Convert back to an EasyMorph dataset
easymorph.output = easymorph.Dataset.from_dataframe(df)

This would make it easier to leverage the full pandas ecosystem for data manipulation.

Hi @Knut_Petter_Nor ,

@vlad_dzhos will be the best person to confirm exactly how, but I've been told it should be possible to work with pandas along with other commonly used data science and machine learning libraries.

Regards
Matt

Hi Matt,
Thanks for the quick response! That's great to hear that pandas and other data science libraries should be supported.
Having seamless pandas integration will make the Python workflows even more powerful.

Looking forward to testing this out once it's ready!

Best regards,
Knut Petter

Hi @Knut_Petter_Nor ,

To clarify - in the first release, integration is possible with minimal 'glue' code. For example:

import pandas as pd
import easymorph as em

def dataset_to_df(ds: em.Dataset) -> pd.DataFrame:
    # one DataFrame column per dataset column
    return pd.DataFrame({col.name: list(col) for col in ds.columns})

def df_to_dataset(df: pd.DataFrame) -> em.Dataset:
    b = em.DatasetBuilder()
    for name, series in df.items():
        b.add_column(name, series)
    return b.to_dataset()
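
For illustration, here is a rough sketch of how these helpers could be wired together in a script. It reuses the approximate em.input / em.output entry points from the examples earlier in this thread, so the exact names may differ in the released package:

import pandas as pd
import easymorph as em

# input dataset -> pandas, transform, then back to an EasyMorph dataset
df = dataset_to_df(em.input)
df['description'] = df['description'].str.upper()
em.output = df_to_dataset(df)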

Direct to/from_dataframe API is not planned for this initial release. This first version will be fairly lightweight and without dependencies. We are looking for feedback and would of course consider expanding the API in the future.

1 Like

Hi @vlad_dzhos,
Thank you for the clarification and the example code - this is really helpful! The glue code approach looks straightforward and clean. I appreciate that you're keeping the first release lightweight without external dependencies, which makes a lot of sense.
A couple of questions on the initial implementation:

  • The column-based conversion approach looks efficient. Will it preserve data types (dates, numbers, etc.) when converting between formats?
  • For the DatasetBuilder API, is it possible to add multiple rows at once, or would we iterate row by row? Just thinking about performance for larger datasets.

The minimal approach actually gives nice flexibility: we can customize the conversion based on our specific needs.

I think this is a solid foundation for the first release. Having the ability to work with pandas (even with glue code) opens up so many possibilities for data processing and analysis within EasyMorph workflows.
Looking forward to testing this out.
Regards,
Knut Petter

The Dataset that you get from the workflow (em.input) will have the following type mappings:
Empty cell -> None
Number cell -> float
Text cell -> str
Boolean cell -> bool

Similarly, the same types (None, float, str, bool) are accepted as inputs in the dataset builder.
Additionally, datetime is also accepted and converted to an OADate number, the same way EasyMorph handles dates in-engine.
Generally, the idea is that at least for now users would use DatasetBuilder to prepare the output dataset (or .dset file), so the types it accepts should match what EasyMorph can work with. Similarly, the Dataset returned by input is immutable and contains only the types the workflow can produce.
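
To make the datetime conversion concrete, here is a rough sketch of what an OADate number is (whole part = days since 1899-12-30, fractional part = time of day). This is only an illustration of the representation, not the actual conversion code in the package:

from datetime import datetime

def to_oadate(dt: datetime) -> float:
    # OADate: whole part counts days since 1899-12-30,
    # fractional part is the time of day
    delta = dt - datetime(1899, 12, 30)
    return delta.days + delta.seconds / 86_400

print(to_oadate(datetime(2024, 6, 1, 12, 0)))  # 45444.5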

Adding multiple rows at once is possible:

from datetime import datetime
import easymorph as em

# src is an EM dataset, builder is a DatasetBuilder
def builder_demo(src, builder):
    # add many rows positionally
    builder.add_rows([row for row in src.rows if row["count"] > 40])

    # add many 'dict' rows (multiple 'list'/'sequence' rows can be added too in the same way)
    builder.add_rows([
        {"col_1": 1.0, "col_2": "foo", "col_3": None},
        {"col_2": 1.0, "col_1": "foo"},  # different order
        {"col_1": float('nan')},         # NaN -> EasyMorph empty cell
        {"col_1": True},                 # bool stays bool
        {"col_1": datetime.now()},       # converted to EM number cell (OADate repr)
    ])

    em.yield_output(builder)

There is also add_row which accepts a list or dict and adds a single row.
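For example (hypothetical column names and values, same builder as above):

# single 'dict' row: values matched by column name
builder.add_row({"col_1": 2.0, "col_2": "bar", "col_3": None})

# single 'list' row: values matched positionally
builder.add_row([3.0, "baz", True])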
Please let us know if this kind of API fits your intended use case.

1 Like

Hi @vlad_dzhos,

Thank you for the detailed explanation! This is exactly the kind of information I was hoping for.
The type mappings look very sensible:

  • Great to see that None/empty cells are handled properly
  • The datetime → OADate conversion makes perfect sense for EasyMorph compatibility
  • The NaN → empty cell mapping is a nice touch for pandas integration

This API definitely fits our use cases well, thanks to the combination of:

  • immutable input datasets
  • flexible row-addition methods
  • clear type mappings

Really looking forward to putting this to use.

Regards,
Knut Petter

4 Likes