Suggested new actions: Import PDF File or Import PDF Text

It would be very helpful to have an Import PDF File or Import PDF Text action. Currently, we edit each PDF file extensively so that it can be imported into Excel, and then import the Excel file into EasyMorph. This process is very labor-intensive.

Linking similar feature requests:

See also this topic: PDF as input in Easymorph

I posted this request in November 2022 and have not heard back on any planned commitments from EasyMorph. Unfortunately for us, a partner that we communicate with sends us PDF reports and not Excel spreadsheets. They will not change their methods. Trying to prepare or read the file so we can process the data in EasyMorph is very difficult. You posted a tool that converts PDF to text, but when I tried to contact the company, no one answered. Please consider adding some import of a PDF file to EasyMorph. Let us know what your answer is. Thanks.

We're expanding our team to invest in a few new product features, including working with PDF. I don't have any hard deadlines yet for when support for PDF will improve, but it's on our roadmap.

Let's revive this topic in 3-4 months. I should have a more concrete timeline by then.

Hi,
Interesting.
Maybe a connector to an API able to extract tables from PDFs?
Or maybe a connector to Elasticsearch or OpenSearch?

Regards

Any new information on this request? The suggested solution, using a DOCTOTEXT converter program, is not optimal since the majority of users do not have the app or are restricted from downloading it. An Import PDF File action would be preferable.

Hi @dgudkov & team,

I would love to hear more on the status of this feature as well. Just today, we ended up needing to import data from a PDF file. Since EasyMorph couldn't do it, we had to use Power Query first to convert the PDF to CSV and then use the CSV with EasyMorph. It wasn't very seamless. Would love to have PDFs added as an import option.

As of today, the best PDF import quality is provided by AI services (LLMs), such as Gemini or ChatGPT. EasyMorph can interoperate with these AI services using the "Ask AI" action:

  • Create a connector to ChatGPT or an OpenAI-compatible service
  • Use the "Ask AI" action with a prompt that explains what part of PDF needs to be extracted (e.g. as a comma-delimited text)
  • Attach the PDF to the action.
  • If you instructed the AI service to use a comma-delimited text, add the "Split delimited text" action to parse the output of the AI service (the result of the "Ask AI" action).
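
For readers who want to see what the parsing step amounts to outside EasyMorph, here is a minimal Python sketch. It is not how the "Split delimited text" action is implemented. It only assumes the AI service returned a header row followed by comma-delimited rows, and the function name and sample data are made up for illustration.

```python
import csv
import io


def parse_delimited(ai_output: str) -> list[dict[str, str]]:
    """Parse comma-delimited text (header row first) into a list of row dicts."""
    # Strip surrounding whitespace so a leading blank line is not read as the header.
    text = ai_output.strip()
    # csv handles quoted fields, so a value such as "1,250.00" stays in one column.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [dict(row) for row in reader]


# Hypothetical response text, standing in for the result of the AI step.
sample = 'Invoice,Date,Amount\nINV-001,2024-01-15,"1,250.00"\nINV-002,2024-01-16,89.90\n'
rows = parse_delimited(sample)
```

Asking the AI service to quote any value that contains a comma keeps the columns aligned when the text is split.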

Tip: You can wrap the import from PDF as a reusable project and call it from other projects (workflows) using the "Call another module/project" action.

Tip: If you're comfortable with the JSON format, you can instruct the AI service (via the "Ask AI" action) to return a JSON.
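
As an illustration of the JSON route, the Python sketch below parses a JSON reply. It assumes (this is not documented EasyMorph behaviour) that the model may wrap its JSON in a Markdown code fence, which some models do, and removes the fence before parsing. The function name and sample reply are hypothetical.

```python
import json

FENCE = "`" * 3  # Markdown code fence that some models wrap around JSON


def parse_ai_json(ai_output: str):
    """Return the parsed JSON value from an AI reply, tolerating a code fence."""
    text = ai_output.strip()
    if text.startswith(FENCE) and text.endswith(FENCE):
        # Remove the opening and closing fence markers.
        text = text[len(FENCE):-len(FENCE)]
        # The first line may be empty or hold a language label such as "json".
        first_line, _, rest = text.partition("\n")
        label = first_line.strip()
        if label == "" or label.isalpha():
            text = rest
    # json.loads raises json.JSONDecodeError if the reply is not valid JSON.
    return json.loads(text)


# Hypothetical fenced reply, standing in for the result of the AI step.
reply = FENCE + 'json\n[{"invoice": "INV-001", "amount": 1250.0}]\n' + FENCE
records = parse_ai_json(reply)
```

Letting invalid JSON raise an error, rather than guessing at a repair, makes a malformed reply visible instead of silently producing wrong data.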

Thanks @dgudkov. Unfortunately, LLMs are not an option in this case because of the confidential nature of the document. We did try something similar with Copilot (the only authorised option in the organisation), but it started hallucinating.

One more option: use the Able2Extract utility to convert PDFs to CSV files. It's not too expensive and has been used by some of our customers.

The utility can be called using the "Run program" action before importing the CSV into EasyMorph.
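
The same "convert first, then import" pattern can be sketched in Python for anyone scripting it outside EasyMorph. The converter command is passed in by the caller because the exact command line of the conversion utility is not given in this thread; nothing below reflects a real Able2Extract option, and the function name is made up.

```python
import csv
import subprocess
from pathlib import Path


def convert_then_load(command: list[str], csv_path: Path) -> list[list[str]]:
    """Run an external converter, then read the CSV file it is expected to write.

    `command` is a placeholder for whatever the conversion tool requires;
    consult that tool's documentation for the real arguments.
    """
    # check=True raises CalledProcessError if the converter exits with an error.
    subprocess.run(command, check=True)
    # newline="" lets the csv module handle line endings itself.
    with csv_path.open(newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))
```

Failing on a non-zero exit code stops the workflow before it tries to import a CSV file that is missing or left over from an earlier run.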