PDF as input in EasyMorph

I just wanted to share something that I wasn’t able to find on the forums earlier, but that I’ve finally found a reasonably good solution for.
We receive data in PDF format from suppliers, and until now it has been handled pretty manually. We’ve been using EM to ETL the data we receive in other formats, but PDFs have been hard until I stumbled across a nice open-source utility called doctotext.
This little program can parse the text of a PDF quickly and easily with the “Run Program” action.

Basically, just point to the .exe and use "--pdf" followed by a parameter for the location of the file, and you’ll end up with the PDF text split by line in your table.
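For anyone who wants to try the same call outside EasyMorph first, here is a rough Python sketch of what the “Run Program” step does. It is only an illustration under a few assumptions: the flag is read as "--pdf" (the dash in the post taken as a double hyphen), doctotext is assumed to print the extracted text to standard output, and the function names and paths are made up for the example.

```python
import subprocess
from typing import List

# Assumption: the flag from the post, read as a double hyphen.
PDF_FLAG = "--pdf"


def build_command(exe_path: str, pdf_path: str) -> List[str]:
    """Build the argument list: executable, flag, then the PDF location."""
    return [exe_path, PDF_FLAG, pdf_path]


def split_output(text: str) -> List[str]:
    """Split the captured text into one entry per non-empty line."""
    return [line.rstrip() for line in text.splitlines() if line.strip()]


def pdf_to_lines(exe_path: str, pdf_path: str) -> List[str]:
    """Run the program and return its output split by line.

    Assumes the extracted text arrives on standard output.
    """
    result = subprocess.run(
        build_command(exe_path, pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise if the program exits with an error code
    )
    return split_output(result.stdout)
```

With hypothetical paths, `pdf_to_lines(r"C:\tools\doctotext.exe", r"C:\data\supplier.pdf")` would give a list of text lines, which corresponds to the one-row-per-line table you get in EasyMorph.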


Then use some regex to parse the text into usable data, and voilà.
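To give an idea of the regex step, here is a small Python sketch. The line layout (item code, description, quantity, price) is entirely hypothetical and only there to show the approach; a real supplier PDF will need its own pattern.

```python
import re
from typing import Dict, List

# Hypothetical layout: "<code> <description> <quantity> <price>"
LINE_PATTERN = re.compile(
    r"^(?P<code>\S+)\s+(?P<description>.+?)\s+(?P<qty>\d+)\s+(?P<price>\d+\.\d{2})$"
)


def parse_lines(lines: List[str]) -> List[Dict[str, object]]:
    """Turn matching text lines into records; skip everything else."""
    records = []
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # headers, footers and other non-data lines
        records.append(
            {
                "code": match.group("code"),
                "description": match.group("description"),
                "qty": int(match.group("qty")),
                "price": float(match.group("price")),
            }
        )
    return records
```

Lines that don’t match the pattern (page headers, totals and so on) are simply dropped, which is usually what you want when only the data rows matter.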

I hope this can help somebody else parsing PDFs with EasyMorph.
(Doctotext can also parse other file formats, such as PowerPoint, email files, Word files, etc., but I haven’t tested any of those yet.)

Another option is Able2Extract. It’s not free but reasonably priced.