My initial goal: parse a large file that embeds JSON as a column.
Initial problem faced: the JSON quality in the source is not perfect, so when I use the Parse JSON action I can get errors that stop the whole morph. I could not find any option in Parse JSON to continue processing and just flag an error on the faulty rows.
Imagined solution: iterate over my table and delegate the JSON parsing to a module, so that I can collect errors and continue processing (iterate and append results; when an iteration fails, collect the error and continue).
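Outside of EasyMorph, the same iterate-and-collect pattern can be sketched in Python. This is only an illustration of the intended logic, not the EasyMorph implementation; the column names (`id`, `payload`) and sample rows are made up:

```python
import json

# Sample rows: each has an id and a raw JSON column; one row is malformed.
rows = [
    {"id": 1, "payload": '{"status": "ok", "count": 3}'},
    {"id": 2, "payload": '{"status": "ok", "count": '},   # broken JSON
    {"id": 3, "payload": '{"status": "error", "count": 0}'},
]

results, errors = [], []
for row in rows:
    try:
        parsed = json.loads(row["payload"])
        results.append({"id": row["id"], **parsed})
    except json.JSONDecodeError as exc:
        # Record the failure and keep going instead of stopping the run.
        errors.append({"id": row["id"], "error": str(exc)})

print(len(results), len(errors))  # 2 good rows, 1 collected error
```

The key point is that each row is parsed in isolation, so one bad row ends up in the error list instead of aborting the whole run.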
Enclosed: sample project with my different tests.
Problem 1 - basic iteration: I cannot get the Iterate action to work as expected. I checked the examples and the community, but my output always seems empty when using the Iterate action (see the very simple test that tries to increment a column value by 1). I must be missing something very simple...
Problem 2 - iterating JSON (anticipating the next step in my process): I always get an error message (column not found - see the dedicated table in the same project). I guess this is because I am sending the JSON column as a parameter rather than as input, but as far as I can see Parse JSON does not accept parameters as input.
Thanks a lot for your ideas on why I cannot get basic iteration to work (I am feeling a bit noobish here, and I need it for my EM understanding :-)) and on how I could handle the JSON processing without interruption.
JSON_PROCESSING_ITERATION.morph (11.1 KB)
JSON_SAMPLE_PM.csv (1.2 KB)
TEST_ITERATION_JSON.xlsm (4.8 KB)
I found the solution to my first problem (by using the Generate sequence action in the called module).
About problem 2, still investigating and clueless
For problem 2, I found a (maybe) dirty workaround. In the called module there is no Input action; I pass the JSON column as a parameter, generate a sequence with 1 line, calculate a new column with an expression that takes the value of the module parameter, process it, and send back the result.
Is there any better or more straightforward way to achieve this goal?
Thanks and regards,
I've edited your project to make it work (see below):
JSON_PROCESSING_ITERATION.morph (12.3 KB)
- Removed (disabled) the "Input" action. This action is not required for the regular "Iterate" action because it passes row values as parameters, not as an entire dataset.
- Added parameters in modules ITERATE and ITERATE_JSON, populated them with sample values
- Added the "Parameter table" action. It's a way to make a table from module parameters. I could have done the same with expressions, just in a less convenient way.
Also, it doesn't look like iterations are required here. You can use the new "Extract JSON properties" action. It has a failover setting. See the updated project below, I've added the "No iteration" table.
JSON_PROCESSING_ITERATION.morph (14.0 KB)
Hello Dmitry, (looks like my previous post was cut when including attachments, sorry)
Thanks for the tip with parameter table.
About the new Extract JSON properties action: it fails when I use it. I attached a new version of the project with the "No iteration Test with faulty json fails" table, and provided a dataset with some working rows plus a faulty last row containing a different JSON I need to parse as well (a CSV export won't work, as my JSON is sometimes far bigger than the ~32K character limit that seems to apply to the CSV export action).
Also, I need to parse deeper into these additional JSONs. I can do it with the Parse JSON action (see the enclosed screenshot), but I couldn't find a way with the new Extract JSON properties action. So maybe I will have to iterate eventually...
Thanks a lot for your support.
In the column "JSON response", the text value in the last line (RowNo 1229) is not a valid JSON object; its JSON structure is broken. That's why the action fails. Maybe try contacting the producer of the JSON data and asking them to fix it.
You can test whether a text value is a valid JSON object or not using the isjson() function. After that, you can filter rows with valid JSONs and parse them in one table, and rows with invalid JSONs in another table (and decide what to do with them).
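The isjson()-then-filter idea maps to a simple validity check in other environments too. A minimal Python sketch of the same split (sample data is illustrative, and note that `json.loads` also accepts bare JSON values like arrays and numbers, so it is not guaranteed to match EasyMorph's isjson() exactly):

```python
import json

def is_json(text):
    """Rough analogue of an isjson() check: True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except (json.JSONDecodeError, TypeError):
        return False

rows = ['{"a": 1}', '{"a": ', '[1, 2, 3]']
valid = [r for r in rows if is_json(r)]        # parse these in one branch
invalid = [r for r in rows if not is_json(r)]  # route these to an error branch
print(valid, invalid)
```

Splitting the rows first means the parsing branch only ever sees well-formed values, so it can never fail mid-run.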
Here is the updated project (notice the new tab "Group 2").
JSON_PROCESSING_ITERATION.morph (16.8 KB)
Thanks a lot Dmitry for all this valuable input and these tips. I will surely use them in many projects now.
About my specific use case: since I need to run many additional actions after the JSON operations on large files (many GB) with huge JSONs (sometimes over 1M characters...), the iteration approach turned out to be the most performant in the end.
I did some testing on a 600 MB sample, and the iteration approach, since it splits the work into very small chunks, is at least 3 times faster than a one-shot run.
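The chunked approach described above can be sketched outside EasyMorph as well, e.g. streaming a large CSV and parsing its JSON column in fixed-size batches so that memory stays bounded and failures are collected per batch. A hedged sketch; the column name "JSON response" comes from the thread, while the chunk size and file layout are assumptions:

```python
import csv
import json

CHUNK_SIZE = 10_000  # rows per batch; tune for memory vs. per-batch overhead

def process_chunk(chunk):
    """Parse the JSON column of each row in the batch, collecting failures."""
    good, bad = [], []
    for row in chunk:
        try:
            good.append(json.loads(row["JSON response"]))
        except json.JSONDecodeError:
            bad.append(row)  # keep the faulty row for a separate error report
    return good, bad

def iterate_file(path):
    """Stream the CSV and yield (parsed, failed) pairs, one per chunk."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == CHUNK_SIZE:
                yield process_chunk(chunk)
                chunk = []
        if chunk:  # flush the final partial chunk
            yield process_chunk(chunk)

# Usage: for good, bad in iterate_file("big_export.csv"): ...
```

Because only one chunk is held in memory at a time, the same script handles a multi-gigabyte file as easily as a small one.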