About EasyMorph Tutorials & Examples Web-help

API pagination how I create loop

Hi guys,

hope this is not a duplicate topic or something already ask by someone else :slight_smile:

I will explain what I’ve done to create a loop to call successive pages from an API and merge the result in a CSV file to load my DB

Context:
I have to update a database with a list of companies send by this free API. the API return a CSV or a JSON (I chose to work on the CSV because I can manage more row according the API specification)

All the documentation can be found here (In french only) https://api.insee.fr/catalogue/site/themes/wso2/subthemes/insee/pages/item-info.jag?name=Sirene&version=V3&provider=insee

What I’m trying to do
I want to be able in the case that I have more than 10000 row returned by the API to relaunch call until I arrive at the end of the row I have to receive.
example:
If I have to process 15000 row, I have to do 2 call to this API.

1. Iteration calculation

Here I calculate the MAX number of iteration I will have to do.
Parameter is calculate with “round(asnumber([X-Total-Count])/10000,0)” where [X-Total-Count] is the amount of row updated at sysdate -1

image

2. Iteration start
in this step I run my second module (responseToCSV) and push my parameter “numberofiteration” to a paramter on the second module called “page number”.

3. What is executed
my second module contain a succession of action (table) that will push data to the CSV.
I create 3 parameter to make the loop

pagenumber : receive the data previously calculated and called “numberofiteration”
pagetocall : an empty number parameter
Parameter : dynamics parameter calculate with

if(isempty({pagenumber}),0,(if({pagetocall}>0 and {pagetocall}<{pagenumber},{pagetocall}+1,{pagenumber})))

When I launch the project, it run without any problem or warning or alert.
From my point of view, it seems to work.

the questions are:

  1. I am right when I think it’s work will it only make 1 call
  2. If not What I’ve missed? Can you please help me?
  3. did I’m using Easymorph on the good way?

I attach my process if you want to take a look at.

thanks in advance for your help.

Cheers :slight_smile:

InseeDailySIREN.morph (51.9 KB)

Hi Mathieu and welcome to the Community!

can you please share the settings of the Web Location connector “INSEE API V3 27092020”? It’s not embedded in the attached project and therefore I can’t make the project work.

Hi Dimitri

here is a print screen of the connector there is nothing more :slight_smile:

image

One more question - what’s the purpose of the pagetocall parameter? Why not user pagenumber?

OK, it seems like pagetocall isn’t really needed.

Here is your project edited to make it work. You will need to update your EasyMorph to v4.5 which was released yesterday in order to open the project (sorry, I don’t have an earlier version installed in my computer).

What was done:

I changed the number of records to pull from API at once (the batch size) from 10,000 to 1,000 in order to have a case when multiple requests are required because the total number of records in the API was just 6,871.

Also the batch size is now defined using a parameter so it’s easy to change it back to 10,000 throughout the workflow.

Fixed iterations

Iterations in EasyMorph is a way to arrange FOR…EACH type of loops. An iteration action performs as many cycles in such loop as there are rows in the input dataset. In your project the iteration in table “iterationRun” didn’t work because it always had 1 row regardless of batch size or total number of rows. This has been fixed. Now the “iterationCalculation’ table” calculates the necessary parameters for iterations. For instance, if the total number of records to fetch is 6871 and the batch size is 1000 rows per request then the iteration table should have 7 rows:

image

Also, the “Iterate table” action wasn’t really necessary. It could work, but its purpose is a bit more advanced for the type of iteration that you needed. The regular “Iterate” action is sufficient here.

I highly recommend reading our tutorial on iterations if you haven’t done it yet: https://easymorph.com/learn/iterations.html

Fixed API queries in module “responseToCSV”

In the API queries that pulled data, the debut URL parameter was missing, so I’ve added it. The debut request header wasn’t required so I removed it.

Also, the “Iterate Web Request” action now saves the response body into a file, instead of keeping it as a text cell. Keeping the whole CSV dataset (which can be megabytes of data) in just 1 cell is not recommended. Instead, the response body (CSV data) is exported into a temporary CSV file. Then it’s immediately re-imported back from the temporary CSV file into EasyMorph as a nice normal table. The temporary file is then deleted.

We don’t need to have CSV file(s) permanently, because we can keep all CSV data in memory in EasyMorph. Since the “Iterate” action in table “iterationRun” works in the “Iterate and append” mode, all result datasets of all iterations will be automatically appended into 1 dataset when the iteration finishes.

Removed CSV imports in the export group

To give the workflow a cleaner structure, I’ve moved tables with actions that export data into the target database into a separate group (tab) called “Export to SIREN”

In the table “SIREN_UPDATE” I’ve removed the CSV import actions because they are no longer needed. The whole dataset resides in-memory in EasyMorph, so no need to re-import it again. Actions “Export to database” and “Database command” don’t modify data in EasyMorph. Their output datasets in EasyMorph is always the same as the input datasets. They are “transparent” for data flow in EasyMorph.

Miscellaneous

  • Removed extra columns with response attributes as they are not necessary and in order to have cleaner data
  • Added a parameter for the temporary data folder - you may need to change it before running the project
  • The API connector made embedded into the project. You may want to change it back to your connector from repository
  • Requesting a token in every iteration in module "responseToCSV’ seems to be unnecessary and slows down execution. I suppose you can just reuse the token obtained in module “baseProcess”.

Here is the updated project:
InseeDailySIREN_DG.morph (40.2 KB)

If you have questions let me know.

1 Like

Hey Dimitri!

Thanks for all of this. I’ve done the training about iterations but wasn’t understand it well :sweat_smile:

Thanks for your time!

Cheers

You’re welcome! :slight_smile:

To learn more about EasyMorph visit easymorph.com.