Multiple files, small variation in sheet name, import with parameter

DKcrm · November 28, 2017, 9:03pm

Hi,

I’m going to import a folder with a number of files with almost same structure - only variance is in the sheet names in the excel files.

When importing multiple files, with different sheet names, how do I make a parameter to match for 22 rightmost characters in the sheet name?

The sheet name is 31 characters long, and the final 22 are matching.

Examples for variances on 9 first characters in sheet name;
xxzeb0293
edzeb5581
zzzeb0989

22 rightmost characters (common for all files)
excel_sscc_inventory

//C

dgudkov · November 29, 2017, 12:59am

Typically, you would want to use the “Excel metadata” transformation which returns a list of sheets in a spreadsheet. Then use regular transformations and calculations you can do matching and keep only the sheet(s) you need to load. Finally, use Iterate or Call to run a sub-project that loads the sheet(s).

DKcrm · November 29, 2017, 8:50am

Hi

Thank you for the time …

If I need to load, say, 30 files or more, all placed in the same folder, with only one sheet in each, how do I add all files from the folder to the process?
The metadata only works on 1 file at the time?

dgudkov · November 29, 2017, 4:47pm

Yes, Excel Metadata only works with 1 file at a time but this file can be specified by a parameter and therefore can be used in an iteration as well.

Therefore your process should have 3 layers:

Project 1: Generate a list of 30 files to load in particular folder. Iterate: for each file call Project 2.
Project 2: Obtain a list of sheets in each file. Do the matching and produce 1 sheet name to load. Call Project 3.
Project 3: Load given sheet from the spreadsheet.

Let me know if you need help with any of these steps.

DKcrm · December 3, 2017, 9:52pm

Thanks!

I’ve been busy so haven’t had the time to set it up yet… However, I found another solution within the community, for another case I had (copy files from list to another location), where I should generate a list, and run iterations based on the list. I guess it’s almost the same, except the one I have to try here, will generate the list(s) in the first couple of steps…

I’ll let you know how it went, when I have had the time to set it up…

//C

DKcrm · December 5, 2017, 11:09am

Hi

I got the list of files in the folder…
I can’t seem to find out how to proceed as you describe…

Do I start up following projects with sandbox, or how do I do?

dgudkov · December 5, 2017, 7:15pm

@DKcrm,

Here is an example:
Loading sheets.zip (20.0 KB)

It has 3 projects and 3 worksheets:

Before your read further please make sure you read these two tutorial articles:
Tutorial:Parameters
Tutorial:Iterations

Let’s start from the lowest level and will be moving up.

Level 3: Loading a sheet

This project just loads a sheet from a spreadsheet. It has one transformation (Import from Excel) and two parameters: {File name} and {Sheet name}. The name of the spreadsheet and sheet name to load are specified using these two parameters (respectively).

Now let’s go one level up:

Level 2: Match sheet name

This project has 1 parameter: {File name}. Note that wrapping braces are not part of the parameter name, I’m using them to denote a parameter.

The project does 3 things:

Obtain a list of sheets in spreadsheet which name is specified by project parameter {File name}.
Do matching with sheet names in the list and produce the name of the sheet that we need to load. In this example it’s always “Sheet1” but you can put whatever matching logic you need .
Use iterations to call project “Level 3 …” and return the loaded data from it. Since we have only 1 sheet name to load, there will be only 1 iteration.

In a bit more details:

Loading sheet names from a spreadsheet (see the screenshot below): you can see that the file name is defined by a parameter.

Matching, and loading the matched sheet using iterations:

The “Iterate” transformation properties:

Level 1: Generate a list of worksheets, and iterate

Now the main project to run which is Level 1. This project has no parameters.

It generates a list of Excel files in a given folder, then for each of the files it runs the Level 2 project.

The result is sheets named “Sheet1” loaded from all the spreadsheets in the folder.

DKcrm · December 29, 2017, 10:52am

Hi there

Finally got the time to look at this - I got it working most of the way…
It breaks for me, when I have a variance in the sheetname.

The sheet name is defined in the parameter for level 3.
In level 1 I can see all the files in the folder - But in level 1 I only get results from the files where the sheet names matches that of the one defined in the parameter for level 3.

The absolute name for the sheet as entered in level 3 is ‘xxzeb0531_excel_sscc_inventory_’ - This applies to 2 of the 3 main sources, while the 3rd source has this sheet name ‘edzeb0531_excel_sscc_inventory_’ - a difference in the 2 first characters.
How do I go about this? - Can I use wildcards in the parameter setting in level 3?

I have a folder with a number of files - they all come from 3 sources, each source unique in file name from the others, and all with a date and time stamp in the filename from when it’s generated.
As I said, I can see all the files in Level 1, but only get results from files coming from the 2 sources with identical sheet names…

dgudkov · December 29, 2017, 3:00pm

It seems like it didn’t emphasize it enough – you should do the sheet name matching on level 2. On level 2 you obtain a list of all sheets in given workbook. Keep only those sheet names that match whatever pattern you need.

No matching happens on level 3. Level 3 only loads given sheet from given workbook.

The example has a simple matching condition [Sheet name] = 'Sheet1' on level 2. Modify it according to your matching logic. You can use, for instance, functions like contains(), endswith(), or startswith() to keep only sheet names as per your matching rule. If you only need to keep sheets names that end with a certain text, use endswith() for filtering.

Once you narrow down the list of sheets to load on level 2, the level 3 simply loads the sheets remained after the matching. Whatever default sheet name is hardcoded on level 3 doesn’t matter because it’s overridden when iterated from level 2.

I’ve modified the example and changed the matching expression to keep only sheets that contain a text substring defined by parameter {Matching substring}.

level 2

I’ve also added parameter {Matching substring} to level 1 which is passed to level 2 in “Iterate”. Now you can define the substring to match right in the main project (level 1).

level 1

Here is the updated example:

Loading sheets.zip (20.5 KB)

I understand that iterations are a bit hard to grasp, but it’s worth learning as it’s a very powerful and useful mechanism in EasyMorph.

I’m ready to answer any further questions

DKcrm · December 30, 2017, 7:02am

HA!

Managed to get it working with your last input. Thank you so much!

Though, the example files you added were created in a newer version of EasyMorph (3.7.0.2) than the installation I have (3.6.3.2) …

2017-12-30_07-49-20

dgudkov · December 30, 2017, 1:45pm

Great! You're welcome.

Just get the latest version on our website: EasyMorph | Download free ETL tool

We haven't triggered the global notification yet.

DKcrm · December 31, 2017, 7:53am

Done - Thanks!

jmarques · July 3, 2019, 2:06pm

Hi Dmtry,
I tried to product same, but my files have a twist:

i have file Daily_report_week1_20190306 and Daily_report_week2_20190315 and inside each respectively have same sheets different dates:

Daily_report_week1_20190306

Articles_on_20190306
articles_in_store

Daily_report_week1_20190315

Articles_on_20190315
articles_in_store

Everyday in this folder we received the daily file.
Following same process, how can I ensure I just pickup most recent file and only sheets (Articles_on_XXXXXX)?

I tried this process, but in Load 3 I need to choose one file and one sheet and then in Load even if I try the spreadsheet, it only returns data from 1 file.

Thank you.Jorge

DKcrm · July 4, 2019, 7:23am

Hi Jorge

I had success with this setup (starting from lowest level - the actual file) - I hope it’s possible to enlarge and save/print the images below.

In step 3, when making the project you make a parameter for the sheet name, and enter a specific sheet name to begin with (in order to load a file and sheet in design phase)
In step 2, you define the search string for the sheet name and generate a list of files/sheets to load - The file name and sheet name are assigned as input for parameters in step 3 when defining the iterate process.

In step 1 you get all the columns from the files/sheets defined in step 2 and loaded from step 3.

Loading of the specific file and given sheet:

Loading list of files and sheets - to be fed to step above to load the files/sheets:

Process list of files and sheets in above project, and append result to table:

dgudkov · July 4, 2019, 7:37am

I would only add to @DKcrm’s reply that to obtain the most recent file extract the timestamp from file names (on level 1) and use the “Keep min/max” action to keep only the file name with the latest timestamp in list.

Extracting the timestamp could be done as follows (replace the file name in the example with column name):

date(right("Daily_report_week1_20190306", 8), "yyyyMMdd")

LWP · October 3, 2024, 9:39pm

Hello, @dgudkov. I am struggling with a task that I think the above might address. My question is, can this be done with multiple modules within the main project, or must it be done with separate projects (level 1, level 2, and level 3) as shown in your example?

Since this topic is so old, I will add that my task is to bring in multiple excel files that each contain multiple sheets. The number of sheets is inconsistent, and I am trying to create something that will work repeatedly with similar files. Ultimately, I want to append all sheets from all files into one master sheet.

dgudkov · October 3, 2024, 11:47pm

Yes, it can. The posts above were written in 2019. Modules were introduced later. You can replace separate projects with modules in all the examples above.

The approach above is still valid, but will require a modification. You will need two nested loops:

The outer loop that goes across a list of files (spreadsheets) and is done as described in the example above (List files + Iterate).
The inner loop (in a separate module) - for each of these files, produce a list of sheets (it's a mode in the "Import Excel spreadsheet" action) + Iterate for each of the sheets

The inner loop should iterate another (3rd) module with two parameters:

File name (provided by the outer loop)
Sheet name (provided by the inner loop)

This module simply loads the specified sheet from the specified file.

All iterations must be in the "Iterate and append" mode to append all sheets from all files into one file.

LWP · October 4, 2024, 4:15pm

Thank you so much @dgudkov. It took some trial and error, but I was finally successful.

dgudkov · October 4, 2024, 4:37pm

Awesome!