Multiple files, small variation in sheet name, import with parameter

Hi,

I’m going to import a folder with a number of files with almost same structure - only variance is in the sheet names in the excel files.

When importing multiple files, with different sheet names, how do I make a parameter to match for 22 rightmost characters in the sheet name?

The sheet name is 31 characters long, and the final 22 are matching.

Examples for variances on 9 first characters in sheet name;
xxzeb0293
edzeb5581
zzzeb0989

22 rightmost characters (common for all files)
excel_sscc_inventory

//C

Typically, you would want to use the “Excel metadata” transformation which returns a list of sheets in a spreadsheet. Then use regular transformations and calculations you can do matching and keep only the sheet(s) you need to load. Finally, use Iterate or Call to run a sub-project that loads the sheet(s).

Hi

Thank you for the time …

If I need to load, say, 30 files or more, all placed in the same folder, with only one sheet in each, how do I add all files from the folder to the process?
The metadata only works on 1 file at the time?

Yes, Excel Metadata only works with 1 file at a time but this file can be specified by a parameter and therefore can be used in an iteration as well.

Therefore your process should have 3 layers:

Project 1: Generate a list of 30 files to load in particular folder. Iterate: for each file call Project 2.
Project 2: Obtain a list of sheets in each file. Do the matching and produce 1 sheet name to load. Call Project 3.
Project 3: Load given sheet from the spreadsheet.

Let me know if you need help with any of these steps.

Thanks!

I’ve been busy so haven’t had the time to set it up yet… However, I found another solution within the community, for another case I had (copy files from list to another location), where I should generate a list, and run iterations based on the list. I guess it’s almost the same, except the one I have to try here, will generate the list(s) in the first couple of steps…

I’ll let you know how it went, when I have had the time to set it up…

//C :smiley:

Hi

I got the list of files in the folder…
I can’t seem to find out how to proceed as you describe…

Do I start up following projects with sandbox, or how do I do?

@DKcrm,

Here is an example:
Loading sheets.zip (20.0 KB)

It has 3 projects and 3 worksheets:
image

Before your read further please make sure you read these two tutorial articles:
Tutorial:Parameters
Tutorial:Iterations

Let’s start from the lowest level and will be moving up.

Level 3: Loading a sheet

This project just loads a sheet from a spreadsheet. It has one transformation (Import from Excel) and two parameters: {File name} and {Sheet name}. The name of the spreadsheet and sheet name to load are specified using these two parameters (respectively).

image

Now let’s go one level up:

Level 2: Match sheet name

This project has 1 parameter: {File name}. Note that wrapping braces are not part of the parameter name, I’m using them to denote a parameter.

The project does 3 things:

  1. Obtain a list of sheets in spreadsheet which name is specified by project parameter {File name}.
  2. Do matching with sheet names in the list and produce the name of the sheet that we need to load. In this example it’s always “Sheet1” but you can put whatever matching logic you need .
  3. Use iterations to call project “Level 3 …” and return the loaded data from it. Since we have only 1 sheet name to load, there will be only 1 iteration.

In a bit more details:

Loading sheet names from a spreadsheet (see the screenshot below): you can see that the file name is defined by a parameter.
image

Matching, and loading the matched sheet using iterations:

The “Iterate” transformation properties:
image

Level 1: Generate a list of worksheets, and iterate

Now the main project to run which is Level 1. This project has no parameters.

It generates a list of Excel files in a given folder, then for each of the files it runs the Level 2 project.

The result is sheets named “Sheet1” loaded from all the spreadsheets in the folder.

image

1 Like

Hi there

Finally got the time to look at this - I got it working most of the way…
It breaks for me, when I have a variance in the sheetname.

The sheet name is defined in the parameter for level 3.
In level 1 I can see all the files in the folder - But in level 1 I only get results from the files where the sheet names matches that of the one defined in the parameter for level 3.

The absolute name for the sheet as entered in level 3 is ‘xxzeb0531_excel_sscc_inventory_’ - This applies to 2 of the 3 main sources, while the 3rd source has this sheet name ‘edzeb0531_excel_sscc_inventory_’ - a difference in the 2 first characters.
How do I go about this? - Can I use wildcards in the parameter setting in level 3?

I have a folder with a number of files - they all come from 3 sources, each source unique in file name from the others, and all with a date and time stamp in the filename from when it’s generated.
As I said, I can see all the files in Level 1, but only get results from files coming from the 2 sources with identical sheet names…

It seems like it didn’t emphasize it enough – you should do the sheet name matching on level 2. On level 2 you obtain a list of all sheets in given workbook. Keep only those sheet names that match whatever pattern you need.

No matching happens on level 3. Level 3 only loads given sheet from given workbook.

The example has a simple matching condition [Sheet name] = 'Sheet1' on level 2. Modify it according to your matching logic. You can use, for instance, functions like contains(), endswith(), or startswith() to keep only sheet names as per your matching rule. If you only need to keep sheets names that end with a certain text, use endswith() for filtering.

Once you narrow down the list of sheets to load on level 2, the level 3 simply loads the sheets remained after the matching. Whatever default sheet name is hardcoded on level 3 doesn’t matter because it’s overridden when iterated from level 2.

I’ve modified the example and changed the matching expression to keep only sheets that contain a text substring defined by parameter {Matching substring}.

level 2

I’ve also added parameter {Matching substring} to level 1 which is passed to level 2 in “Iterate”. Now you can define the substring to match right in the main project (level 1).

level 1

Here is the updated example:

Loading sheets.zip (20.5 KB)

I understand that iterations are a bit hard to grasp, but it’s worth learning as it’s a very powerful and useful mechanism in EasyMorph.

I’m ready to answer any further questions :slight_smile:

HA!

Managed to get it working with your last input. Thank you so much!

Though, the example files you added were created in a newer version of EasyMorph (3.7.0.2) than the installation I have (3.6.3.2) …

2017-12-30_07-49-20

Great! You're welcome.

Just get the latest version on our website: EasyMorph | Download free ETL tool

We haven't triggered the global notification yet.

Done - Thanks!

Hi Dmtry,
I tried to product same, but my files have a twist:

  1. i have file Daily_report_week1_20190306 and Daily_report_week2_20190315 and inside each respectively have same sheets different dates:

Daily_report_week1_20190306

  1. Articles_on_20190306
  2. articles_in_store

Daily_report_week1_20190315

  1. Articles_on_20190315
  2. articles_in_store

Everyday in this folder we received the daily file.
Following same process, how can I ensure I just pickup most recent file and only sheets (Articles_on_XXXXXX)?

I tried this process, but in Load 3 I need to choose one file and one sheet and then in Load even if I try the spreadsheet, it only returns data from 1 file.

Thank you.Jorge

Hi Jorge

I had success with this setup (starting from lowest level - the actual file) - I hope it’s possible to enlarge and save/print the images below.

In step 3, when making the project you make a parameter for the sheet name, and enter a specific sheet name to begin with (in order to load a file and sheet in design phase)
In step 2, you define the search string for the sheet name and generate a list of files/sheets to load - The file name and sheet name are assigned as input for parameters in step 3 when defining the iterate process.

In step 1 you get all the columns from the files/sheets defined in step 2 and loaded from step 3.

Loading of the specific file and given sheet:

Loading list of files and sheets - to be fed to step above to load the files/sheets:

Process list of files and sheets in above project, and append result to table:

1 Like

I would only add to @DKcrm’s reply that to obtain the most recent file extract the timestamp from file names (on level 1) and use the “Keep min/max” action to keep only the file name with the latest timestamp in list.

Extracting the timestamp could be done as follows (replace the file name in the example with column name):

date(right("Daily_report_week1_20190306", 8), "yyyyMMdd")
1 Like

Hello, @dgudkov. I am struggling with a task that I think the above might address. My question is, can this be done with multiple modules within the main project, or must it be done with separate projects (level 1, level 2, and level 3) as shown in your example?

Since this topic is so old, I will add that my task is to bring in multiple excel files that each contain multiple sheets. The number of sheets is inconsistent, and I am trying to create something that will work repeatedly with similar files. Ultimately, I want to append all sheets from all files into one master sheet.

Yes, it can. The posts above were written in 2019. Modules were introduced later. You can replace separate projects with modules in all the examples above.

The approach above is still valid, but will require a modification. You will need two nested loops:

  1. The outer loop that goes across a list of files (spreadsheets) and is done as described in the example above (List files + Iterate).
  2. The inner loop (in a separate module) - for each of these files, produce a list of sheets (it's a mode in the "Import Excel spreadsheet" action) + Iterate for each of the sheets

The inner loop should iterate another (3rd) module with two parameters:

  • File name (provided by the outer loop)
  • Sheet name (provided by the inner loop)

This module simply loads the specified sheet from the specified file.

All iterations must be in the "Iterate and append" mode to append all sheets from all files into one file.

Thank you so much @dgudkov. It took some trial and error, but I was finally successful.

Awesome! :slight_smile: