Typically, you would want to use the “Excel metadata” transformation which returns a list of sheets in a spreadsheet. Then use regular transformations and calculations you can do matching and keep only the sheet(s) you need to load. Finally, use Iterate or Call to run a sub-project that loads the sheet(s).
If I need to load, say, 30 files or more, all placed in the same folder, with only one sheet in each, how do I add all files from the folder to the process?
The metadata only works on 1 file at the time?
Yes, Excel Metadata only works with 1 file at a time but this file can be specified by a parameter and therefore can be used in an iteration as well.
Therefore your process should have 3 layers:
Project 1: Generate a list of 30 files to load in particular folder. Iterate: for each file call Project 2.
Project 2: Obtain a list of sheets in each file. Do the matching and produce 1 sheet name to load. Call Project 3.
Project 3: Load given sheet from the spreadsheet.
Let me know if you need help with any of these steps.
I’ve been busy so haven’t had the time to set it up yet… However, I found another solution within the community, for another case I had (copy files from list to another location), where I should generate a list, and run iterations based on the list. I guess it’s almost the same, except the one I have to try here, will generate the list(s) in the first couple of steps…
I’ll let you know how it went, when I have had the time to set it up…
Let’s start from the lowest level and will be moving up.
Level 3: Loading a sheet
This project just loads a sheet from a spreadsheet. It has one transformation (Import from Excel) and two parameters: {File name} and {Sheet name}. The name of the spreadsheet and sheet name to load are specified using these two parameters (respectively).
Now let’s go one level up:
Level 2: Match sheet name
This project has 1 parameter: {File name}. Note that wrapping braces are not part of the parameter name, I’m using them to denote a parameter.
The project does 3 things:
Obtain a list of sheets in spreadsheet which name is specified by project parameter {File name}.
Do matching with sheet names in the list and produce the name of the sheet that we need to load. In this example it’s always “Sheet1” but you can put whatever matching logic you need .
Use iterations to call project “Level 3 …” and return the loaded data from it. Since we have only 1 sheet name to load, there will be only 1 iteration.
In a bit more details:
Loading sheet names from a spreadsheet (see the screenshot below): you can see that the file name is defined by a parameter.
Matching, and loading the matched sheet using iterations:
Finally got the time to look at this - I got it working most of the way…
It breaks for me, when I have a variance in the sheetname.
The sheet name is defined in the parameter for level 3.
In level 1 I can see all the files in the folder - But in level 1 I only get results from the files where the sheet names matches that of the one defined in the parameter for level 3.
The absolute name for the sheet as entered in level 3 is ‘xxzeb0531_excel_sscc_inventory_’ - This applies to 2 of the 3 main sources, while the 3rd source has this sheet name ‘edzeb0531_excel_sscc_inventory_’ - a difference in the 2 first characters.
How do I go about this? - Can I use wildcards in the parameter setting in level 3?
I have a folder with a number of files - they all come from 3 sources, each source unique in file name from the others, and all with a date and time stamp in the filename from when it’s generated.
As I said, I can see all the files in Level 1, but only get results from files coming from the 2 sources with identical sheet names…
It seems like it didn’t emphasize it enough – you should do the sheet name matching on level 2. On level 2 you obtain a list of all sheets in given workbook. Keep only those sheet names that match whatever pattern you need.
No matching happens on level 3. Level 3 only loads given sheet from given workbook.
The example has a simple matching condition [Sheet name] = 'Sheet1' on level 2. Modify it according to your matching logic. You can use, for instance, functions like contains(), endswith(), or startswith() to keep only sheet names as per your matching rule. If you only need to keep sheets names that end with a certain text, use endswith() for filtering.
Once you narrow down the list of sheets to load on level 2, the level 3 simply loads the sheets remained after the matching. Whatever default sheet name is hardcoded on level 3 doesn’t matter because it’s overridden when iterated from level 2.
I’ve modified the example and changed the matching expression to keep only sheets that contain a text substring defined by parameter {Matching substring}.
I’ve also added parameter {Matching substring} to level 1 which is passed to level 2 in “Iterate”. Now you can define the substring to match right in the main project (level 1).
Hi Dmtry,
I tried to product same, but my files have a twist:
i have file Daily_report_week1_20190306 and Daily_report_week2_20190315 and inside each respectively have same sheets different dates:
Daily_report_week1_20190306
Articles_on_20190306
articles_in_store
Daily_report_week1_20190315
Articles_on_20190315
articles_in_store
Everyday in this folder we received the daily file.
Following same process, how can I ensure I just pickup most recent file and only sheets (Articles_on_XXXXXX)?
I tried this process, but in Load 3 I need to choose one file and one sheet and then in Load even if I try the spreadsheet, it only returns data from 1 file.
I had success with this setup (starting from lowest level - the actual file) - I hope it’s possible to enlarge and save/print the images below.
In step 3, when making the project you make a parameter for the sheet name, and enter a specific sheet name to begin with (in order to load a file and sheet in design phase)
In step 2, you define the search string for the sheet name and generate a list of files/sheets to load - The file name and sheet name are assigned as input for parameters in step 3 when defining the iterate process.
In step 1 you get all the columns from the files/sheets defined in step 2 and loaded from step 3.
I would only add to @DKcrm’s reply that to obtain the most recent file extract the timestamp from file names (on level 1) and use the “Keep min/max” action to keep only the file name with the latest timestamp in list.
Extracting the timestamp could be done as follows (replace the file name in the example with column name):
Hello, @dgudkov. I am struggling with a task that I think the above might address. My question is, can this be done with multiple modules within the main project, or must it be done with separate projects (level 1, level 2, and level 3) as shown in your example?
Since this topic is so old, I will add that my task is to bring in multiple excel files that each contain multiple sheets. The number of sheets is inconsistent, and I am trying to create something that will work repeatedly with similar files. Ultimately, I want to append all sheets from all files into one master sheet.
Yes, it can. The posts above were written in 2019. Modules were introduced later. You can replace separate projects with modules in all the examples above.
The approach above is still valid, but will require a modification. You will need two nested loops:
The outer loop that goes across a list of files (spreadsheets) and is done as described in the example above (List files + Iterate).
The inner loop (in a separate module) - for each of these files, produce a list of sheets (it's a mode in the "Import Excel spreadsheet" action) + Iterate for each of the sheets
The inner loop should iterate another (3rd) module with two parameters:
File name (provided by the outer loop)
Sheet name (provided by the inner loop)
This module simply loads the specified sheet from the specified file.
All iterations must be in the "Iterate and append" mode to append all sheets from all files into one file.