Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello,
I have over 5000 files with similar data that I want to concatenate. However, they have a different number of fields and different field names (the field name contains the name of the file):
Each file has from 2 to max 30 fields. Here a small example:
location_file304589 | type_file304589 | user_file304589 |
UK | 7 | UserA |
location_file93458 | type_file93458 | user_file93458 | day_file93458 |
US | 8 | UserB | Monday |
location_file34jkshdf4 | user_file34jkshdf4 |
NL | UserC |
The outcome should be like this:
location | type | user | day |
UK | 7 | UserA | |
US | 8 | UserB | Monday |
NL | UserC |
Is this somehow possible without loading the files separately and renaming the fields. Thanks!
How are you fetching the 5000 files? Are you looping through then one by one or are you using wildcards in your source destination like LOAD ... FROM *.CSV ?
If you are looping through the files then you would know the unique file name string. By knowing this string you could use the ALIAS feature.
Master:
Load * inline [
location_file, type_file
];
//Your LOOP
ALIAS location_file$(vYourFileName) as location_file;
ALIAS type_file$(vYourFileName) as type_file;
Tmp:
Load * from $(vYourFileName).csv;
Concatenate (master) load * resident Tmp;
Drop table Tmp;
//NEXT LOOP
Read more about ALIAS here: https://help.qlik.com/en-US/sense/June2019/Subsystems/Hub/Content/Sense_Hub/Scripting/ScriptRegularS...
Thank you for your comment!
To be honest, I am not sure how to do the load. The problem is that I don't know the names of all fields and new field names might be added in future.
So I think I need a solution that loads each table individually, removes the text after the underscore in the field name, and concatenates to the previously loaded table. Does this sound like something that can be done? 😕
I think I would go with a main-approach like the following:
for each file in filelist('YourPath\*.csv')
table: load * from [$(file)] (txt, …);
renameMap: mapping load fieldname(recno(), 'table'), subfield(fieldname(recno(), 'table'), '_', 1)
autogenerate nooffields('table');
rename fields using renameMap;
if filesize('table.qvd') then
concatenate(table) load * from table.qvd (qvd);
end if
store table into table.qvd (qvd);
drop tables table;
next
whereby I'm not sure if I would really concatenate 5000 times the already transformed data in the loop and storing the whole amount of data again and again else storing each table in a separate qvd and loading afterwards from there.
- Marcus
Thank you Vegar and Marcus for the responses!
@marcus_sommer, I tried your script but Rename Fields works only for the first table. I did some research and it turns out that this is a limitation as described in the help section:
Limitations:
You cannot rename two fields to having the same name.
This turns out to be much more complex than I expected. 😕
That's right - you couldn't rename a field to an already existing field. But for this I added the:
drop tables table;
at the end from the loop and then the approach should work again.
- Marcus
Thank you Marcus, I must be doing something wrong then.
I will spend more time on this.