I am trying to load a large .csv file, 980 MB in size. The file has 54 columns and 244k lines:
Load *
FROM [980MB.csv]
(txt, codepage is 1252, embedded labels, delimiter is ',', msq
, filters(Remove(Row, RowCnd(CellValue, 53, StrCnd(null, not))))
)
;
I am trying to eliminate an empty field (a blank field name) by adding the filters condition shown above.
When I reload the file, I get a "Request ran out of memory" error.
Other, smaller .csv files reload fine; for example, 950 MB and 900 MB files load without any errors.
If I try it without this part >> , filters(Remove(Row, RowCnd(CellValue, 53, StrCnd(null, not))))
Load *
FROM [980MB.csv]
(txt, codepage is 1252, embedded labels, delimiter is ',', msq)
;
it works without an error.
Can you help me to solve this issue?
It looks like the filter might add too much overhead. Perhaps you can replace it with a Where condition, which could be lighter to process:
load
...
From ...
Where len(trim(X53)) > 0
;
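For example, a fuller sketch of that idea, using X53 as a stand-in for whatever name the 53rd column actually gets after the load:

Data:
Load *
From [980MB.csv]
(txt, codepage is 1252, embedded labels, delimiter is ',', msq)
Where len(trim(X53)) > 0;

The Where clause is evaluated record by record during the load, so no transformation step is involved.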
Thank you for your reply.
The issue here is that everything is in a loop:
For each File in FileList('Dir\*.csv')
    Load *
    From [$(File)]
    (txt, codepage is 1252, embedded labels, delimiter is ',', msq
    , filters(Remove(Row, RowCnd(CellValue, 53, StrCnd(null, not))))
    )
    ;
Next File
Some or all of the *.csv files have a blank field name [] with blank values in every row; I found that it is the 53rd column.
When I use Load * I cannot comment out that blank field name, therefore I used the 'Transformation Step' option in the QlikView File Wizard, which generated:
, filters(Remove(Row, RowCnd(CellValue, 53, StrCnd(null, not))))
This checks the 53rd column and removes the row if it is null. It was working fine as long as the files had fewer than 200k rows.
You mentioned that the same load without the filter works with no errors. Why not load the files without filtering, and then reload the same table with a Resident load and filter out the NULL values there? It may not be the most efficient way, but it should work without a problem.
However, be careful: if some of the files have field names and others don't, you may end up with several tables instead of one, and that could cause all sorts of issues...
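A minimal sketch of that two-step approach, assuming all files share the same structure and concatenate into one table, and again using X53 as a stand-in for the real name of the 53rd column:

For each File in FileList('Dir\*.csv')
    Raw:
    Load *
    From [$(File)]
    (txt, codepage is 1252, embedded labels, delimiter is ',', msq);
Next File

Data:
NoConcatenate
Load *
Resident Raw
Where len(trim(X53)) > 0;

Drop Table Raw;

NoConcatenate is needed here; otherwise the Resident load would append straight back onto Raw.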
Apologies for the confusion. If I don't use the filter, then I need to load all the field names explicitly instead of using *, and comment out the blank field name (//[], the 53rd column); then it works.
Using the filter makes things easier. The filter only fails when the .csv file has more than 250k rows:
, filters(Remove(Row, RowCnd(CellValue, 53, StrCnd(null, not))))
I think I would tend to check the files' existing fields before loading them for real. This could be done by using FIRST 1 to load only the first record, with or without embedded labels, and then checking the field names or field values, storing all fields in a variable and/or branching into different field lists.
Such a general check could be useful in many scenarios; it could be outsourced into a subroutine, fetched per include variable, and called only where it's needed. For example, you could check the file size within FileList() and apply the check only to the biggest files.
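A rough sketch of that pre-check, assuming the files always have a header row; the file mask, the variable names and the 900 MB threshold are illustrative only:

For each vFile in FileList('Dir\*.csv')
    If FileSize('$(vFile)') > 900000000 then // only pre-check the biggest files
        // Load just the header row without labels, so the columns arrive as @1, @2, ...
        Header:
        First 1
        Load *
        From [$(vFile)]
        (txt, codepage is 1252, no labels, delimiter is ',', msq);

        // The first record now holds the embedded labels; read the 53rd one
        Let vName53 = Trim(Peek('@53', 0, 'Header'));
        Drop Table Header;

        If Len('$(vName53)') = 0 then
            // Blank 53rd field name: branch into the load that skips/filters it
        Else
            // Normal load with all 54 field names
        End if
    End if
Next vFile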
- Marcus