Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

How to load and extract information from multiple CSV files

@Hey guys,

I'm a newbie in the Qlikview world and require your support in resolving 2 queries:

1. I intend to read data into Qlikview from multiple CSV files and correspondingly, use that data to plot a graph.  Each of these files also contain some unwanted header information and the data to be read from all files is also identical

My intention is to load these files, delimit them to remove the unwanted headers and to extract the required data information into separate tables.[For  Y1, Y2]

2. I loaded a single CSV file and tried to extract the required data. The script runs successfully, but I did not receive any information in the corresponding fields. Please find code used for this particular operation

Also, I did not face this problem while importing and carrying out the operation on a XLS file.

Code:

A:
LOAD "X [ms]",Y1,Y2,Y3,Y4,Y5,Y6,Y7,Y8 as Data
FROM
[..\Test.csv]
(
txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 17 lines);
B:
LOAD  SubField(Data, ',' ,5) as Relative,
SubField(Data, ',' ,2) as Bit
Resident A;
DROP
Table A;

Could anyone suggest the best way to carry out these activities?

Thanks!! .

1 Solution

Accepted Solutions
marcus_sommer

1. Normally the not initialized variable of vFileCounter should return NULL by the first call and rangesum() should just ignore it. Therefore the error-message looked quite odd. Nevertheless you could initialize the variable before the for-loop with something like: let vFileCounter = 0;

2. Here you couldn't use filebasename() else the variable FoundFile maybe in this way:

if left(subfield(FoundFile, '\', -1), 2) = 'AB' then

...

- Marcus

View solution in original post

18 Replies
marcus_sommer

I think that the load-result from A is quite different to your expectation and load B doesn't make much such sense on it. Therefore add a rowno() to load A and uncomment load B and the drop-statement and take a look on these data within a tablebox - I assume this B load won't be needed.

- Marcus

Anonymous
Not applicable
Author

Hello Marcus,

Yes, you're right. This was an incorrect approach that I took to load the CSV files.

Since the header information for all the CSV files were the same, I used the * operator to import all the files into Qlikview. As a follow-up question, if I want to extract some information as a separate list for each file imported, what must be done to achieve this?

Eg: The information from the field Y1 must be present as a separate list from all the files imported [Y1 from file 1 is one list, Y1 from file 2 is a separate list table]

Thanks for your help.

Regards,

Suraj

marcus_sommer

I wouldn't do it in this way. If you have for example 100 files you would get 100 separate Y-fields which would be quite difficult to handle as dimensions and also within expressions.

More common is the approach to load all these data into a single table respectively field and to tag their sources. For this you could add for example filename() or filebasename() - maybe with some further text-functions like subfield(), mid(), left(), right() to extract valuable meta-informations from the path and/or the filename.

These field(s) could be later used for selections or as a filter within the expressions, for example:

count({< SourceField = {'SourceFieldValue1'}>} Y)

- Marcus

Anonymous
Not applicable
Author

Hi Marcus,

Yes, that makes sense.

I have implemented the file loading as given in the below code.

Now, I'm able to carry out the filtering between the files by the date and time stamp information given in the file name. Thanks for your suggestions!!

However I'm still facing an issue with regard to the below 2 topics:

1. Finding the total count of CSV files imported:

I tried using the Noofrows(' FileList') function but didn't return any data.

2. Only choosing particular files to import:

Tried using an IF Statement before the FoundFile For Loop, but it didn't help.

Any suggestions to resolve these issues?

Regards,

Suraj

marcus_sommer

1. Below your load-statement you could just count the files with each iteration, maybe with something like this:

let vFileCounter = rangesum($(vFileCounter), 1);

2. I think this condition needs to applied between "for each FoundFile ..." and the load with something:

if YourDateExpressionOnFilename >= today() - 30 then

     load ...

else

     Trace 'condition not valid and therefore skipping the load of $(FoundFile)';

end if

- Marcus

evan_kurowski
Specialist
Specialist

Hello Suraj,

For your items:

#1 - the expression: COUNT(DISTINCT [FileName]) should produce your number of input files (so long as you did not read two files with the identical same name.  If that is a possibility, then disambiguate with the full path)

#2 - can you provide the criteria being used for when to parse/pass on the file iteration?  You're on the right track with an IF statement, but maybe it needs adjustment in it's execution.

Also, some suggestions regarding your syntax:


//this sequence addresses the issue of needing syntax that does NOT concatenate on the first pass, but wants concatenation on all subsequent passes.
//The inherent "auto-concatenation" feature of QlikView is very convenient, but also unyieldingly precise in requiring all subsequent table structures
//be identical in terms of table structure. Very often auto-concatenation collections are "detached" by renaming a single column,
//or when wildcard is involved, by adding an extra field
//therefore, "forced" concatenation will collect the target tables even if they vary by even minor or insignificant structure details


//Try adjusting this...

Set vConcatenate = ;
For each FoundFile in filelist (Dir)

FileList:
$(vConcatenate)
LOAD 'the syntax parser is interrupted because of the variable. Not a breakage, and the script will run, but it looks awkward' AS FileName AUTOGENERATE(1);

Set vConcatenate = Concatenate;
Next FoundFile


//...with this

[FileList]:
LOAD '' AS FileName Autogenerate(0); //you now have an empty "placeholder" table to attach 1..N load statements

For each FoundFile in filelist(Dir)
CONCATENATE(FileList)
LOAD '$(FoundFile)' AS FileName
AUTOGENERATE(1)


/*alternative that would allow the function FileBaseName() to be used in this context
FIRST 1 LOAD FileBaseName()  AS FileName
FROM [$(FoundFile)]  //you need to add file specific import settings here
*/

;
Next FoundFile


/*forewarning: the QlikView, recursive, nested ScanFolder sequence works pretty well... when the filenames & paths are cooperative.
If you're in a 'hostile' environment ,  file paths will attempt not to cooperate by throwing any kind of programmatic interruptions they can find
in the file iteration sequences, including various types of quotes and other programmatic characters in the file path. (i.e try throwing single quotes in the paths)

Also, links can somehow create infinite loops of recursion. For the most industrially robust set of file iteration, sometimes it may be
beneficial to collect a manifest of input files from other software (perhaps Powershell), and then load that set of isolated targeted files via
manifest. 

the iteration sequence here does work, but it also has interruption vectors*/

evan_kurowski
Specialist
Specialist

Also... wondering about embedded messaging stuff in here.. like load 2,5 [X ms].  is this a Christmas card?

Anonymous
Not applicable
Author

Hello Marcus,

1. I'm still confused with regard to understanding and implementing your suggestion for finding the count of the files imported. Tried implmenting as below, but it threw up errors. Could you please explain it again?

         

2.For the conditional check, I intend to select files back on the text content of their names.

For example, if I have 4 files AB1.csv, AB2.csv, HV1.csv and HV2.CSV, I would like to only select all the files containing the AB text.

I tried implementing something like this:

If( (Left (FileBaseName(),2 = 'AB')) then

   LOAD

else

   Exit

end if

But, unfortunately it did not work. How would I need to go about tackling this?

Maybe for later revisions, your suggestion for using the date mightbe useful to me to select the data relevant for that measured period.

Thanks.

Regards,

Suraj

Anonymous
Not applicable
Author

Hello Evan,

Sadly, it's not a Christmas Card. It's just data fields pulled from a sensor with the data read every 2.5 ms

Thank you for the suggestions and the feedback. But, have some follow queries with the same:

1. Tried loading COUNT(DISTINCT [FileName]). The script runs, but I do not get the variable under which the values are stored as a field after reloading.

After reloading:

2. I have mentioned the conditions required to parse and select the data in an earlier reply to Marcus. You can refer that and please let me know if further details are required.

In your alternative Syntax suggestion, the field to be entered in Dir can directly be ('Folder Path' & '\*.csv)?'

We use concatenate to append the information of the last loaded file in the table and we generate 1 for every file loaded. Is my understanding right?

I have absolutely no background in Qlikview or analytics and just want to understand as much as possible with regard to the implementation and to effectively utilise the tool for my work.

Thanks.

Best regards,

Suraj