18 Replies Latest reply: Dec 14, 2017 12:28 AM by Evan Kurowski RSS

    How to load and extract information from multiple CSV files

    Suraj R

      @Hey guys,

       

      I'm a newbie in the Qlikview world and require your support in resolving 2 queries:

       

      1. I intend to read data into Qlikview from multiple CSV files and correspondingly, use that data to plot a graph.  Each of these files also contain some unwanted header information and the data to be read from all files is also identical

       

      My intention is to load these files, delimit them to remove the unwanted headers and to extract the required data information into separate tables.[For  Y1, Y2]

       

      2. I loaded a single CSV file and tried to extract the required data. The script runs successfully, but I did not receive any information in the corresponding fields. Please find code used for this particular operation

       

      Also, I did not face this problem while importing and carrying out the operation on a XLS file.

       

      Code:

       

      A:
      LOAD "X [ms]",Y1,Y2,Y3,Y4,Y5,Y6,Y7,Y8 as Data
      FROM
      [..\Test.csv]
      (
      txt, codepage is 1252, embedded labels, delimiter is ',', msq, header is 17 lines);
      B:
      LOAD  SubField(Data, ',' ,5) as Relative,
      SubField(Data, ',' ,2) as Bit
      Resident A;
      DROP
      Table A;

       

      Could anyone suggest the best way to carry out these activities?

       

      Thanks!! .

        • Re: How to load and extract information from multiple CSV files
          Marcus Sommer

          I think that the load-result from A is quite different to your expectation and load B doesn't make much such sense on it. Therefore add a rowno() to load A and uncomment load B and the drop-statement and take a look on these data within a tablebox - I assume this B load won't be needed.

           

          - Marcus

            • Re: How to load and extract information from multiple CSV files
              Suraj R

              Hello Marcus,

               

              Yes, you're right. This was an incorrect approach that I took to load the CSV files.

               

              Since the header information for all the CSV files were the same, I used the * operator to import all the files into Qlikview. As a follow-up question, if I want to extract some information as a separate list for each file imported, what must be done to achieve this?

               

              Eg: The information from the field Y1 must be present as a separate list from all the files imported [Y1 from file 1 is one list, Y1 from file 2 is a separate list table]

               

              Thanks for your help.

               

              Regards,

              Suraj

                • Re: How to load and extract information from multiple CSV files
                  Marcus Sommer

                  I wouldn't do it in this way. If you have for example 100 files you would get 100 separate Y-fields which would be quite difficult to handle as dimensions and also within expressions.

                  More common is the approach to load all these data into a single table respectively field and to tag their sources. For this you could add for example filename() or filebasename() - maybe with some further text-functions like subfield(), mid(), left(), right() to extract valuable meta-informations from the path and/or the filename.

                   

                  These field(s) could be later used for selections or as a filter within the expressions, for example:

                   

                  count({< SourceField = {'SourceFieldValue1'}>} Y)

                   

                  - Marcus

                    • Re: How to load and extract information from multiple CSV files
                      Suraj R

                      Hi Marcus,

                       

                      Yes, that makes sense.

                       

                      I have implemented the file loading as given in the below code.

                      Now, I'm able to carry out the filtering between the files by the date and time stamp information given in the file name. Thanks for your suggestions!!

                       

                      However I'm still facing an issue with regard to the below 2 topics:

                       

                      1. Finding the total count of CSV files imported:

                      I tried using the Noofrows(' FileList') function but didn't return any data.

                       

                      2. Only choosing particular files to import:

                      Tried using an IF Statement before the FoundFile For Loop, but it didn't help.

                       

                      Any suggestions to resolve these issues?

                       

                      Regards,

                      Suraj

                        • Re: How to load and extract information from multiple CSV files
                          Marcus Sommer

                          1. Below your load-statement you could just count the files with each iteration, maybe with something like this:

                           

                          let vFileCounter = rangesum($(vFileCounter), 1);

                           

                          2. I think this condition needs to applied between "for each FoundFile ..." and the load with something:

                           

                          if YourDateExpressionOnFilename >= today() - 30 then

                               load ...

                          else

                               Trace 'condition not valid and therefore skipping the load of $(FoundFile)';

                          end if

                           

                          - Marcus

                            • Re: How to load and extract information from multiple CSV files
                              Suraj R

                              Hello Marcus,

                               

                              1. I'm still confused with regard to understanding and implementing your suggestion for finding the count of the files imported. Tried implmenting as below, but it threw up errors. Could you please explain it again?

                                       

                              2.For the conditional check, I intend to select files back on the text content of their names.

                               

                              For example, if I have 4 files AB1.csv, AB2.csv, HV1.csv and HV2.CSV, I would like to only select all the files containing the AB text.

                               

                              I tried implementing something like this:

                               

                              If( (Left (FileBaseName(),2 = 'AB')) then

                                 LOAD

                              else

                                 Exit

                              end if

                               

                              But, unfortunately it did not work. How would I need to go about tackling this?

                               

                              Maybe for later revisions, your suggestion for using the date mightbe useful to me to select the data relevant for that measured period.

                               

                              Thanks.

                               

                              Regards,

                              Suraj

                                • Re: How to load and extract information from multiple CSV files
                                  Marcus Sommer

                                  1. Normally the not initialized variable of vFileCounter should return NULL by the first call and rangesum() should just ignore it. Therefore the error-message looked quite odd. Nevertheless you could initialize the variable before the for-loop with something like: let vFileCounter = 0;

                                   

                                  2. Here you couldn't use filebasename() else the variable FoundFile maybe in this way:

                                   

                                  if left(subfield(FoundFile, '\', -1), 2) = 'AB' then

                                  ...

                                   

                                  - Marcus

                                  • Re: How to load and extract information from multiple CSV files
                                    Evan Kurowski

                                    Suraj Ramachandrappa wrote:

                                     

                                    I tried implementing something like this:

                                     

                                    If( (Left (FileBaseName(),2 = 'AB')) then

                                       LOAD

                                    else

                                       Exit

                                    end if

                                     

                                    But, unfortunately it did not work. How would I need to go about tackling this?

                                     

                                    Thanks.

                                     

                                    Regards,

                                    Suraj

                                     

                                    This particular portion is not working because FileBaseName() is not a function available in this context of your looping sequence (has to be used in conjunction with a load statement inside your file-loop).

                                     

                                     

                                     

                                    Instead switch your logic to look for matches in your looping variable $(FoundFile)

                                     

                                     

                                    $(FoundFile) will be assigned full paths, so when parsing for the file name you have to strip off the last sub-segment in the path tree.

                                     

                                     

                                     

                                    Subfield('$(FoundFile)','\',-1)  //should isolate the FileBaseName + Extension


                                    IF
                                    ( Left( Upper(Subfield('$(FoundFile)','\',-1)) , 2) = 'AB' THEN  //case-sensitive, maybe throw in an upper for extra-measure

                                    Trace you could give her this;

                                    ELSE

                                    Trace Or you could give her that;

                                    END IF

                                • Re: How to load and extract information from multiple CSV files
                                  Evan Kurowski

                                  Hello Suraj,

                                  For your items:

                                   

                                   

                                  #1 - the expression: COUNT(DISTINCT [FileName]) should produce your number of input files (so long as you did not read two files with the identical same name.  If that is a possibility, then disambiguate with the full path)

                                   

                                   

                                   

                                  #2 - can you provide the criteria being used for when to parse/pass on the file iteration?  You're on the right track with an IF statement, but maybe it needs adjustment in it's execution.

                                   

                                   

                                   

                                   

                                   

                                   

                                   

                                  Also, some suggestions regarding your syntax:

                                   

                                   

                                   


                                  //this sequence addresses the issue of needing syntax that does NOT concatenate on the first pass, but wants concatenation on all subsequent passes.
                                  //The inherent "auto-concatenation" feature of QlikView is very convenient, but also unyieldingly precise in requiring all subsequent table structures
                                  //be identical in terms of table structure. Very often auto-concatenation collections are "detached" by renaming a single column,
                                  //or when wildcard is involved, by adding an extra field
                                  //therefore, "forced" concatenation will collect the target tables even if they vary by even minor or insignificant structure details


                                  //Try adjusting this...

                                  Set vConcatenate = ;
                                  For each FoundFile in filelist (Dir)

                                  FileList:
                                  $(vConcatenate)
                                  LOAD 'the syntax parser is interrupted because of the variable. Not a breakage, and the script will run, but it looks awkward' AS FileName AUTOGENERATE(1);

                                  Set vConcatenate = Concatenate;
                                  Next FoundFile


                                  //...with this

                                  [FileList]:
                                  LOAD '' AS FileName Autogenerate(0); //you now have an empty "placeholder" table to attach 1..N load statements

                                  For each FoundFile in filelist(Dir)
                                  CONCATENATE(FileList)
                                  LOAD '$(FoundFile)' AS FileName
                                  AUTOGENERATE(1)


                                  /*alternative that would allow the function FileBaseName() to be used in this context
                                  FIRST 1 LOAD FileBaseName()  AS FileName
                                  FROM [$(FoundFile)]  //you need to add file specific import settings here
                                  */

                                   

                                  ;
                                  Next FoundFile


                                  /*forewarning: the QlikView, recursive, nested ScanFolder sequence works pretty well... when the filenames & paths are cooperative.
                                  If you're in a 'hostile' environment ,  file paths will attempt not to cooperate by throwing any kind of programmatic interruptions they can find
                                  in the file iteration sequences, including various types of quotes and other programmatic characters in the file path. (i.e try throwing single quotes in the paths)

                                  Also, links can somehow create infinite loops of recursion. For the most industrially robust set of file iteration, sometimes it may be
                                  beneficial to collect a manifest of input files from other software (perhaps Powershell), and then load that set of isolated targeted files via
                                  manifest. 

                                  the iteration sequence here does work, but it also has interruption vectors*/

                          • Re: How to load and extract information from multiple CSV files
                            Evan Kurowski

                            Also... wondering about embedded messaging stuff in here.. like load 2,5 [X ms].  is this a Christmas card?

                              • Re: How to load and extract information from multiple CSV files
                                Suraj R

                                Hello Evan,

                                 

                                Sadly, it's not a Christmas Card. It's just data fields pulled from a sensor with the data read every 2.5 ms

                                 

                                Thank you for the suggestions and the feedback. But, have some follow queries with the same:

                                 

                                1. Tried loading COUNT(DISTINCT [FileName]). The script runs, but I do not get the variable under which the values are stored as a field after reloading.

                                After reloading:

                                 

                                2. I have mentioned the conditions required to parse and select the data in an earlier reply to Marcus. You can refer that and please let me know if further details are required.

                                 

                                In your alternative Syntax suggestion, the field to be entered in Dir can directly be ('Folder Path' & '\*.csv)?'

                                We use concatenate to append the information of the last loaded file in the table and we generate 1 for every file loaded. Is my understanding right?

                                 

                                I have absolutely no background in Qlikview or analytics and just want to understand as much as possible with regard to the implementation and to effectively utilise the tool for my work.

                                 

                                Thanks.

                                 

                                Best regards,

                                Suraj

                                  • Re: How to load and extract information from multiple CSV files
                                    Evan Kurowski

                                    Suraj Ramachandrappa wrote:

                                     

                                    Hello Evan,

                                     

                                    Sadly, it's not a Christmas Card. It's just data fields pulled from a sensor with the data read every 2.5 ms

                                     

                                    Thank you for the suggestions and the feedback. But, have some follow queries with the same:

                                     

                                    1. Tried loading COUNT(DISTINCT [FileName]). The script runs, but I do not get the variable under which the values are stored as a field after reloading.

                                    Thanks.

                                     

                                    Best regards,

                                    Suraj

                                     

                                     

                                    Regarding this issue... the approach suggested was to use a user-interface expression to calculate your number of distinct input files.  You don't use this in the script portion to assign a variable (you could do this, but my suggestion sidestepped this approach)

                                    If after reloading you have the field [FileName] present in the data model, then create a text box and assign the expression =COUNT(DISTINCT [FileName]).  This should be a readily available count of your inputs, and unless there's some other reason you need the count of files recorded to table and/or variable, there's no rule that says you have to calculate this via scripting.  If you do have to tabulate during scripting, then techniques incrementing the counter per loop pass (even incrementing conditionally whether falling into IF criteria), as Marcus has pointed out, should work fine.

                                    • Re: How to load and extract information from multiple CSV files
                                      Marcus Sommer

                                      1. Yes, count() won't work to assign a value to a variable - it's an aggregation-function and worked within the gui or in the script in an aggregation-load with group by clause. Here within the script you could use instead:

                                       

                                      let vTotFileCount  = fieldvaluecount('FileName');

                                       

                                      2. If there is just one folder you don't need a loop through folders and could just put the path into filedir(). The same is with the extension - with just one extension it's not needed. If there are more than one extension and/or the files have different data-sructure you would need to query which value has the current file to choose within further if-loops or case-statements which file-format and load-statement should be used.

                                       

                                      An explicit concatenate-statement is only needed if the data-structure could be differ in some way otherwise the automatic table-concatenation from Qlik would do the job.

                                       

                                      - Marcus

                                        • Re: How to load and extract information from multiple CSV files
                                          Evan Kurowski

                                          Marcus Sommer wrote:

                                           

                                           

                                           

                                          An explicit concatenate-statement is only needed if the data-structure could be differ in some way otherwise the automatic table-concatenation from Qlik would do the job.

                                           

                                          - Marcus

                                           

                                           

                                          Mentioning again...

                                           

                                           

                                          Automatic table-concatenation feature is a beautiful idea, but probably only safe for use if you're in a controlled lab environment, or directly prepare the formats/structures of your own data inputs.

                                           

                                          With zero exaggeration, probably every auto-concatenation I've attempted to use in which data inputs were prepared by another party, without exception were eventually disrupted by an unannounced upstream table structure change (either an add, drop, or alias).  If you're going to develop around an auto-concatenation, make sure you get your data providers to commit to a stable format.  It is an amazing technical feature, but sometimes you have to take into account the human element.

                                          ~E

                                    • Re: How to load and extract information from multiple CSV files
                                      Suraj R

                                      Thank you Marcus and Evan for your valuable suggestions!!

                                       

                                      Regards,

                                      Suraj