Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Selecting data from multiple Excel tables with same field names

Hi all,

I have multiple excel files (*.xslx) containing data gathered over different dates. The excel files have the same sheet names and the same field names. The only data that is different from one file to another is the values of the fields.

Sample_File_Date1 :

Feature_Name                                        Value_1              Value_2        

4D TRAK                                                  98%                    75%

Automatic Coil Selection\AcsFcc              81%                    65%

MRE                                                        93%                    40%

PCA                                                        96%                    21%

Sample_File_Date2 :

Feature_Name                                        Value_1              Value_2        

4D TRAK                                                  94%                    85%

Automatic Coil Selection\AcsFcc              82%                    21%

MRE                                                        0%                    40%

PCA                                                        67%                    22%


In my load script, I am loading the excel files dynamically as below :


LOAD

    Feature_Name

    Value_1 

    Value_2

FROM [lib://../Sample_File_*.xlsx]

(ooxml, embedded labels, table is [SampleSheet]);

The problem I have is I have to display/calculate values based on the dates. That is, some of my charts use, for example, average of Value_1 only from the file Sample_File_Date2 , while elsewhere a pivot table might have to display the fields only from Sample_File_Date1 in a pivot table. What function(s) should I use to achieve this?

Any help is truly appreciated.

Regards,

Kanchana

1 Solution

Accepted Solutions
swuehl
MVP
MVP

I would start with checking if the table chart shows what you want when you select the file date in a filter pane (maybe remove the set expression from your Only() function to avoid a conflict between the selection and the set expression).

If it does work, then your set expression is not correct.

{<[File Date] = MaxDate >}


is basically copying the selection in field MaxDate to [File Date]. I assume that you don't have an active selection in this field?

And from what I've learned above, your MaxDate field shows two dates, but you want to show the data for the min date of the two?


Check if


=Only( If( [File Date] = Minstring(MaxDate), [Value_1]))


returns what you want (without any selection in the date fields).


If yes, you can create also a set expression with this filter, but you need to take care of some Qlik specifics (basically value formatting in the set modifier):

Dates in Set Analysis


Maybe something like

=Only({<[File Date] = {'$(=Date(Max({1} [File Date],2)))'} >} [Value_1])

edit: and again, all help would be much easier if you could provide a small sample application

View solution in original post

19 Replies
swuehl
MVP
MVP

Create a new field in the LOAD that identifies the file source.

Instead of the wildcard '*' in the filename, I would use a FOR EACH loop and use the variable, something like

For Each File in FileList('[lib://../Sample_File_*.xlsx]')

    Data:

    LOAD

        Feature_Name,

        Value_1,

        Value_2,

        Subfield('$(File)', '\', -1) AS Filename

    FROM [$(File)] (ooxml, embedded labels, table is [SampleSheet]);

Next File


edit: there is also a function FileName() you can have a look into, but there seems to be an open issue with latest QlikSense versions and xlsx files


Re: Function FileName() not working after upgrade to 3.2.3

Not applicable
Author

Hi Stefan,

Thanks for your response.

However, I do not have any use for the file name as there are hundreds of such files that differ only in the date in which they were generated. Therefore, I have to use the date to filter out the data, such as the scenarios as below :

1. Display Feature_Name, Value_1, Value_2 from the table that was last generated (I use Max(date) function here) in a pivot table.

2. Calculate the Average of all Value_1 values from the table that was generated just previous to the latest table and display in a KPI.

etc.

I need to know specifically how to select the data in my chart - I am not sure whether set analysis would come in handy here, and if it would, how should I create my statements?

Regards,

Kanchana

swuehl
MVP
MVP

As far as I understood, you need to parse the date from your filename, right?`

Above was just a sample, use string functions, like Textbetween() or Subfield() to filter the dates from the filename.

After you parsed the dates and stored them into a field, you can filter your date using set analysis or filter panes or conditional functions just like you filter on any other attribute.

Not applicable
Author

I have the date information in two separate places, one an xml file which contains some generic information on the excel file as below :

<records xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

  <record>

      <File_No>1</File_No>

      <File_Date>05/02/17</File_Date>

      <File_Stream>Atlmainrecon</File_Stream>

      <File_SWID>404</File_SWID>

  </record>

  <record>

      <File_No>2</File_No>

      <File_Date>06/01/17</File_Date>

      <File_Stream>Atlmain</File_Stream>

      <File_SWID>339</File_SWID>

  </record>

</records>

I have loaded this file also in the script as

SampleCrossTable:

    LOAD

        File_No as "File No",

        File_Date as "File Date",

        File_Stream as "File Stream",

        File_SWID as "File SWID"

    FROM [lib://*/Summary.xml]

(XmlSimple, table is [records/record]);

From this, I can get the date information, say, the latest date as follows :

LOAD

    Date(max("File Date")) as MaxDate

resident SampleCrossTable;

Two, the same date information is also available from the filename, which I have extracted this way within the load script:

Date(Mid(FileName(), 12, 11)) as "File Date"

But my question is, how can I correlate the MaxDate from the load script with the corresponding Excel file and use it in my charts to get the data for the scenarios that I mentioned in my previous question.

I have tried using conditional statements such as

if([File Date] = MaxDate), do something..


but this doesn't get me any data. Where could I be going wrong?

Regards,

Kanchana

swuehl
MVP
MVP

If something like

if([File Date] = MaxDate), do something..

does not work, there are two common issues (besides that if() function should show at least two arguments, THEN branch being second, an optional ELSE branch being third. Have a look at the HELP pages for full syntax and examples. I guess above pseudo code is just not based on your real expression syntax)

a) both dates should be read in as dates by Qlik, having a numerical representation

Why don’t my dates work?

Get the Dates Right

b) You probably need to use an aggregation function around the if() conditional, i,e, rather use

=Sum( If( X=Y, Z))

than

=If( X=Y, Sum(Z))

I say probably, because the correct syntax is depending on the scope of your expression and your requirement.

For example, second expression may work when it's used with either X or Z as dimension.

Use Aggregation Functions!

If this is still not getting you any further, then please add more context to your request, best by posting a small, reloadable sample application.

d_prashanthredd
Creator III
Creator III

Hi Kanchana,

What I understood in this case..

  1. You have multiple files and loaded data with the date which is available in file name.
  2. Also a master table which contains file n date name.
  3. From above 2, you only bother about Max date.

When you have same columns, automatically data will be appended. For ex. In file 1 you have 10 records with date, col1, ..

and file 2 with 10 records with diff date, col1, ..

Now you will have 20 records after load..

To get your desired output have a variable to hold max date from your other table/master.

Next, resident load from base data with date condition = max date from the variable.

I hope, it will solve your problem.

Not applicable
Author

Hi Prashanth,

Thank you for your response. But as I was discussing with Stefan earlier, I have the max date stored in a variable at the time of loading the files. And my charts and calculations work perfectly when only 1 excel file is present in the directory. But as soon as I load more than one file (since they all have the same column names), the charts are unable to display the right data from the date in which I am interested.

I am rewriting the code from the earlier posts for your benefit :

--------------------------------------------------------------------------------------------------------

PART 1 : This is my master file with the generic information:

<records xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

  <record>

      <File_No>1</File_No>

      <File_Date>05/02/17</File_Date>

      <File_Stream>Atlmainrecon</File_Stream>

      <File_SWID>404</File_SWID>

  </record>

  <record>

      <File_No>2</File_No>

      <File_Date>06/01/17</File_Date>

      <File_Stream>Atlmain</File_Stream>

      <File_SWID>339</File_SWID>

  </record>

</records>

I have loaded this file in the script as :


SampleCrossTable:

    LOAD

        File_No as "File No",

        File_Date as "File Date",

        File_Stream as "File Stream",

        File_SWID as "File SWID"

    FROM [lib://*/Summary.xml]

(XmlSimple, table is [records/record]);

From this, I can get the date information, the max date as follows :

LOAD

    Date(max("File Date")) as MaxDate

resident SampleCrossTable;

And the date just previous to the latest as :

LOAD

    Date(max("File Date")) as MaxDate

resident SampleCrossTable where not exists(MaxDate,"File Date");


I have checked that I get both the dates as expected.

---------------------------------------------------------------------------------------------------------------


PART 2 : These are my multiple excel data files with the information that has to be selectively used in chart display or for some calculations based on the dates :


Sample_File_Date1 :

Feature_Name                                        Value_1              Value_2

4D TRAK                                                  98%                    75%

Automatic Coil Selection\AcsFcc              81%                    65%

MRE                                                        93%                    40%

PCA                                                        96%                    21%

Sample_File_Date2 :

Feature_Name                                        Value_1              Value_2

4D TRAK                                                  94%                    85%

Automatic Coil Selection\AcsFcc              82%                    21%

MRE                                                        0%                    40%

PCA                                                        67%                    22%


In my load script, I am loading the excel files dynamically as below :


LOAD

    Feature_Name

    Value_1

    Value_2

FROM [lib://../Sample_File_*.xlsx]

(ooxml, embedded labels, table is [SampleSheet]);


---------------------------------------------------------------------------------------------------------------

PART 3 : Now, let us say I want to display the contents of the sample file with the latest date in a pivot table, this is the code I have written in the pivot table :


ROW :

[Feature_Name]


MEASURE :

Only({<[File Date] = MaxDate>} [Value_1])

This is where the problem begins as the values are not being filtered. It only displays those values in the chart when they are the same across all the excel files. The other values are being displayed as null.

-----------------------------------------------------------------------------------------------------------------------

I don't know if I have missed anything, and any help would be great!

Regards,

Kanchana

swuehl
MVP
MVP

And how do you want to display the values if they are not the same?

Only() does work as designed to show only identical values:

The Only Function

You either need to add a dimension (like Value_1) to create more possible rows per combination of dimensional values or use a different aggregation function, like Concat():

=Concat(DISTINCT {<[File Date] = MaxDate >} [Value_1], ', ')

d_prashanthredd
Creator III
Creator III

Hi Kanchana,

I think you can do this way to have a relationship with dates to do some filter stuff or any in front end.

Sample_File_Date1 :


load Feature_Name, Value1, Value2, 'Date1 (ex: 5/6/2017)' as FileDate .....;

Note: Date1 can be extracted from the filename

Feature_Name                                        Value_1              Value_2          FileDate

4D TRAK                                                  98%                    75%          5/6/2017

Automatic Coil Selection\AcsFcc              81%                    65%          5/6/2017

MRE                                                        93%                    40%          5/6/2017

PCA                                                        96%                    21%          5/6/2017

Sample_File_Date2 :

Feature_Name                                        Value_1              Value_2          FileDate

4D TRAK                                                  94%                    85%          4/6/2017

Automatic Coil Selection\AcsFcc              82%                    21%          4/6/2017

MRE                                                        0%                    40%          4/6/2017

PCA                                                        67%                    22%          4/6/2017


Sorry, If I'm talking about something esle.