Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Write Table now available in Qlik Cloud Analytics: Read Blog
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Complex (?) Merging / Blending Data Query

Hello there,

I’ve been using QlikView (QV) for a while now, but despite the fact I am pretty good at building basic and some lightly complex applications from scratch, I am still by no means an expert by any standard.  I’m looking for some advice / guidance as to the best way forward in answering a query I have which may well come two-fold.

On a yearly basis the organisation I work for purchases data from an external supplier from which I use to incorporate into QlikView (QV) and from therein provide analysis to the rest of the team within the office.

Currently there are two raw tab delimited text files ported across into QV.  The first contains around 16 million rows of data covering a 6 year period between 2007/08 and 2012/13.  The second file contains 5 million rows of data for the periods between 20013/14 and 2014/15.

For information purposes, both files originate from the same source and in most cases have the same fieldnames between the two.  The second file contains a few new fieldnames, which are not contained with the first file (and vice versa some fieldnames which have been removed but appear in the other file).

So, bearing all this in mind, my queries are as follows:

FIRSTLY, I am looking to extrapolate from the first file one year’s worth of data 2012/13 (instead of the full file) and then merge / blend the data from this file with the following two years in the second file.  Essentially what I’m looking to achieve here is to rebuild an application using just three years’ worth of data (as opposed to what would have been eight years’ worth).

Normally – and perhaps you would think that – I would access the raw data file and filter out 2012/13 data before transferring it to another sheet or merging it with the second file. However because the file is so large it makes it virtually impossible to access it without crashing the system.

So, here I am wondering if such a thing is achievable from within QV.  If so, how do I go about achieving this?  What syntax structure should I be using?

SECONDLY, - and probably the biggest bug-bear I have at present – is when I merge two files together as one, particularly where each file has certain fieldname differences between the two, it produces very different results particularly where overall total figures are concerned.

I know this is because QV has problems when it comes to matching fieldnames data and identifies certain fieldnames that it cannot match with another file.  If I remove the unmatchable fieldnames from the LOAD structure, the total results matches what I have recorded on paper.

But I want to be able to include the unmatchable fieldnames into the final table without losing the any of the total results / data in the process.

I have applied the QUALIFY / UNQUALIFY command process into the script structure, but when this is applied to the full reload it shuts down the application because of the command on numerous occasions.  I have modified the structure to incorporate each command on its own or together at the same time.

Is there a way to overcome this problem I’m experiencing?  If so, what syntax structure should I be considering to use?

Perhaps examples of syntax structures would be of help but in any case any advice / guidance would be thankfully received.

1 Solution

Accepted Solutions
petter
Partner - Champion III
Partner - Champion III

Concatenating the two source tables into a concatenated ("merged") table in QlikView should work for you:

[DATA]:

LOAD

    *

FROM

    OLDEST_DATA.CSV (txt, delimiter is \t .... )

WHERE

    Period = '2013/23'  // Don't know how the periods are stored in your file ... so it might have to be modified ...

;

CONCATENATE LOAD

    *

FROM

    NEWEST_DATA.CSV (txt, delimiter is \t ....)

WHERE

    Period >= '2013/14' AND Period <= '2014/15';

View solution in original post

2 Replies
petter
Partner - Champion III
Partner - Champion III

Concatenating the two source tables into a concatenated ("merged") table in QlikView should work for you:

[DATA]:

LOAD

    *

FROM

    OLDEST_DATA.CSV (txt, delimiter is \t .... )

WHERE

    Period = '2013/23'  // Don't know how the periods are stored in your file ... so it might have to be modified ...

;

CONCATENATE LOAD

    *

FROM

    NEWEST_DATA.CSV (txt, delimiter is \t ....)

WHERE

    Period >= '2013/14' AND Period <= '2014/15';

Not applicable
Author

Petter,

Many thanks for your contribution / help.  This was exactly what I needed.  I had considered the CONCATENATE command, but where I had previously originated it from within the structure wasn't exactly in the right place for it work properly.

Once again, many thanks.

Best regards, Mike.