I am having trouble with application data I am loading. I am doing analysis on all applications we receive. The problem is that certain applications are logged more than once. If an application has missing information, the system will auto-decline it and send it back. When it is then returned a day or 2 later if may be approved or declined.
This is messing up my figures, as I only want to know about each application once - the latest one. So I have been using the distinct function in my expressions (so it only counts unique application numbers), but in a nutshell - there are still problems.
What I need to do to solve everything is to NOT LOAD the old applications that were sent back and logged again. So in my script, I want to say that all [Appl No]'s must be unique, and when they appear more than once, it most only load the latest one.
Here are the relevant fields:
Load [Appl No],
FROM [....../application extract/Daily_Application_Extract_*.txt]
So let's say [Appl No] 1234 appears in the application extract of 12 June and is declined for missing information and sent back. Then it appears again in the application extract of 14 June and this time it is approved.
So [Appl No] will be the same, [Log Date] will be the same, but [Decision Date] and [Decision Result] will be different.
Whenever this happens (so when the [Appl No] already exists) I want to overwrite the previous data and I only want to load the data for the 14th of June. I don't want to know anything about the application of the 12th only the one where [Decision Date] is the LATEST.
Can someone please help? I need to do this in my script.
1) what does your Daily_Application_Extract file contain?
just one days log? or all the history?
2) what is the missing piece of your file name? the suffixe that you have replaced with an asterisk?
3) your file contains many fields which you require for other analysis (demographics etc.). Are you sre you wil alays receive these withe the latest record relating to your application? this is to asceratin whether you will be safe in discarding all but the most recente record linked to an application.
4) is Appl No the unique key for a given application?
1) It is just a daily log - all applications with a [Decision Date] for the particular day.
2) The missing piece is the file name (Date) and timestamp (date generated) added to the file name of each day. A file will be called Daily_Application_Extract_A_DD-MMM-YYYY_ddmmyyyhhmmss.txt. I replace the variable part with an asterisk so the script loads all files in the folder.
3) Yes - All files are identical and will always contain the same fields. I therefore will have all the data I need using just the latest application.
I'll help further if I can but start by looking at help on qvd files - it is the native read/write storage for qv.
basic idea is
1 you read only the latest file.
2 you pull from your qvd all applications not present in your latest file (thus ensuring you only have one record per application and that this record is the most recent one) - this will be doe with a "where not exists([Appl No]) " clause.
3 you write to the qvd file
and so it continues each day.
for this you will need to get a variable with the right format into your load statement for your text file.
start by looking on the forum for posts on qvds and incremental loads.
Damn - I hate it when that happens. I appreciate your help a lot.
I will look into it as soon as I get a chance. Here are just a few notes - I am not sure if they will affect the qvd idea in any way:
I use QV Personal Edition, for the time being at least.
This particular QV file has two parts - it shows applications received (for a new company we have taken over), and then it shows the transactions and turnover generated by these new accounts (i.e. all applications that were approved).
To do this I use data from 3 sources:
The Daily Account Extract - this is a daily snap of ALL accounts, and is updated daily. So this file I do not concatenate - I overwrite the file daily. This shows me things like account status, last purchase/payments dates, etc.
The Daily Application Extract (as discussed earlier). Here I get all application and demographical data from, e.g. approval rate per store, per gender, per day, etc etc. I also use the [Appl No] field as a synthetic key, because this field is also in the above-mentioned Account Extract. So if I select a particular application that was approved, I can also see this person's transactions.
The Daily Transaction Extract. This I load in the same way as the application extract - every day. It shows details on all transactions.
yes it is all relevant, well except for the personal edition question the impact of which I know little but it shouldn't prevent you using qvds.
in an earlier thread you posted your load statement which in fact is just your DailyApplicationExtract. the qvd idea will work on that load when dealt with in isolation. If your model has hookups/links to ther tables with differening granularity life becomes more complicated - like life is.
I wont be able to go much further than this - I'm sure locally you have some experienced qv modellers who could come in and provide some consultancy because that wouldn't be money wasted.
try and build the qvd incremental load but in a new qv document so that you don't mess up what you already have . When it's done and giving you the results you want then think how it fits with your other stuff.
Thanks for all the help - I will keep investigating the issue. For the time being though, I am a sole QV user in a sea of excel fans. For now consulting isn't an option until I can convince them we need to properly inplement qlikview:
One last question:
Is there any way without using incremental loads or QVD files that I can just update and delete within my script?
So if I am loading from a folder as I am now, and the [Appl No] exists, then just overwrite it with the latest file?
In the QV Help it says:
Distinct is a predicate used if only the first of duplicate records should be loaded.
Is there anyway I can change this to only load the last? Or alternatively to order my files so they are loaded from newest to oldest?
Then I can just say
LOAD distinct [Appl No].
although I get an error message when I do that so I guess the wording is wrong.
check your previous post. Jeroen povided you with some code - in his last post- to extract each application and its latest record. that will get you some of the way. can't say hwo it will ist inside your qv but gaina laod it up first in a new qv on its own till you get it right.