Solved: Re: "where" clause to show the most recent record ... - Qlik Community

BS · ‎2019-09-18

I have a series of Excels that I have to load in a QVD. These excels are extractions from a ticketing tool. This extraction is done every week and stored in different excels. The problem is that, if a ticket has been modified (for example, its Status changes from "In execution" to "Completed") in the extraction, this ticket returns with the same ID.

In the excels I also have a field called "Updated" that records the day of the modification. In my QVD I need to get ONLY the ticket that has been modified most recently. This is the code section where I load the excels:

PeticionesTareasTME:
LOAD *
FROM
[$(vPathCSVTLF)/*.xlsx]
(ooxml, embedded labels);

My idea was to make a "where" clause similar to the following:

where if (Exists (ID), max (Updated)) // If you find a repeated ID among the excels, return the one with the most recent "Updated" date field.

This sentence has not worked for me but that is what I want to do.

I hope you can help me.

asinha1991 · ‎2019-09-18

for identical, this will remove duplicates

PeticionesTareasTME:
LOAD *,rowNo() as RNO
FROM
[$(vPathCSVTLF)/*.xlsx]
(ooxml, embedded labels);

inner join(PeticionesTareasTME)

Load ID,max(updated)as updated Resident PeticionesTareasTME Group by ID;

inner join(PeticionesTareasTME)

Load ID,max(RNO)as RNO Resident PeticionesTareasTME Group by ID;

for non identical you will need to decide logic

View solution in original post

StarinieriG · ‎2019-09-18

Hi,

you could try to sort data by ID and Updated (desc), then use firstsortedvalue

asinha1991 · ‎2019-09-18

I don't know how many columns you have hence wont suggest group by all , you can do this

PeticionesTareasTME:
LOAD *
FROM
[$(vPathCSVTLF)/*.xlsx]
(ooxml, embedded labels);

inner join(PeticionesTareasTME)

Load ID,max(updated)as updated Resident PeticionesTareasTME Group by ID;

BS · ‎2019-09-18

It did take some duplicates away but not all of them.

Thank you so much anyway.

Any other ideas?

StarinieriG · ‎2019-09-18

Maybe using right join with the same logic suggested by @asinha1991

PeticionesTareasTME:
LOAD *
FROM
[$(vPathCSVTLF)/*.xlsx]
(ooxml, embedded labels);

Right Join

LOAD

ID,

Max(Updated) as Updated

Resident PeticionesTareasTME;

asinha1991 · ‎2019-09-18

you can try what @StarinieriG had suggested

but as per my view, both inner join and right join should work, can you share data where you see mismatch?

I think there is some kind of issue with grouping (looks like what seems to be same ID are creating multiple groups, maybe you have space in some of them?)

BS · ‎2019-09-18

Here's the results i'm getting ("#" = ID and "Actualizado" = Updated).

https://snipboard.io/YZ836f.jpg (sorry for the external link, I'm getting an error trying to insert a picture)

As you can see, I get the record twice with the two dates of the updated field, when I really only need to get the most recent one.

BS · ‎2019-09-18

wait, in the same image you can see that there are identical records in both ID and Updated. I think the code you suggested is generating duplicates, maybe?

asinha1991 · ‎2019-09-18

are there duplicates in source data?

asinha1991 · ‎2019-09-18

try this once and also add trim() in ID just to make sure
PeticionesTareasTME:
LOAD *,rowNo() as RNO,num(updated) as updated_num
FROM
[$(vPathCSVTLF)/*.xlsx]
(ooxml, embedded labels);
inner join(PeticionesTareasTME)
Load ID, num(max(updated)) as updated_num,max(RNO) as RNO Resident PeticionesTareasTME Group by ID;

"where" clause to show the most recent record among duplicates