Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have a series of Excels that I have to load in a QVD. These excels are extractions from a ticketing tool. This extraction is done every week and stored in different excels. The problem is that, if a ticket has been modified (for example, its Status changes from "In execution" to "Completed") in the extraction, this ticket returns with the same ID.
In the excels I also have a field called "Updated" that records the day of the modification. In my QVD I need to get ONLY the ticket that has been modified most recently. This is the code section where I load the excels:
PeticionesTareasTME:
LOAD *
FROM
[$(vPathCSVTLF)/*.xlsx]
(ooxml, embedded labels);
My idea was to make a "where" clause similar to the following:
where if (Exists (ID), max (Updated)) // If you find a repeated ID among the excels, return the one with the most recent "Updated" date field.
This sentence has not worked for me but that is what I want to do.
I hope you can help me.
for identical, this will remove duplicates
PeticionesTareasTME:
LOAD *,rowNo() as RNO
FROM
[$(vPathCSVTLF)/*.xlsx]
(ooxml, embedded labels);
inner join(PeticionesTareasTME)
Load ID,max(updated)as updated Resident PeticionesTareasTME Group by ID;
inner join(PeticionesTareasTME)
Load ID,max(RNO)as RNO Resident PeticionesTareasTME Group by ID;
for non identical you will need to decide logic
Hi,
you could try to sort data by ID and Updated (desc), then use firstsortedvalue
I don't know how many columns you have hence wont suggest group by all , you can do this
PeticionesTareasTME:
LOAD *
FROM
[$(vPathCSVTLF)/*.xlsx]
(ooxml, embedded labels);
inner join(PeticionesTareasTME)
Load ID,max(updated)as updated Resident PeticionesTareasTME Group by ID;
It did take some duplicates away but not all of them.
Thank you so much anyway.
Any other ideas?
Maybe using right join with the same logic suggested by @asinha1991
PeticionesTareasTME:
LOAD *
FROM
[$(vPathCSVTLF)/*.xlsx]
(ooxml, embedded labels);
Right Join
LOAD
ID,
Max(Updated) as Updated
Resident PeticionesTareasTME;
you can try what @StarinieriG had suggested
but as per my view, both inner join and right join should work, can you share data where you see mismatch?
I think there is some kind of issue with grouping (looks like what seems to be same ID are creating multiple groups, maybe you have space in some of them?)
Here's the results i'm getting ("#" = ID and "Actualizado" = Updated).
https://snipboard.io/YZ836f.jpg (sorry for the external link, I'm getting an error trying to insert a picture)
As you can see, I get the record twice with the two dates of the updated field, when I really only need to get the most recent one.
wait, in the same image you can see that there are identical records in both ID and Updated. I think the code you suggested is generating duplicates, maybe?
are there duplicates in source data?
try this once and also add trim() in ID just to make sure
PeticionesTareasTME:
LOAD *,rowNo() as RNO,num(updated) as updated_num
FROM
[$(vPathCSVTLF)/*.xlsx]
(ooxml, embedded labels);
inner join(PeticionesTareasTME)
Load ID, num(max(updated)) as updated_num,max(RNO) as RNO Resident PeticionesTareasTME Group by ID;