Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have a series of Excels that I have to load in a QVD. These excels are extractions from a ticketing tool. This extraction is done every week and stored in different excels. The problem is that, if a ticket has been modified (for example, its Status changes from "In execution" to "Completed") in the extraction, this ticket returns with the same ID.
In the excels I also have a field called "Updated" that records the day of the modification. In my QVD I need to get ONLY the ticket that has been modified most recently. This is the code section where I load the excels:
PeticionesTareasTME:
LOAD *
FROM
[$(vPathCSVTLF)/*.xlsx]
(ooxml, embedded labels);
My idea was to make a "where" clause similar to the following:
where if (Exists (ID), max (Updated)) // If you find a repeated ID among the excels, return the one with the most recent "Updated" date field.
This sentence has not worked for me but that is what I want to do.
I hope you can help me.
sorry, i just checked it out. Yes, in my data there are identical duplicates in addition to those in which some field is modified and the Updated field is modified.
for identical, this will remove duplicates
PeticionesTareasTME:
LOAD *,rowNo() as RNO
FROM
[$(vPathCSVTLF)/*.xlsx]
(ooxml, embedded labels);
inner join(PeticionesTareasTME)
Load ID,max(updated)as updated Resident PeticionesTareasTME Group by ID;
inner join(PeticionesTareasTME)
Load ID,max(RNO)as RNO Resident PeticionesTareasTME Group by ID;
for non identical you will need to decide logic
now there are no duplicates but those duplicates with updated fields do not appear either ... 😞
no other choice I can think of but group by if you want to keep non identicals
PeticionesTareasTME:
LOAD field1,field2.....n(except updated), max(updated) as updated
FROM
[$(vPathCSVTLF)/*.xlsx]
(ooxml, embedded labels) group by field1,field2.....n(except updated);
inner join(PeticionesTareasTME)
Load ID,max(updated)as updated Resident PeticionesTareasTME Group by ID;
//field1,field2.....n(except updated) means all fieldnames ',' separated except updated
PeticionesTareasTME:
LOAD distinct
(all field names)
FROM
[$(vPathCSVTLF)/*.xlsx]
(ooxml, embedded labels) ;
inner join(PeticionesTareasTME)
Load ID,max(updated)as updated Resident PeticionesTareasTME Group by ID;
That worked!!! Thank you so so much!! 🙂