I have a database with these columns:
IP_ADDRESS,
SALE_ID,
AGENT_NAME,
DATE,
QUANTITY,
CATEGORY,
AMOUNT,
POOL_NR,
AGENT_ID
In the data source, once a transaction is edited it gets a new SALE_ID while keeping the values of all the other fields.
How do I filter out edited transactions so that only one transaction ends up in my QVD instead of two?
Do I understand correctly that when a transaction is edited, all values are copied to a new entry with a new SALE_ID?
And you want only the last one?
Yes please
You can group the data on all the fields that are just copied and use, e.g., LastValue() or Max() for the SALE_ID:
Data:
NoConcatenate Load * Inline [
IP_ADDRESS, SALE_ID, AGENT_NAME, DATE, QUANTITY, CATEGORY, AMOUNT, POOL_NR, AGENT_ID
1.1.1.1, 1, Alice, 01/01/2024, 1, 1, 1, 1, 1
2.2.2.2, 2, Bob, 02/01/2024, 2, 2, 2, 2, 2
1.1.1.1, 3, Alice, 01/01/2024, 1, 1, 1, 1, 1
1.1.1.1, 4, Alice, 01/01/2024, 1, 1, 1, 1, 1
2.2.2.2, 5, Bob, 02/01/2024, 2, 2, 2, 2, 2
2.2.2.2, 6, Bob, 02/01/2024, 2, 2, 2, 2, 2
];
Transactions:
NoConcatenate Load
IP_ADDRESS,
LastValue(SALE_ID) as SALE_ID,
AGENT_NAME,
DATE,
QUANTITY,
CATEGORY,
AMOUNT,
POOL_NR,
AGENT_ID
Resident Data
Group By
IP_ADDRESS,
AGENT_NAME,
DATE,
QUANTITY,
CATEGORY,
AMOUNT,
POOL_NR,
AGENT_ID
;
Drop Table Data;
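For anyone who wants to see the grouping logic outside Qlik, here is a plain-Python sketch of the same idea (illustrative only; the field names match the thread, the data is made up, and the real table has more columns):

```python
# Sketch of the Group By + LastValue()/Max() approach above:
# group on every field that is copied on edit (all but SALE_ID)
# and keep the row with the highest SALE_ID per group.
rows = [
    ("1.1.1.1", 1, "Alice"), ("2.2.2.2", 2, "Bob"),
    ("1.1.1.1", 3, "Alice"), ("1.1.1.1", 4, "Alice"),
    ("2.2.2.2", 5, "Bob"), ("2.2.2.2", 6, "Bob"),
]  # (IP_ADDRESS, SALE_ID, AGENT_NAME)

latest = {}
for ip, sale_id, agent in rows:
    key = (ip, agent)  # everything except SALE_ID
    prev = latest.get(key)
    if prev is None or sale_id > prev[1]:
        latest[key] = (ip, sale_id, agent)

deduped = sorted(latest.values(), key=lambda r: r[1])
print(deduped)  # [('1.1.1.1', 4, 'Alice'), ('2.2.2.2', 6, 'Bob')]
```

One pass over the data, one surviving row per group — the same shape as the Group By load.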
My dataset has more than 5M rows.
@DGAL1, do this with Max() or MaxString(), depending on the SALE_ID values.
Data:
Load
IP_ADDRESS,
SALE_ID,
AGENT_NAME,
DATE,
QUANTITY,
CATEGORY,
AMOUNT,
POOL_NR,
AGENT_ID
FROM ...;

Right Join (Data)
Load Max(SALE_ID) as SALE_ID
Resident Data
Group By IP_ADDRESS, AGENT_NAME, DATE, QUANTITY, CATEGORY, AMOUNT, POOL_NR, AGENT_ID;
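The join variant, sketched in plain Python for clarity (illustrative, not Qlik; made-up data): first compute the max SALE_ID per group, then keep only the rows whose SALE_ID survives — the moral equivalent of the Right Join above.

```python
# Join-on-Max() sketch: build the "right" table of per-group max
# SALE_IDs, then filter the original rows against it.
rows = [
    ("1.1.1.1", 1, "Alice"), ("2.2.2.2", 2, "Bob"),
    ("1.1.1.1", 3, "Alice"), ("1.1.1.1", 4, "Alice"),
    ("2.2.2.2", 5, "Bob"), ("2.2.2.2", 6, "Bob"),
]  # (IP_ADDRESS, SALE_ID, AGENT_NAME)

max_per_group = {}
for ip, sale_id, agent in rows:
    key = (ip, agent)  # group on everything except SALE_ID
    max_per_group[key] = max(max_per_group.get(key, sale_id), sale_id)

keep = set(max_per_group.values())  # the joined-in SALE_ID table
deduped = [r for r in rows if r[1] in keep]
print(deduped)  # [('1.1.1.1', 4, 'Alice'), ('2.2.2.2', 6, 'Bob')]
```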
I timed grouping 10M rows into groups of a specific size. 5M rows shouldn't be a problem:
Times: NoConcatenate Load Null() as BatchSize, Null() as Seconds AutoGenerate 0;
For Each vBatchSize in 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000
Data:
NoConcatenate Load
Floor((RecNo()-1) / $(vBatchSize)) as IP_ADDRESS,
(RecNo()-1) as SALE_ID,
Floor((RecNo()-1) / $(vBatchSize)) as AGENT_NAME,
Floor((RecNo()-1) / $(vBatchSize)) as DATE,
Floor((RecNo()-1) / $(vBatchSize)) as QUANTITY,
Floor((RecNo()-1) / $(vBatchSize)) as CATEGORY,
Floor((RecNo()-1) / $(vBatchSize)) as AMOUNT,
Floor((RecNo()-1) / $(vBatchSize)) as POOL_NR,
Floor((RecNo()-1) / $(vBatchSize)) as AGENT_ID
AutoGenerate 10000000;
Let vStart = Now();
Transactions:
NoConcatenate Load
IP_ADDRESS,
LastValue(SALE_ID) as SALE_ID,
AGENT_NAME,
DATE,
QUANTITY,
CATEGORY,
AMOUNT,
POOL_NR,
AGENT_ID
Resident Data
Group By
IP_ADDRESS,
AGENT_NAME,
DATE,
QUANTITY,
CATEGORY,
AMOUNT,
POOL_NR,
AGENT_ID
;
Let vSeconds = (Now() - vStart) * 86400; // timestamp arithmetic is in days, so convert to seconds
Concatenate(Times) Load $(vBatchSize) as BatchSize, $(vSeconds) as Seconds AutoGenerate 1;
Drop Tables Data, Transactions;
Next vBatchSize
Let vBatchSize;
Let vStart;
Let vSeconds;
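The same benchmarking pattern can be sketched in plain Python (illustrative, scaled-down sizes; the pattern, not the numbers, is the point): generate synthetic rows in groups of a given batch size, then time the keep-max dedup over them.

```python
import time

def dedup_seconds(n_rows, batch_size):
    """Generate n_rows of (group, SALE_ID) pairs in groups of
    batch_size, dedup by keeping the max SALE_ID per group,
    and return (number of groups, elapsed seconds)."""
    rows = [(i // batch_size, i) for i in range(n_rows)]
    start = time.perf_counter()
    latest = {}
    for group, sale_id in rows:
        if latest.get(group, -1) < sale_id:
            latest[group] = sale_id
    return len(latest), time.perf_counter() - start

for batch in (1, 10, 100):
    groups, secs = dedup_seconds(10_000, batch)
    print(batch, groups, round(secs, 4))
```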
You can try this:
Load Distinct
SALE_ID,
IP_ADDRESS,
AGENT_NAME,
DATE,
QUANTITY,
CATEGORY,
AMOUNT,
POOL_NR,
AGENT_ID
FROM ...;
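A caveat with the Distinct approach: since the edited copy gets a new SALE_ID, every row is already distinct once SALE_ID is part of the key, so nothing is dropped. A plain-Python illustration (made-up data):

```python
# With SALE_ID in the key, the original and its edited copy differ,
# so a Distinct on all fields keeps both rows; only dropping SALE_ID
# from the key collapses them.
rows = [
    ("1.1.1.1", 1, "Alice"),
    ("1.1.1.1", 3, "Alice"),  # edited copy: only SALE_ID changed
]  # (IP_ADDRESS, SALE_ID, AGENT_NAME)

distinct_with_id = {r for r in rows}
distinct_without_id = {(r[0], r[2]) for r in rows}
print(len(distinct_with_id), len(distinct_without_id))  # 2 1
```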