Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi all,
I have 2 very differente data sources that only have 1 field in common: the Invoice Number. Each invoice has several items, identified with a number called "position". However, in data source A, the first item is identified with the number 1, but in data source B it is identified with number 20 and subsequent items are may have different numbers. As a result, I cannot use a Invoice Number&Position to join both tables. I can't use the name of the item because they are called differently in each source.
The good thing is that both tables organize the Position of each invoice in the same order, like this:
LOAD * INLINE
[
Invoice,Position,Item
100,1,Apples
100,2,Bricks
100,3,Notebooks
100,4,laptops
100,5,Bag
100,6,Mouse,
100,6,Mouse,
,,,,
](delimiter is ',');
LOAD * INLINE
[
Invoice2,Position2,Item2
100,20,Red Apples
100,40,yellow bricks
100,70,110009 notebooks
100,90,Laptop dell 123
100,100,Leather bag 0939
100,150,Ergonomic mouse Sharkk22
100,150,Ergonomic mouse Sharkk22
,,,,
](delimiter is ',');
I would like to create an additional column whereas each iteration of the invoice gets counted starting for number 1.
How can I do that?
Thank you in advance for your help!
I think you are missing a 1 in the else branch of your if() statement.
And use Rangesum() instead of Rangecount()
If(previous(invoice)=invoice, Rangesum(peek('ID'),1),1) as ID
I found also the issue with the Autonumber(), since Invoice is the same in both tables, we need to add a table identifier to create unique buckets per table and invoice values:
LOAD *,
Autonumber(Recno(), 'T1'&Invoice) as InvoiceDetailID,
//if(previous(Invoice)=Invoice, Rangesum(peek('ID'),1),1) as ID
INLINE
[
Invoice,Position,Item
200,20,Chairs
200,30,Chairs
100,1,Apples
100,2,Bricks
100,3,Notebooks
100,4,laptops
100,5,Bag
100,6,Mouse,
100,6,Mouse,
](delimiter is ',');
LOAD *,
Autonumber(Recno(), 'T2'&Invoice2) as InvoiceDetailID,
//If(previous(Invoice2)=Invoice2, Rangesum(peek('ID'),1),1) as ID
INLINE
[
Invoice2,Position2,Item2
100,20,Red Apples
100,40,yellow bricks
100,70,110009 notebooks
100,90,Laptop dell 123
100,100,Leather bag 0939
100,150,Ergonomic mouse Sharkk22
100,150,Ergonomic mouse Sharkk22
200,20,Chairs
200,30,Chairs
](delimiter is ',');
But with a lot of invoices, the Peek() method is perfoming better.
Try something like
LOAD
Autonumber(Recno(), Invoice) as InvoiceDetailID,
...
FROM Table;
Hi Stefan,
Thanks for your help. It does work! However, I have over 1 million rows and its taking over 1hr to process it. Is there any way to improve its speed?
Thanks,
Luis
It actually didn't work . If Invoice 200 happens to be before Invoice 100, it returns wrong values.
It shouldn't depend on the order of Invoices, only the order of Positions should matter.
Can you show a sample (INLINE table script or at least a screen shot) that demonstrates the issue?
The Autonumber could be slow if you have a lot of distinct Invoice numbers (buckets to create an autonumber set). The faster method would be to use good old sorted input table with Peek() function to create the position counter.
Hi Stefan,
When I run your script on a table that is not sorted, it returns wrong values.
LOAD * INLINE
[
Invoice,Position,Item
200,20,Chairs
200,30,Chairs
100,1,Apples
100,2,Bricks
100,3,Notebooks
100,4,laptops
100,5,Bag
100,6,Mouse,
100,6,Mouse,
,,,,
](delimiter is ',');
LOAD * INLINE
[
Invoice2,Position2,Item2
100,20,Red Apples
100,40,yellow bricks
100,70,110009 notebooks
100,90,Laptop dell 123
100,100,Leather bag 0939
100,150,Ergonomic mouse Sharkk22
100,150,Ergonomic mouse Sharkk22
200,20,Chairs
200,30,Chairs
,,,,
](delimiter is ',');
I tried using the following script:
If(previous(invoice)=invoice, rangecount(peek('ID'),1),) as ID
Seems like it works and its pretty fast. Any suggestion to make it better or more consistent?
I think you are missing a 1 in the else branch of your if() statement.
And use Rangesum() instead of Rangecount()
If(previous(invoice)=invoice, Rangesum(peek('ID'),1),1) as ID
I found also the issue with the Autonumber(), since Invoice is the same in both tables, we need to add a table identifier to create unique buckets per table and invoice values:
LOAD *,
Autonumber(Recno(), 'T1'&Invoice) as InvoiceDetailID,
//if(previous(Invoice)=Invoice, Rangesum(peek('ID'),1),1) as ID
INLINE
[
Invoice,Position,Item
200,20,Chairs
200,30,Chairs
100,1,Apples
100,2,Bricks
100,3,Notebooks
100,4,laptops
100,5,Bag
100,6,Mouse,
100,6,Mouse,
](delimiter is ',');
LOAD *,
Autonumber(Recno(), 'T2'&Invoice2) as InvoiceDetailID,
//If(previous(Invoice2)=Invoice2, Rangesum(peek('ID'),1),1) as ID
INLINE
[
Invoice2,Position2,Item2
100,20,Red Apples
100,40,yellow bricks
100,70,110009 notebooks
100,90,Laptop dell 123
100,100,Leather bag 0939
100,150,Ergonomic mouse Sharkk22
100,150,Ergonomic mouse Sharkk22
200,20,Chairs
200,30,Chairs
](delimiter is ',');
But with a lot of invoices, the Peek() method is perfoming better.