Skip to main content
Announcements
Happy New Year! Cheers to another year of collaboration, connections and success.
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Peek and Concatenate missunderstanding

Hello,  I have a little problem that I think that comes from the misunderstanding of the Concatenate or Peek functions. As far as I know those functions do:

    Concactenate (Old_Table): Used when loading a table this adds the rows of the new tabla at de end of (in the last position) the Old_Table to create the New_Table.

    Peek(Field): Written this way, this function returns the value of the last row of the new table.

Given This, suppose that I have a Table named "Old Table"  whit data of some product that includes sn (Serial Number) , date (Timestamp), and Color. What I want is to Concatenate Table_With_Duplicates to it so i use a where statement with peeks to check if the last row value is the same as the new one I want to introduce and discard it.

My code:


Concatenate(Old_Table):
load *
Resident Table_Sorted_With_Duplicates wherenot(peek(date)=date and sn=peek(sn) and COLOR=peek(COLOR));
drop table Table_Sorted_With_Duplicates;

With this code, when on the old table still there but if I create first a table without duplicates and then concatenate it works:

NoConcatenate

Table_Clean:

load *
Resident Table_Sorted_With_Duplicates wherenot(peek(date)=date and sn=peek(sn) and COLOR=peek(COLOR));
drop table Table_Sorted_With_Duplicates;


Concatenate(Old_Table):

load *
Resident Table_Clean;

drop table Table_Clean;


My point is that there is something that I misunderstand abount the Concatenate function on QV and when you use it the new rows are not added at the end of the table where the peek is checking. 

I provide you an example to test it. Also, I have testested with peek( Variable, 1)  in order to know if the values are added on the top an does not have worked.

1 Solution

Accepted Solutions
Anonymous
Not applicable
Author

Hello, I think I Find out what is really happening.

The Way QV does the concatenate is not ROW by ROW unless it must do it this way. If is possible for the QV just adds the whole table with one step and then checks the where condition after it. It is easy to see that if the that this consumes much less memory and is faster than do it row by row.

This process is like if it was pasting the memory location of the first row just after the last row of the previous table in order to start with the new one.


This does not happen if there is a calculated field on the load, then the QV is forced to create it row by row.

Event though I marked this answer as the correct (because is the one that answers the question), If someone reads this searching for a way to eliminate duplicates, please, read the answers of the other people, among them there are pretty good ways to do it.

View solution in original post

11 Replies
prieper
Master II
Master II

Cannot rebuild your problem, your assumptions on concatenate and peek are correct.

You may share some sample data in order to have a closer look

please find a simplified model attached.

Peter

Anonymous
Not applicable
Author

Hello prieper‌  I have attached you a QV with a better example. I have attatched on the original post a QV with an example. Because I can not do it here. On the example there is a variable named vLoadType that allow you to try both cases. Also on the chart in the main page you will see that the duplicates still there.

Thank You for your response

prieper
Master II
Master II

Not sure, but think that the PEEK in the WHERE-clause makes the trouble..

Try PREVIOUS instead.

(Believe that a while ago I have read an article on the difference betw both command)

Edith has found a link to this problem:

Peek() vs Previous() – When to Use Each

Peter

Anonymous
Not applicable
Author

Yeah, with previous() work fine as it should because only looks at the values of the previous table, I read an article once that was talking about that.

What is wondering me is why peek() is failing here.

juliedecoene
Partner - Contributor III
Partner - Contributor III

Hi Eric,

I would suggest to create a temporary key in your first table as follows: AutoNumberHash256(date, sn, COLOR) as Key_temp. Then for the concatenation use the where clause : Where Not Exists('Key_temp', AutoNumberHash256(date, sn, COLOR)).

Julie

Anonymous
Not applicable
Author

I didn't know that way. Does it mean that the AutoNumberHash256() function does not have a value (or it is null) when there is a repetition and that is why you charge just when the field exists?

Also, I think that the one that just applies on the table you are loading from is the previous and that is why it work. :S

juliedecoene
Partner - Contributor III
Partner - Contributor III

The AutoNumberHash will calculate an unique key for each combination of the expressions inside it.

The Exists will compare the new keys with existing ones. Exists checks the field Key_temp in the whole model.


exists(field [ , expression ] )

Determines whether a specific field value exists in a specified field of the data loaded so far. Field is a name or a string expression evaluating to a field name. The field must exist in the data loaded so far by the script. Expr is an expression evaluating to the field value to look for in the specified field. If omitted, the current record’s value in the specified field will be assumed.


I believe this is a more robust solution than only looking at the previous record, since it will check the whole model for duplicates. If you use Peek or Previous you first need to sort your records and apparently it does not work with concatenate, so that's a lot of manipulation that can be avoided with the Exists() function and a key.

juliedecoene
Partner - Contributor III
Partner - Contributor III

I don't know if you noticed in your example, but the 2 last lines with D2, S2 from table 2 are not loaded, which means in the case of using Concatenate and Peek together, you are comparing all the records of table 2 with the last one of table 1. So it seems to be a consequence the order in which QV is performing the different operations. Sorry I cannot explain more than that why it's not working as you expected.

Anonymous
Not applicable
Author

I like a lot your solution in order to remove duplicates!

It's true that is really robust and you don't need to load the table two times (which is really good to clean the script). I wonder if it is faster or not than the one with previous(), but right now for me this is not much important.

Thank You