Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi All
I have in my data base form IDs. These forms can be copied from original and a new ID can be generated. Sometimes these can result in a loop where one was copied from one but another from another
Ie
Form ID. Copy ID
1. 2
2. 1
3. 2
As can be seen here form 2 was copied from 1 and 1 then must have been reupdated and copied from 2. Form 3 was copied from 2.
My master form is 1 as it has the lowest ID. Our original always have lowest ID
How do I identify the original?
Thanks
Hi,
The undirected graph can be quickly constructed from the directed tree by making every arc bi-directional. So if A ==> B is an arc in the graph, you will add arc B ==> A.
So the table in your latest comment becomes the following:
FormID, CopyID
20, 22
21, 20
20, 21
22, 20
in which the last row (22, 20) has been added to the table as the opposite of the first row (20, 22). Both (20,21) and (21,20) are already in the table, so we don't have to add the opposite for these arcs.
You can then feed this table into the FindOriginal() sub. The whole script is as follows:
// Function finding original form
// Returns value in variable vOriginalFormID
Sub FindOriginal(vSearchFormID)
vPos = 0;
vCurrentID = vSearchFormID;
Original:
LOAD * INLINE [
OriginalFormID_Temp
$(vSearchFormID)];
vOriginalLength = NoOfRows('Original');
Do While vPos < vOriginalLength
vCurrentID = Peek('OriginalFormID_Temp',vPos);
Copies_Temp:
LOAD DISTINCT CopyID as CopyID_Temp Resident Copies_Undirected Where FormID = $(vCurrentID);
Concatenate (Original)
LOAD CopyID_Temp as OriginalFormID_Temp Resident Copies_Temp WHERE NOT Exists(OriginalFormID_Temp,CopyID_Temp);
DROP TABLE Copies_Temp;
vPos = vPos + 1;
vOriginalLength = NoOfRows('Original');
Loop;
MinOriginal:
LOAD
MIN(OriginalFormID_Temp) AS MinOriginalFormID
RESIDENT Original;
vOriginalFormID = Peek('MinOriginalFormID');
DROP TABLES Original, MinOriginal;
End Sub;
//Sample data
Copies:
LOAD * Inline[
FormID, CopyID
9, 10
9, 8
8, 10
8, 7
7, 1
7, 8
10, 2
9, 10
4, 5
6, 5
3, 4
4, 3
];
// Convert table into undirected graph
Copies_Undirected_Temp:
NoConcatenate LOAD Distinct * Resident Copies;
Concatenate(Copies_Undirected_Temp)
LOAD CopyID AS FormID,
FormID AS CopyID
Resident Copies_Undirected_Temp;
Copies_Undirected:
NoConcatenate LOAD Distinct * Resident Copies_Undirected_Temp;
DROP TABLE Copies_Undirected_Temp;
Copies_Distinct:
LOAD DISTINCT FormID RESIDENT Copies_Undirected;
Copies_Original:
LOAD * INLINE [FormID, OriginalFormID];
// Loop through each FormID and apply sub FindOriginal
For i=0 to NoOfRows('Copies_Distinct')
vFormID = Peek('FormID',i,'Copies_Distinct');
Call FindOriginal(vFormID);
Concatenate(Copies_Original)
LOAD * INLINE [
FormID, OriginalFormID
$(vFormID), $(vOriginalFormID)
];
Next;
DROP TABLE Copies_Distinct, Copies_Undirected;
Hope this helps,
BR,
Vu Nguyen
Hi,
Interesting problem! Let's consider the following data sample:
To trace the original form for FormID = 9, you can think of this table as the following tree structure:
in which 9 → 10 and 9 → 8 means "form 9 is copied from form 10 and form 8" (corresponding to the first 2 lines in the sample). So if form A is copied from B, then A is the parent node and B is the child node.
This tree is built via a repetitive process. At each step, if a child node is already in the tree, you will not add its child nodes (red nodes in the pictures) to the tree again.
Because the final original always have lowest ID, it's clear that the original should be the node with smallest ID in the tree. For FormID=9, it is form 1 according to the tree.
For each value in FormID field, you will need to generate a tree like this.
Complete script for the above sample data is as follows:
// Function finding original form
// Returns value in variable vOriginalFormID
Sub FindOriginal(vSearchFormID)
vPos = 0;
vCurrentID = vSearchFormID;
Original:
LOAD * INLINE [
OriginalFormID_Temp
$(vSearchFormID)];
vOriginalLength = NoOfRows('Original');
Do While vPos < vOriginalLength
vCurrentID = Peek('OriginalFormID_Temp',vPos);
Copies_Temp:
LOAD DISTINCT CopyID as CopyID_Temp Resident Copies Where FormID = $(vCurrentID);
Concatenate (Original)
LOAD CopyID_Temp as OriginalFormID_Temp Resident Copies_Temp WHERE NOT Exists(OriginalFormID_Temp,CopyID_Temp);
DROP TABLE Copies_Temp;
vPos = vPos + 1;
vOriginalLength = NoOfRows('Original');
Loop;
MinOriginal:
LOAD
MIN(OriginalFormID_Temp) AS MinOriginalFormID
RESIDENT Original;
vOriginalFormID = Peek('MinOriginalFormID');
DROP TABLES Original, MinOriginal;
End Sub;
// Sample data
Copies:
LOAD * Inline[
FormID, CopyID
9, 10
9, 8
8, 10
8, 7
7, 1
7, 8
10, 2
];
Copies_Distinct:
LOAD DISTINCT FormID RESIDENT Copies;
Copies_Original:
LOAD * INLINE [FormID, OriginalFormID];
// Loop through each FormID and apply sub FindOriginal
For i=0 to NoOfRows('Copies_Distinct')
vFormID = Peek('FormID',i,'Copies_Distinct');
Call FindOriginal(vFormID);
Concatenate(Copies_Original)
LOAD * INLINE [
FormID, OriginalFormID
$(vFormID), $(vOriginalFormID)
];
Next;
DROP TABLE Copies_Distinct;
Outcome of the script
Hope it helps!
BR,
Vu Nguyen
Hi Vu,
Thanks for the solution. We have however found an exception.
Have a look at this data.
Copies:
LOAD *
Inline
[
FormID, CopyID
20, 22
21, 22
22, 21
];
Form 20 should be the original for all of these as it is the lowest occurring one. It is linked to the others in a different way.
Hi,
According to the initial description in your first comment, your new table
FormID, CopyID
20, 22
21, 22
22, 21
should be translated as follows:
(for example, in your post, you mentioned that (3,2) means "Form 3 was copied from 2")
With this logic in mind, there is actually no information in the current table saying any form is copied from Form20
So either your problem description has changed, which no longer fits into the suggested script logic;
or data in your table is incomplete. I.E. if Form20 is indeed the original of all, why isn't there at least one row in the table capturing the event that something has been captured from Form20?
Or in other words, why Form20 doesn't appear in field CopyID at all?
BR,
Vu Nguyen
Hi,
For some reason Qlik community continuously marks my comment in this post as spam and removed it. Have no idea why 🤔
Please see my answer below
Hi Vu,
Yes this is an awkward problem.
Looks like the data is sometimes incomplete. The lowest ID is always the original but sometimes this copy id is replaced by a copy id from a different higher number.
In this case. What I think has happened is the following:
Form 20 was beginning and copied to 21
Form 21 was copied to 22
22 was copied back to 21
22 was also then copied back to 20.
So the copied field is replaced from the original when you make an additional copy.
We just need a way to link these numbers together and pick the lowest one.
So 21 links to 22 and 20 links to 22. Therefore lowest number is 20 which is original.
Sorry it's tricky to explain. You script works but just not for this situation.
Hi,
If it's just the matter of "picking out the lowest ID in a group of linked forms", then you can think of the table as an undirected graph instead of a tree. In the graph, FormIDs are nodes. For both events "A is copied from B" and "B is copied from A", you draw one unique undirected arc A--B
Your table then becomes a graph like this
The problem is then reduced to "find the node with smallest ID in the graph", which is 20 in this case.
However, if you apply this logic, then the data sample in my first comment becomes this graph
which gives a result that Form1 is the original of all 6 forms because they're all connected and 1 is the lowest ID. So original of Form10 is not Form2 like in my last comment, but Form1.
If this is the result you're looking for, we can tweak the script to build this undirected graph instead of tree.
Otherwise, for a complete solution, you will need to think of a way to generate the hidden event "Form 20 was beginning and copied to 21" as a row in your latest data table (FormID=21, CopyID=20). I think this cannot be done at Qlik script level and should be implemented in the system that generates the data source.
With an incomplete input data, it is usually very tricky to generate the expected insight 🙂
Best Regards,
Vu Nguyen
HI Vu,
Yes it looks like the forms could be linked in any way and not necessarily a tree fashion.
You second example of the undirected graph is correct!
Can we change the script so that it works that way.
Thanks for all your help! My challenge is now to understand how your script works. My Qlik is okay but not at that level.
Hi,
The undirected graph can be quickly constructed from the directed tree by making every arc bi-directional. So if A ==> B is an arc in the graph, you will add arc B ==> A.
So the table in your latest comment becomes the following:
FormID, CopyID
20, 22
21, 20
20, 21
22, 20
in which the last row (22, 20) has been added to the table as the opposite of the first row (20, 22). Both (20,21) and (21,20) are already in the table, so we don't have to add the opposite for these arcs.
You can then feed this table into the FindOriginal() sub. The whole script is as follows:
// Function finding original form
// Returns value in variable vOriginalFormID
Sub FindOriginal(vSearchFormID)
vPos = 0;
vCurrentID = vSearchFormID;
Original:
LOAD * INLINE [
OriginalFormID_Temp
$(vSearchFormID)];
vOriginalLength = NoOfRows('Original');
Do While vPos < vOriginalLength
vCurrentID = Peek('OriginalFormID_Temp',vPos);
Copies_Temp:
LOAD DISTINCT CopyID as CopyID_Temp Resident Copies_Undirected Where FormID = $(vCurrentID);
Concatenate (Original)
LOAD CopyID_Temp as OriginalFormID_Temp Resident Copies_Temp WHERE NOT Exists(OriginalFormID_Temp,CopyID_Temp);
DROP TABLE Copies_Temp;
vPos = vPos + 1;
vOriginalLength = NoOfRows('Original');
Loop;
MinOriginal:
LOAD
MIN(OriginalFormID_Temp) AS MinOriginalFormID
RESIDENT Original;
vOriginalFormID = Peek('MinOriginalFormID');
DROP TABLES Original, MinOriginal;
End Sub;
//Sample data
Copies:
LOAD * Inline[
FormID, CopyID
9, 10
9, 8
8, 10
8, 7
7, 1
7, 8
10, 2
9, 10
4, 5
6, 5
3, 4
4, 3
];
// Convert table into undirected graph
Copies_Undirected_Temp:
NoConcatenate LOAD Distinct * Resident Copies;
Concatenate(Copies_Undirected_Temp)
LOAD CopyID AS FormID,
FormID AS CopyID
Resident Copies_Undirected_Temp;
Copies_Undirected:
NoConcatenate LOAD Distinct * Resident Copies_Undirected_Temp;
DROP TABLE Copies_Undirected_Temp;
Copies_Distinct:
LOAD DISTINCT FormID RESIDENT Copies_Undirected;
Copies_Original:
LOAD * INLINE [FormID, OriginalFormID];
// Loop through each FormID and apply sub FindOriginal
For i=0 to NoOfRows('Copies_Distinct')
vFormID = Peek('FormID',i,'Copies_Distinct');
Call FindOriginal(vFormID);
Concatenate(Copies_Original)
LOAD * INLINE [
FormID, OriginalFormID
$(vFormID), $(vOriginalFormID)
];
Next;
DROP TABLE Copies_Distinct, Copies_Undirected;
Hope this helps,
BR,
Vu Nguyen
Came to say thanks!
Works like a charm