Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
harsha
Creator
Creator

Compare two itemIDs' values and highlight with flag

Hi guys
Need a help

EX:

 

Company ID:4129034
Year: 2017

DataID: 60634843286 and Value : 287966000
DataID: 60634843285 and Value :145077000

Rule :
60634843286>60634843285

 

I need to compare two Data points in the same company and within year, if one is value is greater than other value, then those company IDs must have some flag

Or I want to highlight those companies in graph (Image Attached )

 

your help is highlighy appreciated as it will crack for 12 similar cases

29 Replies
harsha
Creator
Creator
Author

@Saravanan_Desingh 

 

Thank you very much.. it worked and helped me to get the right results...! 😀

harsha
Creator
Creator
Author

Hi Saran,

Hope you are doing great..!

I really appreciate your help/solution on my problem earlier ' Compare two itemIDs' values and highlight with flag '.  It was working awesome with 9 million data.

I received a new QVD file from other team to find the discrepancies today with 26 million data and I have applied the same solution but this time it is 3 hours still the data is in loading stage. I tried making some changes to see if the loading gets faster but no use.  

Request you to please help ..!

I applied this solution:

tab1:
LOAD RowNo() As RowID,* INLINE [
    CompanyID, PointID, Year, Value
    635939, 5290, 2017, 350
    635939, 5291, 2017, 250
    635939, 5290, 2018, 200
    635939, 5291, 2018, 300
    635939, 5290, 2019, 420
    635939, 5291, 2019, 280
    516278, 5290, 2017, 810
    516278, 5291, 2017, 850
    516278, 5290, 2018, 460
    516278, 5291, 2018, 340
    516278, 5290, 2019, 290
    516278, 5291, 2019, 310
    635939, 6320, 2017, 460
    635939, 6321, 2017, 320
    635939, 6320, 2018, 650
    635939, 6321, 2018, 750
    635939, 6320, 2019, 820
    635939, 6321, 2019, 600
    516278, 6320, 2017, 270
    516278, 6321, 2017, 350
    516278, 6320, 2018, 510
    516278, 6321, 2018, 750
    516278, 6320, 2019, 470
    516278, 6321, 2019, 360
];

tab2:
LOAD CompanyID, Year, Value As Value1
Resident tab1
Where PointID=5290;

Left Join(tab2)
LOAD CompanyID, Year, Value As Value2
Resident tab1
Where PointID=5291;

Left Join(tab1)
LOAD CompanyID, Year, If(Value1<Value2, 'Yes','No') As Glsag
Resident tab2;

Drop Table tab2;

 

stevedark
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi @harsha 

When loading from very large QVDs it is crucial that you ensure that they load with Optimised QVD Load. This will not be entirely possible in this case, but you can load the first QVD optimised and the second not.

Load the data as follows:

Compare:
LOAD
  CompanyID,
  PointId,
  Year,
  Value
FROM [lib://MyQVDs/QVD1.qvd] (qvd);

CONCATENATE (Compare)
LOAD
  CompanyID,
  PointId,
  Year,
  Value as Compare
FROM [lib://MyQVDs/QVD2.qvd] (qvd);

 

Rather than joining, so every value is on one row (which will take forever to complete) you will have two rows per Company, Year and PointID, in the same table.

In the front end you can then create a table which compares values when Company, Year and PointID are the same.

Create a tables with the three dimensions and the following four expressions:

sum(Value))
sum(Compare))
abs(sum(Value)-sum(Compare))
abs(sum(Value)-sum(Compare)) / rangemax(sum(Value),sum(Compare))

To only show differences, and not places where the numbers are the same, prefix the first two expressions with if(sum(Value)<>sum(Compare), 

This will then show you, at every data point, the values in each column, the difference and the variance as a percentage, so you can zero in on the biggest differences.

You could have the same table with fewer dimensions, to see if each year was the same or each company etc.

What I will often do when doing compares like this is have a pick list of fields from $Field and have this as the dimension on the table - so you can view the difference by any selected dimension. You can see how to do this in my Sense Data Profiler.

Anywhere you have a RESIDENT in your load script will cause the load to take longer (as it has to do things twice), if you can avoid these you will be in a better place performance wise.

Hope that helps.

Steve

harsha
Creator
Creator
Author

Hi Steve, 

thanks for the detailed explanation and i always appreciate your response that includes with a performance optimization 🙂.

You mentioned to concatenate two QVDs. However,  I have to load and compare values from one QVD.

harsha
Creator
Creator
Author

also, i have almost 6 comparisons with flags to identify discrepancies. 

ex:

********Flag-1************
tab2:
LOAD CompanyID, Year, Value As Value1
Resident tab1
Where PointID=5290;

Left Join(tab2)
LOAD CompanyID, Year, Value As Value2
Resident tab1
Where PointID=5291;

Left Join(tab1)
LOAD CompanyID, Year, If(Value1<Value2, 'Yes','No') As Flag_1
Resident tab2;

Drop Table tab2;

********Flag-2************
tab2:
LOAD CompanyID, Year, Value As Value1
Resident tab1
Where PointID=6320;

Left Join(tab2)
LOAD CompanyID, Year, Value As Value2
Resident tab1
Where PointID=6321;

Left Join(tab1)
LOAD CompanyID, Year, If(Value1<Value2, 'Yes','No') As Flag_2
Resident tab2;

ETC

harsha
Creator
Creator
Author

However, I want to compare the value of pointID (60634843286) with other value of pointID (60634843285)

@stevedark  

 

This is what exactly i do 

Ex:

CompanyID: 4129034, year: 2017, value of pointID: 60634843286 is 287966000 greater than the 
CompanyID: 4129034, year: 2017, value of pointID: 60634843285 is 145077000

stevedark
Partner Ambassador/MVP
Partner Ambassador/MVP

How do you know that it is 60634843285 that you want to compare with 60634843286 and not some other number? Is it that you want to compare consecutive IDs, or could the ID pairs be in any sequence?

In that case you can just do an optimised load of your QVD with all the data.

What you could have is a Filter Pane with the IDs in and then the user can select two for comparison. You would need a table with Company and Year as dimensions and your expressions would be something like:

sum({<PointId={$(=min(PointId))}>}Value)


sum({<PointId={$(=min(PointId,2))}>}Value)


abs(sum({<PointId={$(=min(PointId))}>}Value)-sum({<PointId={$(=min(PointId,2))}>}Value))


abs(sum({<PointId={$(=min(PointId))}>}Value)-sum({<PointId={$(=min(PointId,2))}>}Value)) / rangemax(sum({<PointId={$(=min(PointId))}>}Value),sum({<PointId={$(=min(PointId,2))}>}Value))

You could then pick any two point ids from the list and get a table of the differences.

If you only want to load the 12 point ids from the QVD in the first place then loading with a WHERE EXISTS preserves the optimised load and will be much quicker than using where statements:

tmpIDs:
LOAD
  PointId
INLINE [
PointId
60634843285
60634843286
60634843287
60634843288
];

Compare:
LOAD
  CompanyID,
  PointId,
  Year,
  Value
FROM [lib://MyQVDs/QVD1.qvd] (qvd)
WHERE EXISTS (PointId);

DROP TABLE tmpIDs;

 

Using a WHERE EXISTS in that way will be many many times quicker than using WHERE PointId =.

Hope that helps.

harsha
Creator
Creator
Author

Hi @stevedark ,

1. Yes, my requirement is to compare 60634843285 with 60634843286 and there are other comparisons that there is no sequence and PointIDs are completely diff.  

Q: how to optimise load of your QVD with all the data?

2. My requirement is to create diff comparison Flags by comparing diff PointIDs.

3. I kept FROM [lib://MyQVDs/QVD1.qvd] (qvd)
WHERE EXISTS (PointId);

however, nothing was loaded. all charts and tables are empty. Do I need to specify the pointIDs in WHERE EXISTS?

4. I require a script to create flags by comparing diff PointIDs rather than filtering on front end like below. However, it is taking too much of time (hours) also, it stucks at one stage

tab2:
LOAD CompanyID, Year, Value As Value1
Resident tab1
Where PointID=5290;

Left Join(tab2)
LOAD CompanyID, Year, Value As Value2
Resident tab1
Where PointID=5291;

Left Join(tab1)
LOAD CompanyID, Year, If(Value1<Value2, 'Yes','No') As Flag_1
Resident tab2;

Drop Table tab2;

 Thanks a lot for your quick responses and help... 🙂

stevedark
Partner Ambassador/MVP
Partner Ambassador/MVP

The WHERE EXISTS relies on you having the list of values in the field loaded into a table first. That is what the INLINE table is doing in my example above. It's important that the field name is identical and the values are the same. I see you have both PointId and PointID in messages above, ensure this is always the same. Also, the values have to be identical, you may need to force the IDs to be numeric in the inline if they are numeric in the table.

To get the flags in the script you will need to load with a GROUP BY, something like:

LOAD
  *,
  Comp1A - Comp1B as Comp1Diff,
  if(Comp1A = Comp1B, 'Yes', 'No') as Comp1Match,
  Comp2A - Comp2B as Comp2Diff,
  if(Comp2A = Comp2B, 'Yes', 'No') as Comp2Match
  ;
LOAD
  Company,
  Year,
  sum(if(PointID = 60634843286, Value, 0)) as Comp1A,
  sum(if(PointID = 60634843287, Value, 0)) as Comp1B,
  sum(if(PointID = 60634843288, Value, 0)) as Comp2A,
  sum(if(PointID = 60634843289, Value, 0)) as Comp3B
FROM [lib://MyQVD/QVD.qvd] (qvd)
WHERE EXISTS (PointID)
GROUP BY
  Company,
  Year
;

 

I'm not sure if the preceding load will work correctly on a group by. You may need to split it into two loads with a RESIDENT load, but if it works this way that will be better.

The GROUP BY will cause it not to be optimised, so it will be slower, but it should work.

If the twelve PointIDs are only a small subset of the whole data you could do an optimised load of those (using the load statement from my previous post) and STORE that to a new QVD and just work from that going forward.

Steve

harsha
Creator
Creator
Author

@stevedark , I tested this script and optimized as per your suggestions. Brilliant thing is it loaded only the required PointIDs.

but one question:

i want to see if Value of DataPointID 60684989045 < Value of DataPointID 60684986288 as Flag_1

and if Value of DataPointID 4300689< Value of DataPointID 60684986288 as Flag_2

Is this work? and what is the highlighted bold does?

"LOAD
*,
Comp1A - Comp1B as Comp1Diff, if(Comp1A <= Comp1B, 'Yes', 'No') as Flag_1,
Comp2A - Comp2B as Comp2Diff, if(Comp2A < Comp2B, 'Yes', 'No') as Flag_2
;
LOAD
CompanyID,
[Time Series Year],
sum(if(DataPointID = 60684989045, VALUE, 0)) as Comp1A,
sum(if(DataPointID = 60684986288, VALUE, 0)) as Comp1B,
sum(if(DataPointID = 4300689, VALUE, 0)) as Comp2A,
sum(if(DataPointID = 60684986288, VALUE, 0)) as Comp2B

Resident RAWDATA

WHERE EXISTS (PointID)
GROUP BY
Company,
[Year]