Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi guys
Need a help
EX:
Company ID:4129034
Year: 2017
DataID: 60634843286 and Value : 287966000
DataID: 60634843285 and Value :145077000
Rule :
60634843286>60634843285
I need to compare two Data points in the same company and within year, if one is value is greater than other value, then those company IDs must have some flag
Or I want to highlight those companies in graph (Image Attached )
your help is highlighy appreciated as it will crack for 12 similar cases
Hi Saran,
Hope you are doing great..!
I really appreciate your help/solution on my problem earlier ' Compare two itemIDs' values and highlight with flag '. It was working awesome with 9 million data.
I received a new QVD file from other team to find the discrepancies today with 26 million data and I have applied the same solution but this time it is 3 hours still the data is in loading stage. I tried making some changes to see if the loading gets faster but no use.
Request you to please help ..!
I applied this solution:
tab1: LOAD RowNo() As RowID,* INLINE [ CompanyID, PointID, Year, Value 635939, 5290, 2017, 350 635939, 5291, 2017, 250 635939, 5290, 2018, 200 635939, 5291, 2018, 300 635939, 5290, 2019, 420 635939, 5291, 2019, 280 516278, 5290, 2017, 810 516278, 5291, 2017, 850 516278, 5290, 2018, 460 516278, 5291, 2018, 340 516278, 5290, 2019, 290 516278, 5291, 2019, 310 635939, 6320, 2017, 460 635939, 6321, 2017, 320 635939, 6320, 2018, 650 635939, 6321, 2018, 750 635939, 6320, 2019, 820 635939, 6321, 2019, 600 516278, 6320, 2017, 270 516278, 6321, 2017, 350 516278, 6320, 2018, 510 516278, 6321, 2018, 750 516278, 6320, 2019, 470 516278, 6321, 2019, 360 ]; tab2: LOAD CompanyID, Year, Value As Value1 Resident tab1 Where PointID=5290; Left Join(tab2) LOAD CompanyID, Year, Value As Value2 Resident tab1 Where PointID=5291; Left Join(tab1) LOAD CompanyID, Year, If(Value1<Value2, 'Yes','No') As Glsag Resident tab2; Drop Table tab2;
Hi @harsha
When loading from very large QVDs it is crucial that you ensure that they load with Optimised QVD Load. This will not be entirely possible in this case, but you can load the first QVD optimised and the second not.
Load the data as follows:
Compare:
LOAD
CompanyID,
PointId,
Year,
Value
FROM [lib://MyQVDs/QVD1.qvd] (qvd);
CONCATENATE (Compare)
LOAD
CompanyID,
PointId,
Year,
Value as Compare
FROM [lib://MyQVDs/QVD2.qvd] (qvd);
Rather than joining, so every value is on one row (which will take forever to complete) you will have two rows per Company, Year and PointID, in the same table.
In the front end you can then create a table which compares values when Company, Year and PointID are the same.
Create a tables with the three dimensions and the following four expressions:
sum(Value))
sum(Compare))
abs(sum(Value)-sum(Compare))
abs(sum(Value)-sum(Compare)) / rangemax(sum(Value),sum(Compare))
To only show differences, and not places where the numbers are the same, prefix the first two expressions with if(sum(Value)<>sum(Compare),
This will then show you, at every data point, the values in each column, the difference and the variance as a percentage, so you can zero in on the biggest differences.
You could have the same table with fewer dimensions, to see if each year was the same or each company etc.
What I will often do when doing compares like this is have a pick list of fields from $Field and have this as the dimension on the table - so you can view the difference by any selected dimension. You can see how to do this in my Sense Data Profiler.
Anywhere you have a RESIDENT in your load script will cause the load to take longer (as it has to do things twice), if you can avoid these you will be in a better place performance wise.
Hope that helps.
Steve
Hi Steve,
thanks for the detailed explanation and i always appreciate your response that includes with a performance optimization 🙂.
You mentioned to concatenate two QVDs. However, I have to load and compare values from one QVD.
also, i have almost 6 comparisons with flags to identify discrepancies.
ex:
********Flag-1************
tab2:
LOAD CompanyID, Year, Value As Value1
Resident tab1
Where PointID=5290;
Left Join(tab2)
LOAD CompanyID, Year, Value As Value2
Resident tab1
Where PointID=5291;
Left Join(tab1)
LOAD CompanyID, Year, If(Value1<Value2, 'Yes','No') As Flag_1
Resident tab2;
Drop Table tab2;
********Flag-2************
tab2:
LOAD CompanyID, Year, Value As Value1
Resident tab1
Where PointID=6320;
Left Join(tab2)
LOAD CompanyID, Year, Value As Value2
Resident tab1
Where PointID=6321;
Left Join(tab1)
LOAD CompanyID, Year, If(Value1<Value2, 'Yes','No') As Flag_2
Resident tab2;
ETC
However, I want to compare the value of pointID (60634843286) with other value of pointID (60634843285)
This is what exactly i do
Ex:
CompanyID: 4129034, year: 2017, value of pointID: 60634843286 is 287966000 greater than the
CompanyID: 4129034, year: 2017, value of pointID: 60634843285 is 145077000
How do you know that it is 60634843285 that you want to compare with 60634843286 and not some other number? Is it that you want to compare consecutive IDs, or could the ID pairs be in any sequence?
In that case you can just do an optimised load of your QVD with all the data.
What you could have is a Filter Pane with the IDs in and then the user can select two for comparison. You would need a table with Company and Year as dimensions and your expressions would be something like:
sum({<PointId={$(=min(PointId))}>}Value)
sum({<PointId={$(=min(PointId,2))}>}Value)
abs(sum({<PointId={$(=min(PointId))}>}Value)-sum({<PointId={$(=min(PointId,2))}>}Value))
abs(sum({<PointId={$(=min(PointId))}>}Value)-sum({<PointId={$(=min(PointId,2))}>}Value)) / rangemax(sum({<PointId={$(=min(PointId))}>}Value),sum({<PointId={$(=min(PointId,2))}>}Value))
You could then pick any two point ids from the list and get a table of the differences.
If you only want to load the 12 point ids from the QVD in the first place then loading with a WHERE EXISTS preserves the optimised load and will be much quicker than using where statements:
tmpIDs:
LOAD
PointId
INLINE [
PointId
60634843285
60634843286
60634843287
60634843288
];
Compare:
LOAD
CompanyID,
PointId,
Year,
Value
FROM [lib://MyQVDs/QVD1.qvd] (qvd)
WHERE EXISTS (PointId);
DROP TABLE tmpIDs;
Using a WHERE EXISTS in that way will be many many times quicker than using WHERE PointId =.
Hope that helps.
Hi @stevedark ,
1. Yes, my requirement is to compare 60634843285 with 60634843286 and there are other comparisons that there is no sequence and PointIDs are completely diff.
Q: how to optimise load of your QVD with all the data?
2. My requirement is to create diff comparison Flags by comparing diff PointIDs.
3. I kept FROM [lib://MyQVDs/QVD1.qvd] (qvd)
WHERE EXISTS (PointId);
however, nothing was loaded. all charts and tables are empty. Do I need to specify the pointIDs in WHERE EXISTS?
4. I require a script to create flags by comparing diff PointIDs rather than filtering on front end like below. However, it is taking too much of time (hours) also, it stucks at one stage
tab2: LOAD CompanyID, Year, Value As Value1 Resident tab1 Where PointID=5290; Left Join(tab2) LOAD CompanyID, Year, Value As Value2 Resident tab1 Where PointID=5291; Left Join(tab1) LOAD CompanyID, Year, If(Value1<Value2, 'Yes','No') As Flag_1 Resident tab2; Drop Table tab2;
Thanks a lot for your quick responses and help... 🙂
The WHERE EXISTS relies on you having the list of values in the field loaded into a table first. That is what the INLINE table is doing in my example above. It's important that the field name is identical and the values are the same. I see you have both PointId and PointID in messages above, ensure this is always the same. Also, the values have to be identical, you may need to force the IDs to be numeric in the inline if they are numeric in the table.
To get the flags in the script you will need to load with a GROUP BY, something like:
LOAD
*,
Comp1A - Comp1B as Comp1Diff,
if(Comp1A = Comp1B, 'Yes', 'No') as Comp1Match,
Comp2A - Comp2B as Comp2Diff,
if(Comp2A = Comp2B, 'Yes', 'No') as Comp2Match
;
LOAD
Company,
Year,
sum(if(PointID = 60634843286, Value, 0)) as Comp1A,
sum(if(PointID = 60634843287, Value, 0)) as Comp1B,
sum(if(PointID = 60634843288, Value, 0)) as Comp2A,
sum(if(PointID = 60634843289, Value, 0)) as Comp3B
FROM [lib://MyQVD/QVD.qvd] (qvd)
WHERE EXISTS (PointID)
GROUP BY
Company,
Year
;
I'm not sure if the preceding load will work correctly on a group by. You may need to split it into two loads with a RESIDENT load, but if it works this way that will be better.
The GROUP BY will cause it not to be optimised, so it will be slower, but it should work.
If the twelve PointIDs are only a small subset of the whole data you could do an optimised load of those (using the load statement from my previous post) and STORE that to a new QVD and just work from that going forward.
Steve
@stevedark , I tested this script and optimized as per your suggestions. Brilliant thing is it loaded only the required PointIDs.
but one question:
i want to see if Value of DataPointID 60684989045 < Value of DataPointID 60684986288 as Flag_1
and if Value of DataPointID 4300689< Value of DataPointID 60684986288 as Flag_2
Is this work? and what is the highlighted bold does?
"LOAD
*,
Comp1A - Comp1B as Comp1Diff, if(Comp1A <= Comp1B, 'Yes', 'No') as Flag_1,
Comp2A - Comp2B as Comp2Diff, if(Comp2A < Comp2B, 'Yes', 'No') as Flag_2
;
LOAD
CompanyID,
[Time Series Year],
sum(if(DataPointID = 60684989045, VALUE, 0)) as Comp1A,
sum(if(DataPointID = 60684986288, VALUE, 0)) as Comp1B,
sum(if(DataPointID = 4300689, VALUE, 0)) as Comp2A,
sum(if(DataPointID = 60684986288, VALUE, 0)) as Comp2B
Resident RAWDATA
WHERE EXISTS (PointID)
GROUP BY
Company,
[Year]