Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello,
I want to remove rows from a LOADed table. One way to do it is to use WHERE when loading the table but this will stop the optimized load causing serious performance problem. Because the number of rows I want to exclude is small (one thousand of 25millions), I had the "brilliant" idea to load the whole table and delete rows after, but I cannot find any statement to delete loaded data.
Is there anything like sql delete statement to delete rows with criteria?
Thanks
Just do a second load on the RESIDENT table using a WHERE clause to filter out the unwanted data then DROP the initial table. To prevent Qlikview joinging the second table to the initial you will want to use NOCONCATENATE LOAD.
Regards,
Gordon
Hi.. try using "Reduce Data" . Select the rows from a list box you wish to keep (use Select Excluded) and simply select "Reduce Data" from the file menu option. You may save the file with a new name before choosing this function.
Thanks & Regards,
Vineet Agarwal
..... but you would need to do this after every load...
In by company we use QlikView Publisher. In the distributions task, it is possible to do a Reduction. That seems to do the job for us. But it may be depending on how the scripting is done.
Do I need to say, that the reduction is done automatic, on every load, and dynamic when data changes....
Regards,
Jerrik
Hello Vlisidis,
Here you are my suggestion: I would load first the whole table and store it in a QVD file and drop the table, then a second load from the QVD file using WHERE clause now. Depending on your hardware configuration and if storage is not a problem, will perform better than loading from RESIDENT. In my experience, this option is preferable than loading from resident tables when moving hundreds of thousands of records.
You may be able to do your data reductions with where exists and/or inner joins. You can use ONE where exists statement without breaking the optimized load. And you can do inner joins to the table after the fact. This is often faster than an unoptimized load or doing a load resident.
So let's say I wanted all fruit records with color RED and type APPLE. Let's say that 30% of my fruit are red, but only 5% are apples. So I want to apply the apple filter first, in the where exists. Then I'll apply the color red filter after the fact with an inner join.
[Red Apples]:
LOAD 'APPLE' as Type
AUTOGENERATE 1
;
INNER JOIN ([Red Apples])
LOAD
Type
,Color
,other fields
FROM Fruit.qvd (QVD)
WHERE exists(Type)
;
INNER JOIN ([Red Apples])
LOAD 'RED' as Color
AUTOGENERATE 1
;
just to understand: is the where exists() needed if you're already doing an inner join?
dragonauta wrote:
just to understand: is the where exists() needed if you're already doing an inner join?<div></div>
It is not needed, in the sense that the results will be exactly the same either way. However, if you don't use the exists(), it still loads all of the rows, and then drops the ones that don't match in the inner join. With the exists(), it only loads matching rows.
Testing on a sample data set, it appears that both methods take the same amount of time. However, using exists() saves a significant amount of RAM during the load. That may or may not matter in your environment, but it seems like it would be a good habit to use an exists() in a case like this.