Hi,
I came across a scenario where the passthrough filter is surprisingly slower than the table filter in full load. What are the scenarios which can cause this to happen?
My understanding is that the passthrough filter offers greater performance because it alters the actual SELECT statement sent to the source endpoint.
Regards,
Mohammed
Hello Mohammed, @MoeyE
Good morning and thanks for reaching out!
You are right; in general the passthrough filter offers greater performance. To understand the issue, we'd like you to:
1. Replace the target endpoint with a NULL device target to make sure the bottleneck is on the source side.
2. Set source_unload logging to Verbose with the passthrough filter in place and collect a task log file.
3. Check the table's PK or Unique Index definition. If the filter cannot utilize the PK/UI information and the source falls back to a full scan, performance becomes a challenge.
4. Also, compare the two queries by running them manually on the source; a rough sketch of what we mean follows below.
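For step 4, something along these lines; the table, columns, and filter here are purely hypothetical placeholders, so substitute the actual SELECT statements captured in the Verbose source_unload log:

-- Statement as captured from the table-filter run (copy it from the task log):
SELECT ORDER_ID, REGION, STATUS
FROM ORDERS
WHERE REGION = 'EMEA';

-- Statement as captured from the passthrough-filter run (copy it from the task log):
SELECT ORDER_ID, REGION, STATUS
FROM ORDERS
WHERE REGION = 'EMEA' AND STATUS <> 'CANCELLED';

-- Time both statements on the source and keep the execution plans for comparison.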
We'd like to investigate the task log file to understand the RCA.
Regards,
John.
While passthrough filters are generally more performant, they can be slower in certain scenarios. This can happen due to factors like complex filter expressions, inefficient data models, or limitations in the data source. In such cases, table filters, which are applied on the Replicate side, might perform better. It's recommended to test both filter types in your specific scenario to determine the optimal approach.
@MoeyE Do your homework! Execute the resulting query on the source side and check out the execution plan.
(And tell us what the source DB is, its version, and, just to be sure, the Replicate version in play; and while you are at it... what was the general layout of the pass-through filter?)
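Depending on which source it turns out to be, pulling the plan might look roughly like this; the ORDERS query is only a stand-in for the actual statement Replicate issues:

-- Oracle:
EXPLAIN PLAN FOR SELECT * FROM ORDERS WHERE REGION = 'EMEA';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- SQL Server (then run the statement in the same session to get the plan text):
SET SHOWPLAN_TEXT ON;

-- PostgreSQL / MySQL:
EXPLAIN SELECT * FROM ORDERS WHERE REGION = 'EMEA';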
Using a pass-through filter is expected to be faster because (a) there should be fewer rows transferred over the network and examined by Replicate, and (b) supposedly the source understands the data and knows the best way to fetch the desired rows.
Apparently this backfired. Without a pass-through filter the source will typically just do a full table scan, not utilizing any indexes. With a pass-through filter it probably has an index it can use to limit the number of rows to be considered. However, an indexed read typically makes the lowest levels of the DB jump around from left to right and back to gather the rows, and that plays havoc on everything: it may read a block to fetch a row and, minutes later, have to come back to that block for another row, but by then the block is no longer in the DB cache and the storage system has to be hit again. It then becomes a numbers game where the table statistics determine what the right plan is.
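One way to see whether that is what is happening is to force both access paths and time them. This sketch assumes an Oracle source purely for illustration; the table, index name, and AMOUNT column are made up, and AMOUNT is deliberately a column not in the index so the table blocks actually get visited:

SELECT /*+ FULL(o) */ SUM(AMOUNT)
FROM ORDERS o
WHERE REGION = 'EMEA';   -- forced full table scan

SELECT /*+ INDEX(o ORDERS_REGION_IX) */ SUM(AMOUNT)
FROM ORDERS o
WHERE REGION = 'EMEA';   -- forced index access with table block lookups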
My personal guess is that if you need 50% or more of the data, then a full table scan is optimal.
What does the DB know about the data distribution and the filter values? Does the DB in use (as yet unknown) even have statistics on data by value? Maybe the filter query is age < 67. On a list of retired personnel that is likely to match more than 80% of the rows, but on a college applicant list it will match less than 5%. For the retirees the full table scan is vastly better; for the college applicants the key-range read is better. Does the DB have the right information to decide?
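A quick way to check that selectivity yourself (the table and column names are just the hypothetical age < 67 example from above):

SELECT
  SUM(CASE WHEN age < 67 THEN 1 ELSE 0 END) AS matching_rows,
  COUNT(*)                                  AS total_rows
FROM people;
-- If matching_rows is a large fraction of total_rows, the full table scan is
-- usually the right plan; if it is only a few percent, the indexed read wins.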
Hein