Compose live views are extremely slow because of many small sequence files in __CT tables. Provide the ability to merge sequence files or create larger files
Compose live views are very slow because of the large number of sequence files in the __CT tables for near-real-time latency use cases. Provide the ability to merge the sequence files in a __CT table, or to create larger files in the first place.
Why we need this: with the Databricks endpoint, we have to set the “Change Processing” settings to 30 seconds and 32 MB for a use case where we need near-real-time latency when querying live views. With these settings, Replicate creates roughly 3,600 files in each partition, and we partition the __CT table every hour. Over time, live views become very slow to query because of the large number of files, specifically in use cases where the tables must keep historical data or receive high volumes in a short period of time.
Impact: Slow live views. Even though we have near-real-time latency from the Replicate standpoint, the live views are so slow to query that they defeat the purpose of having that latency.
Ask: Replicate should automatically merge the small sequence files once the current partition is closed, or address this in any other acceptable manner.
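
To illustrate the kind of merge we have in mind, here is a minimal sketch in Python. It is only an assumption about how a compaction step could group files, not how Replicate works today: the function name plan_merges, the file names, and the target size are all hypothetical. It walks the files of a closed partition in their original order, so the change sequence is preserved, and groups them into batches that each stay at or under a target size. Each batch would then be written out as one larger file.

```python
from typing import List, Tuple


def plan_merges(files: List[Tuple[str, int]], target_bytes: int) -> List[List[str]]:
    """Group (name, size_in_bytes) entries into ordered merge batches.

    Hypothetical helper: files are taken in the order given, so the change
    sequence inside a closed partition is preserved. A batch is closed as
    soon as adding the next file would push it past target_bytes. A single
    file larger than target_bytes ends up in a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        # Close the running batch if this file would overflow it.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Hypothetical example: five small files in one closed hourly partition,
# merged toward a 64 MB target.
MB = 1024 * 1024
partition_files = [
    ("seq_0001", 20 * MB),
    ("seq_0002", 30 * MB),
    ("seq_0003", 25 * MB),
    ("seq_0004", 10 * MB),
    ("seq_0005", 40 * MB),
]
merge_plan = plan_merges(partition_files, 64 * MB)
```

In this example the five files would be rewritten as three larger ones. With thousands of small files per partition, the reduction in file count, and therefore in files a live view query has to open, would be far larger.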