Rep
Contributor II

Log Stream Staging

We have a database with a large number of tables. Some tables have a primary key defined; others have neither a primary key nor any unique index.

Will it speed up processing if we define two Log Streams on the source:

1. One Log Stream for tables that have Primary / Unique Keys

2. Second Log Stream for tables that do not have a Primary / Unique Key

We understand that a single Log Stream can feed multiple targets. However, we are trying to find out whether splitting the Log Stream into two will speed up processing.


Accepted Solutions
JitenderR
Employee

Multiple Log Streams or tasks are not, on their own, a complete solution for achieving low latency. Suppose you have 10,000 tables spread across 10 different databases, each with its own database log (10 databases -> 10 logs -> 10 sets of tables). In that case you can create 10 tasks, because each task only has to scan the log for the tables it contains. But if all 10,000 tables are written to a single log and there is a huge influx of changes for a few tables, the other tables will not be updated until all of those changes have been replicated.
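To make the single-log point concrete, here is a minimal toy model in Python. It is not Qlik Replicate code and does not use any Replicate API. It assumes a reader that applies changes strictly in log order at a fixed cost of one time unit per change, and the table names `AUDIT` and `ORDERS` are hypothetical.

```python
def apply_times(log, cost_per_change=1):
    """Return {table: time at which its last change is applied}.

    Assumes one sequential reader that processes the log strictly in order,
    with the same (made-up) cost for every change.
    """
    clock = 0
    finished = {}
    for table in log:
        clock += cost_per_change
        finished[table] = clock
    return finished


# One shared log: a burst of 1,000 AUDIT changes, then a single ORDERS change.
shared = apply_times(["AUDIT"] * 1000 + ["ORDERS"])

# Two separate logs (for example, two databases), each read by its own task.
audit_only = apply_times(["AUDIT"] * 1000)
orders_only = apply_times(["ORDERS"])

print(shared["ORDERS"])       # 1001: ORDERS waits behind the whole burst
print(orders_only["ORDERS"])  # 1: no burst ahead of it in its own log
```

In this model the ORDERS change is delayed only because it sits behind the burst in the same log. Adding tasks helps when the tables really do live in separate logs, which is the 10-database case described above.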

Also note that the purpose of Log Stream is to avoid multiple connections to the database. For example, if your source is DB2 z/OS or another mainframe, you don't want multiple connections, because they consume mainframe cycles and result in significant cost. This is where Replicate Log Stream plays a key role: it minimizes the number of connections while still reading all of the data.

In summary, consider the entire architecture end to end, and engage Qlik Consulting Services for a thorough design review to make sure latency and any other challenges are addressed.

Regards

JR


Replies
JitenderR
Employee

Are you seeing latency when using Log Stream as a target in normal scenarios? To clarify what I mean by normal: a sudden influx of changes (a purge activity, say) will cause latency for a while, but the task eventually catches up, so that case does not count.

Log Stream lets you avoid multiple connections to the database. If you are seeing latency with, say, 10,000 tables, you might want to create multiple Log Stream jobs, but only if you actually see latency or other design challenges.

Regards

JR

Rep
Contributor II
Author

Hi JR,

Thanks. Currently, we are running in POC mode and still experimenting/evaluating.

If I understand correctly, if we experience latency, one option is to create multiple Log Streams, e.g. one Log Stream for audit tables and another for the other tables.

Please advise whether creating multiple Log Streams will help reduce latency. What other factors should we consider when deciding whether multiple Log Streams are the better option?
