Q&A with Qlik: Qlik Replicate Best Practices

Last Update:

Aug 23, 2023 1:37:32 AM

Updated By:

Troy_Raney

Created date:

Aug 7, 2023 4:54:16 AM


Environment

  • Qlik Replicate

Transcript

Welcome to another session of Q&A with Qlik.
Today's topic is Qlik Replicate Best Practices.
So if you are new to Qlik or you want to learn more about Qlik Replicate best practices for you and your company, this is the webinar for you.
Troy could not be here today. He's on a well-deserved vacation, and we hope that he is having fun.
My name is Emmanuel Herndon. I'm a Digital Customer Success Specialist, and one of my focuses is hosting webinars like
these, bringing some value to customers throughout their journey. My name is Kelly Hobson.
I'm a support engineer for Qlik Replicate and some of our QDI tools
in addition to supporting Qlik Auto ML, which is one of Qlik's newer products.
Steve? Yeah, my name is Steve Nguyen. I'm a principal support engineer here at Qlik.
I've been with the company for over 15 years. Old products, new products,
they usually come to me for assistance in isolating what is broken.
Dana? Hi, I'm Dana Baldwin. I'm again a customer support engineer with Qlik.
I've been here about seven and a half years. Good morning, afternoon, folks. This is Bill Steinagle, principal engineer.
I've been at Qlik going on five years, with a long support background, 29-plus years.
Happy to have you here. Thanks. All right. Thank you.
Where do I get started? What should be checked prior to upgrading Qlik Replicate?
What should be done prior to upgrading?
Thanks. This is a good general question, as upgrades are certainly a part of the process with your
Qlik Replicate setup, whether that's just moving to a newer version or applying a fix for an issue you're experiencing.
What should be done prior to the upgrade is to first review the user
guide and its instructions to understand the process for applying an upgrade.
In addition to that, we ask you to review the release notes. If you're coming from a version that's
a few releases behind, those release notes will have the information on those upgrade steps or the upgrade path.
Then also, a good best practice just in general is
to make sure that you have a test environment so that you can potentially test and understand the steps in a lower non production
environment so that then when you do it on production, you're ready and you really feel like you're confident with the steps you're performing.
An additional note: when you're upgrading your environment, make sure the OS version you're moving to is supported.
With some of the newer products, for example Replicate 2023, we only support certain OS versions, for example Red Hat Enterprise Linux 8 and above.
Some customers have tried to upgrade and run into many issues. So, as Kelly advised,
make sure to check your user guide, especially its OS support levels.
Yeah. Also, I would add that sometimes the versions of the sources
and targets that we support with each new version can be different as well. That's always a good thing to check in the release notes, too.
Okay, thank you. We did have a question that came in, but it came in through the chat instead of the Q&A panel.
How to make end destination schema different from source?
It says, Could you discuss setting up replicate for cases where the destination schema is different from the source?
For example, we want to record the transactions from deleting records and keep the deleted records as a reference.
Can anyone help with that question? Yeah. Excuse me. I posted an example.
It's basically a transformation using the AR_H_OPERATION header.
I could share the link for the 2023.5 version with the different headers that you could use.
The example shared in the chat is basically: you can add a new target column, add your CASE statement,
and map the source data into the new column with the transformation, for example an expression along the lines of CASE WHEN $AR_H_OPERATION = 'DELETE' THEN 1 ELSE 0 END.
The transformation is just an example; the header indicates whether the change was an insert, update, or delete on the given row.
Hopefully that's helpful. Thank you. There's a question that came in to the Q&A, and it says,
Estimated release for GenAI feature?
I saw a video of adopting GenAI in Qlik, and it's very interesting.
Is there an estimated release date for the GenAI feature?
I don't know if this is technically a question to be asked here, but let's see if anybody has an answer.
I can give my best on that one. With generative AI, some of the tools are already released.
For example, AutoML is already available, and it's a part of the GenAI ecosystem.
In addition, some of the advanced analytics connectors in Qlik Sense are available.
Then I would say the other part of it is some of the alerting and reporting that's done.
I don't think there's a single release date where it's like, here, it's here, or a date in the future.
I think they just keep iteratively adding features to Qlik Cloud that are a part of that AI ecosystem.
Okay, thank you. Next question, are there any plans to support older
Plans to support older SQL endpoints?
versions of SQL Server, such as 2008, as an endpoint?
I think I can take that right there. The reason that Replicate stopped supporting SQL Server 2008 is that
Microsoft actually declared end of life for SQL Server 2008 back in 2019.
Since Microsoft no longer supports it, we are limited in what we can do.
That's the reason we ended support for SQL Server 2008 as well.
Thank you, Steve, for that.
There is a question that just came in. Is it possible to replicate an OpenEdge database?
Possible to replicate an OpenEdge database?
I could answer that question. If it's not in the support documentation as a supported endpoint, it's an
endpoint that can be used via an ODBC connection,
and that would be an added expense to your current subscription with the tool.
Obviously, anything ODBC-related would be a professional services engagement with the Qlik QDI teams.
And just a side note: in general, for some of these unsupported
endpoints, source or target, the recommendation is to use ODBC.
We do have ODBC and ODBC CDC endpoints, and that goes through our professional services team.
That's out of scope for support. All right. Thank you, Bill.
Sure. Next question. For a scheduled full load,
How to drop and recreate a table index in a task script?
how can we run a preload and postload script to drop and recreate a table index?
Okay. Dana, actually, I can take that. The thing is that within the current
Replicate scheduler, there is no preload or postload scripting involved.
However, if you use, for example, the QEM API, what you can do is write a script that wraps a
preload and a postload step around an API call to reload your full load.
So it's possible to do a preload or postload by using a script and calling the API to do a reload of your task.
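A minimal sketch of that approach, assuming the Qlik Enterprise Manager (QEM) REST API: the base path, login flow, and action/option parameters below are taken from the Enterprise Manager API guide but should be verified against your version's documentation, and drop_indexes()/create_indexes() are hypothetical placeholders for your own index scripts:

```python
# Hedged sketch only: wrap a full-load reload in preload/postload steps via the
# QEM REST API. Endpoint paths and parameters are assumptions to verify against
# your version's Enterprise Manager API guide.
import time

import requests

QEM_BASE = "https://qem-host/attunityenterprisemanager/api/v1"  # assumed base path
SERVER, TASK = "MyReplicateServer", "MyFullLoadTask"            # your QEM object names


def qem_headers() -> dict:
    """Log in with Basic auth; QEM returns a session ID in a response header."""
    resp = requests.get(f"{QEM_BASE}/login",
                        auth=("DOMAIN\\apiuser", "password"),
                        verify=False)  # self-signed certs are common on QEM hosts
    resp.raise_for_status()
    return {"EnterpriseManager.APISessionID":
            resp.headers["EnterpriseManager.APISessionID"]}


def drop_indexes():
    """Hypothetical preload step, e.g. run your DROP INDEX script on the target."""


def create_indexes():
    """Hypothetical postload step, e.g. re-create the indexes after the load."""


headers = qem_headers()

drop_indexes()  # preload step runs before the reload is requested

# Ask QEM to run the task with a full reload of the target.
requests.post(f"{QEM_BASE}/servers/{SERVER}/tasks/{TASK}",
              params={"action": "run", "option": "RELOAD_TARGET", "timeout": 60},
              headers=headers, verify=False).raise_for_status()

# Poll task details until the load has stopped, then run the postload step.
while True:
    details = requests.get(f"{QEM_BASE}/servers/{SERVER}/tasks/{TASK}",
                           params={"action": "getdetails"},
                           headers=headers, verify=False)
    details.raise_for_status()
    if details.json().get("state") == "STOPPED":  # field name may differ by version
        break
    time.sleep(30)

create_indexes()  # postload step runs after the full load completes
```

Run from a scheduled job, this gives the preload/postload behavior the built-in scheduler lacks.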
Okay? Okay. Thank you for that.
Next question: where do I find the settings to increase or decrease how long the redo archive files are kept?
How to adjust how long redo archives are kept?
I'll take that. I'm assuming the redo logs here are, for example, SQL Server or Oracle; that archive log information is really not in Replicate.
It's within the database itself. The database admin can set the retention period for the redo logs, in terms
of how long the redo archive logs are kept on the database server.
It's not part of Replicate. Replicate just reads the information.
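To make the database-side nature of this setting concrete: for SQL Server's CDC, for example, a DBA can adjust the capture cleanup retention with sys.sp_cdc_change_job. A hedged sketch follows; pyodbc and the connection string are assumptions for illustration, and Oracle has its own equivalent in its archive log deletion policy:

```python
# Hedged sketch: log retention is set in the source database, not in Replicate.
# SQL Server example: adjust the CDC cleanup job's retention (in minutes).
# pyodbc and the connection string are illustrative assumptions only.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=src-db;DATABASE=sales;"
    "UID=dba_user;PWD=secret", autocommit=True)

# Keep captured changes for 3 days (retention is expressed in minutes).
conn.execute("EXEC sys.sp_cdc_change_job @job_type = N'cleanup', @retention = ?",
             3 * 24 * 60)
```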
All right, thank you. Next question.
How to change the ports that are used?
Hi, now it says: our Qlik Replicate HTTPS is running on port 8443, but we would still like to run on port 3552.
Can you kindly suggest how we can do the same?
Okay, I can take that. Most of the UI, by standard default for Replicate
in a Windows environment, runs on ports 80 and 443. Port 3552 is mostly the Unix Replicate server running on port 3552.
I'm assuming that you're running 3552 because it's a Linux environment
right there, and that's why you're seeing port 3552. Another thing is that you could run 3552 in a Windows environment
if you set just the admin credential.
But when you're doing that, you don't have user permission control over who can log into your web UI.
So it's not beneficial to run 3552. It's better to run on Windows port
443 or port 80 so that you can limit the users that log into the web UI.
When you're using port 3552, there's only one user, which is admin.
They followed up and said that their environment is Linux? Yeah. A Linux environment only runs on port 3552.
Most likely they have a Windows version connected to their Linux server. That's why they're seeing port 80 or port 443.
They can still use port 3552. If they have
an issue, they can open a support case on that. Okay? Okay, thank you for that.
These are some great questions. Keep them coming and we will get to them.
The next question is a little lengthy, but it says: Hello, our production Replicate server is
How to manage multiple servers with around 100 tasks each?
configured on Windows Server with about 130 tasks running. One of the Qlik documents suggested
having only 100 tasks per Replicate machine. We are looking to configure Replicate
on another server and move a couple of tasks to this new server. Do you suggest having two standalone
servers or having a load balancer between these two Replicate servers?
I can talk to that one. Basically, the limitation has to do with the way the operating system manages the heap size.
Generally, what we find is that it's about 100 tasks that can run at the same time on a single Replicate server.
It might be a little bit fewer tasks or a few more tasks in your individual situation, but
you would want another standalone server for the second part of your question because a load balancer would assume that you've got a separate web
service that runs from the application itself, but it's all integrated. It uses the same data folder and definition for the tasks.
So there's no way to have two servers running the application against one single data folder at the same time.
Now you can configure that in a failover configuration, but again, only one server is going to be active at a time in that case,
and you would still be limited to about 100 tasks running at the same time.
Correct, Dana. Just adding to Dana: the Windows heap here is
for Windows non-interactive sessions; when Replicate opens and runs a
task, it is considered a non-interactive session. That Windows heap size can be increased,
but that's dependent on how your server is set up. It's best to just keep it under the 100-task limit.
Okay? Okay, thank you both for that. There was a question that came into the chat, which was misplaced,
but it says: What's the recommended shared drive for high availability in AWS?
In AWS, what is the recommended shared drive for high availability?
I can take that question. So in AWS, I'm assuming that it's S3
storage and they have a file system set up for the file share that they want to point to.
As long as Replicate can get to the drive, it would probably be more beneficial to use an EC2-type instance of Replicate hosted in the same
cloud provider, AWS, as far as performance goes. So as long as Replicate can get
to the drive and the data directory, Replicate shouldn't have an issue. But again, for performance, you might want to consider putting
the Replicate server in the same EC2 environment in the AWS cloud. Hopefully that helps.
Okay, thank you. I wanted to go back to the previous question that Steve was talking about, when we were dealing with the ports.
They followed up and said: Your team suggested on a ticket bringing up Qlik on port 3552
Should it run on port 3552 to disable legacy protocols?
to disable legacy protocols like SSL 2.0, TLS 1.0, and TLS 1.1...
Yeah. I didn't mean to cut you off there, Emmanuel. No, you're fine. That's basically the security cipher calls at the OS level.
Replicate uses TLS 1.2 and onwards as far as the SSL calls from the operating system.
If there's a change at that level, that would be at the OS level, not Replicate.
In some Linux environments, there's a configuration file,
openssl.cnf, in the /etc directory that they could
work with their Linux admin on to disable those protocols (newer OpenSSL releases, for example, let you set a minimum version such as MinProtocol = TLSv1.2). I could share an article.
It's related, but it gives you the information about
updating the OpenSSL config files on Linux. That would be great. Make sure that everyone gets it.
Thank you for that. No problem. Next question.
It's a little lengthy, but it says: We are experiencing an issue with a series of SAP extractors to Qlik Replicate.
Why empty rows when extracting from SAP?
The problem arises when we attempt to extract data using Qlik Replicate: it returns empty rows despite our being
able to visualize the data through the RSA3 query in SAP.
Our suspicion is that the problem lies within the intermediate tables created by Replicate before data extraction,
as we observe that the tables remain empty. Using the same test, other extractors work.
Is there a way to restart metadata extractors for Qlik Replicate?
I can take that question, especially with the SAP extractor. There are certain limitations as far as Replicate and the extractor go,
depending on the environment and how the table is defined in SAP. I would suggest just looking over the...
I'll post in the chat the current limitations with the extractor as far as stopping,
starting, resuming, and starting from timestamp. There are a few limitations that you have to
look out for, specific to the SAP extractor. I would look over the limitations in the link that I shared,
and if you have more questions, I would definitely open a support case.
Okay. Just a side note: with the extractor, there are certain steps in SAP
where customers work with their SAP admin to get the extractor set up and working. And I do see the RSA3 T-code in SAP where you're actually seeing the data.
So between the data being shown in SAP and the limitations, I would just double-check that as well.
And if you're still not satisfied with the information, open a case and we can definitely proceed with you.
Thank you for that. I have to get that link into the chat for everyone to see.
Yes. And just a side note: the links I'm sharing are for the latest 2023.5 version.
If you do click on a link, there is an option to change the release to the version that you're running.
Note that you'll only see versions back to November 2021. As we move on to new versions,
older ones are removed from the documentation once they're out of service. Just a side note to be aware of.
I just wanted to show quickly the link that Bill provided.
All right, the next question is: Is there a way to schedule turning off Qlik CDC for a few hours on Sunday night
How to schedule downtime for tasks?
to support scheduled downtime of the source?
Yeah, I can take this one, Bill. So there is. If you go under the Server tab and the Scheduler,
you can go to the scheduled jobs and create a new scheduled job of type stop task.
So you would stop your CDC task, and then you can have another
job a couple of hours later that does the start, or the resume, of the task. Then it would just pick up where it
left off, but it would have that period of downtime.
And latency when you start it. Yeah, latency. All right, thank you.
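The same window can also be scripted outside the built-in scheduler, for example from cron on Sunday nights, through the QEM REST API. A hedged sketch; the endpoint paths and the RESUME_PROCESSING option are assumed from the Enterprise Manager API guide and should be verified for your version:

```python
# Hedged sketch only: a scripted downtime window (stop, wait, resume) using the
# QEM REST API instead of two scheduler jobs. Paths and options are assumptions
# to verify against your version's Enterprise Manager API guide.
import time

import requests

QEM_BASE = "https://qem-host/attunityenterprisemanager/api/v1"  # assumed base path
SERVER, TASK = "MyReplicateServer", "MyCdcTask"                 # your QEM object names
DOWNTIME_SECONDS = 2 * 60 * 60  # e.g. a two-hour source maintenance window

login = requests.get(f"{QEM_BASE}/login",
                     auth=("DOMAIN\\apiuser", "password"), verify=False)
login.raise_for_status()
headers = {"EnterpriseManager.APISessionID":
           login.headers["EnterpriseManager.APISessionID"]}

# Stop the CDC task for the source maintenance window...
requests.post(f"{QEM_BASE}/servers/{SERVER}/tasks/{TASK}",
              params={"action": "stop", "timeout": 60},
              headers=headers, verify=False).raise_for_status()

time.sleep(DOWNTIME_SECONDS)

# ...then resume from where the task left off; expect some latency while CDC
# catches up on the changes that accumulated during the window.
requests.post(f"{QEM_BASE}/servers/{SERVER}/tasks/{TASK}",
              params={"action": "run", "option": "RESUME_PROCESSING", "timeout": 60},
              headers=headers, verify=False).raise_for_status()
```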
Next question. It says, continuing from my previous question on load balancing and central
Does Replicate support clustering?
servers, does Replicate support clustering? If yes, how does that work?
I can take that. Replicate supports either Windows clustering or Linux clustering.
For example, for Windows clustering, we support a setup where Replicate is installed on both instances and the data folder is on shared storage.
For example, we support only active-passive clustering. We don't support active-active clustering.
That means that one server is up and running, and if the Windows cluster service detects an issue
with one node, it will go ahead and transfer to the other node. That means that all the tasks gracefully stop,
Windows transfers over to the other node and brings up the services, and at that point all the tasks resume on the other node.
Since the data folder is shared on a shared network drive,
all that information provides the continuation of the tasks themselves.
Okay? Okay, thank you. The next question that we have is following up from the previous question
Recommended AWS service for shared storage?
on recommended shared drives for high availability. It says: Any recommended AWS service for shared storage?
I can just try to give some information on it. As far as AWS and the storage go,
that's the actual physical storage device that they define in AWS that's going
to hold the data for replication on the source and/or target.
I guess that would be EC2, EBS, or something in the cloud,
apart from the Replicate server running on Windows. Thank you for that.
Sure. We could take that offline if needed as well.
Okay. The next question that we have is: do we have the below options for CDC in Replicate?
How to capture only specific operations (updates, inserts, or deletes)?
If not, when can we expect them, and in which version? The options are: capture only updates, inserts, and deletes; capture only deletes,
not inserts and updates; and capture only inserts, not deletes and updates.
Anyone? Yeah. Sorry, Emmanuel. That goes back to the question posted earlier: that would be the AR_H_OPERATION
header indicator: delete, insert, and update. Then they could just use the transformation on that column
for the specific operation they want to capture or not capture. Thank you for that.
Sure. Next question. You are a partner of Progress DataDirect.
Is there an OpenEdge connector?
Why don't you have a connector for OpenEdge? Anyone have an answer to that question?
I believe this is all up to product management right now, Emmanuel. It
depends on whether our product management team sees
this particular endpoint as meeting customer requirements. So really, it's up to product management.
The best way to get an OpenEdge endpoint is to submit a
feature request through ideation on the Qlik forum. Hopefully, with many people requesting it, we will have an OpenEdge endpoint.
Moving on to the next question. It says: Can Qlik Replicate handle two
Is it better to have one multi-billion record task or to split it?
billion records from Oracle Cloud storage to a Snowflake target? Is it better to combine multiple
two-billion-record tables into one task or split a table into multiple tasks? Are there any performance parameters that we should add?
This is a Linux environment. Anyone have any ideas? Yeah, I can answer that. Yes.
So Replicate doesn't really have any limitations as far as going from Oracle to Snowflake.
It would be a good candidate for parallel load. Loading the data from your
Oracle environment to your target may take a few days, but you have to understand that Replicate sits on top of the ODBC layer that does all
the communication between your source and your target, and the location of your Replicate server relative to your Snowflake target matters as well.
The other thing to note about Snowflake is the warehouse size that you're writing to.
Depending on queuing, there are different techniques, working with your Snowflake admin,
that could help better tune the target environment coming from the source. But just remember that the Replicate server,
depending on its location, as far as reading from Oracle and then writing to your Snowflake target, factors into
the performance and the parameters to use. You could split into multiple tasks, but if this is specific to one
table, parallel load would be the best option. And the other thing to look at in Snowflake is the warehouse size.
I think the warehouse size is more appropriate than the amount of clusters associated with that warehouse.
Correct, Bill. And just a side note: for anything performance-related, when working with support via a support case or
the Community, support can take a look and give recommendations, but for anything that
gets more involved, we always recommend professional services, because it's performance tuning of your source and target.
Another question, a little lengthy. We are using batch mode for CDC
Is it possible to resume a PostgreSQL task instead of running a full load?
from PostgreSQL to Snowflake, and the source is an Aurora PostgreSQL global database.
If we had the need to fail over from an Aurora cluster to our secondary DR
site, is it possible to resume the task rather than full load the task?
After discussing with professional services, it seems the recommendation to not
lose data is a full load at the failover. But that is very limiting in our DR
testing validation, as the full load can take many days to complete.
I can answer that right there, Emmanuel. I believe that since Postgres fails over in the DR environment on Aurora,
it doesn't keep the transaction log in terms of what Postgres is reading.
Postgres is reading... Dana or Kelly, help me out a little bit.
It's not a transaction log; what is Postgres reading? It uses a WAL file, right?
The WAL file. Yeah, the WAL file. So on the CDC side, when the DR or Postgres failover happens, that WAL gets changed right there
in the failover of the cluster on Aurora, I believe. Well, if that's the case,
then Replicate doesn't know where to resume the WAL from. And that's why professional services recommend a full load.
If the WAL were able to continue from one cluster to another, then Replicate would be able to continue moving forward from there.
Yeah. And just to add to that: with Postgres, there's no starting from timestamp.
So back to Steve's point, if the WAL ID is not moved to the other cluster in the failover,
you can't resume the task, and that is most likely why professional
services recommend the full load, to ensure no data loss. Correct. All right.
Thank you for that. We have another question, but before we get to it,
if there are any more last-minute questions, please, again, put them in the Q&A panel and we will try to get to them.
The next question we have is: for DDL changes, how do we avoid doing a full reload of the table?
How to capture DDL changes without a full reload?
We are using Microsoft SQL Server. Our process is, when a DDL change is ready
to be released, we have to flip CDC off and on to get the metadata table
to match, in order to get Qlik to unsuspend the table. However, we are not sure if we will miss data during the CDC flip.
Reloading large tables can be very cumbersome and inconvenient for our end users.
Any feedback you can provide for other customers, or I guess for them, would be helpful.
Thank you. Okay. I believe we can handle DDL here in terms of MS-CDC.
I believe they turn it off and turn it on. It could be that the version of SQL Server is not up to 2017 or 2019.
I can double-check the DDL handling in the user guide real fast. So just bear with me.
Go ahead, Bill. I shared it for you, Steve. And that goes back to the MS-CDC endpoint.
That is one of its limitations as far as DDL changes go. Yeah, that's what I thought too.
I believe it's a limitation right there. Yeah. It's the endpoint that you're using, which is MS-CDC.
That's why it's a limitation. If you use the normal Microsoft SQL Server replication endpoint, then we handle the DDL without any issue.
You definitely have to consider looking at the user guide under the limitations going forward.
Yeah, I shared the link for on-prem SQL Server as well. Thank you, Bill. Sure.
Okay, I have shared it with everyone. Next question, and it might be our next to last question.
Again, if you have any more questions, please put them into the Q&A panel. Is there a possibility to improve the loading performance for SAP to...
How to improve performance from SAP to Qlik Cloud?
The SaaS cloud. Yeah. The SaaS cloud. QlikView took 40% to 45% less time.
I'll make an attempt at that question.
As far as the cloud and SAP go, do we understand or know where the performance issue is?
Is it on the SAP side or is it on the SaaS cloud side? With SAP, obviously,
you have different source endpoints: SAP Application, SAP DB.
If it's the SAP Application endpoint, it goes through RFC calls, which is a load on the SAP source system environment.
If it's the database, all the physical data loading is at the database level. That goes into consideration, along with the location of the cloud server.
But it depends on the type of connections. Remember, Replicate uses ODBC connections.
As far as SAP goes, it uses our internal endpoint server connections to do the communication.
Without having more details and information, I can't confirm performance-wise why one would be faster than the other.
That would be a support case and would most likely turn
into a professional services engagement as far as performance goes.
Okay, thank you, Bill. Sure. How often should we upgrade
How often should Qlik Replicate be upgraded?
Qlik Replicate, as service pack releases appear to be frequent? Go ahead, Dana.
Yeah, I can speak to that. Basically, service packs come out as we're working support cases and then we run into issues that need to be fixed.
So generally speaking, unless you're impacted by that particular issue, you don't need a service pack.
Occasionally a service pack will include new features or enhancements.
To obtain those, you would need to open a support case and we could provide you with a download link.
Also, be aware that about every three months we will roll up service packs into what we call a service
release, and those will be made available on the regular downloads page. It's publicly available.
Mainly it's just to make the fixes that have accumulated available to users.
And also, if there have been any significant changes, then a service release will come out. That's good. Then another thing I want to add is that you can always open a support case
and ask, hey, what is the latest service pack? You don't have to download it, but you can always click on it and get the release notes of that service pack.
If your endpoint, for example SQL Server or, of course, Snowflake, is covered in any of those notes, then hey,
it's best to just go ahead and update so that your environment isn't affected or
left behind in terms of what is covered in that service pack. You don't have to upgrade;
you can always download the release notes and just read up on them.
I just wanted to piggyback on Steve: as far as SAP goes, we do recommend that customers
on a current version, especially the supported versions 2022.11 and 2023.5, be on the latest service release for that version to pick
up all the SAP-specific changes for that version. Okay, thank you.
Possible to replicate a non-clustered index AND where are internal parameters documented?
Sure. We have one last question in the Q&A panel. Is there a version of Qlik Replicate that replicates a non-clustered index over to the target DB?
The second question: is there any documentation of all the internal parameters somewhere we can reference?
Okay. Well, I'm not sure about the first question right there, but in terms of internal parameters, the internal parameters are there for support use.
They're really not open to the world. Yes, we could open them up to the world;
it's just that we would then create more issues. Think of it like cooking a recipe where you put every ingredient you have into the dish:
it wouldn't be a good-tasting meal. That's the same reason we don't open it up to everybody.
It's only on a case-by-case basis that a particular internal parameter is needed for a certain situation.
All right. Thank you, Steve. That was a good analogy. I like that. Yeah, it's true. That's how I could put it.
There's one last question that slid in real quick, and we just want to cover it. Can we give task-level permissions
Can we give task level permissions to security groups?
in Replicate instances to different security groups? One set of users in one AD group
can only see the tasks that are assigned to them. If so, is there an API available to do the same?
That can be accomplished from QEM itself. It's not within Replicate.
Replicate is the actual replication engine. QEM is the monitoring application,
where QEM monitors the server and can assign an AD group or an AD user to a particular server or a particular task only.
Yes, it should be possible from QEM. Thank you.
I think there's a follow-up: Qlik Replicate only replicates primary keys but doesn't copy over non-clustered indexes.
How to apply non-clustered indexes?
Is there a version of, or a way for, Qlik to copy and apply non-clustered indexes?
Non-clustered. What is that endpoint right there, Bill? Is it a limitation somewhere, Bill? Yeah, it depends on the source.
We would need to know the source and target endpoints to give you a complete answer on that follow-up.
Yeah, I believe it could be a limitation on that particular endpoint right there.
Right. Just as an example, real quick: with Snowflake, Snowflake doesn't have the concept of primary keys.
The primary key and the constraint are handled by Replicate; that's what carries the information. So a more thorough follow-up answer
can be provided depending on your source and target. I would check the Community; if it's not found there, definitely open a support
case and we'd be happy to complete the answer. I think they updated and said SQL and MySQL.
Yeah, yes, I would have to take that offline to confirm. Okay. Just as a side note to that last question: check the Community, and then open a support
case if it's not there, and we'll get back to you. Okay. All right.
I would like to thank the panelists as well as the audience for your questions.
Before we go. Yes. Emmanuel, just one quick follow-up on that last question from David:
just make sure we get that answer back and communicated out to them. It's pretty straightforward. Okay.
We'll say: have a great day.

Comments
apouliou
Contributor

How can I log the reloads of tables in Replicate? That is, how can I record all completed reloads in a log table in the target system (Oracle) in order to evaluate them?

Sonja_Bauernfeind
Digital Support

Hello @apouliou

Please post your question in the Qlik Replicate forum, in English if possible, so that our users and Support Engineers can help you faster.

Regards,
Sonja
