
Talk to Experts Tuesday - Qlik Replicate FAQ


Last Update: Oct 22, 2020 2:42:56 PM
Updated By: Jamie_Gregory
Created date: Oct 22, 2020 2:42:56 PM

This is the FAQ for the October 20th Talk to Experts Tuesday session on Qlik Replicate.

For the recording and transcript, please see TTET - Qlik Replicate on October 20, 2020: Recording & Transcript.

Environment: Qlik Replicate

 

Q: Why is the Users tab not showing up under the Server View in my Linux environment?

A: In Linux Replicate's default mode, there is only one user: the admin user, which has full access to everything. That is also why you see fewer tabs than on the Windows side, such as user permissions. If you want to have roles for users, you would have to use the Windows UI with the Linux Replicate server. You can then work with Active Directory and set up individual users or groups from your Active Directory to grant access and assign roles.

 

Q: How do I collect process dumps for Qlik Replicate?

A: We did an in-house presentation a couple of weeks back and uploaded it to the Qlik Community Knowledge Base as Collecting Replicate Process Dumps. It is a PowerPoint presentation that walks through what you have to set up inside of Replicate and what you have to set up at the operating-system level, for both Windows and Linux environments, to capture the debug information. It covers everything Support needs to debug a situation where processes are ending unexpectedly, and it has some good information on what Support will be looking for to help fix the issue.

 

Q: How can Qlik Replicate access SAP ECC and HANA systems via SAP extractors?

A: We just added support for SAP extractors to Replicate in this release. It is very new functionality and is essentially in beta testing. We utilize the extractor as a connector to SAP: the same way you are used to using extractors to capture data from SAP and move it into BW, we simply replace the endpoint on the other side. You keep the same information and the same setup you use today to capture data from extractors, and on the back end we plug in Replicate. From there, the data is delivered to the destination of your choice, whatever is supported by Replicate.

 

Q: Can you recommend how to set up change control management in production, and what the options are for migrating several tasks from a lower to a higher environment?

A: This is essentially about moving a Replicate task between environments, whether promoting it to production, moving it into a QA environment, or moving it down to dev. Typically you would set up the connections ahead of time in all environments; we call them endpoints, and they are basically the connection-string information. You can then take advantage of Enterprise Manager's API to export the task definition from one environment, either change the endpoint connection strings or remove them completely (as long as the endpoints have the same names in the target environment), and then import the definition into the new environment using the same API set. The export is in JSON format, so you can check it into source control and track changes; a minimal sketch of that flow is shown below.
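The following is a hypothetical sketch of that export/import flow in Python against the Enterprise Manager REST API. The base URL, endpoint paths, "action" parameters, session-header name, and the server/task names are all assumptions; verify them against the Enterprise Manager API guide for your version.

```python
# Hypothetical sketch only: promote a Replicate task definition between
# environments via the Enterprise Manager REST API. Base URL, endpoint paths,
# "action" parameters, the session header name, and the server/task names are
# assumptions -- verify them against your Enterprise Manager API guide.
import json
import requests

AEM = "https://aem.example.com/attunityenterprisemanager/api/v1"   # placeholder host

# 1. Log in and capture the API session token (header name is an assumption).
login = requests.get(f"{AEM}/login", auth=("DOMAIN\\svc_replicate", "password"))
session = {"EnterpriseManager.APISessionID":
           login.headers["EnterpriseManager.APISessionID"]}

# 2. Export the task definition (JSON) from the lower environment.
exported = requests.get(f"{AEM}/servers/DevReplicate/tasks/MyTask",
                        params={"action": "export"}, headers=session)
task_json = exported.json()

# 3. Keep a copy in source control; optionally strip the endpoint connection
#    strings so the target environment's endpoints (same names) are used.
with open("MyTask.json", "w") as f:
    json.dump(task_json, f, indent=2)

# 4. Import the definition into the higher environment.
requests.post(f"{AEM}/servers/ProdReplicate/tasks/MyTask",
              params={"action": "import"}, headers=session,
              data=json.dumps(task_json))
```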

 

Q: We have experienced slowness when using upsert mode. What alternative, if any, is available to increase speed while keeping the features that upsert provides, such as the ability to do an insert when an update transaction arrives and there is no record in the target?

A: Typically you are in batch apply mode, and with upsert mode set, when an update comes in we issue a delete and then an insert. Unfortunately, that approach does not work for tables that have LOB information in them. Most of the time we suggest splitting your LOB tables, the tables you know are going to be slower to apply, into a separate task so they do not impact the bulk of your tables going through. If your target is a database that likes big transactions, like Snowflake or some of the other cloud-based targets, then you want to build large batches; some of that tuning is around setting the batch size so that you apply a large number of changes at the same time. But every situation is a little different, so typically you would want to open a case and get somebody from Support involved to see if there is something else impacting your environment.

 

Q: Is there any way to extract the record counts of tables from source and target, the column matching between the source and target tables, and the values in each column between source and target? Can all of these be extracted from Replicate? Are there any tables that capture this information? My source is SAP HANA and my targets are Azure SQL Managed Instance and Azure Synapse.

A: Replicate does not store information about the actual record counts in the source and target. We only keep track of data that flows through Replicate, and you can use Enterprise Manager to track those statistics. If you are looking for a comparison between the source and the target, Replicate does not provide that type of tool set, but there are lots of tools out there that let you compare two databases. Most of our customers will run SELECT record counts or compute a hash value of the rows and compare that information to find out whether their data is in sync; a simple count-comparison sketch is shown below. The analytics database built into Enterprise Manager only covers data that has actually flowed through Replicate.
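A minimal sketch of the count-comparison approach mentioned above. The ODBC DSNs and table names are placeholders; adapt the connections to your actual source (SAP HANA) and targets (Azure SQL Managed Instance, Azure Synapse), and add a per-row hash if you also need to compare column values.

```python
# Minimal sketch of the "SELECT record counts on both sides" approach.
# The ODBC DSNs and table names are placeholders.
import pyodbc

SOURCE_DSN = "DSN=HanaSource"                      # placeholder
TARGET_DSN = "DSN=SynapseTarget"                   # placeholder
TABLES = ["SALES.ORDERS", "SALES.CUSTOMERS"]       # placeholder table list

def row_count(conn, table):
    # COUNT(*) is the simplest check; add a per-row hash if you also need to
    # compare column values, not just row counts.
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    return cur.fetchone()[0]

with pyodbc.connect(SOURCE_DSN) as src, pyodbc.connect(TARGET_DSN) as tgt:
    for table in TABLES:
        src_count, tgt_count = row_count(src, table), row_count(tgt, table)
        status = "in sync" if src_count == tgt_count else "MISMATCH"
        print(f"{table}: source={src_count} target={tgt_count} -> {status}")
```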

 

Q: Will there be a similar session for Compose for Data Warehouses?

A: We do not currently have anything on the schedule for that, but if there is demand and you would like to see it, we can certainly get it on the schedule. We are doing planning for 2021, so I will make sure it gets added.

 

Q: How does Replicate send updates/deletes from a relational database to file targets, for example Oracle to S3? Does it send any indicator in the file to mark a record as an update/delete?

A: In change tables, there are quite a few header columns: the user that made the change (as long as the source database provides it), the transaction ID, the time when the change happened, and the operation (whether it is an insert, update, or delete). Change tables can be written to a file target, so you get that header information in the file and can act on it; a small parsing sketch is shown below.
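An illustrative sketch, assuming the change files land on S3 as CSV and that the operation indicator is exposed as a header column. The bucket, key, column name (header__change_oper), and the value codes used here are assumptions to verify against the files your target actually writes.

```python
# Illustrative sketch only: read a Replicate change file landed on S3 as CSV
# and tally rows by operation. The bucket, key, the header column name
# ("header__change_oper") and its value codes are assumptions.
import csv
import io
from collections import Counter

import boto3

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-replicate-bucket",
                    Key="changes/ORDERS/20201020-120000.csv")
body = obj["Body"].read().decode("utf-8")

rows = csv.DictReader(io.StringIO(body))
# Typical operation codes are I (insert), U (update), D (delete).
ops = Counter(row.get("header__change_oper", "?") for row in rows)
print(dict(ops))
```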

 

Q: With the November Technical Preview, are there any new endpoints being added?

A: Yes, we added support for Azure Databricks Delta as a data warehouse target and Azure MySQL as a source. For more information on the November Technical Preview, please see Qlik Replicate and Enterprise Manager November 2020 Tech Previews Now Available!

 

Q: In the November Technical Preview, are there going to be any changes in which drivers, database versions, and clients are supported?

A: Naturally, with every release, we certify the latest drivers and those are posted in the documentation. Please see November 2020 Tech Preview Release Notes for Qlik Replicate and Qlik Compose for Data Warehouses.

 

Q: Can data be replicated onto AWS S3 in the form of Parquet files?

A: Today we do not support that; we support sending data in CDC or sequence formats, but we do not support direct replication into Parquet format. Qlik Community is a great platform for general questions, but there is also the Ideas board for submitting ideas and requests. You can review what your peers have already submitted by filtering on Qlik Replicate, and if you do not see something similar, you can submit a new request. The PM team reviews the requests on a biweekly basis.

 

Q: What factors were considered when coming up with the hardware sizing for the Replicate server?

A: If you are trying to figure out the hardware requirements, a lot of it comes down to the volume you are processing. The number of full loads you are going to run in parallel has a big impact on the size of the server. If you are going to a file target, make sure you have the right amount of disk space available. If your SLAs are very tight, you may need to make sure you have enough memory, because most of the time Replicate processes in memory, and there is some tuning we can do to make sure it uses a lot of the available memory on your box. The user's guide has general small, medium, large, and extra-large setups that work pretty well for most implementations. Typically, during the POC (proof of concept), we can get a good handle on what production will look like, and then the PS group can help with sizing the hardware; there would be an architectural review where sizing is discussed. Another factor is the tables you are replicating: how many LOB columns they have, and whether you can determine the maximum size of the data in the LOBs, also goes a long way in determining the size of the Replicate server.

 

Q: I have a log stream setup that pulls from SQL Server. It is currently 197 hours behind. I have checked the memory and disk on the Qlik server and see no issues: there is 50% RAM left and no disk queuing. Is there any tuning that needs to be done on the Qlik side? It has been processing 124,076,911 commands for over a week.

A: When we are reading from SQL Server, there are some things that can adversely impact latency. Database maintenance, such as a re-index, generates a huge amount of transaction log that Replicate has to parse through to get to the data it needs. One thing you can do, if you are using native backups, is give Replicate direct access to the log files and configure the Replicate endpoint to do what we call a direct read of the files instead of going through the normal API route; if database maintenance is the cause, that is one option. Another best practice, if you cannot do direct read, is to stop and resume the task after the database maintenance is done. This has to do with the way transactions are stored in the transaction log and the things that happen during a re-index that cause the latency issue.

 

Q: Are there any best practice documents or tips on transmitting data quickly to the targets?

A: This question needs a bit more context, but it depends on what the target is. If the target is in AWS or is otherwise cloud based, make sure you take advantage of that cloud's services; they have different names for it, but you basically want a dedicated connection to that target. When posting to the Qlik Community or opening a case with Support, it is important to be as specific and detailed as possible; for more information on what to include (specifically with cases), please see 8 Tips for Creating a Case with Qlik Support. Also, the Qlik Community has different Data Integration forums, and each one has a place for Q&A and for documentation. For example, if you want to see how to improve data load to Snowflake, I would recommend first logging into the forum, going into the Documentation section, and searching for specific keywords to see whether those documents already exist.

 

Q: Is it recommended to have the data directory outside of the product installation directory, and should it be on another drive (say the product is installed on the C drive and the data folder on the D drive)?

A: Usually you would want to do that, because by default everything is put on the C drive of your system. You would want the data folder somewhere else because, if something runs away, you do not want to fill up your system drive. The other thing is that Replicate has resource controls, one of which is a disk resource control on the drive that contains the data directory; that way you can prevent the system from filling up the data drive. Typically, on a production system, you would want the data folder on a different drive from where you installed the main product. If you are using log stream, it is also good practice to put the log stream staging area on a different drive from the data drive so they are not competing.

 

Q: For the November Technical Preview, are there going to be any improvements to Google BigQuery?

A: One of the performance improvements is to Google BigQuery. Instead of just using upsert operations, you now have an option to select a merge operation. Our estimates in the lab show that this can be about 3x faster than our current methodology; a conceptual illustration of the merge pattern is shown below.
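A conceptual illustration only of why a single set-based MERGE outperforms per-row upserts on BigQuery; this is not the SQL Replicate generates, and the project, dataset, table, and column names are placeholders.

```python
# Conceptual illustration only of why a single set-based MERGE beats per-row
# upserts on BigQuery. This is NOT the SQL Replicate generates; the project,
# dataset, table, and column names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

merge_sql = """
MERGE `my_project.my_dataset.orders` AS t
USING `my_project.my_dataset.orders_changes` AS s
ON t.order_id = s.order_id
WHEN MATCHED THEN
  UPDATE SET t.status = s.status, t.amount = s.amount
WHEN NOT MATCHED THEN
  INSERT (order_id, status, amount) VALUES (s.order_id, s.status, s.amount)
"""

# One statement applies the whole change batch, instead of a delete + insert
# (or update-then-insert) round trip for every changed row.
client.query(merge_sql).result()
```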

 

Q: We have a business resiliency requirement to provide HA where there can be no more than a 10-minute impact on CDC replication. We have Qlik Replicate running on GCP. What is the recommended best practice for providing HA in this environment? Active-Active would be desirable.

A: I do not think we have best practices, per se, for running on a cloud platform such as Google Cloud or AWS. Typically, if we are going into a relational database, there are settings you can turn on in Replicate to store the recovery context of where we left off in the target database, so you can recover quickly if you want to switch to another server. We do not support an active-active type of environment; the switch would have to be somewhat manual, or automated if you script the starting of tasks: you would stop the task on one server and then start it up on the other (a hypothetical sketch is shown below). But there are lots of other things that go into that, such as making sure the configuration of those servers stays in sync, and that is something our PS group would have to help you with.
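A hypothetical failover helper that stops the task on the primary Replicate server and resumes it on the standby through the Enterprise Manager REST API. The endpoint paths, "action"/"option" parameters, and session-header name are assumptions to verify against the Enterprise Manager API guide; this is a scripted switchover, not active-active.

```python
# Hypothetical failover helper: stop the task on the primary Replicate server
# and resume it on the standby through the Enterprise Manager REST API. The
# endpoint paths, "action"/"option" parameters, and session header name are
# assumptions. This is a scripted switchover, not active-active.
import requests

AEM = "https://aem.example.com/attunityenterprisemanager/api/v1"     # placeholder
SESSION = {"EnterpriseManager.APISessionID": "<token from /login>"}  # placeholder

def stop_task(server, task):
    requests.post(f"{AEM}/servers/{server}/tasks/{task}",
                  params={"action": "stop"}, headers=SESSION)

def resume_task(server, task):
    # Resume from the stored recovery context rather than reloading the target.
    requests.post(f"{AEM}/servers/{server}/tasks/{task}",
                  params={"action": "run", "option": "RESUME_PROCESSING"},
                  headers=SESSION)

stop_task("ReplicatePrimary", "orders_cdc")
resume_task("ReplicateStandby", "orders_cdc")
```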

 

Q: Is it recommended that the Replicate server be installed closer to the source system or closer to the target for better performance? For example, my source ERP is in the Europe region and my target is a cloud platform in the US region.

A: Usually it is better to have Replicate closer to the source. If the source is a database, we are reading something like the transaction log or redo logs, so it makes sense to have it there, especially in cases where there is a lot of filtering.

File channel is another kind of architecture, where you have a local and a remote Replicate server, one in each environment. The data is transferred very quickly between the two Replicate servers in an optimized, internal format. The file channel technology is supported, but it is a little outdated; the solutions for data transfer and compression over the network available on the market today are significantly better. So we recommend installing next to the source and utilizing some of the technology that is available today.

A lot of that depends on your latency requirements. If you do not have tough latency requirements, you may not need a remote Replicate server; but if the latency requirements are tighter, or there is a huge amount of data, then you may need one.

 

Q: What are the new features of the November Technical Preview?

A: In the Attunity days we used to call it either Beta or the Early Access Program, but we are adopting Qlik's software delivery practices, and one of those is the Technical Preview. The Technical Preview can be found on the Qlik Community and is open only to existing customers and internal users who have an account. Every time we have a major release and it makes sense, we run a Technical Preview, and you will have access to the software three to four weeks before the product becomes GA. It is available right now: go to Qlik Community > Product Insight & Ideas > Technical Preview, then filter on Qlik Replicate. All the information on the new features is available there, along with documentation. The official release date for the November release is November 10th, 2020.

 

Q: How can Replicate ingest LOB datatype columns?

A: It really depends on what the source is, but in general Replicate does ingest and deal with LOB data types. Look in the Replicate user guide to see what it actually does with the specific source endpoint or database type you are reading from; there are some limitations depending on the database. The other thing you have to be aware of is where you are replicating the data to, because that endpoint type or database will potentially have its own limitations. Replicate will deal with both ends, but it may not do what you want it to do.

 

Q: Where can users find the Qlik Replicate User Guide?

A: Help.qlik.com is where you will find the user guide for the current and past versions of Qlik Replicate. Currently they are in PDF format but with the November 2020 release, we will be releasing our documentation in HTML format. If you join the Technical Preview, you’ll have a link to the beta site to view this new format.

 

Q: Can the Replicate server service and the Replicate UI service be run by the same user on a Windows server? Does that user need to have an admin role?

A: Yes, they can be run by the same user. The user does not necessarily have to have an admin role, but it has to have the ability to edit files, because both the console and the server can be writing task logs and other logging information and updating status files, so it needs quite a few permissions on the local server. It may also be interacting with the domain controller to authenticate users, so it needs quite a bit of security.

For Linux, there is no separate Replicate UI service; you run the Linux Replicate server service and can use the Windows UI with it. On Linux, part of the installation is specifying an "Attunity user", which does not need to be root. The installation process, which has to be run as root or via sudo, changes the ownership and permissions on all the files that are created so that the Replicate user account has access to all of them. That upholds the requirements about being able to write the log files, the status files, and everything else.

 

Q: Any best practice recommendations while replicating data from SAP HANA to AWS S3?

A: As we talked about before with cloud connections, make sure you are using something like AWS Direct Connect so that you have a good, solid connection to the cloud.
