Skage
Partner - Creator

Occasional QVS distribution is refused

Hi 

In a 2-QVS + 2-QDS environment, tasks occasionally end up with the status "Warning".

According to the log, one node does not have "Upload enabled", so the task tries the other QVS. This setting IS enabled, and it is a cluster setting, so how can one node report it as not enabled?

 

This is from the logs.

------------------------------------------

(2024-07-02 04:47:11) Information: Trying to distribute. QVS=qvp://server1/

(2024-07-02 04:47:11) Information: Connecting to QlikView Server. Address=server1:4747

(2024-07-02 04:47:11) Information: Successfully connected to QlikView Server. Address=server1:4747

(2024-07-02 04:47:11) Warning: QVS does not have upload enabled. QVS=qvp://server1/

(2024-07-02 04:47:11) Warning: Could not connect. QVS=qvp://server1/

(2024-07-02 04:47:11) Information: Trying to distribute. QVS=qvp://server2/

(2024-07-02 04:47:11) Information: Connecting to QlikView Server. Address=server2:4747

(2024-07-02 04:47:11) Information: Successfully connected to QlikView Server. Address=server2:4747

(2024-07-02 04:47:13) Information: Creating/Updating file: qvp://server2/folder/Application.qvw

------------------------------------------

I've checked the Settings.ini on both machines and they are identical. Is this setting stored somewhere else?
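In case it helps anyone checking the same thing, this is the kind of minimal sketch I use to diff the two files side by side. The admin-share paths, the default ProgramData location and the encoding are assumptions; adjust them to your environment.

------------------------------------------

from pathlib import Path
import difflib

# Paths are assumptions: admin shares plus the default ProgramData location.
# The file encoding may also differ (e.g. UTF-16) and may need adjusting.
FILES = {
    "server1": Path(r"\\server1\C$\ProgramData\QlikTech\QlikViewServer\Settings.ini"),
    "server2": Path(r"\\server2\C$\ProgramData\QlikTech\QlikViewServer\Settings.ini"),
}

contents = {
    node: path.read_text(encoding="utf-8-sig", errors="replace").splitlines()
    for node, path in FILES.items()
}

diff = list(difflib.unified_diff(
    contents["server1"], contents["server2"],
    fromfile="server1/Settings.ini", tofile="server2/Settings.ini", lineterm="",
))

print("\n".join(diff) if diff else "Settings.ini files are identical.")

------------------------------------------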

This would be easier to explain if no connection could be made at all, or if the node did not answer on 4747, but the message is explicit: a setting is not enabled.

 

Does anyone have input that might explain or even remedy this?

/lars

6 Replies
Jill_Masinde
Support

@Skage Do you see this status "Warning" in the QMC? This is not an error, just a warning message recorded when services are not able to communicate.

Did you review the entire task log, and are there no other warnings or errors? If you are seeing this in the QlikView Management Console, review the following:

  • Check whether any node has a service down while the QVS process is still alive.
  • Review resource usage on the QlikView Server node. If the QlikView Server is unable to respond in time due to CPU or memory starvation, connection attempts will time out (a minimal snapshot sketch follows this list).
  • If one of the nodes is turned off and no longer required, its cluster URL should be removed from the cluster group.
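For the resource-usage point, a minimal sketch that could be run on the QVS node (for example from a scheduled task around the distribution window) to capture a CPU/memory snapshot. It assumes the third-party psutil package is installed, and the thresholds are purely illustrative.

------------------------------------------

import datetime

import psutil  # third-party: pip install psutil

# Snapshot CPU and memory on the QVS node around the distribution window.
cpu = psutil.cpu_percent(interval=5)   # average CPU % over a 5-second window
mem = psutil.virtual_memory()          # system-wide memory usage

stamp = datetime.datetime.now().isoformat(timespec="seconds")
print(f"{stamp} CPU={cpu:.0f}% MEM={mem.percent:.0f}% "
      f"({mem.available // (1024 ** 2)} MB free)")

# Illustrative thresholds only; tune to the environment.
if cpu > 90 or mem.percent > 90:
    print("Possible resource starvation at this point in time.")

------------------------------------------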
Skage
Partner - Creator
Author

@Jill_Masinde 

Yes, the task has the status "Warning".

The other node accepts the upload, so the task is completed.

My question is why the QVS would, at random, claim that a setting that is set correctly is not present. What else besides the setting is involved in this check? Disk access, network connectivity...?

My suspicion is that more than the setting is checked, and that the error message does not reflect the real reason, only the conclusion: uploads can't be performed at this moment.

 

I've checked Windows Events, Engine logs and Proxy logs for errors/warnings that might explain why this node suddenly claims that the upload setting is not enabled.

The Engine performance logs show very low resource consumption. Later the same morning several other uploads are accepted, and on most days all uploads are accepted on this node.

Both front-end nodes are in use. These machines perform no reloads, and this happens in the very early morning, so resource consumption should be low.

Skage
Partner - Creator
Author

@Jill_Masinde 

These "warnings" keep happening and the customer would like to know why one node at random is claiming why a settings is considered "not set".

There is no sign of resource starvation and all services are running fine. Port 4747 is QVS, and that is definitely running on the front-end machines.
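To rule out plain TCP reachability, and to have something concrete to show the infrastructure team, a minimal check like the sketch below could be run from the QDS machine to confirm that both nodes answer on 4747. The hostnames are the placeholders from the log excerpt.

------------------------------------------

import socket

QVS_NODES = ["server1", "server2"]  # placeholders from the log excerpt
QVS_PORT = 4747

for host in QVS_NODES:
    try:
        # Open and immediately close a TCP connection to the QVS port.
        with socket.create_connection((host, QVS_PORT), timeout=5):
            print(f"{host}:{QVS_PORT} accepted the connection")
    except OSError as exc:
        print(f"{host}:{QVS_PORT} failed: {exc}")

------------------------------------------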

The reason the task only gets the status "Warning" is that the refusal happens on one node only, and the next node accepts the communication.

The failed connections still happen at random, and when distributing apps to the users they would rather not rely on luck that both nodes don't claim "upload disabled" at the same time. Understanding whether this is a problem within QlikView or in their environment would be valuable.

marcus_sommer

That this happens randomly shows that the general functionality is working. Therefore the main cause is probably outside of Qlik, most likely some delay caused by too high a workload on the machines and/or the network, resulting in a timeout - the message does not point to this directly; it rather shows a consequential effect.

Finding and resolving such issues can be quite hard - especially if the various performance logs (QV + OS + network) don't contain appropriate hints; in that case tools like Process Monitor and/or Wireshark are needed to track what is happening.

Chip_Matejowsky
Support

Hi @Skage,

I agree with @marcus_sommer 's comment. I have seen this issue a few times and believe it is due to a high workload on QVS 1. You'll notice in the timestamps of the log you posted that the transition from QVS 1 to QVS 2 is almost instantaneous, so this transition between QVS nodes is likely due to the polling that occurs between the QDS and QVS nodes.
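To put a number on that across more runs than the single excerpt, a minimal sketch like the one below reports how much time passed between the "upload not enabled" refusal and the next log entry. The log path is hypothetical and the timestamp parsing follows the format shown in the excerpt above.

------------------------------------------

import re
from datetime import datetime

LOG_FILE = r"C:\temp\TaskLog.txt"  # hypothetical path to a QDS task log
STAMP = re.compile(r"^\((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\)\s+(.*)$")

entries = []
with open(LOG_FILE, encoding="utf-8", errors="replace") as fh:
    for line in fh:
        match = STAMP.match(line.strip())
        if match:
            entries.append((datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"),
                            match.group(2)))

# Report the gap between the "upload not enabled" refusal and the next event.
for (t_prev, msg_prev), (t_next, msg_next) in zip(entries, entries[1:]):
    if "does not have upload enabled" in msg_prev:
        gap = (t_next - t_prev).total_seconds()
        print(f"{t_prev}  +{gap:.0f}s  ->  {msg_next}")

------------------------------------------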

I do agree that the verbiage of the warning is somewhat misleading. As @Jill_Masinde noted previously, since this is a warning and not an error, the end result is that the reload task is successful. If you begin to see actual task failures, then please do contact Support if you need assistance in determining the root cause.

Best Regards

Principal Technical Support Engineer with Qlik Support
Help users find answers! Don't forget to mark a solution that worked for you!
Skage
Partner - Creator
Author

@Chip_Matejowsky and @marcus_sommer 

I've finally had some time to dig a bit deeper into this.

This is NOT a case of high load on the QVS, and I can't find any trace of anything that points to something outside QlikView.

All machines involved are far from saturated during every time frame in which this has happened over the last month. The QVS event log shows nothing out of the ordinary. Node 1 shows "socket closed by client" at the same timestamp, so the initial connection on 4747 is successful, as the task log also shows. The other QVS node has log entries regarding the subsequent upload.

I do not agree that moving directly to the other node indicates that timeouts are involved. The QDS successfully connects to node 1, requests upload support, gets a rejection immediately and moves on to the other node, all within the same second. If more time were involved, then perhaps. But this is a situation with no known/visible network issues and no saturation... so it seems to be either a deliberate answer from node 1 or what looks like a complete fabrication by the QDS. It would be interesting to know what code is involved and what decisions went into the log statement.

It is correct that the TASK only has a warning, but the task log does show a clear trace of an intermittent problem.

This is not their first intermittent problem this year; the previous one turned out to be a bug.

From what I can see, tasks always probe the QVS nodes in the same order. It is also always the first node that claims that the upload is not enabled. The second QVS node has never refused an upload.

There are many, many tasks and uploads per day, and it is only occasionally, at random, that the first node fails to accept an upload.
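To back that up with numbers, this is the kind of minimal sketch one could use to count the refusals per QVS node across all QDS task logs. The log root and the file glob are assumptions; point them at the QDS application data folders and adjust the glob to the task-log naming used there.

------------------------------------------

from collections import Counter
from pathlib import Path
import re

# Log root and file pattern are assumptions; adjust to the QDS log folders.
LOG_ROOT = Path(r"\\qds1\d$\QlikView\DistributionService")
PATTERN = re.compile(r"QVS does not have upload enabled\. QVS=(\S+)")

refusals = Counter()
for log_file in LOG_ROOT.rglob("*.txt"):
    try:
        text = log_file.read_text(encoding="utf-8", errors="replace")
    except OSError:
        continue
    for match in PATTERN.finditer(text):
        refusals[match.group(1)] += 1

for qvs, count in refusals.most_common():
    print(f"{qvs}: {count} refused uploads")

------------------------------------------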

Yes, it is "only" an annoyance and a warning but the reason for asking was to be able to explain to my customer WHY a task claim that a QVS is not configured correctly so it can be properly taken care of. The wording seems to indicate that QVS know something or that the following QDS-probe after initial connection on 4747 is failing somehow.

Sure, this could be "environmental", but then it would be great to have something tangible, or even a hint, to pass on to the teams that would have to act on it. At the moment I have nothing to pass on.

Seems like a support ticket would be the way forward.

I was hoping that someone else had run into the same situation, but I guess not. Thank you for helping me exclude some parameters.

/lars