I need some help with a strange error that I haven't seen before and can't figure out. We are seeing a publish-only task fail with the following errors. This particular job hasn't published in two days, always with these errors. The strange part is that the job shows on the AccessPoint as being published. Everything seems normal when looking only at the AccessPoint. Has anyone seen anything like this before: a distribute-only job failing but actually publishing correctly? Any advice on how to fix it?
Error: QVClient.Execute failed: System.Net.Sockets.SocketException (0x80004005): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond(2014-10-23 12:47:05)
Information: Performance data (CopyFile): Connecting to QVS. (0), DMSAccessRules=4 (0), Uploading file: AccesPoint/ABC.qvw (16), Write complete. (1323576576 bytes processed) (70388)(2014-10-23 12:47:05)
Error: QVS Distribution failed to distribute to all nodes in the QVS Cluster. NodeCount=2(2014-10-23 12:47:05)
Error: Distribution to resources failed with errors. Warnings: 0, Errors: 3
Hi, Brian, long time no speak, hope all is well. Is your Distribution Service by any chance on a different box than your Directory Service Connector? If so, install a DSC on the Publisher box and always use the local one.
Good to hear from you. Hope you have been well. We have two servers, one for publishing and one for the front-end applications. They are clustered together, but the Distribution Service is only running on the one server. We have the DSC running on both servers, but last night I tried turning it off on either server and the job still fails. We are only having this issue with two jobs, and both are over 1GB in size. Some of the other publish jobs are getting the warning below, but they still seem to work fine, although I'm sure it's all related.
Warning: QVS Distribution failed to distribute to one or more nodes in the QVS Cluster, but one node completed with success. NodeCount=1
My guess is that the file is only dropped from RAM when it is successfully replaced by another file with the same name, which would explain why you still see the dashboard in the AccessPoint. But this is only a guess.
If you open QMC >> System >> Distribution Services, make sure Publisher is configured there to use the DSC that is on its own machine. I've seen that error come up when Publisher has to reach across a network to authenticate a distribution list. In general, I install a DSC on every single one of my QlikView machines. There is no advantage to having it run on only one box IMO.
I checked the Distribution Service and it is pointing to its own machine. I tried changing the chunk size to 20, as I read in a few other posts, but that didn't work either.
I wonder if, because of your file size, you're hitting some sort of timeout when copying the file (QV assuming it failed if it didn't return "success" within a certain amount of time).
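To put rough numbers on the timeout idea, here is a minimal Python sketch based on the "Performance data (CopyFile)" line from the task log. It assumes the trailing "(70388)" is elapsed milliseconds, which the log does not state, and the timeout value passed in is purely hypothetical since the actual QlikView limit is not known from this thread.

```python
# Figures taken from the "Write complete" entry in the task log above.
BYTES_WRITTEN = 1_323_576_576
ELAPSED_MS = 70_388  # ASSUMPTION: the trailing "(70388)" is milliseconds


def throughput_mb_per_s(n_bytes, elapsed_ms):
    """Average copy rate in MB/s (1 MB = 1,000,000 bytes)."""
    return n_bytes / (elapsed_ms / 1000) / 1_000_000


def exceeds_timeout(elapsed_ms, timeout_s):
    """True if the copy took longer than a given timeout in seconds.

    timeout_s is a HYPOTHETICAL value; the real limit, if any,
    is not documented in this thread.
    """
    return elapsed_ms / 1000 > timeout_s


if __name__ == "__main__":
    rate = throughput_mb_per_s(BYTES_WRITTEN, ELAPSED_MS)
    print(f"average copy rate: {rate:.1f} MB/s")
    for candidate in (30, 60, 120):  # hypothetical timeouts to compare against
        print(candidate, "s timeout exceeded:", exceeds_timeout(ELAPSED_MS, candidate))
```

If the smaller jobs that only produce warnings finish their copy well under a minute while the two 1GB jobs do not, that would be consistent with a time limit somewhere in between.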
Another thing it might be is a DSC timeout. I've sometimes seen cases where over-zealous IT admins block every port, and that causes huge delays in authentication, since the services have to query the DC over a small range of ports.
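A quick way to test the blocked-port theory from the Publisher box is a plain TCP connect check. This is only a sketch: the port list uses the standard Kerberos (88), LDAP (389), LDAPS (636) and Global Catalog (3268) ports as an assumption about what your DC exposes, and the host name in the usage note is a placeholder.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_ports(host, ports=(88, 389, 636, 3268)):
    """Map each port to whether it accepted a TCP connection.

    The default ports are an ASSUMPTION (standard AD services);
    adjust them to whatever your environment actually uses.
    """
    return {port: port_open(host, port) for port in ports}
```

Calling `check_ports("dc01.example.local")` (hypothetical host name) from both Stage and Prod and comparing the results should show quickly whether a firewall rule differs between the two.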
Check the timestamps of the task log events and see if, especially towards the end, you notice any unusually long delays and post back here.
Also cross-reference with any QVS Event logs around the same time as your task completed, as well as Event Viewer Application logs.
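If the task log is long, a small script can find the delays for you. The sketch below assumes each log entry ends with a "(YYYY-MM-DD HH:MM:SS)" stamp like the excerpts posted above; the function name and the 60-second threshold are my own choices, not anything QlikView defines.

```python
import re
from datetime import datetime

# Matches the "(2014-10-23 12:47:05)" style stamp seen in the task log excerpts.
STAMP = re.compile(r"\((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\)")


def find_gaps(lines, threshold_seconds=60):
    """Return (gap_seconds, previous_line, current_line) for each pair of
    consecutive stamped lines that are more than threshold_seconds apart.
    Lines without a stamp are skipped."""
    gaps = []
    prev_time = None
    prev_line = None
    for line in lines:
        stamps = STAMP.findall(line)
        if not stamps:
            continue
        # Use the last stamp on the line, since the timestamp comes at the end.
        current = datetime.strptime(stamps[-1], "%Y-%m-%d %H:%M:%S")
        if prev_time is not None:
            delta = (current - prev_time).total_seconds()
            if delta > threshold_seconds:
                gaps.append((delta, prev_line.strip(), line.strip()))
        prev_time, prev_line = current, line
    return gaps
```

Feeding it the lines of the task log (for example `find_gaps(open(path).readlines())`, with `path` pointing at your own log file) lists the entries on either side of each long pause, which you can then line up against the QVS and Event Viewer logs.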
I have checked the Event Logs and the QlikView logs, and the only entries that stand out to me are the errors below from the DSC. They are close to the time of the failures but don't line up exactly, so I don't know if they are the problem.
(ActiveDirectory.ADItem) Failed to initialize node : Invalid DirectoryEntry (objectGUID/sAMAccountName)
(ActiveDirectory.ActiveDirectoryProvider) Node initialization exception: Invalid DirectoryEntry (objectGUID/sAMAccountName)
I believe we have found a workaround, however. I don't know if I like having to do it this way, but it does seem to solve the problem. We changed the task from Distribute to Server to Distribute to Folder, and it is working just fine. Unless you have any other thoughts, we will just leave it this way since it is working.
Yes, it shouldn't have any problems. This particular problem is only happening on our Stage environment, but the same jobs, under the same service account, run fine on Prod. Everything is set up the same between the two environments.