BizTalk Windows servers lose connection to NAS storage, resulting in backlog accumulation.
We've observed a really strange error over the last week or so that has us pulling our hair out.
We run a pretty busy Microsoft BizTalk Server on three servers in a cluster and we use a file share (an enterprise NAS with a mountpoint we access) to read and write files from.
We define it like: \\our_fileshare\files
which is a CNAME alias in AD.
The NAS itself exists as NAS01, so we can just as well access it as: \\NAS01\files
However, since last week, one of the servers will suddenly stop being able to access \\our_fileshare\files
The error can occur on just one server, on any combination of them, or on all three. It doesn't hit all three at the same time, but if left unattended it eventually does, which then blocks all traffic.
BizTalk throws an error that says "Could not save file to disk!" and File Explorer says "Could not connect to \\our_fileshare\files". If we try it from a command prompt or from Windows+R, it states "Insufficient system resources exist to complete the requested service". The same error occurs if we run any of our own in-house developed .NET applications.
The strange thing, though, is that when \\our_fileshare\files gives us that error, we can, on the same server, access the file share using \\NAS01\files just fine.
So we thought there might be a DNS issue, and while the DNS guys were looking into it, we changed BizTalk to use \\NAS01\files instead. We did that today and, lo and behold, the issue arose AGAIN. Same errors.
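To make the DNS suspicion checkable, here is a minimal sketch that resolves each name and lets you compare the addresses on an affected server. The name list and the `resolve`/`compare` helpers are illustrative, not something we already run. Note that `socket.getaddrinfo` uses the normal Windows name resolution path, so a short name like NAS01 may also be answered by something other than DNS.

```python
import socket

# Host names as used in this post; adjust to the real environment.
NAMES = ["our_fileshare", "NAS01", "NAS01.ourdomain.local"]

def resolve(name, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated addresses for a name, or [] on failure."""
    try:
        infos = resolver(name, 445)  # 445 is the SMB port
    except OSError:  # socket.gaierror is a subclass of OSError
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})

def compare(names=NAMES, resolver=socket.getaddrinfo):
    """Map every name to the addresses it currently resolves to."""
    return {name: resolve(name, resolver) for name in names}

# Example: print(compare())
```

If all three names resolve to the same address while one of them still fails to open, name resolution is probably not where the problem sits.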
So now \\NAS01\files won't work, BUT \\our_fileshare\files works fine, AND a new thing we've found out is that \\NAS01.ourdomain.local\files works fine as well.
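Since the failing name changes over time, it could help to log which of the three names work at a given moment. The following is only a sketch under assumptions: the path list comes from this post, `probe` and `probe_all` are made-up helper names, and listing the share root is just one possible way to test access.

```python
import os
from datetime import datetime, timezone

# UNC paths as used in this post; adjust to the real environment.
SHARE_NAMES = [
    r"\\our_fileshare\files",
    r"\\NAS01\files",
    r"\\NAS01.ourdomain.local\files",
]

def probe(path, lister=os.listdir):
    """Try to list the share root and return (ok, detail)."""
    try:
        lister(path)
        return True, "ok"
    except OSError as exc:
        # winerror only exists on Windows; 1450 is ERROR_NO_SYSTEM_RESOURCES,
        # the "Insufficient system resources exist..." message.
        code = getattr(exc, "winerror", None) or exc.errno
        return False, f"error {code}: {exc.strerror}"

def probe_all(paths=SHARE_NAMES, lister=os.listdir):
    """Probe every path and return rows of (utc_timestamp, path, ok, detail)."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return [(stamp, path, *probe(path, lister)) for path in paths]

# Example: for row in probe_all(): print(row)
```

Run on a schedule on each of the three servers, the rows would show when a given name starts failing and whether the other names keep working at the same time.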
Does anyone have ANY hints as to where we can look to find the root cause of this issue? The only resolution we currently see is to EITHER reboot the Windows Server in question OR restart the "Workstation" service in Windows Services.
I've tried googling the error and have read endless MS docs and various Q&As here and elsewhere, but I can't find anything that works. I saw one question that matched this perfectly, regarding a corrupt offline files cache, but CSC is set to startup type "Disabled" on our servers, so no luck there either.
We're running Windows Server 2019.