Distributed transactions not working
Question
Thursday, August 21, 2014 5:00 PM
Hi.
Please read this post carefully as I believe this is going to be a complex issue to resolve and I trust we've eliminated all commonly known causes while troubleshooting. We've had two DBAs and two infrastructure analysts working on this problem for more than a week with no progress.
I've been having trouble running distributed transactions between a clustered (WSFC 2012) and a standalone instance of SQL Server 2012 Enterprise.
Both MSDTC services are running and with all those parameters enabled in component services (allow inbound/outbound connections, network access, no authentication required, etc etc etc). Both servers are able to resolve DNS and reverse DNS of each other, even though they are in different domains. Distributed transactions are able to run fine between instances belonging to the cluster.
In both the standalone and the clustered instances, the following error is returned when attempting to run a distributed transaction:
OLE DB provider "SQLNCLI11" for linked server <LINKED SERVER NAME> returned message "No transaction is active.".
The operation could not be performed because OLE DB provider "SQLNCLI11" for linked server <LINKED SERVER NAME> was unable to begin a distributed transaction.
When running a distributed transaction started from the clustered instance, the following error is generated in the sql server error log in the REMOTE, standalone instance:
"Recovery of any in-doubt distributed transactions involving Microsoft Distributed Transaction Coordinator (MS DTC) has completed. This is an informational message only. No user action is required."
This is NOT observed the other way around.
DTC Ping was returning the following message for either instances:
There are no more endpoints from the endpoint mapper
Acording to the link below, this sugests either a firewall problem OR the operating system running out of DCOM ports.
However, we have already ruled out these options as there are no firewalls between these hosts and the server in which both SQL Server instances are running are dedicated to them, so there are no other running services which could cause port overlapping that i know of. Listing open sockets in command prompt confirms this. We have also tried to increase the port range for MSDTC manually, but that didn't change anything. The ONLY open socket, in both servers, that is documented for MSDTC is port 135, which is used for RPC so i believe its alright. There are no other open sockets for ports 1434 or 5100-5200.
After we installed a few aditional .net components on both servers, DTCPing test started to run successfully from the clustered instance to standalone, and clicking "run" in standalone did nothing. The error messages in SQL Server remained as they were.
One interesting remark, although I'm unsure wether this has anything to do with the problem, is that the physical machines in which the cluster nodes run have two MSDTC services, one local and one clustered. I've never seen this setup before. Strangely, the local, disabled MSDTC service displays a uniqueidentifier in services.msc that is listed as a clustered resource in regedit
This service wont start and will generate the following errors in event viewer:
MS DTC log file not found. After ensuring that all Resource Managers coordinated by MS DTC have no indoubt transactions, please run msdtc -resetlog to create the log file.
MS DTC Transaction Manager start failed. LogInit returned error 0x3. This is expected if the corresponding volume has been restored (search for kb article regarding the issue). In all other cases, contact product support.
OBS.: The clustered MSDTC service is running fine and DT works between two different active clustered instances, either on the same node or on distinct ones.
When running a distributed transaction from one of the clustered instances to the standalone instance, the transaction is listed as "active" for long period of time in the clustered server before timing out in SQL Server and aborting. It is NOT, at any time, listed as an in-doubt transaction.
Changing the server parameter "in-doubt xact resolution" to 1 in both instances didn't have any effect either.
Questions:
- Could this problem be related to the presence of multiple MSDTC services in the clustered instance?
- Why won't the MSDTC local service start in the clustered service?
- Why is the in-doubt transaction message generated for the standalone instance, but not for the clustered one?
- Does MSDTC even work between a clustered and a standalone SQL Server instance???
- In the clustered instance, SQL Server resource is not dependant on MSDTC. Could this be interfering with transactions outside the clustered instances?
All replies (4)
Monday, August 25, 2014 9:55 AM ✅Answered
Hi Samir
Currently, I can only answer two of your questions.
Q1: multiple MSDTC services are supported on a clustered node- local MSDTC and clustered MSDTC, or multiple clustered MSDTC. Which MSDTC will be used by SQL server is determined by this:
Q4: yes, it works
For the error below, please refer to http://technet.microsoft.com/en-us/library/cc774113(v=ws.10).aspx for more information.
"MS DTC log file not found. After ensuring that all Resource Managers coordinated by MS DTC have no indoubt transactions, please run msdtc -resetlog to create the log file."
Moreover, for the description:
"Strangely, the local, disabled MSDTC service displays a uniqueidentifier in services.msc that is listed as a clustered resource in regedit"
It indicated the "local" MSDTC service is actually clustered. I don't know why it is said as "local". Still the same URL below can help you identify your MSDTC configurations.
Tuesday, September 9, 2014 10:26 PM ✅Answered
For further reference, I managed to solve the problem by adding entries in LMHosts file for both servers so netbios would work, as they were in different domains. After the LMHosts file editing, DTC started to work both ways.
Monday, August 25, 2014 5:40 AM
Hi Samir ,
Thank you for your post.
I'm trying to involve someone familiar with this topic to further look at this issue. There might be some time delay.
Best Regards,
Tracy
Tracy Cai
TechNet Community Support
Tuesday, June 2, 2015 7:13 PM
Samir, hello
I have a very similar problem. Please eleborate on what entries did you insert in LMHosts. I tried adding
<ip> <server name> #PRE
but it didn't help.
Thanks