Troubleshooting DCOM

02/20/2014

Archived content. No warranty is made as to technical accuracy. Content may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.

By Dan Rogers

Technical Evangelist, Microsoft Corporation

[Adapted from the Business Application (BizApps) Conference in September 1998.]

Summary

You've just finished the most marvelous component-based application. It works great in development and it's time to spread it out across multiple servers. Deployment is a time when many MTS applications slip their schedules, whether due to actual problems in the application design or simply because getting distributed applications configured properly is hard. Finding bugs in applications that are based on components takes a certain degree of planning up front—planning such as anticipating that problems are inevitable and building in mechanisms for tracking them down at the time the application is designed. This paper covers several tactics for uncovering and eliminating problems that are common to COM-based distributed applications.

The following summary describes the topics we'll cover:

Section	Description
COM Security Settings	Security is essential. Security is a pain. Getting it right is the first step. This section covers common problems and your options.
Finding COM Security Problems	How can you tell if it's an application problem or a security problem? If you wonder which, then it's probably a security problem.
Setting up your Servers	Windows servers aren't DCOM friendly out of the box. Getting them there as part of a standard machine configuration helps.
Tools that crack DCOM bugs	Every Windows server and workstation comes with the tools you need to crack most DCOM bugs.

Mostly, this paper contains a loosely linked set of information that forms the basis for a method to analyze and eliminate common problems that plague almost every new application that relies heavily on distributed components and COM. You'll find a heavy bias towards assuming all bugs in tested applications are security setting related—why? Because 99 out of 100 problems you'll encounter in applications that work fine on one workstation but fail in a production environment are security related.

COM Security Settings

COM components expose a myriad of settings that need to be dealt with in an application deployment architecture plan. All too often, the unsuspecting friendly administrator is handed a bunch of COM components after the coding is done and asked to put them on "the server". If they're MTS components, our friendly administrator knows what to do—put them in a package. Here is where things can start going wrong. After all, which package should these things go in? All too often, the administrator is left with the choice and mistake number one happens:

Mistake #1: Package planning step is skipped.

Results: Potluck performance, random security problems

When designing a component-based application, security and packaging plans are essential project deliverables. The security plan identifies the application roles and lists the Windows users or groups that need to be made members of these roles. The packaging plan is the part of the application deployment plan that covers how COM components will be set up on the component server. It identifies which servers will contain which packages, and spells out the package properties, or settings, that an administrator (or installation script) will use when defining the deployment on a server.
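One way to keep the packaging plan honest is to write it down as data rather than prose. The following is a minimal sketch, not an MTS API: the class names, the server names (FABAPP1, FABAPP2), the account names, and the role and group names are all hypothetical, and only the package names FabExposed and FabQueue come from the sample application described later in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class PackagePlan:
    """One row of the packaging plan: where a package runs and how it is set."""
    name: str
    server: str
    identity: str                       # a Windows account name, or "Interactive User"
    security_enabled: bool = True
    activation: str = "Server"          # "Server" or "Library"
    roles: dict = field(default_factory=dict)   # role name -> list of Windows groups


@dataclass
class DeploymentPlan:
    """The full plan: every package on every server."""
    packages: list = field(default_factory=list)

    def packages_on(self, server):
        """Names of the packages the plan places on one server."""
        return [p.name for p in self.packages if p.server == server]

    def servers(self):
        """Every server named in the plan, sorted for stable output."""
        return sorted({p.server for p in self.packages})


# Hypothetical plan data, for illustration only.
plan = DeploymentPlan([
    PackagePlan("FabExposed", "FABAPP1", "pkg_fabexposed",
                roles={"OrderEntry": ["FAB_OrderClerks"]}),
    PackagePlan("FabQueue", "FABAPP2", "pkg_fabqueue"),
])

for server in plan.servers():
    print(server, plan.packages_on(server))
```

A plan in this form can be reviewed by the administrator before coding ends, and an installation script can be driven from the same data.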

Identity crisis: a new user type

Many businesses today have made Windows their choice for desktop and server environments. Applications that are common include desktop productivity suites like Microsoft Office, document exchange and electronic mail programs such as Outlook 98 and Exchange Server, and file and print services. Every day, millions of Windows users around the globe enter their ID and password and start using applications. Most businesses as a result have a good understanding of the concept of user identity and have business processes in place to support the business need to define new users (such as when someone is hired).

In environments like this, a first-time or early-adopter application that relies on COM components may run into deployment difficulties due to the need for every package or component to have an identity when it runs. Focusing on MTS components, there are two options for setting component identity – the interactive user, and a user account. These settings are found on the "identity" tab in the property settings for each package.

The default setting for any package is "interactive user". This works OK in development where the person testing the application is logged in on the developer workstation or server and actively exercising each component. The identity of the package inherits or assumes the Windows user identity of the person that is logged in on the machine where the component is running. However, in a hands-off server environment, like those found in production data center applications, there isn't anyone logged in. What happens to these components under that circumstance? Let's look at the meaning of Interactive User.

The "Interactive User" is a special concept in Windows. A Windows server can run any number of processes – and each of these processes has a specific user identity. For each user identity that is active in a system at a given time, a security context is established. This security context takes the form of a "window station"—or process group belonging to that user identity. Of all the window stations that are active, only one can have access to the keyboard and mouse—the system console. The window station, or session, that was started or resumed at the console is known as the interactive session. Its identity is the interactive user. So what's this mean from a DCOM perspective? Let's look closer at package identity to find out.

Package Identity

When an application is deployed or tested on a component server that has no one logged in at the system console, MTS components will fail to run if the package identity property is set to the default setting. This is because without anyone logged in, there is no interactive user, and Windows will refuse to guess which identity to assign to the components that run in the package.

Mistake #2: Package Identity not set

Results: Application components fail to start in production

To remedy this problem, the application security plan must identify a number of non-user Windows user account names. These non-user accounts are assigned to the packages that make up an application. Determining the right number of accounts and coming up with an account name for each required account is a design activity that should occur up-front. The Business Operations sample application that was created for the Business Applications Conference (BizApps) called for three non-user accounts in a full production set-up. These accounts were used to identify the package that the business document components ran in (the FabExposed package), the package that the message queue processing components ran in (FabQueue), and the Site Server commerce pipeline components that processed orders (IUSR_FABSS1).

The latter identity may look familiar to anyone that has installed Internet Information Server. This non-user account is created automatically when IIS is installed on a Windows server. Non-user accounts are commonly used in server processing environments today, and many applications create local non-user accounts when they are installed. Still, many administrators, and virtually every user account request process, are unaccustomed to the concept of an application needing a Windows account. This is one of the areas where the application architect needs to do some trailblazing when the first component-based applications are deployed.

A best practice is to include a "Non-User Account Requirements" section in the application deployment plan, and identify all of the non-user accounts that are required by an application before the application is coded. By socializing this need early on, and championing the cause to have non-user account requests made part of the mainstream application deployment process, problems like Mistake #2 won't happen any more.
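The "Non-User Account Requirements" list can be derived mechanically once each package has a planned identity. The sketch below assumes the plan is available as a simple mapping from package name to identity; the function names and the account names are hypothetical. It flags any package still left at the default, which is Mistake #2 waiting to happen on an unattended server.

```python
INTERACTIVE_USER = "Interactive User"


def _is_interactive(identity):
    """True when the identity is the MTS default, ignoring case and padding."""
    return identity.strip().lower() == INTERACTIVE_USER.lower()


def packages_left_at_default(package_identities):
    """Packages that would fail to start when nobody is logged in at the console."""
    return sorted(name for name, identity in package_identities.items()
                  if _is_interactive(identity))


def required_accounts(package_identities):
    """The distinct non-user accounts to request before deployment."""
    return sorted({identity for identity in package_identities.values()
                   if not _is_interactive(identity)})


# Hypothetical identities, for illustration only.
identities = {
    "FabExposed": "pkg_fabexposed",
    "FabQueue": "Interactive User",
}
print("Still at default:", packages_left_at_default(identities))
print("Accounts to request:", required_accounts(identities))
```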

Non-user accounts for DCOM component access require a special setting to be made on the server where the account will be employed. These accounts must be granted the "Log on as a batch job" user right. This setting is made with the User Manager tool, using the "User Rights" selection on the "Policies" menu. When you access this dialog, click on the "Show Advanced Rights" checkbox, and then scroll the "Rights" list down to the "Log on as a batch job" setting. Add the non-user accounts to the user list.

Note: When creating non-user accounts, a best practice is to establish a naming convention that system administrators will recognize at a glance. Some sites have adopted a naming convention that clearly identifies an NT non-user account as one intended to be used by MTS packages. For example: pkg_accting1 to represent the MTS package named accting1.
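A naming convention is easy to check by script. The sketch below assumes the pkg_ prefix from the example above and assumes that the rights held by each account have already been collected into a mapping by some other means; the helper names are hypothetical. SeBatchLogonRight is the internal Windows name for the batch logon right.

```python
BATCH_LOGON_RIGHT = "SeBatchLogonRight"   # internal name of the batch logon user right
PREFIX = "pkg_"                           # convention taken from the pkg_accting1 example


def account_name_for(package_name):
    """Derive the conventional account name for an MTS package."""
    return PREFIX + package_name.lower()


def follows_convention(account_name):
    """True when the account name carries the prefix plus a package name."""
    return account_name.lower().startswith(PREFIX) and len(account_name) > len(PREFIX)


def accounts_missing_batch_right(accounts, rights_by_account):
    """Accounts that have not been granted the batch logon right."""
    return sorted(a for a in accounts
                  if BATCH_LOGON_RIGHT not in rights_by_account.get(a, ()))


print(account_name_for("accting1"))
print(accounts_missing_batch_right(
    ["pkg_accting1", "pkg_fabqueue"],
    {"pkg_accting1": {BATCH_LOGON_RIGHT}},
))
```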

It doesn't change right away:

Here's one that gets almost everyone at least once, and a real deployment problem that crops up to bite you if you don't have your application configured properly before the user community starts to use your application heavily. The symptoms are somewhat strange: you make a change to a package property, and the application doesn't respond until some time later, usually several minutes to several hours. Instead of rebooting your server to force a package change, try a less drastic (but still drastic) step: stop the package after you make a configuration change.

Some care is called for, especially if the application is in production. Stopping a package is a drastic step that will impact every component in the package, affecting any active transactions that involve any of those components. However, if you need a package property change to take immediate effect, stopping the running package is the only option. For this reason, it is important that the settings for your packages are identified early on in the design stage of a project, and verified as part of the deployment process.

Mistake #3: Package settings not planned or settings checklist is skipped

Results: Application settings need to be adjusted later on, impacting users

A best practice is to define a project deliverable called "MTS Package Setting Checklist" that enumerates all of the settings that are called for by your application's MTS packages. If there are multiple servers involved, a separate checklist should be included with each of the package deployment plans for each of these servers. Before any users are given access to the application, run through all of the application requirements or use cases using representative "test" workstations. Don't forget to exercise the security requirements of the application by establishing enough "test accounts" for the application. These test accounts should include each user role, as well as "rogue users" who shouldn't have access to the application. Test plans should accommodate testing package settings and security in this way as well.
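The checklist pays off when it is compared against what is actually configured. Here is a minimal sketch of that comparison, assuming both the checklist and the observed settings have been written down as simple name/value mappings; the setting names and values shown are illustrative, not an export format that MTS produces.

```python
def checklist_differences(expected, observed):
    """Return (setting, expected value, observed value) for every mismatch."""
    differences = []
    for setting in sorted(expected):
        want = expected[setting]
        have = observed.get(setting, "<not set>")
        if have != want:
            differences.append((setting, want, have))
    return differences


# Hypothetical checklist for one package on one server.
expected = {
    "Identity": "pkg_fabexposed",
    "Enable security": True,
    "Activation": "Server",
}
# What a walk through the package property tabs actually found.
observed = {
    "Identity": "Interactive User",
    "Enable security": True,
}

for setting, want, have in checklist_differences(expected, observed):
    print(f"{setting}: expected {want!r}, found {have!r}")
```

Running the comparison before any users are given access keeps the fix (and the package stop it requires) out of production hours.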

Permission Denied:

Another problem related to package settings happens when administrators disable security on a package because there is a user on the phone that cannot access an application. This setting is accessed via the Security tab on the package "properties" dialog in the MTS explorer. On this tab you'll find a simple check-box called "enable security", and a drop-down list that lets you adjust authentication levels for component calls.

You know you have a security setting related problem when you can uncheck the "enable security" setting, and after stopping the package your problems disappear. The trouble is, disabling security checks is hardly the remedy to the problem. All too often, I've seen frantic developers called in by even more frantic administrators and the developers mysteriously "fix" the problem. This always happens when some vice president is in the middle of a live application demonstration.

Using the Enable Security checkbox is a valid debugging technique, and part of your MTS component debugging tool-belt. Since "Permission Denied" is such a common problem and can have a number of root causes, any simple technique that helps divide the problem space into 'yes, it's security related' or 'no, that didn't affect anything' is helpful. If you do use this tool, however, you must reset the security settings to the intended value before you call it a day. After all, many components will perform actions that should be guarded. As an administrator or trouble-shooter, it is difficult to know with any assurance which components in a given package are benign, and which perform operations that must be restricted to a small number of users.

Note: Use this approach with caution—opening a security gap, even for a moment, may not be an acceptable approach in some circumstances. In sensitive circumstances, be sure to restrict access to the application being tested by isolating the application from the rest of the network before disabling security in this manner.

The dreaded 'permission denied' has so many causes that it is impossible to tell you the one trick that will work when you see this message. This error can be caused by errors in package settings, role assignments, Windows security permissions, and server or client DCOM settings. However, by understanding the different tools at your disposal and the component parts that make DCOM settings come together in a distributed system, you should be able to eliminate problems in applications that come out of the design phase missing some vital information. Possible causes and remedies include:

Roles defined on package, Security is enabled: Under these conditions all user requests will be rejected when the MTS roles are unpopulated. Remedy: Define Windows security groups and add these groups to the MTS roles as users. Add human user identities to the Windows security groups.
DCOM access permissions on component server not configured: Applications that work locally on the server but fail to operate with remote calls can be caused by server DCOM permissions. Change default security settings for access and/or launch on the server with DCOMCNFG. Think twice—it is far too easy to "unsecure" an application with this tool.
The user isn't a member of the role or security group: Don't overlook this possibility, because it happens all too frequently. Start by checking the role definitions for a package. If roles are defined and security is enabled on a package, any application attempting to use a component will be rejected if the user identity of the application element making the COM call isn't a member of the roles that are protecting the package [1]. When you find the full scope of role protection on a package and determine that a user is getting 'permission denied' errors, first check that the user is entitled to the access.
It's too slow—beware Library packages: You have followed my direction and designed an application that exposes high level business document components and made these components call components that run in "hidden" MTS packages. Works great—you got the security settings just right—but it's not as fast as you'd hoped. After reading a little, you discover the "Activation" tab on the package property dialog and change the subordinate package to "Run as Library package" and performance picks up. Be careful—you just turned off security on the hidden package. While setting a check-box is a great way to quickly discover if you can improve performance, the better way to remedy this problem is to add hardware—faster processors, more memory, you know the drill. There is no budget that can fix a security breach—and if you expose sensitive operations in components in library packages you'll have a heck of a time explaining how much capital budget you saved to the CEO who wants to know why the problem happened.
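The first and third causes in the list above both come down to the chain from user to Windows group to MTS role. The sketch below models that chain with plain dictionaries so the reasoning can be followed step by step. It is an illustration of the checks described above, not a call into MTS, and the user, group, and role names are hypothetical.

```python
def roles_for(user, group_members, role_groups):
    """Roles the user reaches through Windows group membership."""
    groups = {g for g, members in group_members.items() if user in members}
    return sorted(role for role, gs in role_groups.items()
                  if groups.intersection(gs))


def diagnose_permission_denied(user, security_enabled, group_members, role_groups):
    """Walk the role-related causes in the order the list above gives them."""
    if not security_enabled:
        return "security is disabled on the package; role checks are not the cause"
    populated = [role for role, gs in role_groups.items()
                 if any(group_members.get(g) for g in gs)]
    if not populated:
        return "no role has any members; every caller is rejected"
    if not roles_for(user, group_members, role_groups):
        return "user is not in any role; confirm the user is entitled to access"
    return "role check passes; look at DCOM launch and access permissions next"


# Hypothetical groups and roles, for illustration only.
group_members = {"FAB_OrderClerks": ["alice"], "FAB_Managers": []}
role_groups = {"OrderEntry": ["FAB_OrderClerks"], "Approver": ["FAB_Managers"]}

for caller in ("alice", "bob"):
    print(caller, "->", diagnose_permission_denied(caller, True, group_members, role_groups))
```

Note that the last branch hands off to the DCOMCNFG settings rather than declaring the problem solved, which matches the second cause in the list.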

Finding COM Security Problems

How can you tell if it's an application problem or a security problem? If you wonder which, then it's probably a security problem. While this may sound glib, in reality it's a good axiom to follow. When we set up the BizApps production demonstrations, we had 11 servers involved, each playing a different role in the distributed system. While some were out of the picture, from a DCOM perspective, most of these servers managed at least a few components running in MTS. As a single, distributed digital nervous system, the interconnects between these different component servers, and the client workstations that used them, were complex. In an enterprise of any size, the number of interconnects between servers will grow rapidly as the number of deployed applications increases.

Planning and careful design are supposed to cure everything, but until our policy planners and application architects all understand COM security settings completely, a lot of time will be spent figuring out where the settings are broken.

Not every application that uses COM relies on MTS. Applications that pass object or component references to other components complicate security settings. In these cases, the security setting requirements move out to the client workstations. Consider the trace program that we used for the BizApps sample systems. This program uses a COM-based publish and subscribe metaphor to capture information from running components. A document on the conference CD describes how you can make the trace utility work on a single workstation, but during the conference, I used DCOM to run the Event Monitor program on the workstations that projected to the large screens on stage.

To make this happen, I needed to configure the workstation to allow DCOM callbacks. Since there are no checkboxes that are labeled "allow DCOM callbacks", I needed to figure out which DCOM security settings were called for on the workstation. Today, passing object references, using COM connection points, or raising events from a VB program are all common application design patterns. Whenever an application uses any of these mechanisms, the application deployment plan and COM settings checklist need to be modified.

Spotting design and implementation approaches that inject COM security setting requirements into the deployment plan takes experience. Gaining this experience takes practice with COM based applications across a number of applications. Despite sounding like the cliché job ad that requires 5 years of Java programming experience (and Java is a two year old language), the only way to get started is to get started. Hopefully, armed with a handy how-to like this paper, your discovery period won't be as long as mine was. Just remember that if it ran on one workstation or server and started to fail when you deployed it to multiple machines, odds are you have just encountered a security setting problem.

Setting up your Servers

A good starting point for eliminating DCOM security problems is when you set up a new server. Many companies are adopting standard server and workstation configurations to reduce their total cost of ownership. Since tracking down a DCOM related configuration problem could be very expensive after the fact, tremendous savings opportunities exist at the beginning.

When you first install Windows Workstation or Server, DCOM is disabled for remote calls. This is a good default when your business is selling operating system software, but a horrible default when your business depends on applications that require machine to machine DCOM calls. My recommendation is to apply the following steps to your standard configuration recipe:

Define and then enable default DCOM security settings: Two steps here. First, use DCOMCNFG to turn on "additional security for reference tracking". To do this, click on the Start menu, select Run, enter 'DCOMCNFG', and click OK. In the dialog that is displayed, select the "Default Properties" tab. Here you'll find the default settings for DCOM that affect the entire system. At the bottom of this dialog you'll find a check-box that enables additional security options that help you trace DCOM configuration problems.

DCOM default security settings (the next tab) impact all DCOM components that don't have explicit security settings. By default, only the local machine administrator account and "interactive user" have access to DCOM, and only administrator accounts have launch permissions. Launch permissions involve starting up a COM component the first time, while access involves using a DCOM component that is already running. The distinction is lost on most humans, but suffice it to say that there's a first time and the next time—launch is important.

Defining an enterprise policy on DCOM and applying it to a standard Windows configuration saves you from having to visit every single workstation or server when you decide you want to deploy COM applications. Use DCOMCNFG to set the default settings you've selected whenever you set up a new workstation or server.
Enable auditing: When you first set up a Windows server or Workstation, the default setting for system wide audits is "disabled". Turn on failure audits by using the user manager program, again accessing the Policies menu. This time, click on the "Audits" selection. Enable audits and select all of the "failure" check boxes and click OK. At this point, any DCOM security problems will start to show up in the system event log on the system where they occurred.
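A standard configuration is easier to enforce when a script can confirm it. The sketch below checks the machine-wide DCOM switch, on the assumption that it is stored as the EnableDCOM string value ("Y" or "N") under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Ole, which is where DCOMCNFG keeps it on the Windows versions this was checked against. The function names are hypothetical, and the registry read runs only on Windows.

```python
def dcom_enabled(enable_dcom_value):
    """Interpret the EnableDCOM registry string: "Y" means remote DCOM is on."""
    return str(enable_dcom_value).strip().upper() == "Y"


def read_enable_dcom():
    """Read EnableDCOM from the local registry (Windows only)."""
    import winreg  # imported here so the rest of the sketch loads on any platform

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, r"SOFTWARE\Microsoft\Ole") as key:
        value, _value_type = winreg.QueryValueEx(key, "EnableDCOM")
    return value


if __name__ == "__main__":
    try:
        state = "enabled" if dcom_enabled(read_enable_dcom()) else "disabled"
        print("Remote DCOM is", state)
    except (ImportError, OSError) as exc:
        # Not on Windows, or the value is missing or unreadable.
        print("Could not read the DCOM setting:", exc)
```

The same pattern could cover the other default settings in the policy, but their registry representation is not covered here.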

Unable to launch

Now that you've got your servers set up to start tracking DCOM related problems, I've got to mention a particular problem that is quite common, but falls outside of the security realm, sort of. This problem makes its appearance when you try to call a component, and the application hangs for a while and then just fails. The error message received at the client application will vary widely depending on the actions taken by the programmer that wrote the application. Most often, you see a big long hexadecimal error number that starts with "800" something (80070005, access denied, is a typical example). These error numbers are documented in the header files that come with Visual Studio. If you look them up, they are almost always dead on from a description perspective (where the description is limited to the text of the #define declaration), and usually end up being something like E_COULDN'T_DO_IT. The point here is that a meaningful error message is more useful than an obscure error code.
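Those long error numbers are HRESULT values, and they carry more structure than they appear to. The top bit marks failure, the next field names the facility that raised the error, and the low 16 bits hold the code itself. The sketch below splits a value into those parts using the same masks as the HRESULT macros in winerror.h. The small lookup table lists only a few well-known values and is nowhere near complete.

```python
FACILITY_NAMES = {0: "FACILITY_NULL", 1: "FACILITY_RPC", 4: "FACILITY_ITF", 7: "FACILITY_WIN32"}

# A few well-known failure codes; the header files list many more.
KNOWN_HRESULTS = {
    0x80004005: "E_FAIL (unspecified failure)",
    0x80070005: "E_ACCESSDENIED (access denied)",
    0x80040154: "REGDB_E_CLASSNOTREG (class not registered)",
    0x800706BA: "RPC server unavailable (Win32 error 1722)",
}


def decode_hresult(value):
    """Split an HRESULT into its failure bit, facility, and code."""
    value &= 0xFFFFFFFF            # VB reports these as negative decimals
    facility = (value >> 16) & 0x1FFF
    return {
        "hex": f"0x{value:08X}",
        "failed": bool(value >> 31),
        "facility": facility,
        "facility_name": FACILITY_NAMES.get(facility, "unknown"),
        "code": value & 0xFFFF,
        "meaning": KNOWN_HRESULTS.get(value, "look it up in winerror.h"),
    }


# The same error as VB shows it (negative decimal) and as C++ shows it (hex).
print(decode_hresult(-2147024891))
print(decode_hresult(0x80070005))
```

When the facility is FACILITY_WIN32, the code is an ordinary Win32 error number, so code 5 here is the familiar "access is denied".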

When we look at the system event log (more on this in the next section) we see what appears to be an internal failure inside of MTS. You'll see a message that indicates that a component failed while it was being initialized. There are two common causes for this, one of which actually turns out to be security setting related.

The first cause is that an error was thrown while the component was initializing itself. This happens when a VB programmer, typically, adds logic to a Class_Initialize method, and that logic does more than simply initialize some variables. Any error thrown in the object's "constructor" or initialization code causes MTS to shut the component down and log an error.

The second cause is that the package is configured to run as a specific identity, but that identity doesn't have the "Log on as a batch job" right. This is far more common and easily distinguished by testing the component with package identity set to 'interactive user'. If the application will start up as the interactive user, but not as the specific identity, then either the identity being used is no longer valid (which generates a security failure log message in the system security log) or the "Log on as a batch job" right hasn't been granted to that ID (which generates the weird MTS message).
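The test described above reduces to two observations, and writing the decision out makes it harder to skip a step under pressure. The following is a minimal sketch of that reasoning. The function and parameter names are hypothetical, and the first branch (the component fails even as the interactive user, so suspect initialization code) is an inference from the two causes above rather than something the event log states outright.

```python
def launch_failure_cause(starts_as_interactive_user, security_log_shows_logon_failure):
    """Map the two observations from the identity test to the likely cause."""
    if not starts_as_interactive_user:
        # Identity is not the variable, so look inside the component.
        return "error during component initialization; review Class_Initialize logic"
    if security_log_shows_logon_failure:
        return "package identity is no longer valid; check the account and password"
    return "package identity lacks the batch logon right; grant it in User Manager"


# Example: runs as interactive user, no failure entry in the security log.
print(launch_failure_cause(True, False))
```

Remember to set the package identity back to the planned account, and stop the package, once the test is done.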

Tools that Crack DCOM Bugs

Four essential tools that are standard parts of a Windows administrator's toolbox give you most of the information you'll ever need to track down a distributed COM settings bug. These tools are:

Event Viewer: (eventvwr.exe) This program lets you access the local system event logs or logs on any other server. Administrators may want to grant privileges to certain application support team members so that these people can look at the contents of the event log. Once auditing and extra DCOM security events are part of your standard server configuration, the system event log is the place to start when tracking down COM related problems. Event Viewer may be launched via the system "run" menu and is commonly located on the Administrative Tools menu.
Distributed COM Configuration tool: (dcomcnfg.exe) This program allows you to control permissions on COM packages – which isn't the same as MTS packages. COM packages represent the grouping of COM classes within a COM server DLL. In cases where there are multiple components in the same DLL file, only the first one of these will be visible via the DCOMCNFG component list. Security settings on a per component basis are controlled via this list – and this is where you go to set component identities, access control lists, etc on non-MTS components.

This tool also lets you set system wide defaults for network protocols used and supported by DCOM, as well as system wide security defaults for component access, launch/start-up, and to identify users that may alter the DCOM configuration itself. DCOMCNFG doesn't appear on any system menu structures and is easily accessed via the standard system "run" menu.
User Manager: (usrmgr.exe) This program allows you to define non-user security accounts, account groups, and define policy and audit settings. User manager may be launched via the system "run" menu and is commonly located on the Administrative Tools menu.
MTS Explorer: (mmc snap-in) The MTS snap-in lets you define packages, create package export sets, and define security roles and package configuration settings. The MTS Explorer is commonly located on the Transaction Server menu found under the NT Option Pack menu.

The event viewer is the most useful for tracking down distributed security problems. Once the system settings are correctly set, three logs are available: system, application and security. The security log displays a padlock icon when a security event logs a failure condition, and a key icon on success. Success cases aren't interesting when debugging security problems. The application log also contains information that is useful for tracking down the source of DCOM problems.
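On a busy server the three logs fill quickly, so it helps to narrow them to the entries that matter for a security hunt. The sketch below assumes the log entries have already been exported into a list of records with "log", "type", and "source" fields. That record layout is hypothetical, as is the choice of "DCOM" and "Transaction Server" as the event sources worth watching. It keeps failure audits and drops success audits, as suggested above.

```python
DEFAULT_SOURCES = ("DCOM", "Transaction Server")   # assumed event source names


def security_suspects(records, sources=DEFAULT_SOURCES):
    """Keep failure audits plus errors raised by the DCOM-related sources."""
    suspects = []
    for record in records:
        if record["type"] == "Failure Audit":
            suspects.append(record)
        elif record["type"] == "Error" and record["source"] in sources:
            suspects.append(record)
    return suspects


# Hypothetical exported entries, for illustration only.
entries = [
    {"log": "Security", "type": "Success Audit", "source": "Security"},
    {"log": "Security", "type": "Failure Audit", "source": "Security"},
    {"log": "System", "type": "Error", "source": "DCOM"},
    {"log": "Application", "type": "Information", "source": "Transaction Server"},
]

for entry in security_suspects(entries):
    print(entry["log"], entry["type"], entry["source"])
```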

The other tools are useful in establishing machine configurations. Changes made with User Manager take effect right away, while changes made with the MTS Explorer or DCOMCNFG often require that you stop the COM component for changes to take effect.

THIS INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND. MICROSOFT DISCLAIMS ALL WARRANTIES, EITHER EXPRESS OR IMPLIED, INCLUDING THE WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL MICROSOFT CORPORATION OR ITS SUPPLIERS BE LIABLE FOR ANY DAMAGES WHATSOEVER INCLUDING DIRECT, INDIRECT, INCIDENTAL, CONSEQUENTIAL, LOSS OF BUSINESS PROFITS OR SPECIAL DAMAGES, EVEN IF MICROSOFT CORPORATION OR ITS SUPPLIERS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. SOME STATES DO NOT ALLOW THE EXCLUSION OR LIMITATION OF LIABILITY FOR CONSEQUENTIAL OR INCIDENTAL DAMAGES SO THE FOREGOING LIMITATION MAY NOT APPLY.

1	In some cases, designers add declarative MTS protection by assigning roles to specific interfaces exposed by components. You need to drill into the Interfaces folder under the component to check this rare case.