So what exactly IS COM anyway?

2004-10-15

A couple of days ago, David Candy asked (in a comment on a previous COM-related post) what exactly COM was.

Mike Dimmick gave an excellent answer to the question, and I'd like to riff on his answer a bit.

COM is just one of three associated technologies: RPC, COM and OLE (really OLE Automation).

Taken in turn:

RPC, or Remote Procedure Call, is actually the first of the "Cairo" features to debut in Windows (what, you didn't know that there were parts of Cairo already in Windows? Yup, actually, almost all of what was called "Cairo" is currently in Windows).

RPC provides a set of services to enable inter-process and inter-machine procedure calls. The RPC technology is actually an implementation of the DCE RPC specification (the DCE APIs are renamed to be more Windows-like), and is on-the-wire interoperable with 3rd party DCE implementations. RPC deals with two types of entities, clients and servers. The client makes requests, and the server responds to those requests. You tell RPC about the semantics of the procedures you're calling with an IDL file (IDL stands for "Interface Definition Language" - it defines the interface between client and server). IDL files are turned into C files by MIDL, the "Microsoft IDL compiler".

When RPC needs to make a call from one process to another, it "marshalls" the parameters to the function call. Marshalling is essentially the process of flattening the data structures (using the information in the IDL file), copying the data to the destination and then unpacking the flattened data into a format that the receiver can use.
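To make the flatten / copy / unpack steps concrete, here is a minimal sketch in portable C++. It is not what MIDL generates: the `Order` struct, the `Marshal` and `Unmarshal` helpers, and the wire format (a length-prefixed string followed by a 32-bit integer, both little-endian) are all assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical data structure that a client wants to pass to a server.
struct Order {
    std::string customer;    // variable-length data: cannot be sent as raw struct bytes
    std::int32_t quantity;
};

// Flatten: write a 32-bit length, the string bytes, then the integer.
std::vector<std::uint8_t> Marshal(const Order& o) {
    std::vector<std::uint8_t> buf;
    auto put32 = [&buf](std::uint32_t v) {
        for (int i = 0; i < 4; ++i)
            buf.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    };
    put32(static_cast<std::uint32_t>(o.customer.size()));
    buf.insert(buf.end(), o.customer.begin(), o.customer.end());
    put32(static_cast<std::uint32_t>(o.quantity));
    return buf;
}

// Unpack: rebuild an Order in the receiver's own memory from the flat bytes.
Order Unmarshal(const std::vector<std::uint8_t>& buf) {
    std::size_t pos = 0;
    auto get32 = [&buf, &pos]() {
        assert(pos + 4 <= buf.size());
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i)
            v |= static_cast<std::uint32_t>(buf[pos + i]) << (8 * i);
        pos += 4;
        return v;
    };
    Order o;
    std::uint32_t len = get32();
    assert(pos + len <= buf.size());
    o.customer.assign(buf.begin() + pos, buf.begin() + pos + len);
    pos += len;
    o.quantity = static_cast<std::int32_t>(get32());
    return o;
}
```

In this sketch the byte vector is the only thing that would cross the process boundary; the receiver never sees the sender's pointers, which is the point of marshalling.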

RPC provides an extraordinarily rich set of services - it's essentially trivial to write an application that says "I want to talk to someone on my local network segment who's providing this service, but I don't care who they are - find out who's offering this service and let me talk to them" and RPC will do the hard work.

The next technology, COM, is built on RPC. COM stands for "Component Object Model". COM is many, many, things - it's a design pattern, it's a mechanism to hide implementation of functionality, it's an inter-process communication mechanism, it's the kitchen sink.

At its heart, COM's all about a design pattern that's based around "Interfaces". Just as RPC defines an interface as the contract between a client and a server, COM defines an interface as a contract between a client of a set of functionality and the implementor of that functionality. All COM interfaces are built around a single "base" interface called IUnknown, which provides reference count semantics, and the ability to query to see if a particular object implements a specific interface. In addition, COM provides a standardized activation pattern (CoCreateInstance) that allows the implementation of the object to be isolated from the client of the object.
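The shape of that pattern can be sketched in portable C++. This is a simplified stand-in, not the real thing: actual COM identifies interfaces with GUIDs, returns HRESULTs, and activates objects through CoCreateInstance, whereas the names below (`IUnknownLike`, `IGreeter`, `Greeter`, `CreateGreeter`) and the string interface IDs are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for IUnknown (strings instead of IIDs, bool instead of HRESULT).
struct IUnknownLike {
    virtual bool QueryInterface(const std::string& iid, void** out) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    virtual ~IUnknownLike() = default;  // clients destroy objects only via Release()
};

// Hypothetical interface: a contract with no implementation details.
struct IGreeter : IUnknownLike {
    virtual std::string Greet(const std::string& name) = 0;
};

int g_liveObjects = 0;  // lets a caller observe when the object goes away

class Greeter final : public IGreeter {
    unsigned long refs_ = 1;  // the creator holds the first reference
public:
    Greeter() { ++g_liveObjects; }
    ~Greeter() override { --g_liveObjects; }

    bool QueryInterface(const std::string& iid, void** out) override {
        if (iid == "IGreeter") {
            *out = static_cast<IGreeter*>(this);
        } else if (iid == "IUnknownLike") {
            *out = static_cast<IUnknownLike*>(this);
        } else {
            *out = nullptr;
            return false;  // the object does not implement that interface
        }
        AddRef();  // every pointer handed out carries its own reference
        return true;
    }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        unsigned long left = --refs_;
        if (left == 0) delete this;  // the object manages its own lifetime
        return left;
    }
    std::string Greet(const std::string& name) override { return "Hello, " + name; }
};

// Stand-in for activation: the client never names the concrete class.
IUnknownLike* CreateGreeter() { return new Greeter(); }
```

A client would call `CreateGreeter()`, ask for "IGreeter" through `QueryInterface`, use the returned pointer, and call `Release()` once on each pointer it was handed.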

Because the implementation of the COM object is hidden from the client of the object, and the implementation may exist in another process (or on another machine in the case of DCOM), COM also defines its interfaces in an IDL file. When the MIDL compiler is compiling an IDL file for COM, it emits some additional information including C++ class definitions (and C surrogates for those definitions). It will also optionally emit a typelib for the interfaces.

The typelib is essentially a partially compiled version of the information in the IDL - it contains enough information to allow someone to know how to marshall the data. For instance, you can take the information in a typelib and generate enough information to allow managed code to interoperate with the COM object - the typelib file contains enough information for the CLR to know how to convert the unmanaged data into its managed equivalent (and vice versa).

The third technology is OLE Automation (Object Linking and Embedding Automation). OLE Automation is an extension of COM that allows COM to be used by languages that aren't C/C++. Essentially OLE Automation is built around the IDispatch interface. IDispatch can be thought of as "varargs.h-on-steroids" - it provides an abstraction for the process of passing parameters to and from functions, thus allowing an application to accept method semantics that are radically different from the semantics provided by the language (for instance, VB allows parameters to functions to be absent, which is not allowed for C functions - IDispatch allows a VB client to call into an object implemented in C).
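A rough sketch of the idea, in portable C++, looks like this. It is only loosely in the spirit of IDispatch: the real interface works with DISPIDs, VARIANTs and DISPPARAMS, while the `Value` and `Dispatcher` types and the `Repeat` method below are hypothetical names invented for the example.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Stand-in for VARIANT: a value whose type is only known at run time.
// Kind::Missing plays the role of an omitted argument.
struct Value {
    enum Kind { Missing, Int, Str };
    Kind kind;
    int i;
    std::string s;
    Value() : kind(Missing), i(0) {}
    Value(int v) : kind(Int), i(v) {}
    Value(const char* v) : kind(Str), i(0), s(v) {}
    Value(const std::string& v) : kind(Str), i(0), s(v) {}
};

using Method = std::function<Value(const std::vector<Value>&)>;

// Hypothetical late-bound object: methods are found by name at run time
// and receive a generic argument list.
class Dispatcher {
    std::map<std::string, Method> methods_;
public:
    void Register(const std::string& name, Method fn) { methods_[name] = std::move(fn); }
    Value Invoke(const std::string& name, const std::vector<Value>& args) const {
        auto it = methods_.find(name);
        if (it == methods_.end()) throw std::runtime_error("unknown method: " + name);
        return it->second(args);
    }
};

// The implementation is an ordinary function with a fixed signature.
std::string Repeat(const std::string& text, int count) {
    std::string out;
    for (int n = 0; n < count; ++n) out += text;
    return out;
}

Dispatcher MakeDispatcher() {
    Dispatcher d;
    d.Register("Repeat", [](const std::vector<Value>& args) -> Value {
        if (args.empty() || args[0].kind != Value::Str)
            throw std::invalid_argument("Repeat: first argument must be a string");
        // The second argument is optional: default to 1 when absent or Missing.
        int count = 1;
        if (args.size() > 1 && args[1].kind == Value::Int) count = args[1].i;
        return Value(Repeat(args[0].s, count));
    });
    return d;
}
```

With this in place, `MakeDispatcher().Invoke("Repeat", {Value("ab"), Value(3)})` and `Invoke("Repeat", {Value("ab")})` both reach the same fixed-signature function, which is the kind of flexibility a caller with optional parameters needs.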

Anyway, that's a REALLY brief discussion; there are MANY, MANY books written about this subject. Mike referenced Dale Rogerson's "Inside COM". I've not read that one, but he says it's good :)

Comments

Anonymous
October 15, 2004
Larry, I've always wondered what the story was behind an article at this site: http://www.relisoft.com/win32/olerant.html. The author says that while he worked at MS, he pointed out a "fatal" COM design flaw that was basically ignored. He writes...

"You might think, "Oh, right, big deal! It's easy to come up with these ideas now, after OLE has been on the market for almost a decade." What if I told you that yours truly, who worked for Microsoft back then, soon after OLE 1.0 was released, had these ideas written down and sent to the responsible people. To make the long story short, the ideas were accepted as valid, but rejected on the premise that there already had been too much code written to the OLE specification (mostly at Microsoft). No manager was willing to take the risk of redesigning OLE."

Is that topic off limits or are you perhaps the wrong person to ask?

Thanks.
Anonymous
October 15, 2004
Bartosz did work at MS; actually, I used to work with him.

I have a lot of respect for Bartosz, and his points are valid, but my simple answer is two-fold.

The first part is: "How often does this design pattern REALLY occur?".

The second is that he's saying that COM reference counting isn't important. Actually it's critical - lifetime management is a nightmare without it - he naively says "The truth is, there is very little need for refcounting as long as you agree not to destroy the object while you are using its interfaces." This is true, but how do you know if nobody's using its interfaces?

If you call a method that takes ownership of the object, how do you know that it's taken ownership? With refcounting, it's irrelevant - if the method wants to take ownership, it adds a reference, if it doesn't want to take a reference, it doesn't add the reference. Lifetime management is clear and clean. The only alternative to this is the CLR - object lifetime isn't owned by the object, there's another entity that owns the object's lifetime. But that wasn't a viable option back in the Win16 days where most Windows machines had 1M of physical RAM.
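The ownership rule described above can be sketched in a few lines of portable C++. The `Buffer`, `Inspect` and `Cache` names are hypothetical, and `Buffer` is only a simplified stand-in for a COM object (no interfaces, no thread safety): a callee that merely uses the object takes no reference, while a callee that keeps it adds one, and the caller does not need to know which kind it called.

```cpp
#include <cassert>

// Minimal refcounted object; a simplified stand-in for a COM object.
class Buffer {
    unsigned long refs_ = 1;   // the creator's reference
    bool* destroyed_;          // set to true when the object deletes itself
    ~Buffer() { *destroyed_ = true; }   // private: only Release() can destroy
public:
    explicit Buffer(bool* destroyed) : destroyed_(destroyed) { *destroyed_ = false; }
    unsigned long AddRef() { return ++refs_; }
    unsigned long Release() {
        unsigned long left = --refs_;
        if (left == 0) delete this;
        return left;
    }
};

// Uses the buffer only for the duration of the call: no reference taken.
void Inspect(Buffer* b) { (void)b; }

// Keeps the buffer beyond the call, so it takes its own reference.
class Cache {
    Buffer* held_ = nullptr;
public:
    void Store(Buffer* b) {
        b->AddRef();                  // take our reference first
        if (held_) held_->Release();  // then drop whatever we held before
        held_ = b;
    }
    void Clear() {
        if (held_) {
            held_->Release();
            held_ = nullptr;
        }
    }
    ~Cache() { Clear(); }
};
```

In either case the caller simply releases its own reference when it is done; the buffer survives for exactly as long as the `Cache` still holds it.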
Anonymous
October 15, 2004
Wow, I haven't heard "Cairo" in ages. I remember going to the Santa Monica MS office in Feb 1996 where they were showing off a beta of NT 4. The guy doing the demo made us all laugh by starting the presentation with "So do you all want to see Cairo? Here!" and he opened up a web browser showing a picture of the city of Cairo. ;)
Anonymous
October 16, 2004
Mo's absolutely right (in both his/her comments). If you're using COM directly, you can control this (by using the CLSCTX flags to specify that you only want an inproc server, for example), but as far as I know, you don't have that flexibility from managed code (or from VB).
Anonymous
October 17, 2004
I understand the out of process bit.

So marshalling is only the data? What about COM putting itself into the calling process? Is this included under marshalling or is that something else? I remember (and this is all I remember) a problem related to some COM object in CLSID that had marshalling in its name.

I have no problems reading C with API calls and converting to VB (eg FindWindow, SetWindowPos). But I don't know what all these > and *something (or is it something*) mean.

I also don't really know what a heap is apart from memory the program uses. I've used GlobalLock & GlobalAlloc et al in Win 16 (from basic) to manipulate the clipboard and to work in bytes (though normally I make a fixed length string). Although if I guess right all memory is pretty much the same in Win32.

Anyway some new MS toys.
http://www.microsoft.com/globaldev/outreach/dnloads/downloads.mspx (it's 11:38AM yesterday in Seattle now)

Don't tell that bloke that did the powertoy calc.
Anonymous
October 17, 2004
COM doesn't "put itself" in the calling process. The calling process uses COM, and calls into COM APIs. That stuff's not marshalling, it's loading DLLs into a process.

The heap is a virtual memory manager. When you allocate memory (for operator new, or whatever), the heap is where it comes from.

The CLR has a heap too, but it's managed (in other words, memory is GC'ed if it's not in use).
Anonymous
October 17, 2004
Why are IDL and ODL almost but not quite compatible? What are the criteria for choosing one or the other? What is the method for switching from one to the other if the programmer's actions in VC++ happened to generate the less appropriate choice?

(One of your colleagues couldn't answer this one. I don't know if that's because ODL was too old or too new.)
Anonymous
October 17, 2004
David Candy wrote:
"I have no problems reading C with API calls and converting to VB (eg FindWindow, set windowpos). But I don't know what all these > and *something (or is it something*) mean."

This is where most of the fun in C is ;). With > I assume you're thinking of something like this: lpPoint->x or lpHubba->DoBubba(). The "->" is an operator named "Member access operator" and is quite similar to the "." operator which you are familiar with. Ok, but what's the difference? The "." op. is used to access members of a class or struct that you hold directly (for example a local variable on your stack), and the "->" op. is used to access members of classes and structs to which you have a pointer. What "->" really does is to dereference the pointer for you, and it is only syntactic sugar for writing (*lpPoint).x.

And now that "*" operator appeared. What does that do? :) It is used for two different things actually:
1) To declare pointers. This is something such as:
POINT* lpPoint;

But before you can use lpPoint you need to allocate memory to it, for example using HeapAlloc, malloc etc.

A pointer points to some place in memory, instead of being some value. See below how to access the value of what the pointer points to.

2) The indirection operator. This means to take a pointer and access the value it points to. You see an example of using it above in the discussion of the "->" operator.

Often you may see the "&" operator too. This is the opposite of the "*" operator, and is called the "address-of" operator. What it gives you is the address of some value. For example you have something like this:
POINT myPoint; // myPoint is stored on the stack

Now you want to send myPoint to a function which is declared like this:
void MySuperFunction(POINT* pnt);
What you do is to call it like this:
MySuperFunction(&myPoint);

It is a lot quicker to call functions by reference rather than by value (ByRef and ByVal in VB), because you only pass an address to the function instead of the contents of the struct.
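Putting the three operators together, a small self-contained sketch might look like this. The `Point` struct is a stand-in for the Win32 POINT structure, and `MoveRight` and `Demo` are hypothetical names used only for this example.

```cpp
#include <cassert>

struct Point { int x; int y; };   // stand-in for the Win32 POINT structure

// Takes the address of a Point, so it can modify the caller's copy.
void MoveRight(Point* p, int dx) {
    p->x += dx;            // same as (*p).x += dx;
}

int Demo() {
    Point pt = {1, 2};     // lives on the stack
    Point* lp = &pt;       // "&" yields the address of pt
    MoveRight(lp, 10);     // pt.x is now 11
    (*lp).y = 5;           // "*" dereferences; identical to lp->y = 5
    return pt.x + pt.y;    // 11 + 5
}
```

Note that no heap allocation is needed here: a pointer can point at a stack variable just as well as at memory obtained from HeapAlloc or malloc.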
Anonymous
October 17, 2004
Larry: His, not hers (10/10 for not making assumptions, though :))

David:

When you create an instance of a COM object that's handled in-process, what you get back is a table of function pointers, and those function pointers will normally refer to addresses within the area of memory used by the DLL that implements that COM object. In this situation, you can draw (loose) parallels with LoadLibrary and GetProcAddress.

When marshalling steps into the fray, no DLL is loaded into your process' address space, but you still need a table of function pointers. Obviously, these function pointers have to come from somewhere. What happens is that the function pointers all point at client stubs instead of the 'real' functions. If you ignore the speed aspect for a moment, the net result is identical as far as you're concerned - you have an interface pointer, complete with a VMT that contains a bunch of function pointers, that you can use to invoke methods. In the marshalling case, the client stubs take the parameters in the normal fashion, and send them to the server. Now, they might be sent over an inter-process communication mechanism (IPC/LPC), or they might be sent via a remote procedure call to another host (RPC). What's important is that the parameters are put together into a packet that can be sent and understood by the server, which can then decode the packet and actually perform the action you requested in the first place. The same encoding/decoding-type action happens for any parameters marked 'out' or 'in, out', along with the return value.

This might all sound a bit complicated - and it can be - but the important thing to remember is that from the client application, a marshalled method call is no different (in terms of the code you write to invoke it) to an in-process call. That's what makes COM cool.
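Here is a minimal sketch of that arrangement in portable C++. Everything in it is an assumption for illustration: the `ICalc` interface, the `Packet` format and the method id are invented, and a direct function call stands in for the IPC or RPC hop. COM's own terminology calls the client-side piece a proxy and the server-side piece a stub, which is the naming used below.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical interface shared by the real object and its proxy.
struct ICalc {
    virtual std::int32_t Add(std::int32_t a, std::int32_t b) = 0;
    virtual ~ICalc() = default;
};

// The "real" object, which would live in the server process.
struct Calc : ICalc {
    std::int32_t Add(std::int32_t a, std::int32_t b) override { return a + b; }
};

using Packet = std::vector<std::int32_t>;  // stand-in for the wire format

// Server side: decodes a request, calls the real object, encodes the reply.
class ServerStub {
    ICalc& target_;
public:
    explicit ServerStub(ICalc& target) : target_(target) {}
    Packet Dispatch(const Packet& request) {
        assert(request.size() == 3 && request[0] == 1);  // method id 1 means Add
        return Packet{target_.Add(request[1], request[2])};
    }
};

// Client side: exposes the same interface as Calc, but each method only
// packs its parameters and forwards them to the stub.
class ClientProxy : public ICalc {
    ServerStub& channel_;
public:
    explicit ClientProxy(ServerStub& channel) : channel_(channel) {}
    std::int32_t Add(std::int32_t a, std::int32_t b) override {
        Packet reply = channel_.Dispatch(Packet{1, a, b});
        return reply.at(0);
    }
};

// Client code cannot tell which implementation it was handed.
std::int32_t UseCalc(ICalc& calc) { return calc.Add(2, 3); }
```

`UseCalc` compiles and behaves the same whether it receives a `Calc` or a `ClientProxy`, which mirrors the point that the calling code does not change when a call is marshalled.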
Anonymous
October 17, 2004
Maybe this (classic?) diagram will help you to better understand how marshalling works:
                             +-----------------+
+------+                     |Remote COM object|
|Client|                     +-----------------+
+------+                             /|\
   |                                  |
  \|/                           +------------+
+-----------+                   |Remote proxy|
|Client stub|                   +------------+
+-----------+                        /|\
      |                               |
      +-------------------------------+

That's the basic call-flow when invoking a COM object out of process, as explained by Mo above. So basically the stub/proxy pair handles the out-of-proc complexity so the client doesn't have to think about it. Much the same is done in CORBA.

Sorry for my bad ASCII-drawing skills ;)
Anonymous
October 17, 2004
COM, IMO, is about several things:

- Memory allocation discipline (CoTaskMemAlloc et al)
- Object activation protocol (CoCreateInstance et al)
- Object lifetime control (IUnknown::AddRef and Release)
- Object interaction protocol (IUnknown::QueryInterface and interfaces)

The rest of the stuff kind of follows from these basics. The memory allocation protocol arguably is only there to enable marshaling but nonetheless establishing a standard for lifetime management of non-objects is pretty important.

Interfaces, being long-lived binary API contracts are pretty darned important and useful. I'm constantly amazed at how people seem to have forgotten the problems with making non-virtual constructs part of the long-term contract for an object.

You can debate the relative goodness of reference counting vs. garbage collection. In my book, determinism of lifetime beats faster allocations hands down. But then maybe I'm becoming a dinosaur. The stupid thing was forcing a virtual function call for every modification of the refcount...

The activation is arguably the most important part of the definition at a systems level. The fact that the metadata to determine how and where to activate an object is separate from the calling code is probably the greatest genius of COM.

It's unfortunate that a lot of issues came to light during/after development but hey that's the reality of product development.

Object-based marshaling, which is very cool, is really a MS-only innovation over DCE RPC that as Larry mentions was done as part of Cairo. (I'm not sure that's true; when joining MSFT in '94 the incipient release of DCOM was heralded as the great enabler of truly distributed systems and Cairo was still incubating furiously at the time; it's just surprising for a technology that's incubating to spin off an important subpiece and actually release it... it would be like if Avalon or WinFS shipped before LH. But that's also where Nile a/k/a OLE/DB came from... ah for the old days when something as simple as the next set of APIs were going to solve everyone's problems...)

In simple terms, COM = (OLE/2 - all the document/in-place-activation stuff).
Anonymous
October 18, 2004
One thing that has always confused me with COM is the STA model, which is implemented using a hidden window to provide single-threaded access to the COM object. Can anyone throw some light on what actually goes on underneath this particular model? The COM runtime has a lot of quirks built into it, hidden from the programmer, and because of this it is sometimes possible to shoot yourself in the foot if you don't properly understand the apartment concepts and use multi-threading in your program.
Anonymous
October 18, 2004
I suspect this is one of the reasons why the BeOS folks designed all their stuff such that every window would have a separate dedicated UI thread; though on BeOS, threads are cheap.
Anonymous
October 18, 2004
I should clarify; BeOS doesn't (as far as I know) use COM, but the problem of UI-blocking is one that's plagued programmers on a whole host of different platforms for years :)