No MSG For You!
Whenever I find myself repeating the same message over and over again, I have to ask why I haven't blogged it yet. This is one of those cases. :)
I've seen quite a few issues over the years with MSG files. The issues range from "it takes too long to write properties" to "the properties on the MSG don't match what I see in the store" to "I get such and such error trying to copy this message to an MSG". The root cause of most of these issues is one of expectations. People are trying to use MSG files as an archival format, and that's not their intended purpose. If you really want to archive mail, you should develop your own format for persisting the data. You'll gain advantages in versatility, speed, and fidelity.
To understand why I make this recommendation, we first need to realize that not all messages can be copied over to the MSG format. This is noted at the end of https://support.microsoft.com/kb/171907. Since MAPI is transacted, the underlying MSG file has to be opened with STGM_TRANSACTED, meaning nothing is committed to disk until SaveChanges is called on the message. Couple that with the quirk in the MAPI specification that pretty much forces you to create a new transaction each time you add a recipient or attachment, and you quickly run into the limit on open root storage files noted in https://support.microsoft.com/kb/163202. This OS-imposed limit on open root storage objects isn't likely to ever change, as it's an artifact of the implementation. Likewise, the need for a new transaction for each recipient and attachment won't ever change. Neither the MSG format nor structured storage has seen active development in years. This limit is going to be hit whenever a message has a large number of recipients or attachments, or when there exists a deep level of embedded messages.
The next issue is speed. Writing a message to MSG can be quite slow. There is a huge performance penalty working with a structured storage file in STGM_TRANSACTED mode. And this penalty is multiplied by the number of open root storage objects. So not only do you run into a limit trying to add all those recipients and attachments, but each subsequent recipient and attachment is that much slower to add. For instance, I recently worked on an issue where the repro required that I have 5000 recipients on a message that I then copied over to MSG format. It took over an hour to write the file. And none of that delay was actually in the MAPI code - it was all at the COM level.
Next - not every MSG file you can write can be opened by Outlook. Over the years folks have tried various tricks to squeeze performance out of the code writing their MSG files. In many cases, they succeeded in writing the file faster, or allowing more recipients and attachments on the message. But the downside was they wrote a file that Outlook didn't know how to open! One variation of this issue surfaced with Outlook 2007. Given the performance problems working with MSG files, in Outlook 2007 we decided to check the number of recipients and attachments when opening the file. If either was over 2048, then we refused to open the file at all. The main reasoning for this was a number of corrupt MSG files that had surfaced in the wild with astronomical counts of recipients and attachments - on the order of millions. But a side effect was to block Outlook 2007 from opening some MSG files that Outlook 2003 could open. We've had some customers complain about this one and a fix is in the works. I'll report here when it's done. However, that fix will only cover this one variation of the problem. It won't fix the large number of other scenarios out there.
That covers the mechanics of reading and writing to MSG. Now we discuss fidelity. This isn't about whether the MSG format is out partying with the EML format, but rather how faithfully the MSG represents the source message. This is where MSG being a MAPI based format gets you in trouble. For instance, in archival scenarios, especially when the archive is used for legal discovery, properties such as PR_LAST_MODIFICATION_TIME and PR_LAST_MODIFIER_NAME are very important as they indicate who modified the message and when. But since MSG is itself a MAPI message and as such has to obey all the rules of MAPI, those properties will only reflect the time the MSG was written and the name of the account that wrote it, neither of which is likely to match the original message. This problem can extend to the body properties as well: no matter how you do it, you're likely to end up converting the body from one format to another when storing it in the MSG file. And every conversion carries with it the possibility of a loss of data. Perhaps some line spacing is subtly changed, or font choices aren't preserved exactly. In some messages, these subtle textual differences could have huge semantic ramifications.
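One way to see this kind of drift is to compare the properties read from the source message with those read back from the copy. The following is only a rough sketch: it assumes both property sets have already been read into plain dictionaries keyed by property name, and the function name fidelity_report is made up for illustration.

```python
def fidelity_report(source_props, copy_props):
    """Compare two property dictionaries and list every difference.

    Both arguments are assumed to map a property name (for example
    "PR_LAST_MODIFICATION_TIME") to an already-decoded value.
    """
    report = {"missing": [], "added": [], "changed": []}
    for name in sorted(source_props.keys() | copy_props.keys()):
        if name not in copy_props:
            report["missing"].append(name)   # lost in the copy
        elif name not in source_props:
            report["added"].append(name)     # invented by the copy
        elif source_props[name] != copy_props[name]:
            report["changed"].append(name)   # value no longer matches
    return report
```

Any entry under "changed" or "missing" is a property whose archived value can no longer be trusted to describe the original message.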
Fidelity also figures in when discussing Unicode. In a large organization, messages will be written in a variety of languages. The only way to preserve these messages into MSG format without converting half the characters to question marks or boxes is to use the Unicode format. Unfortunately, this format is only understood by Outlook 2003 and Outlook 2007. Exchange's MAPI doesn't understand this format at all. So if you're relying on MSG files to save out Unicode data, your solution is stuck using Outlook's implementation of MAPI for all processing of the archive. This severely hampers your ability to build a server based application.
Workaround
So, we've got messages that cannot be copied to the archive, a painfully slow API, messages that cannot be opened once archived, and a format that's not capable of representing the actual message being archived. Clearly, these are not the attributes we want in an archive.
Fortunately, the workaround is simple: don't use MSG to archive messages. Instead, develop your own file format to preserve the important properties on a message. Here's one approach using the file system and XML files:
- Each message, not counting embedded messages or attachments, is stored as a single XML file, the elements of which map back to the various properties from the source message.
- Properties to archive are chosen by using GetPropList. Additionally, the various body properties should be read individually.
- The recipient collection is also stored in this XML file.
- Information about the attachment collection is stored in this XML file, but the actual attachments are stored in separate files. If the attachment is an embedded message, it is stored in an XML file just like the parent message.
- Embedded message XML files may be marked in the file name or by an attribute to distinguish them from the set of parent messages. Alternatively, all attachments could live in a subfolder.
- All file names would be autogenerated to avoid conflicts.
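To make the layout above a bit more concrete, here is a minimal sketch in Python. It is not a finished design: the element and attribute names, the "attachments" subfolder, and the function archive_message are all assumptions chosen for illustration, and the properties are assumed to have already been read from the source message (for instance after enumerating them with GetPropList) into simple name/string pairs.

```python
import uuid
import xml.etree.ElementTree as ET
from pathlib import Path


def archive_message(archive_dir, props, recipients, attachments, embedded=False):
    """Write one message as an XML file and return the path of that file.

    props:       dict of property name -> string value
    recipients:  list of dicts of string attributes, one per recipient
    attachments: list of (display_name, payload); payload is bytes for an
                 ordinary attachment, or a (props, recipients, attachments)
                 tuple for an embedded message
    """
    archive_dir = Path(archive_dir)
    attach_dir = archive_dir / "attachments"   # attachments live in a subfolder
    attach_dir.mkdir(parents=True, exist_ok=True)

    # Embedded messages are marked with an attribute on the root element.
    root = ET.Element("message", embedded="true" if embedded else "false")

    props_el = ET.SubElement(root, "properties")
    for name, value in props.items():
        ET.SubElement(props_el, "property", name=name).text = value

    recips_el = ET.SubElement(root, "recipients")
    for recipient in recipients:
        ET.SubElement(recips_el, "recipient", dict(recipient))

    attachments_el = ET.SubElement(root, "attachments")
    for display_name, payload in attachments:
        if isinstance(payload, bytes):
            # Ordinary attachment: raw bytes go into a separate file.
            stored = attach_dir / (uuid.uuid4().hex + ".bin")
            stored.write_bytes(payload)
            kind = "file"
        else:
            # Embedded message: stored as XML, just like its parent.
            stored = archive_message(attach_dir, *payload, embedded=True)
            kind = "message"
        ET.SubElement(
            attachments_el, "attachment",
            name=display_name, kind=kind,
            file=stored.relative_to(archive_dir).as_posix(),
        )

    # Autogenerated file name, so two messages can never collide.
    out_path = archive_dir / (uuid.uuid4().hex + ".xml")
    ET.ElementTree(root).write(str(out_path), encoding="utf-8", xml_declaration=True)
    return out_path
```

The recursion handles any depth of embedded messages, and because each file is written and closed on its own, nothing here depends on how many recipients or attachments a message has.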
The only really hard part about this format is determining how to store each of the possible MAPI property types. However, when we look closely, we see there are only 13 types to consider, most of which can be represented as just a simple number or string. Even binary data is easy to store if it's first converted to hex. Multivalued properties, large binary and string properties, and named properties all add additional wrinkles, but are easily addressed. I figure a junior programmer could complete a reasonable first draft of the required code to both read and write a MAPI message to and from XML in an afternoon. In fact, most of the code for writing the XML format is already present in MFCMAPI - check out dumpstore.cpp.
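As a rough illustration of how little is involved in storing individual values, here is a sketch that turns a value into a type label plus a text form. The labels are real MAPI property type names, but the mapping from Python types to those labels, the XML element names, and the function names are assumptions made for this example. It covers only a handful of types and leaves out named properties and large values.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def encode_value(value):
    """Return (type_label, text) for a single property value."""
    if isinstance(value, bool):            # bool first: bool is a subclass of int
        return "PT_BOOLEAN", "1" if value else "0"
    if isinstance(value, int):
        return "PT_LONG", str(value)
    if isinstance(value, float):
        return "PT_DOUBLE", repr(value)
    if isinstance(value, str):
        return "PT_UNICODE", value
    if isinstance(value, (bytes, bytearray)):
        return "PT_BINARY", bytes(value).hex().upper()   # binary stored as hex
    if isinstance(value, datetime):
        # Expects a timezone-aware datetime; stored as UTC in ISO 8601 form.
        return "PT_SYSTIME", value.astimezone(timezone.utc).isoformat()
    raise TypeError(f"unsupported property value: {type(value).__name__}")


def property_element(name, value):
    """Build a <property> element; lists and tuples become multivalued."""
    elem = ET.Element("property", name=name)
    if isinstance(value, (list, tuple)):
        if not value:
            raise ValueError("cannot infer the type of an empty multivalued property")
        encoded = [encode_value(item) for item in value]
        # PT_UNICODE -> PT_MV_UNICODE, and so on.
        elem.set("type", "PT_MV_" + encoded[0][0][len("PT_"):])
        for _, text in encoded:
            ET.SubElement(elem, "value").text = text
    else:
        type_label, text = encode_value(value)
        elem.set("type", type_label)
        elem.text = text
    return elem
```

For example, property_element("PR_SUBJECT", "Hello") produces a single element whose type attribute is PT_UNICODE and whose text is the subject itself.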
Objections
Hopefully I've convinced most of you not to use the MSG file format for archiving. Some of you might not be convinced though. You might think you've got that one special case that requires you to use MSG. I don't believe such a case exists. I've anticipated a few of the common objections:
- "I have to use MSG for legal discovery." - See the above section on fidelity. MSG is a poor copy, totally unsuited for answering legal questions. You can do much better on your own.
- "It's too hard/complex to write my own file format" - Good thing I just spec'd one out for you. :) If you're thinking this you probably haven't tried to sit down and do it.
- "How would I know which properties to persist?" - That's what GetPropList gets you. It's exactly the same way CopyTo works to determine which properties to copy in the MSG code, with the advantage that you can archive everything now, not just what MSG is capable of storing.
- "XML is too bloated" - You don't have to use XML - you could use columns in a SQL table, or any other storage medium. It's your choice.
- "Reading, writing and parsing text is too slow" - Text processing can be quite fast if you approach it correctly. And anything is better than the speed of MSG.
- "I shouldn't have to fix this - Microsoft should fix it" - Fair enough, but on the other hand, we've never encouraged anyone to use MSG files for archiving. It's just not what they're intended for. Additionally, consider how such a "fix" would be deployed. You'd be requiring all of your customers to install a new build of Outlook or Exchange, and they'd never be able to use an older build. You'd never get this sort of approval from most customers.
- "I need to be able to open the messages in Outlook" - Most archival solutions include some sort of client side component, even if it's just a web page. If yours does, then from your client side component you can create a message on the fly, read the properties from your archive, and populate it. This would be no slower than opening an MSG file, and in many cases would actually be faster. Additionally, you'd be free to create viewers for your archive that do not depend on Outlook!
- "I need MSG so I can index the data" - You must be using an IFilter extension that supports MSG. If you used XML, all your data would be text to begin with, so the native filters would already work. Plus, if you want more granular search, it would be easy to write an IFilter for your own format.
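For the objection about opening archived messages in Outlook, the reading half of "create a message on the fly" can be sketched as follows. This assumes an XML layout with typed property elements (the element names, attribute names, and the functions read_properties and populate are made up for illustration). The set_property callback stands in for whatever client side code actually writes a property onto the new message; that part is not shown.

```python
import xml.etree.ElementTree as ET

# Decoders keyed by the type label stored on each <property> element.
DECODERS = {
    "PT_BOOLEAN": lambda text: text == "1",
    "PT_LONG": int,
    "PT_DOUBLE": float,
    "PT_UNICODE": str,
    "PT_BINARY": bytes.fromhex,   # binary was stored as hex text
}


def read_properties(xml_text):
    """Yield (name, decoded_value) for every <property> in the archive XML."""
    root = ET.fromstring(xml_text)
    for elem in root.iter("property"):
        # Unknown or missing type labels fall back to plain text.
        decode = DECODERS.get(elem.get("type"), str)
        yield elem.get("name"), decode(elem.text or "")


def populate(xml_text, set_property):
    """Feed each archived property to set_property; return how many were set."""
    count = 0
    for name, value in read_properties(xml_text):
        set_property(name, value)
        count += 1
    return count
```

Passing a dictionary's __setitem__ as set_property is an easy way to test the reader without any mail client involved. The same loop also yields plain text for an indexer.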
The final objection is my favorite: "But I've never had a problem with MSG files" - Bully for you! This article isn't addressed to you then. However, I had one customer who also made this claim when I found they were using MSG to archive messages. Not quite believing them though, I outlined each of the problems listed above. It turns out they had encountered or were encountering every single one of them. They just hadn't connected the problems back to their choice to use MSG to archive their data.
Comments
Anonymous
January 08, 2008
"I need to be able to open the messages in Outlook" - Steve, you are missing the point. People want to be able to open a message in Outlook, not my super fast reliable viewer, which, unfortunately, does not get installed by Outlook :-) A requirement to have Outlook installed is an easy one (have you ever seen a corporate PC with a copy of Office installed?), installing anything third-party is a PITA. It does not have to be an MSG file, people simply want something that Outlook can open. If MS comes up with an XML schema that Outlook can natively open, I'll be the first one to use it. EML format would be good, but I don't think you could handle EX type recipients (people would want to see the familiar GAL dialog, not a one-off SMTP address).
Anonymous
January 08, 2008
No - I don't think I missed the point - the point is they're trying to use this format for archiving and it's totally unsuitable for that purpose. When I talk about having Outlook installed, I'm speaking more about the server where the archiving is taking place. No server should have Outlook installed on it. That's just a bad idea. The only scenario I can see where they would have access to the archive but NOT have any software (not even a web page) from the archive vendor is if the only interface the vendor presents to the end user is a file share. And I think we can agree a file share is a pretty poor interface for an enterprise ready product. You seem to be proposing "Microsoft should fix this", which I already addressed. Even if we were to make some better format, it would only work with the newest version of Outlook, so for that reason most vendors would reject it.
Anonymous
January 08, 2008
I am with you when you talk about archiving - each and every property that is expected to be used later must be persisted explicitly. What I am talking about however is UI - people are most comfortable with Outlook, they do not want to use any other app that does something that Outlook can do. A use case: a user sends/receives a message to/from a customer. It gets parsed and its most important properties (subject, body, attachments, etc) are stored separately. The whole message is also stored in the MSG format in a blob in a DB (storage is cheap). A user (the same or a different one) at a later time can simply look at a history view for a given contact and double click on the message. The message is extracted from DB, saved as an MSG file, and opened by Outlook. A user can then reply/forward/etc in the familiar Outlook environment. The "familiar Outlook environment" is the keyword; at this point I could not care less if an obscure MAPI property was not persisted correctly.
Anonymous
January 08, 2008
That's a valid use case. However, since in that case you already have code pulling the MSG file from the DB and saving it to disk, there's no reason you couldn't pull properties from the DB and construct a message on the fly. In fact, I even mentioned this option in the article. As long as you have code running on the client side there's no reason your UI needs to change. I'm not concerned with the expense of storing an MSG file. What I am concerned with is the fact that so many messages cannot be represented in MSG at all.
Anonymous
January 08, 2008
I really, really do not want to deal with the stuff that I do not care about, especially the pretty formatting, be that HTML or RTF, or a combination of the two. The user however does care about that a lot. Plus if the extraction is done on the server, the MAPI system might not be installed, even if Outlook is locally available. I just want to have a file format that Outlook can open natively. Another option that you did not mention is that (since you own all the source code), you can just pull out the relevant pieces of the COM system from Windows and create a private MAPI function that can deal with any MSG file. The function does not have to be real fancy and support simultaneous access from different processes (you cannot do that now with MSG files anyway). I understand the technical limitations of the MSG format, but if a customer wants to have 10,000 recipients in a message, I can come up with an excuse why accessing such an MSG file takes a long time, but saying that he simply can't do that ain't gonna fly...
Anonymous
January 08, 2008
I do realize that many people believe the statements "I need to display the message in Outlook" and "I must use MSG" to be equivalent. But they don't have to be. There's nothing stopping an archiving vendor from building a message on the fly. In fact, that's exactly what's happening under the covers when an MSG is opened.
Anonymous
January 09, 2008
I can see where you're going on this Stephen, and you make very good points, but implementing an ability to re-constitute an exported object back into Outlook is going to require the use of MAPI, which is not an option for many who are stuck using the monstrosity called the "Outlook Object Model" or OOM. It is impossible to recreate an object in the "sent" state using the OOM. Re-creating certain complex Outlook objects such as task requests and appointments is also impossible using the OOM because of Outlook's use of numerous undocumented MAPI properties which are not exposed in OOM. Another issue is more and more Outlook addins are being written in .NET which has no support for MAPI short of using libraries like MAPI33. The use of any MAPI with .NET is unsupported by Microsoft. As developers we should be encouraging standardized open formats instead of everyone making their own. Wouldn't it be better for everyone if Microsoft created a new open format that would be recognizable and supported by both MAPI and Outlook? One that IS suitable for archiving AND can be opened from the Windows explorer through a shell open/double click action? Word, Excel, PowerPoint, etc. are all moving to an open format in Office 2007, why is Outlook not?
Anonymous
January 09, 2008
You're right that implementing the reconstitution does require MAPI, but not very much MAPI. Here's an idea: Someone could start an open project to build a handler that knows how to open these XML files. All it would need to do is register in the file system to handle whatever extension is used, then when invoked log on to the default/current profile and build the message. It would then hand the message off to Outlook to display. That's really all that Outlook's MSG handler code is doing. This would put us in a much better position for lobbying the Outlook team. Instead of saying "you should support a better format" you'd be saying "you should support the XYZ format". BTW - I don't find the OOM or .Net observations to be relevant. The handler you use to open the files doesn't have to be tied into any other code. It doesn't even need to be an add-in. It can stand alone.
Anonymous
January 09, 2008
There is already MIME format, which Outlook itself can handle (EML files are currently handled by OE). You already have IConverterSession used all over the place by Outlook; why not reset the EML file handler to outlook.exe? The potential problem I see is the EX type addresses, while RFC really expects SMTP.
Anonymous
January 09, 2008
Yeah - I considered discussing MIME in my post, but MIME's an even worse format fidelity-wise for storing MAPI messages. You could TNEF encode all the MAPI stuff, but that doesn't help on the indexing front. Plus, TNEF has its own problems.
Anonymous
January 10, 2008
Stephen, The .NET issue is relevant because Microsoft will not give support for a .NET program or addin using MAPI in any manner, that includes interop. Things like the MAPI33 library are unsupported. Thus if you have a very large application written in .NET that works against MSG files solely using the OOM, you cannot give up using MSG files and do object re-constitution through MAPI without either giving up .NET and re-writing the application in C++, or giving up a large amount of Microsoft support. For my case we've broken this rule long ago by using MAPI33 and have had to do quite a bit of haggling to get some degree of support for an issue we had with Outlook crashing and a .NET addin being used that was interfacing with MAPI. The issue was finally resolved and had nothing to do with our use of MAPI. I'm not outright rejecting your advice on this, we're seriously looking into giving up the saving/archiving of .MSG files. Is there anything you can do on your end to remove the support barriers with .NET programs using MAPI through managed C++ like the MAPI33 library does? Managed C++ can bridge the gap with .NET and MAPI because the sensitive MAPI API calls that don't work from CLR interop can be made safely in the unmanaged portion of the code which is what the MAPI33 library does.
Anonymous
January 10, 2008
Once you're archiving to a text file MAPI doesn't need to be involved to process it. And there's no reason an application needs to be a single process. So I don't see the comments about .Net to be particularly relevant. BTW, MAPI33 isn't removing any support barriers or gaps. It's doing exactly what we don't support, which is to use MAPI from managed code. The only thing MAPI33 gets you is the ability to be unsupportable faster. The product teams are the ones that decided not to support MAPI with .Net. I don't see them changing their minds any time soon. There's no sense in lobbying me for a change. We're all well versed in all the arguments.
Anonymous
January 14, 2008
I think the biggest flaw in your argument for a user-defined format is that you seem to be under the impression that we developers are in control of our environment. As a vendor, I supply MSG files to our clients. I have no control over their environment, which varies widely from client to client. I don't know if they're using a webapp to process and display these MSG files, or they're using Outlook, or any of a hundred other use cases I could come up with. Heck, I can't even guarantee that the client isn't running a *nix environment and has their own MSG parser. The MSG format is one Outlook natively supports, so it is a loose file format from which hundreds of avenues of businesses have sprung. As a "supplier" of data, we can't control the format. We can't tell every client "you need to design your own XML reader for this document type we invented for Outlook e-mails because MSG is unsuitable and we've no idea how or what you intend to do with it". What you're proposing is a format war 100x the scale of Blu-Ray vs. HD-DVD. The data suppliers can't get into the format wars, because we don't control the player. If and when Microsoft decides to replace MSG with something else, we will all happily march along. However, we the suppliers have no choice but to use it until then.
Anonymous
January 15, 2008
It's not a "flaw in my argument" because I'm just enumerating why MSG is such a poor choice for archival. Saying I'm wrong to state this because you don't feel you had a choice in format simply doesn't make sense. You should step back and ask yourself why you are providing MSG files to the customer. It's almost certainly not because the customer demanded that particular format. On the contrary, it's more likely that you informed them MSG was the format you were going to use and they accepted it because it worked. Most customers don't know anything about file formats. They just want something that works. If you had handed them an alternate file format and the appropriate tools to work with it, I bet most customers would have accepted that as well. You should reconsider your analogy. Seriously - Blu-Ray vs. HD-DVD? You think this debate would ever reach the evening news? I don't.
Anonymous
January 15, 2008
I see - we're talking apples and oranges - this article is about the appropriateness of MSG in archival. I only mentioned discovery because the issues with date and last modified are serious and need to be understood by those using MSG. However, assuming you archive first and then produce output for others to analyze, there's no reason you can't archive to your own format and then build MSG files for those who want them.
Anonymous
January 15, 2008
Electronic Discovery is a multi-billion dollar industry and the MSG format is deeply entrenched in it. We use it because we have to, and there is no other format available. Believe me ... there'd be dancing in the streets if MS decided to come up with a MSGX format (for example) that was better suited for archiving. If Outlook supported it, it would trigger a domino effect in EDD just like the new DOCX, XLSX and the rest of the XML formats have done.
Anonymous
January 15, 2008
Let me chime in with the chorus begging for Outlook to support a new format that can become a standard for archiving purposes. Our company (Vault Solutions) builds products that work with Symantec's Enterprise Vault, and the msg format is typically used as a way to exchange information from one system to another. Customers need to export information from the archive to be sent to one of a number of review, analytic or e-discovery tools. This is most often asked for in msg format for the reasons Robert enumerated. Microsoft is in the best position to define a better format.
Anonymous
March 13, 2008
Couple of points:
- I begged for an XML-based format supplied by Microsoft about four years ago (http://playground.doesntexist.org/?id=43), hoping it would make it into Office 12. Well, maybe Office 14....
- Even the latest Microsoft SharePoint platform (MOSS 2007/WSS v3) uses MSG files with incoming mail-enabled Document libraries. So, maybe Microsoft should have started to evangelize their internal groups first how bad the MSG file format is for archival.
- Windows Live offers an MSG iFilter for free. Encouraging customers to use MSG files for archival, but leaving those who archive into PST files in the cold and dark. Neither an iFilter nor can you put them into SharePoint (blocked by default).
- Microsoft has changed products continuously over the past decade (and longer), breaking compatibility. Just take unicode PST's. So, breaking backwards compatibility as an argument of standing still and not innovating is something I am not buying any more, sorry. The time of those binary lousy formats is over. The change of the Internet, including connectivity speed (sitting behind a DSL 20Mbit/2Mbit for a bargain here), over the past eight years yells for opening the Outlook storage NOW. OutlookXML, please!! And in Outlook 14, not Outlook 15 :-) Famous last words: I know, talking apples and oranges. You, Stephen, just wanted to point out how badly designed MSG files are. Instead it turned into a discussion about closed formats and incomplete/limited API's. I feel sorry for you ;-) Cheers, Sig
Anonymous
March 14, 2008
Wow - Stephen, Dmitry & Sig all in one place! That's a party I'd like to go to. I know this conversation has gone off-topic from Stephen's original, and as always very helpful post, but I just want to agree with the call for a more 'open' format. The problem out here in software house land is that the msg format is the nearest thing we have to a standard for interoperability. It's treated in the same way as xls & doc, even if it is flawed. The .Net argument is also relevant because that's what we code in these days. In the past MAPI/CDO 1.21/redemption were often the choices for dealing with messaging because of the OOM shortcomings. Plus we don't always have outlook installed (for example on a server). From .Net I can no longer use any of these in a supported fashion without considerable pain. Nor can I read and write MSG files very easily. We need an open and documented file format for storing Email messages, contacts etc. And only MS can be the sheriff because you make outlook and exchange. I got all excited about winfs just because I thought finally I'd have a documented way to easily get at emails, contacts, tasks and meetings ... You have to understand that we are often under resourced, time poor and rarely do we have an expert C++/MAPI developer hiding in a cupboard. We're trying to use MS software to write business applications for our clients using the best MS tools (.Net) and Outlook/Exchange and believe me it's hard work sometimes!! Simple, published file formats and open APIs which perform and I can actually use would make a world of difference. I reckon clients buying our software have bought sql server & exchange licenses in the hundreds, so we're giving back to MS too! Big thanks to you guys, you've no idea how many times you've helped me out just by writing about this stuff. Here's to an open and managed (MMAPI?) future!! LJ
Anonymous
May 18, 2008
Hi, I am using BizTalk to receive email and I see the messages I receive are in MIME format. When I save the file and open it in OE, it makes the embedded image a part of the attachment and doesn't show it as an embedded image. Is there a way to convert the eml file to an MSG file? Thanks Grd
Anonymous
July 15, 2008
I hate to carry this on, but in a similar way to Grd I now have a problem with interoperability of email formats on my current project and I thought I'd mention it to highlight the pain we have with this. OK, I'm writing a program in C# to generate business emails and I want to be able to save them to a database for audit purposes. I may also want to re-send them, and I would like to support sending the emails from the client (outlook) for ad-hoc mails and from a server (smtp) for bulk mails. So the question is, how do I store them on the database and achieve all of the above? The obvious answer would seem to be Internet mime format, or maybe storing the addressing fields, attachments and body separately with the body as MHTML (forgetting the plain text for now, that's another story). MHTML and Internet format are after all standards and independent of the client/server formats. My problem is that there's no common format between the System.Net.Mail.MailMessage and Outlook. There's no way to convert from one to the other and no way to convert to/from a standard internet mail in either of them. I cannot even set the body using MHTML. If I save the body as MHTML, how do I construct an outlook email and System.Net.MailMessage from it? This goes even further as I want my users to be able to edit the body of the HTML email. The obvious solution seemed to be word, given that's what Outlook 2007 uses, and it can save to MHTML. I guess what we need is a standard format - like OOXML for email messages, or the adoption and support by microsoft of the RFC2822 Mime. And ideally the System.Net.Mail classes would also understand it. Obviously all of this is achievable if I spend months writing conversion routines, but I'm on a deadline. Something else that would help is if the domain and presentation logic of office was split out so I could use Outlook on a server. Assuming the domain logic was server-quality I'd be much happier using an Outlook format for storage. 
Any suggestions gratefully received LJ
Anonymous
July 22, 2008
One of our customers reported that if they used IConverterSession::MIMEToMAPI to generate an MSG file,
Anonymous
April 24, 2015
Apparently IConverterSession::MIMEToMAPI has a limit on the number of attachments in an eml that it can convert. We found we can't convert an eml which has 200+ attachments to msg. Can't find related documentation either. Is there a limit on the number of attachments that IConverterSession::MIMEToMAPI can handle? thanks!
Anonymous
April 24, 2015
I'm not familiar with such a limit, but 200 attachments is quite a few more than I'd expect to normally see, so it doesn't surprise me.