Background on Metadata Query Language

During the design phase of WIC (Windows Imaging Component), we took a holistic view of the issues that plagued our predecessor, GDI+. Many of the items raised by customers, in newsgroups, and in internal discussions revolved around three areas: building a stable and robust set of CODECs; providing what we now call the extensibility system for CODECs, metadata handlers, and pixel formats; and handling metadata formats robustly. By far the largest issue was providing an extensible and robust mechanism for handling metadata.

If you look closely at the GDI+ API set for metadata, you’ll find a hard-coded list of metadata items that can be read from or written to a file. One problem with such a list is that it binds the implementation to a specific version of a metadata handler: if a new EXIF tag is introduced, you cannot get to it. Another problem is that a hard-coded list imposes a “policy” on what each enum value actually binds to. For example, the enum lists “PropertyTagArtist”, but in JPEG and TIFF files does it map to the IPTC writer or contact information, or to the artist information stored in the IFD? The only way to find out is to experiment. Lastly, with the introduction of new metadata formats such as XMP, the approach of using such an enum simply falls apart.

Within WIC, these issues have been resolved, and you’ll notice that the hard-coded list is gone. So what replaces it? This brings us to the topic at hand: the metadata query language. How does the metadata query language solve the problems described above?

  • Extensibility: Third parties can implement their own metadata handlers. Once they do, they can register their own keyword, which is then used to access metadata in the new schema.
  • Full Disclosure: By using the full metadata “path”, a user can see exactly which piece of metadata is being read or written.
  • Policy Component: If a user still wants to access or modify metadata through a built-in policy, they can do so using a set of keywords instead of a full metadata “path”.
  • Familiar Syntax: The metadata query language is very similar to basic XPath in the XML world. By “basic”, I mean only the core concepts: a “path” to a piece of metadata, plus an index for choosing among multiple items of the same metadata. Full-fledged XPath features are not currently supported in the metadata query language.
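
To make the path syntax concrete, here is a minimal Python sketch of a parser for query strings like the ones in this post. It is not WIC’s parser (in native code you would hand the string to IWICMetadataQueryReader::GetMetadataByName, and in WPF to BitmapMetadata.GetQuery). The grammar it assumes is inferred from the examples in this post only: segments separated by unescaped “/”, an optional “[n]” or “[*]” index prefix, typed ids such as “{ushort=315}”, “namespace:name” pairs, and backslash escapes. The Segment type and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    index: Optional[str]      # "*" or "n" from a "[...]" prefix, else None
    namespace: Optional[str]  # alias such as "dc", or an unescaped namespace URI
    name: str                 # item name, or the value of a {type=value} id
    id_type: Optional[str] = None  # e.g. "ushort" in {ushort=315}


def split_unescaped(text: str, sep: str) -> List[str]:
    """Split on sep wherever it is not escaped by a backslash.
    Escape sequences are kept intact so later stages still see them."""
    parts, cur, i = [], [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            cur.append(text[i:i + 2])  # keep the backslash and escaped char together
            i += 2
            continue
        if text[i] == sep:
            parts.append("".join(cur))
            cur = []
        else:
            cur.append(text[i])
        i += 1
    parts.append("".join(cur))
    return parts


def unescape(text: str) -> str:
    """Remove the backslash in front of each backslash-escaped character."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def parse_query(query: str) -> List[Segment]:
    """Parse an absolute query path (hypothetical grammar, see lead-in)."""
    if not query.startswith("/"):
        raise ValueError("not an absolute query path: " + query)
    segments = []
    for raw in split_unescaped(query[1:], "/"):
        index = None
        if raw.startswith("["):                        # e.g. "[*]tEXt"
            close = raw.index("]")
            index, raw = raw[1:close], raw[close + 1:]
        if raw.startswith("{") and raw.endswith("}"):  # e.g. "{ushort=315}"
            id_type, _, value = raw[1:-1].partition("=")
            segments.append(Segment(index, None, value, id_type))
            continue
        ns_and_name = split_unescaped(raw, ":")
        if len(ns_and_name) == 1:                      # plain name, e.g. "ifd"
            segments.append(Segment(index, None, unescape(raw)))
        else:                                          # "ns:name"; first unescaped ":" wins
            ns, name = ns_and_name[0], ":".join(ns_and_name[1:])
            segments.append(Segment(index, unescape(ns), unescape(name)))
    return segments
```

For example, parse_query("/[*]tEXt/Author") yields a “tEXt” segment with index “*” followed by an “Author” segment, while the long IPTC form splits each segment at its first unescaped colon into the namespace URI and the property name.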

The recent update to WICExplorer added searching by metadata query language strings. I’ve also posted a managed sample (source code and a compiled build) for getting metadata out of a file. Using one or both of these tools is extremely helpful for understanding how the metadata query language works.

Using these sample images, here are some examples:

  • JPEG containing XMP metadata:
      o (Note that with XMP, namespaces can be aliased to short names.)
      o Creator:
          - /xmp/creator/{ulong=0}
          - /xmp/dc:creator/{ulong=0}
      o Description:
          - /xmp/description/x-default
          - /xmp/dc:description/x-default
      o Creator Contact Information:
          - /xmp/CreatorContactInfo/CiAdrExtadr
          - /xmp/http\:\/\/iptc.org\/std\/Iptc4xmpCore\/1.0\/xmlns\/:CreatorContactInfo/http\:\/\/iptc.org\/std\/Iptc4xmpCore\/1.0\/xmlns\/:CiAdrExtadr
  • JPEG containing IFD metadata:
      o Artist:
          - /app1/ifd/{ushort=315}
      o ImageDescription:
          - /app1/ifd/{ushort=270}
  • TIFF containing XMP metadata:
      o (Note that with XMP, namespaces can be aliased to short names.)
      o Creator:
          - /ifd/xmp/creator/{ulong=0}
          - /ifd/xmp/dc:creator/{ulong=0}
      o Description:
          - /ifd/xmp/description/x-default
          - /ifd/xmp/dc:description/x-default
      o Creator Contact Information:
          - /ifd/xmp/CreatorContactInfo/CiAdrExtadr
          - /ifd/xmp/http\:\/\/iptc.org\/std\/Iptc4xmpCore\/1.0\/xmlns\/:CreatorContactInfo/http\:\/\/iptc.org\/std\/Iptc4xmpCore\/1.0\/xmlns\/:CiAdrExtadr
  • TIFF containing IFD metadata:
      o Artist:
          - /ifd/{ushort=315}
      o ImageDescription:
          - /ifd/{ushort=270}
  • TIFF containing IPTC metadata in the IFD:
      o /ifd/iptc/by-line
  • PNG tEXt metadata:
      o Author:
          - /[*]tEXt/Author
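
The long XMP forms above are simply the full namespace URI with every “:” and “/” backslash-escaped, followed by an unescaped “:” and the property name. The Python sketch below rebuilds the Creator Contact Information queries for both JPEG and TIFF; only the container prefix differs. It assumes that backslash, slash, and colon are the only characters needing an escape, which is inferred from these examples rather than taken from a WIC specification, and the function names are hypothetical.

```python
from typing import List, Tuple

# Characters assumed to need a backslash escape inside a query segment
# (inferred from the examples in this post, not from a WIC specification).
SPECIAL = "\\/:"


def escape(text: str) -> str:
    """Prefix each special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


def xmp_long_form(container_prefix: str, steps: List[Tuple[str, str]]) -> str:
    """Join (namespace URI, property name) steps as <escaped URI>:<name>.

    container_prefix is "/xmp" for JPEG and "/ifd/xmp" for TIFF,
    following the examples above.
    """
    parts = [container_prefix]
    for ns_uri, name in steps:
        parts.append(escape(ns_uri) + ":" + escape(name))
    return "/".join(parts)


IPTC_CORE = "http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/"
steps = [(IPTC_CORE, "CreatorContactInfo"), (IPTC_CORE, "CiAdrExtadr")]
jpeg_query = xmp_long_form("/xmp", steps)
tiff_query = xmp_long_form("/ifd/xmp", steps)
```

The two results match the long-form JPEG and TIFF queries listed above character for character, which is a quick way to check that a hand-typed escaped path is well formed.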

As you may have noticed, the same piece of metadata can live in several places within a file. For example, an abstracted view of author information includes the following locations (not a complete list):

  • JPEG:
      o /xmp/dc:creator
      o /xmp/tiff:artist
      o /app1/ifd/{ushort=315}
      o /app1/ifd/{ushort=40093}
  • TIFF:
      o /ifd/xmp/dc:creator
      o /ifd/xmp/tiff:artist
      o /ifd/iptc/by-line
      o /ifd/{ushort=315}
      o /ifd/{ushort=40093}
  • PNG:
      o /[*]tEXt/Author
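
To see why this duplication matters, here is a hedged Python sketch that treats a file’s metadata as a plain dictionary from query strings to values. That dictionary is a stand-in, not a WIC object; against real files you would issue one query per location. The sketch reports which of the author locations above are populated and whether the copies disagree. Value types are simplified to strings, and the names AUTHOR_LOCATIONS, author_copies, and copies_disagree are hypothetical.

```python
from typing import Dict

# Author locations per container format, copied from the (incomplete) list above.
AUTHOR_LOCATIONS = {
    "jpeg": ["/xmp/dc:creator", "/xmp/tiff:artist",
             "/app1/ifd/{ushort=315}", "/app1/ifd/{ushort=40093}"],
    "tiff": ["/ifd/xmp/dc:creator", "/ifd/xmp/tiff:artist",
             "/ifd/iptc/by-line", "/ifd/{ushort=315}", "/ifd/{ushort=40093}"],
    "png": ["/[*]tEXt/Author"],
}


def author_copies(file_format: str, metadata: Dict[str, str]) -> Dict[str, str]:
    """Return {query: value} for each author location the file actually has."""
    return {q: metadata[q] for q in AUTHOR_LOCATIONS[file_format] if q in metadata}


def copies_disagree(copies: Dict[str, str]) -> bool:
    """True when two or more populated locations hold different values."""
    return len(set(copies.values())) > 1
```

A JPEG with “Alice” in the IFD Artist tag and “Alice B.” in XMP dc:creator yields two copies that disagree, which is exactly the ambiguity an application has to resolve somehow when it reads absolute paths.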

Instead of using absolute metadata query language paths, you can use a policy component path. For example, the path for author is “System.Author”. When the metadata query engine sees this type of query, it hands it off to the policy component, which returns a fully qualified metadata query language string. If a file has the requested information in more than one location, it’s the policy component’s job to determine which fully qualified path to actually use. When encoding metadata, you can also set the author using “System.Author”. This writes the metadata to all applicable locations within the file (depending on which metadata formats are already present in the file).
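
The division of labor between the query engine and the policy component can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the real policy component: the POLICY table, its priority order, the “blocks present” set, and every function name are hypothetical, and the file is again a plain dictionary of query strings to values. A query that does not start with “/” is assumed to be a policy name and is resolved to a fully qualified path; a policy write updates every location whose metadata block already exists in the file.

```python
from typing import Dict, List, Optional, Set

# Hypothetical policy table: friendly name -> per-format list of
# (metadata block, fully qualified query), in an assumed priority order.
# The real policy component's table and precedence rules are not modeled.
POLICY = {
    "System.Author": {
        "jpeg": [("/xmp", "/xmp/dc:creator"),
                 ("/app1/ifd", "/app1/ifd/{ushort=315}")],
        "tiff": [("/ifd/xmp", "/ifd/xmp/dc:creator"),
                 ("/ifd/iptc", "/ifd/iptc/by-line"),
                 ("/ifd", "/ifd/{ushort=315}")],
    },
}


def resolve(name: str, file_format: str, metadata: Dict[str, str]) -> Optional[str]:
    """Read side: return the first location that holds a value, or None."""
    for _block, query in POLICY[name][file_format]:
        if query in metadata:
            return query
    return None


def get_value(query: str, file_format: str, metadata: Dict[str, str]) -> Optional[str]:
    """Absolute paths start with "/"; anything else goes to the policy step."""
    if not query.startswith("/"):
        resolved = resolve(query, file_format, metadata)
        if resolved is None:
            return None
        query = resolved
    return metadata.get(query)


def set_via_policy(name: str, file_format: str, blocks_present: Set[str],
                   metadata: Dict[str, str], value: str) -> List[str]:
    """Write side: update every location whose metadata block already exists."""
    written = []
    for block, query in POLICY[name][file_format]:
        if block in blocks_present:
            metadata[query] = value
            written.append(query)
    return written
```

The asymmetry follows the text: a read needs exactly one answer, so resolve stops at the first populated location, while a write fans out to every applicable location so the copies stay in sync. For a JPEG whose only metadata block is the IFD, a policy write of the author touches only /app1/ifd/{ushort=315}, because there is no XMP block to update.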

Currently, these policy component query strings work only on Windows Vista; if you try to use them on Windows XP, they fail. However, as Peggi mentioned in our Channel9 interview, there will be a WIC redistributable package that includes the policy component. This redistributable package will be available for Windows XP SP2 and Windows Server 2003.

All in all, the metadata query language is a pretty powerful tool. Play around with it a bit in WICExplorer and the managed metadata sample. Let us know if you have any questions!