Frequently Asked Questions about SAX2

 

The following are some frequently asked questions about the Simple API for XML (SAX).

What are the system requirements for SAX2?

The system requirements for SAX2 are the same as those for MSXML. See Installing MSXML.

Does SAX support validation?

Yes. With MSXML 6.0, SAX supports validation against XSD schemas, but not against Document Type Definition (DTD) files.

To validate documents when using SAX, set the validation feature on the SAXXMLReader by calling the putFeature method.

Pass "schema-validation" as the feature name and True as its value.

This feature is read-only during parsing and read/write at all other times.

For more information, see Validate Documents Using SAX.

Can I pass a BSTR, instead of a URL, to the SAXXMLReader?

Yes. You can pass a VARIANT containing a BSTR to ISAXXMLReader::parse(VARIANT). A BSTR is a UTF-16 string, so in this case the encoding is UTF-16.

Why is white space reported as characters()? Why isn't ignorableWhitespace called?

White space can occur in several places. For example, an element that has no character data may contain only child elements and the white space between them. To ignore such white space, the parser must be able to tell it apart from white space that is part of the content. The SAX parser is a nonvalidating parser and cannot make that distinction, so ignorableWhitespace() is never called. Nonvalidating parsers report all white space between elements through characters().
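The same behavior can be observed with other nonvalidating SAX2 parsers. The sketch below is not MSXML code: it uses Python's standard xml.sax module as a stand-in, and the names WhitespaceRecorder and record_whitespace are illustrative assumptions. It records which callback receives the white space between elements.

```python
import xml.sax


class WhitespaceRecorder(xml.sax.ContentHandler):
    """Records which callback receives inter-element white space."""

    def __init__(self):
        super().__init__()
        self.character_chunks = []
        self.ignorable_chunks = []

    def characters(self, content):
        # A nonvalidating parser delivers white space here.
        self.character_chunks.append(content)

    def ignorableWhitespace(self, whitespace):
        # Only a parser that knows the content model can call this.
        self.ignorable_chunks.append(whitespace)


def record_whitespace(xml_bytes):
    """Parse xml_bytes and return the handler holding the recorded chunks."""
    handler = WhitespaceRecorder()
    xml.sax.parseString(xml_bytes, handler)
    return handler


# <root> contains only a child element and the white space around it.
recorder = record_whitespace(b"<root>\n  <child/>\n</root>")

# The parser may split the text into several chunks, so join them.
whitespace_text = "".join(recorder.character_chunks)
```

In this sketch, all of the white space arrives through characters() and ignorable_chunks stays empty. An application that wants to drop such white space has to filter it in its own handler.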

How do I get XML header information?

The XML header contains version and encoding information, for example, <?xml version="1.0" encoding="UTF-8"?>. To get XML header information, call ISAXXMLReader::getProperty([in] const wchar_t * pwchName, [out, retval] VARIANT * pvarValue) and pass one of the following three property names:

"xmldecl-encoding"

"xmldecl-version"

"xmldecl-standalone"

Note

The "xmldecl-encoding", "xmldecl-version", and "xmldecl-standalone" properties provide information about the presence and content of the XML header. The information is available only while SAXXMLReader is reading and parsing the XML document. After processing, control returns to the application, and this information is no longer available.

XML header information was designed for low-level reader and parser use, not for applications.

To get processing instructions, implement a content handler that supports ISAXContentHandler and handle the processingInstruction event.
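As a rough illustration of handling the processingInstruction event, the sketch below uses Python's standard xml.sax module rather than MSXML. The PICollector class and the sample document are assumptions made for the example.

```python
import xml.sax


class PICollector(xml.sax.ContentHandler):
    """Collects every processing instruction as a (target, data) pair."""

    def __init__(self):
        super().__init__()
        self.instructions = []

    def processingInstruction(self, target, data):
        self.instructions.append((target, data))


DOCUMENT = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<?xml-stylesheet type="text/xsl" href="style.xsl"?>\n'
    b"<root/>"
)

collector = PICollector()
xml.sax.parseString(DOCUMENT, collector)
```

With this Python parser, the handler receives the xml-stylesheet instruction, but the XML declaration on the first line is not reported as a processing instruction.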

How can I use SAXXMLWriter from scripts?

SAXXMLReader implements IVBSAXXMLReader, which is accessible from scripts. You can call handler events directly from SAXXMLWriter and generate XML without the reader.
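The idea of driving a writer by calling handler events directly, with no reader involved, can be sketched as follows. This is not MSXML script code: it uses XMLGenerator from Python's standard xml.sax.saxutils module as an analogous SAX writer, and build_greeting is a hypothetical helper.

```python
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def build_greeting():
    """Generate a small document by calling SAX events on the writer."""
    buffer = io.StringIO()
    writer = XMLGenerator(buffer, encoding="utf-8")

    # No reader is involved: the application fires each event itself.
    writer.startDocument()
    writer.startElement("greeting", AttributesImpl({"lang": "en"}))
    writer.characters("hello")
    writer.endElement("greeting")
    writer.endDocument()

    return buffer.getvalue()


greeting_xml = build_greeting()
```

The events must be fired in the order a reader would produce them, because the writer emits output as each event arrives.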

Can I use the same instance of SAXXMLReader to parse XML files sequentially?

You can use the same instance of SAXXMLReader to parse XML files one after another, but not from different threads. The MSXML implementation does not support multithreaded use: AddRef/Release are not thread-safe, and there is no locking on any of the API entry points.

However, you can use two instances of SAXXMLReader in two threads to parse two different XML files, as long as the instances share nothing.
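The two usage patterns can be sketched in Python's standard xml.sax module, which stands in for MSXML here. The thread-safety details of MSXML itself are not modeled, and ElementCounter, count_sequentially, and count_elements are illustrative names. The first function reuses one reader for several documents in turn; the second gives each worker thread its own reader and handler so that nothing is shared.

```python
import io
import xml.sax
from concurrent.futures import ThreadPoolExecutor


class ElementCounter(xml.sax.ContentHandler):
    """Counts the start tags seen in one document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_sequentially(documents):
    """Reuse a single reader for several documents, one after another."""
    reader = xml.sax.make_parser()
    counts = []
    for xml_bytes in documents:
        handler = ElementCounter()
        reader.setContentHandler(handler)
        reader.parse(io.BytesIO(xml_bytes))
        counts.append(handler.count)
    return counts


def count_elements(xml_bytes):
    """Parse one document with a reader and handler private to the caller."""
    reader = xml.sax.make_parser()
    handler = ElementCounter()
    reader.setContentHandler(handler)
    reader.parse(io.BytesIO(xml_bytes))
    return handler.count


DOCUMENTS = [b"<a><b/><c/></a>", b"<x><y><z/></y><w/><v/></x>"]

sequential_counts = count_sequentially(DOCUMENTS)

# One reader per task: no reader or handler crosses a thread boundary.
with ThreadPoolExecutor(max_workers=2) as pool:
    threaded_counts = list(pool.map(count_elements, DOCUMENTS))
```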

How can I write from SAXXMLWriter to a memory buffer in a non-Unicode encoding?

You can provide an IStream/ISequentialStream object that writes to a memory buffer. The XML is generated the same way as for output to a file.
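The same approach, a writer that sends encoded bytes to an in-memory stream, can be sketched with Python's standard library in place of MSXML and IStream. Here io.BytesIO plays the role of the memory buffer, ISO-8859-1 is chosen as an example non-Unicode encoding, and write_latin1 is a hypothetical helper.

```python
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def write_latin1(text):
    """Write a one-element document to memory, encoded as ISO-8859-1."""
    buffer = io.BytesIO()  # in-memory byte stream instead of a file
    writer = XMLGenerator(buffer, encoding="iso-8859-1")

    writer.startDocument()
    writer.startElement("name", AttributesImpl({}))
    writer.characters(text)
    writer.endElement("name")
    writer.endDocument()

    return buffer.getvalue()


# U+00E9 is a single byte (0xE9) in ISO-8859-1.
latin1_bytes = write_latin1("caf\u00e9")
```

The writer code is identical to what it would be for a file. Only the stream object passed to the writer changes.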

How can I reset SAXXMLWriter to create a new string?

Consider the following scenario:

  1. Generate an XML document.

  2. Retrieve the output string from the writer.

  3. Generate another XML document.

How do I avoid appending a new XML document to the previous one?

To reset the XML writer so that it creates a new string, reset the output property before you generate the next document. Internally, this calls the flush method of IMXXMLWriter.

How can I tell if attribute values have an entity reference?

You cannot. SAX gives no indication of whether an attribute value contained an entity reference. The handler receives the attribute value with any references already expanded.
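This can be illustrated with Python's standard xml.sax module, used here as a stand-in for MSXML. The AttributeValueCollector class and attribute_values helper are assumptions made for the example. Two documents are parsed, one using the predefined entity reference &apos; and one using the literal character, and the handler sees the same value in both cases.

```python
import xml.sax


class AttributeValueCollector(xml.sax.ContentHandler):
    """Collects every attribute value reported to startElement."""

    def __init__(self):
        super().__init__()
        self.values = []

    def startElement(self, name, attrs):
        for attr_name in attrs.getNames():
            self.values.append(attrs.getValue(attr_name))


def attribute_values(xml_bytes):
    """Parse xml_bytes and return the attribute values that were reported."""
    handler = AttributeValueCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.values


# One value is written with an entity reference, the other literally.
with_reference = attribute_values(b'<note text="it&apos;s"/>')
without_reference = attribute_values(b'<note text="it\'s"/>')
```

Both lists hold the same string, so the handler has no way to tell which form appeared in the source.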

Can I find the order of element attributes?

The order of attributes is not significant in XML, and is therefore not exposed. Enumerating the attributes may return them in their original order, but you should not rely on this.
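Because the order is not guaranteed, handlers are usually written to look attributes up by name rather than by position. A minimal sketch of that style follows, using Python's standard xml.sax module instead of MSXML. AttributeReader and the sample element are illustrative assumptions.

```python
import xml.sax


class AttributeReader(xml.sax.ContentHandler):
    """Stores attributes in a mapping keyed by attribute name."""

    def __init__(self):
        super().__init__()
        self.attributes = {}

    def startElement(self, name, attrs):
        # Key each value by name so the result does not depend on
        # the order in which the parser enumerates the attributes.
        for attr_name in attrs.getNames():
            self.attributes[attr_name] = attrs.getValue(attr_name)


reader_handler = AttributeReader()
xml.sax.parseString(b'<item id="7" colour="red"/>', reader_handler)

# Look up by name, never by index.
item_id = reader_handler.attributes["id"]
item_colour = reader_handler.attributes["colour"]
```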

How do I handle errors with SAX?

ISAXErrorHandler/IVBSAXErrorHandler provides the basic interface for handling parsing errors. Currently, all errors are fatal.

In C++, a fatal error causes the parse or parseURL method to return an HRESULT other than S_OK.
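The general pattern, an error handler that receives the fatal error and a parse call that reports failure to its caller, can be sketched with Python's standard xml.sax module. This is not the MSXML API: Python signals failure with an exception rather than an HRESULT, and RecordingErrorHandler and try_parse are hypothetical names.

```python
import xml.sax


class RecordingErrorHandler(xml.sax.ErrorHandler):
    """Records fatal errors, then re-raises them to stop the parse."""

    def __init__(self):
        self.fatal_errors = []

    def fatalError(self, exception):
        self.fatal_errors.append(exception)
        raise exception


def try_parse(xml_bytes):
    """Return (succeeded, fatal_errors) for one parse attempt."""
    error_handler = RecordingErrorHandler()
    try:
        xml.sax.parseString(xml_bytes, xml.sax.ContentHandler(), error_handler)
    except xml.sax.SAXParseException:
        # The failed parse surfaces to the caller, much as a
        # non-S_OK return value would in C++.
        return False, error_handler.fatal_errors
    return True, error_handler.fatal_errors


good_ok, good_errors = try_parse(b"<a><b/></a>")
bad_ok, bad_errors = try_parse(b"<a><b></a>")  # mismatched end tag
```

The recorded exception carries the position of the error, which the caller can read through its getLineNumber() and getColumnNumber() methods.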