Java API for XML Parsing
Release Notes
Version: 1.0.1
This document contains notes that may help you use this library
more effectively.
Please feel free to send problem reports,
questions, and other feedback to the feedback alias,
<xml-feedback@java.sun.com>.
With respect to new feature requests, please keep in mind that we
want to see packages built on top of this core for most features.
The core API is intended to facilitate a layered architecture
for value-added products that leverage XML.
Conformance
- The parsers conform to the W3C's XML 1.0 recommendation. Sun has
done extensive testing to ensure that they conform as closely as
possible to this recommendation.
- The parser supports the JAXP 1.0 pluggability API.
- The parse tree supports the XML (core) part of W3C's DOM Level 1
recommendation.
- In combination, the two also support the current W3C XML
Namespaces recommendation.
- The parser supports the SAX 1.0 API. Sun has done extensive
testing to ensure that it conforms as closely as possible to this API.
- The entity resolution used within the parser normally
conforms to the IETF's RFC 2376 registration for XML-related
MIME content types. This can be overridden as required.
(See below; overriding may be necessary because many web servers do
not conform to that specification, and report incorrect character
set encoding information.)
- This parser supports all of the character encodings supported
by the Java platform with which it is used. See the package
overview for the com.sun.xml.parser package for more
detailed information, including names of specific encodings that
are widely used.
- When used in a supported configuration (JDK 1.1.8 and later),
this software is Y2K compliant; it has no date related content.
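The SAX and namespace support listed above can be exercised through the pluggability API. The following is a minimal sketch, assuming a current JDK rather than this 1.0.1 release: it obtains a namespace-aware SAX parser from SAXParserFactory and records each element's namespace URI and local name. It uses the SAX2 DefaultHandler class instead of the SAX 1.0 HandlerBase that this release targets; the class name SaxElementNames and the namespace urn:example:books are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxElementNames {
    /** Parses xml and returns "{namespaceURI}localName" for each element, in document order. */
    static List<String> elementNames(String xml) throws Exception {
        // The factory hides which parser implementation is actually used.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // report namespace URIs and local names
        SAXParser parser = factory.newSAXParser();

        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add("{" + uri + "}" + localName);
            }
        };
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book xmlns='urn:example:books'><title>XML</title></book>";
        // prints [{urn:example:books}book, {urn:example:books}title]
        System.out.println(elementNames(xml));
    }
}
```

The title element has no prefix but inherits the default namespace declared on book, which is why both names carry the same URI.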
Parser
- There are two factory classes for making parsers pluggable:
javax.xml.parsers.SAXParserFactory and
javax.xml.parsers.DocumentBuilderFactory. If you write your code
against the javax.xml.parsers, org.xml.sax, and org.w3c.dom
classes, it will work independently of the underlying parser
implementation.
- Please let us know about any diagnostics produced by the
parser that are misleading or confusing.
- Whenever you work with text encodings other than UTF-8 and
UTF-16, put an encoding declaration at the very beginning of
all your XML files (including DTDs). If you don't, the parser
cannot determine the encoding being used and will probably be
unable to parse your document. A declaration such as
<?xml version='1.0' encoding='euc-jp'?>
says that the document uses the "euc-jp" encoding.
- The parser currently reports warnings, rather than errors,
in cases where the declared and actual text encodings don't match.
It may give those same warnings in the common case where the encoding
name used internally to Java is not the one used in the document.
If the declared encoding is truly an error, you'll usually see other
errors (not warnings) being reported by the parser.
- The parser currently does not report an error for content
models that are not deterministic. Accordingly, it may behave
incorrectly when given data that matches an "ambiguous" content
model such as ((a,b)|(a,c)). DTDs with such models are in
error and must be restructured to be unambiguous. (In the example,
(a,(b|c)) is an equivalent legal content model.)
- If you are using JDK 1.1 with large numbers of symbols
(more than can be counted in sixteen bits), the Java VM may abort
with the message "panic: 16-bit string hash table overflow".
The Java 2 SDK does not have this limitation.
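Several of the notes above (pluggable factories, encoding declarations, warnings versus errors, and deterministic content models) can be seen together in one small sketch, assuming a current JDK. It obtains a validating DocumentBuilder from DocumentBuilderFactory, feeds it raw ISO-8859-1 bytes whose XML declaration names that encoding, and installs an ErrorHandler that keeps warnings but aborts on errors. The internal DTD uses the legal model (a,(b|c)) instead of the ambiguous ((a,b)|(a,c)). The class names ValidatingDomParse and CollectingHandler are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class ValidatingDomParse {
    /** Records warnings separately and stops on errors, so the two are not confused. */
    static class CollectingHandler implements ErrorHandler {
        final List<String> warnings = new ArrayList<>();

        @Override public void warning(SAXParseException e) { warnings.add(e.getMessage()); }
        @Override public void error(SAXParseException e) throws SAXParseException { throw e; }
        @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
    }

    /** Parses raw bytes; the parser reads the encoding from the XML declaration. */
    static Document parse(byte[] bytes, CollectingHandler handler) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // check content against the DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(handler);
        return builder.parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        // Deterministic model (a,(b|c)) rather than the ambiguous ((a,b)|(a,c)).
        String xml = "<?xml version='1.0' encoding='ISO-8859-1'?>\n"
                + "<!DOCTYPE r [<!ELEMENT r (a,(b|c))> <!ELEMENT a (#PCDATA)>"
                + " <!ELEMENT b EMPTY> <!ELEMENT c EMPTY>]>\n"
                + "<r><a>caf\u00e9</a><c/></r>";
        CollectingHandler handler = new CollectingHandler();
        Document doc = parse(xml.getBytes(StandardCharsets.ISO_8859_1), handler);
        // DOM Level 1 calls only: element a, then its text child.
        String text = doc.getElementsByTagName("a").item(0).getFirstChild().getNodeValue();
        System.out.println(text + " warnings=" + handler.warnings.size());
    }
}
```

If the XML declaration is removed from the sample, the parser is expected to assume UTF-8 and reject the single ISO-8859-1 byte used for "é" as a fatal error.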
Object Model
- Conforming to the XML specification, the parser reports all
whitespace to the DOM, even if it is meaningless. Many applications
do not want to see such whitespace. You can remove it by invoking
the Element.normalize method, which merges adjacent text
nodes and also canonicalizes adjacent whitespace into a single space
(unless the xml:space="preserve" attribute prevents it).
- Currently, attribute nodes may not have children. Access their
values as strings instead of enumerating children.
- Currently, when documents are cloned, the clone will not have a
clone of the associated ElementFactory or DocumentType.
- The in-memory representation of text nodes has not been tuned
to be efficient with respect to space utilization.
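The DOM specification itself only requires normalize to merge adjacent text nodes, so code that must also run on other DOM implementations may prefer to strip whitespace-only text nodes itself. The sketch below, assuming a current JDK and the standard DOM API, does that while leaving subtrees marked xml:space="preserve" alone, and reads an attribute value as a string with Element.getAttribute rather than enumerating the attribute's children. The class DomCleanup and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class DomCleanup {
    /**
     * Removes whitespace-only text nodes below n, skipping elements marked
     * xml:space="preserve". Simplification: a nested xml:space="default"
     * inside a preserved subtree is not honoured.
     */
    static void stripWhitespace(Node n) {
        if (n.getNodeType() == Node.ELEMENT_NODE
                && "preserve".equals(((Element) n).getAttribute("xml:space"))) {
            return;
        }
        Node child = n.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // save before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                n.removeChild(child);
            } else {
                stripWhitespace(child);
            }
            child = next;
        }
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<r>\n  <a id='x1'>text</a>\n  <p xml:space='preserve'> <b/> </p>\n</r>");
        Element root = doc.getDocumentElement();
        stripWhitespace(root);
        // Read the attribute value as a string, not via the Attr node's children.
        String id = ((Element) root.getFirstChild()).getAttribute("id");
        System.out.println(root.getChildNodes().getLength() + " " + id); // prints 2 x1
    }
}
```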
Other Issues
- This software is a "Java Optional Package" for
XML processing.
- If you recompile the DOM implementation using versions of
"javac" older than the Java 2 SDK version 1.2, or version 1.3
beta, you may run into a compiler bug. The symptom is a report of
illegal access violations for some of the private classes inside the
DOM implementation.
This is because of incorrect code generated by the compiler.
Compile these sources only with a compiler that does not have this
bug, or use the pre-compiled version
in this release. There is no bytecode dependency on the Java 2
runtime; you may use these classes on JDK 1.1 systems also.
- The Microsoft SDK 3.2 for Java (and presumably all earlier
versions) has bugs similar to the one noted above. There are
both compiler and JVM bugs; the JVM bugs prevent the correct
byte codes (as produced by the Java 2 SDK) from working. This
means that you can't compile or use this DOM code with Microsoft
implementations of Java until Microsoft fixes these bugs, which
have been reported to Microsoft.
Changes since 1.0
- The default parser is used in controlled environments, such as
applets, where System.getProperty() throws a SecurityException.
- A default Message.properties file is provided to avoid getting
error codes in locales other than English.
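The applet fix above amounts to treating a forbidden property lookup like an unset property. The following is a minimal sketch of that fallback, not the actual JAXP lookup code: it models only the system-property step. The property name javax.xml.parsers.DocumentBuilderFactory is the one JAXP uses to select a DOM factory; the class FactoryLookup and the default class name com.example.DefaultDocumentBuilderFactory are hypothetical.

```java
public class FactoryLookup {
    // JAXP system property that names the DocumentBuilderFactory implementation.
    static final String PROPERTY = "javax.xml.parsers.DocumentBuilderFactory";
    // Hypothetical default implementation class name, for illustration only.
    static final String DEFAULT_IMPL = "com.example.DefaultDocumentBuilderFactory";

    /** Returns the configured class name, or the default if none is set or reading it is forbidden. */
    static String implementationClassName() {
        try {
            String configured = System.getProperty(PROPERTY);
            if (configured != null) {
                return configured;
            }
        } catch (SecurityException e) {
            // For example, an applet sandbox that disallows reading system properties:
            // fall through to the default instead of failing.
        }
        return DEFAULT_IMPL;
    }

    public static void main(String[] args) {
        System.out.println(implementationClassName());
    }
}
```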
Changes since EA1
- API for pluggability has changed. See the specification and
javadocs for more details.
- All reported bugs have been fixed, including those reported
internally for SAX 1.0, DOM Level 1, and the JAXP 1.0 API.