Scales Pull Parsing leverages StAX, the JAXP streaming API, and Scalaz Iteratees to allow flexible parsing of large documents, or of many documents, in a memory-constrained environment (e.g. a high-performance server).

As with full tree parsing, Scales allows both the optimisation strategy and the pull parser used to be configured. Unlike full tree parsing, the optimisation strategy type is a MemoryOptimisationStrategy, which does not allow for tree path optimisations.

The input to all pull xml functions is a sax.InputSource, which allows the same conversions as for a full tree parse.

A curio exists with pullXmlCompletely, which uses the pull parser to load xml Docs. This may be of use in an environment where the StAX parser performs better than SAX, but tests have shown lower memory consumption and higher performance when using SAX to parse full trees.

Some of the advanced features of Pull Parsing may require importing Scalaz as well:
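Assuming the usual Scalaz wildcard imports are what is meant here (the exact set needed may differ with the Scalaz version and the feature in use), this would look like:

```scala
// Assumption: the standard Scalaz imports; adjust to the Scalaz version on the classpath.
import scalaz._
import Scalaz._
```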

Pull Model

Pull parsing yields a stream of events, where XmlEvent is exactly the same Elem and XmlItem model used for full trees. This is possible because Scales separates the structure of data from the data itself.

The developer only has to learn one difference to be able to use pull parsing. For example, the code to process a stream is simply:
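A minimal sketch of such a loop follows. It is not the Scales API: `Elem`, `Text` (standing in for an XmlItem), `EndElem` and the `PullType` alias are local stand-ins defined here so the example compiles on its own, and the shape of the events (Left for items and element starts, Right for element ends) is an assumption.

```scala
object PullLoopSketch {
  // Local stand-ins so the sketch compiles without the library.
  sealed trait XmlEvent
  final case class Elem(name: String) extends XmlEvent
  final case class Text(value: String) extends XmlEvent // plays the role of an XmlItem
  final case class EndElem(name: String)

  // Assumed event shape: Left for items and element starts, Right for element ends.
  type PullType = Either[XmlEvent, EndElem]

  // Walks the stream once, returning the deepest nesting seen and all text values.
  def summarise(pull: Iterator[PullType]): (Int, List[String]) = {
    var depth = 0
    var deepest = 0
    val texts = List.newBuilder[String]
    while (pull.hasNext) {
      pull.next() match {
        case Left(Elem(_)) =>
          depth += 1
          deepest = math.max(deepest, depth)
        case Left(Text(value)) =>
          texts += value
        case Right(EndElem(_)) =>
          depth -= 1
      }
    }
    (deepest, texts.result())
  }
}
```

With the real library the iterator would come from pullXml rather than from a hand-built sequence of events.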

Resource Management

The above example has a serious potential flaw: if anything in the while loop throws, the resource cannot be closed. To allow greater control of the resource, Scales provides the CloseOnNeed interface (full details are in the API documentation).

Importantly closeResource only closes a resource once. This resource is directly available when calling pullXmlResource.

pullXml itself is also a Closable and provides the same guarantee: close only attempts to close the resource once.

Both results provide the isClosed function (via the IsClosed interface) allowing code to trust that it has been closed. (NB - a future version may choose to expose this in the type system, but integrating with the ARM library makes more sense).

What CloseOnNeed also adds is the ++ function, which combines one CloseOnNeed with another to create a new CloseOnNeed that closes the other two resources. This allows chaining of xml files (via pull iterators).
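The guarantees described above can be modelled in a few lines. This is a simplified sketch rather than the Scales source: `closeResource`, `isClosed`, `IsClosed` and `++` are named in the text, while `doClose`, `Counted` and `using` are hypothetical helpers introduced only for illustration.

```scala
object CloseOnNeedSketch {
  trait IsClosed { def isClosed: Boolean }

  trait CloseOnNeed extends IsClosed { self =>
    private var closed = false

    // Hypothetical hook: the actual close action for the underlying resource.
    protected def doClose(): Unit

    def isClosed: Boolean = closed

    // Runs the close action at most once, however often it is called.
    def closeResource(): Unit =
      if (!closed) { closed = true; doClose() }

    // Combines two resources into one that closes both.
    def ++(other: CloseOnNeed): CloseOnNeed = new CloseOnNeed {
      protected def doClose(): Unit =
        try self.closeResource() finally other.closeResource()
    }
  }

  // Counts real close calls so the close-once guarantee can be observed.
  final class Counted extends CloseOnNeed {
    var closes = 0
    protected def doClose(): Unit = closes += 1
  }

  // Runs `body` and closes `res` afterwards, even if `body` throws.
  def using[R <: CloseOnNeed, A](res: R)(body: R => A): A =
    try body(res) finally res.closeResource()
}
```

Wrapping the processing loop in something like `using` addresses the flaw noted earlier: the resource is closed whether or not the loop throws, and later calls to closeResource do nothing.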

Simple Reading Of Repeated Sections

Where the interesting parts of a document always repeat in the same location, that location can be modelled as a simple List of QNames (a very simplified XPath):

The resulting Iterator contains paths with single-child parents up to the root, plus the whole subtree of interest.
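The idea can be sketched with a small self-contained model. It is an illustration under simplifying assumptions, not the Scales implementation: the `Event` types and the `sections` function are hypothetical, names are plain strings rather than QNames, and each result is the list of events for one matching subtree rather than a path.

```scala
object RepeatedSectionsSketch {
  sealed trait Event
  final case class Start(name: String) extends Event
  final case class Text(value: String) extends Event
  final case class End(name: String) extends Event

  // For each element whose chain of names from the root equals `path`,
  // yields the events of that element's subtree. Other events are dropped
  // as they are read, so only one section is held in memory at a time.
  def sections(path: List[String], events: Iterator[Event]): Iterator[List[Event]] =
    new Iterator[List[Event]] {
      private val target = path.reverse          // innermost name first
      private var stack: List[String] = Nil      // open elements, innermost first
      private var ready: Option[List[Event]] = None

      private def advance(): Unit = {
        var buffer: List[Event] = Nil            // current section, newest event first
        var collecting = false
        while (ready.isEmpty && events.hasNext) {
          val ev = events.next()
          ev match {
            case Start(name) =>
              stack = name :: stack
              if (stack == target) collecting = true
              if (collecting) buffer = ev :: buffer
            case Text(_) =>
              if (collecting) buffer = ev :: buffer
            case End(_) =>
              if (collecting) buffer = ev :: buffer
              if (collecting && stack == target) ready = Some(buffer.reverse)
              stack = stack.drop(1)
          }
        }
      }

      def hasNext: Boolean = {
        if (ready.isEmpty) advance()
        ready.isDefined
      }

      def next(): List[Event] = {
        if (!hasNext) throw new NoSuchElementException("no more sections")
        val section = ready.get
        ready = None
        section
      }
    }
}
```

Because the result is itself a lazy Iterator, the input is only read as far as is needed to produce the next section.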

Buffering And Identifying Xml Messages

When parsing xml messages it is often necessary to identify the type of the message before further processing: for example, what kind of SOAP request is being sent, or what the root element is.

To help with this issue Scales pull parsing offers the ability to "peek" into an event stream and replay the events again to fully process them.

A simple example is processing SOAP messages based on the first body element. You may want to choose different code paths based on this element, but still require elements in the header to do the processing. The usage is simple via the capture function and the skip/skipv functions:

The result from skip is simply Option[XmlPath]; if the stream runs out, or it is no longer possible to reach that position, the result is None. Only as much of the stream is read as needed: reading stops on the Left(Elem) event.

NB: to identify only the first element, simply use skip(Nil) instead (or skipv()).
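The peek-and-replay idea can be illustrated with a self-contained model. This is a sketch under assumptions, not the Scales capture and skip functions: `Capture`, `restart`, `capturedCount` and `firstChildOf` are hypothetical names, and the lookup is by element name rather than by position as skip does.

```scala
object PeekReplaySketch {
  sealed trait Event
  final case class Start(name: String) extends Event
  final case class Text(value: String) extends Event
  final case class End(name: String) extends Event

  // Remembers every event handed out so the stream can be replayed from the start.
  final class Capture[A](underlying: Iterator[A]) extends Iterator[A] {
    private val seen = scala.collection.mutable.ArrayBuffer.empty[A]
    def hasNext: Boolean = underlying.hasNext
    def next(): A = { val a = underlying.next(); seen += a; a }
    def capturedCount: Int = seen.size
    // Captured events first, then whatever has not been read yet.
    def restart: Iterator[A] = seen.toList.iterator ++ underlying
  }

  // Name of the first element opened after `parent` starts, reading no further than needed.
  def firstChildOf(parent: String, events: Iterator[Event]): Option[String] = {
    var insideParent = false
    while (events.hasNext) {
      events.next() match {
        case Start(name) if insideParent => return Some(name)
        case Start(name) if name == parent => insideParent = true
        case End(name) if insideParent && name == parent => return None
        case _ => ()
      }
    }
    None // stream ran out before the position was reached
  }
}
```

After the lookup, the code path chosen for the identified element can consume `restart` and still see the header events that were read while peeking.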

Scales Xml 0.5.0
