The XML XPath specification allows navigation of XML documents via a DSL that describes routes through a document using a combination of axes, steps and predicates. It has a limited number of these abstractions, but together they create a querying language that is powerful and direct whilst remaining simple to use.

Scales provides this power via both a traditional string-based approach and an embedded DSL that leverages Scala's syntactic flexibility to mimic the XPath syntax.

The DSL uses the existing Scales abstractions to the full, and works via a zipper over the XmlTree itself. Each navigation step through the tree creates new zippers and new paths through the tree.

Wherever possible (the namespace:: axis being the exception) the behaviour closely follows the specification, with like-for-like queries producing the same matches. Instead of matching on prefixes, Scales matches against fully qualified expanded QNames (qualifiedName in the QName Functions), so no prefix context is required to evaluate a query.
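
The effect of matching on expanded names rather than prefixes can be illustrated with the JDK's javax.xml.namespace.QName, whose equality also ignores the prefix. This is only an analogy in plain Scala and does not use the Scales QName type; the namespace URI and names are made up for the illustration.

```scala
import javax.xml.namespace.QName

object ExpandedNameSketch {
  // Same namespace URI and local name, different prefixes (hypothetical values).
  val viaPrefixA = new QName("urn:example", "Child", "a")
  val viaPrefixB = new QName("urn:example", "Child", "b")
  // Same local name, but in no namespace at all.
  val noNamespace = new QName("Child")

  // JDK QName equality compares namespace URI and local part only.
  val sameDespitePrefix: Boolean = viaPrefixA == viaPrefixB
  val differsByNamespace: Boolean = viaPrefixA != noNamespace

  // The expanded form "{namespace}local" carries no prefix.
  val expanded: String = viaPrefixA.toString

  def main(args: Array[String]): Unit =
    println(s"$expanded sameDespitePrefix=$sameDespitePrefix")
}
```

Because the comparison never consults a prefix, the same query matches documents that bind different prefixes to the same namespace.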

Internally, perhaps unsurprisingly, XPath is implemented as a combination of filter, map and flatMap. When retrieving results (e.g. converting to an Iterable) the results are sorted into document order; this can be expensive for large result sets (see Unsorted Results for alternatives).
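
How filter, map and flatMap combine into path steps can be sketched with ordinary Scala collections. This is a minimal illustration that assumes a toy element tree; the names Elem, child and named are hypothetical and it does not reproduce the Scales zipper or its internals.

```scala
object StepSketch {
  // A toy element tree; not the Scales XmlTree.
  final case class Elem(name: String, children: List[Elem] = Nil)

  val doc: Elem =
    Elem("root", List(
      Elem("a", List(Elem("b"), Elem("c"))),
      Elem("a", List(Elem("b")))
    ))

  // Child step: each context node expands to many nodes, hence flatMap.
  def child(ctx: List[Elem]): List[Elem] = ctx.flatMap(_.children)

  // Name predicate: narrows the current context, hence filter.
  def named(n: String)(ctx: List[Elem]): List[Elem] = ctx.filter(_.name == n)

  // Roughly /root/a/b, starting with the root element as the context.
  val matches: List[Elem] = named("b")(child(named("a")(child(List(doc)))))

  // map pulls a value out of each match once navigation is finished.
  val names: List[String] = matches.map(_.name)

  def main(args: Array[String]): Unit = println(names)
}
```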

Simple Usage Examples

XPath Crash Course

Scales Xml follows the XPath spec fairly closely and accordingly represents the concepts of context, location steps and axes, full details of which can be found in the XPath Standard.

The context, which can be thought of as the current "place" in the document, is represented by the following:

Location steps are a combination of axis, node test and predicates, e.g. /*fred, which represents the child axis, the element node test and a predicate against a no-namespace local name of "fred".

As the XPath adds more axes, steps and predicates the context changes, reducing or expanding the possible matches as it develops. Scales Xml's XPath DSL represents that context with the XPath class, where each operation on that class returns another immutable instance for the next context.

As with XPath, Scales Xml predicates, axes and node tests can be chained, with the current context (the self axis in XPath) always represented by the resulting Scales XPath object. Only when the underlying results are used (for example by string or qname functions) do they leave the XPath object and get transformed into a list of matching nodes, which is ordered by default.
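
The idea of an immutable context that returns a new instance from every step, and only yields values when asked, can be sketched as follows. Ctx, Node and their methods are hypothetical stand-ins written for this illustration; they are not the Scales XPath class or its method names.

```scala
object ContextSketch {
  final case class Node(name: String, text: String = "", children: List[Node] = Nil)

  // Hypothetical stand-in for a context: every step returns a fresh instance.
  final class Ctx(val nodes: List[Node]) {
    def child: Ctx = new Ctx(nodes.flatMap(_.children))
    def named(n: String): Ctx = new Ctx(nodes.filter(_.name == n))
    // Leaving the context: only here are values extracted from the matches.
    def strings: List[String] = nodes.map(_.text)
  }

  val doc: Node = Node("root", children = List(
    Node("item", "one"), Node("other", "x"), Node("item", "two")))

  val start = new Ctx(List(doc))
  // Two independent chains from the same starting context; start is unchanged.
  val items: Ctx = start.child.named("item")
  val others: Ctx = start.child.named("other")

  def main(args: Array[String]): Unit = {
    println(items.strings)
    println(others.strings)
  }
}
```

Because no step mutates its receiver, an intermediate context such as start can be reused safely as the base for several queries.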

XPath Axes

Scales supports all of the useful XPath axes, each of which can be used against a given context (an instance of Scales XPath). For full details of each axis, see the XPath spec.

XPath Axis	Scales DSL	Details
ancestor	ancestor_::	All the parents of this context
ancestor-or-self	ancestor_or_self_::	All the parents of this context and this node
attribute	*@	All the attributes for a given context; often combined directly with a name
child	\ or \+ to expand XmlItems	Children of this context. NB: \ alone in Scales DSL simply removes the initialNode setting required by \\. If the children should be expanded (e.g. to use .filter directly) then \+ will "unpack" the child nodes.
descendant	descendant_::	All children, and their children
descendant-or-self	descendant_or_self_::	This node and all descendants, also known as \\
following	following_::	All nodes that follow this context in document order, excluding the child nodes of this context
following-sibling	following_sibling_::	All direct children of this context's parent node that follow in document order.
parent	\^	The parent context of this context. For elements it represents the parent element and for attributes the containing element.
preceding	preceding_::	All nodes that precede this context in document order excluding the parent nodes
preceding-sibling	preceding_sibling_::	All previous children of the parent in the current context in document order.
self	The XPath object itself via .	The current context node within a document.

A commonly used abbreviation not listed above is of course \\, which means descendant_or_self_::. The difference is that \\ also supports possible eager evaluation and, as per the spec, the notion of \\ at the beginning of an expression.

NB: The Scales embedded XPath DSL does not support the namespace axis. If you have a requirement for it then it can be looked at (please send an email to the mailing list to discuss possible improvements).
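
The XPath side of the axis table can be explored with the JDK's string XPath engine (javax.xml.xpath). The sketch below uses that engine rather than the Scales DSL, against a small document invented for the illustration, and simply lists the element names each axis selects.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.NodeList
import org.xml.sax.InputSource

object AxisSketch {
  // Hypothetical document used only for this illustration.
  val xml = "<root><a><b/><c/></a><d/><e><f/></e></root>"

  private val doc = DocumentBuilderFactory.newInstance()
    .newDocumentBuilder()
    .parse(new InputSource(new StringReader(xml)))
  private val xpath = XPathFactory.newInstance().newXPath()

  // Evaluate a string XPath and return the names of the matched elements.
  def names(expr: String): List[String] = {
    val nl = xpath.evaluate(expr, doc, XPathConstants.NODESET).asInstanceOf[NodeList]
    (0 until nl.getLength).map(i => nl.item(i).getNodeName).toList
  }

  def main(args: Array[String]): Unit = {
    println(names("//a/descendant::*"))        // children of a, and their children
    println(names("//a/following-sibling::*")) // later children of a's parent
    println(names("//c/ancestor::*"))          // all parents of c
    println(names("//e/preceding::*"))         // nodes before e, excluding its parents
    println(names("//d/parent::*"))            // the parent element of d
  }
}
```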

Node Tests

XPath Node Test	Scales DSL	Details
node()	.\+	Returns a new context for all the children below a given context
text()	.text	Returns a new context for all the text and cdata below a given context
comment()	.comment	Returns a new context for all the comments below a given context

Predicates

The first two are special cased, as in the XPath spec, as they are the most heavily used predicates (using the above example document):

In each case the XmlPath (or AttributePath) is passed to the predicate with a number of shortcuts for the common QName based matches and positional matches for elements:

The developer can choose to ignore namespaces (not recommended) by using the *:* and *:@ predicates instead (equivalent to string xpath /*= "x").

Predicate Construction

The various base node types and filters are based on these functions, for example the element predicate * is implemented as:

In turn \* can be seen as a combination of the \ child step and the * predicate (via xflatMap) and is provided as syntactic sugar.

All of the standard set of predicates (and axis combinations) can be found in the XPath ScalaDoc. Clicking the right arrow for many of the functions will lead you to the Definition Classes docs and their code.

Chaining Predicates

Predicates can be chained on the context itself, i.e. the XPath object, for example:

This represents /root/*ns:Child[.\@nsp:attr3] where the * Scales Xml element predicate allows matching on the self axis. The same chaining is available on the attribute axis represented by the AttributePaths class.

Positional Predicates

These positional tests, which are more difficult to model, can be used in the same way as position() and last() are in XPath.

Direct Filtering

The xflatMap, xmap, xfilter and filter methods allow extra predicate usage where the existing XPath 1.0 functions don't suffice.

The filter method accepts a simple XmlPath => Boolean, whereas the other varieties work on the matching sets themselves.

It is not recommended to use these functions for general use as they primarily exist for internal re-use.

Unsorted Results and Views

To meet expected XPath usage, results are sorted in document order and checked for duplicates. If this is not necessary, but speed of matching over a result set is (for example lazy querying over a large set), then the raw functions (either raw or rawLazy) are good choices.

The viewed function, however, uses views as its default type and may help add further lazy evaluation. Whilst tests have shown that lazy evaluation takes place, it's worth profiling your application to see if it actually impacts performance in the expected fashion.
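
Why sorting and de-duplicating costs more than raw or lazy access can be seen in a small model. The sketch below assumes each match is identified by its path of child indexes from the root, which is an assumption of the illustration and not how Scales stores positions; it uses only standard Scala collections.

```scala
import scala.annotation.tailrec

object DocumentOrderSketch {
  // A match is identified by its child-index path from the root (hypothetical model),
  // e.g. List(0, 2) is the third child of the root's first child.
  type Position = List[Int]

  // Lexicographic comparison of index paths gives document order:
  // a parent (a shorter prefix) sorts before its own children.
  @tailrec
  def before(a: Position, b: Position): Boolean = (a, b) match {
    case (Nil, Nil)         => false
    case (Nil, _)           => true
    case (_, Nil)           => false
    case (x :: xs, y :: ys) => if (x != y) x < y else before(xs, ys)
  }

  // Raw matches as a query might produce them: unordered and with a duplicate.
  val raw: List[Position] = List(List(1, 0), List(0), List(1, 0), List(0, 2), List(1))

  // The costly route: every match must exist before the first can be returned.
  val sorted: List[Position] = raw.distinct.sortWith(before)

  // The lazy route: a view stops as soon as the first suitable match is found.
  val firstNested: Option[Position] = raw.view.filter(_.length > 1).headOption

  def main(args: Array[String]): Unit = {
    println(sorted)
    println(firstNested)
  }
}
```

The lazy route gives up ordering and uniqueness in exchange for not materialising the whole result set, which is the trade-off described above.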

The positional predicates correspond to the XPath position functions as follows:

XPath Position Function	Scales DSL	Details
position()	pos_<, pos_==, pos() and pos_>	Functions to work against the current position within a context
last()	last_<, last_== and last_>	Functions that work against the size of a given context
position() == last()	pos_eq_last	Take the last item in a context
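
The XPath behaviour that these positional functions mirror can be checked with the JDK's string XPath engine (javax.xml.xpath). The sketch below does not use the Scales DSL, and the document is invented for the illustration.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.w3c.dom.NodeList
import org.xml.sax.InputSource

object PositionSketch {
  // Hypothetical document: four sibling items.
  val xml = "<root><item>a</item><item>b</item><item>c</item><item>d</item></root>"

  private val doc = DocumentBuilderFactory.newInstance()
    .newDocumentBuilder()
    .parse(new InputSource(new StringReader(xml)))
  private val xpath = XPathFactory.newInstance().newXPath()

  // Evaluate a string XPath and return the text of each matched element.
  def texts(expr: String): List[String] = {
    val nl = xpath.evaluate(expr, doc, XPathConstants.NODESET).asInstanceOf[NodeList]
    (0 until nl.getLength).map(i => nl.item(i).getTextContent).toList
  }

  def main(args: Array[String]): Unit = {
    println(texts("/root/item[position() < 3]"))      // the first two items
    println(texts("/root/item[position() > 1]"))      // everything after the first
    println(texts("/root/item[position() = last()]")) // the last item only
    println(texts("/root/item[last() > 3]"))          // all items, as the context holds four
  }
}
```

Note that position() is 1-based and last() is the size of the context, so a test on last() alone either keeps or drops the whole context.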