The Scales Xml equality framework aims to help both with testing applications that use Scales (similar to XmlUnit) and with runtime comparisons, e.g. if an element has this attribute with this value, do X.

The Scales equality framework does not throw exceptions; it returns information that allows decisions to be made. `===` returns a simple boolean, whereas `compare` can return a full ADT and path diagnosing the first difference found.

NB: Scala 2.8.x support for the equality framework is experimental. Importing `FromEqualsImplicit._` enables the use of `===` from Scalaz. Unfortunately, due to 2.8.x compiler issues, implicit resolution does not function correctly and may cause compiler crashes.

How To Use

Scales Xml equality leverages two type classes, XmlComparison and the Scalaz Equal typeclass, to provide comparison via a simple `===`. As such, Scalaz must be imported (`import scalaz._` and `import Scalaz._`), ideally after the Scales imports to avoid Tree import issues.

Wherever an XmlComparison exists, an Equal instance can be created. The result of a `compare` includes both a path to the difference and a fully pattern-matchable XML difference ADT.
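The relationship between a comparison result and a boolean equality can be sketched in plain Scala. This is a toy model only: `Attr`, `Difference`, `DifferentName`, `DifferentValue`, `Path` and the functions below are hypothetical names chosen for illustration and are not the Scales Xml types or signatures.

```scala
// Toy model only: hypothetical types, NOT the Scales Xml API.
object CompareSketch {
  final case class Attr(name: String, value: String)

  // A pattern-matchable description of what differed.
  sealed trait Difference
  final case class DifferentName(left: Attr, right: Attr) extends Difference
  final case class DifferentValue(left: Attr, right: Attr) extends Difference

  // Path to the difference, here simply the names of the enclosing elements.
  type Path = List[String]

  // Never throws: None means "equal", Some carries the first difference found.
  def compare(path: Path, left: Attr, right: Attr): Option[(Difference, Path)] =
    if (left.name != right.name) Some((DifferentName(left, right), path))
    else if (left.value != right.value) Some((DifferentValue(left, right), path))
    else None

  // A boolean equality can always be derived from a comparison.
  def isEqual(left: Attr, right: Attr): Boolean =
    compare(Nil, left, right).isEmpty

  // Callers decide what to do by pattern matching on the result.
  def describe(result: Option[(Difference, Path)]): String = result match {
    case None => "equal"
    case Some((DifferentName(l, r), path)) =>
      "names differ: " + l.name + " vs " + r.name + " at " + path.mkString("/")
    case Some((DifferentValue(l, r), path)) =>
      "values differ: " + l.value + " vs " + r.value + " at " + path.mkString("/")
  }
}
```

Because the difference is an ordinary sealed ADT, a caller can branch on the kind of difference rather than catching an exception or parsing a message.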

Types Covered

By default QNames are compared without the prefix (unlike Canonical Xml, where string comparisons including prefixes are expected); only the namespace is used (as per `=:=`). This means that documents created by different systems using different prefixes are still comparable. Supplying a different implicit default Equal[QName] instance changes that behaviour.
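The effect of swapping the equality instance can be illustrated with a small stand-alone sketch. The `QName` and `Equal` types here are simplified, hypothetical stand-ins rather than the Scales or Scalaz definitions, and `namespaceOnly`, `withPrefix` and `same` are names invented for the example.

```scala
// Toy model only: simplified stand-ins, NOT the Scales QName or Scalaz Equal.
object QNameEqualSketch {
  final case class QName(prefix: Option[String], namespace: String, local: String)

  trait Equal[A] { def equal(a: A, b: A): Boolean }

  // Default-style behaviour: the prefix is ignored, so differing prefixes still match.
  val namespaceOnly: Equal[QName] = new Equal[QName] {
    def equal(a: QName, b: QName): Boolean =
      a.namespace == b.namespace && a.local == b.local
  }

  // Alternative behaviour: additionally require identical prefixes.
  val withPrefix: Equal[QName] = new Equal[QName] {
    def equal(a: QName, b: QName): Boolean =
      namespaceOnly.equal(a, b) && a.prefix == b.prefix
  }

  // The instance is resolved implicitly, so the caller's scope picks the behaviour.
  def same(a: QName, b: QName)(implicit eq: Equal[QName]): Boolean = eq.equal(a, b)
}
```

With `QName(Some("p"), "uri:test", "local")` and `QName(Some("p2"), "uri:test", "local")`, `same` yields true when `namespaceOnly` is the instance in scope and false when `withPrefix` is.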

XmlTrees, XmlPaths etc. are converted to Iterator[PullType] in order to be compared. No attempt is made to match DTDs or encoding, but the rest of a given document (Doc and DocLike implementations) is compared.

Within the comparison framework the comparisons for all the types are combined: the QName Equal typeclass is used throughout, including by the Attribute comparison, which is used in turn by the Elem comparison, which is finally used by the Stream comparisons.

This lookup is performed implicitly, allowing individual parts to be swapped out, for example if the developer wants prefixes to be tested. Either use name-based overriding in the relevant scope, or mix the traits differently to provide custom behaviour (and do not import ScalesXml._).

_Note_ The three different kinds of QNames each have a different type and, as such, using === to compare different types will not work. Using compare, however, will:

Why Join Adjacent Text and CData?

Scales Xml equality makes the following default design decisions: prefixes are generally not relevant, only the namespace is (unless you tell it to use QName Token comparison), and adjacent CData and Text are joined.

The reason for joining adjacent CData and Text nodes is to simplify the comparison of text. CData can always be written as Text nodes, and a parser is free to "split" a single logical Text node into multiple smaller text nodes. Scales forces joining of text nodes neither at parse time nor when adding child nodes, so they must be joined before they can be usefully compared.
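A minimal sketch of such joining, written against a hypothetical item model rather than the Scales node types, shows why two differently split sequences compare equal once normalised. `Item`, `Text`, `CData`, `Comment` and `joinAdjacentText` are illustrative names only.

```scala
// Toy model only: hypothetical item types, NOT the Scales Xml node types.
object JoinTextSketch {
  sealed trait Item
  final case class Text(value: String) extends Item
  final case class CData(value: String) extends Item
  final case class Comment(value: String) extends Item

  // Merge each run of adjacent Text/CData items into a single Text item.
  // Folding from the right means the head of the accumulator is already normalised.
  def joinAdjacentText(items: List[Item]): List[Item] =
    items.foldRight(List.empty[Item]) {
      case (Text(a), Text(b) :: rest)  => Text(a + b) :: rest
      case (CData(a), Text(b) :: rest) => Text(a + b) :: rest
      case (CData(a), rest)            => Text(a) :: rest
      case (item, rest)                => item :: rest
    }
}
```

For example, `List(Text("hello "), Text("world"))` and `List(CData("hello world"))` both normalise to `List(Text("hello world"))`, while a Comment between two text items keeps them apart.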

Removing Comments And PIs

The default comparison logic treats both Comments and PIs as relevant for comparison. This design choice meets expectations for the majority of XML documents.

In the event that Comments and PIs should not be compared the implicits can be overridden in scope with:
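The idea behind such an override, stripping Comments and PIs from both streams before they are compared, can be sketched independently of the Scales API. The item types and the `dropCommentsAndPIs` and `sameStream` functions below are hypothetical and only illustrate the technique.

```scala
// Toy model only: hypothetical types and functions, NOT the Scales Xml implicits.
object IgnoreCommentsSketch {
  sealed trait Item
  final case class Text(value: String) extends Item
  final case class Comment(value: String) extends Item
  final case class PI(target: String, value: String) extends Item

  // Lazily filter out items that should not take part in the comparison.
  def dropCommentsAndPIs(items: Iterator[Item]): Iterator[Item] =
    items.filter {
      case _: Comment | _: PI => false
      case _                  => true
    }

  // Compare two streams after applying the same normalisation to each side.
  def sameStream(left: Iterator[Item], right: Iterator[Item])(
      normalise: Iterator[Item] => Iterator[Item]): Boolean =
    normalise(left).sameElements(normalise(right))
}
```

Passing `identity` as the normalisation keeps Comments and PIs relevant, matching the default described above; passing `dropCommentsAndPIs` ignores them.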

Why Not Use Canonical XML?

Testing Xml equality is not always straightforward. A standard approach does exist, however: Canonical Xml, a W3C-defined approach to serialization. Canonical Xml treats QName prefixes themselves as relevant; if an XML processor changes a prefix, that document is no longer comparable under Canonical Xml (No Namespace Prefix Rewriting).

Whilst the justification for the prefix rewriting rule in Canonical Xml is understandable in the context of embedded XPaths or XML QNames (whose prefixes only make sense within that document), Scales takes the position that this occurs far more rarely than the use of simple Xml as a data transport. However, as with the rest of Scales, this default too is customisable.

The problem, which also undermines the Canonical Xml reasoning, is that a producing application can rewrite the prefixes for embedded QNames or XPaths before sending. WSDM applications often encounter this (see Apache Muse for examples). Scales is, of course, not aware of such usage by default - see here for details on how to configure Scales to be QName token aware. In short, it can be as simple as declaring this in the correct scope:
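What QName token awareness involves can be sketched without reference to the Scales declaration itself: a text or attribute value of the form `prefix:local` is resolved against the namespace declarations in scope on each side, and the resolved names are compared instead of the raw strings. `Namespaces`, `resolve` and `tokensEqual` are hypothetical names for this illustration and do not reflect the Scales signatures.

```scala
// Toy model only: hypothetical helpers, NOT the Scales QName token configuration.
object QNameTokenSketch {
  // Prefix -> namespace URI declarations in scope at the node being compared.
  type Namespaces = Map[String, String]

  // Resolve "prefix:local" to (namespace, local).
  // None if the text is not of that form or the prefix is undeclared.
  def resolve(text: String, inScope: Namespaces): Option[(String, String)] =
    text.split(":") match {
      case Array(prefix, local) => inScope.get(prefix).map(ns => (ns, local))
      case _                    => None
    }

  // Compare as QName tokens when both sides resolve, otherwise as plain strings.
  def tokensEqual(left: String, leftNs: Namespaces,
                  right: String, rightNs: Namespaces): Boolean =
    (resolve(left, leftNs), resolve(right, rightNs)) match {
      case (Some(l), Some(r)) => l == r
      case _                  => left == right
    }
}
```

Under this sketch `p:local` with `p` bound to `uri:test` equals `q:local` with `q` bound to `uri:test`, even though the raw strings differ.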

Canonical Xml also forces redundant namespace declarations to be removed (Superfluous Namespace Declarations). Scales typically only uses namespace declarations for predictable document processing (i.e. loading and saving should be 1:1 in usage); however, it can also, if necessary, leverage declarations for QName Token handling in both Attribute values and Text/CData nodes.

Similarly the following approaches to Canonical Xml actually may break data assumptions of equality:

The latter may cause issues with certain receiving processors and depends on a validation / schema enrichment working. Scales concerns itself with the documents actually being compared.

In short, Scales offers a typed and more flexible approach to equality than Canonical Xml handling.