Pulling Repeated Sections

Scales leverages and extends Scalaz Iteratees to allow resuming an Iteratee. This resuming action is simply returning the current value and the next continuation when done (ResumableIter). The iterate function, as shown here, uses this approach to provide a single path repeating section.

Many documents, however, have a more complex structure made up of many repeated or alternating sections. The following examples show the structures supported by the combination of onDone and onQNames:

Supported Repeating Section Examples

It's far easier to discuss the solution with a few examples of the problem:

Alternating and Repeating Elements

  <root>
    <nested>
      <ofInterest> <!-- Collect all of these -->
        <lotsOfInterestingSubTree>
        </lotsOfInterestingSubTree>
      </ofInterest>
      <alsoOfInterest> <!-- Collect all of these -->
	just some text
      </alsoOfInterest>
    </nested>
...
    <nested>
....
  </root>

Note that monadic serial composition of onQNames would also work here. onDone is not strictly necessary, although, as we will see, it is more general.

Grouped Repeating

  <root>
    <nested>
      <ofInterest> <!-- Collect all of these -->
        <lotsOfInterestingSubTree>
        </lotsOfInterestingSubTree>
      </ofInterest>      
    </nested>
...
    <nested>
      <alsoOfInterest> <!-- Collect all of these -->
	just some text
      </alsoOfInterest>	
    </nested>
....
  </root>

Repeating Nested

  <root>
    <nested>
      <ofInterest> <!-- Collect all of these -->
        <lotsOfInterestingSubTree>
          <smallKeyValues> <!-- Collect all of these -->
            <key>toLock</key>
            <value>fred</value>
          </smallKeyValues>
        </lotsOfInterestingSubTree>
      </ofInterest>
    </nested>
...
    <nested>
....
  </root>

Sectioned Grouped Repeating

  <root>
    <section>
      <!-- Necessary for processing the below events -->
      <sectionHeader>header 1</sectionHeader>

      <ofInterest> <!-- Collect all of these -->
        <lotsOfInterestingSubTree>
	  <value>1</value>
        </lotsOfInterestingSubTree>
      </ofInterest>
      <ofInterest> <!-- Collect all of these -->
        <lotsOfInterestingSubTree>
	  <value>2</value>
        </lotsOfInterestingSubTree>
      </ofInterest>
      <ofInterest> <!-- Collect all of these -->
        <lotsOfInterestingSubTree>
	  <value>3</value>
        </lotsOfInterestingSubTree>
      </ofInterest>
    </section>
...
    <section>
      <!-- Necessary for processing the below events -->
      <sectionHeader>header 2</sectionHeader>
....
  </root>

Pull Parsing ResumableIter'atees

ResumableIter is an Iteratee over E that instead of returning just a Done[R] returns Done[(R, NextResumableIter)]. The next ResumableIter stores the calculation up until the point of returning, allowing the calculation to be resumed.
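
The idea can be sketched in plain Scala without any library. The types and names below (Step, Done, Cont, Resumable, sumEvery, runAll) are hypothetical stand-ins chosen for illustration, not the Scales or Scalaz definitions; the sketch only assumes that a finished step carries both its result and the iteratee to resume with.

```scala
// Hypothetical, simplified model; NOT the Scales or Scalaz types.
sealed trait Step[E, R]
// Done carries the result AND the iteratee to continue with.
final case class Done[E, R](result: R, next: Resumable[E, R]) extends Step[E, R]
// Cont means "no result yet, keep feeding input".
final case class Cont[E, R](next: Resumable[E, R]) extends Step[E, R]

trait Resumable[E, R] {
  def feed(e: E): Step[E, R]
}

object ResumableSketch {
  // Emits the sum of every `size` consecutive inputs, then resumes with a fresh sum.
  def sumEvery(size: Int, seen: Int = 0, acc: Int = 0): Resumable[Int, Int] =
    new Resumable[Int, Int] {
      def feed(e: Int): Step[Int, Int] =
        if (seen + 1 == size) Done(acc + e, sumEvery(size))
        else Cont(sumEvery(size, seen + 1, acc + e))
    }

  // Drives an iteratee over the input, resuming from `next` after every Done.
  def runAll[E, R](it: Resumable[E, R], input: List[E]): List[R] = {
    val (_, results) = input.foldLeft((it, List.empty[R])) {
      case ((current, acc), e) =>
        current.feed(e) match {
          case Done(r, next) => (next, r :: acc)
          case Cont(next)    => (next, acc)
        }
    }
    results.reverse
  }
}
```

Calling ResumableSketch.runAll(ResumableSketch.sumEvery(2), List(1, 2, 3, 4, 5)) collects one result per completed pair, and the trailing unpaired input simply leaves the iteratee waiting for more.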

To process the above examples we make use of this and the onDone Iteratee. onDone takes a list of ResumableIter and applies each input element to every Iteratee in that list. Its Done returns both a list of the Iteratees which evaluated to Done for that input and (of course) the next continuation of onDone.
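
A rough, library-free sketch of that combinator follows. Everything in it is hypothetical (the Step/Resumable model, matching, onDoneSketch and runAll are illustrative names), and it assumes the combinator stays in a continuing state when no member finished for an input; the real onDone may differ in such details.

```scala
// Hypothetical, simplified model; NOT the Scales or Scalaz types.
sealed trait Step[E, R]
final case class Done[E, R](result: R, next: Resumable[E, R]) extends Step[E, R]
final case class Cont[E, R](next: Resumable[E, R]) extends Step[E, R]
trait Resumable[E, R] { def feed(e: E): Step[E, R] }

object OnDoneSketch {
  // Completes with `label` whenever the input equals `target`;
  // a stand-in for a path matcher such as onQNames.
  def matching(label: String, target: Int): Resumable[Int, String] =
    new Resumable[Int, String] {
      def feed(e: Int): Step[Int, String] =
        if (e == target) Done(label, this) else Cont(this)
    }

  // Feeds each input to every member. When at least one member finishes,
  // returns their results together with a combinator over all continuations.
  def onDoneSketch[E, R](its: List[Resumable[E, R]]): Resumable[E, List[R]] =
    new Resumable[E, List[R]] {
      def feed(e: E): Step[E, List[R]] = {
        val steps = its.map(_.feed(e))
        val finished = steps.collect { case Done(r, _) => r }
        val nexts = steps.map {
          case Done(_, n) => n
          case Cont(n)    => n
        }
        val next = onDoneSketch(nexts)
        // Assumption: no finished member means "keep going".
        if (finished.isEmpty) Cont(next) else Done(finished, next)
      }
    }

  // Drives an iteratee over the input, resuming after every Done.
  def runAll[E, R](it: Resumable[E, R], input: List[E]): List[R] = {
    val (_, results) = input.foldLeft((it, List.empty[R])) {
      case ((current, acc), e) =>
        current.feed(e) match {
          case Done(r, next) => (next, r :: acc)
          case Cont(next)    => (next, acc)
        }
    }
    results.reverse
  }
}
```

Because every member's continuation is carried forward, matchers that did not fire for one input remain available for later inputs, which is what lets alternating and grouped sections be handled by the same combinator.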

A simple, and recommended, way to leverage onDone is with the foldOnDone function:

  val Headers = List("root"l,"section"l,"sectionHeader"l)
  val OfInterest = List("root"l,"section"l,"ofInterest"l)

  val ofInterestOnDone = onDone(List(onQNames(Headers), onQNames(OfInterest)))

  val total = foldOnDone(xml)( (0, 0), ofInterestOnDone ){ 
    (t, qnamesMatch) =>
    if (qnamesMatch.size == 0) {
      t // no matches
    } else {
      // only one at a time possible for xml matches (unless multiple identical onQNames are passed to onDone).
      assertEquals(1, qnamesMatch.size)
      val head = qnamesMatch.head
      assertTrue("Should have been defined",head._2.isDefined)
	  
      // we should never have more than one child in the parent
      // and that's us
      assertEquals(1, head._2.get.zipUp.children.size)

      val i = text(head._2.get).toInt
      // onQNames always returns the list as well as the XmlPath to allow matching against the input.
      if (head._1 eq Headers) {
        assertEquals(t._1, t._2)
        // get new section
        (i, 1)
      } else (t._1, i)
    }
  }
 
  assertEquals(total._1, total._2)
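
The accumulation logic of that fold can be looked at in isolation. The sketch below is plain Scala over a hand-written list of hypothetical matches (plain strings instead of QNames, and made-up values), so it shows only how reference equality on the path list selects the branch and how the pair is updated; it does not use foldOnDone itself.

```scala
object FoldSketch {
  // Stand-ins for the path lists handed to onQNames (plain strings, not QNames).
  val Headers = List("root", "section", "sectionHeader")
  val OfInterest = List("root", "section", "ofInterest")

  // Hypothetical match stream: (the path list that matched, the element text as an Int).
  // One section whose header says 3, followed by three ofInterest elements.
  val matches: List[(List[String], Int)] =
    List((Headers, 3), (OfInterest, 1), (OfInterest, 2), (OfInterest, 3))

  // Accumulator: (current header value, last ofInterest value seen in this section).
  val total: (Int, Int) = matches.foldLeft((0, 0)) {
    case (t, (path, i)) =>
      // `eq` compares references, so the very list instance that was passed in
      // identifies which matcher fired.
      if (path eq Headers) (i, 1) // new section starts
      else (t._1, i)              // remember the latest ofInterest value
  }
}
```

With this input the header value and the last ofInterest value agree at the end of the section, which is the property the final assertion in the original example checks.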

Scales Xml 0.5.0
