Ruby XML (REXML)

XML (eXtensible Markup Language) is a markup language like HTML. It lets programmers exchange data between applications regardless of the operating system or programming language used.

It is well suited to keeping track of small to medium amounts of data without an SQL-based backend.

REXML is a pure-Ruby XML processor. A REXML::Document represents a complete XML document, including processing instructions (PIs), the doctype, and so on. A document has a single root element, which can be accessed through root. REXML does not write a default XML declaration for you, so if you want one in a document you create, you must add it yourself.
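As an illustration of the point about declarations, here is a minimal sketch that builds a document from scratch and adds the XML declaration explicitly. The element names and the version/encoding values are just sample choices.

```ruby
require 'rexml/document'

# Start with an empty document; no XML declaration is present yet.
doc = REXML::Document.new

# Add the declaration explicitly (version and encoding are sample values).
doc << REXML::XMLDecl.new('1.0', 'UTF-8')

# Add the single root element and one child.
root = doc.add_element('info')
root.add_element('name').text = 'Caroline'

# Serialize into a string and print it.
output = String.new
doc.write(output)
puts output
```

Calling doc.root afterwards returns the info element created above.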

REXML was inspired by the Electric XML library for Java. Its API is small and easy to use, and it follows Ruby conventions for method naming and code flow.

It supports both tree and stream parsing. Stream parsing is about 1.5 times faster than tree parsing. However, in stream parsing you don’t get access to some features, such as XPath.
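To show what tree parsing offers that stream parsing does not, the following sketch runs two XPath queries against a parsed document. The inline XML is sample data made up for this illustration.

```ruby
require 'rexml/document'

# Sample data, parsed into a tree.
xml = '<info><name>Caroline</name><city>Seattle</city></info>'
doc = REXML::Document.new(xml)

# XPath.first returns the first matching node (or nil if nothing matches).
city = REXML::XPath.first(doc, '//city')
puts city.text

# XPath.match returns every matching node as an array.
names = REXML::XPath.match(doc, '/info/*').map(&:name)
puts names.inspect
```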


REXML features:

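  • It is written 100 percent in Ruby.
  • It contains fewer than 2,000 lines of code, which keeps it lightweight.
  • Its methods and classes are easy to understand.
  • It ships with Ruby, so it normally does not need to be installed separately. Since Ruby 3.0 it is a bundled gem, so Bundler-managed projects may need to list rexml in their Gemfile.
  • It supports both DOM-like (tree) and SAX-like (stream) parsing.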

Parsing XML and accessing elements

Let’s start with parsing an XML document:

require "rexml/document"
file = File.new("trial.xml")
doc = REXML::Document.new(file)

In the above code, the last line parses the supplied file.

Example:

require 'rexml/document'

include REXML

file = File.new("trial.xml")
doc = Document.new(file)
puts doc

In the above code, the require statement loads the REXML library. The include REXML line means we don’t have to write qualified names like REXML::Document. The script opens the trial.xml file we created, parses it, and prints the resulting document to the screen.


The Document.new method takes an IO object, a String, or a Document as its argument. This argument specifies the source from which the XML document is read.

If the constructor receives a Document, the new document takes over its context and attributes; according to the REXML documentation the child nodes are not copied, so use deep_clone when a full copy of the tree is needed. If the constructor receives a String, the string is expected to contain an XML document.
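The different kinds of source can be sketched as follows. StringIO stands in for a file here so the snippet is self-contained, and the XML content is sample data. The last step uses deep_clone to get an independent copy of a parsed tree.

```ruby
require 'rexml/document'
require 'stringio'

xml = '<info><name>Caroline</name></info>'

# Source given as a String.
from_string = REXML::Document.new(xml)

# Source given as an IO-like object (StringIO stands in for a File).
from_io = REXML::Document.new(StringIO.new(xml))

# An independent copy of the whole tree.
copy = from_string.deep_clone
copy.root.elements['name'].text = 'Changed'

puts from_string.root.elements['name'].text
puts from_io.root.elements['name'].text
puts copy.root.elements['name'].text
```

Changing the copy leaves the document it was cloned from untouched.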


XML with “Here Document”

A here document (heredoc) is a way to specify a block of text while preserving its line breaks, whitespace, and indentation.

A here document is written as “<<” followed by a token string. The text continues until a line that contains only that token.

In Ruby, there must be no space between “<<” and the token string.

Example:

#!/usr/bin/env ruby

require 'rexml/document'
include REXML

info = <<XML
<info>
 <name>Caroline</name>
 <street>9820 St.</street>
 <city>Seattle</city>
 <contact>9854126575</contact>
 <country>USA</country>
</info>
XML

document = Document.new(info)
puts document

Here, info is a here document. All the characters between <<XML and the closing XML token, including newlines, become part of info.
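When the XML literal sits inside indented code, Ruby's squiggly heredoc (<<~, available since Ruby 2.3) may be more convenient because it strips the common leading indentation. A small sketch with sample data:

```ruby
require 'rexml/document'

# <<~ removes the common leading indentation from every line.
info = <<~XML
  <info>
    <name>Caroline</name>
    <city>Seattle</city>
  </info>
XML

doc = REXML::Document.new(info)

# The first line starts directly with the tag; the indentation is gone.
puts info.lines.first

# Child elements can be looked up by name.
puts doc.root.elements['city'].text
```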

For the XML parsing examples, we will use a file named trial.xml as input. Its listing is not reproduced here, but the scripts below show the structure they expect: a root <collection> element with a shelf attribute, containing <clothing> elements that each carry a title attribute and have <type> and <description> children.
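The contents of trial.xml are not shown in this text. To try the parsing scripts anyway, a stand-in file with the structure they expect could be generated as sketched below. Only the element and attribute names are taken from the scripts; every value is a made-up placeholder.

```ruby
require 'rexml/document'

# Hypothetical sample data: the names (collection, shelf, clothing, title,
# type, description) come from the parsing scripts, the values are invented.
sample = <<~XML
  <collection shelf="Sample shelf">
    <clothing title="Sample shirt">
      <type>Shirt</type>
      <description>A placeholder description.</description>
    </clothing>
    <clothing title="Sample jacket">
      <type>Jacket</type>
      <description>Another placeholder description.</description>
    </clothing>
  </collection>
XML

# Write the stand-in file into the current directory.
File.write('trial.xml', sample)

# Read it back to confirm that it parses.
doc = REXML::Document.new(File.read('trial.xml'))
puts doc.root.attributes['shelf']
puts doc.elements.to_a('collection/clothing').size
```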

Ruby XML DOM-Like Parsing

We will parse our XML data in tree fashion, taking the trial.xml file described above as input.

#!/usr/bin/ruby -w

require 'rexml/document'
include REXML

xmlfile = File.new("trial.xml")
xmldoc = Document.new(xmlfile)

# Now get the root element
root = xmldoc.root
puts "Root element : " + root.attributes["shelf"]

# This will output all the cloth titles.
xmldoc.elements.each("collection/clothing") do |e|
  puts "cloth Title : " + e.attributes["title"]
end

# This will output all the cloth types.
xmldoc.elements.each("collection/clothing/type") do |e|
  puts "cloth Type : " + e.text
end

# This will output all the cloth descriptions.
xmldoc.elements.each("collection/clothing/description") do |e|
  puts "cloth Description : " + e.text
end
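The script above concatenates strings with +, which raises a TypeError when an attribute or element is missing and the lookup returns nil. A self-contained variant, using made-up inline data and the safe navigation operator, might look like this:

```ruby
require 'rexml/document'

# Made-up data; the second item deliberately has no <type> child.
xml = <<~XML
  <collection shelf="Sample shelf">
    <clothing title="Sample shirt"><type>Shirt</type></clothing>
    <clothing title="Sample scarf"></clothing>
  </collection>
XML

doc = REXML::Document.new(xml)

# Collect one hash per <clothing> element; &. yields nil for a missing child.
items = doc.elements.to_a('collection/clothing').map do |e|
  { title: e.attributes['title'], type: e.elements['type']&.text }
end

# String interpolation copes with nil, unlike "..." + nil.
items.each { |item| puts "#{item[:title]}: #{item[:type] || 'unknown'}" }
```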


Ruby XML SAX-Like Parsing

We will parse our XML data in stream fashion, again taking the trial.xml file as input. Here, we define a listener class whose methods the parser invokes as callbacks.

SAX-like parsing is not advisable for a small file.

#!/usr/bin/ruby -w

require 'rexml/document'
require 'rexml/streamlistener'
include REXML

# Listener whose methods the stream parser calls back while reading the file.
class MyListener
  include REXML::StreamListener

  # Called for each opening tag with the tag name and its attributes.
  def tag_start(*args)
    puts "tag_start: #{args.map { |x| x.inspect }.join(', ')}"
  end

  # Called for each run of character data.
  def text(data)
    return if data =~ /\A\s*\z/     # whitespace only
    abbrev = data[0, 40] + (data.length > 40 ? "..." : "")
    puts "  text   :   #{abbrev.inspect}"
  end
end

list = MyListener.new
xmlfile = File.new("trial.xml")
Document.parse_stream(xmlfile, list)
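A listener does not have to print; it can also accumulate state as the callbacks arrive. The sketch below collects titles and types from made-up inline data. The class name TitleCollector and its accessors are hypothetical names chosen for this illustration.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener that gathers data instead of printing it.
class TitleCollector
  include REXML::StreamListener

  attr_reader :titles, :types

  def initialize
    @titles = []
    @types = []
    @in_type = false
  end

  # Opening tag: remember clothing titles and note when a <type> starts.
  def tag_start(name, attrs)
    @titles << attrs['title'] if name == 'clothing'
    @in_type = true if name == 'type'
  end

  # Closing tag: leave the <type> element.
  def tag_end(name)
    @in_type = false if name == 'type'
  end

  # Character data: keep it only while inside a <type> element.
  def text(data)
    @types << data.strip if @in_type
  end
end

# Made-up sample data.
xml = <<~XML
  <collection shelf="Sample shelf">
    <clothing title="Sample shirt"><type>Shirt</type></clothing>
    <clothing title="Sample jacket"><type>Jacket</type></clothing>
  </collection>
XML

collector = TitleCollector.new
REXML::Document.parse_stream(xml, collector)
puts collector.titles.inspect
puts collector.types.inspect
```

Because no tree is built, the listener must keep track of where it is in the document, as the @in_type flag does here.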

