XML (eXtensible Markup Language) is a markup language, like HTML. It lets programmers build applications whose data can be read by other applications, regardless of the operating system or programming language used.
It is also a convenient way to keep track of small to medium amounts of data without a SQL-based backend.
REXML is a pure Ruby XML processor. It represents a full XML document, including processing instructions (PIs), the doctype, and so on. A document has a single root element, which can be accessed with root(). If you want a document you create to have an XML declaration, you must add one yourself; REXML does not write a default declaration for you.
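As a minimal sketch of adding the declaration by hand, the snippet below builds a small document and attaches an XMLDecl node to it. The element names and values are illustrative only.

```ruby
require 'rexml/document'

# Build a document from scratch; REXML adds no declaration on its own.
doc = REXML::Document.new
doc << REXML::XMLDecl.new("1.0", "UTF-8") # explicit XML declaration

# Illustrative content: an <info> root with a single <name> child.
root = doc.add_element("info")
root.add_element("name").text = "Caroline"

xml = doc.to_s
puts xml
puts doc.root.name
```

Here root() returns the info element, and the serialized string starts with the declaration that was added explicitly.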
REXML was inspired by the Electric XML library for Java. Its API is small and easy to use, and it follows Ruby conventions for method naming and code flow.
It supports both tree and stream parsing. Stream parsing is about 1.5 times faster than tree parsing, but it does not give you access to some features, such as XPath.
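To illustrate what tree parsing offers, here is a small sketch of XPath queries against a parsed document. The XML string is made up for the example.

```ruby
require 'rexml/document'

# Made-up sample data for illustration.
xml = "<info><name>Caroline</name><city>Seattle</city></info>"
doc = REXML::Document.new(xml)

# XPath queries run against the parsed tree, so they need tree mode.
city = REXML::XPath.first(doc, "/info/city")
puts city.text

# XPath.match returns every matching node as an array.
names = REXML::XPath.match(doc, "//name").map(&:text)
puts names.inspect
```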
REXML features:
- It is written 100 percent in Ruby.
- It contains fewer than 2000 lines of code, so it is lightweight.
- Its methods and classes are easy to understand.
- It ships with Ruby (as a bundled gem since Ruby 3.0), so it usually needs no separate installation.
- It supports both DOM-like (tree) and SAX-like (stream) parsing.
Parsing XML and accessing elements
Let’s start with parsing an XML document:
require "rexml/document"
file = File.new( "trial.xml" )
doc = REXML::Document.new file
In the above code, line 3 parses the supplied file.
Example:
require 'rexml/document'
include REXML
file = File.new("trial.xml")
doc = Document.new(file)
puts doc
In the above code, the require statement loads the REXML library. The include REXML line lets us write Document instead of REXML::Document. The script assumes that a trial.xml file has already been created; the parsed document is then printed to the screen.
The Document.new method takes an IO object, a String, or another Document as its argument. This argument specifies the source from which the XML document is read.
If the constructor is given a Document, the new Document copies its context and attributes, but not its child nodes. If the constructor is given a String, the string is expected to contain an XML document.
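The String and IO cases can be sketched as follows. StringIO stands in for a real file here, and the XML content is made up for the example.

```ruby
require 'rexml/document'
require 'stringio'

# Made-up sample data for illustration.
xml = "<info city='Seattle'><name>Caroline</name></info>"

from_string = REXML::Document.new(xml)               # String source
from_io     = REXML::Document.new(StringIO.new(xml)) # IO-like source

# Both documents expose the same tree.
puts from_string.root.name
puts from_io.root.attributes["city"]
```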
XML with “Here Document”
A here document is a way to specify a block of text while preserving its line breaks, whitespace, and indentation.
A here document is written as “<<” followed by a token string; the text continues until a line that contains only that token.
In Ruby, there must be no space between “<<” and the token string.
Example:
#!/usr/bin/env ruby
require 'rexml/document'
include REXML
info = <<XML
<info>
<name>Caroline</name>
<street>9820 St.</street>
<city>Seattle</city>
<contact>9854126575</contact>
<country>USA</country>
</info>
XML
document = Document.new( info )
puts document
Here, we use the here document info. All the characters between <<XML and the closing XML token, including newlines, are part of info.
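Once a here document has been parsed, its elements can be read through the root element. The sketch below uses the squiggly form <<~ (Ruby 2.3 and later), which strips the leading indentation; that form and the shortened data are choices made for this example, not part of the listing above.

```ruby
require 'rexml/document'

# <<~ removes the common leading indentation from the text block.
info = <<~XML
  <info>
    <name>Caroline</name>
    <city>Seattle</city>
  </info>
XML

document = REXML::Document.new(info)
root = document.root

# Look up one child element by name and read its text.
puts root.elements["name"].text

# Iterate over all child elements of the root.
root.elements.each { |child| puts "#{child.name}: #{child.text}" }
```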
The XML parsing examples below read an input file named trial.xml. Its root element, collection, carries a shelf attribute and contains clothing elements, each with a title attribute and with type and description child elements.
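The contents of trial.xml are not reproduced in this text. As a rough sketch, a file compatible with the parsing scripts could be created like this. Only the element and attribute names come from those scripts; every value is hypothetical.

```ruby
require 'rexml/document'

# Hypothetical sample data: only the element and attribute names are taken
# from the parsing scripts; every value below is made up.
sample = <<~XML
  <collection shelf="New Arrivals">
    <clothing title="Shirt">
      <type>Casual</type>
      <description>Plain cotton shirt</description>
    </clothing>
    <clothing title="Jacket">
      <type>Winter</type>
      <description>Padded jacket with hood</description>
    </clothing>
  </collection>
XML

File.write("trial.xml", sample)

# Parse the file back to confirm that it is well-formed.
doc = File.open("trial.xml") { |f| REXML::Document.new(f) }
puts doc.root.attributes["shelf"]
```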
Ruby XML DOM-Like Parsing
We will parse our XML data in tree fashion. The trial.xml file introduced above is taken as input.
#!/usr/bin/ruby -w
require 'rexml/document'
include REXML

# Parse the whole file into a document tree
xmlfile = File.new("trial.xml")
xmldoc = Document.new(xmlfile)

# Get the root element and print its shelf attribute
root = xmldoc.root
puts "Root element : #{root.attributes["shelf"]}"

# Print the title of every clothing element
xmldoc.elements.each("collection/clothing") do |e|
  puts "cloth Title : #{e.attributes["title"]}"
end

# Print the type of every clothing element
xmldoc.elements.each("collection/clothing/type") do |e|
  puts "cloth Type : #{e.text}"
end

# Print the description of every clothing element
xmldoc.elements.each("collection/clothing/description") do |e|
  puts "cloth Description : #{e.text}"
end
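A variation on the same idea is to visit each clothing element once and read its attribute and child together. The sketch below uses an inline string as a stand-in for trial.xml, with made-up values.

```ruby
require 'rexml/document'

# Inline stand-in for trial.xml; the values are made up for illustration.
xml = <<~XML
  <collection shelf="New Arrivals">
    <clothing title="Shirt"><type>Casual</type></clothing>
    <clothing title="Jacket"><type>Winter</type></clothing>
  </collection>
XML

doc = REXML::Document.new(xml)

# Visit each <clothing> element once and read its attribute and child together.
items = doc.elements.to_a("collection/clothing").map do |item|
  { title: item.attributes["title"], type: item.elements["type"].text }
end

items.each { |i| puts "#{i[:title]} (#{i[:type]})" }
```

Grouping the values per element keeps each title paired with its own type, which the three separate loops do not guarantee if an element is missing a child.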
Ruby XML SAX-Like Parsing
We will parse our XML data in stream fashion, again taking the trial.xml file as input. Here, we define a listener class whose methods the parser calls back as it reads the document.
SAX-like parsing is not advisable for a small file.
#!/usr/bin/ruby -w
require 'rexml/document'
require 'rexml/streamlistener'
include REXML

# Listener whose methods the parser calls back as it reads the stream
class MyListener
  include REXML::StreamListener

  # Called for every opening tag with its name and attributes
  def tag_start(*args)
    puts "tag_start: #{args.map { |x| x.inspect }.join(', ')}"
  end

  # Called for every run of character data
  def text(data)
    return if data =~ /\A\s*\z/ # skip whitespace-only text
    abbrev = data[0, 40] + (data.length > 40 ? "..." : "")
    puts " text : #{abbrev.inspect}"
  end
end

list = MyListener.new
xmlfile = File.new("trial.xml")
Document.parse_stream(xmlfile, list)
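For a self-contained variant, the sketch below streams an inline string and collects the events instead of printing them. The CollectingListener class and the sample XML are hypothetical names and data introduced for this example.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: records element names and non-blank text as they stream by.
class CollectingListener
  include REXML::StreamListener

  attr_reader :tags, :texts

  def initialize
    @tags = []
    @texts = []
  end

  # Called for every opening tag; attributes are ignored here.
  def tag_start(name, _attrs)
    @tags << name
  end

  # Called for every run of character data; blank runs are skipped.
  def text(data)
    stripped = data.strip
    @texts << stripped unless stripped.empty?
  end
end

# Made-up sample data for illustration.
xml = "<collection><clothing title='Shirt'><type>Casual</type></clothing></collection>"
listener = CollectingListener.new
REXML::Document.parse_stream(xml, listener)

puts listener.tags.inspect
puts listener.texts.inspect
```

No document tree is built in this mode, so the listener has to keep whatever state it needs.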