The PHP 4 DOMXML extension has been replaced in PHP 5 by the new DOM extension, which is a lot easier to use. Compared with SimpleXML, DOM can at times be cumbersome and unwieldy. However, it is often the better choice. Please join me and find out why.
Since SimpleXML and DOM objects are interoperable, you can use the former for simplicity and the latter for power. How to exchange data between the two extensions is explained at the end of the article.
The DOM extension is especially useful when you want to modify XML documents, as SimpleXML, for example, does not let you remove nodes from an XML document. For this article's code examples we will use the same foundation that we used in the "Parsing XML with SimpleXML" post.
We will use this very site's Google sitemap file, which can be downloaded here. The sitemap.xml file lists the pages of php-coding-practices.com in XML so that Google can index them easily.
Loading and Saving XML Documents
The DOM extension, just like SimpleXML, provides two ways to load XML documents, either from a file or from a string:
$source = 'sitemap.xml';
// load from a file
$dom = new DomDocument();
$dom->load($source);
// load as string
$dom2 = new DomDocument();
$dom2->loadXML(file_get_contents($source));
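Both load() and loadXML() return false when the input cannot be parsed, and by default libxml reports the problems as PHP warnings. The sketch below is not part of the original article; it assumes you would rather collect those errors yourself, using libxml_use_internal_errors(), libxml_get_errors() and libxml_clear_errors(). The malformed string is a made-up example.

```php
<?php
// Buffer libxml parse errors instead of emitting them as PHP warnings.
libxml_use_internal_errors(true);

$dom = new DomDocument();
// Deliberately malformed input: <b> is never closed.
$ok = $dom->loadXML('<a><b></a>');

// Keep the collected LibXMLError objects, then clear the buffer
// so that later documents start with a clean error list.
$errors = libxml_get_errors();
libxml_clear_errors();

if ($ok === false) {
    foreach ($errors as $error) {
        // Each LibXMLError carries the line number and the parser message.
        echo 'Line ' . $error->line . ': ' . trim($error->message) . "\n";
    }
}
```

Checking the return value this way lets a script bail out cleanly instead of continuing with an empty document.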
In addition, DomDocument provides two methods for loading HTML: DomDocument::loadHTML() for strings and DomDocument::loadHTMLFile() for files. The advantage is that HTML does not have to be well-formed to be loaded. Here is an example:
$doc = new DOMDocument();
$doc->loadHTML("<html><body>Test<br></body></html>");
echo $doc->saveHTML();
The cool news is that malformed HTML is automatically converted into well-formed HTML. Look at this script:
$doc = new DOMDocument();
$doc->loadHTML("<html><body><p>Test<br></body></html>");
echo $doc->saveHTML();
The DomDocument::loadHTML() method automatically adds a DOCTYPE declaration referencing the HTML 4.0 Transitional DTD (Document Type Definition) and adds the missing end tag for the opened p tag. Cool, isn't it?
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>Test<br></p></body></html>
Saving XML data with the DOM library is just as easy. Call DomDocument::saveXML() or DomDocument::saveHTML() without parameters and they return the whole document as an XML or HTML string. DomDocument::save() and DomDocument::saveHTMLFile() write the document to an XML or HTML file instead; both expect the file path as a string parameter.
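A short round trip, assuming a tiny inline urlset rather than the real sitemap.xml: saveXML() returns the document as a string (or a single node, if you pass that node in), while save() writes it to a file and returns the number of bytes written, or false on failure. The example.com URL and the temporary file created with tempnam() are placeholders for illustration.

```php
<?php
$dom = new DomDocument();
$dom->loadXML('<urlset><url><loc>http://example.com/</loc></url></urlset>');

// saveXML() without arguments returns the whole document as a string ...
$xmlString = $dom->saveXML();
// ... and with a node argument only that node's markup.
$locString = $dom->saveXML($dom->getElementsByTagName('loc')->item(0));

// save() writes the document to a file and returns the number of bytes written.
$file = tempnam(sys_get_temp_dir(), 'sitemap');
$bytes = $dom->save($file);

// Reload the file to confirm that nothing was lost on the way.
$copy = new DomDocument();
$copy->load($file);
unlink($file);

echo $locString . "\n";
```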
XPath Queries
One of the most powerful features of the DOM extension is its integration with XPath queries. In fact, DomXPath is much more powerful than its SimpleXML equivalent:
$source = 'sitemap.xml';
$dom = new DomDocument();
$dom->load($source);
$xpath = new DomXPath($dom);
$xpath->registerNamespace('c', 'http://www.google.com/schemas/sitemap/0.84');
$result = $xpath->query("//c:loc/text()");
// number of matching text nodes
echo $result->length . "\n";
// a single result can be accessed by index, e.g. echo $result->item(3)->data;
foreach ($result as $b) {
    echo $b->data . "\n";
}
Notice that the sitemap XML file already declares a default namespace on its root element, which is why we register it using DomXPath::registerNamespace():
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
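We really have to register that namespace with the DomXPath object, or else it will not find anything. ;) Because of the default namespace, every element in the sitemap belongs to http://www.google.com/schemas/sitemap/0.84, while an unprefixed name such as //loc in an XPath query only matches elements that are in no namespace at all. Binding the prefix c to the namespace URI and writing //c:loc solves this. You can also register multiple namespaces, but more on that later. Notice that we use text() within the XPath query to select the text nodes that hold the actual contents of the loc elements.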
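To see why the registration matters, here is a minimal sketch against a trimmed-down, inline stand-in for sitemap.xml (one hypothetical url entry, same default namespace). It simply counts the matches for an unprefixed and for a prefixed query.

```php
<?php
// A trimmed-down stand-in for sitemap.xml: same default namespace, one url.
$xml = '<?xml version="1.0" encoding="UTF-8"?>'
     . '<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">'
     . '<url><loc>http://php-coding-practices.com/</loc></url>'
     . '</urlset>';

$dom = new DomDocument();
$dom->loadXML($xml);
$xpath = new DomXPath($dom);

// Unprefixed names in XPath 1.0 only match elements in no namespace,
// so this finds nothing even though a <loc> element exists.
$withoutPrefix = $xpath->query('//loc')->length;

// Binding a prefix (the name "c" is arbitrary) to the namespace URI makes the match work.
$xpath->registerNamespace('c', 'http://www.google.com/schemas/sitemap/0.84');
$withPrefix = $xpath->query('//c:loc')->length;

echo $withoutPrefix . ' vs ' . $withPrefix . "\n"; // 0 vs 1
```

The prefix only has to be consistent between registerNamespace() and the query; it does not need to appear in the document itself.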
If you want to learn the ins and outs of the XPath language, I recommend reading the W3C XPath Reference.
Modifying XML Documents
Adding New Nodes
To add new data to a loaded DOM document, we need to create new nodes using the DomDocument::createElement(), DomDocument::createElementNS() and DomDocument::createTextNode() methods.
In the following example we will add a new url to our urlset:
$source = 'sitemap.xml';
$dom = new DomDocument();
$dom->load($source);
// url element
$url = $dom->createElement('url');
// location
$loc = $dom->createElement('loc');
$text = $dom->createTextNode('http://php-coding-practices.com/article/');
$loc->appendChild($text);
// last modification
$lastmod = $dom->createElement('lastmod');
$text = $dom->createTextNode('2007-04-20T10:24:32+00:00');
$lastmod->appendChild($text);
// change frequency
$changefreq = $dom->createElement('changefreq');
$text = $dom->createTextNode('weekly');
$changefreq->appendChild($text);
// priority
$priority = $dom->createElement('priority');
$text = $dom->createTextNode('0.3');
$priority->appendChild($text);
// add the elements to the url
$url->appendChild($loc);
$url->appendChild($lastmod);
$url->appendChild($changefreq);
$url->appendChild($priority);
// add the new url to the root element (urlset)
$dom->documentElement->appendChild($url);
// print the result as XML (the sitemap is not an HTML document)
echo $dom->saveXML();
The code is pretty self-explanatory. First we create a new url element as well as some sub-elements. Then we append those sub-elements to the url element, which we in turn append to the document's root element. Note that the root element can be accessed via the $dom->documentElement property. The output:
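....
<loc>http://php-coding-practices.com/2007/04/</loc>
<lastmod>2007-04-30T16:54:58+00:00</lastmod>
<changefreq>yearly</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>http://php-coding-practices.com/2007/03/</loc>
<lastmod>2007-03-29T20:04:51+00:00</lastmod>
<changefreq>yearly</changefreq>
<priority>0.5</priority>
</url>
<url><loc>http://php-coding-practices.com/article/</loc><lastmod>2007-04-20T10:24:32+00:00</lastmod><changefreq>weekly</changefreq><priority>0.3</priority></url></urlset>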
Admittedly, this was not as easy as it would have been with SimpleXML, but the DOM extension offers many more methods and therefore more power. For example, you can associate a namespace with an element while creating it, using DomDocument::createElementNS(). I will provide some example code on that later in the article.
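Note that createElement() as used above creates elements that belong to no namespace, so a prefixed query such as //c:url from the XPath section would not match the new url. The sketch below is one possible way to address both that and the repetition: a hypothetical helper, addSitemapUrl() (not part of PHP), that builds the entry with createElementNS() in the sitemap namespace. It assumes PHP 7 or later for the scalar type declarations and starts from an empty, inline urlset instead of sitemap.xml.

```php
<?php
// Hypothetical helper: builds one sitemap <url> entry in the sitemap namespace
// and appends it to the document's root element.
function addSitemapUrl(DomDocument $dom, string $loc, string $lastmod, string $changefreq, string $priority): DomElement
{
    $ns = 'http://www.google.com/schemas/sitemap/0.84';
    $url = $dom->createElementNS($ns, 'url');
    $fields = ['loc' => $loc, 'lastmod' => $lastmod, 'changefreq' => $changefreq, 'priority' => $priority];
    foreach ($fields as $name => $value) {
        $child = $dom->createElementNS($ns, $name);
        // createTextNode() escapes special characters such as & for us.
        $child->appendChild($dom->createTextNode($value));
        $url->appendChild($child);
    }
    $dom->documentElement->appendChild($url);
    return $url;
}

$dom = new DomDocument();
$dom->loadXML('<urlset xmlns="http://www.google.com/schemas/sitemap/0.84"/>');
addSitemapUrl($dom, 'http://php-coding-practices.com/article/', '2007-04-20T10:24:32+00:00', 'weekly', '0.3');

$xpath = new DomXPath($dom);
$xpath->registerNamespace('c', 'http://www.google.com/schemas/sitemap/0.84');
// Because the new elements carry the namespace, the prefixed query finds them.
echo $xpath->query('//c:url/c:loc')->item(0)->textContent . "\n";
```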
Adding Attributes To Nodes
Via DomElement::setAttribute() we can easily add an attribute to an element. Example:
$url = $dom->createElement('url');
...
$url->setAttribute('meta:level','3');
Here we set a fictitious meta:level attribute with the value 3 on our url DomElement from above.
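The meta: prefix above is not bound to any namespace, so the resulting attribute is just a name that happens to contain a colon. If you want a real namespaced attribute, DomElement::setAttributeNS() binds the prefix for you. A minimal sketch, assuming a made-up namespace URI for the meta prefix and a fresh document instead of the sitemap:

```php
<?php
$dom = new DomDocument();
$url = $dom->createElement('url');
$dom->appendChild($url);

// Hypothetical namespace URI for the fictitious "meta" prefix.
$metaNs = 'http://example.com/ns/meta';

// setAttributeNS() binds the prefix to the URI; since no such declaration
// is in scope yet, PHP adds a matching xmlns:meta declaration to the element.
$url->setAttributeNS($metaNs, 'meta:level', '3');

echo $dom->saveXML();
```

The attribute can then be read back by namespace URI and local name, for example with $url->getAttributeNS($metaNs, 'level').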
Moving Data
Moving data is not as obvious as you might expect, as the DOM extension does not provide an explicit method for it. Instead, we use DomNode::insertBefore() or DomNode::appendChild(), which move a node that is already part of the document. As an example, suppose we want to move the second url in the sitemap just before the very first one:
$dom = new DomDocument();
$dom->load('sitemap.xml');
$xpath = new DomXPath($dom);
$xpath->registerNamespace("c", "http://www.google.com/schemas/sitemap/0.84");
$result = $xpath->query("//c:url");
// move the second url in front of the first one
$result->item(1)->parentNode->insertBefore($result->item(1), $result->item(0));
echo $dom->saveXML();
DomNode::insertBefore() takes two parameters, the new node and the reference node, and inserts the new node before the reference node. In our example, we insert the second url ($result->item(1)) before the first one ($result->item(0)).
I hear you asking why we call insertBefore() on $result->item(1)->parentNode. Couldn't we simply call it on $result->item(0)? No, because insertBefore() must be called on the parent of the reference node, which here is the root element urlset and not a specific url (look at our XPath query).
The following code is therefore perfectly valid as well and gets us the same result:
$result->item(0)->parentNode->insertBefore($result->item(1),$result->item(0));
If we wanted to move the first url to the bottom of the sitemap, the following code is the way to go:
$result->item(0)->parentNode->appendChild($result->item(0));
// or, equivalently: $dom->documentElement->appendChild($result->item(0));
Easy, isn't it? :) DomNode::insertBefore() and DomNode::appendChild() move nodes that are already part of the document; they do not copy them. If you want to copy a node instead of moving it, clone it first with DomNode::cloneNode():
$source = 'sitemap.xml';
$dom = new DomDocument();
$dom->load($source);
$xpath = new DomXPath($dom);
$xpath->registerNamespace("c", "http://www.google.com/schemas/sitemap/0.84");
$result = $xpath->query("//c:url");
// deep copy of the first url, including all of its child elements
$clone = $result->item(0)->cloneNode(true);
// append the copy at the end of the urlset
$result->item(4)->parentNode->appendChild($clone);
echo $dom->saveXML();
The important thing here is that you have to pass true to DomNode::cloneNode() (the default is false) so that it copies all descendant nodes as well. Had we left it at false, we would have gotten an empty <url/> node, which is not desirable. ;)
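The difference between moving and copying is easy to check on a tiny, made-up urlset whose url elements only hold the letters a, b and c (a simplification, not the real sitemap). The sketch moves a node with appendChild(), then appends a deep clone and serializes a shallow one:

```php
<?php
$dom = new DomDocument();
$dom->loadXML('<urlset><url>a</url><url>b</url><url>c</url></urlset>');
$root = $dom->documentElement;

// appendChild() on a node that is already in the tree moves it:
// "a" leaves the front and reappears at the end, the count stays at 3.
$root->appendChild($root->firstChild);
$afterMove = $dom->saveXML($root);

// cloneNode(true) copies the element together with its text child,
// so appending the clone adds a fourth url.
$root->appendChild($root->firstChild->cloneNode(true));
$afterClone = $dom->saveXML($root);

// cloneNode() without true copies only the element itself: an empty url.
$shallow = $dom->saveXML($root->firstChild->cloneNode());

echo $afterMove . "\n" . $afterClone . "\n" . $shallow . "\n";
```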
Modifying Node Data
When modifying node data, you usually want to change the character data inside a node. You can use XPath again to find the text node you want to edit and then simply assign a new value to its data property:
$source = 'sitemap.xml';
$dom = new DomDocument();
$dom->load($source);
$xpath = new DomXPath($dom);
$xpath->registerNamespace("c", "http://www.google.com/schemas/sitemap/0.84");
$result = $xpath->query("//c:loc/text()");
// the text node inside the second loc element
$node = $result->item(1);
$node->data = strtoupper($node->data);
echo $dom->saveXML();
This code transforms the location data of the second url to uppercase letters.
Removing Data From XML Documents
There are three types of data that you would possibly want to remove from XML documents: elements, attributes and character data. The DOM extension provides a method for each of them: DomNode::removeChild(), DomElement::removeAttribute() and DomCharacterData::deleteData(). We will use a custom XML document rather than our sitemap to demonstrate their behavior. This makes it easier for you to come back to this article and see at first glance how these methods work. Thank Nikos if you want to. ;)
$xml = <<<XML
<xml>
<text type="input">This is some really cool text!</text>
<text type="input">This is some other really cool text!</text>
<text type="misc">This is some cool text!</text>
<text type="output">This is text!</text>
</xml>
XML;
$dom = new DomDocument();
$dom->loadXML($xml);
$xpath = new DomXPath($dom);
$result = $xpath->query("//text");
// remove first node
$result->item(0)->parentNode->removeChild($result->item(0));
// remove attribute from second node
$result->item(1)->removeAttribute('type');
// delete data from third element
$result = $xpath->query('text()', $result->item(2));
$result->item(0)->deleteData(0, $result->item(0)->length);
echo $dom->saveXML();
The output of this is:
<?xml version="1.0"?>
<xml>
<text>This is some other really cool text!</text>
<text type="misc"></text>
<text type="output">This is text!</text>
</xml>
In this example we start by retrieving all text elements from the document. Then we remove some data from that document. Simple.
In fact, we remove the first text element altogether as well as the attribute of the second one. Finally, we truncate the character data of the third element, using XPath to query its text() node.
Note that DomCharacterData::deleteData() requires a starting offset and a length parameter. Since we want to truncate the data in our example, we supply 0 and the length of the text node.
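A common follow-up is removing every node that matches a query. The sketch below reuses the document from above with the last line dropped (a simplification for illustration) and removes all input texts. Removing nodes while looping over the DomXPath::query() result works because that list is a snapshot; with the list returned by getElementsByTagName(), which reflects the current tree, removing inside foreach can skip nodes, so it is safer to collect them first in that case.

```php
<?php
$xml = <<<XML
<xml>
<text type="input">This is some really cool text!</text>
<text type="input">This is some other really cool text!</text>
<text type="misc">This is some cool text!</text>
</xml>
XML;

$dom = new DomDocument();
$dom->loadXML($xml);
$xpath = new DomXPath($dom);

// The query result does not change while we remove nodes from the tree.
foreach ($xpath->query('//text[@type="input"]') as $node) {
    $node->parentNode->removeChild($node);
}

$remaining = $xpath->query('//text')->length;
echo $dom->saveXML();
```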
DOM And Working With Namespaces
DOM is very capable of handling namespaces on its own. Most of the time you can ignore them and pass attribute and element names with the appropriate prefix directly to most DOM functions.
$dom = new DomDocument();
$node = $dom->createElement('ns1:somenode');
$node->setAttribute('ns2:someattribute', 'somevalue');
$node2 = $dom->createElement('ns3:anothernode');
$node->appendChild($node2);
// Set xmlns attributes
$node->setAttribute('xmlns:ns1', 'http://php-coding-practices.com/');
$node->setAttribute('xmlns:ns2', 'http://php-coding-practices.com/articles/');
$node->setAttribute('xmlns:ns3', 'http://php-coding-practices.com/sitemap/');
$node->setAttribute('xmlns:ns4', 'http://php-coding-practices.com/about-the-author/');
$dom->appendChild($node);
echo $dom->saveXML();
The output of this script is:
<?xml version="1.0"?>
<ns1:somenode
ns2:someattribute="somevalue"
xmlns:ns1="http://php-coding-practices.com/"
xmlns:ns2="http://php-coding-practices.com/articles/"
xmlns:ns3="http://php-coding-practices.com/sitemap/"
xmlns:ns4="http://php-coding-practices.com/about-the-author/">
<ns3:anothernode/>
</ns1:somenode>
We can simplify the use of namespaces somewhat by using DomDocument::createElementNS() and DomElement::setAttributeNS(), which were specifically designed for this purpose:
$dom = new DomDocument();
$node = $dom->createElementNS('http://php-coding-practices.com/', 'ns1:somenode');
$node->setAttributeNS('http://somewebsite.com/ns2', 'ns2:someattribute', 'somevalue');
$node2 = $dom->createElementNS('http://php-coding-practices.com/articles/', 'ns3:anothernode');
$node3 = $dom->createElementNS('http://php-coding-practices.com/sitemap/', 'ns1:someothernode');
$node->appendChild($node2);
$node->appendChild($node3);
$dom->appendChild($node);
echo $dom->saveXML();
This results in the following output:
<?xml version="1.0"?>
<ns1:somenode
xmlns:ns1="http://php-coding-practices.com/"
xmlns:ns2="http://somewebsite.com/ns2"
xmlns:ns3="http://php-coding-practices.com/articles/"
xmlns:ns11="http://php-coding-practices.com/sitemap/"
ns2:someattribute="somevalue">
<ns3:anothernode xmlns:ns3="http://php-coding-practices.com/articles/"/>
<ns11:someothernode xmlns:ns1="http://php-coding-practices.com/sitemap/"/>
</ns1:somenode>
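To check which namespace a node actually ended up in, you can read its prefix, localName and namespaceURI properties. A minimal sketch contrasting createElementNS() with a plain createElement() call that merely has a prefixed name (the names are borrowed from the scripts above for illustration):

```php
<?php
$dom = new DomDocument();

// createElementNS(): the prefix is bound to a real namespace URI.
$node = $dom->createElementNS('http://php-coding-practices.com/', 'ns1:somenode');
$dom->appendChild($node);

// createElement() with a prefixed name: no namespace is attached to the node.
$plain = $dom->createElement('ns3:anothernode');
$node->appendChild($plain);

echo $node->prefix . ' | ' . $node->localName . ' | ' . $node->namespaceURI . "\n";
// ns1 | somenode | http://php-coding-practices.com/

// Prefixes declared on ancestors can still be resolved from the child.
echo $plain->lookupNamespaceURI('ns1') . "\n";
```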
Interfacing With SimpleXML
As I mentioned at the start of our little DOM journey, it is very easy to exchange loaded documents between SimpleXML and DOM. Therefore, you can take advantage of both systems' strengths: SimpleXML's simplicity and DOM's power.
You can import a SimpleXML object into DOM by using PHP's dom_import_simplexml() function:
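$sxml = simplexml_load_file('sitemap.xml');
$node = dom_import_simplexml($sxml);
$dom = new DomDocument();
// importNode() returns a copy that belongs to $dom; append that copy, not the original node
$node = $dom->importNode($node, true);
$dom->appendChild($node);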
DomDocument::importNode() creates a copy of the node, associates it with the current document and returns it; it is this returned copy that you append. Its second parameter, a boolean value, determines whether the method recursively imports the whole subtree or not.
You can also import a DOM object into SimpleXML using simplexml_import_dom():
$dom = new DomDocument();
$dom->load('sitemap.xml');
$sxe = simplexml_import_dom($dom);
echo $sxe->url[0]->loc;
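Note that dom_import_simplexml() and simplexml_import_dom() do not copy anything: both views share the same underlying tree, so a change made through one API is visible through the other. A minimal sketch on an inline, made-up document (the updated attribute and the about/ URL are illustrative only):

```php
<?php
$xml = '<urlset><url><loc>http://php-coding-practices.com/</loc></url></urlset>';

// SimpleXML -> DOM: the DomElement wraps the same tree as $sxml.
$sxml = simplexml_load_string($xml);
$element = dom_import_simplexml($sxml);
$element->setAttribute('updated', 'yes');
echo $sxml['updated'] . "\n"; // yes

// DOM -> SimpleXML: edits through SimpleXML show up in the DomDocument.
$dom = new DomDocument();
$dom->loadXML($xml);
$sxe = simplexml_import_dom($dom);
$sxe->url[0]->loc = 'http://php-coding-practices.com/about/';
echo $dom->getElementsByTagName('loc')->item(0)->textContent . "\n";
```

Only DomDocument::importNode(), as used above, makes an actual copy.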
Conclusion
DOM is certainly a very powerful way of dealing with XML documents. While it provides a good interface for basically every task one could dream of, it often takes quite a lot of lines of code to accomplish a task. SimpleXML's interface is of course a little easier, but less powerful.
In particular, the fact that SimpleXML is rather incapable of removing data makes DOM the way to go for more complicated XML document processing. DOM's power in dealing with namespaces makes it a valuable tool when dealing with large portions of data where naming conflicts are likely.
In fact, we covered only a small portion of DOM's power. There are many other related classes with useful methods. For example, we have not covered how to append character data. Check the DOM function reference for more information.
Thanks for staying with me on the DOM boat until the end of our journey! I hope you enjoyed it. Please mind the gap between the boat and the footbridge when leaving.