XML: specification and DOM functions in PHP
Sometimes you want to step back from the daily coding routine, and from the minor problems that articles on 'details' are devoted to, and survey what you have been doing all this time. So here is my view of the approaches to the main task of PHP programming: generating web pages.
Introduction: XML technology specifications
The many XML-related specifications aim above all to standardize the approaches to working with data in XML format. At the moment the following exist: XML, XLink, XSL, Namespaces, Information Set, XML Linking, the XPointer model, XPointer namespaces, XPointer xptr(), XSLT, XPath, XSL FO, DOM, SAX, the processing instruction for associating style sheets, XML Schema, XQuery, XML Encryption, XML Canonicalization, XML Signature, DOM Level 2, and DOM Level 3 (the list is taken from the article 'Happy birthday, XML!').
What is DOM?
DOM stands for Document Object Model. 'Object' here means an object in the programmer's sense: an artifact of object-oriented programming, with all the wonderful things that make us like it.
Let's have a look at the source of an XML document:
<?xml version="1.0" encoding="windows-1251"?>
<root language="english ">
<title>XML: specification and DOM functions in PHP</title>
<text> A plenty of various specifications about <acronym>XML</acronym>
are before all aimed at <b> regulating </b> and bringing to
the unified standard all the approaches to the work with data in
<acronym>XML</acronym>format.
</text>
<date>2003-05-12</date>
<raw-code>
<![CDATA[ <br> isn’t an example of well-formed markup: <p>bla-bla</p> ]]>
</raw-code>
<!-- it would be desirable to write live examples into the article… -->
</root>
The XML ideology rests on the idea that a document is a set of nodes in a tree-shaped data structure. The document above can be represented as the following tree:
-o- Document
 |
 +-o- Element root
    |
    +-o- Attribute language
    |
    +-o- Element title
    |  |
    |  +-o- Text node ("XML: specification…")
    |
    +-o- Element text
    |  |
    |  +-o- Text node ("A plenty of…")
    |  |
    |  +-o- Element acronym
    |  |  |
    |  |  +-o- Text node ("XML")
    |  |
    |  +-o- Text node ("are before all aimed at")
    |  |
    |  +-o- Element b
    |  |  |
    |  |  +-o- Text node ("regulating")
    |  |
    |  +-o- Text node ("and bringing…")
    |  |
    |  +-o- Element acronym
    |  |  |
    |  |  +-o- Text node ("XML")
    |  |
    |  +-o- Text node ("format.")
    |
    +-o- Element date
    |  |
    |  +-o- Text node ("2003-05-12")
    |
    +-o- Element raw-code
    |  |
    |  +-o- Section CDATA ("<br>...")
    |
    +-o- Comment ("it would be…")
The "-o-" sign in the diagram marks a node, and the text to its right gives the node type. For text nodes, CDATA sections, and comments, the content is added to make orientation easier. In fact, the line breaks between elements are text nodes too, and they could be placed into the diagram as well.
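The point about line breaks becoming text nodes can be checked with a short script. This is only a sketch, and it assumes PHP 5 or later, where the DOM extension (class DOMDocument) took the place of the domxml module discussed in this article. The helper name describe_nodes and the small sample document are made up for illustration.

```php
<?php
// Sketch for PHP 5+ (DOM extension); describe_nodes() is a hypothetical helper.
function describe_nodes(DOMNode $node, $depth = 0)
{
    // Readable names for the node types that occur in the sample.
    $names = array(
        XML_DOCUMENT_NODE      => 'Document',
        XML_ELEMENT_NODE       => 'Element',
        XML_TEXT_NODE          => 'Text node',
        XML_CDATA_SECTION_NODE => 'CDATA section',
        XML_COMMENT_NODE       => 'Comment',
    );
    $type = isset($names[$node->nodeType]) ? $names[$node->nodeType] : 'Other';
    // One output line per node, indented by its depth in the tree.
    $lines = array(str_repeat('  ', $depth) . $type . ' ' . $node->nodeName);
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            $lines = array_merge($lines, describe_nodes($child, $depth + 1));
        }
    }
    return $lines;
}

$doc = new DOMDocument();
$doc->loadXML("<root>\n<title>XML</title>\n<!-- note -->\n</root>");
$lines = describe_nodes($doc);
print(implode("\n", $lines) . "\n");
```

In the listing, the line breaks between the tags of the sample show up as separate text nodes next to the title element and the comment.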
So let's analyze the diagram. Everything present in the document is a node, and the document itself is a node too. This means there is an object class 'node', and the other classes ('document', 'element', 'text node', 'CDATA', 'comment') are its children and inherit its properties and methods. Which properties and methods belong to which class is described in the DOM specification.
If we look through the DOM XML module documentation, we'll see that all these different nodes have a great deal in common: class DomNode has 28 methods, and together with its child classes there are 62 methods. As you can guess, the methods and properties of class DomNode are also present in the other classes.
The article "Raw scheme of DOM XML module in PHP" by Garry Fewex gives the following illustration of how the DOM XML module classes relate to each other:
o- DomNode
|
+-o- DomAttribute
|
+-o- DomCData
| |
| +-o- DomComment
| |
| +-o- DomDTD
| |
| +-o- DomText
|
+-o- DomDocument
|
+-o- DomDocumentType
|
+-o- DomElement
|
+-o- DomEntity
|
+-o- DomEntityReference
|
+-o- DomProcessingInstruction
The article goes on to note that the DOM XML module does not yet fully conform to the specification (and everyone has had enough time to start using the 'doubtful' functions, which now have to be removed from many applications and the code rewritten), and that the module still has many memory leaks, which should be fixed in version 4.3.2 (not released yet; it is at the release-candidate stage). But none of this should put us off. Those who have been using DOM XML for a long time are presumably used to it, and if you are only getting acquainted with it, you will start using it in real tasks only once it has become stable and conforms to the specification. So we'll continue our acquaintance with DOM and the module.
The DOM specification describes the objects that should be present in applications working with XML, the methods these objects should have, and the effect of those methods on the document's nodes. Therefore in Java, JavaScript, and other systems that already support DOM, XML documents have a similar interface that differs only in the names of the functions. It is frightening to imagine what would happen if every developer started inventing a model of their own.
Working with a document in PHP
Creating a document
A document object can be created from an existing file, from a text string, or as an entirely new blank document.
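<?php
// From an existing file
$dom1 = domxml_open_file("c:/xml/existing_file.xml");
// From a string that holds the XML source
$dom2 = domxml_open_mem($string);
// A new blank document; the argument is the XML version
$dom3 = domxml_new_doc("1.0");
?>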
On error, all these functions return false instead of an object, which makes checking the result of the operation rather simple.
By default, when a document is created only its syntax is checked (well-formedness), not its validity (conformance to the document's DTD or XML Schema). To check validity, pass to the function that reads the document (domxml_open_file or domxml_open_mem) a second parameter, which is not documented yet, holding the constant DOMXML_LOAD_VALIDATING:
<?php
$dom2 = domxml_open_mem($string, DOMXML_LOAD_VALIDATING);
?>
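The false-on-error check described above might look as follows. The sketch assumes PHP 5 or later with the DOM extension rather than the domxml functions of this article. The helper name load_or_false is hypothetical, and the two sample strings are made up.

```php
<?php
// Sketch for PHP 5+ (DOM extension); load_or_false() is a hypothetical helper.
function load_or_false($xml)
{
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    // Return the document object on success and false on failure.
    return $ok ? $doc : false;
}

$good = load_or_false('<root><title>ok</title></root>');
$bad  = load_or_false('<root><title>unclosed</root>');

if ($bad === false) {
    print("The second string is not well-formed\n");
}
```

In that API, validity against a DTD is checked separately, by calling the validate method of the loaded document.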
Obtaining element objects
After a document is created, all its element objects are held in PHP's memory. But they are not assigned to script variables without an explicit call.
The root element of a document is obtained by calling the document_element method of the document object. The method returns an object of class DomElement, which can be passed as an argument to another function or stored in a variable:
<?php
$root = $dom1->document_element();
?>
Any node of a document can be obtained in a similar way, through methods of the document object or of element objects.
<?php
// Array of the root's child nodes
$root_child = $root->child_nodes();
for ($i = 0; $i < sizeof($root_child); $i++)
    print("$i. ". $root_child[$i]->node_type(). " ". $root_child[$i]->node_name(). "<br/>");
// The first and the last child nodes
$first_child = $root->first_child();
$last_child = $root->last_child();
print($first_child->node_name()." and ".$root_child[0]->node_name()." - are the same\n");
print($last_child->node_name()." and ".$root_child[sizeof($root_child)-1]->node_name()." - also coincide\n");
// The node following the first one
// (previous_sibling works in a similar way)
$second_child = $first_child->next_sibling();
print($second_child->node_name(). " ". $root_child[1]->node_name(). "\n");
?>
When analyzing child nodes it is important to check node types, because the line breaks inserted for convenient reading and editing also become nodes of the document and are accordingly included in the array of child nodes.
<?php
for ($i = 0; $i < sizeof($root_child); $i++) {
    if ($root_child[$i]->node_type() == XML_ELEMENT_NODE) {
        // The text is re-encoded here for illustration only;
        // it is not necessary for this value
        $root_child[$i]->set_attribute("makes-sense", iconv("windows-1251", "UTF-8", "maybe"));
    } else {
        print("$i - node of type ". $root_child[$i]->node_type());
    }
}
?>
However, sometimes you cannot be sure at all that you have a node object rather than false or null. In that case, calling a method on it may put a warning string right into the resulting document. To avoid this, check the type of the value with the get_class function.
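Both precautions, filtering by node type and guarding against a value that is not a node, can be combined in one small helper. The sketch below assumes PHP 5 or later with the DOM extension, where an instanceof test does the job that get_class does in the text. The helper name element_children and the sample document are made up.

```php
<?php
// Sketch for PHP 5+ (DOM extension); element_children() is a hypothetical helper.
function element_children($node)
{
    $elements = array();
    // Guard: $node may be false or null if an earlier lookup failed.
    if (!($node instanceof DOMNode) || !$node->hasChildNodes()) {
        return $elements;
    }
    foreach ($node->childNodes as $child) {
        // Keep elements only; whitespace text nodes and comments are skipped.
        if ($child->nodeType == XML_ELEMENT_NODE) {
            $elements[] = $child->nodeName;
        }
    }
    return $elements;
}

$doc = new DOMDocument();
$doc->loadXML("<root>\n<title>t</title>\n<date>2003-05-12</date>\n</root>");
$names = element_children($doc->documentElement);
$none  = element_children(false);
print(implode(", ", $names) . "\n");
```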
You may be unsure of the result, for example, when you extract the required element from the document with an XPath expression. To get the element you need, there is no point in iterating over all the document's elements in search of it. XPath expressions exist specifically for this purpose; they are the same expressions that XSLT uses to address nodes of the document being transformed (the select and match attributes).
<?php
/* Create an XPath context. The argument is the document object
   in which XPath expressions will be evaluated. */
$context = xpath_new_context($dom1);
/* Evaluate the expression and store the outcome in the result variable. */
$result = xpath_eval($context, "/root/text/acronym");
var_dump($result);
/* $result is an object of class XPathObject; its nodeset property is an array
   containing the objects of the elements found. */
for ($i = 0; $i < sizeof($result->nodeset); $i++)
{
    $text = $result->nodeset[$i]->first_child();
    print(iconv("UTF-8", "windows-1251", $text->node_value()). "\n");
}
/* Obtain a scalar value with XPath (the number of all elements
   in the document except the root one). */
$result = xpath_eval($context, "count(/root//*)");
var_dump($result);
print("\n{$result->value}");
?>
It is important to remember the XML namespaces that may be used in documents. If you want to evaluate expressions in documents that contain elements from their own namespaces (for example, XSLT documents), you need to register the namespace. Otherwise it will be impossible to use names like 'xsl:template' in an expression.
The namespace address (URI) passed to the function must match the one given in the document; otherwise the XPath parser will assume that two different namespaces are registered under the same xsl prefix.
<?php
$xslt = domxml_open_file("c:/xml/custom.xslt");
$context = xpath_new_context($xslt);
/* Register the xsl namespace in the XPath context. */
xpath_register_ns($context, "xsl", "http://www.w3.org/1999/XSL/Transform");
/* Count the templates in the XSLT style sheet. */
$result = xpath_eval($context, "count(/xsl:stylesheet/xsl:template)");
print($result->value);
?>
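For comparison, here is how the same count might be obtained with the DOM extension of PHP 5 and later, which is an assumption of this sketch and not the module described in the article. Class DOMXPath plays the role of the XPath context, and its registerNamespace method corresponds to xpath_register_ns. The inline style sheet is made-up sample data.

```php
<?php
// Sketch for PHP 5+ (DOMXPath); the inline style sheet is made-up sample data.
$xsl = '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
     . '<xsl:template match="/"/><xsl:template match="title"/>'
     . '</xsl:stylesheet>';
$doc = new DOMDocument();
$doc->loadXML($xsl);

$xpath = new DOMXPath($doc);
// The URI is the same one that the document declares for the xsl prefix.
$xpath->registerNamespace('xsl', 'http://www.w3.org/1999/XSL/Transform');

// query() returns the matching nodes as a DOMNodeList.
$templates = $xpath->query('/xsl:stylesheet/xsl:template');
// evaluate() can also return scalar results such as a count.
$count = $xpath->evaluate('count(/xsl:stylesheet/xsl:template)');
print($templates->length . "\n");
```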
So the task of obtaining the required element has been covered. Now for what can be done with it.
Copying elements
While the DOM XML module did not properly conform to the DOM specification, it was enough to obtain an element object and append it to an element of another document. Now you first have to clone the element with the clone_node function. The following code copies the child nodes of the first document's root into the root of the second one.
<?php
$root1 = $dom1->document_element();
$child = $root1->child_nodes();
$root2 = $dom2->document_element();
for ($i = 0; $i < sizeof($child); $i++)
    // true requests a deep copy, so each node is cloned with its content
    $root2->append_child($child[$i]->clone_node(true));
?>
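In the DOM extension of PHP 5 and later (an assumption of this sketch, not the module described here), copying between documents is usually done with importNode instead of clone_node: it produces a copy that belongs to the target document. The two small documents are made-up sample data.

```php
<?php
// Sketch for PHP 5+ (DOM extension); the two documents are made-up sample data.
$source = new DOMDocument();
$source->loadXML('<root><title>One</title><date>2003-05-12</date></root>');
$target = new DOMDocument();
$target->loadXML('<root/>');

foreach ($source->documentElement->childNodes as $child) {
    // importNode() creates a copy owned by $target; true requests a deep copy.
    $copy = $target->importNode($child, true);
    $target->documentElement->appendChild($copy);
}

// The copies are now in the target, and the source still has its own nodes.
$copied = $target->documentElement->childNodes->length;
$still_there = $source->documentElement->childNodes->length;
print($target->saveXML($target->documentElement) . "\n");
```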
Creating new nodes within a document
You have probably noticed that 'element' and 'node' are used here by turns. I hope the class inheritance diagram made it clear that an element is a tag, while a node is a more general concept that covers everything a document can contain. I try to use these words precisely to avoid ambiguity.
Each node is inserted into a document in two steps. First the node is created; it must be created within the document into which it is going to be inserted. Then the node is appended as a child to one of the document's nodes. For attributes, which in a certain sense are also nodes, there is a more convenient construction.
The PHP documentation I downloaded recently contains a list of functions that were present in previous versions but did not conform to the DOM specification, along with descriptions of how they work. They can be studied, but using them is not recommended. The behavior of some constructions changed as new versions appeared, so you should not rely on things that are slated for removal.
<?php
/* The root element is added exactly like any other node. */
$dom3 = domxml_new_doc("1.0");
/* create_element creates a node of the 'element' type. */
$root3_new = $dom3->create_element("root");
/* Now the created element is added to the document. Nothing prevents you
   from passing the result of create_element directly to append_child
   instead of going through the variable $root3_new. */
$root3 = $dom3->append_child($root3_new);
$title = $root3->append_child($dom3->create_element("title"));
/* create_text_node creates a text node. We add it as the content of the title element.
   Saving the added node into a variable is not obligatory;
   do it only if you want to work with the node after adding it. */
$title->append_child($dom3->create_text_node("Creating new nodes within a document"));
?>
Nodes of other types are created and inserted into a document in a similar way.
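As an illustration of "other types", the sketch below creates a comment and a CDATA section with the same two steps. It assumes PHP 5 or later with the DOM extension, so the method names differ from the domxml ones used in this article. The element names follow the sample document from the beginning of the article, and the content strings are made up.

```php
<?php
// Sketch for PHP 5+ (DOM extension); content strings are made-up sample data.
$doc = new DOMDocument('1.0', 'UTF-8');
// Step 1: create the node within the document. Step 2: append it to a parent.
$root = $doc->appendChild($doc->createElement('root'));
$title = $root->appendChild($doc->createElement('title'));
$title->appendChild($doc->createTextNode('Creating new nodes'));
// Nodes of other types follow the same two steps.
$root->appendChild($doc->createComment(' a comment node '));
$root->appendChild($doc->createCDATASection('<br> is kept as is'));
// Attributes have a shortcut on the element object.
$root->setAttribute('language', 'english');

$xml = $doc->saveXML($root);
print($xml . "\n");
```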
Changing nodes
Formally, no methods are provided for this. Only attributes have a method that changes their content. There are elements that have no child elements and contain only CDATA sections, text with entities, or comments. They can be changed by deleting the existing nodes and inserting new ones. For elements that contain child elements mixed with text, a single editing method would make no sense.
Attributes are created and changed through methods of the element objects that contain them. According to the specification, it should also be possible to create them and add them to elements through the functions create_attribute and append_child, but this has not been implemented in PHP 4.3.1 (4.3.2) yet.
<?php
// An attribute's value can be set through the object of its element
$root3->set_attribute("language", "English");
// This is how the documentation describes it, but it does not work
$root2->append_child($dom2->create_attribute("language", "English"));
?>
The explanation of why it doesn't work is 'not ready yet'. You are advised to use the latest version from CVS, or to use the function set_attribute_node instead. This is inconvenient in some cases, for example when the type of the node being inserted into the element is not known in advance (it may be a text node or an attribute) and a single function would be preferable; until that becomes possible, we have to use an if-else construction.
Replacing a text node means deleting the existing one and inserting a new one. If you are not sure that the element contains only a text node and nothing else, you can clone the element, insert the clone into its parent, and then delete the original. You get a 'clean' element without children, but note that its attributes are not copied either.
<?php
// $target is the variable holding the element to be changed;
// $dom is the document object that owns it.
// Get the parent node.
$parent = $target->parent_node();
// Append a clone of the node we need (false: without its children).
$new_target = $parent->append_child($target->clone_node(false));
// Delete the old element.
$parent->remove_child($target);
// Insert the required text into the new element.
$new_target->append_child($dom->create_text_node(iconv("windows-1251", "UTF-8",
    "Replacement of a node is deleting of an existing one and insertion of a new one.")));
?>
Changing text nodes or CDATA sections within a complex combination of elements is also simple: we obtain the required object, insert a new node before it, and delete the old one.
<?php
// insert_before is called on the parent; the reference node must be its child
$parent = $target_node->parent_node();
$new_node = $parent->insert_before($dom->create_text_node(iconv("windows-1251", "UTF-8",
    "Replacement of a node is deleting of an existing one and insertion of a new one.")),
    $target_node);
$parent->remove_child($target_node);
?>
The removal could be written more briefly as $target_node->unlink_node(), but since this function does not conform to the standard it may be removed from the module, so it is better not to use it in examples.
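The insert-then-remove replacement described above can be sketched with the DOM extension of PHP 5 and later, which is an assumption of this example and not the module the article describes. The helper name replace_text and the sample document are made up.

```php
<?php
// Sketch for PHP 5+ (DOM extension); replace_text() is a hypothetical helper.
function replace_text(DOMNode $old, $text)
{
    // The new node is created within the document that owns the old one.
    $new = $old->ownerDocument->createTextNode($text);
    // Insert the new node before the old one, then remove the old one.
    $parent = $old->parentNode;
    $parent->insertBefore($new, $old);
    $parent->removeChild($old);
    return $new;
}

$doc = new DOMDocument();
$doc->loadXML('<text>old <b>bold</b> tail</text>');
// The last child of the root is the text node " tail".
$tail = $doc->documentElement->lastChild;
replace_text($tail, ' new tail');
$xml = $doc->saveXML($doc->documentElement);
print($xml . "\n");
```

The same API also offers replaceChild on the parent node, which performs both steps in one call.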
XSL transformation in the DOM XML module
An XSLT style sheet is also an XML document. It is read from a file or a string, exactly as a document object is created, or it is built from an existing XML document object. Then its process method is called with the document object to be transformed as the argument, and the output is an XML document object.
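<?php
// The style sheet is read from a file here; domxml_xslt_stylesheet_doc()
// takes an already loaded document object instead of a path
$xslt = domxml_xslt_stylesheet_file("c:/xml/custom.xslt");
$dom = domxml_open_file("c:/xml/existing_file.xml");
// process() returns the transformed document as a document object
$final = $xslt->process($dom);
print($final->dump_mem());
?>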
Conclusion
The object approach to documents is a step into the future. The DOM XML module offers a programming interface with capabilities that cannot be achieved with a SAX parser in a PHP script. It removes the uncertainty about how the characters in an XML document you have produced will be interpreted when an XSLT processor or another handler parses it. For example, in DOM XML there are no problems with inserting text into a document's elements, whereas when you assemble a document as text you have to check and filter the special characters. Text in a DOM object is just text; but if we simply paste it into the document's string, the characters < and > turn into tags, and if those tags are not declared they may cause an error.
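The difference between inserting text through DOM and pasting it into a string can be shown in a few lines. The sketch assumes PHP 5 or later with the DOM extension rather than the module described in this article, and the sample text is made up.

```php
<?php
// Sketch for PHP 5+ (DOM extension); the sample text is made up.
$text = 'Use <br> & <p> carefully';

$doc = new DOMDocument();
$root = $doc->appendChild($doc->createElement('root'));
// The DOM stores the string as text; markup characters are escaped on output.
$root->appendChild($doc->createTextNode($text));
$via_dom = $doc->saveXML($root);

// Pasting the same text into a string gives a document that is not well-formed.
$pasted = '<root>' . $text . '</root>';
libxml_use_internal_errors(true);
$check = new DOMDocument();
$pasted_ok = $check->loadXML($pasted);
libxml_clear_errors();

print($via_dom . "\n");
print($pasted_ok ? "pasted string parsed\n" : "pasted string rejected\n");
```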
The DOM XML module is raw at the moment. Not all functions are implemented; the latest release will always lag behind what the developers have in CVS, and the documentation lags considerably behind the releases. But the developers communicate actively with users, and the module is open to innovations and improvements. So it is worth mastering its functions right now, in order to have some knowledge and experience of building simple sites with XML by the time of your first commercial project that uses XML.



