
DOM Traversal in PHP – As Simple as jQuery


jQuery is a very popular JavaScript library that makes traversing HTML DOM elements easy, and the JavaScript community has benefited immensely from it.

PHP also has a very good library for traversing the DOM; it is efficient and simple to use.

Today, I’ll show you how to do DOM traversal in PHP using Symfony’s DomCrawler component.

Let’s create a new PHP project and see how this works.

1. Run composer init and answer the prompts.

2. Open the composer.json file and require the symfony/dom-crawler package. The latest version as of this writing is 2.7. The official Git repository is https://github.com/symfony/dom-crawler

3. Run composer install to install the package.

4. Create an index.php file.

index.php
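Here is a minimal sketch of what this file could look like. The markup is a trimmed stand-in for the Laravel welcome page with one paragraph added, and the exact XPath expression is an assumption based on the description in this post. It also assumes Composer has installed symfony/dom-crawler into the vendor directory next to this file.

```php
<?php
// index.php: an illustrative sketch, assuming symfony/dom-crawler is installed.

// Load Composer's autoloader so the installed packages can be found.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Assumed stand-in for the Laravel welcome page markup.
$html = <<<'HTML'
<!DOCTYPE html>
<html>
    <head>
        <title>Laravel</title>
    </head>
    <body>
        <div class="container">
            <div class="content">
                <div class="title">Laravel 5</div>
                <p>This is the Welcome page of Laravel</p>
            </div>
        </div>
    </body>
</html>
HTML;

// Create the crawler and hand it the HTML string to parse.
$crawler = new Crawler();
$crawler->addContent($html);

// Step from <html> through <body> and the two nested divs down to the <p>.
$text = $crawler->filterXPath('descendant-or-self::html/body/div/div/p')->text();

// trim() guards against surrounding whitespace in the node's text.
echo trim($text), PHP_EOL;
```

Running php index.php from the project root should print the paragraph text.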

In the file above, we required the autoload file, then imported the Crawler class from the DomCrawler package we installed with a use statement.

If you look carefully, you will notice that I copied the content of the Laravel welcome page and pasted it here as the data we’ll traverse.

Create an instance of Crawler and call the addContent method, passing the HTML content you want to traverse as a string.

Now let’s call the filterXPath method on it. That method queries the DOM for whatever nodes we are interested in.

In the XPath expression, descendant-or-self is an axis that matches the current node or any node beneath it; here it picks up the <html> element. The remaining steps walk down through the <body> tag and the inner divs to the <p> tag. The text() call then extracts the text of the paragraph node, which is This is the Welcome page of Laravel.

Now, I don’t particularly like this way of doing it; it seems cumbersome. In jQuery, I like to use CSS selectors.

Let’s pull in the Symfony CSS selector library to help us with this.

In your composer.json, require symfony/css-selector version 2.7 and run composer update.

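Now replace the filterXPath call with a call to the filter method, which takes a CSS selector instead of an XPath expression.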

Voilà! It still returns the text This is the Welcome page of Laravel, but with a much cleaner syntax.
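The swap can be sketched as follows. The shortened markup, the XPath expression and the CSS selector are all assumptions chosen to match the structure described in this post, and the filter method only works once symfony/css-selector is installed.

```php
<?php
// Sketch comparing the XPath and CSS selector approaches.
// Assumes symfony/dom-crawler and symfony/css-selector are installed.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Assumed, shortened version of the page markup.
$html = '<html><body><div><div><p>This is the Welcome page of Laravel</p></div></div></body></html>';

$crawler = new Crawler();
$crawler->addContent($html);

// Before: an XPath expression.
$viaXPath = $crawler->filterXPath('descendant-or-self::html/body/div/div/p')->text();

// After: a CSS selector, much like one you would write in jQuery.
$viaCss = $crawler->filter('body > div > div > p')->text();

echo $viaCss, PHP_EOL;

// Both calls select the same paragraph, so the texts are identical.
var_dump($viaXPath === $viaCss);
```

Under the hood, the CSS selector component translates the selector into an XPath expression, so both calls end up running the same kind of query.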

Now, let’s check out some other methods.

You can use the children(), last() and first() methods too.

Run var_dump($crawler->filter('body')->children()); and check out the results. There is also a nodeName() method, along with methods to select form elements, links and buttons.
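Here is a short sketch of those methods in action. The markup is hypothetical and the variable names are only for illustration. It again assumes both Symfony packages are installed.

```php
<?php
// Sketch of children(), first(), last(), nodeName() and selectLink().
// Assumes symfony/dom-crawler and symfony/css-selector are installed.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup used only for this example.
$html = <<<'HTML'
<html>
    <body>
        <h1>Welcome</h1>
        <p>First paragraph</p>
        <p>Last paragraph</p>
        <div><a href="/docs">Documentation</a></div>
    </body>
</html>
HTML;

$crawler = new Crawler();
$crawler->addContent($html);

// children() returns the direct child elements of the first matched node.
$children = $crawler->filter('body')->children();
$count = count($children);

// first() and last() narrow a set of nodes down to a single node.
$firstTag = $children->first()->nodeName();
$lastTag = $children->last()->nodeName();
$firstParagraph = $crawler->filter('p')->first()->text();
$lastParagraph = $crawler->filter('p')->last()->text();

// selectLink() finds links by their text; attr() reads an attribute.
$href = $crawler->selectLink('Documentation')->attr('href');

echo "body has $count child elements, from <$firstTag> to <$lastTag>", PHP_EOL;
echo "$firstParagraph / $lastParagraph / $href", PHP_EOL;
```

Each of these methods returns either a new Crawler or a plain value, so calls can be chained much like they are in jQuery.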

If you are building a web scraping project, you might want to use this library; it will help immensely.

Check out the documentation here

If you have any questions or comments, please let me know in the comments section below.

 

PROSPER OTEMUYIWA

About PROSPER OTEMUYIWA

Food Ninja, Code Slinger, Technical Trainer, Accidental Writer, Open Source Advocate and Developer Evangelist.