DOM Traversal in PHP – As Simple as jQuery
jQuery is a very popular JavaScript library that makes traversing HTML DOM elements easy, and the JavaScript community has benefited immensely from it.
PHP also has a very good library for traversing the DOM; it's efficient and simple to use.
Today, I'll show you how to do DOM traversal in PHP using Symfony's DomCrawler component.
Let’s create a new PHP project and see how this works.
1. Run composer init and follow the prompts.
2. Open the composer.json file and require the symfony/dom-crawler package. The latest version as of this writing is 2.7. The official Git repository is https://github.com/symfony/dom-crawler
3. Run composer install to install the package.
4. Create an index.php file
index.php
```php
<?php

// Load the classes installed by Composer
require "vendor/autoload.php";

use Symfony\Component\DomCrawler\Crawler;

// The Laravel welcome page, used as the HTML we will traverse
$htmlContent = '<!DOCTYPE html>
<html>
    <head>
        <title>Laravel</title>

        <link href="https://fonts.googleapis.com/css?family=Lato:100" rel="stylesheet" type="text/css">

        <style>
            html, body {
                height: 100%;
            }

            body {
                margin: 0;
                padding: 0;
                width: 100%;
                display: table;
                font-weight: 100;
            }

            .container {
                text-align: center;
                display: table-cell;
                vertical-align: middle;
            }

            .content {
                text-align: center;
                display: inline-block;
            }

            .title {
                font-size: 96px;
            }
        </style>
    </head>
    <body>
        <div class="container">
            <div class="content">
                <p> This is the Welcome page of Laravel </p>
            </div>
        </div>
    </body>
</html>';

// Load the HTML into a crawler
$crawler = new Crawler();
$crawler->addContent($htmlContent);

// Select the <p> inside the two nested divs and print its text
echo $crawler->filterXPath('descendant-or-self::body/div/div/p')->text();
```
In the file above, we require the Composer autoload file and then import the Crawler class from the DomCrawler package we installed.
If you look carefully, you'll notice that I copied the content of the Laravel welcome page and pasted it in as the data we'll traverse.
```php
$crawler = new Crawler();
$crawler->addContent($htmlContent);
echo $crawler->filterXPath('descendant-or-self::body/div/div/p')->text();
```
We create an instance of Crawler and call its addContent() method, passing in the HTML we want to traverse as a string.
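addContent() is not the only way to load HTML into a crawler. The sketch below assumes it runs inside the project set up above (so vendor/autoload.php exists) and uses a small hypothetical HTML snippet. It loads the snippet three ways: the addContent() call used in index.php, passing the string straight to the Crawler constructor, and addHtmlContent(), which loads the string as HTML with an explicit charset. All three should find the same paragraph.

```php
<?php
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical snippet standing in for the welcome page.
$html = '<html><body><div class="content"><p>Hello from the crawler</p></div></body></html>';

// 1. Empty crawler plus addContent(), as in index.php.
$viaAddContent = new Crawler();
$viaAddContent->addContent($html);

// 2. Pass the HTML string straight to the constructor.
$viaConstructor = new Crawler($html);

// 3. addHtmlContent() loads the string as HTML with an explicit charset.
$viaHtmlContent = new Crawler();
$viaHtmlContent->addHtmlContent($html, 'UTF-8');

foreach ([$viaAddContent, $viaConstructor, $viaHtmlContent] as $crawler) {
    // Each crawler should print the same paragraph text.
    echo $crawler->filterXPath('descendant-or-self::p')->text(), PHP_EOL;
}
```

Passing the string to the constructor is simply a shorter spelling of the first option, which is handy when you only need the crawler for a single query.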
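Next, we call the filterXPath() method on the crawler. It queries the document with an XPath expression and returns a new Crawler containing the nodes that match.
descendant-or-self is an XPath axis: it selects the current context node together with all of its descendants, so descendant-or-self::body finds the <body> element wherever it sits in the document. From there, /div/div/p steps down through the two nested <div> elements to the <p> tag. Finally, text() extracts the text of that paragraph node, which is "This is the Welcome page of Laravel".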
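DomCrawler keeps its nodes as objects from PHP's built-in DOM extension and evaluates XPath with DOMXPath, so the same expression can be tried without any Composer packages. The following is a sketch using DOMDocument and DOMXPath directly on a trimmed-down copy of the welcome page; using the <html> element as the context node is an assumption meant to roughly mirror the root element the crawler holds. The second query drops one /div step to show that each / moves exactly one level down, so it matches nothing.

```php
<?php
// Uses only PHP's built-in DOM extension; no Composer packages required.
$html = '<!DOCTYPE html><html><head><title>Laravel</title></head><body>'
    . '<div class="container"><div class="content">'
    . '<p> This is the Welcome page of Laravel </p>'
    . '</div></div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Assumption: start from the <html> element, roughly where the crawler starts.
$context = $dom->documentElement;

// descendant-or-self::body -> the context node or any descendant named <body>
// /div/div/p               -> then step down one child level at a time
$matches = $xpath->query('descendant-or-self::body/div/div/p', $context);
echo $matches->length, PHP_EOL;                     // 1
echo trim($matches->item(0)->textContent), PHP_EOL; // This is the Welcome page of Laravel

// Skipping a level fails: <p> is not a direct child of div.container.
$tooShallow = $xpath->query('descendant-or-self::body/div/p', $context);
echo $tooShallow->length, PHP_EOL;                  // 0
```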
Now, I don't particularly like this approach; the XPath expression feels cumbersome. In jQuery, I prefer to use CSS selectors.
Let’s pull in the Symfony CSS selector library to help us with this.
In your composer.json, require symfony/css-selector version 2.7 and run composer update.
Now replace

```php
echo $crawler->filterXPath('descendant-or-self::body/div/div/p')->text();
```

with

```php
echo $crawler->filter('div.content p')->text();
```
Voilà! It still returns the text "This is the Welcome page of Laravel", but with much cleaner syntax.
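Under the hood, filter() translates the CSS selector into XPath with the CssSelector component and then runs that XPath, which is why symfony/css-selector has to be installed. The sketch below makes that translation visible. It assumes symfony/css-selector 2.8 or later, where the CssSelectorConverter class exists (the 2.7 release exposes a static CssSelector::toXPath() method instead), and it uses a small hypothetical HTML snippet. The generated expression begins with the same descendant-or-self:: axis as the hand-written XPath.

```php
<?php
require 'vendor/autoload.php';

use Symfony\Component\CssSelector\CssSelectorConverter;
use Symfony\Component\DomCrawler\Crawler;

$converter = new CssSelectorConverter();

// Translate the CSS selector from index.php into an XPath expression.
$xpath = $converter->toXPath('div.content p');
echo $xpath, PHP_EOL;

// Hypothetical sample page with the same structure as the welcome page.
$crawler = new Crawler('<html><body><div class="content"><p>Hi there</p></div></body></html>');

// Both calls should select the same <p> element.
echo $crawler->filterXPath($xpath)->text(), PHP_EOL;
echo $crawler->filter('div.content p')->text(), PHP_EOL;
```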
Now, let's check out some other methods.
You can also use the children(), first() and last() methods.
Try var_dump($crawler->filter('body')->children());
and check out the results. There are also methods such as nodeName(), as well as methods for selecting form elements, links and buttons.
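Here is a short sketch of those traversal methods on a small hypothetical page. It assumes the Composer setup from earlier (including symfony/css-selector for filter()) and a current DomCrawler release. Iterating over a Crawler yields the underlying DOMElement objects, which is what the foreach loop relies on.

```php
<?php
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical page: a heading plus a div with two paragraphs.
$html = '<html><body>'
    . '<h1>Title</h1>'
    . '<div class="content"><p>First paragraph</p><p>Second paragraph</p></div>'
    . '</body></html>';

$crawler = new Crawler($html);

// children() returns the element children of the first matched node.
$children = $crawler->filter('body')->children();
foreach ($children as $domElement) {
    echo $domElement->nodeName, PHP_EOL; // h1, then div
}

$paragraphs = $crawler->filter('div.content p');
echo $paragraphs->first()->text(), PHP_EOL; // first matched node
echo $paragraphs->last()->text(), PHP_EOL;  // last matched node
echo $paragraphs->eq(1)->text(), PHP_EOL;   // node at zero-based index 1
echo $paragraphs->nodeName(), PHP_EOL;      // tag name of the first matched node
```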
If you are building a web scraping project, this library is well worth using; it will help immensely.
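For scraping, the link and button helpers mentioned earlier are where the library pays off. The following sketch uses a hypothetical page and a hypothetical base URI (https://example.com/), passed as the Crawler constructor's second argument so that relative links can be resolved; it assumes the same Composer setup as before. selectLink() finds a link by its text, selectButton() finds a submit button by its value, and form() wraps the surrounding <form> so its field values can be read.

```php
<?php
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical page: one relative link and a small search form.
$html = '<html><body>'
    . '<a href="/docs">Documentation</a>'
    . '<form action="/search" method="get">'
    . '<input type="text" name="q" value="php">'
    . '<input type="submit" value="Search">'
    . '</form>'
    . '</body></html>';

// The second argument is the URI of the page, used to resolve relative URLs.
$crawler = new Crawler($html, 'https://example.com/');

// selectLink() matches <a> elements by their text.
$link = $crawler->selectLink('Documentation');
echo $link->attr('href'), PHP_EOL;     // raw attribute value
echo $link->link()->getUri(), PHP_EOL; // resolved against the base URI

// selectButton() matches buttons by value, id, name or alt text;
// form() then returns the <form> that contains the button.
$form = $crawler->selectButton('Search')->form();
echo $form->getMethod(), PHP_EOL;
print_r($form->getValues());
```

Because the submit input has no name attribute, it is not included in getValues(); only the named text field is.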
Check out the official DomCrawler documentation for the full list of methods.
If you have any questions or comments, please let me know in the comments section below.