"If a worker wants to do his job well, he must first sharpen his tools." - Confucius, "The Analects of Confucius. Lu Linggong"
How to Extract Text from HTML Elements with Specific Classes into Flat Arrays using PHP DOM?

Posted on 2025-02-06

Extracting Flat Text from Elements with a Designated Class Using PHP DOM

Extracting text from specific HTML elements is a common task in web development. PHP's DOM extension provides tools for parsing HTML and querying its contents. This article covers one specific requirement: extracting the text of elements with a given class into two flat arrays.

Problem

Given HTML in which text is spread across multiple p elements with alternating class names, the task is to save the text into two arrays: one for headings and one for content. For instance, given HTML that repeats the following pattern for each chapter:

<p class="Heading1-H">Chapter 1</p>
<p class="Normal-H">This is chapter 1</p>

We need to obtain the following output:

$heading = ['Chapter 1', 'Chapter 2', 'Chapter 3'];
$content = ['This is chapter 1', 'This is chapter 2', 'This is chapter 3'];

Solution

To accomplish this extraction using PHP DOM, we employ DOMDocument and DOMXPath. The solution involves the following steps:

  1. Load the HTML into a DOMDocument object:

     $dom = new DOMDocument();
     $dom->loadHTML($test);

  2. Create a DOMXPath object to run XPath queries against the document:

     $xpath = new DOMXPath($dom);

  3. Call the parseToArray() function to extract the text of elements with the specified class:

     $heading = parseToArray($xpath, 'Heading1-H');
     $content = parseToArray($xpath, 'Normal-H');
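
Real-world HTML is often not well formed, and loadHTML() reports parse problems as PHP warnings. The following is a minimal sketch of steps 1 and 2 with those warnings collected instead of printed, assuming that behavior is wanted. The sample markup in $html is hypothetical and only reuses the class names from this article.

```php
<?php
// Hypothetical sample input; only the class names come from the article.
$html = '<p class="Heading1-H">Chapter 1</p><p class="Normal-H">This is chapter 1</p>';

// Collect libxml parse warnings internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Keep the collected warnings for inspection, then restore libxml's state.
$warnings = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Step 2: the XPath object is bound to the loaded document.
$xpath = new DOMXPath($dom);
$count = $xpath->query("//p[@class='Heading1-H']")->length;

echo $loaded ? "loaded, {$count} heading(s) found\n" : "failed to load\n";
```

Whether to log or discard the entries in $warnings depends on how much the input is trusted.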

The parseToArray() function:

  • Runs an XPath query for elements with the designated class.
  • Iterates over the matched nodes and extracts their text content.
  • Stores the extracted text in an array and returns it.
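
An exact comparison such as @class='Heading1-H' only matches when the attribute holds that single class name. The sketch below shows one common XPath 1.0 idiom for matching a class token among several. The helper name textsByClassToken() and the sample markup are hypothetical, not part of the original solution, and the helper assumes the class name contains no quote characters.

```php
<?php
// Hypothetical helper (not from the article): matches a class token even
// when the element carries several classes; returns one string per element.
function textsByClassToken(DOMXPath $xpath, string $class): array
{
    // Assumption: $class has no quote characters, because it is
    // interpolated directly into the XPath expression.
    $query = "//p[contains(concat(' ', normalize-space(@class), ' '), ' {$class} ')]";

    $texts = [];
    foreach ($xpath->query($query) as $element) {
        $texts[] = trim($element->textContent);
    }

    return $texts;
}

// Hypothetical input: the heading has a second class, "intro".
$dom = new DOMDocument();
$dom->loadHTML('<p class="Heading1-H intro">Chapter 1</p><p class="Normal-H">This is chapter 1</p>');
$xpath = new DOMXPath($dom);

$headings = textsByClassToken($xpath, 'Heading1-H');
print_r($headings);
```

Padding the attribute value with spaces before calling contains() keeps a class such as Heading1-HX from matching Heading1-H.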

Here's the complete PHP code:

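<?php
function parseToArray(DOMXPath $xpath, string $class): array
{
    // Match p elements whose class attribute equals $class exactly.
    $xpathquery = "//p[@class='" . $class . "']";
    $elements = $xpath->query($xpathquery);

    // Collect the text of every child node of every matched element.
    $resultarray = [];
    foreach ($elements as $element) {
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            $resultarray[] = $node->nodeValue;
        }
    }

    return $resultarray;
}

$test = <<<HTML
<p class="Heading1-H">Chapter 1</p>
<p class="Normal-H">This is chapter 1</p>
<p class="Heading1-H">Chapter 2</p>
<p class="Normal-H">This is chapter 2</p>
<p class="Heading1-H">Chapter 3</p>
<p class="Normal-H">This is chapter 3</p>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($test);
$xpath = new DOMXPath($dom);

$heading = parseToArray($xpath, 'Heading1-H');
$content = parseToArray($xpath, 'Normal-H');

var_dump($heading);
echo "\n";
var_dump($content);
echo "\n";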

This approach uses DOMDocument and DOMXPath to pull text out of an HTML document by class name. The same pattern extends to more targeted extraction by changing the XPath query.
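
One caveat about the inner loop over childNodes in the complete code: it adds one array entry per child node, so a paragraph that contains inline markup contributes several entries instead of one. The sketch below compares that loop with reading textContent from each matched element. The sample markup with a b element is hypothetical and chosen only to show the difference.

```php
<?php
// Hypothetical input: the first paragraph contains inline markup.
$html = '<p class="Normal-H">This is <b>chapter</b> 1</p>'
      . '<p class="Normal-H">This is chapter 2</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$elements = $xpath->query("//p[@class='Normal-H']");

// Child-node loop, as in the article's code: one entry per child node.
$perChild = [];
foreach ($elements as $element) {
    foreach ($element->childNodes as $node) {
        $perChild[] = $node->nodeValue;
    }
}

// textContent: one entry per matched element, nested text included.
$perElement = [];
foreach ($elements as $element) {
    $perElement[] = $element->textContent;
}

var_dump(count($perChild));
var_dump($perElement);
```

For input like the article's, where each paragraph holds plain text only, both loops produce the same array.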
