Extracting Flat Text from Elements with a Designated Class Using PHP DOM
Extracting text from specific HTML elements is a common task in web development. PHP's DOM extension provides the tools for parsing HTML and reading its contents. This article shows how to collect the text of elements with a given class into two flat arrays.
Problem
Given HTML in which text is spread across multiple p elements with alternating class names, the task is to save the text into two arrays: one for headings and one for content. For instance, given HTML of the following form, with the same pair of elements repeated for Chapter 2 and Chapter 3:
<p class="Heading1-H">Chapter 1</p>
<p class="Normal-H">This is chapter 1</p>
We need to obtain the following output:
$heading = ['Chapter 1', 'Chapter 2', 'Chapter 3'];
$content = ['This is chapter 1', 'This is chapter 2', 'This is chapter 3'];
Solution
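To accomplish this extraction with PHP DOM, we use DOMDocument and DOMXPath. The solution involves the following steps.
Create a DOMDocument and load the HTML string into it:
$dom = new DOMDocument();
$dom->loadHTML($test);
Create a DOMXPath object for querying the loaded document:
$xpath = new DOMXPath($dom);
Call parseToArray() once per class name to build the two arrays:
$heading = parseToArray($xpath, 'Heading1-H');
$content = parseToArray($xpath, 'Normal-H');
In the parseToArray() function, an XPath query selects the elements whose class attribute matches the given class name. The function then loops over the child nodes of each matched element, appends each node's nodeValue to a result array, and returns that array.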
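An exact comparison such as @class='Normal-H' only matches elements whose class attribute is exactly that string. The following is a minimal sketch of a more tolerant test, assuming elements may carry several space-separated classes and that class names contain no quote characters. The helper name classQuery() and the sample markup are hypothetical and not part of the original solution.

```php
<?php
// Hypothetical helper: builds an XPath expression that matches <p> elements
// whose class attribute contains $class as a whole, space-separated token.
function classQuery(string $class): string
{
    // Padding the normalized class list with spaces means ' Normal-H ' can
    // only match a complete class name, never part of a longer one.
    return "//p[contains(concat(' ', normalize-space(@class), ' '), ' {$class} ')]";
}

$dom = new DOMDocument();
$dom->loadHTML(
    '<p class="Heading1-H intro">Chapter 1</p>'
    . '<p class="Normal-H">This is chapter 1</p>'
    . '<p class="Normal-Hx">Not a match</p>'
);
$xpath = new DOMXPath($dom);

$matches = [];
foreach ($xpath->query(classQuery('Normal-H')) as $p) {
    $matches[] = $p->textContent;
}
var_dump($matches);
```

With this expression, the element with class "Heading1-H intro" is still found when querying for Heading1-H, while "Normal-Hx" is not mistaken for Normal-H.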
Here's the complete PHP code:
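<?php
// Collect the text of every <p> element whose class attribute equals $class.
function parseToArray(DOMXPath $xpath, string $class): array
{
    $xpathquery = "//p[@class='" . $class . "']";
    $elements = $xpath->query($xpathquery);
    $resultarray = [];
    foreach ($elements as $element) {
        // Each child node (here a single text node) becomes one array entry.
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            $resultarray[] = $node->nodeValue;
        }
    }
    return $resultarray;
}

$test = <<<HTML
<p class="Heading1-H">Chapter 1</p><p class="Normal-H">This is chapter 1</p>
<p class="Heading1-H">Chapter 2</p>
<p class="Normal-H">This is chapter 2</p>
<p class="Heading1-H">Chapter 3</p>
<p class="Normal-H">This is chapter 3</p>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($test);
$xpath = new DOMXPath($dom);
$heading = parseToArray($xpath, 'Heading1-H');
$content = parseToArray($xpath, 'Normal-H');
var_dump($heading);
echo "\n";
var_dump($content);
echo "\n";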
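Looping over child nodes produces one array entry per child, so a paragraph that contains inline markup such as b or i would be split into several entries. The following is a sketch of a variant that reads textContent instead, assuming one entry per matched element is wanted and that real-world input may be malformed enough to trigger libxml warnings. The function name parseToArrayFlat() and the sample markup are hypothetical.

```php
<?php
// Hypothetical variant of parseToArray(): one array entry per matched element.
function parseToArrayFlat(DOMXPath $xpath, string $class): array
{
    $result = [];
    foreach ($xpath->query("//p[@class='{$class}']") as $element) {
        // textContent concatenates the text of all descendants, so nested
        // inline elements do not split the entry.
        $result[] = trim($element->textContent);
    }
    return $result;
}

$html = '<p class="Heading1-H">Chapter <b>1</b></p>'
      . '<p class="Normal-H"> This is <i>chapter</i> 1 </p>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
$heading = parseToArrayFlat($xpath, 'Heading1-H');
$content = parseToArrayFlat($xpath, 'Normal-H');
var_dump($heading, $content);
```

Restoring the previous libxml_use_internal_errors() setting keeps the change local to this parse, which matters if other code in the same request relies on the default behavior.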
This approach needs only DOMDocument and DOMXPath from PHP's DOM extension. Because the class name is passed to parseToArray() as an argument, the same function can extract text for any other class in the document without changes.
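Once both arrays exist, they can be paired so that each heading maps to its content. This is a small sketch, assuming the headings and content appear in matching order and that headings are unique; the variable name $chapters is hypothetical, and the arrays are written out literally so the snippet runs on its own.

```php
<?php
$heading = ['Chapter 1', 'Chapter 2', 'Chapter 3'];
$content = ['This is chapter 1', 'This is chapter 2', 'This is chapter 3'];

// array_combine() requires arrays of equal length, so check that first
// and fall back to an empty array when the document is uneven.
$chapters = count($heading) === count($content)
    ? array_combine($heading, $content)
    : [];

foreach ($chapters as $title => $text) {
    echo $title, ': ', $text, "\n";
}
```

If two headings share the same text, the later one overwrites the earlier one in $chapters, so keeping the two flat arrays is the safer choice for documents with repeated headings.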