How to Extract Text from HTML Elements with Specific Classes into Flat Arrays using PHP DOM?

Front page > Programming > How to Extract Text from HTML Elements with Specific Classes into Flat Arrays using PHP DOM?

How to Extract Text from HTML Elements with Specific Classes into Flat Arrays using PHP DOM?

Posted on 2025-02-06

Browse:860

How to Extract Text from HTML Elements with Specific Classes into Flat Arrays using PHP DOM?

Extracting Flat Text from Elements with a Designated Class Using PHP DOM

Extracting text from specific HTML elements is a common task in web development. PHP DOM provides robust tools for parsing HTML and accessing its contents. This article addresses a specific requirement to extract text from elements with a nominated class into two flat arrays.

Problem

Given HTML content containing text distributed between multiple p elements with alternating class names, the task is to save the text into two arrays: one for headings and one for content. For instance, given the following HTML:

Chapter 1

This is chapter 1

We need to obtain the following output:

$heading = ['Chapter 1', 'Chapter 2', 'Chapter 3'];
$content = ['This is chapter 1', 'This is chapter 2', 'This is chapter 3'];

Solution

To accomplish this extraction using PHP DOM, we employ DOMDocument and DOMXPath. The solution involves the following steps:

Load the HTML into a DOMDocument object:

$dom = new DOMDocument();
$dom->loadHTML($test);

Create a DOMXPath object to perform XPaths:

$xpath = new DOMXPath($dom);

Use parseToArray() function to extract text from elements with specified class:

$heading = parseToArray($xpath, 'Heading1-H');
$content = parseToArray($xpath, 'Normal-H');

In the parseToArray() function:

It performs an XPath query for the designated class.
Iterates through the matched nodes and extracts their text content.
Stores the extracted text in an array, which is returned.

Here's the complete PHP code:

query($xpathquery);

    $resultarray = [];
    foreach ($elements as $element) {
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            $resultarray[] = $node->nodeValue;
        }
    }

    return $resultarray;
}

$test = 
    Chapter 1


    This is chapter 1


    Chapter 2


    This is chapter 2


    Chapter 3


    This is chapter 3

HTML;

$dom = new DOMDocument();
$dom->loadHTML($test);
$xpath = new DOMXPath($dom);
$heading = parseToArray($xpath, 'Heading1-H');
$content = parseToArray($xpath, 'Normal-H');

var_dump($heading);
echo "
";
var_dump($content);
echo "
";

This approach utilizes the power of PHP DOM and XPath to efficiently extract text from HTML documents, allowing for more complex and targeted content manipulation.

Latest tutorial More>

$Why Doesn\'t Firefox Display Images Using the CSS `content` Property?$
Why Doesn\'t Firefox Display Images Using the CSS `content` Property?
Displaying Images with Content URL in FirefoxAn issue has been encountered where certain browsers, specifically Firefox, fail to display images when r...

Programming Posted on 2025-04-08
How Can I Efficiently Create Dictionaries Using Python Comprehension?
Python Dictionary ComprehensionIn Python, dictionary comprehensions offer a concise way to generate new dictionaries. While they are similar to list c...

Programming Posted on 2025-04-08
Can You Use CSS to Color Console Output in Chrome and Firefox?
Displaying Colors in JavaScript ConsoleIs it possible to use Chrome's console to display colored text, such as red for errors, orange for warnings...

Programming Posted on 2025-04-08
How Can I Configure Pytesseract for Single Digit Recognition with Number-Only Output?
Pytesseract OCR with Single Digit Recognition and Number-Only ConstraintsIn the context of Pytesseract, configuring Tesseract to recognize single digi...

Programming Posted on 2025-04-08
How to Combine Data from Three MySQL Tables into a New Table?
mySQL: Creating a New Table from Data and Columns of Three TablesQuestion:How can I create a new table that combines selected data from three existing...

Programming Posted on 2025-04-08
How to Correctly Use LIKE Queries with PDO Parameters?
Using LIKE Queries in PDOWhen trying to implement LIKE queries in PDO, you may encounter issues like the one described in the query below:$query = &qu...

Programming Posted on 2025-04-08
Why Doesn't `body { margin: 0; }` Always Remove Top Margin in CSS?
Addressing Body Margin Removal in CSSFor novice web developers, removing the margin of the body element can be a confusing task. Often, the code provi...

Programming Posted on 2025-04-08
How does Android send POST data to PHP server?
Sending POST Data in AndroidIntroductionThis article addresses the need to send POST data to a PHP script and display the result in an Android applica...

Programming Posted on 2025-04-08
$How to Resolve the \"Invalid Use of Group Function\" Error in MySQL When Finding Max Count?$
How to Resolve the \"Invalid Use of Group Function\" Error in MySQL When Finding Max Count?
How to Retrieve the Maximum Count Using MySQLIn MySQL, you may encounter an issue while attempting to find the maximum count of values grouped by a sp...

Programming Posted on 2025-04-08
How to Handle User Input in Java's Full-Screen Exclusive Mode?
Handling User Input in Full Screen Exclusive Mode in JavaIntroductionWhen running a Java application in full screen exclusive mode, the usual event ha...

Programming Posted on 2025-04-08
How to Parse JSON Arrays in Go Using the `json` Package?
Parsing JSON Arrays in Go with the JSON PackageProblem: How can you parse a JSON string representing an array in Go using the json package?Code Exampl...

Programming Posted on 2025-04-08
How to Simplify JSON Parsing in PHP for Multi-Dimensional Arrays?
Parsing JSON with PHPTrying to parse JSON data in PHP can be challenging, especially when dealing with multi-dimensional arrays. To simplify the proce...

Programming Posted on 2025-04-08
Do I Need to Explicitly Delete Heap Allocations in C++ Before Program Exit?
Explicit Deletion in C Despite Program ExitWhen working with dynamic memory allocation in C , developers often wonder if it's necessary to manu...

Programming Posted on 2025-04-08
Is There a Performance Difference Between Using a For-Each Loop and an Iterator for Collection Traversal in Java?
For Each Loop vs. Iterator: Efficiency in Collection TraversalIntroductionWhen traversing a collection in Java, the choice arises between using a for-...

Programming Posted on 2025-04-08
$Why Am I Getting a \"Class \'ZipArchive\' Not Found\" Error After Installing Archive_Zip on My Linux Server?$
Why Am I Getting a \"Class \'ZipArchive\' Not Found\" Error After Installing Archive_Zip on My Linux Server?
Class 'ZipArchive' Not Found Error While Installing Archive_Zip on Linux ServerSymptom:When attempting to run a script that utilizes the ZipAr...

Programming Posted on 2025-04-08