"If a worker wants to do his job well, he must first sharpen his tools." - Confucius, "The Analects of Confucius. Lu Linggong"
How to Remove Stubborn HTML Special Characters Before Stripping Tags?

Published on 2024-11-08

Stripping Out Obstinate HTML Special Characters

The strip_tags function removes HTML tags, but it leaves HTML entities untouched, such as &nbsp; for a non-breaking space or &copy; for the copyright symbol. These leftover codes get in the way when you need clean text for an RSS feed.

To fix this, use one of the following strategies:

  • HTML Entity Decoding: Use html_entity_decode to convert the entity codes back to their original characters before passing the string to strip_tags.
  • Regular Expression Removal: Alternatively, use the preg_replace function to match these entity codes and remove them from the string. Here's a sample pattern that does this:
$Content = preg_replace("/&#?[a-z0-9]{2,8};/i","",$Content);

Note that the above pattern includes a modification suggested by Jacco to prevent unintended replacements of genuine ampersand characters (&) in unencoded text. The {2,8} quantifier requires between 2 and 8 alphanumeric characters between the & and the closing semicolon, so the pattern is more selective about what it treats as an HTML entity.
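
To compare the two strategies side by side, here is a minimal sketch. The helper names decode_then_strip and remove_entities_then_strip and the sample string are hypothetical and chosen only for illustration. The sketch also assumes PHP 7 or later (for the "\u{00A0}" escape) and UTF-8 input, and the final replacement of the non-breaking space with a regular space is an extra step that the article does not require.

```php
<?php
// Hypothetical helper: strategy 1, decode entities first, then strip tags.
function decode_then_strip(string $html): string
{
    // Turn &eacute;, &copy;, &amp;, etc. into the characters they stand for.
    $decoded = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $text = strip_tags($decoded);
    // &nbsp; decodes to U+00A0; swapping it for a plain space is an
    // assumption about what a "clean" feed should contain.
    return str_replace("\u{00A0}", ' ', $text);
}

// Hypothetical helper: strategy 2, delete entities with the article's pattern.
function remove_entities_then_strip(string $html): string
{
    $withoutEntities = preg_replace('/&#?[a-z0-9]{2,8};/i', '', $html);
    return strip_tags($withoutEntities);
}

// Sample input invented for this example.
$content = '<p>Caf&eacute;&nbsp;menu &copy; 2024 <b>Tom &amp; Jerry</b></p>';

echo decode_then_strip($content), PHP_EOL;          // Café menu © 2024 Tom & Jerry
echo remove_entities_then_strip($content), PHP_EOL; // Cafmenu  2024 Tom  Jerry

// Text without a well-formed entity is left alone by the pattern:
// "&A;" has only one character before the semicolon, "&D" has no semicolon.
echo preg_replace('/&#?[a-z0-9]{2,8};/i', '', 'Q&A; R&D'), PHP_EOL; // Q&A; R&D
```

The output shows the trade-off: decoding keeps the characters the entities stand for, while the regular expression drops them entirely, including the "é" in "Café" and the ampersand in "Tom & Jerry". Two caveats are worth keeping in mind. Decoding first also turns encoded markup such as &lt;b&gt; into real tags, which strip_tags then removes; whether that is desirable depends on the feed. And because the pattern caps the name at 8 characters, longer HTML5 entity names would not be matched by it.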
