Stripping HTML Special Characters from RSS Feed
When creating RSS feed files, removing HTML tags with PHP's strip_tags function is common practice. However, strip_tags only removes tags; it leaves HTML entities such as &nbsp;, &amp;, and &copy; in the text.
To deal with these entities, consider the following options:
Option 1: Using html_entity_decode
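You can use html_entity_decode to convert these entities back to the characters they represent (for example, &amp; becomes & and &copy; becomes ©). Note that this decodes the entities rather than deleting them.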
$decodedContent = html_entity_decode($originalContent);
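As a minimal sketch of how the two steps fit together, the snippet below strips tags first and then decodes the remaining entities. The sample string is hypothetical, and passing the flags and charset explicitly is an assumption made here for clarity; the one-argument call above relies on PHP's defaults.

```php
<?php
// Hypothetical feed item text, used only for illustration.
$originalContent = '<p>Fish &amp; Chips &copy; 2022</p>';

// strip_tags removes the <p> element but leaves the entities untouched.
$withoutTags = strip_tags($originalContent);

// Decode named and numeric entities into real characters.
// Flags and charset are spelled out here instead of relying on defaults.
$decodedContent = html_entity_decode($withoutTags, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $withoutTags, PHP_EOL;    // Fish &amp; Chips &copy; 2022
echo $decodedContent, PHP_EOL; // Fish & Chips © 2022
```

If the decoded text is then written into the feed's XML, characters such as & and < probably need to be escaped again, for example with htmlspecialchars($decodedContent, ENT_XML1 | ENT_QUOTES, 'UTF-8') or by wrapping the text in a CDATA section.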
Option 2: Using preg_replace
Alternatively, you can use preg_replace with a regular expression to remove the characters directly:
$cleanContent = preg_replace("/&#?[a-z0-9]+;/i","",$originalContent);
This pattern matches numeric entities (&#160;, for example) as well as named entities (&nbsp;).
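To make the effect of removal concrete, here is a small sketch that runs the pattern over a hypothetical sample string containing named and numeric entities. The input is invented for illustration.

```php
<?php
// Hypothetical text with two named entities and one decimal numeric entity.
$originalContent = 'Caf&eacute; &amp; Bar&nbsp;&#169; 2022';

// Delete every "&name;" or "&#number;" sequence outright.
$cleanContent = preg_replace('/&#?[a-z0-9]+;/i', '', $originalContent);

echo $cleanContent, PHP_EOL; // Caf  Bar 2022
```

The output shows that this approach is lossy: the é in "Café" and the ampersand disappear along with the entities, and the surrounding spaces are left behind. Where the characters themselves matter, decoding is likely the better fit.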
Alternative Pattern
To improve the accuracy of the replacement, consider using the following modified pattern, as suggested by Jacco:
$cleanContent = preg_replace("/&#?[a-z0-9]{2,8};/i","",$originalContent);
This pattern only matches entities whose name or number is 2 to 8 characters long, which reduces the risk of unintended replacements when the text contains an unencoded & followed by ordinary text and a semicolon.
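The difference between the two patterns can be sketched with a hypothetical string that contains a stray, unencoded & followed by a word and a semicolon. The input and variable names are invented for illustration.

```php
<?php
// Hypothetical text: "&Conditions;" is not an entity, "&nbsp;" is.
$originalContent = 'Terms&Conditions; apply&nbsp;here';

// Unbounded pattern: also removes the non-entity "&Conditions;".
$loose = preg_replace('/&#?[a-z0-9]+;/i', '', $originalContent);

// Bounded pattern: "Conditions" has 10 characters, so it is left alone.
$bounded = preg_replace('/&#?[a-z0-9]{2,8};/i', '', $originalContent);

echo $loose, PHP_EOL;   // Terms applyhere
echo $bounded, PHP_EOL; // Terms&Conditions; applyhere
```

The trade-off is that genuine entities with names longer than 8 characters, such as the HTML5 entity &rightarrow;, are also left in place by the bounded pattern.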