Removing Emojis from Strings in Python
The provided Python code for removing emojis fails because of syntax errors. On Python 2, Unicode strings must be written with the u'' prefix. In addition, pass the re.UNICODE flag when compiling the regular expression, and decode the input data to Unicode, for example with codecs:
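import codecs
import re
# Simulate UTF-8 encoded input and decode it to a Unicode string.
# The u'' prefix makes the \U escape work on Python 2 as well as Python 3.
text = codecs.decode(u'This dog \U0001f602'.encode('UTF-8'), 'UTF-8')
print(text)  # with emoji
# Character class built from adjacent string literals, one Unicode range each.
emoji_pattern = re.compile("["
    u"\U0001F600-\U0001F64F"  # emoticons
    u"\U0001F300-\U0001F5FF"  # symbols & pictographs
    u"\U0001F680-\U0001F6FF"  # transport & map symbols
    u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "]+", flags=re.UNICODE)
print(emoji_pattern.sub(r'', text))  # no emoji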
<h4>Output</h4>
<pre>This dog 😂
This dog
</pre>
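On Python 3 the same idea needs less ceremony, because str is already Unicode and re.UNICODE is the default for str patterns. The following is a minimal sketch under that assumption (Python 3 only); the variable name cleaned is introduced here for illustration:

```python
import re

# Python 3 only: plain string literals are Unicode, so no u'' prefix,
# no codecs round trip, and no explicit re.UNICODE flag are needed.
text = "This dog \U0001F602"

emoji_pattern = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # flags
    "]+"
)

# Replace every run of matching characters with the empty string.
cleaned = emoji_pattern.sub("", text)
print(repr(cleaned))  # repr makes the trailing space visible
```

The trailing space before the emoji is kept, since the pattern removes only the emoji characters themselves.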
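Note: This pattern covers only four Unicode ranges, so emojis outside them are left in place (for example, those in the Supplemental Symbols and Pictographs block, U+1F900 to U+1F9FF). For broader coverage, extend the character class with further ranges taken from the Unicode emoji charts.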
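As a rough sketch of what widening the pattern might look like, the version below adds three more ranges to the character class and wraps the substitution in a helper. It assumes Python 3. The name remove_emoji and the choice of extra ranges are illustrative assumptions, not a complete list of emoji code points:

```python
import re

# Illustrative selection of ranges; NOT an exhaustive emoji definition.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # flags
    "\U0001F900-\U0001F9FF"  # supplemental symbols & pictographs
    "\u2600-\u26FF"          # miscellaneous symbols
    "\u2700-\u27BF"          # dingbats
    "]+",
    flags=re.UNICODE,
)


def remove_emoji(text):
    """Return text with every character in the ranges above removed."""
    return EMOJI_PATTERN.sub("", text)


print(repr(remove_emoji("This dog \U0001F602")))       # emoticons block
print(repr(remove_emoji("Thinking \U0001F914 hard")))  # supplemental block
```

Even this wider class does not strip variation selectors or zero-width joiners, so sequences built from several code points may leave invisible characters behind.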