How to Selectively Remove Non-ASCII Characters Preserving Spaces and Periods?

Published on 2024-11-01

Selective Removal of Non-ASCII Characters

Cleaning text data often means stripping non-ASCII characters while keeping ordinary symbols such as spaces and periods. A filter that is too aggressive removes those symbols along with the unwanted characters.

Let's consider the following code:

def onlyascii(char):
    # Drop any character whose code point is below 48 ('0') or above 127.
    if ord(char) < 48 or ord(char) > 127:
        return ''
    return char

This code removes every character whose code point is less than 48 or greater than 127. That strips the non-ASCII characters, but it also drops everything below the digit '0', including spaces (ASCII 32) and periods (ASCII 46).
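To make the side effect concrete, here is a minimal sketch that applies the filter to a short sentence. It assumes the comparison described above (below 48 or above 127); the names sample and stripped are only illustrative.

```python
def onlyascii(char):
    # Assumed reading of the filter: drop code points below 48 or above 127.
    if ord(char) < 48 or ord(char) > 127:
        return ''
    return char

sample = "Hello, world. It is 9 a.m."
# Apply the filter character by character and join the survivors.
stripped = ''.join(onlyascii(c) for c in sample)
print(stripped)  # HelloworldItis9am
```

The comma, the spaces, and the periods all have code points below 48, so they vanish along with anything non-ASCII.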

To selectively remove non-ASCII characters while preserving spaces and periods, we can use the printable constant from Python's string module:

import string
printable = set(string.printable)
# data is the input string; in Python 3, filter() returns an iterator, so join the result
filtered_data = ''.join(filter(lambda x: x in printable, data))

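string.printable is a fixed string of the 100 printable ASCII characters: digits, letters, punctuation, and whitespace (space, tab, newline, carriage return, vertical tab, and form feed). It does not vary by system. Filtering against a set built from it removes every character outside that group, which covers non-ASCII characters as well as ASCII control characters such as \x00.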
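A quick way to check what string.printable holds is to compare it with the other constants in the string module. This is a small sketch; the variable names are illustrative only.

```python
import string

# string.printable is built from four other constants in the same module.
parts = string.digits + string.ascii_letters + string.punctuation + string.whitespace
same_as_parts = (string.printable == parts)
count = len(string.printable)

# Spaces and periods are members; control and non-ASCII characters are not.
has_space_and_period = (' ' in string.printable) and ('.' in string.printable)
has_nul = '\x00' in string.printable
has_accented = '\u00e9' in string.printable

print(same_as_parts, count)                      # True 100
print(has_space_and_period, has_nul, has_accented)  # True False False
```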

For example, if we have the string "some\x00string. with\x15 funny characters":

s = "some\x00string. with\x15 funny characters"
''.join(filter(lambda x: x in printable, s))

The result will be:

'somestring. with funny characters'

This method removes non-ASCII characters and ASCII control characters while preserving spaces and periods, leaving a clean string for further processing. Tabs and newlines are also kept, because they are part of string.printable.
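If the goal is narrower than "everything printable", an explicit whitelist is one option. The sketch below is not part of the original approach: the helper keep_allowed and the ALLOWED set are hypothetical names, and the choice to keep only ASCII letters, digits, spaces, and periods is an assumption made for illustration.

```python
import string

# Hypothetical whitelist: ASCII letters and digits, plus space and period.
ALLOWED = set(string.ascii_letters + string.digits + " .")

def keep_allowed(text, allowed=ALLOWED):
    # Keep only the characters that appear in the allowed set.
    return ''.join(ch for ch in text if ch in allowed)

cleaned = keep_allowed("na\u00efve caf\u00e9.\tOpen 24h!")
print(cleaned)  # nave caf.Open 24h
```

Compared with filtering on string.printable, this drops tabs, newlines, and other punctuation as well, so it only suits cases where those characters are unwanted.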
