Replacing Blank Values (Whitespace) with NaN in Pandas
Data cleaning is a crucial step in data analysis. One common task is replacing blank values (whitespace) with NaN. This can be done efficiently using Pandas.
To achieve this, use the df.replace() method with regex=True. The first argument is then treated as a regular expression that is matched against the string values in the DataFrame, and each matching value is replaced. Here's how you can implement it:
import numpy as np
import pandas as pd
df = pd.DataFrame([
    [-0.532681, 'foo', 0],
    [1.490752, 'bar', 1],
    [-1.387326, 'foo', 2],
    [0.814772, 'baz', ' '],
    [-0.222552, ' ', 4],
    [-1.176781, 'qux', ' '],
], columns='A B C'.split(), index=pd.date_range('2000-01-01', '2000-01-06'))
# Replace fields that contain only whitespace (or are empty) with NaN
print(df.replace(r'^\s*$', np.nan, regex=True))
# Output:
#                    A    B    C
# 2000-01-01 -0.532681  foo    0
# 2000-01-02  1.490752  bar    1
# 2000-01-03 -1.387326  foo    2
# 2000-01-04  0.814772  baz  NaN
# 2000-01-05 -0.222552  NaN    4
# 2000-01-06 -1.176781  qux  NaN
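Note that this code replaces only fields that are empty or consist entirely of whitespace, i.e., fields that match the regular expression r'^\s*$'. The ^ and $ anchors make the pattern cover the whole field, so valid data that merely contains whitespace, such as 'foo bar', is left unchanged. If empty strings should be kept and only whitespace-only fields replaced, use r'^\s+$' instead. Do not drop the trailing $: r'^\s*' matches at the start of every string, so every text field would be replaced with NaN.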
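To make the effect of the anchors concrete, here is a minimal sketch comparing the two patterns. The sample frame and the names sample, blank_or_empty, and blank_only are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample column: real text, an empty string, and blank fields
sample = pd.DataFrame({'text': ['foo bar', '', '   ', ' baz ', '\t']})

# ^\s*$ matches empty strings as well as whitespace-only strings
blank_or_empty = sample.replace(r'^\s*$', np.nan, regex=True)

# ^\s+$ requires at least one whitespace character, so '' is kept
blank_only = sample.replace(r'^\s+$', np.nan, regex=True)

print(blank_or_empty['text'].isna().tolist())
# [False, True, True, False, True]
print(blank_only['text'].isna().tolist())
# [False, False, True, False, True]
```

In both cases 'foo bar' and ' baz ' survive untouched, because the pattern has to cover the whole field before the field is replaced.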