Pandas: Extracting Numbers from Strings
When working with DataFrames in pandas, you often need to extract numeric values from cells that also contain non-numeric characters, such as "10a" or "100b". Pandas provides vectorized string methods, available through the .str accessor, that make this straightforward.
Using str.extract() for Number Extraction
One effective method for extracting numbers from strings is str.extract(). This method allows you to specify a regular expression pattern that defines the numeric data you want to capture.
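Before the full example, here is a small sketch of how the method behaves. The sample Series is made up for illustration. It shows that the default call (expand=True) returns a DataFrame with one column per capturing group, that expand=False with a single group returns a Series, and that a pattern with no capturing group is rejected with a ValueError.

```python
import pandas as pd

# Hypothetical sample values, for illustration only.
s = pd.Series(['1a', '10a', '100b'])

# Default expand=True: a DataFrame with one column per capturing group.
as_frame = s.str.extract(r'(\d+)')

# expand=False with a single group: a plain Series instead.
as_series = s.str.extract(r'(\d+)', expand=False)

# A pattern without any capturing group is rejected by extract().
try:
    s.str.extract(r'\d+')
    no_group_error = False
except ValueError:
    no_group_error = True

print(as_frame.shape, type(as_series).__name__, no_group_error)
```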
Consider the following data frame:
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': ['1a', np.nan, '10a', '100b', '0b']})
print(df)
Output:
      A
0    1a
1   NaN
2   10a
3  100b
4    0b
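To extract the number from each cell, pass a regular expression with a capturing group. Setting expand=False returns the result as a Series rather than a one-column DataFrame:
df.A.str.extract(r'(\d+)', expand=False)
The regex pattern (\d+) matches a sequence of one or more digits. The parentheses create a capturing group, and the text matched by that group is what extract() returns. If a cell contains several numbers, only the first match is returned.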
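Because extract() returns one value per capturing group, a pattern with several groups can split each cell into parts. The sketch below assumes every value is digits followed by lowercase letters, as in the sample data; the group names number and suffix are illustrative. Named groups, written (?P<name>...), become the column names of the resulting DataFrame.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['1a', np.nan, '10a', '100b', '0b']})

# Two named groups: the leading digits and the trailing letters.
# Each named group becomes a column of the resulting DataFrame;
# the NaN cell produces NaN in both columns.
parts = df['A'].str.extract(r'(?P<number>\d+)(?P<suffix>[a-z]+)')
print(parts)
```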
Output:
0      1
1    NaN
2     10
3    100
4      0
Name: A, dtype: object
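As you can see, the number has been extracted from every cell, and the missing value stays NaN. Note two limitations. First, the pattern matches only runs of digits, so it captures whole numbers; for a value such as "1.5kg" it would return "1". Second, the extracted values are still strings (dtype: object), not numbers, so they need to be converted before doing arithmetic.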
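One way to handle both limitations is sketched below, using hypothetical sample values that include decimals. The pattern adds an optional non-capturing group for a decimal part, and pd.to_numeric converts the extracted strings to floats. This sketch assumes non-negative numbers written with a "." decimal separator; signs, thousands separators, and scientific notation are not handled.

```python
import pandas as pd
import numpy as np

# Hypothetical sample values that include decimals and a cell with no number.
s = pd.Series(['1.5kg', np.nan, '10a', '0.25b', 'none'])

# Digits, optionally followed by a decimal point and more digits.
# (?:...) is non-capturing, so only the outer group is returned.
extracted = s.str.extract(r'(\d+(?:\.\d+)?)', expand=False)

# extract() returns strings; convert them to floats.
# Cells with no match are already NaN and stay NaN.
numbers = pd.to_numeric(extracted)
print(numbers)
```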