Splitting Comma-Separated Pandas Dataframe Strings into Separate Rows
A pandas DataFrame often has one or more columns that hold comma-separated strings, where each value needs to become its own row. Several approaches are available:
Using Series.explode() or DataFrame.explode():
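This method is available in pandas 0.25.0 and above and is designed for exploding list-like columns. It does not split strings itself, so a comma-separated column must first be converted to lists (for example with str.split(',')).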
df.explode('column_name')
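As a minimal sketch of the split-then-explode pattern, assuming a small made-up DataFrame with a hypothetical comma-separated column named tags:

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({"id": [1, 2], "tags": ["a,b,c", "d"]})

# Turn each string into a list, then emit one row per list element.
out = (
    df.assign(tags=df["tags"].str.split(","))
      .explode("tags")
      .reset_index(drop=True)  # explode repeats the original index labels
)
print(out)
```

Without the reset_index call, the exploded rows keep the index label of the row they came from, which can be useful for joining back to the original frame.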
Using a Vectorized Function:
When a DataFrame has several ordinary (scalar) columns alongside several list columns that must be exploded together, a vectorized helper function is a more versatile solution.
def explode(df, lst_cols, fill_value='', preserve_index=False): # ... (implementation details)
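The body of the function is elided above. The following is only one possible sketch that matches the same signature, not the original implementation. It assumes that the list columns hold lists of equal length within each row, that no cell is NaN, and that at least one list is non-empty:

```python
import numpy as np
import pandas as pd


def explode(df, lst_cols, fill_value="", preserve_index=False):
    # Accept a single column name as well as a list of names.
    if isinstance(lst_cols, str):
        lst_cols = [lst_cols]
    # Columns whose values are repeated rather than flattened.
    idx_cols = df.columns.difference(lst_cols)
    # List length per row, taken from the first list column (assumption:
    # the other list columns have matching lengths row by row).
    lens = df[lst_cols[0]].str.len()
    # Repeat each index label once per list element.
    idx = np.repeat(df.index.values, lens)
    res = pd.DataFrame(
        {col: np.repeat(df[col].values, lens) for col in idx_cols}, index=idx
    ).assign(
        **{col: np.concatenate(df.loc[lens > 0, col].values) for col in lst_cols}
    )
    # Rows with empty lists yield nothing above; add them back with fill_value.
    if (lens == 0).any():
        res = pd.concat([res, df.loc[lens == 0, idx_cols]], sort=False)
        res = res.fillna(fill_value)
    # A stable sort keeps the elements of each list in their original order.
    res = res.sort_index(kind="mergesort")
    if not preserve_index:
        res = res.reset_index(drop=True)
    # Restore the original column order.
    return res[df.columns.tolist()]


# Hypothetical usage with two list columns exploded in lockstep.
sample = pd.DataFrame({"id": [1, 2], "a": [["x", "y"], ["z"]], "b": [[1, 2], [3]]})
result = explode(sample, ["a", "b"])
print(result)
```

Passing preserve_index=True keeps the original index labels on the repeated rows instead of renumbering them.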
Converting CSV Strings to Lists:
If the goal is solely to convert CSV strings to lists, this can be achieved by splitting the strings using str.split().
df['var1'] = df['var1'].str.split(',')
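Real data often has spaces after the commas. A small sketch, using made-up values for var1 and a hypothetical second column var2, that splits and then strips whitespace from each piece:

```python
import pandas as pd

# Hypothetical sample data with a space after one of the commas.
df = pd.DataFrame({"var1": ["a, b", "c"], "var2": [1, 2]})

# Split on commas, then strip surrounding whitespace from every piece.
df["var1"] = df["var1"].str.split(",").apply(
    lambda parts: [p.strip() for p in parts]
)
print(df)
```

Each cell of var1 is now a Python list, which is the form that explode() and the vectorized approaches expect.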
Custom Vectorized Approach:
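This approach repeats every ordinary column once per list element and flattens a single list column. In the expression below, x is the DataFrame and lst_col is the name of the column that already contains lists; NumPy must be imported as np.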
exploded_df = pd.DataFrame({ col: np.repeat(x[col].values, x[lst_col].str.len()) for col in x.columns.difference([lst_col]) }).assign(**{lst_col: np.concatenate(x[lst_col].values)})[x.columns.tolist()]
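To make the one-liner above concrete, here is a runnable sketch on made-up data. The sample values and the column names var1 and var2 are assumptions for illustration; the expression itself is the one from the text:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
x = pd.DataFrame({"var1": ["a,b,c", "d,e"], "var2": [1, 2]})
lst_col = "var1"

# The list column must hold lists, so split the strings first.
x = x.assign(**{lst_col: x[lst_col].str.split(",")})

exploded_df = pd.DataFrame({
    # Repeat each scalar value once per element of that row's list.
    col: np.repeat(x[col].values, x[lst_col].str.len())
    for col in x.columns.difference([lst_col])
}).assign(
    # Flatten all lists into one long array.
    **{lst_col: np.concatenate(x[lst_col].values)}
)[x.columns.tolist()]  # restore the original column order
print(exploded_df)
```

The final column selection matters because columns.difference() returns the remaining column names in sorted order rather than in their original order.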
Legacy Solution:
An earlier method involves using .set_index(), .str.split(), .stack(), and .reset_index() to split the CSV strings and stack them into individual rows.
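The chain of calls described above might look like the following sketch. The sample data and the column names var1 and var2 are assumptions, and the explicit dropna() is added so the result does not depend on whether stack() drops missing values in the installed pandas version:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"var1": ["a,b,c", "d,e"], "var2": [1, 2]})

out = (
    df.set_index("var2")["var1"]       # keep var2 as the index
      .str.split(",", expand=True)     # one column per piece, padded with missing values
      .stack()                         # move the pieces into rows
      .dropna()                        # discard the padding
      .reset_index(level=1, drop=True) # drop the piece-position level
      .rename("var1")
      .reset_index()                   # turn var2 back into a column
)
print(out)
```

With more than one column to keep, all of them can be passed to set_index() as a list.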
Which approach fits best depends on the pandas version in use, on whether one or several list columns must be exploded, and on how much performance matters.