Explode (Split) Pandas DataFrame String Entries into Separate Rows
In pandas, a common task is to split a column of comma-separated strings so that each value gets its own row. Two approaches are shown here: the built-in explode() methods and a generic helper function.
Using Series.explode() or DataFrame.explode()
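From pandas 0.25.0 onward, Series.explode() and DataFrame.explode() turn each element of a list-like column into its own row. They do not parse strings, so a comma-separated column must first be converted to lists, for example with str.split(','):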
For single columns:
df.explode('column_name')
For multiple columns (the lists in each row must have the same length):
df.explode(['column1', 'column2'])  # requires pandas 1.3.0 or later
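As a minimal sketch of the two calls above, the example below splits the strings first and then explodes them. The column names and sample values are made up for illustration, and ignore_index (available from pandas 1.1.0) is used only to renumber the resulting rows.

```python
import pandas as pd

# Made-up sample data: each row holds comma-separated strings
df = pd.DataFrame({
    "id": [1, 2],
    "tags": ["a,b", "c"],
    "scores": ["10,20", "30"],
})

# explode() needs list-like values, so split the strings first
lists = df.assign(
    tags=df["tags"].str.split(","),
    scores=df["scores"].str.split(","),
)

# Single column: one row per tag; the scores lists are left untouched
single = lists.explode("tags", ignore_index=True)

# Multiple columns (pandas 1.3.0+): lists in a row must have equal lengths
multi = lists.explode(["tags", "scores"], ignore_index=True)

print(single)
print(multi)
```

Without ignore_index=True, the exploded rows keep the index label of the row they came from, which can be handy for joining back to the original data.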
Generic Vectorized Function
A more versatile approach based on NumPy, which handles one or several list columns alongside ordinary columns, is provided below:
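import numpy as np
import pandas as pd

def explode(df, lst_cols, fill_value='', preserve_index=False):
    # Accept a single column name as well as a list of names
    if isinstance(lst_cols, str):
        lst_cols = [lst_cols]
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()
    # Convert CSV strings to lists; values that are already lists are kept
    for col in lst_cols:
        df[col] = df[col].apply(lambda v: v.split(',') if isinstance(v, str) else v)
    # All columns that are not being exploded
    idx_cols = df.columns.difference(lst_cols)
    # List length per row (every column in lst_cols must agree within a row)
    lens = df[lst_cols[0]].str.len()
    # Repeat the other columns and flatten the list columns
    result = (pd.DataFrame({col: np.repeat(df[col].values, lens) for col in idx_cols},
                           index=np.repeat(df.index.values, lens))
              .assign(**{col: np.concatenate(df.loc[lens > 0, col].values)
                         for col in lst_cols}))
    # Re-add rows whose lists are empty (DataFrame.append was removed in pandas 2.0)
    if (lens == 0).any():
        result = pd.concat([result, df.loc[lens == 0, idx_cols]], sort=False).fillna(fill_value)
    # Restore the original row order with a stable sort, then reset the index if requested
    result = result.sort_index(kind='mergesort')
    if not preserve_index:
        result = result.reset_index(drop=True)
    return result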
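The core of the function is the pairing of np.repeat and np.concatenate: ordinary values are repeated once per list item, and the lists themselves are flattened in row order. The following is a minimal sketch of just that step on made-up data, not a replacement for the full function.

```python
import numpy as np
import pandas as pd

# Made-up sample data: one ordinary column and one list column
df = pd.DataFrame({"key": ["x", "y"], "vals": [["a", "b", "c"], ["d"]]})

# Number of items in each list
lens = df["vals"].str.len()

# Repeat each ordinary value once per list item
keys = np.repeat(df["key"].values, lens)

# Flatten all lists into one array, in row order
vals = np.concatenate(df["vals"].values)

flat = pd.DataFrame({"key": keys, "vals": vals})
print(flat)
```

Because both arrays are built in the same row order, they line up element by element when combined into the new DataFrame.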
Applications
CSV Column:
explode(df, ['var1'])  # var1 holds comma-separated strings; the function splits them
Multiple List Columns:
explode(df, ['num', 'text'], fill_value='')
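The call above expects num and text to hold the same number of items in each row. As a rough cross-check, assuming pandas 1.3.0 or later, a comparable result can be sketched with the built-in method. The sample data are made up for illustration. Note that items produced by splitting are strings, so a numeric column needs an explicit conversion afterwards.

```python
import pandas as pd

# Made-up sample data with two parallel CSV columns
df = pd.DataFrame({
    "id": [1, 2],
    "num": ["1,2", "3"],
    "text": ["aa,bb", "cc"],
})

# Built-in route (pandas 1.3.0+): split both columns, then explode together
out = (
    df.assign(num=df["num"].str.split(","), text=df["text"].str.split(","))
      .explode(["num", "text"], ignore_index=True)
)

# Split items are strings; convert the numeric column explicitly
out["num"] = pd.to_numeric(out["num"])

print(out)
```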