How to Explode (Split) Pandas DataFrame String Entries into Separate Rows?

Published on 2024-12-25

In pandas, a common task is to split the comma-separated values in a string column and create a new row for each entry. Two approaches are covered here: the built-in explode() methods and a custom vectorized function.

Using Series.explode() or DataFrame.explode()

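For pandas 0.25.0 and above, Series.explode() and DataFrame.explode() turn each element of a list-like cell into its own row. They operate on lists rather than on raw strings, so a comma-separated column must first be split into lists with str.split(',').

For a single column:

df.assign(column_name=df['column_name'].str.split(',')).explode('column_name')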
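As a minimal sketch of the single-column case, the example below assumes a hypothetical DataFrame with an `id` column and a comma-separated `tags` column (both names are illustrative and not part of the original). The strings are split into lists before calling explode().

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
df = pd.DataFrame({"id": [1, 2], "tags": ["a,b,c", "d"]})

# explode() expects list-like cells, so split the strings into lists first.
exploded = (
    df.assign(tags=df["tags"].str.split(","))
      .explode("tags")
      .reset_index(drop=True)  # explode() repeats the original index labels
)
print(exploded)
```

Resetting the index is optional. Without it, every row produced from the same source row keeps that row's original index label, which can be useful for tracing entries back to their origin.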

For multiple columns (the lists in each row must have the same length across all exploded columns):

df.explode(['column1', 'column2'])  # requires pandas >= 1.3.0
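The multi-column form could look like the following sketch. It assumes hypothetical `num` and `text` columns whose comma-separated entries line up one-to-one within each row; pandas raises a ValueError if the element counts differ.

```python
import pandas as pd

# Hypothetical sample data; "num" and "text" have matching entry counts per row.
df = pd.DataFrame({
    "id": [1, 2],
    "num": ["1,2", "3"],
    "text": ["x,y", "z"],
})

# Split both string columns into lists before exploding.
split = df.assign(
    num=df["num"].str.split(","),
    text=df["text"].str.split(","),
)

# Passing a list of columns requires pandas >= 1.3.0.
# ignore_index=True renumbers the result rows 0..n-1.
exploded = split.explode(["num", "text"], ignore_index=True)
print(exploded)
```

Note that the values in `num` remain strings after splitting; a follow-up `astype(int)` would be needed if numeric values are required.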

Generic Vectorized Function

The function below is a more general vectorized approach built on NumPy. It accepts columns that hold either comma-separated strings or lists:

import numpy as np
import pandas as pd

def explode(df, lst_cols, fill_value='', preserve_index=False):
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()

    # Convert CSV string entries to lists; leave existing lists untouched
    for col in lst_cols:
        df[col] = df[col].apply(lambda v: v.split(',') if isinstance(v, str) else v)

    # Extract all non-list columns
    idx_cols = df.columns.difference(lst_cols)

    # Calculate list lengths (all list columns must have matching lengths per row)
    lens = df[lst_cols[0]].str.len()

    # Create exploded DataFrame: repeat scalar values, flatten the lists
    result = (pd.DataFrame({
        col: np.repeat(df[col].values, lens)
        for col in idx_cols
    }, index=np.repeat(df.index.values, lens))
        .assign(**{col: np.concatenate(df.loc[lens>0, col].values)
                    for col in lst_cols}))

    # Handle empty list rows (DataFrame.append was removed in pandas 2.0)
    if (lens == 0).any():
        result = pd.concat([result, df.loc[lens==0, idx_cols]], sort=False).fillna(fill_value)

    # Restore the original row order and reset index if requested
    result = result.sort_index()
    if not preserve_index:
        result = result.reset_index(drop=True)

    return result
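The core idea of the function can be isolated in a few lines. The sketch below assumes a hypothetical DataFrame with a scalar `id` column and a list column `tags` (illustrative names only): each scalar is repeated once per list element with np.repeat, and the lists are flattened end to end with np.concatenate.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one scalar column and one list column.
df = pd.DataFrame({"id": [1, 2], "tags": [["a", "b"], ["c"]]})

# Number of list elements in each row.
lens = df["tags"].str.len().to_numpy()

out = pd.DataFrame({
    # Repeat each scalar value once per element of its row's list.
    "id": np.repeat(df["id"].to_numpy(), lens),
    # Flatten all lists into a single one-dimensional array.
    "tags": np.concatenate(df["tags"].to_numpy()),
})
print(out)
```

Because both steps run on whole arrays, no Python-level loop over rows is needed. This sketch omits the empty-list handling that the full function performs.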

Applications

CSV Column:

explode(df, ['var1'])

Multiple List Columns:

explode(df, ['num', 'text'], fill_value='')