"If a worker wants to do his job well, he must first sharpen his tools." - Confucius, "The Analects of Confucius. Lu Linggong"
Front page > Programming > How to Efficiently Process Large DataFrames in Pandas: Chunk It Up!

How to Efficiently Process Large DataFrames in Pandas: Chunk It Up!

Published on 2024-11-08


Pandas - Slicing Large Dataframes into Chunks

When processing very large dataframes, a common obstacle is the dreaded MemoryError. One effective solution is to split the dataframe into smaller, manageable chunks. This reduces peak memory consumption and lets each piece be processed independently.

To achieve this, we can use either a list comprehension or NumPy's array_split function.

List Comprehension

n = 200000  # Chunk row size
list_df = [df[i:i + n] for i in range(0, df.shape[0], n)]
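As a minimal sketch of the slicing approach, assume a hypothetical 10-row frame with a single column `x` and a chunk size of 3; the final chunk simply holds whatever rows are left over:

```python
import pandas as pd

# Hypothetical example: a 10-row frame split into chunks of 3 rows.
df = pd.DataFrame({"x": range(10)})
n = 3
list_df = [df[i:i + n] for i in range(0, df.shape[0], n)]

print(len(list_df))      # 4 chunks: sizes 3, 3, 3, 1
print(len(list_df[-1]))  # the last chunk holds the single leftover row
```

Because the slices are half-open, no row is duplicated or dropped, and the original index is preserved in each chunk.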

NumPy array_split

import math
import numpy as np

list_df = np.array_split(df, math.ceil(len(df) / n))
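Using the same hypothetical 10-row frame, array_split behaves slightly differently from manual slicing: rather than leaving one small remainder chunk, it balances the row counts across chunks. (Note that recent pandas versions may emit a deprecation warning when np.array_split is applied directly to a DataFrame.)

```python
import math

import numpy as np
import pandas as pd

# Hypothetical example: 10 rows, target chunk size 3 -> 4 chunks.
df = pd.DataFrame({"x": range(10)})
n = 3
list_df = np.array_split(df, math.ceil(len(df) / n))

# array_split balances the pieces: sizes 3, 3, 2, 2 instead of 3, 3, 3, 1.
print([len(c) for c in list_df])
```

Pick the slicing approach when a fixed chunk size matters (e.g. batch API limits), and array_split when evenly sized chunks matter (e.g. distributing work across workers).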

Individual chunks can then be retrieved using:

list_df[0]
list_df[1]
...

To reassemble the chunks into a single dataframe, employ pd.concat:

# Example: Concatenating by chunks
rejoined_df = pd.concat(list_df)
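The round trip can be verified on a hypothetical small frame: because each chunk keeps its slice of the original index, concatenating the chunks in order reproduces the original dataframe exactly.

```python
import pandas as pd

# Hypothetical round trip: split into chunks of 3, then reassemble.
df = pd.DataFrame({"x": range(10)})
n = 3
list_df = [df[i:i + n] for i in range(0, len(df), n)]

rejoined_df = pd.concat(list_df)
print(rejoined_df.equals(df))  # True: same rows, same index, same order
```

If the chunks had been reindexed during processing, pass ignore_index=True to pd.concat to rebuild a clean 0..N-1 index instead.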

Slicing by AcctName

To split the dataframe by AcctName values, utilize the groupby method:

list_df = []

for name, group in df.groupby('AcctName'):
    list_df.append(group)  # one sub-frame per unique AcctName
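As a minimal sketch with a hypothetical AcctName column, each resulting sub-frame contains only the rows for one account:

```python
import pandas as pd

# Hypothetical frame with three distinct accounts.
df = pd.DataFrame({
    "AcctName": ["A", "B", "A", "C", "B"],
    "amount":   [10, 20, 30, 40, 50],
})

# Equivalent one-liner: collect one sub-frame per unique AcctName.
list_df = [group for _, group in df.groupby("AcctName")]

print(len(list_df))  # 3 sub-frames, one per account
```

Unlike the fixed-size approaches above, the chunks here vary in size with the data, but every row for a given account is guaranteed to land in the same chunk.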
