When processing very large dataframes, a common obstacle is the dreaded MemoryError. One effective solution is to split the dataframe into smaller, manageable chunks: this lowers peak memory consumption and lets each piece be processed independently.
To achieve this, we can use either a list comprehension or NumPy's array_split function:
import math
import numpy as np
import pandas as pd

n = 200000  # chunk row size
list_df = [df.iloc[i:i + n] for i in range(0, df.shape[0], n)]  # option 1: list comprehension
list_df = np.array_split(df, math.ceil(len(df) / n))            # option 2: NumPy array_split
Individual chunks can then be retrieved using:
list_df[0]
list_df[1]
...
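Since each chunk is itself an ordinary dataframe, the chunks can also be processed one at a time rather than all at once. A minimal sketch, where process_chunk is a hypothetical placeholder for your own per-chunk logic:

def process_chunk(chunk):
    # Placeholder transformation (assumption); replace with real logic
    return chunk.dropna()

# Processing chunk by chunk keeps peak memory low
processed = [process_chunk(chunk) for chunk in list_df]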
To reassemble the chunks into a single dataframe, employ pd.concat:
# Example: Concatenating by chunks
rejoined_df = pd.concat(list_df)
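Note that pd.concat preserves each chunk's original row index, so a dataframe split by row count and rejoined in order matches the original. To renumber the rows consecutively instead, pass ignore_index=True:

rejoined_df = pd.concat(list_df, ignore_index=True)  # fresh 0..N-1 index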
To split the dataframe by AcctName values rather than by row count, use the groupby method:
list_df = []
for name, group in df.groupby('AcctName'):
    list_df.append(group)
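If the goal is to look up a single account's rows directly, the same groupby result can be collected into a dictionary keyed by AcctName instead of a list; a small sketch (the key 'ACME' below is a hypothetical account name):

dict_df = {name: group for name, group in df.groupby('AcctName')}
acct_rows = dict_df['ACME']  # sub-dataframe for one account (hypothetical key)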