Performance Comparison of Pandas apply vs np.vectorize
np.vectorize() can be significantly faster than df.apply() when creating a new column from existing columns in a Pandas DataFrame. The difference comes from how each method iterates over the data and from the objects it creates along the way.
df.apply() vs Python-Level Loops
When called with axis=1, df.apply() is essentially a Python-level loop over the rows of the DataFrame, with the added cost of building a Series object for each row. In benchmarks, Python-level loops such as list comprehensions and map are all relatively slow compared to true vectorised calculations.
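As a minimal sketch of the two row-by-row approaches, the example below computes the same column with df.apply() and with a list comprehension. The divide function, the sample data and the output column names are assumptions made for illustration; only the column names A and B come from the text.

```python
import pandas as pd


def divide(a, b):
    """Hypothetical row-wise function: a / b, or 0.0 when b is zero."""
    return 0.0 if b == 0 else float(a) / b


# Illustrative data, not taken from any benchmark.
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [2, 0, 4, 8]})

# With axis=1, apply passes each row to the lambda as a Series.
df["via_apply"] = df.apply(lambda row: divide(row["A"], row["B"]), axis=1)

# A list comprehension is also a Python-level loop, but it iterates over
# plain scalar values and skips the per-row Series.
df["via_listcomp"] = [divide(a, b) for a, b in zip(df["A"], df["B"])]
```

Both columns hold the same values. Any timing difference between them comes from per-row overhead, not from the arithmetic itself.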
np.vectorize() vs df.apply()
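np.vectorize() wraps a user-defined function so that it accepts NumPy arrays and follows broadcasting rules, but it does not produce a true universal function (ufunc). The NumPy documentation states that it is provided primarily for convenience, not for performance, and that the implementation is essentially a for loop, so the Python function is still called once per element. It is faster than df.apply() mainly because it works directly on NumPy arrays, whereas df.apply() with axis=1 passes a Pandas Series object for each row and incurs the overhead of creating and indexing those objects.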
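The following sketch shows one way np.vectorize() might be used to build a column. The divide function and the sample data are hypothetical. Passing otypes is optional; it is used here so that the output dtype is fixed rather than inferred from the first result.

```python
import numpy as np
import pandas as pd


def divide(a, b):
    """Hypothetical scalar function: a / b, or 0.0 when b is zero."""
    return 0.0 if b == 0 else float(a) / b


# Illustrative data, not taken from any benchmark.
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [2, 0, 4, 8]})

# Wrap the scalar function; it is still called once per element in Python.
vec_divide = np.vectorize(divide, otypes=[float])

# Pass NumPy arrays rather than Series so no Pandas objects are involved.
df["C"] = vec_divide(df["A"].to_numpy(), df["B"].to_numpy())

# The wrapper is an np.vectorize instance, not a compiled np.ufunc.
is_true_ufunc = isinstance(vec_divide, np.ufunc)
```

Here is_true_ufunc is False, which reflects that the wrapper only loops over the inputs and does not compile the function.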
True Vectorisation: Optimal Performance
For truly efficient column creation, use vectorised calculations that operate on whole arrays at once. Operations like numpy.where and direct element-wise division with df["A"] / df["B"] run in compiled code and avoid the per-element overhead of Python-level loops.
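A vectorised version of a guarded division could look like the sketch below. The sample data and the rule of returning 0.0 when B is zero are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from any benchmark.
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [2, 0, 4, 8]})

# np.where evaluates both branches over the whole column. The division by
# zero yields inf in the intermediate result, which the condition discards.
df["C"] = np.where(df["B"] == 0, 0.0, df["A"] / df["B"])
```

No Python function is called per row here; the comparison, the division and the selection each run once over the entire column.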
Numba Optimisation
When the logic is easier to express as an explicit loop, it can be optimised further with Numba, a just-in-time compiler that translates Python functions operating on NumPy arrays into optimised machine code. Once compiled, a Numba loop can reduce execution time to the order of microseconds, significantly outperforming both df.apply() and np.vectorize().
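A Numba-compiled loop might be sketched as follows. The function name divide_loop and the sample data are hypothetical, and the try/except fallback is an assumption added only so the example still runs, uncompiled, where Numba is not installed.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:
    # Assumption: without Numba, fall back to the plain Python function.
    def njit(func):
        return func


@njit
def divide_loop(a, b):
    """Explicit loop over two 1-D arrays: a / b, or 0.0 when b is zero."""
    out = np.empty(a.shape[0], dtype=np.float64)
    for i in range(a.shape[0]):
        if b[i] == 0:
            out[i] = 0.0
        else:
            out[i] = a[i] / b[i]
    return out


# Illustrative data, not taken from any benchmark.
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [2, 0, 4, 8]})

# Numba works on NumPy arrays, so extract them from the DataFrame first.
df["C"] = divide_loop(df["A"].to_numpy(), df["B"].to_numpy())
```

The first call includes compilation time, so any timing comparison should measure a later call.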
Conclusion
While np.vectorize() may offer some improvement over df.apply(), it is still a Python-level loop and not a substitute for vectorised calculations in NumPy. For maximum performance when creating new columns in Pandas DataFrames, use direct vectorised operations in NumPy or, where a loop is unavoidable, compile it with Numba.