Adding a New Column to an Existing DataFrame
When working with pandas DataFrames, you often need to add a new column to an existing DataFrame. There are several ways to do this, each with its own trade-offs.
1. Using assign (available since pandas 0.16.0; returns a new DataFrame):
import pandas as pd
import numpy as np

# Generate a sample DataFrame
df1 = pd.DataFrame({
    'a': [0.671399, 0.446172, 0.614758],
    'b': [0.101208, -0.243316, 0.075793],
    'c': [-0.181532, 0.051767, -0.451460],
    'd': [0.241273, 1.577318, -0.012493]
})

# Add a new column 'e' with random values
sLength = len(df1['a'])
df1 = df1.assign(e=pd.Series(np.random.randn(sLength)).values)
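To make the behaviour of assign more concrete, here is a minimal sketch. The frame df and the column names ratio and total are illustrative assumptions, not part of the example above. It shows that assign returns a new DataFrame and leaves the original untouched, and that it accepts several columns at once, either as values or as callables evaluated against the frame.

```python
import pandas as pd

# Illustrative frame; the names used here are assumptions for this sketch.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

# assign returns a new DataFrame; df itself is not modified.
out = df.assign(
    ratio=df["b"] / df["a"],          # a Series is aligned on the index
    total=lambda d: d["a"] + d["b"],  # a callable receives the DataFrame
)

print(list(df.columns))   # the original still has only 'a' and 'b'
print(list(out.columns))  # the result also carries 'ratio' and 'total'
```

Because the original frame is left unchanged, the result has to be bound to a name (as in df1 = df1.assign(...)) for the new column to persist.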
2. Using loc[row_index, col_indexer] = value:
# Add a new column 'f' using loc
df1.loc[:, 'f'] = pd.Series(np.random.randn(sLength), index=df1.index)
3. Using df[new_column_name] = pd.Series(values, index=df.index):
# Add a new column 'g' using the old method
df1['g'] = pd.Series(np.random.randn(sLength), index=df1.index)
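The examples pass index=df1.index (or use .values) because pandas aligns an assigned Series on index labels rather than on position. The following sketch uses a hypothetical frame with a non-default index to illustrate what can happen when the labels do not match.

```python
import numpy as np
import pandas as pd

# Illustrative frame with a non-default index (an assumption for this sketch).
df = pd.DataFrame({"a": [1, 2, 3]}, index=[10, 20, 30])
values = np.array([0.1, 0.2, 0.3])

# A Series built without an index gets labels 0, 1, 2. These do not match
# df's labels 10, 20, 30, so every row of the new column becomes NaN.
df["misaligned"] = pd.Series(values)

# Passing index=df.index, or assigning the raw array, keeps positional order.
df["aligned"] = pd.Series(values, index=df.index)
df["from_array"] = values

print(df)
```

If the values are already in row order, assigning the plain array is usually the simplest choice; wrapping them in a Series only matters when label alignment is wanted.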
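Remember that direct assignment (method 3) may trigger a SettingWithCopyWarning when df1 was itself obtained by slicing or filtering another DataFrame; it does not occur for a freshly constructed frame like the sample here. The same warning can appear with loc in that situation. assign sidesteps the problem because it returns a new DataFrame instead of modifying the existing one, at the cost of a copy. Using assign or loc is generally recommended for clarity.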
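The situation behind SettingWithCopyWarning can be sketched as follows. The names source, subset, and subset2 are hypothetical, and whether a warning is actually emitted without the explicit copy depends on the pandas version, so this example only shows two patterns that are commonly used to avoid the ambiguity.

```python
import pandas as pd

# Illustrative data; `source`, `subset`, and `subset2` are hypothetical names.
source = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})

# Filtering yields a frame derived from `source`. Adding a column to it
# directly is the case the warning is about, so take an explicit copy first.
subset = source[source["a"] > 2].copy()
subset["c"] = subset["a"] * subset["b"]

# Alternatively, assign builds a new frame from the filtered result.
subset2 = source[source["a"] > 2].assign(c=lambda d: d["a"] * d["b"])

print(subset)
print(list(source.columns))  # `source` keeps only its original columns
```

Either way, the original frame is left unchanged and the new column lives only on the derived frame.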