A new column in pandas whose value depends on other columns

This article looks at how to add a new column to a pandas DataFrame whose value depends on other columns.

Solution 1

To improve on the other answer, I would use pandas apply to iterate over the rows and calculate the new column.

def calc_new_col(row):
    # Use `and` for scalar comparisons; `&` binds more tightly than `<=`
    if row['col2'] <= 50 and row['col3'] <= 50:
        return row['col1']
    else:
        return max(row['col1'], row['col2'], row['col3'])

df["state"] = df.apply(calc_new_col, axis=1)
# axis=1 makes sure that function is applied to each row

print(df)
            datetime  col1  col2  col3  state
2021-04-10  01:00:00    25    50    50     25
2021-04-10  02:00:00    25    50    50     25
2021-04-10  03:00:00    25   100    50    100
2021-04-10  04:00:00    50    50   100    100
2021-04-10  05:00:00   100   100   100    100

Using apply keeps the code cleaner and more reusable.

Original author: xssChauhan
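
The reusability point can be illustrated by turning the hard-coded limit into a parameter. The following is a minimal sketch, not part of the original answer: the name calc_state and its threshold argument are hypothetical, and the sample values simply mirror the table above. It relies on DataFrame.apply forwarding extra keyword arguments to the function.

```python
import pandas as pd

# Sample data mirroring the table above (the datetime column is omitted for brevity).
df = pd.DataFrame({
    "col1": [25, 25, 25, 50, 100],
    "col2": [50, 50, 100, 50, 100],
    "col3": [50, 50, 50, 100, 100],
})


def calc_state(row, threshold=50):
    """Return col1 when col2 and col3 are both <= threshold, else the row maximum.

    `threshold` is a hypothetical parameter added for illustration.
    """
    if row["col2"] <= threshold and row["col3"] <= threshold:
        return row["col1"]
    return max(row["col1"], row["col2"], row["col3"])


# Extra keyword arguments given to apply are passed through to calc_state.
df["state"] = df.apply(calc_state, axis=1, threshold=50)
df["state_strict"] = df.apply(calc_state, axis=1, threshold=40)

print(df)
```

With a threshold of 40 no row satisfies the condition, so state_strict falls back to the row maximum everywhere.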

Solution 2

Create a boolean mask and assign with loc. Unlike the other solutions, this one stores the row mean rather than the maximum when the condition is not met:

# Create a mask for the basic condition
mask1 = (df['col2'] <= 50) & (df['col3'] <= 50)

# Use loc to select rows where the condition is met and store the col1 value in state
df.loc[mask1, 'state'] = df['col1']

# ~ inverts the mask, selecting rows where the condition is not met; store the mean in state
df.loc[~mask1, 'state'] = (df['col1'] + df['col2'] + df['col3']) / 3

Original author: Sid
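
The two loc assignments can also be written as a single expression. This is a sketch of an alternative, not the answer's own method: it uses Series.where, which keeps a value where the mask is True and takes the replacement elsewhere. The sample data and the name row_mean are assumptions made for illustration.

```python
import pandas as pd

# Sample data assumed for illustration; floats avoid any dtype change on assignment.
df = pd.DataFrame({
    "col1": [25.0, 25.0, 25.0, 50.0, 100.0],
    "col2": [50.0, 50.0, 100.0, 50.0, 100.0],
    "col3": [50.0, 50.0, 50.0, 100.0, 100.0],
})

# Boolean mask: True where both col2 and col3 are at most 50.
mask = (df["col2"] <= 50) & (df["col3"] <= 50)

# Row-wise mean of the three value columns (hypothetical helper name).
row_mean = df[["col1", "col2", "col3"]].mean(axis=1)

# Keep col1 where the mask holds, otherwise use the row mean.
df["state"] = df["col1"].where(mask, row_mean)

print(df)
```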

Solution 3

You can iterate through the DataFrame's rows and check the condition:

values = []

for ind, row in df.iterrows():
    if row['col2'] <= 50 and row['col3'] <= 50:
        values.append(row['col1'])
    else:
        values.append(max(row['col1'], row['col2'], row['col3']))

df['state'] = values

print(df)
            datetime  col1  col2  col3  state
2021-04-10  01:00:00    25    50    50     25
2021-04-10  02:00:00    25    50    50     25
2021-04-10  03:00:00    25   100    50    100
2021-04-10  04:00:00    50    50   100    100
2021-04-10  05:00:00   100   100   100    100

Original author: imdevskp
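
Because iterrows builds a Series for every row, explicit loops of this kind tend to be slow on large frames. One possible variation, shown here as a sketch under assumed sample data rather than as the answer's method, is itertuples, which yields lightweight named tuples whose fields match the column names.

```python
import pandas as pd

# Sample data assumed for illustration.
df = pd.DataFrame({
    "col1": [25, 25, 25, 50, 100],
    "col2": [50, 50, 100, 50, 100],
    "col3": [50, 50, 50, 100, 100],
})

# Each row is a named tuple, so columns are read as attributes.
values = [
    row.col1
    if row.col2 <= 50 and row.col3 <= 50
    else max(row.col1, row.col2, row.col3)
    for row in df.itertuples(index=False)
]

df["state"] = values

print(df)
```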

Solution 4

An option using np.where:

import numpy as np
import pandas as pd

df = pd.DataFrame({'datetime': {0: '2021-04-10 01:00:00', 1: '2021-04-10 02:00:00',
                                2: '2021-04-10 03:00:00', 3: '2021-04-10 04:00:00',
                                4: '2021-04-10 05:00:00'},
                   'col1': {0: 25.0, 1: 25.0, 2: 25.0, 3: 50.0, 4: 100.0},
                   'col2': {0: 50.0, 1: 50.0, 2: 100.0, 3: 50.0, 4: 100.0},
                   'col3': {0: 50, 1: 50, 2: 50, 3: 100, 4: 100}})

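# Take the row-wise maximum over the numeric columns only;
# the datetime strings cannot be compared with numbers
df['state'] = np.where((df['col2'] <= 50) & (df['col3'] <= 50),
                       df['col1'],
                       df[['col1', 'col2', 'col3']].max(axis=1))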

print(df)

Output:

           datetime  col1  col2  col3  state
2021-04-10 01:00:00  25.0  50.0    50   25.0
2021-04-10 02:00:00  25.0  50.0    50   25.0
2021-04-10 03:00:00  25.0 100.0    50  100.0
2021-04-10 04:00:00  50.0  50.0   100  100.0
2021-04-10 05:00:00 100.0 100.0   100  100.0

Original author: Henry Ecker
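
When moving from a row-wise function to a vectorized expression, it can help to confirm that both produce the same column. The following is a small sanity-check sketch; the sample data and the names calc_state, row_wise, vectorized and same are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample data assumed for illustration.
df = pd.DataFrame({
    "col1": [25.0, 25.0, 25.0, 50.0, 100.0],
    "col2": [50.0, 50.0, 100.0, 50.0, 100.0],
    "col3": [50.0, 50.0, 50.0, 100.0, 100.0],
})
cols = ["col1", "col2", "col3"]


def calc_state(row):
    """Row-wise rule: col1 if col2 and col3 are both <= 50, else the row maximum."""
    if row["col2"] <= 50 and row["col3"] <= 50:
        return row["col1"]
    return max(row[cols])


# Row-wise result via apply.
row_wise = df.apply(calc_state, axis=1)

# Vectorized result via np.where, wrapped in a Series with the same index.
vectorized = pd.Series(
    np.where((df["col2"] <= 50) & (df["col3"] <= 50),
             df["col1"],
             df[cols].max(axis=1)),
    index=df.index,
)

# True when every element matches.
same = bool((row_wise == vectorized).all())
print(same)
```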

Conclusion

The four approaches above (apply, boolean masks with loc, iterrows, and np.where) all compute a new column from the values of other columns.

ittutorial team

I am an information technology engineer. I have completed my MCA and have more than four years of experience as a web developer, with knowledge of multiple back-end platforms such as PHP, Node.js, and Python, and front-end JavaScript frameworks such as Angular, React, and Vue.