# Adding New Columns to DataFrames: Pandas vs. Polars

In this tutorial, we'll compare how to add new columns to a DataFrame using two popular Python libraries, **Pandas** and **Polars**. DataFrames are a fundamental tool in data analysis, and adding new columns is a step you'll repeat often when cleaning or preparing data.


## Importing Libraries and Creating Sample Data

Let's start by creating some sample data. Here's the code for generating a small mock dataset in both Pandas and Polars:

### Setting Up Data

```python
# For Pandas
import pandas as pd

# Create a simple DataFrame using Pandas
data_pandas = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "salary": [50000, 60000, 70000]
})
print("Pandas DataFrame:")
print(data_pandas)

# For Polars
import polars as pl

# Create a simple DataFrame using Polars
data_polars = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "salary": [50000, 60000, 70000]
})
print("\nPolars DataFrame:")
print(data_polars)
```

```text
Pandas DataFrame:
      name  age  salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000

Polars DataFrame:
shape: (3, 3)
┌─────────┬─────┬────────┐
│ name    ┆ age ┆ salary │
│ ---     ┆ --- ┆ ---    │
│ str     ┆ i64 ┆ i64    │
╞═════════╪═════╪════════╡
│ Alice   ┆ 25  ┆ 50000  │
│ Bob     ┆ 30  ┆ 60000  │
│ Charlie ┆ 35  ┆ 70000  │
└─────────┴─────┴────────┘
```

## 1. Adding a New Column with a Static Value

Sometimes you may want to add a new column that has the same value for all rows, such as a default category or a flag. Let's do this in both Pandas and Polars.

### Code Example: Adding Static Columns

#### Using Pandas
```python
# Add a static value to a new column in Pandas
data_pandas["department"] = "Engineering"
print("Pandas DataFrame with static column:")
print(data_pandas)
```

```text
Pandas DataFrame with static column:
      name  age  salary   department
0    Alice   25   50000  Engineering
1      Bob   30   60000  Engineering
2  Charlie   35   70000  Engineering
```

#### Using Polars
```python
# Add a static value to a new column in Polars
data_polars = data_polars.with_columns(pl.lit("Engineering").alias("department"))
print("Polars DataFrame with static column:")
print(data_polars)
```

```text
Polars DataFrame with static column:
shape: (3, 4)
┌─────────┬─────┬────────┬─────────────┐
│ name    ┆ age ┆ salary ┆ department  │
│ ---     ┆ --- ┆ ---    ┆ ---         │
│ str     ┆ i64 ┆ i64    ┆ str         │
╞═════════╪═════╪════════╪═════════════╡
│ Alice   ┆ 25  ┆ 50000  ┆ Engineering │
│ Bob     ┆ 30  ┆ 60000  ┆ Engineering │
│ Charlie ┆ 35  ┆ 70000  ┆ Engineering │
└─────────┴─────┴────────┴─────────────┘
```

## 2. Adding a New Column Based on an Existing Column

You can generate a new column based on the values of existing columns. For example, let's calculate a "bonus" column as 10% of the "salary".

### Code Example: Adding a Derived Column

#### Using Pandas
```python
# Create a new column "bonus" as 10% of salary in Pandas
data_pandas["bonus"] = data_pandas["salary"] * 0.1
print("Pandas DataFrame with bonus column:")
print(data_pandas)
```

```text
Pandas DataFrame with bonus column:
      name  age  salary   department   bonus
0    Alice   25   50000  Engineering  5000.0
1      Bob   30   60000  Engineering  6000.0
2  Charlie   35   70000  Engineering  7000.0
```

#### Using Polars
```python
# Create a new column "bonus" as 10% of salary in Polars
data_polars = data_polars.with_columns((pl.col("salary") * 0.1).alias("bonus"))
print("Polars DataFrame with bonus column:")
print(data_polars)
```

```text
Polars DataFrame with bonus column:
shape: (3, 5)
┌─────────┬─────┬────────┬─────────────┬────────┐
│ name    ┆ age ┆ salary ┆ department  ┆ bonus  │
│ ---     ┆ --- ┆ ---    ┆ ---         ┆ ---    │
│ str     ┆ i64 ┆ i64    ┆ str         ┆ f64    │
╞═════════╪═════╪════════╪═════════════╪════════╡
│ Alice   ┆ 25  ┆ 50000  ┆ Engineering ┆ 5000.0 │
│ Bob     ┆ 30  ┆ 60000  ┆ Engineering ┆ 6000.0 │
│ Charlie ┆ 35  ┆ 70000  ┆ Engineering ┆ 7000.0 │
└─────────┴─────┴────────┴─────────────┴────────┘
```

## 3. Adding a Column Based on a Conditional

Let’s classify employees into two age groups: "junior" if age is less than 30, and "senior" otherwise.

### Code Example: Adding Conditional Column

#### Using Pandas
```python
# Add "age_group" column based on condition in Pandas
data_pandas["age_group"] = ["junior" if age < 30 else "senior" for age in data_pandas["age"]]
print("Pandas DataFrame with age_group column:")
print(data_pandas)
```

```text
Pandas DataFrame with age_group column:
      name  age  salary   department   bonus age_group
0    Alice   25   50000  Engineering  5000.0    junior
1      Bob   30   60000  Engineering  6000.0    senior
2  Charlie   35   70000  Engineering  7000.0    senior
```

#### Using Polars
```python
# Add an "age_group" column with a when/then/otherwise expression in Polars
data_polars = data_polars.with_columns(
    pl.when(pl.col("age") < 30)
      .then(pl.lit("junior"))
      .otherwise(pl.lit("senior"))
      .alias("age_group")
)

print("Polars DataFrame with age_group column:")
print(data_polars)
```

```text
Polars DataFrame with age_group column:
shape: (3, 6)
┌─────────┬─────┬────────┬─────────────┬────────┬───────────┐
│ name    ┆ age ┆ salary ┆ department  ┆ bonus  ┆ age_group │
│ ---     ┆ --- ┆ ---    ┆ ---         ┆ ---    ┆ ---       │
│ str     ┆ i64 ┆ i64    ┆ str         ┆ f64    ┆ str       │
╞═════════╪═════╪════════╪═════════════╪════════╪═══════════╡
│ Alice   ┆ 25  ┆ 50000  ┆ Engineering ┆ 5000.0 ┆ junior    │
│ Bob     ┆ 30  ┆ 60000  ┆ Engineering ┆ 6000.0 ┆ senior    │
│ Charlie ┆ 35  ┆ 70000  ┆ Engineering ┆ 7000.0 ┆ senior    │
└─────────┴─────┴────────┴─────────────┴────────┴───────────┘
```

## Summary: Pandas vs. Polars

| Feature                                   | Pandas                             | Polars                                                                  |
|-------------------------------------------|------------------------------------|-------------------------------------------------------------------------|
| Adding a static column                    | `df["new_col"] = value`            | `df.with_columns(pl.lit(value).alias("new_col"))`                       |
| Adding a column based on existing columns | `df["new_col"] = df["col"] * 0.1`  | `df.with_columns((pl.col("col") * 0.1).alias("new_col"))`               |
| Adding a column based on conditions       | List comprehension or `np.where()` | `df.with_columns(pl.when(...).then(...).otherwise(...).alias("new_col"))` |