In this tutorial, we'll compare how to add new columns to a DataFrame using two amazing Python libraries: **Pandas** and **Polars**. DataFrames are a fundamental tool in data analysis, and adding new columns is something you'll do often when cleaning or preparing data.

## Importing Libraries and Creating Sample Data

Let's start by creating some sample data. Here's the code for generating a small mock dataset in both Pandas and Polars:

### Setting Up Data

```python
# For Pandas
import pandas as pd

# Create a simple DataFrame using Pandas
data_pandas = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "salary": [50000, 60000, 70000]
})
print("Pandas DataFrame:")
print(data_pandas)

# For Polars
import polars as pl

# Create a simple DataFrame using Polars
data_polars = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "salary": [50000, 60000, 70000]
})
print("\nPolars DataFrame:")
print(data_polars)
```

## 1. Adding a New Column with a Static Value

Sometimes you may want to add a new column that has the same value for all rows, such as a default category or a flag. Let's do this in both Pandas and Polars.

### Code Example: Adding Static Columns

#### Using Pandas

```python
# Add a static value to a new column in Pandas
data_pandas["department"] = "Engineering"
print("Pandas DataFrame with static column:")
print(data_pandas)
```

#### Using Polars

```python
# Add a static value to a new column in Polars
data_polars = data_polars.with_columns(pl.lit("Engineering").alias("department"))
print("Polars DataFrame with static column:")
print(data_polars)
```

## 2. Adding a New Column Based on an Existing Column

You can generate a new column based on the values of existing columns. For example, let's calculate a "bonus" column as 10% of the "salary".

### Code Example: Adding a Column Based on Logic

#### Using Pandas

```python
# Create a new column "bonus" as 10% of salary in Pandas
data_pandas["bonus"] = data_pandas["salary"] * 0.1
print("Pandas DataFrame with bonus column:")
print(data_pandas)
```

#### Using Polars

```python
# Create a new column "bonus" as 10% of salary in Polars
data_polars = data_polars.with_columns((pl.col("salary") * 0.1).alias("bonus"))
print("Polars DataFrame with bonus column:")
print(data_polars)
```

## 3. Adding a Column Based on a Conditional

Let’s classify employees into two age groups: "junior" if age is less than 30, and "senior" otherwise.

### Code Example: Adding Conditional Column

#### Using Pandas

```python
# Add "age_group" column based on condition in Pandas
data_pandas["age_group"] = ["junior" if age < 30 else "senior" for age in data_pandas["age"]]
print("Pandas DataFrame with age_group column:")
print(data_pandas)
```

#### Using Polars

```python
# Add "age_group" column based on condition in Polars
data_polars = data_polars.with_columns(
    pl.when(pl.col("age") < 30)
    .then(pl.lit("junior"))
    .otherwise(pl.lit("senior"))
    .alias("age_group")
)
print("Polars DataFrame with age_group column:")
print(data_polars)
```

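The Pandas version of this step uses a list comprehension, but the summary table also mentions `np.where()` as an alternative. A minimal sketch of that variant follows; it rebuilds a small DataFrame (named `df` here purely for illustration) so that it runs on its own, and it assumes NumPy is installed alongside Pandas.

```python
import numpy as np
import pandas as pd

# Stand-alone copy of the sample data (the name "df" is illustrative)
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
})

# np.where(condition, value_if_true, value_if_false) evaluates the
# condition for the whole column at once instead of looping row by row
df["age_group"] = np.where(df["age"] < 30, "junior", "senior")
print(df)
```

Like the Polars `when/then/otherwise` expression, `np.where()` works on the whole column in one call, which may be preferable on larger DataFrames.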
## Summary: Pandas vs. Polars
| Feature | Pandas | Polars |
|--------------------------------------------|-------------------------------------------------|---------------------------------------|
| Adding a static column | `df["new_col"] = value` | `.with_columns(pl.lit(value).alias("new_col"))` |
| Adding a column based on existing columns | `df["new_col"] = df["col"] * 0.1` | `.with_columns((pl.col("col") * 0.1).alias("new_col"))` |
| Adding a column based on conditions | [List comprehension](/tutorials/list-comprehension) or `np.where()` | `pl.when(...).then(...).otherwise(...).alias("new_col")` |