In this tutorial, we'll compare how to add new columns to a DataFrame using two popular Python libraries: **Pandas** and **Polars**. DataFrames are a fundamental tool in data analysis, and adding new columns is something you'll do often when cleaning or preparing data.

## Importing Libraries and Creating Sample Data

Let's start by creating some sample data. The code below builds the same small mock dataset in both Pandas and Polars:

### Setting Up Data

    # For Pandas
    import pandas as pd

    # Create a simple DataFrame using Pandas
    data_pandas = pd.DataFrame({
        "name": ["Alice", "Bob", "Charlie"],
        "age": [25, 30, 35],
        "salary": [50000, 60000, 70000]
    })

    print("Pandas DataFrame:")
    print(data_pandas)

    # For Polars
    import polars as pl

    # Create a simple DataFrame using Polars
    data_polars = pl.DataFrame({
        "name": ["Alice", "Bob", "Charlie"],
        "age": [25, 30, 35],
        "salary": [50000, 60000, 70000]
    })

    print("\nPolars DataFrame:")
    print(data_polars)

Output:

    Pandas DataFrame:
          name  age  salary
    0    Alice   25   50000
    1      Bob   30   60000
    2  Charlie   35   70000

    Polars DataFrame:
    shape: (3, 3)
    ┌─────────┬─────┬────────┐
    │ name    ┆ age ┆ salary │
    │ ---     ┆ --- ┆ ---    │
    │ str     ┆ i64 ┆ i64    │
    ╞═════════╪═════╪════════╡
    │ Alice   ┆ 25  ┆ 50000  │
    │ Bob     ┆ 30  ┆ 60000  │
    │ Charlie ┆ 35  ┆ 70000  │
    └─────────┴─────┴────────┘

## 1. Adding a New Column with a Static Value

Sometimes you may want to add a new column that has the same value for all rows, such as a default category or a flag. Let's do this in both Pandas and Polars.

### Code Example: Adding Static Columns

#### Using Pandas

    # Add a static value to a new column in Pandas
    data_pandas["department"] = "Engineering"

    print("Pandas DataFrame with static column:")
    print(data_pandas)

Output:

    Pandas DataFrame with static column:
          name  age  salary   department
    0    Alice   25   50000  Engineering
    1      Bob   30   60000  Engineering
    2  Charlie   35   70000  Engineering

#### Using Polars

    # Add a static value to a new column in Polars
    data_polars = data_polars.with_columns(pl.lit("Engineering").alias("department"))

    print("Polars DataFrame with static column:")
    print(data_polars)

Output:

    Polars DataFrame with static column:
    shape: (3, 4)
    ┌─────────┬─────┬────────┬─────────────┐
    │ name    ┆ age ┆ salary ┆ department  │
    │ ---     ┆ --- ┆ ---    ┆ ---         │
    │ str     ┆ i64 ┆ i64    ┆ str         │
    ╞═════════╪═════╪════════╪═════════════╡
    │ Alice   ┆ 25  ┆ 50000  ┆ Engineering │
    │ Bob     ┆ 30  ┆ 60000  ┆ Engineering │
    │ Charlie ┆ 35  ┆ 70000  ┆ Engineering │
    └─────────┴─────┴────────┴─────────────┘

## 2. Adding a New Column Based on an Existing Column

You can generate a new column based on the values of existing columns. For example, let's calculate a "bonus" column as 10% of the "salary".

### Code Example: Adding a Column Based on Logic

#### Using Pandas

    # Create a new column "bonus" as 10% of salary in Pandas
    data_pandas["bonus"] = data_pandas["salary"] * 0.1

    print("Pandas DataFrame with bonus column:")
    print(data_pandas)

Output:

    Pandas DataFrame with bonus column:
          name  age  salary   department   bonus
    0    Alice   25   50000  Engineering  5000.0
    1      Bob   30   60000  Engineering  6000.0
    2  Charlie   35   70000  Engineering  7000.0

#### Using Polars

    # Create a new column "bonus" as 10% of salary in Polars
    data_polars = data_polars.with_columns((pl.col("salary") * 0.1).alias("bonus"))

    print("Polars DataFrame with bonus column:")
    print(data_polars)

Output:

    Polars DataFrame with bonus column:
    shape: (3, 5)
    ┌─────────┬─────┬────────┬─────────────┬────────┐
    │ name    ┆ age ┆ salary ┆ department  ┆ bonus  │
    │ ---     ┆ --- ┆ ---    ┆ ---         ┆ ---    │
    │ str     ┆ i64 ┆ i64    ┆ str         ┆ f64    │
    ╞═════════╪═════╪════════╪═════════════╪════════╡
    │ Alice   ┆ 25  ┆ 50000  ┆ Engineering ┆ 5000.0 │
    │ Bob     ┆ 30  ┆ 60000  ┆ Engineering ┆ 6000.0 │
    │ Charlie ┆ 35  ┆ 70000  ┆ Engineering ┆ 7000.0 │
    └─────────┴─────┴────────┴─────────────┴────────┘

## 3. Adding a Column Based on a Conditional

Let’s classify employees into two age groups: "junior" if age is less than 30, and "senior" otherwise.

### Code Example: Adding Conditional Column

#### Using Pandas

    # Add "age_group" column based on condition in Pandas
    data_pandas["age_group"] = ["junior" if age < 30 else "senior" for age in data_pandas["age"]]

    print("Pandas DataFrame with age_group column:")
    print(data_pandas)

Output:

    Pandas DataFrame with age_group column:
          name  age  salary   department   bonus age_group
    0    Alice   25   50000  Engineering  5000.0    junior
    1      Bob   30   60000  Engineering  6000.0    senior
    2  Charlie   35   70000  Engineering  7000.0    senior

#### Using Polars

    # Add "age_group" column
    data_polars = data_polars.with_columns(
        pl.when(pl.col("age") < 30)
        .then(pl.lit("junior"))
        .otherwise(pl.lit("senior"))
        .alias("age_group")
    )

    print("Polars DataFrame with age_group column:")
    print(data_polars)

Output:

    Polars DataFrame with age_group column:
    shape: (3, 6)
    ┌─────────┬─────┬────────┬─────────────┬────────┬───────────┐
    │ name    ┆ age ┆ salary ┆ department  ┆ bonus  ┆ age_group │
    │ ---     ┆ --- ┆ ---    ┆ ---         ┆ ---    ┆ ---       │
    │ str     ┆ i64 ┆ i64    ┆ str         ┆ f64    ┆ str       │
    ╞═════════╪═════╪════════╪═════════════╪════════╪═══════════╡
    │ Alice   ┆ 25  ┆ 50000  ┆ Engineering ┆ 5000.0 ┆ junior    │
    │ Bob     ┆ 30  ┆ 60000  ┆ Engineering ┆ 6000.0 ┆ senior    │
    │ Charlie ┆ 35  ┆ 70000  ┆ Engineering ┆ 7000.0 ┆ senior    │
    └─────────┴─────┴────────┴─────────────┴────────┴───────────┘

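
The Pandas version above loops over the column in Python. A vectorized alternative is NumPy's `np.where(condition, value_if_true, value_if_false)`, which evaluates the condition for the whole column at once. The sketch below uses a standalone frame named `df` (an illustrative name) holding the same ages as the tutorial's data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob", "Charlie"], "age": [25, 30, 35]})

# "junior" where age < 30, "senior" everywhere else
df["age_group"] = np.where(df["age"] < 30, "junior", "senior")

print(df)
```

This gives the same labels as the list comprehension and mirrors the shape of the Polars `when/then/otherwise` expression.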
## Summary: Pandas vs. Polars

| Feature | Pandas | Polars |
|--------------------------------------------|-------------------------------------------------|---------------------------------------|
| Adding a static column | `df["new_col"] = value` | `df.with_columns(pl.lit(value).alias("new_col"))` |
| Adding a column based on existing columns | `df["new_col"] = df["col"] * 0.1` | `df.with_columns((pl.col("col") * 0.1).alias("new_col"))` |
| Adding a column based on conditions | List comprehension or `np.where()` | `pl.when(...).then(...).otherwise(...).alias("new_col")` inside `with_columns` |