The `collections.Counter` class in Python is a powerful tool for counting hashable objects. It is part of the `collections` module, which provides alternative container types to Python's built-in containers like [lists](/tutorials/list), [tuples](/tutorials/tuple), and [dictionaries](/tutorials/dict). `Counter` is particularly useful in data science when you need to count the occurrence of items, such as words in text or categorical values in a dataset.

In this tutorial, we'll cover the following:

1. Creating a Counter
2. Common methods and operations
3. Applications in data science

## 1. Creating a Counter

To start using `Counter`, you need to import it from the `collections` module. You can create a `Counter` object by passing an iterable (like a list or a string) or a dictionary.

### Example: Counting Elements in a List
```python
from collections import Counter

# Sample list
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]

# Creating a Counter
counter = Counter(data)
print(counter)
```
### Example: Counting Characters in a String
```python
from collections import Counter

# Sample string
text = "data science bootcamp"

# Creating a Counter
counter = Counter(text)
print(counter)
```
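A `Counter` can also be built from a dictionary, as mentioned at the start of this section. The sketch below uses made-up fruit counts as sample data (they are not from the tutorial) and also shows that looking up an element that was never counted returns 0 rather than raising `KeyError`.

```python
from collections import Counter

# Build a Counter from a mapping of element -> count
# (hypothetical sample data, chosen only for illustration)
from_dict = Counter({"apple": 3, "banana": 1})

# Keyword arguments produce the same result
from_kwargs = Counter(apple=3, banana=1)
print(from_dict == from_kwargs)  # True

# A missing element has a count of 0 and is not added to the Counter
print(from_dict["cherry"])    # 0
print("cherry" in from_dict)  # False
```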
## 2. Common Methods and Operations

The `Counter` class provides several methods and operators for working with counted data.

### a. Most Common Elements

The `most_common` method returns a list of the `n` most common elements as `(element, count)` tuples, ordered from the highest count to the lowest.
```python
from collections import Counter

# Sample data
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]

# Creating a Counter
counter = Counter(data)

# Get the 2 most common elements
most_common_elements = counter.most_common(2)
print(most_common_elements)
```
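Two related patterns are worth knowing. Called without an argument, `most_common` returns every element in descending order of count, and a reversed slice of that list gives the least common elements. A short sketch using the same sample data:

```python
from collections import Counter

counter = Counter([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

# No argument: all elements, most common first
all_sorted = counter.most_common()
print(all_sorted)  # [(4, 4), (3, 3), (2, 2), (1, 1)]

# Reversed slice: the 2 least common elements, least common first
least_common = counter.most_common()[:-3:-1]
print(least_common)  # [(1, 1), (2, 2)]
```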
### b. Updating Counts

You can update the counts using another iterable or a dictionary.
```python
from collections import Counter

# Initial data
data = [1, 2, 2, 3, 3, 3]

# Creating a Counter
counter = Counter(data)

# Data to update with
update_data = [2, 3, 4, 4]

# Updating the counter
counter.update(update_data)
print(counter)
```
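When `update` receives a dictionary, the counts are added to the existing ones. This differs from `dict.update`, which replaces values. A minimal comparison, with letters as placeholder data:

```python
from collections import Counter

# Counter.update adds the incoming counts to the existing ones
counter = Counter({"a": 2, "b": 1})
counter.update({"a": 3, "c": 1})
print(counter)  # Counter({'a': 5, 'b': 1, 'c': 1})

# dict.update overwrites the existing values instead
plain = {"a": 2, "b": 1}
plain.update({"a": 3, "c": 1})
print(plain)  # {'a': 3, 'b': 1, 'c': 1}
```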
### c. Subtracting Counts

The `subtract` method allows you to subtract element counts.
```python
from collections import Counter

# Initial data
data = [1, 2, 2, 3, 3, 3]

# Creating a Counter
counter = Counter(data)

# Data to subtract
subtract_data = [2, 3, 4]

# Subtracting from the counter
counter.subtract(subtract_data)
print(counter)
```
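Note that `subtract` keeps zero and negative counts: in the example above, `4` was never in the original data, so it ends up with a count of -1. If only positive counts are wanted, the unary `+` operator returns a new `Counter` without the zero and negative entries. A brief sketch:

```python
from collections import Counter

counter = Counter([1, 2, 2, 3, 3, 3])
counter.subtract([2, 3, 4])

# 4 was not in the original data, so its count goes negative
print(counter[4])  # -1

# Unary plus returns a new Counter containing only positive counts
positive_only = +counter
print(positive_only)  # Counter({3: 2, 1: 1, 2: 1})
```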
### d. Elements Method

The `elements` method returns an iterator over elements, repeating each as many times as its count.
```python
from collections import Counter

# Sample data
data = [1, 2, 2, 3, 3, 3]

# Creating a Counter
counter = Counter(data)

# Getting elements
elements = list(counter.elements())
print(elements)
```
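One detail that is easy to miss: `elements` skips any element whose count is less than one. The sketch below builds a `Counter` with a zero and a negative count (placeholder values) to show this.

```python
from collections import Counter

# Counts of zero or below are valid in a Counter
counter = Counter({"a": 2, "b": 0, "c": -1, "d": 1})

# elements() yields only elements with a count of at least 1
expanded = list(counter.elements())
print(expanded)  # ['a', 'a', 'd']
```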
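### e. Arithmetic and Set Operations

Counters support addition (`+`), subtraction (`-`), intersection (`&`), and union (`|`). Each operator returns a new `Counter` and keeps only elements with positive counts.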
```python
from collections import Counter

# Sample data
counter1 = Counter([1, 2, 2, 3])
counter2 = Counter([2, 3, 3, 4])

# Addition
print(counter1 + counter2)

# Subtraction
print(counter1 - counter2)

# Intersection (minimum of corresponding counts)
print(counter1 & counter2)

# Union (maximum of corresponding counts)
print(counter1 | counter2)
```
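Addition is handy when data arrives in pieces. As a sketch, assuming the data has been split into chunks (the `chunks` list here is invented for illustration), the per-chunk counters can be merged with `sum`, using an empty `Counter` as the starting value:

```python
from collections import Counter

# Hypothetical data split into chunks, e.g. batches read from a file
chunks = [["a", "b"], ["b", "c"], ["a", "b"]]

# Count each chunk separately, then add the Counters together
total = sum((Counter(chunk) for chunk in chunks), Counter())
print(total)  # Counter({'b': 3, 'a': 2, 'c': 1})
```

For large inputs, calling `update` on a single `Counter` inside a loop avoids creating a new object at every addition.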
## 3. Applications in Data Science

Let's look at some practical applications of `Counter` in data science.

### a. Word Frequency in a Text

Counting the frequency of words in a text is a common task in natural language processing (NLP).
```python
from collections import Counter
import re

# Sample text
text = "Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data."

# Clean and split the text into words
words = re.findall(r'\w+', text.lower())

# Creating a Counter
counter = Counter(words)

# Most common words
print(counter.most_common(5))
```
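The pattern `\w+` treats a hyphen as a separator, so "inter-disciplinary" is counted as two words, "inter" and "disciplinary". If hyphenated words should stay together, one option is a pattern that allows hyphens between letters. The pattern below is an assumption for illustration, not part of the original example, and it matches only the ASCII letters a to z, so digits and accented characters are dropped.

```python
import re
from collections import Counter

text = "Data science is an inter-disciplinary field."

# \w+ splits on the hyphen
split_tokens = re.findall(r"\w+", text.lower())
print(split_tokens)
# ['data', 'science', 'is', 'an', 'inter', 'disciplinary', 'field']

# Illustrative alternative: runs of letters optionally joined by hyphens
hyphen_tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
print(hyphen_tokens)
# ['data', 'science', 'is', 'an', 'inter-disciplinary', 'field']

print(Counter(hyphen_tokens)["inter-disciplinary"])  # 1
```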
### b. Counting Categorical Data

Counters can also be used to count occurrences of categorical values in a dataset.
```python
from collections import Counter

# Sample dataset: list of tuples (ID, category)
dataset = [
    (1, 'A'),
    (2, 'B'),
    (3, 'A'),
    (4, 'A'),
    (5, 'B'),
    (6, 'C')
]

# Extract categories
categories = [category for _, category in dataset]

# Creating a Counter
category_counter = Counter(categories)
print(category_counter)
```
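Raw counts are often converted to proportions when comparing categories. A minimal sketch using the same six category labels, dividing each count by the total number of observations:

```python
from collections import Counter

# Same category labels as in the dataset above
categories = ["A", "B", "A", "A", "B", "C"]
counts = Counter(categories)

# Total number of observations
total = sum(counts.values())

# Share of each category, rounded to two decimals, most common first
proportions = {cat: round(n / total, 2) for cat, n in counts.most_common()}
print(proportions)  # {'A': 0.5, 'B': 0.33, 'C': 0.17}
```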