Python Dataclass: The Complete Guide to Boilerplate-Free Data Classes in Python

You write a lot of classes in Python. Most of them exist just to hold a few fields, maybe print nicely, and compare with each other. If you have been doing this manually, you are wasting time. The @dataclass decorator handles all of that for you, and it has been in the standard library since Python 3.7. I use it in virtually every project I work on, and in this guide I will show you exactly why, and how to use it the right way.

By the time you finish this guide, you will know what dataclasses are, how to create them, which field options matter most, and how to handle immutability, validation, inheritance, and performance tuning. You will also know when dataclasses are the wrong tool, and what to use instead. Let us get into it.

What Is a Python Dataclass?

A dataclass is a regular Python class whose __init__, __repr__, and __eq__ methods are generated automatically by the @dataclass decorator. You define the fields using type hints, and the decorator does the rest. This eliminates the boilerplate you would otherwise write by hand for simple data-holding classes.

Here is the most basic example you can write.

from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

p = Point(3.5, 7.2)
print(p)
# Point(x=3.5, y=7.2)
print(p.x)  # 3.5

That is it. No manual __init__, no __repr__, no __eq__. The decorator generates all three. Now let me show you what these generated methods actually do for you.
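
To make the saving concrete, here is a rough sketch of what you would have to write by hand to get similar behavior. ManualPoint is a hypothetical name I am using for illustration, and the real generated code differs in its details, so treat this as an approximation rather than the decorator's actual output.

```python
# Hand-written approximation of what @dataclass generates for Point.
# ManualPoint is a hypothetical name used only for this sketch.
class ManualPoint:
    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def __repr__(self) -> str:
        return f"{type(self).__name__}(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other: object) -> bool:
        # Only instances of the same class are compared field by field.
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


p = ManualPoint(3.5, 7.2)
print(p)                            # ManualPoint(x=3.5, y=7.2)
print(p == ManualPoint(3.5, 7.2))   # True
```

That is about a dozen lines for two fields, and every new field means touching all three methods.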

The Three Auto-Generated Methods

The @dataclass decorator generates three methods by default. You should understand each one before you start using dataclasses in production code.

The __init__ Method

The generated __init__ accepts one argument per field you defined, in the order you defined them. You can also specify default values for fields, which makes those arguments optional at construction time.

from dataclasses import dataclass

@dataclass
class Product:
    name: str
    price: float
    quantity: int = 0  # Default value

p1 = Product("Keyboard", 79.99)
p2 = Product("Mouse", 29.99, 5)
print(p1.quantity)  # 0
print(p2.quantity)  # 5

One rule to keep in mind: fields without defaults must come before fields with defaults. Python enforces the same rule for function parameters. Violating it raises a TypeError at class definition time.
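
A quick way to see the ordering rule in action is to define a class that breaks it and catch the error. BadProduct and error_message are hypothetical names for this sketch, and the exact wording of the message varies between Python versions, so I only print it.

```python
from dataclasses import dataclass

error_message = None

try:
    @dataclass
    class BadProduct:
        name: str
        quantity: int = 0
        price: float  # no default, but it follows a defaulted field
except TypeError as exc:
    # The error is raised while the class is being defined,
    # before any instance is created.
    error_message = str(exc)

print(error_message)
```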

The __repr__ Method

The generated __repr__ gives you a readable string representation of your instance. This is what lets you print a dataclass and get something useful instead of the unhelpful output you get from a plain class that lacks a custom __repr__ method.

from dataclasses import dataclass

@dataclass
class ServerConfig:
    host: str
    port: int
    debug: bool = False

cfg = ServerConfig("localhost", 8080, True)
print(cfg)
# ServerConfig(host='localhost', port=8080, debug=True)

The __eq__ Method

The generated __eq__ compares two instances field by field. Two dataclass instances with the same field values are considered equal, even if they are separate objects in memory. This is structural equality, not identity.

from dataclasses import dataclass

@dataclass
class Vector:
    x: float
    y: float

v1 = Vector(1.0, 2.0)
v2 = Vector(1.0, 2.0)
v3 = Vector(1.0, 3.0)

print(v1 == v2)  # True
print(v1 == v3)  # False
print(v1 is v2)  # False (different objects)

Field Options: Beyond the Basics

The field() function is where dataclasses get serious. It gives you fine-grained control over individual fields. You need to import it from the dataclasses module to use it.

Mutable Default Values: Use default_factory

One of the most common mistakes people make with dataclasses is using a mutable object like a list as a default value. In a plain class, a default like that is evaluated once and shared by every instance. Dataclasses guard against the most common cases: a list, dict, or set default raises a ValueError at class definition time. The default_factory option solves this by calling a function to produce a fresh value for each instance.

from dataclasses import dataclass, field
from typing import List

# WRONG: the decorator rejects a bare mutable default
# @dataclass
# class ShoppingCart:
#     items: List[str] = []  # Raises ValueError at class definition time

# CORRECT: factory function creates a new list per instance
@dataclass
class ShoppingCart:
    customer_id: int
    items: List[str] = field(default_factory=list)

cart1 = ShoppingCart(customer_id=1)
cart1.items.append("Milk")

cart2 = ShoppingCart(customer_id=2)
cart2.items.append("Bread")

print(cart1.items)  # ['Milk']
print(cart2.items)  # ['Bread']

The same principle applies to dictionaries and any other mutable object. Always use default_factory for mutable defaults. I have seen this bug cause hard-to-debug issues in production systems, and it is entirely avoidable.
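
Here is a small sketch of the dictionary case. It first shows that the decorator refuses a bare dict default, then shows the default_factory version giving each instance its own dict. BadInventory and Inventory are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict

rejected = False

try:
    @dataclass
    class BadInventory:
        stock: Dict[str, int] = {}  # bare mutable default
except ValueError as exc:
    rejected = True
    print(f"Rejected: {exc}")


@dataclass
class Inventory:
    # dict is called once per instance to build a fresh, empty mapping
    stock: Dict[str, int] = field(default_factory=dict)


a = Inventory()
b = Inventory()
a.stock["milk"] = 3

print(a.stock)  # {'milk': 3}
print(b.stock)  # {}
```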

Excluding Fields from repr

You can exclude a field from the auto-generated __repr__ output by setting repr=False. This is useful for fields that contain sensitive data or internal state that should not appear in logs or error messages.

from dataclasses import dataclass, field

@dataclass
class User:
    username: str
    email: str
    password_hash: str = field(repr=False)
    failed_login_attempts: int = field(default=0, repr=False)

user = User("alice", "[email protected]", "hashed_password_here")
print(user)
# User(username='alice', email='[email protected]')
# password_hash is not shown

Excluding Fields from Comparison

Setting compare=False on a field removes it from the auto-generated __eq__ method. This is useful for fields like timestamps or internal IDs that do not affect logical equality.

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Document:
    title: str
    content: str
    created_at: datetime = field(default_factory=datetime.now, compare=False)

d1 = Document("Report", "Q1 numbers", datetime(2026, 1, 1, 9, 0, 0))
d2 = Document("Report", "Q1 numbers", datetime(2026, 1, 1, 9, 0, 1))
print(d1 == d2)  # True (content matches, created_at excluded)

Making a Field Computed: init=False

Sometimes you want a field that is set by the class itself, not passed in during construction. Setting init=False removes the field from the auto-generated __init__. You then set the value manually inside __post_init__.

from dataclasses import dataclass, field

@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)

    def __post_init__(self):
        self.area = self.width * self.height

r = Rectangle(5.0, 3.0)
print(r)  # Rectangle(width=5.0, height=3.0, area=15.0)
print(r.area)  # 15.0

Validation with __post_init__

The __post_init__ method runs after the auto-generated __init__ completes. This is your hook for validation logic, data transformation, or any initialization that depends on multiple fields. I use it heavily in real code.

from dataclasses import dataclass

@dataclass
class Investment:
    ticker: str
    shares: int
    purchase_price: float

    def __post_init__(self):
        if self.shares <= 0:
            raise ValueError("Shares must be positive")
        if self.purchase_price <= 0:
            raise ValueError("Purchase price must be positive")
        if len(self.ticker) > 5:
            raise ValueError("Ticker symbols are at most 5 characters")

inv = Investment("AAPL", 50, 178.50)
print(f"Total cost: ${inv.shares * inv.purchase_price:.2f}")
# Total cost: $8925.00

You can also use __post_init__ to normalize input data. For example, stripping whitespace from strings or converting names to title case.

from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    email: str

    def __post_init__(self):
        self.name = self.name.strip().title()
        self.email = self.email.strip().lower()

c = Customer("  ALICE JOHNSON ", "[email protected]")
print(c)  # Customer(name='Alice Johnson', email='[email protected]')

Immutability with frozen=True

Setting frozen=True on the @dataclass decorator makes all fields read-only after initialization. Any attempt to modify a field raises a FrozenInstanceError. I recommend using this for any dataclass that represents a value object or configuration data that should not change.

from dataclasses import dataclass

@dataclass(frozen=True)
class Coordinates:
    latitude: float
    longitude: float

loc = Coordinates(40.7128, -74.0060)
print(loc)
# Coordinates(latitude=40.7128, longitude=-74.006)

# loc.latitude = 40.7130  # Raises FrozenInstanceError
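
If you want to see the failure without crashing your script, a minimal sketch is to catch the exception. FrozenInstanceError is importable from the dataclasses module and is a subclass of AttributeError.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Coordinates:
    latitude: float
    longitude: float


loc = Coordinates(40.7128, -74.0060)

try:
    loc.latitude = 40.7130
except FrozenInstanceError as exc:
    print(f"Blocked: {exc}")

print(loc.latitude)  # 40.7128 (unchanged)
```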

Frozen dataclasses are also safe to use as dictionary keys or add to sets, because their contents cannot change after creation. This is a significant advantage over regular classes for certain use cases.

from dataclasses import dataclass

@dataclass(frozen=True)
class RGB:
    red: int
    green: int
    blue: int

color1 = RGB(255, 0, 0)
color2 = RGB(255, 0, 0)
color3 = RGB(0, 255, 0)

color_set = {color1, color2, color3}
print(len(color_set))  # 2 (color1 and color2 are equal, deduplicated)
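
For contrast, here is a short sketch of what happens with a regular dataclass. With the default settings, the decorator makes a non-frozen dataclass unhashable, so it cannot be a dictionary key. MutableRGB and FrozenRGB are hypothetical names I am using to show both cases side by side.

```python
from dataclasses import dataclass


@dataclass
class MutableRGB:
    red: int
    green: int
    blue: int


@dataclass(frozen=True)
class FrozenRGB:
    red: int
    green: int
    blue: int


try:
    hash(MutableRGB(255, 0, 0))
    mutable_hashable = True
except TypeError:
    # eq=True without frozen=True sets __hash__ to None
    mutable_hashable = False

palette = {FrozenRGB(255, 0, 0): "red"}

print(mutable_hashable)                 # False
print(palette[FrozenRGB(255, 0, 0)])    # red
```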

Performance: Using slots=True

Python 3.10 introduced the slots=True option for dataclasses. Enabling slots reduces the memory footprint of each instance significantly, because Python does not create a __dict__ for each object. Instead, attributes are stored in a fixed-size array defined at class creation time. For applications that create millions of dataclass instances, this matters.

from dataclasses import dataclass

@dataclass(slots=True)
class SensorReading:
    sensor_id: str
    timestamp: float
    value: float

# Memory per instance is significantly lower than a regular dataclass
reading = SensorReading("temp_01", 1713000000.0, 23.7)
print(reading.sensor_id)  # temp_01

You can combine slots=True with frozen=True for fully immutable, memory-efficient data objects.

from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class Config:
    api_key: str
    base_url: str
    timeout_seconds: int

cfg = Config("secret_key", "https://api.example.com", 30)
print(cfg)
# Config(api_key='secret_key', base_url='https://api.example.com', timeout_seconds=30)

Slots come with one limitation: you cannot add new attributes to an instance at runtime. For dataclasses that are purely data containers, this is rarely a problem. Stick with regular dataclasses if you need dynamic attributes.

Ordering with order=True

By default, dataclasses generate only __eq__. Setting order=True generates the full suite of rich comparison methods: __lt__, __le__, __gt__, and __ge__. Comparisons are performed field by field in the order the fields are defined.

from dataclasses import dataclass

@dataclass(order=True)
class Employee:
    department: str
    salary: int
    name: str

e1 = Employee("Engineering", 120000, "Alice")
e2 = Employee("Engineering", 95000, "Bob")
e3 = Employee("Marketing", 100000, "Carol")

print(e1 > e2)  # True (Engineering, 120000 > 95000)
print(e2 < e3)  # True (Engineering < Marketing alphabetically)
print(sorted([e1, e2, e3]))
# [Employee(department='Engineering', salary=95000, name='Bob'),
#  Employee(department='Engineering', salary=120000, name='Alice'),
#  Employee(department='Marketing', salary=100000, name='Carol')]
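
Field order is not always the sort order you want. One option, sketched below, is to combine order=True with field(compare=False) so that only the fields you care about take part in comparisons. Task is a hypothetical class for illustration. Note that compare=False also removes the field from __eq__, which may or may not be what you want.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    priority: int
    name: str = field(compare=False)  # ignored by ==, <, <=, >, >=


tasks = [Task(2, "write docs"), Task(1, "fix bug"), Task(3, "refactor")]

print([t.name for t in sorted(tasks)])
# ['fix bug', 'write docs', 'refactor']

print(Task(1, "a") == Task(1, "b"))  # True (name is excluded)
```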

Inheritance with Dataclasses

Dataclasses support inheritance. A subclass dataclass inherits all fields from its parent, and can add new fields of its own. The same rule about default and non-default fields applies across the inheritance chain.

from dataclasses import dataclass

@dataclass
class Animal:
    name: str
    age: int

@dataclass
class Dog(Animal):
    breed: str
    commands_known: int = 0

dog = Dog("Buddy", 3, "Golden Retriever", 15)
print(dog)
# Dog(name='Buddy', age=3, breed='Golden Retriever', commands_known=15)

Here is the critical inheritance rule: a non-default field in a subclass cannot follow a defaulted field from a parent class. Python detects this at class definition time and raises a TypeError.

from dataclasses import dataclass

# This works
@dataclass
class Base:
    x: int
    y: int = 10

@dataclass
class Child(Base):
    z: int = 20

# This FAILS at definition time with a TypeError:
# @dataclass
# class BrokenChild(Base):
#     w: int  # No default, but it follows y = 10 inherited from Base
#     z: int = 20

c = Child(1, 2, 3)
print(c)  # Child(x=1, y=2, z=3)

InitVar: Arguments That Are Not Fields

InitVar lets you accept an argument during initialization without storing it as a field. This is useful for values that are needed only during setup, such as passwords that get hashed or configuration files that get read and then discarded.

from dataclasses import dataclass, InitVar

@dataclass
class SecureUser:
    username: str
    password: InitVar[str]
    password_strength: str = ""

    def __post_init__(self, password: str):
        if len(password) < 8:
            self.password_strength = "weak"
        elif any(c.isdigit() for c in password) and any(c.isupper() for c in password):
            self.password_strength = "strong"
        else:
            self.password_strength = "medium"

user = SecureUser("john_doe", "MyPass123")
print(user)
# SecureUser(username='john_doe', password_strength='strong')
# password field itself is not stored

Real-World Use Cases

Dataclasses are not just a syntactic convenience. They solve real problems in production code. Here are the scenarios where I reach for them most often.

Data Transfer Objects

When your code sends or receives structured data across API boundaries, dataclasses give you a clean way to represent that data with full type information.

from dataclasses import dataclass, field
from typing import Optional
from datetime import datetime

@dataclass
class ApiResponse:
    status_code: int
    message: str
    data: Optional[dict] = None
    timestamp: datetime = field(default_factory=datetime.now)

response = ApiResponse(200, "Success", {"user_id": 42})
print(response)
# ApiResponse(status_code=200, message='Success', data={'user_id': 42}, ...)
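
At an API boundary you usually start from JSON, so here is a minimal round-trip sketch using only the standard library. The payload string is made up for illustration, and I left out the timestamp field because datetime objects are not JSON-serializable without extra handling.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ApiResponse:
    status_code: int
    message: str
    data: Optional[dict] = None


# Hypothetical payload, as it might arrive over the wire
payload = '{"status_code": 200, "message": "Success", "data": {"user_id": 42}}'

# JSON -> dataclass: unpack the parsed dict into the constructor
response = ApiResponse(**json.loads(payload))
print(response.data["user_id"])  # 42

# dataclass -> JSON: asdict() converts the instance to a plain dict
print(json.dumps(asdict(response)))
# {"status_code": 200, "message": "Success", "data": {"user_id": 42}}
```

Keep in mind that nothing here checks the types of the incoming values. If you need that, add checks in __post_init__ or use a validation library.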

Configuration Objects

Dataclasses are excellent for holding configuration. With frozen=True they cannot be modified after they are created, and their __repr__ makes it easy to log and debug configuration at startup.

from dataclasses import dataclass

@dataclass(frozen=True)
class DatabaseConfig:
    host: str
    port: int
    database: str
    pool_size: int = 10
    timeout_seconds: int = 30

db_cfg = DatabaseConfig("db.acme.com", 5432, "production")
print(db_cfg)
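
A frozen configuration cannot be edited in place, so when you need a variant you create a modified copy. The sketch below uses dataclasses.replace for that. The staging values are made up for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DatabaseConfig:
    host: str
    port: int
    database: str
    pool_size: int = 10
    timeout_seconds: int = 30


prod = DatabaseConfig("db.acme.com", 5432, "production")

# replace() builds a new instance and leaves the original untouched
staging = replace(prod, database="staging", pool_size=5)

print(staging)
# DatabaseConfig(host='db.acme.com', port=5432, database='staging', pool_size=5, timeout_seconds=30)
print(prod.database)  # production
```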

Game Objects and Simulations

For game development or physics simulations, dataclasses provide a clean way to represent entities like points, vectors, and game state without the overhead of a full class implementation.

from dataclasses import dataclass

@dataclass(frozen=True)
class Vector3D:
    x: float
    y: float
    z: float

    def magnitude(self) -> float:
        return (self.x**2 + self.y**2 + self.z**2) ** 0.5

    def dot(self, other: "Vector3D") -> float:
        return self.x * other.x + self.y * other.y + self.z * other.z

v1 = Vector3D(1.0, 0.0, 0.0)
v2 = Vector3D(0.0, 1.0, 0.0)
print(v1.magnitude())  # 1.0
print(v1.dot(v2))  # 0.0

Comparison with Alternatives

Dataclasses are not the only way to create data-holding classes in Python. You should know what else is available and why dataclasses are often the right choice.

| Feature | Dataclass | Pydantic | Named Tuple | attrs |
| --- | --- | --- | --- | --- |
| Auto __init__ | Yes | Yes | Yes | Yes |
| Auto __repr__ | Yes | Yes | Yes | Yes |
| Auto __eq__ | Yes | Yes | Yes | Yes |
| Validation | Manual (__post_init__) | Automatic | None | With validators |
| Immutability | frozen=True | Model Config | Tuple semantics | attr.s(frozen=True) |
| Stdlib only | Yes | No | Yes | No |
| Serialization | Manual | Built-in | Manual | With converters |
| Min Python version | 3.7 | 3.8 | 2.6 | 3.7 |

Pydantic is the right choice when you need automatic validation and parsing, especially for API request and response models. Named tuples are best for simple fixed-length sequences where immutability is required. The attrs library predates dataclasses and inspired many of their features. For most everyday data-holding classes in application code, dataclasses from the standard library are the right tool.
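
To show one practical difference with only the standard library, here is a small sketch comparing typing.NamedTuple with a frozen dataclass. PointTuple and PointData are hypothetical names. The point is that a named tuple is still a tuple, so it unpacks, indexes, and compares equal to a plain tuple, while a dataclass does none of those.

```python
from dataclasses import dataclass
from typing import NamedTuple


class PointTuple(NamedTuple):
    x: float
    y: float


@dataclass(frozen=True)
class PointData:
    x: float
    y: float


as_tuple = PointTuple(1.0, 2.0)
as_data = PointData(1.0, 2.0)

x, y = as_tuple                    # unpacking works for named tuples
print(as_tuple[0])                 # 1.0
print(as_tuple == (1.0, 2.0))      # True  (it is a tuple)
print(as_data == (1.0, 2.0))       # False (different type)
```

Whether tuple behavior is a feature or a source of accidental bugs depends on your use case.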

When Not to Use Dataclasses

Dataclasses are powerful, but they are not a universal replacement for every class. Avoid them when your class has significant business logic, side effects, or complex lifecycle management. Regular classes give you more control over __init__ and make the intent clearer when behavior matters more than data.

Avoid dataclasses for classes that serve as base classes for complex inheritance hierarchies, for classes that need custom __init__ signatures that do not map cleanly to field definitions, and for classes where you need advanced serialization that goes beyond what dataclasses.asdict() and hand-written conversion code can handle. Pydantic or attrs handle those cases better.

Frequently Asked Questions

Here are the questions I hear most from developers when I am teaching them about dataclasses.

What is a Python dataclass?

A Python dataclass is a class decorated with @dataclass from the standard library dataclasses module. The decorator automatically generates __init__, __repr__, and __eq__ methods based on the type-hinted fields you define.

How do you create a dataclass in Python?

Import dataclass from the dataclasses module and apply the @dataclass decorator to your class definition. Define fields using type hints. Python generates the constructor and comparison methods automatically.

from dataclasses import dataclass

@dataclass
class Book:
    title: str
    author: str
    pages: int

book = Book("1984", "George Orwell", 328)
print(book)

What is the difference between frozen=True and slots=True?

frozen=True makes fields read-only after initialization. slots=True reduces memory usage per instance by disabling __dict__. You can use both together for immutable, memory-efficient data objects.

How do you handle mutable default values in dataclasses?

Never use a mutable object as a direct default value. Use field(default_factory=YourMutableType) instead. The factory function is called once per instance, producing a fresh object each time.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Team:
    name: str
    members: List[str] = field(default_factory=list)

What does __post_init__ do in a dataclass?

__post_init__ is a method that runs after the auto-generated __init__ completes. It is the place to add validation logic, transform field values, or compute derived fields.

Can dataclasses be inherited?

Yes, dataclasses support inheritance. A subclass dataclass inherits all parent fields. The only constraint is that non-default fields cannot follow defaulted fields in the inheritance chain.

When should you use dataclasses instead of dictionaries?

Use dataclasses when you need type hints that static checkers can verify, auto-completion in your IDE, and structured data with clear field names. Dictionaries are still fine for dynamic or ad-hoc data structures. Dataclasses win when the data has a fixed, known shape and correctness and maintainability matter.
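
A small sketch makes the difference visible. The misspelled key "pgaes" is deliberate. The dictionary accepts it silently, while the dataclass constructor rejects it immediately. The exact error text depends on your Python version, so I only print it.

```python
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    author: str
    pages: int


# A typo in a dict key goes unnoticed until something reads the right key.
book_dict = {"title": "1984", "author": "George Orwell", "pgaes": 328}
print(book_dict.get("pages"))  # None

typo_caught = False
try:
    Book(title="1984", author="George Orwell", pgaes=328)
except TypeError as exc:
    # The generated __init__ does not accept unknown keyword arguments.
    typo_caught = True
    print(f"Caught: {exc}")
```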

Summary

Python dataclasses eliminate boilerplate for data-holding classes. You get __init__, __repr__, and __eq__ automatically by defining type-hinted fields. The field() function gives you control over defaults, repr visibility, comparison behavior, and initialization arguments. Use frozen=True for immutable value objects, slots=True for memory efficiency in Python 3.10+, and __post_init__ for validation and transformation. Understand when dataclasses are the wrong tool and reach for Pydantic or attrs when your requirements go beyond what the standard library offers.

Start using dataclasses in your next project where you would otherwise write a class with just a few attributes. You will write less code, get better error messages, and spend less time maintaining boilerplate.

Ninad

A Python and PHP developer turned writer out of passion. Over the last 6+ years, he has written for brands including DigitalOcean, DreamHost, Hostinger, and many others. When not working, you'll find him tinkering with open-source projects, vibe coding, or on a mountain trail, completely disconnected from tech.