Data Classes in Python – Karolos Koutsoulelos

Python data classes are a relatively new feature that was added in Python 3.7. They are used to create classes that are primarily used to store data. A data class is similar to a regular Python class, but it comes with additional functionality and built-in methods that make it easier to work with structured data.

A key difference between a data class and a normal Python class is how little code it takes to get sensible behavior. Note that data classes are not immutable by default: attributes can be reassigned after an instance has been created. Passing frozen=True to the decorator makes instances read-only, which is ideal for use cases where you need to work with data that should not be modified, or where you need to ensure that the data remains consistent.

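Data classes are also more concise and easier to read than traditional classes. They come with a set of generated methods, such as __init__(), __repr__(), and __eq__(), which means that you don’t have to write them yourself. Hashing is a special case: with the default settings, __hash__ is set to None, so instances are unhashable unless the class is declared with frozen=True or unsafe_hash=True. The generated methods make data classes faster to create and easier to maintain.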

When to use data classes

There are several situations where data classes can be useful:

Working with large or complex data structures: When you’re working with large or complex data structures, data classes can make it easier to manage the data and keep it organized. By using a data class, you can ensure that the data is structured and consistent, which can make it easier to work with and understand.
Dealing with data that should not be modified: If you’re working with data that should not be modified, such as configuration values or user input, a data class can help you ensure that the data remains consistent. By declaring the class with frozen=True, you can prevent accidental reassignment of its attributes.
Creating APIs: If you’re creating an API, using data classes can make it easier for users of the API to understand how the data is structured. By using data classes, you can ensure that the data is consistent and well-organized, which can make it easier for users to work with.
Working with external data sources: If you’re working with data from an external data source, such as a database or API, data classes can make it easier to work with the data. By using a data class to represent the data, you can ensure that the data is consistent and well-organized, which can make it easier to work with and understand.

Creating a data class

To create a data class in Python, you use the dataclass decorator. Here’s an example:

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
    job: str = "unemployed"

In this example, we’ve defined a Person data class with three attributes: name, age, and job. We’ve also set a default value of “unemployed” for the job attribute. Note that we don’t need to define an __init__ method or any other methods; the dataclass decorator takes care of that for us.
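One caveat with default values: a mutable default such as a list cannot be written directly, because the decorator rejects it with a ValueError. The field function with default_factory is the usual way around this. A minimal sketch, where the Team class and its members field are hypothetical names chosen for illustration:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Team:
    name: str
    # default_factory is called once per instance, so every Team gets its own list.
    # Writing `members: List[str] = []` instead would raise ValueError.
    members: List[str] = field(default_factory=list)

a = Team("core")
b = Team("docs")
a.members.append("Alice")

print(a.members)  # Output: ['Alice']
print(b.members)  # Output: []
```

Because each instance gets a fresh list, appending to one team does not affect the other.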

The dataclass decorator has several optional keywords that can be used to customize the behavior of the resulting data class. Here are some of the most commonly used ones:

init: If set to True (the default), an __init__ method will be generated for the data class. If set to False, no __init__ method will be generated.

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
    job: str = "unemployed"

person = Person("Alice", 25)
print(person) # Output: Person(name='Alice', age=25, job='unemployed')

@dataclass(init=False)
class Person:
    name: str
    age: int
    job: str = "unemployed"

person = Person(name="Alice", age=25)  # Raises TypeError: Person() takes no arguments
print(person)  # Not reached
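In practice, init=False is mostly useful when you want to write the initializer yourself while keeping the other generated methods. The following is a sketch under that assumption; the full_name parameter and the name normalization are hypothetical and only there to show custom logic:

```python
from dataclasses import dataclass

@dataclass(init=False)
class Person:
    name: str
    age: int
    job: str = "unemployed"

    # Hand-written initializer; __repr__ and __eq__ are still generated.
    def __init__(self, full_name: str, age: int):
        self.name = full_name.strip().title()
        self.age = age
        # job is not assigned here, so the class-level default is used.

person = Person("  alice smith ", 25)
print(person)  # Output: Person(name='Alice Smith', age=25, job='unemployed')
```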

repr: The repr option controls whether a __repr__ method is generated for the data class. When repr is set to True (the default), a default __repr__ method is generated which returns a string that represents the object’s attributes. When repr is set to False, no __repr__ method is generated.

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
    job: str = "unemployed"

person = Person("Alice", 25, "developer")
print(person)  # Output: Person(name='Alice', age=25, job='developer')

@dataclass(repr=False)
class Person:
    name: str
    age: int
    job: str = "unemployed"

person = Person("Alice", 25, "developer")
print(person)  # Output: <__main__.Person object at 0x7fdd51752e50>
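The repr setting can also be applied to a single field rather than the whole class, through the field function. A small sketch, using a hypothetical Account class with a password that should stay out of logs:

```python
from dataclasses import dataclass, field

@dataclass
class Account:
    username: str
    # Left out of the generated __repr__, but still a normal attribute.
    password: str = field(repr=False)

account = Account("alice", "s3cret")
print(account)           # Output: Account(username='alice')
print(account.password)  # Output: s3cret
```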

eq: The eq option controls whether an __eq__ method is generated for the data class. When eq is set to True (the default), a default __eq__ method is generated which compares the object’s attributes for equality. When eq is set to False, no __eq__ method is generated.

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
    job: str = "unemployed"

person1 = Person("Alice", 25, "developer")
person2 = Person("Alice", 25, "developer")

print(person1 == person2)  # Output: True

@dataclass(eq=False)
class Person:
    name: str
    age: int
    job: str = "unemployed"

person1 = Person("Alice", 25, "developer")
person2 = Person("Alice", 25, "developer")

print(person1 == person2)  # Output: False
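If only some attributes should take part in the comparison, the field function accepts compare=False for individual fields. A brief sketch; the nickname field is a hypothetical addition to the Person example:

```python
from dataclasses import dataclass, field

@dataclass
class Person:
    name: str
    age: int
    # Ignored by the generated __eq__ (and by the ordering methods, if enabled).
    nickname: str = field(default="", compare=False)

person1 = Person("Alice", 25, nickname="Al")
person2 = Person("Alice", 25, nickname="Ally")

print(person1 == person2)              # Output: True
print(person1 == Person("Alice", 26))  # Output: False
```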

order: The order option controls whether comparison methods (__lt__, __le__, __gt__, and __ge__) are generated for the data class. When order is set to True, these methods are generated, allowing instances of the class to be compared with <, <=, >, and >=. By default, order is set to False, meaning no comparison methods are generated.

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
    job: str = "unemployed"

person1 = Person("Alice", 25, "developer")
person2 = Person("Bob", 30, "manager")

print(person1 < person2)  # Raises TypeError: '<' not supported between instances of 'Person' and 'Person'

@dataclass(order=True)
class Person:
    name: str
    age: int
    job: str = "unemployed"

person1 = Person("Alice", 25, "developer")
person2 = Person("Bob", 30, "manager")

print(person1 < person2)  # Output: True
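The generated comparison methods treat an instance like a tuple of its fields, in the order they are declared. That is what makes sorted work on a list of instances. A short sketch with a trimmed-down Person class:

```python
from dataclasses import dataclass

@dataclass(order=True)
class Person:
    name: str
    age: int

people = [Person("Bob", 30), Person("Alice", 35), Person("Alice", 25)]

# Sorted by name first, then by age, because that is the field order.
for p in sorted(people):
    print(p)
# Output:
# Person(name='Alice', age=25)
# Person(name='Alice', age=35)
# Person(name='Bob', age=30)

# A key function works without order=True and makes the criterion explicit.
by_age = sorted(people, key=lambda p: p.age)
print([p.name for p in by_age])  # Output: ['Alice', 'Bob', 'Alice']
```

If the declared field order is not the order you want to sort by, a key function is usually clearer than rearranging the fields.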

frozen: If set to True, the resulting data class will be immutable. If set to False (the default), the resulting data class will be mutable.

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
    job: str = "unemployed"

person = Person("Alice", 25, "developer")
person.name = "Bob"
print(person) # Output: Person(name='Bob', age=25, job='developer')

@dataclass(frozen=True)
class Person:
    name: str
    age: int
    job: str = "unemployed"

person = Person("Alice", 25, "developer")
person.name = "Bob"  # Raises dataclasses.FrozenInstanceError (a subclass of AttributeError)
print(person)  # Not reached
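When a frozen instance needs to change, the usual approach is to create a modified copy with the replace function from the dataclasses module. A minimal sketch that also shows how the error can be caught:

```python
from dataclasses import dataclass, replace, FrozenInstanceError

@dataclass(frozen=True)
class Person:
    name: str
    age: int
    job: str = "unemployed"

person = Person("Alice", 25, "developer")

try:
    person.name = "Bob"
except FrozenInstanceError as exc:
    print(f"Cannot modify: {exc}")

# replace() builds a new instance; the original is left untouched.
renamed = replace(person, name="Bob")
print(person)   # Output: Person(name='Alice', age=25, job='developer')
print(renamed)  # Output: Person(name='Bob', age=25, job='developer')
```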

Functionality of Data Classes

Data classes provide several useful features that can simplify and streamline your code. Some of the key benefits can be found below with examples.

Automatic generation of common methods: By using the dataclass decorator, Python automatically generates several common methods for you, such as __init__, __repr__, and __eq__. This can save you time and effort, and make your code easier to read and maintain.

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

p1 = Person("Alice", 25)
p2 = Person("Bob", 30)
print(p1)  # Output: Person(name='Alice', age=25)
print(p1 == p2)  # Output: False

In this example, we’ve defined a Person data class with two attributes: name and age. We’ve then created two instances of the Person class, p1 and p2, and used the generated __repr__ and __eq__ methods to print and compare them.

Type hints: Data class fields are declared with type hints, which document the expected types and let static type checkers such as mypy catch errors before the code runs. Python does not enforce the hints at runtime. You can also use the typing module to specify more complex types, such as lists or dictionaries.

from dataclasses import dataclass

@dataclass
class Rectangle:
    width: float
    height: float

r = Rectangle("10", 20.5)  # No error at runtime; a static type checker would flag the str
print(r)  # Output: Rectangle(width='10', height=20.5)

In this example, we’ve defined a Rectangle data class with two attributes: width and height. We’ve then created an instance of the Rectangle class with a string value for width. Despite the float type hint, no TypeError is raised: the string is stored as-is, and only a static type checker or explicit validation would catch the mistake.
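Because the dataclass decorator does not validate field types on its own, runtime checks have to be written explicitly. The following is a minimal sketch, assuming every field is annotated with a plain class; the loop would not work for typing constructs such as List[int] or for annotations stored as strings:

```python
from dataclasses import dataclass, fields

@dataclass
class Rectangle:
    width: float
    height: float

    def __post_init__(self):
        # Illustrative validation: compare each value against its annotation.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, got {type(value).__name__}"
                )

r = Rectangle(10.0, 20.5)  # passes validation

try:
    Rectangle("10", 20.5)
except TypeError as exc:
    print(exc)  # Output: width must be float, got str
```

This check is strict: an int such as 10 is not an instance of float, so Rectangle(10, 20.5) would be rejected as well. Whether that is desirable depends on the use case.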

Easy serialization and deserialization: The dataclasses module provides two functions, asdict and astuple, that convert instances to dictionaries and tuples. There is no built-in inverse; converting back is done by passing the values to the class constructor. These functions are especially useful as a first step when working with JSON, XML, or other data formats.

from dataclasses import dataclass, asdict, astuple

@dataclass
class Point:
    x: float
    y: float

p = Point(3.5, 4.2)
d = asdict(p)  # converts to dictionary
t = astuple(p)  # converts to tuple
p2 = Point(**d)  # converts back from dictionary

print(d)  # Output: {'x': 3.5, 'y': 4.2}
print(t)  # Output: (3.5, 4.2)
print(p == p2)  # Output: True

In this example, we’ve defined a Point data class with two attributes: x and y. We’ve then created an instance of the Point class, p, and used the asdict and astuple functions to convert it to a dictionary and tuple, respectively. Finally, we’ve unpacked the dictionary into the constructor to create a new Point instance, p2, and used the generated __eq__ method to compare it to the original p.
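The same idea extends to JSON with the standard json module: asdict produces a plain dictionary that json.dumps can encode, and the decoded dictionary can be unpacked into the constructor. A short sketch of the round trip:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Point:
    x: float
    y: float

p = Point(3.5, 4.2)

# Serialize: data class -> dict -> JSON string
payload = json.dumps(asdict(p))
print(payload)  # Output: {"x": 3.5, "y": 4.2}

# Deserialize: JSON string -> dict -> data class
restored = Point(**json.loads(payload))
print(restored == p)  # Output: True
```

This works as-is only when every field holds a JSON-compatible value; fields such as dates or nested data classes need extra handling when decoding.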

Custom initialization: Data classes support the __post_init__ method, which you can use to add custom initialization logic or perform additional checks on the attributes.

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

    def __post_init__(self):
        if self.age < 0:
            raise ValueError("Age must be non-negative")

p1 = Person("Alice", 25)
p2 = Person("Bob", -30)  # raises ValueError

In this example, we’ve defined a Person data class with two attributes: name and age. We’ve also defined a __post_init__ method, which the generated __init__ calls after assigning the attributes, that checks whether the age is non-negative. If the age is negative, it raises a ValueError.
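Another common use of __post_init__ is computing an attribute from the others. Declaring the field with field(init=False) keeps it out of the generated __init__ signature. A sketch with a hypothetical Rectangle class and a derived area field:

```python
from dataclasses import dataclass, field

@dataclass
class Rectangle:
    width: float
    height: float
    # Not a constructor parameter; filled in by __post_init__.
    area: float = field(init=False)

    def __post_init__(self):
        self.area = self.width * self.height

r = Rectangle(2.0, 3.5)
print(r)  # Output: Rectangle(width=2.0, height=3.5, area=7.0)
```

Note that area is computed once at construction time. If width is reassigned later, area is not updated, so a property may be a better fit for mutable classes.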

Comparison with normal Classes

Data classes provide many of the same features as normal classes, but with less boilerplate and less manual work. In particular, data classes are optimized for handling simple, structured data that can be easily serialized and deserialized. Normal classes, on the other hand, are better suited for more complex, behavior-driven objects that may have more complex initialization, logic, or state.

Feature	Data classes	Normal classes
Declaration	Decorator (`@dataclass`)	Manual definition
Boilerplate	Less	More
Common methods	Auto-generated	Manual implementation
Type hints	Required for fields (not enforced at runtime)	Optional
Serialization	Built-in (`asdict`, `astuple`)	Manual implementation
Immutability	Optional (`frozen=True`)	Manual implementation

Comparison between the two types of Classes in Python

Advantages of using data classes

There are several advantages to using data classes over normal classes, especially when working with structured data. Some of the key benefits include:

Simplicity and readability: Data classes are designed to be simple and easy to read, which can help reduce errors and make your code more maintainable.
Automatic generation of common methods: Data classes automatically generate several common methods for you, such as __init__, __repr__, and __eq__, which can save you time and effort.
Type hints: Data class fields are declared with type hints, which document the expected types and let static type checkers catch errors early. The hints are not enforced at runtime.
Easy serialization and deserialization: The dataclasses module provides two functions, asdict and astuple, that convert instances to dictionaries and tuples; converting back is done through the class constructor. These functions are especially useful when working with JSON, XML, or other data formats.

While data classes provide many benefits, there may still be situations where normal classes are more appropriate. Here are a few scenarios where normal classes might be a better fit:

If you need more flexibility: Data classes are designed for structured data, and as a result, they have some limitations. If you need a class that can do more complex things, or you need more fine-grained control over the behavior of your class, a normal class might be a better choice.
If you need to interact with legacy code: If you’re working with existing code that doesn’t use data classes, it might be easier to use normal classes to maintain consistency with the existing codebase.
If you’re working with performance-critical code: While data classes are generally fast, they do come with some overhead, especially when using the built-in methods like asdict and astuple. If you’re working with code that needs to be as fast as possible, a normal class might be a better choice.
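Part of the cost of asdict comes from the fact that it recurses into nested data classes and containers and builds new objects along the way. The sketch below illustrates that behavior next to a hand-written shallow conversion; the Path class and its to_dict_shallow method are hypothetical, and no timing claims are made here:

```python
from dataclasses import dataclass, asdict, field
from typing import List

@dataclass
class Point:
    x: float
    y: float

@dataclass
class Path:
    name: str
    points: List[Point] = field(default_factory=list)

    def to_dict_shallow(self) -> dict:
        # Hand-written alternative: no recursion and no copying.
        return {"name": self.name, "points": self.points}

path = Path("route", [Point(0.0, 0.0), Point(1.0, 1.0)])

deep = asdict(path)
print(deep)
# Output: {'name': 'route', 'points': [{'x': 0.0, 'y': 0.0}, {'x': 1.0, 'y': 1.0}]}
print(deep["points"] is path.points)     # Output: False (a new list was built)

shallow = path.to_dict_shallow()
print(shallow["points"] is path.points)  # Output: True (same list object)
```

The shallow version does less work but shares the list with the original object, so changes to one are visible through the other.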

Advanced functionality of Data Classes

Inheritance

Data classes support inheritance just like regular classes, and you can use them as base classes for other data classes. This can be useful when you want to define a base class with common attributes and methods, and then create subclasses with additional attributes or behavior.

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

@dataclass
class Employee(Person):
    id: int
    salary: float

In this example, we’ve defined a Person data class with name and age attributes, and an Employee data class that inherits from Person and adds id and salary attributes.
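To see how the inherited fields are combined, it helps to create an instance and inspect the field order. A short sketch that repeats the two classes so it can run on its own:

```python
from dataclasses import dataclass, fields

@dataclass
class Person:
    name: str
    age: int

@dataclass
class Employee(Person):
    id: int
    salary: float

# Base-class fields come first in the generated __init__, then the subclass fields.
e = Employee("Alice", 25, 1001, 52000.0)
print(e)  # Output: Employee(name='Alice', age=25, id=1001, salary=52000.0)
print([f.name for f in fields(Employee)])  # Output: ['name', 'age', 'id', 'salary']
print(isinstance(e, Person))  # Output: True
```

One consequence of this ordering: if a base-class field has a default value, every field added by the subclass needs a default as well, otherwise the decorator raises a TypeError.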

Methods in data classes

In addition to the common methods that are automatically generated, you can also define your own methods in data classes. These methods can perform any type of computation or manipulation on the data in the class.

import math
from dataclasses import dataclass

@dataclass
class Circle:
    radius: float

    def area(self):
        # math.pi is more precise than a hard-coded 3.14
        return math.pi * self.radius ** 2

In this example, we’ve defined a Circle data class with a radius attribute and an area method that computes the area of the circle.
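Properties and class methods work in data classes as they do in any other class. A brief sketch, with a hypothetical diameter property and from_diameter alternative constructor added to Circle:

```python
from dataclasses import dataclass

@dataclass
class Circle:
    radius: float

    @property
    def diameter(self) -> float:
        # Computed on access, so it always reflects the current radius.
        return 2 * self.radius

    @classmethod
    def from_diameter(cls, diameter: float) -> "Circle":
        # Alternative constructor that delegates to the generated __init__.
        return cls(diameter / 2)

c = Circle.from_diameter(10.0)
print(c)           # Output: Circle(radius=5.0)
print(c.diameter)  # Output: 10.0
```

Since diameter is a property without an annotation at class level, it is not treated as a field and does not appear in the generated __repr__.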

Immutability and mutability of data classes

By default, data classes are mutable, which means that you can change the values of their attributes after they’ve been created. However, you can also make data classes immutable by setting the frozen parameter to True when you define them. This can be useful when you want to ensure that the values of the attributes remain constant.

from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: float
    y: float

In this example, we’ve defined a Point data class with x and y attributes, and set the frozen parameter to True. This makes the data class immutable, so you can’t change the values of its attributes after it’s been created.
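A practical side effect of frozen=True is hashability: with the default eq=True, a frozen data class gets a generated __hash__, so instances can be used as dictionary keys or set members. A regular data class cannot. A minimal sketch; MutablePoint is a hypothetical counterpart added for contrast:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: float
    y: float

# Equal frozen instances hash the same, so they work as keys.
labels = {Point(0.0, 0.0): "origin"}
print(labels[Point(0.0, 0.0)])                  # Output: origin
print(len({Point(1.0, 2.0), Point(1.0, 2.0)}))  # Output: 1

@dataclass
class MutablePoint:
    x: float
    y: float

try:
    hash(MutablePoint(0.0, 0.0))
except TypeError as exc:
    print(exc)  # Output: unhashable type: 'MutablePoint'
```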

In summary, data classes are a convenient and powerful feature of Python 3.7 and later that can simplify your code and make it more reliable. With data classes, you can define classes for structured data with minimal boilerplate and automatic generation of common methods like __init__, __repr__, and __eq__. Type hints on the fields let static type checkers catch errors early, and the asdict and astuple functions make it easier to work with various data formats. Additionally, data classes can be customized with methods and inheritance, and can be made immutable or mutable depending on your needs.

Data classes are not a silver bullet, and there may be cases where you’d still prefer to use regular classes or other Python features. However, in general, if you’re dealing with data structures that have well-defined attributes and no complex methods, data classes can be a great fit. They can help you write cleaner, more concise code, and avoid common mistakes like forgetting to implement key methods. Moreover, data classes can be a good way to learn and practice some of the latest features of Python and software engineering best practices.

If you’re interested in learning more about data classes, see the official Python documentation for the dataclasses module.

You can also explore related topics like type hints, serialization, and Python’s object-oriented programming model.