Passive Data Patisserie

Decaf POJOs

If you’ve ever written Java, you’re familiar with the contept of a POJO, or “Plain Old Java Object.”

I wrote Java for a few years with a sage elder dev to guide me. Very often, a complex problem would arise. Various bits of data and calculation lay scattered on the table.

“How am I to deal with all this complexity?” I would ask. My mentor, pitying me, would say “Why not just use a POJO?”

Oh yeah. Good idea!

Logic is the hard part of coding

For the unfamiliar, a POJO in Java is an object that

In C, they’re called POD’s (Plain Old Data) and more academically, they may be referred to as PDS’s (Passive Data Structures).

Personally, I would characterize POJOs and similar objects as describing structure rather than behavior.

Such objects are considered “simple” because structure––in general––is easier to grok than behavior.

When data is described separately from code that implements logic about it, you end up with a very logical separation of concerns.

Simple data can be hard to see :(

I would add one additional piece to the characteristics of a PDS:

Given this extra constraint, Java and old-school Python suffer the same issue with their passive data structures: The context clues to tell you they are “simple” are not so simple.

Here’s a classical POJO:

public class Point {

    public Integer x;
    public Integer y;

    public Point(Integer x, Integer y) {
        this.x = x;
        this.y = y;
    }
}

And here’s the Python equivalent:

class Point:
    def __init__(self, x: int, y: int):
        self.x, self.y = x, y

In both cases, you come to the conclusion that these are “passive” by:

  1. Reading the arguments of the constructor
  2. Noting the public properties of the object
  3. Reconciling the fact that the 2 constructor args are assigned to the 2 public properties

Here it’s not so bad, but as you add more and more properties it becomes harder to see.

class Heptagon:
    def __init__(
        self,
        v0: Point,
        v1: Point,
        v2: Point,
        v3: Point,
        v4: Point,
        v5: Point,
        v6: Point,
    ) -> None:
        self.v0 = v0
        self.v1 = v1
        self.v2 = v2
        self.v3 = v3
        self.v4 = v4
        self.v5 = v5
        self.v6 = v6

🗣️ That is VERY LOUD!

A Crisper, Lighter PDS

Enter: the dataclass. After Python 3.7, there was a new builtin module named dataclasses!

Dataclasses allow for extremely obvious and straightforward declaration of passive data structures.

There is exactly one context clue to tell you that an object:

It’s the @dataclass decorator:

from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

Objects decorated with dataclasses have their constructors dynamically created based on the named properties in the class definition. (There are some exceptions here, but I won’t go into it. Read the docs if you’re curious.)

It’s a much simpler and more obvious recipe for a PDS, and it’s unique to Python.

Crisper still with Iced Coffee

I have to write JS for work. And to just… exist in the world.

I have a love/hate relationship with JS. But one thing that keeps it fun for me is the beautiful object notation. Classes are an afterthought because you can just speak passive data structures into existence.

let x = 1;
let y = 2;
let point = { x, y };

Voila! A passive data structure with two properties: x and y. Just like the two examples above. The constructor is just a literal {}, so there’s very little to grok.

The lack of constructors distills a truism

I’m sure you’ve heard this:

Program to interfaces, not implementations

In strict OOP languages, there’s a lot to say about this. But I don’t write in a strict OOP language.

The word “interface” comes with a lot of baggage in languages like Java. But in dynamic languages like Python and JS, I generally fall back on it referring the contract expressed in signatures and public properties.

JS facilitates this beautifully.

Let’s say the application needs to calculate the hemisphere of points via the y axis.

A first cut in JS may look like this:

function hemisphereOfPoint(y) {
  return y > 0 ? "north" : "south";
}

But an experienced JS dev would lilkely prefer this:

function hemisphereOfPoint({ y }) {
  hemisphere = y > 0 ? "north" : "south";
  return { hemisphere };
}

Why? Well, because now you’re programming to an interface.

The signature very obviously tells us that the interface we have at hand is an object with a x and y and nothing more.

The return value provides another obvious interface: Another PDS with one property, hemisphere.

This leads to objects that compose and extend easily:

Objects compose together easily in JS, and object composition is the point sitting beneath all this talk of interfaces. This comes naturally here:

const pointMeta = ({ x, y }) => ({
  point: { x, y },
  ...hemisphereOfPoint({ y })
});

Very concisely, we’ve composed a new “metadata” object wrapping the point plus derivations from it.

It’s easy to do more:

const isPoint3d = ({ z }) => ({ is3d: !isNaN(z) });

To mix this into pointMeta, just change the signature to ( { x, y, z } ). It stays backwards compatible, because the arity of the signature does not change. Awesome! 🤩

const pointMeta = ({ x, y, z }) => ({
  point: { x, y },
  ...hemisphereOfPoint({ y }),
  ...isPoint3d({ z }),
});

This combination of first-class-functions and easily-expressible, passive interfaces makes for code that composes and extends very nicely.

You’ll also notice that all the objects above are Plain and Old. No behavior at all! The logic is offloaded to receiving functions that program against those objects’ interfaces. And we’re supposed to “program to interfaces,” so this is great!

Hello constructor, goodbye truism

We just watched JS compose nicely because object interfaces can be declared inline with functions.

And in dynamic languages like Python and JS, we tend to reach for functions since they’re first-class and widely considered simpler than classes.

When asked to find the hemisphere of Point in Python and put it in a metadata object, this would not be an unlikely solution:

def hemisphere(y: int) -> str:
    return "north" if y > 0 else "south"

Without dataclasses, you may say this:

class PointMeta:
    def __init__(self, point: Point) -> None:
        self.point = point
        self.hemisphere = hemisphere(point.y)

It’s so simple! Hooray! Right…?

With dataclasses, you may take advantage of __post_init__ to throw a bit of garnish on your passive data structure at creation time:

@dataclass
class PointMeta:
    point: Point
    hemisphere: str

    def __post_init__(self) -> None:
        self.hemisphere = hemisphere(point.y)

This is passive enough. But it’s not programming to an interface.

To add in the “is 3d” property, maybe you just add more to __post_init__:

@dataclass
class PointMeta:
    point: Point
    hemisphere: str
    is_3d: bool

    def __post_init__(self) -> None:
        self.hemisphere = hemisphere(point.y)
        z = getattr(self.point, "z", None)
        self.is_3d = isinstance(z, int)

The description of data and the logic around that data is now hopelessly tangled. hemisphere just takes point.y because it’s “simple”, but the coupling is too loose. Gluing stuff together is already looking gross. And this eventually leads to an unsatisfying breadcrumb trail of little functions.

A return to passivity

The good thing about the JS code above was that the description of data was trapped in function signatures, while the logic around that data lived in nice, simple first-class functions.

That kind of design reflects a flow of data, which is a design strategy that always works.

Python lacks the notation needed to make that easy or obvious. But discipline and consideration can lead us there!

Let’s make a flowing-data version.

First, get rid of __post_init__ to make PointMeta passive.

@dataclass
class PointMeta:
    point: Point
    hemisphere: str
    is_3d: bool

Next, edit hemisphere the way we would have in JS: Make it receive an interface that reflects our business problem:

def hemisphere(point: Point) -> str:
    return "north" if point.y > 0 else "south"

Now it’s obvious that that function concerns itself with points. And we’ve got the ball rolling on this strategy.

Follow the rabbit and make metadata creation separate from metadata description.

def point_meta(point: Point) -> PointMeta:
    return PointMeta(
        point=point,
        hemisphere=hemisphere(point),
    )

point in the point_meta signature is a much smaller surface area than self in the now-deleted PointMeta.__post_init__. Focus is tighter, so when it’s time to think about is_3d, there’s a firmer nudge to edit Point itself to describe data that might be a 3d point.

@dataclass
class Point:
    x: int
    y: int
    z: int | None = None


def point_meta(point: Point) -> PointMeta:
    return PointMeta(
        point=point,
        hemisphere=hemisphere(point),
        is_3d=point.z is not None,  # `.` is simpler than `getattr`
    )

The boundary between logic and data description is almost always a good one to enforce. Separation of concerns can be hard, but this kind is very easy.

Ingredients Do Not Cut Themselves

Passive Data Structures are easy to define in modern Python.

Describing data is a good way to get arms around a business problem without getting mired in the challenges of writing up logic right away.

With those structures well-described, you create a lovely mise-en-place of objects that are simply injected into function signatures and reasoned about. Control is inverted without you needing to think about it. Interfaces are being programmed against. And signatures stay stable, as object properties grow more elegantly than function definitions.

Use dataclasses and enjoy the first-class-functions of our dynamic language responsibly. 🥐