What do I mean by algebra?

When I say algebra, I don't mean implementing, say, a DSL for linear algebra in Python. I mean algebra in general, as in "a set of objects and a collection of operations on them".

I like algebras when they are used in programming. Probably the most well-known is Relational Algebra. Its object is the relation, with operations like join, projection and union. The nice thing about having something that works like an algebra is that you always work with the same type of object, reusing the same box of tools without breaking the flow.

That is one of the reasons I like Python's collections. Python's sets and lists actually have quite a nice algebra built around them.

Who should read this?

Anybody who is interested in using algebraic design patterns in Python. I have three kinds of people in mind as my target audience.

  • Me :-) Let's be clear, this is mostly a record for my future self, to remind me that I once understood something about this.
  • People familiar with these concepts in Haskell, wanting to see them in a dynamically typed language like Python
  • People interested in these algebraic patterns who know Python and don't have the resources to learn, say, Haskell.

If you are interested but would rather see these in Haskell than in Python, go read Bartosz Milewski's Programming Cafe, or haskellforall by Gabriel Gonzalez.

Back to Python's collections: the basic set operations already read like textbook set algebra.

In [1]:
{1,2,3} | {3,4,5}
Out[1]:
{1, 2, 3, 4, 5}
In [2]:
{1,2,3} & {3,4}
Out[2]:
{3}
In [3]:
{1,2,3} - {3,4,5}
Out[3]:
{1, 2}

And what I really like is that all the laws I remember about sets from discrete math still hold. For example, with union as | and the empty set as set():

In [4]:
a = {1,2,3}
a == a | set() == set() | a
Out[4]:
True
In [5]:
b = {3,4,5}
c = {5,6,7}
(a | b) | c == a | (b | c)
Out[5]:
True

My personal preference is to use these math-like operators over calling methods on objects, even though method calls would produce the same result, and often would be more efficient:

In [6]:
a = {1,2}
a.add(3)
a.add(4)
a.remove(2)
a == ({1,2} | {3} | {4}) - {2}
Out[6]:
True

To be honest, I would probably write this particular code with method calls, because here they are more readable than the operators:

In [7]:
a == {1,2,3}.union({3,4}).difference({2})
Out[7]:
True

I don't really want to have operators everywhere; what I want is composability. Nowadays we might call this a fluent interface, but I want to go one step further: I want the interface to conform to a predefined set of rules.

This is what I mean when I say algebra:

  • the interfaces I work with compose with each other, they interoperate well, they are fluent
  • the implementation is lawful, it conforms to a predefined set of rules

Why?

So I have been watching too many talks from Scala conferences and reading too many posts from Haskell enthusiasts. And now I envy them. Because I mostly work with JavaScript and Python, where barely anybody even knows what a monad is, while on the Scala side they are using cofree comonads to implement cool, extensible DSLs.

Ok, there is fantasy-land and Ramda, but the last time a colleague of mine tried to show off Ramda to the rest of the team, he mostly got puzzled looks from most of them. And there isn't much in Python.

But I still want to try all of those cool things I see in the typed functional languages.

Algebraic laws

So, what might those predefined rules look like? For example, let's have binary operations $\circ$ and $\diamond$ on some set $S$ of objects. Rules we might be interested in:

  • the set is closed under the operation: $$\forall a,b \in S: a \circ b \in S$$

  • the operation is associative: $$\forall a,b,c \in S: a \circ (b \circ c) = (a \circ b) \circ c$$

  • an identity element exists: $$\exists e \in S, \forall a \in S: e \circ a = a = a \circ e$$

  • inverse elements exist: $$\forall a \in S, \exists b \in S: a \circ b = e = b \circ a$$

  • the operation is commutative: $$\forall a,b \in S: a \circ b = b \circ a$$

  • distributivity of $\circ$ over $\diamond$: $$\forall a,b,c \in S: a \circ (b \diamond c) = (a \circ b) \diamond (a \circ c)$$

  • absorption for $\circ$ and $\diamond$: $$\forall a,b \in S: a \circ (a \diamond b) = a = a \diamond (a \circ b)$$

  • idempotence: $$\forall a \in S: a \circ a = a$$
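
To make these concrete, here is a minimal sketch of spot-checking such laws in Python. The law_* helpers and the union function are made-up names for this post, and checking a few sample values is of course not a proof; sets under union happen to satisfy all four laws checked here.

In [8]:
def law_associative(op, a, b, c):
    return op(a, op(b, c)) == op(op(a, b), c)

def law_identity(op, e, a):
    return op(e, a) == a == op(a, e)

def law_commutative(op, a, b):
    return op(a, b) == op(b, a)

def law_idempotent(op, a):
    return op(a, a) == a

def union(x, y):
    return x | y

a, b, c = {1, 2}, {2, 3}, {3, 4}

# all asserts pass silently
assert law_associative(union, a, b, c)
assert law_identity(union, set(), a)
assert law_commutative(union, a, b)
assert law_idempotent(union, a)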

There are many well-known algebraic structures that are defined by adhering to some of these rules. For example:

  • a monoid is a set closed under an associative operation with an identity element. Strings with concatenation, or lists with concatenation, are monoids.
  • a group is a monoid where every element has an inverse. Examples are the integers with addition, or, more interestingly, the set of $n \times n$ invertible matrices with matrix multiplication.
  • a semilattice is a set closed under an operation that is associative, commutative and idempotent. Integers with the operation max(a,b), or min(a,b), form a semilattice.
  • a bounded semilattice is a semilattice that is also a monoid; for example positive integers with max and 1, as spot-checked below.
  • a lattice is a set with two operations that each form a semilattice, where absorption for the two operations holds.
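
For instance, a quick check of the bounded-semilattice claim for max on positive integers (again, a few sample values rather than a proof):

In [9]:
a, b, c = 3, 7, 7

assert max(a, max(b, c)) == max(max(a, b), c)  # associative
assert max(a, b) == max(b, a)                  # commutative
assert max(a, a) == a                          # idempotent
assert max(1, a) == a == max(a, 1)             # 1 is the identity on positive integers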

Monoids are useful

There actually already is a well-known pattern that is fairly similar to a monoid: the Null Object pattern. But monoids have additional structure, because the operation is restricted to be closed and associative.
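
That extra structure pays off when folding a collection: the identity element plays the same role a null object would, giving reduce a safe starting value even for empty input. A minimal sketch:

In [10]:
from functools import reduce
import operator

words = ["alpha", "beta", "gamma"]
assert reduce(operator.add, words, "") == "alphabetagamma"
# the identity element makes the empty case just work
assert reduce(operator.add, [], "") == ""
assert reduce(operator.or_, [{1}, {2}], set()) == {1, 2}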

Strings with concatenation (and ""), lists with concatenation (and []), sets with union (and set()), dictionaries with the merge operator | (and {}) are all monoids. What is even more interesting is that you can nest them, e.g. a tuple of monoids is still a monoid.
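
A minimal sketch of that nesting claim, combining a (str, list) pair componentwise; the combine helper and EMPTY constant are made up for this example:

In [11]:
def combine(p, q):
    """Combine two (str, list) pairs componentwise."""
    return (p[0] + q[0], p[1] + q[1])

EMPTY = ("", [])  # the identity is the pair of the component identities

assert combine(("foo", [1]), ("bar", [2])) == ("foobar", [1, 2])
assert combine(EMPTY, ("foo", [1])) == ("foo", [1]) == combine(("foo", [1]), EMPTY)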

Lattices are useful

Another place where this might be helpful is in distributed systems, e.g. when replicating some data over the wire:

  • your requests could arrive in the wrong order, but if updates were commutative, this wouldn't be a problem
  • the same bit of data might be re-sent multiple times, but if updates were idempotent, this wouldn't be a problem

So we want a thing that is commutative and idempotent, and that sounds like a lattice. There is a good talk by Lindsey Kuper about their usefulness in parallel programming.
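
As a minimal sketch (the replica names and events are made up): set union is commutative and idempotent, so two replicas merging each other's state converge regardless of message order or duplication. This is essentially a grow-only set:

In [12]:
replica_a = {"event-1", "event-2"}
replica_b = {"event-2", "event-3"}

# messages may arrive in either order on different nodes
assert replica_a | replica_b == replica_b | replica_a  # commutative

# ... and the same update may be delivered twice
merged = replica_a | replica_b
assert merged | replica_b == merged  # idempotent: resends are harmless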

Combining the power of abstract math with clarity of abstract math

I think this nicely sums up the sentiment I have about using algebraic patterns. Because I like math, and I like the guarantees it gives me. That if I know something is commutative, I don't need to care whether I do $a \circ b$ or $b \circ a$. That if something is a monoid, I can always initialize a value with its identity element.

On the other hand, I understand that it might bring additional complexity where it wasn't needed previously. Using lattices for your data makes things simpler once you start scaling across a cluster, but they might not support all the operations you need. Or you just don't have them in a library for your language, and the one you need is too complex to implement yourself.

So, let's keep that in mind. Next time, monoids?