Monoids are cool

One useful concept to investigate here is the one of a monoid. If we think of this as an interface that is implemented by a type T, it needs two things:

  • the empty element, usually called mempty
  • the append operation, that takes any two T's and returns a new T

There are three laws:

  • operation is closed over T: for any two objects of type T it will really return new T, no exceptions.
  • identity: for any a of type T: append(a,empty) == append(empty,a) == a
  • associativity: for any a,b,c of type T: append(a,append(b,c)) == append(append(a,b),c)

Now we have this interface, we can do cool things with this :)

And by cool thigs, I mean going through this presentation by Gabriel Gonzalez and translating the concepts from haskell to python :)

If you are not scared of Haskell, I really recomend that presentation. There is even a souce for it.

But first, semigroups

Maybe we are getting ahead of ourselves. What if I don't require the empty element. Even if I just have a binary operation that abides just by the first to laws,

  • it is closed
  • and associative it still makes it realy nice to work with. The fact that I know it is closed means that I don't have to think about the current values I am going to append, and associavity means that I don't really have to think about the order in which my appends get executed. And less thinking I have to do, the better.

This is what we call semigroup.

I my mind, the most useful of the laws is requiring the operation to be closed. Few thing that easily throws wrench into our laws are

  • null
  • exceptions
  • non-termination

In theory, when you use sufficiently advanced type system, you could forbid some of these.

For example in Haskell, Kotlin, Typescript and even C# you are able to define functions that make it a compile-time error, if you pass in a value that might be null.

In Python, we don't have this available by default. Although I'd like to explore the power of Python's typing library.

Useful, but not-semigroups

Of course, there are many useful operations on objects, that are not semigroups. For example, function composition.

In [5]:
def compose(f,g):
    return lambda x: f(g(x))


Function composition is a useful concept, but I just can't compose arbitrary functions together, they have to match.

In [6]:
TypeError                                 Traceback (most recent call last)
<ipython-input-6-780387530514> in <module>()
----> 1 compose(len,len)(10)

<ipython-input-5-6bdc4733c056> in <lambda>(x)
      1 def compose(f,g):
----> 2     return lambda x: f(g(x))
      4 print(compose(len,str)(10))
      5 print(compose(str,abs)(-10))

TypeError: object of type 'int' has no len()

This is applicable to many concepts in stream processing, or dataflow programming, where there is the concept of piping a stream to transform it for processing. If endpoints of the pipes don't fit together, your pipe-line just won't work. This explains why so much of unix tooling deals with text-to-text transformations. The fact that so many of the utilities are closed over text streams makes their usage much more pleasant.

Some monoid examples

But if we have a monoid, there are few cool things we can do with it, as shown in the presentation by Gabriel Gonzalez.

Typeclasses present the first translation hiccup. In haskell you use these to signidy that a type conforms to some interface, similarily to Java/C# interfaces. Nice thing about this is, that you can define implementations for existing types. This makes them more similiar to C# extension methods, or Clojure protocols. Fortunately, we have single dispatch in Python to simulate this.

In [12]:
from functools import singledispatch
def mempty(a):
    raise Error("Not implemented for" + a)

def mappend(a, b):
    raise Error("Not implemented for" + a)

We can definitely implement these for lists.

  • mappend is +, and we know that appending lists is associative
  • mempty is [], and we know that it doesn't matter if you'd append empty list to left or right

We can definitely implement these for lists.

  • mappend is +, and we know that appending lists is associative
  • mempty is [], and we know that it doesn't matter if you'd append empty list to left or right
In [14]:
def _(a):
    return []

def _(a,b):
    return a + b
In [15]:
[1, 2, 3, 4, 5, 6]

We can trivially implement these for None. It might look kind'a silly, but it will be useful, once we get to combine function.

In [16]:
def _(a):
    return None

def _(a,b):
    return None
In [17]:
mappend(None,None) == None

Generic functions

Now we can create a generic function that works on all monoids, such as mconcat, that takes a list and appends its contents.

In [1]:
def mconcat(l):
    acc = l[0]
    for x in l[1:]:
        acc = mappend(acc,x)
    return acc

NameError                                 Traceback (most recent call last)
<ipython-input-1-bb9cceb40914> in <module>()
      5     return acc
----> 7 mconcat([[1,2],[3,4,5]])

<ipython-input-1-bb9cceb40914> in mconcat(l)
      2     acc = l[0]
      3     for x in l[1:]:
----> 4         acc = mappend(acc,x)
      5     return acc

NameError: name 'mappend' is not defined

As you can see, we have run into our first problem. In theory we should be able to do mconcat of an [] and get the mempty for the lists member type. But python doesn't have typed lists. Well, we see how far will this get us :)


Second thing we could do, is to try to nest these inside of other structures. For example, if you have n-tuple with monoids, you can prove, that the n-tuple is monoid. I am lazy to write out the proof, so I just defer to the presentation I am copying from :)

In [19]:
def _(a,b):
    return tuple(mappend(i,j) for i,j in zip(a,b))
In [20]:
([1, 2, 3, 4, 5, 6], [10, 11, 12, 14, 15])

This of course means that we can nest the touples in other touples :)

In [21]:
([1, 2, 3, 4], ([10, 11, 12, 13], ([20, 21, 22, 23], [25, 26, 26, 28])))

Nesting with functions

If we have function f that accepts type A as input param and returns type B, then f forms a semigroup if B forms a semigroup.

Basically, we pass the input into all of the functions and then we append the results.

In [22]:
def _(a,b):
  def result(*x):
    a_r= a(*x)
    return mappend(a_r,b_r)
  return result

This means we can send a single arg to multiple functions that return None. This is where the mappend definition comes useful, because without it we would have seen exceptions here.

In [23]:
def phello(arg):
def phi(arg):

Hello World!
Hi World!

Or they could return function that returns None :)

In [24]:
def promptName():
    name = input("Enter Your Name: ")
    return lambda: print("Hi ", name)

def promptAge():
    age = input("Enter Your Age: ")
    return lambda: print("Your Age is", age)

promptBoth = mappend(promptName,promptAge)
In [25]:
answers = promptBoth()
Enter Your Name: Adam
Enter Your Age: 99
In [26]:
Hi  Adam
Your Age is 99

We could actually get these back, if we return a monoid from tose inner functions.

In [27]:
def getName():
    name = input("Enter Your Name: ")
    return lambda: [name]

def getAge():
    age = input("Enter Your Age: ")
    return lambda: [age]

getBoth = mappend(getName,getAge)
listAnswers = getBoth()
Enter Your Name: Adam
Enter Your Age: 99
['Adam', '99']

Function nesting and laws

Lets look again at the most important law that this mappend needs to uphold.

mappend(f, mappend(g,h)) == mappend(mappend(f,g),h)

, given any functions a,b,c, that

  • accept the same input param
  • and return the same semigroup.

To make this more readable, I will use $\oplus$ instead of mappend. This means, that I want to prove $$ f \oplus (g \oplus h) = (f \oplus g) \oplus h $$

Functions are equivalent when for any input, they produce equal output.

So, lets assume arbitrary input of x, and we will try to evaluate this on the both sides and try get to the same result (we will mark result of $f(x)$ as $f_x$ and we will evaluate the functions from the left):

$$ \begin{align*} (f \oplus (g \oplus h))(x) &= f(x) \oplus (g \oplus h)(x) \\ &= f_x \oplus (g \oplus h) \\ &= f_x \oplus (g(x) \oplus h(x)) \\ &= f_x \oplus (g_x \oplus h(x) )\\ &= f_x \oplus (g_x \oplus h_x) \end{align*} $$$$ \begin{align*} ((f \oplus g) \oplus h)(x) &= (f \oplus g)(x) \oplus h(x) \\ &= (f(x) \oplus g(x)) \oplus h(x) \\ &= (f_x \oplus g(x)) \oplus h(x) \\ &= (f_x \oplus g_x) \oplus h(x) \\ &= (f_x \oplus g_x) \oplus h_x \end{align*} $$

Because we know, that $f_x$, $g_x$ and $h_x$ are from the same semigroup, the last lines are equivalent.

There is another assumption, and that is, that the function application will always start at right. This way, no matter the bracket position, $f(x)$ in our example always gets evaluated first. This is why we can't i.e. flip the evaluation of the functions in our implementation, because ordering matters, and next code sample would introduce a subtle bug.

def _(a,b):
  def result(*x):
    b_r = b(*x)
    a_r = a(*x)
    return mappend(a_r,b_r)
  return result

In math, we know that function will always return the same result for the same input, in python this is not the case. This means, that while sketching out a proof is helpful, we can't rely on it too much and still need to be doing testing.

To conclude

I wouldn't actually advise anybody to go overboard with nesting monoids as a basis for extendable and robust application architecture. Well, in Haskell you can have a field day with this, but in Python there isn't the scaffolding of types to keep the madness in check. Which means that I stop here, and don't try to implement monoid for combining streaming transactions :-)

On the other hand, going just slightly over-board with funciton nesting to create a little configuration parsing dsl might be fun. Maybe next time.