Monoids are cool¶
One useful concept to investigate here is the one of a monoid. If we think of this as an interface that is implemented by a type T, it needs two things:
- the empty element, usually called mempty
- the append operation, that takes any two T's and returns a new T
There are three laws:
- operation is closed over T: for any two objects of type T it will really return new T, no exceptions.
- identity: for any a of type T: append(a,empty) == append(empty,a) == a
- associativity: for any a,b,c of type T: append(a,append(b,c)) == append(append(a,b),c)
Now we have this interface, we can do cool things with this :)
And by cool thigs, I mean going through this presentation by Gabriel Gonzalez and translating the concepts from haskell to python :)
If you are not scared of Haskell, I really recomend that presentation. There is even a souce for it.
But first, semigroups¶
Maybe we are getting ahead of ourselves. What if I don't require the empty element. Even if I just have a binary operation that abides just by the first to laws,
- it is closed
- and associative it still makes it realy nice to work with. The fact that I know it is closed means that I don't have to think about the current values I am going to append, and associavity means that I don't really have to think about the order in which my appends get executed. And less thinking I have to do, the better.
This is what we call semigroup.
I my mind, the most useful of the laws is requiring the operation to be closed. Few thing that easily throws wrench into our laws are
- null
- exceptions
- non-termination
In theory, when you use sufficiently advanced type system, you could forbid some of these.
For example in Haskell, Kotlin, Typescript and even C# you are able to define functions that make it a compile-time error, if you pass in a value that might be null.
In Python, we don't have this available by default. Although I'd like to explore the power of Python's typing library.
Useful, but not-semigroups¶
Of course, there are many useful operations on objects, that are not semigroups. For example, function composition.
def compose(f,g):
return lambda x: f(g(x))
print(compose(len,str)(10))
print(compose(str,abs)(-10))
Function composition is a useful concept, but I just can't compose arbitrary functions together, they have to match.
compose(len,len)(10)
This is applicable to many concepts in stream processing, or dataflow programming, where there is the concept of piping a stream to transform it for processing. If endpoints of the pipes don't fit together, your pipe-line just won't work. This explains why so much of unix tooling deals with text-to-text transformations. The fact that so many of the utilities are closed over text streams makes their usage much more pleasant.
Some monoid examples¶
But if we have a monoid, there are few cool things we can do with it, as shown in the presentation by Gabriel Gonzalez.
Typeclasses present the first translation hiccup. In haskell you use these to signidy that a type conforms to some interface, similarily to Java/C# interfaces. Nice thing about this is, that you can define implementations for existing types. This makes them more similiar to C# extension methods, or Clojure protocols. Fortunately, we have single dispatch in Python to simulate this.
from functools import singledispatch
@singledispatch
def mempty(a):
raise Error("Not implemented for" + a)
@singledispatch
def mappend(a, b):
raise Error("Not implemented for" + a)
We can definitely implement these for lists.
- mappend is +, and we know that appending lists is associative
- mempty is [], and we know that it doesn't matter if you'd append empty list to left or right
We can definitely implement these for lists.
- mappend is +, and we know that appending lists is associative
- mempty is [], and we know that it doesn't matter if you'd append empty list to left or right
@mempty.register(list)
def _(a):
return []
@mappend.register(list)
def _(a,b):
return a + b
mappend([1,2,3],[4,5,6])
We can trivially implement these for None. It might look kind'a silly, but it will be useful, once we get to combine function.
@mempty.register(None.__class__)
def _(a):
return None
@mappend.register(None.__class__)
def _(a,b):
return None
mappend(None,None) == None
Generic functions¶
Now we can create a generic function that works on all monoids, such as mconcat, that takes a list and appends its contents.
def mconcat(l):
acc = l[0]
for x in l[1:]:
acc = mappend(acc,x)
return acc
mconcat([[1,2],[3,4,5]])
As you can see, we have run into our first problem. In theory we should be able to do mconcat of an [] and get the mempty for the lists member type. But python doesn't have typed lists. Well, we see how far will this get us :)
Nesting¶
Second thing we could do, is to try to nest these inside of other structures. For example, if you have n-tuple with monoids, you can prove, that the n-tuple is monoid. I am lazy to write out the proof, so I just defer to the presentation I am copying from :)
@mappend.register((0,0).__class__)
def _(a,b):
return tuple(mappend(i,j) for i,j in zip(a,b))
mappend(([1,2,3],[10,11,12]),([4,5,6],[14,15]))
This of course means that we can nest the touples in other touples :)
mappend(([1,2],([10,11],([20,21],[25,26]))),
([3,4],([12,13],([22,23],[26,28]))))
Nesting with functions¶
If we have function f that accepts type A as input param and returns type B, then f forms a semigroup if B forms a semigroup.
Basically, we pass the input into all of the functions and then we append the results.
@mappend.register(mconcat.__class__)
def _(a,b):
def result(*x):
a_r= a(*x)
b_r=b(*x)
return mappend(a_r,b_r)
return result
This means we can send a single arg to multiple functions that return None. This is where the mappend definition comes useful, because without it we would have seen exceptions here.
def phello(arg):
print("Hello",arg)
def phi(arg):
print("Hi",arg)
mappend(phello,phi)("World!")
Or they could return function that returns None :)
def promptName():
name = input("Enter Your Name: ")
return lambda: print("Hi ", name)
def promptAge():
age = input("Enter Your Age: ")
return lambda: print("Your Age is", age)
promptBoth = mappend(promptName,promptAge)
answers = promptBoth()
answers()
We could actually get these back, if we return a monoid from tose inner functions.
def getName():
name = input("Enter Your Name: ")
return lambda: [name]
def getAge():
age = input("Enter Your Age: ")
return lambda: [age]
getBoth = mappend(getName,getAge)
listAnswers = getBoth()
listAnswers()
Function nesting and laws¶
Lets look again at the most important law that this mappend
needs to uphold.
mappend(f, mappend(g,h)) == mappend(mappend(f,g),h)
, given any functions a
,b
,c
, that
- accept the same input param
- and return the same semigroup.
To make this more readable, I will use $\oplus$ instead of mappend
. This means, that I want to prove
$$ f \oplus (g \oplus h) = (f \oplus g) \oplus h $$
Functions are equivalent when for any input, they produce equal output.
So, lets assume arbitrary input of x
, and we will try to evaluate this on the both sides and try get to the same result (we will mark result of $f(x)$ as $f_x$ and we will evaluate the functions from the left):
Because we know, that $f_x$, $g_x$ and $h_x$ are from the same semigroup, the last lines are equivalent.
There is another assumption, and that is, that the function application will always start at right. This way, no matter the bracket position, $f(x)$ in our example always gets evaluated first. This is why we can't i.e. flip the evaluation of the functions in our implementation, because ordering matters, and next code sample would introduce a subtle bug.
@mappend.register(mconcat.__class__)
def _(a,b):
def result(*x):
b_r = b(*x)
a_r = a(*x)
return mappend(a_r,b_r)
return result
In math, we know that function will always return the same result for the same input, in python this is not the case. This means, that while sketching out a proof is helpful, we can't rely on it too much and still need to be doing testing.
To conclude¶
I wouldn't actually advise anybody to go overboard with nesting monoids as a basis for extendable and robust application architecture. Well, in Haskell you can have a field day with this, but in Python there isn't the scaffolding of types to keep the madness in check. Which means that I stop here, and don't try to implement monoid for combining streaming transactions :-)
On the other hand, going just slightly over-board with funciton nesting to create a little configuration parsing dsl might be fun. Maybe next time.