Catlab refactor II: Basic Sets
$$
$$
This is a sequel post to Catlab Refactor I. Check that one out first!
Catlab.jl is in the process of updating from v0.16
to v0.17
. In the new (v0.17
) version of Catlab, there is now a module BasicSets
, which handles the representation of sets and functions. The theme of this post is that we can use the design pattern of interfaces and implementations (in this post, synonymous with theories and models) to structure our code in a transparent and modular way. This will allow us to use simple types (without type parameters) that have an explicit set of methods that can be expected to work on them, in contrast to more idiomatic Julia which uses a complex type hierarchy and multiple dispatch to control code execution, as described in the previous post.
Set interfaces
We saw the interface for ThSet
in the previous post. Let’s reproduce it here:
Any implementation of ThSet
must pick a Julia datatype for its elements, and we name this type X
. However, the set might be smaller than the entire type. So, part of the data one must always provide, in order to implement ThSet
, is a membership predicate, contains
. This evaluates to true
for the elements of the Julia type X
which are actually in the set.
Now we define a Julia type, SetOb
, which wraps arbitrary implementations of the theory of sets:
Our predicate contains
only accepts arguments which were of the right Julia type, but we can overload the in
method to be more permissive: it returns false
if the input was not of the right type rather than throwing a MethodNotFound
error.
Base.in(x, set::AbsSet) = try
ThFinSet.contains(set, x)
catch e
e isa MethodError || throw(e)
false
end
Next we consider how ThFinSet
extends this theory with a Julia iterator1 and an integer cardinality, i.e. a length
.
@theory ThFinSet <: ThSet begin
Int′::TYPE{Int}
Any′::TYPE{Any}
length()::Int′
iterator()::Any′ # length(collect(iterator()) == length()
end
ThFinSet.Meta.@wrapper FinSet
const AbsSet = Union{SetOb, FinSet} # `AbstractSet` already claimed by Julia!
Now, when you receive a FinSet
, you know you have something which you can check for membership, get the length, and iterate over. There are no type parameters, in contrast to old Catlab:
This extra type parameter S
was needed because we wanted different code to be executed when we were working in \mathsf{Skel(FinSet)} versus when we happened to have some other type of set with Int
elements. This is an example of complicated type acrobatics needed to work directly with multiple dispatch, in contrast to the model-based approach taken in the refactored code.
Set implementations
The theories of sets and finite sets will make more sense if you see some examples of implementations of them. Some of these are shown below:
Product sets
Product sets are very common in math and programming. While they can be represented explicitly as sets of pairs (or, more generally, n-tuples, for the product of a vector of n sets), it is more efficient to just store the vector of SetOb
(or FinSet
) components and lazily generate pairs at runtime as needed.
"""
Unbiased Cartesian product. `S` is either SetOb or FinSet.
`T` is the type of the tuple elements.
"""
struct ProdSet{S<:AbsSet, T}
sets::Vector{S}
ProdSet(sets::Vector{S}) where {S<:AbsSet} =
new{S, Tuple{eltype.(sets)...}}(sets)
end
# Common method for ThSet and ThFinSet implementations
contains_set(model::ProdSet{<:Any, T}, i::T) =
all(e ∈ s for (s, e) in zip(model.sets, i))
@instance ThSet{T} [model::ProdSet{SetOb,T}] where T begin
contains(i::T)::Bool = contains_set(model, i)
end
@instance ThFinSet{T} [model::ProdSet{FinSet,T}] where T begin
contains(i::T)::Bool = contains_set(model, i)
length()::Int = prod(length.(model.sets))
iterator()::Any = Iterators.product(model.sets...)
end
The implementation has type parameters, which we use to our advantage in order to have just one data structure that is parametric in the type of its elements and implements both ThSet
and ThFinSet
. However, these aren’t the problematic kind of type parameters we were trying to avoid: the only way a ProdSet
gets used is after it has been wrapped by a SetOb
or FinSet
, which itself has no type parameters.
FinSetInt
Our favorite implementation of ThFinSet
is FinSetInt
, which is a set {1,2,...,n}
represented by a single Int
:
struct FinSetInt
n::Int
end
@instance ThFinSet{Int} [model::FinSetInt] begin
contains(i::Int)::Bool = 0 < i ≤ model.n
length()::Int = model.n
iterator()::Any = 1:model.n
end
FinSetInt
is not a FinSet
This is a subtle point that is worth stressing. We know how to obtain a FinSet
by wrapping a FinSetInt
, but a FinSetInt
is a concrete datatype, whereas a FinSet
is a wrapper around anything which implements the ThFinSet
interface.
What is counterintuitive is that, even though FinSet
is in some sense more general than FinSetInt
, it is not the case that FinSetInt <: FinSet
, which was true in old Catlab. Because a datatype like FinSetInt
could in principle implement any number of theories, there isn’t any theory in particular whose wrapper type could be privileged as the supertype of FinSetInt
. So, a FinSetInt
is not a FinSet
. When it becomes time to implement the category of \mathsf{Skel(FinSet)}, it will make sense for the objects to be of type FinSetInt
rather than FinSet
. Whereas for the category \mathsf{FinSet}, we should have objects of type FinSet
.
This is important because, when we come to \mathsf{C}-Sets, we’ll be tempted to say “Catlab should know how to take an \mathsf{C}-Set and get a FinSet
for every object of its schema”. But at times we want to interpret the \mathsf{C}-Set as representing a diagram in \mathsf{Skel(FinSet)}, and then we’d want to extract a FinSetInt
rather than a FinSet
from each object in the schema.
This addresses a common class of bugs in the old Catlab: accidentally getting overly strict element types when mapping over collections. When one has a list of FinSetInt
, one doesn’t know how to distinguish from a list of FinSet
where all elements happen to be FinSetInt
and a list where it’s important that they are all FinSetInt
.
Singleton and Empty Sets
The singleton set is a set with exactly one element. We make a particular choice that the Nothing
type (with its unique value nothing
) is the singleton set.
struct SingletonSet end
@instance ThFinSet{Nothing} [model::SingletonSet] begin
contains(::Nothing)::Bool = true
length()::Int = 1
iterator()::Any = [nothing]
end
Likewise, the best choice of a Julia type for the empty set is Union{}
:
struct EmptySet end
@instance ThFinSet{Union{}} [model::EmptySet] begin
contains(::Union{})::Bool = false
length()::Int = 0
iterator()::Any = Union{}[]
end
While we could model these sets as FinSetInt(1)
and FinSetInt(0)
,2 respectively, there are downstream benefits of not presuming that these special sets have Int
as the Julia type for their values.3
Old versus new implementations
Implementations of ThSet
correspond to subtypes of the abstract type SetOb
in old Catlab. While old Catlab had TypeSet
and PredicatedSet
, the refactor extends this with a UnionSet
, ProdSet
, and various kinds of sum types: EitherSet
(values wrapped in Left
or Right
), SumSet
(values tagged by integers), and NamedSumSet
(values tagged by arbitrary keys).
For subtypes of FinSet
, old Catlab had FinSetInt
, FinSetCollection
(wrapper around a vector or Julia Set
) and TabularSet
. The refactor includes these in addition to EmptyFinSet
, SingletonSet
, EnumSet
(an @enum
type regarded as a FinSet
), and the same flavors of union/product/sum types listed above, specialized to finite sets.
Default implementations
Suppose I want a TypeSet
, which in the previous post we saw is a particular implementation of ThSet
that is specified by Julia type.
In old Catlab, one simply writes TypeSet(String)
and gets a SetOb
; this is because we used to have TypeSet <: SetOb
. That is no longer the case, so it seems like we’ve lost something; we have to do this tedious wrapping if we want to use TypeSet(String)
as the codom of a function, since the function is expecting a model wrapped in a smart constructor, not a raw model (which could be any Julia type).
However, in all honesty, how many different implementations of SetOb
are there? (not many) And how many of them simply take in a Julia type as their only data? (just one). Thus we should feel entitled to define the following shortcut:
Now it is just as easy as before (though we use SetOb
rather than TypeSet
):
The general lesson is: if there is a ‘canonical’ implementation for some given data, we can define a method for the wrapper type to simply take in that data, call the canonical implementation on the data and then wrap that. This pattern for model wrapper types gets used all the time throughout the refactor. Another example would be:
There are times where there isn’t a canonical implementation. Consider:
We should not define a such a method. Do we want the product? The disjoint union? In those cases we really ought be explicit about which implementation we intend.
Function interfaces
Let’s look at the theory for functions in the new Catlab refactor, starting with a core theory shared by SetFunction
, FinFunction
, and FinDomFunction
:
The ThFunctionCore
interface states that we require function implementations to apply a function to some Julia type Dom
to get elements of some other Julia type Codom
.
abstract type SetFunction′ end # *only* subtyped by SetFunction
@theory ThSetFunction <: ThFunctionCore begin
Fun′::TYPE{SetFunction′}
DomSet::TYPE{SetOb}
CodSet::TYPE{SetOb}
dom()::DomSet # eltype(dom()) <: Dom
codom()::CodSet # eltype(codom()) <: Cod
postcompose(t::Fun′)::Fun′
end
ThSetFunction.Meta.@wrapper SetFunction
Implementations of the theory of ThSetFunction
can be understand as morphisms in \mathsf{Set}. For these we require that one must supply a (co)domain SetOb
4 and be able to postcompose another function. When we eventually compose
these SetFunctions
(in \mathsf{Set}), the immediate result will be a sequence of functions. However, for performance considerations, we’ll want to be able to evaluate this sequence into a single SetFunction
(via a force
method which calls the underlying postcompose
methods). Being able to repeatedly post-compose the first function in such a sequence is sufficient for this task and is often a more natural choice than precomposing.5
We can overload the call syntax of Julia to use the app
function.
The theory of functions, now thought of as living in \mathsf{FinSet}, is very similar:
abstract type FinFunction′ end # *only* subtyped by FinFunction
@theory ThFinFunction <: ThFunctionCore begin
Fun′::TYPE{FinFunction′}
DomSet::TYPE{FinSet}
CodSet::TYPE{FinSet}
dom()::DomSet # eltype(dom()) <: Dom
codom()::CodSet # eltype(codom()) <: Cod
postcompose(t::Fun′)::Fun′
end
ThFinFunction.Meta.@wrapper FinFunction <: FinFunction′
Note in both cases we did use an abstract type to avoid a chicken-egg problem: in the theory of finite functions we need to refer to the wrapper type for models of the theory of finite functions (which we can’t really define until the theory has been defined). Because only FinFunction
type subtypes FinFunction′
, we can treat the two types as equivalent.
The theory for a FinDomFunction
is similar, where DomSet
must be a FinSet
but CodSet
must be a SetOb
.
Function instances
Let’s see a couple models of these theories to get a sense of them:
Identity functions
An identity function wraps a set, either SetOb
or FinSet
. Depending on which of those two options it received, it can either implement ThSetFunction
or ThFinFunction
.
struct IdentityFunction{S<:AbsSet, T}
set::S
IdentityFunction(d::S) where S<:AbsSet = new{S, eltype(d)}(d)
end
@instance ThFinFunction{T,T} [model::IdentityFunction{FinSet, T}] where T begin
dom()::FinSet = model.set
codom()::FinSet = model.set
app(i::T)::T = i
postcompose(f::FinFunction′)::FinFunction′ = f
end
@instance ThSetFunction{T,T} [model::IdentityFunction{SetOb,T}] where T begin
dom()::SetOb = model.set
codom()::SetOb = model.set
app(i::T)::T = i
postcompose(f::SetFunction′)::SetFunction′ = f
end
Vector-based finite functions
A very common kind of function is a function with a FinSetInt
domain. When we are in this fortunate situation, we can express a function out of that domain with a vector of values (where the values all live in some codomain set). This is handled by the following implementation:
struct FinFunctionVector{S<:AbsSet,T}
val::AbstractVector
codom::S
FinFunctionVector(val::AbstractVector,codom::S) where {S<:AbsSet} =
new{S,eltype(codom)}(val, codom)
end
@instance ThFinFunction{Int,T} [model::FinFunctionVector{FinSet,T}] where T begin
dom()::FinSet = FinSet(length(model.val))
codom()::FinSet = model.codom
app(i::Int)::T = model.val[i]
postcompose(f::FinFunction′)::FinFunction′ =
FinFunction(FinFunctionVector(f.(model.val), codom(f)))
end
@instance ThFinDomFunction{Int,T} [model::FinFunctionVector{SetOb,T}] where T begin
dom()::FinSet = FinSet(length(model.val))
codom()::SetOb = model.codom
app(i::Int)::T = model.val[i]
postcompose(f::SetFunction)::FinDomFunction =
FinDomFunction(FinFunctionVector(f.(model.val), codom(f)))
end
Old versus new implementations
The Catlab refactor has models for all the subtypes of SetFunction
/FinDomFunction
/FinFunction
in old Catlab. These are CallableFunction
, CompositeFunction
, ConstantFunction
, IdentityFunction
, PredicatedFunction
, FinFunctionVector
, IndexedFinFunctionVector
and FinFunctionDict
.
Contrasting design of functions with old Catlab
There was not one force
method for SetFunction
but rather a variety of force
methods for various subtypes of SetFunction
. And those process of evaluating a sequence of functions was handled by type dispatch - any pair of implementations of functions could have its own idiosyncratic method for evaluating to a single implementation. This led to lots of complexity relative to the interface+model approach: it was harder to know which code will be actually executed given complicated types and it was harder to locate the relevant code within the codebase, as there is no obvious place for do_compose(f::MyImplA, g::MyImplB)
to go, when MyImplA
and MyImplB
are defined in different places.
Regarding the complexity of type parameters for functions in old Catlab: the type FinFunction
had four type parameters ({S, S′, Dom <: FinSet{S}, Codom <: FinSet{S′}}
) in order to manage their dispatch. This gave us for free that every FinFunction
was a FinDomFunction
and every FinDomFunction
was a SetFunction
. This was often convenient but sometimes led to pernicious bugs where we really want to be able to make this distinction (the same reasons as above for not enforcing FinSetInt <: FinSet
or FinSet <: SetOb
).
File structure and future posts
The v0.16
to v0.17
refactor is concentrated on Catlab’s CategoricalAlgebra
module. This was a folder with about 15 files in it, including FinCats
(826 Lines Of Code), FinSets
(1507 LOC), CSets
(1418 LOC), FunctorialDataMigrations
(456 LOC).
In the refactored version, these large files have been broken down into smaller ones which for the most part either declare a theory or declare a data structure and show how that data structure is a model of various theories. Another major structural change is to subdivide this into a sequence of modules:
Cats
(general machinery for categories, functors, limits, etc.)SetCats
(specializing machinery inCats
to Set-like categories, including FinSet)Pointwise
(specializing machinery inCats
to \mathsf{C}-Set-like categories, such as ACSets and VarACSets).
We’ll cover these in a follow-up blog post!
Footnotes
Julia iterators are duck-typed; there is no single abstract type that unifies all the sorts of things which we can call
iterate
on. For that reason, our interface cannot be more precise than to say the iterator is of typeAny
. It is up to implementers ofThFinSet
that the value returned byiterator()
is iterable. Furthermore, it’s up to the implementor to make sure thelength()
result is consistent with theiterator()
result, as shown in the commented equation.↩︎We do use
FinSetInt(1)
andFinSetInt(0)
for the terminal and initial objects whenever we are working in \mathsf{Skel(FinSet)} rather than \mathsf{FinSet}.↩︎For example, when unioning a set with the empty set, the element type will be the union of the Julia types for the respective sets. If the empty set were represented with
FinSet(0)
, then the element type ofUnionSet(set_containing_strings, empty_set)
would beUnion{Int,String}
rather than justString
as we’d hope for.↩︎The expressivity of GATs doesn’t let us talk about a
TYPE
whose terms are precisely the domain elements: rather we have a Julia typeDom
for which a subset of elements (those in the setdom()
) are defined (likewise forCod
). Theapp
function sends everyDom
to aCod
, which isn’t as constrained as sending every element ofdom()
to an element ofcodom()
. It is expected, but can only be enforced at runtime, thateltype(dom()) <: Dom
(likewise forCod
). These things could be improved in the future by moving to a richer language than GATs, though it is fine in practice and is no worse than how things previously worked.↩︎Often postcomposing allows one to keep the same structure but just update the values, e.g. a function (defined on the domain \{1,...,n\}) represented by a length-n vector of values can be postcomposed with f by just applying f to each value in the vector. However, there are some function implementations which do fundamentally change upon postcomposition, such as postcomposing an
IdentityFunction
with some other f or postcomposing some f with aConstantFunction
.↩︎