Catlab refactor II: Basic Sets

code
Examples of theories and models for sets and functions in Catlab.jl.
Author

Kris Brown

Published

March 25, 2025

This is a sequel post to Catlab Refactor I. Check that one out first!

Catlab.jl is in the process of updating from v0.16 to v0.17. In the new (v0.17) version of Catlab, there is now a module BasicSets, which handles the representation of sets and functions. The theme of this post is that we can use the design pattern of interfaces and implementations (in this post, synonymous with theories and models) to structure our code in a transparent and modular way. This will allow us to use simple types (without type parameters) that have an explicit set of methods that can be expected to work on them, in contrast to more idiomatic Julia which uses a complex type hierarchy and multiple dispatch to control code execution, as described in the previous post.

Set interfaces

We saw the interface for ThSet in the previous post. Let’s reproduce it here:

@theory ThSet begin
  X::TYPE
  Bool′::TYPE{Bool}
  contains(elem::X)::Bool′
end

Any implementation of ThSet must pick a Julia datatype for its elements, and we name this type X. However, the set might be smaller than the entire type. So, part of the data one must always provide, in order to implement ThSet, is a membership predicate, contains. This evaluates to true for the elements of the Julia type X which are actually in the set.

Now we define a Julia type, SetOb, which wraps arbitrary implementations of the theory of sets:

ThSet.Meta.@wrapper SetOb # `Set` already claimed by Julia!

Our predicate contains only accepts arguments which were of the right Julia type, but we can overload the in method to be more permissive: it returns false if the input was not of the right type rather than throwing a MethodNotFound error.

Base.in(x, set::AbsSet) = try 
  ThFinSet.contains(set, x)
catch e 
  e isa MethodError || throw(e)
  false
end

Next we consider how ThFinSet extends this theory with a Julia iterator1 and an integer cardinality, i.e. a length.

@theory ThFinSet <: ThSet begin
  Int′::TYPE{Int}
  Any′::TYPE{Any}
  length()::Int′
  iterator()::Any′ # length(collect(iterator()) == length()
end

ThFinSet.Meta.@wrapper FinSet

const AbsSet = Union{SetOb, FinSet} # `AbstractSet` already claimed by Julia!

Now, when you receive a FinSet, you know you have something which you can check for membership, get the length, and iterate over. There are no type parameters, in contrast to old Catlab:

abstract type FinSet{S,T} <: SetOb{T} end # OLD CATLAB

This extra type parameter S was needed because we wanted different code to be executed when we were working in \mathsf{Skel(FinSet)} versus when we happened to have some other type of set with Int elements. This is an example of complicated type acrobatics needed to work directly with multiple dispatch, in contrast to the model-based approach taken in the refactored code.

Set implementations

The theories of sets and finite sets will make more sense if you see some examples of implementations of them. Some of these are shown below:

Product sets

Product sets are very common in math and programming. While they can be represented explicitly as sets of pairs (or, more generally, n-tuples, for the product of a vector of n sets), it is more efficient to just store the vector of SetOb (or FinSet) components and lazily generate pairs at runtime as needed.

""" 
Unbiased Cartesian product. `S` is either SetOb or FinSet. 
`T` is the type of the tuple elements.
"""
struct ProdSet{S<:AbsSet, T}
  sets::Vector{S}
  ProdSet(sets::Vector{S}) where {S<:AbsSet} = 
    new{S, Tuple{eltype.(sets)...}}(sets)
end

# Common method for ThSet and ThFinSet implementations
contains_set(model::ProdSet{<:Any, T}, i::T) = 
  all(e  s for (s, e) in zip(model.sets, i))

@instance ThSet{T} [model::ProdSet{SetOb,T}] where T begin
  contains(i::T)::Bool = contains_set(model, i)
end

@instance ThFinSet{T} [model::ProdSet{FinSet,T}] where T begin
  contains(i::T)::Bool = contains_set(model, i)
  length()::Int = prod(length.(model.sets))
  iterator()::Any = Iterators.product(model.sets...)
end

The implementation has type parameters, which we use to our advantage in order to have just one data structure that is parametric in the type of its elements and implements both ThSet and ThFinSet. However, these aren’t the problematic kind of type parameters we were trying to avoid: the only way a ProdSet gets used is after it has been wrapped by a SetOb or FinSet, which itself has no type parameters.

FinSetInt

Our favorite implementation of ThFinSet is FinSetInt, which is a set {1,2,...,n} represented by a single Int:

struct FinSetInt
  n::Int
end 

@instance ThFinSet{Int} [model::FinSetInt] begin
  contains(i::Int)::Bool = 0 < i  model.n
  length()::Int = model.n
  iterator()::Any = 1:model.n
end
A FinSetInt is not a FinSet

This is a subtle point that is worth stressing. We know how to obtain a FinSet by wrapping a FinSetInt, but a FinSetInt is a concrete datatype, whereas a FinSet is a wrapper around anything which implements the ThFinSet interface.

What is counterintuitive is that, even though FinSet is in some sense more general than FinSetInt, it is not the case that FinSetInt <: FinSet, which was true in old Catlab. Because a datatype like FinSetInt could in principle implement any number of theories, there isn’t any theory in particular whose wrapper type could be privileged as the supertype of FinSetInt. So, a FinSetInt is not a FinSet. When it becomes time to implement the category of \mathsf{Skel(FinSet)}, it will make sense for the objects to be of type FinSetInt rather than FinSet. Whereas for the category \mathsf{FinSet}, we should have objects of type FinSet.

This is important because, when we come to \mathsf{C}-Sets, we’ll be tempted to say “Catlab should know how to take an \mathsf{C}-Set and get a FinSet for every object of its schema”. But at times we want to interpret the \mathsf{C}-Set as representing a diagram in \mathsf{Skel(FinSet)}, and then we’d want to extract a FinSetInt rather than a FinSet from each object in the schema.

This addresses a common class of bugs in the old Catlab: accidentally getting overly strict element types when mapping over collections. When one has a list of FinSetInt, one doesn’t know how to distinguish from a list of FinSet where all elements happen to be FinSetInt and a list where it’s important that they are all FinSetInt.

Singleton and Empty Sets

The singleton set is a set with exactly one element. We make a particular choice that the Nothing type (with its unique value nothing) is the singleton set.

struct SingletonSet end 

@instance ThFinSet{Nothing} [model::SingletonSet] begin
  contains(::Nothing)::Bool = true
  length()::Int = 1
  iterator()::Any = [nothing]
end

Likewise, the best choice of a Julia type for the empty set is Union{}:

struct EmptySet end 

@instance ThFinSet{Union{}} [model::EmptySet] begin
  contains(::Union{})::Bool = false
  length()::Int = 0
  iterator()::Any = Union{}[]
end

While we could model these sets as FinSetInt(1) and FinSetInt(0),2 respectively, there are downstream benefits of not presuming that these special sets have Int as the Julia type for their values.3

Old versus new implementations

Implementations of ThSet correspond to subtypes of the abstract type SetOb in old Catlab. While old Catlab had TypeSet and PredicatedSet, the refactor extends this with a UnionSet, ProdSet, and various kinds of sum types: EitherSet (values wrapped in Left or Right), SumSet (values tagged by integers), and NamedSumSet (values tagged by arbitrary keys).

For subtypes of FinSet, old Catlab had FinSetInt, FinSetCollection (wrapper around a vector or Julia Set) and TabularSet. The refactor includes these in addition to EmptyFinSet, SingletonSet, EnumSet (an @enum type regarded as a FinSet), and the same flavors of union/product/sum types listed above, specialized to finite sets.

Default implementations

Suppose I want a TypeSet, which in the previous post we saw is a particular implementation of ThSet that is specified by Julia type.

string_set = SetOb(TypeSet(String))
int_set = SetOb(TypeSet(Int))

In old Catlab, one simply writes TypeSet(String) and gets a SetOb; this is because we used to have TypeSet <: SetOb. That is no longer the case, so it seems like we’ve lost something; we have to do this tedious wrapping if we want to use TypeSet(String) as the codom of a function, since the function is expecting a model wrapped in a smart constructor, not a raw model (which could be any Julia type).

However, in all honesty, how many different implementations of SetOb are there? (not many) And how many of them simply take in a Julia type as their only data? (just one). Thus we should feel entitled to define the following shortcut:

""" Default model for a Set made out of a Julia `Type` """
SetOb(T::Type) = SetOb(TypeSet{T}())

Now it is just as easy as before (though we use SetOb rather than TypeSet):

string_set = SetOb(String)
int_set = SetOb(Int)

The general lesson is: if there is a ‘canonical’ implementation for some given data, we can define a method for the wrapper type to simply take in that data, call the canonical implementation on the data and then wrap that. This pattern for model wrapper types gets used all the time throughout the refactor. Another example would be:

FinSet(::Nothing) = FinSet(SingletonSet())

There are times where there isn’t a canonical implementation. Consider:

FinSet(s1::FinSet, s2::FinSet) = ...

We should not define a such a method. Do we want the product? The disjoint union? In those cases we really ought be explicit about which implementation we intend.

Function interfaces

Let’s look at the theory for functions in the new Catlab refactor, starting with a core theory shared by SetFunction, FinFunction, and FinDomFunction:

@theory ThFunctionCore begin
  Dom::TYPE
  Cod::TYPE
  app(e::Dom)::Cod
end

The ThFunctionCore interface states that we require function implementations to apply a function to some Julia type Dom to get elements of some other Julia type Codom.

abstract type SetFunction′ end # *only* subtyped by SetFunction

@theory ThSetFunction <: ThFunctionCore begin
  Fun′::TYPE{SetFunction′}
  DomSet::TYPE{SetOb}
  CodSet::TYPE{SetOb}
  dom()::DomSet # eltype(dom()) <: Dom
  codom()::CodSet # eltype(codom()) <: Cod
  postcompose(t::Fun′)::Fun′
end

ThSetFunction.Meta.@wrapper SetFunction

Implementations of the theory of ThSetFunction can be understand as morphisms in \mathsf{Set}. For these we require that one must supply a (co)domain SetOb4 and be able to postcompose another function. When we eventually compose these SetFunctions (in \mathsf{Set}), the immediate result will be a sequence of functions. However, for performance considerations, we’ll want to be able to evaluate this sequence into a single SetFunction (via a force method which calls the underlying postcompose methods). Being able to repeatedly post-compose the first function in such a sequence is sufficient for this task and is often a more natural choice than precomposing.5

We can overload the call syntax of Julia to use the app function.

(f::SetFunction)(x) = ThSetFunction.app(f, x)

The theory of functions, now thought of as living in \mathsf{FinSet}, is very similar:

abstract type FinFunction′ end # *only* subtyped by FinFunction

@theory ThFinFunction <: ThFunctionCore begin
  Fun′::TYPE{FinFunction′}
  DomSet::TYPE{FinSet}
  CodSet::TYPE{FinSet}
  dom()::DomSet # eltype(dom()) <: Dom
  codom()::CodSet # eltype(codom()) <: Cod
  postcompose(t::Fun′)::Fun′
end

ThFinFunction.Meta.@wrapper FinFunction <: FinFunction′

Note in both cases we did use an abstract type to avoid a chicken-egg problem: in the theory of finite functions we need to refer to the wrapper type for models of the theory of finite functions (which we can’t really define until the theory has been defined). Because only FinFunction type subtypes FinFunction′, we can treat the two types as equivalent.

The theory for a FinDomFunction is similar, where DomSet must be a FinSet but CodSet must be a SetOb.

Function instances

Let’s see a couple models of these theories to get a sense of them:

Identity functions

An identity function wraps a set, either SetOb or FinSet. Depending on which of those two options it received, it can either implement ThSetFunction or ThFinFunction.

struct IdentityFunction{S<:AbsSet, T}
  set::S
  IdentityFunction(d::S) where S<:AbsSet = new{S, eltype(d)}(d)
end

@instance ThFinFunction{T,T} [model::IdentityFunction{FinSet, T}] where T begin
  dom()::FinSet = model.set
  codom()::FinSet = model.set
  app(i::T)::T = i
  postcompose(f::FinFunction′)::FinFunction′ = f
end

@instance ThSetFunction{T,T} [model::IdentityFunction{SetOb,T}] where T begin
  dom()::SetOb = model.set
  codom()::SetOb = model.set
  app(i::T)::T = i
  postcompose(f::SetFunction′)::SetFunction′ = f
end

Vector-based finite functions

A very common kind of function is a function with a FinSetInt domain. When we are in this fortunate situation, we can express a function out of that domain with a vector of values (where the values all live in some codomain set). This is handled by the following implementation:

struct FinFunctionVector{S<:AbsSet,T}
  val::AbstractVector
  codom::S
  FinFunctionVector(val::AbstractVector,codom::S) where {S<:AbsSet} =
    new{S,eltype(codom)}(val, codom)
end

@instance ThFinFunction{Int,T} [model::FinFunctionVector{FinSet,T}] where T begin
  dom()::FinSet = FinSet(length(model.val))
  codom()::FinSet = model.codom
  app(i::Int)::T = model.val[i]
  postcompose(f::FinFunction′)::FinFunction′ =
    FinFunction(FinFunctionVector(f.(model.val), codom(f)))
end

@instance ThFinDomFunction{Int,T} [model::FinFunctionVector{SetOb,T}] where T begin
  dom()::FinSet = FinSet(length(model.val))
  codom()::SetOb = model.codom
  app(i::Int)::T = model.val[i]
  postcompose(f::SetFunction)::FinDomFunction =
    FinDomFunction(FinFunctionVector(f.(model.val), codom(f)))
end

Old versus new implementations

The Catlab refactor has models for all the subtypes of SetFunction/FinDomFunction/FinFunction in old Catlab. These are CallableFunction, CompositeFunction, ConstantFunction, IdentityFunction, PredicatedFunction, FinFunctionVector, IndexedFinFunctionVector and FinFunctionDict.

Contrasting design of functions with old Catlab

There was not one force method for SetFunction but rather a variety of force methods for various subtypes of SetFunction. And those process of evaluating a sequence of functions was handled by type dispatch - any pair of implementations of functions could have its own idiosyncratic method for evaluating to a single implementation. This led to lots of complexity relative to the interface+model approach: it was harder to know which code will be actually executed given complicated types and it was harder to locate the relevant code within the codebase, as there is no obvious place for do_compose(f::MyImplA, g::MyImplB) to go, when MyImplA and MyImplB are defined in different places.

Regarding the complexity of type parameters for functions in old Catlab: the type FinFunction had four type parameters ({S, S′, Dom <: FinSet{S}, Codom <: FinSet{S′}}) in order to manage their dispatch. This gave us for free that every FinFunction was a FinDomFunction and every FinDomFunction was a SetFunction. This was often convenient but sometimes led to pernicious bugs where we really want to be able to make this distinction (the same reasons as above for not enforcing FinSetInt <: FinSet or FinSet <: SetOb).

File structure and future posts

The v0.16 to v0.17 refactor is concentrated on Catlab’s CategoricalAlgebra module. This was a folder with about 15 files in it, including FinCats (826 Lines Of Code), FinSets (1507 LOC), CSets (1418 LOC), FunctorialDataMigrations (456 LOC).

In the refactored version, these large files have been broken down into smaller ones which for the most part either declare a theory or declare a data structure and show how that data structure is a model of various theories. Another major structural change is to subdivide this into a sequence of modules:

  1. Cats (general machinery for categories, functors, limits, etc.)
  2. SetCats (specializing machinery in Cats to Set-like categories, including FinSet)
  3. Pointwise (specializing machinery in Cats to \mathsf{C}-Set-like categories, such as ACSets and VarACSets).

We’ll cover these in a follow-up blog post!

Footnotes

  1. Julia iterators are duck-typed; there is no single abstract type that unifies all the sorts of things which we can call iterate on. For that reason, our interface cannot be more precise than to say the iterator is of type Any. It is up to implementers of ThFinSet that the value returned by iterator() is iterable. Furthermore, it’s up to the implementor to make sure the length() result is consistent with the iterator() result, as shown in the commented equation.↩︎

  2. We do use FinSetInt(1) and FinSetInt(0) for the terminal and initial objects whenever we are working in \mathsf{Skel(FinSet)} rather than \mathsf{FinSet}.↩︎

  3. For example, when unioning a set with the empty set, the element type will be the union of the Julia types for the respective sets. If the empty set were represented with FinSet(0), then the element type of UnionSet(set_containing_strings, empty_set) would be Union{Int,String} rather than just String as we’d hope for.↩︎

  4. The expressivity of GATs doesn’t let us talk about a TYPE whose terms are precisely the domain elements: rather we have a Julia type Dom for which a subset of elements (those in the set dom()) are defined (likewise for Cod). The app function sends every Dom to a Cod, which isn’t as constrained as sending every element of dom() to an element of codom(). It is expected, but can only be enforced at runtime, that eltype(dom()) <: Dom (likewise for Cod). These things could be improved in the future by moving to a richer language than GATs, though it is fine in practice and is no worse than how things previously worked.↩︎

  5. Often postcomposing allows one to keep the same structure but just update the values, e.g. a function (defined on the domain \{1,...,n\}) represented by a length-n vector of values can be postcomposed with f by just applying f to each value in the vector. However, there are some function implementations which do fundamentally change upon postcomposition, such as postcomposing an IdentityFunction with some other f or postcomposing some f with a ConstantFunction.↩︎