Catlab Refactor I: GATlab Preliminaries
$$
$$
This is a first blog post of a series which will guide you through the upcoming Catlab major refactor. At the core of this refactor is another AlgebraicJulia package, GATlab.jl. This post explains how and why GATlab is important to Catlab.
Multiple dispatch vs explicit interfaces with GATlab
The key concept underlying this change is a contrast between two strategies for controlling what method gets executed when you type something like compose(x,y)
. Julia is ultimately governed by a strategy called multiple dispatch. One is allowed to define the following:
# objects are sets {1,2,...,n}
# morphisms n->m are functions, expressed as vectors of length n
id(x::Int)::Vector{Int} = 1:i
compose(f::Vector{Int}, g::Vector{Int})::Vector{Int} = g[f]
dom(f::Vector{Int}) = length(f)
# there is one object (`nothing` of the unit type, `Nothing`)
# morphisms are lists of things
id(x::Nothing)::Vector = []
compose(f::Vector, g::Vector)::Vector = [f..., g...]
dom(f::Vector)::Nothing = nothing
Which code gets executed when you call compose(x,y)
depends on the type of the arguments x
and y
. A problem occurs when two different categories have the same objects:
id(x::Int)::Matrix = I(x) # identity matrix
compose(f::Matrix,g::Matrix)::Matrix = f*g # matrix multiplication
dom(f::Matrix)::Int = first(size(f)) # number of rows
However, now we’ve defined id(x::Int)
twice, a problem which Julia will warn us about. We can circumvent this limitation in two ways. The first is to use a wrapper type:
# Just an integer, but formally a different type
# so multiple dispatch can treat it differently
struct MatrixOb
val::Int
end
id(x::MatrixOb) = I(x.val)
dom(f::Matrix) = MatrixOb(first(size(f)))
This is common design pattern in Julia, advertised in the earliest work on the language, and Catlab has often taken this approach. However, it can become very inconvenient. In order to have fine-grained control over the dispatch, Catlab was forced to promote an unwieldy amount of information into the type parameters of its data structures.
Another strategy to address this problem is to control dispatch by using an extra argument parameter, a strategy known as “Holy traits” in Julia. For technical reasons, we call such a parameter a model (of a theory). Such parameters are often singleton types, like below:
struct FinSetCat end # a model
id(::FinSetCat, x::Int) = 1:i
compose(::FinSetCat, f::Vector{Int}, g::Vector{Int}) = g[f]
dom(::FinSetCat, f::Vector{Int}) = length(f)
struct MatrixCat end # a model
id(::MatrixCat, x::Int) = I(x)
compose(::MatrixCat, f::Matrix,g::Matrix) = f*g
dom(::MatrixCat, f::Matrix) = first(size(f))
# model parameter used to control multiple dispatch
id(FinSetCat(), 4) == [1,2,3,4]
id(MatrixCat(), 2) == [1 0; 0 1]
An advantage of the latter approach is that it captures the fact that the collection of methods id
, compose
, dom
, etc. are grouped together into some cohesive unit (in other languages, this would be an interface or a typeclass). GATlab.jl allows us to make these groupings of methods explicit via GATs (generalized algebraic theories), which play the role of interfaces:
@theory ThCategory begin
# Syntactic sugar
@op begin
(→) := Hom
(⋅) := compose
end
# Types
Ob::TYPE
Hom(dom::Ob, codom::Ob)::TYPE
# Operations
id(A::Ob)::(A → A)
compose(f::(A → B), g::(B → C))::(A → C) ⊣ [A::Ob, B::Ob, C::Ob]
# Laws that the operations must satisfy
(f ⋅ g) ⋅ h == f ⋅ (g ⋅ h) ⊣ [A::Ob, B::Ob, C::Ob, D::Ob,
f::(A → B), g::(B → C), h::(C → D)]
f ⋅ id(B) == f ⊣ [A::Ob, B::Ob, f::(A → B)]
id(A) ⋅ f == f ⊣ [A::Ob, B::Ob, f::(A → B)]
end
GATlab also provides an @instance
macro to allow declaring a particular model for a theory:
@instance ThCategory{Ob=Int, Hom=Vector{Int}} [model::MatrixCat] begin
id(x::MatrixOb) = I(x.val)
compose(f::Matrix,g::Matrix)::Matrix = f*g
dom(f::Matrix)::MatrixOb = first(size(f))
codom(f::Matrix)::MatrixOb = last(size(f)))
end
Here the @instance
macro is handling lots of boilerplate; most importantly it defines the methods which have MatrixCat
as its first parameter, throwing an error if any operations have not been provided. Additionally, a getindex
method is generated to allow for passing in a model as the first parameter:
The WithModel
wrapper is owned by GATlab and helps us distinguish when a Julia value is used as a model rather than as a piece of data.
In summary, the major refactor of Catlab is a major shift away from type dispatch via intricate types and towards using explicit interfaces and models. Although Julia doesn’t provide a built-in notion of interface, with the help of Julia’s macro system, we can create an ergonomic interface system via a library.
Wrapped models
GATlab allows us to declare that some Julia type M
is a model of some theory T
. Once we do this, we can write code that takes in both raw data and models which guide how to process that data:
""" Expects `model` to be an instance of `ThCategory` """
function compose_three(model, x, y, z)
compose[model](compose[model](x,y), z)
end
We could make our assumptions more explicit (and get better errors than MethodNotFound
) if we do the following:
function compose_three(model, x::H, y::H, z::H) where {H}
implements(model, ThCategory, [Union{}, H]) || error(
"Model $model doesn't implement ThCategory with homs of type $H")
compose[model](compose[model](x,y), z)
end
implements
is provided by GATlab.1 However, we can do better with smart constructors - for example:
struct Category{Ob,Hom}
model::Any
function Category(model)
implements(model, ThCategory) || error("Bad model $model")
new{O,H}(model)
end
end
compose(c::Category, f, g) = compose[c.model](f, g)
# etc, for other methods
Now we can write a well-typed function which will not have any runtime errors:
compose_three(model::Category{O,H}, x::H, y::H, z::H) where {O,H} =
compose(model, compose(model, x,y), z)
Trying to use a Julia value that isn’t a model of ThCategory
will be caught earlier, whenever we tried to wrap it in the Category
smart constructor. GATlab can automatically generate two wrapper types for any GAT (a version which has type parameters and one which has no type parameters).
The Catlab refactor uses these smart constructors when possible, but there is one scenario in which the former ‘risky’ strategy is used: a Julia value can be a model of multiple theories. What if our function needs to take in such a model with the intent of using it for T₁
,T₂
, and T₃
? Until we have a wrapper for the union of these theories it’s more convenient to just pass in the ‘raw’ model and use it as a model of various theories:
The best way to address this is to glue the theories together T = T₁+T₂+T₃
(formally a pushout, which allows one to coherently do “multiple-inheritance” of interfaces) and make a wrapper type for theory T
. As our theory morphism2 infrastructure becomes more robust, we can rely less on the risky workaround and use this as the preferred solution.
GATs with fixed Julia types
A model of a theory assigns a Julia type for each type constructor of the theory (e.g. {Ob = Int, Hom = Matrix}
). Sometimes we know at the time we’re writing our theory that we have a particular Julia type in mind for a type constructor. GATlab provides the syntax MyTypeCon::TYPE{MyJuliaType}
within a @theory
declaration to allow one to make explicit that all models of the theory should send the type constructor MyTypeCon
to the julia type MyJuliaType
.
For example, Catlab has a DiscreteDiagram
type which wraps a vector of objects in some category. Any model which implements the below interface is a means of taking a discrete diagram into a AbstractColimit
and applying the universal property of that colimit to a cocone (i.e. multi-cospan) to obtain a hom from the apex of the colimit into the apex of the cocone.
@theory ThCategoryUnbiasedCoproducts <: ThCategoryColimitBase begin
DiscDiag::TYPE{DiscreteDiagram}
colimit(d::DiscDiag)::Colimit
universal(Σ::Colimit, d::DiscDiag, csp::MCospan)::(ob(Σ) → apex(csp))
end
# Note: `ThCategoryColimitBase <: ThCategory` fixes
# `Colimit` to the Julia type `AbstractColimit` and
# `MCospan` to the Julia type `Multicospan`
Without asserting various types of this theory should be sent to specific Julia types, the semantics of what colimit
means would only be constrained by documentation (it could be some arbitrary function between arbitrary types). As GATlab develops, a more expressive language for expressing theories could eventually allow for encoding richer constraints into the GAT itself (e.g. support for variadic arguments), and then we could rely less on fixing particular Julia types.
Putting it all together: handling sets in Catlab
To help solidify the abstract topics above, let’s compare how old and new Catlab handle the concept of a set. In old Catlab, we have SetOb
as an abstract type. Subtypes of SetOb
are expected to have certain methods defined for them, though this isn’t made explicit anywhere!
abstract type SetOb{T} end
Base.eltype(::Type{<:SetOb{T}}) where T = T
""" A Julia data type regarded as a set. """
struct TypeSet{T} <: SetOb{T} end
Base.in(elem,::TypeSet{T}) where T = isa(elem,T)
""" A bool-valued function tests membership in the set """
struct PredicatedSet{T} <: SetOb{T}
predicate::Any
end
function Base.in(s::PredicatedSet{T}, x::T)::Bool where {T}
s.predicate(x)
end
Here the abstract type SetOb
is the interface. Just like with AbstractVector
and other abstract types in Julia’s standard library, one’s best hope is to delve into documentation to know which methods are mandatory, which are optional, and which are derived from other interfaces. Whether or not one has implemented the interface appropriately cannot be checked at the time of declaring that a type implements the interface.
From the new perspective, there was a theory (i.e. interface) that should have been made explicit.
@theory ThSet begin
Bool′::TYPE{Bool}
Any′::TYPE{Any}
contains(e::Any′)::Bool′
eltype()::Any′
end
# Wrapper type
ThSet.Meta.@wrapper SetOb <: AbsSet
We reproduce the old behavior by making two models for that theory.
struct TypeSet{T} end
@instance ThSet [model::TypeSet{T}] where T begin
contains(i::Any)::Bool = i isa T
eltype()::Any = T
end
struct PredicatedSet{T}
predicate::Any
end
@instance ThSet [model::PredicatedSet{T}] where T begin
contains(i::Any)::Bool = i isa T && model.predicate(i)
eltype()::Any = T
end
If our model is not a zero-field struct, it might have some actual data inside of it that is relevant. In this case, we can refer to the model (which is implicitly the first argument for the functions defined within an @instance
) with the name model
, as was done above for model.predicate(i)
.
We no longer have the T
element type parameter for SetOb
. This was needed in old Catlab because a lot of code would dispatch on that T
parameter (e.g. limits and colimits - thus implicitly presupposing a particular ThCategory
model for objects of that type).
Stay tuned for future posts which will walk through key parts of the Catlab refactor!
Footnotes
implements
takes a modelm::M
, a theoryT
, and the Julia types associated with the type constructors ofT
. It uses Julia’shasmethod
function to check whether appropriate methods (withWithModel{M}
as the first argument) have been defined. If no types are provided, it will determine them via animpl_type
method.↩︎A key reason for why we want to think about interfaces in mathematical language of GATs is that we have a well-defined notion of morphism between GATs, enabling us to address problems like gluing together interfaces by using off-the-shelf mathematical tools. This is described in our paper.↩︎