The categorical scoop on attributed C-sets
$$
$$
The trouble with C-sets is that they cannot model data structures such as labelled graphs, which are graphs where every edge and vertex has an associated element from some set of labels. In this blog post, we will discuss an extension to C-sets which we call attributed C-sets that solves this issues. This formalism is mainly inspired by the viewpoint of the “Algebraic Databases” paper (Schultz et al. 2016), though in truth we take more after the “databases” part than we do the “algebraic” part.
The double category of profunctors
It is possible to develop attributed C-sets within the 2-category of categories, functors, and natural transformations. However, we gain much in simplicity when we move to a double categorical framework, and it is for this reason that we dive into a bit of abstraction before moving to attributed C-sets.
From a graphical perspective, a double category is a mathematical object where the basic unit of analysis is a square that looks like this:
Although this looks like a commutative diagram, it is in general not. Rather, it is a depiction of a 2-cell, and the morphisms and objects are merely the boundaries of the 2-cell. In an actual diagram, the asterisks and arrows would all be named, but we have left out names for now to emphasize the shape. These squares can be composed both vertically:
and horizontally
A double category can be very succinctly defined as a category in \mathsf{Cat}. This is analogous to the definition of a Lie group as a “group in the category of smooth manifolds.” However, it is a little more subtle than that for a couple reasons, so we will spell it out explicitly. A double category \mathbb{D} is defined by
- A category \mathbb{D}_0. We say that an object in this category is an object in \mathbb{D} (i.e., the corners of a square). A morphism in this category is a vertical morphism in \mathbb{D} (i.e., the left and right sides of a square).
- A category \mathbb{D}_{1}. An object in this category is a horizontal morphism in \mathbb{D} (i.e., the top and bottom of the square), and a morphism is a 2-cell, i.e. an entire square. This will make more sense when we introduce the next element. Vertical composition of 2-cells is given by regular composition in \mathbb{D}_{1} (so in particular, it is strictly associative and unital).
- Two functors \operatorname{dom}, \operatorname{codom}: \mathbb{D}_{1} \to \mathbb{D}_{0}. On an object p of \mathbb{D}_{1} (i.e. a horizontal morphisms), \operatorname{dom} and \operatorname{codom} give the objects of \mathbb{D}_{0} (i.e. objects of \mathbb{D}) that are the domain and codomain of p. On a morphisms \alpha of \mathbb{D}_{1} (i.e. a 2-cell), \operatorname{dom} and \operatorname{codom} give the morphisms of \mathbb{D}_{0} (i.e. vertical morphisms) that are on the left and right of \alpha.
- The heart of a double category is horizontal composition. Intuitively, horizontal composition is a functor from the category of “pairs of composable horizontal morphisms” to \mathbb{D}_{1}. This category is the equalizer of \operatorname{codom}\circ \pi_{1} and \operatorname{dom}\circ \pi_{2} which both go from \mathbb{D}_{1} \times \mathbb{D}_1 to \mathbb{D}_{0}. We write it as an equalizer rather than a pullback to emphasize that it is a sub-category of \mathbb{D}_{1} \times \mathbb{D}_{1}. In any case, we have a functor \odot from this equalizer to \mathbb{D}_{1}, and this functor on objects is horizontal composition of horizontal morphisms, and on morphisms is horizontal composition of 2-cells.
- Finally, we have a functor u from \mathbb{D}_{0} to \mathbb{D}_{1} that is a unit for \odot, again up to unique natural isomorphism. This gives the identity for horizontal composition of both horizontal morphisms (when applied to objects of \mathbb{D}_{0}), and 2-cells (when applied to morphisms of \mathbb{D}_{0}).
Before we get into profunctors, let’s talk about some simple examples of double categories.
Example 1. Given any category \mathsf{C}, there is a double category where the 2-cells are commutative squares in \mathsf{C}. The category \mathbb{D}_{1} is actually a familiar category: the arrow category of \mathsf{C}. This is a strict double category, in that the natural isomorphisms for associativity and unitality of C are actually the identity.
Example 2. Given any 2-category \mathsf{C}, there is a double category where the 2-cells are squares that commute up to 2-isomorphism. The reader is invited to investigate this double category further.
Example 3. Given any category \mathsf{C} with pushouts, there is a double category where the horizontal morphisms are cospans in \mathsf{C}. A 2-cell is a commutative diagram in \mathsf{C} that looks like this:
and horizontal composition is by pushout. John Baez and Kenny Courser make a very compelling argument that this construction is key to understanding open systems, and we encourage the interested reader to look at the work that they have done in this area (Baez and Courser 2019; Courser 2020).
The double category that we are interested in is called \mathbb{P}\mathsf{rof}, and its generic 2-cell is illustrated in the following diagram.
\mathsf{A}, \mathsf{B}, \mathsf{X}, and \mathsf{Y} are all categories. F and G are functors \mathsf{A} \to \mathsf{X} and \mathsf{B} \to \mathsf{Y} respectively. M and P are functors \mathsf{A}^{\mathrm{op}} \times \mathsf{B} \to \mathsf{Set} and \mathsf{X}^{\mathrm{op}} \times \mathsf{Y} \to \mathsf{Set} (also known as profunctors). And finally, \alpha is a natural transformation from M to P \circ (F \times G). That is, the components of \alpha have the form \alpha_{A,B} : M(A,B) \to P(F(A),G(B)).
What we have said so far should be sufficient for the reader to implement the first three parts of a double category, and we encourage the reader to do so before continuing.
At this point, we have not defined composition or units in \mathbb{P}\mathsf{rof}. However, we have enough material to define attributed C-sets at this point, so in order to give the reader a break from abstract nonsense, we will talk about them now.
What is an attributed C-set?
If we think about C-sets as databases, their “schemas” are finitely presented categories. Before we talk about attributed C-sets, we must talk about the construct that plays an analogous role. The “schema” for an attributed C-set is a horizontal morphism \operatorname{Attr} in \mathbb{P}\mathsf{rof} whose source is a finitely presented category \mathsf{C}, and whose target is a finitely presented discrete category \mathsf{Data}. We visualize this in the following way:
Let’s parse this diagram step-by-step. On the left in yellow, we have \mathsf{C}, and this is a standard way of picturing a finitely-presented category. On the right in blue, we have a discrete category (no arrows). Finally, connecting \mathsf{C} and \mathsf{Data}, we have two green ovals. These ovals represent the sets \operatorname{Attr}(E,X) and \operatorname{Attr}(V,X). We picture an element of \operatorname{Attr}(E,X) as an arrow from E to X, which is consistent with the viewpoint that a profunctor is a heterogenous Hom-functor. We call an element of \operatorname{Attr}(E,X) an attribute.
The reader is likely unfamiliar with the notation \operatorname{src}; \operatorname{vdec} because to our knowledge, it is non-standard. A more standard way of writing it would be \operatorname{Attr}(\operatorname{src},\operatorname{id}_{X})(\operatorname{vdec}). Namely, it is the action of the morphism \operatorname{src}\in \operatorname{Hom}_{\mathsf{C}} on \operatorname{vdec}\in \operatorname{Attr}(V, X). We write it like \operatorname{src}; \operatorname{vdec} to suggest that we are precomposing \operatorname{src} with \operatorname{vdec}.1
As always, when we are doing computational category theory we must introduce presentations whenever we introduce new mathematical object. In this case, to present a “schema”, we present a category \mathsf{C} and a discrete category \mathsf{Data} in a standard way, but we also must present a profunctor, and we expect that the reader is not so familiar with this.
Fortunately, it is not so difficult; to present a profunctor we simply give generating elements in each of the component sets (i.e., \operatorname{edec}\in \operatorname{Attr}(E,X)), and then we write down some equalities involving precomposition (for instance, if we wanted to say that the decoration on an edge was equal to the decoration on its source, we could write \operatorname{src}; \operatorname{vdec}= \operatorname{edec}).
In Catlab, we would write down the presentation of decorated graphs like so:
@present SchDecGraph(FreeSchema) begin
V::Ob
E::Ob
src::Hom(E,V)
tgt::Hom(E,V)
X::AttrType
edec::Attr(E,X)
vdec::Attr(V,X)
end
Note that this presention has no equations. If we did the category of decorated symmetric graphs, we would end up with
@present SchDecSymmetricGraph <: SchDecGraph begin
inv::Hom(E,E)
compose(inv,inv) == id(E)
compose(inv,src) == tgt
compose(inv,tgt) == src
compose(inv,edec) == edec
end
The last equation says that the decoration on an edge should be the same as the decoration on its inverse.
It is perhaps best to understand attributed C-sets by looking at how we implement them in Catlab. Each object in \mathsf{C} becomes a table. The table for c \in \mathsf{C} has a column of type Int
for each generating morphism with domain c, and a column of some other type for each generating attribute with domain c. We will later see exactly how this type is determined.
The columns corresponding to morphisms in \mathsf{C} are similar to FOREIGN KEY
columns in SQL: they point to rows in other tables. The columns corresponding to attributes are just normal columns, but they might have some constraints corresponding to the equations in the presentation.
Formally, we define an attributed C-set on the schema (\mathsf{C},\mathsf{Data},\operatorname{Attr}) to be a 2-cell in \mathbb{P}\mathsf{rof} that looks like this:
Let’s pick apart this definition. First of all, we have an ordinary C-set F. This is the “core” of an attributed C-set, and we should understand this fairly well.
Then we have K, which gives the attributes types. In this presentation, we have chosen to make \mathsf{Set} the codomain of K. However, in Catlab, the codomain of K is really the category of Julia types. This is not so well-behaved, but that’s the difference between math and computing for you. Typically, we actually treat K as part of the schema. This means that we would be working with \mathbb{R}-decorated graphs, or \mathbb{Z}-decorated graphs, instead of just graphs decorated by an unspecified type.
Finally, we have \alpha, which sends an attribute x \in \operatorname{Attr}(c,X) to a column of the data table for c, of type K(X). The profunctor \operatorname{Arr} is defined in exactly such a way to make this work: \operatorname{Arr}([m],K(X)) is the set of length-m arrays with element type K(X), where [m] denotes the set \{1,\ldots,m\}.2
Another way of thinking about this is that we can treat F as a 2-cell from the profunctor \operatorname{Hom}_{\mathsf{C}} to the profunctor \operatorname{Hom}_{\mathsf{FinSet}}. In this guise, F sends f : c \to d to an element F(f) \in \operatorname{Hom}_{\mathsf{FinSet}}(F(c),F(d)) that represents the column in the data table for c. When thought about this way, it becomes more clear how F and \alpha are analogous even though they seem like they are different shapes.
Back to \mathbb{P}\mathsf{rof}
Note: for the reader with categorical inclinations, this section will be very interesting. For the reader who is mainly interested in attributed C-sets, we suggest skimming this section and then moving on to the next.
We now know what an attributed C-set is, but in category theory, that’s only half the battle! We must also define the morphisms!
To do this, we need to further investigate \mathbb{P}\mathsf{rof}. One good way of thinking about profunctors is that they are kind of like matrices (Schultz et al. 2016). That is, if we work with sets instead of categories, a map C \times D \to \mathbb{R} is a matrix. This analogy goes deeper than one might expect.
In the context of matrices, the following 5 things are all equivalent.
- A map C \times D \to \mathbb{R}
- A map C \to \mathbb{R}^{D}
- A map D \to \mathbb{R}^{C}
- A linear map \mathbb{R}^{C} \to \mathbb{R}^{D}.
- A linear map \mathbb{R}^{D} \to \mathbb{R}^{C}.
It turns out that this is directly analogous to profunctors. A profunctor P : \mathsf{C} ⇸\mathsf{D} has the following equivalent forms.
- A functor P : \mathsf{C}^{\mathrm{op}} \times \mathsf{D} \to \mathsf{Set}.
- A functor \mathsf{C}^{\mathrm{op}} \to \mathsf{Set}^{\mathsf{D}}.
- A functor \mathsf{D} \to \mathsf{Set}^{\mathsf{C}^{\mathrm{op}}}.
- A colimit-preserving functor \Lambda_{P} : \mathsf{Set}^{\mathsf{C}} \to \mathsf{Set}^{\mathsf{D}}.
- A colimit-preserving functor \Lambda_{P}' : \mathsf{Set}^{D^{\mathrm{op}}} \to \mathsf{Set}^{\mathsf{C}^{\mathrm{op}}}. This is for the same reasons as above.
The equivalence of 1, 2, and 3 are provided by the product-hom adjunction in \mathsf{Cat}. However, understanding 4 and 5 takes some work. We explain this by analogy, rather than by rigorous proof. The analogy proceeds in several steps.
- Just as the canonical basis vectors of \mathbb{R}^{C} are in canonical bijection with C, the representable functors (i.e. functors of the form \operatorname{Hom}(c,-) : \mathsf{C} \to \mathsf{Set} for c \in \mathsf{C}) are in canonical isomorphism with \mathsf{C}^{\mathrm{op}} (by the map c \mapsto \operatorname{Hom}(c,-), of course!). This is important, because in the next step we will see that the representable functors form a basis of sorts for \mathsf{Set}^{\mathsf{C}}.
- Just as any element of \mathbb{R}^{C} can be written canonically as a linear combination of basis vectors, any element of \mathsf{Set}^{\mathsf{C}} can be written canonically as a colimit of representable functors.
- Therefore, just a linear map is uniquely determined by its action on a basis, which is the same thing as a map C \to \mathbb{R}^{D}, a colimit-preserving functor is uniquely determined by its action on the representable functors, which is the same thing as a map \mathsf{C}^{\mathrm{op}} \to \mathsf{Set}^{\mathsf{D}}.
To go beyond the presentation in terms of analogy, we encourage the reader to consult a standard text on category theory, such as (Riehl 2016), and look up the Yoneda lemma and the writing of a presheaf as a canonical colimit of representables.
The fourth equivalent form is the most useful form for defining the composition of P : \mathsf{C} ⇸\mathsf{D} and Q : \mathsf{D} ⇸\mathsf{E}. That is, simply compose \Lambda_{P} and \Lambda_{Q}! This works because colimits are preserved across the composition. Additionally, it is clear what the identity is: 1_{\mathsf{Set}^{\mathsf{C}}}!
However, it is also useful to have an explicit formula for composition and identity in terms of profunctors as functors \mathsf{C}^{\mathrm{op}} \times \mathsf{D} \to \mathsf{Set}. This is provided by a funny little colimit called a coend, that looks like this:
(P \odot Q)(c,e) = \int^{d \in \mathsf{D}} P(c,d) \times Q(d,e)
Essentially, \int^{d \in \mathsf{D}} P(c,d) \times Q(d,e) is the coproduct \coprod_{d \in \mathsf{D}} P(c,d) \times Q(d,e) modulo the equivalence relation that for any x \in P(a,d_{1}), f : d_{1} \to d_{2}, and y \in Q(d_{2},c), (x;f,y) \sim (x,f;y).
This gives a rather neat explanation of the identity
\int^{c' \in \mathsf{C}} \operatorname{Hom}_{\mathsf{C}}(c,c') \times P(c',d) \cong P(c,d)
Namely, to make the function \to, we map (f,x) to f;x, and to make the function \leftarrow, we map x to (1_c,x). The reader can verify that this defines a bijection.
The reader is encouraged to verify that taking the identity on \mathsf{Set}^{\mathsf{C}} and contorting it into a map \mathsf{C}^{\mathrm{op}} \times \mathsf{C} \to \mathsf{Set} gives precisely \operatorname{Hom}_{\mathsf{C}}.
Now, this gives us horizontal composition of horizontal arrows. We still need horizontal composition of 2-cells. However, the seeds of this composition are already sewn in the definition of horizontal composition as a colimit. If \alpha : M \to P and \beta : N \to Q are 2-cells, as depicted in the diagram
then we can use the universal property of the coend to define
(\alpha \odot \beta)(c,e) = \int^{d \in \mathsf{D}} \alpha(c,d) \times \beta(d,e)
That is, for each d we have a map \alpha(c,d) \times \beta(d,e) : M(c,d) \times N(d,e) \to P(F(c),G(d)) \times Q(G(d),H(e)), and each element of (M \odot N)(c,e) is represented by at least one (x,y) \in M(c,d) \times N(d,e) for some d, so we just apply \alpha(c,d) \times \beta(d,e) to that representative, and we get a representative of something in (P \odot Q)(F(c),H(e)).
With this, we have a full definition of the double category \mathbb{P}\mathsf{rof}. Before we get back to attributed C-sets, we will quickly discuss one interestesting feature of this category.
The question is, what is a 2-cell with a frame that looks like this:
Given that the top and bottom are the identity for horizontal composition, one might suggestively write this as
This implies that \alpha might be a natural transformation between F and G. Let’s see if that holds up. Define \bar{\alpha}_{c} = \alpha(c,c)(1_{c}) \in \operatorname{Hom}_{\mathsf{D}}(F(c),G(c)). Suppose that f : c_{1} \to c_{2} in C. Then by naturality of \alpha,
f;\alpha(c_{2},c_{2})(1_{c_{2}}) = \alpha(c_{1},c_{2})(f) = \alpha(c_{1},c_{1})(1_{c_{1}});f.
However, this is precisely the naturality condition for \bar{\alpha} in disguise, as f;\alpha(c_{2},c_{2})(1_{c_{2}}) = F(f) ; \bar{\alpha}_{c_{2}}, and \alpha(c_{1},c_{1})(1_{c_{1}}) = \bar{\alpha}_{c_{1}} ; G(f).
Moreover, if we have a natural transformation \beta : F \Rightarrow G, we can define \tilde{\beta} by \tilde{\beta}(c_{1},c_{2})(f) = F(f) ; \beta_{c_{2}} = \beta_{c_{1}} ; G(f).
Therefore, the 2-category \mathsf{Cat} is hiding in \mathbb{P}\mathsf{rof}!
We now have all the technology we need to finish the definition of attributed C-sets.
The category of attributed C-sets on a schema
A morphism of C-sets is fairly easy to define: it is just a natural transformation \gamma between two functors F,G : \mathsf{C} \to \mathsf{Set}. If (F,\alpha) and (G,\beta) are attributed C-sets on the schema (\mathsf{C},\mathsf{Data},\operatorname{Attr},K), one natural definition for morphisms between them is \gamma : F \Rightarrow G such that \gamma preserves attributes. This is most clearly seen if we introduce some notation. If a \in F(c) and x \in \operatorname{Attr}(c,X), we define a.x to be \alpha_{(c,X)}(x)(a). That is, a.x is the element of K(X) that \alpha assigns to a. The reason for this notation is that it reminds us of dot-syntax in programming languages, where swiss.cheese
returns the value of the field cheese
on the object swiss
. Viewing a as a row in a database, a.x is the value of column x at row a.
In any case, if \gamma : F \to G is a natural transformation, we say that it preserves attributes if \gamma_{c}(a).x = a.x for all c \in \mathsf{C}, X \in \mathsf{Data}, a \in F(c), and x \in \operatorname{Attr}(A,X).
For the reader who was paying attention in the last section, it turns out that this condition can be very succinctly stated categorically as \alpha = \tilde{\gamma} \odot \beta, or in commutative diagram form as
To verify this, one simply needs to chug through the definition of \tilde{\gamma} and the definition of \odot. This defines the category of attributed C-sets over a schema as a sort of double category-theoretic analogue of a slice category.
The only thing remaining is to name this category! We have chosen to name it \operatorname{Attr}^{K}\text{-}\mathsf{C}\text{-}\mathsf{Set}, which can be shortened to \operatorname{Attr}\text{-}\mathsf{Set} if \mathsf{C} and K are clear from the context.
However, this is not the only possible definition for a category with attributed C-sets as objects; there are other types of morphisms that one might want to consider. For instance, we could let a morphism be a commutative diagram of the form
Here \delta would be a natural transformation between J,K : \mathsf{Data} \to \mathsf{Set}, so \tilde{\delta} would be an analogous 2-cell to \tilde{\gamma}. In this type of morphism, attributes are allowed to change, but they need to be related by a natural transformation. We are less likely to use this kind of morphism in Catlab for the simple reason that Julia functions don’t serialize, and Julia functions are the homs in our implementation of \mathsf{Set}. However, it is an equally valid definition. A special case of this would be J = K.
Both of these categories are instances of a more general construction which takes slice categories into the double-category realm. The properties and full definition of this more general construction have yet to be determined, but we will likely cover this in a future blog post.
Variations and extensions
A simple variation on attributed C-sets would be to replace \mathsf{FinSet} with the category of finite sets and partial maps, or replace \operatorname{Arr} with the profunctor that takes ([m],S) to the set of partial maps from [m] to S. The first variation is actually in Catlab, because it turned out to be convenient.
However, there are also many other possiblilities. For instance, we could allow C to have finite products, and force F to be product-preserving, or replace \mathsf{FinSet} with \mathsf{FinRel}. One open problem is how to specify that some attributes or morphisms should end up being injective. It is clear that we have just scratched the surface in terms of things that fit this pattern, and we are excited to see what the future has in store for attributed things!
References
Footnotes
One way of thinking about this is by analogy to group actions on a set. If \mathsf{C} and \mathsf{D} are single-element categories with all invertible morphisms (i.e. groups), then a profunctor \mathsf{C} ⇸\mathsf{D} is precisely a set with a left action of \mathsf{C} and a right action of \mathsf{D}.↩︎
In an actual implementation, we do not store the column for all of the attributes, just the generating attributes. This saves space.↩︎