Introducing InterTypes

acsets
integration
Announcing the first version of InterTypes: a package for cross-language serialization for ADTs and ACSets
Author

Owen Lynch

Published

November 14, 2023

Motivation

Part of the AlgebraicJulia vision for scientific computing is that a scientific model should be a piece of data that can be inspected, analyzed, passed between programming languages, and saved in a database.

In order to do this, we need to make sure that different languages can load and save the models.

One way to do this would be to define a data type for “all scientific models”, and then implement that data type in each programming language we care about. But this is clearly ridiculous; there is no one data type that can encompass every single scientific model. Moreover, often we want to specify that we only want a certain type of scientific model.

Another approach would be to manually implement, for each type of scientific model, types in every language we care about. However, this is an m × n problem, where m is the number of languages and n is the number of types of scientific models. Moreover, it is error-prone (because there are subtle differences between the type systems of different languages), and would be a massive drag on rapid iteration; any new type of model or change to a modeling framework needs to be implemented across many different languages.

The better way to do it would be to define your types once, in a language-agnostic way, and then generate the types in each language automatically along with serialization/deserialization code. This sort of system has been done before: see

The relevant XKCD is, of course,

So why make a new one? Well, I want to support ACSets natively, as many of our scientific models are built on top of them (Patterson, Lynch, and Fairbanks 2021). And it seemed that modifying an existing system would be more work than building a new one from scratch. But more importantly, I find that building this kind of thing from scratch gives you a much better picture of the kind of design decisions that go into this, and thus if I end up trying to modify a pre-existing one later down the line I’ll have a better idea of how to go about it.

One difference between InterTypes and these other formats is that I don’t intend (at least at first) to have a custom serialization format that goes along with it. The main feature of InterTypes is to generate the data structures in each programming language. Currently we have serialization/deserialization to JSON (including 64-bit integer support!), but we could also support other serialization formats. The whole point of InterTypes is that you shouldn’t have to think about the underlying serialization details. Along with this, from an intertype specification one can generate a JSONSchema file that describes the JSON produced by the automatically generated serialization.
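The 64-bit integer support deserves a brief illustration. Many JSON parsers read every number as a 64-bit float, and a 64-bit float cannot represent every 64-bit integer exactly. The sketch below uses only base Julia (no JSON library, and nothing from InterTypes itself) and models such a parser with a plain Float64 conversion; all variable names are illustrative.

```julia
# Above 2^53, not every integer is exactly representable as a Float64.
const MAX_SAFE = Int64(2)^53

value = MAX_SAFE + 1                 # 9007199254740993, a perfectly valid Int64

# A float-based JSON parser is modeled here by converting to Float64:
# the value is silently rounded to a neighboring representable number.
as_float = Float64(value)
lossy = Int64(as_float)              # no longer equal to `value`

# Writing the integer as a string instead round-trips exactly.
as_string = string(value)
exact = parse(Int64, as_string)

# Every Int32 is far below 2^53, so it survives the float conversion intact.
small = typemax(Int32)
small_roundtrip = Int32(Float64(small))

println("lossy == value: ", lossy == value)
println("exact == value: ", exact == value)
println("small round-trips: ", small_roundtrip == small)
```

This is why a serializer that wants to be safe across languages can keep 32-bit integers as JSON numbers but has to treat 64-bit integers differently.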

Use

WARNING: InterTypes is alpha-quality software, and not only are there certainly bugs but also the interface to it may change radically.

The core of InterTypes is the intertype schema, which declares a collection of types that can refer to one another. An intertype schema is a file ending in .it. Currently, we use the Julia parser to parse the .it file, and we use Julia to generate code for all languages. However, we hope in the future to produce a standalone binary that will parse the .it file and generate the code in other languages. So although .it files look like Julia, most features of Julia will not work within an intertype schema; for instance, you cannot define functions in an intertype schema, or refer to types that are defined outside of an intertype schema.

There are 4 fundamental building blocks of InterTypes.

  1. Primitive types.
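  • Int32/Int64/UInt32/UInt64 for integer numbers. We have both 32-bit and 64-bit integers because many JSON parsers read numbers as 64-bit floats, which represent integers exactly only up to 2^53. 32-bit integers are therefore always safe to put in JSON numbers, while 64-bit integers must be put in JSON strings.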
  • Bool for booleans.
  • Float64 for floating point numbers.
  • String for strings.
  • Symbol for symbols. In languages that don’t have symbols, this is the same as String.
  • Vector{T} for representing sequences (arrays and lists) of elements of type T.
  • Binary for sequences of raw bytes without a numeric interpretation. In Julia, this maps to Vector{UInt8}, but we think of it as a binary blob rather than a sequence of values.
  • Dict{K,V} for representing dictionaries with key type K and value type V.
  2. Structs. A struct has a list of fields, and each field has a name and a type. This looks like:
struct Point2D
  x::Float64
  y::Float64
end
  3. Sum types, also known as “tagged unions”. A sum type has a list of variants, and each variant is a record containing fields. This looks like:
@sum Op begin
  Plus
  Mul
end

@sum Term begin
  Constant(val::Float64)
  App(op::Op, arg1::Term, arg2::Term)
end
  4. ACSets. ACSets are handled a little differently than they are in Julia, in order to paper over the fact that I have yet to fully figure out support for generic types in Python (and in pydantic, which is the validation/serialization framework). When you declare the schema, you have to specify a concrete type for every AttrType. Then, when you declare an instance of the schema, you do not get a generic instance like you do in Julia; you get an instance with attribute types fixed to the supplied types. This looks like:
struct EdgeData
  name::Symbol
  length::UInt64
end

@schema SchGraph begin
  (E,V)::Ob
  (src, tgt)::Hom(E, V)
end

@schema SchWeightedGraph <: SchGraph begin
  Weight::AttrType(EdgeData) # note that we provide a type here
  weight::Attr(E, Weight)
end

@abstract_acset_type AbstractGraph

@acset_type EDWeightedGraph(SchWeightedGraph,
                            generic=WeightedGraph, index=[:src, :tgt]) <: AbstractGraph

This is equivalent to the Julia code

struct EdgeData
  name::Symbol
  length::UInt64
end

@schema SchGraph begin
  (E,V)::Ob
  (src, tgt)::Hom(E, V)
end

@schema SchWeightedGraph <: SchGraph begin
  Weight::AttrType # note that there is no type here
  weight::Attr(E, Weight)
end

@abstract_acset_type AbstractGraph

@acset_type WeightedGraph(SchWeightedGraph, index=[:src, :tgt]) <: AbstractGraph

const EDWeightedGraph = WeightedGraph{EdgeData}

However, in the Python code, no data structure with the name WeightedGraph is produced; only EDWeightedGraph. This is because the Python and Julia ACSets code were written pre-intertype, so their handling of attrtypes wasn’t fully compatible, and we had to get something working; hopefully in the future Python and Julia will be more congruous. This is a good first issue for someone familiar with types in Python/pydantic!
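To make the “tagged union” idea behind the Op and Term sum types concrete, here is a hand-written sketch in plain Julia. It is not the code that InterTypes generates; it only assumes that each variant carries its own fields and that consumers branch on which variant they hold. The evaluate function is a hypothetical helper added for illustration.

```julia
# Hand-written stand-in for the `Op` and `Term` sum types.
# NOT the output of InterTypes, only an illustration of the idea.
@enum Op Plus Mul

abstract type Term end

# Variant with a single field.
struct Constant <: Term
    val::Float64
end

# Variant that refers back to `Term`, so terms can nest.
struct App <: Term
    op::Op
    arg1::Term
    arg2::Term
end

# Dispatching on the variant plays the role of inspecting the tag.
evaluate(t::Constant) = t.val

function evaluate(t::App)
    a, b = evaluate(t.arg1), evaluate(t.arg2)
    return t.op == Plus ? a + b : a * b
end

# (1.5 + 2.0) * 4.0
expr = App(Mul, App(Plus, Constant(1.5), Constant(2.0)), Constant(4.0))
result = evaluate(expr)
println(result)
```

In a language without multiple dispatch, the same structure would typically be consumed with a match or switch on the variant tag.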

To use an intertype schema, one “declares an intertype module” like so:

@intertypes "weightedgraph.it" module weightedgraph end

Then weightedgraph is a module that contains an export for each type defined in weightedgraph.it. It also contains a Meta variable, which stores the parsed intertype definition. This can then be used to write out generated Python code via

generate_python_module(weightedgraph, ".")

which writes a Python file called weightedgraph.py in the current directory. This Python file imports both acsets and intertypes, so in order to use it one must have the py-acsets library installed, and also a copy of intertypes.py, which can be produced with

write("intertypes.py", InterTypes.INTERTYPE_PYTHON_MODULE)

In a similar manner, a JSONSchema definition for the JSON produced by intertypes can be generated with

generate_jsonschema_module(weightedgraph, ".")

which writes a JSONSchema file called weightedgraph_schema.json in the current directory. This file contains a JSONSchema definition for each type in the intertype definition file.

Intertype modules can refer to one another. For instance, we could write another file called twoweightedgraphs.it with contents of:

struct TwoWeightedGraphs
  g1::weightedgraph.EDWeightedGraph
  g2::weightedgraph.EDWeightedGraph
end

and then import it like:

@intertypes "twoweightedgraphs.it" module twoweightedgraphs
  import ..weightedgraph
end

In fact, weightedgraph.it and twoweightedgraphs.it could be in completely different packages; as long as the first package exports the weightedgraph Julia module this will work fine.

For more examples of how to use intertypes, it would probably be best to refer to the test file.

Future Work

There are a lot of directions I’m excited to take intertypes in.

First of all, I need to write more documentation beyond this blog post.

After that, I plan to add support for Scala, TypeScript, and Rust. Scala in particular would solve a lot of problems between AlgebraicJulia and Semagrams, so I’m going to tackle that next.

Thirdly, I’d like to think about integrating GATlab with InterTypes, so that scientific models with algebraic expressions in them can be first-class.

Longer down the road, I want to investigate combinatorial data structures beyond acsets, as laid out in array systems and combinatorial data structures via finite existential types, and also think about structured version control for intertypes in line with chit.

But finally, I want to use intertypes to make the vision at the beginning a reality, a vision where scientific models can be passed around between programming languages and stored in databases. This is beyond a technical vision; this is a social vision; I hope to reshape how people think about scientific models. If you are interested in this, please reach out on the Category Theory Zulip, the Julia Zulip, LocalCharts, or GitHub issues.

References

Patterson, Evan, Owen Lynch, and James Fairbanks. 2021. “Categorical Data Structures for Technical Computing.” https://doi.org/10.32408/compositionality-4-5.