Tess Language: A minimal set of practical additions to C
26 April 2026
For nearly the past year I've been working on a language and transpiler in C, and I wanted to share some of my initial questions and what I've come up with. The language is Tess, which is a mnemonic for 'typed s-expression', reflecting the origin of the project.
I've always enjoyed programming languages, and I've explored many, from Common Lisp to Zig, but I've most frequently come back to C++ for my more ambitious projects. So when I wanted to work on a compiler, I naturally started in C++. I first created a stack virtual machine with a minimal s-expression based instruction set. I spent some time optimizing it, benchmarking against Python's VM on tiny recursive programs like factorial, using tricks like computed goto.
Then I found myself spending (wasting?) an afternoon working on specializations to make use of the new C++
std::format library. That time investment reminded me that I frequently over-engineer things. I mean,
despite how neat std::format is, there's nothing wrong with to_string.
So I decided to switch to C, and to expressly take an under-engineered approach. This immediately raised the question: beyond the basics of C programming, are there any language features that would be nice to have?
TL;DR
Well, the final list I came up with is this:
- Namespaces
- Type inference (Hindley-Milner)
- Parametric polymorphism and monomorphisation
- Explicit allocators
- Lambda functions, stack or explicitly-allocated
- Tagged unions (i.e., sum types, ADTs, etc.)
- defer, try
- Traits
- Operator overloading
- Dot-call syntax (UFCS)
- Packages
- Seamless C interop
I built these features into a new language, incrementally, generally taking the path of least resistance, and embracing the simplicity and directness of C. Through a bit of constructive laziness, the new language is built on semantics that match C, with pass-by-value, unsafe pointers and casts. Enough people are working on "safe" languages that forbid these shenanigans. I grew to appreciate C and just want to make it more ergonomic.
Read on for more.
C+
If you're familiar with C programming, you know that some people write in something like 'C+': writing C, but compiling with a C++ compiler in order to use one or two language features C programmers appreciate: namespaces, and sometimes operator overloading. This points to the first two affordances I wanted to consider for a minimal extension to C.
In parallel, I also had an additional itch to scratch: type inference. Writing C can be verbose: there are often times where type names have to be repeated despite it feeling like the compiler should know what we mean. So I committed to a Hindley-Milner style type inference system: constraints and unifications.
I did some superficial research, and didn't come across other projects that tried to combine unsafe C
semantics with Hindley-Milner, so I thought: excellent, this is either a terrible idea or a brilliant one.
It's probably the former. Of course, C++ has auto, and C is getting it too, but those are baby steps towards
HM inference.
Thus far, the minimal set of ergonomic additions to C that I think have widespread consensus are:
- Namespaces
- Operator overloading
To that I add my brilliant/terrible idea:
- Type inference
Next, I'm a big fan of generic types and functions, aka parametric polymorphism. When I first looked at Go, generics were the first thing I missed. (Go has since added them, but I haven't played much with it.)
With generic functions, we want specialisation: the compiler should analyse callsites and emit type-correct code so there is no need for runtime dispatch. Generic code should not perform worse than handwritten code.
So now the list is:
- Namespaces
- Operator overloading
- Type inference
- Parametric polymorphism + monomorphisation
This is a good minimal list, but it's missing several things that I find I enjoy when I program: polymorphic allocators, lambda functions, and tagged unions.
Allocators
Coming to this project right after learning Zig, I wanted to explore a language that required the use of explicit polymorphic allocators, same as Zig. It just seemed to make things easier in a lot of ways, though there were some ergonomic challenges, which I later mostly resolved by adding arity-based overloading and a default global (thread-local) allocator context.
So for example, from the Array.tl module:
(self: Ptr[Array[T]]): {
push(x: T) -> Void
push(alloc: Ptr[Allocator], x: T) -> Void
}
// The above is syntax sugar for declaring the free functions:
push[T](self: Ptr[Array[T]], x: T) -> Void
push[T](self: Ptr[Array[T]], alloc: Ptr[Allocator], x: T) -> Void
The three-arity form accepts an explicit allocator, and the two-arity form uses the default allocator, which
is user-configurable using an Allocator.Context. Simple arity-based overloading works much of the time,
and I try to limit it to this type of API. Erlang and Clojure were two of my inspirations for arity-based
overloads.
(In this snippet, I also included another nice-to-have, 'receiver blocks', which are syntax sugar around the idiomatic way we write free functions that act on a particular type. The term 'receiver' comes from Go. In Tess, we wrap groups of functions that have the same shape, to avoid repeating the same boilerplate in front of each function. Tess receiver blocks are also not limited to a single argument.)
Lambda functions
Lambda functions interact in interesting ways with explicit allocators. On one axis, we can distinguish lambda functions that capture references to their enclosing scope from those that don't. On another axis, we distinguish lambda functions that escape their dynamic scope from those that don't. The only lambdas which require heap allocation are those that escape their dynamic scope, because a capturing lambda that never leaves its scope can simply use pointers into the stack.
For example, the simplest lambda function is one that represents logic and captures nothing, and does not escape its dynamic scope:
// filter array to retain only even integers
arr.filter( (x) { x % 2 == 0 } )
// ^----------------^
// anon lambda function
Next up is a lambda that uses variables from the enclosing scope:
delta := 42
arr.map( (x) { x + delta } )
And here is a lambda that modifies variables in its enclosing scope, i.e. one that captures a reference:
count := 0
arr.filter( (x) { count += 1; x % 2 == 0 } )
By the way, lambda functions can be bound to a name, and are polymorphic over their arguments:
id := (x) { x }
str := id("hello")
num := id(123)
So far, these three types of lambdas never escape their dynamic scope, and therefore don't need heap allocation. In fact, they can be further optimized to avoid even stack allocation, but that's a compiler detail. The important thing is the lack of heap allocation. The compiler ensures that these stack-based lambda functions cannot be returned from their defining functions.
The other axis is for lambdas that need to escape their defining scope. These typically capture some values, which must be stored in a closure context.
make_adder(incr) {
[[alloc, capture(incr)]] (n) { n + incr }
// ^----------------------^ ^--------------^
// attribute lambda
}
The function make_adder takes a number and returns an allocated lambda, using the default allocator. This
is indicated by the use of the alloc attribute. An explicit allocator could also be used, by passing it as
an argument to the alloc attribute:
make_adder(a: Ptr[Allocator], incr) {
[[alloc(a), capture(incr)]] (n) { n + incr }
}
The adder can be used like this:
add_one := make_adder(1)
answer := add_one(41)
The compiler enforces that a lambda returned from a function must be an allocated lambda, and to keep things
simple, the captures are all by value, not by reference. To share mutable state between lambda functions,
capture a Ptr to the state.
There are some oddities in the current design, however.
Stack lambdas automatically capture references (pointers) to symbols they use, because that is more general and easy to implement. Allocated lambdas capture values, and the captures must be explicit. The difference in semantics between these two forms reflects the fact that their runtime impact is substantially different, and I erred on the side of making this explicitly visible to the user. On reflection, it's a rough edge in the current design.
Tagged unions, algebraic data types (ADTs), discriminated unions, enums, etc.
I settled on the name 'tagged union' because that is the simplest description of how C programmers implement this pattern:
Shape: | Circle { radius: Float }
| Square { length: Float }
| Other
area(sh) {
when sh {
c: Circle { pi * c.radius * c.radius }
sq: Square { sq.length * sq.length }
else { 0.0 }
}
}
Tagged unions are so useful for modelling all sorts of domain types, they're the first thing I reach for. I grew to really enjoy them when I spent time with F#.
Many languages have extensive support for these types, including all sorts of fancy pattern matching. I
opted for no pattern matching beyond the when expression shown above. I feel the explicit use of a binding
(e.g. c and sq in the example), while more verbose, makes parsing simpler and is easy to live with.
The compiler ensures that all variants are explicitly listed, or that an else arm exists.
Besides the most general when expression, there are a variety of syntax forms to make working with tagged
unions more ergonomic, such as try, conditional variant binding, etc. You'll want to see the
Language Reference for details.
Traits and operator overloading
Here's a simple trait declaration:
Eq[T]: {
eq(a: T, b: T) -> Bool
}
This says that all types satisfying the Eq trait must implement an eq function with two operands of that
type, returning Bool. The compiler detects the types of the operands of == and !=, and if they conform
(are the same type and implement eq), it transforms the binary operation a == b into eq(a, b).
This is a simple, structural trait conformance implementation. If a trait has more than one argument, all
arguments must be the same type (like eq above). This is simpler than most language trait implementations,
and is a real limitation to expressiveness. However, I feel this is enough to implement all operator
overloading, and avoids a great deal of complexity.
In a nutshell, you can use traits and operator overloading to add two Vec2 objects, but if you want to multiply by a scalar, you have to use a function call.
Vec2: { x: Int, y: Int }
add(a: Vec2, b: Vec2) {
Vec2(x = a.x + b.x, y = a.y + b.y)
}
foo() {
v1 := Vec2(1, 2)
v2 := Vec2(3, 4)
// + operator rewritten to add(v1, v2)
result := v1 + v2
}
Trait conformance is entirely structural. A new type does not declare its trait conformance. It simply
implements trait methods of the traits it wants to conform to, like add. (I did implement a method to
allow opting out of this automatic conformance, but I'm not convinced it's necessary.)
The implementation of trait conformance depends on the module system, which is Tess' approach to namespaces.
Since everything is a free function, including, for example, the operator-overloading add function, we
need a way for the compiler to find trait information for a given type. Rather than implement type-based
function overloading, we require that types live inside a specific named module.
I feel this is a pretty lightweight requirement. A single source file can contain more than one module, and
module contents may be specified across multiple source files, just like the design of C++ namespaces. So
the above example would more likely have #module Vec2 before the Vec2 type definition and the add
function, and that's all there is to it. (This enables other ergonomic features, like automatic syntax sugar
to expand Vec2 to the fully qualified Vec2.Vec2 type at callsites in other modules.)
Other concerns that real-world programs present include resource management (e.g. RAII in C++), and error propagation.
Resource management (RAII)
Instead of C++ destructors, with all that entails, defer expressions are handy enough. These work
similarly to other languages with defer, so I won't go into detail here. They can be nested, are invoked
in reverse order, and are scoped to their enclosing block (not the outer function, as in some languages).
Error propagation
When a function returns a two-variant union like Result[TOk, TErr], you can replace multiple if ok {...} else
{return ...} with try ok.
Dependency management
There are various approaches to dependency and package management. I chose a conservative approach: packages must specify exact versions of any dependency they need, and name mangling ensures multiple versions of the same package can co-exist in the final binary. The benefits and tradeoffs of this approach are well understood and well argued, so I won't get into it here.
Combined with support for a package.tl.lock file, this ensures that the last version you successfully
tested and shipped will always work the same way, no matter who updates what dependency package.
Tess provides tools like tess init and tess pack to manage packages and .tpkg archives, and tess
fetch to download and verify dependencies and maintain a package.tl.lock file.
C interoperability
I initially had a higher-level language in mind, but during implementation of the type system I decided to try to model C's pointers with Hindley-Milner, and eventually ended up with a language that is very close to C. So I leaned into that aspect.
Tess code can directly #include C source, which is embedded in the output C. C source code can be directly
included in Tess files, wrapped in #ifc/#endc blocks. Any symbol which starts with c_ is interpreted as
a C symbol and passed straight to the transpiled output after removing the prefix. This gives access to C
defines, functions, types, etc. c_struct_ can similarly be used to access names in the C struct namespace.
In order to participate in Tess type inference, C symbols can be declared with Tess type annotations,
indicating argument types, return types, field types, etc. Variadic C functions like printf are supported.
In practice, this combination makes it trivial to access C functions, as this complete program demonstrates:
#module main
#import <cstdio.tl>
#include <time.h>
// c_struct_ prefix indicates the type must be rendered in C as "struct <name>"
c_struct_timespec: {
// must provide concrete type for any field that will be referenced,
// but the struct need not include all fields
tv_sec: Int
}
main() {
// Initializing a variable with the value `void` is similar to
// declaring the variable in C but not initializing it.
ts: c_struct_timespec := void
c_timespec_get(ts.&, c_TIME_UTC)
c_printf(c"Current tv_sec = %i\n", ts.tv_sec)
0
}
The ability to include C source directly in a Tess source file makes it easy to drop down to C whenever necessary:
#module main
#ifc
int add(int a, int b) { return a + b; }
#endc
// Required: type annotation for C functions
c_add(a: CInt, b: CInt) -> CInt
main() {
result := c_add(-1, 1)
result
}
Dot-call, UFCS, method-call syntax
Every function in Tess is a free function, just like in plain C. Of course, it's common to create modules with a single type to hold data, and a collection of functions which act on that type. It's therefore convenient to be able to call those functions in a more object-oriented style, like a method call. This is just syntax sugar around a free function call, and uses the type of the operand to resolve the correct module.
arr := Array.with_capacity[Int](128)
defer arr.free()
arr.push(1)
arr.push(2)

// Desugars to:
arr := Array.with_capacity[Int](128)
defer Array.free(arr)
Array.push(arr, 1)
Array.push(arr, 2)
We also have auto-address-of and auto-dereference, which means dot-call syntax can be used with pointer and value arguments with no need to add a dereference or address-of operator at the callsite, if the called function's concrete type can be inferred or is annotated. In practice, I find that most library code I write uses explicit type annotations rather than rely on inference, because it makes the API easier to understand.
Reflections and status
This list of features seems to me the smallest set of practical additions to a minimal and ubiquitous language. Implementing Hindley-Milner type inference with extensions to support C semantics was a good learning experience, and the result is what I feel is a more ergonomic language for small C programs and libraries.
At the time of this writing, I have a few doubts:
- What is the best API for allocating libraries like Array and HashMap? I think it's nice and ergonomic to provide APIs that don't require an explicit allocator at each callsite. But at the same time, I like Zig's approach of making every allocation visible at each callsite. These two goals are contrary, and I haven't resolved that tension.
- I like the explicit visibility of pointers and const in the C language, and I carried that over to Tess. However, ergonomically, does the user want to see Shape as an argument, versus Ptr[Shape] for mutating APIs, and Ptr[Const[Shape]] for immutable APIs? I leaned strongly into the explicit here, but I'm not sure it's the most practical.
I've laid out some of the language features I find most useful in a relatively low level language very similar to C. My intention was to explore a practical and ergonomic language for small libraries and programs that could be built on C, with full access to the C ecosystem, and I think this language gets there, albeit with a few rough edges.
As far as current status goes, the language implementation has 400-500 tests and a few small sample programs
and libraries. The user experience is still not production ready, mainly because of incomplete or missing
error messages. It's not uncommon to run into a compiler assertion or abrupt exit where there should have
been a proper compiler error, so frequent use of tess check while writing code is still necessary; otherwise
you can write too much code between checks, which makes narrowing down a compiler bug a bit of a pain.
From here, I am planning to write some real programs in Tess to find more bugs and illuminate its design edges and tradeoffs. I had planned to start with a language server and compiler, but I may find other projects instead. We'll see.
If you're at all intrigued by this project, please feel free to reach out and let me know.
Contact: @mocom@mastodon.social. Published using C-c C-e P p. If you use this content commercially, kindly make an appropriate donation to Zig Software Foundation in my name.