It has been recent news that Nlnet and NGI0-Core decided to support me for working on GNU Mes but I didn’t really share the motivation behind this proposal and my personal goals with it yet. Let’s do it here.
The problem
In my previous work in Bootstrapping GCC in RISC-V there was a lot of frustration due to the time the whole process needed. The slowest point in the process is not really in the GCC compilation step, but surprisingly earlier in the chain when TinyCC is built. This step is performed by GNU Mes.
For you to understand how slow this step is, I’ve been running things at night, while I was sleep, to get some results the next day, and when I was working on real RISC-V hardware like the VisionFive2 my friend set up for me1, the process took several days to finish. This supposed a huge delay on the process, and it forced me often to push hard at night, trying to make sure I could run a test build so I could have something ready for the next day2.
It is obvious then that making this very step faster is nothing but good for the process and even small improvements in speed would mean hours of time saved. This alone is enough to justify the effort.
But you are not here just for me telling you that making this faster is good. So let’s get a little bit more into the details.
During the work we did, we realized that the VisionFive2 was not that slow, comparing with my laptop, when building with GCC or TinyCC as it was with GNU Mes. The GNU Mes step is slow in my laptop, don’t get me wrong, but in the VisionFive2 it’s significantly slower. That’s what made us think, and Andrius suggested that might happen because small Single Board Computers (SBC) like the VisionFive2 have a very slow memory access, comparing with a laptop or a desktop computer.
That was enough for me to trigger.
GNU Mes
But first we need to understand what we are dealing with.
The goal of GNU Mes as a whole here is to connect M2-Planet, a compiler that is able to build a small subset of C, with TinyCC, a fully-featured and fast C compiler, that is able to build GCC 4.
GNU Mes has three main ingredients:
mes
: a Scheme interpreter that is compatible with Guile. This is written in a subset of C that M2-Planet is able to build.mescc
: a C compiler written in Scheme that is mutually-hosted withmes
.meslibc
: a minimal C standard library.
The bootstrapping process goes like this: mes
is compiled by M2-Planet. Once
mes
is built, it runs mescc
to build its standard library, and then it
builds TinyCC. Running mescc
is the part that is really slow.
mes
is a tree-walk interpreter. It loads scheme code in an AST and then runs
it directly. Macros are expanded as they are found, and code is processed as it comes.
mescc
uses the NYACC C parser that gives the C AST as a tree structure made
by lists. mescc
, written in the functional style that schemers like,
processes that tree.
The hypothesis
With all this in mind, my hypothesis is GNU Mes step is slow is because mes
,
the scheme interpreter, due to its design, makes the memory bottleneck obvious
in SBCs. Changing to a more memory-compact design that exploits the cache would
make mes
way faster.
When I say “due to its design” I mean two things: first that a tree-walk
interpreter requires jumping around the memory, as the scheme AST is not a
compact structure, and, second, generate a huge amounts of lists, the basic
scheme data structure. These two are also happening in mescc
, which builds
the C AST using scheme lists, and them processes them.
Having a compact data structure would increase the cache usage, reducing the access to memory, and making the program faster.
The project
The project aims to improve the performance of GNU Mes’ scheme interpreter,
rewriting it as a bytecode interpreter, while keeping it as simple and
readable as it is (or even more!). This would enable faster execution of the
Mes C Compiler (mescc
) for faster build times, making the bootstrapping chain
more accessible, specially in small single-board computers where memory access
is more expensive. This speedup could also lead to a reduction of steps in the
bootstrapping chain, making it simpler and easier to maintain.
On the other hand, our work is constrained by the C constructs that M2-Planet is able to deal with. The project includes a review on the new additions that have been done to M2-Planet in the most recent releases, that allow us to make Mes more readable and faster.
Some numbers
The first step of the project is to create a benchmark that would become useful later during the rewrite of the core of the interpreter. I didn’t finish that yet but it has already led me to some interesting numbers that talk a little bit about the feasibility of the project.
If the list is “the” scheme data structure we could try a simple program and put it to test:
(define (fib n)
(if (< n 2)
n
(+ (fib (- n 1))
(fib (- n 2)))))
(display (fib 35))
Here is the wall time of several runs of that file:
- Guile: 0.35s — run as
guile fib.scm
- Guile: 3.33s — run as
guile --no-auto-compile fib.scm
(after cleaning the cache) - Guile: 8.51s — run as
GUILE_JIT_THRESHOLD=-1 guile --no-auto-compile fib.scm
- Chibi: 1.47s — run as
chibi-scheme -m scheme.small fib.scm
- Mes: 23.33s — when built with GCC
- Mes: 56.03s — when built with M2-Planet
Of course, this is not the only test I have to make but it is quite self-explanatory. We’ll take a deeper look to the numbers in later posts, but I think there’s room for improvement.
Some thoughts
I don’t think I say anything stupid here, and I think the numbers support the hypothesis. Of course, that’s nothing new: bytecode interpreters are known for long, and the reason why most of modern interpreters are bytecode interpreters (in the numbers above, both Guile and Chibi are) is exactly what I’m mentioning here.
Obviously, bytecode interpreters are not the perfect solution for everything, and just that is not the reason why things are fast or slow. If we had, say, a very slow list creation function, we would still be slow.
Being as slow as GNU Mes currently is, I think just a non-pessimization strategy should be enough (that’s what a bytecode interpreter is to some degree) to obtain a significant speed improvement. In any case, even small improvements can be very beneficial when we are talking about long processes.
I hope to make it around 10x faster. But we’ll see if I’m able to achieve that.
Personal goals
Of course my life is not always controlled only by the things I consider useful. A project has to have something else for me to prepare a proposal and spend a year or more on it. It has to be interesting and has to be aligned with my goals.
I’m not a compiler engineer and I’m not an optimization expert, and I’m not specially interested in becoming any of them. I’m interested in interpreters and programming languages as they are tools that I use every day, and I tried several times to make a programming language implementation but I didn’t have the time or the resources to do so. I have other things to do, after all.
I have also to admit I have other goals for this project that go a little bit further than just making GNU Mes faster. I think I should make my changes readable because I think it’s a great opportunity for others to learn. This also means documenting my decisions and design goals, and writing that down inside of the project’s documentation. We are going to learn together, and my process is going to be there for others to follow.
In summary, this project is a great example of something I think I can do, that I think is useful for the bootstrapping and aligns with my long term goals of learning about languages, specifically about Lisp implementations, and making the knowledge more accessible to others.
I never did this before, but my whole career has been me doing things I never did before, so I guess it would be fine. It always happens to be.
We’ll see.
Thanks for the support, and for believing that I could do this,