Ekaitz's tech blog:
I make stuff at ElenQ Technology and I talk about it

Milestone — Source to Binary RISC-V support in GCC 4.6.4

From the series: Bootstrapping GCC in RISC-V

In the series we already introduced GCC, and we already shared how I backported the RISC-V support from the GCC core to GCC-4.6.4. Now it’s time to finish what we left half-done and actually introduce a full RISC-V compiler.

Where we left off last time

On Tuesday, 7th of April, I marked a commit with the minimal-compiler tag. That commit contains all the work we had done up to that point. The tag message describes how to build a compiler that is only able to compile files to RISC-V assembly.

As we already explained around here, GCC is a driver program that calls other programs to do its work. The GCC core compiles the code to assembly language and then calls binutils to do the rest of the work: assembly and linking.

At that point, we had to call binutils by hand.
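
As a rough picture of what "by hand" means here, the sketch below prints the three steps the driver would normally chain together. It is a dry run that only echoes commands. The riscv64-linux-gnu- prefix is the one used later in this post, hello.c is a made-up file name, and the link line is simplified (a real link also needs the crt* files and libc).

```shell
# Dry-run sketch: print the steps a working GCC driver hides from you.
# PREFIX is an assumption based on the target name used in this post.
PREFIX=riscv64-linux-gnu-

manual_build() {
    src=$1
    base=${src%.c}
    # 1. The compiler proper: C source to assembly only.
    echo "${PREFIX}gcc -S $src -o $base.s"
    # 2. The binutils assembler: assembly to object file.
    echo "${PREFIX}as $base.s -o $base.o"
    # 3. The binutils linker: object file to executable
    #    (crt* files and libc are left out of this sketch).
    echo "${PREFIX}ld $base.o -o $base"
}

manual_build hello.c
```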

The changes

The changes applied at the time of writing are available in the working-compiler tag. As the tag message describes, they were split into two different branches: the guix_package branch and the riscv branch.

The guix_package branch is merged into the riscv branch, but this split lets us tell apart the changes related to the compiler itself from those related to the tooling around the compiler. That way we’ll be able to choose what to do with the commits easily in the future. We’ll probably need to rearrange some stuff.

The context is everything: Guix package part

The guix_package branch contains all the commits that make the Guix tooling around the project work. This includes the compilation process definition in a reproducible way, the environment setup and all that.

As the working-compiler tag message describes, this is the way you can currently make this compiler work and play with it:

$ guix shell -m manifest.scm
$ source PREPARE_FOR_COMPILATION.sh riscv64-linux-gnu
 # This second command will prepare the PATH and other environment
 # variables to make GCC find libraries and executables

If you use this in the future and it fails, it might be because Guix changed some of the core packages involved between the time this post was written and the time you are reading it. You can always use the time-machine utility to make sure you use everything exactly as it was when this post was written:

$ guix time-machine --channels=channels.scm -- shell -m manifest.scm

From this point you can run the compiler directly. It needs the sysroot option to be able to find the crt* files, but that’s something I’m not worried about at this point: we’ll fix it when we integrate this in the bootstrapping process.

Run the compiler like this now:

$ riscv64-linux-gnu-gcc --sysroot=$GUIX_ENVIRONMENT [-static]  ...

Notable changes on the Guix side

The most notable change on the Guix side is the addition of the manifest.scm and PREPARE_FOR_COMPILATION.sh files. With the help of my man Janneke, I realized the problems I had came from the fact that I was calling the compiler with the wrong environment, so it was unable to find the linker and the assembler. Yes, this kind of thing happens a lot in Guix if you are not careful (and I am not careful at all). Adding these files let me prepare a working environment where the assembler and the compiler are found and called properly.
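
A quick way to notice this kind of environment problem is to ask the shell whether the tools are reachable at all. The helper below is only a sketch and is not part of PREPARE_FOR_COMPILATION.sh: missing_tools is a made-up name, and the tool names assume the riscv64-linux-gnu- prefix used in this post.

```shell
# Print every tool from the argument list that is NOT reachable
# through PATH. Prints nothing when all of them are found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed tool names: compiler, assembler and linker for the target.
missing_tools riscv64-linux-gnu-gcc riscv64-linux-gnu-as riscv64-linux-gnu-ld
```

If the list that comes out is not empty, the compiler driver will not find those programs either.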

This change also includes some interesting extras: the GLibC added to the manifest also contains the static version, so we can generate static binaries that are easier to test in an emulated environment without having to deal with the dynamic linker. Important stuff.

Also, now the compilation process relies on a newer Guix version, which removed the -unknown part from the triplets (actually quadruplets), like riscv64-unknown-linux-gnu. That was a little bit of a pain, because I just tried to compile everything one day and failed, and in the end it was just that small change. I decided to update the Guix version needed to keep it up-to-date with the current Guix, so I didn’t need to run guix time-machine each time. It’s better like this.

If you want to read more about the change and see how fast Guix people helped me understand what was going on, see this mailing list thread[1]. I also have to mention that I needed a small change in my GCC to make it work when the -unknown part is missing: adding riscv to config.sub was enough for that.
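
To make the -unknown detail concrete, here is a toy stand-in for what a config.sub-style script does with a target name. It is not the real config.sub: the canonicalize function and the KNOWN_CPUS list are invented for illustration. The point is only that a short name such as riscv64-linux-gnu can be expanded when the CPU is recognized, and is rejected otherwise.

```shell
# Toy stand-in for config.sub, NOT the real script.
# KNOWN_CPUS is a hypothetical list; the real one is much longer.
KNOWN_CPUS="mips moxie riscv32 riscv64"

canonicalize() {
    triplet=$1
    cpu=${triplet%%-*}    # everything before the first dash
    rest=${triplet#*-}    # everything after the first dash
    case " $KNOWN_CPUS " in
        *" $cpu "*) ;;
        *) echo "machine '$triplet' not recognized" >&2; return 1 ;;
    esac
    case $rest in
        unknown-*) echo "$cpu-$rest" ;;          # vendor already there
        *)         echo "$cpu-unknown-$rest" ;;  # fill in the vendor
    esac
}

canonicalize riscv64-linux-gnu    # prints riscv64-unknown-linux-gnu
```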

I also fixed a couple of extra things but they are not really relevant for this. Having a working environment preparation is a nice milestone by itself, but we did some more things on the GCC side!

Road to a working compiler: The GCC part

The riscv branch contains several commits. Most of them are small, but they are really important. I have to say this is full of details I don’t really understand, so I’ll try to focus on those I actually do. The rest of them are simply things that happened to work in the end. You know, this is pretty old software and the project is too complex to understand it all…

Memory models and fences

First, before doing anything else, we mentioned in the previous post that the memory models were something we needed to review. We knew this because the code related to memory models was used in a couple of parts of the RISC-V code we copied from the GCC 7.5 codebase, but it was not available in GCC 4.6.4. That API simply did not exist back then.

The commit 71dc25d removes the memory models from the code (they were already commented out, but the problem was not solved), taking into account the most conservative approach: always add the .aq flag and the fence instruction. This is not optimal, but the performance penalty is negligible and it doesn’t affect the functionality.

I did not come up with this myself. As I mentioned in the previous post, I asked the maintainer of the RISC-V support of GCC (who is also one of the big names of RISC-V) about this and he gave me this solution.

I also had to change the optabs a little bit, using memory_barrier instead of one of the more recent optabs. For this I just compared the code from the MIPS architecture and checked how it changed from the 4.6.4 to the 7.5, as I did for many other parts of this work. Easy-peasy.

Wrong arguments in the assembler call

As I mentioned in the Guix part, we were unable to call the assembler. This means we didn’t discover that the assembler call was broken until we actually put it in the PATH and tried to call it.

The commit 7030067 shows how I needed to make small changes in the way the assembler is called by GCC to ensure that it was called correctly.

This issue was easy to fix, but not that easy to catch. First I found the assembler was complaining because it didn’t understand the -k-march option. It took me some time to realize that those were two options merged together because of a missing space. Yes, the space at the end of the line is relevant.

I directly removed the -k option from the ASM_SPEC because my assembler was considering it ambiguous. I don’t remember where I copied this from but it works and I don’t want to think about it ever again.
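
The effect of the missing space is easy to reproduce outside GCC. The sketch below glues two option fragments together, the way two pieces of one long string end up concatenated. The join_spec helper and the -march=rv64g value are made up for the demonstration; only -k and the -k-march symptom come from the actual problem.

```shell
# Concatenate two option fragments exactly as given, with no
# separator added. join_spec is a hypothetical helper.
join_spec() {
    printf '%s%s\n' "$1" "$2"
}

join_spec "-k"  "-march=rv64g"   # prints -k-march=rv64g (one bogus option)
join_spec "-k " "-march=rv64g"   # prints -k -march=rv64g (two options)
```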

Libgcc: the core of this change

The biggest thing in this set of changes was the addition of libgcc, which is mandatory if you want to link your programs compiled with GCC. libgcc is a library GCC uses for complex operations: instead of generating the assembly code directly, it generates calls to libgcc, where those complex operations are defined. You can read further about those operations but they are not really relevant for this post, the relevant part is we need to add libgcc in order to have a working compiler.

The GCC codebase has different folders for its different blocks, so it’s not surprising to see there’s a folder called gcc for the core and a folder called libgcc for libgcc. Anyone would expect that just cherry picking the commit that added the libgcc support to GCC 7.5 would be enough to have the backport ready.

Sadly, life is a little bit harder than that.

Cherry picking the libgcc support

The first and easiest thing to do is to cherry pick the commit 72add2f and pray. It looked plausible to make it work, because, if you look at the changes it makes, it’s pretty well contained in the libgcc/config/riscv/ folder and adds just a couple of lines to libgcc/config.host to make it find the riscv folder.

The contents of the commit are pretty clear:

  1. Some assembly files that implement some operations
  2. Some header files and C code that implement other things
  3. Some weird files called t-something

The first two types of files we can understand as the body of the libgcc support: the juice. The t-something files are what are called Makefile Fragments.

The Makefile Fragments are the basis of the GCC build system. Files like config.host, also part of the commit, set a variable, tmake_file, where all the t-somethings are added so the compiler generator framework knows how to build things according to the rules described in them.

That’s how the GCC buildsystem works. Now let’s talk about the problems.

LIB2ADD iteration is broken

The first thing I realized when I cherry picked the libgcc support was that the whole thing did not build anymore. There was a crazy issue here.

We are not going to talk about the LIB2ADD variable yet, but we can see this small change, b9c7f39, affects it. The main issue here was that the whole makefile system (*.mk files in libgcc) was iterating over the values of the variable wrong, because the libgcc support commit was appending values to LIB2ADD instead of setting it. The LIB2ADD variable was set empty by the main makefiles, and appending to it left an empty entry, so the iteration process was trying to compile an empty value.

This was superhard to debug, but this small change just made the whole thing compile and now I was able to test the whole thing further.

Still broken

But it was still broken. GCC didn’t want to compile. Some weird errors appeared, mentioning something like the extra_parts were not coherent between gcc and libgcc. Weird.

Reading gcc/config.gcc and libgcc/config.host I understood how the extra_parts variable was used, and it certainly was incoherent between the two files. But why?

This led me to analyze the whole build system, comparing the RISC-V support with others. I realized here that the buildsystem is mixed in gcc and libgcc folders and it’s extremely difficult to know what’s the line that separates one from another.

Apart from that, the buildsystem was unable to compile the crt* files, because it didn’t know how to do it… The recipes were missing.

This made me go for the most aggressive change possible, 9c0f736: just copy everything from the libgcc/config/riscv/ to the gcc/config/riscv, add the rules for the crt* files and make the extra_parts coherent.

Of course, this is not a good change, but it lets us check whether the generated compiler is able to compile anything. “I’ll have time to clean this up later” I thought.

The buildsystem is just a pain in the butt

Now I was able to compile the GCC, so I could try it for some things.

I built a RISC-V cross compiler and tried to statically compile a small Hello World program. Errors appeared:

/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/lib/libc.a(printf_fp.o): in function `_nl_lookup':
/tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../include/../locale/localeinfo.h:315: undefined reference to `__unordtf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/lib/libc.a(printf_fp.o): in function `__printf_fp_l':
/tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/printf_fp.c:394: undefined reference to `__unordtf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/printf_fp.c:394: undefined reference to `__letf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/lib/libc.a(printf_fphex.o): in function `__printf_fphex':
/tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../stdio-common/printf_fphex.c:212: undefined reference to `__unordtf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../stdio-common/printf_fphex.c:212: undefined reference to `__unordtf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../stdio-common/printf_fphex.c:212: undefined reference to `__letf2'
collect2: ld returned 1 exit status

The most logical thing to do was to build a MIPS cross compiler and check if the same issue appeared. Of course, it didn’t.

Researching a little bit in the old GCC internals documentation, I found a couple of interesting things:

https://gcc.gnu.org/onlinedocs/gcc-4.6.4/gccint/Target-Fragment.html#Target-Fragment

  • The LIB2FUNCS_EXTRA variable is the one that contains what should be compiled and added to libgcc.
  • Floating Point Emulation support is added by generating a couple of files with some macros on top: fp-bit.c and dp-bit.c.

Neither of those was used in the libgcc support we backported because the GCC buildsystem changed a lot since 4.6.4. In fact, there is a commit[2], much later than the 4.6.4 release, that removes the need to generate those fp-bit.c thingies.

The LIB2FUNCS_EXTRA variable was not used either, but somewhere in the makefiles I found LIB2ADD was set from it. It looks like the whole buildsystem changed from LIB2FUNCS_EXTRA to LIB2ADD, which was an internal variable in the past. I don’t know.

I just moved the LIB2ADD to LIB2FUNCS_EXTRA and set the floating point emulation in the t-riscv makefile fragment and hoped my work was done there.

A huge pain in the butt

It still failed, but at least now the __letf2 symbol was found. The only one I needed to fix now was __unordtf2.

I was disheartened.

The __unordtf2 name did not appear anywhere in the code, but building libgcc for MIPS had the symbol inside (I checked it with nm!). I had no idea of what was going on.
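
A check like the nm one can be scripted. The helper below is only a sketch: defines_symbol is a made-up name, and it assumes the default three-column nm output (address, type, name), where undefined references have no address column.

```shell
# Succeeds if the nm listing on stdin DEFINES the given symbol.
# Undefined references have only two fields (type and name), so
# the NF == 3 check skips them.
defines_symbol() {
    awk -v sym="$1" '
        NF == 3 && $3 == sym && $2 != "U" { found = 1 }
        END { exit !found }
    '
}
```

Typical use would be piping nm into it, for example: nm path/to/libgcc.a | defines_symbol __unordtf2 && echo found (the archive path is a placeholder).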

I asked all my peers about this, and I was sent a program that was actually compilable and runnable (Janneke is a genius, someone has to say it!):

#include <stdio.h>

int
main ()
{
  return printf ("Hello, world!\n");
}

int
__unordtf2 ()
{
  return 0;
}

Hah! Still, no solution, but it was a little bit of hope.

This gave me the energy I needed to research further. This __unordtf2 function comes from software floating point support but the makefile fragments in the libgcc folder seem to be correctly set…

Moxie to the rescue

The MIPS architecture was too complex for this humble human being to understand, so I decided to go for Moxie this time.

Moxie is a really interesting thing, but we are not going to spend time on it, only on its support in GCC 4.6.4. Take a look at the files on both parts of the Moxie support, libgcc and gcc:

gcc/config/moxie
├── constraints.md
├── crti.asm
├── crtn.asm
├── moxie.c
├── moxie.h
├── moxie.md
├── moxie-protos.h
├── predicates.md
├── rtems.h
├── sfp-machine.h
├── t-moxie
├── t-moxie-softfp
└── uclinux.h

libgcc/config/moxie
├── crti.asm
├── crtn.asm
├── sfp-machine.h
├── t-moxie
└── t-moxie-softfp

As you can see, some things are repeated, and most of the files are located in the gcc part, which was not the case in the backported commit. I used this as a reference for a massive cleanup of the previous aggressive duplication and I ended up with this commit: 703efe3

But that wasn’t enough.

I also found that the soft-fp support did not come from the libgcc directory, but from the gcc one, so I needed to fix some makefile fragments. The reference on how to do that was located in gcc/config/soft-fp/t-softfp. This file described all the variables that I needed to set up to make the whole process find the software floating point functions to add (see how the function names are built with the $(m) variable? That’s why I couldn’t find where __unordtf2 came from…).
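
The reason a plain text search fails can be shown with a small loop. This is an illustration of the naming scheme only, not a copy of t-softfp: the mode list (sf, df, tf) and the two stems are assumptions chosen to reproduce the two symbols from the linker errors above.

```shell
# Build symbol names from a stem and a floating point mode, the
# same way a makefile variable like $(m) would be substituted.
# The lists are illustrative assumptions.
modes="sf df tf"

softfp_names() {
    for m in $modes; do
        for stem in le unord; do
            echo "__${stem}${m}2"
        done
    done
}

softfp_names
```

The full names, such as __unordtf2, only exist after the substitution, so they never appear literally in the makefile.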

Those variables were set in the libgcc/config/riscv/t-softfp* files. I replicated them in gcc/config/riscv, as in the Moxie target, and added references to them to the gcc/config.gcc file, copying the lines I had in libgcc/config.host. The process was still failing, as the variables were not found by the main makefile. I decided to hardcode them and give it another go. This time it built, I was able to compile files, and the weird errors did not appear anymore.

I realized in the end that the main makefile wasn’t finding the variables because I was referring to the t-softfp* files through the variable host_address, as it was done in libgcc/config.host. The problem was that the variable was not available in the main gcc/config.gcc file, so I had to make a beautiful switch-case to deduce the wordsize.
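
Since config.gcc is a shell fragment, that switch-case is a regular case statement. The sketch below shows the idea under stated assumptions: the select_fragments function, the target patterns and the fragment paths are illustrative and are not copied from the commit.

```shell
# Sketch of a config.gcc-style entry: deduce the word size from the
# target name, then reference the matching makefile fragments.
# Patterns and riscv/ paths are assumptions for illustration.
select_fragments() {
    target=$1
    case $target in
        riscv32*) wordsize=32 ;;
        riscv64*) wordsize=64 ;;
        *) echo "unsupported target: $target" >&2; return 1 ;;
    esac
    tmake_file="riscv/t-riscv riscv/t-softfp$wordsize"
    echo "$tmake_file"
}

select_fragments riscv64-linux-gnu   # prints riscv/t-riscv riscv/t-softfp64
```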

With all this knowledge and with the help from the Moxie support I finally arranged a new commit, where I duplicated the files that I needed to duplicate, added the correct references to the makefile fragments and I even fixed some of the variables in the makefiles: f42a214

Yeah, all this was hard to deduce, because this buildsystem is really complex and makefiles are really hard to debug[3]. Also, the fact that I don’t understand why I need to replicate the t-softfp* files in both places drives me mad, but I have to learn to deal with the fact that I can’t understand everything.

In these commits you can see I deleted references to extra_parts and some other things, too. The reason is simple: if other architectures don’t need to set those variables, neither do I. In the end, the crt* files were generated anyway.

Other changes

I also removed -latomic from the calls to the linker because it looks like it didn’t exist back then (we’ll see how this explodes in my face in the future), and fixed a couple more things, but that’s not really interesting in my opinion[4].

Missing things

There are many things still missing, but some of them I won’t even try because they are out of the scope of the project. Remember: we just need to be able to compile a more recent GCC, not the rest of the world.

Some of the things I left might become mandatory in the near future as we do proper testing of all this. My goal here was to provide something that can run, and then I’ll collaborate with the different agents in this bootstrapping effort to fix anything we need to reach the full bootstrapping support.

There are a few obvious things missing:

  • Big Endian support: riscv64be-linux-gnu support, basically (note the be in the target name). I won’t add this until we are sure we need it. It shouldn’t be difficult, I already found some commits in the main GCC where this was added and they were simple.
  • Specific device support: we didn’t add support for any specific device yet. That’s something we’ll need to think about in the future, but we probably won’t add it because it would make us maintain more code, and I don’t think generic RISC-V code is going to have issues on the majority of devices.
  • There are also many commits that came after the main port that fix some relocations and some other things. Many of them are not really relevant: they are related to bugs that were introduced later, or they fix things that won’t change anything in the only program we need to build (GCC), and so on. In order to know which ones are relevant we need…
  • Proper testing! I didn’t do this yet, and I’ll probably need help with it. Compile your RISC-V software with this and give it a try! Send me the errors you get!
  • Libatomic: it was directly removed from the calls to the linker, as I mentioned before, and we have to make sure it really didn’t exist back then, and so on. Boring things…
  • I didn’t even bother to add the testsuite support, our only test has to be if we are able to compile GCC with this, which I didn’t really try yet anyway (because it needs some extra things).

Conclusion

This part of the project came in the worst moment. I wasn’t really motivated and I had some personal things going on. It was difficult for me to do this.

In contrast with what I did in the previous steps of the project, this part is really uninteresting because it doesn’t give you a lot of chances for learning, which is the only thing that keeps me alive at this point.

It’s also pretty boring and exasperating to feel you’ll never understand something, and working almost by trial and error is really boring for someone like me.

Sometimes, working like this makes you feel really alone. You have almost no people to help you, and the project needs a huge amount of context to be understood, so you can’t ask anyone for help, and those who are supposed to know are really hard to reach. Or what might be worse: maybe there’s no one who understands this thing well, because it’s old, it changed a lot and probably just a handful of people really took part in the development of the fucking buildsystem.

In conclusion, this is a boring and uninteresting job, but someone has to do it, and… It was my turn this time.

You go next.


  1. Some people also spent time with me in the IRC. Thanks to all that helped! 

  2. 569dc494616700a3cf078da0cc631c36a4f15821 

  3. Try to run make --debug in a project of the size of GCC and laugh with me. 

  4. The rest of the post is not really interesting either, but I need to report what I did. It’s just me fighting against myself and a very complex buildsystem that could’ve been simpler and/or better documented.