Earlier in this series we introduced GCC, and I shared how I backported the RISC-V support of the GCC core to GCC 4.6.4. Now it’s time to finish what we left half-done and actually introduce a full RISC-V compiler.
Where we left off last time
On Tuesday, the 7th of April, I marked a commit with the minimal-compiler tag. That commit contains all the work we did until that time. In that tag we describe how we can build a compiler that is only able to compile files to RISC-V assembly.
As we already explained around here, GCC is a driver program that calls other programs to do its work. The GCC core compiles the code to assembly language and then calls binutils to do the rest of the work: assembly and linking.
At that point, we had to call binutils by hand.
The changes
The changes applied at the time of writing are available in the working-compiler tag. As the tag message describes, they were split in two different branches: the guix-package branch and the riscv branch.
The guix_package branch is merged in the riscv branch, but this split lets us differentiate which changes are related to the compiler itself and which are related to the tooling around the compiler. That way we’ll be able to choose what to do with the commits easily in the future. We’ll probably need to rearrange some stuff.
The context is everything: Guix package part
The guix_package branch contains all the commits that make the Guix tooling around the project work. This includes the definition of the compilation process in a reproducible way, the environment setup and all that.
As the working-compiler tag message describes, this is the way you can currently make this compiler work and play with it:
$ guix shell -m manifest.scm
$ source PREPARE_FOR_COMPILATION.sh riscv64-linux-gnu
# This second command will prepare the PATH and other environment
# variables to make GCC find libraries and executables
If you use this in the future and it fails, it might be because Guix changed some of the core packages involved between the time this blog post was written and the time you read it. You can always use the time-machine utility to make sure you use everything as it was when this post was written:
$ guix time-machine --channels=channels.scm -- shell -m manifest.scm
From this point you can directly run the compiler. It will need the sysroot option to be able to find the crt* files, but that’s something I’m not worried about at this point: we’ll fix that when we integrate this in the bootstrapping process.
Run the compiler like this now:
$ riscv64-linux-gnu-gcc --sysroot=$GUIX_ENVIRONMENT [-static] ...
Notable changes on the Guix side
The most notable change on the Guix side is the addition of the manifest.scm file and the PREPARE_FOR_COMPILATION.sh file. With the help of my man Janneke, I realized the problems I had came from the fact that I was calling the compiler with the wrong environment and it was unable to find the linker and the assembler. Yes, this kind of thing happens a lot in Guix if you are not careful (and I am not careful at all). Adding these tools let me prepare a working environment where the assembler and the compiler are found and called properly.
This change also includes some interesting extras: the GLibC added to the manifest also contains the static version, so we can generate static binaries that are easier to test in an emulated environment without having to deal with the dynamic linker. Important stuff.
Also, the compilation process now relies on a newer Guix version, which removed the -unknown part from the triplets (actually quadruplets), like riscv64-unknown-linux-gnu. That was a little bit of a pain, because one day I just tried to compile everything and it failed, and in the end it was just that small change. I decided to update the required Guix version to keep up with current Guix, so I didn’t need to run guix time-machine each time.
It’s better like this.
If you want to read more about the change and see how fast Guix people helped me understand what was going on, see this mailing list thread [1]. I also have to mention that I needed to add a small change to my GCC so it could work when the -unknown part is not present: adding riscv to config.sub was enough for that.
I also fixed a couple of extra things but they are not really relevant for this. Having a working environment preparation is a nice milestone by itself, but we did some more things on the GCC side!
Road to a working compiler: The GCC part
The riscv branch contains several commits. Most of them are small, but they are really important. I have to say this is full of details I don’t really understand, so I’ll try to focus on those I actually do. The rest of them are simply things that happened to work in the end. You know, this is pretty old software and the project is too complex to understand it all…
Memory models and fences
First, before doing anything else, we mentioned in the previous post that the memory models were something we needed to review. We knew this because the code related to memory models was used in a couple of parts of the RISC-V code we copied from the GCC 7.5 codebase, but it was not available in GCC 4.6.4. That API simply did not exist back then.
The commit 71dc25d removes the memory models from the code (they were already commented out, but the problem was not solved), taking the most conservative approach: always add the .aq flag and the fence instruction. This is not optimal, but the performance penalty is negligible and it does not affect the functionality.
I did not come up with this myself. As I mentioned in the previous post, I asked the maintainer of the RISC-V support of GCC (who is also one of the big names of RISC-V) about this and he gave me this solution.
I also had to change the optabs a little bit, using memory_barrier instead of one of the more recent optabs. For this I just compared the code of the MIPS architecture and checked how it changed from 4.6.4 to 7.5, as I did for many other parts of this work. Easy-peasy.
Wrong arguments in the assembler call
As I mentioned in the Guix part, we were unable to call the assembler. This means we didn’t discover that the assembler call was broken until we actually put it in the PATH and tried to call it.
The commit 7030067 shows how I needed to make small changes in the way the assembler is called by GCC to ensure that it was called correctly.
This issue was easy to fix, but not that easy to catch. First I found the assembler was complaining because it didn’t understand the -k-march option. It took me some time to realize that those were two options merged together because a space was missing. Yes, the space at the end of the line is relevant.
I directly removed the -k option from the ASM_SPEC because my assembler was considering it ambiguous. I don’t remember where I copied this from but it works and I don’t want to think about it ever again.
Libgcc: the core of this change
The biggest thing in this set of changes was the addition of libgcc, which is mandatory if you want to link your programs compiled with GCC. libgcc is a library GCC uses for complex operations: instead of generating the assembly code directly, it generates calls to libgcc, where those complex operations are defined. You can read further about those operations, but they are not really relevant for this post. The relevant part is that we need to add libgcc in order to have a working compiler.
The GCC codebase has different folders for its different blocks, so it’s not surprising to see there’s a folder called gcc for the core and a folder called libgcc for libgcc. Anyone would expect that just cherry picking the commit that added the libgcc support to GCC 7.5 would be enough to have the backport ready.
Sadly, life is a little bit harder than that.
Cherry picking the libgcc support
The first and easiest thing to do is to cherry pick the commit 72add2f and pray. It looked plausible to make it work, because, if you look at the changes it makes, it’s pretty well contained in the libgcc/config/riscv/ folder and adds just a couple of lines to the libgcc/config.sub to make it find the riscv folder.
The contents of the commit are pretty clear:
- Some assembly files that implement some operations
- Some header files and C code that implement other things
- Some weird files called t-something
The first two types of files we can understand as the body of the libgcc support: the juice. The t-something files are what are called Makefile Fragments.
The Makefile Fragments are the basis of the GCC build system. Files like config.host, also part of the commit, set a variable, tmake_file, where all the t-something files are added so the compiler generator framework knows how to build things according to the rules described in them.
That’s how the GCC buildsystem works. Now let’s talk about the problems.
LIB2ADD iteration is broken
The first thing I realized when I cherry picked the libgcc support was that the whole thing did not build anymore. There was a crazy issue here.
We are not going to talk about the LIB2ADD variable yet, but we can see this small change, b9c7f39, affects it. The main issue here was that the whole makefile system (the *.mk files in libgcc) was iterating over the values of the variable incorrectly, because the libgcc support commit was appending values to LIB2ADD instead of setting it. The LIB2ADD variable was set to empty in the main makefiles, and appending to it left an empty entry, so the iteration process was trying to compile an empty value.
This was superhard to debug, but this small change just made the whole thing compile and now I was able to test the whole thing further.
Still broken
But it was still broken. GCC didn’t want to compile. Some weird errors appeared, mentioning something like the extra_parts were not coherent between gcc and libgcc. Weird.
Reading gcc/config.gcc and libgcc/config.host I understood how the extra_parts variable is used, and saw that it was certainly incoherent between the two files. But why?
This led me to analyze the whole build system, comparing the RISC-V support with others. I realized here that the buildsystem is spread across the gcc and libgcc folders and it’s extremely difficult to know where the line that separates one from the other is.
Apart from that, the buildsystem was unable to compile the crt* files, because it didn’t know how to do it… The recipes were missing.
This made me go for the most aggressive change possible, 9c0f736: just copy everything from the libgcc/config/riscv/ to the gcc/config/riscv, add the rules for the crt* files and make the extra_parts coherent.
Of course, this is not a good change, but it lets us check whether the generated compiler is able to compile anything. “I’ll have time to clean this up later” I thought.
The buildsystem is just a pain in the butt
Now I was able to compile the GCC, so I could try it for some things.
I built a RISC-V cross compiler and tried to statically compile a small Hello World program. Errors appeared:
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/lib/libc.a(printf_fp.o): in function `_nl_lookup':
/tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../include/../locale/localeinfo.h:315: undefined reference to `__unordtf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/lib/libc.a(printf_fp.o): in function `__printf_fp_l':
/tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/printf_fp.c:394: undefined reference to `__unordtf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/printf_fp.c:394: undefined reference to `__letf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/lib/libc.a(printf_fphex.o): in function `__printf_fphex':
/tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../stdio-common/printf_fphex.c:212: undefined reference to `__unordtf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../stdio-common/printf_fphex.c:212: undefined reference to `__unordtf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../stdio-common/printf_fphex.c:212: undefined reference to `__letf2'
collect2: ld returned 1 exit status
The most logical thing to do was to build a MIPS cross compiler and check if the same issue appeared. Of course, it didn’t.
Researching a little bit in the old GCC internals documentation, I found a couple of interesting things:
https://gcc.gnu.org/onlinedocs/gcc-4.6.4/gccint/Target-Fragment.html#Target-Fragment
- The LIB2FUNCS_EXTRA variable is the one that contains what should be compiled and added to libgcc.
- Floating Point Emulation support is added by generating a couple of files with some macros on top: fp-bit.c and dp-bit.c.
Neither of those was used in the libgcc support we backported because the GCC buildsystem changed a lot since 4.6.4. In fact, there is a commit [2], much later than the 4.6.4 release, that removes the need to generate those fp-bit.c thingies.
The LIB2FUNCS_EXTRA variable was not used either, but somewhere in the makefiles I found LIB2ADD was set from it. It looks like the whole buildsystem changed from LIB2FUNCS_EXTRA to LIB2ADD, which was an internal variable in the past. I don’t know.
I just moved the contents of LIB2ADD to LIB2FUNCS_EXTRA, set up the floating point emulation in the t-riscv makefile fragment and hoped my work was done there.
A huge pain in the butt
It still failed, but at least now the __letf2 symbol was found. The only one I needed to fix now was __unordtf2.
I was disheartened.
The __unordtf2 name did not appear anywhere in the code, but libgcc built for MIPS had the symbol inside (I checked it with nm!). I had no idea what was going on.
I asked all my peers about this, and I was sent a program that was actually compilable and runnable (Janneke is a genius, someone has to say it!):
#include <stdio.h>

int
main ()
{
  return printf ("Hello, world!\n");
}

int
__unordtf2 ()
{
  return 0;
}
Hah! Still, no solution, but it was a little bit of hope.
This gave me the energy I needed to research further. This __unordtf2 function comes from software floating point support but the makefile fragments in the libgcc folder seem to be correctly set…
Moxie to the rescue
MIPS architecture was too complex to be understandable for this humble human being so I decided to go for Moxie this time.
Moxie is a really interesting thing. We are not going to spend time on Moxie itself, though, but on its support in GCC 4.6.4. Take a look at the files on both sides of the Moxie support, libgcc and gcc:
gcc/config/moxie
├── constraints.md
├── crti.asm
├── crtn.asm
├── moxie.c
├── moxie.h
├── moxie.md
├── moxie-protos.h
├── predicates.md
├── rtems.h
├── sfp-machine.h
├── t-moxie
├── t-moxie-softfp
└── uclinux.h
libgcc/config/moxie
├── crti.asm
├── crtn.asm
├── sfp-machine.h
├── t-moxie
└── t-moxie-softfp
As you can see, some things are repeated, and most of the files are located in the gcc part, which was not the case in the backported commit. I used this as a reference for a massive cleanup of the previous aggressive duplication and I ended up with this commit: 703efe3
But that wasn’t enough.
I also found that the soft-fp support did not come from the libgcc directory, but from the gcc one, so I needed to fix some makefile fragments.
The reference on how to do that was located in gcc/config/soft-fp/t-softfp. This file described all the variables that I needed to set up to make the whole process find the software floating point functions to add (see how the function names are built with the $(m) variable? That’s why I couldn’t find where __unordtf2 came from…).
Those variables were set in the libgcc/config/riscv/t-softfp* files. I replicated them in gcc/config/riscv as in the Moxie target and added references to them to the gcc/config.gcc file, copying the lines I had in libgcc/config.host. The process was still failing, as the variables were not found by the main makefile. I decided to hardcode them and give it another go. This time it built, I was able to compile files, and the weird errors did not appear anymore.
I realized in the end that the reason why the main makefile wasn’t finding the variables was that I was referring to the t-softfp* files through the variable host_address, as it was done in libgcc/config.host. The problem was that this variable was not available in the main gcc/config.gcc file, so I had to make a beautiful switch-case to deduce the word size.
With all this knowledge and with the help from the Moxie support I finally arranged a new commit, where I duplicated the files that I needed to duplicate, added the correct references to the makefile fragments and I even fixed some of the variables in the makefiles: f42a214
Yeah, all this was hard to deduce, because this buildsystem is really complex and makefiles are really hard to debug [3]. Also, the fact that I don’t understand why I need to replicate the t-softfp* files in both places drives me mad, but I have to learn to deal with the fact that I can’t understand everything.
In these commits you can see I deleted references to extra_parts and some other things, too. The reason is simple: if other architectures don’t need to set those variables, neither do I. In the end, the crt* files were generated anyway.
Other changes
I also removed -latomic from the calls to the linker because it looks like it didn’t exist back then (we’ll see how this explodes in my face in the future), and fixed a couple more things, but that’s not really interesting in my opinion [4].
Missing things
There are many things still missing, but some of them I won’t even try because they are out of the scope of the project. Remember: we just need to be able to compile a more recent GCC, not the rest of the world.
Some of the things I left might become mandatory in the near future as we do proper testing of all this. My goal here was to provide something that can run, and then I’ll collaborate with the different agents in this bootstrapping effort to fix anything we need to reach the full bootstrapping support.
There are a few obvious things missing:
- Big Endian support: riscv64be-linux-gnu support, basically (note the be in the target name). I won’t add this until we are sure we need it. It shouldn’t be difficult, I already found some commits in the main GCC where this was added and they were simple.
- Specific device support: we didn’t add support for any specific device yet. That’s something we’ll need to think about in the future, but we probably won’t add it because it would make us maintain more code, and I don’t think generic RISC-V code is going to have issues on the majority of devices.
- There are also many commits that came after the main port that fix some relocations and some other things. Many of them are not really relevant, because most of them are related with bugs that were introduced later, fix things that won’t change anything in the only program we need to build (GCC) and so on. In order to know which ones are relevant we need…
- Proper testing! I didn’t do this yet, and I’ll probably need help with it. Compile your RISC-V software with this and give it a try! Send me the errors you get!
- Libatomic: was directly removed from the calls to the linker, as I mentioned before and we have to make sure it didn’t exist back then and so on. Boring things…
- I didn’t even bother to add the testsuite support, our only test has to be if we are able to compile GCC with this, which I didn’t really try yet anyway (because it needs some extra things).
Conclusion
This part of the project came in the worst moment. I wasn’t really motivated and I had some personal things going on. It was difficult for me to do this.
In contrast with what I did in the previous steps of the project, this part is really uninteresting because it doesn’t give you a lot of chances for learning, which is the only thing that keeps me alive at this point.
It’s also exasperating to feel you’ll never understand something, and trying over and over, almost by trial and error, is really boring for someone like me.
Sometimes, working like this makes you feel really alone. You have almost no people to help you, the project needs a huge amount of context to be understood so you can’t ask just anyone for help, and those who are supposed to know are really hard to reach. Or, what might be worse: maybe there’s no one who understands this thing well, because it’s old, it changed a lot and probably just a handful of people really took part in the development of the fucking buildsystem.
In conclusion, this is a boring and uninteresting job, but someone has to do it, and… It was my turn this time.
You go next.
[1] Some people also spent time with me on IRC. Thanks to all who helped!
[2] 569dc494616700a3cf078da0cc631c36a4f15821
[3] Try to run make --debug in a project of the size of GCC and laugh with me.
[4] The rest of the post is not really interesting either, but I need to report what I did. It’s just me fighting against myself and a very complex buildsystem that could’ve been simpler and/or better documented.