Ekaitz's tech bloghttps://ekaitz.elenq.tech/2024-03-28T00:00:00+02:00I make stuff at ElenQ Technology and I talk about itGCC 4.6.4 with RISC-V support2024-03-28T00:00:00+02:002024-03-28T00:00:00+02:00Ekaitz Zárragatag:ekaitz.elenq.tech,2024-03-28:/bootstrapGcc10.html<p>We built <span class="caps">GCC</span> 4.6.4 with <span class="caps">RISC</span>-V support and C++ and all that in a Debian
machine, in a VisionFive board. Here is how.</p><p>I already mentioned at the <a href="https://fosdem.org/2024/schedule/event/fosdem-2024-1755-risc-v-bootstrapping-in-guix-and-live-bootstrap/"><span class="caps">FOSDEM</span>-2024</a> that we built <span class="caps">GCC</span> 4.6.4 in
a Debian machine in real <span class="caps">RISC</span>-V hardware, but I didn’t explain the specifics.
Since then we’ve been working on many other parts and trying to package it for
Guix, which happened to be harder than we thought (more on that later).</p>
<p>Today I decided to build it again, to make sure it was still possible to do, including
<span class="caps">GCC</span>’s bootstrapping process that was giving us headaches in Guix, and to write the
process down, because I remember that the last time I tried, it failed and I was
worried. Maybe I did something I forgot? Or was my brain playing tricks on me?</p>
<p>That’s why I’m writing this down. You already know how it works. It has
happened to you, and if it didn’t it probably will.</p>
<h5>Debian</h5>
<p>Install some simple deps:</p>
<pre><code class="language-bash">$ sudo apt install build-essential git
</code></pre>
<p>Clone the repo and jump to it, to the <code>working-compiler-c++</code> tag:</p>
<pre><code class="language-bash">$ git clone https://github.com/ekaitz-zarraga/gcc.git
$ cd gcc
$ git checkout working-compiler-c++
</code></pre>
<p>In Debian the <code>riscv64-linux-gnu</code> standard library is installed in a weird
location, so we need to make the compilation process find it.</p>
<pre><code class="language-bash">$ export C_INCLUDE_PATH=/usr/include/riscv64-linux-gnu/
$ export CPLUS_INCLUDE_PATH=/usr/include/riscv64-linux-gnu/
$ export LIBRARY_PATH=/usr/lib/riscv64-linux-gnu/
</code></pre>
<p>We need to patch a couple of things, which we’ll also set up in the Guix recipe. I
decided to keep them out of the codebase because I want to keep the code
consistent with the past.</p>
<p>This first patch renames <code>struct ucontext</code> to the modern name:
<code>ucontext_t</code>. This change is only needed to make <span class="caps">GCC</span> compilable with a modern toolchain;
in other contexts you might want to keep the old name.</p>
<pre><code class="language-bash">$ sed -i 's/struct ucontext/ucontext_t/g' gcc/config/*/linux-unwind.h
</code></pre>
<p>Next, to avoid an error related to pthread, you have to apply this diff, which
just removes a pthread reference from <code>gcc/config/riscv/linux.h</code>:</p>
<pre><code class="language-diff">diff --git a/gcc/config/riscv/linux.h b/gcc/config/riscv/linux.h
index cd027813b41..d7d2b0978de 100644
--- a/gcc/config/riscv/linux.h
+++ b/gcc/config/riscv/linux.h
@@ -23,16 +23,6 @@ along with GCC; see the file COPYING3. If not see
#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-riscv" XLEN_SPEC "-" ABI_SPEC ".so.1"
-/* FIXME */
-/* Because RISC-V only has word-sized atomics, it requries libatomic where
- others do not. So link libatomic by default, as needed. */
-#undef LIB_SPEC
-#ifdef LD_AS_NEEDED_OPTION
-#define LIB_SPEC GNU_USER_TARGET_LIB_SPEC \
- " %{pthread:" LD_AS_NEEDED_OPTION LD_NO_AS_NEEDED_OPTION "}"
-#else
-#endif
-
#define LINK_SPEC "\
-melf" XLEN_SPEC "lriscv \
%{shared} \
</code></pre>
<p>Now, configure and build, classic <span class="caps">GNU</span> build-system:</p>
<pre><code class="language-bash">$ ./configure --build=riscv64-linux-gnu --enable-languages=c,c++ \
--disable-shared --disable-gomp --prefix=/data/prefix
$ make -j4
</code></pre>
<p>This should finish properly and give you a working <span class="caps">GCC</span>.</p>
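<p>As a quick sanity check (a sketch of mine, not part of the original notes), you can
compile a trivial program. Substitute the freshly built compiler,
<code>/data/prefix/bin/gcc</code> after a <code>make install</code>, for the plain <code>gcc</code> shown
here, which is just whatever compiler is on your <code>PATH</code> so the example runs anywhere:</p>
<pre><code class="language-bash"># Hypothetical smoke test: compile and run a hello-world program.
# Replace `gcc` with `/data/prefix/bin/gcc` (the --prefix above) to
# exercise the freshly built compiler instead.
cat > hello.c <<'EOF'
#include <stdio.h>
int main(void) { puts("hello"); return 0; }
EOF
gcc hello.c -o hello
./hello
</code></pre>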
<h5>Guix</h5>
<p>In the <code>riscv</code> branch you can see more work by Efraim Flashner, trying to make a
Guix package we can use later, but it’s not as easy as it looks. Guix is not
like Debian in many ways, and that makes the process a little bit harder.</p>
<p>Efraim managed to fix the package I made (see <code>guix.scm</code>) to work on i386 but
the very same package didn’t work for <span class="caps">RISC</span>-V without changes.</p>
<p>The main problem has to do with <span class="caps">GCC</span>’s bootstrapping process.</p>
<h6><span class="caps">GCC</span>’s bootstrapping process</h6>
<p>When we talk about <span class="caps">GCC</span>’s bootstrapping process we don’t mean whole-distribution
bootstrapping, which is what we are trying to achieve in this very
project, but the process the compiler uses to check itself.</p>
<p>When you build <span class="caps">GCC</span> from source, it is first built with the compiler you have on your
machine. The resulting compiler is then used to compile <span class="caps">GCC</span>’s
source code again, and that compiler builds <span class="caps">GCC</span> once more. The binaries
generated by the last two stages are compared to each other, and if they are
not identical (bit by bit) the build process is considered a failure.</p>
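<p><span class="caps">GCC</span>’s <code>make compare</code> step is, in essence, a byte-by-byte <code>cmp</code> of the stage 2
and stage 3 object files. A toy sketch of that check, with made-up file names:</p>
<pre><code class="language-bash"># Toy illustration of the stage comparison: two files standing in for the
# same object file as produced by stage 2 and stage 3. If they are not
# byte-identical, the bootstrap is considered a failure.
printf 'same bits' > stage2-foo.o
printf 'same bits' > stage3-foo.o
if cmp -s stage2-foo.o stage3-foo.o; then
    echo "bootstrap comparison OK"
else
    echo "bootstrap comparison FAILED"
fi
</code></pre>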
<p>This is giving us some headaches in Guix. We managed to make it finish, but the
last stages are slightly different, for reasons we are not yet sure about.</p>
<h5>So…</h5>
<p>The reason I built this thing in Debian again was to remind myself that it is
possible to build it right, passing the comparison step, so I could gather a
little more motivation to make it build properly in Guix.</p>
<p>Let’s see if this serves its purpose and we manage to make it soon.</p>FOSDEM and Guix Days 20242024-02-12T00:00:00+02:002024-02-12T00:00:00+02:00Ekaitz Zárragatag:ekaitz.elenq.tech,2024-02-12:/fosdem-2024.html<p>About my personal <span class="caps">FOSDEM</span> 2024 experience and Guix Days</p><p>This year I gave a talk at <span class="caps">FOSDEM</span>, summarizing our work on the Guix
bootstrapping for <span class="caps">RISC</span>-V, so I decided to take a couple of extra days off and
also visit the Guix Days, which were happening before <span class="caps">FOSDEM</span> itself. Let’s
do a short summary of my experience here.</p>
<h3>Guix Days</h3>
<p>I visited the Guix Days but not full-time because I wanted to spend some time
in Brussels itself, rather than being locked in a place for the whole day.</p>
<p>We had some interesting discussions about Guix. The most interesting for me was
the one on Guix governance, where we discussed how Guix is managed and how it works
internally at a social level. This discussion was especially important for me as
an external contributor, because I believe Guix has a complex but opaque
internal structure that is difficult to grasp from the outside. Being in places
like the Guix Days lets you understand it, but it’s our
responsibility to make Guix accessible to people who don’t have time to
come to these kinds of events.</p>
<p>I say all this because I’m basically that person. I don’t enjoy these kinds of
events that much, and I felt a little bit forced to be there, just to be
something more than a random string in the <span class="caps">IRC</span> chat.</p>
<p>I had the chance to be there, but it was an effort for me. I’d like it not to be
the same for others.</p>
<p>It’s not like I’m antisocial; in fact I feel I’m a very social person. But I
don’t like the politics of things, and this event, more than anything else,
felt like a political event where I had to be present just to show up.</p>
<p>This is not just a Guix problem. Most sufficiently large projects fall into this kind
of dynamic, where people who show up are better regarded than those who
don’t. It makes sense (this is how the world works), but at the same time it
doesn’t (I don’t like how the world works).</p>
<h3><span class="caps">FOSDEM</span> 2024</h3>
<p>We arrived at <span class="caps">FOSDEM</span> on Saturday. It was literally impossible to do anything
there. It was simply overcrowded. We tried to watch some talks; all were
full. We waited in a long queue to enter one, didn’t get in
in the end, and decided to leave. We had a nice day in the city instead.</p>
<p><a href="https://fosdem.org/2024/schedule/event/fosdem-2024-1755-risc-v-bootstrapping-in-guix-and-live-bootstrap/">Sunday morning I gave a talk</a>. Sunday was way better: we could walk
around and do things but we spent the morning in the Declarative and
Minimalistic Computing Devroom until my talk happened. After my talk, we
watched a couple more there and left for a very good lunch.</p>
<p>I think the talk went well, but I’ll leave further judgement to you. Feel free
to watch it and send me feedback if you’d like.</p>
<h3>My feelings</h3>
<p>On a personal level, traveling (taking flights and all that…) is mentally
exhausting for me, and it’s also expensive. I don’t feel like I will do this
often in the future, just as I didn’t in the past.</p>
<p>Also, I don’t enjoy geeky events like these that much. I don’t use Guix or any
other software as part of a tribe; I just think it’s useful. I enjoy other
kinds of social interaction more. I felt like an outsider at both events,
but I don’t really want to become anything other than that. I don’t feel
comfortable becoming “part” of anything. I believe software is not a cult,
and everyone should have the freedom to contribute and enjoy it in a purely
practical way, with no identities involved.</p>
<p>Also, I felt that people around those higher latitudes are colder: they laugh
less than I do and they don’t have the boiling blood that I have. Maybe that’s
why my talk made people laugh and react. There’s nothing wrong with that,
culture is always cool, but the cultural mismatch felt like a bit of
a barrier.</p>
<p>Apart from all that, I had the chance to visit a cool city with my significant
other and with my friends, who came to support me in my talk and enjoy a
conference. We probably didn’t enjoy the conference that much, but I
experienced being surrounded by people who love me and had a lot of fun with
them, and that’s more valuable than anything else.</p>
<p>On the other hand, I don’t need <span class="caps">FOSDEM</span> for that. I feel grateful for my people
every single day of the year.</p>
<p>Don’t expect to find me at many geek events like these, but I don’t totally
rule out showing up from time to time.</p>Guix + Zig + NSIS for the win…DOWS?2023-12-15T00:00:00+02:002023-12-15T00:00:00+02:00Ekaitz Zárragatag:ekaitz.elenq.tech,2023-12-15:/windows2.html<p>How I made a program for Windows and <span class="caps">GNU</span>/Linux without touching any Windows
machine. The tools and the tricks to be effective (Zig and <span class="caps">NSIS</span> for the win).</p><p>Some months later, it’s time to talk about <a href="https://ekaitz.elenq.tech/windows.html">the post I made about writing
software for windows, without windows</a>, because I put some of those things
into practice.</p>
<p>I made a simple application with one external audio library (OpenAL) and simple
networking. No <span class="caps">GUI</span> this time, but some complexity was there.</p>
<h3>The program</h3>
<p>The program I’ll discuss here is Karkarkar, a tool to read a Twitch chat out
loud in Basque. I made this for the Basque streaming community of gamers, which
is really cool but had to rely on Text-To-Speech systems for other
languages, as the main services for streamers don’t include Basque in their
offering, and the closest language, Spanish, is not good at pronouncing some Basque words.</p>
<p>Most of these people are gamers, and they use Windows, which I don’t like, but
in the end they are also people and they deserve Free Software regardless of
the Operating System they decide to (or are forced to) use.</p>
<p>I took this as a cool exercise to test all we talked about in the past, and as
a learning experience for the times I need to do something for a client that
requires Windows support or anything like that.</p>
<h4>The Text-To-Speech system</h4>
<p>I used a <span class="caps">TTS</span> library by Aholab, a research group from the Bilbao School of
Engineering, the university where I studied. They released AhoTTS (<em>ahots</em>
means <em>voice</em> in Basque) on GitHub some years ago: a working <span class="caps">TTS</span> library for
Basque and Spanish, with all the extra data it needs to work.</p>
<p>The <em>only</em> problem is that the AhoTTS codebase is a mess, with tons of
horrible decisions and hidden bugs. I had to fork the lib and make it work
more or less before adding it to the project (I won’t discuss the issues here),
but I did.</p>
<h4>Connection to the Twitch chat</h4>
<p>This simply uses <span class="caps">IRC</span> to connect to the Twitch chat. The protocol is simple, and I
only implemented an embarrassingly minimal amount of it: enough to make it kind
of work. I hope to implement more of it in the future.</p>
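<p>As a rough sketch of what “minimal” means here (the nick and channel are made
up, and this is not the program’s actual code), the whole login boils down to a
couple of <span class="caps">CRLF</span>-terminated lines; Twitch’s <span class="caps">IRC</span> gateway even accepts anonymous
read-only logins with a <code>justinfan</code> nick:</p>
<pre><code class="language-bash"># Hypothetical sketch of the minimal IRC handshake the program needs.
# IRC messages are terminated with CRLF; nick and channel are made up.
twitch_handshake() {
    printf 'NICK %s\r\nJOIN #%s\r\n' "$1" "$2"
}
twitch_handshake justinfan12345 somechannel
</code></pre>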
<h4>Playing the audio</h4>
<p>I started with <code>libao</code>, a simple audio library and then moved to OpenAL for
reasons I’ll mention next.</p>
<h4>All together</h4>
<p>The program listens to <span class="caps">IRC</span>; when a message arrives it sends it to AhoTTS,
receives the samples to play, and hands them to OpenAL, which does its magic to
play them out loud.</p>
<p>Everything is the simplest thing possible, as I didn’t have a lot of time to
spend on this and wanted to focus on the release process: being able to
make a package for Windows and some Linux distributions. <strong>Adding more code on
top of that is easy once the problem of distribution is solved.</strong></p>
<h3>The tooling</h3>
<p>My strongest dependency<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>, AhoTTS, is written in C++, so I decided to go for
binary distribution. And as <a href="https://ekaitz.elenq.tech/windows.html#zig">I discussed in the aforementioned post</a>,
I was looking for an excuse to make a project in Zig, and its cross-compilation
capabilities could help me in this case, so I went for it. I made everything
with Zig 0.10.1, as that was the latest version packaged for Guix, but I was later
forced to move to my own Zig 0.11.0 package (see <a href="#testing">Testing</a>).</p>
<p>For binary distribution, I didn’t have many ideas when I wrote the post, but a
person at the Lisp Game Jam of that time suggested I use <span class="caps">NSIS</span> for the
Windows installer. It was already packaged in Guix, my distro of choice, so I
just went for it, as it looked like the simplest way to solve this.</p>
<p>For testing all this, I relied on <code>wine</code>; what else could I do?</p>
<p>So tl;dr:</p>
<ul>
<li>I’m coding on <a href="https://guix.gnu.org/">Guix</a>.</li>
<li>Programming in <a href="https://ziglang.org/">Zig</a> 0.10.1 but then moved to
<a href="https://ziglang.org/">Zig</a> 0.11.0.</li>
<li>Installer using <a href="https://nsis.sourceforge.io/Main_Page"><span class="caps">NSIS</span></a>.</li>
<li>Testing done with <a href="https://www.winehq.org/">Wine64</a>.</li>
</ul>
<h3>Keeping it small</h3>
<p>First of all, publishing software for Windows from Linux is painful if you have
to compile everything for it yourself. Guix helps a little bit with that, as you can use
<code>--target=</code> with mingw and have some luck. Sadly, many packages (most of them)
need <code>bash-minimal</code>, which <a href="https://issues.guix.gnu.org/62226">is not buildable for mingw at the
moment</a>.</p>
<p>In many software projects cross-compilation is not even possible.</p>
<p>Knowing that, I decided to keep my project as small as possible, because I
wanted to actually deliver something without losing my sanity.</p>
<p>Every library you depend on, you also have to compile and deliver to your target.
It’s not that common for Windows users to install things by themselves the way a
<span class="caps">GNU</span>/Linux person would.</p>
<h4>Audio library</h4>
<p><code>libao</code> is the smallest audio library I could find, but I didn’t manage to
build it for Windows myself, so I had to rely on something else. The
OpenAL-Soft maintainers provide a binary distribution for Windows, so I decided to
go for that one instead. It’s way harder to use, and I had to do some weird
stuff to make it work, but it’s easier for me: no need to build it myself.</p>
<p>This matters even more for an audio library: interacting
with the system is always a pain in the ass, and trying to cross-compile a
library like this is not always easy, as its configure scripts are complex and
need to check for many things.</p>
<p>This is why I also avoided a <span class="caps">GUI</span> for the moment. Too much.</p>
<h4>AhoTTS</h4>
<p>AhoTTS is written in C++, has zero dependencies, and uses CMake as its build system.</p>
<p>At the beginning I cross-compiled it to mingw using Guix; because it doesn’t have
dependencies, it just worked. Later I encountered some issues though:
<code>libstdc++</code> and <code>libgcc</code> were not found at runtime and I had to statically link
them (<code>-static-libgcc -static-libstdc++</code> ftw).</p>
<p>Later I started digging into its code, fixing some <strong>horrible</strong> things inside, and
then I managed to add it as a submodule (yes, I hate that too) and compile it
directly with Zig. This is the best option so far: you can statically link the
library, it is built with the same process, and it is cross-compiled by Zig. No
more missing-library problems.</p>
<blockquote>
<p>Extra: In order to interact with AhoTTS from Zig, I had to write <a href="https://github.com/ekaitz-zarraga/karkarkar/blob/master/ahotts_c/htts.cpp">a small C++
to C bridge</a>, something I had never done before. It’s easy stuff: just simplify
the <span class="caps">API</span> a little bit, put some <code>extern "C"</code> on it and everything will go
fine. <code>void *</code> is your friend.</p>
</blockquote>
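<p>A minimal sketch of that pattern (the class and function names are made up, not
AhoTTS’s real <span class="caps">API</span>): hide the C++ object behind a <code>void *</code> handle and expose
<code>extern "C"</code> functions that Zig can call:</p>
<pre><code class="language-cpp">#include <cstdio>
#include <string>

// Stand-in for the C++ TTS engine (hypothetical, not AhoTTS's real API).
class Tts {
public:
    std::string speak(const std::string& text) { return "samples:" + text; }
};

// The C bridge: an opaque handle plus extern "C" wrappers, callable from Zig.
extern "C" {
    void* tts_new(void) { return new Tts(); }
    void  tts_free(void* h) { delete static_cast<Tts*>(h); }
    const char* tts_speak(void* h, const char* text) {
        static std::string out;  // sketch only: not thread-safe
        out = static_cast<Tts*>(h)->speak(text);
        return out.c_str();
    }
}

int main() {
    void* h = tts_new();
    std::puts(tts_speak(h, "kaixo"));
    tts_free(h);
}
</code></pre>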
<h4>The rest of it</h4>
<p>That’s just writing some actual code and making the program run. That’s the
easiest part.</p>
<h3>Bringing it to the users</h3>
<p>One thing is making something that builds and runs on your computer, and a very
different thing is making it work on other people’s computers. And this is the
main problem I wanted to discuss here.</p>
<p>I have some requirements:</p>
<ul>
<li>I made a terminal application, but I don’t expect my users to know how to run
it. I need to make something that is click and run.</li>
<li>I need it to have some icon in the desktop/startup-menu.</li>
<li>The AhoTTS library needs some extra data files. These are searched by the
library and need to have some specific structure. These need to be installed
properly, too.</li>
<li>I need to provide a simple way to uninstall the application.</li>
</ul>
<h4>Windows</h4>
<p>The cool thing about this is that I haven’t owned a Windows machine since ~2010 and I have
no access to one (living the good life!). I have no plans to change that,
nor do I have plans to really learn about Windows. So we have to be clever to
solve this.</p>
<p>First things first:</p>
<pre><code class="language-bash">zig build -Dcpu=baseline -Dtarget=x86_64-windows-gnu
</code></pre>
<p>Damn! That was actually very easy to do. Why isn’t this the norm in other languages?</p>
<p>Of course, in order to do this I needed to provide a proper <code>build.zig</code> file
that was able to find the DLLs I was linking against and the header files.
That’s not that difficult after all<sup id="fnref:dll-names"><a class="footnote-ref" href="#fn:dll-names">2</a></sup>, but it has to be done. Still,
easy. Kudos to Zig.</p>
<p>Now, knowing where Windows searches for the DLLs we depend on is important, as our
installation process will depend on it. There is <a href="https://learn.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-search-order#standard-search-order-for-unpackaged-apps">some good documentation for
that</a>, with a very interesting point:</p>
<blockquote>
<p>Standard search order for unpackaged apps:<br>
…<br>
7. The folder from which the application loaded.</p>
</blockquote>
<p>This seems good enough for my purpose, so I can just install my binary and the
libraries in the very same folder. I thought I would need to install stuff
scattered somewhere else and learn a lot about Windows, but I didn’t have to. Simple!</p>
<p>The extra data can be stored anywhere I want, because I am the one who
controls the search algorithm, but I still need easy access to it. I
decided to go for <code>LOCALAPPDATA</code>, but I’m thinking of putting it in the same
folder as the rest of the program. At the time of writing, that’s not done yet.</p>
<h5><span class="caps">NSIS</span></h5>
<p>Once it’s clear where everything should be installed, it’s time to
actually install it.</p>
<p><span class="caps">NSIS</span> is a great tool. The look of the vanilla installer is old-school, but I
didn’t even bother to use the modern interface because that required me to
think, an activity I like to reserve for special occasions (like when I’m paid
extreme amounts of money for it, or the time I’m in bed before falling asleep).</p>
<p><span class="caps">NSIS</span> in a nutshell: you write a script, run <code>makensis</code> with the script as an
input and, boom! You have a <code>.exe</code> that installs your stuff.</p>
<p>It took me a little while to understand the structure of the script, but once
you learn it, it is really easy (for the basic things; after that you can go as
hard as you want). There are two concepts you need to understand: Pages and Sections.</p>
<ul>
<li>Pages: define the different <em>pages</em> the user will navigate through during the
process. There are many pages pre-defined, and they cover all the basic
functionality: license agreement, component selection, installation directory selection…</li>
<li>Sections: define the different parts of your installation process. You can
mark some as optional or include them in different installation profiles
(all, minimal, recommended… you’ve seen this before). Then, if you use the
<code>components</code> page, the sections will be listed so the user can choose which
ones they want to install. You can also add sections for the uninstaller,
which will only run when the user uninstalls the program.</li>
</ul>
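<p>A minimal sketch of how the two concepts fit together (file names and paths
here are illustrative, not my actual installer script):</p>
<pre><code class="language-nsis">; Hypothetical minimal installer: three predefined pages, two sections.
Name "Karkarkar"
OutFile "karkarkar-installer.exe"
InstallDir "$PROGRAMFILES64\Karkarkar"

Page directory    ; let the user pick the installation folder
Page components   ; list the sections so the user can choose
Page instfiles    ; run the actual installation

Section "Program"
  SetOutPath $INSTDIR
  File "karkarkar.exe"
  File "OpenAL32.dll"   ; DLLs next to the .exe (see the search-order trick above)
SectionEnd

Section /o "Sources"    ; /o: optional, unchecked by default
  SetOutPath "$INSTDIR\src"
  File /r "src\*"
SectionEnd
</code></pre>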
<blockquote>
<p><span class="caps">EXAMPLE</span>: I decided to also include the sources in the installer, but they are
not required to run the program. It’s as simple as adding a Section with the
<code>/o</code> flag, and it won’t be automatically checked in the components step.
Really cool stuff!</p>
</blockquote>
<p>The rest of it is just commands you can read about in the documentation. You
can do many, many things with it. It has built-in variables for almost
anything you’ll need, so you don’t need to hardcode things in the script.</p>
<p>In my case I decided to ask the user for some configuration (the Twitch username)
during the installation using a custom page (this requires some digging in the
plugins’ documentation, but it’s not hard either), and created a launcher that
automagically inserts the username in the call to the program (not great for
later reconfiguration, I know). This is done with a Batch file, which also does
the dirty job of opening the terminal when it’s double-clicked.</p>
<p>Here’s the installer script I did, if you want to read it:</p>
<p><a href="https://github.com/ekaitz-zarraga/karkarkar/blob/master/windows/installer.nsi">https://github.com/ekaitz-zarraga/karkarkar/blob/master/windows/installer.nsi</a></p>
<h4><span class="caps">GNU</span>/Linux</h4>
<p>The <span class="caps">GNU</span>/Linux world is really diverse, so it’s not easy to know every single
system’s requirements. For the moment I stayed with Guix and Debian, because they
are the only distros I use and the ones I’m most familiar with.</p>
<p>On Windows I was asking the user for their username during the installation
process, but on Linux I don’t have any simple (for the user) way to do it, so
for the moment I ask for the username when the program is called with no input
arguments. Ugly, but it works for now. The goal was to deliver this thing,
not to make it perfect. I can do that later.</p>
<p>The cool part is that they allow different approaches to packaging: source vs.
binary distribution.</p>
<h5><span class="caps">XDG</span> standard: Desktop file and icons</h5>
<p>We need, of course, to arrange the same things we arranged in Windows: the
desktop icon and launching the terminal automatically. That’s not hard to do
using <span class="caps">XDG</span> specification!</p>
<p>Just add the <code>Terminal=true</code> line and it should open a terminal emulator when
clicked<sup id="fnref:should"><a class="footnote-ref" href="#fn:should">3</a></sup>. The <code>Exec=</code> line in the desktop file holds the program you want
to run, and it has to point to it correctly.</p>
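<p>A minimal sketch of such a <code>.desktop</code> file (paths and names are illustrative):</p>
<pre><code class="language-ini">[Desktop Entry]
Type=Application
Name=Karkarkar
Comment=Listen to a Twitch chat in Basque
Exec=/usr/bin/karkarkar
Icon=karkarkar
Terminal=true
Categories=AudioVideo;
</code></pre>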
<p>Once the <code>.desktop</code> file is done, we need to deal with the icons. I
went for just an <span class="caps">SVG</span> icon, but I could add the rest. The only thing I needed to
do was put everything in the correct folder. Something like this:</p>
<pre><code class="language-bash"> usr
└── share
├── applications
│ └── karkarkar.desktop
└── icons
└── hicolor
└── scalable
└── apps
├── karkarkar.svg
└── karkarkar-symbolic.svg
</code></pre>
<p>Both Guix and Debian detect them properly once they are installed in the folder
where they expect them.</p>
<h5>Guix</h5>
<p>I wrote quite a few Guix packages lately so I’m pretty comfortable with this.</p>
<p>I just need to tell Guix how it should build the program and put some files in
the proper directory.</p>
<p>Also, I added the <code>zig-build-system</code> to Guix myself not that long ago. It’s
pretty straightforward to use.</p>
<p>For the icons and the desktop file, everything needs to go in the correct place,
<code>#$output/share/...</code>, and the desktop file has to be patched to point to the correct
binary, <code>#$output/bin/...</code> that is. For this, I kept a desktop file as a
template, with some reasonable defaults, and just patched it in the Guix
package. That’s easy.</p>
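<p>As a hedged sketch of what that patching phase can look like (names and paths
are illustrative, not the actual <code>guix.scm</code>), inside the package’s
<code>modify-phases</code>:</p>
<pre><code class="language-scheme">;; Install the template desktop file and point Exec= at the store path.
(add-after 'install 'install-desktop-file
  (lambda _
    (let ((apps (string-append #$output "/share/applications")))
      (mkdir-p apps)
      (copy-file "linux/karkarkar.desktop"
                 (string-append apps "/karkarkar.desktop"))
      (substitute* (string-append apps "/karkarkar.desktop")
        (("Exec=.*")
         (string-append "Exec=" #$output "/bin/karkarkar\n"))))))
</code></pre>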
<p>Not the best Guix package ever, but it simply works, and that’s all I
want at this point.</p>
<h5>Debian</h5>
<p>Debian packages can be exported from Guix package definitions, but that exports
the file structure with every single dependency. In our case, that meant
hundreds of megabytes. That’s too much, so I did it manually, and the final size
was around 10 megabytes. Not bad.</p>
<p>I never made a Debian package before, and I did the most minimalistic thing I
could think of.</p>
<p>First, we need to build the thing:</p>
<pre><code class="language-bash">zig build -Dcpu=baseline
</code></pre>
<p>Then just place everything in a folder and call <code>dpkg-deb</code> on top
of it. In order to do that, I wrote a bash script that does this whole
thing; it explains what I did way better than I can write in English:</p>
<pre><code class="language-bash"># Run me in the linux/ folder
# I need the version in an argument, which should be Major.Minor-Revision
# for example 0.1-3.
version=$1
outfolder="karkarkar-$version"
mkdir "$outfolder"
mkdir -p "$outfolder/usr/bin"
mkdir -p "$outfolder/usr/share/applications"
mkdir -p "$outfolder/usr/share/AhoTTS/"
mkdir -p "$outfolder/usr/share/icons/hicolor/scalable/apps"
cp -r "../AhoTTS/data_tts" "$outfolder/usr/share/AhoTTS/"
cp "karkarkar.desktop" "$outfolder/usr/share/applications"
cp "../icons/karkarkar.svg" "$outfolder/usr/share/icons/hicolor/scalable/apps"
cp "../zig-out/bin/karkarkar" "$outfolder/usr/bin/"
patchelf --set-interpreter "/lib64/ld-linux-x86-64.so.2" "$outfolder/usr/bin/karkarkar"
mkdir -p "$outfolder/DEBIAN"
cat > $outfolder/DEBIAN/control <<EOF
Package: karkarkar
Version: $version
Section: base
Priority: optional
Architecture: amd64
Depends: libopenal1 (>=1.19.1)
Maintainer: Ekaitz Zarraga <blablablah>
Description: Karkarkar
Listen to a Twitch chat in Basque.
EOF
dpkg-deb --root-owner-group --build "$outfolder"
rm -rf "$outfolder"
</code></pre>
<p>I need to highlight here that the <code>zig build</code> I did automatically adds the Guix
dynamic linker to the binary, but that is not where the dynamic linker lives in
Debian. I decided to patch the binary (<code>patchelf</code>) instead of trying to
configure the compilation process; I thought this would be easier. I don’t know
if it was easier or not, but it was easy, so that’s OK.</p>
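<p>To see the field <code>patchelf</code> rewrites, you can ask <code>readelf</code> for the requested
program interpreter. Shown here on a copy of a system binary; in my case the
original binary requested the Guix store path instead:</p>
<pre><code class="language-bash"># Inspect which dynamic linker an ELF binary requests; this is the field
# that patchelf --set-interpreter rewrites. /bin/ls is just a handy example.
cp /bin/ls demo-binary
readelf -l demo-binary | grep interpreter
</code></pre>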
<p>Also note that the <code>DEBIAN/control</code> file has the bare minimum fields, but it’s
enough to work. In Debian, everything is installed in <code>/usr/whatever</code>, but
that’s the only detail I changed.</p>
<p>Something I want to write down to remember later: Debian packages (<code>.deb</code>
files) can be extracted with <code>ar -x</code>, and they have a couple of <code>tar.xz</code> files
inside. The <code>data.tar.xz</code> file has the file structure that will later be installed
on the system.</p>
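<p>The layout can be reproduced with a dummy archive (the member contents here are
empty placeholders; a real <code>.deb</code> is produced by <code>dpkg-deb</code> as above):</p>
<pre><code class="language-bash"># A .deb is an `ar` archive with a fixed set of members. Build a dummy
# one with the same member names, then list and extract it.
echo '2.0' > debian-binary
tar -cJf control.tar.xz -T /dev/null   # empty stand-in archives
tar -cJf data.tar.xz -T /dev/null
ar rc dummy.deb debian-binary control.tar.xz data.tar.xz
ar t dummy.deb    # lists the members
ar x dummy.deb    # extracts them, as you would with a real .deb
</code></pre>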
<h3 id="testing">Testing</h3>
<p>Yeah, you have to do it too. I tested it in Wine, but it still needed to be tested
on Windows. I had a working version done in Zig 0.10.0 that was running well in
Wine, but it exploded on Windows because of <a href="https://github.com/ziglang/zig/issues/8943">this issue</a>, which didn’t
happen in Wine. I needed to use Zig 0.11 because of this error, which wasn’t a
big deal anyway, because I already had it more or less packaged.</p>
<p>So, yes, you have to test on Windows just in case, if you can. I have to thank
my friend (you know who you are!) for testing this program on his computer and
reporting the error.</p>
<h3>Conclusions</h3>
<p>The program itself is really underdeveloped, but I actually made a lot of
progress in understanding how to make a program reach users on different systems
easily. In the end, <strong>being a solo developer forces me to be clever, and do
everything as simply as I can</strong>.</p>
<p>I don’t normally have many dependencies, because I believe software has to be
simple and tailor-made. This lets me control every aspect of the projects I do,
but it also greatly simplifies distribution. That’s my context.</p>
<p>I already talked about this in the <a href="https://ekaitz.elenq.tech/windows.html">previous post on the subject</a>, but in
the end, being able to do this frees me from the “Web is cross-platform”
mentality, which I don’t think is a silver bullet, even though in some cases it
might be a simple solution.</p>
<p>I think developers today tend to avoid making native applications, and software
quality suffers from that. There are many reasons for this (corporate control,
subscription systems, easy deployments, controlled environments…), but one
might be that the code people are used to writing has too many dependencies, and
it is hard to package and distribute. I don’t usually have that problem; I
already said I don’t like having many dependencies.</p>
<p>Now that I have an easy way to do this, I can simply focus on writing the actual
code, which is, in the end, the most important part.</p>
<p>Key points:</p>
<ul>
<li>
<p><strong>Zig</strong> happened to make this process really simple, just because the
developers decided to make it simple and they had the engineering skill to
back that decision.</p>
</li>
<li>
<p><strong><span class="caps">NSIS</span></strong> gave me all I needed to put my application on a Windows machine
without learning almost anything about Windows. I have just enough information
to make the thing work, no more. <strong><span class="caps">NSIS</span></strong> was the key I was missing.</p>
</li>
<li>
<p><strong>Guix</strong> lets me cross-compile some of the dependencies to mingw, and I did
that at the beginning (and it worked!) so it’s a powerful tool even for that.</p>
</li>
<li>
<p><strong>Wine</strong> is a good way to check that everything works, but I found some
discrepancies between it and Windows.</p>
</li>
</ul>
<p>The full problem is not completely solved yet. Finding a <strong><span class="caps">GUI</span> toolkit</strong> I can
cross-compile easily is still on the <span class="caps">TODO</span> list. But I’m pretty satisfied with
the result so far.</p>
<blockquote>
<p>My colleague Andrius Štikonas talked to me about
<a href="https://github.com/mxe/mxe"><span class="caps">MXE</span></a>. I may try it in the future but I don’t
like the fact that it downloads things by itself. I leave it here just in
case I need it in the future.</p>
</blockquote>
<p>In fact, I think I can try other programming languages, even interpreted ones.
<strong><span class="caps">NSIS</span></strong> is surely capable of dealing with them. This opens a whole world of
possibilities for me.</p>
<p>In the end, this has been a very interesting process. Making applications
targeted at the non-programmer (and non-GNU/Linux) user has always been on my
mind. Now I can say I have almost solved the problem, and I have a base to
build on in the future.</p>
<p>Finally, the program is released in a very alpha stage in
<a href="https://ekaitz-zarraga.itch.io/karkarkar">itch.io</a> where you can find the
installers and packages and the code is hosted in
<a href="https://github.com/ekaitz-zarraga/karkarkar/">Github</a> until I get completely
mad and delete my account entirely (which I have no doubt will happen one day
or another).</p>
<p>And that’s mostly it, now I can focus on extending the behavior.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>There are not that many Basque <span class="caps">TTS</span> systems out there… You know? <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:dll-names">
<p>I still had weird issues with this. In Zig 0.10.1, DLLs were
found even if their names started with <code>lib</code>, but after moving to Zig
0.11.0, libraries starting with <code>lib</code> were no longer found. Wine did find
them, though, so I didn’t know what to do. When I added AhoTTS as a submodule
the problem disappeared, but not because it was solved: it was just avoided. <a class="footnote-backref" href="#fnref:dll-names" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:should">
<p><em><span class="dquo">“</span>Should”</em> is the best word choice here, because it doesn’t work for
some people, like in <span class="caps">GNOME</span>, as the terminals are hardcoded:<br>
<a href="https://gitlab.gnome.org/GNOME/glib/-/blob/main/gio/gdesktopappinfo.c#L2685">https://gitlab.gnome.org/<span class="caps">GNOME</span>/glib/-/blob/main/gio/gdesktopappinfo.c#L2685</a> <a class="footnote-backref" href="#fnref:should" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
</ol>
</div>Bye Protonmail2023-11-21T00:00:00+02:002023-11-21T00:00:00+02:00Ekaitz Zárragatag:ekaitz.elenq.tech,2023-11-21:/bye-protonmail.html<p>I left Protonmail. Here is why. I still like them to some degree though.</p><p>The other day in the fediverse a friend of mine asked me about Protonmail.
I explained my feelings a little and Protonmail jumped in, which made me
explain further. I think the conversation is interesting enough to
share here<sup id="fnref:deleted"><a class="footnote-ref" href="#fn:deleted">1</a></sup>.</p>
<section class="masto-thread">
<article class="masto-toot">
<a href="https://mastodon.social/@ekaitz_zarraga"> Ekaitz Zárraga 👹 </a>
<p>I really like <code>@protonmail</code> but they are always getting in the way
with their non-standard things and their bridge which is <span class="caps">FULL</span> of dependencies
and it’s impossible to package for some systems.</p>
<p>They are pushing me away from them too hard…</p>
<p>I won’t be surprised if they finally push me away from their service in the mid
term… after many years of trusting them for my business and personal email…
It’s a real shame.</p>
</article>
<article class="masto-toot">
<a href="https://mastodon.social/@protonmail"> Proton Mail</a>
<p><code>@ekaitz_zarraga</code> Can you let us know what kind of dependencies you’re
referring to?</p>
</article>
<article class="masto-toot">
<a href="https://mastodon.social/@ekaitz_zarraga"> Ekaitz Zárraga 👹 </a>
<p><code>@protonmail</code> the Proton bridge has many dependencies:<br>
<a href="https://github.com/ProtonMail/proton-bridge/blob/master/go.mod"> https://github.com/ProtonMail/proton-bridge/blob/master/go.mod</a></p>
<p>Packaging all of them for a distribution is a huge effort. I don’t think you
are really aware of the level of work it requires. Also, you have a .deb and a
.rpm package, which are precompiled… forcing your users to trust those.</p>
<p>My distro and my work are focused on reproducible builds and
bootstrappability… some serious concerns you don’t take in account.</p>
</article>
<article class="masto-toot">
<a href="https://mastodon.social/@ekaitz_zarraga"> Ekaitz Zárraga 👹 </a>
<p><code>@protonmail</code> Also, don’t get me wrong. I love protonmail and its
ideas but I think you are too focused on “normal” users and breaking other
people’s setups without giving much in exchange. I feel like a second class
user in protonmail, as my distro doesn’t support .deb or .rpm packages… and I
need to use plain text email pretty often (which you don’t really support in
the web either).</p>
</article>
<article class="masto-toot">
<a href="https://mastodon.social/@ekaitz_zarraga"> Ekaitz Zárraga 👹 </a>
<p><code>@protonmail</code> I love protonmail, and I’d love to fix these issues, I
would even make a reproducible bridge for you if you ask me to. But I don’t
have the energy to do it by myself. It’s simply not possible to package.</p>
<p>So, here we are. As much as I’d like to continue to work with you and support
you, I don’t feel I can do it anymore</p>
</article>
</section>
<style>
.masto-toot {
border: 2px solid var(--border-color);
border-radius: 10px;
margin: 1rem;
padding: 1rem;
}
</style>
<p>Not long after, I simply moved my email out of Protonmail to a different
platform: a Europe-based email provider whose interaction is based on
standards. Standards I can use with <strong>any</strong> setup, on <strong>any</strong> machine.</p>
<p>I wouldn’t mind having a non-standard solution if the Protonmail Bridge
application worked for me, but they only provide <code>.deb</code> and <code>.rpm</code> packages. I
can’t package the app myself, because it has too many dependencies to do so in
an acceptable amount of time.</p>
<p>Also, the Web client is getting more and more complex. My anti-tracking plugins
(like jShelter) tell me they are fingerprinting me when I reach the login
screen. Why? Who knows. I contacted them about this and, of course, I didn’t
talk to a technical person, because you are not supposed to do that, so I
don’t think my words reached anyone who could understand or consider them.</p>
<p>Maybe it’s me who changed. I don’t need the default <span class="caps">PGP</span> configuration anymore
because I can configure it myself, and I realized I need to be able to easily
<code>git send-email</code> more than I need a beautiful Web <span class="caps">UI</span> that tracks me or
uses modern <span class="caps">JS</span> features. I use a weird distro now, which shouldn’t be a problem
but happens to be one, and I realized that having too many dependencies in the
code is often a problem in several dimensions.</p>
<p>So, something that happens too often in my life happened again: being a
technical user was punished once more in favour of the concept of <em>dumb
users</em>. The funny thing about all this is that I don’t think <em>dumb users</em>
exist. We should discuss that another time.</p>
<p>They are a company, they want to grow, so they must try to sell to the
<em>baseline</em> user: the minimal amount of knowledge a person can have. Selling a
product for “expert” users is lost money; there are not that many “experts” in
this world after all. So it’s easier to add layers and layers of complexity to
your software in order to provide a <em>dumb-proof</em> interface than to
educate your userbase, or let the educated ones customize their stuff.
Don’t get me wrong, it makes perfect sense: Protonmail’s mission is to
provide default encryption to the largest possible proportion of email users, so
the decision fits their mission<sup id="fnref:good"><a class="footnote-ref" href="#fn:good">2</a></sup>.</p>
<p>The default encryption and the “easy” <span class="caps">PGP</span> key setup Protonmail offers are really
cool for users who don’t require more customization. I still like the
goals of the company, but I could’ve used a simpler way to customize my
experience: maybe a simpler bridge? Maybe something else… I don’t know.</p>
<p>In the end, they pushed me away from their service.</p>
<p>So long, Protonmail. It’s been a good time together.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:deleted">
<p>My posts on Mastodon are automatically deleted, so if you try
to read the thread there later you might not find it. I’m copying it here as a
reference. <a class="footnote-backref" href="#fnref:deleted" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:good">
<p>Regardless of anything I said here, they are making many people
encrypt their email, one way or another, and I’ll continue to do so. That
is valuable. <a class="footnote-backref" href="#fnref:good" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div>Mes released and bootstrappable TCC merged2023-11-16T00:00:00+02:002023-11-16T00:00:00+02:00Ekaitz Zárragatag:ekaitz.elenq.tech,2023-11-16:/bootstrapGcc9.html<p>Some merging and releasing has been done. So here we are.</p><p>So, some merging and releasing has been done so we need to update a little bit
on what we talked about in the previous blog post.</p>
<h3>Mes</h3>
<p>We spent some more time testing what we shared with you in the previous post,
and now we can proudly say our work has been merged into Mes, and has been
released with it in <span class="caps">GNU</span> Mes 0.25.</p>
<p>You can read the <span class="caps">GNU</span> Mes 0.25 release notes in Janneke’s blog in the following link:</p>
<p><a href="http://joyofsource.com/gnu-mes-025-released.html">http://joyofsource.com/gnu-mes-025-released.html</a></p>
<h3>Bootstrappable TinyCC</h3>
<p>We are also very happy to announce that our changes to the bootstrappable
TinyCC have been merged into Janneke’s repository, which is used for the official
Guix bootstrapping process. You can see the changes<sup id="fnref:changes"><a class="footnote-ref" href="#fn:changes">1</a></sup> being included here:</p>
<p><a href="https://gitlab.com/janneke/tinycc/-/tree/mes-0.25?ref_type=heads">https://gitlab.com/janneke/tinycc/-/tree/mes-0.25?ref_type=heads</a></p>
<h3>Some words about it</h3>
<p>All of us are of course very happy about this, but it didn’t make us relax:
we continued to push fixes and test all this in more ways, always looking
for the next challenge.</p>
<p>We should enjoy this moment a little bit more, and that’s why I am posting this.</p>
<p>I also want to thank again the people who took part in this, especially Andrius
for his help, for all the hours of sleep he lost during the process, and for
giving a second life to this effort when I thought I was too tired to
continue; and Janneke, who very patiently reviewed every single contribution
and has been pushing me since the very beginning of this adventure, when
I was deciding whether to accept the challenge or continue with my life. I’m glad
I chose the adventure.</p>
<p>Of course, this is a cool milestone for all of us; we worked hard for it. But
for me especially it means a lot. I’ve been working on this for almost two
years now, and my changes to the bootstrappable TinyCC had been
sitting in my repository since last year, when I finished the previous
NLnet grant. In fact, everything I did in that grant was sitting there; nothing
was merged into the actual Guix bootstrap, as what I did were very specific parts
of the chain that lacked the connection with the other steps.</p>
<p>When you work on a project like that there’s almost no satisfaction. No
releases, no upstreaming and, in my case, almost no help and no company.
Everything I did could have sat there in my repos forever.</p>
<p>At the time I did the backport of TinyCC I was unsure of what I had done, and I
was exhausted after all the work on <span class="caps">GCC</span>. When we started this second
round I thought everything was going to be broken. And it was, but it was much
better than I thought!</p>
<p>Now, being part of the official Bootstrappable TinyCC means I can finally close
that chapter, which was full of uncertainty, and actually get some interesting
feedback on all that work, which seemed useless at the time I did it.</p>
<p>It happened to be useful after all.</p>
<p>Now let’s see if <span class="caps">GCC</span> happens to be as useful as this was.</p>
<p>Cheers, dear reader. We deserve to celebrate.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:changes">
<p>The commits we had have been reordered and squashed, as the changes
were split into around 40 different commits made as we found the
errors. I managed to rearrange them into a few commits that make much more
sense. I mention it just in case you go looking for the independent commits:
they are gone. My repository still keeps the branches and tags we
mentioned before, so you can still go there to find the changes the way we
did them. <a class="footnote-backref" href="#fnref:changes" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Milestone — MesCC builds TinyCC and fun C errors for everyone2023-10-30T00:00:00+02:002023-10-30T00:00:00+02:00Ekaitz Zárragatag:ekaitz.elenq.tech,2023-10-30:/bootstrapGcc8.html<p>We spent the last months making MesCC able to compile TinyCC and making the
result of that compilation able to compile TinyCC. Many cool problems
appeared; this is a summary of our work.</p><p>It’s been a while since the latest technical update on the project, and I am
fully aware that you were missing it, so it’s time to recap with a really cool announcement:</p>
<p><span style="font-size: larger">
<strong>We finally made a self-hosted Bootstrappable TinyCC in <span class="caps">RISC</span>-V</strong>
</span></p>
<p>Most of you probably remember I <a href="bootstrapGcc6.html">already backported</a> the
Bootstrappable TinyCC compiler, but I didn’t test it in a proper environment.
Now, we can confidently say it is able to compile itself, a “large” program
that makes use of more complex C features than my tests did.</p>
<p>All this work was done by Andrius Štikonas and myself. Janneke helped us a lot
with Mes-related parts, too. The work this time was pretty hard, honestly. Most
of the things we did here are not obvious, even for C programmers.</p>
<p>I’m not used to this kind of quirk of the C language. Most of them are really
specific, related to the standards, and many others are just things that were
missing. I hope the ones I chose to discuss here help you understand your
computing better, as they did for me.</p>
<p>This is going to be a very long post, so here’s a ToC to help you out:</p>
<ol>
<li><a href="#context">Context</a><ol>
<li><a href="#why-important">Why is this important?</a></li>
</ol>
</li>
<li><a href="#problems">Problems fixed</a><ol>
<li><a href="#tinycc-missing-instructions">TinyCC misses assembly instructions needed for MesLibC</a></li>
<li><a href="#tcc-assembly">TinyCC’s assembly syntax is weird</a></li>
<li><a href="#extended-assembly">TinyCC does not support Extended Asm in <span class="caps">RV64</span></a></li>
<li><a href="#main-args">MesLibC <code>main</code> function arguments are not set properly</a></li>
<li><a href="#dollars">TinyCC says <code>__global_pointer$</code> is not a valid symbol</a></li>
<li><a href="#tcc-casting-issues">Bootstrappable TinyCC’s casting issues</a></li>
<li><a href="#long-double">Bootstrappable TinyCC’s <code>long double</code> support was missing</a></li>
<li><a href="#mescc-struct-init">MesCC struct initialization issues</a></li>
<li><a href="#size-problems">MesCC vs TinyCC size problems</a></li>
<li><a href="#mes-signed-shift">MesCC add support for signed shift operation</a></li>
<li><a href="#broken-case">MesCC switch/case falls-back to default case</a></li>
<li><a href="#got">Boostrappable TinyCC problems with <span class="caps">GOT</span></a></li>
<li><a href="#wrong-conditionals">Bootstrappable TinyCC generates wrong assembly in conditionals</a></li>
<li><a href="#varargs">Support for variable length arguments</a></li>
<li><a href="#int8">MesLibC use <code>signed char</code> for <code>int8_t</code></a></li>
<li><a href="#jmp">MesLibC Implement <code>setjmp</code> and <code>longjmp</code></a></li>
<li><a href="#more">More</a></li>
</ol>
</li>
<li><a href="#reproducing">Reproducing what we did</a><ol>
<li><a href="#live-bootstrap">Using live-bootstrap</a></li>
<li><a href="#guix">Using Guix</a></li>
</ol>
</li>
<li><a href="#conclusions">Conclusions</a></li>
<li><a href="#next">What is next?</a></li>
</ol>
<h3 id="context">Context</h3>
<p>There are many blog posts in the series where you can find some context about the
project, and even a <span class="caps">FOSDEM</span> talk about it, but they all give a very broad
explanation, so let’s focus on what we are doing right now.</p>
<p>Here we have Mes, a Scheme interpreter, which runs MesCC, a C compiler, which
compiles our simplified fork of TinyCC; let’s call that Bootstrappable TinyCC.
That Bootstrappable TinyCC compiler then tries to compile its own code. It
compiles its own code because the goal is to add more flags in each
compilation, so it has more features in each round<sup id="fnref:rounds"><a class="footnote-ref" href="#fn:rounds">1</a></sup>. We do all this
because TinyCC is way faster than MesCC and it’s also more complex, but MesCC
is only able to build a simple TinyCC with few features enabled.</p>
<p>During all this process we use a standard library provided by the Mes project;
we’ll call it MesLibC, because we can’t build glibc at this point and TinyCC
does not provide its own C standard library.</p>
<p>With all this well understood, this is the achievement:</p>
<p><strong>We made MesCC able to compile the Bootstrappable TinyCC, using MesLibC, to an
executable that is able to compile the Bootstrappable TinyCC’s codebase to a
binary that works and has all the features we need enabled.</strong><sup id="fnref:self-hosted"><a class="footnote-ref" href="#fn:self-hosted">2</a></sup></p>
<p>The process affected all the pieces in the system. We added changes in MesCC,
MesLibC and the Bootstrappable TinyCC.</p>
<h4 id="why-important">Why is this important?</h4>
<p>We already talked at length about the bootstrapping issue, the trusting trust attack
and all that. I won’t repeat it here. What I’ll do instead is be specific.
This step is a big deal because it allows us to go much further in the chain.</p>
<p>All the steps before Mes were already ported to <span class="caps">RISC</span>-V, mostly thanks to Andrius
Štikonas, who worked on <a href="https://github.com/oriansj/stage0-posix">Stage0-<span class="caps">POSIX</span></a> and the rest of the glue projects
that are needed to reach Mes.</p>
<p>Mes had been ported to <span class="caps">RISC</span>-V (64 bit) by <span class="caps">W. J.</span> van der Laan, and some patches
were added on top of it by Andrius Štikonas himself before our current effort started.</p>
<p>At that moment in time, Mes was unable to build our bootstrappable TinyCC in
<span class="caps">RISC</span>-V, the next step in the process, and the bootstrappable TinyCC itself was
unable to build itself either. This was a very limiting point, because TinyCC
is the first “proper” C compiler in the chain.</p>
<p>When I say “proper” I mean fast and fully featured as a C compiler. On x86,
TinyCC is able to compile old versions of <span class="caps">GCC</span>. If we manage to port it to
<span class="caps">RISC</span>-V we will eventually be able to build <span class="caps">GCC</span> with it, and with that the world.</p>
<p>In summary, TinyCC is a key step in the bootstrapping chain.</p>
<h3 id="problems">Problems fixed</h3>
<p>This work can be easily followed in the commits in my <span class="caps">TCC</span> fork’s
<a href="https://github.com/ekaitz-zarraga/tcc/tree/riscv-mes"><code>riscv-mes</code></a> branch, and in my Mes clone’s <a href="https://github.com/ekaitz-zarraga/mes/tree/riscv-tcc-boot"><code>riscv-tcc-boot</code></a>
branch. We are also identifying the contents of this blogpost in the git
history by adding the git tag <code>self-hosted-tcc-rv64</code> to both of my forks. We
will try to keep both for future reference.</p>
<p>In Mes the process might be a little harder to follow because we sent most
of the patches to Janneke and he merged them, so when we were about to publish
this post I continued from Janneke’s branch to avoid divergences (I had some
problems with that before). In any case, the code is there, and searching by
author (Andrius and myself) will guide you to the changes we made.</p>
<p>Many commits have a long message you can go read there, but this post was born
to summarize the most interesting changes we made and present them in a more
digestible way. Let’s see if I manage to do that.</p>
<p>The following list is not ordered in any particular way, but we hope the
selection of problems is interesting for you. We found some more errors,
but these are the ones we consider most relevant.</p>
<h4 id="tinycc-missing-instructions">TinyCC misses assembly instructions needed for MesLibC</h4>
<p>TinyCC is not like <span class="caps">GCC</span>: TinyCC generates binary code directly, with no assembly code
in between. TinyCC has a separate assembler that doesn’t follow the path that C
code follows.</p>
<p>It works the same in all architectures, but we can take <span class="caps">RISC</span>-V as an example:</p>
<p>TinyCC has <code>riscv64-gen.c</code>, which generates the binary code, while the
<code>riscv64-asm.c</code> file parses assembly code and also generates binary. As you can
see, binary generation is somewhat duplicated.</p>
<p>In the <span class="caps">RISC</span>-V case, the C part has had support for mostly everything since my
backport, but the assembler did not support many instructions (which, by the
way, are supported by the C part).</p>
<p>MesLibC’s <code>crt1.c</code> is written in assembly code. Its goal is to prepare the
<code>main</code> function and call it. For that it needs the <code>jalr</code> instruction and
others that were not supported by TinyCC, neither upstream nor in our
bootstrappable fork.</p>
<p>These changes appear in several commits because I didn’t really understand how
the TinyCC assembler worked, and some instructions need to use relocations
which I didn’t know how to add. The following commit can show how it feels to
work on this, and shares how relocations are done:</p>
<p>There you can see we started to understand things in TinyCC, but some other
changes came after this.</p>
<p>A very important note here is that upstream TinyCC does not have support for these
instructions yet, so we need to patch upstream TinyCC when we use it, contribute
the changes, or find some other kind of solution. Each solution has its downsides
and upsides, so we will need to make a decision about this later.</p>
<h4 id="tcc-assembly">TinyCC’s assembly syntax is weird</h4>
<p>Following with the previous fix, TinyCC does not support <span class="caps">GNU</span>-Assembler’s syntax
in <span class="caps">RISC</span>-V. It uses a simplified assembly syntax instead.</p>
<p>Where we would write:</p>
<pre><code class="language-asm">sd s1, 8(a0)
</code></pre>
<p>in TinyCC’s assembly we have to write:</p>
<pre><code class="language-asm">sd a0, s1, 8
</code></pre>
<p>This required changes in MesLibC, and made us create a separate folder for
TinyCC in MesLibC. See <code>lib/riscv64-mes-tcc/</code> and <code>lib/linux/riscv64-mes-tcc</code>
for more details.</p>
<h4 id="extended-assembly">TinyCC does not support Extended Asm in <span class="caps">RV64</span></h4>
<p>Way later in time we also found TinyCC does not support <a href="https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html">Extended Asm</a>
in <span class="caps">RV64</span>. The functions that manage that are simply empty.</p>
<p>It took us some time to realize what was going on here, for two reasons.
First, there are few cases of Extended Asm in the code we were compiling.
Second, it was failing silently.</p>
<p>Extended Asm is important because it lets you tell the compiler you are going
to touch some registers in the assembly block, so it can protect variables and
apply optimizations properly.</p>
<p>In our case, our assembly blocks were clobbering some variables that would have
been protected by the compiler if Extended Asm support had been implemented.</p>
<p>Andrius found all the places in MesLibC where Extended Asm was used and rewrote
the assembly code to keep variables safe in the cases it was needed.</p>
<p>The other option was to add Extended Asm support to TinyCC, but we would need
to add it to the Bootstrappable TinyCC and also upstream. That also means
understanding the TinyCC codebase very well and making the changes without errors,
so we decided to simplify MesLibC, because that is easier to get right. We are
probably going to need to do this later anyway, but we’ll try to delay it
as much as possible.</p>
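<p>As a rough illustration (this is not code from the Mes tree), this is what Extended Asm looks like in GNU C. The template is left empty so the sketch stays portable across architectures; the interesting part is the colon-separated sections, which declare outputs, inputs and clobbers, exactly the information a backend without Extended Asm support silently drops:</p>
<pre><code class="language-clike">/* Hypothetical example: "+r"(x) declares x as read and written by the
   assembly block, and the "memory" clobber tells the compiler the block
   may touch memory, so it cannot cache values across it. */
static int touch(int x)
{
    __asm__ volatile ("" : "+r" (x) : : "memory");
    return x;
}

int main(void)
{
    return touch(41) == 41 ? 0 : 1;
}
</code></pre>
<p>Without those declarations (a plain <code>asm("...")</code>), the compiler assumes the block touches nothing, which is exactly how variables end up clobbered.</p>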
<h4 id="main-args">MesLibC <code>main</code> function arguments are not set properly</h4>
<p>Following the previous problem with assembly, we later found that the input
arguments of the <code>main</code> function, which come from the command line, were not
properly set by our MesLibC. Andrius also took care of that in
<a href="https://github.com/ekaitz-zarraga/mes/commit/4f4a11745d1c7ed0995e9d31c7994abfb4a60b25">4f4a1174</a> in Mes.</p>
<p>This error was easier to find than others because when we found issues with
this we already had a compiled TinyCC. So we just needed to fix simple things
around it.</p>
<h4 id="dollars">TinyCC says <code>__global_pointer$</code> is not a valid symbol</h4>
<p>This is a small issue that was a headache for a while, but it turned out to be
very simple.</p>
<p>In <span class="caps">RISC</span>-V there’s a symbol, <code>__global_pointer$</code>, that is used for dynamic
linking, defined in the <span class="caps">ABI</span>. But TinyCC had trouble parsing code around it, and
it took us some time to realize it was the dollar sign (<code>$</code>) that was causing
the issues at this point.</p>
<p>TinyCC does not process dollars in identifiers unless you specifically set a
flag (<code>-fdollars-in-identifiers</code>) when running it. In the <span class="caps">RISC</span>-V case, that
flag must always be active, because otherwise <code>__global_pointer$</code> can’t be processed.</p>
<p>We tried to set that flag on the command line, but we had other issues in the
command-line argument parsing (we found and fixed them later), so we just
hardcoded it.</p>
<p>This issue is interesting because it’s an extremely simple problem, but its
effect appears in weird ways and it’s not always easy to know where the problem
is coming from.</p>
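<p>For comparison, GCC and Clang accept dollars in identifiers by default on most targets, while TinyCC requires the flag. A contrived sketch (the identifier below is made up, not part of the ABI):</p>
<pre><code class="language-clike">/* Compiles as-is with GCC/Clang on most targets; TinyCC needs
   -fdollars-in-identifiers to accept the '$' in the name. */
int global$value = 7;

int main(void)
{
    return global$value == 7 ? 0 : 1;
}
</code></pre>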
<h4 id="tcc-casting-issues">Bootstrappable TinyCC’s casting issues</h4>
<p>This one was a really hard one to fix.</p>
<p>When running our Bootstrappable TinyCC to build MesLibC we found this error:</p>
<pre><code class="language-nothing"> cannot cast from/to void
</code></pre>
<p>We managed to isolate a piece of C code that was able to replicate the
problem.<sup id="fnref:reproducer"><a class="footnote-ref" href="#fn:reproducer">3</a></sup></p>
<pre><code class="language-clike">long cast_charp_to_long (char const *i)
{
return (long)i;
}
long cast_int_to_long (int i)
{
return (long)i;
}
long cast_voidp_to_long (void const *i)
{
return (long)i;
}
void main(int argc, char* argv[]){
return;
}
</code></pre>
<p>Compiling this file raised the same issue, but then I realized I could remove
two of the functions at the top and the error didn’t happen. Adding one of
those functions back raised the error again.</p>
<p>I tried to change the order of the functions and the functions I chose to add,
and I could reproduce it: if there were two functions it failed but it could
build with only one.</p>
<p>Andrius found that the function type was not properly set in the <span class="caps">RISC</span>-V code
generation and its default value was <code>void</code>, so it only failed when it compiled
the second function.</p>
<p>Knowing that, we could take other architectures as a reference to fix this, and
so we did.</p>
<p>See <a href="https://github.com/ekaitz-zarraga/tcc/commit/6fbd17852aa11a2d0bc047183efaca4ff57ab80c">6fbd1785</a>.</p>
<h4 id="long-double">Bootstrappable TinyCC’s <code>long double</code> support was missing</h4>
<p>When I backported the <span class="caps">RISC</span>-V support to our Bootstrappable TinyCC I missed the
<code>long double</code> support, and I didn’t realize it because I never tested large
programs with it.</p>
<p>The C standard doesn’t define a size for <code>long double</code> (it just says it has to
be at least as long as <code>double</code>), but its size is normally set to 16 bytes.
All this is weird in <span class="caps">RV64</span>, because it doesn’t have 16-byte registers. It
needs some extra support.</p>
<p>Before we fixed this, the following code:</p>
<pre><code class="language-clike">long double f(int a){
return a;
}
</code></pre>
<p>Failed with:</p>
<pre><code class="language-nothing"> riscv64-gen.c:449 (`assert(size == 4 || size == 8)`)
</code></pre>
<p>Because it was only expecting to use <code>double</code>s (8 bytes) or <code>float</code>s (4 bytes).</p>
<p>In upstream TinyCC there were some commits that added <code>long double</code> support
using, and I quote, a <em>mega hack</em>, so I just copied that support to our
Bootstrappable TinyCC.</p>
<p>See <a href="https://github.com/ekaitz-zarraga/tcc/commit/a7f3da33456b4354e0cc79bb1e3f4c665937395b">a7f3da33456b</a>.</p>
<p>After this commit, some extra problems appeared with missing symbols. But
these were link-time problems, because TinyCC has the floating-point
helper functions needed for <span class="caps">RISC</span>-V defined in <code>lib/lib-arm64.c</code>, as they
were reusing aarch64 code for them.</p>
<p>After this, we also compile and link <code>lib-arm64.c</code>, and we have <code>long double</code>
support.</p>
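<p>The only portable guarantee here is the relative one, checkable with any C compiler; the 16-byte width is an ABI decision (on riscv64-linux it is the IEEE binary128 format), not something the standard mandates:</p>
<pre><code class="language-clike">int main(void)
{
    /* The C standard only requires long double to be at least as wide
       as double; the actual width (16 bytes on riscv64-linux) is a
       platform choice. */
    return sizeof (long double) >= sizeof (double) ? 0 : 1;
}
</code></pre>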
<h4 id="mescc-struct-init">MesCC struct initialization issues</h4>
<p>This one was a lot of fun. Our Bootstrappable TinyCC exploded with random
issues: segfaults, weird branch decisions…</p>
<p>After tons of debugging, Andrius found some values in <code>struct</code>s were not set
properly. As we don’t know TinyCC’s codebase very well, that was hard
to follow, and we couldn’t really tell where the value was coming from.</p>
<p>Andrius finally realized some <code>struct</code>s were not initialized properly. Consider
this example:</p>
<pre><code class="language-clike">typedef struct {
int one;
int two;
} Thing;
Thing a = {0};
</code></pre>
<p>That’s supposed to initialize <em>all</em> fields in the <code>Thing</code> <code>struct</code> to <code>0</code>,
according to the C standard<sup id="fnref:cppref"><a class="footnote-ref" href="#fn:cppref">4</a></sup>.</p>
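<p>A tiny self-check of that rule, runnable with any conforming compiler, makes the expectation explicit:</p>
<pre><code class="language-clike">typedef struct {
    int one;
    int two;
} Thing;

int main(void)
{
    Thing a = {0};  /* the standard zero-initializes every member,
                       not only the first one */
    return (a.one == 0 && a.two == 0) ? 0 : 1;
}
</code></pre>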
<p>As a first solution we set struct fields manually to <code>0</code>, to make sure they
were initialized properly. See <a href="https://github.com/ekaitz-zarraga/tcc/commit/29ac0f40a7afba6a2d055df23a8ee2ee2098529e">29ac0f40a7afb</a>.</p>
<p>After some debugging we found that the fields that were not explicitly set were
initialized to <code>22</code>. So I decided to go to MesCC and see if the struct
initialization was broken.</p>
<p>This was my first dive in MesCC’s code, and I have to say it’s really easy to
follow. It took me some time to read through it because I’m not that used to
<code>match</code>, but I managed to find the struct initialization code.</p>
<p>What I found in MesCC was a <code>22</code> hardcoded in the struct
initialization code, probably left over from some debug code that was never
removed. As no part of the x86 bootstrapping relied on that kind of
initialization, the error went unnoticed.</p>
<p>I set that to <code>0</code>, as it should be, and we continued with our lives.</p>
<h4 id="size-problems">MesCC vs TinyCC size problems</h4>
<p>The C standard does not set an exact size for integers. It mostly sets
relative sizes: a <code>short</code> can’t be wider than an <code>int</code>, an <code>int</code> can’t be wider
than a <code>long</code>, and so on, plus some minimum ranges. If your platform wants,
<code>int</code> can be 16 bits wide while <code>long</code> is 64, and that’s fine by the C standard.</p>
<p>TinyCC’s <span class="caps">RISC</span>-V backend was written under the assumption that <code>int</code> is 32 bits
wide. You can see this happening in <code>riscv64-gen.c</code>, for example, here:</p>
<pre><code class="language-clike"> EI(0x13, 0, rr, rr, (int)pi << 20 >> 20); // addi RR, RR, lo(up(fc))
</code></pre>
<p>The bit shifting there is done to keep only the lower 12 bits of the <code>pi</code>
variable (sign-extended). This code’s behavior changes from one platform to
another: on a platform where <code>int</code> is 64 bits wide, for instance, these shifts
would keep the lower 44 bits instead of the lower 12.</p>
<p>In our case, MesCC was using the whole register width, 64 bits, for temporary
values, so the lowest 44 bits were kept and the next assertion, which checked
that the immediate fit in 12 bits, didn’t pass.</p>
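<p>This is easy to check on a regular machine. The following sketch
(hypothetical <code>low12</code>/<code>low44</code> helpers and values; note that shifting bits out of
a signed <code>int</code> is strictly implementation-defined, but behaves arithmetically
in <span class="caps">GCC</span>) shows how the width of the temporary changes the result:</p>
<pre><code class="language-clike">/* 32 bit temporary: only the low 12 bits survive, sign-extended */
int low12(long long pi) { return (int)pi << 20 >> 20; }

/* 64 bit temporary: the low 44 bits survive instead */
long long low44(long long pi) { return pi << 20 >> 20; }

/* low12(0x123456789) == 0x789
   low44(0x123456789) == 0x123456789: the value passes through untouched */
</code></pre>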
<p>This is a huge problem, as most of the code in the <span class="caps">RISC</span>-V generation is written
using this style.</p>
<p>There are other ways to do the same thing (<code>pi & 0xFFF</code> maybe?) in a more
portable way, but we don’t know why upstream TinyCC decided to do it this way.
Probably they did because <span class="caps">GCC</span> (and TinyCC itself) use 32 bit integers, but they
didn’t handle other possible cases, like the one we had here with MesCC.</p>
<p>In any case, this made us rethink MesCC, dig into how its integers are
defined, how to make them compatible with TinyCC and so on, but I finally
decided to add casts in the middle to make sure all this compiled as expected.</p>
<p>It was a good reason to re-think MesCC’s integers, but dealing with this took
a very long time that could have been spent on something else. Now we are all
paranoid about integers, and we still think more errors will arise from them in
the future. Integers are hard.</p>
<h4 id="mes-signed-shift">MesCC add support for signed shifting</h4>
<p>Integers were in our minds for long, as described in the previous block, but I
didn’t talk about signedness in that one.</p>
<p>Following one of the crazy errors we had in TinyCC, I somehow realized (I don’t
remember how!) that we were missing signed shifting support in MesCC. I think I
found it while researching the code MesCC was outputting: I spotted some bit
shifts done with unsigned instructions on signed values and started digging in
MesCC to find out why. I finally realized there was no support for signed
shifts at all; the shift instruction wasn’t selected depending on the
signedness of the value being shifted.</p>
<p>Let’s see this with an example:</p>
<pre><code class="language-clike">signed char a = 0xF0;   // -16
unsigned char b = 0xF0; // 240
// What is this? (Answer: -1, the 0xFF bit pattern)
a >> 4;
// And this? (Answer: 0x0F => 15)
b >> 4;
</code></pre>
<p>In the example you can see the shifting operation does not work the same way
for signed and unsigned values: a signed shift copies the sign bit in from the
left, while an unsigned shift fills with zeros. If you always use the unsigned
version of the <code>>></code> operation, you don’t get the results you expect. Signs are
also hard.</p>
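<p>Compiled with a regular compiler, the difference is easy to verify (a minimal
sketch with hypothetical helper names; right-shifting negative values is
implementation-defined, though mainstream compilers shift arithmetically):</p>
<pre><code class="language-clike">int shift_signed(void) {
    signed char a = 0xF0;   /* -16 */
    return a >> 4;          /* arithmetic shift: -1 */
}

int shift_unsigned(void) {
    unsigned char b = 0xF0; /* 240 */
    return b >> 4;          /* zeros come in from the left: 15 */
}
</code></pre>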
<p>In this case, like in many others, the fix was easier than realizing what was
going wrong. I just added support for the signed shifting operation, not only
for <span class="caps">RISC</span>-V but for all architectures, and I added the correct signedness check
to the shifting operation to select the correct instruction. The patch (see
<a href="https://github.com/ekaitz-zarraga/mes/commit/88f24ea8661dd279c2a919f8fbd5f601bb2509ae">88f24ea8</a> in Mes) is very clean and easy to read, because
MesCC’s codebase is really well ordered.</p>
<blockquote>
<p><span class="caps">EDIT</span>: Someone on the web noted I called the <em>bit-shift</em> operations
<em>rotation</em> operations. I normally use both words interchangeably but it is
true they don’t mean the same thing: in a shift the bits that fall off are
lost, while in a rotation they come back in from the other side of the
register. I edited the article to use the correct word.</p>
</blockquote>
<h4 id="broken-case">MesCC switch/case falls through to the default case</h4>
<p>In the early bootstrap runs, our Bootstrappable TinyCC did weird things.
After many debugging sessions we realized the <code>switch</code> statements in
<code>riscv64-gen.c</code>, more specifically in <code>gen_opil</code>, were broken. The
fall-throughs in the <code>switch</code> always ended up in the <code>default</code> case. Weird!</p>
<p>MesCC has many tests, so I read all the ones related to <code>switch</code> statements.
The ones that exercised fall-through all fell through into the <code>default</code> case,
so our weird behavior wasn’t covered.</p>
<p>I added tests for our case and was reading the disassembly of simple examples
when I realized the problem.</p>
<p>Each <code>case</code> block has two parts: the clause that checks whether the value of
the expression matches the case, and the body of the case itself.</p>
<p>The <code>switch</code> statement generation was doing some magic to deal with <code>case</code>
blocks, but it failed on complex fall-through schemes: when execution fell
through into a <code>case</code>, that case’s clause was evaluated again. The clause was
necessarily false (the one that matched was the case we fell through from), so
the code jumped to the <code>default</code> case.</p>
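<p>A minimal reproducer of the behavior we needed (a hypothetical <code>classify</code>
function, not TinyCC code) looks like this:</p>
<pre><code class="language-clike">int classify(int v) {
    int r = 0;
    switch (v) {
    case 1:
        r += 10;        /* no break: must fall through into case 2 */
    case 2:
        r += 100;
        break;
    default:
        r = -1;
    }
    return r;           /* classify(1) == 110, not -1 */
}
</code></pre>
<p>With the bug, the fall-through from <code>case 1</code> re-evaluated <code>case 2</code>’s clause,
which was false, so <code>classify(1)</code> returned <code>-1</code>.</p>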
<p>Fixing this had its own problems, as NyaCC (MesCC’s C parser) returns
<code>case</code> blocks nested when they don’t have a <code>break</code> statement:</p>
<pre><code class="language-lisp">(case testA
(case testB
(case testC BODY)))
</code></pre>
<p>Instead of doing this, I decided to flatten the <code>case</code> blocks with empty
bodies. This way we can deal with the structure in a simpler way.</p>
<pre><code class="language-lisp">((case testA (expr-stmt))
(case testB (expr-stmt))
(case testC BODY))
</code></pre>
<p>Once this is done, I expanded each <code>case</code> block into a jump that skips its own
clause, then the clause, and then its body. Code falling through from the
previous case hits the jump and lands directly on the body, so the clause isn’t
re-evaluated, as it doesn’t need to be. The generated code looks like this in
pseudocode:</p>
<pre><code class="language-assembly"> ;; This doesn't have the jump because it's the first
CASE1:
testA
CASE1_BODY:
...
goto CASE2_BODY
CASE2:
testB
CASE2_BODY:
...
goto CASE3_BODY
CASE3:
    testC
CASE3_BODY:
...
</code></pre>
<p>If one of the <code>case</code>s has a <code>break</code>, it’s treated as part of its body, and it
ends the execution of the <code>switch</code> statement normally, with no fall-through.</p>
<p>This results in much simpler <code>case</code> block control. The previous approach dealt
with nested <code>case</code> blocks and tried to be clever about them, unsuccessfully.
The best thing about this commit is that most of the cleverness was simply
removed by a simple solution (flatten all the things!).</p>
<p>It wasn’t that easy to implement, but I first built a simple prototype and
Janneke’s Scheme magic made my approach usable in production.</p>
<p>All this is in Mes’s codebase in several commits, as we needed some iterations
to get it right. <a href="https://github.com/ekaitz-zarraga/mes/commit/22cbf823582e3699b6a21ee0cf74c2dbf0a6a4e9">22cbf823582</a> has the base of this change, but
there were some more iterations in Mes.</p>
<h4 id="got">Bootstrappable TinyCC problems with <span class="caps">GOT</span></h4>
<p>The Global Offset Table is a table of addresses that lets relocatable binaries
reach their global symbols. Our Bootstrappable TinyCC segfaulted because it was
generating an empty <span class="caps">GOT</span>.</p>
<p>Andrius debugged upstream TinyCC alongside ours and realized there was a
missing check in an <code>if</code> statement. He fixed it in
<a href="https://github.com/ekaitz-zarraga/tcc/commit/f636cf3d4839d1ca3f5af9c0ad9aef43a4bfccd9">f636cf3d4839d1ca</a>.</p>
<p>The problem with this kind of error is that TinyCC’s codebase is really hard to
read. It’s a very small compiler, but it’s not obvious how things are done in
it, so we had to spend many hours in debugging sessions that went nowhere. With
a compiler that was easier to read and change, the fix would have been much
simpler and the whole experience better.</p>
<h4 id="wrong-conditionals">Bootstrappable TinyCC generates wrong assembly in conditionals</h4>
<p>We spent a long time debugging a bug I introduced during the backport when I
tried to undo some optimization upstream TinyCC applied to comparison operations.</p>
<p>Consider the following code:</p>
<pre><code class="language-clike">if ( x < 8 )
whatever();
else
whatever_else();
</code></pre>
<p>Our Bootstrappable TinyCC was unable to compile this code correctly: instead,
it output code that always took the same branch, regardless of the value of
<code>x</code>.</p>
<p>In TinyCC, a conditional like <code>if (x < CONSTANT)</code> has a special treatment, and
it’s converted to something like this pseudoassembly:</p>
<pre><code class="language-pseudo">load x to a0
load CONSTANT to a1
set a0 if less than a1
branch if a0 not equal 0 ; Meaning it's `set`
</code></pre>
<p>This behaviour uses the <code>a0</code> register as a flag, emulating the condition flags
other CPUs use for comparisons. <span class="caps">RISC</span>-V doesn’t need that, but it’s still done
here, probably for consistency with the other architectures. In <span class="caps">RISC</span>-V it
could look like this:</p>
<pre><code class="language-pseudo">load x to a0
load CONSTANT to a1
branch if a0 less than a1
</code></pre>
<p>You can easily see the <code>branch</code> <span class="dquo">“</span>instruction” does a different comparison in
each version: in the first it checks whether <code>a0</code> is set, and in the second
whether <code>a0</code> is smaller than <code>a1</code>.</p>
<p>TinyCC handles this case in a very clever way (maybe too clever?). When they
emit the <code>set a0 if less than a1</code> instruction they replace the current
comparison operation with <code>not equal</code> and they remove the <code>CONSTANT</code> and
replace it with a <code>0</code>. That way, when the <code>branch</code> instruction is generated,
they insert the correct clause.</p>
<p>In my code I forgot to replace the comparison operator, so the branch checked
<code>if a0 is less than 0</code>, which was always false: the <code>set</code> operation writes a
<code>0</code> or a <code>1</code>, and neither is less than <code>0</code>.</p>
<p>The commit <a href="https://github.com/ekaitz-zarraga/tcc/commit/5a0ef8d0628f719ebb01c952797a86a14051228c">5a0ef8d0628f719</a> explains this in a more technical way,
using actual <span class="caps">RISC</span>-V instructions.</p>
<p>This was also hard to fix, because TinyCC’s variable names (<code>vtop->c.i</code>) are
really weird and they are used for many different purposes.</p>
<h4 id="varargs">Support for variable length arguments</h4>
<p>In C you can define functions that take a variable number of arguments. In
<span class="caps">RISC</span>-V those arguments are passed in registers, while on other architectures
they are passed on the stack. This makes the <span class="caps">RISC</span>-V case a little more complex
to deal with, and it needs special treatment.</p>
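<p>For context, this is what a variadic function looks like in C (a standard
<code>stdarg.h</code> sketch with a hypothetical <code>sum</code> helper, not the TinyCC internals):</p>
<pre><code class="language-clike">/* Sums `n` ints passed as variable arguments. On RISC-V the first
   arguments arrive in registers a0..a7, so va_start/va_arg have to
   know how to walk registers, not just the stack. */
int sum(int n, ...) {
    va_list ap;
    int total = 0;
    va_start(ap, n);
    while (n-- > 0)
        total += va_arg(ap, int);
    va_end(ap);
    return total;       /* sum(3, 1, 2, 3) == 6 */
}
</code></pre>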
<p>Andrius realized our Bootstrappable TinyCC had issues with variable length
arguments, especially in the most famous function that uses them: <code>printf</code>. He
also found the cause: the arguments were not being set up properly.</p>
<p>Reading upstream TinyCC we found they use a really weird system for the defines
that deal with this. They have a header file, <code>include/tccdefs.h</code>, which is
included in the codebase but also processed by a tool that generates strings
which are later injected into TinyCC at execution time.</p>
<p>This was too much for us so we just extracted the simplest variable arguments
definitions for <span class="caps">RISC</span>-V and introduced that in MesLibC and our Bootstrappable TinyCC.</p>
<h5>Extra: files generated with no permissions</h5>
<p>The Bootstrappable TinyCC built using MesCC generated files with no
permissions. Andrius found that this problem also came from the variable length
argument support definitions, so he fixed that too<sup id="fnref:stikonas"><a class="footnote-ref" href="#fn:stikonas">5</a></sup>.</p>
<p>The macro that defined <code>va_start</code> had broken pointer arithmetic. At the
beginning he thought it was related to MesCC’s internals, but he later tested
in <span class="caps">GCC</span> and realized the problem was in the macro definition itself. That’s why
the commit currently says “workaround” in its name, but it’s more than a
workaround: it’s a proper fix. We are rewording that, but that will happen
after this post is released.</p>
<h4 id="int8">MesLibC use <code>signed char</code> for <code>int8_t</code></h4>
<p>We already had a running Bootstrappable TinyCC compiled using MesCC when we
stumbled upon this issue. Somehow, when assembling:</p>
<pre><code class="language-asm">addi a0, a0, 9
</code></pre>
<p>The code was trying to read <code>9</code> as a register name, and (of course) failed to
do it. It was weird to realize that the following code (in <code>riscv64-asm.c</code>)
always took the true branch of the <code>if</code> statement, even when
<code>asm_parse_regvar</code> returned <code>-1</code>:</p>
<pre><code class="language-clike">int8_t reg;
...
if ((reg = asm_parse_regvar(tok)) != -1) {
...
} else ...
</code></pre>
<p>I disassembled and saw something like this:</p>
<pre><code class="language-pseudoassembly">call asm_parse_regvar ;; Returns value in a0
reg = a0
a0 = a0 + 1
branch if a0 equals 0
</code></pre>
<p>This looks ok; it does some magic with the <code>-1</code>, but it makes sense anyway. The
problem is that it didn’t branch, because <code>a0</code> held <code>256</code> even when
<code>asm_parse_regvar</code> returned <code>-1</code>.</p>
<p>During some of the <code>int</code> related problems someone on the Fediverse told me that
<code>char</code><span class="quo">‘</span>s default signedness is not defined by the C standard. I read MesLibC
and, exactly: <code>int8_t</code> was defined as an alias of <code>char</code>.</p>
<p>In <span class="caps">RISC</span>-V <code>char</code> is <code>unsigned</code> by default (don’t ask me why) but we are used to
x86, where it’s <code>signed</code> by default. Plain <code>char</code> is simply not portable.</p>
<p>Replacing:</p>
<pre><code class="language-clike">typedef char int8_t;
</code></pre>
<p>With:</p>
<pre><code class="language-clike">typedef signed char int8_t;
</code></pre>
<p>Fixed the issue.</p>
<p>From this you can learn several things:</p>
<ol>
<li>Don’t assume <code>char</code><span class="quo">‘</span>s signedness in C</li>
<li>If you design a programming language, be consistent with your decisions. In
C <code>int</code> always means <code>signed int</code>, but <code>char</code> doesn’t act like that. Don’t do this.</li>
</ol>
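<p>The whole class of bugs can be reproduced on any machine by picking the
signedness explicitly (types renamed here for illustration):</p>
<pre><code class="language-clike">typedef unsigned char broken_int8;  /* what `typedef char int8_t` meant on RISC-V */
typedef signed char fixed_int8;     /* the fix */

int broken(void) {
    broken_int8 reg = -1;   /* stored as 255 */
    return reg != -1;       /* 1: promotes to 255, the check never fails */
}

int fixed(void) {
    fixed_int8 reg = -1;    /* stays -1 */
    return reg != -1;       /* 0: the check works as intended */
}
</code></pre>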
<h4 id="jmp">MesLibC Implement <code>setjmp</code> and <code>longjmp</code></h4>
<p>Those who are not that versed in C, as I wasn’t before we found this issue,
won’t know about <code>setjmp</code> and <code>longjmp</code>. Simplifying a lot, they are like a
<code>goto</code> you can use from any part of the code: <code>setjmp</code> takes a buffer and
stores the state of the program in it, and <code>longjmp</code> restores the state of the
program from that buffer, jumping back to the position where <code>setjmp</code> was
called.</p>
<p>Both functions are part of the C standard library and they need specific
support for each architecture because they need to know which registers are
considered part of the state of the program. They need to know how to store the
program counter, the return address, and so on, and how to restore them.</p>
<p>In their simplest form they are a set of stores in the case of the <code>setjmp</code> and
a set of loads in the case of <code>longjmp</code>.</p>
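<p>Their use looks like this (a minimal sketch of the standard <code>setjmp.h</code>
interface; the <code>fail</code>/<code>run</code> helpers are mine, for illustration):</p>
<pre><code class="language-clike">static jmp_buf env;

void fail(void) {
    longjmp(env, 42);         /* jump back to the setjmp call site */
}

int run(void) {
    int code = setjmp(env);   /* returns 0 on the direct call */
    if (code == 0) {
        fail();
        return -1;            /* never reached */
    }
    return code;              /* returns 42 after the longjmp */
}
</code></pre>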
<p>In <span class="caps">RISC</span>-V they only need to store the <code>s*</code> registers (plus the return address
and stack pointer), as the rest are treated as temporary. It’s simple, but it
needs to be done, and it hadn’t been done for <span class="caps">RISC</span>-V in MesLibC.</p>
<p>Andrius is not convinced by our commit here, and I agree with his
concerns. We added full <code>setjmp</code> and <code>longjmp</code> implementations directly
<del>stolen from</del> inspired by the ones in Musl<sup id="fnref:stolen"><a class="footnote-ref" href="#fn:stolen">6</a></sup>, but they also include
floating point register support, using instructions that are not implemented in
TinyCC yet. This is going to be a problem in the future, because later
iterations will try to execute instructions they don’t actually understand.</p>
<p>There are two (or three) possible solutions here. The first is to remove the
floating point instructions for now (another flavor of this solution is to
hide them behind an <code>#ifdef</code>). The second is to implement the floating point
instructions in TinyCC’s <span class="caps">RISC</span>-V assembler, which sounds great but forces us to
upstream the changes, a process that may take a long time, so we’d need to
patch it in our bootstrapping scripts until it happens.</p>
<p>We just added the <code>#ifdef</code>s because our code is full of them anyway and sent it
to Mes: <a href="https://github.com/ekaitz-zarraga/mes/commit/0e2c55697df285250c8a24442f169bc52d729c31">0e2c5569</a>.</p>
<h4 id="more">More</h4>
<p>Those are mostly the coolest errors we needed to deal with, but we stumbled
upon many more.</p>
<p>Before this effort started, Andrius added support for 64 bit instructions in
Mes and fixed some issues 64 bit architectures had in M2.</p>
<p>I found a <a href="https://issues.guix.gnu.org/65225">bug in Guix shell</a> (it’s still
open) and had to fix some <span class="caps">ELF</span> headers in MesCC generated files because objdump
and gdb refused to work on them.</p>
<p>Andrius also found issues with weak symbols in MesLibC, triggered because <span class="caps">TCC</span>
didn’t have support for them. Thankfully upstream <span class="caps">TCC</span> had that issue fixed and
we just cherry-picked it, for the win.</p>
<p>He even had the energy to test all this on real <span class="caps">RISC</span>-V hardware we
specifically acquired for this task.</p>
<p>There are many more things to tell, but this is already getting too long and if
I continue writing we’ll probably end up fixing some stuff more.</p>
<p>In the end, a project like this is like hitting your head against a wall until
one of them breaks. Sometimes it feels like the head did, but it’s all good.</p>
<h4 id="reproducing">Reproducing what we did</h4>
<p>All we did means nothing if you can’t reproduce it. We provide two ways to
reproduce this process: live-bootstrap and Guix.</p>
<p>Both provide a similar thing, but there are some high-level differences worth
mentioning now.</p>
<p>Compared with <code>live-bootstrap</code>, Guix helps because it reuses previous steps if
they didn’t change. This results in shorter waits once Mes is sorted out.</p>
<p>On the other hand, I have had issues with failed builds in Guix (in emulated
systems). It was hard to jump inside the build container and play around, so
the development cycle suffered a lot. In <code>live-bootstrap</code>, if you are good with
<code>bwrap</code>, you can jump in and tweak things with no issues.</p>
<p>For those who enjoy digging in the code and trying to follow the process, I
recommend following <code>live-bootstrap</code><span class="quo">‘</span>s scripts. The directory structure is a
little bit confusing, but the scripts are very plain and linear. The ones in
the Guix process come from previous bootstrap efforts and are designed to do
many things automagically, which makes them hard to follow.</p>
<h5 id="live-bootstrap">Using live-bootstrap</h5>
<p>Andrius is part of the <code>live-bootstrap</code> effort and he’s doing all the scripting
there to keep the process reproducible.</p>
<p><a href="https://github.com/fosslinux/live-bootstrap">Live-bootstrap</a> is…</p>
<blockquote>
<p>An attempt to provide a reproducible, automatic, complete end-to-end
bootstrap from a minimal number of binary seeds to a supported fully
functioning operating system.</p>
</blockquote>
<p>That’s the official description of the project. From a more practical
perspective, it’s a set of scripts that build the whole operating system from
scratch, depending on a few binary seeds.</p>
<p>That’s not very different from what Guix provides from a bootstrapping
perspective. Guix is “just” an environment where you can run “scripts” (the
packages define how they are built) in a reproducible way. Of course, Guix is
way more than that, but if we focus on what we are doing right now it acts like
the exact same thing.</p>
<blockquote>
<p><span class="caps">NOTE</span>: <code>live-bootstrap</code><span class="quo">‘</span>s project description is a little bit outdated. If you
read its comparison with Guix, you’d be reading old information. For more
up-to-date information about Guix’s bootstrapping process I suggest this page
of the Guix manual:
<a href="https://guix.gnu.org/manual/devel/en/html_node/Full_002dSource-Bootstrap.html">https://guix.gnu.org/manual/devel/en/html_node/Full_002dSource-Bootstrap.html</a></p>
</blockquote>
<p>They are very different projects, but at a practical level the main difference
between them is that <code>live-bootstrap</code> is probably easier for you to test if you
are working on any <span class="caps">GNU</span>/Linux distribution<sup id="fnref:in-guix"><a class="footnote-ref" href="#fn:in-guix">7</a></sup>.</p>
<p>If you want to reproduce this exact point in time you only need to use my fork
of <a href="https://github.com/ekaitz-zarraga/live-bootstrap/">live-bootstrap</a>, branch
<code>riscv-tcc-boot</code>. I also made a tag on it, <code>self-hosted-tcc-rv64</code>, to make it
easier to remember when this post was released. Andrius made all the magic to
set that process up to take the inputs for Mes and TinyCC from the correct tag.</p>
<p>Clone the repository, set up the dependencies and run this (if you are not in a
<span class="caps">RISC</span>-V host you need to configure Qemu and binfmt):</p>
<pre><code class="language-bash"> ./rootfs.py --bwrap --arch riscv64 --preserve
</code></pre>
<p>That should, after a long time, reach a point where there’s a properly compiled
bootstrappable TinyCC.</p>
<h4 id="guix">Using Guix for a reproducible environment</h4>
<p>I made a Guix recipe that can replicate the whole process, too. It took me a
long time to make it work, but it finally does.</p>
<p>From my <span class="caps">TCC</span> fork, reproducing this should be easy for people versed in Guix.
There’s a <code>guix</code> folder with some files (most of them broken, not gonna lie),
but there are two you should pay attention to:</p>
<ul>
<li>
<p><code>channels.scm</code> stores the state of my Guix checkout so you can reproduce it
in the future using <code>guix time-machine</code>. At the moment it doesn’t feel
necessary but if something fails when you try it, please refer to that.</p>
</li>
<li>
<p><code>commencement.scm</code> is an edited copy of the Guix bootstrapping process,
directly obtained from <code>gnu/packages/commencement.scm</code> from Guix’s codebase.
I patched this to make it work for <span class="caps">RISC</span>-V, using some more modern commits in
the dependencies.</p>
</li>
</ul>
<p>In order to reproduce all our work in Guix you just need to build the
<code>tcc-boot0</code> package from the <code>commencement.scm</code> file using <code>riscv64-linux</code> as
your <code>--system</code>. I’m a nice guy, so I added a script you can use for this; just
run:</p>
<pre><code class="language-bash">./tcc-boot0-from-source.sh
</code></pre>
<p>And that should build the whole thing. It takes hours, you have been warned.</p>
<p>Also it adds <code>--no-grafts</code> (thanks Efraim), because if you keep the grafts it
compiles the world from scratch (curl, x11… not good).</p>
<p>If you just want to build <code>mes-boot</code> as an intermediate step, I also made a
file for that:</p>
<pre><code class="language-bash">./mes-boot-from-source.sh
</code></pre>
<p>Both scripts load variables from the provided <code>commencement.scm</code> module. The
module is not complex if you are used to Guix, but it calls some complex shell
scripts in both Mes and TinyCC to build. Those contain all the magic.</p>
<h3 id="conclusions">Conclusions</h3>
<p>Of course, the problems we fixed now look easy and simple. This blog post
doesn’t really do justice to the countless debugging hours and all the nights
we, Andrius and I, spent thinking about where the issues could be coming from.</p>
<p>The debugging setup wasn’t as good as you might imagine. The early steps of the
bootstrap don’t have the debug symbols a “normal” userspace program would have.
In many cases, function names were all we had.</p>
<p>I have to thank my colleague Andrius here, because he did a really good
debugging job and provided me with small reproducers I could finally fix. Most
of the time he made the assist and I scored the goal.</p>
<p>He also did a great job with the testing, which I couldn’t do because I was
struggling with Guix from the early days, trying to make the compilers find the
header files and libraries.</p>
<p>On the emotional side, it is also a great improvement to have someone to rely
on. Andrius, Janneke and I had good teamwork and we supported each other when
our faith started to crumble. And believe me, it does crumble when a new bug
appears right after you fixed one that took you a week. There were times this
summer I thought we would never reach this point.</p>
<p>It’s also worth mentioning that the bootstrapping process is extremely slow:
it takes hours. This kills the responsiveness and makes testing way harder than
it should be. Not to mention that we are working on a foreign architecture,
which has its own problems too.</p>
<p>If you have to take some lesson from something like this, here you have a
suggestion list:</p>
<ul>
<li>The simplest error can take ages to debug if your code is crazy enough.</li>
<li>Don’t be clever. It sets a very high standard for your future self and people
who will read your code in the future.</li>
<li>I guess we can summarize the previous two points in one: If we could remove
TinyCC from the chain, we would. It’s a source of errors and it’s hard to
debug. The codebase is really hard to read for no apparent reason.</li>
<li>When build times are long, small reproducers help.</li>
<li>Add tests for each new case you find.</li>
<li>Don’t trust, disassemble and debug.</li>
<li>Be careful with C and standards and undefined behavior.</li>
<li>Integers are hard. Signedness makes them harder.</li>
<li>Being surrounded by the correct people makes your life easier.</li>
</ul>
<p>Also, as a personal note, I noticed I’m a better programmer than I was at the
previous post in this series. I feel way more comfortable with complex
reasoning and even with writing new programs in other languages, even though I
spent almost no time coding anything from scratch. It’s like dealing with this
kind of internals gives you a level of awareness that is useful in a more
general way than it looks. Crazy stuff.</p>
<p>If you can, try to play with the internals of things from time to time. It
helps. At least it helped me.</p>
<h3 id="next">What is next?</h3>
<p>Now that we have a fully featured Bootstrappable TinyCC, we need to decide what to do next.</p>
<p>In the short term, all this has to be released in the original projects: Mes,
M2, and so on. That’s the easy part, as everything has proved to be ready.</p>
<p>In the mid term, it’s not very clear what to do first. We suspect we’ll need
upstream TinyCC for the next steps, because we need many different tools to
continue the bootstrapping chain, and the Bootstrappable TinyCC might not be
enough to build them. On the other hand, when we go for a standard library
we’ll miss the extended assembly support we already mentioned. There’s some
uncertainty in the next step.</p>
<p>The long term is pretty clear, though: the goal is <span class="caps">GCC</span>. First <span class="caps">GCC</span> for C, and
then for C++, to be able to build <span class="caps">GCC</span> 7.5, which should enable the rest of the
chain pretty easily (famous last words). I anticipate we are going to have
problems with <span class="caps">GCC</span> (I know this because I left them there last time) so we’ll
need to fix those, too. Once that is done, we would use <span class="caps">GCC</span> to compile more
recent versions of <span class="caps">GCC</span> until we compile the world.</p>
<p>That’s more or less the description of what we will do in the next months.</p>
<p>And this is pretty much it. I hope you learned something new about C, the
Bootstrapping process or at least had a good time reading this wall of text.</p>
<p>We’ll try to work less for the next one, but we can’t promise that. 😉</p>
<p>Take care.</p>
<hr>
<!--
MANY OF THIS ARE REALLY HARD TO REASON ABOUT!!!!
WITH THIS WE START PASSING MANY MORE TESTS IN MESCC AND ALSO ADDED SOME EXTRA
TESTS THAT CHECK COMPLEX BEHAVIOR HERE AND THERE
- `int`s are 64 bit in MesCC and TinyCC is written like they are 32 bit.
- TinyCC's assembly for RISC-V is not complete and we need some of that in
meslibc. We implemented the missing instructions (jal, jalr, lla and some
pseudoinstructions).
- TinyCC's assembler for RISC-V uses a simplified syntax, so we need to rewrite
our meslibc according to that.
- RISC-V uses a `__global_pointer$` symbol, but TinyCC does not allow dollars
in identifiers by default. The `-fdollars-in-identifiers` flag exploded when
used so we hardcoded the flag to true.
- We backported the `long double` support from TinyCC's `mob` branch.
- And large constant generation.
- Fixed some weird casting issues in TinyCC (see Fix casting issues (missing
func_vt in riscvgen.c)
- MesCC produced binaries that were impossible to debug with GDB and OBJDUMP
complained about them. We fixed those too (some archs are missing)
- MesCC's struct initialization to zeroes like `Whatever a = {0};` initialized
everything to `22` and is now working as expected.
- `switch/case` statements in MesCC fallback always to default because they
check the fallback clause and then jump to default.
- Mes had some incompatibilities with Guile that prevented us from running the
code fast. Fixed those.
- Added support for RISC-V instruction formats in MesCC
(https://git.savannah.gnu.org/cgit/mes.git/commit/?h=wip-riscv&id=e42cf58d14520a5360d7d527d1c2c18c0a498c28)
- Added support for signed rotation in MesCC. (all arches affected)
- And also fixed some M2 things that allow all this 64 bit support happen in
MesCC, which didn't have 64 bit support before. Stikonas?
- Stikonas also fixed problems in M2:
https://github.com/oriansj/M2-Planet/commit/85dd953b70c5f607769016bbf2a0aa3de7e41b6c
- Fix Bootstrappable TinyCC's GOT (global offset table). It was just a broken
condition in an if (stikonas dealt with that)
- Meslibc again! Tinycc does not support [extended
asm](https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html) in RV64 but
stikonas fixes it replacing the extended asm by abi-compatible handwired asm.
The good fix would be to implement it, but upstream doesn't have it either...
- `int size = 0; if (size < 8) size = 8;` does not work because TCC generated
wrong assembly and it jumps over the true branch even if it checks the
condition is ok. (reproducer in `C_TESTS/if.c`)
- Variable length arguments were broken in Bootstrappable TCC. Upstream TCC
does some string magic to support them (c2str) where the same header file is
used twice: one in the binary and one in runtime. That functionality was lost
in the ~translation~ backport. We had to push some defines to Meslibc that
support that.
- Meslibc had `typedef char int8_t` in `stdint.h` but that's not reliable,
because the C standard doesn't define the signedness of the `char`. In RISC-V
the signedness of the char is `unsigned` by default, so we have to be
explicit and say `signed char`, to avoid issues.
- Remove some 0bXXXX literals I introduced in the assembler to simplify
things... They happen not to be standard C but a GCC extension.
- Add a setjmp and longjmp implementation to meslibc that also support tinycc
assembler syntax. (copy from musl but with our syntax)
-->
<div class="footnote">
<hr>
<ol>
<li id="fn:rounds">
<p>There are many rounds. Like 7 or so. <a class="footnote-backref" href="#fnref:rounds" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:self-hosted">
<p>So it can compile itself again and again, but who would want to
do that? <a class="footnote-backref" href="#fnref:self-hosted" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:reproducer">
<p>This is how we managed to fix most of the problems in our code:
make a small reproducer we can test separately so we can inspect the
process and the result easily. <a class="footnote-backref" href="#fnref:reproducer" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:cppref">
<p>You can see an explanation in the (1) case at
<a href="https://en.cppreference.com/w/c/language/struct_initialization">cppreference.com</a> <a class="footnote-backref" href="#fnref:cppref" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
<li id="fn:stikonas">
<p>He is like that. <a class="footnote-backref" href="#fnref:stikonas" title="Jump back to footnote 5 in the text">↩</a></p>
</li>
<li id="fn:stolen">
<p>Yo, if it’s free software it’s not stealing! Please steal my code.
Make it better. <a class="footnote-backref" href="#fnref:stolen" title="Jump back to footnote 6 in the text">↩</a></p>
</li>
<li id="fn:in-guix">
<p>If you run it in Guix or in a distribution that doesn’t follow <span class="caps">FHS</span>
you’d probably need to touch the path of your Qemu installation or be
careful with the options you send to the <code>rootfs.py</code> script. <a class="footnote-backref" href="#fnref:in-guix" title="Jump back to footnote 7 in the text">↩</a></p>
</li>
</ol>
</div>More work, more people, more energy — thanks NlNet2023-07-17T00:00:00+03:002023-07-17T00:00:00+03:00Ekaitz Zárragatag:ekaitz.elenq.tech,2023-07-17:/bootstrapGcc7.html<p>Now it’s time to focus on combining all the previous work and making it
production ready. NlNet to the rescue, again.</p><p>I might become a little bit famous in this small world of Guix, <span class="caps">RISC</span>-V and
bootstrapping after my <a href="https://fosdem.org/2023/schedule/event/guixriscv/"><span class="caps">FOSDEM</span> talk of this year</a> and the work I did
during 2021 and 2022, which you can follow in this <a href="https://ekaitz.elenq.tech/tag/bootstrapping-gcc-in-risc-v.html">series of posts</a> I'm
continuing right now.</p>
<p>Nothing I write about here was done by me alone. Many people helped make
this happen, but in the end I'm the one writing here and making the noise. So,
before explaining anything else, I want to thank everyone involved in the
process.</p>
<p>I also want to thank <a href="https://nlnet.nl/project/GNUMes-RISCV/">NLNet / <span class="caps">NGI</span>-Assure</a> for funding the project.
Without them there wouldn’t be anything to discuss here. Their funds enabled
this work.</p>
<p>That work was funded and it is finished: I backported <span class="caps">RISC</span>-V
support to <span class="caps">GCC</span> and also to the bootstrappable
TinyCC, but that’s not enough. Everything I did has to be combined with the whole
bootstrapping toolchain, so it’s time for more.</p>
<h3>A new way</h3>
<p>Even with all the help, during the project I felt alone. The codebases are huge
(<span class="caps">GCC</span> is millions of lines of code) or very badly written (tcc, I’m looking at you), and there
are tons of moving parts (Hex0, M1, M2-Planet, Mes, tcc, bootstrappable tcc,
gcc, all the libcs…). It’s really hard to know everything, and none of us knows
the whole ecosystem deeply, so often there’s no one to ask for help. You are alone.</p>
<p>This might seem like a good thing, a challenge, and it is, but it’s also very
energy consuming. I did all I could, and I’m not sure I can take this much
further by myself.</p>
<p>Now, the project has evolved. We have most of the dots and it’s time to draw
the line that connects them.</p>
<p>In order to do that we need more collaboration as each of us has become an
expert in a different part of the chain. Also, many new problems will arise
from the interaction between the different parts.</p>
<p>Knowing that, this time I proposed something else: I wanted to make a larger
project where more people would collaborate and I asked NlNet for the funds to
continue the work from that perspective.</p>
<p>That’s good because we can pay every person involved according to their
contribution<sup id="fnref:surprise"><a class="footnote-ref" href="#fn:surprise">1</a></sup>.</p>
<h3>NlNet / <span class="caps">NGI</span> Assure</h3>
<p>Of course, I wouldn’t be writing this if NlNet hadn’t given us the funds.</p>
<p>So, yes: NlNet decided to fund us. Big thanks to them and to <span class="caps">NGI</span> Assure.</p>
<style>
.container{
display: flex;
flex-flow: row wrap;
justify-content: center;
gap: 40px;
}
.no-side-margin{
margin: 0px;
}
</style>
<div class="container">
<img class="no-side-margin" src="https://ekaitz.elenq.tech/nlnet.svg" width=200px>
<img class="no-side-margin" src="https://ekaitz.elenq.tech/NGIAssure.svg" width=200px>
</div>
<h3>The work</h3>
<p>As I introduced in the <a href="https://fosdem.org/2023/schedule/event/guixriscv/">Fosdem talk</a>, there’s a lot of integration
work to do.</p>
<p>During the last year I focused on backporting the <span class="caps">RISC</span>-V support to <span class="caps">GCC</span> and the
Bootstrappable TinyCC. I did that because I knew that would enable more work on
the whole chain of compilers we use in the bootstrapping process. But also
because I could just focus on a very specific part, forgetting about the whole
chain for a moment.</p>
<p>Now it’s time to start combining all the work together.</p>
<p>The funding includes enough tasks to make the full source bootstrapping for
<span class="caps">RISC</span>-V. This is a summary of the tasks:</p>
<ul>
<li>Finish <span class="caps">GNU</span> Mes’ <span class="caps">RISC</span>-V support</li>
<li>Build the Bootstrappable TinyCC using <span class="caps">GNU</span> Mes’ <span class="caps">RISC</span>-V support added in the
first task</li>
<li>Fix the backported <span class="caps">GCC</span> 4.6.4 package to include C++ support and fix missing functionality</li>
<li>Build the backported <span class="caps">GCC</span> 4.6.4</li>
<li>Build the upstream <span class="caps">GCC</span> 7.5 or higher with the backported <span class="caps">GCC</span> 4.6.4</li>
<li>Package the whole process and include it in Guix’s commencement module</li>
<li>Review the associated projects and fix the possible issues</li>
<li>Document all the process</li>
</ul>
<p>You are probably not familiar enough with the whole thing to know what they
really mean, but some of them are <strong>really hard</strong>.</p>
<p>I’ll go into more detail on all of them as we work on them, so don’t worry at
the moment.</p>
<h3>The feelings</h3>
<p>I’m not very excited with this project anymore. The tasks you can see in the
previous block are not good for a person like me. I really struggle with them:
configuring development environments, fixing weird imports, etc. It’s hard for
me, intellectually and emotionally.</p>
<p>I already did the parts that interested me the most and I want to move on to
something else.</p>
<p>Why ask for more funds then?</p>
<p>Well, let’s say it plainly: the funds are not for me. I’m using the success I
had with the previous project and the interest NlNet has in it to fund other
people to finish the work.</p>
<p>Whoever does a task will get the budget associated with it.</p>
<p>The plan here is to coordinate other people doing the tasks rather than
doing them myself. I don’t rule it out, though: I’ll probably need to work on
some of them.</p>
<p>The fact that I’m the one that presented the proposal doesn’t mean the proposal
is for me. The proposal is for <strong>you</strong>.</p>
<h3>The people</h3>
<p>I already managed to involve several fellow hackers and I made the proposal with
them as collaborators:</p>
<ul>
<li><strong>Efraim Flashner</strong>, who has been working on the <span class="caps">RISC</span>-V port of most of the
Guix packages, is going to take part in this second stage of the project, as he
knows the status of <span class="caps">RISC</span>-V in Guix better than anyone else.</li>
<li><strong>Danny Milosavljevic</strong>, who worked on the bootstrapping process for <span class="caps">ARM</span>, also
agreed to get involved in this.</li>
<li><strong>Jan Nieuwenhuizen</strong> (Janneke) is the Mes author and maintainer.</li>
<li><strong>Andrius Štikonas</strong> is deeply involved in the bootstrapping process too, making
a lot of patches to live-bootstrap, Mes, Hex0, M2-Planet and so on.</li>
<li><strong>Juliana Sims</strong> has also shown interest in the project because she has been
involved in <span class="caps">RISC</span>-V related projects before.</li>
</ul>
<p>You can also collaborate with us, if your contributions are good we can even
add you to the official team.</p>
<p>If you want to join us, feel free to contact me to <code>riscv-effort@elenq.tech</code> or
join <code>#bootstrappable</code> in <code>libera.chat</code> and ping me there.</p>
<p>I’m sure we all will learn a lot together during the process.</p>
<h3>Closing words</h3>
<p>So, in summary, I’m just introducing a new part of this adventure. Thanks to
NlNet, we can take all the work we have been doing separately on Mes, <span class="caps">GCC</span>,
TinyCC, Hex0, M2-Planet and so on and finally combine it all together.</p>
<p>This is a huge effort, but hopefully we’ll manage to do it, learn a lot in
the process and get paid.</p>
<p>We’ll see how it goes. I’ll keep you all informed.</p>
<p>Take care.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:surprise">
<p>It might sound really surprising to some, but some of the people
involved in this are paid zero money for their time at the moment, and they
are doing great improvements. This is a topic for a huge discussion but, in
summary: work is work, and you should get paid for it. <a class="footnote-backref" href="#fnref:surprise" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Support Windows not supporting Windows2023-03-18T00:00:00+02:002023-03-18T00:00:00+02:00Ekaitz Zárragatag:ekaitz.elenq.tech,2023-03-18:/windows.html<p>About the possibility of having Windows users as clients being a software
developer who doesn’t use Windows, and how to solve that technically.</p><p>I hate Windows. I don’t like it, I don’t support the software practices of
Microsoft and I probably never will. That doesn’t mean I don’t live in this
world, where unfortunately most people use Windows. Many people have no
choice other than to use Windows, and they still deserve to have some good free
software on their computers.</p>
<p>Of course, I always try to encourage my clients (and everyone around me!) to
use free software and stop using Windows, but sometimes it’s impossible, and
it’s always better for them to use Windows with some free software made by
me<sup id="fnref:money"><a class="footnote-ref" href="#fn:money">1</a></sup> than using Windows with more proprietary bloatware done by any
garbage corporation that doesn’t care about their freedom.</p>
<p>Since I started with ElenQ Technology I always had this issue in mind, and the
time to tackle it has come, so:</p>
<blockquote>
<p>How could I make software for Windows users if I don’t use Windows, I don’t
have any machine that runs Windows and I don’t support Windows in any way?</p>
</blockquote>
<p>Until recently, most of my clients asked me for Web-based tools, so I dodged
that ball without even realizing it, but I always had the impression that I
would have to tackle the issue someday and that I was just postponing the
research.</p>
<p>During the last couple of weeks, in my spare time, I did some of that
research, and this is a simple high-level summary of the results.</p>
<p>Needless to say, this research is for myself, and I have a very strong
background to take into consideration (read the blog and you’ll see!), so it
probably won’t fit your needs in any way, but it does fit mine and probably
some of my close colleagues’.</p>
<ol>
<li><a href="#web">Web-based</a><ol>
<li><a href="#webext">Web Extensions</a></li>
</ol>
</li>
<li><a href="#jvm">Java Virtual Machine</a><ol>
<li><a href="#jvmgui">GUIs</a></li>
<li><a href="#jvmlangs">Interesting <span class="caps">JVM</span> languages</a><ol>
<li><a href="#clojure">Clojure</a></li>
<li><a href="#kawa">Kawa</a></li>
</ol>
</li>
<li><a href="#jar">Distribution: <span class="caps">JAR</span> files and UberJAR</a></li>
</ol>
</li>
<li><a href="#native">Native Binaries</a><ol>
<li><a href="#mingw">MinGW</a></li>
<li><a href="#zig">Zig</a></li>
<li><a href="#staticbin">Distribution: statically built binaries</a></li>
<li><a href="#guilibs">GUIs</a></li>
</ol>
</li>
<li><a href="#wininst">Distribution with Windows Installers</a></li>
<li><a href="#conclusion">Conclusion</a></li>
<li><a href="#finalwords">Final words</a></li>
</ol>
<h3 id="web">Web-based</h3>
<p>Most of the time I am asked to do Web stuff, because most people just
use the Web for everything<sup id="fnref:ask-vs-need"><a class="footnote-ref" href="#fn:ask-vs-need">2</a></sup>.</p>
<p>Clients are used to working with websites, the UIs are easy to make, they work on
any device… They are cool for many things, but when you need to do any
interesting native operation (e.g. reading or creating files locally) they
don’t make sense, and they require deployment, one way or another, which
may carry extra costs, effort and maintenance.</p>
<h4 id="webext">Web Extensions</h4>
<p>Another interesting option is Web Extensions (browser extensions). They are
kinda easy to make, making them work in several browsers<sup id="fnref:web-ext"><a class="footnote-ref" href="#fn:web-ext">3</a></sup> takes almost
no effort, they don’t require deployment (no servers, no pain) and they have
more permissions than a regular website.</p>
<p>The problem is that the browser is still a constrained environment, so you
might not be able to do everything you’d like in there, and they force users
to keep the browser open in order to run them.</p>
<h3 id="jvm"><span class="caps">JVM</span> runs everywhere!</h3>
<p>I’ve never been a Java fan. I don’t really like the language or the fact that
it assumes you are using an <span class="caps">IDE</span> to code in it, but I have to say the <span class="caps">JVM</span> is a
really interesting environment.</p>
<p>The main problem is that it’s pretty large, but it’s not a huge deal to tell
my clients to install it (if they don’t have it already), and it provides most
of the functionality I’d ever need out of the box.</p>
<h4 id="jvmgui"><span class="caps">GUI</span></h4>
<p>It comes with <span class="caps">GUI</span> stuff by default (Swing and <span class="caps">AWT</span><sup id="fnref:swing"><a class="footnote-ref" href="#fn:swing">4</a></sup>) and there are more
modern ways to make GUIs, like JavaFX, which I didn’t manage to make work on
my Guix machine.</p>
<h4 id="jvmlangs">Interesting <span class="caps">JVM</span> languages</h4>
<p>The best thing about the Java Virtual Machine is you don’t need to use Java for
it. There are some cool languages full of parenthesis you can use in it.</p>
<h5 id="clojure">Clojure</h5>
<p>I have some Clojure past. It’s a language I love. There were a couple of things I
didn’t like about it, though:</p>
<ul>
<li>The startup time of Clojure made me feel uncomfortable.</li>
<li>I was worried about the size of the <span class="caps">JVM</span>.</li>
<li>Most of my code relied too heavily on Leiningen for everything and I didn’t
know very well what was going on internally or which libraries were being
used (a little bit of an <span class="caps">NPM</span> effect), and I was worried about the maintenance
of the software if I was asked to make changes in the
future<sup id="fnref:maintenance-future"><a class="footnote-ref" href="#fn:maintenance-future">5</a></sup>.</li>
<li>The Java interaction is really well designed at a language level, but
integrating Clojure’s functional programming with heavily imperative Java
code (like GUIs) feels uncomfortable.</li>
</ul>
<p>I have to say I haven’t coded in Clojure for a long time, and I wasn’t that
good a programmer back when I played around with it. I would probably make way
better use of it right now, and things that felt weird then might feel way more comfortable.</p>
<p>None of these issues is a big deal anyway. For this kind of Windows project
it might be a great choice, as most of the problems don’t matter much
any more in this context. I may give Clojure another go.</p>
<h5 id="kawa">Kawa</h5>
<p>Recently I discovered Kawa, and it looks great.
Kawa programs are easy to build with no additional tools, it’s a scheme, it’s
<strong>fast</strong>, and its Java interaction feels natural.</p>
<p>Of course, there are almost no libraries written in Kawa, no tutorials, no
learning resources beyond the documentation (which is very good, by the way).</p>
<p>It’s minimalistic, it’s easy to set up and is really fast: it might be a good
choice for many projects.</p>
<h4 id="jar">Distribution: <span class="caps">JAR</span> files and UberJARs</h4>
<p>The Java world might be “too enterprisey” for someone like me, but it has
interesting features. The <code>jar</code> files are just zip files that have Java
bytecode and resources inside.</p>
<p>On any machine with Java installed, they launch automatically when
clicked. There’s no need to unpack them or anything like that.</p>
<p>There are several ways to make those <code>jar</code> files, and one is to simply insert all
the dependencies of the Java application inside the <code>jar</code>. That’s called an UberJAR.</p>
<p>Doing this ensures that all the dependencies and resource files (such
as icons and stuff like that) are inside one file that you can distribute, and it will
always run if there’s a <span class="caps">JVM</span> installed on the target machine. Simple distribution!</p>
<p>The only problem is that they are not placed in the correct system folder,
so they don’t appear in the application launchers<sup id="fnref:jpackage"><a class="footnote-ref" href="#fn:jpackage">6</a></sup>.</p>
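<p>As a sketch of the idea (the class name <code>tech.example.Main</code> and the paths are made-up placeholders), building an UberJAR by hand boils down to unpacking your dependencies’ classes into one directory and repacking everything with a manifest that points at the entry point:</p>

```shell
# Hypothetical layout: build/classes/ holds your compiled .class files
# plus the classes extracted from every dependency jar.
mkdir -p build/classes

# The manifest tells the JVM which class holds main():
printf 'Main-Class: tech.example.Main\n' > build/MANIFEST.MF

# With any JDK installed, one jar(1) call bundles it all together:
#   jar cfm app.jar build/MANIFEST.MF -C build/classes .
# and users launch the result with a double click or:
#   java -jar app.jar

cat build/MANIFEST.MF
```

<p>Build tools like Leiningen automate exactly this step with <code>lein uberjar</code>.</p>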
<h3 id="native">Native binaries</h3>
<p>Sharing prebuilt binaries is also feasible if you are using the right tools,
but they might be tricky to distribute.</p>
<p>The first thing you have to do if you want to share binaries is cross-compile
them for Windows. I found a couple of tools that fit very well with my style
here: MinGW and Zig.</p>
<h4 id="mingw">MinGW</h4>
<p>Recently I discovered this and it happens to be great. Simply put, MinGW is a
cross-compiler toolchain for Windows. It has everything you need to build your
C/C++ software for Windows: gcc, binutils, libraries and header files.</p>
<p>Pretty straightforward.</p>
<p>Guix also supports this as a target so I can even <code>guix build --target=</code> with
it and have all the fun.</p>
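<p>To illustrate, this is roughly what the workflow looks like; the cross-compile and Wine invocations are left as comments since they assume MinGW (and Wine) are installed, and the file is just a toy example:</p>

```shell
# A trivial program to cross-compile:
cat > hello.c <<'EOF'
#include <stdio.h>
int main(void) { puts("Hello, Windows!"); return 0; }
EOF

# On Debian the toolchain comes from the gcc-mingw-w64 package.
# The cross-compiler then produces a Windows PE executable:
#   x86_64-w64-mingw32-gcc -o hello.exe hello.c
# Statically linking avoids shipping the MinGW runtime DLLs:
#   x86_64-w64-mingw32-gcc -static -o hello.exe hello.c
# Wine lets you smoke-test it without a Windows machine:
#   wine hello.exe

cat hello.c
```

<p>Guix users get the same toolchain through the <code>guix build --target=x86_64-w64-mingw32</code> route mentioned above.</p>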
<h4 id="zig">Zig</h4>
<p>Zig is a great programming language and the tooling around it is absolutely
fantastic. It’s designed to make cross-compiling easy, and you don’t need anything
other than the Zig compiler itself to build your Zig, C or C++
software for a Windows machine. Just change the target and boom, it works!</p>
<p>Zig also comes with a build system that lets you describe how to build your
whole project and build it with a single command. No need to use external tools
like <span class="caps">GNU</span> Autotools, Make, CMake, Meson or anything like that. One Zig
installation comes with everything you need.</p>
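<p>As a rough sketch (assuming a <code>zig</code> binary is installed; the file names are hypothetical), the whole cross-compilation story reduces to one flag:</p>

```shell
# One Zig toolchain can emit code for many targets; no extra packages.
# A few example target triples, as accepted by `zig cc -target ...`:
printf '%s\n' x86_64-windows-gnu x86_64-linux-gnu aarch64-macos > targets.txt

# Cross-compile C for Windows with nothing but the Zig compiler:
#   zig cc -target x86_64-windows-gnu -o hello.exe hello.c
# Or build a Zig program the same way:
#   zig build-exe hello.zig -target x86_64-windows

cat targets.txt
```

<p>The triples follow Zig’s <code>arch-os-abi</code> scheme; <code>zig targets</code> prints the full list the compiler supports.</p>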
<blockquote>
<p>Another advantage of Zig is that I love the language and I’m looking for an
excuse to learn it. I think it’s very well designed and I love the community
it has. I don’t like the syntax that much but I think I’ll get used to it.</p>
</blockquote>
<h4 id="staticbin">Distribution: statically built binaries</h4>
<p>In order to distribute the binaries, the only obvious choice is to statically
link everything and hand my clients one <code>.exe</code> file. I can’t really
trust non-tech users to install all the dependencies in place, and I can’t guide
them through the process because I don’t know how to do it myself,
and I can’t try, as I don’t own any Windows machine.</p>
<p>Statically built binaries require no installation, which is great, but they are
not placed in the correct system folder and they don’t support resources such as
icons and stuff like that<sup id="fnref:resources-bin"><a class="footnote-ref" href="#fn:resources-bin">7</a></sup>. They might be ok for many things, but
they are not a perfect solution either.</p>
<h4 id="guilibs">GUIs</h4>
<p>These clients require GUIs most of the time. I can’t imagine them
running a script from the shell.</p>
<p>There are many <span class="caps">GUI</span> libraries I could use, but I’d like something that is
small and easy to build for any target. That leaves most of them out.</p>
<ul>
<li>
<p>I tried to build <a href="https://www.tecgraf.puc-rio.br/iup/"><span class="caps">IUP</span></a> myself but the
build process is basically broken. It looks good and uses native <span class="caps">GUI</span>
components, but if the build process is broken I can’t really trust it. I
could just use the binaries they provide but I don’t like that.</p>
</li>
<li>
<p>I built <a href="https://www.fltk.org/index.php"><span class="caps">FLTK</span></a> successfully without problems
and I even <a href="http://git.elenq.tech/guix-packages/commit/?id=b952e7778843adbebae0d12d6f1601e4594313eb">packaged it in my personal Guix
channel</a>.
It’s not beautiful, but it works, and I don’t expect it to be hard to build
for Windows either.</p>
</li>
<li>
<p>I could go for something like Dear ImGui, but immediate-mode GUIs are better suited for
programs that render continuously, like games and such.</p>
</li>
<li>
<p>Qt, <span class="caps">GTK</span>, wxWidgets and those are great too, but probably too much for a
simple man like me<sup id="fnref:kde"><a class="footnote-ref" href="#fn:kde">8</a></sup>.</p>
</li>
</ul>
<h3 id="wininst">Distribution with Windows installers</h3>
<p>Software distribution can be eased using a Windows installer like <span class="caps">MSI</span> or <span class="caps">MSIX</span>.
Those packages know where to install everything and do it automagically, as
Windows users are used to.</p>
<p>They require extra tools, but those might be simple enough to deal with, and they
help a lot in removing the downsides of the distribution methods described previously.</p>
<ul>
<li>
<p>The <span class="caps">GNOME</span> project has a tool called <code>msitools</code>, which exposes a similar interface
to the WiX Toolset, a popular Windows installer generator. I can use it to build
and inspect <span class="caps">MSI</span> installers.</p>
</li>
<li>
<p>Microsoft also provides <a href="https://github.com/microsoft/msix-packaging">a tool for <span class="caps">MSIX</span>
installers</a> that is Open Source
but it happens to insert
<a href="https://github.com/microsoft/msix-packaging/issues/569">telemetry</a> (what a surprise!).</p>
</li>
<li>
<p>There’s a <a href="https://github.com/jpakkane/msicreator">Python package called
<code>msicreator</code></a> that simplifies the use
of <code>msitools</code> with a simpler approach that might be more than enough for my needs.</p>
</li>
</ul>
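<p>For illustration, here is a minimal, stripped-down sketch of driving <code>wixl</code> from <code>msitools</code>; the product name, version and manufacturer are placeholders, and a real installer needs more elements (Directory, Component, Media and so on):</p>

```shell
# Minimal WiX-style description that wixl (GNOME msitools) understands.
# All Product/Package attributes below are illustrative placeholders.
cat > hello.wxs <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<Wix xmlns="http://schemas.microsoft.com/wix/2006/wi">
  <Product Id="*" Name="Hello" Version="1.0.0"
           Manufacturer="ElenQ" Language="1033">
    <Package InstallerVersion="200" Compressed="yes"/>
    <!-- Directory/Component entries pointing at hello.exe go here -->
  </Product>
</Wix>
EOF

# With msitools installed, build and inspect the MSI:
#   wixl -o hello.msi hello.wxs
#   msiinfo suminfo hello.msi

cat hello.wxs
```

<p>The nice part is that this all runs on <span class="caps">GNU</span>/Linux, so the installer can be produced without ever touching Windows.</p>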
<p>There are also some language-specific tools like PyInstaller, but that forces me
to use Python, which I like but don’t know if I want to keep using for
everything. Also, it includes the interpreter in the installer, which feels
like a bit too much.</p>
<blockquote>
<p><span class="caps">EDIT</span>: <a href="https://ekaitz-zarraga.itch.io/bee/comments">Someone</a> mentioned the
existence of <a href="https://nsis.sourceforge.io/Main_Page"><span class="caps">NSIS</span></a>, which looks
pretty promising. I add it here for future reference. It’s packaged in Guix
so it might be a good idea to give it a go.</p>
</blockquote>
<h3>Conclusion</h3>
<p>Each choice comes with its downsides and shines in specific scenarios. Working
in a classic C/C++ setup with MinGW might be great, but I have to make sure I
keep the dependency tree simple, as anything complex might fail to compile or
distribute for Windows.</p>
<p>I want to learn Zig. It’s really cool and I think it will ease the process
significantly. I’m not sure about the C integration but I need to give it a go
first. It might become my go-to language for these kind of applications.</p>
<p>In these cases I still need to find the best <span class="caps">GUI</span> library to use (suggestions welcome!).</p>
<p>The <span class="caps">JVM</span> case is also interesting. It comes with batteries included and has Swing by
default, which is horrible, but it’s something. For faster development I could
use some flexible language like Clojure or Kawa there, the second being way
faster than the first but also way less known (which shouldn’t be a problem, as I
don’t want to rely on many external libraries).</p>
<p>All these options look feasible, so I could just go for any of them. Obviously,
some have way better performance than others (C/C++/Zig vs Clojure), but the
ease of development is also something I have to take into account. That’s
something I’ll need to think about when the projects come.</p>
<p>We’ll see…<sup id="fnref:chicken"><a class="footnote-ref" href="#fn:chicken">9</a></sup></p>
<h3 id="finalwords">Final words</h3>
<p>It’s obvious that I left many options out, and I can’t wait to get some emails
from people recommending that I learn Go or Rust<sup id="fnref:rust"><a class="footnote-ref" href="#fn:rust">10</a></sup>, but this non-exhaustive
research is mostly based on my personal (and current) preferences.</p>
<p>Surely you’ll think I’m making it way more difficult than it actually is:
<em>“just install a virtual machine and build there! Buy a Windows machine if you
really want to solve this issue!!”</em> and you’d probably be right, but it doesn’t
feel right to me. So I won’t.</p>
<p>If you have other ideas or success stories that fit this line of thinking don’t
hesitate to <a href="https://ekaitz.elenq.tech/pages/about.html">contact me</a>.</p>
<p>Stay safe!</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:money">
<p>And I can earn some money in the process. <a class="footnote-backref" href="#fnref:money" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:ask-vs-need">
<p>Often, what clients want or ask for is not what they
need, so be careful with that. <a class="footnote-backref" href="#fnref:ask-vs-need" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:web-ext">
<p><span class="caps">API</span> support in all the browsers is not the same, be careful with
that. <a class="footnote-backref" href="#fnref:web-ext" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:swing">
<p>Welcome back to 2001. <a class="footnote-backref" href="#fnref:swing" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
<li id="fn:maintenance-future">
<p>Guix can help here! <a class="footnote-backref" href="#fnref:maintenance-future" title="Jump back to footnote 5 in the text">↩</a></p>
</li>
<li id="fn:jpackage">
<p>The Java ecosystem provides a tool to solve this <a href="https://openjdk.org/jeps/343">called
<code>jpackage</code></a> but unfortunately it doesn’t
cross-compile (Wine for the win?) and it would add a full Java runtime to the
installer. Maybe it’s too much. <a class="footnote-backref" href="#fnref:jpackage" title="Jump back to footnote 6 in the text">↩</a></p>
</li>
<li id="fn:resources-bin">
<p>The problem with the resources can be bypassed by a
self-expanding executable: a program that unpacks its binary contents in the
current folder. They can be made by 7zip and other tools like this, but I
don’t really like it, as they pollute the current folder and you might not
expect that to happen. Some games are distributed like this. <a class="footnote-backref" href="#fnref:resources-bin" title="Jump back to footnote 7 in the text">↩</a></p>
</li>
<li id="fn:kde">
<p>Probably you didn’t know but my first free software contribution was
for <span class="caps">KDE</span> and I had to deal with Qt. It was right in the migration from Qt4 to
Qt5. Good times. <a class="footnote-backref" href="#fnref:kde" title="Jump back to footnote 8 in the text">↩</a></p>
</li>
<li id="fn:chicken">
<p>I can always chicken out and go for PyInstaller + PyQt when the
projects come. 😐 <a class="footnote-backref" href="#fnref:chicken" title="Jump back to footnote 9 in the text">↩</a></p>
</li>
<li id="fn:rust">
<p>I don’t like Go, but I’m open to learning Rust, even if it’s a little bit
more complex than I’d like it to be (and I don’t like the syntax). It would
require me to allocate a long time for it, which is not a problem, but I need
to be sure that I will be able to get some benefit from it. <a class="footnote-backref" href="#fnref:rust" title="Jump back to footnote 10 in the text">↩</a></p>
</li>
</ol>
</div>Milestone – RISC-V support in Mes’s bootstrappable TinyCC2022-09-22T00:00:00+03:002022-09-22T00:00:00+03:00Ekaitz Zárragatag:ekaitz.elenq.tech,2022-09-22:/bootstrapGcc6.html<p>Bringing <span class="caps">RISC</span>-V support to the bootstrappable TinyCC Mes forked. Some
problems and a look into the future.</p><p>In the <a href="https://ekaitz.elenq.tech/tag/bootstrapping-gcc-in-risc-v.html">series</a> we already introduced <span class="caps">GCC</span>,
TinyCC, Mes and Mes’s TinyCC fork that is designed to be bootstrappable. In
this post we are going to deal with the latter, explain how we made it work for
<span class="caps">RISC</span>-V and the challenges we encountered.</p>
<h3>The non-bootstrappable nature of TinyCC</h3>
<p>As we introduced in the previous post, TinyCC is not compilable by very simple
compilers like Mes’s <code>mescc</code>. So the Mes project decided to make a <a href="https://gitlab.com/janneke/tinycc">fork that
<code>mescc</code> was able to compile</a>. Mes calls it a
<em>bootstrappable tinycc</em>.</p>
<blockquote>
<p>There’s an uninteresting philosophical debate about what
<em>bootstrappable</em> means, which leads to many errors and
misunderstandings<sup id="fnref:misunderstandings"><a class="footnote-ref" href="#fn:misunderstandings">1</a></sup>. Many compilers call themselves
bootstrappable if they can be compiled with themselves. When <strong>we</strong> talk
about this, we are looking for a <em>full-source bootstrappability</em>, that is,
that the compilers can be compiled from <em>source</em>, or from a <em>full-source
bootstrappable</em> compiler.</p>
</blockquote>
<p>TinyCC is supposed to be compilable by itself, but who compiles the version
that compiles TinyCC? Another TinyCC? And who compiles that?</p>
<p>It’s the yogurt problem we always run into: how do you make yogurt? Take
yogurt, mix it with milk and in a few hours you’ll get yogurt. See the problem?</p>
<p>If you are a culinary maniac, as I am, you can stretch this metaphor further.
If you know what you are doing, you can obtain yogurt from raw milk<sup id="fnref:kefir"><a class="footnote-ref" href="#fn:kefir">2</a></sup>.</p>
<p>That’s what our project is doing: make yogurt from raw milk at some point.</p>
<p>So compilers normally only care about the latest yogurt, but we, the
saviors of the ancient milk, those who can acidify the raw pureness, can make
yogurt starter from raw milk.</p>
<p>That’s the kind of magic nobody cares about, not in the compiler world nor in
real life.</p>
<p>The yogurt starter does not make the best yogurt, by the way; it takes
generations and generations of yogurt to make the best. That’s what our
project does: start simple (stage-0 and Mes) and go enriching the product
(TinyCC) until reaching a mature yogurt (<span class="caps">GCC</span>).</p>
<p>TinyCC does not really care about this bootstrappability concept. They only
want to be compilable with themselves. Nothing else.</p>
<p>That’s why <a href="http://joyofsource.com/">Jan</a>, the inventor of this metaphor I just
stretched to the infinite, had to fork the project. He had another option:
simplify TinyCC’s code upstream so it could be compiled from a simpler
starting point, but his ideas were rejected and some weird animosity I don’t understand
started. More on that later.</p>
<h3>The <span class="caps">RISC</span>-V support</h3>
<p>When the previous blogpost was written, TinyCC had a <span class="caps">RV64</span> backend, but the
TinyCC fork did not have <span class="caps">RISC</span>-V support.</p>
<p>My job here was to take the backend from the official TinyCC and bring it to
the bootstrappable one, Jan’s fork. I can say that it’s done. Good for me.</p>
<h4>The process</h4>
<p>I followed the cross-compiler trick again, in order to make this process easier
on my computer and because Mes doesn’t support <span class="caps">RISC</span>-V output yet. Making a
TinyCC for my x86_64 machine that had <span class="caps">RISC</span>-V output sounded more than
reasonable to me. Later I could always move to a full <span class="caps">RISC</span>-V machine to make
sure the backend was working.</p>
<p>So first I made a guix package for upstream <a href="https://github.com/ekaitz-zarraga/tcc/blob/guix_package/guix.scm#L85">TinyCC cross-compiler (for
<span class="caps">RISC</span>-V)</a>
with <span class="caps">GCC</span>. This wasn’t really obvious, because there were some variables to set
correctly. I tested whether everything compiled and worked as expected and,
apart from a couple of issues later corrected upstream, it did.</p>
<p>Next, I made a guix package for <a href="https://github.com/ekaitz-zarraga/tcc/blob/riscv-mes/guix.scm#L83">the forked TinyCC with
<span class="caps">GCC</span></a>. This
also needed some changes, as the forked one is quite an old version of TinyCC.
The build here needs a <code>libtcc1.a</code>, which can be empty if the project is
compiled with <span class="caps">GCC</span> (<code>libgcc</code> provides that functionality), but the compilation
process doesn’t mention anything about this, and coming up with that by
yourself is hard.</p>
<p>Now that the project was compilable, it was time to code. You can see this part in
the <code>riscv-mes</code> branch:</p>
<p><a href="https://github.com/ekaitz-zarraga/tcc/commits/riscv-mes">https://github.com/ekaitz-zarraga/tcc/commits/riscv-mes</a></p>
<p>I took the backend from the upstream and inserted it in the fork. Of course, it
didn’t compile. Many internal structures and APIs changed, so after trying to
stitch it all together myself, I headed to the mailing list. At the beginning I
wanted to think the answers I was getting were due to me not explaining my
doubts properly, but what was actually happening was that the animosity
towards our fork (a decision I didn’t make) surfaced, and someone tried to
ridicule me on the mailing list for no reason at all.</p>
<p>The funny thing is I’d never have needed to contact the mailing list if the
project were as well written as they claim it to be. It’s full of one-character
functions and variables, and the code is mixed together in a very aggressive
way… It’s supersmall, tiny even, but really hard to read. Also, the commits
are not very descriptive for anyone who is not the main maintainer, who,
surprise! is the same person who gives aggressive answers on the mailing
list… I hope it’s only my perception and they are nice to their friends and
family, but the interaction made me feel uncomfortable and I don’t want to
touch this code again.</p>
<p>It was a sad moment, I must admit. But I decided I was going to do this with
help or without it. And I think I did. I removed references here and there and
finally it looks like I got somewhere.</p>
<p>There are some differences to point out: one of the commits that made me ask on
the mailing list was a huge change in the way conditionals are handled in
TinyCC. Our fork didn’t have that, so I needed to split the code into several
pieces, and the benefits of that commit (some instruction optimization) are
lost in the backport. Still, the branching and jumping are correct, just less
optimal. Not bad.</p>
<p>Code added and compiled, it was time for testing. I made a little script (I
didn’t share it, but it’s not really relevant either) and a small test case
of simple C files, compiled (not linked) them with both the upstream version of
the compiler and the forked one, then disassembled them and compared the differences.</p>
<p>You can try it yourself: build the upstream TinyCC and the fork and make them
compile (<code>-c</code>) some files. Use <code>objdump --disassemble</code> and compare the results.
It’s not really hard to test. Here’s an example of a program you can build:</p>
<pre><code class="language-clike">// Example file to build
int main (int argc, char *argv[]){
int a = 19, b = 90;
if (a && b){
return 1;
} else {
return 45 + 90 << 8;
}
}
</code></pre>
<p>And the result it should give in both versions, optimized (upstream) and
unoptimized (our fork):</p>
<pre><code class="language-text">OPTIMIZED VERSION || UNOPTIMIZED VERSION
===============================================||==================================================
0000000000000000 <main>: || 0000000000000000 <main>:
0: fd010113 addi sp,sp,-48 || 0: fd010113 addi sp,sp,-48
4: 02113423 sd ra,40(sp) || 4: 02113423 sd ra,40(sp)
8: 02813023 sd s0,32(sp) || 8: 02813023 sd s0,32(sp)
c: 03010413 addi s0,sp,48 || c: 03010413 addi s0,sp,48
10: 00000013 nop || 10: 00000013 nop
14: fea43423 sd a0,-24(s0) || 14: fea43423 sd a0,-24(s0)
18: feb43023 sd a1,-32(s0) || 18: feb43023 sd a1,-32(s0)
1c: 0130051b addiw a0,zero,19 || 1c: 0130051b addiw a0,zero,19
20: fca42e23 sw a0,-36(s0) || 20: fca42e23 sw a0,-36(s0)
24: 05a0051b addiw a0,zero,90 || 24: 05a0051b addiw a0,zero,90
28: fca42c23 sw a0,-40(s0) || 28: fca42c23 sw a0,-40(s0)
2c: fdc42503 lw a0,-36(s0) || 2c: fdc42503 lw a0,-36(s0)
30: 00051463 bnez a0,38 <main+0x38> || 30: 00051463 bnez a0,38 <main+0x38>
34: 0180006f j 4c <main+0x4c> || 34: 01c0006f j 50 <main+0x50>
38: fd842503 lw a0,-40(s0) || 38: fd842503 lw a0,-40(s0)
3c: 00051463 bnez a0,44 <main+0x44> || 3c: 00051463 bnez a0,44 <main+0x44>
40: 00c0006f j 4c <main+0x4c> || 40: 0100006f j 50 <main+0x50>
44: 0010051b addiw a0,zero,1 || 44: 0010051b addiw a0,zero,1
48: 0100006f j 58 <main+0x58> || 48: 0140006f j 5c <main+0x5c>
4c: 00008537 lui a0,0x8 || 4c: 0100006f j 5c <main+0x5c>
50: 7005051b addiw a0,a0,1792 || 50: 00008537 lui a0,0x8
54: 00000033 add zero,zero,zero || 54: 7005051b addiw a0,a0,1792
58: 02813083 ld ra,40(sp) || 58: 00000033 add zero,zero,zero
5c: 02013403 ld s0,32(sp) || 5c: 02813083 ld ra,40(sp)
60: 03010113 addi sp,sp,48 || 60: 02013403 ld s0,32(sp)
64: 00008067 ret || 64: 03010113 addi sp,sp,48
|| 68: 00008067 ret
</code></pre>
<p>On the right you can see some duplicated <code>j</code> instructions, but that’s
not a problem: the rest of the addresses are calculated properly, and the
duplicated jumps are never reached.</p>
<h4>Last step</h4>
<p>So the code is added to the fork and it seems to work. That’s what I promised
to do, but I wanted to go a little bit further and test if Mes was able to
handle the code I added to the TinyCC fork.</p>
<p>To do that, I made another branch in the project where I changed the
package and some configuration in order to compile the forked TinyCC using Mes.</p>
<p>You can see what I did here:</p>
<p><a href="https://github.com/ekaitz-zarraga/tcc/commits/mes-package">https://github.com/ekaitz-zarraga/tcc/commits/mes-package</a></p>
<p>It turns out I managed to build the thing, using Mes on my x86_64 machine and
choosing <span class="caps">RISC</span>-V as the backend, but the result doesn’t work at all.</p>
<p>The resulting compiler generates empty files that have no permissions and fails instantly.</p>
<p>At least we tested that <code>mescc</code> is ok with the C constructs we used in the
backport of the <span class="caps">RISC</span>-V support. But there are still many things to test and
this isn’t easy at all.</p>
<p>Let me give you some examples of how tricky this process is.</p>
<p>This line in the <code>guix.scm</code> file<sup id="fnref:line"><a class="footnote-ref" href="#fn:line">3</a></sup>:</p>
<pre><code class="language-clike"> "--extra-cflags=-Dinline= -DONE_SOURCE=1"
</code></pre>
<p>does two crazy preprocessor tricks, inserted as C flags. It’s equivalent to
adding these macros at the top level of the sources:</p>
<pre><code class="language-clike">#define inline
#define ONE_SOURCE 1
</code></pre>
<p>The first one removes the word <code>inline</code> from the source code, because <code>mescc</code>
does not support it. The second defines <code>ONE_SOURCE</code> to an actual value: if
it’s only defined, without a value, as the makefile does by default, value
checks like <code>#if ONE_SOURCE</code> don’t match properly. Finding this is not obvious.</p>
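<p>To see these tricks in isolation, here’s a small standalone sketch (mine, not
TinyCC code) of what the two flags amount to. The <code>twice</code> function is made up
for illustration; only the two <code>#define</code>s correspond to the flags above:</p>
<pre><code class="language-clike">#include <stdio.h>

/* Equivalent of "-Dinline=": every "inline" keyword is erased by the
 * preprocessor, so a compiler that doesn't know the keyword never sees it. */
#define inline

/* Equivalent of "-DONE_SOURCE=1": the macro gets a real value. With a bare
 * "-DONE_SOURCE" it would expand to nothing and a value check like
 * "#if ONE_SOURCE" would turn into an empty, invalid "#if". */
#define ONE_SOURCE 1

static inline int twice (int x) { return 2 * x; } /* "inline" vanishes */

int main (void)
{
#if ONE_SOURCE
  printf ("%d\n", twice (21));
#endif
  return 0;
}
</code></pre>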
<p>That’s of course not the only thing; we found many others. I spent a couple
of weeks making the build process work for <code>mescc</code>, and when I thought it was
working, the result was a broken binary. Pretty fun.</p>
<p>And why all this trouble, you might ask?</p>
<p>Jan’s fork is not compiled using the <code>configure</code> script and the <code>Makefile</code> the
project comes with; he wrote some shell scripts to build everything. I wanted to
try to build the project directly as it came for several reasons: the scripts are
prepared for native compilers and not for the cross-compiler I was building;
they use Mes from source, but I just needed to use the upstream one; and I
thought integrating all this into the normal build process would be an extra win.</p>
<p>I lost this time though.</p>
<p>The compilation process might be missing some libraries, or some stubs might be
in use instead of the real code… Maybe the problem is I’m using the x86_64
version of Mes, which is not thoroughly tested… But using the i386 version is
not possible because I’m building for 64-bit <span class="caps">RISC</span>-V and the i386 version doesn’t
know how to deal with 64-bit words… Honestly, I don’t know what to do.</p>
<h3>Something cool to say</h3>
<p>Mes does not compile following the classic process. Mes is integrated with some
tools from the stage-0 project, so it uses the M1 macro system, hex0 and all
those kinds of tools to build the programs.</p>
<p>During the process I found that some of the M1 instructions Mes was generating
were not available in the macro definitions, so I had to add a few extra
instructions to the M1 macro definitions for Mes. Here’s the (slightly simplified) diff I had to make:</p>
<pre><code class="language-diff">diff --git a/lib/x86_64-mes/x86_64.M1 b/lib/x86_64-mes/x86_64.M1
index 9ffbbf15..64997c55 100644
--- a/lib/x86_64-mes/x86_64.M1
+++ b/lib/x86_64-mes/x86_64.M1
@@ -147,6 +148,10 @@ DEFINE mov____0x8(%rbp),%rsp 488b65
DEFINE mov____0x8(%rdi),%rax 488b47
DEFINE mov____0x8(%rdi),%rbp 488b6f
DEFINE mov____0x8(%rdi),%rsp 488b67
+DEFINE mov____(%rax),%si 668b30
+DEFINE mov____(%rax),%sil 408a30
+DEFINE mov____%si,(%rdi) 668937
+DEFINE mov____%sil,(%rdi) 448837
DEFINE movl___%eax,0x32 890425
DEFINE movl___%edi,0x32 893c25
DEFINE movl___%esi,(%rdi) 8937
base-commit: aa5f1533e1736a89e60d2c34c2a0ab3b01f8d037
</code></pre>
<p>Now, with those instructions added, my package got a little bit more complex:
I had to extend the Mes package with my patch until that change is accepted
upstream. But this is great! Using software and improving it while you use it
is the best feeling in life!<sup id="fnref:choco"><a class="footnote-ref" href="#fn:choco">4</a></sup></p>
<p>Let me use this point to show you a little bit of how this macro system works.
You can see this <code>x86_64.M1</code> file has three columns: <code>DEFINE</code>, some text, and a
number in hex. This is a kind of assembler description. The M1 program
receives a file written with instructions that look like the text in the
second column of the <code>.M1</code> file and converts them one by one to the numbers in
the third. In short, the <code>.M1</code> file is a reference that tells the M1 program
how to do the conversion.</p>
<p>M1 is just a text replacement tool that makes the conversion based on the
definitions it gets from the <code>.M1</code> file. It lets us write instructions in a way
that looks like they have meaning (that’s what an assembler is, after all).</p>
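<p>As a toy illustration (my own sketch, not real M1 code), the core of that
conversion is just a lookup table, here built from the four defines added in
the diff above:</p>
<pre><code class="language-clike">#include <stdio.h>
#include <string.h>

/* Sketch of M1's text replacement: map each mnemonic text to the hex
 * digits it stands for. The four entries are the ones added in the diff
 * above; the lookup "assembles" a single token. */
struct m1_define { const char *text; const char *hex; };

static const struct m1_define table[] = {
  { "mov____(%rax),%si",  "668b30" },
  { "mov____(%rax),%sil", "408a30" },
  { "mov____%si,(%rdi)",  "668937" },
  { "mov____%sil,(%rdi)", "448837" },
};

static const char *
m1_lookup (const char *token)
{
  size_t i;
  for (i = 0; i < sizeof table / sizeof table[0]; i++)
    if (strcmp (table[i].text, token) == 0)
      return table[i].hex;
  return NULL; /* real M1 would report an unknown macro here */
}

int main (void)
{
  puts (m1_lookup ("mov____%sil,(%rdi)"));
  return 0;
}
</code></pre>
<p>Feed it a string from the second column and you get the third-column hex out;
an assembler run is just this substitution applied to a whole file.</p>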
<p>Later, those numbers are converted to binary using Hex0 or another, slightly
more sophisticated, tool.</p>
<p>All these tools are written in a way that can be audited (Hex0 is written in
Hex0…) and they are executed from source at their very beginning.</p>
<p>This is how we make yogurt directly from milk. Cool huh?
Props to <a href="http://bootstrappable.org/">http://bootstrappable.org/</a></p>
<h3>Conclusions</h3>
<p>Back to the project, considering the fact that I didn’t manage to build a fully
working TinyCC with a <span class="caps">RISC</span>-V backend using Mes, is this a failure?</p>
<p>I wouldn’t say so.</p>
<p>The new <span class="caps">RISC</span>-V backend is added and tested in the forked TinyCC, using <span class="caps">GCC</span> as a
compiler. That’s a big chunk of the work.</p>
<p>On the other hand, since I can compile the forked TinyCC with <code>mescc</code>, even if
the result didn’t work, I can say the code I added was processed, so it was
technically acceptable to <code>mescc</code>. Not bad, but we’ll still need to see how
true this is.</p>
<p>In the end, these kinds of small steps make progress, and having everything
documented here and in the commits on the git repositories helps others continue
what I just did.</p>
<p>Now, I’m going to leave this as finished, as the code is supposed to work. All
the dots are more or less drawn. Now it’s time for another project, one that
connects all the dots of the <span class="caps">RISC</span>-V full-source bootstrap: from <code>mescc</code>
(which already has some <span class="caps">RISC</span>-V support) to the forked TinyCC (I added the <span class="caps">RISC</span>-V
support), then to mainline TinyCC (which has <span class="caps">RISC</span>-V support) and/or <span class="caps">GCC</span> 4.6.4 (I
added <span class="caps">RISC</span>-V support), from one of those to <span class="caps">GCC</span> 7.5 (the first release with
<span class="caps">RISC</span>-V support), and then to the world.</p>
<p>My work in this project left all the breadcrumbs in the forest, ready for
anyone to follow<sup id="fnref:breadcrumbs"><a class="footnote-ref" href="#fn:breadcrumbs">5</a></sup>.</p>
<p>That person can be me, anyone else or even a group of people. All I can say is
I won’t forget this project, I’ll always be reachable for advice and I’ll try to
help as much as I can. As I always do.</p>
<p>These days I’ll keep giving this a few more tries and I may reach
something else, but I won’t be as busy with it as I’ve been. I think I gave
everything I could to this project. There’s still a lot to do, but what’s
left is not something I can do alone.</p>
<p>Until next time.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:misunderstandings">
<p>I’ve encountered many misunderstandings about my project too.
Some people have told me all this work is worthless because you can always
bootstrap on an x86_64 machine and then continue the bootstrapping effort on
your <span class="caps">RISC</span>-V one. And so on. That’s why this blog doesn’t have a comment section.
People insist on believing that other people’s work is worthless or that they
could do it more simply with no effort. I won’t claim that my explanations are the
best, but I can claim to be the laziest person I know, and I’d never spend time
on something that isn’t worth the effort. <a class="footnote-backref" href="#fnref:misunderstandings" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:kefir">
<p>With kefir you are fucked. We don’t know where it comes from. Luckily
we harvested a lot and it’s easy to grow. <a class="footnote-backref" href="#fnref:kefir" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:line">
<p><a href="https://github.com/ekaitz-zarraga/tcc/blob/mes-package/guix.scm#L196">https://github.com/ekaitz-zarraga/tcc/blob/mes-package/guix.scm#L196</a> <a class="footnote-backref" href="#fnref:line" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:choco">
<p>Chocolate and hot coffee too. <a class="footnote-backref" href="#fnref:choco" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
<li id="fn:breadcrumbs">
<p>I hope someone follows them before the birds eat them. <a class="footnote-backref" href="#fnref:breadcrumbs" title="Jump back to footnote 5 in the text">↩</a></p>
</li>
</ol>
</div>Adding TinyCC to the mix2022-08-01T00:00:00+03:002022-08-01T00:00:00+03:00Ekaitz Zárragatag:ekaitz.elenq.tech,2022-08-01:/bootstrapGcc5.html<p>Discussing what changes need to be done to make <span class="caps">GCC</span> compilable from a
simpler C compiler, TinyCC.</p><p>In the <a href="https://ekaitz.elenq.tech/tag/bootstrapping-gcc-in-risc-v.html">series</a> we already introduced <span class="caps">GCC</span>,
made it able to compile C programs and so on, but we didn’t solve how to build
that <span class="caps">GCC</span> with a simpler compiler. In this post I’ll try to explain which
changes must be applied across the ecosystem to make this possible.</p>
<h3>The current status</h3>
<p>I already talked about this in the past, but it’s always a good moment to
recall the bootstrapping process we are immersed in. There are steps before
these, but I’m going to start at <span class="caps">GNU</span> Mes, which is the core of all this.</p>
<p>For the part that interests us, <span class="caps">GNU</span> Mes has a C compiler called MesCC. This C
compiler is the one we use to compile TinyCC, and we use that TinyCC to compile
a really old version of <span class="caps">GCC</span>, 2.95, and from that we compile more recent
versions until we reach the current one. From the current one we compile the world.</p>
<p>That’s the theory, and it’s what we currently have in the most widely supported
architectures (<code>i386</code> and maybe some <span class="caps">ARM</span> flavour). Problems arise when you deal
with a new architecture, like the one we have to deal with: <span class="caps">RISC</span>-V.</p>
<p><span class="caps">RISC</span>-V was invented recently, and compilers did not add support for it
until a few years ago. <span class="caps">GCC</span> added support for <span class="caps">RISC</span>-V in version 7.5 which, as we
have been discussing throughout this series, needed a C++ compiler in order
to be built. That’s a problem we almost solved in the previous steps,
backporting the <span class="caps">RISC</span>-V support to a <span class="caps">GCC</span> that only needed a C compiler to be built.</p>
<p>Now, extra problems appear. Which C compiler are we going to use to build that
<span class="caps">GCC</span> 4.6.4 that has the <span class="caps">RISC</span>-V support we backported?</p>
<p>According to the process we described, we should use <span class="caps">GCC</span> 2.95, but it doesn’t
support <span class="caps">RISC</span>-V so we would need to backport the <span class="caps">RISC</span>-V support to that one too.
That’s not cool.</p>
<p>Another option would be to remove <span class="caps">GCC</span> 2.95 from the equation and compile
<span class="caps">GCC</span> 4.6.4 directly with TinyCC, if that’s possible, making the whole
process faster by removing some dependencies. But this means TinyCC has to be able
to compile <span class="caps">GCC</span> 4.6.4. We are going to try this route, but it requires
some work, which we will describe today.</p>
<p>On the other hand, in order to be able to build all this for <span class="caps">RISC</span>-V, TinyCC and
MesCC have to be able to target <span class="caps">RISC</span>-V…</p>
<p>Too many conditions have to be true for all this to work. But hey! Let’s go step
by step.</p>
<h3><span class="caps">RISC</span>-V support in TinyCC</h3>
<p>First, we have to make sure that TinyCC has <span class="caps">RISC</span>-V support, and it does: for
a while now, TinyCC has been able to compile, assemble and link for <span class="caps">RISC</span>-V,
though only for 64 bits.</p>
<p>I tested this support using a TinyCC cross-compiler and it works. If you want
to try it, I have a simple <a href="https://github.com/ekaitz-zarraga/tcc/blob/guix_package/guix.scm">Guix package</a> for the cross compiler,
and I also fixed the official Guix package for the native TinyCC, which had
been broken for a long time.</p>
<p>Still, I didn’t test the <span class="caps">RISC</span>-V support natively, but if the cross-compiler
works, chances are the native one will also work, so I’m not really worried
about this point.</p>
<h3><span class="caps">GNU</span> Mes compiling TinyCC</h3>
<p><span class="caps">GNU</span> Mes supports an old C standard that is simpler than the one TinyCC uses, so
it uses a fork of TinyCC with some C features removed. This fork was done way
before the <span class="caps">RISC</span>-V support was added to TinyCC and many things have changed
since then.</p>
<p><a href="https://www.youtube.com/watch?v=-1qju6V1jLM">We need to backport the TinyCC <span class="caps">RISC</span>-V support to Mes’s own TinyCC fork,
then.</a> Or at least do something
about it.</p>
<p>When I first took a look into this issue, I thought it would be an easy fix; I
had already backported <span class="caps">GCC</span>, which is orders of magnitude larger than TinyCC… But
it’s not that easy. TinyCC’s internal <span class="caps">API</span> changed quite a bit since the fork
was made, and I need to review all of it in order to make this work. Also, this
process requires converting all the modern C that is not supported by
MesCC into the older C constructs that are available in it.</p>
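<p>As a hypothetical example of that conversion (the exact feature set MesCC
accepts may differ, so take this as a sketch): C99-style declarations have to
be hoisted to the top of the block, roughly like this:</p>
<pre><code class="language-clike">#include <stdio.h>

/* Modern style, common in current TinyCC code: */
int sum_modern (const int *v, int n)
{
  int total = 0;
  for (int i = 0; i < n; i++) /* C99: declaration inside "for" */
    total += v[i];
  return total;
}

/* Roughly C90-style equivalent, with declarations hoisted: */
int sum_c90 (const int *v, int n)
{
  int total;
  int i;
  total = 0;
  for (i = 0; i < n; i++)
    total += v[i];
  return total;
}

int main (void)
{
  int v[4] = { 1, 2, 3, 4 };
  printf ("%d %d\n", sum_modern (v, 4), sum_c90 (v, 4));
  return 0;
}
</code></pre>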
<p>It’s a lot of work, but it’s doable to a certain degree, and it might represent
a big step for the full-source bootstrap process. Like what I did in <span class="caps">GCC</span>, it’s
not going to solve everything, but it’s a huge step in the right direction.</p>
<h3><span class="caps">GNU</span> Mes supporting <span class="caps">RISC</span>-V</h3>
<p>On the lower-level part of the story, if we want to make this whole process work
for <span class="caps">RISC</span>-V, <span class="caps">GNU</span> Mes itself should be runnable on it, and able to generate
binaries for it.</p>
<p><a href="https://lists.gnu.org/archive/html/bug-mes/2021-04/msg00031.html">There have been efforts</a> to make all this possible, and I
don’t expect this support to take long to appear finally in <span class="caps">GNU</span> Mes. It’s just
a matter of time and funding. I am aware that Jan is also interested on
spending time on this, so I think we are covered on this area.</p>
<h3><span class="caps">GCC</span> compilation with TinyCC</h3>
<p>The only point we are missing then is to be able to build the backported <span class="caps">GCC</span>
from TinyCC, without the intermediate <span class="caps">GCC</span> 2.95. This a tough one to test and
achieve, because the <span class="caps">GCC</span> compilation process is extremely complex, and we need
to make quite complex packages for this process to work.</p>
<p>On the other hand, the work I already did, packaging my backported <span class="caps">GCC</span> for Guix,
is not enough for several reasons: it was designed to work with a modern <span class="caps">GCC</span>
toolchain, and not with TinyCC; and a cross-compiler is not the same thing as a
native one.</p>
<p><span class="caps">GCC</span> is normally compiled in stages, which are called <em>bootstrap</em> by the <span class="caps">GCC</span>
build system. I described a little bit of that process <a href="https://ekaitz.elenq.tech/bootstrapGcc3.html#fn:staged">in a footnote in the
past</a>. That process is not activated in a cross-compilation
environment, which is what I used when the backend I backported was
<del>back</del>tested. If the <em>bootstrap</em> process doesn’t work, it means the
compilation process fails, so this introduces possible errors in the build
system which we were avoiding thanks to the cross-compilation trick.</p>
<p>I did this on purpose, of course. I just wanted a simple working environment
that let me test the backported <span class="caps">RISC</span>-V backend of the compiler, but
now we need to make a proper package for <span class="caps">GCC</span> 4.6.4, and make it work for TinyCC.</p>
<p>I wouldn’t mention this if I hadn’t tried and failed to make this package. It’s
not especially difficult to make a package, or so it seems, until you
get errors like:</p>
<pre><code class="language-weird-error-lol">configure: error: C compiler cannot create executables
`¯\_(ツ)_/¯`
</code></pre>
<p>That being said, this is not only a packaging issue. As we already mentioned,
we are removing <span class="caps">GCC</span> 2.95 from the pipeline, so TinyCC has to be able to deal
with the <span class="caps">GCC</span> 4.6.4 codebase directly, including the backport I did.</p>
<p>The easiest way to test this is to compile <span class="caps">GCC</span> 4.6.4 for x86_64 on my machine,
with no emulation in between, so we can find the things TinyCC can’t deal with.
Later we would be able to test this further in an emulated environment or
directly on a <span class="caps">RISC</span>-V machine to make sure TinyCC can deal with the <span class="caps">RISC</span>-V
backend, but for a first review of the <span class="caps">GCC</span> core, using x86_64 can be enough.
It requires no weird setup beyond a working package… Ouch!</p>
<p>I’m not really good at this part and I’m not sure if anyone else is, but I
don’t feel like spending time trying to make this package cascade work. I feel
my time is better spent on fixing stuff, or, once the package cascade is
done, on fixing the compatibility.</p>
<p>During the whole project, making Guix packages and figuring out build systems
is the part where the most time was spent, and it’s the one with the lowest success
rate. It feels like I wasted hours trying to make the build process work for nothing.</p>
<p>The funny part of this is that Guix is partially to blame here: not
conforming to the <span class="caps">FHS</span> and handling inputs in this weird way is what makes the
whole process really complex. Code has to be patched to find the libraries,
scripts must be patched too, binaries are hard to find… On the good side,
it’s Guix that makes this work worth the effort, and it’s also what makes this
process reproducible, once it’s done, so everyone can enjoy it.</p>
<h4>Wait, but didn’t Mes use a TinyCC fork?</h4>
<p>Oh yeah, of course. What I forgot to mention is that the step we just described,
making TinyCC able to compile the backported <span class="caps">GCC</span> 4.6.4, is not as simple
as I made it sound. If we use upstream TinyCC to compile <span class="caps">GCC</span>, who is going to
compile that TinyCC? We already said MesCC is not able to do that directly.</p>
<p>We could build that TinyCC with the TinyCC fork Mes has, or make the TinyCC fork
go directly for <span class="caps">GCC</span> 4.6.4, but in any case there’s an obvious task to
tackle: the <span class="caps">RISC</span>-V support must arrive in the TinyCC fork before we can do
anything else. And that’s where I want to focus.</p>
<h3>This is not only about <span class="caps">RISC</span>-V</h3>
<p>I have to be clear with you: I mixed two problems together and I did that on purpose.</p>
<p>On the one hand we have the <span class="caps">RISC</span>-V support related changes. And on the other
hand we have the changes on the compilation pipeline: the removal of <span class="caps">GCC</span> 2.95.</p>
<p>The second part is just a consequence of the first, but it’s not only related
to the <span class="caps">RISC</span>-V world. Once we have our compilers ready, we are going to apply
the change to the whole thing. Removing a step is a really important task for
many reasons, but one is obvious at this point: having a really old compiler
like <span class="caps">GCC</span> 2.95 forces us to stay with the architectures it was able to target,
or makes us add them and maintain them ourselves. It’s a huge flexibility
issue for the little gain it gives: <span class="caps">GCC</span> 4.6.4 is already compilable from a C90 compiler.</p>
<p>So, this is an important milestone, not only for my part of the job but also
for the whole <span class="caps">GNU</span> Mes and bootstrapping effort. Skipping <span class="caps">GCC</span> 2.95 has to be
done on every architecture, and the packaging effort that requires is unavoidable.</p>
<h3>What I already did</h3>
<p>While I was reviewing what needed to be done, I started doing things here
and there, preparing the work and making sure I understood the context better.</p>
<p>First, I realized I had introduced some non-C90 constructs in the backport of <span class="caps">GCC</span>,
because I directly copied some code from 7.5, and I removed those. This is
important, because we need to be able to compile all this with TinyCC, and I
don’t expect TinyCC to support modern constructs.</p>
<p>I packaged a TinyCC <span class="caps">RISC</span>-V cross compiler <a href="https://github.com/ekaitz-zarraga/tcc/blob/guix_package/guix.scm">for the upstream
project</a>, and also for <a href="https://github.com/ekaitz-zarraga/tcc/blob/riscv-mes/guix.scm">the Mes fork</a>, even
though the latter cannot be compiled yet: we need to backport
the backend in order to make it work. Still, it’s important work, because it
lets me start the backport easily. I’ll need to apply more changes on top of
it, for sure, but at the moment I have all I need to start coding the new backend.</p>
<p>I spent countless hours trying to make a proper <span class="caps">GCC</span> package and trying to use
TinyCC as the C compiler for it, with no success. This is why I decided to move
on and work on a more interesting and usable part: adding the <span class="caps">RISC</span>-V backend to
the Mes fork of TinyCC.</p>
<p>Of course, I already started working on the <span class="caps">RISC</span>-V support of the TinyCC fork
from Mes, and started encountering <span class="caps">API</span> mismatches here and there. Most of them
are related to optimizations introduced after the fork, which I need to
review in more detail in the upcoming weeks. I also spent some time trying to
understand how TinyCC works, and it’s a very interesting approach, I have to
say<sup id="fnref:maybe"><a class="footnote-ref" href="#fn:maybe">1</a></sup>.</p>
<h3>Conclusions</h3>
<p>I’d love to tackle all these problems together and fix the whole system, but
I’m just one guy coding from his couch. It’s not realistic to think I can fix
everything, and trying to do so is detrimental to my mental health.</p>
<p>So I decided to go for the <span class="caps">RISC</span>-V support for the TinyCC fork we have at Mes.
This would leave all the ingredients ready for someone more experienced than me
to make the final recipe.</p>
<p>The same thing happened with the <span class="caps">GCC</span> backport. I didn’t really finish the job:
there’s no C++ compiler working yet, but that’s not what matters. Anyone can
take what I did, package it properly (which happened to be an impossible
task for me) and get it ready. We already made a huge step.</p>
<p>Fighting against a wall is bad for everyone; it’s better to pick a task where
you can provide something. You feel better, and the overall state of the
project improves. Achieving things is the best gasoline you can get for
achieving new things.</p>
<p>Regarding the task I chose, I’ve already spent some hours working on it. It’s
not an easy task. The internal TinyCC <span class="caps">API</span> has changed a lot since the moment the
fork was made, and there have been many commits related to <span class="caps">RISC</span>-V since then. One
of the most recent ones fixes the <span class="caps">RISC</span>-V assembler, after I reported a few
weeks ago that it wasn’t working. All these changes must be reviewed carefully,
undoing the <span class="caps">API</span> changes and, most importantly, keeping the code compatible with
<span class="caps">GNU</span> Mes’s C compiler.</p>
<p>Not an easy task.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:maybe">
<p>Maybe I’ll have the time to explain it in a future blog post, maybe
not. <a class="footnote-backref" href="#fnref:maybe" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Milestone — Source to Binary RISC-V support in GCC 4.6.42022-06-20T00:00:00+03:002022-06-20T00:00:00+03:00Ekaitz Zárragatag:ekaitz.elenq.tech,2022-06-20:/bootstrapGcc4.html<p>Description of the changes applied from a minimal compiler that runs and
generates assembly to something that is actually able to compile,
interacting with binutils and having a working libgcc.</p><p>In the <a href="https://ekaitz.elenq.tech/tag/bootstrapping-gcc-in-risc-v.html">series</a> we already introduced <span class="caps">GCC</span>,
and we already shared how I backported the <span class="caps">RISC</span>-V support from the <span class="caps">GCC</span> core to
<span class="caps">GCC</span>-4.6.4. Now it’s time to finish what we left half-done and actually
introduce a <em>full</em> <span class="caps">RISC</span>-V compiler.</p>
<h3>Where we left last time</h3>
<p>On Tuesday, the 7th of April, I marked a commit with the <code>minimal-compiler</code> tag.
That commit contains all the work we had done until that point. In that tag we
describe how to build a compiler that is only able to compile files to
<span class="caps">RISC</span>-V assembly.</p>
<p>As we already explained around here, <span class="caps">GCC</span> is a driver program that calls other
programs to do its work. The <span class="caps">GCC</span> core compiles the code to assembly language
and then calls binutils to do the rest of the work: assembling and linking.</p>
<p>At that point, we had to call binutils by hand.</p>
<h3>The changes</h3>
<p>The changes applied at the time of writing are available in the
<a href="https://github.com/ekaitz-zarraga/gcc/releases/tag/working-compiler"><code>working-compiler</code></a> tag. As the tag message describes, they
were split into two different branches: the <code>guix_package</code> branch and the <code>riscv</code>
branch.</p>
<p>The <code>guix_package</code> branch is merged into the <code>riscv</code> branch, but this split lets
us differentiate which changes are related to the compiler itself and which
are related to the tooling around it. That way we’ll be able to
choose what to do with the commits easily in the future. We’ll probably need to
rearrange some stuff.</p>
<h3>The context is everything: Guix package part</h3>
<p>The <code>guix_package</code> branch contains all the commits that make the Guix tooling
around the project work. This includes the compilation process definition in a
reproducible way, the environment setup and all that.</p>
<p>As the <code>working-compiler</code> tag message describes, this is the way you can
currently make this compiler work and play with it:</p>
<pre><code class="language-bash">$ guix shell -m manifest.scm
$ source PREPARE_FOR_COMPILATION.sh riscv64-linux-gnu
# This second command will prepare the PATH and other environment
# variables to make GCC find libraries and executables
</code></pre>
<blockquote>
<p>If you use this in the future and it fails, it might be because Guix changed
some of the core packages between the time this blog post was written and the
time you read it. You can always use the <code>time-machine</code> utility to
make sure you use everything exactly as it was when this post was written:<br>
<code>guix time-machine --channels=channels.scm -- shell -m manifest.scm</code></p>
</blockquote>
<p>From this point you can run the compiler directly. It will need the <code>--sysroot</code>
option to be able to find the <code>crt*</code> files, but that’s not something I’m
worried about at this point: we’ll fix it when we integrate this in the
bootstrapping process.</p>
<p>Run the compiler like this now:</p>
<pre><code class="language-bash">$ riscv64-linux-gnu-gcc --sysroot=$GUIX_ENVIRONMENT [-static] ...
</code></pre>
<h4>Notable changes in the Guix side</h4>
<p>The most notable change in the Guix side is the addition of the <code>manifest.scm</code>
file and also the <code>PREPARE_FOR_COMPILATION.sh</code> file. With the help of my man
Janneke, I realized the problems I had came from the fact that I was calling
the compiler with the wrong environment, so it was unable to find the linker
and the assembler. Yes, these kinds of things happen a lot in Guix if you are not
careful (and I am <em>not</em> careful at all). Adding these tools let me prepare a
working environment where the assembler and the linker are found and called properly.</p>
<p>This change also includes some interesting extras: the GLibC added to the
manifest also contains the static version, so we can generate static binaries
that are easier to test in an emulated environment without having to deal with
the dynamic linker. Important stuff.</p>
<p>Also, the compilation process now relies on a newer Guix version, which removed
the <code>-unknown</code> part from the triplets (actually <em>quadruplets</em>), like
<code>riscv64-unknown-linux-gnu</code>. That was a little bit of a pain: one day I just
tried to compile everything and it failed, and in the end it was just that
small change. I decided to update the required Guix version to keep it up to date
with current Guix, so I don’t need to run <code>guix time-machine</code> each time.
It’s better like this.</p>
<p>If you want to read more about the change and see how quickly the Guix
people helped me understand what was going on, <a href="https://lists.gnu.org/archive/html/bug-guix/2022-06/msg00092.html">see this mailing list
thread</a><sup id="fnref:guix"><a class="footnote-ref" href="#fn:guix">1</a></sup>. I also have to mention that I needed to add a small
change to my <span class="caps">GCC</span> to make it work when the <code>-unknown</code> part is not
present: adding <code>riscv</code> to <code>config.sub</code> was enough for that.</p>
<p>I also fixed a couple of extra things, but they are not really relevant here.
Having a working environment preparation is a nice milestone by itself,
but we did some more things on the <span class="caps">GCC</span> side!</p>
<h3>Road to a working compiler: The <span class="caps">GCC</span> part</h3>
<p>The changes in the <code>riscv</code> branch contain some commits; most of them are small,
but they are really important. I have to say this is full of details I don’t
really understand, so I’ll focus on those I actually do. The rest of
them are simply things that happened to work in the end. You know, this is
pretty old software and the project is too complex to understand in full…</p>
<h4>Memory models and fences</h4>
<p>First, before doing anything else, we mentioned in the previous post that the
memory models were something we needed to review. We knew this because the code
related to memory models was used in a couple of parts of the <span class="caps">RISC</span>-V code we
copied from the <span class="caps">GCC</span> 7.5 codebase, but it was not available in <span class="caps">GCC</span> 4.6.4: that
<span class="caps">API</span> simply did not exist back then.</p>
<p>The commit <a href="https://github.com/ekaitz-zarraga/gcc/commit/71dc25d08354dead26180bd552c0c3e299b012cb"><code>71dc25d</code></a> removes the memory models from the code
(which were already commented out but not solved), taking the most
conservative approach: always add the <code>.aq</code> flag and the <code>fence</code> instruction.
This is not optimal, but the performance penalty is negligible and it doesn’t
affect the functionality.</p>
<p>I did not come up with this myself. As I mentioned in the previous post, I
asked the maintainer of the <span class="caps">RISC</span>-V support in <span class="caps">GCC</span> (who is also one of the big
names of <span class="caps">RISC</span>-V) about this, and he gave me this solution.</p>
<p>I also had to change the optabs a little bit, using <code>memory_barrier</code> instead of
one of the more recent optabs. For this I just compared the code from the <span class="caps">MIPS</span>
architecture and checked how it changed from the 4.6.4 to the 7.5, as I did for
many other parts of this work. Easy-peasy.</p>
<h4>Wrong arguments in the assembler call</h4>
<p>As I mentioned in the Guix part, we were unable to call the assembler. This
means we didn’t discover that the assembler call was broken until we actually
put the assembler in the <code>PATH</code> and tried to call it.</p>
<p>The commit <a href="https://github.com/ekaitz-zarraga/gcc/commit/7030067e6aa54b44a2f2447d4e706e76bc88f696"><code>7030067</code></a> shows how I needed to make small changes in the
way the assembler is called by <span class="caps">GCC</span> to ensure that it was called correctly.</p>
<p>This issue was easy to fix, but not that easy to catch. First I found the
assembler was complaining because it didn’t understand the <code>-k-march</code> option. I
spent some time realizing the problem was that those were two options that had
been merged together due to a missing space. Yes, the space at the end of the line
<strong>is relevant</strong>.</p>
<p>I directly removed the <code>-k</code> option from the <code>ASM_SPEC</code> because my assembler
considered it ambiguous. I don’t remember where I copied it from, but it
works and I don’t want to think about it ever again.</p>
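<p>The pitfall is easy to reproduce outside of <span class="caps">GCC</span>: spec strings are built by
gluing string pieces together, and C concatenates adjacent string literals, so
a missing trailing space merges two options into one. A minimal standalone
illustration (generic strings, not <span class="caps">GCC</span>’s actual spec contents):</p>
<pre><code class="language-clike">#include <stdio.h>

int
main (void)
{
  /* Adjacent string literals are concatenated by the compiler.  */
  const char *glued = "-k" "-march=rv64g";   /* one bogus option */
  const char *fine  = "-k " "-march=rv64g";  /* two separate options */
  printf ("%s\n%s\n", glued, fine);
  return 0;
}
</code></pre>
<p>The first line prints <code>-k-march=rv64g</code>, which is exactly the option the
assembler was complaining about.</p>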
<h4>Libgcc: the core of this change</h4>
<p>The biggest thing in this set of changes was the addition of <code>libgcc</code>, which
is mandatory if you want to link programs compiled with <span class="caps">GCC</span>. <code>libgcc</code> is a
library <span class="caps">GCC</span> uses for complex operations: instead of generating the assembly
code directly, it generates calls to <code>libgcc</code>, where those complex operations
are defined. You can read further about those operations, but they are not
really relevant for this post; the relevant part is that we need to add <code>libgcc</code> in
order to have a working compiler.</p>
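<p>To make the idea concrete: on a target without a 64-bit division instruction,
<span class="caps">GCC</span> compiles <code>a / b</code> on <code>long long</code> operands into a call to the <code>libgcc</code>
routine <code>__udivdi3</code> (or <code>__divdi3</code> for the signed case). The sketch below is
a simplified shift-and-subtract stand-in for illustration, <em>not</em> the real
<code>libgcc</code> code:</p>
<pre><code class="language-clike">#include <stdio.h>
#include <stdint.h>

/* Toy stand-in for libgcc's __udivdi3: restoring division, bit by bit.
   Assumes d != 0, like the real helper.  */
static uint64_t
toy_udivdi3 (uint64_t n, uint64_t d)
{
  uint64_t q = 0, r = 0;
  int i;
  for (i = 63; i >= 0; i--)
    {
      r = (r << 1) | ((n >> i) & 1);
      if (r >= d)
        {
          r -= d;
          q |= (uint64_t) 1 << i;
        }
    }
  return q;
}

int
main (void)
{
  printf ("%llu\n", (unsigned long long) toy_udivdi3 (1000000007ULL, 13ULL));
  return 0;
}
</code></pre>
<p>Without <code>libgcc</code> the linker has nowhere to resolve symbols like these, which
is exactly the class of undefined-reference errors shown later in this post.</p>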
<p>The <span class="caps">GCC</span> codebase has different folders for its different blocks, so it’s
not surprising to see there’s a folder called <code>gcc</code> for the core and a folder
called <code>libgcc</code> for <code>libgcc</code>. Anyone would expect that just cherry-picking the
commit that added the <code>libgcc</code> support to <span class="caps">GCC</span> 7.5 would be enough to have the
backport ready.</p>
<p>Sadly, life is a little bit harder than that.</p>
<h5>Cherry picking the libgcc support</h5>
<p>The first and easiest thing to do is to cherry-pick the commit
<a href="https://github.com/ekaitz-zarraga/gcc/commit/72add2fa4c354af4bf8db0b8dcb50c5b076b3ae5"><code>72add2f</code></a> and pray. It looked plausible to work
because, if you look at the changes it makes, it’s pretty well contained in the
<code>libgcc/config/riscv/</code> folder and adds just a couple of lines to
<code>libgcc/config.host</code> to make it find the <code>riscv</code> folder.</p>
<p>The contents of the commit are pretty clear:</p>
<ol>
<li>Some assembly files that implement some operations</li>
<li>Some header files and C code that implement other things</li>
<li>Some weird files called <code>t-something</code></li>
</ol>
<p>The first two kinds of files we can understand as the body of the <code>libgcc</code>
support: the juice. The <code>t-something</code> files are what are called Makefile Fragments.</p>
<p>The Makefile Fragments are the basis of the <span class="caps">GCC</span> build system. Files like
<code>config.host</code>, also part of the commit, set a variable, <code>tmake_file</code>, where
all the <code>t-something</code>s are added so the compiler generator framework knows how
to build things according to the rules described in them.</p>
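<p>As a rough sketch of the mechanism (illustrative, not the literal lines from
the commit), <code>config.host</code> accumulates the fragments for each matching target
into <code>tmake_file</code>, and the build machinery later includes every fragment
listed there:</p>
<pre><code class="language-unknown"># Sketch of the config.host mechanism; target pattern and fragment
# names are illustrative, not the exact contents of the commit.
riscv*-*-linux*)
	tmake_file="${tmake_file} riscv/t-riscv"
	;;
</code></pre>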
<p>That’s how the <span class="caps">GCC</span> build system works. Now let’s talk about the problems.</p>
<h5><span class="caps">LIB2ADD</span> iteration is broken</h5>
<p>The first thing I realized when I cherry-picked the <code>libgcc</code> support was that
the whole thing did not build anymore. There was a crazy issue here.</p>
<p>We are not going to talk about the <code>LIB2ADD</code> variable yet, but we can see this
small change, <a href="https://github.com/ekaitz-zarraga/gcc/commit/b9c7f394b33a60c1e64191b0e31f0cf98d6a5f93"><code>b9c7f39</code></a>, affects it. The main issue here was that the
whole makefile system (<code>*.mk</code> files in <code>libgcc</code>) was iterating over the values
of the variable incorrectly, because the <code>libgcc</code> support commit was appending values to
<code>LIB2ADD</code> instead of setting it. The <code>LIB2ADD</code> variable was set empty by the
main makefiles, and appending to it left an empty leading entry, so the
iteration process was trying to compile an empty value.</p>
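<p>The failure mode is the classic empty-leading-entry problem you get when a
list is built by appending with a separator. A small standalone analogy in
plain C (the file names are made up; this is not the actual <code>*.mk</code>
iteration):</p>
<pre><code class="language-clike">#include <stdio.h>
#include <string.h>

int
main (void)
{
  /* "" += " some-file.c" leaves a separator in front, and a splitter
     that does not skip empty fields then yields an empty first entry,
     just like the makefile iteration tried to compile an empty value.  */
  char list[] = " riscv-atomic.c riscv-mul.c";  /* note the leading space */
  char *p = list;
  char *tok;
  int i = 0;
  while ((tok = strsep (&p, " ")) != NULL)
    printf ("entry %d: '%s'\n", i++, tok);
  return 0;
}
</code></pre>
<p>The first entry printed is the empty string: that is the phantom “file” the
build system was trying to compile.</p>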
<p>This was super hard to debug, but this small change made the whole thing
compile, and now I was able to test it further.</p>
<h5>Still broken</h5>
<p>But it was still broken. <span class="caps">GCC</span> didn’t want to compile. Some weird errors
appeared, mentioning something like the <code>extra_parts</code> were not coherent between
<code>gcc</code> and <code>libgcc</code>. Weird.</p>
<p>Reading <code>gcc/config.gcc</code> and <code>libgcc/config.host</code> I realized the use of the
<code>extra_parts</code> variable and how it was certainly incoherent between the two
files. But why?</p>
<p>This led me to analyze the whole build system, comparing the <span class="caps">RISC</span>-V support
with others. I realized here that the build system is mixed between the <code>gcc</code> and
<code>libgcc</code> folders, and it’s extremely difficult to know where the line that
separates one from the other lies.</p>
<p>Apart from that, the build system was unable to compile the <code>crt*</code> files,
because it didn’t know how to do it… The recipes were missing.</p>
<p>This made me go for the most aggressive change possible,
<a href="https://github.com/ekaitz-zarraga/gcc/commit/9c0f7364b89acb38ea3af1cbe1884059671b3c04"><code>9c0f736</code></a>: just copy everything from the
<code>libgcc/config/riscv/</code> to the <code>gcc/config/riscv</code>, add the rules for the <code>crt*</code>
files and make the <code>extra_parts</code> coherent.</p>
<p>Of course, this is not a good change, but it let us test whether the generated
compiler was able to compile anything. <em>“I’ll have time to clean this up later,”</em>
I thought.</p>
<h5>The buildsystem is just a pain in the butt</h5>
<p>Now I was able to compile <span class="caps">GCC</span>, so I could try it on some things.</p>
<p>I built a <span class="caps">RISC</span>-V cross compiler and tried to statically compile a small Hello
World program. Errors appeared:</p>
<pre><code class="language-unknown">/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/lib/libc.a(printf_fp.o): in function `_nl_lookup':
/tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../include/../locale/localeinfo.h:315: undefined reference to `__unordtf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/lib/libc.a(printf_fp.o): in function `__printf_fp_l':
/tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/printf_fp.c:394: undefined reference to `__unordtf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/printf_fp.c:394: undefined reference to `__letf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/lib/libc.a(printf_fphex.o): in function `__printf_fphex':
/tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../stdio-common/printf_fphex.c:212: undefined reference to `__unordtf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../stdio-common/printf_fphex.c:212: undefined reference to `__unordtf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../stdio-common/printf_fphex.c:212: undefined reference to `__letf2'
collect2: ld returned 1 exit status
</code></pre>
<p>The most logical thing to do was to build a <span class="caps">MIPS</span> cross compiler and check if
the same issue appeared. Of course, it didn’t.</p>
<p>Researching a little bit in the old <span class="caps">GCC</span> internals documentation, I found a
couple of interesting things:</p>
<p><a href="https://gcc.gnu.org/onlinedocs/gcc-4.6.4/gccint/Target-Fragment.html#Target-Fragment">https://gcc.gnu.org/onlinedocs/gcc-4.6.4/gccint/Target-Fragment.html#Target-Fragment</a></p>
<ul>
<li>The <code>LIB2FUNCS_EXTRA</code> variable is the one that lists what should be
compiled and added to <code>libgcc</code>.</li>
<li><strong>Floating Point Emulation</strong> support is added by generating a couple of files
with some macros on top: <code>fp-bit.c</code> and <code>dp-bit.c</code>.</li>
</ul>
<p>Neither of those was used in the <code>libgcc</code> support we backported, because the
<span class="caps">GCC</span> build system has changed a lot since 4.6.4. In fact, there is a commit<sup id="fnref:commit"><a class="footnote-ref" href="#fn:commit">2</a></sup>,
much later than the 4.6.4 release, that removes the need to generate those
<code>fp-bit.c</code> thingies.</p>
<p>The <code>LIB2FUNCS_EXTRA</code> variable was not used either, but somewhere in the
makefiles I found <code>LIB2ADD</code> was set from it. It looks like the whole
build system moved from <code>LIB2FUNCS_EXTRA</code> to <code>LIB2ADD</code>, which was an internal
variable in the past. I don’t know.</p>
<p>I just moved the <code>LIB2ADD</code> contents to <code>LIB2FUNCS_EXTRA</code>, set up the floating point
emulation in the <code>t-riscv</code> makefile fragment, and hoped my work there was done.</p>
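<p>For reference, the 4.6-era mechanism documented in the Target Fragment page
looks roughly like this (a sketch following the internals documentation; the
file name <code>riscv-extra.c</code> is made up, not taken from the actual fragment):</p>
<pre><code class="language-unknown"># Extra C sources compiled into libgcc, the 4.6.4 way:
LIB2FUNCS_EXTRA = $(srcdir)/config/riscv/riscv-extra.c

# Floating point emulation: fp-bit.c and dp-bit.c are generated from
# config/fp-bit.c, with a macro on top selecting single precision.
fp-bit.c: $(srcdir)/config/fp-bit.c
	echo '#define FLOAT' > fp-bit.c
	cat $(srcdir)/config/fp-bit.c >> fp-bit.c

dp-bit.c: $(srcdir)/config/fp-bit.c
	cat $(srcdir)/config/fp-bit.c > dp-bit.c
</code></pre>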
<h5>A huge pain in the butt</h5>
<p>It still failed, but at least now the <code>__letf2</code> symbol was found. The only one
left to fix was <code>__unordtf2</code>.</p>
<p>I was disheartened.</p>
<p>The <code>__unordtf2</code> name did not appear anywhere in the code, but the <code>libgcc</code>
built for <span class="caps">MIPS</span> had the symbol inside (I checked it with <code>nm</code>!). I had no
idea what was going on.</p>
<p>I asked all my peers about this, and I was sent a program that actually
compiled and ran (Janneke is a genius, someone has to say it!):</p>
<pre><code class="language-clike">#include <stdio.h>
int
main ()
{
return printf ("Hello, world!\n");
}
int
__unordtf2 ()
{
return 0;
}
</code></pre>
<p>Hah! Still no solution, but it was a little bit of hope.</p>
<p>This gave me the energy I needed to research further. This <code>__unordtf2</code>
function comes from the software floating point support, but the makefile fragments
in the <code>libgcc</code> folder seemed to be correctly set…</p>
<h5>Moxie for the rescue</h5>
<p>The <span class="caps">MIPS</span> architecture was too complex to be understandable for this humble human
being, so I decided to go for Moxie this time.</p>
<p><a href="http://moxielogic.org/blog/pages/architecture.html">Moxie</a> is a really
interesting thing, but we are not going to spend time on it; what matters is its support
in <span class="caps">GCC</span> 4.6.4. Take a look at the files on both sides of the Moxie support, the
<code>libgcc</code> part and the <code>gcc</code> part:</p>
<pre><code class="language-unknown">gcc/config/moxie
├── constraints.md
├── crti.asm
├── crtn.asm
├── moxie.c
├── moxie.h
├── moxie.md
├── moxie-protos.h
├── predicates.md
├── rtems.h
├── sfp-machine.h
├── t-moxie
├── t-moxie-softfp
└── uclinux.h
libgcc/config/moxie
├── crti.asm
├── crtn.asm
├── sfp-machine.h
├── t-moxie
└── t-moxie-softfp
</code></pre>
<p>As you can see, some things are repeated, and most of the files are located in
the <code>gcc</code> part, which was not the case in the backported commit. I used this as
a reference for a massive cleanup of the previous aggressive duplication and
ended up with this commit: <a href="https://github.com/ekaitz-zarraga/gcc/commit/703efe3e86e68fe05380e996943c831e7ad9a541"><code>703efe3</code></a></p>
<p>But that wasn’t enough.</p>
<p>I also found that the <code>soft-fp</code> support did not come from the <code>libgcc</code>
directory but from the <code>gcc</code> one, so I needed to fix some makefile fragments.
The reference on how to do that was located in <code>gcc/config/soft-fp/t-softfp</code>.
This file describes all the variables I needed to set up to make the whole
process find the software floating point functions to add (see how the function
names are built with the <code>$(m)</code> variable? That’s why I couldn’t find where
<code>__unordtf2</code> came from…).</p>
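<p>This naming scheme is why grepping for the full symbol is hopeless: the
function names are assembled from an operation and a mode suffix. A hedged
sketch of the idea (variable names simplified from memory, they may differ
from the real <code>t-softfp</code>):</p>
<pre><code class="language-unknown"># Operations and modes are listed separately...
softfp_ops   := add sub mul div eq le unord
softfp_modes := sf df tf

# ...and the build system expands them into names like __$(op)$(m)$(arity):
# "unord" + "tf" + "2" gives __unordtf2, the symbol the linker missed.
</code></pre>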
<p>Those variables were set in the <code>libgcc/config/riscv/t-softfp*</code> files. I replicated
them in <code>gcc/config/riscv</code> as in the Moxie target and added references to them
in the <code>gcc/config.gcc</code> file, copying the lines I had in <code>libgcc/config.host</code>. The
process was still failing, as the variables were not found by the main
makefile. I decided to hardcode them and give it another go; this time it built,
I was able to compile files, and the weird errors did not appear anymore.</p>
<p>In the end I realized that the reason the main makefile wasn’t finding the
variables was that I was referring to the <code>t-softfp*</code> files through the
variable <code>host_address</code>, as was done in <code>libgcc/config.host</code>. The
problem was that this variable was not available in the main <code>gcc/config.gcc</code> file,
so I had to make a beautiful <code>switch-case</code> to deduce the word size.</p>
<p>With all this knowledge, and with the help of the Moxie support, I finally
arranged a new commit where I duplicated the files that needed duplicating,
added the correct references to the makefile fragments and even fixed some of
the variables in the makefiles: <a href="https://github.com/ekaitz-zarraga/gcc/commit/f42a21427361fb2d6d8481d143258af3237fd232"><code>f42a214</code></a></p>
<p>Yeah, all this was hard to deduce, because this build system is really complex
and makefiles are really hard to debug<sup id="fnref:debug-makefile"><a class="footnote-ref" href="#fn:debug-makefile">3</a></sup>. Also, the fact that I
don’t understand why I need to replicate the <code>t-softfp*</code> files in both places
drives me mad, but I have to learn to deal with the fact that I can’t
understand everything.</p>
<p>In these commits you can see I also deleted references to <code>extra_parts</code> and some
other things. The reason is simple: if other architectures don’t need
to set those variables, neither do I. In the end, the <code>crt*</code> files were generated anyway.</p>
<h4>Other changes</h4>
<p>I also removed <code>-latomic</code> from the calls to the linker because it looks like it
didn’t exist back then (we’ll see how this explodes in my face in the future),
and I fixed a couple more things, but that’s not really interesting in my
opinion<sup id="fnref:interesting"><a class="footnote-ref" href="#fn:interesting">4</a></sup>.</p>
<h3>Missing things</h3>
<p>There are many things still missing, but some of them I won’t even try, because
they are out of the scope of the project. Remember: <strong>we just need to be able
to compile a more recent <span class="caps">GCC</span></strong>, not the rest of the world.</p>
<p>Some of the things I left out might become mandatory in the near future as we do
proper testing of all this. My goal here was to provide something that can run,
and then I’ll collaborate with the different agents in this bootstrapping
effort to fix anything we need to reach full bootstrapping support.</p>
<p>There are a few obvious things missing:</p>
<ul>
<li><strong>Big Endian support</strong>: <code>riscv64be-linux-gnu</code> support, basically (note the
<code>be</code> in the target name). I won’t add this until we are sure we need it. It
shouldn’t be difficult: I already found some commits in mainline <span class="caps">GCC</span> where
this was added, and they were simple.</li>
<li><strong>Specific device support</strong>: we didn’t add support for any specific device
yet; that’s something we’ll need to think about in the future, but we
probably won’t do, because it would make us maintain more code, and I don’t
think generic <span class="caps">RISC</span>-V code is going to have issues on the majority of devices.</li>
<li>There are also <strong>many commits that came after</strong> the main port that fix some
relocations and other things. Many of them are not really relevant:
most are related to bugs that were introduced later, fix
things that won’t change anything in the only program we need to build (<span class="caps">GCC</span>),
and so on. In order to know which ones are relevant we need…</li>
<li><strong>Proper testing!</strong> I didn’t do this yet, and I’ll probably need help with
it. Compile your <span class="caps">RISC</span>-V software with this and give it a try! Send me the
errors you get!</li>
<li><strong>Libatomic</strong>: it was directly removed from the calls to the linker, as I
mentioned before, and we have to make sure it didn’t exist back then and so
on. Boring things…</li>
<li>I didn’t even bother to add the <strong>testsuite support</strong>; our only test has to
be whether we are able to compile <span class="caps">GCC</span> with this, which I haven’t really tried yet
anyway (because it needs some extra things).</li>
</ul>
<h3>Conclusion</h3>
<p>This part of the project came at the worst moment. I wasn’t really motivated
and I had some personal things going on. It was difficult for me to do this.</p>
<p>In contrast with what I did in the previous steps of the project, this part is
really uninteresting because it doesn’t give you a lot of chances for learning,
which is the only thing that keeps me alive at this point.</p>
<p>It’s also pretty boring and exasperating to feel you’ll never understand
something, and trying and trying in an almost <em>trial and error</em> way is really
tedious for someone like me.</p>
<p>Sometimes, working like this makes you feel really alone. You have almost no
people to help you, and the project needs a huge amount of context to be
understood, so you can’t ask just <em>anyone</em> for help, and those who are supposed to
know are really hard to reach. Or, what might be worse: maybe there’s nobody
who understands this thing well, because it’s old, it changed a lot, and
probably just a handful of people really took part in the development of the
<del>fucking</del> buildsystem.</p>
<p>In conclusion, this is a boring and uninteresting job, but someone has to do
it, and… it was my turn this time.</p>
<p>You go next.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:guix">
<p>Some people also spent time with me in the <span class="caps">IRC</span>. Thanks to all that
helped! <a class="footnote-backref" href="#fnref:guix" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:commit">
<p><code>569dc494616700a3cf078da0cc631c36a4f15821</code> <a class="footnote-backref" href="#fnref:commit" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:debug-makefile">
<p>Try to run <code>make --debug</code> in a project of the size of <span class="caps">GCC</span>
and laugh with me. <a class="footnote-backref" href="#fnref:debug-makefile" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:interesting">
<p>The rest of the post is not really interesting either, but I
need to report what I did. It’s just me fighting against myself and a very
complex buildsystem that could’ve been simpler and/or better documented. <a class="footnote-backref" href="#fnref:interesting" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
</ol>
</div>Milestone — Minimal RISC-V support in GCC 4.6.42022-04-08T00:00:00+03:002022-04-08T00:00:00+03:00Ekaitz Zárragatag:ekaitz.elenq.tech,2022-04-08:/bootstrapGcc3.html<p>Description of the changes for a minimal <span class="caps">RISC</span>-V support in <span class="caps">GCC</span>-4.6.4 and
how I reached this point.</p><p>In the <a href="https://ekaitz.elenq.tech/tag/bootstrapping-gcc-in-risc-v.html">series</a> we already introduced <span class="caps">GCC</span>,
its internals, and the work I’m doing to make it able to bootstrap on <span class="caps">RISC</span>-V.
In this post we are going to tackle the backporting effort and see how I
managed to make <span class="caps">GCC</span>-4.6.4 compile a simple program to <span class="caps">RISC</span>-V.</p>
<h3>How to follow this post</h3>
<p>As this is going to be deeply connected to the changes I introduced in the
codebase, I suggest you follow it directly in <a href="https://github.com/ekaitz-zarraga/gcc">the
repository</a>. The branch where I made the
changes is <code>riscv</code>, which starts from <code>releases/gcc-4.6.4</code>. As I will continue
adding changes on top of this, I left <a href="https://github.com/ekaitz-zarraga/gcc/releases/tag/minimal-compiler">a tag called <code>minimal-compiler</code></a>
that points to the contents of the repository at the time this blog post was written.</p>
<p>In any case, I’ll share small pieces of the code in the post, but of course I
can’t share everything here, so I recommend you go to the sources. I won’t
link the sources directly but mention where you can find the changes, so you are
not forced to follow all the links in the browser and can use your favorite
editor instead.</p>
<h3>Overview of the commits</h3>
<p>The <code>riscv</code> branch where I did all the work is split into several commits on top of
<code>releases/gcc-4.6.4</code>, where it started.</p>
<p>First comes a series of 4 commits that make <span class="caps">GCC</span>-4.6.4 compilable with more
recent toolchains. These should later be separated into independent patches and
applied by the distribution tool, Guix in this case.</p>
<p>Next a couple of commits describe a precarious <code>guix.scm</code> file that should
compile the project properly. At the moment it’s not fully ready for
distribution but that’s not really our job in the project, so I don’t want to
spend a lot of time on that yet. At the moment it’s just working so you can run
<code>guix build -f guix.scm</code> from the project directory and it should build a
minimal compiler, as we’ll see later. There’s also a <code>channels.scm</code> file, so
you can use the exact packages I used thanks to the very powerful <code>guix
time-machine</code> command and replicate my exact build.</p>
<p>Even though I didn’t want to spend a long time on the Guix package, I’d be lying
if I told you I didn’t. Compiling legacy software is extremely difficult.
In this case, I had to patch the code to be compatible with more modern <span class="caps">GCC</span>
Toolchains, package an old <code>flex</code>, choose lots of configure time options…
Still, there are tons of things missing: there’s no C++ support, the package
doesn’t find the system’s libraries, such as glibc, and it’s not integrated with
the system’s binutils. I don’t know how I’m going to fix that, to be honest, but I
don’t want to think about it right now.</p>
<p>The next commits are what interests us the most: changes on top of <span class="caps">GCC</span>.</p>
<p>The first of them<sup id="fnref:port"><a class="footnote-ref" href="#fn:port">1</a></sup> is just the <span class="caps">RISC</span>-V port commit from upstream <span class="caps">GCC</span>
applied on top of the project, being a little bit careful about
conflicts<sup id="fnref:conflicts"><a class="footnote-ref" href="#fn:conflicts">2</a></sup>. Obviously, this change doesn’t really work; it doesn’t
even compile, but it lets us see which changes were needed on top of it.</p>
<p>In the next commit<sup id="fnref:md-files"><a class="footnote-ref" href="#fn:md-files">3</a></sup> I made a high-level fix on the Machine
Description files. If you remember from the <a href="https://ekaitz.elenq.tech/bootstrapGcc1.html">post about <span class="caps">GCC</span>
internals</a>, the machine description files are some
kind of Lisp-like files that describe both the translations between <span class="caps">GIMPLE</span> and
<span class="caps">RTL</span> and also between <span class="caps">RTL</span> and assembly, among other things. In this commit I
just removed some of the RTXs that were not available back in the 4.6.4 days
but were in use in the port. I’m talking, more specifically, about
<code>define_int_iterator</code> and <code>define_int_attr</code>. Thankfully they were just a
couple of loops that were easy to unroll by hand. Not a big deal.</p>
<p>Then, I made a larger commit that tries to fix the rest of the
<code>gcc/config/riscv</code> folder<sup id="fnref:large-commit"><a class="footnote-ref" href="#fn:large-commit">4</a></sup>. In this one I had two goals: make the
port compatible with the old C-based <span class="caps">API</span> and remove parts that weren’t strictly
necessary but were complex to keep. This means I removed all the builtins support so
I didn’t need to port them (nice trick, huh?) and I kept the code related to
memory models out of the equation. I may need to fix that in the future, but I
was looking for minimal support and I didn’t need that for my goal.</p>
<p>After that I tried to compile the project and run it, but I realized there was
a problem with the argument handling of the compiler. It was unable to find
arguments like <code>-march</code> and it was always failing to compile anything.</p>
<p>I realized there was a weird file at <code>gcc/common/config/riscv/riscv-common.c</code>
that looked like it was handling input arguments, so I focused on porting that
one too. It turns out that the old <span class="caps">GCC</span> didn’t have that code structure:
everything was done in <code>gcc/config/</code> back then, so I moved the support and
made the argument handling follow the old <span class="caps">API</span>. That’s the last commit of the
series<sup id="fnref:last-commit"><a class="footnote-ref" href="#fn:last-commit">5</a></sup>.</p>
<h3>Deep diving</h3>
<p>Now I’ll try to explain the changes I made in the code here and there, but
first I have to explain the method I followed to make this.</p>
<p>It might be surprising, but for the first time I didn’t try to understand
everything; I just worked my way through it. This means I have absolutely no clue
what the code does in most places<sup id="fnref:guilt"><a class="footnote-ref" href="#fn:guilt">6</a></sup>. I just looked at the
overall shape of it and tried to match that shape with the code found in other
architectures, mostly <span class="caps">MIPS</span>, which the <span class="caps">RISC</span>-V support was based on. If I found
anything that I didn’t know how to convert, I would read how that thing was
implemented in <span class="caps">MIPS</span> when the <span class="caps">RISC</span>-V support was added and then compare that
implementation with the one at 4.6.4. That would give me an idea about how to
convert it to the old way of doing things.</p>
<p>So, yeah, most of the coding was a mental exercise of pattern matching and
conversion. There are very few things I coded myself with a deep understanding
of what I was doing.</p>
<p>This doesn’t really mean you don’t need any knowledge to do this. Of course you
do. You need to understand what the code does at a very high level, and know
how targets are described in <span class="caps">GCC</span><sup id="fnref:gcc-course"><a class="footnote-ref" href="#fn:gcc-course">7</a></sup>, but you don’t really need to
know each function in detail.</p>
<p>Sadly, in some cases I had to read functions carefully and understand them, so
there’s some knowledge needed, still.</p>
<h4>First patch set</h4>
<p>The first patch set is not really relevant. I just made it while I was trying
to compile the project without changes. The compilation ended with errors; I
reviewed them, went to the <span class="caps">GCC</span> issue tracker and searched. In some cases I was
lucky and found a patch that fixed them; in others I only found suggestions
and had to fix the thing myself. Not really interesting, honestly.</p>
<h4>The Guix package</h4>
<p>The Guix part in <code>guix.scm</code> is not really interesting either, at least for the
moment. The most interesting part might be the addition of <code>flex-2.5</code> to the
inputs and the use of <code>local-file</code> as a source for the <span class="caps">GCC</span> package<sup id="fnref:efraim"><a class="footnote-ref" href="#fn:efraim">8</a></sup>.</p>
<p>All the rest is playing around with the configure flags and trying to read
Guix’s <span class="caps">GCC</span> packages and <a href="https://gitlab.com/janneke/guix/-/blob/wip-full-source-bootstrap/gnu/packages/commencement.scm">Janneke’s work with the full-source
bootstrap</a>.</p>
<p>Even with all that, there are some things missing, so I have to come back to
this in the future.</p>
<p>There is, though, a really interesting point to take into account. We already
said in the <a href="https://ekaitz.elenq.tech/bootstrapGcc1.html">post about <span class="caps">GCC</span> internals</a> that <span class="caps">GCC</span> is
a driver that calls other programs, such as <code>as</code> and <code>ld</code> from <span class="caps">GNU</span> Binutils, so
we know we only need the very basics in order to test that our compiler can
output <span class="caps">RISC</span>-V assembly. We can ignore everything else and focus on one
thing: I’m talking, of course, about <code>cc1</code>, the C compiler.</p>
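<p>You can see this driver behaviour with any installed <span class="caps">GCC</span>: the <code>-print-prog-name</code> flag asks the driver where it would find one of its subprograms, and <code>-###</code> prints the commands it would run without executing them (a quick sketch; the paths will differ on your system):</p>
<pre><code class="language-bash"># Ask the driver for the real C compiler it would run:
gcc -print-prog-name=cc1

# Dry run: print the cc1/as/collect2 invocations without running them:
echo 'int main(void){return 0;}' > /tmp/driver-demo.c
gcc -### -c /tmp/driver-demo.c -o /tmp/driver-demo.o
</code></pre>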
<p>That’s why I only set the target to <code>all-gcc</code> and focused on that. Later we’ll
need to dig deeper.</p>
<p>One of the issues I’ll have to tackle is that the <span class="caps">GCC</span> I’m building is a
cross-compiler, but this whole project is being developed for a <span class="caps">RISC</span>-V target.
This doesn’t let the compiler check itself using the staged approach<sup id="fnref:staged"><a class="footnote-ref" href="#fn:staged">9</a></sup>,
which is something I’m interested in watching.</p>
<p>Once the proper <code>guix.scm</code> file is generated, I’ll prepare a package for the
<span class="caps">RISC</span>-V bootstrapping process. In that package I’ll define the first 4 commits
as separate patches to apply on top of the source, but I’ll remove them from
the original source. That way the codebase will continue to be compatible with
old toolchains and we’ll only apply those patches where needed, that is, when
we try to build with more recent environments.</p>
<h4>Machine Description files</h4>
<p>The machine description files did not change that much over the years. Some
extra constructs were added, but the idea, the goal and the shape of the files
didn’t really change.</p>
<p>As we introduced already, the <span class="caps">RISC</span>-V port used <code>define_int_iterator</code> constructs
in order to simplify some of the work, repeating pieces of the machine
description file according to the integer iterator. Back in <span class="caps">GCC</span> 4.6.4 that
construct was not available so I unrolled the loop by hand following the
example at the <span class="caps">GCC</span> documentation:</p>
<p><a href="https://gcc.gnu.org/onlinedocs/gccint/Int-Iterators.html">https://gcc.gnu.org/onlinedocs/gccint/Int-Iterators.html</a></p>
<p>You simply repeat the structures (unroll them) using the values of the iterators and
use the <code>define_int_attr</code> entries to set some of the fields too. The example in the
docs gives a good description of how to do it.</p>
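<p>As an illustration of the mechanics (a made-up sketch, not the actual port code, with <code>define_int_attr</code> naming left out), an iterator-based definition expands to one copy per listed value, which is exactly what the hand-unrolled version spells out:</p>
<pre><code>;; Newer GCC: one definition, expanded for each value of the iterator
(define_int_iterator ANY_OP [UNSPEC_OP_A UNSPEC_OP_B])
(define_insn "any_op_si2" ...)   ; pattern using ANY_OP in its RTL template

;; GCC 4.6.4: the same thing, unrolled by hand
(define_insn "op_a_si2" ...)     ; copy using UNSPEC_OP_A
(define_insn "op_b_si2" ...)     ; copy using UNSPEC_OP_B
</code></pre>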
<p>On the other hand, I also found that the <span class="caps">RTL</span> in the <span class="caps">RISC</span>-V port was using
<code>simple_return</code> in some places, and I realized that didn’t exist in the past. I
replaced it with <code>return</code>, hoping that it was the same, but I don’t remember
if I reasoned further<sup id="fnref:see"><a class="footnote-ref" href="#fn:see">10</a></sup>. In any case, you can take a look at
<code>gcc/rtl.def</code><sup id="fnref:def"><a class="footnote-ref" href="#fn:def">11</a></sup> and see how <code>SIMPLE_RETURN</code> was added later.</p>
<h4>Matching the <span class="caps">API</span></h4>
<p>There are other, more meaningful changes. The large commit<sup id="fnref2:large-commit"><a class="footnote-ref" href="#fn:large-commit">4</a></sup> is
full of changes related to the conversion back to the C <span class="caps">API</span>.</p>
<p>The most obvious ones are converting from <code>rtx_insn *</code> to <code>rtx</code>, and
adding/removing machine modes where needed. It was just a matter of searching
for the functions being used in the <span class="caps">MIPS</span> target and trying to match them. Boring,
and probably wrong in a couple of places, but it looks like it’s working, I don’t
know. Examples:</p>
<pre><code class="language-diff">- emit_insn (gen_rtx_SET (target, src));
+ emit_insn (gen_rtx_SET (VOIDmode, target, src));
</code></pre>
<pre><code class="language-diff">- op = plus_constant (Pmode, UNSPEC_ADDRESS (base), INTVAL (offset));
+ op = plus_constant (UNSPEC_ADDRESS (base), INTVAL (offset));
</code></pre>
<p>There were a couple of functions using a small class called <code>cumulative_args_t</code>
that was easy to convert to <code>CUMULATIVE_ARGS *</code> by just removing calls to
<code>get_cumulative_args</code> and <code>pack_cumulative_args</code>. In C everything is rougher
and lower level. Thankfully, in this case the low-level <span class="caps">API</span> was still present, so
we could just use that instead of the new C++ one, and removing the abstraction
layer was trivial. See <code>riscv_setup_incoming_varargs</code> in
<code>gcc/config/riscv/riscv.c</code> as an example. There might be some things wrong, but
it looks reasonable.</p>
<p>There were also a couple of <code>std::swap</code> calls here and there that I needed to get
rid of. I introduced a temporary variable and did the swap by hand in the classic way.</p>
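<p>For reference, this is the whole trick: assuming two variables of the same type, the C++ call turns into three assignments through a temporary:</p>
<pre><code class="language-c">#include <stdio.h>

int main (void)
{
  int a = 1, b = 2;

  /* What std::swap (a, b) becomes in plain C: */
  int tmp = a;
  a = b;
  b = tmp;

  printf ("%d %d\n", a, b);
  return 0;
}
</code></pre>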
<p>Some other changes were harder to spot. Like these:</p>
<pre><code class="language-diff"> || !TYPE_MIN_VALUE (index)
- || !tree_fits_uhwi_p (TYPE_MIN_VALUE (index))
- || !tree_fits_uhwi_p (elt_size))
+ || !host_integerp(TYPE_MIN_VALUE (index),0)
+ || !host_integerp(elt_size,0))
return -1;
- n_elts = 1 + tree_to_uhwi (TYPE_MAX_VALUE (index))
- - tree_to_uhwi (TYPE_MIN_VALUE (index));
+ n_elts = 1 + TREE_INT_CST_LOW(TYPE_MAX_VALUE (index))
+ - TREE_INT_CST_LOW (TYPE_MIN_VALUE (index));
</code></pre>
<p>All those functions and macros are pretty different, but they happen to be more
or less the same. What I did here was read the newer <span class="caps">MIPS</span> implementation, try
to find those, and then go back in time to the old <span class="caps">MIPS</span> implementation and see
what it was using instead. It wasn’t obvious at the beginning, so I read the
definitions of all of those things (ctags for the win!) and I even had to
define some, like <code>sext_hwi</code>, which I added to <code>gcc/hwint.h</code> as best I could.</p>
<h4>The include dance</h4>
<p>If you check the changes at the top of <code>gcc/config/riscv/riscv.c</code>, you’ll see
there are a lot of <code>#include</code>s removed and some new ones added. This is
normal, as the older C <span class="caps">API</span> was very different from the newer C++ one, but also
because many of these includes were not really used inside the code. First I
reviewed which files existed, but later I just copied from <span class="caps">MIPS</span> and rearranged
until the thing compiled.</p>
<h4>Crazy changes and inventions</h4>
<p>Some other changes were crazier. I had to add <code>riscv_cpu_cpp_builtins</code>,
which was defined in <code>gcc/config/riscv/riscv-c.c</code>, but I had no way to make it
work, so I copied what was done in other places, made it a huge macro, added
it to <code>gcc/config/riscv/riscv.h</code> and prayed. The compiler was happy with that
change, and I was too. That let me remove the <code>riscv-c.c</code> file from the
compilation process, even if it’s still included in the repository (yeah, I know…).</p>
<p>The <code>riscv.h</code> file has some other magic tricks too. The <code>ASM_SPEC</code> is a lot of
fun now: basically a copy from somewhere else, because defining the craziest
macro I’ve seen in my life was too much for me:</p>
<pre><code class="language-diff">#define ASM_SPEC "\
%(subtarget_asm_debugging_spec) \
-%{" FPIE_OR_FPIC_SPEC ":-fpic} \
+%{fpic|fPIC|fpie|fPIE:-k}\
%{march=*} \
%{mabi=*} \
%(subtarget_asm_spec)"
</code></pre>
<p>Wanna see the macro? Well, you asked for it (this is just half of it):</p>
<pre><code class="language-c">#ifdef ENABLE_DEFAULT_PIE
#define NO_PIE_SPEC "no-pie|static"
#define PIE_SPEC NO_PIE_SPEC "|r|shared:;"
#define NO_FPIE1_SPEC "fno-pie"
#define FPIE1_SPEC NO_FPIE1_SPEC ":;"
#define NO_FPIE2_SPEC "fno-PIE"
#define FPIE2_SPEC NO_FPIE2_SPEC ":;"
#define NO_FPIE_SPEC NO_FPIE1_SPEC "|" NO_FPIE2_SPEC
#define FPIE_SPEC NO_FPIE_SPEC ":;"
#define NO_FPIC1_SPEC "fno-pic"
#define FPIC1_SPEC NO_FPIC1_SPEC ":;"
#define NO_FPIC2_SPEC "fno-PIC"
#define FPIC2_SPEC NO_FPIC2_SPEC ":;"
#define NO_FPIC_SPEC NO_FPIC1_SPEC "|" NO_FPIC2_SPEC
#define FPIC_SPEC NO_FPIC_SPEC ":;"
#define NO_FPIE1_AND_FPIC1_SPEC NO_FPIE1_SPEC "|" NO_FPIC1_SPEC
#define FPIE1_OR_FPIC1_SPEC NO_FPIE1_AND_FPIC1_SPEC ":;"
#define NO_FPIE2_AND_FPIC2_SPEC NO_FPIE2_SPEC "|" NO_FPIC2_SPEC
#define FPIE2_OR_FPIC2_SPEC NO_FPIE2_AND_FPIC2_SPEC ":;"
#define NO_FPIE_AND_FPIC_SPEC NO_FPIE_SPEC "|" NO_FPIC_SPEC
#define FPIE_OR_FPIC_SPEC NO_FPIE_AND_FPIC_SPEC ":;"
</code></pre>
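<p>These all rely on C’s adjacent string literal concatenation: the preprocessor pastes the pieces into one spec string at compile time. A standalone program using just the relevant subset of the macros above (nothing <span class="caps">GCC</span>-specific) shows what <code>FPIE_OR_FPIC_SPEC</code> ends up being:</p>
<pre><code class="language-c">#include <stdio.h>

/* The relevant subset of the macros above.  */
#define NO_FPIE1_SPEC "fno-pie"
#define NO_FPIE2_SPEC "fno-PIE"
#define NO_FPIE_SPEC NO_FPIE1_SPEC "|" NO_FPIE2_SPEC
#define NO_FPIC1_SPEC "fno-pic"
#define NO_FPIC2_SPEC "fno-PIC"
#define NO_FPIC_SPEC NO_FPIC1_SPEC "|" NO_FPIC2_SPEC
#define NO_FPIE_AND_FPIC_SPEC NO_FPIE_SPEC "|" NO_FPIC_SPEC
#define FPIE_OR_FPIC_SPEC NO_FPIE_AND_FPIC_SPEC ":;"

int main (void)
{
  /* Adjacent string literals are concatenated at compile time,
     so this prints the fully assembled spec fragment.  */
  puts (FPIE_OR_FPIC_SPEC);  /* fno-pie|fno-PIE|fno-pic|fno-PIC:; */
  return 0;
}
</code></pre>
<p>Roughly, wrapping that result in <code>%{…:;}</code> in the spec language means “if any of these options was passed, substitute nothing”, so the whole pyramid exists just to detect the presence or absence of the <span class="caps">PIC</span>/<span class="caps">PIE</span> family of flags.</p>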
<p>Anyway, more things were basically made up like that, such as these lines in
<code>gcc/config/riscv/linux.h</code>:</p>
<pre><code class="language-diff">-#define TARGET_OS_CPP_BUILTINS() \
- do { \
- GNU_USER_TARGET_OS_CPP_BUILTINS(); \
- } while (0)
+#define TARGET_OS_CPP_BUILTINS() LINUX_TARGET_OS_CPP_BUILTINS()
</code></pre>
<pre><code class="language-diff"> %{!shared: \
%{!static: \
%{rdynamic:-export-dynamic} \
- -dynamic-linker " GNU_USER_DYNAMIC_LINKER "} \
+ -dynamic-linker " LINUX_DYNAMIC_LINKER "} \
%{static:-static}}"
</code></pre>
<p>I just copied from other places because there were absolutely no references to
those macros, so… I thought the best way to handle this was to follow what other
targets did.</p>
<p>Of course, this whole thing is not really tested right now, because it affects
how the linker is called, but that was broken anyway because of my distribution
of choice (Guix, I love you, but…) so what could I do? Making them up and
fixing them later sounded like a good plan.</p>
<p>As I already mentioned, I left builtins and memory models out of the equation.
I just commented them out and hoped everything would work properly for small
programs. I will try larger programs later.</p>
<h4>Argument handling</h4>
<p>The last commit<sup id="fnref2:last-commit"><a class="footnote-ref" href="#fn:last-commit">5</a></sup> was a little bit hard to do too. The changes
related to it involved a file that was completely out of place, as we
said earlier, so I reviewed other architectures and found how they
dealt with this. The <span class="caps">API</span> was pretty different, so the first
thing I did was make the function’s formal arguments fit those of the <span class="caps">API</span>,
and then I started making changes.</p>
<p>It was really hard to work out how the <code>MASK_*</code> macros worked just by looking at
the code, because they were defined nowhere!</p>
<p>The problem was I wasn’t looking in the correct place. More code generation
magic! The <code>gcc/config/riscv/riscv.opt</code> file is what generates all those masks
and <code>TARGET_*</code> macros, like <code>TARGET_MUL</code>, used to check if the target has the
multiplication extension. All of those were defined there, even if the definition was
obscure and hard to match with anything else in the code<sup id="fnref:hard-to-match"><a class="footnote-ref" href="#fn:hard-to-match">12</a></sup>.</p>
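<p>The generated code follows a simple pattern: each mask declared in the <code>.opt</code> file becomes a bit in a <code>target_flags</code> variable plus a <code>TARGET_*</code> macro to test it. A hand-written imitation (not the file <span class="caps">GCC</span> actually generates, and with made-up mask names) looks like this:</p>
<pre><code class="language-c">#include <stdio.h>

/* What the options machinery generates, roughly: one bit per mask...  */
#define MASK_MUL    (1 << 0)
#define MASK_ATOMIC (1 << 1)

/* ...and a convenience macro to test each bit.  */
#define TARGET_MUL    ((target_flags & MASK_MUL) != 0)
#define TARGET_ATOMIC ((target_flags & MASK_ATOMIC) != 0)

static int target_flags;

int main (void)
{
  target_flags |= MASK_MUL;   /* e.g. -march enabled the M extension */
  printf ("mul=%d atomic=%d\n", TARGET_MUL, TARGET_ATOMIC);
  return 0;
}
</code></pre>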
<p>Once that was understood, everything else was easier to do. “Just follow <span class="caps">MIPS</span>
and you’ll be fine,” I told myself, and it worked. I moved everything to <code>riscv.c</code>,
where all the other target description macros and functions are defined, and…
Boom! Working compiler.</p>
<h3>Result</h3>
<p>With all these changes it is now possible to generate a minimal compiler and
compile a file. As we said, we are only interested in the C-to-assembly
conversion at the moment, and that’s what we have and nothing else.</p>
<p>Taking the project as it is right now you can run:</p>
<pre><code class="language-bash">$ guix build -f guix.scm
...
/gnu/store/gsq72r3xnv7b2f1l4z5idpy3j900hizk-gcc-4.6.4-HEAD-debug
/gnu/store/qglp0cx0nq2nblcg9ya4gmc5gfk2amjg-gcc-4.6.4-HEAD-lib
/gnu/store/l612a4h9a6l4hs7kq49rph4clwf6l2k5-gcc-4.6.4-HEAD
</code></pre>
<p>So you’ll get something like this:</p>
<style>
code {
line-height: 1;
}
</style>
<pre><code class="language-bash">$ tree /gnu/store/l612a4h9a6l4hs7kq49rph4clwf6l2k5-gcc-4.6.4-HEAD
/gnu/store/l612a4h9a6l4hs7kq49rph4clwf6l2k5-gcc-4.6.4-HEAD
├── bin
│ ├── riscv64-unknown-linux-gnu-cpp
│ ├── riscv64-unknown-linux-gnu-gcc
│ ├── riscv64-unknown-linux-gnu-gcc-4.6.4
│ └── riscv64-unknown-linux-gnu-gcov
├── etc
│ └── ld.so.cache
├── libexec
│ └── gcc
│ └── riscv64-unknown-linux-gnu
│ └── 4.6.4
│ ├── cc1
│ ├── collect2
│ ├── install-tools
│ │ ├── fixincl
│ │ ├── fixinc.sh
│ │ ├── mkheaders
│ │ └── mkinstalldirs
│ └── lto-wrapper
├── riscv64-unknown-linux-gnu
│ └── lib
└── share
...
16 directories, 28 files
</code></pre>
<p>If you want to try it, you can generate an extremely simple C file and give it
a go:</p>
<pre><code class="language-bash">$ cat <<END > hello.c
int main (int argc, char * argv[]){
return 19;
}
END
$ /gnu/store/...-gcc-4.6.4-HEAD/bin/riscv64-unknown-linux-gnu-gcc -S hello.c
$ cat hello.s
.file "hello.c"
.option nopic
.text
.align 1
.globl main
.type main, @function
main:
add sp,sp,-32
sd s0,24(sp)
add s0,sp,32
mv a5,a0
sd a1,-32(s0)
sw a5,-20(s0)
li a5,19
mv a0,a5
ld s0,24(sp)
add sp,sp,32
jr ra
.size main, .-main
.ident "GCC: (GNU) 4.6.4"
</code></pre>
<p>This can later be assembled and linked using binutils without much
trouble, as we may have mentioned in the past.</p>
<h3>Conclusion</h3>
<p>The process, as you can see, is pretty much a pattern matching exercise, as I
already mentioned at the beginning. Of course, there were some places where I
needed to review the different APIs and their implementation, but those were
just a few. Not bad. We made this “work” in a short period of time and it looks
pretty good.</p>
<p>Now I need to test this further, make more complex programs and try it, but
it’s actually very difficult to do with the current compilation process because
the standard C library is not found correctly and the assembler and the linker
have to be dealt with independently. This means I need to fix the context
first and then review the compiler itself.</p>
<p>On the other hand, the memory model related code, the builtins and the code I
basically made up are a worrying part of the project, because they might be a
point of failure in the future. If they only matter for optimizations and
multithreading, that might not be an issue, but I don’t know how much of
that is used in the <span class="caps">GCC</span> version we are going to compile with this compiler.
Remember, our backport’s only goal is to compile a more recent <span class="caps">GCC</span> with it, so
we don’t really need to care about other programs.</p>
<p>I already asked some people<sup id="fnref:people"><a class="footnote-ref" href="#fn:people">13</a></sup> about the memory model parts and I got a
very simple solution from them (basically forget about the memory models and
always make a <code>fence</code> before and after synchronization code), so that’s going
to be solved for the next post, and I can always review the builtins later if I
need them.</p>
<p>The rest of the code looks like it would work in more complex cases, but still
this needs proper testing and I need to be able to include the standard C
library for that.</p>
<h3>Reviewing the code</h3>
<p>Of course, we are going to find bugs, and I did find some during this
process. Code review here is really hard to do, so it’s better to use
tricks and magic.</p>
<p>First of all, we need some debug symbols for <code>gdb</code> to find where the errors are
and be able to debug them properly. The Guix package we defined has a
binary-stripping step that moves all the debug symbols to a separate folder:</p>
<pre><code class="language-bash">$ guix build -f guix.scm
...
/gnu/store/gsq72r3xnv7b2f1l4z5idpy3j900hizk-gcc-4.6.4-HEAD-debug
/gnu/store/qglp0cx0nq2nblcg9ya4gmc5gfk2amjg-gcc-4.6.4-HEAD-lib
/gnu/store/l612a4h9a6l4hs7kq49rph4clwf6l2k5-gcc-4.6.4-HEAD
</code></pre>
<p>The <code>debug</code> directory there contains the debug symbols of the binaries so we
can just call <code>gdb</code> and then use the <code>symbol-file</code> command to load the debug
symbols associated with the program itself.</p>
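<p>You can reproduce the same separation by hand with stock binutils to see how <code>symbol-file</code> fits in (a generic sketch with made-up file names; the Guix store paths above are the real thing):</p>
<pre><code class="language-bash">printf 'int main(void){return 0;}\n' > demo.c
gcc -g -o demo demo.c
objcopy --only-keep-debug demo demo.debug   # keep the symbols aside
objcopy --strip-debug demo                  # strip them from the binary
# Then, inside gdb:
#   (gdb) file demo
#   (gdb) symbol-file demo.debug
</code></pre>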
<p>It is important to note that loading the <code>gcc</code> binary is a problem because it
is a driver that <code>exec</code>s other binaries, so the errors can’t really be followed
properly. It’s better to choose the specific program we want to debug, normally
<code>cc1</code>.</p>
<p>This happened to be extremely important because I forgot to convert one
function to the old <span class="caps">API</span> and it was giving a segmentation fault. Using the <span class="caps">GNU</span>
Debugger I found the source of the error and just replaced the formal arguments
with the proper ones.</p>
<h3>Last words</h3>
<p>So, all that being said, we covered the changes, the possible problems, how to
debug and what’s coming next. That was basically it.</p>
<p>If you have any question, suggestion, comment, or anything you want to share
about this, contact me<sup id="fnref:contact"><a class="footnote-ref" href="#fn:contact">14</a></sup>. I’d be very happy to discuss.</p>
<p>From here, the plan is to review what I already did, test more complex software
and share the results with you and also try to make the compilation process
more reasonable. I hope it’s easier to do than it looks.</p>
<p>Wish me luck.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:port">
<p><code>06166d9e5ff121fd3dfd6c0995621e557a023ef0</code> <a class="footnote-backref" href="#fnref:port" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:conflicts">
<p>I screwed the ChangeLog files anyway <span class="caps">LOL</span>. <a class="footnote-backref" href="#fnref:conflicts" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:md-files">
<p><code>af295d607786f96b4e8f2e35f41ca34820a9aacb</code> <a class="footnote-backref" href="#fnref:md-files" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:large-commit">
<p><code>14577a05e3d64c9e2a05e8f0ff1f8965ddb27b68</code> <a class="footnote-backref" href="#fnref:large-commit" title="Jump back to footnote 4 in the text">↩</a><a class="footnote-backref" href="#fnref2:large-commit" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
<li id="fn:last-commit">
<p><code>2b97a03a443fe8e408d7129bce9658032d0d9cd2</code> <a class="footnote-backref" href="#fnref:last-commit" title="Jump back to footnote 5 in the text">↩</a><a class="footnote-backref" href="#fnref2:last-commit" title="Jump back to footnote 5 in the text">↩</a></p>
</li>
<li id="fn:guilt">
<p>And I’m trying not to feel guilty for it. <a class="footnote-backref" href="#fnref:guilt" title="Jump back to footnote 6 in the text">↩</a></p>
</li>
<li id="fn:gcc-course">
<p>There’s a great set of <a href="https://www.cse.iitb.ac.in/grc/index.php?page=videos">videos about <span class="caps">GCC</span> at the <span class="caps">GCC</span> Resource
Center</a>. They
specifically talk about <span class="caps">GCC</span> 4.6! I watched them before going for the code and
they helped me a lot to understand how was the code organized and how did <span class="caps">GCC</span>
work. I recommend them a lot. <a class="footnote-backref" href="#fnref:gcc-course" title="Jump back to footnote 7 in the text">↩</a></p>
</li>
<li id="fn:efraim">
<p>This <code>local-file</code> thing I learned from Efraim Flashner, currently a
Guix maintainer, who gave a talk called “Compile it with Guix” where he
introduces this method. Sadly, I can’t find the talk in the web to link you
to it. <a class="footnote-backref" href="#fnref:efraim" title="Jump back to footnote 8 in the text">↩</a></p>
</li>
<li id="fn:staged">
<p>In this process you compile <span class="caps">GCC</span> with the compiler you had
(stage-1), then the resulting <span class="caps">GCC</span> compiles itself (stage-2), and the
resulting <span class="caps">GCC</span> compiles itself again (stage-3). One way to make sure
everything is correct is to compare the binaries of stage-2 and
stage-3. If they are the same, chances are that our code is correct.
If they are different, our code is wrong. <span class="caps">GCC</span>’s compilation framework does
this automatically (if <code>--disable-bootstrap</code> is not set) but you can’t do it
when cross-compiling, because there’s no way to run the stage-1 compiler. I
would like to see the result of this process, but I can’t at the moment. <a class="footnote-backref" href="#fnref:staged" title="Jump back to footnote 9 in the text">↩</a></p>
</li>
<li id="fn:see">
<p>See? That’s why I try to write blog posts about the things I do, that
way I don’t forget things. It was too late for this. <a class="footnote-backref" href="#fnref:see" title="Jump back to footnote 10 in the text">↩</a></p>
</li>
<li id="fn:def">
<p>These <code>.def</code> files are a lot of fun in <span class="caps">GCC</span>’s codebase. They appear
really often. They are files that look like a bunch of similar function calls,
but what they actually are is macro calls. These files are then <code>#include</code>d
into another file right after the macro is defined, so they generate code.
Later, you can redefine the macro to create some other output and <code>#include</code>
them again, so they’ll always generate coherent code. This is used a lot for
enums and switch-case statements: if you want them both to be coherent, you
can move them to a <code>.def</code> file, define all the possible values of the enum
there, and generate first the enum with one <code>#include</code> and later the
switch-case with a new <code>#include</code>. Take a look at <code>gcc/rtl.c</code> and
you’ll see what I mean. (Yes, I know this is like hardcore magic and it’s hard
to understand; I didn’t choose to do this.) <a class="footnote-backref" href="#fnref:def" title="Jump back to footnote 11 in the text">↩</a></p>
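<p>A tiny self-contained imitation of the trick (the list lives in a macro here instead of a separate <code>.def</code> file, but the mechanism is the same: expand the list once per output you want to keep coherent):</p>
<pre><code class="language-c">#include <stdio.h>

/* This list plays the role of the .def file.  */
#define RTX_CODES(X) X(PLUS) X(MINUS) X(MULT)

/* First expansion: generate the enum...  */
#define AS_ENUM(name) name,
enum rtx_code { RTX_CODES (AS_ENUM) NUM_CODES };

/* ...second expansion: generate a name table guaranteed to match it.  */
#define AS_STRING(name) #name,
static const char *code_names[] = { RTX_CODES (AS_STRING) };

int main (void)
{
  printf ("%s\n", code_names[MINUS]);  /* prints "MINUS" */
  return 0;
}
</code></pre>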
</li>
<li id="fn:hard-to-match">
<p>I say “hard to match” because searching for <code>TARGET_MUL</code> or
<code>MASK_MUL</code> gave <strong><span class="caps">NO</span></strong> results, and searching for <code>MUL</code> gave too many. <a class="footnote-backref" href="#fnref:hard-to-match" title="Jump back to footnote 12 in the text">↩</a></p>
</li>
<li id="fn:people">
<p>I asked Andrew Waterman himself (one of the authors of <span class="caps">RISC</span>-V, and
the current maintainer of the <span class="caps">RISC</span>-V <span class="caps">GCC</span> target). Yep, and he actually
answered. <a class="footnote-backref" href="#fnref:people" title="Jump back to footnote 13 in the text">↩</a></p>
</li>
<li id="fn:contact">
<p>You can find my contact info in the <a href="/pages/about.html">About
page</a>. <a class="footnote-backref" href="#fnref:contact" title="Jump back to footnote 14 in the text">↩</a></p>
</li>
</ol>
</div>ELF format — why not?2022-03-14T00:00:00+02:002022-03-14T00:00:00+02:00Ekaitz Zárragatag:ekaitz.elenq.tech,2022-03-14:/bootstrapGcc2.html<p>Some introduction to <span class="caps">ELF</span> as we’ll need to deal with this in the future.</p><p>In the <a href="https://ekaitz.elenq.tech/bootstrapGcc1.html">previous post</a> of the
<a href="https://ekaitz.elenq.tech/tag/bootstrapping-gcc-in-risc-v.html">series</a> we introduced <span class="caps">GCC</span> and how it
generates assembly code and we left a question unanswered: <em>“Why is learning
about <span class="caps">ELF</span> interesting if <span class="caps">GCC</span> generates assembly?”</em>. In this post we are going
to answer that question (not interesting) and maybe understand the very basics
of the <span class="caps">ELF</span> file format (more interesting).</p>
<h3>What’s <span class="caps">ELF</span></h3>
<p><span class="caps">ELF</span> is a file format with two main goals:</p>
<ul>
<li>Represent an executable file</li>
<li>Represent a linkable file</li>
</ul>
<p>Apart from that, <span class="caps">ELF</span> can also represent core dumps, but if you think about it,
all of the possible options have something in common: they represent contents
in memory. We can simply say <span class="caps">ELF</span> is a file format that acts as a picture of
the state of the memory. In the case of executables, the state will be
loaded from the file, while in the case of core dumps the state is obtained
from the memory and dumped to a file.</p>
<p>Linkable files are those files that can be combined with others to generate
executables or shared objects, so they can also fit that definition because
they are going to end up in the memory anyway.</p>
<p>For efficiency reasons, the <span class="caps">ELF</span> format has two separate views of the same contents:</p>
<ul>
<li>The <strong>Linking</strong> view is based on sections and needs a <em>section header</em>.</li>
<li>The <strong>Executable</strong> view is based on segments and needs a <em>program header</em>.</li>
</ul>
<h4><span class="caps">ELF</span> header</h4>
<p>The <span class="caps">ELF</span> header is the only thing that has a fixed position in the file: the
beginning. The <span class="caps">ELF</span> header has information that identifies the
file, the machine, the endianness and that sort of thing, but it also says
where the headers are located and gives the size of their entries and
their entry count.</p>
<p>It’s not that interesting, honestly. The most important thing is that it points to
the descriptions of both views (the headers) so we can check them.</p>
<h4>Linking view</h4>
<p>Based on sections, the linking view is the most detailed view of the file and
it defines how the file should be linked with others in order to create an
executable file.</p>
<p>Sections, the basic unit of the linking view, are consecutive sequences of
bytes that do not overlap.</p>
<p>There are <a href="https://refspecs.linuxfoundation.org/LSB_3.0.0/LSB-PDA/LSB-PDA.junk/sections.html">different types of sections according to their possible contents and
meaning</a>,
the most interesting are:</p>
<ul>
<li><code>SYMTAB</code> and <code>DYNSYM</code> hold a symbol table. The <code>DYNSYM</code> is for dynamic
linking symbols, while <code>SYMTAB</code> is normally used for static linking but may
contain both.</li>
<li><code>STRTAB</code> holds a string table.</li>
<li><code>RELA</code> contains relocation entries with addends and <code>REL</code> contains
relocations without addends.</li>
<li><code>NOTE</code> sections contain some extra information about the file.</li>
<li><code>HASH</code> contains a symbol hash table, necessary for dynamic linking.</li>
<li><code>DYNAMIC</code> for dynamic linking information.</li>
</ul>
<p>Each section also has a <code>name</code>, an <code>address</code> if it is supposed to appear in the
memory of a running process, an <code>offset</code> that defines where in the file the
section’s contents appear, a <code>size</code>, and some extra data fields that all
together form a section header entry.</p>
<p>The section header entries are all located where the <span class="caps">ELF</span> header says, one after
the other (like a C array of structures), so the programs just need to access
that position in the file and read all the headers in a row. The contents of
the sections are located throughout the file, where the section headers point.</p>
<h5>String section</h5>
<p>The string section (<code>STRTAB</code>) is one of the simplest. It contains all the
strings of the file: the section and symbol names. It’s simply a set of null
terminated strings, written one after the other (it also starts with a null
character but whatever).</p>
<p>Anywhere in the file where we are supposed to get a string, what we actually get
is an index that points to the position in this section to start reading from.
We read from there until we reach a null character. For example, in the following
string section:</p>
<pre><code> \0 h e l l o \0 n a m e \0
</code></pre>
<p>If the name of a section says <code>1</code>, the actual name of the section is <code>hello</code>,
and if it says <code>7</code> it would be <code>name</code>. Also, if it says <code>9</code> it would be <code>me</code>;
this trick can be used too.</p>
<h5>Symbol table</h5>
<p>The symbol table contains information needed to locate and relocate a program’s
symbolic definitions and references. The symbol table is formed as an array of
symbol elements that are defined with a <code>name</code>, obviously a <code>value</code>, their
<code>size</code>, some extra <code>info</code>, the index of the section header they relate to
(<code>shndx</code>) and some <code>other</code> stuff.</p>
<p>The <code>info</code> field encodes the symbol’s type (<code>OBJECT</code> for data, <code>FUNC</code> for
functions…) and its binding attributes, which define the linking visibility and
behavior of the symbol (local vs global…).</p>
<p>The <code>value</code> can be interpreted in several ways too, depending on the type of
the symbol you are dealing with. But that’s not really relevant for us at the moment.</p>
<h5>Relocation</h5>
<p>According to the <span class="caps">ELF</span> documentation I got from somewhere I don’t really remember:</p>
<blockquote>
<p>The relocation is the process of connecting symbolic references with symbolic definitions. </p>
</blockquote>
<p>I hope it’s more explanatory for you than it is for me, because I don’t have a
clue what it is supposed to mean. The
<a href="https://en.wikipedia.org/wiki/Relocation_(computing)">Wikipedia</a> does a <strong>much
better</strong> job with the specifics right here:</p>
<blockquote>
<p>Relocation is the process of assigning load addresses for position-dependent
code and data of a program and adjusting the code and data to reflect the
assigned addresses.</p>
</blockquote>
<p>If this doesn’t really help, you have a really good example later, but we can
basically say that it’s a way to adjust the code to point to the correct
addresses at linking, loading, or even execution time.</p>
<p><span class="caps">ELF</span> files have, as we said, sections that let us define relocations. These
point to some parts of the file and tell the linker or the loader that those
positions of the file must be reprocessed.</p>
<p>There are two types of relocation sections, and in both of them the relocation
section is an array of entries where each entry represents one relocation.
In the simple one (<code>REL</code>) each relocation only contains an <code>offset</code> and an
<code>info</code> word, which also encodes the type of relocation to apply. The more
complex one (<code>RELA</code>) is mostly the same but adds an <code>addend</code>, a
constant value to use in the calculation of the relocation.</p>
<p>The calculation of the final addresses is specific to the <span class="caps">ISA</span> and the relocation
type, because processors have different instruction formats and different ways
to pack addresses in instructions. <span class="caps">RISC</span>-V has no way to pack a full address
inside an instruction, while x86 does, so they have to patch the
instructions in different ways.</p>
<h5>Special sections</h5>
<p>Some sections get special treatment according to their name, normally the
ones that start with a dot. You might have found these in the past in assembly
files, defined like <code>.data</code> (for data), <code>.rodata</code> (for read-only data) or
<code>.text</code> (for code).</p>
<p>These are interesting to have in mind because they appear the same way they do
in assembly, and we are going to disassemble some of them and play around with them.</p>
<p>Other special sections like <code>.got</code> or <code>.dynamic</code> don’t appear in assembly but
they have a strong meaning in the resulting file. We are not going to deal with
those today because we want to finish this post someday. If you need to deal
with them I recommend you read <span class="caps">ELF</span>’s documentation on special sections and
the loading process.</p>
<h4>Executable view</h4>
<p>The executable view is another way to access the same contents, but with a
different perspective. It’s based on <em>segments</em> rather than <em>sections</em>.
Segments are also pieces of the file, as sections are, but segments can contain
one or more sections.</p>
<p>Like in the linking view, the base unit (sections for the linking view,
segments for the executable view) is described in a header. The header of the
executable view is called the program header and it is, like the section header, a
bunch of structures piled together, each describing one of the segments.</p>
<p>The program header describes the position and size in the file of each of the
segments but also some important information about them: how they are supposed
to be loaded in the memory and where (virtual address and physical address),
the type of the segment, and some more info.</p>
<p>The most interesting segment types are the following:</p>
<ul>
<li><code>LOAD</code> is used for loadable segments; the other fields of the segment
describe the position and the size this segment will have in memory.</li>
<li><code>DYNAMIC</code> are segments that have some dynamic linking information. It has to
contain the <code>.dynamic</code> section.</li>
<li><code>INTERP</code> gives the location and size of a null-terminated path name to invoke
as an <em>interpreter</em>. Interpreter in this context usually means a dynamic
linker, which is invoked instead of loading this file into memory directly; the
dynamic linker will then load the parts of the file it considers necessary.</li>
</ul>
<p>You can see how segments are interesting for loading the file in the memory,
that is, they are mostly interesting for executable files or shared objects.</p>
<h4>Segments vs Sections</h4>
<p>If you want to have a clear idea about the difference between segments and
sections, you can consider a file with multiple sections: <code>.text</code>, <code>.rodata</code>
and <code>.data</code>.</p>
<p>A file that contains those sections can be understood from a linking
perspective as a file that has some code (<code>.text</code>), read-only data (<code>.rodata</code>)
and read-write data (<code>.data</code>). Each of those parts must be managed in a
different way by the linker, but the reality is that the program loader doesn’t
really care about some of the differences between them.</p>
<p>The code and the read-only data are loaded in the memory in the same way, with
read and execute permission but no write permission, so the executable view can
put both sections in the same segment, and make the loader’s life easier.</p>
<p>Also, the linker doesn’t really care about how the memory is loaded, so the
section header does not hold that information. It does care about the sections’
goals though, as it will need to put them together in order during the linking.
On the other hand, the loader is not really interested in the goal of
the contents of the file but only in what to do with those contents, so it only
has that information.</p>
<h3>So, why do we need to learn it?</h3>
<p>We don’t really need to learn it very deeply, just learn how it works at a
high level and make sure we are able to read it with the tools we have
available. The good news for you is that if the reasons I give you are not good
enough it doesn’t really matter, because you already learned<sup id="fnref:gotcha"><a class="footnote-ref" href="#fn:gotcha">1</a></sup>. Continue
reading and you’ll realize how much you understand now.</p>
<p>First, let me tell you a personal story. I have previous experience working
with assembly, but only in small devices that have two memories, one for data
and another for code (Harvard architecture). In those small devices you often
don’t really need to think about how the code and the data are mapped to memory
because your programs are small and the separation is clear. Computers are a
different thing, and I have had issues understanding this whole assembly thing.</p>
<p>Computers store both code and data in the same memory, the main memory (Von
Neumann architecture), and they normally have memory segmentation, paging,
memory management units and all that kind of stuff, because there are many
processes running and they need to be separated from each other. That forces us
to think about how the code and the data are mapped to the memory. Modern
operating systems also use dynamic linkers, which are not available in small
devices, and we need to be able to deal with that amount of complexity.</p>
<p><span class="caps">ELF</span> allows us to do all that, because it was born for it. <span class="caps">ELF</span> is a
distillation of many of the ideas from System V Unix, which include exactly what
I mentioned. It’s a great way to understand how memory, linking and processes
work in a <em>modern</em> operating system. This is why you need to learn it, at least
a little. It makes you a cultivated person, which is always good<sup id="fnref:system-v"><a class="footnote-ref" href="#fn:system-v">2</a></sup>.</p>
<h4>The specifics</h4>
<p>As I’m sure you are not totally satisfied with the answer of becoming a cultivated
person<sup id="fnref:some-of-you"><a class="footnote-ref" href="#fn:some-of-you">3</a></sup>, let me go for some specifics.</p>
<p>In this project <span class="caps">GCC</span> is not the only software we are dealing with: <span class="caps">GNU</span>
Binutils and TinyCC are part of the party too, and I need to make them fit
together in the best way possible. In all of them I need to make sure the
relocations, formats and other things work properly, following the <span class="caps">RISC</span>-V <span class="caps">ABI</span>
specification for <span class="caps">ELF</span>. That might be a point of failure, so being prepared, at
least at a high level, is interesting.</p>
<p>Of course, we need to analyze <span class="caps">GCC</span>’s output too, and in order to do that we need
to make sure we know what it means. We already saw that some <span class="caps">ELF</span> sections are
directly mentioned in the assembly, so <span class="caps">ELF</span> is a
good way to learn their meanings. They are really an <span class="caps">OS</span>-related thing that <span class="caps">ELF</span> only
reflects, but learning them from the <span class="caps">ELF</span> perspective probably makes the path easier.</p>
<p>Relocations are a huge point in all this mess, because they are machine
specific (instructions are too, but those I expect us to know already), and
they are something I didn’t need to research in all the <span class="caps">RISC</span>-V adventures I had
last year. I have to do it sometime.</p>
<p>In general, there are many sharp edges where we can get hurt, so it’s better if
we wear gloves.</p>
<h3>Tools</h3>
<p>For all this process there are a couple of tools that were designed to help.
<span class="caps">GNU</span> Binutils has many of them but we are going to focus on two, as they are
more than enough for many use cases: <code>objdump</code> and <code>readelf</code>.</p>
<p>The example below uses both of them to analyze a piece of code and its
compilation result. As you’ll see, the main problem they have is their output:
it’s not always clear, the formatting is a little bit chaotic, it’s not
at all obvious to get right, and it’s really hard to use procedurally.</p>
<p>There is a really cool tool you should investigate though, called <span class="caps">GNU</span> Poke,
that is designed specifically to fight those issues. I recommend you
<a href="https://www.gnu.org/software/poke/">take a look at it</a>.</p>
<h3>Example</h3>
<p>Starting from a very simple C file we can follow a really interesting process
and understand some of the <span class="caps">ELF</span> internals:</p>
<pre><code class="language-c">long global_symbol;
int main() {
return global_symbol != 0;
}
</code></pre>
<p>We compile it to assembly with:</p>
<pre><code class="language-asdf">$ riscv64-linux-gnu-gcc -S b.c -O0
</code></pre>
<p>These are the contents of the assembly file:</p>
<pre><code class="language-asm"> .file "b.c"
.option pic
.text
.globl global_symbol
.bss
.align 3
.type global_symbol, @object
.size global_symbol, 8
global_symbol:
.zero 8
.text
.align 1
.globl main
.type main, @function
main:
addi sp,sp,-16
sd s0,8(sp)
addi s0,sp,16
lla a5,global_symbol
ld a5,0(a5)
snez a5,a5
andi a5,a5,0xff
sext.w a5,a5
mv a0,a5
ld s0,8(sp)
addi sp,sp,16
jr ra
.size main, .-main
.ident "GCC: (Debian 10.2.1-6) 10.2.1 20210110"
.section .note.GNU-stack,"",@progbits
</code></pre>
<p>Assemble the file with <code>as</code>:</p>
<pre><code class="language-asdf">$ riscv64-linux-gnu-as b.s -o b.o
</code></pre>
<p>And this is what we get in <code>b.o</code>. The <code>.text</code> section contains the following:</p>
<pre><code class="language-asm">$ riscv64-linux-gnu-objdump --disassemble b.o
b.o: file format elf64-littleriscv
Disassembly of section .text:
0000000000000000 <main>:
0: ff010113 addi sp,sp,-16
4: 00813423 sd s0,8(sp)
8: 01010413 addi s0,sp,16
c: 00000797 auipc a5,0x0
10: 00078793 mv a5,a5
14: 0007b783 ld a5,0(a5) # c <main+0xc>
18: 00f037b3 snez a5,a5
1c: 0ff7f793 andi a5,a5,255
20: 0007879b sext.w a5,a5
24: 00078513 mv a0,a5
28: 00813403 ld s0,8(sp)
2c: 01010113 addi sp,sp,16
30: 00008067 ret
</code></pre>
<h3>Relocations</h3>
<p>There are some relocations!</p>
<pre><code class="language-asdf">$ riscv64-linux-gnu-objdump b.o -r
b.o: file format elf64-littleriscv
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
000000000000000c R_RISCV_PCREL_HI20 global_symbol
000000000000000c R_RISCV_RELAX *ABS*
0000000000000010 R_RISCV_PCREL_LO12_I .L0
0000000000000010 R_RISCV_RELAX *ABS*
</code></pre>
<p>But in order to understand those relocations properly we need to check the
value of the symbols too:</p>
<pre><code class="language-asdf">$ riscv64-linux-gnu-objdump -t b.o
b.o: file format elf64-littleriscv
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 b.c
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
000000000000000c l .text 0000000000000000 .L0
0000000000000000 l d .comment 0000000000000000 .comment
0000000000000000 g O .bss 0000000000000008 global_symbol
0000000000000000 g F .text 0000000000000034 main
</code></pre>
<p>If you pay attention to the offsets of those relocations (<code>0x0c</code> and <code>0x10</code>),
they exactly match the instructions <code>auipc a5, 0x0</code> and <code>mv a5, a5</code>, and those
are expanded from the <code>lla a5, global_symbol</code> (load local address)
pseudoinstruction in the assembly.</p>
<p>The <code>mv</code> is not really a <code>mv</code>. <code>mv</code> is a pseudoinstruction too, which should be
expanded to an <code>addi a5, a5, 0</code>. <code>objdump</code> is playing with us: it makes the
opposite conversion so we can read better, but in fact it is tricking us.</p>
<p>The <code>auipc</code> + <code>addi</code> couple appears pretty often in <span class="caps">RISC</span>-V, because it’s the
method the architecture has to load addresses. The first instruction, <code>auipc</code>, adds
the high part of an immediate to the program counter and stores the result in a
register; the <code>addi</code> then adds another immediate, in this case the low part, to the
register, i.e. together they make a <code>x[reg] = pc + immediate</code> operation in two steps:
<code>x[reg] = pc + hi20(immediate)</code> followed by <code>x[reg] = x[reg] + lo12(immediate)</code>.</p>
<p>As we have relocations in both <code>auipc</code> and <code>addi</code>, this means their <code>0</code> values
(the immediates) are going to be overwritten with something else at linking
time, and that’s where <span class="caps">RISC</span>-V has something to say. All the relocations we can
see are <span class="caps">RISC</span>-V specific, and you can read about them in the <a href="https://github.com/riscv-non-isa/riscv-elf-psabi-doc"><span class="caps">RISC</span>-V <span class="caps">ABI</span>
Specification</a>.</p>
<p>In our case we have some really simple ones, the easiest to understand (what a
coincidence, huh?):</p>
<blockquote>
<p><code>R_RISCV_PCREL_HI20</code>: High 20 bits of 32-bit <span class="caps">PC</span>-relative reference,
<code>%pcrel_hi(symbol)</code>. The formula is: <code>S+A-P</code> [but only obtains the highest 20 bits].</p>
<p><code>R_RISCV_PCREL_LO12_I</code>: Low 12 bits of a 32-bit <span class="caps">PC</span>-relative,
<code>%pcrel_lo(address of %pcrel_hi)</code>, the addend must be 0. The formula is:
<code>S-P</code> [but it only obtains the lowest 12 bits].</p>
</blockquote>
<p>Both the <code>HI20</code> and the <code>LO12</code> have a similar formula, this is the meaning of
the elements on the formula:</p>
<ul>
<li><code>S</code>: Address of the symbol</li>
<li><code>A</code>: Addend of the relocation</li>
<li><code>P</code>: Position of the relocation</li>
</ul>
<p>If you match their formulas with what we just said about how
<code>auipc</code> + <code>addi</code> couples work, you can easily understand the formulas and
their meaning. We are not going to do it here; do something yourself!</p>
<p>The other relocation:</p>
<blockquote>
<p><code>R_RISCV_RELAX</code>: Instruction can be relaxed, paired with a normal relocation
at the same address.</p>
</blockquote>
<p>This one is an addition our example doesn’t end up using, but it could. The <code>R_RISCV_RELAX</code>
basically means that if the relocation it is paired with is not needed, it can be
discarded. And when does that happen? Easy: when we can get <code>global_symbol</code>’s
address with only one of the instructions, we can remove the other one from the program.</p>
<h4>Relocation resolution</h4>
<p>If we link the file and generate an executable, we can see the final value
those zeroes get.</p>
<pre><code class="language-asdf">$ riscv64-linux-gnu-gcc b.o -o b.out
</code></pre>
<p>We link it like this because calling <code>ld</code> directly needs a lot of input arguments
and we don’t want to set them all by hand, but you can do it with <code>ld</code> if you feel like it.</p>
<pre><code class="language-asdf">$ riscv64-linux-gnu-objdump --disassemble b.out
...
00000000000005e4 <main>:
5e4: ff010113 addi sp,sp,-16
5e8: 00813423 sd s0,8(sp)
5ec: 01010413 addi s0,sp,16
5f0: 00002797 auipc a5,0x2
5f4: a6878793 addi a5,a5,-1432 # 2058 <global_symbol>
5f8: 0007b783 ld a5,0(a5)
5fc: 00f037b3 snez a5,a5
600: 0ff7f793 andi a5,a5,255
604: 0007879b sext.w a5,a5
608: 00078513 mv a0,a5
60c: 00813403 ld s0,8(sp)
610: 01010113 addi sp,sp,16
614: 00008067 ret
...
</code></pre>
<p>There you can see the relocation was resolved (<code>0x5f0</code> and <code>0x5f4</code>) by the linker
and the final values have been filled in. <code>objdump</code> is intelligent enough to tell
us where those instructions are pointing (it says <code>2058 <global_symbol></code>). Just to
make sure, we can search the symbol table for <code>global_symbol</code>:</p>
<pre><code class="language-asdf">$ riscv64-linux-gnu-objdump -t b.out | grep global_symbol
0000000000002058 g O .bss 0000000000000008 global_symbol
</code></pre>
<blockquote>
<p><span class="caps">NOTE</span>: We could try to calculate the address of the <code>global_symbol</code> as the
linker did, but it’s a little bit complicated because we also linked the file
with the standard library and the startup files, which add the <code>crt</code> files
on top of ours. The result is that we get more code than what we had in the
assembly file. If you want to see that, you can check the rest of the output of
the command, or even try <code>--disassemble-all</code> and calculate the symbol
address by hand. Good luck.</p>
</blockquote>
<h4>More sections</h4>
<p>If you want to review some simple things, like a string section, you can use
<code>readelf</code> for that. The <code>-p</code> flag (equivalent to <code>--string-dump=</code>) displays the
contents of a section as strings. You can read the <code>.comment</code> section that way:</p>
<pre><code class="language-asdf">$ riscv64-linux-gnu-readelf -p .comment b.o
String dump of section '.comment':
[ 1] GCC: (Debian 10.2.1-6) 10.2.1 20210110
</code></pre>
<p>This is what the compiler inserted with <code>.ident</code> in the assembly file. We have
it in the binary too.</p>
<p>In other distros the output is a little bit different. Look at the output we have
in Guix:</p>
<pre><code class="language-asdf">String dump of section '.comment':
[ 1] GCC: (GNU) 11.2.0
</code></pre>
<h3>Conclusion</h3>
<p>So this whole thing just to explain that <span class="caps">ELF</span> files are some kind of dual files
that have two different goals at the same time. The executable one is kind of a
picture of the memory state that can be used for loading that state into
memory, while the linking one just describes how different parts of the
contents relate to each other and has tons of funny tricks to make the files
relocatable, position independent and that kind of thing. Cool.</p>
<p>There are still many parts of <span class="caps">ELF</span> we didn’t talk about, but I consider this
introduction more than enough. Having a simple understanding of how the
file is organized and what kind of information it holds is probably enough for the
things we are going to need.</p>
<p>The proposed example shows that with the knowledge obtained from this short
introduction we can dig a little bit into the files that result from a
compilation and analyze their internals. That’s mostly the work I’ll need to do
when I start combining compilers in a pipeline of death and destruction.</p>
<p>If I ever need to dig deeper into something, I will.</p>
<p>Anyway, I’m still unsure if I answered the question we left in the previous
post<sup id="fnref:cliff"><a class="footnote-ref" href="#fn:cliff">4</a></sup>:</p>
<blockquote>
<p>Why is learning about <span class="caps">ELF</span> interesting if <span class="caps">GCC</span> generates assembly?</p>
</blockquote>
<p>Did I?</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:gotcha">
<p>Ha! Gotcha! <a class="footnote-backref" href="#fnref:gotcha" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:system-v">
<p>It also makes you understand the complexities of the system so you
can criticize it. Changing the world requires learning about it first. <a class="footnote-backref" href="#fnref:system-v" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:some-of-you">
<p>For those that really are. That’s the good attitude in life.
High five. You can read the whole section still, it has interesting points I
think. <a class="footnote-backref" href="#fnref:some-of-you" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:cliff">
<p>It was a good cliffhanger, though. <a class="footnote-backref" href="#fnref:cliff" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
</ol>
</div>GCC internals — From a porting perspective2022-03-08T00:00:00+02:002022-03-08T00:00:00+02:00Ekaitz Zárragatag:ekaitz.elenq.tech,2022-03-08:/bootstrapGcc1.html<p>Deep diving into <span class="caps">GCC</span>’s internals from the perspective of someone who
wants to port <span class="caps">GCC</span> for a new architecture.</p><p>In the <a href="https://ekaitz.elenq.tech/bootstrapGcc0.html">previous post</a> of the
<a href="https://ekaitz.elenq.tech/tag/bootstrapping-gcc-in-risc-v.html">series</a> the problem of the <span class="caps">GCC</span> bootstrapping
was introduced. In this post we’ll describe how <span class="caps">GCC</span> works, from the
perspective of someone who wants to port it, so we understand what job we
have to do.</p>
<ol>
<li><a href="#disclaimer">Disclaimer</a></li>
<li><a href="#intro">Overview</a><ol>
<li><a href="#cfg">The compiler generation framework</a></li>
<li><a href="#gcc-coordinator"><span class="caps">GCC</span> as a coordinator</a></li>
</ol>
</li>
<li><a href="#parsing">Source code parsing</a><ol>
<li><a href="#generic"><span class="caps">GENERIC</span></a></li>
</ol>
</li>
<li><a href="#gimple"><span class="caps">GIMPLE</span></a></li>
<li><a href="#rtl">Register Transfer Language</a><ol>
<li><a href="#target-dependent">Target-dependent code</a></li>
<li><a href="#md">Machine description files</a><ol>
<li><a href="#mm">Machine modes</a></li>
<li><a href="#rtl-templates"><span class="caps">RTL</span> Templates</a></li>
</ol>
</li>
<li><a href="#target-desc">Target description macros and functions</a></li>
</ol>
</li>
<li><a href="#assembly">Assembly code generation</a></li>
<li><a href="#summary">Summary</a></li>
<li><a href="#job">My job in the backport</a></li>
<li><a href="#last">Last words</a></li>
<li><a href="#more">Learn more</a></li>
</ol>
<h3 id="disclaimer">Disclaimer</h3>
<ul>
<li>This post may be only valid for old <span class="caps">GCC</span> versions, like 4.something, because
that’s the one I’m interested in. More recent versions may have different
details, but I don’t expect them to be very different to what is described
here. More specifically: I’m working on <span class="caps">GCC</span> 4.6.4, and the first <span class="caps">GCC</span> with
<span class="caps">RISC</span>-V support is <span class="caps">GCC</span> 7.0.0.</li>
<li>This post will focus on how <span class="caps">GCC</span> compiles C programs because that’s the part
we care about. Some other languages have differences on how they are treated
but that’s not very relevant for us, as it has no implications on the
<em>back-end</em>.</li>
</ul>
<p>Both of these points will get clearer later.</p>
<h3 id="intro">Overview</h3>
<p><span class="caps">GCC</span> is structured as a pipeline of several steps that run one after the other.</p>
<ol>
<li>Source code parsing</li>
<li><span class="caps">GIMPLE</span> <span class="caps">IR</span> generation (target-independent)</li>
<li>Some <span class="caps">GIMPLE</span> tree optimizations</li>
<li><span class="caps">RTL</span> <span class="caps">IR</span> generation (target-dependent)</li>
<li><span class="caps">RTL</span> optimizer</li>
<li>Assembly code generator</li>
</ol>
<p>Before starting to analyze each of the steps independently there are a couple
of things to clarify.</p>
<h4 id="cfg">The Compiler Generation Framework</h4>
<p>An important point to note is that <span class="caps">GCC</span> is a <strong>compiler collection</strong>, meaning that it
is able to compile code from many high-level languages (<span class="caps">HLL</span>) and for many
different targets. This has implications on how some steps map to <span class="caps">GCC</span>’s
source code.</p>
<p>The most important thing of all this is to differentiate between <span class="caps">GCC</span>’s code and
an actual <code>gcc</code> executable. The key point here is that <span class="caps">GCC</span>’s codebase includes
what is called <span class="caps">CGF</span> (Compiler Generator Framework) that can generate <code>gcc</code>
executables from <span class="caps">GCC</span>’s code. The <span class="caps">CGF</span> generates <code>gcc</code> executables according to
the input (target machine, host machine…) we give it, but the generated <code>gcc</code>
executables may differ one from another even if they were generated from the
same codebase.</p>
<p>Any <code>gcc</code> executable is able to compile any input <span class="caps">HLL</span><sup id="fnref:language"><a class="footnote-ref" href="#fn:language">1</a></sup> (C, C++,
Objective-C, Ada, Fortran and Go<sup id="fnref:java"><a class="footnote-ref" href="#fn:java">2</a></sup>), so <span class="caps">GCC</span>’s code must include parsers
for each of these languages.</p>
<p>On the other hand, <code>gcc</code> executables are only able to generate code for one
target (x86, <span class="caps">MIPS</span>, <span class="caps">ARM</span>, <em><span class="caps">RISC</span>-V</em>…), that must be chosen when <span class="caps">GCC</span> is compiled.
In order to make the porting efforts easier, <span class="caps">GCC</span> has a set of tools that
generate the target-dependent code from some configuration files called Machine
Descriptions (<span class="caps">MD</span>).</p>
<p>Putting all this together, source code parsing and <span class="caps">AST</span> generation depend on the
input <span class="caps">HLL</span>, and the code that runs for each <span class="caps">HLL</span> is <strong>selected</strong> when <code>gcc</code> runs
(<em>step 1</em>). The intermediate representation, <span class="caps">GIMPLE</span>, is target-independent so
everything related with that is <strong>copied</strong> inside the final <code>gcc</code> executable
(<em>steps 2 and 3</em>). The <span class="caps">RTL</span> (Register Transfer Language) representation and
assembly code generation are target-dependent and the code related to that is
<strong>generated</strong> from <span class="caps">MD</span> files when <span class="caps">GCC</span> is compiled (<em>steps 4, 5 and
6</em>)<sup id="fnref:rtl-opt"><a class="footnote-ref" href="#fn:rtl-opt">3</a></sup>.</p>
<p>This all means that if we want to read the source code of <span class="caps">GCC</span> we have to
keep clearly in mind how the source code maps to the actual executable: if
we generate a <code>gcc</code> executable for <strong>x86</strong> it won’t contain the code for other
architectures and <strong>it won’t even check whether that code is correctly programmed</strong>,
because it’s not going to compile it.</p>
<h4 id="gcc-coordinator"><span class="caps">GCC</span> as a coordinator</h4>
<p>Many <span class="caps">GCC</span> users or C programmers (or me, not that long ago) might think there is
something missing from the list of steps we just reviewed. The normal use case
of calling <code>gcc</code> like</p>
<pre><code class="language-bash">$ gcc -o helloworld helloworld.c
</code></pre>
<p>does several steps internally that we need to separate:</p>
<ol>
<li>Preprocessing: the resolution of preprocessor macros like <code>#define</code> and
stuff like that.</li>
<li>Compiling to assembly: the generation of assembly code files per compilation
unit (a file that is the output of the preprocessor).</li>
<li>Assembly: the conversion from an assembly file to an <span class="caps">ELF</span> object file.</li>
<li>Linking: the executable or library generation from the <span class="caps">ELF</span> object files
created in the previous step.</li>
</ol>
<p>The reality is there’s more than one program involved here and <code>gcc</code> is just a
coordinator that makes other programs run if needed.</p>
<p>The preprocessor is called <code>cpp</code> and it is generated from the <span class="caps">GCC</span> codebase. The
compiler is <code>gcc</code> itself but the assembler and the linker are generally
obtained from <span class="caps">GNU</span> Binutils’ <code>as</code> and <code>ld</code> respectively.</p>
<p>So, one of the most important things to understand is <span class="caps">GCC</span> only generates
assembly, but it looks like it doesn’t<sup id="fnref:tinycc-asm"><a class="footnote-ref" href="#fn:tinycc-asm">4</a></sup>.</p>
<p>This means we need proper support for our architecture in the assembler and
the linker too. But we’ll keep that story for another day<sup id="fnref:post-long"><a class="footnote-ref" href="#fn:post-long">5</a></sup>.</p>
<hr>
<h3 id="parsing">Source code parsing</h3>
<blockquote>
<p><span class="caps">HLL</span> dependent. Generates appropriate <span class="caps">IR</span></p>
</blockquote>
<p>So the first step of the compiler is to process the input text and convert it
to the appropriate Intermediate Representation. The most used intermediate
representation is <span class="caps">GENERIC</span>, which was designed for C but fits other procedural
languages pretty well<sup id="fnref:fortran"><a class="footnote-ref" href="#fn:fortran">6</a></sup>.</p>
<p>This parsing process is not really relevant for us, as we want to add a new
target, but it’s interesting to note because it gives shape to the codebase:
<span class="caps">GCC</span> splits the code for the different input languages into folders named like
<code>gcc/$LANGUAGE</code>.</p>
<h4 id="generic"><span class="caps">GENERIC</span></h4>
<p><span class="caps">GENERIC</span> is just a representation; we don’t need to care that much about it, but
a few words are not going to hurt anyone. <span class="caps">GENERIC</span> is a tree representation: a set of
nodes with some extra common information. Those node types can be read in
<code>gcc/tree.def</code>.</p>
<p>A simple example of this could be a function declaration, which would be a
node of type <code>FUNCTION_DECL</code> that has some sub-nodes: one for the return
type, another for the body of the function and another for its arguments.</p>
<p>It’s a simple <span class="caps">AST</span> you could come up with yourselves, except for the fact that it is
pretty complex. 😅</p>
<h3 id="gimple"><span class="caps">GIMPLE</span></h3>
<blockquote>
<p><span class="caps">HLL</span>- and Target- independent representation</p>
</blockquote>
<p>The next step is called <em>Gimplification</em> (see <code>gimplify.c</code>), the process of
converting to <span class="caps">GIMPLE</span>. Normally, representing the <span class="caps">AST</span> as <span class="caps">GIMPLE</span> is too complex
to be done in one step, so <span class="caps">GENERIC</span> (plus some extensions) is used as an
intermediate step that is easier to create.</p>
<p><span class="caps">GIMPLE</span> is the central internal representation of <span class="caps">GCC</span>. It’s target-independent
and High-Level-Language-independent. At this point some optimizations can be
applied: those related to the structure of the source code, like loop
unrolling or dead code elimination.</p>
<p>From the porting perspective, this representation is important, as it’s the
border line between the front-end and the back-end, and we are interested in
the latter. A really interesting part to understand is how this is converted
to the next representation, <span class="caps">RTL</span>.</p>
<h3 id="rtl">Register Transfer Language (<span class="caps">RTL</span>)</h3>
<blockquote>
<p>Target-dependent low level representation</p>
</blockquote>
<p>The next part of the compiler work is done using the <span class="caps">RTL</span> intermediate
representation. The <span class="caps">RTL</span> representation is based on <span class="caps">LISP</span>, so we have a reason to
love it, and it serves two purposes:</p>
<ol>
<li>Specify target properties via the Machine Descriptor files. These Machine
Descriptor files are text files that look like <span class="caps">LISP</span> and are processed at
compilation time.</li>
<li>Represent a compilation. Meaning that the <span class="caps">RTL</span> is also an intermediate
representation, a low-level one, that represents sets of instructions.</li>
</ol>
<p><span class="caps">GCC</span> does not make any distinction between the first and the second purpose,
calling both <span class="caps">RTL</span>, but there are some differences in the purpose and the shape of the
<span class="caps">RTL</span>. <span class="caps">RTL</span> has both an internal form represented by structures (case 2) and an
external form represented as a text file (case 1).</p>
<p><span class="caps">RTL</span> is formed by a set of objects: expressions, integers, wide integers,
strings and vectors. In the textual form they are represented as in <span class="caps">LISP</span>:
double quotes for strings, brackets for vectors… and a lot of
parentheses. The internal representation is what you can imagine: structures for
expressions, integer types for integers, <code>char*</code> for strings, etc.</p>
<p>The most interesting <span class="caps">RTL</span> objects are expressions, aka <span class="caps">RTX</span>, which are just a name
(an expression code) plus a number of arguments.</p>
<p>This is how a piece of <span class="caps">RTL</span> may look; it represents an instruction that sets
register 0 to the result of adding register 1 and the constant
integer 10 (see <code>rtl.def</code> for more information):</p>
<pre><code class="language-lisp">(set (reg 0)
(plus (reg 1)
(const_int 10)))
</code></pre>
<p>In the example the only things that are not expressions are the numbers (0, 1
and 10); all the rest you can find in <code>rtl.def</code> and see what they mean.</p>
<p>From <span class="caps">GIMPLE</span>, there are two steps left to reach our target, assembly code, and
both involve <span class="caps">RTL</span>. The first maps the <span class="caps">GIMPLE</span> nodes to pattern names in a
target-independent way, generating a list of <span class="caps">RTL</span> <code>insn</code>s. The second matches
those <code>insn</code> lists to <span class="caps">RTL</span> templates described in Machine Description files and
uses those matches to generate the final assembly code.</p>
<p>Those <code>insn</code>s are objects that represent code in <span class="caps">RTL</span>. Each function is
described with a doubly-linked list of <code>insn</code>s. You can think about them as
<em>instructions</em> in the <span class="caps">RTL</span> world.</p>
<p>In the first step, the <span class="caps">RTL</span> <code>insn</code> generation step, only the names matter (and
they are hardcoded in the compiler), while in the second the structure of the
<code>insn</code> is going to be analyzed as we’ll see later.</p>
<h4 id="target-dependent">Target-dependent code</h4>
<p>As we previously said, the code for the target-dependent steps is generated at
compile time and then inserted into the final <code>gcc</code> executable. All this code is
located in one folder per target, under <code>gcc/config/$TARGET</code>, so the <span class="caps">CGF</span> is able
to load the target we choose at compile time (using <code>--target=</code>) and insert it
in the final executable.</p>
<p>That is done in different ways depending on the type of file we are working
with: Machine Description files are processed by programs (<code>gencodes</code>,
<code>genrecog</code>…) that generate C code files from them, while target description
macros and functions, which are C files, are inserted into the building process
like any other C file.</p>
<p>I’d like to insist here on the fact that the chosen <code>--target</code> is the only one
that gets processed and loaded; all the other possible targets are ignored.
The build process is not going to complain if a target is broken, as long as
it isn’t the target we chose. It just doesn’t care.</p>
<h4 id="md">Machine Description files</h4>
<p>Machine Description files (<code>.md</code> extension) let us define <code>insn</code> patterns,
which are incomplete <span class="caps">RTL</span> expressions that can be matched against the <code>insn</code>
list generated from the <span class="caps">GIMPLE</span>, plus <code>attributes</code> and other interesting things we
won’t try to decipher here.</p>
<p><code>define_insn</code> is an <span class="caps">RTX</span> we can use to define new <code>insn</code> patterns. It receives
four or five operands:</p>
<ol>
<li>An optional name. It’s going to be used to match against <span class="caps">GIMPLE</span>.</li>
<li>An <span class="caps">RTL</span> template. A vector of <em>incomplete</em> <span class="caps">RTL</span> expressions which describe what
the instruction should look like. <em>Incomplete</em> in this context means it uses
expressions like <code>match_operand</code> or <code>match_operator</code>, which are designed to
match against the <span class="caps">RTL</span> <code>insn</code> list and see if they are compatible or not.</li>
<li>A condition. A final condition to say if the <code>insn</code> matches this pattern or not.</li>
<li>An output template. A string that contains the output assembly code for this
<code>insn</code>. The string can contain special characters like <code>%</code> to define where
the arguments should be inserted. If the output is very complex we can write
C code in this field too.</li>
<li>An optional list of attributes.</li>
</ol>
<p>This is an actual example from the <span class="caps">RISC</span>-V code we are backporting:</p>
<pre><code class="language-lisp">(define_insn "adddi3"
[(set (match_operand:DI 0 "register_operand" "=r,r")
(plus:DI (match_operand:DI 1 "register_operand" "r,r")
(match_operand:DI 2 "arith_operand" "r,I")))]
"TARGET_64BIT"
"add\t%0,%1,%2"
[(set_attr "type" "arith")
(set_attr "mode" "DI")])
</code></pre>
<p>You can see the name <code>adddi3</code> is something like: <code>add</code> + <code>di</code> + <code>3</code>. This means
it’s the <code>add</code> instruction with the <code>di</code> mode and <code>3</code> input arguments. That’s
the way things are named.</p>
<p>The next block is a vector with the <span class="caps">RTL</span> template. If you ignore the
<code>match_operand</code> expressions you can see the template is not very different from
the <span class="caps">RTL</span> example we gave before. In this case it’s something like:</p>
<pre><code class="language-lisp">(set (reg 0)
(plus (reg 1)
(reg 2)))
</code></pre>
<p>It’s basically storing in the first register the result of the addition of the
other two.</p>
<p>The next field is the condition. In this case it needs to have <code>TARGET_64BIT</code>
defined in order to work because the machine mode is <code>DI</code> (we’ll explain that soon).</p>
<p>The output code is simple, just a <span class="caps">RISC</span>-V <code>add</code> instruction:</p>
<pre><code class="language-asm">add %0,%1,%2
</code></pre>
<p>Where <code>%N</code> is going to be replaced by the register numbers used as arguments
for this instruction.</p>
<p>The last field is the list of attributes, which can be used to define the instruction
size and other kinds of things. We are not going to focus on them today.</p>
<h5 id="mm">Machine modes</h5>
<p>Machine modes are a way to describe the size of a data object and its representation.</p>
<ul>
<li><span class="caps">QI</span>: quarter integer (one byte)</li>
<li><span class="caps">HI</span>: half integer (two bytes)</li>
<li><span class="caps">SI</span>: single integer (four bytes)</li>
<li><span class="caps">DI</span>: double integer (eight bytes)</li>
<li><span class="caps">SF</span>: single precision floating point (four bytes)</li>
<li><span class="caps">DF</span>: double precision floating point (eight bytes)</li>
</ul>
<p>And so on.</p>
<p>The standard <code>insn</code> names include machine modes to describe what kind of
instruction they are. The example above is <code>adddi3</code>, meaning it uses the <code>di</code>
machine mode: double integer. That’s why it needs the target to be a 64 bit
<span class="caps">RISC</span>-V machine.</p>
<p>Machine modes also appear in some <span class="caps">RTL</span> expressions like <code>plus</code> or
<code>match_operand</code> meaning that they operate in that machine mode, that is, with
that data size and representation. For example <code>(plus:SI ...)</code>.</p>
<h5 id="rtl-templates"><span class="caps">RTL</span> Templates</h5>
<p><code>match_*</code> expressions are what make <span class="caps">RTL</span> expressions <em>incomplete</em>, because they
are designed to be compared against the <code>insn</code> list that comes from the
previous step.</p>
<p>In the example above we had:</p>
<pre><code class="language-lisp">(set (match_operand:DI 0 "register_operand" "=r,r")
(plus:DI (match_operand:DI 1 "register_operand" "r,r")
(match_operand:DI 2 "arith_operand" "r,I")))
</code></pre>
<p><code>(match_operand N predicate constraint)</code> is a placeholder for an operand number
<code>N</code> of the <code>insn</code>. When the <code>insn</code> is constructed, the <code>match_operand</code> will be
replaced by the corresponding operand of the <code>insn</code>. When the template is
trying to match an <code>insn</code> the <code>match_operand</code> forces the operand number <code>N</code> to
match the <code>predicate</code> in order to make the <code>insn</code> match the template.
The <code>match_*</code> expressions are what define what <code>insn</code>s should look like.</p>
<p>The <code>predicate</code> is the name of a function to be called. The function receives two
input arguments: an expression and a machine mode. If the function returns <code>0</code>
the operand does not match.</p>
<p><code>predicate</code>s can also be combined in Machine Description files like this:</p>
<pre><code class="language-lisp">(define_predicate "arith_operand"
(ior (match_operand 0 "const_arith_operand")
(match_operand 0 "register_operand")))
</code></pre>
<p>So the <code>arith_operand</code> shown in the example above can be a
<code>const_arith_operand</code> <em>or</em> (that’s what <code>ior</code> means) a <code>register_operand</code>.
They can be more complex but this is more than enough to understand how they
are built. In the end, they always check against C functions, but you can
combine them with the convenience of the Machine Description files.</p>
<p>The <code>constraint</code> allows us to fine-tune the matching. Constraints define whether
the argument is in a register or in memory and that kind of thing. <code>r</code>, for example,
means the operand comes from a register.</p>
<p>There are other matching expressions too, but <code>match_operand</code> is the most used
one and it’s the one that explains this concept of <em>incomplete</em> expressions the best.</p>
<h4 id="target-desc">Target description macros and functions</h4>
<p>Apart from the machine descriptor files, there are other files involved. For
example, the constraints defined above need to be defined in code somewhere.</p>
<p>The most important of these are the target description macros and functions,
normally defined in <code>gcc/config/$TARGET/$TARGET.h</code> and
<code>gcc/config/$TARGET/$TARGET.c</code>. The <code>.c</code> should
initialize the <code>targetm</code> variable, which contains all the machine information
relevant to the compiler. It is initialized like this:</p>
<pre><code class="language-c">struct gcc_target targetm = TARGET_INITIALIZER;
</code></pre>
<p>That <code>TARGET_INITIALIZER</code> is a huge macro, defined in <code>gcc/target-def.h</code>, that
initializes the <code>targetm</code> structure. This macro is split into smaller macros with
reasonable defaults that may be overridden piece by piece. Each target should have
a file that includes both <code>target.h</code> and <code>target-def.h</code>, overrides any
inappropriate default by redefining the relevant macros, and ends with the
initialization line we just introduced. This is normally done in
<code>gcc/config/$TARGET/$TARGET.c</code>, while the <code>.h</code> is normally used to define some
macros that are needed in the <code>.c</code> file.</p>
<p>As a reference, the <span class="caps">RISC</span>-V code we need to backport (see
<code>gcc/config/riscv/riscv.c</code>) uses that file to describe the number of registers,
their types and sizes, and many other things.</p>
<p>All the information contained in <code>targetm</code> is used by the compiler to decide
how registers have to be allocated, which ones take preference, their costs,
and many other things.</p>
<h3 id="assembly">Assembly code generation</h3>
<p>Having the previous step clear is enough to understand how assembly
generation works. Each of the <code>insn</code>s in the list obtained from <span class="caps">GIMPLE</span> is
compared against the <span class="caps">RTL</span> templates and the best match is
chosen. Once the match is chosen, the corresponding assembly is
generated from the output template field of the <code>define_insn</code> <span class="caps">RTL</span> expression.</p>
<p>As simple as that, but also that complex.</p>
<p>Why do I say it’s complex? Because many things have to be considered and <span class="caps">GCC</span>
does consider them. Each instruction has a size, which has to be considered to
calculate addresses, but each also has an associated execution time, and <span class="caps">GCC</span>
calculates the best matches to make the final assembly file as optimal as possible.</p>
<p>The <span class="caps">RTL</span> step has a lot of optimization passes, too. It’s a complex step but
it’s not really important for us because we just need to make a temporary
compiler that lets us compile a better one. It doesn’t really matter if it’s
not perfect, at least at this point.</p>
<h3 id="summary">Summary</h3>
<p>So, in summary, the process is the following:</p>
<ol>
<li>The <span class="caps">HLL</span> language is parsed to a tree, normally <span class="caps">GENERIC</span>.</li>
<li><span class="caps">GENERIC</span> is converted to <span class="caps">GIMPLE</span>.</li>
<li><span class="caps">GIMPLE</span> optimizations are applied.</li>
<li><span class="caps">GIMPLE</span> is matched to an <code>insn</code> list using pattern names.</li>
<li>The <code>insn</code> list is matched against the <span class="caps">RTL</span> templates defined in the Machine
Description files. </li>
<li><span class="caps">RTL</span> optimizations are applied.</li>
<li>The matches convert the <span class="caps">RTL</span> to assembly code, also taking into account the
information obtained from the target description macros and functions.</li>
</ol>
<p>From our perspective, the most important things to remember are these:</p>
<ul>
<li>The front-end is not very relevant for us; everything from parsing to <span class="caps">GIMPLE</span> we can
ignore for the moment.</li>
<li>The <span class="caps">RTL</span> step is pretty complex, and the <span class="caps">GIMPLE</span>-><span class="caps">RTL</span> conversion is too.</li>
<li><span class="caps">GCC</span> is a compiler collection with a very powerful compilation process,
the Compiler Generation Framework (<span class="caps">CGF</span>), which modularizes the code and
makes it easier to port.</li>
<li>The machine description files and the target definition macros and functions
are designed to make the porting process simpler. Those are the only files we
need to touch.</li>
</ul>
<hr>
<div style="
text-align: center;
font-size: smaller;
padding-left: 3em;
padding-right: 3em;
padding-top: 1em;
padding-bottom: 1em;
border-top: 1px solid var(--border-color);
border-bottom: 1px solid var(--border-color)">
If you like my job here, consider hiring <a href="https://elenq.tech">ElenQ
Technology</a>. <br> Even if I’m busy with this, I still have some time slots
available.
</div>
<hr>
<h3 id="job">My job in the backport</h3>
<p>With all the process more or less clear, we can be more specific about the job I
need to do. I share some specifics in this section, so if you like reading code
you are going to have some fun<sup id="fnref:examples"><a class="footnote-ref" href="#fn:examples">7</a></sup>.</p>
<p>First, I need to make sure all the used <span class="caps">RTL</span> expressions are compatible with the
old version of the compiler. If they are not, I have to translate them to the
old way of writing them. Some examples of this are iterators like
<a href="https://github.com/riscv-collab/riscv-gcc/blob/ca312387ab141060c20c388d83d6fc4b2099af1d/gcc/config/riscv/riscv.md?plain=1#L342"><code>(define_int_iterator ...)</code></a>, which is not available in old <span class="caps">GCC</span>
versions, so I need to unfold a couple of loops by hand and make them use only
the old constructs.</p>
<p>Second, I need to convert the target description macros and functions from the
more <a href="https://github.com/riscv-collab/riscv-gcc/blob/ca312387ab141060c20c388d83d6fc4b2099af1d/gcc/config/riscv/riscv.c">modern C++-based <span class="caps">API</span></a> the recent port uses to the old internal
C-based one. These changes involve many layers and I haven’t yet
analyzed them in detail. They can be simple, like converting from the <code>rtx_insn</code>
class to <code>rtx</code>, the older way to do this. But they can also be complex, like
removing around 40% of the <code>#include</code> directives from <code>riscv.c</code>, which has many
that were not available in the past. It’s going to be a lot of fun, I predict.</p>
<p>Third, as this whole compilation process is complex, I decided to make it as
accessible as possible, so other people can audit and replicate my work. For
that I’m using Guix, my package manager of choice. I added a <a href="https://github.com/ekaitz-zarraga/gcc/blob/guix_package/guix.scm"><code>guix.scm</code></a>
and <a href="https://github.com/ekaitz-zarraga/gcc/blob/guix_package/channels.scm"><code>channels.scm</code></a> file to the repository so my work can be
replicated precisely by myself in the future, or by others<sup id="fnref:github"><a class="footnote-ref" href="#fn:github">8</a></sup>.</p>
<p>The Guix package also provides a better interaction with the building process
of <span class="caps">GCC</span>, letting us replace inputs in a very simple way. I’m thinking about the
next steps of the project here, when we need to compile my backported compiler
with TinyCC, test if it works and then patch TinyCC until it does. Having the
<code>guix.scm</code> file makes it easy to replace the current compiler with a patched
TinyCC and ensures nothing is interfering in the compilation process because
the compilation is done in an isolated container.</p>
<p>That’s mostly the job I need to do in the backport.</p>
<p>Something to keep in mind is that we don’t need to make it perfect, we just
need it to work. The backported <span class="caps">GCC</span> is not going to be used as a production
compiler, but just as a bridge with the next <span class="caps">GCC</span> version, so <strong>there’s only one
program it needs to be able to compile correctly: <span class="caps">GCC</span> 7</strong>. Once we make the
bridge with the next version, we can use that to compile anything we want.</p>
<h3>Last words</h3>
<p>I know this post is long, and the lack of proper diagrams makes everything a little
bit hard to understand. That’s exactly how I felt reading about <span class="caps">GCC</span>, but the
difference was I had to read documentation that is… about 100 times
longer than this post (see <a href="#more">Learn more</a> below). Not that bad after all.</p>
<p>There are many things I decided to leave out, like peephole optimizations,
instruction attributes, and some other constructs that are not that important
from my perspective. You may want to do your own research on those.</p>
<p>In any case, if you have any questions you can always contact me<sup id="fnref:contact"><a class="footnote-ref" href="#fn:contact">9</a></sup> and
ask them, or send me some words of support.</p>
<p>In the next post I’ll describe a little bit about <span class="caps">ELF</span>, the executable and
linkable format, just the bare minimum to understand the format, as it will be
relevant for us in the future. And you might be thinking, why is it relevant if
<span class="caps">GCC</span> compiles to assembly? Well, that’s one of the questions that we will be
answering in the next post.</p>
<p>Now I leave you with a couple of interesting links in the next section.</p>
<p>Good luck with your ports!</p>
<h3 id="more">Learn more</h3>
<ul>
<li><a href="https://gcc.gnu.org/onlinedocs/gccint/">The <span class="caps">GCC</span> internals documentation</a>: if
you are interested in my work you should read an older version of the
documentation. See <a href="#disclaimer">Disclaimer</a>.</li>
<li><a href="https://gcc.gnu.org/git.html">The <span class="caps">GCC</span> source code</a>: of course, this has
everything you need to understand <span class="caps">GCC</span>, but the problem is that <span class="caps">GCC</span> is a huge
codebase; you probably need to spend months reading it to understand
everything. That’s why I think posts like this one are interesting: they help
you focus on the parts you are interested in.</li>
</ul>
<div class="footnote">
<hr>
<ol>
<li id="fn:language">
<p>When calling <code>gcc</code> you can choose which language you are compiling
using <code>-x language</code> option or you can let <code>gcc</code> guess from the extension. <a class="footnote-backref" href="#fnref:language" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:java">
<p>And also Java in the past! <a class="footnote-backref" href="#fnref:java" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:rtl-opt">
<p>The <span class="caps">RTL</span> optimizer contains many steps, most of them being target
independent. That doesn’t really matter here, but those are not generated but
copied from <span class="caps">GCC</span>’s source, as <span class="caps">GIMPLE</span> is. <a class="footnote-backref" href="#fnref:rtl-opt" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:tinycc-asm">
<p>Other compilers have different approaches for this. For example,
TinyCC generates machine code directly, without the intermediate assembly
file generation step, and is also able to link the files by itself. <a class="footnote-backref" href="#fnref:tinycc-asm" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
<li id="fn:post-long">
<p>This post is already long enough and we only made the
introduction. <a class="footnote-backref" href="#fnref:post-long" title="Jump back to footnote 5 in the text">↩</a></p>
</li>
<li id="fn:fortran">
<p>The case of <span class="caps">FORTRAN</span> is a little bit weird, as it generates its own
representation that is later converted to <span class="caps">GENERIC</span>; we don’t really care about
this at this point. <a class="footnote-backref" href="#fnref:fortran" title="Jump back to footnote 6 in the text">↩</a></p>
</li>
<li id="fn:examples">
<p>I’ll link to some examples of the code on <span class="caps">RISC</span>-V’s GitHub account.
This code is already merged in <span class="caps">GCC</span>. <a class="footnote-backref" href="#fnref:examples" title="Jump back to footnote 7 in the text">↩</a></p>
</li>
<li id="fn:github">
<p>I’m hosting this on GitHub at the moment because the repository is
huge. I’ll probably move all this to my server and edit the post after that. <a class="footnote-backref" href="#fnref:github" title="Jump back to footnote 8 in the text">↩</a></p>
</li>
<li id="fn:contact">
<p>You can find my contact info in the <a href="/pages/about.html">About
page</a>. <a class="footnote-backref" href="#fnref:contact" title="Jump back to footnote 9 in the text">↩</a></p>
</li>
</ol>
</div>
<h3>Intro to GCC bootstrap in RISC-V</h3>
<p><em>2022-02-14, by Ekaitz Zárraga</em></p>
<p>Introduction to my new adventure bootstrapping <span class="caps">GCC</span> for <span class="caps">RISC</span>-V. Why, how,
and who is going to pay for it.</p><p>You probably already know about how I spent more than a year having fun with
<span class="caps">RISC</span>-V and software bootstrapping from source.</p>
<p>As some may know from my <a href="https://fosdem.org/2022/schedule/event/riscvadventures/"><span class="caps">FOSDEM</span> talk</a>, <a href="https://nlnet.nl/project/GNUMes-RISCV/">NLNet / <span class="caps">NGI</span>-Assure put the
funds</a> to make me spend more time on this for this year and I decided
to work on <span class="caps">GCC</span>’s bootstrapping process for <span class="caps">RISC</span>-V.</p>
<h3>Why <span class="caps">GCC</span></h3>
<p><span class="caps">GCC</span> is probably the most used compiler collection, period. With <span class="caps">GCC</span> we can
compile the world and have a proper distribution directly from source, but who
compiles the compiler?<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup></p>
<p>Well, someone has to.</p>
<h3>The bootstrap</h3>
<p>Bootstrapping a compiler with a long history like <span class="caps">GCC</span> for a new architecture
like <span class="caps">RISC</span>-V involves some complications, starting on the fact that the first
version of <span class="caps">GCC</span> that supports <span class="caps">RISC</span>-V needs a C++98 capable compiler in order to
build. C++98 is a really complex standard, so there’s no way we can bootstrap a
C++98 compiler at the moment for <span class="caps">RISC</span>-V. The easiest way we can think of at
this point is to use an older version of <span class="caps">GCC</span> for that, one of those that are
able to build C++98 programs but only require a C compiler to build. Older
versions of <span class="caps">GCC</span>, of course, don’t have <span class="caps">RISC</span>-V support so… We need a
<em>backport</em><sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup>.</p>
<p>So that’s what I’m doing right now. I’m taking an old version of <span class="caps">GCC</span> that only
depends on C89 and is able to compile C++98 code and I’m porting it to <span class="caps">RISC</span>-V
so we can build newer GCCs with it.</p>
<p>Only needing C to compile is a huge improvement because there are <em>Tiny C
Compilers</em> out there that can compile C to <span class="caps">RISC</span>-V, and those are written using
simple C that we can bootstrap with simpler tools of a more civilized world.</p>
<p>In summary:</p>
<ul>
<li>C++98 is too complex, but C89 is fine.</li>
<li><span class="caps">GCC</span> is the problem and also the solution.</li>
</ul>
<h3>What about <span class="caps">GNU</span> Mes?</h3>
<p>When <em>we</em><sup id="fnref:3"><a class="footnote-ref" href="#fn:3">3</a></sup> started with this effort we wanted to prepare <span class="caps">GNU</span> Mes, a small C
compiler that is able to compile a <em>Tiny C Compiler</em>, to work with <span class="caps">RISC</span>-V so we
could start to work in this bootstrap process from the bottom.</p>
<p>Some random events, like someone else working on that part, made us rethink our
strategy so we decided to start from the top and try to combine both efforts at
the end. We share the same goal: full source bootstrap for <span class="caps">RISC</span>-V.</p>
<h3>Tiny C Compilers?</h3>
<p>There are many small C compilers out there that are written in simple C and are
able to compile an old <span class="caps">GCC</span> that is written in C. Our favorite is TinyCC (Tiny C Compiler).</p>
<p><span class="caps">GNU</span> Mes is able to build a patched version of TinyCC, which already supports
<span class="caps">RISC</span>-V (<span class="caps">RV64</span> only), and we can use that TinyCC to compile the <span class="caps">GCC</span> version I’m backporting.</p>
<p>We’d probably need to patch some things in both projects to make everything
work smoothly but that’s also included in the project plan.</p>
<h3>Binutils</h3>
<p>Binutils is also a problem mostly because <span class="caps">GCC</span>, as we will talk about in the
future, does not compile to binary directly. <span class="caps">GCC</span> generates assembly code and
coordinates calls to <code>as</code> and <code>ld</code> (the <span class="caps">GNU</span> Assembler and Linker) to generate
the final binaries. Thankfully, TinyCC can act as an assembler and a linker,
and there’s also the chance to compile a modern binutils version because it is
written in C.</p>
<p>In any case, the binary file generation and support must be taken into account,
because <span class="caps">GCC</span> is not the only actor in this film and <span class="caps">RISC</span>-V has some weird things
in its assembly and binaries that have to be supported correctly.</p>
<h3>Conclusion</h3>
<p>This is a very interesting project, where I need to dig into <strong><span class="caps">BIG</span></strong> stuff, which
is cool, but also has a huge level of uncertainty, which scares the hell out of
me. I hope everything goes well…</p>
<p>In any case, I’ll share everything I learn here in the blog and keep you all posted
with the news we have.</p>
<p>That’s all for this time. If you have any questions or comments or want to share
your thoughts and feelings with me<sup id="fnref:5"><a class="footnote-ref" href="#fn:5">5</a></sup> you can find my
<a href="https://ekaitz.elenq.tech/pages/about.html">contact information here</a>.</p>
<hr>
<blockquote>
<p><span class="caps">PS</span>: Big up to NLNet / <span class="caps">NGI</span>-Assure for the money.</p>
</blockquote>
<style>
.container{
display: flex;
flex-flow: row wrap;
justify-content: center;
gap: 40px;
}
.no-side-margin{
margin: 0px;
}
</style>
<div class="container">
<img class="no-side-margin" src="https://ekaitz.elenq.tech/nlnet.svg" width=200px>
<img class="no-side-margin" src="https://ekaitz.elenq.tech/NGIAssure.svg" width=200px>
</div>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p><em>wHo wATcHes tHE wAtchMEN?</em> <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:2">
<p>Insert “Back to the Future” music here. <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:3">
<p><span class="dquo">“</span><em>We</em>” means I shared my thoughts and plans with other people who have a
much better understanding of this than myself. <a class="footnote-backref" href="#fnref:3" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:4">
<p>But there are some others that are really interesting (see
<a href="https://sr.ht/~mcf/cproc/">cproc</a>, for example) <a class="footnote-backref" href="#fnref:4" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
<li id="fn:5">
<p>Or even hire me for some freelance <span class="caps">IT</span> stuff 🤓 <a class="footnote-backref" href="#fnref:5" title="Jump back to footnote 5 in the text">↩</a></p>
</li>
</ol>
</div>Lessons learned on machine code generation2021-06-16T00:00:00+03:002021-06-16T00:00:00+03:00Ekaitz Zárragatag:ekaitz.elenq.tech,2021-06-16:/machine-code-generation.html<p>A summary of the lessons I learned about machine code generation during my
work at Lightening, Hex0 and all my recent research on compilers.</p><p>Machine code generation sounded like weird magic to me half a year ago, I
swear, but now it doesn’t look so disturbingly complicated. <em>Nothing in
computer science is that complicated, after all</em>.</p>
<ol>
<li><a href="#basics">Basics</a><ol>
<li><a href="#what">Machine code is numbers</a><ol>
<li><a href="#demo">Demonstration</a></li>
</ol>
</li>
<li><a href="#talk">Calling convention</a></li>
<li><a href="#protections">Memory protections</a></li>
<li><a href="#jit">Just-in-Time Compilation</a><ol>
<li><a href="#lightening">Example: Lightening: Guile’s machine code generation library</a></li>
</ol>
</li>
</ol>
</li>
<li><a href="#problems">Lessons learned</a><ol>
<li><a href="#large-imm">Problem: Large immediates</a><ol>
<li><a href="#multi-inst">Solution: multiple instruction expansion</a></li>
<li><a href="#constants">Solution: constant insertion</a></li>
</ol>
</li>
<li><a href="#addr-off">Problem: Unknown addresses and offsets</a><ol>
<li><a href="#relocs">Solution: relocations</a><ol>
<li><a href="#relocs-c">Example: C compilers</a></li>
<li><a href="#relocs-lightening">Example: Lightening</a></li>
</ol>
</li>
</ol>
</li>
<li><a href="#jumps">Problem: Long jumps</a><ol>
<li><a href="#always-largest">Solution: always insert the largest jump possible</a><ol>
<li><a href="#relaxation">Optimization: pointer relaxation</a></li>
<li><a href="#relax-example">Example: relaxed global variable access in C compilers</a></li>
</ol>
</li>
<li><a href="#veneer">Solution: Veneers</a><ol>
<li><a href="#lightening-veneer">Example: Lightening’s veneer system</a></li>
</ol>
</li>
</ol>
</li>
<li><a href="#reg-access">Problem: Register Access</a><ol>
<li><a href="#stack">Solution: use the stack</a></li>
<li><a href="#controlled-regs">Solution: controlled register access</a></li>
</ol>
</li>
</ol>
</li>
<li><a href="#final">Final thoughts</a></li>
</ol>
<h3 id="basics">Basics</h3>
<p>There are many contexts where you may need to generate machine code: if you are
writing a compiler, an assembler, a <span class="caps">JIT</span> compiler… In the last months I’ve
been working on Lightening, a machine code generation library that powers
Guile’s <span class="caps">JIT</span> Compilation, a <span class="caps">RISC</span>-V assembler and interpreter and Hex0, which was
introduced in the <a href="https://ekaitz.elenq.tech/hex0.html">previous post</a>, where I
needed to assemble a file by hand.</p>
<p>All of those cases result in the same thing, even if they have different
conditions: we are generating machine code.</p>
<p>In this post I’ll try to talk about some issues that are generic and apply to
all the cases and others that are more specific to some of the projects I mention.</p>
<p>But first we need to clarify some stuff just in case.</p>
<h4 id="what">Machine code is numbers</h4>
<blockquote>
<p>Machine code is what people from the electronics world call “code”.</p>
</blockquote>
<p>I know you know it, but let’s refresh some things about computing we may have
forgotten thanks to all the efforts that <a href="https://ekaitz.elenq.tech/hiding-the-complexity.html">hide the
complexity</a> of our
everyday business.</p>
<p>Machine code instructions are basically blocks of bits your processor is
reading and interpreting. Those bit blocks encode all the information the
processor needs: the identifier of the instruction and its arguments.</p>
<p>The identifier is normally known as <em>opcode</em>. The arguments can have many
different meanings depending on the instruction, so we are not getting into
that. The instructions normally alter the values of registers, so they need to
have identifiers for the source and destination registers, or literal values
that are introduced literally inside of the instruction (they are called
<em>immediates</em>).</p>
<p>Let’s put a simple <span class="caps">RISC</span>-V example here. Consider this assembly instruction:</p>
<pre class="highlight"><code class="language-asm">addi a0, zero, 56
</code></pre>
<p>This thing you interpret as some assembly instruction that adds <code>56</code> to the
<code>zero</code> register and stores the result in the <code>a0</code> register, has to be encoded
in a way that the machine is able to understand. Better said, it is encoded in
a way that <strong>you</strong> can understand! The real instruction is a bunch of bits that
represent the same thing.</p>
<p><span class="caps">RISC</span>-V base <span class="caps">ISA</span> has various instruction formats, which depend on the goal of
the instruction. This one is from the <code>I</code> format, because it includes an
<em>immediate</em>. Read it and compare with the following:</p>
<ul>
<li>First the <em>opcode</em>, <code>addi</code> for you, has a binary counterpart: <code>0010011</code>. 7
bits for this instruction format.</li>
<li>Then the destination register, <code>a0</code>, has a binary representation: <code>01010</code>.
There are 32 registers in <span class="caps">RISC</span>-V so each of them is represented by a 5-bit value.</li>
<li>There’s some extra space for an opcode-like field called <code>funct3</code>: <code>000</code></li>
<li>Then there’s the source register, <code>zero</code>, which is: <code>00000</code>. Again 5 bits.</li>
<li>And the <em>immediate</em> you are adding, <code>56</code>, which is just the binary
representation of <code>56</code>: <code>000000111000</code>. It’s 12 bits wide for this
instruction format.</li>
</ul>
<p>Putting it all together:</p>
<pre class="highlight"><code class="language-asm">000000111000 | 00000 | 000 | 01010 | 0010011
</code></pre>
<p>So this forms the following binary value:</p>
<pre class="highlight"><code class="language-asm">00000011100000000000010100010011
</code></pre>
<p>Or in hex:</p>
<pre class="highlight"><code class="language-asm">0x03800513
</code></pre>
<blockquote>
<p>Just in case you didn’t realize: you just assembled an instruction by hand.</p>
</blockquote>
<p>That instruction we just created is going to be processed by the machine,
reading each of the fields and activating its circuits as it needs according to
the voltage levels those values represent.</p>
<p>In this case, it’s going to activate the <span class="caps">ALU</span> to add the numbers and all
that kind of thing, but in other cases it may just change the value of the
program counter or whatever. All this is executed by the circuitry of the
device, right after it loads the instruction.</p>
<p>That’s for the machine, but for us, from the perspective of a programmer,
instructions are just numbers, as we just saw.</p>
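<p>The field packing above can be sketched in C. This is just an illustration (the <code>encode_itype</code> helper is hypothetical, not part of any project mentioned here), but it reproduces the exact word we computed by hand:</p>
<pre class="highlight"><code class="language-clike">#include<stdint.h>
#include<stdio.h>

/* Pack a RISC-V I-type instruction from its fields. */
uint32_t encode_itype(uint32_t imm12, uint32_t rs1, uint32_t funct3,
                      uint32_t rd, uint32_t opcode) {
    return ((imm12 & 0xFFF) << 20) | ((rs1 & 0x1F) << 15)
         | ((funct3 & 0x7) << 12)  | ((rd & 0x1F) << 7)
         | (opcode & 0x7F);
}

int main(void) {
    // addi a0, zero, 56: rd = a0 (x10), rs1 = zero (x0), opcode = 0010011
    printf("0x%08x\n", encode_itype(56, 0, 0, 10, 0x13)); // 0x03800513
    return 0;
}
</code></pre>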
<h5 id="demo">Demonstration</h5>
<p>I purposely used <em>machine</em> to refer to the device that runs our instructions,
but we have to be more specific about it now.</p>
<p>I’m going to talk specifically about modern (and common) microprocessors,
because other devices may have peculiarities that can sidetrack us too
hard<sup id="fnref:harvard"><a class="footnote-ref" href="#fn:harvard">1</a></sup>.</p>
<p>In our modern and common microprocessor, <a href="https://en.wikipedia.org/wiki/Von_Neumann_architecture">instructions are located in the
memory</a>. But that’s
nothing we didn’t know! If we run a binary it’s loaded in the memory and
executed from there. We all know that!</p>
<p>But you might be surprised to a certain extent if we stretch that a little bit.</p>
<p>Well, we know from the previous part that instructions are just numbers, and we
know that they are loaded from memory, so let’s do some C black magic and see
what happens:</p>
<pre class="highlight"><code class="language-clike">#include<stdint.h>
#include<stdio.h>
typedef int f0(void);
int main(int argc, char* argv[]){
    uint32_t instructions[2];
    instructions[0] = 0x03800513; // addi a0, zero, 56
    instructions[1] = 0x00008067; // jalr zero, ra, 0
    f0 *load_56 = (f0*) instructions; // Reinterpret the array address
                                      // as a function
    int a = load_56();
    printf("%d\n", a);
}
</code></pre>
<p>In that example we build an array of two values. The first one corresponds to
the instruction we encoded by hand before and the second corresponds to <code>jalr
zero, ra, 0</code>, the return instruction, which you can encode yourself.</p>
<p>After that we convert the address of the array to a function that returns an
integer and… Boom! We execute the array of numbers.</p>
<p>The code only works on <span class="caps">RISC</span>-V, but don’t worry, I can tell you that it prints
<code>56</code>.</p>
<p>So it was true that the machine can execute stuff from the memory, but what we
may not know is that for the machine there’s no actual distinction between
instructions and data<sup id="fnref:lisp"><a class="footnote-ref" href="#fn:lisp">2</a></sup>. We just executed an array of numbers!</p>
<p>The machine doesn’t care. If it looks like instructions, it executes them.</p>
<p>You can try to put random values in the array and try to execute them, too.
An <code>Illegal instruction</code> error is going to happen, probably. If you are lucky
you may execute something by accident, who knows.</p>
<p>But how did this thing work that well? Why did it return the value correctly
and all that?</p>
<h4 id="talk">Calling convention</h4>
<p>The code worked because we are following the <span class="caps">RISC</span>-V <span class="caps">ABI</span>, the same that C is
following in the example. It tells us how we need to pass arguments to
functions, how to return, and all that. The part of the <span class="caps">ABI</span> that defines how to
call and return from functions is called <em>calling convention</em>.</p>
<p>I’m not going to extend a lot talking about this, but I will just say that
<span class="caps">RISC</span>-V has some registers to pass arguments on: <code>a0</code>, <code>a1</code>…<code>a7</code>. And those
registers are also used for return values.</p>
<p>In the example we don’t take any arguments so we don’t need to read any register,
but we return one value by writing it in <code>a0</code>.</p>
<p>With what you know, you can now create a function that gets an input argument
and adds an <em>immediate</em> to it. Why don’t you try?</p>
<p>On the other hand, the <span class="caps">RISC</span>-V <span class="caps">ABI</span> defines a register called <code>ra</code> that
contains the Return Address, so we need to jump to it if we want to finish our
function execution.</p>
<p>There are many more things you can read about, but this is enough to begin.</p>
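<p>If you try the exercise above, the instruction you need is an <code>addi</code> that reads and writes <code>a0</code>. Here is a sketch that only checks the encoding, since it can run on any machine (the resulting word, followed by the <code>jalr zero, ra, 0</code> return, would form an add-5 function on real <span class="caps">RISC</span>-V hardware):</p>
<pre class="highlight"><code class="language-clike">#include<stdint.h>
#include<stdio.h>

int main(void) {
    // addi a0, a0, 5: imm=5, rs1=a0 (x10), funct3=000, rd=a0 (x10)
    uint32_t insn = (5u << 20) | (10u << 15) | (0u << 12)
                  | (10u << 7) | 0x13u;
    printf("0x%08x\n", insn);
    return 0;
}
</code></pre>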
<h4 id="protections">Memory protections</h4>
<p>The C example where we executed an array is correct, it runs and all that, but
the reality is that different parts of memory have different permissions.</p>
<p>Code in memory is normally read-only and executable, and data can be read-only
or not, depending on the goal it has (constant or variable).</p>
<p>If you think about the example above, once the array is set, we can overwrite
it later, or even write to it from the instructions we inserted in it. This could
lead to security issues or unexpected results. That’s why code is normally
read-only and any attempt to write it will raise an exception to the kernel.</p>
<p>There are several ways to identify a memory block as code: the <span class="caps">RISC</span>-V assembly
(and many others) uses the <code>.text</code> directive which automatically sets the block
as a read-only block that can be executed; the <code>mmap</code> Linux system call needs
some flags to indicate the protections on the memory block (<code>PROT_EXEC</code>,
<code>PROT_READ</code>, <code>PROT_WRITE</code>…); etc.</p>
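<p>As a minimal sketch of the <code>mmap</code> approach (assuming a <span class="caps">POSIX</span> system; no code is executed here, we only flip the permissions of a page holding an instruction word as plain data):</p>
<pre class="highlight"><code class="language-clike">#include<stdint.h>
#include<stdio.h>
#include<sys/mman.h>

int main(void) {
    // Ask the kernel for one writable page
    size_t len = 4096;
    uint32_t *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return 1;

    page[0] = 0x03800513;   // fill it with instructions, as plain data

    // Seal it: readable (add PROT_EXEC to make it executable code);
    // any write from now on raises an exception to the kernel
    if (mprotect(page, len, PROT_READ) != 0) return 1;

    printf("0x%08x\n", page[0]);
    munmap(page, len);
    return 0;
}
</code></pre>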
<h4 id="jit">Just-in-Time Compilation</h4>
<p>Just-in-time (<span class="caps">JIT</span>) Compilation is a way to execute programs that involve a
compilation step at runtime. Typically this happens on interpreted programs,
where the interpreter consumes part of the execution time. An interpreter with
a <span class="caps">JIT</span> Compilation feature is able to compile parts of the code it’s going to
run to machine code and speed up the execution of those parts.</p>
<p>Clever interpreters are able to predict if the time they need to compile and
execute the <span class="caps">JIT</span> Compiled parts is less than the time they need to interpret
them, so they can decide if it’s worth the effort.</p>
<p>Normally, the <span class="caps">JIT</span> Compilation is more effective in pieces of code that are
executed many times because the code only needs to be compiled once and the
speed increase is going to be obtained in every execution. But many algorithms
may be defined, and parts of the code may be recompiled looking for different
optimizations while the interpreter collects data about the performance of the program.</p>
<p>Explained like this it looks like it’s a complex thing to do (and it is) but
with the previously mentioned points we can imagine a simple <span class="caps">JIT</span> machine code
generation library. We “just” need to:</p>
<ul>
<li>Know what code to generate (choose a function to compile, this step may need
some code analysis).</li>
<li>Reserve some space (<code>malloc</code>, <code>mmap</code>…)</li>
<li>Fill the space with numbers (the machine code instructions resulting from the
compilation of the function).</li>
<li>Next time the program wants to call the function we compiled, call the
numbers instead (as we did <a href="#demo">in the demonstration</a>).</li>
</ul>
<h5 id="lightening">Example: Lightening, Guile’s machine code generation library</h5>
<p>The just-in-time compilation process in Guile is simple, but effective<sup id="fnref:guile"><a class="footnote-ref" href="#fn:guile">3</a></sup>.
Guile uses a library called Lightening for it. Lightening is a template-like
library that defines a virtual instruction set. That instruction set is
translated by the library to the instruction set of the actual machine.</p>
<p>Implementing support for another architecture is as simple as implementing all
the translation code for the new architecture. That’s <a href="https://ekaitz.elenq.tech/lightening.html">what I’ve been doing
these days</a>.</p>
<p>Guile’s <span class="caps">JIT</span> compiler only needs to call the instructions of the library and
they will generate actual machine code by themselves, packaged in a function
the interpreter will be able to call later.</p>
<p>Lightening is simple because it doesn’t need to compile from source code, or
do code analysis to find which part of the code it needs to compile. It
just exposes an <span class="caps">API</span> that looks like an instruction set, and that’s what
gets translated to machine code.</p>
<p>The <span class="caps">JIT</span> is going to call the <span class="caps">API</span> of Lightening, creating more complex
operations by combining Lightening’s instructions and Lightening is going to
convert those operations to their machine code by a simple translation, filling
the array of numbers and returning its address as a function pointer we can
call later.</p>
<p>Of course, it is much more complex than that because it needs to solve many
other problems we’ll talk about later, but that’s the idea. And the idea
doesn’t sound too difficult, once you have in mind what we talked about previously.</p>
<h3 id="problems">Lessons learned</h3>
<p>There are many problems that a machine code generation library like that can
encounter, but they are not exclusive to those libraries. These kinds of
problems can also appear in compilers, assemblers and many other things.</p>
<p>The lessons I learned come as problems I encountered during these days of
digging and implementing, with some possible solutions or thoughts about them.</p>
<h4 id="large-imm">Problem: Large immediates</h4>
<p>Large <em>immediates</em> are one of the most obvious but boring issues in this world,
and they apply to many cases.</p>
<p>In the <a href="#what">example above</a> we encoded an <code>addi</code> instruction that added <code>56</code>,
an <em>immediate</em>, to a register, and we said the <em>immediate</em> had a 12 bit space
in the instruction. Registers in <span class="caps">RISC</span>-V are 32 bit (in <span class="caps">RV32</span>) or 64 bit (in
<span class="caps">RV64</span>) wide, so we can work with larger values, but we are limited to <code>12</code>-bit
immediates in <code>addi</code> and all the other <code>I</code>-type instructions.</p>
<p>Why is that? Well, <span class="caps">RISC</span>-V instructions are 32 bit and they need to be able to
pack much more information than the <em>immediate</em> they use, so the <em>immediates</em>
can’t be as large as we want. The fixed instruction size is a design decision
that keeps the processor simple, but other processors have other design
decisions<sup id="fnref:x86"><a class="footnote-ref" href="#fn:x86">4</a></sup> around this.</p>
<h5 id="multi-inst">Solution: multiple instruction expansion</h5>
<p>There are several solutions for this, but the most obvious one is to use more
than one instruction to operate on an immediate.</p>
<p>If we want to load a 64-bit value, we can add, shift left, add, shift left,
add… until we fill a whole register with the value we were looking for.</p>
<p>This means a simple addition can be expanded to many instructions. In some
cases they are going to be just a few, but as the <em>immediates</em> get big we may
need more than eight instructions, very well encoded, to write the immediate to
a register and be able to operate with it.</p>
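<p>For the 32-bit case the classic expansion is a <code>lui</code> plus an <code>addi</code>. The tricky part is that <code>addi</code> sign-extends its 12-bit <em>immediate</em>, so the upper part has to compensate. A sketch (the <code>expand_li</code> helper is hypothetical; real assemblers do something like this when expanding the <code>li</code> pseudoinstruction):</p>
<pre class="highlight"><code class="language-clike">#include<stdint.h>
#include<stdio.h>

// Split a 32-bit constant into a lui+addi pair
void expand_li(int32_t value) {
    // Sign-extend the low 12 bits; if bit 11 is set this is negative...
    int32_t lo = ((value & 0xFFF) ^ 0x800) - 0x800;
    // ...so the upper 20 bits absorb the difference
    uint32_t hi = ((uint32_t)(value - lo)) >> 12;
    printf("lui  a0, 0x%x\n", hi);
    printf("addi a0, a0, %d\n", lo);
}

int main(void) {
    expand_li(0x12345FFF);  // bit 11 set: lui 0x12346, addi -1
    return 0;
}
</code></pre>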
<h5 id="constants">Solution: constant insertion</h5>
<p>This is not a solution we can use everywhere, but we can use it in the context
we are in right now (code is stored in memory and all that, remember).
Consider this <span class="caps">RV64</span> code:</p>
<pre class="highlight"><code class="language-clike">auipc t0, 0 // x[t0] = PC + 0
ld t0, 12(t0) // x[t0] = mem[ x[t0] + 12 ]
jal zero, 3 // PC = PC + 3
0xAAAAAAAAAAAAAAAA // This is a 64 bit literal
addi t0, t0, 1 // x[t0] = x[t0] + 1
// What's the value of t0 here?
</code></pre>
<p>The code has some comments on the right that I’m going to use through the whole
post, so get used to them. The <code>x</code> means register access (base registers are
called X registers in <span class="caps">RISC</span>-V), and <code>mem</code> is memory. <code>PC</code> (the program counter) is
written in uppercase and not as if it were a register because it’s not
accessible by the programmer: we need to treat it as a global variable we can
only set using jumps or read using <code>auipc</code>.</p>
<p><span class="caps">RISC</span>-V instructions are 32 bit long (4 bytes), so you can get what the offset
in the <code>ld</code> instruction does, right?</p>
<p>Basically we are loading a doubleword (<code>ld</code>) at the position of the
<code>0xAAAAAAAAAAAAAAAA</code> in the <code>t0</code> register and adding <code>1</code> to it. So the answer
to the question is <code>0xAAAAAAAAAAAAAAAB</code>.</p>
<p>But can you see the trick we are using?</p>
<p>The <code>jal</code> instruction is jumping over the constant so we can’t execute it by
accident (which would cause an <code>Illegal Instruction</code> error), and using the <code>ld</code>
instruction we are able to load a big constant to a register. A constant which
<strong>is mixed with the code</strong>, as any immediate would be, but without being
associated with any instruction.</p>
<p>If we know the code we are generating is a function, we can always wait until
the return instruction and insert all the constants after it, so they are
perfectly separated and we don’t insert jumps to avoid executing the constants
by accident. For that case, we need to change the values of the <code>auipc</code> and the
<code>ld</code> accordingly, making them point to the correct address, which has some
associated issues we need to talk about now.</p>
<hr>
<div style="
text-align: center;
font-size: smaller;
padding-left: 3em;
padding-right: 3em;
padding-top: 1em;
padding-bottom: 1em;
border-top: 1px solid var(--border-color);
border-bottom: 1px solid var(--border-color)">
Keep in mind you can hire <a href="https://elenq.tech">ElenQ
Technology</a> if you like this kind of material. <br/>
We teach with this mixture of passion and awkward charisma. We also code and
research.
</div>
<hr>
<h4 id="addr-off">Problem: Unknown addresses and offsets</h4>
<p>Addresses and offsets are a pain in the ass because you may not know them
when you expect to.</p>
<p>Let’s consider an unconditional jump like the one of the previous example. The
number we introduce is the amount of instructions to jump from the program
counter: an offset. The <em>immediate</em> offset can be positive, for forward jumps,
or negative, for backward jumps.</p>
<pre class="highlight"><code class="language-clike">jal zero, 3 // PC = PC + 3
</code></pre>
<p>Generating this jump assumes that you know where you need to jump: you want to
jump 3 instructions to <em>the future</em>.</p>
<p>But imagine you are assembling a file, a real assembly file that is not an
oversimplification of the assembly, like what we did in the previous example. A
real assembly file with <em>labels</em>:</p>
<pre class="highlight"><code class="language-clike">add a0, a0, t1 // I don't care about this instruction
j end // Unconditional jump to `end`
// Some code here
end: // Label `end`
ret // return
</code></pre>
<p>If you are assembling this file line by line, you can actually assemble the
<code>add</code> in the first line, because you know <em>everything</em> from it, but you are
unable to emit the <code>j end</code> because you don’t know where <code>end</code> is <em>yet</em>.</p>
<p>If this assembly is written in a file you can always preprocess the whole file,
get the labels, associate them with their addresses and then assemble the whole
thing, but you are not always in this situation.</p>
<p>Lightening, for instance, generates the code as you call the <span class="caps">API</span>, so it doesn’t
know where your jump points to until you call the <span class="caps">API</span> for the label later.</p>
<p>Compilers may encounter this issue too, when they are using separate
compilation and linking steps. You must be able to compile one source file on
its own but you may not know where global variables appear, because they
might be in a different file, and you only know that at link time.</p>
<h5 id="relocs">Solution: relocations</h5>
<p>There’s one simple way to solve it: introduce a fake offset or address and
patch it later, when we know the position of the symbol. That’s what
relocations do.</p>
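<p>A toy version of the idea, sketched in C (a hypothetical helper, not the format any real toolchain uses): emit an instruction with a zeroed <em>immediate</em>, remember its position, and patch the field once the value is known:</p>
<pre class="highlight"><code class="language-clike">#include<stdint.h>
#include<stdio.h>
#include<stddef.h>

// Patch the 12-bit immediate field of an already-emitted I-type instruction
void patch_itype_imm(uint32_t *code, size_t idx, int32_t imm) {
    code[idx] = (code[idx] & 0x000FFFFF)          // keep rs1/funct3/rd/opcode
              | (((uint32_t)imm & 0xFFF) << 20);  // write the immediate
}

int main(void) {
    uint32_t code[1];
    // Emit addi a0, zero, 0 -- the immediate is not known yet
    code[0] = (0u << 20) | (0u << 15) | (0u << 12) | (10u << 7) | 0x13u;
    size_t reloc = 0;  // remember what needs patching, like a relocation

    patch_itype_imm(code, reloc, 56);  // later: now we know the value
    printf("0x%08x\n", code[0]);       // same word as addi a0, zero, 56
    return 0;
}
</code></pre>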
<h6 id="relocs-c">Example: C compilers</h6>
<p>Relocations are a mechanism to pass information between the compiler and
the linker; you can actually see them in the object files generated by your
compiler. Make a simple file with a global variable and compile it. Something
like this:</p>
<pre class="highlight"><code class="language-clike">int global_symbol;
int main(int argc, char* argv[]){
    return global_symbol != 0;
}
</code></pre>
<p>If you compile it with <code>gcc -c</code>, you can inspect relocations in the result with
<code>objdump</code>, using the <code>-r</code> flag alongside <code>-d</code> to disassemble. In <span class="caps">RISC</span>-V
you’ll find things like <code>R_RISCV_HI20</code> or <code>R_RISCV_LO12</code> where the relocations
are located. They are ways to encode <em>immediates</em> in <code>U</code>-type instructions and
<code>I</code>-type instructions respectively. In my case I get something like this (it’s
not the full result):</p>
<pre class="highlight"><code class="language-clike"> 6: 00000797 auipc a5,0x0
6: R_RISCV_PCREL_HI20 global_symbol
6: R_RISCV_RELAX *ABS*
a: 00078793 addi a5,a5,0x0
a: R_RISCV_PCREL_LO12_I .L0
a: R_RISCV_RELAX *ABS*
e: 639c ld a5,0(a5)
</code></pre>
<p>There are two types of relocations here, but we are going to talk about
<code>R_RISCV_RELAX</code> later. You can see my relocations have <code>PCREL</code> in the middle:
that just means they are relative to the program counter.</p>
<p>If you just inspect the binary with the <code>-d</code> you won’t see the relocations and
the result will look like nonsense code<sup id="fnref:nonsense-code"><a class="footnote-ref" href="#fn:nonsense-code">5</a></sup>:</p>
<pre class="highlight"><code class="language-clike"> 6: 00000797 auipc a5,0x0 // x[a5] = PC + 0
a: 00078793 addi a5,a5,0x0 // x[a5] = x[a5] + 0
e: 639c ld a5,0(a5) // x[a5] = mem[ x[a5] + 0 ]
</code></pre>
<p>This adds <code>0</code> to the program counter and stores the result in <code>a5</code>, then adds <code>0</code>
to <code>a5</code>, and loads a doubleword into <code>a5</code> from the address in <code>a5</code>. But the
address in <code>a5</code> at the moment of the load is nothing but the program counter at
the <code>auipc</code> instruction. Weird.</p>
<p>The relocation is going to point to the <code>auipc</code> and the <code>addi</code>, and tell the
linker it has to replace the zeros with another value. Which one? The address of
the global variable. If we replace the zeros with a combination that is able to
load the address of the global variable the code will work. That’s what the
relocation does here.</p>
<p>So, as we don’t know where to point, we insert anything (zeros) and we fix the
instructions once we know where they need to point.</p>
<h6 id="relocs-lightening">Example: Lightening</h6>
<p>The same approach is followed in Lightening, and you can follow it in your own
assembler, library or anything that has a similar problem. Let’s consider some
code using Lightening (obtained from <code>tests/beqr.c</code>, comments added by me):</p>
<pre class="highlight"><code class="language-clike">// Make a function that loads two arguments
jit_load_args_2(j, jit_operand_gpr (JIT_OPERAND_ABI_WORD, JIT_R0),
jit_operand_gpr (JIT_OPERAND_ABI_WORD, JIT_R1));
jit_reloc_t r = jit_beqr(j, JIT_R0, JIT_R1); // branch if equal registers
jit_leave_jit_abi(j, 0, 0, align); // end ABI context
jit_reti(j, 0); // return 0
jit_patch_here(j, r); // make the branch jump here
jit_leave_jit_abi(j, 0, 0, align); // end ABI context
jit_reti(j, 1); // return 1
// Obtain the function we created
jit_word_t (*f)(jit_word_t, jit_word_t) = jit_end(j, NULL);
// Test if it works
ASSERT(f(0, 0) == 1); // 0 == 0 so it jumps -> returns 1
ASSERT(f(0, 1) == 0); // 0 != 1 so it doesn't jump -> returns 0
</code></pre>
<p>In this example we see how we generate machine code statement by statement, so
there’s no way to know where the <code>beqr</code> needs to jump until we have generated
all the code before its target.</p>
<p>You see the <code>beqr</code> function doesn’t receive the target address or offset as an
argument, but it returns a <code>jit_reloc_t</code>, which other functions like <code>reti</code>
don’t return.</p>
<p>That <code>jit_reloc_t</code> is what we patch later with <code>jit_patch_here</code>,
indicating where the branch needs to jump. The <code>jit_patch_here</code> function is going
to correct the bits we set to zero because we didn’t know the target at that moment.</p>
<p>There are different kinds of relocations, as it happened in the previous
example with the compilers, because different instruction formats need to be
patched in different ways. In the case of Lightening, the relocation has a type
associated with it, so we can check and act accordingly.</p>
<h4 id="jumps">Problem: Long jumps</h4>
<p>As we saw, some jumps encode the target as an <em>immediate</em>. This has a couple of
implications that we described previously:</p>
<ul>
<li>The jump target could be larger than the space we have for the immediate.</li>
<li>Sometimes we can’t know the target until we reach the position where the jump
points to.</li>
</ul>
<p>Both issues can be combined together in a killer combo. Consider this code:</p>
<pre class="highlight"><code class="language-clike">j label // jump to label
// A lot of instructions here
label:
// this is the target of the jump
</code></pre>
<p>In <span class="caps">RISC</span>-V the <code>j</code> pseudoinstruction is resolved to <code>jal</code>, which has a <code>21</code> bit
(signed) space for the jump target. If we have a hella lot of instructions
between the jump and the target we may need more bits for the jump than the
space we actually have.</p>
<p>Again, in the case where we can preprocess everything there’s no problem, but if
we are assembling the instructions as they come we are going to have issues.
We realize too late that we can’t jump that far, because by the time we reach
the label we have already inserted a 21 bit jump and too many instructions.
Patching the jump is not enough, because we didn’t leave enough space to insert
the offset we need.</p>
<h5 id="always-largest">Solution: always insert the largest jump possible</h5>
<p>There’s an obvious solution: always insert the largest possible jump and patch
the whole jump later.</p>
<p>In <span class="caps">RISC</span>-V <code>jalr</code> jumps to the absolute address that is stored on a register
with an optional 12 bit (signed) offset. Combined with the <code>auipc</code> (add upper
immediate to program counter) it lets us make 32 bit relative jumps in just 2
instructions. Let’s explain that in code just in case:</p>
<pre class="highlight"><code class="language-clike">auipc t0, offset_hi // x[t0] = PC + (offset_hi<<12)
jalr zero, offset_lo(t0) // PC = x[t0] + offset_lo
</code></pre>
<p>If we consider <code>offset</code> as a value we know, we can split it in two blocks: the
highest 20 bits as <code>offset_hi</code> and the lowest 12 bits as <code>offset_lo</code>, and use
them to jump to any address in the 32 bit range from the current position, using
just 2 instructions.</p>
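As a sanity check, the split can be written down in a few lines of Python (a sketch; <code>split_offset</code> is a hypothetical helper, not part of any toolchain). The subtle part is that <code>jalr</code> sign-extends its 12 bit immediate, so when bit 11 of the low part is set we have to bump the upper part to compensate:

```python
def split_offset(offset):
    # Low 12 bits, reinterpreted as a signed value (jalr sign-extends).
    lo = offset & 0xFFF
    if lo & 0x800:
        lo -= 0x1000
    # Whatever remains goes into auipc's 20 bit upper immediate.
    hi = (offset - lo) >> 12
    assert (hi << 12) + lo == offset
    return hi, lo

# Example: an offset whose low part has bit 11 set.
print(split_offset(0x1801))  # hi = 2, lo = -0x7FF
```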
<p>In 32 bit machines, this jump is the largest jump possible, because the machine
can only address 32 bits, so we can be sure that any relative (or absolute,
using <code>lui</code> instead of <code>auipc</code>) jump we want to make will fit in place. The only
thing we have to take into account is to patch both instructions when we find the
target, not only one.</p>
<h6 id="relaxation">Optimization: pointer relaxation</h6>
<p>But using the largest possible jumps can lead to inefficiencies because we use
two instructions for jumps that can potentially fit in just one.</p>
<p>We can use something we saw before for that: relocations. More specifically,
in the case of the <span class="caps">GCC</span> toolchain, we can use the <code>R_RISCV_RELAX</code> that appeared before.</p>
<p>The relaxation relocation is going to tell the next step, which can be the
linker or anything else depending on the context we are working in, that the
pointer can be relaxed: in the case of the <code>auipc</code> + <code>jalr</code>, possibly by
replacing both instructions with a 21 bit jump like <code>jal</code>.</p>
<p>So we start with the longest jump possible, but when we actually know the
target of the jump, we can reduce it to something smaller that needs fewer instructions.</p>
<h6 id="relax-example">Example: relaxed global variable access in C compilers</h6>
<p>Global variables, as we saw before, are some of those points where compilers
need to use relocations and let the linker clean the result.</p>
<p>Global variables don’t necessarily involve jumps but they do involve pointers
for the loads and stores needed to operate with them. In the final executables,
global variables are part of the <code>.data</code> segment, because they are known at
compilation time, so we can exploit that fact a little and relax our weird
<code>auipc</code> + <em>load/store</em> combos.</p>
<p><span class="caps">RISC</span>-V has many registers, so we can use them for things that may not be the
norm in other platforms where registers are scarce. In this case, we can
exploit the <code>gp</code> (global pointer) register on <span class="caps">RISC</span>-V to improve access to
global variables. We can cache the address of the <code>.data</code> segment of the
program in the <code>gp</code> register and, as we know most global variables are
going to be near (within a 12 bit offset of) the beginning of the <code>.data</code> segment, we
are probably going to be able to remove some of the <code>auipc</code>s we inserted before.</p>
<p>So a simple load of a global 64 bit variable to a register:</p>
<pre class="highlight"><code class="language-clike">auipc t0, offset_to_global_hi // x[t0] = PC + offset_to_global_hi << 12
ld t0, offset_to_global_lo(t0) // x[t0] = mem[ x[t0] + offset_to_global_lo ]
</code></pre>
<p>Is optimized to this:</p>
<pre class="highlight"><code class="language-clike">ld t0, offset_from_data(gp) // x[t0] = mem[ x[gp] + offset_from_data ]
</code></pre>
<p>Of course, the offsets have to be calculated and all that, but this is not that difficult.</p>
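The decision itself can be sketched like this in Python (hypothetical helper and register choices; the real relaxation happens inside the linker, guided by <code>R_RISCV_RELAX</code>):

```python
def relax_global_load(var_addr, gp, pc):
    # If the variable is within gp's signed 12 bit reach, one
    # gp-relative load is enough; otherwise keep the auipc + ld pair.
    off = var_addr - gp
    if -2048 <= off < 2048:
        return [("ld", "t0", off, "gp")]
    # Out of reach: fall back to the PC-relative two-instruction form.
    pcrel = var_addr - pc
    lo = pcrel & 0xFFF
    if lo & 0x800:
        lo -= 0x1000
    hi = (pcrel - lo) >> 12
    return [("auipc", "t0", hi), ("ld", "t0", lo, "t0")]
```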
<h5 id="veneer">Solution: Veneers</h5>
<p>There are other solutions that don’t involve aggressively messing around with
the code we generated earlier, like removing instructions, which can be
pretty bad because you have to shift the array of instructions you generated to
close the gaps the pointer relaxation leaves.</p>
<p>Veneers are non-destructive, and they involve no instruction reorganization, so
they are interesting for those cases where you need to generate the code as you go.</p>
<p>Let’s explain them with an example:</p>
<pre class="highlight"><code class="language-clike">beq a0, a1, branch // Jump to `branch` if x[a0] == x[a1]
// Instructions...
branch:
// Branch target
</code></pre>
<p>As we saw previously, if we insert too many instructions between the jump
and the target we screw it. What we didn’t mention is that as we assemble
instructions one by one we can keep track of the amount of instructions we
are inserting.</p>
<p>Having that in mind, we can take decisions in time, right before it’s too late.
We can combine that knowledge with the <a href="#constants">constant insertion</a> method
introduced before to insert full-range jumps if needed, right before we exhaust
the possible offset of the original instruction.</p>
<p>Of course, we need to patch the original instruction to jump to the code we are
just going to insert, and we need to add some protections around the veneer to
make it only accessible to the original jump.</p>
<pre class="highlight"><code class="language-clike">beq a0, a1, veneer // Jump to `veneer` if x[a0] == x[a1]
// Many instructions, but not too many!
// Here we realize we are running out of offset range so we insert a helper
// block that lets us jump further.
j avoid // Jump to `avoid` so the normal execution flow
// doesn't fall in the veneer
veneer:
auipc t0,0 // x[t0] = PC + 0
ld t0,12(t0) // x[t0] = mem[ x[t0] + 12 ]
jalr zero,0(t0) // PC = x[t0]
ADDRESS(branch) // Literal address of `branch` label
avoid:
// As many instructions as we want
branch:
// Branch target
</code></pre>
<p>As it happened with constant insertion, there are positions where the veneer
insertion can be optimized a little, like right after a return or an
unconditional jump, so we don’t need the protection (<code>j avoid</code> in the example).</p>
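To show the bookkeeping, here is a toy one-pass emitter in Python. It is purely illustrative: the instruction tuples and names are made up, and the range check is a simplified stand-in for the real ±4 KiB reach of RISC-V’s 13 bit conditional branches:

```python
BRANCH_RANGE = 4096  # conditional branches reach roughly +-4 KiB

def assemble(stream):
    # Items are tuples like ("beq", label) or ("nop",); every
    # instruction is 4 bytes.
    code, pending = [], []
    for item in stream:
        # Before emitting, check unresolved branches: if one is close
        # to the end of its reach, drop a veneer here and redirect it.
        for br in list(pending):
            if len(code) * 4 - br["pos"] > BRANCH_RANGE - 16:
                code.append(("j", "over"))  # protect the fall-through
                veneer_index = len(code)
                code.append(("long-jump-to", br["label"]))  # the veneer
                code[br["index"]] = ("beq", f"veneer-{veneer_index}")
                pending.remove(br)
        if item[0] == "beq":
            pending.append({"pos": len(code) * 4,
                            "index": len(code),
                            "label": item[1]})
        code.append(item)
    return code
```

With a short program no veneer ever appears; with enough instructions between the branch and its target, the emitter drops one in just before the branch runs out of reach and rewrites the branch to point at it.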
<p>The bad thing about veneers is that they insert a bunch of instructions in the
out-of-range cases, and the jumps are done in two steps, which has a negative
effect on performance because they drop the pre-processed instructions <a href="https://en.wikipedia.org/wiki/Instruction_pipelining">in
the pipeline</a>.</p>
<p>Of course, the veneers themselves have to be patched too, because we won’t know
the target (<code>branch</code> in the example) until we reach it. But, in the case of the
veneer we can be 100% sure that we are going to be able to point to the target.</p>
<h6 id="lightening-veneer">Example: Lightening’s constant pools</h6>
<p>Lightening uses veneers for the jumps<sup id="fnref:veneer-if-needed"><a class="footnote-ref" href="#fn:veneer-if-needed">6</a></sup>, but they are part of
Lightening’s constant pool mechanism. Constant pools work the same for
constant insertion as for veneers, because veneers are basically constants.
Remember, code is numbers!</p>
<p>Basically anything that might be inserted as a constant, which can be a veneer
or just a number or whatever, is queued to the constant pool. The library is
going to emit instructions and check on each instruction if it needs to emit
any of the constants of the pool.</p>
<p>The constant pool and each of the entries in the pool have associated
information that tells the emitter whether they need to be emitted now or
whether they can wait, so the emitter can decide when to insert them.</p>
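A stripped-down model of that queue might look like this (invented names and structure; Lightening’s real implementation is in C and considerably more involved):

```python
class ConstantPool:
    # Each entry records a deadline: the last emitting position at
    # which it is still in range of the instruction that refers to it.
    def __init__(self):
        self.entries = []  # list of (deadline, value)

    def add(self, value, here, reach):
        self.entries.append((here + reach, value))

    def flush_due(self, here, margin=16):
        # Return the entries that must be emitted now, keep the rest.
        due = [v for d, v in self.entries if here + margin >= d]
        self.entries = [(d, v) for d, v in self.entries
                        if here + margin < d]
        return due
```

The emitter calls <code>flush_due</code> before each instruction; anything returned gets written into the instruction stream right there.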
<p>The literal pool entries have, of course, an associated relocation that
contains information about the original jump or load instructions we may need to
patch, as we already saw. So, in the case of a veneer emission, we need to
patch the original jump to point to the veneer and remember that the veneer needs
to be patched later, when we find its target.</p>
<p>The mechanism is not complex, but it’s not simple either. There are several
kinds of relocations, depending on what we want to do with them, different kinds
of patches we need to apply, address calculations and all those things that
require a level of attention to detail I’m not prepared to talk about.</p>
<h4 id="reg-access">Problem: Register access</h4>
<p>You may have seen a problematic point in some of the solutions we worked with:
we are using registers.</p>
<p>It’s not a problem by itself, but using registers might be really problematic
if we are inserting code between the instructions someone else wrote: we
can’t control the register use of the original program, so we might be
changing the values inside the registers with the magic tricks we sneakily inserted.</p>
<p>Imagine we use, say, the <code>t0</code> register in our veneer but the original program uses
that register for something else. That’s a problem. We are messing with the
value in the register and potentially (surely) breaking the program.</p>
<h5 id="stack">Solution: use the stack</h5>
<p>The most obvious solution you can think of is to use the stack. We can surround
our veneers or code insertions with some protection code that saves the values
of the registers on the stack and restores them when we finish.</p>
<p>It’s a simple solution in your mind, but if you need to deal with jumps it
can get messy: you may need to restore the register far away in the code and
keep track of everything. It can be complicated.</p>
<p>On the other hand, memory access is slow and boring and we don’t like that kind
of thing in our lives. We need more dynamite.</p>
<h5 id="controlled-regs">Solution: controlled register access</h5>
<p>The other solution we can provide is to keep control of the registers that are
being accessed and use others for our intervention.</p>
<p>A simple way to do this is to provide functions to get and release temporary
registers, instead of letting the programmers do whatever they want. This makes
sure that all the register access is controlled and we are not changing the
values of any register in use.</p>
<p>The main problem comes when the programmer needs all the registers
for their things and then we can’t really use any for our magic tricks. But we
can always keep at least one register for us and only for us (throwing an error
at the programmer when they use it) or even combine the use of the stack with
this solution.</p>
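A get/release interface can be as small as this sketch (invented names, not any real library’s API):

```python
class TempRegs:
    # Hand out temporary registers explicitly so inserted code can
    # never clobber a register the program is already using.
    def __init__(self, regs=("t0", "t1", "t2")):
        self.free = list(regs)
        self.used = set()

    def get(self):
        if not self.free:
            raise RuntimeError("no temporary registers available")
        reg = self.free.pop()
        self.used.add(reg)
        return reg

    def release(self, reg):
        self.used.discard(reg)
        self.free.append(reg)
```

Any code the emitter inserts asks <code>get</code> for its scratch registers and calls <code>release</code> when it is done, so two insertions can never silently share one.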
<p>If we are directly working with assembly code, where we can’t force the
programmer to use the interface we want, we can choose the solutions that don’t
involve register access so we don’t need to analyze the code to deduce if the
programmer is using the registers or not. Avoiding the problem is sometimes the
best solution.</p>
<p>In the case of libraries like Lightening, controlled register access is a must
because the programmer can’t control how its (virtual) instructions are translated
to machine code instructions: each machine has its own peculiarities and
details. In many cases they need to make use of temporary registers and, as the
instructions are built incrementally, preventing instructions from peeing on
each other is important.</p>
<hr>
<div style="
text-align: center;
font-size: smaller;
padding-left: 3em;
padding-right: 3em;
padding-top: 1em;
padding-bottom: 1em;
border-top: 1px solid var(--border-color);
border-bottom: 1px solid var(--border-color)">
Please, consider supporting me on <a href="https://liberapay.com/ekaitz">Liberapay</a> to
encourage my free software work.
</div>
<hr>
<h3 id="final">Final thoughts</h3>
<p>I know these are just a few things, but they are enough to let you make your
first program that involves machine code generation to a certain level.</p>
<p>I’m not a computer scientist but a telecommunication engineer<sup id="fnref:engineer"><a class="footnote-ref" href="#fn:engineer">7</a></sup>, so I
may put the focus on things that are obvious to the average reader of this
kind of post, while at the same time flying over things I consider basic
due to my studies but the average reader doesn’t. In any case,
feel free to <a href="https://ekaitz.elenq.tech/pages/about.html">contact me</a> if you
have questions or corrections.</p>
<p>Some of the tricks and lessons I included here are more important than others,
but the most important thing is to start thinking in these terms. Try to
understand the problems you face when you have separate compilation, assume
the fact that you can’t know the future… The mindset is the most important
point of all this and, once you have it, everything comes easier.</p>
<p>It’s also a lot of fun to realize code is just numbers in memory you can mess
around with. I hope you keep it in your brain forever.</p>
<p>I hope this post sheds some light on the dark hole that machine code
generation is, and makes you try to make your own findings in this beautiful
area of compilers, machines and extremely long blog entries.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:harvard">
<p>One of those peculiarities is the <a href="https://en.wikipedia.org/wiki/Harvard_architecture">Harvard
Architecture</a> that is not
going to let us make the fantastic trick I’m going to show you now. Harvard
Architecture is popular on microcontrollers. <a class="footnote-backref" href="#fnref:harvard" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:lisp">
<p>LISPers are always right. <a class="footnote-backref" href="#fnref:lisp" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:guile">
<p>You can read more about <a href="https://www.gnu.org/software/guile/manual/html_node/Just_002dIn_002dTime-Native-Code.html">how it works here</a>. <a class="footnote-backref" href="#fnref:guile" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:x86">
<p>In x86 not all the instructions have the same length and some can
encode larger <em>immediates</em>. <a class="footnote-backref" href="#fnref:x86" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
<li id="fn:nonsense-code">
<p><code>addi a5,a5,0x0</code> adds 0 to <code>a5</code> and stores the result in <code>a5</code>,
so it just moves it. <span class="caps">RISC</span>-V has a pseudoinstruction for that: <code>mv a5,a5</code>,
which expands to the <code>addi</code>. <code>objdump</code> is going to write <code>mv</code> in its
output, because it tries to be clever, but that hides the actual
instruction we have. I changed it to the actual instruction so we can
understand this better. <a class="footnote-backref" href="#fnref:nonsense-code" title="Jump back to footnote 5 in the text">↩</a></p>
</li>
<li id="fn:veneer-if-needed">
<p>Only in the architectures that need them. <code>x86</code> does not
need constant pools or veneers because the <span class="caps">ISA</span> is complex enough to handle
the problematic cases, adding levels of complexity ISAs like <span class="caps">RISC</span>-V or <span class="caps">ARM</span>
didn’t want to deal with. <span class="caps">RISC</span> vs <span class="caps">CISC</span>, y’know… <a class="footnote-backref" href="#fnref:veneer-if-needed" title="Jump back to footnote 6 in the text">↩</a></p>
</li>
<li id="fn:engineer">
<p>So, for all that software developers that write blog posts like
“Are we really engineers?” or stuff like that: <strong>I am</strong>, thanks for the
interest. <span class="caps">LOL</span> <a class="footnote-backref" href="#fnref:engineer" title="Jump back to footnote 7 in the text">↩</a></p>
</li>
</ol>
</div>
<h2>RISC-V Adventures II: hex0</h2>
<p>2021-06-08, by Ekaitz Zárraga</p>
<p>A love story about trust, machine code, hexadecimal notation and weird
instruction formats, with an epic unexpected solution coming back from the afterlife.</p><p><a href="https://github.com/oriansj/stage0">Stage0</a> is a crazy project that is pretty
well aligned with our vision of trust, bootstrappable software and whatnot.</p>
<p>During the last two weeks we have been working on the port of Stage0 to
<span class="caps">RISC</span>-V, providing the very first step of the process, so we came here to talk
about it, including a fantastic software necromancy moment you are going to enjoy.</p>
<h3>The origin of the times</h3>
<p>Once upon a time, software was written in machine code. Directly expressing the
machine instructions by the hands of the <em>programmers</em>. That was long, long,
time ago.</p>
<p>One day some programmer decided to write a translator that mapped that machine
code to something more human readable and created what we call <em>assembly
language</em> today. It gained popularity and programmers decided to add more and
more functionalities to the <em>assembly language</em> until the point that what they
created was not a one-to-one mapping with machine code anymore.</p>
<p>That’s how the first <em>programming languages</em> were born.</p>
<p>Their power was so immense that programmers decided to rewrite all their tools
using the new <em>programming languages</em>, they even wrote newer <em>programming
languages</em> with them.</p>
<p>But power corrupts the mind of the fool. Blinded by the power of <em>programming
languages</em>, most of the programmers forgot the origin of the times, and
forgetting the history is always a mistake.</p>
<p><em>Epic music starts…</em></p>
<h3>The problem</h3>
<blockquote>
<p>Warning: I oversimplified during the beginning of the post, but now… Oh
boy! I’m going to flatten this shit.</p>
</blockquote>
<p>Well, we need auditable software. I don’t think anyone can deny that fact.</p>
<p>But what does “auditable software” mean? Isn’t free software enough?</p>
<p>It is true that the best way to audit stuff is to read the code of the
programs. It’s the classic way to know if a program is doing what we
want it to. But, how can you be really sure the code you are reading is the one
that ships with your program?</p>
<p>You can’t! In general it’s impossible to know. There are many reasons, but I
will oversimplify and give you just some thoughts and let the people from
<a href="http://bootstrappable.org/">bootstrappable</a> do the dirty job.</p>
<ol>
<li>
<p>The compilation process is not reproducible, so the same source can result
in different binaries. You can’t just compare different binaries to make
sure your compiler is compiled correctly.</p>
</li>
<li>
<p>We have no way to solve the chicken-egg problem. The recipe to build the
compiler version X is to get the sources of the compiler version X and
compile them with the compiler version X-1. But how do you get the compiler
version X-1? Rinse and repeat.<br>
Also… Where’s the first version of your compiler? Does it run in modern
machines with modern operating systems?</p>
</li>
<li>
<p>As there’s no real way to get your compilers compiled by yourself, there’s
no real way to be sure that the compilers are emitting the code they are
supposed to. You have to <a href="https://www.cs.umass.edu/~emery/classes/cmpsci691st/readings/Sec/Reflections-on-Trusting-Trust.pdf"><strong>trust
them</strong></a>,
and you can’t audit what you need to trust.</p>
</li>
</ol>
<p>The first point is where projects like Nix and Guix make sense: they try to
create reproducible stuff. Not only for compilation processes but also for
scientific studies (that have to be reproduced by other people, because…
that’s how science works, isn’t it?) and other things. Being able to create
identical environments where you can ensure that their (compilation) output is
going to be identical is extremely important, but I’ll leave that for now.</p>
<p>The second and third points are two different problems but they result in the
same thing: software distributions ship huge binary blobs (hundreds of megabytes) as
a starting point (bash, gcc…) so the users have no chance to check if those
binaries are corrupt.</p>
<p><a href="https://www.gnu.org/software/mes/"><span class="caps">GNU</span> Mes</a> is a Scheme interpreter and C
compiler that was designed to reduce the size of the binaries you need to ship
with your distro. Mes has successfully reduced the size of the binaries that
need to be shipped with distros like Guix, but the project is more ambitious
than that.</p>
<h3>Full-source bootstrap</h3>
<p><a href="https://savannah.nongnu.org/projects/stage0/">Stage-0</a> is a project that is
tackling the same “trusting trust” problem but from the opposite perspective,
starting at the low level rather than with the high-level approach that <span class="caps">GNU</span> Mes uses.</p>
<p>Both projects work together to provide a greater goal: the full-source
bootstrap. The whole bootstrap is started from source, with no binaries
involved, so the distros don’t need to ship binaries anymore.</p>
<p>But how is that?</p>
<h3>Hex0</h3>
<p>Stage0 starts in the low level, the lowest possible, and builds more complex
programs from there, step by step.</p>
<p>The first step, Hex0, is a self-hosting “assembler”. I put the word assembler
in quotes because I think it’s a very strong word for this: it’s an <span class="caps">ELF</span> file
written in hexadecimal, with extra comments.</p>
<p>Hex0 is able to compile itself to a binary <span class="caps">ELF</span> file, converting the Hexadecimal
values to the binary values and stripping the comments.</p>
<p>We still have to compile the first Hex0 with something, but that’s not as
difficult as compiling the first <span class="caps">GCC</span>: Hex0 can be compiled by
very simple programs or even by hand, because it contains literally what is
going to be written in the final <span class="caps">ELF</span>.</p>
<p>The comments on the Hex0 files describe the instructions on each of the lines
of the <span class="caps">ELF</span> file so the resulting files can be audited, instruction by
instruction, with the manual of the <span class="caps">ISA</span> as a reference.</p>
<p>This starting point is more than enough to build on top of. We just need to add
more functionalities to the next steps: labels, constants… Until we are able
to compile a simple C compiler, Mes or anything.</p>
<p>It’s a clever solution for a crazy problem.</p>
<h3>Hex0 in <span class="caps">RISC</span>-V: my experience</h3>
<p>So this is my blog and I come here to talk about myself!<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup></p>
<p>Some weeks ago I had the chance to make that first step, Hex0, for <span class="caps">RISC</span>-V (64
bit). You can take a look to <a href="https://github.com/oriansj/bootstrap-seeds/pull/2/files">the code here</a>.</p>
<p>There you can see I added three files: the assembly file, the hex0 file and the
binary of the compiled hex0. They are basically the same thing, but they are
included for readability.</p>
<h4>The assembly</h4>
<p>The first step for me was to write the assembly file. It’s easy once you know
how to make system calls in <span class="caps">POSIX</span>.</p>
<p><strong><span class="caps">POSIX</span> system calls in <span class="caps">RISC</span>-V</strong> are pretty easy:</p>
<ul>
<li>Load the arguments for the call in registers <code>a0</code>, <code>a1</code>…</li>
<li>Load the <em>syscall number</em> in <code>a7</code></li>
<li>Run <code>ecall</code></li>
</ul>
<p>The result of the system call comes in <code>a0</code>.</p>
<p><strong>Input arguments</strong> are also important, because we need to be able to tell
Hex0 which file we want to compile and where to put its output.</p>
<p>That’s pretty easy: input arguments are placed on the stack, so we can load
them by <em>pop</em>-ing them. As in any C program, the first element we get is the
number of arguments and the rest are the arguments themselves.</p>
<p>Putting all together, if you take a look to the first block of the program:</p>
<pre class="highlight"><code class="language-asm">_start:
ld a0, 0(sp) # Get number of the args
ld a1, 8(sp) # Get program name
ld a2, 16(sp) # Input file name
...
# Open input file and store FD in s2
li a7, 56 # sys_openat
li a0, -100 # AT_FDCWD
mv a1, a2 # input file
li a2, 0 # read only
ecall
mv s2, a0 # Save fd in for later
</code></pre>
<p>In the first chunk we are reading the stack, element by element, and in the
second we are opening the input file, using the filename we just obtained from
the stack.</p>
<p>Simple.</p>
<p>As a note, I’d like to remind you to finish your program, because if you don’t
it will continue to execute the memory after it and it’ll explode in your face:</p>
<pre class="highlight"><code class="language-asm">terminate:
# Terminate program with 0 return code
li a7, 93 # sys_exit
li a0, 0 # Return code 0
ecall
</code></pre>
<p>This tells the <span class="caps">OS</span> to finish the execution.</p>
<p>The internals of the assembly file are simple, so I won’t explain them in detail.
It basically iterates character by character, removing the comments and
converting each pair of hex digits to a byte.</p>
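In Python, the core of that job could be sketched like this (a toy model of what the hex0 assembler does, assuming <code>#</code> and <code>;</code> comment markers; the real thing works byte by byte through syscalls):

```python
def hex0_compile(source):
    # Keep pairs of hex digits, drop comments, emit raw bytes.
    out = bytearray()
    digits = ""
    for line in source.splitlines():
        for marker in ("#", ";"):
            line = line.split(marker)[0]
        for char in line:
            if char in "0123456789abcdefABCDEF":
                digits += char
                if len(digits) == 2:
                    out.append(int(digits, 16))
                    digits = ""
    return bytes(out)
```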
<p>Read it and tell me if you need help understanding it!<sup id="fnref:contact"><a class="footnote-ref" href="#fn:contact">2</a></sup></p>
<h4>The conversion to the Hex0</h4>
<p>Hex0, as I said, is an <span class="caps">ELF</span> file, written in hexadecimal, so we need to compile
our assembly file to binary and represent each of the instructions in
hexadecimal. And we need to resolve all the labels to final addresses.</p>
<p>There’s no easy way to do it. I started doing it by hand, reading the <span class="caps">RISC</span>-V
spec and converting the instructions one by one. But I ran into several
difficulties doing that.</p>
<p>Pseudoinstructions are expanded to more than one instruction so we need to be
careful in the comments and explain that correctly. Also, we need to resolve
the addresses accordingly. For example:</p>
<pre class="highlight"><code class="language-asm">la a1, buffer
</code></pre>
<p>This is a pseudoinstruction, and we need to resolve it to:</p>
<pre class="highlight"><code class="language-asm">auipc a1, 0
addi a1, a1, $OFFSET
</code></pre>
<p>Where <code>$OFFSET</code> is the offset from that instruction to the label <code>buffer</code>.</p>
<p>These kinds of expansions change our perception of the amount of instructions we
have, so we have to be extremely careful. I didn’t even mention the case where
the offset is very large! That’s another story (thankfully we don’t have to
deal with that yet).</p>
<p>Once the pseudoinstructions are expanded we need to convert them to their hex
values, and I swear it’s the most boring task I ever did in my life. Basically
because <span class="caps">RISC</span>-V instructions are not easy to map to their binary
representation (for reasons related to the hardware implementation).</p>
<h5>The eagles are coming!</h5>
<p>But I had a trick, a deus ex machina that would save my life. During the last
months I’ve been randomly working on a Scheme compiler for <span class="caps">RISC</span>-V assembly and
that made me start making a <a href="http://git.elenq.tech/pysc-v/"><span class="caps">RISC</span>-V assembly interpreter and compiler in
python</a>. It’s still an early <span class="caps">WIP</span>, and was
almost abandoned, but it has the basic machinery that lets me compile simple
instructions to hex.</p>
<p>With this dirty glue code I was able to compile the instructions one by one:</p>
<pre class="highlight"><code class="language-python">from registers.RV32I import *
from InstructionSets.RV64I import *

Regs = RegistersRV32I()

def x(registerName):
    return Regs.getPos(registerName)

def compile(instruction):
    hexrepr = hex(instruction.compile().value)
    hexval = hexrepr[2:]
    if len(hexval) < 8:
        hexval = "0" * (8 - len(hexval)) + hexval
    final = ""
    for i in range(0, 8, 2):
        final += hexval[i:i+2] + " "
    final = final.rstrip().upper()
    return " ".join(reversed(final.split(" ")))
</code></pre>
<p>I just needed to open a python shell and write something like:</p>
<pre class="highlight"><code class="language-python">compile( addi(x("a0"), x("a1"), 12) )
</code></pre>
<p>And that would compile that instruction for me, giving me the output in a
beautiful hexadecimal format.</p>
<pre class="highlight"><code class="language-hex">13 85 C5 00
</code></pre>
<p>Not the best <span class="caps">UX</span> but usable enough for a small file like this.</p>
<h5>The addresses</h5>
<p>The addresses are still something to solve.</p>
<p>I’m an idiot so I counted the instructions by hand and then realized I had to
expand some pseudoinstructions I forgot, so all the branch instructions were
broken. <em>Yes, I’m like that</em>.</p>
<p>Try to be smarter than I am. Use this trick:</p>
<p>Leave all the instructions that use addresses set to a wrong address, like 0 or
something, until you have converted the whole file. Once you have that, resolve
the addresses. That way you’ll make sure every pseudoinstruction is expanded and
you’ll be able to use tools that will help you choose the addresses correctly.</p>
<p>The trick I used was to add the <span class="caps">ELF</span> header, compile the file and then inspect
the resulting binary.</p>
<p>For the compilation there are two choices: we can assemble the assembly file we
wrote previously into a binary and use it as the hex0 assembler, or we can use
the high-level C prototype that Stage0 provides. Either way, they have to give
the same result.</p>
<p>I still don’t know why <code>objdump</code> is unable to process the binaries of the hex0
files, but <span class="caps">GDB</span> can handle them, so… launch <span class="caps">GDB</span> as <a href="https://ekaitz.elenq.tech/lightening.html">I explained in the
previous post</a> and disassemble the
whole file<sup id="fnref:where"><a class="footnote-ref" href="#fn:where">3</a></sup>.</p>
<p>It’ll look like this:</p>
<pre class="highlight"><code class="language-asm">0x0000000000600078: ld a0,0(sp)
0x000000000060007c: ld a1,8(sp)
0x0000000000600080: ld a2,16(sp)
0x0000000000600084: li s4,0
0x0000000000600088: li s5,0
0x000000000060008c: li a7,56
0x0000000000600090: li a0,-100
0x0000000000600094: mv a1,a2
0x0000000000600098: li a2,0
0x000000000060009c: ecall
...
</code></pre>
<p>With that disassembled result, we can literally do some math with the
addresses and fix all the instructions. We just need to subtract the current
address from the target address in the branches, so we get the offset.</p>
<blockquote>
<p><span class="caps">NOTE</span>: Be careful with the <code>la</code> pseudoinstruction (<code>auipc + addi</code>). The base
address here is the one of the <code>auipc</code>, not the one of the <code>addi</code>.</p>
</blockquote>
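<p>The math above can be sketched like this in Python (the function names and example addresses here are mine, not part of the Stage0 sources):</p>
<pre class="highlight"><code class="language-python">def branch_offset(branch_addr, target_addr):
    # PC-relative branches encode target minus the branch's own address
    return target_addr - branch_addr

def la_parts(auipc_addr, target_addr):
    # la rd, target expands to auipc + addi; the base address is the
    # auipc's, not the addi's.  Split the delta into the 20-bit upper
    # immediate for auipc and the signed 12-bit immediate for addi.
    delta = target_addr - auipc_addr
    hi = (delta + 0x800) >> 12
    lo = delta - (hi << 12)
    return hi, lo
</code></pre>
<p>For instance, a branch at <code>0x600088</code> jumping back to <code>0x600078</code> encodes the offset <code>-16</code>.</p>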
<h5>The <span class="caps">ELF</span> header</h5>
<p>If we don’t know how to make the <span class="caps">ELF</span> header we can’t do the previous step,
so we’d better say something about it.</p>
<p>Other files of the project are hex0 files too, so they also have a heavily
commented <span class="caps">ELF</span> header we can use as a reference. Also, <a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format#File_header">Wikipedia has a great
explanation of it</a>.</p>
<p>The main field we need to change is <code>e_machine</code>. We need to set it to
<code>0xF3</code>, indicating <span class="caps">RISC</span>-V. We also need to make sure the 64-bit class flag
is set for <span class="caps">RV64</span>, and remember to check the endianness.</p>
<blockquote>
<p><span class="caps">NOTE</span>: Big endian is the most natural way to write the file by hand. If you
want to go for little endian this might get weird to write. The Python script
above uses little endian; note the <code>reversed</code> call in it.</p>
</blockquote>
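<p>For illustration, here is a Python sketch of how the first fields of a little-endian 64-bit <span class="caps">ELF</span> header for <span class="caps">RISC</span>-V fit together (field values follow the public <span class="caps">ELF</span> layout; this is not the project’s actual commented header):</p>
<pre class="highlight"><code class="language-python">import struct

# e_ident: magic, 64-bit class (2), little endian (1), ELF version (1), padding
e_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + b"\x00" * 8

e_type    = 0x02  # ET_EXEC: executable file
e_machine = 0xF3  # EM_RISCV
e_version = 1

# "<" means little endian: 0xF3 is emitted as the byte pair F3 00
header_start = e_ident + struct.pack("&lt;HHI", e_type, e_machine, e_version)
</code></pre>
<p>Writing it by hand means doing that byte reversal yourself, which is why big endian feels more natural on paper.</p>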
<h5>Debugging</h5>
<p>Once you have everything ready you need to make sure it’s doing what it’s
supposed to.</p>
<p>My first working program was failing to use an output file. Someone in the
<code>#bootstrappable</code> <span class="caps">IRC</span> channel (I’m sorry, I can’t remember who it was) told me to
<code>strace</code> the program to see what was going on, and with that, plus some debugging
with <span class="caps">GDB</span>’s <code>layout asm</code>, I was able to figure out that one instruction was using a
wrong register.</p>
<p>These tools are important because the whole process is done by hand, so there
are many chances to screw up somewhere.</p>
<p><code>strace</code> is extremely handy for this specific program because most of its
functionality is based on system calls. If you filter the output
correctly you can see everything the program does, accurately.</p>
<hr>
<div style="
text-align: center;
font-size: smaller;
padding-left: 3em;
padding-right: 3em;
padding-top: 1em;
padding-bottom: 1em;
border-top: 1px solid var(--border-color);
border-bottom: 1px solid var(--border-color)">
Remember you can hire <a href="https://elenq.tech">ElenQ
Technology</a> to help you with your research, development or training. <br/>
If you want to encourage my free software work you can support me on <a
href="https://liberapay.com/ekaitz">Liberapay</a>.
</div>
<hr>
<h3>Final thoughts</h3>
<p>This contribution has been a lot of fun. It let me understand a little bit
more about the ecosystem around the full-source bootstrap, which is kind of
complex and includes some other stuff I didn’t even mention.</p>
<p>I learned a lot from this: I now have a deeper understanding of the instruction
formats in <span class="caps">RISC</span>-V, and I picked up some cool <span class="caps">GDB</span> tricks that are always useful.</p>
<p>Stage0 has a really interesting approach to auditability that’s worth thinking
about. They build everything from a commented binary file (that’s basically
what hex0 is) that acts like a seed, so we can audit everything, including the
very first step. The solution of having the contents of the <span class="caps">ELF</span> file directly
written in hexadecimal is enough to ensure we can certify the contents are what
we expect, and having every instruction commented with its assembly counterpart
gives us the chance to go to the <span class="caps">ISA</span> and check if that’s actually what it is
supposed to be. Perfectly auditable.</p>
<p>Using this first step as a building block for anything else ensures that we
never need to rely on a binary file whose origin we can’t verify.</p>
<p>Really interesting stuff.</p>
<p>Now we have this very first step ported to an open instruction set, which also
opens the door to auditability from the hardware perspective. Now we can
start thinking about having an auditable software stack in a device we designed
ourselves, so we can audit it too. This is huge.</p>
<p>Now we need to keep pushing in this direction, porting all the rest of the
steps of Stage0, Mes and many other projects, if we want to reach full <span class="caps">RISC</span>-V
support. This is just one small step in that direction.</p>
<p>Hey! I almost forgot! And thanks to this I had the chance to work a little bit
more on my assembly interpreter, and recover it from the darkness. That’s also
great. Isn’t it?</p>
<blockquote>
<p>Well, so we learned some things today, but the most important thing is that all the
stupid things we do, all the random projects we work on, all the experiences
we have in life are not just a <em>waste of time</em>. They may prove useful
in the future, but you don’t know when…<br>
What is sure is that if you don’t stay creative and active, you’ll never
have any experience to learn from.</p>
</blockquote>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>Not really, in fact, I use my experience as a vehicle to introduce you to
great projects and interesting pieces of knowledge. But ssssh, don’t tell
anyone. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:contact">
<p>Contact me, seriously. I have my <a href="https://ekaitz.elenq.tech/pages/about.html">contact info here</a> <a class="footnote-backref" href="#fnref:contact" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:where">
<p>If you want to know where to start to disassemble, you can just ask
<code>where</code> to <span class="caps">GDB</span>. <a class="footnote-backref" href="#fnref:where" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
</ol>
</div>RISC-V Adventures: Lightening2021-05-19T00:00:00+03:002021-05-19T00:00:00+03:00Ekaitz Zárragatag:ekaitz.elenq.tech,2021-05-19:/lightening.html<p>The port of Lightening, the code generation library used in
Guile Scheme, and other adventures on the low level world of <span class="caps">RISC</span>-V.</p><p>In <a href="https://ekaitz.elenq.tech/2020.html">the latest post</a> I summarized the last
year because I wanted to talk about what I’m doing <strong>now</strong>. At this very moment
I just realized that almost half of 2021 is already gone, so following
the breadcrumbs until this day could be a difficult task. That’s why I won’t
give you more context than this: <span class="caps">RISC</span>-V is a deep, <em>deep</em>, hole.</p>
<p>I told you I was researching programming languages and that made me research
a little bit about ISAs. That’s how I started reading about <span class="caps">RISC</span>-V, and I
realized learning about it was a great idea for many reasons: it’s a new thing
and as an R&D engineer I should keep up to date, and the book I chose is really
good<sup id="fnref:book"><a class="footnote-ref" href="#fn:book">1</a></sup> and gives a great description of the design decisions behind
<span class="caps">RISC</span>-V.</p>
<p>From that, and I don’t really know how, I started taking part in the effort of
porting Guix to <span class="caps">RISC</span>-V. One of the things I’m working on right now is the
port of the machine code generation library that Guile uses, called
<code>lightening</code>, to <span class="caps">RISC</span>-V, and that’s what I’m talking about today.</p>
<h3>The lightening</h3>
<p>Lightening is a lightweight fork of <a href="https://www.gnu.org/software/lightning/"><span class="caps">GNU</span>
Lightning</a>, a machine code generation
library that can be used for many things that need to abstract over the target
<span class="caps">CPU</span>, like <span class="caps">JIT</span> compilers.</p>
<p>The design of <span class="caps">GNU</span> Lightning is easy to understand. It exposes a set of
instructions inspired by <span class="caps">RISC</span> machines; you use those, the library
maps them to actual machine instructions on the target <span class="caps">CPU</span> and returns you a
pointer to the generated function. Simple stuff.</p>
<p>The code is not that easy to understand: it makes a pretty aggressive and
clever use of C macros that I’m not used to reading, so it is a little bit
hard for me.</p>
<p>I could try to explain the reasons behind the fork, but <a href="https://wingolog.org/archives/2019/05/24/lightening-run-time-code-generation">the guy who did it,
who is also the maintainer of Guile, explains it much better than I
could</a>.
But at least I can summarize: Lightening is simpler and it fits better what
Guile needs for its <span class="caps">JIT</span> compiler.</p>
<p>Boom! Lightened!</p>
<h3>The process</h3>
<p>So Lightening is basically simpler, but the idea is the same. But how do you
port a library like that to another architecture?</p>
<p>The idea is kind of simple, but we need to talk about the basics first.</p>
<p>Lightening (and <span class="caps">GNU</span> Lightning too, but we are going to specifically talk about
Lightening from here) emulates a fake <span class="caps">RISC</span> machine with its functions. It
provides <code>movr</code>, <code>movi</code>, <code>addr</code> and so on. Basically, all those are C functions
you call, but they actually look like assembly. Look at a random example here
taken from the <a href="https://gitlab.com/wingo/lightening/-/blob/main/tests/addr.c#L6"><code>tests/addr.c</code> file</a>:</p>
<pre class="highlight"><code class="language-clike">jit_begin(j, arena_base, arena_size);
size_t align = jit_enter_jit_abi(j, 0, 0, 0);
jit_load_args_2(j, jit_operand_gpr (JIT_OPERAND_ABI_WORD, JIT_R0),
                jit_operand_gpr (JIT_OPERAND_ABI_WORD, JIT_R1));
jit_addr(j, JIT_R0, JIT_R0, JIT_R1);
jit_leave_jit_abi(j, 0, 0, align);
jit_retr(j, JIT_R0);
size_t size = 0;
void* ret = jit_end(j, &size);
int (*f)(int, int) = ret;
ASSERT(f(42, 69) == 111);
</code></pre>
<p>Basically you can see we get the <code>f</code> function from the calls to <code>jit_WHATEVER</code>,
which include the call to the preparation of the arguments, <code>jit_load_args_2</code>,
and the actual body of the function: <code>jit_addr</code>. The word <code>addr</code> comes from
<em>add</em> and <em>r</em>egisters, so you can understand what it does: it adds the contents of
two registers and stores the result in another register.</p>
<p>The registers have understandable names like <code>JIT_R0</code> and <code>JIT_R1</code>, which are
basically the register number (the <code>R</code> comes from “register”).</p>
<p>So, if you check the <code>jit_addr</code> line you can see it’s adding the
contents of register <code>0</code> and register <code>1</code> and storing the result in
register <code>0</code> (the first argument is the destination).</p>
<p>That’s pretty similar to <span class="caps">RISC</span>-V’s <code>add</code> instruction, isn’t it?</p>
<p>Well, it’s basically the same thing. The only problem is that we have to emit
the machine code associated with the <code>add</code>, not just writing it down in text,
and we also need to declare which are the registers <code>JIT_R0</code> and <code>JIT_R1</code> in
our actual machine.</p>
<p>Thankfully, the library already has all the machinery for all that. There
are functions that emit the code for us, and we can also add some <code>define</code>s to
map <code>JIT_R0</code> to the <span class="caps">RISCV</span> <code>a0</code> register, and so on.</p>
<p>We just need to make new files for <span class="caps">RISC</span>-V, define the mappings and add a little
bit of glue around.</p>
<h3>The problems</h3>
<p>All that sounds simple and easy (on purpose), but it’s not <em>that</em> easy.</p>
<p>Some instructions that Lightening provides don’t have a simple mapping to
<span class="caps">RISC</span>-V and we need to play around with them.</p>
<p>There’s an interesting example: <code>movi</code> (move immediate to register).</p>
<p>Loading an immediate into a register sounds extremely simple,
but it’s more complex than it looks. <span class="caps">RISC</span>-V assembly has a
pseudoinstruction for that, called <code>li</code> (load immediate), that can be literally
mapped to <code>movi</code>. The main problem is that pseudoinstructions <em>don’t really
exist</em>.</p>
<p>You all know there are <span class="caps">CISC</span> and <span class="caps">RISC</span> machines. <span class="caps">CISC</span> machines were a way to make
simpler compilers, pushing that complexity to the hardware. <span class="caps">RISC</span> machines are
the other way around.</p>
<p>The <span class="caps">RISC</span> hardware tends to be simple, with few instructions; the
compiler is the one that has to do the dirty job, trying to make the
programmer’s life better.</p>
<p>Pseudoinstructions are a case of that. The programmer only wants to load a
constant into a register, but real life can be very depressing. When you want to
load an immediate you don’t want to think about its size; if it fits in a
register you are fine, aren’t you?</p>
<p>Pseudoinstructions are expanded to actual instructions by the assembler, so you
don’t need to worry about those details. In fact, <span class="caps">RISC</span>-V doesn’t really have
move instructions, they are all pseudoinstructions that are expanded to
something like:</p>
<pre class="highlight"><code>addi destination, source, 0
</code></pre>
<p>Which means “add 0 to source and store the result in destination”.</p>
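<p>As a sketch (in Python rather than Lightening’s C, with register numbers following the standard <span class="caps">RISC</span>-V <span class="caps">ABI</span>), that expansion can be encoded like this:</p>
<pre class="highlight"><code class="language-python">def addi(rd, rs1, imm):
    # RISC-V I-type encoding: imm[11:0] | rs1 | funct3=000 | rd | opcode 0010011
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (rd << 7) | 0b0010011

def mv(rd, rs):
    # mv rd, rs is just addi rd, rs, 0
    return addi(rd, rs, 0)
</code></pre>
<p>For instance, <code>addi a0, a1, 12</code> (registers 10 and 11) encodes to <code>0x00C58513</code>.</p>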
<p>The <code>li</code> pseudoinstruction is a very interesting case, because the expansion is
kind of complex; it’s not just a one-to-one conversion.</p>
<p>In <span class="caps">RISC</span>-V all the instructions are 32 bits long (or 16 if you take into account the
compressed instruction extension) and the registers are 32 bits wide in <span class="caps">RV32</span> and
64 bits wide in <span class="caps">RV64</span>. You see the problem, right? No 32-bit instruction is able to
load a full register at once, because that would mean all the bits
available in the instruction (or more!) would need to be used to store the immediate.</p>
<p>Depending on the size of the immediate you want to load, the <code>li</code> instruction
can be expanded to just one instruction (<code>addi</code>), two (<code>lui</code> and <code>addi</code>) or, if
you are on <span class="caps">RV64</span>, to a series of up to eight instructions (<code>lui</code>, <code>addi</code>, <code>slli</code>,
<code>addi</code>, <code>slli</code>, <code>addi</code>, <code>slli</code>, <code>addi</code>). There are also sign extensions in the
middle that make the whole process even funnier.</p>
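<p>To make the two-instruction case concrete, here is a simplified Python sketch (not Lightening’s actual implementation) of the 32-bit expansion; the subtlety is that <code>addi</code> sign-extends its 12-bit immediate, so the upper part has to be compensated:</p>
<pre class="highlight"><code class="language-python">def expand_li32(imm):
    # Sketch: expand `li rd, imm` for a 32-bit immediate into
    # (mnemonic, immediate) pairs; real code must also emit the registers.
    lo = imm & 0xFFF
    if lo >= 0x800:      # addi would sign-extend this and subtract,
        lo -= 0x1000     # so carry one into the upper part to compensate
    hi = ((imm - lo) >> 12) & 0xFFFFF
    if hi == 0:
        return [("addi", lo)]           # fits in a single instruction
    return [("lui", hi), ("addi", lo)]  # lui loads hi << 12, addi adds lo
</code></pre>
<p>For example, <code>li rd, 0x12345678</code> becomes a <code>lui</code> with <code>0x12345</code> followed by an <code>addi</code> with <code>0x678</code>.</p>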
<p>Of course, as we are generating the machine code, we can’t rely on an assembler
to do the dirty job for us: we need to expand everything ourselves.</p>
<p>So something that looked extremely simple, the implementation of an obvious
instruction, can get really messy, and we need a reasonable way to check whether we
did the expansions correctly.</p>
<p>And we didn’t talk yet about those instructions that don’t have a clear mapping
to the machine!</p>
<p>Don’t worry: we won’t. I just wanted to point out the need for proper tools for this task.</p>
<h3>The debugging</h3>
<p>The debugging process is not as complex as I thought it was going to be, but
my setup is a little bit of a mess, basically because I’m on Guix, which
doesn’t have proper support for <span class="caps">RISC</span>-V, so I can’t really test on my machine
(if there’s a way, please let me know!).</p>
<p>I’m using an external Debian Sid machine (see acknowledgements below) for it.</p>
<p>I basically followed the <a href="https://wiki.debian.org/RISC-V#Cross_compilation">Debian
tutorial</a> for cross
compilation environments and Qemu, and everything is perfectly set up for the task.</p>
<p>Next: how to debug the code?</p>
<p>I’m using Qemu as a target for <span class="caps">GDB</span>, so I can run a binary on Qemu like this:</p>
<pre class="highlight"><code class="language-sh">qemu-riscv64-static -g 1234 test-riscv-movi
</code></pre>
<p>Now I can attach <span class="caps">GDB</span> to that port and disassemble the <code>*f</code> function that was
returned from Lightening to see if the expansion is correct:</p>
<pre class="highlight"><code>$ gdb-multiarch
GNU gdb (Debian 10.1-2) 10.1.90.20210103-git
...
For help, type "help".
Type "apropos word" to search for commands related to "word".
(gdb) file lightening/tests/test-riscv-movi
Reading symbols from lightening/tests/test-riscv-movi...
(gdb) target remote :1234
Remote debugging using :1234
0x0000000000010538 in _start ()
(gdb) break movi.c:15
Breakpoint 1 at 0x1d956: file movi.c, line 15.
(gdb) continue
Continuing.
Breakpoint 1, run_test (j=0x82e90, arena_base=0x4000801000
"\023\001\201\377#0\021", arena_size=4096) at movi.c:15
15 ASSERT(f() == 0xa500a500);
(gdb) disassemble *f,+100
Dump of assembler code from 0x4000801000 to 0x4000801064:
0x0000004000801000: addi sp,sp,-8
0x0000004000801004: sd ra,0(sp)
0x0000004000801008: lui a0,0x0
0x000000400080100c: slli a0,a0,0x20
0x0000004000801010: srli a0,a0,0x21
0x0000004000801014: mv a0,a0
0x0000004000801018: slli a0,a0,0xb
0x000000400080101c: addi a0,a0,660 # 0x294
0x0000004000801020: slli a0,a0,0xb
0x0000004000801024: addi a0,a0,20
0x0000004000801028: slli a0,a0,0xb
0x000000400080102c: addi a0,a0,1280
0x0000004000801030: ld ra,0(sp)
0x0000004000801034: addi sp,sp,8
0x0000004000801038: mv a0,a0
0x000000400080103c: ret
0x0000004000801040: unimp
...
</code></pre>
<p>Of course, I can debug the library code normally, but the generated code has to
be checked like this, because there’s no debug symbol associated with it and
<span class="caps">GDB</span> gets lost in there.</p>
<p>Important stuff. Take notes.</p>
<hr>
<div style="
text-align: center;
font-size: smaller;
padding-left: 3em;
padding-right: 3em;
padding-top: 1em;
padding-bottom: 1em;
border-top: 1px solid var(--border-color);
border-bottom: 1px solid var(--border-color)">
This free software work is also work. It needs funding!<br/>
Remember you can hire <a href="https://elenq.tech">ElenQ
Technology</a> to help you with your research, development or training. <br/>
If you want to encourage my free software work you can support me on <a
href="https://liberapay.com/ekaitz">Liberapay</a>.
</div>
<hr>
<h3>The acknowledgements</h3>
<p>It’s weird to have acknowledgments in a random blog post like this one, but I
have to thank my friend <a href="https://56k.es/">Fanta</a> for setting up a Debian
machine I can use for all this.</p>
<p>Also, I’d like to thank Andy Wingo for the disassembly trick you just read.
Yeah, there was no chance I’d have discovered that by myself!</p>
<h3>The code</h3>
<p>The whole process can be followed in the project’s GitLab, where I opened a
merge request. Feel free to comment and propose changes.</p>
<p><a href="https://gitlab.com/wingo/lightening/-/merge_requests/14/commits">Here’s the link</a>.</p>
<h3>The future</h3>
<p>There’s still plenty of work to do. I only implemented the basics of the <span class="caps">ALU</span> and
some configuration of the <span class="caps">RISC</span>-V context, like the registers and all that, but
I’d say the project is headed in the right direction.</p>
<p>I don’t know if I’m going to be able to spend as much time as I want on it,
but I’m surely going to keep adding new instructions and eventually try to wrap
my head around how jumps are implemented.</p>
<p>It’s going to be a lot of fun, that’s for sure.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:book">
<p>It’s available for free in some languages and it’s 20 bucks in
English. Totally worth it:<br>
<a href="http://www.riscvbook.com/">http://www.riscvbook.com/</a> <a class="footnote-backref" href="#fnref:book" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Review of 20202021-05-16T00:00:00+03:002021-05-16T00:00:00+03:00Ekaitz Zárragatag:ekaitz.elenq.tech,2021-05-16:/2020.html<p>The review of our year 2020 at ElenQ Technology.</p><p>It’s been a while since the previous post here, and it’s not because I don’t
have anything to talk about. I’ve been working on many things since the
previous one.</p>
<p>I wanted to write specifically about something I’m doing these days, but that’s
difficult to contextualize if there’s a full year gap in the middle. So I
decided to talk about the 2020 and make a short review about what we did so we
can look forward and see what can we build from this.</p>
<h4>2020 at ElenQ Technology</h4>
<p>2020 has been harsh for everyone, including ElenQ Technology. We started the
year with a lot of energy and we were pretty busy with courses here and there.
But then the pandemic came and all the in-person training stopped, so we lost
our main income source, which is also one of the kinds of work I personally enjoy the most.</p>
<p>So, after finishing our course on <em>Modern C++</em> in July (we’ll talk about that
in a future post), right after we were freed from the lockdown here, everything
stopped. No more in-person courses, no more clients, nothing.</p>
<p>We knew that the pandemic was affecting the economy so we were well aware that
there were few chances to get clients in the rest of the year. Thankfully, we
had some work to do: <a href="https://publishing.elenq.tech/en/">ElenQ Publishing</a>.</p>
<p>We spent the summer and part of the autumn preparing the books, the printing
and making the paperwork as well as the tools we needed for the website and
future books. By November 13 we already had every book shipped and the
website was almost ready. At the beginning of December, the website was
finished and published.</p>
<p>It was more work than we expected, but now we have a complete set of tools for
future publications that can cover any point of the process with
almost no human interaction. We automated almost everything, and those things
we didn’t automate are simple once you know how to do them.</p>
<p>Of course, as engineers, we only consider automating things that we are going
to repeat so you can think about all this work as a plan to keep publishing new
material in the future.</p>
<p>It’s really interesting to mention that our whole process is reproducible as we
are using <a href="https://guix.gnu.org">Guix</a> as a tool, so no matter what happens we
could still go back in time and remake the books exactly as they were when we
published them.</p>
<p>As you see, at a company level, most of our work in 2020 was focused on
teaching and making the books (another form of teaching), because it’s
something I personally enjoy a lot and I’d say it’s more fulfilling than
anything else I’ve done. But it was sadly affected by the pandemic, so we need
to reorganize our strategy a little bit.</p>
<h4>Personal level</h4>
<p>Of course, I spend time on other things too. A great part of my job is to
randomly research anything I find interesting, so I can keep my mind fresh for
the possible projects that may come. This gives me tools and ideas, and also
lets me learn from other people.</p>
<p>During the year I spent some time contributing to Guix, for <a href="https://ekaitz.elenq.tech/donations-guix-01.html">reasons I already
discussed here</a>. The most
notable contributions were the addition of a really interesting package that
was missing, Meshlab, and the fix of a package that had been failing to
compile for months: FreeCAD.</p>
<p>Being locked at home, I also had the chance to go back to electronics, which
is a huge part of what I studied at university but something I never had the chance to
work on at a professional level. I even designed some PCBs, and produced
and soldered them with the highest level of quality possible. It was a great experience.</p>
<p>On the other hand, I also needed some time to relax and try to recover from
some longstanding health issues I’ve been dealing with, that also deteriorated
because of the pandemic.</p>
<p>After some time practicing yoga and taking care of my body, I feel much better
in general; even if my issues are still there, at least they are not aggravated
by the bad posture and the physical stress that working at a computer can
provoke. So, if you are open to a suggestion: stretch, do some strength
exercises and try to keep your body in shape, especially if you work in an
office or any other kind of sedentary job that involves repetitive
movements like using a mouse or typing on a keyboard.</p>
<h5>December</h5>
<p>As I mentioned, our work with ElenQ Publishing was done at the beginning of
December. We approached that as a chance to stop and think.</p>
<p>During the last three years I had few chances to focus on a specific subject
for a long time; I had to quickly jump from one thing to another in order to
be able to reach all the projects we had.</p>
<p>I was frustrated because of that. I’m easily distracted and it’s hard for me to
pay attention to the same thing for a while, but I really like to understand
things <strong>deeply</strong>. Those who know me or attended my courses know it,
and my everyday life, full of stress and various stimuli, was making me unable
to concentrate.</p>
<p>I had moments of attention and clearness of mind during the pandemic (and due
to the pandemic) that made me feel at peace, so I wanted to pursue that kind of
frustration-less life on purpose, not only when it comes on its own.</p>
<p>So that’s what I did. I just needed something to investigate, something I had been
interested in since the very beginning of my career: programming languages.</p>
<p>I collected some books on compiler implementation and started reading them;
then I realized I was interested in operating system implementation, so I read
about that too. Both things need to run somewhere, so I also spent some time
digging into various architectures and their instruction sets, and so on.</p>
<p>I started developing a simple <a href="https://github.com/ekaitz-zarraga/blas">Scheme
implementation</a> (only started, not
finished or anything) that served as an excuse to have a goal in mind during the
process. Also, I decided to <a href="https://twitch.tv/ekaitzza">live stream</a> my
research process so I could share my findings with others and let them give
me their thoughts and help me go slowly, paying attention to the interesting details.</p>
<p>And let me tell you, compiler implementation is often a difficult subject for
me, especially the theory, because my background lacks some of the concepts
that Computer Science students have and that I have to study from scratch<sup id="fnref:note"><a class="footnote-ref" href="#fn:note">1</a></sup>.</p>
<p>Having the chance to tackle a difficult long-term task helped me forget and not
worry about the <em>bad</em> year we had as a company, in which we only had actual
paid work during the first half of the year. I was just grateful to be able to
sustain myself long enough to have the chance to breathe and spend more time
with myself, doing something I don’t always have the chance to do, regardless
of everything we, individually and collectively, were going through.</p>
<p>I hope you had some moments of relief too.</p>
<h5>What I learned</h5>
<p>I obviously learned many things during the year (books have been read!), but I
don’t want to focus on that.</p>
<p>Sometimes the most important thing is not the goal, but the process. You learn
more from the travel than from the arrival, right?</p>
<p>I like to think that I learned to care more about myself in 2020. I’m still
sick, and my recovery got stuck as I was literally stuck at home, but that’s
just a temporary issue, because I’m taking care of myself. Maybe not every day,
but almost every day I take care of myself. That’s what counts.</p>
<p>2020 taught me how to make a publishing house. That’s an important piece of
knowledge, but I consider it more valuable that I reclaimed my time and my attention.
That taught me an important lesson by itself and it also helped me learn
about myself.</p>
<p>I learned that I was feeling alone in my interests. I had no one to share my
interests with. I know it is surprising to you, but basically nobody is
interested in how garbage collectors, processors or anything like that work.
Most people don’t even care about what they are. Crazy, huh?</p>
<p>Sharing my findings, my research and my errors with other people makes me feel
better. I feel someone is there, on the other side. It helps me avoid
the frustration and the lack of motivation I have been feeling in recent years.</p>
<p>The streaming helped with that<sup id="fnref:english"><a class="footnote-ref" href="#fn:english">3</a></sup>: I had people reacting instantly; some
sent me papers to read and ideas, and others proposed interesting things to do.
That feels good. It helped me remember that I’m not alone.</p>
<p>If 2020 taught me anything, it’s that I, or we, need others to feel better.
We need to take care of people<sup id="fnref:people"><a class="footnote-ref" href="#fn:people">4</a></sup>, because life is much better with them.</p>
<p>On top of many things, being conscious that I was researching <strong>deep</strong> opened
the door to applying that depth in my everyday life more often. Not that I
wasn’t doing that before (those who know me are aware that I’m kind of an
intense guy), but now I’m more conscious about it and I can selectively choose
to go deeper into my thoughts and feelings.</p>
<p>This time for myself reminded me how intense I was back then and how much I enjoyed
being a dedicated person.</p>
<h4>So what</h4>
<p>As I said, at a company level I decided to use that time to arrange a new
strategy. I wouldn’t say I changed it that much, because I was at peace when it
was developed, almost 4 years ago, but it let me rethink it taking into account
my professional and personal experience of the recent years.</p>
<p>Collaborating on free software projects has shown me that I feel comfortable
with larger codebases and more complex concepts that were too much for me in
the past. Now I feel more confident about that.</p>
<p>Of course, this came with practice and time, but also after years of stressful
work and random research that is not really fulfilling. I don’t mean that you
need to spend time on that to be able to tackle bigger projects. I mean that my
past is part of what I am now, and even the bad times can help forge a better future.</p>
<p>I decided to keep researching the way I was, because it’s something that makes
me feel good, and to work more slowly, paying attention to the details as I
like to do.</p>
<p>I’ll try to share more about my work, on a technical and a personal level. I’ll
keep streaming for some time, and I’ll try to use this blog more, as I did in
the past.</p>
<p>So, as I was saying, all this year helped me remember about important things,
and forget a little bit about urgent things.</p>
<blockquote>
<p><span class="dquo">“</span>Instead of swimming fast trying to reach as far as I could, pumping my
blood, splashing water around and having to take a short breath between each
arm stroke, now I want to dive. I’m far enough from the coast, already.</p>
<p>I want to stay in the surface until I’m ready, having some rest and breathing
as much as I want, and then, I’ll dive. I’ll discover the colors of the
coral reef, the sea creatures and even the deepest darkness if I feel like
it. When I’m done or I’m tired, I’ll go back to the surface, take a deep
breath and have some rest, feeling the sun in my face, until the next immersion.</p>
<p>I’m not going anywhere. I’m not in a hurry anymore.”</p>
</blockquote>
<div class="footnote">
<hr>
<ol>
<li id="fn:note">
<p>But hey, I’m much more comfortable with low level stuff like ISAs and
all that. My degree is not useless after all. <a class="footnote-backref" href="#fnref:note" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:blog">
<p>In this blog, by contrast, I can’t really know how many people read
or interact with what I write. So I encourage you to contact me and share
ideas! <a class="footnote-backref" href="#fnref:blog" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:english">
<p>Making the videos also helped me to feel more confident about my
English (people understand what I say!) and that is helping me tackle larger
projects that involve people from different places. <a class="footnote-backref" href="#fnref:english" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:people">
<p>More now, that we have some heavy shit going on out there. <a class="footnote-backref" href="#fnref:people" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
</ol>
</div>Our own git server2020-07-09T00:00:00+03:002020-07-09T00:00:00+03:00Ekaitz Zárragatag:ekaitz.elenq.tech,2020-07-09:/git-repo.html<p>How to set up a git server with simple tools</p><p>Having some rest these days after some really hard-working months… I decided I
wanted to solve something that had been on my to-do list for a long time. A really
long time.</p>
<p>I wanted to have my own git repository for ElenQ and my personal projects
(which are the same thing because I take ElenQ <em>very</em> personally) and so I did.</p>
<p>You may think: so you installed Gitea or something and you are done, right?</p>
<p>But the answer is no.</p>
<p>That would be the normal approach if I didn’t have the arbitrary constraints I
imposed on myself. This text is about those weird constraints and the random
thoughts I have about all this, and also about my current setup and how to
build it. The second part serves as a <em>tutorial</em> for myself, in case I screw up
and need to start over, and also as a way to consciously think about what I did.</p>
<h3>Context: random thoughts</h3>
<p>For me, code is not a social network and it shouldn’t be. I understand why
github is the way it is, but for me it’s just code. I don’t need to show how
much I code; I don’t need to follow, like, star, fork, or even share my opinion
about other people’s work publicly. That’s completely unrelated to the job.</p>
<p>Large projects like github are changing the way we collaborate. I’m not against
that, but it looks like we’re starting to forget that git doesn’t need anything
else to function. There’s no need for pull/merge requests, web servers or anything else.</p>
<p>Web interfaces for code are cool, but nothing is better than your own editor.
I realized I just clone the repositories I want to dig into and search my
local clone with my own tools so… why bother having a powerful<sup id="fnref:powerful"><a class="footnote-ref" href="#fn:powerful">1</a></sup> web interface?</p>
<p>I don’t like to be forced to register in a platform just for sending patches or
taking part in a project. Why do I <strong>need</strong> to have a github account?</p>
<p>ElenQ Technology currently uses a free gitlab account, but recently I’ve
started to be concerned about gitlab’s business practices, so I prefer to start
migrating away from it. I’ve seen that they always send you to the login page when
you hit a 404, and other weird behaviours of that kind that don’t look like they
happened by accident<sup id="fnref:gitlab"><a class="footnote-ref" href="#fn:gitlab">2</a></sup>. Of course, there’s also the fact that they are a
startup and all that. I don’t really trust them. But that’s a different story.
I still like the fact that their community edition is free software. It’s a
business model we should see more of.</p>
<p>Gitea and Gogs are easy to install, which is a must for me, and they are simple
and useful, but they replicate the same model. It’s much better to self-host your
code than to rely on a third party, no doubt. But that makes the login problem
even harder: the more Gitea or Gogs instances we create, the more separate places
there are to register in<sup id="fnref:forgefed"><a class="footnote-ref" href="#fn:forgefed">3</a></sup>.</p>
<p>Those free software tools solve the problem of the centralization in different
scales, but they are small social networks, still.</p>
<h3>Possible replacements</h3>
<p>I’m ok with sending an email and I’m ok with receiving emails from people.</p>
<p>With git’s email workflow (<code>git send-email</code>, or <code>git format-patch</code> if you don’t
want to configure your connection to your email account) you can send patches
via email that can be applied to the code directly. That’s more than enough for
many projects.</p>
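<p>As a sketch of that workflow (the address and file name below are made-up examples): prepare the patch with <code>git format-patch</code> and, if you have an <code>[sendemail]</code> section in your git config, mail it directly with <code>git send-email</code>:</p>
<pre class="highlight"><code class="language-bash"># Turn the last commit into a mailable patch file (works offline):
$ git format-patch -1 HEAD
0001-some-change.patch
# With your email account configured, send it to the maintainer:
$ git send-email --to=maintainer@example.com 0001-some-change.patch
</code></pre>
<p>On the other side, the maintainer applies it with <code>git am 0001-some-change.patch</code>, keeping authorship and commit message intact.</p>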
<p>Issues, suggestions and questions can be sent via email with no hassle, too.</p>
<p>The possibility to clone the repositories via git protocol gives people the
chance to check the code freely in their editor of choice, without being
tracked while they browse<sup id="fnref:editor"><a class="footnote-ref" href="#fn:editor">4</a></sup>.</p>
<h3>Problems</h3>
<p>Issue management makes perfect sense to me, and it’s a process that is cool to
have in the open. It helps people take part in projects, check the status of
the project, collaborate more effectively, share the bugs they find
and so on. But, for the kind of projects I have, issue management is more of a
problem than a solution. I’ve been receiving spam in my gitlab issues for a
while. To be honest, there has been more spam than real issues in my gitlab account.</p>
<p>There isn’t any easy way to fully replace an issue management tool, though.
Maybe a good use of the <code>README.md</code> and some extra files in the repository can
help. People are still able to reach out and share their bug reports via email
without being publicly exposed.</p>
<p>That’s also a thing: if you let people interact freely on an issue board you
need some moderation (which requires skills and effort). It is true that people
may come up with very interesting ideas when working together, but it’s also true
that this only happens in very popular projects<sup id="fnref:popularity"><a class="footnote-ref" href="#fn:popularity">5</a></sup>. Handling that privately
helps to avoid misunderstandings you can’t control.</p>
<p>Apart from that, we have to admit that sharing repositories only via the git
protocol has exactly <code>0</code> discoverability, so we have to list them on a website or
something. Maybe not for interacting with them, but at least to show them.</p>
<p>Git is able to serve a website via gitweb too, but it’s basic, a little bit
hard to configure and not too fast. Also, it could be more visually appealing by default.</p>
<p>On the owner’s side, it’s interesting to be able to decide which repositories
you want to share with the public. Being able to give permissions to specific
people without giving them permissions to the whole server is also nice. And if
the permissions can be set on specific branches of the repositories, even better.</p>
<h3>Other option</h3>
<p><a href="https://fossil-scm.org/home/doc/trunk/www/index.wiki">Fossil-scm</a> is really
interesting. It comes with support for issues and wikis, and devnotes are a
great idea I’m sure I could take advantage of.</p>
<p>But the tool itself is not as good as git in my opinion.</p>
<p>Fossil uses SQLite databases for its things (it’s developed by SQLite’s
developers), which is cool sometimes but at other times not as good as it
sounds. Maybe I’m getting too used to plain text files?</p>
<p>I tried to configure a multi-repository fossil for the server in the past and
gave up, but that’s probably my fault rather than theirs.</p>
<p>If you are interested in trying something new, you should take a look at
fossil. If you do, please, <a href="https://ekaitz.elenq.tech/pages/about.html">contact
me</a> and tell me about your experience
with it.</p>
<h3>My solution</h3>
<p>For the permissions I used Gitolite, an authorization layer that
makes heavy use of ssh. It uses a management repository where the administrator
can add users’ public keys and each project’s permissions and metadata.</p>
<p>It basically creates ssh pseudo-sessions that are locked into gitolite-shell,
which decides whether the user has access to the repo or not. An interesting use
of ssh. <a href="https://gitolite.com/gitolite/overview.html">Read more on the website</a>, they explain it much better
than I can.</p>
<p>For the website I chose <code>cgit</code>, which is famous for being fast (cached by
default) and reliable, and turned out to be easy to configure.</p>
<p>Both projects are on the order of a few thousand lines of code, which is an
amount I could manage to read and edit if I wanted to.</p>
<h3>How to configure</h3>
<p>Well, this is the reminder for myself, but it can be useful for you too.</p>
<p>I installed both projects from debian’s package repository.</p>
<h4>Gitolite</h4>
<p>By default, the debian package creates a <code>gitolite3</code> user, so you have to take
that into account if you want to make gitolite work on a debian machine (other
machines will have other details to check).</p>
<p>Gitolite’s debian package also asks for the administrator’s public ssh key, so
you have to provide it sooner or later. Once that’s done you’ll get a fantastic
<code>/var/lib/gitolite3</code> folder with everything you need. You’ll see that folder
contains a <code>projects.list</code> file, which lists the git repositories, a
<code>repositories</code> folder with the repositories, a <code>.gitolite</code> folder and a
<code>.gitolite.rc</code> file. The last one needs some changes in order to work correctly
with cgit:</p>
<h4>Enable cgit access to the repos</h4>
<p>Set <code>.gitolite.rc</code><span class="quo">‘</span>s <code>UMASK</code> to <code>0027</code> to give group access to new repositories;
that will let other users in the group (cgit and git-daemon) read the repositories.</p>
<p>You probably don’t want to share the gitolite-admin repository so leave it with
the permissions it came with. If you screw up here or there don’t be afraid to
<code>chmod</code> any repository later.</p>
<p>You also need to make <code>GIT_CONFIG_KEYS</code> more permissive (<code>.*</code> if you are crazy
enough) if you want Gitolite to be able to pass git configuration along. That way
you’ll be able to set the gitweb description in the repository’s config, which cgit can read.</p>
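<p>For reference, the relevant lines end up looking something like this inside the <code>%RC</code> hash of <code>/var/lib/gitolite3/.gitolite.rc</code> (the pattern here is a restrictive example: it only allows the <code>gitweb.*</code> keys cgit reads):</p>
<pre class="highlight"><code class="language-perl"># inside %RC = ( ... ):
UMASK             =>  0027,
GIT_CONFIG_KEYS   =>  'gitweb\..*',
</code></pre>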
<h4>Enable git unauthenticated clone</h4>
<p>There are a couple of ways to do this. The first is to set the <span class="caps">HTTP</span> mode, which
is something I didn’t do, but you can check how to do it in the docs.</p>
<p>I used git-daemon for git based unauthenticated clones. It’s simple but you
may need to create your own systemd service or something:</p>
<pre class="highlight"><code class="language-systemd-service"># git.service file
[Unit]
Description=Start Git Daemon
[Service]
ExecStart=/usr/bin/git daemon --base-path=/var/lib/gitolite3/repositories --reuseaddr /var/lib/gitolite3/repositories
Restart=always
RestartSec=500ms
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=git-daemon
User=gitdaemon
Group=gitolite3
[Install]
WantedBy=multi-user.target
</code></pre>
<p>Once you do that you should register it with systemd using <code>systemctl enable
/path/to/git.service</code> or something like that. Once added, you can start it.</p>
<p>But that’s not going to show any repository, because you didn’t export any. If
you want to export them, Gitolite has a specific configuration option you have
to set in the <code>gitolite-admin</code> repo. You have to give the user <code>daemon</code> read access:</p>
<pre class="highlight"><code class="language-perl">repo testing
# daemon is what adds the daemon-export
R = daemon
# You should add some extra people too...
# This is for cgit:
config gitweb.description = For testing purpose.
config gitweb.owner = Ekaitz
</code></pre>
<p>When you add the <code>daemon</code> access, Gitolite adds a <code>git-daemon-export-ok</code> file to
the repository that tells git-daemon the project can be shared. It won’t be
possible to push to it anyway, because we didn’t allow that in the git-daemon configuration.</p>
<h4>cgit</h4>
<p>Some cgit configuration does the rest. This is my example configuration on
cgit. I’ll probably change it soon, but there it goes:</p>
<pre class="highlight"><code class="language-bash"># cgit config
# see cgitrc(5) for details
css=/cgit.css
logo=/cgit.png
footer=/usr/share/cgit/footer.html
repository-sort=age
# if cgit messes up links, use a virtual-root.
# For example, cgit.example.org/ has this value:
virtual-root=/
clone-url=git://$HTTP_HOST/$CGIT_REPO_URL
# gitolite3@$HTTP_HOST:$CGIT_REPO_URL
enable-index-links=1
enable-index-owner=1
enable-git-config=1
enable-gitweb-owner=1
remove-suffix=1
# Readmes to use
# readme=README.md
# you can set more of them here like README.rst and stuff, but all of them
# require some rendering I didn't want to configure.
# Set title and description
root-title=ElenQ Technology
root-desc=Software repository for ElenQ
root-readme=/usr/share/cgit/root-readme.html
project-list=/var/lib/gitolite3/projects.list
scan-path=/var/lib/gitolite3/repositories
# Mimetypes
mimetype.gif=image/gif
mimetype.html=text/html
mimetype.jpg=image/jpeg
mimetype.jpeg=image/jpeg
mimetype.pdf=application/pdf
mimetype.png=image/png
mimetype.svg=image/svg+xml
</code></pre>
<p>But cgit is still unable to see the projects, because it’s not part of the
<code>gitolite3</code> group. Make it part of the <code>gitolite3</code> group with <code>usermod</code> or something.</p>
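<p>Something like this does the trick on debian, assuming cgit runs through fcgiwrap as <code>www-data</code> (check which user your setup actually uses):</p>
<pre class="highlight"><code class="language-bash"># Let the cgit/fcgiwrap user read gitolite's repositories:
$ sudo usermod -aG gitolite3 www-data
# Verify the group membership:
$ groups www-data
</code></pre>
<p>Remember the group only has read access to repositories created after the <code>UMASK</code> change, so <code>chmod</code> the older ones if needed.</p>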
<p>Also, cgit is a web application you have to plug into your existing setup. I
have an nginx-based config, so I need to add cgit to it. Cgit can work with uWSGI
or fcgiwrap; I chose the latter for no real reason:</p>
<pre class="highlight"><code class="language-nginx">server {
listen 80;
listen [::]:80;
server_name git.elenq.tech;
root /usr/share/cgit;
try_files $uri @cgit;
location @cgit {
include fastcgi_params;
fastcgi_param SCRIPT_FILENAME /usr/lib/cgit/cgit.cgi;
fastcgi_param PATH_INFO $uri;
fastcgi_param QUERY_STRING $args;
fastcgi_param HTTP_HOST $server_name;
fastcgi_pass unix:/run/fcgiwrap.socket;
}
}
</code></pre>
<p>Also you may be interested on <span class="caps">HTTPS</span> support, but you know how to add that
(certbot does, and it’s not hard to do).</p>
<h3>Closing words</h3>
<p>Now it’s live at <a href="https://git.elenq.tech">https://git.elenq.tech</a>. If you were
wondering, cloning and pushing from there is crazy fast, and the server that hosts
it is the cheapest server possible. It’s much faster than github, or at
least that’s my impression.</p>
<p>So yeah… That’s most of it. </p>
<p>I just wanted to share some thoughts about software development workflow and
find an excuse to write down my configuration, since I had trouble finding any
explanation that put together all the points I needed.</p>
<p>And I think I did, didn’t I?</p>
<p>Stay safe.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:powerful">
<p>There’s no power for free. Powerful also means resource-intensive. <a class="footnote-backref" href="#fnref:powerful" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:gitlab">
<p>cHeCKiNg YOur brOWsER bEfOrE AcCeSsIng gitLAB.cOm. <a class="footnote-backref" href="#fnref:gitlab" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:forgefed">
<p>Maybe not. Forgefed is doing a good job:
<a href="https://forgefed.peers.community/">https://forgefed.peers.community/</a> <a class="footnote-backref" href="#fnref:forgefed" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:editor">
<p>It also depends on the editor you choose. Choose wisely. <a class="footnote-backref" href="#fnref:editor" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
<li id="fn:popularity">
<p>If a project I make reaches that kind of popularity, I’ll open
a tool for that kind of discussion or maintain a mirror somewhere else. <a class="footnote-backref" href="#fnref:popularity" title="Jump back to footnote 5 in the text">↩</a></p>
</li>
</ol>
</div>ElenQ Donations — Chibi Scheme2020-05-31T00:00:00+03:002020-05-31T00:00:00+03:00Ekaitz Zárragatag:ekaitz.elenq.tech,2020-05-31:/donations-chibi-02.html<p>Donation to Chibi Scheme programming language</p><p>In a previous post I already talked about why I consider it important to donate
money or time to Free Software projects.</p>
<p>This time I want to talk about my recent contributions to Chibi Scheme’s
standard library.</p>
<p>Chibi Scheme is an <span class="caps">R7RS</span> scheme implementation that was designed to work as an
extension or scripting language. It’s just a small library you can embed.
That’s similar to Lua, but with <em>a lot</em> of parentheses<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>.</p>
<p>For those that are not familiar with Scheme: it’s just a programming language
you should read about. You’ll probably discover all those new cool things you
have in your programming language of choice are not that new<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup>.</p>
<p>There’s a detail I’d like to talk about, though. Contrary to other programming
language definitions or standards, Scheme’s <span class="caps">R7RS</span> report is something you can
read yourself. It’s less than 100 pages long<sup id="fnref:3"><a class="footnote-ref" href="#fn:3">3</a></sup>.</p>
<p>If you combine that with the design decisions that Alex Shinn (who also took
part on the <span class="caps">R7RS</span> definition) took on Chibi Scheme, you end up with a <em>simple</em>
programming language you can actually read.</p>
<p>That’s important.</p>
<p>You might wonder why you should care about the readability of a programming
language if you are just a user. The answer is simple too: free software relies
on the fact that you can audit and improve or customize it. If you are not able
to read it you can’t exercise your rights by yourself, and you’ll always need to
rely on someone else. That’s not intrinsically bad; it’s the only solution
that non-programmer users have. Programmers need to trust other people in other
things as well, so that’s not a major issue.</p>
<p>Problems come when projects get so complicated —and I mean
millions-of-lines-of-code complicated here— only large companies have enough
resources to tackle the task of editing the code. In those cases, software is
not really free anymore, because <em>in practice</em> you are unable to use your
rights and you can’t afford to find someone else to do it for you.</p>
<p>We started to get used to that, though.</p>
<p>Something I learned as a sculptor is that the tools that fit you best are those
that you made, you hacked or you got used to. As programmers, we are supposed
to know how to program, so we are supposed to be able to make and hack our
tools, but we are choosing to get used to the ones that others built.</p>
<p>The first step to control your workspace, and consequently your own job is to
control your tools<sup id="fnref:4"><a class="footnote-ref" href="#fn:4">4</a></sup>.</p>
<p>I’d love to say those are the reasons why I use Chibi Scheme, but that’s not
<em>totally</em> true. I don’t know why I use it. I just <em>like it</em>.</p>
<p>Anyway, the other day I realized Chibi Scheme’s <span class="caps">JSON</span> parser was unable to parse
Spanish accents so I was unable to control <a href="http://en.goteo.org/project/elenq-publishing">ElenQ
Publishing’s</a> book’s metadata
correctly. That’s a problem.</p>
<p>As the language is simple, I was able to read the standard library and propose
a change that would let the <span class="caps">JSON</span> parser use <span class="caps">UTF</span>-8 characters.</p>
<p><a href="https://github.com/ashinn/chibi-scheme/pull/643">https://github.com/ashinn/chibi-scheme/pull/643</a></p>
<p>During the process I checked <a href="https://github.com/python/cpython/blob/master/Lib/json/decoder.py#L117">CPython’s <span class="caps">JSON</span> parser
implementation</a>
and realized I could do better by adding surrogate pair support. So I decided
to add that too.</p>
<p><a href="https://github.com/ashinn/chibi-scheme/pull/644">https://github.com/ashinn/chibi-scheme/pull/644</a></p>
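<p>For context, a surrogate pair is how <span class="caps">JSON</span> escapes characters outside the Basic Multilingual Plane: you subtract <code>0x10000</code> from the code point and split the remaining 20 bits into two halves of 10 bits each. A quick sanity check for U+1F600 (the grinning face emoji), using plain shell arithmetic:</p>
<pre class="highlight"><code class="language-bash"># Split U+1F600 into its JSON surrogate pair:
$ cp=$((0x1F600 - 0x10000))
$ printf '\\u%04X\n' $((0xD800 + cp / 1024))   # high surrogate
\uD83D
$ printf '\\u%04X\n' $((0xDC00 + cp % 1024))   # low surrogate
\uDE00
</code></pre>
<p>So <code>"\uD83D\uDE00"</code> in a <span class="caps">JSON</span> document decodes back to that single character, and a parser has to recombine the pair instead of treating the halves as two separate characters.</p>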
<p>Once my changes were merged, I realized it was a good idea to keep going and
add a <span class="caps">JSON</span> encoder, which hadn’t been developed yet. So I did.</p>
<p><a href="https://github.com/ashinn/chibi-scheme/pull/648">https://github.com/ashinn/chibi-scheme/pull/648</a></p>
<p>While I was testing my <span class="caps">JSON</span> encoder I realized there was an issue with floating
point numbers in the <span class="caps">JSON</span> parser. So I fixed that too.</p>
<p><a href="https://github.com/ashinn/chibi-scheme/pull/647">https://github.com/ashinn/chibi-scheme/pull/647</a></p>
<p>I also fixed some random indentation issue I found:</p>
<p><a href="https://github.com/ashinn/chibi-scheme/pull/646">https://github.com/ashinn/chibi-scheme/pull/646</a></p>
<p>I didn’t really need to do all of that, but I did it anyway. I just wanted
to keep Chibi Scheme healthy while opening the door to some future
contributions. Now I have a little more control over my tooling, and I feel
more comfortable with the fact that I might need to make some changes to
Chibi’s code in the future.</p>
<p>It doesn’t need to be perfect, either. I’m sure it isn’t, because I hadn’t
written C code since I was at university and I had zero experience working on
Chibi-Scheme’s internals. My code was just enough to make the features happen;
now, with Alex’s changes, the code is running fine and <strong>everyone</strong> can benefit
from this.</p>
<p>So, the message I take from this can be summarized in these points:</p>
<ul>
<li>Use tools you can read and edit like Chibi Scheme or even CPython, which is a
large codebase but is surprisingly easy to read.</li>
<li>Programming languages (or their stdlib) should never be considered something
untouchable. Touch them, change them, make them fit your needs.</li>
<li>Don’t be afraid of tackling something that may seem hard on the first look.</li>
<li>You don’t have to be perfect.</li>
<li>Spend time and energy on stuff that matters.</li>
</ul>
<p>Hope all this —the post and the code— is useful.</p>
<p>Being useful is the greatest goal of an engineer, after all.</p>
<p>Take care.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>And no <code>end</code>-s. Less typing for sure: good for your hands. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:2">
<p>And maybe not that cool neither. <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:3">
<p>I’m not going to talk about the implications of that fact. It’s obvious
there must be some kind of trade-off comparing to other standards that are
more than one thousand pages long. I’ll just recommend you to read it, it’s
pretty good: <a href="https://small.r7rs.org/attachment/r7rs.pdf">https://small.r7rs.org/attachment/r7rs.pdf</a> <a class="footnote-backref" href="#fnref:3" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:4">
<p>In fact, that’s the second. I’m supposing we all know what we are doing. <a class="footnote-backref" href="#fnref:4" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
</ol>
</div>ElenQ Donations — Intro + GNU Guix2020-05-25T00:00:00+03:002020-05-25T00:00:00+03:00Ekaitz Zárragatag:ekaitz.elenq.tech,2020-05-25:/donations-guix-01.html<p>Recent ElenQ Technology donation to the great <span class="caps">GNU</span> Guix package manager
and software distribution</p><p>I consider my work part of my responsibility to make this world a better place,
so since the early days of the company I decided to donate as much as I
could to the free software projects I was using for my work, in order to help
the ecosystem be sustainable.</p>
<p>Many times, free software projects that are extensively used by companies
are considered just <em>free</em> products that don’t carry any kind of responsibility
with them. It is fine to use free software for your own goals (that’s
freedom 0), but it’s not morally acceptable to base your business model on a
project that independent developers made with no funds (or with very low funds)
and not even consider helping them.</p>
<p>We already <a href="https://www.propublica.org/article/the-worlds-email-encryption-software-relies-on-one-guy-who-is-going-broke">had cases</a> of free software developers keeping their
projects running at their own expense while the whole <del>fucking</del>
world just <em>uses</em> the software they make without thinking about their conditions.</p>
<p>ElenQ Technology has been founded by a not-very common individual. That’s obvious.</p>
<p>Sadly, sometimes ElenQ Technology simply can’t afford to donate a part of our
income<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>. But I can code.</p>
<p>I can always carve some time out of my projects and my free time to
support free software projects that make <del>my</del> our life easier.</p>
<h3><span class="caps">GNU</span> Guix</h3>
<p><a href="https://guix.gnu.org"><span class="caps">GNU</span> Guix</a> is one of those projects. I started using it a
couple of months ago as a package manager and now I moved to the full software distribution.</p>
<p>For those who don’t know Guix yet, it’s a package manager and a software
distribution like Nix and NixOS are. They are based on the same principle and
have the same core.</p>
<p>The innovation they bring is a transactional package manager that eases
rollbacks and the creation of isolated environments. In the case of the software
distribution, the whole system can be described by an easy-to-write file that
can also be version controlled, so you can always recover an older configuration if
you need to.</p>
<p>All the packages and system descriptions are defined in code. In Nix, they are
defined in the Nix programming language (a <span class="caps">DSL</span> for that purpose). In Guix, they
are defined in Guile (Scheme), a general-purpose programming language.</p>
<p>As my work at ElenQ forces me to visit many different codebases and use a wide
variety of software in short-term projects, Guix is very handy for me. I can
create a new isolated environment, code in it and, once I’m done, remove it
from my system in the cleanest way.</p>
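<p>As a sketch of that workflow (written against the Guix of that time, where the command was <code>guix environment</code>; newer releases ship <code>guix shell</code> for the same job):</p>
<pre class="highlight"><code class="language-bash"># Throwaway environment with some tools available, without
# touching my user profile:
$ guix environment --ad-hoc python gcc-toolchain
# ...hack away: the tools only exist inside this sub-shell...
$ exit
# Back outside nothing remains; reclaim the disk space if you want:
$ guix gc
</code></pre>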
<p>Also, package definitions are easy and straight to the point, so I can package
anything I want with just a few lines of code.</p>
<p>It’s an interesting project for system administrators too. Machines are easy to
replicate with it and it’s easy to go back if you screwed up in the configuration.</p>
<p>Further than that, they are working really hard on reproducible builds and the
chain of trust that modern software needs.</p>
<p>You should check the project yourselves better for more detailed info.</p>
<h3>So what?</h3>
<p>Since I use the project I’ve started to take part in it, packaging new code and
sending simple patches. The more I get involved in it, the more I will do. I’m not
really used to Guix yet, so I haven’t dug into the code deeply enough and I’m not
able to code very complex stuff in it.</p>
<p>At the moment, I’m trying to package <strong>Meshlab</strong>, a 3D mesh editing software
you’ve probably heard about.</p>
<p>For that I packaged (already merged) <code>openctm</code>, <code>lib3ds</code> and <code>openkinect</code> (in
its three flavors: C/C++ lib, python bindings and open-cv bindings). And during
that time I also found a couple of details I could improve and I made some
patches for them too.</p>
<p>In the past I also contributed a few package patches, to the <code>chicken</code> and
<code>chibi</code> scheme implementations and the <code>kitty</code> terminal emulator. You can find all
of them by searching my name in the issue board at the following link:</p>
<p><a href="https://issues.guix.gnu.org">https://issues.guix.gnu.org</a></p>
<p><span class="caps">GNU</span> Guix is not a very big project and it doesn’t have a large userbase that
can help it grow fast and reliably, so it needs some extra help, from me and
surely from you. They have been a very welcoming community, so I encourage you
to take part if you are interested in it.</p>
<p>I hope this helps to spark your interest in helping any project you like, and
maybe in pressuring your company to spend some resources on the projects it uses.</p>
<p>Stay safe.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>You can change that <a href="mailto:hello@elenq.tech">hiring us</a>. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>ElenQ Publishing2020-02-18T00:00:00+02:002020-02-18T00:00:00+02:00Ekaitz Zárragatag:ekaitz.elenq.tech,2020-02-18:/elenq-publishing.html<p>ElenQ Publishing is a technical book publishing project that aims to open
the door of technical knowledge to everyone.</p><p>Hi all,</p>
<p>On the 18th of February 2020 the crowdfunding campaign for ElenQ Publishing
started, and I’d like to talk a bit about it.</p>
<p><a href="http://en.goteo.org/project/elenq-publishing">http://en.goteo.org/project/elenq-publishing</a></p>
<h2>The platform</h2>
<p>First of all, I want to talk to you about the crowdfunding platform:
<a href="http://goteo.org">http://goteo.org</a>.</p>
<p>Goteo is a platform for social crowdfunding that aims to support projects with a
social goal. The software it runs is Free Software licensed under the Affero
<span class="caps">GPL</span> license, meaning that if you want to make your own crowdfunding platform you can
use the code that Goteo shares, as long as you provide the <del>users</del>
people with the source code that is running on your platform (server and client).</p>
<p>The Goteo Foundation, the maintainers of the code and the people behind goteo.org,
fund themselves with 5% of the crowdfunded money from the campaigns, a fair
price for their services. They also receive some help from different government
entities, like Barcelona’s local government or the Spanish Education, Culture and
Sports Ministry, because of their social impact.</p>
<p>At least in Spain, people who take part in the campaigns run on goteo.org
have the chance to declare that they donated money for social goals and get some
money back in their tax returns.</p>
<p>For those who want to run a campaign, Goteo reviews it and gives feedback, they
make campaign management easier and they are really focused on being
multilingual. They support translations for the campaigns, and the <span class="caps">UI</span> is available
in many languages, including all the Spanish regional languages and some
extra ones.</p>
<p>This platform is perfectly aligned with the philosophy of ElenQ Technology.</p>
<h2>The story</h2>
<p>I don’t talk about it enough, but I’ve been teaching informatics-related topics
since I started ElenQ Technology. In these 3 years I have given many courses:
introductory python, advanced python, data analysis, web scraping, bitcoin
and blockchain<sup id="fnref:blockchain"><a class="footnote-ref" href="#fn:blockchain">1</a></sup>, introductory clojure… And some more I can’t
remember at the moment. All of those were given in different contexts, from
courses for young unemployed people to courses for engineers in research
centers. Also, it looks like I’m going to keep teaching, because I like it and
the students say I’m good at it.</p>
<p>But this is not good enough. It helps me make a living, but it’s not enough.
I want to do my best to correct many of the issues I found on this 3-year journey.</p>
<p>I realized I have a tested course structure and materials I want to share.</p>
<p>I realized many people’s English level is not good enough to learn
technology by themselves. They need someone like me to serve as a bridge.
They are isolated from knowledge because of the place they come from and the
culture they have.</p>
<p>I realized all the technical publications I was reading were written from the
same perspective on technology. That made sense, because the authors all
came from the same context. I would like to have more diverse people writing
about technology, and the only way to get there is to make technology more accessible.</p>
<p>I realized that, in my local context, access to knowledge is broken in many
ways that, it seems, nobody is willing to change:</p>
<ul>
<li>
<p>In my area, government-backed courses only focus on groups of people who are
likely to get a job soon anyway. This way the government can say the jobs were
obtained thanks to the money invested in the courses, and win elections
with that. Young people who finished university this year are likely to get
a job in the next year. This doesn’t mean they don’t need the
course<sup id="fnref:course"><a class="footnote-ref" href="#fn:course">2</a></sup>, but the course itself won’t really affect their
employability. What about people with <strong>real</strong> employability
problems<sup id="fnref:employability-problems"><a class="footnote-ref" href="#fn:employability-problems">3</a></sup>?</p>
</li>
<li>
<p>Some people have individual problems that don’t fit the goals of social
campaigns run by the government or other entities, because those campaigns focus
on large groups of society with similar problems, not on the individual.
It makes sense, because they can help more efficiently that way, but the net
has some holes we should repair.</p>
</li>
<li>
<p>There is no structural support for people who just want to learn new things
with no further intention. Universities are drifting. Now they are just places
where you get a paper that helps you get a job, but they are not fulfilling the
goals of knowledge they should. They are not places where you find knowledge
anymore. Many people don’t want to think or learn, but some do, and we are
preventing them from doing it.</p>
</li>
<li>
<p>In other places the problem of education is even worse and they don’t have
resources (or will<sup id="fnref:will"><a class="footnote-ref" href="#fn:will">4</a></sup>) to solve it. Individuals shouldn’t suffer from
that. It’s our responsibility to help everyone have all the chances to
develop themselves as much as they want, regardless of their context.</p>
</li>
</ul>
<p>In general, all these points can be summarized in one: knowledge should be free
(libre). If it’s not free it’s not knowledge, it’s just something that makes you
more powerful than others: it’s injustice.</p>
<p>I realized many of these things could be solved with a good repository of
knowledge in different languages, and, as I teach and I like to
write, I thought it would be interesting to work a little more on the notes I give
my students and turn them into a book.</p>
<p>With <a href="https://ekaitz.elenq.tech/templates-released.html">a little bit of
effort</a>, I can make a book
that can be published on the web, as a physical book and as an easy-to-print <span class="caps">PDF</span>
that anyone can print and copy at a local print shop. I can make it reach any
place in the world.</p>
<p>Not only that. Some people designed a license that lets others create new
content on top of what I did and requires them to share what they did under the
same license: Creative Commons.</p>
<p>So, with some effort and some funding (and a smile on my face) I can create a
publishing project where I gather all the knowledge that my job makes me deal
with, and I can share it in a way that is ethical and respects everyone’s circumstances.</p>
<p>This is something I want to try. It’s something I <em>need</em> to try.</p>
<h2>The campaign</h2>
<p>That’s why I’m trying it.</p>
<p>This campaign is the first attempt to make this happen. If it’s successful, it
will let me spend some of my time giving love to the content I want to
release, and it will let me talk with people who have knowledge in areas I don’t
and help them publish it.</p>
<p>The campaign offers physical books, but they are just a vehicle for
publishing the content in a way that is easy to share. The physical objects are
just a way to raise funds.</p>
<p>The main goal is not just to make books: it’s the creation of an infrastructure
to share knowledge, one that I can use for the things I research but that can
also be used for things other people research. All the content is going to be
published raw in a repository that anyone can audit, review, improve or
use as the base for a new project.</p>
<p>Once the infrastructure is ready, publishing new books should be a piece of
cake. This first project is going to teach us how to handle the paperwork for the
<span class="caps">ISBN</span> and the book registration, and it is going to give us time to create the
website where the content is going to be stored. Once all those points are
ready, the rest is “just” writing and publishing.</p>
<p>The campaign’s goals are separated in two levels: the minimum and the optimum.</p>
<p>The minimum is the publishing of the books in Spanish, my mother tongue,
and it covers all the infrastructure costs for that. This way Spanish-speaking
people will have at least some technical books in their language.</p>
<p>The optimum ensures the publishing in English<sup id="fnref:english"><a class="footnote-ref" href="#fn:english">5</a></sup>. This goal enables
translation into other languages, because many people probably know either
Spanish or English, two of the most widely spoken languages in
the world, and that way they would be able to translate the books into their
mother tongue to help their own community. I’m not able to supervise a
translation into a language I don’t know, but I am able to produce reliable
material in both of these languages for people to build on.</p>
<p>As you see, the goals have a really interesting point of contradiction: I want
to provide technical material for people who don’t speak English, yet I’m
producing it in English to get there. Funny, huh?</p>
<p>I’m just trying to be as practical as I can.</p>
<p>There are many points I’d like to consider, many other translations I’d love to
do, but I need to focus my effort on something useful in the short term because
if it happens to be useful it’s going to push me to keep working on it in the
future, providing more and more material and translations.</p>
<h2>The feelings</h2>
<p>I’ve liked the idea of crowdfunding ever since I heard about it, and I’ve been
planning to run a campaign for years. Electronic devices, software, miniature games
and collections… Everything was a candidate for a crowdfunding campaign in my
mind, but I didn’t want to disappoint the patrons, so I never started one.</p>
<p>This time I think it’s possible to provide good material. The content is
already defined and tested in my courses, it only needs to be updated; the
goals are simple and doable; and I have good people around me who are
going to help me with everything.</p>
<p>This project helped me connect with the people I love and that’s one of the
best things in life. I know everything is going to be fine with their help and support.</p>
<p>Since I started with ElenQ Technology I have had the chance to meet many good people
with incredible skills and love in their hearts. This project is somehow the
result of the love they gave me, because it filled me with the courage I needed.</p>
<p>Just wanted to express my gratitude<sup id="fnref:gratitude"><a class="footnote-ref" href="#fn:gratitude">6</a></sup>.</p>
<p>Thank you all.</p>
<hr style="border-width: 1px; height: 4em;">
<h4>If you want to take part…</h4>
<p>There are many ways you can help, but crowdfunding campaigns make it look like the only
one is giving money, and that’s not true. Also, there are many ways to give
money, and some are more effective than others.</p>
<p>For people who are not able to provide funds but want to take part, there are
also very helpful tasks:</p>
<ul>
<li>
<p>Sharing the campaign with people, communities, universities, libraries and so
on that might be interested is always good.</p>
</li>
<li>
<p>Once the contents are released in our repository, reviewing the content or
improving it will help us a lot.</p>
</li>
<li>
<p>Translating the content to other languages will help other communities we
can’t directly help. Your language skills are valuable.</p>
</li>
<li>
<p>Your love and support are always welcome and help us keep up the good work.</p>
</li>
</ul>
<p>For the people that want to provide monetary help, there are points to consider:</p>
<ul>
<li>
<p>The best way to help is to give money without asking for any physical good
(the first reward is for that), because the books have an associated printing
and shipping cost. They are going to be released online anyway, so if you
don’t really like the idea of owning a physical object, you can still take
part and get the result of the campaign.</p>
</li>
<li>
<p>The second best way is to give money and ask for the physical good(s). In the
case of this campaign, the more books you order, the cheaper their production is
(bulk orders and economies of scale, you know…).</p>
</li>
<li>
<p>One of the highest costs is shipping, so picking up the goods in
person<sup id="fnref:coffee"><a class="footnote-ref" href="#fn:coffee">7</a></sup> (Bilbao) or grouping shipments reduces the costs and makes
the donation more efficient. It’s better for us if a group of friends makes
just one big order instead of many small ones.</p>
</li>
</ul>
<p>That said, here goes the link to the campaign if you want to take part.</p>
<p><a href="http://en.goteo.org/project/elenq-publishing">http://en.goteo.org/project/elenq-publishing</a></p>
<p>Thank you.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:blockchain">
<p>Don’t judge me too fast: the course was a technical explanation
of every detail of how bitcoin works. I wanted students to learn the
cryptography behind it and all the good design ideas bitcoin has, while I tried
to make them critical of blockchain technology during the
blockchain boom. <a class="footnote-backref" href="#fnref:blockchain" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:course">
<p>They need it, even more if it’s a course like mine where I talk to them about
working with ethics and being independent. :D <a class="footnote-backref" href="#fnref:course" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:employability-problems">
<p>Say: women, unemployed people in their 50s pushed
out of their jobs by
<a href="https://en.wikipedia.org/wiki/Deindustrialization">de-industrialization</a>,
immigrants, people with disabilities, people who just got out of jail… <a class="footnote-backref" href="#fnref:employability-problems" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:will">
<p><span class="caps">USA</span>, I’m looking at you. <a class="footnote-backref" href="#fnref:will" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
<li id="fn:english">
<p>Don’t worry, I know my English is bad and I’m not going to
translate them, a professional service will (under my supervision for the
terminology and stuff). <a class="footnote-backref" href="#fnref:english" title="Jump back to footnote 5 in the text">↩</a></p>
</li>
<li id="fn:gratitude">
<p>I don’t know why… I’m like that I guess. <a class="footnote-backref" href="#fnref:gratitude" title="Jump back to footnote 6 in the text">↩</a></p>
</li>
<li id="fn:coffee">
<p>If you get the books in person I’ll take a coffee/tea with you. <a class="footnote-backref" href="#fnref:coffee" title="Jump back to footnote 7 in the text">↩</a></p>
</li>
</ol>
</div>ElenQ Publishing2020-02-18T00:00:00+02:002020-02-18T00:00:00+02:00Ekaitz Zárragatag:ekaitz.elenq.tech,2020-02-18:/elenq-publishing-es.html<p>ElenQ Publishing es un proyecto de publicaciones técnicas que pretende abrir
la puerta del conocimiento técnico a cualquiera.</p><p>Saludos,</p>
<p>Hoy 18 de febrero de 2020 ha empezado la campaña de ElenQ Publishing y me
gustaría hablar un poco sobre ello.</p>
<p><a href="http://goteo.org/project/elenq-publishing">http://goteo.org/project/elenq-publishing</a></p>
<h2>La plataforma</h2>
<p>En primer lugar, me gustaría destacar la plataforma en la que se ha publicado:</p>
<p><a href="http://goteo.org">http://goteo.org</a></p>
<p>Goteo es una plataforma para campañas de mecenazgo con un fondo social.
Funciona sobre código libre publicado con licencia Affero <span class="caps">GPL</span>, que permite la
reutilización y la extensión del mismo siempre y cuando el código de la
plataforma (tanto servidor como cliente) esté disponible para <del>los
usuarios</del> las personas que la usen.</p>
<p>La Fundación Goteo se encarga de mantener el código de goteo y de gestionar
goteo.org. La fundación se financia con el 5% del dinero obtenido por las
campañas, una cifra bastante justa por sus servicios, y por el apoyo que
reciben de entidades públicas como el Ayuntamiento de Barcelona y el Ministerio
de Educación, Cultura y Deporte de España.</p>
<p>Además, al menos en España y no sé si en otros países, las personas que
participen aportando dinero en campañas de Goteo pueden declararlo en su
Declaración de la Renta y desgravar por donación a fines sociales.</p>
<p>Para los que quieran hacer una campaña, la Fundación Goteo revisa el contenido
y aporta recomendaciones e ideas. Además, la plataforma está pensada para
aceptar varios idiomas y la interfaz está traducida a todos los idiomas
regionales de España y algunos otros más.</p>
<p>En general, encaja muy bien con la perspectiva ética de ElenQ Technology.</p>
<h2>La historia</h2>
<p>No hablo mucho sobre ello, quizás menos de lo que debería, pero he estado
dedicándome a dar cursos relacionados con la informática durante estos primeros
3 años de ElenQ Technology. He dado cursos de diversos temas: introducción a
python, python avanzado, bitcoin y blockchain<sup id="fnref:blockchain"><a class="footnote-ref" href="#fn:blockchain">1</a></sup>, clojure… Y
algunos más que no recuerdo. He trabajado en muchos contextos distintos desde
cursos para jóvenes en desempleo a cursos para ingenieros en centros de
investigación. Parece, además, que voy a seguir haciéndolo, porque es un
trabajo que disfruto y los alumnos suelen decirme que se me da bien.</p>
<p>Por mucho que me ayude a ganarme el pan, creo que esto no es suficiente. En mi
día a día veo muchos problemas que me gustaría resolver.</p>
<p>Me di cuenta de que tengo materiales y cursos ya probados que quiero compartir.</p>
<p>Me di cuenta de que la gente que no estudia estos temas por su cuenta no lo hace
por vagancia o porque no sea suficientemente inteligente. Muchos no
lo hacen porque tienen problemas con el inglés y necesitan a alguien como yo
que les sirva de puente. Están aislados por el lugar en el que nacieron o por
la cultura que tienen. Esto es inadmisible.</p>
<p>Me di cuenta de que las publicaciones técnicas que leía estaban escritas, en
general, desde la misma perspectiva. Tiene sentido, porque la mayor parte de
los autores parten del mismo contexto. Me gustaría tener unas publicaciones
técnicas más diversas y la única forma de conseguir esto es hacer que la
tecnología sea más accesible.</p>
<p>Me di cuenta de que en mi contexto local el sistema educativo tiene problemas
evidentes que parece que no hay ningún interés en resolver:</p>
<ul>
<li>
<p>En mi provincia, los cursos subvencionados sólo se centran en colectivos que
es probable que consigan un trabajo en el corto plazo. De este modo, los
políticos responsables pueden decir que consiguieron el trabajo gracias a
ellos y seguir ganando elecciones. Los jóvenes que acaban de terminar la
universidad conseguirán un trabajo en el corto plazo independientemente de
que realicen o no realicen cursos subvencionados. Esto no significa que no
los necesiten<sup id="fnref:cursos"><a class="footnote-ref" href="#fn:cursos">2</a></sup>, significa que no afectará a su empleabilidad. ¿Qué
pasa con quienes tienen <strong>verdaderos problemas</strong> para conseguir
empleo<sup id="fnref:problemas"><a class="footnote-ref" href="#fn:problemas">3</a></sup>?</p>
</li>
<li>
<p>Algunas personas tienen problemas individuales que no encajan en los
objetivos de las campañas sociales de gobiernos y otras entidades porque se
fijan en grandes grupos de personas con problemas similares y no en personas
individuales. Tiene sentido que lo hagan, porque de este modo su ayuda es más
eficiente, pero esta red de soporte tiene orificios que deberíamos resolver.</p>
</li>
<li>
<p>No hay soporte estructural para personas que simplemente quieren aprender por
el placer de hacerlo y no con un fin laboral. La universidad está derivando.
Hoy en día no es mucho más que un lugar que te da un papel con el que luego
es más fácil conseguir un trabajo, pero no está satisfaciendo las necesidades
intelectuales de la sociedad como debería. Ya no es un lugar donde encontrar
conocimiento. Muchas personas no quieren aprender ni pensar, pero otras sí
que quieren y les estamos impidiendo hacerlo.</p>
</li>
<li>
<p>En otros lugares el problema de la educación es incluso peor y no tienen
recursos (o voluntad<sup id="fnref:voluntad"><a class="footnote-ref" href="#fn:voluntad">4</a></sup>) para solucionarlo. Las personas no deberían
sufrir las consecuencias de un sistema que no funciona, sea por la razón que
sea. Es nuestra responsabilidad ayudar a todo el mundo a tener oportunidades
para desarrollarse tanto como quiera independientemente de su contexto.</p>
</li>
</ul>
<p>En general, todos estos puntos vienen a resumirse en uno: El conocimiento tiene
que ser libre. Si no es libre no se puede considerar conocimiento, es sólo algo
que me hace más fuerte que los demás: es injusticia.</p>
<p>Me di cuenta de que todas estas cosas pueden solventarse (o tratar de solventarse)
con un buen repositorio de conocimiento en varios idiomas y, ya que doy
clases y me gusta escribir, considero interesante dedicar algo más de tiempo a
los apuntes que entrego a mis alumnos y darles forma de libro.</p>
<p>Con <a href="https://ekaitz.elenq.tech/templates-released.html">un poco de esfuerzo</a>,
puedo crear un libro que se puede publicar en la web, en un libro físico y un
<span class="caps">PDF</span> fácil de imprimir y de copiar en una copistería cercana. Puedo hacer que
esto llegue a cualquier lugar del mundo.</p>
<p>No sólo eso. Alguien se ha tomado la molestia de crear una licencia que permite
que los contenidos que publique sean editados y mejorados siempre que el
producto resultante tenga la misma condición: Creative Commons.</p>
<p>Por tanto, con un poco de esfuerzo y un poco de presupuesto (y una sonrisa)
puedo crear un proyecto de publicación que albergue el conocimiento que a
diario me encuentro gracias a mi trabajo y poder así compartirlo de forma ética
y accesible, respetando las circunstancias de las personas que quieran consumirlo.</p>
<p>Esto es algo que, evidentemente, quiero intentar. Es algo que <em>tengo que</em> intentar.</p>
<h2>La campaña</h2>
<p>Es por eso que lo estoy intentando.</p>
<p>Esta campaña es el primer intento para hacer de esto una realidad. Si tiene
éxito, me permitirá dedicar un poco de tiempo a dar amor a los contenidos que
quiero publicar y me permitirá publicar el conocimiento de otros.</p>
<p>El objetivo de publicar libros físicos es sólo un vehículo para poder
publicarlos de modo que sea fácil de compartir. Los objetos físicos son sólo
una forma de conseguir los fondos.</p>
<p>La idea principal no es hacer unos libros, es crear una infraestructura para
compartir contenido que me permita compartir lo que investigo y pueda servir de
plataforma para que otros compartan lo que ellos investigan. Todo el contenido
será publicado en crudo en un repositorio que cualquiera pueda auditar,
revisar, mejorar o crear proyectos derivados desde éste.</p>
<p>Una vez que la infraestructura esté disponible, publicar nuevos contenidos será
extremadamente sencillo. El primer proyecto nos enseñará a tratar con el
papeleo necesario para conseguir los <span class="caps">ISBN</span>, el registro del libro, etc. y para
crear las herramientas necesarias para publicar (una web…). Una vez resueltos
estos puntos, el resto es “sólo” escribir y publicar.</p>
<p>Los objetivos de la campaña están separados en dos niveles: el mínimo y el óptimo.</p>
<p>El mínimo trata de publicar los libros en español, el idioma en el que pienso,
y cubre los gastos de infraestructura para estos. De este modo, todas las
personas de habla hispana podrán tener libros técnicos en su idioma.</p>
<p>El objetivo óptimo asegura la publicación de los libros en inglés<sup id="fnref:ingles"><a class="footnote-ref" href="#fn:ingles">5</a></sup> con
el fin de habilitar la traducción a otros idiomas. Muchas personas son capaces
de hablar, además de su propio idioma, inglés o español, ya que son dos de los
idiomas más hablados del mundo. De este modo, la probabilidad de que alguien
pueda tomar nuestras publicaciones y traducirlas a otros idiomas aumenta de
forma radical, facilitando así que ayuden a sus comunidades de una forma en la
que nosotros no estamos capacitados. Puedo hacer lo posible para ayudar en
traducciones a los idiomas que conozco, pero no tengo más alcance que ése.
Aportar una buena base en un idioma común permite que estas traducciones
espontáneas surjan de forma independiente.</p>
<p>Me encantan estas contradicciones: quiero aportar material para acabar con la
hegemonía del inglés y lo publico en inglés. Es gracioso. ¿Verdad?</p>
<p>Sólo intento ser lo más práctico posible y elegir qué batallas puedo librar.</p>
<p>Hay muchas cosas que me gustaría revisar, muchas traducciones que me gustaría
hacer, etc. pero necesito fijarme en lo que puedo aportar a corto plazo y, si
resulta útil, crecer desde ahí, ya que me dará fuerzas para seguir trabajando
en el futuro.</p>
<h2>Los sentimientos</h2>
<p>La idea de los crowdfunding lleva años en mi cabeza. Llevo años tratando de
hacer alguno: de electrónica, software, juegos y colecciones de miniaturas…
Todas mis aficiones eran candidatas a ser un crowdfunding, pero nunca me animé
a hacerlos porque no quería decepcionar a los mecenas. Me daba vértigo.</p>
<p>En esta ocasión creo que puedo aportar buen material. El contenido está probado
en mis cursos, sólo necesita actualizarse, los objetivos son simples y
realizables y estoy rodeado de buenas personas que me ayudan con todo.</p>
<p>Este proyecto me ha ayudado a conectar con las personas que quiero y eso es
algo maravilloso. Sé que todo va a salir bien con su ayuda y su apoyo.</p>
<p>Desde que empecé ElenQ Technology, he tenido oportunidad de crear vínculos con
gente maravillosa a la que tengo mucho respeto. Este proyecto es, de alguna
manera, el resultado del amor que me dan, porque me ha hecho superar el miedo
al fracaso. Con su ayuda, creo que puedo conseguir todo lo que me proponga.</p>
<p>Sólo quería expresar mi gratitud<sup id="fnref:gratitud"><a class="footnote-ref" href="#fn:gratitud">6</a></sup>.</p>
<p>Gracias.</p>
<hr style="border-width: 1px; height: 4em;">
<h4>Si quieres colaborar…</h4>
<p>Hay muchas formas de colaborar pero la realidad es que los crowdfunding dan la
impresión de que la única es la monetaria. Eso no es cierto. Además, algunas
formas de aportar fondos son más efectivas que otras.</p>
<p>Para las personas que quieran ayudar sin hacer aportaciones monetarias hay
unas tareas que serían muy útiles:</p>
<ul>
<li>
<p>Compartir la campaña con personas, comunidades, universidades y librerías que
puedan estar interesadas siempre es útil. Yo prefiero compartir con quien
pueda tener interés más que insistir de forma aleatoria en las redes sociales.</p>
</li>
<li>
<p>Una vez los contenidos estén disponibles en el repositorio, revisarlos y
mejorarlos nos ayuda mucho. La tarea de revisar libros es tediosa y aburrida
y cualquier ayuda que podamos tener ahí será bienvenida.</p>
</li>
<li>
<p>Traducir el contenido a otros idiomas es importante. Nosotros a título
personal no podemos ayudar en esa tarea, si lo haces, ayudas a tu comunidad
de forma directa.</p>
</li>
<li>
<p>Tu amor y tu apoyo es siempre bienvenido y nos ayuda a tener fuerzas en los
momentos donde el trabajo se acumula.</p>
</li>
</ul>
<p>Para los que quieran y puedan permitirse una ayuda monetaria, hay algunos
detalles a considerar:</p>
<ul>
<li>
<p>La mejor manera de ayudar es no pedir el producto físico (para eso está la
primera recompensa). De este modo, ayudas sin crear ningún tipo de gasto
material y de igual modo podrás acceder al contenido una vez se publique.
Evidentemente, los libros físicos tienen algo de romanticismo y por eso se ofertan.</p>
</li>
<li>
<p>La segunda mejor manera es pedir los productos físicos. En el caso de esta
campaña, cuantos más libros produzcamos menos coste tendrá cada unidad
(economía de escala).</p>
</li>
<li>
<p>Uno de los mayores gastos de la campaña es el envío. Recoger los libros en
mano<sup id="fnref:cafe"><a class="footnote-ref" href="#fn:cafe">7</a></sup> (Bilbao) o agrupar pedidos reduce el coste del envío y hace que
la donación sea más eficiente. Es mejor para nosotros (y para ti, debido a
los descuentos) si se forma un grupo de personas y hacen pedidos conjuntos.</p>
</li>
</ul>
<p>Dicho esto, aquí va el link de la campaña por si te apetece participar:</p>
<p><a href="http://goteo.org/project/elenq-publishing">http://goteo.org/project/elenq-publishing</a></p>
<p>Muchas gracias.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:blockchain">
<p>No me juzguéis demasiado rápido: el curso estaba enfocado desde
una perspectiva técnica que trataba de fomentar el pensamiento crítico en los
tiempos del boom del blockchain. <a class="footnote-backref" href="#fnref:blockchain" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:cursos">
<p>Los necesita, sobre todo si son cursos como los que yo hago en los
que hablo de ética y de cómo crear tecnología independiente :D <a class="footnote-backref" href="#fnref:cursos" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:problemas">
<p>Hablo de las mujeres, de los desempleados de más de 50 años que
fueron expulsados de sus puestos de trabajo por la desindustrialización, las
personas con discapacidad, inmigrantes, las que acaban de salir de la
cárcel… <a class="footnote-backref" href="#fnref:problemas" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:voluntad">
<p>Un saludo para los Estados Unidos de Norteamérica. <a class="footnote-backref" href="#fnref:voluntad" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
<li id="fn:ingles">
<p>Tranquilidad, no seré yo el traductor. Soy consciente de que mi
nivel de inglés no es suficiente para una tarea así. Sí que trataré de
supervisar que la terminología, etc. es la correcta, pero no llego a mucho
más. <a class="footnote-backref" href="#fnref:ingles" title="Jump back to footnote 5 in the text">↩</a></p>
</li>
<li id="fn:gratitud">
<p>Soy así, yo qué sé. <a class="footnote-backref" href="#fnref:gratitud" title="Jump back to footnote 6 in the text">↩</a></p>
</li>
<li id="fn:cafe">
<p>Si te acercas a la ciudad para recibirlo en mano te invito a un café o
un té y charlamos un rato si te apetece. <a class="footnote-backref" href="#fnref:cafe" title="Jump back to footnote 7 in the text">↩</a></p>
</li>
</ol>
</div>Dark or Light: choose one2020-01-15T00:00:00+02:002020-01-15T00:00:00+02:00Ekaitz Zárragatag:ekaitz.elenq.tech,2020-01-15:/dark-light-theme.html<p>How I implemented dark-light theme switcher in this blog.</p><p>Since this afternoon this blog has a way to change between dark and light themes.</p>
<p>I made this because my eyes hurt when I visit really light websites from a
window with a dark background. My desktop environment is configured to show
everything with a dark background and I spend most of my time in the terminal,
so my eyes get used to the dark background and the light ones hurt, especially
at night.</p>
<p>I realized one of the sites that made my eyes hurt was my own website. I
can’t fix the whole web, but at least I can fix my site and write down what I
did <strong>to encourage you to fix yours</strong>.</p>
<h2>User preference</h2>
<p>First things first: since 2019 <span class="caps">CSS</span> has a new <em>mediaquery</em> that lets you know whether
the visitor has a dark or a light background configured in their system. I was
introduced to this thanks to <a href="https://mastodon.social/@sheenobu/103149905102239911">@sheenobu</a>, who took the time to
answer my message and make all this happen<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>.</p>
<p><a href="https://developer.mozilla.org/en-US/docs/Web/CSS/@media/prefers-color-scheme">Here you have some documentation</a> about that <em>mediaquery</em>
called <code>prefers-color-scheme</code>, but in summary it can take three values
—<code>light</code>, <code>dark</code> and <code>no-preference</code>— that are quite self-explanatory in my opinion.</p>
<p>So if you mix that with a little bit of <a href="mdn-custom-pro"><span class="caps">CSS</span> custom properties
magic</a> (<span class="caps">AKA</span> variables) you can just parametrize the whole
color scheme and then use the <em>mediaquery</em> to choose the variables you want to
use. That’s fine.</p>
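<p>As an illustration of that mix (a simplified sketch with made-up variable
names and color values, not this blog’s actual stylesheet), the whole idea fits
in a few lines:</p>
<pre><code class="language-css">/* Parametrize the palette with custom properties; light is the default */
:root {
  --bg: #fffff8;
  --fg: #1a1a1a;
}

/* Override the variables when the visitor prefers a dark scheme */
@media (prefers-color-scheme: dark) {
  :root {
    --bg: #1a1a1a;
    --fg: #fffff8;
  }
}

/* The rest of the stylesheet only ever refers to the variables */
body {
  background-color: var(--bg);
  color: var(--fg);
}
</code></pre>
<p>Every rule reads colors through <code>var()</code>, so swapping the whole scheme is
just a matter of redefining two custom properties.</p>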
<h2>Locked in your preference</h2>
<p>The problem comes when you want to be able to let the user change from one
color scheme to other.</p>
<p>The <code>prefers-color-scheme</code> mediaquery gets the user’s preference but, at least in
Firefox<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup>, it’s not easy for users to change to a light theme if they want to.
They are locked into what they chose for their <span class="caps">OS</span>.</p>
<p>Sometimes it’s interesting to let readers change the theme by themselves,
for multiple reasons. As each developer or designer chooses the colors they
want, that may lead to color-scheme inconsistencies between the system colors
and the sites, or to readability issues. Also, readability is subject to
personal preference: I like to use dark backgrounds, but sometimes I prefer to
read from a lighter background if the ambient light is stronger.</p>
<h2>Letting visitors contradict themselves</h2>
<p>In order to let the visitors go against that blood pact they signed with their
<span class="caps">OS</span>, we need some JavaScript<sup id="fnref:3"><a class="footnote-ref" href="#fn:3">3</a></sup>.</p>
<blockquote>
<p><strong>Note about my personal taste:</strong> I avoid the use of JavaScript in places
where it is not needed. I consider it unnecessary for blogs or websites that show
you information in a format supported by the web (text, audio, video…).
Also, I consider it <strong>really</strong> important to think about users who don’t
want to or can’t run JavaScript.</p>
<p>Most of my sites don’t use JavaScript at all; modern <span class="caps">CSS</span> and <span class="caps">HTML</span> are more
than enough for most applications. Webpages with a heavy use of
JavaScript are a threat to accessibility and make bots, spiders and
non-canonical browsers hard to implement<sup id="fnref:4"><a class="footnote-ref" href="#fn:4">4</a></sup>.<br>
This blog makes use of JavaScript for two different things:</p>
<ul>
<li>The theme change I’m talking about in this post</li>
<li>Source code highlighting</li>
</ul>
<p>In both cases the blog is prepared to work perfectly for users with
JavaScript disabled. When JavaScript is disabled, code blocks respect the
<span class="caps">HTML</span> tags for code declaration but carry no extra tags or styles. In
the case of the theme control, when JavaScript is disabled, the website uses
the visitor’s default preference, leaving the option to change the theme
in the hands of the browser or the operating system. Most of the time, these
design decisions work in favour of users that access the web from browsers
that don’t need any kind of styling (terminal browsers, screen readers…),
helping the browser find the content more easily.</p>
</blockquote>
<p>When I was about to start implementing it I remembered a <a href="https://medium.com/@mwichary/dark-theme-in-a-day-3518dde2955a">Medium post by Marcin
Wichary</a> that explains the process very well. I used it as
a reference, but I added a couple of points I want to share with you. I’ll also
try to cover everything the author talks about in my own words, just in case
someone doesn’t want to access Medium<sup id="fnref:5"><a class="footnote-ref" href="#fn:5">5</a></sup>.</p>
<p>The first difference from the reference post is what I told you about in the
previous section. The post is from 2018, and the <code>prefers-color-scheme</code>
mediaquery is from 2019, so it’s not mentioned in the post<sup id="fnref:6"><a class="footnote-ref" href="#fn:6">6</a></sup>.</p>
<p>The mentioned post also has an introduction to <span class="caps">CSS</span> Custom Properties and their
use. I already gave you a link to the <span class="caps">MDN</span> Web documentation and I don’t feel
informed enough to explain anything about <span class="caps">CSS</span> to you, so better go
there and read.</p>
<p>That said, the first problem we have to solve is having some property that lets
<span class="caps">CSS</span> know which theme is in use. That can be implemented like the article does,
adding a <code>data-</code><em>something</em> attribute to the <code>html</code> element that can then be matched
in <span class="caps">CSS</span> like this:</p>
<pre class="highlight"><code class="language-css">html[data-theme='dark'] {
    /* Your dark theme style here */
}
html[data-theme='light'] {
    /* Your light theme style here */
}
</code></pre>
<blockquote>
<p><span class="caps">WARNING</span>: Be careful with the precedence of this rule: you have to put it
after the <code>prefers-color-scheme</code> mediaquery to make the cascade work as it
should. If you put it before, the mediaquery may override this
configuration and the change will pass unnoticed.</p>
</blockquote>
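<p>To make the ordering concrete, here is a minimal sketch of a stylesheet layout that respects that rule (the variable names and colors are mine, not this blog’s actual <span class="caps">CSS</span>):</p>
<pre class="highlight"><code class="language-css">/* 1. Defaults: the light palette */
:root {
    --bg: #ffffff;
    --fg: #222222;
}
/* 2. OS preference, used when the visitor hasn't chosen anything yet */
@media (prefers-color-scheme: dark) {
    :root {
        --bg: #222222;
        --fg: #eeeeee;
    }
}
/* 3. Explicit choice, declared last so the cascade favors it */
html[data-theme='light'] {
    --bg: #ffffff;
    --fg: #222222;
}
html[data-theme='dark'] {
    --bg: #222222;
    --fg: #eeeeee;
}
</code></pre>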
<p>But now you have to deal with the attribute and make it change whenever the
visitor selects one configuration or the other. As I said, you need JavaScript for
that. Setting the attribute is as simple as this vanilla JavaScript line:</p>
<pre class="highlight"><code class="language-clike">document.documentElement.setAttribute('data-theme', color);
</code></pre>
<p>Good. Now it’s quite easy to start, right? Add a button, put an event listener
on it, and whenever it’s pressed change the theme by setting the attribute with
the line I just showed you. For instance:</p>
<pre class="highlight"><code class="language-clike">var theme_switch = document.getElementById('dark-light-switch');
function change(color){
    document.documentElement.setAttribute('data-theme', color);
    theme_switch.setAttribute('color', color);
}
function theme_change_requested(){
    var color = theme_switch.getAttribute('color');
    if(color=='light')
        change('dark');
    else
        change('light');
}
theme_switch.addEventListener('click', theme_change_requested);
</code></pre>
<p>We selected an element that will act as a theme switcher and added an event
listener to it. Whenever it’s clicked it will run the <code>theme_change_requested</code>
function that will change the color from the current to the other. Easy.</p>
<p>Problems come now.</p>
<h3>Get the initial color</h3>
<p>In order to start that process, you have to be able to know the current theme
in use, so that you can set the necessary attribute on the <code>html</code> tag and the
initial look of the theme switcher (in this blog, a sun or a moon).</p>
<p>This current theme inspection turns out to be tricky. (You can query the
<code>prefers-color-scheme</code> mediaquery from JavaScript with <code>window.matchMedia</code>, but
that only tells you the preference, not what the <span class="caps">CSS</span> actually applied.) You can
bypass that by getting something you know is going to be present in your <span class="caps">CSS</span> and
reading it. In my case I used the <code>background-color</code> of the <code>body</code>, because I
set the background to white in the light color scheme, as you can see in the
<code>getCurrentColor</code> function:</p>
<pre class="highlight"><code class="language-clike">var theme_switch = document.getElementById('dark-light-switch');
function change(color){
    document.documentElement.setAttribute('data-theme', color);
    theme_switch.setAttribute('color', color);
}
function theme_change_requested(){
    var color = theme_switch.getAttribute('color');
    if(color=='light')
        change('dark');
    else
        change('light');
}
function getCurrentColor(){
    // This is dependent on the CSS, be careful
    var body = document.getElementsByTagName('BODY')[0];
    var background = getComputedStyle(body).getPropertyValue('background-color');
    if(background == 'rgb(255, 255, 255)') {
        return 'light';
    } else {
        return 'dark';
    }
}
function init( color ){
    change(color);
    theme_switch.setAttribute('color', color);
}
init( getCurrentColor() );
theme_switch.addEventListener('click', theme_change_requested);
</code></pre>
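<p>By the way, comparing against the literal string <code>'rgb(255, 255, 255)'</code> is fragile: the moment your light background stops being pure white, the detection silently breaks. A more general helper (a sketch of mine, not what this blog actually runs) classifies any computed <code>background-color</code> by its relative luminance:</p>
<pre class="highlight"><code class="language-clike">// Classify an 'rgb(r, g, b)' string as light or dark using relative
// luminance instead of an exact string comparison.
function isLightColor(rgbString) {
    var parts = rgbString.match(/\d+/g).map(Number);
    // Rec. 709 luma coefficients, normalized to the 0..1 range
    var luminance = (0.2126 * parts[0] + 0.7152 * parts[1] + 0.0722 * parts[2]) / 255;
    return luminance > 0.5;
}
</code></pre>
<p><code>getCurrentColor</code> could then return <code>isLightColor(background) ? 'light' : 'dark'</code> instead of doing the string comparison.</p>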
<p>Now, with this new code you are able to get the current theme when the page
loads and prepare your button and your <code>html</code> tag to start with the color
scheme the visitor has configured by default.</p>
<h3>Page-change amnesia</h3>
<p>Once you have all of the above working you’ll realize the website forgets the
visitor’s decision when they navigate from one page to another. It makes
perfect sense: nothing persists the selection between page loads.</p>
<p>We can make use of <code>localStorage</code> for this. With the following line we can set
the <code>'color'</code> item in <code>localStorage</code> to the color the visitor chose:</p>
<pre class="highlight"><code class="language-clike">localStorage.setItem('color', color);
</code></pre>
<p>Updating the <code>getCurrentColor</code> function, we can read the color from
<code>localStorage</code> first and, if it’s not set, fall back to the strategy we used
before with the <code>body</code><span class="quo">‘</span>s <code>background-color</code>. This is the updated <code>getCurrentColor</code>
function:</p>
<pre class="highlight"><code class="language-clike">function getCurrentColor(){
    // Color was set before in localStorage
    var storage_color = localStorage.getItem('color');
    if(storage_color !== null){
        return storage_color;
    }
    // If localStorage is not set, check the background of the page
    // This is dependent on the CSS, be careful
    var background = getComputedStyle(body).getPropertyValue('background-color');
    if(background == 'rgb(255, 255, 255)') {
        return 'light';
    } else {
        return 'dark';
    }
}
</code></pre>
<p>With this function we can know which color the user has configured, or the color
they chose with our color selector button, but we still have to activate the theme
if the user has chosen one that is not the one in their preferences. Updating
the <code>init</code> and <code>change</code> functions this way is more than enough for that:</p>
<pre class="highlight"><code class="language-clike">function init( color ){
    change(color, true);
    localStorage.setItem('color', color); // CHANGED!
    theme_switch.setAttribute('color', color);
}
function change(color, nowait){
    document.documentElement.setAttribute('data-theme', color);
    theme_switch.setAttribute('color', color);
    localStorage.setItem('color', color); // CHANGED!
}
</code></pre>
<h3>Smooth transitions</h3>
<p>In the article I had as a reference, the author takes a simple but very effective
approach to theme transitions, using the following <span class="caps">CSS</span>:</p>
<pre class="highlight"><code class="language-css">html.color-theme-in-transition,
html.color-theme-in-transition *,
html.color-theme-in-transition *:before,
html.color-theme-in-transition *:after {
    transition: all 750ms !important;
    transition-delay: 0 !important;
}
</code></pre>
<p>The article also explains how to activate the transition: the following piece
of JavaScript activates it and deactivates it one second later:</p>
<pre class="highlight"><code class="language-clike">window.setTimeout(function() {
    document.documentElement.classList.remove('color-theme-in-transition');
}, 1000);
document.documentElement.classList.add('color-theme-in-transition');
</code></pre>
<p>We have to be careful with where we add this, because we may be forcing
transitions during navigation, and that’s really annoying. Updating the <code>change</code>
function with the transition is not enough; we need a way to discard the
transition for the changes produced by the <code>init</code> function. We can exploit the
fact that <span class="caps">JS</span> arguments are optional for that. Of course, the transition must be
added in the <code>change</code> function.</p>
<pre class="highlight"><code class="language-clike">function init( color ){
    change(color, true); // Add true for nowait
    localStorage.setItem('color', color);
    theme_switch.setAttribute('color', color);
}
function change(color, nowait){ // Add the nowait argument
    // Discard the transition if nowait is set
    if(nowait !== true){
        window.setTimeout(function() {
            document.documentElement.classList.remove('color-theme-in-transition');
        }, 1000);
        document.documentElement.classList.add('color-theme-in-transition');
    }
    document.documentElement.setAttribute('data-theme', color);
    theme_switch.setAttribute('color', color);
    localStorage.setItem('color', color);
}
</code></pre>
<p>Now, with all this, we are able to make the website remember the user’s
configuration from one page to another.</p>
<h2>Wrapping up</h2>
<p>With this configuration we are able to:</p>
<ul>
<li>Get the visitor’s configuration based on the <span class="caps">OS</span> color theme: dark or light.</li>
<li>Let the visitor change their mind by choosing a different color scheme.</li>
<li>Get the initial color of the page to be able to initialize the buttons.</li>
<li>Make the website remember the color scheme selection from one page to another
using <code>localStorage</code>.</li>
<li>Add smooth transitions, but don’t activate them on page changes to avoid weird flashes.</li>
</ul>
<p>And that’s all.</p>
<p>No! It isn’t! We also had some fun talking about philosophy, accessibility and
sites you shouldn’t visit. In fact, all the color theme stuff was an excuse to
talk about it, but <em>sssssssh</em> don’t tell anyone.</p>
<p>If, after knowing that, you are still interested in the excuse itself, all the
code together looks like this:</p>
<pre class="highlight"><code class="language-clike">var theme_switch = document.getElementById('dark-light-switch');
var body = document.getElementsByTagName('BODY')[0];
function init( color ){
    change(color, true);
    localStorage.setItem('color', color);
    theme_switch.setAttribute('color', color);
}
function change(color, nowait){
    // Discard the transition if nowait is set
    if(nowait !== true){
        window.setTimeout(function() {
            document.documentElement.classList.remove('color-theme-in-transition');
        }, 1000);
        document.documentElement.classList.add('color-theme-in-transition');
    }
    document.documentElement.setAttribute('data-theme', color);
    theme_switch.setAttribute('color', color);
    localStorage.setItem('color', color);
}
function theme_change_requested(){
    var color = theme_switch.getAttribute('color');
    if(color=='light')
        change('dark');
    else
        change('light');
}
function getCurrentColor(){
    // Color was set before in localStorage
    var storage_color = localStorage.getItem('color');
    if(storage_color !== null){
        return storage_color;
    }
    // If localStorage is not set, check the background of the page
    // This is dependent on the CSS, be careful
    var background = getComputedStyle(body).getPropertyValue('background-color');
    if(background == 'rgb(255, 255, 255)') {
        return 'light';
    } else {
        return 'dark';
    }
}
init( getCurrentColor() );
theme_switch.addEventListener('click', theme_change_requested);
</code></pre>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>Thanks for being there! <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:2">
<p>The way to change that is to access <code>about:config</code> and update the
<code>ui.systemUsesDarkTheme</code> field: <code>1</code> means <code>true</code> and <code>0</code> means <code>false</code>. Be
careful, because it’s not a boolean field but an integer field (I don’t know
why, don’t ask me). This change affects <strong>all</strong> the tabs. <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:3">
<p>This isn’t true in <em>every</em> context. We need it here because this is a
static website, which means the content you read is already created on the
server side before you ask for it. If it weren’t, all this could be simpler: just
load a different <span class="caps">CSS</span> depending on the user’s choice. The static counterpart
of this approach would be to create the <em>whole website</em> once per color scheme
and leave the copies in different folders like <code>domain/dark/whatever.html</code> and
<code>domain/light/whatever.html</code>, which is not practical at all and carries tons of
extra problems. <a class="footnote-backref" href="#fnref:3" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:4">
<p>As <em>everyone</em> wants to have a good rank in the search engines <em>more than
anything else</em>, Google made a lot of decisions about how websites should be
built in order to be indexed properly. With the market share they had
(almost 100%) they had the power to force developers and designers to make
websites the way Google liked. That was obviously bad, but it had some good
consequences: websites were easy to scrape or read by a robot with low
resources (which was what Google wanted). But some years ago Google
announced their spider was able to run JavaScript, and that set free all those
developers and designers who wanted their websites to have a good
ranking: they don’t have any other limit to the use of JavaScript right
now (because they don’t really care about anything else). That made many
pages impossible to read by clients that don’t use JavaScript and made the
process of accessing websites automatically or with non-canonical browsers
impossible in many cases. <em>Thank you, developers and designers, for breaking
the Web</em>. <a class="footnote-backref" href="#fnref:4" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
<li id="fn:5">
<p>There are so many reasons to avoid Medium that <a href="https://nomedium.dev/">someone made a specific
website for them</a>. Also, some interesting free
software projects decided to migrate away from it and wrote about it, that’s
<a href="https://blog.elementary.io/welcome-to-the-new-blog/#the-decline-of-medium">the case of ElementaryOS</a>. <a href="https://mastodon.social/@ekaitz_zarraga/103498743276277093">I asked in the fediverse about
this</a> and many people sent me articles and links. Thanks to all! <a class="footnote-backref" href="#fnref:5" title="Jump back to footnote 5 in the text">↩</a></p>
</li>
<li id="fn:6">
<p>Too bad Marcin, you were unable to see the future. <a class="footnote-backref" href="#fnref:6" title="Jump back to footnote 6 in the text">↩</a></p>
</li>
</ol>
</div>Screencasts: discussing with ffmpeg2020-01-11T00:00:00+02:002020-01-11T00:00:00+02:00Ekaitz Zárragatag:ekaitz.elenq.tech,2020-01-11:/ffmpeg-screencast.html<p>Screencasts in <code>ffmpeg</code>, having some fun solving issues thanks to
other people</p><p>When you battle using your arguments that’s called a <em>discussion</em>… That’s
exactly what I’ve been doing for a couple of days with <code>ffmpeg</code>: I’ve been
using arguments trying to reach an understanding.</p>
<p>I wanted to record my screen, and a couple of cameras, for reasons you’ll know
about later, and I didn’t want to play around with new <span class="caps">GUI</span> programs and
configuration so I decided to go for a simple <code>ffmpeg</code> setup.</p>
<p>You’ve probably played with <code>ffmpeg</code> in the past. It has tons of different
input arguments and options. It’s crazy.</p>
<p>Most of my previous uses of it were just video conversions, which are as
simple as choosing the right extensions for the files, but when it comes to video
and audio recording it gets complicated. I have no idea about video and audio
encodings and I don’t really have the time to dig into such an exciting topic.
I searched the Internet for examples and I found some: cool.</p>
<p>I played around with multiple inputs and outputs, I changed arguments I can’t
even remember now and it kinda worked until I decided to record my voice at the
same time. <strong>Delay</strong>.</p>
<p>What to do then?</p>
<p>Just go to the internet and keep searching.</p>
<p>I found a project called <code>vokoscreen</code> that is now archived because it’s
migrating from <code>ffmpeg</code> to <code>gstreamer</code> (I also struggled with gstreamer in the
past, but that’s another story), but it worked fine. It was in the repos of my
distro, it only asked me to install one dependency, a couple of megs only… Great!</p>
<p>I tried to make a screencast and the audio worked like a charm. I went for the
code, read it and <a href="https://github.com/vkohaupt/vokoscreen/blob/b5865a85561baa46e627c09cf77efb7369516327/screencast.cpp#L2718">realized the arguments it uses to call <code>ffmpeg</code> are easy to
find</a>.</p>
<p>Even better, in the program itself there’s a button to show a log of what it
does and it dumps the exact call it does.</p>
<p>With that and some extra things I learned from the investigation in the deep
abyss of the Internet, boom! There it goes:</p>
<code class="language-bash">ffmpeg">
<pre class="highlight"><code class="language-bash">ffmpeg \
    -y -f x11grab -draw_mouse 1 -framerate 25 -video_size 1920x1080 -i :0+0,0 \
    -f alsa -ac 2 -i default \
    -pix_fmt yuv420p -c:v libx264 -preset veryfast \
    -c:a libmp3lame -q:v 1 -s 1920x1080 -f matroska \
    output.mkv
</code></pre>
<p>Today, in half an hour, I solved the thing I’d been struggling with for a couple
of days. But I think I wouldn’t have been able to solve it if I hadn’t struggled
with it these last days… I don’t know.</p>
<p>The good thing is I learned a couple of things from this, which I’ll write down
here to avoid forgetting them:</p>
<h3>Multiple inputs</h3>
<p>Like in the example command, <code>ffmpeg</code> can take multiple inputs. In the case
of the example, they are <code>x11grab</code> (my screen) and <code>alsa</code><span class="quo">‘</span>s default input (the
microphone). More inputs can be combined, like music playing in the background
or whatever you want.</p>
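<p>For instance, mixing the microphone with background music could look something like this (an untested sketch of mine; the filter graph syntax is worth double-checking against the ffmpeg documentation):</p>
<pre class="highlight"><code class="language-bash"># Input 0 is the screen, input 1 the microphone, input 2 a music file;
# amix merges the two audio streams into one
ffmpeg \
    -f x11grab -framerate 25 -video_size 1920x1080 -i :0+0,0 \
    -f alsa -i default \
    -i music.mp3 \
    -filter_complex '[1:a][2:a]amix=inputs=2[a]' \
    -map 0:v -map '[a]' \
    -c:v libx264 -c:a libmp3lame \
    output.mkv
</code></pre>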
<h3>Multiple outputs</h3>
<p>You can also specify multiple outputs, just like with multiple inputs, but in
the output part of the command<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>.</p>
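<p>For example, a single grab can be written to two files at once, each with its own encoding options placed right before its output name (the filenames here are made up):</p>
<pre class="highlight"><code class="language-bash"># One input, two outputs: a full-size recording and a scaled-down copy.
# Each output file takes the options that precede it.
ffmpeg -f x11grab -framerate 25 -video_size 1920x1080 -i :0+0,0 \
    -c:v libx264 full.mkv \
    -vf scale=960:540 -c:v libx264 small.mkv
</code></pre>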
<h4>Pipe</h4>
<p>You can even pipe the command to a different one, like:</p>
<pre><code>ffmpeg [...] - | ffplay -i -
</code></pre>
<p>In this case you can use one of the outputs to record to a file and another
one for <code>ffplay</code>, which plays the video on screen.</p>
<p>This is useful if you want to record from a webcam and see what you
are recording.</p>
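<p>Putting both ideas together, recording a webcam to a file while previewing it could look roughly like this (a sketch, assuming a v4l2 webcam at <code>/dev/video0</code>):</p>
<pre class="highlight"><code class="language-bash"># First output goes to a file; the second is muxed to stdout and piped
# into ffplay, which shows it on screen
ffmpeg -f v4l2 -i /dev/video0 \
    -c:v libx264 recording.mkv \
    -f matroska - | ffplay -i -
</code></pre>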
<h3>Closing note</h3>
<p>So yeah, I was ignorant about <code>ffmpeg</code> and I still am.</p>
<p>But at least I learned a couple of the arguments and learned how to deal with
all my cameras and screens at the same time.</p>
<p>Good enough.</p>
<p>I mean, it works, right? And that’s the most important thing<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup>.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>Yes, it’s hard to know where’s the input and where’s the output. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:2">
<p>It’s not. The most important thing is to be happy, do what you like,
enjoy your life and feel appreciated and valued. If your software works it’s
like… Uugh… Congratulations I guess? <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div>Hiding the complexity2019-10-27T00:00:00+03:002019-10-27T00:00:00+03:00Ekaitz Zárragatag:ekaitz.elenq.tech,2019-10-27:/hiding-the-complexity.html<p>Inspired by Scheme, I share my thoughts about complexity, how it’s
being hidden by our tools, and how that affects us as creators and engineers.</p><p>I’ve been recently playing with Scheme, reading R⁷RS and so on, and I found
something really interesting: Even with its high level of abstraction, it
doesn’t hide the complexity and makes you pay attention to it.</p>
<p>That’s extremely powerful and interesting.</p>
<p>It’s even more interesting if you think about the fact that Scheme can be
implemented from scratch in an acceptable amount of time by a couple of hands.
You don’t need to be a big corporation or a big group of developers coding for
years to implement it.</p>
<p>It’s <strong>simple</strong> but it doesn’t hide the complexity of the implementation.
That’s a really powerful balance.</p>
<p>But both points are too much to leave here without playing with them
separately, so let’s try to understand why each of them is<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>
fundamental.</p>
<h2><del>Hidden</del> Latent complexity</h2>
<p>You can create the best programming language in the world, but the
complexity of programming can’t be eliminated, because the user of the
language will, actually, be programming. So you can take two approaches here:</p>
<ul>
<li>Expose the <strong>intrinsic</strong> complexity of programming.</li>
<li>Make it look as if complexity doesn’t exist, which means hiding the complexity
as much as you can under layers of abstraction.</li>
</ul>
<p>Most modern programming languages go for the second option. But not only
programming languages: also operating systems, computers themselves and many
more areas. Which is not especially bad in general, but it’s dangerous when you
need control.</p>
<p>Most of the time, when complexity is hidden by design, it’s just <em>latent
complexity</em>. It’s harder for the user to reach, so the scope of things they
can do is reduced (and with it their ability to decide with a high level
of detail), but the complexity is still there, happening without being noticed,
under the surface, and being impossible to correct if something goes wrong.</p>
<p>Think about your cellphone. You can’t open it, change the battery, change its
software, change… <em>anything</em>. Because it’s hard to do and “<em>people don’t need
to know about that</em>“. But in the end, what you have is a phone that is impossible
to repair if something goes wrong, or impossible to change if you want it to do
something that is not the default behaviour.</p>
<p>It is a problem (some people are trying to fix it, by the way) but it’s not a
problem for <em>everyone</em>, because not <em>everyone</em> needs to have that level of
complexity exposed. But they should have the right to see it if they wanted to.</p>
<h2>Exposed complexity</h2>
<p>As engineers working on technology, we should be demanding that the complexity
of things be exposed, rather than running away from it.</p>
<p>As engineers we are supposed to want to know how stuff works!</p>
<p>In the case of programming languages, I want to control what the program does,
be aware of what I’m doing, and know which decisions I’m taking.</p>
<p>When complexity is exposed you are reminded of the importance of every choice
you make. It’s not something that happens: you have to think about it.</p>
<blockquote>
<p>In Scheme: lists vs. vectors. Which one is better? Why have both? Why not
use one all the time?</p>
</blockquote>
<p>It’s reminding you what you have under the hood, even if you aren’t
implementing it yourself. That way you don’t forget about <em>your job</em>.</p>
<h2>Simplicity</h2>
<p>Simplicity means that there’s no unneeded complexity. It doesn’t mean that
complexity is hidden. We tend to confuse both terms too often.</p>
<p>Scheme is simple because its core is small and it’s based on a few concepts.
Being simple means that concepts are clear and consistent and have few or no exceptions.</p>
<p>Other programming languages use the same concepts that Scheme does, but they are
not clearly stated, so you can’t rely on them for your understanding of the
language. Scope in JavaScript (for instance) is often explained as something
related to the position of the curly brackets, hiding the fact that it’s
lexical scope. Watching engineers prefer a silly trick over an academic fact is
unsatisfying<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup>.</p>
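<p>A tiny example makes the academic fact concrete: the variable a function sees is decided by where the function is <em>written</em>, not by where it is called from (the snippet is mine, not from any of the mentioned sources):</p>
<pre class="highlight"><code class="language-clike">var x = 'outer';
function makeGetter() {
    var x = 'lexical';         // the x the inner function sees is bound here,
    return function getter() { // where the function is written...
        return x;
    };
}
function caller() {
    var x = 'dynamic';         // ...not here, where it happens to be called
    return makeGetter()();
}
// caller() returns 'lexical': scope follows the source text, not the call stack
</code></pre>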
<p>In many platforms, abstraction layers are added until the internals are hidden
or blurred. In this case, complexity is directly hidden by the implementation, more
than by users themselves running away from it.</p>
<p>Some would ask: “Who cares about the details?”<sup id="fnref:3"><a class="footnote-ref" href="#fn:3">3</a></sup> And it’s perfectly fine to
think that at some point, but when it comes to choosing the right tool for the
right problem, performance and fine tuning, you’d really like to know how things
are implemented, because implementation is what makes some operations faster
or more accurate than others. And, probably more important than that, being
aware of how stuff is implemented makes us independent enough to change the
implementation if we want, <strong>which is the base of free software</strong>.</p>
<p>When your tools hide reality from you for long enough, you start to forget that
reality still exists even when you are not watching it, and you start
acting like it wasn’t there.</p>
<h2>Assembly then… Right?</h2>
<p>Don’t get me wrong. I’m not against simplification or making our job easier.
Scheme is a really high-level language. <strong>Abstraction is good</strong>.</p>
<p>Accidental self-lobotomy is not that good.</p>
<blockquote>
<p><span class="caps">NOTE</span>:<br>
This blogpost was triggered by this talk where a musician talks about
chiptune music and says how making chiptune music made him a better
guitarist. It has some good points about constraints and complexity.</p>
<p><a href="https://youtu.be/_7k25pwNbj8">https://youtu.be/_7k25pwNbj8</a></p>
</blockquote>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>In my opinion, of course. This is <em>my</em> blog. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:2">
<p>And insulting, I’d say. <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:3">
<p><span class="caps">TLDR</span>: <em>You, as an engineer, should</em>. <a class="footnote-backref" href="#fnref:3" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
</ol>
</div>TUI Slang: Speak like the natives2019-06-15T00:00:00+03:002019-06-15T00:00:00+03:00Ekaitz Zárragatag:ekaitz.elenq.tech,2019-06-15:/clopher04.html<p>Interfacing between Clojure, Java and Native code.</p><p>The previous post introduced <code>termios</code> as a native interface to configure the
terminal input processing. With termios we managed to make our C programs get
input character by character, processing it as it came with no buffering, but
we didn’t integrate that with our Clojure code. Now it’s time to do it.</p>
<h3>Run before it’s too late</h3>
<p>Before we dig into the unknown, I have to tell you there are other alternatives
for the terminal configuration. The simplest one I can imagine is using
<code>stty</code><sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup> as an external command. I learned this from <a href="http://salza.dk/">Liquid</a>, a
really interesting project I had as a reference. If you want to see this at work,
check the <code>adapters/tty.clj</code> file in the <code>src</code> directory of the project.<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup></p>
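<p>The general pattern with <code>stty</code> is just a pair of shell calls around the program (a rough sketch of mine; Liquid’s actual invocation may differ):</p>
<pre class="highlight"><code class="language-bash"># Disable line buffering and echo so keypresses arrive one by one
stty -icanon -echo min 1

# ... run the editor loop, reading single characters ...

# Restore a sane terminal state on the way out
stty sane
</code></pre>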
<p>Of course, it has some drawbacks. <code>stty</code> is part of the <span class="caps">GNU</span> Coreutils project
and you have to be sure your target has it installed if you want to rely on
it. I’m not sure if it’s supported in non-<span class="caps">GNU</span> operating systems<sup id="fnref:3"><a class="footnote-ref" href="#fn:3">3</a></sup>.</p>
<p>In my case, I decided to stay with the <code>termios</code> interface to deal with all this,
because I didn’t really want to rely on external commands and it’s supposed to
be implemented in any <span class="caps">POSIX</span> <span class="caps">OS</span>. The good (bad?) thing is it made me deal with
native libraries from Clojure, and I had to learn how to do it.</p>
<h3>The floor is Java</h3>
<p>When dealing with low-level stuff we have to remember Clojure is just Java, and
most of the utilities we need to use come from it. This means the question we
have to answer is not really <em>“how to call native code from Clojure?”</em> because
if we are able to call native code from Java, we will be able to do it from
Clojure too (if we spread some magic on top).</p>
<h4>So, how to call native code from Java?</h4>
<p>First I checked the <a href="https://en.wikipedia.org/wiki/Java_Native_Interface">Java Native Interface (aka <span class="caps">JNI</span>)</a>, but I thought it
was too much for me and I decided to look further. Remember there are only a
couple of calls to make to <code>termios</code> from our code, so we don’t really want to
mess with a lot of boilerplate code, compilations and so on.</p>
<p>My research led me to the <a href="https://en.wikipedia.org/wiki/Java_Native_Access">Java Native Access (aka <span class="caps">JNA</span>)</a> library. If you
check the link you’ll find that Wikipedia<sup id="fnref:4"><a class="footnote-ref" href="#fn:4">4</a></sup> describes it as:</p>
<blockquote>
<p><span class="caps">JNA</span>’s design aims to provide native access in a natural way with a minimum of
effort. No boilerplate or generated glue code is required.</p>
</blockquote>
<p>Sounds right to me. Doesn’t it?</p>
<p>I encourage you to check the full Wikipedia entry and, if you have some free
time at the office or something, to check the implementation, because it’s
really interesting. But I’ll leave that to you.</p>
<h5>A lantern in the dark</h5>
<p><span class="caps">JNA</span> is quite easy to use in the case of Clopher, even easier if you realize
that <a href="https://github.com/mabe02/lanterna">lanterna</a>, the <span class="caps">TUI</span> library, is out there using it internally,
so you can <em>steal</em><sup id="fnref:5"><a class="footnote-ref" href="#fn:5">5</a></sup> the implementation from it. Lanterna is a great piece
of software I took as a reference for many parts of the project. Digging into the
internals of large libraries is a great exercise and you can learn a lot from it.</p>
<p>First of all, like in many Java projects, the amount of abstraction it has is
crazy. It takes some time to find the actual implementation of what we want.
It isn’t like this for no reason: they need to create this
amount of abstraction because the part of the library that handles the widgets
can work on top of many different terminal implementations, including a
<a href="https://en.wikipedia.org/wiki/Swing_(Java)">Swing</a>-based one that comes with Lanterna itself.</p>
<p>Clopher only targets <span class="caps">POSIX</span> compatible operating systems, so we can go directly
to what we want and read the termios part, discarding all the other
compatibility code. This code is quite easy to find if you look at the directory
tree of Lanterna: there’s a <code>native-integration</code> folder in the root directory.
If you follow it you’ll arrive at <a href="https://github.com/mabe02/lanterna/blob/master/native-integration/src/main/java/com/googlecode/lanterna/terminal/PosixLibC.java"><code>PosixLibC.java</code></a>, which
uses <span class="caps">JNA</span> to interact with termios.</p>
<p>The implementation provided by Lanterna is quite complete: they declare a
library with the functions they need and the data structure introduced in the
previous chapter. Once the library interface and the necessary data structures
are defined from Java, they can be called with <span class="caps">JNA</span>, like they do in the file:
<a href="https://github.com/mabe02/lanterna/blob/263f013a2ee1d522eb86b8f1d315423fb1f79711/native-integration/src/main/java/com/googlecode/lanterna/terminal/NativeGNULinuxTerminal.java#L123"><code>NativeGNULinuxTerminal.java</code></a>.</p>
<h4>How to call <span class="caps">JNA</span> from Clojure, then?</h4>
<p>Calling Java code from Clojure is quite simple because Clojure has been
designed with that in mind, but that’s not all. Thanks to the Internet,
there’s a great <a href="https://nakkaya.com/2009/11/16/java-native-access-from-clojure/">blogpost by Nurullah Akkaya</a> describing a simple way
to use <span class="caps">JNA</span> from Clojure. From there, we can move to our specific case.</p>
<p><code>termios</code> has its own data structure, so we need to define it so that <span class="caps">JNA</span> knows
how to interact with it. The problem is that Clojure doesn’t have enough <span class="caps">OOP</span>
tools to do it directly, so we need to write it in plain Java. The good thing is
that we don’t really need to create anything else.</p>
<p>If we remove some unneeded code from Lanterna’s termios structure
implementation it will look like the implementation I made at
<code>src/java/clopher/Termios.java</code>:</p>
<pre class="highlight"><code class="language-clike">package clopher.termios;

import com.sun.jna.Structure;

import java.util.Arrays;
import java.util.List;

/**
 * Interface to Posix libc
 */
public class Termios extends Structure {
    private int NCCS = 32;

    public int c_iflag;   // input mode flags
    public int c_oflag;   // output mode flags
    public int c_cflag;   // control mode flags
    public int c_lflag;   // local mode flags
    public byte c_line;   // line discipline
    public byte c_cc[];   // control characters
    public int c_ispeed;  // input speed
    public int c_ospeed;  // output speed

    public Termios() {
        c_cc = new byte[NCCS];
    }

    // This function is important for JNA, because it needs to know the
    // order of the fields of the struct in order to make a correct Java
    // class to C struct translation
    protected List<String> getFieldOrder() {
        return Arrays.asList(
            "c_iflag",
            "c_oflag",
            "c_cflag",
            "c_lflag",
            "c_line",
            "c_cc",
            "c_ispeed",
            "c_ospeed"
        );
    }
}
</code></pre>
<p>Once the struct is defined, it’s time to use it from Clojure. <code>clopher.term</code>
namespace has the code to solve this. Summarized here:</p>
<pre class="highlight"><code class="language-clojure">(ns clopher.term
  (:import [clopher.termios Termios]
           [com.sun.jna Function]))

(def ^:private ICANON 02)
(def ^:private ECHO   010)
(def ^:private ISIG   01)
(def ^:private ECHONL 0100)
(def ^:private IEXTEN 0100000)
(def ^:private VTIME  5)
(def ^:private VMIN   6)

; The macro we saw in the blogpost by Nurullah Akkaya
(defmacro jna-call [lib func ret & args]
  `(let [library#  (name ~lib)
         function# (Function/getFunction library# ~func)]
     (.invoke function# ~ret (to-array [~@args]))))

; Wrapper for the tcgetattr function
(defn get-config!
  []
  (let [term-conf (Termios.)]
    (if (= 0 (jna-call :c "tcgetattr" Integer 0 term-conf))
      term-conf
      (throw (UnsupportedOperationException.
               "Impossible to get terminal configuration")))))

; Wrapper for the tcsetattr function
(defn set-config!
  [term-conf]
  (when (not= 0 (jna-call :c "tcsetattr" Integer 0 0 term-conf))
    (throw (UnsupportedOperationException.
             "Impossible to set terminal configuration"))))

; Example that sets the non-canonical mode using the flags at the top
; of the file.
; Yeah, binary operations.
(defn set-non-canonical!
  ([]
   (set-non-canonical! true))
  ([blocking]
   (let [term-conf (get-config!)]
     (set! (.-c_lflag term-conf)
           (bit-and (.-c_lflag term-conf)
                    (bit-not (bit-or ICANON ECHO ISIG ECHONL IEXTEN))))
     (aset-byte (.-c_cc term-conf) VMIN (if blocking 1 0))
     (aset-byte (.-c_cc term-conf) VTIME 0)
     (set-config! term-conf))))
</code></pre>
<p>Pay attention to all the mutable code here!
The <code>aset-byte</code> function helps a lot when dealing with all that.</p>
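<p>If the octal flags and the bit masking feel like magic, the arithmetic can be
checked in isolation. Here is the same clearing operation written in plain
shell arithmetic, with a made-up starting <code>c_lflag</code> value that has every flag set:</p>

```shell
# Same octal constants as the Clojure code above
ICANON=02; ECHO=010; ISIG=01; ECHONL=0100; IEXTEN=0100000
# Hypothetical c_lflag with all five flags enabled
lflag=$(( ICANON | ECHO | ISIG | ECHONL | IEXTEN ))
# Clear them: lflag AND (NOT (flags OR-ed together)), like bit-and/bit-not
cleared=$(( lflag & ~(ICANON | ECHO | ISIG | ECHONL | IEXTEN) ))
printf 'before: %o  after: %o\n' "$lflag" "$cleared"   # before: 100113  after: 0
```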
<p>Also be sure to check termios’ documentation, because the calls act in a very
C-like way, returning a non-zero value when they fail.</p>
<p>We need one extra point in our code to solve the Java-Clojure interoperability:
we have to tell our project manager that we included some Java code in there.
If our project manager is Leiningen, we can just tell it where we store our
Java code. Be careful, because Leiningen <a href="https://github.com/technomancy/leiningen/blob/master/sample.project.clj#L302">doesn’t like it if you mix Java and
Clojure in the same folder</a>.</p>
<pre class="highlight"><code class="language-clojure">(defproject
  ; There's more blablabla in here but these are the keys I want you to
  ; take into account
  :source-paths      ["src/clojure"]
  :java-source-paths ["src/java"]
  :javac-options     ["-Xlint:unchecked"])
</code></pre>
<h3>Look back!</h3>
<p>Now you can configure your terminal to act non-canonically and serve you the
characters one by one as they come. It’s cool, but you’ll see there are some
problems still to come in the next chapters. Don’t worry! They’ll come.</p>
<blockquote>
<p>This is like a heroic novel where the character (in this case you) fights
monsters one by one, leaving their dead corpses on the dungeon floor. Looking
back will let you remember how many monsters you slaughtered on your way to
the depths where the treasure awaits. Remember to take a rest and sharpen your
sword. This is a long journey.</p>
<p>Prepare yourself for the next monster. Let the voice of the narrator guide you
to the unknown.</p>
</blockquote>
<p>Why don’t you mix what you learned in the previous chapter with what you
learned from this one and try to make an interactive terminal program yourself?</p>
<p>I’ll solve that in the next chapter, but there’s some code of that part already
implemented in the repository. You can check it while I keep writing and
coding. Here’s the link to the project:</p>
<p><a href="https://gitlab.com/ekaitz-zarraga/clopher">https://gitlab.com/ekaitz-zarraga/clopher</a></p>
<p>See you in the next episode!</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>Use the man pages, seriously: <a href="https://linux.die.net/man/1/stty"><code>man stty</code></a> <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:2">
<p>I’ve also been in contact with Mogens, the author of the project, who is
a really good guy and gave me a lot of good information. <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:3">
<p>But who cares about them anyway? <a class="footnote-backref" href="#fnref:3" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:4">
<p><em>the free encyclopedia</em> <a class="footnote-backref" href="#fnref:4" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
<li id="fn:5">
<p>If it’s free software it’s not stealing and it’s exactly what you are
supposed to do with it. <a class="footnote-backref" href="#fnref:5" title="Jump back to footnote 5 in the text">↩</a></p>
</li>
</ol>
</div>TUI: A look to the deep2019-05-30T00:00:00+03:002019-05-30T00:00:00+03:00Ekaitz Zárragatag:ekaitz.elenq.tech,2019-05-30:/clopher03.html<p>Research and work on the Terminal based User Interface of Clopher, the
gopher client.</p><p>This software has been introduced as a Gopher client but, as you can probably
deduce from the previous post, the Gopher part is probably the simplest one.
The complexity comes with user interaction. <em>People are hard</em>. That’s why we
are going to delay that as much as possible, trying to cover all the points in
the middle before we <em>jump to the unknown</em>.</p>
<p>Just joking. In fact, we have to shave many yaks before thinking about user
interaction anyway. This text talks about them.</p>
<h3>Are you talking to me?</h3>
<p>Let’s remember we can classify programs into two different categories like this:</p>
<ul>
<li>
<p><strong>Non-interactive programs</strong>, often called <em>scripts</em>, are programs that take
an input and return an output. There’s no interaction with the user in
between. An example of this could be the command <code>ls</code>.</p>
</li>
<li>
<p><strong>Interactive programs</strong> receive user input and respond to it while they are
running. An example of this could be the machine that
sells you the tickets for the subway: it asks you where you are going, then
tells you the price, takes your money and so on. All of this with the program
constantly running.</p>
</li>
</ul>
<p>Remember what we said about Gopher: it’s a <em>stateless</em> protocol. There’s
no <em>state</em> stored in the server, so all the queries <em>must</em> contain all the info
related to them. <em>Queries are independent</em>.</p>
<p>This somehow leaves the door open to two possible implementations of Clopher.
The <em>non-interactive</em> one would work like <code>curl</code>: getting the <span class="caps">IP</span>, port,
selector string and an optional search string as input, it would open the
connection, retrieve the result and return it.<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup></p>
<p>But Clopher is designed as an <strong>interactive program</strong>. More like <code>lynx</code>, where
you interactively ask for the pages and have a <em>local state</em> that records your
history and other things. This is a decision, it’s not imposed by the protocol.</p>
<h3>Shell<em>f boycott</em></h3>
<p>There are some different ways to handle user interaction in <span class="caps">TUI</span> based programs.
The simplest one is to read by line, waiting until the user hits <code>ENTER</code> to
read the result. That’s the behaviour of the classic <code>scanf</code> function of C and
many others like <code>input</code> in Python, etc.</p>
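<p>A tiny shell sketch shows that line-buffered behaviour. The piped <code>hello</code> input
simulates a user typing and pressing <code>ENTER</code>; nothing reaches <code>read</code> until the
whole line is available:</p>

```shell
# Nothing reaches `read` until a full line, ENTER included, is available
printf 'hello\n' | {
    read -r line                      # blocks until the whole line arrives
    printf 'you typed: %s\n' "$line"  # only then do we see the input
}
```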
<p>In programs like Clopher, where the design is similar to <code>lynx</code> or <code>vi</code>, this
kind of input makes no sense at all. The program needs to be able to capture
every key pressed by the user and perform actions in response to them. For
instance, in <code>vi</code>, when the user hits <code>i</code> in normal mode it needs to change to
insert mode, and when the user presses <code>i</code> in insert mode it needs to change the
contents of the buffer.</p>
<p>The design of this kind of program is simple to understand: it’s an infinite
loop<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup> where key presses are captured and change the <em>state</em> of the
program. When the user hits the key combination that halts the program, the
loop is broken.</p>
<p>In simple C code the program would look like this:</p>
<pre class="highlight"><code class="language-clike">#include <stdio.h>

int
main(int argc, char * argv[]){
    char c;
    // Create some state
    while(1){
        c = getchar();
        if( c == 'q' ){ // Exit if user pressed `q`
            return 0;
        }
        // Update state here
        putchar(c); // Show the character for debugging
    }
}
</code></pre>
<p>Or the simplified Clojure equivalent:</p>
<pre class="highlight"><code class="language-clojure">(loop [c     (char (.read *in*))
       state (->state)]              ; Create some state
  (when (not= c \q)                  ; Exit if user pressed `q`
    (print c)                        ; Show the character for debugging
    (recur (char (.read *in*))
           (update-state c state)))) ; Update state
</code></pre>
<p>Looks simple, right?</p>
<p>Wait a second, there’s a lot of stuff going on under the hood here. If you run
the code in any <span class="caps">POSIX</span> compatible operating system (I didn’t test on others,
and I won’t) you’ll find the code might not be doing what we expected:
the <code>getchar</code> (or <code>.read</code>) calls will wait until <code>ENTER</code> is pressed and only
then get the characters from the input buffer one by one. But we want to get
them as they come!</p>
<h4>Saints and demons — canonical mode</h4>
<p>In <span class="caps">POSIX</span> operating systems, the input is buffered by default. But that behavior
can be configured following the <span class="caps">POSIX</span> terminal interface, under the names
<strong>canonical</strong> mode and <strong>non-canonical</strong> mode. The mode we are looking for is
the non-canonical mode. You can read more about it in <a href="https://en.wikipedia.org/wiki/POSIX_terminal_interface#Input_processing">the Wikipedia</a>.</p>
<p>The non-canonical mode has some extra options: one controls the minimum number
of characters that must be in the buffer for a <code>read</code> operation to return, and
the other defines the number of tenths of a second to wait for that input<sup id="fnref:3"><a class="footnote-ref" href="#fn:3">3</a></sup>.
Choosing the right value for those fields (<code>c_cc[MIN]</code> and <code>c_cc[TIME]</code>)
depends on the kind of interaction we are looking for.</p>
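<p>You can experiment with these two fields from the shell before writing any
code: <code>stty</code> exposes them as <code>min</code> and <code>time</code>. This is just a sketch (the
<code>dd</code> trick is one of several ways to read a single byte) and it only does
something interesting when run in a real interactive terminal, so it guards
against a non-terminal stdin:</p>

```shell
# Only meaningful on a real terminal; guard so it degrades gracefully
if [ -t 0 ]; then
    saved="$(stty -g)"                    # save current settings for later
    stty -icanon min 1 time 0             # non-canonical: MIN=1, TIME=0
    key="$(dd bs=1 count=1 2>/dev/null)"  # one keystroke, no ENTER needed
    stty "$saved"                         # always restore the terminal!
    printf 'you pressed: %s\n' "$key"
else
    echo 'stdin is not a terminal; run this interactively'
fi
```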
<h4>Make Dikembe smile — blocking</h4>
<p>Setting the <code>c_cc[TIME]</code> field to <code>0</code> means the <code>read</code> operation will wait
indefinitely until the minimum amount of characters defined with <code>c_cc[MIN]</code>
is waiting in the buffer. Together with that, <code>c_cc[MIN]</code> can be <code>0</code>, which
means the read operations will wait until there are <code>0</code> characters in the
buffer or, in other words, they won’t wait at all.</p>
<p>Be aware that both fields can make the read operations on the input buffer
non-blocking, and that will cause them to return
with no value.</p>
<p>In the case of Clopher, I decided to set the <code>c_cc[MIN]</code> to <code>1</code> so the read
operations block until there’s at least one character in the buffer (that means
they will always return something) and the <code>c_cc[TIME]</code> to <code>0</code> so the read
operations have no timeout and will block until a character arrives.</p>
<p>Depending on the application you are developing, you might choose another kind
of blocking configuration. For instance, setting a timeout can let you process
other parts of the system and wait for the input in the same thread.</p>
<h4>We’re talking about practice? — termios</h4>
<p>Now that we know where to find this theoretical configuration, it’s time to put it
into practice. In <span class="caps">POSIX</span> the standard way to access this is via <code>termios</code><sup id="fnref:4"><a class="footnote-ref" href="#fn:4">4</a></sup>.
It has some details that are not specified and depend on the implementation, so
it might have some differences from Linux to <span class="caps">BSD</span> or whatever.</p>
<p><code>tcsetattr</code> and <code>tcgetattr</code> calls can be used to set and read the terminal
configuration via termios. Check this example, compile it and compare it with
the C code of the previous example:</p>
<pre class="highlight"><code class="language-clike">#include <stdio.h>
#include <termios.h>

int
main(int argc, char* argv[]){
    // Get interface configuration to reset it later
    struct termios term_old;
    tcgetattr(0, &term_old);

    // Get interface configuration to edit
    struct termios term;
    tcgetattr(0, &term);

    // Set the new configuration
    term.c_lflag &= ~(ECHO | ECHONL | ICANON | IEXTEN | ISIG);
    term.c_cc[VMIN] = 1;  // Wait until 1 character is in buffer
    term.c_cc[VTIME] = 0; // Wait indefinitely

    // TCSANOW makes the change occur immediately
    tcsetattr(0, TCSANOW, &term);

    char ch = 0; // Initialize so the first comparison is well defined
    while(1){
        if(ch == 'q'){
            // Set the old configuration again and exit.
            // If it's not set back the normal configuration of the
            // terminal will be broken later!
            tcsetattr(0, TCSANOW, &term_old);
            return 0;
        }
        ch = getchar();
        putchar(ch);
    }
}
</code></pre>
<p>All the code has enough comments to be understood, but there are some weird
flags that are better checked in the termios documentation.<sup id="fnref2:4"><a class="footnote-ref" href="#fn:4">4</a></sup></p>
<h3>But this is C code and Clopher is written in Clojure!</h3>
<p>I know, but this is becoming long and boring. Why not wait until I get some
spare time and write the next chapter? You have tons of information to check
until I write it so you won’t be bored if you don’t want to.</p>
<p>See you next.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>In fact, you can navigate the Gopherverse like this with <code>curl</code>. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:2">
<p>Unsurprisingly called <em>main loop</em>. Programmers are very creative. <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:3">
<p>That <code>read</code> operation is what <code>getchar</code> is doing under the hood. <a class="footnote-backref" href="#fnref:3" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:4">
<p><code>man termios</code> or visit <a href="https://linux.die.net/man/3/termios">online <code>man</code>
pages</a> <a class="footnote-backref" href="#fnref:4" title="Jump back to footnote 4 in the text">↩</a><a class="footnote-backref" href="#fnref2:4" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
</ol>
</div>Down the rabbit gopher hole2019-05-07T00:00:00+03:002019-05-07T00:00:00+03:00Ekaitz Zárragatag:ekaitz.elenq.tech,2019-05-07:/clopher02.html<p>Working on the Gopher protocol implementation and opening the door to
the future problems.</p><p>As the project’s goal was to create a Gopher client, it was time to understand
something about the protocol and read the <a href="https://en.wikipedia.org/wiki/Gopher_%28protocol%29"><span class="caps">RFC</span></a>. No need for you to
know the protocol to understand what I’m going to say here. I think I already
did the difficult part for you.</p>
<h3>Understand some Gopher</h3>
<p>Gopher is a really simple protocol (this doesn’t mean I implemented it
correctly anyway). It’s assumed to work on top of <span class="caps">TCP</span> (the default port is 70) and
it’s as simple as creating a socket, sending the <em>selector string</em> to it
followed by a line terminator<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>, and reading everything from it until it
closes. That’s how it works in most cases.</p>
<p>It has two major ways to access the data:</p>
<ol>
<li>
<p><strong>Text mode</strong>, which is used in most of the queries, needs the client to
read from the socket until a line with a single dot (<code>.</code>) appears. Then
the connection is closed.</p>
</li>
<li>
<p><strong>Binary mode</strong>, expects the client to read from the socket until the server
closes it.</p>
</li>
</ol>
<p>Easy-peasy.</p>
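<p>The text-mode rule is easy to try without a network. This sketch fakes a
text-mode response (the content is made up) and stops reading at the lone-dot
line; the <code>tr</code> is only there to keep the <code>sed</code> pattern simple:</p>

```shell
# Simulated text-mode response: two lines of content, the "." terminator,
# and some bytes that should never be read
printf 'hello\r\nworld\r\n.\r\nignored\r\n' |
    tr -d '\r' |        # normalize CRLF line endings to plain newlines
    sed '/^\.$/q'       # print up to (and including) the lone dot, then quit
```

A real client would of course keep the <span class="caps">CRLF</span> handling instead of throwing the carriage returns away.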
<p>Gopher is a stateless protocol and that helps a lot during the creation of the
client. There’s no need to retain data or anything related.</p>
<p><em>Selector strings</em> are what the client wants to see. In order to know which
selections are possible, Gopher defines a specific text format that works as a
menu, and it’s called, unsurprisingly, <em>Menu</em>.</p>
<p>Each element of a Menu has a description of the content, the address or hostname, the port, the
selector string, and a character (usually a number) that indicates its type, all
separated by a <span class="caps">TAB</span> (aka <code>\t</code>) character. Each element goes in one line<sup id="fnref2:1"><a class="footnote-ref" href="#fn:1">1</a></sup>.</p>
<p>Pay attention to the fact that each menu entry contains an address and a port,
that means it can be pointing to a different server!</p>
<p>The <em>type</em>, besides making the client choose between <em>binary</em> and <em>text</em>
mode, also gives the client information about what kind of response it’s going
to get: a menu, an image, an audio file… It also says if the
element is a <em>search endpoint</em><sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup>.</p>
<p>Yes, Gopher supports searches!</p>
<p>Well, Gopher supports tons of things because the only rule is that all the
logic is on the server side. You can do whatever you want, if you do it on the server.</p>
<p>Searching is as simple as asking for a text document, but it also adds the
search query to the equation. During a search, the client needs to send the
<em>selector string</em> to select the endpoint and then the <em>search string</em>,
separated by a <code>TAB</code> character.</p>
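<p>In bytes, a search request looks like this. The selector and the search string
here are invented; a real selector comes from a search entry in some server’s menu:</p>

```shell
selector='/search'   # hypothetical search endpoint selector
search='clojure'     # what we are looking for
# selector, TAB, search string, and the usual CRLF terminator
request="$(printf '%s\t%s' "$selector" "$search")"
printf '%s\r\n' "$request" | od -c   # inspect the exact bytes on the wire
```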
<p>There are some more points, but this is more than enough for the moment.</p>
<p>Let’s make something work.</p>
<h3>Make Gopher queries</h3>
<p>Before jumping to Clojure, let’s make sure we understood how this works
with some simple text queries. In a <span class="caps">UNIX</span>-like terminal you can do the following
to navigate the <em>Gopherverse</em>:</p>
<pre class="highlight"><code class="language-bash">exec 5<>/dev/tcp/tilde.team/70
echo -e "~giacomo\r\n" >&5
cat <&5
</code></pre>
<p>This code opens a <span class="caps">TCP</span> socket to <code>tilde.team</code> at port <code>70</code>, sends the selector
string <code>~giacomo</code> followed by the line terminator (<code>\r\n</code>) and prints the
answer. Simple.</p>
<p>You can do some telnet magic instead, which is easier but not as cool as the
other<sup id="fnref:3"><a class="footnote-ref" href="#fn:3">3</a></sup>:</p>
<pre class="highlight"><code>telnet tilde.team 70
~giacomo
</code></pre>
<p>If you run the code you’ll see you can understand the response with your bare
eyes with no parser involved. Isn’t that great?</p>
<p>Notice that in our examples our selector string is <code>~giacomo</code>. Gopher supports
empty strings as selector strings that, in most cases, return a Menu where we
can see which selector strings are valid. Why don’t you try it yourself?</p>
<h3>Move to Clojure</h3>
<p>Now that we understand what’s happening under the hood, it’s time to move to Clojure.</p>
<p>A simple text request can be understood through this piece of Clojure code
(which involves more Java than I’d like):</p>
<pre class="highlight"><code class="language-clojure">; Define the function to make the queries
(defn send-text-request
  [host port body]
  (with-open [sock     (java.net.Socket. host port)
              writer   (clojure.java.io/writer sock)
              reader   (clojure.java.io/reader sock)
              response (java.io.StringWriter.)]
    (.append writer body)
    (.flush writer)
    (clojure.java.io/copy reader response)
    (str response)))

; Make a query and print the result
(println (send-text-request "tilde.team" 70 (str "~giacomo" "\r\n")))
</code></pre>
<p>As you see, it’s not waiting for the dot at the end of the file and it’s not
doing any kind of parsing, error checking or timeout handling, but it works.
This is a minimal (and ugly, clean the namespaces!) implementation for you to be
able to run it in the <span class="caps">REPL</span>.</p>
<h3>Binary or not?</h3>
<p>The binary mode is almost the same, but the output must be handled in a different
way. As Clopher is a terminal based application, I made it store the answer in a file.</p>
<p>There’s a simple and beautiful way to handle temporary files in Java that you
can access from Clojure. As I wasn’t a Java user before I didn’t know this:</p>
<pre class="highlight"><code class="language-clojure">(defn- ->tempfile
  "Creates a temporary file with the provided extension. If extension is
  nil it adds `.tmp`."
  [extension]
  (doto
    (. java.io.File createTempFile "clopher" extension)
    .deleteOnExit))
</code></pre>
<p>With this function it’s really simple to create a temporary file and copy the
download there. It’s also easy to ask the user if they want to store the file
as a temporary file or in a specific path. With the code below, calling
<code>download-file-to</code> works like we described. If <code>destpath</code> is <code>nil</code> a temporary
file is created. Cool.<sup id="fnref:4"><a class="footnote-ref" href="#fn:4">4</a></sup></p>
<pre class="highlight"><code class="language-clojure">(defn download-file-to
  [host port srcpath destpath]
  (with-open [sock   (->socket host port)
              writer (io/writer sock)
              reader (io/reader sock)]
    (.append writer (str srcpath defs/CR-LF))
    (.flush writer)
    (io/copy reader
             (io/output-stream
               (or (io/file destpath)
                   (->tempfile (get-extension srcpath)))))))
</code></pre>
<h3><code>doto</code>, make Java interop less painful</h3>
<p>You probably know what <code>doto</code> does, but it’s interesting enough to talk about it
here. It evaluates the first form and returns its result after applying all the rest
of the forms to it, inserting the first form’s result as their first argument and
discarding the results of those operations. This sounds weird at the beginning, but
in cases like this one, where you are working with mutation, it’s really handy:</p>
<p>We are creating a <code>File</code> instance and returning it after calling
<code>.deleteOnExit</code> on it. Take into consideration that <code>.deleteOnExit</code> returns
nothing, so discarding its return value is great. We want to return the <code>File</code>,
not the result of the <code>.deleteOnExit</code> operation.</p>
<p>Once we know how to deal with <code>doto</code> we can improve the caller with this
function, which creates sockets with a timeout applied that connect automatically:</p>
<pre class="highlight"><code class="language-clojure">(defn- ->socket
  ([host port]
   (->socket host port 10000))
  ([host port timeout]
   (doto (java.net.Socket.)
     (.setSoTimeout timeout)
     (.connect (java.net.InetSocketAddress. host port) timeout))))
</code></pre>
<p>Replacing <code>java.net.Socket</code> from the example above with a call to this function
will make the call handle timeouts, configuring the socket on its creation.</p>
<p>Whatever, right? Better check the code for that. Beware that it may change as I
keep going with the development. Maybe not, it depends on the time I spend on this.</p>
<p>Here’s the link to the code. Relevant part can be found in
<code>src/clojure/clopher</code> in a file called <code>net</code> or similar:</p>
<p><a href="https://gitlab.com/ekaitz-zarraga/clopher">Link to the repository</a></p>
<p>It’s time to move on because this is taking longer than it should. We are just
warming up; let’s keep it simple at the beginning, there will be a chance to
make this complex in the near future.</p>
<p>Hope you enjoyed this post.</p>
<h3>Hey! But what about the Menus?</h3>
<p>Menus are queried like any other text document, so they can be fetched with
this little piece of code. The parsing, processing and so on are only needed for user
interaction, so we’ll deal with that later. Don’t worry. We all have to learn to
be patient.</p>
<p>See you in the next step.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>Line terminator is <span class="caps">CRLF</span> (carriage-return + line-feed), aka <code>\r\n</code>. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a><a class="footnote-backref" href="#fnref2:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:2">
<p>Don’t be afraid of the types because they are just a few of them. <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:3">
<p>Remember to jump line after you enter the selector string. <a class="footnote-backref" href="#fnref:3" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:4">
<p>You have to implement <code>get-extension</code> yourself but you know how to do it. <a class="footnote-backref" href="#fnref:4" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
</ol>
</div>Introducing Clopher2019-05-06T00:00:00+03:002019-05-06T00:00:00+03:00Ekaitz Zárragatag:ekaitz.elenq.tech,2019-05-06:/clopher01.html<p>Introducing Clopher, the terminal based Gopher client I’m making.</p><p>When you do a hack (or even a dirty hack) you do it for some reason. You do it
because you understand the complexity of the problem and you see it’s a
complex problem to solve that needs a good enough solution for your case.</p>
<p>You are facing the complexity. You are seeing it. You are seeing the deepness
of the abyss.</p>
<p>This project started a little bit like an exercise to do that. Take a simple
problem: make a Gopher client, and try to solve it in a decent way collecting
information during the process.</p>
<p>It’s just a learning project, but it went wild.</p>
<p>The initial idea was to force myself to use Clojure’s network <span class="caps">API</span>, which is
Java’s one, because I had never used it in the past and I wanted to learn about it
and the possible problems it can have. In order to do that I decided to write a
<a href="https://en.wikipedia.org/wiki/Gopher_%28protocol%29">Gopher</a> client, because that way I’d also have to read the
<a href="https://tools.ietf.org/html/rfc1436"><span class="caps">RFC</span></a> and some more resources.</p>
<p>I sketched the Gopher protocol exchange without many problems, because it’s
quite simple and the <span class="caps">RFC</span> is really well explained. The wild part came with the
rest of the project, which still is under heavy development and <em>it doesn’t
work yet</em> (this sentence may be edited in the future, I hope it will).</p>
<p>I wanted to make a terminal based client, and I had a cool library for this,
called <code>clojure-lanterna</code>, which is just an interface to <code>lanterna</code>, a Java
library for <span class="caps">TUI</span> (Terminal User Interfaces). When I wanted to use
<code>clojure-lanterna</code> I realized the project was kind of abandoned and it didn’t
cover the <span class="caps">UI</span> elements, only the basic screen interface, so I decided to make
that part by myself.</p>
<p>Further than that, I thought that if I focused on only <span class="caps">POSIX</span> compatible
operating systems I wouldn’t need to use <code>lanterna</code> either. So I decided to
implement everything by myself.</p>
<p>That took me to some thoughts I’ve been having these days: When software has
few dependencies or no dependencies at all you have more control over the
process of making it. People who code in popular programming languages have
even more libraries than we need and it’s really hard to stop the temptation to
use them (this explains some recent events with <span class="caps">NPM</span> repositories, for
instance). This is not only about security –possible security breaches in
libraries we use– or control –the fact that we included some software we don’t
know– it’s also about remembering that libraries can’t be software you just
import: they should be read, analysed and, often, thrown away in favor of an
ad-hoc solution. Many times ad-hoc solutions reduce the codebase size and they
solve the problem more accurately, as they are specifically designed to solve
<em>our</em> problem.<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup></p>
<p>Also, it’s good to tell yourself you can code everything from scratch and try
to prove it true.</p>
<p>In summary, I wanted a project that covered these points:</p>
<ul>
<li>Be a simple Gopher client.</li>
<li>Written in Clojure.</li>
<li>Terminal User Interface (<span class="caps">TUI</span>).</li>
<li>No dependencies if possible.</li>
</ul>
<p>And all of them made some sense, at least in my mind, at the very beginning of
the project.</p>
<h3>So, here we are</h3>
<p>As I said before, the goal is not to create good software. It’s not even to
create something that works. The idea is to learn during the process, and this
post series is a way to put what I learned in an ordered way.</p>
<p>If you follow this post series, you’ll follow me through my research and hacks. We
are going to dive into all those weird concepts that will appear. I’ll try to be
as technically correct as I can, but I’m not an expert and this is not a class.
I’m just sharing my experiences.</p>
<p>I’m looking into the abyss and telling you what I see from this viewpoint,
pointing out the interesting things I spot.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>As a note, while I was writing this, I experienced some issues with
nested dependencies in a different piece of software I was using.
Dependencies can be understood as a tree, with your project at the root. The
deeper the tree is, the longer changes take to travel from the leaves to the
root, because they must be accepted in every node of the affected branch and
developers are busy. This can become a problem, as in the case I experienced:
a bug in a leaf of the tree was fixed, but the root remained broken and
couldn’t solve the issue because it needed an intermediate node to update its
version of the leaf. This <em>hurts</em>.<br>
<em>(They should’ve never added the change in the first place, but when
dependencies go deep it’s more difficult to detect bugs)</em> <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Let’s document2019-02-01T00:00:00+02:002019-02-01T00:00:00+02:00Ekaitz Zárragatag:ekaitz.elenq.tech,2019-02-01:/templates-released.html<p>ElenQ Technology document templates released, with a long writeup
about the workflow we use with them.</p><p>At <a href="https://elenq.tech/">ElenQ Technology</a> I just released my documentation templates tool.
You can find it in the link below:</p>
<p><a href="https://gitlab.com/ElenQ/templates">https://gitlab.com/ElenQ/templates</a></p>
<p>I think that project, even if it’s quite simple, is a really good reason to
talk about document editing and my workflow.</p>
<h2>The story</h2>
<p>As part of my job at ElenQ Technology I document a lot of things: I have to
make reports, proposals, documentation for projects, notes for courses…</p>
<p>I have to write <strong>a lot</strong>.</p>
<p>If I could decide, I’d share all my documentation and files in plain text. But
I don’t decide, so I <strong>need</strong> to send <span class="caps">PDF</span> files, and they need to look nice
so the clients understand I take care of my stuff. I also like to pay
attention to the aesthetics of what I do, so I really like to keep everything in order.</p>
<p>That’s really difficult to do. Even more so if you work with tools like
LibreOffice, which have tons of options and menus and are sometimes difficult to
understand or hard to make do exactly what you want. I have nothing
against LibreOffice, but some years ago I realized it’s not a tool for me.
<span class="caps">WYSIWYG</span><sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup> tools like that have some characteristics that don’t fit my
workflow well. Let me break them down:</p>
<ul>
<li>
<p>They are designed to work with a mouse, and I avoid using the mouse because
it makes my wrist and arm hurt. That’s why I often work with my Wacom tablet
in mouse-intensive tasks like <span class="caps">PCB</span> routing and use the laptop’s touchpad in
everyday tasks.</p>
</li>
<li>
<p>They have tons of menus where you can get lost, while most of the
documents you write don’t have that kind of complexity. Often, those
options just make the documents complex and hard to maintain.</p>
</li>
<li>
<p>They don’t have a clear separation between the content and the view. When I
write I like to focus on the content and avoid getting distracted by how it
looks on the screen. I hate “Oh my god, the picture moved and now the whole
layout is broken”-like errors.<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup></p>
</li>
<li>
<p>Their file formats are difficult to operate on, even if they are open
standards. Mixing data with something that comes from a different process is
really complex, which forces the user to write everything by hand.<br>
As an example of this: in the previous version of the <a href="https://gitlab.com/ElenQ/documentation-templates">ElenQ Documentation
Templates</a>, there was a tool to get all the git tags of the
project and insert them as a document changelog. This is really difficult to
make in LibreOffice. (This version doesn’t support that <em>yet</em>).</p>
</li>
</ul>
<p>Trying to solve all those issues, I spent some time with LaTeX as my main
tool, but it also has a really thin separation between the content and the view,
and its learning curve is crazy steep.</p>
<h2>Enter pandoc</h2>
<p>Some day, while working on her PhD, my sister discovered <strong>Pandoc</strong> and our
life changed.</p>
<p><a href="https://pandoc.org/">Pandoc</a> is a great tool which is able to convert between a lot of
different document formats. That opens a world of possibilities where you can
write in a format you like and then convert it to different output formats.
It’s huge. The main power of Pandoc comes from the number of output
formats it can handle. It is possible to write all the content of a document in
a common language like Markdown, <span class="caps">RST</span> or AsciiDoc and then convert it to
different output formats like <span class="caps">PDF</span>, ePub or a simple static website.</p>
<p>All this tooling also lets you write files that are easy to write and read,
like Markdown is, without needing to play with tons of tags and weird commands
as <span class="caps">HTML</span> or LaTeX require.</p>
<p>Pandoc is a really powerful tool with tons of options, which can be quite
overwhelming. It even lets you add filters that transform the <span class="caps">AST</span> it creates
during the conversion!</p>
<p>At the moment we discovered Pandoc, I was really obsessed with productivity and
the chronic pain my hands, wrists and arms were suffering, and I didn’t care
about anything else. If a tool could help me reduce my use of the mouse and
my keystroke count, it was worth the time learning it.</p>
<p>I was so crazy at that time that I made a Vim plugin called
<a href="https://www.vim.org/scripts/script.php?script_id=5374">droWMark</a> for posting in WordPress. Taking advantage of Pandoc
filters I also made it able to <a href="https://github.com/ekaitz-zarraga/droWMark/issues/2">upload images</a> linked from the
MarkDown file. It was fun.</p>
<h2>Choose the tools</h2>
<p>Some time later I founded ElenQ Technology and I decided we needed to integrate
Pandoc in our tooling. That’s why with my sister’s help we created the first
version of the <a href="https://gitlab.com/ElenQ/documentation-templates">documentation templates</a>.</p>
<p>I wanted any person working with the documents to be able to use the editor
they like the most. And I only wanted to care about the aspect of the document
once: during the template creation.</p>
<p>It worked. I spent almost 2 years working with the old version of the templates
and they served me well. The only problem they had was that they needed many
files to work and they added some complexity to the folder where the
documents were edited.</p>
<h2>Choose the tools: remastered</h2>
<p>This new version eliminates that complexity. We needed to sacrifice a couple of
features but now there’s no need to add any extra file in the directory where
the document is. We removed the Makefiles and embedded the <span class="caps">SVG</span> logo of the
company inside the templates using TikZ. Now the tool is just a couple of
Pandoc LaTeX templates: <code>elenq-book</code> template for long documents and
<code>elenq-article</code> for short documents.</p>
<p>Like in the previous version, both templates are designed to create output
LaTeX files that can be compiled to <span class="caps">PDF</span> using XeLaTeX (or let Pandoc do the
conversion for you). The input formats are not defined, the only limitation is
on the metadata they need (you can read the documentation included with the
project for that).</p>
<p>All of this is installed <em>automagically</em> using <a href="https://www.gnu.org/software/stow/manual/stow.html">Stow</a>.</p>
<p>The project also explains in the <code>README.md</code> file how to create a couple of
command line aliases to simplify the calls to Pandoc. You really want to use
them because Pandoc needs <em>a lot</em> of input arguments. Using the aliases, the
conversion is as simple as running a command in the terminal:</p>
<pre class="highlight"><code class="language-bash">elenqdoc-book document.md -o book.pdf # For books
elenqdoc-article document.md -o article.pdf # For articles
</code></pre>
<p>With the new template system, the documents are just Markdown files and they
are easy to maintain under version control. Note that the same input file can
be used to create either an article or a book; the input file doesn’t constrain
the output of the process.</p>
<p>We decided to use Markdown for some extra reasons too. Markdown is simple but
has everything any simple document needs, and it’s easy to read in plain
text even for people who don’t know it. But not only that: Markdown is a widely
used format (<a href="https://gitlab.com/ekaitz-zarraga/personal_blog/raw/master/content/posts/templates-released.md">this blog</a> is written in Markdown too!) and it’s really
extensible, letting the user insert <span class="caps">HTML</span> or LaTeX pieces to cover specific
cases like formulas or complex formatting.</p>
<h2>Choose the tools: future chapters</h2>
<p>The next step is the creation of an invoice control system integrated with
the Pandoc templates. The template integration is really easy: we only need to
inject some variables into the templates, and Pandoc already has a tool for that,
the metadata system. From that side the problem is solved; now we need to build
all the rest.</p>
<p>On the other hand, as said before, if the conversion process ever needs extra
complexity, we’ll just need to add some Pandoc filters to provide it.</p>
<h2>Wrapping up</h2>
<p>In summary, we can say that the tool we made is just a consequence of the
workflow we follow. This is probably not for everyone, but anyone used to
working with the terminal and software is a potential user of this kind of tool.</p>
<p>It’s powerful, simple and straight to the point. I think it fits our
workflow really well.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p><span class="caps">WYSIWYG</span>: What You See Is What You Get <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:2">
<p>Obligatory xkcd reference: <a href="https://xkcd.com/2109/">https://xkcd.com/2109/</a> <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div>Call me maybe2019-01-09T00:00:00+02:002019-01-09T00:00:00+02:00Ekaitz Zárragatag:ekaitz.elenq.tech,2019-01-09:/call-me-maybe.html<p>Recursion, stacks and optimizations.</p><p>Do you remember what happens when you call a function in your program?</p>
<p>What happens when you make too many nested calls?<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup></p>
<p>When you call your functions, there is some stuff going on in memory (some
variables, the program counter and all that) that must be stored somewhere to
be able to come back, when the function ends, to the place it was called
from. Right?</p>
<p>The place where all that is stored is the stack. You already know all this.
When you call many nested functions, the stack keeps pushing more and more data
with no chance to pop it, so it overflows.</p>
<p>This can happen at any time, but the risk is higher when you call functions
recursively, because by definition they call themselves many times. It can
happen in a non-recursive program too, but most devices can handle deep
nesting, so it’s less likely (on small devices like microcontrollers
you have to take care of this too).</p>
<p>This doesn’t mean recursive functions will always result in a stack overflow.
That only happens when the nesting depth is bigger than the stack size.</p>
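<p>You can see this limit in practice with a quick illustrative sketch (my own,
not from any reference): CPython caps the nesting depth itself and raises a
<code>RecursionError</code> before the real stack overflows, so we can measure
roughly how deep the calls can go:</p>
<pre class="highlight"><code class="language-python">import sys

def depth(n=0):
    # Every call pushes a new frame. CPython refuses to go deeper than
    # sys.getrecursionlimit() frames and raises RecursionError instead
    # of overflowing the real stack.
    try:
        return depth(n + 1)
    except RecursionError:
        return n

print(depth())  # a bit under sys.getrecursionlimit()
</code></pre>
<p>Raising the limit with <code>sys.setrecursionlimit</code> just moves the problem:
go high enough and the interpreter can crash with a genuine stack overflow.</p>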
<blockquote>
<p>You are so stupid the recursive function that calculates your stupidity
causes a stack overflow.<br>
— Heard in a computer science class</p>
</blockquote>
<p>But this is not always true. There are some optimizations that can change this
behaviour and allow you to create stack-safe recursions. Let’s talk about
<strong>tail-call optimization</strong>.</p>
<p>Some programming languages implement tail-call optimization, that, if used
correctly, avoids stack overflows in recursive calls and increase performance.
First of all, in order to be able to make a tail-call optimization, the
function <strong>must</strong> have a call as its last action (tail-call). This means <strong>it
requires to be ordered in an specific way</strong>. Let’s see it with an
(oversimplified) example (in Python, but don’t pay attention to the language):</p>
<pre class="highlight"><code class="language-python">def factorial(a):
    """ This function does not provide a tail call, because the last
    thing executed in it is a multiplication, not a call. """
    if a == 1:
        return 1
    return a * factorial(a - 1)

def factorial_tail(a, acc=1):
    """ This function provides a tail call: the last thing happening
    in it is a function call. """
    if a == 1:
        return acc
    return factorial_tail(a - 1, acc=acc * a)
</code></pre>
<p>As the comments say, the first function is not performing a tail call, but
the second is. But what’s the difference?</p>
<p>The main point is that the first function, <code>factorial</code>, needs to go back up the call
stack to retrieve the previous step’s <code>a</code> value, while the second function doesn’t.
That’s why the second can be optimized and the first cannot.</p>
<p>The optimization exploits this behaviour in a really clever way to avoid the
stack overflows I told you about before. Tail-call optimization just changes the
input parameters of the function and calls it again, replacing the original
call with a new call with different input arguments. This can be done because
the function is written in a way that doesn’t need anything from the previous step.</p>
<p>Imagine that we pass a <code>3</code> to the first and the second function, and let’s
compare the executions. Let’s check <code>factorial</code> first:</p>
<ul>
<li>Call <code>factorial(3)</code><ul>
<li>Call <code>factorial(2)</code><ul>
<li>Call <code>factorial(1)</code></li>
<li>Return <code>1</code></li>
</ul>
</li>
<li>Return <code>2 * 1</code></li>
</ul>
</li>
<li>Return <code>3 * 2</code></li>
</ul>
<p>Now with the <code>factorial_tail</code> function, but without any optimization:</p>
<ul>
<li>Call <code>factorial_tail(3)</code><ul>
<li>Call <code>factorial_tail(2, acc=3)</code><ul>
<li>Call <code>factorial_tail(1, acc=6)</code></li>
<li>Return 6</li>
</ul>
</li>
<li>Return 6</li>
</ul>
</li>
<li>Return 6</li>
</ul>
<p>See the difference?</p>
<p>The <code>factorial_tail</code> call doesn’t need anything from the previous step: the
last <code>factorial_tail(1, acc=6)</code> function call’s result is the same as the
result of the <code>factorial_tail(3)</code> call. That changes everything!</p>
<p>What tail-call optimization does is just change the call arguments and keep
running the same code. There’s no need to store anything on the stack; it just
replaces the function call with the tail call.</p>
<p>Let’s optimize the second call now:</p>
<ul>
<li>Call <code>factorial_tail(3)</code></li>
<li>Replace the call with <code>factorial_tail(2, acc=3)</code></li>
<li>Replace the call with <code>factorial_tail(1, acc=6)</code></li>
<li>Return 6</li>
</ul>
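<p>In effect, the optimized execution behaves like a loop that rebinds the
arguments on each step. Here is an illustrative sketch (my own, not what any
particular compiler actually emits) of what tail-call optimization effectively
turns <code>factorial_tail</code> into:</p>
<pre class="highlight"><code class="language-python">def factorial_tail_optimized(a, acc=1):
    # Instead of calling factorial_tail again, rebind the arguments
    # and jump back to the top: no new stack frame is needed.
    while a != 1:
        a, acc = a - 1, acc * a
    return acc

print(factorial_tail_optimized(3))  # 6
</code></pre>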
<p>This can be stretched even further! It can involve different functions! Anywhere
a tail call is made, even if the called function is a different function,
this kind of optimization can be applied, reducing the stack usage and increasing
the performance.</p>
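<p>Python doesn’t do any of this by itself, but the idea behind optimized mutual
tail calls can be emulated by hand with a <em>trampoline</em>: every tail call
returns a thunk (a zero-argument function) instead of actually calling, and a
driver loop invokes the thunks one by one, so the stack never grows. A minimal
sketch (all the names here are my own invention):</p>
<pre class="highlight"><code class="language-python">def trampoline(f, *args):
    # Keep invoking returned thunks until a non-callable value appears.
    result = f(*args)
    while callable(result):
        result = result()
    return result

def is_even(n):
    # The mutual tail call is returned as a thunk, not performed.
    return True if n == 0 else (lambda: is_odd(n - 1))

def is_odd(n):
    return False if n == 0 else (lambda: is_even(n - 1))

print(trampoline(is_even, 100000))  # True, and no stack overflow
</code></pre>
<p>Calling <code>is_even(100000)</code> directly in Python would blow past the
recursion limit; the trampoline runs the same logic in constant stack space.</p>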
<p>If you want to read more about this, there’s <a href="https://en.wikipedia.org/wiki/Tail_call">a great Wikipedia page on the
subject</a> and there’s <a href="https://www.lua.org/pil/6.3.html">a really good explanation in the book
Programming in Lua</a>.</p>
<p>But how is all this handled by programming languages? you may ask.</p>
<p>The answer is that there’s no single answer: each of them has its own style of
dealing with this. Let me give you some examples.</p>
<p><strong>Python</strong>, just to point out that the language I chose for the example is not the
best example of this, has no tail-recursion elimination. Guido and the
Pythonists<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup> argue that tail-call optimization alters the stack traces (which
is true) and that they don’t like recursion as a basis for programming, so
they try to avoid it. CPython has no tail-call optimization, but they
don’t forbid (they can’t!) any other Python implementation from implementing that
particular optimization. There’s a really <a href="https://neopythonic.blogspot.com/2009/04/tail-recursion-elimination.html">interesting post by Guido van Rossum
about this</a>.</p>
<p><strong>Lua</strong>, as you’ve seen in the <a href="https://www.lua.org/pil/6.3.html">previous link</a>, implements proper tail
calls (as they are called there) and there’s nothing the programmer needs to do
to make sure they are optimized. The only requirement is to write the tail calls correctly.</p>
<p><strong>Scala</strong> implements tail-recursion optimization at compile time, so the
compiler transforms the recursive call into a loop during compilation. That’s
interesting because there’s a compile-time check too. There’s an annotation
called <code>@tailrec</code> that can be used to make sure that your function is going to
be optimized. If the compiler is not able to optimize a function that carries the
<code>@tailrec</code> annotation, it will throw an error. If the function doesn’t have the
annotation, the compiler will simply emit a standard recursion. The <a href="https://docs.scala-lang.org/tour/annotations.html">annotations tour of the Scala language</a> has
some words about <code>@tailrec</code>.</p>
<p><strong>Clojure</strong> is a really interesting case too. Clojure doesn’t implement
tail-call optimization, but it has one (and only one) special form for
non-stack-consuming looping: <code>recur</code>. This special form rebinds the recursion
point’s bindings or arguments and jumps back to the recursion point. The
<em>recursion point</em> can be a <code>loop</code> special form or a function definition. So
it’s just an explicit call to the tail-recursion optimization. The tail call must
be written correctly too: <code>recur</code> is only allowed in tail position, and the compiler
checks that it’s located in the correct place. Also, it has some specific rules
that must be taken into consideration (multiple-arity functions and so on), which
are better <a href="https://clojure.org/reference/special_forms#recur">read in the documentation</a>.</p>
<blockquote>
<p><em>Edited 2019-01-25</em>: Thanks to a discussion in the fediverse about the topic,
I found the moment when <strong>Emacs Lisp</strong> got its tail-call optimization,
with everything explained <a href="https://chrismgray.github.io/posts/emacs-tco/">in the author’s blog</a>. It’s really interesting.</p>
</blockquote>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>Some help: what’s the name of the website you check when you don’t know
how to solve your programming problem? <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:2">
<p>My next music band name. <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div>My first time2018-06-23T00:00:00+03:002018-06-23T00:00:00+03:00Ekaitz Zárragatag:ekaitz.elenq.tech,2018-06-23:/First-Time.html<p>Thoughts about my first contribution to free software</p><p>The other day I remembered a very important day on my life, one of those early
beginnings that started to change my mind: <strong>The first time I contributed to
free software</strong>.</p>
<p>My first contribution was in 2014, more specifically the 22nd of May of 2014.</p>
<p>That’s only 4 years ago. But, at the same time, have 4 years really passed
since then? <span class="caps">OMG</span>.</p>
<p>You get the feeling, right?</p>
<p>You may think I started coding when I was 10 or something like that. I didn’t.
I learned programming at university, and not as deeply as a computer scientist
would, because I studied Telecommunication Engineering, where computers are just a third
of the studies while the other two thirds are electronics and signal-related things.</p>
<p>I’m not a young hacker or a genius. My parents don’t like computers. I didn’t
live with a computer at home since I was a toddler. That didn’t happen.</p>
<p>Today I want to tell you my story. Not because it’s awesome and you’ll love it.
I want to tell you my story because it’s <strong>really</strong> standard. I want you to see
that you can also contribute to Free Software. Anyone can.</p>
<p>So, how did it all start?</p>
<p>I started my university studies in 2009. The first year we had one semester of
C and the next one of C++. Not real programming classes, just introductory
stuff about the languages and computers. A couple of years later we had a
networking subject where I used Linux for the first time. The computers had
<em>Kubuntu</em> installed. Around that time my laptop started to give me some trouble, so
I installed <em>Kubuntu</em> in a dual boot and tested it. It was nice.</p>
<p>A little later the <em>Windows</em> partition failed again, and I was comfortable
enough in <em>Kubuntu</em> to delete <em>Windows</em> and use only <em>Kubuntu</em>. It was easy.</p>
<p>The second semester that year another subject had some focus on Linux, because
it was a networks and tools subject, and I really needed it. We learned to use a
terminal, some <span class="caps">SQL</span> and many things like that. Simple tools, but they turned out to
be useful in the future. I was really surprised by the power of the terminal,
and I studied so much in my free time that I finished the subject with honours, just
because I was really interested in it. As I said, I’m not a genius; I was interested.</p>
<p>We had a subject about <em>Minix</em>, following Andrew Tanenbaum’s <em>Operating
Systems: Design and Implementation</em> book and <em>Minix</em> version 1, which gave us
the initial knowledge we needed about operating systems at that time. That started
to teach me about the ethical side of free software and also
sparked more interest.</p>
<p>The next year I had a couple of Operating Systems subjects (the theoretical one and
the practical one). The teacher was part of <em><span class="caps">KDE</span> Spain</em>, and he talked about
free software in class. I was quite into it at that time. The practical part of
the subject was real software: we covered the contents of the book called
<em>Advanced Linux Programming</em><sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>. That was pure C development, and we didn’t
have a lot of knowledge of that. We had just touched some C/C++ during the first
year and some assembly in a couple of subjects. It was really hard, but it was
really cool.</p>
<p>We made a small shell. It was great!</p>
<p>Final year<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup> of the university: I had to make the final project.</p>
<p>I didn’t know what to do, so I contacted the teacher who was part of <em><span class="caps">KDE</span> Spain</em>
and he mentored me. I installed an <span class="caps">IRC</span> client and started talking with the
people of the <em>kde-telepathy</em> project. I wasn’t used to that kind of collaborative
development. Heck, I wasn’t used to any kind of development! But it all went
well, mostly thanks to the great people in the project (David, Diane, George,
Martin… <em>You</em> are awesome!).</p>
<p>The project itself was a <em><span class="caps">KDE</span></em> application, <em><span class="caps">KDE</span>-Telepathy</em>, a big one. Thank
heavens, my part of the project was quite separate, so I could focus on my
piece. That taught me to search a big codebase and focus on my part. Then I
had to code in C++ like in real life, not like the designed problems I had
worked on at university, and I also had to read tons of documentation about
<em>Qt</em>, <em><span class="caps">KDE</span></em> and everything else.</p>
<p>I started with the contribution that opened this post and I went on until I had
renewed the whole interface. It wasn’t great, but the code was finally merged
into the application some time later.</p>
<p>Since then, I could say I code almost every day and I’ve been studying many more
languages, but at that time I was relatively new to programming and computers.</p>
<p>With all this I mean:</p>
<blockquote>
<p>If you are interested, try. Everything is going to be fine. You don’t need to
be a genius<sup id="fnref:3"><a class="footnote-ref" href="#fn:3">3</a></sup>.</p>
</blockquote>
<p><a href="https://git.reviewboard.kde.org/r/118256/diff/2#index_header">You can check the contribution
here</a>.</p>
<p>Love.</p>
<p>Ekaitz</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>It’s a great book, by the way. You can find it
<a href="https://mentorembedded.github.io/advancedlinuxprogramming/">online</a>. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:2">
<p>When I studied, right before the <a href="https://en.wikipedia.org/wiki/Bologna_Process">Bologna
Process</a>, university was 5
years long for a Master’s Degree and 3 for a Bachelor’s Degree. <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:3">
<p>But congratulations if you are, that way you’ll learn faster and probably
have more reach if you want to. <a class="footnote-backref" href="#fnref:3" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
</ol>
</div>Genesis2018-04-15T00:00:00+03:002018-04-15T00:00:00+03:00Ekaitz Zárragatag:ekaitz.elenq.tech,2018-04-15:/Genesis.html<p>About this blog, ElenQ Technology, and myself</p><p>As a first post in this <em>official-but-not-very-official</em> blog I just want to
introduce myself and <em>ElenQ Technology</em>.</p>
<p>First of all, my name is Ekaitz Zárraga and I was born in 1991. I describe my
job as R&D Engineer, but actually I studied Telecommunications Engineering and I
<em>only</em> do my research and development in that area. I’m mostly focused on
programming and computer-related activities, but I can also do some electronics
and other kinds of things. That’s my formal introduction. On the informal side,
I’d say I’ve always been a really curious person, and that made me try other
disciplines like art in its different forms. This last point drives most of
what I’ll write about later in this text. That’s all from me; I’ll write
down an informal resume in the future.</p>
<p><em>ElenQ Technology</em> is a name: a name for the way I am and the
interests I have. That said, it’s also the independent R&D project I’m running.
It’s a different kind of company which aims to raise awareness about ethical
technology, or <a href="https://elenq.tech/en/about.html#ethical-innovation">Ethical Innovation</a>, by example, demonstrating that ethical
companies can be profitable. It’s not simply the way I make my living doing
engineering; it’s also a performance. Like an art piece.</p>
<blockquote>
<p><em>ElenQ Technology</em> is an art piece which tells you that a different model is
possible. It tells you that you have a choice and you don’t need to work in a
corporation and be governed by its rules.</p>
</blockquote>
<p><em>ElenQ Technology</em> is the result of many things I felt while working for other
companies, and it’s also the result of a deep analysis of the state of
technology in my context, which I think can be generalized globally with
decent accuracy.</p>
<p>First, in my immediate context most of the <span class="caps">IT</span> companies have a similar business
model based on <em>body shopping</em>, and they pay really low salaries. Other sectors
are not in a much better position, but the <span class="caps">IT</span> one is outrageous.</p>
<p>The jobs are not 9-to-5 jobs here. Working 10 hours per day is becoming the
norm. The famous <em>Economic Crisis™</em> mixed with deep corruption made people
pray for jobs, and the companies are well aware of that.</p>
<p>That said, you can easily imagine how the tech world works here. <span class="caps">IT</span>
corporations get a ridiculous amount of money while they respect neither their
workers nor their clients. They make proprietary solutions because they don’t
want to lose the projects and let the client be independent. They don’t want
you to be free in any case, because they need to maintain their rotten business model.</p>
<p>That is the general state of the <span class="caps">IT</span> world in my surroundings, but I’m sure, as
I said, it can be generalized; maybe not totally, but there are many points that
can be, mostly because the corporations I mention are present in other countries.</p>
<p>Personally, I had the luck to work in what I thought was a better place.
The working conditions were not as bad as I described, or at least that’s what I
thought. It was an R&D Engineer position in a not very big corporation. I
worked in a small department with fewer than ten co-workers. We made new stuff
for the company. It was fun.</p>
<p>After some time there I realized how it really worked. It wasn’t that different from
other jobs. There were a lot of things I don’t want to share here, but I started
to feel bad there, and my personal situation didn’t help at all. I’ve always
been a really curious person, I love learning new things, and the job simply
wasn’t giving me that as it did at the beginning. I started to need to fill that
gap by spending more time after work doing tech-related stuff in the little free
time I had. The mix of the boring job and the organizational problems we
had made it really depressing.</p>
<p>While I was immersed in that depressing environment, our company wanted more
money and started looking for new businesses with the resources it had.
Our team, as the R&D team, was responsible for the development of the first
<em>proof-of-concept</em> of the new technology. We were asked to analyze the data
that the company had. Literally, we were asked to track people. The company
didn’t care if they were our users or not; we were asked to track <em>everyone</em>.</p>
<p>That was the straw that broke the camel’s back. I left the job because my
ethics are not compatible with tracking people. I don’t like it and I don’t
want to be part of it.</p>
<p>I always wanted to change the way the technology is created and I always
thought it was a great idea to make it by myself and encourage others to do so,
but I never had the courage to do it. What happened gave me the courage I needed.</p>
<p>But, why not simply move to another job?</p>
<p>I think it was the moment to try. I had been refining an idea of what
Ethical Technology is for a while, and I always wanted to apply that idea in my
field: R&D. Also, taking into account that most of the companies where I could
work have the same structural problems, I decided to change it from the root. I
decided to try a different model. Did I really have other options?</p>
<p>Is living in a depressing environment, making the world a worse place to
live in, really an option? Is it?</p>
<p>Think about it.</p>
<p>Now, <em>ElenQ Technology</em> breaks the business model of the companies
around me. It makes ethical technology, and it makes it in an ethical way.
That’s good for me because it lets me work in the fields I like and make the
world a better place to live in, which selfishly will improve the future I
leave for my children if I ever have them. But it’s also good for the clients
<em>ElenQ Technology</em> has, because all the projects are handled in a way
that <em>they</em> own them, following the principles of Free Software and
Hardware and the <a href="https://2017.ind.ie/ethical-design/">Ethical Design manifesto</a>,
with some extra ideas I added to be more specific to the innovation field (you
can read more <a href="https://elenq.tech/en/about.html#ethical-innovation">here</a>).</p>
<p>I want to change the world. Don’t you?</p>
<p>The thing is, I can’t do it alone, so <em>ElenQ Technology</em> wants to
push other people to take the same decision I took (or was forced to take),
and that’s its main goal.</p>
<p>Let’s change the world together.</p>
<h3>About this website</h3>
<p>You have probably already noticed this is not the official <em>ElenQ
Technology</em> website. This is <em>my</em> <em>official-but-not-very-official</em>
blog as part (or head, but I don’t like that word) of <em>ElenQ Technology</em>.</p>
<p>Here I’ll write about <em>ElenQ Technology</em>’s philosophy, goals,
achievements and that kind of <em>official</em> thing, but mostly I’ll write
<strong>about the things we make</strong>. That’s what interests me the most.</p>
<p>This is going to be a really technical place where I’ll try to explain advanced
concepts in a simple way to let you learn stuff with me. I want to share what I
do with you all.</p>
<h3>About languages</h3>
<p>I’ll write all posts in English, but some entries are going to be
translated. Since the blog supports translations, I prefer to keep that option
always enabled so I can add posts translated by the community or, as in other
cases, by myself.</p>[EU] Genesis2018-04-15T00:00:00+03:002018-04-15T00:00:00+03:00Ekaitz Zárragatag:ekaitz.elenq.tech,2018-04-15:/Genesis-eu.html<p>Blog <em>ofizial-baina-ez-oso-ofizial</em> honen lehenengo post bezala, nor naizen eta
<em>ElenQ Technology</em> zer den aurkeztu nahi dut.</p>
<p>Hasteko, Ekaitz Zárraga naiz eta 1991. urtean jaio nintzen. Nire burua I+G
ingeniari bezala deskribatzen dudan arren, Telekomunikazio Ingeniaritza ikasi
nuen eta arlo horretan <em>bakarrik</em> egiten dut lan. Bereziki ordenagailuekin
lotutako gauzak egiten …</p><p>Blog <em>ofizial-baina-ez-oso-ofizial</em> honen lehenengo post bezala, nor naizen eta
<em>ElenQ Technology</em> zer den aurkeztu nahi dut.</p>
<p>Hasteko, Ekaitz Zárraga naiz eta 1991. urtean jaio nintzen. Nire burua I+G
ingeniari bezala deskribatzen dudan arren, Telekomunikazio Ingeniaritza ikasi
nuen eta arlo horretan <em>bakarrik</em> egiten dut lan. Bereziki ordenagailuekin
lotutako gauzak egiten ditut, programazioa eta horrelakoak, baina elektronika
eta bestelako gauzak jorratzeko ere gaitasuna daukat. Hori da nire sarrera
formala. Sarrera informalean jakin-min handia dudala esango nuke eta horrek
beste diziplina batzuetan aritzeko aukera eman didala, haien artean artea.
Azkeneko puntu honek garrantzi handia dauka testu honetan idatzitakoarekin
erlazio zuzena izango duelako. Hauxe da nire aurkezpena, etorkizunean kurrikulum
informal bat egingo dut.</p>
<p><em>ElenQ Technology</em> izen bat da, nire interesak eta nire izaera adierazteko izen
bat baino ez. Hori esanda, aldi berean nire I+G proiektu independentea da.
Adibidearen bitartez, enpresa etikoak errentagarriak izan daitezkeela
erakutsiz, teknologia etikoari buruz kontzientzia eratzeko helburua duen
enpresa bat da. Ez da nire lana bakarrik, <em>performance</em> artistiko bat da.
Artelan baten antzera.</p>
<blockquote>
<p><em>ElenQ Technology</em> modelo ezberdin bat egin daitekeela esaten dizun artelan
bat da. Aukerak badaudela eta ez zaudela korporazio baten arauekin lan
egitera behartuta esaten dizu.</p>
</blockquote>
<p><em>ElenQ Technology</em> beste enpresetan lan egitearen eta nire ingurunean
teknologiaren egoeraren analisi bat egitearen emaitza da, baina uste dut nahiko
modu zehatzean orokortu daitekeela.</p>
<p>Hasteko, nire inguruko <span class="caps">IT</span> enpresek <em>body-shopping</em>-ean oinarritutako negozio
eredu antzekoa daukate. Askotan ilegalak diren praktikak egiteaz gain soldata
baxuak ordaintzen dituzte. Beste sektoreak ez dira askoz hobeak baina <span class="caps">IT</span>
munduaren kasua guztiz ankerra da.</p>
<p>Ordutegiak luzatu egin dira azken urteotan. Egunean 10 ordu lan egitea normala
bihurtzen hasi da. <em>Krisi Ekonomiko™</em> famatuaren ondorioz, prekaritatea
normalizatu egin da, gehien bat, enpresek argi daukatelako jendeak lanaren
behar larria daukala.</p>
<p>Hori esanda, erraz ulertu dezakezu teknologiaren mundua nola dagoen. <span class="caps">IT</span>
korporazioek haien bezeroak eta langileak errespetatu gabe dirutza egiten dute.
Bezeroak lotuta izateko produktu propietarioak saltzen dituzte. Haien helburu
nagusia daukaten negozio eredu ustelaren etorkizuna bermatzea da.</p>
<p>Hau da bizi dugun egoera. Lehen esan bezala, erraz orokortu daiteke, aipatutako
enpresa gehienak atzerrikoak direlako eta beste herrialdeetan ere kokatuta daudelako.</p>
<p>Nire kasuan, leku hobe batean lan egiten nuela uste nuen. Lan baldintzak
hobeak ziren bertan. I+G ingeniari postu bat nuen enpresa moderno batean.
Hamar bat pertsonaz osatutako departamendu txiki isolatu bat zen. Enpresaren
jostailu berriak egiten genituen. Nahiko dibertigarria zen.</p>
<p>Denbora pasa ahala, lan baldintzak hain onak ez zirela konturatzen hasi
nintzen. Aipatu nahi ez ditudan gauza asko gertatu zirenez, bertan txarto
sentitzen hasi nintzen eta, gainera, nire egoera pertsonalak ez zuen batere
lagundu. Oso pertsona kuriosoa naiz eta lanak nire jakin-mina asetzeko ematen
zidan aukera desagertzen hasi zen. Nire denbora librean gauza berriak ikasten
eta aztertzen hasi nintzen. Lan aspergarria barneko antolaketa arazoekin batu
zen eta nire bizitza kudeatzea oso zaila egin zitzaidan.</p>
<p>Depresioan murgilduta, enpresak, burtsara ateratzea helburu zuela, bere
negozioa handitu nahi zuen. Gure departamenduak, enpresaren I+G-aren
erantzulea zenez, enpresaren kontzeptu-proba berrien garapena egin behar
zuen. Enpresak zituen datuak aztertzea eskatu ziguten. Pertsonen kokapena
jarraitzea eskatu ziguten, mundu osotik, edozein momentuan. Ez zuten bezero eta
ez-bezeroen arteko ezberdintasunik egin nahi. <em>Guztiak</em> jarraitzeko eskatu ziguten.</p>
<p>Hori gehiegi zen niretzat. Arrazoi etikoengatik utzi nuen lana. Ez dut horretan
parte hartu nahi.</p>
<p>Teknologia eratzen den modua aldatu nahi izan dut beti. Nire kabuz teknologia
egitea eta besteak gauza bera egitera bultzatzea beti egon da nire buruan baina
orain arte ez dut salto hori emateko ausardiarik izan. Gertatutakoak falta
zitzaidan bultzada eman zidan.</p>
<p>Baina, zergatik ez mugitu beste lan batera?</p>
<p>Momentua zela uste dut. Denbora luzez ibili naiz teknologia etikoari buruz
pentsatzen eta nire esparruan, Ikerkuntza eta Garapenean, aplikatu nahi nuen.
Gainera, nire inguruko enpresetan izango nituen arazoak ikusita, zuzenean
sustraira joatea erabaki nuen. Beste modelo bat saiatzea erabaki nuen. Beste
aukerarik al nuen?</p>
<p>Mundua txarrerantz aldatzen giro deprimagarri batean lan egitea benetako aukera
bat al da?</p>
<p>Pentsatu ondo.</p>
<p>Orduan, <em>ElenQ Technology</em>-k nire inguruko konpainien negozio modeloa apurtzen
du. Teknologia Etikoa garatzen du, modu etiko batean. Hori ona da niretzat,
zuzenean, niri gustatzen zaizkidan gauzetan lan egiteko aukera ematen didalako,
etorkizunean eduki ditzakedan umeentzat mundua hobetzen dudan bitartean. Eta
bezeroentzat, proiektuak garatutako teknologia <em>bezeroarena</em> izateko moduan
kudeatzen ditugulako, Software eta Hardware Librearen printzipioak eta
<a href="https://2017.ind.ie/ethical-design/">Diseinu Etikoaren Manifestoa</a> (gure
esparrura moldatuta) jarraituz (gehiago irakurri dezakezu <a href="https://elenq.tech/eu/about.html#ethical-innovation">hemen</a>).</p>
<p>Mundua aldatu nahi dut, zuk ez?</p>
<p>Nik bakarrik ezin dudala mundua aldatu konturatu naiz, beraz, <em>ElenQ
Technology</em>-k besteak nik (behartuta edo ez) hartu nuen erabakia hartzera
bultzatzea du helburu.</p>
<p>Aldatu dezagun mundua guztiok batera.</p>
<h3>Blog honi buruz</h3>
<p>Jada konturatu zara blog hau <em>ElenQ Technology</em>-ren blog ofiziala ez dela.
<em>Nire</em> blog <em>ofizial-baina-ez-oso-ofiziala</em> da, <em>ElenQ Technology</em>-ren parte
(edo buru, baina ez dut hitz hori gustoko) bezala.</p>
<p>Hemen <em>ElenQ Technology</em>-ri buruz idatziko dut, bere filosofia, helburu,
arrakasta, etab.-ei buruz. Baina gehien bat <strong>egiten dugunari buruz</strong> idatzi
nahi dut, hori baita niretzat interesgarriena. Prozesuaren parte egin nahi zaituztet.</p>
<h3>Hizkuntzei buruz</h3>
<p>Blog hau ingelesez idatziko da, baina aukera dago testuak (hau bezala) beste
hizkuntzetara itzultzeko. Blogak baimentzen duen bitartean nahiago dut aukera
prest uztea komunitateak itzulitako testuak gehitzeko edota, kasu honen moduan,
nik egindako itzulpenak igo ahal izateko.</p>[ES] Génesis2018-04-15T00:00:00+03:002018-04-15T00:00:00+03:00Ekaitz Zárragatag:ekaitz.elenq.tech,2018-04-15:/Genesis-es.html<p>Como primer post en este blog <em>oficial-pero-no-muy-oficial</em> sólo quiero
presentarme y presentar <em>ElenQ Technology</em>.</p>
<p>Me llamo Ekaitz Zárraga y nací en 1991. Suelo describir mi trabajo como
ingeniero de I+D pero en realidad estudié Ingeniería de Telecomunicaciones y
<em>sólo</em> trabajo en ese área. Mayormente me centro en actividades relacionadas …</p><p>Como primer post en este blog <em>oficial-pero-no-muy-oficial</em> sólo quiero
presentarme y presentar <em>ElenQ Technology</em>.</p>
<p>Me llamo Ekaitz Zárraga y nací en 1991. Suelo describir mi trabajo como
ingeniero de I+D pero en realidad estudié Ingeniería de Telecomunicaciones y
<em>sólo</em> trabajo en ese área. Mayormente me centro en actividades relacionadas
con los ordenadores como, por ejemplo, programar pero también puedo hacer
electrónica y otras cosas. Esa sería mi presentación formal. En la informal
diría que soy una persona bastante curiosa, lo que me ha hecho investigar y
profundizar en otras disciplinas como el arte en sus diferentes formas. Este
último punto explica mucho de lo que vendrá después en este texto. Y eso es
todo en lo que a mí respecta, ya escribiré un currículum vitae informal en el futuro.</p>
<p><em>ElenQ Technology</em> es un nombre, una forma de llamar a como soy y a los
intereses que tengo. Además, también es un proyecto de I+D que estoy
desarrollando. Es una empresa distinta a las demás, en la que pretendo generar
conciencia acerca de la tecnología ética mediante el ejemplo, demostrando que
las compañías de tecnología ética pueden ser rentables. No es sólo mi trabajo
en el que hago ingeniería, también es una <em>performance</em> artística. Es como una
obra de arte.</p>
<blockquote>
<p><em>ElenQ Technology</em> es un proyecto artístico que te dice que otro modelo es
posible. Te recuerda que tienes elección y que no tienes que trabajar en una
corporación y seguir sus reglas.</p>
</blockquote>
<p><em>ElenQ Technology</em> es, simplemente, el resultado de todas las cosas que he
sentido trabajando para otras compañías y el resultado de un análisis profundo
del estado de la tecnología en mi contexto cercano que, creo, puede ser
extrapolado al resto de lugares con una precisión aceptable.</p>
<p>Para contextualizar, las grandes empresas del mundo <span class="caps">IT</span> en mi zona cercana
tienen un modelo de negocio similar, basado en el <em>body shopping</em> (muchas de
ellas haciendo cesiones ilegales en subcontratas). Pagan unos salarios
bajísimos y las condiciones laborales son lamentables. El resto de sectores
tampoco están mucho mejor, pero el caso de las empresas del mundo <span class="caps">IT</span> es escalofriante.</p>
<p>Los trabajos rara vez son de 8 horas diarias, las jornadas se están alargando
cada vez más y, en muchos, es normal trabajar 10 horas al día. La famosa
<em>Crisis Económica™</em> mezclada con una profunda corrupción ha sido el caldo de
cultivo perfecto para que las grandes empresas se aprovechen de los trabajadores.</p>
<p>Dicho esto, es muy fácil entender cómo funciona el mundo de la tecnología por
aquí. Grandes empresas ganando insultantes cantidades de dinero mientras que no
respetan a sus trabajadores o clientes, soluciones tecnológicas privativas para
atar a los clientes e impedirles ser independientes, etc. Todo para mantener su
modelo de negocio podrido y corrupto hasta la médula.</p>
<p>Ese es el estado de las empresas de tecnología en mi entorno, el de las
grandes. Seguro que puede extrapolarse a otros lugares porque muchas de ellas
operan también en el extranjero.</p>
<p>En mi caso tuve la suerte de acabar en una empresa que parecía un lugar mejor.
Las condiciones eran ligeramente mejores que las que he descrito, o al menos
así lo creía yo. Era un trabajo de Ingeniero de I+D en una empresa no demasiado
grande. Trabajaba en un departamento aislado de menos de 10 personas. Hacíamos
los juguetes nuevos de la empresa. Era divertido.</p>
<p>Después de algún tiempo allí me di cuenta de cómo funcionaba. No era tan
diferente al resto. Hubo muchas cosas que no quiero compartir aquí pero empecé
a sentirme bastante mal y mi situación personal tampoco ayudó mucho. Siempre he
sido una persona curiosa a la que le gusta aprender cosas nuevas y ese trabajo
dejó de aportarme eso como lo hacía al principio. Empecé a necesitar llenar ese
hueco trabajando en mis proyectos personales en el poco tiempo que me quedaba
al día. La suma de un entorno de trabajo aburrido y deprimente más los
problemas organizativos que teníamos era difícil de gestionar.</p>
<p>Sumergido en ese entorno deprimente, la empresa, con intención de salir a bolsa
próximamente, quiso exprimir al máximo sus recursos y plantear nuevos negocios.
Nuestro departamento, como encargado del I+D de la empresa, era el responsable
de plantear las nuevas <em>pruebas de concepto</em>. Nos pidieron que analizásemos los
datos de la compañía. Literalmente, nos pidieron que siguiésemos a la gente,
que los localizásemos. No les importaba que fuesen nuestros clientes o no.
Querían que localizásemos a <em>todos</em>.</p>
<p>Eso fue la gota que colmó el vaso. Tenía que dejarlo porque eso superaba con
creces el límite de mi ética personal. No me gustan esas prácticas y no podía
ser parte de eso.</p>
<p>Llevaba tiempo pensando en la forma en la que hacemos tecnología y siempre me
había apetecido probarlo por mi cuenta. Eso me dio el valor que me faltaba
para hacerlo.</p>
<p>¿Por qué no simplemente cambiar de trabajo?</p>
<p>Creo que era el momento para intentarlo. Como entusiasta del software y
hardware libre, siempre me ha interesado definir lo que es la tecnología ética
y llevaba tiempo con ganas de aplicarlo en mi campo: el I+D. Además, teniendo
en cuenta el estado de las empresas para las que podía trabajar, decidí cambiar
las cosas de raíz. Decidí intentar un modelo distinto. ¿Tenía alguna otra
alternativa en realidad?</p>
<p>¿Es una alternativa real trabajar en un entorno deprimente que hace del mundo
un lugar peor? ¿Seguro?</p>
<p>Piensa en ello.</p>
<p><em>ElenQ Technology</em> rompe entonces con ese modelo de negocio y hace tecnología
ética de una forma ética. Eso es bueno para mí porque me permite trabajar en
los campos que me gustan y hacer del mundo un lugar mejor lo que, egoístamente,
mejorará el futuro que le deje a mis hijos, si algún día los tengo. Al mismo
tiempo esto es bueno para los clientes de <em>ElenQ Technology</em> porque los
proyectos se gestionan de forma que <em>ellos</em> son los dueños de la tecnología que
se crea. Para esto último se siguen los principios del Software y el Hardware
Libre y el <a href="https://2017.ind.ie/ethical-design/">Manifiesto del Diseño Ético</a>
junto con algunas ideas adicionales más específicas del campo al que me dedico
(puedes leer más <a href="https://elenq.tech/es/about.html#ethical-innovation">aquí</a>).</p>
<p>Quiero cambiar el mundo. ¿Tú no?</p>
<p>El problema es que yo no puedo hacerlo solo así que <em>ElenQ Technology</em> es una
forma de hacer que otros tomen la misma decisión que yo tomé (o fui forzado a
tomar) y ese es su objetivo principal.</p>
<p>Cambiemos el mundo juntos.</p>
<h3>Sobre este blog</h3>
<p>Ya te has dado cuenta de que este blog no es el blog oficial de <em>ElenQ
Technology</em>. Esto es <em>mi</em> blog <em>oficial-pero-no-muy-oficial</em> como parte (o
“persona al frente”, pero decirlo así no me gusta) de <em>ElenQ Technology</em>.</p>
<p>Aquí escribiré sobre <em>ElenQ Technology</em>, sobre su filosofía, objetivos, logros
y ese tipo de temas <em>oficiales</em> que me parezcan relevantes pero sobre todo
tengo la intención de escribir sobre las <strong>cosas que hacemos</strong>. Eso es lo que
más me interesa.</p>
<p>Este sitio será un lugar muy técnico en el que trataré de explicar conceptos
avanzados de forma sencilla para que aprendáis conmigo. Quiero compartir lo que
haga con vosotros.</p>
<h3>Sobre los idiomas</h3>
<p>Este blog se escribe en inglés, pero algunas de las entradas (como esta misma)
podrán traducirse a otros idiomas. Como el blog soporta traducciones, prefiero
mantener la opción activa para, si es necesario, añadir traducciones
proporcionadas por la comunidad o hechas por mí mismo, como en este caso.</p>