Do you remember what happens when you call a function in your program?
What happens when you make too many nested calls?
When you call a function, there’s some stuff going on in memory (some variables, the program counter and all that) that must be stored somewhere so that, when the function ends, the program can come back to the place it was called from. Right?

The place where all that is stored is the stack. You already know all this. When you call many nested functions, the stack keeps pushing more and more data with no chance to pop it, so it overflows.

This can happen at any time, but the risk is higher when you call functions recursively, because by definition they call themselves many times. It can happen in a non-recursive program too, but most devices can handle deep levels of nesting, so it’s less likely (in small devices like microcontrollers you have to take care of this as well).

This doesn’t mean recursive functions will always result in a stack overflow. That only happens when the nesting level is deeper than the stack can hold.
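To make this concrete, here’s a minimal sketch in Python (the language used for the examples below). The `countdown` function is a made-up example, not from the original post: CPython guards its stack with a recursion limit, so a too-deep recursion raises a `RecursionError` instead of crashing the process.

```python
import sys

def countdown(n):
    # Every call pushes a new frame onto the call stack.
    if n == 0:
        return 0
    return countdown(n - 1)

print(sys.getrecursionlimit())  # CPython's stack guard, usually 1000

countdown(10)          # fine: only 11 frames deep
try:
    countdown(10_000)  # too deep: this is where the stack "overflows"
except RecursionError as err:
    print(err)         # maximum recursion depth exceeded
```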
> You are so stupid the recursive function that calculates your stupidity causes a stack overflow.
>
> — Heard in a computer science class
But this is not always true. There are some optimizations that change this behaviour and allow you to create stack-safe recursions. Let’s talk about tail-call optimization.

Some programming languages implement tail-call optimization, which, used correctly, avoids stack overflows in recursive calls and increases performance. First of all, for tail-call optimization to be possible, the function must have a call as its last action (a tail call). This means the function needs to be structured in a specific way. Let’s see it with an (oversimplified) example (in Python, but don’t pay attention to the language):
```python
def factorial(a):
    """This function does not provide a tail call, because the last thing
    to execute in it is a multiplication, not a call."""
    if a == 1:
        return 1
    return a * factorial(a - 1)


def factorial_tail(a, acc=1):
    """This function provides a tail call: the last thing happening in it
    is a function call."""
    if a == 1:
        return acc
    return factorial_tail(a - 1, acc=acc * a)
```
As the comments say, the first function is not performing a tail call, but the second one is. But what’s the difference?

The main point is that the first function, `factorial`, needs to go back in the call stack to retrieve the previous step’s `a` value, while the second function doesn’t. That’s why the second can be optimized and the first can’t.
The optimization exploits this behaviour in a really clever way to avoid the stack overflows I told you about before. Tail-call optimization just changes the input parameters of the function and calls it again, replacing the original call with a new call with different input arguments. This can be done because the function is written in a way that doesn’t need anything from the previous step.
Imagine that we pass a `3` to the first and the second function, and let’s compare the execution. Let’s check `factorial` first:
- Call `factorial(3)`
    - Call `factorial(2)`
        - Call `factorial(1)`
        - Return `1`
    - Return `2 * 1`
- Return `3 * 2`
Now with the `factorial_tail` function, but without any optimization:
- Call `factorial_tail(3)`
    - Call `factorial_tail(2, acc=3)`
        - Call `factorial_tail(1, acc=6)`
        - Return `6`
    - Return `6`
- Return `6`
See the difference? The `factorial_tail` call doesn’t need anything from the previous step: the result of the last `factorial_tail(1, acc=6)` call is the same as the result of the `factorial_tail(3)` call. That changes everything!
What tail-call optimization does is just change the call arguments and keep running the same code. There’s no need to store anything on the stack; it simply replaces the function call with the tail call.
Let’s optimize the second call now:

- Call `factorial_tail(3)`
- Replace the call with `factorial_tail(2, acc=3)`
- Replace the call with `factorial_tail(1, acc=6)`
- Return `6`
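In fact, you can write by hand roughly what the optimization does under the hood. This is just a sketch of the idea, not what any particular compiler emits: the recursive call becomes a rebinding of the arguments plus a jump back to the top, spelled here as a `while` loop.

```python
def factorial_tail_optimized(a, acc=1):
    # Hand-written equivalent of the optimized tail call: instead of
    # calling the function again, rebind the arguments and jump back
    # to the top of the loop. Only one stack frame is ever used.
    while True:
        if a == 1:
            return acc
        a, acc = a - 1, acc * a

print(factorial_tail_optimized(3))  # 6, without any extra stack frames
```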
This can be stretched further! It can involve different functions! Wherever a tail call is made, even if the called function is a different one, this kind of optimization can be done, reducing stack usage and increasing performance.
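As an aside, even in a language without this optimization you can simulate it across different functions with a trampoline. The helper below is a hypothetical sketch, not something from any standard library: each tail call is deferred as a thunk (a zero-argument lambda), and a driver loop runs them one at a time so the stack stays flat.

```python
def trampoline(fn, *args):
    # Driver loop: keep calling until the result is a value, not a thunk.
    result = fn(*args)
    while callable(result):
        result = result()
    return result

def is_even(n):
    if n == 0:
        return True
    return lambda: is_odd(n - 1)  # tail call to a *different* function

def is_odd(n):
    if n == 0:
        return False
    return lambda: is_even(n - 1)

print(trampoline(is_even, 100_000))  # True, and the stack never grows
```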
If you want to read more about this, there’s a great Wikipedia page on the subject, and there’s a really good explanation in the book *Programming in Lua*.
But how is all this handled by programming languages, you may ask. The answer is that there’s no single answer: each language has its own way of dealing with it. Let me give you some examples.
Python (just to point out that the language I chose for the example is not the best example of this) has no tail-recursion elimination. Guido and the Pythonists argue that tail-call optimization alters the stack traces (which is true) and that they don’t like recursion as a base for programming, so they try to avoid it. In CPython there’s no tail-call optimization, but they don’t forbid (they can’t!) any other Python implementation from implementing that particular optimization. There’s a really interesting post by Guido van Rossum about this.
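You can check this yourself. A quick sketch (assuming the `factorial_tail` definition from above is in scope): even though the function is tail-recursive in shape, CPython still pushes a frame per call, so a deep enough input overflows anyway.

```python
import sys

print(sys.getrecursionlimit())  # usually 1000 in CPython

# Tail-recursive in shape, but CPython pushes a frame per call anyway.
try:
    factorial_tail(5000)
except RecursionError:
    print("no tail-call elimination in CPython")
```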
Lua, as you’ve seen in the previous link, implements proper tail calls (as they call them there) and there’s nothing the programmer needs to do to make sure they are optimized. The only requirement is to write the tail calls correctly.
Scala implements tail-recursion optimization at compile time: the compiler transforms the recursive call into a loop. What’s interesting is that there’s a compile-time check too. There’s an annotation called `@tailrec` that can be used to make sure your function is going to be optimized. If the compiler is not able to optimize a function marked with `@tailrec`, it will throw an error. If the function doesn’t have the annotation, it will simply compile as a standard recursion. The annotations tour of the Scala language has some words about `@tailrec`.
Clojure is a really interesting case too. Clojure doesn’t implement tail-call optimization, but it has one (and only one) special form for non-stack-consuming looping: `recur`. This special form rebinds the recursion point’s bindings or arguments and jumps back to the recursion point. The recursion point can be a `loop` special form or a function definition, so it’s essentially an explicit call to tail-recursion optimization. The tail call must be written correctly too: `recur` is only allowed in a tail position, and the compiler checks that it’s located in the correct place. It also has some specific rules that must be taken into consideration (multiple-arity functions and so on), which are better read in the documentation.
Edited 2019-01-25: Thanks to a discussion in the fediverse about the topic, I found the moment when Emacs Lisp got its tail-call optimization, with everything explained in the author’s blog. It’s really interesting.