Fold under pressure

Sometimes you want to traverse a list, perform an operation on each element, then output a result based on the elements. Isn’t that what the function map does? Well, yes and no. The function map outputs a list. You might not want a list, but a value based on the elements of the input list. In general, you want to reduce the elements of a list to a value, i.e. fold a list into a value. The next section clarifies what we mean.

How to sum

The library function sum takes a list of numbers and adds all those numbers together to result in a sum. As an exercise to help us understand how the function sum works, we implement our own version as follows:

sum.hs

  
-- | Summing a list of numbers.
add :: Num a => [a] -> a
add []     = 0
add (x:xs) = x + add xs

Let’s generalize the above code into a pattern. Suppose we have a list $\ell$ of numbers. Let’s get the base case out of the way. If $\ell$ is empty, the sum of $\ell$ is zero. Now suppose $\ell$ is non-empty and let’s get on to the recursive case. Initially, the cumulative sum is zero. By definition of the recursive case, we can also argue that the initial cumulative sum is the head of $\ell$. As we visit the $i$-th element $e_i$ of $\ell$, the cumulative sum is the value of $e_i$ plus all the elements $e_0,\, e_1,\, \dots,\, e_{i-1}$. At any given step in the recursive process, we have three numbers:

Initial value. The value we start with prior to commencing the summation. Before summing, our sum is 0. Thus 0 is the initial value.
Current value. The value of $e_i$, the current element of $\ell$ we are considering. We step through each list element in order.
Accumulator. This is the cumulative sum so far. Before performing the summation, the value of the accumulator is the initial value. If we are at element $e_i$ of $\ell$, the accumulator at $e_i$ is the sum of all elements from $e_0$ up to and including $e_i$.

By way of example, consider summing the list [1, 2, 3, 4, 5]. The initial sum is zero, which is also the starting value of the accumulator. At element $e_0 = 1$, our accumulator is $e_0 = 1$ as well. At $e_1 = 2$, the accumulator is $e_0 + e_1 = 3$. At $e_2$, the accumulator is $e_0 + e_1 + e_2 = 6$. At $e_3$, the accumulator is $e_0 + e_1 + e_2 + e_3 = 10$. At $e_4$, the accumulator is $e_0 + e_1 + e_2 + e_3 + e_4 = 15$. We have exhausted all elements of the initial list, i.e. the remaining list is empty. By definition, the sum of an empty list is zero. Add zero to the accumulator to result in the accumulator itself. The result of summing the initial list is the current value of the accumulator, namely 15. We have reduced (or folded) the initial list [1, 2, 3, 4, 5] to a value. The function used in the reduction is arithmetic addition.

Fold left

The pattern described in the section How to sum is an example of a fold operation. The Haskell function foldl takes three parameters:¹

A binary operator, i.e. a function that accepts two parameters. In terms of the summation problem above, the binary operator is the addition operator +.
An initial value. For example, in the summation problem above the initial value is zero.
A list. We want to fold (or reduce) the elements of the list to a value.

We can use foldl to rewrite add as follows:

sum.hs

  
-- | Summing a list of numbers.  Use the fold operation.
addf :: Num a => [a] -> a
addf xs = foldl (+) 0 xs

The character l (lowercase L) in the function name foldl means “left”. This means that the fold operation goes from the left of the list to the right. For example, summing the list [1 .. 5] can be written as:

  
ghci> foldl (+) 0 [1 .. 5]
15

In the above GHCi session, the binary operator + hides the role of the accumulator. We do not always have a pre-defined library function to pass to foldl. Sometimes we must define our own binary function. Let’s rewrite the above code to clearly show how the accumulator is keeping track of the cumulative sum. Examine the lambda expression below:

  
ghci> foldl (\acc x -> acc + x) 0 [1 .. 5]
15

Prior to starting the summation, the value of the accumulator acc is 0. Now start the summation at the first element of the list [1 .. 5]. Then 1 would be passed to the lambda expression and assigned to the variable x, resulting in the cumulative sum $0 + 1 = 1$. The new value of the accumulator is now 1. Repeat the process for each of the remaining elements of the list and we obtain the sum of the list. Using the associative law for addition, the summation encapsulated in the above Haskell code can be written as:

  
((((0 + 1) + 2) + 3) + 4) + 5

Notice the position of the initial value 0 and how it corresponds to the position of the accumulator in the above lambda expression. Since we are folding from the left, the accumulator should be on the left-hand side in both infix and prefix notations. To evaluate the expression, we start from the innermost pair of parentheses and work our way to the outermost pair of parentheses.

Fold right

Rumack: Can you fly this plane and land it?
Ted Striker: Surely you can’t be serious.
Rumack: I am serious… and don’t call me Shirley.
— Airplane!, 1980

If there is a fold left, then surely there must be a fold right, yes? Of course there is a fold right. The Haskell function foldr allows you to fold from the right of a list to the left of the list. The function accepts the same three parameters as foldl. Here is a comparison:

  
ghci> foldl (+) 0 [1 .. 5]
15
ghci> foldl (\acc x -> acc + x) 0 [1 .. 5]
15
ghci> foldr (+) 0 [1 .. 5]
15
ghci> foldr (\x acc -> x + acc) 0 [1 .. 5]
15

The above summation with foldr can be written as:

  
1 + (2 + (3 + (4 + (5 + 0))))

Again, notice the position of the initial value 0 and the position of the accumulator in the lambda expression for foldr. Since we are folding from the right, the accumulator should be on the right-hand side in both infix and prefix notations. We start from the innermost pair of parentheses and work our way out.

Why do foldl and foldr result in the same answer in the above GHCi session? The reason is that addition is associative. It does not matter in which order we perform the addition. We always get the same result no matter the order in which we add.

Here is a simple case where foldl and foldr give different results:

  
ghci> foldl (-) 0 [1 .. 5]
-15
ghci> foldl (\acc x -> acc - x) 0 [1 .. 5]
-15
ghci> foldr (-) 0 [1 .. 5]
3
ghci> foldr (\x acc -> x - acc) 0 [1 .. 5]
3

Notice the position of the accumulator in the lambda expression for foldl and foldr. The subtraction problem denoted by the code foldl (-) 0 [1 .. 5] can be written as:

  
((((0 - 1) - 2) - 3) - 4) - 5

On the other hand, the code foldr (-) 0 [1 .. 5] can be written as:

  
1 - (2 - (3 - (4 - (5 - 0))))

In the case of subtraction, foldl and foldr give different results because subtraction is not associative. The order of operation matters in subtraction.

Which fold?

Suppose we have a binary operator $f$ that is associative, i.e. the order of operation does not matter. Using either foldl or foldr with $f$ would result in the same answer. Which fold operation should we choose?

Left or right?

A simple answer is: You should usually choose foldr if your binary operator is associative. The reason is that foldr is more time efficient than foldl. As an example, let’s sum all integers between 1 and 1,000,000. Here is foldl in action:

  
ghci> :set +s
ghci> foldl (+) 0 [1 .. 1000000]
500000500000
(0.34 secs, 161,302,760 bytes)
ghci> foldl (+) 0 [1 .. 1000000]
500000500000
(0.25 secs, 161,299,688 bytes)
ghci> foldl (+) 0 [1 .. 1000000]
500000500000
(0.23 secs, 161,299,688 bytes)
ghci> foldl (+) 0 [1 .. 1000000]
500000500000
(0.23 secs, 161,299,688 bytes)
ghci> foldl (+) 0 [1 .. 1000000]
500000500000
(0.25 secs, 161,299,688 bytes)
ghci> foldl (+) 0 [1 .. 1000000]
500000500000
(0.25 secs, 161,299,688 bytes)
ghci> foldl (+) 0 [1 .. 1000000]
500000500000
(0.25 secs, 161,299,688 bytes)
ghci> foldl (+) 0 [1 .. 1000000]
500000500000
(0.23 secs, 161,299,688 bytes)
ghci> foldl (+) 0 [1 .. 1000000]
500000500000
(0.23 secs, 161,299,688 bytes)
ghci> foldl (+) 0 [1 .. 1000000]
500000500000
(0.23 secs, 161,299,688 bytes)

The GHCi command :set +s would print some basic statistics about a function call. For the purposes of this example, we are only interested in the amount of time elapsed from the start of a function call to its termination. In the above GHCi session, we call foldl ten times to sum all integers between 1 and 1,000,000. Each call has an average elapsed time of approximately 0.249 seconds.

Now let’s see foldr in action:

  
ghci> :set +s
ghci> foldr (+) 0 [1 .. 1000000]
500000500000
(0.22 secs, 161,593,088 bytes)
ghci> foldr (+) 0 [1 .. 1000000]
500000500000
(0.17 secs, 161,590,016 bytes)
ghci> foldr (+) 0 [1 .. 1000000]
500000500000
(0.17 secs, 161,590,016 bytes)
ghci> foldr (+) 0 [1 .. 1000000]
500000500000
(0.18 secs, 161,590,016 bytes)
ghci> foldr (+) 0 [1 .. 1000000]
500000500000
(0.17 secs, 161,590,016 bytes)
ghci> foldr (+) 0 [1 .. 1000000]
500000500000
(0.16 secs, 161,590,016 bytes)
ghci> foldr (+) 0 [1 .. 1000000]
500000500000
(0.18 secs, 161,590,016 bytes)
ghci> foldr (+) 0 [1 .. 1000000]
500000500000
(0.16 secs, 161,590,016 bytes)
ghci> foldr (+) 0 [1 .. 1000000]
500000500000
(0.18 secs, 161,590,016 bytes)
ghci> foldr (+) 0 [1 .. 1000000]
500000500000
(0.15 secs, 161,590,016 bytes)
ghci>

The average elapsed time is approximately 0.174 seconds. As we can see from the above two sets of GHCi sessions, foldr is more time efficient than foldl in the case of addition.

This left or that left?

We know that there is a fold from the left (foldl) and a fold from the right (foldr). But did you know there is another version of foldl? It is called foldl', notice the single quotation mark '. The function foldl' is defined in the package GHC.List and is meant to be an efficient implementation of foldl.

Why another implementation of fold? The reason is due to associativity again. Suppose we have a binary operator $f$ that is not associative, i.e. the order of operation matters. Further assume that foldl (the left fold) produces the result we want, whereas foldr (the right fold) would result in an erroneous answer. Then foldl' should usually be preferred to foldl because the former is more time efficient than the latter.

Consider the example of subtraction. Remember, subtraction is not associative, unlike addition. The GHCi session below uses foldl ten times, yielding an average elapsed time of approximately 0.221 seconds.

  
ghci> :set +s
ghci> foldl (-) 0 [1 .. 1000000]
-500000500000
(0.26 secs, 161,303,552 bytes)
ghci> foldl (-) 0 [1 .. 1000000]
-500000500000
(0.21 secs, 161,300,480 bytes)
ghci> foldl (-) 0 [1 .. 1000000]
-500000500000
(0.23 secs, 161,300,480 bytes)
ghci> foldl (-) 0 [1 .. 1000000]
-500000500000
(0.19 secs, 161,300,480 bytes)
ghci> foldl (-) 0 [1 .. 1000000]
-500000500000
(0.23 secs, 161,300,480 bytes)
ghci> foldl (-) 0 [1 .. 1000000]
-500000500000
(0.21 secs, 161,300,480 bytes)
ghci> foldl (-) 0 [1 .. 1000000]
-500000500000
(0.24 secs, 161,300,480 bytes)
ghci> foldl (-) 0 [1 .. 1000000]
-500000500000
(0.21 secs, 161,300,480 bytes)
ghci> foldl (-) 0 [1 .. 1000000]
-500000500000
(0.23 secs, 161,300,480 bytes)
ghci> foldl (-) 0 [1 .. 1000000]
-500000500000
(0.20 secs, 161,300,480 bytes)

Here’s the GHCi session for foldl'. Note that we must import the package GHC.List because foldl' is defined in that package. The average elapsed time of ten calls is about 0.062 seconds.

  
ghci> import GHC.List
ghci> :set +s
ghci> foldl' (-) 0 [1 .. 1000000]
-500000500000
(0.05 secs, 88,074,640 bytes)
ghci> foldl' (-) 0 [1 .. 1000000]
-500000500000
(0.08 secs, 88,073,720 bytes)
ghci> foldl' (-) 0 [1 .. 1000000]
-500000500000
(0.03 secs, 88,073,728 bytes)
ghci> foldl' (-) 0 [1 .. 1000000]
-500000500000
(0.02 secs, 88,073,728 bytes)
ghci> foldl' (-) 0 [1 .. 1000000]
-500000500000
(0.06 secs, 88,073,728 bytes)
ghci> foldl' (-) 0 [1 .. 1000000]
-500000500000
(0.07 secs, 88,073,736 bytes)
ghci> foldl' (-) 0 [1 .. 1000000]
-500000500000
(0.08 secs, 88,073,736 bytes)
ghci> foldl' (-) 0 [1 .. 1000000]
-500000500000
(0.08 secs, 88,073,736 bytes)
ghci> foldl' (-) 0 [1 .. 1000000]
-500000500000
(0.08 secs, 88,073,736 bytes)
ghci> foldl' (-) 0 [1 .. 1000000]
-500000500000
(0.07 secs, 88,073,736 bytes)

When you must use a left fold, foldl' would usually be a better choice than foldl.

Universal property of fold

Zapp: Wait. What are you people? Idiots? I’m still going mano a mano with this envelope. And the winner is: Miss Vega 4. There it is, Miss Universe. There it is, looking weird.
— Futurama, season 2, episode 6, 2000

The universal property of fold refers to the fact that many recursive definitions of functions can be equivalently expressed as a fold operation.² Suppose we have a binary operator $f$ and we define a function $g$ as follows:

  
g []     = v
g (x:xs) = f x (g xs)

where v is the base case for $g$. The universal property of fold says that the above recursive definition is equivalent to the following fold operation:

  
g = foldr f v

What does “universal property” mean? We won’t delve too deep into the notion. Suffice it to say that, in mathematics, the idea of universal property means structural equivalence. Mathematics gives us a number of abstract rules by which we can determine whether two constructions are fundamentally the same. Suppose we can construct an object $A$ via one technique and we can construct an object $B$ via another technique. If we can show that both $A$ and $B$ satisfy the same universal property, i.e. the same fundamental set of rules, then we may conclude that $A$ and $B$ are equivalent. The implication is immediate. Suppose the construction of $A$ happens to be long or takes a while to complete, but the construction of $B$ is short and efficient. If $A$ and $B$ are structurally equivalent, then we can replace $A$ by $B$. The fold operation happens to be a short and sweet way to define many recursive functions.

Examples

Let’s see how the universal property of fold can help us write an expression to calculate the length of a list. Recall from the section Length of list that we have the following recursive definition:

length.hs

  
-- | The length of a list.
size :: [a] -> Int
size []     = 0
size (x:xs) = 1 + size xs

To convert the above recursive definition to a fold operation, we require two ingredients:

The initial value. For the case base, the length of an empty list is zero. This is our initial value.
The binary operator. The function we require is addition. For clarity, we should write our binary operator in such a way that shows the role of the accumulator. As we traverse through each element of a list, we increment our accumulator by 1. Using a fold right operation, our lambda expression would be (\_ acc -> 1 + acc). The wildcard character _ means that we ignore any list element passed to the lambda expression. We replace each list element by the value 1.

Combine the above two ingredients to obtain the following expression:

length.hs

  
-- | The length of a list, using a fold operation.
sizef = foldr (\_ acc -> 1 + acc) 0

In the section No loop for you, we presented a homemade implementation of the library function map. Here’s the code for reference:

map.hs

  
-- | An implementation of the function "map".
imap :: (a -> b) -> [a] -> [b]
imap _ []     = []
imap f (x:xs) = [f x] ++ imap f xs

The initial value is the empty list []. The binary operator is the function f x acc, which takes an element x of the input list, calculates the result f x, and prepends the result to the accumulator acc. Writing a successive sequence of prepending as

  
ghci> 1 : 2 : 3 : []
[1,2,3]
ghci> 1 : (2 : (3 : []))
[1,2,3]

it is clear that we can use a fold right. Here’s an implementation of map by means of foldr:

map.hs

  
-- | An implementation of the function "map".  We use a fold operation.
imapf :: (a -> b) -> [a] -> [b]
imapf f xs = foldr (\x acc -> f x : acc) [] xs

Exercises

Exercise 1. Why do foldl and foldr yield different results in the GHCi session below?

  
ghci> foldl (^) 2 [1, 2, 3]
64
ghci> foldr (^) 2 [1, 2, 3]
1

What can be concluded about the operator ^?

Exercise 2. In the GHCi session below, why does foldr result in the list [1,2,3] but foldl results in an error? What can be concluded about the operator :?

  
ghci> foldr (:) [] [1, 2, 3]
[1,2,3]
ghci> foldl (:) [] [1, 2, 3]

<interactive>:2:7: error:
    * Couldn't match type 'a' with '[a]'
      Expected: [a] -> [[a]] -> [a]
        Actual: [a] -> [[a]] -> [[a]]
      'a' is a rigid type variable bound by
        the inferred type of it :: [a]
        at <interactive>:2:1-22
    * In the first argument of 'foldl', namely '(:)'
      In the expression: foldl (:) [] [1, 2, 3]
      In an equation for 'it': it = foldl (:) [] [1, 2, 3]
    * Relevant bindings include it :: [a] (bound at <interactive>:2:1)

Exercise 3. Is the division operator / associative? Why or why not? Provide an example using foldl and foldr.

Exercise 4. Repeat the factorial exercise, but use a fold operation.

Exercise 5. Read more about foldl, foldl', and foldr.

Exercise 6. The arithmetic mean (or average) $a_n$ of a sequence $x_1,\, x_2,\, \dots,\, x_n$ of $n$ numbers is defined as:

\[\begin{equation} \label{eqn:arithmeticMean} a_n = \frac{1}{n} \sum_{i=1}^n x_i. \end{equation}\]

Use a fold operation to implement expression (\ref{eqn:arithmeticMean}). Assume that each $x_i$ is an integer.

Exercise 7. Use a fold operation to implement the function last.

Exercise 8. Write a function join that takes two parameters: (1) a delimiter string delim; and (2) a list $\ell$ of numbers. The function join outputs the elements of $\ell$ as a string, where the elements are separated by the given delimiter delim. For example, the code join "," [1, 2, 3] should result in the string "1,2,3". Implement join with and without a fold operation.

Exercise 9. Use a fold operation to implement your own version of the function reverse.

Exercise 10. Repeat the exercise on dot product, but use a fold operation. Assume your lists are finite, but not necessarily of two elements.

Some programming languages use “reduce” to describe the fold operation. Examples include JavaScript, Python, and Ruby. ↩
See the following paper for more details: Graham Hutton. A tutorial on the universality and expressiveness of fold. Journal of Functional Programming, volume 9, issue 4, 1999, pp.355–372, doi: 10.1017/S0956796899003500 ↩