It is well-known that foldl
/ foldl'
can be defined via foldr
(Hutton 1999). Indeed, this is the default implementation in Data.Foldable
. However, there are two interesting things about the “foldl-as-foldr” pattern that I want to emphasize in this post:
Before we dive into the above points, let’s briefly go over some basics.
foldr
supports early terminationConsider the following example:
module Main where
import Prelude hiding (product, foldr)
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z = go
where
= z
go [] :xs) = f x (go xs)
go (x
product :: [Int] -> Int
product = foldr (\a b -> if a == 0 then 0 else a * b) 1
main :: IO ()
= print (product (2 : 3 : 0 : undefined)) main
This program will print 0
, as it doesn’t force the next cons cell, undefined
, after encountering 0
. Inlining the final call to go we have:
-> if a == 0 then 0 else a * b) 0 (go undefined) (\a b
which \beta-reduces to
if 0 == 0 then 0 else a * go undefined
Note that a saturated call to go
with a non-empty list \beta-reduces to a call to the passed in function f
. Thus, the caller of foldr
determines if the loop continues by forcing the recursive term b
in f
.
foldl'
doesn’t support early termination
module Test where
foldl' :: (b -> a -> b) -> b -> [a] -> b
= go
foldl' f where
= z
go z [] :xs) =
go z (xlet z' = f z x
in z' `seq` go z' xs
The caller of foldl'
has no such control. A saturated call to go
with a non-empty list \beta-reduces to a call to go
1. Thus, even if f
doesn’t force anything then we still traverse the spine of the list.
To demonstrate how GHC optimizes foldl'
when defined through foldr
, consider the following example:
module Test where
import Prelude hiding (foldr)
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z =
let go xs0 = case xs0 of
-> z
[] :xs -> f x (go xs)
xin go
foldl' :: (b -> a -> b) -> b -> [a] -> b
=
foldl' f z0 xs0 let g a b z =
let z' = f z a
in z' `seq` b z'
in foldr g id xs0 z0
Dumping the core we see exactly the same core that would be generated for the standard foldl'
definition:
foldl' :: forall b a. (b -> a -> b) -> b -> [a] -> b
foldl'= \ (@b_ave)
@a_avf)
(f_au5 :: b_ave -> a_avf -> b_ave)
(z0_au6 :: b_ave)
(xs0_au7 :: [a_avf]) ->
(
joinrec {Occ=LoopBreaker] :: [a_avf] -> b_ave -> b_ave
go_syk [LclId[JoinId(2)], Arity=2, Str=<S,1*U><L,U>, Unf=OtherCon []]
[xs1_agz :: [a_avf]) (eta_B0 :: b_ave)
go_syk (= case xs1_agz of {
-> eta_B0;
[] : x_agA xs_agB ->
case f_au5 eta_B0 x_agA of z'_auc { __DEFAULT ->
jump go_syk xs_agB z'_auc
}in
}; } jump go_syk xs0_au7 z0_au6
To understand how this is achieved, let’s break this down into a sequence of transformations. We use foldl
rather than foldl'
for simplicity.
The “let-float-from-application” transformation is described by the first example in §3.3 of (Peyton Jones, Partain, and Santos 1996). The analysis GHC performs to determine if a term should be \eta-expanded is described in (Breitner 2018).
foldl-as-foldr
Defining foldl
as foldr
is an application of the general technique of returning a function from foldr
(See \text{\S}5 of (Hutton 1999)). We can use this technique to define folds that blend the benefits of foldl
and foldr
. For example, we can define tail-recursive loops that may terminate early. Defining loops in this way is more than a convenience, as these functions may benefit from foldr/build
fusion (Gill, Launchbury, and Peyton Jones 1993) (Breitner 2018).
An important observation about this function-generating technique should be made regarding the b
parameter in: \textit{foldr} \;
(\lambda \; a \; \textcolor{red}{b} \; st. \; \ldots) \; \ldots
The observation is that this parameter becomes a tail recursive call in the generated loop.
To demonstrate this consider the following:
foldl
as foldr
unconditionally calls b
in the argument to foldr
, so it unconditionally tail calls itself. If it instead called b
conditionally then we have a left fold that can terminate early.
Let’s return to the foldr
product example:
module Test where
product :: [Int] -> Int
product = foldr (\a b -> if a == 0 then 0 else a * b) 1
We have the early termination that we want, but this function isn’t tail-recursive.
Dumping the core and focusing just on the worker of the worker/wrapper transformation we see:
Rec {
-- RHS size: {terms: 21, types: 11, coercions: 0, joins: 0/0}
Test.$wproduct [InlPrag=NOUSERINLINE[2], Occ=LoopBreaker]
:: [Int] -> GHC.Prim.Int#
GblId, Arity=1, Caf=NoCafRefs, Str=<S,1*U>, Unf=OtherCon []]
[Test.$wproduct
= \ (w_s1AY :: [Int]) ->
case w_s1AY of {
-> 1#;
[] : y_a1yn ys_a1yo ->
case y_a1yn of { GHC.Types.I# x_a1zf ->
case x_a1zf of wild2_Xr {
->
__DEFAULT case Test.$wproduct ys_a1yo of ww_s1B1 { __DEFAULT ->
GHC.Prim.*# wild2_Xr ww_s1B1
};0# -> 0#
}
}
}Rec } end
It is plain in the core that we recursively call $wproduct
then multiply the result of the recursive call, ww_s1B1
, by the list element wild2_Xr
.
If we want to optimize this function by making it tail-recursive while retaining the early-termination, we can utilize the function-generating foldr
approach.
module Test where
product :: [Int] -> Int
product xs0 = foldr f id xs0 1
where
f :: Int -> (Int -> Int) -> Int -> Int
= case nextNum == 0 of
f nextNum tailRecursiveCall productSoFar True -> 0
False ->
let productSoFar' = productSoFar * nextNum
in productSoFar' `seq` tailRecursiveCall productSoFar'
Dumping the core for the worker again we see the desired result: a tail-recursive left fold over the list that terminates early if it encounters 0
.
Rec {
-- RHS size: {terms: 21, types: 11, coercions: 0, joins: 0/0}
$sgo1 [Occ=LoopBreaker]
Test.product_ :: GHC.Prim.Int# -> [Int] -> Int
GblId, Arity=2, Caf=NoCafRefs, Str=<L,U><S,1*U>m, Unf=OtherCon []]
[$sgo1
Test.product_= \ (sc_s1JB :: GHC.Prim.Int#) (sc1_s1JA :: [Int]) ->
case sc1_s1JA of {
-> GHC.Types.I# sc_s1JB;
[] : y_a1Gg ys_a1Gh ->
case y_a1Gg of { GHC.Types.I# x_a1H8 ->
case x_a1H8 of wild2_Xs {
->
__DEFAULT $sgo1 (GHC.Prim.*# sc_s1JB wild2_Xs) ys_a1Gh;
Test.product_0# -> lvl_r1JX
}
}
}Rec }
end
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
lvl_r1JX :: Int
GblId, Caf=NoCafRefs, Str=m, Unf=OtherCon []]
[= GHC.Types.I# 0#
lvl_r1JX
Don’t be confused by seq
. It is a compiler builtin to control laziness and doesn’t affect the fact that go
is tail-recursive. This can be observed in the core generated by this definition of foldl'
:
foldl' :: forall b a. (b -> a -> b) -> b -> [a] -> b
foldl'= \ (@ b_atX)
@ a_atY)
(f_agd :: b_atX -> a_atY -> b_atX)
(eta_B2 :: b_atX)
(eta1_B1 :: [a_atY]) ->
(
joinrec {Occ=LoopBreaker] :: b_atX -> [a_atY] -> b_atX
go_svc [LclId[JoinId(2)], Arity=2, Str=<L,U><S,1*U>, Unf=OtherCon []]
[z_agf :: b_atX) (ds_duS :: [a_atY])
go_svc (= case ds_duS of {
-> z_agf;
[] : x_agh xs_agi ->
case f_agd z_agf x_agh of z'_atC { __DEFAULT ->
jump go_svc z'_atC xs_agi
}in
}; } jump go_svc eta_B2 eta1_B1