Programming clojure – Recursion and Lazy-seq

5.1 Functional Programming Concepts

The Six Rules 
Although the benefits of FP are compelling, FP is a wholesale change from the imperative programming style that dominates much of the programming world today. Plus, Clojure takes a unique approach to FP that strikes a balance between academic purity and the reality of running well on the JVM. That means there is a lot to learn all at once. 
But fear not. If you are new to FP, the following “Six Rules of Clojure FP” will help you on your initial steps toward FP mastery, Clojure style: 
1. Avoid direct recursion, 因为JVM不会优化, 直接导致爆stack 
    The JVM cannot optimize recursive calls, and Clojure programs that recurse will blow their stack. 
2. Use recur when you are producing scalar values or small, fixed sequences 
    Clojure will optimize calls that use an explicit recur. 
3. When producing large or variable-sized sequences, always be lazy 
    (Do not recur.) Then, your callers can consume just the part of the sequence they actually need.

4. Be careful not to realize more of a lazy sequence than you need 
5. Know the sequence library 
    You can often write code without using recur or the lazy APIs at all. 
6. Subdivide 
    Divide even simple-seeming problems into smaller pieces, and you will often find solutions in the sequence library that lead to more general, reusable code.

其实只讲两点, 
不要直接使用递归, 对于较小的有限seq使用recur优化, 对较大的或无限seq使用lazy, 并且不要随便realize你不需要的lazy seq部分.

不要闭门造车, 多使用sequence库.

 

5.2 How to Resolve Recursion

FP大量的使用递归, 当然直接使用递归会有爆stack问题, 所以怎么实现递归就成为一个问题, 对于所有FP都需要解决... 
上面的1,2,3,4 rule清晰描述了方法, 
要不使用tail recursion, 参考Pratical Cljr – loop/recur

要不使用lazy-seq, 通常你不知道该用哪个的时候, 用lazy总是没错的The Joy of Clojure – Laziness(6.3)

lazy-seq, 不需要recursion一定在tail, 但不是所有的case都可以表示为seq的形式

两种方法的思路其实是一样的, 避免使用stack, 这个取决于自底向上的设计

Functional programs make great use of recursive definitions.

recursive definition consists of two parts: 
• A basis, which explicitly enumerates some members of the sequence 
• An induction, which provides rules for combining members of the sequence to produce additional members 

Our challenge in this section is converting a recursive definition into working code. 
There are many ways you might do this: 
• A simple recursion, using a function that calls itself in some way to implement the induction step. 
• A tail recursion, using a function only calling itself at the tail end of its execution. Tail recursion enables an important optimization. 
• A lazy sequence that eliminates actual recursion and calculates a value later, when it is needed. 

Choosing the right approach is important. Implementing a recursive definition poorly can lead to code that performs terribly, consumes all available stack and fails, consumes all available heap and fails, or does all of these. In Clojure, being lazy is often the right approach.

 

Fibonacci numbers, 以此为例

Named for the Italian mathematician Leonardo (Fibonacci) of Pisa (c.1170 – c.1250), the Fibonacci numbers were actually known to Indian mathematicians as far back as 200 BC. The Fibonacci numbers have many interesting properties, and they crop up again and again in algorithms, data structures, and even biology

The Fibonaccis have a very simple recursive definition: 
• Basis: F0, the zeroth Fibonacci number, is zero. F1, the first Fibonacci number, is one. 
• Induction: For n > 1, Fn equals Fn−1+Fn−2.

(0 1 1 2 3 5 8 13 21 34)

直接递归

自顶向下的思路, f(n)=f(n-1)+f(n-2), 完全需要stack的支持

(defn stack-consuming-fibo [n]
  (cond
   (= n 0) 0      ;basis
   (= n 1) 1      ;basis
   :else (+ (stack-consuming-fibo (- n 1))
	    (stack-consuming-fibo (- n 2)))))   ;induction 

(stack-consuming-fibo 9)
34

(stack-consuming-fibo 1000000) ;会爆stack
java.lang.StackOverflowError

使用recur, tail-call optimization

直接使用上面的逻辑是无法进行tail-call optimization, 必须修改成自底向上的思路, 你如果直接看下面的递归有些困难, ok, 我换种写法

current = 0
next = 1
for(i =n; i>0; i--)
{
current = next
    next = current + next
}
所谓的尾部优化, 其实就是, 你如果可以写出for循环, 那么编译器就可以将stack调用优化成循环...
所以下面的递归调用, 其实是伪装成递归的循环(因为在FP中你无法直接写循环, 所以必须以递归的形式出现)

(defn recur-fibo [n]
  (letfn [(fib  ; letfn is like let but is dedicated to letting local functions
          [current next n] ;fib 3个参数
           (if (zero? n)
            current
            (recur next (+ current next) (dec n))))] ;tail-call optimization, 用recur替换函数名fib
  (fib 0 1 n))) 

(recur-fibo 1000000)
195 ... 208,982 other digits ... 875 ;不会导致爆stack, 因为recur优化, 没有使用stack

为什么要显示的表明recur, 进行tail-call optimization? 
The problem here is the JVM. While functional languages such as Haskell can perform TCO, the JVM was not designed for functional languages. 
No language that runs directly on the JVM can perform automatic TCO. 
The absence of TCO is unfortunate but not a showstopper for functional programs. Clojure provides several pragmatic workarounds: 
explicit self-recursion with recur, lazy sequences, and explicit mutual recursion with trampoline.

 

Lazy, 解决递归问题更好的方案

上面的Fibonacci数问题, 换个思路, 其实求n的Fibonacci 值, 就是在Fibonacci Seq上面取第n个item的值

所以我们可以定义lazy fib-seq, 这边思路和上面是不一样的, 逻辑是怎样生成seq, 更关键的是seq生成的思路必须是自底向上的

这个在FP里面至关重要, 因为只有自底向上的思路是不依赖stack的, 而如果自顶向下的思路一定需要stack的支持

对于lazy-seq为什么可以解决blow stack问题, 参考The Joy of Clojure – Laziness(6.3)

(defn lazy-seq-fibo
  ([]
     (concat [0 1] (lazy-seq-fibo 0 1))) ;concat, 两个seq串联
  ([a b]
     (let [n (+ a b)]    ;算出下个值
      (lazy-seq         ;lazy body
	(cons n (lazy-seq-fibo b n)))))) ;cons, item和seq串联

 

(take 10 (lazy-seq-fibo))
(0 1 1 2 3 5 8 13 21 34)


(rem (nth (lazy-seq-fibo) 1000000) 1000) ;rem求余数, 实际效果取最后3位数字
875

上面的做法已经满足1,2,3,4 rule

By rule 5, you can reuse existing sequence library functions that return lazy sequences.

直接使用Sequence库, 默认返回lazy seq 


(defn fibo []
  (map first (iterate (fn [[a b]] [b (+ a b)]) [0 1])))

Lazy definitions consume some stack and heap. 
But they do not consume resources proportional to the size of an entire (possibly infinite!) sequence. Instead, you choose how many resources to consume when you traverse the sequence.

 

Coming to Realization, 变现

Lazy sequences consume significant resources only as they are realized, that is, as a portion of the sequence is actually instantiated in memory. 
Clojure works hard to be lazy and avoid realizing sequences until it is absolutely necessary.


(def lots-o-fibs (take 1000000000 (fibo))) ;定义很大的fibo, 但并没有执行

(nth lots-o-fibs 100) ;只会计算100的fibo, 后面的不会继续算, 这就是lazy的优势

 

Most sequence functions return lazy sequences. 
If you are not sure whether a function returns a lazy sequence, the function’s documentation string typically will tell you the answer:

(doc take)
-------------------------
clojure.core/take
([n coll])
Returns a lazy seq of the first n items in coll, or all items if there are fewer than n.

 

The REPL, however, is not lazy. 
The printer used by the REPL will, by default, print the entirety of a collection. 
That is why we stuffed the first billion Fibonaccis into lots-o-fibs, instead of evaluating them at the REPL. Don’t enter the following at the REPL:

; don't do this
(take 1000000000 (fibo))

为什么REPL不是Lazy? 
其实不是说这个时候不lazy, 而是不应该lazy. Lazy是当用不到的时候不去eval, 但是在REPL中执行(fibo), REPL会试图print每个值, 这样相当于force所有的值eval.  
所以解决这个问题的方法, 很简单, 只需要限制REPL的默认print次数


(set! *print-length* 10)
(fibo)
(0 1 1 2 3 5 8 13 21 34 ...) ;只会eval10次

 

Losing Your Head


; holds the head (avoid!)
(def head-fibo (lazy-cat [0 1] (map + head-fibo (rest head-fibo))))

(nth head-fibo 1000000)
java.lang.OutOfMemoryError: Java heap space ;这儿不是stack blew, 是memory blew

原因就是本来lazy seq不停往后eval item的时候, 前面的值是会被回收的, 而这儿hold住head-fibo导致, 自动回收失效, 以至于把内存blew掉...

Unless you want to cache a sequence as you traverse it, you must be careful not to keep a reference to the head of the sequence.

With lazy sequences, losing your head is often a good idea.

5.4 Recursion Revisited

http://www.blogjava.net/killme2008/archive/2010/07/14/326129.html, 参考这个blog, 说的比较清楚

Shortcutting Recursion with Memoization, 记事本

clojure.core/memoize 
([f]) 
  Returns a memoized version of a referentially transparent function. The 
  memoized version of the function keeps a cache of the mapping from arguments 
  to results and, when calls with the same arguments are repeated often, has 
  higher performance at the expense of higher memory use.

非常简单的, 其实就是返回memoized版本的function, 会在memory里面buffer之前的结果, 当再次调用的时候可以快速返回. 
下面的例子很好的反应出它的用途, 如果没有memoize, 一定会blew stack, 因为这是典型的自上而下的思路, 但是加上memoize就不会出现这种情况, 为什么? 
因为Seq的eval顺序是从小到大的, 并且所有的结果都会被buffer, 所以后面的递归其实不用真正做, 因为结果已经buffer了...

(defn fac [n]
          (if (<= n 1)
              1
              (* n (fac (dec n)))))

(def fac (memoize fac))
(def fac-seq (map fac (iterate inc 0)))
(nth fac-seq 10000)

相互递归和trampoline

http://www.blogjava.net/killme2008/archive/2010/08/22/329576.html

相互递归问题, 比较典型的是求奇偶问题, 这种问题怎么解决blew stack问题?

(declare my-odd? my-even?)
(defn my-odd? [n]
      (if (= n 0)
          false
         (my-even? (dec n))))
(defn my-even? [n]
      (if (= n 0)
          true
         (my-odd? (dec n))))

 

思路, 两个函数无法recur, 所以封装成一个函数, 就可以使用recur

(defn parity [n]
      (loop [n n par 0]
            (if (= n 0)
                par
                (recur (dec n) (- 1 par)))))
user=> (parity 3)
1
user=> (parity 100)
0

但终究这个办法不是好的解决方案, clojure提供trampoline来解决这种问题

Clojure’s trampoline function invokes one of your mutually recursive functions:

(trampoline f & partial-args)
(defn trampoline
  ([f]
     (let [ret (f)]
       (if (fn? ret)
          (recur ret)
          ret)))

trampoline接收一个函数做参数并调用, 
如果结果为函数,那么就用recur做尾递归优化, 
如果不是,则返回结果,主要就是为了解决相互递归调用堆栈溢出的问题


(trampoline + 1 2) ;函数返回的不是函数是, 和一般函数没有不同, 直接返回结果
3

使用trampoline, 只需要做个小改动, 就是把原来的递归调用的地方, 改成返回匿名函数 

(declare my-odd? my-even?)
(defn my-odd? [n]
      (if (= n 0)
          false
         #(my-even? (dec n))))
(defn my-even? [n]
      (if (= n 0)
          true
         #(my-odd? (dec n))))

user=> (trampoline my-odd? 10000000) ;所以这样使用就不会blew stack, 因为trampoline会使用recur, 象跳床一样, 如果是函数就反复弹,至到得到结果
false
user=> (trampoline my-even? 10000) 
true

本文章摘自博客园,原文发布日期:2013-02-07

时间: 2024-08-14 20:36:44

Programming clojure – Recursion and Lazy-seq的相关文章

Programming Clojure - Unifying Data with Sequences

In Clojure, all these data structures can be accessed through a single abstraction: the sequence (or seq).  A seq (pronounced "seek") is a logical list, the seq is an abstraction that can be used everywhere.   Collections that can be viewed as s

Programming clojure – Multimethods

Multimethods, 其实就是FP基础里面说的, Pattern Matching, 说白了, 就是根据不同的参数定义不同的逻辑.  我首先想到的是函数重载, http://www.cnblogs.com/skynet/archive/2010/09/05/1818636.html 参数个数重载, 对于这种clojure函数天然支持, 如下可以定义多组参数列表 (defmacro and ([] true) ([x] x) ([x & rest] `(let [and# ~x] (if a

Programming clojure – Concurrency

Clojure的并发, 这兄弟写的比较系统, http://www.blogjava.net/killme2008/archive/2010/07/archive/2010/07/14/326027.html   Concurrency is a fact of life and, increasingly, a fact of software. 为什么需要并发? • Expensive computations may need to execute in parallel on multi

Clojure - 基本语法

Installing Clojure Clojure is an open-source project hosted at github.com. git clone https://github.com/clojure/clojure.git This will download the code from the master branch into the clojure directory in your workspace. Clojure is a Java project, an

The Joy of Clojure – Laziness(6.3)

Clojure is partially a lazy language. Clojure is lazy in the way it handles its sequence types. Laziness是FP重要的特征, pure FP应该是lazy language, 比如Haskell. 当然完全的lazy带来问题是, 无法保证程序的执行顺序.  而Clojure是在实用性和pure FP之间做妥协的, 所以他只是部分的支持lazy, 主要就表现在处理sequence的时候, 在其他大

Clojure世界:Http Client

    使用http client提交表单或者下载网页也是非常常见的任务,比如使用Java的时候可以用标准库的HttpURLConnection,也可以选择Apache Http Client.在clojure里也有这样的类库,这里我将介绍三个各有特色的http client实现.     首先,我最先推荐使用clj-http这个类库,它是Apache HttpClient的clojure wrapper,是一个提供同步API的简单易用的Http Client. 名称: clj-http 主页:

Clojure常用模块

Base   ->, (-> x form & more) http://clojuredocs.org/clojure_core/clojure.core/-%3E 线性化嵌套, 使其更具有可读性,  Inserts x as the second item in the first form 从下面的例子可以看出, 就是把第一个参数(x)作为最初的输入, 调用第二个参数(代表的fn), 然后拿返回值调用后续函数 和..用处差不多, 但..只能用于java调用 ;; Arguably

RDBMS vs. NoSQL &amp; Clojure概述

RDBMS vs. NoSQL 合作还是竞争 数据库要解决的主要问题 不管是RDBMS还是NoSQL,在大的方面他们都属于数据库这个范畴,这个范畴之内所要面临的一些共同问题有哪些呢.下面的图是一个大致的归纳. 从图中可以看出,一个数据库系统主要解决以下几个问题: 数据的存储,即要存入哪些数据到系统中,当然在data definition这一块,有schema和no schema两种,说白了就是数据格式和数据关系的定义问题 完成了data definition,那么接下来自然要发生的事情就是将数据

Pratical Cljr – loop/recur

Programming Clojure这块写的过于简单, Pratical Clojure写的还不错, 这本书在某些章节写的深度不错, 讨论why, 而不是仅仅how, 故摘录 首先, Clojure是不提供直接的looping语法的!!! 这个理解很重要 因为looping是imperative的概念, 对于FP, 需要改变思维的方式, 用递归来解决同样的问题, 所以下面也说对于从imperative 转来的程序员, 习惯从递归(recursive)的角度思考问题, 是很大的挑战. It wi