Using MySQL as an example, the feature shown in the figure above is typically implemented with:
select * from msgs where thread_id = ? limit ?, ?  -- bind offset = page * count, then count
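A minimal runnable sketch of this offset query, using Python's built-in sqlite3 module in place of MySQL (the msgs column layout and the fetch_page name are assumptions for illustration). MySQL's LIMIT accepts only constants or bound parameters, not an expression like page * count, so the offset is computed in application code. An ORDER BY is added because without it the row order, and therefore the contents of each page, is not guaranteed.

```python
import sqlite3

def fetch_page(conn: sqlite3.Connection, thread_id: int, page: int, count: int) -> list[tuple]:
    """Offset-based paging (hypothetical helper): page is 0-based, count is the page size."""
    offset = page * count  # computed here, then bound as a parameter
    return conn.execute(
        "SELECT id, body FROM msgs WHERE thread_id = ? ORDER BY id LIMIT ? OFFSET ?",
        (thread_id, count, offset),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE msgs (id INTEGER PRIMARY KEY, thread_id INTEGER, body TEXT)")
# Odd ids belong to thread 1, even ids to thread 2.
conn.executemany(
    "INSERT INTO msgs (id, thread_id, body) VALUES (?, ?, ?)",
    [(i, 1 if i % 2 else 2, f"msg {i}") for i in range(1, 21)],
)

print(fetch_page(conn, thread_id=1, page=1, count=3))
# [(7, 'msg 7'), (9, 'msg 9'), (11, 'msg 11')]
```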
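However, while reading the Twitter API documentation we noticed that many endpoints page with a cursor instead of the more intuitive page/count pair. The followers ids endpoint is one example: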
URL: http://twitter.com/followers/ids.format
Returns an array of numeric IDs for every user following the specified user.
Parameters:
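The description above shows that the http://twitter.com/followers/ids.xml call takes a cursor parameter for paging, rather than the traditional url?page=n&count=n form. What does this buy us? Does each cursor keep a snapshot of the data set as it was at that moment, so that the results contain no duplicates even though the underlying result set keeps changing?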
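To make the duplicate problem in this question concrete, here is a toy model in plain Python, where a list of ids stands in for a newest-first feed. The cursor here is simply "the oldest id already seen", one common way to build a cursor that is assumed for illustration and not necessarily how Twitter does it; offset_page and cursor_page are hypothetical names. Note that this cursor is not a snapshot: it only remembers a position by key, which is enough to keep items that arrive at the head from shifting later pages.

```python
def offset_page(feed, page, count):
    """Page by position: skip page * count items (feed is newest first)."""
    return feed[page * count:(page + 1) * count]

def cursor_page(feed, cursor, count):
    """Page by key: up to `count` ids older than `cursor` (None = start at the newest)."""
    older = [i for i in feed if cursor is None or i < cursor]
    return older[:count]

feed = list(range(10, 0, -1))          # ids 10..1, newest first
page0 = offset_page(feed, 0, 3)        # [10, 9, 8]
cur0 = cursor_page(feed, None, 3)      # [10, 9, 8]

feed = [12, 11] + feed                 # two new items arrive before the next request

page1 = offset_page(feed, 1, 3)        # [9, 8, 7]: 9 and 8 are shown twice
cur1 = cursor_page(feed, cur0[-1], 3)  # [7, 6, 5]: continues where page 0 stopped
print(page1, cur1)
```

The offset version repeats items because a page is defined by position, and two new rows pushed everything down by two positions.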
In the Google Groups thread "Cursor Expiration", Twitter architect John Kalucki wrote:
A cursor is an opaque deletion-tolerant index into a Btree keyed by source userid and modification time. It brings you to a point in time in the reverse chron sorted list. So, since you can’t change the past, other than erasing it, it’s effectively stable. (Modifications bubble to the top.) But you have to deal with additions at the list head and also block shrinkage due to deletions, so your blocks begin to overlap quite a bit as the data ages. (If you cache cursors and read much later, you’ll see the first few rows of cursor[n+1]’s block as duplicates of the last rows of cursor[n]’s block. The intersection cardinality is equal to the number of deletions in cursor[n]’s block). Still, there may be value in caching these cursors and then heuristically rebalancing them when the overlap proportion crosses some threshold.
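A toy model of this behaviour in plain Python, under simplifying assumptions: a single user's list (so the userid part of the key is dropped), integer modification times as keys, a sorted Python list in place of the B-tree, and hypothetical helper names. A cursor is just the key a block starts from, so rows added at the head leave cached cursors valid, while deletions inside block n pull rows from block n+1 forward.

```python
import math

def read_block(keys, cursor, size):
    """Up to `size` keys strictly older than `cursor`; `keys` is sorted newest first."""
    return [k for k in keys if k < cursor][:size]

def build_cursors(keys, size):
    """Walk the whole list once and remember the key each block starts from."""
    cursors, cursor = [], math.inf  # math.inf means "start at the head"
    while True:
        block = read_block(keys, cursor, size)
        if not block:
            return cursors
        cursors.append(cursor)
        cursor = block[-1]  # the next block starts just past the oldest key returned

keys = list(range(100, 0, -1))    # modification times 100..1, newest first
cached = build_cursors(keys, 10)  # [inf, 91, 81, ..., 11]

# Much later: 3 rows inside cursor[1]'s block are deleted and 5 rows are added at the head.
keys = [105, 104, 103, 102, 101] + [k for k in keys if k not in {88, 85, 83}]

block_n = read_block(keys, cached[1], 10)     # [90, 89, 87, 86, 84, 82, 81, 80, 79, 78]
block_next = read_block(keys, cached[2], 10)  # [80, 79, ..., 71]
overlap = sorted(set(block_n) & set(block_next))
print(overlap)  # [78, 79, 80]: 3 duplicates, one per deleted row
```

The overlap equals the number of rows deleted from cursor[1]'s block, matching the intersection cardinality in the quote, and the five new head rows do not disturb either cached cursor.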
In another thread, "new cursor-based pagination not multithread-friendly", John added:
The page based approach does not scale with large sets. We can no longer support this kind of API without throwing a painful number of 503s. Working with row-counts forces the data store to recount rows in an O […] Proportionally, very few users require multiple page fetches with a […] Also, scraping the social graph repeatedly at high speed is could […]
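The recount problem can be put into numbers with a simple cost model. This is an assumption that ignores caching and index-seek costs, not a set of figures from Twitter: answering LIMIT offset, count means stepping over offset rows before returning count rows, so walking a whole list page by page touches rows in proportion to the square of the number of pages, while a cursor reads only each page itself. The table size and page size below are illustrative, and the function names are hypothetical.

```python
def offset_cost(total_rows, page_size):
    """Rows walked when every page is fetched with LIMIT offset, count."""
    walked = 0
    for offset in range(0, total_rows, page_size):
        # skip `offset` rows, then read up to one page
        walked += min(offset + page_size, total_rows)
    return walked

def cursor_cost(total_rows, page_size):
    """Rows read when every page starts with an index seek to the cursor (seek cost ignored)."""
    walked = 0
    for offset in range(0, total_rows, page_size):
        walked += min(page_size, total_rows - offset)  # only the page itself is read
    return walked

print(offset_cost(1_000_000, 100))  # 5000500000
print(cursor_cost(1_000_000, 100))  # 1000000
```

Under this model, ten thousand 100-row pages over a million rows walk about 5,000 times more rows with offsets than with a cursor.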
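These two passages make it clear that, for large result sets, cursors are used mainly for a large gain in performance. Taking MySQL as the example again, jumping to row 100,000 without a cursor takes this SQL: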
select * from msgs limit 100000, 100
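On a table with a million rows, the first run of this query takes more than 5 seconds, because MySQL has to read and skip the first 100,000 rows before it can return the 100 that were asked for.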
If we use the table's primary key value as cursor_id, the cursor-based query can be rewritten as:
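select * from msgs where id > cursor_id order by id limit 100;
Because id is the primary key, MySQL can seek directly to cursor_id in the index and read just the next 100 rows, so the cost no longer grows with how deep the page is. The client then sends the largest id in this batch as the cursor_id for the next request.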
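A runnable sketch of the whole cursor loop, again with Python's sqlite3 standing in for MySQL. The table layout is assumed, and fetch_after and iterate_all are hypothetical names. Each request seeks past the previous cursor on the primary key, and the last id of each page becomes the next cursor.

```python
import sqlite3

def fetch_after(conn, cursor_id, count):
    """Cursor (keyset) paging: seek past cursor_id on the primary key, read `count` rows."""
    return conn.execute(
        "SELECT id, body FROM msgs WHERE id > ? ORDER BY id LIMIT ?",
        (cursor_id, count),
    ).fetchall()

def iterate_all(conn, count):
    """Yield pages until the table is exhausted."""
    cursor_id = 0  # assumes ids are positive integers, so 0 means "before the first row"
    while True:
        rows = fetch_after(conn, cursor_id, count)
        if not rows:
            return
        yield rows
        cursor_id = rows[-1][0]  # the last id on this page becomes the next cursor

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE msgs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO msgs (id, body) VALUES (?, ?)",
    [(i, f"msg {i}") for i in range(1, 251)],
)

pages = list(iterate_all(conn, 100))
print([len(p) for p in pages])  # [100, 100, 50]
```

The trade-off is that a page can no longer be addressed by number: a client can only walk forward from a cursor it already holds, which fits an API like followers ids, where callers read the list in order.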