PostgreSQL 用WITH伪递归求最短路径的例子

转自

http://www.depesz.com/2012/06/25/how-to-get-shortest-connection-between-two-cities/

Yesterday, on #postgresql on irc some guy asked:

22:28 < rafasc> i am trying to use plpgsql to find the shortest path between two cities, each pair of cities has one or more edges, each edge has a different wheight.
22:28 < rafasc> Is there a easy way to compute the shortest path between two cities?

Well, I was not really in a mood to solve it, so I just told him to try with recursive queries, and went on my way.

But I thought about it. And decided to see if I can write the query.

To get some test data, I created two simple tables:

$ \d cities
   Table "public.cities"
 Column │ Type │ Modifiers
────────┼──────┼───────────
 city   │ text │ not null
Indexes:
    "cities_pkey" PRIMARY KEY, btree (city)
Referenced by:
    TABLE "routes" CONSTRAINT "routes_from_city_fkey" FOREIGN KEY (from_city) REFERENCES cities(city)
    TABLE "routes" CONSTRAINT "routes_to_city_fkey" FOREIGN KEY (to_city) REFERENCES cities(city)
 
$ \d routes
      Table "public.routes"
  Column   │  Type   │ Modifiers
───────────┼─────────┼───────────
 from_city │ text    │ not null
 to_city   │ text    │ not null
 length    │ integer │ not null
Indexes:
    "routes_pkey" PRIMARY KEY, btree (from_city, to_city)
Check constraints:
    "routes_check" CHECK (from_city < to_city)
Foreign-key constraints:
    "routes_from_city_fkey" FOREIGN KEY (from_city) REFERENCES cities(city)
    "routes_to_city_fkey" FOREIGN KEY (to_city) REFERENCES cities(city)

Data in them is very simple:

$ select * from cities limit 5;
      city
────────────────
 Vancouver
 Calgary
 Winnipeg
 Sault St Marie
 Montreal
(5 rows)
 
$ select * from routes limit 5;
 from_city │  to_city  │ length
───────────┼───────────┼────────
 Calgary   │ Vancouver │      3
 Seattle   │ Vancouver │      1
 Portland  │ Seattle   │      1
 Calgary   │ Seattle   │      4
 Calgary   │ Helena    │      4
(5 rows)

In case you wonder – the data represents base map for “Ticket to Ride" game – awesome thing, and if you haven't played it – get it, and play.

This map was part of review of the game on ars technica.

But anyway. So, I have 36 cities, and 78 unique paths between them, each with length information. So, with this I should be able to find the shortest path.

One word of warning though – the fact that it's possible to do in database, doesn't mean it's good idea. Personally, I think that it should be done in some standalone application, which would use some smarter algorithms, extensive cache, and so on. But – this is just a proof of concept, and the data size that I'm working on is small enough that it shouldn't matter.

Each route is stored only once in routes. So I'll start by duplicating the rows, so I will have them written “in both directions":

CREATE VIEW all_routes AS
    SELECT from_city, to_city, length FROM routes
    UNION ALL
    SELECT to_city, from_city, length FROM routes

This will save me some typing later on.

First, let's start with some small route, but one that will show that it actually works – Duluth-Toronto is great example.

Reason is very simple, We have these 3 routes:

   from_city    │    to_city     │ length
────────────────┼────────────────┼────────
 Duluth         │ Sault St Marie │      3
 Sault St Marie │ Toronto        │      2
 Duluth         │ Toronto        │      6
(3 rows)

There is a direct connection (length 6), but it's actually cheaper to go via Sault St Marie, with total length of 5!

Here is a pause, of ~ 1 hour when I tried to write a query to solve my problem. And I failed. Kind of.

Query that would return the data is relatively simple:

WITH RECURSIVE
    multiroutes AS (
        SELECT
            from_city,
            to_city,
            ARRAY[ from_city, to_city ] as full_route,
            length as total_length
        FROM
            all_routes
        WHERE
            from_city = 'Duluth'
        UNION ALL
        SELECT
            m.from_city,
            n.to_city,
            m.full_route || n.to_city,
            m.total_length + n.length
        FROM
            multiroutes m
            join all_routes n ON m.to_city = n.from_city
        WHERE
            n.to_city <> ALL( m.full_route )
    )
SELECT *
FROM multiroutes
WHERE to_city = 'Toronto'
ORDER BY total_length desc limit 1;

But the problem is – it's extremely slow. And uses a lot of resources, which made OOM killer in my desktop to kill it (yes, stupid OOM killer).

I tried to implement simple pruning of searched paths if they are longer than current shortest on given route, but I couldn't find a way to do it – it seems to require subselect, and subselects referring to recursive queries, are not allowed within the recursive query itself.

(I think that perhaps RhodiumToad (on irc) can do it in a single query, but I'm far away from his level of skills, so I had to pass)

Does that mean it can't be done in database? No.

Luckily, we have functions. And functions can be rather smart.

To make the function simpler to use and write, I defined a type:

CREATE TYPE route_dsc as (
    from_city     TEXT,
    to_city       TEXT,
    full_route    TEXT[],
    total_length  INT4
);

This is a quite easy way to encapsulate all information about a single route as somewhat scalar value.

Now, I can write the function:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
CREATE OR REPLACE FUNCTION
    get_shortest_route( p_from TEXT, p_to TEXT )
    RETURNS SETOF route_dsc AS
$$
DECLARE
    sanity_count   INT4;
    final_routes   route_dsc[];
    current_routes route_dsc[];
    r              route_dsc;
BEGIN
    SELECT count(*) INTO sanity_count
        FROM cities
        WHERE city in (p_from, p_to);
    IF sanity_count <> 2 THEN
        raise exception 'These are NOT two, distinct, correct city names.';
    END IF;
 
    current_routes := array(
        SELECT row(from_city, to_city, ARRAY[from_city, to_city], length)
        FROM all_routes
        WHERE from_city = p_from
    );
    final_routes := current_routes;
 
    LOOP
        current_routes := array(
            SELECT row(
                c.from_city,
                a.to_city,
                c.full_route || a.to_city,
                c.total_length + a.length)
            FROM
                unnest( current_routes ) as c
                join all_routes a on c.to_city = a.from_city
            WHERE
                a.to_city <> all( c.full_route )
                AND
                c.total_length + a.length <= least(
                    coalesce(
                        (
                            SELECT min(l.total_length)
                            FROM unnest( final_routes ) as l
                            WHERE ( l.from_city, l.to_city ) = (c.from_city, p_to)
                        ),
                        c.total_length + a.length
                    ),
                    coalesce(
                        (
                            SELECT min(l.total_length)
                            FROM unnest( final_routes ) as l
                            WHERE ( l.from_city, l.to_city ) = (c.from_city, a.to_city)
                        ),
                        c.total_length + a.length
                    )
                )
        );
        EXIT WHEN current_routes = '{}';
        final_routes := final_routes || current_routes;
    END LOOP;
    RETURN query
        WITH rr as (
            SELECT
                from_city,
                to_city,
                full_route,
                total_length,
                dense_rank()
                    over (partition BY from_city, to_city ORDER BY total_length) as rank
            FROM unnest( final_routes )
            WHERE from_city = p_from AND to_city = p_to
        )
        SELECT from_city, to_city, full_route, total_length FROM rr where rank = 1;
    RETURN;
END;
$$ language plpgsql;

Looks huge, but in fact it's only because there are many queries inside. So, let's see what the function does:

  • lines 1-4 – standard preamble with function name, 2 arguments (cities we want to connect), and information that we will be returning set of records based on the type I just defined. In here you might wonder – why set of? We want just the shortest route. Yes, that's correct but it's perfectly possible (and very common) that there are many rows with the same, minimal length. So, instead of picking one randomly – I will return them all.
  • lines 6-9 – variable declarations, not really interesting
  • lines 11-16 – sanity check. Simple verification that both given names are city names, and that they are different.
  • lines 18-22 – I build current_routes based on all routes coming from source city. For example, If I'd call the function to find me route from Duluth to Toronto, the array would get these rows:

    $ SELECT from_city, to_city, ARRAY[from_city, to_city], length
    FROM all_routes
    WHERE from_city = 'Duluth';
     from_city │    to_city     │           array           │ length
    ───────────┼────────────────┼───────────────────────────┼────────
     Duluth    │ Helena         │ {Duluth,Helena}           │      6
     Duluth    │ Winnipeg       │ {Duluth,Winnipeg}         │      4
     Duluth    │ Sault St Marie │ {Duluth,"Sault St Marie"} │      3
     Duluth    │ Toronto        │ {Duluth,Toronto}          │      6
     Duluth    │ Omaha          │ {Duluth,Omaha}            │      2
     Duluth    │ Chicago        │ {Duluth,Chicago}          │      3
    (6 rows)
  • line 23 – I copy current_routes to “final_routes". current_routes contains only routes that the loop below has to work on, but final routes – is an array of all routes that will be used for finding final solution
  • lines 25-59 – basically infinite loop (of course with proper exit condition), which recursively finds routes:
    • lines 26-56 – core of the function. This query builds new list of routes, based on what's in current_routes, with following criteria:

      • new route must be from a city that is at the end of some route in “current_routes" (i.e. it's next segment for multi-city route
      • added (to route) city cannot be already in full_route (there is no point in revisiting cities when we're looking for shortest path
      • new total length of route (i.e. some route from current_routes + new segment) has to be shorter (or the same) as existing shortest path between these two cities. By “these" I mean original “from" city, and newly added “to" city. So, if we already have a route between cities “a" and “b" that is “10" long, there is no point in adding new route that is “20" long.
      • similar condition as above, but checking against already found requested route – i.e. route between cities user requested in passing arguments
      • above two criteria make sense only if there are matching routes already in final_routes – hence the need for coalesce()

      All such routes are stored in current_routes for future checking

    • line 57 – if the query above didn't return any routes – we're done, can exit the loop
    • line 58 – if there are some routes – add them to final_routes, and repeat the loop
  • lines 60-72 – return of the important data. I take all the routes in final_routes, from there, pick only the ones that match from_city/to_city with parameters given on function call, and then I use dense_rank() to find all records that have minimal total_length. All these records will get returned.

If that's complex, let me show you an example. What is stored, in which variable, at which step, when finding the route from Duluth to Toronto.

  • after line 23 in function, both current_routes and final_routes contain:
    from_city to_city total_length full_route
    Duluth Helena 6 {Duluth,Helena}
    Duluth Winnipeg 4 {Duluth,Winnipeg}
    Duluth Sault St Marie 3 {Duluth,"Sault St Marie"}
    Duluth Toronto 6 {Duluth,Toronto}
    Duluth Omaha 2 {Duluth,Omaha}
    Duluth Chicago 3 {Duluth,Chicago}
  • First run of the main recursive query – at line 57 current_routes are:
    from_city to_city total_length full_route
    Duluth Toronto 5 {Duluth,"Sault St Marie",Toronto}
    Duluth Pittsburg 6 {Duluth,Chicago,Pittsburg}
    Duluth Saint Louis 5 {Duluth,Chicago,"Saint Louis"}
    Duluth Denver 6 {Duluth,Omaha,Denver}
    Duluth Kansas City 3 {Duluth,Omaha,"Kansas City"}

    and since it's obviously not empty set – it continues.
    Please note that it didn't (for example) add route Duluth – Helena – Seattle (which is correct route, as you can see on the image above). Reason is very simple – we already found one route Duluth – Toronto, and its length is 6, so adding new route which is longer than this – doesn't make sense.

  • At line 58 final_routes are set to:
    from_city to_city total_length full_route
    Duluth Helena 6 {Duluth,Helena}
    Duluth Winnipeg 4 {Duluth,Winnipeg}
    Duluth Sault St Marie 3 {Duluth,"Sault St Marie"}
    Duluth Toronto 6 {Duluth,Toronto}
    Duluth Omaha 2 {Duluth,Omaha}
    Duluth Chicago 3 {Duluth,Chicago}
    Duluth Toronto 5 {Duluth,"Sault St Marie",Toronto}
    Duluth Pittsburg 6 {Duluth,Chicago,Pittsburg}
    Duluth Saint Louis 5 {Duluth,Chicago,"Saint Louis"}
    Duluth Denver 6 {Duluth,Omaha,Denver}
    Duluth Kansas City 3 {Duluth,Omaha,"Kansas City"}

    Which is simply previous final_routes with added new 5.

  • After next iteration of the loop, based on 5-element current_routes, we got only two new routes:
    from_city to_city total_length full_route
    Duluth Oklahoma City 5 {Duluth,Omaha,"Kansas City","Oklahoma City"}
    Duluth Saint Louis 5 {Duluth,Omaha,"Kansas City","Saint Louis"}

    And of course they got added to final_routes.

  • another iteration of the loop, based on current_routes with just two elements – didn't return any rows. There simply is no way to extend routes “Duluth-Omaha-Kansas City" or “Duluth-Omaha-Saint Louis" in a way that wouldn't extend already found route “Duluth-Sault St Marie-Toronto" with length 5.
  • Since this iteration of loop didn't find anything, loop exits, and the final_routes contains:
    from_city to_city total_length full_route
    Duluth Helena 6 {Duluth,Helena}
    Duluth Winnipeg 4 {Duluth,Winnipeg}
    Duluth Sault St Marie 3 {Duluth,"Sault St Marie"}
    Duluth Toronto 6 {Duluth,Toronto}
    Duluth Omaha 2 {Duluth,Omaha}
    Duluth Chicago 3 {Duluth,Chicago}
    Duluth Toronto 5 {Duluth,"Sault St Marie",Toronto}
    Duluth Pittsburg 6 {Duluth,Chicago,Pittsburg}
    Duluth Saint Louis 5 {Duluth,Chicago,"Saint Louis"}
    Duluth Denver 6 {Duluth,Omaha,Denver}
    Duluth Kansas City 3 {Duluth,Omaha,"Kansas City"}
    Duluth Oklahoma City 5 {Duluth,Omaha,"Kansas City","Oklahoma City"}
    Duluth Saint Louis 5 {Duluth,Omaha,"Kansas City","Saint Louis"}

Based on the final_routes above, query in lines 61-72 calculates correct answer, and shows it.

OK. So it works. But how slow it is?

First, let's start with very simple example – Atlanta – Nashville. These two cities are connected using a single one-element route. Call to function:

$ SELECT * FROM get_shortest_route('Atlanta', 'Nashville');
 from_city │  to_city  │     full_route      │ total_length
───────────┼───────────┼─────────────────────┼──────────────
 Atlanta   │ Nashville │ {Atlanta,Nashville} │            1
(1 row)
 
Time: 1.045 ms

What about the Duluth-Toronto?

$ SELECT * FROM get_shortest_route('Duluth', 'Toronto');
 from_city │ to_city │            full_route             │ total_length
───────────┼─────────┼───────────────────────────────────┼──────────────
 Duluth    │ Toronto │ {Duluth,"Sault St Marie",Toronto} │            5
(1 row)
 
Time: 2.239 ms

Something longer perhaps:

$ SELECT * FROM get_shortest_route('Duluth', 'Los Angeles');
 from_city │   to_city   │                                  full_route                                   │ total_length
───────────┼─────────────┼───────────────────────────────────────────────────────────────────────────────┼──────────────
 Duluth    │ Los Angeles │ {Duluth,Omaha,Denver,Phoenix,"Los Angeles"}                                   │           14
 Duluth    │ Los Angeles │ {Duluth,Omaha,Denver,"Santa Fe",Phoenix,"Los Angeles"}                        │           14
 Duluth    │ Los Angeles │ {Duluth,Omaha,"Kansas City","Oklahoma City","Santa Fe",Phoenix,"Los Angeles"} │           14
 Duluth    │ Los Angeles │ {Duluth,Helena,"Salt Lake City","Las Vegas","Los Angeles"}                    │           14
 Duluth    │ Los Angeles │ {Duluth,Omaha,Denver,"Salt Lake City","Las Vegas","Los Angeles"}              │           14
(5 rows)

And how about a cross country?

$ SELECT * FROM get_shortest_route('Vancouver', 'Miami');
 from_city │ to_city │                                      full_route                                      │ total_length
───────────┼─────────┼──────────────────────────────────────────────────────────────────────────────────────┼──────────────
 Vancouver │ Miami   │ {Vancouver,Calgary,Helena,Omaha,"Kansas City","Saint Louis",Nashville,Atlanta,Miami} │           23
 Vancouver │ Miami   │ {Vancouver,Seattle,Helena,Omaha,"Kansas City","Saint Louis",Nashville,Atlanta,Miami} │           23
(2 rows)
 
Time: 62.507 ms

The longer the road the more time it takes to find it. Which is pretty understandable.

So, to wrap it. It can be done in database. It is not as slow as I expected. I wasn't able to find a way to do it without functions, but it might be possible for someone smarter than me.

And I still don't think it's a good idea to put this logic in database.

时间: 2024-10-04 01:11:49

PostgreSQL 用WITH伪递归求最短路径的例子的相关文章

c语言-C语言用递归求圆周率的值,要求精确到小数点后3位,不得使用循环

问题描述 C语言用递归求圆周率的值,要求精确到小数点后3位,不得使用循环 C语言用递归求圆周率的值,要求精确到小数点后3位,不得使用循环 解决方案 http://jingyan.baidu.com/article/bea41d437c69b8b4c51be6e9.html 解决方案二: public class Test { public static void main(String[] args) { System.out.println("怎么插入代码块.."); } }

Floyd求最短路径算法详解

倘若我们要在计算机上建立一个交通咨询系统则可以采用图的结构来表示实际的交通网络.其实现最基本的功能,求出任意两点间的最短路径, 求最短路径的经典方法有很多种,最常用的便是迪杰斯特拉算法和佛洛依德(Floyd)算法,这篇文章就着重介绍Floyd算法. 求两点之间的最短路径无外乎有两种情况,一种就是从一点直接到另一点,另一种就是从一点经过n个节点后再到另一个节点,比如说要从A到B,则有两种情况就是A直接到B,或者是从A经过N个节点后再到B,所以,我们假设Dis(AB)为节点A到节点B的最短路径的距离

c语言-C语言用递归求圆周率的值,怎么实现

问题描述 C语言用递归求圆周率的值,怎么实现 C语言用递归求圆周率的值,要求精确到小数点后3位,不得使用循环 解决方案 C语言实现求圆周率归并排序递归实现C语言

使用dijkstra求最短路径,动态添加数据,无法求出最短路径

问题描述 使用dijkstra求最短路径,动态添加数据,无法求出最短路径 10C package Test; import java.util.ArrayList;import java.util.HashMap;import java.util.List;import java.util.Map;import java.util.PriorityQueue; import com.test.Station; public class DijSuccess { public static int

c语言递归求阶乘 ,不知道哪里不对

问题描述 c语言递归求阶乘 ,不知道哪里不对 递归求阶乘出问题了 #include long f(i) { long f=1; while(i>=1) f*=f(i-1); return f; } main() { int i=10; printf("factorial=%d",f(i)); } 帮忙看一下! 解决方案 = =你的i传进去 就是为了判断的吗? while(i>1) f=i* f(i-1) 解决方案二: #include long f(int i){ if(i

编译一直有错-vb.net用递归求阶乘 编译一直有问题

问题描述 vb.net用递归求阶乘 编译一直有问题 解决方案 你把代码贴上来看看啊. 解决方案二: 估计是你直接把代码写在命名空间里了.命名空间里只能装类,类里面才能装函数,函数里面才能装代码. 解决方案三: 你把代码贴上来我们才好看啊.

递归求字符串长度问题,求大神解答

问题描述 递归求字符串长度问题,求大神解答 int length(char * str) { if (*str == '') { return 0; } else { return (1+length(++str)); } } char str[10]="abcde"; 这个递归最后返回来的为什么会是5 不是应该返回0吗 求大神解惑 解决方案 strlen求字符串长度问题 解决方案二: 你还没有理解递归 最后一次是返回0 但是不是返回给main 而是返回给上一次递归 这样上一次递归就是

java数学归纳法非递归求斐波那契数列的方法_java

本文实例讲述了java数学归纳法非递归求斐波那契数列的方法.分享给大家供大家参考.具体如下: Integer能表示的最大值为 2147483647 大概是21.4亿,这里没有考虑溢出情况(当size为983时就会溢出)! import java.util.List; import java.util.ArrayList; /** * @author jxqlovejava * 斐波那契数列 */ public class Fibonacci { public static List<Intege

C++用Dijkstra(迪杰斯特拉)求最短路径的算法解析

算法介绍 迪杰斯特拉算法是由荷兰计算机科学家狄克斯特拉于1959  年提出的,因此又叫狄克斯特拉算法.是从一个顶点到其余各顶点的最短路径算法,解决的是有向图中最短路径问题.迪杰斯特拉算法主要特点是以起始点为中心向外层层扩展,直到扩展到终点为止.Dijkstra算法能得出最短路径的最优解,但由于它遍历计算的节点很多,所以效率低. 算法思想 按路径长度递增次序产生算法:  把顶点集合V分成两组:   (1)S:已求出的顶点的集合(初始时只含有源点V0)   (2)V-S=T:尚未确定的顶点集合  将