MonetDB multiplex-funnel

本文介绍一下MonetDB的漏斗功能. 此功能也是需要基于MonetDB的discovery和remote database.

m-funnel这个功能有点像plproxy, 多了一个queue/funnel的限制, 并且没有plproxy 的路由算法, 一个纯粹的代理.

plproxy接受客户端请求, 客户端的请求是并行提交给数据节点.

而MonetDB multiplex-funnel则是并行的接收客户端请求, 但是请求最终是一个一个提交给后端的所有数据节点的. 

返回结果的话则是所有数据节点并行执行的结果返回, 如有4个后端monetdbd, 那么count(*)返回4条记录.

另外需要注意的是m-funnel还限制了单条SQL返回的结果集的大小, 默认是80K.

注意使用m-funnel会打乱事务特性, 例如一个客户端的事务请求, 事务的中间可能被其他客户端的SQL插入.

MonetDB的multiplex-funnel举例 : 

使用monetdb创建m-funnel数据库

       -m pattern
              With the -m flag, instead of creating a database, a multiplex-funnel is created.  See section MULTIPLEX-
              FUNNEL in monetdbd(1).  The pattern argument is not fully the same as a pattern for connecting  or  dis-
              covery.   Each  parallel target for the multiplex-funnel is given as username+password@pattern sequence,
              separated by commas.  Here the pattern is an ordinary pattern as would  be  used  for  connecting  to  a
              database, and can hence also be just the name of a database.

注意pattern的格式,  username+password@pattern, 这里的pattern是数据库, 所以数据库需要在同一个广播域. 并且配置好tag(假设有重名的情况下)


这里dbx是边界, 主机属于两个广播域, 所以forward必须=proxy, 因为从db0的主机无法直接连接dbx1和dbx2. 必须通过dbx所在的monetdbd进行代理.

使用举例 : 

配置hostname

150.sky-mobi.com
db-172-16-3-221

配置/etc/sysconfig/network

HOSTNAME=db-172-16-3-221

在所有monetdb节点配置DNS或/etc/hosts, 使解析正常.

本例配置/etc/hosts

172.16.3.221 db-172-16-3-221
172.16.3.150 150.sky-mobi.com

配置防火墙, 让monetdbd之间可以相互通讯

iptables -t filter -I INPUT -s 172.16.0.0/16 -j ACCEPT

创建monetdbd storage

monetdbd create /data01/mdb1

配置discovery

monetdbd set discovery=yes /data01/mdb1

可选, 配置forward = proxy (如果这是个边界m-funnel库的话, 默认就是proxy)

monetdbd set forward=proxy /data01/mdb1

启动monetdbd

monetdbd start /data01/mdb1

创建数据库, 解锁数据库

[root@150 postgresql-9.3.5]# monetdb create db1
created database in maintenance mode: db1
[root@150 postgresql-9.3.5]# monetdb create db2
created database in maintenance mode: db2
[root@150 postgresql-9.3.5]# monetdb release db1
taken database out of maintenance mode: db1
[root@150 postgresql-9.3.5]# monetdb release db2
taken database out of maintenance mode: db2

[root@db-172-16-3-221 ~]# monetdb create dbx1
created database in maintenance mode: dbx1
[root@db-172-16-3-221 ~]# monetdb create dbx2
created database in maintenance mode: dbx2
[root@db-172-16-3-221 ~]# monetdb release dbx1
taken database out of maintenance mode: dbx1
[root@db-172-16-3-221 ~]# monetdb release dbx2
taken database out of maintenance mode: dbx2

配置shared, tag(可选, 仅仅当数据库在广播域有重名是需要使用)

[root@150 postgresql-9.3.5]# monetdb set shared=150 db1
[root@150 postgresql-9.3.5]# monetdb set shared=150 db2
[root@db-172-16-3-221 ~]# monetdb set shared=221 dbx1
[root@db-172-16-3-221 ~]# monetdb set shared=221 dbx2

发现

[root@db-172-16-3-221 ~]# monetdb discover
                    location
mapi:monetdb://150.sky-mobi.com:50000/db1/150
mapi:monetdb://150.sky-mobi.com:50000/db2/150
mapi:monetdb://db-172-16-3-221:50000/dbx1/221
mapi:monetdb://db-172-16-3-221:50000/dbx2/221

[root@150 mdb1]# monetdb discover
                    location
mapi:monetdb://150.sky-mobi.com:50000/db1/150
mapi:monetdb://150.sky-mobi.com:50000/db2/150
mapi:monetdb://db-172-16-3-221:50000/dbx1/221
mapi:monetdb://db-172-16-3-221:50000/dbx2/221

创建第一个m-funnel, 这里假设dbx这个m-funnel是边界funnel.

首先测试一下库名是否正常.

[root@db-172-16-3-221 ~]# mclient -h 127.0.0.1 dbx1/221
Welcome to mclient, the MonetDB/SQL interactive terminal (Jan2014-SP3)
Database: MonetDB v11.17.21 (Jan2014-SP3), 'mapi:monetdb://db-172-16-3-221:50000/dbx1'
Type \q to quit, \? for a list of available commands
auto commit mode: on
sql>\q
[root@db-172-16-3-221 ~]# mclient -h 127.0.0.1 dbx2/221
Welcome to mclient, the MonetDB/SQL interactive terminal (Jan2014-SP3)
Database: MonetDB v11.17.21 (Jan2014-SP3), 'mapi:monetdb://db-172-16-3-221:50000/dbx2'
Type \q to quit, \? for a list of available commands
auto commit mode: on

使用这两个库名创建m-funnel, 注意用户密码我没改, 用默认的

[root@db-172-16-3-221 ~]# monetdb create -m monetdb+monetdb@dbx1/221,monetdb+monetdb@dbx2/221 dbx
created multiplex-funnel in maintenance mode: dbx
[root@db-172-16-3-221 ~]# monetdb release dbx
taken database out of maintenance mode: dbx

查看状态

[root@db-172-16-3-221 ~]# monetdb get all dbx
     name          prop     source           value
dbx              name      -        dbx
dbx              type      local    mfunnel
dbx              shared    default  yes
dbx              mfunnel   local    monetdb+monetdb@dbx1/221,monetdb+monetdb@dbx2/221

测试第一个m-funnel是否正常, 可以正常的像两个底层数据库发出SQL.

[root@db-172-16-3-221 ~]# mclient -h 127.0.0.1 dbx
Welcome to mclient, the MonetDB/SQL interactive terminal (Jan2014-SP3)
Database: MonetDB v11.17.21 (Jan2014-SP3), 'mapi:monetdb://db-172-16-3-221:50000/dbx2'
Type \q to quit, \? for a list of available commands
auto commit mode: on
sql>create table abc(id int);
operation successful (22.611ms)
sql>insert into abc values(1);
2 affected row (7.259ms)
sql>select * from abc;
+------+
| id   |
+======+
|    1 |
|    1 |
+------+
2 tuples (1.854ms)
sql>insert into abc values(2);
2 affected row (5.553ms)
sql>select * from abc;
+------+
| id   |
+======+
|    1 |
|    2 |
|    1 |
|    2 |
+------+
4 tuples (1.014ms)

已经正常了, 插入和查询都从两个节点正常返回并合并结果.

配置第一个m-funnel的tag(可选, 仅仅当广播域中这个m-funnel的名字有重名才需要)

[root@db-172-16-3-221 ~]# monetdb set shared=221 dbx
[root@db-172-16-3-221 ~]# monetdb get all dbx
     name          prop     source           value
dbx              name      -        dbx
dbx              type      local    mfunnel
dbx              shared    local    221
dbx              mfunnel   local    monetdb+monetdb@dbx1/221,monetdb+monetdb@dbx2/221

创建第二个m-funnel : 

先执行discover得到URI, 从而得知database name.(含shared设置的tag)

[root@150 postgresql-9.3.5]# monetdb discover
                    location
mapi:monetdb://150.sky-mobi.com:50000/db1/150
mapi:monetdb://150.sky-mobi.com:50000/db2/150
mapi:monetdb://db-172-16-3-221:50000/dbx/221
mapi:monetdb://db-172-16-3-221:50000/dbx1/221
mapi:monetdb://db-172-16-3-221:50000/dbx2/221

创建m-funnel, 注意用户密码我没改, 用默认的

这个m-funnel包含3个库, 其中2个是normal 库, 另一个是前面创建的funnel, 所以实际上就是4个normal库, 但是funnel里看到的是3个库.

[root@150 postgresql-9.3.5]# monetdb create -m monetdb+monetdb@db1/150,monetdb+monetdb@db2/150,monetdb+monetdb@dbx/221 db0
created multiplex-funnel in maintenance mode: db0
[root@150 postgresql-9.3.5]# monetdb release db0
taken database out of maintenance mode: db0

查看m-funnel的状态

[root@150 postgresql-9.3.5]# monetdb get all db0
     name          prop     source           value
db0              name      -        db0
db0              type      local    mfunnel
db0              shared    default  yes
db0              mfunnel   local    monetdb+monetdb@db1/150,monetdb+monetdb@db2/150,monetdb+monetdb@dbx/221

测试 :

[root@150 postgresql-9.3.5]# mclient db0
Welcome to mclient, the MonetDB/SQL interactive terminal (Jan2014-SP3)
Database: MonetDB v11.17.21 (Jan2014-SP3), 'mapi:monetdb://db-172-16-3-221:50000/dbx2'
Type \q to quit, \? for a list of available commands
auto commit mode: on
sql>create table abc(id int);  -- 前面测试的时候创建了abc表, 所以这里创建会报错.
node dbx/221 failed: node dbx1/221 failed: 42S01!CREATE TABLE: name 'abc' already in use

-- 特别注意, 虽然报错了, 但是实际上只是2个节点报错, 其他2个节点创建成功了.

-- 这个测试用了auto commit, 所以导致部分提交成功, 部分提交失败了.

-- 要全局一致的话, 需要用start transaction

-- 但是用start transaction的话, 会扰乱, 因为并行的事情. 后面有测试

sql>select * from abc;
+------+
| id   |
+======+
|    1 |
|    2 |
|    1 |
|    2 |
+------+
4 tuples (1.607ms)
sql>insert into abc values (3);  -- 从这个插入我们就能知道, 上面建表的语句在2个节点执行成果了.
4 affected row (5.192ms)
sql>select * from abc;
+------+
| id   |
+======+
|    3 |
|    3 |
|    1 |
|    2 |
|    3 |
|    1 |
|    2 |
|    3 |
+------+
8 tuples (1.459ms)

假设我们用了事务会怎么样呢?

可能出现部分提交失败, 部分提交成功或者全部提交成功的情况, 回滚的话则全部节点回滚.

所以鉴于此, 如果报错了, 就立刻回滚, 那么是全部回滚的.

sql>create table abc(id int);
node dbx/221 failed: node dbx1/221 failed: 42S01!CREATE TABLE: name 'abc' already in use
sql>select * from abc;
node dbx/221 failed: node dbx1/221 failed: 25005!current transaction is aborted (please ROLLBACK)
sql>commit;  -- 此时commit, 部分节点提交成功, 部分节点提交失败.
node dbx/221 failed: node dbx1/221 failed: 2D000!COMMIT: failed
sql>rollback;  -- 事务已经结束了, 所以不能再回滚.
node db1/150 failed: 2DM30!ROLLBACK: not allowed in auto commit mode
sql>select * from abc;
+------+
| id   |
+======+
|    1 |
|    2 |
|    1 |
|    2 |
+------+
4 tuples (1.676ms)
sql>insert into abc values (3);  -- 因为部分提交成功, 所以DB1和DB2创建abc表完成.
4 affected row (5.538ms)
sql>select * from abc;
+------+
| id   |
+======+
|    3 |
|    3 |
|    1 |
|    2 |
|    3 |
|    1 |
|    2 |
|    3 |
+------+
8 tuples (1.395ms)

全部回滚的测试 : 

sql>create table abc(id int);
node dbx/221 failed: node dbx1/221 failed: 42S01!CREATE TABLE: name 'abc' already in use
sql>rollback;
auto commit mode: on
sql>\dt
TABLE  sys.abc
TABLE  sys.abc

部分提交成功和失败的测试 : 

sql>create table abc(id int);
node dbx/221 failed: node dbx1/221 failed: 42S01!CREATE TABLE: name 'abc' already in use
sql>\dt
TABLE  sys.abc
TABLE  sys.abc
TABLE  sys.abc
TABLE  sys.abc

全部提交成功的测试 : 

sql>insert into abc values (10);
4 affected row (5.611ms)
sql>select * from abc where id=10;
+------+
| id   |
+======+
|   10 |
|   10 |
|   10 |
|   10 |
+------+
4 tuples (1.906ms)

如果需要全局事务一致的话, 必须使用start trasaction;

使用start transaction带来的困扰, 原因就是漏斗效应, 因为并行转换成了串行来执行, 所以中间会被其他SQL插入, 测试如下.

会话A
sql>start transaction;
auto commit mode: off

会话B
sql>start transaction;  -- 因为在m-funnel 中已经启动了事务, 所以再次启动失败
node db1/150 failed: 25001!START TRANSACTION: cannot start a transaction within a transaction
sql>select * from abc;
node db1/150 failed: 25005!current transaction is aborted (please ROLLBACK)

会话A
sql>select * from abc;  -- 回退
node db1/150 failed: 25005!current transaction is aborted (please ROLLBACK)

-- 任何一个会话执行了rollback都可以, 就结束这个事务了.

[参考]
1. man monetdbd

MULTIPLEX-FUNNELS
       Monetdbd  implements multiplex-funnel capabilities.  As the name suggests two techniques are combined, the mul-
       tiplexer and the funnel.

       The funnel capability limits the access to the database to one client at a time.  That is, if multiple  clients
       connect  to  the  funnel, their queries will be serialized such that they are executed one after the other.  An
       effect of this approach is that clients no longer have an exclusive channel to the database, which  means  that
       individual queries from one client may have been interleaved with queries from others.  This most notably makes
       SQL transaction blocks unreliable with a funnel.  The funnel, hence, is meant to scale down a large  amount  of
       clients that perform short-running (read-only) queries, as typically seen in web-based query loads.

       When a funnel is defined to use multiple databases, the funnel adds a multiplexer to its query channel.  A mul-
       tiplex-funnel sends each query to all of the defined databases.  This behavior can be quite confusing at first,
       but  proves  to be useful in typical sharding configurations, where in particular simple selection queries have
       to be performed on each of the shards.  The multiplexer combines the answers from all defined databases in  one
       single  answer  that it sends back to the client.  However, this combining is without any smart logic, that is,
       the multiplexer does not evaluate the query it is running, but just combines all answers it receives  from  the
       databases.   This  results  in  e.g.  as many return tuples for a SELECT COUNT(*) query, as there are databases
       defined.

       Due to the two above mentioned characteristics, a multiplex-funnel has some limitations.  As mentioned  before,
       transactions over multiple queries are likely not to result in the desired behavior.  This is due to each query
       to the funnel is required to be self-contained.  Further, since for  each  query,  the  results  from  multiple
       servers  have  to  be  combined  into one, that query must only return a single response, i.e.  multi-statement
       queries are most likely causing the funnel to respond with an error, or return garbled results.  Last, the size
       of  each query is limited to currently about 80K.  While this size should be sufficient for most queries, it is
       likely not enough for e.g. COPY INTO statements.  Apart from the data transfer  implications,  such  statements
       should not be used with the funnel, as the results will be undefined due to the limited query buffer.  Applica-
       tions using the funnel should aim for short and single-statement queries that require no transactions.

       See the create command in the monetdb(1) man-page for details on how to setup a multiplex-funnel.
时间: 2024-09-20 01:01:44

MonetDB multiplex-funnel的相关文章

some limits when we use monetdb

使用monetdb时的一些注意事项,  1. m-funnel的返回结果集不能超过80K. man monetdbd MULTIPLEX-FUNNELS Monetdbd implements multiplex-funnel capabilities. As the name suggests two techniques are combined, the mul- tiplexer and the funnel. The funnel capability limits the acces

MonetDB vs PostgreSQL 2, width table with random data

前面一篇简单的对比了一下PostgreSQL和MonetDB在bulk load和简单的统计查询的性能. 因为测试数据比较单一, 可能没有什么说服力.  本文测试环境与之前的一致, 但是使用宽表, 随机离散字符串进行测试. 5000万测试数据, 60个字段, 单表, 容量149GB, (PG96GB) http://blog.163.com/digoal@126/blog/static/16387704020147139412871/ [注意]  PostgreSQL注重的是高并发, 而Mone

MonetDB bulk load performance & fixed length type performance & JOIN performance

之前写过一篇关于PostgreSQL使用UUID作为PK, 造成写入性能很差的分析文章, 有兴趣的童鞋可参考如下 :  http://blog.163.com/digoal@126/blog/static/16387704020129249646421/ 本文主要测试一下几个因素对导入速度的影响. 1. 索引列无序导入 2. 索引列顺序导入 3. 无索引无序导入(指某列无序) 4. 无索引顺序导入(指某列顺序) 测试数据使用的是前面一篇BLOG的测试数据, 60个字段, 第一个字段是INT, 主

互联网业最基本的概念:Conversion Funnel

文章描述:互联网业的30/10/10 黄金比率. Funnel (或称Conversion Funnel) 是互联网业最基本的概念之一,它的意思是100 个人路过你的网站,你能够把几个人变成忠实顾客. 一般最基本的电子商务Funnel,大概分五个步骤: 100% - 流量 20% - Email 订户(或Facebook 粉丝) 10% - 加入会员 3% - 下订单 1% - 重复购买 你可以看到我在每个步骤前面都加了一个数字,好让你们对于这个漏斗有些概念(这些是我大略抓的数字,每家公司会因为

MonetDB table level write lock? OR it's repeatable read above isolate?

在测试对单个表执行insert时, 发现并行的话会导致除最早提交的事务以外的其他事务回滚. 来看个例子 : sql>\d d CREATE TABLE "sys"."d" ( "id" INTEGER, "info" VARCHAR(64), "c1" VARCHAR(64) ); 当前d表有1条记录. sql>select * from d; +------+------+------+ |

The column-store pioneer | MonetDB

MonetDB是一个开源的列存储数据库产品, 起源于阿姆斯特丹大学1990年的一个研究项目MAGNUM, 2002年一位学生以该项目为基础的博士论文并以Monet命名. 第一个版本是2004年9月30日发布的.  MonetDB的架构, 包含3个层次. 第一层top layer, 负责查询接口, 输出MAL(MonetDB Assembly Language)指令. 第二层backend 层. 接收MAL指令, 负责基于成本的优化. 第三层database kernel层. 为数据库内核层, 负

MonetDB 1.6 billion(384GB) JOIN 2.4 billion(576GB) 60 columns-random-data wide-table

前面简单的对比了一下PostgreSQL和MonetDB宽表的导入, 查询, 关联方面的性能. 有兴趣的朋友可以参考 :  MonetDB vs PostgreSQL 2, width table with random data http://blog.163.com/digoal@126/blog/static/1638770402014714104326879/ MonetDB bulk load performance & fixed length type performance &am

MonetDB remote database forward method : proxy or redirect

MonetDB一个比较好用的功能, 共享数据库. 也就是说, monetdb可以通过共享数据库的方式, 让客户端连接一个节点就可以连接到其他节点. 如图 :    例如我在两台服务器上分别启动了一个monetdbd. 分别有一个数据库都名为test. 172.16.3.150 172.16.3.221 那么只要配置了数据库的shared=yes或者其他tag, 并且启动monetdbd时配置了discovery=yes, monetdb就可以在广播域发现它. 默认都是yes的. 如果不是, 可以

PostgreSQL fixed length wide-table VS MonetDB

注意这里所有的对比都是针对的PostgreSQL行存储, 没有使用cstore_fdw插件, 如果要在PG中使用列存储, 可以考虑一下. 参考 :  http://blog.163.com/digoal@126/blog/static/16387704020144141052312/ 本文对比一下MonetDB和PostgreSQL定长字段, 宽表的查询性能和导入性能. 篇幅太长, 测试环境和测试方法见以下BLOG :  http://blog.163.com/digoal@126/blog/s