MySQL 5.7 new feature: generated columns

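Generated columns are a new feature in MySQL 5.7:
http://dev.mysql.com/doc/refman/5.7/en/create-table.html#create-table-generated-columns
The value of a generated column is computed from ordinary columns of the same row. This is somewhat like a view, but differs from one in that you can choose whether the generated value is stored.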

CREATE TABLE triangle (
  sidea DOUBLE,
  sideb DOUBLE,
  sidec DOUBLE AS (SQRT(sidea * sidea + sideb * sideb))
);
INSERT INTO triangle (sidea, sideb) VALUES(1,1),(3,4),(6,8);
mysql> SELECT * FROM triangle;
+-------+-------+--------------------+
| sidea | sideb | sidec              |
+-------+-------+--------------------+
|     1 |     1 | 1.4142135623730951 |
|     3 |     4 |                  5 |
|     6 |     8 |                 10 |
+-------+-------+--------------------+

The column definition syntax is:
col_name data_type [GENERATED ALWAYS] AS (expression)
  [VIRTUAL | STORED] [UNIQUE [KEY]] [COMMENT comment]
  [[NOT] NULL] [[PRIMARY] KEY]

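A VIRTUAL column does not store its value; the value is computed when the row is read. A STORED column stores its value when the row is written, and can be indexed. (In the GA releases of MySQL 5.7, InnoDB also supports secondary indexes on VIRTUAL columns.)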
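As a rough illustration of the two storage modes, the following MySQL 5.7 sketch declares one STORED and one VIRTUAL generated column and indexes the stored one. The table name triangle_idx, the area column, and the index name idx_sidec are made up for this example and do not come from the MySQL documentation.

```sql
-- Hypothetical table for illustration (MySQL 5.7 syntax).
CREATE TABLE triangle_idx (
  sidea DOUBLE,
  sideb DOUBLE,
  -- STORED: computed on INSERT/UPDATE and written to disk
  sidec DOUBLE GENERATED ALWAYS AS (SQRT(sidea * sidea + sideb * sideb)) STORED,
  -- VIRTUAL: not stored, computed when the row is read
  area  DOUBLE AS (sidea * sideb / 2) VIRTUAL,
  -- an ordinary secondary index on the stored generated column
  KEY idx_sidec (sidec)
);

INSERT INTO triangle_idx (sidea, sideb) VALUES (1,1),(3,4),(6,8);

-- The generated column can be used in a predicate like any other column.
SELECT sidea, sideb, sidec, area FROM triangle_idx WHERE sidec = 5;
```

Running EXPLAIN on the final query is one way to check whether the optimizer actually picks idx_sidec for a given data set.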
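However, this MySQL feature seems to be of limited use: for example, the expression can only reference columns of the current row.
IoT workloads have similar needs, but they usually require a computation across N adjacent rows, or N rows selected by some rule, such as the average, maximum, minimum, and variance over every 5 adjacent rows.
MySQL 5.7 generated columns cannot meet this need.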
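For comparison, a computation over adjacent rows of this kind is usually written with window functions rather than generated columns. The sketch below uses PostgreSQL syntax and assumes a hypothetical table sensor_readings(ts, val); neither the table nor its columns appear in the original example.

```sql
-- Hypothetical IoT table for illustration (PostgreSQL syntax).
CREATE TABLE sensor_readings (
  ts  timestamptz,
  val double precision
);

-- For each row, aggregate over the current row and the 4 rows before it
-- (5 adjacent rows in timestamp order).
SELECT ts,
       val,
       avg(val)      OVER w AS avg_5,
       max(val)      OVER w AS max_5,
       min(val)      OVER w AS min_5,
       var_samp(val) OVER w AS var_5
FROM sensor_readings
WINDOW w AS (ORDER BY ts ROWS BETWEEN 4 PRECEDING AND CURRENT ROW);
```

The frame clause decides which neighbors count as "adjacent"; ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING would give a centered window of 5 rows instead of a trailing one.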

In PostgreSQL, computed values of this kind are nothing new, and the support is more thorough.
Examples:
The counterpart of a MySQL virtual generated column:

postgres=# create table test(c1 int, c2 int);
CREATE TABLE
postgres=# create view v_test as select c1,c2,sqrt(c1*c2+c1*c2) from test;
CREATE VIEW
postgres=# insert into test values (1,2),(10,20);
INSERT 0 2
postgres=# select * from v_test;
 c1 | c2 | sqrt
----+----+------
  1 |  2 |    2
 10 | 20 |   20
(2 rows)

The counterpart of a MySQL stored generated column:

postgres=# create materialized view v_test1 as select c1,c2,sqrt(c1*c2+c1*c2) from test;
SELECT 2
postgres=# select * from v_test1;
 c1 | c2 | sqrt
----+----+------
  1 |  2 |    2
 10 | 20 |   20
(2 rows)
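One difference from a stored generated column is worth keeping in mind: a PostgreSQL materialized view is a snapshot and is not updated when the base table changes. A minimal sketch, reusing the test table and v_test1 view from above (the inserted row is arbitrary):

```sql
-- Rows added after the materialized view was created are not visible in it yet.
INSERT INTO test VALUES (3, 6);

-- Still reflects the snapshot taken at creation (or at the last refresh).
SELECT count(*) FROM v_test1;

-- Recompute the stored result from the current contents of test.
REFRESH MATERIALIZED VIEW v_test1;

SELECT * FROM v_test1;
```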

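There is also stream processing, which suits IoT scenarios even better (the pipeline=# prompt below is PipelineDB, a streaming database based on PostgreSQL):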

pipeline=# create stream s1(c1 int, c2 int);
CREATE STREAM
pipeline=# create continuous view test as select c1,c2,sqrt(c1*c1+c2*c2) from s1;
CREATE CONTINUOUS VIEW
pipeline=# activate;
ACTIVATE
pipeline=# insert into s1 values (1,2),(10,20);
INSERT 0 2
pipeline=# select * from test;
 c1 | c2 |       sqrt
----+----+------------------
  1 |  2 | 2.23606797749979
 10 | 20 | 22.3606797749979
(2 rows)

Stream processing with windows and real-time aggregation:

pipeline=# create continuous view test1 as select c1,count(*) over(partition by c1) from s1 ;
CREATE CONTINUOUS VIEW
pipeline=# create continuous view test2 as select c2,count(*) over w from s1 window w as(partition by c2);
CREATE CONTINUOUS VIEW
pipeline=# insert into s1 values (1,2);
INSERT 0 1
pipeline=# select * from test1;
 c1 | count
----+-------
  1 |     1
(1 row)

pipeline=# select * from test2;
 c2 | count
----+-------
  2 |     1
(1 row)

Real-time analysis of the number of visits and unique users per URL, and of the latency below which 99% of requests fall:

/*
 * This function will strip away any query parameters from each url,
 * as we're not interested in them.
 */
CREATE FUNCTION url(raw text, regex text DEFAULT '\?.*', replace text DEFAULT '')
    RETURNS text
AS 'textregexreplace_noopt'    -- textregexreplace_noopt@src/backend/utils/adt/regexp.c
LANGUAGE internal;  

CREATE CONTINUOUS VIEW url_stats AS
    SELECT
        url,  -- URL
        percentile_cont(0.99) WITHIN GROUP (ORDER BY latency_ms) AS p99,  -- latency below which 99% of requests to this URL fall
        count(DISTINCT user) AS uniques,  -- number of unique users
        count(*) total_visits  -- total number of visits
    FROM
        (SELECT
            url(payload->>'url'),  -- URL (query parameters stripped)
            payload->>'user' AS user,  -- user ID
            (payload->>'latency')::float * 1000 AS latency_ms,  -- request latency
            arrival_timestamp
        FROM logs_stream) AS unpacked
    WHERE arrival_timestamp > clock_timestamp() - interval '1 day'
    GROUP BY url;

CREATE CONTINUOUS VIEW user_stats AS
    SELECT
        day(arrival_timestamp),
        payload->>'user' AS user,
        sum(CASE WHEN payload->>'url' LIKE '%landing_page%' THEN 1 ELSE 0 END) AS landings,
        sum(CASE WHEN payload->>'url' LIKE '%conversion%' THEN 1 ELSE 0 END) AS conversions,
        count(DISTINCT url(payload->>'url')) AS unique_urls,
        count(*) AS total_visits
    FROM logs_stream GROUP BY payload->>'user', day;  

-- What are the top-10 most visited urls?
SELECT url, total_visits FROM url_stats ORDER BY total_visits DESC limit 10;
      url      | total_visits
---------------+--------------
 /page62/path4 |        10182
 /page51/path4 |        10181
 /page24/path5 |        10180
 /page93/path3 |        10180
 /page81/path0 |        10180
 /page2/path5  |        10180
 /page75/path2 |        10179
 /page28/path3 |        10179
 /page40/path2 |        10178
 /page74/path0 |        10176
(10 rows)  

-- What is the 99th percentile latency across all urls?
SELECT combine(p99) FROM url_stats;
     combine
------------------
 6.95410494731137
(1 row)  

-- What is the average conversion rate each day for the last month?
SELECT day, avg(conversions / landings) FROM user_stats GROUP BY day;
          day           |            avg
------------------------+----------------------------
 2015-09-15 00:00:00-07 | 1.7455000000000000000000000
(1 row)  

-- How many unique urls were visited each day for the last week?
SELECT day, combine(unique_urls) FROM user_stats WHERE day > now() - interval '1 week' GROUP BY day;
          day           | combine
------------------------+---------
 2015-09-15 00:00:00-07 |  100000
(1 row)  

-- Is there a relationship between the number of unique urls visited and the highest conversion rates?
SELECT unique_urls, sum(conversions) / sum(landings) AS conversion_rate FROM user_stats
    GROUP BY unique_urls ORDER BY conversion_rate DESC LIMIT 10;
 unique_urls |  conversion_rate
-------------+-------------------
          41 |  2.67121005785842
          36 |  2.02713894173361
          34 |  2.02034637010851
          31 |  2.01958418072859
          27 |  2.00045348712296
          24 |  1.99714899522942
          19 |  1.99438839453606
          16 |  1.98083502184886
          15 |  1.87983011139079
          14 |  1.84906254929873
(10 rows)

Time: 2024-09-21 00:03:22
