Data Warehouse Series (21): The Kimball Bus Matrix Explained (Official Version)

I. Introduction

Over the years, I have found that a matrix depiction of the data warehouse plan is a pretty good planning tool once you have gathered the business requirements and performed a full data audit. This matrix approach has been exceptionally effective for distributed data warehouses without a center. Most of the new Web-oriented, multi-organization warehouses we are trying to build these days have no center, so it is even more urgent that we find a way to plan these beasts.

II. First-Level Data Marts

The matrix is simply a vertical list of data marts and a horizontal list of dimensions. Figure 1 is an example matrix for the enterprise data warehouse of a large telecommunications company. You start the matrix by listing all the first-level data marts that you could possibly build over the next three years across the enterprise. A first-level data mart is a collection of related fact tables and dimension tables that is typically:

  • Derived from a single data source
  • Supported and implemented by a single department
  • Based on the most atomic data possible to collect from the source
  • Conformed to the “data warehouse bus.”

First-level data marts should be the smallest and least risky initial implementations of an enterprise data warehouse. They form a foundation on which a larger implementation can be completed in the least amount of time, and they are still guaranteed to contribute to the final result without becoming incompatible stovepipes.

You should try to reduce the risk of implementation as much as possible by basing the first-level data marts on single production sources. In my experience, the cost and complexity of data warehouse implementation, once the “right” data has been chosen, turn out to be proportional to the number of data sources that must be extracted. Each separate data source can be as much as a six-month programming and testing exercise. You must create a production data pipeline from the legacy source through the data staging area and on to the fact and dimension tables of the presentation part of the data warehouse.

In Figure 1, the first-level data marts for the telecommunications company are many of the major production data sources. An obvious production data source is the customer billing system, listed first. This row in the matrix is meant to represent all the base-level fact tables you expect to build in this data mart. Assume this data mart contains one major base-level fact table, the grain of which is the individual line item on a customer bill. Assume the line item on the bill represents the class of service provided, not the individual telephone call within the class of service. With these assumptions, you can check off the dimensions this fact table needs. For customer bills, you need Time, Customer, Service, Rate Category, Local Service Provider, Long Distance Provider, Location, and Account Status.
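As a rough illustration of how one matrix row is filled in, here is a minimal Python sketch. It assumes the matrix is held as a dictionary that maps each data mart to the set of dimensions it needs; the names `DIMENSIONS`, `matrix`, and `render_row` are invented for this example and do not come from Figure 1. Only the customer billing row is shown, because that is the only row whose dimensions are spelled out in the text.

```python
# Column order follows the dimensions named for the customer billing mart.
DIMENSIONS = [
    "Time", "Customer", "Service", "Rate Category",
    "Local Service Provider", "Long Distance Provider",
    "Location", "Account Status",
]

# One entry per first-level data mart; the value is the set of
# dimensions checked off for that mart.
matrix = {
    "Customer Billing": set(DIMENSIONS),
}


def render_row(mart, matrix, dimensions):
    """Return one matrix row as text, marking each required dimension with X."""
    marks = ["X" if dim in matrix[mart] else "." for dim in dimensions]
    return f"{mart}: " + " ".join(marks)


print(render_row("Customer Billing", matrix, DIMENSIONS))
```

Adding another first-level data mart is then a matter of adding one more key to `matrix` with its own set of dimensions.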

Continue to develop the matrix rows by listing all the possible first-level data marts that could be developed in the next three years, based on known, existing data sources. Sometimes I am asked to include a first-level data mart based on a production system that does not yet exist. I usually decline the offer. I try to avoid including “potential” data sources, unless there is a very specific design and implementation plan in place. Another dangerously idealistic data source is the grand corporate data model, which usually takes up a whole wall of the IT department. Most of this data model cannot be used as a data source because it is not real. Ask the corporate data architect to highlight with a red pen the tables on the corporate data model that are currently populated with real data. These red tables are legitimate drivers of data marts in the planning matrix and can be used as sources.

The planning matrix columns indicate all the dimensions a data mart might need. A real enterprise data warehouse contains more dimensions than those in Figure 1. It is often helpful to attempt a comprehensive list of dimensions before filling in the matrix. When you start with a large list of dimensions, it becomes a kind of creative exercise to ask whether a given dimension could possibly be associated with a data mart. This activity could suggest interesting ways to add dimensional data sources to existing fact tables. If you study the details of Figure 1, you may decide that more X’s should be filled in, or that some significant dimensions should be added. If so, more power to you! You are using the matrix as it was intended.

Inviting Data Mart Groups to the Conforming Meeting

Looking across the rows of the matrix is revealing. You can see the full dimensionality of each data mart at a glance. Dimensions can be tested for inclusion or exclusion. But the real power of the matrix comes from looking at the columns. A column in the matrix is a map of where the dimension is required.

FIGURE 1 The Matrix Plan for the enterprise data warehouse of a large telecommunications company.

The first dimension, Time, is required in every data mart. Every data mart is a time series. But even the Time dimension requires some thought. When a dimension is used in multiple data marts, it must be conformed. Conformed dimensions are the basis for distributed data warehouses, and using conformed dimensions is the way to avoid stovepipe data marts. A dimension is conformed when two copies of the dimension are either exactly the same (including the values of the keys and all the attributes), or else one copy is a perfect subset of the other. So using the Time dimension in all the data marts implies that the data mart teams agree on a corporate calendar. All the data mart teams must use this calendar and agree on fiscal periods, holidays, and workdays.
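The definition of a conformed dimension can be turned into a simple check. The sketch below is one possible reading, not a definitive test: it assumes each copy of a dimension is a dictionary from key to a dictionary of attributes, and it interprets "perfect subset" as a subset of rows whose keys and attributes match exactly. The function name `is_conformed` and the sample calendar rows are made up for illustration.

```python
def is_conformed(dim_a, dim_b):
    """True if the two copies are identical or one is a row-wise subset of the other.

    Each copy maps a dimension key to a dict of attribute values.
    """
    def subset(small, large):
        # Every row of `small` must exist in `large` with identical attributes.
        return all(key in large and large[key] == row for key, row in small.items())

    return subset(dim_a, dim_b) or subset(dim_b, dim_a)


# Hypothetical calendar rows, keyed by a date key.
corporate_calendar = {
    20240101: {"fiscal_period": "P01", "holiday": True},
    20240102: {"fiscal_period": "P01", "holiday": False},
}
billing_calendar = {
    20240102: {"fiscal_period": "P01", "holiday": False},
}
rogue_calendar = {
    20240102: {"fiscal_period": "P02", "holiday": False},
}

print(is_conformed(corporate_calendar, billing_calendar))
print(is_conformed(corporate_calendar, rogue_calendar))
```

The billing copy passes because it is a strict subset of the corporate calendar. The rogue copy fails because it reuses a key with a different fiscal period, which is exactly the disagreement that produces stovepipe data marts.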

The grain of the conformed Time dimension needs to be consistent as well. An obvious source of stovepipe data marts is the reckless use of incompatible weeks and months across the data marts. Get rid of awkward time spans such as quad weeks or 4-4-5-week quarters.

The second dimension in Figure 1, Customer, is even more interesting than Time. Developing a standard definition for “customer” is one of the most important steps in combining separate sources of data from around the enterprise. The willingness to seek a common definition of the customer is a major litmus test for an organization intending to build an enterprise data warehouse. Roughly speaking, if an organization is unwilling to agree on a common definition of the customer across all data marts, the organization should not attempt to build a data warehouse that spans these data marts. The data marts should remain separate forever.

For these reasons, you can think of the planning matrix columns as the invitation list to the conforming meeting! The planning matrix reveals the interaction between the data marts and the dimensions.
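Reading the matrix by column can be sketched in a few lines of Python. The example assumes the matrix is a dictionary from data mart to its set of dimensions. The billing row uses a few of the dimensions named earlier; the other two rows ("Service Orders" and "Network Usage") are hypothetical placeholders and are not taken from Figure 1. The function name `invitation_list` is likewise invented.

```python
# Rows: data marts. Values: dimensions each mart needs.
# Only "Customer Billing" comes from the text; the other rows are
# hypothetical and exist only to make the columns interesting.
matrix = {
    "Customer Billing": {"Time", "Customer", "Service", "Location", "Account Status"},
    "Service Orders": {"Time", "Customer", "Service", "Location"},
    "Network Usage": {"Time", "Service", "Location"},
}


def invitation_list(matrix):
    """Map each dimension shared by two or more marts to the marts that use it."""
    by_dimension = {}
    for mart, dims in matrix.items():
        for dim in dims:
            by_dimension.setdefault(dim, []).append(mart)
    # A dimension used by a single mart needs no conforming meeting.
    return {
        dim: sorted(marts)
        for dim, marts in by_dimension.items()
        if len(marts) > 1
    }


for dim, marts in sorted(invitation_list(matrix).items()):
    print(f"{dim}: {', '.join(marts)}")
```

Each entry in the result is one conforming meeting: the dimension is the agenda, and the listed data mart teams are the attendees.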

Communicating With the Boss

The planning matrix is a good communication vehicle for senior management. It is simple and direct. Even if the executive does not know much about the technical details of the data warehouse, the planning matrix sends the message that standard definitions of calendars, customers, and products must be defined, or the enterprise won’t be able to use its data.

A meeting to conform a dimension is probably more political than technical. The data warehouse project leader does not need to be the sole force for conforming a dimension such as Customer. A senior manager such as the enterprise CIO should be willing to appear at the conforming meeting and make it clear how important the task of conforming the dimension is. This political support is very important. It gets the data warehouse project manager off the hook and puts the burden of the decision-making process on senior management’s shoulders, where it belongs.

III. Second-Level Data Marts

After you have represented all the major production sources in the enterprise with first-level data marts, you can define one or more second-level marts. A second-level data mart is a combination of two or more first-level marts. In most cases, a second-level mart is more than a simple union of data sets from the first-level marts. For example, a second-level profitability mart may result from a complex allocation process that associates costs from several first-level cost-oriented data marts onto products and customers contained in a first-level revenue mart. I discussed the issues of creating these kinds of profitability data marts in my column, “Not so Fast.”
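To make the idea of an allocation process concrete, here is a deliberately simplified Python sketch. It assumes costs are spread across customer and product lines in proportion to revenue, which is only one of many possible allocation rules and is not described in the article. The function name `allocate_costs`, the cost pool names, and all figures are hypothetical.

```python
def allocate_costs(revenue_by_line, cost_pools):
    """Return profit per (customer, product) line.

    Assumption for illustration: total cost from all first-level cost marts
    is allocated to each line in proportion to that line's revenue.
    """
    total_revenue = sum(revenue_by_line.values())
    total_cost = sum(cost_pools.values())
    profit = {}
    for line, revenue in revenue_by_line.items():
        allocated = total_cost * revenue / total_revenue
        profit[line] = revenue - allocated
    return profit


# Hypothetical rows from a first-level revenue mart.
revenue_by_line = {
    ("Customer A", "Local Service"): 600,
    ("Customer B", "Long Distance"): 400,
}
# Hypothetical totals from two first-level cost-oriented marts.
cost_pools = {"network": 300, "support": 200}

print(allocate_costs(revenue_by_line, cost_pools))
```

A real profitability mart would usually apply a different driver to each cost pool, which is why the result is more than a simple union of the first-level marts.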

IV. Summary

The matrix planning technique helps you build an enterprise data warehouse, especially when the warehouse is a distributed combination of far-flung data marts. The matrix becomes a resource that is part technical tool, part project management tool, and part communication vehicle to senior management.

Time: 2024-11-02 23:24:31
