Big Data's Data Trap 1: Plato's Allegory of the Cave

There is an old Chinese saying that goes, "to believe everything that is written in books is even worse than to have no books at all." If that saying were coined in the 21st century, it might sound like this: "to believe everything in data is worse than to have no data at all." What I want to highlight is that the misuse or inappropriate use of data may actually be worse than the complete absence of data.

Let me explain a little further. If data is collected from only one or a few aspects of something, it will never be adequate; it is like using data collected at a low-dimensional level to describe things at a high-dimensional level. More importantly, more data can lead to more disagreement, because plenty of data can be collected from a single aspect to support each point of view, even views that conflict with one another. In such cases, the misuse of data is, to some extent, worse than the absence of data. This is just one of the many data traps of big data. In this two-part article series, I will explore data traps further and delve into the pitfalls of the improper use of data.

Mathematical Philosophy of Big Data

Currently, data in every shape and form is being captured and recorded. Through data emission, a big data system can record and track a user's every move (such as click records, browsing time and comments). It can also record data sent from sensors, such as temperature, humidity, speed and pressure, and the list goes on. Data shows us the present world and helps us push boundaries, predict the future and analyze incidents.

Some regard the big data era as a totally different world, one in which the attributes and rules of anything can be transferred, using appropriate code (a digital medium), to other homogeneous things in which those attributes and rules are expressed without loss. This school of thought [1] believes that big data is the equivalent of the world and that the two are isomorphic to each other (see Figure 1).

Figure 1 Mathematical philosophy of big data: homogeneous relationship

By quantifying everything, big data converts the whole world into data. This will probably change the way we see and understand the world, bringing us a brand-new world view based on big data. In other words, big data helps us see a bigger picture of the world.

Undoubtedly, big data is a precious resource and a powerful tool. However, it would be idealistic to deem big data homogeneous with, and wholly representative of, the world. Big data gives us information, but rather than interpreting that information for us, it leaves people to understand it. If we misuse data, we may misunderstand it. Big data has its bright side, but it also has a dark side.

Plato's "Allegory of the Cave"

The famous ancient Greek philosopher Plato wrote the "Allegory of the Cave," which appears in Book VII of his Republic [2].

The allegory asks readers to imagine a very deep cave holding prisoners who have lived in the cave since birth and who are chained so that they cannot move their legs, arms or heads. They are chained facing a wall, and behind them is a fire. Between them and the fire is a walkway where people walk carrying objects and puppets, which cast shadows on the only wall the prisoners can see. Because these shadows are the only things the prisoners can see in the cave, they grow up believing that the shadows are real. The shadows become their only reality. (See Figure 2.)

Figure 2 Plato's "Allegory of the Cave" (Picture source: Wikipedia; created by: Markus Maurer)

Using the prisoners' perceived reality as the main metaphor, Plato created this allegory to illustrate the effect that one's surroundings have on one's perception of the world.

In the same way, limited by the means of measurement and learning available to us, we can sense only one or a few aspects of an object, just like the prisoners in Plato's allegory. Held by their chains, they can only face the wall in front of them, which makes them believe that the shadows they see (two-dimensional) are the true world (three-dimensional). If we think of the shadows as data, then our data may fail to show us the full picture, regardless of what technology is used and how much data is collected. This is one of the biggest data traps out there.
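The cave metaphor can be made concrete with a toy projection. The sketch below is a minimal illustration under stated assumptions, not a model from the article: it assumes a "shadow" simply discards the depth coordinate of each 3-D point, and the two objects (`flat_card` and `deep_block`) are hypothetical shapes chosen so that they differ only along that discarded axis.

```python
# A toy model of the cave: a "shadow" is a projection that drops one
# coordinate (depth). The objects below are hypothetical, for illustration.

def shadow(points):
    """Project 3-D points (x, y, z) onto the 2-D wall by discarding z."""
    return {(x, y) for (x, y, z) in points}

# Two different hypothetical objects: a flat card, and a block whose extra
# extent lies entirely along the depth axis the prisoners cannot see.
flat_card = {(x, y, 0) for x in range(3) for y in range(3)}
deep_block = {(x, y, z) for x in range(3) for y in range(3) for z in range(10)}

# Observing the wall 1,000 times ("collecting more data") yields more shadows,
# but every shadow has already lost the depth information.
observations_card = [shadow(flat_card) for _ in range(1000)]
observations_block = [shadow(deep_block) for _ in range(1000)]

print(flat_card == deep_block)                  # False: the objects differ
print(shadow(flat_card) == shadow(deep_block))  # True: their shadows are identical
print(observations_card == observations_block)  # True: more shadows do not help
```

Because the projection discards depth before any data is recorded, no amount of repeated observation can tell the two objects apart; only measuring the missing dimension would.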

A world without adequate dimensions is a false world. In the same way, a fact without adequate data to back it up is a false fact. The worst-case scenario is that each person holding inadequate data becomes obstinate, leading to mutually exclusive data traps.

Surely more data helps you get closer to the truth, right?

As data is collected from an increasing number of things, more and more people depend solely on data when making decisions. This attitude is summed up by W. Edwards Deming's famous quote: "In God we trust. All others (must) bring data."

Trusting in data is fine.

However, there is a counter-argument to this stance. The famous Chinese saying "to believe everything in books is worse than to have no books at all," in the context of the big data trap, roughly translates to "to believe everything data tells you is worse than to have no data at all." The misuse or inappropriate use of data may actually be worse than the complete absence of data.

This is because there is only one physical world, but it can be described from numerous aspects. In many cases, the data we can collect, the data available to us and the data we choose to trust cover only one or a few aspects of the facts.

Using such data to interpret facts is like describing a high-dimensional world at a low-dimensional level. The interpretation will not be correct, regardless of the amount of data collected. It can get worse: the number of disagreements may grow with the amount of data collected, because data is available to support each point of view drawn from one aspect of a fact. Such points of view conflict with each other, creating an endless loop.

For example, suppose the claim in question is that education quality is deteriorating, and the data we have is standardized exam scores. Can that data fully reveal students' potential? To what extent can such exams show students' creativity? Does education aim for scores or for abilities? Disputes over standardized exams arise because the scores fail to show students' potential.

Another example: if I say that Li Hongzhang is one of the three most outstanding diplomats in the modern history of China (the others being Zhou Enlai and Wellington Koo), I may be met with a wave of heated protests arguing that he should be called a traitor because of his involvement in all of the 30 most unjust treaties in China's modern history.

Everyone has a reason for their point of view, and everyone is keen to use one aspect of the facts to deny another.

Mr. Tu Zipei, a big data expert, has pointed out in Why the Truth Gets Further Away As Data Aggregates [3] that "just like the man of Chu in the Chinese fable 'Carve on the Gunwale of a Moving Boat' (about a person who took measures without regard to changes in circumstances), human beings have access only to the facts within a limited space and time."

Even some of the world's most successful big data companies, such as Alibaba, are not immune to these difficulties.

Mr. Tu provided an example. Before he joined Alibaba, a senior manager in charge of business operations there once asked him for advice. At the time, Alibaba had nine business departments predicting consumers' product needs and wants. The opinions of these departments often conflicted, and each of them believed that its own prediction was the most reasonable and accurate.

Mr. Tu believes that this case reveals a major potential risk of the big data era: huge amounts of data lead to a situation in which "everyone has a valid reason." A person can always reach a conclusion different from everyone else's and still have data to support it.

According to the anthropologist Thomas Crump [4], all data is collected by people, and no one is always rational.

As a result, we often see many conflicting opinions and little consensus.

To some extent, this outcome may be worse than having no data at all. This is one of the main data traps that must be considered when using big data.

Ways to eliminate the data trap

Mr. Tu Zipei surmised that the Alibaba case arose because each department drew conclusions from its own data, which had been collected from different areas. His suggestion in such situations is to consolidate departments and integrate their data into multi-dimensional data, which is closer to the truth than any single department's predictions.

Mr. Tu's suggestion can be summed up by another Chinese idiom: "listen to both sides and you will be enlightened." Learning about facts from more aspects brings people closer to reality. Otherwise, they will only ever see one aspect of a multi-dimensional issue.
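The idea that integrated, multi-dimensional data can reveal what single-dimension views hide can be illustrated with a toy sketch. This is not Mr. Tu's method and uses no Alibaba data; the two customer segments, the purchase attributes and the helper functions (`one_dimensional_view`, `integrated_view`) are all hypothetical, constructed so that each "department" looking at one attribute alone sees exactly the same picture.

```python
# Toy illustration: two hypothetical customer segments that look identical
# to departments examining one attribute each, but differ when the data is
# integrated. All records below are made up for illustration.
from collections import Counter

# Each record is (bought_product_a, bought_product_b) for one customer.
segment_1 = [(0, 0), (1, 1)] * 50  # customers buy both products or neither
segment_2 = [(0, 1), (1, 0)] * 50  # customers buy exactly one of the two

def one_dimensional_view(records, column):
    """What a single department sees: counts for one attribute only."""
    return Counter(r[column] for r in records)

def integrated_view(records):
    """What a consolidated team sees: counts of the joint attribute pairs."""
    return Counter(records)

# The department tracking product a and the one tracking product b each
# find the two segments indistinguishable...
print(one_dimensional_view(segment_1, 0) == one_dimensional_view(segment_2, 0))  # True
print(one_dimensional_view(segment_1, 1) == one_dimensional_view(segment_2, 1))  # True
# ...but the integrated, two-dimensional view shows opposite behavior.
print(integrated_view(segment_1) == integrated_view(segment_2))  # False
```

Each department's one-dimensional data is accurate, yet none of it can distinguish the segments; the difference only exists in the combination of dimensions, which is the kind of information that consolidating data is meant to recover.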

Even in this modern age of technological advancement and big data, we would all do well to remember some ancient wisdom that can keep us from falling into data traps.

References:

[1] Change By Big Data [M] by Li Dewei; published by Publishing House of Electronics Industry in October 2013

[2] The Republic [M] by Plato; translated by Huang Yin and published by The Chinese Overseas Publishing House in June 2012

[3] Why the Truth Gets Further Away As Data Aggregates by Tu Zipei; published by Logical Thinking in April 2016

[4] The Anthropology of Numbers [M] by Thomas Crump; translated by Zheng Yuanzhe and published by Central Compilation & Translation Press in August 2007

Date: 2024-09-19 09:22:56
