Predictive learning vs. representation learning

 

When you take a machine learning class, there’s a good chance it’s divided into a unit on supervised learning and a unit on unsupervised learning. We certainly care about this distinction for a practical reason: often there’s orders of magnitude more data available if we don’t need to collect ground-truth labels. But we also tend to think it matters for more fundamental reasons. In particular, the following are some common intuitions:

  • In supervised learning, the choice of algorithm usually matters less than engineering and tuning it really well. In unsupervised learning, we think carefully about the structure of the data and build a model which reflects that structure.
  • In supervised learning, except in small-data settings, we throw whatever features we can think of at the problem. In unsupervised learning, we carefully pick the features we think best represent the aspects of the data we care about.
  • Supervised learning seems to have many algorithms with strong theoretical guarantees, and unsupervised learning very few.
  • Off-the-shelf algorithms perform very well on a wide variety of supervised tasks, but unsupervised learning requires more care and expertise to come up with an appropriate model.

I’d argue that this is deceptive. I think the real division in machine learning isn’t between supervised and unsupervised learning, but between what I’ll term predictive learning and representation learning. I haven’t heard it described in precisely this way before, but I think this distinction reflects a lot of our intuitions about how to approach a given machine learning problem.

In predictive learning, we observe data drawn from some distribution, and we are interested in predicting some aspect of this distribution. In textbook supervised learning, for instance, we observe a bunch of pairs (x, y), and given some new example x, we’re interested in predicting something about the corresponding y. In density modeling (a form of unsupervised learning), we observe unlabeled data x, and we are interested in modeling the distribution the data comes from, perhaps so we can perform inference in that distribution. In each of these cases, there is a well-defined predictive task where we try to predict some aspect of the observable values, possibly given some other aspect.
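
To make the distinction concrete, here is a toy sketch (not code from the original post) contrasting the two predictive tasks just described: fitting a supervised regressor to (x, y) pairs, and fitting a simple density model to unlabeled x. The synthetic data, the linear model, and the single-Gaussian density are all illustrative assumptions.

```python
# A toy contrast between two predictive tasks: supervised prediction of y
# from x, and unsupervised density modeling of x. Everything here is a
# stand-in chosen for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = 2.0 * x[:, 0] + rng.normal(scale=0.5, size=200)

# Supervised predictive learning: given a new x, predict the corresponding y.
model = LinearRegression().fit(x, y)
print("predicted y at x=1.5:", model.predict([[1.5]])[0])

# Density modeling (unsupervised, but still predictive): model the
# distribution of x itself, here a single Gaussian fit by maximum likelihood.
mu, sigma = x.mean(), x.std()
log_p = -0.5 * np.log(2 * np.pi * sigma**2) - (1.5 - mu) ** 2 / (2 * sigma**2)
print("log p(x=1.5) under the fitted Gaussian:", log_p)
```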

In representation learning, our goal isn’t to predict observables, but to learn something about the underlying structure. In cognitive science and AI, a representation is a formal system which maps to some domain of interest in systematic ways. A good representation allows us to answer queries about the domain by manipulating that system. In machine learning, representations often take the form of vectors, either real- or binary-valued, and we can manipulate these representations with operations like Euclidean distance and matrix multiplication. For instance, PCA learns representations of data points as vectors. We can ask how similar two data points are by checking the Euclidean distance between them.
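
As a concrete illustration of the PCA example, here is a minimal sketch (using synthetic data and scikit-learn, both my own assumptions) of learning vector representations and answering a similarity query with Euclidean distance.

```python
# A minimal sketch of the PCA example: learn 2-D vector representations of
# synthetic data, then answer "how similar are these two points?" with
# Euclidean distance in the learned space.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))             # 100 data points, 20 raw features

Z = PCA(n_components=2).fit_transform(X)   # each row of Z represents one point

dist = np.linalg.norm(Z[0] - Z[1])         # similarity query via Euclidean distance
print(f"distance between points 0 and 1 in PCA space: {dist:.3f}")
```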

In representation learning, the goal isn’t to make predictions about observables, but to learn a representation which would later help us to answer various queries. Sometimes the representations are meant for people, such as when we visualize data as a two-dimensional embedding. Sometimes they’re meant for machines, such as when the binary vector representations learned by deep Boltzmann machines are fed into a supervised classifier. In either case, what’s important is that mathematical operations map to the underlying relationships in the data in systematic ways.
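
For the “representations meant for machines” case, here is a hedged sketch in which unsupervised features are fed into a downstream supervised classifier. I substitute PCA for the deep Boltzmann machine features mentioned above to keep the example self-contained; the synthetic data and labels are likewise assumptions.

```python
# A sketch of "representations meant for machines": unsupervised features
# (PCA here, standing in for deep Boltzmann machine representations) are
# handed to a downstream supervised classifier.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))
y = (X[:, :5].sum(axis=1) > 0).astype(int)   # labels depend on a few directions

Z = PCA(n_components=10).fit_transform(X)    # learned representation
clf = LogisticRegression().fit(Z, y)         # classifier consumes Z, not raw X
print("training accuracy on the learned features:", clf.score(Z, y))
```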

Whether your goal is prediction or representation learning influences the sorts of techniques you’ll use to solve the problem. If you’re doing predictive learning, you’ll probably try to engineer a system which exploits as much information as possible about the data, carefully using a validation set to tune parameters and monitor overfitting. If you’re doing representation learning, there’s no good quantitative criterion, so you’ll more likely build a model based on your intuitions about the domain, and then keep staring at the learned representations to see if they make intuitive sense.
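
Here is a rough sketch of the predictive-learning workflow just described: hold out a validation set, sweep a hyperparameter, and keep whichever setting predicts best on held-out data. The dataset, the ridge-regression model, and the alpha grid are all illustrative assumptions.

```python
# A sketch of the predictive-learning workflow: tune against a held-out
# validation set and monitor generalization rather than training error.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))
y = X @ rng.normal(size=30) + rng.normal(scale=0.1, size=500)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

best_alpha, best_err = None, float("inf")
for alpha in [0.01, 0.1, 1.0, 10.0]:          # sweep regularization strength
    err = mean_squared_error(y_val, Ridge(alpha=alpha).fit(X_tr, y_tr).predict(X_val))
    if err < best_err:                        # keep the best held-out error
        best_alpha, best_err = alpha, err
print(f"best alpha={best_alpha}, validation MSE={best_err:.4f}")
```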

In other words, the predictive/representation distinction parallels the differences I listed above between supervised and unsupervised learning. This shouldn’t be surprising, because the two dimensions are strongly correlated: most supervised learning is predictive learning, and most unsupervised learning is representation learning. So, to see which of these dimensions is really the crux of the issue, let’s look at cases where the two differ.

Language modeling is a perfect example of an application which is unsupervised but predictive. The goal is to take a large corpus of unlabeled text (such as Wikipedia) and learn a distribution over English sentences. The problem is motivated by Bayesian models for speech recognition: a distribution over sentences can be used as a prior for what a person is likely to say. The goal, then, is to model the distribution, and any additional structure is unnecessary. Log-linear models, such as that of Mnih et al. [1], are very good at this, and recurrent neural nets [2] are even better. These are the sorts of approaches we’d normally apply in a supervised setting: very good at making predictions, but often hard to interpret. One state-of-the-art algorithm for density modeling of text is PAQ [3], which is a heavily engineered ensemble of sequential predictors, somewhat reminiscent of the winning entries of the Netflix competition.
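
The cited log-linear and recurrent models are far more powerful than anything that fits in a snippet, but a toy bigram model with add-one smoothing (my simplification, not any of the cited systems) makes the predictive objective concrete: fit unlabeled text, then assign probabilities to new sentences. The tiny corpus below is an assumption made for illustration.

```python
# A toy bigram language model with add-one smoothing: learn from unlabeled
# sentences, then score new sentences by their log-probability.
import math
from collections import Counter

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(words[:-1])               # context counts for the conditionals
    bigrams.update(zip(words, words[1:]))
vocab_size = len(set(w for s in corpus for w in s.split())) + 2  # plus <s>, </s>

def sentence_log_prob(sentence):
    words = ["<s>"] + sentence.split() + ["</s>"]
    logp = 0.0
    for prev, cur in zip(words, words[1:]):
        # add-one smoothed estimate of p(cur | prev)
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        logp += math.log(p)
    return logp

print(sentence_log_prob("the cat sat on the log"))   # fluent: higher log-probability
print(sentence_log_prob("log the on sat cat the"))   # scrambled: lower log-probability
```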

On the flip side, supervised neural nets are often used to learn representations. One example is Collobert-Weston networks [4], which attempt to solve a number of supervised NLP tasks by learning representations which are shared between them. Some of the tasks are fairly simple and have a large amount of labeled data, such as predicting which of two words should be used to fill in the blank. Others are harder and have less data available, such as semantic role labeling. The simpler tasks are artificial, and they are there to help learn a representation of words and phrases as vectors, where similar words and phrases map to nearby vectors; this representation should then help performance on the harder tasks. We don’t care about the performance on those tasks per se; we care whether the learned embeddings reflect the underlying structure. To debug and tune the algorithm, we’d focus on whether the representations make intuitive sense, rather than on the quantitative performance. There are no theoretical guarantees that such an approach would work — it all depends on our intuitions of how the different tasks are related.
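
As a very rough sketch of the shared-representation idea (not the actual Collobert-Weston architecture), the snippet below shares one word-embedding table between a simple, data-rich task head and a harder, data-poor one. All shapes, the averaging scheme, and the example word ids are illustrative assumptions.

```python
# A bare-bones sketch of multitask representation sharing: one embedding
# table feeds two task-specific heads, so training signal from either task
# shapes the same word vectors. Not the cited architecture.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 1000, 50
shared_embeddings = rng.normal(scale=0.1, size=(vocab_size, embed_dim))

W_fill = rng.normal(scale=0.1, size=(embed_dim, 2))   # fill-in-the-blank: 2 candidates
W_srl = rng.normal(scale=0.1, size=(embed_dim, 20))   # role labeling: 20 labels

def task_scores(word_ids, W_task):
    # Average the shared word vectors for a phrase, then apply a task head.
    phrase_vec = shared_embeddings[word_ids].mean(axis=0)
    return phrase_vec @ W_task

print(task_scores([12, 7, 431], W_fill))   # scores for the two candidate words
print(task_scores([12, 7, 431], W_srl))    # scores over role labels
# During training, gradients from both losses would flow into
# shared_embeddings; that shared update is what lets the easy task
# improve the representation used by the hard one.
```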

Based on these two examples, it seems like it’s the predictive/representation dimension which determines how we should approach the problem, rather than supervised/unsupervised.

In machine learning, we tend to think there’s no solid theoretical framework for unsupervised learning. But really, the problem is that we haven’t begun to formally characterize the problem of representation learning. If you just want to build a density modeler, that’s about as well understood as the supervised case. But if the goal is to learn representations which capture the underlying structure, that’s much harder to formalize. In my next post, I’ll try to take a stab at characterizing what representation learning is actually about.

[1] Mnih, A., and Hinton, G. E. Three new graphical models for statistical language modeling. ICML 2007

[2] Sutskever, I., Martens, J., and Hinton, G. E. Generating text with recurrent neural networks. ICML 2011

[3] Mahoney, M. Adaptive weighting of context models for lossless data compression. Florida Institute of Technology Tech report, 2005

[4] Collobert, R., and Weston, J. A unified architecture for natural language processing: deep neural networks with multitask learning. ICML 2008

 

Posted in Machine Learning.

By Roger Grosse – February 4, 2013 

 

 
