(Repost) ICCV 2015: The 21 Hottest Research Papers

ICCV 2015: Twenty one hottest research papers

 

“Geometry vs Recognition” becomes ConvNet-for-X

Computer Vision used to be cleanly separated into two schools: geometry and recognition. Geometric methods like structure from motion and optical flow focus on measuring objective real-world quantities, such as 3D distances, directly from images, while recognition techniques like support vector machines and probabilistic graphical models traditionally focus on perceiving high-level semantic information (i.e., is this a dog or a table?) directly from images.

The world of computer vision is changing fast. We now have powerful convolutional neural networks that are able to extract just about anything directly from images. So if your input is an image (or a set of images), there’s probably a ConvNet for your problem. While you do need a large labeled dataset, believe me when I say that collecting a large dataset is much easier than manually tweaking knobs inside your 100K-line codebase. As we’re about to see, the separation between geometric methods and learning-based methods is no longer easily discernible.

By 2016 just about everybody in the computer vision community will have tasted the power of ConvNets, so let’s take a look at some of the hottest new research directions in computer vision.

ICCV 2015’s Twenty One Hottest Research Papers

 

This December in Santiago, Chile, the International Conference on Computer Vision (ICCV) 2015 is going to bring together the world’s leading researchers in Computer Vision, Machine Learning, and Computer Graphics.

To no one’s surprise, this year’s ICCV is filled with lots of ConvNets, but this time these Deep Learning tools are being applied to much, much more creative tasks. Let’s take a look at the following twenty one ICCV 2015 research papers, which will hopefully give you a taste of where the field is going.

1. Ask Your Neurons: A Neural-Based Approach to Answering Questions About Images Mateusz Malinowski, Marcus Rohrbach, Mario Fritz

“We propose a novel approach based on recurrent neural networks for the challenging task of answering questions about images. It combines a CNN with an LSTM into an end-to-end architecture that predicts answers conditioned on a question and an image.”
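The CNN-plus-LSTM fusion the quote describes can be sketched in a few lines of numpy. This is an illustrative toy, not the authors' model: the dimensions, the single-layer LSTM cell, and the concatenate-then-softmax fusion are all simplifying assumptions, and the random vectors stand in for real CNN features and word embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def lstm_step(x, h, c, W):
    """One LSTM step; W maps [x; h] to the four gate pre-activations."""
    z = W @ np.concatenate([x, h])
    i, f, o, g = np.split(z, 4)
    i, f, o = 1 / (1 + np.exp(-i)), 1 / (1 + np.exp(-f)), 1 / (1 + np.exp(-o))
    c = f * c + i * np.tanh(g)
    return o * np.tanh(c), c

D_IMG, D_EMB, D_H, N_ANS = 512, 64, 128, 1000      # toy sizes

W_lstm = rng.normal(0, 0.1, (4 * D_H, D_EMB + D_H))
W_out = rng.normal(0, 0.1, (N_ANS, D_H + D_IMG))

def answer(image_feat, question_embs):
    h = c = np.zeros(D_H)
    for x in question_embs:                  # encode the question word by word
        h, c = lstm_step(x, h, c, W_lstm)
    joint = np.concatenate([h, image_feat])  # fuse question and image
    return softmax(W_out @ joint)            # distribution over answers

img = rng.normal(size=D_IMG)                 # stand-in for a CNN image feature
question = [rng.normal(size=D_EMB) for _ in range(6)]
p = answer(img, question)
```

The end-to-end part is what matters: because every step is differentiable, gradients from the answer softmax flow back through both the LSTM and (in the real system) the CNN.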

2. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler


“To align movies and books we exploit a neural sentence embedding that is trained in an unsupervised way from a large corpus of books, as well as a video-text neural embedding for computing similarities between movie clips and sentences in the book.”

3. Learning to See by Moving Pulkit Agrawal, Joao Carreira, Jitendra Malik

“We show that using the same number of training images, features learnt using egomotion as supervision compare favourably to features learnt using class-label as supervision on the tasks of scene recognition, object recognition, visual odometry and keypoint matching.”

4. Local Convolutional Features With Unsupervised Training for Image Retrieval Mattis Paulin, Matthijs Douze, Zaid Harchaoui, Julien Mairal, Florent Perronin, Cordelia Schmid

“We introduce a deep convolutional architecture that yields patch-level descriptors, as an alternative to the popular SIFT descriptor for image retrieval.”

5. Deep Networks for Image Super-Resolution With Sparse Prior Zhaowen Wang, Ding Liu, Jianchao Yang, Wei Han, Thomas Huang

“We show that a sparse coding model particularly designed for super-resolution can be incarnated as a neural network, and trained in a cascaded structure from end to end.”
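The "sparse coding as a network" idea is easiest to see via unrolled ISTA: each iteration of the classic sparse-coding solver becomes one layer, and (in the paper's cascaded training) the per-layer matrices become learnable. Below is a minimal numpy sketch with a random dictionary standing in for a learned one; the sizes and the fixed step `1/L` are assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

def soft_threshold(v, t):
    """The sparsity-inducing nonlinearity of each unrolled layer."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Random stand-in dictionary with unit-norm atoms.
n_atoms, dim = 100, 25
D = rng.normal(size=(dim, n_atoms))
D /= np.linalg.norm(D, axis=0)

L = np.linalg.norm(D.T @ D, 2)   # Lipschitz constant -> step size 1/L
lam = 0.1                        # sparsity penalty

def ista_layers(y, n_layers=50):
    """Each ISTA iteration = one 'layer' of the unrolled network."""
    z = np.zeros(n_atoms)
    for _ in range(n_layers):
        z = soft_threshold(z + D.T @ (y - D @ z) / L, lam / L)
    return z

# Synthesize a signal from a ~10%-sparse code, then recover it.
true_code = rng.normal(size=n_atoms) * (rng.random(n_atoms) < 0.1)
y = D @ true_code
z = ista_layers(y)
recon_err = np.linalg.norm(D @ z - y) / np.linalg.norm(y)
```

Training the unrolled network end to end lets the "iterations" specialize for super-resolution rather than generic sparse recovery.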

6. High-for-Low and Low-for-High: Efficient Boundary Detection From Deep Object Features and its Applications to High-Level Vision Gedas Bertasius, Jianbo Shi, Lorenzo Torresani

“In this work we show how to predict boundaries by exploiting object level features from a pretrained object-classification network.”

7. A Deep Visual Correspondence Embedding Model for Stereo Matching Costs Zhuoyuan Chen, Xun Sun, Liang Wang, Yinan Yu, Chang Huang

“A novel deep visual correspondence embedding model is trained via Convolutional Neural Network on a large set of stereo images with ground truth disparities. This deep embedding model leverages appearance data to learn visual similarity relationships between corresponding image patches, and explicitly maps intensity values into an embedding feature space to measure pixel dissimilarities.”

8. Im2Calories: Towards an Automated Mobile Vision Food Diary Austin Meyers, Nick Johnston, Vivek Rathod, Anoop Korattikara, Alex Gorban, Nathan Silberman, Sergio Guadarrama, George Papandreou, Jonathan Huang, Kevin P. Murphy

“We present a system which can recognize the contents of your meal from a single image, and then predict its nutritional contents, such as calories.”

9. Unsupervised Visual Representation Learning by Context Prediction Carl Doersch, Abhinav Gupta, Alexei A. Efros

“How can one write an objective function to encourage a representation to capture, for example, objects, if none of the objects are labeled?”
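The paper's answer is to manufacture labels from the image itself: crop one patch, crop a second patch at one of eight positions around it, and train a network to predict which position. A minimal sketch of the label-generation step, with hypothetical patch and gap sizes (the real work also jitters the patches to prevent trivial cues):

```python
import numpy as np

rng = np.random.default_rng(3)

# Eight possible positions of the second patch relative to the first.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
           (0, 1), (1, -1), (1, 0), (1, 1)]

def sample_patch_pair(image, patch=8, gap=2):
    """Labels come for free from the patch layout -- no human annotation."""
    step = patch + gap
    h, w = image.shape[:2]
    # Pick a centre patch with room for all eight neighbours.
    y = rng.integers(step, h - step - patch)
    x = rng.integers(step, w - step - patch)
    label = rng.integers(len(OFFSETS))
    dy, dx = OFFSETS[label]
    p1 = image[y:y + patch, x:x + patch]
    p2 = image[y + dy * step:y + dy * step + patch,
               x + dx * step:x + dx * step + patch]
    return p1, p2, label   # a ConvNet is trained to predict `label`

image = rng.random((64, 64))
p1, p2, label = sample_patch_pair(image)
```

Solving this pretext task forces the network to learn about object parts and layout without a single manual label.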

10. Deep Neural Decision Forests Peter Kontschieder, Madalina Fiterau, Antonio Criminisi, Samuel Rota Bulò

“We introduce a stochastic and differentiable decision tree model, which steers the representation learning usually conducted in the initial layers of a (deep) convolutional network.”

11. Conditional Random Fields as Recurrent Neural Networks Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang Huang, Philip H. S. Torr

“We formulate mean-field approximate inference for the Conditional Random Fields with Gaussian pairwise potentials as Recurrent Neural Networks.”
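The recurrence the quote refers to is the mean-field update itself: start from the unary (CNN) scores, repeatedly mix in messages from neighbouring pixels, and renormalize. The sketch below is a deliberately simplified stand-in: it uses a chain-structured pairwise term where the paper uses dense Gaussian kernels, and nothing here is trained.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

n_pixels, n_labels = 50, 3
unary = rng.normal(size=(n_pixels, n_labels))   # e.g. per-pixel CNN logits

# Pairwise weights: a simple chain (each pixel talks to its neighbours).
W = np.zeros((n_pixels, n_pixels))
for i in range(n_pixels - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
compat = 1.0 - np.eye(n_labels)                 # Potts label compatibility

def mean_field(unary, n_iter=5):
    q = softmax(unary)
    for _ in range(n_iter):        # each iteration = one step of the RNN
        msg = W @ q                # message passing from neighbours
        pairwise = msg @ compat    # penalize disagreeing labels
        q = softmax(unary - pairwise)
    return q

q = mean_field(unary)
```

Casting these iterations as RNN steps is what lets the CRF be trained jointly with the CNN by ordinary backpropagation.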

12. Flowing ConvNets for Human Pose Estimation in Videos Tomas Pfister, James Charles, Andrew Zisserman

“We investigate a ConvNet architecture that is able to benefit from temporal context by combining information across the multiple frames using optical flow.”

13. Dense Optical Flow Prediction From a Static Image Jacob Walker, Abhinav Gupta, Martial Hebert


“Given a static image, P-CNN predicts the future motion of each and every pixel in the image in terms of optical flow. Our P-CNN model leverages the data in tens of thousands of realistic videos to train our model. Our method relies on absolutely no human labeling and is able to predict motion based on the context of the scene.”

14. DeepBox: Learning Objectness With Convolutional Networks Weicheng Kuo, Bharath Hariharan, Jitendra Malik

“Our framework, which we call DeepBox, uses convolutional neural networks (CNNs) to rerank proposals from a bottom-up method.”

15. Active Object Localization With Deep Reinforcement Learning Juan C. Caicedo, Svetlana Lazebnik

“This agent learns to deform a bounding box using simple transformation actions, with the goal of determining the most specific location of target objects following top-down reasoning.”

16. Predicting Depth, Surface Normals and Semantic Labels With a Common Multi-Scale Convolutional Architecture David Eigen, Rob Fergus

“We address three different computer vision tasks using a single multiscale convolutional network architecture: depth prediction, surface normal estimation, and semantic labeling.”

17. HD-CNN: Hierarchical Deep Convolutional Neural Networks for Large Scale Visual Recognition Zhicheng Yan, Hao Zhang, Robinson Piramuthu, Vignesh Jagadeesh, Dennis DeCoste, Wei Di, Yizhou Yu

“We introduce hierarchical deep CNNs (HD-CNNs) by embedding deep CNNs into a category hierarchy. An HD-CNN separates easy classes using a coarse category classifier while distinguishing difficult classes using fine category classifiers.”
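The coarse/fine routing can be sketched as a probabilistic mixture: the coarse classifier weights each fine specialist, and each specialist only scores its own subset of classes. The two-coarse-category split, the linear "classifiers", and the random weights below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# 2 coarse categories, each covering 5 of the 10 fine classes.
n_coarse, n_fine = 2, 10
fine_of_coarse = [list(range(5)), list(range(5, 10))]

def hd_cnn_predict(feature, W_coarse, W_fine):
    p_coarse = softmax(W_coarse @ feature)   # which specialist to trust
    p_final = np.zeros(n_fine)
    for k in range(n_coarse):
        # Each fine classifier only scores its own subset of classes.
        p_k = softmax(W_fine[k] @ feature)
        p_final[fine_of_coarse[k]] += p_coarse[k] * p_k
    return p_final

feat = rng.normal(size=32)
W_coarse = rng.normal(size=(n_coarse, 32))
W_fine = [rng.normal(size=(5, 32)) for _ in range(n_coarse)]
p = hd_cnn_predict(feat, W_coarse, W_fine)
```

Because the coarse weighting marginalizes over specialists, the result is still a proper distribution over all fine classes.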

18. FlowNet: Learning Optical Flow With Convolutional Networks Alexey Dosovitskiy, Philipp Fischer, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox

“We construct appropriate CNNs which are capable of solving the optical flow estimation problem as a supervised learning task.”

19. Understanding Deep Features With Computer-Generated Imagery Mathieu Aubry, Bryan C. Russell


“Rendered images are presented to a trained CNN and responses for different layers are studied with respect to the input scene factors.”

20. PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization Alex Kendall, Matthew Grimes, Roberto Cipolla

“Our system trains a convolutional neural network to regress the 6-DOF camera pose from a single RGB image in an end-to-end manner with no need of additional engineering or graph optimisation.”
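Regressing a 6-DOF pose means predicting a 3D position plus an orientation, and PoseNet's training objective combines the two errors with a weighting factor (β) that balances metres against quaternion units. A minimal sketch of that loss; β = 250 is just one of the dataset-dependent values discussed in the paper.

```python
import numpy as np

def pose_loss(pred_xyz, pred_q, true_xyz, true_q, beta=250.0):
    """Position error plus beta-weighted orientation (quaternion) error."""
    q = pred_q / np.linalg.norm(pred_q)   # normalise the predicted quaternion
    return (np.linalg.norm(pred_xyz - true_xyz)
            + beta * np.linalg.norm(q - true_q))

true_xyz = np.array([1.0, 2.0, 3.0])
true_q = np.array([1.0, 0.0, 0.0, 0.0])   # identity rotation
loss_perfect = pose_loss(true_xyz, true_q, true_xyz, true_q)
loss_off = pose_loss(true_xyz + 0.5, true_q, true_xyz, true_q)
```

A ConvNet trained to minimize this loss maps a single RGB image straight to camera pose, with no feature matching or graph optimisation in the loop.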

21. Visual Tracking With Fully Convolutional Networks Lijun Wang, Wanli Ouyang, Xiaogang Wang, Huchuan Lu

“A new approach for general object tracking with fully convolutional neural network.”

Conclusion

While some can argue that the great convergence upon ConvNets is making the field less diverse, it is actually making the techniques easier to comprehend. It is easier to “borrow breakthrough thinking” from one research direction when the core computations are cast in the language of ConvNets. Using ConvNets, properly trained (and motivated!) 21-year-old graduate students are actually able to compete on benchmarks where previously it would take an entire 6-year PhD cycle to compete on a non-trivial benchmark.

See you next week in Chile!


Update (January 13th, 2016)

The following awards were given at ICCV 2015.

Achievement awards

  • PAMI Distinguished Researcher Award (1): Yann LeCun
  • PAMI Distinguished Researcher Award (2): David Lowe
  • PAMI Everingham Prize Winner (1): Andrea Vedaldi for VLFeat
  • PAMI Everingham Prize Winner (2): Daniel Scharstein and Rick Szeliski for the Middlebury Datasets

Paper awards

  • PAMI Helmholtz Prize (1): David Martin, Charles Fowlkes, Doron Tal, and Jitendra Malik for their ICCV 2001 paper “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics”.
  • PAMI Helmholtz Prize (2): Serge Belongie, Jitendra Malik, and Jan Puzicha, for their ICCV 2001 paper “Matching Shapes”.
  • Marr Prize: Peter Kontschieder, Madalina Fiterau, Antonio Criminisi, and Samuel Rota Bulò, for “Deep Neural Decision Forests”.
  • Marr Prize honorable mention: Saining Xie and Zhuowen Tu for “Holistically-Nested Edge Detection”.

For more information about awards, see Sebastian Nowozin’s ICCV-day-2 blog post.

 

Reposted from: http://www.computervisionblog.com/2015/12/iccv-2015-twenty-one-hottest-research.html
