
The Future of Real-Time SLAM and “Deep Learning vs SLAM”

 

Last month's International Conference on Computer Vision (ICCV) was full of Deep Learning techniques, but before we declare an all-out ConvNet victory, let's see how the other "non-learning" geometric side of computer vision is doing. Simultaneous Localization and Mapping, or SLAM, is arguably one of the most important algorithms in Robotics, with pioneering work done by both the computer vision and robotics research communities. Today I'll be summarizing my key points from ICCV's Future of Real-Time SLAM Workshop, which was held on the last day of the conference (December 18th, 2015). Today's post contains a brief introduction to SLAM, a detailed description of what happened at the workshop (with summaries of all 7 talks), and some take-home messages from the Deep Learning-focused panel discussion at the end of the session.

SLAM visualizations. Can you identify any of these SLAM algorithms?

Part I: Why SLAM Matters

Visual SLAM algorithms are able to simultaneously build 3D maps of the world while tracking the location and orientation of the camera (hand-held or head-mounted for AR or mounted on a robot). SLAM algorithms are complementary to ConvNets and Deep Learning: SLAM focuses on geometric problems and Deep Learning is the master of perception (recognition) problems. If you want a robot to go towards your refrigerator without hitting a wall, use SLAM. If you want the robot to identify the items inside your fridge, use ConvNets.

Basics of SfM/SLAM: From point observation and intrinsic camera parameters, the 3D structure of a scene is computed from the estimated motion of the camera. For details, see openMVG website.
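To make the geometry concrete, here is a minimal, self-contained sketch (my own illustration, not from the workshop) of the core operation behind SfM/SLAM: triangulating a 3D point from two calibrated views. The intrinsics, poses, and point below are made-up example values.

```python
# Triangulate one 3D point from two calibrated views using the linear DLT method.
import numpy as np

def triangulate(P1, P2, x1, x2):
    """P1, P2: 3x4 projection matrices K[R|t]; x1, x2: 2D pixel observations."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]                      # homogeneous -> Euclidean

K = np.array([[500.0, 0, 320], [0, 500, 240], [0, 0, 1]])            # example intrinsics
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                    # reference camera
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])    # camera moved 0.5 m to the right

X_true = np.array([0.2, -0.1, 3.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
print(triangulate(P1, P2, x1, x2))           # recovers ~[0.2, -0.1, 3.0]
```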

 

SLAM is a real-time version of Structure from Motion (SfM). Visual SLAM or vision-based SLAM is a camera-only variant of SLAM which forgoes expensive laser sensors and inertial measurement units (IMUs). Monocular SLAM uses a single camera, while non-monocular SLAM typically uses a pre-calibrated fixed-baseline stereo camera rig. SLAM is a prime example of what is called a "Geometric Method" in Computer Vision. In fact, CMU's Robotics Institute splits the graduate-level computer vision curriculum into a Learning-based Methods in Vision course and a separate Geometry-Based Methods in Vision course.

 

Structure from Motion vs Visual SLAM

Structure from Motion (SfM) and SLAM are solving a very similar problem, but while SfM is traditionally performed in an offline fashion, SLAM has been slowly moving towards the low-power / real-time / single RGB camera mode of operation. Many of today's top experts in Structure from Motion work for some of the world's biggest tech companies, helping make maps better. Successful mapping products like Google Maps could not have been built without intimate knowledge of multiple-view geometry, SfM, and SLAM. A typical SfM problem is the following: given a large collection of photos of a single outdoor structure (like the Colosseum), construct a 3D model of the structure and determine the camera poses. The image collection is processed in an offline setting, and large reconstructions can take anywhere between hours and days.

 

 

SfM Software: Bundler is one of the most successful open-source SfM libraries

 

Several popular open-source SfM software libraries are available; Bundler (shown above) and openMVG are two widely used examples.

 

Visual SLAM vs Autonomous Driving

While self-driving cars are one of the most important applications of SLAM, according to Andrew Davison, one of the workshop organizers, SLAM for autonomous vehicles deserves its own research track. (And as we'll see, none of the workshop presenters talked about self-driving cars.) For many years to come, it will make sense to continue studying SLAM from a research perspective, independent of any single Holy-Grail application. While there are just too many system-level details and tricks involved with autonomous vehicles, research-grade SLAM systems require little more than a webcam, knowledge of algorithms, and elbow grease. As a research topic, Visual SLAM is much friendlier to the thousands of early-stage PhD students who'll need years of in-lab experience with SLAM before even starting to think about expensive robotic platforms such as self-driving cars.

 

Google's Self-Driving Car's perception system. From IEEE Spectrum's "How Google's Self-Driving Car Works"

Related: March 2015 blog post, Mobileye’s quest to put Deep Learning inside every new car.

Related: One way Google’s Cars Localize Themselves

Part II: The Future of Real-time SLAM

Now it’s time to officially summarize and comment on the presentations from The Future of Real-time SLAM workshop. Andrew Davison started the day with an excellent historical overview of SLAM called 15 years of vision-based SLAM, and his slides have good content for an introductory robotics course.

 

For those of you who don't know Andy, he is the one and only Professor Andrew Davison of Imperial College London. Best known for his 2003 MonoSLAM system, he was one of the first to show how to build SLAM systems from a single "monocular" camera at a time when just about everybody thought you needed a stereo "binocular" camera rig. More recently, his work has influenced the trajectory of companies such as Dyson and the capabilities of their robotic systems (e.g., the brand-new Dyson 360 Eye).

 

I remember Professor Davison from the Visual SLAM tutorial he gave at the BMVC conference back in 2007. Surprisingly, very little has changed in SLAM compared to the rest of the machine-learning-heavy work being done at the main vision conferences. In the past 8 years, object recognition has undergone 2-3 mini revolutions, while today's SLAM systems don't look much different than they did 8 years ago. The best way to see the progress of SLAM is to take a look at the most successful and memorable systems. In Davison's workshop introduction talk, he discussed some of these exemplary systems, which were produced by the research community over the last 10-15 years:

  • MonoSLAM
  • PTAM
  • FAB-MAP
  • DTAM
  • KinectFusion

 

Davison vs Horn: The next chapter in Robot Vision
Davison also mentioned that he is working on a new Robot Vision book, which should be an exciting treat for researchers in computer vision, robotics, and artificial intelligence. The last Robot Vision book was written by B.K. Horn (1986), and it’s about time for an updated take on Robot Vision.

 

 

A new robot vision book?

While I'll gladly read a tome that focuses on the philosophy of robot vision, personally I would like the book to focus on practical algorithms for robot vision, like the excellent Multiple View Geometry book by Hartley and Zisserman or Probabilistic Robotics by Thrun, Burgard, and Fox. A "cookbook" of visual SLAM problems would be a welcome addition to any serious vision researcher's collection.

 

Related: Davison’s 15-years of vision-based SLAM slides

 

Talk 1: Christian Kerl on Continuous Trajectories in SLAM

The first talk, by Christian Kerl, presented a dense tracking method to estimate a continuous-time trajectory. The key observation is that most SLAM systems estimate camera poses at a discrete number of time steps (either the keyframes, which are spaced several seconds apart, or the individual frames, which are spaced approximately 1/25 s apart).

Continuous Trajectories vs Discrete Time Points. SLAM/SfM usually uses discrete time points, but why not go continuous?
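As a toy illustration of the discrete-vs-continuous distinction (my own sketch, not Kerl's spline-based formulation), the pose between keyframes can be interpolated so that it is queryable at any timestamp; scipy is assumed to be available.

```python
# Discrete keyframe poses vs a continuous-time query: translation is interpolated
# linearly and rotation with SLERP, so the camera pose can be evaluated at any
# timestamp between keyframes. Keyframe values below are invented.
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

key_times = np.array([0.0, 1.0, 2.0])                                # keyframe timestamps (s)
key_trans = np.array([[0, 0, 0], [0.5, 0, 0], [1.0, 0.2, 0]], float)
key_rots = Rotation.from_euler("z", [0, 15, 40], degrees=True)
slerp = Slerp(key_times, key_rots)

def pose_at(t):
    """Continuous pose query for any t inside [key_times[0], key_times[-1]]."""
    trans = np.array([np.interp(t, key_times, key_trans[:, k]) for k in range(3)])
    return slerp(t).as_matrix(), trans

R, p = pose_at(0.37)   # e.g. the exact capture time of one image row
print(p)
```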

 

Much of Kerl’s talk was focused on undoing the damage of rolling shutter cameras, and the system demo’ed by Kerl paid meticulous attention to modeling and removing these adverse rolling shutter effects.

Undoing the damage of rolling shutter in Visual SLAM.

 

Related: Kerl's Dense Continuous-Time Tracking and Mapping slides.
Related: Dense Continuous-Time Tracking and Mapping with Rolling Shutter RGB-D Cameras (C. Kerl, J. Stueckler, D. Cremers), In IEEE International Conference on Computer Vision (ICCV), 2015. [pdf]

 

Talk 2: Semi-Dense Direct SLAM by Jakob Engel

LSD-SLAM came out at ECCV 2014 and is one of my favorite SLAM systems today! Jakob Engel was there to present his system and show the crowd some of the coolest SLAM visualizations in town. LSD-SLAM is an acronym for Large-Scale Direct Monocular SLAM. LSD-SLAM is an important system for SLAM researchers because it does not use corners or any other local features. Direct tracking is performed by image-to-image alignment using a coarse-to-fine algorithm with a robust Huber loss. This is quite different from the feature-based systems out there. Depth estimation uses an inverse-depth parametrization (like many other SLAM systems) and a large number of relatively small-baseline image pairs. Rather than relying on image features, the algorithm is effectively performing "texture tracking". Global mapping is performed by creating and solving a pose-graph "bundle adjustment" optimization problem, and all of this works in real-time. The method is semi-dense because it only estimates depth at pixels near image boundaries. LSD-SLAM output is denser than traditional features, but not fully dense like Kinect-style RGBD SLAM.

LSD-SLAM in Action: LSD-SLAM generates both a camera trajectory and a semi-dense 3D scene reconstruction. This approach works in real-time, does not use feature points as primitives, and performs direct image-to-image alignment.
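To give a flavor of what "direct" alignment means, here is a heavily simplified sketch (my own, not LSD-SLAM code) of a photometric error with Huber weighting: reference pixels are warped into the new frame using their inverse depths and a candidate pose, and the intensity differences are robustly accumulated.

```python
# Warp semi-dense reference pixels (high-gradient points with inverse depths)
# into the new frame under a candidate pose (R, t) and accumulate robustly
# weighted intensity differences. Images are assumed grayscale in [0, 1].
import numpy as np

def huber_weight(r, delta=0.1):
    a = np.abs(r)
    return np.where(a <= delta, 1.0, delta / a)

def photometric_cost(I_ref, I_new, pts, inv_depth, K, R, t):
    """pts: Nx2 integer (u, v) pixels in the reference frame; inv_depth: N values."""
    K_inv = np.linalg.inv(K)
    rays = (K_inv @ np.hstack([pts, np.ones((len(pts), 1))]).T).T   # back-projected rays
    P_ref = rays / inv_depth[:, None]                               # 3D points, reference frame
    P_new = (R @ P_ref.T).T + t                                     # into the new frame
    proj = (K @ P_new.T).T
    uv = proj[:, :2] / proj[:, 2:3]
    u, v = np.round(uv[:, 0]).astype(int), np.round(uv[:, 1]).astype(int)
    ok = (u >= 0) & (u < I_new.shape[1]) & (v >= 0) & (v < I_new.shape[0]) & (P_new[:, 2] > 0)
    r = I_new[v[ok], u[ok]] - I_ref[pts[ok, 1], pts[ok, 0]]
    return np.sum(huber_weight(r) * r**2)   # minimized over (R, t) in coarse-to-fine fashion
```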

Engel gave us an overview of the original LSD-SLAM system as well as a handful of new results, extending their initial system to more creative applications and more interesting deployments. (See paper citations below.)

Related: LSD-SLAM open-source code on github
Related: LSD-SLAM project webpage
Related: LSD-SLAM: Large-Scale Direct Monocular SLAM (J. Engel, T. Schöps, D. Cremers), In European Conference on Computer Vision (ECCV), 2014. [pdf] [video]

 

 

An extension to LSD-SLAM, Omni LSD-SLAM was motivated by the observation that the pinhole camera model does not allow for a large field of view. This work was presented at IROS 2015 (Caruso is first author) and allows a large field of view (ideally more than 180 degrees). From Engel's presentation it was pretty clear that you can perform ballerina-like motions (extreme rotations) while walking around your office and holding the camera. This is one of those worst-case scenarios for narrow field-of-view SLAM, yet it works quite well in Omni LSD-SLAM.

 

Omnidirectional LSD-SLAM Model. See Engel’s Semi-Dense Direct SLAM presentation slides.

 

Related: Large-Scale Direct SLAM for Omnidirectional Cameras (D. Caruso, J. Engel, D. Cremers), In International Conference on Intelligent Robots and Systems (IROS), 2015. [pdf] [video]

Stereo LSD-SLAM is an extension of LSD-SLAM to a binocular camera rig. This helps in getting the absolute scale, initialization is instantaneous, and there are no issues with strong rotation. While monocular SLAM is very exciting from an academic point of view, if your robot is a $30,000 car or a $10,000 drone prototype, you should have a good reason not to use a two(or more)-camera rig. Stereo LSD-SLAM performs quite competitively on SLAM benchmarks.

Stereo LSD-SLAM. Excellent results on KITTI vehicle-SLAM dataset.

 

Stereo LSD-SLAM is quite practical, optimizes a pose graph in SE(3), and includes a correction for auto exposure. The goal of the auto-exposure correction is to make the error function invariant to affine lighting changes. The underlying parameters of the color-space affine transform are estimated during matching, but thrown away when computing the image-to-image error. From Engel's talk, outliers (often caused by over-exposed image pixels) tend to be a problem, and much care needs to be taken to handle their effects.

Related: Large-Scale Direct SLAM with Stereo Cameras (J. Engel, J. Stueckler, D. Cremers), In International Conference on Intelligent Robots and Systems (IROS), 2015. [pdf] [video]
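As an illustration of the affine lighting correction idea described above (a minimal sketch under my own assumptions, not the Stereo LSD-SLAM implementation), one can fit per-frame affine brightness parameters by least squares and subtract them out of the photometric residual:

```python
# Fit I2 ≈ a * I1 + b over matched pixel intensities, then use the fitted (a, b)
# to form a lighting-invariant residual. Values below are a made-up example.
import numpy as np

def affine_light_params(i1, i2):
    """i1, i2: 1D arrays of intensities at corresponding pixels."""
    A = np.stack([i1, np.ones_like(i1)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, i2, rcond=None)
    return a, b

i1 = np.array([10.0, 50.0, 90.0, 130.0, 200.0])
i2 = 1.2 * i1 + 15 + np.array([0.5, -0.3, 0.2, -0.1, 0.4])   # brighter exposure + noise
a, b = affine_light_params(i1, i2)
residual = i2 - (a * i1 + b)   # photometric error after removing the affine change
print(round(a, 3), round(b, 3), residual)
```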

Later in his presentation, Engel gave us a sneak peek at new research on integrating both stereo and inertial sensors. For details, you'll have to keep hitting refresh on arXiv or talk to Usenko/Engel in person. On the applications side, Engel's presentation included updated videos of an autonomous quadrotor driven by LSD-SLAM. The flight starts with an up-down motion to get the scale estimate, and an octomap is used to estimate free space so that the quadrotor can navigate on its own. Stay tuned for an official publication…

Quadrotor running Stereo LSD-SLAM.

See Engel’s quadrotor youtube video from 2012.

The story of LSD-SLAM is also the story of feature-based vs direct methods, and Engel gave both sides of the debate a fair treatment. Feature-based methods are engineered to work on top of Harris-like corners, while direct methods use the entire image for alignment. Feature-based methods are faster (as of 2015), but direct methods are good for parallelism. Outliers can be retroactively removed from feature-based systems, while direct methods are less flexible with respect to outliers. Rolling shutter is a bigger problem for direct methods, and it makes sense to use a global shutter or a rolling-shutter model (see Kerl's work). Feature-based methods require making decisions using incomplete information, but direct methods can use much more information. Feature-based methods have no need for good initialization, while direct methods need some clever tricks for initialization. There are only about 4 years of research on direct methods versus 20+ years on sparse methods. Engel is optimistic that direct methods will one day rise to the top, and so am I.

Feature-based vs direct methods of building SLAM systems. Slide from Engel’s talk.

At the end of Engel’s presentation, Davison asked about semantic segmentation and Engel wondered whether semantic segmentation can be performed directly on semi-dense “near-image-boundary” data.  However, my personal opinion is that there are better ways to apply semantic segmentation to LSD-like SLAM systems. Semi-dense SLAM can focus on geometric information near boundaries, while object recognition can focus on reliable semantics away from the same boundaries, potentially creating a hybrid geometric/semantic interpretation of the image.

 

Related: Engel’s Semi-Dense Direct SLAM presentation slides

Talk 3: Sattler on The challenges of Large-Scale Localization and Mapping

Torsten Sattler gave a talk on large-scale localization and mapping. The motivation for this work is to perform 6-DOF localization inside an existing map, especially for mobile localization. One of the key points in the talk was that when you are using traditional feature-based methods, storing your descriptors soon becomes very costly. Techniques such as visual vocabularies (remember product quantization?) can significantly reduce the memory overhead, and with clever optimization, at some point storing the descriptors is no longer the memory bottleneck.
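A back-of-the-envelope sketch of why vocabulary-style quantization saves memory (sizes and the brute-force nearest-word search below are my own simplifications):

```python
# Replace each raw descriptor (128 float32 values = 512 bytes) with the index of
# its nearest visual word (4 bytes). Vocabulary and descriptors are random here.
import numpy as np

rng = np.random.default_rng(0)
vocab = rng.normal(size=(1000, 128)).astype(np.float32)        # pretrained "words"
descriptors = rng.normal(size=(2000, 128)).astype(np.float32)  # features to compress

# brute-force nearest-word assignment via ||d - v||^2 = ||d||^2 + ||v||^2 - 2 d.v
# (real systems use approximate nearest-neighbor search / product quantization)
d2 = (descriptors**2).sum(1)[:, None] + (vocab**2).sum(1)[None, :] \
     - 2.0 * descriptors @ vocab.T
word_ids = d2.argmin(axis=1).astype(np.uint32)

print(descriptors.nbytes, "bytes raw vs", word_ids.nbytes, "bytes quantized")
```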

 

Another important take-home message from Sattler's talk is that the number of inliers is not actually a good confidence measure for camera pose estimation. When the feature points are all concentrated in a single part of the image, the estimated camera pose can be kilometers off! A better measure of confidence is the "effective inlier count", which looks at the area spanned by the inliers as a fraction of the total image area. What you really want is feature matches from all over the image: if the information is spread out across the image, you get a much better pose estimate.
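Here is a rough sketch of the idea as I understood it (my own interpretation, using the convex hull of the inliers as a proxy for spatial coverage):

```python
# Rough "effective inlier" score: raw inlier count weighted by the fraction of
# the image covered by the inliers' convex hull. Sizes and numbers are made up.
import numpy as np
from scipy.spatial import ConvexHull

def effective_inlier_score(inlier_px, image_shape):
    """inlier_px: Nx2 array of (u, v) inlier pixel locations; image_shape: (h, w)."""
    if len(inlier_px) < 3:
        return 0.0
    hull_area = ConvexHull(inlier_px).volume      # for 2D input, .volume is the area
    coverage = hull_area / (image_shape[0] * image_shape[1])
    return len(inlier_px) * coverage              # many inliers AND well spread out

rng = np.random.default_rng(0)
clustered = rng.uniform(0, 60, size=(200, 2))              # all in one corner
spread = rng.uniform(0, 1, size=(200, 2)) * [640, 480]     # spread over a 640x480 image
print(effective_inlier_score(clustered, (480, 640)))       # small score
print(effective_inlier_score(spread, (480, 640)))          # much larger score
```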

 

Sattler's take on the future of real-time SLAM is the following: we should focus on compact map representations, we should get better at understanding camera pose-estimate confidence (e.g., down-weighting features from trees), and we should work on more challenging scenes (such as worlds with planar structures and nighttime localization against daytime maps).

Mobile Localisation: Sattler’s key problem is localizing yourself inside a large city with a single smartphone picture

 

Related: Scalable 6-DOF Localization on Mobile Devices. Sven Middelberg, Torsten Sattler, Ole Untzelmann, Leif Kobbelt. In ECCV 2014. [pdf]
Related: Torsten Sattler's The challenges of large-scale localisation and mapping slides

 

Talk 4: Mur-Artal on Feature-based vs Direct-Methods

Raúl Mur-Artal, the creator of ORB-SLAM, dedicated his entire presentation to the feature-based vs direct-method debate in SLAM, and he's definitely on the feature-based side. ORB-SLAM is available as an open-source SLAM package and it is hard to beat. In his evaluation of ORB-SLAM vs PTAM, it seems that PTAM actually fails quite often (at least on the TUM RGB-D benchmark). LSD-SLAM errors are also much higher on the TUM RGB-D benchmark than expected.

 

Feature-Based SLAM vs Direct SLAM. See Mur-Artal’s Should we still do sparse feature based SLAM? presentation slides

 

Related: Mur-Artal’s Should we still do sparse-feature based SLAM? slides
Related: ORB-SLAM: A Versatile and Accurate Monocular SLAM System. R. Mur-Artal, J. M. M. Montiel and J. D. Tardos. IEEE Transactions on Robotics, 2015. [pdf]
Related: ORB-SLAM open-source code on github; project website

Talk 5: Project Tango and Visual loop-closure for image-2-image constraints

Simply put, Google's Project Tango is the world's first attempt at commercializing SLAM. Simon Lynen from Google Zurich (formerly ETH Zurich) came to the workshop with a Tango live demo (on a tablet) and a presentation on what's new in the world of Tango. In case you don't already know, Google wants to put SLAM capabilities into the next generation of Android devices.

Google’s Project Tango needs no introduction.

The Project Tango presentation discussed a new way of doing loop closure by finding certain patterns in the image-to-image matching matrix. This comes from the "Placeless Place Recognition" work. They also do online bundle adjustment with vision-based loop closure.

Loop Closure inside a Project Tango? Lynen et al’s Placeless Place Recognition. The image-to-image matrix reveals a new way to look for loop-closure. See the algorithm in action in this youtube video.
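A toy sketch of the underlying idea (my own simplification; the real Placeless Place Recognition algorithm searches for structured patterns with far more sophisticated matching): build a frame-to-frame similarity matrix and flag strong off-diagonal entries as loop-closure candidates.

```python
# Toy loop-closure candidate search over an image-to-image similarity matrix.
# `descriptors` is assumed to be one global descriptor per frame (e.g. from a
# bag-of-words or pooled-feature representation); all parameters are made up.
import numpy as np

def loop_closure_candidates(descriptors, min_gap=30, threshold=0.9):
    D = descriptors / np.linalg.norm(descriptors, axis=1, keepdims=True)
    S = D @ D.T                                   # frame-to-frame cosine similarity
    candidates = []
    n = len(S)
    for i in range(n):
        for j in range(i + min_gap, n):           # skip temporally adjacent frames
            if S[i, j] > threshold:
                candidates.append((i, j, float(S[i, j])))
    return candidates
```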

The Project Tango folks are also working on combining multiple crowd-sourced maps at Google, where the goal is to combine multiple mini-maps created by different people using Tango-equipped devices.

 

Simon showed a video of mountain-bike trail tracking, which is actually quite difficult in practice. The idea is to go down a mountain-bike trail using a Tango device and create a map; the follow-up goal is to have a separate person go down the trail and track against that map. This currently "semi-works" when there are a few hours between the map-building and tracking steps, but won't work across weeks/months/etc.

During the Tango-related discussion, Richard Newcombe pointed out that the "features" used by Project Tango are quite primitive with respect to getting a deeper understanding of the environment, and it appears that Project Tango-like methods won't work on outdoor scenes where the world is plagued by non-rigidity, massive illumination changes, etc. So should we expect different systems designed for outdoor use, or will Project Tango remain an indoor mapping device?

 

Related: Placeless Place Recognition. Lynen, S.; Bosse, M.; Furgale, P.; Siegwart, R. In 3DV 2014.

Related: Google I/O talk from May 29, 2015 about Tango

 

Talk 6: ElasticFusion is Dense SLAM without a pose-graph

ElasticFusion is a dense SLAM technique which requires an RGBD sensor like the Kinect. Taking only 2-3 minutes to obtain a high-quality 3D scan of a single room is pretty cool. A pose graph is used behind the scenes of many (if not most) SLAM systems, but this technique takes a different (map-centric) approach. The approach focuses on building a map, and the trick is that the map is deformable, hence the name ElasticFusion. The "Fusion" part of the name is an homage to KinectFusion, which was one of the first high-quality Kinect-based reconstruction pipelines. Surfels are used as the underlying primitives.

Image from Kintinuous, an early version of Whelan’s Elastic Fusion.
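For readers unfamiliar with surfels, here is a minimal sketch of the primitive and a typical weighted-average update (fields and the fusion rule are my simplification of the usual formulation, not ElasticFusion's exact model):

```python
# Minimal surfel primitive and a weighted-average fusion update.
from dataclasses import dataclass
import numpy as np

@dataclass
class Surfel:
    position: np.ndarray   # 3D location (meters)
    normal: np.ndarray     # unit surface normal
    radius: float          # disk radius covering the local surface patch
    confidence: float      # grows each time the surfel is re-observed
    timestamp: int         # last frame in which the surfel was updated

def fuse(s: Surfel, new_pos, new_normal, frame_id, weight=1.0):
    """Running weighted average, roughly how fusion-style systems refine a surfel."""
    w = s.confidence
    s.position = (w * s.position + weight * np.asarray(new_pos)) / (w + weight)
    n = w * s.normal + weight * np.asarray(new_normal)
    s.normal = n / np.linalg.norm(n)
    s.confidence += weight
    s.timestamp = frame_id
    return s
```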

 

Recovering light sources: we were given a sneak peek at new unpublished work from the Imperial College London / Dyson Robotics Lab. The idea is that by detecting the light-source direction and specularities, you can improve 3D reconstruction results. The cool videos showed light-source recovery working for up to 4 separate lights.

 

Related: Map-centric SLAM with ElasticFusion presentation slides
Related: ElasticFusion: Dense SLAM Without A Pose Graph. Whelan, Thomas and Leutenegger, Stefan and Salas-Moreno, Renato F and Glocker, Ben and Davison, Andrew J. In RSS 2015.

 

Talk 7: Richard Newcombe’s DynamicFusion
Richard Newcombe (whose recently formed company was acquired by Oculus) was the last presenter. It's really cool to see the person behind DTAM, KinectFusion, and DynamicFusion now working in the VR space.



 

 

Newcombe's DynamicFusion algorithm. The technique won the prestigious CVPR 2015 Best Paper Award, and to see it in action just take a look at the authors' DynamicFusion YouTube video.

 

 

Related: DynamicFusion: Reconstruction and Tracking of Non-rigid Scenes in Real-Time. Richard A. Newcombe, Dieter Fox, Steven M. Seitz. In CVPR 2015. [pdf] [Best-Paper winner]

Related: SLAM++: Simultaneous Localisation and Mapping at the Level of Objects. Renato F. Salas-Moreno, Richard A. Newcombe, Hauke Strasdat, Paul H. J. Kelly and Andrew J. Davison (CVPR 2013)
Related: KinectFusion: Real-Time Dense Surface Mapping and Tracking. Richard A. Newcombe, Shahram Izadi, Otmar Hilliges, David Molyneaux, David Kim, Andrew J. Davison, Pushmeet Kohli, Jamie Shotton, Steve Hodges, Andrew Fitzgibbon (ISMAR 2011, Best Paper Award!)

Workshop Demos

During the demo sessions (held in the middle of the workshop), many of the presenters showed off their SLAM systems in action. Many of these systems are available as open-source (free for non-commercial use?) packages, so if you're interested in real-time SLAM, downloading the code is worth a shot. However, the one demo which stood out was Andrew Davison's showcase of his MonoSLAM system from 2004. Andy had to revive his 15-year-old laptop (which was running Redhat Linux) to show off his original system, running on the original hardware. If the computer vision community one day decides to hold a "retro-vision" demo session, I'm just going to go ahead and nominate Andy for the best-paper prize, right now.

Andy's Retro-Vision SLAM Setup (Pictured on December 18th, 2015)

 

It was interesting to watch the SLAM system experts wave their USB cameras around, showing their systems build 3D maps of the desk-sized area around their laptops. If you carefully look at the way these experts move the camera around (i.e., smooth circular motions), you can almost tell how long a person has been working with SLAM. When the non-experts hold the camera, the probability of tracking failure is significantly higher.

 

I had the pleasure of speaking with Andy during the demo session, and I was curious which line of work (in the past 15 years) surprised him the most. His reply was that PTAM, which showed how to perform real-time bundle adjustment, surprised him the most. The PTAM system was essentially a MonoSLAM++ system, but the significantly improved tracking results were due to taking a heavyweight algorithm (bundle adjustment) and making it real-time — something which Andy did not believe was possible in the early 2000s.
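For context, the core of bundle adjustment is a nonlinear least-squares problem over reprojection errors; the sketch below (illustrative only, nothing like PTAM's actual implementation) shows the residual function one would hand to a solver such as scipy.optimize.least_squares.

```python
# Illustrative reprojection residual for bundle adjustment (not PTAM's code).
# params packs all keyframe poses (rotation vector + translation, 6 values each)
# followed by all 3D map points; handing `residuals` to a nonlinear least-squares
# solver (e.g. scipy.optimize.least_squares) jointly refines poses and points.
import numpy as np
from scipy.spatial.transform import Rotation

def residuals(params, n_cams, n_pts, K, observations):
    """observations: list of (cam_idx, pt_idx, u, v) pixel measurements."""
    cams = params[:n_cams * 6].reshape(n_cams, 6)
    pts = params[n_cams * 6:].reshape(n_pts, 3)
    res = []
    for ci, pi, u, v in observations:
        R = Rotation.from_rotvec(cams[ci, :3]).as_matrix()
        p_cam = R @ pts[pi] + cams[ci, 3:]          # world point into the camera frame
        proj = K @ p_cam
        res.extend([proj[0] / proj[2] - u, proj[1] / proj[2] - v])
    return np.array(res)

# PTAM's insight was to run this heavy optimization in a background thread over
# keyframes only, so that per-frame tracking stays real-time.
```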

Part III: Deep Learning vs SLAM

The SLAM panel discussion was a lot of fun. Before we jump to the important Deep Learning vs SLAM discussion, I should mention that each of the workshop presenters agreed that semantics are necessary to build bigger and better SLAM systems. There were lots of interesting mini-conversations about future directions. During the debates, Marc Pollefeys (a well-known researcher in SfM and multiple-view geometry) reminded everybody that Robotics is the killer application of SLAM and suggested we keep an eye on the prize. This is quite surprising, since SLAM was traditionally applied to Robotics problems, but the lack of Robotics success in the last few decades (Google Robotics?) has shifted the focus of SLAM away from robots and towards large-scale map building (à la Google Maps) and Augmented Reality. Nobody at this workshop talked about robots.

Integrating semantic information into SLAM

There was a lot of interest in incorporating semantics into today's top-performing SLAM systems. When it comes to semantics, the SLAM community is unfortunately stuck in the world of bags-of-visual-words, and doesn't have new ideas on how to integrate semantic information into their systems. On the other hand, we're now seeing real-time semantic segmentation demos (based on ConvNets) popping up at CVPR/ICCV/ECCV, and in my opinion SLAM needs Deep Learning as much as the other way around.

Integrating semantics into SLAM is often talked about, but it is easier said than done.

Figure 6.9 (page 142) from Moreno’s PhD thesis: Dense Semantic SLAM

 

“Will end-to-end learning dominate SLAM?”

Towards the end of the SLAM workshop panel, Dr. Zeeshan Zia asked a question which startled the entire room and led to a memorable, energy-filled discussion. You should have seen the look on the panelists' faces. It was a bunch of geometers being thrown a fireball of deep learning. Their facial expressions suggested a mix of bewilderment, anger, and disgust. "How dare you question us?" they were thinking. And it is only during these fleeting moments that we can truly appreciate the conference experience. Zia's question was essentially: Will end-to-end learning soon replace the mostly manual labor involved in building today's SLAM systems?

Zia's question is very important because end-to-end trainable systems have been slowly creeping up on many advanced computer science problems, and there's no reason to believe SLAM will be an exception. A handful of the presenters pointed out that current SLAM systems rely on too much geometry for a pure deep-learning-based SLAM system to make sense: we should use learning to make the point descriptors better, but leave the geometry alone. Just because you can use deep learning to make a calculator, it doesn't mean you should.

 

Learning Stereo Similarity Functions via ConvNets, by Yann LeCun and collaborators.

While many of the panel speakers responded with a somewhat affirmative "no", it was Newcombe who surprisingly championed what the marriage of Deep Learning and SLAM might look like.

Newcombe’s Proposal: Use SLAM to fuel Deep Learning
Although Newcombe didn’t provide much evidence or ideas on how Deep Learning might help SLAM, he provided a clear path on how SLAM might help Deep Learning.  Think of all those maps that we’ve built using large-scale SLAM and all those correspondences that these systems provide — isn’t that a clear path for building terascale image-image “association” datasets which should be able to help deep learning? The basic idea is that today’s SLAM systems are large-scale “correspondence engines” which can be used to generate large-scale datasets, precisely what needs to be fed into a deep ConvNet.
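A hedged sketch of what such a "correspondence engine" could look like (names and data layout are my own assumptions; visibility and occlusion checks are ignored): project each SLAM map point into every keyframe that observes it and harvest the resulting pixel-pixel pairs as training labels.

```python
# Project each SLAM map point into all keyframes that see it and emit the
# resulting pixel-pixel pairs as "free" correspondence labels for learning.
import numpy as np

def project(K, R, t, X):
    p = K @ (R @ X + t)
    return p[:2] / p[2] if p[2] > 0 else None      # None if the point is behind the camera

def correspondence_pairs(map_points, keyframe_poses, K):
    """keyframe_poses: list of (R, t) world-to-camera transforms from the SLAM system."""
    for X in map_points:
        views = []
        for frame_id, (R, t) in enumerate(keyframe_poses):
            uv = project(K, R, t, X)
            if uv is not None:
                views.append((frame_id, uv))
        for a in range(len(views)):                # every pair of views of this point
            for b in range(a + 1, len(views)):
                yield views[a], views[b]
```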

Concluding Remarks

There is quite a large disconnect between the kind of work done at the mainstream ICCV conference (heavy on machine learning) and the kind of work presented at the real-time SLAM workshop (heavy on geometric methods like bundle adjustment). The mainstream Computer Vision community has witnessed several mini-revolutions within the past decade (e.g., Dalal-Triggs, DPM, ImageNet, ConvNets, R-CNN) while the SLAM systems of today don’t look very different than they did 8 years ago. The Kinect sensor has probably been the single largest game changer in SLAM, but the fundamental algorithms remain intact.

Integrating semantic information: The next frontier in Visual SLAM.

Brain image from Arwen Wallington‘s blog post.

 

Today's SLAM systems help machines geometrically understand the immediate world (i.e., build associations in a local coordinate system) while today's Deep Learning systems help machines reason categorically (i.e., build associations across distinct object instances). In conclusion, I share Newcombe and Davison's excitement about Visual SLAM, as vision-based algorithms are going to turn Augmented and Virtual Reality into billion-dollar markets. However, we should not forget to keep our eyes on the "trillion-dollar" market, the one that's going to redefine what it means to "work": namely, Robotics. The day of Robot SLAM will come soon.

 

Reposted from: http://www.computervisionblog.com/2016/01/why-slam-matters-future-of-real-time.html

 
