Interview with iDST Deputy Managing Director Hua Xiansheng: City Brain – Comprehensive Urban Cognition

Editor's Note: From October 11 to 14, 2017, The Computing Conference will be held once again in Hangzhou's Yunqi township (get your tickets now!). As one of the world's most influential technology expos, this conference will include brilliant lectures by many Alibaba Group experts and industry leaders. Starting today, the Yunqi Community will interview a series of conference guests.

The first guest we interviewed was Alibaba iDST Deputy Managing Director Hua Xiansheng. During the October Computing Conference, he will discuss the latest trends in the computer vision field and the latest progress of the City Brain.

Hua Xiansheng is a leading international expert in the field of visual recognition and search, and has previously served as the program committee chair for the ACM Multimedia Conference and other organizations. Dr. Hua is also a Thousand Talents Program expert, IEEE Fellow, ACM Distinguished Scientist, and MIT TR35 Young Innovator Award recipient.

In 2015, Dr. Hua left the Microsoft Research Institute for Alibaba. In the search business department, he was responsible for optimizing image-based product search technology and his team developed Pailitao, the image search function for the Taobao app. In April 2016, Dr. Hua also joined Alibaba's artificial intelligence research institute iDST, where he directed the research work of the visual computing team. At present, the City Brain project is one of the projects under his charge.

At the Conference on Computer Vision and Pattern Recognition (CVPR 2017) held at the end of July, Dr. Hua, as the director of iDST's visual computing team, delivered a keynote speech titled "Practices of Large-Scale Target Re-Identification", which brought up the City Brain project.

Finding Value in Heterogeneous City Data

The City Brain project was publicly announced at the 2016 Hangzhou Computing Conference. Wang Jian, the then chairman of the Alibaba Group's technical committee, introduced City Brain using the following words: "City Brain has Alibaba Cloud's ET artificial intelligence technology at its core to perform a comprehensive real-time analysis across the city, automate public resource allocation, and fix problems as they arise during city operation. City Brain will evolve into a super artificial intelligence for city governance."

Today, one year has passed, but City Brain remains a mysterious project to outsiders. If you want to use a plain and dated term to define it, you could call it a smart city. However, City Brain is actually far more advanced than what we usually refer to as a smart city.

In the words of Dr. Hua, at its core, City Brain uses big data and big computing to mine valuable information from large volumes of heterogeneous city data.

What is heterogeneous city data? It has two main features:

First, city data is a combination of visual data, public transport data, GPS data, and other heterogeneous data. Naturally, visual data makes up the largest and most important part of it. Second, city data volumes are huge. For example, a city may have hundreds of thousands of cameras, which produce massive amounts of data around the clock. Therefore, the inherent advantage of city data is its massive volume. The mission of City Brain is to find a way to extract valuable information from the data.

According to Dr. Hua, "in the past, the value of these data was not fully explored and the deployment and O&M costs for this many devices were very high. However, the value of such data goes far beyond traditional applications such as license plate identification and traffic fines."

City Brain is creating cities with data intelligence. By providing comprehensive, real-time, and complete awareness, it can recognize vehicle shapes, models, trajectories, and speeds, or perceive pedestrians and cyclists. On such a basis, the project can improve decision-making, make forecasts, and intervene. At present, the value of city data is gradually becoming more apparent.

Dr. Hua used traffic conditions as an example: When an emergency arises, City Brain can immediately find the relevant data, such as suspect vehicles, cars involved in accidents, and even criminal suspects. After analyzing the relevant data, it can also optimize traffic for the entire city. Going one step further, City Brain can even predict such a situation before it happens. For instance, it can tell you where traffic jams will occur in the next 10 minutes. City Brain is also capable of making predictions much earlier and deploying police and medical resources in advance. It can even prevent traffic accidents by instituting preemptive traffic control and policing.

Dr. Hua added that the comprehensive perception of city data is possible due to two main technologies. First, improved computing power, such as cloud computing, GPUs, and FPGAs, allows us to compute massive volumes of data. For example, we can simultaneously process video feeds from thousands, tens of thousands, or even more roads in real time. Second, deep learning algorithms are critical to the progress in the field of computer vision.

Dr. Hua's team has already made many breakthroughs in algorithms. On the server end, they are using more optimized algorithms for vehicle detection and license plate recognition with greater precision. At the same time, they can monitor accidents in real time and predict traffic conditions. City Brain has been deployed and used in the Hangzhou and Xiaoshan metropolitan areas for quite some time.

"We can perform large-scale video processing, but both efficiency and stability pose major challenges. Over the better part of this year, as a result of ongoing iteration and optimization efforts in the project, its overall processing speed has increased by a factor of 20."

From Perception to Search

Without a doubt, computer vision is both the most important and most challenging aspect in the City Brain project. Dr. Hua stated that visual data is the core of heterogeneous city data. It is more comprehensive than other data. Therefore, the City Brain project invests the most time and energy in visual technology.

"From the coverage perspective, GPS data prevails over visual data, because GPS data is essentially cross-section data. However, visual data is more comprehensive and can give us complete details of what is happening at any given intersection."

However, besides the fundamental aspects of visual perception and recognition, City Brain must also deal with issues related to the structure of visual data, such as search.

Just like Taobao's image search feature, City Brain must index images in real time. One of the major breakthroughs of this project is indexing and searching visual data feeds from cameras across a city.

According to Dr. Hua, from the technical perspective, the overall approach to city image search is similar to Taobao's image search feature. First you need to know where your target is and detect it. Then, you need to identify the vehicle, person, or other moving target and the target's properties. Finally, you need to extract a feature, a high-dimensional vector representing the essential characteristics of this target.
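The last step above, representing each detected target as a high-dimensional feature vector, is what makes search possible: finding a target becomes a nearest-neighbor lookup among vectors. The following is only a minimal sketch under stated assumptions, not City Brain's or Taobao's actual implementation. The class `FeatureIndex`, the target IDs, and the four-dimensional vectors are all hypothetical; a real system would obtain features from a trained deep network and would use an approximate nearest-neighbor index rather than the brute-force scan shown here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class FeatureIndex:
    """Hypothetical brute-force index over feature vectors (illustration only)."""

    def __init__(self):
        self._entries = []  # list of (target_id, unit-length feature vector)

    def add(self, target_id, feature):
        """Store one detected target together with its extracted feature."""
        self._entries.append((target_id, l2_normalize(feature)))

    def search(self, query, top_k=3):
        """Return the top_k (target_id, cosine similarity) pairs, best first."""
        q = l2_normalize(query)
        scored = [
            (target_id, sum(a * b for a, b in zip(q, vec)))
            for target_id, vec in self._entries
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


# Made-up features standing in for the output of a detection + embedding model.
index = FeatureIndex()
index.add("vehicle_a", [0.9, 0.1, 0.0, 0.2])
index.add("vehicle_b", [0.1, 0.8, 0.3, 0.0])
index.add("pedestrian_c", [0.0, 0.2, 0.9, 0.1])

# Query with a feature that resembles vehicle_a, e.g. seen by another camera.
results = index.search([0.85, 0.15, 0.05, 0.25], top_k=2)
for target_id, similarity in results:
    print(target_id, round(similarity, 3))
```

In this toy setup the query ranks `vehicle_a` first because its vector points in nearly the same direction. The hard part described in the article lies upstream: learning features that separate two cars of the same model, which a similarity measure alone cannot do.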

However, city image searches are much more complex than product searches. As far as the customer is concerned, different instances of the same product are essentially identical. However, cars of the same model owned by different people cannot be considered identical. In addition, human feature description and search are another major challenge. If a person's facial image is not clear, this issue becomes even trickier. These are the real challenges that need to be overcome in actual applications.

Of course, the iDST visual team is already at the forefront of the industry. Their results achieved in open test sets have already greatly exceeded the best publicly available results.

Commercialization of AI

With artificial intelligence development in full swing, the past few years have seen the emergence of many AI startups, both in China and abroad. Successful commercialization is the best standard for measuring the strength of these companies.

Dr. Hua believes that successful AI commercialization must meet five criteria:

First, competent algorithms serve as a foundation.

Second, related data must be available.

Third, there must be a large enough user base.

Fourth, there also needs to be a platform with powerful computing capabilities and a sound system architecture (of course, cloud computing has already lowered the barrier to entry for many startups).

Fifth, there must be a good business model.

At present, most artificial intelligence companies focus on visual applications. It would be no exaggeration to say that the field of computer vision is already a "red ocean". It is undeniable that, among the numerous artificial intelligence technologies, computer vision is being commercialized the fastest.

Dr. Hua predicts that there will be five main visual application trends in the future:

The first is transportation security, which is also a main focus of City Brain.

Then, there is rich media, the use of visual methods to find valuable information in large volumes of video or image data.

The third trend will be medical imaging. Although adoption of such technologies in the medical community may take longer, they will certainly be an important area in the future.

The fourth trend is industrial vision. In the future, cameras will be able to replace manual visual inspections and judgments in most scenarios. This is a field that still needs further exploration.

In addition, the field of terminal-based visual intelligence is quite promising, including chips and some visual-based applications.

It is not hard to see that the fields described above are exactly the R&D focuses of Alibaba Cloud's City Brain, Medical Brain, and Industrial Brain. However, the differences between the different fields are also quite obvious. During the interview, Dr. Hua repeatedly stressed the importance of in-depth study of each industry. Artificial intelligence is gradually penetrating into different industries and sectors. However, to realize the full potential of this technology, in addition to laying the foundation with data and algorithms, in-depth research into specific application scenarios is also of critical importance.

Below we have attached the transcript of our interview with Dr. Hua:

Yunqi: What are the limitations of deep learning when applied to computer vision? In the future, will it be superseded by new technologies?

Dr. Hua: In fact, there are many limitations. Deep learning looks wonderful, but there are still many issues that need to be addressed. For example, facial recognition works great on a small scale, and its results are passable when dealing with thousands of individuals. However, any further expansion of the scale is very difficult to achieve. Also, video quality, resolution, and obstructions all limit the effectiveness of recognition. In these aspects, machines still cannot compete with humans. Deep learning is highly reliant on data. Deep learning applications using small data need to be further explored.

In recent years, deep learning has been gaining momentum. However, in the future, there will surely be new technologies to challenge its position.

Yunqi: One of our papers, entitled "Video to Shop: Matching Clothes in Videos to Online Shopping Images", was included in last month's CVPR. Can you talk about the innovative ideas behind this application?

Dr. Hua: This application uses cutting-edge clothing detection and tracking technologies. To address the multiple angle, multiple scenario, and obstruction challenges in detection of the clothing worn by celebrities, we came up with a Reconfigurable Deep Tree structure. It relies on similarity matching between multiple frames to deal with obstructions, fuzziness, and other problems in individual frames. This structure can be considered an extension of the existing attention model and can be used to solve the problem of multi-model fusion.
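To make the idea of matching across multiple frames concrete, here is a deliberately simplified sketch. It is not the Reconfigurable Deep Tree structure described above, which is a learned model; it swaps in a plain weighted average to show why combining frames helps when one frame is obstructed or blurry. The function names, the candidate item names, the similarity scores, and the per-frame quality weights are all hypothetical.

```python
def fuse_frame_similarities(frame_scores, frame_weights):
    """Weighted average of per-frame similarity scores for one candidate item."""
    if len(frame_scores) != len(frame_weights):
        raise ValueError("need exactly one weight per frame")
    total_weight = sum(frame_weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(s * w for s, w in zip(frame_scores, frame_weights))
    return weighted / total_weight


def best_match(candidates, frame_weights):
    """Return (best item, fused score per item) across all candidate items."""
    fused = {
        item: fuse_frame_similarities(scores, frame_weights)
        for item, scores in candidates.items()
    }
    return max(fused, key=fused.get), fused


# Hypothetical similarity of three video frames to two shop items.
# The garment is mostly hidden in frame 2, so its scores there are unreliable.
candidates = {
    "dress_1": [0.9, 0.1, 0.8],
    "dress_2": [0.6, 0.7, 0.6],
}

# Treating every frame equally lets the obstructed frame sway the result.
uniform_best, _ = best_match(candidates, [1.0, 1.0, 1.0])

# Down-weighting the obstructed frame (assumed quality weight 0.1) changes it.
weighted_best, _ = best_match(candidates, [1.0, 0.1, 1.0])

print(uniform_best, weighted_best)
```

In this made-up example the uniform average prefers `dress_2`, while down-weighting the obstructed frame prefers `dress_1`. A learned fusion model decides such weights from the data instead of having them set by hand.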

Yunqi: In your opinion, what future changes can be predicted in the computer vision field?

Dr. Hua: It depends on which level you want to talk about. If we are talking about technology, I think the evolution of deep learning itself will be an important change. For example, GANs may be used in more scenarios. Large-scale video mining will be another important direction. From a higher level, if we look at the field from the perspective of intelligent applications, I think that more in-depth research into specific industries will truly jump-start commercialization of artificial intelligence, or the so-called visual intelligence. Then this technology will realize its true impact and potential. Practice and exploration in this area will in turn promote the further development of visual technologies. Only by putting this technology into practice can we discover what challenges remain to be addressed. After all, the real-world competition can be very cruel.

Yunqi: What do you plan to share with attendees during this Computing Conference? Can you give us a preview of the topics you will discuss and tell us why you chose them?

Dr. Hua: I will introduce some of the applications of visual technology in various fields and the challenges they face, with a special focus on the technologies and applications in the City Brain project. Our previous discussions only touched upon the City Brain project. This time, I want to take a deeper dive. For example, I want to discuss the technical details of City Brain and how we can realize its value.

