Voiceprint Recognition System – Not Just a Powerful Authentication Tool

Introduction

In the age of the mobile Internet, people use social networks, shop online, and carry out financial transactions without having to be physically present. As a result, identity authentication has become one of the most critical security activities in the online world. The traditional solution relies on a password or a private key that you need to remember. Many people choose simple passwords such as "123456" for convenience, which makes their online data an easy target for hackers. Traditional solutions are therefore risky: passwords are easily forgotten or lost, and they are prone to attack.

Are you still using the default password "admin" for your home router?
Do you know that easy-to-crack passwords are the most vulnerable link in the security realm of the Internet of Things (IoT)?

Solutions

Fortunately, we all carry unique "living passwords" on our bodies, such as fingerprints, face, voice, and eyes. These unique and distinctive characteristics of individuals are popularly called "biometric signatures." Voice is one of them and, by analogy with "fingerprint," the identifying characteristics of a voice are called a "voiceprint."

According to "Fundamentals of Biometric Technology" from the U.S. National Biometric Test Center at San Jose State University, the types of biometric signatures compare as follows across various factors:

Comparison Between Various Biometric Signatures

Let's look at the voiceprint recognition system and its underlying principle.

About Voiceprint Recognition System

Voiceprint refers to the acoustic frequency spectrum that carries the speech information in a human voice. Like a fingerprint, it is specific to the individual and can serve as a means of identification. The acoustic signal is a one-dimensional continuous signal. After discretization (sampling in time and quantizing in amplitude), it becomes a digital signal that conventional computers can process.

Discretized Acoustical Signals Processed by Computers
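The discretization step can be sketched in a few lines of Python. This is only an illustration: the 16 kHz sample rate, the 16-bit depth, and the 440 Hz tone standing in for a voice waveform are assumptions, not values given in this article.

```python
import numpy as np

def discretize(signal_fn, duration_s, sample_rate_hz):
    """Sample a continuous-time signal and quantize it to 16-bit integers."""
    n_samples = int(round(duration_s * sample_rate_hz))
    t = np.arange(n_samples) / sample_rate_hz       # sampling instants in seconds
    x = np.clip(signal_fn(t), -1.0, 1.0)            # keep amplitude within [-1, 1]
    return np.round(x * 32767).astype(np.int16)     # quantize to 16-bit PCM

# Hypothetical input: a 440 Hz tone at half of full scale.
def tone(t):
    return 0.5 * np.sin(2 * np.pi * 440.0 * t)

pcm = discretize(tone, duration_s=0.01, sample_rate_hz=16000)
print(pcm.shape, pcm.dtype)
```

A higher sample rate preserves higher frequencies, and a larger bit depth reduces quantization error; both increase the amount of data to process.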

Like the fingerprint technology widely used on mobile phones, voiceprint recognition (also known as speaker recognition) is a biometric identification technology. It extracts phonetic features from the speaker's voice signal to validate the speaker's identity. Everybody has a unique voiceprint, formed gradually as the vocal organs develop. No matter how closely an imitation resembles the original voice, the two voiceprints remain different.

The Chinese saying "someone may not yet be here bodily, but you can already hear him/her speaking" vividly describes a scene where you identify another person by voice. This explains why your mother knows it's you before you even finish saying "hello" over the phone. It is an extraordinary ability that humans have acquired through long-term evolution. With the latest technological innovations, recognition systems can identify a person after listening to 8 to 10 words, although identifying a speaker from a single word is still not feasible. They can also determine whether you are one of 1,000 specified people after you have spoken for more than a minute. Voiceprint recognition relies on a concept that applies to most biometric identification systems, 1:1 versus 1:N matching, and on a concept specific to voiceprint technology: text dependence versus text independence.

Let's look at these principles in detail in the following section.

Working Principle

1:1 Recognition System

In this working model, you provide your identity (account) and biometric features beforehand, and the system saves them as a template. At recognition time, the system compares the newly entered features with the biometric characteristics stored for that identity to determine whether the two sets match. Such a system is known as a 1:1 recognition system (also called speaker verification).
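A minimal sketch of the 1:1 comparison follows. It assumes each voice has already been reduced to a fixed-length feature vector and that cosine similarity with a fixed threshold is the matching rule. The vectors, the names `verify` and `templates`, and the 0.7 threshold are all hypothetical.

```python
import numpy as np

def cosine_score(a, b):
    """Cosine similarity between two feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(claimed_id, probe, templates, threshold=0.7):
    """1:1 check: compare the probe only with the template of the claimed identity."""
    template = templates.get(claimed_id)
    if template is None:            # nobody enrolled under this account
        return False
    return cosine_score(probe, template) >= threshold

# Hypothetical enrolled templates (toy 3-dimensional vectors).
templates = {"alice": [0.9, 0.1, 0.4], "bob": [0.1, 0.8, 0.3]}
probe = [0.85, 0.15, 0.42]
print(verify("alice", probe, templates))
print(verify("bob", probe, templates))
```

Because only one template is consulted, the cost of a 1:1 check does not grow with the number of enrolled users.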

1:N Recognition System

In this working model, the user does not provide an identity beforehand. The system only captures the biometric features at run time and then compares them with all the records of biometric features stored in the background to determine the right match. Such a system is known as a 1:N recognition system (also called speaker identification).

Figure 1 below shows a quick comparison of the two recognition systems.
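The 1:N search can be sketched as a loop over every stored record. As an assumption for illustration, voices are represented as fixed-length vectors scored by cosine similarity, and a probe whose best score falls below a threshold is rejected as unknown. The function name `identify`, the toy vectors, and the 0.7 threshold are hypothetical.

```python
import numpy as np

def identify(probe, templates, threshold=0.7):
    """1:N search: score the probe against every enrolled template.

    Returns (best_id, best_score); best_id is None when no template is close enough.
    """
    p = np.asarray(probe, dtype=float)
    p = p / np.linalg.norm(p)
    best_id, best_score = None, -1.0
    for speaker_id, template in templates.items():
        t = np.asarray(template, dtype=float)
        score = float(np.dot(p, t / np.linalg.norm(t)))   # cosine similarity
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score

# Hypothetical enrolled templates (toy 3-dimensional vectors).
enrolled = {"alice": [0.9, 0.1, 0.4], "bob": [0.1, 0.8, 0.3]}
print(identify([0.85, 0.15, 0.42], enrolled))
print(identify([0.0, 0.0, 1.0], enrolled))
```

Unlike the 1:1 case, the work grows with the number of enrolled speakers, and every additional record is another chance for a false match.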

Figure 1: Speaker Verification and Speaker Identification

Figure 2 below shows the working process of a simple voiceprint recognition system:

Figure 2: Working Process of a Voiceprint Recognition System

From the perspective of the user's speech content, there are two types of voiceprint recognition systems: text-dependent and text-independent.

As the names imply, a text-dependent system requires the user to say only system-prompted content or content within a small allowed range, while a text-independent system does not restrict what the user says. Because the content is similar across users, a text-dependent system only needs to handle the acoustic differences between voices, which is relatively easy. A text-independent system must handle not only the differences between voices but also the speech differences caused by different content, which is relatively harder.

At present, there is a newer approach that falls between the two, popularly known as "limited text-dependence." These systems combine digits or symbols at random and require the user to read the resulting content aloud for voiceprint recognition. Because of this randomness, the content sequence of the collected speech differs every time. The approach resembles the widely used short random numbers (such as digital verification codes). It is useful for identity validation on its own or, in combination with other biometric signatures such as the face, as part of a multi-factor authentication system.
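The random-prompt idea can be sketched as follows. This covers only prompt generation and the content check; the voiceprint comparison itself is not shown. The function names are hypothetical, and the transcript is assumed to come from some separate speech recognizer.

```python
import secrets

DIGITS = "0123456789"

def make_prompt(n_digits=8):
    """Generate a random digit string for the user to read aloud."""
    return "".join(secrets.choice(DIGITS) for _ in range(n_digits))

def content_matches(prompt, recognized_text):
    """Check that the digits recognized in the user's speech equal the prompt.

    recognized_text is assumed to be the transcript from a speech recognizer.
    """
    spoken = "".join(ch for ch in recognized_text if ch in DIGITS)
    return secrets.compare_digest(prompt, spoken)

prompt = make_prompt()
print(prompt)
print(content_matches("4821", "4 8 2 1"))
```

Because the prompt changes on every attempt, a recording of an earlier session does not contain the digit sequence the system now expects.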

Voiceprint Recognition Algorithm: the Technical Details

Let's delve a little deeper into the technical details of the voiceprint recognition algorithm. In the feature layer, the classic Mel-Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), Deep Features, and Power-Normalized Cepstral Coefficients (PNCC) are all effective acoustic features used as inputs for model learning. MFCC remains the most frequently used of them.

Multiple features can also be used together by combining them at the feature layer or the model layer. In the machine learning model layer, the iVector framework that N. Dehak proposed in 2009 still plays the dominant role. Although deep learning is in the limelight today and the voiceprint sector has not escaped its impact, the DNN-iVector approach derived from the legacy UBM-iVector framework only changes the feature extraction: features extracted by a DNN (or bottleneck, BN, features) either replace MFCC or supplement it, and the back-end learning framework remains iVector.
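As an illustration of what MFCC extraction involves, here is a compact NumPy sketch of the usual steps: pre-emphasis, framing, windowing, power spectrum, mel filterbank, logarithm, and DCT. The parameter values (25 ms frames, 10 ms hop, 26 filters, 13 coefficients, 0.97 pre-emphasis) are common defaults assumed here, not values taken from this article, and production toolkits differ in details.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):                 # rising slope
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):                # falling slope
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    """Return an array of shape (n_frames, n_ceps); needs len(signal) >= frame_len."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum, then mel filterbank energies on a log scale.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # DCT-II decorrelates the log energies; keep the first n_ceps coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1) / (2 * n_filters))
    return log_energies @ basis.T

# Hypothetical input: one second of a 300 Hz tone sampled at 16 kHz.
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2 * np.pi * 300.0 * t))
print(features.shape)
```

Each row of the result describes one short frame of audio, and the sequence of rows is what the model layer consumes.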

Figure 3 demonstrates a complete training and testing process of the voiceprint recognition system.

Figure 3: Complete Training and Recognition Framework of Voiceprint Recognition Algorithms

We can see that the iVector model training and the channel compensation model training that follows it are the most important stages. In the feature phase, you can use the BottleNeck feature to replace or supplement the MFCC feature and feed the result into the iVector framework for model training, as shown in Figure 4.

Figure 4: Training iVector Model with the BottleNeck Feature
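At the array level, "replace or supplement" can be pictured as either swapping the per-frame feature matrix or concatenating the two matrices frame by frame. The sketch below assumes both feature streams are already computed and aligned on the same frames; the function name and the dimensions (13 MFCCs, 40 bottleneck values) are hypothetical.

```python
import numpy as np

def build_input_features(mfcc_frames, bn_frames, mode="supplement"):
    """Combine per-frame MFCC and bottleneck (BN) features for model training."""
    mfcc_frames = np.asarray(mfcc_frames, dtype=float)
    bn_frames = np.asarray(bn_frames, dtype=float)
    if mfcc_frames.shape[0] != bn_frames.shape[0]:
        raise ValueError("both feature streams must have the same number of frames")
    if mode == "replace":
        return bn_frames                               # BN features only
    if mode == "supplement":
        return np.hstack([mfcc_frames, bn_frames])     # frame-wise concatenation
    raise ValueError("mode must be 'replace' or 'supplement'")

# Hypothetical shapes: 98 frames, 13 MFCCs and 40 bottleneck values per frame.
mfcc_feats = np.zeros((98, 13))
bn_feats = np.ones((98, 40))
print(build_input_features(mfcc_feats, bn_feats).shape)
```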

In the system layer, different features and models depict the speaker's voice characteristics from different dimensions. Coupled with effective score normalization, the resulting subsystems can be fused to improve overall system performance substantially.
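The article does not say which normalization or fusion method is used. As one common possibility, the sketch below applies zero normalization (z-norm), which rescales a raw score using the scores that impostor utterances obtain against the same speaker model, and then fuses the subsystems with a weighted sum. All numbers and names are hypothetical.

```python
import numpy as np

def z_norm(raw_score, impostor_scores):
    """Rescale a raw score by the mean and spread of impostor scores."""
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    return float((raw_score - impostor_scores.mean()) / (impostor_scores.std() + 1e-12))

def fuse(normalized_scores, weights=None):
    """Weighted sum of normalized subsystem scores (equal weights by default)."""
    scores = np.asarray(normalized_scores, dtype=float)
    if weights is None:
        weights = np.full(len(scores), 1.0 / len(scores))
    return float(np.dot(weights, scores))

# Hypothetical scores from two subsystems, e.g. one MFCC-based and one BN-based.
s1 = z_norm(0.80, [0.10, 0.20, 0.30])
s2 = z_norm(12.0, [2.0, 4.0, 6.0])
print(fuse([s1, s2]))
```

Normalization matters here because the raw scores of different subsystems live on different scales; without it, the subsystem with the largest numeric range would dominate the sum.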

Conclusion

In this blog, we covered the basics of the voiceprint recognition system, its underlying principles, and the significant role it plays in the biometric identification industry.
