
A Study on Static Hand Gesture Recognition using Moments

S. Padam Priyal and Prabin K. Bora

Department of Electronics and Communication Engineering

Indian Institute of Technology Guwahati, Guwahati, India

Email: {s.priyal, prabin}@iitg.ernet.in

Abstract—Hand gesture recognition is one of the key techniques in developing user-friendly interfaces for human-computer interaction. Static hand gestures are the most essential facets of gesture recognition. View point invariance and user independence are among the important requirements for realizing a real time gesture recognition system. In this context, the geometric moments and the orthogonal moments namely the Zernike, Tchebichef and Krawtchouk moments are explored. The proposed system detects the hand region through skin color identification and obtains the binary silhouette. These images are normalized for rotation and scale changes. The moment features of the normalized hand gestures are classified using a minimum distance classifier. The classification results suggest that the Krawtchouk moment features are comparatively robust to view point changes and also exhibit user independence.



Index Terms—Geometric moments, Hand gesture, Krawtchouk moments, Tchebichef moments, View and user independent recognition, Zernike moments.


I. INTRODUCTION

Human-computer interaction (HCI) is an interesting area of research driving developments in the field of automation. Recent advancements have led to the emergence of HCI systems that embody the ‘natural’ way of communication between humans. Therefore, attempts at integrating communication modalities like speech, handwriting and hand gestures into HCI have gained importance. Researchers have focused on developing advanced hand gesture interfaces, resulting in successful applications in robotics, assistive systems, sign language communication and virtual reality [1].


The interpretation of gesture requires proper means by which the dynamic and/or static configurations of the hand can be properly defined to the machine. This problem is dealt with using computer vision techniques, which are economical and non-obtrusive [1]. The general approach to vision based gesture recognition can be based on 2D models like the image contour and the silhouette. These models offer low computational cost and high accuracy for a modest gesture vocabulary. In methods based on 2D models, recovering the hand shape is difficult due to scale changes, rotation and view point variations.


Researchers have successfully developed some scale and rotation invariant features, while very few works concentrate on view and user independence. These problems are addressed to some extent using 3D hand modelling techniques [1]. The research on developing view and user independent methods based on 2D hand models is yet to mature.


Elastic graph matching proposed in [2] is user independent and can efficiently identify gestures in a complex background, but the method is sensitive to view point distortions. Chin [3] employed the curvature scale space (CSS); the approach is computationally complex and sensitive to boundary distortions. The geometric moment invariants derived from the binary hand silhouettes form the feature set in [4] and are not robust to view point variations. The Zernike and pseudo-Zernike moment features for rotation invariant gesture recognition are introduced in [5]. Gu and Su [6] investigated the Zernike moment features for view and user independent representations. Their approach employed a multivariate piecewise linear decision algorithm to successfully classify a dataset containing eleven static gesture signs.


This work evaluates the geometric moments and a few popular orthogonal moments in view independent gesture classification. The orthogonal moments considered are the: (1) Zernike, (2) Tchebichef and (3) Krawtchouk moments. User independence is tested by varying the number of users included in the training data. The classification is done using the minimum distance classifier in order to evaluate the direct representation capability of these moment features. The rest of the paper is organized as follows: Section II presents the required mathematical theory of moments. Section III provides an overview of the proposed gesture recognition system. Experimental results are discussed in Section IV and Section V concludes the paper.


II. THEORY OF MOMENTS

Moments have the ability to represent the global characteristics of the image shape. The geometric moments are the efficient and regularly deployed features for object recognition [1], [7]. The Zernike moments are based on continuous orthogonal polynomials defined in the polar domain and are rotation invariant [7]. The implementation of these moments requires proper approximation in the discrete domain. The discretization error accumulates as the order of moment increases. Hence, moments based on discrete orthogonal polynomials like the Tchebichef and the Krawtchouk polynomials have been proposed [8]. These are directly defined in the image coordinate space and do not involve any numerical approximation.


For a 2D image f(x, y) defined over a rectangular grid of size N × N with (x, y) ∈ {0, 1, ..., N − 1} × {0, 1, ..., N − 1}, the moments are formulated as follows.


A. Geometric moments

The (n + m)th order geometric moment is defined as [7]

M_nm = Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} x^n y^m f(x, y)   (1)

Thus, the geometric moments can be observed as the projection of f(x, y) on the bases formed by the polynomials x^n y^m.
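As a concrete illustration of (1), the sketch below computes geometric moments of a small binary image with NumPy. The function name, the toy image and the convention that x indexes rows are our assumptions, not from the paper:

```python
import numpy as np

def geometric_moment(f, n, m):
    """(n+m)th order geometric moment M_nm = sum_x sum_y x^n y^m f(x, y)."""
    N = f.shape[0]
    x = np.arange(N).reshape(-1, 1)   # x taken along rows (assumed convention)
    y = np.arange(N).reshape(1, -1)   # y taken along columns
    return np.sum((x ** n) * (y ** m) * f)

# Toy 4x4 binary "image" with a single foreground pixel at (x, y) = (2, 3):
img = np.zeros((4, 4))
img[2, 3] = 1.0
# M_00 counts foreground pixels; M_10 and M_01 give coordinate sums.
```

For a single pixel, M_00 = 1 while M_10 and M_01 recover its coordinates, which is the basis of centroid computation in the normalization step.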


B. Zernike Moments

The image coordinates (x, y) are transformed to polar coordinates (ρ, θ), such that 0 ≤ρ≤ 1

and 0 ≤θ≤2π. The complex Zernike polynomial of order n ≥0 and repetition r

is defined as [7]

V nr (ρ, θ) = R nr (ρ) exp(?jrθ) (2)

For even values of n?|r| and |r| ≤n, R nr is the real-valued radial polynomial given below:

The image f(ρ, θ) defined in the polar domain is represented as

Using the orthogonality property, the Zernike moment Z nr of order n is obtained from the numerical approximation on the image grid of
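A minimal sketch of the radial polynomial R_nr above, using the standard factorial form (the helper name is ours). Note that R_nr(1) = 1 for every valid (n, r), which makes a convenient sanity check:

```python
from math import factorial

def zernike_radial(n, r, rho):
    """Real-valued radial polynomial R_nr(rho), for n - |r| even and |r| <= n."""
    r = abs(r)
    assert (n - r) % 2 == 0 and r <= n, "invalid (n, r) combination"
    val = 0.0
    for s in range((n - r) // 2 + 1):
        val += ((-1) ** s * factorial(n - s)
                / (factorial(s)
                   * factorial((n + r) // 2 - s)
                   * factorial((n - r) // 2 - s))) * rho ** (n - 2 * s)
    return val
```

For example, R_20(ρ) evaluates to 2ρ² − 1, the familiar defocus term of the Zernike basis.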


C. Tchebichef moments

The 1D discrete Tchebichef polynomial at a discrete point x is defined as [9]

t_n(x) = (1 − N)_n · 3F2(−n, −x, 1 + n; 1, 1 − N; 1)

where 3F2 is the hypergeometric function

3F2(a1, a2, a3; b1, b2; z) = Σ_{v=0}^{∞} [(a1)_v (a2)_v (a3)_v / ((b1)_v (b2)_v)] z^v / v!

and (a)_v is the Pochhammer symbol given by

(a)_v = a (a + 1) ... (a + v − 1)

The separability property is used to obtain the 2D bases and f(x, y) is represented as

f(x, y) = Σ_n Σ_m T_nm t_n(x) t_m(y)

The Tchebichef moment T_nm of order (n + m) is obtained as

T_nm = [1 / (ρ(n, N) ρ(m, N))] Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} t_n(x) t_m(y) f(x, y)   (7)

where ρ(n, N) is a normalization constant.
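The 3F2 series above terminates because (−n)_v vanishes for v > n, so the unnormalized polynomial can be evaluated directly. A sketch (helper names ours):

```python
from math import factorial

def poch(a, k):
    """Pochhammer symbol (a)_k = a (a+1) ... (a+k-1); (a)_0 = 1."""
    out = 1
    for i in range(k):
        out *= a + i
    return out

def tchebichef(n, x, N):
    """Unnormalized discrete Tchebichef polynomial t_n(x) on {0, ..., N-1}.

    Evaluates t_n(x) = (1-N)_n * 3F2(-n, -x, 1+n; 1, 1-N; 1); the sum
    terminates at k = n because (-n)_k vanishes beyond that.
    """
    s = 0.0
    for k in range(n + 1):
        s += (poch(-n, k) * poch(-x, k) * poch(1 + n, k)
              / (poch(1, k) * poch(1 - N, k) * factorial(k)))
    return poch(1 - N, n) * s
```

As a check, the closed forms t_0(x) = 1 and t_1(x) = 2x + 1 − N fall out of the sum directly.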


D. Krawtchouk moments

The nth order weighted Krawtchouk polynomial at a discrete point x is defined as [10]

K̄_n(x; p, N − 1) = K_n(x; p, N − 1) √(w(x; p, N − 1) / ρ(n; p))

By definition,

K_n(x; p, N − 1) = 2F1(−n, −x; −(N − 1); 1/p)

where w(x; p, N − 1) = C(N − 1, x) p^x (1 − p)^{N−1−x} is the binomial weight function, ρ(n; p) is a constant given by

ρ(n; p) = (−1)^n ((1 − p)/p)^n n! / (−(N − 1))_n

and (0 < p < 1) is a controlling parameter. As p deviates from the value of 0.5 by Δp, the support of the weighted Krawtchouk polynomial is approximately shifted by NΔp. The direction of shifting is dependent on the sign of Δp [10].

Using the separability property, the orthogonal 2D Krawtchouk bases are defined and f(x, y) is approximated as

f(x, y) ≈ Σ_n Σ_m Q_nm K̄_n(x; p1, N − 1) K̄_m(y; p2, N − 1)

The Krawtchouk moment Q_nm of order (n + m) is obtained as

Q_nm = Σ_{x=0}^{N−1} Σ_{y=0}^{N−1} K̄_n(x; p1, N − 1) K̄_m(y; p2, N − 1) f(x, y)   (10)
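A sketch of the weighted Krawtchouk polynomial as defined above (helper names ours; for an N × N image the last argument would be N − 1). Orthonormality of the weighted polynomials, Σ_x K̄_n(x)² = 1, gives a quick sanity check:

```python
from math import comb, factorial, sqrt

def poch(a, k):
    """Pochhammer symbol (a)_k."""
    out = 1
    for i in range(k):
        out *= a + i
    return out

def krawtchouk_weighted(n, x, p, N):
    """Weighted Krawtchouk polynomial Kbar_n(x; p, N) for x in {0, ..., N}."""
    # K_n(x; p, N) = 2F1(-n, -x; -N; 1/p): a terminating hypergeometric sum
    K = sum(poch(-n, k) * poch(-x, k) / (poch(-N, k) * factorial(k))
            * (1.0 / p) ** k for k in range(n + 1))
    w = comb(N, x) * p ** x * (1 - p) ** (N - x)           # binomial weight
    rho = (-1) ** n * ((1 - p) / p) ** n * factorial(n) / poch(-N, n)
    return K * sqrt(w / rho)

# Weighted polynomials are orthonormal: sum_x Kbar_n(x)^2 = 1
norm1 = sum(krawtchouk_weighted(1, x, 0.5, 4) ** 2 for x in range(5))
```

With p = 0.5 the weight is centered on the middle of the support, which is why the paper later fixes p1 = p2 = 0.5 to emphasize the moments around the object centroid.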

From the plots of 2D polynomial functions in Fig. 1, we can infer that the Zernike and the Tchebichef polynomials have wide supports. It means that the polynomial function is defined for all the points in the domain. Hence, the Zernike and Tchebichef moments characterize the global shape features. On the other hand, the Krawtchouk polynomials have compact support. The support of the polynomial increases with its order. Therefore, the lower order Krawtchouk moments capture the local features and the higher order moments represent the global characteristics. Thus, the Krawtchouk moments exhibit better localization.

The moments obtained in (1), (3), (7) and (10) are used as features for gesture representation.


III. GESTURE RECOGNITION SYSTEM

The gesture recognition system consists of four modules as shown in the block diagram Fig. 2. The functions of these modules are summarized as follows.


A. Hand detection and Segmentation

The first step in processing is to extract the hand from the image background. Teng et al. [11] have given a simple and effective method to detect skin color pixels by combining the features obtained from the YCbCr and YIQ color spaces. Hence, the hand regions are detected using the skin color pixels. The resultant binary image is subjected to connected component analysis followed by the morphological closing operation to obtain the segmented hand image.
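The paper relies on Teng et al. [11] for the exact decision rule; the sketch below only illustrates the idea of combining YCbCr chrominance with the YIQ I channel. The threshold ranges and function name are illustrative assumptions, not the values from [11]:

```python
import numpy as np

def skin_mask(rgb):
    """Binary skin mask from illustrative YCbCr and YIQ thresholds.

    The threshold ranges below are assumed for illustration only; the
    paper follows Teng et al. [11], whose exact ranges may differ.
    """
    r, g, b = [rgb[..., i].astype(float) for i in range(3)]
    # RGB -> YCbCr chrominance (ITU-R BT.601 coefficients)
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # RGB -> YIQ: the I channel separates skin tones well
    i_ch = 0.596 * r - 0.274 * g - 0.322 * b
    return ((cb >= 77) & (cb <= 127) &
            (cr >= 133) & (cr <= 173) &
            (i_ch >= 12) & (i_ch <= 78))

# Demo: a skin-like pixel vs. a pure green pixel
demo = np.array([[[200, 120, 100], [0, 255, 0]]], dtype=np.uint8)
mask = skin_mask(demo)
```

The resulting mask would then feed the connected component analysis and morphological closing described above.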


B. Normalization

The binary hand images are normalized for orientation changes and scale variations. The image is aligned such that the major axis lying along the forearm region is at 90° with respect to the horizontal axis of the image. After rotation correction, the forearm region is removed through morphological processing. The resolution of the resultant image is fixed at 104 × 104 with the hand object normalized to 64 × 64.


C. Moment Feature extraction

The moments computed from the normalized hand gesture image form the feature vectors. The orders of the orthogonal moments are selected experimentally based on their accuracy in reconstruction. The order of the geometric moments is chosen based on the recognition performance.


D. Classification

Classification is done using the minimum distance classifier

defined as follows:

where R is the index of signs in the trained set, z s is the

feature vector of the test image, z t is the feature vector of the

target image and T is the length of the feature vectors.
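The decision rule above amounts to a nearest-neighbour search in feature space; a minimal sketch under that reading (function and variable names ours):

```python
import numpy as np

def min_distance_classify(z_s, targets):
    """Return the index R of the target feature vector closest to the
    test vector z_s in squared Euclidean distance over its T components."""
    z_s = np.asarray(z_s, dtype=float)
    dists = [float(np.sum((z_s - np.asarray(z_t, dtype=float)) ** 2))
             for z_t in targets]
    return int(np.argmin(dists))

# Demo: three target feature vectors, one test vector
targets = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
label = min_distance_classify([0.9, 1.2], targets)
```

With multiple training samples per sign, the same rule applies with each sample acting as a target and the predicted sign taken from the nearest one.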


IV. EXPERIMENTAL RESULTS AND DISCUSSION

The gesture data are captured using an RGB Frontech e-cam of resolution 1280 × 960 connected to an Intel Core 2 Duo processor with 2 GB RAM. The images are collected under a non-uniform background. The background is restricted such that the hand is the largest object in the field-of-view (FOV).


Two sets of gesture data are acquired for experimentation. The first dataset consists of gestures collected from a perfect view point, which means the angle between the line of focus and the axis of the object is 90° [12]. The second dataset consists of gestures taken at varying view points. The database for testing is collected real-time under a controlled environment.


The data consist of 1,240 images collected from 23 users. There are 10 gesture signs with 124 samples for each gesture sign. The gesture signs are shown in Fig. 3. The images are collected under three different scales with random orientations and view angles of 45°, 90°, −45° and −135°.


The data contain 690 gestures taken at 90° and the remaining 550 at varying view angles. We refer to the dataset taken at 90° as Dataset 1 and the remaining data as Dataset 2. Thus, Dataset 1 accounts only for the rotation and scale changes and not the shape profile. Dataset 2 consists of gestures that include all the three variations (orientation, scale and view angle). Therefore, the images undergo perspective distortion that occurs because of the viewing angles [12].

The following experiments are performed to study and compare the adequacy of the geometric and the orthogonal moments for robust gesture classification.


A. Performance of orthogonal moments in gesture representation

The binary image in Fig. 4(a) is approximated using different orthogonal polynomials. The representation ability of the moments is compared on the basis of accuracy in reconstruction. The image reconstructed from the moments is binarised through thresholding. The dissimilarity between the original and the reconstructed image is measured using the mean square error (MSE) and the structural similarity (SSIM) index. The MSE is sensitive to small imperfections in the reconstructed image caused by thresholding. However, the SSIM index is insensitive to such deviations and hence, corroborates the MSE values in terms of the geometric closeness.

The SSIM index between the images f and f̂ is computed locally by dividing the images into L blocks of size 11 × 11. For l ∈ {1, 2, ..., L}, the SSIM between the lth blocks of f and f̂ is evaluated as [13]

SSIM_l = (2 μ_f μ_f̂ + c1)(2 σ_{f f̂} + c2) / [(μ_f² + μ_f̂² + c1)(σ_f² + σ_f̂² + c2)]


where μ_f and μ_f̂ denote the mean intensities, σ_f² and σ_f̂² denote the variances and σ_{f f̂} denotes the covariance. The constants c1 and c2 are chosen as 0.01 and 0.03 respectively. The average of the locally computed values gives the SSIM index representing the overall image quality. The value of the SSIM index lies in [−1, 1] and a larger value means high similarity between the compared images.

Fig. 4(b) shows the reconstructed images obtained from the Zernike, Tchebichef and the Krawtchouk moments. The comparative plots of the MSE and the SSIM index for varying numbers of moments are shown in Fig. 4(c) and 4(d) respectively. From the results, it is evident that the images reconstructed using the Zernike moments are not well defined. Hence, their performance based on the values of the MSE and the SSIM index is inferior to the other two orthogonal moments. Also, for higher orders the Zernike moments are numerically unstable and the reconstruction error increases. The images reconstructed from the Tchebichef and the Krawtchouk moments closely approximate the original even for the lower orders. As the order increases, the reconstruction error for the Tchebichef moments decreases and its approximation is close to the performance of the Krawtchouk moments. It is noted that the edges are better defined in the Krawtchouk based approach. This is expected, because the lower order Krawtchouk polynomials have relatively high spatial frequency components. From the plots in Fig. 4, it is evident that the rate of convergence towards the optimal value is faster in the case of the Krawtchouk moments.


B. Gesture Classification

The maximum orders of the geometric moments, the Zernike moments, the Tchebichef moments and the Krawtchouk moments were fixed at 14 (n = 7 and m = 7), 30, 80 (n = 40 and m = 40) and 80 (n = 40 and m = 40) respectively. The parameters p1 and p2 of the Krawtchouk polynomial are fixed to 0.5 each to ensure that the moments are emphasized with respect to the centroid of the object.

The experiments are conducted to verify user independence and view invariance. In each case, the results are consolidated separately for Datasets 1 and 2. The testing is made online in order to verify its compliance with the real-time application.

1) Verification of User independence: In this case, the system is trained only with the gestures in Dataset 1. The maximum size of the training dataset is 230 with 23 training samples for each gesture. Each training sample corresponds to one of the 23 users. The classification results for varying numbers of training samples are given in Table-I. In the case of Dataset 1, the performance based on the orthogonal moments is high even for smaller training samples. But, the geometric moments give better results only as the number of training samples increases. The results obtained using the Krawtchouk moments are consistently the best, indicating that the Krawtchouk moments provide better user independence.

2) Verification of View invariance: The classification results given in Table-I for Dataset 2 also convey the robustness of the moment features to view angle variations. It can be observed that more misclassifications occur for the geometric moment features and hence, they are inefficient. The Zernike moments perform better than the geometric moments. However, their performance is limited due to their sensitivity to deviations in the radial profile caused by perspective distortion. Since the Tchebichef and the Krawtchouk moments are defined in the image grid and do not involve any transformation, they are robust to perspective distortions. Hence, these moments are capable of efficiently recognizing the gestures under view point variations.

The experiments are repeated by including the gestures taken at different view angles into the training set. Thus, the extended training set consists of 430 gesture samples. Among those, 230 samples are taken from Dataset 1 and 200 samples from Dataset 2. We refer to the training samples from Dataset 1 as Training set-I and the extended training set as Training set-II.

The results are consolidated in Table-II. As expected, there is an improvement in the recognition accuracy for Dataset 2. The improvement is desirably higher for the Zernike and Tchebichef moments. But, for the Zernike moments the recognition rate for Dataset 1 has slightly decreased. The performance of the Krawtchouk moment features is consistently higher for both the training sets. From the results, we can infer that the Krawtchouk features are capable of obtaining high performance with minimum training samples (Training set-I) for gestures in Dataset 2.

V. CONCLUSION

This paper studied the suitability of the geometric and orthogonal moments for dealing with the view point and user independence issues in hand gesture recognition. An outline of the characteristics of the orthogonal moments and their image representation capability was presented. Experiments were performed to evaluate the ability of these moments in classifying the gestures under view point variations. The extent of user independence was verified by repeating the experiments for different numbers of training samples. From the results, we conclude that the Krawtchouk moments are more robust features for achieving view point and user independent gesture recognition. The experiments were performed on a small gesture library that would be sufficient for control based applications. The work can be extended to a large gesture library as in sign-language communication, and a comparative performance study with respect to other existing static gesture recognition techniques can also be made.

