Within-Class Covariance Normalization for SVM-based Speaker Recognition

Andrew O. Hatch, Sachin Kajarekar, and Andreas Stolcke

The International Computer Science Institute, Berkeley, CA, USA
The University of California at Berkeley, USA
SRI International, Menlo Park, CA, USA

Abstract

This paper extends the within-class covariance normalization (WCCN) technique described in [1, 2] for training generalized linear kernels. We describe a practical procedure for applying WCCN to an SVM-based speaker recognition system where the input feature vectors reside in a high-dimensional space. Our approach involves using principal component analysis (PCA) to split the original feature space into two subspaces: a low-dimensional "PCA space" and a high-dimensional "PCA-complement space." After performing WCCN in the PCA space, we concatenate the resulting feature vectors with a weighted version of their PCA-complements. When applied to a state-of-the-art MLLR-SVM speaker recognition system, this approach achieves improvements of up to 22% in EER and 28% in minimum decision cost function (DCF) over our previous baseline. We also achieve substantial improvements over an MLLR-SVM system that performs WCCN in the PCA space but discards the PCA-complement.

Index Terms: kernel machines, support vector machines, feature normalization, generalized linear kernels, speaker recognition.
1. Introduction

In recent years, support vector machines (SVMs) have become one of the most important and widely used classification techniques within the field of speaker recognition. Most top-performing speaker recognition systems use output "scores" obtained from SVM-based speaker models to arrive at a final decision for a given speaker trial. As with every SVM-based classifier, these speaker models are trained using some predefined kernel function. Proper selection of the kernel function can be critical to the success of an SVM-based system, particularly in cases where the amount of available training data for either the impostor class or the target speaker class is very limited (e.g., the 1-conversation training condition in speaker recognition).

With some exceptions (e.g., the rank normalization technique described in [3, 4]), most of the existing work on kernel selection for speaker recognition has focused on generalized linear kernels, that is, kernels of the form $k(\mathbf{x}_1, \mathbf{x}_2) = \mathbf{x}_1^{\mathsf T} \mathbf{R}\, \mathbf{x}_2$, where $\mathbf{R}$ is a positive semidefinite parameter matrix. Approaches for training $\mathbf{R}$ include the technique described in [5], which essentially involves setting $\mathbf{R}$ equal to $\mathbf{S}^{-1}$, where $\mathbf{S}$ is the covariance matrix of the training data. A diagonal parameterization of $\mathbf{R}$ is derived in [6] for count-based features (e.g., phone n-grams). These parameterizations have both yielded substantial improvements over other kernels on a variety of speaker recognition tasks and feature sets.
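Any such kernel with a positive (semi)definite $\mathbf{R}$ is equivalent to an ordinary linear kernel computed after a fixed linear feature map. The following sketch is our own illustration of that point for the $\mathbf{R} = \mathbf{S}^{-1}$ case, not code from the paper; the random data, the small ridge term used to keep $\mathbf{S}$ invertible, and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X_bg = rng.normal(size=(500, 20))              # background training vectors, one per row

S = np.cov(X_bg, rowvar=False)                 # empirical covariance of the training data
S_smooth = 0.9 * S + 0.1 * np.eye(S.shape[0])  # small ridge keeps S invertible (illustrative)
R = np.linalg.inv(S_smooth)                    # kernel parameter matrix R = S^{-1}

# Because R is positive definite, R = B B^T for a Cholesky factor B, so
# k(x1, x2) = x1^T R x2 = (B^T x1) . (B^T x2): a linear kernel after mapping x -> B^T x.
B = np.linalg.cholesky(R)
x1, x2 = rng.normal(size=20), rng.normal(size=20)
assert np.isclose(x1 @ R @ x2, (B.T @ x1) @ (B.T @ x2))
```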
Nonetheless, both parameterizations are somewhat limited by the fact that they are unsupervised; that is, they do not take speaker labels into account when training $\mathbf{R}$. This limitation is addressed, at least partially, by Solomonoff et al. in [7] and in [8], where the authors use speaker labels to identify orthonormal vectors or "directions" in feature space that maximize task-relevant information while minimizing noise. Solomonoff's approach has been shown to be quite useful for filtering out channel noise and for performing feature reduction. However, the approach in [7, 8] does not prescribe any scheme for weighting the directions in feature space that are retained. Thus, this approach does not fully answer the question of how to train $\mathbf{R}$ for a generalized linear kernel.
In this paper, we expand on the within-class covariance normalization (WCCN) technique for training generalized linear kernels that was recently introduced in [1, 2]. The WCCN technique prescribes setting $\mathbf{R}$ equal to $\mathbf{W}^{-1}$, where $\mathbf{W}$ is the expected within-class covariance matrix over all classes (i.e., speakers) in the training data. WCCN uses information about class labels from the training data to identify orthonormal directions in feature space that maximize task-relevant information. However, unlike other techniques in the literature, WCCN optimally weights each of these directions to minimize a particular upper bound on error rate [1, 2]. Thus, the WCCN approach can, in principle, harness whatever task-relevant information is contained in each of the "directions" of the underlying feature space, even directions that are largely dominated by noise.

We describe a set of experiments where we combine WCCN with a version of the principal component analysis (PCA) technique described in [9]. Our algorithm provides a practical approach for applying WCCN to large feature sets, where inverting or simply estimating $\mathbf{W}$ is impractical for computational reasons. In experiments on SRI's latest MLLR-SVM speaker recognition system (i.e., feature set), our combined WCCN approach achieves relative improvements of up to 22% in equal-error rate (EER) and 28% in minimum DCF below SRI's previous baseline.

The paper is organized as follows: In section 2, we summarize the WCCN approach and discuss practical considerations for how to apply WCCN to large feature sets. In section 3, we describe the approach used in [9] for breaking feature vectors down into PCA and PCA-complement components. This is followed by section 4, where we describe the experimental procedure that we use to perform feature normalization and to train SVM-based speaker models. Finally, in sections 5 and 6, we describe a set of experiments, provide results, and end with a set of conclusions.
2. Within-Class Covariance Normalization

The concept of within-class covariance normalization (WCCN) for SVM training was recently introduced in [1] and then extended in [2]. To derive the WCCN approach, the authors first construct a set of upper bounds on the rates of false positives and false negatives in a linear classifier (i.e., a binary classifier that uses a linear or affine decision boundary). Under various conditions, the problem of minimizing these upper bounds with respect to the parameters of the linear classifier leads to a modified formulation of the hard-margin support vector machine (SVM) [10, 11]. Given a generalized linear kernel of the form $k(\mathbf{x}_1, \mathbf{x}_2) = \mathbf{x}_1^{\mathsf T} \mathbf{R}\, \mathbf{x}_2$, where $\mathbf{R}$ is a positive semidefinite parameter matrix, this modified SVM formulation implicitly prescribes the parameterization $\mathbf{R} = \mathbf{W}^{-1}$, where $\mathbf{W}$ is the expected within-class covariance matrix over all classes. We can represent $\mathbf{W}$ mathematically as

$$\mathbf{W} \;=\; \sum_{i=1}^{C} p_i\, \mathbf{W}_i \;=\; \sum_{i=1}^{C} p_i\, \mathrm{E}\big[(\mathbf{x}_i - \bar{\mathbf{x}}_i)(\mathbf{x}_i - \bar{\mathbf{x}}_i)^{\mathsf T}\big].$$

Here, $\mathbf{x}_i$ represents a random draw from class $i$, $C$ represents the total number of classes, and $\bar{\mathbf{x}}_i$ represents the expected value of $\mathbf{x}_i$. We use $\mathbf{W}_i$ and $p_i$ to represent the covariance matrix and the prior probability of class $i$. (Note that in this paper, the term "class" is synonymous with "speaker.") Given a full-rank estimate of $\mathbf{W}$, we can implement a generalized linear kernel with $\mathbf{R} = \mathbf{W}^{-1}$ by using the following feature transformation, $f$:

$$f(\mathbf{x}) \;=\; \mathbf{A}^{\mathsf T} \mathbf{x}. \qquad (1)$$

Here, $\mathbf{A}$ is defined by the Cholesky factorization of $\mathbf{W}^{-1}$: $\mathbf{W}^{-1} = \mathbf{A}\mathbf{A}^{\mathsf T}$.

In practice, empirical estimates of $\mathbf{W}$ are typically quite noisy; thus, a certain amount of smoothing is usually required to make the WCCN approach work. In this paper, we use the following smoothing model:

$$\hat{\mathbf{W}} \;=\; (1 - \alpha)\,\mathbf{W} + \alpha\,\mathbf{I}. \qquad (2)$$

Here, $\hat{\mathbf{W}}$ represents a smoothed version of the empirical expected within-class covariance matrix, and $\mathbf{I}$ represents a $d \times d$ identity matrix, where $d$ is the dimensionality of the feature space. The parameter $\alpha$ represents a tunable smoothing weight whose value is between 0 and 1. It is straightforward to show that in the above model, the eigenvectors of $\hat{\mathbf{W}}$ are constant with respect to $\alpha$. Thus, we can compute the WCCN feature transformation $f$ in (1) for any value of $\alpha$ without having to recompute the eigenvectors of $\hat{\mathbf{W}}$.
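To make the estimation concrete, the sketch below (our own illustration, not the authors' code; the synthetic data, variable names, and smoothing weight are assumptions) computes $\mathbf{W}$ from labeled training vectors, applies the smoothing of (2), and builds the transformation matrix of (1) from a Cholesky factor of $\hat{\mathbf{W}}^{-1}$.

```python
import numpy as np

def wccn_transform(X, labels, alpha=0.1):
    """Return A such that f(x) = A.T @ x implements WCCN as in equation (1).

    X      : (n, d) array of training vectors
    labels : length-n array of speaker (class) labels
    alpha  : smoothing weight in [0, 1], as in equation (2)
    """
    d = X.shape[1]
    W = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        Wc = np.cov(Xc, rowvar=False, bias=True)      # per-class covariance about the class mean
        W += (len(Xc) / len(X)) * Wc                  # weight by the empirical class prior p_i
    W_smooth = (1.0 - alpha) * W + alpha * np.eye(d)  # equation (2)
    A = np.linalg.cholesky(np.linalg.inv(W_smooth))   # W^{-1} = A A^T
    return A

# usage: rows of X_train are feature vectors, y_train holds speaker labels
rng = np.random.default_rng(1)
X_train = rng.normal(size=(300, 10))
y_train = rng.integers(0, 20, size=300)
A = wccn_transform(X_train, y_train)
X_wccn = X_train @ A                                  # each row is (A.T @ x).T
```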
2.1. WCCN for Large Feature Sets

In this paper, we examine the problem of how to apply WCCN to large feature sets, where inverting or simply estimating $\mathbf{W}$ is impractical for computational reasons. For large feature sets, we can use kernel principal component analysis (KPCA) to first reduce the dimensionality of the feature space to a more manageable size before performing WCCN. One potential problem with this approach, however, is that by filtering out various orthogonal vectors or "directions" in feature space (i.e., by performing feature reduction), we lose a significant amount of the information contained in the original feature set. To avoid this problem, we use the PCA decomposition described in [9], where the feature space is divided into two sets: a set that represents the top $n$ features obtained from performing PCA, where $n$ is the number of training vectors (i.e., the PCA set), and a PCA-complement set, which represents all of the information contained in the original features but not in the PCA set. Since all of the covariance information in the training data is confined to the PCA set (the PCA-complement is zero for all feature vectors in the training data but generally non-zero for feature vectors outside of the training data), we can perform WCCN on the PCA set, which has reduced dimensionality, and then concatenate the resulting feature set with the PCA-complement. This procedure is described in the following sections.
3. Kernel PCA and the PCA-Complement

This section provides an overview of kernel PCA and also describes the PCA-complement approach used in [9]. We begin by defining $\mathbf{M}$ to be a column matrix containing scaled, mean-centered versions of the feature vectors in the training set:

$$\mathbf{M} \;=\; \frac{1}{\sqrt{n}}\,\big[\,\mathbf{x}_1 - \bar{\mathbf{x}},\; \mathbf{x}_2 - \bar{\mathbf{x}},\; \ldots,\; \mathbf{x}_n - \bar{\mathbf{x}}\,\big].$$

Here $\mathbf{x}_i$ represents the $i$th training vector, and $\bar{\mathbf{x}}$ represents the average over all training vectors. Given the above definition, we can represent $\mathbf{S}$ (i.e., the empirical covariance matrix of the data) as follows:

$$\mathbf{S} \;=\; \mathbf{M}\mathbf{M}^{\mathsf T}, \qquad \mathbf{M}^{\mathsf T}\mathbf{M} \;=\; \mathbf{V}\mathbf{D}\mathbf{V}^{\mathsf T}. \qquad (3)$$

In the second line of the above equation, we define $\mathbf{V}\mathbf{D}\mathbf{V}^{\mathsf T}$ to be the eigendecomposition of $\mathbf{M}^{\mathsf T}\mathbf{M}$. We can represent the corresponding eigendecomposition of $\mathbf{S}$ as follows:

$$\mathbf{S} \;=\; \mathbf{U}\mathbf{D}\mathbf{U}^{\mathsf T}. \qquad (4)$$

Here, we define $\mathbf{U}$ to be a column matrix containing the eigenvectors of $\mathbf{S}$ and $\mathbf{D}$ to be a diagonal matrix containing the corresponding eigenvalues. If $\mathbf{D}$ is full-rank, then we can combine (3) with (4) to arrive at the following expression for $\mathbf{U}$, the eigenvector matrix of $\mathbf{S}$:

$$\mathbf{U} \;=\; \mathbf{M}\mathbf{V}\mathbf{D}^{-1/2}. \qquad (5)$$

The columns of $\mathbf{U}$ represent the set of all eigenvectors of $\mathbf{S}$ whose corresponding eigenvalue is non-zero. Thus, we can perform PCA by projecting the input feature vectors onto the column vectors of $\mathbf{U}$. This leads to the following feature transformation, $f_{\mathrm{pca}}$:

$$f_{\mathrm{pca}}(\mathbf{x}) \;=\; \mathbf{U}^{\mathsf T}\mathbf{x} \;=\; \mathbf{D}^{-1/2}\mathbf{V}^{\mathsf T}\mathbf{M}^{\mathsf T}\mathbf{x}. \qquad (6)$$

This transformation reduces the dimensionality of the underlying feature space down to $n$ features, where $n$ is the size of the training set. Since the input feature vectors appear in the form of inner products, which can be replaced with kernel functions, this feature transformation is referred to as kernel PCA [12]. We use

$$f_{\mathrm{comp}}(\mathbf{x}) \;=\; \mathbf{x} - \mathbf{U}\mathbf{U}^{\mathsf T}\mathbf{x} \qquad (7)$$

to define the PCA-complement. The PCA-complement represents the portion of the original feature space that is orthogonal to the training set. Thus, the PCA-complement is zero for every feature vector in the training set but generally non-zero for feature vectors outside of the training set.
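Note that (5)-(7) only require the eigendecomposition of the small $n \times n$ matrix $\mathbf{M}^{\mathsf T}\mathbf{M}$, which is what makes the decomposition practical in very high dimensions. The sketch below is our own numpy illustration of these equations; the array sizes, the rank threshold, and the variable names are assumptions, and the transformations are applied to uncentered vectors, mirroring (6) and (7) as written above.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 50, 1000                                   # n training vectors in a d-dimensional space, d >> n
X_train = rng.normal(size=(n, d))

x_bar = X_train.mean(axis=0)
M = (X_train - x_bar).T / np.sqrt(n)              # d x n: scaled, mean-centered vectors as columns

evals, V = np.linalg.eigh(M.T @ M)                # eigendecomposition of the small n x n matrix
keep = evals > 1e-8 * evals.max()                 # drop numerically-zero eigenvalues
V, evals = V[:, keep], evals[keep]
U = (M @ V) / np.sqrt(evals)                      # equation (5): eigenvectors of M M^T

def f_pca(x):
    return U.T @ x                                # equation (6): at most n PCA features

def f_comp(x):
    return x - U @ (U.T @ x)                      # equation (7): the PCA-complement

# Each centered training vector lies in the span of U, so its complement vanishes;
# for vectors outside the training set the complement is generally non-zero.
assert np.allclose(f_comp(X_train[0] - x_bar), 0.0, atol=1e-6)
print(np.linalg.norm(f_comp(rng.normal(size=d))))
```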
4. Experimental Procedure

The experiments in this paper compare two different feature normalizations: WCCN and standard covariance normalization (CN), where $\mathbf{R} = \hat{\mathbf{S}}^{-1}$. (Here, $\hat{\mathbf{S}}$ represents a smoothed version of $\mathbf{S}$, the empirical covariance matrix of the training data.) The feature vectors are processed in the following steps:

1. Use the training set to compute the kernel PCA transformation $f_{\mathrm{pca}}$ in (6).

2. Apply $f_{\mathrm{pca}}$ to every feature vector in the training and test sets. This gives us the PCA feature set.

3. Apply $f_{\mathrm{comp}}$ in (7) to every feature vector in the training and test sets. This gives us the PCA-complement feature set.

4. Perform either within-class covariance normalization (WCCN) or standard covariance normalization (CN) on the PCA feature set. Both normalizations can be represented in the form of a matrix multiplication. We use the smoothing model shown in equation (2) for both WCCN and standard CN. The smoothing parameter is tuned on a set of held-out cross-validation data.

5. Concatenate a scaled version of the normalized PCA feature set with a scaled version of the PCA-complement feature set to arrive at our final feature representation, which is given in equation (8) in terms of (6) and (7).

Equation (9) shows that when no normalization is applied to the PCA set and the two scaling weights are equal, applying the final feature transformation to the input feature vectors does not affect the kernel function beyond a scaling factor. Thus, by concatenating the PCA set with the PCA-complement set, we preserve all of the information contained in the original feature set, at least for the purpose of computing linear kernels.
5. Experiments and Results

In this section, we describe the tasks, datasets, and features used in our experiments. The results of these experiments are discussed in section 5.4.

5.1. MLLR-SVM System

We used an MLLR-SVM system similar to the one described in [4] to compute feature vectors for our experiments. The MLLR-SVM system uses speaker adaptation transforms from SRI's DECIPHER speech recognition system as features for speaker recognition. A total of 8 affine transforms are used to map the Gaussian mean vectors from speaker-independent to speaker-dependent speech models. The transforms are estimated using maximum-likelihood linear regression (MLLR), and can be viewed as a text-independent encapsulation of the speaker's acoustic properties. For every conversation side, we compute a total of 24960 transform coefficients, which are used as features. Note that this system uses twice as many features as the original MLLR-SVM system described in [3, 1]. The input feature vectors are identical to those used in [4]. However, besides applying the feature transformation to the input feature vectors, our system differs from the MLLR-SVM system used in [4] in the following ways: 1) our system does not apply rank normalization [3] to the input feature vectors, and 2) our system does not apply TNORM [13] to the output SVM scores. We have yet to experiment with applying these normalizations to a system that uses WCCN.
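As a rough picture of how such feature vectors are assembled, the sketch below flattens a set of affine MLLR transforms into a single supervector per conversation side. It is a hypothetical illustration only: the 39-dimensional mean vectors are an assumption (8 transforms of that size give 8 x 39 x 40 = 12480 coefficients, so the 24960 coefficients above would correspond to two such sets), and none of the names below come from the paper.

```python
import numpy as np

def mllr_supervector(transforms):
    """transforms: list of (W, b) pairs; each transform maps a mean m to W @ m + b."""
    parts = [np.concatenate([W.ravel(), b]) for W, b in transforms]
    return np.concatenate(parts)

dim, n_transforms = 39, 8                      # assumed mean dimensionality and transform count
rng = np.random.default_rng(4)
transforms = [(rng.normal(size=(dim, dim)), rng.normal(size=dim))
              for _ in range(n_transforms)]
print(mllr_supervector(transforms).shape)      # (12480,) coefficients for one set of transforms
```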
5.2. Task and Data

Experiments were performed on the 1-conversation training condition of two NIST-defined tasks: SRE-2004 and a subset of SRE-2003. Note that these tasks and datasets are the same as those described in previous reports (see [4, 1]). The SRE-2003 subset was divided into two splits of disjoint speaker sets, both comprised of 3600 conversation sides and 300 speakers. Each split comprises 580 speaker models and 9800 speaker trials. These splits were alternately used for training (i.e., computing covariance estimates and feature transformations) and for testing. We used SRE-2004 to tune the smoothing and scaling parameters for testing on SRE-2003, and vice-versa. To simplify the tuning process, the smoothing parameter was optimized with the PCA-complement weight held at a fixed value; the resulting parameter was then held fixed while tuning the PCA-complement weight. Further details on the tasks and datasets can be found in [4].

5.3. SVM Training

We used SVMlight [14] to train SVM-based speaker models for each task. Each speaker model was trained with a linear kernel using the default value of the SVM hyperparameter. A held-out dataset composed of 425 conversation sides taken from the Switchboard-2 corpus and 1128 conversation sides taken from the Fisher corpus was used as negative examples for the SVM training.
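The per-speaker training setup can be sketched as follows. The paper used SVMlight; the scikit-learn call below is a stand-in with its default regularization, and the function and variable names are our own.

```python
import numpy as np
from sklearn.svm import SVC

def train_speaker_model(target_vectors, background_vectors):
    """Train a linear-kernel SVM with the target speaker's side(s) as positives."""
    X = np.vstack([target_vectors, background_vectors])
    y = np.concatenate([np.ones(len(target_vectors)),
                        -np.ones(len(background_vectors))])
    model = SVC(kernel="linear")               # default hyperparameters
    model.fit(X, y)
    return model

def score_trial(model, test_vector):
    # signed distance to the separating hyperplane, used as the trial score
    return float(model.decision_function(test_vector.reshape(1, -1)))
```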
5.4. Results

Table 1 shows results on the MLLR-SVM system for various feature representations. Here, the labels "WCCN" and "CN" denote within-class covariance normalization and standard covariance normalization, where the smoothing parameter is tuned on the cross-validation set. The scaling weight on the PCA-complement is also optimized on the cross-validation set; for systems that are labeled "PCA," it is set equal to zero (i.e., the PCA-complement is omitted from the final feature representation). The "baseline" label represents the MLLR-SVM system without any feature normalization. As shown in Table 1, the WCCN approach provides improvements over standard CN that are quite substantial, at least in most cases (see the improvement over the corresponding CN systems). These results are the best recorded so far in the literature for an MLLR-SVM system, even without using rank normalization.