
DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification

Introduction

Dimensionality reduction is a common and often necessary step in most machine learning applications and high-dimensional data analyses. There is a rich history and literature on the subject, ranging from classical linear methods such as principal component analysis (PCA) and Fisher discriminant analysis (FDA) to a variety of nonlinear procedures such as kernelized versions of PCA and FDA as well as manifold learning algorithms.

A recent trend in dimensionality reduction is to focus on probabilistic models. These models, which include generative topological mapping, factor analysis, independent component analysis and probabilistic latent semantic analysis (pLSA), are generally specified in terms of an underlying independence assumption or low-rank assumption. The models are generally fit with maximum likelihood, although Bayesian methods are sometimes used. In particular, Latent Dirichlet Allocation (LDA) is a Bayesian model in the spirit of pLSA that models each data point (e.g., a document) as a collection of draws from a mixture model in which each mixture component is known as a topic [3]. The mixing proportions across topics are document-specific, and the posterior distribution across these mixing proportions provides a reduced representation of the document. This model has been used successfully in a number of applied domains, including information retrieval, vision and bioinformatics [7, 1].

The dimensionality reduction methods that we have discussed thus far are entirely unsupervised. Another branch of research, known as sufficient dimension reduction (SDR), aims at making use of supervisory data in dimension reduction [4, 6]. For example, we may have class labels or regression responses at our disposal. The goal of SDR is then to identify a subspace or other low-dimensional object that retains as much information as possible about the supervisory signal. Having reduced dimensionality in this way, one may wish to subsequently build a classifier or regressor in the reduced representation. But there are other goals for the dimension reduction as well, including visualization, domain understanding, and domain transfer (i.e., predicting a different set of labels or responses).

In this paper, we aim to combine these two lines of research and consider a supervised form of LDA. In particular, we wish to incorporate side information such as class labels into LDA, while retaining its favorable unsupervised dimensionality reduction abilities. The goal is to develop parameter estimation procedures that yield LDA topics that characterize the corpus and maximally exploit the predictive power of the side information.

As a parametric generative model, parameters in LDA are typically estimated with maximum likelihood estimation or Bayesian posterior inference. Such estimates are not necessarily optimal for yielding representations for prediction and regression. In this paper, we use a discriminative learning criterion—conditional likelihood—to train a variant of the LDA model. Moreover, we augment the LDA parameterization by introducing class-label-dependent auxiliary parameters that can be tuned by the discriminative criterion. By retaining the original LDA parameters and introducing these auxiliary parameters, we are able to retain the advantages of the likelihood-based training procedure and provide additional freedom for tracking the side information.
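To make the discriminative criterion concrete, it can be written schematically as the conditional likelihood of the document labels given the documents. The notation here is assumed for illustration rather than quoted from this excerpt: $y_d$ denotes the class label of document $d$, and $T$ collects the class-label-dependent auxiliary parameters, so the training objective takes the form

$$
T^{\ast} = \arg\max_{T} \sum_{d=1}^{D} \log p(y_d \mid w_d; T, \Phi),
$$

with the original LDA parameters $\Phi$ still fit by the standard likelihood-based procedure.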

The paper is organized as follows. In Section 2, we introduce the discriminatively trained LDA (DiscLDA) model and contrast it to other related variants of LDA models. In Section 3, we describe our approach to parameter estimation for the DiscLDA model. In Section 4, we report empirical results on applying DiscLDA to model text documents. Finally, in Section 5 we present our conclusions.

Model

We start by reviewing the LDA model [3] for topic modeling. We then describe our extension to LDA that incorporates class-dependent auxiliary parameters. These parameters are to be estimated based on supervised information provided in the training data set.

LDA

The LDA model is a generative process where each document in the text corpus is modeled as a set of draws from a mixture distribution over a set of hidden topics. A topic is modeled as a probability distribution over words. Let the vector $w_d$ be the bag-of-words representation of document $d$. The generative process for this vector is illustrated in Fig. 1 and has three steps: 1) the document is first associated with a $K$-dimensional topic mixing vector $\theta_d$ which is drawn from a Dirichlet distribution, $\theta_d \sim \mathrm{Dir}(\alpha)$; 2) each word $w_{dn}$ in the document is then assigned to a single topic $z_{dn}$ drawn from the multinomial variable, $z_{dn} \sim \mathrm{Multi}(\theta_d)$; 3) finally, the word $w_{dn}$ is drawn from a $V$-dimensional multinomial variable, $w_{dn} \sim \mathrm{Multi}(\phi_{z_{dn}})$, where $V$ is the size of the vocabulary.
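As a concrete illustration of these three steps, here is a minimal Python sketch that samples a synthetic corpus from the generative process. It is not code from the paper; the corpus sizes and the hyperparameter values for $\alpha$ and the topics are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

K, V, D, N = 4, 50, 10, 30    # topics, vocabulary size, documents, words per doc
alpha = np.full(K, 0.1)       # symmetric Dirichlet hyperparameter (illustrative)

# Topic parameters: each column of Phi is a distribution phi_k over the V words.
Phi = rng.dirichlet(np.ones(V), size=K).T   # shape (V, K)

corpus = []
for d in range(D):
    theta_d = rng.dirichlet(alpha)            # 1) topic mixing vector for doc d
    doc = []
    for n in range(N):
        z_dn = rng.choice(K, p=theta_d)       # 2) topic assignment for word n
        w_dn = rng.choice(V, p=Phi[:, z_dn])  # 3) word drawn from topic z_dn
        doc.append(w_dn)
    corpus.append(doc)
```

Each document in `corpus` is then a bag of word indices whose topic proportions are governed by its own $\theta_d$.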

Given a set of documents, $\{w_d\}_{d=1}^{D}$, the principal task is to estimate the parameters $\{\phi_k\}_{k=1}^{K}$. This can be done by maximum likelihood, $\Phi^{\ast} = \arg\max_{\Phi} p(\{w_d\}; \Phi)$, where $\Phi \in \mathbb{R}^{V \times K}$ is a matrix parameter whose columns are the topic parameters $\{\phi_k\}$.
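In practice this maximum-likelihood fit has no closed form and is approximated. As a usage-level sketch (not the paper's own estimation procedure), the topic matrix can be fit with gensim's variational-Bayes implementation of LDA; the toy documents below are invented for the example:

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Toy corpus; real applications would use a large document collection.
docs = [["topic", "model", "word", "document"],
        ["label", "class", "document", "classifier"],
        ["topic", "word", "mixture", "dirichlet"]]

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# Approximate maximum-likelihood fit of Phi via variational Bayes.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, random_state=0)

Phi_T = lda.get_topics()                      # shape (K, V): rows are the phi_k
theta_0 = lda.get_document_topics(corpus[0])  # reduced representation of doc 0
```

The rows of `lda.get_topics()` correspond to the topic distributions $\phi_k$, i.e., the columns of $\Phi$ above, and the per-document topic proportions play the role of $\theta_d$.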


