
On the benefits for model regularization of a variational formulation of GTM

Author: Olier Caparroso, Iván,Vellido Alcacena, Alfredo
Publisher: IEEE
Year: 2008
DOI: 10.1109/IJCNN.2008.4634005
Source: https://upcommons.upc.edu/bitstream/2117/13348/1/NN0508.pdf
On the benefits for model regularization of a Variational formulation of GTM
Iván Olier and Alfredo Vellido
Abstract: Generative Topographic Mapping (GTM) is a manifold learning model for the simultaneous visualization and clustering of multivariate data. It was originally formulated as a constrained mixture of distributions, for which the adaptive parameters were determined by Maximum Likelihood (ML), using the Expectation-Maximization (EM) algorithm. In this formulation, GTM is prone to data overfitting unless a regularization mechanism is included. The theoretical principles of Variational GTM, an approximate method that provides a full Bayesian treatment to a Gaussian Process (GP)-based variation of the GTM, were recently introduced as an alternative way to control data overfitting. In this paper we assess in some detail the generalization capabilities of Variational GTM and compare them with those of alternative regularization approaches in terms of test log-likelihood, using several artificial and real datasets.
I. INTRODUCTION
Statistical Machine Learning (SML) provides a unified principled framework for machine learning methods and helps to overcome some of their limitations. Bayesian probability theory, in particular, has important modeling implications. For instance, it requires modeling assumptions, including the specification of prior distributions, to be made explicit, avoiding arbitrary modelling decisions; it also automatically satisfies the likelihood principle and provides a natural framework to handle uncertainty.
Generative Topographic Mapping (GTM) [1] is an SML manifold learning model for data visualization and clustering, whose probabilistic setting and functional similarity make it a principled alternative to Self-Organizing Maps (SOM) [2]. In its basic formulation, the GTM is trained within the ML framework using EM, permitting the occurrence of data overfitting unless regularization is included, a major drawback when modelling noisy data. Its probabilistic definition, though, allows the formulation of principled extensions, such as those providing active model regularization. Some regularization methods for GTM described in [3], [4] are based on Bayesian evidence approaches. Alternatively, a variational Bayesian approach to the GTM was recently introduced in [5], [6] to endow the model with regularization capabilities based on variational techniques.
In this paper the performance of Variational GTM is assessed in several experiments, using both artificial and real datasets. Such performance is also compared, in terms of generalization capability (i.e., the capability to avoid overfitting), to that of other GTM models including alternative evidence-based regularization methods, as well as to that of the standard unregularized GTM and the GTM with GP prior.

Iván Olier and Alfredo Vellido are with the Department of Computing Languages and Systems, Technical University of Catalonia, C/. Jordi Girona 1-3, Edifici Omega, 08034 - Barcelona, Spain (email: {iaolier,avellido}@lsi.upc.edu).

The remainder of the paper is organized as follows: First, in section II, an introduction to the original GTM, the GTM regularized models based on evidence, the GTM with GP prior and a Bayesian approach to the GTM are provided. This is followed, in section III, by the description of the Variational GTM. Several experiments for the assessment of the performance of the models are described, and their results presented and discussed, in section IV. The paper wraps up with a brief conclusion section.
II. GENERATIVE TOPOGRAPHIC MAPPING
A. The Original GTM
The neural network-inspired GTM is a nonlinear latent variable model of the manifold learning family, with sound foundations in probability theory. It performs simultaneous clustering and visualization of the observed data through a nonlinear and topology-preserving mapping from a visualization latent space in $\mathbb{R}^L$ (with $L$ usually being 1 or 2 for visualization purposes) onto a manifold embedded in the $\mathbb{R}^D$ space, where the observed data reside. The mapping that generates the manifold is carried out through a regression function given by:

$$\mathbf{y} = \mathbf{W}\Phi(\mathbf{u}) \tag{1}$$

where $\mathbf{y} \in \mathbb{R}^D$, $\mathbf{u} \in \mathbb{R}^L$, $\mathbf{W}$ is the matrix that generates the mapping, and $\Phi$ is a matrix with the images of $S$ basis functions $\phi_s$ (defined as radially symmetric Gaussians in the original formulation of the model). To achieve computational tractability, the prior distribution of $\mathbf{u}$ in latent space is constrained to form a uniform discrete grid of $K$ centres, analogous to the layout of the SOM units, in the form:

$$p(\mathbf{u}) = \frac{1}{K}\sum_{k=1}^{K}\delta(\mathbf{u}-\mathbf{u}_k)$$

Defined in this way, the GTM can also be understood as a constrained mixture of Gaussians. A density model in data space is therefore generated for each component $k$ of the mixture, which, assuming that the observed data set $\mathbf{X}$ is constituted by $N$ independent, identically distributed (i.i.d.) data points $\mathbf{x}_n$, leads to the definition of a complete likelihood in the form:
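
$$P(\mathbf{X}\mid\mathbf{W},\beta) = \left(\frac{\beta}{2\pi}\right)^{ND/2}\prod_{n=1}^{N}\frac{1}{K}\sum_{k=1}^{K}\exp\left\{-\frac{\beta}{2}\|\mathbf{x}_n-\mathbf{y}_k\|^2\right\} \tag{2}$$

where $\mathbf{y}_k = \mathbf{W}\Phi(\mathbf{u}_k)$ are the reference vectors. From Eq. 2, the adaptive parameters of the model, which are $\mathbf{W}$ and the common inverse variance of the Gaussian components, $\beta$, can be optimized by ML using the EM algorithm. Details can be found in [1].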
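
As a rough illustration of Eqs. 1 and 2, the following Python/NumPy sketch builds a latent grid, maps it through Gaussian basis functions and evaluates the logarithm of Eq. 2. The helper names (`latent_grid`, `rbf_basis`, `gtm_log_likelihood`), the grid range $[-1, 1]$, the basis width and the random weights are assumptions made for this example and are not taken from [1]. The weight matrix is stored transposed, as an $S \times D$ array, so that the reference vectors are obtained as `Phi @ W`.

```python
import numpy as np

def latent_grid(side):
    """Regular side x side grid of latent points u_k in [-1, 1]^2 (L = 2)."""
    axis = np.linspace(-1.0, 1.0, side)
    uu, vv = np.meshgrid(axis, axis)
    return np.column_stack([uu.ravel(), vv.ravel()])          # shape (K, 2)

def rbf_basis(U, centres, width):
    """Phi[k, s] = phi_s(u_k): radially symmetric Gaussian basis functions."""
    sq = ((U[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * width ** 2))                   # shape (K, S)

def gtm_log_likelihood(X, Y, beta):
    """Logarithm of Eq. 2 for data X (N, D) and reference vectors Y (K, D)."""
    N, D = X.shape
    K = Y.shape[0]
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)   # shape (N, K)
    a = -0.5 * beta * sq
    a_max = a.max(axis=1, keepdims=True)                      # log-sum-exp trick
    log_mix = a_max[:, 0] + np.log(np.exp(a - a_max).sum(axis=1)) - np.log(K)
    return 0.5 * N * D * np.log(beta / (2.0 * np.pi)) + log_mix.sum()

rng = np.random.default_rng(0)
U = latent_grid(4)                        # K = 16 latent points
centres = latent_grid(2)                  # S = 4 basis function centres (assumed)
Phi = rbf_basis(U, centres, width=1.0)    # width is an arbitrary choice
W = rng.normal(size=(4, 3))               # illustrative weights, stored as (S, D)
Y = Phi @ W                               # reference vectors y_k of Eq. 1
X = rng.normal(size=(20, 3))              # toy data, N = 20, D = 3
ll = gtm_log_likelihood(X, Y, beta=1.0)
```

The sum over mixture components is evaluated in log space because the exponentials in Eq. 2 underflow easily when $\beta$ is large or the data are far from the reference vectors.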
B. GTM Regularized Models
The optimization of Eq. 2 makes the model fit whatever noise is present in the dataset. An advantage of the probabilistic definition of the GTM is the possibility of introducing regularization in the mapping. This procedure automatically regulates the level of map smoothing necessary to avoid data overfitting, resorting to either a single regularization term [3], or to multiple ones (in a procedure called Selective Map Smoothing [4]). The first case entails the definition of a penalized log-likelihood of the form:

$$\ell_{PEN}(\mathbf{W},\beta) = \ell(\mathbf{W},\beta) - \frac{1}{2}\gamma\|\mathbf{w}\|^2$$

where $\ell(\mathbf{W},\beta)$ is the log-likelihood of the original formulation of GTM (logarithm of Eq. 2), $\gamma$ is a regularization coefficient and $\mathbf{w}$ is a vector shaped by concatenation of the different column vectors of the weight matrix $\mathbf{W}$.
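A Bayesian approach to the estimation of the regularization coefficient $\gamma$, as well as the inverse variance $\beta$, was introduced in [7]. In this procedure, Bayes' theorem is used to estimate the distribution of $\gamma$ and $\beta$ given the data points:

$$p(\gamma,\beta\mid\mathbf{X}) = \frac{p(\mathbf{X}\mid\gamma,\beta)\,p(\gamma,\beta)}{p(\mathbf{X})} \tag{3}$$

Assuming uninformative priors, the optimization of Eq. 3 is equivalent to the maximization of the evidence or marginal likelihood:

$$p(\mathbf{X}\mid\gamma,\beta) = \int p(\mathbf{X}\mid\mathbf{w},\beta)\,p(\mathbf{w}\mid\gamma)\,d\mathbf{w} \tag{4}$$

A normal prior is chosen for the weights:

$$p(\mathbf{w}\mid\gamma) = \left(\frac{\gamma}{2\pi}\right)^{W/2}\exp\left\{-\frac{1}{2}\gamma\|\mathbf{w}\|^2\right\}$$
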
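where $W$ is the number of weights in $\mathbf{W}$. The log-evidence or marginal log-likelihood for $\gamma$ and $\beta$ is given by:

$$\ln p(\mathbf{X}\mid\gamma,\beta) = \ell(\mathbf{W}^*,\beta) - \frac{1}{2}\gamma\|\mathbf{w}^*\|^2 - \frac{1}{2}\ln|\mathbf{H}^*| + \frac{W}{2}\ln\gamma + C \tag{5}$$

where $\mathbf{W}^*$ (in vector form, $\mathbf{w}^*$) is the value of the weights at the maximum of the posterior distribution (Eq. 4) and $\mathbf{H}^*$ is the Hessian of $p(\mathbf{X}\mid\mathbf{w}^*,\beta)\,p(\mathbf{w}^*\mid\gamma)$. All the constant terms have been grouped as $C$. The maximization of this equation for $\gamma$ and $\beta$ leads to the standard updating formulae of the evidence approximation.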
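Alternatively, multiple regularization terms can also be considered, one for each basis function. This method, known as Selective Map Smoothing (SMS), was originally introduced in [4]. In SMS, the prior distribution over the weights is given by

$$p(\mathbf{w}\mid\{\gamma_s\}) = \prod_{s=1}^{S}\left(\frac{\gamma_s}{2\pi}\right)^{D/2}\exp\left\{-\frac{1}{2}\sum_{s=1}^{S}\gamma_s\|\mathbf{w}_s\|^2\right\}$$

where each $\gamma_s$ defines a regularization coefficient for each basis function, and $\mathbf{w}_s$ is the vector of weights in $\mathbf{W}$ associated with the hyperparameter $s$. The marginal log-likelihood of Eq. 5 is reformulated as:

$$\ln p(\mathbf{X}\mid\{\gamma_s\},\beta) = \ell(\mathbf{W}^*,\beta) - \frac{1}{2}\sum_{s=1}^{S}\gamma_s\|\mathbf{w}_s^*\|^2 - \frac{1}{2}\ln|\mathbf{H}^*_{\{\gamma_s\}}| + \frac{D}{2}\sum_{s=1}^{S}\ln\gamma_s$$
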
C. A Gaussian Process Formulation of GTM
The original formulation of GTM described in the previous section has a hard constraint imposed on the mapping from the latent space to the data space due to the finite number of basis functions used. An alternative approach is introduced in [3], where the regression function using basis functions is replaced by a smooth mapping carried out by a GP prior. This way, the likelihood takes the form:

$$P(\mathbf{X}\mid\mathbf{Z},\mathbf{Y},\beta) = \left(\frac{\beta}{2\pi}\right)^{ND/2}\prod_{n=1}^{N}\prod_{k=1}^{K}\left[\exp\left\{-\frac{\beta}{2}\|\mathbf{x}_n-\mathbf{y}_k\|^2\right\}\right]^{z_{kn}} \tag{6}$$

where $\mathbf{Z} = \{z_{kn}\}$ are binary membership variables complying with the restriction $\sum_{k=1}^{K} z_{kn} = 1$, and $\mathbf{y}_k = (y_{k1},\ldots,y_{kD})^T$ are the column vectors of a matrix $\mathbf{Y}$ and the centroids of spherical Gaussian generators, equivalent to the reference vectors in the case of the original formulation of GTM. Note that the spirit of $\mathbf{y}_k$ in this approach is similar to the regression version of GTM (Eq. 1) but with a different formulation: a GP formulation is assumed, introducing a prior multivariate Gaussian distribution over $\mathbf{Y}$ defined as:

$$P(\mathbf{Y}) = (2\pi)^{-KD/2}\,|\mathbf{C}|^{-D/2}\prod_{d=1}^{D}\exp\left\{-\frac{1}{2}\mathbf{y}_{(d)}^T\mathbf{C}^{-1}\mathbf{y}_{(d)}\right\}$$

where $\mathbf{y}_{(d)}$ is each one of the row vectors of the matrix $\mathbf{Y}$ and $\mathbf{C}$ is a matrix where each of its elements is a covariance function that can be defined as

$$C(i,j) = C(\mathbf{u}_i,\mathbf{u}_j) = \nu\exp\left\{-\frac{\|\mathbf{u}_i-\mathbf{u}_j\|^2}{2\alpha^2}\right\}, \qquad i,j = 1\ldots K$$

and where parameter $\nu$ is usually set to 1. The $\alpha$ parameter controls the flexibility of the mapping from the latent space to the data space. An extended review of covariance functions can be found in [8]. An alternative GP formulation was introduced in [9], but this approach had the disadvantage of not preserving the topographic ordering in latent space, being therefore inappropriate for data visualization purposes.

2008 International Joint Conference on Neural Networks (IJCNN 2008)
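Note that Eqs. 2 and 6 are equivalent if a prior multinomial distribution over $\mathbf{Z}$ in the form $P(\mathbf{Z}) = \prod_{n=1}^{N}\prod_{k=1}^{K}\left(\frac{1}{K}\right)^{z_{kn}} = \left(\frac{1}{K}\right)^{N}$ is assumed, since summing Eq. 6 over all admissible values of $\mathbf{Z}$ under this prior recovers the mixture of Eq. 2.
Eq. 6 leads to the definition of a log-likelihood, and parameters $\mathbf{Y}$ and $\beta$ of this model can be optimized using the EM algorithm (in a similar way to the parameters $\mathbf{W}$ and $\beta$ in the regression formulation of GTM). Some basic details are provided in [3].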
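
The covariance function $C(\mathbf{u}_i,\mathbf{u}_j)$ of this subsection can be sketched in a few lines of Python/NumPy. The function name `gp_covariance`, the one-dimensional latent grid and the two values of $\alpha$ compared here are assumptions chosen for illustration only.

```python
import numpy as np

def gp_covariance(U, nu=1.0, alpha=0.1):
    """C[i, j] = nu * exp(-||u_i - u_j||^2 / (2 alpha^2)) for latent points U (K, L)."""
    sq = ((U[:, None, :] - U[None, :, :]) ** 2).sum(axis=2)
    return nu * np.exp(-sq / (2.0 * alpha ** 2))

U = np.linspace(-1.0, 1.0, 9)[:, None]     # K = 9 latent points, L = 1 (assumed layout)
C_small_alpha = gp_covariance(U, alpha=0.1)
C_large_alpha = gp_covariance(U, alpha=1.0)

# Prior correlation between two neighbouring centroids under each setting.
neighbour_small = C_small_alpha[0, 1]
neighbour_large = C_large_alpha[0, 1]
```

With a larger $\alpha$, neighbouring centroids are more strongly correlated a priori, which is how this parameter enters the prior $P(\mathbf{Y})$.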
D. Bayesian GTM
The specification of a full Bayesian model of GTM can be completed by defining priors over the parameters $\mathbf{Z}$ and $\beta$. Since $z_{kn}$ are defined as binary values, a multinomial distribution can be chosen for $\mathbf{Z}$:

$$P(\mathbf{Z}) = \prod_{n=1}^{N}\prod_{k=1}^{K} p_{kn}^{z_{kn}}$$

where $p_{kn}$ is the parameter of the distribution. As in [10], a Gamma distribution¹ is chosen to be the prior over $\beta$:

$$P(\beta) = \Gamma(\beta\mid d_\beta, s_\beta)$$

where $d_\beta$ and $s_\beta$ are the parameters of the distribution. Therefore, the joint probability $P(\mathbf{X},\mathbf{Z},\mathbf{Y},\beta)$ is given by:

$$P(\mathbf{X},\mathbf{Z},\mathbf{Y},\beta) = P(\mathbf{X}\mid\mathbf{Z},\mathbf{Y},\beta)\,P(\mathbf{Z})\,P(\mathbf{Y})\,P(\beta)$$

In general, the joint probability can be maximized through evidence methods using the Laplace approximation [7] or, alternatively, using approximate methods, such as Markov Chain Monte Carlo [11] and variational inference [12], [13]. The latter is the approach we follow to define Variational GTM in section III.
III. VARIATIONAL GTM
A. Motivation of the Use of Variational Inference
A basic problem in SML is the computation of the marginal likelihood $P(\mathbf{X}) = \int P(\mathbf{X},\Theta)\,d\Theta$, where $\Theta = \{\theta_i\}$ is the set of parameters defining the model. Depending on the complexity of the model, the analytical computation of this integral could be intractable. Variational inference allows approximating the marginal likelihood through Jensen's inequality as follows:

$$\ln P(\mathbf{X}) = \ln\int P(\mathbf{X},\Theta)\,d\Theta = \ln\int Q(\Theta)\frac{P(\mathbf{X},\Theta)}{Q(\Theta)}\,d\Theta \geq \int Q(\Theta)\ln\frac{P(\mathbf{X},\Theta)}{Q(\Theta)}\,d\Theta = F(Q)$$

¹ The Gamma distribution is defined as follows: $\Gamma(\nu\mid d_\nu, s_\nu) = \dfrac{s_\nu^{d_\nu}\,\nu^{d_\nu-1}\exp(-s_\nu\nu)}{\Gamma(d_\nu)}$.

The function $F(Q)$ is a lower bound function such that its convergence guarantees the convergence of the marginal likelihood. The goal in variational methods is choosing a suitable form for the density $Q(\Theta)$ in such a way that $F(Q)$ can be readily evaluated and yet which is sufficiently flexible that the bound is reasonably tight. A reasonable approximation for $Q(\Theta)$ is based on the assumption that it factorizes over each one of the parameters as $Q(\Theta) = \prod_i Q_i(\theta_i)$. That assumed, $F(Q)$ can be maximized, leading to the optimal distributions:

$$Q_i(\theta_i) = \frac{\exp\langle\ln P(\mathbf{X},\Theta)\rangle_{k\neq i}}{\int\exp\langle\ln P(\mathbf{X},\Theta)\rangle_{k\neq i}\,d\theta_i} \tag{7}$$

where $\langle\cdot\rangle_{k\neq i}$ denotes an expectation with respect to the distributions $Q_k(\theta_k)$ for all $k \neq i$.
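
The lower bound property of $F(Q)$ can be checked numerically on a toy model. The sketch below is only an illustration under simplifying assumptions: the parameter $\Theta$ takes three discrete values, so the integrals become sums, and the joint values in `joint` are made-up numbers, not quantities from the paper.

```python
import math

def lower_bound(joint, q):
    """F(Q) = sum over theta of Q(theta) * ln(P(X, theta) / Q(theta))."""
    return sum(qi * math.log(pi / qi) for pi, qi in zip(joint, q) if qi > 0)

# Hypothetical joint values P(X, theta) for a fixed X and three values of theta.
joint = [0.10, 0.05, 0.01]
log_evidence = math.log(sum(joint))             # ln P(X)

q_uniform = [1 / 3, 1 / 3, 1 / 3]               # an arbitrary choice of Q
posterior = [p / sum(joint) for p in joint]     # P(theta | X)

f_uniform = lower_bound(joint, q_uniform)       # strictly below ln P(X)
f_posterior = lower_bound(joint, posterior)     # bound is tight at the posterior
```

The bound is tight when $Q$ equals the true posterior, and any other choice of $Q$ gives a smaller value, which is why maximizing $F(Q)$ over a restricted family yields an approximation to the posterior.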
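B. A Bayesian Approach of GTM Based on Variational Inference
In order to apply the variational principles to the Bayesian GTM within the framework described in the previous section, a $Q$ distribution of the form:

$$Q(\mathbf{Z},\mathbf{Y},\beta) = Q(\mathbf{Z})\,Q(\mathbf{Y})\,Q(\beta)$$

is assumed, where natural choices of $Q(\mathbf{Z})$, $Q(\mathbf{Y})$ and $Q(\beta)$ are similar distributions to the priors $P(\mathbf{Z})$, $P(\mathbf{Y})$ and $P(\beta)$, respectively. Thus, $Q(\mathbf{Z}) = \prod_{n=1}^{N}\prod_{k=1}^{K}\tilde{p}_{kn}^{z_{kn}}$, $Q(\mathbf{Y}) = \prod_{d=1}^{D}\mathcal{N}\left(\mathbf{y}_{(d)}\mid\tilde{\mathbf{m}}_{(d)},\tilde{\Sigma}\right)$, and $Q(\beta) = \Gamma\left(\beta\mid\tilde{d}_\beta,\tilde{s}_\beta\right)$. Using these expressions in Eq. 7, the following formulation of the variational parameters $\tilde{\Sigma}$, $\tilde{\mathbf{m}}_{(d)}$, $\tilde{p}_{kn}$, $\tilde{d}_\beta$ and $\tilde{s}_\beta$ can be obtained: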
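
$$\tilde{\Sigma} = \left(\langle\beta\rangle\sum_{n=1}^{N}\langle\mathbf{G}_n\rangle + \mathbf{C}^{-1}\right)^{-1}$$

$$\tilde{\mathbf{m}}_{(d)} = \langle\beta\rangle\,\tilde{\Sigma}\sum_{n=1}^{N} x_{nd}\,\langle\mathbf{z}_n\rangle$$

$$\tilde{p}_{kn} = \frac{\exp\left\{-\frac{\langle\beta\rangle}{2}\left\langle\|\mathbf{x}_n-\mathbf{y}_k\|^2\right\rangle\right\}}{\sum_{k'=1}^{K}\exp\left\{-\frac{\langle\beta\rangle}{2}\left\langle\|\mathbf{x}_n-\mathbf{y}_{k'}\|^2\right\rangle\right\}}$$

$$\tilde{d}_\beta = d_\beta + \frac{ND}{2}$$

$$\tilde{s}_\beta = s_\beta + \frac{1}{2}\sum_{n=1}^{N}\sum_{k=1}^{K}\langle z_{kn}\rangle\left\langle\|\mathbf{x}_n-\mathbf{y}_k\|^2\right\rangle$$

where $\mathbf{z}_n$ corresponds to each row vector of $\mathbf{Z}$ and $\mathbf{G}_n$ is a diagonal matrix of size $K \times K$ with elements $\mathbf{z}_n$. The moments in the previous equations are defined as: $\langle z_{kn}\rangle = \tilde{p}_{kn}$, $\langle\beta\rangle = \tilde{d}_\beta/\tilde{s}_\beta$, and $\left\langle\|\mathbf{x}_n-\mathbf{y}_k\|^2\right\rangle = D\tilde{\Sigma}_{kk} + \sum_{d=1}^{D}\left(x_{nd}-\tilde{m}_{kd}\right)^2$.
Details of these calculations can be found in [5].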
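
One pass over the update equations above might be sketched in Python/NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `variational_sweep`, the order in which the quantities are refreshed within a pass, the small `jitter` added before inverting $\mathbf{C}$, the one-dimensional latent grid and the toy data are all choices made here for illustration. Responsibilities are stored as an $N \times K$ array `R` with `R[n, k]` $= \langle z_{kn}\rangle$, and column $d$ of `M` holds $\tilde{\mathbf{m}}_{(d)}$.

```python
import numpy as np

def variational_sweep(X, C, R, beta_mean, d_beta, s_beta, jitter=1e-8):
    """One pass over the variational updates (the ordering is an assumption)."""
    N, D = X.shape
    K = C.shape[0]
    C_inv = np.linalg.inv(C + jitter * np.eye(K))   # jitter: numerical safeguard only

    # Sigma~ = (<beta> sum_n <G_n> + C^-1)^-1, with <G_n> = diag(<z_n>)
    Sigma = np.linalg.inv(beta_mean * np.diag(R.sum(axis=0)) + C_inv)
    # m~_(d) = <beta> Sigma~ sum_n x_nd <z_n>, for all d at once
    M = beta_mean * Sigma @ (R.T @ X)               # shape (K, D)

    # <||x_n - y_k||^2> = D Sigma~_kk + sum_d (x_nd - m~_kd)^2
    sq = D * np.diag(Sigma)[None, :] + ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)

    # p~_kn: normalised exponentials, evaluated in log space for stability
    logits = -0.5 * beta_mean * sq
    logits -= logits.max(axis=1, keepdims=True)
    R_new = np.exp(logits)
    R_new /= R_new.sum(axis=1, keepdims=True)

    # Gamma posterior parameters and the moment <beta> = d~ / s~
    d_tilde = d_beta + 0.5 * N * D
    s_tilde = s_beta + 0.5 * (R_new * sq).sum()
    return Sigma, M, R_new, d_tilde, s_tilde, d_tilde / s_tilde

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))                        # toy data, N = 30, D = 2
U = np.linspace(-1.0, 1.0, 8)[:, None]              # K = 8 latent points (assumed)
C = np.exp(-((U - U.T) ** 2) / (2.0 * 0.1 ** 2))    # covariance with nu = 1, alpha = 0.1
R = np.full((30, 8), 1.0 / 8)                       # responsibilities start at 1/K
beta_mean, d_beta, s_beta = 1.0, 0.01, 0.01         # s_beta = d_beta / beta
for _ in range(5):
    Sigma, M, R, d_tilde, s_tilde, beta_mean = variational_sweep(
        X, C, R, beta_mean, d_beta, s_beta)
```

In practice the loop would run until the lower bound or the parameters stop changing; the fixed five passes here only keep the example short.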
IV. EXPERIMENTS
A. Experimental Design
The main goal of the set of experiments presented and discussed in this section is the assessment of the performance of the proposed Variational GTM in the presence of noise. That is, the assessment of its robustness in terms of model regularization. The performance of the Variational GTM is compared with those of the original unregularized GTM; the GTM regularized using evidence methods, either with a single regularization term, or with multiple ones; and the GP formulation of GTM.
The models used in all the experiments were initialized in the same way to allow straightforward comparison. The matrix of centroids of the Gaussian generators, $\mathbf{Y}$, and the inverse of the variance, $\beta$, were set through PCA-based initialization [1], and the parameters $\{p_{kn}\}$ are fixed and were initialized to $1/K$. The parameter $s_\beta$ was set to $d_\beta/\beta$ and $d_\beta$ was initialized to a small value close to 0. For each set of experiments, several values of $\alpha$ were tried, though finally it was set to 0.1.
Five publicly available datasets and a sixth synthetically generated one, all with different characteristics, were selected for the experiments. They are now summarily described:
• Wine data: This dataset consists of 13 attributes and 179 cases, describing the results of chemical analysis of wine samples. It is available from the UCI machine learning repository².
• 3-PhaseOil data: This dataset, consisting of 12 attributes and 1,000 data points, was artificially generated from the dynamical equations of a pipeline section carrying a mixture of oil, water and gas which can belong to one of three equally distributed geometrical configurations. It was originally used in [1] and it is available in the GTM Homepage³.
• Shuttle data: It is a dataset consisting of 6 attributes and 1,000 data points obtained from various inertial sensors from Space Shuttle mission STS-57⁴.
• Abalone data: Another dataset from the UCI repository consisting of 8 attributes and 3,175 data points. It was originally used to predict the age of abalone marine gastropods from physical measurements.
• Letter data: This dataset consists of 16 attributes and 20,000 data points, used for letter category recognition. It is also available from the UCI repository.
• Spiral data: A simple two-dimensional artificial dataset consisting of 200 data points was generated using the equation of a spiral contaminated with Gaussian noise, as follows:

$$\mathbf{X} = \begin{cases} x_1 = \dfrac{n}{200}\sin(4\pi n/200) + \sigma(0.05) \\[4pt] x_2 = \dfrac{n}{200}\cos(4\pi n/200) + \sigma(0.05) \end{cases}$$

where $1 \leq n \leq 200$ and $\sigma(0.05)$ is the Gaussian noise with standard deviation of 0.05.

² http://mlearn.ics.uci.edu/MLRepository.html
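
The Spiral data definition translates directly into code. The sketch below uses only the Python standard library; the function name `spiral_data`, the `seed` argument and the generalization from 200 to `n_points` are assumptions added for convenience, and the random numbers will not reproduce the exact dataset used in the experiments.

```python
import math
import random

def spiral_data(n_points=200, noise_std=0.05, seed=None):
    """Noisy spiral: x1 = (n/N) sin(4 pi n / N) + noise, x2 = (n/N) cos(4 pi n / N) + noise."""
    rng = random.Random(seed)
    points = []
    for n in range(1, n_points + 1):
        radius = n / n_points
        angle = 4.0 * math.pi * n / n_points
        x1 = radius * math.sin(angle) + rng.gauss(0.0, noise_std)
        x2 = radius * math.cos(angle) + rng.gauss(0.0, noise_std)
        points.append((x1, x2))
    return points

noisy = spiral_data(seed=0)             # contaminated data, as used for training
clean = spiral_data(noise_std=0.0)      # the original spiral without noise
```

Generating the noise-free spiral alongside the noisy one is useful for plots in which the fitted reference vectors are compared against the underlying curve.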
B. Comparative Assessment of the Performance of Variational GTM
The performance of all methods is assessed using the test log-likelihood of the resulting models. Ten-fold cross-validation for each dataset and method was used. The results of the experiments are shown in Figs. 1 to 6. These figures summarily display the test log-likelihoods of each method, as a function of the number of latent points. All figures provide evidence that the proposed Variational GTM outperforms the rest of the models overall (with the exception of the Shuttle data) and for almost any number of latent points. Moreover, this difference in performance is, in some cases (Figs. 1, 3, 5 and 6), quite large. In contrast with other models (such as GTM-GP in Figs. 1 and 2), the performance of Variational GTM does not deteriorate with the number of latent points. Interestingly, the performance of the original GTM and the GTM regularized with evidence-based methods (GTM-SRT and GTM-SMS) is quite similar in all figures. In turn, in most cases, the performances of evidence-based methods and GTM-GP are very similar up to a certain number of latent points, beyond which their performances diverge notably.
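
The evaluation protocol (ten-fold cross-validation, reporting the mean and standard deviation of a test score) can be sketched with the Python standard library alone. The helper names `kfold_indices` and `cross_validated_score`, the shuffling seed and the toy `fit` and `score` callables are hypothetical placeholders; in the experiments the score would be the test log-likelihood of a trained GTM variant.

```python
import random

def kfold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train, test) index lists for a shuffled k-fold split."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

def cross_validated_score(data, fit, score, n_folds=10, seed=0):
    """Mean and standard deviation of score(model, test_data) over the folds."""
    scores = []
    for train, test in kfold_indices(len(data), n_folds, seed):
        model = fit([data[j] for j in train])
        scores.append(score(model, [data[j] for j in test]))
    mean = sum(scores) / len(scores)
    std = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5
    return mean, std

# Toy stand-ins: the "model" is the training mean, the score a negative squared error.
mean, std = cross_validated_score(
    [0.1 * i for i in range(50)],
    fit=lambda train: sum(train) / len(train),
    score=lambda model, test: -sum((x - model) ** 2 for x in test),
)
```
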
C. On the Influence of Model Regularization in the Visualization of the Data
The low dimensionality of the Spiral data set allows us to display it directly in Fig. 7, together with the corresponding reference vectors $\mathbf{y}_k$ obtained using each of the GTM variants. The original spiral without noise is also added to the displays so that the level of fitting of each model to the data can be visually assessed. It is clearly observed that the Variational GTM approximates the original spiral far better than any of the alternative methods (leading to better generalization capabilities, as illustrated by the test log-likelihood results reported in the previous section), which tend to be more sensitive to the effect of the added noise (allocating, as a result, some reference vectors to areas outside the original spiral).

³ http://www.ncrg.aston.ac.uk/GTM
⁴ http://www.cs.ucr.edu/~eamonn/

[Plot: Test Log-Likelihood versus Number of Latent Points (4, 9, 16, 25, 36, 49, 64, 81); curves: NREG, SRT, SMS, GP, VAR.]
Fig. 1. Mean test log-likelihood results for the Spiral data for all methods: Unregularized GTM (NREG); GTM regularized with evidence methods: Single regularization term (SRT) and Selective Map Smoothing (SMS); GTM with GP prior (GP); and Variational GTM (VAR). The vertical bars indicate the standard deviation of the test log-likelihood over the cross-validation runs.

[Plot: Test Log-Likelihood versus Number of Latent Points; curves: NREG, SRT, SMS, GP, VAR.]
Fig. 2. Mean test log-likelihood results for the Wine data. Representation as in Fig. 1.

For data of higher dimensionality, two visualization strategies can be followed. In the first one, data are visualized in two dimensions in the model latent space, using the mean projection [1], calculated as $\mathbf{u}_n^{mean} = \sum_k p(\mathbf{u}_k\mid\mathbf{x}_n)\,\mathbf{u}_k$ for all methods with the exception of the Variational GTM, for which it is calculated as $\mathbf{u}_n^{mean} = \sum_k \langle z_{kn}\rangle\,\mathbf{u}_k$. This is illustrated by the visualization of the Wine data set. The original dataset was first divided into a training subset (66% of all data points, randomly selected) and a test subset (rest of the data). The training data are visualized for all GTM variants in Fig. 8, while the test data are visualized in Fig. 9. Both figures show that, for all models but Variational GTM, the data occupy most of the latent space. Thus, their visualization does not reveal any clear grouping structure. The original three-class structure of the Wine data is only recognized by labelling each class differently in the display. Instead, Variational GTM captures the underlying three-class structure perfectly, isolating each group in a very defined area of the latent space. Moreover, the labelling of data points allows us to identify, without any ambiguity, several data points which are clearly mislabeled: that is, points with a class label that does not correspond to their natural grouping as revealed by Variational GTM.

[Plot: Test Log-Likelihood versus Number of Latent Points; curves: NREG, SRT, SMS, GP, VAR.]
Fig. 3. Mean test log-likelihood results for the 3-PhaseOil data. Representation as in Fig. 1.

[Plot: Test Log-Likelihood versus Number of Latent Points; curves: NREG, SRT, SMS, GP, VAR.]
Fig. 4. Mean test log-likelihood results for the Shuttle data. Representation as in Fig. 1.

[Plot: Test Log-Likelihood versus Number of Latent Points; curves: NREG, SRT, SMS, GP, VAR.]
Fig. 5. Mean test log-likelihood results for the Abalone data. Representation as in Fig. 1.

[Plot: Test Log-Likelihood (×10⁴) versus Number of Latent Points; curves: NREG, SRT, SMS, GP, VAR.]
Fig. 6. Mean test log-likelihood results for the Letter data. Representation as in Fig. 1.

The second strategy deals with the visualization of the general cluster structure defined by the GTM variants. It is accomplished through the membership map generated using the mode projection [1] of the data into the latent space, given by $\mathbf{u}_n^{mode} = \arg\max_k\, p(\mathbf{u}_k\mid\mathbf{x}_n)$ for all methods with the exception of the Variational GTM, for which it is given by $\mathbf{u}_n^{mode} = \arg\max_k\, \langle z_{kn}\rangle$. This is illustrated by the visualization of the Wine data set clusters in Figs. 10 and 11. Again, as in the case of the mean projections, the underlying three-class structure of the data is only clearly observed in Variational GTM. Moreover, only Variational GTM provides a parsimonious cluster description of the data, using a very small number of clusters for each of the three wine classes. This reflects the success of the regularization process. In comparison, the rest of the GTM variants, regularized or not, show a proliferation of clusters that is the result of data overfitting.

Fig. 7. (Top row, left) Spiral data, (Top row, right) original GTM, (Middle row, left) GTM-SRT, (Middle row, right) GTM-SMS, (Bottom row, left) GTM-GP, and (Bottom row, right) Variational GTM. The common standard deviation is represented by circles centred on each reference vector, with radius $1/\sqrt{\beta}$.
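
Both projections used in this section reduce to simple operations on the matrix of responsibilities. In the NumPy sketch below, `R[n, k]` stands for $p(\mathbf{u}_k\mid\mathbf{x}_n)$ (or $\langle z_{kn}\rangle$ for Variational GTM) and `U` holds the latent points $\mathbf{u}_k$; the function names and the tiny example values are assumptions for illustration. The mode projection returns the latent point selected by the arg max rather than the index itself.

```python
import numpy as np

def mean_projection(R, U):
    """u_n^mean = sum_k R[n, k] * u_k, for responsibilities R (N, K) and latent points U (K, L)."""
    return R @ U

def mode_projection(R, U):
    """u_n^mode = latent point u_k with the largest responsibility for x_n."""
    return U[np.argmax(R, axis=1)]

# Three latent points on a line and two data points (made-up responsibilities).
U = np.array([[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]])
R = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])

u_mean = mean_projection(R, U)
u_mode = mode_projection(R, U)
```

The mean projection places each data point anywhere in the convex hull of the latent grid, whereas the mode projection assigns it to a single latent point, which is what makes it suitable for membership maps.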
V. CONCLUSIONS
The benefits of a Variational formulation of the manifold learning GTM model, in order to achieve effective model regularization, have been demonstrated in this paper. Several experiments, using diverse datasets of very different characteristics, have shown that Variational GTM is able to avoid, at least partially, data overfitting and, therefore, is able to generalize better than several alternative GTM formulations, both regularized and unregularized. Additionally, the advantages of the variational formulation for data and cluster visualization have been clearly illustrated.
Future research will be devoted to including some of the model parameters within the variational framework. In particular, a variational treatment of hyperparameter $\alpha$ is difficult. However, an interesting approach to its calculation in the context of variational GP classifiers, using lower and upper bound functions, was presented in [14] and will be explored in the context of GTM. Furthermore, an additional vector of adaptive hyperparameters over parameter $\mathbf{Y}$ could be used to control the mixture of Gaussian components. Thereby, an optimum number of mixture components could be calculated.
Finally, we remark that the computational complexity of Variational GTM does not increase with respect to that of the standard GTM with GP prior. On the other hand, the formulation of Variational GTM introduces a heavier computational load as compared to the standard GTM, as usual in most formulations involving Bayesian inference. However, there was no significant increase in the running times of the experiments reported in this paper. A more thorough study of the computational efficiency of the method will also be the matter of future research.

Fig. 8. Data visualization through mean projection for the training subset of the Wine data, (Top row, left) original GTM, (Top row, right) GTM-SRT, (Middle row, left) GTM-SMS, (Middle row, right) GTM-GP, and (Bottom row) Variational GTM.

Fig. 9. Data visualization through mean projection for the test subset of the Wine data, (Top row, left) original GTM, (Top row, right) GTM-SRT, (Middle row, left) GTM-SMS, (Middle row, right) GTM-GP, and (Bottom row) Variational GTM.

Fig. 10. Data visualization through membership maps for the training subset of the Wine data, (Top row, left) original GTM, (Top row, right) GTM-SRT, (Middle row, left) GTM-SMS, (Middle row, right) GTM-GP, and (Bottom row) Variational GTM. Each cluster is represented by a square of size proportional to the number of data points assigned to it.

REFERENCES
[1] C. M. Bishop, M. Svensén, and C. K. I. Williams, “GTM: The Generative Topographic Mapping,” Neural Comput., vol. 10, no. 1, pp. 215–234, 1998.
[2] T. Kohonen, Self-Organizing Maps (3rd ed). Springer-Verlag, Berlin, 2001.
[3] C. M. Bishop, M. Svensén, and C. K. I. Williams, “Developments of the Generative Topographic Mapping,” Neurocomputing, vol. 21, no. 1–3, pp. 203–224, 1998.
[4] A. Vellido, W. El-Deredy, and P. J. G. Lisboa, “Selective smoothing of the Generative Topographic Mapping,” IEEE T. Neural Networ., vol. 14, no. 4, pp. 847–852, 2003.
[5] I. Olier and A. Vellido, “A variational Bayesian formulation for GTM: Theoretical foundations,” Technical University of Catalonia (UPC), Tech. Rep. LSI-07-33-R, 2007.
[6] I. Olier and A. Vellido, “Variational GTM,” in The 8th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL'07). Lect. Notes Comput. Sc., vol. 4881, 2007, pp. 77–86.
[7] D. J. C. MacKay, “A practical Bayesian framework for back-propagation networks,” Neural Comput., vol. 4, no. 3, pp. 448–472, 1992.
[8] P. Abrahamsen, “A review of Gaussian random fields and correlation functions,” Norwegian Computing Center, Oslo, Norway, Tech. Rep. 917, 1997.
[9] A. Utsugi, “Bayesian sampling and ensemble learning in Generative Topographic Mapping,” Neural Process. Lett., vol. 12, pp. 277–290, 2000.
[10] C. M. Bishop, “Variational principal components,” in Proceedings Ninth Intern. Conf. on Artificial Neural Networks, vol. 1, 1999, pp. 509–514.
[11] C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan, “An introduction to MCMC for machine learning,” Mach. Learn., vol. 50, pp. 5–43, 2003.
[12] M. Beal, “Variational algorithms for approximate Bayesian inference,” Ph.D. dissertation, The Gatsby Computational Neuroscience Unit, Univ. College London, 2003.
[13] T. Jaakkola and M. I. Jordan, “Bayesian parameter estimation via variational methods,” Stat. Comput., vol. 10, pp. 25–33, 2000.
[14] M. Gibbs and D. J. C. MacKay, “Variational Gaussian process classifiers,” IEEE T. Neural Networ., vol. 11, no. 6, pp. 1458–1464, 2000.

Fig. 11. Data visualization through membership maps for the test subset of the Wine data, (Top row, left) original GTM, (Top row, right) GTM-SRT, (Middle row, left) GTM-SMS, (Middle row, right) GTM-GP, and (Bottom row) Variational GTM. Cluster representation as in Fig. 10.