An SMT-based Disco e y Algo i hm o C-ne s
Ma c Sol´e1,2and Josep Ca mona2
1Compu e A chi ec u e Depa men , UPC [email p o ec ed]
2So wa e Depa men , UPC [email p o ec ed]
Abs ac . Recen ly, Causal ne s ha e been p oposed as a sui able model
o p ocess disco e y, due o hei decla a i e seman ics and he g ea
exp essi eness hey possess. In his pape we p opose an algo i hm o
disco e a causal ne om a se o aces. I is based on encoding he
p oblem as a Sa is iabili y Modulo Theo ies (SMT) o mula, and uses a
bina y sea ch s a egy o op imize he de i ed model. The me hod has
been implemen ed in a p o o ype ool ha in e ac s wi h an SMT sol e .
The expe imen al esul s ob ained wi ness he capabili y o he app oach
o disco e complex beha io in limi ed ime.
1 In oduc ion
P ocess Mining [1] is a discipline ha aims a using he da a a ailable in in-
o ma ion sys ems o disco e he p ocesses aking place inside an o ganiza ion,
check hei compliance and pe o m p edic ions. I is an e ol ing a ea which,
al hough becoming c ucial o suppo decision making, i s ill needs o se le
down in e ms o algo i hmic and model suppo . One example o his is he
as amoun o algo i hms ha exis o a wide ange o models: Pe i ne s,
Heu is ic ne s, E en -D i en P ocess Chains, Fuzzy models, among o he s [1].
Recen ly, a o malism called Causal ne s (C-ne s) [2] has been p oposed as
a sui able modeling language o p ocess mining. I is a a he compac ep e-
sen a ion ha allows exp essing complex beha io ha i is some imes di icul
o desc ibe using o he models. Fo ins ance conside he se o aces (log) in
Fig. 1(a). In Fig. 1(b) we can see wo Pe i ne s ha a e equi ed o ep esen all
he sequences wi hou adding ex a beha io . I is possible o use a single Pe i
ne , ins ead o wo, bu hen he use o silen ansi ions is needed, as shown in
(c). On he o he hand he equi alen C-ne ep esen a ion (d) is qui e compac .
The seman ics o ha C-ne can be in o mally desc ibed as:
Ac i i y amus be execu ed ini ially, ollowed ei he by b,co band c. Any
o he hese h ee possibili ies is ollowed by he execu ion o ac i i y e.
Figu e 2 ( om [2]) shows a mo e meaning ul example, desc ibing a C-ne
ha models he p ocess o booking esou ces o a a el.
The disco e y p oblem e e s o ob aining a model (in some sui able o mal-
ism) ha desc ibes he beha io eco ded in a log. To he bes o ou knowledge,
he e a e ew disco e y algo i hms o C-ne s. One indi ec me hod is o i s
2
a
bc
e
a
bc
e
a
b
c
e
Log
abe
ace
abce
acbe
(a)
(b)
(c)
(d) a
b
c
e
Fig. 1. (a) A log. (b) Se o wo Pe i ne s desc ibing he log. (c) Single Pe i ne
desc ibing he log wi h he help o silen ansi ions. (d) Causal ne o he same log.
a
s a
booking
b
book ligh
c
book ca
d
book ho el
e
comple e
booking
Log
abe
ace
abde
adbe
abcde
abdce
acbde
adbce
adcbe
acdbe
Fig. 2. Causal ne C a el om [2].
disco e a Pe i ne and hen con e i in o a C-ne as desc ibed in [2]. This
s a egy is e y cumbe some since he con e sion in oduces a silen ac i i y in
he C-ne o each place in he Pe i ne , hus inc easing signi ican ly he size
o he C-ne , al hough a e y compac C-ne ep esen ing he same language
may exis . Ano he possibili y is o use disco e y algo i hms o lexible heu is-
ic ne s [3], a model closely ela ed o C-ne s, o heu is ic ne s [4, 5] which can be
iewed as a es ic ed subclass o C-ne s. Howe e , hese wo app oaches canno
gua an ee ha he log is included in he language o he de i ed model.
This pape p esen s he i s algo i hm o disco e a C-ne om a log ha
gua an ees ha i) he language o he C-ne includes he log, and ii) he C-ne
3
has he minimal numbe o a cs. I is based on encoding he p oblem as an SMT
o mula, and using bina y sea ch o ind a minimal C-ne (in e ms o numbe
o a cs). A p o o ype ool which in e aces an SMT sol e is epo ed, showing
p omising expe imen al esul s.
O ganiza ion o he pape : In Sec . 2 we gi e he o mal de ini ion o C-
ne s and we in oduce some o he ma hema ical backg ound used in he es o
he pape . Ou app oach o he disco e y o C-ne s is explained in Sec . 3 and
expe imen ally es ed in Sec . 4. Finally, Sec . 5 p esen s some u u e wo k and
concludes his pape .
2 Backg ound
2.1 Ma hema ical p elimina ies
A mul ise (o a bag) is a se in which elemen s o a se Xcan appea mo e han
once, o mally de ined as a unc ion X→N. We deno e as B(X) he space o all
mul ise s ha can be c ea ed using he elemen s o X. Le M1, M2∈B(X), we
conside he ollowing ope a ions on mul ise s: sum (M1+M2)(x) = M1(x) +
M2(x), sub ac ion (M1−M2)(x) = max(0, M1(x)−M2(x)) and inclusion (M1⊆
M2)⇔ ∀x∈X, M1(x)≤M2(x).
A log Lis a bag o sequences o ac i i ies. In his wo k we es ic he ype
o sequences ha can o m a log. In pa icula , we assume ha all he sequences
s a wi h he same ini ial ac i i y and end wi h he same inal ac i i y, and ha
hese wo special ac i i ies only appea once in e e y sequence. This assump ion
is wi hou loss o gene ali y, since any log can be easily con e ed by using wo
new ac i i ies ha a e p ope ly inse ed in each ace.
Gi en a ini e sequence o elemen s σ=e1e2. . . en, i s leng h is deno ed
|σ|=n, and i s p e ix sequence up o elemen i, wi h i≤n, deno ed σi, is
e1. . . ei. We de ine σ0as he emp y sequence, deno ed . Con e sely, i s su ix
sequence a e i, wi h i<n, deno ed σi→, is ei+1 . . . en. The alphabe o σ,
deno ed Aσ, is he se o elemen s in σ. We ex end his no a ion o logs, so ha
ALis he alphabe o he log L,i.e.,AL=Sσ∈LAσ. Finally, he numbe o
occu ences o a gi en elemen ein σ,i.e. |{ei|ei=e}|, is deno ed as #(σ, e).
2.2 Causal ne s (C-ne s)
In his sec ion we in oduce he main model used along his pape .
De ini ion 1 (Causal ne [2]). A Causal ne is a uple C=hA, as, ae, I, Oi,
whe e Ais a ini e se o ac i i ies, as∈Ais he s a ac i i y, ae∈Ais he
end ac i i y, and I(and O) a e he se o possible inpu (ou pu esp.) bindings
pe ac i i y. Fo mally, bo h Iand Oa e unc ions A→SA, whe e SA={X⊆
P(A)|X={∅} ∨ ∅ /∈X}, and sa is y he ollowing condi ions:
–{as}={a|I(a) = {∅}} and {ae}={a|O(a) = {∅}}
4
–all he ac i i ies in he g aph (A, a cs(C)) a e on a pa h om as o ae,
whe e a cs(C)is he dependency ela ion induced by Iand Osuch ha
a cs(C) = {(a1, a2)|a1∈SX∈I(a2)X∧a2∈SY∈O(a1)Y}.
De ini ion 1 sligh ly di e s om he o iginal one om [2], whe e he se
a cs(C) is explici ly de ined in he uple. The C-ne o Fig. 1(d) is o mally
de ined as C=h{a, b, c, e}, a, e, I, Oi, wi h I(a) = ∅,O(a) = {{b},{c},{b, c}},
I(b) = {{a}},O(b) = {{e}},I(c) = {{a}},O(c) = {{e}},I(e) = {{b},{c},{b, c}}
and O(e) = ∅. The dependency ela ion o C, which co esponds g aphically o
he a cs in he igu e, in his case is: a cs(C) = {(a, b),(a, c),(b, e),(c, e)}. The
ac i i y bindings a e deno ed in he igu e as do s in he a cs, e.g.,{b} ∈ O(a) is
ep esen ed by he do in he a c (a, b) ha is nex o ac i i y a, while {a} ∈ I(a)
is he do in a c (a, b) nex o b. Non-single on ac i i y bindings a e ep esen ed
by a cs connec ing he do s: {b, c} ∈ O(a) is ep esen ed by he wo do s in a cs
(a, b), (a, c) ha a e connec ed h ough an a c.
De ini ion 2 (Binding, Binding Sequence, P ojec ion). Gi en a C-ne
hA, as, ae, I, Oi, he se Bo ac i i y bindings is {(a, SI, SO)|a∈A∧SI∈
I(a)∧SO∈O(a)}. A binding sequence β∈B∗is a sequence o ac i i y
bindings. By emo ing he inpu and ou pu bindings om a binding sequence β,
we do ob ain an ac i i y sequence deno ed as σβ.
Two binding sequences o he C-ne in Fig. 1(d) a e: β1= (a, ∅,{b})(b, {a},{e})(e, {b},∅)
and β2= (a, ∅,{b, c})(c, {a},{e})(e, {c},∅). The p ojec ion o β1is σβ1=abe.
The seman ics o a C-ne a e achie ed by selec ing, among all he possible
binding sequences, he ones sa is ying ce ain p ope ies. These sequences will
o m he se o alid binding sequences o he C-ne , and hei co esponding
p ojec ion (see De . 2) will de ine he language o he C-ne . The nex de ini ion
add esses his.
De ini ion 3 (S a e, Valid Binding Sequence, Language). Gi en a C-ne
C=hA, as, ae, I, Oi, i s s a e space S=B(A×A)is composed o s a es ha
ep esen mul ise s o pending obliga ions. Func ion ψ∈B∗→Sis de ined
induc i ely: ψ() = ∅and ψ(β·(a, SI, SO)) = ψ(β)−(SI× {a}) + ({a} × SO).
The s a e ψ(β)is he s a e o he C-ne a e he sequence o bindings β. The
binding sequence β= (a1, SI
1, SO
1). . . (an, SI
n, SO
n)is said o be alid i :
1. a1=as,an=aeand ∀k: 1 < k < n, ak∈A {as, ae}
2. ∀k: 1 ≤k≤n, (SI
k× {ak})⊆ψ(βk−1)
3. ψ(β) = ∅
The se o all alid binding sequences o Cis deno ed as VCN(C). The language
o C, deno ed L(C), is he se o ac i i y sequences ha co espond o a alid
binding sequence o C,i.e.,L(C) = {σβ|β∈VCN(C)}.
Fo ins ance, in Fig. 1(d), β1is a alid binding sequence, while β2is no , since
he inal s a e is no emp y (condi ion 3 is iola ed). The language o ha C-
ne is {abe, ace, abce, acbe}. Simila ly o Wo k low Pe i ne s [6], C-ne s ha e a
no ion o soundness [2]:
5
ab
c
d
e z
(a)
a
b
c
e
(b)
Fig. 3. (a) C-ne mixing concu en and exclusi e beha io o ac i i ies cand d.
(b) Immedia ely ollows C-ne o log L={abce, acbe}. I s language (using egula
exp essions) is ab(cb)∗e∪a(bc)+e∪ac(bc)∗e∪a(cb)+e.
De ini ion 4 (Soundness). A C-ne C=hA, as, ae, I, Oiis sound i (i) ∀a∈
A, ∀S∈I(a),∃β∈VCN(C):(a, S, X)∈β, and (ii) ∀a∈A, ∀S∈O(a),∃β∈
VCN(C) : (a, X, S)∈β. Tha is, e e y inpu and ou pu binding de ined in Cis
used in a leas one alid sequence.
Impo an ly, C-ne s can na u ally ep esen beha io ha canno be easily
exp essed in he Pe i ne no a ion unless unobse able (silen ) ansi ions a e
used. Fig. 3(a) illus a es his poin . In he C-ne , ac i i ies cand dcan occu
concu en ly o exclusi ely, e en in di e en i e a ions o he loop c ea ed by
ac i i y ,e.g.,abcdez o abde bcez. Howe e , he e is s ill ano he possibili y
ha a ises om combining he AND-spli and he XOR-join: abdcee bd bcez.
No e ha in his las ace, ac i i y ecould execu e wice o a single b(al hough
in he o e all ace hey execu e he same numbe o imes).
C-ne s, con a y o Pe i ne s, ha e an “addi i e” na u e. Tha is, while
adding a place o a Pe i ne can only es ic beha io , adding an a c (o any
o he elemen ) o a C-ne can only add beha io . The “addi i e” na u e o C-ne s
is o mally de ined wi h he help o De . 5 and P ope y 1.
De ini ion 5. Gi en wo C-ne s C1=hA1, a1
s, a1
e, I1, O1iand C2=hA2, a2
s, a2
e, I2, O2i,
we say ha C1is included in C2, deno ed C1⊆C2, i :
–a1
s=a2
s∧a1
e=a2
e,
–A1⊆A2, and
–∀a∈A1, I1(a)⊆I2(a)∧O1(a)⊆O2(a)
P ope y 1. Le C1and C2be wo C-ne s. I C1⊆C2, hen VCN(C1)⊆VCN(C2),
L(C1)⊆ L(C2) and a cs(C1)⊆a cs(C2).
2.3 C-ne disco e y
Gi en a log L, he p oblem ackled in his pape is o de i e a C-ne Csuch ha
L(C)⊇Land Ccon ains he minimal numbe o a cs. In Sec . 3 we p esen
such a me hod oge he wi h heu is ics o limi he language o he de i ed ne .
6
Gi en he addi i e na u e o C-ne s, he e is a simple me hod o gene -
a e a C-ne ha can eplay all he aces in L. I is based on he immedi-
a ely ollows ela ion [6] be ween he ac i i ies in L, deno ed <Land de ined as
<L={(a, b)| ∃σ=a1. . . an∈L:ai=a∧ai+1 =b}.
De ini ion 6. Gi en a log L, he immedia ely ollows C-ne o L, deno ed CIF(L),
is he C-ne hA, as, ae, I, Oisuch ha : (i) A=AL, (ii) ∀σ=a1. . . an∈L, a1=
as∧an=ae, (iii) ∀a∈A, O(a) = {{b} | a <Lb} ∧ I(a) = {{b} | b <La}.
The immedia ely ollows C-ne can be compu ed in linea ime wi h espec o
he size o he log, bu allows o many unobse ed beha io , hus exhibi ing a
poo p ecision [7]. Fo ins ance conside he ollowing log: L={abce, acbe}.
Ac i i ies band cin e lea e, bu i we build he immedia ely ollows C-ne
(Fig. 3(b)) we can see ha i allows o loops o a bi a y leng h in ol ing
hese wo ac i i ies, e.g.,abcbce o acbce, since L(CIF(L)) = ab(cb)∗e∪a(bc)+e∪
ac(bc)∗e∪a(cb)+e, which signi ican ly di e s om L.
3 Disco e ing s a egies o C-ne s based on SMT
3.1 P o obinding sequences o a log
In Sec . 2.2 we ha e seen i s he de ini ion o a C-ne and hen he de ini ion o
he alid sequences o bindings i can p oduce. To disco e a C-ne om a log,
we ollow he same pa h bu in he opposi e di ec ion: we will de ine sequences
o iples ep esen ing un es ic ed bindings ha sa is y some p ope ies, and
hen we will show ha gi en hese sequences, i is easy o ob ain a C-ne Csuch
ha hese sequences a e ac ually alid sequences o bindings o C. Consequen ly,
his ans o ms he disco e y p oblem o C-ne s in o he p oblem o de i ing
hese sequences o iples om he sequences in he log. Le us i s o malize
he concep o p o obinding:
De ini ion 7 (P o obinding, Well-Fo med P o obinding Sequence). A
iple (a, X, Y )is a p o obinding i ais an elemen and bo h Xand Ya e se s.
A sequence β= (a1, X1, Y1). . . (an, Xn, Yn)o p o obindings is well- o med i i
sa is ies he ollowing condi ions:
(W1) ∀i: 1 < i ≤n, Xi⊃ ∅ ∧ ai6=a1
(W2) ∀i: 1 ≤i < n, Yi⊃ ∅ ∧ ai6=an
(W3) X1=Yn=∅
(W4) ∀i: 1 ≤i≤n, ψ(βi−1)⊇(Xi× {ai})
(W5) ψ(βn) = ∅
Compa ed wi h he de ini ion o binding (De . 2), his is a weake de ini ion since
i is no longe ied o a pa icula C-ne . Gi en a se Bo sequences o well- o med
p o obindings i is possible o cha ac e ize (wi h necessa y condi ions) all he
C-ne s such ha hei se o alid sequences o bindings con ain he sequences
o p o obindings in B, as nex lemma s a es.
7
Lemma 1. Gi en a se o well- o med p o obinding sequences Bwi h iden ical
ini ial and inal ac i i ies asand ae, espec i ely, and a C-ne C=hA, as, ae, I, Oi.
The ollowing condi ions:
(N1) A⊇ {ai| ∃β∈B: (ai, Xi, Yi)∈β}
(N2) ∀a∈A, I(a)⊇ {Xi| ∃β∈B: (ai, Xi, Yi)∈β}
(N3) ∀a∈A, O(a)⊇ {Yi| ∃β∈B: (ai, Xi, Yi)∈β}
hold i , and only i , VCN(C)⊇B.
P oo . ⇒Take any p o obinding sequence β∈B, we will p o e ha since i
is well- o med i is ac ually a alid binding sequence o C, hus VCN(C)⊇B.
By De . 2, we know ha o be a binding sequence (no necessa ily alid) o
C, e e y p o obinding (ai, Xi, Yi)∈βmus sa is y ha ai∈A,Xi∈I(ai)
and Yi∈O(ai), which is i ially ue gi en ou de ini ion o C. Thus, βis a
binding sequence o C. We now p o e ha βis in ac also alid. To p o e his
we use he ac ha βis well- o med ( hus sa is ies W1 o W5). P o ing ha
β= (a1, X1, Y1). . . (an, Xn, Yn) is a alid binding sequence, we need o p o e
ha a1=as,an=aeand ∀k: 1 < k < n, ak∈A {as, ae}. The i s wo
condi ions a e sa is ied because we equi e ha all he sequences in Bs a wi h
asand end wi h ae. The hi d condi ion is gua an eed by W1 and W2. The
emaining wo condi ions o a alid sequence a e di ec ly W4 and W5.
⇐I Cis a C-ne and L(C)⊇ {σβ|β∈B}, hen i N1 does no hold i
exis s some ac i i y ha appea s in he sequences o B ha canno be execu ed
by C. I N2 does no hold, hen he e is some sequence ha is no possible
because some Xicanno be used by C, and he same applies o N3 and Yi.u
C ea ing a uple hA, as, ae, I, Oisa is ying N1, N2 and N3 does no necessa ily
yield a C-ne ( o ins ance we could add ac i i ies o A ha iola e some o he
condi ions o De . 1), since hese a e necessa y bu no su icien condi ions. To
gua an ee ha a C-ne is gene a ed and i is sound (c. ., De . 4), we es ic he
condi ions o he p e ious lemma in he ollowing heo em:
Theo em 1. Gi en a se o well- o med p o obinding sequences Bwi h iden ical
ini ial and inal ac i i ies asand ae, espec i ely, he uple C=hA, as, ae, I, Oi
wi h:
(T1) A={ai| ∃β∈B: (ai, Xi, Yi)∈β}
(T2) ∀a∈A, I(a) = {Xi| ∃β∈B: (ai, Xi, Yi)∈β}
(T3) ∀a∈A, O(a) = {Yi| ∃β∈B: (ai, Xi, Yi)∈β}
is a sound C-ne such ha VCN(C)⊇B.
P oo . We will p o e ha Cis a C-ne , since hen we can use Lemma 1 o
p o e ha VCN(C)⊇B. P o ing ha Cis sound, once we know i is a C-ne ,
is s aigh o wa d since e e y inpu and ou pu binding in Iand Oappea s in
a leas one o he sequences in B, hus in a leas one alid binding sequence
o C, hus Cis sound. So we ha e only o p o e ha Cis a C-ne sa is ying
8
De . 1. Fi s o all we ha e o p o e ha asis he only ac i i y wi h emp y
inpu binding (I(as) = ∅) and aeis he only ac i i y wi h emp y ou pu binding
(O(ae) = ∅). Conside all sequences β= (a1, X1, Y1). . . (an, Xn, Yn)∈B, by
W3, we know ha ∅ ∈ I(as) and ∅ ∈ O(ae) (because a1=asand an=ae),
and since hese wo ac i i ies a e only execu ed once (due o W1 and W2) he e
is no o he se in I(as) and O(ae). Since he o he ac i i ies a∈A {as, ae}
a e no execu ed a he begining no he end o β, also by W1 and W2 we
know hei Xiand Yise s a e non-emp y, hus hey canno ha e I(a) = {∅}
no O(a) = {∅}. We mus also p o e ha he Iand O unc ions a e de ined
o e he powe se o A,i.e.,∀a∈A, ∀X∈I(a), X ⊆A. Because o W4 and
W5 he sequence can only consume obliga ions p e iously p oduced, and all
he obliga ions mus be consumed. I some Xiin βcon ains ac i i ies no in
A, i is impossible o sa is y W4 hus i would no be well- o med, which is a
con adic ion. Simila ly, i some Yiin βcon ains ac i i ies no in A, hen he
obliga ions c ea ed can ne e be consumed, iola ing W5. Finally, we ha e o
p o e ha all he ac i i ies in he g aph (A, a cs(C)) a e on a pa h om as o ae.
Again by W4 and W5, since an ac i i y ai(di e en han as) can only execu e
when i has a leas one obliga ion (a, ai) o i in ψ(βi−1) and a∈Xi, hus
(a, ai)∈a cs(C), and he sou ce o all his chain o obliga ions is asand ends in
aeo o he wise ai(o some o i s successo s in he obliga ion chain) would ha e
p oduced an obliga ion ha nobody would ha e consumed, iola ing W5, hen
e e y ac i i y is in a dependency chain be ween asand ae.u
The heo em allows an easy con e sion om p o obinding sequences o C-
ne s, so ha he C-ne disco e y p oblem om a log Lcan be educed o he
ollowing p oblem: gi en a log L, compu e a well- o med p o obinding sequence
o each sequence in L. Since by de ini ion all ou sequences in he log ha e
he same ini ial and inal ac i i ies, hen all he p o obinding sequences will also
ha e, hus we can use Theo em 1 o disco e a C-ne . Al hough he heo em does
no conside all he C-ne s whose alid binding sequences include he p o obind-
ing sequences B, i gi es always he smalles C-ne (in e ms o alid binding
sequences and also in e ms o numbe o s uc u al elemen s o he C-ne ) ha
can gene a e he sequences in Bas nex co olla y s a es.
Co olla y 1. Gi en a se o well- o med p o obinding sequences Bwi h iden-
ical ini ial and inal ac i i ies asand ae, espec i ely, a C-ne Csuch ha
VCN(C)⊇Band a C-ne C⊥sa is ying T1, T2 and T3, hen C⊥⊆C.
P oo . I Cis a C-ne and VCN(C)⊇B, hen by Lemma 1 we know i sa is ies
N1, N2 and N3. Since C⊥sa is ies T1, T2 and T3, by De . 5 C⊥⊆C, and his
en ails by P ope y 1 VCN(C⊥)⊆VCN(C). u
In he nex sec ion we explain how we can encode as linea cons ain s he
p oblem o compu ing he sequences o p o obindings.
3.2 Encoding he p oblem as linea cons ain s
Gi en a sequence σ=a1. . . ano a log L, i is i ial o build a p o obinding
sequence βσou o i as βσ= (a1, X1, Y1). . . (an, Xn, Yn). The di icul pa is
9
o ensu e ha βσis ac ually well- o med. We will encode he unknown Xiand
Yise s using in ege a iables and hen de ine he linea cons ain s ha will
gua an ee ha βσis well- o med. We s a by delimi ing he alues ha he Xi
and Yiunknowns can ake using he ollowing p ope y:
P ope y 2. Le σ=a1. . . anbe a sequence o ac i i ies. Conside he p o o-
binding sequence βσ= (a1, X1, Y1). . . (an, Xn, Yn). I βσis well- o med, hen
∀i: 1 ≤i≤n, Xi⊆Aσi−1∧Yi⊆Aσi→.
P oo . I o some i,Xi6⊆ Aσi−1, hen le a∈Xi Aσi−1. Now ac i i y ai
is wai ing o an obliga ion (a, ai) ha canno ha e been p oduced (since a /∈
Aσi−1). Thus ψ(βi−1)6⊇ (Xi× {ai}) and βis no well- o med ( iola es W4),
which is a con adic ion. Simila ly, i Yi6⊆ Aσi→, hen i exis s an ac i i y asuch
ha a∈Yi Aσi→. Now obliga ion (ai, a) canno be consumed (since a /∈Aσi→)
so ψ(βn)6=∅( iola ing W5), and again βis no well- o med. u
To encode a i hme ically he se s Xiand Yi o each βσ, we use an in ege
a iable o e he domain {0,1}(i.e., a Boolean a iable, al hough we ea i as
an in ege in his sec ion) o encode he ac ha a pa icula ac i i y belongs o
he se . In pa icula we use a a iable xσ,i,(a,ai) o indica e whe he ac i i y a
belongs o Xiin βσo no . As usual when se s a e encoded using cha ac e is ic
unc ions we use he ollowing seman ics:
xσ,i,(a,ai)=(1 i a∈Xiin βσ
0 o he wise.
Simila ly, he a iable yσ,i,(ai,a)indica es i abelongs o Yiin βσ. Due o P op-
e y 2, he ac i i y a o x a iables can only be chosen among he alphabe o
p e ix σi−1, while in y a iables i is es ic ed o he alphabe o he su ix o σ
a e ai,i.e.,Aσi→. We deno e by Xand Y he se o all xand all y a iables,
espec i ely.
We will now ew i e he condi ions (W1,W2,W3,W4 and W5) o De . 7 o a
well- o med p o obinding sequence βσ= (a1, X1, Y1). . . (an, Xn, Yn) as inequal-
i ies using he Xand Y a iables.
Condi ion W1 In his case, pa o he condi ion is al eady gua an eed, since
ou de ini ion o log al eady assumes ha he ini ial ac i i y only appea s once.
Thus he condi ion simpli ies o equi ing ha e e y Xi(excep X1) mus be
non-emp y:
∀i: 1 < i ≤n, X
e∈Aσi−1
xσ,i,(e,ai)≥1 (1)
Condi ion W2 This is he symme ical case o W1 bu wi h he Yise s. Since
he uniqueness o he inal ac i i y is al eady gua an eed, we mus only en o ce
ha he Yise s (excep Yn) a e non-emp y:
∀i: 1 ≤i < n, X
e∈Aσi→
yσ,i,(ai,e)≥1 (2)
16
he log and he posi ion ha he ac i i y occupies in all o he sequences ha
sha e a p e ix wi h he cu en sequence. Fo ins ance in he log used in Fig. 4,
{abcdez, abdcez}, ac i i y dis execu ed in he ou h posi ion in he i s se-
quence, bu he e is ano he sequence sha ing a p e ix ab in which i appea s in
he hi d posi ion. Consequen ly, we would penalize ac i i y di i s inpu binding
does no con ain a leas one o he ac i i ies a e he i s posi ion. Tha is,
i i s inpu binding is {a}we would add one o he penal y unc ion. Summing
all he penal ies o he i s occu ence o e e y ac i i y in each sequence, we
ob ain an exp ession ha can be bounded, simila ly o (5), and added in o he
SMT p oblem. In ou example, he C-ne o 4(b) would ha e a penal y o one
(because o ac i i y c), while he C-ne in (a) has a ze o penal y.
Limi ing he inpu /ou pu bindings pe ac i i y The second app oach
in ol es limi ing he numbe o inpu and/o ou pu bindings pe ac i i y. Aux-
ilia y a iables a e used o his ask. We illus a e his poin using a iables X,
since he s a egy o he Y a iables is iden ical. Assume ha wo inpu bindings
a e allowed o each ac i i y. Fo each a iable xσ,i,(a,b)in ol ing an obliga ion
(a, b) we gene a e one a iable ik,(a,b) o each one o he kinpu bindings we
allow. Since in his case k= 2, his means ha we would ha e i1,(a,b)and i2,(a,b).
Now o e e y inpu binding se in posi ion io sequence σ, we will en o ce ha
W1≤j≤kVa∈Aσixσ,i,(a,ai)=ij,(a,ai)∧Va/∈Aσixσ,i,(a,ai)= 0.
Limi ing he obliga ion alphabe We ha e seen in P ope y 1 ha he se
o inpu bindings o an ac i i y aiin sequence σis a subse o Aσi. Howe e
his allows o e y long causal dependencies, ha a e a he un equen in mos
o he logs. We can simpli y he C-ne disco e y p oblem by bounding his se
using a window: we will only allow ac i i y ai o consume obliga ions gene a ed
by ac i i ies ha a e a mos a dis ance wo any occu ence o aiin any o he
sequences o he log. Using a window size o one (w= 1) he numbe o a iables
can be d ama ically educed ( he same p inciples a e used o es ic he ou pu
binding se s), allowing he disco e y o la ge benchma ks. Howe e , when using
his app oach complex causali ies be ween ac i i ies can be missed.
4 Expe imen s
Fi s o all, Table 2 desc ibes some o he examples used in ou expe imen s.
We ha e used logs ob ained by simula ion o C-ne s coming om [2], o ha we
ha e c ea ed o ep esen a a ie y o non- i ial beha io . O he benchma ks
a e logs in oduced in [12], o which we ha e manually c ea ed a C-ne ( o se
a a ge C-ne o achie e) om a Pe i ne gene a ed by a disco e y ool using
he heo y o egions [11]. The able also includes he ollowing in o ma ion: |L|
is he numbe o dis inc sequences in he log, |σm|is he leng h o he la ges
sequence and |A|is he size o he alphabe o ac i i ies.
17
aals 2b [2] |L|= 10, |σm|= 5, |A|= 5
abcde
ab
c
de
abcbcdde
abbbcccddde
abbcdcde
abcbdcde
abcbbdccdde
abbccdde
abbcbdcdcde
mixedXo And |L|= 8, |σm|= 11, |A|= 5
abcdez
See Fig. 3(a)abde bcez
abdcee bd bcez
a12 0n00 5 [12] |L|= 5, |σm|= 7, |A|= 12
SbcejE
S
b
c
d
e
j
g
h
i
k
E
SbdjE
S ghikE
S gihkE
S hgikE
op ional1 |L|= 11, |σm|= 8, |A|= 6
ab
a
b
c
e
d
acbde
abbed
abbbbed
acbed
acbbbed
acbb
abbbbde
abbde
acbbbde
acb
cycles |L|= 7, |σm|= 18, |A|= 8
abc gz
ab
c
d h
g
z
abcdbc gh gz
abc ghdbc gz
abc gdhbc gz
abc dgbhc gz
abc dghbc gz
abcdbcdbc gh gh gz
Table 2. Logs used in he expe imen s
18
We ha e implemen ed Algo i hm 1 in a p o o ype ool ha uses he STP
sol e [9] as he unde lying SMT sol e . We ha e compa ed i wi h he Flexible
Heu is ic Mine (FHM) [3] which is able o disco e a o malism simila o C-ne s
(called lexible heu is ic ne s om a log [3]) . Table 3 shows he esul s on some
small log examples wi h he ollowing in o ma ion: a cs is he numbe o a cs
o he inal C-ne , Tis he elapsed ime (in seconds) equi ed o comple e he
disco e y p ocess, id indica es i he ob ained C-ne was iden ical o he o iginal
one, in he case whe e he log o igina ed om a C-ne , o has he same language
as a Pe i ne ound using he heo y o egions, c is he cos -based i ness pe
case me ic o [13] whe e 1.0 indica es ha all sequences in he log belong o
he language o he C-ne , and he smalle he alue is he less sequences a e
ep oducible by he C-ne , |X∪Y|is he numbe o Boolean a iables used o
encode he SMT p oblem, |E|is he numbe o equa ions ha he SMT p oblem
con ains, bounds is he ini ial ange in he numbe o a cs whe e he bina y
sea ch mus ake place, i is he numbe o i e a ions o ob ain his C-ne , and
column heu indica es i some o he heu is ics in Sec . 3.6 was used, whe e
e e s o es ic ing he i s occu ence o ac i i ies and i(o) o limi ing he
numbe o inpu (ou pu ) bindings pe ac i i y.
FHM Algo i hm 1
Benchma k a cs T id c |X∪Y| |E|bounds a cs i T heu id c
aals 1 (Fig. 2) 6 0.1 n 0.71 156 147 [6,11] 6 2 0.3 – y 1.0
aals 2b 7 0.0 n 0.24 156 147 [5,9] 6 3 0.2 – n 1.0
mixedXo And 8 0.0 n 0.12 219 162 [7,11] 8 3 0.2 – y 1.0
a12 0n00 5 14 0.2 n 0.87 176 143 [12,17] 14 3 0.1 y 1.0
op ional1 6 0.0 n 0.27 413 264 [6,10] 9 2 0.1 ,o y 1.0
cycles 9 0.1 n 0.09 839 542 [8,17] 9 3 1.3 ,i y 1.0
a22 0n00 1 34 0.5 n 0.37 28898 18942 [22,166] ≤39 ≥4>1h – – –
Table 3. Resul s o disco e y algo i hm on small examples.
The esul s on hese small benchma ks show ha he app oach is, in gene al,
able o de i e aluable C-ne s. In ac he quali y o he disco e ed ne s is much
be e han he ones de i ed using FHM. Fo ins ance, he la e gene a es ou
C-ne s wi h emp y language. In con as , Algo i hm 1 always gene a es C-ne s
whose language con ains he gi en log ( i ness=1.0), no only his bu also i
edisco e s he o iginal C-ne s in mos o he cases. Howe e wo logs a e no
success ully disco e ed: o he aals 2b benchma k, we ob ain a C-ne equal o
he one in Table 2, bu wi hou he a c be ween band d; on he o he hand, he
la ges benchma k in his able (a22 0n00 1 om [12]) could no be disco e ed
in he one hou limi used in ou expe imen s. Al hough i seems easonable o
assume ha he la ge numbe o a iables and equa ions is he esponsible o
his ac , a mo e ca e ul e alua ion shows ha his is no he de e minan ac o .
Fo ins ance, in Fig. 5, we can see he ime used by he sol e o sol e he se o
19
equa ions, as he bound in he numbe o a cs allowed is educed. Ini ially, he
sol e inds quickly a solu ion, bu as he bound app oaches he lowe limi , he
ime needed g ows exponen ially. The eason is simple: he se o alid solu ions
diminishes as he bound also educes, and he sol e has o spend mo e ime
sea ching. No e ha he se o equa ions o sol e in all hese cases is exac ly he
same (same a iables, same equa ions) wi h he only excep ion ha he cons an
used o bound he numbe o a cs is di e en .
10
100
1000
10000
40 60 80 100 120 140
Sol e ime (s)
Bound (numbe o a cs)
Fig. 5. s p sol e imes o he a22 0n00 1 benchma k.
To be able o p ocess la ge benchma ks we ha e o eso o ou las heu is ic
(limi ing he obliga ion alphabe ). Table 4 shows he esul s o ou p e ious
benchma ks as well as some la ge examples also om [12]. In his case we ha e
no used any o he heu is ic. Despi e he ac ha he o iginal models we e
disco e ed only in h ee o he benchma ks, in each case he model ound was
ac ually he o iginal bu in which some addi ional inpu and ou pu binding se s
we e p esen . This s ongly sugges s ha a s a egy ha minimizes he numbe
o such se s would g ea ly imp o e he esul s, as well as he o e all eadabili y
o he de i ed C-ne s.
5 Conclusion and Fu u e Wo k
This pape has p esen ed an algo i hm o de i e a C-ne om a se o aces,
which gua an ees minimali y in he numbe o a cs. As u u e wo k, we plan o
inco po a e in he algo i hm new ideas on how o bound he language o he
C-ne ob ained. Also, high-le el s a egies ha can make he app oach able o
handle indus ial examples will be conside ed in he u u e.
20
Benchma k |L| |σm| |A| |X∪Y| |E|bounds a cs i T id c
aals 1 10 5 5 136 137 [6,11] 6 2 0.0 y 1.0
aals 2b 8 11 5 240 246 [5,9] 6 3 0.1 n 1.0
mixedXo And 3 14 7 89 98 [7,11] 8 3 0.0 y 1.0
a12 0n00 5 5 7 12 72 91 [12,17] 14 3 0.0 y 1.0
op ional1 11 8 6 229 220 [6,10] 9 2 0.0 n 1.0
cycles 7 18 8 265 288 [8,17] 9 3 0.1 y 1.0
a22 0n00 1 99 46 22 12827 10369 [22,166] 34 7 9.3 n 1.0
a22 0n00 5 836 76 22 121281 97429 [22,183] 34 7 264.9 n 1.0
a32 0n00 1 100 73 32 26378 19049 [32,362] 46 8 35.4 n 1.0
a42 0n00 1 100 58 42 48432 31815 [42,735] 63 9 248.4 n 1.0
Table 4. Resul s o disco e y algo i hm when heu is ics o limi he numbe o a iables
a e used (ac i i y window o size 1).
Re e ences
1. an de Aals , W.M.P.: P ocess Mining - Disco e y, Con o mance and Enhance-
men o Business P ocesses. Sp inge (2011)
2. Van De Aals , W., Ad iansyah, A., Van Dongen, B.: Causal ne s: a modeling
language ailo ed owa ds p ocess disco e y. In: CONCUR. (2011) 28–42
3. Weij e s, A.J.M.M., Ribei o, J.T.S.: Flexible heu is ics mine ( hm). In: CIDM,
IEEE (2011) 310–317
4. an de Aals , W.M.P., de Medei os, A.K.A., Weij e s, A.J.M.M.: Gene ic p ocess
mining. In: ICATPN. Volume 3536 o LNCS. (2005) 48–69
5. Weij e s, A., an de Aals , W., de Medei os, A.A.: P ocess mining wi h he
heu is ics mine -algo i hm. Technical Repo WP 166, BETA Wo king Pape Se-
ies, Eindho en Uni e si y o Technology (2006)
6. an de Aals , W.M.P., Weij e s, T., Ma us e , L.: Wo k low mining: Disco e ing
p ocess models om e en logs. IEEE TKDE 16(9) (2004) 1128–1142
7. Munoz-Gama, J., Ca mona, J.: A F esh Look a P ecision in P ocess Con o mance.
In: Business P ocess Managemen (BPM). (2010)
8. Jha, S., Limaye, R., Seshia, S.: Bea e : Enginee ing an e icien SMT sol e o
bi - ec o a i hme ic. In: Compu e Aided Ve i ica ion. (2009) 668–674
9. Ganesh, V., Dill, D.L.: A decision p ocedu e o bi - ec o s and a ays. In: Com-
pu e Aided Ve i ica ion. (2007) 524–536
10. Be gen hum, R., Desel, J., Lo enz, R., S.Mause : P ocess mining based on egions
o languages. In: Business P ocess Managemen (BPM). (2007) 375–383
11. Badouel, E., Be na dinello, L., Da ondeau, P.: Polynomial algo i hms o he syn-
hesis o bounded ne s. In: Theo y and P ac ice o So wa e De elopmen (TAP-
SOFT). Volume 915 o LNCS. (1995) 364–383
12. an de We , J.M.E.M., an Dongen, B.F., Hu kens, C.A.J., Se eb enik, A.: P o-
cess disco e y using in ege linea p og amming. In: Pe i Ne s. Volume 5062 o
LNCS. (2008) 368–387
13. Ad iansyah, A., an Dongen, B., an de Aals , W.: Con o mance checking using
cos -based i ness analysis. In: En e p ise Dis ibu ed Objec Compu ing Con e -
ence (EDOC). (2011) 55 –64