Original Research
Quantifying Structural Selection Bias in Observational Cohort Data: A Ponderation Analysis of Age-Specific Incidence Rates to Inform Vaccine Safety Verification
Marco Roccetti
Department of Computer Science and Engineering
University of Bologna, 40126, Italy
ma co. occe [email protected]
ORCID: 0000-0003-1264-8595
Abstract
Background: A recent nationwide cohort study reported an unadjusted Hazard Ratio (HR) of 2.714 for Vitiligo incidence following COVID-19 vaccination, indicating a major safety concern. This finding was based on cohorts with an ≈ 11-year age difference, immediately raising critical concerns regarding extreme structural selection and detection bias.
Objectives: We hypothesize that this extreme association is an artifact of a fatal methodological flaw, challenging the study's internal validity and, consequently, its external validity. We aim to quantitatively separate the HR attributable to the structural age imbalance (HR_Structural) from the residual HR (HR_Residual), which quantifies the uncorrected methodological failure. We further perform a plausible recalculation of risk to demonstrate the complete collapse of the risk signal once the methodological failure in the baseline cohort is corrected.
Methods: We performed a stratified ponderation analysis using the age distribution of the scrutinized study's cohorts (Vaccinated, mean age = 56.32 years vs Non-Vaccinated, mean age = 45.51 years) and applied established national age-specific Vitiligo incidence rates (IR) from external epidemiological data.
Results: HR_Structural was calculated to be 1.21. The remaining HR_Residual of 2.24 quantifies the uncorrected methodological failure. The observed incidence rate of the Non-Vaccinated (NV) cohort (0.67 / 10,000) was found to be nearly 70% lower than the expected rate (2.21 / 10,000), providing quantifiable evidence of profound non-comparability. The subsequent recalculation of risk, correcting for this baseline failure, reduces the observed HR of 2.714 to an HR_Corrected of almost unity, thus completely annulling the signal of risk due to vaccination.
Discussion: The HR = 2.714 of the scrutinized study is an unstable statistical artifact. The overwhelming majority of the observed association is a consequence of a fatal design flaw. An HR_Corrected of almost 1 confirms that correcting the methodological error eliminates the risk signal, demonstrating a severe lack of internal and external validity in the original study.
Keywords: Covid-19 vaccine safety, Structural selection bias, Ponderation analysis, Hazard ratio decomposition, Internal/External validity, Detection bias
1. Introduction
Observational studies utilizing national registries, such as those conducted in South Korea [1], represent a critical resource for post-marketing surveillance and vaccine safety verification. However, the reliance on pre-existing data necessitates strict adherence to established methodological standards, notably the STROBE (STrengthening the Reporting of OBservational studies in Epidemiology) guidelines [2]. The primary goal is to ensure internal validity, that is, that the observed association is real within the study context, which is a prerequisite for achieving external validity (that is, generalizability to the broader population).
The recent study published in [1] reported a strikingly high, unadjusted Hazard Ratio (HR_Observed) of 2.714 for Vitiligo following COVID-19 vaccination, based on a comparison between a Vaccinated (V) cohort (mean age = 56.32 years) and a Non-Vaccinated (NV) cohort (mean age = 45.51 years). This ≈ 11-year age difference immediately flagged critical concerns regarding confounding by indication and immortal time bias [3]. The sheer magnitude of the ≈ 11-year age difference, coupled with the cumulative incidence rates observed (2.22 vs 0.67 per 10,000), strongly suggests that the cohorts were inherently non-comparable.
Our analysis posits that the reported HR = 2.714 is not a reflection of a robust biological signal but rather a quantitative measure of a fatal design flaw. We hypothesize that an Extreme Structural Selection and Detection Bias was introduced by defining the cohorts in a manner that artificially minimized the baseline risk in the NV group, while simultaneously maximizing the detection and prevalence risk in the V group. We present a rigorous, quantitative method, namely a stratified ponderation analysis using external South Korean national age-specific incidence data, to decompose the observed HR and isolate the true contribution of the structural bias [6-10].
The quantitative findings of the present research confirm this hypothesis: the structural age difference alone accounts for a calculated Structural Hazard Ratio (HR_Structural) of 1.21. Importantly, we demonstrate that the non-comparability of the baseline cohort (NV) caused its observed incidence rate to drop nearly 70% below the expected rate. The overwhelming majority of the observed association is captured by the Residual Hazard Ratio (HR_Residual of 2.24), which stands as a clear measure of the uncorrected methodological failure. We further show that applying a reasonable recalculation of risk, which corrects the failure of the baseline incidence rate, reduces the observed risk signal from 2.714 to a clinically insignificant HR of approx. 1 (HR_Corrected).
Essentially, the quantitative findings of the present research move along two concatenated directions. First, we confirm the hypothesis that the structural age difference alone accounts for a calculated structural Hazard Ratio (HR_Structural) of 1.21. This means that the observed demographic imbalance explains slightly more than 12% of the initially reported excess risk. Then, we show that the majority of the association, captured by the Residual Hazard Ratio (HR_Residual) of 2.24, simply stands as a clear measure of the uncorrected methodological failure. This substantial residual value not only strongly indicates that the cohorts lacked common support, leading to a profound violation of the assumption of comparability required by the Cox Proportional Hazards model, but can also be explained if one takes into account the drop of nearly 70% in the incidence rate of Vitiligo in the NV group. Putting this information into the calculation easily leads to a corrected HR value of almost 1, which practically annuls the risk difference between the V and NV subgroups.
Ultimately, the goal of this re-evaluation is to reassert the imperative of epidemiological validity in studies of vaccine safety derived from observational data. We demonstrate that sophisticated statistical adjustments cannot remedy fundamental flaws in cohort design where non-measured confounding factors, such as health-seeking behavior, surveillance frequency and the distribution of the disease incidence peaks, are unevenly distributed [7]. By quantitatively isolating and measuring the non-causal structural bias, our analysis provides a critical framework for interpreting extreme risk estimates and ensuring that public health conclusions are based on associations that are epidemiologically sound, rather than artifactual.
2. Methods
Here we provide the fundamental methods and data needed to weight (ponderate) the structural bias under investigation.
2.1 Study Data and Baseline Characteristics
We extracted the following key data from [1] to establish the basis of the structural bias, as reported in Table 1.
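Table 1: Baseline Characteristics and Unadjusted Incidence Rates from [1].

| Cohort              | Mean Age    | Standard Deviation (SD) | Cumulative Incidence Rate at 3 mo (per 10,000 p-y) |
|---------------------|-------------|-------------------------|----------------------------------------------------|
| Non-Vaccinated (NV) | 45.51 years | 17.31                   | P(NV) = 0.67                                       |
| Vaccinated (V)      | 56.32 years | 16.55                   | P(V) = 2.22                                        |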
2.2 Stratified Ponderation Analysis
We performed a first, preliminary quantitative analysis by combining the age distribution percentages P(i) of the V and NV groups of [1] with independent, established age-specific annual incidence rates IR(i) of Vitiligo in South Korea, based on 2019 data as reported in [5]. These inputs are reported in Table 2 below.
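Table 2: Input Data for Ponderation Analysis

| Age Group | South Korean IR (per 10,000 p-y) | % in NV Group, P(i, NV) | % in V Group, P(i, V) |
|-----------|----------------------------------|-------------------------|-----------------------|
| < 20 y    | 3.4241                           | Not included in [1]     | Not included in [1]   |
| 20-29 y   | 1.5717                           | 18.46%                  | 9.92%                 |
| 30-39 y   | 1.7813                           | 25.49%                  | 7.70%                 |
| 40-49 y   | 1.9053                           | 20.92%                  | 10.82%                |
| 50-59 y   | 2.5874                           | 14.16%                  | 24.76%                |
| >= 60 y   | 3.3643                           | 20.97%                  | 45.79%                |
| Total     | n/a                              | 100%                    | 100%                  |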
A key observation to be noted here is that the composition of the NV group is strongly influenced by the presence of young individuals who are already past their first peak of Vitiligo onset. This contrasts with the massive presence in the V group of older individuals who fall entirely within the interval where the second incidence peak of Vitiligo manifests, Vitiligo being widely recognized as having a bimodal distribution (under 20, over 40/50 years) [5].
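As a rough arithmetic check of this observation, the sketch below sums the Table 2 percentages for the under-40 and 50-and-over strata of each cohort. The dictionary keys and the helper name `share` are illustrative choices, not part of the original analysis. Note that the V percentages, as listed in Table 2, add up to 98.99% rather than 100%.

```python
# Age-group percentages copied from Table 2; the "< 20 y" stratum is not reported in [1].
P_NV = {"20-29": 18.46, "30-39": 25.49, "40-49": 20.92, "50-59": 14.16, ">=60": 20.97}
P_V = {"20-29": 9.92, "30-39": 7.70, "40-49": 10.82, "50-59": 24.76, ">=60": 45.79}

YOUNGER = ["20-29", "30-39"]  # already past the first (under-20) onset peak
OLDER = ["50-59", ">=60"]     # inside the later incidence peak


def share(percentages, groups):
    """Sum the percentages of the selected age groups (hypothetical helper)."""
    return sum(percentages[g] for g in groups)


for name, p in [("NV", P_NV), ("V", P_V)]:
    print(
        f"{name}: under 40 = {share(p, YOUNGER):.2f}%, "
        f"50 and over = {share(p, OLDER):.2f}%, "
        f"listed total = {share(p, p.keys()):.2f}%"
    )
```

With these inputs the NV cohort has 43.95% of its members under 40 and 35.13% aged 50 or over, versus 17.62% and 70.55% in the V cohort.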
2.3 Calculation of HR_Structural and HR_Residual
The Expected Annual Incidence Rate, IR(Expected), of each cohort, based solely on its structural age composition, can be calculated using the following Formula 1:
IR(Expec ed) = ∑(IR(i) × P(i)). (1)
where IR(i) are the South Korean age-specific incidence rates from external data (Table 2) and P(i) are the proportional distributions of the respective cohort (V or NV), reported in the same Table 2.
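Formula 1 is a weighted sum of the external age-specific rates, with the cohort's own age-group proportions as weights, analogous to the expected-rate step of indirect standardization. The following Python sketch reproduces the two expected rates from the Table 2 inputs. The names `IR`, `P_NV`, `P_V` and `expected_ir` are illustrative, and, as in the text, the < 20 y stratum is omitted because [1] does not report its share.

```python
# South Korean age-specific vitiligo incidence rates, per 10,000 person-years (Table 2).
IR = {"20-29": 1.5717, "30-39": 1.7813, "40-49": 1.9053, "50-59": 2.5874, ">=60": 3.3643}

# Cohort age-group proportions from Table 2, expressed as fractions.
P_NV = {"20-29": 0.1846, "30-39": 0.2549, "40-49": 0.2092, "50-59": 0.1416, ">=60": 0.2097}
P_V = {"20-29": 0.0992, "30-39": 0.0770, "40-49": 0.1082, "50-59": 0.2476, ">=60": 0.4579}


def expected_ir(rates, proportions):
    """Formula 1: IR(Expected) = sum over age groups i of IR(i) * P(i)."""
    return sum(rates[group] * weight for group, weight in proportions.items())


ir_nv_expected = expected_ir(IR, P_NV)
ir_v_expected = expected_ir(IR, P_V)
print(f"IR(NV, Expected) = {ir_nv_expected:.4f} / 10,000")  # 2.2146
print(f"IR(V, Expected)  = {ir_v_expected:.4f} / 10,000")   # 2.6804
```

The V weights are used exactly as listed (they sum to 0.9899), matching the hand calculation; renormalizing them to 1 would raise IR(V, Expected) slightly.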
Applying Formula 1 to the Non-Vaccinated (NV) cohort demographics, we obtain the baseline expected incidence, IR(NV, Expected), as follows:
IR(NV, Expected) = (1.5717 × 0.1846) + (1.7813 × 0.2549) + (1.9053 × 0.2092) + (2.5874 × 0.1416) + (3.3643 × 0.2097) ≈ 2.2146 / 10,000.
Similarly, applying Formula 1 to the Vaccinated (V) cohort demographics yields IR(V, Expected):
IR(V, Expected) = (1.5717 × 0.0992) + (1.7813 × 0.0770) + (1.9053 × 0.1082) + (2.5874 × 0.2476) + (3.3643 × 0.4579) ≈ 2.6804 / 10,000.
These two values yield HR_Structural: HR_Structural = IR(V, Expected) / IR(NV, Expected) = 2.6804 / 2.2146 ≈ 1.21.
Finally, HR_Residual can be computed as the ratio between the HR provided in [1] (termed HR_Observed) and our computed HR_Structural: HR_Residual = HR_Observed / HR_Structural = 2.714 / 1.21 ≈ 2.24.
The consistency of this multiplicative decomposition is confirmed by the near-exact reconstruction of the observed value: HR_Structural × HR_Residual ≈ 1.21 × 2.24 = 2.7104 ≈ 2.714.
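The decomposition can be checked in a few lines. This sketch starts from the rounded expected rates reported above; the constant names are illustrative. Because HR_Residual is defined as HR_Observed / HR_Structural, the product of the two components returns HR_Observed exactly when unrounded intermediate values are kept, so the reconstruction works as an arithmetic consistency check.

```python
HR_OBSERVED = 2.714      # unadjusted HR reported in [1]
IR_V_EXPECTED = 2.6804   # per 10,000 p-y, Formula 1 applied to the V cohort
IR_NV_EXPECTED = 2.2146  # per 10,000 p-y, Formula 1 applied to the NV cohort

hr_structural = IR_V_EXPECTED / IR_NV_EXPECTED  # HR implied by age structure alone
hr_residual = HR_OBSERVED / hr_structural       # part of HR_Observed left unexplained

print(f"HR_Structural = {hr_structural:.2f}")                # 1.21
print(f"HR_Residual   = {hr_residual:.2f}")                  # 2.24
print(f"Product       = {hr_structural * hr_residual:.3f}")  # 2.714
```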
3. Results
Here we present the two main sets of results stemming from our analysis, whose methodology was described in the previous Section.
3.1 Decomposition of the Observed Hazard Ratio and Incidence Discrepancy
Following the calculation of the Expected Incidence Rates IR(Expected) based solely on the structural age compositions of the two cohorts (Section 2.3), we proceeded to quantify the true extent of the methodological failure. This involved decomposing the high observed HR_Observed = 2.714 from [1] into two distinct components: the risk attributable purely to the structural age imbalance (HR_Structural) and the risk stemming from all other uncorrected design flaws and selection biases (HR_Residual). Since Hazard Ratios combine multiplicatively, that is, HR_Observed = HR_Structural × HR_Residual, HR_Residual acts as a precise metric of the degree of non-comparability that persists despite accounting for the known age difference. The breakdown of this risk is presented in Table 3.
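Table 3: Decomposing the Observed Hazard Ratio (HR = 2.714 [1])

| Parameter     | Description                            | Value | Contribution to Excess Risk: (HR_x − 1)/(HR_Observed − 1) × 100 |
|---------------|----------------------------------------|-------|-----------------------------------------------------------------|
| HR_Observed   | Unadjusted Hazard Ratio from [1]       | 2.714 | 100%                                                            |
| HR_Structural | HR due to Age Structural Bias Alone    | 1.21  | 12.25%                                                          |
| HR_Residual   | HR Unexplained by Structural Age Bias  | 2.24  | 72.35%*                                                         |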
The most salient finding of the ponderation analysis shown above is the extreme divergence between the expected and observed incidence rates in the reference cohort. The demographic composition of the Non-Vaccinated (NV) subgroup predicted an expected Incidence Rate of 2.21 / 10,000 based on established national age-specific rates. However, the rate observed in [1] was only 0.67 / 10,000. This discrepancy means that the observed incidence in the reference cohort was nearly 70% lower than methodologically expected, providing immediate, quantifiable evidence of the profound non-comparability of the baseline groups.
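The ≈ 70% figure is the relative shortfall of the observed NV rate against its expected value, 1 − IR(NV, Observed) / IR(NV, Expected). A minimal check with the values reported above (variable names are illustrative):

```python
IR_NV_OBSERVED = 0.67  # per 10,000, observed in the NV cohort of [1]
IR_NV_EXPECTED = 2.21  # per 10,000, Formula 1 applied to the NV cohort

# Relative shortfall of the observed rate with respect to the expected one.
shortfall = 1 - IR_NV_OBSERVED / IR_NV_EXPECTED
print(f"Observed NV rate is {shortfall:.1%} below expectation")  # 69.7%
```

Equivalently, the observed-to-expected ratio for the NV cohort is about 0.30.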
Furthermore, it is crucial to emphasize that our robust, age-specific ponderation analysis has shown that the structural age difference explains only 12.25% of the excess risk signaled by the authors of [1]. The overwhelming majority of the association (72.35%, resulting in an HR_Residual of ≈ 2.24) is entirely attributable to uncorrected methodological flaws, which should be ascribed to a basic failure in the construction of the cohort and its subgroups, as hypothesized below. It should also be noted (asterisked bottom cell of Table 3) that, since this model is multiplicative, its translation into percentages leaves out an interaction term, (HR_Structural − 1) × (HR_Residual − 1) / (HR_Observed − 1) ≈ 15.19%. This interaction term, plus a small rounding residue (≈ 0.2%), explains why the two contributions do not add up to 100%.
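The percentages in Table 3 and the interaction term follow from the identity HR_S × HR_R − 1 = (HR_S − 1) + (HR_R − 1) + (HR_S − 1)(HR_R − 1), where HR_S and HR_R abbreviate HR_Structural and HR_Residual. The sketch below uses the rounded values of Table 3 and illustrative names; with these inputs the three shares add up to about 99.8%, the small remainder being rounding (1.21 × 2.24 = 2.7104 rather than 2.714).

```python
HR_OBSERVED, HR_STRUCTURAL, HR_RESIDUAL = 2.714, 1.21, 2.24  # rounded values from Table 3


def contribution(hr):
    """Table 3 metric: (HR_x - 1) / (HR_Observed - 1), as a percentage."""
    return 100 * (hr - 1) / (HR_OBSERVED - 1)


# Cross term of the identity a*b - 1 = (a - 1) + (b - 1) + (a - 1)*(b - 1).
interaction = 100 * (HR_STRUCTURAL - 1) * (HR_RESIDUAL - 1) / (HR_OBSERVED - 1)
total = contribution(HR_STRUCTURAL) + contribution(HR_RESIDUAL) + interaction

print(f"structural  {contribution(HR_STRUCTURAL):.2f}%")  # 12.25%
print(f"residual    {contribution(HR_RESIDUAL):.2f}%")    # 72.35%
print(f"interaction {interaction:.2f}%")                  # 15.19%
print(f"sum         {total:.2f}%")                        # 99.79%
```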
3.2 Recalculation of Risk: The Collapse of the Observed Association
The massive quantitative discrepancy identified in Section 3.1, where IR(NV, Observed) of 0.67 / 10,000 falls short of IR(NV, Expected) of approx. 2.21 / 10,000 by nearly 70% due to non-comparability bias, allows us to perform a fundamental recalculation of risk. We hypothesize that, had the NV cohort been properly matched for unmeasured confounders, the true baseline risk would have approached the expected incidence rate of 2.21 / 10,000. Furthermore, if we assume, for the sake of this calculation, that the Detection Bias acting on the Vaccinated (V) cohort is negligible (i.e., that IR(V, Observed) ≈ IR(V, True)), we can calculate the resulting Hazard Ratio (HR_Corrected) as follows: HR_Corrected = IR(V, Observed) / IR(NV, Expected) = 2.22 / 2.21 ≈ 1.
This simple recalculation demonstrates that, by correcting the methodological failure of the denominator alone, the observed risk signal of 2.714 collapses entirely to ≈ 1. An HR of ≈ 1 indicates no clinically significant increase in risk, effectively reducing the study's finding to the null hypothesis.
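Under the assumptions stated above (negligible detection bias in the V cohort, and the expected NV rate taken as the true baseline), HR_Corrected reduces to a single ratio. The sketch uses the unrounded expected rate from Section 2.3 and illustrative names; like the text, it treats the cohort rates from [1] and the national expected rate as directly comparable.

```python
IR_V_OBSERVED = 2.22     # per 10,000, observed in the V cohort of [1]
IR_NV_EXPECTED = 2.2146  # per 10,000, Formula 1 applied to the NV cohort

# Assumption (Section 3.2): detection bias in V is negligible, so IR(V, Observed) ~ IR(V, True),
# and the expected NV rate replaces the depressed observed NV rate as the baseline.
hr_corrected = IR_V_OBSERVED / IR_NV_EXPECTED
print(f"HR_Corrected = {hr_corrected:.3f}")  # 1.002
```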
4. Discussion
Here we summarize the key takeaways of this discussion in two primary issues to ensure they are properly highlighted, noting that at the heart of the matter lie the loss of comparability and its consequences for clinical significance.
4.1 The Collapse of Internal Validity: The Double Distortion Mechanism
The persistence of the high residual HR (2.24) after robust adjustment for age structure (HR_Structural = 1.21) provides definitive quantitative proof that the cohorts were constructed as non-comparable. The study design of [1] suffers from a double distortion mechanism that fundamentally violates the core premise of observational epidemiology.
First, the artificial baseline depression of the NV subgroup is attributed to the Selection Bias revealed in Section 3.1: an IR(NV) 70% lower than expected indicates that the individuals composing the denominator are not representative of the general population. This massive deficit is caused by the inclusion of a minority of older individuals (>= 60 years) who are part of an exceptionally healthy survivor cohort and/or have minimal medical interaction [8], thus leading to profound under-detection of Vitiligo cases. This failure to capture the true baseline risk, which depresses the denominator from the expected 2.21 / 10,000 to the observed 0.67 / 10,000, is the primary quantitative driver of the resulting high HR_Residual ≈ 2.24.
Second, we confront an incidence in the V subgroup that is inflated by both detection and risk. Conversely to the NV group, the composition of the V group in [1] (≈ 70% aged >= 50 years) guarantees maximal risk exposure, because this demographic actively intercepts the second incidence peak of Vitiligo (see the rates for the 50-59 y and >= 60 y groups in Table 2). This structural flaw, coupled with the heightened surveillance and utilization bias typical of vaccinated groups, ensures that both the intrinsic risk and the diagnostic detection rate are maximized. This extreme structural separation, especially in the high-risk and high-surveillance age categories, represents a violation of the common support assumption. The Cox model employed in the scrutinized study, therefore, did not compare like with like, but rather measured the risk differential between an artificially clean control group and a maximally surveilled risk group.
4.2 Clinical and External Validity Implications for Vaccine Safety Verification
The extreme HR_Residual ≈ 2.24 cannot be interpreted as a genuine biological effect. The recalculation of risk performed in Section 3.2, which demonstrates the complete collapse of the risk signal to HR ≈ 1 upon correcting the methodological baseline failure, provides compelling evidence against a genuine biological association. If HR_Residual were indeed the genuine vaccine effect, the extensive methodological failure observed in the denominator (≈ 70% under-reporting) would have to be entirely compensated by an unmeasured bias in the numerator. Epidemiologically, it is far more plausible that the value of 2.24 represents the cumulative effect of uncorrected Selection and Detection Bias, a direct consequence of the study's faulty design.
The clinical implication is that the reported HR = 2.714 of [1] is gravely misleading for patients and clinicians. It does not reflect the incremental risk of vaccination but rather the difference in underlying health and healthcare-seeking behavior between two demographically distinct groups in South Korea. This methodological failure is a