In-Browser Agentic Web: a Decentralized Approach to Information Access

Author: Zerhoudi, Saber; Granitzer, Michael
Publisher: Zenodo
DOI: 10.5281/zenodo.17229737
Source: https://zenodo.org/records/17229737/files/ossym-2025--RLM-P01--S90--In-Browser-Agentic-Web--A-Decentralized-Approach-to-Information-Access--S-Zerhoudi--v01-doi.pdf
IN-BROWSER AGENTIC WEB: A DECENTRALIZED APPROACH TO
INFORMATION ACCESS
S. Zerhoudi, M. Granitzer, University of Passau, Passau, Germany
Abstract
The centralization of web search raises critical concerns regarding privacy protection and user autonomy in information access. While advancements in web technologies offer new possibilities for personal information management, current search systems typically process user data on external servers with limited personalization options. This paper introduces a conceptual methodology for browser-based web indexing that processes and stores data locally, addressing these privacy and control limitations. Our approach implements targeted crawling mechanisms aligned with individual user interests and maintains all operations within the browser environment. The technical framework converts web content into dense vector representations through semantic embedding techniques, enabling efficient storage and retrieval within browser constraints. The architecture features: (1) an in-browser language model for semantic search and context-aware content generation, (2) adaptive crawling algorithms that adjust parameters based on storage limitations and user behavior, and (3) incremental updating mechanisms to maintain index freshness. Evaluation approaches using both simulation-based assessment and human participant validation are proposed. This work encourages research on privacy-preserving web search technologies and establishes a foundation for developing user-controlled information retrieval systems.
INTRODUCTION
The digital ecosystem's rapid growth in web content creates both opportunities and challenges for information retrieval. Current web search services, controlled by a few major corporations like Google and Microsoft, typically employ user tracking, centralized indexing, and undisclosed algorithmic methods, raising concerns about privacy, data sovereignty, algorithmic transparency, and offline access [Granitzer et al. (2024), Hendriksen et al. (2024a)].
These centralized search providers rely on collecting and analyzing user data to improve search relevance and advertising revenue, which raises ethical questions regarding privacy and control. Their algorithmic processes often lack transparency, potentially enabling manipulation or biased results influenced by commercial or political actors [Granitzer et al. (2024)]. This opacity undermines user trust and may compromise information reliability. Additionally, the market dominance of a few search providers has led to practices like Search Engine Optimization (SEO), where content creators prioritize algorithmic visibility over informational quality and user value.
In response to these limitations, research interest has shifted to decentralized, transparent, and privacy-conscious alternatives. The Open Web Index (OWI) initiative promotes openly accessible indexing infrastructures and standards, emphasizing transparency, collaboration, and open data principles [Hendriksen et al. (2024b)]. OWI addresses centralized indexing challenges by creating public data structures that democratize search engine development. This project employs extensive indexing operations supported by high-performance computing (HPC) resources across Europe, aiming to diversify the digital information ecosystem.
Concurrent with these large-scale efforts, advances in browser-based AI inference technologies have created new possibilities for privacy-focused and personalized web indexing. Recent developments, such as WebLLM [Ruan et al. (2024)], demonstrate the feasibility of running sophisticated AI models directly within browsers. These technologies leverage WebGPU [Kenwright (2022)] and WebAssembly [Haas et al. (2017)] to enable efficient local processing without external cloud services. By processing data locally, these browser-based approaches inherently enhance privacy and user autonomy.
Adaptive web crawling techniques driven by semantic modeling of user interests have emerged as essential components of personalized retrieval [Durga et al. (2024)]. Unlike traditional fixed crawling algorithms, user modeling approaches that adapt to browsing patterns and real-time interactions improve retrieval accuracy and relevance. This adaptive methodology ensures content remains specific and current, enhancing user experience.
Our research proposes an in-browser web indexing approach that integrates targeted, adaptive crawling and content acquisition based on user-defined interests, local indexing using compressed vector embeddings, and semantic search powered by browser-based language models. This methodology addresses limitations of centralized systems by prioritizing privacy, personalization, offline functionality, and user control.
The approach centers on creating a localized, browser-contained semantic index using compressed dense embeddings, providing contextual understanding beyond keyword-based techniques. This allows the system to deliver personalized search results within the user's local environment. Our work extends principles from the OWI initiative but adapts them to browser environments. Rather than employing collaborative indexing at scale, our approach focuses on localized data organization, efficient embedding methods, and streamlined inference capabilities suitable for resource-limited personal computing contexts.
Our contributions include: (1) proposing a conceptual design for a decentralized, privacy-preserving browser-based web indexing approach that addresses privacy, autonomy, and offline access challenges; (2) defining theoretical adaptive crawling and content acquisition methods based on semantic user-interest models that align content retrieval with preferences; (3) outlining efficient semantic embedding techniques optimized for browser-based storage and computation constraints; and (4) describing potential integration of browser-based language model capabilities supporting semantic search and retrieval-augmented generation for personalized content.
RELATED WORK
Browser technologies have advanced substantially from basic rendering to sophisticated local computation capabilities. Extending WebLLM's work [Ruan et al. (2024)], researchers have further optimized on-device language model inference, reducing memory requirements and improving execution speed. These technical advances complement privacy-enhancing technologies research, where [Kumar et al. (2025)] developed frameworks for evaluating privacy preservation in AI applications without functionality compromises.
Vector space representation of web content has enhanced information retrieval beyond keyword matching. Recent embedding techniques capture semantic relationships and contextual nuances that keyword approaches cannot address. Embedding compression methods have reduced storage requirements by up to 75% while maintaining 90% of semantic integrity [Li et al. (2024)]. These efficiency improvements are particularly valuable for browser environments with storage constraints. Research shows that optimized quantization and dimension reduction techniques maintain retrieval quality while reducing computational demands, balancing semantic precision with resource limitations.
Adaptive crawling methodologies have proven effective beyond basic personalization. Building on [Durga et al. (2024)]'s user modeling work, subsequent studies have measured benefits showing up to 40% improvement in content relevance through dynamic crawling parameter adjustment. These approaches combine user interaction signals with content classification to create refined interest models. By analyzing content consumption patterns, dwell time, and explicit preferences, these systems develop accurate representations of user information needs that evolve over time. This adaptability particularly benefits specialized knowledge domains where standard crawling often misses relevant but less-connected content.
Distributed indexing system architecture has evolved beyond simple centralized/decentralized divisions. The European OpenWebSearch.eu project (https://openwebsearch.eu/) demonstrates how federated approaches can distribute computational workloads while maintaining consistent access patterns. Their federated storage approach separates crawling, indexing, and retrieval components, allowing specific optimization of each element [Granitzer et al. (2025)]. This architectural pattern informs our browser-based approach, though we adapt these principles to operate entirely within the client environment.
Content freshness maintenance in limited-resource environments represents another relevant research direction. Traditional search engines use continuous crawling with extensive server infrastructure, but resource-constrained systems require more strategic approaches. Recent research shows that selective recrawling based on content volatility prediction can maintain index freshness with reduced computational requirements [Gossen et al. (2015)]. These predictions use content type, historical update patterns, and domain characteristics to prioritize recrawling of rapidly changing content while conserving resources for stable information.
While these research areas provide valuable foundations, integrating them into a cohesive browser-based indexing system presents unique challenges that remain insufficiently addressed. Current approaches tend to focus on individual components (either optimizing language models [Ruan et al. (2024)], improving vector representations [Li et al. (2024)], enhancing crawling strategies [Durga et al. (2024)], or developing distributed architectures [Hendriksen et al. (2024b)]) without fully considering how these elements interact within browser constraints. Our work synthesizes these advances into a comprehensive framework specifically designed for browser environments, addressing the technical limitations and privacy concerns inherent in centralized search systems. By combining adaptive crawling, efficient semantic indexing, and local retrieval augmentation, we propose a system that balances performance requirements with privacy preservation. The following sections detail our conceptual architecture and operational workflow, demonstrating how these components work together to enable personalized web indexing directly within the browser.
CONCEPTUAL ARCHITECTURE
This section outlines a conceptual approach to browser-based web indexing designed to enhance privacy and personalization. The methodology addresses constraints of centralized search systems through client-side processing, storage, and retrieval techniques that function within web browser limitations while enhancing user control. The methodology enables localized information management that reduces dependency on external search providers while maintaining search functionality.
Content Acquisition
The foundation of effective personalized indexing begins with selective content acquisition based on user interests. Unlike conventional web crawlers that aim for comprehensive coverage, this approach employs targeted crawling to retrieve only content aligned with individual user preferences, thereby reducing storage requirements while enhancing relevance.
The system would construct dynamic user interest profiles through multiple mechanisms. Building on [Durga et al. (2024)]'s user modeling approach, the profile would incorporate both explicit inputs (user-specified topics, domains, and keywords) and implicit signals (browsing patterns, bookmarking behavior, and content interaction histories). These profiles would continuously evolve through adaptive algorithms that detect shifts in interests and adjust accordingly.
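One way such an evolving profile could work is an exponentially decayed topic-weight map. This is a minimal sketch and not the authors' method: the function name `update_profile`, the `decay` factor, and the 0.01 cut-off are all illustrative assumptions.

```python
def update_profile(profile, observed_topics, explicit=None, decay=0.9):
    """Return a new topic -> weight profile (hypothetical update rule).

    profile:         current topic weights
    observed_topics: implicit signal strength per topic for this session
    explicit:        weights pinned by the user; these override implicit ones
    decay:           assumed forgetting factor; lower values drift faster
    """
    # Old interests fade a little on every update.
    updated = {topic: weight * decay for topic, weight in profile.items()}
    # Implicit signals (e.g. pages visited) pull weights upward.
    for topic, signal in observed_topics.items():
        updated[topic] = updated.get(topic, 0.0) + (1.0 - decay) * signal
    # Explicit user input always wins over inferred weights.
    for topic, weight in (explicit or {}).items():
        updated[topic] = weight
    # Drop topics whose weight has faded below an assumed threshold.
    return {topic: w for topic, w in updated.items() if w >= 0.01}
```

Repeated calls shift weight from topics the user has stopped visiting toward topics that keep appearing, while explicitly pinned topics stay fixed.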
Guided by these profiles, the crawling component would assign priority scores to potential URLs based on semantic alignment with user interests. This prioritization mechanism would consider both content similarity to established interests and exploration potential for adjacent topics. The crawler would maintain compliance with web standards and site policies, respecting robots.txt directives and implementing appropriate rate limiting to ensure responsible resource utilization.
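The priority scoring described above can be sketched as a blend of similarity and an exploration bonus, feeding a priority queue. The blend formula, the `explore_weight` parameter, and the `Frontier` class are assumptions made for illustration; the paper does not specify a scoring function.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def priority(page_vec, interest_vecs, explore_weight=0.2):
    """Hypothetical score: mostly similarity, plus a bonus for adjacent topics."""
    best = max((cosine(page_vec, v) for v in interest_vecs), default=0.0)
    best = max(best, 0.0)
    # The bonus peaks at similarity 0.5, i.e. related but not yet covered content.
    exploration = 1.0 - abs(2.0 * best - 1.0)
    return (1.0 - explore_weight) * best + explore_weight * exploration

class Frontier:
    """Max-priority queue of URLs, built on heapq's min-heap."""
    def __init__(self):
        self._heap = []

    def push(self, url, score):
        heapq.heappush(self._heap, (-score, url))

    def pop(self):
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score
```

With a single interest vector `[1.0, 0.0]`, a page embedded at `[1.0, 1.0]` scores below a fully aligned page but well above an unrelated one, which is the intended effect of the exploration term.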
Beyond crawling methods, the system would offer alternative content acquisition pathways. Users can leverage the OWI Python client (owilix) developed by [Granitzer et al. (2025)], which provides sophisticated dataset management capabilities specifically designed for OWI environments. This tool enables efficient pushing and pulling of datasets and supports remote SQL query execution, allowing users to retrieve daily index slices precisely tailored to their interests without the overhead of full crawling operations.
For users with private document collections, the system would implement a secure, privacy-preserving ingestion pipeline. This process begins with the secure parsing of personal documents stored in a self-hosted cloud solution, extracting valuable textual content and metadata. The extracted information is then normalized and loaded into DuckDB [Raasveldt and Mühleisen (2019)], a lightweight analytical database deployed within the user's private infrastructure. This embedded database efficiently indexes the content, creating optimized structures for rapid querying. To enable seamless integration with client-side applications, the indexed content can be exported from DuckDB in JSON or similar serializable formats and imported into a compressed browser database. This final step bridges server-side indexing with client-side storage, providing users with efficient offline search capabilities while maintaining end-to-end privacy protection throughout the entire pipeline.
Semantic Indexing
Once content is acquired, the system would transform it into optimized representations suitable for browser-based storage and retrieval. The primary mechanism for this transformation would be dense vector embeddings that capture semantic relationships between content items beyond simple keyword matching.
These embeddings would map textual content into multidimensional semantic spaces where proximity indicates conceptual similarity. Drawing inspiration from techniques described by [Li et al. (2024)], the system would generate embeddings at multiple granularity levels, from document-wide representations to sentence-level encodings. A key feature would be adjustable dimensionality, allowing dynamic balancing between semantic precision and storage efficiency. This adaptability would enable the system to operate effectively across devices with varying resource constraints.
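To illustrate the trade-off between precision and storage, the sketch below uses plain symmetric int8 scalar quantization. This is a generic technique chosen for illustration and is not claimed to be the method of [Li et al. (2024)]; the function names are hypothetical. Storing one byte per dimension instead of a four-byte float cuts vector storage by 75%, at the cost of a bounded rounding error.

```python
from array import array

def quantize_int8(vec):
    """Map floats to signed bytes; returns (codes, scale)."""
    peak = max((abs(x) for x in vec), default=0.0)
    scale = peak / 127.0 if peak > 0 else 1.0
    # Each value is rounded to the nearest of 255 evenly spaced levels.
    codes = array("b", (round(x / scale) for x in vec))
    return codes, scale

def dequantize(codes, scale):
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]

def storage_saving(codes):
    """Fraction of bytes saved relative to float32 storage."""
    return 1.0 - codes.itemsize / array("f").itemsize
```

The per-dimension reconstruction error is at most half of `scale`, so vectors with a small dynamic range lose very little; whether retrieval quality is preserved for a given embedding model would need to be measured.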
The processed content would reside in compressed browser databases utilizing technologies like IndexedDB [Al-Shaikh and Sleit (2017)]. To maximize storage efficiency within browser constraints, the system would implement structured data partitioning inspired by larger-scale approaches from the Open Web Index initiative [Granitzer et al. (2025)]. Content would be organized into logical segments based on source domains, temporal factors, and thematic categories, enabling efficient query processing. Additionally, metadata elements such as titles, content acquisition dates, and language indicators would be integrated directly alongside semantic representations to facilitate rapid filtering and result refinement during retrieval operations.
Interactive Retrieval
The retrieval process would begin with query encoding, transforming user information needs into the same semantic vector space used for content representation. These query embeddings would then undergo similarity comparison against the indexed content using established metrics such as cosine similarity, identifying the most relevant matches from the local database.
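The similarity comparison can be sketched as a brute-force cosine ranking over the local index. The in-memory dictionary stands in for the browser database, and the function names are illustrative; a real index of any size would likely need an approximate nearest-neighbor structure instead.

```python
import math

def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| |b|); defined as 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, index, k=3):
    """Return the k (doc_id, score) pairs most similar to the query.

    index: dict mapping doc_id -> embedding vector (assumed layout)
    """
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```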
Building on recent advances in browser-based AI frameworks demonstrated by WebLLM [Ruan et al. (2024)], the system would incorporate a locally executed language model for advanced retrieval and content synthesis. This model would implement retrieval-augmented generation (RAG) techniques, using the locally indexed content to ground its responses in user-specific information sources. The browser-native execution would leverage technologies like WebGPU [Kenwright (2022)] and WebAssembly [Haas et al. (2017)] to optimize performance within client-side constraints.
User control would remain central to the retrieval process through customizable search parameters. These would include domain-specific weightings (prioritizing preferred sources), temporal filters (focusing on recent or historical content), and adjustable balance between semantic similarity and metadata matching. These customization options would allow users to tailor the system's behavior to specific information-seeking contexts, from exploratory research to targeted fact-finding.
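These three controls could be combined in a single re-ranking step, sketched below. The `Hit` record, the linear blend controlled by `alpha`, and the multiplicative domain weights are assumptions for illustration, not a scoring scheme taken from the paper.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Hit:
    url: str
    domain: str
    fetched: date      # content acquisition date
    semantic: float    # e.g. cosine similarity to the query
    metadata: float    # metadata match score in [0, 1]

def rerank(hits, alpha=0.7, domain_weights=None, not_before=None):
    """Filter by date, then sort by a user-tunable blended score.

    alpha:          1.0 = purely semantic, 0.0 = purely metadata
    domain_weights: multiplier per domain (default 1.0)
    not_before:     drop hits acquired before this date
    """
    domain_weights = domain_weights or {}
    kept = [h for h in hits if not_before is None or h.fetched >= not_before]

    def score(hit):
        blended = alpha * hit.semantic + (1.0 - alpha) * hit.metadata
        return blended * domain_weights.get(hit.domain, 1.0)

    return sorted(kept, key=score, reverse=True)
```

For exploratory research a user might lower `alpha` and leave `not_before` unset; for targeted fact-finding they might raise `alpha` and boost a trusted domain.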
Index Freshness Management
Maintaining relevance over time requires mechanisms for content refresh and index optimization. The proposed system would implement context-aware scheduling for recrawling operations, prioritizing sources based on factors including update frequency, user engagement patterns, and content volatility.
Instead of complete reindexing, the system would employ incremental processing techniques that efficiently integrate new content into existing indices. This approach would minimize computational overhead while ensuring the index remains current. The scheduling mechanism would balance multiple factors: user preferences, connectivity conditions, and device resource availability, preferentially performing intensive operations during optimal conditions (e.g., during low-activity nighttime hours).
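A possible shape for this scheduling logic is a per-source recrawl priority plus a gate that decides whether intensive work may run right now. Both functions, their weights, and the battery threshold are hypothetical; they only illustrate how the listed factors could be combined.

```python
def recrawl_priority(hours_since_crawl, expected_update_hours,
                     engagement, volatility):
    """Score in [0, 1]; higher means recrawl sooner.

    expected_update_hours: assumed typical interval between source updates (> 0)
    engagement, volatility: normalized signals in [0, 1]
    """
    # Staleness saturates once a full expected update interval has passed.
    staleness = min(hours_since_crawl / expected_update_hours, 1.0)
    return staleness * (0.5 + 0.25 * engagement + 0.25 * volatility)

def can_run_intensive(user_active, on_metered_network,
                      battery_fraction, min_battery=0.5):
    """Allow heavy work only when the device is idle and unconstrained."""
    return (not user_active
            and not on_metered_network
            and battery_fraction >= min_battery)
```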
Content pruning strategies would prevent unbounded index growth by identifying and removing outdated or low-relevance items from the database. These decisions would consider multiple signals including recency, access frequency, and semantic redundancy with newer content. This comprehensive maintenance approach would ensure the system remains responsive and resource-efficient over extended usage periods while adapting to evolving user interests.
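The pruning decision could be expressed as a retention score over the three signals named above, keeping only the best-scoring items within a storage budget. The half-life, the equal weighting, and the item dictionary layout are assumptions for this sketch.

```python
def retention_score(age_days, accesses, redundancy, half_life_days=90.0):
    """Higher means more worth keeping.

    redundancy: similarity to newer content in [0, 1]; 1.0 = fully superseded
    """
    recency = 0.5 ** (age_days / half_life_days)   # halves every half-life
    popularity = accesses / (accesses + 1.0)       # saturates toward 1.0
    return (0.5 * recency + 0.5 * popularity) * (1.0 - redundancy)

def prune(items, max_items):
    """Split items into (kept, removed) under a simple item-count budget."""
    ranked = sorted(
        items,
        key=lambda it: retention_score(it["age_days"], it["accesses"],
                                       it["redundancy"]),
        reverse=True,
    )
    return ranked[:max_items], ranked[max_items:]
```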
OPERATIONAL WORKFLOW
This section describes the conceptual workflow and component interactions in the proposed browser-based indexing approach. The design integrates various processes to enable personalized information access while maintaining user privacy and control throughout the operational cycle. Figure 1 shows an overview of the in-browser approach architecture and workflow.
User Modeling Initialization
The proposed system would begin with minimal setup requirements, avoiding intrusive information gathering during initialization. Instead of demanding extensive upfront configuration, the system would gradually build user interest profiles through two complementary mechanisms.
The passive observation component would analyze content from past conversational search activities and pages visited during normal browsing in accordance with user privacy preferences. This lightweight semantic analysis would extract key concepts, entities, and topics without disrupting user experience. The extracted information would populate an initial interest model that evolves over time as the user continues browsing.
Complementing passive observation, the system would provide explicit feedback mechanisms through which users could review, modify, or remove interests identified by the system. These controls would be prominently accessible within the browser extension, ensuring users maintain awareness and control over their interest profiles.
Adaptive Crawling Strategy
Once user interests are established, the content acquisition process would begin. The crawling component would employ a dynamic prioritization mechanism that evaluates potential URLs based on multiple factors: semantic alignment with identified interests, browsing frequency, and historical engagement patterns. This prioritization would optimize resource allocation by focusing on content most likely to provide value to the specific user.
To operate effectively within browser constraints, the crawler would implement adaptive resource management techniques. These would include adjustable parameters for concurrent requests, crawling depth, and scheduling frequency based on device capabilities and connection status. During active browsing sessions, the crawler would reduce its activity to minimize impact on performance, while potentially increasing activity during idle periods.
The crawler would respect robots.txt directives, implement appropriate rate limiting, and follow standardized crawling policies. These practices would ensure the system behaves responsibly within the broader web ecosystem while gathering personalized content.
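A minimal sketch of these politeness checks follows, using Python's standard `urllib.robotparser` for robots.txt handling. The `PolitenessGate` class, the user-agent string, and the one-second default delay are illustrative assumptions; an in-browser implementation would need an equivalent parser in JavaScript or WebAssembly.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

USER_AGENT = "in-browser-indexer"  # hypothetical crawler name

class PolitenessGate:
    """Tracks robots.txt rules and per-host rate limits."""

    def __init__(self, default_delay=1.0):
        self.default_delay = default_delay
        self._robots = {}        # host -> parsed robots.txt
        self._next_allowed = {}  # host -> earliest next fetch time

    def load_robots(self, host, robots_txt):
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self._robots[host] = parser

    def check(self, url, now):
        """Return (allowed, wait_seconds) for fetching url at time now."""
        host = urlsplit(url).netloc
        parser = self._robots.get(host)
        # Conservative choice: no fetch until robots.txt has been loaded.
        if parser is None or not parser.can_fetch(USER_AGENT, url):
            return False, 0.0
        return True, max(0.0, self._next_allowed.get(host, 0.0) - now)

    def record_fetch(self, url, now):
        host = urlsplit(url).netloc
        # Honor Crawl-delay when the site declares one.
        delay = self._robots[host].crawl_delay(USER_AGENT) or self.default_delay
        self._next_allowed[host] = now + delay
```

Timestamps are passed in explicitly so the scheduler can be driven by any clock and tested without sleeping.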
In-Browser Indexing
The indexing process would operate entirely within the browser environment, transforming retrieved content into searchable representations. Content processing would begin with semantic embedding generation, converting textual content into dense vector representations using locally stored or dynamically loaded models. These embeddings would capture semantic relationships between content items, enabling meaning-based rather than keyword-based retrieval.
Following embedding generation, the system would extract and integrate metadata elements including titles, content acquisition dates, source information, and language indicators. This structured approach would enable efficient filtering during search operations. The indexed content would be organized using partitioning strategies inspired by the OWI project [Granitzer et al. (2025)], dividing the index logically by content origin, topical domains, or temporal factors.
To maintain index freshness while minimizing computational demands, the system would implement incremental updating mechanisms. Rather than rebuilding the entire index when new content is acquired, only changes would be processed and integrated. A local changelog would track modifications, enabling efficient updates. The system would also employ intelligent pruning algorithms to remove outdated or low-relevance content, preventing unbounded index growth over time.
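Incremental updating with a changelog can be sketched with content hashing: a page is re-embedded only when its hash differs from the stored one. The index layout, the changelog tuple format, and the caller-supplied `embed` function are assumptions made for this illustration.

```python
import hashlib

def apply_update(index, changelog, url, text, embed):
    """Update one entry in place; returns 'added', 'updated' or 'unchanged'.

    index:     dict url -> {"hash": str, "vector": list}
    changelog: list that receives (action, url, hash) tuples
    embed:     caller-supplied function text -> vector (hypothetical)
    """
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    previous = index.get(url)
    if previous is not None and previous["hash"] == digest:
        return "unchanged"  # skip the expensive embedding step
    index[url] = {"hash": digest, "vector": embed(text)}
    action = "added" if previous is None else "updated"
    changelog.append((action, url, digest))
    return action
```

Because unchanged pages return before `embed` is called, the cost of a recrawl pass scales with the number of pages that actually changed.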
Retrieval-Augmented Search
When users initiate a search query, the in-browser language model would process the input to understand the information need. The query would be encoded into the same vector space used for content representation, enabling direct comparison between the query and indexed content. The retrieval engine would identify relevant content based on semantic similarity measurements, returning results ranked by relevance to the user's query.
For complex information needs, the system would implement retrieval-augmented generation as described by [Ruan et al. (2024)]. This approach would ground language model outputs in the user's personal index, combining the flexibility of generative AI with the accuracy of retrieved information. By leveraging locally stored content, responses would reflect the user's specific knowledge base rather than generic information.
The search interface would provide interactive refinement options, allowing users to adjust result presentation based on various parameters. These adjustments might include source preferences, recency requirements, or topic emphasis. Each interaction would feed back into the system's understanding of user preferences, gradually improving retrieval accuracy through ongoing learning from user behavior patterns.
Figure 1: An overview of the In-Browser indexing and personalized content retrieval approach.
User-Controlled Privacy
Privacy protection would be fundamental to the system architecture, with all data processing occurring exclusively within the browser environment. This localized approach would ensure sensitive information remains under user control rather than being transmitted to external servers. The design would collect only information necessary for system functionality.
The system would provide comprehensive transparency regarding data usage through an accessible browser extension interface. This interface would display the current interest model, crawling activities, and index contents in user-friendly formats. All aspects of the system would remain user-modifiable, with options to edit, export, or delete any stored information.
Control granularity would extend to operational parameters, allowing users to adjust the balance between personalization depth and resource utilization. Users could configure crawling schedules, storage limitations, and embedding dimensions based on their preferences and device capabilities. This flexibility would enable the system to accommodate diverse usage patterns and hardware constraints while maintaining core functionality.
Through this integrated operational flow, the proposed system would create a self-contained information ecosystem within the browser environment. By combining interest modeling, adaptive content acquisition, semantic indexing, and retrieval-augmented search, it would offer personalized information access while preserving user privacy.
EVALUATION APPROACH
Evaluating a browser-based indexing system presents specific challenges requiring careful methodological planning. This section outlines some possible research-based approaches to assess such conceptual architectures.
Technical performance evaluation requires adapting standard information retrieval metrics to the browser context. Measures such as precision, recall, and mean reciprocal rank must be applied within personal indexing constraints, where corpus size varies between users and changes over time. These metrics should assess retrieval effectiveness relative to indexed content rather than global repositories. Browser-specific indicators including memory usage, storage efficiency, and interface responsiveness are crucial for evaluating client-side feasibility.
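For reference, the three retrieval metrics named above can be computed as follows. The sketch assumes that relevance judgments are given as sets of document identifiers drawn from the user's own index, in line with the point that effectiveness is measured relative to indexed content; the function names are illustrative.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents found in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)

def mean_reciprocal_rank(runs):
    """Average of 1/rank of the first relevant result per query.

    runs: list of (ranked_doc_ids, relevant_doc_id_set) pairs
    """
    if not runs:
        return 0.0
    total = 0.0
    for ranked, relevant in runs:
        for rank, doc in enumerate(ranked, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(runs)
```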
Simulation-based assessment offers valuable insights for conceptual architectures before full implementation. User simulation methods described by [Balog and Zhai (2025)] can be adapted to model various user interests, browsing patterns, and information needs. This enables systematic testing across different user profiles without extensive development resources. By creating synthetic browsing histories and interest profiles, researchers can generate representative personal indexes for testing. Simulated queries with predetermined relevance judgments provide measurable performance metrics while allowing parameter variation.
LLM-based agents, following methods proposed by [Lu et al. (2025)], offer an effective evaluation strategy. These agents can simulate different user personas with varying information needs, technical expertise, and privacy concerns. This facilitates assessment of both technical performance and user experience aspects, including interface usability and perceived utility. While LLM agents cannot completely replicate human behavior, they provide cost-effective initial evaluation before human participant testing.
Scientific validity requires careful benchmark development, including curated web content with predefined relevance judgments, standardized browsing profiles, and consistent query sets. Such benchmarks enable reproducible comparisons between implementation approaches and help assess design decisions regarding embedding dimensions, crawling strategies, and index partitioning methods.
Human participant validation remains essential for thorough evaluation. Well-designed user studies employing mixed methods can assess both objective performance and subjective experience. For this purpose, frameworks like SearchLab [Zerhoudi and Granitzer (2025)] offer valuable capabilities as a modular web-based platform specifically designed for search behavior studies. Participants should engage with the system over extended periods to capture realistic usage and allow natural interest profile development. Performance evaluation should combine logged interaction data and structured tasks with defined success criteria. Qualitative methods such as think-aloud protocols, interviews, and usability questionnaires complement quantitative measures by revealing user perceptions. The comprehensive data collection capabilities of SearchLab reduce the need for custom application development, allowing researchers to focus on study design rather than technical implementation.
IMPACT AND RESEARCH DIRECTIONS
The browser-based indexing approach we propose has implications beyond individual search experiences. This section examines potential effects on web ecosystems, user autonomy, and technological synergies, while outlining future research paths.
Web Information Ecosystems
Decentralizing web indexing through personal browser-based systems could alter web information dynamics. The current concentration of indexing power among a few corporations has created an environment where content discovery is mainly controlled by proprietary algorithms optimized for advertising revenue rather than information diversity. As [Granitzer et al. (2024)] note, this centralization introduces systematic biases that may homogenize content and favor commercial interests.
A distributed approach where users maintain personal indexes could reduce these concentrating effects. Content creators might respond by producing more specialized material for niche audiences instead of optimizing solely for dominant search algorithms. Publishers currently invest in search engine optimization techniques that often prioritize algorithmic visibility over content quality. When discovery becomes more personalized through browser-based indexing, these incentives may shift toward content that serves user interests rather than algorithmic preferences.
The proposed browser-based indexing system would function alongside broader open web initiatives. Users could optionally contribute anonymized, aggregated indexing data (with explicit consent) to collaborative projects like OpenWebSearch.eu [Granitzer et al. (2024)], creating a mutually beneficial relationship between personal and collective indexing efforts. This arrangement could address a limitation of purely personal indexing: reduced content discovery breadth. By voluntarily participating in federated efforts, users could maintain privacy advantages while contributing to and benefiting from collective knowledge organization.
User Autonomy
The architecture we propose improves user control over personal data and search experiences in several ways. By processing and storing data locally, the system removes the external data transfers found in centralized indexing systems. Users would gain protection from external data collection and clarity about what information their system has captured and how it affects their search results.
The adaptive user-interest model offers another aspect of user empowerment. Unlike fixed indexing approaches that treat all users identically, the proposed system would refine its understanding of individual interests through browsing patterns and explicit feedback. This responsiveness allows search results to reflect actual user needs rather than general assumptions or commercial priorities. The system could show users visualizations of their interest profiles, allowing them to adjust or correct misinterpretations, enhancing both control and system accuracy.
Clarity extends beyond data collection to the search process itself. Commercial search engines typically provide minimal insight into result selection for queries. A locally managed index could give users clear explanations of ranking factors, potentially building trust in the system. This clarity could help users develop better search strategies and understand the connection between their browsing behaviors and search outcomes.
Leveraging AI Models
Recent advancements in language models and generative AI create valuable opportunities for browser-based indexing systems. Local language models could improve multiple system aspects, from interest profiling to search query processing. By analyzing content semantics more deeply, these models could build more nuanced representations of user interests than conventional keyword-based approaches. This capability could help the system differentiate between temporary information needs and enduring interests, adjusting crawling priorities accordingly.
The development and evaluation of such systems present distinct challenges that AI could help address. Language models could simulate various user behaviors to test system responsiveness across different usage patterns. While [Lu et al. (2025)] caution about limitations in AI-based simulation, such approaches could still provide useful insights during early development stages. These simulations could help identify weaknesses in crawling strategies or interest modeling before deployment with actual users.
As browser-integrated language models like WebLLM become more capable, the system could implement proactive indexing based on anticipated information needs. The model might identify concepts related to current browsing activities and index relevant content in advance. However, such capabilities raise important questions about resource usage and user consent that would require careful consideration in any implementation.
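The resource and consent concerns raised above can be made concrete with a small planning step that runs before any proactive fetch. The Python sketch below is an assumption-laden illustration: the candidate tuples, the byte budget, and the greedy selection rule are hypothetical, and the relevance values stand in for whatever an in-browser model would predict.

```python
def plan_proactive_indexing(candidates, budget_bytes, user_consented):
    """Pick pages to pre-index, respecting consent and a storage budget.

    candidates: list of (url, predicted_relevance, estimated_size_bytes).
    """
    if not user_consented:
        return []  # without explicit opt-in, nothing is fetched in advance
    plan, used = [], 0
    # Greedy choice: most relevant first, skipping pages that no longer fit.
    for url, relevance, size in sorted(candidates, key=lambda c: c[1], reverse=True):
        if used + size <= budget_bytes:
            plan.append(url)
            used += size
    return plan


candidates = [
    ("https://example.org/a", 0.9, 400_000),
    ("https://example.org/b", 0.7, 700_000),
    ("https://example.org/c", 0.5, 300_000),
]
plan_with_consent = plan_proactive_indexing(candidates, 800_000, user_consented=True)
plan_without_consent = plan_proactive_indexing(candidates, 800_000, user_consented=False)
```

With an 800 kB budget the plan keeps pages `a` and `c` and skips the larger page `b`; without consent the plan is empty regardless of predicted relevance.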
Technical Challenges
This proposal faces several implementation challenges. Browser memory and processing limitations represent a primary obstacle. Research is required to develop compact vector databases suitable for browser environments. Current embedding methods are typically designed for server environments with greater computational resources, requiring adaptation for client-side use. Techniques such as quantization [Li et al. (2024)] that reduce storage requirements while preserving semantic information could enhance the feasibility of the system.
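To give a sense of the storage trade-off, the following Python sketch applies plain symmetric int8 scalar quantization to an embedding vector. This is a generic technique chosen for illustration, not the method of the cited work, and the function names are hypothetical. Each dimension is stored in one byte instead of a 4- or 8-byte float, at the cost of a bounded rounding error.

```python
from array import array


def quantize_int8(vector):
    """Map floats to signed 8-bit codes plus one shared scale factor."""
    max_abs = max((abs(x) for x in vector), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = array("b", (round(x / scale) for x in vector))  # 1 byte per dimension
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float values from the stored codes."""
    return [code * scale for code in codes]


embedding = [0.12, -0.5, 0.33, 0.0]
codes, scale = quantize_int8(embedding)
restored = dequantize_int8(codes, scale)
max_error = max(abs(a - b) for a, b in zip(embedding, restored))
```

The reconstruction error per dimension is at most half the scale factor, so vectors with a few very large components lose more precision on the small ones; whether that loss is acceptable for retrieval quality is the kind of question the research outlined above would need to answer.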
Adaptive interest models represent another research challenge. User modeling implementations must balance complexity with computational efficiency. Research into incremental model updates could improve user experience and resource use. Incorporating explicit feedback and implicit signals while maintaining model coherence presents a machine learning challenge requiring further study.
As web content spans multiple modalities, research into efficient multimodal indexing becomes crucial. Extending browser-based systems to represent and search across text, images, audio, and video presents technical challenges. Unified embedding spaces that capture cross-modal relationships while remaining compact would advance the field.
Evaluating personalized, decentralized search systems presents methodological challenges. Developing standardized benchmarks that accommodate individual differences while allowing systematic comparison would facilitate progress. Such frameworks need to address search quality, resource efficiency, and user satisfaction.
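One possible way to reconcile individual differences with systematic comparison is to score each user's results against that user's own relevance judgments and then average the per-user scores. The Python sketch below uses nDCG@k for the search-quality dimension only; the choice of metric, the sample judgments, and the simplification of computing the ideal ranking from the retrieved list alone are assumptions, not part of the proposal.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a list of graded relevance values."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg_at_k(ranked_relevances, k):
    """nDCG@k, with the ideal ordering taken from the same judged list."""
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True)[:k])
    if ideal_dcg == 0:
        return 0.0
    return dcg(ranked_relevances[:k]) / ideal_dcg


def mean_ndcg(per_user_rankings, k):
    """Average per-user nDCG so each user is judged by their own relevance labels."""
    scores = [ndcg_at_k(ranking, k) for ranking in per_user_rankings.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical judgments: relevance of each returned result, in ranked order.
per_user = {
    "user-1": [3, 2, 1],   # already in ideal order
    "user-2": [1, 3, 0],   # best result ranked second
}
benchmark_score = mean_ndcg(per_user, k=3)
```

Resource efficiency and user satisfaction would need separate measurements (for example storage used per indexed page, or questionnaire data) reported next to this score rather than folded into it.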
These technical challenges highlight how browser-based indexing intersects information retrieval, machine learning, and human-computer interaction, requiring solutions that consider social and ethical implications of distributed information access.
ACKNOWLEDGEMENTS
This research was funded by the European Union’s Horizon Europe research and innovation program under grant agreement No 101070014 (OpenWebSearch.EU, https://doi.org/10.3030/101070014).
REFERENCES
[Al-Shaikh and Sleit (2017)]
Ala’a Al-Shaikh and Azzam Sleit. 2017. Evaluating IndexedDB performance on web browsers. In 2017 8th International Conference on Information Technology (ICIT). IEEE, 488–494.
[Balog and Zhai (2025)]
Krisztian Balog and ChengXiang Zhai. 2025. User Simulation in the Era of Generative AI: User Modeling, Synthetic Data Generation, and System Evaluation. arXiv preprint arXiv:2501.04410 (2025).
[Durga et al. (2024)]
Csl Vijaya Durga, RJ Anandhi, Saloni Bansal, Navdeep Singh, Ravi Kalra, and Nabaa M Bader. 2024. Adaptive Web Crawling Strategies Based on Ontological User Interest Modeling for Personalized Content Retrieval. In 2024 International Conference on Trends in Quantum Computing and Emerging Business Technologies. IEEE, 1–5.
[Gossen et al. (2015)]
Gerhard Gossen, Elena Demidova, and Thomas Risse. 2015. iCrawl: Improving the freshness of web collections by integrating social web and focused web crawling. In Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries. 75–84.
[Granitzer et al. (2025)]
Michael Granitzer, Mohamad Hayek, Sebastian Heineking, Gijs Hendriksen, Martin Golasowski, Michael Dinzinger, and Saber Zerhoudi. 2025. OpenWebSearch.eu-Building an Open Web Index on EuroHPC JU Infrastructures. Procedia Computer Science 255 (2025), 43–52.
[Granitzer et al. (2024)]
Michael Granitzer, Stefan Voigt, Noor Afshan Fathima, Martin Golasowski, Christian Guetl, Tobias Hecking, Gijs Hendriksen, Djoerd Hiemstra, Jan Martinovič, Jelena Mitrović, et al. 2024. Impact and development of an Open Web Index for open web search. Journal of the Association for Information Science and Technology 75, 5 (2024), 512–520.
[Haas et al. (2017)]
Andreas Haas, Andreas Rossberg, Derek L Schuff, Ben L Titzer, Michael Holman, Dan Gohman, Luke Wagner, Alon Zakai, and JF Bastien. 2017. Bringing the web up to speed with WebAssembly. In Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation. 185–200.
[Hendriksen et al. (2024a)]
Gijs Hendriksen, Michael Dinzinger, Sheikh Mastura Farzana, Noor Afshan Fathima, Maik Fröbe, Sebastian Schmidt, Saber Zerhoudi, Michael Granitzer, Matthias Hagen, Djoerd Hiemstra, Martin Potthast, and Benno Stein. 2024a. The Open Web Index. In Advances in Information Retrieval, Nazli Goharian, Nicola Tonellotto, Yulan He, Aldo Lipani, Graham McDonald, Craig Macdonald, and Iadh Ounis (Eds.). Springer Nature Switzerland, Cham, 130–143.
[Hendriksen et al. (2024b)]
Gijs Hendriksen, Michael Dinzinger, Sheikh Mastura Farzana, Noor Afshan Fathima, Maik Fröbe, Sebastian Schmidt, Saber Zerhoudi, Michael Granitzer, Matthias Hagen, Djoerd Hiemstra, Martin Potthast, and Benno Stein. 2024b. The Open Web Index - Crawling and Indexing the Web for Public Use. In Advances in Information Retrieval - 46th European Conference on Information Retrieval, ECIR 2024, Glasgow, UK, March 24-28, 2024, Proceedings, Part V (Lecture Notes in Computer Science, Vol. 14612), Nazli Goharian, Nicola Tonellotto, Yulan He, Aldo Lipani, Graham McDonald, Craig Macdonald, and Iadh Ounis (Eds.). Springer, 130–143. https://doi.org/10.1007/978-3-031-56069-9_10
[Kenwright (2022)]
Benjamin Kenwright. 2022. Introduction to the webgpu api. In Acm siggraph 2022 courses. 1–184.
[Kumar et al. (2025)]
Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar, Tu Trinh, Elaine T Chang, Vaughn Robinson, Shuyan Zhou, Matt Fredrikson, Sean M Hendryx, Summer Yue, et al. 2025. Aligned LLMs Are Not Aligned Browser Agents. In The Thirteenth International Conference on Learning Representations.
[Li et al. (2024)]
Xianming Li, Zongxi Li, Jing Li, Haoran Xie, and Qing Li. 2024. 2d matryoshka sentence embeddings. arXiv preprint arXiv:2402.14776 (2024).
[Lu et al. (2025)]
Yuxuan Lu, Bingsheng Yao, Hansu Gu, Jing Huang, Jessie Wang, Laurence Li, Jiri Gesi, Qi He, Toby Jia-Jun Li, and Dakuo Wang. 2025. UXAgent: An LLM Agent-Based Usability Testing Framework for Web Design. arXiv preprint arXiv:2502.12561 (2025).
[Raasveldt and Mühleisen (2019)]
Mark Raasveldt and Hannes Mühleisen. 2019. Duckdb: an embeddable analytical database. In Proceedings of the 2019 international conference on management of data. 1981–1984.
[Ruan et al. (2024)]
Charlie F Ruan, Yucheng Qin, Xun Zhou, Ruihang Lai, Hongyi Jin, Yixin Dong, Bohan Hou, Meng-Shiun Yu, Yiyan Zhai, Sudeep Agarwal, et al. 2024. WebLLM: A High-Performance In-Browser LLM Inference Engine. arXiv preprint arXiv:2412.15803 (2024).
[Zerhoudi and Granitzer (2025)]
Saber Zerhoudi and Michael Granitzer. 2025. SearchLab: Exploring Conversational and Traditional Search Interfaces in Information Retrieval. In Proceedings of the 2025 ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR ’25), March 24–28, 2025, Melbourne, VIC, Australia. ACM. https://doi.org/10.1145/3698204.3716475