
Time to land prediction in Barcelona-El Prat airport based on machine learning classification models

Author: Mouriño Pérez, Víctor
Publisher: Universitat Politècnica de Catalunya
Year: 2020
Source: https://upcommons.upc.edu/bitstream/2117/328961/1/memoria.pdf
FINAL DEGREE PROJECT
TFG TITLE: Time to land prediction in Barcelona-El Prat airport based on machine learning classification models.
DEGREE: Grau en Enginyeria d'Aeronavegació (Bachelor's degree in Air Navigation Engineering)
AUTHOR: Víctor Mouriño Pérez
ADVISOR: Cristina Barrado Muxí
DATE: July 24th, 2020
Title: Prediction of the landing time at Barcelona-El Prat airport with machine learning classification models.
Author: Víctor Mouriño Pérez
Director: Cristina Barrado Muxí
Date: July 24th, 2020
Abstract
This document contains a method for estimating the landing time at Barcelona-El Prat airport by means of machine learning classification models. The aim of the project, therefore, is not to predict a continuous variable but whether the landing time falls into one of the following categories: advanced, planned or delayed. In addition, two categories called very advanced and very delayed have been added, which contain the flights with anomalous landing times, whether because they flew very fast or very slowly.
To obtain the data, an ADS-B antenna located on the roof of the engineering and aerospace school in Castelldefels was used. This antenna captures the signals of all arrivals into Barcelona, which are then decoded by an existing program written in C#. A custom program written in Python extracted the most relevant characteristics of these flights and arranged them as a matrix so that the different models could interpret them.
The data were scaled and split into training and test samples: the former are used for the models to learn and the latter to check their effectiveness.
Six different models were trained: four of them obtained accuracy values above 60%, another reached 75% and another did not exceed 30%. The three best models were tuned with two different techniques: a random search and a grid search. The random search proved much better, since it obtains the same results in much less time and with far less computing power. Other methods, such as voting classifiers and boosters, were also used to improve the results of the already tuned models.
Finally, oversampling techniques were implemented for the five-category problem, since the extreme cases are strongly underrepresented. These techniques improved the specific accuracy of these two categories, at the cost of overall accuracy.
The performance of these models could be improved in the future by adding more flights to the data samples or more characteristics of those flights.
Title: Time to land prediction in Barcelona-El Prat airport based on machine learning classification models.
Author: Víctor Mouriño Pérez
Director: Cristina Barrado Muxí
Date: July 24th, 2020
Overview
This document presents a method to estimate the landing time at Barcelona-El Prat airport using machine learning classification models. The goal of this project is not to predict a continuous variable but to predict whether the time to land falls into one of the following categories: advanced, delayed or planned. Additionally, two categories called very advanced and very delayed have been included, containing flights with abnormal times to land, caused either by very fast or by very slow approaches.
To obtain the data, an ADS-B antenna located on the rooftop of the aerospace and telecommunications engineering school of Castelldefels has been used. This antenna captures the signals of all the arrivals into Barcelona, which are later decoded by an already existing program written in C#. A custom program written in Python then extracts the most relevant characteristics of these flights and presents them in a matrix format that the different models can interpret.
All the data have been scaled and divided into training and test samples. The former are used to train the different models and the latter to measure their performance.
Six different models have been trained: four of them reached accuracy values over 60%, another one reached 75% and another one could not exceed 30%. The best three models have been tuned with two different techniques: random search and grid search. Random search proved significantly better, since it obtains the same results while requiring much less time and fewer computing resources. Different methods to enhance the results of the already tuned models, such as voting classifiers and boosters, have also been used.
Finally, oversampling techniques have been implemented to address the five-category problem, as the extreme cases are heavily underrepresented. This technique improved the accuracy of these two categories, but in turn the overall accuracy decreased.
The performance of these models could be improved in the future if more flights, or more characteristics of those flights, were added.

To my grandparents, Joaquín and Filomena.
CONTENTS
INTRODUCTION
CHAPTER 1. MACHINE LEARNING
1.1 Scientific kit learning (sklearn)
1.2 ML models
1.2.1 Random forest classifiers
1.2.2 Logistic regressor
1.2.3 MLP classifier
1.2.4 K-nearest neighbors
1.2.5 Support vector classification (SVC)
1.2.6 Naive Bayes
1.3 Training and test set
1.3.1 Train test split
1.3.2 Stratified sampling
1.3.3 Scaling the data
1.4 Evaluation techniques
1.4.1 Cross validation
1.5 Hyperparameters tuning
1.5.1 Grid search
1.5.2 Randomized search
1.6 Enhancement techniques
1.6.1 Voting classifiers
1.6.2 AdaBoost classifier
1.6.3 Oversampling
1.7 Performance measures
1.7.1 Accuracy
1.7.2 Confusion matrix
CHAPTER 2. DATA COLLECTION AND PREPARATION
2.1 Scope of the problem
2.1.1 Desired outputs
2.1.2 Area of interest
2.1.3 Problem boundaries
2.2 Data collection
2.2.1 ADS-B
2.2.2 ADS-B Decoder
2.2.3 From ADS-B decoder to data preparation
2.3 Data preparation
2.3.1 Direct features preparation
2.3.2 Indirect features obtention and preparation
2.3.3 Exceptions
2.3.4 Time to land

ACKNOWLEDGMENTS
First, I would like to thank my advisor Cristina Barrado Muxí for her dedication and guidance throughout this project. Without her help it would have been impossible to complete it. During the last six months, and despite the exceptional situation caused by the state of alarm, Cristina has always found time to arrange online meetings and keep good track of the project. When I started in February 2020, my knowledge about machine learning was almost null, but she has been a kind mentor and has provided me with all the necessary tools and guidelines to successfully complete this endeavour.
Second, I would like to express my gratitude to my family and friends. During this four-year journey they have been an exceptional help, much needed in the difficult moments.
Last but not least, I would like to thank Marcos Pérez Batlle for his ADS-B decoder, and Daniel García-Monteavaro, Jaume Assens and Francesc Fernandez for their effort in obtaining the aeronautical charts.
INTRODUCTION
The European air traffic network regulator (Eurocontrol) estimates that air travel will increase at a constant rate of 1.9% per year. This means that by 2040 the annual number of operations will have increased by 53% and the network will have to allocate more than 16 million extra flights each year. In this scenario, it will be of the utmost importance to develop effective tools that can make correct predictions so that the whole network works smoothly.
Among the different parameters to predict, time is probably the most important one, as it is the key to synchronization. There are already methods to estimate times at the airport, which include taxi time calculations or the time at which a plane should be ready to start its pushback. However, the problem is far more difficult when planes are airborne, especially during the approach.
At cruise level, aircraft usually follow airways or prescribed tracks at a constant speed. Hence, calculating the time at which a plane will reach a certain waypoint is relatively easy. In contrast, during the descent phase and the approach, speed and trajectory variability increase significantly, especially because of human intervention. Air traffic controllers often allow aircraft to take shortcuts or instruct them to hold at certain points, but there is no fixed criterion. At the same time, pilots continuously change the speed of the aircraft depending on the specific flight conditions. Thus, it is very complicated to predict the time a plane will take to land with a classical algorithm, as it is almost impossible to account correctly for all the variables affecting the problem.
In this project, a method to calculate the landing time based on machine learning is proposed. With this technology, it is possible to create models that can account for tens or hundreds of variables and handle highly nonlinear data, which simplifies the approach to the problem.
Chapter one provides a theoretical background on machine learning. It contains a brief explanation of the different models that have been used, how to split and prepare the dataset, how to evaluate and improve the different models and how to assess their performance.
Chapter two explains the whole process that was followed to gather, process and present the data in an adequate format. The variables that influence the project are presented and their overall importance is discussed. It also explains how the time to land is converted into categories.
Chapter three presents the main results: the performance of the different models is computed and the ways in which they were improved are explained. The conclusions of the project are also presented, along with some ideas to improve the performance in the future.
The code of this project can be found at the following link:
https://github.com/VictorMourinoPerez/TFG-Victor-Mourino-Perez
CHAPTER 1. MACHINE LEARNING
Machine learning (ML) is a branch of computer science that aims at creating systems that can learn autonomously. Traditionally, the only way for a computer program to do something was to write an algorithm that defined the context and the details of every action. This technology turns that approach around: it performs its own calculations and extracts its own conclusions, which ultimately result in a new algorithm that can interpret the input data. Such systems considerably reduce human intervention and are much better at dealing with huge amounts of information.
An ML model is essentially a file that has been trained to recognize certain types of patterns and is eventually capable of creating its own rules. These models require huge quantities of data as input. The input is usually a matrix that has to contain a high number of instances (rows) and also a considerable number of features (columns). Instances are the different objects from which the models learn, and features describe those instances [1].
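As a minimal illustration of this layout (not the project's actual data), the sketch below builds a tiny input matrix with pandas: each row is one instance, here a flight, and each column one feature. The column names loosely follow Table 2.1 and all values are invented.

```python
import pandas as pd

# Hypothetical input matrix: 3 instances (rows, one per flight) by 4 features
# (columns). All values are invented for illustration.
X = pd.DataFrame(
    {
        "latitude": [41.52, 41.10, 41.80],
        "longitude": [2.61, 1.75, 2.20],
        "altitude": [9000, 7500, 11000],
        "ground_speed": [280, 250, 310],
    }
)

n_instances, n_features = X.shape
print(n_instances, n_features)  # 3 4
```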
There are three main types of ML problems: classification, regression and clustering. Classification problems, like this one, try to predict categories or classes. This means that we will not predict a continuous function; instead, we will focus on predicting whether the time to land of a plane belongs to one of the following three categories: planned, advanced or delayed. Hence, this project can be defined as a multiclass classification problem with three classes.
We will also look at “extreme” cases, which will be classified as very advanced or very delayed. Adding these two categories will test whether the different ML models are able to predict very rare scenarios for which little data is available.
1.1 Scientific kit learning (sklearn)
Scikit-learn (sklearn) is a free software ML library for Python that allows users to solve the three types of ML problems described above. It also includes functionality for dimensionality reduction and data preprocessing. All the models in the library can be tuned with a technique called grid search, which looks for the hyperparameter values that give the best overall performance [2].
In combination with sklearn, we also used other open-source libraries such as pandas and NumPy. The first one is mostly used for data manipulation and analysis; in particular, it offers data structures and operations for manipulating numerical tables [3]. NumPy, on the other hand, adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays [4].
1.2 ML models
In this section, we briefly describe the different models that have been used in the project, where to find them in the sklearn environment and the fundamental principles that govern their decision-making process.
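For orientation, the sketch below shows where each of the six classifiers described in this section lives in scikit-learn. It only instantiates them with default hyperparameters; the settings actually used in the project are not given here, and the dictionary keys are our own labels.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import ComplementNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# One instance of each classifier described in this section, with default
# hyperparameters (random_state only fixes the randomness where it applies).
models = {
    "random forest": RandomForestClassifier(random_state=0),  # sklearn.ensemble
    "logistic regressor": LogisticRegression(),               # sklearn.linear_model
    "MLP": MLPClassifier(random_state=0),                     # sklearn.neural_network
    "k-nearest neighbors": KNeighborsClassifier(),            # sklearn.neighbors
    "SVC": SVC(),                                             # sklearn.svm
    "complement naive Bayes": ComplementNB(),                 # sklearn.naive_bayes
}

for name, model in models.items():
    # All of them share the same fit/predict interface.
    print(f"{name:24s} {type(model).__name__}")
```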
1.2.1 Random forest classifiers
The basic idea behind a random forest classifier is to create an ensemble of decision trees, train all of them and output the class that is the mode of the classes predicted by the individual trees [5]. Each tree is trained on a random sample of the data and considers only a random subset of the features at each split, so a single tree is not a very accurate classifier, but combining many of them results in a good predictor. This phenomenon is known as the wisdom of the crowd, which states that, in most cases, the aggregated answer of many people is better than the opinion of a single expert. Random forests are available in sklearn under the ensemble module [6].
Fig. 1.1 illustrates the working principle of an n-tree random forest classifier.
Fig. 1.1 Functioning scheme of a random forest classifier [7].
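A minimal sketch of the idea on synthetic data (generated with make_classification as a stand-in for the flight matrix) compares one decision tree with a forest of 100 trees. Note that scikit-learn's RandomForestClassifier combines the trees by averaging their predicted class probabilities rather than by a strict majority vote; the forest is usually, but not always, more accurate than a single tree.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class data standing in for the advanced/planned/delayed problem.
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One decision tree versus an ensemble of 100 randomised trees.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

tree_acc = accuracy_score(y_test, tree.predict(X_test))
forest_acc = accuracy_score(y_test, forest.predict(X_test))
print(f"single tree: {tree_acc:.2f}  forest: {forest_acc:.2f}")
print("trees in the forest:", len(forest.estimators_))
```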
1.2.2 Logistic regressor
A logistic regressor is a classification method that uses a logistic function to model a binary dependent variable [8]. As can be seen in Fig. 1.2, the sigmoid curve maps each instance to a probability: instances whose probability lies above the decision threshold (typically 0.5) are classified as true samples and the rest as false samples.
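The logistic function is σ(z) = 1 / (1 + e^(−z)), where z = w·x + b is a weighted sum of the features. The sketch below, with hypothetical weights w and bias b, turns scores into probabilities and applies the usual 0.5 threshold; it illustrates the formula rather than sklearn's fitting procedure.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_binary(X, w, b, threshold=0.5):
    """Return probabilities and True/False labels (True if p > threshold)."""
    p = sigmoid(X @ w + b)
    return p, p > threshold

# Hypothetical model with two features: z = 2*x1 - 1*x2 + 0.
w = np.array([2.0, -1.0])
b = 0.0
X = np.array([[1.0, 0.0],   # z = 2  -> p ~ 0.88 -> True
              [0.0, 1.0],   # z = -1 -> p ~ 0.27 -> False
              [1.0, 2.0]])  # z = 0  -> p = 0.5  -> False (not above 0.5)
p, labels = predict_binary(X, w, b)
print(np.round(p, 2), labels)
```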

To use a logistic regressor in multiclass problems like this one, a common strategy is one versus the rest (OvR). Instead of training a single classifier for all the classes, we create one binary classifier per class. Each individual classifier guesses whether an instance belongs to its class or not and also outputs a confidence score. If several classifiers claim the same instance, the confidence score breaks the tie: the instance is assigned to the class whose classifier is the most confident. Logistic regressors can be found in the linear model module of sklearn as the LogisticRegression class [9].
Fig. 1.2 Example of a logistic regression [10].
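A short sketch of the OvR strategy uses scikit-learn's OneVsRestClassifier wrapper on synthetic 3-class data. Recent versions of LogisticRegression can also handle several classes directly with a multinomial loss; the wrapper is used here only to make the one-classifier-per-class structure explicit.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic 3-class data standing in for advanced / planned / delayed.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

# One binary logistic regressor per class: class k versus all the others.
ovr = OneVsRestClassifier(LogisticRegression()).fit(X, y)
print(len(ovr.estimators_))  # 3

# One confidence score per class and instance; the predicted class is the one
# whose binary classifier is the most confident.
scores = ovr.decision_function(X[:5])
print(scores.argmax(axis=1), ovr.predict(X[:5]))
```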
1.2.3 MLP classifier
A multilayer perceptron (MLP) is a feedforward artificial neural network (ANN). MLPs are composed of at least three layers: an input layer, a hidden layer and an output layer, as can be seen in Fig. 1.3 [11].
Fig. 1.3 MLP classifier working principle [12].
The hidden layers of an MLP are composed of nodes. Each node computes a weighted sum of its inputs and passes it through an activation function, which is usually non-linear. It is also worth mentioning that MLPs are fully connected, which means that every node is connected to all the nodes of the following layer.
This classifier learns by comparing the expected output with the output of the classifier through a loss function (the least mean square error in the classic formulation; sklearn's MLPClassifier minimises the cross-entropy loss) and propagating the error backwards. On each iteration, the weights of the connections between the different nodes are updated until the value of the loss function is satisfactory or the maximum number of iterations is reached. MLP classifiers can be found under the neural network module of sklearn [13].
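To make the fully connected structure concrete, the NumPy sketch below computes the forward pass of a one-hidden-layer MLP with random, untrained weights: every input feeds every hidden node, a ReLU activation adds the non-linearity, and a softmax turns the output scores into class probabilities. The layer sizes and the choice of ReLU are assumptions for illustration; training (updating the weights) is left to MLPClassifier.

```python
import numpy as np

def mlp_forward(X, W1, b1, W2, b2):
    """Forward pass of a fully connected MLP with one hidden layer."""
    # Hidden layer: each hidden node receives a weighted sum of all inputs
    # (fully connected) and applies a non-linear ReLU activation.
    H = np.maximum(0.0, X @ W1 + b1)
    # Output layer: one score per class, converted to probabilities (softmax).
    Z = H @ W2 + b2
    Z = Z - Z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 4, 8, 3  # assumed layer sizes
W1, b1 = rng.normal(size=(n_features, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_classes)), np.zeros(n_classes)

X = rng.normal(size=(5, n_features))       # 5 hypothetical instances
probs = mlp_forward(X, W1, b1, W2, b2)
print(probs.shape)           # (5, 3)
print(probs.argmax(axis=1))  # predicted class index for each instance
```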
1.2.4 K-nearest neighbors
K-nearest neighbors is one of the simplest classifiers in ML, yet it is very effective. The easiest way to understand how this model works is to look at the example in Fig. 1.4. If we want to classify the green circle as a blue square or a red triangle, the only thing we have to do is define the number of neighbors K. If K equals 3, the neighbors of the circle are two triangles and one square, so the green circle is classified as a triangle. On the other hand, if K equals 5, the neighbors are three squares and two triangles, and the circle is classified as a square.
Fig. 1.4 Decision scheme of a K-nearest neighbors classifier [14].
A common strategy used with this model is to weight each neighbor by the inverse of its Euclidean distance to the instance we are trying to classify, so that closer neighbors have more influence. The distance is computed in the same way for any number of features, although it cannot be visualised for more than three dimensions.
A small drawback of this method is that it consumes a lot of memory and CPU resources, hence it should not be used with big datasets [15]. It can be found in the sklearn neighbors module [16].
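The sketch below reproduces the spirit of Fig. 1.4 with hypothetical coordinates chosen so that K = 3 votes for the triangle and K = 5 for the square. It also shows that the inverse-distance weighting described above (weights="distance" in KNeighborsClassifier) can flip the K = 5 decision back to the triangle, because the two triangles are much closer than the extra squares.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 2-D layout: distances to the origin are 1.0 and 1.5 for the
# triangles and 1.2, 3.0 and 3.5 for the squares.
X = np.array([[1.0, 0.0], [-1.5, 0.0],
              [0.0, 1.2], [3.0, 0.0], [0.0, -3.5]])
y = np.array(["triangle", "triangle", "square", "square", "square"])
query = np.array([[0.0, 0.0]])  # the "green circle" to classify

results = {}
for k, weights in [(3, "uniform"), (5, "uniform"), (5, "distance")]:
    knn = KNeighborsClassifier(n_neighbors=k, weights=weights).fit(X, y)
    results[(k, weights)] = knn.predict(query)[0]
    print(k, weights, results[(k, weights)])
# 3 uniform triangle
# 5 uniform square
# 5 distance triangle
```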
1.2.5 Support vector classification (SVC)
A support vector classifier is very similar to a logistic or a linear regressor; the main difference is that in this case the model tries to find the optimal hyperplane dividing the classes, that is, the one with the largest margin to the nearest instances. Support vectors are the points closest to the hyperplane, which determine its position and orientation. A hyperplane in a two-dimensional space is just a line, so Fig. 1.5 represents what an SVC would do with two features, although it is impossible to picture what a hyperplane looks like for more than three features [17].
Fig. 1.5 Graphical representation of the optimal hyperplane [18].
This classifier, like the logistic regressor, is meant to work with binary problems only. To use it as a multiclass classifier, a slightly different approach is used: instead of an OvR strategy, we use one versus one (OvO). In our project, this means that it will create one classifier to distinguish between advanced and planned flights, another one to distinguish between advanced and delayed flights and a last one to distinguish between planned and delayed flights (for the three-category approach). In general, this strategy requires training more classifiers than a simple OvR approach (N(N − 1)/2 instead of N for N classes, so the difference appears from four classes onwards), but it is preferred because each classifier only has to be trained on a small part of the dataset and the process is faster. SVC can be found in the sklearn SVM module [19].
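A brief sketch on synthetic data: with decision_function_shape="ovo", scikit-learn's SVC exposes one score per pair of classes, so three classes give 3 × 2 / 2 = 3 pairwise classifiers (and the five-category problem would give 10). The linear kernel is an assumption for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in with 3 classes (advanced, planned, delayed).
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

# SVC trains one binary classifier per pair of classes (one versus one);
# "ovo" exposes the pairwise scores instead of aggregating them per class.
svc = SVC(kernel="linear", decision_function_shape="ovo").fit(X, y)
pairwise_scores = svc.decision_function(X[:4])

n_classes = len(svc.classes_)
print(pairwise_scores.shape, n_classes * (n_classes - 1) // 2)  # (4, 3) 3
```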
1.2.6 Naive Bayes
The Naive Bayes classifier is based on Bayes' probability theorem, and it is called naive because it assumes that the different features are independent of each other, which is false in many cases.
In particular, we have used a model called complement Naive Bayes (ComplementNB) from the Naive Bayes sklearn module [20]. This model converts the features of the dataset into a probability vector by counting how many times a certain feature appears and dividing it by the total count. Unlike the standard multinomial variant, it estimates these probabilities from the complement of each class (all the other classes), which makes it better suited to imbalanced datasets. Once the feature probabilities are calculated, the model applies Bayes' theorem and computes the probability of each class. The class with the highest probability is the output class.
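A minimal sketch with ComplementNB on hypothetical non-negative count features (like the multinomial variant, it expects counts or frequencies, not negative values). The numbers are invented so that the first feature is typical of "advanced" instances and the other two of "delayed" ones.

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

# Hypothetical count features for four training instances.
X = np.array([[3, 0, 1],
              [4, 1, 0],
              [0, 3, 2],
              [1, 4, 3]])
y = np.array(["advanced", "advanced", "delayed", "delayed"])

cnb = ComplementNB().fit(X, y)  # default additive smoothing alpha=1.0
predictions = cnb.predict([[5, 0, 0], [0, 5, 1]])
print(predictions)  # ['advanced' 'delayed']
```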
1.3 Training and test set
The input data of any machine learning model has to be split into a training set and a test set. The training set is used to train the models, which means that they will try to understand how the features lead to a time to land being advanced, planned or delayed. The test set, on the other hand, is used to assess the model performance by comparing the output of the classifier with the real output.
An important premise is that the training and test sets must be completely independent of each other, and there are a couple of reasons for that. The first one is known as data snooping bias. Human brains are very good at recognising patterns: if we tried to gain insights from the whole dataset, we would probably select or discard certain models influenced by the patterns we have recognised, and this could lead to models that only work well for the particular data we are working with.
The second reason is that if we allow the different models to see the whole dataset, they will be overfitted. Overfitting a model means that it will probably have an outstanding performance on the dataset it has learnt from, but it will fail to make correct predictions with other data, as the model has been heavily adjusted to one particular set of information [12].
We have used two different techniques to divide the dataset: a train test split and a stratified sampling.
1.3.1 Train test split
A train test split is the simplest way of doing the partition. We only have to select a random seed; the seed determines which instances are assigned to the test set, and as long as it is kept fixed, the test set will always be the same. In this way, we will have two independent datasets to work with, as shown in Fig. 1.6.
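In scikit-learn this partition is usually done with train_test_split; the sketch below, on a toy dataset, shows that fixing random_state reproduces exactly the same test set. The stratify argument, shown at the end, keeps the class proportions equal in both sets, which is one way to perform the stratified sampling mentioned above. The 80/20 ratio and the seed are arbitrary choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 toy instances with 2 features each
y = np.array([0, 1] * 5)          # two balanced classes

# Same seed -> same partition: the same instances always land in the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)

# stratify=y keeps the class proportions in both subsets.
_, _, _, y_test_strat = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
print(np.bincount(y_test_strat))  # [1 1]
```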
1.7.1 Accuracy
Accuracy in classification problems is the number of correct predictions made by the model over all the predictions made, and it is expressed as a percentage [31]. Although it is useful, when working with skewed datasets like the one in this project its value is only indicative. To understand why, consider the following classical example: we have a dataset of handwritten digits from 0 to 9 and we want to build a classifier that detects which digits are not a 5. If the classifier always votes that the image is not a 5, it will have an accuracy of 90% on any well-balanced dataset, since only one digit in ten is a 5. Although the result looks impressive, it does not give an actual sense of how the model is doing. A much better performance measure can be obtained from the confusion matrix.
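The classical example can be checked numerically. The sketch below assumes a perfectly balanced set of 1,000 digit labels (100 per digit) and a classifier that always answers "not a 5".

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Balanced toy labels: 100 handwritten digits of each class 0-9.
digits = np.repeat(np.arange(10), 100)
is_not_5 = digits != 5  # ground truth for the question "is this not a 5?"

# A useless classifier that always answers "not a 5".
always_not_5 = np.ones_like(is_not_5)
acc = accuracy_score(is_not_5, always_not_5)
print(acc)  # 0.9
```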
1.7.2 Confusion matrix
A confusion matrix is simply a way of visualising how an ML model has classified the instances. To better understand what a confusion matrix is, we will use the following example: if this project, instead of predicting three different classes, only tried to predict whether a flight's landing time is going to be as planned or not, a confusion matrix like the one in Fig. 1.11 would appear, where the meaning of the different acronyms is:
• True positives (TP): flights with a planned landing time that were correctly predicted as planned.
• False positives (FP): flights with a non-planned landing time that were wrongly predicted as planned.
• True negatives (TN): flights with a non-planned landing time that were correctly predicted as non-planned.
• False negatives (FN): flights with a planned landing time that were wrongly predicted as non-planned.
Fig. 1.11 Example of a 2x2 confusion matrix.
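A small sketch with hypothetical labels (1 = planned, 0 = non-planned) shows how these four counts can be obtained with scikit-learn's confusion_matrix. Its layout puts the true classes in the rows and the predicted classes in the columns, which may differ from the arrangement drawn in Fig. 1.11.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels for 8 flights: 1 = planned, 0 = non-planned.
y_true = np.array([1, 1, 1, 0, 0, 1, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 1, 0, 1])

# With labels=[0, 1] sklearn returns [[TN, FP], [FN, TP]]:
# rows are the true classes, columns the predicted ones.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")  # TP=4 FP=1 TN=2 FN=1
```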

From the above confusion matrix, three new performance measures are obtained: precision, recall and F1. Although they are often confused with accuracy, they do not represent the same measure [31].
1.7.2.1 Precision
Precision tries to answer the question: what proportion of the positive predictions is correct? In our example that would be: of all the instances that have been predicted as planned, how many of them are really planned? To calculate it, we use equation 1.1.
$$\text{Precision} = \frac{TP}{TP + FP} \tag{1.1}$$
A precision of 80% means that, of all the instances predicted as planned by the model, 8 out of 10 were correctly predicted.
1.7.2.2 Recall
Recall, on the other hand, tries to answer the question: what proportion of the real positives was correctly identified? Going back to our example, that would be: of all the instances that are actually planned, how many of them have been correctly classified? In this case, we use equation 1.2.
$$\text{Recall} = \frac{TP}{TP + FN} \tag{1.2}$$
Now, a recall of 80% means that, of all the planned instances, 8 out of 10 have been correctly classified.
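Equations 1.1 and 1.2 can be written directly as functions. The sketch below uses hypothetical counts matching the 80% examples above (8 true positives, 2 false positives and 2 false negatives) and cross-checks them against scikit-learn's precision_score and recall_score.

```python
from sklearn.metrics import precision_score, recall_score

def precision(tp, fp):
    """Equation 1.1: share of the predicted positives that are real positives."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Equation 1.2: share of the real positives that were found."""
    return tp / (tp + fn)

print(precision(tp=8, fp=2), recall(tp=8, fn=2))  # 0.8 0.8

# Cross-check with sklearn on label vectors built from the same counts
# (1 = planned): 8 TP, then 2 FP, then 2 FN.
y_true = [1] * 8 + [0] * 2 + [1] * 2
y_pred = [1] * 8 + [1] * 2 + [0] * 2
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```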
1.7.2.3 F1 score
The F1 score combines precision and recall into a single metric, which is very convenient because it can be used to compare different classifiers. It is defined as the harmonic mean of precision and recall and is calculated with equation 1.3.
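$$F_1 = \frac{2}{\dfrac{1}{\text{precision}} + \dfrac{1}{\text{recall}}} = 2\,\frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} = \frac{TP}{TP + \dfrac{FN + FP}{2}} \tag{1.3}$$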
The harmonic mean is used instead of the regular mean because, in that way, the F1 value is only high when both precision and recall are also high.
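The sketch below, with hypothetical counts, checks that the two forms of equation 1.3 agree and illustrates why the harmonic mean is preferred: a classifier with perfect precision but very low recall gets a poor F1.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (middle form of equation 1.3)."""
    return 2 * precision * recall / (precision + recall)

def f1_from_counts(tp, fp, fn):
    """Equivalent count-based form: TP / (TP + (FN + FP) / 2)."""
    return tp / (tp + (fn + fp) / 2)

tp, fp, fn = 8, 2, 6  # hypothetical counts
p, r = tp / (tp + fp), tp / (tp + fn)
print(round(f1_from_pr(p, r), 4), round(f1_from_counts(tp, fp, fn), 4))  # 0.6667 0.6667

# The harmonic mean punishes imbalance: with precision 1.0 and recall 0.1 the
# arithmetic mean would be 0.55, but F1 is only about 0.18.
print(round(f1_from_pr(1.0, 0.1), 4))  # 0.1818
```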
1.7.2.4 Precision/recall trade-off
Unfortunately, there is a trade-off between recall and precision: increasing one of them usually decreases the other, so we cannot have excellent precision and recall at the same time. Fig. 1.12 shows a precision-recall curve for our three-class problem under a Linear SVC model. The blue, turquoise and orange lines represent the behaviour of the advanced, planned and delayed categories respectively when analysed individually (the precision and recall are obtained for each category from the confusion matrix). The yellow line shows the behaviour of all the categories at the same time. Additionally, the plot contains iso-F1 curves to observe which F1 values our model is achieving.
Fig. 1.12 Precision/Recall curve of the three-class problem.
Very interesting conclusions can be drawn from the previous figure: we can obtain perfect recall or perfect precision, but not both at once; the planned category allows us to improve the recall without sacrificing much precision, as its curve is more or less flat, but we cannot do that with the advanced and delayed categories; and the optimum F1 value is above 0.6 (0.68, in fact).
The decision to aim for high precision, high recall or an intermediate solution depends on the problem. If we use an ML model to recognise thieves with video cameras, we do not want to miss any thief (high recall), even though we will stop some innocent people (low precision). In this project, we are not particularly interested in a high recall or a high precision, so we will try to optimise the F1 and the accuracy.
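The trade-off comes from the decision threshold applied to the classifier's confidence scores. In the sketch below, with eight hypothetical scores and labels, raising the threshold increases precision and lowers recall; scikit-learn's precision_recall_curve performs this sweep over every possible threshold, which is how curves such as the one in Fig. 1.12 are typically drawn.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical confidence scores for 8 instances (sorted) and their true
# labels (1 = planned).
scores = np.array([0.95, 0.85, 0.80, 0.70, 0.55, 0.40, 0.30, 0.10])
y_true = np.array([1, 1, 1, 0, 1, 0, 1, 0])

results = {}
for threshold in (0.25, 0.50, 0.75):
    y_pred = (scores >= threshold).astype(int)
    results[threshold] = (precision_score(y_true, y_pred), recall_score(y_true, y_pred))
    p, r = results[threshold]
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
# threshold=0.25  precision=0.71  recall=1.00
# threshold=0.50  precision=0.80  recall=0.80
# threshold=0.75  precision=1.00  recall=0.60
```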
To generalise these concepts, Fig. 1.13 shows a real five-class confusion matrix. The elements on the main diagonal are the correctly classified instances. As an example, the element in the third row and fourth column indicates that 130 planned instances were wrongly predicted as delayed. The confusion matrix also has a colour legend that associates colours with the number of instances. Thus, at first glance, we can observe a yellow cell in the middle of the matrix, meaning that more than 2,000 planned instances were correctly classified.
Fig. 1.13 Example of a five-class confusion matrix.
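In this case, to calculate the precision and recall of a class, we need to add up all the elements related to that class: recall divides the diagonal element by the sum of its row (all the instances that truly belong to the class), and precision divides it by the sum of its column (all the instances predicted as that class). The recall of the planned category is computed in equation 1.4.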
$$\text{Recall} = \frac{TP}{TP + FN} = \frac{2794}{108 + 2794 + 130} = 0.9215 \tag{1.4}$$
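A minimal sketch of equation 1.4 follows. Only the planned row of Fig. 1.13 is reproduced, assuming, as equation 1.4 implies, that its remaining entries are zero and that the classes are ordered very advanced, advanced, planned, delayed, very delayed. Precision would instead divide the same diagonal element by the sum of its column, which needs the full matrix.

```python
import numpy as np

def class_recall(row, k):
    """Recall of class k from its confusion-matrix row (true class k):
    diagonal element divided by the sum of the whole row."""
    row = np.asarray(row)
    return row[k] / row.sum()

# 'planned' row of Fig. 1.13, remaining entries assumed to be zero.
# Assumed class order: very advanced, advanced, planned, delayed, very delayed.
planned_row = [0, 108, 2794, 130, 0]
recall_planned = class_recall(planned_row, k=2)
print(round(recall_planned, 4))  # 0.9215
```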
CHAPTER 2. DATA COLLECTION AND PREPARATION
This chapter explains how we can go from a theoretical problem, such as predicting the time to land of an aircraft, to the point at which an ML model is ready to start making predictions. The final output is the matrix used as the input of the ML model; in this case, every instance corresponds to a flight arriving in Barcelona and every feature is one of the characteristics of that flight listed in Table 2.1.
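Table 2.1 Features of the ML input matrix
Feature category | Feature | Feature type
Position | Latitude | Float
Position | Longitude | Float
Position | Altitude | Integer
Position | Heading | Float
Speed | Indicated airspeed (IAS) | Integer
Speed | Ground speed (GS) | Integer
Meteorology | Wind direction | Float
Meteorology | Wind speed | Float
Meteorology | Wind variability | Float
Meteorology | Visibility | Integer
Meteorology | Barometric pressure | Integer
Airport | Runway | String
Airport | Airport azimuth | Float
Airport | Airport distance | Float
Traffic characteristics | Landing category | String
Traffic characteristics | Mix index | Float
Traffic characteristics | Type of airline | Integer
Time | Day of the week | Integer
Time | Initial time | Integer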
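As a sketch of how Table 2.1 could map onto a pandas schema: the snake_case column names are our own, and Float, Integer and String are assumed to correspond to float64, int64 and pandas' string dtype. Each arriving flight would then be one row. The string features (runway and landing category) would still need a numeric encoding, for example one-hot encoding, before most scikit-learn models could use them.

```python
import pandas as pd

# Features of Table 2.1 with assumed pandas dtypes (names are hypothetical).
FEATURE_TYPES = {
    "latitude": "float64", "longitude": "float64", "altitude": "int64",
    "heading": "float64", "ias": "int64", "ground_speed": "int64",
    "wind_direction": "float64", "wind_speed": "float64",
    "wind_variability": "float64", "visibility": "int64",
    "barometric_pressure": "int64", "runway": "string",
    "airport_azimuth": "float64", "airport_distance": "float64",
    "landing_category": "string", "mix_index": "float64",
    "airline_type": "int64", "day_of_week": "int64", "initial_time": "int64",
}

# Empty input matrix with one typed column per feature; each arriving flight
# would later be added as one row (instance).
flights = pd.DataFrame({name: pd.Series(dtype=t) for name, t in FEATURE_TYPES.items()})
print(flights.shape)  # (0, 19)
```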
The whole process described in this chapter consists of four basic steps:
1) Deciding the limits of the project, the conditions that we will focus on and the desired outputs.
2) Obtaining the most relevant data for the project and converting it into an adequate format.
3) Preparing the data according to the criteria established in step one.
4) Generating a matrix that can be understood by an ML model so that it can start making predictions with it.

This is clearly summarised in the scheme of Fig. 2.1.
Fig. 2.1 Initial scheme of the project development.
2.1 Scope of the problem
2.1.1 Desired outputs
As mentioned before, we will first try to predict three different classes. To define them, we need to prepare all the data and then learn what most planes do. If most of them take three thousand seconds, that will be the planned time to land, and with that reference we will be able to define what advanced or delayed means.
The same has to be done with the very advanced and very delayed categories, although their landing times will represent the rarest scenarios.
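The idea of defining the categories around a reference time can be sketched as follows. The 3,000 s reference comes from the example above, but the band widths (about ±120 s for planned, about ±600 s for the extreme categories) are hypothetical values chosen for illustration only; the project derives its own limits from the data.

```python
import numpy as np

# Hypothetical thresholds, for illustration only.
PLANNED_TIME_S = 3000  # reference time to land (example value from the text)
NORMAL_BAND_S = 120    # deviations within about 2 min count as "planned"
EXTREME_BAND_S = 600   # deviations of about 10 min or more are "very" advanced/delayed

LABELS = np.array(["very advanced", "advanced", "planned", "delayed", "very delayed"])

def categorise(time_to_land_s):
    """Map times to land (seconds) to one of the five categories."""
    delta = np.asarray(time_to_land_s) - PLANNED_TIME_S
    # Bins on the deviation: (-inf, -600), [-600, -120), [-120, 120),
    # [120, 600), [600, inf) -> indices 0..4 into LABELS.
    edges = [-EXTREME_BAND_S, -NORMAL_BAND_S, NORMAL_BAND_S, EXTREME_BAND_S]
    return LABELS[np.digitize(delta, edges)]

print(categorise([2000, 2800, 3050, 3300, 4000]))
```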
2.1.2 Area of interest
To predict how fast or how slowly aircraft will land in a certain area, we need to determine which area is the most interesting one. As the project is focused on Barcelona, there are three possibilities: Barcelona's flight information region (FIR), the standard arrival routes (STAR) or the approaches. During most of the en-route or descent phase, there are no substantial changes in speed or trajectory (i.e. planes fly a given path at a constant speed). On the other hand, before entering the approach, controllers often instruct aircraft to hold, to change their heading or to stay below a certain speed, creating a complex environment in which it is difficult to predict what planes will do.
Considering the speed and trajectory variability, the most interesting area is the one comprising the end of the STARs and all the approaches until the intermediate fix (IF), as it is the most diverse one.
The segment after the IF is not considered because, as can be seen in Fig. 2.2, when aircraft arrive at that point they have to follow the instrument landing system (ILS) and no significant changes can occur.
Fig. 2.2 Instrument approach chart for runway 25R.
2.1.3 Problem boundaries
Defining the problem boundaries consists of transforming the area of interest into something tangible. If we want to make predictions, we cannot wait until the plane lands to know all the variables. We have to define a point at which predictions must be made.
To select the best boundary, we also looked at the Spanish aeronautical information publication (AIP). As can be seen in Fig. 2.3, there is a control area (CTA) with a pink circular shape centred at the airport. This kind of airspace is very interesting since it requires a lot of action from air traffic controllers (ATC).
Fig. 2.3 CTA in Barcelona (INSIGNIA map).
Combining the area of interest with the airspace on top of the airport, we determined that the most suitable boundary would be a 68 km circle centred on the aerodrome reference point (ARP). This circle comprises the whole area of interest and exceeds the limits of the CTA by 20 km. The result with the 25,294 flights in the project can be seen in Fig. 2.4.
Fig. 2.4 Final boundary with flights.
In the actual implementation, each arriving flight is an instance of the input matrix, and all its features are obtained at the moment in which the aircraft enters the circle.
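A minimal Python sketch of how the entry point could be located is shown below. It assumes that each track is a time-ordered list of `(timestamp_s, lat, lon)` fixes, treats the 68 km figure as the circle's radius and uses approximate coordinates for the LEBL ARP; these constants and the names `haversine_km` and `first_fix_inside` are hypothetical, not taken from the project code.

```python
import math
from typing import Iterable, Optional, Tuple

# Approximate LEBL aerodrome reference point (assumption: verify against the AIP).
ARP_LAT, ARP_LON = 41.2969, 2.0783
BOUNDARY_RADIUS_KM = 68.0  # assumption: the 68 km circle is defined by its radius
EARTH_RADIUS_KM = 6371.0   # mean Earth radius

Fix = Tuple[float, float, float]  # (timestamp_s, latitude_deg, longitude_deg)

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def first_fix_inside(track: Iterable[Fix], radius_km: float = BOUNDARY_RADIUS_KM) -> Optional[Fix]:
    """Return the first fix of a time-ordered track that lies inside the boundary circle."""
    for fix in track:
        _, lat, lon = fix
        if haversine_km(lat, lon, ARP_LAT, ARP_LON) <= radius_km:
            return fix
    return None  # the flight never entered the circle
```

The returned fix marks the instant at which the features of the instance would be sampled.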
2.2 Data collection
2.2.1 ADS-B
Automatic dependent surveillance-broadcast (ADS-B) is a data-link broadcast mode in which aircraft share their identification, position, sensor outputs and any other relevant information with ground stations or other aircraft [32]. The easiest way to obtain the raw data is to analyse ADS-B reports using existing tools that decode the messages [33].
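The project relied on existing decoding tools, but the sketch below illustrates what such a decoder does for one message type: it extracts the downlink format, the 24-bit ICAO address and the callsign from a 112-bit ADS-B aircraft identification message. It is a simplified illustration under stated assumptions (identification messages only, no parity/CRC check), and `decode_identification` is a hypothetical name; the sample message is the public `8D4840D6202CC371C32CE0576098` example used in ADS-B decoding tutorials.

```python
# 6-bit character set of ADS-B identification messages ('#' = unused, '_' = space).
CHARSET = "#ABCDEFGHIJKLMNOPQRSTUVWXYZ#####_###############0123456789######"

def decode_identification(msg_hex: str):
    """Return (downlink_format, icao_address, callsign) of an ADS-B message.

    Simplified sketch: the parity/CRC field is not verified and only
    identification messages (type codes 1-4) yield a callsign.
    """
    bits = bin(int(msg_hex, 16))[2:].zfill(len(msg_hex) * 4)
    df = int(bits[0:5], 2)          # downlink format, 17 for ADS-B
    icao = msg_hex[2:8].upper()     # 24-bit ICAO aircraft address
    me = bits[32:88]                # 56-bit message (ME) field
    tc = int(me[0:5], 2)            # type code
    if df != 17 or not 1 <= tc <= 4:
        return df, icao, None
    # Eight characters of 6 bits each follow the type code and category bits.
    chars = [CHARSET[int(me[8 + 6 * i:14 + 6 * i], 2)] for i in range(8)]
    callsign = "".join(chars).replace("_", "").replace("#", "")
    return df, icao, callsign

print(decode_identification("8D4840D6202CC371C32CE0576098"))
```

The ICAO address is the key later used to look up the aircraft type, and the first three letters of the callsign identify the airline.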
To obtain these reports, an ADS-B antenna located on EETAC's rooftop was used. This antenna receives signals from nearby aircraft and uploads the messages to a server called radarcape, from which a python script can download the raw data. Additionally, all the received information is shared with the website www.flightradar24.com, which provides a free premium account in return. The antenna setup can be seen in Fig. 2.5.
Fig. 2.5 Antenna setup on EETAC's rooftop.
It is worth mentioning that ADS-B reports suit the objective of this project perfectly, as they contain an enormous quantity of information (many features or columns) and more than 850 planes fly into Barcelona each day [34] (many instances or rows). Also, the antenna has enough range to capture all the planes within the area of interest. In fact, some of them can be tracked as far as the south of France or southern regions of Spain.
Regarding the ARP azimuth, it is of crucial importance. Whereas the heading measures the angle between the nose of the plane and magnetic north, the ARP azimuth measures the direction of the line joining the plane and the airport's ARP. Thus, it indicates where planes are coming from when approaching the airport. In combination with the heading, very interesting conclusions can be obtained. For instance, an aircraft could be entering the circle from the north while its nose points to the east, suggesting that it may not go directly to the airport. On the contrary, if the plane was entering through the east and the heading was also pointing eastwards, this could mean a rapid approach to the airport. The difference can be appreciated in Fig. 2.11, where the blue line shows the heading and the orange one the ARP azimuth.
Fig. 2.11 Difference between ARP azimuth (orange) and heading (blue).
To obtain the ARP distance and azimuth, we only need to use the same library as with the heading. We select the entry point of the circle and the previously defined ARP coordinates, and the library automatically computes the desired parameters.
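Since the library used for the heading is not named here, the following self-contained sketch computes the same two quantities with standard spherical-Earth formulas: the haversine distance and the initial great-circle bearing from the entry point to the ARP. The ARP coordinates are approximate and, like the name `distance_and_bearing` and the chosen direction (aircraft towards ARP), an assumption. Note also that the result is a true bearing, whereas the heading is referenced to magnetic north.

```python
import math

ARP_LAT, ARP_LON = 41.2969, 2.0783  # approximate LEBL ARP (assumption)
EARTH_RADIUS_KM = 6371.0

def distance_and_bearing(lat: float, lon: float,
                         ref_lat: float = ARP_LAT, ref_lon: float = ARP_LON):
    """Great-circle distance (km) and initial true bearing (deg) from (lat, lon) to the reference point."""
    phi1, phi2 = math.radians(lat), math.radians(ref_lat)
    dlmb = math.radians(ref_lon - lon)
    # Haversine distance.
    a = math.sin((phi2 - phi1) / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    dist_km = 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
    # Initial bearing, measured clockwise from true north, normalised to [0, 360).
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    bearing = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    return dist_km, bearing
```

If the azimuth is instead defined from the ARP towards the aircraft, it is `(bearing + 180) % 360`.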
2.3.2.2 Runways
Knowing which runway is in use for landings is important. Depending on the active runways at an airport, aircraft will fly different procedures and approaches that directly influence their speed. Barcelona airport has three runways, and the typical configuration is 25R for arrivals and 25L for departures, or 07L and 07R when the wind direction changes. During the night, aircraft typically land on runway 02, and runway 20 is never used [37].
To obtain the runways, we used a method which consists of defining five circular touchdown zones, which can be seen in Fig. 2.12. Then, we go over the arrivals list and check which coordinates fall inside the red zones.
Fig. 2.12 Touchdown zones at Barcelona.
By using this method to obtain the runways, we are using coordinates which lie beyond the circle (in fact, at the very end of the flight). Nevertheless, this is completely fair, since in an operational setting this feature could easily be obtained by other means (i.e. runways can be known at the boundary of the 68 km circle or even before, but not from ADS-B reports).
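The touchdown-zone method can be sketched as follows. The zone centres and radii of the project are only shown graphically in Fig. 2.12, so the zones here are hypothetical inputs. The sketch scans the track backwards so that the last zone visited (where the aircraft actually lands) is returned, and it uses an equirectangular distance approximation, which is accurate enough over a few kilometres; `assign_runway` and `approx_distance_km` are illustrative names.

```python
import math
from typing import Dict, List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0

def approx_distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Equirectangular approximation, adequate for distances of a few km."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat)
    dy = math.radians(lat2 - lat1)
    return EARTH_RADIUS_KM * math.hypot(dx, dy)

# Runway designator -> (centre_lat, centre_lon, radius_km) of its touchdown zone.
Zones = Dict[str, Tuple[float, float, float]]

def assign_runway(track: List[Tuple[float, float]], zones: Zones) -> Optional[str]:
    """Return the runway whose zone contains the latest (lat, lon) fix found in any zone."""
    for lat, lon in reversed(track):  # latest fixes first
        for runway, (c_lat, c_lon, radius_km) in zones.items():
            if approx_distance_km(lat, lon, c_lat, c_lon) <= radius_km:
                return runway
    return None  # no fix fell inside a touchdown zone
```

With the five Barcelona zones as input, the returned designator would fill the runway column of Table 2.1.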
2.3.2.3 Meteorology
The meteorological features that we will use are wind direction, wind variability, wind speed, barometric pressure and visibility. All these parameters share a common influence, which is reducing or increasing the airport capacity, but they do so in different ways. The wind direction has a very strong influence on crosswinds. High wind speeds along with high variability can create a phenomenon called wind shear. When this happens, aircraft may be unable to land and may have to start flying holding patterns, which substantially increase the landing time.
Low barometric pressures indicate the presence of storms, whereas high pressures are associated with good flying days. Finally, if the visibility is dramatically reduced, airports can start the so-called low visibility procedures (LVP). Under these conditions, operations are significantly affected and this can lead to severe delays.
Although ADS-B reports contain some meteorological information, we decided that it was more sensible to use the meteorological conditions at the airport. This is because, if we used the data read directly from the aircraft sensors, local phenomena like a strong gust of wind could lead ML models to wrong conclusions while the conditions at the airport were excellent. To obtain the airport conditions, METARs were used. A METAR is simply a standard format for transmitting meteorological information. In the case of Barcelona, METARs are published every half an hour and contain all the relevant information that ATC and pilots need to know [38].
METARs were obtained from the webpage www.ogimet.com, which allows downloading large text files containing the information of a whole month. Several files can be downloaded and merged so that they can later be read with python. An example of what these files look like can be seen in Fig. 2.13.
Fig. 2.13 Text file with METARs obtained from OGIMET.
Fortunately, there is a python library with the same name (metar), created by Tom Pollard, which decodes METARs and handles almost all possible cases. All the files can be decoded with this library, and only some rare cases that cannot be fully decoded require manual action. Even so, these unusual cases do not directly affect the features that we are extracting, so the problematic parts can simply be erased.
As METARs are published every half an hour, the moment in which aircraft enter the circle does not exactly match the time of the reported conditions at the airport. However, this is not particularly relevant, because meteorological conditions do not usually change extremely fast. We can consider the 30-minute gap short enough to assume that conditions remain the same during the period. This means that the meteorological information is taken from the 30-minute slot closest to the moment in which the aircraft crosses the circle boundary.
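Matching each crossing with a METAR can be sketched with a binary search over the sorted report times. This sketch assumes the METAR closest in time is used (ties go to the earlier report); the function name `closest_metar` and the data layout (parallel lists of times and reports) are hypothetical.

```python
import bisect
from datetime import datetime
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")

def closest_metar(crossing: datetime, metar_times: Sequence[datetime],
                  metars: Sequence[T]) -> Optional[T]:
    """Return the METAR issued closest in time to the circle crossing.

    metar_times must be sorted in ascending order and aligned with metars.
    On an exact tie, the earlier report is chosen.
    """
    if not metar_times:
        return None
    i = bisect.bisect_left(metar_times, crossing)
    # Only the reports immediately before and after the crossing can be the closest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(metar_times)]
    best = min(candidates, key=lambda j: abs(metar_times[j] - crossing))
    return metars[best]
```

If only information available at prediction time should be used, taking `bisect.bisect_right(metar_times, crossing) - 1` (the latest report issued before the crossing) would be the alternative.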
The visibility, the wind speed and the barometric pressure can be assigned directly without any further treatment, but the wind direction and the wind variability require a little processing. The wind direction is simply the angle from which the wind is blowing (with respect to north), and the wind variability is a number between 0º and 180º indicating how much the wind direction varies from the prevailing one. There are four possible cases:
1) If both features are correct, they are assigned directly.
2) If the wind direction is correct but there is no information about the variability, the direction is assigned directly and the variability is set to zero.
3) If the wind variability is correct but there is no information about the wind direction, the variability is assigned directly and the direction is set to a random integer between 0º and 359º.
4) In all other cases, the wind direction is set to a random integer between 0º and 359º and the variability is set to 180º.
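The four cases translate directly into a small function. This sketch assumes that a missing value arrives as `None` and interprets "correct" as lying in the valid range (0º to 359º for the direction, 0º to 180º for the variability); `fill_wind` and its helpers are hypothetical names. Passing a seeded `random.Random` makes the random fills reproducible.

```python
import random
from typing import Optional, Tuple

def _valid_direction(d: Optional[float]) -> bool:
    return d is not None and 0 <= d < 360

def _valid_variability(v: Optional[float]) -> bool:
    return v is not None and 0 <= v <= 180

def fill_wind(direction: Optional[float], variability: Optional[float],
              rng: Optional[random.Random] = None) -> Tuple[float, float]:
    """Apply the four wind cases and return (wind_direction, wind_variability) in degrees."""
    rng = rng or random.Random()
    dir_ok, var_ok = _valid_direction(direction), _valid_variability(variability)
    if dir_ok and var_ok:             # case 1: both values usable
        return direction, variability
    if dir_ok:                        # case 2: no usable variability
        return direction, 0
    if var_ok:                        # case 3: no usable direction
        return rng.randint(0, 359), variability
    return rng.randint(0, 359), 180   # case 4: anything else
```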
By following the above criterion, some error is introduced, as some features take random values. Even so, it is preferable for two main reasons: some diversity is introduced in the data, which could reduce bias towards certain cases, and some instances that would otherwise be lost can be kept for the study.
2.3.2.4 Landing category
The landing category, or wake turbulence category (WTC), is a classification of aircraft depending on their maximum take-off weight (MTOW). According to ICAO, there are currently four categories: light, medium, heavy and super [39]. The importance of the landing category comes from the fact that ATC separate aircraft depending on these categories, as can be seen in Fig. 2.14.
*Indicates that the minimum separation is the minimum radar separation (MRS).
Fig. 2.14 Aircraft separation depending on landing category.
In Barcelona, the vast majority of aircraft fall into the medium category. This is because the biggest airlines operating at the airport are low-cost carriers with a typical fleet composed of A320s and B737s. Even so, there are also some heavy aircraft, especially from the airlines that fly to America or Asia, some light aircraft corresponding to general aviation, and two daily A380s from the Dubai-based company Emirates. Given this variety, this feature may help ML models to make better predictions.
Two steps are needed to obtain this feature. The first one, of course, is to obtain the aircraft type. As the ICAO hexadecimal code uniquely identifies every aircraft in the world, we can use free public databases that relate the code to the aircraft type. After that, thanks to other public databases which relate every aircraft model in the world to its WTC, the feature can be obtained and assigned.
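A minimal sketch of the two-step lookup, with in-memory dictionaries standing in for the public databases. The hexadecimal codes in `HEX_TO_TYPE` are hypothetical placeholders; the type-to-WTC entries use ICAO type designators with their ICAO wake turbulence categories (L, M, H, and J for super). A `None` result marks the situation described under Exceptions, where a code is missing from the database.

```python
from typing import Dict, Optional

# Hypothetical sample of a hex-code -> aircraft-type database.
HEX_TO_TYPE: Dict[str, str] = {
    "A1B2C3": "A320",
    "D4E5F6": "A388",
}

# ICAO type designator -> wake turbulence category (L, M, H, J = super).
TYPE_TO_WTC: Dict[str, str] = {
    "C172": "L",
    "A320": "M",
    "B738": "M",
    "B77W": "H",
    "A388": "J",
}

def landing_category(icao_hex: str) -> Optional[str]:
    """Step 1: hex code -> aircraft type. Step 2: aircraft type -> WTC."""
    aircraft_type = HEX_TO_TYPE.get(icao_hex.upper())
    if aircraft_type is None:
        return None  # not in the database: another source is needed
    return TYPE_TO_WTC.get(aircraft_type)
```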
2.3.2.5 Mix index
The mix index is strongly related to the landing category and is simply a measure of how heterogeneous the traffic is. Its importance comes from the fact that, although the landing category already offers a measure of the type of traffic for every single aircraft, the mix index helps to understand the interactions between them [40]. It is calculated with Eq. (2.1).
$$\text{Mix index}\ (\%) = C + 3D \qquad (2.1)$$
where C is the percentage of light and medium aircraft and D is the percentage of heavy and super aircraft. The original equation only considers medium aircraft for C and heavy aircraft for D, but we slightly modified it to include all the different traffic types. A mix index of 100% means that only light or medium aircraft are present, whereas a mix index of 300% means that only heavy or super aircraft are present. Between these two cases, there are many possible combinations that result in intermediate mix indexes.
We assigned to every instance the mix index of the airspace at the moment it crossed the circle. That means that all aircraft that have already crossed the circle but have not landed yet are included in the calculation of the parameter. This should give an idea of how complex the situation is upon arrival.
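Eq. (2.1) and the rule above can be sketched as two small functions: one selects the aircraft that have crossed the circle but not yet landed at time `t`, the other applies the modified mix index. Flights are assumed to be `(crossing_time, landing_time, wtc)` tuples with WTC codes L, M, H or J (super); these names are illustrative. Whether the aircraft being evaluated counts itself is not stated in the text; with `crossed <= t` it does. For example, three medium aircraft and one heavy give C = 75 and D = 25, i.e. a mix index of 150%.

```python
from typing import Iterable, List, Tuple

Flight = Tuple[float, float, str]  # (crossing_time_s, landing_time_s, wtc)

def airborne_categories(flights: Iterable[Flight], t: float) -> List[str]:
    """WTC of every flight that has crossed the circle at time t and has not landed yet."""
    return [wtc for crossed, landed, wtc in flights if crossed <= t < landed]

def mix_index(categories: List[str]) -> float:
    """Modified mix index (%) = C + 3D, with C = % light/medium and D = % heavy/super."""
    if not categories:
        raise ValueError("the mix index is undefined for an empty airspace")
    n = len(categories)
    c = 100.0 * sum(cat in ("L", "M") for cat in categories) / n
    d = 100.0 * sum(cat in ("H", "J") for cat in categories) / n
    return c + 3.0 * d
```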
2.3.2.6 Day of the week
Another interesting feature that we selected is the day of the week. This feature should not have an enormous impact, but ML models can account for many variables and it will be interesting to see its weight in the decision process. It can be obtained directly from the timestamp of the first coordinate.
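Obtaining the day of the week from a UNIX timestamp takes one line in Python. The integer encoding used in the project is not stated, so this sketch assumes Python's convention (Monday = 0 to Sunday = 6) and interprets timestamps in UTC, which could shift the day for flights close to local midnight; `day_of_week` is a hypothetical name.

```python
from datetime import datetime, timezone

def day_of_week(timestamp_s: float) -> int:
    """Day of the week of a UNIX timestamp, in UTC (Monday = 0, ..., Sunday = 6)."""
    return datetime.fromtimestamp(timestamp_s, tz=timezone.utc).weekday()

# 2017-01-01 00:00:00 UTC was a Sunday.
print(day_of_week(1483228800))  # 6
```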
Fig. 2.15 shows the mean number of flights per day during 2016 in Europe. It is remarkable how weekends are by far the least crowded days in the European network, whereas Fridays can accommodate up to five thousand more flights than weekend days.
Fig. 2.15 Mean number of flights per day during 2016.
2.3.2.7 Airlines
The airline type, like the day of the week, should not have a strong influence on the landing time, because ATC decisions should not be influenced by it. Its purpose is simply to check whether that assumption is right and to see if any bias is present.
Four different types of airlines were defined: European airlines, American airlines (for the USA and Canada), Latin American airlines (for Central and South America) and other airlines (mostly Russian and Asian airlines).
Every instance has four attributes corresponding to the four airline types; all of them are zero except the one matching the carrier, which is set to one. To obtain the airline type, we can use the callsign, since its first three letters identify the company. After extracting these letters, we compare them with a custom database that includes all the airlines operating in Barcelona and their classification (European, American, etc.).
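The airline encoding can be sketched as a lookup on the first three letters of the callsign (the ICAO airline designator) followed by a one-hot encoding into the four airline types. `AIRLINE_DB` is a small illustrative subset, not the project's custom database, and returning all zeros for unknown designators (for example, general aviation callsigns that are registrations) is an assumption.

```python
from typing import Dict, List

AIRLINE_TYPES = ["European", "American", "Latin American", "Other"]

# Illustrative subset of an ICAO airline designator -> airline type table.
AIRLINE_DB: Dict[str, str] = {
    "VLG": "European",        # Vueling
    "RYR": "European",        # Ryanair
    "AAL": "American",        # American Airlines
    "AVA": "Latin American",  # Avianca
    "UAE": "Other",           # Emirates
}

def airline_one_hot(callsign: str) -> List[int]:
    """One column per airline type: 1 for the carrier's type, 0 elsewhere (all 0 if unknown)."""
    airline_type = AIRLINE_DB.get(callsign.strip().upper()[:3])
    return [int(t == airline_type) for t in AIRLINE_TYPES]
```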

2.3.3 Exceptions
The code we designed in python only needs a temporary file, and if that file has the correct format, the program works well. Unfortunately, there are some cases which are very difficult to anticipate, and wrong or misleading values can appear in the different features. Before creating the final matrix that will be used as an input to ML models, some data post-processing is needed. When screening the different attributes, some mistakes were found, mostly caused by errors in the data.
An example is the retrieval of the landing categories for some arrivals. We said that they can be obtained from a public database, and although this is true, in some cases there is no record of certain hexadecimal codes. Thus, a web scraper had to be designed to obtain the same feature from webpages such as www.airframes.org. This process is valid but very time-consuming, since every request takes at least 15 seconds, compared with an almost immediate search in a database.
Fig. 2.16 shows another example of incorrect data. The vertical profile of the arrival is correct, since it ends at zero metres, but when looking closely at the coordinates it can be seen that the plane is supposedly landing on a taxiway parallel to the runway.
Fig. 2.16 Data error example.
A final example of erroneous data is the case of two planes that, according to their coordinates and altitude, were landing near Marseille, although it is completely impossible for the antenna to capture a signal at such a low altitude and so far away.
Finally, an example of a case that is difficult to predict and handle can be seen in Fig. 2.17. It shows a go-around, a manoeuvre which consists of climbing away again when an aircraft is close to the runway, due to technical issues or any other factor leading to an unsafe approach. This case is particularly difficult when two or more missed approaches occur.
Fig. 2.17 Example of a go-around in Barcelona.
2.3.4 Time to land
As we said before, when using ML, the whole dataset (the input matrix) is divided into two different parts: a test set and a training set. Regardless of the set, the "solutions" are needed. The training set needs them to learn, as the model will try to understand why the features result in a time to land being advanced, delayed or planned. The test set, on the other hand, needs the "solutions" to assess how well the model did at predicting the time to land.
Taking that into account, we need to define a final feature or column called time to land. It is simply calculated by subtracting the moment in which the aircraft crosses the circle from the time at which it lands. The process of converting the time to land into different categories is explained in 2.4.1.3.
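Computing the target column is a simple subtraction. The sketch below, with the hypothetical name `time_to_land_s`, also rejects negative values, which would point to the kind of data errors described in the previous section.

```python
from datetime import datetime

def time_to_land_s(crossing_time: datetime, landing_time: datetime) -> float:
    """Time to land in seconds: landing time minus circle-crossing time."""
    seconds = (landing_time - crossing_time).total_seconds()
    if seconds < 0:
        raise ValueError("landing time precedes the circle crossing")
    return seconds

print(time_to_land_s(datetime(2017, 1, 9, 10, 0), datetime(2017, 1, 9, 10, 50)))  # 3000.0
```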
2.4 Final matrix generation
The final matrix that will be fed to the ML models is the objective of the whole process described in CHAPTER 2. We have defined the scope of the problem, the ADS-B reports have been decoded, the output of the decoder has been dumped into a temporary file that has been imported by a python script, and all the direct and indirect features have been obtained and assigned as attributes of a class called instance.
At the end of the program we added a small part to create the matrix and, once it was fully operational, we had to choose the data to run the code with. The initial idea of this project was to use real data from 2020 but, due to the COVID-19 outbreak and the dramatic reduction in air travel, this was not possible. Instead, we used ADS-B reports obtained from the same antenna during 2017, which were stored on a server called Bleriot.
Four different months were chosen for the project, and all the flights during that period were transformed into instances (rows) with their corresponding attributes or features (columns). The selected months were January, February, July and August 2017. These months were selected to introduce diversity in the data, as two of them correspond to the winter season and the other two to the summer season. They have very different meteorological conditions, different traffic densities and even different aircraft, as some airlines change them depending on the time of the year. The resulting matrix had 25,294 rows and 21 columns (19 numerical plus 2 categorical). Fig. 2.18 shows the result with some attributes.
Fig. 2.18 Output matrix from the python code.
2.4.1 Matrix data tuning
The created matrix was correct, but it had to be adapted to the format used by Scikit-Learn, the python ML library that we will use. To adapt it, two different techniques known as label binarization and numerical pipelines can be used.
2.4.1.1 Label binarizer
The two categorical attributes in the project are the runways and the WTC. These features are not "understood" by python ML models, as they can only handle numerical inputs; thus, the features have to be converted.
Label binarization is a technique which consists of going from a categorical attribute to numerical ones by creating as many numerical columns as there were labels in the first place [12]. This can be easily visualized in Fig. 2.19, which shows the result of applying label binarization to the runways.
Fig. 2.19 Label binarization example for the runways.
To do this conversion, we used a python class called LabelBinarizer [41]. Using it is particularly convenient because, besides doing the conversion, it can also store the information in a very efficient manner when its sparse output option is enabled: instead of storing a matrix with all the zeros and ones, it only stores the locations of the non-zero elements.
Label binarization could be used in this case because it only creates five columns for the runways and four columns for the WTC. If the same technique had been used with the airlines (instead of grouping them into European, American, etc.), more than one hundred columns would have appeared, flooding the dataset with useless information.
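The conversion can be reproduced with scikit-learn's `LabelBinarizer`. In this sketch the runway labels are illustrative, and `sparse_output=True` is what makes the encoder return a SciPy sparse matrix that stores only the non-zero entries; by default it returns a dense array.

```python
from sklearn.preprocessing import LabelBinarizer

runways = ["25R", "25R", "07L", "02", "25L", "07R"]  # illustrative labels

lb = LabelBinarizer(sparse_output=True)  # keep only the positions of the ones
encoded = lb.fit_transform(runways)      # SciPy sparse matrix, one column per label

print(list(lb.classes_))  # column order: labels sorted alphabetically
print(encoded.toarray())  # dense view, for inspection only
```

Scikit-learn documents `LabelBinarizer` mainly for target labels; for input features, `OneHotEncoder` is the usual alternative, since it encodes several columns at once and can ignore categories unseen during fitting. Note also that with only two distinct labels `LabelBinarizer` produces a single column.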
| Logistic regressor | 0.5970 | 0.6142 | 0.6327 | Planned |
| K-Nearest neighbors | 0.6207 | 0.6194 | 0.6283 | Planned |
| SVC | 0.6177 | 0.6640 | 0.6623 | Planned |
| Naive Bayes | 0.2489 | 0.5190 | 0.3225 | Advanced |
From the above tables, the first conclusion that we can draw is that stratified sampling generally works better than a simple random train-test split. Hence, the grid search and the random search were performed on a dataset split with a stratified sampling strategy.
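A stratified split keeps the class proportions of the full dataset in both subsets. The sketch uses scikit-learn's `train_test_split` with `stratify=y` on synthetic, imbalanced three-class data standing in for the real input matrix; the 20% test size and the class weights are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the input matrix: 3 imbalanced classes.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           n_classes=3, weights=[0.2, 0.6, 0.2], random_state=0)

# stratify=y keeps the class proportions of y in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

def class_share(labels: np.ndarray) -> np.ndarray:
    """Fraction of samples in each class."""
    return np.bincount(labels) / len(labels)

print(class_share(y), class_share(y_tr), class_share(y_te))
```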
Regarding the ML models, it can be seen that the random forest classifier and the MLP are clearly the best ones. On the other hand, to select a third model, there is a "tie" between the SVC and the K-nearest neighbors, as both of them present very similar figures. We decided to run an extensive hyperparameter search on the SVC, as it has a higher precision and accuracy, but also a fast one on the K-nearest neighbors to see how it responded without sacrificing much time.
Finally, another fact that draws attention is the behaviour of the Naive Bayes classifier. While all the other models seem to perform best on the planned category, this classifier performs best on the advanced category. Also, the other models achieve F1 and accuracy scores of around 60% or more, while this one struggles to get to 30%. To better understand what is happening, we can look at Fig. 3.2, which shows the confusion matrix for this case.
Fig. 3.2 Confusion matrix of a Naive Bayes classifier.

We can see that the model is actually really good at predicting advanced instances, as it reaches a recall of 76%, and it performs reasonably well with the delayed category, reaching a recall of 64%. The problem comes with the planned category, as only 6.5% of the instances in that category are correctly predicted, although they represent 60% of the flights. Generally, we could say that the model has a strong tendency to favour the less represented classes.
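The per-class figures quoted above are recalls: the diagonal of the confusion matrix divided by the row totals (rows are the true classes, as in scikit-learn's `confusion_matrix`). A short sketch with a made-up matrix, not the thesis results:

```python
import numpy as np

def per_class_recall(cm) -> np.ndarray:
    """Recall of each class from a confusion matrix whose rows are the true classes."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)

# Made-up 3x3 matrix (rows/columns: advanced, planned, delayed).
cm = [[8, 1, 1],
      [3, 2, 5],
      [0, 2, 8]]
print(per_class_recall(cm))  # [0.8 0.2 0.8]
```

`sklearn.metrics.recall_score(y_true, y_pred, average=None)` returns the same values directly from the label vectors.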
The reasons for this behaviour are probably three:
1. The features are not really independent, although the model assumes that they are (given the class).
2. The version that we used is called ComplementNB, and it is particularly suited to unbalanced datasets. This favours the advanced and delayed categories, as they have fewer samples.
3. Naive Bayes models work better when the variables of the problem are discrete, but in this case there are many continuous variables.
3.2 Fine tuning the models
3.2.1 Best three models
The tables in this section show the different hyperparameter combinations that we tried and their optimal values for a grid search and a random search. The performance measures were obtained by fitting the best estimator with the training data and predicting on the test data.
Table 3.3 shows the optimal hyperparameters for a random forest classifier, and Table 3.4 shows the performance measures obtained from the best estimator.
Table 3.3 Hyperparameter tuning results for a random forest classifier.
| Hyperparameter | Possible values | Grid search optimal value | Random search optimal value |
|---|---|---|---|
| Number of estimators | 100/200/500/1000/200 | 500 | 500 |
| Bootstrap | True, False | False | False |
| Class weight | Balanced, Balanced subsample, None | Balanced | Balanced subsample |
| CCP alpha | 0/0.5/1.5 | 0 | 0 |
| Minimum samples per leaf | 0.5/1/1.5 | 2 | 2 |
| Max features | Auto, Sqrt, Log2 | Log2 | Log2 |
| Criterion | Gini, Entropy | Gini | Gini |
Table 3.4 Performance measures of the best random forest estimator.
| Performance measure | Grid search best result | Random search best result |
|---|---|---|
| F1 score | 0.7493 | 0.7506 |
| Precision | 0.7487 | 0.7498 |
| Accuracy/Recall | 0.7527 | 0.7535 |
Table 3.5 shows the optimal hyperparameters for an MLP classifier, and Table 3.6 shows the performance measures obtained from the best estimator.
Table 3.5 Hyperparameter tuning results for an MLP classifier.
| Hyperparameter | Possible values | Grid search optimal value | Random search optimal value |
|---|---|---|---|
| Hidden layer sizes | 50/75/100/200 | 75 | 100 |
| Activation | Identity, Logistic, Tanh, Relu | Logistic | Tanh |
| Solver | Lbfgs, Sgd, Adam | Lbfgs | Adam |
| Learning rate | Constant, Invscaling, Adaptive | Constant | Constant |
| Power | 0.3/0.5 | 0.5 | 0.5 |
| Shuffle | True, False | False | False |
| Initial learning rate | 1e-3/5e-4 | 1e-3 | 1e-3 |
Table 3.6 Performance measures of the best MLP estimator.
| Performance measure | Grid search best result | Random search best result |
|---|---|---|
| F1 score | 0.6857 | 0.6882 |
| Precision | 0.6868 | 0.6884 |
| Accuracy/Recall | 0.6948 | 0.6946 |
Table 3.7 shows the optimal hyperparameters for an SVC classifier, and Table 3.8 shows the performance measures obtained from the best estimator.
Table 3.7 Hyperparameter tuning results for an SVC classifier.
| Hyperparameter | Possible values | Grid search optimal value | Random search optimal value |
|---|---|---|---|
| Class weight | Balanced, None | None | None |
| Gamma | Scale, Auto | Auto | Auto |
| Kernel | Linear, Poly, Rbf, Sigmoid | Poly | Poly |
| C | 0.5/1/1.5/2/3 | 3 | 2 |
| Shrinking | False, True | True | True |
| Coef0 | -1/0/1 | 1 | 1 |
| Degree | 2/3/5 | 5 | 5 |
| Tol | 1e-3/1e-4 | 1e-4 | 1e-3 |
| Break ties | False, True | True | True |
Table 3.8 Performance measures of the best SVC estimator.
| Performance measure | Grid search best result | Random search best result |
|---|---|---|
| F1 score | 0.6594 | 0.6565 |
| Precision | 0.6596 | 0.6569 |
| Accuracy/Recall | 0.6687 | 0.6667 |
In light of the above, the random search and the grid search present very similar performances, but their execution times are radically different. A good random search lasts roughly a couple of hours, depending on the number of iterations that we want to try, whereas a normal grid search lasts at least one day. In fact, for some algorithms like the MLP, whose training could not be parallelized, it can last more than a week. Given that, we consider the random search the better option for this problem, as it obtains practically the same results while investing less time and computing capacity. The only situation in which it would be reasonable to use a grid search is if we had either very powerful machines or a lot of time.
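A minimal sketch of a random search with scikit-learn's `RandomizedSearchCV`, using a reduced version of the random forest search space of Table 3.3 on synthetic data. The number of sampled combinations (`n_iter`) is what bounds the run time, whereas `GridSearchCV` evaluates every combination. The specific values, the weighted F1 scoring and the stratified 3-fold CV are assumptions chosen to keep the example fast; recent scikit-learn versions no longer accept `max_features="auto"` for random forests, so only `"sqrt"` and `"log2"` appear.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic three-class data standing in for the input matrix.
X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           n_classes=3, random_state=0)

# Reduced search space inspired by Table 3.3 (values kept small on purpose).
param_distributions = {
    "n_estimators": [50, 100],
    "bootstrap": [True, False],
    "class_weight": ["balanced", None],
    "max_features": ["sqrt", "log2"],
    "criterion": ["gini", "entropy"],
    "min_samples_leaf": randint(1, 4),  # samples 1, 2 or 3
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=8,  # only 8 sampled combinations are evaluated
    scoring="f1_weighted",
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```

Setting `n_jobs=-1` on the search object evaluates candidates in parallel, including for estimators such as `MLPClassifier` that have no `n_jobs` parameter of their own.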
Fig. 3.3 shows the improvement that we accomplished in the different performance measures. The lighter colours indicate the values before tuning and the darker ones the values after the search.
Fig. 3.3 Performance measures of the different estimators.
Finally, it is clear that the random forest classifier is the best model so far. Hence, we decided to try the boosters and the five-category problem with the best random forest estimator.
3.2.2 K-Nearest neighbors
As we also wanted to see how the K-Nearest neighbors classifier improved with a fast hyperparameter tuning, we ran a short random search on the algorithm. The best values found are the ones in Table 3.9, which improved its F1 score by 4.4%, its accuracy by 4.69% and its precision by 4.77%.
Table 3.9 Hyperparameter tuning results for a K-Nearest neighbors classifier.
| Hyperparameter | Possible values | Random search optimal value |
|---|---|---|
| Leaf size | 10/20/30 | 10 |
| Number of neighbors | 3/5/10 | 10 |
| Weights | Uniform, Distance | Distance |
| P | 1/2/3 | 1 |
3.3 Boosting and ensemble methods
Regarding the boosting techniques, the performance of the AdaBoost classifier was quite disappointing, as it did not improve the results of the random forest. In fact, all the performance measures went down by approximately 1%. If boosters were used in binary classification problems, such as predicting whether a time to land is going to be as planned or not, the results would probably be better.
On the other hand, the ensemble methods were able to improve the performance of the random forest, but the results were far from excellent. This method increased the maximum accuracy by 0.5% and the F1 score and the precision by around 0.25% each. Although these figures represent an increase, it is not a substantial one.
3.4 Five categories and oversampling
We also tested how the random forest performed on the five-category problem and whether oversampling techniques could improve the results of the extreme cases. Table 3.10 shows the performance measures on the training set with ten-fold cross-validation (CV) before and after applying oversampling techniques. The very advanced and very delayed categories, which originally contained 506 instances each, were oversampled so that each one contained 5,000 flights.
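The oversampling step can be illustrated with a from-scratch sketch of SMOTE (the technique named in Fig. 3.4): each synthetic sample is placed at a random point on the segment joining a minority-class instance and one of its k nearest minority neighbours. In practice, the imbalanced-learn package offers `SMOTE(sampling_strategy={label: 5000})`; the function below, its name and its defaults are illustrative. Going from 506 to 5,000 instances means generating 4,494 synthetic ones per extreme class, and the features should be scaled beforehand because the neighbours are found with Euclidean distances.

```python
import numpy as np

def smote(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Return n_new synthetic samples interpolated between minority-class neighbours."""
    n = len(X_min)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)
    # Pairwise squared Euclidean distances inside the minority class.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]  # k nearest minority neighbours
    base = rng.integers(0, n, size=n_new)       # random minority instances
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])
```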
Table 3.10 Performance measures for the five-category problem.
| Performance measure | No oversampling | Oversampling |
|---|---|---|
| F1 score | 0.7071 | 0.6866 |
| Precision | 0.7092 | 0.7005 |
| Accuracy | 0.7180 | 0.7095 |
We can easily notice that the overall results are worse after oversampling and, even though it may seem counterintuitive, this is not necessarily bad. When we increased the number of samples in the extreme cases, we made it more difficult for the classifier to make correct predictions, as some bias and noise were introduced. In return, the recall of the very advanced and very delayed categories could improve, and this is what actually happened. Fig. 3.4 shows that the recall of the very delayed category more than doubled, going from 16% to 34%, while the recall of the very advanced category went from 26% to 35%. Thus, we can claim that the oversampling techniques worked as intended, as they allowed us to substantially improve the results of the underrepresented classes.

Fig. 3.4 Accuracy change in the extreme cases before and after SMOTE.
3.5 Final assessment
During the different tests, we accidentally came across a very interesting result. The confusion matrix on the right-hand side of Fig. 3.5 can be obtained with an SVC classifier tuned with the following hyperparameters: gamma set to scale, class weight set to balanced and kernel set to rbf.
Fig. 3.5 Comparison of the confusion matrices of a random forest and an SVC classifier.
Although this SVC solution is not optimal (in fact, its F1 score is only 50%, 16% less than the best one), the results for the advanced and delayed categories are really good. The recall of the advanced flights is 90% and, for the delayed ones, it reaches 78%. On the contrary, the recall of the planned flights is as low as 32.5%. However, when we look at the left-hand side of Fig. 3.5, we can see that the recall of the planned flights with a random forest is around 88%.
Our conclusion is that the best way to evaluate whether a landing time at Barcelona-El Prat airport is going to be as planned, advanced or delayed is to use both an SVC classifier and a random forest. We feed an instance to both models and ask them to make a prediction, but also to calculate the probability of that prediction being correct with the scikit-learn method predict_proba (for an SVC, this requires training it with probability=True). If they agree, we classify the instance directly; if they disagree, we use these probabilities to break the tie.
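The combination rule can be sketched as follows. The sketch takes each model's prediction as the class with the highest `predict_proba` value (for an SVC trained with `probability=True`, `predict` and `predict_proba` can occasionally disagree, so using the probabilities keeps both steps consistent), keeps the shared class when the models agree and otherwise keeps the more confident model. The synthetic data and the name `combined_predict` are assumptions; the SVC settings follow the ones described above (RBF kernel, `gamma="scale"`, `class_weight="balanced"`).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def combined_predict(rf, svc, X):
    """Agreement -> shared class; disagreement -> class of the more confident model (RF on ties)."""
    p_rf, p_svc = rf.predict_proba(X), svc.predict_proba(X)
    y_rf = rf.classes_[p_rf.argmax(axis=1)]
    y_svc = svc.classes_[p_svc.argmax(axis=1)]
    keep_rf = (y_rf == y_svc) | (p_rf.max(axis=1) >= p_svc.max(axis=1))
    return np.where(keep_rf, y_rf, y_svc)

# Synthetic three-class data standing in for the input matrix.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
svc = SVC(kernel="rbf", gamma="scale", class_weight="balanced",
          probability=True, random_state=0).fit(X, y)
pred = combined_predict(rf, svc, X)
```

The maximum probability acts as the confidence score of each model; in practice these scores would be computed on unseen instances rather than on the training data used here for brevity.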
CONCLUSIONS
This project presents a method to predict the time to land at Barcelona-El Prat airport. To do that, ADS-B data was decoded, processed with a python script and arranged in a matrix format that could be used as an input to ML classification models.
Thanks to the feature importances provided by the models, we have been able to assess how the different variables contribute to the final decision of the classifiers. The most important ones are the ARP azimuth and the altitude, followed by the initial time, the heading and the distance to the ARP. Two hypotheses were also made, claiming that neither the airline type nor the day of the week would have a strong impact on the decision process. Whereas the first one is fulfilled, suggesting that ATC do not show any kind of preference for certain airlines, the second one is not. It was found that the day of the week does have an intermediate influence on the classifiers, above other features such as the runways or the visibility.
Two different strategies to divide the data into test and training sets were tried, and it was found that a stratified sampling strategy delivers better results. This is consistent with the idea that the data from which models learn must be unbiased and as representative of the test data as possible.
Regarding the different ML models, the Naive Bayes classifier was the one with the poorest results. At the other end we find the random forest classifier which, despite its simplicity, gives outstanding results. If we compare it with the second-best classifier (the MLP), we can see that an untuned random forest already offers better results. Among the classifiers with an intermediate performance, it is interesting to see the difference between the SVC and the logistic regression classifier. The SVC brings slightly better results, probably because logistic regression can only separate the classes with hyperplanes in the original feature space, whereas an SVC with a nonlinear kernel (such as the polynomial or RBF kernels used in this work) can learn curved decision boundaries. Hence, in highly nonlinear problems like this one, containing 27 features, it is more reasonable to choose the kernel-based model.
The hyperparameter tuning has proven to be very effective, especially the random search, since it allowed us to improve the results in a reasonable time and without investing an excessive amount of resources. Also, the oversampling techniques allowed us to enhance the less represented categories by creating new instances from the existing ones. Perhaps combining this with an undersampling of the most represented classes would bring even better results.
Finally, we concluded that, to predict the time to land in Barcelona, the most reasonable strategy is to use a combination of an SVC and a random forest classifier, along with a confidence score for their predictions.
Future work on this project should focus mainly on further improving the reliability of the ML models. To do so, the most logical idea would be to increase the data volume by adding instances and features to the dataset, or to try new models that offer better results.
Adding more instances is relatively easy, as the server already contains all the flights that landed in Barcelona during 2017 and the last two months of 2016. As the code to convert ADS-B reports into input matrices already exists, we would only need to process the data and reproduce the results to see whether something changes.
Adding features is more complicated because new code has to be developed, but in return it could be much more effective than just increasing the number of instances. Some unknown variables could perhaps be the key to better estimators. Three interesting features that could be added are: the number of airplanes in the airspace at the same time (traffic density), the cumulative delay of the airport when the airplane crosses the circle, and a binary category indicating whether the plane has performed any kind of holding before entering the circle.
Another idea for future work would be to completely redirect the scope of the problem. Instead of predicting the time to land, we could predict which waypoints the different planes will fly over in their approaches towards Barcelona. This is also a classification problem, which could be tackled by creating one classifier per waypoint able to recognize whether a plane is going to fly over it or not.