Proudly sponsored by PyMC Labs (https://www.pymc-labs.io/), the Bayesian Consultancy. Book a call (https://calendar.google.com/calendar/appointments/schedules/AcZssZ1nOI_SElJzSiQ2sXBDiaW9w98ErjnHVzmHcSilYNWeXxJgV870NGuWZUGo3W-8-gDG8jIXQhBf), or get in touch (mailto:alex.andorra@pymc-labs.io)!

    • My Intuitive Bayes Online Courses (https://www.intuitivebayes.com/)
    • 1:1 Mentorship with me (https://topmate.io/alex_andorra)

    As you may know, I’m kind of a nerd. And I also love football: I’ve been a PSG fan since I was 5 years old, so I’ve lived it all with this club. And yet, I’ve never done a European-centered football analytics episode because, well, the US is much more advanced when it comes to sports analytics.

    But today, I’m happy to say this day has come: a sports analytics episode where we can actually talk about European football. And that is thanks to Maximilian Göbel.

    Max is a post-doctoral researcher in Economics and Finance at Bocconi University in Milan. Before that, he did his PhD in Economics at the Lisbon School of Economics and Management. 

    Max is a very passionate football fan and played for almost 25 years in his local football club. Unfortunately, he had to give it up when starting his PhD. Don’t worry, he still goes to the gym, goes running, and sometimes cycling.

    Max is also a great cook, inspired by all kinds of Italian food, and an avid podcast listener — from financial news, to health and fitness content, and even a mysterious and entertaining Bayesian podcast…

    Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work at https://bababrinkman.com/ !

    Thank you to my Patrons for making this episode possible!

    Yusuke Saito, Avi Bryant, Ero Carrera, Giuliano Cruz, Tim Gasser, James Wade, Tradd Salvo, William Benton, James Ahloy, Robin Taylor, Chad Scherrer, Zwelithini Tunyiswa, Bertrand Wilden, James Thompson, Stephen Oates, Gian Luca Di Tanna, Jack Wells, Matthew Maldonado, Ian Costley, Ally Salim, Larry Gill, Ian Moran, Paul Oreto, Colin Caprani, Colin Carroll, Nathaniel Burbank, Michael Osthege, Rémi Louf, Clive Edelsten, Henri Wallen, Hugo Botha, Vinh Nguyen, Raul Maldonado, Marcin Elantkowski, Adam C. Smith, Will Kurt, Andrew Moskowitz, Hector Munoz, Marco Gorelli, Simon Kessell, Bradley Rode, Patrick Kelley, Rick Anderson, Casper de Bruin, Philippe Labonde, Michael Hankin, Cameron Smith, Tomáš Frýda, Ryan Wesslen, Andreas Netti, Riley King, Yoshiyuki Hamajima, Sven De Maeyer, Michael DeCrescenzo, Fergal M, Mason Yahr, Naoya Kanai, Steven Rowland, Aubrey Clayton, Jeannine Sue, Omri Har Shemesh, Scott Anthony Robson, Robert Yolken, Or Duek, Pavel Dusek, Paul Cox, Trey Causey, Andreas Kröpelin, Raphaël R, Nicolas Rode, Gabriel Stechschulte, Arkady, Kurt TeKolste, Gergely Juhasz, Marcus Nölke, Maggi Mackintosh, Grant Pezzolesi, Avram Aelony, Joshua Meehl, Javier Sabio, Kristian Higgins, Alex Jones, Gregorio Aguilar, Matt Rosinski, Bart Trudeau and Luis Fonseca.

    Visit https://www.patreon.com/learnbayesstats to unlock exclusive Bayesian swag 😉

    Links from the show:

    • Max’s website: https://www.maximiliangoebel.com/home
    • Max on GitHub: https://github.com/maxi-tb22
    • Max on LinkedIn: https://www.linkedin.com/in/maximilian-g%C3%B6bel-188b0413a/
    • Max’s Soccer Analytics page: https://www.maximiliangoebel.com/soccer-analytics
    • Soccer Factor Model on GitHub: https://github.com/maxi-tb22/SFM
    • Max webinar on his Soccer Factor Model: https://www.youtube.com/watch?v=2dGrN8JGd_w

    Max’s paper using Bayesian inference:

    • VARCTIC – A Bayesian Vector Autoregression for the Arctic: “Arctic Amplification of Anthropogenic Forcing: A Vector Autoregressive Analysis”: https://journals.ametsoc.org/view/journals/clim/34/13/JCLI-D-20-0324.1.xml

    Forecasting Arctic Sea Ice:

    • Daily predictions of Arctic Sea Ice Extent: https://chairemacro.esg.uqam.ca/arctic-sea-ice-forecasting/?lang=en
    • Sea Ice Outlook (SIO) Forecasting competition: https://www.arcus.org/sipn/sea-ice-outlook

    Some of Max’s coauthors:

    • Philippe Goulet Coulombe (UQAM): https://philippegouletcoulombe.com/
    • Francis X. Diebold (UPenn): https://www.sas.upenn.edu/~fdiebold/

    Abstract

    by Christoph Bamberg (https://christophbg.github.io)

    We already covered baseball analytics in the U.S.A. with Jim Albert in episode 85 and looked back at the decade-long history of sports analytics there. What does it look like in Europe?

    To talk about this we got Max Göbel on the show. Max is a post-doctoral researcher in Economics and Finance at Bocconi University in Milan and holds a PhD in Economics from the Lisbon School of Economics and Management.

    What qualifies him to talk about the sports side of sports analytics is his passion for football and decades of playing experience.

    So, can sports analytics in Europe compete with analytics in the U.S.A.? Unfortunately, not yet. Many sports clubs do not use models in their h…

    As you may know, I am kind of a nerd, and I also love football. I’ve been a PSG fan since I was 5 years old, so I’ve lived it all with this club, and yet I’ve never done a European-centered football analytics episode because, well, the US is much more advanced when it comes to sports analytics. But today, I am happy to say this day has come: a sports analytics episode where we can actually talk about European football, and that is thanks to Maximilian Göbel. Max is a postdoctoral researcher in economics and finance at Bocconi University in Milan. Before that, he did his PhD in economics at the Lisbon School of Economics and Management. Max is a very passionate football fan and played for almost 25 years in his local football club. Unfortunately, he had to give it up when starting his PhD. Don’t worry though, he still goes to the gym, or goes running, or sometimes cycling. Max is also a great cook, inspired by all kinds of Italian food, and an avid podcast listener, from financial news to health and fitness content, and even a mysterious and entertaining Bayesian podcast.

    This is Learning Bayesian Statistics, episode 91, recorded August 23, 2023.

    Welcome to Learning Bayesian Statistics, a fortnightly podcast on Bayesian inference: the methods, the projects, and the people who make it possible. I’m your host, Alex Andorra. You can follow me on Twitter at @alex_andorra, like the country. For any info about the podcast, learnbayesstats.com is Laplace to be: show notes, becoming a corporate sponsor, supporting LBS on Patreon, unlocking Bayesian merch, everything is in there. That’s learnbayesstats.com. If, with all that info, a Bayesian model is still resisting you, or if you find my voice especially smooth and want me to come and teach Bayesian stats in your company, then reach out at alex.andorra@pymc-labs.io or book a call with me at learnbayesstats.com. Thanks a lot, folks, and best Bayesian wishes to you all.

    Let me show you how to be a good Bayesian, change your predictions after taking information in. And if you’re thinking I’ll be less than amazing, let’s adjust those expectations. What’s a Bayesian? It’s someone who cares about evidence and doesn’t jump to assumptions based on intuitions and prejudice. A Bayesian makes predictions on the best available info and adjusts the probability, ’cause every belief is provisional. And when I kick a flow, mostly I’m watching eyes widen, maybe ’cause my likeness lowers expectations of tight rhyming. How would I know unless I’m rhyming in front of a bunch of blind, placebo-controlled science, like I’m Richard…

    Maximilian, welcome to Learning Bayesian Statistics.

    Thank you, Alex.

    Yeah, thank you for taking the time. I’m really excited about this episode.

    I’m really having a variety of podcast episodes these days. Episode 89 is going to get out in a few days; you’ll see, it’s about sports too, but about the science of exercise and nutrition. And today, we’re going to talk a lot about sports as well, but more about football, or soccer as it’s known in the US. So that’s going to be a fun one, and I’m really happy to have you on the show because you are German, and if I remember correctly, Germany is in Europe. So this would be the first Europe-centered soccer analytics episode, which is cool, and that’s one of the things I keep saying: we should do more of this here in Europe. But before that, as usual, we’ll start with your origin story. Max, how did you come to the world of econometrics and machine learning? Because that’s actually what you’re doing most of the time, if I understood correctly.

    Yeah, you’re right, Alex. Well, if I say it’s been quite a journey, it sounds dramatic, and that’s not the case, but it took me quite a while, let’s say. That’s maybe the better framing.

    I started out in my PhD with, of course, just some coursework in the first year. I went into the PhD without really having anything in particular that I wanted to work on, so I took the first year to see which courses I liked and which I didn’t. At my university, there was not really a lot to choose from: we had macroeconomics, microeconomics, and econometrics, the usual stuff, but nothing really resonated with me that much, I have to say. Then I thought I would do some macroeconomics. I think many PhD students, or most of them, really want to do something in that field, and that was also me. But I never really got familiar with that stuff, and I never really liked it. In the second year, though, there was a course on computational economics.

    I liked that quite a lot, and it was also, let’s say, a tough schedule: I had to prepare a proposal within a week, and I didn’t have any idea about computational economics. But that really got me into looking into that stuff very deeply, or deeper, let’s say. What I was working on there was some clustering, some unsupervised learning, but it wasn’t really fancy machine learning back then. The project was about clustering and community structure in the S&P 500. And I really thought, oh, this network analysis, this community structure detection, that’s really cool, I want to work on that. So I thought this would be the outline for the rest of my PhD.

    So how did I get into econometrics and machine learning then? Because what I was doing back then wasn’t really related to machine learning. It wasn’t until the third year, when I luckily got invited to the University of Pennsylvania as a visiting student. I got invited by Francis Diebold, and I’ll be forever grateful to him for inviting me there. He had a research group on econometrics, and at that time the topic was climate. And again, I thought, well, I don’t actually care about the topic, I just want to learn whatever comes to me. So I took that opportunity. He introduced me to his research group, and they were working on climate forecasting, climate econometrics, and that’s how I really got introduced to econometrics. Because before I went to the University of Pennsylvania, I thought, yeah, I basically know what’s going on, I have this and this project, and that’s cool. But when I arrived there, I really got to know what a PhD in economics is about, and that was pretty insightful, I would say. So that’s how I got introduced: through this research group and the projects that we were working on.

    And then there was one guy, he was Frank’s RA, and he was working on machine learning in particular. A couple of weeks in, he came to me and asked me, “Well, Max, can you get me that and that data? And we can work on a project.” That started off a couple of years now of co-authorship with him. He is now a professor at UQAM, the University of Quebec at Montreal, he’s working a lot on machine learning, and he introduced me to that sphere. So in the end, it was in the third year of my PhD that I got introduced to econometrics and machine learning. Quite late, I would say, but better late than never, maybe.

    I mean, better late than never, right? So it’s cool, and you seem to enjoy it, so that’s super fun. And so today, what are you doing? How would you define the work you’re doing nowadays and the topics you are particularly interested in?

    Well, that’s a good question, because every time I got asked that question, I always had a difficult time answering, because I was doing something here, something there. In between, I also thought I would like to get back to macroeconomics, actually, but after spending a couple of months on something there and it didn’t really work out, I completely ditched it, at least for the time being. So what I’m working on now is machine learning for macroeconomic forecasting, let’s say. I had a project on recession forecasting in the United States, which is probably a hot topic currently: everyone is awaiting a recession that doesn’t really seem to occur, so maybe we have to wait a couple of months more. The other stuff is related to climate: a lot of climate forecasting, especially about Arctic sea ice, how Arctic sea ice is projected to evolve, not only in the near future but also in the longer run, so when Arctic sea ice might potentially disappear. There are a couple of projects on that, still related to that climate econometrics group. And then the other stuff is machine learning: I got really interested in finance, asset pricing, what you can do in terms of predicting stock returns using machine learning tools. That’s super fascinating. I have to say that I’m not a specialist in machine learning; I’m just super interested in and fascinated by the tools and the problems that come with them. They’re powerful, but applying them to finance and economics also comes with some drawbacks, so you have to work around that, and that makes it super interesting.

    Yeah, for sure. And I mean, it’s probably by being really interested in a topic that you end up being a specialist in it, right? You don’t start as a specialist and then get interested in the subject; it goes the other way around. So that’s good: by trying a lot of things, you end up finding what you’re really passionate about. So yeah, awesome. And I’m curious, actually: in the research realm of economics, which tools do you use to work on these models? I’m guessing, or hoping, a lot of open-source packages, because a few years ago, when I knew the econometrics and economics field in Europe a bit, they were using Stata all over the place. So I’m curious if that changed, and how.

    That’s a funny question, because Stata, yeah, some people love Stata. I’m actually at the complete other end of the distribution, let’s say, so I always try to avoid it as much as I can. I never really liked it. What I’m using is R and Python. I also worked a bit in Matlab, and I actually like Matlab a lot, but now I’m mostly working in R and Python, and it really depends: sometimes I prefer R, sometimes I prefer Python. For machine learning, I’m actually using R when it comes to random forests, gradient boosted trees, or just plain lasso or ridge, but when it comes to deep learning, I’m using Python: TensorFlow, and now I’m actually trying to switch to PyTorch. So that’s basically the patchwork of tools I’m using.

    Interesting. And how do you choose the particular tool you’re using for a particular project?

    That’s a good question. I think that’s always more of an art than a science, I would say, and it’s up to your preference, but not all tools work in every context, right? The problem in economics, especially in macroeconomic forecasting, is that you have time series of, let’s say, up to about 700 monthly observations for the United States, maybe, and then a set of, let’s say, 100 features. When you include lags and all that, you can pump it up to maybe a thousand or so, but for machine learning, or for deep learning, this is still a rather small data set, I would say. It’s ridiculous, actually. But that’s then the challenge, right: to tune them, to train them so that they don’t overfit. That’s really the interesting part for me, I think. In other contexts, other tools might be much more convenient or much easier to apply. Take lasso: when you have a lot of features and you just don’t know which ones are important, I like lasso in that regard because it selects the features for you. Or you might be in an asset pricing context, where you have returns with a lot of noise in there, the signal-to-noise ratio is very low, and you really don’t know which features are important; then ridge is maybe the better option, because lasso would set almost everything to zero. It really depends; you have to make it dependent on the context that you’re working in. But that’s also interesting, to see which models work well on which data sets and in which contexts, and I’m still learning in that regard. That’s super interesting.

    No, for sure. And I also find it super interesting to see open-source tools being adopted more and more in research, which, of course, I’m extremely biased, but I welcome. Mainly because I do think that open data and open source are a natural consequence, but also a cause, I would say, of more open science, which I definitely welcome and which should be way more the case. More and more, you see papers with accompanying GitHub repositories and accompanying open-source packages, even in Python or in R, which is definitely something new, and it’s super cool that the research realm is catching up on that. Because I remember a few years ago, the first open-science or open-data papers were like, “Oh yeah, the data is available, by the way,” at the end of the paper, and then you basically had to beg the corresponding author three times a week for four months to get some of the data. That was not really open. So yeah, that’s a really cool development that I really love, I have to say.

    No, absolutely, and I think that’s a very good point. For example, my co-authors are really pushing to make the code available on the website too, so that people can cross-check, and that’s very good. I like that myself when I read papers: when I want to replicate something and the authors make the code available, I can check if my own code is correct. That’s super helpful, and you learn a lot from that. Especially when, for example, using boosted trees: with XGBoost it’s super convenient. For sure there’s some tuning that you have to do yourself, but still, the package is there and super convenient to use; you don’t have to code the whole forest yourself.

    No, for sure, that’s super nice, and well done on picking up all those different tools and different languages, that’s super cool. I don’t know how it has changed, but I do remember that a few years ago, doing open-source development wasn’t really incentivized for doctoral or postdoctoral candidates. So maybe that changed, and that’s even better, but if it didn’t, the fact that you’re doing it is even more commendable, I would say.

    Because that’s a bit adjacent to your project, so well done on doing that and taking the time to do it. That’s cool, for sure. So now, I’d like to talk a bit about… So you said you’re doing econometrics, but can you define econometrics for us and tell us what it brings to economics?

    That puts a lot of weight on me now, giving the textbook definition of econometrics. No, I mean, I’m probably butchering the whole definition now, but it’s applying statistical tools to an economic context, and trying to use statistical tools to verify some economic theory, or to understand some relationships between economic variables. So I think that’s basically it. It’s kind of a fancier term for what it actually is: applying statistical tools to understand economic relationships, I would say. And it’s essential for empirical work. For sure, there are economists who only work on theory, but for policy analysis, you need to analyze the data in the end, and that’s basically what I’m doing. I don’t really do theory stuff; for me it’s all empirical. So it’s definitely very useful in the end, especially for policymakers at central banks and everywhere, and also for industry, be it the banking industry or just the real economy, for analyzing demand and all that.

    So I’m curious how you got introduced to Bayesian methods, actually, and why they stuck with you, because from what I remember from the world of econometrics, Bayes was not used a lot in this field. So I’m curious why you are using it.

    Well, I have to admit… So I already said that it was in the third year that I got introduced to econometrics, and there was this project when Philippe, Frank’s RA, came to me and asked me to gather some data on climate variables, because we wanted to run a vector autoregression of the Arctic. What we did is we gathered time series on certain climate variables which we thought would proxy for the Arctic ecosystem, and then we wanted to use a vector autoregression to analyze certain amplification mechanisms if there is a shock to CO2, for example, and also to be able to produce long-run forecasting projections: when Arctic sea ice might potentially disappear in the future. The data is highly non-stationary, and when you work with VARs, most economists really work with Bayesian methods there. As I said, the data was highly non-stationary, so Bayesian statistics, or the Bayesian framework, gives you some leeway there, grants you some freedom. That was why Philippe told me, okay, look at Bayesian VARs, look at the Bayesian way, and that’s how I actually got introduced to it. At the time I really didn’t have any exposure, and there was a package in Matlab for doing Bayesian inference with VARs, which was super helpful. It helped me a lot; it was a great source of education, really. And the more I learned about it, the more this concept of quantifying uncertainty resonated with me. Especially in economics, this is quintessential: to really get an idea of what the uncertainty is. I mean, a point estimate is always nice, but you want to have the uncertainty around it, and that’s also what Frank Diebold always told us: you want to have a measure of uncertainty. That’s definitely true, and in the Bayesian framework, it’s just so intuitive to think about it. I like that a lot. Unfortunately, I haven’t worked on as many projects with Bayesian methods lately as I would like to, but it has resonated with me ever since, and I still wanted to learn more. That’s how I got into looking at PyMC: I wanted to learn Python and thought, well, maybe an application of Bayesian methods, the Bayesian framework, would be cool to learn. And that’s how I got into PyMC3, or PyMC, or at least looked at it.

    Nice, that’s interesting. So basically, quantifying uncertainty was really important to you.

    Exactly, that was really the key point there.

    That does make sense.

    Right, because that’s really one of the places where Bayes shines. And especially for the Arctic sea ice project you were talking about: it’s not a reproducible experiment, right? It’s really hard in these cases to think from a frequentist framework of repeatable experiments. You cannot have multiple Earths where you melt the ice caps or not, and melt them naturally or through human intervention. It just doesn’t work in that case, so I’m not surprised that it would be a project where Bayes fits way more naturally.

    That’s for sure. I mean, there are, for example, these climate models from the climate institutions, and these are huge models; running them takes a lot of time, and they are really sophisticated. They’re basically deterministic models, and they give you a point estimate in the end. But our interest was to get a point estimate and also to see, especially when you project the path of Arctic sea ice, the uncertainty around it. How likely is it that we see Arctic sea ice disappearing not at our point estimate, in the 2060s or ’70s, but beforehand? How large is the uncertainty? Maybe our model is really not good, and the uncertainty is so much all over the place that it’s more or less useless. But in that project, it was actually interesting to see that the credible region was spanning something like 20 or 25 years, so that was very interesting and gave us a quick quantification of uncertainty too.

    Yeah, that’s really interesting. It’s really interesting for me to talk with someone who recently got into the Bayesian framework, and to understand how and why you got into it. I would have a lot of other questions on that, but I want to talk about football, or soccer, so let’s switch to that, and if we have time at the end of the episode, I’ll come back with my nerdy educational questions. So basically, you have an area, a hobby of yours, where you actually apply and need Bayesian stats, and that’s soccer analytics. First, I read your website a bit, and I saw you’ve been passionate about football since you were a child, and you mention a bunch of European leagues, but not the French one. I was absolutely outraged! What happened? Don’t you get the French games in Germany?

    That’s another issue. When I was younger, it was really only the Bundesliga, and sometimes, when you were lucky, you got the highlights of the French league, the Premier League, and Serie A. But you had to be really lucky; it was not always available, and I didn’t know the websites where you could watch it, so that was another issue. But the French league, I was never really a fan of, I’m sorry, Alex. That’s just it, even though one of my favorite players was Jango so Olympic.

    Oh, really? Oh, he went to Milan. Yeah, no offense taken. I think the French league is pretty boring, as long as PSG is dominating like that. I mean, that’s good for me.

    Because I’m a PSG fan since I’m like five year old but yeah like it’s not a very interesting league and the level is kind of down with by the years so hopefully we’ll get some investor in other clubs which make for good competition for Paris but until now it’s

    Really bad and it’s actually bad for Paris because the competition inside the country is really bad so then when they get on the European stage they’re not really they’re not really used to the intensity and having so much adversity in a way so it’s too easy for them let’s

    Say so basically but I didn’t get you on the show to trash the French league I want to talk about soccer fact model that you recently worked on and I found it super interesting because that’s mainly yeah the main question I always have in soccer analytics the nerd

    In me is always very careful about takes that you see the the commentators have about players where it’s like yeah but what’s the how do you SE separate a player’s skill from the ability skills and ability from his team’s strength and that’s to is extremely important because

    Mostly in Europe right now most of the clubs mainly invest on players on gut feeling basically and the thing is when you do that and you’re not able to separate inherent player abilities from a team strength then you get a Nora effect from the beginning of your career

    That can follow you even though you’re not that good of a player but basically like it can this Aura can follow you even even though you are not making that much of a difference but it’s just like it’s hard to contradict it because you don’t really have the method of the like

    The scientific way of disproving basically what’s going on that actually well it’s not really your inherent abilities but mainly the people you’re surrounded with and I think it’s like absolutely important to do that and should lead to really revolutionized way of uh transferring players and signing

    Them, and so on. So that was the background for people who are not interested in football. But even if the field doesn’t interest you, I think the method and the goal of the model are extremely important, because you can also think about them in finance, for instance. I know a lot more work has been done in finance on this, because the monetary incentives are much more important there: either you make money or you don’t. There is a big literature on passive versus active investment: how do you actually prove that an active investment is better than a passive one, and that its performance is due to the skill of the person investing rather than just random market fluctuations?

    You can see that in a lot of contexts where information is sparse and hard to decipher, so you need a model to make sense of it: in football, in a lot of sports, in finance, in medicine too, right? In a lot of contexts where there is a big celebrity effect, I think that effect can be broken down with this kind of scientific estimation: politics, of course, movies. It’s a theme running through a lot of fields where the celebrity effect is extremely big. So yeah, that was a very long introduction, but it was to say that I think this is very useful. So you can react to what I said, and afterwards, since you call your model the soccer factor model, can you first tell us what a factor model is?

    No, Alex, I mean, you laid it out perfectly. I couldn’t have said it more accurately; you were really on point, as far as I can see. So, what a factor model actually is: I would define a factor as a proxy for exposure to something, in finance to a certain risk. It can also be a reduction: in economics or macroeconomics, for example, you often have a huge set of features and you reduce it to a couple of underlying factors, or to a single factor. It’s a kind of dimensionality-reduction technique, like PCA, principal component analysis. But in finance it’s really a proxy for a risk exposure: the cross-section of stocks, all stock returns, are exposed to a certain systematic risk, and that is a factor. The asset pricing literature has identified several of these common risk exposures across the whole universe of stocks. But as you already said, you can also use it to quantify the ability of, for example, a portfolio manager: whether he has some skill in the game, really superior selection ability, rather than just following along these common risk exposures. And that’s also what inspired the soccer factor model: identifying certain features that all players are exposed to because of the differences between teams.
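    To make the asset-pricing analogy concrete, here is a minimal sketch in plain Python of a one-factor regression, returns = alpha + beta * factor + noise, estimated by ordinary least squares. It assumes a single market factor and excess returns; the loading beta captures the common risk exposure, and the intercept alpha (often called Jensen’s alpha) is what is left over and is commonly read as the manager’s skill. The function name and the simulated numbers are illustrative, not taken from Max’s notebook.

```python
import random


def estimate_alpha_beta(returns, factor):
    """Ordinary least squares for returns_t = alpha + beta * factor_t + noise_t.

    beta is the loading (exposure) on the common factor; alpha is the part of
    the average return that the factor exposure does not explain.
    """
    n = len(returns)
    if n != len(factor) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_r = sum(returns) / n
    mean_f = sum(factor) / n
    cov = sum((f - mean_f) * (r - mean_r) for f, r in zip(factor, returns))
    var = sum((f - mean_f) ** 2 for f in factor)
    beta = cov / var
    alpha = mean_r - beta * mean_f
    return alpha, beta


if __name__ == "__main__":
    # Made-up monthly excess returns: a manager with a small true alpha (0.002)
    # who mostly rides the market factor with a loading of 1.2.
    rng = random.Random(42)
    market = [rng.gauss(0.005, 0.04) for _ in range(240)]
    manager = [0.002 + 1.2 * m + rng.gauss(0.0, 0.01) for m in market]
    alpha_hat, beta_hat = estimate_alpha_beta(manager, market)
    print(f"alpha = {alpha_hat:.4f}, beta = {beta_hat:.2f}")
```

    A raw average return would mix the manager’s skill with his market exposure; the regression separates the two. The soccer version follows the same logic, with goals in place of returns and team-difference features in place of the market factor.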

    And then, when you account for that, you can extract the skill and the inherent ability of each player, after accounting for these systematic differences across teams that influence the player’s observed performance.

    For sure. Take the example of football. How do you say it in English, the number nine? The guy up front who’s supposed to score the goals; the English natives can tell me the name. In French, that would be the attaquant. You would say it’s easier to be the number nine of PSG than the number nine of a very small team in France, right? Because the rest of the team is stronger, and the manager is supposed to be stronger. And you’re like, yeah, but maybe if you took the number nine of the small team and put him in Paris, he would perform as well as the current number nine does. So how do you tell the difference? That’s what we’re going to talk about. But before that, I’m curious, from a structural standpoint: how do these kinds of factor models work? How much time do you need to start deciphering the difference between inherent skill and exogenous team strength? In other words, how much data do you need from past years to start having an idea? How data-hungry are those models?

    That’s definitely a good point. In the model I’m proposing, I need some lead time into the season to really account for differences between teams: a couple of games need to have been played already, because before the first game, based on the data that I had, every team would have looked the same. It really depends on the data. If you have data that allows you to account for differences across teams, say the budget, you can start right away. And overall, I would say more data is always better. If you have only a few observations, I think the Bayesian framework is tailor-made for that as well; it grants you some leeway there. But really, the more data you have, the better.

    Okay, so you could already start having that idea with just a few games: you get an idea of the strength of the team, and then you can start deciphering the strength of the player.

    As far as I remember, I always used a certain number of, let’s say, burn-in games to really account for that.

    Yeah, and it’s not that superficial, right? Right now it’s August, the beginning of the European leagues, and August is a weird moment where teams are still warming up. They are clearly not at peak performance; in the Northern Hemisphere, they usually try to peak around spring, from February to May. So they are still warming up, and they can still trade players until the end of August. You could really say that the games they play in August, even though they are official, are still warm-up games and don’t mean a lot from a long-term performance perspective. So that’s an interesting moment to start warming up the model, I’d say. We’re going to talk about the structure of the model right after that, but something I’m thinking about, and maybe you have it in mind for future iterations of the model, is that you could put the information you have about the strength of the teams into the priors. You have the budget, which is a good proxy for potential future performance, but also just past performance, right? If you know that Paris has been the champion nine years out of ten, well, you have a really good prior about the strength of the team. So you can probably also add that into the model.
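    Alex’s point about informative priors shortening the warm-up can be illustrated with a toy conjugate model that is not part of Max’s model: treat each game as win or no-win, put a Beta prior on a team’s per-game win probability, and compare a flat prior with an informative one after only three games. The informative Beta(16, 4) prior (roughly twenty "pseudo-games" at an 80% win rate) is a hypothetical stand-in for what budget and past titles might suggest, and the three results are made up.

```python
import math


def update_beta(prior_a, prior_b, wins, non_wins):
    """Conjugate Beta-Binomial update for a per-game win probability."""
    return prior_a + wins, prior_b + non_wins


def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


if __name__ == "__main__":
    wins, non_wins = 2, 1  # three early-season games (made-up results)
    flat = update_beta(1, 1, wins, non_wins)          # Beta(3, 2)
    informative = update_beta(16, 4, wins, non_wins)  # Beta(18, 5)
    print("flat prior:        mean %.2f, sd %.2f" % beta_mean_sd(*flat))
    print("informative prior: mean %.2f, sd %.2f" % beta_mean_sd(*informative))
    # flat prior: mean 0.60, sd 0.20; informative prior: mean 0.78, sd 0.08
```

    After three games the flat-prior posterior is still very wide, while the informative one is already fairly concentrated, which is the sense in which a good prior can stand in for some of the burn-in games. The flip side is that a badly chosen informative prior takes longer to be overridden by the data.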

    And in that way, you reduce the warm-up period of the model, in a way.

    No, absolutely. Or how Paris has performed against Lyon in the past, let’s say: the direct comparison between those two teams when they faced each other in past years would also feed in there. So absolutely, there’s a lot of potential, and when you suggest all this, my model appears very rudimentary, but it could definitely be extended in that regard.

    Yeah, that’s the fun thing about modeling, right? You have to start somewhere that’s good enough, and then you have a lot of ideas to extend it. It’s a never-ending endeavor: if you’re interested enough, you could work on each model your whole life. The models I revisit most often are the ones for predicting French presidential elections. I started doing that in 2017, and compared to the one I had for 2022, the first one is just embarrassing. But in a way, that’s good: the work you’re doing right now should be the best you’ve ever done, and in a few years, when you look back at it, it should be the worst you’ve ever done, because that means you’ve progressed a lot in the meantime. I think it’s a good mindset. So how did you adapt that factor model for soccer? What does the model structure look like, for listeners to have an idea? And for those watching on YouTube, you can actually share your screen, so if you want to share anything at some point, feel free to do it. Otherwise, the audio format is here for you, because it’s a podcast, so it’s audio-first content.

    If I get it on the screen, I’ll do that, but for now, I think the structure is pretty simple, and you already laid it out very accurately. It’s basically about coming up with some features, doing some feature engineering that accounts for differences across teams. When you look at a certain player, let’s say Cristiano Ronaldo, you want to account for the difference between his team and the team he’s facing at that exact instance, and you want to create some features that can proxy for these differences across teams. That’s the heart of the model, and it’s inspired by the asset pricing factors that try to account for differences across assets, across stocks, across firms. The modeling part itself is really nothing sophisticated. You can include a kind of hierarchical structure; you don’t need to, but it can definitely help. It’s really the feature engineering that is the heart of it, and then PyMC comes in very conveniently and does the dirty work for you.

    So that’s cool, if it’s a simple structure.
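    Since the feature engineering is described as the heart of the model, here is a hedged sketch of what team-difference factors for a single player-game could look like. The fields (budget, points per game, home advantage), the log budget ratio, and the names `TeamSnapshot` and `match_factors` are hypothetical choices for illustration; the factors actually used in Max’s notebook may differ.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class TeamSnapshot:
    """Hypothetical pre-match attributes of a team (illustrative fields only)."""
    budget_m_eur: float     # season budget in millions of euros
    points_per_game: float  # league form before the match


def match_factors(own: TeamSnapshot, opponent: TeamSnapshot, home: bool) -> List[float]:
    """Team-difference features for one player-game (a sketch, not Max's factors).

    Positive values mean the player's own team is better resourced or in
    better form than the opponent, i.e. an "easier" context to score in.
    """
    return [
        math.log(own.budget_m_eur / opponent.budget_m_eur),  # log budget ratio
        own.points_per_game - opponent.points_per_game,      # form gap
        1.0 if home else 0.0,                                 # home indicator
    ]


if __name__ == "__main__":
    big_club = TeamSnapshot(budget_m_eur=700.0, points_per_game=2.5)   # made-up
    small_club = TeamSnapshot(budget_m_eur=70.0, points_per_game=1.2)  # made-up
    # The same fixture seen from each side: the first two features flip sign.
    print(match_factors(big_club, small_club, home=True))
    print(match_factors(small_club, big_club, home=False))
```

    Expressing each factor as a difference (or log ratio) between the two teams is what lets a single loading per factor apply to every player, which is the property the hierarchical version relies on.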

    Yeah, can you talk about what your likelihood was, and what kind of distributions you put on the parameters, things like that? I think it would be a fun thing to talk about for the listeners.

    Yeah, so the structure is relatively simple. You need some idea of what the performance of the player is, so you need a proxy for it, and that performance obviously has to be observed. The proxy I chose for a player’s performance is whether he scores a goal or not in a given game: zero or one. So our target, y, is Bernoulli distributed, and it’s basically a logistic regression that we are running, because what we want to identify is really the skill, the ability, a latent variable hidden in our observed performance measure. So the model is pretty simple.
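    The likelihood described here can be written as y_i ~ Bernoulli(p_i) with p_i = invlogit(alpha + beta * x_i), where y_i is 1 if the player scores in game i, alpha is his latent skill, and beta is the loading on a team-difference factor x_i. The sketch below uses plain Python, a single factor, assumed Normal(0, 2.5) priors, and simulated data, and it samples the posterior with a random-walk Metropolis algorithm. That sampler is a deliberately simple stand-in chosen for readability: PyMC’s default sampler for continuous parameters is NUTS, and none of this is the code from Max’s notebook.

```python
import math
import random


def invlogit(z):
    """Map a linear predictor to a probability (logistic function)."""
    return 1.0 / (1.0 + math.exp(-z))


def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def log_posterior(alpha, beta, xs, ys, prior_sd=2.5):
    """Unnormalized log posterior of the one-factor logistic model.

    Assumed priors: alpha ~ Normal(0, prior_sd), beta ~ Normal(0, prior_sd).
    Likelihood: y ~ Bernoulli(invlogit(alpha + beta * x)).
    """
    lp = -0.5 * (alpha / prior_sd) ** 2 - 0.5 * (beta / prior_sd) ** 2
    for x, y in zip(xs, ys):
        z = alpha + beta * x
        # log p = -softplus(-z) and log(1 - p) = -softplus(z)
        lp -= softplus(-z) if y == 1 else softplus(z)
    return lp


def metropolis(xs, ys, n_iter=2000, step=0.15, seed=1):
    """Random-walk Metropolis over (alpha, beta); returns draws and acceptance rate."""
    rng = random.Random(seed)
    alpha, beta = 0.0, 0.0
    current = log_posterior(alpha, beta, xs, ys)
    draws, accepted = [], 0
    for _ in range(n_iter):
        prop_alpha = alpha + rng.gauss(0.0, step)
        prop_beta = beta + rng.gauss(0.0, step)
        proposed = log_posterior(prop_alpha, prop_beta, xs, ys)
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        if math.log(1.0 - rng.random()) < proposed - current:
            alpha, beta, current = prop_alpha, prop_beta, proposed
            accepted += 1
        draws.append((alpha, beta))
    return draws, accepted / n_iter


def simulate(n_games=400, alpha=-1.0, beta=1.5, seed=0):
    """Made-up data: x is a team-difference factor, y = 1 if the player scores."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_games)]
    ys = [1 if rng.random() < invlogit(alpha + beta * x) else 0 for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = simulate()
    draws, acc_rate = metropolis(xs, ys)
    kept = draws[500:]  # discard warm-up draws
    alpha_mean = sum(a for a, _ in kept) / len(kept)
    beta_mean = sum(b for _, b in kept) / len(kept)
    print(f"acceptance={acc_rate:.2f} alpha_mean={alpha_mean:.2f} beta_mean={beta_mean:.2f}")
```

    With PyMC, the log-probability bookkeeping and the sampling are handled for you; the hand-written loop only shows what that "dirty work" involves. A multi-player version would give each player his own alpha, drawn from a shared population distribution, while keeping common loadings on the factors.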

    You need priors. You have a bunch of coefficients: the alpha, the skill, the ability that you’re interested in, and then the loadings, the coefficients on all the factors in your model. So you have to impose priors on all the coefficients, and then you have to define the likelihood, which is Bernoulli distributed. And that’s basically the model. It’s in the workbook, and people can go through it. There’s also a redacted version, where people who feel fancy can try to work with their own priors and all that, try to do it themselves first, and then check the unredacted version.

    That’s cool, if they want to play with that a bit.

    Yeah, that’s basically it. It’s nothing really crazy: the basic model is four lines of code. You can do it for a single player only, but you can also do it for multiple players. The key reason is that each player should be exposed to these factors with the same loadings, so you can impose a hierarchical structure on the ability and skill of each player. You should definitely do that. You can impose the hierarchical structure by player, or also by season, so that the ability of a player may evolve across seasons; I think that’s worth looking into. And then you have the loadings on the factors, which should account for the team effect. You want to get that out of the way, so that in the end you’re left with this latent factor, the alpha, the inherent skill and ability of the player.

    Yeah, that makes sense. For sure, I will put all of this in the episode’s show notes. And actually, I think I can share my screen; I didn’t think about that before. Here is the notebook, right? Am I on the right notebook?

    Yeah, so there are a couple of notebooks there. There’s the PyMCon folder, which has the redacted and the unredacted versions, and the version we’re currently looking at is the initial one, with all its typos in there.

    Ah, okay, so it’s not the right one, then. Should I look at the other one?

    It’s fine, it’s perfect. The other one is just a bit smaller and more concise, I would say.

    So, for those of you watching on YouTube, I’m sharing it right now. This is the part of the model where you’re talking about the likelihood, where a goal is scored or not, and then you have the probability, which contains this alpha you talked about, right? The inherent skill of the player, which enters the probability. And you have the x’s and the beta. So are the x’s the factors, or are the betas the factors?

    Exactly, the x’s are the factors. These are the differences between the teams, and this is what you want to account for, to clean the observed performance measure.

    Oh, okay, I see. And then the beta is the slope on the factors.

    Yeah, exactly.

    Yeah, it’s a fun model. Of course, it’s hard to do it justice on the podcast, but I encourage you to go and watch that part on YouTube, where I’m sharing it right now, and you can also take a look at Max’s notebook, which I put in the show notes, where you have all the details. It’s pretty fun to look at. And as you were saying, the model is pretty small; that’s the amazing thing I find. If we look at the PyMC implementation, a bit further down, the really cool thing is that the model is quite easy to code: it’s just a few lines of code, four, as you were saying, and you’re done. That’s the beauty of the probabilistic programming framework, right? Even for a really useful model, if you want to get to a first good-enough version that already gives you interesting insights, you don’t have to reinvent everything, and you don’t have to go with the hardest version from the start, like a hierarchical time series model where everything is varying and pooling information. Sure, that’s cool, but don’t start with that. If you’re starting to train, don’t start with 100 push-ups: try five first, do a few series of them, and then build your way up to 100. That’s the real thing I find here with the Bayesian framework, coupled with the power of probabilistic programming languages: you can get to a first good-enough version in a few lines of code and then sample from it. Here you have it on the screen: the likelihood, then a deterministic line, which is the logistic regression line, and then your intercept and your coefficients on the factors, and basically that’s it. That’s really amazing.

    I think that’s the beauty of PyMC: it allows you to describe and build your model in a pretty intuitive way, and you can even print it out to see if everything is as you expected. Then PyMC does the dirty work, the sampling and all that, for you. But it already gives you an intuitive idea of how the modeling works.

    Yeah, absolutely. No, it’s really fun, well done on that. So I’m curious: do you want to keep working on this model? Do you have any ideas on where to take it from where it is right now?

    That’s a good question, actually. The model can definitely be improved, and it all depends on the features, on the data that you have. I think the clubs have so much more interesting data than I have, and they could build many more interesting factors accounting for differences across teams. So I really don’t know, because I tried to reach out to a couple of clubs, but nothing really came back. Apparently clubs are not interested in that, or maybe they already have their own models. I really don’t know. I’d be excited to work on that, but as you said, it’s rather a side project that I did once upon a time.

    And yeah, it’s not really related to economics or finance, which is why I’m currently working on other stuff. I would love to work on that, but it seems not many teams are picking up on it, at least among those I reached out to. And it seems to be the European clubs, because in some of your last episodes I heard people say that in the United States it’s pretty different: apparently a lot of clubs there are already trying to implement that, really trying to understand the inherent latent skill of players, not necessarily in soccer, but in baseball or in other disciplines.

    This is sad, but I’m kind of reassured to hear you say that, because I do think there is a huge area of improvement in Europe. Clubs just don’t seem to be very interested. What I know is that a few English clubs, like Liverpool and Manchester City, are using data pretty heavily, but they are still kind of the exception. I know Toulouse is doing it now in France, which is a small club, and that makes sense, right? If you’re a small club, you have less money, so you have much more competitive pressure to find good players without overpaying, which is where science can help you. You don’t want to pay for just a name; you want to pay for someone who has a name because he’s got talent, not just because he’s got a name. So to me, everybody should do that, and I just don’t understand why they don’t. That’s also the beauty of sport, right? You don’t care about a name, you care about what someone can do and whether they have talent or not. You should not care at all about the name, about the color of their skin, about anything but what they can do on the field. If I had a club, that would be one of my first priorities: how do we make sure we optimize the way we sign players, since it costs a lot of money?

    I think one club that also does a lot of that data work is in Denmark, FC Midtjylland, if I got the name right. I heard once that they’re really investing a lot in data science and trying to sign players according to data, or at least incorporating data a lot, also in their daily training exercises and all that. So they are maybe one of the cutting-edge clubs in Europe as well, a small club, but I think they won the Danish championship a couple of years ago.

    Yeah, not surprised. Something I see a lot, at least in France, and that I’ve also seen a lot in electoral forecasting, is this idea that if you start doing that, you’re becoming kind of inhuman and turning players into robots. That’s really an interesting thing to me, because one of the sports that uses data really heavily is cycling, and in the Tour de France a lot of the teams are now using data.

    Here again, thanks a lot to the British, who in Europe are often the first ones to ride the data wave. I know, for instance, that Bradley Wiggins won the Tour de France, and basically the whole team was using data to optimize its performance. The British were like, okay, we need to get back on our cycling game; they started using data extremely well, and thanks to this, a lot of the other teams started to do it too, and now the Tour de France is extremely optimized. But it’s funny, because in the media coverage, at least in France, it’s a bad thing: riders are becoming robots, they cannot eat what they want when they want, and it takes the magic out of the Tour de France. I strongly disagree with that, of course, because if performance gets better in a clean way, that’s just better for everybody; the show is going to get better. Also, we’re talking about the Tour, about professional athletes. They don’t do it recreationally, they do it for a living, so it matters for their own income, and they also do it because they want to be the best. They are not doing it because they just want to cycle on the weekends; they cycle for a living. Sure, if you’re an amateur cyclist, you don’t need the same structure as a professional. But even then, if you want to improve your performance as an amateur and you really care about it, you’re going to need to optimize some things, your nutrition for instance, maybe when you take your meals. And if you’re a professional, the slightest change can mean you perform one or two seconds better, which can decide whether you win the Tour de France or not. So I don’t understand this argument in a context where you try to optimize performance. For me, it’s not something that should count here: they are not doing it only for pleasure.

    Oh, absolutely agree. It should be incorporated much more, especially by the clubs. In the end, I think it will pay off, because it keeps you from picking a lemon.

    It’s an interesting topic for me, because I’m trying to crack that nut and I cannot crack it for now: understanding why the clubs in Europe are not really interested in that. I don’t really care about the journalist side; once the clubs start picking it up, everybody will have to. But what I’m trying to understand is why the clubs don’t do it, because they’re just leaving gains on the table. I’m just super curious about it from a sociological standpoint, honestly.

    Because I’ve seen a lot of clubs that have data science teams, but they use them for marketing. That’s such a shame. I don’t know why. So if anybody knows, please get in touch; if anybody’s working in a club, please get in touch with Max or me, because I want to know about it. We don’t even need to work together, though I would be happy to help you out with a model; for now, I just want to know why, and what the internal factors are, because there’s definitely something going on, but I don’t know what it is, and I’m just curious about it. So, to try and make it a bit more constructive: do you have any idea how we in the data world could change the status quo in that regard? And not only for sports; that’s also true for a lot of domains where a more robust application of the scientific method would be useful but is hard to get done. Do you have any personal ideas on how that status quo could be changed?

    That’s really hard to say. It depends on the willingness to adopt these methods, to be open to them, I would say. And the players play an important part, I think the crucial part, because if the players are not willing to adopt these additional insights, it’s just not possible. But for sure, as you say, it’s management, it’s internal things going on there, politics potentially, and I really don’t know how someone could resolve that. I always regard it this way: you shouldn’t base all your decisions on this model, or on any single model, but it can help stimulate your decision process, and I think it’s a useful addition. There might be an upfront cost to get the data, implement the model, and hire people to produce it, but in the end it may actually pay off economically, because it may save you from picking a lemon and overpaying massively. So I really see it as a worthwhile investment, and I think US sports have demonstrated that.

    Yeah, I mean, just look at the US, just look at all the other fields, especially marketing, for instance, which has already started to adopt data analysis and modeling aggressively. We do a lot of that at PyMC Labs, helping them save a lot of money, and not only save money but make more money. I don’t think this is a question, but something you could do, if you’re interested in it and have the time, and that could maybe work, is make some predictions with your model. To get them per player, I think you would probably need some hierarchical structure to get better predictions, but once you get there, you could have something like a web page with the model’s predictions per player, saying this player is overvalued and this player is undervalued based on the results of the model, and then see what that gives you during the season. Because if at the beginning of the season you can say that a player is undervalued, that he’s going to perform better than what the market currently thinks, and then people see that it’s true, well, that’s a clear sign that these kinds of methods and models are working, and that could spark some interest. Demonstrating what the model is for is definitely key, because my hunch is that the decision makers in the clubs are not data people: they don’t really know what data is about, and they don’t even know what a model is and what it can give you. They don’t care about the model, the priors, the parameters, stuff like that; they just care about the results. So if you can demonstrate the results of the model, and even better, what the model says about recruiting or not recruiting a given player, that would maybe have a bigger impact, or at least it increases the probability that the impact these methods can have gets noticed.

    That’s absolutely the case. For sure, it depends on getting the real-time data; that’s an upfront cost you would have to pay. But that’s actually the intent, really: to run that model for multiple players, as laid out in the workbook, for example, and to compare which players perform well or not. You see it with Cristiano Ronaldo, for example: when he won the FIFA World Player of the Year award in 2008, in the model he was basically in the middle of the pack that season. There were other players actually outperforming him in that very season, for example Berbatov, who was playing for Tottenham and was signed by Manchester United later that year. So you see that, and for sure there’s a lot of subjective judgment coming in when you observe it and see the model telling you something completely different. But this is stimulating, and it should potentially update your prior, so it’s a good friend, basically.

    Yeah, and it forces you to lay out your priors clearly on paper, so it’s actually very important. So I would definitely say something like that: having the predictions for as many players as possible on a web page, basically betting, based on the model, that this player is going to overperform or underperform with respect to the market. That’s an interesting thing. And also, as you were saying, for the individual awards, where the name counts a lot: you can see someone like Messi, who is, sure, an incredible player, but look at the number of times he’s won the Ballon d’Or, the Golden Ball. You could argue that in some of the seasons where he got the award, there were maybe other players actually outperforming him, but they don’t have the name recognition, so they are not scrutinized as much, and they don’t have the confirmation bias working in their favor. Everybody’s looking at Messi because they already know he’s extremely good, so they just look for confirmation that he’s incredible. He is, but maybe not all the time, not enough to get so many awards. So to me, that would be a really good way of demonstrating the utility of these methods, making it really concrete for the decision makers. So, before we close up the show, I’d like to get back a bit to your personal experience with Bayes. I’m curious: what was your main pain point on this project, the soccer factor model, and in general, when you’re using the Bayesian workflow, what is your main pain point right now?

    On that project, I really have to admit that maybe I was lucky: there wasn’t really a huge pain point. I mean, it’s not something publishable as a paper; it’s just sketching the idea behind the model and showing its outline and what it can give you. But when I was running it, the sampling worked pretty well; I don’t remember any really big problems. And when I looked at the model evaluation, everything looked fine. For example, one way to evaluate how well this logistic regression works is to look at the area under the curve, a popular metric, and it was in a reasonable ballpark, which was fine for me. So the results seemed fairly reliable, there was not much of a pain point, and it was also nice for me to see that it’s a simple model and it also works pretty simply. That was a project where I was pleased to see there weren’t many obstacles I had to overcome.

    Nice, that’s good to hear. And in general, in the Bayesian workflow, is there something you find hard to learn right now, or that was hard for you to learn, where you would have liked an easier way in?

    I have to say, for example, all the different samplers that are out there. That’s not my major field, and I would like to learn much more about the inner workings of all these samplers. I coded one of the simpler ones myself, maybe once or so, but then I really resort to open-source packages for that. To really understand what’s going on, looking deeper into that is definitely something I would like to do, and would need to do. I think the math of it is the most fascinating stuff: how it really works, and how it’s then implemented in code. But the beauty of PyMC, then, is that if you’re really interested in the outcome and want fast outcomes, it’s pretty intuitive.

    Okay, well, it’s good to hear. I’m asking that from a developer perspective and also a teacher perspective; it’s always interesting for me to get a peek into people’s learning experience. Cool. So, before we close up the show, is there a topic I didn’t ask you about and that you’d like to mention?

    Well, actually, my career hasn’t progressed that much so far, so I think we covered everything there.

    Oh yeah, pretty interesting, and you actually covered everything. We did record for a long time, so I’m not surprised. And I’m happy I got to ask you the main things I wanted to ask you in a reasonable amount of time. I’m sure the listeners will appreciate it, because the last two episodes were the two longest of the whole podcast, so it’s good to get back to reasonable lengths for people, I guess. So, before letting you go, I’m going to ask you the last two questions I ask every guest at the end of the show. Max, if you had unlimited time and resources, which problem would you try to solve?

    I think one of the most popular answers is climate change, and it’s probably the most pressing problem, especially here in Milan currently; you really feel it. Throughout the time I’ve been working on a bit of climate econometrics, let’s say forecasting Arctic sea ice, I saw what people are really doing in that field: there are fascinating people out there, very intelligent people, so I think throwing money at me would be wasted in that regard. What I’d rather be interested in is implementing that in sports analytics: allowing teams to have access to data and creating that level playing field across players. It’s an investment: people in investing, banking, and finance spend a lot of time crunching numbers, so why not do that in sports as well, if you have the data available? So I’d be very interested in working on that, that’s for sure.

    Yeah, I love it, me too, that’s a good one. And if you could have dinner with any great scientific mind, dead, alive, or fictional, who would it be?

    That’s a pretty tough question, I have to say. There are so many amazing people out there, and when you read papers, it’s really incredible what people are doing, so there are so many people I would like to talk to. One, for sure, is Frank Diebold, who invited me to the University of Pennsylvania, which was a defining point in my PhD. But then, if I could pick one, since the idea is to expand your network, Ben Bernanke. He is a former chair of the Federal Reserve, and he received the Nobel Prize in economics last year for his work on banking crises; well, people say there is no Nobel Prize in economics, it’s the Riksbank Prize, but yeah. He served his country, he was also a professor, and I’d love to know how he managed all that. Yeah, it would be super interesting to talk to him.

    Phenomenal scholar and I like reading his papers so super cool love it very nerdy answer awesome well thanks a lot Max that was really interesting you allowed me to rent about some of my pet peeves about data analytics and soccer and I hope people learned learned a bit more

    And of course if they are curious as usual I will put resources and a link to your website side in the show notes for those who want to dig deeper thank you again Max for taking the time and being on this show thanks Alex was a pleasure this has been another episode

    Of learning patient statistics be sure to rate review And subscribe to the show on your favorite butger or purchaser and visit learn bats.com for more resources based on today’s topics as well as access to more episodes that will help you reach true PA and state of mind

    That's learnbayesstats.com. Our theme music is « Good Bayesian », by Baba Brinkman, with MC Lars and Mega Ran. Check out his awesome work at bababrinkman.com. I'm your host, Alex Andorra. You can follow me on Twitter at alex_andorra, like the country. You can support the show and unlock exclusive benefits by visiting patreon.com/learnbayesstats. Thanks so much for listening and for your support. You're truly a good Bayesian.

    Change your predictions after taking information in, and if you're thinking I'll be less than amazing, let's adjust those expectations. Let me show you how to be a good Bayesian: change calculations after taking fresh data in. Those predictions that your brain is making, let's get them on a solid foundation.
