YOUNG Seminar Series Machine Learning NeEDS Mathematical Optimization Branding the role of OR in AI with the Support of EURO

    Speaker 1: Prof Paula Carroll, Chair of the EURO WISDOM Forum
    Introduction to the EURO WISDOM Forum

    Speaker 2: Nuria Gómez-Vargas, PhD Student, University of Seville, Spain
    A predict-and-optimize approach to profit-driven churn prevention

    Speaker 3: Maria Eduarda Pinheiro, PhD Student, Trier University, Germany
    Mixed-Integer Quadratic Optimization and Iterative Clustering Techniques for Semi-Supervised Support Vector Machines

    Speaker 4: Bahar Taskesen, Ph.D. candidate, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
    Distributionally Robust Linear Quadratic Control

Welcome everyone to a new YOUNG session of our Machine Learning NeEDS Mathematical Optimization online seminar series. We are very pleased to be back and to see many participants today. Today is a special session, as we count on the participation of Professor Paula Carroll, chair of the EURO WISDOM Forum of Women In Society Doing Operational research and Management science, and we also have this time the collaboration of our past co-organizer, Ksenia Kurishchenko, who will introduce the speakers. As always we will have three speakers, and all the questions will be asked at the end, but if you have some urgent matter you can write it in the chat. So the floor is yours, Paula.

Thank you very much, Nuria, and thank you very much for the invitation to speak today. Hopefully my couple of slides have popped up and you can see them; here we go, you can see my slides, good. I will just talk for a couple of minutes. Firstly, thank you for the invitation to speak. I want to tell you a little bit about what WISDOM is. It is a general interest group, so it spans a lot of interests and activities, and it was established in 2020. I am the chair, but I am just the chair of a larger group, so there are lots of people involved. What we aim to do is provide a platform to support, encourage and empower all genders to participate in OR, and specifically we have a couple of actions and things that we try to do. These are our three main aims: to advise the EURO executive and the EURO groups about the issues that women face; to promote various ways of networking and of championing young women, especially those at the early stages of their career in OR; and to promote a conversation about how OR can actually be used to create a more diverse and inclusive future.

So what does all that mean, and how do we do it? We have a couple of subcommittees. We have one committee that arranges events, and one of the events you may have seen is our Young Women for OR, which is a visibility initiative for young women at the early stages of their career in OR. All of these lovely young women here, and many other young women as well, participate in OR, and we want them to stay in OR and be able to forge a career for themselves, so you will hear more about these young women through some of our initiatives. What else do we do? As well as our events, we have a research subcommittee, and one of the first things we did when we got together, which was during the pandemic, so we were all working online, was our own little bit of research on gender studies, because it is not something that the OR community in general is familiar with. We did a systematic literature review and we wrote a white paper, which you can find on ResearchGate. We shared it with lots of people and asked them to look at what we had written and at our recommendations, and our recommendations really were to try to encourage all of the EURO and OR organizations in Europe to think about equality, diversity and inclusion, and to start formalizing how they approach those issues, because historically it is not something we in the OR community have thought about. We also set about figuring out how we could gather some data to measure whether or not what we are doing is successful. And I am very pleased to let you know that our first peer-reviewed publication
is just coming out: it is in press, we have our DOI, and we returned our proofs last Thursday or Friday, so it should be available soon. A big thank you to the editors at the Journal of the Operational Research Society, JORS, who have published our study on gender equality: opportunities and challenges for the OR community. I just want to tell you a little bit about the results of that particular paper. What we did was a systematic literature review, an SLR, and we tried to find what is happening at what we call the nexus, the intersection, of gender and operational research. We did not find many papers, but where we did find an intersection we found that gender or sex might have arisen as an explanatory variable in regression models. We also found that sometimes it arose as a constraint, so somebody who was doing healthcare or classroom timetabling and wanted to segregate by sex might have included it as a constraint, and we also found that it appeared sometimes in stakeholder involvement. Because there were not many papers at the gender and OR nexus, we also went back and looked at papers that talked about the history of OR, and we applied what is called a gender lens. A gender lens simply says that gender may be hidden in there somewhere, so do a little bit of work focusing on the gender aspect of what you are looking at, which in this case was the history of OR. We found a set of papers that talked about the history of OR, and when we analyzed them we found that they talked about the OR community, and the people in the OR community historically were mainly men; those papers also talked about the theory of OR, and again it was mainly men who were applauded for founding the various subdisciplines within OR; and on the practice of OR we found papers that talked about the history of OR in various industry sectors, and again it was very male-dominated. We really did not find the presence of women in the history of OR.

What we then did was also look at the present. We ran a survey at EURO 2022 and asked people to reflect on their experiences in OR, we analyzed the responses that we got, and we created a decision tree, which is one way of extracting insights from data, and we looked at that data to see what the distinguishing characteristics were. What we found was that what distinguishes the gender of our respondents is that those who perceived gender barriers were most likely to be female, whereas the male respondents did not perceive that there were so many gender barriers to their progression in a career in OR. I will not talk about this much more; you will have to read the paper. If that is the present, the rest of our research is looking towards the future: what are we going to do next? This is our work in progress. We are looking at collaboration and co-authorship networks, and we are also looking at the idea of gender mainstreaming, the idea that there is some gender dimension to nearly everything we do, so we should look at our policies and practices to see what the impact of what we are doing is on gender equality. We think there is a huge opportunity here at this gender and OR nexus. So that is our research.

Just very briefly, what is happening next, our upcoming events. I should have said that for those Young Women for OR, one of the ways that we give them visibility is that we host a set
of webinars where they get to present their research: we reach out to the OR community and invite a subject-matter expert to meet some of the young women in that webinar. Our next expert is Cornelia, and you can see our next speakers' names here; Sofia, who is also with us today, will be presenting on the 17th of May. What else is happening? EURO is coming up in the summer, so we will have a workshop on the Sunday in advance of the main EURO conference, and all of our Young Women for OR from the current cohort and the previous cohort will also have a special session to talk about themselves and their research, giving them a further platform to connect with the OR community. The last set of activities that we have is a PR subcommittee, and they help us connect and promote what we are doing: we have a website, we have an email address, we are on LinkedIn and Twitter, and we have a YouTube channel with the back catalogue of our webinars, so you are welcome to connect with us there. Finally, I would like to say thank you: thank you for the opportunity to speak today and to give you a very high-level overview of what we are doing in WISDOM; thank you to all of the coordinators today and to EURO, who give us some financial funding and lots of moral support; a big thank you to all of the WISDOM committee members, who do a lot of work in the background, as do all of the ML NeEDS MO committee here today; and thank you also to all of our ordinary members and all of our subject-matter experts who have responded to our call to engage with our Young Women for OR. So thank you very much everybody for listening. I am going to stop sharing, because I know we have some lovely talks to hear today. Thank you, Nuria, and over to Ksenia.

Thank you, Paula, for your very interesting presentation; the decision tree is very impressive, so I am definitely going to look into the paper and check the results. Now it is time for our first speaker, Nuria. She is a PhD student at the University of Seville, in Spain, and today she is going to present a predict-and-optimize approach to profit-driven churn prevention. So, Nuria, please, the floor is yours.

Thank you, Ksenia. As Ksenia said, I am from the University of Seville, but today I am presenting the work that we did during a research stay at the Faculty of Business of the University of Chile with professors Sebastián Maldonado and Carla Vairetti. We wanted to work in this field because, as you may know, markets are becoming more and more competitive, and companies need to have a good relationship with their clients. A good way to achieve this is through retention campaigns, which consist of targeting customers who are likely to leave the company in order to increase their customer lifetime value. This CLV is nothing more than a metric that represents their value, and it combines, on average, the money that they spend, how frequently they spend it, and finally, what we are most interested in, how long their relationship with the company lasts. Traditionally, what is done is churn prediction, or defection detection, which consists of using a predictive model to identify the customers who are likely to leave the company, and this has been applied in several sectors, such as finance and telecommunications, and even higher education, if we see the problem as student dropout.
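As a side note on the CLV metric just described, a common back-of-the-envelope version simply multiplies the average value of a purchase, the purchase frequency, and the expected length of the relationship. The sketch below is a generic illustration of that idea, not necessarily the exact CLV definition used in the paper.

    def customer_lifetime_value(avg_order_value, purchases_per_year, expected_years, margin=1.0):
        """Simple CLV approximation: value per purchase x frequency x relationship length.
        `margin` can scale revenue down to profit; all inputs here are illustrative."""
        return margin * avg_order_value * purchases_per_year * expected_years

    # example: 30 EUR per order, 12 orders per year, expected 5-year relationship
    print(customer_lifetime_value(30.0, 12, 5))   # 1800.0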
Typically, we work with a binary classification problem. We consider a customer i, described by a vector of features x, who can either be a churner, which we denote by y equal to zero, or not, and the goal is to estimate, through machine learning models, a probabilistic output and to classify the customers into churners or non-churners depending on a certain threshold t that we have to choose. Traditionally this threshold has been chosen by maximizing some statistical measure, for example accuracy, or by trading off the true and false positive rates. In profit-driven churn prediction methods, however, goal-oriented metrics are used; in particular, the goal here is to obtain the maximum profit that we can. We can see the form of this profit by considering a certain fixed cost of contact f per client and an incentive cost d, which is only incurred if the customer accepts our offer. It is expected that all the false would-be churners, that is, the customers who are incorrectly classified as churners, will accept the incentive, as they never wanted to leave, but only a fraction gamma of the contacted would-be churners is assumed to accept the incentive and stay in the company, which results in a gain of the CLV for us; the CLV is considered much higher than the campaign cost. We also want to remark that the cost associated with losing a potential churner is zero here, and that is something we want to change. Based on these outcomes, we obtain the formula for the average profit, which combines these costs with the proportion of instances falling in each category.

What is done in the literature is, first, the work of Verbeke et al., who proposed using this profit formula to choose the threshold t; it is similar to the method proposed by Neslin, which considered the total profit instead of the average one. Later, Verbraken et al. proposed a stochastic version of this measure, in which the parameter gamma, the probability of accepting the incentive, is assumed to follow a beta distribution. These expressions have also been used in different frameworks: for example, they are used in ProfLogit to train the machine learning method, and other variants of the approach can also be found. They all share one thing, though: choosing only one threshold implies using one average CLV for all of our customers, and this can end in suboptimal solutions, as the CLV can be very heterogeneous in some markets. As a solution, Maldonado et al. proposed segmenting the values of the CLV and maximizing a profit for each of these segments over trained classifiers. We want to go one step further and address the retention campaign and the prediction of the uncertainty all at once, through predict-and-optimize approaches.

We therefore consider an optimization problem, in its parametric form, which depends on some parameters y that represent the uncertainty, and we assume that they do not affect the feasible region. What is done in predict-and-optimize, which you can also find under the names end-to-end learning or decision-focused learning, is to integrate the training phase of the machine learning model and the optimization of the decisions. Generally, we use the output y-hat of the predictive model to induce a prescribed decision, obtained by solving the problem with our predicted parameters. This decision incurs some cost once the true parameters are seen or known, and the excess cost, due to the fact that this solution might be suboptimal, is what is known as the suboptimality gap, or regret, of the prediction, and it is this regret whose minimization should be the criterion used to train the machine learning model.
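To make the classical profit-based threshold selection described above concrete, here is a minimal Python sketch. The cost accounting is simplified, and the variable names, default costs, and conventions (churners coded as y = 0, contact everyone whose predicted churn probability exceeds the threshold) are assumptions for illustration, not the exact formulas of the cited papers.

    import numpy as np

    def campaign_profit(churn_prob, y, clv, threshold, f=1.0, d=10.0, gamma=0.3):
        """Average campaign profit when every customer whose predicted churn
        probability exceeds `threshold` is contacted.

        Conventions (illustrative): y == 0 marks a true churner; a fraction gamma of
        contacted churners accepts the incentive d, stays, and returns their CLV;
        contacted non-churners always accept the incentive; f is the contact cost."""
        contacted = churn_prob >= threshold
        churner = (y == 0)
        gain = gamma * np.sum((clv - d) * (contacted & churner))
        cost = f * np.sum(contacted) + d * np.sum(contacted & ~churner)
        return (gain - cost) / len(churn_prob)

    def best_threshold(churn_prob, y, clv, grid=np.linspace(0.0, 1.0, 101)):
        """Classical approach: sweep candidate thresholds, keep the most profitable one."""
        profits = [campaign_profit(churn_prob, y, clv, t) for t in grid]
        return grid[int(np.argmax(profits))]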
So we say that we do not do churn prediction, because that consists of separate steps, first estimating the probability of churn, then deciding a threshold, and then targeting some customers without taking the individual CLV into account; instead, we do churn prevention, as we cast in one optimization problem the optimal balance between each client's profitability and their potential to churn. We consider an optimization problem where we take the decisions, encoded in the binary variables z, depending on this optimal balance for each client, which takes into account the individual CLVs. From this expression, if we worked on the nominal problem, where the intention to churn was known, we could derive an expression for the optimal solution of the problem depending on the individual CLV: if the customer is not valuable enough, that is, if the net cost of targeting them is positive, then do not target the client regardless of their intention to churn; otherwise, target the client only if they intend to churn, obviously. But, as you know, we are working in the uncertain scenario, where we have a machine learning model that provides some output, and here we recall that it is just a prediction score; it does not have to be a probability, because it serves as a mere intermediate quantity for us, and we focus on the decision it induces. We can then derive the relationship between the prediction and the prescribed solution: target the client only if their score is smaller than a certain threshold that depends on their individual customer lifetime value. From these expressions we can give an explicit definition of the regret, or suboptimality gap, and recall that this loss function should be used to train the machine learning model, so we need to differentiate it; we therefore give a smooth version of it, which can easily be derived using the sigmoid function. With this, we are in a position to give a proper loss function that allows the inclusion of the individual CLVs, and we have also addressed the challenge, in predict-and-optimize, of computing the derivatives of the solution with respect to the predictions. Moreover, with this expression of the regret we have successfully captured the indirect cost of losing a potential churner.

In the experiments, we evaluated 12 different data sets where churn is predicted from 24 features. Here you can see a summary of these data sets, with the sizes of the training and test samples, the proportion of instances in each class, and the average CLV in each data set. For the predictive model we used neural networks, and for the parameters of the profit measures we used the values in the literature, except for the incentive cost d, which is harder to define if you do not have prior information about the specific market, so we performed a sensitivity analysis with five different values, 60 experiments in total, and I am going to show the results for the smallest one, as it is the closest to the literature. In this table you can see the comparison between our predict-and-optimize model for churn prevention, ProfLogit as implemented in the literature by its authors, the maximum-segment-profit approach implemented with five different classifiers, and those same five classifiers where the threshold was chosen according to statistical measures.
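As a rough illustration of the decision-focused training step described earlier in this talk, here is a PyTorch-style sketch of a sigmoid-smoothed regret loss with individual CLVs. It is a schematic reconstruction from the verbal description: the CLV-dependent threshold `tau` is a placeholder, and the exact threshold function, temperature, and cost bookkeeping in the paper may differ.

    import torch

    def smoothed_regret_loss(scores, churn, clv, f=1.0, d=10.0, gamma=0.3, temp=0.1):
        """Schematic decision-focused loss: mean smoothed regret over a batch.

        scores : raw outputs of the predictive model; in this sketch a customer is
                 targeted when the score falls below a CLV-dependent threshold
        churn  : true churn indicator (1.0 = churner in this sketch)
        clv    : individual customer lifetime values
        """
        # expected profit of customer i if targeted (a churner is retained w.p. gamma)
        profit_if_targeted = churn * gamma * (clv - d) - (1.0 - churn) * d - f

        # nominal optimum: target only when it is profitable under the true label
        optimal_profit = torch.clamp(profit_if_targeted, min=0.0)

        # CLV-dependent score threshold (placeholder form; the paper derives it
        # explicitly from the cost parameters)
        tau = torch.log1p(gamma * torch.clamp(clv - d, min=1e-6) / (f + d))

        # smoothed decision: sigmoid relaxation of "target iff score < tau(clv)"
        z = torch.sigmoid((tau - scores) / temp)

        regret = optimal_profit - z * profit_if_targeted   # always nonnegative
        return regret.mean()

In training, `scores` would be the neural network outputs for the batch, and a loss of this kind takes the place of a standard classification loss such as cross-entropy.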
From here we report the negative of the profits, which is why you can see the minus sign. In each column you can see the maximum total profit that could have been obtained in the case of nominal, known parameters, and for each of the classifiers you can see the profit it provided on the test sample, as well as its accuracy. From here we can see that our model performed best in seven of the twelve data sets in terms of profit, and that in terms of accuracy random forest was the best one. To see all these results more clearly, we summarized a ranking: we ordered the methods by how well they did on each data set and we show here the average rank as well as the average profit, and we can see that our model stands out. We also performed some tests, such as the Friedman test, to conclude that at least one algorithm's performance was different from the others across all the data sets, and from there we performed the Holm procedure for pairwise comparisons between our model and the other eleven, to conclude that there is at least a significant difference between our model and all those classifiers trained with a threshold chosen according to statistical measures. As for the sensitivity analysis, here we present the normalized optimality gap, between the optimal profit and zero profit, and we can see that our model performed best for the smallest incentive value, but that, as this value increases, all the profit-driven methodologies tend towards zero profit; we think this could be because, as the incentive cost increases, there are not enough clients whose retention can cover the campaign expenses. Finally, we also wanted to check whether our higher profit was only due to targeting a larger sample of customers, so here we compare the fraction of customers that are targeted and the total profit, and I show two examples where we target the same number of customers as, or even fewer than, the second-best-performing model, ProfLogit, and in both cases our model results in a higher profit.

To sum up: our methodology can effectively include heterogeneous CLV values at the individual level, and with this we can effectively penalize the indirect cost of excluding potential churners; all of this is done by taking our optimization goal, maximizing the campaign profit, into account during the training step, and not only afterwards to choose the threshold; and finally, the experiments show the superior performance of the methods that maximize the profit during training, which are our predict-and-optimize model and ProfLogit, and the enhanced performance of ours could be due to the fact that we can include the individual CLVs in the regret framework. Here you can see all the references that I showed, and you can find our paper online if you are interested in the matter; if you have questions, I will be happy to answer them all at the end. Thank you very much.
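For readers unfamiliar with the statistical comparison mentioned above, this is a small sketch of how a Friedman test followed by Holm-corrected pairwise comparisons is typically run in Python. The numbers and the choice of Wilcoxon signed-rank tests for the pairwise step are illustrative assumptions, not the authors' actual analysis.

    import numpy as np
    from scipy import stats

    # rows = data sets, columns = methods; entries = test-set profit (made-up numbers)
    profits = np.array([
        [120.0, 110.0,  95.0],
        [ 80.0,  60.0,  55.0],
        [200.0, 190.0, 140.0],
        [ 45.0,  50.0,  30.0],
        [310.0, 290.0, 250.0],
        [ 95.0,  90.0,  70.0],
    ])

    # Friedman test: is at least one method's performance different across data sets?
    stat, p_value = stats.friedmanchisquare(*profits.T)
    print(f"Friedman statistic = {stat:.3f}, p = {p_value:.4f}")

    # Holm step-down procedure on pairwise comparisons of method 0 against the others
    p_raw = [stats.wilcoxon(profits[:, 0], profits[:, j]).pvalue for j in (1, 2)]
    order = np.argsort(p_raw)
    m = len(p_raw)
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, (m - rank) * p_raw[idx])
        print(f"method 0 vs method {idx + 1}: Holm-adjusted p = {min(1.0, running_max):.4f}")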
Thank you, Nuria, for your very good presentation; it was a very interesting topic. To remind you, we will take all the questions at the end, after all the speakers have presented their talks. Our next speaker is PhD student Maria, from Trier University in Germany, and today she is going to present her research on semi-supervised support vector machines. So, Maria, please; looking forward to seeing your presentation.

Hi, thank you for the introduction and also thank you for the invitation. I will share my screen here; can you see my presentation? Yes, everything is perfect. All right, so today I will talk about my work with my supervisors, Jan Pablo Burgard and Martin Schmidt, on a mixed-integer optimization model for semi-supervised support vector machines using clustering techniques. To start, I will talk about the motivation of our work, then about what a support vector machine is, then our approach and some numerical results, and I will finish with the conclusion.

In supervised learning we have a population with n labeled data points, and the goal is to learn a function f that can predict the label y of a new point x. Here we have apples and strawberries, and what we want is a machine learning model such that we can say whether a new fruit is an apple or a strawberry. This is the goal of supervised learning: using labeled data to predict the outputs for new inputs. However, one thing that can happen is that only a small part of the labels is available, and, more than that, the subset for which we have labels is not representative. For instance, here we have half a percent of the labels available, and this number can be much smaller, and when we look at the labeled data we see more strawberries than apples, but when we look at the entire data set it is fifty-fifty, so this subset of labeled data is not representative. In semi-supervised learning we use the labeled points, but we also use the unlabeled points, the points whose labels we do not know. Another thing that can happen is that we know how many unlabeled points belong to each class. For instance, in Germany we know how many people are employed and how many are unemployed, but because of data privacy we do not know exactly who is employed and who is unemployed. Another example is a supermarket whose owner wants to know who pays in cash and who pays by card in order to run some kind of promotion, but he does not have this information for individuals; he only knows how many people pay in cash and how many pay by card. So the idea here is to work with semi-supervised learning with this cardinality constraint that tells us how many points belong to each class. And now I can talk about support vector machines.

To start with the very basics: in support vector machines we have points with positive labels and points with negative labels, and we can see that there are several hyperplanes that separate these two data sets, but the goal of the support vector machine is to find the optimal hyperplane, the one that maximizes the margin between the two classes. In terms of optimization, this is the model for supervised support vector machines. The objective function is to minimize the squared norm of omega, which is the same as maximizing the margin, and in the constraints we have the following: if y_i is positive, equal to one, we want the inner product plus the intercept to be positive as well; on the other hand, if the point belongs to the negative class, y_i is minus one, and for the inequality to hold we need this quantity to be negative; and with the slack variables xi_i we accept some errors, which we penalize in the objective function. This is the support vector machine model, where we only work with the labeled data. However, as I said before, we can have a lot of points whose labels we do not know, the unlabeled points, but we do have the information of how many of these unlabeled points belong to the positive class, and we use this number to try to predict the labels of the unlabeled points.
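For reference, the soft-margin support vector machine model described verbally above is usually written as the following quadratic program; this is the standard textbook form, reconstructed here rather than copied from the slides.

    \min_{\omega,\, b,\, \xi} \quad \frac{1}{2}\,\lVert \omega \rVert_2^2 + C \sum_{i=1}^{n} \xi_i
    \quad \text{s.t.} \quad y_i \left( \omega^\top x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n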
OK, so I will now present our approach to deal with support vector machines under this cardinality constraint. Everything that is in black comes from the standard support vector machine, and what we do is add the constraints on the right. What these constraints tell us is the following: if the inner product of omega with an unlabeled point, plus the intercept, is greater than zero, the binary variable of this point is one, and if it is negative, the binary variable of this point is zero. Since we have the cardinality constraint, we also sum the binary variables, so here we are counting how many unlabeled points belong to the positive class, and we require that this sum lies between T minus delta_1 and T plus delta_2, to accept some tolerance, which we again penalize in the objective function. This is our mixed-integer quadratic programming model, which we call the cardinality-constrained semi-supervised support vector machine. One thing we can see is that for each unlabeled point we have a binary variable, and in general, when working with mixed-integer programs, we know that having a lot of binary variables can increase the computational cost considerably. To try to overcome this, we use clustering techniques. In clustering we have the following information: a number of data points and a number of clusters, and the goal is to obtain an assignment of every data point to exactly one cluster; we also have a centroid for every cluster and the cluster sizes, and the sum of the cluster sizes is always equal to the number of data points.

So now, instead of using each unlabeled point, we use the centroid of each cluster. We have almost the same model, but with three differences. The first difference is that in these constraints we use the centroid of each cluster. The second is in the sum: if the centroid of a cluster belongs to the positive class, we count all the elements of that cluster as being in the positive class. And the third is that, instead of having a binary variable for each unlabeled point, we have one only for each cluster, so if we take a number of clusters much smaller than the number of unlabeled points, we have far fewer binary variables than before. Here we have an example: some positive points, some negative points, and the black ones are the unlabeled points. We cluster the unlabeled data and solve this optimization problem, but one thing that can happen is the following: if we look at this green cluster, we can see that it is cut by the hyperplane, and in these constraints we are saying that if the centroid is on the positive side, we count all the points of the cluster as being on the positive side, and that is not true here: this point is on the positive side and all these others are on the negative side. So what we do is the following: we split this cluster in two, one cluster with the points above the hyperplane and one with the points below, we compute the hyperplane again by solving our problem, and then we check whether there is any cluster cut by the hyperplane; here we can see that there is one, so we split that cluster again, and we continue until there is no cluster cut by the hyperplane.
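Below is a rough gurobipy sketch of the cluster-based big-M model just described. All names, the big-M value, the penalty weights, and the way the cardinality tolerance is handled are illustrative assumptions; the authors' actual implementation (in Julia with JuMP and Gurobi, as mentioned later in the talk) includes further refinements such as an updating rule for M.

    import numpy as np
    import gurobipy as gp
    from gurobipy import GRB

    def cluster_s3vm(X_lab, y_lab, centroids, sizes, T, delta=2, C=1.0, P=1.0, M=1e3):
        """Cardinality-constrained semi-supervised SVM on cluster centroids (sketch)."""
        n_lab, dim = X_lab.shape
        K = len(centroids)
        model = gp.Model("cluster_s3vm")

        w = model.addVars(dim, lb=-GRB.INFINITY, name="w")
        b = model.addVar(lb=-GRB.INFINITY, name="b")
        xi = model.addVars(n_lab, lb=0.0, name="xi")        # slacks for labeled points
        z = model.addVars(K, vtype=GRB.BINARY, name="z")    # 1 if centroid k is on the positive side
        s_pos = model.addVar(lb=0.0, name="s_pos")          # slacks on the cardinality bounds
        s_neg = model.addVar(lb=0.0, name="s_neg")

        # soft-margin constraints on the labeled data
        for i in range(n_lab):
            model.addConstr(
                y_lab[i] * (gp.quicksum(w[j] * X_lab[i, j] for j in range(dim)) + b) >= 1 - xi[i])

        # big-M link: the sign of w^T c_k + b decides the binary variable of cluster k
        for k in range(K):
            expr = gp.quicksum(w[j] * centroids[k][j] for j in range(dim)) + b
            model.addConstr(expr <= M * z[k])
            model.addConstr(expr >= -M * (1 - z[k]))

        # cardinality: clusters on the positive side are counted with their sizes
        count = gp.quicksum(sizes[k] * z[k] for k in range(K))
        model.addConstr(count <= T + delta + s_pos)
        model.addConstr(count >= T - delta - s_neg)

        model.setObjective(0.5 * gp.quicksum(w[j] * w[j] for j in range(dim))
                           + C * xi.sum() + P * (s_pos + s_neg), GRB.MINIMIZE)
        model.optimize()
        w_val = np.array([w[j].X for j in range(dim)])
        return w_val, b.X, [int(round(z[k].X)) for k in range(K)]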
So this is the recluster method, which is written out here exactly as I just described it: first we cluster the unlabeled data and solve the optimization problem; then we check whether there is any cluster that is cut by the hyperplane; if there is, we split those clusters in two and start a new iteration, computing a new hyperplane; and we repeat this until there is no cluster cut by the hyperplane. We call this the recluster method. Once we had this approach, we also started to think about some small improvements we could make. The first is that, if you look at support vector machines in general, the points that are far from the hyperplane do not have the same importance as the points that are close to it, so we handle that situation as well. Our model also has this big-M parameter, and we provide an updating rule for M at each iteration of our approach. The first theorem we could prove is that our algorithm terminates, and also that the objective value found by our approach is an upper bound for the original formulation, the one without clusters.

Based on that, I can present the numerical results. We consider nine data sets, and for each data set we create five biased samples; in a biased sample, some points have a higher probability of being in the sample than others. We use Julia, Gurobi, and JuMP, and for each instance we set a time limit of one hour. For these more than 400 instances we compare the following: the SVM, that is, the approach that considers only the labeled data; the cardinality-constrained semi-supervised SVM, which is the first approach I presented and which handles the cardinality constraint directly; our recluster method with the improvements; and the same method with warm starting.

Now I will present the results. The first is the accuracy, which measures how many points we classify correctly. We computed the accuracy for each method and divided it by the accuracy of the SVM trained on the entire data set with all labels, and what we can see is that our three approaches had a better accuracy than the plain support vector machine; more than that, if you look at these two, we also have a better accuracy than the blue one, which means that the idea of reclustering also improves the accuracy. Looking at the running time, we can see two things. The first is that the SVM is the fastest algorithm, but this is expected, because the support vector machine does not deal with binary variables. If we only look at the approaches that consider the cardinality constraint, we can see again that the idea of reclustering also improves the time; this is the red one, but it only solves about 55 percent of the instances. One thing we need to remember, though, is that we have a theorem which says that the point found by the recluster method is only a feasible solution for the original problem, not necessarily the optimal one. So what we can do is measure how good the solution of our approach is for the original problem, and that is exactly what we compare here: we can see that the recluster method, the red line, very quickly catches up with the two other approaches, which means that the objective value obtained by our approach is close. The other thing we can see here is that the SVM also provides a feasible solution for our problem, but the quality of that solution is really, really bad.
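The iterative reclustering loop described above, in schematic Python, using scikit-learn's KMeans for the initial clustering and the hypothetical `cluster_s3vm` routine from the previous sketch. This is only an illustration of the control flow, not the authors' Julia implementation.

    import numpy as np
    from sklearn.cluster import KMeans

    def recluster_s3vm(X_lab, y_lab, X_unlab, T, n_clusters=10, max_iter=50):
        """Refine the clustering until no cluster is cut by the separating hyperplane."""
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X_unlab)

        for _ in range(max_iter):
            ids = np.unique(labels)
            centroids = [X_unlab[labels == c].mean(axis=0) for c in ids]
            sizes = [int(np.sum(labels == c)) for c in ids]

            # MIQP on centroids, e.g. the cluster_s3vm sketch shown earlier
            w, b, _ = cluster_s3vm(X_lab, y_lab, centroids, sizes, T)

            cut_found = False
            new_labels = np.empty_like(labels)
            next_id = 0
            for c in ids:
                idx = np.where(labels == c)[0]
                side = X_unlab[idx] @ w + b > 0           # which side of the hyperplane
                if side.all() or (~side).all():            # cluster not cut: keep it
                    new_labels[idx] = next_id
                    next_id += 1
                else:                                      # cluster cut: split it in two
                    cut_found = True
                    new_labels[idx[side]] = next_id
                    new_labels[idx[~side]] = next_id + 1
                    next_id += 2
            labels = new_labels
            if not cut_found:                              # termination criterion
                break
        return w, b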
OK, so to finalize, the contributions are the following: we present an iterative recluster method; we also take care of a tailored dimension reduction and use warm-starting techniques; we present an updating rule for the big M; and we obtain a better accuracy than the original SVM. However, the problem is very challenging to solve, as we can see in the numerical results. Our work has been accepted for publication in International Transactions in Operational Research, and I think it will be available very soon; for now you can find the preprint on arXiv. That is it from my side, thank you so much.

Thank you so much for this impressive SVM model. Now we have our last speaker, Bahar; I hope I pronounce it correctly. (No, it is great, yes.) OK, thank you. Bahar is a PhD candidate at École Polytechnique Fédérale de Lausanne, and today she is going to present a distributionally robust linear quadratic control model. So let us start your presentation; the floor is yours.

OK, perfect, that is great; let me try to share my screen. Can you see it now? (Yes, everything is good, thanks.) Perfect. So now I am going to talk about distributionally robust linear quadratic control, and I would like to thank you for the kind introduction and for inviting me to speak in this seminar. This is joint work with Dan Iancu, Çağıl Koçyiğit, and Daniel Kuhn, and let me also set up a timer to make sure I stay on time. The problem that I am going to talk about is linear quadratic control, and it has applications across numerous domains: for example, it was used in the navigation system of Apollo 8 and was critical for the first moon landing, and after that it has been used heavily in several domains, including electrical, mechanical, and chemical engineering, and it has several exciting applications in medicine and economics.

Here is our roadmap. I am going to first introduce the linear quadratic control problem, and we will see that it is affected by uncertainty, which is modeled by random variables whose distributions we often do not know; this will motivate us to look at the robust version of this problem. After that, I am going to introduce some structural results that lead to the design of efficient algorithms providing a numerical solution to this problem. So let us get started with the linear quadratic control problem. We study a linear dynamical system where the next state x_{t+1} is a linear function of the current state x_t, the control input u_t, and some process noise. The controller aims to minimize an expected quadratic cost, which incentivizes the controller to bring the state to zero without using too much input. And often, in many applications, we do not have exact state measurements; rather, we have imperfect state measurements, that is, the controller only has access to outputs, which are a linear transformation of the current state corrupted by some measurement noise. A control policy can only be implemented if it is causal, that is, it should be a measurable function of the output history observed up to that time; it may not depend on future outputs at the time we apply the control.
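In standard notation, the partially observed linear quadratic setup just described reads roughly as follows. This is reconstructed from the verbal description with generic time-varying matrices; the paper's exact conventions for the horizon and the cost weights may differ.

    x_{t+1} = A_t x_t + B_t u_t + w_t, \qquad y_t = C_t x_t + v_t,
    \qquad
    \min_{u \in \mathcal{U}_y} \; \mathbb{E}_{P}\!\left[ x_T^\top Q_T x_T + \sum_{t=0}^{T-1} \left( x_t^\top Q_t x_t + u_t^\top R_t u_t \right) \right],

where each u_t must be a measurable (causal) function of the outputs y_0, ..., y_t, and the initial state x_0, the process noise w_t, and the measurement noise v_t are the exogenous uncertainties.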
Therefore, we can write this optimization problem as we see here, where U_y denotes the set of causal controllers with respect to y. The blue notation in this presentation represents the endogenous uncertainties, which depend on the inputs u, while the red notation refers to the exogenous uncertainties, which include the initial state, the process noise, and the measurement noise; their joint distribution is denoted by P, and I will assume that they are mutually independent with mean zero under this distribution P. If we look at any distribution that satisfies this assumption, then the optimal input has a very elegant form: it is the product of a feedback gain matrix and a state estimate. What is nice is that this feedback gain matrix can be computed by dynamic programming, independently of the underlying distribution P, while the state estimate can be nonlinear in the conditioning output history and, in general, is #P-hard to compute. Luckily, when the underlying distribution is Gaussian, the state estimator is linear in the output history and can be computed efficiently using Kalman filtering. This is why it is applied so heavily in numerous domains: we can solve this problem efficiently when the underlying distribution is Gaussian. However, tractability is lost if the underlying distribution is not Gaussian, and often we do not know the exact distribution of these random variables; this motivated us to study the robust version of the problem.

Here I formally state our distributionally robust control problem, where this time the controller minimizes the worst-case expected cost, with the worst case taken with respect to all distributions of the exogenous uncertainties that belong to some ambiguity set. This is a zero-sum game between the controller and a fictitious adversary, or nature, which chooses a distribution that is undesirable for the controller. Now I will introduce the Wasserstein distance, which we use to define this ambiguity set. It is a method to compare two given distributions by minimizing the cost of transporting one distribution into the other, and it gives a geometric way of measuring the difference between two distributions. In general, when two distributions are discrete, computing the Wasserstein distance amounts to solving a linear program; however, when one of the distributions is not discrete but continuous, say, we lose tractability. Luckily, when the two distributions are Gaussian, the Wasserstein distance between them is available in closed form, and we can see that it only depends on the covariances when the two distributions have zero mean. In 1990, Gelbrich showed that this closed-form expression provides a lower bound on the Wasserstein distance between two generic, possibly non-Gaussian, distributions. (Sorry, Bahar, we constantly have some jumps back to your first slide; maybe you could start sharing again. Ah, sorry, OK, let me try; how is it now? I think it is stable now, but I will let you know if it jumps again. OK, thank you.) So from now on I am going to refer to this lower bound, this closed form, as the Gelbrich distance between two covariance matrices, and it is denoted by this G.
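For concreteness, the closed form referenced above, the squared type-2 Wasserstein distance between two zero-mean Gaussians, used here as the Gelbrich bound between covariance matrices, can be evaluated as in the following NumPy/SciPy sketch; this is illustrative code, not taken from the paper.

    import numpy as np
    from scipy.linalg import sqrtm

    def gelbrich_sq(sigma1, sigma2):
        """Squared Gelbrich distance between two covariance matrices:
           G(S1, S2)^2 = tr(S1) + tr(S2) - 2 tr((S2^{1/2} S1 S2^{1/2})^{1/2}),
           which equals the squared 2-Wasserstein distance between N(0, S1) and N(0, S2)."""
        root2 = sqrtm(sigma2)
        cross = sqrtm(root2 @ sigma1 @ root2)
        value = np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(cross)
        return float(np.real(value))        # sqrtm can return tiny imaginary parts

    # example: N(0, I) versus N(0, 4I) in dimension 3 gives (2 - 1)^2 * 3 = 3
    print(gelbrich_sq(np.eye(3), 4.0 * np.eye(3)))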
All right, so now let us look at our distributionally robust control problem, and let us put a little bit more structure on the ambiguity set W. We are going to construct it as a product of several ambiguity sets, one for each exogenous uncertainty, and because we construct it as a product of different ambiguity sets, when we select a distribution that belongs to W it still satisfies the assumption we had at the beginning, namely that the exogenous uncertainties are mutually independent. Let us construct one ambiguity set for the initial state: this set contains all distributions with mean zero that are within Wasserstein distance r_{x0} of its nominal distribution, and we can similarly construct the other ambiguity sets for all exogenous uncertainties. Now the question... (Sorry, again; I think it is because of the full-screen mode, so if you exit full-screen I think it will not jump, because it always jumps when you change the slide. OK, then what should I do, should I just proceed like this? Yes, proceed like that, I think it will be better, because then we do not jump to another slide. OK, maybe we will lose some of the animations then; let me share my desktop this time and try again. How is it now? Now I can see your slides, it is not jumping, it is very smooth. OK, perfect, thank you.)

So the question is: as we construct these ambiguity sets, what are their nominal distributions? We set them to be Gaussian with mean zero, and hopefully this makes sense, because when we set these radii to zero we end up with the linear quadratic control problem in which the underlying distributions are exactly these nominal distributions, and we would like to be able to solve that problem; remember that we lose tractability if the underlying distribution is not Gaussian. So we set the centers of these ambiguity sets to be Gaussian, but I would like to emphasize that these Wasserstein ambiguity sets also contain non-Gaussian distributions. Beyond the robust aspect, this problem is still very challenging to solve because of its inherent structure: the outputs affect the inputs, because we are looking at inputs that are causal in the outputs; the inputs affect the states; and the states in turn affect the outputs. We are therefore facing a chicken-and-egg problem, because the information available to the controller depends on its previous decisions, and such problem structures are known to be very challenging to solve. So when we look at this distributionally robust linear quadratic Gaussian control problem, we in fact have very little hope of solving it directly.

Now I am going to show you a reparametrization technique that allows us to write down the linear quadratic control problem as a convex optimization problem, without the robust part first. This technique is called purified outputs, and it was introduced by Ben-Tal and Nemirovski around 2005. To define it, we copy our original noisy system, we delete the noise from the copied system, and the purified outputs are defined as the difference between the measurements in the original noisy system and the measurements in this copied system. Through simple algebra we can show that these purified outputs are a linear function of the exogenous uncertainties, so they do not depend on the endogenous uncertainties or on the control input u.
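A small simulation sketch of the purified-outputs idea just described: run a noise-free copy of the system in parallel, driven by the same inputs, and subtract its outputs from the measured ones. The matrix names, time-invariant dynamics, and zero initial estimate are illustrative assumptions.

    import numpy as np

    def purified_outputs(A, B, C, u_seq, y_seq):
        """Purified outputs eta_t = y_t - yhat_t, where yhat_t is the output of a
        noise-free copy of the system driven by the same control inputs.
        By construction, eta_t depends only on the exogenous terms (initial state,
        process noise, measurement noise), not on the inputs, which is what breaks
        the feedback loop between decisions and information."""
        x_copy = np.zeros(A.shape[0])        # the copy starts at the nominal mean
        eta = []
        for u_t, y_t in zip(u_seq, y_seq):
            y_copy = C @ x_copy              # output of the noise-free copy
            eta.append(y_t - y_copy)         # purified output
            x_copy = A @ x_copy + B @ u_t    # propagate the copy without noise
        return np.array(eta)

    # toy example with a scalar system
    A = np.array([[0.9]]); B = np.array([[1.0]]); C = np.array([[1.0]])
    u = [np.array([0.5])] * 5
    y = [np.array([1.0])] * 5                # pretend noisy measurements
    print(purified_outputs(A, B, C, u, y))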
What is perhaps more surprising is that the controller is indifferent between seeing the purified outputs or the measurements; that is, the family of causal output-feedback policies is equivalent to the family of causal purified-output-feedback policies. So what we can do is replace the outputs with the purified outputs eta and look at control policies that are causal in the purified outputs; since the purified outputs depend only on the exogenous uncertainties, we simply break this cycle and, basically, convexify our problem. Therefore, in the original problem, we can simply require the controls to be causal in the purified outputs.

Now we look again at our distributionally robust problem, reparametrized using purified outputs, and we establish some structural results. For that, let us study the dual of our primal problem, where we interchange the min and the max. In the primal problem the min player plays first and the max player responds, while in the dual problem the max player plays first and the min player responds. In each case the second player sees the strategy of the first player and therefore has an advantage, which means that we immediately have weak duality: the optimal value of the primal problem is greater than or equal to the optimal value of the dual problem. Why is this important? Because we are going to use this weak duality to establish the following structural results: first, the primal problem is solved by a linear policy; second, the dual problem is solved by a Gaussian distribution; and finally, strong duality holds. The proof idea is built upon this weak duality and a sandwich argument: we come up with an upper bound on the primal problem and a lower bound on the dual problem, and finally we show that the upper bound on the primal and the lower bound on the dual are in fact equal, which means that all of these inequalities must be equalities.

We construct our upper bound by restricting the feasible set of the minimization problem to linear policies and relaxing the feasible set of the maximization problem to distributions that have the same first- and second-order moments as the distributions in the ambiguity set W; this immediately gives an upper bound. We show that the upper-bound problem admits a finite-dimensional min-max reformulation; I do not expect you to parse this slide immediately, but what we can see here is that the inner maximization problem is over covariance matrices W and V in block-diagonal form, where each block corresponds to the covariance of one underlying random variable and satisfies an inequality constraint saying that it deviates from its nominal covariance by at most its own radius with respect to the Gelbrich distance. This finishes step one. In step two we construct the lower bound on the dual problem, and it is much simpler: we just restrict the feasible set of the outer maximization problem to normal distributions that belong to the ambiguity set W. As the outer problem is now only over normal distributions in W, the inner problem is a classical linear quadratic Gaussian problem, and we know that linear policies are optimal for it; therefore, without loss of optimality, we can simply restrict the inner problem's feasible set to policies that are linear in the purified observations.
We then show that we can also reformulate this lower bound as a finite-dimensional max-min problem, and in fact the upper bound and the lower bound are very similar problems with the same objective, where only the order of minimization and maximization is different. By showing desirable properties of this objective and of the feasible sets, we can leverage a celebrated minimax theorem and show that the order of minimization and maximization in fact does not matter. This means that all of these inequalities are in fact equalities, and each equality yields one of the structural results we saw before: equality (a) tells us that the primal problem is solved by a linear policy; equality (b) says that the dual problem is solved by a Gaussian distribution; and equality (c) implies that strong duality holds.

Now we use these structural results to design an efficient algorithm that provides a numerical solution. Thanks to this reformulation, we can take the primal problem, dualize the inner maximization problem, and write the whole thing as one giant SDP whose size scales with the time horizon and the square of the state dimension. While this is nice in theory, because it shows that our problem is in fact tractable, it is bad in practice, because large SDPs are difficult to solve; therefore, we seek a more efficient method. We take advantage of the structural results and show, first, that the optimal solution of the primal problem and the optimal solution of the dual problem form a Nash equilibrium. Therefore, if we can find the distribution P* that solves the dual problem, then the solution of the inner primal problem is just a classical linear quadratic Gaussian control problem under this distribution P*, which we know how to solve efficiently with a Kalman filter and dynamic programming. The question, then, is how to compute this P*, and we show that we can do it with a first-order optimization method applied to the dual problem. For simplicity of exposition, call the optimal value of the inner problem a function F of W and V; we show that this F is smooth and concave, and we can therefore use the Frank-Wolfe algorithm to efficiently compute W* and V*, which allows us to construct the underlying distribution P*. I will not go into the details of the algorithm, but the direction-finding subproblems can be solved by a fast bisection method, the computation decomposes over time, and we compute the gradients through backpropagation using the PyTorch backend. Finally, we show that our method provides a roughly 600-fold speed-up over MOSEK solving the SDP we saw before. Here is the summary of the results, but I feel I am over time, so with that I would like to conclude my talk; thank you very much for your attention.
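Since the talk only gestures at the Frank-Wolfe scheme, here is a generic sketch of the method for maximizing a smooth concave function over a convex feasible set. The `linear_maximization_oracle` stands in for the paper's direction-finding subproblem (solved there by a fast bisection method) and is a placeholder, as is the toy usage example.

    import numpy as np

    def frank_wolfe(grad_f, linear_maximization_oracle, x0, n_iter=100):
        """Generic Frank-Wolfe (conditional gradient) ascent for a smooth concave f.

        grad_f                     : callable returning the gradient of f at x
        linear_maximization_oracle : callable returning the maximizer over the feasible
                                     set of the linearized objective <grad, s>
        """
        x = x0
        for k in range(n_iter):
            g = grad_f(x)
            s = linear_maximization_oracle(g)   # direction-finding subproblem
            gamma = 2.0 / (k + 2.0)             # standard step size
            x = (1.0 - gamma) * x + gamma * s   # convex combination keeps x feasible
        return x

    # toy usage: maximize f(x) = -||x - c||^2 over the Euclidean unit ball
    c = np.array([0.3, 0.2])
    grad_f = lambda x: -2.0 * (x - c)
    oracle = lambda g: g / np.linalg.norm(g)    # maximizer of a linear function over the ball
    print(frank_wolfe(grad_f, oracle, x0=np.zeros(2)))   # approaches c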
OK, thank you very much, Bahar, it was a great presentation. Now it is time for questions, so if you have some, please write them in the chat or raise your hand and we can allow you to unmute and ask them yourself.

If I can, very briefly, I know we are just up on time, I had some very quick comments. Firstly, I thought they were three wonderful presentations, well done to all of you, I really enjoyed all of them. Today I am wearing my WISDOM hat, and when I spoke about WISDOM I said we are interested in where that gender and OR research is, where those questions are. So, Nuria, when you spoke you mentioned that in your churn data sets you had data with 24 features, and I was wondering whether any of those features was the gender of the customer; I thought probably not, because quite often those things are masked. Which brought me to Maria's presentation, where she was talking about predicting the labels, and I thought, well, actually, I do not want to be unmasked; the reverse of what you are trying to do, I would like to stay hidden. Rather than being able to predict the label as female, I would like you to not predict that I am female. These are possibly not quite what you are looking at, but maybe they are interesting tangents. And then, Bahar, about your presentation: at the very end you talked about when the distributions are not Gaussian and your problem becomes intractable, and you had mentioned medical applications, and it occurred to me that when we look at data disaggregated by gender, sometimes the individual distributions are normal but the combined distribution is not: a male distribution is normal, a female distribution is normal, but the combined human distribution is bimodal. So if you were to look at a medical application, it could be good to keep the genders disaggregated. They are really comments; I thought your presentations were fantastic, I really enjoyed them, and I know we are way over time, so they are comments rather than questions. Well done to all of you, I thought they were great talks. I will hand back now and mute myself.

Thank you very much, Paula, for your comments. We have a question for Maria, I think from Mustafa, in the chat. The question is: how would you approach the curse of dimensionality in the clustering phase? Yes, can you hear me? Yes. OK, so I think the main problem there is that if you increase the dimensionality, maybe you need to start with a larger number of clusters, so that you do not have points that are too far apart within each cluster; I think that could help. Thank you.

OK, Mustafa, if you want to ask your question to Bahar yourself... I can also read it, it is in the chat. So the question is: how would imperfect state measurements translate to the humanitarian or related optimal transport domains? Let me give an example in terms of inventory control. Say you know how many products you have in your inventory, but they are perishable products, so they age differently, and your state would really be how many products you have of each age, while you only observe the total number of products, so you do not know exactly how many products you have of each age. Take the example of bananas: you have bananas that have aged differently, and in the inventory you know how many bananas you have in total.
But to design a policy for a perishable product, you would like to know how many you have of each age. Does that answer it? Thank you. Thank you; so it is also related to the reliability domain too? Sorry, I mean it is also related to the reliability and resilience domain too. Yes, yes, I guess we can say that, yes. Thank you.

Great, is there any other question? Can I ask you, Nuria: my question is about the following situation. Suppose you are a new firm and you do not have the money to conduct this experiment; could you use a pre-trained neural network from other firms to be able to predict those customers in your model? Yes, of course, you can use a pre-trained neural network, or even start by easily training a neural network with a standard loss function, then plug in those input weights, keep them frozen, and continue training with our suboptimality-gap loss. OK, I see; but in order to do that you need to have the same features as the other firms had, right? Exactly, yes, but transfer learning will work with this as well. OK, and a last question for you: what type of industry did you test in your numerical section? OK, it is a Chilean company in the mutual fund industry; that is all that I understand of the field. OK, thank you so much. Thank you, Ksenia.

So, if there is no other question... I have a question for Maria. First, thanks to all for their presentations. On one slide that Maria presented, if I am not mistaken, there was a box plot where we can see an accuracy of more than one, or more than 100 percent, and I want to ask her how that is possible. Yes, this is a question that a lot of people ask. What happens is that it is the relative accuracy: we compute the accuracy for the support vector machine assuming the entire data set has label information, and in a few cases it can happen that, for instance, our approach classifies better than that SVM, so when we divide one by the other we get a value above one, because what is shown there is the relative accuracy. Oh, OK, thank you.

OK, any other questions? If not, I think that we can finish; oh, sorry, no, my mistake, no hands, no questions. Great. So I would like to finish by saying that the NeEDS project is ending and this was our last YOUNG session, so I wanted to say thank you all for being here through the years. Thank you very much, everyone.
