This is the presentation capture of the LCN conference best paper candidate
    – entitled
    – presented by Christos Profentzas
    – all co-authors: Christos Profentzas and Magnus Almgren (Chalmers University of Technology, Sweden); Olaf Landsiedel (Kiel University, Germany & Chalmers University of Technology, Sweden)

    The best paper candidates plenary session was held at IEEE LCN 2022 on Monday, 26 September 2022.

    See more about the LCN awards at https://www.ieeelcn.org/Annals_Awards.html

    And we apologize for the long delay in finalizing the video editing of the LCN 2022 presentations; we did not even get this ready before LCN 2023 started.

    Okay, so now, if you can release the screen from your end... thank you very much for the presentation. So now we move on to our last best paper candidate. Please feel free to share your screen, Christos; I believe you're online.

    Yeah, can you hear me? Yes, we can hear you, so please feel free to share your screen in full-screen mode. Do you see the presentation? We see the presentation and we see you, so great, thank you very much. So, yeah, I'll just introduce you and then

    you can go ahead. So again, this is our final candidate for the best paper award, last but not least; it's just the ordering, as we mentioned: in-person presentations followed by remote ones. So we have Christos here to present to us MicroTL, a very

    interesting paper on transfer learning from Chalmers University of Technology. Christos, the floor is yours. Thank you, thank you for the introduction. Without delay, because I am holding you from lunch, I will go on with the presentation. I will start with the motivation, why we looked into

    this particular problem. Where everything started is that nowadays we see that the current trend is to aggregate data to the cloud and have a centralized approach for machine learning. But this is actually shifting because of privacy issues, and it's not just the researchers that

    Look into it’s also the companies that uh uh have look into alternatives for uh centralized approach of uh Cloud aggregation of data and machine learning uh and uh un particular uh approach that we look into is on device learning uh and uh this comes because uh iot uh devices

    have a lot of data that we can process locally in real time. By applying on-device learning you shift this local personalization onto the device, and the user is happier than knowing that big companies in the cloud can make assumptions about what time we eat, where we travel, or the photos that we

    take on our holidays. So this is the motivation. There is a challenge, however: you are trying to bring traditional algorithms that run on cloud nodes, where resources are abundant and the system is quite complicated; you cannot just take the algorithm and put it on a

    low-power device with 256 kilobytes of RAM. As I said, the cloud has abundant resources and the data is easy to transfer. On the other end, resources are constrained on these IoT devices and the bandwidth is limited; you cannot just move the data, and if you

    consider that IoT devices communicate with low power, you don't have such easy access to the data. You also need to have a clean design: the moment you put something on the device and it becomes an application, you cannot ask for more RAM or more CPU,

    compared to the cloud, where on demand you can always ask for more resources. So with that, we propose MicroTL, on-device transfer learning. The goal is to have on-device learning with local data, and we do that by tailoring transfer learning for these devices, and

    I need to point out here that this is not full-fledged learning; you retrain parts of the network. So our contribution is that we enable transfer learning, of course, but we also show that it's feasible to take parts of the network and retrain them on really low-power IoT

    devices. The main benefit of doing that is that you can adapt dynamically to changes in your initial problem, because nowadays, when you deploy a deep neural network, you usually never change it. So you train a model, you put it on the

    device, and you do some kind of inference for your application, to classify pictures or audio. But as we all know, IoT applications are quite dynamic; things can shift, and transfer learning can help you by taking an existing problem and shifting the

    model. So, moving on to how this actually looks on the device: there are two steps involved. One is that during inference you collect intermediate outputs from the inference layers; this can be a convolution layer or a recurrent (RNN) layer on the device,

    and this is a compressed format, because it's in integers. So you collect a lot of data, and you only decompress the data when you actually want to train. And I need to repeat again: we train only the last two layers here, which are the fully connected layers; we don't train

    the whole network. This gives you the benefit of decompressing the data only during training; and because it's typical to use mini-batch gradient descent, where you take a small number of data samples each time you train your network, this is quite feasible.
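    The two-step scheme above, caching the frozen layers' quantized outputs during inference and later retraining only the small fully connected head with mini-batch gradient descent, can be sketched roughly like this. It is a NumPy stand-in; all names, shapes, labels, and quantization parameters are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
SCALE, ZERO_POINT = 0.05, 3               # illustrative quantization params

def frozen_feature_extractor(x):
    """Stand-in for the frozen quantized conv/RNN layers: returns int8
    activations that can later be decoded with (SCALE, ZERO_POINT)."""
    q = np.round(x / SCALE) + ZERO_POINT
    return np.clip(q, -128, 127).astype(np.int8)

# Step 1: during inference, cache features in compressed int8 form.
cache = []
for _ in range(64):                       # pretend 64 labeled samples arrive
    x = rng.normal(size=16)               # raw sensor features (toy data)
    y = int(x.sum() > 0)                  # toy label provided by the user
    cache.append((frozen_feature_extractor(x), y))

# Step 2: only at training time, dequantize a mini-batch and take one
# gradient-descent step on the small float fully connected head.
W, b = np.zeros((16, 2)), np.zeros(2)

def train_step(batch, lr=0.5):
    global W, b
    X = np.stack([(q.astype(np.float32) - ZERO_POINT) * SCALE
                  for q, _ in batch])     # decompress just this batch
    y = np.array([label for _, label in batch])
    logits = X @ W + b
    p = np.exp(logits - logits.max(1, keepdims=True))
    p /= p.sum(1, keepdims=True)          # softmax probabilities
    loss = -np.log(p[np.arange(len(y)), y]).mean()
    p[np.arange(len(y)), y] -= 1          # d(cross-entropy)/d(logits)
    W -= lr * X.T @ p / len(y)
    b -= lr * p.mean(0)
    return loss

losses = []
for _ in range(5):                        # a few epochs over the cache
    for i in range(0, len(cache), 8):     # mini-batches of 8
        losses.append(train_step(cache[i:i + 8]))
```

    Only `W` and `b` (a few dozen parameters here) are ever updated, which is what keeps retraining inside the memory and energy budget of a 256 KB-RAM device.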

    However, while this sounds nice, during the actual experiments we saw that there is a main problem: why do I do this, why do I compress and decompress the data at all? This is the main question here that I try to answer:

    when you quantize a network, you have integers, so you don't have a smooth curve to take gradients from; you can see the outputs are discrete steps. This works for inference, but for learning you need to calculate the gradients, so our solution was to reverse the quantized outputs,

    and this image can look complicated, but we apply this to the quantize function. When we say Q-shift, it's a fancy word to say that your data needs to be aligned on a zero point, because when you map a domain

    from integer to float, the zero point can shift. So this Q-shift is just to align the zero point, which is really important for the activation functions in neural networks. Then you have a scale factor that takes integers and turns them into floats. This actually avoids the main problem of vanishing gradients,

    because if you keep your data in integers, it is really, really hard to train. And again, since we train only the last part of the network, we found that this is feasible; you also get efficient storage, so it's quite beneficial to keep this structure.
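    A minimal sketch of the reverse quantization just described: the zero-point alignment ("Q-shift") followed by the scale factor that turns integers back into floats. The scale and zero-point values here are illustrative, not ones a real deployed model would use:

```python
import numpy as np

def quantize(x, scale, zero_point):
    """Map floats onto an asymmetric int8 grid; zero_point is the integer
    that represents 0.0, keeping activations aligned around zero."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Reverse step used before training: shift by the zero point so 0
    maps back to exactly 0.0, then rescale integers into floats."""
    return (q.astype(np.float32) - zero_point) * scale

scale, zp = 0.1, 5                        # illustrative parameters
x = np.array([-1.0, 0.0, 0.37, 2.0], dtype=np.float32)
x_hat = dequantize(quantize(x, scale, zp), scale, zp)
```

    Zero survives the round trip exactly, and every other value comes back within half a quantization step, which is what the float-domain gradient computation in the trainable head then operates on.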

    So we actually tested our algorithm on real hardware, on a low-power development board. This is the nRF52, a common chipset that you can find in your smartwatch. So you can imagine an application scenario where your smartwatch has already been trained

    for ten activities: running, biking, swimming, going to the gym. But you don't want all of these activities; one user may be more into running and another user more into biking, so you just collect a small number of samples,

    around a hundred, from the user, and you retrain the network locally on the device. So this can be the application scenario here. And the interesting result that we found is that there is a big tradeoff between the energy you need to consume and the accuracy

    that you can achieve. In the first graph, we see that we don't have a lot of resources, so you will not do so much fine-tuning: you will not run for 20 epochs; in practice you will apply early stopping and

    stop at the 10th epoch and be happy with the accuracy that you get. So there are a lot more things to look into in the future: how you find the sweet spot where you stop training, and how much training you can actually achieve.
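    The early-stopping tradeoff above can be sketched as a budget-aware training loop. The function names, the energy model, and the accuracy curve below are all illustrative assumptions, not numbers from the paper:

```python
def train_with_budget(train_epoch, eval_acc, energy_per_epoch_mj,
                      budget_mj, min_gain=0.005, max_epochs=20):
    """Spend energy on another epoch only while the budget allows and
    accuracy still improves by at least min_gain (early stopping)."""
    spent, epochs_run = 0.0, 0
    best = eval_acc()
    for _ in range(max_epochs):
        if spent + energy_per_epoch_mj > budget_mj:
            break                         # not enough battery budget left
        train_epoch()
        spent += energy_per_epoch_mj
        epochs_run += 1
        acc = eval_acc()
        if acc - best < min_gain:
            break                         # diminishing returns: stop early
        best = acc
    return epochs_run, best, spent

# Simulated run: accuracy rises quickly, then plateaus around epoch 4.
curve = [0.50, 0.60, 0.68, 0.72, 0.722, 0.723]
state = {"i": 0}
def eval_acc():
    acc = curve[min(state["i"], len(curve) - 1)]
    state["i"] += 1
    return acc

epochs_run, best, spent = train_with_budget(
    train_epoch=lambda: None,             # stand-in for one pass of SGD
    eval_acc=eval_acc,
    energy_per_epoch_mj=2.0, budget_mj=100.0)
```

    With this plateauing curve the loop stops after four epochs rather than the full twenty, trading a sliver of accuracy for roughly 80% of the training energy.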

    That becomes more interesting because the inference part, where you collect the data, happens on the digital signal processor of the device, which can actually draw more current than the floating-point unit where we do the training. And again, I need to remind you

    that I train only the last part of the network; training the whole network would be quite costly, so we avoid that. And there are more things to look into: more advanced algorithms, and how you can actually take these results and utilize new

    approaches. So I want to conclude by saying that what we propose, and what we show, is transfer learning on low-power IoT devices. We saw that it is feasible not just to do inference but also to apply learning on really constrained devices, and the main benefit

    that is good for the user is that you address a big privacy issue, because again, nowadays, even with all the methods that we try to propose, federated learning and others, the cloud still collects a lot of data from us,

    and we don't know how the companies are using it. However, yeah, cloud services are still important; at the end, you somehow need to incorporate your IoT device with a cloud, because you don't have the resources. And I think with that, I can take your questions. Thank you very much.

    Thank you, Christos. So, questions from the audience? I actually have one to start, Christos. You kind of semi-answered it with the very last line on slide number 10. I mean, the point of trying to do this transfer learning on low-power IoT devices makes sense, but you

    also, as you mentioned there, a lot of this data is already collected on cloud services. Now, with edge and fog networks, we already have significant computing resources close to the edge, so what is the main motivation for trying to do this transfer learning on the IoT devices themselves, given that

    they move, they have serious mobility issues, they have constrained resources, and so on? So, the main motivation is that some data is not available before the user actually starts using the device. For example, I live in Sweden;

    you know, I ski in the winter, but my smartwatch doesn't know this; some data only becomes available after I actually start using the device. Yep. And this also depends on the kind of data that you want to send: it can actually be quite expensive to

    transfer the data to the cloud. I mean, it's not like I'm trying to put down the cloud; it can be a win-win approach. When a device needs to balance the small battery that it has and tries to preserve energy, it can apply

    some transfer learning, but when there is a lot of battery and you're at home, you can just download a new model. So it's not that I'm against cloud services; I think it can be a win-win approach. Sure, so the two main targets were basically conserving

    battery power, but also having more responsiveness in the system, right? So that you don't have to wait until you do this offline. So perhaps, maybe even as future work, there are certain directions for dissecting different types of services: which ones should be done locally, using transfer learning on the

    IoT devices, and which ones should be transferred to an edge or fog network. So, the thing is that if you have, for example, human activity recognition, where you have time-series data, this data is actually not so big, so your local edge can actually process it faster. But if you

    have data that involves some kind of images, it can be quite expensive. Also, one thing to consider here is that sometimes you throw data away: because of a lack of communication, or not having much memory, the device will actually need to drop some data. So

    I think images are more important to consider here; if you have time series, maybe you can compress it and, yep, send it faster. Thanks, Christos. So, there's one more question from the floor. Yeah, thank you for the presentation. So, I have two

    quick questions. The first one is: did you compare transfer learning conducted on IoT devices against learning techniques that can basically run on the cloud, I mean in terms of accuracy? And the second question is: how do you foresee

    the collaboration between transfer learning on IoT devices and cloud services? Okay, thank you. So yeah, actually, I did compare the accuracy. I mean, again, we need to consider here that no matter whether you do transfer learning on the device or in the cloud, at the end you need to

    quantize your network on the device. One detail that maybe I didn't mention, but it's addressed in the paper and you can read it, is that my fully connected layers at the end are still in floating point, so I can achieve a little bit better accuracy than a totally quantized network. So when I

    compare to the cloud, I have a bit of a benefit from this. But at the end of the day, the cloud will always outperform you; you just save energy by not communicating, and that's the main benefit. As for the collaboration with the cloud,

    I'm not sure, actually. I'm looking into how the cloud can know what data is important on the device, so you can decide: do I need to send this and train a new network from scratch, or, since you already happen to have a network on your device,

    let's utilize it, do some transfer learning, and achieve better accuracy until a new version of the software is available. So this is a hard question, I think. Okay, thank you. Thank you very much, Christos.

    Okay, so with that we wrap up our best paper candidate session. Thank you to all of the presenters and authors, and obviously the attendees; we really appreciate it. We now break for lunch.
