In cloud computing environments, VMs require fast access to resources like storage and networking. The hardware that the VMs access is implemented in software and/or by passing through a dedicated hardware device. Software-based solutions consume extra CPU cycles, resulting in poor performance. They also require exposing a device model to the guest, which increases the attack surface. Conversely, hardware passthrough provides better performance and security, but it can be expensive in terms of physical resources, since each device is dedicated to a single VM. This talk focuses on how Vates is working on sharing hardware resources among VMs by relying on dedicated processors named Data Processing Units (DPUs). More precisely, Vates works on offloading storage emulation from the Xen hypervisor by relying on Kalray K200 DPU PCIe controllers, a hardware accelerator based on the MPPA architecture.
—————————————–
The CloudStack Collaboration Conference 2023 took place on 23-24 November. The conference, arranged by a group of volunteers from the Apache CloudStack community, was held at the voco hotel in Porte de Clichy, Paris. It hosted over 350 attendees, with 47 speakers presenting technical talks, user stories, new features and integrations, and more.
—————————————–
👉 Read more about Apache CloudStack:
https://cloudstack.apache.org/
⏬ Download CloudStack: https://cloudstack.apache.org/downloads.html
Now we have André starting this session; it will be about enabling DPU hardware with XCP-ng. I'll pass it over to him.

OK, hello. My name is André Simo and I'm a software engineer on the XCP-ng team at Vates, which is a software editor that works mostly on cloud platform environments. I will talk about some recent hardware facilities that make it easier to provide I/O devices to VMs. This work mostly concerns the lower layers of the software stack, so it may not be directly related to cloud platform management, but it may be interesting for you to see what is going on at our level, and at the end of the road you could well be interested in using this.

So here we go. I will start by talking a little bit about our team, what we are doing, and mostly about our architecture, so you understand what this is all about. After that I will talk about the different approaches to providing device virtualization to a VM, because there are many of them, as you probably know. For virtualizing the platform, we have had hardware facilities to virtualize the CPU and the memory for some twenty years now, with VT-x, AMD-V and nested pages, but there are not as many facilities from the hardware to enable device virtualization. There are some building blocks, such as VT-d, which comes from the processor vendors, or SR-IOV, which comes from the PCI group, but there is no common way to do this, so there are many approaches to virtualizing devices. I will talk about a recent approach which comes from the hardware vendor Kalray, which provides the K200 DPU accelerator, DPU standing for Data Processing Unit; we will talk about this later. Then I will talk about what was done at our level to make all this work together, and we will see what will be coming next in the foreseeable future to enhance this work even more.
OK, so here we go: the XCP-ng cloud platform. Basically, this provides the whole stack for the cloud: there is a hypervisor, there is a management stack, there is a web interface to handle and manage all the VMs. In the XCP-ng team we are mostly concerned with the hypervisor, so we intervene at that level. It is a Xen-based solution; Xen, as you probably know, is a type 1 hypervisor which runs directly on the bare metal. These people were pioneers of the paravirtualization concept, which provides facilities to virtualize access to the CPU or to devices; we will talk about that too. Xen provides efficient CPU and memory virtualization with the help of the hardware. The VMs are called domains in Xen lingo. There is dom0, the control-plane domain, which is basically there to dispatch the platform resources and to manage the VMs: to create them, start them, shut them down, and so on. Then you have the user domains, called domU; this is the basic VM where you can host a website or whatever you want, accessible to the users.
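To make the dom0/domU split a bit more concrete, here is a minimal sketch, assuming Xen's xl toolstack, of how dom0 could define and start a guest. The guest name, image path and bridge are invented for the example, and a real config also needs a guest type and boot settings (kernel, bootloader or firmware), so take this only as an illustration of where dom0 sits in the picture.

    # Sketch only: dom0 creating a domU with the xl toolstack.
    # Guest name, disk image path and bridge name below are hypothetical.
    import subprocess
    import tempfile

    guest_cfg = (
        'name   = "demo-domu"\n'
        "memory = 2048\n"
        "vcpus  = 2\n"
        'disk   = ["file:/srv/images/demo.img,xvda,w"]\n'
        'vif    = ["bridge=xenbr0"]\n'
    )

    with tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False) as f:
        f.write(guest_cfg)
        cfg_path = f.name

    # dom0, as the control-plane domain, asks Xen (through the toolstack)
    # to build and start the user domain.
    subprocess.run(["xl", "create", cfg_path], check=True)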
OK, so let's talk about I/O virtualization. As I said, virtualizing I/O is a basic concern for hypervisors, because obviously a VM will, at some point, need to communicate with the external world: to store data, to talk to other VMs, to talk to the user, and so on. Many approaches have been tried, and none of them is perfect.

The first approach is probably just to do I/O device emulation. This is typically done with QEMU, which does it very well, but when we talk about software emulation it is not a very efficient way to virtualize, because modern controllers are complex pieces of hardware. As an example I took the ATA interface, a very old and very basic interface for accessing the hard drive in the x86 world, and not even in DMA-enabled mode, just PIO mode. You have to virtualize all the registers, you have to virtualize around 200 ATA commands, and each time the VM accesses a register you need to switch to the software emulator, the emulator does its work, and then you go back to the VM. So it is not a very efficient way to virtualize an I/O device, because of all the software overhead involved in handling this. The major benefit of this approach is that you can use a native driver in your VM: most OSes have ATA drivers in their kernel, so basically you take the OS as is and you can make it use the hard drive. But it is a complex implementation in software, so it is very error-prone, there are plenty of bugs, and it is not used very much any more.
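As a rough illustration of why register-level emulation is costly, here is a toy model of the trap-and-emulate loop for a PIO-only disk controller. It is not real QEMU or Xen code; the register names and the command value are invented, and the point is simply that every guest register access becomes a round trip into a software handler.

    # Toy model of trap-and-emulate for a register-based disk controller.
    # Purely illustrative; real device models (e.g. in QEMU) are far more involved.

    class EmulatedDiskController:
        """Toy software stand-in for a simple PIO-only disk controller."""

        def __init__(self, backing_file):
            self.regs = {"sector": 0, "count": 0, "command": 0, "status": 0}
            self.backing = backing_file   # host file standing in for the disk
            self.data = b""

        def io_write(self, reg, value):
            # Every guest OUT instruction traps here instead of hitting hardware.
            self.regs[reg] = value
            if reg == "command" and value == 0x20:   # invented "READ" command
                self._do_read()

        def io_read(self, reg):
            # Every guest IN instruction traps here as well: one exit per access.
            return self.regs[reg]

        def _do_read(self):
            # The emulator performs the real work with ordinary host I/O,
            # then the guest is resumed: costly round trips for each request.
            with open(self.backing, "rb") as f:
                f.seek(self.regs["sector"] * 512)
                self.data = f.read(self.regs["count"] * 512)
            self.regs["status"] = 1   # "data ready"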
The other way to do I/O device virtualization, or device provisioning, is what I call device sharing. Just as the CPU is used by one VM at a time, and when a timer or an event is triggered you save the CPU context of one VM, give the CPU to the other VM, and restore its context, each domain can use a device in its turn, based on some event, maybe scheduling or just a decision of the system administrator. The thing is, most devices do not fit very well with this approach. At this level you basically exclude all DMA-capable devices, because if there are ongoing DMA requests you have to wait for them to finish, and that takes time. Also, most controllers do not like having their context saved and restored each time you switch the VM. So only certain kinds of devices fit this approach. For example, I took the example of a framebuffer, which is basically a memory range, so you can switch the framebuffer device from one VM to the other. Or we can talk about the RTC, the real-time clock: it is a very basic device, and you can switch it from one VM to the other. This is not used very much, because the most interesting devices, such as network controllers or storage controllers, do not fit well with this approach.
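A quick sketch of the device-sharing idea described above, as a toy model: the hypervisor saves the outgoing domain's view of the device state and restores the incoming one, which only works for devices whose whole state can be captured that way. The state layout and names here are invented.

    # Toy model of time-multiplexing a simple device (e.g. an RTC or framebuffer)
    # between domains by saving/restoring its state on every switch.

    class SharedDevice:
        """Toy model of a device time-multiplexed between domains."""

        def __init__(self):
            self.state = {}      # current register/memory contents of the device
            self.saved = {}      # per-domain saved contexts
            self.owner = None

        def switch_to(self, domain):
            if self.owner is not None:
                self.saved[self.owner] = dict(self.state)   # save outgoing context
            self.state = dict(self.saved.get(domain, {}))   # restore incoming one
            self.owner = domain

    dev = SharedDevice()
    dev.switch_to("domU-1")            # domU-1 now drives the device
    dev.state["cursor"] = (10, 20)
    dev.switch_to("domU-2")            # context saved; domU-2 sees its own state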
So the approach that was adopted by most hypervisor providers, such as Xen or KVM, is paravirtualization. Instead of emulating a complex I/O device, for example a SATA or AHCI controller, you take it to another level: you have a driver in the guest kernel which works on a kind of client-server model. In the Xen world we talk about a backend driver and a frontend driver, which extremely simplifies device access. If you talk about a disk, it is just about "read this block, write this block", and that's it. So it is a very simple, even simplistic, approach to providing a device, but it has rather good performance, and the implementation is not very complicated. However, you need a specific driver in the VM, which can be difficult if, for example, you run Windows and the like. Basically, the backend driver runs in some control domain, or, in the KVM case, probably in the Linux kernel, so you also get a bottleneck effect, because one server handles the requests from all the VMs, and you need to do extra work to deal with that. This is probably not the best for performance: it is acceptable, but it is not the best way to do this. Also, backend and frontend drivers can have security issues and bugs; for Xen, for example, there are plenty of CVEs, or XSAs as they are called in the Xen world, which concern backend and frontend drivers. So this approach has its drawbacks, but for now it is probably the most used approach in the cloud world.
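To illustrate the backend/frontend split, here is a toy model of that simplified block protocol: a frontend in the guest only knows "read block N, write block N", and a backend in the control domain turns that into real storage accesses. This is a conceptual sketch, not the actual Xen blkfront/blkback ring protocol (which uses shared-memory rings and grant tables), and the file path is just an example.

    # Toy model of a paravirtualized block device: frontend (in the guest) and
    # backend (in the control domain) exchange very simple requests.
    from collections import deque

    BLOCK = 4096

    class Backend:
        """Lives in the control domain and owns the real storage."""

        def __init__(self, path):
            self.path = path   # e.g. a raw image file backing the virtual disk

        def handle(self, req):
            with open(self.path, "r+b") as f:
                f.seek(req["block"] * BLOCK)
                if req["op"] == "read":
                    return f.read(BLOCK)
                f.write(req["data"])
                return b""

    class Frontend:
        """Lives in the guest kernel; knows nothing about the real hardware."""

        def __init__(self, backend):
            self.queue = deque()       # stands in for the shared request ring
            self.backend = backend

        def read_block(self, n):
            self.queue.append({"op": "read", "block": n})
            return self.backend.handle(self.queue.popleft())

        def write_block(self, n, data):
            self.queue.append({"op": "write", "block": n, "data": data})
            self.backend.handle(self.queue.popleft())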
The other way to do this, and it is a very good way, is to just grant the VM access to the device. Under some conditions, the VM fully accesses the device controller with its native driver, and it asks nothing of dom0 about what to do with it. The device will probably do some DMA, so there is no CPU involvement there either. It offers excellent performance, because the user VM alone accesses its device and does whatever it wishes. There are no major pitfalls except one: it is a very costly solution. If you think about it, your host, your platform, will run probably dozens of VMs, maybe thousands, so if you want to provide an I/O device to each of these VMs you have to have just as many hardware controllers, which is not very realistic in most cases. So it is a very good solution, but a limited one; it is very difficult to scale. With two VMs you can imagine having, for example, an NVMe controller for each of them, but if you want to put plenty of them in your host it becomes difficult. And of course, once you have your physical devices, if tomorrow your customer tells you "I have an NVMe device of one terabyte and now I want a two-terabyte device", you will probably have to replace the device and plug in a newer controller. So it is not very flexible.
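For reference, on Xen the passthrough flow roughly looks like the sketch below: the device is first made assignable (detached from dom0 and bound to pciback) and then attached to the guest, where its native driver takes over. The PCI address and guest name are placeholders.

    # Sketch: pass a physical PCI device through to a Xen guest with xl.
    import subprocess

    BDF = "0000:03:00.0"     # hypothetical PCI address of the controller
    GUEST = "demo-domu"      # hypothetical guest name

    # Detach the device from dom0 and mark it assignable (binds it to pciback).
    subprocess.run(["xl", "pci-assignable-add", BDF], check=True)

    # Hot-plug the device into the running guest; from then on the guest's
    # native driver talks to the hardware directly.
    subprocess.run(["xl", "pci-attach", GUEST, BDF], check=True)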
Now, the ideal world: what would we like to have? We know by now that device passthrough is a very good way to provide a device performance-wise, so we would like to have the performance of accessing the device as if it were passed through. We would also like to have software capabilities to specify the device's characteristics: I want to be able to reconfigure the device, its size, maybe its RAID level, and so on, on the fly, without changing hardware. And I want a solution that scales in a cloud environment, so that you have as many devices as you would like on the cloud platform. As I said, there are some solutions which come from hardware vendors; they are always proprietary solutions. You have, for example, network cards and controllers that implement the SR-IOV PCI specification, so with some tweaking you can present multiple network controllers based on the single physical network controller that you have on your platform.
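On Linux, turning on those extra functions is typically a single sysfs write; a small sketch follows. The PCI address and the VF count are placeholders, but sriov_totalvfs and sriov_numvfs are the standard kernel interface for SR-IOV capable devices.

    # Sketch: create SR-IOV virtual functions for a physical function on Linux.
    from pathlib import Path

    PF = Path("/sys/bus/pci/devices/0000:41:00.0")   # placeholder physical function

    total = int((PF / "sriov_totalvfs").read_text())  # VFs the hardware supports
    wanted = min(8, total)

    # Writing a count here asks the driver to spawn that many virtual functions,
    # which then appear as independent PCI devices ready for passthrough.
    (PF / "sriov_numvfs").write_text(str(wanted))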
There is another approach, based on this, which is called a DPU, which stands for Data Processing Unit. It is basically a common term for some kind of CPU oriented towards specific tasks, and these DPUs can help, and indeed do help, to pass the device through. So we have, for example, Kalray, a French hardware vendor based in Grenoble, with the K200 DPU, which is basically a PCI Express device that you plug into your host and which takes over the NVMe devices present in your platform. For example, you have three NVMe controllers, each one offering, I don't know, 10 terabytes of storage capacity. It grabs these devices from the system, and with some configuring you can have, to simplify (there are some limitations), as many virtual NVMe devices as you want. You can specify the size of these devices, you can specify which RAID level each device will have, you do some configuring, and after that, thanks to SR-IOV technology, these devices show up as hot-plugged PCI devices and you can pass them through to your guests. It is worth mentioning that this K200 DPU uses PCIe peer-to-peer transactions to actually store the data on the real NVMe device, so the CPU is totally offloaded from the task of doing the transactions on the PCI bus, on the system bus, whatever; all the magic is done in the card. So you can have highly configurable virtual NVMe devices. Some tools come with this Kalray device: a user-mode driver built on a user-space I/O framework, an SPDK-based toolstack to manage these virtual NVMe devices, and so on.
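Once such virtual NVMe functions are passed through, the guest simply sees ordinary NVMe controllers. As a small illustration, the standard Linux view of NVMe namespaces and their sizes can be read from sysfs like this; device names are whatever the kernel enumerates, and nothing here is Kalray-specific.

    # Sketch: list NVMe namespaces the kernel sees, with model and size.
    # Applies to real NVMe devices and, per the talk, to the virtual ones alike.
    from pathlib import Path

    for ns in sorted(Path("/sys/block").glob("nvme*n*")):
        sectors = int((ns / "size").read_text())               # 512-byte sectors
        model = (ns / "device" / "model").read_text().strip()
        print(f"{ns.name}: {model}, {sectors * 512 / 1e9:.1f} GB")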
So, what have we done to make this run in our environment? Basically, as I said, the K200 is a proprietary architecture, MPPA as they call it, for Massively Parallel Processor Array, and it runs Linux on it. So it has its own address space, which has nothing to do with the host address space. To marry, so to speak, these two address spaces, you have to do some I/O translations, some memory translations, when you do the transactions, and for that you need to use the IOMMU capabilities. Unfortunately, the Xen hypervisor does not give IOMMU access to its guests: it obviously uses the IOMMU for its own tasks, but you do not have access to it from your guest, and that is why we provided such access. So we modified the Xen IOMMU framework, first to export these capabilities to the guests, and to consume these capabilities we also developed a paravirtualized IOMMU driver for the Linux kernel, plugged into the Linux kernel IOMMU framework. It exposes the usual functions shared with the other IOMMU drivers that use this framework, VT-d or AMD-Vi or SMMU, and it uses the Xen IOMMU interface offered to the guests to accomplish this task. We also modified our management toolstack, which is based on Xen, to be able to configure these virtual NVMe devices: specify their size, how many there are, and so on. That was basically it for what we have done; there was also some tweaking in the domU to use this virtual IOMMU driver, but that is not such a big thing.
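Conceptually, what the guest gains from this PV-IOMMU work is the ability to control which of its pages a device may reach by DMA. Below is a toy model of that idea, mapping I/O virtual addresses onto guest memory; it is only meant to illustrate the address-space "marriage" described above, not Xen's actual hypercall interface.

    # Toy model of guest-controlled IOMMU mappings: the guest asks for an
    # I/O virtual address range to be mapped onto its own memory so that a
    # device (here, the DPU) can DMA into it safely. Not Xen's real API.

    PAGE = 4096

    class GuestIommuContext:
        """Toy model: which I/O virtual addresses a device may DMA to."""

        def __init__(self):
            self.mappings = {}   # iova page number -> guest page number

        def map(self, iova, guest_addr, length):
            for off in range(0, length, PAGE):
                self.mappings[(iova + off) // PAGE] = (guest_addr + off) // PAGE

        def translate(self, iova):
            page = self.mappings.get(iova // PAGE)
            if page is None:
                raise PermissionError("DMA to an unmapped I/O address is blocked")
            return page * PAGE + iova % PAGE

    ctx = GuestIommuContext()
    ctx.map(iova=0x100000, guest_addr=0x7F00000, length=2 * PAGE)
    print(hex(ctx.translate(0x100000 + 42)))   # -> address inside guest memory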
And what is next, what will come next with these Kalray cards and with our environment too? The NVMe specification defines something called NVMe over Fabrics, so you have a way to access, from an NVMe controller, other controllers over the network, network in the broad sense of the word, if I may say so. This Kalray card will have a dedicated network connection to access other NVMes which can be elsewhere, and you do not even have to have the real NVMe in your platform: your whole storage capacity will have performance comparable to native NVMe access, which is quite a high-performance device, and the drives can be located elsewhere. Obviously the network may introduce some slowdown, but that is about how you administer this and where you place these devices.
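On the host side, reaching a remote NVMe target normally goes through standard NVMe over Fabrics tooling; the sketch below uses nvme-cli with placeholder transport, address and NQN values, and is generic rather than Kalray-specific.

    # Sketch: connect to a remote NVMe over Fabrics target with nvme-cli.
    import subprocess

    subprocess.run([
        "nvme", "connect",
        "-t", "tcp",                                  # transport
        "-a", "192.0.2.10",                           # target address (placeholder)
        "-s", "4420",                                 # NVMe/TCP service port
        "-n", "nqn.2024-01.example:storage-target",   # target NQN (placeholder)
    ], check=True)

    # After connecting, the remote namespaces show up locally as /dev/nvmeXnY
    # and can be used, or passed through, like any other NVMe device.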
Your host, your platform, does not have to have storage capacity of its own. So, basically, that is about it. OK, if you have any questions. [Applause]

Any questions for André from the room? So, is it already supported on XCP-ng, or is it a work in progress?

Well, this is a project which we have more or less completed at the development level. We will need to package it and so on, and to provide it, because you have to have the special hardware, so obviously it is not just something you switch on. If you want to use it, we will help you install it, configure it, and find a good host to house these cards; there are some small limitations, but basically no problem with that. And yes, it is something we would like to deploy with our customers, obviously.
Can you tell us something more about the performance? How much faster is it compared to the old approach?

It is NVMe, I mean. I did some benchmarks; sorry, I did not present them, I did not put all that in the slides. For instance, it is basically NVMe: it is about five times faster than the more recent SATA controllers, and it keeps that. There are some small issues with smaller blocks, when you do small accesses compared to large ones, but this is also being tuned at the Kalray level, because we work hand in hand with them to provide a common solution. So yes, we discuss these things with them, and they can also correct some things at their level, on the MPPA controller, on their controller. But basically you have native performance, as if you had the NVMe controllers in your guest.
Have native performances as is as if you had the the NVM controllers in your guest um one question from uh online audience um a bit obvious I guess it’s an obvious yes but perhaps you can give some examples is question is can we use x PNG in production maybe you can give
Some examples XP in production yeah you mean just xpg or it’s about uh it’s just referring to XP yeah xpg in production I mean we have plenty of maybe give some examples of large environments or any you know users well I know they have Washington University I don’t know uh well look I’m
Not a marketing not s salesman I don’t know about who buyo solution I know I have my paycheck at the end of the so I know there’s some customers get it get it probably but I don’t know if the how many of them yeah
I mean uh from the numbers I saw it’s uh Sky roing sorry can’t can’t response about this all right any other question from the room cool let’s give a round of applause for Andre and this loveely presentation thank you very much bye