Gigue: A JIT Code Binary Generator for Hardware Testing (Video, VMIL 2023)
    Quentin Ducasse, Pascal Cotret, and Loïc Lagadec
    (ENSTA Bretagne, France; ENSTA Bretagne, France; ENSTA Bretagne, France)

    Abstract: Just-in-time compilers are the main virtual machine components responsible for performance. They recompile frequently used source code to machine code directly, avoiding the slower interpretation path. Hardware acceleration and performant security primitives would benefit the generated JIT code directly and increase the adoption of hardware-enforced primitives in a high-level execution component.

    The RISC-V instruction set architecture presents extension capabilities to design and integrate custom instructions. It is available as open-source and several capable open-source cores coexist, usable for prototyping. Testing JIT-compiler-specific instruction extensions would require extending the JIT compiler itself, other VM components, the underlying operating system, and the hardware implementation. As the cost of hardware prototyping is already high, a lightweight representation of the JIT compiler code region in memory would ease prototyping and implementation of new solutions.

    In this work, we present Gigue, a binary generator that outputs bare-metal executable code, representing a JIT code region snapshot composed of randomly filled methods. Its main goal is to speed up hardware extension prototyping by defining JIT-centered workloads over the newly defined instructions. It is modular and heavily configurable to qualify different JIT code regions’ implementations from VMs and different running applications. We show how the generated binaries can be extended with three custom extensions, whose execution is guaranteed by Gigue’s testing framework. We also present the generation and execution of different application cases on top of a fully-featured RISC-V core.

    Article: https://doi.org/10.1145/3623507.3623553

    ORCID: https://orcid.org/0000-0001-9927-675X, https://orcid.org/0000-0001-6325-0777, https://orcid.org/0000-0003-3778-3144

    Video Tags: JIT, RISC-V, Hardware Development, splashws23vmilmain-p76-p, doi:10.1145/3623507.3623553, orcid:0000-0001-9927-675X, orcid:0000-0001-6325-0777, orcid:0000-0003-3778-3144

    Presentation at the VMIL 2023 conference, October 23, 2023, https://2023.splashcon.org/home/vmil-2023
    Sponsored by ACM SIGPLAN, ACM SIGAda,

    The next talk of the session is going to be given by Quentin Ducasse, and it is about how to test new hardware extensions and new hardware features when you are developing JITs, I hope. Yeah, thank you for the introduction. So, my name is Quentin Ducasse, and I'm from ENSTA Bretagne in France.

    This is joint work with my PhD advisors, Pascal Cotret and Loïc Lagadec, and I'll present to you Gigue, which is a JIT code binary generator for hardware testing. So what does that mean? I'll first go into some context and background.

    I'm sure most of you are familiar with VMs, so I won't dive into this figure, which is probably incomplete. One thing I would like to highlight is that the JIT code region is a key component. It is performance-critical, in the sense that it holds optimized machine code. It is also security-critical, because it is usually a part of memory that has to be both writable and executable. As you might know, this is usually a bad sign for vulnerabilities: you can be exposed to code-injection attacks, code-reuse attacks, and so on.

    There is a history of such attacks. I mentioned code injection and code reuse, but we can also attack some of the VM internals. For example, even if you have secured your JIT code region, an attacker might just give the VM malicious bytecode as input, which will then generate malicious machine code inside the VM.

    I would like to highlight three state-of-the-art defenses that I think are very interesting. The first one is called JITGuard, which uses Intel SGX, now deprecated. SGX is a secure enclave whose encrypted content is decrypted on the fly, and here it is used to accelerate and protect the JIT compiler. Then we have NoJITsu, which uses intra-process isolation with memory protection keys. They manage to handle some of the permission checks in user space, accelerating them while keeping security guarantees. The last one, from last year, is called CETIS and uses Intel CET, a control-flow enforcement technology that stores a shadow stack somewhere in memory and also performs indirect branch tracking.

    What I want to highlight with these three papers, which I believe are very good reads, is that there is a definite need for hardware-enforced security, given the performance it manages to squeeze out of performance-critical components.

    Now that this is the context of why we might need some hardware features, I'd like to give some background on the Pharo VM, which is our mental model for the design of Gigue. The Pharo VM uses an indirect-threaded interpreter and a linear, non-optimizing, method-based JIT compiler that we recently ported to RISC-V. Usually, we have the VM source code written in restricted Smalltalk, called Slang (we had a talk on it earlier). We transpile it to C and then compile it into the executable on which we run the application. We also have a test framework in which we do black-box testing of JIT-compiled code using a CPU emulator named Unicorn, and I will come back to this later. So, I've tried to motivate why we would need hardware security features.

    Now I'd like to motivate why the RISC-V ISA might be a good candidate for that. If you haven't heard about it, it revolves around three main objectives. The first one is being open: the standards are open to everyone, and you can still modify and make proposals for instruction groups that are still being drafted. It is also modular: instruction groups are separated to support a wide range of applications. For example, we have the base one, I, for integers, M for multiplication, and so on. If you manage to design a core that supports the RV64IMAFDC extensions, you are ready to support a fully featured OS such as Linux.

    We also have several standard allocated spaces for extensions. What that means is that there are dedicated opcodes, named custom-0 to custom-3, that are fixed in the standard and won't be used by future standard extensions. So any change that you make now will still be available in the future and will not overlap with another instruction group. We also have HINTs, which are instructions with specific arguments that fall back to being a no-op if they are not implemented. For example, take LUI x0, value. LUI is load upper immediate, which places a 20-bit immediate in the upper bits of the destination register. If you pass it register x0, which is hardwired to zero, this does nothing, but you can implement some logic in your processor to actually do something. So, I've talked about why we might need custom instructions.
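    A minimal Python sketch of the two encoding facts mentioned above: a custom instruction placed in one of the custom major opcodes, and the LUI x0 HINT. It assumes the standard R-type and U-type field layouts of the RISC-V base ISA. The helper names (encode_r_type, encode_lui, is_lui_hint) are hypothetical and do not come from Gigue.

```python
# Major opcodes (bits [6:0]) that the RISC-V base opcode map reserves for custom extensions.
CUSTOM_OPCODES = {
    "custom-0": 0b0001011,
    "custom-1": 0b0101011,
    "custom-2": 0b1011011,
    "custom-3": 0b1111011,
}
LUI_OPCODE = 0b0110111


def encode_r_type(opcode: int, rd: int, funct3: int, rs1: int, rs2: int, funct7: int) -> int:
    """Pack an R-type word: funct7 | rs2 | rs1 | funct3 | rd | opcode."""
    for reg in (rd, rs1, rs2):
        if not 0 <= reg < 32:
            raise ValueError(f"register x{reg} out of range")
    return ((funct7 & 0x7F) << 25) | (rs2 << 20) | (rs1 << 15) | ((funct3 & 0x7) << 12) | (rd << 7) | (opcode & 0x7F)


def encode_lui(rd: int, imm20: int) -> int:
    """LUI rd, imm20: the 20-bit immediate lands in bits [31:12] of rd."""
    return ((imm20 & 0xFFFFF) << 12) | (rd << 7) | LUI_OPCODE


def is_lui_hint(word: int) -> bool:
    """LUI whose destination is x0 cannot change architectural state, so it sits in HINT space."""
    return (word & 0x7F) == LUI_OPCODE and ((word >> 7) & 0x1F) == 0
```

    For instance, encoding rd = x10, rs1 = x11, rs2 = x12 with funct3 = funct7 = 0 under custom-0 gives the word 0x00C5850B. A core without the extension would trap on that word, whereas the LUI x0 form simply does nothing.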

    Which custom instructions will I be talking about? The first example, which I call E1, is a set of simple rotation instructions that are not part of the base RISC-V ISA. These are the very basic rotations that you might expect.
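    The semantics of an E1-style rotation can be sketched as follows. The sketch assumes XLEN = 64 and that only the low log2(XLEN) bits of the rotation amount are used, as in the rotations of the ratified Zbb extension. The function names are illustrative, since the talk does not give E1's exact mnemonics or encoding.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def rotl(value: int, amount: int) -> int:
    """Rotate a 64-bit value left; only the low 6 bits of `amount` are used."""
    amount &= XLEN - 1
    value &= MASK
    # When amount == 0, value >> 64 is 0 (value < 2**64), so the result is value itself.
    return ((value << amount) | (value >> (XLEN - amount))) & MASK


def rotr(value: int, amount: int) -> int:
    """Rotate right, expressed as a left rotation by the complementary amount."""
    return rotl(value, XLEN - (amount & (XLEN - 1)))
```

    Rotating 0x0123456789ABCDEF left by 8 moves the top byte to the bottom and yields 0x23456789ABCDEF01.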

    The second one, E2, is a shadow stack. You add two new instructions. CFI call goes right before a call and stores the return address in your shadow stack. CFI return pops that return address and compares it to the actual return address, to check whether the control flow has been disrupted.
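    A software model of the E2 checks, assuming the shadow stack is a simple bounded list of return addresses. The class names, the cfi_call/cfi_ret method names, and the capacity limit are hypothetical. They only illustrate the push-on-call, pop-and-compare-on-return behaviour described above.

```python
class ShadowStackViolation(Exception):
    """Raised when a return address does not match its shadow copy."""


class ShadowStack:
    def __init__(self, capacity: int = 64):
        self.capacity = capacity  # assumed hardware limit on stored return addresses
        self._entries = []        # shadow copies of return addresses, innermost last

    def cfi_call(self, return_address: int) -> None:
        """Executed right before a call: save the return address."""
        if len(self._entries) >= self.capacity:
            raise OverflowError("shadow stack full")
        self._entries.append(return_address)

    def cfi_ret(self, return_address: int) -> None:
        """Executed at return: pop the shadow copy and compare it to the real one."""
        if not self._entries:
            raise ShadowStackViolation("return without a matching call")
        expected = self._entries.pop()
        if expected != return_address:
            raise ShadowStackViolation(f"expected {expected:#x}, got {return_address:#x}")
```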

    The third one, E3, duplicates all memory accesses and tags them with a domain. For example, the JIT code region could be restricted to these duplicated memory accesses, which are tied to a domain and can only access data in the JIT region. We also added specific change-domain instructions that allow switching the current domain.
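    A rough software model of E3's domain-tagged accesses. It assumes each memory region belongs to exactly one numeric domain and that a tagged load or store faults when its target lies outside the current domain. Region, DomainTaggedMemory, and the method names are hypothetical stand-ins for the duplicated instructions and the change-domain instruction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start: int   # inclusive
    end: int     # exclusive
    domain: int


class DomainViolation(Exception):
    """Raised when a tagged access targets memory outside the current domain."""


class DomainTaggedMemory:
    def __init__(self, regions):
        self.regions = list(regions)
        self.current_domain = 0
        self.cells = {}

    def change_domain(self, domain: int) -> None:
        """Model of a change-domain instruction."""
        self.current_domain = domain

    def _check(self, address: int) -> None:
        for region in self.regions:
            if region.start <= address < region.end:
                if region.domain != self.current_domain:
                    raise DomainViolation(
                        f"{address:#x} is in domain {region.domain}, current is {self.current_domain}")
                return
        raise DomainViolation(f"{address:#x} is not mapped to any domain")

    def load_tagged(self, address: int) -> int:
        self._check(address)
        return self.cells.get(address, 0)

    def store_tagged(self, address: int, value: int) -> None:
        self._check(address)
        self.cells[address] = value
```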

    Some motivation for the design of Gigue: as a team, we are mostly hardware developers. When we want to make JIT-specific extensions, that is, JIT-specific custom instructions, we have to support the whole software stack. As RISC-V is still unstable, we'd like something much simpler that flattens the stack and gives a meaningful representation of what the JIT code region might be.

    So we make some assumptions. First, the JIT and ahead-of-time compilers are the only components that modify machine code. Second, a snapshot of the JIT code region is representative of the changes made by those components. In other words, if we capture the JIT code region after a certain amount of time, we can say that it is representative of the application that is running. Our motivation is to flatten the software stack significantly, so that we can speed up hardware development and support VM-specific custom instructions.

    So, Gigue, which is the French word for jitter in signal processing, is a random workload generator. It produces an executable file modeled after the JIT code region and supports custom instructions, ready to execute on extended cores.

    We designed it around three main ideas. The first is parameterization: it takes qualification parameters as input to qualify diverse applications and VMs. The second is modularity: we'd like to extend it with custom instructions, and also with custom JIT code region constructs. The third is heavy use of testing: we have lots of sanity checks and a custom execution model to validate the binaries we generate before running them on the real or simulated hardware design.

    The three main components are very simple. There are methods, filled with random instructions and calls. We have a model for polymorphic inline caches, which is basically a number of cases and then methods. And we have trampolines to manage calls to and from the interpretation loop. Our binary structure is self-contained. An interpretation loop calls all JIT elements in a random order, each JIT element then calls a number of other elements, and both have access to the different trampolines for routines. The resulting binary is compiled using the binary framework provided by the RISC-V assembly tests, which means it is automatically supported by the different RISC-V cores that support this assembly test suite.
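    A toy Python model of that structure might look like this. It has methods that call other methods, PICs modelled as a list of cases that each dispatch to a method, and an interpretation loop that visits every element once in a random order. To keep the call graph acyclic, so that execution always terminates, this sketch only lets a method call methods created after it. That restriction, the class names, and the parameters are assumptions of the sketch, not a description of Gigue's internals.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Method:
    name: str
    callees: list = field(default_factory=list)  # methods this one calls


@dataclass
class PIC:
    name: str
    cases: list  # one target method per cached case


def build_structure(nb_methods, nb_pics, pic_cases, max_calls, seed=0):
    rng = random.Random(seed)
    methods = [Method(f"method_{i}") for i in range(nb_methods)]
    for i, method in enumerate(methods):
        later = methods[i + 1:]
        # Calls only go "forward", so the call graph is a DAG.
        method.callees = rng.sample(later, min(len(later), rng.randint(0, max_calls)))
    pics = [PIC(f"pic_{i}", rng.sample(methods, pic_cases)) for i in range(nb_pics)]
    elements = methods + pics
    # The interpretation loop calls every JIT element once, in a random order.
    loop_order = rng.sample(elements, len(elements))
    return loop_order, methods, pics
```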

    For the parameters, I'll go over them quickly. They are split into what qualifies your VM and what qualifies your application. For a VM, we have the JIT code region size, which we fix, and the frequency of the different JIT elements, that is, methods and inline caches. We also have the usable registers, so that we control the environment of our generated binary. For the application, we have the number of methods, the variation we might want to inject in method size, and the call occupation. The one I think is most interesting is the frequency of instructions: if we want a binary that's very memory-intensive, we can change that.
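    One way to hold the VM and application parameters described above is a pair of frozen dataclasses that validate the instruction mix. All field names and default values below are placeholders chosen for illustration. They are not Gigue's configuration keys and not values from the paper.

```python
import math
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class VMParameters:
    jit_size: int = 2000            # size of the JIT code region, in instructions
    method_weight: float = 0.8      # relative frequency of methods among JIT elements
    pic_weight: float = 0.2         # relative frequency of polymorphic inline caches
    usable_registers: tuple = ("t0", "t1", "t2", "a0", "a1", "a2", "a3")


@dataclass(frozen=True)
class ApplicationParameters:
    nb_methods: int = 100
    method_size_variation: float = 0.2   # +/- fraction around the base method size
    call_occupation: float = 0.2         # share of a method's body reserved for calls
    instruction_weights: dict = field(default_factory=lambda: {
        "arithmetic": 0.5, "load": 0.2, "store": 0.2, "branch": 0.1})

    def __post_init__(self):
        total = sum(self.instruction_weights.values())
        if not math.isclose(total, 1.0):
            raise ValueError(f"instruction weights sum to {total}, expected 1.0")
```

    A memory-intensive application then differs only in its mix, for example replace(ApplicationParameters(), instruction_weights={"arithmetic": 0.2, "load": 0.4, "store": 0.3, "branch": 0.1}).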

    We can also specify the shape and size of the data. Binary generation works using two components: a generator, which is responsible for the higher-level structure handling, and a builder, which is just responsible for instruction emission. It goes as follows. First, we instantiate the trampolines. Then we determine the method base size using the fixed JIT size and the number of methods. Then we instantiate the different elements, which all share the same API, going through the weights that we provided for them. Then we fill them with random instructions following the distribution we chose. We patch calls, so the call graph is fixed at binary generation. Finally, we generate data and link everything into a self-contained binary.
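    The generation steps above can be sketched as a small pipeline that produces instruction kinds rather than real machine code. Take a 2000-instruction region and 100 elements: each element starts from a base size of 2000 // 100 = 20 instructions and, with 20% variation, ends up between 16 and 24. Several details are assumptions of this sketch: the trampoline names, patching calls only towards later elements (with a nop when no target exists), and the omission of data generation and linking.

```python
import random


def method_base_size(jit_size: int, nb_elements: int) -> int:
    """Base size of each element, in instructions: the JIT region split evenly."""
    return jit_size // nb_elements


def weighted_choices(weights: dict, k: int, rng: random.Random) -> list:
    kinds = list(weights)
    return rng.choices(kinds, weights=[weights[kind] for kind in kinds], k=k)


def generate(jit_size, nb_elements, element_weights, instruction_weights,
             size_variation=0.2, seed=0):
    rng = random.Random(seed)
    # 1. Trampolines (hypothetical names) for calls to and from the interpretation loop.
    trampolines = ["call_jit_elt", "ret_from_jit_elt"]
    # 2. Base size from the fixed JIT size and the number of elements.
    base = method_base_size(jit_size, nb_elements)
    delta = int(base * size_variation)
    # 3. Instantiate elements (e.g. "method" or "pic") following their weights.
    kinds = weighted_choices(element_weights, nb_elements, rng)
    # 4. Fill each element with random instruction kinds following the distribution.
    bodies = [weighted_choices(instruction_weights, max(1, base + rng.randint(-delta, delta)), rng)
              for _ in kinds]
    # 5. Patch calls once every element exists, which fixes the call graph.
    patched = []
    for i, body in enumerate(bodies):
        patched.append([
            (f"call elt_{rng.randrange(i + 1, nb_elements)}" if i + 1 < nb_elements else "nop")
            if kind == "call" else kind
            for kind in body])
    return trampolines, kinds, patched
```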

    As the generated binaries contain random instructions, we have to perform some sanity checks. The registers that are used are fixed at generation. Jumps and branches are sanitized so that they don't break the call graph. Data accesses are indirect, through dedicated base registers. And we perform the call patching once the whole call graph has been fixed. One thing we also wanted to enforce is that the binary execution is correct, so we extended Gigue with testing, which I'll come to.
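    Two of these sanity checks can be sketched as pure functions. The first maps a random branch offset to a forward target inside the current method; forward-only branching is an assumption of this sketch, made so that random code cannot loop forever. The second maps a random data offset to an aligned slot reachable from a base register. It relies on the fact that RISC-V loads and stores take a 12-bit signed immediate (at most 2047). Function names and the 4-byte instruction size (no compressed instructions) are assumptions.

```python
def sanitize_branch_offset(position: int, raw_offset: int, method_size: int) -> int:
    """Map a random offset to a forward branch that stays inside the method.

    Positions are instruction indices; index `method_size` is the method epilogue.
    Returns a byte offset, assuming 4-byte (uncompressed) instructions.
    """
    span = method_size - position  # number of legal forward targets
    if span < 1:
        raise ValueError("branch position outside the method body")
    return 4 * (1 + abs(raw_offset) % span)


def sanitize_data_offset(raw_offset: int, data_size: int, access_size: int) -> int:
    """Map a random offset to an aligned offset inside the data region.

    The access is `base_register + offset`, so the offset must fit the
    12-bit signed immediate of RISC-V loads and stores.
    """
    if data_size > 2048:
        raise ValueError("data region too large for a 12-bit immediate offset")
    slots = data_size // access_size
    return (raw_offset % slots) * access_size
```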

    So, modularity and the test framework. Coming back to the three examples I presented earlier, here is how we included the different instructions in the Gigue framework. For the first one, we added the instructions to the random generation of immediate and register instructions. I would argue that this adds about 100 lines of code, but most of them are boilerplate, which is why I've written 12 lines of code on the side. It just means adding them to the random generation in the correct place in the code, that is, overloading the builder's random I and random R generation. CFI call and CFI return, the shadow stack custom instructions, are added at call generation and to the method epilogues. Again, that adds two instructions, and I'd argue it adds 12 effective lines of code. For the last one, we added 15 instructions and completely replaced the random generation of stores and loads. We also changed the method epilogue and call generation so that we check and sanitize the domain we're jumping to. This one adds a lot more lines of code, but once again I would argue that it is basically 54 lines of code.
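    The "overload the builder" pattern described above can be sketched with subclasses that extend the random instruction pools (E1) or wrap call and epilogue emission (E2). This sketch emits assembly text for readability, which is an assumption. The mnemonics rol, ror, rori, cficall, and cfiret are hypothetical names for the custom instructions, and the class and method names are not Gigue's actual API.

```python
import random


class Builder:
    """Emits assembly text; subclasses extend the mnemonic pools or wrap emitters."""
    R_INSTRUCTIONS = ("add", "sub", "and", "or", "xor")
    I_INSTRUCTIONS = ("addi", "andi", "ori", "xori")

    def __init__(self, rng: random.Random):
        self.rng = rng

    def random_r(self, rd: str, rs1: str, rs2: str) -> str:
        return f"{self.rng.choice(self.R_INSTRUCTIONS)} {rd}, {rs1}, {rs2}"

    def random_i(self, rd: str, rs1: str, imm: int) -> str:
        return f"{self.rng.choice(self.I_INSTRUCTIONS)} {rd}, {rs1}, {imm}"

    def call(self, target: str) -> list:
        return [f"jal ra, {target}"]

    def epilogue(self) -> list:
        return ["ret"]


class RotationBuilder(Builder):
    """E1: rotations join the random R- and I-type pools (hypothetical mnemonics)."""
    R_INSTRUCTIONS = Builder.R_INSTRUCTIONS + ("rol", "ror")
    I_INSTRUCTIONS = Builder.I_INSTRUCTIONS + ("rori",)


class ShadowStackBuilder(Builder):
    """E2: guard every call and every return (hypothetical mnemonics)."""

    def call(self, target: str) -> list:
        return ["cficall"] + super().call(target)

    def epilogue(self) -> list:
        return ["cfiret"] + super().epilogue()
```

    Keeping the extension points this narrow is what lets each example stay in the tens of lines: the generator keeps calling the same builder methods and never needs to know which extension is active.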

    The test framework we defined is based on Unicorn, a lightweight wrapper on top of QEMU. It provides flexible hooks that are triggered on events such as exceptions, memory accesses, or particular instructions being reached. We extend them to catch custom instructions as defined in the standard. This provides an easy API for users to add a custom instruction that has an effect in the software test. For the rotation, when we catch a rotation instruction, we simply perform the rotation on the values of the two registers, push the result back into the Unicorn state, and resume execution. For example two, we store a list of return addresses on the software side and just push to and pop from it. For the third one, we add a domain tag to the duplicated instructions and ensure that we're in the correct domain when executing them. Then we fall back to the traditional loads and stores.
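    The catch, emulate, and resume loop described above can be sketched in plain Python, with a list standing in for the register file. The sketch does not use or imitate Unicorn's actual hook API. The dispatcher class, the choice of custom-0 with funct3 = 0, and rotl64 as E1's semantics are all assumptions made for illustration.

```python
CUSTOM_0 = 0b0001011
MASK64 = (1 << 64) - 1


def rotl64(a: int, b: int) -> int:
    """Assumed E1 semantics: rotate a left by the low 6 bits of b."""
    n = b & 63
    return ((a << n) | (a >> (64 - n))) & MASK64


def decode_r_type(word: int) -> dict:
    return {"opcode": word & 0x7F, "rd": (word >> 7) & 0x1F, "funct3": (word >> 12) & 0x7,
            "rs1": (word >> 15) & 0x1F, "rs2": (word >> 20) & 0x1F, "funct7": (word >> 25) & 0x7F}


class CustomInstructionDispatcher:
    def __init__(self):
        self.handlers = {}

    def register(self, opcode: int, funct3: int, handler) -> None:
        self.handlers[(opcode, funct3)] = handler

    def on_invalid_instruction(self, word: int, regs: list, pc: int):
        """Emulate a custom instruction the emulator could not decode.

        Returns the pc to resume at, or None if the instruction is genuinely invalid.
        """
        fields = decode_r_type(word)
        handler = self.handlers.get((fields["opcode"], fields["funct3"]))
        if handler is None:
            return None
        result = handler(regs[fields["rs1"]], regs[fields["rs2"]])
        if fields["rd"] != 0:  # x0 stays hardwired to zero
            regs[fields["rd"]] = result & MASK64
        return pc + 4          # resume after the 4-byte custom instruction
```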

    So I'll now show you a set of use cases that we've used Gigue for. The hardware development stack that we have to go through to extend our core with custom instructions looks like this, and this is without even putting a VM on top. First, we have our core definition, our processor implementation. It is written in a high-level hardware description language called Chisel, which is an extension of Scala. Then we compile this Chisel description of a core down to Verilog code, SystemVerilog sorry, which is another hardware description language. Once we have this, we have two options. We can go through the synthesis tool to get down to a bitstream and put it on an FPGA. Or we can go through a cycle-accurate simulator, which is what we do: we execute the Gigue binary on this Verilator simulator. It is able to run a fully featured OS, but it is extremely slow, and we cannot go through it to validate that. That's why we'd like a binary that models this JIT code interaction and gives us some insight into whether our custom instruction is suitable for a JIT compiler at all.

    As an example, we took some values from Pharo VM dumps and artificially extended them to generate different application classes. Can you see my mouse? Yeah. On the left, we varied the call occupation, meaning whether a method in the JIT code contains lots of calls or not. On the right, we varied the memory accesses. This is on the core that simply implements E1, the rotations. We wanted to see whether we can artificially inflate the number of application classes that we have, so that we have a meaningful workload to work on.
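    A sketch of such a sweep follows, assuming instruction mixes are plain weight dictionaries. Raising the memory share rescales loads and stores together, keeping their ratio, and shrinks the remaining categories proportionally. The function names are hypothetical, and any base weights passed in here would be placeholders, not measurements from the talk or the paper.

```python
def with_memory_share(weights: dict, memory_share: float) -> dict:
    """Rescale an instruction mix so loads + stores total `memory_share`.

    Each group keeps its internal proportions (e.g. the load/store ratio).
    """
    memory = {"load", "store"}
    mem_total = sum(v for k, v in weights.items() if k in memory)
    other_total = sum(v for k, v in weights.items() if k not in memory)
    return {k: (memory_share * v / mem_total) if k in memory
            else ((1 - memory_share) * v / other_total)
            for k, v in weights.items()}


def application_classes(base_weights: dict, call_occupations, memory_shares) -> list:
    """Cross product of call-occupation and memory-access settings."""
    return [{"call_occupation": c, "instruction_weights": with_memory_share(base_weights, m)}
            for c in call_occupations for m in memory_shares]
```

    For example, two call-occupation values crossed with three memory shares yield six application classes from a single base mix.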

    In conclusion, I presented Gigue, which is a workload generator for hardware testing. I believe it is parameterizable to qualify VMs and applications. It is also modular: we simplified the addition of elements and instructions, and provided a way to test your custom instructions before adding them to the actual hardware. We are implementing variants of E2, the shadow stack, and E3 in an actual core, and we'd like to compare them. Usually, when a paper presents a shadow stack, it compares it to the software version, and usually it is faster. Same thing for example number three: we'd like to have a common ground to compare different solutions and to provide insight on what would be an interesting fit for a JIT compiler. The Pharo VM already supports E1, which is just the rotation, but it supports it in its own custom testing environment. The final step we would like to reach is a complete stack: a core implementing custom instructions that might be interesting, which we can propagate through the VM running on top of it. We're still far from there, and this is just a way to flatten the stack and get some preliminary results. So, thank you for listening.

    The code is available open source on GitHub, and I actually have some questions for you. Which custom instructions would you integrate into your VM's JIT code? What does integrating custom instructions mean for you? And which parameters would you use to qualify the JIT code of your VM? I have many more details in the paper on how we use the parameters to generate binaries that represent this piece of memory, but I'd be happy to hear your thoughts. Thank you.

    So this is probably the first time when, if you come to the mic, you can either ask a question or provide a response, so please go ahead.

    You were mentioning three different extensions, or sets of instructions, that you were adding, and you were saying they were based around security. The second and third I definitely see, but how is a rotation a security primitive? I didn't quite get that.

    Oh no, no, the first one is just an example of a very simple one that we can extend Gigue with; it's definitely not a security one. Sorry.

    Okay, so everything made sense to me, with one exception: I didn't understand the random generation. If you have a JIT and you just want to extract an executable subset of it, why don't you grab lots of them from a real JIT workload and feed those into your core? I didn't quite understand the random part. Could you explain that again?

    Yes, of course. We made that choice because it was actually easier to control the environment on this subset. How to say: we just qualify the dump of the Pharo VM after running some kind of application. We get some metrics on the instructions used and on the number of polymorphic inline caches compared to methods. Then we try to artificially expand that into something we can control a bit better; for instance, we control the registers a bit better.

    So it's almost like a fuzzing-like approach, except that you're trying to statistically match the number of loads, stores, compares, and so on?

    Exactly. It was, how would I say, faster for us to do it this way, and it by no means replaces actual JIT compilation.

    Okay, that makes more sense. Yeah, thank you.

    Any more questions? If not, let us thank Quentin one more time.
