As things stand today: “We have to scale. We first test on small data sets, then on medium-sized sets, and finally on the very large sets,” explains Hochreiter. The xLSTM technology performs “very well” on the small and medium-sized data sets. You have to beat all the models on the market, they say in Linz. Over 1,000 GPUs around the world are currently working on the development from Upper Austria. The time horizon is ambitious: the tests on the small and medium-sized data sets should be completed by the end of March. “The large data sets will then take centre stage in June,” explains Albert Ortig, Managing Director of NXAI.

Thanks for listening. We welcome suggestions for topics, criticism and a few stars on Apple, Spotify and Co.

We thank our partner **HANNOVER MESSE**
https://www.hannovermesse.de/de/

Albert Ortig [Contact](https://www.linkedin.com/in/aajjoo/)
Sepp Hochreiter [Contact](https://www.linkedin.com/in/sepp-hochreiter-41514846/)

This podcast is presented by Hannover Messe, your leading event for industrial AI.

Hello everybody, and welcome to a new episode of our Industrial AI Podcast. My name is Robert Weber, and it's a pleasure to talk to Peter Seeberg. Hello Robert, hello people from around the world. Hello Peter. We have a special episode today, because we have two guests from Linz: Professor Sepp Hochreiter, welcome. Hi, nice to be here. And Albert Ortig. Hello Albert. Hi, nice to be here too. Both of you speak today, is that right, for a new company called NXAI? Or are you, Sepp, speaking for the university and Albert for the company? Please make it clear for us.

I think we both speak for the company now. Okay, because this is a new thing. Yes, but I'm still at the university. Sepp, you are well known, so Albert, please introduce yourself briefly.

Very briefly: I have been in the business for some years, for 25 years, doing digital services and digital product development. I have a company with 100 people and founded several ones, and together with Sepp and one more partner we founded NXAI, which focuses on this AI topic, where Sepp is deeply involved.

Yes. We already recorded an episode on xLSTM, and you, Sepp, told us to be patient. We were patient, but today you have news. What is the news, what can you tell us?

We founded this company in particular to advance this xLSTM idea. In the company we will also follow up different research directions, but we start with xLSTM, which is close to my heart, and I'm very eager to push it, to see whether we can be better than this nonsense Transformer stuff which is on the market right now. And what's important for this company: we got funding to do the compute, because to show that it's really nonsense, we need compute, to test and compare xLSTM for very large models against everything which is on the market right now. In particular, we are going for scaling laws: we compare smaller models with larger models, and if we are always better, then we can go on and train the very large models, and can beat every large language model which is on the market. Which I think we will do.

In small scale, right? You're already done in small scale? We are now doing it in small scale, but we have the funding to then also produce one in large scale. In small scale we have to show that we are better; that's what we're doing right now, and it looks like we can do it. If this is done, we build a big model and bring it to the market. Then we will perhaps lease it, sell it, we don't know yet how the business model will look. But you know, xLSTM is much faster, much, much faster than this Transformer nonsense, and therefore it's much more relevant for industrial applications. Robert and Peter, you remember we had these industrial meetings in the Alps. Yes. There the industrial partners said: it's too slow, we cannot use it. And now we can supply something which is not only better, but much faster.
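What such a scaling-law comparison looks like mechanically can be sketched in a few lines of Python: train a family of models of increasing size for each architecture on the same data, then fit the usual power-law ansatz loss ≈ a·N^(−b) + c and compare the fitted curves. All numbers below are made up for illustration; this is a sketch of the general method, not NXAI's code or results.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical validation losses for two model families trained on the same
# data at increasing parameter counts N (made-up numbers, illustration only).
N = np.array([1e7, 3e7, 1e8, 3e8, 1e9])        # model sizes in parameters
loss_a = np.array([3.9, 3.5, 3.1, 2.8, 2.6])   # family A
loss_b = np.array([3.7, 3.3, 2.9, 2.6, 2.4])   # family B

def power_law(n, a, b, c):
    # Common scaling-law ansatz: loss falls as a power of model size,
    # approaching an irreducible floor c.
    return a * n ** (-b) + c

for name, loss in (("A", loss_a), ("B", loss_b)):
    (a, b, c), _ = curve_fit(power_law, N, loss, p0=(10.0, 0.1, 2.0), maxfev=10_000)
    print(f"family {name}: loss ~ {a:.2f} * N^(-{b:.3f}) + {c:.2f}")

# If family B's fitted curve lies below family A's across the whole range,
# that is evidence it will stay ahead when extrapolated to larger N.
```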

Okay, I have two questions, one to Albert: who gives you the money to now go to large scale?

The whole company is funded by Netural X, which is our company builder, where I'm a partner and where also Pierer Digital Holding is a partner. The two of us are, besides Sepp, the only partners in this company, and we funded it out of that. So of course the big industrial company Pierer and the family behind it supported us here, also in terms of funding.

For our German or other international listeners: Pierer is an Austrian-based company, and one important brand is KTM, right, Albert? Exactly. Pierer is a big mobility company which has different brands, KTM, Husqvarna and others, also in the bicycle market, but it is mostly known in the bike market.

I have to comment on this. I'm very happy that we have companies like these here locally, because if you look around, it is really hard to get something funded which is still at the research stage, which is not already a product. You get that all the time in Silicon Valley, even in China or wherever, but to be honest, Germany, Europe is too lame to do stuff like this, it's just too lame. They don't invest in new technologies early, to keep the technology in the country, in Europe. But here we found somebody who is doing this, who has the mindset you normally find in Silicon Valley.

So, Peter, you have somewhat more technical questions, right?

Yes, let's see. The last time we talked, Sepp, was in October, actually, so listeners who want the details can go back to that episode. Already at that time you wanted to kick OpenAI from the market, strong language, which you are still using today as well: you talk about Transformer nonsense. Now, going 26 years back, that's when you did the paper: you invented LSTM together with Jürgen, you put the paper on the market. By the way, we're going to ask you whether you're going to do a paper this time as well. It took 10 years, and then it took off, and US big tech made big money with your technology. Then Transformers came, and so with xLSTM you want to make sure, that's my understanding, that LSTM, xLSTM, is going to be number one again. Has something structurally changed since then? Is the USP of what you're offering, the most important thing that I got out of it, that you have a linear approach rather than an exponential one from the Transformer?

Now, the Transformer is quadratic in the context length; that's the number of words, the number of tokens you consider. And we are linear. There's a trend in large language models: those models perform better which have a longer context, meaning your question and the background you give are longer, and then also, if it's a complex question, your answer is longer, or you have to derive something or explain something. And there the Transformers do not perform, because of the quadratic cost, and here we are much faster.
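The cost difference Hochreiter describes can be made concrete with a back-of-the-envelope estimate: self-attention touches every pair of tokens, while a recurrent layer does a fixed amount of work per token. A minimal sketch, assuming textbook complexity formulas and an illustrative model width; none of this is NXAI code.

```python
def attention_cost(n_tokens: int, d_model: int) -> int:
    """Self-attention compares every token with every other: O(n^2 * d)."""
    return n_tokens * n_tokens * d_model

def recurrent_cost(n_tokens: int, d_model: int) -> int:
    """An LSTM-style layer does fixed work per token: O(n * d^2)."""
    return n_tokens * d_model * d_model

d = 1024  # illustrative model width
for n in (1_000, 10_000, 100_000):
    ratio = attention_cost(n, d) / recurrent_cost(n, d)
    print(f"context {n:>7} tokens: attention/recurrent cost ratio ~ {ratio:.1f}x")

# The ratio grows as n/d: the longer the context, the larger the advantage
# of the linear (recurrent) approach.
```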

So, one thing is of course benchmarking. Two questions: number one, are you going to have a paper that you're going to share? Number two, you're making very strong claims, which I'm very happy with, and I don't know of anyone else making these claims, so that makes me wait even more eagerly for the details. But how do you benchmark? There are so many different benchmarks: comparing to all the GPT-X, to Claude, to Mistral, to Gemini, to Llama and all the other ones. How do you compare, what is the main criterion by which you say you're better or faster?

First of all, you already mentioned things like Mistral, Alpaca, Llama; but all Transformer technologies are quadratic, forget them. Where we do benchmarks is where the benchmarks are established: there are the Hugging Face benchmarks, there are the benchmarks on which the new things that come out are evaluated. At the beginning we use SlimPajama or RedPajama, which is the data set the open-source Llama was trained on, or at least everybody thought Llama was trained on, and then we use standard benchmarks, because we have to use the same data and the same evaluation criteria to show that we are better. If you use something else, we cannot compare. We use exactly the same stuff, we do the same, only better. We are better, and that's very important; that's science. Science is: you get the same input, you have the same time, the same number of parameters, but the output is just better.

So, you mentioned Hugging Face: are we going to see the output of your approach on Hugging Face soon as well, or not?

It's not decided yet. And the other question was about publishing: at some point we will publish, but we will discuss whether we keep the IP rights in the company. Because, as you already mentioned, there was this LSTM, and Google made money with it, Facebook made money with it, Amazon made money with it, Microsoft made money with it, Baidu made money with it, Alibaba made money with it, and I can continue the list. Europe had nothing, I had nothing. And I learned, I hope I learned. First of all, now I want to keep the IP, the technology, the know-how here in Europe, in Austria, Germany and so on, and we will also empower companies with it. If we have the technology, we can found new startups which use this technology to really go to the market and push this nonsense from the market. The nonsense is the Transformer.

Can you give our listeners a use case? What can be done better with xLSTM, what can you imagine, as a consumer, as an industrial AI guy?

We have to do the evaluation; we don't know yet where we are better. We can, and that's a standard measure, predict the next word better, and if you can predict the next word better, you're normally better at coding, you're better at logic. And because xLSTM has two parts, one is a memory part, and there is a semantic part, and because we have this additional semantic part, I would guess that we are much better at abstract reasoning. You know, in symbolic reasoning you never do reasoning on the raw input. If you say: this Ferrari went from this place to a house, this Audi went from this place to this house on this street, then in abstract terms you say: a car is going to a house. And if you have these abstract concepts, you can do reasoning; things become much easier, because you can move symbols around, and reasoning you always do in abstract space. We can open up this abstract space, therefore we can, I guess, and we have to prove it, we have to show it, be better in some very complex situations, if the text or the semantics is about complex situations, complex things, perhaps like coding, perhaps like logic. But we have to see. Here I think we would have a big, big advantage, but this has to be proven.

One question on reasoning, Sepp, because that's a big claim, and I love it. The symbolic guys, you know, Gary Marcus and many others, they're not going to like your claim that you're going to be capable of reasoning.

Perhaps we are not as good as the symbolic reasoning machines, but I think we are better, or we hope, and we have to prove it, that we are better than what's on the market.

Okay, Albert, can you give us a little bit more detail on your timeline? We already heard: small scale you're already done, now large scale. What is the timeline, when will it be possible to test maybe a first model, what is the idea?

At the moment we are training a lot, with huge compute power, all around the world, or in Europe. At the moment it's all around the world, and this is one of the topics we are also focusing on: to bring it to Europe, to build our own compute, or to make sure that we have our own compute perspective. This is what we are doing very intensively within the next two months, so until the end of March. By the end of March we will have a comparison to all models with our model, as Sepp explained.

And then, on the small scale, right? Yes, small to medium scale, I would say. It's not so small in size, but it's not really large. And for the really large scale we have to do some homework after this training phase, one to two months, and then we will go to the very large training data and sessions. That's the plan. So that means until the end of March we will hopefully really see the charts available with all models in comparison to our model, and then in April, May, beginning of June, we will go into this large-scale perspective.

One question: when we talk with our industrial AI guys, and we had a conference this week in Frankfurt, they are not focused on these big, large language models, they are focused maybe on individual, smaller language models. Is it necessary to scale large because you want to show: I'm better than ChatGPT? Or is it also an option to say: okay, we are better in small scale, medium scale, and large scale, maybe, maybe not, but we are also fine with this?

Both. We want to do all of that, but first of all we want to show that along the scale we are everywhere better. If we have shown this, we can say: hey, our small model is perhaps better than your guys' large model. But there is another big, big difference.

Our methods are based on LSTM technology, and LSTMs have memory. The Transformer has to store all the words, all the tokens it has seen, and also for prompts it has to store all the prompts, and there was this thing called prompt engineering. Yes. And we will introduce a completely new thing: you can directly modify the memory. Perhaps there is a memory cell that tells you how friendly the AI is, and you can make it more friendly or less friendly. Or you have a neuron which says: is it talking more about leisure and hobbies, or is it talking more about work, is it talking more technically, or is it talking more on a high level? Until now everybody has done prompt engineering; we bring a completely new thing into the game. You can still do prompt engineering: you write prompts, let the xLSTM run over the prompts, and then you have filled the memory. But you can directly modify the memory, and hopefully we will see some memory components where we can fine-adjust the whole AI to some specific, user-desired properties. And it opens up a completely new field.
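What "directly modifying the memory" could look like is not spelled out in the episode; here is a deliberately tiny toy sketch of the idea. It assumes a recurrent model whose state is a plain vector, and it assumes, purely hypothetically, that one coordinate of that state has been identified as tracking "friendliness". Nothing here reflects xLSTM's actual state layout.

```python
import numpy as np

# Toy recurrent model: a fixed-size state vector instead of a growing token
# cache, which is what makes the state directly editable.
rng = np.random.default_rng(0)
W_rec = rng.normal(size=(8, 8)) * 0.1  # recurrent weights (random toy values)
W_in = rng.normal(size=(8, 8)) * 0.1   # input weights (random toy values)
state = np.zeros(8)                    # the model's memory state

def step(state, token_embedding):
    """One recurrent update: fold the next token into the fixed-size memory."""
    return np.tanh(W_rec @ state + W_in @ token_embedding)

# Prompt-engineering route: feed prompt tokens and let them fill the memory.
for tok in np.eye(8)[:3]:
    state = step(state, tok)

# Direct-edit route: if coordinate 5 were known to encode "friendliness"
# (a made-up assumption for illustration), it could be adjusted directly,
# instead of crafting a prompt that nudges it there.
FRIENDLINESS = 5
state[FRIENDLINESS] = 0.9
```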

Let me ask a question on that, Sepp. I'm personally not a big fan of humans needing to study for days or weeks to do prompt engineering. As long as it stays within an engineering and coders' community, that's understandable, but I believe we need to get to a point where the large language models try to understand humans, start communicating like humans. Is there some potential in what you were just explaining that we're going to be able to communicate with the model, and that it's going to ask us questions, like I do now with you and with Robert and with Albert?

I don't know, because in principle the high-level method is the same: it's predicting the next word, it's memorizing what was already there, and the next word is given with some probability. Things like hallucinations and so on still remain, because it's inherent in the technology. But via this memory you can better steer your counterpart; you know whether it misunderstood things, perhaps you even have some knobs and buttons you can push to make the AI understand more technical things, or whatever. This might be possible via the memory. But the principle is the same as with the existing technology. We don't change this; we do the same, but better, also better modifiable via the memory, faster, with less compute. But the principle is the same.

I was going to ask exactly that, to clarify one more time, on a number of points, what 'better' means. We understand: faster, better reasoning, semantics; less power means less money, so it's going to be cheaper, and it's going to consume less power, so it's going to be good for the climate as well?

Yes. But what for me is the main thing is: how good am I at predicting the next word. There is this measure, perplexity, and if a large language model is good at predicting the next word, in terms of perplexity or a likelihood or whatever, you're better at different tasks: question answering, solving some math task, or whatever. That's our main measurement, because it's the main measurement for all other methods.
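For reference, the perplexity Hochreiter mentions is the exponential of the average per-token negative log-likelihood; lower is better. A minimal, generic computation, assuming you already have the probabilities a model assigned to the true next tokens (not tied to any particular evaluation suite):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A model that assigns higher probability to the actual next words scores lower:
print(perplexity([0.5, 0.4, 0.6]))   # ~2.03  (better)
print(perplexity([0.1, 0.2, 0.15]))  # ~6.93  (worse)
```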

Telling last time that your guys your students guys and girls around you are all writing Cuda code um and I assume that that’s still going to be the standard but if you’re going to be you know at a certain level faster are we going still going to need the very fast

Best top Hardware do you say well you know maybe there’s going to be another market for the middle segment not $30,000 a piece uh graphics processor but less than that but I would say uh first of all for training we need some fast Hardware because we have to push

All the big text corer through the model and and learn on that but also in inference in application if you have to deal with many users at the same time and stuff like this we probably also need it what’s also better with this memory as lsdm memory for long context

You don’t have to store the original tokens you’re more memory efficient meaning you can run this model perhaps on smaller devices perhaps on your cell phone with less memory this might be also the case yes but I still would bit on the Kudo coils on the Nvidia gpus uh

We can go to AMD and other uh techniques but this is yeah somehow a standard for deep learning days yeah maybe also from my perspective um the hardware thing at the moment is how can we be fast uh and not only fast by Hardware but also fast
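The memory argument can also be put in rough numbers: a Transformer's key-value cache grows with every token of context, while a recurrent state has a fixed size. The dimensions below are generic Transformer-style placeholders chosen for illustration; the actual size of an xLSTM state depends on the architecture.

```python
BYTES = 2                      # fp16
layers, heads, head_dim = 32, 32, 128
d_model = heads * head_dim     # 4096

def kv_cache_bytes(n_tokens: int) -> int:
    """Keys + values stored per layer for every token seen so far."""
    return 2 * layers * n_tokens * heads * head_dim * BYTES

def recurrent_state_bytes() -> int:
    """One fixed-size state per layer, independent of context length
    (d_model is a placeholder; real state sizes vary by architecture)."""
    return layers * d_model * BYTES

for n in (1_000, 32_000, 128_000):
    print(f"{n:>7} tokens: KV cache {kv_cache_bytes(n) / 2**30:5.1f} GiB, "
          f"recurrent state {recurrent_state_bytes() / 2**20:.2f} MiB")

# At 128,000 tokens the cache here is ~62.5 GiB, while the recurrent state
# stays at ~0.25 MiB regardless of context length.
```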

Maybe also from my perspective: the hardware question at the moment is how we can be fast, and not only fast through hardware but also fast through knowledge. The researchers are focused on, and used to working with, the Nvidia hardware and infrastructure, which makes it simple to start training fast, and time is a critical factor in this whole case at the moment. That is why we decided to go this way, simply because of time. I believe that there will be new technology, and I believe we will also do research on how to use this new technology, but at the moment it would cost us time to focus on that. That is, I think, the simple answer: at the moment Nvidia is the way everybody goes to be fast and efficient. And you can look at efficiency in different ways, but if it's about time to market, this kind of efficiency is well covered.

The other question you asked, Robert, about small and mid scale for industry: we didn't discuss so far what we are planning and doing at NXAI besides xLSTM. Of course we are starting with xLSTM; our core is to do R&D, research and development, around AI, and xLSTM is a huge thing we are doing, and everything is focusing on that. But besides that, and this is important, we are also building a transformation layer from this basic research-and-development technology, the foundational model, the new large language model, to industrial, vertical applications. So we are very open, and it is our focus, in parallel to this R&D part, to also work on how this technology can be used in the industries. And for that, of course, small and mid scale are very important: we have to be better also at small and mid scale, because a lot of applications are based on that.

Absolutely, absolutely. Can you tell me how much the investment in your company was, can you share this with us?

We cannot, or we do not want to, share a number. Of course we could, but we share enough. I hope so!

I hope so, yeah. But it's a lot, we invested a lot. At the moment we are training on more than a thousand GPUs, day by day, so if you calculate that over some months, you get to a nice amount. But this is only the first small part, right? The next part is that we go deep to build products out of it, and we are also going into more research, not only into xLSTM but also into how to use it and bring it to other areas, where we are building up a research lab together with Sepp. So these are our focus topics: on the one side research, on the other side bringing it to the industry, partnering with industry, and keeping it in Europe, trying to set up a second or third pillar of European AI technology with a new foundational model.

Great. So, Albert, do you have first ideas on the ROI? We understand it's a huge investment, so what are the ideas you maybe have today, without details, on how money is going to flow back to you? I mean, are we talking open source, are we talking IP, are we talking potential contracts with, you know, industrial companies? What about end users? Do you have initial ideas on how the money may flow back towards you?

Maybe some words around that. When we started this, Sepp was in the situation that he said he needs a lot of money, and he cannot make sure that everything works. It's research, right? We do not know what comes out, and you cannot call a venture fund and ask for some millions, or a lot of millions, when you do not have a clue about the return on investment. This is why it is a very special situation to have this combination of the three of us: Sepp with his team, as one of the most valued researchers in this area; Pierer, who are really open-minded and not eager to have a business model from the start; and on the third hand the Netural companies from my side, where we are focused on being able to bring this to industry as digital products. The return on investment will come: there will be different income streams in the future for sure, if we have a great model. So our focus at the moment is really not the business model but the great model, and the first transformation from the model to products. What the models for a return on investment can be: on the one hand it's of course licensing the technology, on the other hand it's of course having our own products out of it, where we can make money. Whether it will be open-sourced or not we will discuss, we are in discussion; I think we will have much more information on this question in two months. And if we look at the timeline: we started to discuss, I do not know, Sepp, in September, right? In December we founded the company, with the money we need, and we already started to train on the 1st of December. In March we will for sure have a good clue about what we will monetize first with the technology. But the core is research.

I want to give you one application I would love to do, especially with xLSTM, because these large language models are thought to contain all the knowledge which is on the internet, all human knowledge, at best.

I would love to build something for companies, where each company has a large language model, and this large language model knows everything, all the knowledge in the company. It knows how the taxes were last year, did the supplier deliver on time, does the supplier meet certain conditions, what about the prices, where is this machine located, where is this specific screw in my company, when does this person have holidays, is this person available, is my car, my truck, whatever, available, and so on. And I have it all on my cell phone, and I know everything. The problem all the companies have now is: there is one expert, perhaps it's a tax expert, or an expert on one machine or whatever, and if this person leaves the company, it takes a long time until the next person is up to speed again. But if you have all this knowledge on your cell phone, you can ask everything. A new customer comes in, and you can ask: hey, do we know this customer, where were the problems, what did they complain about? And you get it immediately: they were always unhappy about this and that, and you can address everything. On the cell phone, everything about your company: where is the last sheet of paper, where is every screw, where is a device, logistics, when does the next pallet of plastic arrive, or whatever. You have one thing, and all the knowledge of your company. Hey, isn't this cool?

But there is even more possible. Two episodes ago we had an episode with Beckhoff Automation, a company from Germany: they are building their HMIs on the machine with a large language model. So you tell it: hey, give me something about the pressure, give me something about, I don't know, the output, and then the large language model builds a first draft of an HMI. HMI means human-machine interface, so the cockpit, right?

That's coding, and all these language models are very good at coding, you can always make something like that. You can also make a first design, a rough design, you can design a rough workflow. You have a new thing, you want to have a workflow, or you have a new PR initiative going on, and it can make a timeline for your PR initiative. You can do so many things.

Yeah. I have the last question: a few weeks ago there was NeurIPS in New Orleans. You were at NeurIPS in New Orleans, and there were all the Transformer guys. What did they say?

Yes, we had many papers there, but of course we didn't have xLSTM there.

But they had already heard that you were working on xLSTM? Yes, that was a problem, you're right. But we did not tell them, because otherwise OpenAI or Meta would grab it, the smart guys would immediately look into it. But I mentioned something in a workshop, and then many guys ran up to me and said: hey, what is it about, when does it come out? Among them the state-space guys who developed Mamba, which is also now a competitor to the Transformer, and we are looking very, very deeply into Mamba, because if we bring something to the market, we want to beat everything. So the Mamba guys jumped at me: hey, can we invite you to a talk, what is it about? They're a little bit scared about what's coming out. Many guys, also guys from Jürgen's lab, asked me: hey, Sepp, is it about this? They tried to lure out some information, 'I have some hypotheses', and so on. Yes, there were a lot of rumors at NeurIPS.

Okay, so in March we will hear more, about the small and medium scale. All the best for you guys, thank you very much for this update on xLSTM, and greetings to Linz. Thanks Sepp, and thanks Albert. Sure, thank you very much too, Robert and Peter. Thanks guys, have a nice day. Thank you, thank you, bye-bye. Bye-bye.

1 Comment

  1. Will xLSTM solely focus on the "next-word guessing" technology, or do we have a chance to see it back in the forecasting of time series? The climate science world greatly needs one.
