Join Global AI United Kingdom for an insightful event focusing on harnessing the power of Microsoft Azure for AI in the cloud. Dive into how Azure’s robust cloud computing capabilities can elevate your AI projects, from data processing to deploying cutting-edge AI algorithms. Whether you are a data scientist, AI enthusiast, or technology professional, this event will provide valuable insights into leveraging Microsoft Azure for machine learning and data science applications. Engage in discussions around the latest advancements in artificial intelligence, and explore how Azure’s tools and services can streamline your workflows and enhance AI development. Don’t miss this opportunity to connect with like-minded individuals and discover the endless possibilities of integrating AI with cloud computing.

    Thursday 20th June – Season of AI

    18:00 BST Kick-off & Introductions – Gavita Regunath, Luke Menzies & Tori Tompkins

    18:15 BST Beginner’s Guide to Azure AI Studio – Gavita Regunath

    Dive into Azure AI Studio – a cutting-edge ‘code first’ experience for building generative AI copilots. You will start by touring the studio experience, moving on to creating a RAG model, exploring other multi-modal models and wrapping up with the importance of evaluations and deployments. Join us and harness the power of Azure AI Studio to transform your ideas into reality!

    19:00 BST Turn your RAGs to Riches using Azure OpenAI! – Alex Billington

    A vast treasure of knowledge is contained within all of your documentation, but trawling through 10,000+ documents to find what you need can be a daunting task. But have no fear! Using Retrieval Augmented Generation and Microsoft's Azure OpenAI, you can quickly and easily leverage the power of Generative AI to make sense of all the information locked away in those pesky PDFs!

    In this talk I will cover the basics of Generative AI, Large Language Models, and vector databases, before demonstrating how you can quickly and easily build a RAG implementation using industry-leading tools: Azure OpenAI, PromptFlow and LangChain!

Hello, I think we might be live. I'll give everyone a couple of minutes to load in before we get started. In the meantime, how has your day been, both of you?

Yeah, good. How are you, Tori?

All good, although the football's on, so this stream was poorly timed. How are you, Luke?

I'm good. Same old, missing the football. Is it good weather where you are? It's nice and warm here. Way further up north, this is broadcasting to a screen in my local pub, and I'm really excited because this is our first user group going live today.

We should be streaming to both YouTube and LinkedIn, so if anyone wants to leave a comment to confirm it's working... okay, we can safely say YouTube is working, and LinkedIn works too. So the technical difficulties are out of the way and everything should now be smooth sailing. Shall we kick off?

Thank you everyone for joining. This is Global AI UK's very first meetup, so it's super exciting for us. The plan is to do this every three months, four times a year, with a bit of a mix of streaming and in-person events. Welcome, everyone, to our very first meetup.

Before we start, it might be nice for you to meet the people who run the UK meetup. Gavita, do you want to introduce yourself?

Sure. I'm Gavita, Head of AI at Advancing Analytics and a co-host of Global AI UK. I'm an AI MVP as well as a Databricks Champion. That's a quick introduction to me; I'll pass you on to Luke.

Hello everyone, I'm Luke, Principal AI Consultant at Advancing Analytics. I've been around a bit: I started off doing a PhD in nuclear physics, then moved away from that and eventually found myself here. That's pretty much me. I'll pass you back over.

Thanks. I'm Tori, also with Advancing Analytics, where I'm a Senior AI Consultant, and a co-host. I also do a little bit of podcasting and a bit of charity work. We thought it would be nice for everyone to see our faces at least once, for the first event, but if you want updates on the meetup it's always good to follow at least one of us; we all post quite often on LinkedIn.

You say you do a bit of charity work; surely not charity work for Advancing Analytics? That's what I was asking: a bit of charity work, and then spending time with you, Gavita.

We have ten minutes before the first talk starts, so it might be nice to go through today's talks. We have our very own Gavita giving the first session of the first meetup, talking to us about Azure AI Studio, and we also have a guest, Alex Billington, talking about RAG with Azure OpenAI. Gavita will be on at quarter past, which makes me realise I was never going to fill a fifteen-minute intro; next time I'll make it five minutes. And then Alex will join us at seven, so we've got two great sessions on.
We've got a little time until you kick off, Gavita.

I'm happy to kick off early, because I have a feeling my slides will take slightly more than 45 minutes. Or we can talk about what we think the trends are at the minute, off the back of Build.

What was your favourite Microsoft Build announcement, then?

Microsoft Build feels like ages ago, and since then we've also had the Databricks Data + AI Summit. Things are progressing, especially in the generative AI field, and I think what's coming is agents. It's all about agents; that's what I'm hearing more and more, and it's what I've said all along. I think agents will develop and mature to the point where we start incorporating them into frameworks, and that's hugely exciting.

I remember you mentioning, about a year ago now, that agents were going to be the next thing, and I was sceptical. Then Build announced the Azure OpenAI Assistants API, I think it was called, and last week the Databricks keynote announcements were in favour of agents as well.

Absolutely. At Build there were also a couple of talks on AutoGen and how people have been implementing it; AutoGen is Microsoft's framework for implementing agents, and it's come along leaps and bounds. I've got a question in the chat here: Alex B is asking for my thoughts on AutoGen Studio. I used it when it first came out; I think it's version two now, or maybe beyond. Version one was certainly exciting, but I'd say it's not quite there yet for development work and productionising agents.

What is AutoGen, Gavita?

AutoGen is Microsoft's framework for implementing agents, and it's pretty cool. Have a look at it: there's a whole GitHub repo and full documentation with instructions on getting started and all the different use cases. My typical advice is to allocate a sufficient amount of time, because you'll get sucked into reading it like I did and spend three hours just understanding it and playing around with it. It's super exciting.

Sounds like a future talk. That was my dramatic pause. I'm super excited to see how it evolves. What about you two? Favourite announcement from Build?

My favourite, and the thing I've already got stuck into, is the API Management integration with Azure OpenAI. I think it's a leap towards LLMOps, or FMOps if you have a preference of terminology: load balancing, caching, all of that in APIM. That's what I was most excited about.

What about you, Luke?

Is it a bit cheeky if mine is from the Data + AI Summit? I'm just relieved that Lakehouse Monitoring is finally in GA.

Right, yes, of course; that makes monitoring your models in Databricks a lot easier. I've been waiting ages just for that. Has anyone in the comments got a favourite announcement from Build or from the Data + AI Summit?
There were tons, really. There were loads. I think what's also coming up is the use of small language models; we've seen that with Phi-3 and multimodality, which has been really effective and really cool. There's a lot coming, all to do with AI and generative AI, which is super exciting.

Good timing, because we have two generative AI talks today. Exactly; we didn't plan that on purpose, but it worked out nicely. We've got three minutes; hopefully a few more people will join and then we can get you kicked off, Gavita. This is the point in a traditional meetup when you would be getting your beer, which is probably why I allowed fifteen minutes. Note to self: next time the intro needs to be five minutes, so it isn't ten minutes of awkward conversation on air.

Awkward? I don't think this is awkward. Have you ever played around with Stable Diffusion models?

Mine's Midjourney; probably not the right thing to say here, so that doesn't count. I've noticed since you acquired the licence recently there have been more embarrassing pictures appearing on the company chat.

I don't know what you're talking about, Luke. Anyway, it would be good to ask the audience who've joined us today: has anyone used AI Studio before? Has anyone heard of AI Studio? Is anyone using it actively to develop large language model applications? It would be cool to understand what people are doing with it. If you haven't heard of AI Studio, you're in for a treat; but if you're already using it, it would be great to see some comments come through about what you're using it for.

While that's coming through, we should set up your slides, Gavita, and make sure they're working. Sorry, it sounds like a race track outside my window; I'll mute myself. It's going to tick over to 6:15 any second, so Luke and I will retire backstage and you can take the floor, Gavita.

Thank you. We're bang on time now, so I'll get started. The title of my talk today is a beginner's guide to Azure AI Studio. If you've not heard of Azure AI Studio, you're in for a treat; and if you have, there may be certain elements throughout this talk that you can pick up, try out and find useful.

So, what is AI Studio? AI Studio offers a unified platform for all the large language application developers out there looking to build generative AI apps. It provides capabilities for data and search, for example; it has a catalogue of foundation models that you can pick and choose from to start developing your large language applications; and Microsoft, as we know, are very hot on safety and responsible AI, so a lot of that is baked into Azure AI Studio, which we'll also talk about. What I love about AI Studio is how it lets developers use Prompt Flow, for example, to build the end-to-end lifecycle of LLM applications, and it gives you the ability to do LLMOps and model monitoring too. In terms of providing a unified platform, AI Studio is pretty good, and you'll see through the demos that it gives you an entire end-to-end experience for developing generative AI apps.
Let's start with a quick demo and an introduction to AI Studio. When you open AI Studio, this is what it looks like; I've got a little pre-recorded demo that I'm going to play. In the left-hand pane you have a load of options. You can go to Settings, and within Settings you'll see all your information: your Azure OpenAI resource, the connections you have with AI Search, for example, the compute instance and runtimes you're running, and even the project members working on the platform.

It also lets you look at deployments and deploy models; here we're deploying a Davinci model. The playground is pretty cool: you can play around with large language models, type questions, get answers, and experiment with different models. There's also the "add your data" functionality, which we'll see in another demo: you can bring your own data into Azure OpenAI and ask questions over it, and it will only answer based on the data you've embedded. Here we're being a bit cheeky and asking a question in Greek, which I think is pretty cool, and you'll see that it immediately understands what you've asked, even in a different language, and is capable of giving the response back in English. The system message gives you flexibility to shape responses; for example, here I've said "you're friendly and funny", so everything in the answer should be friendly and funny, and as you can see, it does exactly what you've asked.

What's also cool about Azure AI Studio is manual evaluation. I know I'm going through this quite quickly, and we have demos after these slides showing how we carry out manual evaluation, but it's a pretty cool feature, especially when you want to know how each model is performing. You can come into AI Studio, put in your own dataset and, depending on what you're doing, have it help you evaluate your large language models. What I like is that during manual evaluation you can give a thumbs up if you agree with an output and a thumbs down if you don't, and it shows you the stats at the end, so you know exactly how your model is performing. So that was a quick introduction to AI Studio.
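For anyone who wants to reproduce the playground behaviour from code rather than the UI, here is a minimal sketch using the openai Python SDK against an Azure OpenAI deployment. The endpoint, key and deployment name are placeholders, not values from the demo:

    # Minimal sketch of the playground's system-message behaviour.
    # Endpoint, key and deployment name are placeholders.
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint="https://<your-resource>.openai.azure.com",
        api_key="<your-api-key>",
        api_version="2024-02-01",
    )

    response = client.chat.completions.create(
        model="<your-deployment-name>",  # the deployment name, not the base model
        messages=[
            # The system message steers tone, as in the "friendly and funny" demo
            {"role": "system", "content": "You are a friendly and funny assistant."},
            {"role": "user", "content": "What can I do in San Francisco?"},
        ],
    )
    print(response.choices[0].message.content)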
What we're going to talk about next is data: how we connect data and how we use the data and search functionality within AI Studio. We all know data is the fuel that powers AI; if you don't have data, there's absolutely no point building large language model applications. It goes back to the saying you've probably heard, garbage in, garbage out: it's so important, especially with large language models, to feed in good-quality data so that you get good-quality output.

We've now worked on quite a number of large language model applications, and the most common problem we see with customers is that the data they feed into the model is often poorly formatted and very unclean, which leads to subpar outcomes. Terrible outcomes, in fact, and then they blame the large language model. So what I'm really saying is that data is really important. What Microsoft has done is give us the ability to connect all your different data sources, and the great thing is that it integrates structured and unstructured data. As you can see, you've got Blob Storage and loads of other options, and the goal is that whatever you feed in as input, you can do what you need to do with it, which we'll cover on the next slide, before it goes into the large language model.

When it comes to making sure your input is clean and processed before you feed it to your model, it all boils down to how well you chunk your documents and which embedding model you use to embed them. So what do I mean by chunking and embedding? On this slide I've taken the Big Book of MLOps as an example document. Chunking is, as the word describes, dividing text into chunks; you can choose different chunk sizes, say 512 words. Then you need to convert the text into a format that you can compute on, and that's what embedding does: it takes a series of texts, converts each into a vector of numeric values, and stores it in a vector store. The reason is that searches across numbers are far more effective than searches across raw text.

One of my favourite diagrams for understanding how embeddings work is the one on the screen. You have a few words, cat, kitten, dog, houses; they are text, but a good embedding model converts each into a vector of numbers, which then gets stored in a vector database. If you then came along and searched for "queen", the search would look into your vector database and do a distance check, because everything is numbers, and figure out that "queen" is very closely related to "king", and therefore relevant. In general, that's how chunking and embedding work.

It doesn't end there, though. Once you've chunked and embedded your documents into a vector store, you rely on vector searches, and within Azure AI Search you have a few options: vector search, which looks across a series of vectors and gives you the closest ones; keyword search, which looks at your database and extracts the closest keyword matches; and reranking on top of those. The plot on the right-hand side shows that the hybrid search plus semantic ranker
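To make the chunk-and-embed idea concrete, here is a hand-rolled sketch: naive fixed-size chunking, Azure OpenAI embeddings, and a cosine-similarity "distance check". In practice Azure AI Search does all of this for you; the file name and deployment names below are placeholders:

    # Chunk -> embed -> nearest-neighbour search, by hand, to show the idea.
    import numpy as np
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint="https://<your-resource>.openai.azure.com",
        api_key="<your-api-key>",
        api_version="2024-02-01",
    )

    def chunk(text: str, size: int = 512) -> list[str]:
        # Naive fixed-size chunking by words; real pipelines also overlap chunks
        words = text.split()
        return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

    def embed(texts: list[str]) -> np.ndarray:
        # "<embedding-deployment>" stands in for e.g. a text-embedding-ada-002 deployment
        res = client.embeddings.create(model="<embedding-deployment>", input=texts)
        return np.array([d.embedding for d in res.data])

    chunks = chunk(open("example_document.md").read())  # placeholder file
    vectors = embed(chunks)
    query_vec = embed(["Which word is closest to king?"])[0]

    # Cosine similarity: the "distance check" described above
    scores = vectors @ query_vec / (
        np.linalg.norm(vectors, axis=1) * np.linalg.norm(query_vec)
    )
    print(chunks[int(np.argmax(scores))])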
available in Azure AI Search is very good at producing high-quality outputs compared with the other search types: plain vector search or keyword search alone is not as good as hybrid search with the semantic ranker. These results were all benchmarked against the same documents, chunked at the same size, 512 tokens with 25% overlap, and using the same embedding models, so it's a like-for-like comparison. To reiterate: if you're doing a search within Azure AI Search, it's highly recommended to use the hybrid search and semantic ranker to get very good results.

Right, so we've spoken about how you get your data, how you chunk it and how you embed it; let's look at how you add your own data in AI Studio. I've got a little demo for this. You click "select your data" and upload it; here we're uploading a series of markdown files. We simply select all those files and go through the options. For good housekeeping, give it a name so you can identify it easily. Here I'm checking that the markdown files are exactly what I anticipated, and then I'm going to create an index from all the markdown files that were uploaded. It's quite a simple process: you click which search service you want to use, select an Azure OpenAI resource, go through the next phases and give the index a name you'll recognise, again for good housekeeping, and it's as simple as that to create an index over all the documents you uploaded. What you can see here is the workflow: Azure AI takes all those documents, chunks them up, generates your embeddings and gives you your index at the end of it.

Once that's done, you can test it out by adding your own data in the playground. To do this, I'm selecting the data source, pointing it at the index we just created from those markdown documents, selecting an embedding model, and simply going through the steps. This step is all about mapping your data fields, and then you've got the different search types we spoke about; as I said, hybrid plus semantic will give you the best performance, so we select that. And it's as simple as that: you can now talk to the documents you've uploaded and indexed, right in the playground. Here we've asked a question about products that are suitable to use in San Francisco, and what I love about this is that it gives you all the answers, but not only that, it also gives you the references those answers came from. For example, the answers here are referenced back to a particular markdown file, so you can click on it, investigate, and verify that's exactly where the information is coming from. So that's really good.
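Outside the playground, you can query the same index with the azure-search-documents Python SDK. A hedged sketch follows; the service, index, key, semantic configuration and field names are all placeholders that depend on how the wizard built your index:

    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    search = SearchClient(
        endpoint="https://<search-service>.search.windows.net",
        index_name="<index-name>",                      # the index created above
        credential=AzureKeyCredential("<query-key>"),
    )

    # Semantic ranking on top of the index, mirroring the recommended setup
    results = search.search(
        search_text="What products are suitable for San Francisco?",
        query_type="semantic",
        semantic_configuration_name="<semantic-config>",  # defined on the index
        top=5,
    )
    for doc in results:
        print(doc["content"])  # "content" is a placeholder field name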
Okay, so we've spoken about how Azure AI Studio offers a unified platform, how you make sure your data is clean and structured and has been chunked and embedded into your vector store, and how you do AI searches within AI Studio. Now comes the fun part: foundation models. Within Azure AI Studio we have a choice of many different foundation models: you've got GPT-4 Turbo, multimodal large language models such as GPT-4 Vision, and GPT-4o, which was recently released and is very fast and very performant, plus loads of other models you can choose from to start developing with.

Within Azure AI there are lots of questions around: is my data secure? Is my data safe? Is OpenAI using my data to retrain foundation models? Is my data protected? The answer to all of those questions is yes, yes and yes: your data is your data, it is not used to train any foundation AI models, and it is protected by the Microsoft enterprise compliance and security controls that run across the Azure infrastructure.

Okay, now moving on to quite a fun topic: RAG versus fine-tuning. We get asked this a lot: what's the difference, and when would you use one over the other? For context, RAG stands for retrieval augmented generation, and it's pretty much what we were just talking about: incorporating data directly into your prompts without retraining your models. RAG is very efficient and very cost-effective, because you don't need to take a model, fine-tune it and redefine its weights; it's a relatively straightforward process. The demo we showed a couple of slides ago is effectively your RAG process: taking your documents, uploading them, indexing them and searching over them. Fine-tuning, on the other hand, requires additional compute to adjust the model weights, and you need lots of very good data to do it well, so it's more advanced and more expensive. But you've got options to do both in Azure.

One slide I like to share is about how you choose between the two techniques. We use a series of questions. For example, do you need external knowledge? If you do, you probably want RAG. If you have a model whose behaviour you'd like to change, so that it responds in a certain way, you're probably looking at fine-tuning. On hallucination: RAG definitely minimises hallucination; it never gets rid of it entirely, but it minimises it, and it also gives you ways to validate and be comfortable with what comes out of the large language model. Fine-tuning can help reduce hallucination to a certain extent, but if you don't have enough data it will fall off the edge and act exactly as a normal pre-trained large language model would with a lack of data. So we've got a series of questions and criteria that we use to assess whether to do RAG or go down the fine-tuning route, and as I said, Azure AI Studio offers both RAG and fine-tuning for whatever applications you wish to develop.
Okay, let's talk about all the different models available within Azure AI Studio. As I said, you have loads of options. Here we're looking at Azure OpenAI models, but you also have the option to deploy Hugging Face models, for example; there are thousands of different open-source models available in the model catalogue, and more recently Phi-3 has been added as well. If you want to play with those kinds of models, you can go to the catalogue, deploy them, and start experimenting.

Beyond the model catalogue there's also model-as-a-service, which goes a step further: it manages the entire infrastructure for you. You can deploy, for example, Mistral's premium model or Meta's Llama 2 model, Microsoft looks after all the infrastructure, and it simply offers you an API with pay-as-you-go billing based on the tokens you use for that particular model. It's quite cool and quite easy; it's maybe three clicks to start using all these different models. It also gives you an easy way to integrate them, whether you're using Prompt Flow for orchestration, or Semantic Kernel, or doing more back-end development with LangChain; it gives you loads and loads of flexibility. For example, you can deploy Llama 2 and go a step further and fine-tune it without provisioning GPUs, which is honestly great, because GPUs can be quite hard to get at times. So model-as-a-service gives you the flexibility to deploy a Llama 2 model without worrying about infrastructure or spinning up GPUs, and the aim is to let all of us, developers and customers alike, integrate as many AI capabilities into our applications as possible.
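The transcript doesn't show the call itself, but once a pay-as-you-go endpoint exists, consuming it is a plain REST request with the key from the deployment page. This is a hedged sketch only: the exact URL path and payload shape vary by model family, so copy the real values from your deployment's sample code:

    # Hedged sketch of calling a model-as-a-service (pay-as-you-go) endpoint.
    # URL, key and payload shape are placeholders; check the deployment page.
    import requests

    endpoint = "https://<your-maas-endpoint>.inference.ai.azure.com/v1/chat/completions"
    headers = {
        "Authorization": "Bearer <your-api-key>",
        "Content-Type": "application/json",
    }
    payload = {
        "messages": [{"role": "user", "content": "Summarise RAG in one sentence."}],
        "max_tokens": 128,
    }
    print(requests.post(endpoint, headers=headers, json=payload, timeout=60).json())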
Okay, so within Azure AI Studio we've spoken about all the models available to us, but how do you know which model to use? How do you know which model will be better than another one you've heard of, when you're experimenting and not really sure which to pick? There's model benchmarking within Azure AI Studio, which is pretty cool, and in the next demo we'll see how you'd use it to decide, if question answering is your task, for example, which model to implement for your development.

Let's have a quick look at the demo. First, model benchmarking: I go to Explore and look at the benchmarks. If you choose question answering, you can select all the different models you're considering for your project and compare them, and it plots them as a benchmark for you. That gives you a really clear idea: right, I'm going to use GPT-4 32k because it performs better on this score.

Now let's talk about deploying a model. As we said for model-as-a-service, you simply click the model you wish to deploy, and within two clicks you can deploy the Llama 2 7-billion-parameter model as a pay-as-you-go model. Here I'm searching for and selecting the project; AI Studio then sets it up and looks after it for you. You simply click subscribe-and-deploy and it does all the back-end work, deploying the infrastructure and managing that side for you. It really is that quick and easy to get a Llama 2 7B model deployed as model-as-a-service. You'll see the endpoint being created; it can take anywhere from a couple of seconds to a couple of minutes, and that's it. You get the API key if you want to develop against it in the back end, and you also have the option of playing around with it in the playground, as I've just shown.

Now I'll show how you do fine-tuning. We deployed a Llama 2 model, but suppose we want to fine-tune it. For housekeeping, give the job a good name. It's very simple: I'm choosing text generation as the task, and you upload your own training data, which has to be in JSON Lines format. It identifies your JSON, and then you simply map the fields: you make sure the prompt column is mapped to the right column in the file you uploaded, and then it's just a matter of clicking next. Then there's validation data: when you're fine-tuning, you need validation data to see how accurate the model is and how it performs. You have a couple of options: it can split your training dataset, or, if you have your own validation dataset, you can upload that, as shown in the demo, and again simply map the relevant data columns to the fields. Then you set the learning rate and epochs; you tend not to change those and just go with the defaults. And it's as easy as that: you've got your training dataset and your validation dataset, and it will train and deploy your fine-tuned model. You can look at the metrics to evaluate how well that model is doing; there are loads of metrics to help you decide whether you need to choose another model to fine-tune or whether this one is good enough to go with. Then we can deploy it and get an endpoint. It's the same as the previous demo: give it a name, click deploy, and your fine-tuned model gets deployed in the same way as any model-as-a-service model. Finally, you can test the fine-tuned model in the playground. Because we trained it for text completion, what I'm expecting is exactly what you see: it just completes text.
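For reference, the training file mentioned above is one JSON object per line (JSON Lines). The exact field names depend on the columns you map in the wizard; "prompt" and "completion" here are an assumption, shown purely to illustrate the shape:

    {"prompt": "Customer asks: how do I reset my password?", "completion": "Go to Settings, then Account, then Reset password."}
    {"prompt": "Customer asks: what is your refund policy?", "completion": "Refunds are available within 30 days of purchase."}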
You can do more advanced things, but that's a quick demo of using model-as-a-service to deploy advanced models and to do fine-tuning.

Now I've got another quick demo, all about utilising the power of multimodality within Azure AI Studio. What do I mean by that? Let's have a look. Multimodality is pretty cool; I've spent a lot of hours playing around with this. We're going to go to the playground and work with images. With images you can choose DALL·E, which I've spent hours playing with, and generate lots of cool and exciting pictures; you simply put your prompts in, generate loads of images, download them, and create variations. But what's really cool is that in the playground you can also upload an image; here we're uploading a picture of San Francisco. You can't really hear it in the recording, but you can speak to it as well: it converts speech to text, which is where the multimodality comes in. You can ask it for an itinerary of what to do in San Francisco; it recognises the speech, recognises the image, and plans out an itinerary for you appropriately. What you can also do, which is pretty cool, is enable the Vision option. That taps into the Azure AI Vision services, and it can then do image segmentation for you: here we've uploaded the same picture, but you can now see it being highlighted, and if you click on a highlighted link it shows you exactly where in the image the bridge or the skyscrapers appear. So there are some pretty cool things you can do here, and they're quite easy to do.
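The image-generation side of that demo maps onto one SDK call. A minimal sketch, assuming an Azure OpenAI resource with a DALL·E deployment whose name, endpoint and key below are placeholders:

    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint="https://<your-resource>.openai.azure.com",
        api_key="<your-api-key>",
        api_version="2024-02-01",
    )
    image = client.images.generate(
        model="<dalle-deployment>",  # placeholder DALL-E deployment name
        prompt="A watercolour of the Golden Gate Bridge at sunset",
        n=1,
    )
    print(image.data[0].url)  # temporary URL of the generated image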
All right, so we've spoken about the platform itself, how you prepare your data, how you'd use Azure AI Search to do hybrid vector search; we've tapped into foundation models, how you use model-as-a-service to deploy the likes of your Llama and Mistral models and how you do fine-tuning; and we've looked at multimodal large language model applications. Let's now talk about safety and responsible AI within Microsoft. Microsoft have held these principles for a long time: even in traditional machine learning you have evaluation and responsible AI behind your models, and Microsoft have done a pretty good job of transferring those learnings to large language models, with things constantly being developed. Principles like fairness, reliability, safety, privacy and security are all incorporated into Azure AI Studio, and within that you've got Azure AI Content Safety, which is fairly new, for understanding the inputs to and outputs from your large language model. If you're developing LLM applications with AI Studio, I highly recommend investigating Content Safety: it's very easy to use, and it's a tool we use in every single development we do.

Okay, we've spoken about quite a lot of things now, but how do we put them all together, and how do we perform LLMOps and model monitoring? AI Studio, as you can imagine, gives you that capability through Prompt Flow, which orchestrates everything for your AI applications quite easily. I've got a little demo; let's have a look. Within Azure AI Studio we're going to view a flow developed in Prompt Flow. What you see is a graph showing all the different elements of the flow, and it's quite easy, as you can see on the right-hand side, to go in and change things ever so slightly. You can have many different prompt variants quite easily; you can even get it to generate different prompt variants for you, all inside Prompt Flow. It gives you a lot of flexibility: you've got placeholders in your flow, you're wiring up all your inputs and outputs, and you can run a batch evaluation. Here we're running an evaluation on the question-and-answer pairs; for housekeeping we give it a good name so you can tell what you're doing, then choose the options and select the prompt variants you wish to assess and evaluate.

What I love about this is that it has metrics as well: groundedness, relevance and coherence. You can use an existing dataset or upload your own, and if you upload one, make sure you map your data sources to column names appropriately. Then it's as easy as submitting the job; in the background, Prompt Flow evaluates all of it for you automatically and gives you the scores almost instantaneously, as you can see on the screen. I found that quite cool. I love the metric dashboard, because it immediately tells me whether I need to go back and do more with my LLM application, or make sure I've got better data coming into my model. You can pick and view the different metrics in different ways; for example, here I'm comparing different prompts and looking at their metrics side by side, which is pretty cool.

And now, if I'm happy with it, I can choose to deploy it from within Prompt Flow. I just select my options and click through the stages, and Azure AI Studio looks after all of it, I'd say very easily: it does all the infrastructure spin-up as well, and gives you an endpoint like the one you've just seen. What you can also do, and this is what we do quite often, is test the endpoint to see whether it gives output in line with your expectations; and if you want to consume it in your back-end development, there's a load of code there to do it. It also has monitoring, for model monitoring: it can even go the extra mile and let you choose which metrics are relevant to your project. It's as simple as ticking them and choosing your column mapping names, and then you create a dashboard that gives you an immediate view of how well your model is doing.
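Testing a deployed flow endpoint from code is a single authenticated POST. A hedged sketch, using the usual Azure ML online-endpoint pattern; the scoring URI and key come from the endpoint's consume page, and the input schema ("question" here) is an assumption that must match your flow's declared inputs:

    import requests

    scoring_uri = "https://<your-endpoint>.<region>.inference.ml.azure.com/score"
    headers = {
        "Authorization": "Bearer <endpoint-key>",
        "Content-Type": "application/json",
    }
    # Field name must match the flow's input; "question" is illustrative only
    body = {"question": "What products are suitable for San Francisco?"}
    print(requests.post(scoring_uri, headers=headers, json=body, timeout=60).json())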
In terms of deployment, you've got a lot of options. Here we've deployed a couple of workflows, and you can swap one workflow out for another, which is what I'll show: we've got two deployment names, and you can adjust the traffic between them accordingly, which gives you so much flexibility when you're testing one deployment, one workflow, against the other. Prompt Flow is a great tool for developing your workflows, helping you evaluate your language model applications, and even going the extra mile of deploying them for you.

I think that's pretty much the end. Oh yes, before I pass on to our next speaker, I'd really appreciate it if you could fill out the survey. It helps us deliver more sessions like this, and it actually goes back to Microsoft; it helps us see whether you found this useful and whether you want to hear more from us about generative AI and Azure AI Studio. That's the end from me.

Thank you, Gavita. We have ten minutes for questions, and we already have some. First: how does Azure validate clean data, and is JSON the only format of raw data it accepts?

I think you're talking about the fine-tuning part, where I showed the JSON upload; I hope so, because that's what I was showing. If you're uploading training or validation data for fine-tuning, then yes, it's JSON format.

Next question: is there a publish option for testing in real time?

Oh, that's a good question. I'm not sure; I don't think so. I've not seen a publish option for testing in real time.

Cool, thank you. We can wait a couple of minutes for more questions, but thank you, Gavita. Next speaker: Alex, I can see you backstage; I don't want to just drop you onto the stream if you're not ready.

Hello! How are you doing?

Good, thank you. How are you?

Not too bad, thanks. It's a Thursday, so it's almost Friday, which means it's almost the weekend.

Exactly, that's what everyone is waiting for. Of course. Right, we have about nine minutes before we kick off your session, so as with the last session, this is our time to awkwardly stare at the camera. Did you catch up on Build and the Data + AI Summit?

Not the Data + AI Summit so much, but I've been going over some of the stuff from Build, and I'm equally on the agent hype with Gavita. AutoGen looks quite fun and is definitely something I'm going to get my teeth into as soon as I find a chance.

Yes, it does look awesome. Cool. We have no more comments coming through, so rather than make AI-related small talk for ten minutes, I can always just kick off early. And if anyone joins too late, we're on YouTube, so you can rewind.

Good suggestion. So, are you ready to kick off?

Yes. I'm living the one-monitor life currently, so let me set up my screen. I now can't see myself or what I'm presenting, so hopefully you can see my slides; you'll have to tell me when I'm presenting, because I'll have no way of knowing.
Are you sharing them? They should be sharing in the stream. All good. I'm going to leave you the floor, Alex.

Awesome, thank you. I'm just going to open the channel on YouTube on my phone next to me, so I can actually see what I'm doing: a very high-tech solution. It looks like my slides are sharing, so I'll kick off.

This talk is going to focus a little bit more on RAG implementations and how you can build them using Azure OpenAI. It will touch on some of the topics Gavita covered in the last talk, but get into a bit more technical detail, and I'm going to attempt a live demo at the end rather than pre-recording, so everyone can enjoy it when it all goes horribly wrong even though it worked ten minutes ago.

I guess the first question is: who am I? I'm Alex, the lead R&D engineer at Hurree. We make zero-code analytics platforms, and in my R&D capacity we're looking at how we can AI-power everything to do with analytics and platform usability. I've been a machine learning engineer for about six years and have worked on a whole bunch of different projects: IoT devices and sports wearables, looking at how you implement machine learning there for things like fall detection and sports monitoring; I have worked in potatoes, and anyone who has spoken to me before will know I talk about potatoes an awful lot, where I did a lot of computer vision and a huge amount of machine learning operations, looking at how you deploy machine learning applications in real time; and I've also done some consultancy across various insurance-type industries. Most importantly, I'm a converted genAI homie, and I've been working on industry genAI projects for approximately a year, which, considering ChatGPT and the general AI buzz arrived in November 2022, is a reasonable fraction of genAI's life cycle as a field.

So let's get kicked off; I have a brief agenda slide. I'll start with a very short history of artificial intelligence and generative AI, because everyone says "genAI" and "RAG" and throws the terms around to the point where not everyone necessarily knows what they mean, so I'll cover that briefly to get everyone on the same page. I'll then jump straight into everything that's wrong with generative AI and what some of the limitations of large language models are, and then get to the good stuff: a bit more detail on what RAG is and how it works. I'll cover the core components of RAG, very much the core components in my opinion; there's a whole lot going on in RAG, but I'm going to try to break it down and make it as simple as possible. And finally, I'll talk about how we can build a RAG implementation and get it pretty much to a deployable state, using Azure OpenAI, AI Search, and LangChain in the background to pull it all together, with a little guest spot from Streamlit, a nice little Python web application framework. I'll also have some silly memes dotted about the slides, because I think they're funny and no one can stop me.
First of all, a quick history of AI. Artificial intelligence as a field came about in roughly the 1950s, and the question was generally: can you get computers to think and act as well as, or better than, humans in terms of their level of intelligence? A great example of an early AI application, and one that really caught my eye, is Shakey the Robot. Shakey was essentially an autonomous navigating robot, and he got the name because, with the AI technology of the time, he was re-evaluating the path he was following from point A to point B every second or several times a second, constantly making slightly different decisions and shaking about whenever he moved, earning the adorable nickname Shakey. That was the field of AI for quite a while: things like perceptrons and neural networks were hypothesised and theorised, but the technology wasn't quite there to take them to their full potential.

Around 1997 is when machine learning took off. With the big field of AI being problem solving, machine learning is a subfield of artificial intelligence concerned with learning from existing data to make decisions or predictions. That's where your traditional machine learning algorithms come in: linear regression, support vector machines, random forests, everything like that. They bumbled along very nicely for a few years and became quite powerful. It's now often referred to as "traditional AI", which makes me a little sad, because it's only a couple of decades old, still extremely powerful and has a lot to offer, but everyone loves their deep learning and their generative AI.

Deep learning came to the forefront roughly in the mid-2010s. There will be people who say "I've been developing neural networks since 1999", but it was when PyTorch and TensorFlow really arrived, along with technologies like YOLO for computer vision, that deep learning truly took off. It uses neural networks, deep models with multiple layers of neurons stacked up, which can make very big and informed decisions.

Within deep learning you have the field of generative AI, which has taken off in the last couple of years and is definitely all the rage currently. So we have another very similar nested diagram, as generative AI is itself a very large field within AI, within machine learning, within deep learning: creating new written, visual and auditory content given prompts or existing data. It is AI that creates something new. With a traditional machine learning model, you might feed in a load of data on iris plants, with their petal and sepal lengths and widths (that is a mouthful), and it will predict which species of iris a flower is. Generative AI is saying "draw me a picture of an iris", and the model goes away and creates something new: the picture of the plant.

Within generative AI, the models used much of the time are foundation models: large machine learning models that have been pre-trained on huge amounts of unlabelled data, which can then be fine-tuned, as Gavita discussed previously, to be better at more specific tasks.
A type of foundation model, and the sort I'm focusing on today, is the large language model: a foundation model specifically designed for natural language processing tasks. Those are things like your GPT models, your Falcons, your Mistrals, all of that good stuff.

Now that we've covered the basics of what AI and genAI are, let's double down on large language models, starting with what they can be used for. I'm going to go from right to left, even though my diagram goes left to right, because I think it's easier to talk about the sorts of things you can do with them before the data that's required. The answer is honestly a whole bunch of stuff, more than the four bullet points on the slide, but some key use cases that can bring value to businesses very fast are these. Question answering: building chatbots for your websites and applications that are actually smart, actually able to answer questions, with a bit more oomph than just taking pre-existing Q&A pairs and going "this seems to be about the most relevant pair, here's the answer", even when it doesn't quite fit. Sentiment analysis: if you have a whole bunch of text and want to know how your brand is doing, you can scrape a load of tweets, run sentiment analysis, and have genAI tell you whether people like you, hate you, or are indifferent. And extraction and summarisation of relevant information: if you have huge legal documents full of jargon you don't quite understand, you can pass them into a large language model, have the key parts extracted and then summarised, and get a nice couple of paragraphs feeding back the general gist of what those huge documents say.

As for what you can feed in: blogs, technical documents, call transcripts, the traditional Q&A pairs from old-school chatbots. If there's a vast amount of information in your company's emails that would be useful to the whole company, you can feed in a load of emails and build yourself a nice little chatbot to answer questions internally, based on all the information that's shared but not necessarily published.

Now we come to a particular use case for large language models. There are a ton of big business use cases, like understanding all the technical documentation if you have a very large factory, or building a chatbot over the internal knowledge base of a complex tool you want users to understand. But today we're going to talk about a man named Robert Goodman, this fine fellow with the ginger hair and the moustache here on the screen. He's one of my characters in a tabletop role-playing game called Lancer, and he's a great example of someone who doesn't exist on the internet: no one can possibly know about him, and chatbots can't answer questions about him, because he doesn't exist outside a few documents that live predominantly in my Google Drive. He's a fictitious piece of information, and a great example of something a model wouldn't know about up front. So we can go to ChatGPT, ask who Robert Goodman is, and it will tell us a load of rubbish.
He is not from Code Geass; he is a made-up character I created. So ChatGPT, in its raw form, is wrong, which brings us to the biggest limitations of LLMs: hallucinations, and the fact that they don't know what they don't know. You can see that ChatGPT gave its best answer about who he was, and it was completely wrong, yet it gave an incredibly detailed and very believable answer even though it was talking complete rubbish. That's one of the core issues: you can feed uncontextualised, ungrounded data into a large language model and get back an answer that presents itself with a high level of confidence even when it is factually incorrect, irrelevant, or complete and utter nonsense.

But there's an easy solution to this problem, and the way we teach our LLM about Robert Goodman is through RAG. RAG is retrieval, augmentation and generation, or retrieval augmented generation, any combination of those general words, and those are the three steps. Sitting alongside zero-shot and few-shot prompting, it's one of the two main ways of giving information and context to a large language model so that it can answer questions with more factual grounding and a higher level of confidence. Fine-tuning is the other way of getting similar responses, and although fine-tuning is getting easier and easier to do, it's computationally very expensive; for the vast majority of use cases, unless you have tens of thousands, hundreds of thousands, millions of documents, you can get a very fast, very accurate solution using RAG.

The way RAG works is in three simple steps. Step one: retrieve the information from a database that relates to the question being asked. This is where vector databases come in, because you can embed and store all of your documents in a database and then run very fast, very accurate searches for the relevant documents. Step two: augment your initial prompt with those documents and that context-specific information. Step three: combine it all together and generate an answer to the question that has context, is grounded, and actually gives you meaningful information. There are a few tips and tricks along the way to make sure it doesn't spit out rubbish and the model doesn't hallucinate. We're going to cover the architecture first, then the core components, and then I'll talk a little more about the practical applications and how you can implement it, show off some code, and fun stuff like that.
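Those three steps fit in a dozen lines. A minimal sketch, reusing the client and embed() helper from the chunking sketch earlier in this write-up; the document strings, the "answer only from context" instruction and the deployment name are illustrative, not a fixed recipe:

    import numpy as np

    # Stand-in for a real vector database: a few chunks and their vectors
    docs = [
        "Robert Goodman is a redheaded space farmer from the planet Eden Kappa.",
        "Robert Goodman pilots a mech in the tabletop RPG Lancer.",
    ]
    doc_vecs = embed(docs)

    def rag_answer(question: str) -> str:
        # 1. Retrieve: embed the question and find the closest chunks
        q = embed([question])[0]
        scores = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
        context = "\n\n".join(docs[i] for i in np.argsort(scores)[-3:])
        # 2. Augment: fold the retrieved context into the prompt
        messages = [
            {"role": "system",
             "content": "Answer ONLY from the context below. If the answer is "
                        "not in the context, say you don't know.\n\n" + context},
            {"role": "user", "content": question},
        ]
        # 3. Generate: a grounded answer from the LLM
        reply = client.chat.completions.create(model="<chat-deployment>", messages=messages)
        return reply.choices[0].message.content

    print(rag_answer("Tell me about Robert Goodman"))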
The overall architecture, in a very basic form, of a RAG implementation looks like this — I'm not sure if my mouse is visible on the screen, I don't think it is — but in the top left you can see the user asking the question, "tell me about Robert Goodman". That question is fed into the system and embedded — essentially vectorized, turned into multi-dimensional arrays and vectors — so it can be passed into a semantic search, hybrid search, keyword search, whatever type of search you wish to use, against a vector database. That database has all of your documentation — in this case, all the facts and info about the life of Robert Goodman — stored in vector format. It can look up the most relevant information from that vector store and return it. You then combine that with the initial prompt and pass the whole thing — your question plus the information containing the answers — to a large language model, which generates an answer that you can display back to the user. In this case, we get that he is a redheaded space farmer from the planet Eden Kappa. Most of that sounds like nonsense because, essentially, it is — it's all made up — but our LLM knows about it and can answer the questions correctly, with confidence, which is a step above hoping ChatGPT doesn't hallucinate too badly.

So, looking at the core components of a RAG implementation. First, you have a large language model: you need something that can take your information and your prompts and generate answers to the questions. Large language models are the absolute peak of NLP currently and a fantastic way of doing this, and there are a whole bunch of them out there: your OpenAI models, the GPTs; Meta has the various Llama models; there are Hugging Face models; and plenty more to choose between based on the different use cases and their different strengths and weaknesses — generally speed, the power and performance of the model, and cost, whether that's the cost to host or the cost to use them through an API.

You then need a vector database: a database that stores high-dimensional vectors — mathematical representations of the features and attributes of your data — and can be searched very quickly with a high level of accuracy. There are a whole bunch out there, but the three that spring to mind, and that I think are the most powerful, are Azure AI Search, ChromaDB and Pinecone. There are plenty of others, which all have their merits, but those three are a good starting point to have a play around with; there are always pros and cons to each, but I'm fairly certain one of them will fulfill whatever vector database needs you have.

And finally — it's not really a component, it's a slightly wishy-washy science; some people would argue it's just speaking English, just grammar — prompt engineering. It is very important in RAG, because you need to get your model to use the data in a meaningful way, actually produce the output you want, and give information in a way that's useful to your user and your use case. You could just feed in the question as the user prompt, do the vector database lookup, and let the LLM run wild, but you're likely to get very inconsistent answers: you might not get the format you want, and it might start spitting out information you don't think is necessary. Prompt engineering becomes even more important in these RAG implementations because of the volume of data you're looking at and the slightly random nature of large language models — but it's very easy to control all of that through a bit of prompt engineering and a bit of temperature control. I will go over all three of these components at a fundamental level, then talk about the actual technologies and demonstrate how they can be used.
First of all, large language models. They are foundation models specifically designed for NLP tasks, and they are far more complicated than four little bullet points, but in brief there are four main steps — well, two main steps, plus an input and an output — that break down how they work. The first thing you do is provide an input; in this case, the input is a description of what tokenization is, and that is what you pass into the model. You then have something that will tokenize, segment and embed your information. The process of tokenization is essentially breaking text down into much smaller chunks — sometimes part of a word, a whole word, or a small phrase — and, depending on what tokenizer you're using, it will make those decisions for you, so you don't have to go through an arduous process of string processing and splitting on punctuation or spaces; you can use something smarter that tokenizes in a way optimized for use in large language models.

You then pass this nicely tokenized message into the model itself, perhaps combined with a system prompt. In this example, the system prompt is: "You are an AI helper designed to summarize text. Please summarize and shorten any inputs you receive." So we pass in the user's piece of text, combine it with this system prompt, and hand it over to the model — which, for this slide, "works its magic". It's probably a talk in its own right to explain how these models work and generate answers, but essentially they look at the probability distribution over what the next token in a sequence will be, and they do that in a loop, with a lot more complexity and a huge amount of training data, to give you an output. The output in this example is: "Tokens are bits of text fed to and made by AI models. They can be letters, words, or even parts of a word." So what we can see there is how you take your input text, tokenize it, pass it to a model with a little bit of a system prompt, and get your output of the model doing its specific task.
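To make tokenization concrete — this is not from the talk, just an illustration — here is what OpenAI's tiktoken library does to a sentence:

```python
# Illustrative only: tokenizing a sentence with OpenAI's tiktoken library.
# cl100k_base is the encoding used by the GPT-3.5/GPT-4 model family.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenization breaks text down into much smaller chunks."
token_ids = enc.encode(text)                   # the integer IDs the model sees
pieces = [enc.decode([t]) for t in token_ids]  # the sub-word chunks

print(token_ids)
print(pieces)  # roughly: ['Token', 'ization', ' breaks', ' text', ...]
```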
We then go on to vector databases, which are another very complex topic — Gabby did a really good job of describing how they work, so I won't go into too much detail. I have a couple of key concepts here, embeddings and similarity search, and a very nice diagram that I stole from a blog by Qdrant — another example of a fantastic vector database company — which sums up how it all works.

Embeddings are numerical machine-learning representations of the semantics of your input data, and they are really good at capturing the meaning of highly complex or high-dimensional data. You can take a load of words and say "these are similar" based on, say, having the same number of vowels, but that doesn't really capture the more complex relationships between them; by embedding them into high-dimensional vectors using an embeddings model, you can split them into clusters of similar words and retain much more of their meaning. Embeddings also dramatically narrow down the search space you are looking through: when you are trying to match similar things or retrieve relevant documents, it is much faster to go to a specific point in the vector database and pull results out from there than to comb through absolutely everything and compare one by one — with tens of thousands or millions of embedded vectors in there, that would take far too long to be usable.

The other thing, beyond being a fantastic way of storing data, is that vector databases are a fantastic way of retrieving data. They excel at finding similar data points, which is crucial for GenAI and other machine learning solutions, because they can retrieve the relevant information based on meaning and context — not just "this looks the same, it must be the same" — which adds another level of smarts, overall adding to the AI part of artificial intelligence and improving its ability to problem-solve.
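As a toy illustration of that similarity search — not from the talk, and with made-up three-dimensional vectors standing in for real embeddings of roughly 1,500 dimensions — a brute-force version looks like this; real vector databases use approximate nearest-neighbour indexes precisely to avoid this one-by-one comparison:

```python
# Toy brute-force similarity search over made-up embeddings (numpy only).
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between two vectors: 1.0 means identical direction.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend these came from an embeddings model.
docs = {
    "Robert Goodman is a red-headed space farmer": np.array([0.9, 0.1, 0.2]),
    "How to bake sourdough bread":                 np.array([0.1, 0.8, 0.3]),
}
query = np.array([0.85, 0.15, 0.25])  # embedding of "Who is Robert Goodman?"

best = max(docs, key=lambda d: cosine_similarity(query, docs[d]))
print(best)  # -> the space-farmer document
```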
Finally, we have prompt engineering, or at least the basics thereof. The general gist is: the better the prompt, the better the response. Much like "garbage in, garbage out" when you're looking at data, if you have a good prompt for your LLM and your system, it is far more likely to give you a response you're happy with. There are a whole bunch of different approaches to prompt engineering — a whole bunch of frameworks and templates — and I'm going to briefly talk through the CO-STAR framework (template, methodology, whatever you want to call it), which is a set of guidelines on how to design a good prompt so that you get consistent, well-formatted answers and responses from your model. This isn't a step-by-step guide — one, two, three, four, five, six — it's a little more of an abstract concept, so there's no particularly structured diagram; there are six factors to consider when you write your prompts, and as long as you touch on each of them at some point, you know you'll get a good response from your model.

The first is context: your model needs to know what its purpose is and what it's trying to do. Are you an AI assistant? Are you a funny jokester who speaks only in rhyme? You need to provide the context for what it's doing and what its purpose is. You need to state the objective: is it to summarize information and make it smaller, to expand on things, to provide recommendations, to provide sentiment analysis? It needs to know what it's trying to do. Then there is style and tone: the style of the response and the general tone you want. Do you want it bullet-pointed and very formal, or a little more whimsical? All of those things need to be described in the prompt to ensure you get them in your outputs. It is also always handy to talk about the audience, because that very much sets the theme for the style of the response. Even a sentence as simple as "you are an AI chatbot used to help non-technical users understand technical documentation" gives it the general gist of who the audience is, and it can then generate responses appropriate to them, containing relevant information without going into so much detail that they won't be able to understand it.

And finally, examples of the type of response that you want. You can provide a couple of example inputs and outputs in the prompt if there's particular formatting you're after — various headings, or making sure it's under 500 words. Some models, GPT-4o in particular, really love to format things as markdown, which is great if your output goes into a system that can handle markdown, but if it goes into a plain-text response it can just look messy — so even something as simple as "do not format your output as markdown" is a great way to control the response you get. And that's without even touching on things like "if you don't get the answer from the context, reply with 'I don't know'". There's a whole bunch you can do, but having a general checklist and framework for writing your prompts lets you make them even better, giving your application that next step up and the professional touch you're looking for.
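As a worked example — my wording, not a prompt shown in the talk — a system prompt touching each of those factors in turn might look like this:

```python
# A hypothetical CO-STAR-style system prompt; the wording is illustrative.
SYSTEM_PROMPT = (
    # Context: who the model is and why it exists.
    "You are an AI chatbot used to help non-technical users understand "
    "technical documentation. "
    # Objective: what it should actually do.
    "Answer questions using only the information provided to you. "
    # Style and Tone: how the answer should read.
    "Write short, plain-English sentences in a friendly, professional tone. "
    # Audience: who is reading the answer.
    "Assume the reader has no technical background. "
    # Response: format constraints, plus a fallback to limit hallucination.
    "Do not format your output as markdown, keep answers under 500 words, "
    "and if the answer is not in the provided information reply exactly "
    "'I don't know.'"
)
```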
And now that we've gone over the basics in their theoretical form, I'm going to talk a little more about the technical stuff. There will be a code demonstration, and we can get to the fun bit, as I like to think of it — because there is nothing better than a little bit of code.

The first thing we need is a way to use a large language model, which is component one of our RAG implementation, and I'm going to show how you can use Azure OpenAI, because it is essentially a fantastic REST API giving very quick, very controlled access to all of the OpenAI models — and, through things like AI Studio, you can have one place for your Hugging Face models, your OpenAI models, your Llama models; more options is always better. The vector database I'm going to use is AI Search, because it is a very fast, very powerful vector database with hybrid search functionality — it can do keyword and semantic search at the same time, which boosts the performance of the search — and because the Microsoft suite of AI services works really nicely together: it's all underpinned by AI Studio and the various AI services, so if you stick in the Microsoft boat you can leverage things like Document Intelligence and the models they provide, and all the things that were in Cognitive Services, like Translate, before it got renamed. So it makes sense to use. And finally, you need a framework to orchestrate all of your steps and actually build the application, the flow and the logic, and LangChain is a fantastic tool for that: very well established, documented beyond belief, and completely open source. There are alternatives — you could just use Prompt Flow, which is a great way to do rigorous prototyping and evaluation, and has some deployment options through its real-time deployments — but you get a little more rigour with something like LangChain, because you can containerize your own applications, build them into other things, and keep a little more flexibility. And then I am going to use Streamlit, which is hiding up in the top-right corner, to tie all of this together and act as the very simple web front end for the chatbot I am going to build.

You could build all of this into anything else — function apps or FastAPIs, called as an API from a traditional React web front end — but Streamlit is the fastest and easiest way for a web-development noob like me to pull something together to demonstrate and show off.

So now we get to the fun bit. It's never that exciting watching people scroll through Visual Studio Code, so I'm going to do my best to keep this reasonably brief, talk about what's going on, and then go into a live demo. Oh — there was actually a slide on that, with another funny picture. So, what I'm going to cover in the demo: I was going to cover AI Studio, embedding data and creating indexes in AI Search, but Gabby has done that already, and her demo was pre-recorded and far cleaner than mine will be, so at the last second I struck it off the agenda. What we are going to do is create a RAG implementation using LangChain (the word "flow" is on the slide because it was going to be Prompt Flow, but now it's LangChain); we're going to connect all of the relevant services, do a little prompt engineering, and get a good answer to the question of who Robert Goodman is. And, as my little picture shows, there are always problems right before the demo — including noticing typos in your slides. Hopefully there were none before now, but if there were, please don't call me out on them.

Jumping into the code: we have the standard block of imports at the top, the vast majority importing from the various LangChain packages. We've got langchain-openai, which handles the embeddings model and the chat models; we have the vector stores, where we're using AzureSearch, which handles all of the connection to and querying of our vector database; and a couple of little utilities — the output parser and the chat templates — to keep things clean, nice and simple. There are also a couple of libraries because I wanted to put images on my web app.

You might notice a very weirdly out-of-place environment variable set just there. That's because there is currently a bug in the AI Search integration: somewhere in the code it defaults the content vector field name to content_vector, lowercase, by setting that environment variable — so if your vector database doesn't have a field called content_vector, it will never be able to find it. Before you import the Azure Search library, you need to set that variable yourself, to the name of the content vector field in your vector database. That was allegedly fixed about six months ago, but it still doesn't work — so if anyone from Microsoft is watching, there's a bug fix that needs doing: scrolling through the Azure Search code, someone has left a line almost identical to that one, which just sets it to content_vector.
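For anyone reproducing this, the top of such a script plausibly looks like the sketch below. The talk's actual file isn't shown, the field name is a placeholder, and the environment variable name reflects the langchain-community Azure Search integration — verify it against the version you're using.

```python
# Sketch: imports plus the field-name workaround described above. The env
# var must be set *before* the Azure Search vector store is imported,
# because the module reads it at import time.
import os

# Point the integration at the actual vector field in your index
# ("myVectorField" is a placeholder for your own field name).
os.environ["AZURESEARCH_FIELDS_CONTENT_VECTOR"] = "myVectorField"

from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from langchain_community.vectorstores.azuresearch import AzureSearch
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
```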
but because I’m running a sort of very lightweight demo I take information stored in my secret management which for this is using the streamlet secrets management and then I just set them as environment variables because the various functions that I want to use um for open AI will pull the environment variables in you don’t have to set them in the code so I set those there uh I then set some paths and load some images uh to make my Stream app look pretty and then we get into some of the implementation of building a chatbot uh so the first thing that we want to do is essentially we want to create the messages that our chat bot is going to display um this is done by using a session State variable which means that it will persist so it will not overwrite itself anytime you add something new you don’t need to worry about constantly and iteratively going back over old messages and adding them to a variable they will essentially persist persist in the web app uh and we set it off with a assistant roll message uh which just says hello and introduces themselves uh in a fun and quirky way uh we then get into sort of model loading which is getting into the uh good Lang chain stuff see if I can shrink my terminal anymore yeah oh that may be gone forever but that’s fine uh which is where we are going to create our connections to the various models that we want and to the vector database again this is done as a uh session State variable so that you don’t need to constantly connect to the V Vector database every time someone makes a call um and it means that if you are hosting this and there are multiple users you are saving a huge amount of compute and time because you’re not const making the same calls with the same data um so using Lang chain it is incredibly easy to do because there is the Azure open aai or the Azure chat open AI um function which will allow you to connect through to your model uh it will pull the key and the endpoint from the environment variables so you don’t even need to Define them here then all you need to do is tell it what environment or sorry what API version you’re using and what the name of your deployment is um you can hardcode them in they’re not necessarily something that needs to be kept a secret um but by having them in the environment variables uh it makes it a little bit easier to change them in a deployed app rather than going right I’ve changed this variable time to dig through the code change everything you know if you’re deploying a new model you don’t want to have to be digging through your code making sure you change all the names forgetting one of them somewhere and going ah can’t remember which file I didn’t change the name in but everything’s broken you can know that you have a single source of Truth for these things which are your environment variables or your secrets it is exactly the same for the embeddings model um doing that there and then finally you can connect to your vector database uh which is azure search and this time uh you do need to pass in the search endpoint the API key the index name and then also an embeddings model that you want to use but fortunately um as your search will handle embedding the plain text input for you so you don’t need to do any of that stuff you can just go cool I have uh some text and I want to do a vector search using it pass it into the vector search it will embed it for you do the vector lookup and then respond with plain text uh which is fantastic um so then we get onto the sort of two key functions that make up 
We then get onto the two key functions that make up the entire thing: performing your vector search and calling your model. I've split it into two separate steps; you can do it all in one chain using LangChain, which can be a little more efficient and lets you run some steps in parallel, but it is a little more difficult to understand, and there isn't much loss in quality from splitting it out, doing one after the other, and joining it all together.

Quite simply, we have a function that takes the prompt the user enters in the chat box, takes the session state variable holding the vector store, and performs a search. You can set how many documents to look up — two, three, five, ten. The fewer documents you use, the more likely you are to focus just on the key, best responses; but if you have quite small documents, or you know your information is spread out across multiple documents, you're likely to miss some of the important ones if that number is too low. It is a value you can tweak and tune while developing your application, based on the inputs and the data you're using. That returns an object of documents, and all I do here is go through them, extract the page content — what's actually written in them, ignoring any metadata — and join it all into one big string. It's not necessarily the most optimal way to do it, but it is an easy and robust way to make sure you take everything from your documents and get it into one block of text, so you know your model will have all the relevant, contextual information. This makes up the R and A of RAG: the retrieval and the augmentation.

Then we get to calling the model. This takes the user's prompt again — it was used to look up the relevant information, but it's also needed in the model call so the model can refer back to it and answer appropriately — plus all the data that has been retrieved. I'm using LangChain's LCEL (the LangChain Expression Language) here, which is a very nice way to build simple, easy-to-understand chains that read well in code. The chain I have contains three steps: a prompt, a model, and an output parser — which essentially says "I take something, I pass it to a model, then I format the output". The prompt here is simply "you are a helpful assistant designed to answer questions". That isn't following the CO-STAR framework — a huge amount of information is missing; we could do a little prompt engineering on it, specify the format more, add some conditions, set the tone — but it will generally do the job. The augmentation happens here: you ask it "please answer the following question", pass in what the user has given you, and then say "using the information provided", which is your retrieved information. When you invoke that chain, you pass in the user prompt variable and the information variable. You can make this a lot larger — it doesn't have to be just a user prompt and retrieved data; if you are pulling data from databases, dashboards or analytics tools into your application, you can feed all of that in too.
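Roughly, those two functions might look like the sketch below; the prompt wording and names are illustrative, and it continues from the connection sketch above.

```python
import streamlit as st
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

def retrieve_documents(user_prompt: str, k: int = 3) -> str:
    # Retrieval: look up the k most relevant documents, then join their
    # page content into one big block of context, ignoring metadata.
    docs = st.session_state.vector_store.similarity_search(user_prompt, k=k)
    return "\n\n".join(doc.page_content for doc in docs)

def generate_answer(user_prompt: str, information: str) -> str:
    # Generation: an LCEL chain of prompt | model | output parser.
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant designed to answer questions."),
        ("user",
         "Please answer the following question: {user_prompt}\n"
         "Using the information provided: {information}"),
    ])
    chain = prompt | st.session_state.llm | StrOutputParser()
    return chain.invoke({"user_prompt": user_prompt,
                         "information": information})
```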
That gives the model an incredibly contextualized picture of everything that's going on; it has all the information it needs, and it can give you very high-quality answers about information that otherwise isn't available to it.

The rest of this is just a little bit of Streamlit code to build the chatbot. There is a section — thanks to Streamlit's incredibly readable, easy-to-write formatting — with a subheader and a title with my name, in two separate columns, and a picture to go next to them. We then have something that writes out the contents of the chat, and then we create a chat input here using the walrus operator: it takes the content of the input and checks that something has actually been entered, outputs the question the user just typed into the chat window, then retrieves the documents, creates the augmented prompt, generates the answer, and displays it there.
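A minimal sketch of that Streamlit chat loop, reusing the two functions above; the greeting text and labels are mine, not the app's.

```python
import streamlit as st

# Session state persists across reruns, so the chat history isn't lost.
if "messages" not in st.session_state:
    st.session_state.messages = [
        {"role": "assistant", "content": "Hello! Ask me about Robert Goodman."}
    ]

# Write out the conversation so far.
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

# The walrus operator both captures the input and checks it was entered.
if user_prompt := st.chat_input("Ask a question"):
    st.session_state.messages.append({"role": "user", "content": user_prompt})
    information = retrieve_documents(user_prompt)
    answer = generate_answer(user_prompt, information)
    st.session_state.messages.append({"role": "assistant", "content": answer})
    st.rerun()  # redraw the page so the new messages appear in the chat
```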
So, if I double-check my terminal, we can see the application is running, and if we jump back into the web browser we should be able to see the Robert Goodman RAG chatbot. If I hit refresh, you'll see it very quickly loads all the models and makes the connection to the vector database — we only need to do that once — and then we're good to chat. We can now ask questions that ChatGPT and other models won't know the answer to, like "how old is Robert Goodman?". It has a little think, and it tells us Robert Goodman is 37 years old. I know that to be correct — I could pull up the document and prove it to you, but you'll have to take my word for it. We can then ask some more intricate questions: "where is he from?" — and it tells us that, based on the information provided, he is from Eden Kappa. Then we can ask "what are Robert Goodman's beliefs?" — if I can spell — and it gives us a little more information on the weird and wonderful things he believes, pulled from the embedded documents. He has three core beliefs: that each planet in the universe has a soul, which can feel pain and joy; that there is a group of sentient crows who secretly run the entire world; and that travelling through blink gates, or teleporting, creates a clone of you in your new location while your old clone becomes enslaved. So he is clearly quite the tinfoil-hat wearer and a little bit of a zany person.

But in short, that is how to build and implement — and, short of creating a Dockerfile, turning this Streamlit app into a container and hosting it on something like App Services — an end-to-end, fully implemented RAG project. I'm not sure if I was due to finish at half seven and have run over by two minutes, or whether I've been good and finished a few minutes early in time for questions, but thank you very much for listening. If there are any questions, stick them in the chat — I'm sure Tori will tell me what they are, because I can't really see anything. Hopefully you now have all the skills you need to build a RAG implementation and become a GenAI master in no time at all, using a mixture of AI Studio and LangChain.

Oh, thank you Alex — I always enjoy listening to you talk about these things. Okay, we've got some questions. The question I want to ask is: how does RAG deal with tabular data, when you've got numerical things?

So, you have a number of choices when it comes to numerical data. One is that you can essentially feed it in in a reasonably raw format and hope for the best — newer large language models are actually quite good at understanding numerical data in the format it comes in, so you can feed in a bit of time-series data, say "what do you think of this?", and it will have a reasonably good stab at identifying basic trends. You could also go a little more advanced and use something like function calling, to call out to mathematical functions you've defined that do some of the analysis up front, and then feed that back into the prompt in a condensed format, which the model can analyse better — as it's a bit more text-based — and give you a slightly better answer.

Nice, cool, thanks very much. We've got someone else asking: what are your top three recommendations to become a GenAI homie? (I wonder who that was.) Recommendation number one: accept that traditional machine learning is traditional, and that there is a ton of cool stuff you can do with GenAI that you can't do with old-school machine learning. Number two: pick one tool and just have a go with it first. It's easy to look around and see there are ten different models you could consider, fifty different vector databases, a thousand different frameworks, and they all look really hard. Even if it's just eeny-meeny-miny-mo, pick a model, pick a vector database, pick a framework, and roll with it — you'll be surprised how easy it is to build something very cool in a short space of time. Once you've built a couple of applications, you can go back, critique all the steps along the way, and make more informed decisions; but actually doing something is definitely a good first step. And the third: definitely be peer-pressured by your colleagues into looking into it, because they are right, and you are wrong to be skeptical.

Nice one. Okay, we're running a bit late, but I think we've got time for one more question: someone has asked about cost — whether it's possible to touch on the cost of setting up these projects.

Cool — that is a great question, and a difficult one to answer, because it depends on the implementation you go for. The costs can be very low: if you're using Azure OpenAI and, say, a GPT-3 model, they have a very low cost per token and are pay-as-you-go, so you don't need to worry too much about racking up massive bills using the API or doing any fine-tuning. Touching on deployment, Streamlit can be very cheap to deploy — you can easily containerize the application; LangChain is open source and free, so there are no overheads there; and you can host a containerized application in an App Service or any Kubernetes orchestrator and have quite a cheap front end and deployment. Where costs do come in is around vector databases. You can avoid them by using something like ChromaDB, spinning it up yourself and hosting it in Kubernetes or equivalent, for minimal overheads and quite a cheap implementation — but then you have to do all the scaling and load balancing yourself, and all of the DevOps and things like that.
But if you want to use a lot of the serverless options — if you go down the AI Studio route — it can be quite expensive, though it becomes a lot easier to work with: the time cost is reduced, because it's a fantastic tool for doing things very quickly, while the monetary cost is a little higher, because those services are more expensive to use.

Nice — thank you. Okay, I think that's all the questions we've got time for, but thank you, Alex, for joining us — a very good talk, and hopefully this isn't the last time we see you. Cheers — thanks, Alex. Okay, thank you everyone for tuning in. If anyone has any interest in speaking at one of these, please feel free to drop a message to myself, Gabby or Tori. Also, make sure to check out the Meetup group for the next time we do this — we're looking to do one on September 19th — so please keep your ears close to the Meetup to see what happens. Thank you very much for joining in, enjoy the rest of your evening, and see you later.
