2023-11-15 | Input Talk | Irene Schumm & Ulrich Krieger

    Abstract
    In the age of data-driven research, the University of Mannheim Research Data Center (FDZ) at the Mannheim University Library bundles services for researchers interested in research data management. In this talk we will introduce the audience to the FDZ’s core services, such as guidelines for Data Management Plans or data collection via OCR, provide information about its data resources such as the Aktienführer data archive, and (re-)introduce the newest addition to the FDZ: the German Internet Panel data collection infrastructure. In addition, this talk will provide an overview of the University Library’s activities within the BERD@NFDI consortium. The second part of the talk will deal with the NFDI consortium BERD@NFDI, which is coordinated by the University Library. BERD@NFDI focuses on the research data management and analysis of unstructured data in Business, Economics and related fields, such as unstructured text from social media, news, images etc. We will present the services under development and how researchers can profit from BERD@NFDI.

    Presenter(s)
    Irene Schumm is Head of the Research Data Center Department at the Mannheim University Library.
    Ulrich Krieger is Coordinator for the BERD@NFDI consortium based at the Mannheim University Library.

    Hello, welcome everyone. My name is Dennis K and I’m happy to welcome you to today’s input talk by Irene Schumm and Ulrich Krieger, who’s sitting in the audience right now. The talk is entitled “Finding, accessing and reusing research data: the University of Mannheim Research Data Center and BERD@NFDI”.

    For those of you who are tuning in for the first time today: the Social Science Data Lab is an event series here at the Mannheim Centre for European Social Research, where we provide a platform for researchers to present tools and methods for the collection, management, analysis and visualization of data in the social sciences.

    To briefly introduce our two speakers today: Irene, who’s standing next to me right here, is head of the Research Data Center department at the Mannheim University Library, and Uli is coordinator for the BERD@NFDI consortium based at the Mannheim University Library. In the first part of today’s talk, Irene will introduce the Research Data Center’s core services, and following this input Uli will present and introduce the BERD@NFDI consortium. You are free to ask your questions after each block throughout the talk, so feel free to engage.

    As a disclaimer, as usual: we are recording this live stream with the intent of putting it up on our YouTube channel. We are recording today’s talk in active speaker mode, which means that we will be recording audio from on-site attendees and remote attendees, and if remote attendees unmute themselves to ask a question and their camera is on, their video will be recorded as well. If you want to avoid having your voice and/or video recorded, feel free to post your question in the Zoom chat, in which case we will read it out for you. With that being said,

    Irene, Uli, we are so glad to have you here today, and the floor is yours.

    Thank you very much, Dennis, and thank you for inviting us. We are talking today about both the services of the Research Data Center and the services of the NFDI consortium BERD@NFDI. This talk is somewhat different from the talks you’ve already had this semester, because it’s more about infrastructure that should support you during research and not so much about research we are doing or methods we are applying. Still, I think many of these things might be helpful for you.

    All right. I have a team in my Research Data Center, and obviously I cannot bring everyone here, but to give you an impression of who you are dealing with, I have put up an image of my team. We are still growing: you might have heard that the GIP is transitioning to the library just now, more or less, so Tobias Rettig and [name unclear] are joining our team. Apart from that, we have research data consultants who are partly also involved in other projects, but who also consult on research data management and have different scientific backgrounds, as you can see.

    So what are we doing in the Research Data Center? We support researchers from the University of Mannheim at all career stages and from all disciplines in managing their research data. Research data management is a broad area, of course, so I’m going to break that down a little and present especially the services that might be of interest for MZES researchers. I’ve organized it along the research data cycle that you can see here. We’re going to start with data management planning and FAIR data management services, then data acquisition and collection together with the data offerings, which go hand in hand, more or less. We are going to talk about data cleaning, structuring and linking and the services that we offer there, and also about data archiving, sharing and publishing.

    All right: data management planning and FAIR data management.

    Data management plans are something that you hear a lot about nowadays in the research data management community. A data management plan is intended to support the researcher in considering all relevant aspects of research data management from the very beginning of each project. It stimulates you to think about things like data handling, legal topics, organization and documentation. It’s especially important in collaborative projects, of course, where you need to clarify how you’re going to share data and who’s going to take care of the data before those problems arise in the project.

    This definition is from Science Europe. I’ve posted their guide here; it’s a guide on how you can write a data management plan, a really good guide that I can recommend.

    Apart from the intrinsic motivation to write a data management plan, there might also be some extrinsic motivation, like a funder’s requirements. For example, the Deutsche Forschungsgemeinschaft (DFG) has a general checklist regarding data management. They don’t require you to write a data management plan, but more or less, when you go through the checklist, you have a data management plan in the end, and you need to elaborate in your proposal on what you’re going to do with the data and how you’re going to manage it. On top of that, there are also subject-specific checklists of the DFG that you should of course consider, like the sociology checklist. So these are the requirements of the DFG.

    Then there’s ministry funding from different ministries on state and federal level. There are no general policies that we saw, but call-specific requirements, which might vary from a data management plan to no requirement at all, so you really have to look at the call.

    It’s most regulated on the European level, for the ERC grants and for the EU Horizon funding. There you have to write a data management plan as a deliverable of the project within six months of funding start, and you have to elaborate in that data management plan on how you are going to ensure FAIR data management, that is, how data is findable, accessible, interoperable and reusable. They have real templates for that; you have to use those templates and submit the data management plan there.
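    As a rough illustration of what the four FAIR letters ask for, here is a minimal sketch in Python. The record fields, the list of open formats and the checks themselves are simplifying assumptions made for this example; they are not taken from any funder template, and a real assessment is considerably more nuanced.

```python
# Hypothetical dataset description; all field names and values are illustrative.
dataset = {
    "identifier": "doi:10.0000/example",   # placeholder, not a real DOI
    "metadata_standard": "DataCite",
    "access": "embargoed",                 # "open", "on request" or "embargoed"
    "file_format": "csv",
    "license": "CC BY 4.0",
    "documentation": True,
}

# Assumed sets for this sketch only.
OPEN_FORMATS = {"csv", "txt", "json", "xml"}
ACCESS_MODES = {"open", "on request", "embargoed"}


def fair_report(record):
    """Return a crude yes/no answer per FAIR letter for one dataset description."""
    return {
        # Findable: a persistent identifier plus standardised metadata.
        "findable": bool(record.get("identifier")) and bool(record.get("metadata_standard")),
        # Accessible: the access conditions are stated explicitly.
        "accessible": record.get("access") in ACCESS_MODES,
        # Interoperable: the data is stored in an open, widely readable format.
        "interoperable": record.get("file_format") in OPEN_FORMATS,
        # Reusable: a licence and documentation are present.
        "reusable": bool(record.get("license")) and bool(record.get("documentation")),
    }


report = fair_report(dataset)
for principle, ok in report.items():
    print(f"{principle}: {'ok' if ok else 'needs attention'}")
```

    A checklist like this cannot replace the funder’s template, but it shows the kind of concrete statements a data management plan is expected to contain.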

    All right. Apart from that, there are typical topics that almost any data management plan covers or should cover: the administrative information about the project, maybe also an abstract, and of course the PIs; the data description, that is, what type, format and volume the data has; how it’s been collected, whether it’s a primary data collection or secondary use of data; how you are going to document it and ensure data quality; what legal or ethical issues arise and how they are solved; and how and where you’re going to store and back up the data. IT security should be tackled there, and you should also talk about how you’re going to share the data and do long-term preservation. If that’s not possible due to legal reasons, you should of course mention that as well. In the end, especially if it’s a data management plan for a proposal, you should also elaborate on data management responsibilities and resources, so what money you need for data management, because that’s also something you can apply for at most of the funders, and has been for some time already.

    All right. At the FDZ we offer services regarding data management plans, like a workshop coming up in December by David Mor, so I posted the link down here; I guess the presentation will be shared afterwards. There he will give a deeper introduction into data management plans.

    We are also currently testing a platform called RDMO. It’s an open-source platform for creating data management plans, and I’ll just give you a quick impression of it. It’s already available, but we have been doing some updates this week, so I wouldn’t try it this week; maybe next week. You can already find it here, the link is also on the slides, and you can log in with your Uni-ID.

    So you can have your data management plans right here. In the terminology of RDMO, these are your projects. You can create new projects here on the right, but maybe I’ll just show you one data management plan that we already have right here. When you create a data management plan, you first choose from a catalog, which is more or less a data management plan template. That could be the DFG checklist, for example; then it would guide you through the questions of the DFG checklist. We also have the Horizon Europe and the ERC grant data management plan templates in there, and you can switch between those. So once you’ve created a data management plan for the DFG and later need that data management plan for an ERC grant, you can just switch, basically.

    Then you can go through the data management plan by clicking that link and answering the questions. Maybe we start from the beginning. It will basically ask you a lot of questions: what metadata you will have and collect, how and with which standards, and all the things that go into a data management plan, like the size, volume and format of the data and legal issues. You can document all of it in here. The plan that we are working towards is that you can also reuse all of that, for example for reports that you give to the data protection team about the project. The goal is that you don’t need to enter all the data and compile reports over and over again, but have one single point where you can enter everything and then reuse it in multiple ways.

    So that’s basically the platform. Once you’ve gone through that checklist or template, you have different export options, for example as an ERC data management plan; a [unclear] template exists too, as well as a Horizon data management plan and various others. The tool is also very flexible, so you can create your own templates there. This is something we are currently testing, and that you can also test with us and, of course, give us feedback.
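    The “enter once, reuse in several outputs” idea can be sketched in a few lines of Python. This is not RDMO’s data model or API; the question keys, template names and sample answers below are invented purely for illustration.

```python
# Answers collected once; keys and texts are made up for this sketch.
answers = {
    "data_description": "Survey data, about 2 GB, CSV files",
    "documentation": "Codebook and README per wave",
    "legal_issues": "Personal data, consent obtained",
    "storage": "University servers with daily backup",
    "sharing": "Publication in a repository after an embargo",
}

# Each hypothetical template just selects and orders a subset of the questions.
templates = {
    "funder_plan": ["data_description", "documentation", "storage", "sharing"],
    "data_protection_report": ["data_description", "legal_issues", "storage"],
}


def export(answers, template_name):
    """Render the stored answers in the order a given template asks for them."""
    lines = []
    for key in templates[template_name]:
        heading = key.replace("_", " ").capitalize()
        # Missing answers are flagged instead of silently skipped.
        lines.append(f"{heading}: {answers.get(key, '[not answered yet]')}")
    return "\n".join(lines)


print(export(answers, "funder_plan"))
print()
print(export(answers, "data_protection_report"))
```

    Because both outputs read from the same stored answers, a correction made once shows up in every export, which is the point of having a single place to enter everything.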

    A question: can I use an existing data management plan that is somehow included, and then adjust it for my purposes?

    I mean, we have written some, just for training purposes, and we can share them, of course.

    Is it a feature you want to implement, so that, for example, if somebody has to write a data management plan, one can look up what exists, see whether it is similar to what I have to do, and then use that one?

    Yes. You can share data management plans with others. We need to discuss, of course, whether it’s a general policy that we share everything with each other, or whether you have a catalog of data management plans and can ask someone to get access to his or her data management plan. But it’s a good idea, because oftentimes you’re in front of the data management plan template and you think: what should I write here?

    Okay, so this is the data management planning service. I don’t know whether there are any other questions right now; I don’t see any in the chat. Well, then I’ll go on to data acquisition and collection, and also the data offerings that we have in the FDZ.

    Data acquisition is mainly when you’re buying data. The library has of course been buying and licensing electronic resources like e-books, electronic journals and databases for some time, and then the university said that the library should also do the negotiations for commercial databases or other fee-based data sources. So this is something my colleague does, in the central licensing department, more or less. But I guess it also goes through your methods section if people from the MZES want to license data, right?

    The library. What do you mean by licensing? When somebody wants to buy data, it has to go through the university and not to us.

    Okay, so everything has to go through this contact. I was thinking that maybe they would approach you first. So this would then be the contact for licensing fee-based and commercial data.

    Yes. It’s possible to come to us for just finding data or finding out how to get the data, but for a real contract, we are not allowed to sign contracts; contracts have to be signed through the library.

    Okay. And since all the contracts go through the library, we also have an overview of the data licenses that exist at the university. So we’re about to create a data catalog with the licenses, so that researchers of the university can look up which licenses are available, get in touch with other researchers, and discuss how they can use the data and whether it makes sense to use licenses together, and things like that. This is coming soon. So this is the data acquisition of secondary data sources.

    Then there is something that’s quite new to the library, as I’ve already mentioned: the German Internet Panel. You probably know some of the background already, so I won’t go into much detail, but there are some new regulations as of now. It’s still a regular online panel survey, existing since 2012 and based on a random sample of the total population of Germany; at the time of recruitment, that was people aged 16 to 75 and from private households only. We have six panel waves per year in which a survey is running, and currently we have 67 completed waves with more than 200,000 interviews and more than 7,500 participants. The 68th wave is currently in the field.

    The GIP survey infrastructure was implemented by the former SFB and has now transitioned to the library. What’s new is that it’s no longer only a survey infrastructure of the SFB: survey space can now be booked by all researchers university-wide, and also beyond. So if you have any questions and need answers from a survey, from the GIP, feel free to contact us. The SFB funding is now over, so we also have to implement some fees for that survey space, but when you’re writing a proposal you can probably get those fees reimbursed by your funder.

    All right, so how do you use the GIP infrastructure as a researcher?

    So when you have a survey idea, you contact the GIP staff, and they will consult you regarding the feasibility of your plans, the scope, the cost that you might have, the wording, the filtering and things like that. Then the GIP staff organizes the implementation of the questionnaire. They test the questionnaire, and you as a researcher can of course test it as well. You have to consider that the normal lead time to a survey is about three to four months. In the case of current events there is also a possibility of posting emergency questions, but of course we can’t do that every wave.

    Once the survey is in the field: it’s about 20 minutes long at maximum and has a modular design. That means that as a researcher you don’t have to fill the whole 20 minutes, and you don’t have to ask the whole sample; you can also ask subsamples. That’s why we can’t really state the cost of the survey right now; it really depends on the length and on the sample.

    The survey participants are then invited via email and reminded as needed. We have an email and telephone hotline for support, and the participants also receive incentives. So far it was five euros per survey; as of now, with the modular design, which is new, we plan with 25 cents per minute, so if they are there for the whole 20 minutes, they also get five euros at maximum in the end. The incentives are something that we are dealing with; that’s nothing that you as a researcher have to deal with.

    After the survey, the GIP team receives the data and does some data processing and documentation. After about one month the researcher will receive the data, and the GIP team also creates reports for the participants with some interesting findings, to keep them connected to the GIP. After six months the data will be published in the GESIS data archive, and the documentation is published on the pen.org website.

    Just for clarification: does that mean that the whole questionnaire is available, or are there standard questions or things you ask each wave?

    Researchers can fill the whole questionnaire. There are some basic questions, core questions, that they are going to ask regularly, and this is, I think, outside of the 20 minutes. Oh no, I think I have to check, actually. I’ll ask Uli, maybe; he’s coming from the GIP.

    It used to be that pretty much the whole of the November wave was the core questionnaire, so sociodemographics. I don’t know if the current team will keep this up or will spread it out over the whole year. So some part of the questions, as you probably thought, is reserved for those questions that pretty much everyone needs: household composition, income and that kind of thing.

    My question would be: could you elaborate a little bit on the fee structure for using survey space? Is this charged per item, or per response time? Does the fee structure vary depending on whether you just want a single module within one wave or continuous blocks that repeat across waves?

    It’s per minute of the survey, and I think we are planning on three minutes minimum. It also depends on whether you ask the whole sample or just a subsample of the participants. All right, I think there’s also something in the chat; I don’t know whether it’s related.

    Yes, it was clarified in the chat that one of the six waves per year, usually the September wave, was reserved for the core, and that going forward they will most likely spread it into something like three to four minutes per wave.

    All right. Then I go on to the third pillar of data acquisition and collection, and also to the data offerings

    of the library. That’s optical character recognition; it’s both a data offering and a data collection service, more or less. In the past, as many libraries did or do, the library carried out several digitization projects. We digitize printed books, and we also do automated text recognition so that people can search those digitized images for text. Many libraries have done that in a quick and dirty way with some standard tools, which was fine just for searching. But researchers have approached us and said: we need better data quality, not just out-of-the-box optical character recognition but really good quality, because we want to analyze the data. This is why we’ve had several projects on this in the past, and some of them are still running. In BERD@NFDI we also have a part dealing with optical character recognition, and the FDZ together with BERD@NFDI provides consultation and support on automated text recognition. So we share the experience that we acquired with the tools and procedures of automated text recognition. The consultation hour is via Zoom every month, and the next is on the 7th of December. Of course we also support you one-on-one with specific projects and with optimizing digitization and workflows.

    Those are the generic services that result from it, more or less, but we also have the data sources that we have created. One of them is the Aktienführer, another one is the Reichsanzeiger that we did in the past.

    The Aktienführer is a data source about German stock-listed companies that was published in print every year starting in 1956. It contained information, for example, about supervisory board members and managing directors, and key figures like profits and losses. Researchers have been using that data because it’s a very homogeneous and high-quality data source for German companies, with a long history. The standard procedure was that they told their student assistants to type in the particular data that they wanted to analyze, and in the end, of course, the data vanished again; it was never published. So we started a project in 2013 to digitize the Aktienführer, to have it in electronic form, and then do some optimized OCR. It was the wish of the researchers to have really good data quality, because it obviously matters whether the machine recognizes an O or a zero, or a comma or a period. So that’s what we have been doing.

    This is some typical data from the Aktienführer. In that form we have the data just as unstructured full text. We then went through the text and structured it, so we now have the names of the managing directors, the first names, the academic titles, the locations and everything. We put everything in a database and also made a web frontend, and researchers can search but also export all the data in a structured, tabular format. We now have many users of the database, including some external users. For example, we had research projects which analyzed the fraction of female board members over time, so that’s something that can be done very easily now with the database.

    All right. Another data source that we have is the so-called Reichsanzeiger, which was the official journal of the Prussian government, so the Preußischer Staatsanzeiger at first and then the Deutscher Reichsanzeiger. It was a journal intended by the government to do some positive reporting about the government, of course, but also for the proclamation of new laws and regulations, so in that sense it’s also the predecessor of the Bundesanzeiger. It contains public announcements, losses during the First World War, and also historical commercial registers: it had to be published there when new companies in Germany were founded and also when they went into bankruptcy. That’s actually what the first research project was about: it looked at bankruptcies of firms that were owned by Jews. That’s how the Reichsanzeiger came into the library as a project, more or less, and it’s still going on. We are still improving the full text with optical character recognition. It’s still not perfect, because obviously you also have the Fraktur script, the old font style, and many tables, which are also a problem for OCR when they are too messy. So this is really a huge data set that is also very valuable for research in historical dimensions.

    All right, then we go on, if there aren’t any other questions right now.

    Actually, going back to one of the previous points, not OCR but acquisition in general: you mentioned that for commercial or proprietary data there is an option to set up contracts through this contact point in the library. Is there a general rule on fee coverage for these proprietary data? By default, does the funding need to come from the applicant, or are there certain data that the library considers of general interest that would then be bought centrally by the university?

    Normally it comes from the researcher. Of course, if there’s a data source that’s really the must-have data source for the university, and there’s some consensus in a faculty or something like that, then it might also be possible to apply for central funding or to organize central funding.

    Thank you.

    All right, then going forward to data cleaning, structuring and linking. This is something that also came up when the Aktienführer project started, because

    we then got more and more data sources with historic and current data about companies, and we had a problem: we didn’t want to have secluded and separate data silos, so we wanted to connect that data, basically. We did that in projects, currently also in BERD@NFDI, and the topic here is company knowledge graphs, or company data knowledge graphs, using the Wikibase software. Currently, in the database, the company names, for example, are just meaningless strings, more or less, and so are the people and places related to the companies. With the knowledge graph we have the opportunity to model them as unique and identifiable entities that have certain properties. I’ll show you an example quickly; that probably makes it clearer.

    The idea is then also to link those entities once we have identified them, once they have an ID and can be addressed with the identifier, and to link them within the knowledge graph and also beyond. Then you can really easily analyze networks between companies, between people and companies, and so on. So in the end it’s a semantification from un-FAIR data, more or less just text strings, to interlinked and very FAIR entities of data.

    It starts off like this: the raw data, where you just have your unstructured full text. Then it goes into a structure, so you have data of Daimler-Benz, for example, and some structure with the city here, Stuttgart-Untertürkheim, but the database still doesn’t really know what that city means. So we go on and find the specification of that city: it’s a headquarters location and not just any unrelated, random city. It’s also interlinked to other knowledge graphs, for example Wikidata, where they also have entities for those cities.

    And what’s the benefit of having data in a knowledge graph? With a Wikibase-based knowledge graph you have a SPARQL endpoint for querying the data and creating visualizations. Something like “give me the headquarters of German companies in a given year and visualize them” can be done very easily with that kind of data structure. This, for example, would be such a visualization of machine industry companies in, I don’t know, 1940-something, where you can see that the hotspots of the machine industry are here and there. So that’s something that can be done.

    You can link to and from other knowledge graphs, and you can also enrich tabular data using knowledge graphs. Consider you have a table with data about companies, just the company names. Then you can look up those company names in a knowledge graph, and when you have other properties that also describe the company, you can really identify them, connect them with the other knowledge graphs, retrieve data from those knowledge graphs and enrich your table. I think this is a very useful thing if you want to do data matching and data enrichment.

    All right. As I said, we are currently involved in several knowledge graph projects, so we can also support you in creating your own knowledge graph.
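    The table enrichment described above can be sketched as follows. In a real setup the lookup would go to the SPARQL endpoint of the knowledge graph; here a small in-memory dictionary stands in for it, and every entity, identifier and company name is made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str      # placeholder identifier, not a real Wikibase or Wikidata ID
    label: str
    headquarters: str


# Toy knowledge graph keyed by a normalised company name (all values are invented).
TOY_GRAPH = {
    "example maschinenbau ag": Entity("EX1", "Example Maschinenbau AG", "Mannheim"),
    "muster chemie ag": Entity("EX2", "Muster Chemie AG", "Stuttgart"),
}


def normalise(name):
    """Lower-case and collapse whitespace so that trivial spelling variants match."""
    return " ".join(name.lower().split())


def enrich(rows, graph=TOY_GRAPH):
    """Return copies of the rows with entity ID and headquarters added where a match exists."""
    enriched = []
    for row in rows:
        entity = graph.get(normalise(row["company"]))
        enriched.append({
            **row,
            "entity_id": entity.entity_id if entity else None,
            "headquarters": entity.headquarters if entity else None,
        })
    return enriched


table = [
    {"company": "Example  Maschinenbau AG", "year": 1956},
    {"company": "Unknown GmbH", "year": 1956},
]
result = enrich(table)
for row in result:
    print(row)
```

    Matching on a normalised name alone is the weakest possible form of entity matching; as said in the talk, additional properties that describe the company are what make an identification reliable.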

    We also have upcoming, in December, a talk by Rach chap on knowledge graphs for research, so that’s all about knowledge graphs.

    Then I go on to the last step in the research data cycle, I mean data archiving, data sharing and data publishing. As a social science researcher you are probably used to GESIS, and this is of course the go-to data archiving and sharing institution for a social science researcher. But I also want to mention that we have an institutional repository for archiving and/or sharing data. It’s called MADATA, and it’s somewhat similar to the GESIS basic archiving service. We have a data description there with metadata using common metadata standards, DataCite and da|ra, which basically also comes from GESIS. We also get digital object identifiers for the datasets; they are registered via GESIS, and the metadata is also indexed by GESIS and, apart from that, for example by Google Dataset Search and soon also by the OpenAIRE search of the EU.

    With MADATA, the data stays on the servers of the University of Mannheim, and different data sharing modes are possible: you can publish the data as completely open data, you can make it available upon request, and you can also use open data access with an embargo, so the data can become open after, for example, two years. We guarantee to keep the data for at least 10 years, which is in line with the standard requirements of the funders. And of course after the 10 years we will get in touch with the researchers and ask how to proceed from there, so we are not just deleting data from MADATA.

    Okay, and coming soon there will also be a secure data room in the FDZ, in the library to be more exact.

    This is to grant researchers on campus access to data that should not leave the university, data that you don’t want to send out by email to other researchers. They can come here into this secure data room and analyze the data at a library computer which is not connected to the internet or anything like that, so it’s really safer to share the data through that secure data room.

    Okay, Dennis?

    Yes, two questions. The first is about MADATA. This sounds super useful; however, it seems to me that most people who do survey research have so far opted for services such as those provided by GESIS. Could you maybe explain what you think the main differences are, and which of the platforms or services is a better fit for which types of data collection? And then a second question about the secure data room: will this only grant access to data that have been produced at the University of Mannheim, or will researchers also be able to use external secure data, as currently offered by, let’s say, the research data center of the Socio-Economic Panel or in the GESIS secure data room?

    Maybe first question first. It’s an additional option with MADATA. I think as a researcher in social science it makes sense, of course, to go to the well-known institution of the social sciences and a well-known archive. However, there are some researchers who care about the data staying within the university, so it’s just an additional option for that demand, more or less.

    On the secure data room: we’ve had the case in the past that a student was writing her master’s thesis with data from the statistical offices, and we got a contract with the statistical offices so that she could analyze it within the library. We didn’t have the secure data room then. So it depends on the contract.

    Yeah, sure. So that means that there’s demand for certain contracts to be made, then?

    Yeah, then this goes hand in hand with the data acquisition.

    Perfect, thank you.

    Okay, so then we are at the end of the cycle, and it can begin over again. I thank you very much from my side and hand over to Uli for BERD@NFDI.

    Well, thanks for the opportunity to introduce you, the

    audience in this Zoom and online, to our consortium BERD@NFDI, a place for big data enhancing research in business, economics and related fields. “Related fields” means especially all social sciences.

    I’m going to start with an overview of the NFDI. As I mentioned earlier, with the NFDI we could go on forever, so I’ll try to keep it as brief and concise as possible, just to give you the general idea. Some of you have heard about the NFDI: this is the National Research Data Infrastructure. The National Research Data Infrastructure has a vision: data as a common good for excellent research, organized by science in Germany.

    The NFDI is a new funding scheme where the Federal Republic and the federal states joined forces to enhance the research data infrastructure in Germany, because a couple of years ago they stated that Germany is lacking resources. Now, we heard about all the nice things that the University of Mannheim is offering, and the Heidelberg University Library of course also offers these services, and the GESIS data center offers some other services. Still, many data are stored somewhere on researchers’ personal laptops. The consensus was that this has to change, and that there needs to be some interconnection between universities and research infrastructures like the big Fraunhofer and Leibniz institutes. All this should come together and enable more and better research through a common understanding, a common infrastructure, a connection of all these data types. That’s where the NFDI comes into play, and that’s why it is funded.

    It is a totally new funding scheme, and we’re making it up as we go along. This is the reason why I often hear from people, “So what has the NFDI ever done for us?” There’s a researcher sitting here at the MZES thinking, “Okay, these are big words, but what does it do for us?” And this is always the trade-off: if you go with established structures, if the money had been funneled into, say, the Fraunhofer Society or Leibniz institutes, then you can’t build something new and ambitious like the NFDI. But if you want to build something ambitious like the NFDI, then you have new infrastructure, and you have to build new schemes and develop new types of funding, and this all takes time. So that’s always the trade-off.

    A colleague of mine once said that NFDI means doing the right thing in the most complicated way possible, and on some days I wouldn’t really argue with this colleague. But I think we’re getting there.

    How this is set up is that the NFDI is structured in consortia. A consortium is a team, or a contract, of universities or other players that come together in a specific subfield, so it’s organized by research topics and research fields. The reason for this is to get everybody on board. In the end we want to improve the infrastructure for all scientists. There are even fields missing, law for example isn’t in there, but we want to improve the infrastructure for everyone, for all the researchers in Germany and the general public.

    So there are 26 consortia. Some of them are in engineering and life sciences, and we have the social sciences here. If you want to follow some of those, I’d recommend for this audience to take a look at NFDI4DataScience. They’re somehow placed here in engineering sciences, but I don’t really understand why, because this is pretty much a meta-consortium that’s doing all kinds of interesting data-sciencey things. And there’s the NFDI for computer science, NFDIxCS. We also have to come up with these acronyms, which also takes time, I guess.

    Then of course there’s KonsortSWD. This is the consortium that is based on the Rat für Sozial- und Wirtschaftsdaten, and this is where all the research data centers in Germany are teamed up, as we all know. They formed the consortium within the NFDI to create new and interesting things. And then there’s us. We have a focus on business and economics, but as I said, we’re also covering other fields. The division of labor between KonsortSWD and BERD is that KonsortSWD is doing structured data and BERD is doing unstructured data. In a nutshell, that’s what it boils down to.

    But there is quite some overlap between all these topics. I just came from a conference at NFDI4DataScience and I thought, they’re also doing similar things. That’s a common theme in the NFDI: as we bring things together, we see people coming from different angles to the same solutions, or they have similar challenges. For us it’s always data protection, or how we can securely offer data to researchers. For our colleagues at Text+, where it’s about literature, it’s always proprietary information: if a book has been published recently, how can you share text data without violating any terms of service?

    Just to give you the idea: it’s very similar, and often you find similarities and overlaps in the strangest fields.

    So the NFDI, that’s quite a beast. It’s huge and it’s very complicated, and this is due to how it is set up. As I said, it’s funded by Bund and Länder, so they had to reach an agreement, and so there are 17 important stakeholders and many others. The way they did this is that they formed an association, as we Germans do. The NFDI association is based at the KIT in Karlsruhe, and this is where they organize all these activities. Members of this association are only non-natural persons; the University of Mannheim, for example, is a member of the NFDI. And then the NFDI has all sorts of bodies, as you see here: a senate, a Kuratorium, a director, a members’ assembly. It’s a mixture of top-down and bottom-up approaches, and there are lots of people involved and many sessions going on.

    What is maybe more interesting to researchers is that we have these cross-cutting topics, the sections. They are much like scientific associations in a way, and there are five at the moment. Those are topics that are of relevance to many consortia. Take training and education, for example: pretty much every consortium has training and education activities, so there’s overlap and there’s common interest in this topic. So there is a section, and here important decisions are made and collaborations are formed. And everybody can join: you, me, everyone can join these sections. Why would one join these sections? Because there are a lot of intelligent people. Now, I said it’s quite complicated and the NFDI is a beast, and you may say, “Do I have time for another association?” But it’s also a venue for very stimulating and interesting debate, and I mean this in a sincere way; there are really good things coming out of that. It’s bottom-up, so everybody needs to get involved. So that’s an invitation: if you want to, you can check those sections out, reach out to people and join the discussion.

    The 27th consortium is Base4NFDI. It is not a consortium but consortium-like, and it develops basic services for the NFDI. Most notably, right now they’re working on interoperable sign-in or identification services for all of the German science foundations. I thought this problem was solved with this Shibboleth login, but obviously some of the big research associations like Fraunhofer and Helmholtz are not part of the system. So it’s a fragmented landscape, as we do, again, in Germany.

    That’s the NFDI in a nutshell. The NFDI is, I think, in its fourth year, and the conundrum is the long-term funding: the goal is to form a stable, continuous data infrastructure, but the model on which we’re doing this is a five-year funding scheme. In the initial financial agreement they said it ought to, or should, be funded continuously, but as of now it’s not. There shall be a review in a couple of years of how this all holds up and how well it’s functioning, and depending on that we will see how this goes into the future.

    Yeah, the German science landscape is littered with all these attempts to do something big and ambitious that didn’t really perform, and I think the jury is still out on the NFDI, but I think we’re getting there.

    Okay, now to our consortium, having said all that. The idea behind BERD is, and I’m probably preaching to the choir here, that more and more of our research is data-driven. There are estimates of how much research is based on data, and it’s an ever-growing number, so this is an important topic as a basis for all of our research. The corner that we picked out here is those data, some of which I also mentioned, that are not as of now available in a fixed table format. That is all the data that we call unstructured: text and pictures and audio recordings, or, as we saw, text of the Aktienführer that first needs to be given some structure in order to analyze it.

    In that regard we have what we call in our application the traditional model, where we have a table here with structured data that is the result of a survey or some data collection by the government, and then there are empirical methods for causal analysis, as we all know. We call this the traditional model, but I don’t know how much of the research is still traditional, and of course the MZES is obviously at the forefront of developing new approaches. As you all know, we now have this unstructured data from digital resources, all these found data, or big data as one calls that, or text that we first need to structure. And when we structure that data, the datasets are obviously often so big that we can’t do that by hand; we need help from AI and machine learning algorithms to structure the data. The idea behind BERD is to focus on this part of that flow diagram and leave the data that is already structured to KonsortSWD, so to say. We want to be the entity that offers infrastructure and services for the integration of unstructured text, audio and video data into the research.

    The main avenue for doing this is building a platform, of course, as everyone does. Our platform will then have all the services built together and linked together, so that it is fairly easy to use for the researchers and helps them gather new insights with data that had previously been unavailable for research. There’s a beta test of our platform right now, and I would invite everyone who wants to join that beta test: there’s an email address of my colleague Andreas, write him an email and join. We want to go live with this platform in spring next year, with more services, and make it available. The downside of these talks right now is that I keep giving all these talks and say this will be cool in the future and you can’t access this right now. But it exists, so it can be joined.

    Could you maybe explain what people can expect when they sign up?

    Yes, okay, and there will be examples on there. An overview of what we will have on there: what we envision is a data portal where we can publish, disseminate and share data, and I will show screenshots of that in a bit.

    There’s an analytics portal, and this is right now one of the unique selling propositions, so to say, of the BERD platform: we’re also documenting the machine learning models that help structure the data in the process, and those will also be on the platform. As we saw with MADATA, and as we talked about with the GESIS data service, we’ve come a long way with documenting data, so that’s something that we’re all working on. But documenting the pre-processing methods, that’s something where we still see a gap in the documentation process. It obviously makes a huge difference what kind of methods you use to structure the unstructured data prior to analyzing them, so if this is not documented, that makes the whole research unreproducible. And there will be a training and education portal where we have resources to learn and improve on methods for unstructured data, and then there are other services, tools and guidance for the work with unstructured

    data. These are the screenshots that I mentioned. We’re going to have here a curated set of data that are part of published research and that are of interest to the research community. By default, right now, it’s more driven from a business and economics perspective, so these are the important datasets. I didn’t know that; my background is also in survey research, and with these datasets in business it’s a little different than with a survey. If you want to use survey data, you use the data from the German Socio-Economic Panel, and you know where to find them, and there’s only one panel like that. With these, if you want to have a dataset of ticker data or some Amazon review data, there are obviously thousands of them; everybody can collect some and harvest some. But you want to curate the good ones, those that are actually used in the important, cited publications. This is what we’re going to have on our platform here, with a metadata description, so findable and accessible, and you can then get descriptions and dataset metadata. This is how this will look. That makes it findable and usable for researchers who are looking for data to answer their research questions with.

    But we’re also going to link methods, datasets and publications, so we also store the important publications that the data are used in. It is quite a challenge to identify the methods used and the datasets used in publications. There are also other groups working on this, and I think this is the next step, that you can link publications, datasets and methods together. And what is missing here is then also training and suggestions for methods, so that if you say, “I have this dataset and this is important for my research question,” then there’s this link to “others use this method” or “these are the state-of-the-art methods most used to analyze this dataset.” Something that’s also missing here, I don’t know why, is to offer these training resources. I talked about the training and education portal: we want to link those together, so that when you’re new, when you’re an early-career researcher, and you have this dataset, you stumble into a field and you need some new tools and techniques, you can then also find the right courses.

    Something that’s also in testing right now, and I think it’s an interesting part of the whole endeavor here,

    is to have industry datasets, as we see here, from non-research institutions. These are datasets from companies that collect data, or that have found data inside the company, and they can’t publish that or they don’t want to publish that because some of their business model is hidden in this data. We want to connect researchers with these companies, so that researchers can apply for access. Here we have example datasets; say it’s from an automotive company that has a dataset and says somebody could gain some insight from it, or maybe the company even has specific questions about the dataset. We can link together researchers and these companies for a successful cooperation and find ways for them to securely access the data, whether that will be on-site or through some other secure way. Like in a marketplace, we bring them together; that’s why we call this a data marketplace. I think it’s a weird name, but these business people kind of like it, so it’s a name. But that’s the concept, and I think if this picks up, then this is also something that’s really missing in Europe. There are services in the United States that are also quite fancy, but also quite expensive, and if we can do this in Europe, offer this on a lower scale and fund that via the NFDI, I think that would be a good addition to the tools and services.

    Just a quick question about what sort of data get onto that marketplace, both in terms

    of your preferences, I mean, you already said you want the big curated datasets that end up in, or have ended up in, important publications. But your preferences aside, there may be certain entities that are not cooperative, that don’t want to share their data, or where the terms of service prevent the data from being shared on third-party platforms. So what are the limits to this?

    So the data marketplace, this is a separate part of the data portal. As I’ve been told, right now the method is that you’re a doctoral student and you want to analyze something, and then you get access to data from company X because you know somebody who knows somebody that your advisor knows, through these informal channels. We want to make that more transparent and more efficient in a way, and match the people with the right skill set to the right data. We want to make that easy and attract companies to do that. I can’t envision who will join that, and I have no preference.

    With the datasets that I had on the slides earlier, there are two ways to get data into this data portal. The one way is that we curate them: we identify important publications and we want to document the data with these publications. The other is more the GESIS data service approach: if you’re doing research with unstructured data and you’re using tools on the service, you can also document your own data here and fulfill publication requirements that you get from funders or journals. That would be the other avenue. The tricky thing, I think, is then to also link these data repositories, make them searchable and interconnect them. GESIS is in our consortium, so we’re working on making GESIS data searchable with our platform and making our platform searchable by other platforms or research data providers, so that in the end we don’t erect new barriers. The critique against all this platformism obviously is always: then you have all these platforms, how is this interoperable? But we’ve seen that problem and we’re trying to avoid it, playing to our strengths here, but not making it an island, a research island, in a way.

    Just so I understand: there are certain types of data that probably legally could never make it on, right? I know social media data especially, yes, Twitter, when it was still called that, was super

    popular when version two of the research track API was still available. But the problem, as I understand it, was that you could never share any scraped or crawled tweets, because once they are deleted from the platform you no longer have the right to distribute them as part of your data collection. So these are problems that this platform couldn’t get around?

    No. If there are restrictions... I mean, we’re focusing on anonymized data right now, so there’s no pseudonymized data right now, so no German Socio-Economic Panel. Of course we also need to have a service contract with the people who upload these data, and if there is no way of displaying this data right now... I know this is a complicated issue, and right now we put this problem aside. I could imagine that in the future there would be a way to access data somewhere on-site or something, but right now we’re focusing on things that can legally be displayed.

    Thanks.

    But obviously in business and economics that’s a lot. There’s much more open data than in the social sciences, I learned.

    Here is another screenshot of that procedure, and that’s my last slide. So, okay, there’s beta testing of the platform right now. What is online already: you have two parts of the platform that have a

    temporary home on our website. There is training and education, and there’s quite a lot happening there from the colleagues at LMU Munich who are the training and education team; they have very good services. So if the question now would be, “That’s all nice and good to hear, but what can I do now?”: you can join the beta test, but there are also these training and education activities, and there’s good stuff happening out there, like visiting programs and really good talks, in person, and self-paced learning modules on AI and all these tools you need for unstructured data research. The other thing is that we have tools and services, like the OCR, that are now temporarily based at our homepage, berd-nfdi.de, where you can access them. They will be integrated into the platform, but that is probably taking another 12 months, in an optimistic scenario, I guess, because we want to have that all linked together.

    But yeah, we’re getting there, so keep an eye out for that. That’s our consortium, that’s the state of the art of where we are right now, and I’m glad that I could share that with you. Thank you.

    Thank you.
