The Best Practice session will showcase some examples from DataCite Members who have implemented workflows that enable connection metadata to be included when they register DataCite DOIs.
The connection metadata properties in the DataCite Metadata Schema are nameIdentifiers (e.g., ORCIDs), affiliationIdentifiers (e.g., RORs), relatedIdentifiers (e.g., DOIs), and fundingReferences (e.g., Crossref funder IDs).
Connecting DataCite DOIs to other PIDs is essential for the discovery and reuse of the underlying content and for making sure researchers get credit for sharing their outputs.
Join this session to understand how DataCite Members adopt best practices for connecting metadata in their research outputs.
Speakers:
00:00 Welcome and introduction – Bosun Obileye (Regional Engagement Specialist Africa)
01:57 DOI Workflow Best Practices – Herbarium Kiel: IGSN Registration Workflow – Thorge Peterson (IT Developer Research Data Management, Christian-Albrechts-Universität zu Kiel), slides: https://doi.org/10.5281/zenodo.8429360
13:15 DOI Workflow Best Practices – Adding Person, Affiliation and Relationship Connections to DataCite Metadata Records (KAUST) – Daryl Grenz (Digital Repository Lead, University Library, King Abdullah University of Science and Technology) and Rawan Karsou (Digital Collections Coordinator, University Library, King Abdullah University of Science and Technology), slides: https://doi.org/10.5281/zenodo.8429384
24:32 DOI Workflow Best Practices – The IITA way – Hafeez Adepoju (Data Repository Officer and Programmer, International Institute of Tropical Agriculture), slides: https://doi.org/10.5281/zenodo.8429376
35:39 Q&A
Good morning, good afternoon, good evening, from wherever you may be participating. You are welcome to our, which is your, DataCite Annual Community Meeting 2023. This is the Best Practice session, and we'll be looking at DOI workflow best practices: regional examples from our members. Today we'll be having Thorge Peterson from Kiel University presenting to us. From the library of King Abdullah University of Science and Technology we'll be having Daryl Grenz and Rawan Karsou, and from the International Institute of Tropical Agriculture, Hafeez Adepoju will be presenting to us.
If you have joined us, you can participate with us on Twitter or Mastodon; the hashtag for today is DataCite2023. Please go online and share your thoughts. You can review the DataCite Code of Conduct, which has been shared with us in this Zoom meeting. After the event, we encourage you to fill in the survey. The slides and recording of the event will also be shared afterwards. Please enjoy yourself as we start. Now we'll be handing over to Thorge to start.
Yes, thank you, I will start. I hope you can all see my screen. Is it working?
Sure, it is.
Great.
Hello again. I'm Thorge Peterson, and I'm part of the research data management team at Kiel University in Germany. Today I will share some insights into a project undertaken by our Botanical Institute and Botanical Garden. It's the Herbarium Kiel, or Kiel Herbarium, which is currently being digitized, and this includes the registration of IGSN IDs (International Generic Sample Numbers), which are basically DOIs.
Let's have a quick overview. The herbarium is, like I said, created by the Botanical Institute and Botanical Garden. It's a collection of approximately 250,000 specimens spanning over a quarter century. It includes diverse plant groups, [unclear] collections, seeds, and even pieces of wood. At the core, in the general herbarium, there lie specimens gathered from 18th- and 19th-century research expeditions, and these specimens were collected by renowned explorers such as Alexander von Humboldt or Joseph Dalton Hooker. On the right side you can see a photo of the old storage, a picture that hopefully will soon belong to the past.
Over the years the herbarium faced challenges, like moving places and two world wars, and that left visible marks on it. The use of acidic paper also had an effect and made the sheets a little brittle. So all specimens will receive new protective sheets and will be stored in durable cabinets within the next years. Since 2021 we've also received funding to protect this heritage through the preservation of written cultural heritage programme here in Schleswig-Holstein. Within these preservation efforts we also want to digitize the collection and make it accessible. So, in alignment with the preservation efforts, we try to improve accessibility according to FAIR and open data principles. We are making the collection accessible to everyone, starting with the general herbarium, which is also the oldest part.
The data is published in a specimen database and virtual herbarium, JACQ. It's a jointly administered herbarium system, based in Vienna, where users can easily search and explore the collection. On the right side you can see a collection overview of our herbarium, or a snippet of that collection overview. In this process each specimen is given an International Generic Sample Number to make identification and citation effortless and also to improve discoverability. These efforts are carried out by our Botanical Institute members, often in their spare time, so it is, I guess, a long-lasting
process. Let's have a quick overview of our digitization workflow. Of course we safeguard the herbarium sheets and put them in sturdy cabinets. We maintain the historic taxonomy in the cataloguing process so we can preserve the original context of each specimen. But more important for today is the PID assignment: each specimen receives a unique identity through QR codes with the IGSN on them, ensuring permanent and easy identification. Discoverability, retrieval, citation: of course it improves everything. We have a photography step where high-resolution imaging captures the specimen details; on the right side you can see our scanning device, a HerbScan lightbox. All the metadata that is on the sheets is then entered accurately in an in-house database provided by a local company [name unclear], and it ensures consistent, reliable metadata collection. An expert reviews the data afterwards, so we have some guarantee for the quality and reliability as well. In the last step we publish this data: it is exported to the virtual herbarium JACQ, and of course our IGSN metadata is then updated, ensuring that the information remains up to date.
I said "updated" because we implemented IGSN ahead of the transition to DataCite registration services. Before the transition, we had only registered IGSN IDs, or legacy IGSN IDs, for our digitized specimens. Registering the others then became necessary during the transition to DataCite, because we had a lot of QR codes already placed on the sheets, but the sheets were not digitized yet. So this was a streamlined process, and now all IGSN IDs that are printed on QR codes have been registered. IGSN IDs of already digitized specimens are findable, so we have some discoverability for those entries where the metadata already exists. These legacy IGSNs are linked to the new IGSNs with the alternateIdentifier property, and vice versa. This was an automated process during the transition to DataCite in early 2023, and it worked very well, thanks to DataCite again.
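
The two-way linking mentioned here can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the identifier values and the free-text type labels are invented placeholders, not records from the Kiel herbarium, and the helper name `link_identifiers` is hypothetical. Only the `alternateIdentifiers` property and its two keys come from the DataCite Metadata Schema.

```python
def link_identifiers(record, other_id, other_type):
    """Return a copy of a DataCite-style record with one more alternateIdentifier."""
    entry = {"alternateIdentifier": other_id, "alternateIdentifierType": other_type}
    entries = list(record.get("alternateIdentifiers", []))
    if entry not in entries:  # idempotent: re-running adds nothing new
        entries.append(entry)
    return {**record, "alternateIdentifiers": entries}


# Placeholder identifiers, invented for illustration only.
legacy = {"id": "10273/EXAMPLE0001"}
current = {"doi": "10.99999/EXAMPLE0001"}

# Each record points at its counterpart, so the link works in both directions.
current = link_identifiers(current, legacy["id"], "Legacy IGSN")
legacy = link_identifiers(legacy, current["doi"], "DOI")
```

Keeping the helper idempotent means a bulk job like the automated transition described above could be re-run without duplicating entries.
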
When we generate our metadata, the metadata is first entered step by step in an in-house database. You can also see the input form of the database here. There are some yellow fields (the form is also much longer), but the yellow fields are not transferred to JACQ. I wasn't much involved in this; this is more from the discipline-specific point of view, and a local company accomplished it with the local Botanical Institute here. The data is then transferred to JACQ after an expert review, and there are also some authority data linkages that are applied. So we provide the data to JACQ, and they are already enriched with some PIDs, like ORCID iDs and Wikidata IDs. They use VIAF, the Virtual International Authority File, and they also use a couple of discipline-specific PIDs, such as the International Plant Names Index. On the right side you can see the input form of our in-house database and the entry in JACQ. The scans are provided through a IIIF viewer. Some of you may notice that the IGSN is not shown on the landing page so far, but we are still working on implementing the best practices there.
We transform the data when we register an IGSN or when we update the IGSN, and we mainly utilize the information already present in the JACQ export. The transformation is handled by a small Python script; this can be automated. We try to find at least one name identifier and affiliation identifier for each person involved. We also try to gather more identifiers based on the given ones, for example the GND, which is used here in Germany in many cases. We try to identify the roles of the contributors to provide the contributor type; for example, we have here the data collector [name unclear]. If we have date information, such as the collecting date, we add it with the appropriate date type as well. Then we use some other technical info, such as plant family, taxonomy terms, or genus, and so on, in the description and title fields, so this can be utilized in the DataCite search. Of course we also utilize other discipline-specific and non-discipline-specific metadata, like the geo information, and so on.
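
As a rough picture of what such a transformation script might look like, here is a minimal Python sketch. It is not the script used in Kiel: the layout of the export row (`genus`, `species`, `family`, `collector`, `collected`, `place`), the function names, and all values are assumptions made for illustration. The DataCite property names and controlled values (`contributorType: DataCollector`, `dateType: Collected`, `resourceTypeGeneral: PhysicalObject`) are taken from the DataCite Metadata Schema.

```python
import json


def person_entry(person):
    """Build a DataCite name entry with ORCID and ROR identifiers when known."""
    entry = {"name": person["name"], "nameType": "Personal"}
    if person.get("orcid"):
        entry["nameIdentifiers"] = [{
            "nameIdentifier": f"https://orcid.org/{person['orcid']}",
            "nameIdentifierScheme": "ORCID",
            "schemeUri": "https://orcid.org",
        }]
    if person.get("ror"):
        entry["affiliation"] = [{
            "name": person["affiliation"],
            "affiliationIdentifier": f"https://ror.org/{person['ror']}",
            "affiliationIdentifierScheme": "ROR",
            "schemeUri": "https://ror.org",
        }]
    return entry


def to_datacite(row):
    """Map one (hypothetical) export row to DataCite JSON attributes."""
    taxon = f"{row['genus']} {row['species']}"
    attributes = {
        # Taxonomy terms go into title and description so they are searchable.
        "titles": [{"title": f"{taxon} ({row['family']}), herbarium specimen"}],
        "descriptions": [{
            "description": f"Family: {row['family']}; genus: {row['genus']}",
            "descriptionType": "Other",
        }],
        "types": {"resourceTypeGeneral": "PhysicalObject"},
        "contributors": [
            dict(person_entry(row["collector"]), contributorType="DataCollector")
        ],
    }
    if row.get("collected"):
        attributes["dates"] = [{"date": row["collected"], "dateType": "Collected"}]
    if row.get("place"):
        attributes["geoLocations"] = [{"geoLocationPlace": row["place"]}]
    return {"data": {"type": "dois", "attributes": attributes}}


# Invented example row; the ORCID iD is the well-known fictitious test record.
EXAMPLE_ROW = {
    "genus": "Quercus", "species": "robur", "family": "Fagaceae",
    "collector": {"name": "Carberry, Josiah", "orcid": "0000-0002-1825-0097",
                  "affiliation": "Example University", "ror": "012345678"},
    "collected": "1850-06-01", "place": "Example locality",
}

print(json.dumps(to_datacite(EXAMPLE_ROW), indent=2))
```
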
To ensure a certain data quality, this is all reviewed by an expert in the discipline. Regarding the IGSN registration: the transformed JSON data, which follows the DataCite Metadata Schema, can be uploaded to our central IGSN service. We started such a service this year here at Kiel University, and it offers some features. For example, data management members or the respective data stewards can review registration requests, and it's kind of like a Fabrica for our university employees. We have some validation there, and we use autocomplete for name identifiers, for ORCID and Research Organization Registry (ROR) IDs. We have landing page fallbacks: you can see a landing page on the lower right side here, which is a fallback provided by our service. And of course we have a lot of user documentation.
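
One way the identifier validation in such a service could work is sketched below in Python. This is an assumption about the kind of check involved, not a description of the Kiel service. The ORCID check digit follows the ISO 7064 MOD 11-2 scheme that ORCID documents; the ROR check only tests the documented shape of a ROR ID and does not verify its checksum or look it up in the registry.

```python
import re

# Shape of a ROR ID: a leading 0, six base32-style characters, two digits.
ROR_SHAPE = re.compile(r"0[a-hj-km-np-tv-z0-9]{6}[0-9]{2}")


def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 ORCID digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check format and check digit of an ORCID iD such as 0000-0002-1825-0097."""
    compact = orcid.replace("-", "")
    if not re.fullmatch(r"\d{15}[\dX]", compact):
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


def has_ror_shape(ror_id):
    """Check only the syntax of a bare ROR ID (no registry lookup)."""
    return ROR_SHAPE.fullmatch(ror_id) is not None
```

A form could call `is_valid_orcid` before accepting a registration request, so that mistyped identifiers are caught before they reach the DOI metadata.
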
All these measures help to follow the DataCite-specific best practices when registering IGSN IDs, especially here in the university context, where multiple institutes are registering IDs.
So, last slide: the current status. We are very happy with the DataCite transition, and it works well for us. For now we have only 31 discoverable, findable entries. This will accelerate: there are a lot of entries in our in-house database that will soon be published. We have findable entries that have collector information and the respective contributors, and they all have been linked with at least one name identifier. We are utilizing the QR codes in the cataloguing process, which is very handy. Our legacy IGSNs still persist, and we can work with the new DOI IGSN IDs while providing discoverability only for the already digitized entries. So this all worked out very well.
As well, we are going to revise entries in the JACQ database. We're still working on the implementation of the best practices for the external landing pages, like here with JACQ; it's a jointly administered system, so this takes some time. Then there are some other technical things, like the IIIF protocol, and we're transitioning our in-house database to a modern software stack. So this is it; this was a quick overview. I hope you have gained some insights into our process, and if there are any questions, just follow up.
Thank you, we really appreciate that. For other participants: if you have questions, kindly post them in the Q&A. We will have special time after the three presentations for questions and answers as well as discussion. Thank you very much. Now let's have our presenters from the library of King Abdullah University of Science and Technology. Over to Daryl and Rawan; you may kindly share your screen. Thank you.
Thank you. Yes, so I will start today by introducing some of the basic things that we are doing to connect our records using identifiers,
and then Rawan will talk about a new pilot project we have started.
Our repository was established back in 2011, and we started using ORCID iDs in 2015. Then we joined DataCite and started registering DOIs in 2018. Our initial decision to join DataCite was because we wanted to support dataset and software archiving, but we also wanted to provide DOIs for other unique materials. For us it has turned out so far that the focus of our DOI registrations has been on providing DOIs for our theses and dissertations, though we do have a number of datasets and other materials as well.
When we joined DataCite, our repository records already had ORCID iDs for many of the people who are authors, especially on theses and dissertations, because we had been requiring student ORCID iDs since 2015. We also have ORCID iDs on most of our dataset submissions. Even though users aren't required to make the connection during submission, if we have a confirmed ORCID iD for that person from our university, we will add it to the record even if they didn't add it during the submission originally. So most of the DOI metadata that we send to DataCite includes name identifiers, as ORCID iDs, for the authors. This has been successful; I think we have benefited from this process, and it has been straightforward for us.
For affiliation connections we have done less. During our submissions, for most items we only collect, or actually only know, whether someone is a KAUST person or not. If there are authors who are not affiliated with KAUST, we don't normally know what their affiliation is. So we only include affiliation entries for the authors if they are KAUST people, but we do now include the ROR ID for KAUST on those affiliation entries.
The other types of relationships through identifiers that we have started adding are related identifiers. This is most common for dataset submissions, where we do ask the submitter to identify whether the dataset is a supplement to a paper. The authors have the option just to list the title or a text citation, and actually this is most common, because normally when the dataset was submitted the article had not been published yet. So we go back periodically to check whether the article has been published and add the identifier for that article, and then it will have the relationship in the DataCite metadata. These are basically the records that have related identifiers in our registrations. A couple of these do have inverse relationships: a few of the theses and dissertations have an IsSupplementedBy relation that is actually to a dataset in our repository, which then also has the inverse relationship. But most of these relations are to external DOIs or URLs.
The last part here is what my colleague Rawan will talk about, where we've been doing some work to try to add reference lists to thesis and dissertation records.
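
The creator, affiliation, and related identifier entries described in this part of the talk can be sketched in Python as follows. This is a hedged illustration, not KAUST's implementation: the function names, the in-memory `records` dictionary, the DOIs, and the institution entry with its ROR value are all placeholders. The property names and the `IsSupplementTo` / `IsSupplementedBy` relation types come from the DataCite Metadata Schema.

```python
INVERSE = {
    "IsSupplementTo": "IsSupplementedBy",
    "IsSupplementedBy": "IsSupplementTo",
}

# Placeholder institution entry; a real ROR ID would go here.
INSTITUTION = {"name": "Example University", "ror": "https://ror.org/012345678"}


def creator(name, orcid=None, local=False, institution=INSTITUTION):
    """Build a creator; affiliation is added only for authors known to be local."""
    entry = {"name": name, "nameType": "Personal"}
    if orcid:
        entry["nameIdentifiers"] = [{
            "nameIdentifier": f"https://orcid.org/{orcid}",
            "nameIdentifierScheme": "ORCID",
            "schemeUri": "https://orcid.org",
        }]
    if local:
        entry["affiliation"] = [{
            "name": institution["name"],
            "affiliationIdentifier": institution["ror"],
            "affiliationIdentifierScheme": "ROR",
        }]
    return entry


def _add(record, identifier, relation, id_type):
    entry = {"relatedIdentifier": identifier,
             "relatedIdentifierType": id_type,
             "relationType": relation}
    related = record.setdefault("relatedIdentifiers", [])
    if entry not in related:
        related.append(entry)


def add_relation(records, source, relation, target, id_type="DOI"):
    """Add a relation to `source`; mirror it only if `target` is our own record."""
    _add(records[source], target, relation, id_type)
    if target in records and relation in INVERSE:
        _add(records[target], source, INVERSE[relation], id_type)


# Two local records (placeholder DOIs) and one external article.
records = {"10.99999/thesis-1": {}, "10.99999/dataset-1": {}}
add_relation(records, "10.99999/thesis-1", "IsSupplementedBy", "10.99999/dataset-1")
add_relation(records, "10.99999/dataset-1", "IsSupplementTo", "10.1234/external-article")
```

The second call models the periodic follow-up: once an article is published, its DOI is added to the dataset record, and no inverse is written because the article is not held locally.
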
Rawan?
Yes, thank you, Daryl. This work has been done as a collaboration between us and [name unclear] at Columbia University. As Daryl said, it's a workflow that we want to test: to add the reference list, display the references in our repository, and disseminate these reference relationships for our records through DataCite metadata.
We started by requesting the BibTeX file from the students. We emailed around 200 students who completed their theses in 2022. After that, we decided to set up an automatic notification to be sent to any student who archives their thesis or dissertation. We didn't get a really high response: only 16 responses, and the reference list was added to only two separate records. Next I will talk about converting the reference, or BibTeX, file to XML.
We converted the BibTeX file to XML using PHP. Some of the challenges that we ran into: every BibTeX file is different. Some of the BibTeX files include extra special characters or extra brackets. We also noticed that some BibTeX files include an extra note at the beginning, on the first line of the file, generated by the program used to produce the BibTeX file. Also, some identifiers may be stated differently; for example, arXiv DOIs or arXiv IDs may be identified as a URL or as a DOI.
When we were mapping our XML to DataCite XML, the two main things we needed to think about were whether to use related identifiers or related items for the references that we have, and which relationship type to use, Cites or References. Daryl, if we can go to the next slide, please. So our decision was to create a related item entry for all the references that we have. In addition to that, we create a related identifier entry for the references with a URL or a DOI, and we decided to use the References relationship type, which is equivalent to the Cites relationship for our purpose.
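
The mapping decision described above can be sketched in a few lines of Python. Note the swap: the speakers converted BibTeX to XML with PHP, whereas this sketch uses Python and plain dictionaries shaped like DataCite JSON, purely for illustration. The regular expressions only handle simple, well-formed entries with one field per line, and the sample entries, function names, and the BibTeX-type-to-`relatedItemType` table are assumptions.

```python
import re

ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*?)\n\}", re.S)
FIELD_RE = re.compile(r"^[ \t]*(\w+)[ \t]*=[ \t]*(.+?)[ \t]*,?[ \t]*$", re.M)
ITEM_TYPES = {"article": "JournalArticle", "book": "Book"}  # anything else: "Other"


def parse_bibtex(text):
    """Yield (entry type, fields) pairs; text outside @entries is ignored."""
    for kind, _key, body in ENTRY_RE.findall(text):
        fields = {name.lower(): value.strip('{}"')  # drops extra braces and quotes
                  for name, value in FIELD_RE.findall(body)}
        yield kind.lower(), fields


def normalise_identifier(fields):
    """Return (identifier, type); a DOI wins even when it is written as a URL."""
    raw = fields.get("doi") or fields.get("url") or ""
    doi = re.search(r"10\.\d{4,9}/\S+", raw)
    if doi:
        return doi.group(0), "DOI"
    if raw.startswith(("http://", "https://")):
        return raw, "URL"
    return None, None


def references_to_datacite(text):
    """Every reference becomes a relatedItem; those with an ID also a relatedIdentifier."""
    related_items, related_identifiers = [], []
    for kind, fields in parse_bibtex(text):
        item = {"relatedItemType": ITEM_TYPES.get(kind, "Other"),
                "relationType": "References",
                "titles": [{"title": fields.get("title", "")}]}
        identifier, id_type = normalise_identifier(fields)
        if identifier:
            item["relatedItemIdentifier"] = {"relatedItemIdentifier": identifier,
                                             "relatedItemIdentifierType": id_type}
            related_identifiers.append({"relatedIdentifier": identifier,
                                        "relatedIdentifierType": id_type,
                                        "relationType": "References"})
        related_items.append(item)
    return related_items, related_identifiers


SAMPLE = """% Exported by a reference manager (hypothetical first-line note)
@article{smith2020,
  title = {{An Example Title}},
  author = {Smith, Jane},
  doi = {https://doi.org/10.1234/example}
}
@misc{doe2021,
  title = {A Preprint},
  url = {https://arxiv.org/abs/2101.00001}
}
@book{roe2019,
  title = {A Book Without an Identifier}
}
"""

items, identifiers = references_to_datacite(SAMPLE)
```

The sample deliberately includes the quirks mentioned in the talk: a generator note on the first line, doubled braces around a title, and a DOI written as a URL.
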
We noticed that only the references with DOIs will show in DataCite Commons, and if we remove a related identifier, it will not remove the citation from DataCite Commons. Next, we can see an example of how the record looks with all the references showing in DataCite Commons. That's all I want to share with you today. Please let us know if you have anything to ask in the question box.
Bosun?
Oh sorry, I was muted. Thanks so much, Rawan and Daryl. I'm sorry, I didn't notice I was muted before I was talking. Thanks for that; it's good to see how we can use related identifiers with DOI metadata. I also want to appreciate what we have seen concerning IGSN with what Kiel University is doing. Now it's over to Hafeez. Hafeez, you may kindly share your screen.
Okay, good morning. I'll share my screen right now. I hope my screen is up now?
Yes, it is.
Good. So this is DOI best practices, the IITA way. I am the data repository officer and programmer for IITA, that is, the International Institute of Tropical Agriculture.
To kick-start this: in IITA we utilize DOIs, and the DOIs we generate go through DataCite. We have different research products: journal and article publications, images from our research, documentation, and datasets. These are the research products we generate DOIs for, specifically DataCite DOIs. At the moment we have over 3,000 findable DOIs that have been registered with DataCite; precisely, that is 3,237 datasets registered with DataCite. All of these are findable; even through Google Dataset Search you will still get IITA research products, especially datasets, as well as publications, and even in Google Scholar you have the DOIs there.
We have a data management workflow that you can see on screen, and at this point we can see where we assign DOIs to our research products. We receive the data, to start, from our researchers, our scientists. Then we have data curators who do the quality checking for each of the datasets. The data curators also update the data dictionary of our datasets. We also have the institutional data manager, who reviews the quality for quality assurance and quality control. Then the data curators update the CG Core metadata. The CG Core metadata schema is a thematic metadata schema for agriculture, used across the CGIAR consortium. As the repository officer, once this metadata has been completely filled, it will be mapped to the DataCite Metadata Schema version
4.4. So we can see we assign our DOIs using Fabrica, the DOI Fabrica from DataCite. We have been generating the metadata according to the CG Core metadata schema, which is easy to map to the DataCite Metadata Schema version 4.4. CG Core is an extension of [unclear], as we will see in the next few slides. We know that DOI Fabrica is intuitive; generating DOIs is straightforward as long as we have all the metadata details. Especially if you have been using metadata schemas, you will find it very easy to use DOI Fabrica. Because we use the CG Core schema, it is easy for us to map this to the DataCite Fabrica metadata, schema version 4.4. So we are able to provide rich metadata, which has really helped us in increasing the findability, accessibility, interoperability, and reusability of our research products. We have over 90% of all the details required to be filled in DOI Fabrica.
Here we can see a kind of mapping. The left side, in yellow, shows the schema details from CG Core, and the other side the DataCite API, where we have the type of the DOI: it is published, that is, findable. We have the names of the creators; we also have the identifier of the creator, that is, the ORCID iD. We can see we have the creator type here, and the creator, and we can see this has been mapped from the CG Core metadata to the DataCite Metadata Schema. CG Core, like I said earlier, is a thematic metadata schema for agriculture, and IITA is an agricultural research institute, so we map this CG Core metadata to the DataCite Metadata Schema version 4.4, which is seamless.
We also exploit the DataCite APIs for DOI generation. We do some integration with databases: we have different research databases for the thematic areas of the institution. We have Cassavabase for cassava crop research, Musabase for Musa, I mean banana, crop research, and Yambase for yam crop research, and we also have the Genetic Resources Center, which has Genesys. We map the metadata from all these databases. These metadata are not structured, so we have employed BrAPI, the Breeding API. We pass the metadata to the second API, which is the institutional data repository API; this maps the unstructured metadata to the CG Core metadata. From the second API, we pass that to the DataCite API, which maps the CG Core metadata to the DataCite Metadata Schema 4.4. This is easy for us because the CG Core metadata already has all the details required in the DataCite Metadata Schema, so it is easy for us to map the metadata details, and we generate DOIs automatically for these datasets coming from the databases.
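
The last hop of such a pipeline, from a structured repository record to a DataCite registration request, might look like the Python sketch below. It is an illustration under assumptions, not IITA's code: the source field names (`title`, `creators`, `publisher`, `issued`, `landing_page`) are stand-ins rather than verified CG Core element names, and the prefix and record values are invented. The payload shape and the `event: publish` attribute, which makes a DOI findable, follow the DataCite REST API.

```python
import json

REQUIRED = ("title", "creators", "publisher", "issued", "landing_page")


def to_datacite_payload(source, prefix):
    """Map a repository record (hypothetical field names) to a DataCite JSON body."""
    missing = [field for field in REQUIRED if not source.get(field)]
    if missing:
        raise ValueError(f"cannot register DOI, missing: {', '.join(missing)}")

    creators = []
    for person in source["creators"]:
        creator = {"name": person["name"], "nameType": "Personal"}
        if person.get("orcid"):
            creator["nameIdentifiers"] = [{
                "nameIdentifier": f"https://orcid.org/{person['orcid']}",
                "nameIdentifierScheme": "ORCID",
                "schemeUri": "https://orcid.org",
            }]
        creators.append(creator)

    attributes = {
        "prefix": prefix,    # DataCite mints a random suffix under this prefix
        "event": "publish",  # moves the DOI straight to the findable state
        "titles": [{"title": source["title"]}],
        "creators": creators,
        "publisher": source["publisher"],
        "publicationYear": int(source["issued"][:4]),
        "types": {"resourceTypeGeneral": "Dataset"},
        "url": source["landing_page"],
    }
    return {"data": {"type": "dois", "attributes": attributes}}


# Invented record and prefix, for illustration only.
RECORD = {
    "title": "Example cassava trial dataset",
    "creators": [{"name": "Carberry, Josiah", "orcid": "0000-0002-1825-0097"}],
    "publisher": "Example Institute",
    "issued": "2023-05-17",
    "landing_page": "https://example.org/dataset/1",
}

print(json.dumps(to_datacite_payload(RECORD, "10.99999"), indent=2))
```

The resulting JSON is the kind of body accepted by `POST https://api.datacite.org/dois` with repository credentials; no request is sent here. Failing early on missing fields reflects the point made in the talk that registration is straightforward once all required metadata details are present.
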
So we also utilize the DOIs to generate multi-standard citations on our repositories. We have this on our data repository, that is, data.iita.org. We have the multi-standard citations here through CrossCite, the service between DataCite and Crossref. CrossCite has a myriad of data citation standards, so we just plug into CrossCite to pick about 10 of these citation styles to put on our repository. We can see here we have a way to link, through our DOIs generated through DataCite, to get different citation styles. So when you use our research products, it is easy to cite each of the research products.
We also link the ORCID iDs of our researchers to the datasets we have on the repository. I'll walk through how this is done in a bit. We make it in such a way that the datasets, or whatever research products you have with a DataCite DOI, will definitely show up in your works on your ORCID profile. You do that from the ORCID profile of each researcher, and we have trained them on how to link their ORCID with DataCite DOIs. From ORCID, you can see you add works: click the Search & link option when you want to add works to your ORCID profile. Then you can see where DataCite is; select DataCite. From DataCite you get an option to create a token for the connection between DataCite and your ORCID. You can also do this with your [unclear], but we do that with our ORCID iDs. So click that to get the token. Once the token is gotten, you have to update the settings. With that, you get your datasets, or it could be publications, as well as any other research products that have your ORCID iD on them and have DataCite DOIs. These will be linked to your ORCID profile. This particular scientist we have here has a lot of datasets, I mean DOIs; I'm not permitted to scroll through his page. So he has a lot of DataCite DOIs pulled from his DataCite publications to his ORCID profile.
Last but not least, we also link the publications, I mean the research articles, to the datasets we have. We have the datasets, and we also have the publication itself, so we have a way to link the two: the DOI from each publication goes back to the datasets used in the publication. We do all this using our DataCite DOIs, the DOIs we generated through DataCite. We still have more services that we are trying to harness for the datasets, like the data citation counters and the downloads, especially through DataCite Commons, which we are working on, and we still hope to work more on all these services. If you have any questions,
we welcome them now. Thank you.
Thank you so much, Hafeez. Thank you for the fantastic presentation, and good to know that you are using that feature in Commons to push all the publications that you have to the ORCID records. To all the presenters, please switch on your cameras, and to all the attendees, if you have any questions, please post them in the Q&A box. Hafeez, you may stop sharing. Fantastic, thank you.
While we are waiting for questions, I just have an idea. Within our conversations with [unclear] regions (and I'm sure my colleague Bosun agrees with me on this point), we usually receive inquiries like: "I have a repository and I'm going to connect it with DataCite and with DOIs, but I have full-text information and metadata in Arabic or, for example, in Thai or in Hindi. Can I still connect with DataCite? Can I still use DOIs for my content?" So maybe Daryl and Rawan, given your experience with the King Abdullah repository, maybe you have full text and metadata in Arabic; can you share your experience with that?
We are not a good example, because actually at our university the metadata and full text are normally in English.
I see, okay. No worries. But just to let you know, the answer is usually yes: you can still register the content even if the full text or the metadata is in a different language. Hafeez, do you want to add anything?
Yes. In IITA, because we have a diverse research environment, sometimes we get some metadata that are in French. Once you can convert this to French, you can put that in DOI Fabrica using any language. We usually use French because we work with some people who speak French, so I know we do implement some DOIs using French.
Fantastic, thank you so much. We also have a very interesting example from Asia: the National Research Council of Thailand is registering a huge number of datasets in the Thai language. All the metadata, author names, publication titles, all this information is presented in Thai, so this is a fantastic regional example as well. If you have any questions, please put them in the Q&A box. I don't see any questions until
now.
I have a question from a member who was talking about IGSN. One day the member was wondering whether, when creating samples and at the same time having paper publications, there will be a problem with the identifiers, because they are not the same. So I don't know whether Thorge has experience of that, maybe.
Sorry, but I didn't get the question right. So what is actually the problem? When I register a sample, how can I then cite the sample in a publication? Yes, I think this is basically the same as with a typical DOI. There's a difference with the legacy IGSN: before, we had a prefix for the whole IGSN namespace, it was 10273, and all registering members had to divide this namespace by using prefixes in the suffix namespace of a DOI. This changed, so now we can create as many repositories as we want. The DataCite best practice, and I think it's also a rule, is that you put only IGSNs in a single repository and don't mix them with other types like text documents or datasets. So before, we could recognize an IGSN just by the prefix. Now that's not possible, but I think there will be some services in future that can do this; I think DataCite has this on the roadmap, so we can basically identify it by the prefix then as well. Maybe this will help in publications, to further point out that this is a sample identifier and not a text identifier. But I think from a DOI you can't see whether it's a dataset or a text resource either. But maybe I'm talking too much in the wrong direction. If the person could specify the question, maybe I can direct to a good answer.
Yes, I think you are addressing it, because the organization was thinking of combining both samples and publications in the same repository, and they were asking me how they would be able to separate samples from publications, because they consider their samples to be IGSNs and the publications will be DOIs, which is what you are just addressing.
Yes, that's right. I would basically have separate repositories then, but make use of the related identifiers, the IsCitedBy and the
inverse directions and so on. This would be helpful, I think, for discoverability.
If you don't mind, I'll jump in a little bit. I'm Rorie Edmunds, the Samples Community Manager at DataCite. IGSN IDs are now functionally DOIs, so there is no distinction in DataCite systems or services between an IGSN and another DOI at the record level. That is something, as Thorge has alluded to, that's definitely on our roadmap and something we are very keen to implement as we move forward with the DataCite Metadata Schema. At the moment we recognize IGSN IDs at the repository level. So, again as Thorge has mentioned, we basically have a repository type that is the IGSN ID catalog. You create a specific repository account and prefix as an IGSN ID catalog, and you only put samples within that. That allows us to filter searches, it allows us to filter harvesting, and it allows us to monitor and help people who are registering samples as they do so. So we have a good solution, but not necessarily the perfect solution. We have made recommendations as regards how you cite samples within the literature, but I think that is an entirely difficult question anyway, because samples are already very poorly cited within the literature. So there is already a problem that we need to address, talking with the publishing community and getting some consensus both within the samples community itself and within the publishing community. We know that citation of text documentation is easy; we know that data is improving but not perfect; but something like samples is still rather in its infancy by comparison. I apologize for jumping in, but I hope that helps clarify things a little bit. Thank you.
Thank you, Rorie. Please, if you have any questions, you can post them in the Q&A box. Maybe we can give it one minute. Okay, I don't see any questions, so thank you so much to all our speakers today for sharing their interesting developments with the community. We highly appreciate your support and your presentations. Again, all the slides will be available in the DataCite Zenodo community, and the recording of this webinar will also be uploaded to the DataCite YouTube channel. At the end, once we close the webinar, you will receive a poll, so please give us your feedback; this is very important for us. Thank you again for attending and joining the DataCite community session. We'll also have sessions throughout the day in different time zones, so please do join us. Thank you so much for your participation.