PIDs (Persistent Identifiers)
We summarise policies and our usage of PIDs below.
By default, a PID resolves to a landing page (web page), where a user can read more information about the data (metadata) and download the data (if possible). With a specific query, it resolves to "pure metadata", which is intended mainly for automation.
- https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0022-AAF5-B
- has this PID: 11858/00-097C-0000-0022-AAF5-B
- which can be accessed through the handle HTTP proxy: http://hdl.handle.net/11858/00-097C-0000-0022-AAF5-B
- asking directly for CMDI metadata: http://hdl.handle.net/11858/00-097C-0000-0022-AAF5-B@format=cmdi
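The URL forms in the list above can be derived mechanically from a PID. The following is a minimal sketch, not part of the repository code: the helper name `handle_urls` is hypothetical, and it only builds the strings shown above without contacting the resolver.

```python
# Hypothetical helper for illustration; only the URL patterns come from the list above.
HANDLE_PROXY = "http://hdl.handle.net/"


def handle_urls(pid: str, proxy: str = HANDLE_PROXY) -> dict:
    """Return the landing-page URL and the CMDI metadata URL for a PID."""
    landing = proxy + pid
    return {
        # Default resolution: the landing page of the record.
        "landing_page": landing,
        # "Pure metadata" resolution, using the @format=cmdi query form.
        "cmdi": landing + "@format=cmdi",
    }


urls = handle_urls("11858/00-097C-0000-0022-AAF5-B")
print(urls["landing_page"])
print(urls["cmdi"])
```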
A PID should point to a state of data in time. "Data" here means primarily the abstract information, so a trivial change in format is OK. A non-trivial format change is borderline, and it is better to assign it a new PID. A format change that changes the information in the data, like exporting PDT 2.5 from PML to CoNLL format (which can only store a subset of the information), clearly requires a new PID.
Only repository administrators should change data that has a PID. Persistency is very important. That being said, not all changes are equal, and maybe submitters (and definitely Editors) could add new files (like readmes and documentation) and change metadata. Submitters' changes should still go through the Editors.
We should also simplify creating new versions (derived records) from existing records (and specify the relation of records by dc.relation: subset, version-of, supersedes, etc.).
PIDs are assigned at submission. This way submitters can decide the granularity. They submit (and describe) what should be preserved with a PID (a corpus, a sub-corpus of an existing corpus, one document, or even a special sentence).
PIDs could also be used for pointing to words in a corpus, for instance, but there are significantly more effective ways to do that.
New handle functionality was implemented in DSpace. This includes the following:
Functionality to define different handle prefixes for different communities was added. This can be useful for instance in case of merging multiple repositories into one, where the former repositories are transformed into communities.
Example configuration:
# per community pid configurations for pid prefixes of the format:
# community=<community ID>,prefix=<prefix>,type=<local|epic>,canonical_prefix=<URL of handle>,subprefix=<subprefix>
# multiple configurations can be given separated by semicolon
# default configuration should have asterisk as the community ID
# subprefix is only used for local handles
lindat.pid.community.configurations = community=*,prefix=11858,type=epic,canonical_prefix=http://hdl.handle.net/,subprefix=1
The subprefix keyword is optional and can be used to generate handles of the form: <prefix>/<subprefix>-<handle>.
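To make the configuration format concrete, here is a minimal Python sketch of how such a value could be parsed and how a handle of the form `<prefix>/<subprefix>-<handle>` could be assembled. It is not the DSpace implementation: the function names are hypothetical, and the second entry (community `42`, a local prefix `11234` with subprefix `5`) is an assumed example added only to show the semicolon-separated form and the fallback to the `*` default.

```python
# Illustrative sketch; function names and the community "42" entry are assumptions.
def parse_pid_configurations(value: str) -> dict:
    """Parse 'k=v,k=v;k=v,...' into {community_id: {key: value}}."""
    configs = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        fields = dict(item.strip().split("=", 1) for item in entry.split(","))
        configs[fields["community"]] = fields
    return configs


def config_for(configs: dict, community_id: str) -> dict:
    """Return the community's configuration, falling back to the '*' default."""
    return configs.get(community_id, configs["*"])


def format_handle(config: dict, suffix: str) -> str:
    """Build <prefix>/<subprefix>-<suffix>; the subprefix applies to local handles only."""
    subprefix = config.get("subprefix")
    if config.get("type") == "local" and subprefix:
        return f"{config['prefix']}/{subprefix}-{suffix}"
    return f"{config['prefix']}/{suffix}"


value = (
    "community=*,prefix=11858,type=epic,"
    "canonical_prefix=http://hdl.handle.net/,subprefix=1;"
    # Hypothetical second entry, for illustration only.
    "community=42,prefix=11234,type=local,"
    "canonical_prefix=http://hdl.handle.net/,subprefix=5"
)
configs = parse_pid_configurations(value)
print(format_handle(config_for(configs, "42"), "CESILKO-URL"))
print(config_for(configs, "7")["prefix"])  # unknown community uses the default
```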
With #766, the default (community=*) is also used for new collections and communities.
In the LINDAT/CLARIN project, the following subprefixes are used for new items starting from June 17th, 2014:
| *Subprefix* | *Description* |
| 1 | Common submissions |
| 5 | Weblicht Web Services |
| 6 | Demos |
New functionality to store links to external resources in the handle table (so-called External handles) was added. It is now possible to have records that do not point to an object in the database, but rather to some defined absolute URL.
This serves two main purposes:
- it can be used as a means to point multiple handles at the same record if needed (for instance, if an item is a duplicate of some existing item and has to be deleted, or if the handle prefix changes but old handles should remain resolvable)
- it can be used to point to external non-data entities such as services
The handle table was extended with a new url column:
Table "public.handle"
Column | Type | Modifiers
------------------+-------------------------+-----------
handle_id | integer | not null
handle | character varying(256) |
resource_type_id | integer |
resource_id | integer |
url | character varying(2048) |
The following record is an example of an external handle pointing at an external resource:
handle_id | handle | resource_type_id | resource_id | url
-----------+--------------------------------+------------------+-------------+-----------------------------------------------------------------------------
288 | 11234/5-CESILKO-URL | | | https://lindat.mff.cuni.cz/services/rest/cesilko/translate
The following record is an example of an external handle pointing at an existing handle:
handle_id | handle | resource_type_id | resource_id | url
-----------+--------------------------------+------------------+-------------+-----------------------------------------------------------------------------
123 | 11858/00-097C-0000-0001-4870-7 | | | http://hdl.handle.net/11858/00-097C-0000-0001-4877-A
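The lookup logic implied by the two records above can be sketched as follows. This is only an illustration under stated assumptions: it uses an in-memory SQLite database in place of the repository's actual database, the `resolve` function is hypothetical, and the third row (an internal record with `resource_type_id` 2 and `resource_id` 100) is invented to show the contrast with external handles.

```python
import sqlite3

# In-memory stand-in for the handle table; column names follow the listing above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE handle (
        handle_id        INTEGER NOT NULL,
        handle           VARCHAR(256),
        resource_type_id INTEGER,
        resource_id      INTEGER,
        url              VARCHAR(2048)
    )
    """
)
rows = [
    # External handle pointing at an external resource (from the example above).
    (288, "11234/5-CESILKO-URL", None, None,
     "https://lindat.mff.cuni.cz/services/rest/cesilko/translate"),
    # External handle pointing at an existing handle (from the example above).
    (123, "11858/00-097C-0000-0001-4870-7", None, None,
     "http://hdl.handle.net/11858/00-097C-0000-0001-4877-A"),
    # Hypothetical internal record; the type and id values are made up.
    (1, "11858/00-097C-0000-0022-AAF5-B", 2, 100, None),
]
conn.executemany("INSERT INTO handle VALUES (?, ?, ?, ?, ?)", rows)


def resolve(handle: str):
    """Return ('external', url), ('internal', (type_id, id)), or None if unknown."""
    row = conn.execute(
        "SELECT resource_type_id, resource_id, url FROM handle WHERE handle = ?",
        (handle,),
    ).fetchone()
    if row is None:
        return None
    resource_type_id, resource_id, url = row
    if url is not None:
        # External handle: no database object, only an absolute URL.
        return ("external", url)
    return ("internal", (resource_type_id, resource_id))


print(resolve("11234/5-CESILKO-URL"))
print(resolve("11858/00-097C-0000-0022-AAF5-B"))
```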
Although a tool for changing the handle prefix already existed in DSpace as a command line tool (bin/dspace), the functionality was reimplemented and corrected, and a GUI was added to facilitate this task and prevent user-made mistakes. Old handles are preserved in item metadata and, if needed, the old handles are archived, i.e. preserved in the handle table but pointed to the new handles via the external handle functionality mentioned above.
A new GUI for managing handles has been developed. It enables users to conveniently browse through existing handles, add new external handles, and edit existing external handles, as well as change the handle prefix.
Screenshot of listing of existing handles:
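The archiving step described above can be pictured with a short sketch. It is not the reimplemented DSpace tool: the function `change_prefix` and the returned record layout are assumptions, and the new prefix used in the example is arbitrary. The sketch only shows the idea that the old handle stays in the table as an external handle whose url points at the new handle.

```python
# Illustrative only; not the actual prefix-change implementation.
def change_prefix(handle: str, new_prefix: str,
                  canonical_prefix: str = "http://hdl.handle.net/"):
    """Return the new handle and an external-handle record archiving the old one."""
    _old_prefix, suffix = handle.split("/", 1)
    new_handle = f"{new_prefix}/{suffix}"
    archived = {
        "handle": handle,            # the old handle remains resolvable
        "resource_type_id": None,    # external handles have no database object
        "resource_id": None,
        "url": canonical_prefix + new_handle,  # points at the new handle
    }
    return new_handle, archived


new_handle, archived = change_prefix("11858/00-097C-0000-0001-4870-7", "11234")
print(new_handle)
print(archived["url"])
```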
Screenshot of form for changing handle prefix:
Screenshot of form for editing external handle: