EudatReplication
iRODS (https://www.irods.org) is used for replication in the EUDAT project. We use it to replicate submissions. Our requirement is to have this module as an integral part of DSpace. The original DSpace module (link) cannot be used because it is too outdated; there is another implementation at https://www.irods.org/index.php/DSpace, but it has its own problems. The default Jargon iRODS client library (link) shipped with DSpace had to be updated as well.
The workflow from the client's point of view is simple: file replicas are uploaded to a remote iRODS server and you access the replicas using handles.
For now, we have obtained credentials for CINES (ariane.cines.fr) from Stephane.
After upgrading and reimplementing the plugin, we have a working solution; see the control panel's iRODS replication tab (only on https://ufal-point-dev.ms.mff.cuni.cz/jm/repository/ for now).
We automatically replicate each submission after it has been approved. The submission is converted to the AIP format and the resulting package is uploaded to the iRODS server. We use our PID in the name of each AIP, e.g.,
irods://[email protected]:XX/CINESZone/home/jmisutka/dspace_1.8.2/11858_00-097C-0000-0001-487A-4-6451959456007568280.zip
irods://[email protected]:XX/CINESZone/home/jmisutka/dspace_1.8.2/11858_00-097C-0000-0001-487E-B-6544474914476974525.zip
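The file names above suggest that the handle is turned into a file name by replacing `/` with `_`, followed by the random suffix that `File.createTempFile` appends. A minimal sketch of that mapping and its inverse, under that assumption (the class `AipNames` and its methods are hypothetical; the real `IrodsReplication.handle_to_name` may behave differently):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the actual replication classes. */
final class AipNames {
    // Assumption: the trailing "-<digits>.zip" is the random part added by File.createTempFile.
    private static final Pattern AIP_NAME = Pattern.compile("^(.*)-\\d+\\.zip$");

    /** Assumed mapping: "/" cannot appear in a file name, so it becomes "_". */
    static String handleToName(String handle) {
        return handle.replace('/', '_');
    }

    /** Recovers the handle from a replica file name, or returns null if the name does not match. */
    static String nameToHandle(String fileName) {
        Matcher m = AIP_NAME.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        // Only the first "_" separates the handle prefix from the suffix.
        return m.group(1).replaceFirst("_", "/");
    }
}
```

The inverse direction is what a replica listing would need in order to map a remote file back to the item it belongs to.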
After uploading the file, we fill out its metadata e.g.,
EUDAT_ROR : http://hdl.handle.net/11858/00-097Z-0000-0022-E46B-E
OTHER_AckEmail : [email protected]
OTHER_From : https://ufal-point-dev.ms.mff.cuni.cz/jm/xmlui
Then the server assigns a (CINES) PID to the replica and does more processing. The final set of metadata looks like:
ADMIN_Status : Transferred
EUDAT_PID : http://hdl.handle.net//11137/12976d62-3f9e-11e3-86be-0013725874a1?noredirect
EUDAT_ROR : http://hdl.handle.net/11858/00-097Z-0000-0022-E46B-E
INFO_Checksum : e1f90e29c5983d78abfd073acb65ad4a
INFO_TimeOfDataUpload : 2013-10-28.02:56:29
INFO_TimeOfTransfer : 2013-10-28.02:57:12
OTHER_AckEmail : [email protected]
OTHER_From : https://ufal-point-dev.ms.mff.cuni.cz/jm/xmlui
When you list the replicas in the control panel, this metadata is read: the listing is sorted by INFO_TimeOfDataUpload and the status is taken from ADMIN_Status.
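The sorting step can be sketched as follows. This is only an illustration: it assumes each replica's metadata has already been fetched into a `Map<String, String>`, that timestamps always follow the `2013-10-28.02:56:29` format shown above, and that oldest-first order is wanted. The class `ReplicaListing` is hypothetical and not taken from the actual control panel code.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for ordering replica metadata. */
final class ReplicaListing {
    // Matches values such as "2013-10-28.02:56:29".
    static final DateTimeFormatter UPLOAD_TIME =
            DateTimeFormatter.ofPattern("uuuu-MM-dd.HH:mm:ss");

    /** Replicas without an upload time sort first (assumption). */
    static LocalDateTime uploadTime(Map<String, String> metadata) {
        String raw = metadata.get("INFO_TimeOfDataUpload");
        return raw == null ? LocalDateTime.MIN : LocalDateTime.parse(raw, UPLOAD_TIME);
    }

    /** Status as reported by the server, or "unknown" while processing has not finished. */
    static String status(Map<String, String> metadata) {
        return metadata.getOrDefault("ADMIN_Status", "unknown");
    }

    /** Returns a new list ordered by upload time, oldest first. */
    static List<Map<String, String>> sortByUploadTime(List<Map<String, String>> replicas) {
        List<Map<String, String>> sorted = new ArrayList<>(replicas);
        sorted.sort(Comparator.comparing(ReplicaListing::uploadTime));
        return sorted;
    }
}
```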
There are three important classes responsible for replication: ReplicationManager, IrodsReplication and ItemModifyConsumer. The first class is an adapter for IrodsReplication, which is independent of DSpace (except for getting credentials from configuration files). ReplicationManager contains these important methods:
- replicate - replicates a DSpace Item which represents a submission,
- replicate_missing - replicates each handle which cannot be found among the replicas,
- list_replicas - lists files inside the directory used for replication and queries the server for specific metadata for each file (according to the specification from CINES, the EUDAT partner responsible for the iRODS backend, e.g., EUDAT_PID),
- list_missing_replicas - lists replicas, obtains the handle for each replica and subtracts the replica handles from the set of all handles.
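The subtraction in list_missing_replicas is a plain set difference. A minimal sketch, assuming both sides are already available as collections of handle strings (the class `MissingReplicas` is hypothetical and only illustrates the idea):

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper showing the set difference behind list_missing_replicas. */
final class MissingReplicas {
    /** Returns every handle that has no replica, in sorted order. */
    static Set<String> missing(Collection<String> allHandles, Collection<String> replicaHandles) {
        Set<String> result = new TreeSet<>(allHandles);
        result.removeAll(replicaHandles);
        return result;
    }
}
```

replicate_missing would then iterate over the returned handles and replicate each one.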
The asynchronous upload code in ReplicationManager is shown below:
public void run() {
    ...
    context = new Context();
    context.setCurrentUser( EPerson.find(context, this.eperson_id) );
    ...
    // wait until DSpace has finished submitting the item
    Item item_with_proper_context = wait_for_dspace_item(context, item_id);

    // prepare AIP
    File file = File.createTempFile(IrodsReplication.handle_to_name(handle) + "-", ".zip");
    file.deleteOnExit();
    new DSpaceAIPDisseminator().disseminate(
        context, item_with_proper_context, new PackageParameters(), file);

    // AIP failure
    if ( !file.exists() ) {
        throw new IOException( String.format(
            "AIP package has not been created [%s]", file.getCanonicalPath()) );
    }

    // replicate
    String item_url = String.format( "%s%s",
        ConfigurationManager.getProperty("handle.canonical.prefix"),
        handle);
    new IrodsReplication().replicate(
        file.getAbsolutePath(), item_url, force);
    ...
You can look at the actual code of ReplicationManager.java and IrodsReplication.java.
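The excerpt does not show how wait_for_dspace_item waits. One way such a wait could look is a bounded polling loop; the sketch below is an assumption about the general technique, not the actual implementation, and the class `WaitFor` with its parameters is hypothetical.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical bounded polling helper. */
final class WaitFor {
    /**
     * Calls lookup up to `attempts` times, sleeping between tries,
     * and returns the first non-null result.
     */
    static <T> Optional<T> poll(Supplier<T> lookup, int attempts, long sleepMillis) {
        for (int i = 0; i < attempts; i++) {
            T value = lookup.get();
            if (value != null) {
                return Optional.of(value);
            }
            if (i + 1 < attempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up early.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```

Bounding the number of attempts keeps a background replication thread from waiting forever if the item never becomes available.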
The last part of replication is the DSpace hook which defines when replication is triggered. It is an event listener which starts replication when items are created or modified. See ItemModifyConsumer.java.
The integration in DSpace is provided by our DSpace control panel extension. You can:
- List home directory - displays the remote home directory on the iRODS server
- List replicas - lists files in the replication directory, including status, links and metadata
- List items to be replicated - lists items which are not on the remote server (e.g., they were manually deleted through the iRODS web user interface)
- Replicate missing items async. - starts replication of the items above, up to the maximum count specified in the user interface
- Replicate specific handle - manually replicates an item (force-deleting it if it already exists) by specifying its handle, e.g., 11858/00-097C-0000-000D-F696-9
- download ReplicationManager.java, IrodsReplication.java and ItemModifyConsumer.java
- put them into dspace-api/src/main/java/cz/cuni/mff/ufal according to the package they belong to. Either change the logger class in ReplicationManager to the standard org.apache.log4j.Logger or use our own implementation (which sends emails on specific errors, available here)
- add these variables to dspace.cfg and fill them out
ufal.replication.eudat.host=
ufal.replication.eudat.port=
ufal.replication.eudat.username=
ufal.replication.eudat.password=
ufal.replication.eudat.homedirectory=
ufal.replication.eudat.zone=
ufal.replication.eudat.defaultstorage=
ufal.replication.eudat.notification_email=
and add the event listener to dspace.cfg
# list of event listeners
event.dispatcher.default.consumers = search, browse, discovery, eperson, harvester, eudatreplication
# consumer that triggers EUDAT replication
event.consumer.eudatreplication.class = cz.cuni.mff.ufal.dspace.storage.ItemModifyConsumer
event.consumer.eudatreplication.filters = Community|Collection|Item+Create|Modify
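Before restarting DSpace, it may help to check that every ufal.replication.eudat.* key has a value. The sketch below does this with plain `java.util.Properties` on the configuration text; it is a standalone illustration and does not use DSpace's own ConfigurationManager. The class `ReplicationConfigCheck` is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical pre-flight check for the replication settings. */
final class ReplicationConfigCheck {
    static final String PREFIX = "ufal.replication.eudat.";
    static final String[] KEYS = {
        "host", "port", "username", "password",
        "homedirectory", "zone", "defaultstorage", "notification_email"
    };

    /** Returns the full names of all keys that are absent or empty. */
    static List<String> missingKeys(String cfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> missing = new ArrayList<>();
        for (String key : KEYS) {
            String value = props.getProperty(PREFIX + key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(PREFIX + key);
            }
        }
        return missing;
    }
}
```

For real use, the text would be read from the deployed dspace.cfg; an empty result means all eight settings are filled in.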
Integrating the control panel tab requires a refactored version of our control panel, see Controlpanel and then ControlPanelReplicationTab.
- https://ufal-point-dev.ms.mff.cuni.cz/jm/xmlui/admin/panel?replication
- tunnel port: ssh -L 8899:ariane.cines.fr:80 [email protected]
- create new submission, go to http://ufal-point-dev.ms.mff.cuni.cz:8083/login?from=%2F , select dspace-master-jm-lazy-test and click Build Now
- #663
- https://confluence.csc.fi/display/Eudat/Home
- http://ariane.cines.fr/rodsweb/index.php (e.g., through a tunnel on ufal-point-dev to ariane.cines.fr:80)