Ondřej Košarko edited this page Jan 11, 2016 · 1 revision


EUDAT Replication

iRODS (https://www.irods.org) technology is used for replication in the EUDAT project, and we use it to replicate submissions. Our requirement is to have this module as an integral part of DSpace. The original DSpace module (link) cannot be used because it is too outdated; there is another implementation at https://www.irods.org/index.php/DSpace, but it has its own problems. The default Jargon iRODS client library (link) shipped with DSpace had to be updated as well.

The workflow from the client's point of view is simple: file replicas are uploaded somewhere (to the iRODS server) and you access the replicas using handles.

For now, we have obtained credentials for CINES (ariane.cines.fr) from Stephane.

Finally, after upgrading and reimplementing the plugin, we have a working solution; see the iRODS replication tab in the Control panel (only on https://ufal-point-dev.ms.mff.cuni.cz/jm/repository/ for now).

What do we replicate

We automatically replicate each submission after it has been approved. The submission is converted to an AIP (Archival Information Package), which is uploaded to the iRODS server. The name of each AIP contains our PID, e.g.,

irods://[email protected]:XX/CINESZone/home/jmisutka/dspace_1.8.2/11858_00-097C-0000-0001-487A-4-6451959456007568280.zip
irods://[email protected]:XX/CINESZone/home/jmisutka/dspace_1.8.2/11858_00-097C-0000-0001-487E-B-6544474914476974525.zip
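The file names above appear to follow the pattern: the handle with "/" replaced by "_", a hyphen, a random number added by File.createTempFile (see the upload code in the Implementation section), and ".zip". A minimal sketch of that mapping in both directions, assuming this pattern and assuming the handle suffix contains neither "/" nor "_"; AipNames, handleToName and nameToHandle are hypothetical names, not the actual IrodsReplication.handle_to_name:

```java
// Hypothetical sketch: maps a handle to an AIP file-name prefix and back.
// Assumes names look like <prefix>_<suffix>-<random digits>.zip and that the
// handle suffix contains neither '/' nor '_'.
final class AipNames {
    private AipNames() {}

    // "11858/00-097C-0000-0001-487A-4" -> "11858_00-097C-0000-0001-487A-4"
    static String handleToName(String handle) {
        return handle.replace('/', '_');
    }

    // "11858_00-097C-0000-0001-487A-4-6451959456007568280.zip" -> "11858/00-097C-0000-0001-487A-4"
    static String nameToHandle(String fileName) {
        String base = fileName.endsWith(".zip")
                ? fileName.substring(0, fileName.length() - ".zip".length())
                : fileName;
        // drop the random temp-file number that follows the last hyphen
        int dash = base.lastIndexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("unexpected AIP name: " + fileName);
        }
        base = base.substring(0, dash);
        // the first underscore separates the handle prefix from its suffix
        int underscore = base.indexOf('_');
        if (underscore < 0) {
            throw new IllegalArgumentException("unexpected AIP name: " + fileName);
        }
        return base.substring(0, underscore) + "/" + base.substring(underscore + 1);
    }
}
```

Recovering the handle from a file name is the step that list_missing_replicas needs when it compares replicas against items.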

After uploading the file, we fill in its metadata, e.g.,

EUDAT_ROR : http://hdl.handle.net/11858/00-097Z-0000-0022-E46B-E
OTHER_AckEmail : [email protected]
OTHER_From : https://ufal-point-dev.ms.mff.cuni.cz/jm/xmlui
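A minimal sketch of how these three attributes could be assembled before they are written to the replica. It assumes EUDAT_ROR (EUDAT's "Repository of Record") is the canonical handle URL built from handle.canonical.prefix plus the handle, as in the upload code in the Implementation section; that OTHER_AckEmail is taken from ufal.replication.eudat.notification_email; and that OTHER_From is the repository's UI URL. InitialReplicaMetadata and build are hypothetical names, and the actual iRODS metadata call made through Jargon is not shown:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the attributes attached to a replica right after upload.
final class InitialReplicaMetadata {
    private InitialReplicaMetadata() {}

    static Map<String, String> build(String canonicalPrefix, String handle,
                                     String ackEmail, String repositoryUrl) {
        Map<String, String> md = new LinkedHashMap<>();
        // Repository of Record: canonical handle URL of the original item
        md.put("EUDAT_ROR", canonicalPrefix + handle);
        // address for acknowledgements (assumed: the configured notification email)
        md.put("OTHER_AckEmail", ackEmail);
        // the repository the replica came from
        md.put("OTHER_From", repositoryUrl);
        return md;
    }
}
```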

Then the server assigns a CINES PID to the replica and processes it further. The final set of metadata looks like this:

ADMIN_Status : Transferred
EUDAT_PID : http://hdl.handle.net//11137/12976d62-3f9e-11e3-86be-0013725874a1?noredirect
EUDAT_ROR : http://hdl.handle.net/11858/00-097Z-0000-0022-E46B-E
INFO_Checksum : e1f90e29c5983d78abfd073acb65ad4a
INFO_TimeOfDataUpload : 2013-10-28.02:56:29
INFO_TimeOfTransfer : 2013-10-28.02:57:12
OTHER_AckEmail : [email protected]
OTHER_From : https://ufal-point-dev.ms.mff.cuni.cz/jm/xmlui

When you list the replicas in the Control panel, their metadata are read: the listing is sorted by INFO_TimeOfDataUpload, and the status of each replica is taken from ADMIN_Status.
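A minimal sketch of that ordering, assuming each replica's metadata is available as a Map of the attribute names shown above and that INFO_TimeOfDataUpload always uses the "2013-10-28.02:56:29" layout. Newest-first order and putting replicas without an upload time last are choices made for this sketch; ReplicaListing is a hypothetical name:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: orders replica listings by their upload-time metadata.
final class ReplicaListing {
    // matches values such as "2013-10-28.02:56:29"
    static final DateTimeFormatter UPLOAD_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd.HH:mm:ss");

    private ReplicaListing() {}

    // status shown in the listing, e.g. "Transferred"
    static String status(Map<String, String> md) {
        return md.getOrDefault("ADMIN_Status", "unknown");
    }

    static LocalDateTime uploadTime(Map<String, String> md) {
        String v = md.get("INFO_TimeOfDataUpload");
        return v == null ? null : LocalDateTime.parse(v, UPLOAD_TIME);
    }

    // newest uploads first; replicas without an upload time go last
    static List<Map<String, String>> sorted(List<Map<String, String>> replicas) {
        List<Map<String, String>> out = new ArrayList<>(replicas);
        out.sort(Comparator.comparing(ReplicaListing::uploadTime,
                Comparator.nullsLast(Comparator.<LocalDateTime>reverseOrder())));
        return out;
    }
}
```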

Implementation

Three classes are responsible for replication: ReplicationManager, IrodsReplication and ItemModifyConsumer. ReplicationManager is a DSpace adapter for IrodsReplication, which is itself independent of DSpace (except for reading credentials from the configuration files). ReplicationManager contains these important methods:

  • replicate - replicates a DSpace Item, which represents a submission,
  • replicate_missing - replicates each handle that cannot be found among the replicas,
  • list_replicas - lists the files inside the directory used for replication and queries the server for specific metadata of each file (as specified by CINES, the EUDAT partner responsible for the iRODS backend), e.g., EUDAT_PID,
  • list_missing_replicas - lists the replicas, obtains the handle of each replica and subtracts the set of replica handles from the set of all handles.
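The set arithmetic behind list_missing_replicas can be sketched as follows, assuming the handles of all items and the handles recovered from the replica file names are already available as strings; MissingReplicas and missing are hypothetical names:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: missing = (all item handles) minus (replica handles).
final class MissingReplicas {
    private MissingReplicas() {}

    static Set<String> missing(Collection<String> allHandles,
                               Collection<String> replicaHandles) {
        Set<String> result = new LinkedHashSet<>(allHandles); // keeps input order
        // copy into a HashSet so each membership test is O(1) even if a List is passed
        result.removeAll(new HashSet<>(replicaHandles));
        return result;
    }
}
```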

The asynchronous upload code in ReplicationManager is shown below:

  public void run() {
...
      context = new Context();
      context.setCurrentUser( EPerson.find(context, this.eperson_id) );
...
      // wait until DSpace has finished submitting the item
      Item item_with_proper_context = wait_for_dspace_item(context, item_id);

      // prepare AIP
      File file = File.createTempFile(IrodsReplication.handle_to_name(handle)+"-",".zip");
      file.deleteOnExit();
      new DSpaceAIPDisseminator().disseminate(
          context, item_with_proper_context, new PackageParameters(), file);

      // AIP failure
      if ( !file.exists() ) {
        throw new IOException( String.format(
            "AIP package has not been created [%s]", file.getCanonicalPath()) );
      }

      // replicate
      String item_url = String.format( "%s%s",
              ConfigurationManager.getProperty("handle.canonical.prefix"),
              handle);
      new IrodsReplication().replicate(
              file.getAbsolutePath(), item_url, force);
...
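The body of wait_for_dspace_item is not shown above. One plausible reason for the wait is that the replication thread is started from an event and may run before DSpace has finished with the item. A minimal poll-with-timeout sketch of such a wait, assuming a lookup that returns null until the object is available; Waiting, waitFor, lookup and the timings are hypothetical and are not the actual implementation:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch: poll until lookup returns non-null or the timeout expires.
final class Waiting {
    private Waiting() {}

    static <T> T waitFor(Supplier<T> lookup, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            T value = lookup.get();
            if (value != null) {
                return value;
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException(
                        "object not available within " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }
}
```

In run() above, lookup could correspond to fetching the item by item_id through the thread's own Context.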

See the actual code of ReplicationManager.java and IrodsReplication.java.

The last part of replication is the DSpace hook that defines when replication is called: an event listener that starts replication when items are created or modified. See ItemModifyConsumer.java.
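ItemModifyConsumer is not reproduced here. The following sketch only illustrates the general pattern such an event consumer could follow: remember the IDs of created or modified items while events arrive, then start replication once per item when the batch of events ends. SimpleEvent and ReplicationTrigger are hypothetical stand-ins, not DSpace's event classes, so the sketch runs without DSpace; the real consumer may apply further checks (e.g., that the submission has been approved):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical stand-in for a DSpace event (not the org.dspace.event API).
final class SimpleEvent {
    final String subjectType; // e.g. "Item", "Collection"
    final String eventType;   // e.g. "CREATE", "MODIFY"
    final int subjectId;

    SimpleEvent(String subjectType, String eventType, int subjectId) {
        this.subjectType = subjectType;
        this.eventType = eventType;
        this.subjectId = subjectId;
    }
}

// Hypothetical sketch: collect item IDs per batch, replicate each once at the end.
final class ReplicationTrigger {
    private final Set<Integer> pending = new LinkedHashSet<>();
    private final Consumer<Integer> replicate;

    ReplicationTrigger(Consumer<Integer> replicate) {
        this.replicate = replicate;
    }

    void consume(SimpleEvent e) {
        boolean isItem = "Item".equals(e.subjectType);
        boolean relevant = "CREATE".equals(e.eventType) || "MODIFY".equals(e.eventType);
        if (isItem && relevant) {
            pending.add(e.subjectId); // create + modify of one item collapse to one entry
        }
    }

    void end() {
        List<Integer> ids = new ArrayList<>(pending);
        pending.clear();
        ids.forEach(replicate);
    }
}
```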

UI

The integration into DSpace is done through our DSpace control panel extension. You can:

  • List home directory - displays the remote home directory on the iRODS server
  • List replicas - lists the files in the replication directory, including their status, links and metadata
  • List items to be replicated - lists items that are not on the remote server (e.g., because they were manually deleted via the iRODS web user interface)
  • Replicate missing items async. - starts asynchronous replication of the items above, up to the maximum count specified in the user interface
  • Replicate specific handle - manually replicates the item with the given handle (force-deleting the replica if it already exists), e.g., 11858/00-097C-0000-000D-F696-9
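A minimal sketch of the "Replicate missing items async." action, assuming the missing handles are already known (e.g., from list_missing_replicas) and using a standard ExecutorService for the background work; AsyncReplication, replicateMissing and replicateOne are hypothetical names, not the control panel's actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch: submit at most maxCount missing handles for background replication.
final class AsyncReplication {
    private AsyncReplication() {}

    static List<Future<?>> replicateMissing(List<String> missingHandles, int maxCount,
                                            ExecutorService executor,
                                            Consumer<String> replicateOne) {
        List<Future<?>> futures = new ArrayList<>();
        // clamp the UI limit to [0, number of missing handles]
        int limit = Math.min(Math.max(maxCount, 0), missingHandles.size());
        for (String handle : missingHandles.subList(0, limit)) {
            futures.add(executor.submit(() -> replicateOne.accept(handle)));
        }
        return futures; // callers may ignore these or use them to report progress
    }
}
```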

How to install EUDAT replication for DSpace 1.8

  1. download
     • ReplicationManager.java
     • IrodsReplication.java
     • ItemModifyConsumer.java

  2. put them into dspace-api/src/main/java/cz/cuni/mff/ufal according to the package they belong to. Either change the logger class in ReplicationManager to the standard org.apache.log4j.Logger or use our own implementation (available here), which sends emails on specific errors.

  3. add these variables to dspace.cfg and fill them in

    # iRODS-specific settings for EUDAT
    ufal.replication.eudat.host=
    ufal.replication.eudat.port=
    ufal.replication.eudat.username=
    ufal.replication.eudat.password=
    ufal.replication.eudat.homedirectory=
    ufal.replication.eudat.zone=
    ufal.replication.eudat.defaultstorage=
    ufal.replication.eudat.notification_email=

Then add the event listener to dspace.cfg:

# list of event listeners
event.dispatcher.default.consumers = search, browse, discovery, eperson, harvester, eudatreplication
# consumer that starts EUDAT replication when items are created or modified
event.consumer.eudatreplication.class = cz.cuni.mff.ufal.dspace.storage.ItemModifyConsumer
event.consumer.eudatreplication.filters = Community|Collection|Item+Create|Modify
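A small sketch that checks whether every ufal.replication.eudat.* key listed above has been filled in. It uses java.util.Properties purely for illustration (DSpace itself reads dspace.cfg through its own ConfigurationManager), and EudatConfigCheck and missingKeys are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch: report EUDAT replication keys that are missing or empty.
final class EudatConfigCheck {
    static final String PREFIX = "ufal.replication.eudat.";
    static final String[] KEYS = {
        "host", "port", "username", "password", "homedirectory",
        "zone", "defaultstorage", "notification_email"
    };

    private EudatConfigCheck() {}

    // returns full key names whose values are absent or blank
    static List<String> missingKeys(Properties cfg) {
        List<String> missing = new ArrayList<>();
        for (String key : KEYS) {
            String value = cfg.getProperty(PREFIX + key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(PREFIX + key);
            }
        }
        return missing;
    }
}
```

Running such a check once at startup would report an unfilled key before the first upload attempt fails.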

Control panel tab integration

Integrating the control panel tab requires a refactored version of our control panel, see Controlpanel and then ControlPanelReplicationTab.

Test it (restricted)

  1. https://ufal-point-dev.ms.mff.cuni.cz/jm/xmlui/admin/panel?replication

  2. tunnel the port: ssh -L 8899:ariane.cines.fr:80 [email protected]

  3. https://ufal-point-dev.ms.mff.cuni.cz/jm/repository/

Related tasks

  • #663

Links
