Request Mechanism (initial purpose: Data Access Request) #3609

damoodamoo · 2023-07-11T11:53:37Z

Context

It's common for a Researcher to need access to a specific dataset on which to perform their research. The ways this happens across projects, organisations and industries are myriad, so a core TRE Data Access Request mechanism must provide:

A minimum, base experience to allow a researcher to log a request, and an authorised person to approve that request.
The ability for an organisation to supply their own request form, in JSON Forms format
Extension points to enable a customer to build their own backend workflow and trigger their own processes during and following the request.

User Stories

As a Researcher, I want to use the TRE Portal to create a new Data Access Request for the workspace I am working in.
As a Researcher, I want to view my request, see any updates to it, and understand where in the process it currently is.
As a Researcher, I want to see all existing and past Data Access Requests that pertain to the workspace I am in.

As a Data Controller / Data Manager, I want to view a list of Data Access Requests for the workspace I am looking at.
As a Data Controller / Data Manager, I want to use the TRE Portal to approve / reject the request, and supply comments.
As a Data Controller / Data Manager, I want to be able to send the request back to the initiator for re-work, supplying comments.

As a TRE Implementer, I want to be able to create a number of Data Access Request forms per workspace type.
As a TRE Implementer, I want to be able to trigger my own workflows via HTTP/webhook after the initial request has been submitted, so that I can use a workflow tool of my choice to build a complex business-related approval workflow.
As a TRE Implementer, I want my custom workflow to be able to log progress messages back to the initial request, so the initiator can see what is happening.
As a TRE Implementer, I want my custom workflow to be able to update the status of the initial request.
As a TRE Implementer, I want to trigger a custom background process via HTTP/webhook to provision the actual dataset in some way, after the request has been approved.

Differences to Airlock

The airlock process is a more rigid and custom process. It contains lots of logic around reviews: creating review VMs, wiring up a Review Workspace, providing mechanisms to actually move files, generating SAS links, etc. Whilst the new DAR mechanism will be informed by Airlock, and will be integrated in the UX, it will not reuse the airlock API endpoints. Efforts will be made to reduce code duplication, such as refactoring the concept of a status so that it is no longer airlock-specific and is shared by all request types.

Could this be used for any type of approval process, such as requesting a VM?

Potentially. However, the primary use case is Data Access Requests, and that will be the focus in the UI. There will be a single requests service, which handles the database interactions and CRUD operations, and then it will be up to the endpoints to build upon that service to enforce any custom model structures and decision making applicable to that request type.

Managing Form Templates

Each organisation will need to be able to build and manage a number of custom form templates. These will be heavily inspired by the resource templates. In the repo, a new directory named forms will be created under ./templates. This will house any number of JSON schema documents.

Each form template will contain the following fields:

$schema, $id, title, description: Same as resource templates
form:
- required, properties, allOf/oneOf (etc) blocks: Same as resource templates, defining the fields
- uiSchema to order fields and provide extra UI hints
triggers: a new list block specifically for forms. These entries will be used by the API logic to trigger webhook URLs upon status changes.
- name: string - friendly name of the trigger
- status: string - if the request status equals this, fire the below URI
- URI: string - may contain sensitive data, so it will not be surfaced in GET responses.
formType: string - used for lookups, for instance "all Data Access Request forms".
isGlobal: boolean - Indicates whether this form could show up anywhere in the TRE
workspaceTypes: Optional. A list of strings matching workspace definition names (e.g. base). Used to retrieve forms that are only meant for a particular type of workspace.
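To make the field list concrete, here is a hypothetical form template written as a Python dict, plus two helpers: one that picks the triggers matching a status, and one that strips trigger URIs before a template is returned from a GET. All values (URLs, `$id`, the `base` workspace type) are illustrative, the helper names are invented for this sketch, and the `ui:order` hint assumes a react-jsonschema-form style uiSchema.

```python
import copy

# Hypothetical example template; keys follow the field list above, values are made up.
EXAMPLE_FORM_TEMPLATE = {
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "https://example.org/forms/basic-dar.json",
    "title": "Basic Data Access Request",
    "description": "Request access to a dataset for this workspace",
    "form": {
        "required": ["dataset_name", "justification"],
        "properties": {
            "dataset_name": {"type": "string", "title": "Dataset name"},
            "justification": {"type": "string", "title": "Justification"},
        },
        # Assumed react-jsonschema-form style ordering hint.
        "uiSchema": {"ui:order": ["dataset_name", "justification"]},
    },
    "triggers": [
        {"name": "Start approval workflow", "status": "in_review",
         "URI": "https://workflows.example.org/hooks/start?key=SECRET"},
        {"name": "Provision dataset", "status": "approved",
         "URI": "https://workflows.example.org/hooks/provision?key=SECRET"},
    ],
    "formType": "data_access_request",
    "isGlobal": False,
    "workspaceTypes": ["base"],
}


def triggers_for_status(template: dict, status: str) -> list[dict]:
    """Return every trigger whose status equals the given request status."""
    return [t for t in template.get("triggers", []) if t["status"] == status]


def redact_for_get(template: dict) -> dict:
    """Return a copy safe for GET responses: trigger URIs are removed."""
    safe = copy.deepcopy(template)
    for trigger in safe.get("triggers", []):
        trigger.pop("URI", None)
    return safe
```

Redacting a deep copy keeps the stored template (and its URIs) intact for the API's own trigger logic.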

To support the new forms concept, we'll need the underlying plumbing:

Forms cosmos collection
forms API and service:
- create / update / delete operations accessible only by TRE Admin
- get/list operations accessible to TRE Admins. If a form is scoped to a particular type of workspace, the user requesting the form will need to be authenticated to a workspace of that type.
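The scoping rules above could be sketched as two small predicates. This assumes non-admin users may also read forms (as the workspace-type condition implies), and that a global form is readable by anyone; `FormSummary`, `user_can_read_form` and `forms_for_workspace` are hypothetical names, not existing TRE code.

```python
from dataclasses import dataclass, field


@dataclass
class FormSummary:
    # Hypothetical, trimmed-down view of a stored form template.
    form_id: str
    form_type: str
    is_global: bool
    workspace_types: list[str] = field(default_factory=list)


def user_can_read_form(form: FormSummary, is_tre_admin: bool,
                       accessible_workspace_types: set[str]) -> bool:
    """TRE Admins can read any form. Other users can read global forms, or
    forms scoped to a workspace type they have access to."""
    if is_tre_admin or form.is_global:
        return True
    return any(t in accessible_workspace_types for t in form.workspace_types)


def forms_for_workspace(forms: list[FormSummary], form_type: str,
                        workspace_type: str) -> list[FormSummary]:
    """Forms of the requested formType that apply to a given workspace type."""
    return [
        f for f in forms
        if f.form_type == form_type
        and (f.is_global or workspace_type in f.workspace_types)
    ]
```

`forms_for_workspace` is the kind of lookup the UI would need when offering "all the forms available for this workspace, of type data_access_request".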

Requests API Design

`request` model

A model with a fixed set of fields plus a flexible property bag (request_data), used to store data for all request types:

title (string)
description (string)
requestor (user object)
status (string / enum)
request_type (string / enum)
requested_when (datetime)
workspace_id (guid, optional)
messages (list)
- message (string)
- user (user object)
- message_when (datetime)
updates (list - each item capturing the diff made to the overall object)
- update (dict of fields submitted for update)
- user (user object)
- updated_when (datetime)
triggers (list - each item capturing details about a fired trigger)
- trigger_name
- status
- response
- triggered_when (datetime)
request_data (dict - acts as a flexible property bag to store any custom request data. Likely populated from form data defined in the form above)
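The model could be expressed with standard-library dataclasses as below. The status values are illustrative only (`in_review` comes from the mockup description, `changes_requested` is a guess at the "send back for re-work" state), and the real implementation would more likely use the API's existing pydantic models.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Optional


class RequestStatus(str, Enum):
    # Illustrative statuses only; the design leaves the enum values open.
    DRAFT = "draft"
    IN_REVIEW = "in_review"
    CHANGES_REQUESTED = "changes_requested"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class User:
    user_id: str
    name: str


@dataclass
class Message:
    message: str
    user: User
    message_when: datetime


@dataclass
class Update:
    update: dict[str, Any]  # only the fields submitted for update
    user: User
    updated_when: datetime


@dataclass
class FiredTrigger:
    trigger_name: str
    status: str
    response: str
    triggered_when: datetime


@dataclass
class Request:
    title: str
    description: str
    requestor: User
    status: RequestStatus
    request_type: str
    requested_when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    workspace_id: Optional[str] = None
    messages: list[Message] = field(default_factory=list)
    updates: list[Update] = field(default_factory=list)
    triggers: list[FiredTrigger] = field(default_factory=list)
    request_data: dict[str, Any] = field(default_factory=dict)  # form data
```

`default_factory` gives every request its own lists, so appending a message to one request never leaks into another.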

`requests` service

A single service to handle the CRUD operations for request models. This service will handle shared logic around creating, updating, and listing requests. The service will be intentionally 'dumb'. It will be up to the calling code to enforce any permission restrictions, data structure checks (for any custom request_data) etc.

`create_request`

Accept a request model.

check_and_fire_triggers
Store the model in the database
Return it along with a unique ID.
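A minimal sketch of `create_request`, assuming requests are plain dicts and using an in-memory dict in place of the Cosmos collection. Two assumptions beyond the steps above: the form template is passed in (since `check_and_fire_triggers` needs it), and the ID is assigned before the triggers fire so that any webhook payload already carries it.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Callable

# Stand-in for the Cosmos "requests" collection: request_id -> request document.
RequestStore = dict[str, dict[str, Any]]

# check_and_fire_triggers is injected to keep this sketch self-contained;
# its (request, status, form_template) signature follows the design.
TriggerFn = Callable[[dict[str, Any], str, dict[str, Any]], None]


def create_request(store: RequestStore, request: dict[str, Any],
                   form_template: dict[str, Any],
                   check_and_fire_triggers: TriggerFn) -> dict[str, Any]:
    """Fire triggers for the initial status, store the request, and return
    it with a new unique ID."""
    created = dict(request)
    created["id"] = str(uuid.uuid4())  # assumption: ID assigned up front
    created.setdefault("requested_when", datetime.now(timezone.utc).isoformat())
    created.setdefault("messages", [])
    created.setdefault("updates", [])
    created.setdefault("triggers", [])
    check_and_fire_triggers(created, created["status"], form_template)
    store[created["id"]] = created
    return created
```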

`get_request`

Accept a request_id, return the object from the database.

`list_requests_for_workspace`

Accept a workspace_id. Return a date-ordered list of all request objects matching the workspace_id.

`list_my_requests`

Accept an optional workspace_id. Return a date-ordered list of all request objects where the requestor.user_id == current user ID.
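Both listing methods reduce to a filter plus a sort. The sketch below works over an in-memory dict standing in for the database, and assumes "date-ordered" means newest first by `requested_when`; against Cosmos the same filter and ordering would normally be pushed into the query itself.

```python
from typing import Any, Iterable, Optional


def _date_ordered(requests: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    # Assumption: newest first. ISO-8601 strings in one timezone sort correctly.
    return sorted(requests, key=lambda r: r["requested_when"], reverse=True)


def list_requests_for_workspace(store: dict[str, dict[str, Any]],
                                workspace_id: str) -> list[dict[str, Any]]:
    return _date_ordered(r for r in store.values()
                         if r.get("workspace_id") == workspace_id)


def list_my_requests(store: dict[str, dict[str, Any]], current_user_id: str,
                     workspace_id: Optional[str] = None) -> list[dict[str, Any]]:
    # workspace_id is optional: when omitted, requests from all workspaces match.
    return _date_ordered(
        r for r in store.values()
        if r["requestor"]["user_id"] == current_user_id
        and (workspace_id is None or r.get("workspace_id") == workspace_id)
    )
```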

`update_request`

Accept a request_id, user, and diff object (dict) containing only the changes to make (i.e. a partial, PATCH-style update).

Get the request object via get_request
Add the diff object to the updates list in the request
check_and_fire_triggers
Merge the diff object with the request object to make the changes
Save the request model back to the database
Return the updated model
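The update steps, in the order listed above, might look like this. It reuses the in-memory dict store idea and injects `check_and_fire_triggers`. One assumption is added: triggers are only checked when the diff actually sets a status, since otherwise every edit would re-fire the triggers for the current status. Because the design fires triggers before merging, the webhook receives the pre-merge object together with the new status.

```python
from datetime import datetime, timezone
from typing import Any, Callable


def update_request(store: dict[str, dict[str, Any]], request_id: str,
                   user: dict[str, Any], diff: dict[str, Any],
                   form_template: dict[str, Any],
                   check_and_fire_triggers: Callable[..., None]) -> dict[str, Any]:
    request = store[request_id]  # get_request; raises KeyError if missing
    # Record who changed what, and when, before applying the change.
    request.setdefault("updates", []).append({
        "update": dict(diff),
        "user": user,
        "updated_when": datetime.now(timezone.utc).isoformat(),
    })
    # Assumption: only a status change can fire triggers.
    if "status" in diff:
        check_and_fire_triggers(request, diff["status"], form_template)
    request.update(diff)  # shallow merge: top-level keys in diff win
    store[request_id] = request  # save back
    return request
```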

`check_and_fire_triggers`

Internal to this service. Accept the request, status and the form_template.

For each trigger in form_template.triggers whose status matches the given status:
- Send the entire request object to the URI defined as a POST
- Return success if POST succeeded
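A standard-library sketch of the trigger check: the whole request is serialised to JSON and POSTed to each matching URI, and each firing is recorded in the request's `triggers` list (without the URI, which may be sensitive). The HTTP sender is injectable so a workflow tool or a test can replace it; treating "no matching triggers" as success, and any 2xx as a successful POST, are assumptions.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Any, Callable

PostFn = Callable[[str, bytes], int]  # (url, json_body) -> HTTP status code


def _urllib_post(url: str, body: bytes) -> int:
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": "application/json"})
    # urlopen raises HTTPError (an OSError subclass) for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def check_and_fire_triggers(request: dict[str, Any], status: str,
                            form_template: dict[str, Any],
                            post: PostFn = _urllib_post) -> bool:
    """POST the request to every trigger URI matching `status`. Returns True
    only if every matching POST succeeded (or nothing matched)."""
    body = json.dumps(request, default=str).encode("utf-8")
    all_ok = True
    for trigger in form_template.get("triggers", []):
        if trigger["status"] != status:
            continue
        try:
            code = post(trigger["URI"], body)
            ok, response = 200 <= code < 300, str(code)
        except OSError as exc:  # covers urllib's URLError and HTTPError
            ok, response = False, repr(exc)
        all_ok = all_ok and ok
        # Record the firing on the request; the URI is deliberately omitted.
        request.setdefault("triggers", []).append({
            "trigger_name": trigger["name"],
            "status": status,
            "response": response,
            "triggered_when": datetime.now(timezone.utc).isoformat(),
        })
    return all_ok
```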

`add_message`

Accept a request_id, user (user object) and message (string).

Get the request object via get_request
Add the message to the messages list
Save the request model back to the database
Return the updated model
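`add_message` is the hook a custom workflow would use to log progress back to the initiator. A minimal sketch over the same kind of in-memory dict store, assuming the caller also passes the user, since each entry in the model's messages list records one:

```python
from datetime import datetime, timezone
from typing import Any


def add_message(store: dict[str, dict[str, Any]], request_id: str,
                user: dict[str, Any], message: str) -> dict[str, Any]:
    request = store[request_id]  # get_request; raises KeyError if missing
    request.setdefault("messages", []).append({
        "message": message,
        "user": user,
        "message_when": datetime.now(timezone.utc).isoformat(),
    })
    store[request_id] = request  # save back
    return request
```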

Note: The methods above are very likely to change as we implement. More methods will emerge, but this should give enough of an idea of the purpose of this service.

Indicative UI Mockups

Viewing all requests within a given workspace:

Starting a new request. The UI offers all the forms available for this workspace, of type data_access_request:

Selecting a particular type of form allows the requestor to complete the details and submit. Following submission, the request is marked as in_review, and triggers a background workflow to collect approvals as needed:


marrobi · 2023-07-12T08:16:46Z

Looks good. One question: typically the DAR comes before a workspace has been requested. How does this tie in with this user story?

As a TRE Implementer, I want to be able to create a number of Data Access Request forms per workspace type.

I'm not sure we need per-workspace forms. Is there a use case for other types of request?

damoodamoo · 2023-07-12T10:51:30Z

@marrobi - this is the main question I have too, and this design takes the position that a workspace needs to exist before a DAR is performed. The end of a DAR would trigger a provisioning process to move the data over into the workspace, so the workspace would either need to exist already, or we'd need a wider process to create the workspace and then trigger the data provisioning.

This requests mechanism could support a "Project initiation" style workflow too, with requests being raised at the top level of the TRE and resulting in a new workspace. That, in my view, would be a next step after this is in place, as we need to support getting data into an existing workspace either way.

I would also expect that we would want per-workspace forms. Each form would define a trigger to run when the request is approved, which might well be different between workspace types.

damoodamoo · 2023-07-20T08:55:01Z

As discussed, closing this as building a Data Access Request mechanism into the TRE is not on the roadmap.

marrobi · 2023-07-21T10:44:10Z

As discussed, closing this as building a Data Access Request mechanism into the TRE is not on the roadmap.

I wouldn't say it isn't on the roadmap, it's being requested by many users, the issue is we don't have the resource to support beyond initial implementation. This may change.

marrobi · 2024-01-15T12:42:48Z

Going to reopen, as it is a requested feature, so that it shows up on the backlog.

marrobi · 2024-09-05T11:26:43Z

@damoodamoo I'm back in this space again. Looking at your (and team's) work here - https://github.com/SAFEHR-data/Data-Access-Request-Seedling .

The ask is for this to happen before workspace creation. The concept of a project has been discussed, but I'm seeing this as a 1:1 match with a "data access request". I welcome a discussion on how much of the seedling work could be reused. UI is out of scope, so the forms work would likely have to wait, but the request APIs are a good starting point.

West-P · 2025-02-11T12:52:38Z

Adding to this: for some research projects, the data access request could come after workspace creation/request.

Would it be possible to add an option to enable/disable the actual data upload/VM review as part of the airlock process?

So the process would be what we already have in the airlock, excluding:

Creation of a storage account for data review
Creation of a review VM when the airlock manager reviews the request
Post-review-approval VM destruction and storage account actions

This would allow the auditing of data access requests within the TRE.

Alternatively, a process very similar to the airlock, called External Data Access Request, could be created. It would exclude the elements above but keep the same requesting process for a researcher and a similar reviewing experience for the Airlock Manager. The only difference is that there is no actual data to store and review. The data is already known (maybe a TRE Administrator could populate a list of "known" data sources that already have an external approval process, as "pre-approved" datasets).

marrobi · 2025-02-11T12:58:53Z

@West-P see https://gist.github.com/marrobi/be5fc6d1932848b9e1ada674628653e3 (3 years old, but what this issue would look to implement).

marrobi · 2025-02-11T12:59:45Z

On approval a data provisioner, be it data factory, or some other compute either moves the data or creates the appropriate network connection.

damoodamoo added the feature label Jul 11, 2023

damoodamoo self-assigned this Jul 11, 2023

damoodamoo closed this as not planned Jul 20, 2023

marrobi reopened this Jan 15, 2024

marrobi assigned LizaShak and unassigned damoodamoo Nov 14, 2024

This was referenced Feb 4, 2025:

Pushing data into workspaces from external data stores #4312 (Open)

Routing approved airlock data to shared storage automatically #4335 (Open)

marrobi added this to AzureTRE Feature Roadmap Feb 11, 2025

marrobi moved this to Next in AzureTRE Feature Roadmap Feb 11, 2025

marrobi mentioned this issue Feb 20, 2025:

Request and approval workflow for creation of new resources #3099 (Open)
