Request Mechanism (initial purpose: Data Access Request) #3609

Open
damoodamoo opened this issue Jul 11, 2023 · 9 comments

@damoodamoo
Member

damoodamoo commented Jul 11, 2023

Context

It's common that a Researcher needs to request access to a specific dataset on which to perform their research. The ways this happens across projects, organisations and industries are myriad, so a core TRE Data Access Request mechanism must provide:

  • A minimum, base experience to allow a researcher to log a request, and an authorised person to approve that request.
  • The ability for an organisation to supply their own request form, in JSON Forms format.
  • Extension points to enable a customer to build their own backend workflow and trigger their own processes during and following the request.

User Stories

As a Researcher, I want to use the TRE Portal to create a new Data Access Request for the workspace I am working in.
As a Researcher, I want to view my request, see any updates to it, and understand where in the process it currently is.
As a Researcher, I want to see all existing and past Data Access Requests that pertain to the workspace I am in.

As a Data Controller / Data Manager, I want to view a list of Data Access Requests for the workspace I am looking at.
As a Data Controller / Data Manager, I want to use the TRE Portal to approve / reject the request, and supply comments.
As a Data Controller / Data Manager, I want to be able to send the request back to the initiator for re-work, supplying comments.

As a TRE Implementer, I want to be able to create a number of Data Access Request forms per workspace type.
As a TRE Implementer, I want to be able to trigger my own workflows via HTTP/webhook after the initial request has been submitted, so that I can use a workflow tool of my choice to build a complex business-related approval workflow.
As a TRE Implementer, I want my custom workflow to be able to log progress messages back to the initial request, so the initiator can see what is happening.
As a TRE Implementer, I want my custom workflow to be able to update the status of the initial request.
As a TRE Implementer, I want to trigger a custom background process via HTTP/webhook to provision the actual dataset in some way, after the request has been approved.

Differences to Airlock

The airlock process is a more rigid and custom process. It contains lots of logic around reviews: creating review VMs, wiring up a Review Workspace, providing mechanisms to actually move files, generating SAS links etc. Whilst the new DAR mechanism will be informed by Airlock, and will be integrated in the UX, it will not reuse the airlock API endpoints. Efforts will be made to reduce code duplication, such as refactoring the concept of a status so that it is no longer airlock specific and can be shared by all request types.

Could this be used for any type of approval process, such as requesting a VM?

Potentially. However, the primary use case is Data Access Requests, and that will be the focus in the UI. There will be a single requests service, which handles the database interactions and CRUD operations, and then it will be up to the endpoints to build upon that service to enforce any custom model structures and decision making applicable to that request type.

Managing Form Templates

Each organisation will need to be able to build and manage a number of custom form templates. These will be heavily inspired by the resource templates. In the repo, a new directory named forms will be created under ./templates. This will house any number of JSON schema documents.

Each form template will contain the following fields:

  • $schema, $id, title, description: Same as resource templates
  • form:
    • required, properties, allOf/oneOf (etc) blocks: Same as resource templates, defining the fields
    • uiSchema to order fields and provide extra UI hints
  • triggers: a new list block specifically for forms. These entries will be used by the API logic to trigger webhook URLs upon status changes.
    • name: string - friendly name of the trigger
    • status: string - if the request status equals this, fire the below URI
    • URI: string - may contain sensitive data. Will not be surfaced in get requests.
  • formType: string - Used to look up, for instance, "all Data Access Request forms".
  • isGlobal: boolean - Indicates whether this form could show up anywhere in the TRE
  • workspaceTypes: Optional. A list of strings matching the workspace definition names (e.g. base). Used to recall forms only meant for a particular type of workspace.
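
As an illustration only, a form template with the fields above might look like the Python dict below, alongside a helper that strips trigger URIs before a template is returned from a GET. All field values, the example.org URLs, the `ui:order` hint and the `redact_trigger_uris` helper are assumptions made for this sketch, not part of the design.

```python
import copy

# Hypothetical form template; every value here is illustrative only.
EXAMPLE_FORM_TEMPLATE = {
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "https://example.org/forms/dar-basic.json",
    "title": "Basic Data Access Request",
    "description": "Request access to a dataset for this workspace",
    "form": {
        "required": ["dataset_name"],
        "properties": {
            "dataset_name": {"type": "string", "title": "Dataset name"},
            "justification": {"type": "string", "title": "Justification"},
        },
        # Assumed UI hint format; the issue only says uiSchema orders fields.
        "uiSchema": {"ui:order": ["dataset_name", "justification"]},
    },
    "triggers": [
        {
            "name": "Start approval workflow",
            "status": "in_review",
            "URI": "https://example.org/hooks/approval?code=secret",
        }
    ],
    "formType": "data_access_request",
    "isGlobal": False,
    "workspaceTypes": ["base"],
}


def redact_trigger_uris(template: dict) -> dict:
    """Return a copy of the template that is safe to surface in GET responses."""
    safe = copy.deepcopy(template)
    for trigger in safe.get("triggers", []):
        trigger.pop("URI", None)  # the URI may contain secrets, so drop it
    return safe
```

The stored template is left untouched, so the API logic can still read the URI when it needs to fire a trigger.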

To support the new forms concept, we'll need the underlying plumbing:

  • Forms cosmos collection
  • forms API and service:
    • create / update / delete operations accessible only by TRE Admin
    • get/list operations accessible to TRE Admins. If a form is scoped to a particular type of workspace, the user requesting the form will need to be authenticated to a workspace of that type.
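
A rough sketch of how the list operation might select forms for a workspace is shown below. It assumes a form applies when it is either global or lists the workspace's type; how isGlobal and workspaceTypes interact is not settled in this issue, and `forms_available` is a hypothetical name.

```python
from typing import Iterable, List


def forms_available(forms: Iterable[dict], form_type: str, workspace_type: str) -> List[dict]:
    """Select forms of one formType that apply to the given workspace type."""
    available = []
    for form in forms:
        if form.get("formType") != form_type:
            continue
        scoped_types = form.get("workspaceTypes") or []
        # Assumption: a form applies if it is global or lists this workspace type.
        if form.get("isGlobal") or workspace_type in scoped_types:
            available.append(form)
    return available
```

Checking that the caller is actually authenticated to a workspace of that type would sit in the API layer and is not shown here.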

Requests API Design

request model

A model with a fixed core structure plus a flexible property bag, used to store data for all requests:

  • title (string)
  • description (string)
  • requestor (user object)
  • status (string / enum)
  • request_type (string / enum)
  • requested_when (datetime)
  • workspace_id (guid, optional)
  • messages (list)
    • message (string)
    • user (user object)
    • message_when (datetime)
  • updates (list - each item capturing the diff made to the overall object)
    • update (dict of fields submitted for update)
    • user (user object)
    • updated_when (datetime)
  • triggers (list - each item capturing details about a fired trigger)
    • trigger_name
    • status
    • response
    • triggered_when (datetime)
  • request_data (dict - acts as a flexible property bag to store any custom request data. Likely populated from form data defined in the form above)
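
To make the shape concrete, the model above could be sketched with plain dataclasses as follows. The class names and the minimal `User` type are hypothetical (the user object is not specified in this issue), and the real implementation may well use a different modelling library.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass
class User:
    # Hypothetical minimal user object; the real one is not specified here.
    user_id: str
    name: str = ""


@dataclass
class Message:
    message: str
    user: User
    message_when: datetime


@dataclass
class Update:
    update: Dict[str, Any]  # fields submitted for update
    user: User
    updated_when: datetime


@dataclass
class FiredTrigger:
    trigger_name: str
    status: str
    response: str
    triggered_when: datetime


@dataclass
class Request:
    title: str
    description: str
    requestor: User
    status: str
    request_type: str
    requested_when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    workspace_id: Optional[str] = None
    messages: List[Message] = field(default_factory=list)
    updates: List[Update] = field(default_factory=list)
    triggers: List[FiredTrigger] = field(default_factory=list)
    # Flexible property bag for custom, form-defined data.
    request_data: Dict[str, Any] = field(default_factory=dict)
```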

requests service

A single service to handle the CRUD operations for request models. This service will handle shared logic around creating, updating, and listing requests. The service will be intentionally 'dumb'. It will be up to the calling code to enforce any permission restrictions, data structure checks (for any custom request_data) etc.

create_request

Accept a request model.

  • check_and_fire_triggers
  • Store the model in the database
  • Return it along with a unique ID.
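
A minimal sketch of these steps, using an in-memory dict in place of the database, is given below. The `store` and `fire_triggers` parameters are assumptions for illustration; `fire_triggers` stands in for check_and_fire_triggers.

```python
import uuid
from typing import Any, Callable, Dict

Store = Dict[str, Dict[str, Any]]


def create_request(
    store: Store,
    request: Dict[str, Any],
    fire_triggers: Callable[[Dict[str, Any], Any], None] = lambda request, status: None,
) -> Dict[str, Any]:
    """Fire any triggers, store the request and return it with a unique ID."""
    request_id = str(uuid.uuid4())
    new_request = dict(request, id=request_id)  # copy so the caller's dict is untouched
    fire_triggers(new_request, new_request.get("status"))
    store[request_id] = new_request
    return new_request
```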

get_request

Accept a request_id, return the object from the database.

list_requests_for_workspace

Accept a workspace_id. Return a date-ordered list of all request objects matching the workspace_id.

list_my_requests

Accept an optional workspace_id. Return a date-ordered list of all request objects where the requestor.user_id == current user ID.
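
The two list operations could be sketched over an in-memory list of request dicts as below. The ordering direction is not specified above, so newest-first is an assumption, as is the plain-dict representation.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional

RequestDict = Dict[str, Any]


def _newest_first(requests: List[RequestDict]) -> List[RequestDict]:
    # Assumption: "date-ordered" means most recent request first.
    return sorted(requests, key=lambda r: r["requested_when"], reverse=True)


def list_requests_for_workspace(requests: List[RequestDict], workspace_id: str) -> List[RequestDict]:
    """All requests raised against the given workspace."""
    return _newest_first([r for r in requests if r.get("workspace_id") == workspace_id])


def list_my_requests(
    requests: List[RequestDict], current_user_id: str, workspace_id: Optional[str] = None
) -> List[RequestDict]:
    """Requests raised by the current user, optionally narrowed to one workspace."""
    mine = [
        r
        for r in requests
        if r["requestor"]["user_id"] == current_user_id
        and (workspace_id is None or r.get("workspace_id") == workspace_id)
    ]
    return _newest_first(mine)
```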

update_request

Accept a request_id, user, and diff object (dict) containing only the changes to make (i.e. a partial update, PATCH-style rather than a full PUT).

  • Get the request object via get_request
  • Add the diff object to the updates list in the request
  • check_and_fire_triggers
  • Merge the diff object with the request object to make the changes
  • Save the request model back to the database
  • Return the updated model
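
The update steps can be sketched as follows, again with an in-memory dict standing in for the database. The shallow merge, the `store` parameter and the injected `fire_triggers` callable (a stand-in for check_and_fire_triggers) are assumptions for this example.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict

Store = Dict[str, Dict[str, Any]]


def update_request(
    store: Store,
    request_id: str,
    user: Dict[str, Any],
    diff: Dict[str, Any],
    fire_triggers: Callable[[Dict[str, Any], Any], None] = lambda request, status: None,
) -> Dict[str, Any]:
    """Record the diff, fire triggers, merge the diff and save the request."""
    request = store[request_id]  # stands in for get_request
    request.setdefault("updates", []).append(
        {"update": dict(diff), "user": user, "updated_when": datetime.now(timezone.utc)}
    )
    # Triggers see the incoming status if the diff changes it, else the current one.
    fire_triggers(request, diff.get("status", request.get("status")))
    request.update(diff)  # assumption: a shallow merge is sufficient
    store[request_id] = request
    return request
```

Because the diff is appended to updates before the merge, the updates list doubles as an audit trail of who changed what and when.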

check_and_fire_triggers

Internal to this service. Accept the request, status and the form_template.

  • If the status is IN form_template.triggers:
    • POST the entire request object to the URI defined on the matching trigger
    • Return success if POST succeeded
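
One possible shape for this, sketched with only the standard library, is shown below. The `http_post_json` helper, the injectable `post` parameter, the "success"/"failed" response strings and returning True when no trigger matches are all assumptions; the design above only says to POST the request and report success.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Any, Callable, Dict


def http_post_json(uri: str, payload: Dict[str, Any]) -> bool:
    """POST payload as JSON; True on a 2xx response, False on network or HTTP errors."""
    body = json.dumps(payload, default=str).encode("utf-8")
    req = urllib.request.Request(
        uri, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return 200 <= resp.status < 300
    except OSError:  # covers URLError, HTTPError and socket timeouts
        return False


def check_and_fire_triggers(
    request: Dict[str, Any],
    status: str,
    form_template: Dict[str, Any],
    post: Callable[[str, Dict[str, Any]], bool] = http_post_json,
) -> bool:
    """Fire every trigger whose status matches; True if all POSTs succeeded."""
    all_ok = True
    for trigger in form_template.get("triggers", []):
        if trigger["status"] != status:
            continue
        ok = post(trigger["URI"], request)
        # Record the outcome in the request's triggers list, as in the model.
        request.setdefault("triggers", []).append(
            {
                "trigger_name": trigger["name"],
                "status": status,
                "response": "success" if ok else "failed",
                "triggered_when": datetime.now(timezone.utc),
            }
        )
        all_ok = all_ok and ok
    return all_ok
```

Passing `post` as a parameter keeps the function testable without a network and leaves room for retries or authentication headers later.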

add_message

Accept a request_id and message (string).

  • Get the request object via get_request
  • Add the message to the messages list
  • Save the request model back to the database
  • Return the updated model

Note: The methods above are very likely to change as we implement. More methods will emerge, but this should give enough of an idea of the purpose of this service.

Indicative UI Mockups

Viewing all requests within a given workspace:
image

Starting a new request. The UI offers all the forms available for this workspace, of type data_access_request:
image

Selecting a particular type of form allows the requestor to complete the details and submit. Following submission, the request is marked as in_review, and triggers a background workflow to collect approvals as needed:
image

@damoodamoo damoodamoo self-assigned this Jul 11, 2023
@marrobi
Member

marrobi commented Jul 12, 2023

Looks good. One question: typically the DAR comes before a workspace has been requested. How does this tie in with this user story?

As a TRE Implementer, I want to be able to create a number of Data Access Request forms per workspace type.

Not sure we need per workspace forms? Is there a use case for other types of request?

@damoodamoo
Member Author

damoodamoo commented Jul 12, 2023

@marrobi - this is the main question I have too, and this design takes the opinion that a workspace needs to exist before a DAR is performed. The end of a DAR would trigger a provisioning process to move the data over into the workspace, so it would either need to exist already, or we'd need a wider process to create the workspace and then trigger the data provisioning.

This requests mechanism could support a "Project initiation" style workflow too, with requests being raised at the top level of the TRE and resulting in a new workspace. That, in my view, would be a next step after this is in place, as we need to support getting data into an existing workspace either way.

I would also expect that we would want per-workspace forms. Each form would define a trigger to run when the request is approved, which might well be different between workspace types.

@damoodamoo
Member Author

As discussed, closing this as building a Data Access Request mechanism into the TRE is not on the roadmap.

@damoodamoo damoodamoo closed this as not planned Jul 20, 2023
@marrobi
Member

marrobi commented Jul 21, 2023

As discussed, closing this as building a Data Access Request mechanism into the TRE is not on the roadmap.

I wouldn't say it isn't on the roadmap; it's being requested by many users. The issue is we don't have the resource to support it beyond initial implementation. This may change.

@marrobi
Member

marrobi commented Jan 15, 2024

Going to reopen, as it is a requested feature, so that it shows up on the backlog.

@marrobi marrobi reopened this Jan 15, 2024
@marrobi
Member

marrobi commented Sep 5, 2024

@damoodamoo I'm back in this space again. Looking at your (and team's) work here - https://github.com/SAFEHR-data/Data-Access-Request-Seedling .

The ask is for this to happen before workspace creation. The concept of a project has been discussed, but I'm seeing this as a 1:1 match with "data access request". I'd welcome a discussion on how much of the seedling work could be reused. UI is out of scope, so the forms work would likely have to wait, but the request APIs are a good starting point.

@West-P

West-P commented Feb 11, 2025

Adding to this as for some research projects the data access request could come post workspace creation/request.

Would an option to enable/disable the actual data upload/VM review as part of the airlock process work?

So the process would be what we already have in the airlock excluding:

  • Creation of storage account for data review
  • Creation of review VM upon airlock manager reviewing request.
  • Post review approval, vm destroy and storage account actions are excluded

This would allow the auditing of data access requests within the TRE.

Alternatively, a process very similar to the airlock, called External Data Access Request, could be created. It would exclude the elements above but keep the same requesting process for a researcher and a similar reviewing experience for the Airlock Manager. The only difference is that there is no actual data to store and review. The data is already known (maybe a TRE Administrator can populate a list of "known" data sources that may already have an external approval process as a "pre-approved" dataset).

@marrobi
Member

marrobi commented Feb 11, 2025

@West-P see https://gist.github.com/marrobi/be5fc6d1932848b9e1ada674628653e3 (3 years old, but what this issue would look to implement).

@marrobi
Member

marrobi commented Feb 11, 2025

On approval, a data provisioner (be it Data Factory or some other compute) either moves the data or creates the appropriate network connection.
