Redesign Airlock Export Process to Eliminate SAS URLs #4308

Open
TonyWildish-BH opened this issue Feb 4, 2025 · 25 comments

@TonyWildish-BH
Contributor

The airlock process is effective, but cumbersome. In particular, the need to use SAS URLs inside the workspace VMs means we can't completely block pasting into the VMs, which is something we'd like to do by default - only allowing it on a per-case basis, but that's another ticket.

The need for a SAS URL inside the workspace could be eliminated if the process were redesigned. Once an import is approved, there's no reason for access to it to be only ephemeral; it makes sense to have the file accessible for the lifetime of the project. The file can be pushed to the/a shared storage directly, so it's immediately accessible from all machines, eliminating the need for Storage Explorer on import.

For exports, a staging storage space can be made available, and the user given access to it from within the workspace. The act of pushing a file there can be used to trigger creation of a draft export request, with the file being automatically moved to storage that is inaccessible to the user, preserving the NONCE semantics.
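
To make the "file drop triggers a draft request" idea concrete, here is a minimal sketch (not the actual TRE implementation) of a blob-triggered Azure Function that creates a draft export request via an API call. The container name, API route, payload fields and environment variables are all assumptions for illustration.

```python
# Hypothetical sketch: watch an export "staging" container and create a draft
# airlock export request when a file lands. The TRE API route and payload below
# are illustrative assumptions, not the real AzureTRE API contract.
import os
import logging

import azure.functions as func
import requests

app = func.FunctionApp()

@app.blob_trigger(arg_name="blob",
                  path="export-staging/{name}",        # assumed staging container
                  connection="StagingStorageConnection")
def on_export_file_dropped(blob: func.InputStream):
    workspace_id = os.environ["WORKSPACE_ID"]          # assumed configuration
    tre_api = os.environ["TRE_API_URL"]

    payload = {
        "type": "export",
        "title": f"Draft export for {blob.name}",
        "businessJustification": "Auto-created from staging drop",  # placeholder
    }

    # Create the draft request; authentication to the TRE API (e.g. a managed
    # identity token) is omitted here for brevity.
    resp = requests.post(
        f"{tre_api}/api/workspaces/{workspace_id}/requests",  # assumed route
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    logging.info("Created draft export request for %s", blob.name)
```

The airlock processor would then move the blob into the request's own, user-inaccessible storage, preserving the current NONCE-style semantics.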

This is related to #2402, about the need for access to the UI from within the workspace. However, this simplification of the airlock process is worth it on its own, regardless of that issue. It will greatly improve the user experience.

@marrobi
Member

marrobi commented Feb 4, 2025

I'll ask others to comment, but I believe the reason for dedicated storage per request was that we had a scenario where the PI wanted to do the import and not share it with others in the workspace, hence it must not touch the shared storage.

We could have the option to automatically transfer the file - maybe with a destination specified on the request.

As for automating the creation, there is a certain amount of metadata needed alongside a request - how would that get provided?

@TonyWildish-BH
Contributor Author

What metadata is required that isn't available in the workspace already? Presumably a function triggered by the file upload can access that metadata - something similar must be happening now, unless I've misunderstood the process?

@jonnyry
Collaborator

jonnyry commented Feb 4, 2025

Throwing in my two pennies worth:

  • It would be a smoother experience if the researcher could upload files to / download files from the TRE portal itself to import and export from the Airlock. Realise there's a lot of work to make that happen.

  • Could we move to Entra auth on the airlock storage accounts and do away with the SAS key altogether? Is the time-bound SAS token necessary? (See the sketch below this list for what RBAC-based access could look like.)

  • Could we rationalise/reduce the number of storage accounts used by the airlock? I understand different network settings are needed on some accounts - for example, the export needs to be publicly available. Though could some accounts be shared (e.g. blocked/rejected/in progress), using different containers to compartmentalise the data?

The number of accounts (particularly the workspace ones) seems to be a bit of a bind on scalability, and also ups the cost when Defender is part of the mix (although you can create exclusions).
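
To illustrate the second bullet, here is a minimal sketch of what researcher-side access could look like with Entra ID (RBAC) instead of a SAS URL, assuming the signed-in identity has been granted a data-plane role such as Storage Blob Data Contributor on the container; the account and container names are made up.

```python
# Sketch: uploading to an airlock container with Entra ID auth instead of a SAS URL.
# Assumes the signed-in identity has a data-plane role on the container;
# account/container/file names are illustrative.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

credential = DefaultAzureCredential()  # picks up az login, managed identity, etc.
service = BlobServiceClient(
    account_url="https://stalexipws1234.blob.core.windows.net",
    credential=credential,
)

container = service.get_container_client("c1234-export-in-progress")
with open("results.csv", "rb") as data:
    container.upload_blob(name="results.csv", data=data, overwrite=True)
```

The same applies on the command line: azcopy, for example, supports Entra ID sign-in as an alternative to SAS tokens.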

Airlock storage accounts - core

| Name | Description |
| --- | --- |
| st + airlockp + <TRE_ID> | Airlock Processor |
| st + alexapp + <TRE_ID> | Airlock Export Approved |
| st + alimblocked + <TRE_ID> | Airlock Import Blocked |
| st + alimex + <TRE_ID> | Airlock Import External |
| st + alimip + <TRE_ID> | Airlock Import In Progress |
| st + alimrej + <TRE_ID> | Airlock Import Rejected |

Airlock storage accounts - per workspace

| Name | Description |
| --- | --- |
| st + alexblocked + ws + <WS_ID> | Airlock Export Blocked |
| st + alexint + ws + <WS_ID> | Airlock Export Internal |
| st + alexip + ws + <WS_ID> | Airlock Export In Progress |
| st + alexrej + ws + <WS_ID> | Airlock Export Rejected |
| st + alimapp + ws + <WS_ID> | Airlock Import Approved |

@fortunkam

It would be a smoother experience if the researcher could upload files to / download files from the TRE portal itself to import and export from the Airlock. Realise there's a lot of work to make that happen.

I have a branch I am working on which will enable this exact scenario (client side storage access using RBAC), will tag this issue once that PR is ready.

@marrobi
Member

marrobi commented Feb 4, 2025

@fortunkam can you create an issue with details so we can assign it to you? thanks.

@TonyWildish-BH
Contributor Author

It would be a smoother experience if the researcher could upload files to / download files from the TRE portal itself to import and export from the Airlock. Realise there's a lot of work to make that happen.

I have a branch I am working on which will enable this exact scenario (client side storage access using RBAC), will tag this issue once that PR is ready.

That sounds useful in some circumstances, but not all. We often have data sitting on VMs in Azure that needs to be imported, so pulling it locally and then uploading through the browser isn't a good solution in those cases.

Are there any limits on file size for upload via the UI? Now that the malware checking limit of 2 GB has been raised (I've tested up to 50 GB, just for fun), it would be good to know if the browser or UI impose their own limits.

@fortunkam

I tested the client-side file upload with files up to 2 GB. Will give it a go with a larger file once I've pulled the changes in.

@fortunkam

See #4309

@TonyWildish-BH
Contributor Author

Thanks. Please don't remove the ability to upload via the CLI - we need that for cases where we don't have a browser, such as when the data is on a Linux VM in the cloud. It's not always feasible or desirable to pull the data locally and then push it through the browser.

@marrobi
Member

marrobi commented Feb 4, 2025

Thanks @fortunkam. Can any "upload via UI" discussion happen in #4309.

What are the other points to discuss here?

  • Restructuring airlock storage accounts to use fewer accounts
  • Auto creation of airlock requests from a file drop
  • SAS token vs RBAC for file upload?

Is that it? I think these are likely separate issues with separate implementation plans.

@marrobi
Member

marrobi commented Feb 7, 2025

@TonyWildish-BH #4335 might be of interest?

@TonyWildish-BH
Contributor Author

@TonyWildish-BH #4335 might be of interest?

Thanks @marrobi. That's an option for imports, yes, but I'd also like to make exports easier.

@marrobi
Member

marrobi commented Feb 10, 2025

@TonyWildish-BH @jonnyry I'm going to rename this to "Redesign Airlock Export Process to Eliminate SAS URLs".

If you feel this misrepresents the issue, let me know. If there are other "airlock issues", please create new issues.

marrobi changed the title from "Proposed change to airlock process" to "Redesign Airlock Export Process to Eliminate SAS URLs" on Feb 10, 2025
@jonnyry
Collaborator

jonnyry commented Feb 11, 2025

Opened a separate ticket for my point above on the number of storage accounts: #4358

@marrobi
Member

marrobi commented Feb 11, 2025

@TonyWildish-BH can I close this issue in favour of these new issues? If another one is required, please suggest it and we can close this conversation. Thanks.

@TonyWildish-BH
Contributor Author

TonyWildish-BH commented Feb 12, 2025

@marrobi, #4335 and #4358 are fine, but #4309 is not a complete solution for us.

Using the UI only works if the file to upload, or the download destination, is locally available. We frequently push data through VMs in the cloud, and use the CLI to upload from or download to those VMs. Using the UI isn't always an option there.

I'll comment on #4309, but until that covers all the bases, I don't think this ticket should be closed.

@marrobi
Member

marrobi commented Feb 12, 2025

@TonyWildish-BH can you be specific about what this ask is for? I'm a little confused.

Is it "Trigger creation of a draft export request when file is dropped in specific location in workspace shared storage"?

If so can we rename the issue?

@TonyWildish-BH
Contributor Author

@marrobi, the current title is on the mark: I'd like to eliminate the use of SAS URLs. In particular, I'd like to eliminate their use inside the TRE.

There are two reasons for this:

  1. The primary reason is that we want to be able to disable pasting into the workspace, and eliminating SAS URLs is a requirement for that.
  2. The secondary reason is that the airlock process is very cumbersome, and eliminating SAS URLs can be done in a way that makes the user experience much more pleasant.

The "Trigger creation..." bit is just one suggestion, I'm open to other ideas. I don't much mind how it's done as long as it satisfies those two requirements.

Hope that clarifies things.

@West-P

West-P commented Feb 12, 2025

When creating the storage account, can we leverage the requesting user's identity to add a role assignment on the blob container?

The Airlock Manager role could also be added. Potentially this could be done via a group (if group creation is enabled when creating a workspace), as a workspace could have multiple airlock managers.

This would allow authentication to the container via Entra ID rather than a SAS token.

The UI could then give the container URL rather than the full SAS token.
However, the airlock help files would also need updating, as a different process is needed if connecting via Storage Explorer/CLI and/or the Azure Portal.

It would remove the need for a SAS token; however, I don't think it makes the process any more efficient for the end user, other than removing the knowledge barrier of what to do with a SAS token.
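
As a rough illustration of the role-assignment idea (not how the TRE currently does it), here is a sketch using the Python management SDK to grant a user or group data-plane read access scoped to a single container; all IDs and resource names are placeholders.

```python
# Sketch: granting the requesting user (or an Airlock Managers group) data-plane
# access to a single blob container via Entra ID rather than a SAS token.
# Subscription, resource group, account and principal IDs are placeholders.
import uuid

from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient
from azure.mgmt.authorization.models import RoleAssignmentCreateParameters

SUB_ID = "00000000-0000-0000-0000-000000000000"
STORAGE_BLOB_DATA_READER = "2a2b9908-6ea1-4ae2-8e65-a410df84e7d1"  # built-in role ID

client = AuthorizationManagementClient(DefaultAzureCredential(), SUB_ID)

# Scope the assignment to one container, not the whole account.
scope = (
    f"/subscriptions/{SUB_ID}/resourceGroups/rg-ws-1234"
    "/providers/Microsoft.Storage/storageAccounts/stalexipws1234"
    "/blobServices/default/containers/c1234-request-abc"
)

client.role_assignments.create(
    scope=scope,
    role_assignment_name=str(uuid.uuid4()),
    parameters=RoleAssignmentCreateParameters(
        role_definition_id=(
            f"/subscriptions/{SUB_ID}/providers/Microsoft.Authorization"
            f"/roleDefinitions/{STORAGE_BLOB_DATA_READER}"
        ),
        principal_id="11111111-1111-1111-1111-111111111111",  # user or group object ID
        principal_type="User",  # or "Group" for an Airlock Managers group
    ),
)
```

Note that container-level scope is the finest granularity plain blob RBAC offers, which is relevant to the per-file discussion below.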


Addition to the above:
What authentication is used for the airlock security scan(s)? Does this currently use the SAS token to scan when an airlock request is submitted? If it does, a different method would need to be used, such as a service principal with a role assignment on the storage account.

@marrobi
Member

marrobi commented Feb 12, 2025

Can we grant RBAC access on a per-file basis? I think that is why SAS tokens were used.

@West-P

West-P commented Feb 13, 2025

Can we grant RBAC access on a per-file basis? I think that is why SAS tokens were used.

No, you cannot. You can use ACLs (per container) if using Data Lake.
However, could the airlock process change to creating a storage account per request?

Again, not ideal, as if you had 100 requests you would then have 100 storage accounts floating around.
The storage account would need to be destroyed if a request is denied, and as for what happens to a storage account after the request is approved... well, I guess it just persists, unless you set up a schedule whereby an approved request gives access into the TRE workspace for x days and after x days the storage account is removed.

But it all comes back to SAS tokens essentially.
I'm trying to figure out ways around it, but it makes sense why SAS tokens were used in the first place.
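
On the ACL point, for reference: with hierarchical namespace (Data Lake / ADLS Gen2) enabled, ACLs can actually be set at the directory and file level as well as per container, so per-file access could in principle be expressed that way rather than via SAS tokens. A minimal sketch, with placeholder names and object IDs:

```python
# Sketch: per-file access via ADLS Gen2 ACLs (requires hierarchical namespace,
# i.e. the airlock accounts would need to be ADLS Gen2 rather than plain blob).
# Account, filesystem, path and the user's object ID are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://stalexipws1234.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)

fs = service.get_file_system_client("export-requests")
file_client = fs.get_file_client("request-abc/results.csv")

# The acl string replaces the full ACL on this one file: owning user, owning
# group and other entries, plus a named-user entry granting read to one user.
file_client.set_access_control(
    acl="user::rw-,group::r--,other::---,"
        "user:11111111-1111-1111-1111-111111111111:r--"
)
```

This wouldn't remove the storage-account quota concerns discussed here, but it is one way to get finer-than-container granularity without SAS.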

@marrobi
Member

marrobi commented Feb 13, 2025

@West-P there are quotas on the number of storage accounts per subscription; we already run into these when we have a large number of workspaces. So a storage account per request isn't a good idea.

I think the only solution is automatic transfer to/from a shared area, but this depends on whether it is appropriate for the data in the request to be accessible by all users in the workspace.

@West-P

West-P commented Feb 13, 2025

@West-P there are quotas on the number of storage accounts per subscription; we already run into these when we have a large number of workspaces. So a storage account per request isn't a good idea.

I think the only solution is automatic transfer to/from a shared area, but this depends on whether it is appropriate for the data in the request to be accessible by all users in the workspace.

Yes, I can see that being an issue.
The only other ideas I haven't added yet, as I can't think of an end-to-end solution for them, are using Azure Files or using a small data disk to mount to VMs for the process.

There are a lot of issues with both though.
Azure Files would mean Storage Explorer could be used; however, the permissions become a bit tricky with having to create a new share per request.
A data disk may be useful; however, doing security scans on these may be an issue. Plus you would then need to enable the disk once it is mounted onto a VM (this could be automated), but then this takes away the ability to azcopy into it.

@jonnyry
Collaborator

jonnyry commented Feb 13, 2025

Could this be used? Looks like a new feature...

https://learn.microsoft.com/en-us/azure/role-based-access-control/conditions-overview

@marrobi
Member

marrobi commented Feb 13, 2025

Potentially. I'm not sure it resolves the SAS situation, but it might help with the multiple storage accounts (#4358), as you can add conditions on each blob/container based on the private endpoint it's being accessed from. I did test it a few months ago at a high level and it looked promising.
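
For reference, a sketch of what an Azure RBAC (ABAC) condition looks like; this example restricts blob reads to a single container and would be supplied as the condition (with condition_version "2.0") on a role assignment like the one sketched earlier. Whether the environment attributes needed to condition on the private endpoint cover the airlock cases would need checking against the feature's current state; the container name is a placeholder.

```python
# Sketch: an ABAC condition string limiting blob reads to one container.
# Attach it to a role assignment via its `condition` / `condition_version` fields.
condition = (
    "((!(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices"
    "/containers/blobs/read'}))"
    " OR "
    "(@Resource[Microsoft.Storage/storageAccounts/blobServices/containers:name]"
    " StringEquals 'c1234-request-abc'))"
)
```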
