Redesign Airlock Export Process to Eliminate SAS URLs #4308
Comments
I'll ask others to comment, but I believe the reason for dedicated storage per request was that we had a scenario where the PI wanted to do the import and not share it with others in the workspace, hence it must not touch the shared storage. We could have the option to automatically transfer the file. As for automating the creation, there is an amount of metadata needed alongside a request - how would that get provided? |
What metadata is required that isn't available in the workspace already? Presumably a function triggered by the file upload can access that metadata; something similar must be happening now, unless I've misunderstood the process? |
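To make the triggered-function idea concrete, here is a minimal sketch of a blob-triggered Azure Function that creates a draft import request when a file lands in a watched container. The TRE API path, payload field names, and environment variables are assumptions for illustration, not the actual API contract:

```python
import json
import logging
import os

import azure.functions as func
import requests

# All of these are assumed/hypothetical values, not taken from the TRE codebase.
TRE_API = os.environ["TRE_API_BASE_URL"]    # e.g. https://<tre-id>.<region>.cloudapp.azure.com
WORKSPACE_ID = os.environ["WORKSPACE_ID"]
API_TOKEN = os.environ["TRE_API_TOKEN"]     # token for an identity with airlock permissions


def main(blob: func.InputStream) -> None:
    """Blob trigger: create a draft airlock import request for the uploaded file."""
    payload = {
        "type": "import",                                                 # assumed field name
        "title": f"Auto import of {os.path.basename(blob.name)}",
        "businessJustification": "Created automatically from file drop",  # assumed field name
    }
    resp = requests.post(
        f"{TRE_API}/api/workspaces/{WORKSPACE_ID}/requests",              # assumed endpoint shape
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    logging.info("Created draft airlock request: %s", json.dumps(resp.json()))
```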
Throwing in my two pennies worth: the number of accounts (particularly the workspace ones) seems to be a bit of a bind on scalability, and also ups the cost when Defender is part of the mix (although you can create exclusions).
- Airlock storage accounts - core
- Airlock storage accounts - per workspace
|
I have a branch I am working on which will enable this exact scenario (client-side storage access using RBAC); I'll tag this issue once that PR is ready. |
@fortunkam can you create an issue with details so we can assign it to you? thanks. |
That sounds useful in some circumstances, but not all. We often have data sitting on VMs in Azure that needs to be imported, so pulling it locally and then uploading through the browser isn't a good solution in those cases. Are there any limits on file size for upload via the UI? Now that the malware checking limit of 2 GB has been raised (I've tested up to 50 GB, just for fun), it would be good to know if the browser or UI impose their own limits. |
I tested the client-side file upload with files up to 2GB. Will give it a go with a larger file once I've pulled the changes in. |
See #4309 |
thanks. Please don't remove the ability to upload via the CLI, we need that for cases where we don't have a browser, such as having the data on a Linux VM in the cloud. It's not always feasible or desirable to pull the data locally and then push it through the browser. |
Thanks @fortunkam. Can any "upload via UI" discussion happen here: #4309? What are the other points to discuss here?
Is that it? I think these are likely separate issues with separate implementation plans. |
@TonyWildish-BH #4335 might be of interest? |
Thanks @marrobi. That's an option for imports, yes, but I'd also like to make exports easier. |
@TonyWildish-BH @jonnyry I'm going to rename this to "Redesign Airlock Export Process to Eliminate SAS URLs". If you feel this misrepresents the issue, let me know. If there are other "airlock issues", please create new issues. |
Opened separate ticket for my point above on the number of storage accounts: #4358 |
So think these new issues cover it: #4309, #4335, #4358.
@TonyWildish-BH can I close this issue in favour of these new issues? If another one is required, please suggest it and we can close this conversation. Thanks. |
@marrobi, #4335 and #4358 are fine, but #4309 is not a complete solution for us. Using the UI only works if the file to upload, or the download destination, are locally available. We frequently push data through VMs in the cloud, and use the CLI to upload from or download to those VMs. Using the UI isn't always an option there. I'll comment on #4309, but until that covers all the bases, I don't think this ticket should be closed. |
@TonyWildish-BH can you be specific about what this ask is for? I'm a little confused. Is it "Trigger creation of a draft export request when file is dropped in specific location in workspace shared storage"? If so can we rename the issue? |
@marrobi, the current title is on the mark, I'd like to eliminate the use of SAS URLs. In particular, I'd like to eliminate their use inside the TRE. There are two reasons for this:
The "Trigger creation..." bit is just one suggestion, I'm open to other ideas. I don't much mind how it's done as long as it satisfies those two requirements. Hope that clarifies things. |
When creating the storage account can we leverage the requesting user's identity to add a role assignment on the blob container? The Airlock Manager role could also be added. Potentially this could be done via a group (if group creation is enabled when creating a workspace), as a workspace could have multiple airlock managers. This would allow authentication to the container via Entra ID rather than a SAS token. The UI could then give the container URL rather than the full SAS token. It would remove the need for a SAS token; however, I don't think it makes the process any more efficient for the end user, other than removing the knowledge barrier of what to do with a SAS token. Addition to the above. |
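A rough sketch of what such a container-scoped role assignment could look like with the Python management SDK; the scope, resource names, principal object ID, and the choice of the built-in Storage Blob Data Contributor role are illustrative assumptions, not how the TRE provisions access today:

```python
import uuid

from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient
from azure.mgmt.authorization.models import RoleAssignmentCreateParameters

SUBSCRIPTION_ID = "<subscription-id>"  # placeholder
# Container-level scope, so the grant is limited to this one airlock request's container.
CONTAINER_SCOPE = (
    f"/subscriptions/{SUBSCRIPTION_ID}/resourceGroups/<rg>/providers/"
    "Microsoft.Storage/storageAccounts/<account>/blobServices/default/containers/<request-id>"
)
# Built-in "Storage Blob Data Contributor" role definition.
ROLE_DEFINITION_ID = (
    f"/subscriptions/{SUBSCRIPTION_ID}/providers/Microsoft.Authorization/"
    "roleDefinitions/ba92f5b4-2d11-453d-a403-e96b0029c9fe"
)
PRINCIPAL_ID = "<entra-object-id-of-user-or-airlock-group>"  # placeholder

client = AuthorizationManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)
client.role_assignments.create(
    scope=CONTAINER_SCOPE,
    role_assignment_name=str(uuid.uuid4()),
    parameters=RoleAssignmentCreateParameters(
        role_definition_id=ROLE_DEFINITION_ID,
        principal_id=PRINCIPAL_ID,
        principal_type="Group",  # or "User" if assigning to the requester directly
    ),
)
```

The user would then reach `https://<account>.blob.core.windows.net/<request-id>` with their Entra ID sign-in (for example via Storage Explorer or `azcopy login`), with no SAS token involved.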
Can we grant RBAC access on a per-file basis? I think that is why SAS tokens were used. |
No, you cannot. You can use ACLs if using Data Lake, per container. Again, not ideal, as if you had 100 requests you'd then have 100 storage accounts floating around. But it all comes back to SAS tokens essentially. |
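For completeness, per-file ACLs on a hierarchical-namespace (Data Lake Gen2) account would look roughly like the sketch below; the account, container, path and object ID are placeholders:

```python
# Sketch: granting a single user read access to one file via POSIX-style ACLs.
# Requires a storage account with hierarchical namespace (Data Lake Gen2) enabled.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
file_client = service.get_file_system_client("<request-container>").get_file_client("data/export.csv")

# Grant read permission on this one file to the requesting user's Entra ID object ID;
# the full ACL string has to be restated when set this way.
file_client.set_access_control(acl="user::rw-,group::r--,other::---,user:<user-object-id>:r--")
```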
@West-P there are quotas on the number of storage accounts per subscription; we already run into these when we have a large number of workspaces. So a storage account per request isn't a good idea. I think the only solution is automatic transfer to/from a shared area, but this depends on whether it is appropriate for the data in the request to be accessible by all users in the workspace. |
Yes I can see that being an issue. There are a lot of issues with both though. |
Could this be used? Looks like a new feature... https://learn.microsoft.com/en-us/azure/role-based-access-control/conditions-overview |
Potentially. I'm not sure it resolves the SAS situation, but it might help with the multiple storage accounts issue (#4358), as you can add conditions on each blob/container based on the private endpoint it's being accessed from. I did test it a few months ago at a high level and it looked promising. |
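For reference, an ABAC condition on a role assignment might look like the snippet below. The container name is a placeholder, and restricting by the calling private endpoint (via an environment attribute) is mentioned only as a possibility, not a tested configuration:

```python
# Illustrative ABAC condition (version 2.0) that limits blob reads under a role
# assignment to a single container. An environment attribute keyed on the
# private endpoint could be combined in the same way, but is not shown as tested here.
CONDITION = (
    "("
    " ("
    "  !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'})"
    " )"
    " OR"
    " ("
    "  @Resource[Microsoft.Storage/storageAccounts/blobServices/containers:name]"
    "  StringEquals 'airlock-request-123'"
    " )"
    ")"
)

# This string would be supplied on the role assignment, e.g. via the `condition`
# and `condition_version="2.0"` fields of RoleAssignmentCreateParameters in the
# earlier role-assignment sketch.
```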
The airlock process is effective, but cumbersome. In particular, the need to use SAS URLs inside the workspace VMs means we can't completely block pasting into the VMs, which is something we'd like to do by default, only allowing it on a per-case basis - but that's another ticket.
The need for a SAS URL inside the workspace could be eliminated if the process were redesigned. Once an import is approved, there's no reason to make access to it only ephemeral; it makes sense to have the file accessible for the lifetime of the project. The file can be pushed directly to the shared storage, so it's immediately accessible from all machines, eliminating the need for Storage Explorer on import.
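As a sketch of that import side, the approved file could be transferred into the workspace shared storage using Entra ID credentials only; the account and container names below are placeholders, and this is not how the current airlock processor moves data:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

cred = DefaultAzureCredential()

# Approved-import account and workspace shared storage account are placeholders.
source = BlobServiceClient("https://<stg-import-approved>.blob.core.windows.net", credential=cred)
dest = BlobServiceClient("https://<stg-workspace-shared>.blob.core.windows.net", credential=cred)

src_blob = source.get_blob_client(container="<request-id>", blob="dataset.csv")
dst_blob = dest.get_blob_client(container="imports", blob="<request-id>/dataset.csv")

# Stream the approved file into the workspace shared container so it is immediately
# visible on every workspace VM, with no SAS URL involved. A real implementation
# would copy in chunks (or server-side) rather than buffering large files in memory.
dst_blob.upload_blob(src_blob.download_blob().readall(), overwrite=True)
```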
For exports, a staging storage space can be made available, and the user given access to it from within the workspace. The act of pushing a file there can be used to trigger creation of a draft export request, with the file being automatically moved to storage that is inaccessible to the user, preserving the NONCE semantics.
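And a sketch of that export side, assuming a user-writable "export-staging" container and a processor-only "export-inprogress" container (both names hypothetical): a blob trigger moves the dropped file out of the user's reach and would create the draft export request via the TRE API, as in the earlier sketch.

```python
import azure.functions as func
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

STAGING_ACCOUNT_URL = "https://<stg-workspace>.blob.core.windows.net"  # placeholder


def main(blob: func.InputStream) -> None:
    """Blob trigger on export-staging: move the file and open a draft export request."""
    service = BlobServiceClient(STAGING_ACCOUNT_URL, credential=DefaultAzureCredential())

    blob_name = blob.name.split("/", 1)[1]  # strip the container prefix from the trigger path
    src = service.get_blob_client("export-staging", blob_name)
    dst = service.get_blob_client("export-inprogress", blob_name)

    # Move the file out of the user-accessible staging area, preserving the
    # write-once (NONCE) semantics of the current process.
    dst.upload_blob(src.download_blob().readall(), overwrite=True)
    src.delete_blob()

    # Here a draft export request would be created against the TRE API and the
    # blob associated with it (endpoint shape as assumed in the earlier sketch).
```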
This is related to #2402, about the need for access to the UI from within the workspace. However, this simplification of the airlock process is worth it on its own, regardless of that issue. It will greatly improve the user experience.