-
Notifications
You must be signed in to change notification settings - Fork 307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: adds Globus Connect Server-sourced assets as a datasource (via HTTPS) #675
base: master
Are you sure you want to change the base?
Conversation
webpack.config.js
Outdated
* own Client ID from Globus and substitute it in. | ||
* @see https://docs.globus.org/api/auth/developer-guide/#developing-apps | ||
*/ | ||
GLOBUS_CLIENT_ID: JSON.stringify("f3c5dd86-8c8e-4393-8f46-3bfa32bfcd73"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This default value is a Client ID that is managed by the Globus team.
It is something that could be distributed as part of the codebase or removed as a default and only referenced here as a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 from my side for at least not needing a fork to set a different value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joshmoore – that was my intent here, but I am a bit unsure this method works as intended...
I thought something like npm run build -- --define GLOBUS_CLIENT_ID=example
was the intended use, but it looks like the NEUROGLANCER_CLI
environment flag needs to be disabled in order for the incoming --define
properties to be merged.
Running npm run build -- --env NEUROGLANCER_CLI=false ...
encounters an error due to the .strict()
usage in build_tools/cli.ts
– I don't want to derail the addition of this functionality, but just wanted to make sure I had a better grasp on the change and how it is used.
Thanks, and sorry for the delay in responding. Part of the delay is that I had been working on a major refactor of file access in Neuroglancer. Now that it has landed, it should make it easier to integrate additional data sources like this, but some refactoring of your PR will be needed. In the README you mention that the user needs to enter a UUID --- can you describe a bit more about how Globus works and how this UUID is handled? If the UUID is necessary to access the server, shouldn't it be included in the datasource URL somehow so that when sharing a link to the Neuroglancer state, another user doesn't have to enter it? |
Also, can you clarify how the client ids work and how the default client id you have provided will work? As far as I can gather, users host their own globus server, but they rely on the central globus authentication server? What are the allowed origins for the default client id? Is the list of allowed origins global or specific to a given globus server instance? Similarly, is the authentication token that is received valid for all globus servers or just a single instance (based on the way the scopes work, I guess the answer is just a single server)? Or is that irrelevant because users are local to a single server instance anyway? The scope doesn't seem to say anything about what permissions are granted? Does that mean that all permissions (i.e. read and write) are granted? A general issue that comes up is that a user may wish to view certain datasets in Neuroglancer, and therefore must necessarily grant a given Neuroglancer instance read access to that particular dataset. However, it is often not possible to limit permissions narrowly like that, and instead the user is forced to either grant no permission, or grant very broad permissions, e.g. read access to everything. This may not be an issue if Neuroglancer can be trusted with full access, e.g. because the person administering the Neuroglancer instance is also administering the Globus server, but often it is an issue. This issue is exactly why Neuroglancer does not provide the option to access GCS via regular google oauth2 login, because it would require users to grant Neuroglancer full read access to all of their GCS resources, which would not normally be a good idea unless they have created a separate Google account with access limited to those datasets they wish to view in Neuroglancer. Instead, there is ngauth, which allows you to grant Neuroglancer access only to specific datasets. Is it possible with Globus for a user to somehow say: I want to grant this specific Neuroglaner instance access to just these specific datasets? Furthermore, it would be nice if user A can grant that permission, and then share a Neuroglancer link with user B, who also has access to that dataset, and because user A has access to the dataset and already granted access to that dataset to Neuroglancer, and user B has access to the dataset, user B can login and then automatically access it without needing to specifically grant any additional permissions. |
Hey @jbms! Thanks for taking the time to look at this. I'll review the refactor and work to get this code updated if you think we can land on an integration that makes sense – hoping my responses below can help determine that.
As a high-level background on the functionality, this would be tapping into the following aspects of the Globus Platform:
Currently, a GCS HTTPS URL does not contain enough information to determine the Collection's UUID, and no programmatic method exists for deriving this information — hopefully, this will be possible at some point in the future.
A registered Globus Application can be configured as an OAuth 2.0 confidential or public client, allowing Globus (Auth) to be used for authentication. The Client ID included in the code change is a public client I've created1 with the following supported Redirect URIs:
A token created by the client would only be valid for the scopes requested by the client and approved by the user. Depending on how the data was shared, there are a few different scopes that the client might request:
This means the token only grants the client access to the Collection where the data resides (not necessarily all GCS resources the user has access to)—that Collection might even be a single folder in a file system.
One potential solution would be for User A to create a Guest Collection ( I know I didn't respond to all of your questions directly, but I hope the explanation above helps determine if this can move forward – I am happy to expand on any of the above! Footnotes
|
Okay, let me see if I understand things correctly:
Is there a 1-1 relationship between Globus collections and UUIIDs? Is there a 1-1 relationship between Globus collections and Globus Connect Server hostnames/origins? Or can there be more than one Globus collection at a given hostname, i.e. an HTTP path is also needed? Or conversely, might the same Globus collection be hosted at more than one Globus Connect Server hostname/path? You said that there is no way to go from Globus collection HTTP URL to the corresponding UUID, but given just the UUID, can we lookup the HTTP URL for it? This example seems to suggest that we can: https://docs.globus.org/globus-connect-server/v5.4/https-access-collections/#from_the_command_line |
Correct.
Yes, given a Collection UUID and resource (file) path, you can programmatically determine the HTTPS hostname and construct a fully resolvable URL; The current implementation in the PR uses the HTTPS URL since this is more likely to be shared when referencing a specific resource (specifically when targeting HTTPS as the transport).
Fair! Sorry about that! |
What about using the following URL syntax then:
Additionally, if I understand correctly, after you type @ it could auto-complete the hostname, or there could be an alternative syntax for specifying just the UUID and have it lookup the hostname automatically. But perhaps given normal usage of Globus that isn't useful? Ideally we can find a syntax that works well not only for Neuroglancerbut could also be supported by other tools, along the lines of my zarr proposal ZEP 8 --- see https://github.com/zarr-developers/zeps/pull/48/files I see from the documentation here (https://docs.globus.org/globus-connect-server/v5/https-access-collections/#supported_http_methods) that directory listing is not supported. That will still work fine but for interactively entering/browsing datasets from Neuroglancer directory listing would certainly be helpful. |
Two quick points for clarity (from the naive user side):
|
…alternate approach for scope/token discovery.
5ceda12
to
6a9e9a1
Compare
As @joshmoore points out, the full HTTPS URL is one of the more common references to be shared by end-users. I just pushed up an alternative approach that would support I'm using a request to the Globus Connect Server host that will result in a well-known error in order to derive the required scopes (and Collection ID, indirectly). I still need to do a closer review of the code changes to make sure this fits in well with the new provider patterns (and write some documentation), but just wanted to share this as a possible solution. |
Detecting the UUID automatically sounds like a big improvement --- thanks! Thinking a bit more about the client ids:
Option 1 is how such client ids are normally handled. For any custom deployment of neuroglancer, whoever is deploying it can register their own globus application and option 1 works quite well. However, with the current application id provided by @jbottigliero / Globus, the default deployment of neuroglancer at https://neuroglancer-demo.appspot.com will be unusable with Globus, which seems rather unfortunate. My inclination is to instead deploy to neuroglancer-demo.appspot.com with a working client id, that allows both neuroglancer-demo.appspot.com as well as localhost. That way, if users wish to use the default deployment of neuroglancer with globus, they can do so. Additionally, from a security perspective, I don't see any theoretical advantage of allowing the user to specify their own client id --- if the deployment is controlled by a malicious adversary, the user having entered their own client id doesn't provide any protection, because either way the adversary can obtain the credentials. Potentially someone may wish to use globus via a localhost deployment for testing but not grant access to neuroglancer-demo.appspot.com, and also doesn't want to have to bother to register their own globus client id. In that case, a localhost-only client id provided directly by Globus, i.e. the one currently included in this PR, would still be useful. It could make sense to make that one the default one but override it when deploying to neuroglancer-demo.appspot.com. |
{ | ||
method: "GET", | ||
headers: { | ||
"X-Requested-With": "XMLHttpRequest", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this header being included? it will result in an additional preflight OPTIONS request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This header is used to signal "Programmatic Access" (https://docs.globus.org/globus-connect-server/v5.4/https-access-collections/#programmatic_access) which ensures an Content-Type: application/json
response. I think this is managed as a separate header than just standard Accept
because the underlying asset is responsible for content negotiation when served over HTTPS.
I can add a comment here that explains this with a reference to the documentation.
const challenge = await generateCodeChallenge(verifier); | ||
const url = getGlobusAuthorizeURL({ | ||
clientId, | ||
scope: authorization_parameters.required_scopes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned that there may be a security vulnerability here --- suppose the user has previously accessed:
globus+https://good-host.com/...
which has UUID XXXX
and granted access.
Then the user gets directed to visit a Neuroglancer link that specifies a datasource of globus+https://bad-host.com/...
bad-host.com
maliciously reports that it has the same UUID of XXXX as good-host.com
.
What will happen in that case? Will the user have to grant permission again or will it be assumed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, it does seem like a plausible vector. In the updated processing a server that spoofs the required_scopes
of good-host.com
would result in tokens for the good-host.com
being sent as Authorization
headers to bad-host.com
.
As a way to validate the required_scope
response, it seems plausible to do a reverse lookup of the domain using the the scopes in the response, but this might require additional initial consent – I'm going to discuss this internally at Globus to see if we can come up with a more secure alternative.
) as GlobusLocalStorage; | ||
} | ||
|
||
async function waitForAuth( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please refactor this to use newly added utilities in src/credentials_provider/interactive_credentials_provider.ts
Please move it from |
My concern would really be the impression of users who come to the system. Thinking of my installation, I would rather users see the authority figure they expect (i.e., me) when being asked to use oauth for an application. |
To be clear, by user I mean whoever is using the browser. If you are deploying your own neuroglancer instance then you would anyway need to specify your own globus client id at build time. If no one cares to use globus with the default neuroglancer instance then we could just not worry about that. |
This pull request adds integration with Globus. Specifically, adding the ability to source data via HTTPS on a Globus Connect Server instance, authenticated using Globus Auth.
credential_provider
was created based off of the#src/util/google_oauth2.ts
implementation and other OAuth-like credential providers (e.g.middleauth
andngauth
).The Globus credential provider uses PKCE for the authorization flow, which I've added a few basic utilities around 1.
@ravescovi has successfully deployed an instance of Neuroglancer configured with Globus using the proposed changes – much of this implementation is based on his initial work integrating with the Neuroglancer codebase.
We're looking forward to discussing the implementation and seeing what needs to be addressed to get this into the mainline!
Footnotes
These utilities could be replaced with an external library (e.g.
pkce-challenge
) if that is preferred. It might be worth noting many of added methods were pulled from the Globus SDK for JavaScript directly. ↩