feat: adds Globus Connect Server-sourced assets as a datasource (via HTTPS) #675

jbottigliero · 2024-12-02T15:28:02Z

This pull request adds integration with Globus: specifically, the ability to source data via HTTPS from a Globus Connect Server instance, authenticated using Globus Auth.

The credential_provider is based on the #src/util/google_oauth2.ts implementation and other OAuth-like credential providers (e.g. middleauth and ngauth).

The Globus credential provider uses PKCE for the authorization flow, for which I've added a few basic utilities.¹

@ravescovi has successfully deployed an instance of Neuroglancer configured with Globus using the proposed changes – much of this implementation is based on his initial work integrating with the Neuroglancer codebase.

We're looking forward to discussing the implementation and seeing what needs to be addressed to get this into the mainline!

¹ These utilities could be replaced with an external library (e.g. pkce-challenge) if that is preferred. It might be worth noting that many of the added methods were pulled directly from the Globus SDK for JavaScript.
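For readers unfamiliar with PKCE, the kind of utility described in this comment can be sketched roughly as follows. This is a minimal sketch using the Web Crypto API, not the code in this PR: `base64UrlEncode` and `generateCodeVerifier` are hypothetical names, the 32-byte verifier length is an assumption, and although a function called `generateCodeChallenge` appears in the PR's diff, its actual body may differ.

```typescript
// Minimal PKCE helpers (RFC 7636, S256 method). Illustrative sketch only.

/** Encodes bytes as base64url without padding. */
function base64UrlEncode(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary)
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

/** Creates a random code verifier; 32 random bytes encode to 43 characters. */
function generateCodeVerifier(byteLength = 32): string {
  const bytes = new Uint8Array(byteLength);
  crypto.getRandomValues(bytes);
  return base64UrlEncode(bytes);
}

/** Derives the S256 code challenge: BASE64URL(SHA-256(ASCII(verifier))). */
async function generateCodeChallenge(verifier: string): Promise<string> {
  const data = new TextEncoder().encode(verifier);
  const digest = await crypto.subtle.digest("SHA-256", data);
  return base64UrlEncode(new Uint8Array(digest));
}
```

The verifier stays in the browser until the token exchange; only the challenge is sent with the initial authorization request.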

jbottigliero · 2024-12-02T15:38:30Z

webpack.config.js

+       * own Client ID from Globus and substitute it in.
+       * @see https://docs.globus.org/api/auth/developer-guide/#developing-apps
+       */
+      GLOBUS_CLIENT_ID: JSON.stringify("f3c5dd86-8c8e-4393-8f46-3bfa32bfcd73"),


This default value is a Client ID that is managed by the Globus team.

It is something that could be distributed as part of the codebase or removed as a default and only referenced here as a comment.

👍 from my side for at least not needing a fork to set a different value.

@joshmoore – that was my intent here, but I am a bit unsure this method works as intended...

I thought something like npm run build -- --define GLOBUS_CLIENT_ID=example was the intended use, but it looks like the NEUROGLANCER_CLI environment flag needs to be disabled in order for the incoming --define properties to be merged.

Running npm run build -- --env NEUROGLANCER_CLI=false ... encounters an error due to the .strict() usage in build_tools/cli.ts – I don't want to derail the addition of this functionality, but just wanted to make sure I had a better grasp on the change and how it is used.
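As background on why the default above is wrapped in JSON.stringify: webpack's DefinePlugin substitutes each definition into the bundle as a code fragment, so string values have to be supplied as quoted source text. The sketch below shows one way `KEY=VALUE` pairs could be turned into such definitions. `toDefinitions` is a hypothetical helper for illustration only and is not how build_tools/cli.ts handles `--define`.

```typescript
// Hypothetical helper: converts KEY=VALUE pairs into DefinePlugin-style
// definitions, where each value is a JSON-quoted code fragment.
function toDefinitions(pairs: string[]): Record<string, string> {
  const definitions: Record<string, string> = {};
  for (const pair of pairs) {
    const eq = pair.indexOf("=");
    if (eq <= 0) {
      throw new Error(`Expected KEY=VALUE, got: ${pair}`);
    }
    const key = pair.slice(0, eq);
    const value = pair.slice(eq + 1);
    // Without JSON.stringify, the bundle would contain the bare identifier
    // `example` rather than the string literal "example".
    definitions[key] = JSON.stringify(value);
  }
  return definitions;
}
```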

jbms · 2025-01-31T05:12:35Z

Thanks, and sorry for the delay in responding. Part of the delay is that I had been working on a major refactor of file access in Neuroglancer.

Now that it has landed, it should make it easier to integrate additional data sources like this, but some refactoring of your PR will be needed.

In the README you mention that the user needs to enter a UUID --- can you describe a bit more about how Globus works and how this UUID is handled?

If the UUID is necessary to access the server, shouldn't it be included in the datasource URL somehow so that when sharing a link to the Neuroglancer state, another user doesn't have to enter it?

jbms · 2025-01-31T05:29:11Z

Also, can you clarify how the client ids work and how the default client id you have provided will work?

As far as I can gather, users host their own globus server, but they rely on the central globus authentication server?

What are the allowed origins for the default client id? Is the list of allowed origins global or specific to a given globus server instance? Similarly, is the authentication token that is received valid for all globus servers or just a single instance (based on the way the scopes work, I guess the answer is just a single server)? Or is that irrelevant because users are local to a single server instance anyway?

The scope doesn't seem to say anything about what permissions are granted? Does that mean that all permissions (i.e. read and write) are granted?

A general issue that comes up is that a user may wish to view certain datasets in Neuroglancer, and therefore must necessarily grant a given Neuroglancer instance read access to that particular dataset. However, it is often not possible to limit permissions narrowly like that, and instead the user is forced to either grant no permission, or grant very broad permissions, e.g. read access to everything. This may not be an issue if Neuroglancer can be trusted with full access, e.g. because the person administering the Neuroglancer instance is also administering the Globus server, but often it is an issue.

This issue is exactly why Neuroglancer does not provide the option to access GCS via regular google oauth2 login, because it would require users to grant Neuroglancer full read access to all of their GCS resources, which would not normally be a good idea unless they have created a separate Google account with access limited to those datasets they wish to view in Neuroglancer. Instead, there is ngauth, which allows you to grant Neuroglancer access only to specific datasets.

Is it possible with Globus for a user to somehow say: I want to grant this specific Neuroglancer instance access to just these specific datasets? Furthermore, it would be nice if user A could grant that permission and then share a Neuroglancer link with user B, who also has access to that dataset. Because user A has already granted Neuroglancer access to that dataset, user B could then log in and access it automatically, without needing to grant any additional permissions.

jbottigliero · 2025-02-07T15:30:19Z

Hey @jbms! Thanks for taking the time to look at this. I'll review the refactor and work to get this code updated if you think we can land on an integration that makes sense – hoping my responses below can help determine that.

[...] can you describe a bit more about how Globus works and how this UUID is handled?

As a high-level background on the functionality, this would be tapping into the following aspects of the Globus Platform:

- Globus Connect Server (GCS) is an agent installed on end-user systems. The agent acts as an interface to underlying storage systems, the broader Globus platform, GridFTP, and HTTPS.
- A "Globus Collection" (Collection) is a specific access point to a GCS Endpoint, including path, access control, etc.
- GCS supports HTTPS access to resources in these Collections.
- Resources are accessed with a combination of permissions configured at the Collection level (with some configuration inheritance) and local file system permissions.
  - I'll expand on this further below in reply to the scope behavior, but a token with a scope for the Collection is required (the scope contains the Collection's UUID as part of the identifier).
- Globus Auth is then used as a federated authentication system and security fabric.

[...] If the UUID is necessary to access the server, shouldn't it be included in the datasource URL [...]

Currently, a GCS HTTPS URL does not contain enough information to determine the Collection's UUID, and no programmatic method exists for deriving this information — hopefully, this will be possible at some point in the future.

Also, can you clarify how the client ids work and how the default client id you have provided will work? As far as I can gather, users host their own globus server, but they rely on the central globus authentication server?

A registered Globus Application can be configured as an OAuth 2.0 confidential or public client, allowing Globus (Auth) to be used for authentication.

The Client ID included in the code change is a public client I've created¹ with the following supported Redirect URIs:

https://localhost:8000, https://localhost:8080, https://127.0.0.1:8000, https://127.0.0.1:8080

A token created by the client would only be valid for the scopes requested by the client and approved by the user. Depending on how the data was shared, there are a few different scopes that the client might request:

- https://auth.globus.org/scopes/<COLLECTION_ID>/https (HTTPS access; always required)
- https://auth.globus.org/scopes/<COLLECTION_ID>/data_access (required for "mapped non-high assurance" collections)²

This means the token only grants the client access to the Collection where the data resides (not necessarily all GCS resources the user has access to)—that Collection might even be a single folder in a file system.
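To make the scope naming concrete, here is a minimal sketch that builds the two scope strings listed above from a Collection UUID. The function name `collectionScopes` and its `dataAccess` option are hypothetical, and how the two scopes are combined in an actual authorization request is not covered here.

```typescript
// Builds the Collection-specific scope strings described in this comment.
// Illustrative only; names are hypothetical.
const GLOBUS_SCOPE_PREFIX = "https://auth.globus.org/scopes";

function collectionScopes(
  collectionId: string,
  options: { dataAccess?: boolean } = {},
): string[] {
  // The HTTPS scope is always needed for HTTPS access to the Collection.
  const scopes = [`${GLOBUS_SCOPE_PREFIX}/${collectionId}/https`];
  // data_access is only needed for some Collection types.
  if (options.dataAccess) {
    scopes.push(`${GLOBUS_SCOPE_PREFIX}/${collectionId}/data_access`);
  }
  return scopes;
}
```

Because the Collection UUID is embedded in each scope, a token issued for these scopes is tied to that one Collection.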

I want to grant this specific Neuroglancer instance access to just these specific datasets? Furthermore, it would be nice if user A can grant that permission, and then share a Neuroglancer link with user B, who also has access to that dataset, and because user A has access to the dataset and already granted access to that dataset to Neuroglancer, and user B has access to the dataset, user B can login and then automatically access it without needing to specifically grant any additional permissions.

One potential solution would be for User A to create a Guest Collection (G1) at a specific path where all datasets to be shared with Neuroglancer would be published. That single collection ID becomes the only scope requested by and granted to the installation (https://auth.globus.org/scopes/G1/https). In the current code, when User B interacts with Neuroglancer, GlobusLocalStorage.domainMappings would smooth out the UX a little by only asking for the Collection ID once (while it remains available in localStorage).

I know I didn't respond to all of your questions directly, but I hope the explanation above helps determine if this can move forward – I am happy to expand on any of the above!

¹ For transparency, I work for Globus, and my Globus account manages this application. Ideally, when someone is shipping an instance of Neuroglancer they would swap the pre-configured Client ID with an application they manage.
² Globus has two types of Collections, "Mapped" and "Guest". For the sake of this integration, I don't think those implementation details matter so much.

jbms · 2025-02-07T17:43:53Z

Okay, let me see if I understand things correctly:

- Globus Connect Server is a proxy-like server that provides authenticated access to various systems.
  - Note: the use of "GCS" to refer to Globus Connect Server was a bit confusing to me because I am used to "GCS" meaning "Google Cloud Storage".
- To access data, you need (1) an HTTP URL for the Globus Connect Server, (2) a UUID, (3) authentication credentials, if anonymous access is not enabled.

Is there a 1-1 relationship between Globus collections and UUIDs? Is there a 1-1 relationship between Globus collections and Globus Connect Server hostnames/origins? Or can there be more than one Globus collection at a given hostname, i.e. an HTTP path is also needed? Or conversely, might the same Globus collection be hosted at more than one Globus Connect Server hostname/path?

You said that there is no way to go from Globus collection HTTP URL to the corresponding UUID, but given just the UUID, can we lookup the HTTP URL for it? This example seems to suggest that we can: https://docs.globus.org/globus-connect-server/v5.4/https-access-collections/#from_the_command_line

jbottigliero · 2025-02-07T18:25:33Z

Correct.

Yes, a Collection has a distinct UUID.
Globus Connect Server has many collections; HTTPS hostnames can be set at the Globus Connect Server level (endpoint) and/or collection level.

You said that there is no way to go from Globus collection HTTP URL to the corresponding UUID, but given just the UUID, can we lookup the HTTP URL for it? This example seems to suggest that we can: https://docs.globus.org/globus-connect-server/v5.4/https-access-collections/#from_the_command_line

Yes, given a Collection UUID and resource (file) path, you can programmatically determine the HTTPS hostname and construct a fully resolvable URL. The current implementation in the PR uses the HTTPS URL since this is more likely to be shared when referencing a specific resource (specifically when targeting HTTPS as the transport).

Note: the use of "GCS" to refer to Globus Connect Server was a bit confusing to me because I am used to "GCS" meaning "Google Cloud Storage".

Fair! Sorry about that!

jbms · 2025-02-07T21:34:47Z

What about using the following URL syntax then:

globus+https://<UUID>@<HOSTNAME>/<PATH>

Additionally, if I understand correctly, after you type @ it could auto-complete the hostname, or there could be an alternative syntax for specifying just the UUID and have it lookup the hostname automatically. But perhaps given normal usage of Globus that isn't useful?
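As a rough illustration of the syntax proposed above, the following sketch splits a `globus+https://<UUID>@<HOSTNAME>/<PATH>` URL into its parts by reusing the standard URL parser. The `parseGlobusUrl` name, the `GlobusUrlParts` type, and the example hostname are hypothetical; this is not code from the PR.

```typescript
// Hypothetical parser for the proposed globus+https://<UUID>@<HOSTNAME>/<PATH> form.
interface GlobusUrlParts {
  collectionId: string;
  hostname: string;
  path: string;
}

const GLOBUS_SCHEME = "globus+https://";

function parseGlobusUrl(url: string): GlobusUrlParts {
  if (!url.startsWith(GLOBUS_SCHEME)) {
    throw new Error(`Expected a ${GLOBUS_SCHEME} URL, got: ${url}`);
  }
  // Re-parse the remainder as plain https so that the UUID is exposed as
  // the userinfo ("username") component of the URL.
  const parsed = new URL("https://" + url.slice(GLOBUS_SCHEME.length));
  if (parsed.username === "") {
    throw new Error("Missing collection UUID before '@'");
  }
  return {
    collectionId: parsed.username,
    hostname: parsed.hostname,
    path: parsed.pathname,
  };
}
```

For example, `parseGlobusUrl("globus+https://00000000-0000-4000-8000-000000000000@data.example.org/volumes/a.zarr")` would yield the UUID, `data.example.org`, and `/volumes/a.zarr`.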

Ideally we can find a syntax that works well not only for Neuroglancer but could also be supported by other tools, along the lines of my zarr proposal ZEP 8 --- see https://github.com/zarr-developers/zeps/pull/48/files

I see from the documentation here (https://docs.globus.org/globus-connect-server/v5/https-access-collections/#supported_http_methods) that directory listing is not supported. That will still work fine, but for interactively entering/browsing datasets from Neuroglancer, directory listing would certainly be helpful.

joshmoore · 2025-02-10T08:59:09Z

Two quick points for clarity (from the naive user side):

The URLs that I get from Globus for sharing with others are of the form -- https://app.globus.org/file-manager?origin_id=COLLECTION_UUID&origin_path=%2FPATH%2F -- i.e., I never see any other concept of a HOSTNAME beyond globus.org
This doesn't cover the client ID UUID from feat: adds Globus Connect Server-sourced assets as a datasource (via HTTPS) #675 (comment)
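For reference, the sharing URLs described in the first point carry the Collection UUID and path as query parameters, which can be read with the standard URL API. The sketch below is illustrative only; `parseFileManagerUrl` is a hypothetical name and nothing like it is part of this PR.

```typescript
// Extracts the Collection UUID and path from a Globus web app sharing URL
// of the form https://app.globus.org/file-manager?origin_id=...&origin_path=...
function parseFileManagerUrl(
  url: string,
): { collectionId: string; path: string } | undefined {
  const parsed = new URL(url);
  if (parsed.hostname !== "app.globus.org" || parsed.pathname !== "/file-manager") {
    return undefined;
  }
  const collectionId = parsed.searchParams.get("origin_id");
  if (collectionId === null) return undefined;
  // searchParams percent-decodes values, so %2FPATH%2F becomes /PATH/.
  // Defaulting to "/" when origin_path is absent is an assumption.
  return { collectionId, path: parsed.searchParams.get("origin_path") ?? "/" };
}
```

Note that this yields a Collection UUID and path but no HTTPS hostname, which matches the observation that no hostname other than globus.org is visible to the user.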


jbottigliero · 2025-02-11T01:30:59Z

What about using the following URL syntax then:

globus+https://<UUID>@<HOSTNAME>/<PATH>

Additionally, if I understand correctly, after you type @ it could auto-complete the hostname, or there could be an alternative syntax for specifying just the UUID and have it lookup the hostname automatically. But perhaps given normal usage of Globus that isn't useful?

As @joshmoore points out, the full HTTPS URL is one of the more common references to be shared by end-users.

I just pushed up an alternative approach that would support globus+https://<https_asset_url>.

I'm using a request to the Globus Connect Server host that will result in a well-known error in order to derive the required scopes (and Collection ID, indirectly).
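The discovery step described above might look roughly like the sketch below. It assumes that the error body is JSON containing `authorization_parameters.required_scopes` as an array of strings (the field name is taken from the PR's diff, the array type is an assumption), and it does not assume any particular HTTP status code. `discoverRequiredScopes` and `FetchLike` are hypothetical names; the fetch function is injected so the sketch can be exercised without a real server.

```typescript
// Hypothetical sketch of scope discovery via an unauthenticated request.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string> },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

async function discoverRequiredScopes(
  assetUrl: string,
  fetchFn: FetchLike,
): Promise<string[] | undefined> {
  const response = await fetchFn(assetUrl, {
    method: "GET",
    // Signals programmatic access so that errors are returned as JSON.
    headers: { "X-Requested-With": "XMLHttpRequest" },
  });
  // A successful response means no additional scopes are needed.
  if (response.ok) return undefined;
  let body: unknown;
  try {
    body = await response.json();
  } catch {
    return undefined; // Not a JSON error body; nothing to derive.
  }
  const scopes = (body as { authorization_parameters?: { required_scopes?: unknown } } | null)
    ?.authorization_parameters?.required_scopes;
  if (!Array.isArray(scopes) || !scopes.every((s) => typeof s === "string")) {
    return undefined;
  }
  return scopes as string[];
}
```

The Collection UUID can then be read out of the returned scope strings, since each scope embeds it.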

I still need to do a closer review of the code changes to make sure this fits in well with the new provider patterns (and write some documentation), but just wanted to share this as a possible solution.

jbms · 2025-02-11T05:06:02Z

Detecting the UUID automatically sounds like a big improvement --- thanks!

Thinking a bit more about the client ids:

1. Either a single client id gets baked into the build (i.e. a specific deployment of neuroglancer), or
2. there is some mechanism to configure the client id, either in the JSON state or via user local storage.

Option 1 is how such client ids are normally handled.

For any custom deployment of neuroglancer, whoever is deploying it can register their own globus application and option 1 works quite well.

However, with the current application id provided by @jbottigliero / Globus, the default deployment of neuroglancer at https://neuroglancer-demo.appspot.com will be unusable with Globus, which seems rather unfortunate.

My inclination is to instead deploy to neuroglancer-demo.appspot.com with a working client id, that allows both neuroglancer-demo.appspot.com as well as localhost. That way, if users wish to use the default deployment of neuroglancer with globus, they can do so.

Additionally, from a security perspective, I don't see any theoretical advantage of allowing the user to specify their own client id --- if the deployment is controlled by a malicious adversary, the user having entered their own client id doesn't provide any protection, because either way the adversary can obtain the credentials.

Potentially someone may wish to use globus via a localhost deployment for testing but not grant access to neuroglancer-demo.appspot.com, and also doesn't want to have to bother to register their own globus client id. In that case, a localhost-only client id provided directly by Globus, i.e. the one currently included in this PR, would still be useful. It could make sense to make that one the default one but override it when deploying to neuroglancer-demo.appspot.com.

jbms · 2025-02-11T06:46:45Z

src/datasource/globus/credentials_provider.ts

+        {
+          method: "GET",
+          headers: {
+            "X-Requested-With": "XMLHttpRequest",


Why is this header being included? It will result in an additional preflight OPTIONS request.

This header is used to signal "Programmatic Access" (https://docs.globus.org/globus-connect-server/v5.4/https-access-collections/#programmatic_access), which ensures a Content-Type: application/json response. I think this is managed as a separate header, rather than the standard Accept header, because the underlying asset is responsible for content negotiation when served over HTTPS.

I can add a comment here that explains this with a reference to the documentation.

jbms · 2025-02-11T06:55:02Z

src/datasource/globus/credentials_provider.ts

+      const challenge = await generateCodeChallenge(verifier);
+      const url = getGlobusAuthorizeURL({
+        clientId,
+        scope: authorization_parameters.required_scopes,


I'm concerned that there may be a security vulnerability here --- suppose the user has previously accessed:

globus+https://good-host.com/... which has UUID XXXX

and granted access.

Then the user gets directed to visit a Neuroglancer link that specifies a datasource of globus+https://bad-host.com/..., and bad-host.com maliciously reports that it has the same UUID XXXX as good-host.com.

What will happen in that case? Will the user have to grant permission again or will it be assumed?

Good call, it does seem like a plausible vector. In the updated processing, a server that spoofs the required_scopes of good-host.com would result in tokens for good-host.com being sent as Authorization headers to bad-host.com.

As a way to validate the required_scopes response, it seems plausible to do a reverse lookup of the domain using the scopes in the response, but this might require additional initial consent. I'm going to discuss this internally at Globus to see if we can come up with a more secure alternative.
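One way the reverse-lookup idea could be expressed is sketched below: extract the Collection UUID from each reported scope, resolve the hostname registered for that Collection through a trusted channel, and refuse to send a token unless it matches the host being accessed. The `lookupHostname` callback is a hypothetical stand-in for whatever trusted lookup Globus would provide, the scope pattern is an assumption based on the scope strings quoted earlier in this thread, and this is an illustration of the idea rather than a vetted mitigation.

```typescript
// Hypothetical validation of server-reported scopes against the requested host.
const SCOPE_PATTERN =
  /^https:\/\/auth\.globus\.org\/scopes\/([0-9a-fA-F-]{36})\/[A-Za-z_]+$/;

function collectionIdFromScope(scope: string): string | undefined {
  return SCOPE_PATTERN.exec(scope)?.[1];
}

// Stand-in for a trusted UUID-to-hostname lookup (not a real Globus API call).
type HostnameLookup = (collectionId: string) => Promise<string>;

async function scopesMatchHost(
  assetUrl: string,
  requiredScopes: string[],
  lookupHostname: HostnameLookup,
): Promise<boolean> {
  const requestedHost = new URL(assetUrl).hostname.toLowerCase();
  if (requiredScopes.length === 0) return false;
  for (const scope of requiredScopes) {
    const collectionId = collectionIdFromScope(scope);
    // Reject anything that does not look like a Collection scope.
    if (collectionId === undefined) return false;
    const registeredHost = (await lookupHostname(collectionId)).toLowerCase();
    // A spoofing server reports a UUID that is registered to a different host.
    if (registeredHost !== requestedHost) return false;
  }
  return true;
}
```

With a lookup that maps UUID XXXX to good-host.com, a request to bad-host.com that reports XXXX would be rejected before any Authorization header is sent.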

jbms · 2025-02-11T06:56:36Z

src/datasource/globus/credentials_provider.ts

+  ) as GlobusLocalStorage;
+}
+
+async function waitForAuth(


Please refactor this to use newly added utilities in src/credentials_provider/interactive_credentials_provider.ts

jbms · 2025-02-11T06:57:56Z

Please move it from src/datasource to src/kvstore. This is considered a "root key-value store" under the current terminology used in the neuroglancer docs.

joshmoore · 2025-02-11T14:39:56Z

Additionally, from a security perspective, I don't see any theoretical advantage of allowing the user to specify their own client id --- if the deployment is controlled by a malicious adversary, the user having entered their own client id doesn't provide any protection, because either way the adversary can obtain the credentials.

My concern would really be the impression of users who come to the system. Thinking of my installation, I would rather users see the authority figure they expect (i.e., me) when being asked to use oauth for an application.

jbms · 2025-02-11T14:43:26Z

Additionally, from a security perspective, I don't see any theoretical advantage of allowing the user to specify their own client id --- if the deployment is controlled by a malicious adversary, the user having entered their own client id doesn't provide any protection, because either way the adversary can obtain the credentials.

My concern would really be the impression of users who come to the system. Thinking of my installation, I would rather users see the authority figure they expect (i.e., me) when being asked to use oauth for an application.

To be clear, by user I mean whoever is using the browser. If you are deploying your own neuroglancer instance then you would anyway need to specify your own globus client id at build time.

If no one cares to use globus with the default neuroglancer instance then we could just not worry about that.

jbottigliero mentioned this pull request Dec 2, 2024

wip:feat: adds Globus GCS-sourced assets as a datasource #666

Closed


jbottigliero added 2 commits February 10, 2025 10:52

feat: adds Globus GCS-sourced assets as a datasource

5f8839d

chore: updates Globus registration to better match refactor and uses alternate approach for scope/token discovery.

6a9e9a1

jbottigliero force-pushed the feat-globus-datasource branch from 5ceda12 to 6a9e9a1 Compare February 11, 2025 01:20

chore: clean-up and documentation

ebd73fb

jbottigliero changed the title ~~feat: adds Globus GCS-sourced assets as a datasource~~ feat: adds Globus Connect Server-sourced assets as a datasource (via HTTPS) Feb 11, 2025

jbms reviewed Feb 11, 2025
