Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle StorageID in S3 Gateway #8642

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

itaigilo
Copy link
Contributor

Closes #8641.

Change Description

Background

Making sure S3 Gateway plays way with StorageID.

Note that StorageID was already added to most of the relevant ObjectPointer objects,
So it's mainly about not allowing copy ops between two different StorageIDs.

Testing Details

AFAIK there's no coverage for the copy part,
So this was tested manually.

@itaigilo itaigilo requested review from N-o-Z and a team and removed request for N-o-Z February 11, 2025 20:44
@itaigilo itaigilo added the exclude-changelog PR description should not be included in next release changelog label Feb 11, 2025
@@ -211,6 +211,7 @@ type Adapter interface {

BlockstoreType() string
BlockstoreMetadata(ctx context.Context) (*BlockstoreMetadata, error)
// GetStorageNamespaceInfo returns the StorageNamespaceInfo for storageID or nil if not found.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment that I promised @guy-har to add for my next OSS PR 🤓

@@ -46,7 +47,7 @@ func TestWriteBlob(t *testing.T) {
reader := bytes.NewReader(data)
adapter := testutil.NewMockAdapter()
objectPointer := block.ObjectPointer{
StorageID: "",
StorageID: config.SingleBlockstoreID,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of tidying up, in the area of this PR.

Copy link

E2E Test Results - DynamoDB Local - Local Block Adapter

14 passed

Copy link

E2E Test Results - Quickstart

11 passed

Copy link
Contributor

@nadavsteindler nadavsteindler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple comments

@@ -177,6 +177,12 @@ func handleUploadPart(w http.ResponseWriter, req *http.Request, o *PathOperation
}
}

if srcRepo.StorageID != o.Repository.StorageID {
o.Log(req).WithField("copy_source", copySource).WithError(err).Error("copy between different blockstores is not allowed")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are now 4 places where we add copy_source to the logger via copy-pasted code. Why not use a constant or perhaps at the beginning of the code block do copySourceLogger := o.Log(req).WithFIeld(copy_source",copySource) and then use that logger in these 4 places

@@ -46,7 +47,7 @@ func TestWriteBlob(t *testing.T) {
reader := bytes.NewReader(data)
adapter := testutil.NewMockAdapter()
objectPointer := block.ObjectPointer{
StorageID: "",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also test non blank storage IDs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true!
But that's for a different Issue (opened in a different project).

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Mostly worried about missing test coverage.

@@ -2738,6 +2738,10 @@ func (c *Catalog) CopyEntry(ctx context.Context, srcRepository, srcRef, srcPath,
if err != nil {
return nil, err
}

if srcRepo.StorageID != destRepo.StorageID {
return nil, fmt.Errorf("copy between different blockstores is not allowed: %w", graveler.ErrInvalidValue)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil, fmt.Errorf("copy between different blockstores is not allowed: %w", graveler.ErrInvalidValue)
return nil, fmt.Errorf("%w: cannot copy between different blockstores", graveler.ErrInvalidValue)

But also note that this error message is quite nasty. Consider using ErrInvalidStorageId, and also mentioning the two storage IDs.

@@ -442,6 +442,7 @@ func handleListMultipartUploads(w http.ResponseWriter, req *http.Request, o *Rep
opts.KeyMarker = &keyMarker
}
mpuResp, err := o.BlockStore.ListMultipartUploads(req.Context(), block.ObjectPointer{
StorageID: o.Repository.StorageID,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How could TestListMultipartUploads pass without this? I think perhaps still using blank StorageIDs in those tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT code was introduced here with no tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No tests indeed -
For almost the whole gateway/operations code...

Since it requires a decent amount of work, and it seems to me out of scope for this PR -
How about adding an Issue for adding unit-tests for these?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is with adding untested code to the gateway. I am more comfortable with the new code as long as there are no storage IDs anywhere - I guess we can pull this code untested. But not sure of the benefit TBH.

@@ -177,6 +177,12 @@ func handleUploadPart(w http.ResponseWriter, req *http.Request, o *PathOperation
}
}

if srcRepo.StorageID != o.Repository.StorageID {
o.Log(req).WithField("copy_source", copySource).WithError(err).Error("copy between different blockstores is not allowed")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand "WithError(err)" - is there an error?

Also please rephrase the error message as above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-paste 🤦
Nice catch.

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

;-(

@@ -442,6 +442,7 @@ func handleListMultipartUploads(w http.ResponseWriter, req *http.Request, o *Rep
opts.KeyMarker = &keyMarker
}
mpuResp, err := o.BlockStore.ListMultipartUploads(req.Context(), block.ObjectPointer{
StorageID: o.Repository.StorageID,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT code was introduced here with no tests.

Copy link
Contributor Author

@itaigilo itaigilo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arielshaqed thanks for your review,
Your comments were fixed / addressed.

Please note that this is indeed not covered by unit-tests,
But this whole gateway part isn't, and adding tests to it might be a massive task.

It's been manually tested, and I think it's enough in this stage of the project -
But I'm interested in hearing your perspective.

@@ -177,6 +177,12 @@ func handleUploadPart(w http.ResponseWriter, req *http.Request, o *PathOperation
}
}

if srcRepo.StorageID != o.Repository.StorageID {
o.Log(req).WithField("copy_source", copySource).WithError(err).Error("copy between different blockstores is not allowed")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-paste 🤦
Nice catch.

@@ -442,6 +442,7 @@ func handleListMultipartUploads(w http.ResponseWriter, req *http.Request, o *Rep
opts.KeyMarker = &keyMarker
}
mpuResp, err := o.BlockStore.ListMultipartUploads(req.Context(), block.ObjectPointer{
StorageID: o.Repository.StorageID,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No tests indeed -
For almost the whole gateway/operations code...

Since it requires a decent amount of work, and it seems to me out of scope for this PR -
How about adding an Issue for adding unit-tests for these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While review this PR's comments, found out that this tests file has nothing to actually do with putobject -
It tests write_blob, hence moved here.

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This also fixes Catalog.CopyEntry, which is obviously useful. Please make sure to write tests. I do know that the S3 gateway is not unit-tested. But it does have some integration tests in Esti - let's make sure that we at least run those with nonempty StorageIDs!

This is a requirement for releasing code that sets a nonempty StorageID.

@@ -442,6 +442,7 @@ func handleListMultipartUploads(w http.ResponseWriter, req *http.Request, o *Rep
opts.KeyMarker = &keyMarker
}
mpuResp, err := o.BlockStore.ListMultipartUploads(req.Context(), block.ObjectPointer{
StorageID: o.Repository.StorageID,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is with adding untested code to the gateway. I am more comfortable with the new code as long as there are no storage IDs anywhere - I guess we can pull this code untested. But not sure of the benefit TBH.

@itaigilo
Copy link
Contributor Author

Thanks! This also fixes Catalog.CopyEntry, which is obviously useful. Please make sure to write tests. I do know that the S3 gateway is not unit-tested. But it does have some integration tests in Esti - let's make sure that we at least run those with nonempty StorageIDs!

This is a requirement for releasing code that sets a nonempty StorageID.

Spent some extra time adding unit-tests for the Gateway, but it seems like a task that exceeds the scope of this Issue (and project).
I've added an Issue for covering S3 Gateway with unit-tests,
And tagged you (@arielshaqed) in another issue for covering this part with Esti tests.

Obviously I don't enjoy adding code with no tests covering it, but it should be covered before the project is released.

Merging - if you still think there's things that can be done, lmk.

@itaigilo
Copy link
Contributor Author

@nadavsteindler thanks for reviewing -
Your comments have been fixed / addressed.

Can you PTAL again? 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exclude-changelog PR description should not be included in next release changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle StorageID for S3 Gateway ops
3 participants