Received Message Attribute via XML is not decoded from base 64 #343

Open
kojisaikiAtSony wants to merge 1 commit into master

kojisaikiAtSony
Contributor

@kojisaikiAtSony kojisaikiAtSony commented Jan 14, 2025

Problem:
JSON requests are decoded from base64 during JSON unmarshalling, but for XML, SetAttributesFromForm has no decoding step. As a result, the internal BinaryValue still held the base64 string (stored as bytes).

This is part of the XML protocol, but today the actual AWS SNS and the SDK for SNS still use XML, whereas SQS has already moved to JSON (#337). We confirmed this through our own investigation and the absence of any JSON description in the official SNS documentation. This blocks tests that use SNS APIs.

Fix:
Add a base64 decoding step to SetAttributesFromForm.
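
A minimal sketch of what such a decoding step could look like. The helper name decodeBinaryFormValue is hypothetical and is not the actual SetAttributesFromForm code; it only assumes that the form field arrives as a standard base64 string.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodeBinaryFormValue is a hypothetical helper (not GoAWS code).
// It turns the base64 text of a form field such as
// "MessageAttributes.1.Value.BinaryValue" into the raw bytes it represents.
func decodeBinaryFormValue(formValue string) ([]byte, error) {
	decoded, err := base64.StdEncoding.DecodeString(formValue)
	if err != nil {
		// Surface the problem instead of silently storing an empty value.
		return nil, fmt.Errorf("BinaryValue is not valid base64: %w", err)
	}
	return decoded, nil
}

func main() {
	raw, err := decodeBinaryFormValue("YmluYXJ5LXZhbHVl")
	fmt.Println(string(raw), err) // binary-value <nil>
}
```

Returning the error, rather than discarding it, lets the caller decide how to handle a form value that was never base64 encoded.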

@@ -370,7 +372,7 @@ func TestSendMessageBatchV1_Xml_Success_including_attributes(t *testing.T) {
 WithFormField("Entries.1.MessageBody", messageBody2).
 WithFormField("Entries.1.MessageAttributes.1.Name", binaryAttributeKey).
 WithFormField("Entries.1.MessageAttributes.1.Value.DataType", binaryType).
-WithFormField("Entries.1.MessageAttributes.1.Value.BinaryValue", binaryValue).
+WithFormField("Entries.1.MessageAttributes.1.Value.BinaryValue", binaryValueEncodeString).
Contributor Author

Blobs are sent and received base64 encoded, so the BinaryValue in the XML request here should be base64 encoded.

@kojisaikiAtSony
Contributor Author

Hi @Admiral-Piett, can you please review this? This is the last piece we need in order to update our AWS SDK version...

@Admiral-Piett
Owner

Admiral-Piett commented Jan 21, 2025

@kojisaikiAtSony I thought we talked about this change a bit before, didn't we? Maybe I'm misremembering; it has been a while. As far as I can tell, this change means that if you send in a MessageAttribute that is base64 encoded, GoAWS would decode that value and return it to you decoded in the ReceiveMessage response. I believe I took out that kind of functionality because that's not what I see AWS doing in real life. (I can't really remember if I changed this, but I admit, I think I did, when I rebuilt some of these endpoints for version 0.5.)

So what I want to know is exactly which chain of events or calls you are having trouble with, and in what way GoAWS behaves differently from what AWS would do in production. Screenshots and a full list of the endpoints you are calling, in order, would be helpful.

The reason I am asking is that I just tested this on my real-life AWS account. I sent a message via an SNS Publish to a Topic which was subscribed to a queue, and then I received the message. Now, look at the picture below; that's the response my SDK got back from AWS. You can see in the MessageAttributes that the encoded field has a Binary attribute that is base64 encoded. That's the exact same value I sent in during the Publish call. I did the same test with just an SQS SendMessage followed by a ReceiveMessage call. Same deal.

Screenshot 2025-01-21 at 4 59 26 PM

We know AWS is encoding its Binary Values on the way in and decoding them on the way out. That is happening in AWS internally, but the way I see it, that's not relevant to what GoAWS needs to do. It's the request and response contracts that I want to mimic. How we do that internally is up to us, and what I see between the 2 seems to match. The SendMessage requested values are not base64 encoded and the ReceiveMessage response values are not base64 encoded. Same.

The only real difference I can see in my testing is that if I do a Publish > ReceiveMessage, and the Subscription between my Topic and Queue has Raw Delivery disabled, then I think I am getting the attributes back in a base64 encoded state, as a part of that full message block. But I don't think that's what we're talking about here, is it? That seems like a different issue to this one, and one I will need to look into more.

Let me know if I am misunderstanding anything here. Let's get this issue sorted for you once and for all.

@kojisaikiAtSony
Contributor Author

kojisaikiAtSony commented Jan 29, 2025

Our application scenario is as follows:

  1. Send a message with a binary value attribute via the SNS Publish API with aws-query.
  2. Receive the message with the attribute via the SQS ReceiveMessage API with aws-json.

(The latest AWS SDK for Java uses aws-query for SNS and aws-json for SQS within the same SDK version.)

The problem we found is that the binary value in the received message is base64 encoded "twice".
Here are the screenshots of raw HTTP requests to the latest GoAWS, 0.5.2:

  1. Call SNS Publish with attribute value YmluYXJ5LXZhbHVl (base64 encoded string of binary-value).
    image

  2. Call SQS ReceiveMessage.
    The binary value in the received message is WW1sdVlYSjVMWFpoYkhWbA==, which is the base64 encoding of YmluYXJ5LXZhbHVl. The original value binary-value has been encoded twice. This is the problem.
    image

If we called SQS ReceiveMessage with aws-query, the value was YmluYXJ5LXZhbHVl.
image
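
The double encoding can be reproduced outside GoAWS with a short sketch. The struct and function names below are hypothetical; the sketch assumes only that the form value is stored as bytes without decoding and that the response is produced with Go's encoding/json, which base64 encodes []byte fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// attributeValue is a hypothetical stand-in for a message attribute
// whose binary payload is held in a []byte field.
type attributeValue struct {
	BinaryValue []byte `json:"BinaryValue"`
}

// marshalStoredValue stores the form value as-is (no base64 decoding)
// and renders it the way a JSON response would.
func marshalStoredValue(formValue string) (string, error) {
	out, err := json.Marshal(attributeValue{BinaryValue: []byte(formValue)})
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := marshalStoredValue("YmluYXJ5LXZhbHVl")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"BinaryValue":"WW1sdVlYSjVMWFpoYkhWbA=="}
}
```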

@kojisaikiAtSony
Contributor Author

kojisaikiAtSony commented Jan 29, 2025

Currently, when we test our scenario with various protocols, we get the following distortion:

Legend:
"Published" => value in SNS Publish request
"base64" => base64 encoded value
"raw" => raw binary
"Received" => value in SQS ReceiveMessage result

- Published (aws-query): base64 -> Inner Queue: base64 -> Received (aws-query): base64
- Published (aws-query): base64 -> Inner Queue: base64 -> Received (aws-json): base64 encoded twice (Our Issue)
- Published (aws-json): base64  -> Inner Queue: raw    -> Received (aws-query): raw (I ignore this since SQS never uses aws-query)
- Published (aws-json): base64  -> Inner Queue: raw    -> Received (aws-json): base64

With this PR, the behavior is simplified:

- Published (aws-query): base64 -> Inner Queue: raw -> Received (aws-query): raw (we can ignore this now)
- Published (aws-query): base64 -> Inner Queue: raw -> Received (aws-json): base64 (Expected)
- Published (aws-json): base64  -> Inner Queue: raw -> Received (aws-query): raw
- Published (aws-json): base64  -> Inner Queue: raw -> Received (aws-json): base64
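
The two tables above can be reproduced with a small model. This is only a sketch of the behavior as described in this comment, not GoAWS code: store, receive, and roundTrip are hypothetical names, and the decodeQueryOnPublish flag stands in for the change proposed in this PR.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// store models what lands in the inner queue for a Binary attribute.
// aws-json input is always decoded; aws-query input is decoded only
// when the proposed fix (decodeQueryOnPublish) is enabled.
func store(protocol, wireValue string, decodeQueryOnPublish bool) ([]byte, error) {
	if protocol == "aws-json" || decodeQueryOnPublish {
		return base64.StdEncoding.DecodeString(wireValue)
	}
	return []byte(wireValue), nil
}

// receive models how stored bytes appear in a ReceiveMessage response:
// aws-json base64 encodes them, aws-query emits them unchanged.
func receive(protocol string, stored []byte) string {
	if protocol == "aws-json" {
		return base64.StdEncoding.EncodeToString(stored)
	}
	return string(stored)
}

// roundTrip publishes with one protocol and receives with another.
func roundTrip(pub, recv, wireValue string, decodeQueryOnPublish bool) (string, error) {
	stored, err := store(pub, wireValue, decodeQueryOnPublish)
	if err != nil {
		return "", err
	}
	return receive(recv, stored), nil
}

func main() {
	const wire = "YmluYXJ5LXZhbHVl" // base64 of "binary-value"
	protocols := []string{"aws-query", "aws-json"}
	for _, fix := range []bool{false, true} {
		for _, pub := range protocols {
			for _, recv := range protocols {
				got, err := roundTrip(pub, recv, wire, fix)
				fmt.Printf("fix=%v pub=%s recv=%s -> %q (err=%v)\n", fix, pub, recv, got, err)
			}
		}
	}
}
```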

@kojisaikiAtSony kojisaikiAtSony force-pushed the added-base64-decode-on-publish-v2 branch from a450346 to f774423 Compare January 29, 2025 11:43
@kojisaikiAtSony
Contributor Author

The actual AWS behaves as I expected.

  1. Call SNS Publish with attribute value YmluYXJ5LXZhbHVl (base64 encoded string of binary-value).
    image

  2. Call SQS ReceiveMessage. The response is YmluYXJ5LXZhbHVl (base64 encoded string of binary-value), as expected.
    image

@kojisaikiAtSony
Contributor Author

Hi @Admiral-Piett, I've described above what we found using raw HTTP requests rather than the SDK. Does this make sense to you?

@kojisaikiAtSony
Contributor Author

Hi @Admiral-Piett, if you have time, could you please check the additional comments above about what we want to achieve?

@Admiral-Piett
Owner

@kojisaikiAtSony I'm sorry, this has been on my agenda, but I have been swamped with other things. I think I understand now, and I can see a similar problem in my testing. Just to make sure I understand: the issue you're seeing has to do with the following flow.

  1. Publish SNS Message - subscription to a queue set up as NOT raw delivery for XML (though I think this also affects raw delivery for JSON messages)
  2. Receive SQS Message

Your issue is that, by the time you call ReceiveMessage, the Binary Message Attributes have been base64 encoded beyond whatever they were when they were sent in (if they were at all - my tests didn't seem to require pre-encoded values). So in short, you're receiving a different value in the BinaryValue field for the message attribute than the one you sent in, and you don't like that? Is that right?

I assume that is what we're talking about, and if so, I think you are right. That is indeed a bug. I don't think the proposed change is going to fix the root cause, though. I think it hinges on the fact that you've already base64 encoded the value once to make this work, so a regular value like "BinaryValue: my-string" would still break it. This problem is a by-product of the []byte type I implemented on that field, which I think may have been overzealous of me. A []byte field gets base64 encoded by the Marshallers by default, because they can't return a []byte as a value in XML or JSON responses, and as far as they're concerned, those []byte fields could contain anything, so they encode it.

For the fix I think there are 2 options, or even 2 steps.

  1. Convert MessageAttribute.BinaryValue to string type.
    a. We luck out because everything we receive in the Publish SNS request is going to be a string. URL forms are all strings, no matter what - and since SNS is still on the Form/XML format instead of JSON, this works. (A gotcha that probably supported how it was working before.)
  2. We go ahead and encode during Publish whenever we get a MessageAttribute.BinaryValue and store it internally as a base64 encoded string. Then we add a best-effort decoding step that converts it back and casts it from a []byte to an int, string, or boolean so that it can be returned. This step must be optional, since raw versus non-raw subscriptions will determine whether it is needed.

Let me know what you think, but I think step 1 will fix your issue for today. Step 2 would need to be done eventually, when we open up to the possibility of non-strings coming in via JSON, but for usability right now, that's not a concern. Do you still want to make this change? If so, you could do one or the other, or both. If not, I can do it, and I'll probably do them both while I'm thinking of the issue. Let me know!
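
The difference between the two field types discussed above can be sketched as follows. The struct names are hypothetical, and the sketch covers only Go's encoding/json: a []byte field is base64 encoded on output, a string field is passed through unchanged, and a plain value such as my-string is not valid base64 input.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// bytesAttr and stringAttr are hypothetical stand-ins for the two
// candidate shapes of MessageAttribute.BinaryValue.
type bytesAttr struct {
	BinaryValue []byte
}

type stringAttr struct {
	BinaryValue string
}

// marshalBoth renders the same value through both field types.
func marshalBoth(value string) (asBytes, asString string, err error) {
	b, err := json.Marshal(bytesAttr{BinaryValue: []byte(value)})
	if err != nil {
		return "", "", err
	}
	s, err := json.Marshal(stringAttr{BinaryValue: value})
	if err != nil {
		return "", "", err
	}
	return string(b), string(s), nil
}

// isBase64 reports whether value decodes with the standard alphabet.
func isBase64(value string) bool {
	_, err := base64.StdEncoding.DecodeString(value)
	return err == nil
}

func main() {
	asBytes, asString, err := marshalBoth("my-string")
	if err != nil {
		panic(err)
	}
	fmt.Println(asBytes)               // the value appears base64 encoded
	fmt.Println(asString)              // {"BinaryValue":"my-string"}
	fmt.Println(isBase64("my-string")) // false
}
```

With the string type, whatever the caller published comes back unchanged, which is why step 1 does not depend on the value having been pre-encoded.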

Squashed commits:
[4c0cd93] change binary value type to string
[f774423] fix endpoint (+1 squashed commit)
Squashed commits:
[a450346] fix want value (+1 squashed commit)
Squashed commits:
[a52b572] added base 64 deconde on publish
[cfae427] fix endpoint
[a450346] fix want value (+1 squashed commit)
Squashed commits:
[a52b572] added base 64 deconde on publish
@kojisaikiAtSony kojisaikiAtSony force-pushed the added-base64-decode-on-publish-v2 branch from f774423 to 36c63fb Compare February 20, 2025 11:46
@kojisaikiAtSony
Contributor Author

Thank you for taking the time to look at this! You understood our point correctly. And thank you for suggesting good alternatives!

I like the idea, and I applied "Step 1. Convert MessageAttribute.BinaryValue to string type." to this PR. It looks like it worked well, since the test Test_Publish_sqs_json_with_message_attributes_raw, which covers what we want to achieve, now passes.
I also agree to track Step 2 as an issue for the future.

This is ready to merge so please take a look 👍

Comment on lines -116 to +118
-addBytesToHash(hasher, attributeValue.BinaryValue)
+bytes, _ := base64.StdEncoding.DecodeString(attributeValue.BinaryValue)
+addBytesToHash(hasher, []byte(bytes))
Contributor Author

@kojisaikiAtSony kojisaikiAtSony Feb 20, 2025

This modification is necessary if we want to manage values stored internally as base64 encoded strings. The MD5 hash must be generated from the decoded original value.
As you commented, the value was previously managed internally as a base64 encoded string, so it was also decoded when hashing.
https://github.com/Admiral-Piett/goaws/blob/v0.4.8/app/common/common.go#L52-#L54
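
A simplified sketch of the point above: hashing the decoded bytes gives a different digest from hashing the stored base64 string. The function names are hypothetical, and the sketch hashes only the value itself. It does not reproduce the full attribute digest or the addBytesToHash helper, and it checks the decode error that the diff above discards.

```go
package main

import (
	"crypto/md5"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// hashRaw returns the hex MD5 digest of the given bytes.
func hashRaw(data []byte) string {
	sum := md5.Sum(data)
	return hex.EncodeToString(sum[:])
}

// hashBinaryValue is a hypothetical helper: it decodes the stored
// base64 string first, so the digest covers the original binary value.
func hashBinaryValue(encoded string) (string, error) {
	decoded, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", fmt.Errorf("decode BinaryValue: %w", err)
	}
	return hashRaw(decoded), nil
}

func main() {
	const stored = "YmluYXJ5LXZhbHVl" // base64 of "binary-value"
	decodedHash, err := hashBinaryValue(stored)
	if err != nil {
		panic(err)
	}
	fmt.Println(decodedHash == hashRaw([]byte("binary-value"))) // true
	fmt.Println(decodedHash == hashRaw([]byte(stored)))         // false
}
```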
