Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor bits for UTF8=ACCEPT #114

Merged
merged 4 commits into from
Feb 22, 2023
Merged

Minor bits for UTF8=ACCEPT #114

merged 4 commits into from
Feb 22, 2023

Conversation

arnt
Copy link
Contributor

@arnt arnt commented Feb 22, 2023

This really just makes testing a little bit more convenient, and avoids a warning when someone uses a utf8 string in code like foo.select("æ").

Sending as literal requires counting bytes, which is a pain while testing.
And it causes an unnecessary slowdown due to the server roundtrip.
Without UTF8=ACCEPT/IMAP4REV2, only ASCII is legal IMAP, and some servers
do reject 8-bit data in quoted-string productions. With either UTF8=ACCEPT
or IMAP4REV2 enabled, sending UTF-8 is legal.

ASCII-8BIT makes no sense in either case (and causes a warning).
Copy link
Collaborator

@nevans nevans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Could you change the rdoc for UTF8=ACCEPT at lines 1934-1937? I think it would be fine to simply delete that paragraph.

lib/net/imap.rb Outdated Show resolved Hide resolved
Copy link
Collaborator

@nevans nevans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I accidentally approved when I intended to "request changes". The SMTPUTF8 vs UTF8=ACCEPT needs to be resolved, at least. 🙂

@arnt
Copy link
Contributor Author

arnt commented Feb 22, 2023

That's what I get for merging conflicts manually while jetlagged.

I adjusted the documentation; even with UTF8=ACCEPT it's perfectly legal to send a message with Content-Type: text/plain; charset=ebcdic, and retrieving something like BODY[1] will then retrieve EBCDIC. With BINARY (RFC 3510) you might also get a .zip file in a literal, without any encoding.

@nevans
Copy link
Collaborator

nevans commented Feb 22, 2023

This is great, thanks!

@nevans nevans merged commit e5bcb67 into ruby:master Feb 22, 2023
@arnt arnt deleted the utf8accept branch February 22, 2023 21:45
@arnt
Copy link
Contributor Author

arnt commented Feb 22, 2023

One final comment.

I reviewed 9051 and 6855, and there are two differences. Both support UTF8 strings etc., but two things differ:

  1. 6855 forbids UID SEARCH CHARSET UTF8 SUBJECT rené, while 9051 allows that. Ruby code will generally not send that, so I don't think there is a problem, but it's something you should be aware of.
  2. 6855 offers extra APPEND syntax, which 9051 does not. net-imap does not use that, which is IMO good. All the syntax offers is a way to say "if there's any 8-bit in the headers of this message I'm appending, that's UTF8" so you can escape from 2047 hell. That's not a very generous offer.

AIUI you have a general plan of only sending the commands that are legal under both regimes, and I like that plan.

The other final comment is — ask me any favour. My favourite dynamic language plus these quick responses. What a combination. I'm charmed.

@nevans
Copy link
Collaborator

nevans commented Feb 23, 2023

Thanks again. Your expertise is much appreciated! I'll push a very simple documentation PR with your notes in it soon, maybe tonight or tomorrow.

As it currently stands, net-imap does little to no validation or normalization of most command arguments. It has some basic encoding heuristics (e.g. arrays become parenthesized lists) but most of the time it is very "free form" and allows the users of the library to do what they will, for better and for worse. This does make it very easy to "support" certain specifications: out of scope == not our problem. 😉 On the other hand, we should at least provide documentation (with links into RFCs) for important points like these.


I do have plans to add more "smarts" to net-imap, to make a more "ergonomic" interface and to protect against basic, easily caught mistakes. But I also want to preserve backward compatibility and retain the freedom to easily "cheat" a bit. I'm slowly working on a set of "generic extensions" PRs right now, mostly for #35, #43, #44, and #49. And they should simultaneously move us in the direction of more freedom and extensibility but also better definitions and validations, ultimately making it easier for users of the library to follow the specs.

To your first point, we currently require the user to provide the appropriate charset, sending nothing when charset is nil. I intend to make charset an optional parameter and auto-detect the appropriate charset when none is explicitly given or raise an exception when incompatible charsets are used.

To your second point, I'm glad you don't consider the literal8/utf8-literal extensions to append to be a priority... because I wasn't prioritizing them either (I do have an implementation of BINARY FETCH nearly ready to go). I'm assuming that nearly every ruby script or app that touches email uses the mail gem, and (IIRC) although it can parse UTF8 email, it still generates US-ASCII by default.

@nevans
Copy link
Collaborator

nevans commented Feb 25, 2023

we currently require the user to provide the appropriate charset

I don't know why I was thinking this. It's already optional, as it should be. I must have been looking at the private internal_search method. (I still intend to add auto-detection code as part of my RFC4466 return opts commit.)

Anyway, I'm incorporating your notes into the rdoc for search, append, and enable, right now. 🙂

@nevans nevans added the IMAP4rev2 Requirement for IMAP4rev2, RFC9051 label Sep 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IMAP4rev2 Requirement for IMAP4rev2, RFC9051
Development

Successfully merging this pull request may close these issues.

2 participants