Minor bits for UTF8=ACCEPT #114

arnt · 2023-02-22T11:57:25Z

This really just makes testing a little bit more convenient, and avoids a warning when someone uses a utf8 string in code like foo.select("æ").

Sending as literal requires counting bytes, which is a pain while testing. And it causes an unnecessary slowdown due to the server roundtrip.

Without UTF8=ACCEPT/IMAP4REV2, only ASCII is legal IMAP, and some servers do reject 8-bit data in quoted-string productions. With either UTF8=ACCEPT or IMAP4REV2 enabled, sending UTF-8 is legal. ASCII-8BIT makes no sense in either case (and causes a warning).

nevans

Thanks! Could you change the rdoc for UTF8=ACCEPT at lines 1934-1937? I think it would be fine to simply delete that paragraph.

lib/net/imap.rb

nevans

Oops, I accidentally approved when I intended to "request changes". The SMTPUTF8 vs UTF8=ACCEPT needs to be resolved, at least. 🙂

Co-authored-by: nicholas a. evans <[email protected]>

arnt · 2023-02-22T18:12:21Z

That's what I get for merging conflicts manually while jetlagged.

I adjusted the documentation; even with UTF8=ACCEPT it's perfectly legal to send a message with Content-Type: text/plain; charset=ebcdic, and retrieving something like BODY[1] will then retrieve EBCDIC. With BINARY (RFC 3510) you might also get a .zip file in a literal, without any encoding.

nevans · 2023-02-22T21:42:12Z

This is great, thanks!

arnt · 2023-02-22T21:56:06Z

One final comment.

I reviewed 9051 and 6855, and there are two differences. Both support UTF8 strings etc., but two things differ:

6855 forbids UID SEARCH CHARSET UTF8 SUBJECT rené, while 9051 allows that. Ruby code will generally not send that, so I don't think there is a problem, but it's something you should be aware of.
6855 offers extra APPEND syntax, which 9051 does not. net-imap does not use that, which is IMO good. All the syntax offers is a way to say "if there's any 8-bit in the headers of this message I'm appending, that's UTF8" so you can escape from 2047 hell. That's not a very generous offer.

AIUI you have a general plan of only sending the commands that are legal under both regimes, and I like that plan.

The other final comment is — ask me any favour. My favourite dynamic language plus these quick responses. What a combination. I'm charmed.

nevans · 2023-02-23T23:56:57Z

Thanks again. Your expertise is much appreciated! I'll push a very simple documentation PR with your notes in it soon, maybe tonight or tomorrow.

As it currently stands, net-imap does little to no validation or normalization of most command arguments. It has some basic encoding heuristics (e.g. arrays become parenthesized lists) but most of the time it is very "free form" and allows the users of the library to do what they will, for better and for worse. This does make it very easy to "support" certain specifications: out of scope == not our problem. 😉 On the other hand, we should at least provide documentation (with links into RFCs) for important points like these.

I do have plans to add more "smarts" to net-imap, to make a more "ergonomic" interface and to protect against basic, easily caught mistakes. But I also want to preserve backward compatibility and retain the freedom to easily "cheat" a bit. I'm slowly working on a set of "generic extensions" PRs right now, mostly for #35, #43, #44, and #49. And they should simultaneously move us in the direction of more freedom and extensibility but also better definitions and validations, ultimately making it easier for users of the library to follow the specs.

To your first point, we currently require the user to provide the appropriate charset, sending nothing when charset is nil. I intend to make charset an optional parameter and auto-detect the appropriate charset when none is explicitly given or raise an exception when incompatible charsets are used.

To your second point, I'm glad you don't consider the literal8/utf8-literal extensions to append to be a priority... because I wasn't prioritizing them either (I do have an implementation of BINARY FETCH nearly ready to go). I'm assuming that nearly every ruby script or app that touches email uses the mail gem, and (IIRC) although it can parse UTF8 email, it still generates US-ASCII by default.

nevans · 2023-02-25T16:20:52Z

we currently require the user to provide the appropriate charset

I don't know why I was thinking this. It's already optional, as it should be. I must have been looking at the private internal_search method. (I still intend to add auto-detection code as part of my RFC4466 return opts commit.)

Anyway, I'm incorporating your notes into the rdoc for search, append, and enable, right now. 🙂

arnt added 2 commits February 22, 2023 12:47

Send single-line UTF8 strings as string when permitted.

cf31764

Sending as literal requires counting bytes, which is a pain while testing. And it causes an unnecessary slowdown due to the server roundtrip.

nevans approved these changes Feb 22, 2023

View reviewed changes

lib/net/imap.rb Outdated Show resolved Hide resolved

nevans requested changes Feb 22, 2023

View reviewed changes

arnt and others added 2 commits February 22, 2023 18:51

Fix bad manual conflict resolution

a6a7664

Co-authored-by: nicholas a. evans <[email protected]>

More accurate documentation

e0656f3

nevans merged commit e5bcb67 into ruby:master Feb 22, 2023

arnt deleted the utf8accept branch February 22, 2023 21:45

nevans added the IMAP4rev2 Requirement for IMAP4rev2, RFC9051 label Sep 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Minor bits for UTF8=ACCEPT #114

Minor bits for UTF8=ACCEPT #114

arnt commented Feb 22, 2023

nevans left a comment •

edited

Loading

nevans left a comment

arnt commented Feb 22, 2023 •

edited

Loading

nevans commented Feb 22, 2023

arnt commented Feb 22, 2023

nevans commented Feb 23, 2023

nevans commented Feb 25, 2023

Minor bits for UTF8=ACCEPT #114

Minor bits for UTF8=ACCEPT #114

Conversation

arnt commented Feb 22, 2023

nevans left a comment • edited Loading

Choose a reason for hiding this comment

nevans left a comment

Choose a reason for hiding this comment

arnt commented Feb 22, 2023 • edited Loading

nevans commented Feb 22, 2023

arnt commented Feb 22, 2023

nevans commented Feb 23, 2023

nevans commented Feb 25, 2023

nevans left a comment •

edited

Loading

arnt commented Feb 22, 2023 •

edited

Loading