-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
control characters incorrectly ignored in OSC string #5173
Comments
We took the state handling from vt100.net, which lists for OSC_STRING state:
Paul Williams has tested that against multiple DEC VT devices, so I tend to believe, that xterm might be slightly off here. |
The de-facto standard is xterm. It is more important to match xterm than essentially-non-existent hardware. The project is called xterm.js, not vt100.js. The demo sets the |
Sorry, I strongly disagree. The parser is the best tested thing we have in a continuity sense, thanks to Paul Williams. While xterm had to reverse engineer most things back in the 80s/90s, when literally no docs were available. So there are still flaws hidden there.
Yes I am aware of the name conflict. I really dont like the name and wished more than once it would not carry that name.
I voted more than once for a proper terminfo entry and an overall better TE options discovery. I literally never got any serious or constructive response to any of that. |
Perhaps, but that ship has sailed. Almost all who want to use term.js will want to have modern extensions. Nobody wants a vt100. Modern extensions have standardized on xterm or a superset thereof. See my new example. The Maybe we need the equivalent of gcc's For what it's worth, if you run |
We cannot easily change the parser rules, as it will have unforeseen side effects on the shape of sequences. OSC is specced to safely transmit only ASCII printables, so no control chars. If your HTML snippets contain meaningful chars outside of that range (even higher UTF-8 or any other 8bit or multibyte encoding) you gonna need to use a transport encoding like base64. Thats the gist of it and done that way by all more complex OSC sequences like iTerm's IIP. Btw DCS is a bit more free in its payload letting a few more control chars pass, but in general the same rule applies here for arbitrary bytes. ECMA-35 allows to spec custom sequence payload protocols with a special modifier sequence (was it called DOCS?), but thats def. dead and not supported by most TEs I have tested (played around with that when trying to create a sound transport sequence). For best support across TEs we are kinda bound to OSC with ASCII printables (and to DCS with less support across current TEs). |
I believe we have to allow general UTF-8 printable characters. There are a number of existing "standard" OSC escape sequences that take raw text without base64-encoding. For example OSC 2 (change window title) and OSC 8 (set link URL). We cannot restrict these to ASCII without breaking applications and reasonable expectations. OSC and DCS payloads cannot pass through arbitrary binary content without encoding, but they should handle all valid "text" content. Pedantic compatibility with some obsolete pre-Unicode specification is not a good argument against. "Text content" (as may appear in "text files") includes printable characters, joiners, and valid whitespace. Portable valid whitespace includes at minimum space, tabs, and standard line-endings (LF or CR+LF). Might as well also allow at least VT, NEL, and FF, even if not as "portable" or well-defined.
I think it is unlikely any application would misbehave if LF is passed through by OSC rather than ignored. If there is, either the application or the OSC handler can easily be fixed - and should be. However, having LF be ignored rather than passed through is more likely to cause misbehavior, and is more complicated to work around. |
I just checked ECMA-48 - it also allows 00/08-00/13 (BS, HT, LF, VT, FF CR) in OSC. Seems DEC had a stricter idea about OSC, idk why. So I am not totally opposed to allowing those, as it would bring us closer to ECMA conformance. We still need to check for all OSC handlers, that they dont break badly (warning: the fact that DCS also allows LF caused us a CVE in the past). I think your argument about about UTF8 is a "false friend" for your HTML purpose. HTML itself can be encoded in any text encoding, not only UTF8. Since you use system tools to generate those snippets it is not even guaranteed, that a custom system locale will not interfere here. Imho as an OSC sequence creator you should be prepared for that, e.g. make the sequence more robust in that regard. In summary - if you dont want other TEs to choke on your OSC sequence, it should stay in ASCII printables. |
Most control characters (including CR and LF) are ignored in OSC_STRING.
This is inconsistent with xterm, which I think we should follow in this respect. See the
sos_table
inVTPrsTbl.c
in the xterm sources: "The CASE_IGNORE entries correspond to the characters that can be accumulated for the string function (e.g., OSC)." I also verified this using thegdb
debugger.A simple fix in
EscapeSequenceParse.ts
is to replace:by:
Optionally, we can also tweak the handling of
ParserAction.OSC_PUT
in line 769:My suggestion is to not change that line.
I will submit a pull request if this looks reasonable.
The text was updated successfully, but these errors were encountered: