UTF-8 encoder allows encoding code points in the range #xD800 - #xDFFF #47

Open
Gleefre opened this issue Oct 11, 2023 · 0 comments

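Code points in this range are UTF-16 surrogates, not Unicode scalar values, so they do not represent Unicode characters.
This also breaks the round-trip guarantee of the :utf-8 encoding, since the encoder produces octets that the decoder then rejects: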

(babel:string-to-octets (string (code-char #xd800)))
; => #(237 160 128)
(babel:octets-to-string *)
; Evaluation aborted on #<BABEL-ENCODINGS:CHARACTER-OUT-OF-RANGE {10053D9533}>.
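
To see where the octets 237 160 128 come from, and what a strict encoder could do instead, here is a minimal standalone sketch in plain Common Lisp. It does not use babel's internals, and the names `surrogate-code-point-p` and `encode-code-point-utf-8` are hypothetical, introduced only for illustration. It applies the usual UTF-8 bit layout to a single code point and, in strict mode, refuses surrogates.

```lisp
;; Hypothetical helper names; these are not part of babel's API.
(defun surrogate-code-point-p (code)
  "True if CODE lies in the UTF-16 surrogate range #xD800-#xDFFF."
  (<= #xD800 code #xDFFF))

(defun encode-code-point-utf-8 (code &key (strict t))
  "Return a list of UTF-8 octets for the integer CODE.
When STRICT is true, signal an error for surrogate code points."
  (unless (<= 0 code #x10FFFF)
    (error "#x~X is outside the Unicode code space." code))
  (when (and strict (surrogate-code-point-p code))
    (error "Code point #x~X is a surrogate and cannot be encoded as UTF-8."
           code))
  (cond ((< code #x80)                  ; 1 octet: 0xxxxxxx
         (list code))
        ((< code #x800)                 ; 2 octets: 110xxxxx 10xxxxxx
         (list (logior #xC0 (ash code -6))
               (logior #x80 (logand code #x3F))))
        ((< code #x10000)               ; 3 octets: 1110xxxx 10xxxxxx 10xxxxxx
         (list (logior #xE0 (ash code -12))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))
        (t                              ; 4 octets: 11110xxx plus 3 continuation octets
         (list (logior #xF0 (ash code -18))
               (logior #x80 (logand (ash code -12) #x3F))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))))

;; Without the check, the 3-octet branch yields the octets from the report:
(encode-code-point-utf-8 #xD800 :strict nil)
; => (237 160 128)

;; With the check (the default), encoding is refused:
(handler-case (encode-code-point-utf-8 #xD800)
  (error (e) (format nil "~A" e)))
```

The surrogate range falls entirely inside the 3-octet branch, so a single range test before that branch would be enough to reject it.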

For comparison, SBCL signals an error in this case:

(sb-ext:string-to-octets (string (code-char #xd800)))
; Evaluation aborted on #<SB-IMPL::OCTETS-ENCODING-ERROR {10013BEA23}>.

This seems to affect some other UTF/UCS encodings as well (such as :utf-16be or :utf-16le).
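
Since several encodings appear to be affected, one possible caller-side workaround is to check the string before handing it to any encoder. The sketch below is an assumption, not babel's API: `contains-surrogate-p` and `encode-checked` are hypothetical names, and the encoder is passed in as a function so the same check can guard :utf-8, :utf-16be, or :utf-16le. It also assumes an implementation whose characters are full code points, as in SBCL; on implementations with 16-bit characters, surrogates inside strings can be legitimate halves of a pair.

```lisp
;; Hypothetical helpers; neither name exists in babel.
(defun contains-surrogate-p (string)
  "Return the index of the first surrogate code point in STRING, or NIL."
  (position-if (lambda (ch) (<= #xD800 (char-code ch) #xDFFF)) string))

(defun encode-checked (encoder string &rest args)
  "Call ENCODER on STRING and ARGS, unless STRING contains a surrogate."
  (let ((pos (contains-surrogate-p string)))
    (when pos
      (error "Surrogate code point #x~X at index ~D cannot be encoded."
             (char-code (char string pos)) pos))
    (apply encoder string args)))

;; Usage, assuming babel is loaded:
;; (encode-checked #'babel:string-to-octets "abc" :encoding :utf-16be)
```

Passing the encoder as an argument keeps the sketch independent of any one library, at the cost of an extra pass over the string.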
