Test for en-US-posix #2928

Open
zbraniecki opened this issue Jan 6, 2021 · 14 comments

Comments

@zbraniecki
Member

zbraniecki commented Jan 6, 2021

In https://bugzilla.mozilla.org/show_bug.cgi?id=1685075 we found a potential bug in JSC/V8 behavior around en-US-posix canonicalization. @anba believes the underlying bug differs between JSC and V8, but the end result is the same: the en-US-posix locale, which is present in CLDR data, always has its variant stripped, because JSC and V8 use ICU for canonicalization and that is how ICU behaves.

@FrankYFTang, what's your take here? What behavior would you like to see for this:

let a = Intl.Collator("en-US");
let b = Intl.Collator("en-US-posix");
a.compare("Virtio block device", "Virtio SCSI") // -1 in both Chrome and Firefox
b.compare("Virtio block device", "Virtio SCSI") // -1 in Chrome, 1 in Firefox
a.resolvedOptions().locale // "en-US" in both Chrome and Firefox
b.resolvedOptions().locale // "en-US" in Chrome, "en-US-posix" in Firefox

We should encode a test for whatever consensus we reach :)
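A minimal sketch for gathering the values above in any engine (Node, d8, or a browser console), using only the standard `Intl.Collator` API. ECMA-402 only specifies the sign of `compare()`'s result (negative, zero, or positive), so the sketch normalizes it with `Math.sign`. `summarizeCollator` is a hypothetical helper name, not part of any existing test suite.

```js
// Summarize how the current engine treats a locale tag in Intl.Collator.
// summarizeCollator is a hypothetical helper for comparing engines side by side.
function summarizeCollator(tag, x, y) {
  const collator = new Intl.Collator(tag);
  return {
    requested: tag,
    // The locale the engine actually resolved to (may differ from the request).
    resolved: collator.resolvedOptions().locale,
    // compare() is only specified to return a negative, zero, or positive number.
    order: Math.sign(collator.compare(x, y)),
  };
}

for (const tag of ["en-US", "en-US-posix"]) {
  console.log(summarizeCollator(tag, "Virtio block device", "Virtio SCSI"));
}
```

Running the same script in each engine gives directly comparable output, which is the raw material for deciding what a shared test should expect.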

@FrankYFTang
Contributor

There was a recent fix related to en-US-posix, and I cherry-picked it into the latest V8 trunk.
The current V8 trunk shows the following:

d8> let a = Intl.Collator("en-US");
undefined
d8> let b = Intl.Collator("en-US-posix");
undefined
d8> a.resolvedOptions().locale
"en-US"
d8> b.resolvedOptions().locale
"en-US-u-va-posix"
d8> a.compare("Virtio block device", "Virtio SCSI")
-1
d8> b.compare("Virtio block device", "Virtio SCSI")
-1

@FrankYFTang
Contributor

I believe we are facing two different issues here:

  1. What is the name of the locale "en-US-posix" after canonicalization?
  2. What is the sort order for "en-US-posix"?

Issue (2) is the simpler one: I know the cause is that we removed the collation data in our filter file. I may add it back: https://bugs.chromium.org/p/v8/issues/detail?id=11304

@zbraniecki
Member Author

I am less concerned about (2), since implementers are free to choose which locales they carry data for, and our spec is supposed to handle a lack of data for en-US-posix gracefully.

I am much more concerned about (1), as it looks like ICU4C internal logic leaking into the most popular ECMA-402 implementation in a spec-incompatible way.
en-US-posix is a valid language identifier and, if I'm not mistaken, should not be modified or canonicalized to en-US-u-va-posix in V8.
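Concern (1) can be checked without depending on which locales an engine carries collation data for: `Intl.getCanonicalLocales` only canonicalizes the tag, whereas `resolvedOptions().locale` also reflects locale negotiation against the engine's available data. A minimal sketch, assuming the expected value is passed in because it was still under discussion; `checkCanonicalization` is a hypothetical helper.

```js
// Concern (1) only: the canonical form of a tag, via Intl.getCanonicalLocales,
// which does not negotiate against the engine's available collation data.
// `expected` is a parameter because the correct answer was still under discussion.
function checkCanonicalization(tag, expected) {
  const [actual] = Intl.getCanonicalLocales(tag);
  return { tag, expected, actual, pass: actual === expected };
}

// The reading argued above: "en-US-posix" is left as-is.
console.log(checkCanonicalization("en-US-posix", "en-US-posix"));
// The ICU legacy-variant form, for comparison.
console.log(checkCanonicalization("en-US-posix", "en-US-u-va-posix"));
```

Keeping the expected value as a parameter lets the same check be pointed at whichever answer the discussion settles on.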

@FrankYFTang
Contributor

OK, if (1) is what you are concerned about, I will look into why that changed.

@FrankYFTang
Contributor

FrankYFTang commented Jan 12, 2021

I don't fully understand the whole picture yet, but please read this first:
http://unicode.org/reports/tr35/#Legacy_Variants

When converting to old syntax, the Unicode locale extension "-u-va-posix" should be converted to the "POSIX" variant, not to old extension syntax like "@va=posix". This is an exception: The other mappings above should not be reversed.

Examples:

en_US_POSIX ↔ en-US-u-va-posix
en_US_POSIX@colNumeric=yes ↔ en-US-u-kn-va-posix
en-US-POSIX-u-kn-true → en-US-u-kn-va-posix
en-US-POSIX-u-kn-va-posix → en-US-u-kn-va-posix
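To make the quoted mapping concrete, here is a minimal sketch that reproduces the BCP 47-side examples above. It is a hypothetical helper (`mapPosixVariant`), not ICU's implementation, and it assumes a simple input shape: language, region, and variants, optionally followed by a single `-u-` extension made only of keywords (no attributes, other extensions, or private use). It also applies the canonical rule of omitting a keyword value of `true`. Whether ECMA-402 canonicalization should apply this mapping at all is the open question in this issue.

```js
// Hypothetical sketch of the UTS #35 "Legacy Variants" mapping for POSIX.
// Assumed input shape: language-region-variants, optionally followed by one
// "-u-" extension made only of keywords. Not a full BCP 47 canonicalizer.
function mapPosixVariant(tag) {
  const subtags = tag.replace(/_/g, "-").split("-");
  const uIndex = subtags.findIndex((s) => s.toLowerCase() === "u");
  const head = uIndex === -1 ? subtags : subtags.slice(0, uIndex);
  const ext = uIndex === -1 ? [] : subtags.slice(uIndex + 1);

  // Drop the POSIX variant; tags without it are returned unchanged.
  const withoutPosix = head.filter((s) => s.toLowerCase() !== "posix");
  if (withoutPosix.length === head.length) return tag;

  // Parse -u- keywords: a 2-character key followed by zero or more value subtags.
  // Anything before the first key (i.e. attributes) is ignored by this sketch.
  const keywords = new Map();
  let key = null;
  for (const s of ext.map((x) => x.toLowerCase())) {
    if (s.length === 2) {
      key = s;
      keywords.set(key, []);
    } else if (key !== null) {
      keywords.get(key).push(s);
    }
  }
  // The POSIX variant becomes the "va" keyword (overriding any existing value).
  keywords.set("va", ["posix"]);

  // Canonical keyword form: keys sorted, and a value of "true" (or none) omitted.
  const parts = [...keywords.keys()].sort().map((k) => {
    const value = keywords.get(k).join("-");
    return value === "" || value === "true" ? k : `${k}-${value}`;
  });
  return [...withoutPosix, "u", ...parts].join("-");
}

console.log(mapPosixVariant("en-US-POSIX"));               // → en-US-u-va-posix
console.log(mapPosixVariant("en-US-POSIX-u-kn-true"));     // → en-US-u-kn-va-posix
console.log(mapPosixVariant("en-US-POSIX-u-kn-va-posix")); // → en-US-u-kn-va-posix
```

Note that the same function maps the lowercase `en-US-posix` to `en-US-u-va-posix`, which matches what V8 trunk reports for `resolvedOptions().locale` in this thread.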

@FrankYFTang
Contributor

Note that "POSIX" is not a registered variant; you will not find it in
https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
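The point here turns on the difference between a variant being *well-formed* and being *registered*. BCP 47's variant syntax accepts 5 to 8 alphanumerics, or a digit followed by 3 alphanumerics, so `posix` is syntactically fine; registration is a separate lookup. A minimal sketch, where `classifyVariant` is a hypothetical helper and `REGISTERED_VARIANTS` is a small illustrative subset typed in by hand, not the real registry (a real check would load the IANA file linked above).

```js
// BCP 47 variant syntax: 5-8 alphanumerics, or a digit followed by 3 alphanumerics.
const VARIANT_SYNTAX = /^(?:[0-9a-z]{5,8}|[0-9][0-9a-z]{3})$/i;

// Illustrative subset only; the full list lives in the IANA Language Subtag Registry.
const REGISTERED_VARIANTS = new Set(["1901", "1996", "fonipa", "valencia"]);

function classifyVariant(subtag) {
  return {
    subtag,
    wellFormed: VARIANT_SYNTAX.test(subtag),
    registered: REGISTERED_VARIANTS.has(subtag.toLowerCase()),
  };
}

console.log(classifyVariant("posix"));  // wellFormed: true, registered: false
console.log(classifyVariant("fonipa")); // wellFormed: true, registered: true
```

Because `posix` is well-formed, `Intl.getCanonicalLocales("en-US-posix")` does not throw a `RangeError` in a conforming engine; the question in this thread is the tag's canonical form, not whether it is accepted.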

@zbraniecki
Member Author

In the downstream ticket, Anba responded:

I don't think https://unicode.org/reports/tr35/#Legacy_Variants applies for "Unicode BCP 47 locale identifiers", but instead only for older locale identifier syntaxes. In the test262 ticket, you mentioned:

[...] https://unicode.org/reports/tr35/#Canonical_Unicode_Locale_Identifiers which calls https://unicode.org/reports/tr35/#Legacy_Variants .

But I don't see any reference to "3.8.2 Legacy Variants" in "3.2.1 Canonical Unicode Locale Identifiers". And I also don't see it mentioned in Annex C. LocaleId Canonicalization.

Therefore I still think the correct canonicalisation (in an ECMA-402 context) for en-US-posix is en-US-posix.

@zbraniecki
Member Author

@FrankYFTang ^ thoughts?

@FrankYFTang
Contributor

I agree with what Anba said about the lack of a connection between those two sections in the current version of UTS35, and I am trying to get Mark Davis, the author of UTS35, to look into this. It could be something he missed and needs to add back to UTS35. So could we hold off on adding that test for now? Note that "POSIX" is not a registered variant in
https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry, and I believe that is one key reason why ICU has to change en-US-POSIX to en-US-u-va-posix: POSIX is not a "valid" variant per that registry.

@Constellation
Member

Filed WebKit tracking issue in https://bugs.webkit.org/show_bug.cgi?id=221169

@zbraniecki
Member Author

@FrankYFTang, any progress? We got another bug as a result of our spec-compliant behavior on FreeBSD: https://bugzilla.mozilla.org/show_bug.cgi?id=1690795

@FrankYFTang
Contributor

The private email exchange somehow got lost. I filed a CLDR bug for UTS35 here: https://unicode-org.atlassian.net/browse/CLDR-14487

@FrankYFTang
Contributor

I also filed an ICU ticket: https://unicode-org.atlassian.net/browse/ICU-21489
