Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix transliterator fallback loading #5489

Merged
merged 5 commits into from
Sep 6, 2024

Conversation

robertbastian
Copy link
Member

@robertbastian robertbastian requested a review from a team as a code owner September 4, 2024 08:54
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some commentary

)),
s => Err(DataError::custom("unavailable transliterator").with_debug_context(s)),
))),
_ => None,
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion (optional): invert load_with_override to also return Option<Result>

Comment on lines 311 to 314
.strip_prefix("x-")
.map(|special| Transliterator::load_special(special, normalizer_provider))
.and_then(|special| Transliterator::load_special(special, normalizer_provider))
// b) the user-provided override
.or_else(|| Transliterator::load_with_override(&dep, lookup?).transpose())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the logic here:

  1. dep.strip_prefix returns Option<...>
  2. .and_then now resolves to Option<Result<...>>. Will be None if either strip_prefix failed or if load_special did not match anything. OK so far.
  3. .or_else is skipped if load_special returned Some. Otherwise, it returns Option<Result<...>>, with a None value if there was no override. Still OK.
  4. .unwrap_or_else returns the Result<...> or else calls load_rbt and propagates the result from that.

Previously, the behavior was:

  1. dep.strip_prefix returns Option<...>
  2. .map resolves to Option<Result<...>>. Will be None only if strip_prefix failed.

So the difference is that we never previously called load_with_override or load_rbt if the ID started with the string x-. As far as I can tell, the x- gets added here:

unparsed.replace_range(0..0, "x-");

Note that load_with_override already fails unconditionally on x- because it needs the ID to be valid BCP-47. However, load_rbt just takes the ID and stuffs it into the DataMarkerAttributes.

Basically, I think this means that a string referencing a transliterator ID that doesn't have a BCP-47 alias, such as "::Lower" (which appears to become "x-any-lower"), could now be returned by a data provider that responds to requests for "und/x-any-lower" (using "/" as the data marker delimiter). So, that is the nature of the behavior change.

Without this change, it is required to always add a BCP-47 alias mappings for any transliterator IDs that are not in the built-in set.

Manishearth
Manishearth previously approved these changes Sep 5, 2024
sffc
sffc previously approved these changes Sep 6, 2024
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Were you comfortable with the observations from the comments I left above?

@robertbastian
Copy link
Member Author

Yes

@robertbastian robertbastian merged commit 7193a94 into unicode-org:main Sep 6, 2024
28 checks passed
@robertbastian robertbastian deleted the special branch October 17, 2024 00:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants