<pre class='metadata'>
Title: Translator and Language Detector APIs
Shortname: translation
Level: None
Status: CG-DRAFT
Group: webml
Repository: webmachinelearning/translation-api
URL: https://webmachinelearning.github.io/translation-api
Editor: Domenic Denicola, Google https://google.com, [email protected], https://domenic.me/
Abstract: The translator and language detector APIs give web pages the ability to translate text between languages, and to detect the language of such text.
Markup Shorthands: markdown yes, css no
Complain About: accidental-2119 yes, missing-example-ids yes
Assume Explicit For: yes
Default Biblio Status: current
Boilerplate: omit conformance
Indent: 2
Die On: warning
</pre>
<pre class="anchors">
urlPrefix: https://tc39.es/ecma402/; spec: ECMA-402
type: dfn
text: Unicode canonicalized locale identifier; url: sec-language-tags
type: abstract-op
text: LookupMatchingLocaleByBestFit; url: sec-lookupmatchinglocalebybestfit
</pre>
<h2 id="intro">Introduction</h2>
For now, see the [explainer](https://github.com/webmachinelearning/translation-api/blob/main/README.md).
<h2 id="translator-api">The translator API</h2>
<xmp class="idl">
partial interface AI {
readonly attribute AITranslatorFactory translator;
};
[Exposed=(Window,Worker), SecureContext]
interface AITranslatorFactory {
Promise<AITranslator> create(AITranslatorCreateOptions options);
Promise<AIAvailability> availability(AITranslatorCreateCoreOptions options);
};
[Exposed=(Window,Worker), SecureContext]
interface AITranslator {
Promise<DOMString> translate(
DOMString input,
optional AITranslatorTranslateOptions options = {}
);
ReadableStream translateStreaming(
DOMString input,
optional AITranslatorTranslateOptions options = {}
);
readonly attribute DOMString sourceLanguage;
readonly attribute DOMString targetLanguage;
};
AITranslator includes AIDestroyable;
dictionary AITranslatorCreateCoreOptions {
required DOMString sourceLanguage;
required DOMString targetLanguage;
};
dictionary AITranslatorCreateOptions : AITranslatorCreateCoreOptions {
AbortSignal signal;
AICreateMonitorCallback monitor;
};
dictionary AITranslatorTranslateOptions {
AbortSignal signal;
};
</xmp>
Every {{AI}} has a <dfn for="AI">translator factory</dfn>, an {{AITranslatorFactory}} object. Upon creation of the {{AI}} object, its [=AI/translator factory=] must be set to a [=new=] {{AITranslatorFactory}} object created in the {{AI}} object's [=relevant realm=].
The <dfn attribute for="AI">translator</dfn> getter steps are to return [=this=]'s [=AI/translator factory=].
<h3 id="translator-creation">Creation</h3>
<div algorithm>
The <dfn method for="AITranslatorFactory">create(|options|)</dfn> method steps are:
1. If [=this=]'s [=relevant global object=] is a {{Window}} whose [=associated Document=] is not [=Document/fully active=], then return [=a promise rejected with=] an "{{InvalidStateError}}" {{DOMException}}.
1. If |options|["{{AITranslatorCreateOptions/signal}}"] [=map/exists=] and is [=AbortSignal/aborted=], then return [=a promise rejected with=] |options|["{{AITranslatorCreateOptions/signal}}"]'s [=AbortSignal/abort reason=].
1. [=Validate and canonicalize translator options=] given |options|.
<p class="note">This can mutate |options|.
1. Return the result of [=creating an AI model object=] given [=this=]'s [=relevant realm=], |options|, [=compute translator options availability=], [=download the translation model=], [=initialize the translation model=], and [=create a translator object=].
</div>
<div algorithm>
To <dfn>validate and canonicalize translator options</dfn> given an {{AITranslatorCreateCoreOptions}} |options|, perform the following steps. They mutate |options| in place to canonicalize language tags, and throw a {{TypeError}} if any are invalid.
1. [=Validate and canonicalize language tags=] given |options| and "{{AITranslatorCreateCoreOptions/sourceLanguage}}".
1. [=Validate and canonicalize language tags=] given |options| and "{{AITranslatorCreateCoreOptions/targetLanguage}}".
</div>
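As a rough illustration of the effect these steps have on a web developer's input, the sketch below approximates them with `Intl.getCanonicalLocales()`. This is an assumption made for illustration only: the underlying language tag validation steps are defined outside this section and might differ in detail, and the helper name `canonicalizeTranslatorOptions` is hypothetical.

```javascript
// Hypothetical helper approximating "validate and canonicalize translator
// options". Assumption: Intl.getCanonicalLocales() behaves closely enough to
// the real steps for well-formed language tags.
function canonicalizeTranslatorOptions(options) {
  for (const key of ["sourceLanguage", "targetLanguage"]) {
    if (typeof options[key] !== "string") {
      throw new TypeError(`${key} is missing or not a string`);
    }
    let canonical;
    try {
      // Throws a RangeError for structurally invalid language tags.
      [canonical] = Intl.getCanonicalLocales(options[key]);
    } catch {
      // The steps above report invalid tags as a TypeError.
      throw new TypeError(`Invalid language tag for ${key}: ${options[key]}`);
    }
    options[key] = canonical; // Mutates options in place, as the steps do.
  }
  return options;
}
```

Under this approximation, `"en-us"` becomes `"en-US"` and `"ZH-hant"` becomes `"zh-Hant"`, while a string such as `"not a tag"` causes a `TypeError`.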
<div algorithm>
To <dfn>download the translation model</dfn>, given an {{AITranslatorCreateCoreOptions}} |options|:
1. [=Assert=]: these steps are running [=in parallel=].
1. Initiate the download process for everything the user agent needs to translate text from |options|["{{AITranslatorCreateCoreOptions/sourceLanguage}}"] to |options|["{{AITranslatorCreateCoreOptions/targetLanguage}}"].
This could include both a base translation model and specific language arc material, or perhaps material for multiple language arcs if an intermediate language is used.
1. If the download process cannot be started for any reason, then return false.
1. Return true.
</div>
<div algorithm>
To <dfn>initialize the translation model</dfn>, given an {{AITranslatorCreateCoreOptions}} |options|:
1. [=Assert=]: these steps are running [=in parallel=].
1. Perform any necessary initialization operations for the AI model backing the user agent's capabilities for translating from |options|["{{AITranslatorCreateCoreOptions/sourceLanguage}}"] to |options|["{{AITranslatorCreateCoreOptions/targetLanguage}}"].
This could include loading the model into memory, or loading any fine-tunings necessary to support the specific options in question.
1. If initialization failed for any reason, then return false.
1. Return true.
</div>
<div algorithm>
To <dfn>create a translator object</dfn>, given a [=ECMAScript/realm=] |realm| and an {{AITranslatorCreateCoreOptions}} |options|:
1. [=Assert=]: these steps are running on |realm|'s [=ECMAScript/surrounding agent=]'s [=agent/event loop=].
1. Return a new {{AITranslator}} object, created in |realm|, with
<dl class="props">
: [=AITranslator/source language=]
:: |options|["{{AITranslatorCreateCoreOptions/sourceLanguage}}"]
: [=AITranslator/target language=]
:: |options|["{{AITranslatorCreateCoreOptions/targetLanguage}}"]
</dl>
</div>
<h3 id="translator-availability">Availability</h3>
<!-- TODO: consider deduping this with writing assistance APIs + language detector, as it's very similar. -->
<div algorithm>
The <dfn method for="AITranslatorFactory">availability(|options|)</dfn> method steps are:
1. If [=this=]'s [=relevant global object=] is a {{Window}} whose [=associated Document=] is not [=Document/fully active=], then return [=a promise rejected with=] an "{{InvalidStateError}}" {{DOMException}}.
1. [=Validate and canonicalize translator options=] given |options|.
1. Let |promise| be [=a new promise=] created in [=this=]'s [=relevant realm=].
1. [=In parallel=]:
1. Let |availability| be the result of [=computing translator options availability=] given |options|.
1. [=Queue a global task=] on the [=AI task source=] given [=this=]'s [=relevant global object=] to perform the following steps:
1. If |availability| is null, then [=reject=] |promise| with an "{{UnknownError}}" {{DOMException}}.
1. Otherwise, [=resolve=] |promise| with |availability|.
</div>
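From a web developer's point of view, the method above is typically consulted before attempting creation. The following is a minimal sketch of that pattern. The helper name `createTranslatorIfPossible` is hypothetical, and `factory` stands in for `ai.translator` so that the sketch does not assume the global is present. It also assumes that `create()` can proceed, possibly after a download, for any availability other than "unavailable".

```javascript
// Hypothetical helper. `factory` stands in for `ai.translator`; it is passed
// in so the sketch runs even where the API is not implemented.
async function createTranslatorIfPossible(factory, sourceLanguage, targetLanguage) {
  // Rejects with an "UnknownError" DOMException if availability cannot be
  // determined; that rejection is left to propagate to the caller.
  const availability = await factory.availability({ sourceLanguage, targetLanguage });
  if (availability === "unavailable") {
    return null; // This language arc is not supported.
  }
  // Assumption: "available", "downloading", and "downloadable" all permit
  // creation, with the latter two implying a wait for a download.
  return factory.create({ sourceLanguage, targetLanguage });
}
```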
<div algorithm>
To <dfn>compute translator options availability</dfn> given an {{AITranslatorCreateCoreOptions}} |options|, perform the following steps. They return either an {{AIAvailability}} value or null, and they mutate |options| in place to update language tags to their best-fit matches.
1. [=Assert=]: this algorithm is running [=in parallel=].
1. Let |availabilities| be the user agent's [=translator language arc availabilities=].
1. If |availabilities| is null, then return null.
1. [=map/For each=] |languageArc| → |availability| in |availabilities|:
1. Let |sourceLanguageBestFit| be [$LookupMatchingLocaleByBestFit$](« |languageArc|'s [=language arc/source language=] », « |options|["{{AITranslatorCreateCoreOptions/sourceLanguage}}"] »).
1. Let |targetLanguageBestFit| be [$LookupMatchingLocaleByBestFit$](« |languageArc|'s [=language arc/target language=] », « |options|["{{AITranslatorCreateCoreOptions/targetLanguage}}"] »).
1. If |sourceLanguageBestFit| and |targetLanguageBestFit| are both not undefined, then:
1. Set |options|["{{AITranslatorCreateCoreOptions/sourceLanguage}}"] to |sourceLanguageBestFit|.\[[locale]].
1. Set |options|["{{AITranslatorCreateCoreOptions/targetLanguage}}"] to |targetLanguageBestFit|.\[[locale]].
1. Return |availability|.
1. If (|options|["{{AITranslatorCreateCoreOptions/sourceLanguage}}"], |options|["{{AITranslatorCreateCoreOptions/targetLanguage}}"]) [=language arc/can be fulfilled by the identity translation=], then return "{{AIAvailability/available}}".
<p class="note">Such cases could also return "{{AIAvailability/downloadable}}", "{{AIAvailability/downloading}}", or "{{AIAvailability/available}}" because of the above steps, if the user agent has specific entries in its [=translator language arc availabilities=] for the given language arc. However, the identity translation is always available, so this step ensures that we never return "{{AIAvailability/unavailable}}" for such cases.
<div class="example" id="example-identity-translation">
<p>One [=language arc=] that [=language arc/can be fulfilled by the identity translation=] is ("`en-US`", "`en-GB`"). It is conceivable that an implementation might support a specialized model for this translation, which would show up in the [=translator language arc availabilities=].
<p>On the other hand, it's pretty unlikely that an implementation has any specialized model for the [=language arc=] ("`en-x-asdf`", "`en-x-xyzw`"). In such a case, this step takes over, and later calls to the [=translate=] algorithm will use the identity translation.
<p>Note that when this step takes over, |options|["{{AITranslatorCreateCoreOptions/sourceLanguage}}"] and |options|["{{AITranslatorCreateCoreOptions/targetLanguage}}"] are not modified, so if this algorithm is being called from {{AITranslatorFactory/create()}}, that means the resulting {{AITranslator}} object's {{AITranslator/sourceLanguage}} and {{AITranslator/targetLanguage}} properties will return the original inputs, and not some canonicalized form.
</div>
1. Return "{{AIAvailability/unavailable}}".
</div>
A <dfn>language arc</dfn> is a [=tuple=] of two strings, a <dfn for="language arc">source language</dfn> and a <dfn for="language arc">target language</dfn>. Each item is a [=Unicode canonicalized locale identifier=].
<div algorithm>
The <dfn>translator language arc availabilities</dfn> are given by the following steps. They return a [=map=] from [=language arcs=] to {{AIAvailability}} values, or null.
1. [=Assert=]: this algorithm is running [=in parallel=].
1. If there is some error attempting to determine what language arcs the user agent supports translating text between, which the user agent believes to be transient (such that re-querying the [=translator language arc availabilities=] could stop producing such an error), then return null.
1. Return a [=map=] from [=language arcs=] to {{AIAvailability}} values, where each key is a [=language arc=] that the user agent supports translating text between, filled according to the following constraints:
* If the user agent supports translating text from the [=language arc/source language=] to the [=language arc/target language=] of the [=language arc=] without performing any downloading operations, then the map must contain an [=map/entry=] whose [=map/key=] is that [=language arc=] and whose [=map/value=] is "{{AIAvailability/available}}".
* If the user agent supports translating text from the [=language arc/source language=] to the [=language arc/target language=] of the [=language arc=], but only after finishing a currently-ongoing download, then the map must contain an [=map/entry=] whose [=map/key=] is that [=language arc=] and whose [=map/value=] is "{{AIAvailability/downloading}}".
* If the user agent supports translating text from the [=language arc/source language=] to the [=language arc/target language=] of the [=language arc=], but only after performing a download that is not currently ongoing, then the map must contain an [=map/entry=] whose [=map/key=] is that [=language arc=] and whose [=map/value=] is "{{AIAvailability/downloadable}}".
* The [=map/keys=] must not include any [=language arcs=] that [=language arc/overlap=] with the other [=map/keys=].
</div>
<div class="example" id="example-language-arc-support">
Let's suppose that the user agent's [=translator language arc availabilities=] are as follows:
* ("`en`", "`zh-Hans`") → "{{AIAvailability/available}}"
* ("`en`", "`zh-Hant`") → "{{AIAvailability/downloadable}}"
The use of [$LookupMatchingLocaleByBestFit$] means that {{AITranslatorFactory/availability()}} will probably give the following answers:
<xmp class="language-js">
function a(sourceLanguage, targetLanguage) {
return ai.translator.availability({ sourceLanguage, targetLanguage });
}
await a("en", "zh-Hans") === "available";
await a("en", "zh-Hant") === "downloadable";
await a("en", "zh") === "available"; // zh will best-fit to zh-Hans
await a("en", "zh-TW") === "downloadable"; // zh-TW will best-fit to zh-Hant
await a("en", "zh-HK") === "available"; // zh-HK will best-fit to zh-Hans
await a("en", "zh-CN") === "available"; // zh-CN will best-fit to zh-Hans
await a("en-US", "zh-Hant") === "downloadable"; // en-US will best-fit to en
await a("en-GB", "zh-Hant") === "downloadable"; // en-GB will best-fit to en
// Even very unexpected subtags will best-fit to en or zh-Hans
await a("en-Braille-x-lolcat", "zh-Hant") === "downloadable";
await a("en", "zh-BR-Kana") === "available";
</xmp>
</div>
<div algorithm>
A [=language arc=] |arc| <dfn for="language arc">overlaps</dfn> with a [=set=] of [=language arcs=] |otherArcs| if the following steps return true:
1. Let |sourceLanguages| be the [=set=] composed of the [=language arc/source languages=] of each [=set/item=] in |otherArcs|.
1. If [$LookupMatchingLocaleByBestFit$](|sourceLanguages|, « |arc|'s [=language arc/source language=] ») is not undefined, then return true.
1. Let |targetLanguages| be the [=set=] composed of the [=language arc/target languages=] of each [=set/item=] in |otherArcs|.
1. If [$LookupMatchingLocaleByBestFit$](|targetLanguages|, « |arc|'s [=language arc/target language=] ») is not undefined, then return true.
1. Return false.
</div>
<div class="example" id="example-language-arc-overlap">
The [=language arc=] ("`en`", "`fr`") [=language arc/overlaps=] with « ("`en`", "`fr-CA`") », so the user agent's [=translator language arc availabilities=] cannot contain both of these [=language arcs=] at the same time.
Instead, a typical user agent will either support only one English-to-French language arc (presumably ("`en`", "`fr`")), or it could support multiple non-overlapping English-to-French language arcs, such as ("`en`", "`fr-FR`"), ("`en`", "`fr-CA`"), and ("`en`", "`fr-CH`").
In the latter case, if the web developer requested to create a translator using <code highlight="js">ai.translator.create({ sourceLanguage: "en", targetLanguage: "fr" })</code>, the [$LookupMatchingLocaleByBestFit$] algorithm would choose one of the three possible language arcs to use (presumably ("`en`", "`fr-FR`")).
</div>
<div algorithm>
A [=language arc=] |arc| <dfn for="language arc">can be fulfilled by the identity translation</dfn> if the following steps return true:
1. If [$LookupMatchingLocaleByBestFit$](« |arc|'s [=language arc/source language=] », « |arc|'s [=language arc/target language=] ») is not undefined, then return true.
1. If [$LookupMatchingLocaleByBestFit$](« |arc|'s [=language arc/target language=] », « |arc|'s [=language arc/source language=] ») is not undefined, then return true.
1. Return false.
</div>
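The availability computation and the identity translation check can be sketched together as follows. This is an illustration under stated assumptions, not the normative algorithm. `LookupMatchingLocaleByBestFit` is implementation-defined, so the `bestFit` function here is a hypothetical stand-in that treats two tags as matching when their likely language and script agree according to `Intl.Locale.prototype.maximize()`. The record shape `{ source, target, availability }` and all function names are likewise hypothetical.

```javascript
// Stand-in for ECMA-402's LookupMatchingLocaleByBestFit, whose matching is
// implementation-defined. Assumption: a tag matches when its likely language
// and script (per Intl.Locale.prototype.maximize()) agree with the request.
function bestFit(availableLocales, requestedLocale) {
  const wanted = new Intl.Locale(requestedLocale).maximize();
  for (const candidate of availableLocales) {
    const have = new Intl.Locale(candidate).maximize();
    if (have.language === wanted.language && have.script === wanted.script) {
      return { locale: candidate };
    }
  }
  return undefined;
}

// True if either language best-fits to the other.
function canBeFulfilledByIdentity(source, target) {
  return bestFit([source], target) !== undefined ||
         bestFit([target], source) !== undefined;
}

// `arcs` is a list of { source, target, availability } records, or null when
// the supported language arcs could not be determined.
function computeTranslatorAvailability(arcs, options) {
  if (arcs === null) return null;
  for (const arc of arcs) {
    const source = bestFit([arc.source], options.sourceLanguage);
    const target = bestFit([arc.target], options.targetLanguage);
    if (source !== undefined && target !== undefined) {
      // Update the options in place to the best-fit tags.
      options.sourceLanguage = source.locale;
      options.targetLanguage = target.locale;
      return arc.availability;
    }
  }
  // No arc matched: fall back to the identity translation, leaving the
  // options unmodified.
  if (canBeFulfilledByIdentity(options.sourceLanguage, options.targetLanguage)) {
    return "available";
  }
  return "unavailable";
}
```

With the two English-to-Chinese arcs from the earlier example, a request for ("`en-GB`", "`zh-TW`") resolves to the ("`en`", "`zh-Hant`") arc under this stand-in, while ("`en-US`", "`en-GB`") matches no arc and falls through to the identity translation.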
<h3 id="the-aitranslator-class">The {{AITranslator}} class</h3>
Every {{AITranslator}} has a <dfn for="AITranslator">source language</dfn>, a [=string=], set during creation.
Every {{AITranslator}} has a <dfn for="AITranslator">target language</dfn>, a [=string=], set during creation.
<hr>
The <dfn attribute for="AITranslator">sourceLanguage</dfn> getter steps are to return [=this=]'s [=AITranslator/source language=].
The <dfn attribute for="AITranslator">targetLanguage</dfn> getter steps are to return [=this=]'s [=AITranslator/target language=].
<hr>
<div algorithm>
The <dfn method for="AITranslator">translate(|input|, |options|)</dfn> method steps are:
1. Let |operation| be an algorithm step which takes arguments |chunkProduced|, |done|, |error|, and |stopProducing|, and [=translates=] |input| given [=this=]'s [=AITranslator/source language=], [=this=]'s [=AITranslator/target language=], |chunkProduced|, |done|, |error|, and |stopProducing|.
1. Return the result of [=getting an aggregated AI model result=] given [=this=], |options|, and |operation|.
</div>
<div algorithm>
The <dfn method for="AITranslator">translateStreaming(|input|, |options|)</dfn> method steps are:
1. Let |operation| be an algorithm step which takes arguments |chunkProduced|, |done|, |error|, and |stopProducing|, and [=translates=] |input| given [=this=]'s [=AITranslator/source language=], [=this=]'s [=AITranslator/target language=], |chunkProduced|, |done|, |error|, and |stopProducing|.
1. Return the result of [=getting a streaming AI model result=] given [=this=], |options|, and |operation|.
</div>
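A web developer consuming the stream returned by `translateStreaming()` might drain it as in the following minimal sketch. The helper name `readTranslation` is hypothetical. The sketch assumes that each chunk is a string holding the next piece of translated text, so that concatenating the chunks yields the full translation.

```javascript
// Hypothetical helper: drains a ReadableStream of string chunks, reporting
// each chunk as it arrives and returning the concatenated text.
async function readTranslation(stream, onChunk = () => {}) {
  const reader = stream.getReader();
  let text = "";
  try {
    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      onChunk(value);
      text += value; // Assumption: chunks are successive pieces of the output.
    }
  } finally {
    reader.releaseLock();
  }
  return text;
}
```

In a page, this could be invoked as `readTranslation(translator.translateStreaming(input), chunk => { /* update UI */ })`, where `translator` is an existing {{AITranslator}}.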
<h3 id="translator-translation">Translation</h3>
<h4 id="translator-algorithm">The algorithm</h4>
<div algorithm>
To <dfn>translate</dfn> given:
* a [=string=] |input|,
* a [=Unicode canonicalized locale identifier=] |sourceLanguage|,
* a [=Unicode canonicalized locale identifier=] |targetLanguage|,
* an algorithm |chunkProduced| that takes a string and returns nothing,
* an algorithm |done| that takes no arguments and returns nothing,
* an algorithm |error| that takes [=error information=] and returns nothing, and
* an algorithm |stopProducing| that takes no arguments and returns a boolean,
perform the following steps:
1. [=Assert=]: this algorithm is running [=in parallel=].
1. In an [=implementation-defined=] manner, subject to the following guidelines, begin the process of translating |input| from |sourceLanguage| into |targetLanguage|.
If |input| is the empty string, or otherwise consists of no translatable content (e.g., only contains whitespace, or control characters), then the resulting translation should be |input|. In such cases, |sourceLanguage| and |targetLanguage| should be ignored.
If (|sourceLanguage|, |targetLanguage|) [=language arc/can be fulfilled by the identity translation=], then the resulting translation should be |input|.
1. While true:
1. Wait for the next chunk of translated text to be produced, for the translation process to finish, or for the result of calling |stopProducing| to become true.
1. If such a chunk is successfully produced:
1. Let it be represented as a [=string=] |chunk|.
1. Perform |chunkProduced| given |chunk|.
1. Otherwise, if the translation process has finished:
1. Perform |done|.
1. [=iteration/Break=].
1. Otherwise, if |stopProducing| returns true, then [=iteration/break=].
1. Otherwise, if an error occurred during translation:
1. Let the error be represented as [=error information=] |errorInfo| according to the guidance in [[#translator-errors]].
1. Perform |error| given |errorInfo|.
1. [=iteration/Break=].
</div>
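The control flow of these steps can be sketched as below. This is an illustration only. The actual translation process is implementation-defined, so `produceChunks` is a hypothetical async iterable factory standing in for it (and is assumed to already know the source and target languages). The `{ errorName, details }` object is a hypothetical shape for error information, and "no translatable content" is approximated as empty or whitespace-only input.

```javascript
// `produceChunks(input)` is a hypothetical stand-in for the
// implementation-defined translation process; it yields translated chunks.
async function translateSketch(input, produceChunks, callbacks) {
  const { chunkProduced, done, error, stopProducing } = callbacks;
  try {
    // Input with no translatable content is passed through unchanged
    // (approximated here as empty or whitespace-only input).
    const chunks = input.trim() === "" ? [input] : produceChunks(input);
    for await (const chunk of chunks) {
      if (stopProducing()) return; // Stop without calling done or error.
      chunkProduced(chunk);
    }
  } catch (cause) {
    // Hypothetical shape for "error information".
    error({ errorName: "UnknownError", details: cause });
    return;
  }
  done();
}
```

Exactly one of three outcomes ends the loop: `done` is called after the last chunk, `error` is called once with the failure, or the loop exits silently because `stopProducing` returned true.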
<h4 id="translator-errors">Errors</h4>
When translation fails, the following possible reasons may be surfaced to the web developer. This table lists the possible {{DOMException}} [=DOMException/names=] and the cases in which an implementation should use them:
<table class="data">
<thead>
<tr>
<th>{{DOMException}} [=DOMException/name=]
<th>Scenarios
<tbody>
<tr>
<td>"{{NotAllowedError}}"
<td>
<p>Translation is disabled by user choice or user agent policy.
<tr>
<td>"{{NotReadableError}}"
<td>
<p>The translation output was filtered by the user agent, e.g., because it was detected to be harmful, inaccurate, or nonsensical.
<tr>
<td>"{{QuotaExceededError}}"
<td>
<p>The input to be translated was too large for the user agent to handle.
<tr>
<td>"{{UnknownError}}"
<td>
<p>All other scenarios, or if the user agent would prefer not to disclose the failure reason.
</table>
<p class="note">This table does not give the complete list of exceptions that can be surfaced by {{AITranslator/translate()|translator.translate()}} and {{AITranslator/translateStreaming()|translator.translateStreaming()}}. It only contains those which can come from the [=implementation-defined=] [=translate=] algorithm.
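<p>A web developer could branch on the names in the table above when a translation promise rejects. The following is a minimal sketch of such handling; the helper name `describeTranslationFailure` and the message strings are hypothetical, and any name not listed in the table falls into the final branch.

```javascript
// Hypothetical helper mapping the DOMException names from the table above to
// developer-facing descriptions.
function describeTranslationFailure(exception) {
  switch (exception?.name) {
    case "NotAllowedError":
      return "Translation is disabled by user choice or user agent policy.";
    case "NotReadableError":
      return "The translation output was filtered by the user agent.";
    case "QuotaExceededError":
      return "The input was too large to translate.";
    default:
      // "UnknownError" and anything else not covered by the table.
      return "Translation failed for an unknown or undisclosed reason.";
  }
}
```

It would typically be used in a `catch` block around `await translator.translate(input)`.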
<h2 id="language-detector-api">The language detector API</h2>
<xmp class="idl">
partial interface AI {
readonly attribute AILanguageDetectorFactory languageDetector;
};
[Exposed=(Window,Worker), SecureContext]
interface AILanguageDetectorFactory {
Promise<AILanguageDetector> create(
optional AILanguageDetectorCreateOptions options = {}
);
Promise<AIAvailability> availability(
optional AILanguageDetectorCreateCoreOptions options = {}
);
};
[Exposed=(Window,Worker), SecureContext]
interface AILanguageDetector {
Promise<sequence<LanguageDetectionResult>> detect(
DOMString input,
optional AILanguageDetectorDetectOptions options = {}
);
readonly attribute FrozenArray<DOMString>? expectedInputLanguages;
undefined destroy();
};
dictionary AILanguageDetectorCreateCoreOptions {
sequence<DOMString> expectedInputLanguages;
};
dictionary AILanguageDetectorCreateOptions : AILanguageDetectorCreateCoreOptions {
AbortSignal signal;
AICreateMonitorCallback monitor;
};
dictionary AILanguageDetectorDetectOptions {
AbortSignal signal;
};
dictionary LanguageDetectionResult {
DOMString detectedLanguage;
double confidence;
};
</xmp>
Every {{AI}} has a <dfn for="AI">language detector factory</dfn>, an {{AILanguageDetectorFactory}} object. Upon creation of the {{AI}} object, its [=AI/language detector factory=] must be set to a [=new=] {{AILanguageDetectorFactory}} object created in the {{AI}} object's [=relevant realm=].
The <dfn attribute for="AI">languageDetector</dfn> getter steps are to return [=this=]'s [=AI/language detector factory=].
<h3 id="language-detector-creation">Creation</h3>
<div algorithm>
The <dfn method for="AILanguageDetectorFactory">create(|options|)</dfn> method steps are:
1. If [=this=]'s [=relevant global object=] is a {{Window}} whose [=associated Document=] is not [=Document/fully active=], then return [=a promise rejected with=] an "{{InvalidStateError}}" {{DOMException}}.
1. If |options|["{{AILanguageDetectorCreateOptions/signal}}"] [=map/exists=] and is [=AbortSignal/aborted=], then return [=a promise rejected with=] |options|["{{AILanguageDetectorCreateOptions/signal}}"]'s [=AbortSignal/abort reason=].
1. [=Validate and canonicalize language detector options=] given |options|.
<p class="note">This can mutate |options|.
1. Return the result of [=creating an AI model object=] given [=this=]'s [=relevant realm=], |options|, [=compute language detector options availability=], [=download the language detector model=], [=initialize the language detector model=], and [=create the language detector object=].
</div>
<div algorithm>
To <dfn>validate and canonicalize language detector options</dfn> given an {{AILanguageDetectorCreateCoreOptions}} |options|, perform the following steps. They mutate |options| in place to canonicalize language tags, and throw a {{TypeError}} if any are invalid.
1. [=Validate and canonicalize language tags=] given |options| and "{{AILanguageDetectorCreateCoreOptions/expectedInputLanguages}}".
</div>
<div algorithm>
To <dfn>download the language detector model</dfn>, given an {{AILanguageDetectorCreateCoreOptions}} |options|:
1. [=Assert=]: these steps are running [=in parallel=].
1. Initiate the download process for everything the user agent needs to detect the languages of input text, including all the languages in |options|["{{AILanguageDetectorCreateCoreOptions/expectedInputLanguages}}"].
This could include both a base language detection model, and specific fine-tunings or other material to help with the languages identified in |options|["{{AILanguageDetectorCreateCoreOptions/expectedInputLanguages}}"].
1. If the download process cannot be started for any reason, then return false.
1. Return true.
</div>
<div algorithm>
To <dfn>initialize the language detector model</dfn>, given an {{AILanguageDetectorCreateCoreOptions}} |options|:
1. [=Assert=]: these steps are running [=in parallel=].
1. Perform any necessary initialization operations for the AI model backing the user agent's capabilities for detecting the languages of input text.
This could include loading the model into memory, or loading any fine-tunings necessary to support the languages identified in |options|["{{AILanguageDetectorCreateCoreOptions/expectedInputLanguages}}"].
1. If initialization failed for any reason, then return false.
1. Return true.
</div>
<div algorithm>
To <dfn>create the language detector object</dfn>, given a [=ECMAScript/realm=] |realm| and an {{AILanguageDetectorCreateCoreOptions}} |options|:
1. [=Assert=]: these steps are running on |realm|'s [=ECMAScript/surrounding agent=]'s [=agent/event loop=].
1. Return a new {{AILanguageDetector}} object, created in |realm|, with
<dl class="props">
: [=AILanguageDetector/expected input languages=]
:: the result of [=creating a frozen array=] given |options|["{{AILanguageDetectorCreateCoreOptions/expectedInputLanguages}}"] if it [=set/is empty|is not empty=]; otherwise null
</dl>
</div>
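The rule for the expected input languages can be illustrated with a short sketch. The helper name `expectedInputLanguagesFor` is hypothetical, and treating a missing `expectedInputLanguages` member the same as an empty list is an assumption made for this illustration.

```javascript
// Hypothetical helper: returns a frozen copy of the expected input languages,
// or null when none were given.
function expectedInputLanguagesFor(options) {
  // Assumption: a missing member is treated like an empty list.
  const languages = options.expectedInputLanguages ?? [];
  return languages.length > 0 ? Object.freeze([...languages]) : null;
}
```

As a result, a detector created without expected input languages reports `null` from its `expectedInputLanguages` attribute, not an empty array.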
<h3 id="language-detector-availability">Availability</h3>
<!-- TODO: consider deduping this with writing assistance APIs + translator, as it's very similar. -->
<div algorithm>
The <dfn method for="AILanguageDetectorFactory">availability(|options|)</dfn> method steps are:
1. If [=this=]'s [=relevant global object=] is a {{Window}} whose [=associated Document=] is not [=Document/fully active=], then return [=a promise rejected with=] an "{{InvalidStateError}}" {{DOMException}}.
1. [=Validate and canonicalize language detector options=] given |options|.
1. Let |promise| be [=a new promise=] created in [=this=]'s [=relevant realm=].
1. [=In parallel=]:
1. Let |availability| be the result of [=computing language detector options availability=] given |options|.
1. [=Queue a global task=] on the [=AI task source=] given [=this=]'s [=relevant global object=] to perform the following steps:
1. If |availability| is null, then [=reject=] |promise| with an "{{UnknownError}}" {{DOMException}}.
1. Otherwise, [=resolve=] |promise| with |availability|.
</div>
<!-- TODO: consider deduping this with writing assistance APIs, as it's very similar. (Not similar to translator though!) -->
<div algorithm>
To <dfn>compute language detector options availability</dfn> given an {{AILanguageDetectorCreateCoreOptions}} |options|, perform the following steps. They return either an {{AIAvailability}} value or null, and they mutate |options| in place to update language tags to their best-fit matches.
1. [=Assert=]: this algorithm is running [=in parallel=].
1. If there is some error attempting to determine what languages the user agent supports detecting, which the user agent believes to be transient (such that re-querying could stop producing such an error), then return null.
1. Let |availabilities| be the result of [=getting language availabilities=] given the purpose of detecting text written in that language.
1. Let |availability| be "{{AIAvailability/available}}".
1. [=set/For each=] |language| in |options|["{{AILanguageDetectorCreateCoreOptions/expectedInputLanguages}}"]:
1. [=list/For each=] |availabilityToCheck| in « "{{AIAvailability/available}}", "{{AIAvailability/downloading}}", "{{AIAvailability/downloadable}}" »:
1. Let |languagesWithThisAvailability| be |availabilities|[|availabilityToCheck|].
1. Let |bestMatch| be [$LookupMatchingLocaleByBestFit$](|languagesWithThisAvailability|, « |language| »).
1. If |bestMatch| is not undefined, then:
1. [=list/Replace=] |language| with |bestMatch|.\[[locale]] in |options|["{{AILanguageDetectorCreateCoreOptions/expectedInputLanguages}}"].
1. Set |availability| to the [=AIAvailability/minimum availability=] given |availability| and |availabilityToCheck|.
1. [=iteration/Break=].
1. Return "{{AIAvailability/unavailable}}".
1. Return |availability|.
</div>
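The following sketch illustrates one reading of these steps, in which "unavailable" is returned only when a requested language matches none of the three availability tiers. Several parts are assumptions: the ordering used for minimum availability (defined outside this section), the shape of `availabilities`, and the `matchLocale` function, which is a simple prefix-fallback stand-in for the implementation-defined `LookupMatchingLocaleByBestFit`. All names are hypothetical.

```javascript
// Assumed ordering for "minimum availability", from least to most available.
const AVAILABILITY_ORDER = ["unavailable", "downloadable", "downloading", "available"];
function minimumAvailability(a, b) {
  return AVAILABILITY_ORDER.indexOf(a) <= AVAILABILITY_ORDER.indexOf(b) ? a : b;
}

// Stand-in matcher (assumption): exact match first, then progressively
// shorter prefixes of the requested tag, so "en-US" falls back to "en".
function matchLocale(availableLocales, requestedLocale) {
  let candidate = requestedLocale;
  while (candidate) {
    if (availableLocales.includes(candidate)) return { locale: candidate };
    const cut = candidate.lastIndexOf("-");
    candidate = cut === -1 ? "" : candidate.slice(0, cut);
  }
  return undefined;
}

// `availabilities` maps "available", "downloading", and "downloadable" to
// lists of language tags, or is null after a transient error.
function computeDetectorAvailability(availabilities, options) {
  if (availabilities === null) return null;
  let availability = "available";
  const languages = options.expectedInputLanguages;
  for (let i = 0; i < languages.length; i++) {
    let matched = false;
    for (const tier of ["available", "downloading", "downloadable"]) {
      const bestMatch = matchLocale(availabilities[tier], languages[i]);
      if (bestMatch !== undefined) {
        languages[i] = bestMatch.locale; // Replace with the best-fit tag.
        availability = minimumAvailability(availability, tier);
        matched = true;
        break;
      }
    }
    if (!matched) return "unavailable";
  }
  return availability;
}
```

The overall result is therefore the least available tier among all requested languages, and an empty list of expected input languages yields "available".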
<h3 id="the-ailanguagedetector-class">The {{AILanguageDetector}} class</h3>
Every {{AILanguageDetector}} has an <dfn for="AILanguageDetector">expected input languages</dfn>, a <code>{{FrozenArray}}&lt;{{DOMString}}></code> or null, set during creation.
<hr>
The <dfn attribute for="AILanguageDetector">expectedInputLanguages</dfn> getter steps are to return [=this=]'s [=AILanguageDetector/expected input languages=].
<hr>
<!-- TODO: consider deduping *SOME* of this with "get an aggregated AI model result", as it's similar. But this case is fundamentally less streaming, so the cut will be tricky. -->
<div algorithm>
The <dfn method for="AILanguageDetector">detect(|input|, |options|)</dfn> method steps are:
1. If [=this=]'s [=relevant global object=] is a {{Window}} whose [=associated Document=] is not [=Document/fully active=], then return [=a promise rejected with=] an "{{InvalidStateError}}" {{DOMException}}.
1. Let |signals| be « [=this=]'s [=AIDestroyable/destruction abort controller=]'s [=AbortController/signal=] ».
1. If |options|["`signal`"] [=map/exists=], then [=set/append=] it to |signals|.
1. Let |compositeSignal| be the result of [=creating a dependent abort signal=] given |signals| using {{AbortSignal}} and [=this=]'s [=relevant realm=].
1. If |compositeSignal| is [=AbortSignal/aborted=], then return [=a promise rejected with=] |compositeSignal|'s [=AbortSignal/abort reason=].
1. Let |abortedDuringOperation| be false.

    <p class="note">This variable will be written to from the [=event loop=], but read from [=in parallel=].

1. [=AbortSignal/add|Add the following abort steps=] to |compositeSignal|:
    1. Set |abortedDuringOperation| to true.
1. Let |promise| be [=a new promise=] created in [=this=]'s [=relevant realm=].
1. [=In parallel=]:
    1. Let |stopProducing| be the following steps:
        1. Return |abortedDuringOperation|.
    1. Let |result| be the result of [=detecting languages=] given |input| and |stopProducing|.
    1. [=Queue a global task=] on the [=AI task source=] given [=this=]'s [=relevant global object=] to perform the following steps:
        1. If |abortedDuringOperation| is true, then [=reject=] |promise| with |compositeSignal|'s [=AbortSignal/abort reason=].
        1. Otherwise, if |result| is an [=error information=], then [=reject=] |promise| with the result of [=exception/creating=] a {{DOMException}} with name given by |result|'s [=error information/error name=], using |result|'s [=error information/error information=] to populate the message appropriately.
        1. Otherwise:
            1. [=Assert=]: |result| is a [=list=] of {{LanguageDetectionResult}} dictionaries. (It is not null, since in that case |abortedDuringOperation| would have been true.)
            1. [=Resolve=] |promise| with |result|.
1. Return |promise|.
</div>
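<div class="example" id="example-language-detector-detect-abort-sketch">
<p>From a web developer's point of view, the abort handling in the steps above might be used roughly as follows. This is a non-normative sketch: `detectFirstResult()` is a hypothetical helper, and `detector` is assumed to be an {{AILanguageDetector}} obtained elsewhere (how it is created is outside this section). The helper treats a rejection caused by the caller's own signal as a cancellation and lets every other rejection propagate.

```javascript
// `detector` is assumed to expose detect(input, { signal }) returning a promise
// for a list of { detectedLanguage, confidence } entries, as described above.
async function detectFirstResult(detector, input, signal) {
  try {
    const results = await detector.detect(input, { signal });
    // The first entry is the first language that survived post-processing,
    // or the "und" entry if no language did.
    return results[0] ?? null;
  } catch (error) {
    // If our own signal was aborted, the rejection is the abort reason:
    // report "no result" instead of treating it as a failure.
    if (signal?.aborted) return null;
    throw error;
  }
}
```

<p>A caller would typically create an {{AbortController}}, pass its `signal` to the helper, and call `abort()` when the result is no longer needed, for example when the user edits the text being analyzed.
</div>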
<h4 id="language-detector-algorithm">The algorithm</h4>
<div algorithm>
To <dfn>detect languages</dfn> given a [=string=] |input| and an algorithm |stopProducing| that takes no arguments and returns a boolean, perform the following steps. They will return either null, an [=error information=], or a [=list=] of {{LanguageDetectionResult}} dictionaries.
1. [=Assert=]: this algorithm is running [=in parallel=].
1. Let |availabilities| be the result of [=getting language availabilities=] given the purpose of detecting text written in that language.
1. Let |currentlyAvailableLanguages| be |availabilities|["{{AIAvailability/available}}"].
1. In an [=implementation-defined=] manner, subject to the following guidelines, let |rawResult| and |unknown| be the result of detecting the languages of |input|.

    |rawResult| must be a [=map=] which has a [=map/key=] for each language in |currentlyAvailableLanguages|. The [=map/value=] for each such key must be a number between 0 and 1. This value must represent the implementation's confidence that |input| is written in that language.

    |unknown| must be a number between 0 and 1 that represents the implementation's confidence that |input| is not written in any of the languages in |currentlyAvailableLanguages|.

    The [=map/values=] of |rawResult|, plus |unknown|, must sum to 1. Each such value, or |unknown|, may be 0.

    If the implementation believes |input| to be written in multiple languages, then it should attempt to apportion the values of |rawResult| and |unknown| such that they are proportionate to the amount of |input| written in each detected language. The exact scheme for apportioning |input| is [=implementation-defined=].

    <div class="example" id="example-multilingual-input">
        <p>If |input| is "`tacosを食べる`", the implementation might split this into "`tacos`" and "`を食べる`", and then detect the languages of each separately. The first part might be detected as English with confidence 0.5 and Spanish with confidence 0.5, and the second part as Japanese with confidence 1. The resulting |rawResult| then might be «[ "`en`" → 0.25, "`es`" → 0.25, "`ja`" → 0.5 ]» (with |unknown| set to 0).

        <p>The decision to split this into two parts, instead of e.g. the three parts "`tacos`", "`を`", and "`食べる`", was an [=implementation-defined=] choice. Similarly, the decision to treat each part as contributing to "half" of the result, instead of e.g. weighting by number of [=code points=], was [=implementation-defined=].

        <p>(Realistically, we expect that implementations will split on larger chunks than this, as generally more than 4-5 [=code points=] are necessary for most language detection models.)
    </div>

    If |stopProducing| returns true at any point during this process, then return null.

    If an error occurred during language detection, then return an [=error information=] according to the guidance in [[#language-detector-errors]].
1. [=map/Sort in descending order=] |rawResult| with a less-than algorithm which, given [=map/entries=] |a| and |b|, returns true if |a|'s [=map/value=] is less than |b|'s [=map/value=].
1. Let |results| be an empty [=list=].
1. Let |cumulativeConfidence| be 0.
1. [=map/For each=] |key| → |value| of |rawResult|:
    1. If |value| is 0, then [=iteration/break=].
    1. If |value| is less than |unknown|, then [=iteration/break=].
    1. [=list/Append=] «[ "{{LanguageDetectionResult/detectedLanguage}}" → |key|, "{{LanguageDetectionResult/confidence}}" → |value| ]» to |results|.
    1. Set |cumulativeConfidence| to |cumulativeConfidence| + |value|.
    1. If |cumulativeConfidence| is greater than or equal to 0.99, then [=iteration/break=].
1. [=Assert=]: 1 − |cumulativeConfidence| is greater than or equal to |unknown|.
1. [=list/Append=] «[ "{{LanguageDetectionResult/detectedLanguage}}" → "`und`", "{{LanguageDetectionResult/confidence}}" → 1 − |cumulativeConfidence| ]» to |results|.
1. Return |results|.
<p class="note" id="note-language-detection-post-processing">The post-processing of |rawResult| and |unknown| essentially consolidates all languages below a certain threshold into the "`und`" language. Languages which are less than 1% likely, or contribute to less than 1% of the text, are considered more likely to be noise than to be worth detecting. Similarly, if the implementation is less sure about a language than it is about the text not being in any of the languages it knows, that language is probably not worth returning to the web developer.
</div>
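<div class="example" id="example-language-detection-post-processing-sketch">
<p>The post-processing steps of the algorithm above can be sketched in JavaScript as follows. This is a non-normative illustration: `apportionEqually()` is a hypothetical stand-in for one possible implementation-defined apportioning scheme (each part of the input contributes equally), and `postProcess()` mirrors the sorting, thresholding, and "`und`" consolidation steps. Both function names are assumptions made for this example.

```javascript
// Hypothetical apportioning scheme: each part of the input contributes an
// equal share. `parts` is a list of objects mapping language tag to the
// confidence detected for that part.
function apportionEqually(parts) {
  const rawResult = new Map();
  for (const part of parts) {
    for (const [language, confidence] of Object.entries(part)) {
      const share = confidence / parts.length;
      rawResult.set(language, (rawResult.get(language) ?? 0) + share);
    }
  }
  return rawResult;
}

// Mirrors the post-processing steps: sort descending, stop at the first
// language that is 0 or less likely than `unknown`, stop once 99% of the
// confidence is accounted for, then fold the remainder into "und".
function postProcess(rawResult, unknown) {
  const sorted = [...rawResult].sort(([, a], [, b]) => b - a);
  const results = [];
  let cumulativeConfidence = 0;
  for (const [language, confidence] of sorted) {
    if (confidence === 0) break;
    if (confidence < unknown) break;
    results.push({ detectedLanguage: language, confidence });
    cumulativeConfidence += confidence;
    if (cumulativeConfidence >= 0.99) break;
  }
  results.push({ detectedLanguage: "und", confidence: 1 - cumulativeConfidence });
  return results;
}
```

<p>For the "`tacosを食べる`" example, `postProcess(apportionEqually([{ en: 0.5, es: 0.5 }, { ja: 1 }]), 0)` yields "`ja`" with confidence 0.5, then "`en`" and "`es`" with confidence 0.25 each, followed by "`und`" with confidence 0. With a raw result of "`en`" at 0.625, "`fr`" at 0.125, and an unknown confidence of 0.25, only "`en`" is kept and "`und`" receives the remaining 0.375.
</div>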
<h4 id="language-detector-errors">Errors</h4>
When language detection fails, one of the following reasons may be surfaced to the web developer. This table lists the possible {{DOMException}} [=DOMException/names=] and the cases in which an implementation should use them:
<table class="data">
<thead>
<tr>
<th>{{DOMException}} [=DOMException/name=]
<th>Scenarios
<tbody>
<tr>
<td>"{{NotAllowedError}}"
<td>
<p>Language detection is disabled by user choice or user agent policy.
<tr>
<td>"{{QuotaExceededError}}"
<td>
<p>The input to be detected was too large for the user agent to handle.
<tr>
<td>"{{UnknownError}}"
<td>
<p>All other scenarios, or if the user agent would prefer not to disclose the failure reason.
</table>
<p class="note">This table does not give the complete list of exceptions that can be surfaced by {{AILanguageDetector/detect()|detector.detect()}}. It only contains those which can come from the [=implementation-defined=] [=detect languages=] algorithm.
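<div class="example" id="example-language-detector-error-handling-sketch">
<p>A web developer might branch on the exception names in the table above roughly as follows. This is a non-normative sketch: `classifyDetectionFailure()` and the category strings it returns are hypothetical, and any name not listed in the table (including exceptions that come from elsewhere in {{AILanguageDetector/detect()}}) falls through to the generic case.

```javascript
// Maps a rejection from detect() to a coarse, application-defined category.
// The category strings are assumptions made for this example.
function classifyDetectionFailure(error) {
  switch (error?.name) {
    case "NotAllowedError":
      // Disabled by user choice or user agent policy: retrying will not help.
      return "disabled";
    case "QuotaExceededError":
      // The input was too large: the caller could retry with a shorter input.
      return "input-too-large";
    case "UnknownError":
    default:
      // Either an undisclosed reason or an exception outside the table.
      return "unknown";
  }
}
```

<p>An application could use the "`input-too-large`" category to retry with a shorter excerpt of the text, while treating "`disabled`" as a signal to hide language-dependent features.
</div>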