textseg: Update tables for Unicode 11.0.0
As of Unicode 11, the grapheme cluster definition depends on a property
(Extended_Pictographic) defined in the emoji data, so we now need a table
generated from that file in addition to the existing table for the text
segmentation specification.

This also includes an updated table of grapheme cluster segmentation tests
derived from the Unicode character database version 11.0.0.
apparentlymart committed Mar 6, 2020
1 parent d674eb8 commit e7dab68
Showing 12 changed files with 6,347 additions and 9,238 deletions.
4 changes: 2 additions & 2 deletions go.mod
@@ -1,3 +1,3 @@
module github.com/apparentlymart/go-textseg/v10
module github.com/apparentlymart/go-textseg/v11

go 1.12
go 1.13
2 changes: 0 additions & 2 deletions go.sum
@@ -1,2 +0,0 @@
github.com/apparentlymart/go-textseg v1.0.0 h1:rRmlIsPEEhUTIKQb7T++Nz/A5Q6C9IuX2wFoYVvnCs0=
github.com/apparentlymart/go-textseg v1.0.0/go.mod h1:z96Txxhf3xSFMPmb5X/1W05FF/Nj9VFpLOpjS5yuumk=
269 changes: 269 additions & 0 deletions textseg/emoji_table.rl

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion textseg/generate.go
@@ -2,6 +2,7 @@ package textseg

//go:generate go run make_tables.go -output tables.go
//go:generate go run make_test_tables.go -output tables_test.go
//go:generate ruby unicode2ragel.rb --url=http://www.unicode.org/Public/10.0.0/ucd/auxiliary/GraphemeBreakProperty.txt -m GraphemeCluster -p "Prepend,CR,LF,Control,Extend,Regional_Indicator,SpacingMark,L,V,T,LV,LVT,E_Base,E_Modifier,ZWJ,Glue_After_Zwj,E_Base_GAZ" -o grapheme_clusters_table.rl
//go:generate ruby unicode2ragel.rb --url=https://www.unicode.org/Public/11.0.0/ucd/auxiliary/GraphemeBreakProperty.txt -m GraphemeCluster -p "Prepend,CR,LF,Control,Extend,Regional_Indicator,SpacingMark,L,V,T,LV,LVT,ZWJ" -o grapheme_clusters_table.rl
//go:generate ruby unicode2ragel.rb --url=https://www.unicode.org/Public/emoji/11.0/emoji-data.txt -m Emoji -p "Extended_Pictographic" -o emoji_table.rl
//go:generate ragel -Z grapheme_clusters.rl
//go:generate gofmt -w grapheme_clusters.go
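The generate directives above feed two Unicode data files (GraphemeBreakProperty.txt and emoji-data.txt) to unicode2ragel.rb and keep only the properties named with -p. As a rough illustration of what such a converter reads, here is a minimal Go sketch that parses lines in the "range ; property # comment" layout those files use. It is not the Ruby script's actual logic; the propRange type and parseLine function are hypothetical names introduced for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// propRange is a hypothetical record for one data line: an inclusive
// code point range and the property value assigned to it.
type propRange struct {
	Lo, Hi   rune
	Property string
}

// parseLine parses one "range ; property # comment" line.
// ok is false for blank lines and comment-only lines.
func parseLine(line string) (r propRange, ok bool, err error) {
	// Everything after '#' is a comment.
	if i := strings.IndexByte(line, '#'); i >= 0 {
		line = line[:i]
	}
	line = strings.TrimSpace(line)
	if line == "" {
		return propRange{}, false, nil
	}
	parts := strings.SplitN(line, ";", 2)
	if len(parts) != 2 {
		return propRange{}, false, fmt.Errorf("missing ';' in %q", line)
	}
	rng := strings.TrimSpace(parts[0])
	prop := strings.TrimSpace(parts[1])

	// A range is either "XXXX" or "XXXX..YYYY" in hexadecimal.
	loStr, hiStr := rng, rng
	if i := strings.Index(rng, ".."); i >= 0 {
		loStr, hiStr = rng[:i], rng[i+2:]
	}
	lo, err := strconv.ParseUint(loStr, 16, 32)
	if err != nil {
		return propRange{}, false, err
	}
	hi, err := strconv.ParseUint(hiStr, 16, 32)
	if err != nil {
		return propRange{}, false, err
	}
	return propRange{Lo: rune(lo), Hi: rune(hi), Property: prop}, true, nil
}

func main() {
	// Sample lines written in the style of the Unicode data files.
	lines := []string{
		"# a comment-only line is skipped",
		"200D          ; ZWJ # Cf       ZERO WIDTH JOINER",
		"1F3FB..1F3FF  ; Extend # Sk   [5] EMOJI MODIFIER FITZPATRICK TYPE-1-2..",
	}
	for _, l := range lines {
		r, ok, err := parseLine(l)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		if ok {
			fmt.Printf("%U..%U %s\n", r.Lo, r.Hi, r.Property)
		}
	}
}
```

A real generator would additionally filter by the requested property names and encode each range as UTF-8 byte patterns, which is the form the .rl tables in this commit take.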
7,311 changes: 2,902 additions & 4,409 deletions textseg/grapheme_clusters.go

Large diffs are not rendered by default.

11 changes: 6 additions & 5 deletions textseg/grapheme_clusters.rl
@@ -42,6 +42,7 @@ func ScanGraphemeClusters(data []byte, atEOF bool) (int, []byte, error) {

%%{
include GraphemeCluster "grapheme_clusters_table.rl";
include Emoji "emoji_table.rl";

action start {
startPos = p
@@ -55,7 +56,7 @@ func ScanGraphemeClusters(data []byte, atEOF bool) (int, []byte, error) {
return endPos+1, data[startPos:endPos+1], nil
}

ZWJGlue = ZWJ (Glue_After_Zwj | E_Base_GAZ Extend* E_Modifier?)?;
ZWJGlue = ZWJ (Extended_Pictographic Extend*)?;
AnyExtender = Extend | ZWJGlue | SpacingMark;
Extension = AnyExtender*;
ReplacementChar = (0xEF 0xBF 0xBD);
@@ -69,8 +70,8 @@ func ScanGraphemeClusters(data []byte, atEOF bool) (int, []byte, error) {
LVT T* |
T+
) Extension;
EmojiSeq = (E_Base | E_Base_GAZ) Extend* E_Modifier? Extension;
ZWJSeq = ZWJGlue Extension;
EmojiSeq = Extended_Pictographic Extend* Extension;
ZWJSeq = ZWJ (ZWJ | Extend | SpacingMark)*;
EmojiFlagSeq = Regional_Indicator Regional_Indicator? Extension;

UTF8Cont = 0x80 .. 0xBF;
@@ -82,7 +83,7 @@ func ScanGraphemeClusters(data []byte, atEOF bool) (int, []byte, error) {
);

# OtherSeq is any character that isn't at the start of one of the extended sequences above, followed by extension
OtherSeq = (AnyUTF8 - (CR|LF|Control|ReplacementChar|L|LV|V|LVT|T|E_Base|E_Base_GAZ|ZWJ|Regional_Indicator|Prepend)) Extension;
OtherSeq = (AnyUTF8 - (CR|LF|Control|ReplacementChar|L|LV|V|LVT|T|Extended_Pictographic|ZWJ|Regional_Indicator|Prepend)) (Extend | ZWJ | SpacingMark)*;

# PrependSeq is prepend followed by any of the other patterns above, except control characters which explicitly break
PrependSeq = Prepend+ (HangulSeq|EmojiSeq|ZWJSeq|EmojiFlagSeq|OtherSeq)?;
@@ -129,4 +130,4 @@ func ScanGraphemeClusters(data []byte, atEOF bool) (int, []byte, error) {
// Just take the first UTF-8 sequence and return that.
_, seqLen := utf8.DecodeRune(data)
return seqLen, data[:seqLen], nil
}
}
101 changes: 28 additions & 73 deletions textseg/grapheme_clusters_table.rl
@@ -1,7 +1,7 @@
# The following Ragel file was autogenerated with unicode2ragel.rb
# from: http://www.unicode.org/Public/10.0.0/ucd/auxiliary/GraphemeBreakProperty.txt
# from: https://www.unicode.org/Public/11.0.0/ucd/auxiliary/GraphemeBreakProperty.txt
#
# It defines ["Prepend", "CR", "LF", "Control", "Extend", "Regional_Indicator", "SpacingMark", "L", "V", "T", "LV", "LVT", "E_Base", "E_Modifier", "ZWJ", "Glue_After_Zwj", "E_Base_GAZ"].
# It defines ["Prepend", "CR", "LF", "Control", "Extend", "Regional_Indicator", "SpacingMark", "L", "V", "T", "LV", "LVT", "ZWJ"].
#
# To use this, make sure that your alphtype is set to byte,
# and that your input is in utf8.
@@ -16,6 +16,7 @@
| 0xE0 0xA3 0xA2 #Cf ARABIC DISPUTED END OF AYAH
| 0xE0 0xB5 0x8E #Lo MALAYALAM LETTER DOT REPH
| 0xF0 0x91 0x82 0xBD #Cf KAITHI NUMBER SIGN
| 0xF0 0x91 0x83 0x8D #Cf KAITHI NUMBER SIGN ABOVE
| 0xF0 0x91 0x87 0x82..0x83 #Lo [2] SHARADA SIGN JIHVAMULIYA..SHARA...
| 0xF0 0x91 0xA8 0xBA #Lo ZANABAZAR SQUARE CLUSTER-INITIAL L...
| 0xF0 0x91 0xAA 0x86..0x89 #Lo [4] SOYOMBO CLUSTER-INITIAL LETTER ...
@@ -87,12 +88,13 @@
| 0xDD 0x00..0x8A #
| 0xDE 0xA6..0xB0 #Mn [11] THAANA ABAFILI..THAANA SUKUN
| 0xDF 0xAB..0xB3 #Mn [9] NKO COMBINING SHORT HIGH TONE..NKO...
| 0xDF 0xBD #Mn NKO DANTAYALAN
| 0xE0 0xA0 0x96..0x99 #Mn [4] SAMARITAN MARK IN..SAMARITAN MARK ...
| 0xE0 0xA0 0x9B..0xA3 #Mn [9] SAMARITAN MARK EPENTHETIC YUT..SAM...
| 0xE0 0xA0 0xA5..0xA7 #Mn [3] SAMARITAN VOWEL SIGN SHORT A..SAMA...
| 0xE0 0xA0 0xA9..0xAD #Mn [5] SAMARITAN VOWEL SIGN LONG I..SAMAR...
| 0xE0 0xA1 0x99..0x9B #Mn [3] MANDAIC AFFRICATION MARK..MANDAIC ...
| 0xE0 0xA3 0x94..0xA1 #Mn [14] ARABIC SMALL HIGH WORD AR-RUB..ARA...
| 0xE0 0xA3 0x93..0xA1 #Mn [15] ARABIC SMALL LOW WAW..ARABIC SMALL...
| 0xE0 0xA3 0xA3..0xFF #Mn [32] ARABIC TURNED DAMMA BELOW..DEVANAG...
| 0xE0 0xA4 0x00..0x82 #
| 0xE0 0xA4 0xBA #Mn DEVANAGARI VOWEL SIGN OE
@@ -108,6 +110,7 @@
| 0xE0 0xA7 0x8D #Mn BENGALI SIGN VIRAMA
| 0xE0 0xA7 0x97 #Mc BENGALI AU LENGTH MARK
| 0xE0 0xA7 0xA2..0xA3 #Mn [2] BENGALI VOWEL SIGN VOCALIC L..BENG...
| 0xE0 0xA7 0xBE #Mn BENGALI SANDHI MARK
| 0xE0 0xA8 0x81..0x82 #Mn [2] GURMUKHI SIGN ADAK BINDI..GURMUKHI...
| 0xE0 0xA8 0xBC #Mn GURMUKHI SIGN NUKTA
| 0xE0 0xA9 0x81..0x82 #Mn [2] GURMUKHI VOWEL SIGN U..GURMUKHI VO...
@@ -138,6 +141,7 @@
| 0xE0 0xAF 0x8D #Mn TAMIL SIGN VIRAMA
| 0xE0 0xAF 0x97 #Mc TAMIL AU LENGTH MARK
| 0xE0 0xB0 0x80 #Mn TELUGU SIGN COMBINING CANDRABINDU ...
| 0xE0 0xB0 0x84 #Mn TELUGU SIGN COMBINING ANUSVARA ABOVE
| 0xE0 0xB0 0xBE..0xFF #Mn [3] TELUGU VOWEL SIGN AA..TELUGU VOWEL...
| 0xE0 0xB1 0x00..0x80 #
| 0xE0 0xB1 0x86..0x88 #Mn [3] TELUGU VOWEL SIGN E..TELUGU VOWEL ...
@@ -267,6 +271,7 @@
| 0xEA 0xA0 0xA5..0xA6 #Mn [2] SYLOTI NAGRI VOWEL SIGN U..SYLOTI ...
| 0xEA 0xA3 0x84..0x85 #Mn [2] SAURASHTRA SIGN VIRAMA..SAURASHTRA...
| 0xEA 0xA3 0xA0..0xB1 #Mn [18] COMBINING DEVANAGARI DIGIT ZERO..C...
| 0xEA 0xA3 0xBF #Mn DEVANAGARI VOWEL SIGN AY
| 0xEA 0xA4 0xA6..0xAD #Mn [8] KAYAH LI VOWEL UE..KAYAH LI TONE C...
| 0xEA 0xA5 0x87..0x91 #Mn [11] REJANG VOWEL SIGN I..REJANG CONSON...
| 0xEA 0xA6 0x80..0x82 #Mn [3] JAVANESE SIGN PANYANGGA..JAVANESE ...
@@ -303,6 +308,8 @@
| 0xF0 0x90 0xA8 0xB8..0xBA #Mn [3] KHAROSHTHI SIGN BAR ABOVE..KHAR...
| 0xF0 0x90 0xA8 0xBF #Mn KHAROSHTHI VIRAMA
| 0xF0 0x90 0xAB 0xA5..0xA6 #Mn [2] MANICHAEAN ABBREVIATION MARK AB...
| 0xF0 0x90 0xB4 0xA4..0xA7 #Mn [4] HANIFI ROHINGYA SIGN HARBAHAY.....
| 0xF0 0x90 0xBD 0x86..0x90 #Mn [11] SOGDIAN COMBINING DOT BELOW..SO...
| 0xF0 0x91 0x80 0x81 #Mn BRAHMI SIGN ANUSVARA
| 0xF0 0x91 0x80 0xB8..0xFF #Mn [15] BRAHMI VOWEL SIGN AA..BRAHMI VI...
| 0xF0 0x91 0x81 0x00..0x86 #
@@ -316,15 +323,15 @@
| 0xF0 0x91 0x85 0xB3 #Mn MAHAJANI SIGN NUKTA
| 0xF0 0x91 0x86 0x80..0x81 #Mn [2] SHARADA SIGN CANDRABINDU..SHARA...
| 0xF0 0x91 0x86 0xB6..0xBE #Mn [9] SHARADA VOWEL SIGN U..SHARADA V...
| 0xF0 0x91 0x87 0x8A..0x8C #Mn [3] SHARADA SIGN NUKTA..SHARADA EXT...
| 0xF0 0x91 0x87 0x89..0x8C #Mn [4] SHARADA SANDHI MARK..SHARADA EX...
| 0xF0 0x91 0x88 0xAF..0xB1 #Mn [3] KHOJKI VOWEL SIGN U..KHOJKI VOW...
| 0xF0 0x91 0x88 0xB4 #Mn KHOJKI SIGN ANUSVARA
| 0xF0 0x91 0x88 0xB6..0xB7 #Mn [2] KHOJKI SIGN NUKTA..KHOJKI SIGN ...
| 0xF0 0x91 0x88 0xBE #Mn KHOJKI SIGN SUKUN
| 0xF0 0x91 0x8B 0x9F #Mn KHUDAWADI SIGN ANUSVARA
| 0xF0 0x91 0x8B 0xA3..0xAA #Mn [8] KHUDAWADI VOWEL SIGN U..KHUDAWA...
| 0xF0 0x91 0x8C 0x80..0x81 #Mn [2] GRANTHA SIGN COMBINING ANUSVARA...
| 0xF0 0x91 0x8C 0xBC #Mn GRANTHA SIGN NUKTA
| 0xF0 0x91 0x8C 0xBB..0xBC #Mn [2] COMBINING BINDU BELOW..GRANTHA ...
| 0xF0 0x91 0x8C 0xBE #Mc GRANTHA VOWEL SIGN AA
| 0xF0 0x91 0x8D 0x80 #Mn GRANTHA VOWEL SIGN II
| 0xF0 0x91 0x8D 0x97 #Mc GRANTHA AU LENGTH MARK
@@ -333,6 +340,7 @@
| 0xF0 0x91 0x90 0xB8..0xBF #Mn [8] NEWA VOWEL SIGN U..NEWA VOWEL S...
| 0xF0 0x91 0x91 0x82..0x84 #Mn [3] NEWA SIGN VIRAMA..NEWA SIGN ANU...
| 0xF0 0x91 0x91 0x86 #Mn NEWA SIGN NUKTA
| 0xF0 0x91 0x91 0x9E #Mn NEWA SANDHI MARK
| 0xF0 0x91 0x92 0xB0 #Mc TIRHUTA VOWEL SIGN AA
| 0xF0 0x91 0x92 0xB3..0xB8 #Mn [6] TIRHUTA VOWEL SIGN U..TIRHUTA V...
| 0xF0 0x91 0x92 0xBA #Mn TIRHUTA VOWEL SIGN SHORT E
@@ -357,8 +365,9 @@
| 0xF0 0x91 0x9C 0x9D..0x9F #Mn [3] AHOM CONSONANT SIGN MEDIAL LA.....
| 0xF0 0x91 0x9C 0xA2..0xA5 #Mn [4] AHOM VOWEL SIGN I..AHOM VOWEL S...
| 0xF0 0x91 0x9C 0xA7..0xAB #Mn [5] AHOM VOWEL SIGN AW..AHOM SIGN K...
| 0xF0 0x91 0xA8 0x81..0x86 #Mn [6] ZANABAZAR SQUARE VOWEL SIGN I.....
| 0xF0 0x91 0xA8 0x89..0x8A #Mn [2] ZANABAZAR SQUARE VOWEL SIGN REV...
| 0xF0 0x91 0xA0 0xAF..0xB7 #Mn [9] DOGRA VOWEL SIGN U..DOGRA SIGN ...
| 0xF0 0x91 0xA0 0xB9..0xBA #Mn [2] DOGRA SIGN VIRAMA..DOGRA SIGN N...
| 0xF0 0x91 0xA8 0x81..0x8A #Mn [10] ZANABAZAR SQUARE VOWEL SIGN I.....
| 0xF0 0x91 0xA8 0xB3..0xB8 #Mn [6] ZANABAZAR SQUARE FINAL CONSONAN...
| 0xF0 0x91 0xA8 0xBB..0xBE #Mn [4] ZANABAZAR SQUARE CLUSTER-FINAL ...
| 0xF0 0x91 0xA9 0x87 #Mn ZANABAZAR SQUARE SUBJOINER
@@ -379,6 +388,10 @@
| 0xF0 0x91 0xB4 0xBF..0xFF #Mn [7] MASARAM GONDI VOWEL SIGN AU..MA...
| 0xF0 0x91 0xB5 0x00..0x85 #
| 0xF0 0x91 0xB5 0x87 #Mn MASARAM GONDI RA-KARA
| 0xF0 0x91 0xB6 0x90..0x91 #Mn [2] GUNJALA GONDI VOWEL SIGN EE..GU...
| 0xF0 0x91 0xB6 0x95 #Mn GUNJALA GONDI SIGN ANUSVARA
| 0xF0 0x91 0xB6 0x97 #Mn GUNJALA GONDI VIRAMA
| 0xF0 0x91 0xBB 0xB3..0xB4 #Mn [2] MAKASAR VOWEL SIGN I..MAKASAR V...
| 0xF0 0x96 0xAB 0xB0..0xB4 #Mn [5] BASSA VAH COMBINING HIGH TONE.....
| 0xF0 0x96 0xAC 0xB0..0xB6 #Mn [7] PAHAWH HMONG MARK CIM TUB..PAHA...
| 0xF0 0x96 0xBE 0x8F..0x92 #Mn [4] MIAO TONE RIGHT..MIAO TONE BELOW
@@ -405,6 +418,7 @@
| 0xF0 0x9E 0x80 0xA6..0xAA #Mn [5] COMBINING GLAGOLITIC LETTER YO....
| 0xF0 0x9E 0xA3 0x90..0x96 #Mn [7] MENDE KIKAKUI COMBINING NUMBER ...
| 0xF0 0x9E 0xA5 0x84..0x8A #Mn [7] ADLAM ALIF LENGTHENER..ADLAM NUKTA
| 0xF0 0x9F 0x8F 0xBB..0xBF #Sk [5] EMOJI MODIFIER FITZPATRICK TYPE...
| 0xF3 0xA0 0x80 0xA0..0xFF #Cf [96] TAG SPACE..CANCEL TAG
| 0xF3 0xA0 0x81 0x00..0xBF #
| 0xF3 0xA0 0x84 0x80..0xFF #Mn [240] VARIATION SELECTOR-17..VA...
@@ -527,6 +541,7 @@
| 0xF0 0x91 0x82 0xB0..0xB2 #Mc [3] KAITHI VOWEL SIGN AA..KAITHI VO...
| 0xF0 0x91 0x82 0xB7..0xB8 #Mc [2] KAITHI VOWEL SIGN O..KAITHI VOW...
| 0xF0 0x91 0x84 0xAC #Mc CHAKMA VOWEL SIGN E
| 0xF0 0x91 0x85 0x85..0x86 #Mc [2] CHAKMA VOWEL SIGN AA..CHAKMA VO...
| 0xF0 0x91 0x86 0x82 #Mc SHARADA SIGN VISARGA
| 0xF0 0x91 0x86 0xB3..0xB5 #Mc [3] SHARADA VOWEL SIGN AA..SHARADA ...
| 0xF0 0x91 0x86 0xBF..0xFF #Mc [2] SHARADA VOWEL SIGN AU..SHARADA ...
@@ -560,7 +575,8 @@
| 0xF0 0x91 0x9A 0xB6 #Mc TAKRI SIGN VIRAMA
| 0xF0 0x91 0x9C 0xA0..0xA1 #Mc [2] AHOM VOWEL SIGN A..AHOM VOWEL S...
| 0xF0 0x91 0x9C 0xA6 #Mc AHOM VOWEL SIGN E
| 0xF0 0x91 0xA8 0x87..0x88 #Mc [2] ZANABAZAR SQUARE VOWEL SIGN AI....
| 0xF0 0x91 0xA0 0xAC..0xAE #Mc [3] DOGRA VOWEL SIGN AA..DOGRA VOWE...
| 0xF0 0x91 0xA0 0xB8 #Mc DOGRA SIGN VISARGA
| 0xF0 0x91 0xA8 0xB9 #Mc ZANABAZAR SQUARE SIGN VISARGA
| 0xF0 0x91 0xA9 0x97..0x98 #Mc [2] SOYOMBO VOWEL SIGN AI..SOYOMBO ...
| 0xF0 0x91 0xAA 0x97 #Mc SOYOMBO SIGN VISARGA
@@ -569,6 +585,10 @@
| 0xF0 0x91 0xB2 0xA9 #Mc MARCHEN SUBJOINED LETTER YA
| 0xF0 0x91 0xB2 0xB1 #Mc MARCHEN VOWEL SIGN I
| 0xF0 0x91 0xB2 0xB4 #Mc MARCHEN VOWEL SIGN O
| 0xF0 0x91 0xB6 0x8A..0x8E #Mc [5] GUNJALA GONDI VOWEL SIGN AA..GU...
| 0xF0 0x91 0xB6 0x93..0x94 #Mc [2] GUNJALA GONDI VOWEL SIGN OO..GU...
| 0xF0 0x91 0xB6 0x96 #Mc GUNJALA GONDI SIGN VISARGA
| 0xF0 0x91 0xBB 0xB5..0xB6 #Mc [2] MAKASAR VOWEL SIGN E..MAKASAR V...
| 0xF0 0x96 0xBD 0x91..0xBE #Mc [46] MIAO SIGN ASPIRATION..MIAO VOWE...
| 0xF0 0x9D 0x85 0xA6 #Mc MUSICAL SYMBOL COMBINING SPRECHGES...
| 0xF0 0x9D 0x85 0xAD #Mc MUSICAL SYMBOL COMBINING AUGMENTAT...
@@ -1556,73 +1576,8 @@
| 0xED 0x9E 0x89..0xA3 #Lo [27] HANGUL SYLLABLE HIG..HANGUL SYLLAB...
;

E_Base =
0xE2 0x98 0x9D #So WHITE UP POINTING INDEX
| 0xE2 0x9B 0xB9 #So PERSON WITH BALL
| 0xE2 0x9C 0x8A..0x8D #So [4] RAISED FIST..WRITING HAND
| 0xF0 0x9F 0x8E 0x85 #So FATHER CHRISTMAS
| 0xF0 0x9F 0x8F 0x82..0x84 #So [3] SNOWBOARDER..SURFER
| 0xF0 0x9F 0x8F 0x87 #So HORSE RACING
| 0xF0 0x9F 0x8F 0x8A..0x8C #So [3] SWIMMER..GOLFER
| 0xF0 0x9F 0x91 0x82..0x83 #So [2] EAR..NOSE
| 0xF0 0x9F 0x91 0x86..0x90 #So [11] WHITE UP POINTING BACKHAND INDE...
| 0xF0 0x9F 0x91 0xAE #So POLICE OFFICER
| 0xF0 0x9F 0x91 0xB0..0xB8 #So [9] BRIDE WITH VEIL..PRINCESS
| 0xF0 0x9F 0x91 0xBC #So BABY ANGEL
| 0xF0 0x9F 0x92 0x81..0x83 #So [3] INFORMATION DESK PERSON..DANCER
| 0xF0 0x9F 0x92 0x85..0x87 #So [3] NAIL POLISH..HAIRCUT
| 0xF0 0x9F 0x92 0xAA #So FLEXED BICEPS
| 0xF0 0x9F 0x95 0xB4..0xB5 #So [2] MAN IN BUSINESS SUIT LEVITATING...
| 0xF0 0x9F 0x95 0xBA #So MAN DANCING
| 0xF0 0x9F 0x96 0x90 #So RAISED HAND WITH FINGERS SPLAYED
| 0xF0 0x9F 0x96 0x95..0x96 #So [2] REVERSED HAND WITH MIDDLE FINGE...
| 0xF0 0x9F 0x99 0x85..0x87 #So [3] FACE WITH NO GOOD GESTURE..PERS...
| 0xF0 0x9F 0x99 0x8B..0x8F #So [5] HAPPY PERSON RAISING ONE HAND.....
| 0xF0 0x9F 0x9A 0xA3 #So ROWBOAT
| 0xF0 0x9F 0x9A 0xB4..0xB6 #So [3] BICYCLIST..PEDESTRIAN
| 0xF0 0x9F 0x9B 0x80 #So BATH
| 0xF0 0x9F 0x9B 0x8C #So SLEEPING ACCOMMODATION
| 0xF0 0x9F 0xA4 0x98..0x9C #So [5] SIGN OF THE HORNS..RIGHT-FACING...
| 0xF0 0x9F 0xA4 0x9E..0x9F #So [2] HAND WITH INDEX AND MIDDLE FING...
| 0xF0 0x9F 0xA4 0xA6 #So FACE PALM
| 0xF0 0x9F 0xA4 0xB0..0xB9 #So [10] PREGNANT WOMAN..JUGGLING
| 0xF0 0x9F 0xA4 0xBD..0xBE #So [2] WATER POLO..HANDBALL
| 0xF0 0x9F 0xA7 0x91..0x9D #So [13] ADULT..ELF
;

E_Modifier =
0xF0 0x9F 0x8F 0xBB..0xBF #Sk [5] EMOJI MODIFIER FITZPATRICK TYPE...
;

ZWJ =
0xE2 0x80 0x8D #Cf ZERO WIDTH JOINER
;

Glue_After_Zwj =
0xE2 0x99 0x80 #So FEMALE SIGN
| 0xE2 0x99 0x82 #So MALE SIGN
| 0xE2 0x9A 0x95..0x96 #So [2] STAFF OF AESCULAPIUS..SCALES
| 0xE2 0x9C 0x88 #So AIRPLANE
| 0xE2 0x9D 0xA4 #So HEAVY BLACK HEART
| 0xF0 0x9F 0x8C 0x88 #So RAINBOW
| 0xF0 0x9F 0x8C 0xBE #So EAR OF RICE
| 0xF0 0x9F 0x8D 0xB3 #So COOKING
| 0xF0 0x9F 0x8E 0x93 #So GRADUATION CAP
| 0xF0 0x9F 0x8E 0xA4 #So MICROPHONE
| 0xF0 0x9F 0x8E 0xA8 #So ARTIST PALETTE
| 0xF0 0x9F 0x8F 0xAB #So SCHOOL
| 0xF0 0x9F 0x8F 0xAD #So FACTORY
| 0xF0 0x9F 0x92 0x8B #So KISS MARK
| 0xF0 0x9F 0x92 0xBB..0xBC #So [2] PERSONAL COMPUTER..BRIEFCASE
| 0xF0 0x9F 0x94 0xA7 #So WRENCH
| 0xF0 0x9F 0x94 0xAC #So MICROSCOPE
| 0xF0 0x9F 0x97 0xA8 #So LEFT SPEECH BUBBLE
| 0xF0 0x9F 0x9A 0x80 #So ROCKET
| 0xF0 0x9F 0x9A 0x92 #So FIRE ENGINE
;

E_Base_GAZ =
0xF0 0x9F 0x91 0xA6..0xA9 #So [4] BOY..WOMAN
;

}%%
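Each row in the table above is a UTF-8 byte pattern rather than a code point, because the Ragel machine scans raw bytes. To relate a row back to the character named in its comment, the bytes can be decoded with Go's standard unicode/utf8 package. The sketch below does that for three rows taken from this diff; decodeRow is a hypothetical helper, and for range rows only the first byte sequence of the range is decoded.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeRow decodes a single UTF-8 byte sequence copied from a table row
// and returns the code point it encodes. It reports an error if the bytes
// are not exactly one valid UTF-8 sequence.
func decodeRow(b []byte) (rune, error) {
	r, size := utf8.DecodeRune(b)
	if r == utf8.RuneError || size != len(b) {
		return 0, fmt.Errorf("% X is not a single valid UTF-8 sequence", b)
	}
	return r, nil
}

func main() {
	rows := []struct {
		name  string
		bytes []byte
	}{
		{"ZERO WIDTH JOINER", []byte{0xE2, 0x80, 0x8D}},
		{"NKO DANTAYALAN", []byte{0xDF, 0xBD}},
		// First entry of the 0xF0 0x9F 0x8F 0xBB..0xBF range.
		{"EMOJI MODIFIER FITZPATRICK TYPE-1-2", []byte{0xF0, 0x9F, 0x8F, 0xBB}},
	}
	for _, row := range rows {
		r, err := decodeRow(row.bytes)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("% X -> %U %s\n", row.bytes, r, row.name)
	}
}
```

Decoding the last row gives U+1F3FB, the start of the Fitzpatrick modifier range that this commit moves from the removed E_Modifier class into Extend.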
2 changes: 1 addition & 1 deletion textseg/make_tables.go
@@ -29,7 +29,7 @@ import (
)

var url = flag.String("url",
"http://www.unicode.org/Public/10.0.0/ucd/auxiliary/",
"http://www.unicode.org/Public/11.0.0/ucd/auxiliary/",
"URL of Unicode database directory")
var verbose = flag.Bool("verbose",
false,
2 changes: 1 addition & 1 deletion textseg/make_test_tables.go
@@ -26,7 +26,7 @@ import (
)

var url = flag.String("url",
"http://www.unicode.org/Public/10.0.0/ucd/auxiliary/",
"http://www.unicode.org/Public/11.0.0/ucd/auxiliary/",
"URL of Unicode database directory")
var verbose = flag.Bool("verbose",
false,