Commit

textseg: update for Unicode 15
Unicode 15 makes no changes to the segmentation algorithm relative to
the last version supported by this library, Unicode 13. This commit
therefore updates only the character data tables.
kmoe authored and apparentlymart committed Aug 29, 2023
1 parent 45ed1d8 commit 72b78f4
Showing 9 changed files with 3,569 additions and 3,021 deletions.
2 changes: 1 addition & 1 deletion go.mod
@@ -1,3 +1,3 @@
-module github.com/apparentlymart/go-textseg/v13
+module github.com/apparentlymart/go-textseg/v15
 
 go 1.16
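Because the module path now ends in `/v15` rather than `/v13`, Go's semantic import versioning treats it as a different module. Callers must therefore update their import paths instead of only bumping a version in their own `go.mod`. For the package in this repository's `textseg` directory, the new path is `github.com/apparentlymart/go-textseg/v15/textseg`. The Go sketch below shows how the major version is read off the path itself. `majorSuffix` is a hypothetical helper, not part of the Go toolchain or this library, and it handles only the trailing `/vN` form, not `gopkg.in`-style paths.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorSuffix is a hypothetical helper that reports the major version carried
// in a module path's final "/vN" element. Under Go's semantic import
// versioning, major versions 2 and above are written as that suffix, while
// v0 and v1 carry no suffix; such paths return ok == false.
func majorSuffix(modulePath string) (major int, ok bool) {
	i := strings.LastIndex(modulePath, "/")
	if i < 0 {
		return 0, false
	}
	elem := modulePath[i+1:]
	if len(elem) < 2 || elem[0] != 'v' {
		return 0, false
	}
	digits := elem[1:]
	for _, c := range digits {
		if c < '0' || c > '9' {
			return 0, false // e.g. "v15beta" is not a major-version suffix
		}
	}
	if digits[0] == '0' {
		return 0, false // this sketch conservatively rejects "v0" and "v02"
	}
	n, err := strconv.Atoi(digits)
	if err != nil || n < 2 {
		return 0, false
	}
	return n, true
}

func main() {
	for _, p := range []string{
		"github.com/apparentlymart/go-textseg/v13",
		"github.com/apparentlymart/go-textseg/v15",
	} {
		n, ok := majorSuffix(p)
		fmt.Println(p, n, ok)
	}
}
```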
54 changes: 37 additions & 17 deletions textseg/emoji_table.rl
@@ -1,5 +1,5 @@
 # The following Ragel file was autogenerated with unicode2ragel.rb
-# from: https://www.unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt
+# from: https://www.unicode.org/Public/15.0.0/ucd/emoji/emoji-data.txt
 #
 # It defines ["Extended_Pictographic"].
 #
@@ -150,8 +150,8 @@
 | 0xE2 0x9D 0x87 #E0.6 [1] (❇️) sparkle
 | 0xE2 0x9D 0x8C #E0.6 [1] (❌) cross mark
 | 0xE2 0x9D 0x8E #E0.6 [1] (❎) cross mark button
-| 0xE2 0x9D 0x93..0x95 #E0.6 [3] (❓..❕) question mark..white e...
-| 0xE2 0x9D 0x97 #E0.6 [1] (❗) exclamation mark
+| 0xE2 0x9D 0x93..0x95 #E0.6 [3] (❓..❕) red question mark..whi...
+| 0xE2 0x9D 0x97 #E0.6 [1] (❗) red exclamation mark
 | 0xE2 0x9D 0xA3 #E1.0 [1] (❣️) heart exclamation
 | 0xE2 0x9D 0xA4 #E0.6 [1] (❤️) red heart
 | 0xE2 0x9D 0xA5..0xA7 #E0.0 [3] (❥..❧) ROTATED HEAVY BLACK HE...
@@ -299,7 +299,7 @@
 | 0xF0 0x9F 0x94 0x89 #E1.0 [1] (🔉) speaker medium volume
 | 0xF0 0x9F 0x94 0x8A..0x94 #E0.6 [11] (🔊..🔔) speaker high volume...
 | 0xF0 0x9F 0x94 0x95 #E1.0 [1] (🔕) bell with slash
-| 0xF0 0x9F 0x94 0x96..0xAB #E0.6 [22] (🔖..🔫) bookmark..pistol
+| 0xF0 0x9F 0x94 0x96..0xAB #E0.6 [22] (🔖..🔫) bookmark..water pistol
 | 0xF0 0x9F 0x94 0xAC..0xAD #E1.0 [2] (🔬..🔭) microscope..telescope
 | 0xF0 0x9F 0x94 0xAE..0xBD #E0.6 [16] (🔮..🔽) crystal ball..downw...
 | 0xF0 0x9F 0x95 0x86..0x88 #E0.0 [3] (🕆..🕈) WHITE LATIN CROSS.....
@@ -377,7 +377,7 @@
 | 0xF0 0x9F 0x98 0xAE..0xAF #E1.0 [2] (😮..😯) face with open mout...
 | 0xF0 0x9F 0x98 0xB0..0xB3 #E0.6 [4] (😰..😳) anxious face with s...
 | 0xF0 0x9F 0x98 0xB4 #E1.0 [1] (😴) sleeping face
-| 0xF0 0x9F 0x98 0xB5 #E0.6 [1] (😵) dizzy face
+| 0xF0 0x9F 0x98 0xB5 #E0.6 [1] (😵) face with crossed-out ...
 | 0xF0 0x9F 0x98 0xB6 #E1.0 [1] (😶) face without mouth
 | 0xF0 0x9F 0x98 0xB7..0xFF #E0.6 [10] (😷..🙀) face with medical m...
 | 0xF0 0x9F 0x99 0x00..0x80 #
@@ -427,7 +427,9 @@
 | 0xF0 0x9F 0x9B 0x93..0x94 #E0.0 [2] (🛓..🛔) STUPA..PAGODA
 | 0xF0 0x9F 0x9B 0x95 #E12.0 [1] (🛕) hindu temple
 | 0xF0 0x9F 0x9B 0x96..0x97 #E13.0 [2] (🛖..🛗) hut..elevator
-| 0xF0 0x9F 0x9B 0x98..0x9F #E0.0 [8] (🛘..🛟) <reserved-1F6D8>..<...
+| 0xF0 0x9F 0x9B 0x98..0x9B #E0.0 [4] (🛘..🛛) <reserved-1F6D8>..<...
+| 0xF0 0x9F 0x9B 0x9C #E15.0 [1] (🛜) wireless
+| 0xF0 0x9F 0x9B 0x9D..0x9F #E14.0 [3] (🛝..🛟) playground slide..r...
 | 0xF0 0x9F 0x9B 0xA0..0xA5 #E0.7 [6] (🛠️..🛥️) hammer and wrench...
 | 0xF0 0x9F 0x9B 0xA6..0xA8 #E0.0 [3] (🛦..🛨) UP-POINTING MILITAR...
 | 0xF0 0x9F 0x9B 0xA9 #E0.7 [1] (🛩️) small airplane
@@ -443,10 +445,12 @@
 | 0xF0 0x9F 0x9B 0xBA #E12.0 [1] (🛺) auto rickshaw
 | 0xF0 0x9F 0x9B 0xBB..0xBC #E13.0 [2] (🛻..🛼) pickup truck..rolle...
 | 0xF0 0x9F 0x9B 0xBD..0xBF #E0.0 [3] (🛽..🛿) <reserved-1F6FD>..<...
-| 0xF0 0x9F 0x9D 0xB4..0xBF #E0.0 [12] (🝴..🝿) <reserved-1F774>..<...
+| 0xF0 0x9F 0x9D 0xB4..0xBF #E0.0 [12] (🝴..🝿) LOT OF FORTUNE..ORCUS
 | 0xF0 0x9F 0x9F 0x95..0x9F #E0.0 [11] (🟕..🟟) CIRCLED TRIANGLE..<...
 | 0xF0 0x9F 0x9F 0xA0..0xAB #E12.0 [12] (🟠..🟫) orange circle..brow...
-| 0xF0 0x9F 0x9F 0xAC..0xBF #E0.0 [20] (🟬..🟿) <reserved-1F7EC>..<...
+| 0xF0 0x9F 0x9F 0xAC..0xAF #E0.0 [4] (🟬..🟯) <reserved-1F7EC>..<...
+| 0xF0 0x9F 0x9F 0xB0 #E14.0 [1] (🟰) heavy equals sign
+| 0xF0 0x9F 0x9F 0xB1..0xBF #E0.0 [15] (🟱..🟿) <reserved-1F7F1>..<...
 | 0xF0 0x9F 0xA0 0x8C..0x8F #E0.0 [4] (🠌..🠏) <reserved-1F80C>..<...
 | 0xF0 0x9F 0xA1 0x88..0x8F #E0.0 [8] (🡈..🡏) <reserved-1F848>..<...
 | 0xF0 0x9F 0xA1 0x9A..0x9F #E0.0 [6] (🡚..🡟) <reserved-1F85A>..<...
@@ -476,7 +480,7 @@
 | 0xF0 0x9F 0xA5 0xB2 #E13.0 [1] (🥲) smiling face with tear
 | 0xF0 0x9F 0xA5 0xB3..0xB6 #E11.0 [4] (🥳..🥶) partying face..cold...
 | 0xF0 0x9F 0xA5 0xB7..0xB8 #E13.0 [2] (🥷..🥸) ninja..disguised face
-| 0xF0 0x9F 0xA5 0xB9 #E0.0 [1] (🥹) <reserved-1F979>
+| 0xF0 0x9F 0xA5 0xB9 #E14.0 [1] (🥹) face holding back tears
 | 0xF0 0x9F 0xA5 0xBA #E11.0 [1] (🥺) pleading face
 | 0xF0 0x9F 0xA5 0xBB #E12.0 [1] (🥻) sari
 | 0xF0 0x9F 0xA5 0xBC..0xBF #E11.0 [4] (🥼..🥿) lab coat..flat shoe
@@ -494,29 +498,45 @@
 | 0xF0 0x9F 0xA7 0x81..0x82 #E11.0 [2] (🧁..🧂) cupcake..salt
 | 0xF0 0x9F 0xA7 0x83..0x8A #E12.0 [8] (🧃..🧊) beverage box..ice
 | 0xF0 0x9F 0xA7 0x8B #E13.0 [1] (🧋) bubble tea
-| 0xF0 0x9F 0xA7 0x8C #E0.0 [1] (🧌) <reserved-1F9CC>
+| 0xF0 0x9F 0xA7 0x8C #E14.0 [1] (🧌) troll
 | 0xF0 0x9F 0xA7 0x8D..0x8F #E12.0 [3] (🧍..🧏) person standing..de...
 | 0xF0 0x9F 0xA7 0x90..0xA6 #E5.0 [23] (🧐..🧦) face with monocle.....
 | 0xF0 0x9F 0xA7 0xA7..0xBF #E11.0 [25] (🧧..🧿) red envelope..nazar...
 | 0xF0 0x9F 0xA8 0x80..0xFF #E0.0 [112] (🨀..🩯) NEUTRAL CHESS KING....
 | 0xF0 0x9F 0xA9 0x00..0xAF #
 | 0xF0 0x9F 0xA9 0xB0..0xB3 #E12.0 [4] (🩰..🩳) ballet shoes..shorts
 | 0xF0 0x9F 0xA9 0xB4 #E13.0 [1] (🩴) thong sandal
-| 0xF0 0x9F 0xA9 0xB5..0xB7 #E0.0 [3] (🩵..🩷) <reserved-1FA75>..<...
+| 0xF0 0x9F 0xA9 0xB5..0xB7 #E15.0 [3] (🩵..🩷) light blue heart..p...
 | 0xF0 0x9F 0xA9 0xB8..0xBA #E12.0 [3] (🩸..🩺) drop of blood..stet...
-| 0xF0 0x9F 0xA9 0xBB..0xBF #E0.0 [5] (🩻..🩿) <reserved-1FA7B>..<...
+| 0xF0 0x9F 0xA9 0xBB..0xBC #E14.0 [2] (🩻..🩼) x-ray..crutch
+| 0xF0 0x9F 0xA9 0xBD..0xBF #E0.0 [3] (🩽..🩿) <reserved-1FA7D>..<...
 | 0xF0 0x9F 0xAA 0x80..0x82 #E12.0 [3] (🪀..🪂) yo-yo..parachute
 | 0xF0 0x9F 0xAA 0x83..0x86 #E13.0 [4] (🪃..🪆) boomerang..nesting ...
-| 0xF0 0x9F 0xAA 0x87..0x8F #E0.0 [9] (🪇..🪏) <reserved-1FA87>..<...
+| 0xF0 0x9F 0xAA 0x87..0x88 #E15.0 [2] (🪇..🪈) maracas..flute
+| 0xF0 0x9F 0xAA 0x89..0x8F #E0.0 [7] (🪉..🪏) <reserved-1FA89>..<...
 | 0xF0 0x9F 0xAA 0x90..0x95 #E12.0 [6] (🪐..🪕) ringed planet..banjo
 | 0xF0 0x9F 0xAA 0x96..0xA8 #E13.0 [19] (🪖..🪨) military helmet..rock
-| 0xF0 0x9F 0xAA 0xA9..0xAF #E0.0 [7] (🪩..🪯) <reserved-1FAA9>..<...
+| 0xF0 0x9F 0xAA 0xA9..0xAC #E14.0 [4] (🪩..🪬) mirror ball..hamsa
+| 0xF0 0x9F 0xAA 0xAD..0xAF #E15.0 [3] (🪭..🪯) folding hand fan..k...
 | 0xF0 0x9F 0xAA 0xB0..0xB6 #E13.0 [7] (🪰..🪶) fly..feather
-| 0xF0 0x9F 0xAA 0xB7..0xBF #E0.0 [9] (🪷..🪿) <reserved-1FAB7>..<...
+| 0xF0 0x9F 0xAA 0xB7..0xBA #E14.0 [4] (🪷..🪺) lotus..nest with eggs
+| 0xF0 0x9F 0xAA 0xBB..0xBD #E15.0 [3] (🪻..🪽) hyacinth..wing
+| 0xF0 0x9F 0xAA 0xBE #E0.0 [1] (🪾) <reserved-1FABE>
+| 0xF0 0x9F 0xAA 0xBF #E15.0 [1] (🪿) goose
 | 0xF0 0x9F 0xAB 0x80..0x82 #E13.0 [3] (🫀..🫂) anatomical heart..p...
-| 0xF0 0x9F 0xAB 0x83..0x8F #E0.0 [13] (🫃..🫏) <reserved-1FAC3>..<...
+| 0xF0 0x9F 0xAB 0x83..0x85 #E14.0 [3] (🫃..🫅) pregnant man..perso...
+| 0xF0 0x9F 0xAB 0x86..0x8D #E0.0 [8] (🫆..🫍) <reserved-1FAC6>..<...
+| 0xF0 0x9F 0xAB 0x8E..0x8F #E15.0 [2] (🫎..🫏) moose..donkey
 | 0xF0 0x9F 0xAB 0x90..0x96 #E13.0 [7] (🫐..🫖) blueberries..teapot
-| 0xF0 0x9F 0xAB 0x97..0xBF #E0.0 [41] (🫗..🫿) <reserved-1FAD7>..<...
+| 0xF0 0x9F 0xAB 0x97..0x99 #E14.0 [3] (🫗..🫙) pouring liquid..jar
+| 0xF0 0x9F 0xAB 0x9A..0x9B #E15.0 [2] (🫚..🫛) ginger root..pea pod
+| 0xF0 0x9F 0xAB 0x9C..0x9F #E0.0 [4] (🫜..🫟) <reserved-1FADC>..<...
+| 0xF0 0x9F 0xAB 0xA0..0xA7 #E14.0 [8] (🫠..🫧) melting face..bubbles
+| 0xF0 0x9F 0xAB 0xA8 #E15.0 [1] (🫨) shaking face
+| 0xF0 0x9F 0xAB 0xA9..0xAF #E0.0 [7] (🫩..🫯) <reserved-1FAE9>..<...
+| 0xF0 0x9F 0xAB 0xB0..0xB6 #E14.0 [7] (🫰..🫶) hand with index fin...
+| 0xF0 0x9F 0xAB 0xB7..0xB8 #E15.0 [2] (🫷..🫸) leftwards pushing h...
+| 0xF0 0x9F 0xAB 0xB9..0xBF #E0.0 [7] (🫹..🫿) <reserved-1FAF9>..<...
 | 0xF0 0x9F 0xB0 0x80..0xFF #E0.0[1022] (🰀..🿽) <reserved-1FC...
 | 0xF0 0x9F 0xB1..0xBE 0x00..0xFF #
 | 0xF0 0x9F 0xBF 0x00..0xBD #
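Read as code point ranges, the emoji table hunks above change only comments and the way ranges are split. Each reserved range that Unicode 14 or 15 partly assigned (for example `0xF0 0x9F 0x9B 0x98..0x9F`) is replaced by adjacent sub-ranges that cover exactly the same bytes, because Extended_Pictographic already included those reserved code points (the `#E0.0 ... <reserved-...>` rows). The following Go sketch checks this for two of the hunks. It decodes each row's endpoints with `unicode/utf8` and then coalesces adjacent ranges. The `row` and `cpRange` types and the hand-copied row data are illustrative assumptions, not part of the library, and the sketch only models rows whose byte sequences differ in their final byte.

```go
package main

import (
	"fmt"
	"sort"
	"unicode/utf8"
)

// row is a hypothetical model of one emoji_table.rl line whose UTF-8
// sequences share a fixed prefix and vary only in an inclusive final-byte range.
type row struct {
	prefix []byte
	lo, hi byte
}

// cpRange is an inclusive range of code points.
type cpRange struct{ lo, hi rune }

// decode turns a row into the code point range its byte sequences encode.
func decode(r row) cpRange {
	at := func(last byte) rune {
		buf := append(append([]byte(nil), r.prefix...), last)
		c, _ := utf8.DecodeRune(buf)
		return c
	}
	return cpRange{at(r.lo), at(r.hi)}
}

// coalesce decodes the rows, sorts the ranges, and merges any that overlap
// or are directly adjacent, yielding a canonical set of ranges.
func coalesce(rows []row) []cpRange {
	rs := make([]cpRange, 0, len(rows))
	for _, r := range rows {
		rs = append(rs, decode(r))
	}
	sort.Slice(rs, func(i, j int) bool { return rs[i].lo < rs[j].lo })
	var out []cpRange
	for _, r := range rs {
		if n := len(out); n > 0 && r.lo <= out[n-1].hi+1 {
			if r.hi > out[n-1].hi {
				out[n-1].hi = r.hi
			}
			continue
		}
		out = append(out, r)
	}
	return out
}

// sameRanges reports whether two canonical range lists are identical.
func sameRanges(a, b []cpRange) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

var (
	p9B = []byte{0xF0, 0x9F, 0x9B} // prefix for U+1F6C0..U+1F6FF
	pAB = []byte{0xF0, 0x9F, 0xAB} // prefix for U+1FAC0..U+1FAFF

	// Rows copied by hand from the diff: Unicode 13 table vs Unicode 15 table.
	oldRows = []row{{p9B, 0x98, 0x9F}, {pAB, 0x97, 0xBF}}
	newRows = []row{
		{p9B, 0x98, 0x9B}, {p9B, 0x9C, 0x9C}, {p9B, 0x9D, 0x9F},
		{pAB, 0x97, 0x99}, {pAB, 0x9A, 0x9B}, {pAB, 0x9C, 0x9F},
		{pAB, 0xA0, 0xA7}, {pAB, 0xA8, 0xA8}, {pAB, 0xA9, 0xAF},
		{pAB, 0xB0, 0xB6}, {pAB, 0xB7, 0xB8}, {pAB, 0xB9, 0xBF},
	}
)

func main() {
	before, after := coalesce(oldRows), coalesce(newRows)
	for _, r := range after {
		fmt.Printf("%U..%U\n", r.lo, r.hi)
	}
	fmt.Println("same code points:", sameRanges(before, after))
}
```

The same check can be extended to the other hunks by adding their rows. Rows whose prefixes differ (such as the multi-row `0xF0 0x9F 0xB1..0xBE 0x00..0xFF` continuations) would need a fuller UTF-8 range model than this sketch provides.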
4 changes: 2 additions & 2 deletions textseg/generate.go
@@ -2,7 +2,7 @@ package textseg
 
 //go:generate go run make_tables.go -output tables.go
 //go:generate go run make_test_tables.go -output tables_test.go
-//go:generate ruby unicode2ragel.rb --url=https://www.unicode.org/Public/13.0.0/ucd/auxiliary/GraphemeBreakProperty.txt -m GraphemeCluster -p "Prepend,CR,LF,Control,Extend,Regional_Indicator,SpacingMark,L,V,T,LV,LVT,ZWJ" -o grapheme_clusters_table.rl
-//go:generate ruby unicode2ragel.rb --url=https://www.unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt -m Emoji -p "Extended_Pictographic" -o emoji_table.rl
+//go:generate ruby unicode2ragel.rb --url=https://www.unicode.org/Public/15.0.0/ucd/auxiliary/GraphemeBreakProperty.txt -m GraphemeCluster -p "Prepend,CR,LF,Control,Extend,Regional_Indicator,SpacingMark,L,V,T,LV,LVT,ZWJ" -o grapheme_clusters_table.rl
+//go:generate ruby unicode2ragel.rb --url=https://www.unicode.org/Public/15.0.0/ucd/emoji/emoji-data.txt -m Emoji -p "Extended_Pictographic" -o emoji_table.rl
 //go:generate ragel -Z grapheme_clusters.rl
 //go:generate gofmt -w grapheme_clusters.go
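The only change in `generate.go` is the version component (`13.0.0` to `15.0.0`) of the two UCD files passed to `unicode2ragel.rb`. Both files use the UCD's semicolon-delimited layout, `code point or range ; property # comment`. The following Go sketch shows one way to read that layout and collect the ranges for a single property. It is not a port of `unicode2ragel.rb`, whose internals are not part of this diff. `parseUCD` is a hypothetical helper, and the sample lines are abbreviated, illustrative entries modelled on the emoji-data.txt rows in the diff.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// cpRange is an inclusive range of code points.
type cpRange struct{ lo, hi rune }

// parseUCD is a hypothetical reader for UCD property files such as
// emoji-data.txt or GraphemeBreakProperty.txt. It returns the ranges whose
// property field equals prop, skipping comments and blank lines.
func parseUCD(text, prop string) ([]cpRange, error) {
	var out []cpRange
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // drop the trailing comment
		}
		fields := strings.Split(line, ";")
		if len(fields) < 2 {
			continue // blank or comment-only line
		}
		if strings.TrimSpace(fields[1]) != prop {
			continue // a different property
		}
		cps := strings.TrimSpace(fields[0])
		lo, hi := cps, cps // a single code point unless written as "LO..HI"
		if i := strings.Index(cps, ".."); i >= 0 {
			lo, hi = cps[:i], cps[i+2:]
		}
		a, err := strconv.ParseUint(lo, 16, 32)
		if err != nil {
			return nil, err
		}
		b, err := strconv.ParseUint(hi, 16, 32)
		if err != nil {
			return nil, err
		}
		out = append(out, cpRange{rune(a), rune(b)})
	}
	return out, sc.Err()
}

// sample is an abbreviated, illustrative excerpt in the emoji-data.txt style.
const sample = `# illustrative sample, not a verbatim copy of emoji-data.txt
1F6DC         ; Extended_Pictographic# E15.0  [1]  (🛜)       wireless
1FACE..1FACF  ; Extended_Pictographic# E15.0  [2]  (🫎..🫏)    moose..donkey
1F6DC         ; Emoji                # E15.0  [1]  (🛜)       wireless
`

func main() {
	ranges, err := parseUCD(sample, "Extended_Pictographic")
	if err != nil {
		panic(err)
	}
	for _, r := range ranges {
		fmt.Printf("%U..%U\n", r.lo, r.hi)
	}
}
```

Filtering on the exact property name mirrors the `-p "Extended_Pictographic"` argument above, though how `unicode2ragel.rb` itself applies that flag is not shown in this commit.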