Repeated hyphen at end of line #1

Open
PhilterPaper opened this issue Dec 10, 2020 · 4 comments
Labels
enhancement New feature or request

Comments

@PhilterPaper
Owner

In examples/KP.pl (the paragraph-shaping example), there is one place where a compound word is split at a hard hyphen (ASCII x2D) and a new hyphen is added anyway, giving a double hyphen (--) at the end of the line. I haven't yet tracked down whether this is a problem in Text::KnuthPlass or something that should be handled in the calling code (KP.pl).

A similar issue will probably arise with soft hyphens, non-breaking hyphens, narrow hyphens, and the various dashes (en, em, figure). I would like to take care of all of them together, in a consistent manner, rather than chasing them down one at a time.

@PhilterPaper PhilterPaper added the bug Something isn't working label Dec 10, 2020
@PhilterPaper
Owner Author

I've fixed this in the example (KP.pl) for an ordinary hyphen (U+002D), and see that Knuth-Plass refuses to split after an em-dash, so that still needs looking into. It's still up in the air whether such special treatment (suppress the extra hyphen after splitting at a hyphen or dash) should be moved upstream into KnuthPlass.pm.
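
A minimal sketch of that special treatment (suppress the extra hyphen when the split falls right after a hyphen or dash), written in Python purely for illustration. It is not the code in KP.pl or KnuthPlass.pm; the function name `line_end_text` and the set of characters are assumptions made for this example.

```python
# Hypothetical helper, not part of Text::KnuthPlass or KP.pl.
# Assumed list of characters that already read as a hyphen or dash
# when they end a line.
VISIBLE_HYPHENS_AND_DASHES = {
    "\u002D",  # hyphen-minus (the ASCII x2D case from this issue)
    "\u2010",  # hyphen
    "\u2012",  # figure dash
    "\u2013",  # en dash
    "\u2014",  # em dash
}


def line_end_text(fragment, hyphen="-"):
    """Return the text to place at the end of a line when a word is
    split right after `fragment`."""
    if fragment and fragment[-1] in VISIBLE_HYPHENS_AND_DASHES:
        # The fragment already ends in a hyphen or dash: add nothing.
        return fragment
    # Ordinary split inside a word: add exactly one hyphen.
    return fragment + hyphen


print(line_end_text("paragraph-"))  # split of "paragraph-shaping"
print(line_end_text("para"))        # split inside "paragraph"
```

Keeping the check in one small function would make it easy to call from either the example script or the library, whichever ends up owning the behavior.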

@PhilterPaper
Owner Author

See PDF::Builder's /UniWrap.pm for code which purports to show where text may, may not, or must be split within a line, according to the alphabet (script) in use.

@PhilterPaper
Owner Author

PhilterPaper commented Jan 9, 2022

See also both #2 and #3 regarding word/line splitting in general. Various sorts of hyphens and dashes need to be handled consistently and in accordance with Unicode rules. If a line (or word) can be split after a hyphen or dash (and possibly other punctuation), normally you would not need to add any form of hyphen at the end of the line.
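
One way to picture consistent handling is a single table that says, for each kind of hyphen or dash, whether a split is permitted after it and what should appear at the end of the line. The Python sketch below is an illustration only, not code from this module or from PDF::Builder. The names `break_opportunities` and `split_word` are hypothetical, and the classification (a split is allowed after a soft hyphen, hyphen-minus, hyphen, or figure, en, or em dash, but never after a non-breaking hyphen) is an assumption that should be checked against the Unicode line breaking rules.

```python
# Illustrative sketch; the names and the classification are assumptions.
SOFT_HYPHEN = "\u00AD"          # invisible unless a split happens here
NON_BREAKING_HYPHEN = "\u2011"  # visible, but never a split point
BREAK_AFTER = {                 # visible, and a split may follow
    "\u002D",  # hyphen-minus
    "\u2010",  # hyphen
    "\u2012",  # figure dash
    "\u2013",  # en dash
    "\u2014",  # em dash
}


def break_opportunities(word):
    """Return each index i such that the word may be split after word[i]."""
    return [
        i
        for i, ch in enumerate(word[:-1])  # no split after the last character
        if ch == SOFT_HYPHEN or ch in BREAK_AFTER
    ]


def split_word(word, i):
    """Split after word[i]; return (line-end text, remainder)."""
    head, tail = word[: i + 1], word[i + 1 :]
    if head.endswith(SOFT_HYPHEN):
        # A soft hyphen turns into a visible hyphen only at a split.
        head = head[:-1] + "-"
    # Any other permitted split already ends in a visible hyphen or dash,
    # so nothing is added. Soft hyphens that were not used stay invisible.
    return head.replace(SOFT_HYPHEN, ""), tail.replace(SOFT_HYPHEN, "")


print(break_opportunities("well-known"))       # split allowed after the hyphen
print(split_word("hy\u00ADphen", 2))           # soft hyphen becomes visible
print(break_opportunities("non\u2011breaking"))  # no split points
```

With a table like this, supporting another dash would mean adding one entry rather than another special case in the caller.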

This brings up the point of how to split a line (mentioned in #4) when mixing LTR and RTL text -- what hyphen to use and where on the line to place it. For example:

We are out of space in the middle of a long German wor...:txet werbeH YRAMIRP emos si ereH
...dToBeSplit (you know how they love long words). ?txet werbeH erom emos seog ereh ebyaM

(I'm not even sure whether the split German line is to the physical left or right of the RTL Hebrew text, much less where the split word's hyphen goes. Assume the document is RTL overall.)

@PhilterPaper PhilterPaper added enhancement New feature or request and removed bug Something isn't working labels Oct 3, 2022
@PhilterPaper
Owner Author

The original problem of an extra hyphen has been fixed in the examples (calling program), so I will remove the "bug" label. Since other line-split dashes, etc. still need to be handled somewhere, I'll add an "enhancement" label.
