-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up highlighting by about 2x #764
base: master
Are you sure you want to change the base?
Conversation
f8a64ee
to
94a5c86
Compare
I see that travis ci build is failing because my code doesn't work on older versions of zsh (I've only tested it on zsh 5.8). If you are open to accepting the PR, I'll port it to older zsh so that travis ci build passes. |
Milestoning 0.8.0 for review purposes. (Not expressing an opinion about whether we should release 0.8.0 before or after this is merged.) |
Context for the release timing question: We merged redrawhook last week, which was the main release blocker, and the merge went well, so I for one was in a "let's clear the milestone and release" mindset; and this change and #758, though both very welcome, are also a bit invasive — so I'm not sure whether I'd like to merge them before or after cutting a release. I commented to this effect on #722. If we release first, then we could have an 0.9.0 release not long afterwards, of course. ("Release early and often") @romkatv Thoughts? |
I don't have an opinion on the matter of release timing. I'll defer to your judgement. I believe the vast majority of users are following |
I don't have an opinion on the matter of release timing. I'll defer to your judgement.
Ack, thanks.
I believe the vast majority of users are following `master` because an uncountable number articles on medium.com tell them to do it this way.
The time between introduction and report of a regression is regularly under a day, so there must be a large number of users who follow master and update to HEAD regularly. I don't know whether or not they are a majority, though, since I don't have any way to estimate how many people use z-sy-h and update it less frequently.
So in practical terms whatever you merge into master ends up being used by some 90+% of users right away.
Yeah, releases are mainly for people using z-sy-h via distro packages.
|
Regarding the #758 conflict the changes are fairly small the diff is just huge. No need to block this on my part; I can rebase. Looked over a bit of the changes and they look ok. It would be nice to have a style guide documented as well to document which idioms are known to be ideal/non-ideal to help reduce regressions. |
I've fixed all issues identified by travis ci build. It's green now. |
Thanks! (But have no time for more than that right now) |
This PR had a regression that caused newly installed commands not being recognized as commands until the user explicitly calls |
$1 == [^/]*/* && $zsyh_user_options[pathdirs] == on && -n ${^path}/$1(#q-N.*) ]]; then | ||
REPLY=command | ||
elif (( _zsh_highlight_main__rehash )); then | ||
builtin rehash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it's our place to invoke rehash
. A library shouldn't change global state unless explicitly asked to. That's just as valid for printing to stdout and setting envvars as it is for populating cmdnamtab
.
The new-command.zsh
test clarifies the failing use-case. Could you elaborate on what the failure mode was (why that use-case works in master, and what part of the PR broke it)? Also, why run rehash
at all? If I do, say, apt install rsync
and then type rsync
in command position, and it gets highlighted in red, that'll be perfectly correct behaviour (as invoking accept-line
at that point will result in an error).
master does call type -w
, which does an implicit rehash
, but that happens in a subshell.
And, yes, what I wrote above implies that even that subshell shouldn't be doing a rehash
, if we can figure a way to implement that. That may be a bug, but if so, it only affects zsh/parameter
-less builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
new-command.zsh
test clarifies the failing use-case. Could you elaborate on what the failure mode was (why that use-case works in master, and what part of the PR broke it)?
In master
a newly installed command gets highlighted as "command" (green) but this comes at the cost of two forks. Here's how master
figures out the type of zsyh-new-command
:
- Check
$+commands[zsyh-new-command]
. It's zero, so no dice. - Check whether
builtin type -w -- zsyh-new-command
fails in a subshell. It doesn't fail, so no dice. - Run
builtin type -w -- zsyh-new-command
in a subshell a second time and parse the output to extract "command" from it.
Highlighting of zsyh-new-command
gets fast (no forks) only after rehash
is invoked by the user.
In the first version of this PR zsyh-new-command
was highlighted as "unknown-token" (red) because the algorithm was giving up after checking $+commands[zsyh-new-command]
. I mistakenly believed that builtin type -w
calls in master
were there only for forward compatibility and thus could be avoided when ZSH_VERSION <= 5.8
and zsh/parameter
is present.
Thanks to _zsh_highlight_main__command_type_cache
, highlighting of zsyh-new-command
in master
is slow only once per command. The reason I wanted to get rid of forks in my PR is not specfically to speed up highlighting of newly installed commands. My goal was faster highlighting of partially typed commands. For example, when I type systemctl status
one character at a time, master
forks twice on every character until I get to systemctl
. Two forks (or even just one) on every keystroke is something I'd like to avoid.
If I do, say,
apt install rsync
and then typersync
in command position, and it gets highlighted in red, that'll be perfectly correct behaviour (as invokingaccept-line
at that point will result in an error).
Not in zsh 5.8 with the default options.
% sudo docker run -e TERM -e LC_ALL=C.UTF-8 -it --rm zshusers/zsh:5.8
# rsync
zsh: command not found: rsync
# apt-get update && apt-get install -y rsync
# rsync
rsync version 3.1.2 protocol version 31
# print $+commands[rsync]
0
Are there options that make the second rsync
command fail with "command not found"?
Tangent: Seems like the meaning of hash_cmds
option is reversed. If the commands above are executed with no_hash_cmds
, $+commands[rsync]
ends up set to 1
at the end.
I'm not sure it's our place to invoke
rehash
. A library shouldn't change global state unless explicitly asked to.
This is a good point. rehash
has the side effect of removing manually hashed commands -- a destructive action.
% hash say=/bin/echo
% say hello
hello
% rehash
% say hello
zsh: command not found: say
I've added a test that verifies that rehash
is not called (no-rehash.zsh
). It fails with the version of this PR that invokes rehash
, and succeeds in master
.
I've also added a test (removed-command.zsh
) that verifies highlighting of a newly uninstalled command (e.g., after apt remove rsync
). Since the command cannot be invoked, it should be highlighted as "unknown-token" (red). This test fails on master
and the version of this PR that you've looked at.
I've added a couple more tests that cover edge cases related to hashed commands. My goal was to highlight commands in green if executing them would succeed and in red if executing them would fail. This doesn't always align with the output of type -w
. For example:
% hash foo=/not-found
% type -w foo
foo: hashed
% foo
command not found: foo
I believe it's desirable for foo
to be highlighted in red here. Let me know if I misunderstood the expected behavior of zsyh.
I've changed main-highlighter.zsh
so that all tests pass. It's still as fast as in the very first version of the PR. No forks when using a recent-enough version of zsh. Note that this code now highlights some tokens as "hashed-command" that were previously (mistakenly, I think) highlighted as "command".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My goal was faster highlighting of partially typed commands. For example, when I type
systemctl status
one character at a time,master
forks twice on every character until I get tosystemctl
. Two forks (or even just one) on every keystroke is something I'd like to avoid.
Makes sense.
Previous discussions about the "command word whilst it's being typed" case: #148, #244.
Are there options that make the second
rsync
command fail with "command not found"?
No, sorry, I must've been misremembering. I stand corrected.
% sudo docker run -e TERM -e LC_ALL=C.UTF-8 -it --rm zshusers/zsh:5.8 # rsync zsh: command not found: rsync # apt-get update && apt-get install -y rsync # rsync rsync version 3.1.2 protocol version 31 # print $+commands[rsync] 0Tangent: Seems like the meaning of
hash_cmds
option is reversed. If the commands above are executed withno_hash_cmds
,$+commands[rsync]
ends up set to1
at the end.
I see that too:
$ sudo chmod -x /usr/local/bin/caste
$ zsh -f -o nohashcmds
% sudo chmod +x /usr/local/bin/caste
% caste &>/dev/null
% echo $+commands[caste]
1
But checking another way, I don't:
$ sudo chmod -x /usr/local/bin/caste
$ zsh -f -o nohashcmds
% sudo chmod +x /usr/local/bin/caste
% caste &>/dev/null
% hash | grep caste
%
I think the 1
output in the NO_HASH_CMDS case is because of the implicit cmdnamtab->filltable()
call in the getter when the HASH_LIST_ALL option is set: https://github.com/zsh-users/zsh/blob/zsh-5.8/Src/Modules/parameter.c#L217-L221. However, I can't explain the 0
output in the "default option settings" case. Isn't it a bug?
I don't have a strong opinion regarding whether or not HASH_CMDS should or shouldn't affect ${commands}.
I've added a test that verifies that
rehash
is not called (no-rehash.zsh
). It fails with the version of this PR that invokesrehash
, and succeeds inmaster
.I've also added a test (
removed-command.zsh
) that verifies highlighting of a newly uninstalled command (e.g., afterapt remove rsync
). Since the command cannot be invoked, it should be highlighted as "unknown-token" (red). This test fails onmaster
and the version of this PR that you've looked at.
Thanks for the additional tests and bugfixes!
I believe it's desirable for
foo
to be highlighted in red here. Let me know if I misunderstood the expected behavior of zsyh.
That's an edge case indeed, but I agree it should be highlighted in red.
I've changed
main-highlighter.zsh
so that all tests pass. It's still as fast as in the very first version of the PR. No forks when using a recent-enough version of zsh. Note that this code now highlights some tokens as "hashed-command" that were previously (mistakenly, I think) highlighted as "command".
Thanks. Looks good, but a few comments:
-
The log message of the latest commit (6a210f1) isn't clear about which corner cases were fixed. Is it just "the cases that tests are being added for"?
-
There is one case where the following logic incorrectly sets
← could you please either add an XFail test, or make this a### TODO:
comment, or both? -
Please explain the magic number
5.8
in that comment. (I guess it should just say that that's the latest zsh release at the time of writing.) -
-n $1(#q-.*N)
seems a little roundabout: why not[[ -x $1 ]]
? (The answer should be in a code comment, please, rather than here.) -
Should
_zsh_highlight_main__command_type_cache
be emptied when we notice$PATH
or${zsyh_user_options[pathdirs]}
has changed? (Preëxisting issue, at least for PATH_DIRS) -
Now the PR adds
_zsh_highlight_main__rehash
and then removes it again. What's your preference — to keep it like that, or to rewrite history (after review finishes, before merging) to avoid the intermediate state? (Honest question.) -
What's the difference between the
hashed-command
test in master and theinvalid-hashed-command
test in the PR? Seems like they're identical, except that one is XFailing and one is passing? The latest commit seems to have essentially repurposedhashed-command
to test a different scenario than it tests in master, relegating the scenario tested in master to another file, and I don't see why. -
no-rehash
: Cherry-picked in 62c5575. Thanks! -
Changing
hash sudo=false
is correct, but isn't covered by the log message.
Thanks for all the work so far!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw your comments and I appreciate them. I should be able to address them early next week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't explain the
0
output in the "default option settings" case. Isn't it a bug?
Looks like a bug.
I don't have a strong opinion regarding whether or not HASH_CMDS should or shouldn't affect ${commands}.
My expectation is that $commands
contains hashed commands and only hashed commands. So if HASH_CMDS
causes an invoked command to be hashed, $commands
should reflect that. I don't have a strong support for this expectation, it's just how I thought it works.
- The log message of the latest commit (6a210f1) isn't clear about which corner cases were fixed.
See below.
Is it just "the cases that tests are being added for"?
Yes.
There is one case where the following logic incorrectly sets
← could you please either add an XFail test, or make this a### TODO:
comment, or both?
Done. Added highlighters/main/test-data/ambiguous-hashed-command.zsh
.
Note that this isn't a regression. The PR fixes most misclassifications of hashed commands as plain commands but it doesn't fix all of them.
- Please explain the magic number
5.8
in that comment. (I guess it should just say that that's the latest zsh release at the time of writing.)
Done.
-n $1(#q-.*N)
seems a little roundabout: why not[[ -x $1 ]]
? (The answer should be in a code comment, please, rather than here.)
Added a comment.
# [[ -n $1(#q-.*N) ]] is a faster version of [[ -f $1 && -x $1 ]].
The difference in performance is greatest on filesystems with slow stat()
. Another reason I've written it this way is consistency with the code a few lines below.
- Should
_zsh_highlight_main__command_type_cache
be emptied when we notice$PATH
or${zsyh_user_options[pathdirs]}
has changed? (Preëxisting issue, at least for PATH_DIRS)
_zsh_highlight_main__command_type_cache
is emptied on precmd
. I think this is reasonable. (I think computing zsyh_user_options
only once per command would also be reasonable but it makes no noticeable difference on performance.)
- Now the PR adds
_zsh_highlight_main__rehash
and then removes it again. What's your preference — to keep it like that, or to rewrite history (after review finishes, before merging) to avoid the intermediate state? (Honest question.)
I don't know the standard PR methodology or even if there is one. I usually do it like this:
- Work on code changes locally. Commit when I feel like.
- Once done, rebase all commits. Split changes into reasonable self-contained chunks so that commits can be reviewed one-by-one.
- Once the reviewer posts their first comments, avoid force pushes. Additional changes requested by the reviewer must go in separate commits.
- When the reviewer is happy with the code changes, ask them if they prefer the PR to be rebased and whether they have a preference for how it should be split into commits.
I think we are currently at 3 -- discussing the code changes and making changes to it. If you are OK with the current state of the code, let me know whether you want me to rebase the PR into one commit with a long description or a handful of commits with somewhat shorter descriptions. The former is obviously easier for me but I'm fine with the latter, too.
- What's the difference between the
hashed-command
test in master and theinvalid-hashed-command
test in the PR? Seems like they're identical, except that one is XFailing and one is passing? The latest commit seems to have essentially repurposedhashed-command
to test a different scenario than it tests in master, relegating the scenario tested in master to another file, and I don't see why.
It's a trade-off between the simplicity of the final version of the code and the size of the code delta to reach it. I wanted to have two tests for hashed commands: one where the command is valid and another where it's invalid. hashed-command
and invalid-hashed-command
seemed like good names for these tests, so I grabbed them.
That said, I have a very weak preference here, so if you'd like me to change anything I'll be happy to oblige.
- Changing
hash sudo=false
is correct, but isn't covered by the log message.
Indeed. I realize that commit messages are not up to z-sy-h standards. I wanted to invest only as much time in them as would be necessary for you to review the code. I can write higher-quality commit messages once you are OK with the proposed code changes.
Thanks for all the work so far!
Thanks for the review and great feedback!
Please don't hesitate using more direct language to request changes from me. "Add a comment here", "rename this test", "merge these commits", etc. I'm happy to comply as soon as I understand what's required. By proposing these code changes I'm effectively asking you to maintain them going forward, so it's only reasonable that you request compliance with your own standards.
See comments within for the rationale. This is a regression test for a regression that was only present in development versions of PR #764 and was never present in master.
Sorry for the latency.
> I can't explain the `0` output in the "default option settings" case. Isn't it a bug?
Looks like a bug.
Would you report it upstream, please?
> I don't have a strong opinion regarding whether or not HASH_CMDS should or shouldn't affect ${commands}.
My expectation is that `$commands` contains hashed commands and only hashed commands. So if `HASH_CMDS` causes an invoked command to be hashed, `$commands` should reflect that. I don't have a strong support for this expectation, it's just how I _thought_ it works.
Seems reasonable.
> 2. `There is one case where the following logic incorrectly sets` ← could you please either add an XFail test, or make this a `### TODO:` comment, or both?
Done. Added `highlighters/main/test-data/ambiguous-hashed-command.zsh`.
Thanks. Also thanks for the other (snipped) changes.
Note that this isn't a regression. The PR fixes most misclassifications of hashed commands as plain commands but it doesn't fix all of them.
*nod*. That's not a problem. A PR is mergeable if it improves at least one thing and doesn't regress anything.
> 4. `-n $1(#q-.*N)` seems a little roundabout: why not `[[ -x $1 ]]`? (The answer should be in a code comment, please, rather than here.)
Added a comment.
```zsh
# [[ -n $1(#q-.*N) ]] is a faster version of [[ -f $1 && -x $1 ]].
```
The difference in performance is greatest on filesystems with slow `stat()`. Another reason I've written it this way is consistency with the code a few lines below.
Fair enough.
With my zsh upstream hat, I recall that Perl has a magic variable `_` (without a sigil) exactly for this use-case: in Perl, `-f $foo && -x _` is equivalent to `-f $foo && -x $foo` except that it doesn't call stat() a second time. I wonder if zsh upstream should consider implementing something along these lines.
> 5. Should `_zsh_highlight_main__command_type_cache` be emptied when we notice `$PATH` or `${zsyh_user_options[pathdirs]}` has changed? (Preëxisting issue, at least for PATH_DIRS)
`_zsh_highlight_main__command_type_cache` is emptied on `precmd`. I think this is reasonable.
Well, yes, it's not exactly likely that a widget will change $PATH. Nevertheless, that's neither impossible nor forbidden, so could you please record it in a comment or an issue or fix it (whichever is easiest for you)?
(I think computing `zsyh_user_options` only once per command would also be reasonable but it makes no noticeable difference on performance.)
*nod*
> 6. Now the PR adds `_zsh_highlight_main__rehash` and then removes it again. What's your preference — to keep it like that, or to rewrite history (after review finishes, before merging) to avoid the intermediate state? (Honest question.)
I don't know the standard PR methodology or even if there is one. I usually do it like this:
1. Work on code changes locally. Commit when I feel like.
2. Once done, rebase all commits. Split changes into reasonable self-contained chunks so that commits can be reviewed one-by-one.
3. Once the reviewer posts their first comments, avoid force pushes. Additional changes requested by the reviewer must go in separate commits.
4. When the reviewer is happy with the code changes, ask them if they prefer the PR to be rebased and whether they have a preference for how it should be split into commits.
I think we are currently at 3 -- discussing the code changes and making changes to it. If you are OK with the current state of the code, let me know whether you want me to rebase the PR into one commit with a long description or a handful of commits with somewhat shorter descriptions. The former is obviously easier for me but I'm fine with the latter, too.
In general, I prefer small, self-contained commits. The extra time required to do the rebase before pushing is generally well offset by making reviews and future bisects easier. (As producingoss points out, in general it's advisable for one person to do a little more work if that means O(N) people will each hvae to do a little less work down the road.)
> 7. What's the difference between the `hashed-command` test in master and the `invalid-hashed-command` test in the PR? Seems like they're identical, except that one is XFailing and one is passing? The latest commit seems to have essentially repurposed `hashed-command` to test a different scenario than it tests in master, relegating the scenario tested in master to another file, and I don't see why.
It's a trade-off between the simplicity of the final version of the code and the size of the code delta to reach it. I wanted to have two tests for hashed commands: one where the command is valid and another where it's invalid. `hashed-command` and `invalid-hashed-command` seemed like good names for these tests, so I grabbed them.
That said, I have a very weak preference here, so if you'd like me to change anything I'll be happy to oblige.
Please keep `hashed-command`'s existing name and meaning and name the new test `hashed-command-valid`. Rationale:
- Avoid renaming or repurposing existing tests, as that tends to create merge conflicts downstream without a commensurate upside. (In this case, I happen to maintainer a downstream package which happens to actually have a downstream-only mod for `hashed-command`, but that's not why I have this preference.)
- Make both names start the same way so as to keep them next to each other (in tab completion, `make test` output, and so on).
- I were naming them from scratch I'd dub them `hashed-command-valid` and `hashed-command-invalid`, but in this case I'd leave `hashed-command`'s name untouched for reasons given in the first bullet.
Renaming might be easier to implement with git-filter-branch(1) than with git-rebase(1). YMMV.
> 9. Changing `hash sudo=false` is correct, but isn't covered by the log message.
Indeed. I realize that commit messages are not up to z-sy-h standards. I wanted to invest only as much time in them as would be necessary for you to review the code. I can write higher-quality commit messages once you are OK with the proposed code changes.
+1 and thanks. It's rarely an error to make log messages or comments _more_ elaborate.
I don't recall a systemic problem with all log messages or I'd have mentioned it.
Feel free to push history rewrites before the next request for review.
> Thanks for all the work so far!
Thanks for the review and great feedback!
You're welcome! Credit goes to my teachers on dev@subversion ☺
Please don't hesitate using more direct language to request changes from me. "Add a comment here", "rename this test", "merge these commits", etc. I'm happy to comply as soon as I understand what's required. By proposing these code changes I'm effectively asking you to maintain them going forward, so it's only reasonable that you request compliance with your own standards.
Wilco.
As to maintaining the changes going forward, I was hoping you'd stick around, or failing that, that you'd at least resurface should a regression be reported that'll turn out to have been caused by your patches. It'd be great to have you on board, and you needn't be any more active than you'll want to be. That said, I'll be happy to accept the PR on any terms.
|
Relevant zsh code is around https://github.com/zsh-users/zsh/blob/master/Src/cond.c#L341 . Memoizing the dostat call would be trivial; however, -x calls the
whereas |
I've made another performance improvement that affects highlighting of commands with an alias in command position. PTAL.
Good point. Zsh is also affected by these edge cases.
I believe this is intended behavior. I've fixed the code so that it doesn't get tripped over this corner case. It's now a bit slower but I can measure the difference in performance only on WSL + NTFS, and even then it's fairly small.
Done: workers/47366.
Added a comment. Note that this isn't a regression.
Done.
Please confirm whether the code looks good (after the latest changes) and I'll go ahead with rebasing. |
I've made another performance improvement that affects highlighting of commands with an alias in command position. PTAL.
Will do, but it may be a few days. Cheers!
|
See comments within for the rationale.
No worries. As long as there are no large changes in master (which I would have to merge), the delay has no negative impact on me. Take your time.
Done. The current status of this PR from my point of view: I'm waiting for you to review the code (the whole diff) and either request additional changes or let me know that the code looks good to go. In the latter case I'll split all changes into reasonable-sized commits and force-push. |
Any movement on this @romkatv, @danielshahaf? |
As the optimization target I measured how long it takes to type a command one character at a time when syntax highlighting is triggered after every character. This matches the most common use case. I used zsh 5.8 with the following
.zshrc
:To run the benchmark, set
ZSYH_BM_BUFFER
to the command you want to get typed (e.g.,ZSYH_BM_BUFFER='ls abc'
) and press Ctrl+F.This benchmark encourages reuse of computation artifacts on successive
_zsh_highlight
calls whenBUFFER
changes only slightly.invalid
ls abc
ls abc abc abc abc abc abc abc abc abc
ls 1 2 3 4 5 6 7 8 9
In addition to
bm-zsyh
I also used a slightly modified version that appends characters to the front ofBUFFER
, as if a user is typing a command backwards, starting from the last character. I'm not showing the results here because they are identical. The optimizations in the PR take advantage ofBUFFER
changing slightly between calls to_zsh_highlight
but they are agnostic about the position and nature of those changes.This PR has minor positive effect on
zsh tests/test-perfs.zsh main
: 17.332s with the PR vs 21.064s without. If_zsh_highlight
is defined as an empty function, the benchmark reports 9.680s. This implies higher impact of the PR: 7.652s vs 11.384s, a speedup of 49%. I believe this benchmark is less representative of user activity and thus less useful as a target of optimization than the one I presented above.make test
reports no failures.There is nothing really interesting in the PR. No new algorithms or architectural changes. Just dumb local micro optimizations here and there.
Every commit in the PR has measurable performance benefits compared to the state of the repository to which it's applied.