Speed up _is_eol_token #1257

correctmost · 2024-08-10T02:59:50Z

Continuation of #1256 because I force pushed and cannot reopen that PR. (Sorry for the noise.)

If this PR seems too risky because of assumptions about tokenization, feel free to pass!

Stats

Before

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1478156    0.885    0.000    1.197    0.000 pycodestyle.py:1831(_is_eol_token)
  1472360    0.364    0.000    0.364    0.000 {method 'lstrip' of 'str' objects}

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `pycodestyle .` | 18.472 ± 0.196 | 18.067 | 18.848 | 1.00 |

After

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1478156    0.511    0.000    0.511    0.000 pycodestyle.py:1831(_is_eol_token)
   225341    0.055    0.000    0.055    0.000 {method 'lstrip' of 'str' objects}

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `pycodestyle .` | 18.360 ± 0.186 | 18.159 | 18.781 | 1.00 |
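
For reference, the deltas implied by the two sets of numbers can be worked out directly. The snippet below is only arithmetic on the figures quoted above; the variable and function names are made up for illustration and are not part of the patch.

```python
# Figures copied from the "Before" and "After" stats above.
before = {"cumtime": 1.197, "lstrip_calls": 1472360, "mean": 18.472, "stddev": 0.196}
after = {"cumtime": 0.511, "lstrip_calls": 225341, "mean": 18.360, "stddev": 0.186}


def pct_drop(old, new):
    """Percentage reduction going from old to new."""
    return (old - new) / old * 100


cumtime_drop = pct_drop(before["cumtime"], after["cumtime"])
lstrip_drop = pct_drop(before["lstrip_calls"], after["lstrip_calls"])
mean_delta = before["mean"] - after["mean"]

print(f"_is_eol_token cumtime: -{cumtime_drop:.1f}%")
print(f"lstrip calls:          -{lstrip_drop:.1f}%")
print(f"mean wall-clock delta: {mean_delta:.3f} s")
# True when the difference in means is smaller than either run's stddev.
print("within one stddev:", mean_delta < min(before["stddev"], after["stddev"]))
```

On these numbers the function gets noticeably cheaper under the profiler, while the end-to-end mean moves by less than one reported standard deviation.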

correctmost · 2024-08-10T08:09:05Z

Versions tested locally (on Arch Linux with pyenv):

3.13.0b4
3.12.4
3.12.0
3.11.9
3.11.0
3.10.14
3.10.0
3.9.19
3.9.0
3.8.19
3.8.0

asottile

this one seems more risky than the others so I'm leaning towards passing on it

asottile · 2024-08-11T15:17:36Z

pycodestyle.py

+    # Check if the line's penultimate character is a continuation
+    # character
+    if token[4][-2] != '\\':
+        return False


I'm worried about this raising an IndexError -- though the only case I can think of is DEDENT tokens on a blank line triggering it, and I'd expect the test suite to catch that already

I know we had trouble with ENDMARKER in the past -- there are several patch versions where python reports end-of-file inconsistently / incorrectly -- especially when it either doesn't end with a newline or ends with an escape sequence (and unfortunately I think our test coverage is lacking here!)
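
To make the IndexError concern concrete: `token[4]` is the line string attached to the token, and indexing it with `[-2]` raises if that string has fewer than two characters. The sketch below uses a hypothetical helper name and hand-built strings rather than real tokenizer output. The slice-based variant is one possible guard, not something proposed in this PR, and whether the tokenizer ever produces such short lines is exactly the open question.

```python
def penultimate_is_backslash(line):
    """Return True if the second-to-last character of line is a backslash.

    Hypothetical helper for illustration. Slicing never raises, so empty
    or one-character lines simply yield False.
    """
    return line[-2:-1] == '\\'


# A line ending in a continuation character followed by a newline.
print(penultimate_is_backslash('x = 1 + \\\n'))  # True
# An ordinary line.
print(penultimate_is_backslash('x = 1\n'))       # False

# Direct indexing, as in the diff, fails on very short strings.
for line in ('', '\n'):
    try:
        line[-2]
    except IndexError:
        print(repr(line), 'raises IndexError with [-2]')
    # The slice form returns False instead of raising.
    assert penultimate_is_backslash(line) is False
```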

correctmost · 2024-08-12

Thanks for the review :). I wasn't able to trigger an IndexError with this patch, but I may be overlooking some test cases.

I left some testing notes below in case someone else picks this up in the future.

Testing notes

Test cases:

Run pycodestyle's test suite with pytest

Test with the attachment from https://bugs.python.org/issue44667

Test with that attachment + a newline character added at the end

Test with the example from python/cpython#106202 ("tokenize.py: NEWLINE/NL mixup after line continuation character in indented block")
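
One way to see what the tokenizer reports for inputs like these is to dump each token together with its `line` attribute. This is a sketch using only the standard `tokenize` module; `dump_tokens` and the sample source are hypothetical (the sample is not the exact reproducer from either issue), and the output can differ between Python versions.

```python
import io
import tokenize


def dump_tokens(source):
    """Return (token name, token string, line) for each token in source.

    Hypothetical helper. Tokenization errors are caught so that malformed
    input (for example a file ending in a bare backslash) still returns
    whatever was produced before the failure.
    """
    rows = []
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            rows.append((tokenize.tok_name[tok.type], tok.string, tok.line))
    except (tokenize.TokenError, SyntaxError) as exc:
        rows.append(('ERROR', str(exc), ''))
    return rows


# Line continuation inside an indented block.
sample = 'if True:\n    x = 1 + \\\n        2\n'

for name, string, line in dump_tokens(sample):
    print(f'{name:<10} {string!r:<12} {line!r}')
```

Running the same dump on a file with and without a trailing newline, under each interpreter version, shows where NEWLINE, NL, DEDENT and ENDMARKER tokens land and what line string each one carries.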

Release versions tested:

3.8.0 - 3.8.19

3.9.0 - 3.9.19 (3.9.3 was yanked)

3.10.0 - 3.10.14

3.11.0 - 3.11.9

3.12.0 - 3.12.3

Speed up _is_eol_token

ec81dec

asottile reviewed Aug 11, 2024

correctmost closed this Aug 12, 2024
