Shrink Model #67
Conversation
…ng unused branches from num_non_zeros_counts7x7
Performance is better though. Performance counter stats for 'taskset -c 10 nice -n -20 target/release/lepton_jpeg_util images/img_52MP_7k.lep images/0.jpg':
390 126 953 cache-references (45,24%)
28 850 992 cache-misses # 7,40% of all cache refs (45,42%)
9 172 030 367 cycles (45,42%)
395 459 199 stalled-cycles-backend # 4,31% backend cycles idle (45,43%)
20 297 434 stalled-cycles-frontend # 0,22% frontend cycles idle (45,55%)
23 285 910 196 instructions # 2,54 insn per cycle
# 0,02 stalled cycles per insn (45,70%)
2 387 202 436 branch-instructions (45,76%)
72 726 804 branch-misses # 3,05% of all branches (45,76%)
4 172 462 359 ic_fetch_stall.ic_stall_any (45,75%)
23 132 818 l2_cache_misses_from_ic_miss (45,56%)
761 995 310 l2_latency.l2_cycles_waiting_on_fills (45,35%)
83 319 faults
1 migrations
2,034150060 seconds time elapsed
1,875813000 seconds user
0,155984000 seconds sys

Performance counter stats for 'taskset -c 10 nice -n -20 target/release/lepton_jpeg_util images/img_52MP_7k.lep images/0.jpg':
409 828 089 cache-references (45,22%)
27 165 356 cache-misses # 6,63% of all cache refs (45,22%)
8 971 453 852 cycles (45,22%)
376 186 450 stalled-cycles-backend # 4,19% backend cycles idle (45,50%)
22 944 563 stalled-cycles-frontend # 0,26% frontend cycles idle (45,82%)
22 665 330 779 instructions # 2,53 insn per cycle
# 0,02 stalled cycles per insn (45,90%)
2 444 459 853 branch-instructions (45,90%)
75 619 279 branch-misses # 3,09% of all branches (45,90%)
3 927 762 385 ic_fetch_stall.ic_stall_any (45,72%)
20 967 746 l2_cache_misses_from_ic_miss (45,34%)
715 590 263 l2_latency.l2_cycles_waiting_on_fills (45,09%)
83 297 faults
1 migrations
1,981634374 seconds time elapsed
1,816212000 seconds user
0,163666000 seconds sys
We could make ModelPerColor::counts a single array of size NON_ZERO_7X7_BINS * 49 and pass a single usize that is incrementally updated during the loop, adding 1 each iteration and using NON_ZERO_TO_BIN_7X7 * 49 to incrementally update the index.
I don't quite get it. Do you have in mind that we can use a one-dimensional array there and always update the index by NON_ZERO_TO_BIN_7X7, and by -1 if coef != 0 and the bin has changed?
Yeah, so in serialize_token/parse_token we can incrementally update the index into the one-dimensional array and pass a single parameter to read/write_coef (nz_zig49 or something like that). You can either have non_zero_to_bin as the most significant part of the index or the zig49. In one case you do the above; I did it the opposite way: each iteration the array index is always incremented by 1. I don't know which is faster. We might also shrink the table to some degree, since we can't have more non-zeros than there are coefficients left to process. See the sketch below.
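Here is a minimal sketch of the bin-major variant (index = bin * 49 + zig49): the loop keeps a single running index into the flat counts array, adds 1 per zig-zag position, and jumps by whole 49-entry rows when the non-zero bin changes. The zig-major variant would instead keep the bin as the least significant part of the index. All constants, the Branch struct, and the write_coefs/write_coef helpers below are illustrative placeholders, not the crate's actual definitions.

```rust
// Illustrative sketch only; names and sizes are assumptions, not the crate's code.

const NON_ZERO_7X7_BINS: usize = 10; // assumed number of bins
const NUM_COEFS_7X7: usize = 49;

// Assumed lookup: "non-zeros still to code" -> bin index (placeholder contents).
const NON_ZERO_TO_BIN_7X7: [usize; NUM_COEFS_7X7 + 1] = [0; NUM_COEFS_7X7 + 1];

#[derive(Clone, Copy, Default)]
struct Branch {
    counts: [u16; 2],
}

struct ModelPerColor {
    // One flat array instead of a [bin][zig49] matrix: index = bin * 49 + zig49.
    counts: [Branch; NON_ZERO_7X7_BINS * NUM_COEFS_7X7],
}

impl ModelPerColor {
    fn new() -> Self {
        ModelPerColor {
            counts: [Branch { counts: [0; 2] }; NON_ZERO_7X7_BINS * NUM_COEFS_7X7],
        }
    }

    /// Walks the 49 AC positions with a single running index into `counts`.
    fn write_coefs(&mut self, coefs: &[i16; NUM_COEFS_7X7], mut non_zeros_left: usize) {
        let mut bin = NON_ZERO_TO_BIN_7X7[non_zeros_left];
        let mut idx = bin * NUM_COEFS_7X7; // zig49 == 0

        for zig49 in 0..NUM_COEFS_7X7 {
            if non_zeros_left == 0 {
                break;
            }

            let coef = coefs[zig49];
            self.write_coef(idx, coef);

            if coef != 0 {
                non_zeros_left -= 1;
                let new_bin = NON_ZERO_TO_BIN_7X7[non_zeros_left];
                // Bin changed: jump by whole 49-entry rows.
                idx = (idx as isize
                    + (new_bin as isize - bin as isize) * NUM_COEFS_7X7 as isize)
                    as usize;
                bin = new_bin;
            }

            // Common path: just move to the next zig-zag position in the row.
            idx += 1;
        }
    }

    fn write_coef(&mut self, idx: usize, _coef: i16) {
        // Placeholder for the real arithmetic-coder update of counts[idx].
        self.counts[idx].counts[0] = self.counts[idx].counts[0].saturating_add(1);
    }
}
```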
The size of Model affects the number of cache misses. Some of the Branches are never referenced, so denser arrays for get_grid/put_grid and num_non_zeros_counts7x7 can be used along the lines of this comment. Performance does not change much on x64, but memory consumption is lower by ~10 kB per Model, and Branches requested in succession are grouped together, which improves cache locality.
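One possible way to build such a denser table (a sketch with made-up names; the crate's real layout may differ): since a block cannot have more non-zeros than coefficients left, a per-row offset table can pack only the reachable Branch entries back to back, dropping the never-referenced entries and keeping branches that are used together adjacent in memory.

```rust
// Sketch with illustrative names: pack only reachable (coefs_left, non_zeros)
// entries of a counts table contiguously via a precomputed offset table.

const MAX_COEFS: usize = 49;

#[derive(Clone, Default)]
struct Branch {
    counts: [u16; 2],
}

struct DenseCounts {
    row_offset: [usize; MAX_COEFS + 1], // start of each coefs_left "row"
    branches: Vec<Branch>,              // only reachable entries, back to back
}

impl DenseCounts {
    fn new() -> Self {
        // Row `coefs_left` only needs `coefs_left + 1` entries (0..=coefs_left
        // non-zeros are possible) instead of a full fixed-width row.
        let mut row_offset = [0usize; MAX_COEFS + 1];
        let mut total = 0;
        for coefs_left in 0..=MAX_COEFS {
            row_offset[coefs_left] = total;
            total += coefs_left + 1;
        }
        DenseCounts {
            row_offset,
            branches: vec![Branch::default(); total],
        }
    }

    /// Branches for the same `coefs_left` are adjacent, improving locality
    /// when successive lookups share that value.
    fn branch_mut(&mut self, coefs_left: usize, non_zeros: usize) -> &mut Branch {
        debug_assert!(non_zeros <= coefs_left && coefs_left <= MAX_COEFS);
        &mut self.branches[self.row_offset[coefs_left] + non_zeros]
    }
}
```

In this toy layout the table holds 1 275 entries instead of 2 500 for a full 50 × 50 grid, which is the kind of saving described above; the actual numbers for the crate's tables will differ.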