[Bugfix] Fix illegal memory access in FP8 MoE kernel (vllm-project#6382)
comaniac authored Jul 12, 2024
1 parent 21b2dce commit 75f64d8
Showing 1 changed file with 5 additions and 3 deletions.
8 changes: 5 additions & 3 deletions vllm/model_executor/layers/fused_moe/fused_moe.py
@@ -492,12 +492,14 @@ def fused_experts(hidden_states: torch.Tensor,
         if tokens_in_chunk == 0:
             break
 
-        if tokens_in_chunk < CHUNK_SIZE:
-            # will only happen in the last chunk
+        if tokens_in_chunk < CHUNK_SIZE and chunk > 0:
+            # Adjust the intermediate cache size and config for the last
+            # chunk. Note that in most cases we only have one chunk
+            # so the cache size and config are already set correctly and
+            # do not need to be adjusted.
             intermediate_cache1 = intermediate_cache1[:tokens_in_chunk]
             intermediate_cache2 = intermediate_cache2[:tokens_in_chunk]
             intermediate_cache3 = intermediate_cache3[:tokens_in_chunk]
-            # reload config to get better performance on the last chunk
             config = get_config_func(tokens_in_chunk)
 
         curr_topk_ids = topk_ids[begin_chunk_idx:end_chunk_idx]
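For context, this hunk sits inside the chunked token loop of fused_experts, where the intermediate caches and kernel config are created once for min(num_tokens, CHUNK_SIZE) tokens before the loop. The self-contained Python sketch below illustrates why the new chunk > 0 guard matters; it is a simplified stand-in, not vLLM's actual code: the CHUNK_SIZE value, cache shapes, get_config_func body, and the print in place of the real kernel launches are all assumptions for illustration.

# Minimal sketch of the chunked loop around the patched hunk. Names mirror
# the diff; allocation sizes and the config lookup are illustrative only.
import torch

CHUNK_SIZE = 8  # vLLM's real constant is much larger; shrunk for the demo
num_tokens, hidden, topk = 5, 4, 2  # one short chunk: num_tokens < CHUNK_SIZE

hidden_states = torch.randn(num_tokens, hidden)
topk_ids = torch.zeros(num_tokens, topk, dtype=torch.long)

def get_config_func(m: int) -> dict:
    # Stand-in for the tuned-config lookup keyed by batch size (assumed).
    return {"BLOCK_SIZE_M": min(m, 16)}

# Caches and config are set up once for min(num_tokens, CHUNK_SIZE) tokens,
# so for a single short chunk they are already sized correctly.
M = min(num_tokens, CHUNK_SIZE)
intermediate_cache1 = torch.empty(M, topk, hidden)
intermediate_cache2 = torch.empty(M * topk, hidden)
intermediate_cache3 = torch.empty(M, topk, hidden)
config = get_config_func(M)

for chunk in range((num_tokens // CHUNK_SIZE) + 1):
    begin_chunk_idx = chunk * CHUNK_SIZE
    end_chunk_idx = min((chunk + 1) * CHUNK_SIZE, num_tokens)
    curr_hidden_states = hidden_states[begin_chunk_idx:end_chunk_idx]
    tokens_in_chunk, _ = curr_hidden_states.shape

    if tokens_in_chunk == 0:
        break

    # The fix: only re-slice the caches and re-fetch the config for a short
    # *trailing* chunk (chunk > 0). On chunk 0 they already match
    # min(num_tokens, CHUNK_SIZE) == tokens_in_chunk, per the diff's comment.
    if tokens_in_chunk < CHUNK_SIZE and chunk > 0:
        intermediate_cache1 = intermediate_cache1[:tokens_in_chunk]
        intermediate_cache2 = intermediate_cache2[:tokens_in_chunk]
        intermediate_cache3 = intermediate_cache3[:tokens_in_chunk]
        config = get_config_func(tokens_in_chunk)

    curr_topk_ids = topk_ids[begin_chunk_idx:end_chunk_idx]
    print(f"chunk {chunk}: {tokens_in_chunk} tokens, config {config}")

As the diff's new comment notes, the common case is a single chunk, where the cache size and config are already correct; the guard restricts the adjustment to trailing chunks so the first chunk never re-slices caches or reloads a config mid-setup.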
