
Add support for tensors/heads not divisible by GPUs #8

Triggered via pull request: February 4, 2025 15:27
Status: Failure
Total duration: 4m 19s

pre-commit.yml

on: pull_request

Annotations: 10 errors
vllm/config.py:706:9: F841 Local variable `total_num_attention_heads` is assigned to but never used
vllm/config.py:708:9: F841 Local variable `tensor_parallel_size` is assigned to but never used
vllm/model_executor/layers/fused_moe/layer.py:455:9: F841 Local variable `tp_rank` is assigned to but never used
vllm/model_executor/layers/linear.py:292:81: E501 Line too long (93 > 80)
vllm/model_executor/layers/linear.py:379:13: F841 Local variable `shard_size` is assigned to but never used
vllm/model_executor/layers/linear.py:680:9: F841 Local variable `tp_size` is assigned to but never used
vllm/model_executor/layers/linear.py:1129:9: F841 Local variable `tp_rank` is assigned to but never used
vllm/model_executor/layers/linear.py:1153:13: F841 Local variable `shard_size` is assigned to but never used
vllm/model_executor/layers/quantization/base_config.py:45:81: E501 Line too long (83 > 80)
vllm/model_executor/layers/quantization/fp8.py:174:9: F841 Local variable `tp_chunk` is assigned to but never used
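
All ten annotations reduce to two Ruff rules: F841 (a local variable is assigned but never used) and E501 (a line exceeds the 80-character limit). Below is a minimal sketch of both failure modes and their usual fixes; the function and variable names are invented for illustration and only mirror those in the annotations, they are not the flagged vllm code.

```python
# Hypothetical sketch, not the actual vllm code flagged above.


def heads_per_rank(total_num_attention_heads: int, tp_size: int) -> int:
    """Ceiling-divide attention heads across tensor-parallel ranks."""
    # F841: a local that is assigned but never read is flagged, e.g.
    #     tp_rank = 0  # F841 Local variable `tp_rank` is assigned to but never used
    # The fix is to delete the dead assignment (or actually use the value).
    #
    # E501: any line over 80 characters is flagged. The fix is to wrap it,
    # for example turning
    #     num_padded_heads = total_num_attention_heads + (-total_num_attention_heads % tp_size)
    # into
    #     num_padded_heads = total_num_attention_heads + (
    #         -total_num_attention_heads % tp_size)
    return (total_num_attention_heads + tp_size - 1) // tp_size


assert heads_per_rank(40, 6) == 7  # 40 heads across 6 GPUs -> 7 per rank
```

As of recent Ruff versions, `ruff check --fix` can delete many of the dead F841 assignments (the removal is classed as an unsafe fix, so it may require `--unsafe-fixes`), while E501 has no autofix and the long lines have to be re-wrapped by hand or by a formatter.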