vllm.model_executor.layers.quantization.kernels.scaled_mm.flashinfer
FlashInferFP8ScaledMMLinearKernel
Bases: FP8ScaledMMLinearKernel
Source code in vllm/model_executor/layers/quantization/kernels/scaled_mm/flashinfer.py
apply_scaled_mm
apply_scaled_mm(
    *,
    A: Tensor,
    B: Tensor,
    out_dtype: dtype,
    As: Tensor,
    Bs: Tensor,
    bias: Tensor | None,
    output_shape: list,
) -> Tensor
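The page gives only the signature, so the following is a minimal sketch of the arithmetic a scaled matrix multiply conventionally performs, not the FlashInfer kernel itself. It assumes per-tensor scalar scales (standing in for `As` and `Bs`), uses nested lists instead of tensors to stay dependency-free, and omits the `out_dtype` cast and the `output_shape` reshape. The function name `reference_scaled_mm` and its parameter names are hypothetical.

```python
def reference_scaled_mm(a_q, b_q, a_scale, b_scale, bias=None):
    """Plain-Python reference for a per-tensor scaled matmul (illustrative only).

    a_q: M x K quantized values; b_q: K x N quantized values.
    a_scale, b_scale: assumed per-tensor dequantization scales (floats).
    bias: optional length-N list added to every output row.
    """
    m, k, n = len(a_q), len(b_q), len(b_q[0])
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            # Accumulate the raw (still quantized) dot product.
            acc = sum(a_q[i][p] * b_q[p][j] for p in range(k))
            # Per-tensor scales factor out of the sum, so apply them once.
            val = acc * a_scale * b_scale
            if bias is not None:
                val += bias[j]
            row.append(val)
        out.append(row)
    return out


result = reference_scaled_mm(
    a_q=[[1.0, 2.0], [3.0, 4.0]],
    b_q=[[1.0, 0.0], [0.0, 1.0]],
    a_scale=0.5,
    b_scale=2.0,
    bias=[1.0, -1.0],
)
print(result)  # [[2.0, 1.0], [4.0, 3.0]]
```

Applying the scales after accumulation rather than dequantizing each operand first is what lets a fused kernel keep the inner product in low precision; whether this kernel does so is not stated on this page.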
can_implement (classmethod)
can_implement(
    c: FP8ScaledMMLinearLayerConfig,
) -> tuple[bool, str | None]
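The return type suggests a capability check that reports whether the kernel supports a configuration and, if not, why. The sketch below illustrates that pattern in isolation. `ExampleConfig`, its fields, `ExampleKernel`, and the rejection messages are all hypothetical; the conditions that `FlashInferFP8ScaledMMLinearKernel` actually checks are not documented on this page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ExampleConfig:
    # Hypothetical fields; the real FP8ScaledMMLinearLayerConfig may differ.
    per_tensor_activation_scale: bool
    per_tensor_weight_scale: bool


class ExampleKernel:
    @classmethod
    def can_implement(cls, c: ExampleConfig) -> tuple[bool, str | None]:
        """Return (True, None) if supported, else (False, reason)."""
        if not c.per_tensor_activation_scale:
            return False, "requires per-tensor activation scales"
        if not c.per_tensor_weight_scale:
            return False, "requires per-tensor weight scales"
        return True, None


supported, reason = ExampleKernel.can_implement(
    ExampleConfig(per_tensor_activation_scale=False, per_tensor_weight_scale=True)
)
print(supported, reason)  # False requires per-tensor activation scales
```

Returning the reason alongside the flag lets a caller that tries several kernels in turn log why each candidate was skipped.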