From a5b63953d2584e06855192dc68c6849589d60aa5 Mon Sep 17 00:00:00 2001
From: Stas Bekman
Date: Tue, 11 Feb 2025 15:39:07 -0800
Subject: [PATCH] [Ulysses tutorial] typos (#7024)

Fix typos
---
 docs/_tutorials/ds-sequence.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/_tutorials/ds-sequence.md b/docs/_tutorials/ds-sequence.md
index ffcb94946e6a..7a6041f393f9 100755
--- a/docs/_tutorials/ds-sequence.md
+++ b/docs/_tutorials/ds-sequence.md
@@ -49,7 +49,7 @@ def forward():
 ```
 
 
-* **Add sequence parallel communication group**: Note that DistributedAttention takes `local_attn` and `sequence_parallel_group` as the parameters, where local_attn can be your original attention block. You also need to build the sequence parallel nication group and pass that the DistributedAttention. One way to do this is to build the sequence parallel group at the model initialization stage.
+* **Add sequence parallel communication group**: Note that DistributedAttention takes `local_attn` and `sequence_parallel_group` as the parameters, where local_attn can be your original attention block. You also need to build the sequence parallel communication group and pass that to the DistributedAttention. One way to do this is to build the sequence parallel group at the model initialization stage.
 
 
 ```python
@@ -94,7 +94,7 @@ DeepSpeed's sequence parallelism can be combined with different types of attenti
 
 `FlashAttention`: the implementation from [FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness](https://arxiv.org/abs/2205.14135). Enabled by `--use-flash-attn`.
 
-`FlashAttention + Triton`: a of FlashAttention in Triton (tested with triton==2.0.0.dev20221202). Enabled by `--use-flash-attn-triton`.
+`FlashAttention + Triton`: FlashAttention in Triton (tested with triton==2.0.0.dev20221202). Enabled by `--use-flash-attn-triton`.
 
 For the best performance, we recommend using FlashAttention + Triton. Below are the installation steps. Note that FlashAttention is compatible only with NVIDIA Turing, Ampere, Ada, or Hopper GPUs.
 
@@ -114,4 +114,4 @@ cd flash-attention
 python setup.py install
 ```
 
-You may also want to ensure your model configuration is compliant with FlashAttention's requirements. For instance, to achieve optimal performance, the head size should be divisible by 8. Refer to the document of FlashAttention for more details.
+You may also want to ensure your model configuration is compliant with FlashAttention's requirements. For instance, to achieve optimal performance, the head size should be divisible by 8. Refer to the FlashAttention documentation for more details.
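
For readers of the paragraph touched by the first hunk, here is a minimal, illustrative sketch of one way the sequence parallel communication group could be built at model initialization and handed to `DistributedAttention`. The helper names, the `sequence_parallel_size` argument, and the module-level `_SEQUENCE_PARALLEL_GROUP` variable are assumptions made for this sketch, not code from the tutorial or from this patch.

```python
# Minimal sketch (assumes torch.distributed is already initialized and that the
# installed DeepSpeed provides deepspeed.sequence.layer.DistributedAttention).
# Helper names and the sequence_parallel_size argument are illustrative only.
import torch.distributed as dist
from deepspeed.sequence.layer import DistributedAttention

_SEQUENCE_PARALLEL_GROUP = None  # placeholder module-level handle for this sketch


def initialize_sequence_parallel(sequence_parallel_size: int) -> None:
    """Split the world into consecutive blocks of `sequence_parallel_size` ranks
    and remember the group that the calling rank belongs to."""
    global _SEQUENCE_PARALLEL_GROUP
    world_size = dist.get_world_size()
    rank = dist.get_rank()
    assert world_size % sequence_parallel_size == 0, \
        "world size must be divisible by the sequence parallel size"
    for i in range(world_size // sequence_parallel_size):
        ranks = list(range(i * sequence_parallel_size,
                           (i + 1) * sequence_parallel_size))
        group = dist.new_group(ranks)  # every rank must join every new_group call
        if rank in ranks:
            _SEQUENCE_PARALLEL_GROUP = group


def get_sequence_parallel_group():
    """Return the sequence parallel group for the calling rank."""
    return _SEQUENCE_PARALLEL_GROUP


# At model initialization, the existing attention block would then be wrapped, e.g.:
#   dist_attn = DistributedAttention(local_attn, get_sequence_parallel_group())
# and called in forward() roughly as:
#   context_layer = dist_attn(query_layer, key_layer, value_layer, attention_mask)
```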