During our actual training process we used the FlashAttention-2 mechanism; however, because it pulls in many additional dependencies, the training codebase is not well suited to being open-sourced. If you are looking for better inference performance, we strongly recommend the newly released GGUF quantized version, or exploring other open-source options such as vLLM and TensorRT.
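For inference, a common way to use FlashAttention-2 is through Hugging Face Transformers' `attn_implementation` argument; a minimal sketch, assuming the model is loaded via `from_pretrained` (the thread does not name a loading API, and the model id below is a placeholder):

```python
# Sketch: pick FlashAttention-2 at load time when the flash-attn package
# is installed, otherwise fall back to PyTorch's built-in SDPA kernels.
import importlib.util


def pick_attn_implementation() -> str:
    """Return "flash_attention_2" if flash-attn is importable, else "sdpa"."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"


attn_impl = pick_attn_implementation()
print(attn_impl)

# Hypothetical usage with Transformers (not executed here; the model id
# "your-model-here" is a placeholder, and FlashAttention-2 requires the
# model to be loaded in fp16 or bf16 on a supported GPU):
#
# from transformers import AutoModelForCausalLM
# import torch
# model = AutoModelForCausalLM.from_pretrained(
#     "your-model-here",
#     torch_dtype=torch.float16,
#     attn_implementation=attn_impl,
# )
```

The fallback keeps the same script usable on machines without flash-attn installed, since SDPA is available in stock PyTorch.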
I want to know if it can use flash attention. Thanks.