1. Download auto-gptq, then replace the peft_utils.py in my auto-gptq installation (python_path/auto_gptq/utils/peft_utils.py) with the qa-lora version (see the consolidated shell sketch after this list).
2. Following the auto-gptq instructions, I run the quantization as follows: python quant_with_alpaca.py --pretrained_model_dir /home/lizhangming/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16 --quantized_model_dir llama7b-quant4bit-g32 --bits 4 --group_size 32 --save_and_reload
3. This produces three files: config.json, quantize_config.json, and gptq_model-4bit-32g.bin.
4. Then I copy the remaining files from the Hugging Face llama-7b checkpoint into the quantized model directory.
5. Finally I run: CUDA_VISIBLE_DEVICES=0 HF_DATASETS_OFFLINE=1 python qalora.py --model_path AutoGPTQ/examples/quantization/llama7b-quant4bit-g32/

Then I got the error.
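
For reference, here are the steps above condensed into one shell sketch. The model paths are the ones from my setup; the installed auto_gptq location and the name/location of the qa-lora replacement file are assumptions, so adjust them to your environment.

```bash
# Sketch only -- paths below are assumptions from my setup, adjust to yours.

# 1) Install auto-gptq and overwrite its peft_utils.py with the qa-lora version.
pip install auto-gptq
AUTO_GPTQ_DIR=$(python -c "import auto_gptq, os; print(os.path.dirname(auto_gptq.__file__))")
cp qa-lora/peft_utils.py "$AUTO_GPTQ_DIR/utils/peft_utils.py"   # qa-lora's patched file (source path assumed)

# 2) Quantize llama-7b to 4-bit with group size 32.
python quant_with_alpaca.py \
  --pretrained_model_dir /home/lizhangming/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16 \
  --quantized_model_dir llama7b-quant4bit-g32 \
  --bits 4 --group_size 32 --save_and_reload

# 3)+4) The output dir now holds config.json, quantize_config.json, gptq_model-4bit-32g.bin;
#       copy the remaining tokenizer/config files from the original huggyllama/llama-7b checkout next to them.

# 5) Run QA-LoRA fine-tuning on the quantized model.
CUDA_VISIBLE_DEVICES=0 HF_DATASETS_OFFLINE=1 python qalora.py \
  --model_path AutoGPTQ/examples/quantization/llama7b-quant4bit-g32/
```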
Hi, in our experiments, using FP32 together with Triton can cause the loss to increase. You can try the following options:
1. Uninstall Triton directly with pip uninstall triton; the backend will then switch to Torch.
2. Use FP16 entirely. Although this may cost some precision, it will be faster and is less likely to cause the loss to increase.
If you have any further issues, please feel free to reach out.
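
For option 2, here is a minimal sketch of loading the quantized model with the Torch backend in FP16. It assumes the use_triton and torch_dtype arguments of AutoGPTQ's from_quantized (present in recent releases) and is not the exact loading code inside qalora.py.

```python
# Minimal sketch (not the exact code in qalora.py): load the GPTQ model without
# Triton and keep the non-quantized weights in FP16 instead of FP32.
import torch
from auto_gptq import AutoGPTQForCausalLM

model = AutoGPTQForCausalLM.from_quantized(
    "AutoGPTQ/examples/quantization/llama7b-quant4bit-g32/",
    device="cuda:0",
    use_triton=False,           # fall back to the Torch kernels instead of Triton
    torch_dtype=torch.float16,  # run in FP16 rather than FP32
)
```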