Releases: bmaltais/kohya_ss
v22.6.0
- 2024/01/27 (v22.6.0)
- Merge sd-scripts v0.8.3 code update
  - Fixed a bug that the training crashes when `--fp8_base` is specified with `--save_state`. PR #1079 Thanks to feffy380!
    - `safetensors` is updated. Please see Upgrade and update the library.
  - Fixed a bug that the training crashes when `network_multiplier` is specified with multi-GPU training. PR #1084 Thanks to fireicewolf!
  - Fixed a bug that the training crashes when training ControlNet-LLLite.
- Merge sd-scripts v0.8.2 code update
  - [Experimental] The `--fp8_base` option is added to the training scripts for LoRA etc. The base model (U-Net, and Text Encoder when training modules for Text Encoder) can be trained with fp8. PR #1057 Thanks to KohakuBlueleaf!
    - Please specify `--fp8_base` in `train_network.py` or `sdxl_train_network.py`.
    - PyTorch 2.1 or later is required.
    - If you use xformers with PyTorch 2.1, please see the xformers repository and install the appropriate version according to your CUDA version.
    - The sample image generation during training consumes a lot of memory. It is recommended to turn it off.
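    - A minimal command sketch for enabling fp8 training (the model path, dataset config, and LoRA settings below are illustrative placeholders, not part of the release notes; other required options are omitted):

      ```bash
      accelerate launch sdxl_train_network.py \
        --pretrained_model_name_or_path /path/to/sdxl_model.safetensors \
        --dataset_config /path/to/dataset.toml \
        --network_module networks.lora \
        --mixed_precision bf16 \
        --fp8_base \
        --output_dir /path/to/output
      ```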
  - [Experimental] The network multiplier can be specified for each dataset in the training scripts for LoRA etc.
    - This is an experimental option and may be removed or changed in the future.
    - For example, if you train with state A as `1.0` and state B as `-1.0`, you may be able to generate by switching between state A and B depending on the LoRA application rate.
    - Also, if you prepare five states and train them as `0.2`, `0.4`, `0.6`, `0.8`, and `1.0`, you may be able to generate by switching the states smoothly depending on the application rate.
    - Please specify `network_multiplier` in `[[datasets]]` in the `.toml` file.
  - Some options are added to `networks/extract_lora_from_models.py` to reduce the memory usage.
    - `--load_precision` option can be used to specify the precision when loading the model. If the model is saved in fp16, you can reduce the memory usage by specifying `--load_precision fp16` without losing precision.
    - `--load_original_model_to` option can be used to specify the device to load the original model. `--load_tuned_model_to` option can be used to specify the device to load the derived model. The default is `cpu` for both options, but you can specify `cuda` etc. You can reduce the memory usage by loading one of them to GPU. This option is available only for SDXL.
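    - A minimal command sketch combining these options (the model and output paths, and the other extraction flags, are illustrative placeholders; keeping the original model on CPU while loading the tuned model to GPU reduces peak memory):

      ```bash
      python networks/extract_lora_from_models.py \
        --sdxl \
        --model_org /path/to/base_sdxl.safetensors \
        --model_tuned /path/to/finetuned_sdxl.safetensors \
        --save_to /path/to/extracted_lora.safetensors \
        --load_precision fp16 \
        --load_original_model_to cpu \
        --load_tuned_model_to cuda
      ```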
  - The gradient synchronization in LoRA training with multi-GPU is improved. PR #1064 Thanks to KohakuBlueleaf!
  - The code for Intel IPEX support is improved. PR #1060 Thanks to akx!
  - Fixed a bug in multi-GPU Textual Inversion training.
  - `.toml` example for network multiplier

    ```toml
    [general]

    [[datasets]]
    resolution = 512
    batch_size = 8
    network_multiplier = 1.0

    ... subset settings ...

    [[datasets]]
    resolution = 512
    batch_size = 8
    network_multiplier = -1.0

    ... subset settings ...
    ```
- Merge sd-scripts v0.8.1 code update
  - Fixed a bug that the VRAM usage without Text Encoder training is larger than before in training scripts for LoRA etc. (`train_network.py`, `sdxl_train_network.py`).
    - Text Encoders were not moved to CPU.
  - Fixed typos. Thanks to akx! PR #1053
What's Changed
- Update Chinese Documentation by @boombbo in #1896
- Change cudann to cuDNN by @EugeoSynthesisThirtyTwo in #1902
- v22.6.0 by @bmaltais in #1907
New Contributors
- @EugeoSynthesisThirtyTwo made their first contribution in #1902
Full Changelog: v22.5.0...v22.6.0
v22.5.0
- 2024/01/15 (v22.5.0)
- Merged sd-scripts v0.8.0 updates
  - Diffusers, Accelerate, Transformers and other related libraries have been updated. Please update the libraries with Upgrade.
  - Some model files (Text Encoder without position_id) based on the latest Transformers can be loaded.
  - `torch.compile` is supported (experimental). PR #1024 Thanks to p1atdev!
    - This feature works only on Linux or WSL.
    - Please specify the `--torch_compile` option in each training script.
    - You can select the backend with the `--dynamo_backend` option. The default is `"inductor"`. `inductor` or `eager` seems to work.
    - Please use the `--sdpa` option instead of the `--xformers` option.
    - PyTorch 2.1 or later is recommended.
    - Please see the PR for details.
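    - A minimal command sketch for trying `torch.compile` (the model path, dataset config, and LoRA settings are illustrative placeholders; other required options are omitted):

      ```bash
      accelerate launch train_network.py \
        --pretrained_model_name_or_path /path/to/model.safetensors \
        --dataset_config /path/to/dataset.toml \
        --network_module networks.lora \
        --torch_compile \
        --dynamo_backend inductor \
        --sdpa \
        --output_dir /path/to/output
      ```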
  - The session name for wandb can be specified with the `--wandb_run_name` option. PR #1032 Thanks to hopl1t!
  - IPEX library is updated. PR #1030 Thanks to Disty0!
  - Fixed a bug that the Diffusers format model cannot be saved.
- Fix LoRA config display after load that would sometimes hide some of the fields
What's Changed
- bugfix. check is_sdxl before adding sdxl params. by @aplio in #1835
- IPEX add DISABLE_VENV_LIBS env variable and use TCMalloc by @Disty0 in #1858
- v22.5.0 by @bmaltais in #1880
New Contributors
Full Changelog: v22.4.1...v22.5.0
v22.4.1
What's Changed
- Fix missing import in BLIP captioning by @paboum in #1821
- Update zh-TW localization by @hinablue in #1832
- Bump crate-ci/typos from 1.16.23 to 1.16.26 by @dependabot in #1831
- v22.4.1 by @bmaltais in #1834
New Contributors
Full Changelog: v22.4.0...v22.4.1
v22.4.0
- 2023/12/28 (v22.4.0)
- Fixed `tools/convert_diffusers20_original_sd.py` to work. Thanks to Disty0! PR #1016
- The issues in multi-GPU training are fixed. Thanks to Isotr0py! PR #989 and #1000
  - `--ddp_gradient_as_bucket_view` and `--ddp_bucket_view` options are added to `sdxl_train.py`. Please specify these options for multi-GPU training.
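  - A minimal launch sketch for multi-GPU SDXL training with these options (the `accelerate launch` arguments, paths, and remaining training options are illustrative placeholders):

    ```bash
    accelerate launch --multi_gpu --num_processes 2 sdxl_train.py \
      --ddp_gradient_as_bucket_view \
      --ddp_bucket_view \
      --pretrained_model_name_or_path /path/to/sdxl_model.safetensors \
      --dataset_config /path/to/dataset.toml \
      --output_dir /path/to/output
    ```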
- IPEX support is updated. Thanks to Disty0!
- Fixed the bug that the size of the bucket becomes less than `min_bucket_reso`. Thanks to Cauldrath! PR #1008
- `--sample_at_first` option is added to each training script. This option is useful to generate images at the first step, before training. Thanks to shirayu! PR #907
- `--ss` option is added to the sampling prompt in training. You can specify the scheduler for the sampling like `--ss euler_a`. Thanks to shirayu! PR #906
- `keep_tokens_separator` is added to the dataset config. This option is useful to keep (prevent from shuffling) the tokens in the captions. See #975 for details. Thanks to Linaqruf!
  - You can specify the separator with an option like `--keep_tokens_separator "|||"` or with `keep_tokens_separator: "|||"` in `.toml`. The tokens before `|||` are not shuffled.
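  - A minimal sketch of how the separator behaves (the caption text, paths, and remaining training options are illustrative placeholders):

    ```bash
    # Example caption file content:
    #   masterpiece, best quality ||| 1girl, sitting, outdoors
    # With the separator set to "|||", the tokens before it stay in place,
    # while the tokens after it may be shuffled by --shuffle_caption.
    accelerate launch train_network.py \
      --keep_tokens_separator "|||" \
      --shuffle_caption \
      --pretrained_model_name_or_path /path/to/model.safetensors \
      --dataset_config /path/to/dataset.toml \
      --network_module networks.lora \
      --output_dir /path/to/output
    ```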
- Attention processor hook is added. See #961 for details. Thanks to rockerBOO!
- The optimizer `PagedAdamW` is added. Thanks to xzuyn! PR #955
- NaN replacement in SDXL VAE is sped up. Thanks to liubo0902! PR #1009
- Fixed the path error in `finetune/make_captions.py`. Thanks to CjangCjengh! PR #986
What's Changed
New Contributors
Full Changelog: v22.3.1...v22.4.0
v22.3.1
What's Changed
- Add goto button to manual caption utility
- Add missing options for various LyCORIS training algorithms
- Refactor how fields are shown or hidden
- Made max value for network and convolution rank 512 except for LyCORIS/LoKr.
- Add goto page button by @binitt in #1755
- Update requirements.txt by @tjip1234 in #1765
- IPEX update to PyTorch 2.1 and Bundle-in MKL & DPCPP by @Disty0 in #1772
- Update docker-compose.yaml by @WellDone2094 in #1771
- Update lpw_stable_diffusion.py by @DevArqSangoi in #1782
- v22.3.1 by @bmaltais in #1787
New Contributors
- @binitt made their first contribution in #1755
- @tjip1234 made their first contribution in #1765
- @WellDone2094 made their first contribution in #1771
Full Changelog: v22.3.0...v22.3.1
v22.3.0
- 2023/12/06 (v22.3.0)
- Merge sd-scripts updates:
  - `finetune\tag_images_by_wd14_tagger.py` now supports a separator other than `,` with the `--caption_separator` option. Thanks to KohakuBlueleaf! PR #913
  - Min SNR Gamma with V-prediction (SD 2.1) is fixed. Thanks to feffy380! PR #934
    - See #673 for details.
  - `--min_diff` and `--clamp_quantile` options are added to `networks/extract_lora_from_models.py`. Thanks to wkpark! PR #936
    - The default values are the same as the previous version.
  - Deep Shrink hires fix is supported in `sdxl_gen_img.py` and `gen_img_diffusers.py`.
    - `--ds_timesteps_1` and `--ds_timesteps_2` options denote the timesteps of the Deep Shrink for the first and second stages.
    - `--ds_depth_1` and `--ds_depth_2` options denote the depth (block index) of the Deep Shrink for the first and second stages.
    - `--ds_ratio` option denotes the ratio of the Deep Shrink. `0.5` means half of the original latent size for the Deep Shrink.
    - `--dst1`, `--dst2`, `--dsd1`, `--dsd2` and `--dsr` prompt options are also available.
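    - A minimal generation sketch using these options (the model path, prompt, and the specific Deep Shrink values are illustrative placeholders, not recommended defaults):

      ```bash
      python sdxl_gen_img.py \
        --ckpt /path/to/sdxl_model.safetensors \
        --outdir /path/to/outputs \
        --prompt "a photo of a cat" \
        --ds_depth_1 3 \
        --ds_timesteps_1 650 \
        --ds_ratio 0.5
      ```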
- Add GLoRA support
What's Changed
Full Changelog: v22.2.2...v22.3.0
v22.2.2
- 2023/12/03 (v22.2.2)
- Update Lycoris module to 2.0.0 (https://github.com/KohakuBlueleaf/LyCORIS/blob/0006e2ffa05a48d8818112d9f70da74c0cd30b99/README.md)
- Update Lycoris merge and extract tools
- Remove an annoying warning about local pip modules that is not necessary.
- Adding support for LyCORIS presets
- Adding Support for LyCORIS Native Fine-Tuning
- Adding support for Lycoris Diag-OFT
What's Changed
- Add PagedAdamW32bit to Dropdown by @xzuyn in #1695
- Update Dockerfile by @julien-blanchon in #1717
- Bump crate-ci/typos from 1.16.21 to 1.16.23 by @dependabot in #1719
- v22.2.2 by @bmaltais in #1731
New Contributors
- @xzuyn made their first contribution in #1695
- @julien-blanchon made their first contribution in #1717
Full Changelog: v22.2.1...v22.2.2
v22.2.1
What's Changed
- Fix issue with `Debiased Estimation loss` not getting properly loaded from the json file.
- Fix the bug that it fails to load a VAE by @nattoheaven in #1688
New Contributors
- @nattoheaven made their first contribution in #1688
Full Changelog: v22.2.0...v22.2.1
v22.2.0
- 2023/11/15 (v22.2.0)
- sd-scripts code base update:
  - `sdxl_train.py` now supports different learning rates for each Text Encoder.
    - Example:
      - `--learning_rate 1e-6`: train U-Net only
      - `--train_text_encoder --learning_rate 1e-6`: train U-Net and two Text Encoders with the same learning rate (same as the previous version)
      - `--train_text_encoder --learning_rate 1e-6 --learning_rate_te1 1e-6 --learning_rate_te2 1e-6`: train U-Net and two Text Encoders with different learning rates
      - `--train_text_encoder --learning_rate 0 --learning_rate_te1 1e-6 --learning_rate_te2 1e-6`: train two Text Encoders only
      - `--train_text_encoder --learning_rate 1e-6 --learning_rate_te1 1e-6 --learning_rate_te2 0`: train U-Net and one Text Encoder only
      - `--train_text_encoder --learning_rate 0 --learning_rate_te1 0 --learning_rate_te2 1e-6`: train one Text Encoder only
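    - A minimal command sketch embedding one of the combinations above in a full invocation (paths and the remaining options are illustrative placeholders):

      ```bash
      accelerate launch sdxl_train.py \
        --pretrained_model_name_or_path /path/to/sdxl_model.safetensors \
        --dataset_config /path/to/dataset.toml \
        --train_text_encoder \
        --learning_rate 1e-6 \
        --learning_rate_te1 1e-6 \
        --learning_rate_te2 1e-6 \
        --output_dir /path/to/output
      ```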
  - `train_db.py` and `fine_tune.py` now support different learning rates for the Text Encoder. Specify with the `--learning_rate_te` option.
    - To train the Text Encoder with `fine_tune.py`, specify the `--train_text_encoder` option too. `train_db.py` trains the Text Encoder by default.
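    - A minimal command sketch for `fine_tune.py` (paths and the specific learning-rate values are illustrative placeholders):

      ```bash
      accelerate launch fine_tune.py \
        --pretrained_model_name_or_path /path/to/model.safetensors \
        --dataset_config /path/to/dataset.toml \
        --train_text_encoder \
        --learning_rate 1e-6 \
        --learning_rate_te 5e-7 \
        --output_dir /path/to/output
      ```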
  - Fixed the bug that Text Encoder is not trained when block lr is specified in `sdxl_train.py`.
  - Debiased Estimation loss is added to each training script. Thanks to sdbds!
    - Specify the `--debiased_estimation_loss` option to enable it. See PR #889 for details.
  - Training of Text Encoder is improved in `train_network.py` and `sdxl_train_network.py`. Thanks to KohakuBlueleaf! PR #895
  - The moving average of the loss is now displayed in the progress bar in each training script. Thanks to shirayu! PR #899
  - PagedAdamW32bit optimizer is supported. Specify `--optimizer_type=PagedAdamW32bit`. Thanks to xzuyn! PR #900
  - Other bug fixes and improvements.
- kohya_ss gui updates:
- Implement GUI support for SDXL finetune TE1 and TE2 training LR parameters and for the non-SDXL finetune TE training parameter
- Implement GUI support for Dreambooth TE LR parameter
- Implement Debiased Estimation loss at the bottom of the Advanced Parameters tab.
Full Changelog: v22.1.1...v22.2.0
v22.1.1
What's Changed
- Bump crate-ci/typos from 1.16.15 to 1.16.18 by @dependabot in #1595
- Update release # by @bmaltais in #1597
- fixed minor typo in README.md by @iamrohitanshu in #1631
- Update README.md by @LokiL in #1633
- Bump crate-ci/typos from 1.16.18 to 1.16.21 by @dependabot in #1650
- documentation: fix recommended parameter "train_unet_only" -> "network_train_unet_only" by @nylki in #1669
- Improved Environment Variable Handling for Enhanced Flexibility in TensorBoard Launch by @lcolok in #1664
- Add STARTUP_CMD env var and IPEXRUN support to gui.sh by @Disty0 in #1662
- Fix VAE being applied (for LoRA training) by @rockerBOO in #1656
New Contributors
- @dependabot made their first contribution in #1595
- @iamrohitanshu made their first contribution in #1631
- @LokiL made their first contribution in #1633
- @nylki made their first contribution in #1669
- @lcolok made their first contribution in #1664
- @rockerBOO made their first contribution in #1656
Full Changelog: v22.1.0...v22.1.1