Releases: bmaltais/kohya_ss

v22.6.0

27 Jan 19:03
62fbae6
  • 2024/01/27 (v22.6.0)
  • Merge sd-scripts v0.8.3 code update

    • Fixed a bug where training crashes when --fp8_base is specified with --save_state. PR #1079 Thanks to feffy380!
      • safetensors is updated. Please see the Upgrade section and update the library.
    • Fixed a bug where training crashes when network_multiplier is specified with multi-GPU training. PR #1084 Thanks to fireicewolf!
    • Fixed a bug where training crashes when training ControlNet-LLLite.
  • Merge sd-scripts v0.8.2 code update

    • [Experimental] The --fp8_base option is added to the training scripts for LoRA etc. The base model (U-Net, and Text Encoder when training modules for Text Encoder) can be trained with fp8. PR #1057 Thanks to KohakuBlueleaf!

      • Please specify --fp8_base in train_network.py or sdxl_train_network.py (see the example command at the end of this list).
      • PyTorch 2.1 or later is required.
      • If you use xformers with PyTorch 2.1, please see the xformers repository and install the appropriate version for your CUDA version.
      • The sample image generation during training consumes a lot of memory. It is recommended to turn it off.
    • [Experimental] The network multiplier can be specified for each dataset in the training scripts for LoRA etc.

      • This is an experimental option and may be removed or changed in the future.
      • For example, if you train state A with a multiplier of 1.0 and state B with -1.0, you may be able to switch between states A and B by changing the LoRA application rate at generation time.
      • Similarly, if you prepare five states and train them with multipliers of 0.2, 0.4, 0.6, 0.8, and 1.0, you may be able to blend between the states smoothly by varying the application rate.
      • Please specify network_multiplier under [[datasets]] in the .toml file.
    • Some options are added to networks/extract_lora_from_models.py to reduce the memory usage.

      • --load_precision option can be used to specify the precision when loading the model. If the model is saved in fp16, you can reduce the memory usage by specifying --load_precision fp16 without losing precision.
      • --load_original_model_to option can be used to specify the device to load the original model. --load_tuned_model_to option can be used to specify the device to load the derived model. The default is cpu for both options, but you can specify cuda etc. You can reduce the memory usage by loading one of them to the GPU. These options are available only for SDXL.
    • The gradient synchronization in LoRA training with multi-GPU is improved. PR #1064 Thanks to KohakuBlueleaf!

    • The code for Intel IPEX support is improved. PR #1060 Thanks to akx!

    • Fixed a bug in multi-GPU Textual Inversion training.

    • .toml example for network multiplier

      [general]
      [[datasets]]
      resolution = 512
      batch_size = 8
      network_multiplier = 1.0
      
      ... subset settings ...
      
      [[datasets]]
      resolution = 512
      batch_size = 8
      network_multiplier = -1.0
      
      ... subset settings ...
  • Merge sd-scripts v0.8.1 code update

    • Fixed a bug where VRAM usage without Text Encoder training was larger than before in the training scripts for LoRA etc. (train_network.py, sdxl_train_network.py).

      • Text Encoders were not moved to CPU.
    • Fixed typos. Thanks to akx! PR #1053
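
  • Illustrative command for --fp8_base (not part of the original notes): a minimal sketch of a LoRA training run with sdxl_train_network.py; the paths and all flags other than --fp8_base are assumptions chosen for the example.

      # sketch only: paths are placeholders, and sample image generation is
      # left off, as recommended above
      accelerate launch sdxl_train_network.py \
        --pretrained_model_name_or_path /path/to/sdxl_base.safetensors \
        --dataset_config /path/to/dataset.toml \
        --output_dir /path/to/output \
        --network_module networks.lora \
        --mixed_precision bf16 \
        --fp8_base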

Full Changelog: v22.5.0...v22.6.0

v22.5.0

16 Jan 00:21
bfe8b06
  • 2024/01/15 (v22.5.0)
  • Merged sd-scripts v0.8.0 updates
    • Diffusers, Accelerate, Transformers and other related libraries have been updated. Please see the Upgrade section and update the libraries.
      • Some model files (Text Encoder without position_id) based on the latest Transformers can be loaded.
    • torch.compile is supported (experimental). PR #1024 Thanks to p1atdev!
      • This feature works only on Linux or WSL.
      • Please specify the --torch_compile option in each training script (see the example command at the end of this list).
      • You can select the backend with the --dynamo_backend option. The default is "inductor". Either inductor or eager seems to work.
      • Please use the --sdpa option instead of the --xformers option.
      • PyTorch 2.1 or later is recommended.
      • Please see PR for details.
    • The session name for wandb can be specified with --wandb_run_name option. PR #1032 Thanks to hopl1t!
    • IPEX library is updated. PR #1030 Thanks to Disty0!
    • Fixed a bug where Diffusers-format models could not be saved.
  • Fix LoRA config display after load that would sometimes hide some of the fields
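
  • Illustrative torch.compile command (a sketch, not part of the original notes): only --torch_compile, --dynamo_backend, and --sdpa are taken from the notes above; the paths and the remaining flags are assumptions for the example.

      # sketch only: Linux or WSL, PyTorch 2.1 or later recommended
      accelerate launch train_network.py \
        --pretrained_model_name_or_path /path/to/model.safetensors \
        --dataset_config /path/to/dataset.toml \
        --output_dir /path/to/output \
        --network_module networks.lora \
        --torch_compile --dynamo_backend inductor \
        --sdpa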

Full Changelog: v22.4.1...v22.5.0

v22.4.1

03 Jan 01:33
842d9c7

Full Changelog: v22.4.0...v22.4.1

v22.4.0

28 Dec 19:07
89cfc46
  • 2023/12/28 (v22.4.0)
  • Fixed tools/convert_diffusers20_original_sd.py so that it works again. Thanks to Disty0! PR #1016
  • The issues in multi-GPU training are fixed. Thanks to Isotr0py! PR #989 and #1000
    • --ddp_gradient_as_bucket_view and --ddp_bucket_view options are added to sdxl_train.py. Please specify these options for multi-GPU training.
  • IPEX support is updated. Thanks to Disty0!
  • Fixed the bug where the bucket size could become smaller than min_bucket_reso. Thanks to Cauldrath! PR #1008
  • --sample_at_first option is added to each training script. This option is useful to generate images at the first step, before training. Thanks to shirayu! PR #907
  • --ss option is added to the sampling prompt in training. You can specify the scheduler for the sampling like --ss euler_a. Thanks to shirayu! PR #906
  • keep_tokens_separator is added to the dataset config. This option is useful to keep (prevent from shuffling) the tokens in the captions. See #975 for details. Thanks to Linaqruf!
    • You can specify the separator with an option like --keep_tokens_separator "|||" or with keep_tokens_separator = "|||" in .toml. The tokens before ||| are not shuffled (see the example command at the end of this list).
  • Attention processor hook is added. See #961 for details. Thanks to rockerBOO!
  • The optimizer PagedAdamW is added. Thanks to xzuyn! PR #955
  • NaN replacement in SDXL VAE is sped up. Thanks to liubo0902! PR #1009
  • Fixed the path error in finetune/make_captions.py. Thanks to CjangCjengh! PR #986
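
  • Illustrative command combining keep_tokens_separator and --sample_at_first (a sketch, not part of the original notes): those two flags and the --ss prompt option come from the notes above; the paths, the prompts file, and the remaining flags are assumptions for the example.

      # sketch only: paths are placeholders
      accelerate launch train_network.py \
        --pretrained_model_name_or_path /path/to/model.safetensors \
        --dataset_config /path/to/dataset.toml \
        --network_module networks.lora \
        --keep_tokens_separator "|||" \
        --sample_at_first \
        --sample_prompts /path/to/prompts.txt
      # a line in prompts.txt can end with --ss euler_a to select the sampler for that sample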

Full Changelog: v22.3.1...v22.4.0

v22.3.1

21 Dec 00:08
41009ae

What's Changed

  • Add goto button to manual caption utility
  • Add missing options for various LyCORIS training algorithms
  • Refactor how fields are shown or hidden
  • Made the max value for network and convolution rank 512, except for LyCORIS/LoKr.
  • Add goto page button by @binitt in #1755
  • Update requirements.txt by @tjip1234 in #1765
  • IPEX update to PyTorch 2.1 and Bundle-in MKL & DPCPP by @Disty0 in #1772
  • Update docker-compose.yaml by @WellDone2094 in #1771
  • Update lpw_stable_diffusion.py by @DevArqSangoi in #1782
  • v22.3.1 by @bmaltais in #1787

Full Changelog: v22.3.0...v22.3.1

v22.3.0

06 Dec 22:49
b3ea59c
  • 2023/12/06 (v22.3.0)
  • Merge sd-scripts updates:
    • finetune/tag_images_by_wd14_tagger.py now supports separators other than , via the --caption_separator option. Thanks to KohakuBlueleaf! PR #913
    • Min SNR Gamma with V-prediction (SD 2.1) is fixed. Thanks to feffy380! PR #934
      • See #673 for details.
    • --min_diff and --clamp_quantile options are added to networks/extract_lora_from_models.py. Thanks to wkpark! PR #936
      • The default values are the same as in the previous version.
    • Deep Shrink hires fix is supported in sdxl_gen_img.py and gen_img_diffusers.py (see the example command at the end of this list).
      • --ds_timesteps_1 and --ds_timesteps_2 options denote the timesteps of the Deep Shrink for the first and second stages.
      • --ds_depth_1 and --ds_depth_2 options denote the depth (block index) of the Deep Shrink for the first and second stages.
      • --ds_ratio option denotes the ratio of the Deep Shrink. 0.5 means half of the original latent size for the Deep Shrink.
      • --dst1, --dst2, --dsd1, --dsd2 and --dsr prompt options are also available.
    • Add GLoRA support
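
  • Illustrative Deep Shrink invocation (a sketch, not part of the original notes): the --ds_* flags come from the notes above; the checkpoint, prompt, and numeric values are assumptions for the example.

      # sketch only: paths, prompt, and values are placeholders
      python sdxl_gen_img.py \
        --ckpt /path/to/sdxl_model.safetensors \
        --prompt "a landscape photo, best quality" \
        --ds_depth_1 3 --ds_timesteps_1 650 \
        --ds_ratio 0.5
      # the corresponding prompt options are --dsd1, --dst1 and --dsr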

Full Changelog: v22.2.2...v22.3.0

v22.2.2

05 Dec 00:50
8fb0b31

2023/12/03 (v22.2.2)

Full Changelog: v22.2.1...v22.2.2

v22.2.1

16 Nov 16:23

What's Changed

  • Fix issue with Debiased Estimation loss not getting properly loaded from json file.
  • Fix the bug where it fails to load a VAE, by @nattoheaven in #1688

Full Changelog: v22.2.0...v22.2.1

v22.2.0

16 Nov 00:06
  • 2023/11/15 (v22.2.0)
  • sd-scripts code base update:
    • sdxl_train.py now supports different learning rates for each Text Encoder.

      • Example:
        • --learning_rate 1e-6: train U-Net only
        • --train_text_encoder --learning_rate 1e-6: train U-Net and two Text Encoders with the same learning rate (same as the previous version)
        • --train_text_encoder --learning_rate 1e-6 --learning_rate_te1 1e-6 --learning_rate_te2 1e-6: train U-Net and two Text Encoders with different learning rates
        • --train_text_encoder --learning_rate 0 --learning_rate_te1 1e-6 --learning_rate_te2 1e-6: train two Text Encoders only
        • --train_text_encoder --learning_rate 1e-6 --learning_rate_te1 1e-6 --learning_rate_te2 0: train U-Net and one Text Encoder only
        • --train_text_encoder --learning_rate 0 --learning_rate_te1 0 --learning_rate_te2 1e-6: train one Text Encoder only
    • train_db.py and fine_tune.py now support different learning rates for Text Encoder. Specify with --learning_rate_te option.

      • To train the Text Encoder with fine_tune.py, also specify the --train_text_encoder option (see the example command at the end of this list). train_db.py trains the Text Encoder by default.
    • Fixed the bug where the Text Encoder is not trained when block lr is specified in sdxl_train.py.

    • Debiased Estimation loss is added to each training script. Thanks to sdbds!

      • Specify --debiased_estimation_loss option to enable it. See PR #889 for details.
    • Training of Text Encoder is improved in train_network.py and sdxl_train_network.py. Thanks to KohakuBlueleaf! PR #895

    • The moving average of the loss is now displayed in the progress bar in each training script. Thanks to shirayu! PR #899

    • PagedAdamW32bit optimizer is supported. Specify --optimizer_type=PagedAdamW32bit. Thanks to xzuyn! PR #900

    • Other bug fixes and improvements.

  • kohya_ss gui updates:
    • Implement GUI support for SDXL finetune TE1 and TE2 training LR parameters and for the non-SDXL finetune TE training parameter
    • Implement GUI support for the Dreambooth TE LR parameter
    • Implement Debiased Estimation loss at the bottom of the Advanced Parameters tab.
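
  • Illustrative fine_tune.py command using the Text Encoder learning rate and Debiased Estimation loss (a sketch, not part of the original notes): --train_text_encoder, --learning_rate_te, --debiased_estimation_loss, and --optimizer_type come from the notes above; the paths and learning-rate values are assumptions for the example.

      # sketch only: paths and values are placeholders
      accelerate launch fine_tune.py \
        --pretrained_model_name_or_path /path/to/model.safetensors \
        --dataset_config /path/to/dataset.toml \
        --train_text_encoder \
        --learning_rate 1e-6 --learning_rate_te 5e-7 \
        --debiased_estimation_loss \
        --optimizer_type PagedAdamW32bit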

Full Changelog: v22.1.1...v22.2.0

v22.1.1

11 Nov 12:49

Full Changelog: v22.1.0...v22.1.1