
Add support for loading single file CLIPEmbedding models #6813

Open

lstein wants to merge 7 commits into main

Conversation

@lstein (Collaborator) commented Sep 4, 2024

Summary

We're starting to see fine-tuned CLIPEmbedding models for improved FLUX performance appearing in the wild, for example https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14. However, these fine-tunes are distributed as single "checkpoint"-style files rather than as Transformers-compatible folders. This PR adds support for installing and loading these models.
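For context, loading such a single-file checkpoint means instantiating the text-encoder architecture from a known config and then loading the raw state dict. The sketch below is a hedged illustration, not this PR's actual code; the config repo, checkpoint filename, and key handling are assumptions:

    # Hedged sketch of single-file CLIP text-encoder loading (illustrative only).
    import safetensors.torch
    from transformers import CLIPTextConfig, CLIPTextModel

    # Build the architecture from a known ViT-L/14 text config...
    config = CLIPTextConfig.from_pretrained("openai/clip-vit-large-patch14")
    model = CLIPTextModel(config)

    # ...then load the checkpoint's raw weights. Key names in community
    # checkpoints vary, so remapping may be needed; strict=False tolerates that.
    state_dict = safetensors.torch.load_file("CLIP-GmP-ViT-L-14.safetensors")
    model.load_state_dict(state_dict, strict=False)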

Related Issues / Discussions

There is a problem with this implementation. The CLIP text embedder needs two models: the encoder and the tokenizer. When FLUX support was added to InvokeAI, these two models were grouped together under a single folder and are treated as two submodels (a loading sketch follows the tree):

└── clip-vit-large-patch14
    ├── text_encoder
    │   ├── config.json
    │   └── model.safetensors
    └── tokenizer
        ├── merges.txt
        ├── special_tokens_map.json
        ├── tokenizer_config.json
        └── vocab.json
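
Loading from this layout amounts to pointing each submodel loader at its subfolder, roughly as follows (a minimal sketch; the root path is illustrative):

    # Minimal sketch of loading the two submodels from the folder layout above.
    from transformers import CLIPTextModel, CLIPTokenizer

    root = "models/clip-vit-large-patch14"  # illustrative install path
    text_encoder = CLIPTextModel.from_pretrained(f"{root}/text_encoder")
    tokenizer = CLIPTokenizer.from_pretrained(f"{root}/tokenizer")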

However, the single-file format contains only the text encoder, not the auxiliary files the tokenizer needs. As a workaround, when the tokenizer of a single-file CLIPEmbed model is requested, I call CLIPTokenizer.from_pretrained() to download the tokenizer from the InvokeAI/clip-vit-large-patch14 HF repository. Once downloaded, it is cached in the HuggingFace cache, so subsequent accesses do not require the network. This is preferable to loading the tokenizer from the locally installed clip-vit-large-patch14 model because (1) there is no guarantee that model has been installed previously; and (2) doing so would be incredibly ugly, requiring the low-level loader to communicate with the high-level model manager.
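Roughly, the fallback looks like this (a minimal sketch of the behavior described above, not the exact loader code):

    # On first use this fetches the tokenizer files from the HF hub; afterwards
    # they are served from the local HuggingFace cache with no network access.
    from transformers import CLIPTokenizer

    tokenizer = CLIPTokenizer.from_pretrained("InvokeAI/clip-vit-large-patch14")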

The main downside is that the first time the tokenizer is needed, the backend will hit the network, which is something we are trying to avoid (see PR #6740).

QA Instructions

Use the model manager tab to install one of the "HF" format CLIPTextModel models located at https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14. Try to render with it. The Tokenizer and TextEncoder should load and run successfully.

Merge Plan

Merge when approved.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • Documentation added / updated (if applicable)

@github-actions bot added the python (PRs that change python files) and backend (PRs that change backend files) labels Sep 4, 2024
@RahulVadisetty91 left a comment

You need to add the following lines to register the new model configurations:

    # Entries for the tagged union of model configs (location assumed:
    # the AnyModelConfig union in the model manager's config module).
    Annotated[SpandrelImageToImageConfig, SpandrelImageToImageConfig.get_tag()],
    Annotated[T2IAdapterConfig, T2IAdapterConfig.get_tag()],
