Introduced a model store #805

Draft · wants to merge 1 commit into main from download-additional-metadata-files

Conversation

@engelmi (Member) commented Feb 13, 2025

Added a model store to standardize pulling, storing, and using models across the different repositories.

A key goal of this store is to support downloading multiple files, e.g. the chat template from ollama models or additional metadata from non-GGUF models. In addition, this is probably a first step towards enabling the use of safetensors (#642), where a model consists of multiple files.

The proposed structure is inspired by how the huggingface-cli stores its files and extends it for multi-source usage in ramalama.

The proposed storage structure looks like this after running

  • ramalama pull ollama://tinyllama
  • ramalama pull hf://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf
~/.local/share/ramalama/store
   |-- ollama
   |   |-- tinyllama
   |   |   |-- blobs
   |   |   |   |-- sha256:2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816
   |   |   |   |-- sha256:6331358be52a6ebc2fd0755a51ad1175734fd17a628ab5ea6897109396245362
   |   |   |   |-- sha256:af0ddbdaaa26f30d54d727f9dd944b76bdb926fdaf9a58f63f78c532f57c191f
   |   |   |-- refs
   |   |   |   |-- latest
   |   |   |-- snapshots
   |   |   |   |-- sha256:6331358be52a6ebc2fd0755a51ad1175734fd17a628ab5ea6897109396245362
   |   |   |   |   |-- chat_template -> ../../blobs/sha256:af0ddbdaaa26f30d54d727f9dd944b76bdb926fdaf9a58f63f78c532f57c191f
   |   |   |   |   |-- config.json -> ../../blobs/sha256:6331358be52a6ebc2fd0755a51ad1175734fd17a628ab5ea6897109396245362
   |   |   |   |   |-- tinyllama -> ../../blobs/sha256:2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816
   |-- huggingface
   |   |-- ibm-granite
   |   |   |-- granite-3b-code-base-2k-GGUF
   |   |   |   |-- blobs
   |   |   |   |   |-- sha256:c803a9bb910be0699501319140329ba0d2850aeb8827389e38f24e8370f04293
   |   |   |   |-- refs
   |   |   |   |   |-- latest
   |   |   |   |-- snapshots
   |   |   |   |   |-- sha256:c803a9bb910be0699501319140329ba0d2850aeb8827389e38f24e8370f04293
   |   |   |   |   |   |-- granite-3b-code-base.Q4_K_M.gguf -> ../../blobs/sha256:c803a9bb910be0699501319140329ba0d2850aeb8827389e38f24e8370f04293
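
For illustration, a minimal Python sketch of how such a snapshot directory could be materialized (the helper name link_snapshot_files is hypothetical; the actual implementation lives in ramalama/model_store.py):

import os


def link_snapshot_files(model_dir: str, snapshot_hash: str, files: dict[str, str]) -> None:
    # Create one symlink per snapshot file, each pointing at the
    # content-addressed blob that backs it.
    snapshot_dir = os.path.join(model_dir, "snapshots", snapshot_hash)
    os.makedirs(snapshot_dir, exist_ok=True)
    for filename, blob_hash in files.items():
        link = os.path.join(snapshot_dir, filename)
        if not os.path.islink(link):
            # Relative target, exactly as shown in the tree above.
            os.symlink(os.path.join("..", "..", "blobs", blob_hash), link)


# Mirrors the tinyllama snapshot shown above.
link_snapshot_files(
    os.path.expanduser("~/.local/share/ramalama/store/ollama/tinyllama"),
    "sha256:6331358be52a6ebc2fd0755a51ad1175734fd17a628ab5ea6897109396245362",
    {
        "chat_template": "sha256:af0ddbdaaa26f30d54d727f9dd944b76bdb926fdaf9a58f63f78c532f57c191f",
        "config.json": "sha256:6331358be52a6ebc2fd0755a51ad1175734fd17a628ab5ea6897109396245362",
        "tinyllama": "sha256:2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816",
    },
)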

Summary by Sourcery

New Features:

  • Add a model store to handle pulling, storing, and accessing models across repositories.

@sourcery-ai bot (Contributor) commented Feb 13, 2025

Reviewer's Guide by Sourcery

This pull request introduces a ModelStore class to standardize pulling, storing, and using models across different repositories. It refactors the Ollama and Huggingface classes to use the new ModelStore and updates the OCI and URL classes accordingly. The changes include handling multiple files per model, symlinking to the existing Ollama cache, and fetching checksums from the API.

Updated class diagram for ModelStore

classDiagram
    class ModelStore {
        - _store_base_path: Path
        - _model_name: str
        - _model_organization: str
        - _model_registry: ModelRegistry
        + store_path: str
        + model_name: str
        + model_organization: str
        + model_registry: ModelRegistry
        + model_base_directory: str
        + blob_directory: str
        + ref_directory: str
        + snapshot_directory: str
        + get_ref_file_path(model_tag: str) : str
        + get_snapshot_directory(hash: str) : str
        + get_blob_file_path(hash: str) : str
        + get_snapshot_file_path(hash: str, filename: str) : str
        + resolve_model_directory(model_tag: str) : str
        + ensure_directory_setup() : None
        + exists(model_tag: str) : bool
        + get_cached_files(model_tag: str) : Tuple[str, list[str], bool]
        + prepare_new_snapshot(model_tag: str, snapshot_hash: str, snapshot_files: list[SnapshotFile]) : None
        + new_snapshot(model_tag: str, snapshot_hash: str, snapshot_files: list[SnapshotFile]) : None
    }
    class SnapshotFile {
        + url: str
        + header: Dict
        + hash: str
        + name: str
        + should_show_progress: bool
        + should_verify_checksum: bool
        + required: bool
    }
    class ModelRegistry {
        + HUGGINGFACE
        + OLLAMA
        + OCI
        + URL
    }
    ModelStore -- ModelRegistry
    ModelStore -- SnapshotFile
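
Translating the diagram into Python, the data structures might look roughly like this (constructor shape, field defaults, and path composition are assumptions read off the diagram, not the PR's exact code):

from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Dict


class ModelRegistry(Enum):
    HUGGINGFACE = "huggingface"
    OLLAMA = "ollama"
    OCI = "oci"
    URL = "url"


@dataclass
class SnapshotFile:
    url: str        # where to fetch the file from
    header: Dict    # extra HTTP headers for the request
    hash: str       # expected sha256 digest of the blob
    name: str       # filename inside the snapshot directory
    should_show_progress: bool = False
    should_verify_checksum: bool = False
    required: bool = True


class ModelStore:
    def __init__(self, store_base_path: Path, name: str, organization: str, registry: ModelRegistry):
        self._store_base_path = store_base_path
        self._model_name = name
        self._model_organization = organization
        self._model_registry = registry

    @property
    def model_base_directory(self) -> str:
        # e.g. ~/.local/share/ramalama/store/huggingface/ibm-granite/granite-3b-code-base-2k-GGUF
        return str(self._store_base_path / self._model_registry.value / self._model_organization / self._model_name)

    @property
    def blob_directory(self) -> str:
        return str(Path(self.model_base_directory) / "blobs")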

Updated class diagram for Ollama Model

classDiagram
    class Ollama {
        - model: str
        - model_tag: str
        - directory: str
        - filename: str
        - store: ModelStore
        + __init__(model: str, store_path: str = "")
        + pull(debug: bool = False) : str
    }
    class Model {
        <<Abstract>>
        - model: str
        - type: str
        + login(args: any)
        + logout(args: any)
        + pull(args: any)
        + push(source: str, args: any)
        + is_symlink_to(file_path: str, target_path: str) : bool
        + garbage_collection(args: any)
        + setup_container(args: any)
        + exec_model_in_container(model_path: str, cmd_args: list[str], args: any)
        + build_exec_args_perplexity(args: any, model_path: str) : list[str]
        + check_name_and_container(args: any)
        + build_prompt(args: any) : str
        + execute_model(model_path: str, exec_args: list[str], args: any)
        + validate_args(args: any)
        + build_exec_args_serve(args: any, exec_model_path: str) : list[str]
        + execute_command(model_path: str, exec_args: list[str], args: any)
        + serve(args: any)
        + inspect(args: any)
        + get_model_registry(args: any) : str
    }
    Ollama -- ModelStore
    Ollama --|> Model

Updated class diagram for Huggingface Model

classDiagram
    class Huggingface {
        - model: str
        - model_tag: str
        - directory: str
        - filename: str
        - store: ModelStore
        - hf_cli_available: bool
        + __init__(model: str, store_path: str = "")
        + login(args: any)
        + logout(args: any)
        + pull(debug: bool = False) : str
        + push(source: str, args: any)
    }
    class Model {
        <<Abstract>>
        - model: str
        - type: str
        + login(args: any)
        + logout(args: any)
        + pull(args: any)
        + push(source: str, args: any)
        + is_symlink_to(file_path: str, target_path: str) : bool
        + garbage_collection(args: any)
        + setup_container(args: any)
        + exec_model_in_container(model_path: str, cmd_args: list[str], args: any)
        + build_exec_args_perplexity(args: any, model_path: str) : list[str]
        + check_name_and_container(args: any)
        + build_prompt(args: any) : str
        + execute_model(model_path: str, exec_args: list[str], args: any)
        + validate_args(args: any)
        + build_exec_args_serve(args: any, exec_model_path: str) : list[str]
        + execute_command(model_path: str, exec_args: list[str], args: any)
        + serve(args: any)
        + inspect(args: any)
        + get_model_registry(args: any) : str
    }
    Huggingface -- ModelStore
    Huggingface --|> Model

File-Level Changes

1. Introduced a ModelStore class to manage the storage and retrieval of models, including handling multiple files per model and symlinking to the existing Ollama cache.
     • Created a ModelStore class to handle model storage and retrieval.
     • Implemented a directory structure for storing blobs, refs, and snapshots.
     • Added methods for resolving model directories and checking for the existence of models.
     • Implemented caching and symlinking to the existing Ollama cache.
     • Added SnapshotFile and RefFile data structures to manage model files and references.
     • Added a ModelRegistry enum to represent the different model registries.
     • Added methods to download model files and verify their checksums.
     • Added methods to create symlinks to model files.
   Files: ramalama/ollama.py, ramalama/huggingface.py, ramalama/model.py, ramalama/common.py, ramalama/cli.py, ramalama/oci.py, ramalama/url.py, ramalama/http_client.py, ramalama/model_store.py

2. Refactored the Ollama class to use the new ModelStore for pulling models, including fetching manifest data, handling different layer types, and creating the necessary directories (a pull-flow sketch follows this section).
     • Modified the Ollama class to inherit from the Model class and use the ModelStore.
     • Removed the _local method and related logic for determining model paths.
     • Implemented the pull method to fetch manifest data, handle different layer types, and create the necessary directories.
     • Added logic to symlink to the existing Ollama cache if available.
     • Removed the init_pull, pull_config_blob, and pull_blob functions.
   Files: ramalama/ollama.py

3. Refactored the Huggingface class to use the new ModelStore for pulling models, including fetching checksums from the API and handling different file types.
     • Modified the Huggingface class to inherit from the Model class and use the ModelStore.
     • Implemented the pull method to fetch checksums from the API and handle different file types.
     • Added logic to use huggingface-cli to download models if available.
     • Removed the hf_pull and url_pull methods.
   Files: ramalama/huggingface.py

4. Modified the Model class to include a ModelStore instance and handle model tags.
     • Added a ModelStore instance to the Model class.
     • Added logic to handle model tags in the constructor.
     • Modified the __init__ method to accept a store_path argument.
   Files: ramalama/model.py

5. Updated the OCI and URL classes to use the new ModelStore.
     • Modified the OCI and URL classes to inherit from the Model class and use the ModelStore.
     • Updated the constructors to accept a store_path argument.
   Files: ramalama/oci.py, ramalama/url.py

6. Updated cli.py to pass the store path to the model constructors.
     • Modified the New function to pass the store path to the model constructors.
   Files: ramalama/cli.py
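
To make the Ollama pull flow in item 2 concrete, a rough sketch (the registry endpoint and layer media types follow Ollama's public registry conventions; error handling and blob downloads are omitted):

import json
import urllib.request


def fetch_manifest(name: str, tag: str = "latest") -> dict:
    # Ollama's registry speaks the OCI distribution protocol.
    url = f"https://registry.ollama.ai/v2/library/{name}/manifests/{tag}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.docker.distribution.manifest.v2+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


manifest = fetch_manifest("tinyllama")
for layer in manifest.get("layers", []):
    # Each layer type maps to a different snapshot file.
    if layer["mediaType"] == "application/vnd.ollama.image.model":
        print("model blob:", layer["digest"])
    elif layer["mediaType"] == "application/vnd.ollama.image.template":
        print("chat template blob:", layer["digest"])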


@engelmi (Member Author) commented Feb 13, 2025

@rhatdan @ericcurtin Could you have a brief look at this? It's still in draft, but I wanted to get your opinion on whether this is going in a good direction or not.

@engelmi force-pushed the download-additional-metadata-files branch from e2cd7ac to c3fc4fd on February 13, 2025 17:17
"""
h = hashlib.new("sha256")
h.update(to_hash.encode("utf-8"))
return f"sha256:{h.hexdigest()}"
Collaborator:

I like this function. Actually, I'm thinking about making a breaking change soon and changing the ':' character to '-' on the filesystem, like Ollama, just not for this PR. For one, ':' is an illegal character on some filesystems.

Member Author:

Switching ':' with '-' makes a lot of sense.
Since this PR would introduce a breaking change regarding the file storage anyway, I can include this as well.
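
For context, the excerpt above completed with an assumed signature, plus the ':' to '-' swap being discussed (both function names are illustrative, not the PR's):

import hashlib


def generate_sha256(to_hash: str) -> str:
    # The excerpt above, completed with a hypothetical signature.
    h = hashlib.new("sha256")
    h.update(to_hash.encode("utf-8"))
    return f"sha256:{h.hexdigest()}"


def to_filesystem_name(digest: str) -> str:
    # The swap discussed above: ':' is illegal on some filesystems,
    # so store blobs as sha256-<hex> instead of sha256:<hex>.
    return digest.replace(":", "-")


print(to_filesystem_name(generate_sha256("latest")))  # sha256-<hexdigest>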

- if e.code == HTTP_RANGE_NOT_SATISFIABLE: # "Range Not Satisfiable" error (file already downloaded)
-     return # No need to retry
+ # "Range Not Satisfiable" error (file already downloaded)
+ if e.code in [HTTP_RANGE_NOT_SATISFIABLE, HTTP_NOT_FOUND]:
Collaborator:

+1

Member Author:

Done in #818
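
For reference, the behavior the excerpt above encodes, as a self-contained sketch (the constants match the excerpt; the function shape is illustrative):

import urllib.error
import urllib.request

HTTP_RANGE_NOT_SATISFIABLE = 416
HTTP_NOT_FOUND = 404


def resume_download(url: str, already_have: int) -> bytes:
    # Ask the server for the remainder of the file; a 416 answer to a
    # Range request means the local copy is already complete.
    request = urllib.request.Request(url, headers={"Range": f"bytes={already_have}-"})
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as e:
        if e.code in [HTTP_RANGE_NOT_SATISFIABLE, HTTP_NOT_FOUND]:
            return b""  # nothing (more) to fetch, no need to retry
        raise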

- return True
- else:
- return False
+ return available("huggingface-cli")
Collaborator:

+1
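
(For reference, available() is presumably a plain PATH lookup; a minimal stand-in:)

import shutil


def available(cmd: str) -> bool:
    # A command is "available" if it resolves on PATH.
    return shutil.which(cmd) is not None


print(available("huggingface-cli"))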

HUGGINGFACE = "huggingface"
OLLAMA = "ollama"
OCI = "oci"
URL = "url"
Collaborator:

I could see the community adding even more here, such as s3://; just something to keep in mind. I wonder whether we should continue with https, etc. We do have to code up any protocol we support.

@engelmi (Member Author) commented Feb 14, 2025:

Good point! It's already different between the default https download and the huggingface-/ollama-cli and oci downloads: although they use https as well, the "library" is different. Maybe I can inject a class/callable, which specifies HOW to download, into the model store's new_snapshot function. Then this would be easily extensible.
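
A possible shape for that injection (names are hypothetical; this sketches the idea, not the PR's code):

from typing import Callable
import urllib.request

# A downloader only needs to know how to fetch one URL to one local path.
Downloader = Callable[[str, str], None]


def https_download(url: str, dest_path: str) -> None:
    urllib.request.urlretrieve(url, dest_path)


def new_snapshot(snapshot_files, download: Downloader = https_download) -> None:
    # The store stays agnostic about HOW files are fetched
    # (plain https, huggingface-cli, an oci client, s3, ...).
    for file in snapshot_files:  # each file: a SnapshotFile with .url and .name
        download(file.url, file.name)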

@ericcurtin (Collaborator) commented Feb 13, 2025

The code looks great to me so far, don't see any significant issues at the moment at a high-level.

Added a model store to standardize pulling, storing and using models
across the different repositories.

Signed-off-by: Michael Engel <[email protected]>
@engelmi force-pushed the download-additional-metadata-files branch from c3fc4fd to 5f826a6 on February 14, 2025 08:12
@ericcurtin (Collaborator):

Since this is a breaking change, I guess the main question is: do we want to break all users now on upgrade? I just want to be sure the benefits are worth it. If we gain some functional benefits, I think it's a good idea. What I'm less of a fan of is non-functional breaking changes like renaming "repos/models" -> "store". But if we need to break everything for functional reasons anyway, now is certainly the time to do renames, and the rename is fine.

I also think it's possible to do this in a non-breaking way in the existing ~/.local/share/ramalama/repos directory.

~/.local/share/ramalama/models/ is just a directory intended to be a more presentable way for humans and for "ramalama ls" to parse, etc. The metadata and messy implementation details were intended to go into the ~/.local/share/ramalama/repos/ directory.

It's worth passing this by @swarajpande5 also; he wrote a fair bit of this.

If we decide we are doing breaking changes, another one we should do is add the .gguf extension to the Ollama .gguf files; some tools refused to load them in the past without that file extension. Example that we ended up fixing vllm-side: vllm-project/vllm#7993

But just fair warning @engelmi: there are many tests and assumptions built up around the existing technique. This could be tough to get through CI; you will likely spend a lot of time massaging the tests here. If you do this in a non-breaking way, it will be less effort to code and have less impact on users when multi-file models such as the safetensors ones are enabled.

@engelmi (Member Author) commented Feb 14, 2025

> Since this is a breaking change I guess that's the main question, do we want to break all users now on upgrade? I just want to be sure the benefits are worth it. If we gain some functional benefits I think it's a good idea. What I'm less of a fan of is non-functional breaking changes like renaming "repos/models" -> "store". But if we need to break everything for functional reasons anyway, now is certainly the time to do renames and the rename is fine.

The "store" directory is just an intermediate for local development at the moment. It would probably be best to switch to the model directory, i.e. having ~/.local/share/ramalama/models/ollama/tinyllama/<blobs|refs|snapshots>. The repos directory would go into the respective blobs directory. It would be a breaking change nonetheless.
And I agree, there should be functional benefits for this. These are the benefits I can think of:

  • clear distinction between name, tag and organization of a model - also on a file/directory level (the repository, i.e. ollama or hf, is already used in the current implementation). This also helps during development, I think.
  • using model tags in the "refs" directory enables the user to easily switch between different versions. Ollama already provides tags and for huggingface one can use the commit, e.g. https://huggingface.co/ibm-granite/granite-3.0-8b-instruct/raw/257d6976020e06daa75f9b19d056a5e7590bf7fc/model-00001-of-00004.safetensors
    I think it would be possible possible to use tags with the current approach, but this would either require updating the symlinks in the models directory to point to the correct sha256 or encode it into the path in the models directory.
  • all files for one model and version are bundled together - so its harder to break things
  • easier cleanup, repairing and symlink checking for models
  • support for safetensors on a file storage level "out of the box" (not for running them, only to pull, store and provide these files) with the advantages of versioning and bundling
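
A possible ref-file format for the tag-switching point above (purely illustrative; the PR's RefFile structure may differ):

import json
import os


def write_ref(model_base: str, tag: str, snapshot_hash: str, files: list[str]) -> None:
    # refs/<tag> records which snapshot a human-readable tag points to,
    # plus the files that belong to that snapshot.
    os.makedirs(os.path.join(model_base, "refs"), exist_ok=True)
    with open(os.path.join(model_base, "refs", tag), "w") as ref_file:
        json.dump({"hash": snapshot_hash, "files": files}, ref_file)


def resolve_snapshot_dir(model_base: str, tag: str) -> str:
    # Switching versions is then just reading a different ref.
    with open(os.path.join(model_base, "refs", tag)) as ref_file:
        ref = json.load(ref_file)
    return os.path.join(model_base, "snapshots", ref["hash"])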

WDYT? @ericcurtin @swarajpande5

> I also think it's possible to do this in a non-breaking way in the existing ~/.local/share/ramalama/repos directory.
>
> ~/.local/share/ramalama/models/ is just a directory intended to be a more presentable way for humans and for "ramalama ls" to parse, etc. The metadata, messy implementation details were intended to go to ~/.local/share/ramalama/repos/ directory.
>
> It's worth passing this by @swarajpande5 also, he wrote a fair bit of this.

Yes, it should be possible to support multiple files with the current approach. I am wondering, though, if it makes things harder to understand and maintain, code-wise as well as in terms of the directories and files.
For example:
When a model has multiple files (incl. metadata), those should be stored in "repos", with only the model being linked from "models" to the "repos" directory, right? As a human inspecting the "models" directory, wouldn't I want to know which files are used? And if I add symlinks to the other files, how can I ensure the versions of those files stay the same? Introducing model tags/versions and encoding them into the path would also be a (smaller) breaking change. With an added "refs" directory, we'd have to keep track of all files for a version there.

The messy implementation details should live only in code, where we can encapsulate them (i.e. in the Ollama/HF/URL/etc. Model), and the file storage should be as standardized as possible, I think.

> If we decide we are doing breaking changes, another one we should do is add .gguf extension to the Ollama .gguf files, some tools refused to load them in the past without that file extension. Example that we ended up fixing vllm-side: vllm-project/vllm#7993

Sounds good to me!
In the model store class this should be simple: basically a check with is_model_gguf on the downloaded blob, combined with a check for the file extension on the model name.
Same for replacing ':' with '-', which can also be done in a central location of the store class.
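
Both rules sketched together (illustrative only; is_model_gguf here just checks the GGUF magic bytes):

def is_model_gguf(blob_path: str) -> bool:
    # GGUF files start with the ASCII magic "GGUF".
    with open(blob_path, "rb") as f:
        return f.read(4) == b"GGUF"


def snapshot_filename(model_name: str, blob_path: str) -> str:
    # Append .gguf when the blob is GGUF but the name lacks the extension.
    if is_model_gguf(blob_path) and not model_name.endswith(".gguf"):
        return model_name + ".gguf"
    return model_name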

> But just fair warning @engelmi there's many tests, assumptions built up around the existing technique. This could be tough to get through CI. You will likely spend a lot of time massaging the tests here, etc. If you do this in a non-breaking way it will be less effort to code and less impact on the users when multi-file models such as the safetensors one are enabled.

Yes, that is true and will be quite cumbersome to change.

@rhatdan (Member) commented Feb 17, 2025

I really like this PR, but it needs a rebase. We need to get this in ASAP, and we probably need a mechanism to upgrade people rather than break their stores.
