Introduced a model store #805

Draft · wants to merge 1 commit into main from download-additional-metadata-files

Conversation

@engelmi (Member) commented Feb 13, 2025

Added a model store to standardize pulling, storing, and using models across the different repositories.

A key goal of this store is to support downloading multiple files, e.g. the chat template from ollama models or additional metadata from non-GGUF models. In addition, this is probably a first step towards enabling the use of safetensors (#642), where a model consists of multiple files.

The proposed structure is inspired by how the huggingface-cli stores its files and extends it for multi-source usage in ramalama.

The proposed storage structure looks like this after running

  • ramalama pull ollama://tinyllama
  • ramalama pull hf://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf
~/.local/share/ramalama/store
   |-- ollama
   |   |-- tinyllama
   |   |   |-- blobs
   |   |   |   |-- sha256:2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816
   |   |   |   |-- sha256:6331358be52a6ebc2fd0755a51ad1175734fd17a628ab5ea6897109396245362
   |   |   |   |-- sha256:af0ddbdaaa26f30d54d727f9dd944b76bdb926fdaf9a58f63f78c532f57c191f
   |   |   |-- refs
   |   |   |   |-- latest
   |   |   |-- snapshots
   |   |   |   |-- sha256:6331358be52a6ebc2fd0755a51ad1175734fd17a628ab5ea6897109396245362
   |   |   |   |   |-- chat_template -> ../../blobs/sha256:af0ddbdaaa26f30d54d727f9dd944b76bdb926fdaf9a58f63f78c532f57c191f
   |   |   |   |   |-- config.json -> ../../blobs/sha256:6331358be52a6ebc2fd0755a51ad1175734fd17a628ab5ea6897109396245362
   |   |   |   |   |-- tinyllama -> ../../blobs/sha256:2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816
   |-- huggingface
   |   |-- ibm-granite
   |   |   |-- granite-3b-code-base-2k-GGUF
   |   |   |   |-- blobs
   |   |   |   |   |-- sha256:c803a9bb910be0699501319140329ba0d2850aeb8827389e38f24e8370f04293
   |   |   |   |-- refs
   |   |   |   |   |-- latest
   |   |   |   |-- snapshots
   |   |   |   |   |-- sha256:c803a9bb910be0699501319140329ba0d2850aeb8827389e38f24e8370f04293
   |   |   |   |   |   |-- granite-3b-code-base.Q4_K_M.gguf -> ../../blobs/sha256:c803a9bb910be0699501319140329ba0d2850aeb8827389e38f24e8370f04293
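
For illustration, a minimal Python sketch of how such a snapshot directory could be materialized (the helper name link_snapshot_files is hypothetical; the actual implementation lives in ramalama/model_store.py):

import os


def link_snapshot_files(model_dir: str, snapshot_hash: str, files: dict[str, str]) -> None:
    # Create one symlink per snapshot file, each pointing at the
    # content-addressed blob that backs it.
    snapshot_dir = os.path.join(model_dir, "snapshots", snapshot_hash)
    os.makedirs(snapshot_dir, exist_ok=True)
    for filename, blob_hash in files.items():
        link = os.path.join(snapshot_dir, filename)
        if not os.path.islink(link):
            # Relative target, exactly as shown in the tree above.
            os.symlink(os.path.join("..", "..", "blobs", blob_hash), link)


# Mirrors the tinyllama snapshot shown above.
link_snapshot_files(
    os.path.expanduser("~/.local/share/ramalama/store/ollama/tinyllama"),
    "sha256:6331358be52a6ebc2fd0755a51ad1175734fd17a628ab5ea6897109396245362",
    {
        "chat_template": "sha256:af0ddbdaaa26f30d54d727f9dd944b76bdb926fdaf9a58f63f78c532f57c191f",
        "config.json": "sha256:6331358be52a6ebc2fd0755a51ad1175734fd17a628ab5ea6897109396245362",
        "tinyllama": "sha256:2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816",
    },
)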

Summary by Sourcery

New Features:

  • Add a model store to handle pulling, storing, and accessing models across repositories.

@sourcery-ai bot (Contributor) commented Feb 13, 2025

Reviewer's Guide by Sourcery

This pull request introduces a ModelStore class to standardize pulling, storing, and using models across different repositories. It refactors the Ollama and Huggingface classes to use the new ModelStore and updates the OCI and URL classes accordingly. The changes include handling multiple files per model, symlinking to the existing Ollama cache, and fetching checksums from the API.

Updated class diagram for ModelStore

classDiagram
    class ModelStore {
        - _store_base_path: Path
        - _model_name: str
        - _model_organization: str
        - _model_registry: ModelRegistry
        + store_path: str
        + model_name: str
        + model_organization: str
        + model_registry: ModelRegistry
        + model_base_directory: str
        + blob_directory: str
        + ref_directory: str
        + snapshot_directory: str
        + get_ref_file_path(model_tag: str) : str
        + get_snapshot_directory(hash: str) : str
        + get_blob_file_path(hash: str) : str
        + get_snapshot_file_path(hash: str, filename: str) : str
        + resolve_model_directory(model_tag: str) : str
        + ensure_directory_setup() : None
        + exists(model_tag: str) : bool
        + get_cached_files(model_tag: str) : Tuple[str, list[str], bool]
        + prepare_new_snapshot(model_tag: str, snapshot_hash: str, snapshot_files: list[SnapshotFile]) : None
        + new_snapshot(model_tag: str, snapshot_hash: str, snapshot_files: list[SnapshotFile]) : None
    }
    class SnapshotFile {
        + url: str
        + header: Dict
        + hash: str
        + name: str
        + should_show_progress: bool
        + should_verify_checksum: bool
        + required: bool
    }
    class ModelRegistry {
        + HUGGINGFACE
        + OLLAMA
        + OCI
        + URL
    }
    ModelStore -- ModelRegistry
    ModelStore -- SnapshotFile
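
Translating the diagram into Python, the data structures might look roughly like this (constructor shape, field defaults, and path composition are assumptions read off the diagram, not the PR's exact code):

from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Dict


class ModelRegistry(Enum):
    HUGGINGFACE = "huggingface"
    OLLAMA = "ollama"
    OCI = "oci"
    URL = "url"


@dataclass
class SnapshotFile:
    url: str        # where to fetch the file from
    header: Dict    # extra HTTP headers for the request
    hash: str       # expected sha256 digest of the blob
    name: str       # filename inside the snapshot directory
    should_show_progress: bool = False
    should_verify_checksum: bool = False
    required: bool = True


class ModelStore:
    def __init__(self, store_base_path: Path, name: str, organization: str, registry: ModelRegistry):
        self._store_base_path = store_base_path
        self._model_name = name
        self._model_organization = organization
        self._model_registry = registry

    @property
    def model_base_directory(self) -> str:
        # e.g. ~/.local/share/ramalama/store/huggingface/ibm-granite/granite-3b-code-base-2k-GGUF
        return str(self._store_base_path / self._model_registry.value / self._model_organization / self._model_name)

    @property
    def blob_directory(self) -> str:
        return str(Path(self.model_base_directory) / "blobs")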

Updated class diagram for Ollama Model

classDiagram
    class Ollama {
        - model: str
        - model_tag: str
        - directory: str
        - filename: str
        - store: ModelStore
        + __init__(model: str, store_path: str = "")
        + pull(debug: bool = False) : str
    }
    class Model {
        <<Abstract>>
        - model: str
        - type: str
        + login(args: any)
        + logout(args: any)
        + pull(args: any)
        + push(source: str, args: any)
        + is_symlink_to(file_path: str, target_path: str) : bool
        + garbage_collection(args: any)
        + setup_container(args: any)
        + exec_model_in_container(model_path: str, cmd_args: list[str], args: any)
        + build_exec_args_perplexity(args: any, model_path: str) : list[str]
        + check_name_and_container(args: any)
        + build_prompt(args: any) : str
        + execute_model(model_path: str, exec_args: list[str], args: any)
        + validate_args(args: any)
        + build_exec_args_serve(args: any, exec_model_path: str) : list[str]
        + execute_command(model_path: str, exec_args: list[str], args: any)
        + serve(args: any)
        + inspect(args: any)
        + get_model_registry(args: any) : str
    }
    Ollama -- ModelStore
    Ollama --|> Model

Updated class diagram for Huggingface Model

classDiagram
    class Huggingface {
        - model: str
        - model_tag: str
        - directory: str
        - filename: str
        - store: ModelStore
        - hf_cli_available: bool
        + __init__(model: str, store_path: str = "")
        + login(args: any)
        + logout(args: any)
        + pull(debug: bool = False) : str
        + push(source: str, args: any)
    }
    class Model {
        <<Abstract>>
        - model: str
        - type: str
        + login(args: any)
        + logout(args: any)
        + pull(args: any)
        + push(source: str, args: any)
        + is_symlink_to(file_path: str, target_path: str) : bool
        + garbage_collection(args: any)
        + setup_container(args: any)
        + exec_model_in_container(model_path: str, cmd_args: list[str], args: any)
        + build_exec_args_perplexity(args: any, model_path: str) : list[str]
        + check_name_and_container(args: any)
        + build_prompt(args: any) : str
        + execute_model(model_path: str, exec_args: list[str], args: any)
        + validate_args(args: any)
        + build_exec_args_serve(args: any, exec_model_path: str) : list[str]
        + execute_command(model_path: str, exec_args: list[str], args: any)
        + serve(args: any)
        + inspect(args: any)
        + get_model_registry(args: any) : str
    }
    Huggingface -- ModelStore
    Huggingface --|> Model

File-Level Changes

1. Introduced a ModelStore class to manage the storage and retrieval of models, including handling multiple files per model and symlinking to the existing Ollama cache.
     • Created a ModelStore class to handle model storage and retrieval.
     • Implemented a directory structure for storing blobs, refs, and snapshots.
     • Added methods for resolving model directories and checking for the existence of models.
     • Implemented caching and symlinking to the existing Ollama cache.
     • Added SnapshotFile and RefFile data structures to manage model files and references.
     • Added a ModelRegistry enum to represent the different model registries.
     • Added methods to download model files and verify their checksums.
     • Added methods to create symlinks to model files.
   Files: ramalama/ollama.py, ramalama/huggingface.py, ramalama/model.py, ramalama/common.py, ramalama/cli.py, ramalama/oci.py, ramalama/url.py, ramalama/http_client.py, ramalama/model_store.py

2. Refactored the Ollama class to use the new ModelStore for pulling models, including fetching manifest data, handling different layer types, and creating the necessary directories (a pull-flow sketch follows this section).
     • Modified the Ollama class to inherit from the Model class and use the ModelStore.
     • Removed the _local method and related logic for determining model paths.
     • Implemented the pull method to fetch manifest data, handle different layer types, and create the necessary directories.
     • Added logic to symlink to the existing Ollama cache if available.
     • Removed the init_pull, pull_config_blob, and pull_blob functions.
   Files: ramalama/ollama.py

3. Refactored the Huggingface class to use the new ModelStore for pulling models, including fetching checksums from the API and handling different file types.
     • Modified the Huggingface class to inherit from the Model class and use the ModelStore.
     • Implemented the pull method to fetch checksums from the API and handle different file types.
     • Added logic to use huggingface-cli to download models if available.
     • Removed the hf_pull and url_pull methods.
   Files: ramalama/huggingface.py

4. Modified the Model class to include a ModelStore instance and handle model tags.
     • Added a ModelStore instance to the Model class.
     • Added logic to handle model tags in the constructor.
     • Modified the __init__ method to accept a store_path argument.
   Files: ramalama/model.py

5. Updated the OCI and URL classes to use the new ModelStore.
     • Modified the OCI and URL classes to inherit from the Model class and use the ModelStore.
     • Updated the constructors to accept a store_path argument.
   Files: ramalama/oci.py, ramalama/url.py

6. Updated cli.py to pass the store path to the model constructors.
     • Modified the New function to pass the store path to the model constructors.
   Files: ramalama/cli.py
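
To make the Ollama pull flow in item 2 concrete, a rough sketch (the registry endpoint and layer media types follow Ollama's public registry conventions; error handling and blob downloads are omitted):

import json
import urllib.request


def fetch_manifest(name: str, tag: str = "latest") -> dict:
    # Ollama's registry speaks the OCI distribution protocol.
    url = f"https://registry.ollama.ai/v2/library/{name}/manifests/{tag}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.docker.distribution.manifest.v2+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


manifest = fetch_manifest("tinyllama")
for layer in manifest.get("layers", []):
    # Each layer type maps to a different snapshot file.
    if layer["mediaType"] == "application/vnd.ollama.image.model":
        print("model blob:", layer["digest"])
    elif layer["mediaType"] == "application/vnd.ollama.image.template":
        print("chat template blob:", layer["digest"])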


@engelmi (Member Author) commented Feb 13, 2025

@rhatdan @ericcurtin Could you have a brief look at this? It's still in draft, but I wanted to get your opinion on whether this is going in a good direction or not.

@engelmi force-pushed the download-additional-metadata-files branch from e2cd7ac to c3fc4fd on February 13, 2025 17:17
"""
h = hashlib.new("sha256")
h.update(to_hash.encode("utf-8"))
return f"sha256:{h.hexdigest()}"
Collaborator:

I like this function. Actually, I'm thinking about making a breaking change soon and changing the ':' character to '-' on the filesystem, like Ollama, just not for this PR. For one, ':' is an illegal character on some filesystems.

Member Author:

Switching ':' with '-' makes a lot of sense.
Since this PR would introduce a breaking change regarding the file storage anyway, I can include this as well.
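
For context, the excerpt above completed with an assumed signature, plus the ':' to '-' swap being discussed (both function names are illustrative, not the PR's):

import hashlib


def generate_sha256(to_hash: str) -> str:
    # The excerpt above, completed with a hypothetical signature.
    h = hashlib.new("sha256")
    h.update(to_hash.encode("utf-8"))
    return f"sha256:{h.hexdigest()}"


def to_filesystem_name(digest: str) -> str:
    # The swap discussed above: ':' is illegal on some filesystems,
    # so store blobs as sha256-<hex> instead of sha256:<hex>.
    return digest.replace(":", "-")


print(to_filesystem_name(generate_sha256("latest")))  # sha256-<hexdigest>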

- if e.code == HTTP_RANGE_NOT_SATISFIABLE: # "Range Not Satisfiable" error (file already downloaded)
-     return # No need to retry
+ # "Range Not Satisfiable" error (file already downloaded)
+ if e.code in [HTTP_RANGE_NOT_SATISFIABLE, HTTP_NOT_FOUND]:
Collaborator:

+1

Member Author:

Done in #818
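
For reference, the behavior the excerpt above encodes, as a self-contained sketch (the constants match the excerpt; the function shape is illustrative):

import urllib.error
import urllib.request

HTTP_RANGE_NOT_SATISFIABLE = 416
HTTP_NOT_FOUND = 404


def resume_download(url: str, already_have: int) -> bytes:
    # Ask the server for the remainder of the file; a 416 answer to a
    # Range request means the local copy is already complete.
    request = urllib.request.Request(url, headers={"Range": f"bytes={already_have}-"})
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as e:
        if e.code in [HTTP_RANGE_NOT_SATISFIABLE, HTTP_NOT_FOUND]:
            return b""  # nothing (more) to fetch, no need to retry
        raise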

- return True
- else:
- return False
+ return available("huggingface-cli")
Collaborator:

+1
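
(For reference, available() is presumably a plain PATH lookup; a minimal stand-in:)

import shutil


def available(cmd: str) -> bool:
    # A command is "available" if it resolves on PATH.
    return shutil.which(cmd) is not None


print(available("huggingface-cli"))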

HUGGINGFACE = "huggingface"
OLLAMA = "ollama"
OCI = "oci"
URL = "url"
Collaborator:

I could see the community adding even more here, such as s3://; just something to keep in mind. I wonder whether we should continue with https, etc. We do have to code up any protocol we support.

@engelmi (Member Author) commented Feb 14, 2025:

Good point! It's already different between the default https download and the huggingface-/ollama-cli and oci downloads: although they use https as well, the "library" is different. Maybe I can inject a class/callable, which specifies HOW to download, into the model store's new_snapshot function. Then this would be easily extensible.
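
A possible shape for that injection (names are hypothetical; this sketches the idea, not the PR's code):

from typing import Callable
import urllib.request

# A downloader only needs to know how to fetch one URL to one local path.
Downloader = Callable[[str, str], None]


def https_download(url: str, dest_path: str) -> None:
    urllib.request.urlretrieve(url, dest_path)


def new_snapshot(snapshot_files, download: Downloader = https_download) -> None:
    # The store stays agnostic about HOW files are fetched
    # (plain https, huggingface-cli, an oci client, s3, ...).
    for file in snapshot_files:  # each file: a SnapshotFile with .url and .name
        download(file.url, file.name)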

@ericcurtin (Collaborator) commented Feb 13, 2025

The code looks great to me so far, don't see any significant issues at the moment at a high-level.

Added a model store to standardize pulling, storing and using models
across the different repositories.

Signed-off-by: Michael Engel <[email protected]>
@engelmi force-pushed the download-additional-metadata-files branch from c3fc4fd to 5f826a6 on February 14, 2025 08:12
@ericcurtin (Collaborator):

Since this is a breaking change, I guess the main question is: do we want to break all users now on upgrade? I just want to be sure the benefits are worth it. If we gain some functional benefits, I think it's a good idea. What I'm less of a fan of is non-functional breaking changes like renaming "repos/models" -> "store". But if we need to break everything for functional reasons anyway, now is certainly the time to do renames, and the rename is fine.

I also think it's possible to do this in a non-breaking way in the existing ~/.local/share/ramalama/repos directory.

~/.local/share/ramalama/models/ is just a directory intended to be a more presentable way for humans and for "ramalama ls" to parse, etc. The metadata and messy implementation details were intended to go into the ~/.local/share/ramalama/repos/ directory.

It's worth passing this by @swarajpande5 also; he wrote a fair bit of this.

If we decide we are doing breaking changes, another one we should do is add the .gguf extension to the Ollama .gguf files; some tools refused to load them in the past without that file extension. Example that we ended up fixing vllm-side: vllm-project/vllm#7993

But just fair warning @engelmi: there are many tests and assumptions built up around the existing technique. This could be tough to get through CI; you will likely spend a lot of time massaging the tests here. If you do this in a non-breaking way, it will be less effort to code and have less impact on users when multi-file models such as the safetensors ones are enabled.

@engelmi (Member Author) commented Feb 14, 2025

> Since this is a breaking change I guess that's the main question, do we want to break all users now on upgrade? I just want to be sure the benefits are worth it. If we gain some functional benefits I think it's a good idea. What I'm less of a fan of is non-functional breaking changes like renaming "repos/models" -> "store". But if we need to break everything for functional reasons anyway, now is certainly the time to do renames and the rename is fine.

The "store" directory is just an intermediate for local development at the moment. It would probably be best to switch to the model directory, i.e. having ~/.local/share/ramalama/models/ollama/tinyllama/<blobs|refs|snapshots>. The repos directory would go into the respective blobs directory. It would be a breaking change nonetheless.
And I agree, there should be functional benefits for this. These are the benefits I can think of:

  • clear distinction between name, tag and organization of a model - also on a file/directory level (the repository, i.e. ollama or hf, is already used in the current implementation). This also helps during development, I think.
  • using model tags in the "refs" directory enables the user to easily switch between different versions. Ollama already provides tags and for huggingface one can use the commit, e.g. https://huggingface.co/ibm-granite/granite-3.0-8b-instruct/raw/257d6976020e06daa75f9b19d056a5e7590bf7fc/model-00001-of-00004.safetensors
    I think it would be possible possible to use tags with the current approach, but this would either require updating the symlinks in the models directory to point to the correct sha256 or encode it into the path in the models directory.
  • all files for one model and version are bundled together - so its harder to break things
  • easier cleanup, repairing and symlink checking for models
  • support for safetensors on a file storage level "out of the box" (not for running them, only to pull, store and provide these files) with the advantages of versioning and bundling
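
A possible ref-file format for the tag-switching point above (purely illustrative; the PR's RefFile structure may differ):

import json
import os


def write_ref(model_base: str, tag: str, snapshot_hash: str, files: list[str]) -> None:
    # refs/<tag> records which snapshot a human-readable tag points to,
    # plus the files that belong to that snapshot.
    os.makedirs(os.path.join(model_base, "refs"), exist_ok=True)
    with open(os.path.join(model_base, "refs", tag), "w") as ref_file:
        json.dump({"hash": snapshot_hash, "files": files}, ref_file)


def resolve_snapshot_dir(model_base: str, tag: str) -> str:
    # Switching versions is then just reading a different ref.
    with open(os.path.join(model_base, "refs", tag)) as ref_file:
        ref = json.load(ref_file)
    return os.path.join(model_base, "snapshots", ref["hash"])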

WDYT? @ericcurtin @swarajpande5

> I also think it's possible to do this in a non-breaking way in the existing ~/.local/share/ramalama/repos directory.
>
> ~/.local/share/ramalama/models/ is just a directory intended to be a more presentable way for humans and for "ramalama ls" to parse, etc. The metadata, messy implementation details were intended to go to ~/.local/share/ramalama/repos/ directory.
>
> It's worth passing this by @swarajpande5 also, he wrote a fair bit of this.

Yes, it should be possible to support multiple files with the current approach. I am wondering, though, if it makes things harder to understand and maintain, code-wise as well as in terms of the directories and files.
For example:
When a model has multiple files (incl. metadata), those should be stored in "repos", with only the model being linked from "models" to the "repos" directory, right? As a human inspecting the "models" directory, wouldn't I want to know which files are used? And if I add symlinks to the other files, how can I ensure the versions of those files stay the same? Introducing model tags/versions and encoding them into the path would also be a (smaller) breaking change. With an added "refs" directory, we'd have to keep track of all files for a version there.

The messy implementation details should live only in code, where we can encapsulate them (i.e. in the Ollama/HF/URL/etc. Model), and the file storage should be as standardized as possible, I think.

> If we decide we are doing breaking changes, another one we should do is add .gguf extension to the Ollama .gguf files, some tools refused to load them in the past without that file extension. Example that we ended up fixing vllm-side: vllm-project/vllm#7993

Sounds good to me!
In the model store class this should be simple: basically a check with is_model_gguf on the downloaded blob, combined with a check for the file extension on the model name.
Same for replacing ':' with '-', which can also be done in a central location of the store class.
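
Both rules sketched together (illustrative only; is_model_gguf here just checks the GGUF magic bytes):

def is_model_gguf(blob_path: str) -> bool:
    # GGUF files start with the ASCII magic "GGUF".
    with open(blob_path, "rb") as f:
        return f.read(4) == b"GGUF"


def snapshot_filename(model_name: str, blob_path: str) -> str:
    # Append .gguf when the blob is GGUF but the name lacks the extension.
    if is_model_gguf(blob_path) and not model_name.endswith(".gguf"):
        return model_name + ".gguf"
    return model_name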

> But just fair warning @engelmi there's many tests, assumptions built up around the existing technique. This could be tough to get through CI. You will likely spend a lot of time massaging the tests here, etc. If you do this in a non-breaking way it will be less effort to code and less impact on the users when multi-file models such as the safetensors one are enabled.

Yes, that is true and will be quite cumbersome to change.

@rhatdan (Member) commented Feb 17, 2025

I really like this PR, but it needs a rebase. We need to get this in ASAP, and we probably need a mechanism to upgrade people rather than break their stores.
