mlabs-haskell/rigel

Rigel

The following are the various components of this project:

  • modified_llama
    Llama2 modified to allow extraction of the context vectors.

  • generate_context_vectors.py
    Use modified_llama to extract the context vectors from articles and store them using the cv_storage library (see below).
    Check the arguments to the main function for the available options, such as input files and output folders.

  • wikipedia_parser
    Read files generated by https://github.com/mlabs-haskell/wikipedia_parser/

  • indexed_binary_db
    A binary database that consists of an index file and a data file.
    The index file stores the span (start, end) of each entry in the data file, along with some metadata. The index is meant to be small, so that it can be loaded into memory quickly. An entry is looked up by searching the index on its metadata to find its span, and that span is then used to load the actual entry from the data file.

  • cv_storage
    Efficiently store context vectors, queryable by article and section names. Uses indexed_binary_db under the hood.

  • cv_library
    Generate lower-fidelity versions of a context vector for fast searching.
    Analogous to mipmaps in 3D rendering (https://en.wikipedia.org/wiki/Mipmap).

  • cv_hier_storage
    Use the lower-fidelity versions of the context vectors to quickly compare an input context vector and find the closest match. Uses indexed_binary_db under the hood.

  • query_generator
    Generate LLM prompts from an article to help with context vector generation.

  • generate_hier_cv_db.py
    Read an instance of cv_storage and use cv_library to generate an instance of cv_hier_storage
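
The index-plus-data layout described under indexed_binary_db can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the JSON index format, the file names, and the function names write_db and read_entry are hypothetical and are not Rigel's actual on-disk format or API.

```python
import json
import os
import tempfile

# Hypothetical layout, NOT Rigel's real format: the data file is a plain
# concatenation of entry bytes, and the index file is a JSON list of
# {"meta": ..., "start": ..., "end": ...} records.


def write_db(entries, index_path, data_path):
    """Write (metadata_dict, payload_bytes) pairs and record each span."""
    index = []
    with open(data_path, "wb") as data_file:
        for meta, payload in entries:
            start = data_file.tell()
            data_file.write(payload)
            index.append({"meta": meta, "start": start, "end": data_file.tell()})
    with open(index_path, "w", encoding="utf-8") as index_file:
        json.dump(index, index_file)


def read_entry(index_path, data_path, **wanted):
    """Return the payload of the first entry whose metadata matches `wanted`."""
    with open(index_path, encoding="utf-8") as index_file:
        index = json.load(index_file)  # the index is small, so load it whole
    for record in index:
        if all(record["meta"].get(key) == value for key, value in wanted.items()):
            with open(data_path, "rb") as data_file:
                data_file.seek(record["start"])  # jump straight to the span
                return data_file.read(record["end"] - record["start"])
    return None


def demo():
    """Round-trip two entries through temporary files and fetch the second."""
    with tempfile.TemporaryDirectory() as tmp:
        index_path = os.path.join(tmp, "db.index")
        data_path = os.path.join(tmp, "db.data")
        write_db(
            [
                ({"article": "A", "section": "Intro"}, b"first"),
                ({"article": "A", "section": "History"}, b"second payload"),
            ],
            index_path,
            data_path,
        )
        return read_entry(index_path, data_path, article="A", section="History")
```

Only the index is scanned during lookup; the data file is touched once, with a single seek and read of the matching span.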

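The mipmap analogy behind cv_library and cv_hier_storage can be sketched as a coarse-to-fine search. This is an illustrative sketch under loud assumptions: it treats a context vector as a flat list of floats, downsamples by averaging adjacent pairs, and compares with Euclidean distance. The names downsample, build_levels, and closest are hypothetical, and the real modules may use a different reduction, metric, and data shape.

```python
import math


def downsample(vector):
    """Halve the length by averaging adjacent pairs (an odd tail is kept as-is)."""
    halved = [(vector[i] + vector[i + 1]) / 2 for i in range(0, len(vector) - 1, 2)]
    if len(vector) % 2:
        halved.append(vector[-1])
    return halved


def build_levels(vector, min_len=2):
    """Return [full, half, quarter, ...] until the length is at most min_len."""
    min_len = max(1, min_len)  # guard: a length-1 vector cannot shrink further
    levels = [list(vector)]
    while len(levels[-1]) > min_len:
        levels.append(downsample(levels[-1]))
    return levels


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest(query, stored, keep=2):
    """Rank all candidates at the coarsest level, keep the best `keep`,
    then re-rank only that shortlist at full resolution.

    Assumes every stored vector has the same length as the query.
    """
    query_levels = build_levels(query)
    stored_levels = {name: build_levels(vec) for name, vec in stored.items()}
    coarse_ranked = sorted(
        stored_levels,
        key=lambda name: distance(query_levels[-1], stored_levels[name][-1]),
    )
    shortlist = coarse_ranked[:keep]
    return min(
        shortlist,
        key=lambda name: distance(query_levels[0], stored_levels[name][0]),
    )
```

The coarse pass is a heuristic: a candidate pruned at low fidelity is never reconsidered, so a larger `keep` trades speed for a lower chance of missing the true nearest match.
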
Tests

  • Run "just tests" to run the module-level tests.
  • Run "just cv_hier_db_test_e2e" to test the cv_hier_storage module end to end by generating an instance and running some sanity checks on it.
