The following are the various components of this project:

- `modified_llama`: Llama2 modified to allow extraction of the context vectors.
- `generate_context_vectors.py`: Uses `modified_llama` to extract the context vectors from articles and stores them using the `cv_storage` library (see below). Check the arguments to the `main` function for the available options, such as input files and output folders.
- `wikipedia_parser`: Reads files generated by https://github.com/mlabs-haskell/wikipedia_parser/.
- `indexed_binary_db`: A binary database that consists of an index file and a data file. The index file stores the span (start, end) of each entry in the data file, along with some metadata. The index is meant to be small enough to load quickly into memory, so an entry can be located by searching the index by metadata, reading its span, and then loading the entry itself from the data file.
- `cv_storage`: Efficiently stores context vectors, queryable by article and section names. Uses `indexed_binary_db` under the hood.
- `cv_library`: Generates lower-fidelity versions of a context vector for fast searching. Analogous to mipmaps in 3D rendering (https://en.wikipedia.org/wiki/Mipmap).
- `cv_hier_storage`: Uses the lower-fidelity versions of the context vectors to quickly compare against an input context vector and find the closest match. Uses `indexed_binary_db` under the hood.
- `query_generator`: Generates LLM prompts from an article to help with context vector generation.
- `generate_hier_cv_db.py`: Reads an instance of `cv_storage` and uses `cv_library` to generate an instance of `cv_hier_storage`.
- Run `just tests` to run the module-level tests.
- Run `just cv_hier_db_test_e2e` to test the `cv_hier_storage` module by generating an instance and running some sanity checks on it.
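The index/data-file split behind `indexed_binary_db` can be sketched as follows. This is a hypothetical illustration only: the JSON index, the function names, and the `[start, end)` span convention are assumptions for clarity, not the module's real on-disk format.

```python
import json

# Hypothetical sketch of the indexed_binary_db idea: a small index file maps
# a metadata key to the [start, end) byte span of that entry in a separate
# data file. (Illustrative format, not the module's actual layout.)

def write_db(entries, data_path, index_path):
    """Write each payload to the data file and record its span in the index."""
    index = {}
    with open(data_path, "wb") as data:
        for key, payload in entries.items():
            start = data.tell()
            data.write(payload)
            index[key] = (start, data.tell())  # span is [start, end)
    with open(index_path, "w") as f:
        json.dump(index, f)

def read_entry(key, data_path, index_path):
    """Load the small index, look up the span, then read only that entry."""
    with open(index_path) as f:
        index = json.load(f)
    start, end = index[key]
    with open(data_path, "rb") as data:
        data.seek(start)
        return data.read(end - start)
```

The point of the split is that only the small index ever has to be read in full; the (potentially huge) data file is touched with a single seek-and-read per lookup.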
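The mipmap analogy behind `cv_library` can be illustrated with a simple averaging pyramid: each level halves the vector, so coarse comparisons touch far fewer numbers. Pairwise averaging is an assumption here; `cv_library`'s actual downsampling scheme may differ.

```python
# Hypothetical mipmap-style pyramid for a context vector: level 0 is the
# full-fidelity vector, each subsequent level averages adjacent pairs.
# (Illustrative scheme; not necessarily what cv_library does.)

def downsample(vec):
    """Average adjacent pairs: length n -> n // 2 (n assumed even)."""
    return [(vec[i] + vec[i + 1]) / 2 for i in range(0, len(vec) - 1, 2)]

def build_pyramid(vec, levels):
    """Return [full vector, half-size vector, quarter-size vector, ...]."""
    pyramid = [vec]
    for _ in range(levels):
        vec = downsample(vec)
        pyramid.append(vec)
    return pyramid
```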
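In the same spirit, the coarse-to-fine lookup that `cv_hier_storage` performs could look like the sketch below: rank all candidates cheaply at the coarsest level, keep only the best few, and rerank the survivors at full fidelity. The two-stage prune-then-rerank strategy, the function names, and the `keep` parameter are assumptions for illustration, not the module's actual algorithm.

```python
# Hypothetical coarse-to-fine nearest-match search over vector pyramids
# (each pyramid: index 0 = full fidelity, index -1 = coarsest level).

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def hier_search(query_pyramid, candidate_pyramids, keep=4):
    """Prune candidates at the coarsest level, then rerank at full fidelity."""
    coarse_q = query_pyramid[-1]
    ranked = sorted(
        candidate_pyramids,
        key=lambda name: sq_dist(candidate_pyramids[name][-1], coarse_q),
    )
    survivors = ranked[:keep]
    return min(
        survivors,
        key=lambda name: sq_dist(candidate_pyramids[name][0], query_pyramid[0]),
    )
```

The coarse pass does most of the elimination on small vectors; only the handful of survivors pay the cost of a full-fidelity comparison.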