Synch `main` with `develop` #416
Merged
Synch `develop` with `main`
Sync `develop` with `main`
…306)
* Use generic task type references for prompt examples instead of kwargs. Drop Python 3.6 support.
* Use generic task type references for prompt examples instead of kwargs. Drop Python 3.6 support.
* Use span-specific example types in SpanTask.
* Fix Pydantic import.
* Fix Pydantic issue with SpanCoTExample.
* Fix import compat issue.
* Remove comment.
* Fix example generation.
* Fix Pydantic import.
* Update spacy_llm/ty.py
  Co-authored-by: Sofie Van Landeghem
* Drop BaseRELExample.
* Revert check_label_consistency call.
---------
Co-authored-by: Sofie Van Landeghem
Synch `develop` with `main`
Synch `develop` with `main`
Sync `develop` with `main`
Sync `dev` with `main`
* Add foundation for EL task.
* Add foundation for EL task.
* Continue building EL task.
* Fix merge conflicts. Finish zero-shot implementation.
* Fix tests.
* Create toy KB for EL tests.
* Fix LRU cache init.
* Fix remaining outdated paths.
* Fix Windows .csv issue (maybe).
* Debug Windows .csv issue.
* Debug Windows .csv issue.
* Clean up CSV parsing in CandidateSelector.
* entity_linking -> entity_linker.
* Migrate EL task to new task structure.
* Incorporate feedback.
* Incorporate feedback.
* Use standardized entity candidate representation.
* Warn instead of raise if entity ID isn't in descriptions file.
* Ignore entities without KB IDs set instead of ignoring the whole example.
* Fix typing import.
* Make entity description more robust.
* Make entity description more robust.
* Update default entity description text.
* Fix typing import.
* Remove mentions from Entity, rename.
* Fix Entity creation.
* Fix EL example creation.
* Update comment.
* Move CandidateSelector to initialize.
* Fix tests.
* Rename EL test file.
* Fix test import.
* Apply suggestions from code review
  Co-authored-by: Sofie Van Landeghem
* Rename to ELExample.
* Format. Rename to PipelineCandidateSelector.
* Incorporate feedback.
* Fix EL tests.
* Update spacy_llm/tests/tasks/test_entity_linker.py
  Co-authored-by: Sofie Van Landeghem
* Return str instead of Doc in ent highlighting.
* Use require function for candidate selector check.
* Move var for unavalable ent desc.
* Fix comment.
* Add ent highlighting unit test.
* Make EL component name configurable.
* Make entity description reader configurable.
* Move entities used for tests to file.
* Move aliases to file.
* Resolve file path.
* Rename to .yaml.
* Add usage example. Add NIL clause to template.
* Finish usage example.
* Fix tests.
* Allow NIL responses.
* Uncommented cfg_strings.
* Update spacy_llm/tests/tasks/test_entity_linker.py
  Co-authored-by: Sofie Van Landeghem
* Fix tests. Start adding auto-NIL check.
* Adjust data to auto-nil.
* Fix bugs, complete test.
* Fix LLM EL factory test.
* Fix EL tests.
* Fix REL tests.
* Remove redundant security check.
* Make entity descriptions optional.
* Simplify EL task init.
* Fix tests.
* Update spacy_llm/tasks/entity_linker/registry.py
  Co-authored-by: Sofie Van Landeghem
* Modify assertion.
* Fix usage tests.
* Simplify KB loader construction. Run usage example from KB data file.
* Update spacy_llm/tasks/entity_linker/task.py
  Co-authored-by: Sofie Van Landeghem
* Update usage_examples/el_openai/README.md
  Co-authored-by: Sofie Van Landeghem
* Update usage_examples/tests/test_usage_examples.py
  Co-authored-by: Sofie Van Landeghem
* Fix docstring. Fix EL factory test.
* Make embedding in EL data format optional.
* Fix processing of optional embeddings.
* Revert docstring change.
* Apply suggestions from code review
  Co-authored-by: Sofie Van Landeghem
* Apply suggestions from code review
* Rename KB loaders.
* Change comment style. More informative error message on malformed KB .yaml file.
* Add set_candidate_selector().
* Update spacy_llm/tests/pipeline/test_llm.py
  Co-authored-by: Sofie Van Landeghem
* Update spacy_llm/tasks/entity_linker/util.py
  Co-authored-by: Sofie Van Landeghem
* Update spacy_llm/tasks/entity_linker/task.py
  Co-authored-by: Sofie Van Landeghem
* Apply suggestions from code review
  Co-authored-by: Sofie Van Landeghem
* Incorporate review feedback.
* Add unit test for example with code only.
* Check for candidate_selector being set in .initialize().
---------
Co-authored-by: Sofie Van Landeghem
Sync `develop` with `main`
* Add context length info. Refactor BuiltinTask and models to facilitate this.
* Add token count estimator plumbing.
* Add plumbing for mapper and reducer.
* Add ShardMapper prototype.
* Integrating mapping into prompt generation workflow.
* Update response parsing and component to support sharding (WIP).
* Fix shard & prompt flow.
* Fix shard & prompt flow.
* Remove todo comments.
* Fix Anthropic, Cohere, NoOp model tests.
* Fix test_llm_pipe().
* Fix type checking test.
* Fix span parsing tests.
* Fix internal tests.
* Fix _CountTask.
* Fix sentiment and summarization tasks and tests.
* Fix Azure connection URL. Fix Model test pings.
* Fix Lemma parsing.
* Start work on doc-to-shard property copying.
* Fix REL doc preprocessing.
* Remove comment on doc attribute handling during sharding, as this is done by spaCy's slicing directly.
* Add reducer implementations.
* Implement outstanding task reducers.
* Add shardable/non-shardable LLM task typing distinction. Add support for handling both types of tasks. Update tests.
* Fix EL task.
* Fix EL tokenization and highlighting partially.
* Fix tokenization and whitespaces for EL task.
* Add new registry handlers (with context length and arbitrary model names) for all REST models.
* Add sharding test with simple count task.
* Fix sharding algorithm.
* Add test with simple count task.
* Add context length as init arg in HF models.
* Fix tests. Don't stringify IO lists if sharded.
* Fix tests.
* Add NER sharding test.
* Add REL and sentiment sharding tests.
* Add summary sharding tests.
* Add EL sharding task. Fix bug in shard mapper.
* Fix REL error with RELExample parsing.
* Use regex for punctuation in REL conversion.
* Maintain custom doc attributes, incl. test.
* Filter merge warnings in textcat reduction.
* Fix custom doc data merging.
* Update spacy_llm/models/langchain/model.py
  Co-authored-by: Sofie Van Landeghem
* Update spacy_llm/pipeline/llm.py
  Co-authored-by: Sofie Van Landeghem
* Incorporate feedback.
* Move sharding compatibility warning to component constructor.
* Update spacy_llm/tasks/entity_linker/util.py
  Co-authored-by: Sofie Van Landeghem
* Update spacy_llm/models/hf/base.py
  Co-authored-by: Sofie Van Landeghem
* Incorporate feedback.
* Fix doc string
---------
Co-authored-by: Sofie Van Landeghem
* Add context length info. Refactor BuiltinTask and models to facilitate this.
* Add token count estimator plumbing.
* Add plumbing for mapper and reducer.
* Add ShardMapper prototype.
* Integrating mapping into prompt generation workflow.
* Update response parsing and component to support sharding (WIP).
* Fix shard & prompt flow.
* Fix shard & prompt flow.
* Remove todo comments.
* Fix Anthropic, Cohere, NoOp model tests.
* Fix test_llm_pipe().
* Fix type checking test.
* Fix span parsing tests.
* Fix internal tests.
* Fix _CountTask.
* Fix sentiment and summarization tasks and tests.
* Fix Azure connection URL. Fix Model test pings.
* Fix Lemma parsing.
* Start work on doc-to-shard property copying.
* Fix REL doc preprocessing.
* Remove comment on doc attribute handling during sharding, as this is done by spaCy's slicing directly.
* Add reducer implementations.
* Implement outstanding task reducers.
* Add shardable/non-shardable LLM task typing distinction. Add support for handling both types of tasks. Update tests.
* Fix EL task.
* Fix EL tokenization and highlighting partially.
* Fix tokenization and whitespaces for EL task.
* Add new registry handlers (with context length and arbitrary model names) for all REST models.
* Add sharding test with simple count task.
* Fix sharding algorithm.
* Add test with simple count task.
* Add context length as init arg in HF models.
* Fix tests. Don't stringify IO lists if sharded.
* Fix tests.
* Add NER sharding test.
* Add REL and sentiment sharding tests.
* Add summary sharding tests.
* Add EL sharding task. Fix bug in shard mapper.
* Fix REL error with RELExample parsing.
* Use regex for punctuation in REL conversion.
* Maintain custom doc attributes, incl. test.
* Filter merge warnings in textcat reduction.
* Fix custom doc data merging.
* Add RawTask.
* Fix task version.
* Add sharding test.
* Update spacy_llm/models/langchain/model.py
  Co-authored-by: Sofie Van Landeghem
* Update spacy_llm/pipeline/llm.py
  Co-authored-by: Sofie Van Landeghem
* Incorporate feedback.
* Move sharding compatibility warning to component constructor.
* Update spacy_llm/tasks/entity_linker/util.py
  Co-authored-by: Sofie Van Landeghem
* Update spacy_llm/models/hf/base.py
  Co-authored-by: Sofie Van Landeghem
* Incorporate feedback.
* Update spacy_llm/tasks/raw/registry.py
  Co-authored-by: Sofie Van Landeghem
* Apply suggestions from code review
  Co-authored-by: Sofie Van Landeghem
* Fix tests.
* Remove boilerplate text in raw template.
* Fix sharding test.
---------
Co-authored-by: Sofie Van Landeghem
* Add context length info. Refactor BuiltinTask and models to facilitate this.
* Add token count estimator plumbing.
* Add plumbing for mapper and reducer.
* Add ShardMapper prototype.
* Integrating mapping into prompt generation workflow.
* Update response parsing and component to support sharding (WIP).
* Fix shard & prompt flow.
* Fix shard & prompt flow.
* Remove todo comments.
* Fix Anthropic, Cohere, NoOp model tests.
* Fix test_llm_pipe().
* Fix type checking test.
* Fix span parsing tests.
* Fix internal tests.
* Fix _CountTask.
* Fix sentiment and summarization tasks and tests.
* Fix Azure connection URL. Fix Model test pings.
* Fix Lemma parsing.
* Start work on doc-to-shard property copying.
* Fix REL doc preprocessing.
* Remove comment on doc attribute handling during sharding, as this is done by spaCy's slicing directly.
* Add reducer implementations.
* Implement outstanding task reducers.
* Add shardable/non-shardable LLM task typing distinction. Add support for handling both types of tasks. Update tests.
* Fix EL task.
* Fix EL tokenization and highlighting partially.
* Fix tokenization and whitespaces for EL task.
* Add new registry handlers (with context length and arbitrary model names) for all REST models.
* Add sharding test with simple count task.
* Fix sharding algorithm.
* Add test with simple count task.
* Add context length as init arg in HF models.
* Fix tests. Don't stringify IO lists if sharded.
* Fix tests.
* Add NER sharding test.
* Add REL and sentiment sharding tests.
* Add summary sharding tests.
* Add EL sharding task. Fix bug in shard mapper.
* Fix REL error with RELExample parsing.
* Use regex for punctuation in REL conversion.
* Maintain custom doc attributes, incl. test.
* Filter merge warnings in textcat reduction.
* Fix custom doc data merging.
* Add translation task.
* Fix init test.
* Make smoke check more forgiving w.r.t. translation content.
* Resolve merge conflicts.
* Fix test_llm_task_factories.
Sync `develop` with `main`
* Add logo. * Update readme. * Update radme with Azure link. * Update header. * Update title. * Update logos. * Update with new tasks, prompt sharding. * Revert code changes.
* Format. * Fix usage examples.
* Format. * Fix usage examples. * Disable schedule GPU tests.
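Several of the commits above refer to prompt sharding: a token count estimator, a shard mapper, per-task reducers, and a "simple count task" used in the sharding tests. For readers new to the idea, the following is a minimal, self-contained sketch of that map/reduce pattern. It is not spacy-llm's implementation: `estimate_tokens`, `shard_text`, `count_task` and `reduce_counts` are hypothetical names, the whitespace-based estimator is an assumption, and the "task" is a stand-in that never calls a model.

```python
from typing import Callable, Iterable, List


def estimate_tokens(text: str) -> int:
    """Hypothetical token count estimator: counts whitespace-separated words."""
    return len(text.split())


def shard_text(
    text: str,
    context_length: int,
    estimator: Callable[[str], int] = estimate_tokens,
) -> List[str]:
    """Map step: greedily pack words into shards that fit context_length."""
    if context_length < 1:
        raise ValueError("context_length must be positive")
    shards: List[str] = []
    current: List[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        # Close the current shard once adding another word would overflow it.
        if current and estimator(candidate) > context_length:
            shards.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        shards.append(" ".join(current))
    return shards


def count_task(shard: str) -> int:
    """Stand-in for a model call: returns the number of words in the shard."""
    return len(shard.split())


def reduce_counts(results: Iterable[int]) -> int:
    """Reduce step: merge per-shard results into one document-level result."""
    return sum(results)


text = "Prompt sharding splits a long document into pieces that fit the context window"
shards = shard_text(text, context_length=5)
total = reduce_counts(count_task(shard) for shard in shards)
```

With a context length of five words, the thirteen-word example text is split into three shards, and summing the per-shard counts recovers the count for the whole document. A real reducer has to merge richer outputs (entities, relations, summaries), which is presumably why the commits add reducers per task.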
Description
Synch `main` with `develop`.
Corresponding documentation PR
-
Types of change
Chore.
Checklist
* I ran all tests in `tests` and `usage_examples/tests`, and all new and existing tests passed. This includes
  * external tests (`pytest` ran with `--external`)
  * GPU tests (`pytest` ran with `--gpu`)