Synch `main` with `develop` #416

rmitsch · 2024-01-19T09:56:27Z

Description

Synch main with develop.

Corresponding documentation PR

-

Types of change

Chore.

Checklist

I confirm that I have the right to submit this contribution under the project's MIT license.
I ran all tests in `tests` and `usage_examples/tests`, and all new and existing tests passed. This includes:
- all external tests (i.e. `pytest` run with `--external`)
- all tests requiring a GPU (i.e. `pytest` run with `--gpu`)
My changes don't require a change to the documentation, or if they do, I've added all required information.

…306)
* Use generic task type references for prompt examples instead of kwargs. Drop Python 3.6 support.
* Use generic task type references for prompt examples instead of kwargs. Drop Python 3.6 support.
* Use span-specific example types in SpanTask.
* Fix Pydantic import.
* Fix Pydantic issue with SpanCoTExample.
* Fix import compat issue.
* Remove comment.
* Fix example generation.
* Fix Pydantic import.
* Update spacy_llm/ty.py Co-authored-by: Sofie Van Landeghem <[email protected]>
* Drop BaseRELExample.
* Revert check_label_consistency call.

---------
Co-authored-by: Sofie Van Landeghem <[email protected]>

* Add foundation for EL task.
* Add foundation for EL task.
* Continue building EL task.
* Fix merge conflicts. Finish zero-shot implementation.
* Fix tests.
* Create toy KB for EL tests.
* Fix LRU cache init.
* Fix remaining outdated paths.
* Fix Windows .csv issue (maybe).
* Debug Windows .csv issue.
* Debug Windows .csv issue.
* Clean up CSV parsing in CandidateSelector.
* entity_linking -> entity_linker.
* Migrate EL task to new task structure.
* Incorporate feedback.
* Incorporate feedback.
* Use standardized entity candidate representation.
* Warn instead of raise if entity ID isn't in descriptions file.
* Ignore entities without KB IDs set instead of ignoring the whole example.
* Fix typing import.
* Make entity description more robust.
* Make entity description more robust.
* Update default entity description text.
* Fix typing import.
* Remove mentions from Entity, rename.
* Fix Entity creation.
* Fix EL example creation.
* Update comment.
* Move CandidateSelector to initialize.
* Fix tests.
* Rename EL test file.
* Fix test import.
* Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <[email protected]>
* Rename to ELExample.
* Format. Rename to PipelineCandidateSelector.
* Incorporate feedback.
* Fix EL tests.
* Update spacy_llm/tests/tasks/test_entity_linker.py Co-authored-by: Sofie Van Landeghem <[email protected]>
* Return str instead of Doc in ent highlighting.
* Use require function for candidate selector check.
* Move var for unavailable ent desc.
* Fix comment.
* Add ent highlighting unit test.
* Make EL component name configurable.
* Make entity description reader configurable.
* Move entities used for tests to file.
* Move aliases to file.
* Resolve file path.
* Rename to .yaml.
* Add usage example. Add NIL clause to template.
* Finish usage example.
* Fix tests.
* Allow NIL responses.
* Uncommented cfg_strings.
* Update spacy_llm/tests/tasks/test_entity_linker.py Co-authored-by: Sofie Van Landeghem <[email protected]>
* Fix tests. Start adding auto-NIL check.
* Adjust data to auto-nil.
* Fix bugs, complete test.
* Fix LLM EL factory test.
* Fix EL tests.
* Fix REL tests.
* Remove redundant security check.
* Make entity descriptions optional.
* Simplify EL task init.
* Fix tests.
* Update spacy_llm/tasks/entity_linker/registry.py Co-authored-by: Sofie Van Landeghem <[email protected]>
* Modify assertion.
* Fix usage tests.
* Simplify KB loader construction. Run usage example from KB data file.
* Update spacy_llm/tasks/entity_linker/task.py Co-authored-by: Sofie Van Landeghem <[email protected]>
* Update usage_examples/el_openai/README.md Co-authored-by: Sofie Van Landeghem <[email protected]>
* Update usage_examples/tests/test_usage_examples.py Co-authored-by: Sofie Van Landeghem <[email protected]>
* Fix docstring. Fix EL factory test.
* Make embedding in EL data format optional.
* Fix processing of optional embeddings.
* Revert docstring change.
* Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <[email protected]>
* Apply suggestions from code review
* Rename KB loaders.
* Change comment style. More informative error message on malformed KB .yaml file.
* Add set_candidate_selector().
* Update spacy_llm/tests/pipeline/test_llm.py Co-authored-by: Sofie Van Landeghem <[email protected]>
* Update spacy_llm/tasks/entity_linker/util.py Co-authored-by: Sofie Van Landeghem <[email protected]>
* Update spacy_llm/tasks/entity_linker/task.py Co-authored-by: Sofie Van Landeghem <[email protected]>
* Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <[email protected]>
* Incorporate review feedback.
* Add unit test for example with code only.
* Check for candidate_selector being set in .initialize().

---------
Co-authored-by: Sofie Van Landeghem <[email protected]>

* Add context length info. Refactor BuiltinTask and models to facilitate this.
* Add token count estimator plumbing.
* Add plumbing for mapper and reducer.
* Add ShardMapper prototype.
* Integrating mapping into prompt generation workflow.
* Update response parsing and component to support sharding (WIP).
* Fix shard & prompt flow.
* Fix shard & prompt flow.
* Remove todo comments.
* Fix Anthropic, Cohere, NoOp model tests.
* Fix test_llm_pipe().
* Fix type checking test.
* Fix span parsing tests.
* Fix internal tests.
* Fix _CountTask.
* Fix sentiment and summarization tasks and tests.
* Fix Azure connection URL. Fix Model test pings.
* Fix Lemma parsing.
* Start work on doc-to-shard property copying.
* Fix REL doc preprocessing.
* Remove comment on doc attribute handling during sharding, as this is done by spaCy's slicing directly.
* Add reducer implementations.
* Implement outstanding task reducers.
* Add shardable/non-shardable LLM task typing distinction. Add support for handling both types of tasks. Update tests.
* Fix EL task.
* Fix EL tokenization and highlighting partially.
* Fix tokenization and whitespaces for EL task.
* Add new registry handlers (with context length and arbitrary model names) for all REST models.
* Add sharding test with simple count task.
* Fix sharding algorithm.
* Add test with simple count task.
* Add context length as init arg in HF models.
* Fix tests. Don't stringify IO lists if sharded.
* Fix tests.
* Add NER sharding test.
* Add REL and sentiment sharding tests.
* Add summary sharding tests.
* Add EL sharding task. Fix bug in shard mapper.
* Fix REL error with RELExample parsing.
* Use regex for punctuation in REL conversion.
* Maintain custom doc attributes, incl. test.
* Filter merge warnings in textcat reduction.
* Fix custom doc data merging.
* Update spacy_llm/models/langchain/model.py Co-authored-by: Sofie Van Landeghem <[email protected]>
* Update spacy_llm/pipeline/llm.py Co-authored-by: Sofie Van Landeghem <[email protected]>
* Incorporate feedback.
* Move sharding compatibility warning to component constructor.
* Update spacy_llm/tasks/entity_linker/util.py Co-authored-by: Sofie Van Landeghem <[email protected]>
* Update spacy_llm/models/hf/base.py Co-authored-by: Sofie Van Landeghem <[email protected]>
* Incorporate feedback.
* Fix doc string

---------
Co-authored-by: Sofie Van Landeghem <[email protected]>

* Add context length info. Refactor BuiltinTask and models to facilitate this.
* Add token count estimator plumbing.
* Add plumbing for mapper and reducer.
* Add ShardMapper prototype.
* Integrating mapping into prompt generation workflow.
* Update response parsing and component to support sharding (WIP).
* Fix shard & prompt flow.
* Fix shard & prompt flow.
* Remove todo comments.
* Fix Anthropic, Cohere, NoOp model tests.
* Fix test_llm_pipe().
* Fix type checking test.
* Fix span parsing tests.
* Fix internal tests.
* Fix _CountTask.
* Fix sentiment and summarization tasks and tests.
* Fix Azure connection URL. Fix Model test pings.
* Fix Lemma parsing.
* Start work on doc-to-shard property copying.
* Fix REL doc preprocessing.
* Remove comment on doc attribute handling during sharding, as this is done by spaCy's slicing directly.
* Add reducer implementations.
* Implement outstanding task reducers.
* Add shardable/non-shardable LLM task typing distinction. Add support for handling both types of tasks. Update tests.
* Fix EL task.
* Fix EL tokenization and highlighting partially.
* Fix tokenization and whitespaces for EL task.
* Add new registry handlers (with context length and arbitrary model names) for all REST models.
* Add sharding test with simple count task.
* Fix sharding algorithm.
* Add test with simple count task.
* Add context length as init arg in HF models.
* Fix tests. Don't stringify IO lists if sharded.
* Fix tests.
* Add NER sharding test.
* Add REL and sentiment sharding tests.
* Add summary sharding tests.
* Add EL sharding task. Fix bug in shard mapper.
* Fix REL error with RELExample parsing.
* Use regex for punctuation in REL conversion.
* Maintain custom doc attributes, incl. test.
* Filter merge warnings in textcat reduction.
* Fix custom doc data merging.
* Add RawTask.
* Fix task version.
* Add sharding test.
* Update spacy_llm/models/langchain/model.py Co-authored-by: Sofie Van Landeghem <[email protected]>
* Update spacy_llm/pipeline/llm.py Co-authored-by: Sofie Van Landeghem <[email protected]>
* Incorporate feedback.
* Move sharding compatibility warning to component constructor.
* Update spacy_llm/tasks/entity_linker/util.py Co-authored-by: Sofie Van Landeghem <[email protected]>
* Update spacy_llm/models/hf/base.py Co-authored-by: Sofie Van Landeghem <[email protected]>
* Incorporate feedback.
* Update spacy_llm/tasks/raw/registry.py Co-authored-by: Sofie Van Landeghem <[email protected]>
* Apply suggestions from code review Co-authored-by: Sofie Van Landeghem <[email protected]>
* Fix tests.
* Remove boilerplate text in raw template.
* Fix sharding test.

---------
Co-authored-by: Sofie Van Landeghem <[email protected]>

* Add context length info. Refactor BuiltinTask and models to facilitate this.
* Add token count estimator plumbing.
* Add plumbing for mapper and reducer.
* Add ShardMapper prototype.
* Integrating mapping into prompt generation workflow.
* Update response parsing and component to support sharding (WIP).
* Fix shard & prompt flow.
* Fix shard & prompt flow.
* Remove todo comments.
* Fix Anthropic, Cohere, NoOp model tests.
* Fix test_llm_pipe().
* Fix type checking test.
* Fix span parsing tests.
* Fix internal tests.
* Fix _CountTask.
* Fix sentiment and summarization tasks and tests.
* Fix Azure connection URL. Fix Model test pings.
* Fix Lemma parsing.
* Start work on doc-to-shard property copying.
* Fix REL doc preprocessing.
* Remove comment on doc attribute handling during sharding, as this is done by spaCy's slicing directly.
* Add reducer implementations.
* Implement outstanding task reducers.
* Add shardable/non-shardable LLM task typing distinction. Add support for handling both types of tasks. Update tests.
* Fix EL task.
* Fix EL tokenization and highlighting partially.
* Fix tokenization and whitespaces for EL task.
* Add new registry handlers (with context length and arbitrary model names) for all REST models.
* Add sharding test with simple count task.
* Fix sharding algorithm.
* Add test with simple count task.
* Add context length as init arg in HF models.
* Fix tests. Don't stringify IO lists if sharded.
* Fix tests.
* Add NER sharding test.
* Add REL and sentiment sharding tests.
* Add summary sharding tests.
* Add EL sharding task. Fix bug in shard mapper.
* Fix REL error with RELExample parsing.
* Use regex for punctuation in REL conversion.
* Maintain custom doc attributes, incl. test.
* Filter merge warnings in textcat reduction.
* Fix custom doc data merging.
* Add translation task.
* Fix init test.
* Make smoke check more forgiving w.r.t. translation content.
* Resolve merge conflicts.
* Fix test_llm_task_factories.

rmitsch and others added 21 commits September 11, 2023 12:24

Merge pull request #292 from explosion/main

bf3a000

Synch `develop` with `main`

Merge pull request #318 from explosion/main

bec8571

Sync `develop` with `main`

Refactor to ditch **kwargs (#299)

3283d99

* Reverse control flow, ditch kwargs for generate_prompts().
* Revert examples to kwargs structure.
* Fix TextCatExample.

Merge pull request #327 from explosion/main

e5acc1f

Synch `develop` with `main`

Fix function signatures and docstrings w.r.t. example typing. (#329)

bd619c3

Fix docstring. Fix REL pre-annotation. (#330)

2f008af

Merge pull request #333 from explosion/main

83e7384

Synch `develop` with `main`

Merge pull request #341 from explosion/main

bc70d9f

Sync `develop` with `main`

Sync develop with main.

7149308

Fix REL type check.

a3911b4

Merge pull request #369 from explosion/chore/sync-dev-with-main

bbab211

Sync `dev` with `main`

Update dev from main.

2624c62

Merge pull request #383 from explosion/chore/sync-develop

dbae0c9

Sync `develop` with `main`

Fix merge conflicts.

bf4a286

Merge pull request #406 from explosion/main

07c5c8f

Sync `develop` with `main`

Update readme (#401)

b8ae128

* Add logo.
* Update readme.
* Update readme with Azure link.
* Update header.
* Update title.
* Update logos.
* Update with new tasks, prompt sharding.
* Revert code changes.

rmitsch added the Test external (Run external tests) and Test GPU (Run GPU tests) labels Jan 19, 2024

rmitsch self-assigned this Jan 19, 2024

rmitsch added 3 commits January 19, 2024 12:23

Fix various API- and LangChain-related issues (#417)

caa0187

* Format.
* Fix usage examples.

Disable schedule GPU tests (#418)

062517e

* Format.
* Fix usage examples.
* Disable schedule GPU tests.

Remove download helper. (#419)

377b1d4

rmitsch marked this pull request as ready for review January 19, 2024 11:56

rmitsch merged commit 96237c7 into main Jan 19, 2024
10 checks passed
