
[Text Generation][V2] NonKVCachePipeline #1483

Merged: 72 commits, Jan 2, 2024

Conversation

dbogunowicz (Contributor) commented Dec 18, 2023

Feature Description

Added the TextGenerationPipelineNoCache. This pipeline processes the prompt and returns the new token; that's it.
Its main functionality is mapping prompt tokens to logits, which is instrumental for computing the perplexity of the model given a dataset.

Testing

Updated the integration tests to cover the case of non-kv-cache inference.

Example Use

from deepsparse.v2.text_generation import TextGenerationPipelineNoCache

prompt = ["Some funny prompt", "Why are you so"]

pipeline = TextGenerationPipelineNoCache(model_path="hf:mgoin/TinyStories-1M-ds",
                                         onnx_model_name="model-orig.onnx",
                                         sequence_length=20)

out = pipeline(prompt=prompt,
               include_prompt_logits=True,
               generation_kwargs=dict(output_scores=True))

for gen in out.generations:
    print(gen)
text='.' score=array([[ 2.9344807 , -0.03345669, -4.11256   , ..., -6.9316325 ,
        -4.6005425 ,  1.1827914 ],
       [ 7.008805  , -0.11603884, -7.1837015 , ..., -7.0405912 ,
        -2.386351  , -2.2007818 ],
       [ 6.348213  , -2.2960157 , -6.433192  , ..., -6.5930486 ,
        -5.8315077 , -0.58804405],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ]], dtype=float32) finished=True finished_reason='length'
# Note: the logits get zero padding at the end, because all logits in the batch
# must have the same shape (the length of the longest prompt in the input + 1).
text=' sad' score=array([[ 2.560934 ,  1.1993233, -6.670935 , ..., -7.3002615, -3.823823 ,
         1.8125833],
       [-1.1050931, -2.4256568, -7.3015127, ..., -6.1500154, -4.074909 ,
         1.8155754],
       [ 6.172593 , -2.2252593, -9.146653 , ..., -7.70834  , -4.810748 ,
         0.3985293],
       [ 1.4988875,  1.0973434, -4.4714937, ..., -4.8026247, -1.1791464,
         1.6924176]], dtype=float32) finished=True finished_reason='length'
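
Since the pipeline's main purpose is mapping prompt tokens to logits, those scores can feed a perplexity computation directly. A minimal sketch of that step (the helper below is hypothetical, not part of the pipeline; the zero-padded rows noted above must be masked out first):

```python
import numpy as np

def perplexity_from_scores(scores: np.ndarray, token_ids: np.ndarray) -> float:
    """Compute perplexity from per-position logits and target token ids.

    scores: (seq_len, vocab_size) logits; all-zero rows (batch padding) are dropped.
    token_ids: (seq_len,) ids of the tokens the logits should predict.
    """
    # drop the zero-padded rows appended to equalize shapes across the batch
    mask = np.any(scores != 0.0, axis=1)
    scores, token_ids = scores[mask], token_ids[mask]

    # numerically stable log-softmax over the vocabulary dimension
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # mean negative log-likelihood of the observed tokens, exponentiated
    nll = -log_probs[np.arange(len(token_ids)), token_ids].mean()
    return float(np.exp(nll))
```

For example, uniform logits over a vocabulary of size 4 yield a perplexity of exactly 4, and appending a zero-padded row leaves the result unchanged.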

Next steps

  • Create a parent TextGenerationPipeline operator that chooses between the kv-cache and non-kv-cache versions of the pipeline, depending on the topology of the ONNX model
  • Move the overwriting of the transformer inputs into a high-level function
  • Use the V2 pipeline for perplexity calculation
  • Swap GraphRouter for LinearRouter in TextGenerationPipelineNoKVCache
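
The first next step (dispatching on ONNX topology) could look roughly like the sketch below. All names here are hypothetical, not the actual operator API: the idea is that a kv-cache export exposes past-key/value graph inputs, so checking the input names is enough to pick a variant.

```python
def uses_kv_cache(graph_input_names: list[str]) -> bool:
    """Heuristic: kv-cache ONNX exports expose past-key/value graph inputs
    (names like 'past_key_values.0.key'). The names would come from the
    loaded model, e.g. [i.name for i in model.graph.input]."""
    return any(name.startswith("past_key_values") for name in graph_input_names)

def choose_pipeline_variant(graph_input_names: list[str]) -> str:
    # hypothetical dispatch: a parent operator would instantiate either
    # TextGenerationPipeline or TextGenerationPipelineNoCache here
    return "kv-cache" if uses_kv_cache(graph_input_names) else "no-kv-cache"
```

With this, a parent operator could load the graph once at construction time and route to the appropriate pipeline without user intervention.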

bfineran and others added 30 commits October 26, 2023 13:22
… router and image classification pipeline/operators/example (#1325)

* initial functionality and working example with image classification

* remove testing image

* update args

* initial functionality and working example with image classification

* remove testing image

* pr comments

* defines schemas for operators and test

* add image classification test, PR comments

* fix input/output handling in pipeline and operator base classes to be more generic; remove context

* add additional operator input message

* typo fix
* [v2] EngineOperator updates to make continuous batching easier

* test fixes
…ity (#1348)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings
…generation functionality (#1356)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* initial functionality and working example with image classification

* updates func

* prompt inference, initial functionality

* finish generation operators and update routes

* further breakdown operators

* add operators

* fix can_operate condition

* update can_operate to not rely on the inference_state

* rebase + update

* fix condition

* fix capacity settting again

* typo fixes
…e code to remove repeat code, update map function
…eature/damian/v2/factor_out_transformation_utils
)

* add split/join functionality

* update router to include split/join in parent class, refactor pipeline code to remove repeat code, update map function

* process multiple generations

* move map to base class
…eature/damian/v2/factor_out_transformation_utils
* unit testing for text generation operators

* additional changes

* unit testing completion

* remove debug

* fix

* add todo

* more clean-up

* fix test

* add docstrings/comments

* break out tests to individual unit test files; add conftest and make scope of fixtures module to help with speed

* fix name
…ng and prioritization (#1373)

* [Continuous Batching] Queue Implementation to support batching grouping and prioritization

* has_key method

* thread safety

* add blocking option for pop_batch

* update docstring

* allow mutex to be shared across continuous batching objects

* revert last commit
…#1374)

* [Continuous Batching] Executor thread for running continuous batching

* quality

* ensure that executor stops when main thread does - clean up test hack
* [ContinuousBatching] ContinuousBatchingScheduler Implementation

* cleanup unnecessary stop condition
* [continuous batching] singleton pattern for scheduler

* catch from review
@dbogunowicz dbogunowicz changed the title Feature/damian/no kv cache retrieve [Text Generation][V2] NonKVCachePipeline Dec 18, 2023
bfineran previously approved these changes Dec 18, 2023
dsikka (Contributor) left a comment:

Thank you for the changes - this looks great.
I think the only thing we're missing is the logic check that runs the appropriate pipeline by checking whether kv_cache is present. We should do this in a follow-up PR.

dbogunowicz merged commit dd0f574 into main on Jan 2, 2024 (13 checks passed)
dbogunowicz deleted the feature/damian/no_kv_cache_retrieve branch on January 2, 2024 14:42