fix: remove max_tokens cap to support thinking models (Kimi-K2.5) (#869)

mirror of https://github.com/Monadical-SAS/reflector.git synced 2026-04-20 12:16:55 +00:00

* fix: remove max_tokens cap to support thinking models (Kimi-K2.5)

Thinking/reasoning models like Kimi-K2.5 spend output tokens on internal
chain-of-thought before generating the visible response. With max_tokens
capped (at 500 or 2048), reasoning consumed the entire budget and left the
visible response empty. TreeSummarize then returned '', which crashed the
topic detection retry workflow.
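A toy model of the failure, assuming (as described above) that reasoning and visible text draw from the same max_tokens budget and that reasoning is generated first. The function name and the token counts are hypothetical illustrations, not measurements from Kimi-K2.5.

```python
from __future__ import annotations


def visible_tokens(max_tokens: int | None, reasoning: int, answer: int) -> int:
    """Return how many visible-answer tokens fit in the output budget.

    Toy model (an assumption): reasoning tokens are emitted first and count
    against max_tokens; whatever budget remains goes to the visible answer.
    max_tokens=None means no client-side cap.
    """
    if max_tokens is None:
        return answer
    remaining = max(0, max_tokens - reasoning)
    return min(answer, remaining)


print(visible_tokens(200, reasoning=900, answer=30))     # 0: empty title
print(visible_tokens(2048, reasoning=2500, answer=400))  # 0: empty summary
print(visible_tokens(None, reasoning=2500, answer=400))  # 400
```

Under this model, any cap smaller than the reasoning length yields an empty answer regardless of how short the answer itself would be.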

Set the max_tokens default to None so the model controls its own output
budget, allowing thinking models to complete both reasoning and response.
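A minimal sketch, assuming an OpenAI-style chat-completions request, of what a None default means at the call site: the cap is left out of the request entirely instead of being sent as a number. `build_completion_kwargs` and the model string are hypothetical, not reflector's actual code; whether the server then applies its own default limit depends on the provider.

```python
from __future__ import annotations

from typing import Any


def build_completion_kwargs(
    model: str, temperature: float = 0.4, max_tokens: int | None = None
) -> dict[str, Any]:
    """Build request kwargs (hypothetical helper), omitting max_tokens when None."""
    kwargs: dict[str, Any] = {"model": model, "temperature": temperature}
    if max_tokens is not None:
        # Only send a cap when the caller explicitly asked for one.
        kwargs["max_tokens"] = max_tokens
    return kwargs


print(build_completion_kwargs("kimi-k2.5"))
# {'model': 'kimi-k2.5', 'temperature': 0.4}
print(build_completion_kwargs("kimi-k2.5", max_tokens=2048))
# {'model': 'kimi-k2.5', 'temperature': 0.4, 'max_tokens': 2048}
```

Omitting the key, rather than sending a large number, keeps callers that still want a hard cap able to pass one explicitly.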

Also fix process.py CLI tool to import the Celery worker app before
dispatching tasks, ensuring the Redis broker config is used instead of
Celery's default AMQP transport.
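A stdlib-only toy model of why the import matters. In Celery, tasks declared with `shared_task` resolve to whichever app is current when they are used; if no configured app has been created, Celery falls back to a default app whose broker is AMQP. The classes below only imitate that fallback: `App`, `SharedTask`, `load_worker_app`, the task name, and the Redis URL are hypothetical, not Celery's API or reflector's configuration.

```python
from __future__ import annotations

# Hypothetical stand-in for Celery's default AMQP broker.
DEFAULT_BROKER = "amqp://guest@localhost//"


class App:
    def __init__(self, broker_url: str) -> None:
        self.broker_url = broker_url


_current_app: App | None = None


def set_current_app(app: App) -> None:
    global _current_app
    _current_app = app


def get_current_app() -> App:
    # Lazily create a default app if no configured app has been loaded yet.
    global _current_app
    if _current_app is None:
        _current_app = App(DEFAULT_BROKER)
    return _current_app


class SharedTask:
    """Toy task that looks up the current app only when it is dispatched."""

    def __init__(self, name: str) -> None:
        self.name = name

    def delay(self) -> str:
        return f"{self.name} -> {get_current_app().broker_url}"


def load_worker_app() -> App:
    # Plays the role of importing the worker module, which builds the
    # configured app and makes it current.
    app = App("redis://localhost:6379/1")
    set_current_app(app)
    return app


process = SharedTask("process_file")
print(process.delay())  # process_file -> amqp://guest@localhost//
load_worker_app()
print(process.delay())  # process_file -> redis://localhost:6379/1
```

Dispatching before the configured app exists silently routes the task to the default broker, which is the situation the process.py change avoids by importing the worker app first.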

* fix: remove max_tokens=200 cap from final title processor

Same thinking-model issue: a 200-token cap is especially tight and would be
consumed entirely by chain-of-thought reasoning, producing an empty title.

* Update server/reflector/tools/process.py

Co-authored-by: pr-agent-monadical[bot] <198624643+pr-agent-monadical[bot]@users.noreply.github.com>

* fix: remove max_tokens=500 cap from topic detector processor

Same thinking-model fix: this is the original call site that was failing
with Kimi-K2.5 and producing empty TreeSummarize responses.

---------

Co-authored-by: pr-agent-monadical[bot] <198624643+pr-agent-monadical[bot]@users.noreply.github.com>

Author: Mathieu Virbel
Date: 2026-02-20 12:07:34 -06:00
Committed by: GitHub
Parent: d4cc6be1fe
Commit: 527a069ba9

5 changed files with 9 additions and 4 deletions

server/reflector/llm.py: 4 changed lines
    @@ -202,7 +202,9 @@ class StructuredOutputWorkflow(Workflow, Generic[OutputT]):
     class LLM:
    -    def __init__(self, settings, temperature: float = 0.4, max_tokens: int = 2048):
    +    def __init__(
    +        self, settings, temperature: float = 0.4, max_tokens: int | None = None
    +    ):
             self.settings_obj = settings
             self.model_name = settings.LLM_MODEL
             self.url = settings.LLM_URL
