Article

Time-Frequency Token Advantage Clipping for Training Efficient Large Reasoning Model

dc.contributor.author: Bao, Rong
dc.contributor.author: Wang, Bo
dc.contributor.author: Li, Hongyu
dc.contributor.author: Zheng, Riu
dc.contributor.author: Wang, Xiao
dc.contributor.author: Rutkowski, Leszek
dc.contributor.author: Zhang, Qi
dc.contributor.author: Ding, Liang
dc.contributor.author: Tao, Dacheng
dc.contributor.department: Faculty of Computer Science (Wydział Informatyki)
dc.date.available: 2025-12-13T12:04:31Z
dc.date.issued: 2026
dc.description.abstract: Long Chain-of-Thought (CoT) reasoning enhances large reasoning models’ performance but suffers from severe inefficiencies, as models often overthink simple problems or underthink complex ones. Current sequence-level optimizations, like length penalties, are too coarse-grained to distinguish core logic from verbose language, precluding the necessary token-level control for efficient reasoning CoT. To overcome these limitations, we introduce Time-Frequency token Advantage Clipping (TFAC), a novel training framework designed to build efficient large reasoning models via token-level interventions. Specifically, TFAC functions along two dimensions: 1) The Frequency Dimension: It discourages inefficient loops and encourages deeper exploration by dynamically reducing the advantage scores of high-entropy tokens that are repeatedly generated within a single reasoning path. 2) The Time Dimension: It reduces excessive overthinking of the system by establishing a historical baseline for the occurrence count of each critical token in previously successful trajectories, and clipping the advantages of tokens that exceed this baseline during training. Crucially, to preserve the model’s exploratory capabilities on novel problems, this suppression mechanism is automatically disabled when no historical record of success is available. Experiments conducted on the Deepseek-Distill-32B and Qwen3-8B models show that TFAC outperforms leading baseline methods, improving performance by 2.3 and 3.1 percentage points, respectively, while simultaneously reducing inference costs by 35% and 28% in scenarios where correct answers are generated. These results validate the significant efficacy of TFAC in training large reasoning models that are both powerful and highly efficient. The source code and datasets used in this study are available at https://github.com/rbao2018/TFAC. [pl]
dc.description.type: conference paper
dc.description.version: postprint
dc.identifier.uri: https://repo.agh.edu.pl/handle/AGH/115322
dc.language.iso: eng
dc.rights: Attribution 4.0 International
dc.rights.access: open access
dc.subject: Chain-of-Thought reasoning [en]
dc.subject: efficient reasoning [en]
dc.subject: token-level optimization [en]
dc.subject: advantage clipping [en]
dc.subject: TFAC framework [en]
dc.title: Time-Frequency Token Advantage Clipping for Training Efficient Large Reasoning Model
dc.title.related: Proceedings of the 40th Annual AAAI Conference on Artificial Intelligence
dc.type: article
dspace.entity.type: Publication
project.funder.name: Ministerstwo Edukacji i Nauki (MEiN)
project.identifier: UMO-2021/01/2/ST6/00004, ARTIQ/0004/2021
project.name: Excellence initiative - research university
publicationvolume.volumeNumber: 2026
relation.isAuthorOfPublication: f4e14ff0-fd9b-48f8-a378-6a0344fa2c8c
relation.isAuthorOfPublication.latestForDiscovery: f4e14ff0-fd9b-48f8-a378-6a0344fa2c8c
relation.isOrgUnitOfPublication: fb6e7bdc-52f1-4b71-bc9e-7304ddff61a2
relation.isOrgUnitOfPublication.latestForDiscovery: fb6e7bdc-52f1-4b71-bc9e-7304ddff61a2
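
The two dimensions described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (see the linked repository for that) and rests on several assumptions that the abstract does not state: the function name `clip_token_advantages`, the `entropy_threshold` and `decay` parameters, treating every high-entropy token as a "critical token", shrinking only positive advantages geometrically on each repeat, and interpreting "clipping" as capping the advantage at zero.

```python
from collections import Counter
from typing import Dict, List, Optional


def clip_token_advantages(
    tokens: List[str],
    advantages: List[float],
    entropies: List[float],
    baseline_counts: Optional[Dict[str, int]] = None,
    entropy_threshold: float = 1.0,  # hypothetical cutoff for "high entropy"
    decay: float = 0.5,              # hypothetical per-repeat shrink factor
) -> List[float]:
    """Return adjusted per-token advantages for one reasoning path.

    baseline_counts maps a token to its occurrence count in previously
    successful trajectories; None means no success has been recorded,
    so the time-dimension clipping is skipped entirely.
    """
    seen = Counter()
    adjusted = []
    for tok, adv, ent in zip(tokens, advantages, entropies):
        if ent >= entropy_threshold:
            seen[tok] += 1
            # Frequency dimension (assumed form): each repeat of a
            # high-entropy token within the path shrinks a positive advantage.
            if seen[tok] > 1 and adv > 0:
                adv *= decay ** (seen[tok] - 1)
            # Time dimension (assumed form): once the token occurs more often
            # than its historical baseline, cap its advantage at zero.
            if (
                baseline_counts is not None
                and tok in baseline_counts
                and seen[tok] > baseline_counts[tok]
            ):
                adv = min(adv, 0.0)
        adjusted.append(adv)
    return adjusted


# Usage with made-up values: "wait" is a repeated high-entropy token.
path = ["wait", "so", "wait", "wait"]
advs = [1.0, 1.0, 1.0, 1.0]
ents = [2.0, 0.1, 2.0, 2.0]
no_history = clip_token_advantages(path, advs, ents, None)
with_history = clip_token_advantages(path, advs, ents, {"wait": 2})
```

Passing `None` for `baseline_counts` mirrors the abstract's statement that suppression is disabled when no historical record of success is available; only the within-path frequency adjustment is applied in that case.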

Files

Original bundle

Name: AAAI-clip.pdf
Size: 1.09 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.82 KB
Format: Item-specific license agreed upon to submission
