Nojun Kwak — Research Repository

AI & Data Science Preprint PDF DOI

Kwai Summary Attention Technical Report

Chenglong Chu, Guorui Zhou, Guowang Zhang, Han Li, Hao Peng, Hongtao Cheng, Jian Liang, Jiangxia Cao, Kun Gai, Lingzhi Zhou, Lu Ren, Qi Zhang, Ruiming Tang, Ruitao Wang, Xinchen Luo, Yi Su, Zhiyuan Liang, Ziqi Wang, Boyang Ding, Chengru Song, Dunju Zang, Hui Wang, Jiao Ou, Jiaxin Deng, Jijun Shi, Jinghao Zhang, Junmin Chen, Lejian Ren, Minxuan Lv, Qianqian Wang, Qigen Hu, Shiyao Wang, Siyang Mao, Tao Wang, Xingmei Wang, Zhixin Ling, Ziming Li, Zixing Zhang · 2026

Long-context ability has become one of the most important iteration directions of next-generation Large Language Models, particularly in semantic understanding/reasoning, code agentic intelligence and…

AI & Data Science Preprint PDF DOI

Neural Recovery of Historical Lexical Structure in Bantu Languages from Modern Data

Hillary Mutisya, John Mugane · 2026

We investigate whether neural models trained exclusively on modern morphological data can recover cross-lingual lexical structure consistent with historical reconstruction. Using BantuMorph v7, a tran…

AI & Data Science Preprint PDF DOI

Zero-Shot Morphological Discovery in Low-Resource Bantu Languages via Cross-Lingual Transfer and Unsupervised Clustering

Hillary Mutisya, John Mugane · 2026

We present a method for discovering morphological features in low-resource Bantu languages by combining cross-lingual transfer learning with unsupervised clustering. Applied to Giriama (nyf), a langua…

Mathematics Preprint PDF DOI

An introduction to separated graphs and their type semigroups

Pere Ara · 2026

We introduce $C^*$-algebras associated with directed graphs, along with two generalizations of this concept, namely Exel-Pardo $C^*$-algebras associated with a self-similar action of a group on a dire…

AI & Data Science Preprint PDF DOI

Exploring Concreteness Through a Figurative Lens

Saptarshi Ghosh, Tianyu Jiang · 2026

Static concreteness ratings are widely used in NLP, yet a word's concreteness can shift with context, especially in figurative language such as metaphor, where common concrete nouns can take abstract …

AI & Data Science Preprint PDF DOI

More Than Meets the Eye: Measuring the Semiotic Gap in Vision-Language Models via Semantic Anchorage

Wei He · 2026

Vision-Language Models (VLMs) excel at photorealistic generation, yet often struggle to represent abstract meaning such as idiomatic interpretations of noun compounds. To study whether high visual fid…

AI & Data Science Preprint PDF DOI

Revisiting a Pain in the Neck: A Semantic Reasoning Benchmark for Language Models

Yang Liu, Hongming Li, Melissa Xiaohui Qin, Qiankun Liu, Chao Huang · 2026

We present SemanticQA, an evaluation suite designed to assess language models (LMs) in semantic phrase processing tasks. The benchmark consolidates existing multiword expression (MwE) resources and re…

AI & Data Science Preprint PDF DOI

MetFuse: Figurative Fusion between Metonymy and Metaphor

Saptarshi Ghosh, Tianyu Jiang · 2026

Metonymy and metaphor often co-occur in natural language, yet computational work has studied them largely in isolation. We introduce a framework that transforms a literal sentence into three figurativ…

Computer Science Preprint PDF DOI

Faster Approximate Linear Matroid Intersection

Tatsuya Terao · 2026

We consider a fast approximation algorithm for the linear matroid intersection problem. In this problem, we are given two $r \times n$ matrices $M_1$ and $M_2$, and the objective is to find a largest …

AI & Data Science Preprint PDF DOI

Exemplar Retrieval Without Overhypothesis Induction: Limits of Distributional Sequence Learning in Early Word Learning

Jon-Paul Cacioli · 2026

Background: Children do not simply learn that balls are round and blocks are square. They learn that shape is the kind of feature that tends to define object categories -- a second-order generalisatio…

AI & Data Science Preprint PDF DOI

'Layer su Layer': Identifying and Disambiguating the Italian NPN Construction in BERT's family

Greta Gorzoni, Ludovica Pannitto, Francesca Masini · 2026

Interpretability research has highlighted the importance of evaluating Pretrained Language Models (PLMs) and in particular contextual embeddings against explicit linguistic theories to determine what …

AI & Data Science Preprint PDF DOI

Towards a theory of morphology-driven marking in the lexicon: The case of the state

Mohamed El Idrissi · 2026

All languages have a noun category, but its realisation varies considerably. Depending on the language, semantic and/or morphosyntactic differences may be more or less pronounced. This paper explores …

AI & Data Science Preprint PDF DOI

Repetition Without Exclusivity: Scale Sensitivity of Referential Mechanisms in Child-Scale Language Models

Jon-Paul Cacioli · 2026

We present the first systematic evaluation of mutual exclusivity (ME) -- the bias to map novel words to novel referents -- in text-only language models trained on child-directed speech. We operational…

AI & Data Science Preprint PDF DOI

Semantic Level of Detail: Multi-Scale Knowledge Representation via Heat Kernel Diffusion on Hyperbolic Manifolds

Edward Izgorodin · 2026

AI memory systems increasingly organize knowledge into graph structures -- knowledge graphs, entity relations, community hierarchies -- yet lack a principled mechanism for continuous resolution contro…

AI & Data Science Preprint PDF DOI

A theoretical model of dynamical grammatical gender shifting based on set-valued set function

Mohamed El Idrissi · 2026

This study investigates the diverse characteristics of nouns, focusing on both semantic (e.g., countable/uncountable) and morphosyntactic (e.g., masculine/feminine) distinctions. We explore inter-word…

AI & Data Science Preprint PDF DOI

3D-DRES: Detailed 3D Referring Expression Segmentation

Qi Chen, Changli Wu, Jiayi Ji, Yiwei Ma, Liujuan Cao · 2026

Current 3D visual grounding tasks only process sentence-level detection or segmentation, which critically fails to leverage the rich compositional contextual reasoning within natural language express…

Mathematics Preprint PDF DOI

Robust Kaczmarz methods for nearly singular linear systems

Yunying Ke, Hao Luo · 2026

The Kaczmarz method is an efficient iterative algorithm for large-scale linear systems. However, its linear convergence rate suffers from ill-conditioned problems and is highly sensitive to the smalle…

Computer Science Preprint PDF DOI

A Three-stage Neuro-symbolic Recommendation Pipeline for Cultural Heritage Knowledge Graphs

Krzysztof Kutt, Elzbieta Sroka, Oleksandra Ishchuk, Luiz do Valle Miranda · 2026

The growing volume of digital cultural heritage resources highlights the need for advanced recommendation methods capable of interpreting semantic relationships between heterogeneous data entities. Th…

AI & Data Science Preprint PDF DOI

Integrating Affordances and Attention models for Short-Term Object Interaction Anticipation

Lorenzo Mur Labadia, Ruben Martinez-Cantin, Jose J. Guerrero, Giovanni M. Farinella, Antonino Furnari · 2026

Short Term object-interaction Anticipation consists of detecting the location of the next active objects, the noun and verb categories of the interaction, as well as the time to contact from the obser…

AI & Data Science Preprint PDF DOI

Evaluating Adjective-Noun Compositionality in LLMs: Functional vs Representational Perspectives

Ruchira Dhar, Qiwei Peng, Anders Søgaard · 2026

Compositionality is considered central to language abilities. As performant language systems, how do large language models (LLMs) do on compositional tasks? We evaluate adjective-noun compositionality…
