Omar Ramirez in Computer Science — Research Repository

Computer Science · Preprint

From Image to Music Language: A Two-Stage Structure Decoding Approach for Complex Polyphonic OMR

Nan Xu, Shiheng Li, Shengchao Hou · 2026

We propose a new approach for a practical two-stage Optical Music Recognition (OMR) pipeline, with a particular focus on its second stage. Given symbol and event candidates from the visual pipeline, w…

Computer Science · Preprint

Audio-DeepThinker: Progressive Reasoning-Aware Reinforcement Learning for High-Quality Chain-of-Thought Emergence in Audio Language Models

Xiang He, Chenxing Li, Jinting Wang, Yan Rong, Tianxin Xie, Wenfu Wang, Li Liu, Dong Yu · 2026

Large Audio-Language Models (LALMs) have made significant progress in audio understanding, yet they primarily operate as perception-and-answer systems without explicit reasoning processes. Existing me…

Computer Science · Preprint

EvA: An Evidence-First Audio Understanding Paradigm for LALMs

Xinyuan Xie, Shunian Chen, Zhiheng Liu, Yuhao Zhang, Zhiqiang Lv, Liyin Liang, Benyou Wang · 2026

Large Audio Language Models (LALMs) still struggle in complex acoustic scenes because they often fail to preserve task-relevant acoustic evidence before reasoning begins. We call this failure the evid…

Computer Science · Preprint

TAC: Timestamped Audio Captioning

Sonal Kumar, Prem Seetharaman, Ke Chen, Oriol Nieto, Jiaqi Su, Zhepei Wang, Rithesh Kumar, Dinesh Manocha, Nicholas J. Bryan, Zeyu Jin, Justin Salamon · 2026

Large Audio Language Models struggle to disentangle overlapping events in complex acoustic scenes, yielding temporally inconsistent captions and frequent hallucinations. We introduce Timestamped Audio…

Computer Science · Preprint

The Interspeech 2026 Audio Reasoning Challenge: Evaluating Reasoning Process Quality for Audio Reasoning Models and Agents

Ziyang Ma, Ruiyang Xu, Yinghao Ma, Chao-Han Huck Yang, Bohan Li, Jaeyeon Kim, Jin Xu, Jinyu Li, Carlos Busso, Kai Yu, Eng Siong Chng, Xie Chen · 2026

Recent Large Audio Language Models (LALMs) excel in understanding but often lack transparent reasoning. To address this "black-box" limitation, we organized the Audio Reasoning Challenge at Interspeec…

Computer Science · Preprint

AuTAgent: A Reinforcement Learning Framework for Tool-Augmented Audio Reasoning

Siqian Tong, Xuan Li, Yiwei Wang, Baolong Bi, Yujun Cai, Shenghua Liu, Yuchen He, Chengpeng Hao · 2026

Large Audio Language Models (LALMs) excel at perception but struggle with complex reasoning requiring precise acoustic measurements. While external tools can extract fine-grained features like exact t…

Computer Science · Preprint

DIFFA-2: A Practical Diffusion Large Language Model for General Audio Understanding

Jiaming Zhou, Xuxin Cheng, Shiwan Zhao, Yuhang Jia, Cao Liu, Ke Zeng, Xunliang Cai, Yong Qin · 2026

Autoregressive (AR) large audio language models (LALMs) such as Qwen-2.5-Omni have achieved strong performance on audio understanding and interaction, but scaling them remains costly in data and compu…

Computer Science · Preprint

Bridging the Perception Gap: A Lightweight Coarse-to-Fine Architecture for Edge Audio Systems

Hengfan Zhang, Yueqian Lin, Hai Helen Li, Yiran Chen · 2026

Deploying Audio-Language Models (Audio-LLMs) on edge infrastructure exposes a persistent tension between perception depth and computational efficiency. Lightweight local models tend to produce passive…

Computer Science · Preprint

AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives

Yanxi Chen, Wenhui Zhu, Xiwen Chen, Zhipeng Wang, Xin Li, Peijie Qiu, Hao Wang, Xuanzhao Dong, Yujian Xiong, Anderson Schneider, Yuriy Nevmyvaka, Yalin Wang · 2025

Although Large Audio-Language Models (LALMs) deliver state-of-the-art (SOTA) performance, they frequently suffer from hallucinations, e.g. generating text not grounded in the audio input. We analyze t…

Computer Science · Preprint

Hyper-Minrank: A Unified Hypergraph Characterization of Multi-Sender Index Coding

Ali Khalesi, Petros Elia · 2025

This work introduces a hypergraph formulation that generalizes the classical paradigm of Bar-Yossef et al. to the multi-sender index coding (MSIC) setting. Central to the model is a 4-regular side-inf…

Computer Science · Preprint

Multi-Objective Agentic Rewrites for Unstructured Data Processing

Lindsey Linxi Wei, Shreya Shankar, Sepanta Zeighami, Yeounoh Chung, Fatma Ozcan, Aditya G. Parameswaran · 2025

One year ago, we open-sourced DocETL, a declarative system for LLM-powered data processing that, as of March 2026, has 3.7K GitHub stars and users across domains (e.g., journalism, law, medicine, poli…

Computer Science · Preprint

SAR-LM: Symbolic Audio Reasoning with Large Language Models

Termeh Taheri, Yinghao Ma, Emmanouil Benetos · 2025

Large language models (LLMs) have advanced in text and vision, but their reasoning on audio remains limited. Most existing methods rely on dense audio embeddings, which are difficult to interpret and …

Computer Science · Preprint

Artificial intelligence and the Gulf Cooperation Council workforce adapting to the future of work

Mohammad Rashed Albous, Melodena Stephens, Odeh Rashed Al-Jayyousi · 2025

The rapid expansion of artificial intelligence (AI) in the Gulf Cooperation Council (GCC) raises a central question: are investments in compute infrastructure matched by an equally robust build-out of…

Computer Science · Preprint

Coverage Analysis and Optimization of FIRES-Assisted NOMA and OMA Systems

Farshad Rostami Ghadi, Kai-Kit Wong, Masoud Kaveh, Hanjiang Hong, Chan-Byoung Chae, Lajos Hanzo · 2025

Fluid integrated reflecting and emitting surfaces (FIRES) are investigated. In these metasurfaces, each subarea hosts an active element capable of simultaneous transmission and reflection, phase, and …

Computer Science · Preprint

Communication Platform for Non-verbal Autistic children in Oman using Android mobile

Amna Al-Araimi, Yue Zheng, Haiming Liu · 2025

This paper addresses Non-verbal Autism Spectrum Disorder. The disorder has been reported in many parts of the world, including the US, UK, and India. To m…

Computer Science · Preprint

AudioToolAgent: An Agentic Framework for Audio-Language Models

Gijs Wijngaard, Elia Formisano, Michel Dumontier, Jenia Jitsev · 2025

Large Audio-Language Models (LALMs) perform well on audio understanding tasks but lack multistep reasoning and tool-calling found in recent Large Language Models (LLMs). This paper presents AudioToolA…

Computer Science · Preprint

Pay More Attention To Audio: Mitigating Imbalance of Cross-Modal Attention in Large Audio Language Models

Junyu Wang, Ziyang Ma, Zhengding Luo, Tianrui Wang, Meng Ge, Xiaobao Wang, Longbiao Wang · 2025

Large Audio-Language Models (LALMs) often suffer from audio-textual attention imbalance, prioritizing text over acoustic information, particularly in the multi-modal fusion layers of the Transformer a…

Computer Science · Preprint

Omni-CLST: Error-aware Curriculum Learning with guided Selective chain-of-Thought for audio question answering

Jinghua Zhao, Hang Su, Lichun Fan, Zhenbo Luo, Hui Wang, Haoqin Sun, Yong Qin · 2025

With the rapid progress of large audio-language models (LALMs), audio question answering (AQA) has emerged as a challenging task requiring both fine-grained audio understanding and complex reasoning. …

Computer Science · Preprint

BOLT: Bandwidth-Optimized Lightning-Fast Oblivious Map powered by Secure HBM Accelerators

Yitong Guo, Hongbo Chen, Haobin Hiroki Chen, Yukui Luo, XiaoFeng Wang, Chenghong Wang · 2025

While Trusted Execution Environments provide a strong foundation for secure cloud computing, they remain vulnerable to access pattern leakages. Oblivious Maps (OMAPs) mitigate this by fully hiding acc…

Computer Science · Preprint

LMAR: Language Model Augmented Retriever for Domain-specific Knowledge Indexing

Yao Zhao, Yantian Ding, Zhiyue Zhang, Dapeng Yao, Yanxun Xu · 2025

Retrieval Augmented Generation (RAG) systems often struggle with domain-specific knowledge due to performance deterioration of pre-trained embeddings and prohibitive computational costs of large langu…
