Goran Muric in Computer Science — Research Repository

Computer Science Preprint PDF DOI

Real-Time Control of a Virtual Orchestra by Recognition of Conducting Gestures

Mert Mermerci, Emile Pascoe, Fredrik Edstrom, Hedvig Kjellstrom · 2026

We present a museum installation in a 180° dome theater, which gives the museum visitor the experience of conducting a symphony orchestra. We have pre-recorded a short music piece performed by a …

SymphonyGen: 3D Hierarchical Orchestral Generation with Controllable Harmony Skeleton

Xuzheng He, Nan Nan, Zhilin Wang, Ziyue Kang, Zhuoru Mo, Ao Li, Yu Pan, Xiaobing Li, Feng Yu, Xiaohong Guan · 2026

Generating symphonic music requires simultaneously managing high-level structural form and dense, multi-track orchestration. Existing symbolic models often struggle with a "complexity-control imbalanc…

An event-based sequence modeling approach to recognizing non-triad chords with oversegmentation minimization

Leekyung Kim, Jonghun Park · 2026

Automatic chord recognition (ACR) extracts time-aligned chord labels from music audio recordings. Despite recent advances, ACR still struggles with oversegmentation, data scarcity, and imbalance, espe…

From Players to Participants: Citizen Science and Video Games to Understand Cognition

Syrine Salouhou, Edgar Dubourg, Maxwell Scott-Slade, Hugo Spiers, Antoine Coutrot · 2026

Citizen science is transforming how cognitive scientists study the human mind, and video games are at the heart of this shift. By embedding experimental tasks into engaging, game-like experiences, res…

MUSIC: Learning Muscle-Driven Dexterous Hand Control

Pei Xu, Yufei Ye, Shuchun Sun, Yu Ding, Elizabeth Schumann, C. Karen Liu · 2026

We present a data-driven approach for physics-based, muscle-driven dexterous control that enables musculoskeletal hands to perform precise piano playing for novel pieces of music outside the reference…

Opening the Design Space: Two Years of Performance with Intelligent Musical Instruments

Charles Patrick Martin · 2026

Machine generation of symbolic music and digital audio are hot topics but there have been relatively few digital musical instruments that integrate generative AI. Present musical AI tools are not arti…

CineAGI: Character-Consistent Movie Creation through LLM-Orchestrated Multi-Modal Generation and Cross-Scene Integration

Tianyidan Xie, Zhentao Huang, Mingjie Wang, Xin Huang, Jun Zhou, Minglun Gong, Zili Yi · 2026

Automated movie creation requires coordinating multiple characters, modalities, and narrative elements across extended sequences -- a challenge that existing end-to-end approaches struggle to address …

Adopting State-of-the-Art Pretrained Audio Representations for Music Recommender Systems

Yan-Martin Tamm, Anna Aljanaki · 2026

Over the years, the Music Information Retrieval (MIR) research community has released various models pretrained on large amounts of music data. Transfer learning showcases the proven effectiveness of pret…

Transformer-Based Rhythm Quantization of Performance MIDI Using Beat Annotations

Maximilian Wachter, Sebastian Murgul, Michael Heizmann · 2026

Rhythm transcription is a key subtask of notation-level Automatic Music Transcription (AMT). While deep learning models have been extensively used for detecting the metrical grid in audio and MIDI per…

ONOTE: Benchmarking Omnimodal Notation Processing for Expert-level Music Intelligence

Menghe Ma, Siqing Wei, Yuecheng Xing, Yaheng Wang, Fanhong Meng, Peijun Han, Luu Anh Tuan, Haoran Luo · 2026

Omnimodal Notation Processing (ONP) represents a unique frontier for omnimodal AI due to the rigorous, multi-dimensional alignment required across auditory, visual, and symbolic domains. Current resea…

From Image to Music Language: A Two-Stage Structure Decoding Approach for Complex Polyphonic OMR

Nan Xu, Shiheng Li, Shengchao Hou · 2026

We propose a new approach for a practical two-stage Optical Music Recognition (OMR) pipeline, with a particular focus on its second stage. Given symbol and event candidates from the visual pipeline, w…

BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps

Lekai Qian, Haoyu Gu, Jingwei Zhao, Ziyu Wang · 2026

Tokenizing music to fit the general framework of language models is a compelling challenge, especially considering the diverse symbolic structures in which music can be represented (e.g., sequences, g…

HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models

Feiyu Zhao, Yiming Chen, Wenhuan Lu, Daipeng Zhang, Xianghu Yue, Jianguo Wei · 2026

Large Audio-Language Models (LALMs) have recently achieved strong performance across various audio-centric tasks. However, hallucination, where models generate responses that are semantically incorrec…

Latent Fourier Transform

Mason Wang, Cheng-Zhi Anna Huang · 2026

We introduce the Latent Fourier Transform (LatentFT), a framework that provides novel frequency-domain controls for generative music models. LatentFT combines a diffusion autoencoder with a latent-spa…

A novel LSTM music generator based on the fractional time-frequency feature extraction

Li Ya, Chen Wei, Li Xiulai, Yu Lei, Deng Xinyi, Chen Chaofan · 2026

In this paper, we propose a novel approach for generating music based on an artificial intelligence (AI) system. We analyze the features of music and use them to fit and predict the music. The fractio…

Video-Robin: Autoregressive Diffusion Planning for Intent-Grounded Video-to-Music Generation

Vaibhavi Lokegaonkar, Aryan Vijay Bhosale, Vishnu Raj, Gouthaman KV, Ramani Duraiswami, Lie Lu, Sreyan Ghosh, Dinesh Manocha · 2026

Video-to-music (V2M) is the fundamental task of creating background music for an input video. Recent V2M models achieve audiovisual alignment by typically relying on visual conditioning alone and prov…

ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics

Heewon Oh · 2026

We present ArtifactNet, a lightweight framework that detects AI-generated music by reframing the problem as forensic physics -- extracting and analyzing the physical artifacts that neural audio codecs…

TinyMU: A Compact Audio-Language Model for Music Understanding

Xiquan Li, Aurian Quelennec, Slim Essid · 2026

Music understanding and reasoning are central challenges in the Music Information Research field, with applications ranging from retrieval and recommendation to music agents and virtual assistants. Re…

A Manual Bar-by-Bar Tempo Measurement Protocol for Polyphonic Chamber Music Recordings: Design, Validation, and Application to Beethoven's Piano and Cello Sonatas

Ignasi Sole · 2026

Empirical performance analysis depends on the accurate extraction of tempo data from recordings, yet standard computational tools, designed for monophonic audio or modern studio conditions, fail syste…

Temporal Contrastive Decoding: A Training-Free Method for Large Audio-Language Models

Yanda Li, Yuhan Liu, Zirui Song, Yunchao Wei, Martin Takac, Salem Lahlou · 2026

Large audio-language models (LALMs) generalize across speech, sound, and music, but unified decoders can exhibit a temporal smoothing bias: transient acoustic cues may be underutilized in favor…
