Akshay Soni in Engineering — Research Repository

Engineering Preprint PDF DOI

SongBench: A Fine-Grained Multi-Aspect Benchmark for Song Quality Assessment

Dapeng Wu, Shun Lei, Wei Tan, Guangzheng Li, Yunzhe Wang, Huaicheng Zhang, Lishi Zuo, Zhiyong Wu · 2026

Recent advancements in Text-to-Song generation have enabled realistic musical content production, yet existing evaluation benchmarks lack the professional granularity to capture multi-dimensional aest…


Quantitative measurements of biological/chemical concentrations using smartphone cameras

Zhendong Cao, Hongji Dai, Zhida Li, Ash Parameswaran · 2026

This paper presents a smartphone-based imaging system capable of quantifying the concentration of an assortment of biological/chemical assay samples. The main objective is to construct an image databa…


DexDrummer: In-Hand, Contact-Rich, and Long-Horizon Dexterous Robot Drumming

Hung-Chieh Fang, Amber Xie, Jennifer Grannen, Kenneth Llontop, Dorsa Sadigh · 2026

Performing in-hand, contact-rich, and long-horizon dexterous manipulation remains an unsolved challenge in robotics. Prior hand dexterity works have considered each of these three challenges in isolat…


SqueezeComposer: Temporal Speed-up is A Simple Trick for Long-form Music Composing

Jianyi Chen, Rongxiu Zhong, Shilei Zhang, Kun Qian, Jinglei Liu, Yike Guo, Wei Xue · 2026

Composing coherent long-form music remains a significant challenge due to the complexity of modeling long-range dependencies and the prohibitive memory and computational requirements associated with l…


CyboRacket: A Perception-to-Action Framework for Humanoid Racket Sports

Peng Ren, Chuan Qi, Haoyang Ge, Qiyuan Su, Xuguo He, Cong Huang, Pei Chi, Jiang Zhao, Kai Chen · 2026

Dynamic ball-interaction tasks remain challenging for robots because they require tight perception-action coupling under limited reaction time. This challenge is especially pronounced in humanoid rack…


DLIOS: An LLM-Augmented Real-Time Multi-Modal Interactive Enhancement Overlay System for Douyin Live Streaming

Shuide Wen, Sungil Seok, Beier Ku, Richee Li, Yubin He, Bowen Qu, Yang Yang, Ping Su, Can Jiao · 2026

We present DLIOS, a Large Language Model (LLM)-augmented real-time multi-modal interactive enhancement overlay system for Douyin (TikTok) live streaming. DLIOS employs a three-layer transparent window…


Using Songs to Improve Kazakh Automatic Speech Recognition

Rustem Yeshpanov · 2026

Developing automatic speech recognition (ASR) systems for low-resource languages is hindered by the scarcity of transcribed corpora. This proof-of-concept study explores songs as an unconventional yet…


An Efficient Power Management Unit With Continuous MPPT and Energy Recycling for Wireless Millimetric Biomedical Implants

Yiwei Zou, Huan-Cheng Liao, Wei Wang, Wonjune Kim, Yumin Su, Jacob T. Robinson, Kaiyuan Yang · 2026

Biomedical implants offer transformative tools to improve medical outcomes. To realize minimally invasive implants with miniaturized volume and weight, wireless power transfer has been extensively stu…


The ICASSP 2026 Automatic Song Aesthetics Evaluation Challenge

Guobin Ma, Yuxuan Xia, Jixun Yao, Huixin Xue, Hexin Liu, Shuai Wang, Hao Liu, Lei Xie · 2026

This paper summarizes the ICASSP 2026 Automatic Song Aesthetics Evaluation (ASAE) Challenge, which focuses on predicting the subjective aesthetic scores of AI-generated songs. The challenge consists o…


Leveraging Adaptive Group Negotiation for Heterogeneous Multi-Robot Collaboration with Large Language Models

Siqi Song, Xuanbing Xie, Zonglin Li, Yuqiang Li, Shijie Wang, Biqing Qi · 2025

Multi-robot collaboration tasks often require heterogeneous robots to work together over long horizons under spatial constraints and environmental uncertainties. Although Large Language Models (LLMs) …


Certifiable Alignment of GNSS and Local Frames via Lagrangian Duality

Baoshan Song, Matthew Giamou, Penggao Yan, Chunxi Xia, Li-Ta Hsu · 2025

Estimating the absolute orientation of a local system relative to a global navigation satellite system (GNSS) reference often suffers from local minima and high dependency on satellite availability. E…


Structure-Aware Antibody Design with Affinity-Optimized Inverse Folding

Xinyan Zhao, Yi-Ching Tang, Rivaaj Monsia, Victor J. Cantu, Ashwin Kumar Ramesh, Xiaozhong Liu, Zhiqiang An, Xiaoqian Jiang, Yejin Kim · 2025

Motivation: The clinical efficacy of antibody therapeutics critically depends on high-affinity target engagement, yet laboratory affinity-maturation campaigns are slow and costly. In computational set…


L1 Sample Flow for Efficient Visuomotor Learning

Weixi Song, Zhetao Chen, Tao Xu, Xianchao Zeng, Xinyu Zhou, Lixin Yang, Donglin Wang, Cewu Lu, Yong-Lu Li · 2025

Denoising-based models, such as diffusion and flow matching, have been a critical component of robotic manipulation for their strong distribution-fitting and scaling capacity. Concurrently, several wo…


Music Flamingo: Scaling Music Understanding in Audio Language Models

Sreyan Ghosh, Arushi Goel, Lasha Koroshinadze, Sang-gil Lee, Zhifeng Kong, Joao Felipe Santos, Ramani Duraiswami, Dinesh Manocha, Wei Ping, Mohammad Shoeybi, Bryan Catanzaro · 2025

We introduce Music Flamingo, a novel large audio-language model designed to advance music (including song) understanding in foundational audio models. While audio-language research has progressed rapi…


SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control

Zhengyi Luo, Ye Yuan, Tingwu Wang, Chenran Li, Sirui Chen, Fernando Castaneda, Zi-Ang Cao, Jiefeng Li, David Minor, Qingwei Ben, Xingye Da, Runyu Ding, Cyrus Hogg, Lina Song, Edy Lim, Eugene Jeong, Tairan He, Haoru Xue, Wenli Xiao, Zi Wang, Simon Yuen, Jan Kautz, Yan Chang, Umar Iqbal, Linxi "Jim" Fan, Yuke Zhu · 2025

Despite the rise of billion-parameter foundation models trained across thousands of GPUs, similar scaling gains have not been shown for humanoid control. Current neural controllers for humanoids remai…


FGO MythBusters: Explaining how Kalman Filter variants achieve the same performance as FGO in navigation applications

Baoshan Song, Ruijie Xu, Li-Ta Hsu · 2025

Sliding window-factor graph optimization (SW-FGO) has gained more and more attention in navigation research due to its robust approximation to non-Gaussian noises and nonlinearity of measuring models.…


Global-State-Free Obstacle Avoidance for Quadrotor Control in Air-Ground Cooperation

Baozhe Zhang, Xinwei Chen, Qingcheng Chen, Chao Xu, Fei Gao, Yanjun Cao · 2025

CoNi-MPC provides an efficient framework for UAV control in air-ground cooperative tasks by relying exclusively on relative states, eliminating the need for global state estimation. However, its lack …


DiffRhythm 2: Efficient and High Fidelity Song Generation via Block Flow Matching

Yuepeng Jiang, Huakang Chen, Ziqian Ning, Jixun Yao, Zerui Han, Di Wu, Meng Meng, Jian Luan, Zhonghua Fu, Lei Xie · 2025

Generating full-length, high-quality songs is challenging, as it requires maintaining long-term coherence both across text and music modalities and within the music modality itself. Existing non-autor…


Online automatic code generation for robot swarms: LLMs and self-organizing hierarchy

Weixu Zhu, Marco Dorigo, Mary Katherine Heinrich · 2025

Our recently introduced self-organizing nervous system (SoNS) provides robot swarms with 1) ease of behavior design and 2) global estimation of the swarm configuration and its collective environment, …


SongFormer: Scaling Music Structure Analysis with Heterogeneous Supervision

Chunbo Hao, Ruibin Yuan, Jixun Yao, Qixin Deng, Xinyi Bai, Yanbo Wang, Wei Xue, Lei Xie · 2025

Music structure analysis (MSA) underpins music understanding and controllable generation, yet progress has been limited by small, inconsistent corpora. We present SongFormer, a scalable framework that…
