Keval Vora in Engineering — Research Repository

Engineering Preprint PDF DOI

BUT System Description for CHiME-9 MCoRec Challenge

Dominik Klement, Alexander Polok, Nguyen Hai Phong, Prachi Singh, Lukas Burget · 2026

Multi-talker automatic speech recognition (ASR) in conversational recordings remains an open problem, particularly in scenarios with a large portion of overlapping speech where identifying and transcrib…

Dual-LoRA: Parameter-Efficient Adversarial Disentanglement for Cross-Lingual Speaker Verification

Qituan Shangguan, Junhao Du, Kunyang Peng, Feng Xue, Hui Zhang, Xinsheng Wang, Kai Yu, Shuai Wang · 2026

Cross-lingual speaker verification suffers from severe language-speaker entanglement. This causes systematic degradation in the hardest scenario: correctly accepting utterances from the same speaker a…

Similarity Choice and Negative Scaling in Supervised Contrastive Learning for Deepfake Audio Detection

Jaskirat Sudan, Hashim Ali, Surya Subramani, Hafiz Malik · 2026

Supervised contrastive learning (SupCon) is widely used to shape representations, but has seen limited targeted study for audio deepfake detection. Existing work typically combines contrastive terms w…

CRC-SAM: SAM-Based Multi-Modal Segmentation and Quantification of Colorectal Cancer in CT, Colonoscopy, and Histology Images

Daniel Lao · 2026

We present CRC-SAM, a unified framework for colorectal cancer segmentation across colonoscopy, CT, and histopathology images. Unlike prior single-modality methods, CRC-SAM provides consistent, modalit…

Embedding-Based Intrusive Evaluation Metrics for Musical Source Separation Using MERT Representations

Paul A. Bereuter, Alois Sontacchi · 2026

Evaluation of musical source separation (MSS) has traditionally relied on Blind Source Separation Evaluation (BSS-Eval) metrics. However, recent work suggests that BSS-Eval metrics exhibit low correla…

VLA Foundry: A Unified Framework for Training Vision-Language-Action Models

Jean Mercat, Sedrick Keh, Kushal Arora, Isabella Huang, Paarth Shah, Haruki Nishimura, Shun Iwase, Katherine Liu · 2026

We present VLA Foundry, an open-source framework that unifies LLM, VLM, and VLA training in a single codebase. Most open-source VLA efforts specialize in the action training stage, often stitching tog…

Dual-Radio BLE-LoRa Hierarchical Mesh for Infrastructure-Free Emergency Communication

Andrii Vakhnovskyi · 2026

We present a dual-radio hierarchical mesh architecture for infrastructure-free emergency communication that exploits the complementary strengths of Bluetooth Low Energy (BLE) and LoRa. Nodes equipped …

Noncoherent Maximum Likelihood Detection for LoRa Signals in Multipath Fading

The Khai Nguyen, Ebrahim Bedeer, Robert Barton · 2026

This letter derives the noncoherent (NC) maximum likelihood (ML) detection rule for LoRa signals under a Rician multipath fading channel. The proposed NC-ML detection only requires the channel statisti…

Rapid LoRA Aggregation for Wireless Channel Adaptation in Open-Set Radio Frequency Fingerprinting

Mingxi Zhang, Renjie Xie, Jincheng Wang, Guyue Li, Wei Xu · 2026

Radio frequency fingerprints (RFFs) enable secure wireless authentication but struggle in open-set scenarios with unknown devices and varying channels. Existing methods face challenges in generalizati…

X-VC: Zero-shot Streaming Voice Conversion in Codec Space

Qixi Zheng, Yuxiang Zhao, Tianrui Wang, Wenxi Chen, Kele Xu, Yikang Li, Qinyuan Chen, Xipeng Qiu, Kai Yu, Xie Chen · 2026

Zero-shot voice conversion (VC) aims to convert a source utterance into the voice of an unseen target speaker while preserving its linguistic content. Although recent systems have improved conversion …

Graph-Enhanced LLM for SWAN-ISAC

Qian Gao, Ruikang Zhong, Yuanwei Liu · 2026

Segmented pinching antenna assisted integrated sensing and communication (ISAC) systems enable flexible spatial resource utilization by allowing different waveguide segments to be dynamically configur…

From Prompt to Physical Action: Structured Backdoor Attacks on LLM-Mediated Robotic Control Systems

Mingyang Xie, Jin Wei-Kocsis · 2026

The integration of large language models (LLMs) into robotic control pipelines enables natural language interfaces that translate user prompts into executable commands. However, this digital-to-physic…

Hypernetwork-Conditioned Reinforcement Learning for Robust Control of Fixed-Wing Aircraft under Actuator Failures

Dennis Marquis, Mazen Farhood · 2026

This paper presents a reinforcement learning-based path-following controller for a fixed-wing small uncrewed aircraft system (sUAS) that is robust to certain actuator failures. The controller is condi…

HyperCT: Low-Rank Hypernet for Unified Chest CT Analysis

Fengbei Liu, Sunwoo Kwak, Hao Phung, Nusrat Binta Nizam, Ilan Richter, Nir Uriel, Hadar Averbuch-Elor, Daborah Estrin, Mert R. Sabuncu · 2026

Non-contrast chest CTs offer a rich opportunity for both conventional pulmonary and opportunistic extra-pulmonary screening. While Multi-Task Learning (MTL) can unify these diverse tasks, standard har…

V2X-QA: A Comprehensive Reasoning Dataset and Benchmark for Multimodal Large Language Models in Autonomous Driving Across Ego, Infrastructure, and Cooperative Views

Junwei You, Pei Li, Zhuoyu Jiang, Weizhe Tang, Zilin Huang, Rui Gan, Jiaxi Liu, Yan Zhao, Sikai Chen, Bin Ran · 2026

Multimodal large language models (MLLMs) have shown strong potential for autonomous driving, yet existing benchmarks remain largely ego-centric and therefore cannot systematically assess model perform…

Can Hierarchical Cross-Modal Fusion Predict Human Perception of AI Dubbed Content?

Ashwini Dasare, Nirmesh Shah, Ashishkumar Gudmalwar, Pankaj Wasnik · 2026

Evaluating AI-generated dubbed content is inherently multi-dimensional, shaped by synchronization, intelligibility, speaker consistency, emotional alignment, and semantic context. Human Mean Opinion S…

Fine-Tuning Large Language Models for Cooperative Tactical Deconfliction of Small Unmanned Aerial Systems

Iman Sharifi, Alex Zongo, Peng Wei · 2026

The growing deployment of small Unmanned Aerial Systems (sUASs) in low-altitude airspaces has increased the need for reliable tactical deconfliction under safety-critical constraints. Tactical deconfl…

Hybrid Diffusion Model for Breast Ultrasound Image Augmentation

Farhan Fuad Abir, Sanjeda Sara Jennifer, Niloofar Yousefi, Laura J. Brattain · 2026

We propose a hybrid diffusion-based augmentation framework to overcome the critical challenge of ultrasound data augmentation in breast ultrasound (BUS) datasets. Unlike conventional diffusion-based a…

DiT-Flow: Speech Enhancement Robust to Multiple Distortions based on Flow Matching in Latent Space and Diffusion Transformers

Tianyu Cao, Helin Wang, Ari Frummer, Yuval Sieradzki, Adi Arbel, Laureano Moro Velazquez, Jesus Villalba, Oren Gal, Thomas Thebaud, Najim Dehak · 2026

Recent advances in generative models, such as diffusion and flow matching, have shown strong performance in audio tasks. However, speech enhancement (SE) models are typically trained on limited datase…

Developing an ESG-Oriented Large Language Model through ESG Practices

Gabriel Assis, Ayrton Surica, Pedro Kroll, Gabriela Aires, Darian Rabbani, Edson Bollis, Lucas Pellicer, Aline Paes · 2026

Environmental, Social, and Governance (ESG) considerations play a central role in contemporary financial decision-making. In parallel, Large Language Model (LLM) applications in this domain have prima…
