Databases in Engineering — Research Repository

Engineering Preprint PDF DOI

Content-Based Image Retrieval Using COSFIRE Descriptors with application to Radio Astronomy

Steven Ndungu, Trienko Grobler, Stefan J. Wijnholds, George Azzopardi · 2024

The morphologies of astronomical sources are highly complex, making it essential not only to classify the identified sources into their predefined categories but also to determine the sources that are…

Towards General Text-guided Image Synthesis for Customized Multimodal Brain MRI Generation

Yulin Wang, Honglin Xiong, Kaicong Sun, Shuwei Bai, Ling Dai, Zhongxiang Ding, Jiameng Liu, Qian Wang, Qian Liu, Dinggang Shen · 2024

Multimodal brain magnetic resonance (MR) imaging is indispensable in neuroscience and neurology. However, due to the accessibility of MRI scanners and their lengthy acquisition time, multimodal MR ima…

M-Vec: Matryoshka Speaker Embeddings with Flexible Dimensions

Shuai Wang, Pengcheng Zhu, Haizhou Li · 2024

Fixed-dimensional speaker embeddings have become the dominant approach in speaker modeling, typically spanning hundreds to thousands of dimensions. These dimensions are hyperparameters that are not sp…

Language-based Audio Moment Retrieval

Hokuto Munakata, Taichi Nishimura, Shota Nakada, Tatsuya Komatsu · 2024

In this paper, we propose and design a new task called audio moment retrieval (AMR). Unlike conventional language-based audio retrieval tasks that search for short audio clips from an audio database, …

A Large Language Model and Denoising Diffusion Framework for Targeted Design of Microstructures with Commands in Natural Language

Nikita Kartashov, Nikolaos N. Vlassis · 2024

Microstructure plays a critical role in determining the macroscopic properties of materials, with applications spanning alloy design, MEMS devices, and tissue engineering, among many others. Computati…

Deep Learning-Based Channel Squeeze U-Structure for Lung Nodule Detection and Segmentation

Mingxiu Sui, Jiacheng Hu, Tong Zhou, Zibo Liu, Likang Wen, Junliang Du · 2024

This paper introduces a novel deep-learning method for the automatic detection and segmentation of lung nodules, aimed at advancing the accuracy of early-stage lung cancer diagnosis. The proposed appr…

Utility of Multimodal Large Language Models in Analyzing Chest X-ray with Incomplete Contextual Information

Choonghan Kim, Seonhee Cho, Joo Heung Yoon · 2024

Background: Large language models (LLMs) are gaining use in clinical settings, but their performance can suffer with incomplete radiology reports. We tested whether multimodal LLMs (using text and ima…

Arena 4.0: A Comprehensive ROS2 Development and Benchmarking Platform for Human-centric Navigation Using Generative-Model-based Environment Generation

Volodymyr Shcherbyna, Linh Kastner, Diego Diaz, Huu Giang Nguyen, Maximilian Ho-Kyoung Schreff, Tim Lenz, Jonas Kreutz, Ahmed Martban, Huajian Zeng, Harold Soh · 2024

Building on the foundations of our previous work, this paper introduces Arena 4.0, a significant advancement over Arena 3.0, Arena-Bench, Arena 1.0, and Arena 2.0. Arena 4.0 offers three key novel con…

From Words to Wheels: Automated Style-Customized Policy Generation for Autonomous Driving

Xu Han, Xianda Chen, Zhenghan Cai, Pinlong Cai, Meixin Zhu, Xiaowen Chu · 2024

Autonomous driving technology has witnessed rapid advancements, with foundation models improving interactivity and user experiences. However, current autonomous vehicles (AVs) face significant limitat…

P-RAG: Progressive Retrieval Augmented Generation For Planning on Embodied Everyday Task

Weiye Xu, Min Wang, Wengang Zhou, Houqiang Li · 2024

Embodied Everyday Task is a popular task in the embodied AI community, requiring agents to make a sequence of actions based on natural language instructions and visual observations. Traditional learni…

Stimulus Modality Matters: Impact of Perceptual Evaluations from Different Modalities on Speech Emotion Recognition System Performance

Huang-Cheng Chou, Haibin Wu, Hung-yi Lee, Chi-Chun Lee · 2024

Speech Emotion Recognition (SER) systems rely on speech input and emotional labels annotated by humans. However, various emotion databases collect perceptional evaluations in different ways. For insta…

Learnings from curating a trustworthy, well-annotated, and useful dataset of disordered English speech

Pan-Pan Jiang, Jimmy Tobin, Katrin Tomanek, Robert L. MacDonald, Katie Seaver, Richard Cave, Marilyn Ladewig, Rus Heywood, Jordan R. Green · 2024

Project Euphonia, a Google initiative, is dedicated to improving automatic speech recognition (ASR) of disordered speech. A central objective of the project is to create a large, high-quality, and div…

Using Ear-EEG to Decode Auditory Attention in Multiple-speaker Environment

Haolin Zhu, Yujie Yan, Xiran Xu, Zhongshu Ge, Pei Tian, Xihong Wu, Jing Chen · 2024

Auditory Attention Decoding (AAD) can help to determine the identity of the attended speaker during an auditory selective attention task, by analyzing and processing measurements of electroencephalogr…

Music auto-tagging in the long tail: A few-shot approach

T. Aleksandra Ma, Alexander Lerch · 2024

In the realm of digital music, using tags to efficiently organize and retrieve music from extensive databases is crucial for music catalog owners. Human tagging by experts is labor-intensive but mostl…

Contextualization of ASR with LLM using phonetic retrieval-based augmentation

Zhihong Lei, Xingyu Na, Mingbin Xu, Ernest Pusateri, Christophe Van Gysel, Yuanyuan Zhang, Shiyi Han, Zhen Huang · 2024

Large language models (LLMs) have shown superb capability of modeling multimodal signals including audio and text, allowing the model to generate spoken or textual response given a speech input. Howev…

The Role of Explainable AI in Revolutionizing Human Health Monitoring: A Review

Abdullah Alharthi, Ahmed Alqurashi, Turki Alharbi, Mohammed Alammar, Nasser Aldosari, Houssem Bouchekara, Yusuf Shaaban, Mohammad Shoaib Shahriar, Abdulrahman Al Ayidh · 2024

The complex nature of disease mechanisms and the variability of patient symptoms pose significant challenges in developing effective diagnostic tools. Although machine learning (ML) has made substanti…

3DGCQA: A Quality Assessment Database for 3D AI-Generated Contents

Yingjie Zhou, Zicheng Zhang, Farong Wen, Jun Jia, Yanwei Jiang, Xiaohong Liu, Xiongkuo Min, Guangtao Zhai · 2024

Although 3D generated content (3DGC) offers advantages in reducing production costs and accelerating design timelines, its quality often falls short when compared to 3D professionally generated conten…

Scanning Electron Microscopy-based Automatic Defect Inspection for Semiconductor Manufacturing: A Systematic Review

Enrique Dehaerne, Bappaditya Dey, Victor Blanco, Jesse Davis · 2024

In this review, automatic defect inspection algorithms that analyze Scanning Electron Microscopy (SEM) images for Semiconductor Manufacturing (SM) are identified, categorized, and discussed. This is a…

Retrieval Augmented Correction of Named Entity Speech Recognition Errors

Ernest Pusateri, Anmol Walia, Anirudh Kashi, Bortik Bandyopadhyay, Nadia Hyder, Sayantan Mahinder, Raviteja Anantha, Daben Liu, Sashank Gondala · 2024

In recent years, end-to-end automatic speech recognition (ASR) systems have proven themselves remarkably accurate and performant, but these systems still have a significant error rate for entity names…

ECG Biometric Authentication Using Self-Supervised Learning for IoT Edge Sensors

Guoxin Wang, Shreejith Shanker, Avishek Nag, Yong Lian, Deepu John · 2024

Wearable Internet of Things (IoT) devices are gaining ground for continuous physiological data acquisition and health monitoring. These physiological signals can be used for security applications to a…
