Synthical
Articles about Audio and Speech Processing
Video-Guided Text-to-Music Generation Using Public Domain Movie Collections
5 days ago by Haven Kim and others
Sound, Multimedia
PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction
5 days ago by Shufan Li and Aditya Grover
Computation and Language, Sound
Diff-TONE: Timestep Optimization for iNstrument Editing in Text-to-Music Diffusion Models
5 days ago by Teysir Baoueb and others
Sound, Machine Learning
Exploiting Music Source Separation for Automatic Lyrics Transcription with Whisper
5 days ago by Jaza Syed and others
Sound, Audio and Speech Processing
Detecting Neurocognitive Disorders through Analyses of Topic Evolution and Cross-modal Consistency in Visual-Stimulated Narratives
5 days ago by Jinchao Li and others
Audio and Speech Processing, Machine Learning
Factorized RVQ-GAN For Disentangled Speech Tokenization
5 days ago by Sameer Khurana and others
Audio and Speech Processing, Computation and Language
Seewo's Submission to MLC-SLM: Lessons learned from Speech Reasoning Language Models
5 days ago by Bo Li and others
Computation and Language, Artificial Intelligence
SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning
5 days ago by Anuradha Chopra and others
Sound, Artificial Intelligence
I Know You're Listening: Adaptive Voice for HRI
5 days ago by Paige Tuttösí
Robotics, Human-Computer Interaction
MERGE -- A Bimodal Audio-Lyrics Dataset for Static Music Emotion Recognition
5 days ago by Pedro Lima Louro and others
Sound, Information Retrieval
An accurate and revised version of optical character recognition-based speech synthesis using LabVIEW
5 days ago by Prateek Mehta and Anasuya Patil
Sound, Computation and Language
A Comparative Evaluation of Deep Learning Models for Speech Enhancement in Real-World Noisy Environments
5 days ago by Md Jahangir Alam Khondkar and others
Sound, Machine Learning
Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition
5 days ago by Jiamin Xie and others
Audio and Speech Processing, Artificial Intelligence
Beyond Universality: Cultural Diversity in Music and Its Implications for Sound Design and Sonification
6 days ago by Rubén García-Benito
Physics and Society, Sound
A Variational Framework for Improving Naturalness in Generative Spoken Language Models
6 days ago by Li-Wei Chen and others
Computation and Language, Artificial Intelligence
pycnet-audio: A Python package to support bioacoustics data processing
6 days ago by Zachary Ruff and Damon Lesmeister
Sound, Computer Vision and Pattern Recognition
ASAP-FE: Energy-Efficient Feature Extraction Enabling Multi-Channel Keyword Spotting on Edge Processors
6 days ago by Jongin Choi and others
Audio and Speech Processing, Hardware Architecture
The Perception of Phase Intercept Distortion and its Application in Data Augmentation
6 days ago by Venkatakrishnan Vaidyanathapuram Krishnan and Nathaniel Condit-Schultz
Signal Processing, Machine Learning
Quality-aware Masked Diffusion Transformer for Enhanced Music Generation
6 days ago by Chang Li and others
Sound, Artificial Intelligence
An Open Research Dataset of the 1932 Cairo Congress of Arab Music
6 days ago by Baris Bozkurt
Sound, Digital Libraries
Unifying Streaming and Non-streaming Zipformer-based ASR
6 days ago by Bidisha Sharma and others
Sound, Artificial Intelligence
M3SD: Multi-modal, Multi-scenario and Multi-language Speaker Diarization Dataset
6 days ago by Shilong Wu and others
Audio and Speech Processing, Multimedia
Generative Deep Learning and Signal Processing for Data Augmentation of Cardiac Auscultation Signals: Improving Model Robustness Using Synthetic Audio
6 days ago by Leigh Abbott and others
Sound, Audio and Speech Processing
SLEEPING-DISCO 9M: A large-scale pre-training dataset for generative music modeling
6 days ago by Tawsif Ahmed and others
Sound, Machine Learning
Fretting-Transformer: Encoder-Decoder Model for MIDI to Tablature Transcription
6 days ago by Anna Hamberger and others
Sound, Computation and Language
Target Speaker Extraction through Comparing Noisy Positive and Negative Audio Enrollments
6 days ago by Shitong Xu and others
Sound, Artificial Intelligence
Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios
6 days ago by Aswin Shanmugam Subramanian and others
Audio and Speech Processing, Computation and Language
AsyncSwitch: Asynchronous Text-Speech Adaptation for Code-Switched ASR
6 days ago by Tuan Nguyen and Huy-Dat Tran
Computation and Language, Sound
Controllable Dance Generation with Style-Guided Motion Diffusion
6 days ago by Hongsong Wang and others
Computer Vision and Pattern Recognition, Multimedia
Can we train ASR systems on Code-switch without real code-switch data? Case study for Singapore's languages
6 days ago by Tuan Nguyen and Huy-Dat Tran
Computation and Language, Artificial Intelligence