Articles about Multimedia
SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs
17 April 2025 by Haoxuan Li and others
Information Retrieval, Computation and Language
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot
17 April 2025 by Ruixiang Jiang and Changwen Chen
Computer Vision and Pattern Recognition, Artificial Intelligence
HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation
17 April 2025 by Wenqi Dong and others
Graphics, Computer Vision and Pattern Recognition
Multimodal Fake News Video Explanation: Dataset, Analysis and Evaluation
17 April 2025 by Lizhi Chen and others
Computer Vision and Pattern Recognition, Multimedia
FashionDPO:Fine-tune Fashion Outfit Generation Model using Direct Preference Optimization
17 April 2025 by Mingzhe Yu and others
Multimedia, Information Retrieval
Saliency-Aware Diffusion Reconstruction for Effective Invisible Watermark Removal
17 April 2025 by Inzamamul Alam and others
Computer Vision and Pattern Recognition, Multimedia
A Survey on Cross-Modal Interaction Between Music and Multimodal Data
17 April 2025 by Sifei Li and others
Multimedia, Sound
Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models
17 April 2025 by Yuxiang Lin and others
Artificial Intelligence, Multimedia
Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective
17 April 2025 by Xiangru Zhu and others
Computation and Language, Artificial Intelligence
Scene-Text Grounding for Text-Based Video Question Answering
17 April 2025 by Sheng Zhou and others
Computer Vision and Pattern Recognition, Multimedia
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding
17 April 2025 by Qianqian Sun and others
Computer Vision and Pattern Recognition, Multimedia
Taming Data and Transformers for Audio Generation
16 April 2025 by Moayed Haji-Ali and others at Rice University
Sound, Computation and Language
Towards Realistic Low-Light Image Enhancement via ISP Driven Data Modeling
16 April 2025 by Zhihua Wang and others
Computer Vision and Pattern Recognition, Multimedia
Exploring Self-supervised Skeleton-based Action Recognition in Occluded Environments
16 April 2025 by Yifei Chen and others
Computer Vision and Pattern Recognition, Multimedia
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
16 April 2025 by Qintong Zhang and others at Tsinghua University
Multimedia, Artificial Intelligence
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
16 April 2025 by Roberto Henschel and others
Computer Vision and Pattern Recognition, Artificial Intelligence
Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR
16 April 2025 by Yulong Zhang and others
Computer Vision and Pattern Recognition, Multimedia
Interpreting the Linear Structure of Vision-language Model Embedding Spaces
16 April 2025 by Isabel Papadimitriou and others
Computer Vision and Pattern Recognition, Multimedia
Dependency Structure Augmented Contextual Scoping Framework for Multimodal Aspect-Based Sentiment Analysis
15 April 2025 by Hao Liu and others
Computation and Language, Multimedia
Leveraging multimodal explanatory annotations for video interpretation with Modality Specific Dataset
15 April 2025 by Elisa Ancarani and others
Computer Vision and Pattern Recognition, Multimedia
Graph-Driven Multimodal Feature Learning Framework for Apparent Personality Assessment
15 April 2025 by Kangsheng Wang and others
Computer Vision and Pattern Recognition, Computation and Language
Causal Graphical Models for Vision-Language Compositional Understanding
15 April 2025 by Fiorenzo Parascandolo and others
Computer Vision and Pattern Recognition, Artificial Intelligence
MMC: Iterative Refinement of VLM Reasoning via MCTS-based Multimodal Critique
15 April 2025 by Shuhang Liu and others
Multimedia
Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Human-like Audiobook Generation
15 April 2025 by Yan Rong and others
Sound, Multimedia
Muse: A Multimodal Conversational Recommendation Dataset with Scenario-Grounded User Profiles
15 April 2025 by Zihan Wang and others
Multimedia
UniForm: A Unified Multi-Task Diffusion Transformer for Audio-Video Generation
15 April 2025 by Lei Zhao and others
Multimedia, Artificial Intelligence
Real-Time Word-Level Temporal Segmentation in Streaming Speech Recognition
15 April 2025 by Naoto Nishida and others
Human-Computer Interaction, Multimedia
Ichiyo: Fragile and Transient Interaction in Neighborhood
15 April 2025 by Hirofumi Shibata and others
Human-Computer Interaction, Multimedia
Efficient Prompt Tuning for Hierarchical Ingredient Recognition
15 April 2025 by Yinxuan Gui and others
Multimedia
SteerMusic: Enhanced Musical Consistency for Zero-shot Text-Guided and Personalized Music Editing
15 April 2025 by Xinlei Niu and others
Sound, Multimedia