Contributed to the 3DTV Content Search Project, sponsored by the European FP7 Programme [URL]
Publications
Selected Papers
Junyoung Seo, Kazumi Fukuda, Takashi Shibuya, Takuya Narihira, Naoki Murata, Shoukang Hu, Chieh-Hsin Lai, Seungryong Kim, Yuki Mitsufuji, “GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping,” accepted at Neural Information Processing Systems (NeurIPS), 2024 [arXiv][code][demo]
Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon, “PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher,” accepted at Neural Information Processing Systems (NeurIPS), 2024 [arXiv][code]
Silin Gao, Mete Ismayilzada, Mengjie Zhao, Hiromi Wakaki, Yuki Mitsufuji, Antoine Bosselut, “DiffuCOMET: Contextual Commonsense Knowledge Diffusion,” In Proc. the Annual Meeting of the Association for Computational Linguistics (ACL), pp. 4809–4831, 2024 [ACL][arXiv][code]
Yixiao Zhang, Yukara Ikemiya, Gus Xia, Naoki Murata, Marco Martínez, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon, “MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models,” In Proc. International Joint Conferences on Artificial Intelligence (IJCAI) AI, Arts & Creativity Track, pp. 7805–7813, 2024 [IJCAI][arXiv][code][demo][video]
Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, Stefano Ermon, “Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion,” in Proc. International Conference on Learning Representations (ICLR), 2024 [OpenReview][arXiv][code][demo]
Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao, Yuki Mitsufuji, Zico Kolter, Ruslan Salakhutdinov, Stefano Ermon, “Manifold Preserving Guided Diffusion,” in Proc. International Conference on Learning Representations (ICLR), 2024 [OpenReview][arXiv][demo]
Yuhta Takida, Masaaki Imaizumi, Takashi Shibuya, Chieh-Hsin Lai, Toshimitsu Uesaka, Naoki Murata, Yuki Mitsufuji, “SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer,” in Proc. International Conference on Learning Representations (ICLR), 2024 [OpenReview][arXiv][code][demo]
Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji, “STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events,” in Proc. Neural Information Processing Systems (NeurIPS), 2023 [OpenReview][arXiv][code][dataset][demo]
Silin Gao, Beatriz Borges, Soyoung Oh, Deniz Bayazit, Saya Kanno, Hiromi Wakaki, Yuki Mitsufuji, Antoine Bosselut, “PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives,” in Proc. the Annual Meeting of the Association for Computational Linguistics (ACL), pp. 6569–6591, 2023 [ACL][arXiv][code][bibtex] – Outstanding Paper Award [Certificate]
Ryosuke Sawata, Naoya Takahashi, Stefan Uhlich, Shusuke Takahashi, Yuki Mitsufuji, “The Whole Is Greater than the Sum of Its Parts: Improving Music Source Separation by Bridging Networks,” EURASIP Journal on Audio, Speech, and Music Processing (EURASIP J. ASMP), vol. 2024, Issue 1, pp. 39–58, 2024 [EURASIP][arXiv]
Stefan Uhlich, Giorgio Fabbro, Masato Hirano, Shusuke Takahashi, Gordon Wichern, Jonathan Le Roux, Dipam Chakraborty, Sharada Mohanty, Kai Li, Yi Luo, Jianwei Yu, Rongzhi Gu, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Mikhail Sukhovei, Yuki Mitsufuji, “The Sound Demixing Challenge 2023 – Cinematic Demixing Track,” Transactions of the International Society for Music Information Retrieval (Trans. ISMIR), vol. 7, Issue 1, pp. 44–62, 2024 [TISMIR][arXiv][challenge]
Giorgio Fabbro, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, Wei-Hsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Fabian-Robert Stöter, Alexandre Défossez, Yi Luo, Jianwei Yu, Dipam Chakraborty, Sharada Mohanty, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Nabarun Goswami, Tatsuya Harada, Minseok Kim, Jun Hyung Lee, Yuanliang Dong, Xinran Zhang, Jiafeng Liu, Yuki Mitsufuji, “The Sound Demixing Challenge 2023 – Music Demixing Track,” Transactions of the International Society for Music Information Retrieval (Trans. ISMIR), vol. 7, Issue 1, pp. 63–84, 2024 [TISMIR][arXiv][challenge]
Yuhta Takida, Wei-Hsiang Liao, Toshimitsu Uesaka, Shusuke Takahashi, Yuki Mitsufuji, “Preventing Oversmoothing in VAE via Generalized Variance Parameterization,” Neurocomputing, vol. 509, pp. 137–156, 2022 [Elsevier][arXiv]
Yuki Mitsufuji, Giorgio Fabbro, Stefan Uhlich, Fabian-Robert Stöter, Alexandre Défossez, Minseok Kim, Woosung Choi, Chin-Yun Yu, Kin-Wai Cheuk, “Music Demixing Challenge 2021,” Frontiers in Signal Processing (Front. signal process.), vol. 1, 2022 [Frontiers][arXiv][challenge][bibtex]
Jihui Aimee Zhang, Naoki Murata, Yu Maeno, Prasanga N. Samarasinghe, Thushara D. Abhayapala, Yuki Mitsufuji, “Coherence-Based Performance Analysis on Noise Reduction in Multichannel Active Noise Control Systems,” Journal of the Acoustical Society of America (JASA), vol. 148, issue 3, 2020 [ASA]
Yuki Mitsufuji, Norihiro Takamune, Shoichi Koyama, Hiroshi Saruwatari, “Multichannel Blind Source Separation Based on Evanescent-Region-Aware Non-Negative Tensor Factorization in Spherical Harmonic Domain,” IEEE/ACM Transactions on Audio, Speech, and Language Processing (Trans. ASLP), vol. 29, pp. 607–617, 2020 [IEEE][bibtex]
Tetsu Magariyachi, Yuki Mitsufuji, “Analytic Error Control Methods for Efficient Rotation in Dynamic Binaural Rendering of Ambisonics,” Journal of the Acoustical Society of America (JASA), vol. 147, issue 1, 2020 [ASA]
Yu Maeno, Yuki Mitsufuji, Prasanga N. Samarasinghe, Naoki Murata, Thushara D. Abhayapala, “Spherical-Harmonic-Domain Feedforward Active Noise Control Using Sparse Decomposition of Reference Signals from Distributed Sensor Arrays,” IEEE/ACM Transactions on Audio, Speech, and Language Processing (Trans. ASLP), vol. 28, pp. 656–670, 2019 [IEEE][bibtex]
Yuki Mitsufuji, Stefan Uhlich, Norihiro Takamune, Daichi Kitamura, Shoichi Koyama, Hiroshi Saruwatari, “Multichannel Non-Negative Matrix Factorization Using Banded Spatial Covariance Matrices in Wavenumber Domain,” IEEE/ACM Transactions on Audio, Speech, and Language Processing (Trans. ASLP), vol. 28, pp. 49–60, 2019 [IEEE][bibtex]
Fabian-Robert Stöter, Stefan Uhlich, Antoine Liutkus, Yuki Mitsufuji, “Open-Unmix – A Reference Implementation for Music Source Separation,” Journal of Open Source Software (JOSS), vol. 4, no. 41, p. 1667, 2019 [OSI][code][bibtex]
Yuki Mitsufuji, Axel Röbel, “On the Use of a Spatial Cue as Prior Information for Stereo Sound Source Separation Based on Spatially Weighted Non-Negative Tensor Factorization,” EURASIP Journal on Advances in Signal Processing (EURASIP J. Adv. Signal Process.), issue 1, 2014 [Springer][bibtex]
Junyoung Seo, Kazumi Fukuda, Takashi Shibuya, Takuya Narihira, Naoki Murata, Shoukang Hu, Chieh-Hsin Lai, Seungryong Kim, Yuki Mitsufuji, “GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping,” accepted at Neural Information Processing Systems (NeurIPS), 2024 [arXiv][code][demo]
Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon, “PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher,” accepted at Neural Information Processing Systems (NeurIPS), 2024 [arXiv][code]
Roser Batlle-Roca, Wei-Hsiang Liao, Xavier Serra, Yuki Mitsufuji, Emilia Gómez, “Towards Assessing Data Replication in Music Generation with Music Similarity Metrics on Raw Audio,” accepted at the 25th International Society for Music Information Retrieval (ISMIR) Conference, 2024 [arXiv][code]
Marco Comunita, Zhi Zhong, Akira Takahashi, Shiqi Yang, Mengjie Zhao, Koichi Saito, Yukara Ikemiya, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji, “SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond,” accepted at the 25th International Society for Music Information Retrieval (ISMIR) Conference, 2024 [arXiv][demo]
Mayank Kumar Singh, Naoya Takahashi, Wei-Hsiang Liao, Yuki Mitsufuji, “SilentCipher: Deep Audio Watermarking,” In Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 2235–2239, 2024 [ISCA][arXiv][code][demo]
Silin Gao, Mete Ismayilzada, Mengjie Zhao, Hiromi Wakaki, Yuki Mitsufuji, Antoine Bosselut, “DiffuCOMET: Contextual Commonsense Knowledge Diffusion,” In Proc. the Annual Meeting of the Association for Computational Linguistics (ACL), pp. 4809–4831, 2024 [ACL][arXiv][code]
Mengjie Zhao, Junya Ono, Zhi Zhong, Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Wei-Hsiang Liao, Takashi Shibuya, Hiromi Wakaki, Yuki Mitsufuji, “On the Language Encoder of Contrastive Cross-modal Models,” In Findings of the Annual Meeting of the Association for Computational Linguistics (ACL), pp. 4923–4940, 2024 [ACL][arXiv]
Sungho Lee, Marco Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich, Giorgio Fabbro, Kyogu Lee, Yuki Mitsufuji, “Searching For Music Mixing Graphs: A Pruning Approach,” In Proc. Digital Audio Effects Conference (DAFx), pp. 147–154, 2024 [DAFx][arXiv][demo] – Best Show & Tell Award [Certificate]
Yu-Hua Chen, Woosung Choi, Wei-Hsiang Liao, Marco A. Martínez-Ramírez, Kin Wai Cheuk, Yuki Mitsufuji, Jyh-Shing Roger Jang, Yi-Hsuan Yang, “Improving Unsupervised Clean-to-Rendered Guitar Tone Transformation Using GANs and Integrated Unaligned Clean Data,” in Proc. Digital Audio Effects Conference (DAFx), pp. 192–199, 2024 [DAFx][arXiv][demo]
Yixiao Zhang, Yukara Ikemiya, Gus Xia, Naoki Murata, Marco Martínez, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon, “MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models,” In Proc. International Joint Conferences on Artificial Intelligence (IJCAI) AI, Arts & Creativity Track, pp. 7805–7813, 2024 [IJCAI][arXiv][code][demo][video]
Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, Stefano Ermon, “Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion,” in Proc. International Conference on Learning Representations (ICLR), 2024 [OpenReview][arXiv][code][demo]
Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao, Yuki Mitsufuji, Zico Kolter, Ruslan Salakhutdinov, Stefano Ermon, “Manifold Preserving Guided Diffusion,” in Proc. International Conference on Learning Representations (ICLR), 2024 [OpenReview][arXiv][demo]
Yuhta Takida, Masaaki Imaizumi, Takashi Shibuya, Chieh-Hsin Lai, Toshimitsu Uesaka, Naoki Murata, Yuki Mitsufuji, “SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer,” in Proc. International Conference on Learning Representations (ICLR), 2024 [OpenReview][arXiv][code][demo]
Carlos Hernandez-Olivan, Koichi Saito, Naoki Murata, Chieh-Hsin Lai, Marco A. Martínez-Ramirez, Wei-Hsiang Liao, Yuki Mitsufuji, “VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 596–600, 2024 [IEEE][arXiv][demo]
Kazuki Shimada, Kengo Uchida, Yuichiro Koyama, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji, Tatsuya Kawahara, “Zero- and Few-shot Sound Event Localization and Detection,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 636–640, 2024 [IEEE][arXiv]
Frank Cwitkowitz, Kin-Wai Cheuk, Woosung Choi, Marco A. Martínez-Ramírez, Keisuke Toyama, Wei-Hsiang Liao, Yuki Mitsufuji, “Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 1291–1295, 2024 [IEEE][arXiv]
Takashi Shibuya, Yuhta Takida, Yuki Mitsufuji, “BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 10121–10125, 2024 [IEEE][arXiv][demo][code]
Hao Shi, Kazuki Shimada, Masato Hirano, Takashi Shibuya, Yuichiro Koyama, Zhi Zhong, Shusuke Takahashi, Tatsuya Kawahara, Yuki Mitsufuji, “Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 12951–12955, 2024 [IEEE][arXiv]
Eleonora Grassucci, Yuki Mitsufuji, Ping Zhang, Danilo Comminiello, “Enhancing Semantic Communication with Deep Generative Models – An ICASSP Special Session Overview,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 13021–13025, 2024 [IEEE][arXiv]
Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji, “STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events,” in Proc. Neural Information Processing Systems (NeurIPS), pp. 72931–72957, 2023 [OpenReview][arXiv][code][dataset][demo]
Zhi Zhong, Hao Shi, Masato Hirano, Kazuki Shimada, Kazuya Tateishi, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji, “Extending Audio Masked Autoencoders Toward Audio Restoration,” in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 1–5, 2023 [IEEE][arXiv][bibtex]
Keisuke Toyama, Taketo Akama, Yukara Ikemiya, Yuhta Takida, Wei-Hsiang Liao, Yuki Mitsufuji, “Automatic Piano Transcription with Hierarchical Frequency-Time Transformer,” in Proc. International Society for Music Information Retrieval (ISMIR) Conference, 2023 [ISMIR][arXiv][code]
Ryosuke Sawata, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji, “Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement,” in Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 3824–3828, 2023 [ISCA][arXiv][code]
Silin Gao, Beatriz Borges, Soyoung Oh, Deniz Bayazit, Saya Kanno, Hiromi Wakaki, Yuki Mitsufuji, Antoine Bosselut, “PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives,” in Proc. the Annual Meeting of the Association for Computational Linguistics (ACL), pp. 6569–6591, 2023 [ACL][arXiv][code][bibtex] – Outstanding Paper Award [Certificate]
Naoki Murata, Koichi Saito, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon, “GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Linear Inverse Problems with Denoising Diffusion Restoration,” in Proc. International Conference on Machine Learning (ICML), pp. 25501–25522, 2023 [PMLR][OpenReview][arXiv][code][bibtex]
Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon, “FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation,” in Proc. International Conference on Machine Learning (ICML), pp. 18365–18398, 2023 [PMLR][OpenReview][arXiv][code][bibtex]
Zhi Zhong, Masato Hirano, Kazuki Shimada, Kazuya Tateishi, Shusuke Takahashi, Yuki Mitsufuji, “An Attention-based Approach to Hierarchical Multi-label Music Instrument Classification,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 1–5, 2023 [IEEE][arXiv][bibtex]
Koichi Saito, Naoki Murata, Toshimitsu Uesaka, Chieh-Hsin Lai, Yuhta Takida, Takao Fukui, Yuki Mitsufuji, “Unsupervised Vocal Dereverberation with Diffusion-based Generative Models,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 [IEEE][arXiv][demo][bibtex]
Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich, Kyogu Lee, Yuki Mitsufuji, “Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 [IEEE][arXiv][demo][code][bibtex]
Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji, “Hierarchical Diffusion Models for Singing Voice Neural Vocoder,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 [IEEE][arXiv][demo][bibtex]
Kin-Wai Cheuk, Ryosuke Sawata, Toshimitsu Uesaka, Naoki Murata, Naoya Takahashi, Shusuke Takahashi, Dorien Herremans, Yuki Mitsufuji, “DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 [IEEE][arXiv][demo][code][bibtex]
Hao-Wen Dong, Naoya Takahashi, Yuki Mitsufuji, Julian McAuley, Taylor Berg-Kirkpatrick, “CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos,” in Proc. International Conference on Learning Representations (ICLR), 2023 [OpenReview][arXiv][demo][code][bibtex]
Silin Gao, Jena D. Hwang, Saya Kanno, Hiromi Wakaki, Yuki Mitsufuji, Antoine Bosselut, “ComFact: A Benchmark for Linking Contextual Commonsense Knowledge,” In Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1656–1675, 2022 [ACL][arXiv][code][bibtex]
Marco A. Martínez Ramírez, Wei-Hsiang Liao, Giorgio Fabbro, Stefan Uhlich, Chihiro Nagashima, Yuki Mitsufuji, “Automatic Music Mixing with Deep Learning and Out-of-Domain Data,” in Proc. the 23rd International Society for Music Information Retrieval (ISMIR) Conference, pp. 411–418, 2022 [ISMIR][arXiv][demo][code]
Johannes Imort, Giorgio Fabbro, Marco A. Martínez Ramírez, Stefan Uhlich, Yuichiro Koyama, Yuki Mitsufuji, “Distortion Audio Effects: Learning How to Recover the Clean Signal,” in Proc. the 23rd International Society for Music Information Retrieval (ISMIR) Conference, pp. 218–225, 2022 [ISMIR][arXiv][demo]
Yuhta Takida, Takashi Shibuya, Wei-Hsiang Liao, Chieh-Hsin Lai, Junki Ohmura, Toshimitsu Uesaka, Naoki Murata, Shusuke Takahashi, Toshiyuki Kumakura, Yuki Mitsufuji, “SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization,” in Proc. International Conference on Machine Learning (ICML), pp. 20987–21012, 2022 [PMLR][arXiv][code][bibtex]
Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, Naoya Takahashi, Emiru Tsunoo, Yuki Mitsufuji, “Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 316–320, 2022 [IEEE][arXiv][bibtex]
Bo-Yu Chen, Wei-Han Hsu, Wei-Hsiang Liao, Marco A. Martínez Ramírez, Yuki Mitsufuji, Yi-Hsuan Yang, “Automatic DJ Transitions with Differentiable Audio Effects and Generative Adversarial Networks,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 466–470, 2022 [IEEE][arXiv][demo][code][bibtex]
Yuichiro Koyama, Kazuhide Shigemi, Masafumi Takahashi, Kazuki Shimada, Naoya Takahashi, Emiru Tsunoo, Shusuke Takahashi, Yuki Mitsufuji, “Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 8872–8876, 2022 [IEEE][arXiv][bibtex]
Yuichiro Koyama, Naoki Murata, Stefan Uhlich, Giorgio Fabbro, Shusuke Takahashi, Yuki Mitsufuji, “Music Source Separation with Deep Equilibrium Models,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 296–300, 2022 [IEEE][arXiv][bibtex]
Ricardo Falcon-Perez, Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, Yuki Mitsufuji, “Spatial Mixup: Directional Loudness Modification as Data Augmentation for Sound Event Localization and Detection,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 431–435, 2022 [IEEE][arXiv][code][bibtex]
Naoya Takahashi, Yuki Mitsufuji, “Amicable Examples for Informed Source Separation,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 241–245, 2022 [IEEE][arXiv][bibtex]
Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji, “Source Mixing and Separation Robust Audio Steganography,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 4368–4372, 2022 [arXiv]
Yasuhide Hyodo, Chihiro Sugai, Junya Suzuki, Masafumi Takahashi, Masahiko Koizumi, Asako Tomura, Yuki Mitsufuji, Yota Komoriya, “Psychophysiological Effect of Immersive Spatial Audio Experience Enhanced Using Sound Field Synthesis,” in Proc. International Conference on Affective Computing & Intelligent Interaction (ACII), pp. 1–8, 2021 [IEEE][bibtex]
Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji, “Hierarchical Disentangled Representation Learning for Singing Voice Conversion,” in Proc. International Joint Conference on Neural Networks (IJCNN), pp. 1–7, 2021 [IEEE][arXiv][bibtex]
Naoya Takahashi, Yuki Mitsufuji, “Densely Connected Multi-Dilated Convolutional Networks for Dense Prediction Tasks,” in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 993–1002, 2021 [CVF][IEEE][arXiv][code][bibtex]
Kazuki Shimada, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, “ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization And Detection,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 915–919, 2021 [IEEE][arXiv][code][bibtex]
Naoya Takahashi, Shota Inoue, Yuki Mitsufuji, “Adversarial Attacks on Audio Source Separation,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 521–525, 2021 [IEEE][arXiv][bibtex]
Ryosuke Sawata, Stefan Uhlich, Shusuke Takahashi, Yuki Mitsufuji, “All for One and One for All: Improving Music Separation by Bridging Networks,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 51–55, 2021 [IEEE][arXiv][code][bibtex]
Yu Maeno, Yuhta Takida, Naoki Murata, Yuki Mitsufuji, “Array-Geometry-Aware Spatial Active Noise Control Based on Direction-of-Arrival Weighting,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 8414–8418, 2020 [IEEE][bibtex]
Naoya Takahashi, Mayank Kumar Singh, Sakya Basak, Parthasaarathy Sudarsanam, Sriram Ganapathy, Yuki Mitsufuji, “Improving Voice Separation by Incorporating End-To-End Speech Recognition,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 41–45, 2020 [IEEE][arXiv][bibtex]
Naoki Murata, Jihui Zhang, Yu Maeno, Yuki Mitsufuji, “Global and Local Mode Domain Adaptive Algorithms for Spatial Active Noise Control Using Higher-Order Sources,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 526–530, 2019 [IEEE][bibtex]
Naoya Takahashi, Sudarsanam Parthasaarathy, Nabarun Goswami, Yuki Mitsufuji, “Recursive Speech Separation for Unknown Number of Speakers,” in Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 1348–1352, 2019 [ISCA][arXiv][bibtex]
Naoya Takahashi, Purvi Agrawal, Nabarun Goswami, Yuki Mitsufuji, “PhaseNet: Discretized Phase Modeling with Deep Neural Networks for Audio Source Separation,” in Proc. Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 2713–2717, 2018 [ISCA][bibtex]
Wei-Hsiang Liao, Yuki Mitsufuji, Keiichi Osako, Kazunobu Ohkuri, “Microphone Array Geometry for Two Dimensional Broadband Sound Field Recording,” in Proc. 145th Audio Engineering Society (AES) Convention, 2018 [AES][bibtex]
Yu Maeno, Yuki Mitsufuji, Prasanga N. Samarasinghe, Thushara D. Abhayapala, “Mode-domain Spatial Active Noise Control Using Multiple Circular Arrays,” in Proc. International Workshop on Acoustic Signal Enhancement (IWAENC), pp. 441–445, 2018 [IEEE][bibtex]
Naoya Takahashi, Nabarun Goswami, Yuki Mitsufuji, “MMDenseLSTM: An Efficient Combination of Convolutional and Recurrent Neural Networks for Audio Source Separation,” in Proc. International Workshop on Acoustic Signal Enhancement (IWAENC), 2018 [IEEE][arXiv][bibtex]
Yuki Mitsufuji, Asako Tomura, Kazunobu Ohkuri, “Creating a Highly-Realistic ‘Acoustic Vessel Odyssey’ Using Sound Field Synthesis with 576 Loudspeakers,” in Proc. Audio Engineering Society (AES) Conference on Spatial Reproduction-Aesthetics and Science, 2018 [AES][bibtex]
Yu Maeno, Yuki Mitsufuji, Thushara D. Abhayapala, “Mode Domain Spatial Active Noise Control Using Sparse Signal Representation,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 211–215, 2018 [IEEE][arXiv][bibtex]
Naoya Takahashi, Yuki Mitsufuji, “Multi-Scale Multi-Band DenseNets for Audio Source Separation,” in Proc. Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 21–25, 2017 [IEEE][arXiv][bibtex]
Stefan Uhlich, Marcello Porcu, Franck Giron, Michael Enenkl, Thomas Kemp, Naoya Takahashi, Yuki Mitsufuji, “Improving Music Source Separation Based on Deep Neural Networks Through Data Augmentation and Network Blending,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 261–265, 2017 [IEEE][bibtex]
Keiichi Osako, Yuki Mitsufuji, Rita Singh, Bhiksha Raj, “Supervised Monaural Source Separation Based on Autoencoders,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 11–15, 2017 [IEEE][bibtex]
Yuki Mitsufuji, Shoichi Koyama, Hiroshi Saruwatari, “Multichannel Blind Source Separation Based on Non-Negative Tensor Factorization in Wavenumber Domain,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 56–60, 2016 [IEEE][bibtex]
Stefan Uhlich, Franck Giron, Yuki Mitsufuji, “Deep Neural Network Based Instrument Extraction from Music,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 2135–2139, 2015 [IEEE][bibtex]
Xin Guo, Stefan Uhlich, Yuki Mitsufuji, “NMF-Based Blind Source Separation Using a Linear Predictive Coding Error Clustering Criterion,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 261–265, 2015 [IEEE][bibtex]
Yuki Mitsufuji, Marco Liuni, Alex Baker, Axel Röbel, “Online Non-Negative Tensor Deconvolution for Source Detection in 3DTV Audio,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 3082–3086, 2014 [IEEE][bibtex]
Yuki Mitsufuji, Axel Röbel, “Sound Source Separation Based on Non-Negative Tensor Factorization Incorporating Spatial Cue as Prior Knowledge,” in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 71–75, 2013 [IEEE][bibtex]
R. Oguz Araz, Joan Serrà, Xavier Serra, Yuki Mitsufuji, Dmitry Bogdanov, “DISCOGS-VINET-MIREX,” Cover Song Identification Track (MIREX), 2024 [MIREX]
Mengjie Zhao, Zhi Zhong, Zhuoyuan Mao, Shiqi Yang, Wei-Hsiang Liao, Shusuke Takahashi, Hiromi Wakaki, Yuki Mitsufuji, “Demonstrating OpenMU-LightBench: A Benchmark Suite for Music Understanding,” ISMIR Late Breaking Demo (ISMIR LBD), 2024 [ISMIR]
Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Giorgio Fabbro, Michele Mancusi, Yuki Mitsufuji, “ITO-Master: Inference-Time Optimization for Music Mastering Style Transfer,” ISMIR Late Breaking Demo (ISMIR LBD), 2024 [ISMIR]
Yin-Jyun Luo, Kin Wai Cheuk, Woosung Choi, Toshimitsu Uesaka, Keisuke Toyama, Wei-Hsiang Liao, Simon Dixon, Yuki Mitsufuji, “Source-Level Pitch and Timbre Editing for Mixtures of Tones Using Disentangled Representations,” ISMIR Late Breaking Demo (ISMIR LBD), 2024 [ISMIR]
David Diaz-Guerra, Archontis Politis, Parthasaarathy Sudarsanam, Kazuki Shimada, Daniel A. Krause, Kengo Uchida, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Takashi Shibuya, Yuki Mitsufuji, Tuomas Virtanen, “Baseline Models and Evaluation of Sound Event Localization and Detection with Distance Estimation in DCASE2024 Challenge,” in Proc. Detection and Classification of Acoustic Scenes and Events 2024 Workshop (DCASE Workshop), 2024 [DCASE]
Koichi Saito, Dongjun Kim, Takashi Shibuya, Chieh-Hsin Lai, Zhi Zhong, Yuhta Takida, Yuki Mitsufuji, “SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation,” accepted at NeurIPS Workshop on AI-Driven Speech, Music, and Sound Generation (NeurIPS Audio Imagination), 2024 [arXiv][code][demo]
Yin-Jyun Luo, Kin Wai Cheuk, Woosung Choi, Toshimitsu Uesaka, Keisuke Toyama, Koichi Saito, Chieh-Hsin Lai, Yuhta Takida, Wei-Hsiang Liao, Simon Dixon, Yuki Mitsufuji, “Disentangling Mixtures of Musical Instruments for Source-level Pitch and Timbre Manipulation,” accepted at NeurIPS Workshop on AI-Driven Speech, Music, and Sound Generation (NeurIPS Audio Imagination), 2024 [arXiv]
Mayank Kumar Singh, Naoya Takahashi, Wei-Hsiang Liao, Yuki Mitsufuji, “LOCKEY: A Novel Approach to Model Authentication and Deepfake Tracking,” accepted at NeurIPS Workshop on AI-Driven Speech, Music, and Sound Generation (NeurIPS Audio Imagination), 2024 [arXiv][demo]
Yutong He, Alexander Robey, Naoki Murata, Yiding Jiang, Joshua Williams, George J. Pappas, Hamed Hassani, Yuki Mitsufuji, Ruslan Salakhutdinov, J. Zico Kolter, “Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation,” accepted at Workshop on Creativity and Artificial Intelligence (NeurIPS Creativity), 2024 [arXiv]
Satoshi Hayakawa, Yuhta Takida, Masaaki Imaizumi, Hiromi Wakaki, Yuki Mitsufuji, “Distillation of Discrete Diffusion through Dimensional Correlations,” accepted at Workshop on Machine Learning and Compression (NeurIPS Neural Compression), 2024 [arXiv]
Yunkee Chae, Woosung Choi, Yuhta Takida, Junghyun Koo, Yukara Ikemiya, Zhi Zhong, Kin Wai Cheuk, Marco A. Martínez-Ramírez, Kyogu Lee, Wei-Hsiang Liao, Yuki Mitsufuji, “VRVQ: Variable Bitrate Residual Vector Quantization for Audio Compression,” accepted at Workshop on Machine Learning and Compression (NeurIPS Neural Compression), 2024 [arXiv]
Masato Ishii, Akio Hayakawa, Takashi Shibuya, Yuki Mitsufuji, “A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation,” in Proc. ECCV Workshop Audio-Visual Generation and Learning (ECCV AVGenL), 2024 [arXiv]
Silin Gao, Mete Ismayilzada, Mengjie Zhao, Hiromi Wakaki, Yuki Mitsufuji, Antoine Bosselut, “DiffuCOMET: Contextual Commonsense Knowledge Diffusion,” in Proc. the Third Workshop on Knowledge Augmented Methods for NLP (KnowledgeNLP), 2024 [URL]
Sungho Lee, Marco Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich, Giorgio Fabbro, Kyogu Lee, Yuki Mitsufuji, “GRAFX: An Open-source Library for Audio Processing Graphs in PyTorch,” in Proc. DAFx Demo/LBR (DAFx Demo/LBR), 2024 [DAFx][arXiv]
Toshimitsu Uesaka, Taiji Suzuki, Yuhta Takida, Chieh-Hsin Lai, Naoki Murata, Yuki Mitsufuji, “Understanding Multimodal Contrastive Learning Through Pointwise Mutual Information,” in Proc. ICLR Workshop on Bridging the Gap Between Practice and Theory in Deep Learning (ICLR BGPT), 2024 [arXiv]
Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, Stefano Ermon, “Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion,” NeurIPS Workshop on Diffusion Models (NeurIPS WDM), 2023 [URL]
Yu-Hua Chen, Woosung Choi, Wei-Hsiang Liao, Marco A. Martínez-Ramírez, Kin-Wai Cheuk, Yi-Hsuan Yang, Yuki Mitsufuji, “Neural Amplifier Modelling with Several GAN Variants,” ISMIR Late Breaking Demo (ISMIR LBD), 2023 [ISMIR][demo]
Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Naoki Murata, Yuki Mitsufuji, Stefano Ermon, “On the Equivalence of Consistency-Type Models: Consistency Models, Consistent Diffusion Models, and Fokker-Planck Regularization,” ICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling (ICML SPIGM), 2023 [OpenReview][arXiv]
Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji, “Toward an Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events,” CVPR 2023 Workshop Sight and Sound (CVPR WSS), 2023 [URL][dataset]
Silin Gao, Jena D. Hwang, Saya Kanno, Hiromi Wakaki, Yuki Mitsufuji, Antoine Bosselut, “ComFact: A Benchmark for Linking Contextual Commonsense Knowledge,” in Proc. AAAI 2023 Workshop on Knowledge Augmented Methods for NLP (KnowledgeNLP-AAAI’23), 2023 [AAAI][code]
Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon, “Regularizing Score-based Models with Score Fokker-Planck Equations,” in Proc. NeurIPS 2022 Workshop on Score-Based Methods (NeurIPS SBM), 2022 [OpenReview]
Archontis Politis, Kazuki Shimada, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, Tuomas Virtanen, “STARSS22: A Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events,” in Proc. Detection and Classification of Acoustic Scenes and Events 2022 Workshop (DCASE Workshop), 2022 [DCASE][arXiv][dataset]
Fabian-Robert Stöter, Maria Clara Machry, Delton de Andrade Vaz, Stefan Uhlich, Yuki Mitsufuji, Antoine Liutkus, “Open.Unmix.app – Towards Audio Separation on the Edge,” Web Audio Conference (WAC), 2021 [URL][demo]
Joachim Muth, Stefan Uhlich, Nathanael Perraudin, Thomas Kemp, Fabien Cardinaux, Yuki Mitsufuji, “Improving DNN-based Music Source Separation Using Phase Features,” Joint Workshop on Machine Learning for Music at ICML, IJCAI/ECAI and AAMAS, 2018 [arXiv]
Roser Batlle-Roca, Emilia Gómez, Wei-Hsiang Liao, Xavier Serra, Yuki Mitsufuji, “Transparency in Music-Generative AI: A Systematic Literature Review,” under review at ACM Computing Surveys (Comput. Surv.), 2024 [preprint]
Chieh-Hsin Lai, Bac Nguyen, Wei-Hsiang Liao, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, “PadvFlow: Towards Learning Imperceptible Adversarial Distribution for Black-Box Attacks against Image Classifiers and Automatic Speech Recognition Systems,” under review at IEEE Transactions on Multimedia (TMM), 2024
Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji, “Robust One-Shot Singing Voice Conversion,” to be submitted to EURASIP Journal on Audio, Speech, and Music Processing (EURASIP J. ASMP), 2024 [arXiv][demo]
Masato Hirano, Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, Yuki Mitsufuji, “Diffusion-based Signal Refiner for Speech Separation,” to be submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing (Trans. ASLP), 2024 [arXiv]
Hao Hao Tan, Kin Wai Cheuk, Taemin Cho, Wei-Hsiang Liao, Yuki Mitsufuji, “MR-MT3: Memory Retaining Multi-Track Music Transcription to Mitigate Instrument Leakage,” under review, 2024 [arXiv]
Qiyu Wu, Mengjie Zhao, Yutong He, Lang Huang, Junya Ono, Hiromi Wakaki, Yuki Mitsufuji, “Towards Reporting Bias in Visual-Language Datasets: Bimodal Augmentation by Decoupling Object-Attribute Association,” under review, 2024 [arXiv]
Shiqi Yang, Zhi Zhong, Mengjie Zhao, Shusuke Takahashi, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji, “Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation,” under review, 2024 [arXiv]
Akio Hayakawa, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji, “Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation,” under review, 2024 [arXiv]
Yixiao Zhang, Yukara Ikemiya, Woosung Choi, Naoki Murata, Marco A. Martínez-Ramírez, Liwei Lin, Gus Xia, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon, “Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning,” under review, 2024 [arXiv][code][demo]
Kengo Uchida, Takashi Shibuya, Yuhta Takida, Naoki Murata, Shusuke Takahashi, Yuki Mitsufuji, “MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training,” under review, 2024 [arXiv][code][demo]
Hiromi Wakaki, Yuki Mitsufuji, Yoshinori Maeda, Yukiko Nishimura, Silin Gao, Mengjie Zhao, Keiichi Yamada, Antoine Bosselut, “ComperDial: Commonsense Persona-grounded Dialogue Dataset and Benchmark,” under review, 2024 [arXiv][dataset]
Michele Mancusi, Yurii Halychansky, Kin Wai Cheuk, Chieh-Hsin Lai, Stefan Uhlich, Junghyun Koo, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Giorgio Fabbro, Yuki Mitsufuji, “Latent Diffusion Bridges for Unsupervised Musical Audio Timbre Transfer,” under review, 2024 [arXiv][demo]
Giannis Daras, Hyungjin Chung, Chieh-Hsin Lai, Yuki Mitsufuji, Jong Chul Ye, Peyman Milanfar, Alexandros G. Dimakis, Mauricio Delbracio, “A Survey on Diffusion Models for Inverse Problems,” under review, 2024 [arXiv]
Saurav Jha, Shiqi Yang, Masato Ishii, Mengjie Zhao, Christian Simon, Jehanzeb Mirza, Dong Gong, Lina Yao, Shusuke Takahashi, Yuki Mitsufuji, “Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models,” under review, 2024 [arXiv]
Yangming Li, Chieh-Hsin Lai, Carola-Bibiane Schönlieb, Yuki Mitsufuji, Stefano Ermon, “Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space,” under review, 2024 [arXiv]
Ayano Hiranaka, Shang-Fu Chen, Chieh-Hsin Lai, Dongjun Kim, Naoki Murata, Takashi Shibuya, Wei-Hsiang Liao, Shao-Hua Sun, Yuki Mitsufuji, “Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning,” under review, 2024 [arXiv]
M. Jehanzeb Mirza, Mengjie Zhao, Zhuoyuan Mao, Sivan Doveh, Wei Lin, Paul Gavrikov, Michael Dorkenwald, Shiqi Yang, Saurav Jha, Hiromi Wakaki, Yuki Mitsufuji, Horst Possegger, Rogerio Feris, Leonid Karlinsky, James Glass, “GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models,” under review, 2024 [arXiv][code]
Yong-Hyun Park, Chieh-Hsin Lai, Satoshi Hayakawa, Yuhta Takida, Yuki Mitsufuji, “Jump Your Steps: Optimizing Sampling Schedule of Discrete Diffusion Models,” under review, 2024 [arXiv]
Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Bac Nguyen, Stefano Ermon, Yuki Mitsufuji, “G2D2: Gradient-Guided Discrete Diffusion for Image Inverse Problem Solving,” under review, 2024 [arXiv]
Bac Nguyen, Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Stefano Ermon, Yuki Mitsufuji, “Mitigating Embedding Collapse in Diffusion Models for Categorical Data,” under review, 2024 [arXiv]
Mengjie Zhao, Zhi Zhong, Zhuoyuan Mao, Shiqi Yang, Wei-Hsiang Liao, Shusuke Takahashi, Hiromi Wakaki, Yuki Mitsufuji, “OpenMU: Your Swiss Army Knife for Music Understanding,” under review, 2024 [arXiv][code][demo][dataset]
Wei-Hsiang Liao, Yuhta Takida, Yukara Ikemiya, Zhi Zhong, Chieh-Hsin Lai, Giorgio Fabbro, Kazuki Shimada, Keisuke Toyama, Kin Wai Cheuk, Marco A. Martínez-Ramírez, Shusuke Takahashi, Stefan Uhlich, Taketo Akama, Woosung Choi, Yuichiro Koyama, Yuki Mitsufuji, “Music Foundation Model as Generic Booster for Music Downstream Tasks,” under review, 2024 [arXiv]
Anubhav Jain, Yuya Kobayashi, Takashi Shibuya, Yuhta Takida, Nasir Memon, Julian Togelius, Yuki Mitsufuji, “Classifier-Free Guidance inside the Attraction Basin May Cause Memorization,” under review, 2024 [arXiv]
Michail Dontas, Yutong He, Naoki Murata, Yuki Mitsufuji, J. Zico Kolter, Ruslan Salakhutdinov, “Blind Inverse Problem Solving Made Easy by Text-to-Image Latent Diffusion,” under review, 2024 [arXiv]
Best Show & Tell Award for “Searching For Music Mixing Graphs: A Pruning Approach” at the Digital Audio Effects Conference (DAFx), 2024 [Certificate]
Elevated to the grade of IEEE Senior Member [Certificate][URL]
Outstanding Paper Award for “PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives” at the Annual Meeting of the Association for Computational Linguistics (ACL), 2023 [URL][Certificate]
Local Commendation for Invention 2022 Award [URL][Certificate]
Ranked 1st in Task 3 at DCASE2021 Challenge (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) [URL][arXiv]
Ranked 3rd in Task 3 at DCASE2020 Challenge (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) [arXiv]
Japan Media Arts Festival 2019 Jury Selections – Acoustic Vessel Odyssey [URL][AES]
Ranked 1st in Music Task at the 2018 Signal Separation Evaluation Campaign [URL]
Ranked 1st in Music Task at the 2016 Signal Separation Evaluation Campaign [URL]
Ranked 1st in Music Task at the 2015 Signal Separation Evaluation Campaign [URL]
Granted Patents
US11067661B2 “Information processing device and information processing method” [URL]
US10924849B2 “Sound source separation device and method” [URL]
US10880638B2 “Sound field forming apparatus and method” [URL]
US10757505B2 “Signal processing device, method, and program stored on a computer-readable medium, enabling a sound to be reproduced at a remote location and a different sound to be reproduced at a location neighboring the remote location” [URL]
US10674255B2 “Sound processing device, method and program” [URL]
US10650841B2 “Sound source separation apparatus and method” [URL]
US10602266B2 “Audio processing apparatus and method, and program” [URL]
US10595148B2 “Sound processing apparatus and method, and program” [URL]
US10567872B2 “Locally silenced sound field forming apparatus and method” [URL]
US10524075B2 “Sound processing apparatus, method, and program” [URL]
US10477309B2 “Sound field reproduction device, sound field reproduction method, and program” [URL]
US10412531B2 “Audio processing apparatus, method, and program” [URL]
US10380991B2 “Signal processing device, signal processing method, and program for selectable spatial correction of multichannel audio signal” [URL]
US10206034B2 “Sound field collecting apparatus and method, sound field reproducing apparatus and method” [URL]
US10015615B2 “Sound field reproduction apparatus and method, and program” [URL]
US9711161B2 “Voice processing apparatus, voice processing method, and program” [URL]
US9654872B2 “Input device, signal processing method, program, and recording medium” [URL]
US9426564B2 “Audio processing device, method and program” [URL]
US9406312B2 “Signal processing apparatus and signal processing method, encoder and encoding method, decoder and decoding method, and program” [URL]
US9380398B2 “Sound processing apparatus, method, and program” [URL]
US9208795B2 “Frequency band extending device and method, encoding device and method, decoding device and method, and program” [URL]
US8295507B2 “Frequency band extending apparatus, frequency band extending method, player apparatus, playing method, program and recording medium” [URL]
Academic Activities
Competition Organizer
Sounding Video Generation Challenge
Sounding Video Generation (SVG) Challenge 2024 [URL]
General Chair of Music Demixing (MDX) Challenge 2021 [URL] [report][Workshop]
IEEE DCASE Challenge
Task Organizer of DCASE2024 Challenge Task 3: “Audio and Audiovisual Sound Event Localization and Detection with Source Distance Estimation” [URL][dataset]
Task Organizer of DCASE2023 Challenge Task 3: “Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes” [URL][report][dataset]
Task Organizer of DCASE2022 Challenge Task 3: “Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes” [URL][report][dataset]
Workshop Organizer at ECCV 2024: “AVGenL: Audio-Visual Generation and Learning” [URL]
IEEE Audio and Acoustic Signal Processing Technical Committee (AASP TC) Member 2023–2026 [URL]
IEEE ICCE Japan Program Committee Chair 2021–2023
Session Chair at IEEE ICASSP 2024: “Generative Semantic Communication: How Generative Models Enhance Semantic Communications” [URL]
Oral Session Chair at IEEE ICASSP 2023: “Diffusion-based Generative Models for Audio and Speech” [URL]
Session Chair at IEEE ICASSP 2022: “Signal Processing and Neural Approaches for Soundscapes (SiNApS)” [URL]
Session Chair at IEEE ICASSP 2020: “Active Control of Acoustic Noise over Spatial Regions” [URL]
PhD Supervision
TRAMUCA: Transparency in AI-powered Music Creation Algorithms, a 4-year fully funded PhD studentship by Sony and MTG-UPF, jointly supervised with Dr. Emilia Gómez and Dr. Xavier Serra [URL]
“Transparency in Music-Generative AI: A Systematic Literature Review” [preprint]
Roser Batlle-Roca, Wei-Hsiang Liao, Xavier Serra, Yuki Mitsufuji, Emilia Gómez, “Towards Assessing Data Replication in Music Generation with Music Similarity Metrics on Raw Audio,” accepted at the 25th International Society for Music Information Retrieval (ISMIR) Conference, 2024 [arXiv][code]
Lectures at Universities
“AI x Creators: Pushing Creative Abilities to the Next Level” at Matsuo Lab, the University of Tokyo on Dec. 12, 2024 [URL]
“Deep Generative Models for Audio Applications” at AI Research Center, National Institute of Advanced Industrial Science and Technology (AIST) on Mar. 22, 2024 [URL]
“Deep Generative Models for Audio Applications” at Télécom Paris (Audio/ADASP group) on Jan. 25, 2024 [URL]
“AI x Creators: Pushing Creative Abilities to the Next Level” at Matsuo Lab, the University of Tokyo on Nov. 24, 2023 [URL]
“AI & Network Communication Systems”, 7-lecture Course at Tokyo Institute of Technology, 3rd Quarter (Fall), 2023 [URL]
“AI x Creators: Pushing Creative Abilities to the Next Level” at Matsuo Lab, the University of Tokyo on Dec. 16, 2022 [URL]
“AI & Network Communication Systems”, 7-lecture Course at Tokyo Institute of Technology, 3rd Quarter (Fall), 2022 [URL]
“AI x Creators: Pushing Creative Abilities to the Next Level” at Matsuo Lab, the University of Tokyo on Feb. 16, 2022 [URL]
“Content Creation by Cutting Edge AI-powered Music Technology” at Tokyo Institute of Technology on Dec. 1, 2021 [URL]
“AI x Creators: Pushing Creative Abilities to the Next Level” at Keio University on Oct. 21, 2021