Publications
Automatic Speech Recognition (ASR)
- T.Kawahara.
Automatic meeting transcription system for the Japanese Parliament (Diet).
In Proc. APSIPA ASC, (overview talk), 2017.
(PDF file)
- K.Matsuura, S.Ueno, M.Mimura, S.Sakai, and T.Kawahara.
Speech corpus of Ainu folklore and end-to-end speech recognition
for Ainu language.
In Proc. Int'l Conf. Language Resources & Evaluation (LREC),
pp.2622--2628, 2020.
(PDF file)
- S.Ueno, A.Lee, and T.Kawahara.
Refining synthesized speech using speaker information and phone masking for data augmentation of speech recognition.
IEEE/ACM Trans. Audio, Speech & Language Process., Vol.32, pp.3924--3933, 2024.
(text)
(KURENAI)
- H.Inaguma and T.Kawahara.
Alignment knowledge distillation for online streaming attention-based speech recognition.
IEEE/ACM Trans. Audio, Speech & Language Process., Vol.31, pp.1371--1385, 2021.
(text)
Speech Emotion Recognition (SER)
- Y.Gao, H.Shi, C.Chu, and T.Kawahara.
Speech emotion recognition with multi-level acoustic and semantic information extraction and interaction.
In Proc. INTERSPEECH, pp.1060--1064, 2024.
(PDF file)
- H.Feng, S.Ueno, and T.Kawahara.
End-to-end speech emotion recognition combined with acoustic-to-word
ASR model.
In Proc. INTERSPEECH, pp.501--505, 2020.
(PDF file)
Robust Speech Recognition
- H.Shi, M.Mimura, and T.Kawahara.
Waveform-domain speech enhancement using spectrogram encoding for robust speech recognition.
IEEE/ACM Trans. Audio, Speech & Language Process., Vol.32, pp.3049--3060, 2024.
(text)
(KURENAI)
- K.Shimada, Y.Bando, M.Mimura, K.Itoyama, K.Yoshii, and T.Kawahara.
Unsupervised speech enhancement based on multichannel NMF-informed
beamforming for noise-robust automatic speech recognition.
IEEE/ACM Trans. Audio, Speech & Language Process., Vol.27, No.5, pp.960--971, 2019.
(text)
(KURENAI)
Source Separation and Speech Enhancement
- K.Sekiguchi, Y.Bando, A.A.Nugraha, K.Yoshii, and T.Kawahara.
Fast multichannel nonnegative matrix factorization with
directivity-aware jointly-diagonalizable spatial covariance matrices for
blind source separation.
IEEE/ACM Trans. Audio, Speech & Language Process., Vol.28,
pp.2610--2625, 2020.
(text)
- Y.Bando, M.Mimura, K.Itoyama, K.Yoshii, and T.Kawahara.
Statistical speech enhancement based on probabilistic integration of
variational autoencoder and non-negative matrix factorization.
In Proc. IEEE-ICASSP, pp.716--720, 2018.
(PDF file)
Spoken Language Understanding (SLU)
- T.Zhao and T.Kawahara.
Joint dialog act segmentation and recognition in human conversations
using attention to dialog context.
Computer Speech and Language, Vol.50, pp.108--127, 2019.
(text)
- T.V.Dang, T.Zhao, S.Ueno, H.Inaguma, and T.Kawahara.
End-to-end speech-to-dialog-act recognition.
In Proc. INTERSPEECH, pp.3910--3914, 2020.
(PDF file)
Spoken Dialogue Systems (SDS)
- T.Kawahara.
Spoken dialogue system for a human-like conversational robot ERICA.
In Proc. Int'l Workshop Spoken Dialogue Systems (IWSDS), (keynote speech), 2018.
(PDF file)
- K.Inoue, K.Hara, D.Lala, K.Yamamoto, S.Nakamura, K.Takanashi, and T.Kawahara.
Job interviewer android with elaborate follow-up question generation.
In Proc. ICMI, pp.324--332, 2020.
(PDF file)
- K.Inoue, D.Lala, K.Yamamoto, S.Nakamura, K.Takanashi, and T.Kawahara.
An attentive listening system with android ERICA: Comparison of
autonomous and WOZ interactions.
In Proc. SIGdial Meeting Discourse & Dialogue, pp.118--127,
2020.
(PDF file)
- T.Kawahara, H.Saruwatari, R.Higashinaka, K.Komatani, and A.Lee.
Spoken dialogue technology for semi-autonomous cybernetic avatars.
In Hiroshi Ishiguro, Fuki Ueno, and Eiki Tachibana, editors,
(text)
Interaction Analysis and Model
- K.Yamamoto, K.Inoue, and T.Kawahara.
Character expression for spoken dialogue systems with semi-supervised learning using variational auto-encoder.
Computer Speech and Language, Vol.79, No.101469, 2023.
(text)
- K.Inoue, D.Lala, and T.Kawahara.
Can a robot laugh with you?: Shared laughter generation for empathetic spoken dialogue.
Frontiers in Robotics and AI (Computational Intelligence in Robotics), Vol.9, No.933261, pp.1--11, 2022.
(text)
(KURENAI)
- K.Inoue, B.Jiang, E.Ekstedt, T.Kawahara, and G.Skantze.
Multilingual turn-taking prediction using voice activity projection.
In Proc. COLING, pp.11873--11883, 2024.
(PDF file)
Multi-modal Conversation Analysis
- K.Inoue, D.Lala, K.Takanashi, and T.Kawahara.
Engagement recognition by a latent character model based on
multimodal listener behaviors in spoken dialogue.
APSIPA Trans. Signal & Information Process., Vol.7, No.e9,
pp.1--16, 2018.
(text)
- T.Kawahara, T.Iwatate, K.Inoue, S.Hayashi, H.Yoshimoto, and K.Takanashi.
Multi-modal sensing and analysis of poster conversations with smart posterboard.
APSIPA Trans. Signal & Information Process., Vol.5, No.e2, pp.1--12, 2016.
(text)
Natural Language Processing for Rich Transcription
- J.Nozaki, T.Kawahara, K.Ishizuka, and T.Hashimoto.
End-to-end speech-to-punctuated-text recognition.
In Proc. INTERSPEECH, pp.1811--1815, 2022.
(PDF file)
- M.Mimura, S.Sakai, and T.Kawahara.
An end-to-end model from speech to clean transcript for parliamentary meetings.
In Proc. APSIPA ASC, pp.465--470, 2021.
(PDF file)
Computer Assisted Language Learning (CALL)
- R.Duan, T.Kawahara, M.Dantsuji, and H.Nanjo.
Cross-lingual transfer learning of non-native acoustic modeling for
pronunciation error detection and diagnosis.
IEEE/ACM Trans. Audio, Speech & Language Process., Vol.28,
No.1, pp.391--401, 2020.
(text)
(KURENAI)
- M.Mirzaei, K.Meshgi, and T.Kawahara.
Exploiting automatic speech recognition errors to enhance partial and synchronized caption for facilitating second language listening.
Computer Speech and Language, Vol.49, pp.17--36, 2018.
(text)
Large Vocabulary Continuous Speech Recognition Platform
- A.Lee and T.Kawahara.
Recent development of open-source speech recognition engine Julius.
In Proc. APSIPA ASC, pp.131--137, 2009.
(PDF file)
- T.Kawahara, A.Lee, K.Takeda, K.Itou, and K.Shikano.
Recent progress of open-source LVCSR engine Julius and Japanese model repository.
In Proc. ICSLP, pp.3069--3072, 2004.
(PDF file)