Jean-Marc Valin — Publications

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

Peer-reviewed journal articles

J.-M. Valin, J. Büthe, A. Mustafa, M. Klingbeil, DRED: Deep REDundancy Coding of Speech Using a Rate-Distortion-Optimized Variational Autoencoder, IEEE Journal of Selected Topics in Signal Processing, 2024.
J.-M. Valin, A. Mustafa, J. Büthe, Very Low Complexity Speech Synthesis Using Framewise Autoregressive GAN (FARGAN) with Pitch Prediction, IEEE Signal Processing Letters, 2024, arXiv:2405.21069.
Y. Chen, D. Mukherjee, J. Han, A. Grange, Y. Xu, S. Parker, C. Chen, H. Su, U. Joshi, C.-H. Chiang, Y. Wang, P. Wilkins, J. Bankoski, L. Trudeau, N. Egge, J.-M. Valin, T. Davies, S. Midtskogen, A. Norkin, P. de Rivaz, Z. Liu, An Overview of Coding Tools in AV1: the First Video Codec from the Alliance for Open Media, APSIPA Transactions on Signal and Information Processing, 2020. (publisher)
J.-M. Valin, D. V. Smith, C. Montgomery, T. B. Terriberry, An Iterative Linearised Solution to the Sinusoidal Parameter Estimation Problem, Computers and Electrical Engineering (Elsevier), Vol. 36, No. 4, pp. 603-616, 2010. (arXiv, demo)
J.-M. Valin, T. B. Terriberry, C. Montgomery, G. Maxwell, A High-Quality Speech and Audio Codec With Less Than 10 ms delay, IEEE Transactions on Audio, Speech and Language Processing, Vol. 18, No. 1, pp. 58-67, 2010. (arXiv)
J.-M. Valin, S. Yamamoto, J. Rouat, F. Michaud, K. Nakadai, H. G. Okuno, Robust Recognition of Simultaneous Speech By a Mobile Robot, IEEE Transactions on Robotics, Vol. 23, No. 4, pp. 742-752, 2007. (arXiv)
J.-M. Valin, I. B. Collings, Interference-Normalised Least Mean Square Algorithm, IEEE Signal Processing Letters, Vol. 14, No. 12, pp. 988-991, 2007. (arXiv)
J.-M. Valin, On Adjusting the Learning Rate in Frequency Domain Echo Cancellation With Double-Talk, IEEE Transactions on Audio, Speech and Language Processing, Vol. 15, No. 3, pp. 1030-1034, 2007. (arXiv)
J.-M. Valin, F. Michaud, J. Rouat, Robust Localization and Tracking of Simultaneous Moving Sound Sources Using Beamforming and Particle Filtering, Robotics and Autonomous Systems Journal (Elsevier), Vol. 55, No. 3, pp. 216-228, 2007. (arXiv, video)
F. Michaud, C. Côté, D. Létourneau, Y. Brosseau, J.-M. Valin, E. Beaudry, C. Raïevsky, A. Ponchon, P. Moisan, P. Lepage, Y. Morin, F. Gagnon, P. Giguère, M.-A. Roux, S. Caron, P. Frenette, F. Kabanza, Spartacus attending the 2005 AAAI conference, Autonomous Robots (Springer), Vol. 22, No. 4, pp. 369-383, 2007. (video)
S. Yamamoto, J.-M. Valin, K. Nakadai, M. Nakano, H. Tsujino, K. Komatani, T. Ogata, H. G. Okuno, Simultaneous Speech Recognition based on Automatic Missing-Feature Mask Generation Integrated With Sound Source Separation (in Japanese). Journal of the Robotics Society of Japan, Vol. 25, No. 1, 2007.
S. Yamamoto, K. Nakadai, M. Nakano, H. Tsujino, J.-M. Valin, R. Takeda, K. Komatani, T. Ogata, H. G. Okuno, Improving Location-Based Speech Recognition of Simultaneous Speech Signals by Parameter Optimization with Genetic Algorithm (in Japanese). Human Interface, Vol. 8, No. 2, pp. 203-212, 2006.
D. Létourneau, F. Michaud, J.-M. Valin, Autonomous Mobile Robot That Can Read, EURASIP Journal on Applied Signal Processing, Special Issue on Advances in Intelligent Vision Systems: Methods and Applications, pp. 2650-2662, 2004.

Peer-reviewed conference and workshop papers

2025

D. Rowe, J.-M. Valin, RADE: A Neural Codec for Transmitting Speech over HF Radio Channels, Proceedings of WASPAA, arXiv:2505.06671, 2025.
J. Büthe, J.-M. Valin, A Lightweight and Robust Method for Blind Wideband-to-Fullband Extension of Speech, Proceedings of WASPAA, arXiv:2412.11392, 2025.

2024

J. Büthe, A. Mustafa, J.-M. Valin, K. Helwani, M.M. Goodwin, NoLACE: Improving Low-Complexity Speech Codec Enhancement Through Adaptive Temporal Shaping, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024, arXiv:2309.14521.
K. Subramani, J.-M. Valin, J. Büthe, P. Smaragdis, M.M. Goodwin, Noise-Robust DSP-Assisted Neural Pitch Estimation with Very Low Complexity, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024, arXiv:2309.14507.
M. Togami, J.-M. Valin, K. Helwani, R. Giri, U. Isik, M.M. Goodwin, Real-time Stereo Speech Enhancement with Spatial-Cue Preservation based on Dual-path Structure, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024, arXiv:2402.00337.

2023

J. Büthe, J.-M. Valin, A. Mustafa, LACE: A light-weight, causal model for enhancing coded speech through adaptive convolutions, Proceedings of WASPAA, arXiv:2307.06610, 2023.
J.-M. Valin, J. Büthe, A. Mustafa, Low-Bitrate Redundancy Coding of Speech Using a Rate-Distortion-Optimized Variational Autoencoder, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023, arXiv:2212.04453v2.
A. Mustafa, J.-M. Valin, J. Büthe, P. Smaragdis, M.M. Goodwin, Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023, arXiv:2212.04532.
Z. Wang, R. Giri, D. Shah, J.-M. Valin, M.M. Goodwin, P. Smaragdis, A Framework for Unified Real-Time Personalized and Non-Personalized Speech Enhancement, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023, arXiv:2302.11768.

2022

J.-M. Valin, A. Mustafa, C. Montgomery, T.B. Terriberry, M. Klingbeil, P. Smaragdis, A. Krishnaswamy, Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model, Proceedings of INTERSPEECH, arXiv:2205.05785, 2022.
K. Subramani, J.-M. Valin, U. Isik, P. Smaragdis, A. Krishnaswamy, End-to-end LPCNet: A Neural Vocoder With Fully-Differentiable LPC Estimation, Proceedings of INTERSPEECH, arXiv:2106.04129, 2022.
J.-M. Valin, U. Isik, P. Smaragdis, A. Krishnaswamy, Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), arXiv:2106.04129, 2022.
S. Yuan, Z. Wang, U. Isik, R. Giri, J.-M. Valin, M.M. Goodwin, A. Krishnaswamy, Improved Singing Voice Separation with Chromagram-Based Pitch-Aware Remixing, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022.

2021

R. Giri, S. Venkataramani, J.-M. Valin, U. Isik, A. Krishnaswamy, Personalized PercepNet: Real-time, Low-Complexity Target Voice Separation and Enhancement, Proceedings of INTERSPEECH, arXiv:2106.04129, 2021.
L. Drude, J. Heymann, A. Schwarz, J.-M. Valin, Multi-Channel Opus Compression for Far-Field Automatic Speech Recognition With a Fixed Bitrate Budget, Proceedings of INTERSPEECH, arXiv:2106.07994, 2021.
J.-M. Valin, S. Tenneti, K. Helwani, U. Isik, A. Krishnaswamy, Low-Complexity, Real-Time Joint Neural Echo Control and Speech Enhancement Based on PercepNet, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), arXiv:2102.05245, 2021. First place in ICASSP 2021 Acoustic Echo Cancellation Challenge.
J. Casebeer, V. Vale, U. Isik, J.-M. Valin, R. Giri, A. Krishnaswamy, Enhancing Into the Codec: Noise Robust Speech Coding With Vector-Quantized Autoencoders, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), arXiv:2102.06610, 2021.
Z. Wang, R. Giri, U. Isik, J.-M. Valin, A. Krishnaswamy, Semi-Supervised Singing Voice Separation With Noisy Self-Training, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), arXiv:2102.07961, 2021.

2020

U. Isik, R. Giri, N. Phansalkar, J.-M. Valin, K. Helwani, A. Krishnaswamy, PoCoNet: Better Speech Enhancement with Frequency-Positional Embeddings, Semi-Supervised Conversational Data, and Biased Loss, Proceedings of INTERSPEECH, arXiv:2008.04470, 2020. (video) First place in INTERSPEECH 2020 Deep Noise Suppression Challenge non real-time track.
J.-M. Valin, U. Isik, N. Phansalkar, R. Giri, K. Helwani, A. Krishnaswamy, A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech, Proceedings of INTERSPEECH, arXiv:2008.04259, 2020. (video) Second place in INTERSPEECH 2020 Deep Noise Suppression Challenge real-time track.
J. Skoglund, J.-M. Valin, Improving Opus Low Bit Rate Quality with Neural Speech Synthesis, Proceedings of INTERSPEECH, arXiv:1905.04628, 2020. (video)

2019

J.-M. Valin, J. Skoglund, A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet, Proceedings of INTERSPEECH, arXiv:1903.12087, 2019. (poster, demo)
J.-M. Valin, J. Skoglund, LPCNet: Improving Neural Speech Synthesis Through Linear Prediction, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), arXiv:1810.11846, 2019. (slides, demo)

2018

J.-M. Valin, A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech Enhancement, Proceedings of IEEE Multimedia Signal Processing (MMSP) Workshop, arXiv:1709.08243, 2018. (poster, demo)
Y. Chen, D. Mukherjee, J. Han, A. Grange, Y. Xu, Z. Liu, S. Parker, C. Chen, H. Su, U. Joshi, C.-H. Chiang, Y. Wang, P. Wilkins, J. Bankoski, L. Trudeau, N. Egge, J.-M. Valin, T. Davies, S. Midtskogen, A. Norkin, P. de Rivaz, An Overview of Core Coding Tools in the AV1 Video Codec, Proceedings of Picture Coding Symposium (PCS), 2018.
S. Midtskogen, J.-M. Valin, The AV1 Constrained Directional Enhancement Filter (CDEF), Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), arXiv:1602.05975, 2018. (slides, demo)

2016

Y. Cho, T. J. Daede, N. E. Egge, G. Martres, T. Matthews, C. Montgomery, T. B. Terriberry, J.-M. Valin, Perceptually-Driven Video Coding with the Daala Video Codec, Proceedings of SPIE Workshop on Applications of Digital Image Processing (ADIP), arXiv:1610.02488, 2016. (slides)
J.-M. Valin, T. B. Terriberry, N. E. Egge, T. J. Daede, Y. Cho, C. Montgomery, M. Bebenita, Daala: Building a Next-Generation Video Codec From Unconventional Technology, Proceedings of IEEE Multimedia Signal Processing (MMSP) Workshop, 2016. (arXiv, poster, demo)
J.-M. Valin, N. E. Egge, T. J. Daede, T. B. Terriberry, C. Montgomery, Daala: A Perceptually-Driven Still Picture Codec, Proceedings of IEEE International Conference on Image Processing (ICIP), 2016. (arXiv, slides)
T. J. Daede, N. E. Egge, J.-M. Valin, G. Martres, T. B. Terriberry, Daala: A Perceptually-Driven Next Generation Video Codec, Presented at Data Compression Conference (DCC), 2016. (arXiv)

2015

N. Egge, J.-M. Valin, T. B. Terriberry, T. Daede, C. Montgomery, Using Daala Intra Frames for Still Picture Coding, Proceedings of Picture Coding Symposium (PCS), 2015. (slides)
J.-M. Valin, T. B. Terriberry, Perceptual Vector Quantization for Video Coding, Proceedings of SPIE Visual Information Processing and Communication, 2015. (arXiv, slides, demo)
N. Egge, J.-M. Valin, Predicting Chroma from Luma with Frequency Domain Intra Prediction, Proceedings of SPIE Visual Information Processing and Communication, 2015. (arXiv, slides, demo)

2013

J.-M. Valin, G. Maxwell, T. B. Terriberry, K. Vos, High-Quality, Low-Delay Music Coding in the Opus Codec, Proceedings of the 135th AES Convention, 2013. (arXiv, slides, demo)
K. Vos, K. V. Sorensen, S. S. Jensen, J.-M. Valin, Voice Coding with Opus, Proceedings of the 135th AES Convention, 2013. (slides)

2012

M. Frechette, D. Létourneau, J.-M. Valin, F. Michaud, Integration of Sound Source Localization and Separation to Improve Dialogue Management on a Robot, Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2012.

2009

F. Sabrina, J.-M. Valin, Priority based dynamic rate control for VoIP traffic, Proceedings of Globecom, 2009.
A. P. Badali, J.-M. Valin, F. Michaud, P. Aarabi, Evaluating Real-time Audio Localization Algorithms for Artificial Audition on Mobile Robots, Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2009.
J.-M. Valin, T. B. Terriberry, G. Maxwell, A Full-Bandwidth Audio Codec with Low Complexity and Very Low Delay, Proceedings of EUSIPCO, 2009. (arXiv, slides)
D. J. Ryan, I. B. Collings, J.-M. Valin, Reflected Simplex Codebooks for Limited Feedback MIMO Beamforming, Proceedings of IEEE International Conference on Communications (ICC), 2009.

2008

F. Sabrina, J.-M. Valin, Adaptive Rate Control for Aggregated VoIP Traffic, Proceedings of Globecom, 2008.
J.-M. Valin, Perceptually-Motivated Nonlinear Channel Decorrelation For Stereo Acoustic Echo Cancellation, Proceedings of Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA), 2008. (arXiv, poster)
S. Brière, J.-M. Valin, F. Michaud, D. Létourneau, Embedded Auditory System for Small Mobile Robots, Proceedings of International Conference on Robotics and Automation (ICRA), 2008.
H. G. Okuno, S. Yamamoto, K. Nakadai, J.-M. Valin, K. Komatani, T. Ogata, A Portable Robot Audition Software System for Multiple Simultaneous Speech Signals, Proceedings of Acoustics'08.

2007

J.-M. Valin, D. V. Smith, C. Montgomery, T. B. Terriberry, Low-Complexity Iterative Sinusoidal Parameter Estimation, Proceedings of International Conference on Signal Processing and Communication Systems (ICSPCS), pp. 276-283, 2007. (arXiv, slides)
J.-M. Valin, I.B. Collings, A New Robust Frequency Domain Echo Canceller With Closed-Loop Learning Rate Adaptation, Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2007. (arXiv, poster)
S. Yamamoto, K. Nakadai, M. Nakano, H. Tsujino, J.-M. Valin, K. Komatani, T. Ogata, H. G. Okuno, Design and Implementation of a Robot Audition System for Automatic Speech Recognition of Simultaneous Speech, Proceedings of ASRU, 2007.

2006

J.-M. Valin, F. Michaud, J. Rouat, Robust 3D Localization and Tracking of Sound Sources Using Beamforming and Particle Filtering, Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 841-844, 2006. (arXiv, slides)
J.-M. Valin, C. Montgomery, Improved Noise Weighting in CELP Coding of Speech - Applying the Vorbis Psychoacoustic Model To Speex, Proceedings of the 120th AES Convention, 2006. (arXiv, slides)
J.-M. Valin, Channel Decorrelation For Stereo Acoustic Echo Cancellation In High-Quality Audio Communication, Proceedings of Workshop on the Internet, Telecommunications and Signal Processing (WITSP), 2006. (arXiv, slides)
S. Yamamoto, K. Nakadai, M. Nakano, H. Tsujino, J.-M. Valin, K. Komatani, T. Ogata, H. G. Okuno, Real-Time Robot Audition System That Recognizes Simultaneous Speech in the Real World, Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2006.
S. Yamamoto, R. Takeda, K. Nakadai, M. Nakano, H. Tsujino, J.-M. Valin, K. Komatani, T. Ogata, H. G. Okuno, Recognition of Simultaneous Speech by Estimating Reliability of Separated Signals for Robot Audition, Proceedings of 9th Biennial Pacific Rim International Conference on Artificial Intelligence (PRICAI), pp. 484-494, 2006.
S. Yamamoto, K. Nakadai, M. Nakano, H. Tsujino, J.-M. Valin, R. Takeda, K. Komatani, T. Ogata, H. G. Okuno, Genetic Algorithm based Improvement of Robot's Hearing Capabilities in Separating and Recognizing Simultaneous Speech Signals, Proceedings of Nineteenth International Conference on Industrial, Engineering and Other Applications of Applied Intelligence Systems (IEA/AIE), pp. 207-217, 2006.
S. Brière, D. Létourneau, M. Frechette, J.-M. Valin, F. Michaud, Embedded and integrated audition for a mobile robot, Proceedings of AAAI Fall Symposium Workshop Aurally Informed Performance: Integrating Machine Listening and Auditory Presentation in Robotic Systems, FS-06-01, pp. 6-10, 2006.
S. Yamamoto, R. Takeda, K. Nakadai, M. Nakano, H. Tsujino, J.-M. Valin, K. Komatani, T. Ogata, H. G. Okuno, Leak Energy based Missing Feature Mask Generation for ICA and GSS and Its Evaluation with Simultaneous Speech Recognition, Proceedings of ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition (SAPA), pp. 42-46, 2006.

2005

S. Yamamoto, K. Nakadai, J.-M. Valin, J. Rouat, F. Michaud, K. Komatani, T. Ogata, H. G. Okuno, Making a robot recognize three simultaneous sentences in real-time, Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2005.
M. Murase, S. Yamamoto, J.-M. Valin, K. Nakadai, K. Yamada, K. Komatani, T. Ogata, H. G. Okuno, Multiple Moving Speaker Tracking by Microphone Array on Mobile Robot, Proceedings of European Conference on Speech Communication and Technology (Interspeech), 2005.
F. Michaud, Y. Brosseau, C. Côté, D. Létourneau, P. Moisan, A. Ponchon, C. Raïevsky, J.-M. Valin, E. Beaudry, F. Kabanza, Modularity and Integration in the Design of a Socially Interactive Robot, Proceedings of International Workshop on Robot and Human Interactive Communication, pp. 172-177, 2005.
S. Yamamoto, J.-M. Valin, K. Nakadai, J. Rouat, F. Michaud, T. Ogata, H. G. Okuno, Enhanced Robot Speech Recognition Based on Microphone Array Source Separation and Missing Feature Theory, Proceedings of International Conference on Robotics and Automation (ICRA), 2005.
F. Michaud, D. Létourneau, P. Lepage, Y. Morin, F. Gagnon, P. Giguère, É. Beaudry, Y. Brosseau, C. Côté, A. Duquette, J.-F. Laplante, M.-A. Legault, P. Moisan, A. Ponchon, C. Raïevsky, M.-A. Roux, T. Salter, J.-M. Valin, S. Caron, P. Frenette, P. Masson, F. Kabanza, M. Lauria, Socially interactive robots for real life use, Proceedings Workshop on Mobile Robot Competition, American Association for Artificial Intelligence Conference (AAAI), 2005.
F. Michaud, D. Létourneau, P. Lepage, Y. Morin, F. Gagnon, P. Giguère, E. Beaudry, Y. Brosseau, C. Côté, A. Duquette, J.-F. Laplante, M.-A. Legault, P. Moisan, A. Ponchon, C. Raïevsky, M.-A. Roux, T. Salter, J.-M. Valin, S. Caron, P. Masson, F. Kabanza, M. Lauria, A brochette of socially interactive robots, Proceedings of American Association for Artificial Intelligence Conference, pp. 1733-1734, 2005.

2004

J.-M. Valin, J. Rouat, F. Michaud, Enhanced Robot Audition Based on Microphone Array Source Separation with Post-Filter, Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2123-2128, 2004. (arXiv, slides, demo)
C. Côté, D. Létourneau, F. Michaud, J.-M. Valin, Y. Brosseau, C. Raïevsky, M. Lemay, V. Tran, Code Reusability Tools for Programming Mobile Robots, Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1820-1825, 2004. (slides)
J.-M. Valin, J. Rouat, F. Michaud, Microphone Array Post-Filter for Separation of Simultaneous Non-Stationary Sources, Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 221-224, 2004. (arXiv, slides)
J.-M. Valin, F. Michaud, B. Hadjou, J. Rouat, Localization of Simultaneous Moving Sound Sources for Mobile Robot Using a Frequency-Domain Steered Beamformer Approach, Proceedings of IEEE International Conference on Robotics and Automation (ICRA), pp. 1033-1038, 2004. (arXiv, slides)
M. Lemay, F. Michaud, D. Létourneau, J.-M. Valin, Autonomous Initialization of Robot Formation, Proceedings of IEEE International Conference on Robotics and Automation (ICRA), pp. 3018-3023, 2004.

2003

J.-M. Valin, F. Michaud, J. Rouat, D. Létourneau, Robust Sound Source Localization Using a Microphone Array on a Mobile Robot, Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1228-1233, 2003. (arXiv, slides, video)
D. Létourneau, F. Michaud, J.-M. Valin, C. Proulx, Textual Message Read by a Mobile Robot, Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2724-2729, 2003.
D. Létourneau, F. Michaud, J.-M. Valin, C. Proulx, Making a Mobile Robot Read Textual Messages, Proceedings of IEEE International Conference on Systems, Man and Cybernetics, pp. 4236-4241, 2003.

2002

F. Michaud, D. Létourneau, M. Gilbert, J.-M. Valin, Dynamic Robot Formations Using Directional Visual Perception, Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2740-2745, 2002.

2000

J.-M. Valin, R. Lefebvre, Bandwidth Extension of Narrowband Speech for Low Bit-Rate Wideband Coding, Proceedings of IEEE Speech Coding Workshop (SCW), 2000, pp. 130-132. (arXiv, slides)

1999

J.-M. Valin, D. Stork, Open Mind Speech Recognition, Proceedings of Automatic Speech Recognition and Understanding Workshop (ASRU), 1999.
S.D. Peters, P. Stubley, J.-M. Valin, On the Limits of Speech Recognition in Noise, Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1999, pp. 365-368.

Thesis and Dissertation

J.-M. Valin, Auditory System For a Mobile Robot, PhD Thesis, 102 pp., 2005. (arXiv, defence slides)
J.-M. Valin, Extension spectrale d'un signal de parole de la bande téléphonique à la bande AM (Bandwidth Extension of a Speech Signal from the Telephone Band to the AM Band), Master's dissertation, 65 pp., 2001. (arXiv)

Unpublished/Preprints

J.-M. Valin, R. Giri, S. Venkataramani, U. Isik, A. Krishnaswamy, To Dereverb Or Not to Dereverb? Perceptual Studies On Real-Time Dereverberation Targets, arXiv:2206.07917, 2022.
J.-M. Valin, Speex: A Free Codec For Free Speech, arXiv:1602.08668 [cs.SD], Presented at linux.conf.au, Dunedin, 2006. (slides)

IETF Documents

RFC

J.-M. Valin, K. Vos, Updates to the Opus Audio Codec, RFC 8251, Internet Engineering Task Force (IETF), October 2017.
J.-M. Valin, C. Bran, WebRTC Audio Codec and Processing Requirements, RFC 7874, Internet Engineering Task Force (IETF), May 2016.
J. Spittka, K. Vos, J.-M. Valin, RTP Payload Format for Opus Speech and Audio Codec, RFC 7587, Internet Engineering Task Force (IETF), June 2015.
J.-M. Valin, K. Vos, T. B. Terriberry, Definition of the Opus Audio Codec, RFC 6716, Internet Engineering Task Force (IETF), Sep 2012.
J.-M. Valin, S. Borilin, K. Vos, C. Montgomery, R. Chen, Guidelines for Development of an Audio Codec within the IETF, RFC 6569, Internet Engineering Task Force (IETF), Mar 2012.
C. Perkins, J.-M. Valin, Guidelines for the Use of Variable Bit Rate Audio with Secure RTP, RFC 6562, Internet Engineering Task Force (IETF), Mar 2012.
J.-M. Valin, K. Vos, Requirements for an Internet Audio Codec, RFC 6366, Internet Engineering Task Force (IETF), Aug 2011.
G. Herlein, J.-M. Valin, A. Heggestad, A. Moizard, RTP Payload Format for the Speex Codec, RFC 5574, Internet Engineering Task Force (IETF), June 2009.

Internet Drafts

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

J.-M. Valin, T.B. Terriberry, Extension Formatting for the Opus Codec, Internet draft, Internet Engineering Task Force (IETF), March 2023.
J.-M. Valin, J. Büthe, Deep Audio Redundancy (DRED) Extension for the Opus Codec, Internet draft, Internet Engineering Task Force (IETF), March 2023.
J.-M. Valin, Directional Deringing Filter, Internet draft, Internet Engineering Task Force (IETF), October 2015.
J.-M. Valin, Screencasting Considerations and L1-Tree Wavelet Coding, Internet draft, Internet Engineering Task Force (IETF), July 2015.
J.-M. Valin, Pyramid Vector Quantization for Video Coding, Internet draft, Internet Engineering Task Force (IETF), June 2015.

Patents

Some of these patents are available on a royalty-free basis, for use in the Opus, Daala, and AV1 codecs.

J.-M. Valin, A. Krishnaswamy, J. Peil, Audio coding with depth and bandwidth scalability, US App. 19/375,387.
R. Giri, Z. Wang, D. Shah, J.-M. Valin, M. M. Goodwin, Unified Audio Suppression Model, US App. 18/478,759.
R. Giri, M.M. Goodwin, A. Krishnaswamy, U. Isik, J.-M. Valin, Z. Wang, Semi-supervised training of a machine learning model for target speaker audio enhancement, US 12,531,067.
J.-M. Valin, J. Büthe, A. Mustafa, Neural coding for redundant audio information transmission, US 12,431,143.
A. Mustafa, J.-M. Valin, J. Büthe, P. Smaragdis, M.M. Goodwin, Efficient voice synthesis using frame-based processing, US 12,354,593.
R. Giri, S. Venkataramani, J.-M. Valin, M. U. Isik, A. Krishnaswamy, Real-time target speaker audio enhancement, US 12,272,371.
M. Togami, K. Helwani, J.-M. Valin, M.M. Goodwin, Real-time low-complexity stereo speech enhancement with spatial cue preservation, US 12,167,223.
R. Giri, U. Isik, N. Phansalkar, J.-M. Valin, K. Helwani, A. Krishnaswamy, Speech enhancement machine learning model for estimation of reverberation in a multi-task learning framework, US 12,014,748.
U. Isik, R. Giri, N. Phansalkar, J.-M. Valin, K. Helwani, A. Krishnaswamy, Convolutional neural network with positional embeddings for audio processing, US 12,008,457.
J.-M. Valin, K. Helwani, S. Tenneti, E. Soltanmohammadi, M. U. Isik, R. Newman, M. M. Goodwin, A. Krishnaswamy, Joint noise and echo suppression for two-way audio communication enhancement, US 11,924,367.
J.-M. Valin, U. Isik, N. Phansalkar, R. Giri, K. Helwani, A. Krishnaswamy, Ratio mask post-filtering for audio enhancement, US 11,521,637.
J.-M. Valin, T. B. Terriberry, Directional deringing filters, US 10,432,932.
J.-M. Valin, T. B. Terriberry, Probability modeling of intra prediction modes, US 9967594B2.
J.-M. Valin, T. B. Terriberry, Vector quantization with non-uniform distributions, US 9425820B2.
J.-M. Valin, T. B. Terriberry, Pyramid vector quantization for video coding, US 9560386B2.
T. B. Terriberry, J.-M. Valin, Method and system for two-step spreading for tonal artifact avoidance in audio coding, US 8838442B2.
J.-M. Valin, T. B. Terriberry, Methods and systems for adaptive time-frequency resolution in digital data coding, US 9008811B2.
J.-M. Valin, T. B. Terriberry, Methods and systems for bit allocation and partitioning in gain-shape vector quantization for audio coding, US 9009036B2.
J.-M. Valin, T. B. Terriberry, Methods and systems for avoiding partial collapse in multi-block audio coding, US 9015042B2.
I. B. Collings, D. Ryan, J.-M. Valin, Vector quantization in wireless communication, US 8396163B2.

Technical Notes

This is a collection of mini-papers on various subjects. They have not been published anywhere (other than here!) and have not been peer-reviewed. Some of the content may be interesting and some may be wrong; use with caution.

J.-M. Valin, Intra Paint Deringing Filter, 2015. (deprecated)
J.-M. Valin, Jmspeex' Journal of Dubious Theoretical Results, 2012-2015.
J.-M. Valin, Probability Modelling of Intra Prediction Modes, 2014.
J.-M. Valin, PVQ Encoding with Non-Uniform Distribution, 2014.
J.-M. Valin, Energy Preservation in PVQ-Based Video Coding, 2014. Mostly obsolete, see the SPIE paper Perceptual Vector Quantization for Video Coding instead.