Odette, could you tell us a bit about your background, and what the road to your current position was?
In high school, I enjoyed both languages and science topics such as physics, chemistry and biology. When researching what I wanted to study I came across “Language, Speech, and Computer Science” at Radboud University, Nijmegen, the Netherlands, which sounded like, and indeed was, an interesting combination of both languages and science topics. Probably inspired by one of my favourite TV series when I was younger, Knight Rider, which featured a car with which you could communicate through speech, I focused on speech technology from early on.
After obtaining my university degree in 2000, I was offered a PhD position in the same department where I had pursued my studies, on another interdisciplinary topic: computational modelling of human speech processing. My PhD project (2001-2005) combined theories about human speech processing (psycholinguistics) and tools and approaches from automatic speech recognition (which itself is more or less at the cross-roads of electrical engineering and computer science) in order to learn more about how humans process speech and improve automatic speech recognition (i.e., the conversion of speech into text).
After obtaining my PhD (in 2005), I went to the Speech and Hearing group in the Department of Computer Science at the University of Sheffield, UK, for a visiting post-doc position (funded by a Dutch Science Foundation (NWO) Talent Scholarship). I then returned to Radboud University for a 3-year post-doc position (funded by an NWO Veni personal fellowship) on a new computational modelling of human speech processing project. After this project, I felt that, having read so much about the theories of how humans process speech, I really wanted to know how researchers actually came to these theories. So, in the next few years, my research focused on human speech processing. First at the Max Planck Institute for Psycholinguistics, where I was trained as a psycholinguist, and subsequently, funded by an NWO Vidi personal grant, again at the Radboud University, where I became Associate Professor.
Towards the end of my Vidi-project (in 2016), I started to miss the computer science component of my earlier research and decided to try to move back into automatic speech recognition. I had an idea, met two amazing speech researchers who loved my idea, and we decided to collaborate. This collaboration (still ongoing) has allowed me to move back into the field of automatic speech recognition that at that time was rapidly changing due to the rise of deep learning.
In 2018, my Vidi project and contract at Radboud University ended, and I became unemployed. I was then headhunted by a company on automatic speech recognition for health applications. However, I felt that I wanted to stay in academia. Luckily for me, shortly after joining the company, Delft University of Technology offered me a Delft Technology Fellowship, and I joined TU Delft in June 2018, where I’ve since then worked as an Associate Professor of Speech Technology.
How important is interdisciplinarity in your research on speech?
As is probably clear from my road so far, I am an interdisciplinary researcher. The field of automatic speech recognition is already interdisciplinary in that it combines electrical engineering and computer science. However, in my research, I use my knowledge about sounds and sound structures (i.e., phonetics, a subfield of linguistics) and am inspired by and use knowledge about how humans process speech (i.e., psycholinguistics). The speech signal is a signal that can be researched and viewed from different angles: from the perspective of frequencies (physics), the perspective of the individual sounds (phonetics), meaning (semantics), as a means to convey a message or intent, etc. It also contains different types of information: the words of the message, information about the speaker’s identity, age, gender, height, health status, emotional status, native language, to name only a few.
The focus of my research is on automatic speech recognition. Automatic speech recognisers typically work well for “standard” speakers of a small number of languages. In fact, for only about 2% of all the languages in the world, there is enough annotated speech data to build automatic speech recognisers. Moreover, a large portion of society does not speak in a “standard” way: “standard” speakers are native speakers of a language, without a speech impediment, without a strong regional accent, typically highly educated, and between the ages of 18 and 60 years. As you can tell, this excludes a large portion of our society: children, elderly, people with speaking or voice disorders, deaf people, immigrants, etc. In my work, I focus on making speech technology, and particularly automatic speech recognition, available for everyone, irrespective of how one speaks and the language one speaks. In order to do so, I look at how humans process speech as they are the best speech recognisers that exist; moreover, they can quickly adapt to idiosyncrasies in a speaker’s speech or voice. Moreover, I use knowledge about how sounds and the voice sound differently depending on, for instance, the speaker’s age or health status. So, in my research towards inclusive speech technology, I combine computer science with linguistics and psycholinguistics. Interdisciplinarity is thus at the core of my research.
What disciplines do you combine yourself in your own work?
As explained above, in my research I combine multiple research fields, most notably: computer science, different subfields of psycholinguistics (first and second language learning, native and non-native speech processing; the processing of emotions) and linguistics (primarily phonetics and a bit of conversational analysis).
Could you name a grand research challenge in your current field of work?
There are several grand research challenges in my field:
- I already named one: making speech technology available for everyone, irrespective of how one speaks and what language one speaks. One of the grand challenges for this is to build speech technology for speech that is not only highly variable but for which also only a small amount of data is available (i.e., low resource scenarios).
- A second grand challenge: when people speak they often use words or phrases from another language; this is called code-switching. Automatic speech recognisers are typically built for one language; it is very hard for them to deal with code-switched speech.
- A third grand challenge: speech is often produced with background noise or background speech present. This deteriorates recognition performance tremendously. Dealing with all the different types of background noise and speech is another grand challenge.
You have been an active champion for diversity and inclusion. Could you tell us a bit more about your activities on these topics?
When I was growing academically, I did not really have a female role model, and especially not female role models who had children. When I was in my late twenties/early thirties, I found this hard because I was afraid that having children would negatively impact my chances for the next academic job and my academic career in general. Also, being not only a first-generation PhD but also a first-generation academic, it took me a really long time to realise there were unwritten rules, and to learn what these were and how to deal with them (not sure I now know all 😉 ). Then, when I became Associate Professor at Radboud University, I found that several students, male and female, regularly came to talk to me about personal and academic issues, and that they found my advice useful; I, in turn, found it interesting and motivating to talk to them. I wanted to do more regarding gender equality but didn’t know how.
Then in 2016, a group of senior female speech researchers together organised the Young Female Researchers in Speech Science and Technology Workshop, in conjunction with Interspeech, the flagship conference of the International Speech Communication Association (ISCA), in order to attract more female students into a speech PhD program. I was invited as a mentor. This workshop was highly successful and is now a yearly workshop held in conjunction with Interspeech. I joined the organisation of this workshop for 3 years. Then in 2019, having advocated gender equality in the ISCA board of which I’ve been a member since 2017, I was asked to form a new committee: the committee for gender equality. Very quickly this committee started to focus on more than gender and to look at other types of diversity, such as sexual orientation, research areas (ISCA encompasses several speech sub-areas, including phonetics, psycholinguistics, health, automatic speech recognition, speech generation, etc.), and geographical regions. Naturally, we not only wanted to attract people from diverse backgrounds but also wanted to retain them, so we also started to look into inclusion. The first thing our committee did was to create a website where female speech researchers who hold a PhD can list themselves. This website is used to help workshop/conference organisers to find female researchers for the organising committee, as panellists and keynote/invited speakers, etc. We then went on to organise diversity and inclusion meetings at Interspeech, and for 2 years we have organised a separate ISCA-queer meeting. We have held a workshop in Africa (remotely due to the pandemic) in order to reach local speech researchers there and see where we can collaborate and where we can help them with our resources and expertise. We wrote a code of conduct for session chairs at workshops/conferences in order for them to know how to balance questions from people from minority groups and non-minority groups.
To name but a few of our activities.
In 2020 I came up with the idea for a mentoring programme within the IEEE Signal Processing Society (SPS), for students from minority groups, which was well received and was funded with $50K annually. This programme, loosely based on the YFRSW-format, provides students with a mentor from our society who will supervise them for a period of 9 months, and who will mentor them and help them build a network. Each student receives $4K to visit one of the IEEE (SPS) conferences/workshops. In the first round, we awarded 9 students from all over the world.
In addition to these activities, I’ve also been on the board of the Delft Women in Science (DEWIS) at my university and the chair of the Diversity and Inclusion Committee (EDIT) at my faculty at TU Delft. Additionally, I am regularly asked to appear as a female role model in STEM for young girls and in Dutch media.
In getting to your current position, you experienced some personal hardships. In serving as a public role model, you have been open about these. How can we learn from these experiences to make academia a better place?
My CV shows the (many) consecutive positions I’ve had and how almost all are financed by personal grants that I obtained. These personal grants especially tend to attract a lot of praise. What my CV doesn’t show is the story behind it. It doesn’t show the many job applications I sent out, which never led to a position. It doesn’t show that for a period of more than 2 years I did not have a contract, meaning that I did not have any social security, while I was working on a post-doc position. It does not show how I was bullied at my previous university and the damage that did to my self-esteem, something I still struggle with. It does not show that I had to leave behind my 10-month-old daughter for a month and again for 2 weeks because I was expected to be in Germany for a post-doc position, nor does it show the two bouts of (mild) depression I suffered (one directly related to the bullying). I never talked about all of this because as a temporary (and young and female) researcher, you feel extremely vulnerable because you are so dependent on (the goodwill of) other, more senior researchers. If you don’t want to or cannot do a task, if you complain, they will simply find someone else and you are without a job again. On top of that, you often simply are not believed.
When I became a mentor for students and young researchers, I decided to share some of my struggles so that they knew that they were not the only ones who struggled and that I knew what they were going through. I began to receive feedback from these students that they appreciated my honesty and openness, which gave me the courage to be more open about my own issues. However, only after I received my permanent position at TU Delft (in 2019), and after becoming active in diversity and inclusion, did I very slowly dare to speak more openly to my colleagues and senior people about what had happened.
In late 2019, I was asked to talk about what it is like to be a female researcher in speech technology at the IEEE Workshop on Automatic Speech Recognition and Understanding. I thought about the story I wanted to tell, and eventually, I decided to tell my colleagues, including many of my close friends, my story: I started by showing them my CV, which received a lot of appreciative nods. I then told the story of my life, the story that is not shown by my CV, including the hardships. This resulted in many of my male colleagues and friends crying. Of course, this was never my intention. I don’t think that my story is that much different from that of the average person from a minority group, and there are probably quite a few men whose stories are worse than mine. What I wanted to say was: CVs might look great, or they might not. It is important not to take CVs or facts or numbers at face value; you don’t know what people go through or have done to get where they are. Everyone has a story to tell; but it is, unfortunately, the case that the bad stories far more often happen to women and other people from minority groups.
A third message or piece of advice is that if you go through a hard time, know that you are not alone. In life in general, and in academia particularly, we celebrate successes, but failures and hardships are ignored and are often considered a weakness. I strongly believe that by being open about your hardships, you will feel better yourself, and will help others deal with their hardships.
Finally, we need to see fellow academics as people and treat them as people. We need to be supportive of one another, especially of our younger colleagues and of those from minority groups. We should be mentors and role models. We should listen to what they are saying and believe what they are saying. Not question what they say, but believe them when they describe something bad that has happened to them and help, because daring to speak up takes an enormous amount of strength and courage. If one dares to speak up, believe that it is true and tell them that you know how courageous they have to be to speak up.
How and in what form do you feel we as academics can be most impactful?
As academics we have many responsibilities: we teach the younger generation, we investigate and develop new technology and theories. Some of our research has a direct impact on society, some research does not yet, some research will maybe never have a direct impact on society. I don’t believe that all research needs to have an impact. I do believe that we as academics can be impactful, and that is by explaining science to the general public. What is science? Why don’t scientists have answers to all questions? Why is what you do important? By explaining one’s research in layman’s terms, science and scientific output will become easier to understand for non-scientists. It will help shape public debate. It will lead to scientific results not being dismissed as easily as often happens nowadays. At the same time, and at least as important: by talking to people from the general public, you as an academic will see the world through their eyes and look at the impact of your work in a different way, and I am convinced it will also often explain why a certain development or technology is not adopted by society at large or by a particular group in society. In short, academics can be most impactful by communicating with the general public, and that communication should be a two-directional process.
Bios
Dr Odette Scharenborg is an Associate Professor and Delft Technology Fellow at the Multimedia Computing Group at the Delft University of Technology, the Netherlands, and the Vice-President of the International Speech Communication Association (ISCA). Her research focuses on human speech-processing inspired automatic speech processing with the aim to develop inclusive speech technology, i.e., speech technology that works for everyone irrespective of how they speak or the language they speak.
Odette has been on the Board of ISCA since 2017, where she is also the chair of the Diversity committee (since 2019) and was co-chair of the Interspeech Conferences committee and of the Technical Committee (2017-2019). From 2018-2021, Odette was a member of the IEEE Speech and Language Processing Technical Committee (subarea Speech Production and Perception). From 2019-2021, she was an Associate Editor of IEEE Signal Processing Letters, where she now is a Senior Associate Editor.
Editor Biographies
Dr Cynthia C. S. Liem is an Associate Professor in the Multimedia Computing Group of Delft University of Technology, The Netherlands, and pianist of the Magma Duo. Her research interests focus on making people discover new interests and content which would not trivially be retrieved in music and multimedia collections, assessing questions of validation and validity in data science, and fostering trustworthy and responsible AI applications when human-interpreted data is involved. She initiated and co-coordinated the European research projects PHENICX (2013-2016) and TROMPA (2018-2021), focusing on technological enrichment of digital musical heritage, and participated as technical partner in an ERASMUS+ education innovation project on Big Data for Psychological Assessment. She gained industrial experience at Bell Labs Netherlands, Philips Research and Google. She was a recipient of the Lucent Global Science and Google Anita Borg Europe Memorial scholarships, the Google European Doctoral Fellowship 2010 in Multimedia, a finalist of the New Scientist Science Talent Award 2016 for young scientists committed to public outreach, Researcher-in-Residence 2018 at the National Library of The Netherlands, general chair of the ISMIR 2019 conference, and keynote speaker at the RecSys 2021 conference. Presently, she co-leads the Future Libraries Lab with the National Library of The Netherlands, is track leader of the Trustworthy AI track in the AI for Fintech lab with the ING bank, holds a TU Delft Education Fellowship on Responsible AI teaching, and is a member of the Dutch Young Academy.
Dr Jochen Huber is Professor of Computer Science at Furtwangen University, Germany. Previously, he was a Senior User Experience Researcher with Synaptics and an SUTD-MIT postdoctoral fellow in the Fluid Interfaces Group at MIT Media Lab and the Augmented Human Lab at Singapore University of Technology and Design. He holds a Ph.D. in Computer Science and degrees in both Mathematics (Dipl.-Math.) and Computer Science (Dipl.-Inform.), all from Technische Universität Darmstadt, Germany. Jochen’s work is situated at the intersection of Human-Computer Interaction and Human Augmentation. He designs, implements and studies novel input technology in the areas of mobile, tangible & non-visual interaction, automotive UX and assistive augmentation. He has co-authored over 60 academic publications and regularly serves as program committee member in premier HCI and multimedia conferences. He was program co-chair of ACM TVX 2016 and Augmented Human 2015 and chaired tracks of ACM Multimedia, ACM Creativity and Cognition and ACM International Conference on Interface Surfaces and Spaces, as well as numerous workshops at ACM CHI and IUI. Further information can be found on his personal homepage: http://jochenhuber.com